Benchmarks

Performance

Single-threaded, one file (7.3 GB Illumina WES). Pipelines typically parallelise across files via an external scheduler (Nextflow, Snakemake), so single-file throughput is the relevant metric.

Wall clock time
Java: 9m 46s
Rust: 3m 13s (3.0x faster)

Peak memory (RSS)
Java: 444 MB
Rust: 28 MB (16x less)
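As a check on the rounding, the headline ratios follow directly from the figures above once the wall-clock times are converted to seconds (9m 46s = 586 s, 3m 13s = 193 s):

```latex
\frac{586\,\text{s}}{193\,\text{s}} \approx 3.04 \;(\text{reported as } 3.0\times),
\qquad
\frac{444\,\text{MB}}{28\,\text{MB}} \approx 15.9 \;(\text{reported as } 16\times)
```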


Where the speed comes from

Three specific optimisations, each identified through CPU profiling:

SIMD adapter search

The memchr crate uses SIMD instructions (ARM NEON, x86 SSE2) to compare 16 bytes per instruction instead of one byte at a time. Adapter search drops from 35% to 6% of runtime.
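The memchr crate's vectorised NEON/SSE2 code is not reproduced here. As a hedged, standard-library-only illustration of why testing many bytes per operation pays off, the sketch below uses the word-at-a-time ("SWAR") form of the same idea: 8 bytes per 64-bit operation rather than 16 per vector register. The hypothetical `find_adapter` then jumps between candidate positions for the adapter's first base and verifies each one. All function names and the adapter string in the usage are illustrative assumptions; in real code, `memchr::memmem::find(read, adapter)` performs the whole substring search.

```rust
// Word-at-a-time ("SWAR") byte search: a portable sketch of the idea behind
// SIMD memchr, testing 8 bytes per step. Not the memchr crate's actual code.
const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

/// True if any of the 8 bytes in `x` is zero (exact for the word as a whole).
fn has_zero_byte(x: u64) -> bool {
    (x.wrapping_sub(LO) & !x & HI) != 0
}

/// Returns the index of the first `needle` byte in `haystack`,
/// mirroring the signature of `memchr::memchr`.
fn swar_memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    // Broadcast the needle into all 8 byte lanes, e.g. b'A' -> 0x4141_4141_4141_4141.
    let splat = LO * u64::from(needle);
    let mut i = 0;
    while i + 8 <= haystack.len() {
        let mut word = [0u8; 8];
        word.copy_from_slice(&haystack[i..i + 8]);
        // XOR turns matching bytes into zero bytes, so all 8 lanes are tested at once.
        if has_zero_byte(u64::from_le_bytes(word) ^ splat) {
            break; // the first match lies in this chunk
        }
        i += 8;
    }
    // Pinpoint the match inside the chunk, or scan the final <8-byte tail.
    haystack[i..].iter().position(|&b| b == needle).map(|p| i + p)
}

/// Hypothetical helper: first exact occurrence of `adapter` in `read`.
/// An empty adapter returns None.
fn find_adapter(read: &[u8], adapter: &[u8]) -> Option<usize> {
    let (&first, _) = adapter.split_first()?;
    if adapter.len() > read.len() {
        return None;
    }
    let last_start = read.len() - adapter.len();
    let mut start = 0;
    while start <= last_start {
        // Skip straight to the next position whose byte matches the adapter's first base.
        let pos = start + swar_memchr(first, &read[start..=last_start])?;
        if &read[pos..pos + adapter.len()] == adapter {
            return Some(pos);
        }
        start = pos + 1;
    }
    None
}
```

For example, `find_adapter(b"CCCCAGATCGGAAGAGCTTT", b"AGATCGGAAGAG")` returns `Some(4)`. Most positions are rejected by the bulk byte scan, and only candidates that start with the right base reach the full comparison.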

Lookup-table base counting

A 256-byte table, indexed by the raw byte value and built at compile time, replaces unpredictable branch chains, halving the time spent in per-base modules.
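A minimal sketch of the technique, assuming a hypothetical index layout (0 = A, 1 = C, 2 = G, 3 = T, 4 = N or any other byte); the actual table contents are not described here. Because the table is produced by a `const fn`, it is computed at compile time, and each base costs one table load plus one increment instead of a chain of comparisons whose outcome depends on the data.

```rust
/// Hypothetical layout: 0 = A, 1 = C, 2 = G, 3 = T, 4 = N or any other byte.
const OTHER: u8 = 4;

/// One entry per possible input byte (256 bytes total), evaluated at compile time.
static BASE_INDEX: [u8; 256] = build_base_index();

const fn build_base_index() -> [u8; 256] {
    let mut table = [OTHER; 256];
    // Lowercase is mapped too, so the sketch works even without an uppercase pass.
    table[b'A' as usize] = 0;
    table[b'a' as usize] = 0;
    table[b'C' as usize] = 1;
    table[b'c' as usize] = 1;
    table[b'G' as usize] = 2;
    table[b'g' as usize] = 2;
    table[b'T' as usize] = 3;
    table[b't' as usize] = 3;
    table
}

/// Counts A, C, G, T and "other" in one pass, without branching on which base was seen.
fn count_bases(seq: &[u8]) -> [u64; 5] {
    let mut counts = [0u64; 5];
    for &b in seq {
        counts[BASE_INDEX[b as usize] as usize] += 1;
    }
    counts
}
```

For instance, `count_bases(b"ACGTNacgtX")` yields `[2, 2, 2, 2, 2]`. The equivalent `match b { b'A' => ..., b'C' => ..., ... }` chain would branch on every base, and real reads give the branch predictor little pattern to learn.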

Zero per-read allocation

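Sequences are uppercased in place, tile numbers are parsed straight from the read header without copying it, and data is tracked as byte slices rather than newly allocated strings. Nothing is allocated per read, so there is no garbage-collection overhead.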
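A minimal std-only sketch of these ideas. It assumes Casava 1.8+-style Illumina headers (`@instrument:run:flowcell:lane:tile:x:y comment`), where the tile is the fifth colon-separated field, and simple 4-line FASTQ records; all function names are hypothetical. The four line buffers are allocated once and reused (`clear()` keeps their capacity), `make_ascii_uppercase` rewrites the sequence in place, and the tile is returned as a slice borrowed from the header and parsed from its digits.

```rust
use std::io::{self, BufRead};

/// Tile field of a Casava 1.8+-style header, borrowed from the header (no copy).
fn tile_field(header: &[u8]) -> Option<&[u8]> {
    let id = header.split(|&b| b == b' ').next()?; // drop the comment after the first space
    id.split(|&b| b == b':').nth(4)
}

/// Parses ASCII digits directly, without building a String first.
fn parse_u32(digits: &[u8]) -> Option<u32> {
    if digits.is_empty() {
        return None;
    }
    digits.iter().try_fold(0u32, |acc, &d| {
        if !d.is_ascii_digit() {
            return None;
        }
        acc.checked_mul(10)?.checked_add(u32::from(d - b'0'))
    })
}

/// Removes a trailing "\n" or "\r\n" in place.
fn trim_newline(buf: &mut Vec<u8>) {
    if buf.last() == Some(&b'\n') {
        buf.pop();
    }
    if buf.last() == Some(&b'\r') {
        buf.pop();
    }
}

/// Reads 4-line FASTQ records into buffers that are reused for every read,
/// so allocation stops once they have grown to the longest line seen.
/// Sketch only: truncated records are not detected.
fn scan_fastq<R: BufRead>(mut input: R, mut on_read: impl FnMut(&[u8], &[u8])) -> io::Result<u64> {
    let (mut header, mut seq, mut plus, mut qual) = (Vec::new(), Vec::new(), Vec::new(), Vec::new());
    let mut n = 0;
    loop {
        header.clear();
        seq.clear();
        plus.clear();
        qual.clear();
        if input.read_until(b'\n', &mut header)? == 0 {
            break; // end of input
        }
        input.read_until(b'\n', &mut seq)?;
        input.read_until(b'\n', &mut plus)?;
        input.read_until(b'\n', &mut qual)?;
        trim_newline(&mut header);
        trim_newline(&mut seq);
        seq.make_ascii_uppercase(); // in place, no new buffer
        on_read(&header[..], &seq[..]);
        n += 1;
    }
    Ok(n)
}
```

A caller might pass a closure such as `|header, seq| { let tile = tile_field(header).and_then(parse_u32); /* update per-tile stats */ }`, which only ever sees borrowed slices of the reused buffers.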

Test dataset

The test set is 119 GB across 11 files: Illumina WES, Element AVITI, ONT PromethION long reads, and BAM. All runs used Apple Silicon (aarch64-apple-darwin). The full report has per-file results, memory profiles, a CPU breakdown, and an analysis of whether these optimisations could be back-ported to Java.