Benchmarks

Performance

Single-threaded, one file (7.3 GB Illumina WES). Pipelines typically parallelise across files via an external scheduler (Nextflow, Snakemake), so single-file throughput is the relevant metric.

Wall clock time

Java: 9m 46s
Rust: 3m 13s (3.0x faster)

Peak memory (RSS)

Java: 444 MB
Rust: 28 MB (16x lower)
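The headline ratios follow directly from the measured values:

```latex
\text{speedup} = \frac{9\,\text{m}\,46\,\text{s}}{3\,\text{m}\,13\,\text{s}}
               = \frac{586\,\text{s}}{193\,\text{s}} \approx 3.04
\qquad
\text{memory ratio} = \frac{444\,\text{MB}}{28\,\text{MB}} \approx 15.9
```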

Where the speed comes from

Three specific optimisations, each identified through CPU profiling:

SIMD adapter search

The memchr crate uses ARM NEON / SSE vector instructions to compare 16 bytes at a time instead of one. Adapter search drops from 35% to 6% of runtime.
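memchr's own NEON / SSE code is architecture-specific, so the sketch below shows the underlying idea (testing many bytes per operation instead of one) portably. It swaps in a different technique, named plainly: SWAR ("SIMD within a register"), which checks 8 bytes per step inside a u64, used here to find candidate positions for the adapter's first byte before verifying the full match. This is not memchr's implementation or this tool's code; `find_byte`, `find_adapter` and the example adapter are hypothetical.

```rust
// Portable SWAR sketch of byte-parallel search; NOT memchr's actual NEON/SSE code.
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

/// True if any of the 8 byte lanes in `word` is zero (classic bit trick).
fn has_zero_byte(word: u64) -> bool {
    (word.wrapping_sub(LO) & !word & HI) != 0
}

/// Index of the first `needle` byte in `hay`, testing 8 bytes per step.
fn find_byte(hay: &[u8], needle: u8) -> Option<usize> {
    let pattern = LO * needle as u64; // `needle` broadcast to all 8 lanes
    let mut i = 0;
    while i + 8 <= hay.len() {
        let mut lanes = [0u8; 8];
        lanes.copy_from_slice(&hay[i..i + 8]);
        // XOR zeroes exactly the lanes equal to `needle`.
        if has_zero_byte(u64::from_le_bytes(lanes) ^ pattern) {
            // A match is somewhere in this word; pin it down with a short scalar scan.
            return hay[i..i + 8].iter().position(|&b| b == needle).map(|p| i + p);
        }
        i += 8;
    }
    // Fewer than 8 bytes left: plain scalar tail.
    hay[i..].iter().position(|&b| b == needle).map(|p| i + p)
}

/// First offset of `adapter` in `read`: jump between candidate first bytes, then verify.
/// Hypothetical convention: an empty adapter never matches.
fn find_adapter(read: &[u8], adapter: &[u8]) -> Option<usize> {
    let first = *adapter.first()?;
    let mut start = 0;
    while start + adapter.len() <= read.len() {
        let pos = start + find_byte(&read[start..], first)?;
        if read[pos..].starts_with(adapter) {
            return Some(pos);
        }
        start = pos + 1;
    }
    None
}

fn main() {
    // Illustrative input only.
    let adapter = b"AGATCGGAAGAGC";
    let read = b"ACGTACGTTTGCAGATCGGAAGAGCACACGTCT";
    println!("{:?}", find_adapter(read, adapter)); // Some(12)
}
```

In real code, `memchr::memmem::find(read, adapter)`, or a `memchr::memmem::Finder` built once per adapter and reused across reads, would take the place of `find_adapter`.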

Lookup-table base counting

A 256-byte table, built at compile time and indexed by the raw byte value, replaces unpredictable branch chains, halving the time spent in the per-base modules.
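A minimal sketch of the idea, assuming a five-slot layout (A, C, G, T, other) and reads that have already been uppercased; the names `BASE_SLOT` and `add_read` and the slot layout are hypothetical, not the tool's actual code. The table is produced by a `const fn`, so it is baked into the binary, and classifying a base becomes one indexed load rather than a chain of comparisons.

```rust
/// Slot for every possible byte value: A=0, C=1, G=2, T=3, anything else (N, IUPAC codes) = 4.
/// Computed at compile time; assumes sequences were uppercased beforehand.
const BASE_SLOT: [u8; 256] = build_base_slots();

const fn build_base_slots() -> [u8; 256] {
    let mut table = [4u8; 256];
    table[b'A' as usize] = 0;
    table[b'C' as usize] = 1;
    table[b'G' as usize] = 2;
    table[b'T' as usize] = 3;
    table
}

/// Accumulate per-position base counts (the data behind a per-base content plot).
/// Each base costs one table load and one increment instead of a chain of comparisons.
fn add_read(per_pos: &mut Vec<[u64; 5]>, seq: &[u8]) {
    if per_pos.len() < seq.len() {
        per_pos.resize(seq.len(), [0; 5]); // grows only when a longer read appears
    }
    for (counts, &base) in per_pos.iter_mut().zip(seq) {
        counts[BASE_SLOT[base as usize] as usize] += 1;
    }
}

fn main() {
    let mut per_pos = Vec::new();
    add_read(&mut per_pos, b"ACGT");
    add_read(&mut per_pos, b"AGN");
    for (pos, c) in per_pos.iter().enumerate() {
        println!("pos {}: A={} C={} G={} T={} other={}", pos, c[0], c[1], c[2], c[3], c[4]);
    }
}
```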

Zero per-read allocation

In-place uppercase conversion, zero-copy tile parsing and byte-slice tracking mean nothing is heap-allocated per read, and with no garbage collector there is no GC overhead.
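A minimal sketch combining the three techniques, under these assumptions: four-line FASTQ records and Illumina 1.8+ style headers in which the tile is the fifth colon-separated field. The names `for_each_record`, `parse_tile` and `trim_eol` are hypothetical, not the tool's API. Four line buffers are reused for the whole file (`clear()` keeps their capacity), the sequence is uppercased in place, the tile is parsed straight from the header bytes, and the callback receives the sequence as a `&[u8]` borrowed from the buffer.

```rust
use std::io::BufRead;

/// Drop a trailing "\n" or "\r\n" by narrowing the slice; nothing is copied.
fn trim_eol(mut line: &[u8]) -> &[u8] {
    while let Some((&last, rest)) = line.split_last() {
        if last != b'\n' && last != b'\r' {
            break;
        }
        line = rest;
    }
    line
}

/// Tile number from a header like @instrument:run:flowcell:lane:tile:x:y ...,
/// parsed directly from the bytes (no intermediate String).
fn parse_tile(header: &[u8]) -> Option<u32> {
    let field = header.split(|&b| b == b':').nth(4)?;
    if field.is_empty() {
        return None;
    }
    field.iter().try_fold(0u32, |acc, &b| {
        if b.is_ascii_digit() {
            acc.checked_mul(10)?.checked_add(u32::from(b - b'0'))
        } else {
            None
        }
    })
}

/// Calls `f(tile, sequence)` for each record. Buffers are allocated once per file,
/// so once they reach the longest line length no read triggers an allocation.
/// Sketch only: truncated or malformed records are not validated.
fn for_each_record<R, F>(mut reader: R, mut f: F) -> std::io::Result<()>
where
    R: BufRead,
    F: FnMut(Option<u32>, &[u8]),
{
    let mut header: Vec<u8> = Vec::new();
    let mut seq: Vec<u8> = Vec::new();
    let mut plus: Vec<u8> = Vec::new();
    let mut qual: Vec<u8> = Vec::new();
    loop {
        header.clear();
        seq.clear();
        plus.clear();
        qual.clear();
        if reader.read_until(b'\n', &mut header)? == 0 {
            return Ok(()); // clean end of input
        }
        reader.read_until(b'\n', &mut seq)?;
        reader.read_until(b'\n', &mut plus)?;
        reader.read_until(b'\n', &mut qual)?;
        seq.make_ascii_uppercase(); // in place: no new String or Vec
        f(parse_tile(trim_eol(&header)), trim_eol(&seq));
    }
}

fn main() -> std::io::Result<()> {
    // Two made-up records; the lowercase bases show the in-place uppercasing.
    let data = concat!(
        "@A00123:8:HFWLDDSXX:1:1101:1000:2000 1:N:0:ACGT\nacgtn\n+\nIIIII\n",
        "@A00123:8:HFWLDDSXX:1:2202:10:20 1:N:0:ACGT\nGGCC\n+\nIIII\n"
    );
    // Printing allocates; the parsing loop itself does not once the buffers are warm.
    for_each_record(data.as_bytes(), |tile, seq| {
        println!("tile {:?}: {}", tile, String::from_utf8_lossy(seq));
    })
}
```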

Test dataset

119 GB across 11 files: Illumina WES, Element AVITI, ONT PromethION long-read, and BAM, run on Apple Silicon (aarch64-apple-darwin). The full report has per-file results, memory profiles, a CPU breakdown, and an analysis of whether these optimisations could be back-ported to Java.