linux-x64 · ns/op

Benchmarks

All numbers are best-of-5 in ns/op (lower is better), measured on linux-x64. The native packages are the AVX2 (Glyph11.Native) and SSE4.2 (Glyph11.Pico) builds. Green marks the fastest entry in each row. All three reject malformed framing; they differ in how much they validate (see the trade-off).

Request header parsing — contiguous

Single buffer, parsed in place. Output: a BinaryRequest for Glyph11 and Pico; raw field spans for Native.

Payload	Glyph11	Glyph11.Native	Glyph11.Pico
~95 B	120 ns	99 ns	80 ns
4 KB	750 ns	502 ns	487 ns
32 KB	5251 ns	3752 ns	3370 ns
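The two output shapes compared in this table can be illustrated with a small sketch. It is written in Python only for readability and is not Glyph11's API or parsing logic: `header_spans` and `materialize` are hypothetical names. The first returns raw (offset, length) pairs that index into the original buffer; the second stands in for building a structured header list from those spans. Whether BinaryRequest copies or references the buffer is not stated here, so treat the copy as an assumption of the sketch.

```python
# Hypothetical illustration; not Glyph11's API or parsing logic.
REQUEST = b"GET /items?id=7 HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"


def header_spans(buf: bytes) -> list[tuple[int, int, int, int]]:
    """Return (name_off, name_len, value_off, value_len) per header line.

    Nothing is copied: each span only indexes into buf.
    Raises ValueError if a line has no colon or no CRLF terminator.
    """
    spans = []
    pos = buf.index(b"\r\n") + 2               # skip the request line
    while not buf.startswith(b"\r\n", pos):    # a blank line ends the header block
        end = buf.index(b"\r\n", pos)
        colon = buf.index(b":", pos, end)
        value = colon + 1
        while value < end and buf[value] == 0x20:  # skip spaces after the colon
            value += 1
        spans.append((pos, colon - pos, value, end - value))
        pos = end + 2
    return spans


def materialize(buf: bytes, spans) -> list[tuple[str, str]]:
    """Build an owned (name, value) list from the spans (assumed to copy)."""
    return [
        (buf[no:no + nl].decode("ascii"), buf[vo:vo + vl].decode("ascii"))
        for no, nl, vo, vl in spans
    ]


spans = header_spans(REQUEST)
headers = materialize(REQUEST, spans)
```

Returning spans leaves any further interpretation to the caller, which is one plausible reason a span-only parser does less work per request than one that builds a full request object.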

Request header parsing — multi-segment

A request fragmented across 3 segments is linearized into a fresh buffer per request, then parsed.

Payload	Glyph11	Glyph11.Native	Glyph11.Pico
~95 B	245 ns	116 ns	118 ns
4 KB	1258 ns	721 ns	674 ns
32 KB	8695 ns	4969 ns	4627 ns

Both native packages are roughly 1.7× to 2.1× faster than the managed parser on fragmented input; Pico leads on the 4 KB and 32 KB payloads, Native (narrowly) on the ~95 B payload.
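The speedups on fragmented input can be recomputed directly from the rows of the multi-segment table. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Rows copied from the multi-segment table:
# (payload, Glyph11, Glyph11.Native, Glyph11.Pico), all in ns/op.
ROWS = [
    ("~95 B", 245, 116, 118),
    ("4 KB", 1258, 721, 674),
    ("32 KB", 8695, 4969, 4627),
]

# Speedup = managed time / native time, rounded to two decimals.
speedups = {
    payload: (round(managed / native, 2), round(managed / pico, 2))
    for payload, managed, native, pico in ROWS
}

for payload, (vs_native, vs_pico) in speedups.items():
    print(f"{payload:>6}: Native {vs_native:.2f}x, Pico {vs_pico:.2f}x")
```

The ratios fall between about 1.7x and 2.1x, with the largest gap on the ~95 B payload.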

Chunked body decoding

Sizes are decoded body sizes. Glyph11.Pico reuses Glyph11's ChunkedBodyStream, so the two share a column.

Decoded	Glyph11 / Pico	Glyph11.Native
256 B	19 ns	21 ns
4 KB	115 ns	70 ns
32 KB	826 ns	808 ns
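For readers unfamiliar with what is being decoded, here is a minimal sketch of HTTP/1.1 chunked decoding. It is a generic illustration of the wire format, not the implementation of ChunkedBodyStream or of the native decoder; `decode_chunked` is a hypothetical name. The sketch ignores chunk extensions and does not handle trailer fields.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked body; raise ValueError on malformed framing."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end < 0:
            raise ValueError("chunk-size line is not CRLF-terminated")
        # Drop any chunk extension after ';' (ignored in this sketch).
        size_field = body[pos:line_end].split(b";", 1)[0]
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            raise ValueError("chunk size is not hexadecimal")
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            return bytes(out)  # last chunk; trailer fields are not handled
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data is truncated or not CRLF-terminated")
        out += body[pos:pos + size]
        pos += size + 2


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = decode_chunked(wire)
```

Here the wire form is 24 bytes and the decoded body is 9 bytes; the sizes in the table refer to the latter.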

Methodology

Best-of-5 timed trials after warmup, on identical payloads, in one process.
Native builds: libglyph11 at -march=x86-64-v3 (AVX2); libglyph11pico with picohttpparser's SSE4.2 path.
Same output where comparable: Glyph11 and Pico both build a full BinaryRequest (path/query split, header list); Native fills raw (offset,length) spans.
Multi-segment runs linearize a fresh buffer per request (3 segments → one contiguous block) and then parse it, so the timing includes the cost of fragmentation.
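The measurement procedure described above can be sketched as follows. This is not the harness in bench/; it is a minimal Python illustration under assumed parameters (`iterations`, `warmup`), with a placeholder `parse` that only locates the end of the header block. `linearize` and `best_of` are hypothetical names.

```python
import time


def linearize(segments):
    """Copy the segments into one freshly allocated contiguous buffer."""
    buf = bytearray(sum(len(s) for s in segments))
    offset = 0
    for seg in segments:
        buf[offset:offset + len(seg)] = seg
        offset += len(seg)
    return buf


def best_of(fn, trials=5, iterations=10_000, warmup=1_000):
    """Return the lowest mean ns/op across `trials` timed runs, after warmup."""
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            fn()
        elapsed = time.perf_counter_ns() - start
        best = min(best, elapsed / iterations)
    return best


def parse(buf):
    """Placeholder for a real parser: finds the end of the header block."""
    return buf.find(b"\r\n\r\n")


# Three segments standing in for a fragmented request.
SEGMENTS = [b"GET / HTTP/1.1\r\n", b"Host: example.com\r\n", b"\r\n"]

contiguous = bytes(linearize(SEGMENTS))
ns_contiguous = best_of(lambda: parse(contiguous))
# The fragmented case pays for a fresh buffer and copy on every call.
ns_fragmented = best_of(lambda: parse(linearize(SEGMENTS)))
```

Taking the minimum over trials reports the least-disturbed run, which reduces scheduler and cache noise but says nothing about variance.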

The harnesses are in bench/. Numbers vary run-to-run and by hardware; treat them as relative, not absolute.