How REX Separated GPU-Total From Wall-Clock Noise In pathfinder And srad

Tue, 28 Apr 2026 00:00:00 +0000

The previous post closed the last obvious fair b+tree kernel-body gap. REX was no longer relying on an unfair launch-shape rewrite, and it no longer needed a global cache flag. It recovered read-only provenance in the generated device kernel and emitted selective __ldg(...) loads where the proof was strong enough.

That left a strange-looking benchmark table.

Some rows were clearly resolved. b+tree had moved from a fair loss into a clear REX win. Several other benchmarks already had stable REX advantages. But three rows still looked suspicious if we looked only at the broad comparison table:

