How REX Validates Benchmark Correctness Without Trusting Naive Diffs

Thu, 16 Apr 2026 00:00:00 +0000

The previous post in this series focused on fairness in performance comparison: same runtime stack, same user intent, and the right meaning of time.

This post covers the correctness half of the same problem.

The question is:

When a benchmark is supposed to prove that native LLVM and REX still compute the same thing, what exactly counts as “the same thing”?

The answer turned out to require more care than a raw diff.

Correctness on ./Code