Posts, notes, and experiments.

Why REX Treats GPU Benchmark Results As An Investigation Surface, Not A Scoreboard


A focused explanation of the top-layer benchmark contract in REX: same runtime stack, same inputs, normalized correctness, benchmark-appropriate timing, clause-preserving fairness, and results interpreted as evidence rather than a simplistic scoreboard.

Why REX's GPU Benchmark Layer Must Not Become A Catch-All Test Suite


A focused case for keeping REX's GPU benchmark layer narrow: benchmarks are the final reality check, but they are the wrong place to first detect parser drift, semantic normalization bugs, or simple lowering-structure regressions.

What Only Real GPU Benchmarks Still Catch In REX


A focused case for why REX still needs full GPU benchmark runs even after parser, AST, lowering, and CPU-equivalence tests pass: only runs of real applications exposed misplaced offload init, bad timing proxies, baseline drift, and full-stack integration issues.

How REX Validates Benchmark Correctness Without Trusting Naive Diffs


A focused walkthrough of the correctness side of REX benchmark validation: stripping non-semantic output lines, using reduced-output modes when benchmarks hide their results, and distinguishing current native-versus-REX agreement from stale-baseline drift.

How REX Makes Fair GPU Offloading Comparisons Against Native LLVM


A focused methodology post on fair GPU benchmark comparisons between REX and native LLVM: same runtime stack, clause-preserving launch policy, benchmark-appropriate timing sources, and correctness rules that do not confuse drift with regression.

How REX Emits `omp_offloading_entries` And Keeps Kernel Identity Aligned


A focused walkthrough of the host offload-entry table in REX: how `__tgt_offload_entry` objects are emitted, why REX uses synthetic host identity symbols, how entry names stay aligned with generated device kernels, and how the table is joined with the CUBIN during registration.

How To Debug When REX GPU Offloading Builds But Does Not Run


A focused debugging guide for REX GPU offloading failures that appear only at runtime: checking CUBIN presence, offload-entry integrity, kernel-name matching, init ordering, and map-array correctness.

How REX Places `rex_offload_init()` And Why It Avoids Automatic `rex_offload_fini()`


A focused walkthrough of REX's offload init/fini policy: why the lowerer eagerly inserts `rex_offload_init()` near the top of `main`, why it deliberately does not auto-insert `rex_offload_fini()`, and what the lowering tests verify about that decision.

How `rex_kmp.h` Rewrites Offloading Runtime Calls In REX


A focused walkthrough of the wrapper layer in `rex_kmp.h`: vendored runtime ABI structs, asm aliases for the real `libomptarget` symbols, direct and safe REX wrappers, macro rewriting of generated `__tgt_*` calls, and the `ident_t *` bridge for `__tgt_target_kernel`.

How REX Registers CUBIN Images With `libomptarget`


A focused walkthrough of REX's device-image registration path: why it uses standalone CUBIN files, how `register_cubin.cpp` builds `__tgt_device_image` and `__tgt_bin_desc`, how one-time registration is synchronized, and why `rex_offload_init()` is explicit.