How To Debug When REX GPU Offloading Builds But Does Not Run

Mon, 13 Apr 2026 00:00:00 +0000

The previous posts in this series split the runtime boundary into focused pieces:

how REX registers CUBIN images with libomptarget,
how rex_kmp.h rewrites generated runtime calls,
and why the lowerer inserts rex_offload_init() eagerly but avoids automatic rex_offload_fini().

Those posts explain the design.

This one is about failure handling.

Sometimes a REX-lowered program reaches a frustrating state: it compiles and links cleanly, but fails when it runs.

That kind of failure is easy to misdiagnose because it feels like “a CUDA runtime problem” or “an LLVM issue” in the abstract.
