Posts, notes, and experiments.
How REX Places `rex_offload_init()` And Why It Avoids Automatic `rex_offload_fini()`
A focused walkthrough of REX's offload init/fini policy: why the lowerer eagerly inserts `rex_offload_init()` near the top of `main`, why it deliberately does not auto-insert `rex_offload_fini()`, and what the lowering tests verify about that decision.
How `rex_kmp.h` Rewrites Offloading Runtime Calls In REX
A focused walkthrough of the wrapper layer in `rex_kmp.h`: vendored runtime ABI structs, asm aliases for the real `libomptarget` symbols, direct and safe REX wrappers, macro rewriting of generated `__tgt_*` calls, and the `ident_t *` bridge for `__tgt_target_kernel`.
How REX Registers CUBIN Images With `libomptarget`
A focused walkthrough of REX's device-image registration path: why it uses standalone CUBIN files, how `register_cubin.cpp` builds `__tgt_device_image` and `__tgt_bin_desc`, how one-time registration is synchronized, and why `rex_offload_init()` is explicit.
How REX Expands `declare mapper` Clauses Into Dynamic Runtime Map Entries
A focused walkthrough of REX's mapper-expansion path: scope-aware `declare mapper` resolution, recursive clause expansion, array-section deferral into dynamic entries, and the shared two-pass builder used by `target`, `target data`, and `target update`.
How REX Packs Literal Target Parameters For GPU Kernels
A focused walkthrough of REX's literal target parameter path: how eligible mapped scalars are tagged, packed into pointer-sized launch arguments, stabilized in the host packet, and reconstructed inside the generated GPU kernel.
How REX Lowers `target`, `target teams`, and `target parallel` Through The SPMD Path
A focused walkthrough of REX's non-loop GPU offloading branch: how `transOmpTargetSpmd()` handles `target`, `target teams`, and `target parallel`, outlines the region body, builds offload entries and runtime argument arrays, and launches a simpler SPMD kernel path without worksharing loop rewriting.
How REX Lowers Target Loops Into Direct GPU Kernels
A focused walkthrough of the target-loop lowering stage in REX: how `transOmpTargetLoopBlock()` recovers canonical loop structure, rewrites loops into direct grid-stride kernels, keeps a round-robin fallback, and shares loop analysis with host launch shaping.
How REX Validates GPU Offloading With Real Benchmarks
A walkthrough of REX's top validation layer: building native LLVM and REX binaries side by side, comparing outputs with normalization rules, and catching runtime and performance regressions using real benchmarks.
How REX Validates OpenMP Semantic Analysis With `checkOmpAnalyzing`
A focused walkthrough of REX's `checkOmpAnalyzing` layer: `-rose:openmp:analyzing`, targeted AST assertions for schedule defaults and implicit target mapping, and why this semantic-analysis checkpoint sits between frontend AST coverage and lowering tests.
How REX Tests OpenMP Frontend AST Construction With The `OpenMP_tests` Corpus
A focused walkthrough of the broad `OpenMP_tests` frontend corpus in REX: `parseOmp` in `ast_only` mode, `rose_*` output generation, OpenMP-line reference diffs, mixed OpenMP/OpenACC cases, multi-file and macro-preservation checks, and the separate Fortran AST-coverage slice.