Why REX Does Not Let Clang Own the OpenMP AST
REX parses OpenMP directives into OpenMPIR, converts them into SgOmp*, and treats those Sage nodes as the real OpenMP contract.
One of the most important design decisions in REX is also one of the easiest to misread from the outside.
REX uses Clang and LLVM in its toolchain. But when it comes to OpenMP, REX does not treat Clang’s AST as the source of truth.
That is not because Clang’s OpenMP implementation is weak. It is because REX is solving a different problem.
REX is a source-to-source compiler built in the ROSE/Sage world. Its job is not just to accept OpenMP code and produce a binary. Its job is to:
- parse and analyze directive structure,
- rewrite the program while keeping source-level intent visible,
- lower that structure into explicit host/device code when needed,
- and still be able to unparse the result back into readable source.
Once you understand that requirement, the “why not just let Clang own OpenMP?” question answers itself: if OpenMP lives only in a foreign AST, then REX loses ownership of exactly the structure it needs to transform.
Figure 1. Clang is still in the toolchain, but REX does not let Clang become the OpenMP source of truth. The decisive boundary is where the OpenMP model lives. In REX, it lives inside Sage as SgOmp*.
This is not a “Clang bad” story
It is important to be precise here. Clang is very good at what it is built to do. If your goal is “compile a C/C++ OpenMP program with LLVM’s frontend and runtime,” then Clang owning the OpenMP AST is completely natural.
REX is built around a different center of gravity:
- the main program representation is the Sage AST,
- source-preserving rewrites are first-class,
- Fortran support matters alongside C/C++,
- and the pipeline has to support checkpoints before lowering, after AST construction, and after analysis.
That means the question is not “can Clang represent OpenMP?” Of course it can.
The real question is: what representation should the rest of the REX pipeline depend on?
For a source-to-source compiler, the answer has to be: the compiler’s own AST.
The core problem: dual ownership is a trap
Imagine the alternative design.
The generic frontend still builds the Sage AST for the rest of the program. But OpenMP is delegated to Clang, which builds its own OpenMP AST model somewhere else. Now the compiler has two problems immediately:
- The program no longer has one source of truth. The base language lives in Sage. OpenMP lives in Clang. Any transformation that needs both is now a cross-AST synchronization problem.
- Every OpenMP rewrite becomes a translation problem. If a pass modifies a loop, a clause expression, or directive attachment in the Sage AST, the OpenMP model in Clang either becomes stale or has to be rebuilt. If the pass modifies OpenMP in the Clang model, Sage no longer reflects the real transformed program.
That is a bad place to be for any compiler. It is especially bad for a source-to-source compiler, where inspectability and unparsing are part of the product, not optional extras.
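To make the staleness problem concrete, here is a toy sketch in Python. It is not REX or Clang code: `HostLoop`, `ForeignOmpModel`, and their fields are hypothetical stand-ins for a loop node in the main AST and an OpenMP model owned by a second AST.

```python
from dataclasses import dataclass


@dataclass
class HostLoop:
    """Hypothetical stand-in for a loop node in the main (Sage-like) AST."""
    var: str
    upper: str


@dataclass
class ForeignOmpModel:
    """Hypothetical stand-in for an OpenMP model owned by a second AST.

    It snapshots the loop bound when it is built, so it knows nothing
    about later rewrites of the host AST.
    """
    directive: str
    loop_upper: str


def build_foreign_model(loop: HostLoop) -> ForeignOmpModel:
    return ForeignOmpModel(directive="parallel for", loop_upper=loop.upper)


def is_stale(loop: HostLoop, model: ForeignOmpModel) -> bool:
    """True when the two representations disagree about the loop."""
    return loop.upper != model.loop_upper


loop = HostLoop(var="i", upper="n")
model = build_foreign_model(loop)
stale_before = is_stale(loop, model)  # both models agree at this point

loop.upper = "n / 2"                  # a pass rewrites the host AST only
stale_after = is_stale(loop, model)   # the foreign model was never told
```

Nothing in the rewrite step is wrong on its own. The bug is structural: every pass would need to remember to update or rebuild the second model.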
REX avoids that trap by making OpenMP cross the boundary early:
- keep directive text alive in the Sage frontend,
- parse directive structure with ompparser,
- convert it into SgOmp*,
- from then on, let the rest of the compiler work on one AST world.
The result is much less glamorous than “we reuse Clang’s OpenMP AST,” but it is much more operationally correct.
Why source-to-source changes the design completely
If REX were only a compile-to-binary frontend, several shortcuts would become more attractive.
But source-to-source compilers need things that compile-only frontends can afford to care less about:
- original directive spelling still matters,
- source locations still matter after rewriting,
- clause structure needs to remain explainable to a human reader,
- unparsing needs to produce understandable code,
- partial pipeline checkpoints need to stop on meaningful compiler-owned state.
This is why the first OpenMP post in the series emphasized ownership instead of just parsing mechanics. The central technical requirement is not “recognize #pragma omp.” The real requirement is:
make OpenMP a first-class citizen of the same AST universe as the rest of the transformed program.
That is exactly what SgOmp* gives REX.
Figure 2. Once OpenMP is represented as SgOmp*, the rest of the compiler gets a simpler life. Analysis, lowering, unparsing, and testing all consume one representation instead of coordinating between incompatible models.
Why OpenMPIR exists instead of building SgOmp* directly
REX also does not jump directly from pragma text to SgOmp*. It inserts one intermediate layer:
- Stage 1 and ompparser build OpenMPIR,
- Stage 2 converts OpenMPIR into SgOmp*.
That intermediate step is part of the same ownership story.
The standalone OpenMP parser is responsible for:
- directive grammar,
- clause spelling,
- combined constructs,
- begin/end directive matching,
- basic OpenMP structural recognition.
The Sage AST builder is responsible for:
- AST attachment,
- body ownership,
- source positions,
- scope-aware clause expressions,
- and making the result part of the compiler’s main IR.
This split is much cleaner than importing a foreign AST wholesale. It lets REX isolate “OpenMP grammar knowledge” from “compiler-owned AST knowledge” while still ending up with a single internal representation.
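The split can be sketched in a few lines of Python. This is only an illustration of the division of labor, not the real ompparser or OpenMPIR API: `DirectiveIR`, `OmpNode`, `parse_directive`, and `attach` are hypothetical names, and the regex only handles clauses of the form `name(payload)` without nested parentheses.

```python
import re
from dataclasses import dataclass


@dataclass
class DirectiveIR:
    """Stage 1 output: grammar-level facts only, no host-AST knowledge."""
    kind: str
    clauses: list  # list of (name, raw_payload_text) pairs


@dataclass
class OmpNode:
    """Stage 2 output: a compiler-owned node attached to a statement."""
    kind: str
    clauses: list
    body: str  # the statement the directive governs
    line: int  # source position kept for later passes


CLAUSE_RE = re.compile(r"(\w+)\s*\(([^()]*)\)")


def parse_directive(text: str) -> DirectiveIR:
    """Stage 1: split '#pragma omp <kind> <clauses>' into kind and clauses."""
    prefix = "#pragma omp"
    rest = text.strip()
    if not rest.startswith(prefix):
        raise ValueError("not an OpenMP pragma: " + text)
    rest = rest[len(prefix):].strip()
    first_clause = CLAUSE_RE.search(rest)
    kind_text = rest[:first_clause.start()] if first_clause else rest
    clauses = [(m.group(1), m.group(2).strip()) for m in CLAUSE_RE.finditer(rest)]
    return DirectiveIR(kind=" ".join(kind_text.split()), clauses=clauses)


def attach(ir: DirectiveIR, body: str, line: int) -> OmpNode:
    """Stage 2: turn grammar facts into a node owned by the main AST."""
    return OmpNode(kind=ir.kind, clauses=list(ir.clauses), body=body, line=line)


ir = parse_directive("#pragma omp target teams map(tofrom: a[0:n]) if(n > 0)")
node = attach(ir, body="for (i = 0; i < n; i++) a[i] += 1;", line=12)
```

Stage 1 never sees the loop body or the source line, and Stage 2 never re-reads directive text. That is the isolation the split is meant to buy.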
Why C/C++ and Fortran force this issue
REX does not only care about C/C++. It also has to deal with Fortran OpenMP comments, continuation lines, and begin/end directive forms.
That alone is a strong argument against letting Clang define the model boundary.
Clang is not the natural owner of the whole language space REX needs to support. REX’s frontend, however, already lives at the boundary where:
- C/C++ pragmas can be collected from SgPragmaDeclaration,
- Fortran directive comments can be gathered and normalized,
- both can be parsed through ompparser,
- and both can be converted into the same SgOmp* family.
That is a much healthier architectural shape than “use one AST model for one language family and a different model for another.”
The key benefit is not only reuse. It is semantic convergence. Once both languages become SgOmp*, every later stage can stay focused on transformations instead of on language-specific directive collection trivia.
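A rough sketch of that convergence step, assuming free-form Fortran only and ignoring fixed-form sentinels. The function names are hypothetical and the real REX frontend is certainly more careful. The point is that both languages are reduced to plain directive text before any OpenMP grammar is applied.

```python
def normalize_c_pragma(pragma: str) -> str:
    """Strip '#pragma omp' and collapse whitespace."""
    words = pragma.split()
    if words[:2] != ["#pragma", "omp"]:
        raise ValueError("not an OpenMP pragma")
    return " ".join(words[2:])


def normalize_fortran_comments(lines: list) -> str:
    """Join free-form '!$omp' lines, honoring '&' continuations."""
    parts = []
    for raw in lines:
        text = raw.strip()
        if text[:5].lower() != "!$omp":
            raise ValueError("not an OpenMP sentinel line")
        text = text[5:].strip()
        if text.startswith("&"):  # optional '&' after the sentinel on a continued line
            text = text[1:].strip()
        if text.endswith("&"):    # trailing '&' means the directive continues
            text = text[:-1].strip()
        parts.append(text)
    # Fortran is case-insensitive, so lowercase to get one canonical spelling.
    return " ".join(" ".join(parts).split()).lower()


c_text = normalize_c_pragma("#pragma omp   parallel for private(i)")
f_text = normalize_fortran_comments(["!$OMP PARALLEL DO &", "!$OMP& PRIVATE(I)"])
```

After this step, `c_text` and `f_text` differ only where the languages genuinely differ (`for` versus `do`), which is exactly the part a shared directive parser is supposed to handle.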
Why clause expressions must stay in Sage’s semantic world
One of the deepest reasons REX cannot stop at a foreign OpenMP AST is that OpenMP clauses embed real host-language semantics.
Examples:
- if(cond)
- map(tofrom: a[0:n])
- depend(inout: a[i])
- schedule(static, chunk)
These are not just directive tokens. They contain:
- identifiers that must resolve in the surrounding scope,
- types that matter,
- array sections that later lowering needs to decompose,
- source positions that later passes may want to preserve.
REX therefore makes a very deliberate split:
- ompparser handles directive grammar,
- Sage-aware helpers such as parseOmpExpression(...) and parseOmpArraySection(...) reconstruct clause payloads in the correct Sage context,
- the final clause nodes become part of SgOmp*.
That split is exactly what a source-to-source compiler wants. If clause payloads lived only in a foreign AST, then either:
- later Sage-based passes would have to ask another AST system for semantic details, or
- REX would have to translate those clause expressions back into Sage anyway.
At that point, the foreign OpenMP AST is not simplifying the design. It is just delaying the conversion while making ownership murkier.
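Here is a minimal sketch of why the payload needs the surrounding scope. It is not the real parseOmpArraySection(...): the `parse_array_section` function, the `ArraySection` type, and the dictionary used as a symbol table are all assumptions made for illustration. It handles only the simple `base[lower:length]` form.

```python
import re
from dataclasses import dataclass


@dataclass
class ArraySection:
    base: str
    lower: str
    length: str
    base_type: str  # resolved from the enclosing scope, not from the directive


SECTION_RE = re.compile(r"^\s*(\w+)\s*\[\s*([^:\]]*)\s*:\s*([^\]]*)\s*\]\s*$")


def parse_array_section(text: str, scope: dict) -> ArraySection:
    """Parse 'base[lower:length]' and resolve every identifier in scope."""
    m = SECTION_RE.match(text)
    if m is None:
        raise ValueError("unsupported array section: " + text)
    base, lower, length = m.group(1), m.group(2).strip(), m.group(3).strip()
    for name in [base] + re.findall(r"[A-Za-z_]\w*", lower + " " + length):
        if name not in scope:
            raise NameError(name + " is not declared in the enclosing scope")
    # An omitted lower bound defaults to 0 in OpenMP array sections.
    return ArraySection(base, lower or "0", length, scope[base])


# Toy symbol table for the scope that encloses the directive.
scope = {"a": "double *", "n": "int"}

payload = "tofrom: a[0:n]"  # text inside map(...)
map_type, _, item = payload.partition(":")
section = parse_array_section(item, scope)
```

The directive grammar can tell you that `a[0:n]` is an array section. Only the host scope can tell you that `a` is a `double *` and that `n` exists at all, which is the information later lowering depends on.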
Why unparsing is a first-class argument
Many compiler architectures treat unparsing as an afterthought. REX cannot.
REX needs to be able to emit transformed source that is:
- still readable,
- still inspectable,
- and still faithful enough that a developer can understand what changed.
That only works cleanly if the transformed program is represented in the same IR that the unparser already knows how to walk.
And this is not theoretical. The REX backend has explicit support for SgOmp* in both C/C++ and Fortran unparsing paths. That means once OpenMP becomes part of the Sage AST, the compiler can continue to operate in one representation all the way out to source emission.
If Clang’s AST owned OpenMP instead, REX would face an ugly choice:
- translate the transformed OpenMP model into Sage anyway so the existing unparser can emit it, or
- bolt on a separate unparsing path that is aware of a foreign OpenMP model.
Neither is attractive. The first proves that Sage still needs to be the final owner. The second creates long-term maintenance debt and splits the backend’s mental model.
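A toy unparser shows what "one representation all the way out" looks like. `Stmt`, `OmpStmt`, and `unparse` are hypothetical and far simpler than the real Sage unparser. The sketch assumes block-style directives and always emits braces for C.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    text: str


@dataclass
class OmpStmt:
    """Hypothetical OpenMP node living in the same tree as ordinary statements."""
    kind: str
    clauses: list  # list of (name, payload) pairs
    body: list = field(default_factory=list)


def unparse(node, lang="c", indent=0):
    """Walk one tree and emit source lines for either language."""
    pad = "    " * indent
    if isinstance(node, Stmt):
        return [pad + node.text]
    clause_text = "".join(f" {name}({payload})" for name, payload in node.clauses)
    if lang == "c":
        lines = [f"{pad}#pragma omp {node.kind}{clause_text}", pad + "{"]
    else:
        lines = [f"{pad}!$omp {node.kind}{clause_text}"]
    for child in node.body:
        lines += unparse(child, lang, indent + 1)
    lines.append(pad + "}" if lang == "c" else f"{pad}!$omp end {node.kind}")
    return lines


c_tree = OmpStmt("parallel", [("num_threads", "4")], [Stmt("work();")])
f_tree = OmpStmt("parallel", [("num_threads", "4")], [Stmt("call work()")])
c_lines = unparse(c_tree, "c")
f_lines = unparse(f_tree, "fortran")
```

The OpenMP node is just another case in the same tree walk. No second model has to be consulted or translated before source can be emitted.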
Why the checkpoints matter
Another benefit of REX ownership is that the OpenMP pipeline can stop at meaningful internal stages:
- parse-only: parse directives into OpenMPIR
- AST-only: build SgOmp* and stop
- analysis-only
- lowering
Those options are explicitly exposed in the command-line layer. That is not just a debugging convenience. It reflects a pipeline where the compiler owns each step strongly enough that each step can stand on its own.
If Clang owned the OpenMP AST, those checkpoints would either disappear or become much less coherent:
- what exactly is “AST-only” if the OpenMP AST is not Sage’s AST?
- how do you inspect the transformed Sage program if the OpenMP structure is still owned elsewhere?
- how do you debug “directive parsed correctly, attached incorrectly” if the attachment boundary is split between AST systems?
Checkpoint-friendly architecture is not accidental. It comes directly from keeping OpenMP inside the compiler’s own representation.
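The checkpoint idea can be sketched as a pipeline that returns whatever artifacts exist when it stops. The `Stage` enum and `run_pipeline` are hypothetical; they do not reproduce REX's actual command-line options, and each stage body here is a placeholder.

```python
from enum import IntEnum


class Stage(IntEnum):
    PARSE = 1     # directives parsed into an OpenMPIR-like form
    AST = 2       # SgOmp*-like nodes built
    ANALYSIS = 3
    LOWERING = 4


def run_pipeline(directives: list, stop_after: Stage) -> dict:
    """Run stages in order and return the artifacts produced so far."""
    artifacts = {}
    artifacts["ir"] = [d.split() for d in directives]
    if stop_after == Stage.PARSE:
        return artifacts
    artifacts["ast"] = [
        {"kind": words[0], "clauses": words[1:]} for words in artifacts["ir"]
    ]
    if stop_after == Stage.AST:
        return artifacts
    artifacts["analysis"] = {"directive_count": len(artifacts["ast"])}
    if stop_after == Stage.ANALYSIS:
        return artifacts
    artifacts["lowered"] = [f"/* lowered {n['kind']} */" for n in artifacts["ast"]]
    return artifacts


ast_only = run_pipeline(["parallel num_threads(4)"], Stage.AST)
full = run_pipeline(["parallel num_threads(4)"], Stage.LOWERING)
```

Each early return hands back something the compiler itself owns and can print, which is what makes a question like "parsed correctly, attached incorrectly" answerable.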
Figure 3. The REX OpenMP checkpoints are meaningful because the compiler owns the representation at every stage. Once OpenMP is inside Sage, each checkpoint has a clear artifact and a clear debugging story.
What Clang still does in the toolchain
Not letting Clang own the OpenMP AST does not mean Clang disappears.
Clang and LLVM still matter downstream:
- the generated source is compiled with the normal toolchain,
- the lowered offloading code eventually speaks LLVM’s runtime ABI,
- the backend still relies on the broader LLVM ecosystem for code generation and execution.
So the right mental model is:
- Clang still compiles the result,
- LLVM still matters for runtime and code generation,
- but the OpenMP transformation model remains owned by REX.
That is the clean boundary. REX owns the semantic and structural manipulation of OpenMP directives. Downstream tools compile the output of that manipulation.
What would get harder if REX delegated OpenMP ownership to Clang
The simplest way to test an architecture is to ask what breaks when you invert it.
If REX delegated OpenMP ownership to Clang, at least five things would get materially worse:
- Cross-language consistency: C/C++ and Fortran would stop converging into one OpenMP representation under REX ownership.
- Clause-expression integration: clause payloads would be harder to keep aligned with Sage scope and Sage symbol tables.
- Source-preserving rewrites: rewriting in Sage while OpenMP lives elsewhere invites synchronization bugs and stale structure.
- Unparsing: the backend would either need translation back into Sage or special-case foreign-model support.
- Debugging checkpoints: parse-only, AST-only, and lowering-stage debugging would stop lining up cleanly with compiler-owned artifacts.
None of these are hypothetical inconveniences. They are the kinds of structural mismatches that quietly dominate engineering time once the architecture is large enough.
Why the design still scales as OpenMP evolves
OpenMP keeps evolving. New clauses appear. Combined constructs grow. Offloading behavior changes. Toolchains drift.
A compiler architecture that depends on someone else’s AST model too deeply becomes fragile when any of those things move. REX’s design is more resilient because the ownership layers are clear:
- Stage 1 preserves and parses directive structure,
- Stage 2 converts it into compiler-owned SgOmp*,
- later stages consume that representation,
- compatibility layers at lowering/runtime boundaries absorb downstream changes.
This is not “more layers for fun.” It is what allows the compiler to change one boundary without destabilizing every other boundary.
The design in one sentence
REX does not let Clang own the OpenMP AST because a source-to-source compiler has to own the structure it rewrites.
That is why OpenMP in REX travels from directive text, to OpenMPIR, to SgOmp*, and only then into lowering and downstream compilation. The design is not about avoiding Clang. It is about keeping OpenMP in the same IR world as the rest of the program so transformation, debugging, testing, and unparsing all stay coherent.