How OpenMP AST Construction Works in REX: OpenMPIR to SgOmp*
ompparser does not directly build the compiler’s main AST. It builds OpenMPIR. Stage 2 is the bridge: OpenMPIRToSageAST() walks collected directives bottom-up, converts each OpenMPDirective into SgOmp* nodes, parses embedded clause expressions in the correct Sage scope, and attaches everything so later lowering and unparsing operate on a single REX-owned OpenMP model.
The first post in this series explained why REX insists on owning the OpenMP model instead of outsourcing it to Clang. This post zooms into the most important piece of that decision: the bridge between the standalone directive parser and the main Sage/ROSE AST. It also provides context for the performance tuning and migration topics discussed elsewhere in this series.
In REX terms, this is Stage 2:
- ompparser parses directive text into an intermediate form, OpenMPIR.
- ompAstConstruction.cpp converts that OpenMPIR into real Sage OpenMP nodes: SgOmp*.
This stage is where the directive stops being “text attached to a statement” and becomes a first-class AST object that the rest of the compiler can transform, analyze, and unparse.
Figure 1. Stage 2 is an AST bridge. REX keeps OpenMP parsing and OpenMP AST construction separate so the directive grammar stays isolated, while expression parsing and scoping remain under the main compiler.
Why Stage 2 Exists At All
It is tempting to ask: why not teach the parser to build SgOmp* directly?
REX deliberately avoids that for two reasons:
- Directive grammar and host-language semantics should not be entangled. ompparser is excellent at parsing the OpenMP directive language (keywords, clause spelling, combined constructs, begin/end forms). But OpenMP clauses embed host-language expressions that need correct name lookup, correct types, and correct source attachment. That is best done in the host compiler’s AST world, not in a standalone directive parser.
- REX needs one OpenMP model inside Sage. Once the directive becomes SgOmp*, all later stages (lowering, code generation, unparsing) can work on a single representation. That is how REX keeps the pipeline debuggable and source-preserving.
So Stage 2 is “where we pay the cost” of owning OpenMP. It is also where we get the biggest payoff: the rest of the pipeline becomes simpler because it only has to understand Sage nodes, not OpenMPIR.
The Inputs: What Stage 2 Actually Converts
At the end of Stage 1, REX has:
- a normal Sage AST for the program, including SgPragmaDeclaration nodes (or pragma-like placeholders) for OpenMP directives; and
- a list of parsed OpenMPIR directives (OpenMPDirective*) produced by ompparser.
The crucial detail is that Stage 2 does not “search for directives again.” Stage 1 already paired each OpenMP pragma site with its parsed OpenMPDirective.
Conceptually, Stage 2 converts a collection of pairs, each made of a SgPragmaDeclaration* and the OpenMPDirective* parsed from it.
Each pair answers:
- Where is the directive attached in the Sage AST? (SgPragmaDeclaration*)
- What is the directive, structurally? (OpenMPDirective*)
Everything else flows from those two pieces of information.
The Driver: OpenMPIRToSageAST() Is Bottom-Up On Purpose
The entry point for Stage 2 is OpenMPIRToSageAST(). The first design choice you notice is that it converts directives bottom-up, using a reverse walk.
This matters because OpenMP constructs are frequently nested: the body of an outer directive often contains inner directives, each with its own associated statement.
If the compiler converts the outer directive first, it risks:
- claiming the wrong body,
- reparenting statements too early,
- losing the link between the inner directive and its associated statement.
Bottom-up conversion avoids those hazards.
The idea is:
- convert the innermost constructs and attach them correctly;
- then convert the outer construct, now that its body already contains the correct OpenMP substructure.
Figure 2. Stage 2 converts nested directives bottom-up. This keeps attachment rules simple: inner directives are fully materialized before an outer directive claims a body.
In practice, the conversion loop walks the collected directive list in reverse, from the last directive recorded to the first.
That reverse iteration is one of those “small but load-bearing” details. It is not a performance choice; it is an attachment-correctness choice.
Converting A Directive: “What Node Do We Build, And Where Does It Go?”
After Stage 2 starts iterating, every directive conversion has to answer the same set of questions:
- Which SgOmp* statement corresponds to this directive spelling?
- Does it own a body? If so, which Sage statement becomes that body?
- Which clauses exist, and how do we attach them?
- Which expressions appear inside those clauses, and what Sage scope must they be parsed in?
This is why ompAstConstruction.cpp has a family of conversion helpers rather than one monolithic function. The dispatcher (convertDirective) delegates to helpers that match the structural shape of the directive.
Two high-level categories cover most directives:
- Body directives: directives that are immediately associated with a structured block or statement.
  - Example: parallel, task, target, teams.
- Combined body directives: combined constructs such as target teams distribute parallel for.
  - These usually need to build multiple Sage nodes (or a combined node) and attach them in the correct nesting order.
Even when the final Sage node is a single SgOmp* statement, the conversion code typically has to:
- find or build the correct body statement,
- construct the clause list objects,
- attach clause expressions with correct scoping,
- set source positions so unparsing remains stable and debugging stays honest.
Clause Conversion: Preserve Structure, Parse Expressions Late
Clauses are where Stage 2 earns its keep. From the directive grammar’s point of view, clauses are just a structured list of keywords plus “expression-ish” payloads.
But from a compiler’s point of view, clause payloads are real program expressions:
- if(cond) must become an SgExpression that type-checks and resolves symbols.
- map(tofrom: a[0:n]) contains an array section expression with a base and bounds.
- depend(inout: a[i]) contains lvalues and subscripts.
The design REX uses is:
- ompparser parses the directive language and stores clause payloads in a structured form (OpenMPClause objects), typically keeping host-language expressions as unparsed text.
- Stage 2 uses Sage-aware helpers to parse or reconstruct those expressions in the correct scope.
- The resulting SgExpression* nodes are attached to SgOmp*Clause objects that become part of the Sage OpenMP AST.
The key idea is: do not parse host-language expressions inside the directive parser. Parse them where scope and symbols are available.
Figure 3. Clause payloads look like strings to a directive parser, but they must become real expressions in the compiler IR. REX parses them in a Sage scope so the resulting AST is correct and reusable in later passes.
Array Sections: The “Looks Simple, Is Not” Clause Payload
Array sections are a good example of why Stage 2 cannot be just a mechanical “keyword mapping.”
Consider a clause such as map(tofrom: a[0:n]).
The a[0:n] payload is not a normal C expression. It is an OpenMP-specific array section with:
- a base (a)
- a lower bound (0)
- a length (n)
Lowering and runtime mapping care about those pieces individually. Stage 2 needs to preserve them in AST form, not just as a string.
This is why Stage 2 has dedicated helpers (for example, array section parsing helpers) rather than treating the payload as opaque text.
Fortran: Normalize Syntax Early, Reuse The Same Bridge
REX also supports OpenMP in Fortran. The surface syntax differs:
- OpenMP is often expressed as comments (!$omp ...)
- constructs may come in begin/end forms (!$omp parallel … !$omp end parallel)
But the important architectural goal is: do not fork the Stage 2 pipeline.
Instead, REX normalizes Fortran OpenMP comment directives into pragma-like nodes early, so the same Stage 2 conversion can run:
- lift Fortran OpenMP comment directives into temporary pragma nodes attached to Sage statements;
- parse directive text into OpenMPIR using the same ompparser;
- run OpenMPIRToSageAST() to build SgOmp* nodes.
This is a practical example of a general REX principle: normalize surface syntax differences into a common internal representation early, so later stages remain single-path.
Debugging Stage 2: What To Inspect When It Breaks
When Stage 2 breaks, you typically see one of three symptom classes:
- Wrong nesting: the wrong statement becomes the body of a directive.
- Missing clause payload: a clause exists but its expression is null or malformed.
- Scope errors: a clause expression refers to a symbol that is not resolved correctly in the Sage AST.
The debugging approach that works best is to inspect Stage 2’s artifacts in increasing detail:
- Check the OpenMPIR output first (directive spelling and clause structure).
- Check the generated SgOmp* nodes (directive kind, clause list, body attachment).
- Only then check lowering output, because lowering failures are often downstream of a Stage 2 attachment mistake.
The main reason REX keeps parse-only and AST-only checkpoints is exactly this: it lets you stop at the first incorrect stage instead of debugging in the lowerer when the real bug is “the AST node was attached to the wrong statement.”
The Stage 2 Philosophy In One Sentence
Stage 2 exists because OpenMP is a directive language that embeds host-language semantics.
REX’s approach is to:
- parse directive structure with a dedicated OpenMP parser, and then
- construct compiler-native OpenMP AST nodes in the main AST, with correct scoping and attachments.
That is how REX keeps transformation ownership inside the compiler while still producing code that downstream toolchains can compile and run.