How OpenMP AST Construction Works in REX: OpenMPIR to SgOmp*

In REX, ompparser does not directly build the compiler’s main AST. It builds OpenMPIR. Stage 2 is the bridge: OpenMPIRToSageAST() walks collected directives bottom-up, converts each OpenMPDirective into SgOmp* nodes, parses embedded clause expressions in the correct Sage scope, and attaches everything so later lowering and unparsing operate on a single REX-owned OpenMP model.

The first post in this series explained why REX insists on owning the OpenMP model instead of outsourcing it to Clang. This post zooms into the most important piece of that decision: the bridge between the standalone directive parser and the main Sage/ROSE AST. The performance tuning and migration topics discussed elsewhere in this series build on the details covered here.

In REX terms, this is Stage 2:

  • ompparser parses directive text into an intermediate form, OpenMPIR.
  • ompAstConstruction.cpp converts that OpenMPIR into real Sage OpenMP nodes: SgOmp*.

This stage is where the directive stops being “text attached to a statement” and becomes a first-class AST object that the rest of the compiler can transform, analyze, and unparse.


Figure 1. Stage 2 is an AST bridge. REX keeps OpenMP parsing and OpenMP AST construction separate so the directive grammar stays isolated, while expression parsing and scoping remain under the main compiler.

Why Stage 2 Exists At All

It is tempting to ask: why not teach the parser to build SgOmp* directly?

REX deliberately avoids that for two reasons:

  1. Directive grammar and host-language semantics should not be entangled.
    ompparser is excellent at parsing the OpenMP directive language (keywords, clause spelling, combined constructs, begin/end forms). But OpenMP clauses embed host-language expressions that need correct name lookup, correct types, and correct source attachment. That is best done in the host compiler’s AST world, not in a standalone directive parser.

  2. REX needs one OpenMP model inside Sage.
    Once the directive becomes SgOmp*, all later stages (lowering, code generation, unparser) can work on a single representation. That is how REX keeps the pipeline debuggable and source-preserving.

So Stage 2 is “where we pay the cost” of owning OpenMP. It is also where we get the biggest payoff: the rest of the pipeline becomes simpler because it only has to understand Sage nodes, not OpenMPIR.

The Inputs: What Stage 2 Actually Converts

At the end of Stage 1, REX has:

  • a normal Sage AST for the program, including SgPragmaDeclaration nodes (or pragma-like placeholders) for OpenMP directives; and
  • a list of parsed OpenMPIR directives (OpenMPDirective*) produced by ompparser.

The crucial detail is that Stage 2 does not “search for directives again.” Stage 1 already paired each OpenMP pragma site with its parsed OpenMPDirective.

Conceptually, Stage 2 converts a collection of pairs:

// Conceptual shape (not exact code):
std::vector<std::pair<SgPragmaDeclaration*, OpenMPDirective*>> omp_pragma_list;

Each pair answers:

  • Where is the directive attached in the Sage AST? (SgPragmaDeclaration*)
  • What is the directive, structurally? (OpenMPDirective*)

Everything else flows from those two pieces of information.

The Driver: OpenMPIRToSageAST() Is Bottom-Up On Purpose

The entry point for Stage 2 is OpenMPIRToSageAST(). The first design choice you notice is that it converts directives bottom-up, using a reverse walk.

This matters because OpenMP constructs are frequently nested. Consider:

#pragma omp parallel
{
  #pragma omp for
  for (int i = 0; i < n; ++i) work(i);
}

If the compiler converts the outer directive first, it risks:

  • claiming the wrong body,
  • reparenting statements too early,
  • losing the link between the inner directive and its associated statement.

Bottom-up conversion avoids those hazards.

The idea is:

  1. convert the innermost constructs and attach them correctly;
  2. then convert the outer construct, now that its body already contains the correct OpenMP substructure.

Figure 2. Stage 2 converts nested directives bottom-up. This keeps attachment rules simple: inner directives are fully materialized before an outer directive claims a body.

In practice, the conversion loop looks like this (simplified):

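// Walk the (pragma, directive) pairs in reverse source order so that nested
// directives are converted before the directives that enclose them.
for (auto it = omp_pragma_list.rbegin(); it != omp_pragma_list.rend(); ++it) {
  SgPragmaDeclaration *decl = it->first;  // where the directive sits in the Sage AST
  OpenMPDirective *dir = it->second;      // what ompparser parsed
  convertDirective(std::make_pair(decl, dir));
}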

That reverse iteration is one of those “small but load-bearing” details. It is not a performance choice; it is an attachment-correctness choice.
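It helps to check the ordering property that the reverse walk relies on. The sketch below is a minimal, self-contained model, not REX code: `CollectedDirective`, `encloses`, and `conversionOrder` are hypothetical names, and each directive is reduced to its spelling plus the source-line range of its associated region. Because Stage 1 collects directives in source order, an enclosing directive's pragma always precedes everything in its body, so reversing the list visits inner directives first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one collected directive. Real REX pairs an
// SgPragmaDeclaration* with an OpenMPDirective*; this sketch keeps only
// what is needed to reason about nesting.
struct CollectedDirective {
  std::string name;
  int begin_line;  // line of the pragma itself
  int end_line;    // last line of the associated statement or block
};

// True if `outer`'s region strictly encloses `inner`'s pragma and region.
bool encloses(const CollectedDirective &outer, const CollectedDirective &inner) {
  return outer.begin_line < inner.begin_line && inner.end_line <= outer.end_line;
}

// Directives arrive in source order, so walking them in reverse yields
// every nested directive before the directive that encloses it.
std::vector<std::string> conversionOrder(const std::vector<CollectedDirective> &list) {
  std::vector<std::string> order;
  for (auto it = list.rbegin(); it != list.rend(); ++it) {
    order.push_back(it->name);  // stand-in for convertDirective(*it)
  }
  return order;
}
```

For the parallel/for example above, the list holds parallel (lines 1 to 5) before for (lines 3 to 4), and `conversionOrder` returns for, then parallel. Sibling directives may come out in either relative order, which is harmless because neither claims the other's body.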

Converting A Directive: “What Node Do We Build, And Where Does It Go?”

After Stage 2 starts iterating, every directive conversion has to answer the same set of questions:

  • Which SgOmp* statement corresponds to this directive spelling?
  • Does it own a body? If so, which Sage statement becomes that body?
  • Which clauses exist, and how do we attach them?
  • Which expressions appear inside those clauses, and what Sage scope must they be parsed in?

This is why ompAstConstruction.cpp has a family of conversion helpers rather than one monolithic function. The dispatcher (convertDirective) delegates to helpers that match the structural shape of the directive.

Two high-level categories cover most directives:

  • Body directives: directives that are immediately associated with a structured block or statement.
    • Example: parallel, task, target, teams.
  • Combined body directives: single directives that fuse several constructs into one spelling, such as target teams distribute parallel for.
    • These usually need to build multiple Sage nodes (or a combined node) and attach them in the correct nesting order.
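One way to picture the dispatcher is as a branch on the directive's structural shape. Everything in the sketch below is hypothetical (`Node`, `DirectiveIR`, `convertBodyDirective`, `convertCombinedBodyDirective`, and `render` are illustrative names, not REX's API), and it models only the "multiple nested nodes" option for combined constructs; REX may build a single combined SgOmp* node for some spellings instead. It also assumes the directive arrives with its leaf constructs already separated, so the spelling never has to be re-parsed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: an OpenMP construct with one body, or a plain
// statement when `construct` is empty. Not the real SgOmp* hierarchy.
struct Node {
  std::string construct;
  std::string stmt;
  std::unique_ptr<Node> body;
};

std::unique_ptr<Node> plainStatement(const std::string &text) {
  auto n = std::make_unique<Node>();
  n->stmt = text;
  return n;
}

// Body directive: one construct node that adopts the associated statement.
std::unique_ptr<Node> convertBodyDirective(const std::string &name,
                                           std::unique_ptr<Node> body) {
  auto n = std::make_unique<Node>();
  n->construct = name;
  n->body = std::move(body);
  return n;
}

// Hypothetical structured directive: leaf constructs in spelling order,
// e.g. {"target", "teams"} for "target teams".
struct DirectiveIR {
  std::vector<std::string> leaves;
};

// Combined directive, "nested nodes" strategy: wrap innermost-first so the
// last leaf ends up closest to the statement.
std::unique_ptr<Node> convertCombinedBodyDirective(const DirectiveIR &dir,
                                                   std::unique_ptr<Node> body) {
  for (auto it = dir.leaves.rbegin(); it != dir.leaves.rend(); ++it)
    body = convertBodyDirective(*it, std::move(body));
  return body;
}

// Dispatcher: choose a helper by the directive's structural shape.
std::unique_ptr<Node> convertDirective(const DirectiveIR &dir,
                                       std::unique_ptr<Node> body) {
  if (dir.leaves.empty()) return body;  // nothing to build in this sketch
  if (dir.leaves.size() > 1)
    return convertCombinedBodyDirective(dir, std::move(body));
  return convertBodyDirective(dir.leaves.front(), std::move(body));
}

// Renders the nesting as e.g. "target(teams(s))" for inspection.
std::string render(const Node &n) {
  if (n.construct.empty()) return n.stmt;
  return n.construct + "(" + (n.body ? render(*n.body) : std::string()) + ")";
}
```

Wrapping innermost-first means `target teams` over a statement `s` renders as `target(teams(s))`, the same nesting a user would get by writing a target directive immediately followed by a teams directive.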

Even when the final Sage node is a single SgOmp* statement, the conversion code typically has to:

  1. find or build the correct body statement,
  2. construct the clause list objects,
  3. attach clause expressions with correct scoping,
  4. set source positions so unparsing remains stable and debugging stays honest.
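The four steps can be made concrete with a toy model. This is a sketch under stated assumptions, not REX's implementation: `SourcePos`, `Statement`, `Clause`, `OmpConstruct`, and `convertAt` are hypothetical, the enclosing block is a flat vector of statements, and a body directive's body is assumed to be the statement immediately after its pragma. Clause payloads are plain strings here; in REX they would be SgExpression* nodes parsed in scope.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical source position, loosely modeled on per-node file info.
struct SourcePos {
  std::string file;
  int line = 0;
  int col = 0;
};

struct Statement {
  std::string text;
  SourcePos pos;
};

// Hypothetical clause: kind plus an expression payload kept as text.
struct Clause {
  std::string kind;
  std::string expr;
};

struct OmpConstruct {
  std::string name;
  const Statement *body = nullptr;
  std::vector<Clause> clauses;
  SourcePos pos;
};

// Converts the body directive whose pragma sits at `pragmaIndex` in `block`.
std::optional<OmpConstruct> convertAt(const std::vector<Statement> &block,
                                      std::size_t pragmaIndex,
                                      const std::string &name,
                                      const std::vector<Clause> &clauses) {
  // Step 1: the body is the statement right after the pragma.
  if (pragmaIndex + 1 >= block.size()) return std::nullopt;  // no body to claim
  OmpConstruct c;
  c.name = name;
  c.body = &block[pragmaIndex + 1];
  // Steps 2-3: build the clause list (payloads would be parsed in scope).
  c.clauses = clauses;
  // Step 4: the construct inherits the pragma's source position.
  c.pos = block[pragmaIndex].pos;
  return c;
}
```

Step 4 is easy to skip in a prototype, but copying the pragma's position onto the new node is what keeps unparsed output and diagnostics pointing at the line the user actually wrote.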

Clause Conversion: Preserve Structure, Parse Expressions Late

Clauses are where Stage 2 earns its keep. From the directive grammar’s point of view, clauses are just a structured list of keywords plus “expression-ish” payloads.

But from a compiler’s point of view, clause payloads are real program expressions:

  • if(cond) must become an SgExpression that type-checks and resolves symbols.
  • map(tofrom: a[0:n]) contains an array section expression with a base and bounds.
  • depend(inout: a[i]) contains lvalues and subscripts.

The design REX uses is:

  1. ompparser parses the directive language and stores clause payloads in a structured form (OpenMPClause objects), typically keeping host-language expressions as unparsed text.
  2. Stage 2 uses Sage-aware helpers to parse or reconstruct those expressions in the correct scope.
  3. The resulting SgExpression* nodes are attached to SgOmp*Clause objects that become part of the Sage OpenMP AST.

The key idea is: do not parse host-language expressions inside the directive parser. Parse them where scope and symbols are available.


Figure 3. Clause payloads look like strings to a directive parser, but they must become real expressions in the compiler IR. REX parses them in a Sage scope so the resulting AST is correct and reusable in later passes.
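The scoping half of this design fits in a short sketch. The names below (`Scope`, `ResolvedRef`, `resolveClauseText`) are hypothetical and are not REX or ROSE APIs; the sketch reduces "parsing" to resolving every identifier in the clause text against a chain of nested scopes, whereas a real Stage 2 helper would build typed SgExpression* nodes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical scope chain: names map to type spellings, and lookup falls
// back to the enclosing scope, roughly like nested Sage scopes.
struct Scope {
  std::map<std::string, std::string> symbols;
  const Scope *parent = nullptr;

  const std::string *lookup(const std::string &name) const {
    for (const Scope *s = this; s != nullptr; s = s->parent) {
      auto it = s->symbols.find(name);
      if (it != s->symbols.end()) return &it->second;
    }
    return nullptr;
  }
};

struct ResolvedRef {
  std::string name;
  std::string type;
};

// Late "parsing" reduced to symbol resolution: every identifier in the
// clause text must resolve in the directive's scope. Returns std::nullopt
// on the first unknown identifier (a scope error).
std::optional<std::vector<ResolvedRef>> resolveClauseText(const std::string &text,
                                                          const Scope &scope) {
  std::vector<ResolvedRef> refs;
  std::size_t i = 0;
  while (i < text.size()) {
    unsigned char ch = static_cast<unsigned char>(text[i]);
    if (std::isalpha(ch) || ch == '_') {
      std::size_t start = i;
      while (i < text.size() &&
             (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
        ++i;
      std::string name = text.substr(start, i - start);
      const std::string *type = scope.lookup(name);
      if (type == nullptr) return std::nullopt;
      refs.push_back({name, *type});
    } else if (std::isdigit(ch)) {
      // Skip a numeric literal (digits plus letters such as suffixes).
      while (i < text.size() && std::isalnum(static_cast<unsigned char>(text[i]))) ++i;
    } else {
      ++i;  // operators, whitespace, punctuation
    }
  }
  return refs;
}
```

The same clause text can resolve in one scope and fail in another, which is why the directive parser, which has no symbol tables, is better off keeping it as text.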

Array Sections: The “Looks Simple, Is Not” Clause Payload

Array sections are a good example of why Stage 2 cannot be just a mechanical “keyword mapping.”

Consider:

#pragma omp target map(tofrom: a[0:n])

The a[0:n] payload is not a normal C expression. It is an OpenMP-specific array section with:

  • a base (a)
  • a lower bound (0)
  • a length (n)

Lowering and runtime mapping care about those pieces individually. Stage 2 needs to preserve them in AST form, not just as a string.

This is why Stage 2 has dedicated helpers (for example, array section parsing helpers) rather than treating the payload as opaque text.
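A string-level sketch shows what such a helper has to recover. `SectionDim`, `ArraySection`, and `parseArraySection` are hypothetical; a REX helper would produce SgExpression* subtrees for the base and bounds rather than strings. The sketch handles one or more `[lower:length]` dimensions and applies OpenMP's default lower bound of 0 when it is omitted; for brevity it requires an explicit length, does no whitespace trimming, and rejects plain subscripts and bounds that contain brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// One [lower:length] dimension of an array section, kept as text.
struct SectionDim {
  std::string lower;
  std::string length;
};

struct ArraySection {
  std::string base;
  std::vector<SectionDim> dims;
};

// Splits "a[0:n]" or "b[1:m][0:k]" into a base and per-dimension bounds.
std::optional<ArraySection> parseArraySection(const std::string &text) {
  std::size_t lb = text.find('[');
  if (lb == std::string::npos || lb == 0) return std::nullopt;
  ArraySection sec;
  sec.base = text.substr(0, lb);
  std::size_t pos = lb;
  while (pos < text.size()) {
    if (text[pos] != '[') return std::nullopt;
    std::size_t rb = text.find(']', pos);
    if (rb == std::string::npos) return std::nullopt;
    std::string inside = text.substr(pos + 1, rb - pos - 1);
    std::size_t colon = inside.find(':');
    if (colon == std::string::npos) return std::nullopt;  // plain subscript: not handled
    SectionDim d{inside.substr(0, colon), inside.substr(colon + 1)};
    if (d.lower.empty()) d.lower = "0";            // OpenMP default lower bound
    if (d.length.empty()) return std::nullopt;     // this sketch needs a length
    sec.dims.push_back(d);
    pos = rb + 1;
  }
  return sec;
}
```

For `map(tofrom: a[0:n])`, the helper yields base `a`, lower bound `0`, and length `n`, the three pieces lowering needs separately.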

Fortran: Normalize Syntax Early, Reuse The Same Bridge

REX also supports OpenMP in Fortran. The surface syntax differs:

  • OpenMP is often expressed as comments (!$omp ...)
  • constructs may come in begin/end forms (!$omp parallel ... !$omp end parallel)

But the important architectural goal is: do not fork the Stage 2 pipeline.

Instead, REX normalizes Fortran OpenMP comment directives into pragma-like nodes early, so the same Stage 2 conversion can run:

  1. lift Fortran OpenMP comment directives into temporary pragma nodes attached to Sage statements;
  2. parse directive text into OpenMPIR using the same ompparser;
  3. run OpenMPIRToSageAST() to build SgOmp* nodes.

This is a practical example of a general REX principle: normalize surface syntax differences into a common internal representation early, so later stages remain single-path.
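The first normalization step can be sketched as a recognizer for the OpenMP comment sentinel. `liftFortranOmpComment` is a hypothetical name, and the sketch covers only free-form source under simplifying assumptions: it accepts `!$omp` in any letter case after leading whitespace, requires whitespace after the sentinel, and returns the directive text that would be stored on a temporary pragma-like node. Fixed-form sentinels and continuation lines are out of scope.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Returns the directive text carried by a free-form "!$omp" comment line,
// or std::nullopt if the line is not an OpenMP directive comment.
std::optional<std::string> liftFortranOmpComment(const std::string &line) {
  const std::string sentinel = "!$omp";
  std::size_t i = line.find_first_not_of(" \t");
  if (i == std::string::npos || line.size() - i < sentinel.size()) return std::nullopt;
  // The sentinel is matched case-insensitively.
  for (std::size_t k = 0; k < sentinel.size(); ++k) {
    char c = static_cast<char>(std::tolower(static_cast<unsigned char>(line[i + k])));
    if (c != sentinel[k]) return std::nullopt;
  }
  std::size_t after = i + sentinel.size();
  if (after < line.size() && line[after] != ' ' && line[after] != '\t')
    return std::nullopt;  // e.g. "!$omp&": continuation forms not handled here
  std::size_t start = line.find_first_not_of(" \t", after);
  if (start == std::string::npos) return std::nullopt;  // bare sentinel
  std::size_t end = line.find_last_not_of(" \t");
  return line.substr(start, end - start + 1);
}
```

Because the end form comes back as ordinary directive text, it can flow through the same ompparser path as the begin form.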

Debugging Stage 2: What To Inspect When It Breaks

When Stage 2 breaks, you typically see one of three symptom classes:

  • Wrong nesting: the wrong statement becomes the body of a directive.
  • Missing clause payload: a clause exists but its expression is null or malformed.
  • Scope errors: a clause expression refers to a symbol that is not resolved correctly in the Sage AST.

The debugging approach that works best is to inspect Stage 2’s artifacts in increasing detail:

  1. Check the OpenMPIR output first (directive spelling and clause structure).
  2. Check the generated SgOmp* nodes (directive kind, clause list, body attachment).
  3. Only then check lowering output, because lowering failures are often downstream of a Stage 2 attachment mistake.

The main reason REX keeps parse-only and AST-only checkpoints is exactly this: it lets you stop at the first incorrect stage instead of debugging in the lowerer when the real bug is “the AST node was attached to the wrong statement.”
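The checkpoint discipline can be sketched as a runner that stops at the first failing stage, plus a Stage 2 check for the three symptom classes. All names here (`ClauseView`, `ConstructView`, `checkConstruct`, `firstFailingStage`) are hypothetical; the boolean fields stand in for what you would actually inspect on SgOmp* nodes: whether the body exists and points back to the construct, whether each clause has a non-null expression, and whether its symbols resolved.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical debugging view of one converted construct.
struct ClauseView {
  std::string kind;
  bool hasExpression = true;    // false models a null payload
  bool symbolsResolved = true;  // false models a scope error
};

struct ConstructView {
  std::string name;
  bool hasBody = true;           // body directives only
  bool bodyParentIsThis = true;  // the body's parent points back here
  std::vector<ClauseView> clauses;
};

// Stage 2 check, ordered by the three symptom classes.
// Returns an empty string when nothing looks wrong.
std::string checkConstruct(const ConstructView &c) {
  if (!c.hasBody || !c.bodyParentIsThis) return "wrong nesting in " + c.name;
  for (const auto &cl : c.clauses)
    if (!cl.hasExpression) return "missing payload in " + c.name + " " + cl.kind;
  for (const auto &cl : c.clauses)
    if (!cl.symbolsResolved) return "scope error in " + c.name + " " + cl.kind;
  return "";
}

using Check = std::function<std::string()>;

struct Report {
  std::string stage;
  std::string message;
};

// Runs checks in pipeline order and stops at the first failure, so a
// Stage 2 problem is reported before lowering output is consulted.
Report firstFailingStage(const std::vector<std::pair<std::string, Check>> &stages) {
  for (const auto &[name, check] : stages) {
    std::string msg = check();
    if (!msg.empty()) return {name, msg};
  }
  return {"", ""};
}
```

If the SgOmp* check passed here, the report would move on to the lowering stage, which is the point at which it becomes worth reading lowering output.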

The Stage 2 Philosophy In One Sentence

Stage 2 exists because OpenMP is a directive language that embeds host-language semantics.

REX’s approach is to:

  • parse directive structure with a dedicated OpenMP parser, and then
  • construct compiler-native OpenMP AST nodes in the main AST, with correct scoping and attachments.

That is how REX keeps transformation ownership inside the compiler while still producing code that downstream toolchains can compile and run.