How REX Tests OpenMP Frontend AST Construction With The `OpenMP_tests` Corpus
The OpenMP_tests corpus is the broad frontend checkpoint in REX’s OpenMP test stack. It runs the thin parseOmp driver in -rose:openmp:ast_only mode over 225 C cases, 18 C++ cases, 13 mixed OpenMP/OpenACC cases, and a separate 53-case Fortran slice. For the C/C++ and mixed-language top-level cases, the harness then extracts OpenMP-bearing lines from the generated rose_* outputs and diffs them against curated referenceResults. That makes this layer strong at catching AST attachment bugs, unparser regressions, directive-preservation drift, multi-file issues, and comment or macro handling problems before lowering begins.
The previous posts in this series filled in three important test layers:
- parser tests for ompparser,
- lowering structural tests in lowering_rodinia,
- and CPU semantic-equivalence tests in lowering_cpu.
What was still missing was the broad middle frontend layer: the one that proves the compiler can ingest real OpenMP programs, build SgOmp* nodes, unparse them back into source, and preserve the directive structure well enough that later stages have something trustworthy to work with.
That layer lives in:
tests/nonsmoke/functional/CompileTests/OpenMP_tests
and it is much larger than the focused suites around it.
This is where REX stops asking only “did the directive parse?” and starts asking:
did the frontend build the right OpenMP AST and emit a stable rose_* representation of it across a large corpus of real constructs and language variants?
That is a different question from parser correctness, and it deserves its own test layer.
Figure 1. The OpenMP_tests corpus is the broad frontend checkpoint. It sits after parser-only validation and before analyzer, lowering, and runtime-oriented test layers.
Why This Layer Exists Separately
The parser tests are deliberately narrow. They validate ompparser as a parser:
- can it recognize directive grammar?
- can it round-trip a directive string?
- can it preserve line and column information?
Those are the right parser questions, but they are not enough for the full frontend.
Once OpenMPIR exists, the compiler still has to:
- attach directives to the right statements,
- build the right SgOmp* node kinds,
- preserve combined constructs and clause associations,
- survive real input files with includes, macros, comments, and multiple source files,
- and unparse the AST back into a stable source representation.
That is what OpenMP_tests is for.
This suite is broader than parser testing because it exercises the actual ROSE/REX frontend path. But it is still earlier than lowering, which makes failures much easier to localize than a benchmark or GPU-offloading failure would be.
If this layer breaks, the likely diagnosis is:
- the directive grammar may still be fine,
- but AST construction, AST attachment, comment/directive collection, or unparsing drifted.
That is exactly the kind of failure report a compiler needs.
What Actually Runs: parseOmp In ast_only Mode
At the center of this layer is a very small driver, parseOmp.C.
The driver itself is intentionally thin. Almost all of the behavior comes from the flags the test harness passes to it. The key one is -rose:openmp:ast_only.
That flag choice is the heart of the layer.
It means the suite is not trying to validate lowering or runtime code generation yet. It is trying to validate the frontend product:
- parse the source,
- run the OpenMP frontend path,
- construct the Sage OpenMP AST,
- and emit a rose_* output file that reflects that AST.
This is exactly the right point in the pipeline to test frontend structure. It is later than a standalone directive parser, but earlier than transformations that would blur whether the bug came from AST construction or from lowering.
The main C/C++ harness makes that explicit. The output artifact is not hidden: the suite names the generated file rose_<input> and uses it as the basis for later checks.
That matters because it turns the AST-only step into a visible test surface.
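As a rough illustration of that step, the sketch below builds an AST-only command line and derives the rose_<input> name that later checks would read. Only the -rose:openmp:ast_only flag and the rose_ prefix come from the text; the "-c" flag, the argument order, and the helper names are assumptions made for this sketch, not the harness's actual CMake logic.

```python
from pathlib import Path
from typing import List


def ast_only_command(driver: str, source: str) -> List[str]:
    """Build an AST-only command line for one input file (hypothetical layout)."""
    # -rose:openmp:ast_only is the flag named in the text. The "-c" flag and
    # the argument order are assumptions made only for this sketch.
    return [driver, "-rose:openmp:ast_only", "-c", source]


def rose_output_name(source: str) -> str:
    """Return the rose_<input> file name that later checks read."""
    return "rose_" + Path(source).name


print(" ".join(ast_only_command("./parseOmp", "parallel-if-numthreads.c")))
print(rose_output_name("some/dir/parallel-if-numthreads.c"))
```

Keeping the output name a pure function of the input name is what lets a later step locate the artifact without any extra bookkeeping.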
The Breadth Of The Corpus
One of the most important facts about OpenMP_tests is simply its scale.
As currently registered in CMake, the suite includes:
- 225 C test cases,
- 18 C++ test cases,
- 13 mixed OpenMP/OpenACC cases,
- and a separate 53-case Fortran slice.
That breadth is not accidental.
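For a sense of scale, the counts above can be totaled. The snippet below is plain arithmetic over the numbers quoted in this post; the split into "top-level" versus Fortran follows the description of which cases go through the reference diff.

```python
# Counts as quoted in the text for the currently registered CMake lists.
corpus = {
    "C": 225,
    "C++": 18,
    "mixed OpenMP/OpenACC": 13,
    "Fortran": 53,
}

# The C, C++, and mixed cases are the top-level ones that get reference-diffed.
top_level = corpus["C"] + corpus["C++"] + corpus["mixed OpenMP/OpenACC"]
total = sum(corpus.values())

print(top_level)  # 256
print(total)      # 309
```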
The C and C++ lists cover a very wide swath of the language surface:
- parallel, for, sections, single, master, task, taskloop, simd, declare_simd, declare_mapper,
- reductions and reduction modifiers,
- requires, target, teams, and combined constructs,
- schedule variants,
- orphaned directives,
- preprocessing and macro cases,
- and many ordinary regression specimens that only become interesting once they pass through a real frontend.
The mixed OpenMP/OpenACC list is intentionally registered separately.
That is a strong signal that the suite is not merely a dump of OpenMP syntax examples. It is deliberately testing one of REX’s real differentiators: it must survive in a source-to-source environment where directive systems can coexist.
The Fortran tests are registered in their own subdirectory.
That separation is also healthy. It keeps the Fortran-specific frontend path visible instead of pretending that a C-centric corpus is enough.
Figure 2. The frontend corpus is deliberately varied. Different slices stress different parts of the AST-construction and unparsing path.
How The Reference Diff Works
The top-level C/C++ and mixed OpenMP/OpenACC part of the suite does not diff the entire generated rose_* file. That would be far too brittle for frontend testing.
Instead, it extracts only the lines that carry OpenMP directives or OpenMP-commented directives, then diffs that extracted output against a reference file under referenceResults.
This is a very pragmatic test surface.
It deliberately ignores:
- unrelated formatting churn elsewhere in the output file,
- non-OpenMP text the frontend may legitimately reorder or normalize,
- and many details that are not central to frontend OpenMP correctness.
Instead, it asks a narrower and more stable question:
after AST construction and unparsing, do the OpenMP-bearing lines in the generated source still look the way this suite expects?
That is the right level of strictness here. Not too weak, not needlessly brittle.
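The extract-then-diff idea can be sketched in a few lines. This is not the harness's own script: the regular expression stands in for whatever grep expression the suite really uses, treating "OpenMP-commented directives" as `//`-prefixed pragmas is an assumption, and the sample text is invented for illustration.

```python
import difflib
import re

# Hypothetical pattern: matches "#pragma omp ..." lines and, as an assumption,
# the same lines when they appear behind a "//" comment marker.
OMP_LINE = re.compile(r"^\s*(?://\s*)?#\s*pragma\s+omp\b")


def extract_omp_lines(text):
    """Keep only the OpenMP-bearing lines, stripped of indentation."""
    return [line.strip() for line in text.splitlines() if OMP_LINE.match(line)]


def diff_against_reference(generated_text, reference_lines):
    """Return unified-diff lines; an empty list means the case passes."""
    actual = extract_omp_lines(generated_text)
    return list(
        difflib.unified_diff(
            reference_lines, actual,
            fromfile="reference", tofile="generated", lineterm="",
        )
    )


# Invented stand-in for a rose_* output file.
GENERATED = """\
int main(void) {
  int n = 100;
#pragma omp parallel if(n > 10) num_threads(3)
  {
#pragma omp single
    n = n + 1;
  }
  return 0;
}
"""
REFERENCE = [
    "#pragma omp parallel if(n > 10) num_threads(3)",
    "#pragma omp single",
]

print(diff_against_reference(GENERATED, REFERENCE))  # []
```

Because everything outside the directive lines is discarded before the comparison, reformatting of the surrounding C code cannot make this check fail.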
Why this is stronger than it looks
Even though the harness only greps selected lines, it still catches a surprising amount:
- missing directives,
- wrong combined-construct spelling,
- misplaced or dropped clauses,
- comment/directive preservation regressions,
- macro-expansion drift visible in directive lines,
- and AST attachment bugs that show up when the unparser emits the wrong structure.
This is one of those testing designs that looks simple until you realize how carefully the comparison surface was chosen.
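To make the list above concrete, here is a small sketch that compares reference directive lines with extracted ones and labels each difference. The labels, the function name, and the sample lines are all illustrative assumptions; the real suite simply reports a textual diff.

```python
import difflib
from itertools import zip_longest


def classify_differences(reference, actual):
    """Label each difference between two lists of directive lines."""
    report = []
    matcher = difflib.SequenceMatcher(a=reference, b=actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Pair up old and new lines; unpaired ones are missing or unexpected.
        for old, new in zip_longest(reference[i1:i2], actual[j1:j2]):
            if new is None:
                report.append(("missing", old))
            elif old is None:
                report.append(("unexpected", new))
            else:
                report.append(("changed", old + " -> " + new))
    return report


# Invented directive lines, used only to exercise the function.
REFERENCE = [
    "#pragma omp parallel if(n > 10) num_threads(3)",
    "#pragma omp single",
]
DROPPED_CLAUSE = ["#pragma omp parallel if(n > 10)", "#pragma omp single"]
DROPPED_DIRECTIVE = ["#pragma omp parallel if(n > 10) num_threads(3)"]

print(classify_differences(REFERENCE, DROPPED_CLAUSE))
print(classify_differences(REFERENCE, DROPPED_DIRECTIVE))
```

A dropped clause shows up as a "changed" line and a dropped directive as a "missing" line, which is already enough to point at clause handling versus directive attachment.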
Figure 3. The frontend harness does not diff the whole unparsed file. It extracts the OpenMP-bearing lines from the generated rose_* output and compares that reduced surface to curated references.
Representative Cases
The best way to understand this layer is to look at a few representative tests and the exact kind of regression each one is meant to catch.
parallel-if-numthreads.c: clause preservation after AST construction
The reference result for rose_parallel-if-numthreads.c.output is concise.
That is a good example of the AST-only contract.
The parser-only layer can tell you the clauses were recognized. But this layer verifies that after the compiler builds SgOmp* nodes and unparses them again:
- the if clauses are still attached to the correct constructs,
- num_threads(3) is still present where it belongs,
- and the nested single region remains structurally visible.
That is not just parse success. It is frontend structural preservation.
axpy_ompacc_parseonly.c: mixed directive systems and exotic clause syntax
axpy_ompacc_parseonly.c is a strong example of why this suite exists separately from the ordinary OpenMP parser tests.
It contains mixed OpenMP/OpenACC-style content and deliberately exaggerated accelerator policies for parsing-only coverage.
The reference output captures exactly the OpenMP-bearing lines the suite expects to survive.
This is the kind of case that sits awkwardly between layers:
- parser tests alone are too small and too isolated,
- lowering tests are later than necessary,
- but the broad frontend path still needs to prove it can survive this syntax and emit stable rose_* output.
That is exactly what this corpus provides.
macroIds.c: macro handling and directive preservation under comment collection
The suite also carries a targeted macro test, macroIds.c.
That case matters because frontend pipelines often drift at the edges where preprocessing, comment collection, and directive handling interact.
The reference output is intentionally simple.
The value of the test is not in complexity. It is in the guarantee that the directive lines survive the frontend with macro-based clause operands intact and readable.
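One way to picture that guarantee is a check that macro names used as clause operands in the source still appear on the unparsed directive lines. The sketch below is an assumption-laden illustration: the input is invented (it is not the content of macroIds.c), the macro names CHUNK and NT are hypothetical, and the real test relies on the reference diff rather than on a helper like this.

```python
import re

DEFINE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)")
PRAGMA = re.compile(r"^\s*#\s*pragma\s+omp\b")
IDENT = re.compile(r"[A-Za-z_]\w*")


def directive_identifiers(text):
    """Collect every identifier that appears on an OpenMP pragma line."""
    found = set()
    for line in text.splitlines():
        if PRAGMA.match(line):
            found.update(IDENT.findall(line))
    return found


def lost_macro_operands(source_text, unparsed_text):
    """Return macros used in source directives but absent from unparsed ones."""
    matches = (DEFINE.match(line) for line in source_text.splitlines())
    macros = {m.group(1) for m in matches if m}
    expected = macros & directive_identifiers(source_text)
    return expected - directive_identifiers(unparsed_text)


# Hypothetical input, invented for this sketch.
SOURCE = """\
#define CHUNK 4
#define NT 8
void scale(int n, double *a) {
#pragma omp parallel for schedule(static, CHUNK) num_threads(NT)
  for (int i = 0; i < n; i++) a[i] *= 2.0;
}
"""
GOOD = "#pragma omp parallel for schedule(static, CHUNK) num_threads(NT)\n"
BAD = "#pragma omp parallel for schedule(static, 4) num_threads(NT)\n"

print(lost_macro_operands(SOURCE, GOOD))  # set()
print(lost_macro_operands(SOURCE, BAD))   # {'CHUNK'}
```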
bonds-2: multiple source files in one frontend invocation
The suite also has a dedicated multi-file test, bonds-2.
That is important because compiler bugs often hide at translation-unit boundaries rather than inside one isolated source file. A frontend that only ever tests one file at a time can look healthier than it really is.
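A multi-file check can be pictured as "every input must yield its own rose_ output". The sketch below assumes one rose_<input> file per source in a single output directory; the file names first.c and second.c are placeholders, not the actual inputs of bonds-2.

```python
import tempfile
from pathlib import Path


def missing_rose_outputs(sources, out_dir):
    """List the rose_<input> files that were expected but not produced."""
    out_dir = Path(out_dir)
    expected = ["rose_" + Path(src).name for src in sources]
    return [name for name in expected if not (out_dir / name).is_file()]


# Simulate a run in which only the first translation unit was unparsed.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "rose_first.c").write_text("/* unparsed output */\n")
    print(missing_rose_outputs(["first.c", "second.c"], tmp))  # ['rose_second.c']
```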
The separate Fortran slice
The Fortran subdirectory is a reminder that the frontend layer is supposed to be language-unified, not just “OpenMP in C with a few extras.”
Its current CMake wiring registers AST-only compile tests over 53 Fortran files, including cases such as:
- parallel-if-numthreads.f, ompdo-default.f, continuation.f90, collapse.f90, barrier.f, critical.f90,
- and many OpenMP example-derived inputs.
This gives the frontend layer a dedicated place to catch regressions in:
- Fortran directive comments,
- !$omp versus c$omp forms,
- continuation behavior,
- and Fortran-specific combined constructs.
That distinction matters because REX’s OpenMP pipeline is explicitly supposed to work across both C-family languages and Fortran. The test tree should reflect that.
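The sentinel and continuation distinctions mentioned above can be illustrated with a small line classifier. It follows the general OpenMP rules for Fortran source: fixed form accepts !$omp, c$omp, or *$omp starting in column 1, with column 6 marking continuation, while free form accepts only !$omp and uses a trailing ampersand. The function names are hypothetical and the sketch is a simplification, not REX's directive collector; for example, it ignores comments after a trailing ampersand.

```python
import re

# Fixed form: a five-character sentinel that starts in column 1.
FIXED_SENTINEL = re.compile(r"^[!c*]\$omp", re.IGNORECASE)
# Free form: only !$omp, optionally indented, followed by blank, "&", or end.
FREE_SENTINEL = re.compile(r"^\s*!\$omp(?=[\s&]|$)", re.IGNORECASE)


def classify_fixed(line):
    """Return 'directive', 'continuation', or None for a fixed-form line."""
    if not FIXED_SENTINEL.match(line):
        return None
    column6 = line[5:6]
    # A blank or zero in column 6 starts a directive; anything else continues one.
    return "directive" if column6 in ("", " ", "0") else "continuation"


def free_form_continues(line):
    """Return True/False for a free-form directive line, None otherwise."""
    if not FREE_SENTINEL.match(line):
        return None
    # A trailing ampersand means the directive continues on the next line.
    return line.rstrip().endswith("&")


print(classify_fixed("c$omp parallel do"))           # directive
print(classify_fixed("!$omp+ private(i)"))           # continuation
print(free_form_continues("  !$omp parallel do &"))  # True
print(free_form_continues("c$omp parallel do"))      # None
```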
What This Layer Catches That Parser Tests Do Not
This distinction is the most important one to keep clear.
Parser tests validate directive grammar.
OpenMP_tests validates the frontend result after directive parsing has already happened and SgOmp* construction has already taken place.
That means it can catch failures such as:
- the right directive was parsed, but attached to the wrong statement,
- the clause existed in OpenMPIR, but the unparser emitted the wrong clause sequence,
- a combined construct was recognized, but the AST/unparser split or normalized it incorrectly,
- macro-preserved operands or commented directives were lost,
- a multi-file frontend invocation handled one translation unit differently from another,
- or a mixed OpenMP/OpenACC source stopped surviving the AST-only path.
Those are all frontend failures, but they are not parser failures in the narrow sense.
That is why this layer needs to exist separately.
What This Layer Catches Earlier Than Lowering Tests
It is equally important to understand why this layer exists before lowering.
Once lowering begins, the compiler is doing much more:
- outlining,
- map construction,
- helper emission,
- runtime launch generation,
- and source restructuring that is much more invasive than AST-only unparsing.
If you use lowering tests as the first place to notice a frontend AST bug, you are debugging too late.
The OpenMP_tests corpus narrows that problem:
- if parser tests pass but OpenMP_tests fails, the bug is likely in AST construction or frontend unparsing,
- not in the parser and not yet in the lowerer.
That is a very useful diagnostic boundary.
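That boundary amounts to a simple triage rule. The function below only restates the reasoning of this section in code; its name and return strings are illustrative and it is not part of the REX tooling.

```python
def suspect_layer(parser_ok, frontend_ok, lowering_ok):
    """Name the earliest layer whose tests fail, checking layers in pipeline order."""
    if not parser_ok:
        return "parser (ompparser)"
    if not frontend_ok:
        # Parser tests pass but OpenMP_tests fails.
        return "AST construction or frontend unparsing"
    if not lowering_ok:
        return "lowering"
    return "no failing layer"


print(suspect_layer(parser_ok=True, frontend_ok=False, lowering_ok=False))
```

Checking the layers in pipeline order is the point: a lowering failure is only attributed to the lowerer once the earlier, cheaper layers have passed.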
The Relationship To The Analyzer Layer
There is one more useful distinction inside this same directory tree.
The top-level OpenMP_tests CMake also registers focused analyzer regressions via checkOmpAnalyzing, for example:
- default schedule handling,
- dynamic schedule handling,
- implicit target-map behavior.
That is related work, but it is not the same layer as the broad AST-only corpus.
This post is about the broad coverage question:
can the frontend ingest and unparse a large, realistic OpenMP source corpus correctly?
The analyzer tests ask a narrower semantic-analysis question after that. They deserve their own focused discussion later.
The Real Value Of OpenMP_tests
The deepest value of this corpus is not just that it has many files.
It is that it exercises the full frontend boundary where a source-to-source compiler can become fragile:
- large construct coverage,
- multiple languages,
- mixed directive systems,
- macro and comment handling,
- multi-file invocation,
- and a reduced but meaningful reference-diff surface over the generated rose_* output.
That is exactly the kind of test layer a compiler like REX needs.
Parser tests are too narrow. Lowering tests are later and more invasive. Benchmarks are too noisy and expensive.
OpenMP_tests sits in the right middle position:
broad enough to keep the frontend honest, early enough to keep debugging cheap.
That is why this layer deserves to be understood on its own, and it is why the series needed a post dedicated to it.