Technical Retrospective: Stabilizing REX's Clang Frontend CTest Suite
The merged REX cleanup PR was large enough that the raw diff is not the best way to understand it.
It touched hundreds of files and moved a full CTest suite from a historical failure pile to a green result. But the work was not a random collection of unrelated fixes. Most failures were different surfaces of one modernization problem:
| |
That question reaches far beyond parsing. A source-to-source compiler does not stop at recognizing syntax. It needs declarations, scopes, symbols, types, templates, comments, tokens, source positions, and generated output to agree.
When those invariants are wrong, failures appear everywhere.
Figure 1. The failure list looked broad because the compiler is layered. Many later failures were consumers reporting frontend AST inconsistencies.
The Major Failure Families
The largest family was C and C++ frontend structural failure.
This included missing or wrong parent pointers, declarations inserted into the wrong scope, symbols missing from symbol tables, defining and nondefining declarations not paired, tags not mapped across redeclarations, and types that referenced incomplete or noncanonical declarations. These failures often appeared as assertions, generated-source compile errors, or name-qualification regressions.
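Several of these structural invariants can be written down as executable checks. The sketch below is a toy model in Python, not REX or Sage code: `Scope`, `Decl`, and `check_structure` are hypothetical names, and the real Sage classes are C++ with many more fields. It only illustrates the kind of consistency that later passes rely on.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Scope:
    """Hypothetical stand-in for a scope: owns declarations and a symbol table."""
    name: str
    declarations: list = field(default_factory=list)
    symbols: dict = field(default_factory=dict)  # declared name -> Decl


@dataclass(eq=False)
class Decl:
    """Hypothetical stand-in for a declaration node."""
    name: str
    scope: Optional[Scope] = None
    defining: Optional["Decl"] = None           # declaration that carries the body
    first_nondefining: Optional["Decl"] = None  # shared forward declaration


def check_structure(decl):
    """Return the structural invariants that `decl` violates (empty if none)."""
    problems = []
    if decl.scope is None:
        problems.append("no scope")
    else:
        if not any(d is decl for d in decl.scope.declarations):
            problems.append("not listed in its scope")
        if decl.name not in decl.scope.symbols:
            problems.append("no symbol in its scope")
    if decl.first_nondefining is None:
        problems.append("no nondefining declaration")
    elif (decl.defining is not None
          and decl.defining.first_nondefining is not decl.first_nondefining):
        problems.append("defining and nondefining declarations not paired")
    return problems


# A forward declaration and its definition, fully linked.
file_scope = Scope("global")
forward = Decl("Node", scope=file_scope)
definition = Decl("Node", scope=file_scope)
for d in (forward, definition):
    d.first_nondefining = forward
    d.defining = definition
    file_scope.declarations.append(d)
file_scope.symbols["Node"] = forward

# A declaration built by a path that forgot the linking steps.
orphan = Decl("Tmp")
```

Here `check_structure(forward)` reports nothing, while `check_structure(orphan)` reports the missing scope and the missing nondefining declaration. A downstream assertion or a bad generated file is usually the first visible sign of exactly this kind of gap.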
The second family was template and type construction.
Clang exposes templates, template arguments, aliases, dependent types, injected class names, specialization records, and implicit declarations in a way that is not identical to the older frontend. REX had to represent those constructs as Sage nodes without losing the relationship between the spelling the user wrote and the semantic entity Clang resolved.
The third family was declaration ordering and unparsing.
Some generated files failed because a tag definition appeared after a use, because a typedef did not preserve the right underlying type, because associated declarations were grouped across a scope boundary, or because name qualification produced a spelling that no longer matched the declaration context. The fix was not to teach the unparser string-based special cases. The fix was to make AST state strong enough that declaration ordering and qualification could be derived from it.
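To illustrate what "derived from AST state" can mean for ordering, here is a minimal sketch that treats "needs a complete definition of" as a dependency edge and emits declarations in an order that respects those edges. It is an assumption-laden toy, not the REX unparser: `emission_order` and its inputs are hypothetical, and declarations are reduced to plain names.

```python
from graphlib import TopologicalSorter


def emission_order(source_order, needs_complete):
    """Order declarations so each one follows the definitions it requires.

    source_order: declaration names in original source order (every name
        mentioned in needs_complete must appear here).
    needs_complete: name -> set of names whose definitions must come first.
    """
    rank = {name: i for i, name in enumerate(source_order)}
    sorter = TopologicalSorter()
    for name in source_order:
        sorter.add(name, *needs_complete.get(name, ()))
    sorter.prepare()  # raises graphlib.CycleError on circular requirements

    ordered = []
    while sorter.is_active():
        # Among declarations whose prerequisites are already emitted,
        # fall back to the original source order.
        ready = sorted(sorter.get_ready(), key=rank.__getitem__)
        ordered.extend(ready)
        sorter.done(*ready)
    return ordered


order = emission_order(
    ["use_point", "Point", "helper"],
    {"use_point": {"Point"}},
)
```

The point of the sketch is that the order falls out of recorded relationships. Nothing in it inspects output strings or test names.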
The fourth family was token and source-position preservation.
Token-stream mapping, physical file ranges, comments, directives, macro-adjacent nodes, and implicit nodes all needed clearer ownership. Once structural AST failures were fixed, these tests became more precise reporters. Comment placement was especially sensitive: moving a comment from before a statement to after it is not a harmless formatting change.
The fifth family was midend analysis.
Callgraph, CFG, dataflow, def-use, inlining, outlining, move-declaration, and normalization tests assumed shapes that were mostly true for older frontend output. Clang-built Sage ASTs exposed missing cases: lambdas, operators, implicit constructs, reference parameters, copied declarations, and different declaration ownership paths.
The final family was test infrastructure and output hygiene.
Some tests wrote artifacts into source-tree locations, some long aggregate tests carried stale scheduling assumptions, and some reference outputs no longer matched stable corrected compiler output. These were handled carefully because test updates can either document a real compiler fix or hide a bug.
Root Causes
The first root cause was incomplete declaration ownership.
In Sage, a declaration is not just a node. It belongs to a scope. It may have a defining declaration and a nondefining declaration. It may need a symbol. It may represent a tag, typedef, variable, function, namespace, class, template, or specialization. Later passes expect those relationships to be present and internally consistent.
Clang provides rich semantic information, but not in Sage’s ownership model. The frontend has to construct that model deliberately.
The second root cause was treating too much Clang-reachable state as if it should become eagerly materialized output state.
Clang knows the whole translation unit, including system headers and template-heavy library declarations. REX needs only the source-to-source surface that the current workflow requires. The Cxx_Grammar.C timeout was the clearest symptom of crossing that boundary too aggressively.
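One way to picture the boundary is as a selection step between what Clang has parsed and what gets built as output state. The sketch below shows one possible policy, chosen purely for illustration: keep declarations from the main file plus any header declarations the main file actually references. The source does not describe the rule REX uses, and `ClangDecl` and `select_for_materialization` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClangDecl:
    """Hypothetical record for a declaration Clang has already parsed."""
    name: str
    file: str


def select_for_materialization(decls, main_file, referenced_names):
    """Keep main-file declarations plus header declarations the main file uses."""
    return [
        d for d in decls
        if d.file == main_file or d.name in referenced_names
    ]


parsed = [
    ClangDecl("size_t", "stddef.h"),
    ClangDecl("vector", "vector"),
    ClangDecl("main", "app.cpp"),
]
kept = select_for_materialization(parsed, "app.cpp", {"size_t"})
```

Under this policy `vector` is known to Clang but never becomes output state, because nothing in the main file refers to it.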
The third root cause was type identity drift.
Generated-source compile failures often came from a declaration and its out-of-line definition disagreeing about a type. That kind of failure can be caused by a small frontend mismatch: using a simplified alias where the original nested template type was required, losing an elaborated tag, or building a type node that points at the wrong declaration.
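The distinction between "same type, spelled differently" and "actually a different type" can be made concrete with a small sketch. It assumes types are plain strings and aliases are a simple name-to-name map, which is far cruder than either Clang's or Sage's type model. The names `canonical` and `classify_type_pair` are hypothetical.

```python
def canonical(type_name, aliases):
    """Follow typedef-style aliases until reaching a non-alias type name."""
    seen = set()
    while type_name in aliases:
        if type_name in seen:
            raise ValueError(f"alias cycle at {type_name}")
        seen.add(type_name)
        type_name = aliases[type_name]
    return type_name


def classify_type_pair(declared, defined, aliases):
    """Compare the type on a declaration with the type on its definition."""
    if declared == defined:
        return "identical"
    if canonical(declared, aliases) == canonical(defined, aliases):
        return "same type, different spelling"
    return "type identity drift"


aliases = {"Index": "size_type", "size_type": "unsigned long"}
```

Even the middle result deserves attention in a source-to-source setting: the spelling the user wrote may be the only one that is valid in the output context, so resolving to the same canonical type is necessary but not always sufficient.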
The fourth root cause was downstream code assuming AST shapes that the new frontend no longer guaranteed.
This is not a frontend bug by itself. A mature compiler has many consumers. When the frontend becomes more complete, consumers may need to handle constructs they previously never saw.
Figure 2. The core work was not just constructing nodes. Each materialized Clang entity needed the Sage invariants that later passes rely on.
Key Subsystems Changed
The Clang frontend changed the most.
The important behavior was centralizing how declarations become Sage declarations and how they are connected to scopes, symbols, defining/nondefining pairs, tag mappings, typedef completion, template metadata, and source locations. The goal was to reduce ad hoc declaration construction paths that each remembered a different subset of invariants.
Target and option propagation also mattered. Tests involving explicit targets, ABI choices, x86-specific inline assembly, -m32/-m64, architecture flags, and calling convention details needed the frontend and backend command construction to preserve the user’s requested compilation environment.
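A minimal sketch of that propagation, assuming the backend command is assembled as an argument list: flags that affect the target or ABI are forwarded unchanged, and everything else is left to the caller. The flag list is illustrative and incomplete (for example, two-argument forms are not handled), and `is_target_flag` and `backend_command` are hypothetical helpers rather than REX functions.

```python
# Flags that change the target or ABI and therefore must reach both compilers.
EXACT_TARGET_FLAGS = {"-m32", "-m64"}
PREFIXED_TARGET_FLAGS = ("--target=", "-march=", "-mabi=")


def is_target_flag(arg):
    """True if `arg` selects the target, architecture, or ABI."""
    return arg in EXACT_TARGET_FLAGS or arg.startswith(PREFIXED_TARGET_FLAGS)


def backend_command(compiler, user_args, generated_source):
    """Build the backend compile command, forwarding target-affecting flags."""
    forwarded = [arg for arg in user_args if is_target_flag(arg)]
    return [compiler, *forwarded, "-c", generated_source]


cmd = backend_command(
    "cc",
    ["-O2", "-m32", "--target=x86_64-linux-gnu", "input.c"],
    "generated_input.c",
)
```

If a flag such as `-m32` reaches the frontend but not the backend, the generated source is compiled against a different ABI than the one it was analyzed under, which is the class of mismatch these tests caught.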
The unparser and name-qualification work was narrower in principle but broad in effect. The unparser needed to respect corrected AST state: declaration ordering, tag definitions before uses, typedef and elaborated type output, anonymous tag handling, comments, directives, and token frontiers. The rule stayed clear: do not invent string-based hacks to hide frontend state bugs.
Token and source-position handling became more explicit. Token mapping needs physical file ownership, duplicate map avoidance, macro and implicit-node awareness, and stable preprocessing attachment. These fixes were downstream of structural frontend correctness; they were not substitutes for it.
Midend consumers were updated where the corrected frontend exposed valid AST shapes they did not handle. Dataflow lattices needed ownership semantics that did not leak or share internal lattice pointers. The inliner needed to preserve reference-shadow parameters. Move-declaration and normalization tests needed symbol, declaration-pair, initializer, and token-frontier stability.
The test system and local workflow also changed. A failure ledger helped classify and compare frozen-set runs. Test outputs were moved into build-tree locations where needed. Long aggregate tests were made less wasteful without weakening what they validated.
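The comparison a failure ledger performs can be sketched as set arithmetic over test names. This is an assumed shape, not the actual ledger format: `compare_runs` is a hypothetical function, and the frozen set is simply the list of tests being tracked.

```python
def compare_runs(baseline_failures, current_failures, frozen_set):
    """Classify tracked tests by how their status changed between two runs."""
    tracked = set(frozen_set)
    baseline = set(baseline_failures) & tracked
    current = set(current_failures) & tracked
    return {
        "fixed": sorted(baseline - current),
        "still_failing": sorted(baseline & current),
        "regressed": sorted(current - baseline),
    }


report = compare_runs(
    baseline_failures=["t1", "t2", "t3"],
    current_failures=["t2", "t4", "untracked"],
    frozen_set=["t1", "t2", "t3", "t4"],
)
```

Restricting both runs to the frozen set keeps the comparison stable when tests are added or renamed between runs.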
Representative Fix Patterns
The exact file list is less important than the repeated fix patterns.
One pattern was replacing scattered construction with invariant-aware construction. In a compiler frontend, the risky part is often not allocating a node. It is making sure the node is linked to the right parent, scope, symbol, declaration pair, type, and source location before another subsystem sees it. Several frontend repairs followed this shape: identify a construct that was built by a special path, then bring that path under the same ownership rules as the rest of the frontend.
A second pattern was making ownership explicit. Raw pointers and implicit transfer conventions are easy to misuse in old compiler code. When a structure owns copied state, the API should say so. When a caller retains ownership, the type should not imply that the callee might delete it. The final review fix to the live/dead lattice was a small example of this broader pattern.
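In a garbage-collected language the same idea shows up as aliasing rather than deletion, so the sketch below models it that way: the analysis state stores a private copy of each lattice and hands out copies, so no caller shares its internals. `LiveSet` and `NodeState` are hypothetical toy classes, not the REX dataflow API.

```python
class LiveSet:
    """Toy lattice element: the set of variable names live at a program point."""

    def __init__(self, names=()):
        self.names = set(names)

    def copy(self):
        return LiveSet(self.names)


class NodeState:
    """Per-node analysis state that owns private copies of its lattices."""

    def __init__(self):
        self._lattices = {}

    def set_lattice(self, key, lattice):
        # Copy on the way in: the caller keeps ownership of its own object.
        self._lattices[key] = lattice.copy()

    def get_lattice(self, key):
        # Copy on the way out: callers cannot mutate the stored state.
        return self._lattices[key].copy()


state = NodeState()
live = LiveSet({"x"})
state.set_lattice("in", live)
live.names.add("y")  # later mutation by the caller does not leak into the state
```

The copies cost something, but the contract is visible in the method bodies instead of living in a convention that each caller has to remember.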
A third pattern was separating workflow requirements. Token-preserving and source-position-preserving tests legitimately need more original source surface than ordinary compile tests. Ordinary compile tests should not eagerly materialize every Clang-reachable header declaration just because Clang knows it. The Cxx_Grammar.C timeout fix came from recognizing that difference.
A fourth pattern was moving downstream code from historical assumptions to explicit handling. Some midend code assumed the older frontend’s AST shapes. Clang-built Sage nodes exposed valid cases that had not been common before. The correct response was to handle those cases directly, not to force the frontend to mimic every old shape.
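Explicit handling can be as plain as enumerating the shapes a consumer accepts and failing loudly on anything else. The sketch below assumes call nodes are simple dictionaries with a `kind` field; `callee_label` and the three kinds are hypothetical and stand in for whatever a real call-graph builder dispatches on.

```python
def callee_label(call):
    """Return a call-graph label for one call node (a plain dict in this sketch)."""
    kind = call["kind"]
    if kind == "function":
        return call["name"]
    if kind == "operator":
        return "operator" + call["op"]
    if kind == "lambda":
        return f"<lambda at line {call['line']}>"
    # Fail loudly instead of guessing: an unhandled shape is a consumer gap.
    raise NotImplementedError(f"unhandled call kind: {kind}")
```

The final `raise` matters as much as the handled cases. A silent default would turn a missing case into a wrong graph instead of a visible failure.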
These patterns matter because they are reusable. Future REX frontend work should ask whether a patch strengthens one of these contracts or creates another special path that will need cleanup later.
What Was Not Allowed
The most important constraint was negative:
| |
That meant no CTest pass-property masking. No deleting valid tests. No weakening assertions just because they were inconvenient. No unparser string attributes that papered over wrong AST state. No special casing a test specimen by name to bypass the real invariant.
Reference-output updates were allowed only when the compiler output was now stable and semantically equivalent. Even then, comment placement had to be treated as semantic-adjacent. A comment attached before statement A cannot silently move after statement A and be called formatting.
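A reference-output review can check comment attachment mechanically before a human looks at the diff. The sketch below is a deliberately crude model: it treats any line starting with `//` as a comment, attaches it to the next statement line, and reports comments whose anchor differs between the old and new output. `comment_anchors` and `moved_comments` are hypothetical helpers, and identical comment texts are not distinguished.

```python
def comment_anchors(lines):
    """Map each comment line to the statement it is attached to."""
    anchors = {}
    pending = []
    last_statement = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("//"):
            pending.append(line)
            continue
        for comment in pending:
            anchors[comment] = ("before", line)
        pending = []
        last_statement = line
    # Comments with no following statement trail the last one.
    for comment in pending:
        anchors[comment] = ("after", last_statement)
    return anchors


def moved_comments(old_lines, new_lines):
    """Return comments whose attachment differs between two outputs."""
    old = comment_anchors(old_lines)
    new = comment_anchors(new_lines)
    return sorted(c for c in old if new.get(c) != old[c])


old_output = ["// init", "a = 1;", "b = 2;"]
reindented = ["  // init", "", "a = 1;", "b = 2;"]
reattached = ["a = 1;", "// init", "b = 2;"]
```

With these inputs, the reindented output reports no moved comments, while the reattached output reports `// init`, because the comment now describes a different statement.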
This discipline was necessary because many historical failures were clustered. A hack in one layer could make many tests green while also making the next failure family harder to understand.
Why Downstream Fixes Were Still Real Fixes
Saying the campaign was frontend-led does not mean every non-frontend change was suspicious.
The frontend is the producer. The midend and backend are consumers. Once the producer becomes more complete, consumers must sometimes become more general.
For example, callgraph and CFG tests need deterministic handling of valid Sage nodes built from Clang constructs. Dataflow analyses need to own copied lattice state instead of relying on ambiguous raw-pointer transfer. Inlining needs to preserve reference parameter semantics. Move-declaration tests need token and symbol metadata to survive relocation.
Those are not frontend hacks. They are consumer fixes made visible by frontend modernization.
The review question for each downstream change was:
| |
Only the first category belongs in a compiler stabilization PR.
How To Read The Key Changed Areas
A future engineer does not need to memorize every changed file to understand the PR. A better mental map is:
| |
When reading a specific change, place it in that map.
If a frontend change affects declarations, ask which Sage invariant it is completing. If an unparser change affects output order, ask which AST relationship now justifies the order. If a reference output changes, ask whether the source program still means the same thing and whether comments stayed attached to the same logical construct. If a midend change accepts a new AST shape, ask whether that shape is valid or whether the frontend should have produced something else.
This map is also useful for debugging future regressions. A generated compile error is not automatically an unparser bug. A token test failure is not automatically a token bug. A callgraph assertion is not automatically a callgraph bug. The failure belongs to the first layer whose invariant is actually wrong.
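That triage rule can be sketched as a walk over per-layer checks in producer-to-consumer order. The layer names and the `first_broken_layer` helper are hypothetical; the checks themselves are whatever invariant tests a given layer has.

```python
def first_broken_layer(checks):
    """Return the name of the first layer whose invariant check fails, or None.

    checks: (layer_name, check) pairs in producer-to-consumer order, where each
    check is a zero-argument callable returning True when the invariant holds.
    """
    for layer, check in checks:
        if not check():
            return layer
    return None


checks = [
    ("frontend", lambda: True),
    ("unparser", lambda: False),
    ("midend", lambda: False),
]
```

Here the midend check also fails, but the walk stops at the unparser, since a later layer's failure may only be a consequence of the earlier one.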
Remaining Risks
A green full suite is a milestone, not a proof of total correctness.
The highest residual risk is still the size of the C and C++ language surface. Templates, dependent names, anonymous tags, elaborated types, lambdas, hidden friends, and system-header interactions are deep wells. The suite is now green, but future real code can still expose missing combinations.
The second risk is source preservation. Comments, directives, token frontiers, macro boundaries, and physical file ranges are easy to perturb. Tests cover a lot of this surface, but source-to-source compilers always need caution around “format-only” changes.
The third risk is reference-output confidence. Some references were updated because the compiler output changed after root-cause fixes. The accepted updates were reviewed as stable semantic equivalents, but future maintainers should continue treating reference churn as high risk.
The fourth risk is LLVM drift. REX is pinned to LLVM 22 for this work, and the frontend now matches that API and behavior. A future LLVM migration can change Clang AST shape, implicit nodes, source ranges, or driver behavior.
Figure 3. The full suite going green removes the historical failure pile. It does not remove the need for careful review around the remaining high-risk surfaces.
What The Final Green Suite Means
The final result matters because it gives REX a clean baseline.
Before the cleanup, a new failure could hide inside the historical pile. After the cleanup, a new failure is much easier to classify as a regression. That changes day-to-day development.
It also changes how risky future frontend work feels. Before the cleanup, a developer had to ask whether a failure was new or merely part of the old migration debt. After the cleanup, the default assumption can be stricter: if a test fails after a change, the change probably introduced or exposed something that deserves immediate attention.
That is a practical maintenance benefit, not just a scoreboard improvement.
The result also validates the broad shape of the Clang frontend migration. REX can now build and test the full local suite on the LLVM 22 Clang frontend path with:
| |
That does not mean every internal design is perfect. It means the system has reached a stability level where future work can be incremental instead of archaeological.
The main technical lesson is this:
| |
Once that contract became coherent enough, the rest of the failure families became fixable without hiding bugs.
The next migrations should preserve that lesson. LLVM changes, new C++ constructs, or OpenMP frontend extensions should be judged by the same standard: build the right AST state first, then let the unparser and midend consume it honestly.