How REX Expands `declare mapper` Clauses Into Dynamic Runtime Map Entries
REX does not treat declare mapper as a string decoration on a map(...) clause. It resolves the visible mapper by scope and type, recursively expands mapper items into real leaf map entries, turns array-section mapper uses into deferred dynamic entries, and then runs a two-pass count-and-populate builder to materialize the final runtime arrays. The same machinery is reused by target, target data, and target update, which keeps the mapping model consistent across all three constructs.

The previous post in this series focused on one special transport class inside the runtime packet: literal target parameters for eligible scalars.
This post stays in the same neighborhood, but it addresses a very different kind of complexity:
what happens when one mapped clause item is not really one runtime entry at all?
That is the problem declare mapper introduces.
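If the user declares a default mapper for a struct type, declare mapper(default : Vec v), whose body maps a length field and the data payload that the length describes, and later applies it to an array section v[0:n], then the lowerer cannot pretend that v[0:n] is a single map slot. It is not.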
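For concreteness, source of this kind can be sketched as follows. This is an illustration under stated assumptions, not the listing REX is tested against: the Vec layout (an int len and a float *data), the mapper body, and the function name scale_all are all assumed, inferred from the lowered fragments quoted later in this post. When OpenMP is not enabled the pragmas are ignored and the loop simply runs on the host.

```cpp
// Assumed layout: a length plus a heap-allocated float payload.
struct Vec {
    int len;
    float *data;
};

// Assumed mapper body: map the length field and the payload it describes.
#pragma omp declare mapper(default : Vec v) map(v.len, v.data[0:v.len])

// Hypothetical kernel: scales every element of every Vec in v[0:n].
void scale_all(Vec *v, int n, float factor) {
    // The default mapper for Vec applies to each element of the section.
    #pragma omp target map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < v[i].len; ++j)
            v[i].data[j] *= factor;
}
```

The single clause item v[0:n] stands for n struct elements, and each element in turn stands for a length field plus a payload whose size is only known at run time.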
Semantically, the compiler has to do something much more precise:
- resolve which mapper declaration is visible for Vec,
- substitute the mapped expression into the mapper body,
- recursively expand the mapper’s own map(...) items,
- preserve to, from, or tofrom direction semantics,
- and, when the mapper is applied to an array section, expand the section element-by-element into real runtime arguments.
That last part is where things stop being a simple expression rewrite and become a lowering stage in their own right.
This post focuses on the mapper-expansion machinery inside src/midend/programTransformation/ompLowering/omp_lowering.cpp. It explains:
- how resolveVisibleMapperForExpression(...) chooses the active mapper,
- how collectExpandedMapEntriesForExpression(...) and collectExpandedMapEntriesUsingResolvedMapper(...) recursively flatten mapper clauses,
- why array sections become deferred dynamic_mapper_section entries instead of immediate argument expressions,
- how buildDynamicRuntimeMapArgumentArrays(...) uses a two-pass count/populate strategy,
- and why the same expansion path is shared by target, target data, and target update.
Figure 1. Mapper lowering starts as a semantic lookup problem. REX resolves the visible declare mapper by scope, identifier, and equivalent type before it builds any runtime arguments.
Why Mapper Expansion Deserves Its Own Lowering Stage
Without mappers, the mental model of mapping is relatively straightforward. Each clause item becomes one resolved map item, and each resolved item eventually contributes one position in the runtime arrays:
- __args_base
- __args
- __arg_sizes
- __arg_types
There are still details, but the shape is one source item to one logical runtime slot.
Mappers break that simplicity on purpose.
A mapper says that the source-level object written in the clause is not the final mapping unit. It is shorthand for a mapping recipe. That recipe can itself contain:
- direct scalar members,
- pointer members,
- array sections,
- nested mapper requests,
- and different direction semantics depending on whether the outer construct is a map(...) or a target update to/from.
So a mapper-qualified item is not really “an expression waiting for packet assembly.” It is a small semantic program that the lowerer must interpret.
That is why the mapper code lives earlier than final packet assembly. The packet builder expects concrete leaf entries. Mapper lowering is the stage that creates those leaf entries.
This is also why the implementation introduces ExpandedMapEntry rather than trying to force everything directly into ResolvedMapItem.
There are two expansion kinds:
- direct_item
- dynamic_mapper_section
direct_item means the compiler was able to fully resolve one leaf map entry immediately.
dynamic_mapper_section means the compiler knows what mapper recipe to use, but the final number of runtime entries depends on section lengths such as n or v[i].len, so actual expansion must be deferred until a dynamic builder can emit loops and heap-backed arrays.
That split is the core idea of the design.
Step 1: Resolve The Visible Mapper By Scope, Identifier, And Type
The first job is not expansion. It is lookup.
That logic lives in resolveVisibleMapperForExpression(...).
At a high level, the algorithm is:
- collect candidate types for the mapped expression,
- normalize the requested mapper identifier,
- walk outward through enclosing scopes,
- scan preceding statements in each scope for SgOmpDeclareMapperStatement,
- filter those declarations by identifier and equivalent formal type,
- reject ambiguity,
- and return the first matching declaration from the nearest scope that contains one.
The important part is that this is a real scope walk, not a global name search.
The implementation keeps track of the current anchor statement and, for each enclosing scope, only considers mapper declarations that appear before the anchor point in that scope. That is the right behavior for a source-to-source compiler: mapper visibility follows lexical structure, not some post-hoc symbol dump.
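The key shape is a loop over enclosing scopes, innermost first, that scans only the statements preceding the current anchor in each scope.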
That is already more disciplined than many ad hoc frontend features. The lowerer does not assume that “the visible mapper” can be recovered from a single symbol table query. It reconstructs the lexical rule explicitly.
Then it filters by identifier.
If the user explicitly requested mapper(foo), REX only accepts user-defined mapper declarations with normalized identifier foo. If the use site did not explicitly name a mapper, REX only considers default mappers.
Then it filters by type.
The candidate type set comes from the mapped expression, and the formal mapper type is compared with SageInterface::isEquivalentType(...). That matters because mapper lookup is a type-level semantic choice, not a string comparison on the expression text.
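Finally, the helper rejects ambiguity inside the same scope: if more than one declaration in a scope survives the identifier and type filters, lookup stops with an error instead of picking one.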
That is the right failure mode. If two mapper declarations in the same visible scope could both apply, the compiler should stop. Quietly choosing one would poison every later lowering step.
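The lookup rules above can be modelled in a few lines. The sketch below is a toy, not the ROSE implementation: MapperDecl, Scope, and resolveVisibleMapper are hypothetical names, statement positions are plain integers, and type equivalence is reduced to string equality where the real code uses SageInterface::isEquivalentType(...). A use site that names no mapper is assumed to pass "default" as the requested identifier.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a mapper declaration; not the ROSE AST node.
struct MapperDecl {
    std::string id;    // normalized identifier, "default" when unnamed
    std::string type;  // formal type name, compared as a string in this toy
    int position;      // statement index inside its scope
};

// Toy scope: its mapper declarations and the point the walk entered from.
struct Scope {
    std::vector<MapperDecl> decls;
    int anchor;        // index of the statement that contains the use site
};

// Walks scopes innermost-first. Returns the single match from the nearest
// scope that has one, nullptr if none is visible, and throws on ambiguity.
const MapperDecl *resolveVisibleMapper(
    const std::vector<Scope> &scopesInnerToOuter,
    const std::string &requestedId, const std::string &exprType) {
    for (const Scope &scope : scopesInnerToOuter) {
        const MapperDecl *match = nullptr;
        for (const MapperDecl &d : scope.decls) {
            if (d.position >= scope.anchor) continue;  // declared after use
            if (d.id != requestedId) continue;         // identifier filter
            if (d.type != exprType) continue;          // type filter
            if (match != nullptr)
                throw std::runtime_error("ambiguous declare mapper in scope");
            match = &d;
        }
        if (match != nullptr) return match;  // nearest scope wins
    }
    return nullptr;
}
```

A declaration that appears after the anchor is skipped even when its identifier and type match, which is the lexical-visibility rule in miniature.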
Step 2: Expand Mapper Clauses Recursively Into Leaf Entries
Once a mapper is resolved, collectExpandedMapEntriesUsingResolvedMapper(...) interprets the mapper’s own clauses.
This is where the design gets interesting.
The mapper declaration is not lowered as a blob. REX iterates the mapper’s map(...) clauses, then iterates each clause item, materializes that item against the actual mapped expression, and decides whether the result is:
- a direct leaf entry,
- another mapper expansion,
- or a deferred array-section expansion.
The direct-substitution step is handled by materializeMapperExpression(...). Conceptually, it rewrites the mapper formal into the actual mapped expression, so that a mapper item written against the formal (a member access on it, for example) becomes the equivalent expression rooted at the real target object, not the mapper’s dummy formal variable.
After substitution, the lowerer distinguishes a special case: a direct self-item.
If the mapper item is just the formal itself, and there is no explicitly requested nested mapper identifier, REX can turn that into a ResolvedMapItem directly. Otherwise, it recurses back through collectExpandedMapEntriesForExpression(...) so nested mapping logic applies consistently.
This keeps the model uniform. A mapper item is not treated specially just because it came from inside a mapper. It still goes through the same expansion decision tree as a top-level clause item.
The recursion guard is also explicit.
The helper carries a vector of active_mappers, and if the current mapper declaration is already active, it aborts with a recursive-expansion error. That prevents infinite expansion loops for pathological or cyclic mapper definitions.
This is one of those details that is easy to overlook in a polished implementation. But without it, mapper lowering would be a trapdoor for unbounded recursion.
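A minimal model of the recursion and its guard, under heavy simplification: expressions are strings, substitution is string concatenation, and a mapper is a list of member suffixes that are either leaves or requests for another mapper. MapperItem, MapperTable, and expandThroughMapper are hypothetical names; only the idea of an active_mappers list comes from the description above.

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy mapper item: a member of the formal, optionally routed through
// another mapper (an empty nested name means "leaf entry").
struct MapperItem {
    std::string member;        // e.g. ".len"
    std::string nestedMapper;  // name of a nested mapper, or ""
};

using MapperTable = std::map<std::string, std::vector<MapperItem>>;

// Expands `expr` through `mapperName`, appending leaf expressions to `out`.
// `active` plays the role of the active_mappers guard.
void expandThroughMapper(const MapperTable &table,
                         const std::string &mapperName,
                         const std::string &expr,
                         std::vector<std::string> &active,
                         std::vector<std::string> &out) {
    if (std::find(active.begin(), active.end(), mapperName) != active.end())
        throw std::runtime_error("recursive mapper expansion: " + mapperName);
    active.push_back(mapperName);
    for (const MapperItem &item : table.at(mapperName)) {
        // Substitute the actual expression for the mapper's formal.
        const std::string actual = expr + item.member;
        if (item.nestedMapper.empty())
            out.push_back(actual);  // direct leaf entry
        else
            expandThroughMapper(table, item.nestedMapper, actual, active, out);
    }
    active.pop_back();
}
```

Because the guard is checked on entry and released on exit, the same mapper may be used many times in sequence; it is only rejected when it appears inside its own expansion.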
Step 3: Preserve Direction Semantics Instead Of Flattening Them Away
A good mapper implementation cannot simply expand structure. It also has to preserve motion semantics.
That is why the expansion helpers carry both:
- a MapperUseKind
- and a map operator such as to, from, or tofrom
When expansion originates from a normal map(...) clause, the lowerer derives the effective operator by combining the outer map operator with the mapper item’s own operator.
When expansion originates from target update to(...) or target update from(...), the logic is different. REX does not blindly preserve mapper items whose direction is incompatible with the requested motion.
Instead, it checks whether each mapper item’s direction is compatible with the requested motion, and drops mapper items that do not apply to the requested update direction.
That behavior is important because target update is not a kernel launch. It is a data-motion operation. If the lowerer forgot that distinction and expanded every mapper item unconditionally, a to(...) update could silently include fields that are supposed to flow only in the opposite direction.
So the mapper-expansion code is doing two jobs at once:
- structural flattening,
- semantic filtering based on construct kind.
That is one reason it deserves its own focused post.
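One way to picture the two direction rules is as operations on a pair of direction bits. The sketch below assumes that directions intersect, which follows the spirit of the map-type decay rule in the OpenMP specification; whether REX encodes or combines operators exactly this way is an assumption, and MapDir, combineForMap, and appliesToUpdate are hypothetical names.

```cpp
// Direction bits for this sketch only; not REX's or the runtime's encoding.
enum MapDir : unsigned { kAlloc = 0u, kTo = 1u, kFrom = 2u, kToFrom = 3u };

// Assumed rule for map(...) uses: an inner mapper item keeps only the
// directions that the outer clause also requests.
inline MapDir combineForMap(MapDir outer, MapDir inner) {
    return static_cast<MapDir>(static_cast<unsigned>(outer) &
                               static_cast<unsigned>(inner));
}

// Assumed filter for target update: keep an item only when its own
// direction includes the requested motion (kTo or kFrom).
inline bool appliesToUpdate(MapDir requested, MapDir inner) {
    return (static_cast<unsigned>(requested) &
            static_cast<unsigned>(inner)) != 0u;
}
```

Under this model a tofrom outer clause leaves every inner operator unchanged, while a to(...) update discards an item declared as from instead of transferring it.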
Step 4: Array Sections Become Deferred Dynamic Entries
The sharpest transition in the implementation happens when a resolved mapper applies to an array section.
If the mapped expression is an array section reference, REX does not immediately expand the mapper into concrete ResolvedMapItem leaf entries.
Instead, collectExpandedMapEntriesForExpression(...) creates an ExpandedMapEntry of kind dynamic_mapper_section.
This is the architectural pivot of the whole feature.
Why not expand immediately?
Because for a section such as v[0:n], the final runtime entry count depends on n, and each element may itself expand through the mapper into multiple leaf items such as:
- &v[i].len
- v[i].data + 0
- sizeof(float) * v[i].len
So the lowerer cannot emit one static expression list and be done. It needs a procedural expansion phase that can:
- generate loop indices,
- construct one element expression at a time,
- recursively expand the mapper on that element,
- and either count or populate the eventual arrays.
That is why the intermediate representation remembers:
- the base expression,
- the section dimensions,
- the resolved mapper declaration,
- the originating construct kind,
- and the runtime flag bits.
In other words, the lowerer does not lose semantic information when it defers expansion. It packages exactly the facts the later dynamic builder will need.
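A record carrying those five facts might look like the sketch below. Only the names ExpandedMapEntry, MapperUseKind, direct_item, and dynamic_mapper_section come from this post; the field names, the MapperUseKind enumerators, SectionDimension, and makeDynamicSection are hypothetical, and strings stand in for the AST nodes a real lowerer would hold.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// The two expansion kinds named in the post.
enum class ExpandedMapEntryKind { direct_item, dynamic_mapper_section };

// Assumed enumerators; the post names the type but not its values.
enum class MapperUseKind { map_clause, update_to, update_from };

// Hypothetical dimension record; strings stand in for AST expressions.
struct SectionDimension {
    std::string lowerBound;
    std::string length;
};

struct ExpandedMapEntry {
    ExpandedMapEntryKind kind;
    std::string baseExpression;                // e.g. "v"
    std::vector<SectionDimension> dimensions;  // empty for direct items
    std::string resolvedMapper;                // empty for direct items
    MapperUseKind useKind;                     // originating construct kind
    std::uint64_t runtimeFlags;                // runtime flag bits
};

// Packages a section for later expansion instead of expanding it now.
ExpandedMapEntry makeDynamicSection(std::string base,
                                    std::vector<SectionDimension> dims,
                                    std::string mapper, MapperUseKind useKind,
                                    std::uint64_t flags) {
    return ExpandedMapEntry{ExpandedMapEntryKind::dynamic_mapper_section,
                            std::move(base), std::move(dims),
                            std::move(mapper), useKind, flags};
}
```

The record holds no loop and no count. It is a description of work that a later stage can replay as many times as it needs.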
Figure 2. Array-section mapper uses are not lowered as one runtime entry. REX defers them into a dynamic section record, then later expands each element through the mapper recipe.
Step 5: Use A Two-Pass Dynamic Builder Instead Of Guessing Array Size
Once mapper expansion has produced one or more dynamic_mapper_section entries, the lowerer switches to buildDynamicRuntimeMapArgumentArrays(...).
This builder does not try to be clever in one pass. It explicitly runs two passes:
- count_only
- populate
That decision is one of the cleaner pieces of the design.
The count pass starts with any static prefix and suffix entries already known, initializes __arg_num, and then walks the dynamic entries.
For direct items, count_only just increments __arg_num.
For dynamic_mapper_section, the helper emits a loop nest over the section lengths and recursively expands each element through the resolved mapper, still in count mode.
Only after the full count is known does REX allocate __args_base, __args, __arg_sizes, and __arg_types, with heap storage sized by __arg_num.
Then the populate pass reruns the same expansion tree, this time emitting actual array writes through an __arg_index cursor.
That is a better design than trying to predict counts with one bespoke formula per mapper shape.
The point is not just correctness. It is reuse.
The same recursive expansion logic works in both passes because the pass kind is an explicit input. The lowerer does not maintain one implementation for “how many entries will this make?” and a second unrelated implementation for “now actually emit them.” It keeps one traversal and changes the action taken at each leaf.
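The one-traversal, two-actions idea can be shown with an ordinary host-side analogue. This is a sketch of the technique, not the code REX generates: Vec is the assumed element type from the running example, RuntimeMapArrays, emitLeaf, expandSection, and buildArgumentArrays are hypothetical names, the base pointers are chosen only for illustration, and the type-flag array is left out to keep it short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed element type, matching the fields seen in the lowered fragments.
struct Vec {
    int len;
    float *data;
};

enum class Pass { count_only, populate };

// Host-side stand-ins for the runtime arrays (type flags omitted).
struct RuntimeMapArrays {
    std::vector<void *> args_base;
    std::vector<void *> args;
    std::vector<std::int64_t> arg_sizes;
    std::size_t arg_num = 0;    // total entries, known after the count pass
    std::size_t arg_index = 0;  // write cursor used by the populate pass
};

// Leaf action: the only place where the two passes differ.
void emitLeaf(Pass pass, RuntimeMapArrays &out, void *base, void *ptr,
              std::int64_t size) {
    if (pass == Pass::count_only) {
        ++out.arg_num;
        return;
    }
    out.args_base[out.arg_index] = base;
    out.args[out.arg_index] = ptr;
    out.arg_sizes[out.arg_index] = size;
    ++out.arg_index;
}

// Single traversal shared by both passes: each element yields two leaves.
void expandSection(Pass pass, RuntimeMapArrays &out, Vec *v, int n) {
    for (int i = 0; i < n; ++i) {
        emitLeaf(pass, out, &v[i], &v[i].len, sizeof(int));
        emitLeaf(pass, out, v[i].data, v[i].data,
                 static_cast<std::int64_t>(sizeof(float)) * v[i].len);
    }
}

RuntimeMapArrays buildArgumentArrays(Vec *v, int n) {
    RuntimeMapArrays out;
    expandSection(Pass::count_only, out, v, n);  // pass 1: count
    out.args_base.resize(out.arg_num);           // allocate once, exact size
    out.args.resize(out.arg_num);
    out.arg_sizes.resize(out.arg_num);
    expandSection(Pass::populate, out, v, n);    // pass 2: fill
    return out;
}
```

If the mapper recipe changes, only expandSection changes; the count and the contents cannot drift apart because both come from the same walk.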
This is the reason the lowered output for the dedicated mapper test contains both:
- a dynamic argument count path beginning from __arg_num = 0;
- and a generated loop over __rex_mapper_section_index_0
The test is not merely checking syntax trivia. It is checking that the compiler took the dynamic-expansion route at all.
Step 6: Generate Per-Element Loops, Not Whole-Section Fake Entries
Inside appendExpandedMapEntryDynamicPass(...), a dynamic_mapper_section produces a loop nest through buildLoopNest.
For each dimension, REX creates an index variable such as __rex_mapper_section_index_0 and loops from zero to the runtime length for that dimension.
At the innermost point, it materializes one element expression with buildArraySectionElementExpression(...), then recursively expands the mapper on that element.
That is exactly the behavior the lowering test checks.
The script tests/nonsmoke/functional/roseTests/ompLoweringTests/scripts/run_mapper_lowering_check.sh verifies that the lowered host file contains fragments such as:
- __rex_mapper_section_index_0 < (int64_t)n
- &v[0 + (long long)__rex_mapper_section_index_0].len
- v[0 + (long long)__rex_mapper_section_index_0].data + 0
- sizeof(float) * v[0 + (long long)__rex_mapper_section_index_0].len
Those are not arbitrary textual checks. Together they prove the real semantic point:
REX did not lower v[0:n] as one fake aggregate map entry.
It lowered it element-by-element through the mapper’s recipe.
That is the difference between “supports mapper syntax” and “actually lowers mapper semantics.”
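The shape of a loop nest over section dimensions can be sketched as a recursion with one level per dimension. This is a runtime analogue only: REX emits loop statements into the generated source, whereas the hypothetical forEachSectionElement below simply executes the equivalent iteration and hands each index tuple to a callback.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Runs one loop per dimension, from zero to that dimension's length, and
// calls `body` with the full index tuple at the innermost level.
void forEachSectionElement(
    const std::vector<std::int64_t> &lengths,
    const std::function<void(const std::vector<std::int64_t> &)> &body,
    std::vector<std::int64_t> &indices, std::size_t depth = 0) {
    if (depth == lengths.size()) {
        body(indices);  // innermost point: one element of the section
        return;
    }
    for (std::int64_t i = 0; i < lengths[depth]; ++i) {
        indices.push_back(i);
        forEachSectionElement(lengths, body, indices, depth + 1);
        indices.pop_back();
    }
}
```

In the lowerer, the callback position is where the element expression is built and the mapper is expanded again on that single element.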
Step 7: The Same Machinery Is Shared By target, target data, And target update
One of the strongest parts of this implementation is where it is reused.
For kernel launches, transOmpMapVariables(...) collects expanded entries for map(...) clauses and passes dynamic_map_entries into the host launch and runtime-packet path.
For target data, transOmpTargetData(...) calls the same mapping collector and, when dynamic entries are present, routes them through buildDynamicRuntimeMapArgumentArrays(...) before calling __tgt_target_data_begin and __tgt_target_data_end.
For target update, collectOmpTargetUpdateInfo(...) uses collectExpandedMotionItemsForClause(...), which still funnels into collectExpandedMapEntriesForExpression(...), and the dynamic path again reuses buildDynamicRuntimeMapArgumentArrays(...) before calling __tgt_target_data_update.
So the feature is not “mapper lowering for target kernels” plus a second and third implementation elsewhere.
It is one shared mapping vocabulary with different consumers:
- kernel launch,
- data-region lifetime management,
- directional data updates.
That is exactly the right design for a source-to-source compiler. The semantic expansion of a mapper-qualified section should not depend on whether the consuming runtime call is a kernel launch or a data-motion API.
Figure 3. REX keeps mapper expansion in one shared path. target, target data, and target update differ at the final runtime API, not in how mapper-qualified sections are semantically flattened.
What The Current Tests Actually Cover
The best concrete specimen for this feature is tests/nonsmoke/functional/CompileTests/OpenMP_tests/declare_mapper_target_update.c.
It is a strong test because it exercises three consumers in one small source:
- target data
- target update
- target
all against the same declare mapper(default : Vec v) definition and the same section-shaped use v[0:n].
The dedicated lowering check then verifies:
- dynamic counting is present,
- runtime loops over section length are present,
- mapper member addressing is lowered per element,
- mapper-derived size expressions are present,
- heap-backed array allocation happens,
- and whole-section raw fallbacks were not left in the output.
That is a good fit for this kind of feature. Mapper lowering is too semantic and too shape-sensitive to rely only on broad end-to-end execution tests. The compiler also needs structural checks that the generated code took the right lowering route.
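A structural check of this kind amounts to asserting that certain fragments appear in the lowered text and that others do not. The actual check is a shell script; the sketch below restates the idea as a small function, with loweredOutputLooksExpanded as a hypothetical name and the choice of forbidden fragments left to the caller.

```cpp
#include <string>
#include <vector>

// Returns true when every required fragment is present and no forbidden
// fragment appears in the lowered source text.
bool loweredOutputLooksExpanded(const std::string &lowered,
                                const std::vector<std::string> &required,
                                const std::vector<std::string> &forbidden) {
    for (const std::string &fragment : required)
        if (lowered.find(fragment) == std::string::npos) return false;
    for (const std::string &fragment : forbidden)
        if (lowered.find(fragment) != std::string::npos) return false;
    return true;
}
```

Required fragments prove the dynamic route was taken; forbidden fragments catch the case where a whole-section fallback slipped through alongside it.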
There is still room to grow.
Future tests could tighten coverage around:
- nested user-defined mappers,
- ambiguity diagnostics,
- recursive mapper rejection,
- multi-dimensional section expansion,
- and explicit mapper identifiers that differ from default.
But the current test already proves the essential thing: the lowerer expands mapper-qualified array sections into the runtime argument model it actually needs.
Why This Design Fits REX Well
The interesting part of this feature is not that it supports declare mapper syntax. Many systems can parse syntax.
The interesting part is that the lowering strategy is modular.
REX separates the problem into layers:
- resolve the visible mapper correctly,
- recursively flatten mapper items into expansion records,
- preserve directional semantics across map, to, and from,
- defer array-section cases instead of forcing them into fake static items,
- and reuse one dynamic builder for every construct that consumes map arrays.
That structure buys REX three things at once.
First, it keeps the mapping model semantically honest. Mapper-qualified array sections are not squeezed through a simpler path that only accidentally works.
Second, it keeps the ABI layer simpler. By the time the runtime-packet builder sees the mapping data, the hard semantic part has already been resolved into either concrete leaf items or well-formed deferred dynamic entries.
Third, it keeps the compiler maintainable. target, target data, and target update reuse the same mapper expansion logic instead of each growing their own subtly different interpretation of the same directive feature.
Closing
declare mapper looks small at the surface syntax level, but it is one of the places where OpenMP mapping stops being “collect a few expressions” and becomes a real lowering problem.
REX handles that by refusing to blur the stages together.
It resolves mappers semantically, expands them recursively, defers section-shaped cases into dynamic entries, and only then lets the runtime-array builder turn the result into concrete arguments.
That is why the generated code for a case like v[0:n] ends up looking explicit and procedural. The lowerer is not being verbose by accident. It is making the mapper semantics concrete enough that the runtime can execute them correctly.