Amdgpu on ./Code

How A GFX803 OpenMP Printf Bug Became An AMDGPU M0 Backend Fix

Sat, 20 Jun 2026 00:00:00 +0000

This was one of those bugs where the first successful fix was not the right fix.

The visible failure was small:

#include <omp.h>
#include <stdio.h>

void foo(void) {
#pragma omp target teams distribute parallel for num_teams(2) num_threads(6)
    for (int i = 0; i < 18; i++)
        printf("teams id = %d, thread id = %d\n", omp_get_team_num(),
               omp_get_thread_num());
}

int main(void) {
    foo();
    return 0;
}

On the working stack, that prints 18 lines and exits normally.

Extending The LLVM OpenMP AMDGPU Stack To GFX906 And Mixed GPUs

Fri, 19 Jun 2026 00:00:00 +0000

The first milestone was one old GPU: a WX3200 (gfx803) on a LoongArch64 host, with LLVM 22 and enough ROCr to run OpenMP target regions.

The next question was practical:

What happens when an MI50 is installed next to it?

The MI50 is a very different card from the WX3200:

MI50 is Vega 20, reported as gfx906.
It supports newer AMDGPU code object ABIs.
It does not need the gfx803 COV4 policy.
It is much closer to the generation LLVM 22 expects.

That made the extension promising. But there was one important constraint: an OpenMP program does not load one ROCr runtime per GPU architecture. One process loads one HSA runtime, and libomptarget uses that runtime to enumerate and launch work on the visible agents.
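Seen from the application, that single runtime appears as one flat list of OpenMP devices. The following is a minimal sketch, not code from the post, of how a program could check which visible devices actually run a target region. It uses only the standard OpenMP routines omp_get_num_devices() and omp_is_initial_device(); the helper names visible_device_count and probe_devices are hypothetical, and the order in which a gfx803 and a gfx906 agent would be numbered is not something this sketch assumes.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices the OpenMP runtime exposes to this process.
   Built without OpenMP support, the sketch reports 0 so it still compiles. */
static int visible_device_count(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Hypothetical helper: run a trivial target region on every visible device
   and return how many of them executed off the host. */
static int probe_devices(void) {
    int n = visible_device_count();
    int offloaded = 0;

    for (int d = 0; d < n; d++) {
        int on_host = 1; /* stays 1 if the region never runs on device d */
#ifdef _OPENMP
#pragma omp target device(d) map(from: on_host)
        { on_host = omp_is_initial_device(); }
#endif
        printf("device %d: %s\n", d,
               on_host ? "fell back to the host" : "ran on the device");
        if (!on_host)
            offloaded++;
    }
    return offloaded;
}
```

Calling probe_devices() from main in a binary built with OpenMP offloading enabled for the installed architectures prints one line per visible device. A device that reports a host fallback is a hint that the image for its architecture was missing or failed to load, though the sketch cannot tell those cases apart.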

Enabling End-To-End LLVM OpenMP AMDGPU Offloading On GFX803 And LoongArch64

Thu, 18 Jun 2026 00:00:00 +0000

The goal sounded small:

Build LLVM 22 with OpenMP GPU offloading and make it run on a Radeon Pro WX 3200.

The actual target was more unusual:

an old Polaris GPU, reported as gfx803;
a LoongArch64 host;
a modern LLVM 22 toolchain;
no desire to rebuild the full ROCm stack.

That combination matters. gfx803 is old enough that current ROCm no longer treats it as a normal supported target, and LoongArch64 is not one of the usual ROCm host architectures. But LLVM OpenMP offloading does not need all of ROCm. It needs a smaller chain to work end to end.