<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Amdgpu on ./Code</title><link>https://blog.ouankou.com/tags/amdgpu/</link><description>Recent content in Amdgpu on ./Code</description><generator>Hugo</generator><language>en</language><copyright>© Anjia Wang</copyright><lastBuildDate>Sun, 21 Jun 2026 22:58:48 -0700</lastBuildDate><atom:link href="https://blog.ouankou.com/tags/amdgpu/index.xml" rel="self" type="application/rss+xml"/><item><title>How A GFX803 OpenMP Printf Bug Became An AMDGPU M0 Backend Fix</title><link>https://blog.ouankou.com/2026/06/20/how-a-gfx803-openmp-printf-bug-became-an-amdgpu-m0-backend-fix/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/06/20/how-a-gfx803-openmp-printf-bug-became-an-amdgpu-m0-backend-fix/</guid><description>&lt;p&gt;This was one of those bugs where the first successful fix was not the right fix.&lt;/p&gt;
&lt;p&gt;The visible failure was small:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;
&lt;table style="border-spacing:0;padding:0;margin:0;border:0;"&gt;&lt;tr&gt;&lt;td style="vertical-align:top;padding:0;margin:0;border:0;"&gt;
&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 1
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 2
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 3
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 4
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 5
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 6
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 7
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 8
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt; 9
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt;10
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt;11
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt;12
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt;13
&lt;/span&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"&gt;14
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%"&gt;
&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#include&lt;/span&gt; &lt;span style="color:#75715e"&gt;&amp;lt;omp.h&amp;gt;&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#include&lt;/span&gt; &lt;span style="color:#75715e"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;foo&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#pragma omp target teams distribute parallel for num_teams(2) num_threads(6)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;printf&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;teams id = %d, thread id = %d&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\n&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;omp_get_team_num&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;omp_get_thread_num&lt;/span&gt;());
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;main&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;foo&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;On the working stack, that prints 18 lines and exits normally.&lt;/p&gt;</description></item><item><title>Extending The LLVM OpenMP AMDGPU Stack To GFX906 And Mixed GPUs</title><link>https://blog.ouankou.com/2026/06/19/extending-the-llvm-openmp-amdgpu-stack-to-gfx906-and-mixed-gpus/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/06/19/extending-the-llvm-openmp-amdgpu-stack-to-gfx906-and-mixed-gpus/</guid><description>&lt;p&gt;The first milestone was one old GPU: WX3200, &lt;code&gt;gfx803&lt;/code&gt;, LoongArch64, LLVM 22,
and enough ROCr to run OpenMP target regions.&lt;/p&gt;
&lt;p&gt;The next question was practical:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What happens when an MI50 is installed next to it?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The MI50 is a very different card from the WX3200:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MI50 is Vega 20, reported as &lt;code&gt;gfx906&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It supports newer AMDGPU code object ABIs.&lt;/li&gt;
&lt;li&gt;It does not need the code object version 4 (COV4) policy used for &lt;code&gt;gfx803&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It is much closer to the generation LLVM 22 expects.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That made the extension promising. But there was one important constraint: an
OpenMP program does not load one ROCr runtime per GPU architecture. One process
loads one HSA runtime, and &lt;code&gt;libomptarget&lt;/code&gt; uses that runtime to enumerate and
launch work on the visible agents.&lt;/p&gt;</description></item><item><title>Enabling End-To-End LLVM OpenMP AMDGPU Offloading On GFX803 And LoongArch64</title><link>https://blog.ouankou.com/2026/06/18/enabling-end-to-end-llvm-openmp-amdgpu-offloading-on-gfx803-loongarch64/</link><pubDate>Thu, 18 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/06/18/enabling-end-to-end-llvm-openmp-amdgpu-offloading-on-gfx803-loongarch64/</guid><description>&lt;p&gt;The goal sounded small:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Build LLVM 22 with OpenMP GPU offloading and make it run on a Radeon Pro WX
3200.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The actual target was more unusual:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an old Polaris GPU, reported as &lt;code&gt;gfx803&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;a LoongArch64 host;&lt;/li&gt;
&lt;li&gt;a modern LLVM 22 toolchain;&lt;/li&gt;
&lt;li&gt;no desire to rebuild the full ROCm stack.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That combination matters. &lt;code&gt;gfx803&lt;/code&gt; is old enough that current ROCm no longer
treats it as a normal supported target, and LoongArch64 is not one of the usual
ROCm host architectures. But LLVM OpenMP offloading does not need all of ROCm.
It needs a smaller chain to work end to end.&lt;/p&gt;</description></item></channel></rss>