<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Btree on ./Code</title><link>https://blog.ouankou.com/tags/btree/</link><description>Recent content in Btree on ./Code</description><generator>Hugo</generator><language>en-US</language><copyright>© Anjia Wang</copyright><lastBuildDate>Mon, 04 May 2026 12:14:04 -0700</lastBuildDate><atom:link href="https://blog.ouankou.com/tags/btree/index.xml" rel="self" type="application/rss+xml"/><item><title>How REX Recovered b+tree Read-Only Loads With __ldg</title><link>https://blog.ouankou.com/2026/04/27/how-rex-recovered-btree-read-only-loads-with-ldg/</link><pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/27/how-rex-recovered-btree-read-only-loads-with-ldg/</guid><description>&lt;p&gt;The previous post ended with an important constraint: REX was not allowed to win &lt;code&gt;b+tree&lt;/code&gt; by silently shrinking a valid user-requested launch shape. The manual thread-width sweep had found a faster shape, but the source explicitly requested the launch geometry and native LLVM preserved it. That made the optimization useful as a diagnostic, not as a fair compiler rewrite.&lt;/p&gt;
&lt;p&gt;So the remaining &lt;code&gt;b+tree&lt;/code&gt; problem became sharper.&lt;/p&gt;
&lt;p&gt;The launch contract had to stay fair. The direct &lt;code&gt;__tgt_target_kernel&lt;/code&gt; path was already in place. Literal scalar target parameters were already repaired. Cubin registration was no longer the issue. The output matched native LLVM. Yet native LLVM still held a small advantage on &lt;code&gt;b+tree&lt;/code&gt; in fair runs:&lt;/p&gt;</description></item><item><title>How REX Kept b+tree Launch Geometry Fair</title><link>https://blog.ouankou.com/2026/04/26/how-rex-kept-btree-launch-geometry-fair/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/26/how-rex-kept-btree-launch-geometry-fair/</guid><description>&lt;p&gt;The previous post established the general fairness principles for GPU launch geometry. This post applies those rules to a specific case: the &lt;code&gt;b+tree&lt;/code&gt; benchmark, which exposed a performance gap that remained even after the direct &lt;code&gt;__tgt_target_kernel&lt;/code&gt; and ABI migration work.&lt;/p&gt;
&lt;p&gt;At that point, the easy failure modes were mostly gone. The &lt;code&gt;b+tree&lt;/code&gt; benchmark built. It registered its cubin. It launched its kernels. Its output matched native LLVM.&lt;/p&gt;
&lt;p&gt;That left the uncomfortable kind of performance gap: a small one.&lt;/p&gt;</description></item></channel></rss>