<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Benchmarking on ./Code</title><link>https://blog.ouankou.com/tags/benchmarking/</link><description>Recent content in Benchmarking on ./Code</description><generator>Hugo</generator><language>en-US</language><copyright>© Anjia Wang</copyright><lastBuildDate>Thu, 23 Apr 2026 00:35:06 -0700</lastBuildDate><atom:link href="https://blog.ouankou.com/tags/benchmarking/index.xml" rel="self" type="application/rss+xml"/><item><title>How REX Builds Native LLVM And Generated Variants For Side-By-Side GPU Benchmarking</title><link>https://blog.ouankou.com/2026/04/20/how-rex-builds-native-llvm-and-generated-variants-for-side-by-side-gpu-benchmarking/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/20/how-rex-builds-native-llvm-and-generated-variants-for-side-by-side-gpu-benchmarking/</guid><description>&lt;p&gt;The previous post explained why the GPU benchmark layer in REX should be treated as an investigation surface rather than a scoreboard.&lt;/p&gt;
&lt;p&gt;This post moves one step earlier in that same top-layer contract.&lt;/p&gt;
&lt;p&gt;Before you can compare correctness or performance meaningfully, you need to answer a much more basic question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;what exactly are the two binaries being compared?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That sounds like bookkeeping.&lt;/p&gt;
&lt;p&gt;It is not.&lt;/p&gt;
&lt;p&gt;If the native LLVM side and the REX side are not built under a disciplined contract, then the benchmark result stops meaning what it claims to mean.&lt;/p&gt;</description></item><item><title>Why REX Treats GPU Benchmark Results As An Investigation Surface, Not A Scoreboard</title><link>https://blog.ouankou.com/2026/04/19/why-rex-treats-gpu-benchmark-results-as-an-investigation-surface-not-a-scoreboard/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/19/why-rex-treats-gpu-benchmark-results-as-an-investigation-surface-not-a-scoreboard/</guid><description>&lt;p&gt;The previous post argued that the GPU benchmark layer in REX must stay narrow instead of becoming a catch-all test suite.&lt;/p&gt;
&lt;p&gt;This post narrows the same idea one step further.&lt;/p&gt;
&lt;p&gt;It is about the &lt;em&gt;style&lt;/em&gt; of interpretation at the top layer.&lt;/p&gt;
&lt;p&gt;The benchmark layer is not at its most useful when it says:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;
&lt;table style="border-spacing:0;padding:0;margin:0;border:0;"&gt;&lt;tr&gt;
&lt;td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%"&gt;
&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REX faster
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LLVM faster
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fail
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It is most useful when it says:&lt;/p&gt;</description></item><item><title>Why REX's GPU Benchmark Layer Must Not Become A Catch-All Test Suite</title><link>https://blog.ouankou.com/2026/04/18/why-rex-gpu-benchmark-layer-must-not-become-a-catch-all-test-suite/</link><pubDate>Sat, 18 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/18/why-rex-gpu-benchmark-layer-must-not-become-a-catch-all-test-suite/</guid><description>&lt;p&gt;The previous post argued that real GPU benchmarks still matter in REX because they are the only place where the full offloading stack meets a real application.&lt;/p&gt;
&lt;p&gt;That does not mean benchmarks should become the default place to detect every kind of bug.&lt;/p&gt;
&lt;p&gt;This post is about that boundary.&lt;/p&gt;
&lt;p&gt;The benchmark layer is indispensable, but only if it stays disciplined.&lt;/p&gt;
&lt;p&gt;If it tries to answer every testing question at once, it becomes:&lt;/p&gt;</description></item><item><title>What Only Real GPU Benchmarks Still Catch In REX</title><link>https://blog.ouankou.com/2026/04/17/what-only-real-gpu-benchmarks-still-catch-in-rex/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/17/what-only-real-gpu-benchmarks-still-catch-in-rex/</guid><description>&lt;p&gt;The previous two posts in this series narrowed the benchmark layer into two specific contracts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;fairness in performance comparison,&lt;/li&gt;
&lt;li&gt;and correctness comparison that does not trust naive raw diffs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post steps back one level.&lt;/p&gt;
&lt;p&gt;It asks a simpler question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;after REX already has parser tests, Frontend AST tests, semantic checks, lowering invariant tests, and CPU equivalence tests, why are real GPU benchmarks still necessary at all?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The short answer is that those earlier layers prove narrower things.&lt;/p&gt;</description></item><item><title>How REX Validates Benchmark Correctness Without Trusting Naive Diffs</title><link>https://blog.ouankou.com/2026/04/16/how-rex-validates-benchmark-correctness-without-trusting-naive-diffs/</link><pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/16/how-rex-validates-benchmark-correctness-without-trusting-naive-diffs/</guid><description>&lt;p&gt;The previous post in this series focused on fairness in performance comparison: same runtime stack, same user intent, and the right meaning of time.&lt;/p&gt;
&lt;p&gt;This post covers the correctness half of the same problem.&lt;/p&gt;
&lt;p&gt;The question is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;when a benchmark is supposed to prove that native LLVM and REX still compute the same thing, what exactly counts as &amp;ldquo;the same thing&amp;rdquo;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The answer turned out to require more care than a raw &lt;code&gt;diff&lt;/code&gt;.&lt;/p&gt;</description></item><item><title>How REX Makes Fair GPU Offloading Comparisons Against Native LLVM</title><link>https://blog.ouankou.com/2026/04/15/how-rex-makes-fair-gpu-offloading-comparisons-against-native-llvm/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/15/how-rex-makes-fair-gpu-offloading-comparisons-against-native-llvm/</guid><description>&lt;p&gt;The previous post in this series explained how REX emits &lt;code&gt;omp_offloading_entries&lt;/code&gt; and keeps host and device kernel identity aligned.&lt;/p&gt;
&lt;p&gt;This one moves back up to the benchmark layer, but it stays much narrower than the general benchmark-validation post.&lt;/p&gt;
&lt;p&gt;The question here is not:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;how do we run a benchmark suite at all?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The question is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;when REX and native LLVM are close, noisy, or trading wins across different measurement modes, what makes a comparison fair enough to trust?&lt;/p&gt;</description></item></channel></rss>