<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Profiling on ./Code</title><link>https://blog.ouankou.com/tags/profiling/</link><description>Recent content in Profiling on ./Code</description><generator>Hugo</generator><language>en-US</language><copyright>© Anjia Wang</copyright><lastBuildDate>Mon, 04 May 2026 12:42:33 -0700</lastBuildDate><atom:link href="https://blog.ouankou.com/tags/profiling/index.xml" rel="self" type="application/rss+xml"/><item><title>How REX Separated GPU-Total From Wall-Clock Noise In pathfinder And srad</title><link>https://blog.ouankou.com/2026/04/28/how-rex-separated-gpu-total-from-wall-clock-noise-in-pathfinder-and-srad/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/04/28/how-rex-separated-gpu-total-from-wall-clock-noise-in-pathfinder-and-srad/</guid><description>&lt;p&gt;The previous post closed the last obvious fair &lt;code&gt;b+tree&lt;/code&gt; kernel-body gap. REX was no longer relying on an unfair launch-shape rewrite, and it no longer needed a global cache flag. It recovered read-only provenance in the generated device kernel and emitted selective &lt;code&gt;__ldg(...)&lt;/code&gt; loads where the proof was strong enough.&lt;/p&gt;
&lt;p&gt;That left a strange-looking benchmark table.&lt;/p&gt;
&lt;p&gt;Some rows were clearly resolved. &lt;code&gt;b+tree&lt;/code&gt; had moved from a fair loss to a clear REX win. Several other benchmarks already showed stable REX advantages. But three rows still looked suspicious when judged only by the broad comparison table:&lt;/p&gt;</description></item></channel></rss>