<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gpt-5.5 on ./Code</title><link>https://blog.ouankou.com/tags/gpt-5.5/</link><description>Recent content in Gpt-5.5 on ./Code</description><generator>Hugo</generator><language>en-US</language><copyright>© Anjia Wang</copyright><lastBuildDate>Mon, 25 May 2026 13:40:11 -0700</lastBuildDate><atom:link href="https://blog.ouankou.com/tags/gpt-5.5/index.xml" rel="self" type="application/rss+xml"/><item><title>What A Twelve-Day Codex Session Revealed About Persistent Engineering Agents</title><link>https://blog.ouankou.com/2026/05/28/what-a-twelve-day-codex-session-revealed-about-persistent-engineering-agents/</link><pubDate>Thu, 28 May 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/05/28/what-a-twelve-day-codex-session-revealed-about-persistent-engineering-agents/</guid><description>&lt;p&gt;The REX frozen-failure cleanup was a compiler project, but it was also an agent project.&lt;/p&gt;
&lt;p&gt;For twelve days, Codex worked through a large set of historical CTest failures in a real codebase. The task was not a toy benchmark. It involved a Clang frontend, a source-to-source AST, an unparser, token and source-position preservation, OpenMP and Fortran guardrails, midend analyses, generated files, reference outputs, review comments, hooks, and full-suite CTest runs.&lt;/p&gt;
&lt;p&gt;That is the kind of task where many agent demos stop being useful.&lt;/p&gt;</description></item><item><title>How REX Cleaned Up A Thousand Historical Test Failures Without Bounce</title><link>https://blog.ouankou.com/2026/05/25/how-rex-cleaned-up-a-thousand-historical-test-failures-without-bounce/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://blog.ouankou.com/2026/05/25/how-rex-cleaned-up-a-thousand-historical-test-failures-without-bounce/</guid><description>&lt;p&gt;The REX Clang frontend cleanup did not look like one heroic fix.&lt;/p&gt;
&lt;p&gt;It looked like a long sequence of small decisions that could easily have gone wrong. The original full-suite run had roughly one thousand failures. Many were C and C++ frontend failures, but the failure surface was not limited to parsing. Once a source-to-source compiler builds an inconsistent AST, every later layer becomes a possible reporter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;generated source fails to compile,&lt;/li&gt;
&lt;li&gt;token streams stop matching nodes,&lt;/li&gt;
&lt;li&gt;source-position checks report drift,&lt;/li&gt;
&lt;li&gt;name qualification chooses the wrong spelling,&lt;/li&gt;
&lt;li&gt;dataflow and callgraph passes assert on unexpected AST shapes,&lt;/li&gt;
&lt;li&gt;OpenMP and Fortran gates become collateral damage if a broad change moves shared infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is why the campaign was not organized around &amp;ldquo;make the next red test green.&amp;rdquo; It was organized around a stricter rule:&lt;/p&gt;</description></item></channel></rss>