What A Twelve-Day Codex Session Revealed About Persistent Engineering Agents

Thu, 28 May 2026 00:00:00 +0000

The REX frozen-failure cleanup was a compiler project, but it was also an agent project.

For twelve days, Codex worked through a large historical CTest failure set in a real codebase. The task was not a toy benchmark. It involved a Clang frontend, a source-to-source AST, an unparser, token and source-position preservation, OpenMP and Fortran guardrails, midend analyses, generated files, reference outputs, review comments, hooks, and full-suite CTest runs.

That is the kind of task where many agent demos stop being useful.

How REX Cleaned Up A Thousand Historical Test Failures Without Bounce

Mon, 25 May 2026 00:00:00 +0000

The REX Clang frontend cleanup did not look like one heroic fix.

It looked like a long sequence of small decisions that could easily have gone wrong. The original full-suite run had roughly one thousand failures. Many were C and C++ frontend failures, but the failure surface was not limited to parsing. Once a source-to-source compiler builds an inconsistent AST, every later layer becomes a possible reporter:

generated source fails to compile,
token streams stop matching nodes,
source-position checks report drift,
name qualification chooses the wrong spelling,
dataflow and callgraph passes assert on unexpected AST shapes,
OpenMP and Fortran gates become collateral damage if a broad change moves shared infrastructure.

That is why the campaign was not organized around “make the next red test green.” It was organized around a stricter rule:

