Modernization with Agents: A Playbook

act101 · modernization, agents

A modernization that doesn't preserve the contract isn't a modernization — it's a rewrite with a more polite name. The contract is the characterization suite from the prior round of work, and the agent's job for the next quarter is to keep that suite green while replacing everything underneath it. Frameworks change. Runtimes change. Languages get a version bump. The public surface and the behavior under it do not. That's the whole game, and it's easy to lose by treating the modernization like a normal feature stream.

If you want the verdict in one breath: stage everything, run old and new in parallel, compare at the seam, cut over with a flag — never a deploy. Skip any of that and you don't have a modernization; you have a coin toss on a Friday afternoon.

Why modernization with agents needs its own discipline

Classic modernization assumed a small team, a year of runway, and a tolerance for the kind of regressions you only catch in production. Agents collapse the timeline and inflate the surface they touch at the same time. A single weekend of agent work can produce a diff that touches half the service, builds clean, deploys, and breaks in places nobody thought to look. The standard pipeline doesn't notice because nothing throws an exception.

The diff is too big to read. A framework jump touches every controller, every test, every config. Review fatigue isn't a moral failure; it's the structural condition. Unbounded modernization PRs get waved through.
Tests pass on the new system for the wrong reason. The agent ported the tests alongside the code. Now both sides agree on a behavior the old system never did. The characterization suite is the only honest reference; if it isn't the contract, you have no contract.
Performance regresses silently. The new framework is "modern" and "fast" and tests in milliseconds. Production p99 doubles. Unit tests don't measure latency; characterization tests don't measure latency; nothing in the standard suite catches it.
The strangler fig grows in the wrong direction. Without a named seam, the agent rewrites pieces all over the system. Six weeks in, you can't decommission anything because every part is half-modernized.
Cost overhead becomes permanent. Parallel run was supposed to last six weeks. It's been ninety days. Nobody has the authority to flip the cutover, because nobody owns the proof that the new system is actually equivalent.
The framework's idioms infect the codebase. The agent loves the new framework's patterns. Half the codebase is now in the new idiom while the other half is in the old, and the next modernization is harder than this one was.

The bottom line: classic modernization measured whether the new system shipped. Agentic modernization has to measure whether the contract held while the new system shipped — and those are two completely different questions, with the second one being the one your users will judge you on.

The modernization loop, end to end

modernization-audit  ->  staged-modernization  ->  verify-modernization
        ^                                                    |
        |____________________________________________________|
                       (audit re-scored after each stage)

Three stages, one direction, a back-edge so the audit you started with is the audit you re-score against. The audit names the seams and the contract. Staged modernization executes one seam at a time. Verify proves the contract held — behavior, performance, cost — before the cutover ramps. Skip the audit and the agent picks its own seams. Skip verify and parallel run becomes permanent.

This loop is the canonical pattern for agentic modernization — it predates any one tool, and any team can run it. It has two halves that stay honest in different ways. The structural half finds the seams instead of guessing them, tracks the work as an inventory-and-manifest of what's modernized, in flight, or stubbed instead of leaving it in someone's head, and re-scores the structure to catch the idiom infection that turns a codebase bilingual. The runtime half — routing, flags, the parallel-run diff, the latency and cost gates — proves the cutover is safe. Keep the structure honest and the cutover honest and you have a modernization; drop either and you have a coin toss.

Each stage has a uniquely agentic twist. The audit isn't a migration plan in Notion. The staged modernization isn't a long-running branch. The verify isn't a green CI run.

Stage 1 · modernization-audit

The single most common reason modernizations stall isn't the new framework — it's that nobody decided what was actually being preserved before they started porting code. Without that decision, every disagreement becomes a debate, every rollback becomes a negotiation, and the audit becomes whatever the loudest engineer remembered last week. Do four things:

State the from and the to in writing. Framework version, runtime, language version, deployment target. Not "we're modernizing checkout" — exactly what's changing on each axis.
Name the contract. The characterization suite is the spec. The public API definition is the spec. The SLA numbers are the spec. If it isn't named in the audit, it isn't a contract.
Identify the seams. Where can the new system run independently of the old? HTTP controllers, RPC adapters, scheduled jobs, queue consumers, database access layers. The seam list determines the staging plan; without seams, you have a rewrite. Dependency-graph and layering analysis is how you find the real seams instead of the obvious ones — the boundaries where coupling is already thin enough that one side can run without the other, surfaced from the structure rather than from folklore about where the modules "should" split.
Decide what's out of scope. No new features. No opportunistic refactors. No public-API changes during the modernization. Every exception to this list is a stage your timeline didn't budget for.

The framework's idioms are not behavior. You can change them freely. The behavior is what the characterization suite pins. Be ruthless about this distinction or the audit will quietly grow a hundred sub-scopes nobody noticed.

# audits/checkout-modernization.2026-05-30.yaml — one file, in git, one SHA.
audit:
  target: services/checkout
  from:
    framework: rails 5.2
    runtime: heroku
    language: ruby 2.7
  to:
    framework: rails 7.1
    runtime: kubernetes (eks)
    language: ruby 3.3
  contract:
    characterization_suite: audits/checkout-suite.2026-05-30.yaml@a1b2c3d
    public_api: openapi/checkout.v3.yaml@d4e5f6
    sla:
      p50_latency_ms: 120
      p99_latency_ms: 480
      throughput_rps: 800
      error_rate: 0.001
  seams:
    - id: order.controller
      kind: http
      strategy: strangler-fig
    - id: payment.adapter
      kind: rpc
      strategy: parallel-run
    - id: inventory.cron
      kind: scheduled
      strategy: dark-launch
  staging:
    - stage: 1
      seam: order.controller
      ramp: [shadow, 5%, 25%, 100%]
    - stage: 2
      seam: payment.adapter
      ramp: [shadow, 5%, 25%, 100%]
    - stage: 3
      seam: inventory.cron
      ramp: [shadow, 100%]
  out_of_scope:
    - new features during modernization
    - public-API changes
    - refactors not on a named seam
  budgets:
    timeline_weeks: 14
    parallel_run_window_days: 30
    peak_cost_overhead_pct: 25
    decommission_by: 2026-09-30

The biggest single thing most teams skip: writing a decommission date in the audit. Without one, parallel run is forever, peak cost overhead is permanent, and the new system is just the second of two production systems you're now paying to operate. The date is part of the audit; missing it is the audit failing.
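A minimal sketch of "missing it is the audit failing," assuming the audit YAML has already been loaded into a plain dict and that decommission_by arrives as a date. The function name audit_problems and the list of required sections are illustrative, not part of any tool.

```python
from datetime import date

# Sections the playbook says an audit must name. The tuple itself is an assumption.
REQUIRED_SECTIONS = ("from", "to", "contract", "seams", "out_of_scope", "budgets")


def audit_problems(audit: dict, today: date) -> list[str]:
    """Return the reasons this audit is incomplete. An empty list means it passes."""
    problems = [f"missing section: {key}" for key in REQUIRED_SECTIONS if not audit.get(key)]

    # The decommission date is part of the audit, so its absence is a failure.
    decommission_by = (audit.get("budgets") or {}).get("decommission_by")
    if decommission_by is None:
        problems.append("budgets.decommission_by is missing")
    elif decommission_by <= today:
        problems.append("budgets.decommission_by is not in the future")

    # Every staged seam must be one of the named seams.
    seam_ids = {seam["id"] for seam in audit.get("seams") or []}
    for stage in audit.get("staging") or []:
        if stage["seam"] not in seam_ids:
            problems.append(f"stage {stage['stage']} targets unnamed seam {stage['seam']}")
    return problems
```

Run it in CI against the audit file so an audit without a date never gets signed in the first place.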

Stage 2 · staged-modernization

This is the stage where most "modernizations" actually become slow-motion rewrites. The defense against that is procedural, not technical: one stage, one seam, one PR. The agent doesn't decide which seam comes next; the audit does.

What keeps "which seam comes next" from living in chat history is a durable artifact: an inventory of every symbol that has to move, a dependency-aware order so seams are taken leaf-first, and a manifest recording what's modernized, what's stubbed, and what's deferred and why. That manifest is the difference between "the agent thinks it's about 60% done" and a checkable record of exactly which seams are live, which are dark, and which haven't been touched.

The shape that works:

One PR per seam. A PR that modernizes the order controller is reviewable. A PR that modernizes "checkout" is a six-week branch nobody can rebase. If the agent's plan crosses two seams, that's a planning failure — fix the audit.
Parallel run is the default. The new implementation runs alongside the old, fed the same input, output compared at the seam. Shadow-only is the fallback when parallel-run isn't possible (one-way side effects). "We tested it in staging" isn't a strategy.
Cutover is a feature flag, not a deploy. The new code ships dark. Traffic ramps via a flag. Rollback is a flag flip, not a redeploy. A deploy-gated cutover is a 2008 modernization.
The ramp is in the audit, not in someone's head. Shadow → 5% → 25% → 100%, with a hold period between rungs. Any rung skipped is a process violation, not a judgment call.
Tenant-gated when traffic is non-uniform. A flag that ramps by traffic percentage will burn the same handful of high-volume tenants every time. Per-tenant cutover scopes the blast radius.
The diff cites the seam. Each hunk references the seam ID it addresses. Hunks that don't cite anything get cut before review.
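One way to read "tenant-gated" from the list above: bucket tenants, not requests, so each tenant sits entirely on one side of the flag and stays there as the ramp widens. This is a sketch under that assumption; the function names and the hold-back set are hypothetical and do not mirror any particular flags backend.

```python
import hashlib


def bucket(tenant_id: str, flag: str) -> float:
    """Map a tenant to a stable position in [0, 100) for this flag, at 0.01 resolution."""
    digest = hashlib.sha256(f"{flag}:{tenant_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100


def use_new_path(tenant_id: str, flag: str, ramp_pct: float,
                 held_back: frozenset[str] = frozenset()) -> bool:
    """Decide per tenant. Tenants in held_back stay on the old path at every rung."""
    if tenant_id in held_back:
        return False
    return bucket(tenant_id, flag) < ramp_pct
```

Because the bucket depends only on the flag and tenant, a tenant admitted at 5% is still admitted at 25%, and rollback for one bad tenant is a single entry in held_back.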

One stage = one declarative artifact. Audit SHA, seam ID, strategy, ramp position, parallel-run start, diff threshold. All in the PR. The agent's plan in markdown, the run trace attached. Future you needs to know exactly what changed at this seam without spelunking through chat history.

# stages/2026-05-30-order-controller.yaml
stage:
  audit: audits/checkout-modernization.2026-05-30.yaml@a1b2c3d
  addresses: order.controller
  strategy: strangler-fig
  ramp_today: shadow
  ramp_next:
    on_verify_pass: 5%
    hold_period_hours: 24
  parallel_run:
    started: 2026-05-25
    diff_threshold: 0.001
    primary: old
    shadow: new
  api_compatibility:
    contract: openapi/checkout.v3.yaml@d4e5f6
    breaking_changes: []           # non-empty = block
  verify:
    plan: verify/2026-05-30-order-controller.yaml
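The stage file above carries everything a ramp decision needs, so the decision can be a function. A minimal sketch, assuming the ladder comes from the audit and the timestamp of the current rung is recorded somewhere; blocking_reasons and its parameters are illustrative names.

```python
from datetime import datetime, timedelta


def blocking_reasons(ramp: list[str], current: str, proposed: str,
                     entered_current_at: datetime, hold_hours: int, now: datetime,
                     verify_passed: bool, breaking_changes: list[str]) -> list[str]:
    """Return why the stage may not move from current to proposed (empty = may ramp)."""
    reasons = []
    position = ramp.index(current)  # ValueError if the rung is not on the audit's ladder
    expected = ramp[position + 1] if position + 1 < len(ramp) else None
    if expected is None:
        reasons.append(f"{current} is already the last rung")
    elif proposed != expected:
        reasons.append(f"{proposed} skips the ladder; the next rung is {expected}")
    if now - entered_current_at < timedelta(hours=hold_hours):
        reasons.append(f"hold period of {hold_hours}h not yet observed")
    if not verify_passed:
        reasons.append("verify gate is not green")
    if breaking_changes:  # mirrors "non-empty = block" in the stage file
        reasons.append(f"breaking API changes listed: {breaking_changes}")
    return reasons
```

An empty list is the only result that ramps; anything else is printed in the PR.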

Stage 3 · verify-modernization

Verify is the stage that decides whether the cutover ramps. Without it, every ramp decision is "looks fine, ship it." With it, every ramp decision is mechanical — the gate passed or it didn't, the parallel-run diff is under threshold or it isn't, the p99 is within the contract or it isn't. The job becomes watching the trend lines, not arguing about them.

Verify has to do more than green up the characterization suite on the new system (the agent ported the suite; of course it's green). It needs seven signals, and the modernization-specific ones aren't optional:

Characterization suite. The old suite, unmodified, running against the new system. Anything below 100% is a contract break, full stop.
Parallel-run diff. Real production traffic, fed to both systems, outputs compared at the seam. A byte-level diff under a tight threshold (typically <0.1%). Non-determinism allowances are named in the audit, not invented at gate time.
Performance contract. p50, p99, throughput against the audit's SLA. Latency budgets are part of the contract; a 4× p99 with a green suite is a failed modernization.
API contract. OpenAPI/Protobuf diff against the public definition. Any breaking change is a hard block, period.
Cost overhead. Current spend on the modernization (old + new + parallel-run infra) against the audit's peak budget and against the decommission date. Overshoot triggers a re-audit.
Behavior trace diff. Span-level comparison of representative traces. Catches the case where the output matches but the system did the work three times to get there.
Structural drift. Re-run structural analysis scoped to the active seam and its neighbors. The new framework's idioms must not have leaked outside the seam, no new cycle or layering break may have appeared, and the un-modernized half must still read in the old idiom. This is the gate that catches "the codebase went bilingual" while it's one seam wide instead of a quarter later — a structural check the runtime gates above are blind to.
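The behavior trace diff in the list above is the least familiar of these signals, so here is a rough sketch of its core idea: compare how many spans of each operation the old and new systems emit for the same request. Representing a trace as a flat list of operation names is a simplifying assumption; real traces are trees with timings.

```python
from collections import Counter


def shape_regressions(old_trace: list[str], new_trace: list[str],
                      tolerance: int = 0) -> dict[str, tuple[int, int]]:
    """Return operations the new system performed more often than the old one.

    Each value is (old_count, new_count). Operations missing from the old
    trace count as zero, so brand-new work shows up too.
    """
    old_counts, new_counts = Counter(old_trace), Counter(new_trace)
    return {
        operation: (old_counts[operation], new_counts[operation])
        for operation in new_counts
        if new_counts[operation] > old_counts[operation] + tolerance
    }
```

Two responses can match byte for byte while this function reports that the new path issued the same query three times.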

Don't ship a single number. Verify reports per dimension: characterization pass, parallel-run diff, p99 ratio, throughput ratio, contract breaks, cost overhead, trace diff, structural drift. "Everything's fine" hides the case where p99 quietly drifted up 30% over a week.

# verify/run.py: a tiny but real modernization verify gate.
import operator
import sys

from verify import characterize, parallel, perf, contract, cost, traces

target = "services/checkout"
audit  = "audits/checkout-modernization.2026-05-30.yaml"  # the audit these limits come from

scores = {
    "characterization":  characterize.run(target, suite_sha="a1b2c3d"),
    "parallel_run_diff": parallel.diff(window="24h"),
    "p99_ratio":         perf.ratio("p99_latency"),
    "throughput_ratio":  perf.ratio("throughput"),
    "contract_breaks":   contract.violations(spec="openapi/checkout.v3.yaml"),
    "cost_overhead":     cost.overhead_pct(),
    "trace_diff":        traces.diff(samples=1000),
}

# Each gate is (comparison, limit). A score passes when comparison(score, limit) is true,
# so the direction of every dimension is explicit instead of special-cased.
THRESHOLDS = {
    "characterization":  (operator.ge, 1.00),   # behavior preservation is binary
    "parallel_run_diff": (operator.ge, 0.999),  # match rate; <0.1% byte-level diff
    "p99_ratio":         (operator.le, 1.10),   # ≤10% slower than baseline
    "throughput_ratio":  (operator.ge, 0.95),   # ≥95% of baseline throughput
    "contract_breaks":   (operator.le, 0),      # any breaking change blocks
    "cost_overhead":     (operator.le, 25.0),   # percent; assumes overhead_pct() returns 25.0 for 25%
    "trace_diff":        (operator.ge, 0.98),   # ≥98% span-shape similarity
}

failed = {name: scores[name]
          for name, (passes, limit) in THRESHOLDS.items()
          if not passes(scores[name], limit)}
if failed:
    print("VERIFY GATE FAILED:", failed)
    sys.exit(1)
print("VERIFY GATE PASSED:", scores)
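The parallel_run_diff score in the gate has to come from somewhere. A minimal sketch of the comparison at the seam, assuming responses are JSON-like dicts and that the audit names which top-level fields are allowed to differ; the field names here are hypothetical. Note the swap: this compares canonical JSON after removing the allowances rather than raw bytes.

```python
import json
from collections.abc import Iterable

# Hypothetical non-determinism allowances. The playbook says these live in the audit.
IGNORED_FIELDS = frozenset({"request_id", "generated_at"})


def normalize(payload: dict, ignored: frozenset[str] = IGNORED_FIELDS) -> str:
    """Drop allowed top-level fields and serialize with a stable key order."""
    kept = {key: value for key, value in payload.items() if key not in ignored}
    return json.dumps(kept, sort_keys=True, separators=(",", ":"))


def mismatch_rate(pairs: Iterable[tuple[dict, dict]],
                  ignored: frozenset[str] = IGNORED_FIELDS) -> float:
    """Share of (old, new) response pairs that differ after normalization."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no paired responses to compare")
    mismatches = sum(normalize(old, ignored) != normalize(new, ignored) for old, new in pairs)
    return mismatches / len(pairs)
```

A gate that expects a match rate, as the one above appears to, would use 1 - mismatch_rate(pairs). Refusing to score an empty window matters: zero traffic compared is not zero diff.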

The merge gate

The gate stops a stage from ramping. The minimum that earns its keep:

Audit SHA present and unmodified. The stage cites a real audit at a real revision. The audit hasn't been edited in the same PR.
One PR, one seam. PRs that span seams auto-split.
Every hunk cites a seam. Hunks with no audit reference are cut.
Verify gate green on every dimension. Characterization, parallel-run diff, p99, throughput, contract, cost, trace diff, structural drift: all individually green. No "overall pass."
Hold period observed. The previous ramp rung has held for the audit's required period. Skipping ahead because "it looked fine for an hour" is a process violation.
Decommission date still credible. If burn rate against the budget projects past the decommission date, the stage doesn't ramp — the audit re-opens.

A stage with all six is rampable. A stage with five of six is interesting and goes back to whichever dimension is missing.
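The six checks can be held as one record per stage, which makes "five of six" something the gate can name instead of something a reviewer has to notice. The field names below are illustrative; how each boolean is computed is up to your CI.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StageEvidence:
    """One boolean per merge-gate check, in the order the playbook lists them."""
    audit_sha_present_and_unmodified: bool
    one_pr_one_seam: bool
    every_hunk_cites_seam: bool
    verify_green_on_every_dimension: bool
    hold_period_observed: bool
    decommission_date_credible: bool


def missing_checks(evidence: StageEvidence) -> list[str]:
    """Names of the checks that are not satisfied."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def rampable(evidence: StageEvidence) -> bool:
    """A stage ramps only when every check holds. There is no overall score."""
    return not missing_checks(evidence)
```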

Best practices, in plain English

Run the testing uplift first. A modernization without a characterization suite is a rewrite with optimistic phrasing. The suite is the contract; without it, there is none.
The framework's idioms are not behavior. Change them freely. The user-visible behavior is what the contract pins.
One seam at a time. Modernizations that try to flip two seams in one PR end up flipping neither cleanly.
Parallel run is the default; shadow is the fallback. Always compare at the seam. "We deployed and watched the metrics" is not a comparison.
Cutover is a flag, never a deploy. Rollback should be measured in seconds, not in pipelines.
Tenant-gate the ramp. Per-tenant flags are the difference between "one bad customer rolls back" and "every customer feels the rollback."
Performance is a contract dimension. Bake p50, p99, and throughput into the audit. A green suite with a 4× p99 is a failed modernization.
Cost is bounded and dated. Peak overhead has a number; decommission has a date. Both live in the audit.
Pin the new system's framework version. Modernizing onto a moving target adds a second modernization mid-flight.
Decommission is a deliverable. The artifact that retires the old system has a date and an owner. "We'll get to it" is how parallel run becomes permanent.

Failure modes & gotchas

These have actually taken teams down. Every one has a one-week fix nobody had time for.

Big-bang Friday. Three months of careful staging, then someone flipped 100% on a Friday afternoon "to start the weekend with it clean." Fix: ramp rungs and hold periods are enforced by the gate, not by goodwill.
The suite was green; p99 doubled. Characterization tests don't measure latency. Production users do. Fix: performance contract is a verify dimension with thresholds.
Scope creep. "While we're modernizing, let's also rebuild the email layer." Fix: out-of-scope list in the audit, and the gate rejects PRs that touch it.
The opportunistic refactor. The agent saw a smell on its way through. It "just cleaned it up." Now your diff is half modernization, half refactor, all unreviewable. Fix: hunks must cite a seam; rejects fire automatically.
Parallel run became permanent. Six weeks turned into nine months. Cost overhead is now part of the run rate. Fix: decommission date in the audit; the gate fails when burn rate projects past it.
The seam leaked. Clients had been depending on transitive behavior of the old implementation — a header you didn't document, a response field you didn't promise. Fix: behavior trace diff catches what unit tests miss; surface these to the audit before ramping.
The new framework's idioms infected the half that wasn't modernized yet. Now the codebase is bilingual. Fix: edits outside the active seam are blocked at the gate during a stage.
Modernization PRs became unmergeable. Main moved. Rebase hell ensued. The branch died. Fix: smaller PRs, scheduled merge windows, never a long-running modernization branch.

The gotcha behind half of them: the modernization wasn't bounded in writing. An audit with seams, an out-of-scope list, a peak cost, and a decommission date prevents most of this list before it ships.
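Two of the fixes above ("hunks must cite a seam" and "edits outside the active seam are blocked") reduce to the same check: does every changed path belong to the active seam? A sketch, assuming each seam is mapped to path prefixes. The mapping and paths below are hypothetical, and a real gate would likely work per hunk rather than per file.

```python
# Hypothetical mapping from seam ID to the path prefixes that seam owns.
SEAM_PATHS = {
    "order.controller": ("app/controllers/orders/", "spec/controllers/orders/"),
    "payment.adapter": ("app/adapters/payment/", "spec/adapters/payment/"),
}


def files_outside_seam(changed_files: list[str], seam_id: str,
                       seam_paths: dict[str, tuple[str, ...]] = SEAM_PATHS) -> list[str]:
    """Return changed files that do not belong to the active seam."""
    allowed = seam_paths[seam_id]  # KeyError means the PR cites a seam the audit never named
    # str.startswith accepts a tuple, so one call checks every allowed prefix.
    return [path for path in changed_files if not path.startswith(allowed)]
```

Feed it the output of your diff tool's changed-file list; any non-empty result rejects the PR before a human sees it.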

Cost, parallel-run overhead, and the agent budget

Cost isn't a finance problem during modernization — it's a commitment problem. The parallel-run window doubles infrastructure for the targeted seam. That's fine for six weeks. It is not fine for six months, and the difference between those two is the audit's decommission date, enforced.

Peak overhead has a number. 25% is a reasonable starting point; whatever you pick, it lives in the audit and the gate watches it.
Decommission has a date. Burn-rate projection past the date fails the next ramp. The audit re-opens.
Cap timeline. Modernizations that take more than four months are de facto rewrites with a polite name. Either compress the seam list or admit it's a rewrite and run a different playbook.
Cap agent context spend. Tokens per stage. A stage that cost 900k tokens didn't understand the seam; it summarized it badly five times.
Cap reviewer load. Diffs above a hunk count auto-split.
Cheap models for the boring stages. Idiom-style ports (Rails 5 → 7 controller syntax) on a small model; cross-runtime work (sync → async, monolith → modules) on the strong one. Routing pays for itself within a week.
Old system is not a free dependency. While both run, the old system still has security obligations, patches, on-call rotation. Don't pretend it's already retired.
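The first two caps in this list are arithmetic, so the gate can do the arithmetic. A rough sketch, assuming "burn rate" is measured as average days per seam so far; that definition, like the function name, is an assumption, and you may prefer a spend-based projection.

```python
import math
from datetime import date, timedelta


def budget_problems(today: date, seams_remaining: int, days_per_seam: float,
                    decommission_by: date, overhead_pct: float,
                    peak_overhead_pct: float) -> list[str]:
    """Return the budget reasons the next ramp should not proceed (empty = proceed)."""
    problems = []
    # Project the finish date from the pace observed so far.
    projected_finish = today + timedelta(days=math.ceil(seams_remaining * days_per_seam))
    if projected_finish > decommission_by:
        problems.append(
            f"projected finish {projected_finish} is past decommission date {decommission_by}")
    if overhead_pct > peak_overhead_pct:
        problems.append(
            f"cost overhead {overhead_pct}% exceeds peak budget {peak_overhead_pct}%")
    return problems
```

Either problem re-opens the audit; neither is waived in the PR thread.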

How act101 fits the loop

The loop is tool-agnostic. The cutover half (routing, flags, traffic comparison, latency and cost gates) is operational tooling; reach for one of each and no more: a router (nginx/Envoy/Istio), a flags backend (Unleash/Flagsmith/GrowthBook), a traffic comparator (Scientist/Diffy), a load tool (k6/Gatling), and a contract tool (Pact/openapi-diff/buf).

Where act101 fits is the structural half, which maps onto its analyze → act → attest spine:

Analyze the seams — dependency-graph, layering, coupling, and cluster/seam analysis surface the thin-coupling boundaries where one side can run without the other, read from the structure rather than guessed.
Act on one seam — a port inventory and manifest track the work as a durable artifact (every symbol to move, a dependency-aware leaf-first order, and a record of what's modernized, stubbed, or deferred), and deterministic refactor operations execute the structural edits within the seam.
Attest the structure held — re-running the structural analyzers scoped to the active seam is the idiom-infection and layering-drift gate: no new cycle, no layer break, no framework idiom leaking past the seam. Pair it with the runtime gates that decide whether the cutover is safe.

It's callable as MCP tools from inside an agent (Cursor, Claude Code, any MCP host) and as the act CLI for the structural gates in CI. act101 keeps the structure honest; operational tooling keeps the cutover honest — and the maturity gain is in running the loop, not in stacking more vendors.

The maturity ladder

Most teams don't sit at one tier — they're advanced on flagging and primitive on parallel run, or have great contract tests and no decommission discipline. Tick what you actually do today.

[ ] A characterization suite for the system exists and is the named contract
[ ] The modernization-audit names the from, the to, the seams, and the contract
[ ] Each stage targets exactly one seam, in one PR
[ ] Parallel run is the default; shadow is the documented fallback
[ ] Cutover is a feature flag, ramped per tenant
[ ] Behavior trace diff runs on production traffic
[ ] Performance (p99, throughput) is a verify-gate dimension with thresholds
[ ] Cost overhead has a peak and a decommission date in the audit
[ ] Rollback shifts traffic in seconds; redeploy is the fallback
[ ] The decommission artifact exists in git from the day the audit is signed

Zero to three: rewrite-in-progress, just with better branding. Four to six: real modernization, porous — the regressions that show up are the ones the gate didn't measure. Seven to nine: the cutover ramps mechanically; the team's energy moves to the next seam. Ten: the audit re-scores per stage, decommission lands on time, and the next modernization on this service is half as hard as this one.

A reasonable 30 / 60 / 90-day plan

Days 1–30 — get to honest. Write the modernization-audit. Stand up the strangler infrastructure: router, flags, parallel-run harness, comparator. Pick the smallest seam from the audit. Don't cut anything yet. You're not modernizing; you're building the substrate that makes modernization mechanical.
Days 31–60 — run one seam end-to-end. Shadow → 5% → 25% → 100% on the smallest seam, with hold periods at every rung. The pipeline now demonstrably works on something. The team has a worked example of every gate dimension passing or failing.
Days 61–90 — roll the loop. Subsequent seams move through the same pipeline at higher cadence. The audit re-scores per stage. The decommission artifact exists; the date is real. The cost overhead trend is down, not up.

What Fowler got right (and where this fits in the broader process)

Martin Fowler's StranglerFigApplication article, originally written in 2004, is the modernization technique this whole playbook depends on. The vine grows around the old tree, replaces it piece by piece, and one day the tree is gone — the structure that replaced it bears its own weight. The technique is decades old and still load-bearing, because every alternative is either a big-bang rewrite (the failure mode the technique was invented to avoid) or a permanent dual-running of two systems (the failure mode teams fall into when they don't enforce a decommission date).

Agentic modernization is what makes the strangler fig usable at the cadence and scope modern systems actually need: the audit picks the seams from dependency analysis, the staging executes them one at a time against a tracked inventory-and-manifest, the verify re-scores the structure — and the characterization suite from the prior round of work is the contract that makes "the contract held" mean something concrete. Read Fowler's original article alongside this playbook; the overlap is the part that's actually load-bearing. That is what turns "grow the vine, retire the tree" from a metaphor into a tracked, re-scored sequence.

This is the stage of the work where the prior rounds cash in. The testing uplift produced the suite; the modernization treats that suite as a contract. The refactoring loop ran at the file level; this loop runs at the system level. The mechanics are the same — audit, execute, verify, with a versioned artifact at each step — and the discipline is what carries between them. None of it is new. What's new is that the agents make it tractable at a cadence and a scope that used to require a team of ten and a year of patience.

The shortest possible summary: name the contract, then never break it. Stage every change. Run old and new in parallel. Compare at the seam. Cut over with a flag, never a deploy. Decommission is a dated artifact, not a deferred wish. Trace every run. That loop, run boringly seam after seam until the decommission date, is how you ship a modernization that doesn't quietly become a second production system you can't afford to retire.
