Modernization with Agents: A Playbook

A modernization that doesn't preserve the contract isn't a modernization — it's a rewrite with a more polite name. The contract is the characterization suite from the prior round of work, and the agent's job for the next quarter is to keep that suite green while replacing everything underneath it. Frameworks change. Runtimes change. Languages get a version bump. The public surface and the behavior under it do not. That's the whole game, and it's easy to lose by treating the modernization like a normal feature stream.

If you want the verdict in one breath: stage everything, run old and new in parallel, compare at the seam, cut over with a flag — never a deploy. Skip any of that and you don't have a modernization; you have a coin toss on a Friday afternoon.

Why modernization with agents needs its own discipline

Classic modernization assumed a small team, a year of runway, and a tolerance for the kind of regressions you only catch in production. Agents collapse the timeline and inflate the surface they touch at the same time. A single weekend of agent work can produce a diff that touches half the service, builds clean, deploys, and breaks in places nobody thought to look. The standard pipeline doesn't notice because nothing throws an exception.

The bottom line: classic modernization measured whether the new system shipped. Agentic modernization has to measure whether the contract held while the new system shipped. Those are different questions, and the second is the one your users will judge you on.

The modernization loop, end to end

modernization-audit  ->  staged-modernization  ->  verify-modernization
        ^                                                    |
        |____________________________________________________|
                       (audit re-scored after each stage)

Three stages, one direction, a back-edge so the audit you started with is the audit you re-score against. The audit names the seams and the contract. Staged modernization executes one seam at a time. Verify proves the contract held — behavior, performance, cost — before the cutover ramps. Skip the audit and the agent picks its own seams. Skip verify and parallel run becomes permanent.

This loop is the canonical pattern for agentic modernization — it predates any one tool, and any team can run it. It has two halves that stay honest in different ways. The structural half finds the seams instead of guessing them, tracks the work as an inventory-and-manifest of what's modernized, in flight, or stubbed instead of leaving it in someone's head, and re-scores the structure to catch the idiom infection that turns a codebase bilingual. The runtime half — routing, flags, the parallel-run diff, the latency and cost gates — proves the cutover is safe. Keep the structure honest and the cutover honest and you have a modernization; drop either and you have a coin toss.

Each stage has a uniquely agentic twist. The audit isn't a migration plan in Notion. The staged modernization isn't a long-running branch. The verify isn't a green CI run.

Stage 1 · modernization-audit

The single most common reason modernizations stall isn't the new framework — it's that nobody decided what was actually being preserved before they started porting code. Without that decision, every disagreement becomes a debate, every rollback becomes a negotiation, and the audit becomes whatever the loudest engineer remembered last week. Do four things:

  1. State the from and the to in writing. Framework version, runtime, language version, deployment target. Not "we're modernizing checkout" — exactly what's changing on each axis.
  2. Name the contract. The characterization suite is the spec. The public API definition is the spec. The SLA numbers are the spec. If it isn't named in the audit, it isn't a contract.
  3. Identify the seams. Where can the new system run independently of the old? HTTP controllers, RPC adapters, scheduled jobs, queue consumers, database access layers. The seam list determines the staging plan; without seams, you have a rewrite. Dependency-graph and layering analysis is how you find the real seams instead of the obvious ones — the boundaries where coupling is already thin enough that one side can run without the other, surfaced from the structure rather than from folklore about where the modules "should" split.
  4. Decide what's out of scope. No new features. No opportunistic refactors. No public-API changes during the modernization. Every exception to this list is a stage your timeline didn't budget for.
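One way to make item 3 concrete is to count how many dependency edges cross each package boundary and treat the thinnest boundaries as seam candidates. The sketch below is illustrative only: the edge list, the module names, and the `max_crossings` cutoff are all assumptions, and a real dependency graph would come from whatever analysis tooling the team already runs.

```python
from collections import defaultdict

# Hypothetical import edges (caller -> callee); not taken from any real service.
EDGES = [
    ("orders.controller", "orders.service"),
    ("orders.controller", "db.access"),
    ("orders.service", "db.access"),
    ("orders.service", "payments.adapter"),
    ("orders.service", "inventory.client"),
    ("payments.adapter", "payments.gateway"),
    ("inventory.cron", "inventory.client"),
    ("inventory.client", "db.access"),
]


def package(module: str) -> str:
    """Top-level package of a dotted module name."""
    return module.split(".", 1)[0]


def coupling_by_boundary(edges):
    """Count edges that cross each pair of packages, ignoring direction."""
    counts = defaultdict(int)
    for src, dst in edges:
        a, b = package(src), package(dst)
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


def seam_candidates(edges, max_crossings=1):
    """Boundaries with at most `max_crossings` crossing edges, sorted by name."""
    coupling = coupling_by_boundary(edges)
    return sorted(pair for pair, n in coupling.items() if n <= max_crossings)


if __name__ == "__main__":
    print(coupling_by_boundary(EDGES))
    print(seam_candidates(EDGES))
```

A boundary with many crossing edges is not a seam yet; it is a decoupling task that belongs in the audit before any stage is scheduled against it.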

The framework's idioms are not behavior. You can change them freely. The behavior is what the characterization suite pins. Be ruthless about this distinction or the audit will quietly grow a hundred sub-scopes nobody noticed.

# audits/checkout-modernization.2026-05-30.yaml — one file, in git, one SHA.
audit:
  target: services/checkout
  from:
    framework: rails 5.2
    runtime: heroku
    language: ruby 2.7
  to:
    framework: rails 7.1
    runtime: kubernetes (eks)
    language: ruby 3.3
  contract:
    characterization_suite: audits/checkout-suite.2026-05-30.yaml@a1b2c3d
    public_api: openapi/checkout.v3.yaml@d4e5f6
    sla:
      p50_latency_ms: 120
      p99_latency_ms: 480
      throughput_rps: 800
      error_rate: 0.001
  seams:
    - id: order.controller
      kind: http
      strategy: strangler-fig
    - id: payment.adapter
      kind: rpc
      strategy: parallel-run
    - id: inventory.cron
      kind: scheduled
      strategy: dark-launch
  staging:
    - stage: 1
      seam: order.controller
      ramp: [shadow, 5%, 25%, 100%]
    - stage: 2
      seam: payment.adapter
      ramp: [shadow, 5%, 25%, 100%]
    - stage: 3
      seam: inventory.cron
      ramp: [shadow, 100%]
  out_of_scope:
    - new features during modernization
    - public-API changes
    - refactors not on a named seam
  budgets:
    timeline_weeks: 14
    parallel_run_window_days: 30
    peak_cost_overhead_pct: 25
    decommission_by: 2026-09-30

The biggest single thing most teams skip: writing a decommission date in the audit. Without one, parallel run is forever, peak cost overhead is permanent, and the new system is just the second of two production systems you're now paying to operate. The date is part of the audit; missing it is the audit failing.
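A check like the following could reject an audit before any stage starts. It is a sketch under assumptions: the audit is taken to be already parsed into a plain dict with the same keys as the YAML above, and the function name `audit_problems` and its messages are hypothetical.

```python
from datetime import date


def audit_problems(audit: dict) -> list[str]:
    """Return human-readable reasons the audit is not ready to drive stages."""
    problems = []

    seam_ids = {seam["id"] for seam in audit.get("seams", [])}
    if not seam_ids:
        problems.append("no seams named")

    if not audit.get("contract", {}).get("characterization_suite"):
        problems.append("contract.characterization_suite is not named")

    for stage in audit.get("staging", []):
        if stage["seam"] not in seam_ids:
            problems.append(f"stage {stage['stage']} targets unknown seam {stage['seam']!r}")
        ramp = stage.get("ramp") or []
        if not ramp or ramp[0] != "shadow":
            problems.append(f"stage {stage['stage']} ramp must start at shadow")

    # A missing decommission date means parallel run has no end.
    if not isinstance(audit.get("budgets", {}).get("decommission_by"), date):
        problems.append("budgets.decommission_by is missing or not a date")

    return problems


# Trimmed, hypothetical stand-in for the parsed audit file.
EXAMPLE_AUDIT = {
    "contract": {"characterization_suite": "audits/checkout-suite.2026-05-30.yaml@a1b2c3d"},
    "seams": [{"id": "order.controller"}, {"id": "payment.adapter"}],
    "staging": [
        {"stage": 1, "seam": "order.controller", "ramp": ["shadow", "5%", "25%", "100%"]},
        {"stage": 2, "seam": "payment.adapter", "ramp": ["shadow", "5%", "25%", "100%"]},
    ],
    "budgets": {"decommission_by": date(2026, 9, 30)},
}

if __name__ == "__main__":
    print(audit_problems(EXAMPLE_AUDIT))
```

Returning a list of problems rather than raising on the first one lets a CI job report everything that is wrong with the audit in a single run.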

Stage 2 · staged-modernization

This is the stage where most "modernizations" actually become slow-motion rewrites. The defense against that is procedural, not technical: one stage, one seam, one PR. The agent doesn't decide which seam comes next; the audit does.

What keeps "which seam comes next" from living in chat history is a durable artifact: an inventory of every symbol that has to move, a dependency-aware order so seams are taken leaf-first, and a manifest recording what's modernized, what's stubbed, and what's deferred and why. That manifest is the difference between "the agent thinks it's about 60% done" and a checkable record of exactly which seams are live, which are dark, and which haven't been touched.
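A minimal sketch of that artifact, assuming the inventory is a mapping from each symbol to the symbols it depends on. The symbol names, statuses, and helper names are hypothetical; the ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which emits dependencies before the things that depend on them.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical inventory for one seam: symbol -> symbols it depends on.
DEPENDS_ON = {
    "OrdersController": {"OrderSerializer", "OrderParams"},
    "OrderSerializer": {"MoneyFormatter"},
    "OrderParams": set(),
    "MoneyFormatter": set(),
}

# Hypothetical manifest: the checkable record of where each symbol stands.
MANIFEST = {
    "MoneyFormatter": {"status": "modernized"},
    "OrderParams": {"status": "stubbed", "why": "blocked on parameter review"},
    "OrderSerializer": {"status": "pending"},
    "OrdersController": {"status": "pending"},
}


def leaf_first(depends_on):
    """Order symbols so every dependency appears before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def next_ready(depends_on, manifest):
    """Symbols not yet modernized whose dependencies are all modernized."""
    done = {name for name, entry in manifest.items() if entry["status"] == "modernized"}
    return [
        name
        for name in leaf_first(depends_on)
        if name not in done and depends_on[name] <= done
    ]


def progress(manifest):
    """Count symbols per status, replacing 'about 60% done' with numbers."""
    return dict(Counter(entry["status"] for entry in manifest.values()))


if __name__ == "__main__":
    print(leaf_first(DEPENDS_ON))
    print(next_ready(DEPENDS_ON, MANIFEST))
    print(progress(MANIFEST))
```

`TopologicalSorter` raises `CycleError` on a dependency cycle, which is useful here: a cycle means the symbols cannot be taken leaf-first and the seam needs to be redrawn.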

The shape that works:

  1. One PR per seam. A PR that modernizes the order controller is reviewable. A PR that modernizes "checkout" is a six-week branch nobody can rebase. If the agent's plan crosses two seams, that's a planning failure — fix the audit.
  2. Parallel run is the default. The new implementation runs alongside the old, fed the same input, output compared at the seam. Shadow-only is the fallback when parallel-run isn't possible (one-way side effects). "We tested it in staging" isn't a strategy.
  3. Cutover is a feature flag, not a deploy. The new code ships dark. Traffic ramps via a flag. Rollback is a flag flip, not a redeploy. A deploy-gated cutover is a 2008 modernization.
  4. The ramp is in the audit, not in someone's head. Shadow → 5% → 25% → 100%, with a hold period between rungs. Any rung skipped is a process violation, not a judgment call.
  5. Tenant-gated when traffic is non-uniform. A flag that ramps by traffic percentage will burn the same handful of high-volume tenants every time. Per-tenant cutover scopes the blast radius.
  6. The diff cites the seam. Each hunk references the seam ID it addresses. Hunks that don't cite anything get cut before review.
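Items 2, 3, and 5 can be sketched together: a wrapper that serves from the old implementation while comparing the new one at the seam, and a flag decision that ramps whole tenants rather than individual requests. This is an illustration under assumptions, not any flag backend's API; `bucket`, `use_new`, and `ParallelRun` are hypothetical names, and a production comparator would also need to normalize timestamps, IDs, and ordering before comparing.

```python
import hashlib


def bucket(key: str) -> int:
    """Stable bucket in [0, 100) for a key, identical across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new(tenant_id: str, ramp_pct: int, allow_tenants=frozenset()) -> bool:
    """Flag decision: explicit tenant allow-list first, then percentage ramp."""
    return tenant_id in allow_tenants or bucket(tenant_id) < ramp_pct


class ParallelRun:
    """Serve from the old path, run the new path on the same input, record diffs."""

    def __init__(self, old, new):
        self.old = old
        self.new = new
        self.total = 0
        self.mismatches = []

    def call(self, request):
        expected = self.old(request)
        self.total += 1
        try:
            actual = self.new(request)
            matched = actual == expected
        except Exception as exc:  # a shadow failure must never reach the caller
            actual, matched = exc, False
        if not matched:
            self.mismatches.append((request, expected, actual))
        return expected  # callers only ever see the primary's answer

    def diff_rate(self) -> float:
        return len(self.mismatches) / self.total if self.total else 0.0


if __name__ == "__main__":
    run = ParallelRun(old=lambda n: n * 2, new=lambda n: n * 2 if n != 3 else 7)
    print([run.call(n) for n in (1, 2, 3, 4)], run.diff_rate())
    print(use_new("tenant-42", ramp_pct=25))
```

Because the bucket is derived from the tenant ID, a tenant stays on the same side of the flag as the ramp widens, and rollback is setting `ramp_pct` back to 0. Note that the percentage here is a share of tenants, not of traffic; that trade-off is the point of item 5.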

One stage = one declarative artifact. Audit SHA, seam ID, strategy, ramp position, parallel-run start, diff threshold. All in the PR. The agent's plan in markdown, the run trace attached. Future you needs to know exactly what changed at this seam without spelunking through chat history.

# stages/2026-05-30-order-controller.yaml
stage:
  audit: audits/checkout-modernization.2026-05-30.yaml@a1b2c3d
  addresses: order.controller
  strategy: strangler-fig
  ramp_today: shadow
  ramp_next:
    on_verify_pass: 5%
    hold_period_hours: 24
  parallel_run:
    started: 2026-05-25
    diff_threshold: 0.001
    primary: old
    shadow: new
  api_compatibility:
    contract: openapi/checkout.v3.yaml@d4e5f6
    breaking_changes: []           # non-empty = block
  verify:
    plan: verify/2026-05-30-order-controller.yaml

Stage 3 · verify-modernization

Verify is the stage that decides whether the cutover ramps. Without it, every ramp decision is "looks fine, ship it." With it, every ramp decision is mechanical — the gate passed or it didn't, the parallel-run diff is under threshold or it isn't, the p99 is within the contract or it isn't. The job becomes watching the trend lines, not arguing about them.

Verify has to do more than turn the characterization suite green on the new system (the agent ported the suite; of course it's green). It needs seven signals, and the modernization-specific ones aren't optional.

Don't ship a single number. Verify reports per dimension: characterization pass, parallel-run diff, p99 ratio, throughput ratio, contract breaks, cost overhead, trace diff. "Everything's fine" hides the case where p99 quietly drifted up 30% over a week.

# verify/run.py: a tiny but real modernization verify gate.
# `verify` is this repo's own package; each helper is assumed to return one number.
import operator
import sys

from verify import characterize, parallel, perf, contract, cost, traces

target = "services/checkout"
audit  = "audits/checkout-modernization.2026-05-30.yaml"

scores = {
    "characterization":  characterize.run(target, suite_sha="a1b2c3d"),  # pass rate, 0..1
    "parallel_run_diff": parallel.diff(window="24h"),                    # mismatch rate, 0..1
    "p99_ratio":         perf.ratio("p99_latency"),                      # new / old
    "throughput_ratio":  perf.ratio("throughput"),                       # new / old
    "contract_breaks":   contract.violations(spec="openapi/checkout.v3.yaml"),
    "cost_overhead":     cost.overhead_pct(),                            # percent over baseline
    "trace_diff":        traces.diff(samples=1000),                      # similarity, 0..1
}

# Each gate is (comparison that must hold, limit). The direction is explicit per
# dimension so a "lower is better" metric can never pass through a ">=" check.
GATES = {
    "characterization":  (operator.ge, 1.00),   # behavior preservation is binary
    "parallel_run_diff": (operator.le, 0.001),  # ≤0.1% mismatches, as in the stage file
    "p99_ratio":         (operator.le, 1.10),   # ≤10% slower than baseline
    "throughput_ratio":  (operator.ge, 0.95),   # ≥95% of baseline throughput
    "contract_breaks":   (operator.le, 0),      # any break blocks
    "cost_overhead":     (operator.le, 25),     # ≤25% peak overhead, per the audit budget
    "trace_diff":        (operator.ge, 0.98),   # ≥98% span-shape similarity
}

failed = {k: scores[k] for k, (holds, limit) in GATES.items()
          if not holds(scores[k], limit)}
if failed:
    print(f"VERIFY GATE FAILED for {audit}:", failed)
    sys.exit(1)
print(f"VERIFY GATE PASSED for {audit}:", scores)

The merge gate

The gate stops a stage from ramping. The minimum that earns its keep:

  1. Audit SHA present and unmodified. The stage cites a real audit at a real revision. The audit hasn't been edited in the same PR.
  2. One PR, one seam. PRs that span seams auto-split.
  3. Every hunk cites a seam. Hunks with no audit reference are cut.
  4. Verify gate green on every dimension. Characterization, parallel-run diff, p99, throughput, contract, cost, trace diff — all individually green. No "overall pass."
  5. Hold period observed. The previous ramp rung has held for the audit's required period. Skipping ahead because "it looked fine for an hour" is a process violation.
  6. Decommission date still credible. If burn rate against the budget projects past the decommission date, the stage doesn't ramp — the audit re-opens.

A stage with all six is rampable. A stage with five of six is interesting and goes back to whichever dimension is missing.
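The six checks can be expressed as one function that returns every failing check rather than a single pass/fail bit. The sketch below is hypothetical throughout: the `StageState` fields are invented for illustration, and the decommission projection is a deliberately naive linear estimate (remaining seams times average days per seam), which is an assumption about how "burn rate" might be measured.

```python
from dataclasses import dataclass, replace
from datetime import date, datetime, timedelta


@dataclass
class StageState:
    audit_sha_cited: str
    audit_sha_current: str
    audit_edited_in_pr: bool
    seams_touched: frozenset
    uncited_hunks: int
    verify_results: dict          # dimension name -> True if green
    rung_started: datetime
    hold_period: timedelta
    seams_remaining: int
    days_per_seam: float
    decommission_by: date


def gate_failures(state: StageState, now: datetime) -> list[str]:
    """Return the merge-gate checks this stage fails; empty means rampable."""
    failures = []
    if state.audit_sha_cited != state.audit_sha_current or state.audit_edited_in_pr:
        failures.append("audit SHA stale or audit edited in this PR")
    if len(state.seams_touched) != 1:
        failures.append("PR must address exactly one seam")
    if state.uncited_hunks:
        failures.append(f"{state.uncited_hunks} hunk(s) cite no seam")
    red = sorted(k for k, ok in state.verify_results.items() if not ok)
    if red or not state.verify_results:
        failures.append(f"verify dimensions not green: {red}")
    if now - state.rung_started < state.hold_period:
        failures.append("hold period not yet observed")
    projected = now.date() + timedelta(days=state.seams_remaining * state.days_per_seam)
    if projected > state.decommission_by:
        failures.append(f"projected finish {projected} is past decommission date")
    return failures


# Hypothetical passing stage, used for the example run below.
PASSING = StageState(
    audit_sha_cited="a1b2c3d",
    audit_sha_current="a1b2c3d",
    audit_edited_in_pr=False,
    seams_touched=frozenset({"order.controller"}),
    uncited_hunks=0,
    verify_results={"characterization": True, "p99_ratio": True},
    rung_started=datetime(2026, 6, 9, 10, 0),
    hold_period=timedelta(hours=24),
    seams_remaining=2,
    days_per_seam=28,
    decommission_by=date(2026, 9, 30),
)

if __name__ == "__main__":
    print(gate_failures(PASSING, now=datetime(2026, 6, 10, 12, 0)))
```

Listing failures by name is what lets a five-of-six stage go straight back to the check it missed.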

Best practices, in plain English

Failure modes & gotchas

These have actually taken teams down. Every one has a one-week fix nobody had time for.

The gotcha behind half of them: the modernization wasn't bounded in writing. An audit with seams, an out-of-scope list, a peak cost, and a decommission date prevents most of this list before it ships.

Cost, parallel-run overhead, and the agent budget

Cost isn't a finance problem during modernization — it's a commitment problem. The parallel-run window doubles infrastructure for the targeted seam. That's fine for six weeks. It is not fine for six months, and the difference between those two is the audit's decommission date, enforced.
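The arithmetic behind that claim is short enough to write down. The figures below are made up for illustration, and the model is a simplifying assumption: running a second copy of one seam costs the same per day as the first.

```python
def parallel_run_overhead(seam_monthly_cost: float, days: int) -> float:
    """Extra spend from running a second copy of one seam for `days` days."""
    return seam_monthly_cost * days / 30


def overrun_days(window_days: int, actual_days: int) -> int:
    """Days of parallel run beyond the window the audit budgeted."""
    return max(0, actual_days - window_days)


if __name__ == "__main__":
    monthly = 3000  # hypothetical monthly infrastructure cost of the seam
    print("six weeks: ", parallel_run_overhead(monthly, 42))
    print("six months:", parallel_run_overhead(monthly, 180))
    print("overrun vs a 30-day window:", overrun_days(30, 42))
```

The per-day cost never changes; only the number of days does, which is why the enforced date matters more than the unit price.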

How act101 fits the loop

The loop is tool-agnostic. The cutover half — routing, flags, traffic comparison, latency and cost gates — is operational tooling; reach for one of each and no more: a router (nginx/envoy/Istio), a flags backend (Unleash/Flagsmith/GrowthBook), a traffic comparator (scientist/diffy), a load tool (k6/Gatling), and a contract tool (pact/openapi-diff/buf).

Where act101 fits is the structural half, which maps onto its analyze → act → attest spine:

It's callable as MCP tools from inside an agent (Cursor, Claude Code, any MCP host) and as the act CLI for the structural gates in CI. act101 keeps the structure honest; operational tooling keeps the cutover honest — and the maturity gain is in running the loop, not in stacking more vendors.

The maturity ladder

Most teams don't sit at one tier — they're advanced on flagging and primitive on parallel run, or have great contract tests and no decommission discipline. Tick what you actually do today.

Zero to three: rewrite-in-progress, just with better branding. Four to six: real modernization, porous — the regressions that show up are the ones the gate didn't measure. Seven to nine: the cutover ramps mechanically; the team's energy moves to the next seam. Ten: the audit re-scores per stage, decommission lands on time, and the next modernization on this service is half as hard as this one.

A reasonable 30 / 60 / 90-day plan

  1. Days 1–30 — get to honest. Write the modernization-audit. Stand up the strangler infrastructure: router, flags, parallel-run harness, comparator. Pick the smallest seam from the audit. Don't cut anything yet. You're not modernizing; you're building the substrate that makes modernization mechanical.
  2. Days 31–60 — run one seam end-to-end. Shadow → 5% → 25% → 100% on the smallest seam, with hold periods at every rung. The pipeline now demonstrably works on something. The team has a worked example of every gate dimension passing or failing.
  3. Days 61–90 — roll the loop. Subsequent seams move through the same pipeline at higher cadence. The audit re-scores per stage. The decommission artifact exists; the date is real. The cost overhead trend is down, not up.

What Fowler got right (and where this fits in the broader process)

Martin Fowler's StranglerFigApplication article, originally written in 2004, is the modernization technique this whole playbook depends on. The vine grows around the old tree, replaces it piece by piece, and one day the tree is gone — the structure that replaced it bears its own weight. The technique is decades old and still load-bearing, because every alternative is either a big-bang rewrite (the failure mode the technique was invented to avoid) or a permanent dual-running of two systems (the failure mode teams fall into when they don't enforce a decommission date).

Agentic modernization is what makes the strangler fig usable at the cadence and scope modern systems actually need: the audit picks the seams from dependency analysis, the staging executes them one at a time against a tracked inventory-and-manifest, and the verify re-scores the structure. The characterization suite from the prior round of work is the contract that makes "the contract held" mean something concrete. That is what turns "grow the vine, retire the tree" from a metaphor into a tracked, re-scored sequence. Read Fowler's original article alongside this playbook; the overlap is the part that's actually load-bearing.

This is the stage of the work where the prior rounds cash in. The testing uplift produced the suite; the modernization treats that suite as a contract. The refactoring loop ran at the file level; this loop runs at the system level. The mechanics are the same — audit, execute, verify, with a versioned artifact at each step — and the discipline is what carries between them. None of it is new. What's new is that the agents make it tractable at a cadence and a scope that used to require a team of ten and a year of patience.

The shortest possible summary: name the contract, then never break it. Stage every change. Run old and new in parallel. Compare at the seam. Cut over with a flag, never a deploy. Decommission is a dated artifact, not a deferred wish. Trace every run. That loop, run boringly for six months, is how you ship a modernization that doesn't quietly become a second production system you can't afford to retire.