Migration with Agents: A Playbook

A migration is the modernization where you also throw away the language. Everything the prior playbook said about preserving the contract while replacing the implementation still applies, plus a category of failures that doesn't exist when the source and target speak the same tongue. The agent has been trained on idiomatic target-language code, and given the chance, it will produce idiomatic target-language code — not code that does what the source code did. The whole playbook is built around closing that gap without giving up the speed the agents make possible.

If you want the verdict in one breath: port at the wire, substitute libraries in writing, translate idioms by policy, verify on bytes and concurrency — not just on green tests. Skip any of that and you don't have a port; you have a fluent rewrite that compiles, runs, almost passes the suite, and corrupts your monetary fields on the seventeenth Tuesday in production.

Why migration with agents needs its own discipline

Classic porting assumed a small team, a deep understanding of both languages, and the willingness to argue line-by-line about idiom choices. Agents collapse the cost of producing the diff and offer no help with any of the arguments. Worse, they produce confident output in a language they "know" — the bug is that the language they know isn't the one the source code was written in.

The bottom line: classic porting measured whether the new program compiled and the tests passed. Agentic porting has to measure whether the new program does what the old one did, including the parts the old one was doing by accident. Idiomatic compilation is not the bar.

The migration loop, end to end

migration-audit  ->  staged-porting  ->  verify-porting
        ^                                       |
        |_______________________________________|
                 (audit re-scored after each stage)

Three stages, one direction, a back-edge so the audit you started with is the audit you re-score against. The audit names the seam, the contract, the substitution table, the idiom policy, and the data hazards. Staged porting executes one seam at a time over a wire protocol. Verify proves the contract held — behavior, bytes, precision, concurrency, performance, cost — before the cutover ramps. Skip the audit and the agent ports to its favorite idioms. Skip verify and your monetary fields drift in the third decimal place forever.

This loop is the canonical pattern for agentic migration — it predates any one tool, and any team can run it. It splits cleanly into two halves. The structural half keeps the port faithful to the source: the inventory of every symbol that must move, the dependency-aware order it moves in, and the manifest of what's ported, stubbed, or deferred all have to be tracked state, not the agent's recollection. The verification half keeps the target honest on the wire: byte and semantic diff, numeric-precision and encoding conformance, the race detector. Both halves matter more here than in any same-language work — the agent's whole hazard is producing fluent target code that doesn't do what the source did. The rest of this playbook is tool-agnostic; one note near the end shows how a single AST-aware engine covers the structural half.

Each stage has a uniquely migration-specific twist. The audit is bigger than a modernization audit. The staged porting is mediated by a wire seam, not a function-level seam. The verify gate adds dimensions for the ways cross-language work breaks that same-language work doesn't.

Stage 1 · migration-audit

The most common reason migrations stall isn't the target language — it's that nobody decided, in writing, which target idioms and which target libraries map to which source ones before the agent started porting. Without that decision, every PR becomes a small architectural debate, every reviewer rules on a different convention, and the codebase ends up speaking five dialects of the target language by week six. Do five things:

  1. State the from and the to on every axis. Language version, runtime, framework, notable libraries, build system. Not "we're moving to Go" — exactly what's changing on each axis.
  2. Write the library substitution table. Every significant source-language library gets a target-side substitute, with the semantic delta written down. "Picks the closest match at PR time" is not a strategy.
  3. Write the idiom translation policy. How do checked exceptions translate? How does nullability translate? How does shared-memory concurrency translate? Decided once, in writing, before any code.
  4. Catalogue the data hazards. Numeric precision, encoding, time-zone, sort stability, hash determinism — the things that break silently across languages. Name each one and the mitigation.
  5. Name the seam. Wire protocol. Process boundary. Always. The seam is where old and new coexist; it never runs through source code, so the two source trees never link to each other.

The agent fluently produces wrong code. It compiles, it runs, it passes the tests it generated — and at the edges of the input space, it does something the source code never did. The audit's whole job is to make those edges visible before the agent starts producing fluent code at them.
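
One of those edges is easy to make visible before any porting starts. The sketch below uses Python's float, which is the same IEEE 754 double as Go's float64, to show what a BigDecimal to float64 translation does to a monetary sum. The line items are hypothetical values chosen for illustration.

```python
from decimal import Decimal

# Hypothetical line items: three charges of 0.10 each.
line_items = ["0.10", "0.10", "0.10"]

# Binary floating point: what a careless BigDecimal -> float64 port computes.
float_total = sum(float(x) for x in line_items)

# Decimal arithmetic: what BigDecimal (or a decimal library) computes.
decimal_total = sum(Decimal(x) for x in line_items)

print(float_total == 0.30)                # False: the float sum is 0.30000000000000004
print(decimal_total == Decimal("0.30"))   # True: exact, scale preserved
```

A characterization suite that only checks round inputs will not catch this, which is why the hazard belongs in the audit rather than in a reviewer's memory.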

# audits/checkout-migration.2026-05-30.yaml — one file, in git, one SHA.
audit:
  target: services/checkout
  from:
    language: java 17
    framework: spring boot 3
    runtime: jvm/k8s
    notable_libs: [guava, jackson, apache-commons-lang, logback]
    build: maven
  to:
    language: go 1.22
    framework: chi + standard library
    runtime: go/k8s
    build: go modules
  library_substitutions:
    - source: guava
      target: golang.org/x/exp + stdlib
      semantic_delta: |
        Multimap has no stdlib equivalent; use map[K][]V. Go map iteration
        order is randomized and Guava's depends on the Multimap
        implementation, so sort keys wherever order is observable. Cache
        eviction policies differ: do not substitute Guava Cache; introduce
        a named eviction package.
    - source: jackson
      target: encoding/json
      semantic_delta: |
        jackson can be configured to accept loose JSON (single quotes,
        trailing commas); encoding/json cannot. Document and reject these inputs.
    - source: apache-commons-lang
      target: stdlib + internal/util
      semantic_delta: |
        StringUtils.isBlank(null) returns true. A Go string cannot be nil,
        but dereferencing a nil *string panics.
        Wrap explicitly; never inline-translate.
    - source: logback (+ MDC)
      target: log/slog
      semantic_delta: |
        MDC context propagation maps to context.Context value passing.
        Existing MDC keys preserved verbatim in slog attributes.
  idiom_translation:
    exceptions_to_errors:
      checked: explicit error returns; sentinel vars for each domain error
      runtime: panics only for programmer error; surface as 500 via middleware
      wrapping: fmt.Errorf with %w; preserve cause for errors.Is / errors.As
    nullability:
      Optional: pointer with explicit nil check; never the zero value as sentinel
      java_null: pointer with explicit nil check
    concurrency:
      synchronized: sync.Mutex on receiver, named guard field; sync.Mutex is not reentrant, so restructure nested locking
      CompletableFuture: errgroup + bounded channel
      ConcurrentHashMap: sync.Map only when access pattern is read-heavy; otherwise mutex + map
  data_hazards:
    - kind: numeric-precision
      hazard: BigDecimal → float64 corrupts monetary amounts
      mitigation: shopspring/decimal for all monetary fields, enforced by linter
    - kind: encoding
      hazard: JVM UTF-16 vs Go UTF-8; surrogate pair handling differs
      mitigation: wire diff at byte level; never normalize during diff
    - kind: time-zone
      hazard: java.util.Date implicit-UTC vs time.Time location-aware
      mitigation: persist Time as UTC-explicit; reject naive timestamps at boundary
    - kind: sort-stability
      hazard: Collections.sort is stable; sort.Slice is not
      mitigation: use sort.SliceStable wherever order is observable
    - kind: hash-determinism
      hazard: Java hashCode is deterministic per JVM session; Go map iteration is randomized
      mitigation: never depend on iteration order for output; sort keys explicitly
  contract:
    characterization_suite: audits/checkout-suite.2026-05-30.yaml@a1b2c3d
    public_api: openapi/checkout.v3.yaml@d4e5f6
    sla: { p50_ms: 120, p99_ms: 480, throughput_rps: 800 }
  seams:
    - { id: order.controller,  kind: http,      strategy: parallel-run }
    - { id: payment.adapter,   kind: rpc,       strategy: shadow }
    - { id: inventory.cron,    kind: scheduled, strategy: dark-launch }
  staging:
    - { stage: 1, seam: order.controller, ramp: [shadow, 5%, 25%, 100%] }
    - { stage: 2, seam: payment.adapter,  ramp: [shadow, 5%, 25%, 100%] }
    - { stage: 3, seam: inventory.cron,   ramp: [shadow, 100%] }
  out_of_scope:
    - new features during the port
    - public-API changes
    - library substitutions not in the table
    - opportunistic redesigns
  budgets:
    timeline_weeks: 18
    parallel_run_window_days: 45
    peak_cost_overhead_pct: 35
    decommission_by: 2026-12-15

This audit is bigger than a modernization audit on purpose. Every additional section — the substitution table, the idiom policy, the hazards — represents a category of bug that same-language work doesn't have. Writing it down up front is how you avoid having the conversation in twenty different PR comments later.
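
The encoding hazard in the catalogue is a good example of why these need writing down. The sketch below is an illustration in Python with hypothetical field values: the same string has three different "lengths" depending on which runtime is counting, and two visually identical strings differ at the byte level, which is why the mitigation says never to normalize during the diff.

```python
import unicodedata

# Hypothetical field value containing a non-BMP character (an emoji).
text = "na\u00efve \U0001F600"

utf8_len = len(text.encode("utf-8"))            # bytes: what Go's len(s) reports
utf16_units = len(text.encode("utf-16-le")) // 2  # code units: what Java's String.length() reports
code_points = len(text)                          # code points: what Go's utf8.RuneCountInString reports

print(utf8_len, utf16_units, code_points)        # 11 8 7

# Same glyph, two encodings: composed (NFC) versus decomposed (NFD).
composed = unicodedata.normalize("NFC", "\u00e9")
decomposed = unicodedata.normalize("NFD", "\u00e9")
print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'
```

Any truncation, padding, or length validation ported literally from the source will silently change meaning at these inputs.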

Underneath those human decisions is the structural backbone the audit rides on: the source's behavioral contract recovered symbol by symbol, the inventory of everything that has to move, and the dependency-aware (leaf-first) order to move it in. That is the part of the audit you cannot write from memory across a million lines — and the part that, left to the agent, becomes "it ported what it happened to read first."
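
A minimal sketch of the leaf-first order, assuming the dependency edges have already been recovered from the source. The symbol names and the graph are hypothetical; the ordering itself uses the standard library's graphlib (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each symbol maps to the symbols it depends on.
depends_on = {
    "OrderController": {"OrderService"},
    "OrderService": {"PriceCalculator", "OrderRepository"},
    "PriceCalculator": set(),
    "OrderRepository": set(),
}

# static_order() yields dependencies before their dependents,
# so leaves are ported first and the controller last.
port_order = list(TopologicalSorter(depends_on).static_order())
print(port_order)
```

A dependency cycle raises graphlib.CycleError, which is a useful audit finding in its own right: the cycle has to be broken or ported as one unit.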

The single biggest thing most teams skip: the library substitution table. Without it, the agent shops for substitutes at PR time, picks differently each time, and the codebase ends up with three subtly different ways to do the same thing. The table is two days of work and saves three months of cleanup.

Stage 2 · staged-porting

This is the stage that everyone thinks is the whole job, and it's actually the cheapest one once the audit is real. Staged porting doesn't decide what to translate or how to translate it; it executes the audit. If the agent proposes a translation that isn't in the policy, that's a planning failure — fix the audit, don't fudge the port.

Two things have to happen here that the agent cannot do reliably from its own memory. The source is read through the AST — so what gets ported is the function's actual behavior and call graph, not the agent's summary of a file it skimmed in a language it's less fluent in — and the manifest is updated as each symbol lands, so the port carries a checkable record of what's ported, what's stubbed behind the wire, and what's still source-side. The library-in-table and one-seam-per-PR constraints become structural facts you can read off the import graph, not hopes a reviewer has to verify by eye.

The shape that works:

  1. One seam, one PR. A PR that ports the order controller is reviewable. A PR that ports "checkout to Go" is a six-week branch nobody can rebase.
  2. The seam is always a wire protocol. HTTP, gRPC, queue, message bus. Old code calls new code over the wire, or vice versa. Never link the two source trees.
  3. Library substitutions follow the table. Any new import on the target side that isn't in the audit's table blocks the PR. Period.
  4. Idiom translations are declared in the plan. Before the diff is generated, the agent writes which idioms it's translating in this port, citing the policy. Reviewers spot bad translations on the plan, not in three thousand lines of new code.
  5. Data hazards are addressed by named patterns. Monetary fields use shopspring/decimal. Times are UTC-explicit. Strings are byte-comparable at the wire. These are linter-enforced, not "we'll check at review."
  6. Parallel run via the wire seam. Old and new both receive the production request; outputs are compared at the seam, at the byte level and at the semantic level.
  7. Cutover is a flag, never a deploy. Same rule as modernization. Rollback is a flag flip, measured in seconds.
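
The comparison in item 6 can be sketched as two independent checks on each mirrored response. This is an illustration, not a production comparator: compare_at_seam is a hypothetical name, the payloads are made up, and JSON bodies are assumed.

```python
import json

def compare_at_seam(old_body: bytes, new_body: bytes) -> dict:
    """Score one mirrored request on two independent dimensions."""
    byte_match = old_body == new_body
    try:
        semantic_match = json.loads(old_body) == json.loads(new_body)
    except ValueError:  # malformed JSON or undecodable bytes on either side
        semantic_match = False
    return {"bytes": byte_match, "semantic": semantic_match}

old = b'{"id": 7, "total": "12.50"}'
reordered = b'{"total": "12.50", "id": 7}'  # key order changed, same meaning
drifted = b'{"id": 7, "total": "12.5"}'     # trailing zero lost on a money field

print(compare_at_seam(old, old))        # {'bytes': True, 'semantic': True}
print(compare_at_seam(old, reordered))  # {'bytes': False, 'semantic': True}
print(compare_at_seam(old, drifted))    # {'bytes': False, 'semantic': False}
```

Keeping the two results separate is the point: a key-order difference and a lost trailing zero both fail the byte check, but only one of them is a regression.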

One stage = one declarative artifact. Audit SHA, seam ID, declared idiom translations, library imports used, data-handling patterns applied, parallel-run config. All in the PR. The agent's plan in markdown. The run trace attached. Future you needs to know exactly which idiom decisions were made, by which agent run, on which day.

# stages/2026-05-30-order-controller-port.yaml
stage:
  audit: audits/checkout-migration.2026-05-30.yaml@a1b2c3d
  addresses: order.controller
  strategy: parallel-run
  ramp_today: shadow
  ramp_next:
    on_verify_pass: 5%
    hold_period_hours: 24
  library_imports:
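    - chi          # declared as the target framework in the audit
    - slog         # in table (logback substitute)
    - shopspring/decimal  # named in the audit's data_hazards mitigation; required by monetary policy
    # any import the audit does not declare blocks the gate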
  idiom_translations_applied:
    - source: throws OrderNotFoundException
      target: errors.New + sentinel ErrOrderNotFound; wrapped with %w
    - source: synchronized(this) { ... }
      target: sync.Mutex on receiver; named guard field "mu"
    - source: Optional<Order>
      target: *Order; explicit nil check at every dereference
  data_handling:
    monetary: shopspring/decimal
    timestamps: utc-explicit
    strings: utf-8 raw at wire; no normalization
  parallel_run:
    started: 2026-05-30
    diff_threshold_bytes: 0.001
    diff_threshold_semantic: 0.0001
    primary: old
    shadow: new
  api_compatibility:
    contract: openapi/checkout.v3.yaml@d4e5f6
    breaking_changes: []
  verify:
    plan: verify/2026-05-30-order-controller-port.yaml
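
The import rule in that artifact reduces to a set difference once the target's imports have been extracted. A minimal sketch, assuming the extraction is done elsewhere: out_of_table is a hypothetical helper, and the allowlist below is an illustrative reading of what the audit declares, written as Go module paths.

```python
def out_of_table(declared_imports, allowed_imports):
    """Return the imports that the audit does not declare, sorted for stable output."""
    return sorted(set(declared_imports) - set(allowed_imports))

# Hypothetical allowlist assembled from the audit.
allowed = {
    "github.com/go-chi/chi/v5",
    "log/slog",
    "encoding/json",
    "github.com/shopspring/decimal",
}

# Hypothetical import list for the PR under review.
declared = [
    "github.com/go-chi/chi/v5",
    "log/slog",
    "github.com/shopspring/decimal",
    "github.com/pkg/errors",  # not declared anywhere in the audit
]

blocked = out_of_table(declared, allowed)
print(blocked)  # ['github.com/pkg/errors']
```

A non-empty result blocks the PR. The fix is either to drop the import or to amend the audit with a written semantic delta, never to wave it through at review.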

Stage 3 · verify-porting

Verify is where the cross-language failures get caught before the cutover ramps. Without it, the new system passes the ported tests on its own terms; with it, the new system has to pass against the old system on the wire, at the byte level, on the production workload. The agent's idiom choices, library substitutions, and data handling all get scored, separately, on threshold-per-dimension.

Verify needs the same six signals as a modernization verify, plus four cross-language ones: numeric precision, encoding conformance, concurrency safety, and idiom adherence. The cross-language ones aren't optional.

Don't ship a single number. Verify reports per dimension. A green characterization suite with a 12% byte-level diff and a 4% precision deviation on monetary fields is not a passing port; it's a confidently-shipped regression.

# verify/run.py: a tiny but real migration verify gate.
# The verify package is assumed to be the repo's own scoring modules, not a published library.
import sys
from verify import (
    characterize, parallel, precision, encoding,
    races, libs, idiom, perf, contract, cost,
)

target = "services/checkout"
audit  = "audits/checkout-migration.2026-05-30.yaml"

scores = {
    "characterization":     characterize.run(target, suite_sha="a1b2c3d"),
    "parallel_bytes":       parallel.diff(window="24h", mode="bytes"),
    "parallel_semantic":    parallel.diff(window="24h", mode="json-semantic"),
    "numeric_precision":    precision.deviation(fields="monetary"),
    "encoding_conformance": encoding.byte_match(fields="all-strings"),
    "race_count":           races.scan(target),
    "library_in_table":     libs.in_substitution_table(audit),
    "idiom_adherence":      idiom.score(target, language="go", policy=audit),
    "p99_ratio":            perf.ratio("p99_latency"),
    "throughput_ratio":     perf.ratio("throughput"),
    "contract_breaks":      contract.violations(),
    "cost_overhead":        cost.overhead_pct(),
}

# Direction-aware thresholds: most are "≥", some are "≤".
GE = {  # must meet or exceed
    "characterization":     1.00,
    "parallel_bytes":       0.999,
    "parallel_semantic":    0.9999,
    "encoding_conformance": 1.00,
    "library_in_table":     1.00,
    "idiom_adherence":      0.85,
    "throughput_ratio":     0.90,
}
LE = {  # must meet or stay below
    "numeric_precision":    0.0,   # zero deviation on money
    "race_count":           0,
    "p99_ratio":            1.15,  # ≤15% slower than baseline
    "contract_breaks":      0,
    "cost_overhead":        35,    # ≤35% peak overhead, matching the audit's peak_cost_overhead_pct
}
failed = {k: scores[k] for k, t in GE.items() if scores[k] < t}
failed |= {k: scores[k] for k, t in LE.items() if scores[k] > t}
if failed:
    print("VERIFY GATE FAILED:", failed)
    sys.exit(1)
print("VERIFY GATE PASSED:", scores)

The merge gate

The gate stops a stage from ramping. The minimum that earns its keep:

  1. Audit SHA present and unmodified. The PR cites a real audit at a real revision.
  2. One PR, one seam. PRs spanning seams auto-split.
  3. Library imports in the substitution table. Any out-of-table import blocks the gate.
  4. Idiom translations declared. PRs whose idiom decisions aren't written down get sent back for a plan.
  5. Data hazards addressed by named patterns. Monetary fields, timestamps, strings — the linter enforces the policy and the gate enforces the linter.
  6. Verify gate green on every dimension. No "overall pass."
  7. Hold period observed. Ramp rungs are enforced.
  8. Decommission date credible. Burn rate against the budget projects to landing on or before the date.

A stage with all eight is rampable. A stage with seven of eight is interesting and goes back to whichever dimension is missing.
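
Items 6 and 7 together decide whether a rung may advance. A minimal sketch of that decision, assuming the ramp and the 24-hour hold from the stage artifact; next_rung and its parameters are hypothetical names, not part of any tool.

```python
from datetime import datetime, timedelta, timezone

RAMP = ["shadow", "5%", "25%", "100%"]

def next_rung(current, entered_at, now, verify_green, hold=timedelta(hours=24)):
    """Advance one rung only if verify is green and the hold period has elapsed."""
    position = RAMP.index(current)  # raises ValueError for an unknown rung
    at_top = position == len(RAMP) - 1
    held_long_enough = now - entered_at >= hold
    if at_top or not verify_green or not held_long_enough:
        return current
    return RAMP[position + 1]

entered = datetime(2026, 5, 30, 9, 0, tzinfo=timezone.utc)

print(next_rung("shadow", entered, entered + timedelta(hours=12), True))   # shadow
print(next_rung("shadow", entered, entered + timedelta(hours=24), True))   # 5%
print(next_rung("shadow", entered, entered + timedelta(hours=48), False))  # shadow
```

The function never skips a rung and never advances on elapsed time alone, which keeps the ramp a property of the gate rather than of whoever is on call.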

Best practices, in plain English

Failure modes & gotchas

These have actually taken teams down. Every one has a one-week fix nobody had time for.

The gotcha behind half of them: the cross-language difference wasn't named in writing before the port began. An audit with a substitution table, an idiom policy, and a data-hazards catalogue prevents most of this list before it ships.

Cost, parallel-run overhead, and the agent budget

Cost during migration is structurally worse than during modernization. Two stacks running, two build pipelines, two sets of monitoring dashboards, two on-call rotations, two sets of toolchain licenses if applicable. The audit's cost overhead figure has to account for all of it; the decommission date has to land before the budget breaks.

How act101 implements the loop

The loop above is tool-agnostic. This is the one place the playbook names a tool, because the migration case is where its structural half earns its keep most visibly.

A migration splits into two halves: keeping the port faithful to the source's structure, and proving the target's bytes match on the wire. act101 owns the first half; the second is operational tooling it doesn't replace.

The structural half is the porting state machine this playbook is built around — a contract (the source behavior that must survive, recovered symbol by symbol), an inventory (every symbol that has to move), an order (dependency-aware, leaf-first sequencing), and a manifest (a persistent record of what's ported, stubbed, or deferred, and why). act101 reads the source you're leaving and the target you're entering through one AST across 163 grammars, executes the structural edits as deterministic operations that run the same way every time, and tracks that state so a multi-month port doesn't lose its place. The import-graph read behind the library-in-table check, and the systemic idiom-drift check (its inconsistency analysis), are structural facts rather than greps a clever alias can dodge. All of it is callable as MCP tools so the agent drives the port in-band from Cursor or Claude Code, and from the act CLI for the structural gates in CI.

For the wire-and-hazard half, reach for one of each and no more: a wire diff (diffy/jd), a decimal library per side (shopspring/decimal, rust_decimal), an encoding/normalization library (ICU), a race detector (-race/ThreadSanitizer/JCStress), and a contract tool (pact/buf/openapi-diff). act101 keeps the port faithful to the source; those tools prove the bytes match on the wire. The maturity gain is in running the loop, not in stacking more vendors.

The maturity ladder

Most teams don't sit at one tier — they're advanced on flags and primitive on substitution tables, or have great numeric discipline and no idiom policy. Tick what you actually do today.

Zero to three: rewrite-in-progress, just with better branding. Four to six: a real port, porous — the regressions you find will be the data-hazards you didn't catalogue. Seven to nine: the cutover ramps mechanically; the team's energy moves to the next seam. Ten: the audit re-scores per stage, decommission lands on time, and the next migration on this codebase is half as hard as this one.

A reasonable 30 / 60 / 90-day plan

  1. Days 1–30 — get to honest. Write the migration-audit, including the substitution table, the idiom policy, and the data-hazards catalogue. Stand up the wire-seam infrastructure: router, flags, parallel-run harness, byte-and-semantic comparators. Pick the smallest seam. Don't port a line of code yet.
  2. Days 31–60 — run one seam end-to-end. Shadow → 5% → 25% → 100% on the smallest seam, with hold periods at every rung. Every verify dimension wired and reporting. The pipeline now demonstrably catches encoding drift, precision deviation, and concurrency unsafety on a worked example.
  3. Days 61–90 — roll the loop. Subsequent seams move through the same pipeline at higher cadence. The audit re-scores per stage. The decommission artifact exists; the date is real. The cost-overhead trend is down, not up. The team starts to discuss the next migration on the same codebase without dread.

What Spolsky was warning about (and why this playbook lets you do it anyway)

Joel Spolsky's Things You Should Never Do, Part I, written in 2000, is the cautionary tale that haunts every migration. The Netscape rewrite — same problem, new language, new architecture, no preserved contract — took years, shipped late, and gave the company's market to a competitor while the team was busy not shipping. Joel's verdict was categorical: don't do the rewrite. Ever.

He was right about the rewrite. The whole point of this playbook is that a migration is not a rewrite, provided the discipline that makes it not-a-rewrite is actually present. The contract is preserved (the characterization suite is the spec). The deployment is staged (the strangler fig is mechanical, not aspirational). The cutover is mechanical (the flag, not the deploy). The decommission is dated (parallel run isn't allowed to become permanent). Strip any of those out and you're doing the thing Joel said never to do; keep them all and you can replace the language without the outcome he was warning about.

This is the playbook in the sequence where the prior investments cash in most visibly. The testing uplift produced the suite. The refactoring loop taught the team to ship behavior-preserving change. The modernization loop taught them to ramp it mechanically. The migration loop turns the difficulty up one notch — different language, different runtime, different idioms — and the same discipline keeps it ship-able. What makes that discipline mechanical across the language gap is the state machine: the inventory, the order, and the manifest that keep a multi-month port from losing its place, plus structural edits that execute the same way on every run. Read Spolsky's essay before starting; the failure mode he documented is exactly the failure mode this playbook prevents, by procedural discipline rather than by choosing not to play.

The shortest possible summary: write the substitution table, write the idiom policy, catalogue the data hazards. Port behind a wire seam. Parallel run at the byte level and the semantic level. Verify on precision, encoding, concurrency, and idiom adherence, not just on green tests. Cut over with a flag. Decommission on a dated artifact. That loop, run boringly for six months, is how you ship a cross-language migration that doesn't quietly become the rewrite Joel warned about.