How we measure

A rank on the act101 leaderboard is instrumented, signed, third-party proof — not a claim. This page is the arithmetic behind every number, so a #1 finish survives a skeptical thread.

The headline is the privacy rule: act101 NEVER ships your code off-site. We ship only opt-in usage metrics. Everything below describes how those metrics are measured — never estimated from a benchmark, never ratio-guessed.

1. The standard candle: a naive full-file agent

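Token savings are a measured counterfactual, not a ratio estimate. For every operation, we measure two numbers: naive_bytes, the bytes a naive full-file agent would have moved for the same operation, and actual_bytes, the bytes act actually emitted.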

The naive full-file agent is a standard candle that equalizes across coding agents regardless of their own shortcuts. Naivete is the feature: one rule, no caps, no special cases, no per-agent negotiation about what "would have" happened.

2. The accounting, per operation kind

Operation kind | naive_bytes | actual_bytes
read / analyze | Σ full size of touched files | response bytes
write / refactor | Σ (before size + after size) per ChangeSet-affected file | response bytes + Σ edit-text bytes
no-file ops (status, docs, …) | = actual_bytes | response bytes

Honest zero. A no-file operation records savings ≡ 0: not a rounded-down estimate but an actual zero. An operation that touches no files has nothing to save, and we say so.

Whole-workspace operations count it all: the naive agent reads every file the operation could have touched, with no cap. Within one operation, a file counts once (a set keyed by path). Across operations, repeated touches count each time — the naive agent re-reads too; each operation is measured independently.
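The accounting rules above can be sketched in a few lines of Python. This is an illustration only, not act's actual code: the Operation class, its field names, and the kind labels "read", "write", and "none" are hypothetical. Touched files are held in a dict keyed by path, so a file counts once within an operation.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    kind: str            # hypothetical labels: "read", "write", or "none"
    response_bytes: int  # bytes the operation actually returned
    # path -> (size before, size after); keyed by path, so each file counts once
    touched: dict = field(default_factory=dict)
    edit_text_bytes: int = 0  # total bytes of edit text (writes only)


def actual_bytes(op: Operation) -> int:
    if op.kind == "write":
        return op.response_bytes + op.edit_text_bytes
    return op.response_bytes


def naive_bytes(op: Operation) -> int:
    if op.kind == "read":
        # the naive agent reads every touched file in full
        return sum(before for before, _after in op.touched.values())
    if op.kind == "write":
        # the naive agent reads the old file and writes the new one in full
        return sum(before + after for before, after in op.touched.values())
    # no-file operation: naive equals actual, so savings are exactly zero
    return actual_bytes(op)


def bytes_saved(op: Operation) -> int:
    return naive_bytes(op) - actual_bytes(op)


read_op = Operation("read", 500, {"a.py": (4000, 4000), "b.py": (6000, 6000)})
write_op = Operation("write", 300, {"a.py": (4000, 4200)}, edit_text_bytes=250)
status_op = Operation("none", 120)

print(bytes_saved(read_op))    # 9500
print(bytes_saved(write_op))   # 7650
print(bytes_saved(status_op))  # 0
```

Each operation is evaluated independently, so the same file touched by two separate operations is counted in both.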

3. Bytes → tokens

# the single conversion, applied at display and upload time
tokens_saved = (naive_bytes − actual_bytes) / 4

Bytes are the stored unit, on disk and over the wire. The / 4 is applied only when a number is shown to you or uploaded — one conversion, stated plainly, visible everywhere a token count appears.
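As a worked example of that conversion, here is a minimal Python sketch. The function name and the constant name are illustrative, and the result is left as a float because this page does not specify how the displayed value is rounded.

```python
BYTES_PER_TOKEN = 4  # the fixed divisor stated on this page


def tokens_saved(naive_bytes: int, actual_bytes: int) -> float:
    # bytes are the stored unit; convert only when displaying or uploading
    return (naive_bytes - actual_bytes) / BYTES_PER_TOKEN


print(tokens_saved(10_000, 500))  # 2375.0
```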

4. What is measured vs. estimated

Every savings number on the leaderboard is measured: the byte counts come from the actual files the operation touched and the actual bytes it emitted, recorded at the operation dispatch layer.

Historically, act shipped a ratio-based estimator (a benchmark table of per-operation savings ratios). That estimator and its table have been deleted, not bypassed. Older runs' estimated totals are preserved in a separate legacy_estimated block — rendered as a clearly-labeled line in act stats, never summed with measured numbers. One accounting truth, no mixing.
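One way to picture the "never summed" rule is a renderer that prints the two figures on separate lines and never adds them. The sketch below is hypothetical: the function name, the labels, and the assumption that the legacy total is stored as a plain number are all illustrative, and this is not the real output format of act stats.

```python
from typing import List, Optional


def render_stats(measured_bytes_saved: int,
                 legacy_estimated: Optional[int]) -> List[str]:
    # the measured line uses the single bytes-to-tokens conversion (/ 4)
    lines = [f"measured tokens saved: {measured_bytes_saved / 4:,.0f}"]
    if legacy_estimated is not None:
        # shown on its own labeled line; never added to the measured figure
        lines.append(f"legacy estimate (not included above): {legacy_estimated:,}")
    return lines


for line in render_stats(9500, 1200):
    print(line)
```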

Nothing on the leaderboard is estimated. If a number can't be measured, it isn't shown.

5. What never leaves your machine

Raw operation records — which files, which paths, which projects — stay on your machine. Only an aggregate payload leaves, and only if you opt in at act onboarding. The uploaded payload carries:

Never repo names, file paths, project hashes, per-project breakdowns, or code. There is no path from your source to the leaderboard.

6. Signed ingestion: fabrication is closed

Every upload is HMAC-signed with an upload token minted only at CLI onboarding. Server-side acceptance:

  1. Signature valid + nonce unseen — kills replay and fabrication. A payload not signed by a token we minted is rejected; a replayed nonce is rejected.
  2. Plausibility clamp per event and per day: a ceiling derived from measured maximums. A clamped event is still counted; it just can't be absurd. A cheater still plays; they just can't be silly.
  3. Weekly taper applied at the midnight rollup, not at ingest — full credit up to a generous heavy-use threshold, logarithmic above. Raw events stay honest in storage; the taper is presentation math, tunable without migration.
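The three acceptance steps above can be sketched with Python's standard hmac module. Everything specific here is an assumption made for illustration: SHA-256 as the hash, the nonce-dot-body message layout, the in-memory nonce set, and the exact taper curve (full credit up to the threshold, then threshold × (1 + ln(total / threshold)), which is continuous at the threshold). The page states only that uploads are HMAC-signed, that nonces are checked, that events are clamped, and that the taper is logarithmic above a threshold.

```python
import hashlib
import hmac
import math

seen_nonces = set()  # in-memory stand-in for a persistent nonce store


def sign(token: bytes, nonce: str, body: bytes) -> str:
    # assumed message layout: nonce, a dot, then the payload body
    return hmac.new(token, nonce.encode() + b"." + body, hashlib.sha256).hexdigest()


def accept(token: bytes, nonce: str, body: bytes, signature: str) -> bool:
    expected = sign(token, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False  # not signed by the minted token: fabrication
    if nonce in seen_nonces:
        return False  # nonce already used: replay
    seen_nonces.add(nonce)
    return True


def clamp(event_bytes: int, ceiling: int) -> int:
    # a clamped event is still counted, just capped at the ceiling
    return min(event_bytes, ceiling)


def taper(weekly_total: float, threshold: float) -> float:
    # applied at rollup time, not at ingest; raw events are stored untouched
    if weekly_total <= threshold:
        return weekly_total
    return threshold * (1 + math.log(weekly_total / threshold))


token = b"example-upload-token"
sig = sign(token, "nonce-1", b'{"bytes_saved": 9500}')
print(accept(token, "nonce-1", b'{"bytes_saved": 9500}', sig))  # True
print(accept(token, "nonce-1", b'{"bytes_saved": 9500}', sig))  # False (replay)
```

Checking the signature before recording the nonce means a forged request cannot burn a nonce that a legitimate client has yet to use.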

The leaderboard's legitimacy stance is explicit: legitimacy over cheat-prevention. Once fabrication is cryptographically closed, every remaining cheat requires actually using act — real scans, real sessions. A cheater is a power user and a billboard.

7. The other board: measured health improvement

The "Most Improved" board ranks public repos by AI-Code Health Score gain, and it is equally closed against gaming:

8. Rebuild cadence

Both boards rebuild once daily at 00:00 UTC — one cron computes tiers, streaks, achievements, rivalries, and the published JSON. The cadence is itself a published rule and a daily check-in ritual. If a rebuild fails, boards serve the previous day's data and the countdown shows "rebuilding…" — stale beats blank beats wrong.
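The fallback rule can be illustrated with a small Python sketch. The function name, its signature, and the "fresh" status label are hypothetical; only the "rebuilding…" label and the serve-yesterday behavior come from the description above.

```python
from typing import Callable, Optional, Tuple


def serve_board(rebuild: Callable[[], dict],
                previous: Optional[dict]) -> Tuple[Optional[dict], str]:
    try:
        return rebuild(), "fresh"
    except Exception:
        # stale beats blank beats wrong: fall back to yesterday's data,
        # or to nothing at all if there is no previous build
        return previous, "rebuilding…"


def failing_rebuild() -> dict:
    raise RuntimeError("rollup failed")


print(serve_board(failing_rebuild, {"day": "yesterday"}))
```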
