How we measure

A rank on the act101 leaderboard is instrumented, signed, third-party proof — not a claim. This page is the arithmetic behind every number, so a #1 finish survives a skeptical thread.

The headline is the privacy rule: act101 NEVER ships your code off-site. We ship only opt-in usage metrics. Everything below describes how those metrics are measured — never estimated from a benchmark, never ratio-guessed.

1. The standard candle: a naive full-file agent

Token savings are a measured counterfactual, not a ratio estimate. For every operation, we measure two numbers:

Naive bytes — what a naive agent would have read or written to acquire the same information: every touched file in full, before and after a change.
Actual bytes — what act actually consumed or emitted for the operation.

The naive full-file agent is a standard candle that equalizes across coding agents regardless of their own shortcuts. Naivete is the feature: one rule, no caps, no special cases, no per-agent negotiation about what "would have" happened.

2. The accounting, per operation kind

Operation kind	`naive_bytes`	`actual_bytes`
read / analyze	Σ full size of touched files	response bytes
write / refactor	Σ (before size + after size) per ChangeSet-affected file	response bytes + Σ edit-text bytes
no-file ops (status, docs, …)	= `actual_bytes`	response bytes
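The table can be read as one small dispatch over operation kinds. Below is a minimal sketch. It assumes a hypothetical `Operation` record that carries per-path file sizes. The field names, the kind strings, and `account` are illustrative only and are not act's internals.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Hypothetical record of one operation as seen at the dispatch layer."""
    kind: str            # assumed kind names: "read", "analyze", "write", "refactor", or other
    response_bytes: int  # bytes act emitted in its response
    # path -> full size in bytes of each touched file (read / analyze)
    touched_sizes: dict[str, int] = field(default_factory=dict)
    # path -> (size before, size after) for each ChangeSet-affected file (write / refactor)
    changed_sizes: dict[str, tuple[int, int]] = field(default_factory=dict)
    # size in bytes of each edit's text (write / refactor)
    edit_text_bytes: list[int] = field(default_factory=list)


def account(op: Operation) -> tuple[int, int]:
    """Return (naive_bytes, actual_bytes) for one operation, row by row of the table."""
    if op.kind in ("read", "analyze"):
        return sum(op.touched_sizes.values()), op.response_bytes
    if op.kind in ("write", "refactor"):
        naive = sum(before + after for before, after in op.changed_sizes.values())
        return naive, op.response_bytes + sum(op.edit_text_bytes)
    # No-file operations: naive equals actual, so the savings are exactly zero.
    return op.response_bytes, op.response_bytes
```

For example, a read touching two files of 12,000 and 8,000 bytes that returns 1,500 response bytes accounts as (20,000, 1,500). Keying sizes by path is one simple way to make each file count once per operation.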

Honest zero. A no-file operation records savings ≡ 0: not a rounded-down estimate, but an actual zero. Operations that touch no files have nothing to save, and we say so.

Whole-workspace operations count it all: the naive agent reads every file the operation could have touched, with no cap. Within one operation, a file counts once (a set keyed by path). Across operations, repeated touches count each time — the naive agent re-reads too; each operation is measured independently.
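The counting rule for repeated touches fits in a few lines. This is a sketch under stated assumptions: touches arrive as hypothetical (path, size) pairs, and `naive_read_bytes` and `naive_total` are illustrative names, not act functions.

```python
def naive_read_bytes(touches: list[tuple[str, int]]) -> int:
    """Naive bytes for one read operation: each path counts once, at full size."""
    by_path: dict[str, int] = {}
    for path, size in touches:
        by_path[path] = size  # keyed by path, so touching the same file again adds nothing
    return sum(by_path.values())


def naive_total(operations: list[list[tuple[str, int]]]) -> int:
    """Each operation is measured independently, so a file re-read later counts again."""
    return sum(naive_read_bytes(touches) for touches in operations)
```

An operation that touches `src/a.py` (4,000 bytes) twice and `src/b.py` (6,000 bytes) once counts 10,000 naive bytes. A second operation that reads `src/a.py` again adds another 4,000.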

3. Bytes → tokens

# the single conversion, applied at display and upload time
tokens_saved = (naive_bytes - actual_bytes) / 4

Bytes are the stored unit, on disk and over the wire. The / 4 is applied only when a number is shown to you or uploaded — one conversion, stated plainly, visible everywhere a token count appears.
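Because the conversion is linear, summing bytes first and dividing once gives the same result as converting each operation separately. The sketch below keeps byte totals as the stored unit and converts only at the end. The `BYTES_PER_TOKEN` constant and the per-operation figures are illustrative.

```python
BYTES_PER_TOKEN = 4  # the one fixed conversion factor


def tokens_saved(naive_bytes: int, actual_bytes: int) -> float:
    """Convert stored byte totals to tokens; used only for display or upload."""
    return (naive_bytes - actual_bytes) / BYTES_PER_TOKEN


# Illustrative (naive_bytes, actual_bytes) per operation: a read, a write, a status call.
ops = [(20_000, 1_500), (10_200, 700), (250, 250)]
naive_total = sum(naive for naive, _ in ops)     # stays in bytes
actual_total = sum(actual for _, actual in ops)  # stays in bytes
print(tokens_saved(naive_total, actual_total))   # prints 7000.0
```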

4. What is measured vs. estimated

Every savings number on the leaderboard is measured: the byte counts come from the actual files the operation touched and the actual bytes it emitted, recorded at the operation dispatch layer.

Historically, act shipped a ratio-based estimator (a benchmark table of per-operation savings ratios). That estimator and its table have been deleted, not bypassed. Older runs' estimated totals are preserved in a separate legacy_estimated block, rendered as a clearly labeled line in act stats and never summed with measured numbers. One accounting truth, no mixing.

Nothing on the leaderboard is estimated. If a number can't be measured, it isn't shown.
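The "never summed" rule is easiest to keep when the two totals live in separate fields and render on separate lines. The sketch below is hypothetical throughout. The `Stats` shape, the assumption that legacy_estimated holds a token count, and the label text are illustrations, not act's real output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stats:
    """Hypothetical totals behind a stats display."""
    naive_bytes: int                        # measured
    actual_bytes: int                       # measured
    legacy_estimated: Optional[int] = None  # assumed: older ratio-based token estimate


def render_stats(s: Stats) -> list[str]:
    """One line per accounting truth; the two figures are never added together."""
    measured_tokens = (s.naive_bytes - s.actual_bytes) / 4
    lines = [f"Measured tokens saved: {measured_tokens:,.0f}"]
    if s.legacy_estimated is not None:
        lines.append(f"Legacy estimate (older runs, not measured): {s.legacy_estimated:,}")
    return lines
```

With 30,450 naive and 2,450 actual bytes plus a legacy estimate of 12,000, this renders "7,000" and "12,000" on separate lines, never "19,000".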

5. What never leaves your machine

Raw operation records — which files, which paths, which projects — stay on your machine. Only an aggregate payload leaves, and only if you opt in at act onboarding. The uploaded payload carries:

aggregate byte totals (naive_bytes, actual_bytes) and the derived token count,
per-operation-name and per-grammar-name counts — product vocabulary, not your data — and
the week bucket, a nonce, and a session count.

Never repo names, file paths, project hashes, per-project breakdowns, or code. There is no path from your source to the leaderboard.
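One way to make "no path from your source" structural, rather than a matter of policy, is to give the payload type no field that could hold a path or a repo name. The field names below mirror the list above but are assumed. So are the ISO-week string format and the serialization.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UploadPayload:
    """Hypothetical shape of the opt-in aggregate payload; no field can carry a path."""
    naive_bytes: int                 # aggregate byte total
    actual_bytes: int                # aggregate byte total
    tokens_saved: float              # derived: (naive_bytes - actual_bytes) / 4
    operation_counts: dict[str, int] # operation name -> count (product vocabulary)
    grammar_counts: dict[str, int]   # grammar name -> count (product vocabulary)
    week: str                        # week bucket, assumed ISO format such as "2025-W14"
    nonce: str
    session_count: int

    def to_json(self) -> str:
        """Deterministic serialization, so the same payload always yields the same bytes."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
```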

6. Signed ingestion: fabrication is closed

Every upload is HMAC-signed with an upload token minted only at CLI onboarding. Server-side acceptance:

Signature valid + nonce unseen — kills replay and fabrication. A payload not signed by a token we minted is rejected; a replayed nonce is rejected.
Plausibility clamp per event and per day: a ceiling derived from measured maximums. A clamped event still counts at the ceiling value, so it can't be absurd. A cheater still plays; they just can't be silly.
Weekly taper applied at the midnight rollup, not at ingest — full credit up to a generous heavy-use threshold, logarithmic above. Raw events stay honest in storage; the taper is presentation math, tunable without migration.
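The signature and nonce checks can be sketched with Python's standard `hmac` module. The sketch makes several assumptions. HMAC-SHA256 is the hash. The client sends a token id alongside the body. The nonce sits inside the signed JSON body. `Ingest`, `mint`, and `accept` are hypothetical names, not act's server code.

```python
import hashlib
import hmac
import json
import secrets


def sign(upload_token: bytes, body: bytes) -> str:
    """Client side: HMAC-SHA256 over the exact payload bytes, keyed by the upload token."""
    return hmac.new(upload_token, body, hashlib.sha256).hexdigest()


class Ingest:
    """Server side: accept a payload only if a token we minted signed it
    and its nonce has never been seen."""

    def __init__(self) -> None:
        self.tokens: dict[str, bytes] = {}  # token id -> secret
        self.seen_nonces: set[str] = set()

    def mint(self, token_id: str) -> bytes:
        """Onboarding: the only place an upload token is created."""
        secret = secrets.token_bytes(32)
        self.tokens[token_id] = secret
        return secret

    def accept(self, token_id: str, body: bytes, signature: str) -> bool:
        secret = self.tokens.get(token_id)
        if secret is None:
            return False  # not signed by a token we minted
        if not hmac.compare_digest(sign(secret, body), signature):
            return False  # fabricated or tampered payload
        # The nonce is inside the signed body, so a replayer cannot swap in a fresh one.
        nonce = json.loads(body)["nonce"]
        if nonce in self.seen_nonces:
            return False  # replayed payload
        self.seen_nonces.add(nonce)
        return True
```

The signature is verified before the nonce set is touched. As a result, unsigned traffic cannot use up nonces, and a valid upload cannot be accepted twice.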

The leaderboard's legitimacy stance is explicit: legitimacy over cheat-prevention. Once fabrication is cryptographically closed, every remaining cheat requires actually using act — real scans, real sessions. A cheater is a power user and a billboard.
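The clamp and the weekly taper from the ingestion rules can be sketched as two pure functions: one runs at ingest and the other at the midnight rollup. The ceilings and threshold here are placeholders, since the real values are derived from measured maximums. The logarithmic curve shown, T·(1 + ln(x/T)), is one possible taper chosen to be continuous at the threshold, not necessarily the one act uses.

```python
import math

# Placeholder values in saved bytes; the real ceilings come from measured maximums.
EVENT_CEILING = 50_000_000
DAY_CEILING = 500_000_000
WEEKLY_THRESHOLD = 1_000_000_000


def clamp_day(event_savings: list[int]) -> int:
    """Ingest time: every event still counts, capped at the per-event ceiling,
    and the day's total is capped at the per-day ceiling."""
    total = sum(min(s, EVENT_CEILING) for s in event_savings)
    return min(total, DAY_CEILING)


def taper_week(raw_week: float, threshold: float = WEEKLY_THRESHOLD) -> float:
    """Rollup time: full credit up to the threshold, logarithmic growth above it.
    Continuous at the threshold because ln(1) == 0."""
    if raw_week <= threshold:
        return raw_week
    return threshold * (1 + math.log(raw_week / threshold))
```

Raw events stay in storage unchanged, so retuning the threshold only means rerunning `taper_week` at the next rollup. No data migration is needed.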

7. The other board: measured health improvement

The "Most Improved" board ranks public repos by AI-Code Health Score gain, and it is equally closed against gaming:

High-water mark (HWM). counted_improvement = max(0, best_score_this_week − HWM_at_week_start). Poison-then-fix banks 0 by construction — self-sabotage is pointless, not forbidden, and no intent adjudication exists.
Scoring epochs. The score formula is actively evolving. Each history row records its scoring_epoch; HWM comparisons are intra-epoch only. A formula change never mints fake improvements.
1,000-line floor. Repos under 1,000 non-blank lines are ineligible — stated on the board, not hidden.
Diff-scoped scores never count. PR-mode scan scores are labeled diff-scoped and never compared to full-repo scores.
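The four rules combine into a single function. Below is a minimal sketch assuming a hypothetical `ScoreRow` history record. It also assumes a choice the rules above leave open: a repo with no full-repo score in the current epoch before this week has no baseline, so it banks nothing.

```python
from dataclasses import dataclass
from typing import Optional

MIN_NON_BLANK_LINES = 1_000


@dataclass
class ScoreRow:
    """Hypothetical history row for one scan of a public repo."""
    score: float
    scoring_epoch: int
    diff_scoped: bool  # PR-mode scans are labeled diff-scoped


def counted_improvement(before_week: list[ScoreRow], this_week: list[ScoreRow],
                        epoch: int, non_blank_lines: int) -> Optional[float]:
    """max(0, best_score_this_week - HWM_at_week_start), or None if ineligible."""
    if non_blank_lines < MIN_NON_BLANK_LINES:
        return None  # ineligible repo
    # Only full-repo scores from the current epoch are comparable.
    prior = [r.score for r in before_week if r.scoring_epoch == epoch and not r.diff_scoped]
    current = [r.score for r in this_week if r.scoring_epoch == epoch and not r.diff_scoped]
    if not prior or not current:
        return 0.0  # assumption: no same-epoch baseline or no comparable scan this week
    return max(0.0, max(current) - max(prior))
```

A repo with a high-water mark of 70 that drops to 40 and then climbs back to 70 banks 0. Reaching 76 banks 6.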

8. Rebuild cadence

Both boards rebuild once daily at 00:00 UTC — one cron computes tiers, streaks, achievements, rivalries, and the published JSON. The cadence is itself a published rule and a daily check-in ritual. If a rebuild fails, boards serve the previous day's data and the countdown shows "rebuilding…" — stale beats blank beats wrong.
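Both the countdown and the stale-data fallback are short. The sketch below uses only the standard `datetime` module. `next_rebuild`, `board_to_serve`, and the "ok" status string are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_rebuild(now: datetime) -> datetime:
    """Next 00:00 UTC strictly after `now`, which must be timezone-aware."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def board_to_serve(todays_build: Optional[dict], previous: dict) -> tuple[dict, str]:
    """If today's rebuild failed (None), serve the previous day's data and say so."""
    if todays_build is None:
        return previous, "rebuilding…"  # stale beats blank beats wrong
    return todays_build, "ok"
```

The countdown shown to a visitor is `next_rebuild(now) - now`. At 22:30 UTC, for example, it reads 1 hour 30 minutes.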
