AI-readiness rubric — 28 criteria

AgentFit scores every audited site against 28 criteria grouped into six categories totalling 100 points. Each criterion has an explicit spec: what the audit checks, how scoring tiers map to points, and a one-line hint for closing the gap. The spec is deterministic, and the Go implementation follows it exactly. Read the methodology article for the framing.

Beyond the 28 scored criteria, AgentFit also surfaces emerging signals such as WebMCP support and MCP server detection.

What AI-readiness actually means

An agent reads your documentation over HTTP, not in a browser with a person behind it. It does not run your JavaScript, does not scroll, does not hover a tab to reveal the endpoint list, and does not ask a colleague what the rate limit is. It fetches, parses, and either finds a usable contract or gives up and guesses. AI-readiness is the share of that path a machine can complete unaided.

It is measurable because every step of that path is an observable HTTP fact. Does /llms.txt exist and parse? Does the page carry an absolute canonical URL? Does the OpenAPI document validate and describe its response schemas? Are the method, the path and a working example present in the raw HTML that arrives before any script runs? None of this requires judging whether the prose is good. AgentFit checks 28 such facts and reports what it saw for every one of them, with the URL it fetched and the evidence snippet.

How the score is computed

Each of the 28 criteria has a ladder of integer tiers and a weight. A criterion returns a status — present, partial, absent, not applicable, or error — and a score between zero and its weight. The 28 scores roll up into six categories that sum to exactly 100. Not applicable and error both score zero, but they are labelled separately on purpose: “we could not measure this” is not the same claim as “this is not there”, and collapsing them would let a blocked fetch masquerade as a finding.
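A minimal sketch of that roll-up in Go, the language the implementation is written in. The type and function names (`Status`, `Result`, `Score`, `CategoryTotals`) are hypothetical, not AgentFit's code; the sketch only illustrates the two rules above: a score is clamped to its criterion's weight, and not applicable and error contribute zero while keeping their own label.

```go
package main

// Status is the label a criterion returns. NotApplicable and Error both
// score zero, but they stay distinct so a blocked fetch never reads as
// "absent".
type Status int

const (
	Present Status = iota
	Partial
	Absent
	NotApplicable
	Error
)

// Result is one criterion's outcome (hypothetical shape).
type Result struct {
	ID       string // e.g. "A1"
	Category byte   // 'A' to 'F'
	Weight   int    // maximum points for the criterion
	Tier     int    // integer tier the check landed on
	Status   Status
}

// Score returns the points a result contributes: zero for not applicable
// and error, otherwise the tier clamped into [0, Weight].
func (r Result) Score() int {
	if r.Status == NotApplicable || r.Status == Error || r.Tier < 0 {
		return 0
	}
	if r.Tier > r.Weight {
		return r.Weight
	}
	return r.Tier
}

// CategoryTotals sums the scores of all results per category letter.
func CategoryTotals(results []Result) map[byte]int {
	totals := make(map[byte]int)
	for _, r := range results {
		totals[r.Category] += r.Score()
	}
	return totals
}
```

Keeping `Status` separate from the number is what lets a report say "error" where a plain sum of points could only say "zero".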

The weights are not editorial taste. Each one comes from a rule written down before the numbers were computed: a criterion earns weight when it separates sites that are otherwise tied on the rest of the rubric, and loses weight when its score is mostly decided by which documentation platform the site runs on. A point you get for choosing a particular docs host is a point about your hosting, not about your documentation. The rule, the metric and the final weight vector were committed to the repository ahead of the calibration run; the git order is the audit trail, and the same procedure has already rejected its own author’s hypothesis once.

The audit is deterministic: the same site fetched twice produces byte-identical JSON. No language model runs during scoring — two small classifiers, for example-realism and endpoint completeness, are compiled into the binary and versioned with it, so a retrain cannot silently move your score. When the ruler itself changes, the rubric version is bumped and stamped on the run, and comparing two runs scored under different versions is refused rather than presented as a change in your site.

Category weights. The third column is the question the category answers on your behalf.

Category	Weight	What it answers
A — Discovery	14	Can an agent find your docs at predictable URLs?
B — Page artifacts	21	Can a parser ingest each page's contents directly?
C — API contract	17	Is the API contract published as a machine-readable spec?
D — Content	23	Does each endpoint page carry enough context to use?
E — Rendering & hygiene	21	Is the site stable and usable without JavaScript?
F — Agent capability	4	Does the site expose agent-native surfaces (WebMCP, MCP)?

Each bar is the share of that category's own budget, and the budgets differ, so a short bar in a small category is not a large loss. The right-hand column counts sites that earned nothing there, which includes criteria that did not apply to them.

Every criterion is drawn against the same 0-100 % scale of its own budget, and the scale is not clipped. That all the bars are short is the finding, not a rendering choice.

A — Discovery · 14/100

Can an agent find your docs at predictable URLs?

Discovery covers everything an agent finds before it reads a single documentation page: an /llms.txt index at the host root, a full-text /llms-full.txt aggregate, a robots.txt that takes an explicit position on AI crawlers, a sitemap listing real pages rather than tag archives, and homepage tags pointing at a markdown twin. This is the cheapest category on the board — every artifact in it is a static file produced at build time — and it is the one most often left empty.

A1 · llms.txt at host root conforms to llmstxt.org spec

Anchor: /rubric#a1

Agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Jeremy Howard (Answer.AI) and documented at llmstxt.org: one Markdown file at the host root with the docs map.

Scoring. 3 = conformant llms.txt OR full agent-discovery breadth (llms.txt real text + llms-full.txt + advertised MCP/agent endpoint) · 2 = strong on one side · 1 = one weak signal · 0 = neither · error if unreachable.

Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec.
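The structural part of that check can be sketched as below. This is an illustration under a simplified reading of the llmstxt.org format (an H1 on the first non-blank line, at least one H2, at least three Markdown bullet links); `LooksConformant` is a hypothetical name, and the sketch ignores the optional summary blockquote and does not fetch anything.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

// bulletLink matches a Markdown list item that starts with a link, e.g.
// "- [Quickstart](https://example.com/docs/quickstart): first call".
var bulletLink = regexp.MustCompile(`^[-*]\s+\[[^\]]+\]\([^)\s]+\)`)

// LooksConformant reports whether body has an H1 on its first non-blank
// line, at least one H2 section header, and at least three bullet links.
func LooksConformant(body string) bool {
	sc := bufio.NewScanner(strings.NewReader(body))
	sawFirst, hasH1 := false, false
	h2s, links := 0, 0
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if !sawFirst {
			sawFirst = true
			hasH1 = strings.HasPrefix(line, "# ")
			continue
		}
		switch {
		case strings.HasPrefix(line, "## "):
			h2s++
		case bulletLink.MatchString(line):
			links++
		}
	}
	if sc.Err() != nil {
		return false // e.g. a line longer than the scanner's buffer
	}
	return hasH1 && h2s >= 1 && links >= 3
}
```

An HTML error page served at /llms.txt fails on the first line, which is the cheapest way to catch a soft 404.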

A2 · llms-full.txt or per-section LLM aggregates exist

Anchor: /rubric#a2

/llms-full.txt is the full-text dump of your docs in one place — large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too.

Scoring. 2 = /llms-full.txt at host root (>1 KB) OR a per-section aggregate linked from llms.txt · 0 = neither, or an SPA shell served at /llms-full.txt.

Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming.

A3 · robots.txt declares an AI-bot policy and absolute Sitemap

Anchor: /rubric#a3

Every major AI crawler (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) reads /robots.txt before crawling. An explicit Allow/Disallow per user agent plus an absolute `Sitemap:` directive removes ambiguity about both whether to crawl and what to index.

Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404.

Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts.
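A sketch of how the A3 tiers could be computed from a robots.txt body. The bot list is copied from the paragraph above, `ScoreRobots` is a hypothetical name, and treating any `User-agent` group that names one of those bots (or a `Content-Signal:` line) as an explicit policy is an assumption about how "explicit directive" is read.

```go
package main

import (
	"bufio"
	"net/url"
	"strings"
)

// aiBots lists the AI crawler tokens named in the text above, lower-cased.
var aiBots = map[string]bool{
	"gptbot": true, "claudebot": true, "google-extended": true,
	"perplexitybot": true, "ccbot": true,
}

// ScoreRobots maps a robots.txt body onto the A3 tiers:
// 3 = AI-bot policy and absolute Sitemap, 2 = AI-bot policy only,
// 1 = absolute Sitemap only, 0 = neither.
func ScoreRobots(body string) int {
	aiPolicy, absSitemap := false, false
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "user-agent":
			if aiBots[strings.ToLower(value)] {
				aiPolicy = true
			}
		case "content-signal":
			aiPolicy = true // counted as an AI policy, per the fix note
		case "sitemap":
			if u, err := url.Parse(value); err == nil && u.IsAbs() && u.Host != "" {
				absSitemap = true
			}
		}
	}
	switch {
	case aiPolicy && absSitemap:
		return 3
	case aiPolicy:
		return 2
	case absSitemap:
		return 1
	}
	return 0
}
```

`strings.Cut` splits at the first colon only, so the `https:` inside a Sitemap URL survives intact; a relative `Sitemap: /sitemap.xml` line earns nothing here.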

A4 · sitemap.xml: well-formed, absolute URLs, low taxonomy noise

Anchor: /rubric#a4

Sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ or /category/ noise) gives no signal.

Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error.

Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line.
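A rough sketch of the A4 tiers using only the Go standard library. `ScoreSitemap` is hypothetical; it assumes only absolute `<loc>` URLs count, treats more than 70% taxonomy URLs as "well-formed only", and does not follow sitemap index files (a `<sitemapindex>` root fails to parse here and would need separate handling).

```go
package main

import (
	"encoding/xml"
	"net/url"
	"strings"
)

// urlset mirrors the <urlset><url><loc> shape of a sitemap; with no
// namespace in the tags, elements match regardless of xmlns.
type urlset struct {
	XMLName xml.Name `xml:"urlset"`
	URLs    []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// taxonomyMarkers are the archive-style fragments the fix says to exclude.
var taxonomyMarkers = []string{"/tag/", "/category/", "/author/", "page="}

// ScoreSitemap applies the A4 tiers to a sitemap body.
func ScoreSitemap(body []byte) int {
	var s urlset
	if err := xml.Unmarshal(body, &s); err != nil {
		return 0 // not a parseable <urlset>
	}
	paths := make(map[string]bool)
	abs, junk := 0, 0
	for _, u := range s.URLs {
		loc, err := url.Parse(strings.TrimSpace(u.Loc))
		if err != nil || !loc.IsAbs() || loc.Host == "" {
			continue // only absolute URLs count
		}
		abs++
		paths[loc.Path] = true
		for _, m := range taxonomyMarkers {
			if strings.Contains(loc.String(), m) {
				junk++
				break
			}
		}
	}
	switch {
	case abs == 0 || junk*10 > abs*7: // nothing usable, or >70% taxonomy noise
		return 1
	case len(paths) >= 3 && junk*10 < abs*3: // <30% noise
		return 3
	}
	return 2 // thin, or 30-70% noise
}
```

The ratios are compared in integer arithmetic (`junk*10 > abs*7`) so that boundary cases do not depend on floating-point rounding.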

A5 · Homepage discovery tags: markdown alternate + OpenGraph

Anchor: /rubric#a5

Discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers.

Scoring. 3 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither.

Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage.

B — Page artifacts · 21/100

Can a parser ingest each page's contents directly?

Page artifacts decide whether a single page can be ingested without guesswork: a clean markdown companion served at .md or under Accept: text/markdown, JSON-LD that parses and declares a type, an absolute canonical URL, a machine-readable modification date, and a main or article element marking where the content ends and the navigation begins. Agents deduplicate on canonicals and decide what to re-fetch on dates; without those signals your page is an undifferentiated wall of divs.

B1 · .md companion of doc pages returns clean markdown

Anchor: /rubric#b1

Browsing 200 HTML pages to read your docs is fine for humans; for agents, the markup around the prose can multiply the tokenization cost by an order of magnitude. A markdown twin per page lets agents pull just the prose.

Scoring. 5 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3.

Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins.
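One way to serve both variants from a Go server, sketched under assumptions: pages live in two hypothetical in-memory maps keyed by clean path, `WantsMarkdown` and `DocsHandler` are illustrative names, and `Accept` q-values are ignored. A real site would more likely emit the `.md` files at build time and let the static host serve them.

```go
package main

import (
	"mime"
	"net/http"
	"strings"
)

// WantsMarkdown reports whether a request should get the markdown twin:
// the path ends in ".md", or the Accept header lists text/markdown.
// Simplification: q-values are ignored.
func WantsMarkdown(path, accept string) bool {
	if strings.HasSuffix(path, ".md") {
		return true
	}
	for _, part := range strings.Split(accept, ",") {
		mt, _, err := mime.ParseMediaType(strings.TrimSpace(part))
		if err == nil && mt == "text/markdown" {
			return true
		}
	}
	return false
}

// DocsHandler serves pages from two hypothetical in-memory maps keyed by
// clean path (e.g. "/docs/auth"): rendered HTML and source markdown.
func DocsHandler(html, md map[string]string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimSuffix(r.URL.Path, ".md")
		w.Header().Add("Vary", "Accept") // caches must key on Accept
		if WantsMarkdown(r.URL.Path, r.Header.Get("Accept")) {
			if body, ok := md[key]; ok {
				w.Header().Set("Content-Type", "text/markdown; charset=utf-8")
				w.Write([]byte(body))
				return
			}
		} else if body, ok := html[key]; ok {
			w.Header().Set("Content-Type", "text/html; charset=utf-8")
			w.Write([]byte(body))
			return
		}
		http.NotFound(w, r)
	})
}
```

The `Vary: Accept` header matters when a CDN sits in front: without it, a cache can hand the markdown variant to a browser or the HTML variant to an agent.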

B2 · JSON-LD with valid @type on homepage and sample doc page

Anchor: /rubric#b2

JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it.

Scoring. 3 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none.

Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage.
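A check for "parseable JSON-LD with a non-empty `@type`" could look like this sketch; `HasLDType` is a hypothetical name, extracting the `<script type="application/ld+json">` body from the HTML is left out, and looking inside top-level arrays and `@graph` is an assumption about what counts.

```go
package main

import "encoding/json"

// HasLDType reports whether a JSON-LD block parses and declares a non-empty
// @type, either at the top level, in a top-level array, or inside @graph.
func HasLDType(block []byte) bool {
	var v any
	if err := json.Unmarshal(block, &v); err != nil {
		return false // unparseable JSON-LD earns nothing
	}
	return hasType(v)
}

// hasType walks arrays, objects and @graph looking for a usable @type.
func hasType(v any) bool {
	switch node := v.(type) {
	case []any:
		for _, item := range node {
			if hasType(item) {
				return true
			}
		}
	case map[string]any:
		switch t := node["@type"].(type) {
		case string:
			if t != "" {
				return true
			}
		case []any:
			if len(t) > 0 {
				return true
			}
		}
		if graph, ok := node["@graph"]; ok {
			return hasType(graph)
		}
	}
	return false
}
```

A trailing comma, which browsers tolerate in many places, makes `json.Unmarshal` fail, so a block that "looks fine" in view-source can still score zero.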

B3 · Absolute <link rel=canonical> on homepage and sample page

Anchor: /rubric#b3

Canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs.

Scoring. 4 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent.

Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents).
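A sketch of the B3 tiers, with hypothetical names. It reads "absolute" as a URL with both a scheme and a host, so a protocol-relative `//example.com/` href scores as relative; a page-only absolute canonical falling into the 1-point tier is an assumption the tiers above do not spell out.

```go
package main

import "net/url"

// canonicalKind classifies one canonical href:
// 0 = absent, 1 = present but not absolute, 2 = absolute (scheme + host).
func canonicalKind(href string) int {
	if href == "" {
		return 0
	}
	u, err := url.Parse(href)
	if err != nil || !u.IsAbs() || u.Host == "" {
		return 1
	}
	return 2
}

// ScoreCanonical applies the B3 tiers to the canonical hrefs found on the
// homepage and on a sample page ("" means no <link rel=canonical>).
func ScoreCanonical(homeHref, pageHref string) int {
	home, page := canonicalKind(homeHref), canonicalKind(pageHref)
	switch {
	case home == 2 && page == 2:
		return 4
	case home == 2:
		return 2
	case home > 0 || page > 0:
		return 1
	}
	return 0
}
```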

B4 · Freshness: dateModified (JSON-LD) or Last-Modified header

Anchor: /rubric#b4

Agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday.

Scoring. 4 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither.

Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators.

B5 · Machine-readable taxonomies (keywords, tags, categories)

Anchor: /rubric#b5

Tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count.

Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags|/categories|/topics/ link patterns) · 0 = none.

Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes.

B6 · <main> or <article> wraps the primary content prose

Anchor: /rubric#b6

Semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork.

Scoring. 3 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither.

Fix. Wrap your page's primary prose in `<main>`, and the body of each individual doc page in `<article>`; full marks need both. Avoid using these for sidebars or navigation: they're meant for the actual content.

C — API contract · 17/100

Is the API contract published as a machine-readable spec?

The API contract category asks one thing: is there a specification an agent can find and trust? C1 is the heaviest criterion in the rubric at 8 points: finding the document at a discoverable URL earns 3, and the full 8 requires a valid OpenAPI 3.x document with info, paths and response schemas. The split is deliberate: a file that exists but does not describe what comes back is half a contract. A valid spec is worth more than any page of prose, because a client, a test suite and a set of tool definitions can be generated from it without reading the docs at all.

C1 · OpenAPI / Swagger / AsyncAPI spec — found and valid

Anchor: /rubric#c1

An OpenAPI spec is THE primary machine-readable contract for a REST API: agents that find a VALID one generate clients, tests and accurate docs without reading any HTML. Rubric v3 merges discovery and validity into one criterion: finding a spec is half the battle, but a spec with no response schemas is half a contract.

Scoring. 8 = valid OpenAPI 3.x (info, ≥1 path, response schemas on ≥30% of operations) · 3 = a spec was found at a reachable URL but is not valid 3.x (incl. Swagger 2.0) · 0 = nothing found · error if the homepage couldn't be fetched.

Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or advertise it via an RFC 9727 api-catalog `service-doc` link). Make it OpenAPI 3.x and give each operation a `responses` schema — that is what earns the full 8.
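A sketch of the validity half of C1 for a JSON spec, using only the standard library (YAML would need a third-party parser). `ScoreSpec` is hypothetical and assumes the spec was already found; it counts a response given as a `$ref` as having a schema without resolving it, which a stricter check would not.

```go
package main

import (
	"encoding/json"
	"strings"
)

// httpMethods are the operation keys a path item may carry in OpenAPI 3.x.
var httpMethods = []string{"get", "put", "post", "delete", "options", "head", "patch", "trace"}

// document holds only the fields this sketch inspects.
type document struct {
	OpenAPI string                                `json:"openapi"`
	Info    map[string]any                        `json:"info"`
	Paths   map[string]map[string]json.RawMessage `json:"paths"`
}

type mediaType struct {
	Schema json.RawMessage `json:"schema"`
}

type response struct {
	Ref     string               `json:"$ref"`
	Content map[string]mediaType `json:"content"`
}

type operation struct {
	Responses map[string]response `json:"responses"`
}

// ScoreSpec applies the C1 tiers to a spec that was already found:
// 8 = OpenAPI 3.x with info, at least one path, and response schemas on at
// least 30% of operations; anything else found scores 3.
func ScoreSpec(body []byte) int {
	var doc document
	if err := json.Unmarshal(body, &doc); err != nil {
		return 3
	}
	if !strings.HasPrefix(doc.OpenAPI, "3.") || len(doc.Info) == 0 || len(doc.Paths) == 0 {
		return 3 // includes Swagger 2.0, which has no "openapi" field
	}
	ops, withSchema := 0, 0
	for _, item := range doc.Paths {
		for _, m := range httpMethods {
			raw, ok := item[m]
			if !ok {
				continue
			}
			var op operation
			if json.Unmarshal(raw, &op) != nil {
				continue
			}
			ops++
			if hasResponseSchema(op) {
				withSchema++
			}
		}
	}
	if ops == 0 || withSchema*10 < ops*3 {
		return 3
	}
	return 8
}

// hasResponseSchema reports whether any response has an inline schema or is
// a $ref (counted without resolving it).
func hasResponseSchema(op operation) bool {
	for _, r := range op.Responses {
		if r.Ref != "" {
			return true
		}
		for _, mt := range r.Content {
			if len(mt.Schema) > 0 {
				return true
			}
		}
	}
	return false
}
```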

C2 · Postman collection or SDKs with discoverable download/fork

Anchor: /rubric#c2

An OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience.

Scoring. 2 = Postman collection AND/OR ≥1 SDK registry link · 1 = a single SDK link · 0 = nothing

Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage.

C3 · Endpoint pages show method, URL, types, required, examples

Anchor: /rubric#c3

A docs page that just says "call /users" is useless without the method, the full URL, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call.

Scoring. 7 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found.

Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly.

D — Content · 23/100

Does each endpoint page carry enough context to use?

Content is the largest category, and the only one that cannot be fixed in a configuration file. It checks whether endpoint pages carry runnable examples in more than one language, whether those examples use values that look like real data instead of foo and example.com, whether errors are catalogued with codes and causes, whether authentication and rate limits are documented, and whether the same concept is called the same thing throughout. Agents ground their answers in your examples; a placeholder-heavy snippet is copied into generated code verbatim.

D1 · Code examples include curl AND at least one language SDK

Anchor: /rubric#d1

curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs.

Scoring. 3 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither.

Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up.

D2 · Realistic examples (not foo/bar/example.com)

Anchor: /rubric#d2

`/users/{id}` with `id = 1` and an `@example.com` email address requires the reader to imagine what real data looks like. Realistic placeholders (an address on a plausible-looking company domain, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents.

Scoring. 5 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks.

Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data — but mimic its shape.
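The D2 bands translate directly into a lookup. The classifier that labels each block is not shown; `ScoreRealism` is a hypothetical name, and putting each boundary value (exactly 20%, 40%, 60%) in the lower-scoring band, with exactly 80% still earning 1, is an assumption since the tiers give ranges only.

```go
package main

// ScoreRealism maps the share of placeholder-heavy code blocks onto the D2
// tiers: <20% -> 5, 20-40% -> 3, 40-60% -> 2, 60-80% -> 1, >80% -> 0.
func ScoreRealism(placeholderHeavy, totalBlocks int) int {
	if totalBlocks == 0 {
		return 0 // no code blocks at all
	}
	share := float64(placeholderHeavy) / float64(totalBlocks)
	switch {
	case share < 0.20:
		return 5
	case share < 0.40:
		return 3
	case share < 0.60:
		return 2
	case share <= 0.80:
		return 1
	}
	return 0
}
```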

D3 · Error catalogue with HTTP codes + reasons

Anchor: /rubric#d3

When an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug.

Scoring. 5 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none.

Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists.

D4 · Authentication AND rate limits documented

Anchor: /rubric#d4

Auth is table stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage.

Scoring. 4 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither.

Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context — not just a code snippet.

D5 · Glossary OR consistent terminology across pages

Anchor: /rubric#d5

Is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable.

Scoring. 4 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither.

Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages.
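The 80% dominant-variant rule can be sketched as below, assuming the audit has already grouped competing terms per concept and counted their uses across sampled pages (that grouping, and both function names, are hypothetical here). Ties are broken alphabetically so the result stays deterministic regardless of map iteration order.

```go
package main

// DominantShare returns the most-used variant among the counts of competing
// terms for one concept (e.g. "workspace" vs "team") and its share of uses.
func DominantShare(counts map[string]int) (string, float64) {
	best, bestN, total := "", 0, 0
	for term, n := range counts {
		total += n
		if n > bestN || (n == bestN && term < best) {
			best, bestN = term, n
		}
	}
	if total == 0 {
		return "", 0
	}
	return best, float64(bestN) / float64(total)
}

// ConsistentTerminology reports whether every concept's dominant variant
// covers at least 80% of its uses, the threshold quoted for D5's 2-point tier.
func ConsistentTerminology(concepts map[string]map[string]int) bool {
	for _, counts := range concepts {
		if _, share := DominantShare(counts); share < 0.80 {
			return false
		}
	}
	return true
}
```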

D6 · Deprecated / beta endpoints marked in plain text

Anchor: /rubric#d6

A developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches.

Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none.

Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints — visible in the prose, not just in the spec.

E — Rendering & hygiene · 21/100

Is the site stable and usable without JavaScript?

Rendering and hygiene is about whether any of the above survives contact with a plain HTTP client. E1 is the gating criterion: if the content is not in the HTML that arrives before JavaScript runs, an agent sees an empty application shell no matter how good the docs are. The rest of the category covers stable URLs across moves, an explicit API version, internal links that actually resolve, stated usage terms, and controls carrying accessible names an agent can target when it operates the page rather than reads it.

E1 · Content visible in plain HTML without JavaScript (gating)

Anchor: /rubric#e1

This is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. Many web crawlers and scrapers, curl, and most AI fetchers don't run JS.

Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages SPA · 0 = SPA shell everywhere, or uniform-shell trap (3+ URLs return identical body).

Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate and cascade-zero many other criteria.
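Two pieces of the gate are easy to sketch: the 500-character threshold and the uniform-shell trap. The names are hypothetical, text extraction from the raw HTML is assumed to have happened already, and the way sub-page results combine into the 6/3/0 tiers is not modelled.

```go
package main

import (
	"crypto/sha256"
	"unicode/utf8"
)

// HasContent applies the 500-character body-text threshold to text already
// extracted from the raw, pre-JavaScript HTML.
func HasContent(bodyText string) bool {
	return utf8.RuneCountInString(bodyText) > 500
}

// UniformShell reports the "uniform-shell trap": three or more URLs
// returning a byte-identical body, which usually means every route is
// served the same SPA index.html.
func UniformShell(bodies map[string][]byte) bool {
	seen := make(map[[sha256.Size]byte]int)
	for _, body := range bodies {
		sum := sha256.Sum256(body)
		seen[sum]++
		if seen[sum] >= 3 {
			return true
		}
	}
	return false
}
```

Hashing the bodies keeps the comparison cheap when pages are large, and the trap catches shells that happen to carry more than 500 characters of boilerplate.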

E2 · Stable URLs: 301 redirects preserve old paths

Anchor: /rubric#e2

When you reorganise docs, old links shouldn't 404; they should 301 to the new URL. Stable URLs are how links from blogs, Stack Overflow, and bookmarks survive your refactor.

Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL-variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (impermanent), 404, or no redirect.

Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Static hosts handle this in configuration: `_redirects` on Netlify, or the `redirects` array in `vercel.json` on Vercel.
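To tell a 301 from a 302, the audit has to stop at the first response instead of following it. A sketch with Go's `net/http`, where returning `http.ErrUseLastResponse` from `CheckRedirect` does exactly that; `FirstStatus` and `ScoreVariant` are hypothetical names, and scoring a 307 as zero alongside 302 is an assumption.

```go
package main

import (
	"net/http"
	"time"
)

// noFollow stops at the first response, so the redirect status itself
// (301 vs 302) can be inspected instead of the page it points to.
var noFollow = &http.Client{
	Timeout: 10 * time.Second,
	CheckRedirect: func(req *http.Request, via []*http.Request) error {
		return http.ErrUseLastResponse
	},
}

// FirstStatus fetches rawURL without following redirects and returns the
// status code and the Location header.
func FirstStatus(rawURL string) (int, string, error) {
	resp, err := noFollow.Get(rawURL)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("Location"), nil
}

// ScoreVariant applies the E2 tiers to one sampled URL variant;
// canonicalAlias means a 200 response that carries <link rel=canonical>.
func ScoreVariant(status int, canonicalAlias bool) int {
	switch {
	case status == http.StatusMovedPermanently || status == http.StatusPermanentRedirect:
		return 2
	case status == http.StatusOK && canonicalAlias:
		return 1
	}
	return 0 // 302 or 307 (impermanent), 404, or a bare 200
}
```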

E3 · Explicit API version in URL path, heading, or OpenAPI spec

Anchor: /rubric#e3

`/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers.

Scoring. 2 = version in the docs' own URL structure (sitemap or OpenAPI-spec URL: `/v1/`, `/2024-01-15/`) · 1 = version in a documented API endpoint URL (curl/code examples), in OpenAPI `info.version`, or in an `<h1>`/`<h2>`/footer heading · 0 = none.

Fix. Prefix your API paths with `/v1/`, `/v2/` and show them in your curl/code examples; OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count.

E4 · Spot-check of 5 internal links → all return 200

Anchor: /rubric#e4

Rotten internal links are the most common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link.

Scoring. 4 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage.

Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum.

E5 · Usage terms: TOS / license / AI policy explicit

Anchor: /rubric#e5

Without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page, especially one with AI/ML keywords, makes the policy machine-readable.

Scoring. 3 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link.

Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access — even if you allow everything, saying so is the signal.

E6 · Agent accessibility: static name + ARIA validity

Anchor: /rubric#e6

An AI agent operates a page through its accessibility tree: it needs every button, link, input and image to carry a name it can target. Missing accessible names, invalid ARIA, and positive tabindex all break that. Rubric v3 re-homes this from category F into E next to E1, because it grades the RENDERED page, not an agent-discovery surface.

Scoring. 4 = 0 violations (and ≥1 element to check) · 2 = 1–2 · 1 = 3–5 · 0 = ≥6 · not_applicable if the page has nothing nameable · error if the homepage couldn't be scanned. Static heuristic (no axe-core/headless): name + ARIA-validity rules only.

Fix. Give every `<button>`/`<a>`/icon an accessible name (text, `aria-label`, or a labelled child `<img alt>`); associate `<label>`s with inputs/selects; add `alt` to images and `<title>` to inline SVGs; drop positive `tabindex`; fix typo'd `aria-*` attributes and roles.

F — Agent capability · 4/100

Does the site expose agent-native surfaces (WebMCP, MCP)?

Agent capability records the explicit agent-facing surfaces: a WebMCP tool surface in the page, declarative or scripted, and an advertised MCP server whose OAuth metadata conforms to RFC 9728 and RFC 8414. It is deliberately small — four points out of a hundred. Both specifications are young and still changing, adoption outside a handful of documentation platforms is measured in single-digit percentages, and taxing every site for not shipping an experimental standard would say more about our enthusiasm than about their documentation. The weight grows when adoption does.

F2 · WebMCP tool surface exposed to in-browser agents

Anchor: /rubric#f2

WebMCP lets a page expose callable tools to an AI agent running in the browser tab — via declarative `<form toolname tooldescription>` markup, the imperative `navigator.modelContext` API, or a polyfill. F2 rewards any detected surface; the schema check on declarative forms rides the diagnostics and does not change the point.

Scoring. 1 = WebMCP detected (schema errors or warnings are reported as diagnostics but do not change the point) · 0 = not detected · error if the homepage couldn't be scanned.

Fix. Add a WebMCP tool surface. The declarative form is the one an outside auditor can verify from static HTML: annotate a `<form>` with `toolname` + `tooldescription`, and give every input `name` + `toolparamdescription`. Fix missing-toolname / required-param-no-name errors first; they're the hard failures.

F3 · MCP server advertised (RFC 9728 / 8414 OAuth)

Anchor: /rubric#f3

An MCP server lets agents call your API as governed tools. F3 rewards advertising a discoverable, OAuth-protected MCP endpoint via the standard `.well-known` metadata, so an agent can authenticate and connect without bespoke setup.

Scoring. 3 = full oauth-mcp (RFC 9728 protected-resource + RFC 8414 auth-server metadata + PKCE S256) · 2 = partial · 1 = endpoint-only · 0 = none · error if the site was unreachable.

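Fix. Serve RFC 9728 metadata at `/.well-known/oauth-protected-resource` that names your MCP endpoint as the `resource` and lists its `authorization_servers`, plus a same-host RFC 8414 authorization-server metadata document whose `code_challenge_methods_supported` includes `S256`.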
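A sketch of the metadata side of F3, once both documents have been fetched. The struct fields use the member names defined in RFC 9728 (`resource`, `authorization_servers`) and RFC 8414 (`issuer`, `code_challenge_methods_supported`); `CheckOAuthMetadata` is hypothetical, reading "same-host" as equal URL hosts is an assumption, and the mapping of problems onto the partial and endpoint-only tiers is not modelled.

```go
package main

import (
	"encoding/json"
	"net/url"
)

// protectedResource holds the RFC 9728 members this sketch reads.
type protectedResource struct {
	Resource             string   `json:"resource"`
	AuthorizationServers []string `json:"authorization_servers"`
}

// authServer holds the RFC 8414 members this sketch reads.
type authServer struct {
	Issuer                        string   `json:"issuer"`
	CodeChallengeMethodsSupported []string `json:"code_challenge_methods_supported"`
}

// CheckOAuthMetadata returns the problems it finds in the two documents;
// an empty result corresponds to the "full oauth-mcp" tier.
func CheckOAuthMetadata(prmJSON, asJSON []byte) []string {
	var prm protectedResource
	var as authServer
	var problems []string
	if err := json.Unmarshal(prmJSON, &prm); err != nil || prm.Resource == "" {
		return []string{"protected-resource metadata missing or has no resource"}
	}
	if len(prm.AuthorizationServers) == 0 {
		problems = append(problems, "no authorization_servers listed")
	}
	if err := json.Unmarshal(asJSON, &as); err != nil || as.Issuer == "" {
		return append(problems, "authorization-server metadata missing or has no issuer")
	}
	res, err1 := url.Parse(prm.Resource)
	iss, err2 := url.Parse(as.Issuer)
	if err1 != nil || err2 != nil || res.Host != iss.Host {
		problems = append(problems, "issuer is not on the resource's host")
	}
	s256 := false
	for _, m := range as.CodeChallengeMethodsSupported {
		if m == "S256" {
			s256 = true
		}
	}
	if !s256 {
		problems = append(problems, "S256 not in code_challenge_methods_supported")
	}
	return problems
}
```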

Where to start

Across the 5,827 sites in the public AgentFit corpus the median score is 22 out of 100, a quarter of sites score below 11, and only 19 % clear 40. The points are lost in the same four places, ranked here by points available per hour of work:

No /llms.txt at the host root — 75.0 % of sites score zero on A1, worth 3 points. One generated file, one build step. 57 % of the corpus has neither llms.txt, nor llms-full.txt, nor an AI-crawler policy in robots.txt: no declared entry point for an agent at all.
robots.txt names no AI crawler and carries no absolute Sitemap line — 65.8 % score zero on A3, worth 3 points. Five lines of text, in the file every crawler reads first.
No markdown companion for documentation pages — 77.9 % of the sites where the check could run score zero on B1, worth 5 points. Most static site generators already hold the source markdown; the work is routing, not writing.
No discoverable, valid specification — 96.4 % score zero on C1, worth 8 points. The single largest loss on the board, and the only one of the four that is real engineering rather than configuration.

The order matters more than the list. Nine criteria (llms.txt, llms-full.txt, robots, sitemap, homepage discovery tags, JSON-LD, canonical, last-modified date, taxonomies) are worth 27 points between them and require no rewriting of the documentation itself; the average site collects 8.5 of those 27, so about 18.5 points are sitting unclaimed. C1 and the error catalogue in D are weeks of work and should be planned, not squeezed in. One caveat on causation: sites with an llms.txt have a median of 41 against 18 for the rest, but the file does not buy 23 points. The arrow more likely runs the other way: teams that maintain their docs are the teams that add the file. Markup is a cheap way to collect points, not a way to make documentation good.

Every figure in this section is a frozen snapshot: 5,827 hosts with a valid run under rubric v3, taken 27 July 2026, one run per host, excluding the domains AgentFit owns. The charts higher up this page are drawn from the live corpus under the CURRENT rubric, which was reweighted after that snapshot — the two sets of numbers are not expected to agree, and where they differ the charts are the current measurement. These are URLs people submitted themselves rather than a curated list of API documentation, so the median describes what was submitted, not the market.

Questions

What is llms.txt, and do I actually need one?

It is a Markdown file at the root of your host that indexes your documentation: an H1 title, section headers, and links to the pages that matter, in the format proposed at llmstxt.org. No crawler is obliged to read it, and AgentFit does not claim it is a standard. It is scored because it costs one build step and it is the only place where you, rather than a crawler’s heuristics, decide what counts as your documentation. Its companion /llms-full.txt is the same idea taken to the full text.

Do I need an MCP server to score well?

No. The entire agent-capability category is 4 points out of 100 — a site with no MCP server and no WebMCP can score in the nineties. The category exists to record who is building agent-facing surfaces, not to penalise who is not. If you do run an MCP server, F3 checks that it advertises itself the way the specification expects: protected-resource metadata under RFC 9728, an authorization-server document under RFC 8414, and PKCE with S256.

How is this different from Lighthouse or an SEO audit?

Different reader. Lighthouse measures the experience of a human in a browser: paint timing, layout shift, the accessibility of the rendered page. An SEO audit measures fit with a search index and its ranking signals. AgentFit measures whether a program that does not execute your JavaScript and does not scroll can retrieve the contract of your API. The overlap is real — no-JS rendering, canonical URLs and structured data appear in all three — but the failure modes diverge, and a perfect Lighthouse score is entirely compatible with a documentation site an agent cannot use.

Why is there no language model in the scoring?

Because a score you cannot reproduce is not a measurement. Every check is an HTTP fetch plus a parser or a rule, so the same site scores the same twice and every point carries a URL and a snippet you can verify yourself. Two small classifiers assist two criteria; they are compiled into the binary and versioned with it, so nobody’s retrain moves your score overnight. The practical consequences are that a full audit takes about thirty seconds, costs nothing to run, and can be checked by a sceptic.

How often should I re-run the audit?

After any change to how the documentation is built or served, and monthly otherwise. A score only moves when a fact moves, so daily runs mostly measure noise in your CDN. Compare two runs on the diff view; if the rubric version changed between them, the comparison is refused rather than showing our change of ruler as your regression.

My score is low. What does that tell me?

Usually not that the documentation is bad for people. The most common shape of a low score is a well-written site delivered as a JavaScript application with no machine surfaces: the agent gets an empty shell, no markdown twin, no specification, and everything downstream fails with it. Read the report from the fix list up — it is ordered by points per unit of effort, and every entry names the URL that was fetched, so you can reproduce the finding before you plan the work.

Will a high score make ChatGPT cite my documentation?

Nobody can promise that, and this tool does not. What a high score means is narrower and checkable: an agent that reaches your site can retrieve, parse and quote it without a browser. Whether an answer engine then cites you also depends on its crawler policy, on how often people ask about your product, and on ranking machinery none of the vendors publish. AgentFit measures the part that is under your control.

Can I run the rubric myself?

Yes — by hand. The full specification is on this page, criterion by criterion; the same content is served as Markdown at this URL under Accept: text/markdown, and the report schema is in the public OpenAPI document. What you cannot do is run our implementation: the source is private and we publish no builds. What you can do instead is hold ours to account — /reproducibility publishes three raw reports and the exact commands to diff them, field by field, against a live run of the same site. Every site already audited is browsable, so you can compare against a peer before you start.

