agentfit · browse · webmcp · mcp · rubric reference

26-criteria rubric

AgentFit scores every audited site against 26 criteria grouped into 5 categories totalling 100 points. Each criterion has an explicit spec: what the audit checks, how scoring tiers map to points, and a one-line hint for closing the gap. The spec is deterministic and fully replicated in Go code — see the repo for the implementation, and the original article for the framing.

Beyond the 26 scored criteria, AgentFit also flags emerging, non-scoring beta signals — like WebMCP support and MCP server detection.


A — Discovery (18 pts)

Can an agent find your docs at predictable URLs?

A1 · llms.txt at host root conforms to llmstxt.org spec

Anchor: /rubric#a1

agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Anthropic + the llmstxt.org community: one Markdown file at the host root with the docs map.

Scoring. 2 = H1 + ≥1 H2 + ≥3 links + all resolve · 1 = any H1-bearing file (downgraded from 2 if links don't resolve) · 0 = missing or HTML shell.

Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec.

A2 · llms-full.txt or per-section LLM aggregates exist

Anchor: /rubric#a2

/llms-full.txt is the full-text dump of your docs in one place — large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too.

Scoring. 3 = /llms-full.txt at host root, >1 KB · 2 = per-section aggregate found via llms.txt · 0 = neither, or SPA-shell at /llms-full.txt.

Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming.

A3 · robots.txt declares an AI-bot policy and absolute Sitemap

Anchor: /rubric#a3

every AI crawler (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) reads /robots.txt before crawling. An explicit Allow/Disallow per UA + an absolute `Sitemap:` directive removes ambiguity about both indexing and what to index.

Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404.

Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts.

A4 · sitemap.xml: well-formed, absolute URLs, low taxonomy noise

Anchor: /rubric#a4

sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ /category/ noise) gives no signal.

Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error.

Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line.

A5 · Homepage discovery tags: markdown alternate + OpenGraph

Anchor: /rubric#a5

discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers.

Scoring. 2 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither.

Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage.

B — Per-page artifacts (22 pts)

Can a parser ingest each page's contents directly?

B1 · .md companion of doc pages returns clean markdown

Anchor: /rubric#b1

browsing 200 HTML pages to read your docs is fine for humans; for agents it's an order-of-magnitude tokenization cost. A markdown twin per page lets agents pull just the prose.

Scoring. 7 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3.

Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins.

B2 · JSON-LD with valid @type on homepage and sample doc page

Anchor: /rubric#b2

JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it.

Scoring. 4 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none.

Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage.

B3 · Absolute <link rel=canonical> on homepage and sample page

Anchor: /rubric#b3

canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs.

Scoring. 3 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent.

Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents).

B4 · Freshness: dateModified (JSON-LD) or Last-Modified header

Anchor: /rubric#b4

agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday.

Scoring. 2 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither.

Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators.

B5 · Machine-readable taxonomies (keywords, tags, categories)

Anchor: /rubric#b5

tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count.

Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags|/categories|/topics/ link patterns) · 0 = none.

Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes.

B6 · <main> or <article> wraps the primary content prose

Anchor: /rubric#b6

semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork.

Scoring. 2 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither.

Fix. Wrap your page's primary prose in `<main>` (or `<article>` for individual doc pages). Avoid using these for sidebars or navigation — they're meant for the actual content.

C — API spec (25 pts)

Is the API contract published as a machine-readable spec?

C1a · OpenAPI / Swagger / AsyncAPI spec at a discoverable URL

Anchor: /rubric#c1a

an OpenAPI spec is THE primary machine-readable contract for a REST API. Agents that find it can generate clients, test cases, and accurate docs without reading any HTML.

Scoring. 8 = found at a standard probe path (/openapi.json, /swagger.yaml, /v1/openapi.json, etc.) · 5 = found via HTML link discovery only · 0 = nothing found.

Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or under your docs path — Phase 53 probes both). For OpenAPI-first projects, your build tool already produces this — just expose it.

C1b · Valid OpenAPI 3.x with info, ≥1 path, and response schemas

Anchor: /rubric#c1b

finding a spec (C1a) is half the battle; the spec must also be valid 3.x AND describe response schemas so agents can know what they'll get back. A spec that lists paths but no response shapes is half a contract.

Scoring. 7 = OpenAPI 3.x + valid info + paths + ≥30% operations have response content schemas · 6 = +paths but few schemas · 3 = Swagger 2.0 fallback · 0 = parse error.

Fix. Bump your spec to OpenAPI 3.0 or 3.1 if you're still on Swagger 2.0. Add `responses: { '200': { content: { 'application/json': { schema: ... } } } }` to each operation — schemas are what make a spec useful to clients.

C2 · Postman collection or SDKs with discoverable download/fork

Anchor: /rubric#c2

an OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience.

Scoring. 4 = Postman collection link AND ≥1 SDK registry link · 3 = Postman OR ≥2 SDK links · 2 = 1 SDK link · 0 = nothing.

Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage.

C3 · Endpoint pages show method, URL, types, required, examples

Anchor: /rubric#c3

a docs page that just says "call /users" is useless without method, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call.

Scoring. 5 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found.

Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly.

D — Content depth (20 pts)

Does each endpoint page carry enough context to use?

D1 · Code examples include curl AND at least one language SDK

Anchor: /rubric#d1

curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs.

Scoring. 4 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither.

Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up.

D2 · Realistic examples (not foo/bar/example.com)

Anchor: /rubric#d2

`/users/{id}` with `id = 1` and `email = [email protected]` requires the reader to imagine what real data looks like. Realistic placeholders (`[email protected]`, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents.

Scoring. 4 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks.

Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data — but mimic its shape.

D3 · Error catalogue with HTTP codes + reasons

Anchor: /rubric#d3

when an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug.

Scoring. 3 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none.

Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists.

D4 · Authentication AND rate limits documented

Anchor: /rubric#d4

auth is table-stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage.

Scoring. 3 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither.

Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context — not just a code snippet.

D5 · Glossary OR consistent terminology across pages

Anchor: /rubric#d5

is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable.

Scoring. 3 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither.

Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages.

D6 · Deprecated / beta endpoints marked in plain text

Anchor: /rubric#d6

a developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches.

Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none.

Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints — visible in the prose, not just in the spec.

E — Hygiene (15 pts)

Is the site stable + usable without JavaScript?

E1 · Content visible in plain HTML without JavaScript (gating)

Anchor: /rubric#e1

this is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. WebCrawlers + scrapers + curl + most AI fetchers don't run JS.

Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages SPA · 0 = SPA shell everywhere, or VK-trap (3+ URLs return identical body).

Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate and cascade-zero many other criteria.

E2 · Stable URLs: 301 redirects preserve old paths

Anchor: /rubric#e2

when you reorganise docs, old links shouldn't 404 — they should 301 to the new URL. Stable URLs are how internal links from blogs, Stack Overflow, and bookmarks survive your refactor.

Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL-variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (impermanent), 404, or no redirect.

Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Static-site generators handle this via `_redirects` (Netlify) or `vercel.json` `redirects:` (Vercel) configs.

E3 · Explicit API version in URL path, heading, or OpenAPI spec

Anchor: /rubric#e3

`/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers.

Scoring. 2 = version in URL path (`/v1/`, `/2024-01-15/`) OR OpenAPI `info.version` non-empty · 1 = version in `<h1>`/`<h2>` headings only · 0 = none.

Fix. Either prefix your API paths with `/v1/`, `/v2/` (URL-versioned) OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count.

E4 · Spot-check of 5 internal links → all return 200

Anchor: /rubric#e4

rotten internal links are the most-common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link.

Scoring. 2 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage.

Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum.

E5 · Usage terms: TOS / license / AI policy explicit

Anchor: /rubric#e5

without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page — especially one with AI/ML keywords — makes the policy machine-readable.

Scoring. 2 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link.

Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access — even if you allow everything, saying so is the signal.

framing article


agentfit · browse · rubric · privacy · terms · cookies · cookie settings · bot

© 2026 Stanislav Gumeniuk · All rights reserved