https://docs.coderabbit.ai

done · finished 2026-06-06 11:33 UTC · run id 019e9cb5-be55-7265-b2e1-e3bf15eb7987

ID	Criterion	Status	Score
`A1`	llms.txt at host root conforms to llmstxt.org spec agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Jeremy Howard of Answer.AI and documented at llmstxt.org: one Markdown file at the host root with the docs map. Scoring. 2 = H1 + ≥1 H2 + ≥3 links + all resolve · 1 = any H1-bearing file (downgraded from 2 if links don't resolve) · 0 = missing or HTML shell. Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec. Evidence. https://docs.coderabbit.ai/llms.txt /rubric#a1 · article	present	2/2
`A2`	llms-full.txt or per-section LLM aggregates exist /llms-full.txt is the full-text dump of your docs in one place — large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too. Scoring. 3 = /llms-full.txt at host root, >1 KB · 2 = per-section aggregate found via llms.txt · 0 = neither, or SPA-shell at /llms-full.txt. Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming. Evidence. https://docs.coderabbit.ai/llms-full.txt /rubric#a2 · article	present	3/3
`A3`	robots.txt declares an AI-bot policy and absolute Sitemap compliant AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) read /robots.txt before crawling. An explicit Allow/Disallow per user agent plus an absolute `Sitemap:` directive removes ambiguity about both whether to index and what to index. Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404. Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts. Evidence. https://docs.coderabbit.ai/robots.txt /rubric#a3 · article	present	3/3
`A4`	sitemap.xml: well-formed, absolute URLs, low taxonomy noise sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ /category/ noise) gives no signal. Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error. Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line. Evidence. https://docs.coderabbit.ai/sitemap.xml /rubric#a4 · article	present	3/3
`A5`	Homepage discovery tags: markdown alternate + OpenGraph discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers. Scoring. 2 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither. Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage. Evidence. https://docs.coderabbit.ai /rubric#a5 · article	partial	1/2
`B1`	.md companion of doc pages returns clean markdown browsing 200 HTML pages to read your docs is fine for humans; for agents it's an order-of-magnitude tokenization cost. A markdown twin per page lets agents pull just the prose. Scoring. 7 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3. Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs.md /rubric#b1 · article	present	7/7
`B2`	JSON-LD with valid @type on homepage and sample doc page JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it. Scoring. 4 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none. Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage. Evidence. https://docs.coderabbit.ai /rubric#b2 · article	present	4/4
`B3`	Absolute <link rel=canonical> on homepage and sample page canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs. Scoring. 3 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent. Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents). Evidence. https://docs.coderabbit.ai /rubric#b3 · article	present	3/3
`B4`	Freshness: dateModified (JSON-LD) or Last-Modified header agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday. Scoring. 2 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither. Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators. Evidence. https://docs.coderabbit.ai /rubric#b4 · article	present	2/2
`B5`	Machine-readable taxonomies (keywords, tags, categories) tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count. Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags\|/categories\|/topics/ link patterns) · 0 = none. Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes. Evidence. https://docs.coderabbit.ai /rubric#b5 · article	absent	0/2
`B6`	<main> or <article> wraps the primary content prose semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork. Scoring. 2 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither. Fix. Wrap your page's primary prose in `<main>` (or `<article>` for individual doc pages). Avoid using these for sidebars or navigation — they're meant for the actual content. Evidence. https://docs.coderabbit.ai /rubric#b6 · article	partial	1/2
`C1a`	OpenAPI / Swagger / AsyncAPI spec at a discoverable URL an OpenAPI spec is THE primary machine-readable contract for a REST API. Agents that find it can generate clients, test cases, and accurate docs without reading any HTML. Scoring. 8 = found at a standard probe path (/openapi.json, /swagger.yaml, /v1/openapi.json, etc.) · 5 = found via HTML link discovery only · 0 = nothing found. Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or under your docs path — Phase 53 probes both). For OpenAPI-first projects, your build tool already produces this — just expose it. Evidence. https://docs.coderabbit.ai/openapi.json /rubric#c1a · article	present	8/8
`C1b`	Valid OpenAPI 3.x with info, ≥1 path, and response schemas finding a spec (C1a) is half the battle; the spec must also be valid 3.x AND describe response schemas so agents can know what they'll get back. A spec that lists paths but no response shapes is half a contract. Scoring. 7 = OpenAPI 3.x + valid info + paths + ≥30% of operations have response content schemas · 6 = OpenAPI 3.x + valid info + paths, but <30% of operations have response content schemas · 3 = Swagger 2.0 fallback · 0 = parse error. Fix. Bump your spec to OpenAPI 3.0 or 3.1 if you're still on Swagger 2.0. Add `responses: { '200': { content: { 'application/json': { schema: ... } } } }` to each operation; schemas are what make a spec useful to clients. Evidence. https://docs.coderabbit.ai/openapi.json /rubric#c1b · article	present	7/7
`C2`	Postman collection or SDKs with discoverable download/fork an OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience. Scoring. 4 = Postman collection link AND ≥1 SDK registry link · 3 = Postman OR ≥2 SDK links · 2 = 1 SDK link · 0 = nothing. Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage. Evidence. https://docs.coderabbit.ai /rubric#c2 · article	absent	0/4
`C3`	Endpoint pages show method, URL, types, required, examples a docs page that just says "call /users" is useless without the method, full URL, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call. Scoring. 5 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found. Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly. Evidence. https://docs.coderabbit.ai - ml: model unavailable; heuristic stub (2/5) /rubric#c3 · article	partial	2/5
`D1`	Code examples include curl AND at least one language SDK curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs. Scoring. 4 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither. Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs /rubric#d1 · article	partial	2/4
`D2`	Realistic examples (not foo/bar/example.com) `/users/{id}` with `id = 1` and an obviously fake email address requires the reader to imagine what real data looks like. Realistic placeholders (a plausible-looking email address, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents. Scoring. 4 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks. Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data, but mimic its shape. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs - regex-only; ML unavailable /rubric#d2 · article	present	4/4
`D3`	Error catalogue with HTTP codes + reasons when an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug. Scoring. 3 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none. Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists. Evidence. https://docs.coderabbit.ai /rubric#d3 · article	partial	1/3
`D4`	Authentication AND rate limits documented auth is table-stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage. Scoring. 3 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither. Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context — not just a code snippet. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs /rubric#d4 · article	partial	2/3
`D5`	Glossary OR consistent terminology across pages is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable. Scoring. 3 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither. Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages. Evidence. https://docs.coderabbit.ai/reference/glossary /rubric#d5 · article	partial	1/3
`D6`	Deprecated / beta endpoints marked in plain text a developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches. Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none. Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints — visible in the prose, not just in the spec. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs /rubric#d6 · article	partial	1/2
`E1`	Content visible in plain HTML without JavaScript (gating) this is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. Web crawlers, scrapers, curl, and most AI fetchers don't run JS. Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages are SPA shells · 0 = SPA shell everywhere, or VK-trap (3+ URLs return identical body). Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate, which in turn zeroes many other criteria. Evidence. https://docs.coderabbit.ai /rubric#e1 · article	present	6/6
`E2`	Stable URLs: 301 redirects preserve old paths when you reorganise docs, old links shouldn't 404; they should 301 to the new URL. Stable URLs are how inbound links from blogs, Stack Overflow, and bookmarks survive your refactor. Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL-variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (temporary), 404, or no redirect. Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Static-site generators handle this via `_redirects` (Netlify) or `vercel.json` `redirects:` (Vercel) configs. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs/ /rubric#e2 · article	present	2/2
`E3`	Explicit API version in URL path, heading, or OpenAPI spec `/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers. Scoring. 2 = version in the docs' own URL structure (sitemap or OpenAPI-spec URL: `/v1/`, `/2024-01-15/`) · 1 = version in a documented API endpoint URL (curl/code examples), in OpenAPI `info.version`, or in an `<h1>`/`<h2>`/footer heading · 0 = none. Fix. Prefix your API paths with `/v1/`, `/v2/` and show them in your curl/code examples; OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count. Evidence. https://docs.coderabbit.ai /rubric#e3 · article	absent	0/2
`E4`	Spot-check of 5 internal links → all return 200 rotten internal links are the most-common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link. Scoring. 2 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage. Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum. Evidence. https://docs.coderabbit.ai/ /rubric#e4 · article	present	2/2
`E5`	Usage terms: TOS / license / AI policy explicit without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page — especially one with AI/ML keywords — makes the policy machine-readable. Scoring. 2 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link. Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access — even if you allow everything, saying so is the signal. Evidence. https://docs.coderabbit.ai /rubric#e5 · article	absent	0/2
`F1`	Agentic discovery breadth: llms.txt, llms-full, MCP an agent finds your site through several surfaces; F1 rewards exposing more than one — a reachable llms.txt, a full-content llms-full.txt feed, and an advertised MCP/agent endpoint. It complements A1/A2 (which judge each surface's quality) by counting how many an agent can discover. Scoring. 2 = ≥2 of {reachable llms.txt, llms-full.txt exists, MCP/agent endpoint advertised} · 1 = one of those · 0 = none · error if the site couldn't be fetched. Fix. Publish `/llms.txt` and `/llms-full.txt` at host root, and reference your MCP server or a `.well-known/` discovery endpoint inside `/llms.txt` so agents can find it. Evidence. https://docs.coderabbit.ai/llms.txt /rubric#f1 · article	present	2/2
`F2`	WebMCP declarative tool forms, schema-valid WebMCP lets a page expose callable tools to in-browser agents via declarative `<form toolname tooldescription>` markup. F2 rewards a present, schema-clean WebMCP surface so an agent can invoke the tools without guessing. Scoring. 2 = WebMCP detected, 0 schema errors + 0 warnings · 1 = detected with schema issues (errors or warnings) · 0 = not detected · error if the homepage couldn't be scanned. Fix. Add WebMCP declarative tool forms — each with `toolname` + `tooldescription`, and `name`+`toolparamdescription` on every input. Fix missing-toolname / required-param-no-name errors first; they're the hard failures. /rubric#f2 · article	absent	0/2
`F3`	MCP server advertised (RFC 9728 / 8414 OAuth) an MCP server lets agents call your API as governed tools. F3 rewards advertising a discoverable, OAuth-protected MCP endpoint via the standard `.well-known` metadata, so an agent can authenticate and connect without bespoke setup. Scoring. 3 = full oauth-mcp (RFC 9728 protected-resource + RFC 8414 auth-server metadata + PKCE S256) · 2 = partial · 1 = endpoint-only · 0 = none · error if the site was unreachable. Fix. Serve `/.well-known/oauth-protected-resource` pointing at your MCP endpoint and a same-host RFC 8414 authorization-server metadata document advertising `S256` PKCE. Evidence. https://docs.coderabbit.ai/.well-known/oauth-protected-resource/integrations/mcp-servers — mcp tier: partial /rubric#f3 · article	partial	2/3
`F4`	Agent accessibility: static name + ARIA validity an AI agent operates a page through its accessibility tree, so it needs every button, link, input and image to carry a name it can target. Missing accessible names, invalid ARIA, and positive tabindex all break that, leaving controls an agent can see but can't reliably name or actuate. Scoring. 3 = 0 violations (and ≥1 element to check) · 2 = 1–2 · 1 = 3–5 · 0 = ≥6 · not_applicable if the page has nothing nameable · error if the homepage couldn't be scanned. Static heuristic (no axe-core/headless): name + ARIA-validity rules only. Fix. Give every `<button>`/`<a>`/icon an accessible name (text, `aria-label`, or a labelled child `<img alt>`); associate `<label>`s with inputs/selects; add `alt` to images and `<title>` to inline SVGs; drop positive `tabindex`; fix typo'd `aria-` attributes and roles. Evidence. https://docs.coderabbit.ai - static a11y heuristic: name/attr-validity rules only; computed-tree rules (required-children/parent, hidden-focus, role-conflict) not checked; no axe-core/headless /rubric#f4 · article	present	3/3

Criterion

Status

Score

A1

llms.txt at host root conforms to llmstxt.org spec

agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Jeremy Howard of Answer.AI and documented at llmstxt.org: one Markdown file at the host root with the docs map.

Scoring. 2 = H1 + ≥1 H2 + ≥3 links + all resolve · 1 = any H1-bearing file (downgraded from 2 if links don't resolve) · 0 = missing or HTML shell.

Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec.

Evidence. https://docs.coderabbit.ai/llms.txt

/rubric#a1 · article

present

2/2
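
The A1 scoring rule above can be read as a small structural check. The following is a minimal sketch of that reading, not the auditor's actual implementation: the function name `score_llms_txt` and the `resolved` mapping (URL to "did it resolve") are hypothetical, and the HTTP probing of each link is deliberately left out.

```python
import re


def score_llms_txt(text: str, resolved: dict[str, bool]) -> int:
    """Hypothetical scorer following the A1 rubric as written above.

    `resolved` maps each linked URL to whether a fetch succeeded; how that
    map is produced is outside this sketch.
    """
    lines = text.splitlines()
    has_h1 = any(re.match(r"^# \S", ln) for ln in lines)
    h2_count = sum(1 for ln in lines if re.match(r"^## \S", ln))
    # Markdown inline links: [label](url)
    links = re.findall(r"\[[^\]]+\]\((\S+?)\)", text)

    if not has_h1:
        return 0  # a missing file or an HTML shell has no Markdown H1
    all_resolve = bool(links) and all(resolved.get(u, False) for u in links)
    if h2_count >= 1 and len(links) >= 3 and all_resolve:
        return 2
    return 1  # H1 present, but structure is thin or some links fail
```

Keeping link resolution as an input makes the structural part easy to run in CI without network access.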

A2

llms-full.txt or per-section LLM aggregates exist

/llms-full.txt is the full-text dump of your docs in one place — large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too.

Scoring. 3 = /llms-full.txt at host root, >1 KB · 2 = per-section aggregate found via llms.txt · 0 = neither, or SPA-shell at /llms-full.txt.

Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming.

Evidence. https://docs.coderabbit.ai/llms-full.txt

/rubric#a2 · article

present

3/3

A3

robots.txt declares an AI-bot policy and absolute Sitemap

compliant AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) read /robots.txt before crawling. An explicit Allow/Disallow per user agent plus an absolute `Sitemap:` directive removes ambiguity about both whether to index and what to index.

Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404.

Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts.

Evidence. https://docs.coderabbit.ai/robots.txt

/rubric#a3 · article

present

3/3
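
A minimal sketch of how the A3 rule could be checked against a robots.txt body. The function name `score_robots_txt` and the `AI_BOTS` set are assumptions for illustration (the set simply reuses the user agents named above), and the Cloudflare `Content-Signal:` case mentioned in the Fix is not handled here.

```python
from urllib.parse import urlparse

# User-agent tokens taken from the criterion text above (lowercased).
AI_BOTS = {"gptbot", "claudebot", "google-extended", "perplexitybot", "ccbot"}


def score_robots_txt(body: str) -> int:
    """Hypothetical A3 scorer: AI-bot rule and absolute Sitemap line."""
    has_ai_rule = False
    has_abs_sitemap = False
    group_agents: list[str] = []
    in_rules = False  # True once the current group has seen Allow/Disallow

    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if any(agent in AI_BOTS for agent in group_agents):
                has_ai_rule = True
        elif field == "sitemap":
            parsed = urlparse(value)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                has_abs_sitemap = True

    if has_ai_rule and has_abs_sitemap:
        return 3
    if has_ai_rule:
        return 2
    return 1 if has_abs_sitemap else 0
```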

A4

sitemap.xml: well-formed, absolute URLs, low taxonomy noise

sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ /category/ noise) gives no signal.

Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error.

Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line.

Evidence. https://docs.coderabbit.ai/sitemap.xml

/rubric#a4 · article

present

3/3
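
The "taxonomy noise" threshold in A4 is easier to see with a worked check. The sketch below is one possible reading of the rubric, not the scanner's code: `score_sitemap` and `TAXONOMY_MARKERS` are hypothetical names, and the handling of edge cases (for example a thin sitemap that is also mostly junk) is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Substrings treated as taxonomy noise, following the Fix text above.
TAXONOMY_MARKERS = ("/tag/", "/category/", "/author/", "page=")


def score_sitemap(xml_text: str) -> int:
    """Hypothetical A4 scorer for a <urlset> sitemap."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return 0
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    absolute = [u for u in locs if urlparse(u).scheme in ("http", "https")]
    if not absolute:
        return 1  # well-formed, but no absolute URLs to measure

    distinct_paths = {urlparse(u).path for u in absolute}
    junk = sum(1 for u in absolute if any(m in u for m in TAXONOMY_MARKERS))
    junk_ratio = junk / len(absolute)

    if len(distinct_paths) >= 3 and junk_ratio < 0.30:
        return 3
    if junk_ratio <= 0.70:
        return 2  # thin, or moderately noisy
    return 1  # well-formed only
```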

A5

Homepage discovery tags: markdown alternate + OpenGraph

discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers.

Scoring. 2 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither.

Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage.

Evidence. https://docs.coderabbit.ai

/rubric#a5 · article

partial

1/2
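
Since this criterion scored 1/2, it may help to see what a check for both tags could look like. This is a sketch using only the standard library; `DiscoveryTagParser` and `score_discovery_tags` are hypothetical names, and treating "OpenGraph present" as "at least three distinct non-empty `og:` properties" is an assumption drawn from the Scoring line.

```python
from html.parser import HTMLParser


class DiscoveryTagParser(HTMLParser):
    """Collects the markdown alternate link and og: meta properties."""

    def __init__(self) -> None:
        super().__init__()
        self.has_markdown_alternate = False
        self.og_properties: set[str] = set()

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "link":
            rels = a.get("rel", "").lower().split()
            if "alternate" in rels and a.get("type", "").lower() == "text/markdown":
                self.has_markdown_alternate = True
        elif tag == "meta":
            prop = a.get("property", "").lower()
            if prop.startswith("og:") and a.get("content"):
                self.og_properties.add(prop)


def score_discovery_tags(html: str) -> int:
    """Hypothetical A5 scorer for a homepage's HTML."""
    parser = DiscoveryTagParser()
    parser.feed(html)
    has_og = len(parser.og_properties) >= 3
    if parser.has_markdown_alternate and has_og:
        return 2
    return 1 if (parser.has_markdown_alternate or has_og) else 0
```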

B1

.md companion of doc pages returns clean markdown

browsing 200 HTML pages to read your docs is fine for humans; for agents it's an order-of-magnitude tokenization cost. A markdown twin per page lets agents pull just the prose.

Scoring. 7 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3.

Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins.

Evidence. https://docs.coderabbit.ai/api-reference/audit-logs.md

/rubric#b1 · article

present

7/7

B2

JSON-LD with valid @type on homepage and sample doc page

JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it.

Scoring. 4 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none.

Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage.

Evidence. https://docs.coderabbit.ai

/rubric#b2 · article

present

4/4
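
To make the "parseable JSON-LD with `@type`" condition concrete, here is a minimal standard-library sketch. The names `JsonLdParser`, `has_typed_jsonld` and `score_jsonld` are hypothetical, and the sketch only looks at top-level objects or arrays; `@graph` containers are not unpacked.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects every parseable <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buf: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld, self._buf = True, []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # unparseable blocks do not count


def has_typed_jsonld(html: str) -> bool:
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        if any(isinstance(i, dict) and i.get("@type") for i in items):
            return True
    return False


def score_jsonld(home_html: str, doc_html: str) -> int:
    """Hypothetical B2 scorer: 4 = both pages, 3 = one of the two, 0 = none."""
    hits = int(has_typed_jsonld(home_html)) + int(has_typed_jsonld(doc_html))
    return {2: 4, 1: 3, 0: 0}[hits]
```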

B3

Absolute <link rel=canonical> on homepage and sample page

canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs.

Scoring. 3 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent.

Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents).

Evidence. https://docs.coderabbit.ai

/rubric#b3 · article

present

3/3

B4

Freshness: dateModified (JSON-LD) or Last-Modified header

agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday.

Scoring. 2 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither.

Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators.

Evidence. https://docs.coderabbit.ai

/rubric#b4 · article

present

2/2

B5

Machine-readable taxonomies (keywords, tags, categories)

tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count.

Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags|/categories|/topics/ link patterns) · 0 = none.

Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes.

Evidence. https://docs.coderabbit.ai

/rubric#b5 · article

absent

0/2

B6

<main> or <article> wraps the primary content prose

semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork.

Scoring. 2 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither.

Fix. Wrap your page's primary prose in `<main>` (or `<article>` for individual doc pages). Avoid using these for sidebars or navigation — they're meant for the actual content.

Evidence. https://docs.coderabbit.ai

/rubric#b6 · article

partial

1/2
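
A 1/2 here means only one of the two wrappers cleared its threshold. The sketch below shows one way the character counts could be measured; `LandmarkTextParser` and `score_landmarks` are hypothetical names, and counting all text nodes (including any inline script or style text inside the wrappers) is a simplifying assumption.

```python
from html.parser import HTMLParser


class LandmarkTextParser(HTMLParser):
    """Counts text characters that sit inside <main> and inside <article>."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = {"main": 0, "article": 0}
        self.chars = {"main": 0, "article": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        n = len(data.strip())
        for tag, depth in self.depth.items():
            if depth > 0:
                self.chars[tag] += n


def score_landmarks(html: str) -> int:
    """Hypothetical B6 scorer using the thresholds from the Scoring line."""
    parser = LandmarkTextParser()
    parser.feed(html)
    main_ok = parser.chars["main"] > 200
    article_ok = parser.chars["article"] > 100
    if main_ok and article_ok:
        return 2
    return 1 if (main_ok or article_ok) else 0
```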

C1a

OpenAPI / Swagger / AsyncAPI spec at a discoverable URL

an OpenAPI spec is THE primary machine-readable contract for a REST API. Agents that find it can generate clients, test cases, and accurate docs without reading any HTML.

Scoring. 8 = found at a standard probe path (/openapi.json, /swagger.yaml, /v1/openapi.json, etc.) · 5 = found via HTML link discovery only · 0 = nothing found.

Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or under your docs path — Phase 53 probes both). For OpenAPI-first projects, your build tool already produces this — just expose it.

Evidence. https://docs.coderabbit.ai/openapi.json

/rubric#c1a · article

present

8/8

C1b

Valid OpenAPI 3.x with info, ≥1 path, and response schemas

finding a spec (C1a) is half the battle; the spec must also be valid 3.x AND describe response schemas so agents can know what they'll get back. A spec that lists paths but no response shapes is half a contract.

Scoring. 7 = OpenAPI 3.x + valid info + paths + ≥30% of operations have response content schemas · 6 = OpenAPI 3.x + valid info + paths, but <30% of operations have response content schemas · 3 = Swagger 2.0 fallback · 0 = parse error.

Fix. Bump your spec to OpenAPI 3.0 or 3.1 if you're still on Swagger 2.0. Add `responses: { '200': { content: { 'application/json': { schema: ... } } } }` to each operation — schemas are what make a spec useful to clients.

Evidence. https://docs.coderabbit.ai/openapi.json

/rubric#c1b · article

present

7/7
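
The "≥30% of operations have response content schemas" test can be sketched over an already-parsed spec. This is an illustration under assumptions, not the auditor's code: `response_schema_coverage` and `score_openapi` are hypothetical names, `$ref`-ed responses are not resolved, and returning 0 when `info` or `paths` is missing is a guess at how the rubric treats an incomplete spec.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def response_schema_coverage(spec: dict) -> float:
    """Fraction of operations with at least one response content schema."""
    total = with_schema = 0
    for path_item in spec.get("paths", {}).values():
        for method, op in path_item.items():
            if method.lower() not in HTTP_METHODS or not isinstance(op, dict):
                continue  # skip path-level keys such as "parameters"
            total += 1
            responses = op.get("responses", {})
            if any(
                "schema" in media
                for resp in responses.values() if isinstance(resp, dict)
                for media in resp.get("content", {}).values() if isinstance(media, dict)
            ):
                with_schema += 1
    return with_schema / total if total else 0.0


def score_openapi(spec: dict) -> int:
    """Hypothetical C1b scorer over a parsed JSON/YAML document."""
    if str(spec.get("swagger", "")).startswith("2."):
        return 3  # Swagger 2.0 fallback
    if not str(spec.get("openapi", "")).startswith("3."):
        return 0
    info = spec.get("info", {})
    if not (info.get("title") and info.get("version")) or not spec.get("paths"):
        return 0  # assumption: incomplete specs are treated like parse errors
    return 7 if response_schema_coverage(spec) >= 0.30 else 6
```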

C2

Postman collection or SDKs with discoverable download/fork

an OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience.

Scoring. 4 = Postman collection link AND ≥1 SDK registry link · 3 = Postman OR ≥2 SDK links · 2 = 1 SDK link · 0 = nothing.

Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage.

Evidence. https://docs.coderabbit.ai

/rubric#c2 · article

absent

0/4

C3

Endpoint pages show method, URL, types, required, examples

a docs page that just says "call /users" is useless without the method, full URL, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call.

Scoring. 5 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found.

Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly.

Evidence. https://docs.coderabbit.ai — ml: model unavailable; heuristic stub (2/5)

/rubric#c3 · article

partial

2/5

D1

Code examples include curl AND at least one language SDK

curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs.

Scoring. 4 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither.

Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up.

Evidence. https://docs.coderabbit.ai/api-reference/audit-logs

/rubric#d1 · article

partial

2/4

D2

Realistic examples (not foo/bar/example.com)

`/users/{id}` with `id = 1` and `email = [email protected]` requires the reader to imagine what real data looks like. Realistic placeholders (`[email protected]`, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents.

Scoring. 4 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks.

Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data — but mimic its shape.

Evidence. https://docs.coderabbit.ai/api-reference/audit-logs — regex-only; ML unavailable

/rubric#d2 · article

present

4/4
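Sketch. A regex-only approximation of the D2 check, using the placeholder tokens named in the fix. Two assumptions are not in the rubric: a block counts as "placeholder-heavy" if it contains any listed token, and the band edges are handled as shown. The sample URLs are invented.

```python
import re

# Tokens taken from the D2 fix text; matching is case-insensitive.
PLACEHOLDER = re.compile(
    r"\b(?:foo|bar|your_api_key)\b|example\.com|<string>", re.IGNORECASE
)


def is_placeholder_heavy(block: str) -> bool:
    """Assumption: a single placeholder token is enough to flag a block."""
    return PLACEHOLDER.search(block) is not None


def d2_score(blocks: list[str]) -> int:
    """Map the share of flagged code blocks to the 0-4 scale."""
    if not blocks:
        return 0  # "no code blocks" scores 0
    share = sum(is_placeholder_heavy(b) for b in blocks) / len(blocks)
    if share < 0.2:
        return 4
    if share < 0.4:
        return 3
    if share < 0.6:
        return 2
    if share <= 0.8:
        return 1
    return 0


blocks = [
    'curl -H "Authorization: Bearer your_api_key" https://example.com/users',
    "curl https://api.acme-payments.test/v1/orgs/org_2N5x",
]
print(d2_score(blocks))  # prints 2 (1 of 2 blocks flagged, i.e. 50%)
```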

D3

Error catalogue with HTTP codes + reasons

when an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug.

Scoring. 3 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none.

Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists.

Evidence. https://docs.coderabbit.ai

/rubric#d3 · article

partial

1/3

D4

Authentication AND rate limits documented

auth is table-stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage.

Scoring. 3 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither.

Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context — not just a code snippet.

Evidence. https://docs.coderabbit.ai/api-reference/audit-logs

/rubric#d4 · article

partial

2/3
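Sketch. The "429 retry semantics" a `/rate-limits` page should spell out can be shown with a small client-side helper. The function name, the backoff base, and the cap are illustrative assumptions. Only the delta-seconds form of `Retry-After` is handled here; the HTTP-date form falls through to exponential backoff.

```python
from typing import Optional


def retry_delay(
    status: int,
    headers: dict,
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None when no retry is needed.

    `headers` is assumed to be a plain dict with canonical header casing.
    """
    if status != 429:
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # The server told us how long to wait; honour it up to the cap.
        return min(float(retry_after), cap)
    # No usable hint: exponential backoff (base, 2*base, 4*base, ...).
    return min(base * 2**attempt, cap)


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))  # prints 7.0
print(retry_delay(429, {}, attempt=3))                    # prints 8.0
print(retry_delay(200, {}, attempt=0))                    # prints None
```

A client would also read `X-RateLimit-Remaining` (if the API sends it) to slow down before hitting 429 at all.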

D5

Glossary OR consistent terminology across pages

is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable.

Scoring. 3 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither.

Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages.

Evidence. https://docs.coderabbit.ai/reference/glossary

/rubric#d5 · article

partial

1/3

D6

Deprecated / beta endpoints marked in plain text

a developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches.

Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none.

Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints — visible in the prose, not just in the spec.

Evidence. https://docs.coderabbit.ai/api-reference/audit-logs

/rubric#d6 · article

partial

1/2

E1

Content visible in plain HTML without JavaScript (gating)

this is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. Web crawlers, scrapers, curl, and most AI fetchers don't run JS.

Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages SPA · 0 = SPA shell everywhere, or VK-trap (3+ URLs return identical body).

Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate and cascade-zero many other criteria.

Evidence. https://docs.coderabbit.ai

/rubric#e1 · article

present

6/6
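Sketch. A rough local version of the "body text >500 chars" gate, using only the standard library. It approximates what a non-JS fetcher sees by dropping `<script>`, `<style>`, and `<template>` content. The class and function names are hypothetical, and the scanner's real extraction rules may differ.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that is present in the raw HTML without running JS."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_chars(html: str) -> int:
    """Length of the whitespace-normalised visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(" ".join(parser.parts).split()))


spa_shell = (
    '<html><body><div id="root"></div>'
    '<script>window.__DATA__ = {"a": 1}</script></body></html>'
)
static_page = "<html><body><h1>Audit logs</h1><p>" + "word " * 120 + "</p></body></html>"

print(visible_chars(spa_shell) > 500)    # prints False
print(visible_chars(static_page) > 500)  # prints True
```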

E2

Stable URLs: 301 redirects preserve old paths

when you reorganise docs, old links shouldn't 404 — they should 301 to the new URL. Stable URLs are how internal links from blogs, Stack Overflow, and bookmarks survive your refactor.

Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (temporary), 404, or no redirect.

Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Static-site generators handle this via `_redirects` (Netlify) or `vercel.json` `redirects:` (Vercel) configs.

Evidence. https://docs.coderabbit.ai/api-reference/audit-logs/

/rubric#e2 · article

present

2/2

E3

Explicit API version in URL path, heading, or OpenAPI spec

`/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers.

Scoring. 2 = version in the docs' own URL structure (sitemap or OpenAPI-spec URL: `/v1/`, `/2024-01-15/`) · 1 = version in a documented API endpoint URL (curl/code examples), in OpenAPI `info.version`, or in an `<h1>`/`<h2>`/footer heading · 0 = none.

Fix. Prefix your API paths with `/v1/`, `/v2/` and show them in your curl/code examples; OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count.

Evidence. https://docs.coderabbit.ai

/rubric#e3 · article

absent

0/2
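Sketch. The two path styles E3 accepts (`/v1/` and date versions such as `/2024-01-15/`) can be detected with one regular expression. The pattern and the function name are assumptions for illustration, not the scanner's actual rule.

```python
import re
from typing import Optional

# A path segment that is either v<digits> or an ISO-style date.
VERSION_SEGMENT = re.compile(r"/(v\d+|\d{4}-\d{2}-\d{2})(?=/|$)")


def api_version(url_path: str) -> Optional[str]:
    """Return the first version-looking path segment, or None."""
    match = VERSION_SEGMENT.search(url_path)
    return match.group(1) if match else None


print(api_version("/v1/users"))                  # prints v1
print(api_version("/2024-01-15/users"))          # prints 2024-01-15
print(api_version("/api-reference/audit-logs"))  # prints None
```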

E4

Spot-check of 5 internal links → all return 200

rotten internal links are the most common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link.

Scoring. 2 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage.

Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum.

Evidence. https://docs.coderabbit.ai/

/rubric#e4 · article

present

2/2
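Sketch. A standard-library outline of the spot-check: collect distinct same-host links from the homepage HTML, then score the status codes you get back. Fetching is left out so the example runs offline. The names and the sample host are hypothetical, and a dedicated link-checker in CI remains the better tool.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects distinct same-host <a href> targets, fragments removed."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.host = urlsplit(base_url).netloc
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urldefrag(urljoin(self.base_url, href)).url
        if urlsplit(absolute).netloc == self.host and absolute not in self.links:
            self.links.append(absolute)


def sample_links(html: str, base_url: str, n: int = 5) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links[:n]


def e4_score(statuses: list[int]) -> int:
    """Apply the E4 scale to the status codes of the sampled links."""
    if len(statuses) < 5:
        return 0  # fewer than 5 distinct same-host links
    ok = sum(status == 200 for status in statuses[:5])
    return 2 if ok == 5 else 1 if ok == 4 else 0


html = (
    '<a href="/a">A</a><a href="/a#top">A again</a>'
    '<a href="https://other.example/x">X</a><a href="b">B</a>'
)
print(sample_links(html, "https://docs.example.test/"))
# prints ['https://docs.example.test/a', 'https://docs.example.test/b']
print(e4_score([200, 200, 200, 200, 404]))  # prints 1
```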

E5

Usage terms: TOS / license / AI policy explicit

without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page — especially one with AI/ML keywords — makes the policy machine-readable.

Scoring. 2 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link.

Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access — even if you allow everything, saying so is the signal.

Evidence. https://docs.coderabbit.ai

/rubric#e5 · article

absent

0/2

F1

Agentic discovery breadth: llms.txt, llms-full, MCP

an agent finds your site through several surfaces; F1 rewards exposing more than one — a reachable llms.txt, a full-content llms-full.txt feed, and an advertised MCP/agent endpoint. It complements A1/A2 (which judge each surface's quality) by counting how many an agent can discover.

Scoring. 2 = ≥2 of {reachable llms.txt, llms-full.txt exists, MCP/agent endpoint advertised} · 1 = one of those · 0 = none · error if the site couldn't be fetched.

Fix. Publish `/llms.txt` and `/llms-full.txt` at host root, and reference your MCP server or a `.well-known/` discovery endpoint inside `/llms.txt` so agents can find it.

Evidence. https://docs.coderabbit.ai/llms.txt

/rubric#f1 · article

present

2/2

F2

WebMCP declarative tool forms, schema-valid

WebMCP lets a page expose callable tools to in-browser agents via declarative `<form toolname tooldescription>` markup. F2 rewards a present, schema-clean WebMCP surface so an agent can invoke the tools without guessing.

Scoring. 2 = WebMCP detected, 0 schema errors + 0 warnings · 1 = detected with schema issues (errors or warnings) · 0 = not detected · error if the homepage couldn't be scanned.

Fix. Add WebMCP declarative tool forms — each with `toolname` + `tooldescription`, and `name`+`toolparamdescription` on every input. Fix missing-toolname / required-param-no-name errors first; they're the hard failures.

/rubric#f2 · article

absent

0/2
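Sketch. A small linter for the attributes named in the F2 fix (`toolname`, `tooldescription`, `name`, `toolparamdescription`). The split between errors and warnings follows the fix text for the two hard failures; the other warning labels and the class name are assumptions, and this is not a full WebMCP schema validator.

```python
from html.parser import HTMLParser


class WebMCPLint(HTMLParser):
    """Flags declarative tool forms with missing names or descriptions."""

    def __init__(self) -> None:
        super().__init__()
        self.tools = 0
        self.errors: list[str] = []
        self.warnings: list[str] = []
        self._in_tool_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            # Treat a form as a tool form if it carries either tool attribute.
            self._in_tool_form = "toolname" in a or "tooldescription" in a
            if self._in_tool_form:
                self.tools += 1
                if not a.get("toolname"):
                    self.errors.append("missing-toolname")
                if not a.get("tooldescription"):
                    self.warnings.append("missing-tooldescription")
        elif self._in_tool_form and tag in {"input", "select", "textarea"}:
            if a.get("type") in {"submit", "button"}:
                return
            if not a.get("name"):
                if "required" in a:
                    self.errors.append("required-param-no-name")
                else:
                    self.warnings.append("param-no-name")
            elif not a.get("toolparamdescription"):
                self.warnings.append("missing-toolparamdescription")

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_tool_form = False


def f2_score(html: str) -> int:
    lint = WebMCPLint()
    lint.feed(html)
    if lint.tools == 0:
        return 0
    return 2 if not lint.errors and not lint.warnings else 1


good = (
    '<form toolname="search_docs" tooldescription="Search the documentation">'
    '<input name="query" toolparamdescription="Search terms" required>'
    '<button type="submit">Go</button></form>'
)
bad = '<form tooldescription="Search"><input required></form>'

print(f2_score(good), f2_score(bad))  # prints 2 1
```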

F3

MCP server advertised (RFC 9728 / 8414 OAuth)

an MCP server lets agents call your API as governed tools. F3 rewards advertising a discoverable, OAuth-protected MCP endpoint via the standard `.well-known` metadata, so an agent can authenticate and connect without bespoke setup.

Scoring. 3 = full oauth-mcp (RFC 9728 protected-resource + RFC 8414 auth-server metadata + PKCE S256) · 2 = partial · 1 = endpoint-only · 0 = none · error if the site was unreachable.

Fix. Serve `/.well-known/oauth-protected-resource` pointing at your MCP endpoint and a same-host RFC 8414 authorization-server metadata document advertising `S256` PKCE.

Evidence. https://docs.coderabbit.ai/.well-known/oauth-protected-resource/integrations/mcp-servers — mcp tier: partial

/rubric#f3 · article

partial

2/3
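Sketch. The two metadata documents from the F3 fix, written as Python dicts with a consistency check. Field names follow RFC 9728 (`resource`, `authorization_servers`) and RFC 8414 (`issuer`, `code_challenge_methods_supported`, and the endpoint fields). All hosts and paths are placeholders, and the check is a simplified assumption about what "full" requires, not the scanner's tiering logic.

```python
import json

# Served at /.well-known/oauth-protected-resource (RFC 9728).
protected_resource = {
    "resource": "https://docs.example.test/mcp",
    "authorization_servers": ["https://docs.example.test"],
}

# Served at /.well-known/oauth-authorization-server on the same host (RFC 8414).
authorization_server = {
    "issuer": "https://docs.example.test",
    "authorization_endpoint": "https://docs.example.test/oauth/authorize",
    "token_endpoint": "https://docs.example.test/oauth/token",
    "response_types_supported": ["code"],
    "code_challenge_methods_supported": ["S256"],
}


def is_full_oauth_mcp(resource_meta: dict, server_meta: dict) -> bool:
    """Resource names its auth server, and that server advertises S256 PKCE."""
    return (
        bool(resource_meta.get("resource"))
        and server_meta.get("issuer") in resource_meta.get("authorization_servers", [])
        and "S256" in server_meta.get("code_challenge_methods_supported", [])
    )


print(json.dumps(protected_resource, indent=2))
print(is_full_oauth_mcp(protected_resource, authorization_server))  # prints True
```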

F4

Agent accessibility: static name + ARIA validity

an AI agent operates a page through its accessibility tree — it needs every button, link, input and image to carry a name it can target. Missing accessible names, invalid ARIA, and positive tabindex all break that, leaving controls an agent can see but can't reliably name or actuate.

Scoring. 3 = 0 violations (and ≥1 element to check) · 2 = 1–2 · 1 = 3–5 · 0 = ≥6 · not_applicable if the page has nothing nameable · error if the homepage couldn't be scanned. Static heuristic (no axe-core/headless): name + ARIA-validity rules only.

Fix. Give every `<button>`/`<a>`/icon an accessible name (text, `aria-label`, or a labelled child `<img alt>`); associate `<label>`s with inputs/selects; add `alt` to images and `<title>` to inline SVGs; drop positive `tabindex`; fix typo'd `aria-*` attributes and roles.

Evidence. https://docs.coderabbit.ai — static a11y heuristic: name/attr-validity rules only; computed-tree rules (required-children/parent, hidden-focus, role-conflict) not checked — no axe-core/headless

/rubric#f4 · article

present

3/3
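Sketch. A static check for three of the violations listed in the F4 fix: links or buttons without an accessible name, images without `alt`, and positive `tabindex`. It is a deliberately small heuristic with hypothetical names; it ignores `aria-labelledby`, `<label>` association, and ARIA attribute validation.

```python
from html.parser import HTMLParser


class NameLint(HTMLParser):
    """Static accessible-name and tabindex heuristic (no rendered tree)."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []
        self._open: list[list] = []  # [tag, has_name] for open <a>/<button>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        tabindex = a.get("tabindex") or ""
        if tabindex.isdigit() and int(tabindex) > 0:
            self.violations.append(f"positive-tabindex:{tag}")
        if tag == "img":
            if "alt" not in a:
                self.violations.append("img-missing-alt")
            elif a["alt"] and self._open:
                self._open[-1][1] = True  # a labelled child image names its parent
        if tag in ("a", "button"):
            self._open.append([tag, bool(a.get("aria-label"))])

    def handle_data(self, data):
        if data.strip() and self._open:
            self._open[-1][1] = True  # visible text names the control

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, named = self._open.pop()
            if not named:
                self.violations.append(f"{tag}-missing-name")


def f4_score(violation_count: int) -> int:
    """Apply the F4 scale: 0 violations = 3, 1-2 = 2, 3-5 = 1, 6 or more = 0."""
    if violation_count == 0:
        return 3
    if violation_count <= 2:
        return 2
    return 1 if violation_count <= 5 else 0


html = (
    '<a href="/"><img src="logo.svg" alt="Home"></a>'
    "<button><svg></svg></button>"
    '<img src="x.png"><input tabindex="3">'
)
lint = NameLint()
lint.feed(html)
print(lint.violations)
# prints ['button-missing-name', 'img-missing-alt', 'positive-tabindex:input']
print(f4_score(len(lint.violations)))  # prints 1
```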

https://docs.coderabbit.ai

Categories

30 criteria