agentfit
done · finished 2026-06-06 11:33 UTC · run id 019e9cb5-be55-7265-b2e1-e3bf15eb7987

| ID | Criterion | Status | Score |
|---|---|---|---|
| A1 | llms.txt at host root conforms to llmstxt.org spec. Agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Jeremy Howard (Answer.AI) and the llmstxt.org community: one Markdown file at the host root with the docs map. Scoring. 2 = H1 + ≥1 H2 + ≥3 links + all resolve · 1 = any H1-bearing file (downgraded from 2 if links don't resolve) · 0 = missing or HTML shell. Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec. Evidence. https://docs.coderabbit.ai/llms.txt | present | 2/2 |
| A2 | llms-full.txt or per-section LLM aggregates exist. /llms-full.txt is the full-text dump of your docs in one place; large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too. Scoring. 3 = /llms-full.txt at host root, >1 KB · 2 = per-section aggregate found via llms.txt · 0 = neither, or SPA-shell at /llms-full.txt. Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming. Evidence. https://docs.coderabbit.ai/llms-full.txt | present | 3/3 |
| A3 | robots.txt declares an AI-bot policy and absolute Sitemap. Every AI crawler (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) reads /robots.txt before crawling. An explicit Allow/Disallow per UA + an absolute `Sitemap:` directive removes ambiguity about both indexing and what to index. Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404. Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts. Evidence. https://docs.coderabbit.ai/robots.txt | present | 3/3 |
| A4 | sitemap.xml: well-formed, absolute URLs, low taxonomy noise. Sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ /category/ noise) gives no signal. Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error. Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line. Evidence. https://docs.coderabbit.ai/sitemap.xml | present | 3/3 |
| A5 | Homepage discovery tags: markdown alternate + OpenGraph. Discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers. Scoring. 2 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither. Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage. Evidence. https://docs.coderabbit.ai | partial | 1/2 |
| B1 | .md companion of doc pages returns clean markdown. Browsing 200 HTML pages to read your docs is fine for humans; for agents it's an order-of-magnitude tokenization cost. A markdown twin per page lets agents pull just the prose. Scoring. 7 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3. Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs.md | present | 7/7 |
| B2 | JSON-LD with valid `@type` on homepage and sample doc page. JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it. Scoring. 4 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none. Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage. Evidence. https://docs.coderabbit.ai | present | 4/4 |
| B3 | Absolute `<link rel=canonical>` on homepage and sample page. Canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs. Scoring. 3 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent. Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents). Evidence. https://docs.coderabbit.ai | present | 3/3 |
| B4 | Freshness: `dateModified` (JSON-LD) or `Last-Modified` header. Agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday. Scoring. 2 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither. Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators. Evidence. https://docs.coderabbit.ai | present | 2/2 |
| B5 | Machine-readable taxonomies (keywords, tags, categories). Tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count. Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags/, /categories/, or /topics/ link patterns) · 0 = none. Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes. Evidence. https://docs.coderabbit.ai | absent | 0/2 |
| B6 | `<main>` or `<article>` wraps the primary content prose. Semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork. Scoring. 2 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither. Fix. Wrap your page's primary prose in `<main>` (or `<article>` for individual doc pages). Avoid using these for sidebars or navigation; they're meant for the actual content. Evidence. https://docs.coderabbit.ai | partial | 1/2 |
| C1a | OpenAPI / Swagger / AsyncAPI spec at a discoverable URL. An OpenAPI spec is THE primary machine-readable contract for a REST API. Agents that find it can generate clients, test cases, and accurate docs without reading any HTML. Scoring. 8 = found at a standard probe path (/openapi.json, /swagger.yaml, /v1/openapi.json, etc.) · 5 = found via HTML link discovery only · 0 = nothing found. Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or under your docs path; Phase 53 probes both). For OpenAPI-first projects, your build tool already produces this; just expose it. Evidence. https://docs.coderabbit.ai/openapi.json | present | 8/8 |
| C1b | Valid OpenAPI 3.x with info, ≥1 path, and response schemas. Finding a spec (C1a) is half the battle; the spec must also be valid 3.x AND describe response schemas so agents can know what they'll get back. A spec that lists paths but no response shapes is half a contract. Scoring. 7 = OpenAPI 3.x + valid info + paths + ≥30% operations have response content schemas · 6 = +paths but few schemas · 3 = Swagger 2.0 fallback · 0 = parse error. Fix. Bump your spec to OpenAPI 3.0 or 3.1 if you're still on Swagger 2.0. Add `responses: { '200': { content: { 'application/json': { schema: ... } } } }` to each operation; schemas are what make a spec useful to clients. Evidence. https://docs.coderabbit.ai/openapi.json | present | 7/7 |
| C2 | Postman collection or SDKs with discoverable download/fork. An OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience. Scoring. 4 = Postman collection link AND ≥1 SDK registry link · 3 = Postman OR ≥2 SDK links · 2 = 1 SDK link · 0 = nothing. Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage. Evidence. https://docs.coderabbit.ai | absent | 0/4 |
| C3 | Endpoint pages show method, URL, types, required, examples. A docs page that just says "call /users" is useless without method, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call. Scoring. 5 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found. Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly. Evidence. https://docs.coderabbit.ai (ml: model unavailable; heuristic stub, 2/5) | partial | 2/5 |
| D1 | Code examples include curl AND at least one language SDK. Curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs. Scoring. 4 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither. Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs | partial | 2/4 |
| D2 | Realistic examples (not foo/bar/example.com). `/users/{id}` with `id = 1` and `email = [email protected]` requires the reader to imagine what real data looks like. Realistic placeholders (`[email protected]`, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents. Scoring. 4 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks. Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data, but mimic its shape. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs (regex-only; ML unavailable) | present | 4/4 |
| D3 | Error catalogue with HTTP codes + reasons. When an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug. Scoring. 3 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none. Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists. Evidence. https://docs.coderabbit.ai | partial | 1/3 |
| D4 | Authentication AND rate limits documented. Auth is table-stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage. Scoring. 3 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither. Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context, not just a code snippet. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs | partial | 2/3 |
| D5 | Glossary OR consistent terminology across pages. Is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable. Scoring. 3 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither. Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages. | partial | 1/3 |
| D6 | Deprecated / beta endpoints marked in plain text. A developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches. Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none. Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints: visible in the prose, not just in the spec. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs | partial | 1/2 |
| E1 | Content visible in plain HTML without JavaScript (gating). This is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. Web crawlers, scrapers, curl, and most AI fetchers don't run JS. Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages SPA · 0 = SPA shell everywhere, or VK-trap (3+ URLs return identical body). Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate and cascade-zero many other criteria. Evidence. https://docs.coderabbit.ai | present | 6/6 |
| E2 | Stable URLs: 301 redirects preserve old paths. When you reorganise docs, old links shouldn't 404; they should 301 to the new URL. Stable URLs are how inbound links from blogs, Stack Overflow, and bookmarks survive your refactor. Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL-variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (impermanent), 404, or no redirect. Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Static-site generators handle this via `_redirects` (Netlify) or `vercel.json` `redirects:` (Vercel) configs. Evidence. https://docs.coderabbit.ai/api-reference/audit-logs/ | present | 2/2 |
| E3 | Explicit API version in URL path, heading, or OpenAPI spec. `/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers. Scoring. 2 = version in the docs' own URL structure (sitemap or OpenAPI-spec URL: `/v1/`, `/2024-01-15/`) · 1 = version in a documented API endpoint URL (curl/code examples), in OpenAPI `info.version`, or in an `<h1>`/`<h2>`/footer heading · 0 = none. Fix. Prefix your API paths with `/v1/`, `/v2/` and show them in your curl/code examples; OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count. Evidence. https://docs.coderabbit.ai | absent | 0/2 |
| E4 | Spot-check of 5 internal links → all return 200. Rotten internal links are the most common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link. Scoring. 2 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage. Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum. Evidence. https://docs.coderabbit.ai/ | present | 2/2 |
| E5 | Usage terms: TOS / license / AI policy explicit. Without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page, especially one with AI/ML keywords, makes the policy machine-readable. Scoring. 2 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link. Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access; even if you allow everything, saying so is the signal. Evidence. https://docs.coderabbit.ai | absent | 0/2 |
| F1 | Agentic discovery breadth: llms.txt, llms-full, MCP. An agent finds your site through several surfaces; F1 rewards exposing more than one: a reachable llms.txt, a full-content llms-full.txt feed, and an advertised MCP/agent endpoint. It complements A1/A2 (which judge each surface's quality) by counting how many an agent can discover. Scoring. 2 = ≥2 of {reachable llms.txt, llms-full.txt exists, MCP/agent endpoint advertised} · 1 = one of those · 0 = none · error if the site couldn't be fetched. Fix. Publish `/llms.txt` and `/llms-full.txt` at host root, and reference your MCP server or a `.well-known/` discovery endpoint inside `/llms.txt` so agents can find it. Evidence. https://docs.coderabbit.ai/llms.txt | present | 2/2 |
| F2 | WebMCP declarative tool forms, schema-valid. WebMCP lets a page expose callable tools to in-browser agents via declarative `<form toolname tooldescription>` markup. F2 rewards a present, schema-clean WebMCP surface so an agent can invoke the tools without guessing. Scoring. 2 = WebMCP detected, 0 schema errors + 0 warnings · 1 = detected with schema issues (errors or warnings) · 0 = not detected · error if the homepage couldn't be scanned. Fix. Add WebMCP declarative tool forms, each with `toolname` + `tooldescription`, and `name`+`toolparamdescription` on every input. Fix missing-toolname / required-param-no-name errors first; they're the hard failures. | absent | 0/2 |
| F3 | MCP server advertised (RFC 9728 / 8414 OAuth). An MCP server lets agents call your API as governed tools. F3 rewards advertising a discoverable, OAuth-protected MCP endpoint via the standard `.well-known` metadata, so an agent can authenticate and connect without bespoke setup. Scoring. 3 = full oauth-mcp (RFC 9728 protected-resource + RFC 8414 auth-server metadata + PKCE S256) · 2 = partial · 1 = endpoint-only · 0 = none · error if the site was unreachable. Fix. Serve `/.well-known/oauth-protected-resource` pointing at your MCP endpoint and a same-host RFC 8414 authorization-server metadata document advertising `S256` PKCE. Evidence. https://docs.coderabbit.ai/.well-known/oauth-protected-resource/integrations/mcp-servers (mcp tier: partial) | partial | 2/3 |
| F4 | Agent accessibility: static name + ARIA validity. An AI agent operates a page through its accessibility tree; it needs every button, link, input and image to carry a name it can target. Missing accessible names, invalid ARIA, and positive tabindex all break that, leaving controls an agent can see but can't reliably name or actuate. Scoring. 3 = 0 violations (and ≥1 element to check) · 2 = 1–2 · 1 = 3–5 · 0 = ≥6 · not_applicable if the page has nothing nameable · error if the homepage couldn't be scanned. Static heuristic (no axe-core/headless): name + ARIA-validity rules only. Fix. Give every `<button>`/`<a>`/icon an accessible name (text, `aria-label`, or a labelled child `<img alt>`); associate `<label>`s with inputs/selects; add `alt` to images and `<title>` to inline SVGs; drop positive `tabindex`; fix typo'd `aria-*` attributes and roles. Evidence. https://docs.coderabbit.ai (static a11y heuristic: name/attr-validity rules only; computed-tree rules (required-children/parent, hidden-focus, role-conflict) not checked; no axe-core/headless) | present | 3/3 |
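
The per-criterion scores in the table can be totalled to see where the points were lost. The sketch below is only a plain sum of the Score column; whether agentfit weights criteria or reports this sum as an overall grade is not stated in the report, and grouping by the first letter of the ID is an assumption made here for illustration.

```python
# (criterion id, points earned, points possible), copied from the Score column.
ROWS = [
    ("A1", 2, 2), ("A2", 3, 3), ("A3", 3, 3), ("A4", 3, 3), ("A5", 1, 2),
    ("B1", 7, 7), ("B2", 4, 4), ("B3", 3, 3), ("B4", 2, 2), ("B5", 0, 2),
    ("B6", 1, 2),
    ("C1a", 8, 8), ("C1b", 7, 7), ("C2", 0, 4), ("C3", 2, 5),
    ("D1", 2, 4), ("D2", 4, 4), ("D3", 1, 3), ("D4", 2, 3), ("D5", 1, 3),
    ("D6", 1, 2),
    ("E1", 6, 6), ("E2", 2, 2), ("E3", 0, 2), ("E4", 2, 2), ("E5", 0, 2),
    ("F1", 2, 2), ("F2", 0, 2), ("F3", 2, 3), ("F4", 3, 3),
]


def tally(rows):
    """Return (earned, possible) summed over all rows."""
    earned = sum(e for _, e, _ in rows)
    possible = sum(m for _, _, m in rows)
    return earned, possible


def by_category(rows):
    """Sum (earned, possible) per ID prefix letter (assumed grouping)."""
    totals = {}
    for cid, e, m in rows:
        got, cap = totals.get(cid[0], (0, 0))
        totals[cid[0]] = (got + e, cap + m)
    return totals


def shortfalls(rows):
    """List (missing points, id) for every non-full score, largest gap first."""
    return sorted(((m - e, cid) for cid, e, m in rows if e < m), reverse=True)


if __name__ == "__main__":
    print("total:", tally(ROWS))
    for cat, (got, cap) in sorted(by_category(ROWS).items()):
        print(cat, f"{got}/{cap}")
    for gap, cid in shortfalls(ROWS):
        print(f"{cid}: -{gap}")
```

Under this plain sum the run earns 74 of 100 points, and the largest single gaps are C2 (4 points) and C3 (3 points).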
Raw report: JSON · history: JSON · diff: JSON
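
The fixes for B2, B4 and B5 all point at the same JSON-LD block, so one build-time helper could cover the `@type`, `dateModified` and `keywords` signals together. The following is a minimal sketch under stated assumptions: `build_jsonld`, its parameters, and the example URL are hypothetical, and the exact properties agentfit inspects beyond those named in the rubric are not known.

```python
import json
from datetime import date


def build_jsonld(headline: str, url: str, keywords: list[str], modified: date) -> str:
    """Render a schema.org TechArticle JSON-LD <script> tag for one doc page."""
    payload = {
        "@context": "https://schema.org",
        "@type": "TechArticle",                 # B2: parseable JSON-LD with @type
        "headline": headline,
        "url": url,
        "keywords": keywords,                   # B5: machine-readable taxonomy
        "dateModified": modified.isoformat(),   # B4: freshness signal
    }
    # Escape "</" so a value can never close the <script> element early;
    # "\/" is a valid JSON escape and parses back to "/".
    body = json.dumps(payload, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Hypothetical page; substitute real values from the site generator.
    print(build_jsonld(
        "Audit logs",
        "https://docs.example.test/api-reference/audit-logs",
        ["api", "audit-logs", "compliance"],
        date(2026, 5, 28),
    ))
```

Emitting the tag from the page template at build time keeps `dateModified` in step with the content instead of relying on hand edits.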
© 2026 Stanislav Gumeniuk · All rights reserved