https://docs.musixmatch.com/overview

done · finished 2026-06-04 21:00 UTC · run id 019e9452-54d8-732f-b5d4-4dc852194190

Category	From	Δ
Discovery	16	-16
Page artifacts	19	-19
API spec	18	-18
Content	14	-14
Hygiene	10	-10
Agent Surface	0	0

Criterion	Status	From	Δ
`A1`	present → error	5	-5
`A2`	present → error	3	-3
`A3`	present → error	3	-3
`A4`	present → error	4	-4
`A5`	partial → error	1	-1
`B1`	present → error	7	-7
`B2`	present → error	5	-5
`B3`	present → error	3	-3
`B4`	present → error	2	-2
`B5`	absent → error	0	0
`B6`	partial → error	2	-2
`C1a`	present → error	8	-8
`C1b`	present → not applicable	7	-7
`C2`	absent → error	0	0
`C3`	partial → error	3	-3
`D1`	partial → error	2	-2
`D2`	present → error	4	-4
`D3`	partial → error	1	-1
`D4`	partial → error	2	-2
`D5`	partial → error	2	-2
`D6`	present → error	3	-3
`E1`	present → error	6	-6
`E2`	present → error	2	-2
`E3`	absent → error	0	0
`E4`	present → error	2	-2
`E5`	absent → error	0	0
`F1`	→ error	0	0
`F2`	→ error	0	0
`F3`	→ error	0	0
`F4`	→ error	0	0

ID	Criterion	Status	Score
`A1`	llms.txt at host root conforms to llmstxt.org spec agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Jeremy Howard of Answer.AI and documented at llmstxt.org: one Markdown file at the host root with the docs map. Scoring. 2 = H1 + ≥1 H2 + ≥3 links + all resolve · 1 = any H1-bearing file (downgraded from 2 if links don't resolve) · 0 = missing or HTML shell. Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec. Evidence. https://docs.musixmatch.com/llms.txt - fetch: GET https://docs.musixmatch.com/llms.txt: Get "https://docs.musixmatch.com/llms.txt": net/http: TLS handshake timeout /rubric#a1 · article	error	0/2
`A2`	llms-full.txt or per-section LLM aggregates exist /llms-full.txt is the full-text dump of your docs in one place — large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too. Scoring. 3 = /llms-full.txt at host root, >1 KB · 2 = per-section aggregate found via llms.txt · 0 = neither, or SPA-shell at /llms-full.txt. Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming. Evidence. https://docs.musixmatch.com/llms-full.txt — pre-fetch unreachable /rubric#a2 · article	error	0/3
`A3`	robots.txt declares an AI-bot policy and absolute Sitemap every AI crawler (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) reads /robots.txt before crawling. An explicit Allow/Disallow per UA + an absolute `Sitemap:` directive removes ambiguity about both indexing and what to index. Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404. Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts. Evidence. https://docs.musixmatch.com/robots.txt — robots.txt pre-fetch failed /rubric#a3 · article	error	0/3
`A4`	sitemap.xml: well-formed, absolute URLs, low taxonomy noise sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ /category/ noise) gives no signal. Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error. Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line. Evidence. https://docs.musixmatch.com/sitemap.xml — fetch: GET https://docs.musixmatch.com/sitemap.xml: Get "https://docs.musixmatch.com/sitemap.xml": net/http: TLS handshake timeout /rubric#a4 · article	error	0/3
`A5`	Homepage discovery tags: markdown alternate + OpenGraph discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers. Scoring. 2 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither. Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage. Evidence. https://docs.musixmatch.com/overview — homepage not parsed /rubric#a5 · article	error	0/2
`B1`	.md companion of doc pages returns clean markdown browsing 200 HTML pages to read your docs is fine for humans; for agents it's an order-of-magnitude tokenization cost. A markdown twin per page lets agents pull just the prose. Scoring. 7 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3. Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins. Evidence. https://docs.musixmatch.com/overview — no sitemap samples available /rubric#b1 · article	error	0/7
`B2`	JSON-LD with valid @type on homepage and sample doc page JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it. Scoring. 4 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none. Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage. Evidence. https://docs.musixmatch.com/overview — no parsed pages available /rubric#b2 · article	error	0/4
`B3`	Absolute <link rel=canonical> on homepage and sample page canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs. Scoring. 3 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent. Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents). Evidence. https://docs.musixmatch.com/overview — no parsed pages available /rubric#b3 · article	error	0/3
`B4`	Freshness: dateModified (JSON-LD) or Last-Modified header agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday. Scoring. 2 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither. Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators. Evidence. https://docs.musixmatch.com/overview — no page metadata available /rubric#b4 · article	error	0/2
`B5`	Machine-readable taxonomies (keywords, tags, categories) tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count. Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags\|/categories\|/topics/ link patterns) · 0 = none. Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes. Evidence. https://docs.musixmatch.com/overview — no parsed pages available /rubric#b5 · article	error	0/2
`B6`	<main> or <article> wraps the primary content prose semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork. Scoring. 2 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither. Fix. Wrap your page's primary prose in `<main>` (or `<article>` for individual doc pages). Avoid using these for sidebars or navigation — they're meant for the actual content. Evidence. https://docs.musixmatch.com/overview — no parsed pages available /rubric#b6 · article	error	0/2
`C1a`	OpenAPI / Swagger / AsyncAPI spec at a discoverable URL an OpenAPI spec is THE primary machine-readable contract for a REST API. Agents that find it can generate clients, test cases, and accurate docs without reading any HTML. Scoring. 8 = found at a standard probe path (/openapi.json, /swagger.yaml, /v1/openapi.json, etc.) · 5 = found via HTML link discovery only · 0 = nothing found. Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or under your docs path — Phase 53 probes both). For OpenAPI-first projects, your build tool already produces this — just expose it. Evidence. https://docs.musixmatch.com/overview — homepage pre-fetch failed; spec discovery skipped /rubric#c1a · article	error	0/8
`C1b`	Valid OpenAPI 3.x with info, ≥1 path, and response schemas finding a spec (C1a) is half the battle; the spec must also be valid 3.x AND describe response schemas so agents can know what they'll get back. A spec that lists paths but no response shapes is half a contract. Scoring. 7 = OpenAPI 3.x + valid info + paths + ≥30% operations have response content schemas · 6 = +paths but few schemas · 3 = Swagger 2.0 fallback · 0 = parse error. Fix. Bump your spec to OpenAPI 3.0 or 3.1 if you're still on Swagger 2.0. Add `responses: { '200': { content: { 'application/json': { schema: ... } } } }` to each operation — schemas are what make a spec useful to clients. Evidence. https://docs.musixmatch.com/overview — C1a did not find a spec body /rubric#c1b · article	not applicable	0/7
`C2`	Postman collection or SDKs with discoverable download/fork an OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience. Scoring. 4 = Postman collection link AND ≥1 SDK registry link · 3 = Postman OR ≥2 SDK links · 2 = 1 SDK link · 0 = nothing. Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage. Evidence. https://docs.musixmatch.com/overview — no parsed pages available /rubric#c2 · article	error	0/4
`C3`	Endpoint pages show method, URL, types, required, examples a docs page that just says "call /users" is useless without method, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call. Scoring. 5 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found. Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly. Evidence. https://docs.musixmatch.com/overview — no pages fetched; ML/heuristic skipped /rubric#c3 · article	error	0/5
`D1`	Code examples include curl AND at least one language SDK curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs. Scoring. 4 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither. Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up. Evidence. https://docs.musixmatch.com/overview — no sample pages available /rubric#d1 · article	error	0/4
`D2`	Realistic examples (not foo/bar/example.com) `/users/{id}` with `id = 1` and `email = [email protected]` requires the reader to imagine what real data looks like. Realistic placeholders (`[email protected]`, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents. Scoring. 4 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks. Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data — but mimic its shape. Evidence. https://docs.musixmatch.com/overview — no sample pages available; ML/regex path skipped /rubric#d2 · article	error	0/4
`D3`	Error catalogue with HTTP codes + reasons when an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug. Scoring. 3 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none. Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists. Evidence. https://docs.musixmatch.com/overview — no homepage / sitemap available /rubric#d3 · article	error	0/3
`D4`	Authentication AND rate limits documented auth is table-stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage. Scoring. 3 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither. Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context — not just a code snippet. Evidence. https://docs.musixmatch.com/overview — no pages available /rubric#d4 · article	error	0/3
`D5`	Glossary OR consistent terminology across pages is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable. Scoring. 3 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither. Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages. Evidence. https://docs.musixmatch.com/overview — homepage unreachable; glossary probes skipped /rubric#d5 · article	error	0/3
`D6`	Deprecated / beta endpoints marked in plain text a developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches. Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none. Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints — visible in the prose, not just in the spec. Evidence. https://docs.musixmatch.com/overview — no pages available /rubric#d6 · article	error	0/2
`E1`	Content visible in plain HTML without JavaScript (gating) this is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. WebCrawlers + scrapers + curl + most AI fetchers don't run JS. Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages SPA · 0 = SPA shell everywhere, or VK-trap (3+ URLs return identical body). Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate and cascade-zero many other criteria. Evidence. https://docs.musixmatch.com/overview — all UA probes failed to reach target /rubric#e1 · article	error	0/6
`E2`	Stable URLs: 301 redirects preserve old paths when you reorganise docs, old links shouldn't 404 — they should 301 to the new URL. Stable URLs are how internal links from blogs, Stack Overflow, and bookmarks survive your refactor. Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL-variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (impermanent), 404, or no redirect. Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Static-site generators handle this via `_redirects` (Netlify) or `vercel.json` `redirects:` (Vercel) configs. Evidence. https://docs.musixmatch.com/overview — no sitemap samples available /rubric#e2 · article	error	0/2
`E3`	Explicit API version in URL path, heading, or OpenAPI spec `/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers. Scoring. 2 = version in the docs' own URL structure (sitemap or OpenAPI-spec URL: `/v1/`, `/2024-01-15/`) · 1 = version in a documented API endpoint URL (curl/code examples), in OpenAPI `info.version`, or in an `<h1>`/`<h2>`/footer heading · 0 = none. Fix. Prefix your API paths with `/v1/`, `/v2/` and show them in your curl/code examples; OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count. Evidence. https://docs.musixmatch.com/overview — no input source (homepage, sitemap, or OpenAPI URL) /rubric#e3 · article	error	0/2
`E4`	Spot-check of 5 internal links → all return 200 rotten internal links are the most-common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link. Scoring. 2 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage. Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum. Evidence. https://docs.musixmatch.com/overview — homepage not parsed /rubric#e4 · article	error	0/2
`E5`	Usage terms: TOS / license / AI policy explicit without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page — especially one with AI/ML keywords — makes the policy machine-readable. Scoring. 2 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link. Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access — even if you allow everything, saying so is the signal. Evidence. https://docs.musixmatch.com/overview — no TOS probe reached target /rubric#e5 · article	error	0/2
`F1`	Agentic discovery breadth: llms.txt, llms-full, MCP an agent finds your site through several surfaces; F1 rewards exposing more than one — a reachable llms.txt, a full-content llms-full.txt feed, and an advertised MCP/agent endpoint. It complements A1/A2 (which judge each surface's quality) by counting how many an agent can discover. Scoring. 2 = ≥2 of {reachable llms.txt, llms-full.txt exists, MCP/agent endpoint advertised} · 1 = one of those · 0 = none · error if the site couldn't be fetched. Fix. Publish `/llms.txt` and `/llms-full.txt` at host root, and reference your MCP server or a `.well-known/` discovery endpoint inside `/llms.txt` so agents can find it. Evidence. pre-fetch unreachable /rubric#f1 · article	error	0/2
`F2`	WebMCP declarative tool forms, schema-valid WebMCP lets a page expose callable tools to in-browser agents via declarative `<form toolname tooldescription>` markup. F2 rewards a present, schema-clean WebMCP surface so an agent can invoke the tools without guessing. Scoring. 2 = WebMCP detected, 0 schema errors + 0 warnings · 1 = detected with schema issues (errors or warnings) · 0 = not detected · error if the homepage couldn't be scanned. Fix. Add WebMCP declarative tool forms — each with `toolname` + `tooldescription`, and `name`+`toolparamdescription` on every input. Fix missing-toolname / required-param-no-name errors first; they're the hard failures. Evidence. homepage unreachable or unscannable /rubric#f2 · article	error	0/2
`F3`	MCP server advertised (RFC 9728 / 8414 OAuth) an MCP server lets agents call your API as governed tools. F3 rewards advertising a discoverable, OAuth-protected MCP endpoint via the standard `.well-known` metadata, so an agent can authenticate and connect without bespoke setup. Scoring. 3 = full oauth-mcp (RFC 9728 protected-resource + RFC 8414 auth-server metadata + PKCE S256) · 2 = partial · 1 = endpoint-only · 0 = none · error if the site was unreachable. Fix. Serve `/.well-known/oauth-protected-resource` pointing at your MCP endpoint and a same-host RFC 8414 authorization-server metadata document advertising `S256` PKCE. Evidence. site unreachable — MCP probe inconclusive /rubric#f3 · article	error	0/3
`F4`	Agent accessibility: static name + ARIA validity an AI agent operates a page through its accessibility tree: it needs every button, link, input and image to carry a name it can target. Missing accessible names, invalid ARIA, and positive tabindex all break that, leaving controls an agent can see but can't reliably name or actuate. Scoring. 3 = 0 violations (and ≥1 element to check) · 2 = 1–2 · 1 = 3–5 · 0 = ≥6 · not_applicable if the page has nothing nameable · error if the homepage couldn't be scanned. Static heuristic (no axe-core/headless): name + ARIA-validity rules only. Fix. Give every `<button>`/`<a>`/icon an accessible name (text, `aria-label`, or a labelled child `<img alt>`); associate `<label>`s with inputs/selects; add `alt` to images and `<title>` to inline SVGs; drop positive `tabindex`; fix typo'd `aria-` attributes and roles. Evidence. homepage unreachable or unscannable /rubric#f4 · article	error	0/3

Criterion

Status

Score

A1

llms.txt at host root conforms to llmstxt.org spec

Agents (and humans new to a site) need a single, predictable index of where the docs live. /llms.txt is the convention proposed by Jeremy Howard of Answer.AI and documented at llmstxt.org: one Markdown file at the host root with the docs map.

Scoring. 2 = H1 + ≥1 H2 + ≥3 links + all resolve · 1 = any H1-bearing file (downgraded from 2 if links don't resolve) · 0 = missing or HTML shell.

Fix. Publish /llms.txt at host root with an `# H1` title, `## H2` section headers, and at least three Markdown bullet links pointing at concrete doc pages. See https://llmstxt.org for the spec.
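
The scoring rule above can be approximated in a few lines. This is a minimal sketch, not the scanner's actual implementation: the function name `score_llms_txt`, the `links_resolve` flag (assumed to be computed by a separate link check), and the sample file contents are all hypothetical.

```python
import re

# Hypothetical sample: an H1, one H2, and three Markdown bullet links.
SAMPLE = """# Example Docs

## Guides
- [Quickstart](https://docs.example.com/quickstart.md)
- [Authentication](https://docs.example.com/authentication.md)
- [Errors](https://docs.example.com/errors.md)
"""

def score_llms_txt(text: str, links_resolve: bool = True) -> int:
    """Approximate the 0-2 A1 scale from the structure of an llms.txt body."""
    lines = text.splitlines()
    has_h1 = any(re.match(r"#\s+\S", ln) for ln in lines)
    h2_count = sum(1 for ln in lines if re.match(r"##\s+\S", ln))
    # Count bullet links of the form "- [Title](target)".
    link_count = sum(
        1 for ln in lines if re.match(r"\s*[-*]\s+\[[^\]]+\]\([^)]+\)", ln)
    )
    if not has_h1:
        return 0  # missing file or an HTML shell has no Markdown H1
    if h2_count >= 1 and link_count >= 3 and links_resolve:
        return 2
    return 1  # H1-bearing file that misses the other requirements

print(score_llms_txt(SAMPLE))
```

Running a check like this in CI before publishing catches a file that silently lost its headings or links.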

Evidence. https://docs.musixmatch.com/llms.txt — fetch: GET https://docs.musixmatch.com/llms.txt: Get "https://docs.musixmatch.com/llms.txt": net/http: TLS handshake timeout

/rubric#a1 · article

error

0/2

A2

llms-full.txt or per-section LLM aggregates exist

/llms-full.txt is the full-text dump of your docs in one place — large language model agents prefer it over crawling 200 HTML pages. Per-section variants (/llms-api.txt etc.) work too.

Scoring. 3 = /llms-full.txt at host root, >1 KB · 2 = per-section aggregate found via llms.txt · 0 = neither, or SPA-shell at /llms-full.txt.

Fix. Generate /llms-full.txt at build time (mkdocs/docusaurus plugins exist) and serve it as text/plain. Keep it under 100 MB so agents can fetch it without streaming.

Evidence. https://docs.musixmatch.com/llms-full.txt — pre-fetch unreachable

/rubric#a2 · article

error

0/3

A3

robots.txt declares an AI-bot policy and absolute Sitemap

AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) are expected to read /robots.txt before crawling. An explicit Allow/Disallow per UA plus an absolute `Sitemap:` directive removes ambiguity about both whether crawling is allowed and what to index.

Scoring. 3 = explicit AI-bot UA directive AND absolute Sitemap: line · 2 = AI-bot directive only · 1 = absolute Sitemap only · 0 = neither, or 404.

Fix. Add `User-agent: GPTBot\nAllow: /` (or Disallow as policy dictates) for each major LLM bot, plus a `Sitemap: https://example.com/sitemap.xml` line. Cloudflare's `Content-Signal:` directive also counts.
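
As a rough illustration of the A3 rule, the sketch below scores a robots.txt body with Python's standard `urllib.robotparser`. The sample file, the `score_robots` name, and the choice to treat a bot as "explicitly addressed" only when it has its own `User-agent` line are assumptions, not the scanner's documented behaviour; `Content-Signal:` is not handled here.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot")

# Hypothetical robots.txt with per-bot directives and an absolute Sitemap line.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/

Sitemap: https://docs.example.com/sitemap.xml
"""

def score_robots(text: str) -> int:
    """Approximate the 0-3 A3 scale for a robots.txt body."""
    lines = text.splitlines()
    # Collect every user agent that is named on its own User-agent line.
    named = {
        ln.split(":", 1)[1].strip().lower()
        for ln in lines
        if ln.strip().lower().startswith("user-agent:")
    }
    has_ai_directive = any(bot.lower() in named for bot in AI_BOTS)

    parser = RobotFileParser()
    parser.parse(lines)
    sitemaps = parser.site_maps() or []  # site_maps() returns None when empty
    has_absolute_sitemap = any(
        s.startswith(("http://", "https://")) for s in sitemaps
    )

    if has_ai_directive and has_absolute_sitemap:
        return 3
    if has_ai_directive:
        return 2
    if has_absolute_sitemap:
        return 1
    return 0

print(score_robots(ROBOTS_TXT))
```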

Evidence. https://docs.musixmatch.com/robots.txt — robots.txt pre-fetch failed

/rubric#a3 · article

error

0/3

A4

sitemap.xml: well-formed, absolute URLs, low taxonomy noise

Sitemaps tell crawlers what to index and how often it changes. A well-formed `<urlset>` with absolute `<loc>` URLs across many distinct pages signals real coverage; a stub with one URL (or one with 70%+ /tag/ /category/ noise) gives no signal.

Scoring. 3 = well-formed + ≥3 distinct paths + <30% taxonomy junk · 2 = thin (<3 paths) or 30-70% junk · 1 = well-formed only · 0 = 404 or parse error.

Fix. Generate /sitemap.xml at build time, include every doc page with `<loc>` as absolute URLs, and exclude /tag/, /category/, /author/, /page= variants. Reference it from robots.txt with an absolute `Sitemap:` line.
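
One way to sanity-check a generated sitemap against the A4 thresholds is sketched below with the standard library's `xml.etree.ElementTree`. The sample URLs and the `score_sitemap` name are hypothetical, and two readings are assumptions: a file with no absolute `<loc>` entries is treated as "well-formed only", and so is a sitemap with more than 70% taxonomy noise.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
NOISE = ("/tag/", "/category/", "/author/")

# Hypothetical sitemap: three doc pages plus one taxonomy page.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/overview</loc></url>
  <url><loc>https://docs.example.com/authentication</loc></url>
  <url><loc>https://docs.example.com/errors</loc></url>
  <url><loc>https://docs.example.com/tag/auth</loc></url>
</urlset>"""

def score_sitemap(xml_text: str) -> int:
    """Approximate the 0-3 A4 scale for a sitemap body."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return 0
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    absolute = [u for u in locs if urlparse(u).scheme in ("http", "https")]
    if not absolute:
        return 1  # assumption: parses, but offers no usable absolute URLs
    paths = {urlparse(u).path for u in absolute}
    noisy = sum(
        1 for u in absolute
        if any(n in urlparse(u).path for n in NOISE) or "page=" in u
    )
    junk = noisy / len(absolute)
    if len(paths) >= 3 and junk < 0.30:
        return 3
    if len(paths) < 3 or junk <= 0.70:
        return 2
    return 1

print(score_sitemap(SITEMAP))
```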

Evidence. https://docs.musixmatch.com/sitemap.xml — fetch: GET https://docs.musixmatch.com/sitemap.xml: Get "https://docs.musixmatch.com/sitemap.xml": net/http: TLS handshake timeout

/rubric#a4 · article

error

0/3

A5

Homepage discovery tags: markdown alternate + OpenGraph

Discovery tags let agents find the markdown version of a page without a separate probe, and OpenGraph turns shared docs links into rich previews on Slack/Discord/Twitter. Both signal an awareness of machine consumers.

Scoring. 2 = `<link rel=alternate type=text/markdown>` + ≥3 distinct `og:` properties · 1 = markdown alternate only OR OpenGraph only · 0 = neither.

Fix. Add `<link rel="alternate" type="text/markdown" href="/page.md">` next to your canonical link, and ensure `og:title`, `og:description`, `og:image` (minimum 3 properties) are set on the homepage.
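
To confirm the tags actually reach the served HTML, a check along these lines may help. It is a sketch using the standard `html.parser`; the class and function names and the sample markup are hypothetical, and it assumes "OpenGraph only" still requires at least three distinct `og:` properties to earn a point.

```python
from html.parser import HTMLParser

# Hypothetical homepage head with a markdown alternate and three og: tags.
HOMEPAGE = """<head>
<link rel="canonical" href="https://docs.example.com/">
<link rel="alternate" type="text/markdown" href="/index.md">
<meta property="og:title" content="Example Docs">
<meta property="og:description" content="API reference and guides">
<meta property="og:image" content="https://docs.example.com/og.png">
</head>"""

class DiscoveryTags(HTMLParser):
    """Collects the markdown alternate link and distinct og: properties."""

    def __init__(self):
        super().__init__()
        self.markdown_alternate = False
        self.og_properties = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link":
            rels = (a.get("rel") or "").lower().split()
            if "alternate" in rels and (a.get("type") or "").lower() == "text/markdown":
                self.markdown_alternate = True
        elif tag == "meta":
            prop = (a.get("property") or "").lower()
            if prop.startswith("og:") and a.get("content"):
                self.og_properties.add(prop)

def score_discovery(html: str) -> int:
    """Approximate the 0-2 A5 scale for a homepage."""
    tags = DiscoveryTags()
    tags.feed(html)
    has_og = len(tags.og_properties) >= 3
    if tags.markdown_alternate and has_og:
        return 2
    if tags.markdown_alternate or has_og:
        return 1
    return 0

print(score_discovery(HOMEPAGE))
```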

Evidence. https://docs.musixmatch.com/overview — homepage not parsed

/rubric#a5 · article

error

0/2

B1

.md companion of doc pages returns clean markdown

Browsing 200 HTML pages to read your docs is fine for humans; for agents it's an order-of-magnitude token cost. A markdown twin per page lets agents pull just the prose.

Scoring. 7 = 3/3 sampled pages have a working .md twin · 4 = 2/3 · 2 = 1/3 · 0 = 0/3.

Fix. Serve `{page}.md` (or `{page}/index.md`) alongside every HTML page, OR support `Accept: text/markdown` content negotiation that returns `Content-Type: text/markdown`. mkdocs-material and docusaurus both have plugins.
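
The two URL shapes named above can be derived mechanically from a page URL, which is handy when spot-checking your own site. The helper names, the `.html` stripping, and the "clean markdown" test (a text content type and no HTML shell) are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_candidates(page_url: str) -> list[str]:
    """Return the `{page}.md` and `{page}/index.md` twins for a page URL."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if path.endswith(".html"):
        path = path[: -len(".html")]
    # The host root has no `{page}.md` form, only `/index.md`.
    candidates = [path + ".md", path + "/index.md"] if path else ["/index.md"]
    return [urlunsplit((parts.scheme, parts.netloc, p, "", "")) for p in candidates]

def is_clean_markdown(content_type: str, body: str) -> bool:
    """Heuristic: a text content type and a body that is not an HTML shell."""
    media_type = content_type.split(";")[0].strip().lower()
    head = body.lstrip()[:15].lower()
    is_html = head.startswith(("<!doctype", "<html"))
    return media_type in ("text/markdown", "text/plain") and not is_html

def score_md_twins(working_twins: int) -> int:
    """Map the number of working twins among 3 sampled pages to the B1 scale."""
    return {3: 7, 2: 4, 1: 2}.get(working_twins, 0)

print(markdown_candidates("https://docs.example.com/overview"))
```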

Evidence. https://docs.musixmatch.com/overview — no sitemap samples available

/rubric#b1 · article

error

0/7

B2

JSON-LD with valid @type on homepage and sample doc page

JSON-LD is the schema.org-compatible way to declare "this page is an Article" / "this product is a SoftwareApplication". Search engines, agents, and structured-data extractors all key off it.

Scoring. 4 = parseable JSON-LD with `@type` on BOTH homepage and a sample doc page · 3 = one of the two · 0 = none.

Fix. Embed `<script type="application/ld+json">{"@context":"https://schema.org","@type":"TechArticle",...}</script>` on every doc page. The `WebApplication` type is a good fit for the homepage.
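
A quick way to verify that an embedded block is parseable and carries an `@type` is sketched below. It uses only `html.parser` and `json`; the `JsonLdExtractor` and `jsonld_types` names and the sample page are hypothetical, and `@graph` containers are not unpacked.

```python
import json
from html.parser import HTMLParser

# Hypothetical doc page with one JSON-LD block.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "TechArticle", "headline": "Authentication"}
</script>
</head><body></body></html>"""

class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))

def jsonld_types(html: str) -> list:
    """Return the @type of each parseable top-level JSON-LD object."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    types = []
    for raw in extractor.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable blocks earn no credit
        for item in doc if isinstance(doc, list) else [doc]:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return types

print(jsonld_types(PAGE))
```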

Evidence. https://docs.musixmatch.com/overview — no parsed pages available

/rubric#b2 · article

error

0/4

B3

Absolute <link rel=canonical> on homepage and sample page

Canonical links resolve the "is this the http or https version, with or without trailing slash, with or without query?" question deterministically. Without them, agents may index the same content under multiple URLs.

Scoring. 3 = absolute canonical on BOTH homepage AND a sample page · 2 = homepage only · 1 = present but relative · 0 = absent.

Fix. Add `<link rel="canonical" href="https://example.com/page">` to every page's `<head>`. The URL must include the scheme + host (relative canonicals are valid HTML but defeat the purpose for cross-host agents).
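
The absolute-versus-relative distinction is easy to test with `urllib.parse`. The sketch below is illustrative: the function names are hypothetical, the `href` values are assumed to be extracted already, and the case "sample page absolute, homepage missing" is scored as 1, which the rubric does not spell out.

```python
from typing import Optional
from urllib.parse import urlsplit

def is_absolute_canonical(href: str) -> bool:
    """True when the canonical href carries both a scheme and a host."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def score_canonical(home_href: Optional[str], sample_href: Optional[str]) -> int:
    """Approximate the 0-3 B3 scale from the two canonical hrefs (None = absent)."""
    home_ok = bool(home_href) and is_absolute_canonical(home_href)
    sample_ok = bool(sample_href) and is_absolute_canonical(sample_href)
    if home_ok and sample_ok:
        return 3
    if home_ok:
        return 2
    if home_href or sample_href:
        return 1  # present, but relative (or only on the sample page)
    return 0

print(score_canonical("https://docs.example.com/", "https://docs.example.com/overview"))
```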

Evidence. https://docs.musixmatch.com/overview — no parsed pages available

/rubric#b3 · article

error

0/3

B4

Freshness: dateModified (JSON-LD) or Last-Modified header

Agents (and search engines) lower their trust in docs that don't declare when they were last updated. A docs page from 2019 with no freshness signal is indistinguishable from one updated yesterday.

Scoring. 2 = JSON-LD `dateModified` OR HTTP `Last-Modified` header present · 0 = neither.

Fix. Either include `"dateModified": "2026-05-28"` in your JSON-LD block, or have your CDN/server emit a `Last-Modified` HTTP header. Build-time templating does this for free in most static-site generators.
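
Both freshness signals can be produced from a single modification timestamp at build time. This is a sketch under the assumption that your build step knows each page's last-modified time as a timezone-aware `datetime`; the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def freshness_signals(modified: datetime) -> dict:
    """Render one timestamp as a JSON-LD date and an HTTP Last-Modified value."""
    utc = modified.astimezone(timezone.utc)
    return {
        # ISO 8601 date for the JSON-LD "dateModified" property.
        "dateModified": utc.date().isoformat(),
        # HTTP date format for the Last-Modified response header.
        "Last-Modified": format_datetime(utc, usegmt=True),
    }

def parse_last_modified(header_value: str) -> datetime:
    """Parse a Last-Modified header back into an aware datetime."""
    return parsedate_to_datetime(header_value)

signals = freshness_signals(datetime(2026, 5, 28, 9, 30, tzinfo=timezone.utc))
print(signals)
```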

Evidence. https://docs.musixmatch.com/overview — no page metadata available

/rubric#b4 · article

error

0/2

B5

Machine-readable taxonomies (keywords, tags, categories)

Tagged docs help agents filter ("show me the auth-related pages") without parsing full prose. `<meta name="keywords">`, JSON-LD `keywords`, or `/tags/`-style URLs all count.

Scoring. 2 = at least one taxonomy signal present (meta keywords, JSON-LD keywords, or /tags|/categories|/topics/ link patterns) · 0 = none.

Fix. Add `<meta name="keywords" content="api,auth,oauth">` to each page, OR include a `keywords` array in your JSON-LD, OR organise content under `/topics/` or `/tags/` URL prefixes.

Evidence. https://docs.musixmatch.com/overview — no parsed pages available

/rubric#b5 · article

error

0/2

B6

<main> or <article> wraps the primary content prose

Semantic HTML5 wrappers let agents (and screen readers) strip the navigation/footer/sidebars and read just the docs prose. A page where the body is all `<div>` requires guesswork.

Scoring. 2 = `<main>` text >200 chars AND `<article>` text >100 chars · 1 = `<main>` only OR `<article>` only · 0 = neither.

Fix. Wrap your page's primary prose in `<main>` (or `<article>` for individual doc pages). Avoid using these for sidebars or navigation — they're meant for the actual content.
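
The character thresholds in the scoring rule can be checked with a small parser. This is a simplified sketch: the class and function names are hypothetical, text inside `<script>` or `<style>` nested in the wrappers would be counted too, and whitespace is trimmed per text node before counting.

```python
from html.parser import HTMLParser

class ContentWrapperText(HTMLParser):
    """Counts text characters that sit inside <main> and inside <article>."""

    def __init__(self):
        super().__init__()
        self.depth = {"main": 0, "article": 0}
        self.chars = {"main": 0, "article": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        length = len(data.strip())
        for tag, depth in self.depth.items():
            if depth > 0:
                self.chars[tag] += length

def score_wrappers(html: str) -> int:
    """Approximate the 0-2 B6 scale for a page."""
    counter = ContentWrapperText()
    counter.feed(html)
    main_ok = counter.chars["main"] > 200
    article_ok = counter.chars["article"] > 100
    if main_ok and article_ok:
        return 2
    if main_ok or article_ok:
        return 1
    return 0

print(score_wrappers("<main><article>" + "x" * 250 + "</article></main>"))
```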

Evidence. https://docs.musixmatch.com/overview — no parsed pages available

/rubric#b6 · article

error

0/2

C1a

OpenAPI / Swagger / AsyncAPI spec at a discoverable URL

An OpenAPI spec is THE primary machine-readable contract for a REST API. Agents that find it can generate clients, test cases, and accurate docs without reading any HTML.

Scoring. 8 = found at a standard probe path (/openapi.json, /swagger.yaml, /v1/openapi.json, etc.) · 5 = found via HTML link discovery only · 0 = nothing found.

Fix. Publish your spec at `/openapi.json` or `/openapi.yaml` at host root (or under your docs path — Phase 53 probes both). For OpenAPI-first projects, your build tool already produces this — just expose it.

Evidence. https://docs.musixmatch.com/overview — homepage pre-fetch failed; spec discovery skipped

/rubric#c1a · article

error

0/8

C1b

Valid OpenAPI 3.x with info, ≥1 path, and response schemas

Finding a spec (C1a) is half the battle; the spec must also be valid 3.x AND describe response schemas so agents know what they will get back. A spec that lists paths but no response shapes is half a contract.

Scoring. 7 = OpenAPI 3.x + valid info + paths + ≥30% operations have response content schemas · 6 = OpenAPI 3.x + valid info + paths, but few operations have response schemas · 3 = Swagger 2.0 fallback · 0 = parse error.

Fix. Bump your spec to OpenAPI 3.0 or 3.1 if you're still on Swagger 2.0. Add `responses: { '200': { content: { 'application/json': { schema: ... } } } }` to each operation — schemas are what make a spec useful to clients.

Evidence. https://docs.musixmatch.com/overview — C1a did not find a spec body

/rubric#c1b · article

not applicable

0/7
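For C1b, the 30% threshold can be measured on an already-parsed spec. A minimal sketch, assuming the spec has been loaded into a plain `dict`; `schema_coverage` and `c1b_score` are hypothetical names, `$ref` responses are not resolved, and treating a spec with missing `info` or `paths` as 0 is an assumption (the rubric only names parse errors).

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def schema_coverage(spec):
    """Fraction of operations with at least one response content schema."""
    total = with_schema = 0
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            total += 1
            responses = operation.get("responses", {})
            if any(
                "schema" in media
                for response in responses.values()
                for media in response.get("content", {}).values()
            ):
                with_schema += 1
    return with_schema / total if total else 0.0


def c1b_score(spec):
    """Map a parsed spec onto the C1b scale (7, 6, 3, or 0)."""
    if str(spec.get("swagger", "")).startswith("2."):
        return 3  # Swagger 2.0 fallback
    is_3x = str(spec.get("openapi", "")).startswith("3.")
    info = spec.get("info", {})
    has_info = bool(info.get("title")) and bool(info.get("version"))
    if not (is_3x and has_info and spec.get("paths")):
        return 0  # assumption: an incomplete spec is scored like a parse error
    return 7 if schema_coverage(spec) >= 0.30 else 6
```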

C2

Postman collection or SDKs with discoverable download/fork

An OpenAPI spec lets agents generate a client; a curated Postman collection or pre-built SDK lets HUMANS try the API in 30 seconds. Both signal investment in developer experience.

Scoring. 4 = Postman collection link AND ≥1 SDK registry link · 3 = Postman OR ≥2 SDK links · 2 = 1 SDK link · 0 = nothing.

Fix. Publish a "Run in Postman" button linking to god.gw.postman.com/run-collection, and link to at least one official SDK from npm/PyPI/RubyGems/etc. directly from your docs homepage.

Evidence. https://docs.musixmatch.com/overview — no parsed pages available

/rubric#c2 · article

error

0/4

C3

Endpoint pages show method, URL, types, required, examples

A docs page that just says "call /users" is useless without the HTTP method, the full URL, parameter types, required fields, and a sample request/response. Agents (and humans) need all five to make a working call.

Scoring. 5 = majority of sampled pages classified `complete` by the ML model · 3 = majority `partial` (or 2 complete + 1 absent) · 1 = majority `absent` · 0 = no candidate pages found.

Fix. On every endpoint page include: HTTP method + path, a parameter table with types and required flags, a curl example, and a JSON response example with status code. Markdown-style param tables and `<pre>` JSON blocks classify cleanly.

Evidence. https://docs.musixmatch.com/overview — no pages fetched; ML/heuristic skipped

/rubric#c3 · article

error

0/5

D1

Code examples include curl AND at least one language SDK

curl examples are universally testable; SDK examples show idiomatic usage. Together they hit both the "can I try this quickly?" and the "how do I integrate?" needs.

Scoring. 4 = curl AND a language SDK block on ≥1 page · 2 = curl only · 1 = SDK only · 0 = neither.

Fix. Add a tabbed code block per endpoint with at least curl + your most-used SDK language (Python or JavaScript). Use `<code class="language-python">` or `language-bash` so syntax highlighters and our classifier both pick it up.

Evidence. https://docs.musixmatch.com/overview — no sample pages available

/rubric#d1 · article

error

0/4

D2

Realistic examples (not foo/bar/example.com)

`/users/{id}` with `id = 1` and a throwaway `example.com` email address forces the reader to imagine what real data looks like. Realistic placeholders (a plausible-looking email address, `org_2N5x...`) reduce friction and prevent paste-from-docs accidents.

Scoring. 4 = ML model says <20% of code blocks are placeholder-heavy · 3 = 20-40% · 2 = 40-60% · 1 = 60-80% · 0 = >80% or no code blocks.

Fix. Replace `foo`/`bar`/`example.com`/`your_api_key`/`<string>` with realistic-looking values (Stripe's `pk_test_51N5...`, Twilio's `+14155552671`). Don't use real customer data — but mimic its shape.

Evidence. https://docs.musixmatch.com/overview — no sample pages available; ML/regex path skipped

/rubric#d2 · article

error

0/4
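For D2, a regex pass gives a rough estimate of the placeholder share before any model is involved. A minimal sketch and not the audit's classifier: `placeholder_share` and `d2_score` are hypothetical names, a block counts as placeholder-heavy as soon as it contains one listed token, and boundary values (exactly 40%, 60%, 80%) are assumed to fall into the better band.

```python
import re

# Tokens taken from the Fix text: foo, bar, example.com, your_api_key, <string>.
PLACEHOLDER = re.compile(
    r"\b(foo|bar|your_api_key)\b|example\.com|<string>", re.IGNORECASE
)


def placeholder_share(code_blocks):
    """Fraction of code blocks containing a placeholder token, or None if empty."""
    if not code_blocks:
        return None
    flagged = sum(1 for block in code_blocks if PLACEHOLDER.search(block))
    return flagged / len(code_blocks)


def d2_score(code_blocks):
    """Map the placeholder share onto the 4..0 bands from the Scoring line."""
    share = placeholder_share(code_blocks)
    if share is None or share > 0.80:
        return 0
    if share < 0.20:
        return 4
    if share <= 0.40:
        return 3
    if share <= 0.60:
        return 2
    return 1
```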

D3

Error catalogue with HTTP codes + reasons

When an integration breaks at 3am, the dev needs to know what `403 - resource_not_owned` actually means without filing a ticket. A dedicated error reference page is the difference between a 5-minute fix and a half-hour debug.

Scoring. 3 = dedicated error page (≥3 codes with explanations) · 1 = error codes documented inline across pages · 0 = none.

Fix. Publish `/errors` (or `/reference/errors`) listing each HTTP status you return + the application-level error codes + a one-sentence cause for each. Tables work well; so do `<dl>` definition lists.

Evidence. https://docs.musixmatch.com/overview — no homepage / sitemap available

/rubric#d3 · article

error

0/3

D4

Authentication AND rate limits documented

Auth is table stakes; rate limits are how a dev knows whether their integration will survive production load. Both belong on a top-level docs page that's discoverable from the homepage.

Scoring. 3 = both auth and rate-limits documented · 2 = auth only · 1 = rate-limits only · 0 = neither.

Fix. Add `/authentication` (bearer / API-key / OAuth flows) and `/rate-limits` (req/min, headers like `X-RateLimit-Remaining`, 429 retry semantics) pages. Each needs at least 200 chars of context — not just a code snippet.

Evidence. https://docs.musixmatch.com/overview — no pages available

/rubric#d4 · article

error

0/3
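For D4, the "429 retry semantics" a rate-limits page should spell out can be illustrated from the client side. A minimal sketch: `retry_delay` is a hypothetical helper, `Retry-After` is the standard HTTP header but whether a given API sends it is an assumption, only the seconds form of the header is handled, and the backoff defaults are illustrative.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None when no retry is needed.

    status:  HTTP status code of the response
    headers: mapping of response headers (exact-case keys assumed)
    attempt: zero-based count of retries already made
    """
    if status != 429:
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # The server gave an explicit hint in seconds; honour it up to the cap.
        return min(float(retry_after), cap)
    # No usable hint: fall back to capped exponential backoff.
    return min(base * (2 ** attempt), cap)
```

Documenting which of these two paths a client should expect (an explicit hint or plain backoff) is the kind of context the 200-character minimum is asking for.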

D5

Glossary OR consistent terminology across pages

Is it a "workspace", "team", or "organisation"? Picking one term and sticking with it across all docs prevents a class of "what does X mean here?" support tickets. A dedicated glossary is best; consistent usage is acceptable.

Scoring. 3 = dedicated /glossary with ≥3 structured term/definition pairs · 2 = no glossary but cross-page terminology stays consistent (≥80% dominant variant) · 1 = glossary link exists but content is sparse · 0 = neither.

Fix. Publish `/glossary` as a `<dl>` with `<dt>term</dt><dd>definition</dd>` pairs (or a 2-column table with ≥50-char definitions). Use the same casing/spelling for each term across all pages.

Evidence. https://docs.musixmatch.com/overview — homepage unreachable; glossary probes skipped

/rubric#d5 · article

error

0/3
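For D5, the "≥80% dominant variant" test can be approximated by counting competing terms across page texts. A minimal sketch: `dominant_share` is a hypothetical helper, the variant list must be supplied by hand, and only a trailing plural `s` is folded into each term.

```python
import re
from collections import Counter


def dominant_share(pages, variants):
    """Return (most_used_variant, its share of all variant mentions)."""
    counts = Counter()
    for text in pages:
        for variant in variants:
            pattern = rf"\b{re.escape(variant)}s?\b"
            counts[variant] += len(re.findall(pattern, text, re.IGNORECASE))
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    term, hits = counts.most_common(1)[0]
    return term, hits / total
```

A share at or above 0.8 for one variant corresponds to the score of 2 in the Scoring line; anything lower suggests the docs mix terms.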

D6

Deprecated / beta endpoints marked in plain text

A developer pasting your code sample from 2022 into a 2026 project shouldn't discover the endpoint is deprecated at runtime. Explicit `deprecated` / `beta` / `sunset` markers in the docs save migration headaches.

Scoring. 2 = `deprecated` in OpenAPI spec OR in ≥2 sample pages near endpoint headings · 1 = beta/experimental keywords found but no deprecation · 0 = none.

Fix. Mark each deprecated endpoint with `deprecated: true` in OpenAPI AND a visible badge or admonition in the HTML docs (Mintlify's `<Warning>`, Docusaurus's admonition syntax, etc.). Same for beta endpoints — visible in the prose, not just in the spec.

Evidence. https://docs.musixmatch.com/overview — no pages available

/rubric#d6 · article

error

0/2

E1

Content visible in plain HTML without JavaScript (gating)

This is the GATING criterion. If your docs only render after JavaScript runs (single-page-app shell), agents that fetch raw HTML see nothing. Web crawlers, scrapers, curl, and most AI fetchers don't run JS.

Scoring. 6 = body text >500 chars on at least one of homepage or 2 sub-pages across UA modes · 3 = homepage passes but sub-pages SPA · 0 = SPA shell everywhere, or VK-trap (3+ URLs return identical body).

Fix. Serve pre-rendered HTML at static URLs. If you use Next.js/Nuxt/SvelteKit, enable SSG or SSR for the docs section. Single-page-app shells (React SPA, Vue SPA without SSR) fail this gate and cascade-zero many other criteria.

Evidence. https://docs.musixmatch.com/overview — all UA probes failed to reach target

/rubric#e1 · article

error

0/6
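For E1, the ">500 chars of body text" gate can be reproduced on raw HTML without executing any JavaScript. A minimal sketch: `BodyText`, `visible_chars`, and `looks_like_spa_shell` are hypothetical names, and the check ignores CSS-hidden content and the multi-UA probing the audit performs.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Gathers text inside <body>, skipping script, style, and template."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth:
            self.parts.append(data)


def visible_chars(html):
    """Length of body text after collapsing runs of whitespace."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return len(" ".join("".join(parser.parts).split()))


def looks_like_spa_shell(html, threshold=500):
    """True when the raw HTML carries too little text to pass the gate."""
    return visible_chars(html) <= threshold
```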

E2

Stable URLs: 301 redirects preserve old paths

When you reorganise docs, old links shouldn't 404; they should 301 to the new URL. Stable URLs are how inbound links from blogs, Stack Overflow, and bookmarks survive your refactor.

Scoring. 2 = stable 301/308 redirect on ≥1 of 2 sampled URL-variants · 1 = canonical-alias pattern (200 with `<link rel=canonical>`) · 0 = 302 (temporary), 404, or no redirect.

Fix. When you change a doc URL, add a 301 redirect from the old path to the new one. Hosting platforms handle this via a `_redirects` file (Netlify) or the `redirects` array in `vercel.json` (Vercel).

Evidence. https://docs.musixmatch.com/overview — no sitemap samples available

/rubric#e2 · article

error

0/2
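For E2, the scoring rule reduces to a small decision over the probed URL variants. A minimal sketch: `e2_score` is a hypothetical helper that takes already-collected probe results, so it makes no network calls.

```python
def e2_score(samples):
    """Score redirect stability from probe results.

    samples: list of (status_code, has_canonical_link) tuples,
             one per sampled URL variant.
    """
    best = 0
    for status, has_canonical in samples:
        if status in (301, 308):
            best = max(best, 2)  # permanent redirect preserves the old path
        elif status == 200 and has_canonical:
            best = max(best, 1)  # canonical-alias pattern
        # 302, 404, and anything else contribute nothing
    return best
```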

E3

Explicit API version in URL path, heading, or OpenAPI spec

`/v1/users` vs `/v2/users` is the cheap way to do API versioning AND make it obvious to agents. Version metadata that's only in a header (and not in the path or docs heading) is invisible to crawlers.

Scoring. 2 = version in the docs' own URL structure (sitemap or OpenAPI-spec URL: `/v1/`, `/2024-01-15/`) · 1 = version in a documented API endpoint URL (curl/code examples), in OpenAPI `info.version`, or in an `<h1>`/`<h2>`/footer heading · 0 = none.

Fix. Prefix your API paths with `/v1/`, `/v2/` and show them in your curl/code examples; OR set a non-empty `info.version` in your OpenAPI spec. Date-versioned APIs (`/2024-01-15/users`) also count.

Evidence. https://docs.musixmatch.com/overview — no input source (homepage, sitemap, or OpenAPI URL)

/rubric#e3 · article

error

0/2
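For E3, both versioning styles named in the Fix (`/v1/` prefixes and date segments) can be recognised with one pattern. A minimal sketch: `path_version` and the exact regex are assumptions about what counts as a version segment, not the audit's own rule.

```python
import re

# Matches a whole path segment such as v1, v2.1, or 2024-01-15.
VERSION_SEGMENT = re.compile(r"/(v\d+(?:\.\d+)*|\d{4}-\d{2}-\d{2})(?=/|$)")


def path_version(url_path):
    """Return the first version-like path segment, or None if there is none."""
    match = VERSION_SEGMENT.search(url_path)
    return match.group(1) if match else None
```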

E4

Spot-check of 5 internal links → all return 200

Rotten internal links are the most common docs failure mode after years of churn. A 5-link spot-check catches the worst cases (4xx/5xx on links right on your homepage) without trying to crawl every link.

Scoring. 2 = 5/5 of sampled same-host links return 200 · 1 = 4/5 · 0 = ≤3/5, OR fewer than 5 distinct same-host links found on the homepage.

Fix. Run a link-checker as part of your CI (lychee, htmltest, linkinator). For the most-trafficked links on the homepage, fix any 404s before shipping. Five working links from the landing page is the bare minimum.

Evidence. https://docs.musixmatch.com/overview — homepage not parsed

/rubric#e4 · article

error

0/2
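For E4, the sampling and scoring steps can be separated from the HTTP fetches. A minimal sketch: `LinkCollector`, `same_host_links`, and `e4_score` are hypothetical names, the status codes are assumed to come from whatever HTTP client you already use, and a proper CI link-checker remains the better long-term fix.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_host_links(html, page_url, limit=5):
    """First `limit` distinct same-host links, fragments removed."""
    host = urlsplit(page_url).netloc
    collector = LinkCollector()
    collector.feed(html)
    seen = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(page_url, href)).url
        if (
            urlsplit(absolute).netloc == host
            and absolute != page_url
            and absolute not in seen
        ):
            seen.append(absolute)
    return seen[:limit]


def e4_score(statuses):
    """2 = 5/5 return 200, 1 = 4/5, 0 = fewer or under five links sampled."""
    if len(statuses) < 5:
        return 0
    ok = sum(1 for status in statuses[:5] if status == 200)
    return {5: 2, 4: 1}.get(ok, 0)
```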

E5

Usage terms: TOS / license / AI policy explicit

Without an explicit TOS or AI-usage policy, every LLM scraper has to guess your stance. Adding a substantive `/terms` or `/license` page, especially one with AI/ML keywords, makes the policy machine-readable.

Scoring. 2 = TOS page found AND contains AI/ML policy keywords in main content · 1 = TOS found (substantial but no AI keywords, or link present but page 404/sparse) · 0 = no TOS link.

Fix. Publish `/terms` (or `/legal`, `/license`) with at least 1000 chars of policy text. Include explicit language on AI scraping, model training, and automated access — even if you allow everything, saying so is the signal.

Evidence. https://docs.musixmatch.com/overview — no TOS probe reached target

/rubric#e5 · article

error

0/2

F1

Agentic discovery breadth: llms.txt, llms-full, MCP

An agent finds your site through several surfaces; F1 rewards exposing more than one: a reachable llms.txt, a full-content llms-full.txt feed, and an advertised MCP/agent endpoint. It complements A1/A2 (which judge each surface's quality) by counting how many an agent can discover.

Scoring. 2 = ≥2 of {reachable llms.txt, llms-full.txt exists, MCP/agent endpoint advertised} · 1 = one of those · 0 = none · error if the site couldn't be fetched.

Fix. Publish `/llms.txt` and `/llms-full.txt` at host root, and reference your MCP server or a `.well-known/` discovery endpoint inside `/llms.txt` so agents can find it.

Evidence. pre-fetch unreachable

/rubric#f1 · article

error

0/2

F2

WebMCP declarative tool forms, schema-valid

WebMCP lets a page expose callable tools to in-browser agents via declarative `<form toolname tooldescription>` markup. F2 rewards a present, schema-clean WebMCP surface so an agent can invoke the tools without guessing.

Scoring. 2 = WebMCP detected, 0 schema errors + 0 warnings · 1 = detected with schema issues (errors or warnings) · 0 = not detected · error if the homepage couldn't be scanned.

Fix. Add WebMCP declarative tool forms — each with `toolname` + `tooldescription`, and `name`+`toolparamdescription` on every input. Fix missing-toolname / required-param-no-name errors first; they're the hard failures.

Evidence. homepage unreachable or unscannable

/rubric#f2 · article

error

0/2

F3

MCP server advertised (RFC 9728 / 8414 OAuth)

An MCP server lets agents call your API as governed tools. F3 rewards advertising a discoverable, OAuth-protected MCP endpoint via the standard `.well-known` metadata, so an agent can authenticate and connect without bespoke setup.

Scoring. 3 = full oauth-mcp (RFC 9728 protected-resource + RFC 8414 auth-server metadata + PKCE S256) · 2 = partial · 1 = endpoint-only · 0 = none · error if the site was unreachable.

Fix. Serve `/.well-known/oauth-protected-resource` pointing at your MCP endpoint and a same-host RFC 8414 authorization-server metadata document advertising `S256` PKCE.

Evidence. site unreachable — MCP probe inconclusive

/rubric#f3 · article

error

0/3

F4

Agent accessibility: static name + ARIA validity

An AI agent operates a page through its accessibility tree, so it needs every button, link, input and image to carry a name it can target. Missing accessible names, invalid ARIA, and positive tabindex all break that, leaving controls an agent can see but can't reliably name or actuate.

Scoring. 3 = 0 violations (and ≥1 element to check) · 2 = 1–2 · 1 = 3–5 · 0 = ≥6 · not_applicable if the page has nothing nameable · error if the homepage couldn't be scanned. Static heuristic (no axe-core/headless): name + ARIA-validity rules only.

Fix. Give every `<button>`/`<a>`/icon an accessible name (text, `aria-label`, or a labelled child `<img alt>`); associate `<label>`s with inputs/selects; add `alt` to images and `<title>` to inline SVGs; drop positive `tabindex`; fix typo'd `aria-*` attributes and roles.

Evidence. homepage unreachable or unscannable

/rubric#f4 · article

error

0/3
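For F4, a few of the name rules can be checked statically in the same spirit as the audit's heuristic. A minimal sketch covering only three rules (unnamed `<a>`/`<button>`, `<img>` without `alt`, positive `tabindex`): `NameAudit` and `f4_score` are hypothetical names, and ARIA-validity, `<label>` association, and SVG `<title>` checks are left out.

```python
from html.parser import HTMLParser


class NameAudit(HTMLParser):
    """Flags unnamed links/buttons, images without alt, positive tabindex."""

    def __init__(self):
        super().__init__()
        self.violations = []
        self.checked = 0
        self._open = []  # stack of [tag, has_name] for open <a>/<button>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        tabindex = attrs.get("tabindex") or ""
        if tabindex.isdigit() and int(tabindex) > 0:
            self.violations.append(f"positive tabindex on <{tag}>")
        if tag == "img":
            self.checked += 1
            if attrs.get("alt"):
                # A labelled child image names the control that contains it.
                for entry in self._open:
                    entry[1] = True
            elif "alt" not in attrs:
                self.violations.append("<img> without alt")
        elif tag in ("a", "button"):
            self.checked += 1
            named = bool((attrs.get("aria-label") or "").strip())
            self._open.append([tag, named])

    def handle_data(self, data):
        if data.strip():
            for entry in self._open:
                entry[1] = True  # visible text gives the control a name

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, named = self._open.pop()
            if not named:
                self.violations.append(f"<{tag}> without accessible name")


def f4_score(html):
    """Map the violation count onto the 3..0 bands, or 'not_applicable'."""
    audit = NameAudit()
    audit.feed(html)
    audit.close()
    if audit.checked == 0:
        return "not_applicable"
    count = len(audit.violations)
    if count == 0:
        return 3
    if count <= 2:
        return 2
    return 1 if count <= 5 else 0
```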

30 criteria