Configuring llms.txt — LLM-LD

What is llms.txt?

llms.txt is a Markdown file, placed at the root of your AI subdomain (for example, ai.yoursite.com), that gives AI systems a concise summary of your website. Think of it as a business card and a map combined: it tells an AI crawler who you are, what your business does, and where to find the machine-readable resources it needs.

robots.txt tells crawlers where they can't go. sitemap.xml tells them what pages exist. llms.txt tells them who you are and where to start.

The llms.txt format has been adopted by thousands of businesses, from ecommerce stores to local dentists, to communicate their identity to AI systems. But implementations have diverged: each site invents its own section names and includes different metadata, so there is no consistency between sites.

The LLM-LD specification for llms.txt formalizes the business use case. It defines a predictable section structure, a metadata header block, and a defined vocabulary of sections so that an AI system that has seen one conforming file can parse any other.

📄

Full specification: This guide covers practical implementation. For the complete formal specification including all conformance requirements, see the LLMs.txt 1.0 Specification.


Where it fits in the ecosystem

Your llms.txt is one of three files that work together to make your site discoverable by AI:

[Diagram: AI crawler arrives at your site. AI Crawler (ChatGPT, Claude, etc.) → AI Discovery Page (/ai-discovery) → llms.txt (plain-text orientation) → llm-index.json (full structured data)]

Each file serves a different purpose:

| File | Format | Purpose | Typical Size |
| --- | --- | --- | --- |
| ai-discovery | HTML | Bridge page with links to AI resources | 5–15 KB |
| llms.txt | Markdown | Quick orientation and resource pointers | 1–4 KB |
| llm-index.json | JSON-LD | Complete structured data about your site | 5–100 KB |

An AI system with a small context window might only fetch llms.txt. A full RAG pipeline will use it to discover llm-index.json and go deeper. Either way, llms.txt is the fast, lightweight entry point.


File structure at a glance

Every llms.txt following this specification has the same predictable structure. Here's the anatomy:

Required: H1 Heading + Header Block

Site name, version, last-updated date, and canonical URL. The file's identity.

Required: PURPOSE

Declares what this file is for and which AI use cases it supports.

Required: CANONICAL AUTHORITY

Names the one URL that is the authoritative source of truth for this business.

Required: START HERE (CRAWLING GUIDANCE)

The most important section. Points AI to your structured data files, ordered by priority.

Recommended: PRODUCTS & SERVICES

Lists your offerings with links to their AI-layer pages.

Recommended: ENTITY STATS

Quick quantitative summary: how many entities, products, services, and pages.

Optional: ABOUT · KEY FACTS · VERIFICATION · ACTIONS · FAQ

Additional context sections for richer AI understanding. Include what's relevant.

Required: CONTACT

Organization name and at least one contact method.


Minimal example

The simplest valid llms.txt file: the header block plus the four required sections. This is all you need to get started:

ai.yoursite.com/llms.txt
    # Acme Dental AI Optimization Layer

    Version: 1.0
    Last-Updated: 2026-02-07
    Canonical-Site: https://www.acmedental.com

    ---

    ## PURPOSE

    This is an AI-readable representation of Acme Dental content optimized for:

    - LLM crawlers
    - AI search systems
    - Agentic retrieval workflows
    - Entity-based knowledge extraction

    ---

    ## CANONICAL AUTHORITY

    The canonical source of truth is always:
    https://www.acmedental.com

    ---

    ## START HERE (CRAWLING GUIDANCE)

    ### 1. Full Site Intelligence (Recommended)
    https://ai.acmedental.com/llm-index.json

    ---

    ## CONTACT

    Acme Dental
    https://www.acmedental.com

    ---

    EOF

That's a valid file, well under 1 KB. Any AI system that fetches it immediately knows the business name, the canonical URL, and where to find the full structured data. You can add more sections later.
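If you would rather generate the minimal file than write it by hand, a short script is enough. The sketch below is only an illustration: the function name `render_minimal_llms_txt` and its parameters are invented for this guide, and the blank-line layout is an assumption modeled on the Acme Dental example, not something the specification text quoted here prescribes.

```python
from datetime import date


def render_minimal_llms_txt(name: str, canonical_site: str,
                            llm_index_url: str, last_updated: date) -> str:
    """Return a minimal llms.txt: header block plus the four required sections.

    Hypothetical helper; the wording copies the minimal example in this guide.
    """
    blocks = [
        # H1 heading followed by the three required header fields.
        f"# {name} AI Optimization Layer\n\n"
        "Version: 1.0\n"
        f"Last-Updated: {last_updated.isoformat()}\n"
        f"Canonical-Site: {canonical_site}",
        "## PURPOSE\n\n"
        f"This is an AI-readable representation of {name} content optimized for:\n\n"
        "- LLM crawlers\n"
        "- AI search systems\n"
        "- Agentic retrieval workflows\n"
        "- Entity-based knowledge extraction",
        "## CANONICAL AUTHORITY\n\n"
        f"The canonical source of truth is always:\n{canonical_site}",
        "## START HERE (CRAWLING GUIDANCE)\n\n"
        f"### 1. Full Site Intelligence (Recommended)\n{llm_index_url}",
        f"## CONTACT\n\n{name}\n{canonical_site}",
    ]
    # A horizontal rule separates the blocks; one more follows the last
    # section, and the EOF marker closes the file.
    return "\n\n---\n\n".join(blocks) + "\n\n---\n\nEOF\n"
```

Calling it with `"Acme Dental"`, `"https://www.acmedental.com"`, `"https://ai.acmedental.com/llm-index.json"`, and `date(2026, 2, 7)` yields a file equivalent to the example above. Write the result with `newline="\n"` so the line endings stay Unix-style.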


Complete example — SaaS company

Here's a fully loaded llms.txt with the recommended sections and most of the optional ones. This is what a mature implementation looks like:

ai.acmecorp.com/llms.txt
    # Acme Corp AI Optimization Layer

    Version: 1.0
    Last-Updated: 2026-02-07
    Primary-Language: en-US
    Canonical-Site: https://www.acmecorp.com
    AI-Mirror: https://ai.acmecorp.com

    ---

    ## PURPOSE

    This is an AI-readable representation of Acme Corp content optimized for:

    - LLM crawlers
    - AI search systems
    - Agentic retrieval workflows
    - Entity-based knowledge extraction

    ---

    ## CANONICAL AUTHORITY

    The canonical source of truth is always:
    https://www.acmecorp.com

    ---

    ## START HERE (CRAWLING GUIDANCE)

    ### 1. Full Site Intelligence (Recommended)
    https://ai.acmecorp.com/llm-index.json

    ### 2. Entity Index
    https://ai.acmecorp.com/entities.json

    ### 3. Knowledge Graph
    https://ai.acmecorp.com/knowledge.graph.jsonld

    ### 4. Sitemap
    https://ai.acmecorp.com/sitemap.xml

    ---

    ## ABOUT

    Acme Corp is a marketing technology company founded in 2020, headquartered in Miami, Florida. The company builds AI-powered tools for competitive intelligence, web analytics, and conversion optimization, serving over 2,500 business customers globally.

    ---

    ## PRODUCTS & SERVICES

    - SPYBOX: https://ai.acmecorp.com/products/spybox
    - Statilitix: https://ai.acmecorp.com/products/statilitix
    - SEO Consulting: https://ai.acmecorp.com/services/seo-consulting

    ---

    ## ENTITY STATS

    - Total Entities: 47
    - Products/Software: 3
    - Services: 5
    - Pages: 22

    ---

    ## KEY FACTS

    - Founded in 2020
    - Headquartered in Miami, FL
    - 3 SaaS products
    - 2,500+ customers
    - SOC 2 Type II certified

    ---

    ## ACTIONS

    - Start Free Trial: https://www.acmecorp.com/signup
    - Book a Demo: https://www.acmecorp.com/demo
    - Contact Sales: https://www.acmecorp.com/contact

    ---

    ## VERIFICATION

    Verified by: LLM Disco
    Directory Listing: https://llmdisco.com/sites/acmecorp
    Conformance: Level 3 (Agent-Ready)

    ---

    ## CONTACT

    Acme Corp
    info@acmecorp.com
    https://www.acmecorp.com

    ---

    EOF

Complete example — Local business

Local businesses benefit most from the KEY FACTS and FAQ sections. This is what a dental practice's file looks like:

ai.acmedental.com/llms.txt
    # Acme Dental AI Optimization Layer

    Version: 1.0
    Last-Updated: 2026-02-07
    Primary-Language: en-US
    Canonical-Site: https://www.acmedental.com
    AI-Mirror: https://ai.acmedental.com

    ---

    ## PURPOSE

    This is an AI-readable representation of Acme Dental content optimized for:

    - LLM crawlers
    - AI search systems
    - Agentic retrieval workflows
    - Entity-based knowledge extraction

    ---

    ## CANONICAL AUTHORITY

    The canonical source of truth is always:
    https://www.acmedental.com

    ---

    ## START HERE (CRAWLING GUIDANCE)

    ### 1. Full Site Intelligence (Recommended)
    https://ai.acmedental.com/llm-index.json

    ### 2. Entity Index
    https://ai.acmedental.com/entities.json

    ### 3. Knowledge Graph
    https://ai.acmedental.com/knowledge.graph.jsonld

    ### 4. Sitemap
    https://ai.acmedental.com/sitemap.xml

    ---

    ## PRODUCTS & SERVICES

    - Dental Cleaning: https://ai.acmedental.com/services/cleaning
    - Teeth Whitening: https://ai.acmedental.com/services/whitening
    - Dental Implants: https://ai.acmedental.com/services/implants

    ---

    ## ENTITY STATS

    - Total Entities: 12
    - Products/Software: 0
    - Services: 5
    - Pages: 7

    ---

    ## KEY FACTS

    - Serving Tampa since 2006
    - 4.8 star rating (127 reviews)
    - Same-day emergencies available
    - Most insurance accepted
    - Free parking on-site

    ---

    ## FAQ

    ### Do you accept my insurance?
    We accept most major dental insurance including Delta, Cigna, MetLife, and Aetna. Call us to verify your specific plan.

    ### Do you offer emergency appointments?
    Yes. We reserve time for same-day emergencies. Call us immediately if you are experiencing dental pain.

    ### What are your hours?
    Monday–Friday 8am–5pm, Saturday 9am–2pm. Closed Sunday.

    ---

    ## VERIFICATION

    Verified by: LLM Disco
    Directory Listing: https://llmdisco.com/sites/acmedental
    Conformance: Level 3 (Agent-Ready)

    ---

    ## CONTACT

    Acme Dental
    info@acmedental.com
    +1-813-555-1234
    https://www.acmedental.com

    ---

    EOF
💡

Why FAQ matters for local businesses: When someone asks an AI "does Acme Dental accept Delta insurance?" — the FAQ section gives the AI an instant, accurate answer without fetching any additional pages. This is the content that directly drives AI recommendations.


Section-by-section breakdown

The header block

Immediately after the H1 heading, include metadata as key-value pairs. Three fields are required:

| Field | Level | Example |
| --- | --- | --- |
| Version | Required | 1.0 |
| Last-Updated | Required | 2026-02-07 |
| Canonical-Site | Required | https://www.yoursite.com |
| Primary-Language | | en-US |
| AI-Mirror | | https://ai.yoursite.com |
| Generator | Optional | aso-generator/2.0 |
| Conformance-Level | Optional | 3 |
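To make the header block concrete, here is one way a consumer might read it. This is a rough sketch, not a reference parser: `parse_header_block` and `missing_required` are hypothetical names, and the code assumes the header block runs from the H1 heading to the first `---` line, as in the examples in this guide.

```python
REQUIRED_FIELDS = ("Version", "Last-Updated", "Canonical-Site")


def parse_header_block(text: str) -> dict[str, str]:
    """Collect 'Key: value' pairs between the H1 heading and the first '---'."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        raise ValueError("file must start with an H1 heading")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # the first horizontal rule ends the header block
            break
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = stripped.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_required(fields: dict[str, str]) -> list[str]:
    """Return the required header fields that are absent."""
    return [name for name in REQUIRED_FIELDS if name not in fields]
```

For the Acme Dental minimal file, `parse_header_block` returns the three required fields and `missing_required` returns an empty list.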

PURPOSE

Keep it simple. Name the business, list the AI use cases. The standard four use cases work for most sites:

  • LLM crawlers — bots building training or retrieval datasets
  • AI search systems — ChatGPT, Perplexity, Google AI Overviews
  • Agentic retrieval workflows — autonomous agents researching on behalf of users
  • Entity-based knowledge extraction — systems building knowledge graphs

Ecommerce sites should add "AI shopping assistants". Service businesses might add "AI appointment booking" if they support it.

CANONICAL AUTHORITY

One sentence. One URL. This tells AI systems: "if you see my content anywhere else, this is the real version." Critical for preventing AI from trusting scraped copies, cached versions, or third-party reproductions of your content.

START HERE (CRAWLING GUIDANCE)

This is the most important section in the file. It tells AI systems exactly where to go next, in priority order:

  1. llm-index.json — Your full structured data. Always first.
  2. entities.json — Structured entity data (products, services, people)
  3. knowledge.graph.jsonld — Entity relationships
  4. sitemap.xml — Full URL listing

You can add more resources: product feeds, pricing APIs, FAQ feeds — anything an AI system might need. The numbering tells the crawler what to fetch first.

⚠️

At minimum, include llm-index.json. The whole point of this section is to get the AI to your structured data. If you only list one resource, make it this one.
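From the crawler's side, reading this section amounts to pairing each numbered heading with the URL that follows it. The sketch below assumes the `### N. Label` plus URL-line layout used in the examples in this guide; `start_here_resources` is a made-up name, and real crawlers may parse the section differently.

```python
import re


def start_here_resources(text: str) -> list[tuple[int, str, str]]:
    """Return (priority, label, url) tuples from START HERE, lowest number first."""
    resources = []
    in_section = False
    pending = None  # (priority, label) still waiting for its URL line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Any H2 heading opens a new section; track whether it is START HERE.
            in_section = line.startswith("## START HERE")
            pending = None
            continue
        if not in_section:
            continue
        heading = re.match(r"###\s+(\d+)\.\s+(.*)", line)
        if heading:
            pending = (int(heading.group(1)), heading.group(2))
        elif pending and line.startswith(("http://", "https://")):
            resources.append((pending[0], pending[1], line))
            pending = None
    return sorted(resources)
```

Sorting by the number rather than by position means the fetch order follows the stated priority even if the entries appear out of sequence in the file.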

PRODUCTS & SERVICES

List each product or service with a link to its AI-layer page. If you have too many to list (ecommerce catalogs, for example), list your top items and add a fallback line:

Large catalog fallback
    ## PRODUCTS & SERVICES

    - Alpine Pro Tent (4-Person): https://ai.acmeoutdoors.com/products/alpine-pro-tent
    - TrailMaster Hiking Boots: https://ai.acmeoutdoors.com/products/trailmaster-boots

    See entities.json for full catalog (240+ products).
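If the section is generated from a catalog, the fallback line can be added automatically once the list is cut off. A minimal sketch, assuming the items arrive already sorted with the most important first; `render_products_section` and its `limit` parameter are illustrative names, and the cutoff of 10 is arbitrary.

```python
def render_products_section(items, limit=10, catalog_file="entities.json"):
    """Render PRODUCTS & SERVICES from (name, url) pairs, truncating long catalogs."""
    lines = ["## PRODUCTS & SERVICES", ""]
    lines += [f"- {name}: {url}" for name, url in items[:limit]]
    if len(items) > limit:
        # Fallback line pointing at the full catalog, as in the example above.
        lines += ["", f"See {catalog_file} for full catalog ({len(items)} products)."]
    return "\n".join(lines)
```

Catalogs at or below the limit are listed in full and get no fallback line.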

ENTITY STATS

A quick quantitative summary. This tells an AI system "how big is this site's data?" at a glance — useful for deciding whether to fetch the full entities.json or just work with what's in this file.
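As an illustration of that decision, a consumer could read the counts and compare them against a threshold of its own choosing. The sketch below assumes the `- Label: number` layout from the examples in this guide; both function names and the threshold of 10 are invented for the example.

```python
import re


def parse_entity_stats(text: str) -> dict[str, int]:
    """Read '- Label: number' lines from the ENTITY STATS section."""
    stats = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_section = line == "## ENTITY STATS"
            continue
        match = re.fullmatch(r"-\s+(.+?):\s+(\d+)", line)
        if in_section and match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def worth_fetching_entities(stats: dict[str, int], threshold: int = 10) -> bool:
    """Arbitrary policy: fetch entities.json only for sites above the threshold."""
    return stats.get("Total Entities", 0) > threshold
```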

KEY FACTS

Bullet list of your most important differentiators. This is where local businesses win. AI systems answering "best dentist in Tampa" need facts like ratings, years in business, and insurance acceptance — not marketing copy.

FAQ

Three to five questions maximum. These should be the questions AI systems are most likely to encounter from users asking about your business. Keep answers short and factual. The full FAQ belongs in llm-index.json.
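To show how little work an AI system needs to do with this section, here is a sketch that turns it into question and answer pairs. It assumes each question is an H3 heading followed by its answer text, as in the dental example; `parse_faq` is a hypothetical helper, not part of any published tooling.

```python
def parse_faq(text: str) -> list[tuple[str, str]]:
    """Return (question, answer) pairs from the FAQ section."""
    faq = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_section = line == "## FAQ"
        elif not in_section or not line or line == "---":
            continue
        elif line.startswith("### "):
            faq.append([line[4:], ""])  # new question, answer filled in below
        elif faq:
            # Join multi-line answers into a single string.
            faq[-1][1] = (faq[-1][1] + " " + line).strip()
    return [(question, answer) for question, answer in faq]
```

A generator could also use `len(parse_faq(text))` to warn when the section grows past five questions.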


Deployment

Where to put the file

Place llms.txt at the root of your AI subdomain:

File location
https://ai.yoursite.com/llms.txt

The file lives on the AI subdomain only — not on your main site. Your main site's robots.txt blocks AI crawlers (except for /ai-discovery), so an llms.txt there would be invisible to its intended audience. The AI Discovery Page is the bridge that sends crawlers to ai.yoursite.com, where they'll find llms.txt and everything else.

HTTP headers

Configure your server to return these headers:

Response headers
    Content-Type: text/plain; charset=utf-8
    Cache-Control: public, max-age=86400
    Access-Control-Allow-Origin: *
    X-Content-Type-Options: nosniff
🌐

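CORS matters. The Access-Control-Allow-Origin: * header lets scripts running on any origin read your file. Without it, browsers block cross-origin reads, so browser-based AI tools won't be able to use the file. Server-side crawlers are not subject to CORS, so the header costs them nothing.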
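How you set these headers depends on your web server. For local testing, the sketch below serves the file with the four headers using only the Python standard library. It is not a production setup: the handler class, the `serve` helper, the port, and the assumption that `llms.txt` sits in the working directory are all choices made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# The four response headers listed above.
LLMS_TXT_HEADERS = {
    "Content-Type": "text/plain; charset=utf-8",
    "Cache-Control": "public, max-age=86400",
    "Access-Control-Allow-Origin": "*",
    "X-Content-Type-Options": "nosniff",
}


class LlmsTxtHandler(BaseHTTPRequestHandler):
    llms_path = Path("llms.txt")  # assumed location of the file on disk

    def do_GET(self):
        if self.path != "/llms.txt":
            self.send_error(404)
            return
        body = self.llms_path.read_bytes()
        self.send_response(200)
        for name, value in LLMS_TXT_HEADERS.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve llms.txt on localhost until interrupted (local testing only)."""
    HTTPServer(("127.0.0.1", port), LlmsTxtHandler).serve_forever()
```

Run `serve()` and request `http://127.0.0.1:8000/llms.txt` to inspect the response headers before configuring the real server.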

Connect it to your AI Discovery Page

Add a <link> element to your ADP's <head>:

In your AI Discovery Page <head>
<link rel="ai-manifest" type="text/plain" href="https://ai.yoursite.com/llms.txt" />
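A consumer that lands on the AI Discovery Page can find the file by scanning for that link relation. The sketch below uses Python's built-in `html.parser`; the class and function names are invented for this example, and whether a given crawler actually honors `rel="ai-manifest"` is outside what this guide can confirm.

```python
from html.parser import HTMLParser


class ManifestLinkFinder(HTMLParser):
    """Collect href values from <link rel="ai-manifest"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        if "ai-manifest" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_llms_txt(html: str) -> list[str]:
    """Return every llms.txt URL advertised in the page."""
    finder = ManifestLinkFinder()
    finder.feed(html)
    return finder.hrefs
```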

Reference from robots.txt (optional)

You can add a pointer in your AI subdomain's robots.txt:

ai.yoursite.com/robots.txt
    # AI Manifest
    LLMs-Txt: https://ai.yoursite.com/llms.txt
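A crawler that wants to use this pointer has to look for the line itself, since `LLMs-Txt` is not one of the standard robots.txt directives. A minimal sketch of that lookup, with a made-up function name:

```python
def llms_txt_pointers(robots_txt: str) -> list[str]:
    """Return the URLs given on LLMs-Txt lines in a robots.txt file."""
    urls = []
    for raw in robots_txt.splitlines():
        # Drop comments. Note this would also cut a URL containing '#'.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "llms-txt":
            urls.append(value.strip())
    return urls
```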

Formatting rules

A few rules to keep your file valid and machine-parseable:

  • UTF-8 encoding, no BOM. Most modern text editors default to this; just don't change it.
  • Unix line endings (\n). No carriage returns.
  • Sections separated by --- (horizontal rules). One before each section, one after the last.
  • Sections in order. PURPOSE → CANONICAL AUTHORITY → START HERE → (optional sections) → CONTACT. Always.
  • End with EOF. This lets AI systems confirm they received the complete file.
  • No HTML. Plain Markdown only. HTML defeats the purpose.
  • Under 4 KB ideally. Hard limit is 20 KB. If you're over, trim PRODUCTS & SERVICES and use the fallback pattern.
⚠️

Common mistake: Forgetting the EOF marker. Without it, an AI system can't tell if the file was truncated during download. Always end with --- followed by EOF on its own line.
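Most of these rules can be checked mechanically. The following is a rough sketch of such a check, not an official validator: `check_format` is a hypothetical function, the HTML test is a crude tag pattern, and the order check only looks at the four required sections relative to one another.

```python
import re

REQUIRED_ORDER = ["PURPOSE", "CANONICAL AUTHORITY", "START HERE", "CONTACT"]


def check_format(data: bytes) -> list[str]:
    """Return a list of formatting problems found in the raw file bytes."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
    if b"\r" in data:
        problems.append("file contains carriage returns")
    if len(data) > 20 * 1024:
        problems.append("file exceeds the 20 KB hard limit")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines[-2:] != ["---", "EOF"]:
        problems.append("file must end with '---' followed by 'EOF'")
    if re.search(r"<[A-Za-z][^>]*>", text):
        problems.append("file appears to contain HTML tags")
    # Compare the positions of the required H2 headings with the required order.
    headings = [ln[3:] for ln in lines if ln.startswith("## ")]
    positions = []
    for name in REQUIRED_ORDER:
        index = next((i for i, h in enumerate(headings) if h.startswith(name)), None)
        if index is None:
            problems.append(f"missing required section: {name}")
        else:
            positions.append(index)
    if positions != sorted(positions):
        problems.append("required sections are out of order")
    return problems
```

Pass it the raw bytes, for example `check_format(Path("llms.txt").read_bytes())` with `pathlib.Path`, so that a BOM or stray carriage returns are not hidden by text-mode decoding. An empty list means none of these checks failed.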


Testing your file

After deploying, verify these points:

  1. Visit ai.yoursite.com/llms.txt in a browser — it should render as plain text
  2. Check the H1 heading matches your business name
  3. Verify all URLs in START HERE are reachable (click each one)
  4. Confirm the Canonical-Site header matches the URL in CANONICAL AUTHORITY
  5. Check the file ends with EOF
  6. Inspect response headers — confirm Content-Type is text/plain and CORS is set
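Points 4 and 6 are easy to script. The sketch below is one possible approach rather than an official tool: all function names are invented here, `fetch` uses the standard library's `urlopen`, and the checks cover only the canonical URL match, the Content-Type, and the CORS header.

```python
import re
from urllib.request import urlopen


def header_field(text: str, name: str):
    """Return the value of a 'Name: value' line, or None if absent."""
    match = re.search(rf"^{re.escape(name)}:[ \t]*(\S+)[ \t]*$", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def canonical_authority_url(text: str):
    """Return the first URL found inside the CANONICAL AUTHORITY section."""
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_section = line == "## CANONICAL AUTHORITY"
        elif in_section:
            match = re.search(r"https?://\S+", line)
            if match:
                return match.group(0)
    return None


def check_deployment(text: str, response_headers: dict) -> list[str]:
    """Return the issues found in the file body and its response headers."""
    issues = []
    site = header_field(text, "Canonical-Site")
    authority = canonical_authority_url(text)
    # Ignore a trailing slash when comparing the two URLs.
    if not site or not authority or site.rstrip("/") != authority.rstrip("/"):
        issues.append("Canonical-Site does not match the CANONICAL AUTHORITY URL")
    headers = {key.lower(): value for key, value in response_headers.items()}
    if not headers.get("content-type", "").lower().startswith("text/plain"):
        issues.append("Content-Type is not text/plain")
    if headers.get("access-control-allow-origin") != "*":
        issues.append("Access-Control-Allow-Origin is not *")
    return issues


def fetch(url: str):
    """Download a URL and return (body text, response headers as a dict)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8"), dict(response.headers.items())
```

A typical run would be `check_deployment(*fetch("https://ai.yoursite.com/llms.txt"))`, which returns an empty list when all three checks pass.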
🛠️

Tools coming soon. We're building generators and validators for llms.txt and the full LLM-LD stack. Check the Tools page for updates.
