Configuring llms.txt
The plain-text manifest that tells AI systems who you are, what you offer, and where to find your structured data. One file, instant orientation.
What is llms.txt?
llms.txt is a Markdown file, placed at the root of your AI subdomain, that gives AI systems a concise summary of your website. Think of it as a business card and a map combined: it tells an AI crawler who you are, what your business does, and where to find the machine-readable resources it needs.
robots.txt tells crawlers where they can't go. sitemap.xml tells them what pages exist. llms.txt tells them who you are and where to start.
The llms.txt format has been adopted by thousands of businesses — from ecommerce stores to local dentists — to communicate their identity to AI systems. But everyone's been doing it differently: inventing their own section names, including different metadata, with no consistency between sites.
The LLM-LD specification for llms.txt formalizes the business use case. It defines a predictable section structure, a metadata header block, and a defined vocabulary of sections so that an AI system that has seen one conforming file can parse any other.
Full specification: This guide covers practical implementation. For the complete formal specification including all conformance requirements, see the LLMs.txt 1.0 Specification.
Where it fits in the ecosystem
Your llms.txt is one of three files that work together to make your site discoverable by AI:
AI Crawler (ChatGPT, Claude, etc.) → AI Discovery Page (/ai-discovery) → llms.txt (plain-text orientation) → llm-index.json (full structured data)
Each file serves a different purpose:
| File | Format | Purpose | Typical Size |
|---|---|---|---|
| ai-discovery | HTML | Bridge page with links to AI resources | 5–15 KB |
| llms.txt | Markdown | Quick orientation and resource pointers | 1–4 KB |
| llm-index.json | JSON-LD | Complete structured data about your site | 5–100 KB |
An AI system with a small context window might only fetch llms.txt. A full RAG pipeline will use it to discover llm-index.json and go deeper. Either way, llms.txt is the fast, lightweight entry point.
File structure at a glance
Every llms.txt following this specification has the same predictable structure. Here's the anatomy:
- Required · H1 heading + header block: Site name, version, last-updated date, and canonical URL. The file's identity.
- Required · PURPOSE: Declares what this file is for and which AI use cases it supports.
- Required · CANONICAL AUTHORITY: Names the one URL that is the authoritative source of truth for this business.
- Required · START HERE (CRAWLING GUIDANCE): The most important section. Points AI to your structured data files, ordered by priority.
- Recommended · PRODUCTS & SERVICES: Lists your offerings with links to their AI-layer pages.
- Recommended · ENTITY STATS: Quick quantitative summary of how many entities, products, services, and pages you have.
- Optional · ABOUT · KEY FACTS · VERIFICATION · ACTIONS · FAQ: Additional context sections for richer AI understanding. Include what's relevant.
- Required · CONTACT: Organization name and at least one contact method.
Minimal example
The simplest valid llms.txt file. Just the four required sections — this is all you need to get started:
That's a valid file. Under 500 bytes. Any AI system that fetches this instantly knows: the business name, the canonical URL, and where to find the full structured data. You can add more sections later.
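To illustrate that minimal file, here is a Python sketch that assembles one. It assumes sections start with `## NAME` headings and that header fields are written as `Key: value` lines. This guide doesn't spell out either syntax at this point, so treat the exact layout and the placeholder business details as hypothetical.

```python
from datetime import date


def build_minimal_llms_txt(site_name: str, canonical_url: str, index_url: str,
                           contact_email: str, last_updated: date) -> str:
    """Return a minimal llms.txt: H1, header block, and the four required sections.

    Assumption: '## NAME' section headings and 'Key: value' header lines.
    """
    lines = [
        f"# {site_name}",
        "Version: 1.0",
        f"Last-Updated: {last_updated.isoformat()}",
        f"Canonical-Site: {canonical_url}",
        "",
    ]
    sections = [
        ("PURPOSE", f"This file helps AI systems understand {site_name}."),
        ("CANONICAL AUTHORITY",
         f"The authoritative source for {site_name} is {canonical_url}"),
        ("START HERE (CRAWLING GUIDANCE)", f"1. {index_url}"),
        ("CONTACT", f"{site_name}\nEmail: {contact_email}"),
    ]
    for title, body in sections:
        # One '---' rule before each section. The blank line that precedes it
        # keeps Markdown from reading '---' as a setext heading underline.
        lines += ["---", f"## {title}", body, ""]
    # One '---' after the last section, then the EOF marker on its own line.
    lines += ["---", "EOF"]
    return "\n".join(lines) + "\n"  # Unix line endings only


text = build_minimal_llms_txt(
    "Acme Dental", "https://www.acmedental.com",
    "https://ai.acmedental.com/llm-index.json", "hello@acmedental.com",
    date(2026, 2, 7))
print(len(text.encode("utf-8")) < 500)  # True
```

Generating the file from a single function keeps the section order and the closing `---`/`EOF` pair consistent every time you regenerate it.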
Complete example — SaaS company
Here's a fully loaded llms.txt with all recommended and optional sections. This is what a mature implementation looks like:
Complete example — Local business
Local businesses benefit most from the KEY FACTS and FAQ sections. This is what a dental practice's file looks like:
Why FAQ matters for local businesses: When someone asks an AI "does Acme Dental accept Delta insurance?" — the FAQ section gives the AI an instant, accurate answer without fetching any additional pages. This is the content that directly drives AI recommendations.
Section-by-section breakdown
The header block
Immediately after the H1 heading, include metadata as key-value pairs. Three fields are required:
| Field | Level | Example |
|---|---|---|
| Version | Required | 1.0 |
| Last-Updated | Required | 2026-02-07 |
| Canonical-Site | Required | https://www.yoursite.com |
| Primary-Language | Recommended | en-US |
| AI-Mirror | Recommended | https://ai.yoursite.com |
| Generator | Optional | aso-generator/2.0 |
| Conformance-Level | Optional | 3 |
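A consumer of the file, or your own CI check, can read the header block with a few lines of Python. This sketch assumes each field sits on its own `Key: value` line under the H1 and that the block ends at the first blank line or `---` rule. The function name `parse_header_block` is hypothetical.

```python
from datetime import date

REQUIRED_FIELDS = ("Version", "Last-Updated", "Canonical-Site")


def parse_header_block(text: str) -> dict[str, str]:
    """Read the 'Key: value' lines under the H1 (layout assumed)."""
    lines = text.split("\n")
    if not lines[0].startswith("# "):
        raise ValueError("file must start with an H1 heading")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped:
            if fields:
                break      # a blank line after the fields ends the block
            continue       # tolerate blank lines right after the H1
        if stripped == "---":
            break          # the first section rule also ends the block
        # partition() splits at the first ':' so URL values stay intact.
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"not a 'Key: value' line: {stripped!r}")
        fields[key.strip()] = value.strip()
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing required header fields: " + ", ".join(missing))
    date.fromisoformat(fields["Last-Updated"])  # ValueError if not an ISO 8601 date
    return fields
```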
PURPOSE
Keep it simple. Name the business, list the AI use cases. The standard four use cases work for most sites:
- LLM crawlers — bots building training or retrieval datasets
- AI search systems — ChatGPT, Perplexity, Google AI Overviews
- Agentic retrieval workflows — autonomous agents researching on behalf of users
- Entity-based knowledge extraction — systems building knowledge graphs
Ecommerce sites should add "AI shopping assistants". Service businesses might add "AI appointment booking" if they support it.
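If you generate the file, you can build the PURPOSE section from the standard list plus any extras. This is a minimal sketch. The `## PURPOSE` heading and the intro sentence are assumed wording, and the use-case names are shortened from the list above.

```python
# The four standard use cases listed above (names only).
STANDARD_USE_CASES = (
    "LLM crawlers",
    "AI search systems",
    "Agentic retrieval workflows",
    "Entity-based knowledge extraction",
)


def render_purpose(business: str, extra_use_cases: tuple[str, ...] = ()) -> str:
    """Render a PURPOSE section; heading syntax and intro wording are assumptions."""
    use_cases = STANDARD_USE_CASES + tuple(extra_use_cases)
    lines = ["## PURPOSE",
             f"This file gives AI systems an overview of {business}. Supported use cases:"]
    lines += [f"- {use_case}" for use_case in use_cases]
    return "\n".join(lines) + "\n"


print(render_purpose("Acme Store", ("AI shopping assistants",)))
```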
CANONICAL AUTHORITY
One sentence. One URL. This tells AI systems: "if you see my content anywhere else, this is the real version." Critical for preventing AI from trusting scraped copies, cached versions, or third-party reproductions of your content.
START HERE (CRAWLING GUIDANCE)
This is the most important section in the file. It tells AI systems exactly where to go next, in priority order:
- llm-index.json — Your full structured data. Always first.
- entities.json — Structured entity data (products, services, people)
- knowledge.graph.jsonld — Entity relationships
- sitemap.xml — Full URL listing
You can add more resources: product feeds, pricing APIs, FAQ feeds — anything an AI system might need. The numbering tells the crawler what to fetch first.
At minimum, include llm-index.json. The whole point of this section is to get the AI to your structured data. If you only list one resource, make it this one.
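A generator can enforce this ordering rule. The sketch below always puts llm-index.json first, even if the caller leaves it out, and numbers the remaining resources in the order given. The function name and the `## ` heading syntax are assumptions.

```python
def render_start_here(base_url: str, resources: list[str]) -> str:
    """Render START HERE with llm-index.json always at position 1."""
    ordered = ["llm-index.json"] + [r for r in resources if r != "llm-index.json"]
    base = base_url.rstrip("/")
    lines = ["## START HERE (CRAWLING GUIDANCE)"]
    # The number is the fetch priority for the crawler.
    lines += [f"{i}. {base}/{name}" for i, name in enumerate(ordered, start=1)]
    return "\n".join(lines) + "\n"


print(render_start_here("https://ai.yoursite.com/",
                        ["entities.json", "knowledge.graph.jsonld", "sitemap.xml"]))
```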
PRODUCTS & SERVICES
List each product or service with a link to its AI-layer page. If you have too many to list (ecommerce catalogs, for example), list your top items and add a fallback line:
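One way to apply the fallback pattern in a generator is to list only the first N items and then point to the full catalog. The fallback wording, the default limit of 10, and the catalog URL below are illustrative assumptions, not values from the specification.

```python
def render_products(items: list[tuple[str, str]], catalog_url: str,
                    limit: int = 10) -> str:
    """items: (name, AI-layer page URL) pairs, most important first."""
    lines = ["## PRODUCTS & SERVICES"]
    lines += [f"- {name}: {url}" for name, url in items[:limit]]
    if len(items) > limit:
        # Hypothetical fallback line pointing to the complete catalog.
        lines.append(f"- Full catalog ({len(items)} items): {catalog_url}")
    return "\n".join(lines) + "\n"


catalog = [(f"Item {i}", f"https://ai.example.com/products/{i}") for i in range(25)]
print(render_products(catalog, "https://ai.example.com/llm-index.json"))
```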
ENTITY STATS
A quick quantitative summary. This tells an AI system "how big is this site's data?" at a glance — useful for deciding whether to fetch the full entities.json or just work with what's in this file.
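If your entities already live in a structured file, you can compute the stats instead of maintaining them by hand. This sketch assumes each entity is a dict with a string `@type` key in JSON-LD style (the real entities.json layout may differ) and uses hypothetical bullet labels.

```python
from collections import Counter


def render_entity_stats(entities: list[dict], page_count: int) -> str:
    """Summarize entity counts by type.

    Assumes each entity carries a single string '@type'; list-valued
    types would need flattening first.
    """
    by_type = Counter(entity.get("@type", "Unknown") for entity in entities)
    lines = ["## ENTITY STATS", f"- Entities: {len(entities)}"]
    lines += [f"- {type_name}: {count}" for type_name, count in sorted(by_type.items())]
    lines.append(f"- Pages: {page_count}")
    return "\n".join(lines) + "\n"


print(render_entity_stats([{"@type": "Product"}, {"@type": "Product"},
                           {"@type": "Service"}], page_count=12))
```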
KEY FACTS
Bullet list of your most important differentiators. This is where local businesses win. AI systems answering "best dentist in Tampa" need facts like ratings, years in business, and insurance acceptance — not marketing copy.
FAQ
Three to five questions maximum. These should be the questions AI systems are most likely to encounter from users asking about your business. Keep answers short and factual. The full FAQ belongs in llm-index.json.
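A generator can also enforce the size guidance for this section. The sketch below caps the section at five questions and uses a `Q:`/`A:` layout, which is an assumption rather than something this guide prescribes. The sample question mirrors the dental example above, and the answers are placeholders.

```python
def render_faq(pairs: list[tuple[str, str]], max_questions: int = 5) -> str:
    """Render a short FAQ section; the Q:/A: layout is assumed."""
    if len(pairs) > max_questions:
        raise ValueError(f"{len(pairs)} questions: keep llms.txt to "
                         f"{max_questions} and move the rest to llm-index.json")
    lines = ["## FAQ"]
    for question, answer in pairs:
        lines += [f"Q: {question}", f"A: {answer}", ""]
    # Drop the trailing blank line, then end with a single newline.
    return "\n".join(lines).rstrip("\n") + "\n"


print(render_faq([
    ("Does Acme Dental accept Delta insurance?", "Placeholder answer."),
    ("Do you see patients on Saturdays?", "Placeholder answer."),
    ("Do you offer emergency appointments?", "Placeholder answer."),
]))
```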
Deployment
Where to put the file
Place llms.txt at the root of your AI subdomain, so that it is served from https://ai.yoursite.com/llms.txt.
The file lives on the AI subdomain only — not on your main site. Your main site's robots.txt blocks AI crawlers (except for /ai-discovery), so an llms.txt there would be invisible to its intended audience. The AI Discovery Page is the bridge that sends crawlers to ai.yoursite.com, where they'll find llms.txt and everything else.
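Because the file lives on the AI subdomain, you can derive its URL from your canonical site. This sketch assumes the `ai.` prefix and the "drop `www.`" convention used in this guide's examples (`https://www.yoursite.com` → `https://ai.yoursite.com`). If your AI subdomain is named differently, pass another prefix or set the URL directly.

```python
from urllib.parse import urlsplit


def llms_txt_url(canonical_site: str, ai_prefix: str = "ai") -> str:
    """Derive the llms.txt URL on the AI subdomain (naming convention assumed)."""
    host = urlsplit(canonical_site).hostname
    if not host:
        raise ValueError(f"no hostname in {canonical_site!r}")
    if host.startswith("www."):
        host = host[len("www."):]
    return f"https://{ai_prefix}.{host}/llms.txt"


print(llms_txt_url("https://www.yoursite.com"))  # https://ai.yoursite.com/llms.txt
```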
HTTP headers
Configure your server to return at least these headers: `Content-Type: text/plain; charset=utf-8` and `Access-Control-Allow-Origin: *`.
CORS matters. The Access-Control-Allow-Origin: * header lets browser-based AI tools read your file from any origin. Without it, the browser blocks those cross-origin reads. Server-side crawlers don't enforce CORS, so they can fetch the file either way.
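In production, these headers usually come from your web server or CDN configuration. For a quick local check, a minimal Python standard-library server that returns both headers might look like the sketch below. The port, the placeholder body, and the local file path are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import urlsplit


class LlmsTxtHandler(BaseHTTPRequestHandler):
    # Placeholder body; the __main__ block below loads the real file instead.
    body: bytes = b"# Example\n\n---\nEOF\n"

    def do_GET(self) -> None:
        # Ignore any query string when matching the path.
        if urlsplit(self.path).path != "/llms.txt":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # Let browser-based agents on any origin read the file.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)


if __name__ == "__main__":
    # Hypothetical local path; port 8000 is arbitrary.
    LlmsTxtHandler.body = Path("llms.txt").read_bytes()
    HTTPServer(("127.0.0.1", 8000), LlmsTxtHandler).serve_forever()
```

Once this is running, `curl -i http://127.0.0.1:8000/llms.txt` should show both headers in the response.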
Connect it to your AI Discovery Page
Add a <link> element to the <head> of your AI Discovery Page:
Reference from robots.txt (optional)
You can add a pointer in your AI subdomain's robots.txt:
Formatting rules
A few rules to keep your file valid and machine-parseable:
- UTF-8 encoding, no BOM. Most editors save UTF-8 without a BOM by default. If yours offers a "UTF-8 with BOM" option, don't use it.
- Unix line endings (`\n`). No carriage returns.
- Sections separated by `---` (horizontal rules). One before each section, one after the last.
- Sections in order. PURPOSE → CANONICAL AUTHORITY → START HERE → (optional sections) → CONTACT. Always.
- End with `EOF`. This lets AI systems confirm they received the complete file.
- No HTML. Plain Markdown only. HTML defeats the purpose.
- Under 4 KB ideally. Hard limit is 20 KB. If you're over, trim PRODUCTS & SERVICES and use the fallback pattern.
Common mistake: Forgetting the EOF marker. Without it, an AI system can't tell if the file was truncated during download. Always end with --- followed by EOF on its own line.
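These rules are mechanical enough to check in code. Here is a sketch of a validator. It assumes sections start with `## NAME` headings placed directly under a `---` rule, and its HTML check is a simple regex heuristic rather than a full parser. The function and constant names are hypothetical.

```python
import re

# Assumed layout: each section starts with "## NAME" directly under a "---" rule.
REQUIRED_ORDER = ["PURPOSE", "CANONICAL AUTHORITY",
                  "START HERE (CRAWLING GUIDANCE)", "CONTACT"]
TARGET_BYTES = 4 * 1024       # "under 4 KB ideally"
HARD_LIMIT_BYTES = 20 * 1024  # "hard limit is 20 KB"
# Heuristic tag matcher; skips Markdown autolinks like <https://...>.
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(\s[^<>]*)?/?>")


def check_llms_txt(raw: bytes) -> list[str]:
    """Return formatting problems; an empty list means none were found."""
    problems: list[str] = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 BOM")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["not valid UTF-8"]
    if "\r" in text:
        problems.append("contains carriage returns; use Unix (\\n) line endings")
    if len(raw) > HARD_LIMIT_BYTES:
        problems.append(f"{len(raw)} bytes exceeds the 20 KB hard limit")
    elif len(raw) > TARGET_BYTES:
        problems.append(f"{len(raw)} bytes is over the 4 KB target")
    if HTML_TAG.search(text):
        problems.append("contains HTML tags")

    lines = text.rstrip("\n").split("\n")
    if lines[-2:] != ["---", "EOF"]:
        problems.append("does not end with '---' followed by 'EOF'")

    headings: list[str] = []
    for i, line in enumerate(lines):
        if line.startswith("## "):
            name = line[3:].strip()
            headings.append(name)
            if i == 0 or lines[i - 1].strip() != "---":
                problems.append(f"section {name!r} is not preceded by '---'")

    for name in REQUIRED_ORDER:
        if name not in headings:
            problems.append(f"missing required section {name!r}")
    positions = [headings.index(n) for n in REQUIRED_ORDER if n in headings]
    if positions != sorted(positions):
        problems.append("required sections are out of order")
    if "CONTACT" in headings and headings[-1] != "CONTACT":
        problems.append("CONTACT must be the last section")
    return problems
```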
Testing your file
After deploying, verify these points:
- Visit `ai.yoursite.com/llms.txt` in a browser. It should render as plain text.
- Check that the H1 heading matches your business name.
- Verify all URLs in START HERE are reachable (click each one).
- Confirm the `Canonical-Site` header-block field matches the URL in CANONICAL AUTHORITY.
- Check that the file ends with `EOF`.
- Inspect the response headers: confirm `Content-Type` is `text/plain` and CORS is set.
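You can script most of these checks. The sketch below keeps the checks (`audit_llms_txt`, which works on headers and text you pass in) separate from the network calls, so the checks can run offline. It assumes the `## NAME` section syntax and counts any URL in CANONICAL AUTHORITY that equals Canonical-Site as a match. It uses HEAD requests for reachability, which some servers don't support. All function names and the placeholder URL are hypothetical.

```python
import re
import urllib.request

URL_RE = re.compile(r"https?://\S+")


def _section(text: str, name: str) -> str:
    """Return the body of the '## name' section (assumed heading syntax)."""
    body, inside = [], False
    for line in text.split("\n"):
        if line.startswith("## "):
            inside = line[3:].strip() == name
        elif line.strip() == "---":
            inside = False  # a rule closes the current section
        elif inside:
            body.append(line)
    return "\n".join(body)


def _urls(text: str) -> list[str]:
    # Strip trailing slashes and punctuation such as a sentence-ending period.
    return [url.rstrip("/.,") for url in URL_RE.findall(text)]


def start_here_urls(text: str) -> list[str]:
    return _urls(_section(text, "START HERE (CRAWLING GUIDANCE)"))


def audit_llms_txt(headers, text: str, business_name: str) -> list[str]:
    """Offline checks; `headers` is any mapping with .get (e.g. resp.headers)."""
    problems = []
    if not headers.get("Content-Type", "").startswith("text/plain"):
        problems.append("Content-Type is not text/plain")
    if headers.get("Access-Control-Allow-Origin") is None:
        problems.append("Access-Control-Allow-Origin (CORS) header is missing")
    lines = text.rstrip("\n").split("\n")
    if lines[0].strip() != f"# {business_name}":
        problems.append("H1 heading does not match the business name")
    if lines[-1].strip() != "EOF":
        problems.append("file does not end with EOF")
    canonical = next((line.split(":", 1)[1].strip() for line in lines
                      if line.startswith("Canonical-Site:")), None)
    authority = _urls(_section(text, "CANONICAL AUTHORITY"))
    if canonical is None or canonical.rstrip("/") not in authority:
        problems.append("Canonical-Site does not match the URL in CANONICAL AUTHORITY")
    return problems


def unreachable(urls: list[str]) -> list[str]:
    """Return the URLs that fail a HEAD request (needs network access)."""
    failed = []
    for url in urls:
        try:
            request = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(request, timeout=10):
                pass
        except (OSError, ValueError):  # URLError and timeouts are OSErrors
            failed.append(url)
    return failed


if __name__ == "__main__":
    url = "https://ai.yoursite.com/llms.txt"  # placeholder from this guide
    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode("utf-8")
        print(audit_llms_txt(resp.headers, text, "Your Business"))
    print(unreachable(start_here_urls(text)))
```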
Tools coming soon. We're building generators and validators for llms.txt and the full LLM-LD stack. Check the Tools page for updates.
Need help getting set up? Our certified partners can handle the full implementation — llms.txt, AI subdomain, structured data, the works. Find a partner or contact us to get started.
Next: Build your llm-index.json
The structured data file that llms.txt points to. This is where the full detail lives.