LLMs.txt 1.0 — Plain-Text AI Manifest Specification
Plain-Text AI Manifest for Website Discovery and Orientation
Version: 1.0
Status: Draft Specification
Published: February 2026
Latest Version: https://llmld.org/spec/llms-txt/1.0
Companion to: LLM-LD 1.0 Specification, AI Discovery Page (ADP) 1.0 Specification
Editors: CAPXEL LLC
Abstract
LLMs.txt is a specification for a plain-text Markdown file that provides AI systems with a concise, human- and machine-readable orientation to a website. It serves as a lightweight manifest that identifies the site, declares its canonical authority, enumerates its machine-readable resources, summarizes its products and services, and provides contact information — all in a single file that can be consumed in a fraction of a context window.
This specification defines:
- A file format (llms.txt) using Markdown with a predictable section structure
- Required and optional sections with defined semantics
- A metadata header block for machine-parseable fields
- Placement and discovery rules consistent with the LLM-LD ecosystem
- The relationship between llms.txt and companion structured-data resources (llm-index.json, entities.json, knowledge.graph.jsonld)
This document is a companion to the LLM-LD 1.0 Specification and the AI Discovery Page (ADP) 1.0 Specification. Together, the three standards provide a complete discovery and ingestion framework: the ADP is an HTML bridge for crawlers arriving at the primary domain, llms.txt is a plain-text summary for inference-time retrieval and agent orientation, and llm-index.json is a full structured-data representation of the site.
Table of Contents
- Introduction
- Terminology
- Relationship to Existing Standards
- File Placement and Discovery
- Document Structure
- Header Block
- Required Sections
- Recommended Sections
- Optional Sections
- Formatting Rules
- Size and Performance Considerations
- Security Considerations
- IANA Considerations
- Examples
- Changelog
1. Introduction
1.1 Background
AI systems consume web content in fundamentally different ways than human readers or traditional search crawlers. Large language models operating at inference time face strict context-window constraints that make crawling an entire website impractical. Retrieval-augmented generation (RAG) pipelines benefit from a single, authoritative summary that can orient the model before it decides which deeper resources to fetch. Autonomous agents need a quick overview of available actions, resources, and contact channels before they begin navigating.
In September 2024, Jeremy Howard proposed a convention of placing a Markdown file at /llms.txt to help LLMs navigate website content (llmstxt.org). That proposal targeted developer documentation — helping coding assistants find their way around API references and library docs. Since then, the convention has been widely adopted: Anthropic, Perplexity, Zapier, and hundreds of other organizations publish llms.txt files, and tools like Yoast SEO, Mintlify, and numerous generators automate their creation.
Critically, the market has already extended the format well beyond its original documentation focus. SEO practitioners, ecommerce platforms, local businesses, and SaaS companies are using llms.txt to communicate business identity, products, services, locations, and contact information to AI systems — a practice broadly known as Generative Engine Optimization (GEO) or AI Search Optimization (ASO). However, these implementations are ad hoc and inconsistent: every site invents its own section names, includes different metadata, and structures its content differently. There is no defined vocabulary, no required sections, and no interoperability between implementations.
This specification formalizes the emerging practice. It defines a predictable section structure, a metadata header block, and a defined vocabulary of sections optimized for business websites — the use case the market has organically adopted but that the original llmstxt.org proposal did not address. It further integrates llms.txt into the LLM-LD ecosystem, connecting it to structured-data companions (llm-index.json, entities.json, knowledge.graph.jsonld) and the LLM Disco Directory.
1.2 Design Principles
Plain text first. The file MUST be readable by any system that can fetch a text file. No parsing libraries, JSON deserializers, or XML processors are required.
Orientation, not duplication. The file orients an AI system, telling it what the site is, where the deeper resources live, and what actions are available. It does not attempt to replicate the full content of llm-index.json or the site itself.
Predictable structure. Every llms.txt file following this specification uses the same section ordering with the same heading names. An AI system that has seen one conforming file can parse any other without additional instructions.
Formalizing practice. This specification standardizes patterns already in widespread organic use across the SEO, GEO, and ASO communities, rather than inventing new conventions. Where the market has converged on a pattern, this specification adopts it.
Minimal by default. A conforming file can be very short. Required sections establish identity and point to deeper resources. Everything else is optional enrichment.
Compatible with the LLM-LD ecosystem. The file is designed to work alongside llm-index.json, the AI Discovery Page, entities.json, and knowledge.graph.jsonld. It references these resources by URL, creating a web of discovery.
Backwards compatible with llmstxt.org. A file conforming to this specification is readable by any system that expects a generic llms.txt file per the llmstxt.org convention. The H1 heading, descriptive content, and linked URLs will be intelligible to any consumer.
1.3 Audience
This specification is intended for:
- Web developers generating or authoring llms.txt for business websites
- AI system developers consuming llms.txt at inference time or during crawling
- SEO, GEO, and ASO practitioners optimizing sites for AI search visibility
- Tool developers building generators, validators, and linters
- Directory operators (such as the LLM Disco Directory) integrating llms.txt into their indexing pipelines
1.4 Document Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
2. Terminology
- AI Layer
- The collection of machine-readable resources a website publishes for consumption by AI systems. This may live on a dedicated subdomain (e.g., ai.example.com), a well-known path, or the primary domain itself.
- AI Search Optimization (ASO)
- The practice of structuring website content and metadata for discovery and accurate representation by AI search systems, chatbots, and autonomous agents. Also known as Generative Engine Optimization (GEO) or Answer Engine Optimization (AEO).
- Canonical Site
- The primary human-facing website (e.g., https://www.example.com). The canonical site is the authoritative source of truth for all content.
- AI Mirror
- An optional AI-optimized subdomain (e.g., https://ai.example.com) that serves the same content as the canonical site with enhanced structured data, simplified markup, and permissive crawler policies.
- Manifest
- In this specification, the llms.txt file itself: a plain-text document that enumerates resources and provides orientation.
- Resource
- A machine-readable file published as part of the AI layer, such as llm-index.json, entities.json, sitemap.xml, or knowledge.graph.jsonld.
- Header Block
- The metadata key-value pairs that appear immediately after the H1 heading and before the first horizontal rule. These carry structured metadata in a plain-text-friendly format.
3. Relationship to Existing Standards
3.1 The llmstxt.org Convention
Jeremy Howard's llms.txt proposal (llmstxt.org) established the convention of placing a Markdown file at /llms.txt to help LLMs understand a website. The proposal defines a minimal structure: an H1 heading, an optional blockquote summary, free-form content, and H2 sections containing lists of links to detailed Markdown files. It includes an ## Optional section convention for lower-priority resources. Companion files llms-full.txt (full site content in one file) and per-page .md variants are also proposed.
The llmstxt.org convention was designed primarily for developer documentation and software projects. Its structure is intentionally loose — section names are free-form, no metadata is defined, and no companion structured-data resources are specified.
This specification addresses a different and complementary use case: business websites — local businesses, SaaS companies, ecommerce stores, agencies, and other organizations that need to communicate identity, offerings, and structured-data resources to AI systems. This is the use case the broader SEO/GEO community has organically adopted llms.txt for, but without a formal standard to ensure consistency and interoperability.
This specification extends the llmstxt.org convention with:
- A defined section vocabulary optimized for business websites
- A metadata header block with machine-parseable fields (version, canonical URL, language)
- Explicit resource enumeration pointing to structured-data companions (llm-index.json, entities.json, knowledge.graph.jsonld)
- Integration with the LLM-LD ecosystem (conformance levels, verification, directory membership)
Files conforming to this specification are backwards compatible with the llmstxt.org convention: any system that reads a generic llms.txt file will be able to consume the H1 heading, content, and linked URLs defined here.
3.2 LLM-LD 1.0
LLM-LD 1.0 defines llm-index.json — a comprehensive JSON-LD file containing structured data about a website. The llms.txt file acts as a lightweight companion:
| Concern | llm-index.json | llms.txt |
|---|---|---|
| Format | JSON-LD | Markdown (plain text) |
| Primary consumer | Programmatic pipelines, agents | Chat-based LLMs, RAG systems, human auditors |
| Depth | Complete structured data | Summary and resource pointers |
| Typical size | 5–100 KB | 0.5–5 KB |
| Parsing required | JSON parser | None (plain text) |
The two files SHOULD be generated together. llms.txt SHOULD reference llm-index.json by URL in the START HERE section.
3.3 AI Discovery Page (ADP)
The ADP is an HTML page on the primary domain that bridges crawlers from the human web to the AI layer. The ADP's <head> includes a <link rel="ai-manifest"> element that points to llms.txt:
<link rel="ai-manifest" type="text/plain" href="https://ai.example.com/llms.txt" />
The three standards form a discovery chain:
Crawler arrives at primary domain
│
▼
ADP (/ai.html) ← HTML bridge (ADP 1.0)
│
├──► llms.txt ← Plain-text orientation (this spec)
│
└──► llm-index.json ← Full structured data (LLM-LD 1.0)
3.4 robots.txt and sitemap.xml
llms.txt does not replace robots.txt or sitemap.xml. These files serve different purposes:
- robots.txt controls crawler access permissions
- sitemap.xml enumerates URLs for crawling
- llms.txt provides orientation and resource discovery for AI systems
llms.txt MAY be referenced from robots.txt using a custom directive (see Section 4.3).
3.5 Current Market Practice
As of early 2026, llms.txt is in widespread organic use across multiple communities:
- Developer documentation: Anthropic, Vercel, Stripe, and hundreds of software projects use llmstxt.org-style files to help coding assistants navigate API docs.
- SEO/GEO tools: Yoast SEO auto-generates llms.txt for WordPress sites. LLMrefs, Rankability, and other tools provide generators and validators.
- Ecommerce: BigCommerce, Shopify app developers, and DTC brands use llms.txt to surface product catalogs and pricing for AI shopping assistants.
- Local businesses: Dental practices, law firms, agencies, and other SMBs use llms.txt to establish business identity and service areas for AI search.
- Enterprise: Dell, Zapier, and other large organizations publish llms.txt with product feeds and support documentation.
This specification standardizes the patterns common across these implementations while defining the formal structure needed for interoperability, tooling, and ecosystem integration.
4. File Placement and Discovery
4.1 File Location
The llms.txt file MUST be served at the root path of the AI subdomain:
https://ai.example.com/llms.txt
The file MUST NOT be served on the primary domain. The primary domain's robots.txt blocks AI crawlers (except for the AI Discovery Page at /ai-discovery), so an llms.txt on the primary domain would be inaccessible to its intended consumers. The AI Discovery Page serves as the bridge from the primary domain to the AI subdomain, where llms.txt and all other machine-readable resources reside.
4.2 HTTP Requirements
The file MUST be served with:
- Content-Type: text/plain; charset=utf-8 or text/markdown; charset=utf-8
- Status Code: 200 OK
- Encoding: UTF-8 without BOM
The server SHOULD include the following response headers:
| Header | Value | Purpose |
|---|---|---|
| Cache-Control | public, max-age=86400 | Cache for 24 hours |
| Access-Control-Allow-Origin | * | Enable cross-origin fetches by AI agents |
| X-Content-Type-Options | nosniff | Prevent MIME type sniffing |
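The MUST-level requirements above can be checked mechanically once a response has been fetched. The following is a minimal sketch, not part of the specification; the function name `check_response` and its argument shapes are illustrative assumptions, and the fetch itself is left to the caller.

```python
def check_response(status: int, headers: dict, body: bytes) -> list:
    """Return a list of problems with an llms.txt HTTP response (empty if none)."""
    problems = []
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if status != 200:
        problems.append(f"status is {status}, expected 200")

    # Compare the media type and charset while ignoring case and spacing.
    content_type = lowered.get("content-type", "").lower().replace(" ", "")
    allowed = ("text/plain;charset=utf-8", "text/markdown;charset=utf-8")
    if content_type not in allowed:
        problems.append("Content-Type must be text/plain or text/markdown with charset=utf-8")

    if body.startswith(b"\xef\xbb\xbf"):
        problems.append("body starts with a UTF-8 byte-order mark")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")

    return problems
```

The SHOULD-level headers in the table (Cache-Control, Access-Control-Allow-Origin, X-Content-Type-Options) are deliberately not treated as errors in this sketch.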
4.3 Discovery Mechanisms
AI systems can discover llms.txt through any of the following mechanisms, listed in order of preference:
1. Direct URL convention
AI crawlers SHOULD check {domain}/llms.txt as a first-pass discovery step, mirroring the robots.txt convention. This is already standard practice across most AI crawlers and GEO tooling.
2. ADP link relation
The AI Discovery Page SHOULD include a <link> element pointing to llms.txt:
<link rel="ai-manifest" type="text/plain" href="https://ai.example.com/llms.txt" />
3. robots.txt reference
Implementations MAY include a reference in robots.txt:
# AI Manifest
LLMs-Txt: https://ai.example.com/llms.txt
4. LLM Disco Directory
The LLM Disco Directory maintains a registry of llms.txt URLs for all listed sites. AI systems can query the directory API to discover llms.txt for any registered domain.
5. HTTP Link header
Implementations MAY include an HTTP Link header on any page:
Link: <https://ai.example.com/llms.txt>; rel="ai-manifest"; type="text/plain"
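As a rough illustration of mechanisms 3 and 5, a consumer might extract the manifest URL from a robots.txt body or from a Link header value as sketched here. The function names are hypothetical, directive matching is assumed to be case-insensitive, and the Link header handling is a simplification that does not cover every valid header form.

```python
import re
from typing import Optional


def manifest_from_robots(robots_txt: str) -> Optional[str]:
    """Return the URL from the first LLMs-Txt directive, if any."""
    for line in robots_txt.splitlines():
        # Drop comments (simplification: a '#' inside a URL is also cut).
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "llms-txt":
            return value.strip()
    return None


def manifest_from_link_header(link_header: str) -> Optional[str]:
    """Return the target of the first link whose rel is ai-manifest, if any."""
    # Simplification: links are split on commas, so targets containing
    # commas are not handled.
    for part in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if match and re.search(r'rel\s*=\s*"?ai-manifest"?', match.group(2)):
            return match.group(1)
    return None
```

Per Section 4.4, a URL obtained this way should agree with the one advertised through every other mechanism.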
4.4 URL Canonicalization
The URL referenced in all discovery mechanisms MUST be consistent. Implementations MUST NOT advertise a URL different from the one at which the file is actually served. The canonical URL is always the AI subdomain URL (https://ai.example.com/llms.txt).
5. Document Structure
An llms.txt file is a UTF-8 plain-text document formatted as Markdown. It consists of:
- An H1 heading declaring the site identity
- A header block of key-value metadata
- A sequence of H2 sections separated by horizontal rules
- A terminal EOF marker
The overall structure:
# {Site Name} AI Optimization Layer ← H1 (required)
Version: 1.0 ← Header block (required)
Last-Updated: 2026-02-07
Primary-Language: en-US
Canonical-Site: https://www.example.com
AI-Mirror: https://ai.example.com
---
## PURPOSE ← Required section
...content...
---
## CANONICAL AUTHORITY ← Required section
...content...
---
## START HERE (CRAWLING GUIDANCE) ← Required section
...content...
---
## PRODUCTS & SERVICES ← Recommended section
...content...
---
## ENTITY STATS ← Recommended section
...content...
---
## CONTACT ← Required section
...content...
---
EOF
5.1 Section Ordering
Sections MUST appear in the order specified in this document. This predictable ordering allows AI systems to locate specific information by position without parsing section headers.
5.2 Section Separators
Each section MUST be preceded and followed by a Markdown horizontal rule (---). The horizontal rule after the header block serves as the separator before the first section.
5.3 EOF Marker
The file MUST end with the literal string EOF on its own line, preceded by a horizontal rule. This allows AI systems to confirm they have received the complete file.
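A consumer could confirm completeness with a check along these lines. This is a sketch under one assumption: blank lines between the final horizontal rule and the EOF line are tolerated, in keeping with the blank-line guidance in Section 10.4. The name `is_complete` is illustrative.

```python
def is_complete(text: str) -> bool:
    """True if the last two non-blank lines are a horizontal rule and EOF."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return len(lines) >= 2 and lines[-2] == "---" and lines[-1] == "EOF"
```

A truncated download fails this check because the final non-blank line is no longer the literal string EOF.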
6. Header Block
The header block appears immediately after the H1 heading and before the first horizontal rule. It consists of key-value pairs in Key: Value format, one per line. Blank lines within the header block are permitted for visual grouping.
6.1 Required Header Fields
| Field | Type | Description |
|---|---|---|
| Version | String | Specification version (MUST be "1.0" for this version) |
| Last-Updated | Date | ISO 8601 date (YYYY-MM-DD) of last generation or edit |
| Canonical-Site | URL | The primary human-facing website URL |
6.2 Recommended Header Fields
| Field | Type | Description |
|---|---|---|
| Primary-Language | String | BCP 47 language code (e.g., en-US) |
| AI-Mirror | URL | The AI-optimized subdomain URL, if one exists |
6.3 Optional Header Fields
| Field | Type | Description |
|---|---|---|
| Generator | String | Software that generated this file (e.g., aso-generator/2.0) |
| LLM-LD-Version | String | Version of LLM-LD specification the site conforms to |
| Conformance-Level | Integer | LLM-LD conformance level (1, 2, or 3) |
| Expires | Date | ISO 8601 date after which this file should be re-fetched |
| Directory-Listing | URL | URL of the site's LLM Disco Directory listing |
6.4 Example
# Acme Corp AI Optimization Layer
Version: 1.0
Last-Updated: 2026-02-07
Primary-Language: en-US
Canonical-Site: https://www.acmecorp.com
AI-Mirror: https://ai.acmecorp.com
6.5 Trailing Whitespace
Header field lines SHOULD end with two trailing spaces (Markdown line break) to ensure correct rendering when the file is displayed as rendered Markdown. Parsers MUST trim trailing whitespace when extracting values.
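The header block rules in this section could be implemented roughly as follows. This is a minimal sketch rather than a reference parser; `parse_header_block` and `missing_required` are hypothetical names, and keys are split at the first colon so that URL values stay intact.

```python
REQUIRED_HEADER_FIELDS = ("Version", "Last-Updated", "Canonical-Site")


def parse_header_block(text: str) -> dict:
    """Collect Key: Value pairs between the H1 heading and the first '---'."""
    fields = {}
    seen_h1 = False
    for raw in text.splitlines():
        line = raw.strip()  # also trims the two-space Markdown line break
        if not seen_h1:
            seen_h1 = line.startswith("# ")
            continue
        if line == "---":
            break
        if not line:
            continue  # blank lines are permitted for visual grouping
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_required(fields: dict) -> list:
    """List the required header fields that are absent."""
    return [name for name in REQUIRED_HEADER_FIELDS if name not in fields]
```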
7. Required Sections
The following sections MUST be present in every conforming llms.txt file.
7.1 PURPOSE
Heading: ## PURPOSE
Description: Declares the intent of the file and enumerates the AI use cases it supports.
Content requirements:
- MUST contain a sentence identifying the site by name
- MUST contain a list of supported AI use cases
Standard use-case list: Implementations SHOULD include the following use cases unless they are genuinely not applicable:
- LLM crawlers
- AI search systems
- Agentic retrieval workflows
- Entity-based knowledge extraction
Implementations MAY add additional use cases specific to their domain.
Example:
## PURPOSE
This is an AI-readable representation of Acme Corp content optimized for:
- LLM crawlers
- AI search systems
- Agentic retrieval workflows
- Entity-based knowledge extraction
7.2 CANONICAL AUTHORITY
Heading: ## CANONICAL AUTHORITY
Description: Declares which URL is the authoritative source of truth. This is critical for AI systems that must distinguish between the canonical site, the AI mirror, cached copies, and third-party reproductions.
Content requirements:
- MUST contain a single sentence naming the canonical URL
- The URL MUST match the Canonical-Site header field
Example:
## CANONICAL AUTHORITY
The canonical source of truth is always: https://www.acmecorp.com
7.3 START HERE (CRAWLING GUIDANCE)
Heading: ## START HERE (CRAWLING GUIDANCE)
Description: Enumerates the machine-readable resources available in the AI layer, ordered by recommended consumption priority. This is the most important section for AI agents: it tells them where to go next.
Content requirements:
- MUST contain at least one numbered subsection with a resource URL
- Each resource MUST be listed as an H3 heading (###) with a priority number and descriptive name
- The resource URL MUST appear on its own line immediately below the heading
Standard resource ordering:
| Priority | Resource | Description |
|---|---|---|
| 1 | Full Site Intelligence | llm-index.json — recommended first fetch |
| 2 | Entity Index | entities.json — structured entity data |
| 3 | Knowledge Graph | knowledge.graph.jsonld — entity relationships |
| 4 | Sitemap | sitemap.xml — full URL listing |
Implementations MUST include at least resource 1 (Full Site Intelligence). Resources 2–4 are RECOMMENDED if the corresponding files exist. Implementations MAY add additional resources (e.g., product feeds, FAQ feeds, pricing data).
Example:
## START HERE (CRAWLING GUIDANCE)
### 1. Full Site Intelligence (Recommended)
https://ai.acmecorp.com/llm-index.json
### 2. Entity Index
https://ai.acmecorp.com/entities.json
### 3. Knowledge Graph
https://ai.acmecorp.com/knowledge.graph.jsonld
### 4. Sitemap
https://ai.acmecorp.com/sitemap.xml
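An agent deciding what to fetch next might read this section as sketched here. The function name `parse_start_here` is an assumption, and the sketch is slightly more lenient than the rules above in that it skips blank lines between a heading and its URL.

```python
import re


def parse_start_here(text: str) -> list:
    """Return (priority, name, url) tuples for each H3 resource in START HERE."""
    resources = []
    in_section = False
    pending = None  # (priority, name) still waiting for its URL line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Any H2 heading starts a new section; track whether it is ours.
            in_section = line.startswith("## START HERE")
            pending = None
            continue
        if not in_section or not line:
            continue
        heading = re.match(r"###\s+(\d+)\.\s+(.*)", line)
        if heading:
            pending = (int(heading.group(1)), heading.group(2))
        elif pending and line.startswith("https://"):
            resources.append((pending[0], pending[1], line))
            pending = None
    return sorted(resources)  # lowest priority number first
```

Sorting by priority number means the first tuple is the recommended first fetch, normally llm-index.json.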
7.4 CONTACT
Heading: ## CONTACT
Description: Provides basic contact information for the organization.
Content requirements:
- MUST contain the organization name
- MUST contain at least one of: URL, email address, or phone number
- Each piece of contact information SHOULD appear on its own line
Example:
## CONTACT
Acme Corp
info@acmecorp.com
https://www.acmecorp.com
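A validator could confirm that the four required sections are present with a check like the one below. It is a sketch only: the relative order it enforces (PURPOSE, CANONICAL AUTHORITY, START HERE, CONTACT) is inferred from the structure in Section 5 and the examples, and `check_required_sections` is a hypothetical name.

```python
REQUIRED_HEADINGS = [
    "## PURPOSE",
    "## CANONICAL AUTHORITY",
    "## START HERE (CRAWLING GUIDANCE)",
    "## CONTACT",
]


def check_required_sections(text: str) -> list:
    """Report required H2 headings that are missing, repeated, or out of order."""
    headings = [line.strip() for line in text.splitlines() if line.startswith("## ")]
    problems = [f"missing {h}" for h in REQUIRED_HEADINGS if h not in headings]
    # Compare the required headings as found against their expected order.
    present = [h for h in headings if h in REQUIRED_HEADINGS]
    expected = [h for h in REQUIRED_HEADINGS if h in headings]
    if present != expected:
        problems.append("required sections are out of order or repeated")
    return problems
```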
8. Recommended Sections
The following sections SHOULD be present when applicable.
8.1 PRODUCTS & SERVICES
Heading: ## PRODUCTS & SERVICES
Description: Lists the site's products and/or services with links to their AI-layer pages.
Content requirements:
- Each product or service MUST be listed as a Markdown list item (-) with its name
- Each entry SHOULD include a URL to its AI-layer page on a second line, indented by two spaces
- Entries MUST be deduplicated by name
- If no products or services are available, the section SHOULD contain a fallback reference to entities.json
Format:
- {Product/Service Name}:
  {URL}
Example:
## PRODUCTS & SERVICES
- SPYBOX:
  https://ai.acmecorp.com/products/spybox
- Statilitix:
  https://ai.acmecorp.com/products/statilitix
- SEO Consulting:
  https://ai.acmecorp.com/services/seo-consulting
Fallback:
## PRODUCTS & SERVICES
See entities.json for full list.
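The entry format and the deduplication rule can be illustrated with a small reader. This is a sketch with an assumed function name, `parse_products`; it treats names as case-sensitive when deduplicating, which the specification does not state either way.

```python
def parse_products(section_body: str) -> list:
    """Return (name, url) pairs; url is None when an entry has no URL line."""
    entries = []
    seen = set()
    accept_url = False  # True while the latest entry may still receive a URL
    for raw in section_body.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            name = line[2:].rstrip(":").strip()
            accept_url = name not in seen  # later duplicates are skipped entirely
            if accept_url:
                seen.add(name)
                entries.append([name, None])
        elif accept_url and line.startswith("https://"):
            entries[-1][1] = line
            accept_url = False
    return [(name, url) for name, url in entries]
```

An empty result, as produced by the fallback form, signals that the consumer should consult entities.json instead.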
8.2 ENTITY STATS
Heading: ## ENTITY STATS
Description: Provides a quantitative summary of the site's structured data coverage. This gives AI systems an at-a-glance understanding of the site's scope and data richness.
Content requirements:
- MUST be a Markdown unordered list
- Each item MUST be a label–value pair in the format - {Label}: {Value}
- SHOULD include at minimum: Total Entities and Pages
Standard statistics:
| Statistic | Description |
|---|---|
| Total Entities | Count of all extracted entities |
| Products/Software | Count of SoftwareApplication and Product entities |
| Services | Count of Service entities |
| Pages | Count of indexed pages |
Implementations MAY include additional statistics relevant to their domain (e.g., Articles, People, Locations, SKUs).
Example:
## ENTITY STATS
- Total Entities: 47
- Products/Software: 3
- Services: 5
- Pages: 22
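Because every item follows the same label and value pattern, the section maps naturally onto a dictionary. The sketch below assumes values are integer counts, as they are for the standard statistics; the name `parse_entity_stats` is illustrative, and items with non-integer values are skipped rather than rejected.

```python
def parse_entity_stats(section_body: str) -> dict:
    """Map each '- Label: Value' list item to an integer count."""
    stats = {}
    for raw in section_body.splitlines():
        line = raw.strip()
        if not line.startswith("- "):
            continue
        label, _, value = line[2:].partition(":")
        try:
            # Allow thousands separators such as 2,500.
            stats[label.strip()] = int(value.strip().replace(",", ""))
        except ValueError:
            continue  # skip items whose value is not a plain integer
    return stats
```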
9. Optional Sections
The following sections MAY be included for additional context.
9.1 ABOUT
Heading: ## ABOUT
Description: A brief plain-text summary of the organization, suitable for inclusion in an AI system's context window.
Content requirements:
- SHOULD be 2–5 sentences
- SHOULD NOT duplicate the one_liner or paragraph from llm-index.json verbatim; a unique summary is preferred
- MUST NOT contain marketing hyperbole; keep factual
Example:
## ABOUT
Acme Corp is a marketing technology company founded in 2020, headquartered in Miami, Florida. The company builds AI-powered tools for competitive intelligence, web analytics, and conversion optimization, serving over 2,500 business customers globally.
9.2 KEY FACTS
Heading: ## KEY FACTS
Description: A bullet list of the most important facts about the site or organization. This section is particularly valuable for local businesses and service providers where key differentiators (ratings, years in business, insurance acceptance, certifications) drive AI recommendations.
Content requirements:
- MUST be a Markdown unordered list
- SHOULD contain 3–10 items
- Each item SHOULD be a single sentence or phrase
Example:
## KEY FACTS
- Founded in 2020
- Headquartered in Miami, FL
- 3 SaaS products
- 2,500+ customers
- SOC 2 Type II certified
9.3 VERIFICATION
Heading: ## VERIFICATION
Description: Declares the site's verification status within the LLM Disco Directory or other trust registries.
Content requirements:
- MUST name the verifying directory
- SHOULD include the listing URL
- SHOULD include the conformance level
Example:
## VERIFICATION
Verified by: LLM Disco Directory
Listing: https://llmdisco.com/sites/acmecorp
Conformance: Level 3 (Agent-Ready)
9.4 ACTIONS
Heading: ## ACTIONS
Description: Lists the primary actions available on the site, for AI agents that need to understand available interaction endpoints before consulting llm-index.json.
Content requirements:
- Each action SHOULD be a list item with a name and URL
- Actions SHOULD be limited to the 3–5 most important; full action definitions belong in llm-index.json
Example:
## ACTIONS
- Start Free Trial: https://www.acmecorp.com/signup
- Book a Demo: https://www.acmecorp.com/demo
- Contact Sales: https://www.acmecorp.com/contact
9.5 FAQ
Heading: ## FAQ
Description: A small set of frequently asked questions, useful for giving AI systems quick answers to common queries without requiring a fetch of the full FAQ from llm-index.json.
Content requirements:
- Each FAQ entry MUST use an H3 heading (###) for the question and a paragraph for the answer
- SHOULD contain no more than 5 entries; full FAQ content belongs in llm-index.json
Example:
## FAQ
### Do you offer a free trial?
Yes. All products include a 14-day free trial with no credit card required.
### What integrations do you support?
We integrate with Google Analytics, HubSpot, Salesforce, and 50+ other platforms.
10. Formatting Rules
10.1 Markdown Dialect
llms.txt files MUST use CommonMark-compatible Markdown. The following Markdown constructs are used:
| Construct | Syntax | Usage |
|---|---|---|
| H1 heading | # | File title (exactly one) |
| H2 heading | ## | Section headings |
| H3 heading | ### | Subsections (in START HERE and FAQ) |
| Unordered list | - | Lists of items |
| Horizontal rule | --- | Section separators |
| Inline link | [text](url) | MAY be used but bare URLs are preferred for machine readability |
| Bare URL | https://... | Preferred format for resource URLs |
10.2 Line Endings
Files MUST use Unix-style line endings (\n). Carriage return characters (\r) MUST NOT appear in the file.
10.3 Trailing Whitespace
Lines that should produce a Markdown line break (e.g., header fields, contact information) SHOULD end with two trailing spaces followed by a newline.
10.4 Blank Lines
Blank lines SHOULD be used to separate logical groups within a section. A single blank line SHOULD appear before and after each horizontal rule.
10.5 Character Encoding
The file MUST be encoded as UTF-8. A byte-order mark (BOM) MUST NOT be present.
10.6 No HTML
llms.txt files MUST NOT contain HTML tags. The file is plain text formatted with Markdown; HTML defeats the purpose of a lightweight, universally parseable format.
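Several of the formatting rules in this section lend themselves to a simple lint pass over the raw bytes. The following is an illustrative sketch with an assumed function name, `lint_formatting`. Its HTML detection is a heuristic: it looks for tag-like patterns while leaving angle-bracket autolinks such as <https://example.com> alone.

```python
import re

# A tag name followed by optional attributes; the ':' in '<https://...>'
# does not fit this pattern, so autolinks are not flagged.
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(\s[^>]*)?/?>")


def lint_formatting(data: bytes) -> list:
    """Return formatting problems found in the raw bytes of an llms.txt file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("byte-order mark present")
    if b"\r" in data:
        problems.append("carriage return present; use \\n line endings")
    # utf-8-sig drops a leading BOM so the remaining checks see clean text.
    text = data.decode("utf-8-sig", errors="replace")
    h1_count = sum(1 for line in text.splitlines() if line.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1 heading, found {h1_count}")
    if HTML_TAG.search(text):
        problems.append("HTML tag present")
    return problems
```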
11. Size and Performance Considerations
11.1 Target Size
An llms.txt file SHOULD be under 4 KB. This ensures it fits comfortably within a single LLM context window, even on systems with small context limits.
Files MUST NOT exceed 20 KB. If a site has too many products, services, or entities to fit within this limit, the PRODUCTS & SERVICES section SHOULD use the fallback pattern (referencing entities.json) rather than listing every item.
11.2 Token Budget
For reference, 4 KB of Markdown text is approximately 1,000–1,200 tokens in most tokenizers. This leaves ample room for the file to be included alongside other context in a typical 8K–128K context window.
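The size limits and the token estimate can be combined into a quick budget check. This sketch makes two assumptions that the specification does not fix: 1 KB is taken as 1,024 bytes, and tokens are estimated at roughly four bytes each, a common rule of thumb for English text that varies by tokenizer. The name `size_report` is hypothetical.

```python
SOFT_LIMIT_BYTES = 4 * 1024    # SHOULD be under 4 KB
HARD_LIMIT_BYTES = 20 * 1024   # MUST NOT exceed 20 KB
ASSUMED_BYTES_PER_TOKEN = 4    # heuristic only, not a value from this spec


def size_report(data: bytes) -> dict:
    """Classify a file against the size limits and estimate its token count."""
    size = len(data)
    if size > HARD_LIMIT_BYTES:
        verdict = "too large"
    elif size >= SOFT_LIMIT_BYTES:
        verdict = "over target"
    else:
        verdict = "ok"
    return {
        "bytes": size,
        "approx_tokens": size // ASSUMED_BYTES_PER_TOKEN,
        "verdict": verdict,
    }
```

Under these assumptions a file of exactly 4,096 bytes is estimated at 1,024 tokens, which falls within the 1,000–1,200 range quoted above.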
11.3 Compression
Servers SHOULD support gzip or Brotli compression for llms.txt. A typical file compresses to 30–50% of its original size.
11.4 Caching
Servers SHOULD set Cache-Control: public, max-age=86400 (24 hours). AI crawlers that maintain their own caches SHOULD respect standard HTTP caching headers.
12. Security Considerations
12.1 No Sensitive Information
llms.txt files MUST NOT contain:
- Passwords, API keys, or authentication tokens
- Internal IP addresses or infrastructure details
- Personal data of individuals, except employee names, email addresses, and phone numbers that are already publicly available on the website
- Non-public pricing or business terms
12.2 URL Integrity
All URLs in the file MUST use HTTPS. HTTP URLs MUST NOT be used.
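A linter could flag violations of this rule with a single pattern match. The sketch below uses an assumed function name, `insecure_urls`, and a deliberately simple notion of where a URL ends (whitespace or a closing bracket).

```python
import re


def insecure_urls(text: str) -> list:
    """Return every http:// URL found in the text."""
    # 'https://' never matches because the pattern requires '://' right after 'http'.
    return re.findall(r"\bhttp://[^\s)>\]]+", text)
```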
12.3 Cross-Origin Considerations
Because llms.txt is a plain-text file intended for broad consumption, the Access-Control-Allow-Origin: * header is appropriate. However, the file MUST NOT be used as a vector for injecting content into AI systems. Implementations SHOULD NOT include prompt-injection patterns, hidden instructions, or adversarial text.
12.4 Canonical Verification
AI systems consuming llms.txt SHOULD verify that the Canonical-Site URL in the header block matches the domain from which the file was served (allowing for the www. / ai. subdomain convention). Files with mismatched canonical claims SHOULD be treated with reduced trust.
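One way a consumer might apply this check is to strip a leading www. or ai. label from both hosts before comparing them. This is a sketch under that assumption, with hypothetical function names; a production implementation would likely compare registrable domains using a public-suffix list instead.

```python
from urllib.parse import urlsplit


def _base_host(host: str) -> str:
    """Lowercase the host and drop one leading 'www.' or 'ai.' label."""
    host = host.lower().rstrip(".")
    for prefix in ("www.", "ai."):
        if host.startswith(prefix):
            return host[len(prefix):]
    return host


def canonical_matches(canonical_site: str, served_url: str) -> bool:
    """True if Canonical-Site and the serving URL share the same base host."""
    canonical_host = urlsplit(canonical_site).hostname or ""
    served_host = urlsplit(served_url).hostname or ""
    return bool(canonical_host) and _base_host(canonical_host) == _base_host(served_host)
```

A False result does not prove the file is malicious; per the text above, it is a signal to treat the file with reduced trust.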
13. IANA Considerations
13.1 Link Relation
This specification relies on the ai-manifest link relation defined in the AI Discovery Page 1.0 Specification. No additional link relation registration is required.
13.2 Well-Known URI
This specification does not define a well-known URI. The file is served at /llms.txt on the relevant domain, following the convention established by robots.txt.
13.3 Media Type
The file SHOULD be served as text/plain; charset=utf-8. Implementations MAY serve it as text/markdown; charset=utf-8 if the server supports this media type.
14. Examples
14.1 Minimal Conforming File
This example shows the minimum required content for a valid llms.txt file under this specification:
# Acme Dental AI Optimization Layer
Version: 1.0
Last-Updated: 2026-02-07
Canonical-Site: https://www.acmedental.com
---
## PURPOSE
This is an AI-readable representation of Acme Dental content optimized for:
- LLM crawlers
- AI search systems
- Agentic retrieval workflows
- Entity-based knowledge extraction
---
## CANONICAL AUTHORITY
The canonical source of truth is always: https://www.acmedental.com
---
## START HERE (CRAWLING GUIDANCE)
### 1. Full Site Intelligence (Recommended)
https://ai.acmedental.com/llm-index.json
---
## CONTACT
Acme Dental
https://www.acmedental.com
---
EOF
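A generator could produce a file like the minimal example from four inputs. The sketch below is illustrative only: `render_minimal` and its parameters are assumed names, and it adds the two-space line breaks and the blank lines around horizontal rules that Section 10 recommends.

```python
def render_minimal(site_name: str, canonical_site: str, index_url: str, last_updated: str) -> str:
    """Render a minimal llms.txt with only the required header fields and sections."""
    rule = ["", "---", ""]  # a horizontal rule with a blank line on each side
    lines = [
        f"# {site_name} AI Optimization Layer",
        "Version: 1.0  ",
        f"Last-Updated: {last_updated}  ",
        f"Canonical-Site: {canonical_site}  ",
        *rule,
        "## PURPOSE",
        f"This is an AI-readable representation of {site_name} content optimized for:",
        "- LLM crawlers",
        "- AI search systems",
        "- Agentic retrieval workflows",
        "- Entity-based knowledge extraction",
        *rule,
        "## CANONICAL AUTHORITY",
        f"The canonical source of truth is always: {canonical_site}",
        *rule,
        "## START HERE (CRAWLING GUIDANCE)",
        "### 1. Full Site Intelligence (Recommended)",
        index_url,
        *rule,
        "## CONTACT",
        f"{site_name}  ",
        canonical_site,
        *rule,
        "EOF",
    ]
    return "\n".join(lines) + "\n"  # Unix line endings only
```

Calling it with "Acme Dental", "https://www.acmedental.com", "https://ai.acmedental.com/llm-index.json", and "2026-02-07" yields the same headings and values as the example above.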
14.2 Complete File — SaaS Company
# Acme Corp AI Optimization Layer
Version: 1.0
Last-Updated: 2026-02-07
Primary-Language: en-US
Canonical-Site: https://www.acmecorp.com
AI-Mirror: https://ai.acmecorp.com
---
## PURPOSE
This is an AI-readable representation of Acme Corp content optimized for:
- LLM crawlers
- AI search systems
- Agentic retrieval workflows
- Entity-based knowledge extraction
---
## CANONICAL AUTHORITY
The canonical source of truth is always: https://www.acmecorp.com
---
## START HERE (CRAWLING GUIDANCE)
### 1. Full Site Intelligence (Recommended)
https://ai.acmecorp.com/llm-index.json
### 2. Entity Index
https://ai.acmecorp.com/entities.json
### 3. Knowledge Graph
https://ai.acmecorp.com/knowledge.graph.jsonld
### 4. Sitemap
https://ai.acmecorp.com/sitemap.xml
---
## ABOUT
Acme Corp is a marketing technology company founded in 2020, headquartered in Miami, Florida. The company builds AI-powered tools for competitive intelligence, web analytics, and conversion optimization, serving over 2,500 business customers globally.
---
## PRODUCTS & SERVICES
- SPYBOX:
  https://ai.acmecorp.com/products/spybox
- Statilitix:
  https://ai.acmecorp.com/products/statilitix
- SEO Consulting:
  https://ai.acmecorp.com/services/seo-consulting
---
## ENTITY STATS
- Total Entities: 47
- Products/Software: 3
- Services: 5
- Pages: 22
---
## ACTIONS
- Start Free Trial: https://www.acmecorp.com/signup
- Book a Demo: https://www.acmecorp.com/demo
- Contact Sales: https://www.acmecorp.com/contact
---
## VERIFICATION
Verified by: LLM Disco Directory
Listing: https://llmdisco.com/sites/acmecorp
Conformance: Level 3 (Agent-Ready)
---
## CONTACT
Acme Corp
info@acmecorp.com
https://www.acmecorp.com
---
EOF
14.3 Complete File — Local Business
# Acme Dental AI Optimization Layer
Version: 1.0
Last-Updated: 2026-02-07
Primary-Language: en-US
Canonical-Site: https://www.acmedental.com
AI-Mirror: https://ai.acmedental.com
---
## PURPOSE
This is an AI-readable representation of Acme Dental content optimized for:
- LLM crawlers
- AI search systems
- Agentic retrieval workflows
- Entity-based knowledge extraction
---
## CANONICAL AUTHORITY
The canonical source of truth is always: https://www.acmedental.com
---
## START HERE (CRAWLING GUIDANCE)
### 1. Full Site Intelligence (Recommended)
https://ai.acmedental.com/llm-index.json
### 2. Entity Index
https://ai.acmedental.com/entities.json
### 3. Knowledge Graph
https://ai.acmedental.com/knowledge.graph.jsonld
### 4. Sitemap
https://ai.acmedental.com/sitemap.xml
---
## PRODUCTS & SERVICES
- Dental Cleaning:
https://ai.acmedental.com/services/cleaning
- Teeth Whitening:
https://ai.acmedental.com/services/whitening
---
## ENTITY STATS
- Total Entities: 8
- Products/Software: 0
- Services: 2
- Pages: 4
---
## KEY FACTS
- Serving Tampa since 2006
- 4.8 star rating (127 reviews)
- Same-day emergencies available
- Most insurance accepted
---
## FAQ
### Do you accept my insurance?
We accept most major dental insurance including Delta, Cigna, MetLife, and Aetna. Call us to verify your specific plan.
### Do you offer emergency appointments?
Yes. We reserve time for same-day emergencies. Call us immediately if you are experiencing dental pain.
---
## VERIFICATION
Verified by: LLM Disco Directory
Listing: https://llmdisco.com/sites/acmedental
Conformance: Level 3 (Agent-Ready)
---
## CONTACT
Acme Dental
info@acmedental.com
+1-813-555-1234
https://www.acmedental.com
---
EOF
14.4 Complete File — Ecommerce Store
# Acme Outdoors AI Optimization Layer
Version: 1.0
Last-Updated: 2026-02-07
Primary-Language: en-US
Canonical-Site: https://www.acmeoutdoors.com
AI-Mirror: https://ai.acmeoutdoors.com
---
## PURPOSE
This is an AI-readable representation of Acme Outdoors content optimized for:
- LLM crawlers
- AI search systems
- Agentic retrieval workflows
- Entity-based knowledge extraction
- AI shopping assistants
---
## CANONICAL AUTHORITY
The canonical source of truth is always: https://www.acmeoutdoors.com
---
## START HERE (CRAWLING GUIDANCE)
### 1. Full Site Intelligence (Recommended)
https://ai.acmeoutdoors.com/llm-index.json
### 2. Entity Index
https://ai.acmeoutdoors.com/entities.json
### 3. Product Feed
https://ai.acmeoutdoors.com/products.json
### 4. Knowledge Graph
https://ai.acmeoutdoors.com/knowledge.graph.jsonld
### 5. Sitemap
https://ai.acmeoutdoors.com/sitemap.xml
---
## PRODUCTS & SERVICES
- Alpine Pro Tent (4-Person):
https://ai.acmeoutdoors.com/products/alpine-pro-tent
- TrailMaster Hiking Boots:
https://ai.acmeoutdoors.com/products/trailmaster-boots
- Summit Down Jacket:
https://ai.acmeoutdoors.com/products/summit-down-jacket
See entities.json for full catalog (240+ products).
---
## ENTITY STATS
- Total Entities: 267
- Products: 243
- Pages: 38
- Articles: 24
---
## KEY FACTS
- Family-owned since 1998
- 243 products across camping, hiking, and climbing
- Free shipping on orders over $75
- 30-day return policy
- 4.7 star average (2,100+ reviews)
---
## ACTIONS
- Shop All Products: https://www.acmeoutdoors.com/shop
- Find a Store: https://www.acmeoutdoors.com/stores
---
## CONTACT
Acme Outdoors
support@acmeoutdoors.com
+1-303-555-7890
https://www.acmeoutdoors.com
---
EOF
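The example files above share one layout: an H1 title, a key-value header block, sections separated by horizontal rules, and a closing EOF marker. The following is a minimal, non-normative sketch of how a consumer might read that layout. The function names `parse_manifest` and `product_links` are hypothetical, and the handling of a URL that sits on the line after its bullet is an assumption drawn from the examples, not a rule stated by this specification.

```python
def parse_manifest(text):
    """Split a manifest into (title, header fields, H2 sections)."""
    title = None
    header = {}
    sections = {}
    current = None      # name of the H2 section being collected
    seen_rule = False   # True once the first '---' has been passed
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "EOF":
            break                       # EOF marker ends the file
        if line == "---":
            seen_rule = True
            current = None              # a rule closes the open section
            continue
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        elif not seen_rule and ":" in line:
            # Header block: key-value lines between the H1 and the first rule.
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
    return title, header, sections


def product_links(lines):
    """Pair each '- Name:' bullet with the URL on the same or the next line."""
    links = {}
    pending = None
    for line in lines:
        if line.startswith("- "):
            name, _, rest = line[2:].partition(":")
            rest = rest.strip()
            if rest.startswith("http"):
                links[name.strip()] = rest
                pending = None
            else:
                pending = name.strip()
        elif pending and line.startswith("http"):
            links[pending] = line
            pending = None
    return links


# Abbreviated copy of the SaaS example, used only for illustration.
SAMPLE = """\
# Acme Corp AI Optimization Layer
Version: 1.0
Canonical-Site: https://www.acmecorp.com
---
## PRODUCTS & SERVICES
- SPYBOX:
https://ai.acmecorp.com/products/spybox
- Statilitix:
https://ai.acmecorp.com/products/statilitix
---
## ENTITY STATS
- Total Entities: 47
---
EOF
"""

title, header, sections = parse_manifest(SAMPLE)
links = product_links(sections["PRODUCTS & SERVICES"])
print(title, header["Canonical-Site"], links["SPYBOX"])
```

Because `product_links` accepts a URL on either the bullet line or the following line, the same helper also reads the single-line entries used in the ACTIONS sections.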
15. Changelog
Version 1.0 (February 2026)
- Initial specification release
- Defined header block with required, recommended, and optional fields
- Required sections: PURPOSE, CANONICAL AUTHORITY, START HERE, CONTACT
- Recommended sections: PRODUCTS & SERVICES, ENTITY STATS
- Optional sections: ABOUT, KEY FACTS, VERIFICATION, ACTIONS, FAQ
- Discovery mechanisms aligned with ADP 1.0 and LLM-LD 1.0
- Size limit: 20 KB maximum, 4 KB target
- Backwards compatibility with llmstxt.org convention
Appendix A: Section Reference
Quick reference of all sections by requirement level:
Required
| Section | Heading | Purpose |
|---|---|---|
| PURPOSE | ## PURPOSE | Declare file intent and AI use cases |
| CANONICAL AUTHORITY | ## CANONICAL AUTHORITY | Name the authoritative source URL |
| START HERE | ## START HERE (CRAWLING GUIDANCE) | Enumerate machine-readable resources |
| CONTACT | ## CONTACT | Provide basic contact information |
Recommended
| Section | Heading | Purpose |
|---|---|---|
| PRODUCTS & SERVICES | ## PRODUCTS & SERVICES | List offerings with AI-layer URLs |
| ENTITY STATS | ## ENTITY STATS | Quantitative summary of structured data |
Optional
| Section | Heading | Purpose |
|---|---|---|
| ABOUT | ## ABOUT | Brief organizational summary |
| KEY FACTS | ## KEY FACTS | Bullet list of important facts |
| VERIFICATION | ## VERIFICATION | Directory membership and conformance |
| ACTIONS | ## ACTIONS | Primary interaction endpoints |
| FAQ | ## FAQ | Common questions and answers |
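The three tables above can be applied mechanically to a candidate file. The sketch below is illustrative only: the function names are hypothetical, and reporting headings outside the defined vocabulary as "unrecognized" is an assumption, since this appendix does not say how such headings are to be treated.

```python
import re

# Section vocabulary copied from the tables in this appendix.
REQUIRED = ["PURPOSE", "CANONICAL AUTHORITY", "START HERE", "CONTACT"]
RECOMMENDED = ["PRODUCTS & SERVICES", "ENTITY STATS"]
OPTIONAL = ["ABOUT", "KEY FACTS", "VERIFICATION", "ACTIONS", "FAQ"]


def section_names(text):
    """Return the H2 section names in file order, without parentheticals."""
    names = []
    for line in text.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # "START HERE (CRAWLING GUIDANCE)" is matched as "START HERE".
            names.append(re.sub(r"\s*\(.*\)$", "", match.group(1)))
    return names


def audit_sections(text):
    """Compare the sections present against the three requirement levels."""
    present = section_names(text)
    known = REQUIRED + RECOMMENDED + OPTIONAL
    return {
        "missing_required": [s for s in REQUIRED if s not in present],
        "missing_recommended": [s for s in RECOMMENDED if s not in present],
        "optional_present": [s for s in OPTIONAL if s in present],
        "unrecognized": [s for s in present if s not in known],
    }


# Hypothetical file skeleton with one required section missing.
SAMPLE = """\
# Example Site
Version: 1.0
---
## PURPOSE
---
## CANONICAL AUTHORITY
---
## START HERE (CRAWLING GUIDANCE)
### 1. Full Site Intelligence (Recommended)
---
## KEY FACTS
---
## PRICING
---
EOF
"""

report = audit_sections(SAMPLE)
print(report["missing_required"])
```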
Appendix B: Header Field Reference
| Field | Required | Type | Example |
|---|---|---|---|
| Version | Yes | String | 1.0 |
| Last-Updated | Yes | Date | 2026-02-07 |
| Canonical-Site | Yes | URL | https://www.example.com |
| Primary-Language | Recommended | String | en-US |
| AI-Mirror | Recommended | URL | https://ai.example.com |
| Generator | Optional | String | aso-generator/2.0 |
| LLM-LD-Version | Optional | String | 1.0 |
| Conformance-Level | Optional | Integer | 3 |
| Expires | Optional | Date | 2026-02-14 |
| Directory-Listing | Optional | URL | https://llmdisco.com/sites/example |
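The field table above lends itself to a simple type check. The following sketch assumes that the header block has already been read into a dictionary of strings, that Date values use the ISO `YYYY-MM-DD` form shown in the examples, and that both `http` and `https` URLs are acceptable. Those three points, along with the name `validate_header`, are assumptions made for illustration.

```python
from datetime import date
from urllib.parse import urlparse


def _is_string(value):
    return bool(value.strip())


def _is_date(value):
    try:
        date.fromisoformat(value)   # accepts e.g. 2026-02-07
        return True
    except ValueError:
        return False


def _is_url(value):
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def _is_integer(value):
    return value.isdigit()


# (requirement level, type check) per field, following the table above.
FIELDS = {
    "Version": ("required", _is_string),
    "Last-Updated": ("required", _is_date),
    "Canonical-Site": ("required", _is_url),
    "Primary-Language": ("recommended", _is_string),
    "AI-Mirror": ("recommended", _is_url),
    "Generator": ("optional", _is_string),
    "LLM-LD-Version": ("optional", _is_string),
    "Conformance-Level": ("optional", _is_integer),
    "Expires": ("optional", _is_date),
    "Directory-Listing": ("optional", _is_url),
}


def validate_header(header):
    """Return a list of problems; an empty list means the header passed."""
    problems = []
    for name, (level, check) in FIELDS.items():
        value = header.get(name)
        if value is None:
            if level == "required":
                problems.append(f"missing required field: {name}")
            continue
        if not check(value):
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


good = {
    "Version": "1.0",
    "Last-Updated": "2026-02-07",
    "Canonical-Site": "https://www.example.com",
}
bad = {"Version": "1.0", "Last-Updated": "Feb 7", "Conformance-Level": "three"}

print(validate_header(good))
print(validate_header(bad))
```

Missing recommended and optional fields are deliberately not reported as problems, since only the three required fields affect minimal conformance.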
Appendix C: Implementation Checklist
Minimal Conformance
- [ ] File served at /llms.txt
- [ ] H1 heading with site name
- [ ] Header block with Version, Last-Updated, Canonical-Site
- [ ] PURPOSE section with use-case list
- [ ] CANONICAL AUTHORITY section naming the canonical URL
- [ ] START HERE section with at least the llm-index.json URL
- [ ] CONTACT section with organization name and at least one contact method
- [ ] EOF marker
- [ ] UTF-8 encoding, no BOM
- [ ] File size under 20 KB
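Several items in this checklist concern the file as bytes rather than its sections: encoding, byte-order mark, size, the H1 heading, and the EOF marker. A rough sketch of those checks follows. It reads 20 KB as 20 × 1024 bytes and treats a file of exactly that size as acceptable. Both readings are assumptions, as is the function name `check_file_bytes`.

```python
import codecs

MAX_BYTES = 20 * 1024  # 20 KB limit, interpreted as 1024-byte kilobytes (assumption)


def check_file_bytes(data):
    """Return a list of byte-level problems found in a candidate llms.txt."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes, over the {MAX_BYTES}-byte limit")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # A BOM survives decoding as U+FEFF, so strip it before testing the H1.
    if not lines or not lines[0].lstrip("\ufeff").startswith("# "):
        problems.append("first non-empty line is not an H1 heading")
    if not lines or lines[-1] != "EOF":
        problems.append("last non-empty line is not the EOF marker")
    return problems


good = b"# Example Site\nVersion: 1.0\n---\nEOF\n"
bad = codecs.BOM_UTF8 + b"# Example Site\n---\n"

print(check_file_bytes(good))
print(check_file_bytes(bad))
```

In practice the input would come from something like `open("llms.txt", "rb").read()`, so that the BOM and size checks see the bytes exactly as served.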
Recommended Additions
- [ ] Primary-Language and AI-Mirror header fields
- [ ] PRODUCTS & SERVICES section
- [ ] ENTITY STATS section
- [ ] CORS header (Access-Control-Allow-Origin: *)
- [ ] Cache-Control header set
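The two response headers in this list can be illustrated with a small WSGI application built on the Python standard library. This is a sketch under stated assumptions rather than a recommended deployment: the `Content-Type` value, the one-hour `max-age`, the placeholder manifest body, and the helper name `call` are all chosen for illustration and are not taken from this specification.

```python
from wsgiref.util import setup_testing_defaults

# Placeholder manifest body; a real deployment would read the file from disk.
MANIFEST = b"# Example Site\nVersion: 1.0\n---\nEOF\n"


def app(environ, start_response):
    """Serve /llms.txt with the CORS and Cache-Control headers set."""
    if environ.get("PATH_INFO") != "/llms.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"not found\n"]
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),   # assumed media type
        ("Content-Length", str(len(MANIFEST))),
        ("Access-Control-Allow-Origin", "*"),            # CORS header from the checklist
        ("Cache-Control", "public, max-age=3600"),       # assumed cache lifetime
    ]
    start_response("200 OK", headers)
    return [MANIFEST]


def call(path):
    """Invoke the app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call("/llms.txt")
print(status, headers["Access-Control-Allow-Origin"], headers["Cache-Control"])
```

For a quick local trial, the same `app` can be passed to `wsgiref.simple_server.make_server`. Production sites would more likely set these headers in their web server or CDN configuration.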
Full Implementation
- [ ] All optional header fields populated
- [ ] ABOUT, KEY FACTS, VERIFICATION, ACTIONS, FAQ sections
- [ ] Referenced from ADP via <link rel="ai-manifest">
- [ ] Referenced from robots.txt via the LLMs-Txt: directive
- [ ] Registered with LLM Disco Directory
- [ ] Automated regeneration on content change
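The two reference mechanisms named in this list, the `<link rel="ai-manifest">` element and the `LLMs-Txt:` directive, can both be located with the Python standard library. The sketch below assumes that the link element carries the manifest location in its `href` attribute and that the directive's value is a URL. Neither detail is spelled out in this appendix, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ManifestLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes ai-manifest."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "ai-manifest" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def manifest_from_html(html, base_url):
    """Return absolute manifest URLs referenced from an HTML page."""
    finder = _ManifestLinkFinder()
    finder.feed(html)
    return [urljoin(base_url, href) for href in finder.hrefs]


def manifest_from_robots(robots_txt):
    """Return the values of any LLMs-Txt: lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()    # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "llms-txt" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical inputs for illustration.
PAGE = '<html><head><link rel="ai-manifest" href="/llms.txt"></head></html>'
ROBOTS = "User-agent: *\nAllow: /\nLLMs-Txt: https://www.example.com/llms.txt\n"

print(manifest_from_html(PAGE, "https://www.example.com/"))
print(manifest_from_robots(ROBOTS))
```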
Appendix D: Relationship to llmstxt.org
This specification and the llmstxt.org proposal share the same filename convention (/llms.txt) and the same basic Markdown format. They serve different but complementary purposes:
| Aspect | llmstxt.org (Howard) | LLMs.txt 1.0 (This Specification) |
|---|---|---|
| Primary use case | Developer documentation, software projects | Business websites, local businesses, SaaS, ecommerce |
| Typical content | Links to Markdown docs and API references | Business identity, structured-data resources, products/services |
| Companion files | llms-full.txt, per-page .md variants | llm-index.json, entities.json, knowledge.graph.jsonld |
| Structure | H1 + blockquote + free-form H2 sections | H1 + header block + defined section vocabulary |
| Metadata | None | Structured header block (Version, Canonical-Site, etc.) |
| Section names | Free-form (author chooses) | Defined vocabulary (PURPOSE, START HERE, etc.) |
| Ecosystem | Standalone convention | Part of LLM-LD + ADP + LLM Disco Directory |
| Spec maturity | Informal proposal | Formal specification with conformance requirements |
Backwards compatibility: A file conforming to this specification is readable by any system that expects a generic llmstxt.org-style file. The H1 heading, descriptive content, and linked URLs will be intelligible to any consumer. However, a generic llmstxt.org file will not necessarily conform to this specification, as it may lack the required sections or header block.
Coexistence: A website MAY publish both a llmstxt.org-style file (for developer documentation use cases) and a file conforming to this specification (for business discovery use cases). In practice, the use cases rarely overlap: a dentist's office has no need for a documentation-focused file, and a Python library has no need for a business identity manifest. Sites that serve both purposes (e.g., a SaaS company with both a marketing site and developer docs) MAY combine both approaches in a single file by including documentation links as additional resources in the START HERE section.
Appendix E: Glossary
| Term | Definition |
|---|---|
| AI Layer | The collection of machine-readable resources published for AI consumption |
| AI Mirror | An AI-optimized subdomain serving enhanced structured data |
| AI Search Optimization (ASO) | The practice of structuring content for AI search visibility |
| Canonical Site | The primary human-facing website; authoritative source of truth |
| Conformance Level | An LLM-LD tier of implementation (Crawl-Ready, Ingest-Ready, Agent-Ready) |
| EOF Marker | The literal string EOF at the end of the file |
| Generative Engine Optimization (GEO) | Alternate term for ASO, emphasizing generative AI search |
| Header Block | Key-value metadata between the H1 heading and the first horizontal rule |
| Manifest | The llms.txt file itself |
| Resource | A machine-readable file in the AI layer |
| Section | An H2-headed block of content within the file |
Acknowledgments
This specification formalizes patterns that emerged organically across the AI search optimization community. It builds on the llmstxt.org convention proposed by Jeremy Howard (Answer.AI) in September 2024, extending it for the business website use case that practitioners in the SEO, GEO, and ASO communities have widely adopted. The specification was developed by CAPXEL LLC with input from early adopters of the LLM-LD standard and the LLM Disco Directory network.
Copyright Notice
Copyright © 2026 CAPXEL LLC. This specification is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
End of Specification