LLMs.txt 1.0 — Plain-Text AI Manifest Specification

Status: Draft
Published: February 2026
Companion to: LLM-LD 1.0 Specification, ADP 1.0 Specification
Maintained by: CAPXEL
Latest version: llmld.org/spec/llms-txt-v1
License: CC BY 4.0

LLMs.txt 1.0 Specification

Plain-Text AI Manifest for Website Discovery and Orientation


Version: 1.0
Status: Draft Specification
Published: February 2026
Latest Version: https://llmld.org/spec/llms-txt/1.0
Companion to: LLM-LD 1.0 Specification, AI Discovery Page (ADP) 1.0 Specification
Editors: CAPXEL LLC


Abstract

LLMs.txt is a specification for a plain-text Markdown file that provides AI systems with a concise, human- and machine-readable orientation to a website. It serves as a lightweight manifest that identifies the site, declares its canonical authority, enumerates its machine-readable resources, summarizes its products and services, and provides contact information — all in a single file that can be consumed in a fraction of a context window.

This specification defines:

  1. A file format (llms.txt) using Markdown with a predictable section structure
  2. Required and optional sections with defined semantics
  3. A metadata header block for machine-parseable fields
  4. Placement and discovery rules consistent with the LLM-LD ecosystem
  5. The relationship between llms.txt and companion structured-data resources (llm-index.json, entities.json, knowledge.graph.jsonld)

This document is a companion to the LLM-LD 1.0 Specification and the AI Discovery Page (ADP) 1.0 Specification. Together, the three standards provide a complete discovery and ingestion framework: the ADP is an HTML bridge for crawlers arriving at the primary domain, llms.txt is a plain-text summary for inference-time retrieval and agent orientation, and llm-index.json is a full structured-data representation of the site.


Table of Contents

  1. Introduction
  2. Terminology
  3. Relationship to Existing Standards
  4. File Placement and Discovery
  5. Document Structure
  6. Header Block
  7. Required Sections
  8. Recommended Sections
  9. Optional Sections
  10. Formatting Rules
  11. Size and Performance Considerations
  12. Security Considerations
  13. IANA Considerations
  14. Examples
  15. Changelog

1. Introduction

1.1 Background

AI systems consume web content in fundamentally different ways than human readers or traditional search crawlers. Large language models operating at inference time face strict context-window constraints that make crawling an entire website impractical. Retrieval-augmented generation (RAG) pipelines benefit from a single, authoritative summary that can orient the model before it decides which deeper resources to fetch. Autonomous agents need a quick overview of available actions, resources, and contact channels before they begin navigating.

In September 2024, Jeremy Howard proposed a convention of placing a Markdown file at /llms.txt to help LLMs navigate website content (llmstxt.org). That proposal targeted developer documentation — helping coding assistants find their way around API references and library docs. Since then, the convention has been widely adopted: Anthropic, Perplexity, Zapier, and hundreds of other organizations publish llms.txt files, and tools like Yoast SEO, Mintlify, and numerous generators automate their creation.

Critically, the market has already extended the format well beyond its original documentation focus. SEO practitioners, ecommerce platforms, local businesses, and SaaS companies are using llms.txt to communicate business identity, products, services, locations, and contact information to AI systems — a practice broadly known as Generative Engine Optimization (GEO) or AI Search Optimization (ASO). However, these implementations are ad hoc and inconsistent: every site invents its own section names, includes different metadata, and structures its content differently. There is no defined vocabulary, no required sections, and no interoperability between implementations.

This specification formalizes the emerging practice. It defines a predictable section structure, a metadata header block, and a defined vocabulary of sections optimized for business websites — the use case the market has organically adopted but that the original llmstxt.org proposal did not address. It further integrates llms.txt into the LLM-LD ecosystem, connecting it to structured-data companions (llm-index.json, entities.json, knowledge.graph.jsonld) and the LLM Disco Directory.

1.2 Design Principles

  1. Plain text first. The file MUST be readable by any system that can fetch a text file. No parsing libraries, JSON deserializers, or XML processors are required.

  2. Orientation, not duplication. The file orients an AI system — telling it what the site is, where the deeper resources live, and what actions are available. It does not attempt to replicate the full content of llm-index.json or the site itself.

  3. Predictable structure. Every llms.txt file following this specification uses the same section ordering with the same heading names. An AI system that has seen one conforming file can parse any other without additional instructions.

  4. Formalizing practice. This specification standardizes patterns already in widespread organic use across the SEO, GEO, and ASO communities, rather than inventing new conventions. Where the market has converged on a pattern, this specification adopts it.

  5. Minimal by default. A conforming file can be very short. Required sections establish identity and point to deeper resources. Everything else is optional enrichment.

  6. Compatible with the LLM-LD ecosystem. The file is designed to work alongside llm-index.json, the AI Discovery Page, entities.json, and knowledge.graph.jsonld. It references these resources by URL, creating a web of discovery.

  7. Backwards compatible with llmstxt.org. A file conforming to this specification is readable by any system that expects a generic llms.txt file per the llmstxt.org convention. The H1 heading, descriptive content, and linked URLs will be intelligible to any consumer.

1.3 Audience

This specification is intended for:

  • Web developers generating or authoring llms.txt for business websites
  • AI system developers consuming llms.txt at inference time or during crawling
  • SEO, GEO, and ASO practitioners optimizing sites for AI search visibility
  • Tool developers building generators, validators, and linters
  • Directory operators (such as the LLM Disco Directory) integrating llms.txt into their indexing pipelines

1.4 Document Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


2. Terminology

AI Layer
The collection of machine-readable resources a website publishes for consumption by AI systems. This may live on a dedicated subdomain (e.g., ai.example.com), a well-known path, or the primary domain itself.

AI Search Optimization (ASO)
The practice of structuring website content and metadata for discovery and accurate representation by AI search systems, chatbots, and autonomous agents. Also known as Generative Engine Optimization (GEO) or Answer Engine Optimization (AEO).

Canonical Site
The primary human-facing website (e.g., https://www.example.com). The canonical site is the authoritative source of truth for all content.

AI Mirror
An optional AI-optimized subdomain (e.g., https://ai.example.com) that serves the same content as the canonical site with enhanced structured data, simplified markup, and permissive crawler policies.

Manifest
In this specification, the llms.txt file itself: a plain-text document that enumerates resources and provides orientation.

Resource
A machine-readable file published as part of the AI layer, such as llm-index.json, entities.json, sitemap.xml, or knowledge.graph.jsonld.

Header Block
The metadata key-value pairs that appear immediately after the H1 heading and before the first horizontal rule. These carry structured metadata in a plain-text-friendly format.

3. Relationship to Existing Standards

3.1 The llmstxt.org Convention

Jeremy Howard's llms.txt proposal (llmstxt.org) established the convention of placing a Markdown file at /llms.txt to help LLMs understand a website. The proposal defines a minimal structure: an H1 heading, an optional blockquote summary, free-form content, and H2 sections containing lists of links to detailed Markdown files. It includes an ## Optional section convention for lower-priority resources. Companion files llms-full.txt (full site content in one file) and per-page .md variants are also proposed.

The llmstxt.org convention was designed primarily for developer documentation and software projects. Its structure is intentionally loose — section names are free-form, no metadata is defined, and no companion structured-data resources are specified.

This specification addresses a different and complementary use case: business websites — local businesses, SaaS companies, ecommerce stores, agencies, and other organizations that need to communicate identity, offerings, and structured-data resources to AI systems. This is the use case the broader SEO/GEO community has organically adopted llms.txt for, but without a formal standard to ensure consistency and interoperability.

This specification extends the llmstxt.org convention with:

  • A defined section vocabulary optimized for business websites
  • A metadata header block with machine-parseable fields (version, canonical URL, language)
  • Explicit resource enumeration pointing to structured-data companions (llm-index.json, entities.json, knowledge.graph.jsonld)
  • Integration with the LLM-LD ecosystem (conformance levels, verification, directory membership)

Files conforming to this specification are backwards compatible with the llmstxt.org convention: any system that reads a generic llms.txt file will be able to consume the H1 heading, content, and linked URLs defined here.

3.2 LLM-LD 1.0

LLM-LD 1.0 defines llm-index.json — a comprehensive JSON-LD file containing structured data about a website. The llms.txt file acts as a lightweight companion:

| Concern | llm-index.json | llms.txt |
|---|---|---|
| Format | JSON-LD | Markdown (plain text) |
| Primary consumer | Programmatic pipelines, agents | Chat-based LLMs, RAG systems, human auditors |
| Depth | Complete structured data | Summary and resource pointers |
| Typical size | 5–100 KB | 0.5–5 KB |
| Parsing required | JSON parser | None (plain text) |

The two files SHOULD be generated together. llms.txt SHOULD reference llm-index.json by URL in the START HERE section.

3.3 AI Discovery Page (ADP)

The ADP is an HTML page on the primary domain that bridges crawlers from the human web to the AI layer. The ADP's <head> includes a <link rel="ai-manifest"> element that points to llms.txt:

<link rel="ai-manifest" type="text/plain" href="https://ai.example.com/llms.txt" />

The three standards form a discovery chain:

Crawler arrives at primary domain
        │
        ▼
   ADP (/ai.html)          ← HTML bridge (ADP 1.0)
        │
        ├──► llms.txt       ← Plain-text orientation (this spec)
        │
        └──► llm-index.json ← Full structured data (LLM-LD 1.0)

3.4 robots.txt and sitemap.xml

llms.txt does not replace robots.txt or sitemap.xml. These files serve different purposes:

  • robots.txt controls crawler access permissions
  • sitemap.xml enumerates URLs for crawling
  • llms.txt provides orientation and resource discovery for AI systems

llms.txt MAY be referenced from robots.txt using a custom directive (see Section 4.3).

3.5 Current Market Practice

As of early 2026, llms.txt is in widespread organic use across multiple communities:

  • Developer documentation: Anthropic, Vercel, Stripe, and hundreds of software projects use llmstxt.org-style files to help coding assistants navigate API docs.
  • SEO/GEO tools: Yoast SEO auto-generates llms.txt for WordPress sites. LLMrefs, Rankability, and other tools provide generators and validators.
  • Ecommerce: BigCommerce, Shopify app developers, and DTC brands use llms.txt to surface product catalogs and pricing for AI shopping assistants.
  • Local businesses: Dental practices, law firms, agencies, and other SMBs use llms.txt to establish business identity and service areas for AI search.
  • Enterprise: Dell, Zapier, and other large organizations publish llms.txt with product feeds and support documentation.

This specification standardizes the patterns common across these implementations while defining the formal structure needed for interoperability, tooling, and ecosystem integration.


4. File Placement and Discovery

4.1 File Location

The llms.txt file MUST be served at the root path of the AI subdomain:

https://ai.example.com/llms.txt

The file MUST NOT be served on the primary domain. The primary domain's robots.txt blocks AI crawlers (except for the AI Discovery Page at /ai-discovery), so an llms.txt on the primary domain would be inaccessible to its intended consumers. The AI Discovery Page serves as the bridge from the primary domain to the AI subdomain, where llms.txt and all other machine-readable resources reside.

4.2 HTTP Requirements

The file MUST be served with:

  • Content-Type: text/plain; charset=utf-8 or text/markdown; charset=utf-8
  • Status Code: 200 OK
  • Encoding: UTF-8 without BOM

The server SHOULD include the following response headers:

| Header | Value | Purpose |
|---|---|---|
| Cache-Control | public, max-age=86400 | Cache for 24 hours |
| Access-Control-Allow-Origin | * | Enable cross-origin fetches by AI agents |
| X-Content-Type-Options | nosniff | Prevent MIME type sniffing |
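
The requirements above could be checked by a validator along the following lines. This is a minimal sketch, not part of the specification: the function name check_response and the plain dictionary of headers are assumptions for illustration, and comparing the recommended header values by exact match is a simplification.

```python
def check_response(status: int, headers: dict[str, str]) -> list[str]:
    """Return a list of problems with an llms.txt HTTP response (empty if none)."""
    problems = []
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {name.lower(): value.strip() for name, value in headers.items()}

    # MUST-level checks: status code and content type.
    if status != 200:
        problems.append(f"status is {status}, expected 200")

    content_type = lower.get("content-type", "").lower().replace(" ", "")
    allowed = {"text/plain;charset=utf-8", "text/markdown;charset=utf-8"}
    if content_type not in allowed:
        problems.append(f"unexpected Content-Type: {lower.get('content-type')!r}")

    # SHOULD-level headers: report these as warnings only.
    recommended = {
        "cache-control": "public, max-age=86400",
        "access-control-allow-origin": "*",
        "x-content-type-options": "nosniff",
    }
    for name, value in recommended.items():
        if lower.get(name) != value:
            problems.append(f"warning: {name} is not {value!r}")
    return problems
```

An empty list means the response satisfies both the required and the recommended settings; entries prefixed with "warning:" relate only to the SHOULD-level headers.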

4.3 Discovery Mechanisms

AI systems can discover llms.txt through any of the following mechanisms, listed in order of preference:

1. Direct URL convention

AI crawlers SHOULD check {domain}/llms.txt as a first-pass discovery step, mirroring the robots.txt convention. This is already standard practice across most AI crawlers and GEO tooling.

2. ADP link relation

The AI Discovery Page SHOULD include a <link> element pointing to llms.txt:

<link rel="ai-manifest" type="text/plain" href="https://ai.example.com/llms.txt" />

3. robots.txt reference

Implementations MAY include a reference in robots.txt:

# AI Manifest
LLMs-Txt: https://ai.example.com/llms.txt

4. LLM Disco Directory

The LLM Disco Directory maintains a registry of llms.txt URLs for all listed sites. AI systems can query the directory API to discover llms.txt for any registered domain.

5. HTTP Link header

Implementations MAY include an HTTP Link header on any page:

Link: <https://ai.example.com/llms.txt>; rel="ai-manifest"; type="text/plain"
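
As an illustration of mechanisms 3 and 5, the sketch below extracts the manifest URL from a robots.txt body and from an HTTP Link header value. The function names are hypothetical, matching the LLMs-Txt directive case-insensitively is an assumption, and splitting the Link header on commas is a simplification that would mis-handle URLs containing commas.

```python
import re
from typing import Optional


def manifest_from_robots(robots_txt: str) -> Optional[str]:
    """Return the URL from the first LLMs-Txt line in a robots.txt body, if any."""
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "llms-txt":
            return value.strip() or None
    return None


def manifest_from_link_header(link_header: str) -> Optional[str]:
    """Return the target of the first rel="ai-manifest" entry in a Link header value."""
    for part in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if match and re.search(r'rel\s*=\s*"?ai-manifest"?', match.group(2), re.IGNORECASE):
            return match.group(1)
    return None
```

A consumer would typically try the direct URL convention first and fall back to these lookups only when that fetch fails.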

4.4 URL Canonicalization

The URL referenced in all discovery mechanisms MUST be consistent. Implementations MUST NOT advertise a URL different from the one at which the file is actually served. The canonical URL is always the AI subdomain URL (https://ai.example.com/llms.txt).


5. Document Structure

An llms.txt file is a UTF-8 plain-text document formatted as Markdown. It consists of:

  1. An H1 heading declaring the site identity
  2. A header block of key-value metadata
  3. A sequence of H2 sections separated by horizontal rules
  4. A terminal EOF marker

The overall structure:

# {Site Name} AI Optimization Layer        ← H1 (required)
Version: 1.0                                ← Header block (required)
Last-Updated: 2026-02-07
Primary-Language: en-US

Canonical-Site: https://www.example.com
AI-Mirror: https://ai.example.com

---

## PURPOSE                                  ← Required section

...content...

---

## CANONICAL AUTHORITY                      ← Required section

...content...

---

## START HERE (CRAWLING GUIDANCE)           ← Required section

...content...

---

## PRODUCTS & SERVICES                      ← Recommended section

...content...

---

## ENTITY STATS                             ← Recommended section

...content...

---

## CONTACT                                  ← Required section

...content...

---
EOF

5.1 Section Ordering

Sections MUST appear in the order specified in this document. This predictable ordering allows AI systems to locate specific information by position without parsing section headers.

5.2 Section Separators

Each section MUST be preceded and followed by a Markdown horizontal rule (---). The horizontal rule after the header block serves as the separator before the first section.

5.3 EOF Marker

The file MUST end with the literal string EOF on its own line, preceded by a horizontal rule. This allows AI systems to confirm they have received the complete file.
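
The separator and EOF rules make the file straightforward to split without a Markdown parser. The following is a minimal sketch under that assumption; split_llms_txt is a hypothetical name, and the returned dictionary layout is chosen only for illustration.

```python
def split_llms_txt(text: str) -> dict:
    """Split an llms.txt document into its preamble and H2 sections.

    Raises ValueError if the terminal '---' / 'EOF' marker is missing.
    """
    lines = text.split("\n")
    while lines and lines[-1] == "":  # ignore trailing blank lines
        lines.pop()
    if len(lines) < 2 or lines[-1] != "EOF" or lines[-2].strip() != "---":
        raise ValueError("missing terminal '---' / 'EOF' marker")

    # Everything before EOF is cut into blocks at each horizontal rule.
    blocks, current = [], []
    for line in lines[:-1]:
        if line.strip() == "---":
            blocks.append(current)
            current = []
        else:
            current.append(line)

    preamble, section_blocks = blocks[0], blocks[1:]
    sections = {}  # dicts preserve insertion order, so section order is kept
    for block in section_blocks:
        heading = next((l for l in block if l.startswith("## ")), None)
        if heading is None:
            continue
        start = block.index(heading) + 1
        sections[heading[3:].strip()] = "\n".join(block[start:]).strip()
    return {"preamble": "\n".join(preamble).strip(), "sections": sections}
```

Because the result keeps sections in file order, list(result["sections"]) can be compared against the expected ordering from Section 5.1.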


6. Header Block

The header block appears immediately after the H1 heading and before the first horizontal rule. It consists of key-value pairs in Key: Value format, one per line. Blank lines within the header block are permitted for visual grouping.

6.1 Required Header Fields

| Field | Type | Description |
|---|---|---|
| Version | String | Specification version (MUST be "1.0" for this version) |
| Last-Updated | Date | ISO 8601 date (YYYY-MM-DD) of last generation or edit |
| Canonical-Site | URL | The primary human-facing website URL |

6.2 Recommended Header Fields

| Field | Type | Description |
|---|---|---|
| Primary-Language | String | BCP 47 language code (e.g., en-US) |
| AI-Mirror | URL | The AI-optimized subdomain URL, if one exists |

6.3 Optional Header Fields

| Field | Type | Description |
|---|---|---|
| Generator | String | Software that generated this file (e.g., aso-generator/2.0) |
| LLM-LD-Version | String | Version of LLM-LD specification the site conforms to |
| Conformance-Level | Integer | LLM-LD conformance level (1, 2, or 3) |
| Expires | Date | ISO 8601 date after which this file should be re-fetched |
| Directory-Listing | URL | URL of the site's LLM Disco Directory listing |

6.4 Example

# Acme Corp AI Optimization Layer
Version: 1.0  
Last-Updated: 2026-02-07  
Primary-Language: en-US  

Canonical-Site: https://www.acmecorp.com  
AI-Mirror: https://ai.acmecorp.com  

6.5 Trailing Whitespace

Header field lines SHOULD end with two trailing spaces (Markdown line break) to ensure correct rendering when the file is displayed as rendered Markdown. Parsers MUST trim trailing whitespace when extracting values.
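
A header-block parser that follows these rules can be very small. The sketch below is one possible reading, not a reference implementation: parse_header_block and missing_required are hypothetical names, and the parser silently skips lines without a colon, which is an assumption the specification does not address.

```python
REQUIRED_FIELDS = ("Version", "Last-Updated", "Canonical-Site")


def parse_header_block(text: str) -> dict[str, str]:
    """Extract Key: Value pairs between the H1 heading and the first '---'."""
    lines = text.split("\n")
    if not lines or not lines[0].startswith("# "):
        raise ValueError("file must start with an H1 heading")
    fields = {}
    for line in lines[1:]:
        stripped = line.strip()  # also removes the two trailing spaces
        if stripped == "---":
            break
        if not stripped:
            continue  # blank lines are allowed for visual grouping
        key, sep, value = stripped.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_required(fields: dict[str, str]) -> list[str]:
    """Return the required header fields that are absent."""
    return [name for name in REQUIRED_FIELDS if name not in fields]
```

Splitting on the first colon only keeps URL values such as https://www.acmecorp.com intact.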


7. Required Sections

The following sections MUST be present in every conforming llms.txt file.

7.1 PURPOSE

Heading: ## PURPOSE

Description: Declares the intent of the file and enumerates the AI use cases it supports.

Content requirements:

  • MUST contain a sentence identifying the site by name
  • MUST contain a list of supported AI use cases

Standard use-case list: Implementations SHOULD include the following use cases unless they are genuinely not applicable:

  • LLM crawlers
  • AI search systems
  • Agentic retrieval workflows
  • Entity-based knowledge extraction

Implementations MAY add additional use cases specific to their domain.

Example:

## PURPOSE

This is an AI-readable representation of Acme Corp content optimized for:

- LLM crawlers  
- AI search systems  
- Agentic retrieval workflows  
- Entity-based knowledge extraction  

7.2 CANONICAL AUTHORITY

Heading: ## CANONICAL AUTHORITY

Description: Declares which URL is the authoritative source of truth. This is critical for AI systems that must distinguish between the canonical site, the AI mirror, cached copies, and third-party reproductions.

Content requirements:

  • MUST contain a single sentence naming the canonical URL
  • The URL MUST match the Canonical-Site header field

Example:

## CANONICAL AUTHORITY

The canonical source of truth is always: https://www.acmecorp.com

7.3 START HERE (CRAWLING GUIDANCE)

Heading: ## START HERE (CRAWLING GUIDANCE)

Description: Enumerates the machine-readable resources available in the AI layer, ordered by recommended consumption priority. This is the most important section for AI agents: it tells them where to go next.

Content requirements:

  • MUST contain at least one numbered subsection with a resource URL
  • Each resource MUST be listed as an H3 heading (###) with a priority number and descriptive name
  • The resource URL MUST appear on its own line immediately below the heading

Standard resource ordering:

| Priority | Resource | Description |
|---|---|---|
| 1 | Full Site Intelligence | llm-index.json: recommended first fetch |
| 2 | Entity Index | entities.json: structured entity data |
| 3 | Knowledge Graph | knowledge.graph.jsonld: entity relationships |
| 4 | Sitemap | sitemap.xml: full URL listing |

Implementations MUST include at least resource 1 (Full Site Intelligence). Resources 2–4 are RECOMMENDED if the corresponding files exist. Implementations MAY add additional resources (e.g., product feeds, FAQ feeds, pricing data).

Example:

## START HERE (CRAWLING GUIDANCE)

### 1. Full Site Intelligence (Recommended)
https://ai.acmecorp.com/llm-index.json

### 2. Entity Index
https://ai.acmecorp.com/entities.json

### 3. Knowledge Graph
https://ai.acmecorp.com/knowledge.graph.jsonld

### 4. Sitemap
https://ai.acmecorp.com/sitemap.xml
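
Because each resource is an H3 heading followed directly by its URL, an agent can extract the fetch order with a few lines of code. This is a rough sketch that takes the section body as input; parse_start_here is a hypothetical name, and ignoring entries whose next line is not an HTTPS URL is an assumption.

```python
import re


def parse_start_here(body: str) -> list[tuple[int, str, str]]:
    """Return (priority, name, url) tuples from a START HERE section body."""
    resources = []
    lines = body.split("\n")
    for i, line in enumerate(lines):
        match = re.match(r"###\s+(\d+)\.\s+(.*\S)\s*$", line)
        if match and i + 1 < len(lines):
            url = lines[i + 1].strip()  # URL sits on the line right below the heading
            if url.startswith("https://"):
                resources.append((int(match.group(1)), match.group(2), url))
    return sorted(resources)  # lowest priority number first
```

The first tuple in the result is the recommended first fetch, which for a conforming file is the llm-index.json URL.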

7.4 CONTACT

Heading: ## CONTACT

Description: Provides basic contact information for the organization.

Content requirements:

  • MUST contain the organization name
  • MUST contain at least one of: URL, email address, or phone number
  • Each piece of contact information SHOULD appear on its own line

Example:

## CONTACT

Acme Corp  
info@acmecorp.com  
https://www.acmecorp.com

8. Recommended Sections

The following sections SHOULD be present when applicable.

8.1 PRODUCTS & SERVICES

Heading: ## PRODUCTS & SERVICES

Description: Lists the site's products and/or services with links to their AI-layer pages.

Content requirements:

  • Each product or service MUST be listed as a Markdown list item (-) with its name
  • Each entry SHOULD include a URL to its AI-layer page on a second line, indented by two spaces
  • Entries MUST be deduplicated by name
  • If no products or services are available, the section SHOULD contain a fallback reference to entities.json

Format:

- {Product/Service Name}:  
  {URL}

Example:

## PRODUCTS & SERVICES

- SPYBOX:  
  https://ai.acmecorp.com/products/spybox

- Statilitix:  
  https://ai.acmecorp.com/products/statilitix

- SEO Consulting:  
  https://ai.acmecorp.com/services/seo-consulting

Fallback:

## PRODUCTS & SERVICES

See entities.json for full list.
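
The two-line entry format and the fallback can both be handled by one small routine. The sketch below is illustrative only: parse_products is a hypothetical name, and returning an empty list for the fallback text is an assumption about how a consumer might signal that it should consult entities.json instead.

```python
from typing import Optional


def parse_products(body: str) -> list[tuple[str, Optional[str]]]:
    """Return (name, url) pairs from a PRODUCTS & SERVICES section body.

    The url is None when an entry has no indented URL line. The fallback
    text ("See entities.json for full list.") yields an empty list.
    """
    entries = []
    for line in body.split("\n"):
        text = line.strip()
        if line.startswith("- "):
            # "- SPYBOX:  " -> "SPYBOX"
            entries.append([text[2:].rstrip(":").strip(), None])
        elif (
            line.startswith("  ")
            and text.startswith("https://")
            and entries
            and entries[-1][1] is None
        ):
            entries[-1][1] = text  # URL on the indented second line
    return [(name, url) for name, url in entries]
```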

8.2 ENTITY STATS

Heading: ## ENTITY STATS

Description: Provides a quantitative summary of the site's structured data coverage. This gives AI systems an at-a-glance understanding of the site's scope and data richness.

Content requirements:

  • MUST be a Markdown unordered list
  • Each item MUST be a label–value pair in the format - {Label}: {Value}
  • SHOULD include at minimum: Total Entities and Pages

Standard statistics:

| Statistic | Description |
|---|---|
| Total Entities | Count of all extracted entities |
| Products/Software | Count of SoftwareApplication and Product entities |
| Services | Count of Service entities |
| Pages | Count of indexed pages |

Implementations MAY include additional statistics relevant to their domain (e.g., Articles, People, Locations, SKUs).

Example:

## ENTITY STATS

- Total Entities: 47
- Products/Software: 3
- Services: 5
- Pages: 22
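
The label-value format lends itself to direct extraction into a dictionary. A minimal sketch, assuming every value is a non-negative integer (optionally with thousands separators); parse_entity_stats is a hypothetical name and non-numeric items are skipped.

```python
def parse_entity_stats(body: str) -> dict[str, int]:
    """Return {label: count} from an ENTITY STATS section body."""
    stats = {}
    for line in body.split("\n"):
        line = line.strip()
        if not line.startswith("- "):
            continue
        # Split on the last colon so labels may themselves contain colons.
        label, sep, value = line[2:].rpartition(":")
        value = value.strip().replace(",", "")
        if sep and value.isdigit():
            stats[label.strip()] = int(value)
    return stats
```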

9. Optional Sections

The following sections MAY be included for additional context.

9.1 ABOUT

Heading: ## ABOUT

Description: A brief plain-text summary of the organization, suitable for inclusion in an AI system's context window.

Content requirements:

  • SHOULD be 2–5 sentences
  • SHOULD NOT duplicate the one_liner or paragraph from llm-index.json verbatim; a unique summary is preferred
  • MUST NOT contain marketing hyperbole; keep factual

Example:

## ABOUT

Acme Corp is a marketing technology company founded in 2020, headquartered in Miami, Florida. The company builds AI-powered tools for competitive intelligence, web analytics, and conversion optimization, serving over 2,500 business customers globally.

9.2 KEY FACTS

Heading: ## KEY FACTS

Description: A bullet list of the most important facts about the site or organization. This section is particularly valuable for local businesses and service providers where key differentiators (ratings, years in business, insurance acceptance, certifications) drive AI recommendations.

Content requirements:

  • MUST be a Markdown unordered list
  • SHOULD contain 3–10 items
  • Each item SHOULD be a single sentence or phrase

Example:

## KEY FACTS

- Founded in 2020
- Headquartered in Miami, FL
- 3 SaaS products
- 2,500+ customers
- SOC 2 Type II certified

9.3 VERIFICATION

Heading: ## VERIFICATION

Description: Declares the site's verification status within the LLM Disco Directory or other trust registries.

Content requirements:

  • MUST name the verifying directory
  • SHOULD include the listing URL
  • SHOULD include the conformance level

Example:

## VERIFICATION

Verified by: LLM Disco Directory  
Listing: https://llmdisco.com/sites/acmecorp  
Conformance: Level 3 (Agent-Ready)  

9.4 ACTIONS

Heading: ## ACTIONS

Description: Lists the primary actions available on the site, for AI agents that need to understand available interaction endpoints before consulting llm-index.json.

Content requirements:

  • Each action SHOULD be a list item with a name and URL
  • Actions SHOULD be limited to the 3–5 most important; full action definitions belong in llm-index.json

Example:

## ACTIONS

- Start Free Trial: https://www.acmecorp.com/signup
- Book a Demo: https://www.acmecorp.com/demo
- Contact Sales: https://www.acmecorp.com/contact

9.5 FAQ

Heading: ## FAQ

Description: A small set of frequently asked questions, useful for giving AI systems quick answers to common queries without requiring a fetch of the full FAQ from llm-index.json.

Content requirements:

  • Each FAQ entry MUST use an H3 heading (###) for the question and a paragraph for the answer
  • SHOULD contain no more than 5 entries; full FAQ content belongs in llm-index.json

Example:

## FAQ

### Do you offer a free trial?
Yes. All products include a 14-day free trial with no credit card required.

### What integrations do you support?
We integrate with Google Analytics, HubSpot, Salesforce, and 50+ other platforms.

10. Formatting Rules

10.1 Markdown Dialect

llms.txt files MUST use CommonMark-compatible Markdown. The following Markdown constructs are used:

| Construct | Syntax | Usage |
|---|---|---|
| H1 heading | # | File title (exactly one) |
| H2 heading | ## | Section headings |
| H3 heading | ### | Subsections (in START HERE and FAQ) |
| Unordered list | - | Lists of items |
| Horizontal rule | --- | Section separators |
| Inline link | [text](url) | MAY be used but bare URLs are preferred for machine readability |
| Bare URL | https://... | Preferred format for resource URLs |

10.2 Line Endings

Files MUST use Unix-style line endings (\n). Carriage return characters (\r) MUST NOT appear in the file.

10.3 Trailing Whitespace

Lines that should produce a Markdown line break (e.g., header fields, contact information) SHOULD end with two trailing spaces followed by a newline.

10.4 Blank Lines

Blank lines SHOULD be used to separate logical groups within a section. A single blank line SHOULD appear before and after each horizontal rule.

10.5 Character Encoding

The file MUST be encoded as UTF-8. A byte-order mark (BOM) MUST NOT be present.

10.6 No HTML

llms.txt files MUST NOT contain HTML tags. The file is plain text formatted with Markdown; HTML defeats the purpose of a lightweight, universally parseable format.
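
Several of the rules in this section can be checked mechanically on the raw bytes of the file. The linter below is a sketch under stated assumptions: lint_formatting is a hypothetical name, and the HTML check is a rough pattern match for things shaped like tags, not a full HTML or CommonMark parser.

```python
import re


def lint_formatting(raw: bytes) -> list[str]:
    """Return formatting problems found in the raw bytes of an llms.txt file."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte-order mark")
    if b"\r" in raw:
        problems.append("file contains carriage return characters")
    try:
        text = raw.decode("utf-8-sig")  # tolerate the BOM so later checks still run
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    h1_count = sum(1 for line in text.split("\n") if line.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1 heading, found {h1_count}")

    # Rough check for <tag>, <tag attr="...">, </tag>, or <tag/>.
    if re.search(r"</?[A-Za-z][A-Za-z0-9-]*(\s[^<>]*)?/?>", text):
        problems.append("file appears to contain HTML tags")
    return problems
```

The tag pattern deliberately does not match angle-bracket autolinks such as <https://example.com>, since the character after the scheme name is a colon.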


11. Size and Performance Considerations

11.1 Target Size

An llms.txt file SHOULD be under 4 KB. This ensures it fits comfortably within a single LLM context-window fetch even on systems with small context limits.

Files MUST NOT exceed 20 KB. If a site has too many products, services, or entities to fit within this limit, the PRODUCTS & SERVICES section SHOULD use the fallback pattern (referencing entities.json) rather than listing every item.

11.2 Token Budget

For reference, 4 KB of Markdown text is approximately 1,000–1,200 tokens in most tokenizers. This leaves ample room for the file to be included alongside other context in a typical 8K–128K context window.
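
The size limits and the token estimate can be combined into a single report. This is only a sketch: size_report is a hypothetical name, 1 KB is assumed to mean 1,024 bytes, and the token range simply scales the 1,000–1,200 tokens per 4 KB figure linearly, which real tokenizers will only approximate.

```python
TARGET_BYTES = 4 * 1024   # SHOULD be under 4 KB (assuming 1 KB = 1,024 bytes)
LIMIT_BYTES = 20 * 1024   # MUST NOT exceed 20 KB


def size_report(text: str) -> dict:
    """Classify a file's UTF-8 size and give a rough token estimate."""
    size = len(text.encode("utf-8"))
    if size > LIMIT_BYTES:
        status = "too large"
    elif size >= TARGET_BYTES:
        status = "over target"
    else:
        status = "ok"
    # Linear scaling of the 1,000-1,200 tokens per 4 KB rule of thumb.
    low = size * 1000 // TARGET_BYTES
    high = size * 1200 // TARGET_BYTES
    return {"bytes": size, "status": status, "approx_tokens": (low, high)}
```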

11.3 Compression

Servers SHOULD support gzip or Brotli compression for llms.txt. A typical file compresses to 30–50% of its original size.

11.4 Caching

Servers SHOULD set Cache-Control: public, max-age=86400 (24 hours). AI crawlers that maintain their own caches SHOULD respect standard HTTP caching headers.


12. Security Considerations

12.1 No Sensitive Information

llms.txt files MUST NOT contain:

  • Passwords, API keys, or authentication tokens
  • Internal IP addresses or infrastructure details
  • Personal data of individuals (employee names, emails, and phone numbers are acceptable only if they are already publicly available on the website)
  • Non-public pricing or business terms

12.2 URL Integrity

All URLs in the file MUST use HTTPS. HTTP URLs MUST NOT be used.
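
A validator can flag violations of this rule with a simple scan. A minimal sketch, with insecure_urls as a hypothetical name; the pattern stops at whitespace and common closing delimiters, which is an approximation of where a URL ends.

```python
import re


def insecure_urls(text: str) -> list[str]:
    """Return every http:// URL found in the file text."""
    return re.findall(r"\bhttp://[^\s)>\]]+", text)
```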

12.3 Cross-Origin Considerations

Because llms.txt is a plain-text file intended for broad consumption, the Access-Control-Allow-Origin: * header is appropriate. However, the file MUST NOT be used as a vector for injecting content into AI systems. Implementations SHOULD NOT include prompt-injection patterns, hidden instructions, or adversarial text.

12.4 Canonical Verification

AI systems consuming llms.txt SHOULD verify that the Canonical-Site URL in the header block matches the domain from which the file was served (allowing for the www. / ai. subdomain convention). Files with mismatched canonical claims SHOULD be treated with reduced trust.
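
One way a consumer might implement this check is to strip the conventional www. or ai. label from both hosts and compare what remains. The sketch below is deliberately naive and the names base_host and canonical_matches are hypothetical; it does not consult the Public Suffix List, so a host such as ai.com would be reduced incorrectly.

```python
from urllib.parse import urlsplit

CONVENTION_PREFIXES = ("www.", "ai.")


def base_host(url: str) -> str:
    """Return the URL's host with a leading 'www.' or 'ai.' label removed."""
    host = (urlsplit(url).hostname or "").lower()
    for prefix in CONVENTION_PREFIXES:
        if host.startswith(prefix):
            return host[len(prefix):]
    return host


def canonical_matches(served_from: str, canonical_site: str) -> bool:
    """True if the serving URL and the Canonical-Site value share a base host."""
    served = base_host(served_from)
    return served != "" and served == base_host(canonical_site)
```

A False result does not prove the file is malicious; per the text above, it is a signal to treat the file's claims with reduced trust.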


13. IANA Considerations

13.1 Link Relation

This specification relies on the ai-manifest link relation defined in the AI Discovery Page 1.0 Specification. No additional link relation registration is required.

13.2 Well-Known URI

This specification does not define a well-known URI. The file is served at /llms.txt on the relevant domain, following the convention established by robots.txt.

13.3 Media Type

The file SHOULD be served as text/plain; charset=utf-8. Implementations MAY serve it as text/markdown; charset=utf-8 if the server supports this media type.


14. Examples

14.1 Minimal Conforming File

This example shows the minimum required content for a valid llms.txt file under this specification:

# Acme Dental AI Optimization Layer
Version: 1.0  
Last-Updated: 2026-02-07  
Canonical-Site: https://www.acmedental.com  

---

## PURPOSE

This is an AI-readable representation of Acme Dental content optimized for:

- LLM crawlers  
- AI search systems  
- Agentic retrieval workflows  
- Entity-based knowledge extraction  

---

## CANONICAL AUTHORITY

The canonical source of truth is always: https://www.acmedental.com

---

## START HERE (CRAWLING GUIDANCE)

### 1. Full Site Intelligence (Recommended)
https://ai.acmedental.com/llm-index.json

---

## CONTACT

Acme Dental  
https://www.acmedental.com

---
EOF

14.2 Complete File — SaaS Company

# Acme Corp AI Optimization Layer
Version: 1.0  
Last-Updated: 2026-02-07  
Primary-Language: en-US  

Canonical-Site: https://www.acmecorp.com  
AI-Mirror: https://ai.acmecorp.com  

---

## PURPOSE

This is an AI-readable representation of Acme Corp content optimized for:

- LLM crawlers  
- AI search systems  
- Agentic retrieval workflows  
- Entity-based knowledge extraction  

---

## CANONICAL AUTHORITY

The canonical source of truth is always: https://www.acmecorp.com

---

## START HERE (CRAWLING GUIDANCE)

### 1. Full Site Intelligence (Recommended)
https://ai.acmecorp.com/llm-index.json

### 2. Entity Index
https://ai.acmecorp.com/entities.json

### 3. Knowledge Graph
https://ai.acmecorp.com/knowledge.graph.jsonld

### 4. Sitemap
https://ai.acmecorp.com/sitemap.xml

---

## ABOUT

Acme Corp is a marketing technology company founded in 2020, headquartered in Miami, Florida. The company builds AI-powered tools for competitive intelligence, web analytics, and conversion optimization, serving over 2,500 business customers globally.

---

## PRODUCTS & SERVICES

- SPYBOX:  
  https://ai.acmecorp.com/products/spybox

- Statilitix:  
  https://ai.acmecorp.com/products/statilitix

- SEO Consulting:  
  https://ai.acmecorp.com/services/seo-consulting

---

## ENTITY STATS

- Total Entities: 47
- Products/Software: 3
- Services: 5
- Pages: 22

---

## ACTIONS

- Start Free Trial: https://www.acmecorp.com/signup
- Book a Demo: https://www.acmecorp.com/demo
- Contact Sales: https://www.acmecorp.com/contact

---

## VERIFICATION

Verified by: LLM Disco Directory  
Listing: https://llmdisco.com/sites/acmecorp  
Conformance: Level 3 (Agent-Ready)  

---

## CONTACT

Acme Corp  
info@acmecorp.com  
https://www.acmecorp.com

---
EOF

14.3 Complete File — Local Business

# Acme Dental AI Optimization Layer
Version: 1.0  
Last-Updated: 2026-02-07  
Primary-Language: en-US  

Canonical-Site: https://www.acmedental.com  
AI-Mirror: https://ai.acmedental.com  

---

## PURPOSE

This is an AI-readable representation of Acme Dental content optimized for:

- LLM crawlers  
- AI search systems  
- Agentic retrieval workflows  
- Entity-based knowledge extraction  

---

## CANONICAL AUTHORITY

The canonical source of truth is always: https://www.acmedental.com

---

## START HERE (CRAWLING GUIDANCE)

### 1. Full Site Intelligence (Recommended)
https://ai.acmedental.com/llm-index.json

### 2. Entity Index
https://ai.acmedental.com/entities.json

### 3. Knowledge Graph
https://ai.acmedental.com/knowledge.graph.jsonld

### 4. Sitemap
https://ai.acmedental.com/sitemap.xml

---

## PRODUCTS & SERVICES

- Dental Cleaning:  
  https://ai.acmedental.com/services/cleaning

- Teeth Whitening:  
  https://ai.acmedental.com/services/whitening

---

## ENTITY STATS

- Total Entities: 8
- Products/Software: 0
- Services: 2
- Pages: 4

---

## KEY FACTS

- Serving Tampa since 2006
- 4.8 star rating (127 reviews)
- Same-day emergencies available
- Most insurance accepted

---

## FAQ

### Do you accept my insurance?
We accept most major dental insurance including Delta, Cigna, MetLife, and Aetna. Call us to verify your specific plan.

### Do you offer emergency appointments?
Yes. We reserve time for same-day emergencies. Call us immediately if you are experiencing dental pain.

---

## VERIFICATION

Verified by: LLM Disco Directory  
Listing: https://llmdisco.com/sites/acmedental  
Conformance: Level 3 (Agent-Ready)  

---

## CONTACT

Acme Dental  
info@acmedental.com  
+1-813-555-1234  
https://www.acmedental.com

---
EOF

14.4 Complete File — Ecommerce Store

# Acme Outdoors AI Optimization Layer
Version: 1.0  
Last-Updated: 2026-02-07  
Primary-Language: en-US  

Canonical-Site: https://www.acmeoutdoors.com  
AI-Mirror: https://ai.acmeoutdoors.com  

---

## PURPOSE

This is an AI-readable representation of Acme Outdoors content optimized for:

- LLM crawlers  
- AI search systems  
- Agentic retrieval workflows  
- Entity-based knowledge extraction  
- AI shopping assistants  

---

## CANONICAL AUTHORITY

The canonical source of truth is always: https://www.acmeoutdoors.com

---

## START HERE (CRAWLING GUIDANCE)

### 1. Full Site Intelligence (Recommended)
https://ai.acmeoutdoors.com/llm-index.json

### 2. Entity Index
https://ai.acmeoutdoors.com/entities.json

### 3. Product Feed
https://ai.acmeoutdoors.com/products.json

### 4. Knowledge Graph
https://ai.acmeoutdoors.com/knowledge.graph.jsonld

### 5. Sitemap
https://ai.acmeoutdoors.com/sitemap.xml

---

## PRODUCTS & SERVICES

- Alpine Pro Tent (4-Person):  
  https://ai.acmeoutdoors.com/products/alpine-pro-tent

- TrailMaster Hiking Boots:  
  https://ai.acmeoutdoors.com/products/trailmaster-boots

- Summit Down Jacket:  
  https://ai.acmeoutdoors.com/products/summit-down-jacket

See entities.json for full catalog (240+ products).

---

## ENTITY STATS

- Total Entities: 267
- Products: 243
- Pages: 38
- Articles: 24

---

## KEY FACTS

- Family-owned since 1998
- 243 products across camping, hiking, and climbing
- Free shipping on orders over $75
- 30-day return policy
- 4.7 star average (2,100+ reviews)

---

## ACTIONS

- Shop All Products: https://www.acmeoutdoors.com/shop
- Find a Store: https://www.acmeoutdoors.com/stores

---

## CONTACT

Acme Outdoors  
support@acmeoutdoors.com  
+1-303-555-7890  
https://www.acmeoutdoors.com

---
EOF
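
Each example above lists its machine-readable resources under START HERE as an H3 label followed by a URL on the next line. A consumer could recover those pairs with a sketch like the one below. The function name `start_here_resources` is hypothetical, and the handling of the leading ordinal and the trailing "(Recommended)" note is an assumption drawn from the examples rather than a stated parsing rule.

```python
import re


def start_here_resources(text: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs found in the START HERE section."""
    resources = []
    in_section = False
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Entering a new H2 section; only START HERE is of interest.
            in_section = line[3:].startswith("START HERE")
            label = None
        elif not in_section:
            continue
        elif line.startswith("### "):
            # Drop the "1. " ordinal and any trailing parenthetical note.
            label = re.sub(r"^\d+\.\s*", "", line[4:])
            label = re.sub(r"\s*\(.*\)$", "", label)
        elif label and line.startswith(("http://", "https://")):
            resources.append((label, line))
            label = None
    return resources
```

Applied to the ecommerce example, this would yield five pairs, starting with the llm-index.json URL labeled "Full Site Intelligence".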

15. Changelog

Version 1.0 (February 2026)

  • Initial specification release
  • Defined header block with required, recommended, and optional fields
  • Required sections: PURPOSE, CANONICAL AUTHORITY, START HERE, CONTACT
  • Recommended sections: PRODUCTS & SERVICES, ENTITY STATS
  • Optional sections: ABOUT, KEY FACTS, VERIFICATION, ACTIONS, FAQ
  • Discovery mechanisms aligned with ADP 1.0 and LLM-LD 1.0
  • Size limit: 20 KB maximum, 4 KB target
  • Backwards compatibility with llmstxt.org convention

Appendix A: Section Reference

Quick reference of all sections by requirement level:

Required

| Section | Heading | Purpose |
|---|---|---|
| PURPOSE | ## PURPOSE | Declare file intent and AI use cases |
| CANONICAL AUTHORITY | ## CANONICAL AUTHORITY | Name the authoritative source URL |
| START HERE | ## START HERE (CRAWLING GUIDANCE) | Enumerate machine-readable resources |
| CONTACT | ## CONTACT | Provide basic contact information |

Recommended

| Section | Heading | Purpose |
|---|---|---|
| PRODUCTS & SERVICES | ## PRODUCTS & SERVICES | List offerings with AI-layer URLs |
| ENTITY STATS | ## ENTITY STATS | Quantitative summary of structured data |

Optional

| Section | Heading | Purpose |
|---|---|---|
| ABOUT | ## ABOUT | Brief organizational summary |
| KEY FACTS | ## KEY FACTS | Bullet list of important facts |
| VERIFICATION | ## VERIFICATION | Directory membership and conformance |
| ACTIONS | ## ACTIONS | Primary interaction endpoints |
| FAQ | ## FAQ | Common questions and answers |

Appendix B: Header Field Reference

| Field | Required | Type | Example |
|---|---|---|---|
| Version | Yes | String | 1.0 |
| Last-Updated | Yes | Date | 2026-02-07 |
| Canonical-Site | Yes | URL | https://www.example.com |
| Primary-Language | Recommended | String | en-US |
| AI-Mirror | Recommended | URL | https://ai.example.com |
| Generator | Optional | String | aso-generator/2.0 |
| LLM-LD-Version | Optional | String | 1.0 |
| Conformance-Level | Optional | Integer | 3 |
| Expires | Optional | Date | 2026-02-14 |
| Directory-Listing | Optional | URL | https://llmdisco.com/sites/example |
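
The Type column above suggests a simple typed parse of the header block. The sketch below is illustrative only: `parse_header_block` and `FIELD_TYPES` are hypothetical names, dates are assumed to be ISO 8601 calendar dates (as in the examples), URL fields are assumed to be absolute http(s) URLs, and unknown fields are kept as plain strings.

```python
from datetime import date
from urllib.parse import urlparse


def _as_url(value: str) -> str:
    """Accept only absolute http(s) URLs (an assumption, see lead-in)."""
    parts = urlparse(value)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {value!r}")
    return value


# Field name -> converter, following the Type column of the table.
FIELD_TYPES = {
    "Version": str,
    "Last-Updated": date.fromisoformat,
    "Canonical-Site": _as_url,
    "Primary-Language": str,
    "AI-Mirror": _as_url,
    "Generator": str,
    "LLM-LD-Version": str,
    "Conformance-Level": int,
    "Expires": date.fromisoformat,
    "Directory-Listing": _as_url,
}


def parse_header_block(text: str) -> dict:
    """Parse "Key: value" lines between the H1 heading and the first '---'."""
    fields = {}
    for line in text.splitlines()[1:]:  # skip the H1 heading
        line = line.strip()
        if line == "---":
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or non key-value line
        key, value = key.strip(), value.strip()
        convert = FIELD_TYPES.get(key, str)  # unknown fields stay strings
        fields[key] = convert(value)
    return fields
```

Splitting on the first colon only keeps URL values intact, since the scheme separator in `https://` is never treated as a key delimiter.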

Appendix C: Implementation Checklist

Minimal Conformance

  • [ ] File served at /llms.txt
  • [ ] H1 heading with site name
  • [ ] Header block with Version, Last-Updated, Canonical-Site
  • [ ] PURPOSE section with use-case list
  • [ ] CANONICAL AUTHORITY section naming the canonical URL
  • [ ] START HERE section with at least llm-index.json URL
  • [ ] CONTACT section with organization name and at least one contact method
  • [ ] EOF marker
  • [ ] UTF-8 encoding, no BOM
  • [ ] File size under 20 KB

Recommended

  • [ ] Primary-Language and AI-Mirror header fields
  • [ ] PRODUCTS & SERVICES section
  • [ ] ENTITY STATS section
  • [ ] CORS header (Access-Control-Allow-Origin: *)
  • [ ] Cache-Control header set
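
The byte-level and HTTP-level items in the checklist above can be tested without parsing the Markdown at all. This is a rough sketch under stated assumptions: both function names are hypothetical, 1 KB is taken to mean 1024 bytes, the 20 KB figure is treated as an inclusive ceiling, and response headers are supplied by the caller as a plain dictionary.

```python
import codecs

MAX_BYTES = 20 * 1024  # assumption: 1 KB = 1024 bytes, limit is inclusive


def check_file_bytes(raw: bytes) -> list[str]:
    """Check encoding and size of the raw llms.txt payload."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes; limit is {MAX_BYTES}")
    return problems


def check_response_headers(headers: dict[str, str]) -> list[str]:
    """Check the CORS and caching headers named in the checklist."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    if lowered.get("access-control-allow-origin") != "*":
        problems.append("Access-Control-Allow-Origin is not '*'")
    if not lowered.get("cache-control"):
        problems.append("Cache-Control header not set")
    return problems
```

Measuring `len(raw)` on the undecoded bytes matters because multi-byte UTF-8 characters make the byte count larger than the character count.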

Full Implementation

  • [ ] All optional header fields populated
  • [ ] ABOUT, KEY FACTS, VERIFICATION, ACTIONS, FAQ sections
  • [ ] Referenced from ADP via <link rel="ai-manifest">
  • [ ] Referenced from robots.txt via LLMs-Txt: directive
  • [ ] Registered with LLM Disco Directory
  • [ ] Automated regeneration on content change
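
The two discovery references in the checklist above, the `LLMs-Txt:` directive in robots.txt and the `<link rel="ai-manifest">` element in the ADP, could be located with a sketch like the following. The function names are hypothetical. It is an assumption that the directive value is a URL on the same line, that the directive name is matched case-insensitively, and that the link element carries the manifest location in its `href` attribute.

```python
from html.parser import HTMLParser


def manifest_from_robots(robots_txt: str) -> list[str]:
    """Return the values of every LLMs-Txt: line in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "llms-txt" and value.strip():
            urls.append(value.strip())
    return urls


class _ManifestLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes ai-manifest."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "ai-manifest" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def manifest_from_html(html: str) -> list[str]:
    """Return manifest hrefs declared in an HTML page such as the ADP."""
    finder = _ManifestLinkFinder()
    finder.feed(html)
    return finder.hrefs
```

A relative `href` is returned unchanged here; a caller would resolve it against the page URL before fetching.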

Appendix D: Relationship to llmstxt.org

This specification and the llmstxt.org proposal share the same filename convention (/llms.txt) and the same basic Markdown format. They serve different but complementary purposes:

| Aspect | llmstxt.org (Howard) | LLMs.txt 1.0 (This Specification) |
|---|---|---|
| Primary use case | Developer documentation, software projects | Business websites, local businesses, SaaS, ecommerce |
| Typical content | Links to Markdown docs and API references | Business identity, structured-data resources, products/services |
| Companion files | llms-full.txt, per-page .md variants | llm-index.json, entities.json, knowledge.graph.jsonld |
| Structure | H1 + blockquote + free-form H2 sections | H1 + header block + defined section vocabulary |
| Metadata | None | Structured header block (Version, Canonical-Site, etc.) |
| Section names | Free-form (author chooses) | Defined vocabulary (PURPOSE, START HERE, etc.) |
| Ecosystem | Standalone convention | Part of LLM-LD + ADP + LLM Disco Directory |
| Spec maturity | Informal proposal | Formal specification with conformance requirements |

Backwards compatibility: A file conforming to this specification is readable by any system that expects a generic llmstxt.org-style file. The H1 heading, descriptive content, and linked URLs will be intelligible to any consumer. However, a generic llmstxt.org file will not necessarily conform to this specification, as it may lack the required sections or header block.

Coexistence: A website MAY publish both an llmstxt.org-style file (for developer documentation use cases) and a file conforming to this specification (for business discovery use cases). In practice, the use cases rarely overlap: a dentist's office has no need for a documentation-focused file, and a Python library has no need for a business identity manifest. Sites that serve both purposes (e.g., a SaaS company with both a marketing site and developer docs) MAY combine both approaches in a single file by including documentation links as additional resources in the START HERE section.


Appendix E: Glossary

| Term | Definition |
|---|---|
| AI Layer | The collection of machine-readable resources published for AI consumption |
| AI Mirror | An AI-optimized subdomain serving enhanced structured data |
| AI Search Optimization (ASO) | The practice of structuring content for AI search visibility |
| Canonical Site | The primary human-facing website; authoritative source of truth |
| Conformance Level | An LLM-LD tier of implementation (Crawl-Ready, Ingest-Ready, Agent-Ready) |
| EOF Marker | The literal string EOF at the end of the file |
| Generative Engine Optimization (GEO) | Alternate term for ASO, emphasizing generative AI search |
| Header Block | Key-value metadata between the H1 heading and the first horizontal rule |
| Manifest | The llms.txt file itself |
| Resource | A machine-readable file in the AI layer |
| Section | An H2-headed block of content within the file |

Acknowledgments

This specification formalizes patterns that emerged organically across the AI search optimization community. It builds on the llmstxt.org convention proposed by Jeremy Howard (Answer.AI) in September 2024, extending it for the business website use case that practitioners in the SEO, GEO, and ASO communities have widely adopted. The specification was developed by CAPXEL LLC with input from early adopters of the LLM-LD standard and the LLM Disco Directory network.


Copyright © 2026 CAPXEL LLC. This specification is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).


End of Specification