Configuring robots.txt
How to direct AI crawlers to your AI-optimized content while keeping them away from your main site. Two files, clean separation.
The two-file strategy
When you set up an AI subdomain, you need two separate robots.txt files with opposite rules:
- Main site (yoursite.com/robots.txt) — blocks AI crawlers, allows search engines
- AI subdomain (ai.yoursite.com/robots.txt) — allows AI crawlers, blocks search engines
Your main site is for humans and Google. Your AI subdomain is for ChatGPT, Claude, and Perplexity. robots.txt enforces the separation.
Why separate sites?
The core idea behind the two-site architecture is independence. Your human-facing website and your AI-facing content have different audiences, different needs, and will evolve differently over time.
Future-proofing your AI content
AI systems are changing fast. What AI needs to understand your business today might be different from what it needs in six months. With a separate AI layer, you can:
- Add AI-specific content that doesn't belong on your main site (detailed entity relationships, machine-readable FAQs, structured pricing data)
- Remove content that confuses AI without affecting what humans see (marketing fluff, outdated promotions, content that gets misinterpreted)
- Experiment and iterate on your AI presence without touching your production website
- Respond to new AI capabilities as they emerge (new structured data formats, agent-specific instructions, real-time feeds)
If your AI content lives on your main site, you're stuck. Every change optimized for AI risks breaking something for humans, and vice versa. You can't serve two masters from one page.
Think of it like responsive design
You wouldn't serve the exact same layout to mobile and desktop — you adapt to the device. The AI subdomain is the same concept: adapting your content to the consumer. Humans get a beautiful, interactive website. AI gets clean, structured, machine-optimized data.
Looking ahead: As AI agents become more capable (booking appointments, making purchases, comparing options), your AI layer can evolve to support these interactions — without cluttering your human experience with machine-readable instructions.
Why block AI from your main site?
This might seem counterintuitive — don't you want AI to crawl your site? Yes, but you want AI to crawl the right version of your site. Here's why:
1. Signal-to-noise ratio
Your main site is built for humans: navigation menus, hero images, JavaScript frameworks, cookie banners, chat widgets. AI crawlers have to wade through all of this to extract the actual content. Your AI subdomain is pure signal — clean HTML, rich Schema.org markup, no clutter.
2. Consistent structured data
Your AI subdomain has guaranteed Schema.org markup on every page, plus llm-index.json, entities, and knowledge graphs. Your main site might have inconsistent or missing structured data depending on how it was built.
3. Control over what AI learns
When AI crawls your main site, you don't control what it extracts or how it interprets your content. With an AI subdomain, you're explicitly defining what AI should know about your business — the same way you'd brief a new employee.
4. Avoid duplicate/conflicting information
If AI crawls both your main site and your AI subdomain, it might find slightly different information (different wording, outdated pages on the main site, etc.). By blocking the main site, you ensure AI only sees your canonical, structured version.
Exception: AI Discovery Page. Your ADP at /ai-discovery should always be crawlable — it's the bridge that points AI crawlers to your AI subdomain. Make sure to add Allow: /ai-discovery in your main site's robots.txt.
The traffic flow
Here's how crawlers are directed with this setup:
- AI crawler (wants your content): yoursite.com robots.txt → blocked, except /ai-discovery → allowed (the bridge) → follows it to ai.yoursite.com, where robots.txt → allowed
- Search crawler (indexing for search): yoursite.com robots.txt → allowed
The result: search engines index your human-friendly main site, AI systems consume your machine-optimized AI subdomain, and the AI Discovery Page connects them.
Main site robots.txt
This file blocks all known AI crawlers while preserving access for traditional search engines. Place it at yoursite.com/robots.txt.
Why Allow: /ai-discovery appears for every AI crawler: In robots.txt, a crawler obeys only the most specific User-agent group that matches it and ignores the wildcard group entirely. If you block GPTBot with Disallow: /, a separate User-agent: * Allow: /ai-discovery rule does nothing for GPTBot — it never reads the wildcard group. Each AI crawler's group therefore needs its own explicit Allow: /ai-discovery line alongside its Disallow: /, so it can find the bridge page that directs it to your AI subdomain. (Under RFC 9309 the longest matching path wins regardless of order, but listing the Allow first also satisfies older first-match parsers.)
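Putting those rules together, a minimal main-site file might look like the following sketch. The crawler list is illustrative rather than exhaustive (see the user-agent list below), and yoursite.com stands in for your own domain:

```
# yoursite.com/robots.txt — main site: humans and search engines

# Traditional search engines: full access
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI crawlers: blocked, except the bridge page
User-agent: GPTBot
Allow: /ai-discovery
Disallow: /

User-agent: ClaudeBot
Allow: /ai-discovery
Disallow: /

User-agent: PerplexityBot
Allow: /ai-discovery
Disallow: /

User-agent: Google-Extended
Allow: /ai-discovery
Disallow: /
```

Repeat the Allow/Disallow pair for every AI user-agent you track; each group is evaluated independently.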
AI subdomain robots.txt
This file does the opposite: welcomes all AI crawlers while blocking traditional search engines (to avoid duplicate content). Place it at ai.yoursite.com/robots.txt.
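A matching sketch for the AI subdomain, again with an illustrative crawler list:

```
# ai.yoursite.com/robots.txt — AI subdomain: machine-optimized content

# AI crawlers: full access
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Traditional search engines: blocked to avoid duplicate content
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /
```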
Quick reference
Here's a summary of what each crawler type sees:
| Crawler | Main Site | AI Subdomain |
|---|---|---|
| GPTBot (OpenAI) | Blocked | Allowed |
| ClaudeBot (Anthropic) | Blocked | Allowed |
| PerplexityBot | Blocked | Allowed |
| Google-Extended (AI) | Blocked | Allowed |
| Googlebot (Search) | Allowed | Blocked |
| Bingbot (Search) | Allowed | Blocked |
| /ai-discovery page | Allowed for all | — |
Keeping up with new crawlers
New AI crawlers appear regularly. When a new LLM or AI system launches, it typically announces its user-agent string. Add new crawlers to both files — blocked on main, allowed on AI subdomain.
Current major AI crawler user-agents to track:
- OpenAI: GPTBot, ChatGPT-User, OAI-SearchBot
- Anthropic: anthropic-ai, ClaudeBot, Claude-Web, Claude-SearchBot
- Google: Google-Extended, GoogleOther, Gemini-AI
- Meta: FacebookBot, meta-externalagent
- Perplexity: PerplexityBot, Perplexity-User
- Others: Amazonbot, Applebot-Extended, Bytespider, CCBot, cohere-ai, DeepSeekBot
Testing your setup
After deploying both files:
- Visit yoursite.com/robots.txt — verify AI crawlers are blocked
- Visit ai.yoursite.com/robots.txt — verify AI crawlers are allowed
- Check that /ai-discovery is explicitly allowed on the main site
- Use Google Search Console's robots.txt report (the successor to the retired robots.txt Tester) to validate syntax
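Beyond eyeballing the files, you can sanity-check the rules offline with Python's built-in urllib.robotparser. A small sketch, assuming main-site rules like the ones shown earlier (yoursite.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the hypothetical main-site rules
MAIN_SITE_ROBOTS = """\
User-agent: GPTBot
Allow: /ai-discovery
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(MAIN_SITE_ROBOTS.splitlines())

# GPTBot should be blocked everywhere except the bridge page
print(parser.can_fetch("GPTBot", "https://yoursite.com/pricing"))       # False
print(parser.can_fetch("GPTBot", "https://yoursite.com/ai-discovery"))  # True

# Googlebot keeps full access to the main site
print(parser.can_fetch("Googlebot", "https://yoursite.com/pricing"))    # True
```

Run the same check against the AI-subdomain file with the expectations inverted; a quick assertion script like this catches the "files swapped" mistake described below before it ships.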
Common mistake: Getting the two files swapped. Double-check that the main site blocks AI and the AI subdomain allows AI. It's easy to mix up.
Next: Build your AI Discovery Page
The ADP is the bridge that connects your main site to your AI subdomain.