AI crawler documentation converged on IP verification, opt-out granularity, and infrastructure-level routing controls
Over the past year, AI crawler documentation has shifted toward operational maturity. Vendors have published or updated IP-range lists (Anthropic, Amazon, OpenAI, Google), clarified and expanded user-agent strings to distinguish search, training, and specialized roles (Amazon now documents three distinct bots; OpenAI disclosed OAI-AdsBot for the first time), and moved opt-out mechanisms from binary robots.txt blocks toward granular page-level tags and infrastructure redirects. At the same time, platform providers, most prominently Cloudflare, have added canonical-redirect and content-format features that let publishers steer AI training crawlers without affecting search traffic, signaling a shift from blocking to routing. Perplexity's adoption of llms.txt and Cloudflare's visibility tooling suggest the ecosystem is moving toward declarative, machine-readable site postures. These changes respond to three converging pressures: firewall operators need IP ranges for accurate verification; publishers want fine-grained controls to monetize or selectively block; and crawler vendors want clearer signals (llms.txt, the Directives tab) to avoid over-crawling or missing content.
- IP range publication expansion
- Granular opt-out mechanisms
- UA string disambiguation
- Infrastructure-level routing controls
- Machine-readable site standards
Synthesized by Claude Haiku 4.5 from the last 365 days of detected events in this pillar. Regenerated on each daily run.
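For readers who want a concrete picture of the binary robots.txt opt-out that the summary contrasts with newer controls, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt policy, the `example.com` URL, the `ExampleSearchBot` name, and the `is_allowed` helper are all hypothetical; only the CCBot user-agent string is taken from the entries in this feed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block one crawler by product token, allow everyone else.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    ccbot_ua = "CCBot/2.0 (https://commoncrawl.org/faq/)"
    print(is_allowed(ccbot_ua, "https://example.com/page"))
    print(is_allowed("ExampleSearchBot/1.0", "https://example.com/page"))
```

This only models what a well-behaved crawler would conclude from the file; it does not enforce anything, which is one reason publishers pair robots.txt with IP verification or infrastructure-level rules.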
New CCBot IPv4 CIDR block added: 3.41.188.32/29
A new IPv4 CIDR block `3.41.188.32/29` has been appended to the [CCBot IP range list](https://commoncrawl.org/faq) under the "What is the IP range of the Common Crawl CCBot?" section. The date stamp on the block list sti
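A firewall operator could check whether a request's source address falls inside the newly listed block with Python's standard `ipaddress` module. This is a sketch only: the helper name `in_ccbot_block` is hypothetical, and a real check would load the full published list rather than a single block.

```python
import ipaddress

# The block named in this entry; a /29 spans 8 addresses.
CCBOT_BLOCK = ipaddress.ip_network("3.41.188.32/29")


def in_ccbot_block(ip: str) -> bool:
    """Return True if ip falls inside the single CIDR block above."""
    return ipaddress.ip_address(ip) in CCBOT_BLOCK


if __name__ == "__main__":
    print(CCBOT_BLOCK.num_addresses)
    print(in_ccbot_block("3.41.188.33"))
    print(in_ccbot_block("3.41.188.40"))
```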
Perplexity bots doc adds llms.txt documentation index pointer
A new "Documentation Index" header block was prepended to the [Perplexity bots guide](https://docs.perplexity.ai/guides/bots), directing readers (and LLM crawlers) to fetch [https://docs.perplexity.ai/llms.txt](https://d
Digest Refresh: 8 Items Dropped, 2 New Items Added, Framing Shifts on Pay-Per-Crawl and Cloudflare Features
The crawler insights digest was substantially refreshed. Eight previous items were removed entirely: dedicated AI training crawlers approaching 50% of bot traffic; publisher struggles with the third-party scraper economy
New 30-day crawler digest: 14 items covering Cloudflare pay-per-crawl, redirects for AI training, bot traffic milestones, and publisher blocking trends
This source went from empty to a 14-item digest covering the AI crawler and bot-policy landscape through mid-April 2026. Key hard facts include: (1) [Cloudflare's "Redirects for AI Training"](https://blog.cloudflare.com/
Amazon expands crawler doc to three distinct bots with explicit AI training disclosures and new UA strings
The [Amazonbot developer page](https://developer.amazon.com/amazonbot) was substantially overhauled: it now documents **three separate crawlers** — `Amazonbot` (general; explicitly "may be used to train Amazon AI models"
Anthropic publishes IP range list for crawler verification, replacing "we do not publish IP ranges" statement
The previous text explicitly stated "we do not currently publish IP ranges, as we use service provider public IPs. This may change in the future." The [current page](https://support.anthropic.com/en/articles/8896518-does
CCBot UA string clarified with full URL, new "How does CCBot fetch a web page?" section added, ZStandard compression support added
Three substantive changes on the [Common Crawl FAQ](https://commoncrawl.org/faq): (1) The current UA string is now explicitly stated as `CCBot/2.0 (https://commoncrawl.org/faq/)` — the previous version only said the bot
Google renames crawler IP range JSON object from `googlebot.json` to `common-crawlers.json`
The [Google common crawlers reference page](https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers) changed the named IP-range data source for common crawlers from `googlebot.json` to `common-
OpenAI publishes full crawler/bot documentation page with four UA strings, including new OAI-AdsBot
The [OpenAI crawlers page](https://developers.openai.com/api/docs/bots) was created from scratch (previously empty), documenting four user agents: **OAI-SearchBot/1.3** (`Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) A
Amazonbot doc rewritten: adds meta-tag directives (noarchive/noindex), drops detailed robots.txt field listing and 24-hour refresh SLA
The page was substantially rewritten: (1) branding shifted from "Amazonbot" to "Amazon crawlers" throughout; (2) a new paragraph explicitly states that Amazon crawlers honor link-level `rel=nofollow` and page-level robot
IP range JSON source renamed from googlebot.json to common-crawlers.json
The authoritative JSON object for common crawler IP ranges was renamed from `googlebot.json` to `common-crawlers.json`. The page's last-updated date also advanced from 2025-04-25 to 2026-02-11.
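Consumers that hard-code the old file name would need to point at the new one. The sketch below shows one way to turn such a JSON object into networks for verification. It assumes the object has a top-level `prefixes` array whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key; that structure is an assumption here, not something this entry confirms. The sample payload uses documentation-reserved ranges and is entirely hypothetical, as are the helper names.

```python
import ipaddress
import json

# Hypothetical payload using documentation-reserved ranges (not real crawler IPs).
SAMPLE_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(text: str) -> list:
    """Parse the assumed `prefixes` structure into ip_network objects."""
    data = json.loads(text)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_listed(ip: str, networks: list) -> bool:
    """Return True if ip belongs to any of the loaded networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    nets = load_networks(SAMPLE_JSON)
    print(is_listed("192.0.2.17", nets))
    print(is_listed("198.51.100.1", nets))
```

Keeping the file name in configuration rather than in code would make a rename like this one a one-line change.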
Cloudflare AI Crawl Control adds 301-redirect feature for AI training crawlers hitting canonical URLs
A new feature — "Redirects for AI Training" — has been added to [Cloudflare's AI Crawl Control](https://developers.cloudflare.com/ai-crawl-control/reference/redirects-for-ai-training/). When toggled on via **AI Crawl Con
Cloudflare AI Crawl Control adds Content Format insights and renames Robots.txt tab to "Directives"
Cloudflare's [AI Crawl Control changelog](https://developers.cloudflare.com/changelog/post/2026-04-17-tools-for-agentic-internet/) documents two new additions: (1) a **Content Format** chart in the Metrics tab showing wh
Cloudflare Radar AI Insights adds three new AI bot/crawler visibility features
Cloudflare Radar's [AI Insights page](https://radar.cloudflare.com/ai-insights) gained three new features (announced 2026-04-17): (1) an [Adoption of AI Agent Standards widget](https://radar.cloudflare.com/ai-insights#ad
Cloudflare AI Crawl Control adds WAF rule preservation for custom modifications
A new capability was added to Cloudflare AI Crawl Control: custom modifications made directly in the WAF custom rules editor (e.g., path-based exceptions, extra user agents, additional expression clauses) are now preserv