AI crawler documentation shifted toward IP verification, bot-specific UA disclosure, and infrastructure monetization
Over the past year, AI crawler documentation has undergone four major structural shifts. First, leading vendors—Anthropic, Amazon, Google, and OpenAI—have published or clarified canonical IP-range lists, enabling site operators to verify crawler authenticity via network-layer detection rather than User-Agent matching alone. Second, crawler UA strings have proliferated and become more granular: OpenAI disclosed OAI-AdsBot for ad-submission traffic, Amazon split Amazonbot into three distinct agents with independent AI-training semantics, and CCBot's canonical string was formalized with a trailing URL. Third, page-level opt-out mechanisms (Amazon's `noarchive` meta tag, Cloudflare's canonical redirects and pay-per-crawl) have emerged alongside robots.txt, widening the toolkit available to publishers. Fourth, Cloudflare's AI Crawl Control and Radar suite have matured into operational platforms—adding content-format diagnostics, WAF rule layering, and public visibility metrics for adoption of llms.txt and AI-specific robots.txt directives—suggesting crawler policy enforcement is becoming a managed service rather than a DIY configuration task.
- IP verification expansion
- UA string proliferation
- Multi-layer opt-out mechanisms
- Managed crawler infrastructure
- Publisher monetization models
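The network-layer verification pattern behind the first shift can be sketched in a few lines; the prefixes below are illustrative documentation ranges (RFC 5737/3849), not any vendor's real list, and in practice the list would be loaded from the vendor's published source:

```python
import ipaddress

# Illustrative prefixes only -- in practice, load the vendor's published
# list (e.g. Anthropic's IP ranges or Google's common-crawlers.json).
PUBLISHED_PREFIXES = ["192.0.2.0/24", "2001:db8::/32"]

NETWORKS = [ipaddress.ip_network(p) for p in PUBLISHED_PREFIXES]

def is_verified_crawler_ip(remote_addr: str) -> bool:
    """Network-layer check: does the connecting IP fall in a published range?"""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in NETWORKS)

def classify(remote_addr: str, user_agent: str, ua_token: str) -> str:
    """A UA header can be spoofed; the IP check cannot, so gate on both."""
    if ua_token in user_agent and is_verified_crawler_ip(remote_addr):
        return "verified-crawler"
    if ua_token in user_agent:
        return "spoofed-ua"
    return "other"
```

This is why the shift from UA matching to published IP lists matters: the `spoofed-ua` branch is unreachable for an attacker only when the allow decision also requires a network-layer match.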
Synthesized by Claude Haiku 4.5 from the last 365 days of detected events in this pillar. Regenerates each daily run.
Digest Refresh: 8 Items Dropped, 2 New Items Added, Framing Shifts on Pay-Per-Crawl and Cloudflare Features
The crawler insights digest was substantially refreshed. Eight previous items were removed entirely, including: dedicated AI training crawlers approaching 50% of bot traffic; and publisher struggles with the third-party scraper economy.
New 30-day crawler digest: 14 items covering Cloudflare pay-per-crawl, redirects for AI training, bot traffic milestones, and publisher blocking trends
This source went from empty to a 14-item digest covering the AI crawler and bot-policy landscape through mid-April 2026. Key hard facts include [Cloudflare's "Redirects for AI Training"](https://developers.cloudflare.com/ai-crawl-control/reference/redirects-for-ai-training/), covered in more detail below.
Amazon expands crawler doc to three distinct bots with explicit AI training disclosures and new UA strings
The [Amazonbot developer page](https://developer.amazon.com/amazonbot) was substantially overhauled: it now documents **three separate crawlers** — `Amazonbot` (general; explicitly "may be used to train Amazon AI models") plus two further agents, each with its own UA string and AI-training disclosure.
Anthropic publishes IP range list for crawler verification, replacing "we do not publish IP ranges" statement
The previous text explicitly stated "we do not currently publish IP ranges, as we use service provider public IPs. This may change in the future." The [current page](https://support.anthropic.com/en/articles/8896518) now publishes a list of IP ranges that site operators can use to verify crawler traffic.
CCBot UA string clarified with full URL, new "How does CCBot fetch a web page?" section added, ZStandard compression support added
Three substantive changes on the [Common Crawl FAQ](https://commoncrawl.org/faq): (1) the current UA string is now explicitly stated as `CCBot/2.0 (https://commoncrawl.org/faq/)` — the previous version named the bot but omitted the trailing URL; (2) a new "How does CCBot fetch a web page?" section was added; (3) ZStandard compression support was added.
Google renames crawler IP range JSON object from `googlebot.json` to `common-crawlers.json`
The [Google common crawlers reference page](https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers) changed the named IP-range data source for common crawlers from `googlebot.json` to `common-crawlers.json`.
OpenAI publishes full crawler/bot documentation page with four UA strings, including new OAI-AdsBot
The [OpenAI crawlers page](https://developers.openai.com/api/docs/bots) was created from scratch (previously empty), documenting four user agents, led by **OAI-SearchBot/1.3** (with a full Mozilla-style UA string) and including the new **OAI-AdsBot** for ad-submission traffic.
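Per-agent robots.txt rules can now target each documented token independently; a sketch, assuming the token names follow OpenAI's established pattern (only **OAI-SearchBot** and **OAI-AdsBot** are confirmed by this entry — verify the full set against the crawlers page):

```txt
# Block training, allow search indexing, limit ad-submission traffic.
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: OAI-AdsBot
Disallow: /private/
```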
Amazonbot doc rewritten: adds meta-tag directives (noarchive/noindex), drops detailed robots.txt field listing and 24-hour refresh SLA
The page was substantially rewritten: (1) branding shifted from "Amazonbot" to "Amazon crawlers" throughout; (2) a new paragraph explicitly states that Amazon crawlers honor link-level `rel=nofollow` and page-level robots meta directives, including `noarchive` and `noindex`.
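The page-level opt-out described above would look like the following; the agent-specific meta name is an assumption modeled on the common `name="<bot-token>"` convention, so confirm the exact token against Amazon's doc:

```html
<!-- Applies to all compliant crawlers: no cached copy, no indexing. -->
<meta name="robots" content="noarchive, noindex">
<!-- Assumed agent-specific form (token unverified against Amazon's doc): -->
<meta name="amazonbot" content="noarchive">
```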
IP range JSON source renamed from googlebot.json to common-crawlers.json
The authoritative JSON object for common crawler IP ranges was renamed from `googlebot.json` to `common-crawlers.json`. The page's last-updated date also advanced from 2025-04-25 to 2026-02-11.
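Consumers of the renamed file only need to change the URL if they parse the schema generically; a sketch assuming the `prefixes` array layout Google uses for its crawler IP lists, with illustrative (non-Google) ranges:

```python
import ipaddress
import json

# Assumed schema (matches Google's published crawler IP lists): a top-level
# "prefixes" array of {"ipv4Prefix": ...} / {"ipv6Prefix": ...} objects.
SAMPLE = json.loads("""
{
  "creationTime": "2026-02-11T00:00:00.000000",
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/27"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
""")  # illustrative prefixes, not Google's real ranges

def load_networks(doc: dict) -> list:
    """Flatten the prefixes array into ip_network objects, v4 and v6 alike."""
    return [
        ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
        for p in doc["prefixes"]
    ]

networks = load_networks(SAMPLE)
hit = any(ipaddress.ip_address("192.0.2.5") in n for n in networks)
```

Code written this way survives the rename unchanged: only the fetch URL moves from `googlebot.json` to `common-crawlers.json`.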
Cloudflare AI Crawl Control adds 301-redirect feature for AI training crawlers hitting canonical URLs
A new feature — "Redirects for AI Training" — has been added to [Cloudflare's AI Crawl Control](https://developers.cloudflare.com/ai-crawl-control/reference/redirects-for-ai-training/). When toggled on via **AI Crawl Control**, requests from AI training crawlers are answered with 301 redirects pointing at canonical URLs.
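For sites not behind Cloudflare, the same idea can be approximated at the origin; this is a sketch under assumptions (the bot tokens and canonical map are illustrative, not from Cloudflare's docs):

```python
# Self-hosted approximation of the managed feature: answer known
# AI-training crawlers with a 301 to the page's canonical URL.
TRAINING_BOTS = ("GPTBot", "CCBot", "Amazonbot")   # illustrative token list
CANONICAL = {"/post?ref=feed": "/post"}            # illustrative mapping

def maybe_redirect(path_qs: str, user_agent: str):
    """Return (status, location): 301 for training bots on non-canonical URLs."""
    canonical = CANONICAL.get(path_qs)
    if canonical and any(tok in user_agent for tok in TRAINING_BOTS):
        return 301, canonical
    return 200, None
```

Ordinary visitors keep the original URL; only matching crawlers are steered to the canonical form, which is what makes the toggle safe to enable site-wide.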
Cloudflare AI Crawl Control adds Content Format insights and renames Robots.txt tab to "Directives"
Cloudflare's [AI Crawl Control changelog](https://developers.cloudflare.com/changelog/post/2026-04-17-tools-for-agentic-internet/) documents two new additions: (1) a **Content Format** chart in the Metrics tab showing which content formats AI crawlers are requesting; (2) the **Robots.txt** tab has been renamed **Directives**.
Cloudflare Radar AI Insights adds three new AI bot/crawler visibility features
Cloudflare Radar's [AI Insights page](https://radar.cloudflare.com/ai-insights) gained three new features (announced 2026-04-17), including an **Adoption of AI Agent Standards** widget tracking uptake of llms.txt and AI-specific robots.txt directives.
Cloudflare AI Crawl Control adds WAF rule preservation for custom modifications
A new capability was added to Cloudflare AI Crawl Control: custom modifications made directly in the WAF custom rules editor (e.g., path-based exceptions, extra user agents, additional expression clauses) are now preserved rather than overwritten when the corresponding AI Crawl Control rules are updated.
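The kind of hand-edited expression this change protects looks like the following, written in Cloudflare's Rules expression language (the path and token are illustrative):

```txt
# Custom clause added in the WAF rules editor: block a training bot
# everywhere except the docs tree. Previously such edits risked being
# clobbered by AI Crawl Control updates; they are now preserved.
(http.user_agent contains "GPTBot" and not starts_with(http.request.uri.path, "/docs/"))
```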