About AI Content Ecosystem Insights
A monitoring publication for the AI content ecosystem: what AI crawlers publicly say about themselves, what policy and infrastructure actors are doing around them, and how the emerging AI agent layer is taking shape.
What's tracked
- 12 AI crawler / bot documentation pages (full-page HTML diff)
- 12 content-ecosystem news sources (RSS + relevance filter)
- 8 AI agent infrastructure sources (spec repos, draft specs, vendor blogs)
How it works
Once a day at 08:00 UTC, an automated pipeline fetches each source, compares it against the previous snapshot, and decides whether anything material changed. For content-ecosystem and agent-infrastructure sources (which publish a stream of posts rather than a single page), each new post first goes through a two-stage relevance filter: a cheap keyword regex, then a Claude Haiku classifier that decides whether the post is about AI crawlers, training data, bot policy, content regulation, or agent infrastructure.
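As a rough illustration of the two-stage filter, here is a minimal Python sketch. It is not the pipeline's actual code: the keyword list, the function names, and the `classify` callback (standing in for the Claude Haiku call) are all assumptions made for the example.

```python
import re
from typing import Callable

# Hypothetical keyword pattern; the pipeline's real pattern is not documented here.
KEYWORDS = re.compile(
    r"\b(crawler|robots\.txt|training data|bot policy|user-agent|ai agent)s?\b",
    re.IGNORECASE,
)


def keyword_stage(text: str) -> bool:
    """Stage 1: cheap regex check that discards obviously unrelated posts."""
    return KEYWORDS.search(text) is not None


def is_relevant(text: str, classify: Callable[[str], bool]) -> bool:
    """Stage 2 runs only for posts that pass stage 1.

    `classify` is a placeholder for the model-based classifier; any function
    that takes the post text and returns True/False fits.
    """
    if not keyword_stage(text):
        return False
    return classify(text)
```

Putting the regex first means the more expensive classifier is only called for posts that already mention a relevant term.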
Items that pass the filter are analyzed by Claude Sonnet, which classifies each as material, cosmetic, or noise and writes a structured summary. Only material changes become entries in the feed.
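The gating step described above can be sketched as follows. The `Verdict` and `Analysis` names and their fields are hypothetical, chosen only to show how a three-way classification reduces to "material only" in the feed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Verdict(Enum):
    MATERIAL = "material"
    COSMETIC = "cosmetic"
    NOISE = "noise"


@dataclass
class Analysis:
    source: str       # which tracked source the change came from
    verdict: Verdict  # the classification assigned by the analysis step
    summary: str      # the structured summary text


def feed_entries(analyses: List[Analysis]) -> List[Analysis]:
    """Keep only material changes; cosmetic and noise items never reach the feed."""
    return [a for a in analyses if a.verdict is Verdict.MATERIAL]
```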
LLM disclosure
On every event page, the "What changed" and "Implication" sections are written by Claude Sonnet 4.6 from the raw diff, which is always shown (often collapsed) on the same page. The diff is the authoritative record; the LLM sections are our best attempt to make that diff legible.
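For readers unfamiliar with what a raw diff between two snapshots looks like, the sketch below produces one with Python's standard `difflib`. It is an illustration only: the function name and labels are assumptions, and the site's own diffing code may work differently.

```python
import difflib


def raw_diff(previous: str, current: str, name: str = "snapshot") -> str:
    """Return a unified diff between two snapshot texts (empty string if identical)."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
        lineterm="",
    )
    return "\n".join(lines)
```

Lines starting with `-` were removed from the previous snapshot and lines starting with `+` were added in the current one.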
The State-of-Play matrix on the homepage is regenerated daily by Claude Haiku 4.5, which extracts a small set of factual fields from each crawler's latest snapshot. When a vendor's documentation doesn't explicitly address a field, the value appears as "unknown" rather than guessed.
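The "unknown rather than guessed" rule can be sketched like this. The field names are hypothetical placeholders, not the matrix's real columns; the point is that a missing or empty extraction is recorded as "unknown" instead of being filled with a default.

```python
# Hypothetical field names; the real matrix columns may differ.
FIELDS = ("user_agent", "respects_robots_txt", "publishes_ip_ranges")


def matrix_row(extracted: dict) -> dict:
    """Build one matrix row, marking any field the docs did not address as 'unknown'."""
    row = {}
    for field in FIELDS:
        value = extracted.get(field)
        missing = value is None or str(value).strip() == ""
        row[field] = "unknown" if missing else value
    return row
```

Note that an explicit `False` is kept as a real answer; only absent or blank values become "unknown".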
Source of truth
The repo is the source of truth. Every snapshot is a committed file; git log for any source shows its full history. If the LLM misreads a diff, the raw diff is still right there, and the git log shows when the human record differs.
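To read a source's history from a local clone, something like the following should work. The snapshot path in the usage note is a made-up example, since the repo layout is not described here; the `git log` options used (`--follow`, `--date`, `--pretty`) are standard.

```python
import subprocess
from typing import List


def history_command(snapshot_path: str) -> List[str]:
    """Build the git command that lists every commit touching one snapshot file."""
    return [
        "git", "log", "--follow", "--date=iso",
        "--pretty=format:%h %ad %s", "--", snapshot_path,
    ]


def snapshot_history(snapshot_path: str, repo_dir: str = ".") -> List[str]:
    """Run the command inside a clone of the repo and return one line per commit."""
    result = subprocess.run(
        history_command(snapshot_path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, `snapshot_history("snapshots/example-bot.html")` would list the commits for that (hypothetical) file, newest first.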
Reporting errors / suggesting sources
Open an issue on the tracker's GitHub repo, or email the maintainer team. To request a new source, include the URL, the source type (page to diff vs. RSS feed vs. GitHub repo vs. IETF draft), and why it matters.