Weekly Dispatch · archived
Weekly Dispatch · Week 20 of 2026
Crawling & Publisher Controls
This week's discourse on AI crawling and publisher controls highlights escalating legal battles over scraped content, with new lawsuits against Meta and Udio, alongside the emergence of technical standards like AI.txt, LLMs.txt, and CoMP for granular control. Analyses reveal a surge in AI bot traffic eroding publisher revenue and the complexities of effectively blocking crawlers, often due to default platform settings overriding publisher directives.
- Beyond Robots.txt: Implementing AI.txt and LLMs.txt for Purpose-Based Scraping Control
Discusses proposed `ai.txt` and `llms.txt` standards for granular AI crawler control, highlighting `robots.txt` limitations and EU legal backing for purpose-based scraping.
"Purpose-based control allows granular bot selection (Yes to search, No to training), and it is a legal requirement regulated by the EU AI Act and TDM Directive."
- The Article III Standing Problem for AI-Scraping Anti-Circumvention Claims
Analyzes why DMCA anti-circumvention claims in AI scraping lawsuits are vulnerable: plaintiffs struggle to prove Article III standing without demonstrable economic harm.
"Paradoxically, even in courts that recognise these claims, the very feature that makes Section 1201(a)(1) attractive to plaintiffs—liability untethered from infringement—may render many such claims constitutionally defective under modern Article III standing doctrine."
- Can Companies Insure Against AI's Growing Risks?
Examines the proliferation of IP litigation against AI companies for scraping training data, highlighting the financial stakes and the role of insurance in managing these risks.
"Several high-profile lawsuits have alleged that the developers of prominent large language models (LLMs) violated copyright protections in scraping training data for their models from the Web."
- Publishers, author Scott Turow accuse Meta and Mark Zuckerberg of training AI on copyrighted works
Reports on a class-action lawsuit against Meta and Mark Zuckerberg for allegedly training Llama AI models on millions of copyrighted works, including from "pirate sites," without permission.
"The plaintiffs allege that Meta scraped millions of copyrighted works from across the internet —including from 'notorious pirate sites'— and used the content to train Llama, Meta's suite of AI models, without permission."
- TollBit: AI content licensing platform turns bot traffic into Publisher Revenue
Reviews TollBit, an AI content licensing platform that enables publishers to monetize AI crawler access, transforming uncompensated scraping into revenue-generating interactions.
"TollBit acts as a Toll Booth on the internet, ensuring AI companies pay for the content their AI agents consume."
- Udio admits to scraping YouTube audio for AI training in answer to Sony Music lawsuit
Reports that AI music startup Udio acknowledged using YouTube audio for AI training in response to a Sony Music copyright infringement lawsuit, while denying other claims.
"Udio admits that it obtained audio data from YouTube for use as training data."
- Landmark New Report Warns That A Flawed AI Content Market is Accelerating 'Content Cannibalization'
A report reveals how Big Tech's AI products are eroding website traffic and controlling licensing, leading to "content cannibalization" and a surge in bots bypassing restrictions.
"The rate of AI bots bypassing voluntary access restrictions has quadrupled in the past six months, from 3.3% to 12.9%."
- TIL Cloudflare blocks Claude's web fetch tool by default. Tested 8 AI crawlers across 50 sites
A field report shows that Cloudflare blocks Claude's web fetch tool by default and that many sites block AI bots without knowing it: an edge-level block can override a permissive `robots.txt`, which the report calls a "belt-and-suspenders problem."
"The wild part: most founders don't know Cloudflare ships with AI bots blocked out of the box now. So even if your robots.txt says 'GPTBot allow,' Cloudflare can still block it at the edge."
- Why DMCA Claims Against Web Scrapers Face Long Odds
Argues that DMCA Section 1201 claims against web scrapers are hard for platforms to win because platforms generally do not own the copyright in user-generated content, even though potential statutory damages are high.
"Platforms face an uphill battle in winning on DMCA Section 1201 claims against data scrapers, as content platforms generally do not own the copyright to the user content in question."
- AI Bot Traffic Surge 300%: Why Publishers Are Losing Traffic in 2026
Reports a 300% surge in AI bot traffic causing publishers to lose revenue from "zero-click search," prompting new strategies like selective blocking and "pay-per-crawl" models.
"AI bot traffic has surged by 300%, rapidly changing how website traffic works. Publishers Are Losing Traffic: AI-generated answers reduce clicks, with up to 96% less traffic compared to traditional search."
- Crawled 1M domains to see who's blocking AI bots. The numbers are worse than I expected.
A study of 1M domains reveals widespread inconsistencies in AI bot blocking, including thousands of sites whose Terms of Service prohibit AI scraping while their `robots.txt` imposes no restriction.
"7,575 sites prohibit AI scraping in their Terms of Service but don't enforce it technically. The agents see no restriction in robots.txt, the legal terms say stop. ToS gap."
- Navigating Copyright in the Age of Generative AI: EU, French, and UK Developments and Approaches
Discusses EU, French, and UK copyright developments for generative AI, emphasizing mandatory transparency for training data and requirements for crawlers to identify themselves.
"The EU Resolution insists on full mandatory transparency and source documentation regarding the use of copyrighted works by providers and deployers of general-purpose AI models placed on the EU market..."
- Gated Content and AI Search: Why It's Invisible
Explains that gated content is invisible to AI search engines because crawlers cannot bypass login walls or forms, impacting AI training datasets and real-time retrieval.
"Gated content is invisible to AI search engines. AI crawlers including GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), and Google-Extended (Gemini) cannot fill out lead-capture forms or bypass login walls."
- AI Search and New Technical Standards for the Future Web
Examines the limitations of `robots.txt` for AI control and the emerging technical standards and licensing frameworks, including efforts to expand `robots.txt` for the AI age.
"The AIPREF Working Group is currently working on expanding the robots.txt file to address at least some of its shortcomings in the AI age and allow distinctions for search vs AI training."
- Generative AI – IP cases and policy tracker
Tracks ongoing IP cases and policy discussions concerning generative AI, including lawsuits against OpenAI and Meta for copyright infringement and web scraping.
"The complaint relates to the Llama platform and covers the following issues: (1) direct copyright infringement by torrenting (through Anna's Archive, LibGen, Sci-Hub and other pirate sites) (2) direct copyright infringement by web scraping..."
- CoMP (Content Monetization Protocols) Initiative Specification
Introduces the IAB Tech Lab's CoMP Initiative, an open technical standard for transparent interaction and monetization between Content Owners and AI Systems, addressing usage rights and authenticity.
"The CoMP Working Group is developing open technical standards to enable responsible, transparent interaction between Content Owners and AI Systems."
- News publishers target Common Crawl, the AI training data backdoor
Reports that the News/Media Alliance demanded that Common Crawl cease unauthorized scraping and block AI companies from using news content for training, arguing that `robots.txt` is insufficient.
"News/Media Alliance sent a formal letter to Common Crawl demanding it stop unauthorized scraping and block AI companies from using news content for training."
- Publishers Back Amazon Against AI Scrapers. That Should Scare You.
Argues that publishers' support for Amazon against Perplexity reflects their limited options: they treat Amazon as a "known predator," while Perplexity's model offers no compensation for scraped content.
"This is publishers choosing the predator they know over the one they cannot control."
- Google's AI Search Update: Better for Publishers
Comments on Google's AI Mode and AI Overviews updates aiming to drive traffic back to publishers through direct links and previews, addressing concerns about lost revenue from AI summaries.
"Google just announced updates to AI Mode and AI Overviews that promise to surface more original content and drive traffic back to publishers."
- Taboola's next act: an AI answer engine for publishers
Reports on publishers adopting AI-powered search and chatbot tools, like Taboola's DeeperDive, to enhance user engagement and mitigate the impact of zero-click search.
"More publishers are turning to AI-powered search and chatbot tools to make their sites stickier to users and offset the impact of zero-click search."
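Several items above turn on the gap between what `robots.txt` declares and what a crawler actually experiences. As a rough illustration of the per-crawler "yes to search, no to training" pattern, the sketch below parses a hypothetical `robots.txt` with Python's standard `urllib.robotparser`. The file contents, the `allowed_agents` helper, and the `example.com` URL are assumptions made for illustration, not taken from any of the linked pieces.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user agent, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    result = allowed_agents(
        ROBOTS_TXT,
        ["GPTBot", "Googlebot", "ClaudeBot"],
        "https://example.com/article",
    )
    for agent, ok in result.items():
        print(f"{agent}: {'allowed' if ok else 'disallowed'} by robots.txt")
```

This only reports the declared policy. As the Cloudflare field report notes, a block at the edge can still stop a crawler that `robots.txt` allows, and Terms of Service can forbid what `robots.txt` leaves open, so a check like this is one input rather than a full audit.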
Agents
This week's reporting emphasizes the growing security risks and incidents associated with autonomous AI agents, alongside crucial developments in agent-to-agent communication protocols for enterprise deployment. Critiques of agentic commerce highlight challenges in consumer trust, merchant adaptation, and the need for robust dispute resolution infrastructure.
- The AI Agent Security Surface: What Gets Exposed When You Add Tools and Memory
Examines the expanded attack surfaces of autonomous AI agents, highlighting the gap between deployment speed and security readiness.
"88 percent of organizations reported confirmed or suspected AI agent security incidents in the past year; Only 14.4 percent of agentic systems went live with full security and IT approval."
- AI agents create new risks requiring continuous monitoring and oversight
Discusses the escalating risks of autonomous AI agents, emphasizing the need for stringent, continuous monitoring to prevent unintended actions and security incidents.
"The recent report of Meta employees being given access to sensitive data after an engineer followed flawed advice from an AI agent, is a clear example."
- Enterprise AI Agents Strategy: Where CIOs Should Deploy—and where to exercise discipline
Advises CIOs on strategically deploying enterprise AI agents within existing IT architecture and governance, emphasizing readiness over isolated projects.
"CIOs must ask: “Where do AI agents add the most enterprise value, and where do they bring fragility, cost instability, and governance risks?”"
- Agentic AI in commerce: A shift merchants can't ignore
Examines the impact of agentic commerce on merchants, highlighting concerns around customer ownership, loyalty, and the evolving protocol landscape.
"Trust and control are the central merchant challenge. As transactions are increasingly initiated by autonomous agents, merchants face real risks: price integrity, brand experience, loyalty disruption, and fraud systems that were built to block bots rather than serve them."
- MCP vs A2A: 7 Critical Differences Between AI Agent Protocols
Explains the fundamental differences between MCP (agent-to-tool) and A2A (agent-to-agent) protocols, emphasizing their complementary roles in AI agent infrastructure.
"MCP vs A2A is the most misunderstood architectural decision in AI agent infrastructure right now and most teams get it wrong not because the protocols are complex, but because they're solving completely different problems at different layers of the stack."
- When prompts become shells: RCE vulnerabilities in AI agent frameworks
Microsoft's security research uncovers critical RCE vulnerabilities in Semantic Kernel, demonstrating how prompt injection can lead to code execution in AI agents.
"A single prompt was enough to launch calc.exe on the device running our AI agent, with no browser exploit, malicious attachment, or memory corruption bug needed."
- AI Agent Protocols for Agent Engine Optimization
Defines AI agent protocols (MCP, A2A, x402, UCP) as essential for autonomous systems to discover capabilities, exchange context, and handle payments for Agent Engine Optimization.
"AI agent protocols define how autonomous systems discover capabilities, exchange context, call actions, handle payments, and verify outcomes."
- ACP vs MCP vs A2A: The Complete Guide to AI Agent Protocols
Provides a comprehensive guide to three key AI agent protocols (MCP, A2A, and ACP), explaining their architectural roles and how they enable interoperability.
"These three protocols are not alternatives. They are complementary layers of the same architecture."
- The infrastructure gap in agentic commerce: payments are ready, disputes are not
Argues that agentic commerce lacks a crucial consent and permission architecture for dispute management, posing significant challenges for merchants, consumers, and processors.
"Without a consent and permission architecture built into the transaction record, disputes in agentic commerce will become almost impossible to arbitrate fairly — for merchants, consumers, and processors alike, warns Donald Kossmann."
- Real AI Security Incidents: Lessons from the Field
Analyzes real-world AI security incidents, revealing that most stem from combined system interactions, over-permissioned identities, and limited visibility.
"Sensitive data leakage through generative AI tools, prompt injection attacks, and unauthorized access across SaaS environments are already occurring at scale."
- AI Dev Patterns: A2A Protocol, MCP 2026 Roadmap, and Agent Interoperability, 2026-05-01
Clarifies that MCP and A2A protocols are complementary, with MCP handling agent-to-tool communication and A2A enabling agent-to-agent interoperability.
"Google published a Developer's Guide to AI Agent Protocols that clarifies a distinction that has caused significant confusion in the community: the Model Context Protocol (MCP) and Agent-to-Agent Protocol (A2A) are not competing standards—they solve fundamentally different problems and are designed to be used together."
- How Sellers Must Adapt for Agentic Buying
Explains how agentic commerce fundamentally changes selling, requiring sellers to optimize for machine-readable data and adapt to new customer relationship dynamics.
"In agent-mediated commerce, the customer relationship often sits with the agent, not the seller."
- Webinar Recap: Build Safe AI Agents for an Enterprise Deployment
Recaps best practices for deploying safe enterprise AI agents, emphasizing layered safety controls, rigorous testing, and matching guardrails to specific contexts.
"No single control ensures safe behavior. Input guardrails, output validation, prompt hardening, and testing must work together."
- Adversaries Leverage AI for Vulnerability Exploitation, Augmented Operations, and Initial Access
Google Threat Intelligence reports on the maturing use of AI by adversaries for vulnerability exploitation, developing polymorphic malware, and orchestrating autonomous attacks.
"For the first time, GTIG has identified a threat actor using a zero-day exploit that we believe was developed with AI."
Copyright & Legal
This week saw a new copyright infringement lawsuit against Meta and Mark Zuckerberg by publishers and authors over AI training data, while Japan's LDP proposed stricter AI policies and a UK parliamentary report detailed a working group on AI content licensing for smaller creators.
- Mark Zuckerberg 'personally authorized' Meta's copyright infringement, publishers allege - AP News
Five publishing houses and author Scott Turow sued Meta and Mark Zuckerberg, alleging illegal use of copyrighted works to train Llama.
"“Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law,” the complaint reads in part."
- Wednesday 29 April 2026 Daily Report of Written Answers and Written Statements - UK Parliament
The UK government announced a working group to explore supporting independent and smaller creative organizations in licensing their content for AI.
"In particular, the 18th March 2026 Statement on Copyright and AI Progress announced a working group on independent and smaller creative organisations to explore whether there is a role for government to support their ability to license their content."
Web Ecosystem & AI Impact
This week's analysis highlights Google's AI search integrating user-generated content and adding more links, while new research confirms significant organic traffic and revenue losses for publishers due to AI Overviews. Publishers are also grappling with falling display CPMs and the need to adapt to a new brand discovery landscape.
- Google AI Search Now Quotes Reddit: What Marketers Must Know
Google's AI search now integrates user-generated content from platforms like Reddit, requiring marketers to rethink content and community strategies.
"Google confirmed on May 6, 2026 that its AI search features will now surface “a preview of perspectives” from firsthand sources — including Reddit, social media platforms, and web forums — directly inside AI-generated search summaries."
- The First Causal Proof: AI Overviews Cut Organic Clicks 38% Without Improving Search Quality
A new study provides causal evidence that Google's AI Overviews significantly reduce organic clicks to publishers without improving search quality.
"On April 3, 2026, researchers Saharsh Agarwal and Ananya Sen published a working paper on SSRN titled 'Google AI Overviews and Publisher Traffic: Evidence from a Field Experiment.' On April 30, Search Engine Journal broke the story to the broader industry."
- Food blogs beat AI for recipes: what a 2026 study found
A study reveals strong consumer preference for human-authored food blogs over AI-generated recipes, despite AI Overviews reducing overall organic clicks.
"The 300% gap between food blogs and AI is not marginal. It indicates that search infrastructure disruption and consumer trust are not the same thing."
- 5WPR Releases The GEO Reckoning at POSSIBLE Miami, Documenting the 18-Month Replacement of the Brand Discovery Playbook
A new report details how AI answer engines have replaced traditional brand discovery, causing significant traffic and revenue losses for publishers across the board.
"Global publisher traffic from Google fell 34% in twelve months. Business Insider lost 55% of its organic traffic and reduced staff by 21%. Chegg lost 49% and is suing Google."
- Google Adds More Links to AI Search | Let's Data Science
Google is adding more links to AI search results, including subscription highlights, but independent studies still show reduced publisher click-through rates.
"Independent studies continue to show lower publisher click-through when AI Overviews appear: Pew Research Center measured clicks of 8% with AI Overviews versus 15% without and found only 1% clicked links inside Overviews, while reporting to the UK Competition and Markets Authority from DMG Media and data from Digital Content Next documented larger CTR declines."
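The figures quoted in this issue mix fold changes ("quadrupled") with percentage-point comparisons (8% versus 15%), which are easy to misread side by side. The short sketch below recomputes two of them from the quoted numbers only; the helper names `fold_change` and `relative_decline` are illustrative and do not come from any of the cited reports.

```python
def fold_change(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def relative_decline(baseline: float, observed: float) -> float:
    """Fractional drop of `observed` relative to `baseline`."""
    return (baseline - observed) / baseline


if __name__ == "__main__":
    # Bypass rate of voluntary access restrictions: 3.3% -> 12.9% (quoted above).
    print(f"Bypass rate grew {fold_change(3.3, 12.9):.1f}x")
    # Pew: 15% click rate without AI Overviews vs 8% with them (quoted above).
    print(f"Click rate fell {relative_decline(15, 8):.0%} in relative terms")
```

On these inputs the bypass rate rose roughly 3.9-fold, consistent with "quadrupled" as a rounded description, and the Pew gap of seven percentage points corresponds to a relative decline of about 47%.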