Amazon · AI Content Ecosystem Insights

Documented user-agents (3)

Each distinct UA this vendor publishes on its docs page, extracted by Haiku from the latest snapshot. New UAs appearing or scope changes here are the high-signal events to watch.

User-agent	Purpose	Scope / when it fires	Opt-out
`Amazonbot`	Improve products and services; may be used to train Amazon AI models	General web crawl for product improvement and AI training	robots.txt: `User-agent: Amazonbot`, then `Disallow: /`
`Amzn-SearchBot`	Improve search experiences in Amazon products and services	Content indexing for Alexa and Rufus search experiences	robots.txt: `User-agent: Amzn-SearchBot`, then `Disallow: /`
`Amzn-User`	Support user actions requiring up-to-date information for live responses	User-triggered fetch for live information to answer Alexa queries	robots.txt: `User-agent: Amzn-User`, then `Disallow: /`
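
The opt-out column above can be checked locally before deploying a robots.txt. The following is a minimal sketch, assuming Python's standard `urllib.robotparser` is a reasonable stand-in for how these crawlers match user-agent tokens (Amazon's actual matcher is not documented here). The `ROBOTS_TXT` contents, the `amazon_access` helper, and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of Amazonbot only.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /
"""

# The three user-agent tokens listed in the table above.
AMAZON_AGENTS = ["Amazonbot", "Amzn-SearchBot", "Amzn-User"]


def amazon_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report, per documented Amazon token, whether `url` may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AMAZON_AGENTS}


result = amazon_access(ROBOTS_TXT, "https://example.com/articles/1")
print(result)
```

Under this parser, blocking `Amazonbot` leaves `Amzn-SearchBot` and `Amzn-User` allowed, which matches the vendor's statement that each setting is independent of the others. Blocking all three requires a separate group for each token.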

Change timeline — diffs over time with insights

Each block is a detected change: the new-vs-prior snapshot diff and the LLM-written insight. Newest first.

2025-10-22 → 2026-04-19 · 179 days apart

+100 −14

Amazonbot doc rewritten: adds meta-tag directives (noarchive/noindex/none), drops detailed robots.txt field listing, and softens the 24-hour refresh commitment to "~24 hours"

material importance 0.82

Index: amazonbot
===================================================================
--- amazonbot	2025-10-22
+++ amazonbot	2026-04-19
@@ -1,14 +1,100 @@
-Amazonbot respects the robots.txt protocol, honors the user-agent and the allow/disallow directives, enabling webmasters to manage how crawlers access their site. Amazonbot attempts to read robots.txt files at the host level (for example example.com), so it looks for robots.txt at example.com/robots.txt. If a domain has multiple hosts, then we will honor robots rules exposed under each host. For example, in this scenario, if there is also a site.example.com host, it will look for robots.txt at example.com/robots.txt and also at site.example.com/robots.txt. If example.com/robots.txt blocks Amazonbot, but there are no robots.txt files on site.example.com or page.example.com, then Amazonbot cannot crawl example.com (blocked by its robots.txt), but will crawl site.example.com and page.example.com.
-In the event Amazonbot cannot fetch robots.txt due to IP or user agent blocking, parsing errors, network timeouts, or any other non-successful status codes (such as 3XX, 4XX or 5XX), Amazonbot will attempt to refetch robots.txt or use a cached copy from the last 30 days. If both these approaches fail, Amazonbot will behave as if robots.txt does not exist and will crawl the site. When accessible, Amazonbot will respond to changes in robots.txt files within 24 hours.
-Amazonbot honors the "Robots Exclusion protocol" defined at (
-https://www.rfc-editor.org/rfc/rfc9309.html
-) and recognizes the following fields. The field names are interpreted as case-insensitive. However the values for each of these fields are case-sensitive.
-user-agent
-: identifies which crawler the rules apply to.
-allow
-: a URL path that may be crawled.
-disallow
-: a URL path that may not be crawled.
-sitemap
-: the complete URL of a sitemap.
-Note: Amazonbot does not currently support the crawl-delay directive
\ No newline at end of file
+Alexa
+Amazon Appstore
+Ring
+AWS
+Documentation
+Console
+as
+Settings
+Sign out
+Notifications
+Alexa
+Amazon Appstore
+Ring
+AWS
+Documentation
+Support
+Contact Us
+My Cases
+Console
+Support
+Contact Us
+My Cases
+as
+Settings
+Sign out
+Webmasters can manage how their sites and content are used by Amazon with the following web crawlers. Amazon honors industry standard opt-out directives. Each setting is independent of the others, and may take ~24 hours for our systems to reflect changes.
+Amazonbot
+Amazonbot is used to improve our products and services. This helps us provide more accurate information to customers and may be used to train Amazon AI models.
+User Agent String:
+Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +
+https://developer.amazon.com/support/amazonbot
+) Chrome/119.0.6045.214 Safari/537.36
+Published IP Addresses:
+https://developer.amazon.com/amazonbot/ip-addresses/
+Amzn-SearchBot
+Amzn-SearchBot is used to improve search experiences in Amazon products and services. By permitting Amzn-SearchBot access to your website, your content is eligible to appear in search experiences such as Alexa and Rufus. Amzn-SearchBot does not crawl content for generative AI model training.
+User Agent String:
+Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amzn-SearchBot/0.1) Chrome/119.0.6045.214 Safari/537.36
+Published IP Addresses:
+https://developer.amazon.com/amazonbot/searchbot-ip-addresses/
+Amzn-User
+Amzn-User supports user actions, such as responding to Alexa queries that require up-to-date information. For example, when a customer asks a question, Amzn-User may fetch live information from the web to provide accurate answers on the user’s behalf.
+Amzn-User does not crawl content for generative AI model training.
+User Agent String:
+Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amzn-User/0.1) Chrome/119.0.6045.214 Safari/537.36
+Published IP Addresses:
+https://developer.amazon.com/amazonbot/live-ip-addresses/
+Our Approach to Robots.txt
+Amazon respects the
+Robots Exclusion Protocol
+, honoring the user-agent and the allow/disallow directives. Amazon will fetch host-level robots.txt files or use a cached copy from the last 30 days. When a file can’t be fetched, Amazon will behave as if it does not exist.
+Amazon attempts to read robots.txt files at the host level (for example
+example.com
+), so it looks for robots.txt at
+example.com/robots.txt
+. If a domain has multiple hosts, then we will honor robots rules exposed under each host. For example, if there is also a
+site.example.com
+host, it will look for robots.txt at
+site.example.com/robots.txt
+When Amazon crawlers access web pages they respect the link-level rel=nofollow directive, and page level robots meta tags of noarchive (do not use the page for model training), noindex (do not index the page) and none (do not index the page). Amazon crawlers do not support the crawl-delay directive.
+Contact Us
+If you are a content owner or publisher and have questions, please contact us at
+[email protected]
+. Always include any relevant domain names in your message.
+Back to Top
+Follow us:
+Legal
+Terms and agreement
+Amazon Developers Service Portal terms of use
+Program Materials license agreement
+Amazon Appstore
+Developer portal
+Amazon Fire TV
+Fire tablets
+Alexa
+Developer portal
+Alexa Skills Kit
+Alexa Voice Service
+Alexa Fund
+Other services & APIs
+Login with Amazon
+Amazon Data Portability
+Amazon Merch on Demand
+Frustration-Free Setup
+Amazon Incentives API
+Amazon Music
+Just Walk Out technology by Amazon
+Blogs
+Appstore Developer blog
+Alexa Developer blog
+Alexa Science blog
+Support
+Amazon Developer support
+Appstore Developer Community
+Alexa Skills community
+FAQs
+© 2010-2026, Amazon.com, Inc. or its affiliates. All Rights Reserved.
+Terms
+Amazon Developer Blog
+Contact Us
\ No newline at end of file
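
The new snapshot in the diff above says Amazon crawlers respect page-level robots meta tags: noarchive (do not use the page for model training), noindex, and none (both described as "do not index the page"). A minimal sketch of auditing a page for those tags follows, using Python's standard `html.parser`. The helper names `robots_meta_directives` and `summarize` are hypothetical. Only `<meta name="robots">` is inspected, because the snapshot does not say whether a crawler-specific meta name is honored.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directive tokens from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def summarize(directives: set[str]) -> dict[str, bool]:
    """Map directives to the effects described in the vendor snapshot."""
    return {
        "training_opt_out": "noarchive" in directives,
        "index_opt_out": bool({"noindex", "none"} & directives),
    }


page = '<html><head><meta name="robots" content="noarchive, nofollow"></head></html>'
print(summarize(robots_meta_directives(page)))
```

The mapping in `summarize` follows the wording of the snapshot only. It treats `none` as an indexing opt-out and makes no claim about how Amazon handles directives the page does not list.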

Events

Crawler Amazon · 48d ago

Amazon expands crawler doc to three distinct bots with explicit AI training disclosures and new UA strings

The [Amazonbot developer page](https://developer.amazon.com/amazonbot) was substantially overhauled: it now documents **three separate crawlers** — `Amazonbot` (general; explicitly "may be used to train Amazon AI models"

Crawler Amazon · 48d ago

Amazonbot doc rewritten: adds meta-tag directives (noarchive/noindex/none), drops detailed robots.txt field listing, and softens the 24-hour refresh commitment to "~24 hours"

The page was substantially rewritten: (1) branding shifted from "Amazonbot" to "Amazon crawlers" throughout; (2) a new paragraph explicitly states that Amazon crawlers honor link-level `rel=nofollow` and page-level robot