What Is a User-Agent? How Bots Identify Themselves
A user-agent is the string a browser or bot uses to identify itself to a server. Learn how it works, how AI bots use it, and why it can be spoofed.

A user-agent is a short text string that a browser or bot includes in every web request to identify what it is: for example, which browser a person is using, or which crawler is fetching a page. Servers read it to recognize who is asking for content, and it's the value you target in robots.txt to allow or block specific crawlers. In AI search, the user-agent is how you tell GPTBot from PerplexityBot from a human visitor, which makes it central to controlling and measuring AI access.
This guide explains what a user-agent is, the form of the string, how servers use it, its role in robots.txt, AI user-agents and their categories, the tokens that never appear in logs, and the problem of spoofing.
What does a user-agent string look like?
A user-agent is a line of text sent in the HTTP request headers. Browser user-agents are long and list rendering engines and versions; bot user-agents are usually shorter and include a recognizable name plus a link to documentation, such as a string identifying "GPTBot" or "PerplexityBot." The practical takeaway is that each crawler announces itself with a distinctive token inside its user-agent string, and that token is what you match on.
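As a rough illustration of matching on that token, the sketch below scans a user-agent string for known crawler names. The token list comes from the examples in this article and is not exhaustive; the function name `find_bot_token` and the sample string are made up for illustration and are not the exact string any provider sends.

```python
from typing import Optional

# Tokens taken from this article's examples; illustrative, not a complete list.
KNOWN_BOT_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Googlebot", "Bingbot",
]

def find_bot_token(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the string, or None."""
    lowered = user_agent.lower()
    for token in KNOWN_BOT_TOKENS:
        # Case-insensitive substring match on the distinctive token.
        if token.lower() in lowered:
            return token
    return None

# Hypothetical user-agent string, shaped like a typical bot header.
sample = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot-docs)"
print(find_bot_token(sample))  # GPTBot
```

A browser string with none of these tokens returns `None`, which is how this sketch separates known crawlers from everything else.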
How do servers use the user-agent?
Servers and applications read the user-agent to recognize the client and respond accordingly — serving an appropriate layout to a browser, or identifying a crawler in access logs. It's the basis for log analysis (counting how often each bot visits), for robots.txt rules (which are keyed to user-agent), and for server-side decisions (such as challenging or blocking specific bots). In short, the user-agent is the primary signal for telling one visitor type from another.
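Counting bot visits from access logs can be sketched as follows. This assumes a combined-style log format in which the user-agent is the last double-quoted field on each line; the sample log lines, IP addresses, and the name `count_bot_hits` are invented for illustration, so adapt the parsing to whatever format your server actually writes.

```python
import re
from collections import Counter

BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot", "Bingbot"]

# Assumption: the user-agent is the final quoted field of the log line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_bot_hits(log_lines):
    """Count requests per crawler token; everything else is tallied as 'other'."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break
        else:
            counts["other"] += 1
    return counts

# Hypothetical log lines using reserved documentation IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /blog HTTP/1.1" 200 901 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /docs HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

for name, hits in count_bot_hits(SAMPLE_LOG).most_common():
    print(name, hits)
```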
How is the user-agent used in robots.txt?
robots.txt rules are grouped by user-agent: a User-agent: line names the crawler, and the following directives apply to it. The critical detail is that the name must match the crawler's token exactly — a typo like "GPT-Bot" instead of "GPTBot" silently fails, leaving the bot uncontrolled. Because each AI provider runs several user-agents for different purposes, effective rules name each one precisely rather than assuming one entry covers a whole company.
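The silent failure can be seen with Python's standard `urllib.robotparser`, used here as a stand-in for a crawler's own parser. The two robots.txt bodies and the `example.com` URL are illustrative, and real crawlers may apply somewhat different matching rules than this one library does.

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Correct token: the group applies to GPTBot.
CORRECT = """\
User-agent: GPTBot
Disallow: /
"""

# Misspelled token: no group matches GPTBot, so nothing is disallowed.
TYPO = """\
User-agent: GPT-Bot
Disallow: /
"""

url = "https://example.com/page"
print(can_fetch(CORRECT, "GPTBot", url))  # False: the rule blocks the bot
print(can_fetch(TYPO, "GPTBot", url))     # True: the rule never applies
```

No error is raised for the misspelled group; the parser simply finds no matching rule and falls back to allowing the request.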
What are AI user-agents, and how are they categorized?
AI providers use distinct user-agents for distinct jobs, which lets you control them independently.
| Purpose | Example tokens |
|---|---|
| Training | GPTBot, ClaudeBot, CCBot |
| Retrieval / search | OAI-SearchBot, Claude-SearchBot, PerplexityBot |
| On-demand fetch | ChatGPT-User, Claude-User, Perplexity-User |
| Standard search | Googlebot, Bingbot |
Knowing which token does what is essential: allowing a retrieval token preserves AI-answer visibility, while a training token governs whether your content feeds future models.
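One way to put the table to work is to generate robots.txt groups from a token-to-purpose map. The sketch below is only an illustration: the mapping mirrors the table above, while the purpose labels, the `build_robots_txt` name, and the blanket `Allow: /` and `Disallow: /` rules are assumptions, not a recommended policy.

```python
# Token-to-purpose map mirroring the table above.
TOKEN_PURPOSE = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
    "OAI-SearchBot": "retrieval",
    "Claude-SearchBot": "retrieval",
    "PerplexityBot": "retrieval",
    "ChatGPT-User": "on-demand",
    "Claude-User": "on-demand",
    "Perplexity-User": "on-demand",
    "Googlebot": "search",
    "Bingbot": "search",
}

def build_robots_txt(blocked_purposes):
    """Emit one group per token, disallowing tokens whose purpose is blocked."""
    groups = []
    for token, purpose in TOKEN_PURPOSE.items():
        rule = "Disallow: /" if purpose in blocked_purposes else "Allow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"

# Example: opt out of training while keeping retrieval and on-demand access.
print(build_robots_txt({"training"}))
```

Naming every token in its own group follows the point made earlier: one entry does not cover a whole company.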
Which AI controls never appear in your logs?
An important subtlety: some AI controls are robots.txt opt-out tokens rather than real crawler user-agents, and they never show up in your server access logs. Google-Extended and Applebot-Extended are the main examples — they're directives that tell Google and Apple whether they may use your content for AI training, but no bot announces itself by those names. So if you're auditing logs, don't expect to see them; you set them in robots.txt and verify by behavior, not by log entries.
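For a log audit, this means opt-out tokens should be excluded before flagging crawlers as "never seen." A minimal sketch, assuming you already have the list of tokens named in your robots.txt and the set of tokens found in your logs (the function name and sample inputs are hypothetical):

```python
# Opt-out tokens named in this article; they are set in robots.txt only.
OPT_OUT_ONLY = {"Google-Extended", "Applebot-Extended"}

def missing_from_logs(robots_tokens, seen_in_logs):
    """Tokens named in robots.txt that never appeared in logs,
    ignoring opt-out tokens that no bot uses as its user-agent."""
    expected = set(robots_tokens) - OPT_OUT_ONLY
    return sorted(expected - set(seen_in_logs))

robots_tokens = ["GPTBot", "Google-Extended", "PerplexityBot", "Applebot-Extended"]
seen_in_logs = ["GPTBot", "Googlebot"]
print(missing_from_logs(robots_tokens, seen_in_logs))  # ['PerplexityBot']
```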
Can user-agents be spoofed?
Yes. A user-agent is a self-declared claim, and anyone can set it to any value. This matters in AI search because a meaningful portion of traffic claiming to be an AI crawler is actually spoofed; one industry analysis found roughly 5.7% of requests presenting AI-crawler user-agents were fake. The implication is that you should never treat the user-agent as proof of identity. Reputable crawler verification confirms legitimacy by checking the request's IP against the provider's published IP ranges, or via reverse DNS, treating the user-agent as a claim to verify rather than trust.
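The IP-range check can be sketched with Python's standard `ipaddress` module. The CIDR blocks below are placeholders drawn from reserved documentation ranges, not any provider's real addresses; in practice you would load the ranges each provider publishes. The names `PUBLISHED_RANGES` and `is_verified` are hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real provider ranges.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}

def is_verified(claimed_token: str, remote_ip: str) -> bool:
    """True only if the request IP falls inside a range published
    for the crawler that the user-agent claims to be."""
    ranges = PUBLISHED_RANGES.get(claimed_token, [])
    address = ipaddress.ip_address(remote_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)

print(is_verified("GPTBot", "192.0.2.10"))   # True: inside the listed range
print(is_verified("GPTBot", "203.0.113.7"))  # False: claim does not match the IP
```

A request that claims a crawler token but fails this check is treated as unverified, regardless of what its user-agent says.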
User-agent checklist
- Match user-agent tokens exactly in robots.txt — typos silently fail.
- Name each AI bot separately, since one company runs several.
- Map tokens to purpose — training vs retrieval vs on-demand.
- Remember opt-out tokens (Google-Extended, Applebot-Extended) won't appear in logs.
- Don't trust the user-agent alone — verify by IP for important decisions.
- Audit logs regularly to see which agents actually visit.
Frequently asked questions
What is a user-agent?
A user-agent is a short text string a browser or bot includes in every web request to identify itself. Servers read it to recognize the client, and it's the value targeted in robots.txt to allow or block specific crawlers.
How is a user-agent used in robots.txt?
Rules are grouped by user-agent: a User-agent line names the crawler and the directives below apply to it. The name must match the crawler's token exactly, or the rule silently fails.
What are AI user-agents?
They're the distinct tokens AI providers use for different jobs: training (GPTBot, ClaudeBot, CCBot), retrieval/search (OAI-SearchBot, Claude-SearchBot, PerplexityBot), and on-demand fetch (ChatGPT-User, Claude-User, Perplexity-User). Separate tokens allow independent control of each job.
Why don't Google-Extended and Applebot-Extended appear in my logs?
Because they're robots.txt opt-out tokens for AI training, not real crawler user-agents. No bot announces itself by those names; you set them in robots.txt rather than seeing them in access logs.
Can user-agents be faked?
Yes. A user-agent is self-declared, and roughly 5.7% of requests claiming AI-crawler identities have been found to be spoofed. Verify legitimacy by checking the IP against published ranges or reverse DNS rather than trusting the string.

Written by
Federico Ergang
Cliro cofounder & CEO
Federico Ergang is cofounder and CEO of Cliro, the AI visibility and GEO platform for Latin America.
