What Is a User-Agent? How Bots Identify Themselves

A user-agent is the string a browser or bot uses to identify itself to a server. Learn how it works, how AI bots use it, and why it can be spoofed.

June 11, 2026 · 4 min

A user-agent is a short text string that a browser or bot includes in every web request to identify itself: for example, which browser a person is using, or which crawler is fetching a page. Servers read it to recognize who is asking for content, and it's the value you target in robots.txt to allow or block specific crawlers. In AI search, the user-agent is how you tell GPTBot from PerplexityBot from a human visitor, which makes it central to controlling and measuring AI access.

This guide explains what a user-agent is, the form of the string, how servers use it, its role in robots.txt, AI user-agents and their categories, the tokens that never appear in logs, and the problem of spoofing.

What does a user-agent string look like?

A user-agent is a single line of text sent in the User-Agent header of an HTTP request. Browser user-agents are long and list rendering engines and versions; bot user-agents are usually shorter and include a recognizable name, such as "GPTBot" or "PerplexityBot," plus a link to documentation. The practical takeaway is that each crawler announces itself with a distinctive token inside its user-agent string, and that token is what you match on.
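As a rough illustration of matching on the token, the Python sketch below looks for a known crawler name inside a user-agent string. The two sample strings are illustrative only: they follow the general pattern described above, but their versions and the documentation URL are made up, so check each provider's documentation for the real values. The function name find_crawler_token and the KNOWN_TOKENS list are assumptions for this example.

```python
from typing import Optional

# Tokens named in this article; extend the list from each provider's docs.
KNOWN_TOKENS = [
    "OAI-SearchBot", "ChatGPT-User", "GPTBot",
    "Claude-SearchBot", "Claude-User", "ClaudeBot",
    "Perplexity-User", "PerplexityBot",
    "CCBot", "Googlebot", "Bingbot",
]


def find_crawler_token(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the string, else None."""
    lowered = user_agent.lower()
    for token in KNOWN_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Illustrative strings only (hypothetical versions and documentation URL).
browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)
bot_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot-docs)"

print(find_crawler_token(bot_ua))      # GPTBot
print(find_crawler_token(browser_ua))  # None
```

A substring match like this only tells you what the request claims to be; it says nothing about whether the claim is true.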

How do servers use the user-agent?

Servers and applications read the user-agent to recognize the client and respond accordingly — serving an appropriate layout to a browser, or identifying a crawler in access logs. It's the basis for log analysis (counting how often each bot visits), for robots.txt rules (which are keyed to user-agent), and for server-side decisions (such as challenging or blocking specific bots). In short, the user-agent is the primary signal for telling one visitor type from another.
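For the log-analysis case, a minimal sketch might count hits per bot token. It assumes the common "combined" access log format, where the user-agent is the last double-quoted field on each line; your server's format may differ. The sample lines, IP addresses, and the count_bot_hits name are hypothetical.

```python
import re
from collections import Counter

# Assumption: combined log format, user-agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot", "Bingbot"]


def count_bot_hits(log_lines):
    """Count requests per bot token, based on the claimed user-agent."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break
    return counts


# Hypothetical log lines using reserved documentation IP addresses.
sample_log = [
    '192.0.2.10 - - [11/Jun/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot-docs)"',
    '192.0.2.11 - - [11/Jun/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot-docs)"',
    '198.51.100.7 - - [11/Jun/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://example.com/bot-docs)"',
    '203.0.113.5 - - [11/Jun/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0.0.0 Safari/537.36"',
]

hits = count_bot_hits(sample_log)
print(hits["GPTBot"], hits["PerplexityBot"])  # 2 1
```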

How is the user-agent used in robots.txt?

robots.txt rules are grouped by user-agent: a User-agent: line names the crawler, and the following directives apply to it. The critical detail is that the name must match the crawler's token exactly. A typo like "GPT-Bot" instead of "GPTBot" fails silently: nothing reports an error, the real crawler finds no group addressed to it, and it stays uncontrolled. Because each AI provider runs several user-agents for different purposes, effective rules name each one precisely rather than assuming one entry covers a whole company.
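The silent failure is easy to reproduce with Python's standard-library robots.txt parser. The sketch below feeds it two hypothetical robots.txt files, one with the correct token and one with the typo, and asks whether GPTBot may fetch the site root. The is_allowed helper is an assumption for this example, and real crawlers may apply their own matching logic, so treat this as a sanity check rather than a guarantee of crawler behavior.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, token: str, path: str = "/") -> bool:
    """Parse robots.txt text and report whether `token` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)


# Hypothetical robots.txt contents.
correct_rules = "User-agent: GPTBot\nDisallow: /\n"
typo_rules = "User-agent: GPT-Bot\nDisallow: /\n"

print(is_allowed(correct_rules, "GPTBot"))  # False: the block applies
print(is_allowed(typo_rules, "GPTBot"))     # True: the typo matches nothing
```

With the typo, the parser raises no error and no warning. The group simply never applies, and the crawler falls back to being allowed.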

What are AI user-agents, and how are they categorized?

AI providers use distinct user-agents for distinct jobs, which lets you control them independently.

Purpose	Example tokens
Training	GPTBot, ClaudeBot, CCBot
Retrieval / search	OAI-SearchBot, Claude-SearchBot, PerplexityBot
On-demand fetch	ChatGPT-User, Claude-User, Perplexity-User
Standard search	Googlebot, Bingbot

Knowing which token does what is essential: allowing a retrieval token preserves AI-answer visibility, while a training token governs whether your content feeds future models.
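One way to keep that mapping explicit is to generate robots.txt groups from a purpose-to-token table. The sketch below uses the tokens from the table above and blocks only the purposes you pass in. Blocking training while allowing everything else is shown purely as an example policy, not a recommendation, and the build_robots_txt name is hypothetical.

```python
# Tokens grouped by purpose, as listed in the table above.
TOKENS_BY_PURPOSE = {
    "training": ["GPTBot", "ClaudeBot", "CCBot"],
    "retrieval": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "on_demand": ["ChatGPT-User", "Claude-User", "Perplexity-User"],
    "standard_search": ["Googlebot", "Bingbot"],
}


def build_robots_txt(blocked_purposes):
    """Emit one robots.txt group per token, blocking the chosen purposes."""
    groups = []
    for purpose, tokens in TOKENS_BY_PURPOSE.items():
        rule = "Disallow: /" if purpose in blocked_purposes else "Allow: /"
        for token in tokens:
            groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"


# Example policy (an assumption): opt out of training, allow the rest.
robots_txt = build_robots_txt({"training"})
print(robots_txt)
```

Generating the file from one table avoids hand-typed token names, which is where typos tend to creep in.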

Which AI controls never appear in your logs?

An important subtlety: some AI controls are robots.txt opt-out tokens rather than real crawler user-agents, and they never show up in your server access logs. Google-Extended and Applebot-Extended are the main examples — they're directives that tell Google and Apple whether they may use your content for AI training, but no bot announces itself by those names. So if you're auditing logs, don't expect to see them; you set them in robots.txt and verify by behavior, not by log entries.
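When building a log audit, it can help to separate the tokens you configure from the tokens you should expect to see. The sketch below reads the User-agent lines from a robots.txt and drops the opt-out-only tokens named above, leaving the names worth searching for in access logs. The sample file and the tokens_expected_in_logs name are assumptions for illustration.

```python
# Opt-out tokens named in this article: set in robots.txt, never seen in logs.
OPT_OUT_ONLY = {"Google-Extended", "Applebot-Extended"}


def tokens_expected_in_logs(robots_txt: str) -> set:
    """Return the configured tokens that can actually appear in access logs."""
    tokens = set()
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() != "user-agent":
            continue
        token = value.split("#")[0].strip()  # drop any trailing comment
        if token and token != "*" and token not in OPT_OUT_ONLY:
            tokens.add(token)
    return tokens


# Hypothetical robots.txt.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""

print(tokens_expected_in_logs(sample_robots))  # {'GPTBot'}
```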

Can user-agents be spoofed?

Yes. A user-agent is a self-declared claim, and anyone can set it to any value. This matters in AI search because a meaningful portion of traffic claiming to be an AI crawler is actually spoofed; one industry analysis found roughly 5.7% of requests presenting AI-crawler user-agents were fake. The implication is that you should never treat the user-agent as proof of identity. Reputable crawler verification confirms legitimacy by checking the request's IP against the provider's published IP ranges, or via reverse DNS, treating the user-agent as a claim to verify rather than trust.
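The IP check can be sketched with Python's standard ipaddress module. The ranges below are placeholders taken from address blocks reserved for documentation, not any provider's real ranges; in practice you would load the ranges each provider publishes and refresh them regularly. The PUBLISHED_RANGES table and the is_verified name are assumptions for this example.

```python
import ipaddress

# PLACEHOLDER ranges from reserved documentation blocks, NOT real provider
# ranges. Replace with the lists each provider actually publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def is_verified(token: str, remote_ip: str) -> bool:
    """Check whether the request IP falls inside a range listed for the token."""
    ip = ipaddress.ip_address(remote_ip)
    for cidr in PUBLISHED_RANGES.get(token, []):
        network = ipaddress.ip_network(cidr)
        # Compare only like with like (IPv4 vs IPv4, IPv6 vs IPv6).
        if ip.version == network.version and ip in network:
            return True
    return False


print(is_verified("GPTBot", "192.0.2.44"))   # True: inside the listed range
print(is_verified("GPTBot", "203.0.113.9"))  # False: claim not backed by IP
```

A request that claims a crawler's token but fails this check is best treated as an unverified visitor, whatever its user-agent says.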

User-agent checklist

Match user-agent tokens exactly in robots.txt — typos silently fail.
Name each AI bot separately, since one company runs several.
Map tokens to purpose — training vs retrieval vs on-demand.
Remember opt-out tokens (Google-Extended, Applebot-Extended) won't appear in logs.
Don't trust the user-agent alone — verify by IP for important decisions.
Audit logs regularly to see which agents actually visit.

Frequently asked questions

What is a user-agent?

A user-agent is a short text string a browser or bot includes in every web request to identify itself. Servers read it to recognize the client, and it's the value targeted in robots.txt to allow or block specific crawlers.

How is a user-agent used in robots.txt?

Rules are grouped by user-agent: a User-agent line names the crawler and the directives below apply to it. The name must match the crawler's token exactly, or the rule silently fails.

What are AI user-agents?

They're the distinct tokens AI providers use for different jobs, which allows independent control: training (GPTBot, ClaudeBot, CCBot), retrieval/search (OAI-SearchBot, Claude-SearchBot, PerplexityBot), and on-demand fetch (ChatGPT-User, Claude-User, Perplexity-User).

Why don't Google-Extended and Applebot-Extended appear in my logs?

Because they're robots.txt opt-out tokens for AI training, not real crawler user-agents. No bot announces itself by those names; you set them in robots.txt rather than seeing them in access logs.

Can user-agents be faked?

Yes. A user-agent is self-declared, and one industry analysis found that roughly 5.7% of requests claiming AI-crawler identities were spoofed. Verify legitimacy by checking the IP against published ranges or reverse DNS rather than trusting the string.

Written by

Federico Ergang

Cliro cofounder & CEO

Federico Ergang is cofounder and CEO of Cliro, the AI visibility and GEO platform for Latin America.
