What Is a Sitemap? XML Sitemaps and SEO Best Practices

A sitemap is a file that lists your site's URLs to help search engines discover and crawl them. Learn the types, what to include, and the best practices.

A sitemap is a file that lists the important URLs on your website to help search engines discover, crawl and understand them. It acts as a roadmap of your site, telling search engines which pages exist and, optionally, how often they change and how they relate to each other. While a sitemap doesn't guarantee indexing, it is one of the simplest and most reliable ways to help crawlers find your content — especially on large sites or for new pages with few internal links.

This guide explains what a sitemap is, the difference between XML and HTML sitemaps, the main sitemap types, what to include and exclude, how to submit one, the size limits, and the best practices that keep it useful.

What is a sitemap, exactly?

A sitemap is a structured file — most often XML — that enumerates the URLs you want search engines to know about. When a crawler reads it, it gets an authoritative list of your pages rather than relying solely on following links to discover them. This matters because link-based discovery alone can miss pages that are new, deeply nested, or poorly linked internally. The sitemap is a direct channel that says, in effect, "here are the pages that matter on this site."
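To make the structure concrete, here is a minimal Python sketch that writes a standard XML sitemap using only the standard library. The function name `build_sitemap` and the example.com URLs are illustrative assumptions, not part of any particular CMS or tool; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    `entries` is an iterable of (url, lastmod) pairs; lastmod is an
    ISO date string such as "2024-01-15", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    print(build_sitemap([
        ("https://www.example.com/", "2024-01-15"),
        ("https://www.example.com/pricing", None),
    ]))
```

Each `<url>` entry needs only a `<loc>`; the `<lastmod>` element is optional metadata.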

A sitemap is a discovery aid, not a ranking tool or an indexing command. Listing a URL in a sitemap suggests it for crawling; it does not force the engine to index or rank it. Quality and relevance still decide what happens after discovery.

What's the difference between an XML and an HTML sitemap?

The two serve different audiences. An XML sitemap is written for search engines: a machine-readable list of URLs, sometimes with metadata like last-modified dates. An HTML sitemap is a page built for human visitors, linking to the main sections of a site to aid navigation. For SEO discovery, the XML sitemap is the one that matters; HTML sitemaps are a usability feature that can also help internal linking on large sites.

What are the main types of XML sitemap?

Beyond the standard page sitemap, several specialized formats exist for different content, and large sites combine them under a sitemap index file.

Type | Lists
Standard (URL) sitemap | Your indexable pages
Image sitemap | Images you want discovered for image search
Video sitemap | Video content and its metadata
News sitemap | Recent articles for Google News (time-sensitive)
Sitemap index | A file that points to multiple sitemaps, used on large sites

What should a sitemap include — and exclude?

The guiding rule is that a sitemap should list only canonical, indexable URLs that return a 200 status — the pages you genuinely want in search. Excluding the wrong URLs is just as important as including the right ones. Keep out: non-canonical duplicates, pages set to noindex, redirected URLs, 404s, and pages blocked by robots.txt. A sitemap full of non-indexable URLs sends mixed signals and wastes crawl attention. A clean sitemap, by contrast, reinforces your canonical choices and concentrates crawling where it counts.
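The inclusion rule above can be expressed as a simple filter. The sketch below assumes you already have crawl data for each page; the `Page` record and its field names are hypothetical and would need to be mapped to whatever your crawler or CMS actually reports.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int              # HTTP status code returned for the URL
    canonical: str           # URL named in the page's rel="canonical"
    noindex: bool            # True if the page carries a noindex directive
    blocked_by_robots: bool  # True if robots.txt disallows the URL


def belongs_in_sitemap(page: Page) -> bool:
    """Keep only canonical, indexable URLs that return 200."""
    return (
        page.status == 200                # drops redirects and 404s
        and page.canonical == page.url    # drops non-canonical duplicates
        and not page.noindex
        and not page.blocked_by_robots
    )


def sitemap_candidates(pages):
    """Return the URLs from `pages` that pass the rule."""
    return [p.url for p in pages if belongs_in_sitemap(p)]
```

Running a filter like this before generating the file is one way to keep the sitemap aligned with your canonical and indexing choices.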

How do you submit a sitemap?

There are two standard channels, and using both is best practice. First, reference the sitemap in your robots.txt file with a Sitemap: line that gives the sitemap's full URL, so any crawler reading robots.txt finds it automatically. Second, submit it directly in Google Search Console (and Bing Webmaster Tools), which also lets you monitor how many submitted URLs are actually indexed and surfaces errors. Most modern CMS platforms generate and update an XML sitemap automatically; the job is to make sure it's accurate and discoverable.
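As a quick check of the robots.txt channel, the sketch below uses Python's standard `urllib.robotparser` to read the Sitemap lines a crawler would see. The robots.txt content and the example.com URL are made up for illustration; in practice you would load your own file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; the domain is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemaps_declared(robots_txt: str) -> list:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line exists.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_declared(ROBOTS_TXT))
```

An empty result means crawlers that rely on robots.txt will not find the sitemap through that channel.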

What are the size limits?

A single XML sitemap can contain up to 50,000 URLs and must not exceed 50MB uncompressed. Sites larger than that split their URLs across multiple sitemap files referenced by a sitemap index. Keeping individual sitemaps well under the limit, and grouping them logically (by section or content type), also makes it easier to diagnose indexing issues — you can see which group of pages is being indexed and which isn't.
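Splitting by the URL limit can be sketched as follows. This is a minimal illustration that enforces only the 50,000-URL count, not the 50MB size limit, and the function names and example.com file locations are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit stated above


def split_urls(urls, limit=MAX_URLS_PER_FILE):
    """Split a list of URLs into chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML pointing at each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # 120,001 hypothetical URLs need three sitemap files.
    urls = [f"https://www.example.com/p/{i}" for i in range(120_001)]
    chunks = split_urls(urls)
    files = [
        f"https://www.example.com/sitemap-{n}.xml"
        for n in range(1, len(chunks) + 1)
    ]
    print(len(chunks))
    print(build_sitemap_index(files))
```

Chunking by section or content type instead of by position would follow the same pattern, with one list of URLs per group.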

Do you really need a sitemap?

Not every site strictly requires one. A small, well-linked site can be fully discovered through internal links alone. Sitemaps deliver the most value when a site is large, has pages that aren't well linked internally, is new with little external linking, or uses rich media that benefits from image and video sitemaps. Even when not strictly necessary, a clean sitemap is low-cost insurance for discovery and a useful diagnostic surface in Search Console.

What are the common sitemap mistakes?

The usual problems: including non-canonical, redirected or noindexed URLs; letting the sitemap go stale so it lists deleted pages; exceeding size limits without splitting; listing URLs blocked by robots.txt; and forgetting to update it after a site migration. After any major change — like a domain or language migration — regenerating and resubmitting an accurate sitemap is one of the fastest ways to help engines re-discover the new structure.
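A stale sitemap can be caught by comparing it against the URLs that are actually live. The sketch below assumes you can supply that list of live URLs from your CMS or a crawl; the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the sitemaps.org namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a standard URL sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def audit_sitemap(xml_text, live_urls):
    """Compare sitemap entries with the set of live, indexable URLs.

    Returns URLs listed but no longer live ("stale") and live URLs
    that the sitemap does not list ("missing").
    """
    listed = set(sitemap_urls(xml_text))
    live = set(live_urls)
    return {
        "stale": sorted(listed - live),
        "missing": sorted(live - listed),
    }
```

Running an audit like this after a migration gives a short list of entries to remove or add before resubmitting.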

How do sitemaps relate to AI crawlers?

AI crawlers discover content in much the same way search crawlers do, so an accurate sitemap can help them find your pages too. A current sitemap that lists your canonical, indexable URLs gives any compliant crawler, search or AI, a reliable map of what to read, which supports both classic indexing and eligibility for AI citation.

Sitemap checklist

  1. List canonical, indexable URLs only. Exclude duplicates, noindex, redirects and 404s.
  2. Keep it current. Regenerate after content changes and migrations.
  3. Reference it in robots.txt and submit it in Search Console.
  4. Respect the limits. Split large sites with a sitemap index (50k URLs / 50MB per file).
  5. Use specialized sitemaps for images, video or news where relevant.
  6. Monitor coverage in Search Console to catch indexing gaps.

Frequently asked questions

What is a sitemap in SEO?

A sitemap is a file, usually XML, that lists a website's important URLs to help search engines discover and crawl them. It is a discovery aid, not a guarantee of indexing or ranking.

What is the difference between an XML and HTML sitemap?

An XML sitemap is machine-readable and built for search engines; an HTML sitemap is a navigation page built for human visitors. For SEO discovery, the XML sitemap is the one that matters.

Does a sitemap help with rankings?

Not directly. A sitemap helps search engines discover and crawl pages, but indexing and ranking still depend on content quality, relevance and other signals.

How many URLs can a sitemap contain?

Up to 50,000 URLs and a maximum of 50MB uncompressed per file. Larger sites split their URLs across multiple sitemaps referenced by a sitemap index file.

How do I submit my sitemap to Google?

Reference it in your robots.txt with a Sitemap line and submit it in Google Search Console, which also reports how many of the listed URLs are indexed.

Written by

Federico Ergang

Cliro cofounder & CEO

Federico Ergang is cofounder and CEO of Cliro, the AI visibility and GEO platform for Latin America.
