XML sitemap

Definition

An XML sitemap is a file that lists the important URLs of a site, each optionally accompanied by metadata: last modification date (lastmod), change frequency (changefreq) and priority (priority). It helps search engines discover content and prioritise it for crawling and indexing.
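To make the structure concrete, here is a minimal sketch that builds such a file with Python's standard library. The function name build_sitemap and the example.com URLs are assumptions made for illustration; only the urlset, url, loc and lastmod elements and the namespace come from the sitemaps.org protocol.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for (url, last_modified) pairs.

    build_sitemap is a hypothetical helper; last_modified may be a
    datetime.date or None when the date is unknown.
    """
    # Serialise the sitemap namespace as the default (prefix-free) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        if last_modified is not None:
            # lastmod uses the W3C date format, e.g. YYYY-MM-DD.
            ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = (
                last_modified.isoformat()
            )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/", date(2024, 1, 15)),
        ("https://example.com/about", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

The changefreq and priority elements are left out here to keep the sketch short; they would be added as further children of each url element in the same way.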

Why it helps

  • Speed up the discovery of new pages
  • Signal updates to search engines
  • Optimise the crawl budget on large sites
  • Cover content that is poorly connected through internal linking

Best practices

  • Only list canonical URLs (see canonical tag) that return a 200 status
  • Exclude pages that carry a noindex directive or are blocked by robots.txt
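  • Keep each file clean and within the protocol limits of 50,000 URLs and 50 MB uncompressed; beyond that, split the list into several sitemaps referenced from a sitemap index file (gzip compression is optional and only reduces transfer size)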
  • Declare it in robots.txt with a Sitemap: line and submit it in Search Console
  • For multilingual sites, declare hreflang alternates inside the sitemap (xhtml:link elements with rel="alternate" and an hreflang attribute)
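The filtering and size rules from the list above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the Page record, its fields and the helper names are hypothetical, and a real crawler or CMS export would supply its own data. The 50,000-URL limit per file and the sitemapindex element come from the sitemaps.org protocol.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol


@dataclass
class Page:
    """Hypothetical record describing one crawled page."""
    url: str
    canonical_url: str
    status: int
    noindex: bool = False
    blocked_by_robots: bool = False


def eligible_urls(pages):
    """Keep only self-canonical, indexable, crawlable pages returning 200."""
    return [
        p.url
        for p in pages
        if p.status == 200
        and p.url == p.canonical_url
        and not p.noindex
        and not p.blocked_by_robots
    ]


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split the URL list into groups that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (bytes) pointing at the individual sitemap files."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        node = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

In use, each group returned by chunk would be written to its own file (for example sitemap-1.xml, sitemap-2.xml, names chosen here only for illustration), and the index listing those files is what gets declared in robots.txt. The sketch checks only the URL count; a production version would also verify the 50 MB uncompressed size limit.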