Robots.txt

Definition

The robots.txt file is a plain text file placed at the root of a site (for example https://ipzen.com/robots.txt) that tells crawlers which parts of the site they may or may not crawl. It follows the Robots Exclusion Protocol, formalised as a standard by the IETF in 2022 (RFC 9309).
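To illustrate how a compliant crawler reads the file, here is a minimal sketch using Python's standard urllib.robotparser module as a stand-in for a crawler's parser. The file content and paths are hypothetical. Note that this parser matches rules by path prefix and applies the first matching rule in file order, which is simpler than the longest-match precedence described in RFC 9309, so it is not a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written as a string for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://ipzen.com/blog/"))       # True
print(parser.can_fetch("Googlebot", "https://ipzen.com/admin/users"))  # False
print(parser.can_fetch("GPTBot", "https://ipzen.com/blog/"))          # False
```

Googlebot has no group of its own here, so it falls back to the `*` group and may crawl /blog/; GPTBot matches its own group, which blocks everything.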

What it is used for

  • Block crawling of private areas (admin sections, user accounts)
  • Save crawl budget on large sites by keeping bots away from low-value URLs
  • Keep crawlers away from duplicate pages and e-commerce filter URLs (this limits crawling, not indexing)
  • Declare the location of the XML sitemap
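A file covering the uses listed above might look like the following sketch, embedded in a Python string so it can be checked with urllib.robotparser. The paths (/admin/, /account/, /shop/filter/) and the sitemap URL are assumptions for illustration. Filters on real shops often use query parameters matched with wildcard rules, which this parser does not interpret, hence the plain path prefixes here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; all paths and the sitemap URL are illustrative.
ROBOTS_TXT = """\
User-agent: *
# Private areas
Disallow: /admin/
Disallow: /account/
# Duplicate listing pages generated by e-commerce filters
Disallow: /shop/filter/

Sitemap: https://ipzen.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check a few sample paths against the rules for a generic crawler.
for path in ["/admin/settings", "/account/orders", "/shop/filter/red", "/shop/shoes"]:
    allowed = parser.can_fetch("Bingbot", "https://ipzen.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")

# The Sitemap line applies to the whole file, independently of any group.
print(parser.site_maps())  # ['https://ipzen.com/sitemap.xml']
```

Only /shop/shoes is reported as allowed; the three other paths fall under a Disallow prefix.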

Best practices

A misconfigured robots.txt can accidentally block the whole site. Key rules:

  • Never put Disallow: / on a live site unless you intend to block crawling of the entire site
  • Do not rely on it to hide sensitive content: the file is public, and a blocked URL can still be indexed if other pages link to it. Use authentication, or a noindex directive on a crawlable page, instead
  • Test the file before any deployment, then check the robots.txt report in Google Search Console once it is live
  • Target specific crawlers with their own User-agent groups when needed (Googlebot, Bingbot, GPTBot, ClaudeBot, etc.)
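A simple automated check before deployment can catch the "whole site blocked" mistake: parse the candidate file and confirm that pages which must stay crawlable are not blocked for the main search engine bots. This is a minimal sketch, assuming Python's urllib.robotparser as the parser (its first-match prefix rules may evaluate edge cases differently from Googlebot). The function name blocked_urls, the page list and both file contents are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agents, must_crawl):
    """Return the (user_agent, url) pairs that the given rules would block.

    must_crawl lists pages that should always stay crawlable.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, url)
        for agent in user_agents
        for url in must_crawl
        if not parser.can_fetch(agent, url)
    ]


# Hypothetical staging file accidentally shipped to production: blocks everything.
STAGING_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical production file: keeps two AI crawlers out (illustrative choice,
# not a recommendation) and only hides /admin/ from everyone else.
PRODUCTION_ROBOTS = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

AGENTS = ["Googlebot", "Bingbot"]
PAGES = ["https://ipzen.com/", "https://ipzen.com/blog/robots-txt"]

print(blocked_urls(STAGING_ROBOTS, AGENTS, PAGES))     # all four pairs are blocked
print(blocked_urls(PRODUCTION_ROBOTS, AGENTS, PAGES))  # []
```

A non-empty result for the search engine bots is a signal to stop the deployment; running the same check with "GPTBot" would confirm that the intended block is in place.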