Robots.txt

Definition

The robots.txt file is a plain text file placed at the root of a site (for example https://ipzen.com/robots.txt) that tells crawlers which parts of the site they may or may not crawl. It follows the Robots Exclusion Protocol, which the IETF formalised as a standard (RFC 9309) in 2022.
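Because the file always sits at the root, a crawler can work out which robots.txt governs any page from the page URL alone. The Python snippet below is a minimal sketch of that lookup. It assumes only the root-location rule described above. The function name `robots_url` is a hypothetical illustration and does not come from any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    The file lives at the root of the same scheme and host, so the page's
    path, query string and fragment are discarded. netloc keeps the host
    and any explicit port.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://ipzen.com/blog/article?id=3#top"))
```

Deep pages, query strings and fragments all resolve to the same https://ipzen.com/robots.txt.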

What it is used for

Block crawling of private areas (admin sections, user accounts)
Save crawl budget on large sites by steering crawlers away from low-value URLs
Keep crawlers away from duplicate pages and e-commerce filter URLs (robots.txt controls crawling, not indexing)
Declare the location of the XML sitemap
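The uses listed above can be illustrated with a hypothetical robots.txt for ipzen.com. Its paths (`/admin/`, `/account/`, `/shop/filter/`) and its sitemap URL are assumptions made for this example and are not the site's real file. The sketch parses the file with Python's standard `urllib.robotparser` module and asks whether a crawler may fetch a few URLs. `urllib.robotparser` is a simplified parser that does not implement every detail of RFC 9309, such as wildcard patterns. For that reason, the sample sticks to plain path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: paths and sitemap URL are illustrative only.
# No blank line may separate "User-agent" from its rules, or the group resets.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /account/
Disallow: /shop/filter/

Sitemap: https://ipzen.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

for url in (
    "https://ipzen.com/admin/settings",   # private area
    "https://ipzen.com/shop/filter/red",  # e-commerce filter URL
    "https://ipzen.com/blog/robots-txt",  # ordinary content
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:8} {url}")

# Sitemap lines are collected independently of any User-agent group.
print(parser.site_maps())
```

The admin URL and the filter URL are reported as blocked and the blog URL as allowed. `site_maps()` returns the declared sitemap URL.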

Best practices

A misconfigured robots.txt can accidentally block the whole site. Key rules:

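Never leave Disallow: / under User-agent: * on a live site unless you really intend to shut every compliant crawler out of the whole site
Do not rely on it to hide sensitive content: the file is publicly readable, and a blocked URL can still be indexed if other sites link to it
Test every change before deploying it, for example through Google Search Console
Write separate groups for specific User-agents (Googlebot, Bingbot, GPTBot, ClaudeBot, etc.) when they need different rules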
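You can catch a stray Disallow: / with a quick local check before deployment. The sketch below parses a candidate file with Python's `urllib.robotparser` and lists which of several crawlers would be kept off the homepage. The function name `blocked_from_homepage`, the `CRAWLERS` list and the candidate file are hypothetical. Python's parser is simplified and may differ from how search engines interpret edge cases. Treat the check as a sanity test, not a replacement for the search engines' own tools.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of crawlers whose access we want to confirm.
CRAWLERS = ("Googlebot", "Bingbot", "GPTBot", "ClaudeBot")


def blocked_from_homepage(robots_txt: str, site: str, crawlers=CRAWLERS) -> list[str]:
    """Return the crawlers that robots_txt would keep off the site's homepage.

    A crawler uses the group naming its User-agent, else the "*" group.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    homepage = site.rstrip("/") + "/"
    return [agent for agent in crawlers if not parser.can_fetch(agent, homepage)]


# Candidate file: GPTBot gets its own group, everyone else the "*" group.
candidate = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_from_homepage(candidate, "https://ipzen.com"))
```

Here only GPTBot is reported, because it has its own group containing Disallow: /. The function reports every blocked crawler rather than guessing intent. A deliberate opt-out like this one therefore appears next to any accidental block, and a person can confirm which is which.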
