Copyright © 2024 IPzen
Robots.txt
Definition
The robots.txt file is a plain text file placed at the root of a site (for example https://ipzen.com/robots.txt) that tells crawlers which parts of the site they may or may not crawl. It follows the Robots Exclusion Protocol, formalised as a standard by the IETF in 2022 (RFC 9309).
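As a minimal sketch, assuming a hypothetical file for example.com: rules are grouped under a User-agent line, and a compliant crawler checks each URL against the group that applies to it before requesting it. Python's standard urllib.robotparser module can play the crawler's role here. It applies the first matching rule in file order and does not implement the * and $ wildcards, whereas RFC 9309 applies the longest matching rule, so the Allow line is placed first to make both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all crawlers ("*") with one
# Allow exception inside a disallowed directory.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() takes the file's lines; read() would fetch them over HTTP instead.
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL against the rules before requesting it.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post-1"))     # True
print(parser.can_fetch("ExampleBot", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/admin/help"))      # True
```

"ExampleBot" is a made-up crawler name; since no group names it, the "*" group applies.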
What it is used for
- Block crawling of private areas (admin sections, user accounts)
- Preserve the crawl budget on large sites
- Keep crawlers away from duplicate pages or e-commerce filter URLs
- Declare the location of the XML sitemap
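A hypothetical file for an example shop can cover all four uses at once; every path and the sitemap URL below are assumptions for illustration. In practice filter URLs are often query strings, but this sketch uses a path prefix (/catalog/filter/) because urllib.robotparser matches rules by plain prefix, without wildcards. The Sitemap line is independent of the User-agent groups.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example shop; paths and sitemap URL
# are illustrative assumptions, not a real site's configuration.
ROBOTS_TXT = """\
# Private areas
User-agent: *
Disallow: /admin/
Disallow: /account/
# Filtered listing pages that duplicate the main catalogue
Disallow: /catalog/filter/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Show which sample paths a search crawler may visit.
for path in ("/", "/admin/users", "/account/orders",
             "/catalog/filter/red", "/catalog/shoes"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(f"{path:25} {'crawl' if allowed else 'blocked'}")

# Sitemap declarations are collected regardless of group (Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```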
Best practices
A misconfigured robots.txt can accidentally block the whole site. Key rules:
- Never use "Disallow: /" without good reason on a live site
- Do not rely on it to hide sensitive content: the file is publicly readable, and a blocked URL can still be indexed if other pages link to it
- Test through Google Search Console before any deployment
- Address specific crawlers through their User-agent names (Googlebot, Bingbot, GPTBot, ClaudeBot, etc.)
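Part of these checks can be automated before deployment. The sketch below assumes a hypothetical candidate file, hypothetical lists of must-crawl pages and crawlers to exclude, and a hypothetical audit() helper. It flags any search crawler blocked from key pages (the "Disallow: /" accident) and any crawler meant to be excluded that is still allowed. It relies on urllib.robotparser, whose prefix-only, first-match semantics can differ from how Googlebot reads complex files, so it complements rather than replaces testing in Google Search Console.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical candidate file: keep GPTBot out entirely and keep every
# crawler out of /admin/.
CANDIDATE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# Hypothetical site-specific expectations (adjust to the real site).
HOME = "https://example.com/"
MUST_CRAWL = [HOME, "https://example.com/products/"]
SEARCH_BOTS = ["Googlebot", "Bingbot"]
BLOCKED_BOTS = ["GPTBot"]


def audit(robots_txt: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    # Search crawlers must still reach the key pages (catches "Disallow: /").
    for bot in SEARCH_BOTS:
        for url in MUST_CRAWL:
            if not parser.can_fetch(bot, url):
                problems.append(f"{bot} is blocked from {url}")
    # Crawlers the owner wants excluded must actually be excluded.
    for bot in BLOCKED_BOTS:
        if parser.can_fetch(bot, HOME):
            problems.append(f"{bot} is not blocked")
    return problems


print(audit(CANDIDATE))                       # []
print(audit("User-agent: *\nDisallow: /\n"))  # four "... is blocked from ..." entries
```

Returning a list of messages rather than raising on the first failure lets a deployment script report every problem in one run.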