robots.txt Validator

Paste your robots.txt content to check its syntax. The validator parses the User-agent, Disallow, Allow, and Sitemap directives and visualizes the resulting crawler access rules.

Tips

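  • User-agent: * applies to every crawler that has no group of its own. A group naming a specific bot (e.g. Googlebot) takes precedence over the wildcard group: that bot follows only its own group, and the wildcard rules are not merged in.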
  • Disallow: with an empty value means "allow all paths". Disallow: / (slash only) blocks everything. Do not confuse the two.
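  • When Allow: and Disallow: both match a URL, the most specific rule (the one with the longest matching path) wins, and Allow: wins only when the two are equally specific. For example, Disallow: /private/ together with Allow: /private/public.html allows that single file, because the Allow path is longer. The order of the two lines does not matter.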
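  • Crawl-delay: is a non-standard directive (it is not part of RFC 9309) that requests a delay in seconds between requests. Bing respects it, and Yandex has supported it historically, so check its current documentation. Google ignores it. Google has also retired the crawl rate setting in Search Console, so to slow Googlebot temporarily, return HTTP 503 or 429 responses as Google's crawling documentation describes.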
  • The Sitemap: directive tells search engines the URL of your sitemap. Both Google and Bing recognize it. Combine it with manual submission via Google Search Console or Bing Webmaster Tools for best results.
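
The grouping and precedence rules in the tips above can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual implementation: the names Group, parse_robots, rules_for, and is_allowed are hypothetical, the matching is plain prefix matching (no wildcards), and Crawl-delay is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list = field(default_factory=list)  # lower-cased User-agent values
    rules: list = field(default_factory=list)   # (kind, path) tuples


def parse_robots(text):
    """Split robots.txt text into user-agent groups and sitemap URLs."""
    groups, sitemaps, current = [], [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()  # directive names are case-insensitive
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)
    return groups, sitemaps


def rules_for(groups, agent):
    """Rules from groups naming this agent, else from the wildcard group."""
    agent = agent.lower()
    specific = [g for g in groups
                if any(a and a != "*" and a in agent for a in g.agents)]
    chosen = specific or [g for g in groups if "*" in g.agents]
    return [rule for g in chosen for rule in g.rules]


def is_allowed(rules, path):
    """Longest matching path wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty Disallow: or Allow: matches nothing
            continue
        if path.startswith(pattern):
            n = len(pattern)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len, allowed = n, kind == "allow"
    return allowed


sample = """\
User-agent: *
Disallow: /private/
Allow: /private/public.html

User-agent: Googlebot
Disallow:
"""
groups, sitemaps = parse_robots(sample)
other = rules_for(groups, "SomeBot/1.0")
print(is_allowed(other, "/private/secret.html"))   # False
print(is_allowed(other, "/private/public.html"))   # True
google = rules_for(groups, "Googlebot/2.1")
print(is_allowed(google, "/private/secret.html"))  # True
```

In the sample, Googlebot has its own group with an empty Disallow:, so it may fetch everything and the wildcard group's /private/ rule does not apply to it.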

FAQ

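Not reliably. Blocking a page with Disallow stops compliant crawlers from reading its content, but it does not guarantee that the page stays out of search results. If other sites link to the page, search engines may still list the bare URL without a description. To keep a page out of the index, leave it crawlable and serve a noindex meta tag or an X-Robots-Tag: noindex header. Do not combine noindex with a Disallow rule for the same URL, because a crawler that is blocked by robots.txt never sees the noindex instruction.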

No. robots.txt is scoped per origin, meaning the combination of scheme, host, and port. The file at example.com/robots.txt does not apply to blog.example.com, so each subdomain needs its own robots.txt file.
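
As a small illustration of that scoping, the following Python sketch derives the robots.txt location that governs a given page URL. The function name robots_url is hypothetical; it only uses urllib.parse from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the origin that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host[:port]; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/docs/page.html"))
# https://example.com/robots.txt
print(robots_url("https://blog.example.com/posts/1?ref=home"))
# https://blog.example.com/robots.txt
```

The two pages map to different files, which is why rules written for example.com have no effect on blog.example.com.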

Googlebot supports * (any sequence of characters) and $ (end of URL). For example, Disallow: /private/*.pdf$ blocks all PDF files under /private/. Not all crawlers support wildcards, so check each bot's documentation.
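
One way to picture these two wildcards is to translate a robots.txt path pattern into a regular expression. The Python sketch below assumes the semantics described above (* matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a literal prefix). The names pattern_to_regex and pattern_matches are hypothetical, and real crawlers may differ in details such as percent-encoding.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into an equivalent regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without "$" the pattern is a prefix match, so leave the end open.
    return re.compile(body + (r"\Z" if anchored else ""))


def pattern_matches(pattern, path):
    # re.match anchors at the start of the path, like a robots.txt rule.
    return pattern_to_regex(pattern).match(path) is not None


print(pattern_matches("/private/*.pdf$", "/private/docs/report.pdf"))    # True
print(pattern_matches("/private/*.pdf$", "/private/report.pdf?dl=1"))    # False
print(pattern_matches("/private/*.pdf", "/private/report.pdf?dl=1"))     # True
print(pattern_matches("/private/*.pdf$", "/public/report.pdf"))          # False
```

The second and third lines show what $ changes: with the anchor, a URL that continues past .pdf (here with a query string) no longer matches.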

Side Note — The Birth of robots.txt: The Day the Internet Got Manners

In 1994, Dutch engineer Martijn Koster invented the Robots Exclusion Protocol (REP). At the time, web crawlers were roaming sites uncontrolled, placing excessive load on servers. Koster's idea was simple: "If you place a file at the site root with rules, crawlers will read it." That file became robots.txt.

Crucially, robots.txt is a voluntary gentlemen's agreement with no enforcement power. Well-behaved crawlers like Googlebot and Bingbot respect it, but malicious bots ignore it. This means robots.txt works well for telling search engines what not to crawl, but it should never be relied on to protect sensitive content from unauthorized access. Confidential pages must be protected with authentication.

The Robots Exclusion Protocol existed as an informal convention for nearly three decades before the IETF published it as RFC 9309 in September 2022, about 28 years after its creation. The RFC clarified that directive names are case-insensitive and resolved various long-standing syntactic ambiguities.