/robots.txt). For accurate live testing of your own site, use the URL Tester tab with your actual robots.txt content.
How This Robots.txt Tester Works
Everything runs 100% in your browser — no data is sent to any server.
What Is a Robots.txt File — and Why Does It Actually Matter?
A robots.txt file sits at the root of your website — always at yourdomain.com/robots.txt — and acts as a polite instruction sheet for web crawlers. When Googlebot, Bingbot, or any other crawler visits your site, the very first thing it checks is this file. It tells bots which pages they can crawl and which ones to skip.
Here's the thing most people get wrong: robots.txt doesn't prevent pages from appearing in search results. It only asks crawlers not to fetch those URLs. A disallowed page can still get indexed if another site links to it, usually as a bare URL with no description. If you truly want a page out of Google's index, put a noindex meta tag on it and leave the page crawlable so Google can read the tag.
For SEO in 2026, robots.txt is as important as ever — especially with Google's rendering pipeline. If your CSS, JavaScript, or font files are accidentally blocked, Googlebot can't render your pages properly, which directly impacts how it understands and ranks your content.
The Basic Structure
Every valid robots.txt follows the same pattern: one or more User-agent lines that name the bot, followed by one or more Allow or Disallow rules. Groups are conventionally separated by blank lines. The asterisk * matches all bots. When a group names a specific bot like Googlebot, that bot follows its own group and ignores the wildcard group.
A Disallow: / blocks the entire site. An empty Disallow: allows everything. When there's no robots.txt at all, bots assume full access — which is fine for most public websites.
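The group structure described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular crawler is implemented: the names parse_groups, rules_for, and SAMPLE are hypothetical, and user-agent matching is simplified to an exact, case-insensitive comparison.

```python
from collections import defaultdict

# Hypothetical sample file: a wildcard group plus a Googlebot-specific group.
SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
Allow: /drafts/public/
"""

def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent value to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    current_agents: list[str] = []
    reading_agents = False  # True while consecutive User-agent lines are read
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
                reading_agents = True
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return dict(groups)

def rules_for(groups: dict[str, list[tuple[str, str]]], user_agent: str):
    """Use the bot's own group if it has one; otherwise fall back to '*'."""
    return groups.get(user_agent.lower(), groups.get("*", []))

if __name__ == "__main__":
    parsed = parse_groups(SAMPLE)
    print(rules_for(parsed, "Googlebot"))
    print(rules_for(parsed, "Bingbot"))
```

In this sketch Googlebot receives only its own rules, while any bot without a named group receives the wildcard rules.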
Robots.txt Directives — Plain English Glossary
These are all the directives you'll encounter when reading or writing a robots.txt file. Understanding each one helps you write rules that actually do what you intend.
* for all bots, or name a specific crawler like Googlebot.
/search/* blocks all URLs starting with /search/, regardless of what follows.
Disallow: /*.pdf$ blocks only URLs ending in .pdf, not URLs that contain .pdf somewhere in the middle.
7 Robots.txt Mistakes That Quietly Hurt Your SEO
Most robots.txt errors aren't obvious — they don't break your site, they just silently stop Google from doing its job. These are the ones we see most often when auditing sites.
/wp-includes/ and /assets/ disallow rule you have.
noindex tag on the page itself, not a Disallow rule.
Sitemap: /sitemap.xml instead of the full absolute URL (https://yourdomain.com/sitemap.xml) is technically invalid. Some crawlers handle it fine; others ignore it entirely.
Disallow: /Admin/ and Disallow: /admin/ are different rules. URLs are case-sensitive in robots.txt. Write your paths exactly as they appear on your server.
Having multiple User-agent: Googlebot blocks in one file creates ambiguity. Different crawlers interpret this differently, so consolidate all rules for a bot into a single group.
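Two of the mistakes above, relative Sitemap URLs and repeated User-agent groups, are easy to check mechanically. The following is a rough sketch under stated assumptions: lint_robots is a hypothetical helper, it flags any repeated User-agent value, and it does not attempt to detect the other mistakes in the list.

```python
from urllib.parse import urlparse

def lint_robots(robots_txt: str) -> list[str]:
    """Return warnings for relative Sitemap URLs and repeated User-agent values."""
    warnings: list[str] = []
    seen_agents: set[str] = set()
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            parsed = urlparse(value)
            # An absolute URL needs both a scheme and a host.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                warnings.append(
                    f"line {number}: Sitemap should be an absolute URL, got {value!r}"
                )
        elif field == "user-agent":
            agent = value.lower()
            if agent in seen_agents:
                warnings.append(
                    f"line {number}: User-agent {value!r} appears more than once"
                )
            seen_agents.add(agent)
    return warnings

if __name__ == "__main__":
    example = (
        "User-agent: Googlebot\n"
        "Disallow: /a/\n"
        "\n"
        "User-agent: Googlebot\n"
        "Disallow: /b/\n"
        "Sitemap: /sitemap.xml\n"
    )
    for warning in lint_robots(example):
        print(warning)
```

Case mismatches such as /Admin/ versus /admin/ cannot be caught this way, because the check would need to know which paths actually exist on the server.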
Robots.txt vs Meta Noindex vs Canonical — When to Use Which
These three tools often get confused because they all deal with controlling what Google sees. They work very differently though, and using the wrong one can create invisible SEO problems.
| Method | Stops Crawling? | Removes from Index? | Best Used For |
|---|---|---|---|
| robots.txt Disallow | Yes | No | Saving crawl budget — keeping bots away from admin pages, internal search, duplicate parameter URLs |
| Meta noindex | No | Yes | Removing a page from search results while still letting it be crawled (so Google can read the tag) |
| Canonical tag | No | Consolidates signals | Duplicate or near-duplicate content — tells Google which version is the "real" URL to rank |
| Disallow + noindex combined | Yes | Won't work | This is a trap — if Googlebot is blocked from the page, it can't read the noindex tag. Don't combine these. |
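The table's last row can be turned into a small diagnostic. This is only a sketch: RobotsMetaParser, has_noindex, and diagnose are hypothetical names, the caller is assumed to already know whether robots.txt blocks the URL, and only the meta tag form of noindex is checked.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Record whether a robots or googlebot meta tag contains 'noindex'."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            directives = [item.strip().lower() for item in content.split(",")]
            if "noindex" in directives:
                self.noindex = True

def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

def diagnose(blocked_by_robots: bool, html: str) -> str:
    """Describe the likely outcome for one page, following the table above."""
    noindex = has_noindex(html)
    if blocked_by_robots and noindex:
        return "trap: the crawler cannot fetch the page, so it never sees the noindex tag"
    if blocked_by_robots:
        return "not crawled, but may still be indexed through external links"
    if noindex:
        return "crawled and dropped from the index"
    return "crawled and eligible for indexing"

if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(diagnose(True, page))
    print(diagnose(False, page))
```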
Frequently Asked Questions About Robots.txt
These are the questions that come up most often — from beginners setting up their first file to developers debugging complex crawl issues.
noindex meta tag on the page — and that page must be crawlable so Google can read the tag.
Disallow: /*.pdf$ blocks all URLs ending in .pdf. The dollar sign anchors the match to the end of the URL, so it won't accidentally block a URL that merely contains .pdf, such as /report.pdf.html. You can apply this approach to any file extension: .doc, .xls, .zip, and so on.
Disallow: /Admin/ and Disallow: /admin/ are treated as two different rules. Always match the exact case of your actual URL paths.
Disallow tells a bot not to crawl a path. Allow explicitly permits a path; its main purpose is to carve out exceptions inside a broader Disallow rule. For example, if you block all of /wp-admin/ but need Googlebot to access /wp-admin/admin-ajax.php (which powers live search and AJAX features), you add an Allow rule for that specific file. When an Allow and a Disallow rule both match a URL, Google and other crawlers that follow RFC 9309 apply the rule with the longer, more specific path; if the two are the same length, Allow wins over Disallow.
https://), and you can list multiple Sitemap lines if you have more than one sitemap file.
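The wildcard, case-sensitivity, and Allow versus Disallow answers above can be combined into one small matcher. Treat it as a sketch of the longest-match behavior described in RFC 9309, not as a reference implementation: pattern_to_regex and is_allowed are hypothetical names, and percent-encoding normalization is left out.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply (directive, pattern) rules: longest match wins, Allow wins ties."""
    best_length = -1
    allowed = True  # a path that no rule matches may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        # re.match anchors at the start of the path and is case-sensitive.
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            is_allow = directive.lower() == "allow"
            if length > best_length or (length == best_length and is_allow):
                best_length = length
                allowed = is_allow
    return allowed

if __name__ == "__main__":
    rules = [
        ("Disallow", "/wp-admin/"),
        ("Allow", "/wp-admin/admin-ajax.php"),
        ("Disallow", "/*.pdf$"),
        ("Disallow", "/Admin/"),
    ]
    for path in ("/wp-admin/options.php", "/wp-admin/admin-ajax.php",
                 "/files/report.pdf", "/files/report.pdf?download=1", "/admin/"):
        print(path, is_allowed(rules, path))
```

One consequence worth noticing: because of the $ anchor, /files/report.pdf?download=1 is not blocked by /*.pdf$ in this sketch, so PDF URLs with query strings need their own rule if they should be excluded too.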