Robots.txt

Lukas Horvath, co-founder of Roelu Studio

What is Robots.txt?

Robots.txt is a plain text file placed at the root of a website (e.g. yoursite.com/robots.txt) that tells search engine crawlers which URLs they may access. Following the Robots Exclusion Protocol, it can allow or disallow specific paths and point to the XML sitemap. Some crawlers also honor a nonstandard Crawl-delay directive, though Google ignores it. Robots.txt does not block indexing on its own (for that you need a noindex tag); it only controls crawl behavior.
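
As a rough illustration of those directives, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The file contents, paths, and sitemap URL are invented for the example. This parser uses simple prefix matching, so treat it as an approximation rather than a replica of how any particular search engine evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths and sitemap URL are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) answers: may this crawler request this URL?
admin_allowed = parser.can_fetch("*", "https://yoursite.com/admin/login")
blog_allowed = parser.can_fetch("*", "https://yoursite.com/blog/post")

# site_maps() returns the Sitemap URLs listed in the file (Python 3.8+).
sitemaps = parser.site_maps()

print(admin_allowed, blog_allowed, sitemaps)
```

Note that a "disallowed" result here only says the URL should not be crawled. It says nothing about whether the URL is indexed.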

Why it matters

Crawlers such as Googlebot generally fetch robots.txt before they crawl anything else on your site. One typo and you can accidentally block your entire domain from being crawled. Yes, this happens, and yes, it has cost companies six-figure traffic drops overnight. Used correctly, it keeps crawlers away from admin pages, search results, faceted navigation, and other URL bloat that wastes crawl budget. Used incorrectly, it nukes your SEO. The fix is to review every change before deploying, and never confuse "disallow" with "noindex." Different jobs, very different consequences.
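
To make the "one typo" risk concrete, the sketch below compares two files that differ by a single character. It uses Python's standard-library `urllib.robotparser`. The helper name `is_crawlable` and the URL are hypothetical, chosen only for this example.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks every path; an empty "Disallow:" blocks nothing.
BLOCKS_EVERYTHING = "User-agent: *\nDisallow: /\n"
BLOCKS_NOTHING = "User-agent: *\nDisallow:\n"

URL = "https://yoursite.com/pricing"  # hypothetical page

everything_result = is_crawlable(BLOCKS_EVERYTHING, URL)
nothing_result = is_crawlable(BLOCKS_NOTHING, URL)

print(everything_result, nothing_result)
```

A check like this could run in a deploy pipeline as a cheap guard, assuming you keep a short list of pages that must always stay crawlable.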

How it works

The robots.txt file lives at yoursite.com/robots.txt. Inside, you group directives by user-agent: "User-agent: *" applies to all crawlers, while "Disallow: /admin/" tells them not to crawl that path. You can also target specific bots individually, like Googlebot or GPTBot, and include a Sitemap directive pointing to your XML sitemap. After deploying any change, check the robots.txt report in Google Search Console (which replaced the older robots.txt tester) and use URL Inspection to confirm critical URLs aren't accidentally blocked. Important: disallowing a URL in robots.txt doesn't remove it from Google's index if it's already there. For that, you need a noindex meta tag, and the page has to stay crawlable so Google can actually see the tag. Robots.txt controls crawling, not indexing of already-known pages.
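
For a quick local check of per-bot rules before or after a deploy, something like the following sketch may help. It assumes a hypothetical robots.txt that blocks GPTBot entirely and keeps all other crawlers out of /admin/. The `blocked_urls` helper and the URL list are made up for illustration. Python's `urllib.robotparser` applies the first matching rule in file order and does not understand `*` or `$` wildcards, so results can differ from Google's, which is why this complements Search Console rather than replacing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for GPTBot, one fallback group for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""

# Pages that must stay crawlable for search engines (assumed for the example).
CRITICAL_URLS = [
    "https://yoursite.com/",
    "https://yoursite.com/pricing",
    "https://yoursite.com/blog/robots-txt",
]


def blocked_urls(robots_txt: str, agent: str, urls: list[str]) -> list[str]:
    """Return the subset of `urls` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


googlebot_blocked = blocked_urls(ROBOTS_TXT, "Googlebot", CRITICAL_URLS)
gptbot_blocked = blocked_urls(ROBOTS_TXT, "GPTBot", CRITICAL_URLS)

print("Googlebot blocked:", googlebot_blocked)
print("GPTBot blocked:", gptbot_blocked)
```

With this file, Googlebot falls through to the `*` group and none of the critical URLs are blocked, while GPTBot matches its own group and is blocked everywhere. The same helper can be pointed at pages carrying a noindex tag to confirm they are still crawlable.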

  • XML Sitemap

    SEO/AEO/GEO

    A machine-readable file that lists every important page on your site, helping search engines find and crawl your content faster and more reliably than they…

  • Indexing

    SEO/AEO/GEO

    The process search engines use to store and organize web pages so they can show up in results — if your page isn't indexed, it can't rank, and most sites have…

  • Technical SEO

    SEO/AEO/GEO

    The plumbing of SEO — making sure search engines can crawl, render, and index your site quickly and cleanly, so your content actually has a chance to rank…

  • AI Crawler

    AI & Search

    An automated bot that AI companies use to read websites and feed the content into their models or live answer engines — including GPTBot, ClaudeBot,…

  • LLMs.txt

    AI & Search

    A plain markdown file you put at the root of your website that tells AI models which pages matter most and how to read them — like robots.txt, but for large…

  • Canonical URL

    SEO/AEO/GEO

    A tag that tells search engines which version of a page is the original when duplicates or near-duplicates exist, so ranking signals consolidate on one URL…