Robots.txt Generator
Generate a correct robots.txt and avoid crawling errors that hurt your SEO.
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Why it matters
The correct robots.txt protects your SEO
No syntax errors
Generate a valid file with the exact syntax search engines expect.
Avoid de-indexing
Configure correct rules so you don't accidentally block CSS, JS, or important pages.
Includes Sitemap directive
Add your sitemap URL so Googlebot discovers your content faster.
No signup
Generate and download your robots.txt without creating any account.
How it works
Three steps, no hassle
Configure your rules
Choose the bots (Googlebot, Bingbot, all) and define which paths to allow or disallow. Add your sitemap URL if you have one.
Preview the file
The generator builds the robots.txt in real time. Check that the rules are exactly what you need.
Download and upload to your server
Copy the content or download the file. Upload it to your domain root as /robots.txt.
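As a rough illustration of the three steps above, the sketch below renders a set of rules as robots.txt text. It is a minimal example rather than this generator's actual code: the Group class and the build_robots_txt function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One robots.txt group: the user agents it targets and their path rules."""
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)


def build_robots_txt(groups: list[Group], sitemaps: Optional[list[str]] = None) -> str:
    """Render groups and optional sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        for path in group.allow:
            lines.append(f"Allow: {path}")
        # An empty Disallow value means nothing is blocked for this group.
        for path in group.disallow or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        lines.append("")  # a blank line separates groups
    for url in sitemaps or []:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"
```

Calling build_robots_txt([Group(["*"])], ["https://example.com/sitemap.xml"]) reproduces the permissive example at the top of this page. The returned text is what you would save as robots.txt and upload to the domain root.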
FAQ
Got questions?
Robots.txt is a plain text file that websites place in their root directory to tell search engine crawlers (bots) which pages or sections they should not visit. It was proposed by Martijn Koster in 1994 as the Robots Exclusion Standard, an informal convention that the major search engines adopted. Compliance is voluntary: Google, Bing, and other reputable crawlers honor the file, but nothing forces a bot to obey it.
No, robots.txt does not prevent indexing; this is the most common misconception. Robots.txt controls crawling, not indexing. A search engine can still index a URL blocked in robots.txt if it finds links to it from other pages. To truly prevent indexing, use the <meta name='robots' content='noindex'> tag or the HTTP header X-Robots-Tag: noindex on the page itself, and leave the page crawlable so the crawler can actually see that instruction.
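One way to send the X-Robots-Tag header is at the application layer. The sketch below is a minimal WSGI middleware that adds the header to responses under selected path prefixes. The prefixes and the noindex_middleware name are assumptions made for illustration; many sites set this header in the web server configuration instead.

```python
# Hypothetical path prefixes that should stay out of search indexes.
NOINDEX_PREFIXES = ("/internal/", "/drafts/")


def noindex_middleware(app, prefixes=NOINDEX_PREFIXES):
    """Wrap a WSGI app so matching responses carry X-Robots-Tag: noindex."""
    prefixes = tuple(prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start(status, headers, exc_info=None):
            if path.startswith(prefixes):
                # Copy the header list instead of mutating the app's own list.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```

For the header to have any effect, the same paths must not be disallowed in robots.txt, because a crawler that never fetches the page never sees the header.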
The most frequent robots.txt mistakes are: (1) blocking CSS and JavaScript files, which prevents Googlebot from rendering your pages correctly and can hurt rankings; (2) using robots.txt to hide pages with sensitive information, although it is not a security mechanism and the file itself is publicly readable; (3) incorrect syntax, such as misspelled directives or paths written in the wrong case (path matching is case-sensitive); (4) omitting the Sitemap directive, which helps search engines discover your content.
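Several of these mistakes can be caught mechanically. The following sketch is a crude linter, not a full parser: lint_robots_txt is a hypothetical helper, and its CSS/JavaScript check is a simple substring heuristic that can produce false positives (for example on .json paths).

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for a robots.txt draft (heuristic only)."""
    warnings: list[str] = []
    has_sitemap = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()  # directive names are matched case-insensitively
        if key not in KNOWN_DIRECTIVES:
            warnings.append(f"line {number}: unknown directive '{name}'")
        elif key == "sitemap":
            has_sitemap = True
        elif key in ("allow", "disallow") and value and not value.startswith(("/", "*")):
            warnings.append(f"line {number}: path should start with '/'")
        elif key == "disallow" and value == "/":
            warnings.append(f"line {number}: 'Disallow: /' blocks the whole site")
        elif key == "disallow" and any(ext in value.lower() for ext in (".css", ".js")):
            warnings.append(f"line {number}: rule may block CSS or JavaScript")
    if not has_sitemap:
        warnings.append("no Sitemap directive found")
    return warnings
```

An empty list means none of these particular checks fired; it does not prove the file does what you intend.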
Google supports the User-agent, Disallow, Allow, and Sitemap directives, along with the * wildcard and the $ end-of-URL anchor in paths. Crawl-delay is a non-standard extension that Google ignores. Separately from robots.txt, Google also reads the X-Robots-Tag HTTP header for document-level instructions, including on non-HTML resources like PDFs and images.
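To make the wildcard behavior concrete, here is a small sketch of how a path pattern can be matched against a URL path. It assumes the conventions described in RFC 9309: * matches any sequence of characters, a trailing $ anchors the pattern to the end of the URL, and matching is case-sensitive. The function name path_matches is illustrative, and percent-encoding normalization is left out.

```python
import re


def path_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the given URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal segments and let each '*' match any run of characters.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    if anchored:
        return regex.fullmatch(path) is not None  # must consume the whole path
    return regex.match(path) is not None  # prefix match from the start
```

For example, the pattern /*.pdf$ matches /docs/file.pdf but not /docs/file.pdf?x=1, while /*.pdf without the anchor matches both.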
The Sitemap: https://example.com/sitemap.xml directive inside robots.txt tells crawlers where to find the site's XML sitemap. It is a discovery method complementary to registering in Google Search Console. You can declare multiple sitemaps in the same robots.txt. Although not part of the original 1994 standard, all major search engines recognize it.
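Python's standard library can read these declarations back out of a file, which is a quick way to confirm that every sitemap was picked up. The sketch below uses urllib.robotparser, whose site_maps() method is available from Python 3.8; list_sitemaps is just a hypothetical wrapper name.

```python
from urllib.robotparser import RobotFileParser


def list_sitemaps(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemap.
    return parser.site_maps() or []
```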
Robots.txt: the Robots Exclusion Standard and its SEO impact
The Robots Exclusion Standard (RES) was born in 1994 from a proposal by Martijn Koster, a Dutch software engineer, published on the www-talk mailing list. At the time, early web crawlers such as Matthew Gray's World Wide Web Wanderer (1993) could consume enough server bandwidth that administrators needed a way to control them. Koster proposed robots.txt as a voluntary convention, and the crawlers and search engines of the era, including WebCrawler and later AltaVista, adopted it.
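In 2019, Google submitted a formal specification of the Robots Exclusion Protocol (REP) to the IETF as an Internet-Draft, and it was published as RFC 9309 in 2022. This formalization, nearly 30 years after the original proposal, standardized aspects that had remained ambiguous: behavior when the file cannot be fetched (a 4xx response means the crawler may access any resource, while a 5xx response means it must assume a complete disallow), the parsing limit (crawlers must process at least the first 500 kibibytes of the file), the precedence of Allow/Disallow rules (the longest matching path wins, and Allow wins a tie), and case-sensitive path matching. Google's own crawlers additionally treat a 429 response like a server error rather than like other 4xx codes.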
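Two of these rules are easy to express in code. The sketch below is a simplified reading of RFC 9309, not a complete implementation: it uses plain prefix matching without wildcards, and the names is_allowed and policy_for_status are hypothetical.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to ("allow" | "disallow", prefix) rules."""
    best_length = -1
    allowed = True  # a path that matches no rule may be crawled
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        # The longest matching rule wins; on a tie, Allow is preferred.
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


def policy_for_status(status: int) -> str:
    """Map the HTTP status of the robots.txt fetch to a crawl policy."""
    if 200 <= status < 300:
        return "parse"         # use the rules in the fetched file
    if 400 <= status < 500:
        return "allow_all"     # file unavailable: any resource may be crawled
    if 500 <= status < 600:
        return "disallow_all"  # server unreachable: assume complete disallow
    return "undefined"         # redirects and other codes are not handled here
```

With the rules Disallow: /shop/ and Allow: /shop/sale/, the path /shop/sale/item is crawlable because the Allow rule is the longer match, while /shop/cart stays blocked.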
A misconfigured robots.txt can have severe SEO consequences. In 2006, Google reportedly de-indexed part of its own site due to a robots.txt error. In 2013, Expedia reportedly lost significant organic visibility from an accidental block. In 2020, several large sites reportedly experienced organic traffic drops from similar errors during migrations. The 'Disallow: /' directive (blocking the entire site) is set by default in many CMSs during development, and forgetting to revert it in production is a classic error that SEO auditors check first. Google Search Console's robots.txt report shows the file Google last fetched along with any parsing problems, and its URL Inspection tool shows whether a specific live URL is blocked; rules that are not yet deployed need to be checked offline before they are pushed to the server.
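A draft file can also be checked locally before it goes live. The sketch below uses Python's urllib.robotparser to list which of your important URLs a draft would block. blocked_urls is a hypothetical helper, and the standard-library parser is only an approximation of Googlebot: it applies the first matching rule rather than the longest one and does not understand wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the draft robots.txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Running this against a short list of key pages (home page, a product page, main CSS and JavaScript files) as part of a deployment checklist would catch a leftover 'Disallow: /' before it reaches production.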