Ever heard the term robots.txt and wondered how it applies to your website? Most websites have a robots.txt file, but that doesn't mean most webmasters understand it. In this post, we hope to change that by offering a deep dive into the WordPress robots.txt file, as well as how it can control and limit access to your site. By the end, you'll be able to create, edit, and understand your own robots.txt file.
There's a lot to cover, so let's get started!
Before we can talk about the WordPress robots.txt, it's important to define what a "robot" is in this case. Robots are any type of "bot" that visits websites on the Internet. The most common example is search engine crawlers. These bots "crawl" around the web to help search engines like Google index and rank the billions of pages on the Internet.
So, bots are, in general, a good thing for the Internet…or at least a necessary thing. But that doesn't necessarily mean that you, or other webmasters, want bots running around unfettered. The desire to control how web robots interact with websites led to the creation of the robots exclusion standard in the mid-1990s. Robots.txt is the practical implementation of that standard: it allows you to control how participating bots interact with your site. You can block bots entirely, restrict their access to certain areas of your site, and more.
That "participating" part is important, though. Robots.txt cannot force a bot to follow its directives. And malicious bots can and will ignore the robots.txt file. Additionally, even reputable organizations ignore some commands that you can put in robots.txt. For example, Google will ignore any rules that you add to your robots.txt about how frequently its crawlers visit. If you are having a lot of issues with bots, a security solution such as Cloudflare or Sucuri can come in handy.
For most webmasters, the benefits of a well-structured robots.txt file boil down to two categories:
- Optimizing search engines' crawl resources by telling them not to waste time on pages you don't want indexed. This helps ensure that search engines focus on crawling the pages that you care about the most.
- Optimizing your server resource usage by blocking bots that are wasting your server resources.
Robots.txt Isn't Specifically About Controlling Which Pages Get Indexed In Search Engines
Robots.txt is not a foolproof way to control which pages search engines index. If your primary goal is to stop certain pages from being included in search engine results, the proper approach is to use a meta noindex tag (i.e. `<meta name="robots" content="noindex">`) or another similarly direct method.
This is because your robots.txt is not directly telling search engines not to index content; it's just telling them not to crawl it. While Google won't crawl the marked areas from inside your site, Google itself states that if an external site links to a page that you exclude with your robots.txt file, Google still might index that page.
By default, WordPress automatically creates a virtual robots.txt file for your site. So even if you don't lift a finger, your site should already have the default robots.txt file. You can test if this is the case by appending "/robots.txt" to the end of your domain name. For example, "https://kinsta.com/robots.txt" brings up the robots.txt file that we use here at Kinsta:
Because this file is virtual, though, you can't edit it. If you want to edit your robots.txt file, you'll need to actually create a physical file on your server that you can manipulate as needed. Here are three simple ways to do that…
How to Create And Edit A Robots.txt File With Yoast SEO
If you're using the popular Yoast SEO plugin, you can create (and later edit) your robots.txt file right from Yoast's interface. Before you can access it, though, you need to enable Yoast SEO's advanced features by going to SEO → Dashboard → Features and toggling on Advanced settings pages:
Once that's activated, you can go to SEO → Tools and click on File editor:
Assuming you don't already have a physical robots.txt file, Yoast will give you an option to Create robots.txt file:
And once you click that button, you'll be able to edit the contents of your robots.txt file directly from the same interface:
As you read on, we'll dig more into what types of directives to put in your WordPress robots.txt file.
How to Create And Edit A Robots.txt File With All In One SEO
If you're using the almost-as-popular-as-Yoast All in One SEO Pack plugin, you can also create and edit your WordPress robots.txt file right from the plugin's interface. All you need to do is go to All in One SEO → Feature Manager and Activate the Robots.txt feature:
Then, you'll be able to manage your robots.txt file by going to All in One SEO → Robots.txt:
How to Create And Edit A Robots.txt File Via SFTP
If you're not using an SEO plugin that offers robots.txt functionality, you can still create and manage your robots.txt file via SFTP. First, use any text editor to create an empty file named "robots.txt":
Then, connect to your site via SFTP and upload that file to the root folder of your site. You can make further modifications to your robots.txt file by editing it via SFTP or uploading new versions of the file.
OK, now you have a physical robots.txt file on your server that you can edit as needed. But what do you actually do with that file? Well, as you learned in the first section, robots.txt lets you control how robots interact with your site. You do that with two core commands:
- User-agent – this lets you target specific bots. User agents are what bots use to identify themselves. With them, you could, for example, create a rule that applies to Bing, but not to Google.
- Disallow – this lets you tell robots not to access certain areas of your site.
There's also an Allow command that you'll use in niche situations. By default, everything on your site is marked with Allow, so it's not necessary to use the Allow command in 99% of situations. But it does come in handy when you want to Disallow access to a folder and its child folders but Allow access to one specific child folder.
You add rules by first specifying which User-agent the rule should apply to and then listing out what rules to apply using Disallow and Allow. There are also some other commands like Crawl-delay and Sitemap, but these are either:
- Ignored by most major crawlers, or interpreted in vastly different ways (in the case of crawl delay)
- Made redundant by tools like Google Search Console (for sitemaps)
Let's go through some specific use cases to show you how this all comes together.
How To Use Robots.txt To Block Access To Your Entire Site
Let's say you want to block all crawler access to your site. This is unlikely to be something you'd want on a live site, but it does come in handy for a development site. To do that, you would add this code to your WordPress robots.txt file:
User-agent: *
Disallow: /
What's going on in that code?
The asterisk (*) next to User-agent means "all user agents". The asterisk is a wildcard, so the rule applies to every single bot. The slash (/) next to Disallow says you want to disallow access to all pages whose URL starts with "yourdomain.com/" (which is every single page on your site).
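If you want to sanity-check rules like these before deploying them, Python's standard-library urllib.robotparser can evaluate them; a minimal sketch, where the domain and user-agent name are just placeholders:

```python
from urllib import robotparser

# The "block everything for everyone" rules from above
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# No user agent may fetch any path on the site
print(rp.can_fetch("AnyBot", "https://yourdomain.com/"))           # False
print(rp.can_fetch("AnyBot", "https://yourdomain.com/some/page"))  # False
```

Note that urllib.robotparser is a simple reference implementation; big crawlers like Googlebot have their own parsers, but for prefix rules like this the answers match.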
How To Use Robots.txt To Block A Single Bot From Accessing Your Site
Let's change things up. In this example, we'll pretend that you don't like the fact that Bing crawls your pages. You're Team Google all the way and don't even want Bing to look at your site. To block only Bing from crawling your site, you would replace the wildcard asterisk (*) with Bingbot:
User-agent: Bingbot
Disallow: /
Essentially, the above code says to apply the Disallow rule only to bots with the User-agent "Bingbot". Now, you're unlikely to want to block access to Bing, but this scenario does come in handy if there's a specific bot that you don't want to access your site. This site has a good listing of most services' known User-agent names.
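You can verify that such a rule is scoped to one bot with the same standard-library parser (the second user-agent string below is just an example of "any other bot"):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Bingbot",
    "Disallow: /",
])

# Bingbot is shut out, but bots with other user agents are unaffected
print(rp.can_fetch("Bingbot", "https://yourdomain.com/"))    # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/"))  # True
```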
How To Use Robots.txt To Block Access To A Specific Folder Or File
For this example, let's say that you only want to block access to a specific file or folder (and all of that folder's subfolders). To make this apply to WordPress, let's say you want to block:
- The entire wp-admin folder
- The wp-login.php file
You could use the following commands:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
How to Use Robots.txt To Allow Access To A Specific File In A Disallowed Folder
OK, now let's say that you want to block an entire folder, but you still want to allow access to a specific file inside that folder. This is where the Allow command comes in handy. And it's actually very applicable to WordPress. In fact, the WordPress virtual robots.txt file illustrates this example perfectly:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This snippet blocks access to the entire /wp-admin/ folder except for the /wp-admin/admin-ajax.php file.
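One caveat worth knowing when testing this: Google applies the most specific matching rule regardless of order, but simpler parsers, including Python's urllib.robotparser, apply the first rule that matches. So in this sketch the Allow line is placed before the Disallow line (domain and file names are placeholders):

```python
from urllib import robotparser

# urllib.robotparser uses first-match semantics, so Allow goes first here;
# Google's own parser would pick the longest match in either order.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /wp-admin/admin-ajax.php",
    "Disallow: /wp-admin/",
])

print(rp.can_fetch("*", "https://yourdomain.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "https://yourdomain.com/wp-admin/options.php"))     # False
```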
How To Use Robots.txt To Stop Bots From Crawling WordPress Search Results
One WordPress-specific tweak you might want to make is to stop search crawlers from crawling your search results pages. By default, WordPress uses the query parameter "?s=". So to block access, all you need to do is add the following rule:
User-agent: *
Disallow: /?s=
Disallow: /search/
This can also be an effective way to stop soft 404 errors if you are getting them.
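Using the placeholder domain again, a quick check that internal search URLs are covered while normal pages remain crawlable:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /?s=",
    "Disallow: /search/",
])

# Search URLs are blocked, regular content is not
print(rp.can_fetch("*", "https://yourdomain.com/?s=hello"))         # False
print(rp.can_fetch("*", "https://yourdomain.com/search/hello/"))    # False
print(rp.can_fetch("*", "https://yourdomain.com/blog/some-post/"))  # True
```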
How To Create Different Rules For Different Bots In Robots.txt
Up until now, all the examples have dealt with one rule at a time. But what if you want to apply different rules to different bots? You simply need to add each set of rules under the User-agent declaration for each bot. For example, if you want to make one rule that applies to all bots and another rule that applies to just Bingbot, you could do it like this:
User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Disallow: /
In this example, all bots will be blocked from accessing /wp-admin/, but Bingbot will be blocked from accessing your entire site.
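The same parser can confirm that each group only applies to its own user agent (placeholder domain again):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wp-admin/",
    "",
    "User-agent: Bingbot",
    "Disallow: /",
])

# Generic bots only lose /wp-admin/; Bingbot loses everything
print(rp.can_fetch("Googlebot", "https://yourdomain.com/"))           # True
print(rp.can_fetch("Googlebot", "https://yourdomain.com/wp-admin/"))  # False
print(rp.can_fetch("Bingbot", "https://yourdomain.com/"))             # False
```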
You can test your WordPress robots.txt file in Google Search Console to ensure it's set up correctly. Simply click into your site, and under "Crawl" click on "robots.txt Tester." You can then submit any URL, including your homepage. You should see a green Allowed if everything is crawlable. You could also test URLs you have blocked to ensure they are in fact blocked and marked Disallowed.
Beware of the UTF-8 BOM
BOM stands for byte order mark: an invisible character that is sometimes added to files by old text editors and the like. If this happens to your robots.txt file, Google might not read it correctly. This is why it is important to check your file for errors. For example, as seen below, our file had an invisible character, and Google complained about the syntax not being understood. This essentially invalidates the first line of our robots.txt file altogether, which is not good! Glenn Gabe has an excellent article on how a UTF-8 BOM could kill your SEO.
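If you want to check for a stray BOM yourself before uploading, a small Python sketch does the job (the bytes below simulate an affected file):

```python
import codecs

# Simulated file content: a UTF-8 BOM followed by a normal first directive
raw = b"\xef\xbb\xbfUser-agent: *\nDisallow: /wp-admin/\n"

if raw.startswith(codecs.BOM_UTF8):
    print("BOM found, stripping it")
    raw = raw[len(codecs.BOM_UTF8):]

# The cleaned bytes now start with a valid directive
print(raw.decode("utf-8").splitlines()[0])  # User-agent: *
```

Saving your robots.txt as "UTF-8 without BOM" in your editor prevents the problem in the first place.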
To actually provide some context for the points listed above, here is how some of the most popular WordPress sites are using their robots.txt files.
TechCrunch
In addition to restricting access to a number of unique pages, TechCrunch notably disallows crawlers from certain sections of its site.
They also set special restrictions on two bots. In case you're interested, IRLbot is a crawler from a Texas A&M University research project. That's odd!
The Obama Foundation
The Obama Foundation hasn't made any special additions, opting exclusively to restrict access to /wp-admin/.
Angry Birds
Angry Birds has the same default setup as The Obama Foundation. Nothing special is added.
Drift
Finally, Drift opts to define its sitemaps in its robots.txt file, but otherwise leaves the same default restrictions as The Obama Foundation and Angry Birds.
Use Robots.txt The Right Way
As we wrap up our robots.txt guide, we want to remind you one more time that using a Disallow command in your robots.txt file is not the same as using a noindex tag. Robots.txt blocks crawling, but not necessarily indexing. You can use it to add specific rules to shape how search engines and other bots interact with your site, but it will not explicitly control whether your content is indexed or not.
For most casual WordPress users, there's not an urgent need to modify the default virtual robots.txt file. But if you're having issues with a specific bot, or want to change how search engines interact with a certain plugin or theme that you're using, you might want to add your own rules.
We hope you enjoyed this guide. Be sure to leave a comment if you have any further questions about using your WordPress robots.txt file.