Robots.txt Rules for WordPress

Robots.txt in 30 seconds

At their simplest, robots.txt directives disallow obedient spiders access to specified parts of your site. They can also explicitly “allow” access to specific files and directories. So basically they’re used to let Google, Bing, et al. know where they can go when visiting your site. You can also do nifty stuff like target specific user-agents and declare sitemaps. For just a simple text file, robots.txt wields considerable power, and we want to use whatever power we can get to our greatest advantage.
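For illustration, here is a minimal (hypothetical) robots.txt showing each of those capabilities; the paths are made up, so don’t copy this one verbatim:

# rules for all compliant bots
User-agent: *
Disallow: /private/
Allow: /private/welcome.html

# rules for one specific bot (a bot that matches a specific
# group ignores the * group entirely)
User-agent: Googlebot
Disallow: /no-google/

# sitemap declaration (applies site-wide)
Sitemap: http://example.com/sitemap.xml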

Robots.txt and WordPress

Running WordPress, you want search engines to crawl and index your posts and pages, but not your core WP files and directories. You also want to make sure that feeds and trackbacks aren’t included in the search results. It’s also good practice to declare a sitemap. Here is a good starting point for your next WP-based robots.txt:

Update (2015/02/09): The following rules have been removed from the tried-and-true robots.txt rules below, to comply with Google’s new requirement that crawlers not be blocked from JavaScript and CSS files:

Disallow: /wp-content/
Disallow: /wp-includes/

It may not be strictly necessary to allow Google access to the includes directory, but it does contain some JS and CSS files, so better safe than sorry. Apparently Google is so serious about this new requirement that it is penalizing sites (a LOT) for non-compliance. That’s bad news for the hundreds of thousands of site owners who have better things to do than keep up with Google’s constant changes. Note that it’s still fine to block /wp-content/ and /wp-includes/ for other bots; as of this writing, only Google demands access to all JS and CSS files.
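For example, if you wanted to keep those directories blocked for everyone except Google, you could split the rules into separate user-agent groups. Here is a rough sketch (adjust for your own setup):

# all other bots: keep blocking core directories
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/

# Googlebot matches this group instead of the * group, so
# the JS and CSS in wp-content and wp-includes stay crawlable
User-agent: Googlebot
Disallow: /wp-admin/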

That said, here are the new and improved robots.txt rules for WordPress:

User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /feed/
Sitemap: http://example.com/sitemap.xml

That’s a plug-n-play version that you can further customize to fit your site’s structure and your own SEO strategy. To use this code for your WordPress-powered site, just copy/paste it into a blank file named robots.txt in your web-accessible root directory, for example:

http://sitename.com/robots.txt
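One caveat: crawlers only request robots.txt from the root of the host, so a copy placed anywhere else is simply ignored:

http://sitename.com/robots.txt       <-- checked by crawlers
http://sitename.com/blog/robots.txt  <-- ignored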

As an example of further customization, here is the version used on my own site:

User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /blackhole/
Disallow: /mint/
Disallow: /feed/
Allow: /tag/mint/
Allow: /tag/feed/
Sitemap: http://sitename.com/sitemap.xml

Spiders don’t need to be crawling around anything in /wp-admin/, so that’s disallowed. Likewise, trackbacks, xmlrpc, and feeds don’t need to be crawled, so those are blocked as well, along with a couple of site-specific directories (/blackhole/ and /mint/). Then I add a few explicit Allow directives to unblock access to specific URLs that otherwise would be disallowed by existing rules. I also declare the location of my sitemap, just to make it official.
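A quick note on how Allow and Disallow interact: for Google, Bing, and other major crawlers, the most specific (longest) matching rule wins. So a hypothetical pair like this unblocks one subdirectory inside an otherwise blocked one:

Disallow: /mint/
Allow: /mint/public/

Here a URL such as /mint/public/page.html matches both rules, but the Allow rule is longer (more specific), so it wins and the URL remains crawlable.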

Previously on robots.txt

As mentioned, my previous robots.txt file went unchanged for several years (which flew by in the blink of an eye), and it proved quite effective, especially with compliant spiders like googlebot. Unfortunately, it contains pattern-matching syntax that only a few of the bigger search engines understand (and thus obey):

Update (2015/02/09): Google has new requirements; see the update note above and do not use the following rules. They are included for reference only.

User-agent: *
Disallow: /mint/
Disallow: /labs/
Disallow: /*/wp-*
Disallow: /*/feed/*
Disallow: /*/*?s=*
Disallow: /*/*.js$
Disallow: /*/*.inc$
Disallow: /transfer/
Disallow: /*/cgi-bin/*
Disallow: /*/blackhole/*
Disallow: /*/trackback/*
Disallow: /*/xmlrpc.php
Allow: /*/20*/wp-*
Allow: /press/feed/$
Allow: /press/tag/feed/$
Allow: /*/wp-content/online/*
Sitemap: http://sitename.com/sitemap.xml

User-agent: ia_archiver
Disallow: /

Apparently, the wildcard character (*) isn’t recognized by lesser bots, and I suspect that the end-of-pattern symbol (the dollar sign, $) is probably not well-supported either, although Google certainly gets it.
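To make the distinction concrete, here is roughly how Google interprets those two metacharacters (the paths are hypothetical):

# * matches any sequence of characters:
Disallow: /*/feed/
#   blocks /press/feed/ and /blog/2024/feed/, but not /feed/

# $ anchors the pattern to the end of the URL:
Disallow: /*.js$
#   blocks /script.js and /js/app.js, but not /script.js?ver=2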

These patterns may be better supported in the future, but for now there is no reason to rely on them. As the examples above show, much of the same blocking is possible without wildcards and dollar signs, using plain path prefixes that every compliant bot understands.
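For instance, here is a wildcard rule alongside a prefix-based stand-in. The two are not exact equivalents, but when your content lives at predictable paths, the plain prefix does the job for every compliant bot:

# wildcard version (understood by Google, Bing, and a few others)
Disallow: /*/trackback/

# prefix version (universally understood, assuming trackback
# URLs live directly under the site root)
Disallow: /trackback/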

