Prevent Your Development Website from being Indexed
While building a new website, or carrying out new development on an existing site, it is usually advisable to block search engine crawlers and prevent the development version from being indexed.
The obvious problem is duplicate content and how it can negatively affect your live website’s SEO. If search engines index the development version, that is exactly what you risk. But perhaps more significantly, for brand new websites, all that new content and design will leak online before you’re ready for your big reveal. Even if your site is being created on an obscure URL that even the Enigma Machine codebreakers couldn’t find, Google will find it if you don’t block it from being indexed.
New Web Development | Hiding in Plain Sight
If you are working on your new website in an online development environment, or if you have a copy of your live site for staging / development purposes, it is imperative that you don’t inadvertently let Google (or any other search engine, for that matter) crawl your development site.
Allowing search engines to crawl both the live and development versions of your site will harm your SEO, sometimes significantly. This happens for a number of reasons, not least because ranking signals are split between two copies of the same content and you fall foul of duplicate content issues.
Noindex – Block All Agents in Your Robots.txt File
The easiest way to stop search engines from crawling your development site is to add a robots.txt file to the primary domain directory (i.e. the parent web directory for your website). This robots file should contain the single rule below:
User-agent: *
Disallow: /
Most reputable search engines will respect the directives in your robots.txt file, so the above method will be sufficient for blocking crawls and indexing during development. However, those pesky bots that like to crawl sites in a frenzy for less than virtuous reasons don’t respect anything, so don’t expect them to tip their hat and walk on by.
Of course, those bots are not really our concern here; in this case we simply want to stop the likes of Google from indexing our development site. Bear in mind, too, that robots.txt blocks crawling rather than indexing as such: a disallowed URL can still appear in search results if other sites link to it. You can learn more about robots.txt on Moz.com.
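If you want to confirm how a well-behaved crawler interprets that two-line rule, Python’s standard-library robots.txt parser can evaluate it. This is just an illustrative check, and dev.example.com is a placeholder for your own development URL:

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from above.
robots_txt = "User-agent: *\nDisallow: /"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A crawler that honours this file may fetch nothing at all.
print(parser.can_fetch("Googlebot", "https://dev.example.com/any-page"))  # False
print(parser.can_fetch("Googlebot", "https://dev.example.com/"))          # False
```

Any user agent string gives the same result here, because the `*` rule applies to every crawler.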
Blocking Search Engines on WordPress
On a WordPress site, you don’t need to trouble yourself with manually editing and uploading a robots.txt file. Simply visit the Settings → Reading page and tick the box labelled “Discourage search engines from indexing this site”.
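When that box is ticked, WordPress typically adds a robots meta tag along the lines of `<meta name='robots' content='noindex, nofollow'>` to every page (the exact content value varies between versions). If you want to double-check that your development site is emitting it, one hedged way is to scan a page’s HTML, for example with Python’s standard library:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Flag any <meta name="robots"> tag whose content includes 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("name") == "robots":
            if "noindex" in attr.get("content", ""):
                self.noindex = True

# Sample of the kind of tag WordPress emits when indexing is discouraged.
sample = "<head><meta name='robots' content='noindex, nofollow' /></head>"
finder = RobotsMetaFinder()
finder.feed(sample)
print(finder.noindex)  # True
```

In practice you would feed it the HTML of one of your development pages rather than this hard-coded sample.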
Note the comment WordPress displays under that checkbox: “It is up to search engines to honour this request.” This is what we were referring to earlier: while most search engines will honour the request, some of the less noble ones will not.
This is why you will find some references online to password protecting the entire WordPress site at server level, in order to prevent indexing of anything whatsoever. In our experience this is usually overkill, but do keep it in mind if you notice images or snippets of content making their way into search results.
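If you do decide to go that far, HTTP basic authentication at the web server level is the usual approach. On Apache, a sketch of the relevant .htaccess rules looks like the following; the AuthUserFile path is a placeholder for wherever you create your password file (for example with the `htpasswd` utility):

```apacheconf
# Placed in the development site's document root.
# /home/example/.htpasswd is a placeholder path; keep the
# password file outside the web root where possible.
AuthType Basic
AuthName "Development Site"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Because crawlers cannot authenticate, nothing behind the prompt can be fetched, let alone indexed.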
Don’t Forget to Remove the Directive when Going Live
If your development site will ultimately be deployed as the live website, don’t forget to edit the robots.txt file accordingly and / or deselect the setting in WordPress. Otherwise you’ll be left scratching your head, wondering why indexing isn’t happening, staring blankly at your Google Search Console screen while your website fails to gain any traction in the SERPs.