How to block Googlebot and other bots with the help of htaccess, robots.txt and meta tags

Every website owner knows how important it is that Internet users find what they are looking for. Far fewer realize that excluding certain pages from the SERP can be just as valuable for that goal. Let’s take a look at some techniques you can use to block access to specific parts of your website.

Options for addressing the problem

 

  • Set up a password.

Password protection via .htaccess is a genuinely useful way to make some sections of your website inaccessible to outsiders. There is one caveat, however: if you are running a demo version of your website, there is usually no way to make this password permanent.
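A minimal sketch of such protection (the .htpasswd path below is my own example and must match wherever you actually store the password file):

```apache
# .htaccess — require a valid login for this directory
AuthType Basic
AuthName "Restricted area"
# Password file created beforehand with: htpasswd -c /etc/apache2/.htpasswd someuser
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```

Any visitor, human or bot, who cannot supply valid credentials receives a 401 response instead of the page.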

 

  • Robots.txt.

Another alternative that Google offers us is robots.txt. This file tells search engines which parts of the website they should not crawl.

To block all crawlers from the entire site, put the following lines in a robots.txt file at the root of your site:

User-agent: *
Disallow: /

Unfortunately, even this method is not always reliable. Google’s software engineer Matt Cutts has emphasized that search engines may still list such pages as relevant to users’ queries regardless of robots.txt, because robots.txt prevents crawling, not indexing.
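More often you will want to block only part of a site rather than all of it. A sketch, assuming a directory named /private/ (the directory name is my own example):

```
User-agent: *
Disallow: /private/
```

Everything outside /private/ remains crawlable; rules for a specific bot can be added under its own User-agent line.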

 

  • Use .htaccess RewriteCond.

If you want to ban Google and other bots from reaching your website, you may try adding the following to your .htaccess file:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AltaVista [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp
RewriteRule ^.*$ http://htmlremix.com [R=301,L]

 

Make sure you substitute the example URL with your own. Once the rules are in place, any request whose User-Agent matches one of the listed bots will receive a 301 redirect to that URL instead of your content.
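The RewriteCond chain above is essentially a regex match against the User-Agent header, OR-ed across the listed bots. A minimal Python sketch of the same decision logic (the function name and bot list layout are my own; the patterns mirror the rules above):

```python
import re

# Patterns mirroring the RewriteCond lines above (case-sensitive, like the original rules).
BOT_PATTERNS = ["AltaVista", "Googlebot", "msnbot", "Slurp"]

def should_redirect(user_agent: str) -> bool:
    """Return True if the User-Agent matches any of the blocked bot patterns."""
    return any(re.search(pattern, user_agent) for pattern in BOT_PATTERNS)

print(should_redirect("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(should_redirect("Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0"))
```

Note that matching on User-Agent is advisory only: any client can send an arbitrary User-Agent string, so this blocks well-behaved bots, not determined scrapers.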

 

  • Meta tags.

You can make a page disappear from Google’s search results by adding a noindex meta tag to its HTML. When a bot next scans the page, it will drop it from the index. The page will be removed regardless of whether other websites link to it.

Remember that this only works if the page is not blocked by robots.txt. If the crawler is disallowed from fetching the page, it will never see the noindex tag and all of your efforts will be in vain: the page will still be present in the SERP, especially if other websites link to it.

Apart from this, the noindex meta tag is useful if you don’t have root access to your server, since it lets you control indexing for each individual page.

To prevent all crawlers from indexing a page, put the following meta tag in its <head> section:

<meta name="robots" content="noindex">

If you want to deny indexing only to Google, try the following:

<meta name="googlebot" content="noindex">

Bear in mind that individual crawlers may interpret the noindex directive in their own way. Even after these steps, your page may therefore remain visible in some other search engines.
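A related option worth knowing: the same noindex directive can be delivered as an HTTP header (X-Robots-Tag), which also covers non-HTML files such as PDFs where no meta tag can be embedded. A sketch, assuming Apache with mod_headers enabled (the PDF pattern is my own example):

```apache
# Send a noindex header for all PDF files on the site
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

As with the meta tag, the crawler must be allowed to fetch the file in order to see the header.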

Even after you have implemented the tags, your page may remain in the SERP for a while. The most common reason is simply that not enough time has passed for the crawler to revisit the page and see the meta tag. You can speed this up by requesting a re-crawl with the Fetch as Google tool in Search Console. If the page still appears after that, recheck the tag itself: the crawler may not be seeing it because the page is blocked in robots.txt. Edit your robots.txt file accordingly, verify it with the robots.txt Tester, and then let the page be crawled again.