What Not to Write in robots.txt

Robots.txt is a vital part of most websites. It feeds instructions to the robots of the various search engines, e.g. Google, Bing etc.

The instructions in this file tell robots what to crawl (eat) and what not to crawl (eat), by blacklisting or whitelisting URLs. The rule of thumb for picking one approach or the other is:

1.) Whitelist public URLs if you want to stop crawlers from eating up newly added URLs (those with no entry of their own in robots.txt). We whitelist URLs by allowing them and disallowing everything else, writing

User-agent: *
Allow: /users
Allow: /*/tests
Disallow: /

in robots.txt

2.) Blacklist URLs if you want crawlers to eat up everything else, including newly added URLs. We blacklist URLs by writing

User-agent: *
Disallow: /users
Disallow: /*/tests

in robots.txt
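
The difference between the two approaches shows up when a new URL appears. Here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt bodies, the crawler name ExampleBot and the URL /brand-new-page are made up for illustration. Note that this parser applies rules in file order and does plain prefix matching (as far as I know it does not understand the * wildcard in paths), so the sketch sticks to simple prefixes and puts Allow before Disallow: /.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, written for this sketch only.
WHITELIST_ROBOTS = """\
User-agent: *
Allow: /public
Disallow: /
"""

BLACKLIST_ROBOTS = """\
User-agent: *
Disallow: /users
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


whitelist = load_rules(WHITELIST_ROBOTS)
blacklist = load_rules(BLACKLIST_ROBOTS)

AGENT = "ExampleBot"         # hypothetical crawler name
NEW_URL = "/brand-new-page"  # a URL that no rule mentions explicitly

# Whitelisting: only listed prefixes are crawlable, new URLs are blocked.
print("whitelist, /public/index.html:", whitelist.can_fetch(AGENT, "/public/index.html"))
print("whitelist, new URL:", whitelist.can_fetch(AGENT, NEW_URL))

# Blacklisting: only listed prefixes are blocked, new URLs are crawlable.
print("blacklist, /users/42:", blacklist.can_fetch(AGENT, "/users/42"))
print("blacklist, new URL:", blacklist.can_fetch(AGENT, NEW_URL))
```

Real crawlers may resolve conflicting rules differently (for example by picking the most specific match), so treat this as a way to explore the idea rather than a statement of how any particular search engine behaves.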

This is okay as long as one is only keeping public URLs out of content aggregation. But blacklisting secret URLs can pose a security threat to the web application, because it (unknowingly) makes them public: robots.txt is publicly available and in a human-readable format. For example:

User-agent: *
Disallow: /secret.html
Disallow: /*/password.xml
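
To see how little effort it takes to read such a file, here is a small Python sketch that lists every path a robots.txt discloses through Disallow lines. The function name disclosed_paths and the two sample files are hypothetical, and the parsing is deliberately simplified.

```python
def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical blacklist that names the secret URLs outright.
LEAKY_ROBOTS = """\
User-agent: *
Disallow: /secret.html
Disallow: /*/password.xml
"""

# Hypothetical whitelist that names only public URLs.
QUIET_ROBOTS = """\
User-agent: *
Allow: /public
Disallow: /
"""

print(disclosed_paths(LEAKY_ROBOTS))
print(disclosed_paths(QUIET_ROBOTS))
```

The first file hands over the exact locations worth probing, while the second reveals nothing beyond the fact that everything outside /public is off limits to crawlers.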

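So by blacklisting secret URLs, one makes them prone to attack. Instead of blacklisting secret URLs, one should whitelist the other URLs and disallow the rest:

User-agent: *
Allow: /public
Disallow: /

Keep in mind that robots.txt is only advisory: well-behaved crawlers honor it, but it does not restrict access, so secret URLs still need real access control.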

<3 <3 <3