What not to write in Robots.txt
11 Jan 2014
Robots.txt is a vital part of most websites. It feeds the robots (crawlers) of search engines such as Google and Bing.
The instructions in this file tell robots what to crawl (eat) and what not to crawl (eat), by blacklisting or whitelisting URLs. The rule of thumb for picking one approach or the other is:
1.) Use whitelisting of public URLs if you want to stop crawlers from eating up newly added URLs (ones that have no entry of their own in robots.txt). We whitelist URLs by writing Allow entries for them, followed by a catch-all Disallow entry, in the robots.txt.
2.) Use blacklisting if you want to allow crawlers to eat up everything else. We blacklist URLs by writing Disallow entries for them in the robots.txt.
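To make the two approaches concrete, here is a minimal sketch using Python's standard urllib.robotparser. The rule sets, the paths (/blog/, /about, /drafts/) and the agent name ExampleBot are made up for illustration and are not from any real site. Python's parser is only a stand-in for a crawler; real crawlers may resolve overlapping Allow and Disallow rules a little differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: only the listed paths may be crawled.
WHITELIST_RULES = """\
User-agent: *
Allow: /blog/
Allow: /about
Disallow: /
"""

# Hypothetical blacklist: everything except the listed path may be crawled.
BLACKLIST_RULES = """\
User-agent: *
Disallow: /drafts/
"""


def is_allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets `agent` crawl `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "/new-page" stands for a URL added later, with no entry of its own.
    for name, rules in (("whitelist", WHITELIST_RULES), ("blacklist", BLACKLIST_RULES)):
        print(name, "/new-page ->", is_allowed(rules, "/new-page"))
```

With the whitelist, the newly added /new-page is blocked until someone adds an Allow entry for it. With the blacklist, it is crawlable by default.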
Now this is okay as long as one is only keeping public URLs from being aggregated. But blacklisting secret URLs may pose a security threat to the web application, because it (unknowingly) makes them public: robots.txt is publicly available and in human-readable format.
So by blacklisting secret URLs, one makes them prone to attack. Instead of blacklisting secret URLs, one should whitelist the other URLs.
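Here is a rough sketch of why the blacklist leaks. Since anyone can download robots.txt, anyone can list its Disallow entries. The helper disclosed_paths and the paths /secret-admin/ and /internal-reports/ are hypothetical names invented for this example.

```python
def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, exactly as any visitor can read it."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical blacklist that names the secret URLs outright.
LEAKY_BLACKLIST = """\
User-agent: *
Disallow: /secret-admin/
Disallow: /internal-reports/
"""

# Hypothetical whitelist that only names what is public anyway.
SAFER_WHITELIST = """\
User-agent: *
Allow: /blog/
Disallow: /
"""

if __name__ == "__main__":
    print(disclosed_paths(LEAKY_BLACKLIST))  # ['/secret-admin/', '/internal-reports/']
    print(disclosed_paths(SAFER_WHITELIST))  # ['/']
```

The blacklist hands over the secret paths by name, while the whitelist reveals nothing beyond the public URLs and a catch-all.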
<3 <3 <3