
03-12-07, 11:42 AM
Join Date: May 2007
Native Country: UK
Posts: 20
The origin of the robots.txt file
Quote:
Eventually Martin figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called /robots.txt. It has a list of stuff you are not to pull in. Obey it, or I yell at your sysadmins." And so, I guess, my first attempt at a spider was also the first spider to obey the embryonic robot exclusion protocol. Which Martin subsequently generalized and which got turned into a standard.
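The rule Martin set out ("a list of stuff you are not to pull in. Obey it") is still what a well-behaved spider does today. Below is a minimal sketch in Python of a crawler checking that list before fetching. It assumes the later, standardized robots.txt syntax (`User-agent` / `Disallow` lines) rather than whatever the very first file looked like, and it uses the standard library's `urllib.robotparser`. The robots.txt contents, the `ExampleSpider` agent name, the `example.com` URLs and the `allowed` helper are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every robot ("*") is asked
# to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /big-archive/
"""


def allowed(url: str, agent: str = "ExampleSpider") -> bool:
    """Return True if robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed here;
    # a real spider would call set_url(...) and read() against the live site.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


for url in ["http://example.com/index.html", "http://example.com/private/data.html"]:
    print(url, "->", "fetch" if allowed(url) else "skip")
```

A spider would run this check before every request and simply skip anything disallowed, which is the "obey it" part of the original arrangement.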