[kwlug-disc] Should kwlug-disc archives be private?

Khalid Baheyeldin kb at 2bits.com
Thu Dec 6 11:42:40 EST 2018


On Thu, Dec 6, 2018 at 11:26 AM Ronald Barnes <ron at ronaldbarnes.ca> wrote:

> Khalid Baheyeldin wrote on 2018-12-03 11:49 a.m.:
>
> > The robots.txt is useless. Current crawlers do not respect it.
>
> Interesting - that's not been my (somewhat dated) experience.
>
> I put a folder in my document root called "verboten" with absolutely no
> links to it, anywhere.
>
> Then, in robots.txt:
>
> > User-agent: *
> > Disallow: /verboten
>
> Then occasionally grep the access log(s) for "verboten".
>
> Nothing ever showed up.
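
The check described above (a path disallowed in robots.txt, then a search of the access log for it) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the log lines are made-up samples in Apache combined format, and the function name `disallowed_hits` and the bot names are hypothetical, not anything from the original setup.

```python
import re
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in the message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /verboten
"""

# Pulls the request path out of a common/combined-format log line
# (assumed format; adjust for your server's LogFormat).
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')


def disallowed_hits(log_lines, robots_txt=ROBOTS_TXT, agent="*"):
    """Return the log lines whose request path robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and not parser.can_fetch(agent, match.group(1)):
            hits.append(line)
    return hits


# Made-up sample entries, for illustration only.
sample_log = [
    '203.0.113.5 - - [06/Dec/2018:11:00:00 -0500] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [06/Dec/2018:11:05:00 -0500] '
    '"GET /verboten/secret.html HTTP/1.1" 404 0 "-" "RogueBot/0.1"',
]

for hit in disallowed_hits(sample_log):
    print(hit)
```

Any line this prints is a request for a path that a compliant crawler should never have made, which is the same signal as a grep for "verboten" turning something up.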


Perhaps things have improved then.

I remember when Google and Bing did not obey it, and still crawled pages
that robots.txt disallowed.

There are also rogue crawlers. For example, some request CHANGELOG.txt,
which is a Drupal file:

https://security.stackexchange.com/questions/118260/changelog-txt-in-apache-logs

