The `Disallow:` directive is added in order to comply with:
* the `robots.txt` specification (http://www.robotstxt.org/), which
specifies that: "At least one Disallow field needs to be present
in a record"
* what is suggested in the documentation of most of the major search
engines, e.g.:
- Baidu: http://www.baidu.com/search/robots_english.html
- Google: https://developers.google.com/webmasters/control-crawl-index/docs/getting_started
  http://www.youtube.com/watch?v=P7GY1fE5JQQ
- Yandex: http://help.yandex.com/webmaster/controlling-robot/robots-txt.xml
Besides the addition specified above, this commit also:
* adds a comment making it clear to everyone that the directives from
the `robots.txt` file allow all content on the site to be crawled
* updates the URL to `www.robotstxt.org`, as `robotstxt.org` doesn't
quite work:
curl -LsS robotstxt.org
curl: (7) Failed connect to robotstxt.org:80; Operation timed out
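For reference, a minimal sketch of what the resulting `robots.txt` could look like after these changes (the exact comment wording and line order are illustrative, not quoted from the commit):

    # Allow crawling of all content (illustrative comment wording)
    # www.robotstxt.org/
    User-agent: *
    Disallow:

An empty `Disallow:` value excludes nothing, so together with `User-agent: *` the file explicitly allows all robots to crawl the entire site while still containing the Disallow field the specification requires.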
Close h5bp#1487.