=== Preventing crawling ===
{{main|Robots exclusion standard}}
To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard [[robots.txt]] file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a [[meta tag]] specific to robots (usually <meta name="robots" content="noindex">). When a search engine visits a site, the robots.txt file located in the [[root directory]] is the first file crawled. The file is then parsed and instructs the robot as to which pages are not to be crawled. Because a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages that a webmaster does not want crawled. Pages typically prevented from being crawled include login-specific pages such as shopping carts and user-specific content such as search results from internal searches. In March 2007, Google warned webmasters that they should prevent indexing of internal search results because those pages are considered search spam.<ref>{{cite web|url=http://searchengineland.com/newspapers-amok-new-york-times-spamming-google-la-times-hijacking-carscom-11169|title=Newspapers Amok! New York Times Spamming Google? LA Times Hijacking Cars.com?|publisher=[[Search Engine Land]]|date=May 8, 2007|access-date=May 9, 2007|archive-date=December 26, 2008|archive-url=https://web.archive.org/web/20081226161450/http://searchengineland.com/newspapers-amok-new-york-times-spamming-google-la-times-hijacking-carscom-11169|url-status=live}}</ref> In 2020, Google [[Sunset provision|sunsetted]] the standard (and open-sourced its code) and now treats it as a hint rather than a directive. To adequately ensure that pages are not indexed, a page-level robots meta tag should be included.<ref>{{cite web|url=https://www.practicalecommerce.com/google-downgrades-nofollow-directive-now-what|title=Google Downgrades Nofollow Directive. Now What?|publisher=Practical Ecommerce|author=Jill Kocher Brown|date=February 24, 2020|access-date=2021-02-11|archive-date=January 25, 2021|archive-url=https://web.archive.org/web/20210125080754/https://www.practicalecommerce.com/google-downgrades-nofollow-directive-now-what|url-status=live}}</ref>
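
The two mechanisms described in this section, a site-wide robots.txt file and a page-level robots meta tag, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and does not reflect how any particular search engine implements its crawler: the robots.txt rules, the <code>ExampleBot</code> user agent, the example.com URLs and the helper names <code>build_parser</code>, <code>may_crawl</code> and <code>has_noindex</code> are all hypothetical. It relies only on the standard-library modules <code>urllib.robotparser</code> and <code>html.parser</code>.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block shopping-cart pages and
# internal search results for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, url: str,
              user_agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


class RobotsMetaFinder(HTMLParser):
    """Record whether a page carries a robots meta tag with noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        content = attributes.get("content") or ""
        directives = {item.strip().lower() for item in content.split(",")}
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML asks robots not to index the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex
```

In this sketch, <code>may_crawl</code> models the crawl-time check against robots.txt, while <code>has_noindex</code> models the index-time check against the page itself. The two checks are independent: a crawler has to fetch a page before it can see that page's meta tag, so a URL blocked in robots.txt never has its <code>noindex</code> tag read.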