==Standard==
When a site owner wishes to give instructions to web robots, they place a text file called {{mono|robots.txt}} in the root of the web site hierarchy (e.g. {{mono|<nowiki>https://www.example.com/robots.txt</nowiki>}}). This text file contains the instructions in a specific format (see examples below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the [[website]]. If this file does not exist, web robots assume that the website owner does not wish to place any limitations on crawling the entire site.

A robots.txt file contains instructions for bots indicating which web pages they can and cannot access. Robots.txt files are particularly important for web crawlers from search engines such as Google.

A robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or a desire that an application operate only on certain data. Links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.<ref>{{cite web |url=https://www.youtube.com/watch?v=KBdEwpRQRD0#t=196s |title=Uncrawled URLs in search results |publisher=YouTube |date=Oct 5, 2009 |access-date=2013-12-29 |archive-url=https://web.archive.org/web/20140106222500/http://www.youtube.com/watch?v=KBdEwpRQRD0#t=196s |archive-date=2014-01-06 |url-status=live }}</ref>

A robots.txt file covers one [[Same origin policy|origin]]. For websites with multiple subdomains, each subdomain must have its own robots.txt file.
If {{mono|example.com}} had a robots.txt file but {{mono|a.example.com}} did not, the rules that would apply for {{mono|example.com}} would not apply to {{mono|a.example.com}}. In addition, each protocol and port needs its own robots.txt file; {{mono|<nowiki>http://example.com/robots.txt</nowiki>}} does not apply to pages under {{mono|<nowiki>http://example.com:8080/</nowiki>}} or {{mono|<nowiki>https://example.com/</nowiki>}}.
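The behavior described above — fetching the rules before crawling and checking each URL against them — can be sketched with Python's standard-library {{mono|urllib.robotparser}} module. The rules and URLs below are illustrative only; a real crawler would load the file from the site's root with {{mono|set_url()}} and {{mono|read()}} instead of parsing an inline string.

```python
from urllib import robotparser

# A hypothetical robots.txt, as it might appear at
# https://www.example.com/robots.txt (illustrative rules only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
# parse() takes the file's lines; a live crawler would instead call
# rp.set_url("https://www.example.com/robots.txt") followed by rp.read().
rp.parse(ROBOTS_TXT.splitlines())

# A compliant robot consults can_fetch() before requesting each URL.
print(rp.can_fetch("MyBot", "https://www.example.com/private/page.html"))  # False
print(rp.can_fetch("MyBot", "https://www.example.com/index.html"))         # True
```

Note that this check is purely advisory, matching the "request" framing above: the parser tells a cooperating robot what the site owner asked for, but nothing prevents a non-compliant robot from fetching the disallowed paths anyway.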