==Standard==
When a site owner wishes to give instructions to web robots, they place a text file called {{mono|robots.txt}} in the root of the web site hierarchy (e.g. {{mono|<nowiki>https://www.example.com/robots.txt</nowiki>}}). This text file contains the instructions in a specific format (see examples below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the [[website]]. If this file does not exist, web robots assume that the website owner does not wish to place any limitations on crawling the entire site.

A robots.txt file contains instructions for bots indicating which web pages they can and cannot access. Robots.txt files are particularly important for web crawlers from search engines such as Google.

A robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or a desire that an application operate only on certain data. Links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.<ref>{{cite web |url=https://www.youtube.com/watch?v=KBdEwpRQRD0#t=196s |title=Uncrawled URLs in search results |publisher=YouTube |date=Oct 5, 2009 |access-date=2013-12-29 |archive-url=https://web.archive.org/web/20140106222500/http://www.youtube.com/watch?v=KBdEwpRQRD0#t=196s |archive-date=2014-01-06 |url-status=live }}</ref>

A robots.txt file covers one [[Same origin policy|origin]]. For websites with multiple subdomains, each subdomain must have its own robots.txt file.
If {{mono|example.com}} had a robots.txt file but {{mono|a.example.com}} did not, the rules that would apply for {{mono|example.com}} would not apply to {{mono|a.example.com}}. In addition, each protocol and port needs its own robots.txt file; {{mono|<nowiki>http://example.com/robots.txt</nowiki>}} does not apply to pages under {{mono|<nowiki>http://example.com:8080/</nowiki>}} or {{mono|<nowiki>https://example.com/</nowiki>}}.
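The behavior described above — fetching the rules before crawling and checking each URL against them — can be sketched with Python's standard-library {{mono|urllib.robotparser}} module. The rules and URLs below are illustrative only; a real crawler would load the file from the site's root with {{mono|set_url()}} and {{mono|read()}} instead of parsing an inline string.

```python
from urllib import robotparser

# A hypothetical robots.txt, as it might appear at
# https://www.example.com/robots.txt (illustrative rules only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
# parse() takes the file's lines; a live crawler would instead call
# rp.set_url("https://www.example.com/robots.txt") followed by rp.read().
rp.parse(ROBOTS_TXT.splitlines())

# A compliant robot consults can_fetch() before requesting each URL.
print(rp.can_fetch("MyBot", "https://www.example.com/private/page.html"))  # False
print(rp.can_fetch("MyBot", "https://www.example.com/index.html"))         # True
```

Note that this check is purely advisory, matching the "request" framing above: the parser tells a cooperating robot what the site owner asked for, but nothing prevents a non-compliant robot from fetching the disallowed paths anyway.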