
{{short description|Meta tag used to request that Internet bots avoid indexing a web page}}
{{lowercase}}
{{for|the internal use on Wikipedia|WP:NOINDEX|selfref=y}}
The '''noindex''' value of an HTML robots [[meta tag]] requests that automated [[Internet bots]] avoid [[Search engine indexing|indexing]] a web page.<ref name="W3spec">[http://www.w3.org/TR/html401/appendix/notes.html#h-B.4.1.2 Robots and the META element], Official W3 specification</ref><ref>[http://www.robotstxt.org/meta.html About the Robots <META> tag]</ref> The same value can also be sent in the HTTP response header X-Robots-Tag.<ref>{{cite web|url=https://webmasters.stackexchange.com/questions/71351/robots-txt-vs-noindex-tags|title=Robots.txt vs. Noindex Tags}}</ref> Reasons for using the tag include advising robots not to index a very large database, very transitory pages, pages under development, pages that one wishes to keep somewhat more private, and printer-friendly or mobile versions of pages. Because honoring the tag is up to whoever writes or operates each robot, some robots ignore it, and search engines interpret it in slightly different ways.

== Noindexing entire pages ==
<syntaxhighlight lang="html">
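<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex">
  <title>Don't index this page</title>
</head>
<body>
  <!-- Page content: robots that honor the tag will not add this page to their index -->
</body>
</html>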
</syntaxhighlight>

Possible values for the meta tag content are "none", "all", "index", "noindex", "nofollow", and "follow". "None" is commonly treated as shorthand for "noindex, nofollow", and "all" as the unrestricted default. Values can also be combined,<ref name="W3spec" /> for example:
<syntaxhighlight lang="html">
<meta name="robots" content="noindex, follow">
</syntaxhighlight>
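
The shorthand values can be compared with their spelled-out forms in a short sketch. It assumes the commonly documented equivalences, under which each pair of tags below makes the same request:
<syntaxhighlight lang="html">
<!-- Assumed equivalent: do not index the page and do not follow its links -->
<meta name="robots" content="none">
<meta name="robots" content="noindex, nofollow">

<!-- Assumed equivalent: index the page and follow its links (the usual default) -->
<meta name="robots" content="all">
<meta name="robots" content="index, follow">
</syntaxhighlight>

A page would normally carry only one of these tags; they are listed together here only for comparison.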
	
=== Bot-specific directives ===
The noindex directive can be restricted to a particular bot by using that bot's name, rather than "robots", as the "name" value in the meta tag.
For example, to apply it only to Google's crawler,<ref name="google_noindex">[http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=93710 Using meta tags to block access to your site], Google Webmasters Tools Help</ref> specify:
<syntaxhighlight lang="html">
<meta name="googlebot" content="noindex">
</syntaxhighlight>

Or, to apply it only to Bing's crawler, specify:
<syntaxhighlight lang="html">
<meta name="bingbot" content="noindex">
</syntaxhighlight>

Or, to apply it only to Baidu's crawler, specify:
<syntaxhighlight lang="html">
<meta name="baiduspider" content="noindex">
</syntaxhighlight>
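
Bot-specific tags can also be combined in the same document head. The following is a minimal sketch, assuming a page should be kept out of Google's and Bing's indexes while remaining indexable by other robots; the title text is hypothetical:
<syntaxhighlight lang="html">
<head>
  <!-- Applies only to Google's crawler -->
  <meta name="googlebot" content="noindex">
  <!-- Applies only to Bing's crawler -->
  <meta name="bingbot" content="noindex">
  <!-- No generic "robots" tag, so other crawlers keep their default behavior -->
  <title>Hidden from Google and Bing only</title>
</head>
</syntaxhighlight>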

=== robots.txt file ===
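A [[Robots exclusion standard|robots.txt]] file can be used to block crawling. Crawling and indexing are distinct, however: a robot that robots.txt prevents from fetching a page never sees that page's noindex tag, so the page can still appear in search results, for example if other pages link to it.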

== Noindexing part of a page ==

Instead of a whole page, part of a web page, such as navigation text, can also be excluded from indexing. Several techniques exist for doing this, and they can be used in combination. Google's main indexing spider, [[Googlebot]], is not known to recognize any of them.

=== <noindex> tag ===
The Russian search engine [[Yandex]] introduced a nonstandard <nowiki><noindex></nowiki> tag that prevents indexing of the content between its opening and closing tags. Because this tag is not valid HTML, the comment form <nowiki><!--noindex--></nowiki> can be used instead so that the source code still validates:<ref>{{cite web |url=http://help.yandex.com/webmaster/controlling-robot/html.xml#noindex |title=Using HTML tags |work=webmaster → help |publisher=[[Yandex]] |at=Section: &lt;noindex&gt; tag |access-date=March 25, 2013}}</ref>
<syntaxhighlight lang="html">
<p>
Do index this text.
<noindex>Don't index this text.</noindex>
<!--noindex-->Don't index this text.<!--/noindex-->
</p>
</syntaxhighlight>
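
Navigation text, mentioned above as a typical candidate for exclusion, could be wrapped in the comment form so that the markup still validates. This is a sketch assuming a spider that honors Yandex's comment syntax; the link targets are hypothetical:
<syntaxhighlight lang="html">
<body>
  <!--noindex-->
  <nav>
    <!-- Hypothetical site navigation, repeated on every page -->
    <a href="/">Home</a>
    <a href="/about">About</a>
  </nav>
  <!--/noindex-->
  <p>Do index this text.</p>
</body>
</syntaxhighlight>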

Other [[Web crawler|indexing spider]]s also recognize the <nowiki><noindex></nowiki> tag, including [[Atomz]].<ref>{{cite web |url=https://center.atomz.com/center/help/?sp_topic=/Search/FAQs/General_Search#150 |title=General Search FAQ |year=2013 |work=Help |publisher=[[Atomz]] |at=Section: How do I exclude parts of my site from being searched? |access-date=March 23, 2013 |quote=Need to prevent parts of individual pages from being searched? If you want to exclude portions of a page from indexing, surround the text with &lt;noindex&gt; and &lt;/noindex&gt; tags. This is useful, for example, if you want to exclude navigation text from searches. |archive-date=December 8, 2021 |archive-url=https://web.archive.org/web/20211208074738/https://center.atomz.com/center/help/?sp_topic=/Search/FAQs/General_Search#150 |url-status=dead }}{{registration required}}</ref>

=== Microformat ===
A 2005 draft [[microformat]]s specification, the Robot Exclusion Profile, provides the same functionality. It looks for the attribute and value ''<nowiki>class="robots-noindex"</nowiki>'' in HTML tags:<ref name="microformat">{{cite web |url=http://microformats.org/wiki/robots-exclusion |title=Robot Exclusion Profile |last=Janes |first=Peter |date=June 18, 2005 |publisher=Microformats |access-date=March 24, 2013}}</ref>
<syntaxhighlight lang="html">
<p>Do index this text.</p>
<div class="robots-noindex">Don't index this text.</div>
<span class="robots-noindex">Don't index this text.</span>
<p class="robots-noindex">Don't index this text.</p>
</syntaxhighlight>

A combination of values is also possible,<ref name="microformat" /> for example:
<syntaxhighlight lang="html">
<div class="robots-noindex robots-follow">Text.</div>
</syntaxhighlight>
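
Since the profile is expressed through ordinary class tokens, the robots classes can presumably share an element with classes used for styling. A sketch, assuming a spider that matches individual class tokens as the combined example implies; the class name <code>site-nav</code> and the links are hypothetical:
<syntaxhighlight lang="html">
<!-- "site-nav" is a hypothetical styling class; the robots-* tokens come from the profile -->
<div class="site-nav robots-noindex robots-follow">
  <a href="/">Home</a>
  <a href="/contact">Contact</a>
</div>
<p>Do index this text.</p>
</syntaxhighlight>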

=== Yahoo! ===
In 2007, [[Yahoo!]] added functionality similar to the microformat to its spider. However, Yahoo!'s implementation is incompatible with the microformat: it looks for the value ''<nowiki>class="robots-nocontent"</nowiki>'', and only this value:<ref>{{cite web |url=http://www.ysearchblog.com/2007/05/02/introducing-robots-nocontent-for-page-sections/ |title=Introducing Robots-Nocontent for Page Sections |last=Garg |first=Priyank |date=May 2, 2007 |work=Yahoo! Search Blog |publisher=[[Yahoo!]] |access-date=March 23, 2013 |archive-url=https://web.archive.org/web/20140820072720/http://www.ysearchblog.com/2007/05/02/introducing-robots-nocontent-for-page-sections/ |archive-date=August 20, 2014 |url-status=dead }}</ref>
<syntaxhighlight lang="html">
<p>Do index this text.</p>
<div class="robots-nocontent">Don't index this text.</div>
<span class="robots-nocontent">Don't index this text.</span>
<p class="robots-nocontent">Don't index this text.</p>
</syntaxhighlight>
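
Because Yahoo!'s spider and the microformat use different class values, a page that wants both to skip the same block could, as a sketch, list both tokens on one element. This assumes each spider ignores class tokens it does not recognize:
<syntaxhighlight lang="html">
<p>Do index this text.</p>
<!-- robots-noindex for the microformat, robots-nocontent for Yahoo!'s spider -->
<div class="robots-noindex robots-nocontent">Don't index this text.</div>
</syntaxhighlight>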

=== SharePoint ===

[[SharePoint]] 2010's iFilter excludes content inside a <nowiki><div></nowiki> tag with the attribute and value ''<nowiki>class="noindex"</nowiki>''. Nested <nowiki><div></nowiki>s were initially not excluded, although this may have since changed. It is also unknown whether the attribute can be applied to tags other than <nowiki><div></nowiki>.<ref>{{cite web |url=https://blogs.msdn.microsoft.com/markarend/2010/06/07/control-search-indexing-crawling-within-a-page-with-noindex/ |title=Control Search Indexing (Crawling) Within a Page with Noindex |date=June 7, 2010 |website=Microsoft Developer |publisher=[[Microsoft]] |access-date=November 4, 2017 |url-status=live |archive-url=https://web.archive.org/web/20171104100854/https://blogs.msdn.microsoft.com/markarend/2010/06/07/control-search-indexing-crawling-within-a-page-with-noindex/ |archive-date=November 4, 2017}}</ref>

<syntaxhighlight lang="html">
<p>Do index this text.</p>
<div class="noindex">Don't index this text.</div>
</syntaxhighlight>
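
Because it is unclear whether nested <nowiki><div></nowiki>s are excluded, a cautious sketch repeats the class on the inner element as well. This is an assumption-driven workaround, not documented SharePoint behavior:
<syntaxhighlight lang="html">
<p>Do index this text.</p>
<div class="noindex">
  Don't index this text.
  <!-- Class repeated in case the iFilter does not exclude nested divs -->
  <div class="noindex">Don't index this text either.</div>
</div>
</syntaxhighlight>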

=== Structured comments ===
==== Google Search Appliance ====
The [[Google Search Appliance]] uses structured comments:<ref>{{cite web |url=https://developers.google.com/search-appliance/documentation/68/admin_crawl/Preparing |title=Administering Crawl: Preparing for a Crawl |date=August 23, 2012 |work=[[Google Search Appliance]] |publisher=Google Inc. |at=Section: Excluding Unwanted Text from the Index |access-date=March 23, 2013 |archive-url=https://web.archive.org/web/20121123112433/https://developers.google.com/search-appliance/documentation/68/admin_crawl/Preparing |archive-date=November 23, 2012}}</ref>
<syntaxhighlight lang="html">
<p>
Do index this text.
<!--googleoff: all-->
Don't index this text.
<!--googleon: all-->
</p>
</syntaxhighlight>
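
The appliance's documentation also describes values narrower than <code>all</code>, such as <code>index</code>, <code>anchor</code> and <code>snippet</code>. The sketch below assumes that <code>snippet</code> keeps text out of search-result snippets while still indexing it; that reading should be checked against the cited documentation:
<syntaxhighlight lang="html">
<p>
Do index this text.
<!--googleoff: snippet-->
Index this text, but don't show it in result snippets.
<!--googleon: snippet-->
</p>
</syntaxhighlight>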

Other indexing spiders also use their own structured comments.<!-- including FreeFind.com, but that cannot be included as a <ref> because the website is on Wikipedia's banned list, and JRank.org -->

== See also ==
* [[Nofollow]] link attribute
* [[Robots Exclusion Standard]]

== References ==
{{Reflist}}

[[Category:Search engine optimization]]
[[Category:World Wide Web]]