== <span class="anchor" id="SCRUBBING"></span>Integrity ==
[[Data scrubbing]] (referred to in some environments as ''patrol read'') is the periodic reading and checking, by the RAID controller, of all the blocks in an array, including those not otherwise accessed. This detects bad blocks before they are used.<ref>Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein. ''Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, ISCSI, InfiniBand and FCoE''. John Wiley and Sons, 2009. p.39</ref> Data scrubbing not only checks for bad blocks on each storage device in an array but also uses the redundancy of the array to recover bad blocks on a single drive and to reassign the recovered data to spare blocks elsewhere on the drive.<ref>Dell Computers, Background Patrol Read for Dell PowerEdge RAID Controllers, By Drew Habas and John Sieber, Reprinted from Dell Power Solutions, February 2006 http://www.dell.com/downloads/global/power/ps1q06-20050212-Habas.pdf</ref>
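
The recovery step of a scrub can be illustrated with a minimal sketch. It assumes a single-parity layout in which the blocks of each stripe XOR to zero, so that any one unreadable block equals the XOR of the remaining blocks in that stripe. The names <code>xor_blocks</code> and <code>scrub</code>, and the use of <code>None</code> to mark an unreadable block, are hypothetical and chosen for illustration only; real controllers work on physical sectors and remap them internally.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a sequence of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(drives):
    """Read every stripe and rebuild any single unreadable block.

    `drives` is a list with one inner list of blocks (bytes) per drive.
    A block that the drive could not read is represented by None
    (an assumption of this sketch). Returns the (drive, stripe)
    positions that were repaired.
    """
    repaired = []
    for stripe in range(len(drives[0])):
        # Find the drives whose block in this stripe is unreadable.
        bad = [d for d, drive in enumerate(drives) if drive[stripe] is None]
        if len(bad) > 1:
            # Single parity cannot recover two lost blocks in one stripe.
            raise ValueError(f"stripe {stripe}: too many bad blocks")
        if len(bad) == 1:
            d = bad[0]
            survivors = [drv[stripe] for i, drv in enumerate(drives) if i != d]
            # Rebuild from redundancy and write the block back.
            drives[d][stripe] = xor_blocks(survivors)
            repaired.append((d, stripe))
    return repaired
```

Because the scrub visits every stripe, it also finds bad blocks in regions that ordinary reads never touch, which is the point made above.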

A RAID controller is frequently configured to "drop" a component drive (that is, to assume the drive has failed) if the drive has been unresponsive for about eight seconds. This can cause the controller to drop a good drive that simply has not been given enough time to complete its internal error recovery procedure. Consequently, using consumer-marketed drives with RAID can be risky, and so-called "enterprise class" drives limit this error recovery time to reduce the risk.{{Citation needed|date=October 2013}} Western Digital's desktop drives used to have a specific fix: a utility called WDTLER.exe enabled [[Time-Limited Error Recovery|TLER (time limited error recovery)]], which limits a drive's error recovery time to seven seconds. Around September 2009, Western Digital disabled this feature in their desktop drives (such as the Caviar Black line), making such drives unsuitable for use in RAID configurations.<ref name="csc.liv.ac.uk">{{cite web |title=Error Recovery Control with Smartmontools |url=http://www.csc.liv.ac.uk/~greg/projects/erc/ |date=2009 |access-date=September 29, 2017 |url-status=dead |archive-url=https://web.archive.org/web/20110928190045/http://www.csc.liv.ac.uk/~greg/projects/erc/ |archive-date=September 28, 2011}}</ref> However, Western Digital enterprise class drives are shipped from the factory with TLER enabled. Similar technologies are used by Seagate, Samsung, and Hitachi. For non-RAID usage, where there is no redundant copy from which to recover the data, an enterprise class drive with a short error recovery timeout that cannot be changed is therefore less suitable than a desktop drive.<ref name="csc.liv.ac.uk" /> In late 2010, the [[Smartmontools]] program began supporting the configuration of ATA Error Recovery Control, allowing the tool to configure many desktop class hard drives for use in RAID setups.<ref name="csc.liv.ac.uk" />
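
The timing relationship described above can be expressed as a short sketch. The function names <code>would_be_dropped</code> and <code>scterc_args</code>, the 30-second recovery figure in the usage note, and the device path <code>/dev/sdX</code> are assumptions made for illustration. The second function only builds an argument list for the smartctl <code>-l scterc</code> option, which takes read and write limits in tenths of a second; it does not run the command, and whether a given drive accepts the setting depends on the drive.

```python
def would_be_dropped(recovery_seconds, controller_timeout=8.0):
    """Return True if the controller would give up on the drive.

    Models the rule in the text: a drive that stays unresponsive longer
    than the controller's timeout (about eight seconds) is assumed failed.
    """
    return recovery_seconds > controller_timeout


def scterc_args(device, read_seconds=7.0, write_seconds=7.0):
    """Build a smartctl argument list that requests an error recovery limit.

    The scterc values are expressed in units of 100 milliseconds,
    so seven seconds becomes 70. `device` is a placeholder path.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]
```

With these definitions, a drive limited to seven seconds of recovery stays within the eight-second window (<code>would_be_dropped(7.0)</code> is false), whereas a hypothetical drive that retries for 30 seconds would be dropped (<code>would_be_dropped(30.0)</code> is true).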

While RAID may protect against physical drive failure, the data remains exposed to destruction by operator error, software faults, hardware faults, and viruses. Many studies cite operator fault as a common source of malfunction,<ref>{{cite journal |last1=Gray |first1=Jim |title=A census of Tandem system availability between 1985 and 1990 |journal=IEEE Transactions on Reliability |date=Oct 1990 |volume=39 |issue=4 |pages=409–418 |doi=10.1109/24.58719 |publisher=IEEE|s2cid=2955525 |url=http://pdfs.semanticscholar.org/22a4/ddf4d609c6e9c8a0a0ea6187af4c3178a7ed.pdf |archive-url=https://web.archive.org/web/20190220114624/http://pdfs.semanticscholar.org/22a4/ddf4d609c6e9c8a0a0ea6187af4c3178a7ed.pdf |url-status=dead |archive-date=2019-02-20 }}</ref><ref>{{cite journal |last1=Murphy |first1=Brendan |last2=Gent |first2=Ted |title=Measuring system and software reliability using an automated data collection process |journal=Quality and Reliability Engineering International |date=1995 |volume=11 |issue=5 |pages=341–353 |doi=10.1002/qre.4680110505}}</ref> such as a server operator replacing the wrong drive in a faulty RAID and thereby disabling the system (even temporarily) in the process.<ref>Patterson, D., Hennessy, J. (2009), 574.</ref>

An array can be overwhelmed by a catastrophic failure that exceeds its recovery capacity, and the entire array is at risk of physical damage from fire, natural disaster, and human forces; backups, however, can be stored off site. An array is also vulnerable to controller failure because it is not always possible to migrate it to a new, different controller without data loss.<ref>{{cite web |url=http://www.tomshardware.com/reviews/RAID-MIGRATION-ADVENTURE,1640.html |title=The RAID Migration Adventure |date=10 July 2007 |access-date=2010-03-10}}</ref>