== {{Anchor|SILENT}}Silent ==
{{See also|Hard disk drive error rates and handling}}

Some errors go unnoticed, without being detected by the disk firmware or the host operating system; these errors are known as ''silent data corruption''.<ref>{{cite web |url=https://support.google.com/cloud/answer/10759085?hl=en#:~:text=Silent%20Data%20Corruption%20(SDC)%2C,to%20data%20loss%20and%20corruption. |title=Silent Data Corruption |date=2023 |publisher=Google Inc. |access-date=January 30, 2023 |quote=Silent Data Corruption (SDC), sometimes referred to as Silent Data Error (SDE), is an industry-wide issue impacting not only long-protected memory, storage, and networking, but also computer CPUs.}}</ref> There are many error sources beyond the disk storage subsystem itself. For instance, cables might be slightly loose, the power supply might be unreliable,<ref>{{cite web|title=ZFS saves the day(-ta)!|url=http://blogs.oracle.com/elowe/entry/zfs_saves_the_day_ta|work=Oracle – Core Dumps of a Kernel Hacker's Brain – Eric Lowe's Blog|publisher=Oracle|access-date=9 June 2012|author=Eric Lowe|format=Blog|date=16 November 2005|url-status=dead|archive-url=https://web.archive.org/web/20120205040345/http://blogs.oracle.com/elowe/entry/zfs_saves_the_day_ta|archive-date=5 February 2012}}</ref> external vibrations such as a loud sound might disturb the drive,<ref>{{cite web|title=Shouting in the Datacenter|url=https://www.youtube.com/watch?v=tDacjrSCeq4|work=YouTube|access-date=9 June 2012|author=bcantrill|format=Video file|date=31 December 2008|url-status=live|archive-url=https://web.archive.org/web/20120703132341/http://www.youtube.com/watch?v=tDacjrSCeq4|archive-date=3 July 2012}}</ref> the network might introduce undetected corruption,<ref>{{cite web|title=Faulty FC port meets ZFS|url=http://jforonda.blogspot.com/2007/01/faulty-fc-port-meets-zfs.html|work=Blogger – Outside the Box|access-date=9 June 2012|author=jforonda|format=Blog|date=31 January
2007|url-status=live|archive-url=https://web.archive.org/web/20120426055112/http://jforonda.blogspot.com/2007/01/faulty-fc-port-meets-zfs.html|archive-date=26 April 2012}}</ref> and [[Cosmic ray#Effect on electronics|cosmic radiation]] and many other causes of [[soft error|soft memory errors]] can also corrupt data. In an analysis of 39,000 storage systems, firmware bugs accounted for 5–10% of storage failures.<ref>{{cite web |url=http://www.usenix.org/event/fast08/tech/full_papers/jiang/jiang.pdf |title=Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics |publisher=USENIX |access-date=2014-01-18 |archive-date=2022-01-25 |archive-url=https://web.archive.org/web/20220125061938/https://www.usenix.org/legacy/event/fast08/tech/full_papers/jiang/jiang.pdf |url-status=live }}</ref> All in all, the error rates observed by a [[CERN]] study on silent corruption are far higher than one in every 10<sup>16</sup> bits.<ref name="CERN2007">{{cite web|title=Draft 1.3|url=http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797|work=Data integrity|publisher=CERN|access-date=9 June 2012|author=Bernd Panzer-Steindel|date=8 April 2007|url-status=live|archive-url=https://web.archive.org/web/20121027083405/http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797|archive-date=27 October 2012}}</ref> [[Amazon.com]] has acknowledged similarly high data corruption rates in its systems.<ref>{{cite web| url = http://perspectives.mvdirona.com/2012/02/26/ObservationsOnErrorsCorrectionsTrustOfDependentSystems.aspx| title = Observations on Errors, Corrections, & Trust of Dependent Systems| url-status = live| archive-url = https://web.archive.org/web/20131029192337/http://perspectives.mvdirona.com/2012/02/26/ObservationsOnErrorsCorrectionsTrustOfDependentSystems.aspx| archive-date = 2013-10-29}}</ref> In 2021, faulty processor cores were
identified as an additional cause in publications by Google and Facebook; cores were found to be faulty at a rate of several in thousands of cores.<ref>{{Cite book|last1=Hochschild|first1=Peter H.|last2=Turner|first2=Paul Jack|last3=Mogul|first3=Jeffrey C.|last4=Govindaraju|first4=Rama Krishna|last5=Ranganathan|first5=Parthasarathy|last6=Culler|first6=David E.|last7=Vahdat|first7=Amin|title=Proceedings of the Workshop on Hot Topics in Operating Systems |chapter=Cores that don't count |date=2021|chapter-url=https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf|pages=9–16|doi=10.1145/3458336.3465297|isbn=9781450384384|s2cid=235311320|access-date=2021-06-02|archive-date=2021-06-03|archive-url=https://web.archive.org/web/20210603055415/https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf|url-status=live}}</ref><ref>{{Citation|title=HotOS 2021: Cores That Don't Count (Fun Hardware)| date=27 May 2021 |url=https://www.youtube.com/watch?v=QMF3rqhjYuM |archive-url=https://ghostarchive.org/varchive/youtube/20211222/QMF3rqhjYuM |archive-date=2021-12-22 |url-status=live|language=en|access-date=2021-06-02}}{{cbignore}}</ref>

One problem is that hard disk drive capacities have increased substantially, but their error rates remain unchanged. Because the data corruption rate has stayed roughly constant over time, modern disks are not much safer than old disks. In old disks the probability of data corruption was very small because they stored tiny amounts of data; in modern disks the probability is much larger because they store much more data while being no safer. Consequently, silent data corruption was not a serious concern while storage devices remained relatively small and slow.
With the advent of larger drives and very fast RAID setups, users can transfer 10<sup>16</sup> bits in a reasonably short time, thus easily reaching the data corruption thresholds.<ref>{{cite web |url = http://www.necam.com/docs/?id=54157ff5-5de8-4966-a99d-341cf2cb27d3 |title = Silent data corruption in disk arrays: A solution |year = 2009 |access-date = 14 December 2020 |format = PDF |publisher = NEC |archive-url = https://web.archive.org/web/20131029210013/http://www.necam.com/docs/?id=54157ff5-5de8-4966-a99d-341cf2cb27d3 |archive-date = 29 October 2013 }}</ref>

As an example, [[ZFS]] creator Jeff Bonwick stated that the fast database at [[Greenplum]], a database software company specializing in large-scale data warehousing and analytics, faces silent corruption every 15 minutes.<ref>{{cite web |url = http://queue.acm.org/detail.cfm?id=1317400 |title = A Conversation with Jeff Bonwick and Bill Moore |date = November 15, 2007 |publisher = Association for Computing Machinery |access-date = 14 December 2020 |url-status = live |archive-url = https://web.archive.org/web/20110716221142/http://queue.acm.org/detail.cfm?id=1317400 |archive-date = 16 July 2011 }}</ref>

As another example, a study performed by [[NetApp]] on more than 1.5 million HDDs over 41 months found more than 400,000 silent data corruptions, of which more than 30,000 were not detected by the hardware RAID controller and were found only during [[data scrubbing|scrubbing]].<ref>{{Cite news |title= Keeping Bits Safe: How Hard Can It Be? |work= ACM Queue |date= October 1, 2010 |author= David S. H. Rosenthal |url= http://queue.acm.org/detail.cfm?id=1866298 |access-date= 2014-01-02 |url-status= live |archive-url= https://web.archive.org/web/20131217020947/http://queue.acm.org/detail.cfm?id=1866298 |archive-date= December 17, 2013 |author-link= David S. H. Rosenthal }}; Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-Dusseau, A.
C., Arpaci-Dusseau, R. H. 2008. An analysis of data corruption in the storage stack. In Proceedings of 6th Usenix Conference on File and Storage Technologies.</ref> Another study, performed by [[CERN]] over six months and involving about 97 [[petabytes]] of data, found that about 128 [[megabytes]] of data became permanently corrupted silently somewhere in the pathway from network to disk.<ref>{{cite conference |conference=8th Annual Workshop on Linux Clusters for Super Computing |last1=Kelemen |first1=P |title=Silent corruptions |url=https://indico.desy.de/event/257/contributions/58082/attachments/37574/46878/kelemen-2007-HEPiX-Silent_Corruptions.pdf}}</ref>

Silent data corruption may result in [[cascading failure]]s, in which the system runs for a period of time with an undetected initial error that causes a growing number of problems until it is ultimately detected.<ref>{{cite web | url = http://www.fiala.me/pubs/papers/sc12-redmpi.pdf | title = Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing | date = November 2012 | access-date = 2015-01-26 | author1 = David Fiala | author2 = Frank Mueller | author3 = Christian Engelmann | author4 = Rolf Riesen | author5 = Kurt Ferreira | author6 = Ron Brightwell | website = fiala.me | publisher = [[IEEE]] | url-status = live | archive-url = https://web.archive.org/web/20141107074511/http://www.fiala.me/pubs/papers/sc12-redmpi.pdf | archive-date = 2014-11-07 }}</ref> For example, a failure affecting file system [[metadata]] can result in multiple files being partially damaged or made completely inaccessible as the file system is used in its corrupted state.
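
The magnitudes quoted in this section can be put in perspective with some rough arithmetic. The following is only an illustrative sketch, not a method taken from the cited sources: the function names are hypothetical, the 1 GB/s sustained throughput is an assumed figure, and decimal units are assumed (1 MB = 10<sup>6</sup> bytes, 1 PB = 10<sup>15</sup> bytes). Comparing a corrupted fraction by volume with a per-bit error rate is likewise a simplification.

```python
# Back-of-the-envelope arithmetic for figures quoted in this section.
# Assumptions (not from the cited sources): decimal units and a
# hypothetical sustained throughput of 1 GB/s.

BITS_PER_BYTE = 8


def seconds_to_transfer(bits, bytes_per_second):
    """Seconds needed to move `bits` at a sustained byte rate."""
    return bits / (BITS_PER_BYTE * bytes_per_second)


def corrupted_fraction(corrupted_bytes, total_bytes):
    """Share of the data volume that ended up corrupted."""
    return corrupted_bytes / total_bytes


if __name__ == "__main__":
    # Time to move 10**16 bits at the assumed 1 GB/s.
    seconds = seconds_to_transfer(10**16, 10**9)
    print(f"10^16 bits at 1 GB/s: {seconds / 86400:.1f} days")

    # CERN figures from the text: about 128 MB corrupted out of about 97 PB.
    fraction = corrupted_fraction(128 * 10**6, 97 * 10**15)
    print(f"CERN corrupted fraction by volume: {fraction:.2e}")
```

Under these assumptions, 10<sup>16</sup> bits (1.25 PB) take about 1.25 million seconds, roughly two weeks, to transfer, and the CERN figures correspond to a corrupted fraction of about 1.3 × 10<sup>−9</sup> of the data volume.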