===No checksumming in journal===
ext3 does not [[checksum]] what it writes to the journal. On a storage device with its own write cache, if ''barrier=1'' is not enabled as a mount option (in [[/etc/fstab]]) and the hardware performs out-of-order write caching, there is a risk of severe filesystem corruption during a crash.<ref name="archives.free">[http://archives.free.net.ph/message/20070518.134838.52e26369.en.html Re: Frequent metadata corruption with ext3 + hard power-off] {{Webarchive|url=https://web.archive.org/web/20070928031902/http://archives.free.net.ph/message/20070518.134838.52e26369.en.html |date=2007-09-28 }}. Archives.free.net.ph. Retrieved on 2013-06-22.</ref><ref>[http://archives.free.net.ph/message/20070519.014256.ac3a2e07.en.html Re: Frequent metadata corruption with ext3 + hard power-off] {{Webarchive|url=https://web.archive.org/web/20070928031908/http://archives.free.net.ph/message/20070519.014256.ac3a2e07.en.html |date=2007-09-28 }}. Archives.free.net.ph. Retrieved on 2013-06-22.</ref><ref>Red Hat Enterprise Linux, [https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/writebarr.html ''Chapter 20. Write Barriers'']</ref> This is because storage devices with write caches report to the system that the data has been completely written, even if it has only reached the (volatile) cache. If hard disk writes are done out of order (modern hard disks cache writes in order to [[amortized analysis|amortize]] write speeds), it is likely that the commit block of a transaction will be written before the other relevant blocks. If a power failure or unrecoverable crash occurs before those other blocks are written, the system has to be rebooted. Upon reboot, the file system replays the log as normal, and so replays the "winners" (transactions with a commit block, including the invalid transaction above, which happened to be tagged with a valid commit block).
The unfinished disk write above thus proceeds, but using corrupt journal data, and the file system mistakenly overwrites normal data with corrupt data while replaying the journal. If checksums had been used, with the blocks of the "fake winner" transaction tagged with a mutual checksum, the file system could have detected the problem and not replayed the corrupt data onto the disk. Journal checksumming has been added to ext4.<ref>[http://article.gmane.org/gmane.linux.file-systems/21373 ext4: Add the journal checksum feature]. Article.gmane.org (2008-02-26). Retrieved on 2013-06-22.</ref>

Filesystems going through the device mapper interface (including software [[RAID]] and LVM implementations) may not support barriers, and will issue a warning if that mount option is used.<ref>[http://oss.sgi.com/archives/xfs/2007-12/msg00080.html Re: write barrier over device mapper supported or not?] {{Webarchive|url=https://web.archive.org/web/20090504120507/http://oss.sgi.com/archives/xfs/2007-12/msg00080.html |date=2009-05-04 }}. Oss.sgi.com. Retrieved on 2013-06-22.</ref><ref>[http://madduck.net/blog/2006.08.11:xfs-zeroes/ XFS and zeroed files] {{Webarchive|url=https://web.archive.org/web/20080430221349/http://madduck.net/blog/2006.08.11:xfs-zeroes/ |date=2008-04-30 }}. Madduck.net (2008-07-11). Retrieved on 2013-06-22.</ref> Some disks also do not properly implement the write cache flushing extension necessary for barriers to work, which causes a similar warning.<ref>[https://web.archive.org/web/20110727154012/http://forums.opensuse.org/archives/sls-archives/suse-linux/desktop-environments/379681-barrier-sync.html Barrier Sync]. forums.opensuse.org (March 2007)</ref> In these situations, where barriers are not supported or practical, reliable write ordering is possible by turning off the disk's write cache and using the {{code|1=data=journal}} mount option.<ref name="archives.free" />

Turning off the disk's write cache may be required even when barriers are available. Applications like databases expect a call to [[sync (Unix)|fsync()]] to flush pending writes to disk, and the barrier implementation does not always clear the drive's write cache in response to that call.<ref>[http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg272253.html Re: Proposal for "proper" durable fsync() and fdatasync()]. Mail-archive.com (2008-02-26). Retrieved on 2013-06-22.</ref> There is also a potential issue with the barrier implementation related to error handling during events such as a drive failure.<ref>[http://www.mjmwired.net/kernel/Documentation/block/barrier.txt I/O Barriers, as of kernel version 2.6.31]. Mjmwired.net. Retrieved on 2013-06-22.</ref> Some [[virtualization]] technologies are also known not to properly forward fsync or flush commands from a guest operating system to the underlying devices (files, volumes, disk).<ref>[http://www.mysqlperformanceblog.com/2011/03/21/virtualization-and-io-modes-extra-complexity/ Virtualization and IO Modes = Extra Complexity]. Mysqlperformanceblog.com (2011-03-21). Retrieved on 2013-06-22.</ref> Similarly, some hard disks or controllers implement cache flushing incorrectly or not at all, but still advertise that it is supported, and do not return any error when it is used.<ref>[http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/ SSD, XFS, LVM, fsync, write cache, barrier and lost transactions]. Mysqlperformanceblog.com (2009-03-02). Retrieved on 2013-06-22.</ref> Because there are so many ways in which fsync and write cache handling can go wrong, it is safer to assume that cache flushing does not work unless it is explicitly tested, regardless of how reliable individual components are believed to be.
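
The "fake winner" scenario described in this section can be illustrated with a small simulation. The following is a minimal sketch, not the ext3 or ext4 on-disk format: the <code>Transaction</code> class, the <code>checksum_of</code> and <code>replay</code> functions and the block contents are hypothetical names invented for the illustration, and CRC-32 from Python's <code>zlib</code> module merely stands in for whatever checksum a real journal would use. It assumes that out-of-order write caching let the commit block and one journal block reach the disk while a second journal block did not.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical in-memory model of one journal transaction."""
    blocks: dict      # target block number -> bytes that actually landed in the journal
    committed: bool   # True if a commit block reached the disk
    checksum: int = 0  # checksum of the *intended* blocks, stored with the commit block


def checksum_of(blocks):
    """CRC-32 over block numbers and contents (a stand-in for a real journal checksum)."""
    crc = 0
    for number in sorted(blocks):
        crc = zlib.crc32(number.to_bytes(8, "little") + blocks[number], crc)
    return crc


def replay(journal, disk, verify):
    """Replay committed transactions onto disk; optionally verify checksums first."""
    for txn in journal:
        if not txn.committed:
            continue  # no commit block: never a "winner"
        if verify and checksum_of(txn.blocks) != txn.checksum:
            break  # journal contents do not match the commit block: stop replaying
        disk.update(txn.blocks)
    return disk


# What the file system intended to log before the power failure.
intended = {7: b"new inode table", 8: b"new directory"}

# What is actually in the journal: the commit block and block 7 were written,
# but the journal copy of block 8 still holds stale bytes.
torn = Transaction(
    blocks={7: b"new inode table", 8: b"stale journal bytes"},
    committed=True,
    checksum=checksum_of(intended),
)

original = {7: b"old inode table", 8: b"old directory"}

without_checksum = replay([torn], dict(original), verify=False)
with_checksum = replay([torn], dict(original), verify=True)

print("block 8 without checksum:", without_checksum[8])
print("block 8 with checksum:   ", with_checksum[8])
```

Without verification the stale journal bytes overwrite block 8, which is the corruption described above; with verification the mismatch is detected and the disk is left as it was. Stopping replay at the first bad transaction is a design choice of this sketch only.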