== Data organization and representation ==
A modern [[Computer|digital computer]] represents [[data]] using the [[binary numeral system]]. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of [[bit]]s, or binary digits, each of which has a value of 0&nbsp;or&nbsp;1. The most common unit of storage is the [[byte]], equal to 8&nbsp;bits. A piece of information can be handled by any computer or device whose storage space is large enough to accommodate ''the binary representation of that information'', or simply [[data (computing)|data]]. For example, the [[complete works of Shakespeare]], about 1250&nbsp;pages in print, can be stored in about five [[megabyte]]s (40&nbsp;million bits) with one byte per character.
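
The arithmetic in the example above can be checked with a minimal sketch. It assumes decimal megabytes (1&nbsp;MB = 10<sup>6</sup>&nbsp;bytes); the names <code>to_bits</code>, <code>text_bytes</code>, and <code>text_bits</code> are illustrative only.

```python
BITS_PER_BYTE = 8

# Assumption: "about five megabytes" is taken as 5 * 10**6 bytes,
# with one byte per character as stated in the text.
text_bytes = 5_000_000
text_bits = text_bytes * BITS_PER_BYTE  # 40 million bits


def to_bits(data: bytes) -> str:
    """Return the data as a string of 0s and 1s, eight digits per byte."""
    return "".join(f"{byte:08b}" for byte in data)


# Two characters occupy two bytes, i.e. sixteen bits.
sample_bits = to_bits(b"Hi")
print(text_bits, sample_bits, len(sample_bits))
```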

Data are [[encoded]] by assigning a bit pattern to each [[Character (computing)|character]], [[Numerical digit|digit]], or [[multimedia]] object. Many standards exist for encoding (e.g. [[character encoding]]s like [[ASCII]], image encodings like [[JPEG]], and video encodings like [[MPEG-4]]).
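
As a rough illustration of how a character encoding assigns bit patterns, the sketch below encodes the same characters under different standards. ASCII is named in the text; UTF-8 and Latin-1 are added here only for comparison and are assumptions of this example.

```python
# The letter "A" has the code 65 in ASCII, stored in a single byte.
ascii_a = "A".encode("ascii")

# The same character can map to different bit patterns under
# different encoding standards (UTF-8 and Latin-1 chosen for illustration).
utf8_e = "é".encode("utf-8")      # two bytes
latin1_e = "é".encode("latin-1")  # one byte

for label, encoded in [("ASCII A", ascii_a), ("UTF-8 é", utf8_e), ("Latin-1 é", latin1_e)]:
    pattern = " ".join(f"{byte:08b}" for byte in encoded)
    print(f"{label}: {pattern}")

# Decoding with the matching standard recovers the original character.
assert utf8_e.decode("utf-8") == latin1_e.decode("latin-1") == "é"
```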

Adding redundant bits to each encoded unit allows the computer to detect errors in coded data and to correct them using mathematical algorithms. Errors generally occur with low probability. They are caused by [[random]] bit flips, by "physical bit fatigue" (the loss of a physical bit's ability to maintain a distinguishable value of 0&nbsp;or&nbsp;1), or by errors in inter- or intra-computer communication. A random [[RAM parity|bit flip]] (e.g. due to random [[radiation]]) is typically corrected upon detection. A malfunctioning physical bit, or group of bits (the specific defective bit is not always known; the group definition depends on the specific storage device), is typically fenced out automatically: the device takes it out of use and replaces it with another functioning equivalent group, to which the corrected bit values are restored (if possible). The [[cyclic redundancy check]] (CRC) method is typically used in communications and storage for [[error detection]]. When an error is detected, the operation is retried.
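
The two detection ideas mentioned above can be sketched as follows. This is a simplified illustration, not how any particular device implements them: the helper names <code>add_parity</code> and <code>parity_ok</code> are hypothetical, and the CRC variant used is the CRC-32 provided by Python's standard <code>zlib</code> module. A single parity bit only detects an odd number of flipped bits; correcting errors requires more redundancy than is shown here.

```python
import zlib


def add_parity(byte: int) -> int:
    """Append an even-parity bit to an 8-bit value, giving a 9-bit code word."""
    parity = bin(byte).count("1") % 2
    return (byte << 1) | parity


def parity_ok(word: int) -> bool:
    """A code word is valid if it contains an even number of 1 bits."""
    return bin(word).count("1") % 2 == 0


word = add_parity(0b01001000)
flipped = word ^ (1 << 4)  # simulate a single random bit flip

print(parity_ok(word), parity_ok(flipped))

# CRC-32 over a block of data: store the checksum alongside the block,
# then recompute it on read and compare.
block = bytearray(b"stored block")
stored_crc = zlib.crc32(block)

block[0] ^= 0x01  # simulate corruption of one bit
crc_matches = zlib.crc32(block) == stored_crc
print(crc_matches)
```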

In many cases (such as a database), [[data compression]] methods make it possible to represent a string of bits by a shorter bit string ("compress") and to reconstruct the original string ("decompress") when needed. For many types of data this saves a substantial amount of storage (tens of percent) at the cost of more computation, since the data must be compressed and decompressed when needed. The trade-off between the saving in storage cost and the cost of the related computations, including possible delays in data availability, is analyzed before deciding whether to keep certain data compressed.
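
A minimal sketch of the compress and decompress round trip, using the DEFLATE implementation in Python's standard <code>zlib</code> module. The sample data is invented for illustration and is deliberately repetitive; the saving on real data depends heavily on its content.

```python
import zlib

# Hypothetical, highly repetitive sample data (e.g. rows of a table).
original = b"customer_id,status\n" + b"1001,active\n" * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless compression must reproduce the original bit string exactly.
assert restored == original

saving = 1 - len(compressed) / len(original)
print(f"{len(original)} bytes -> {len(compressed)} bytes ({saving:.0%} saved)")
```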

For [[data security|security reasons]], certain types of data (e.g. [[credit card]] information) may be kept [[encrypted]] in storage to prevent unauthorized reconstruction of the information from chunks of storage snapshots.