
==Strong versus eventual consistency (storage)==
In the context of scale-out [[Computer data storage|data storage]], scalability is defined as the maximum storage cluster size which guarantees full data consistency, meaning there is only ever one valid version of stored data in the whole cluster, independent of the number of redundant physical data copies. Clusters which provide "lazy" redundancy by updating copies asynchronously are called [[Eventual consistency|'eventually consistent']]. This type of scale-out design is suitable when availability and responsiveness are rated higher than consistency, which is true for many web file-hosting services or web caches (''if you want the latest version, wait a few seconds for it to propagate''). For classical transaction-oriented applications, this design should be avoided.<ref>{{cite news|title=Eventual consistency by Werner Vogels|author=Sadek Drobi|url=http://www.infoq.com/news/2008/01/consistency-vs-availability|date=January 11, 2008|access-date=April 8, 2017|publisher=InfoQ}}</ref>

Many open-source and even commercial scale-out storage clusters, especially those built on top of standard PC hardware and networks, provide only eventual consistency; examples include some NoSQL databases such as [[CouchDB]] and others mentioned above. Write operations invalidate other copies, but often do not wait for their acknowledgements. Read operations typically do not check every redundant copy before answering, and can therefore miss the preceding write operation. Handling the large amount of metadata signal traffic with acceptable performance (i.e., so that the cluster acts like a non-clustered storage device or database) would require specialized hardware and short distances.{{cn|date=May 2023}}
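
The stale read described above can be illustrated with a minimal sketch. It is a toy model and not the protocol of any particular product: the class <code>LazyCluster</code> and its methods are hypothetical names, and for simplicity the sketch queues updated values for the other copies instead of sending invalidation messages.

```python
from collections import deque


class LazyCluster:
    """Toy cluster: a write is acknowledged after reaching one replica."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Acknowledge as soon as the first replica holds the value;
        # the other replicas are updated later, without waiting.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # Answer from a single replica without consulting the others.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Background step: apply every queued update.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


cluster = LazyCluster(replica_count=3)
cluster.write("file", "v1")
cluster.propagate()                             # all replicas now hold "v1"
cluster.write("file", "v2")                     # acknowledged immediately
stale = cluster.read("file", replica_index=2)   # replica 2 has not seen "v2" yet
cluster.propagate()
fresh = cluster.read("file", replica_index=2)   # now up to date
```

In this sketch <code>stale</code> holds the old value and <code>fresh</code> the new one, which corresponds to the "wait for it to propagate" behaviour of an eventually consistent cluster.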

Whenever strong data consistency is expected, look for these indicators:{{cn|date=May 2023}}
* the use of InfiniBand, Fibre Channel or similar low-latency networks to avoid performance degradation with increasing cluster size and number of redundant copies.
* short cable lengths and limited physical extent, which avoid performance degradation caused by signal propagation delay.
* majority / quorum mechanisms to guarantee data consistency whenever parts of the cluster become inaccessible.
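
The majority mechanism in the last indicator can be sketched as follows. This is an illustrative toy under stated assumptions, not a production protocol: <code>MajorityCluster</code>, its <code>reachable</code> set, and the per-replica version counter are hypothetical, and operations are assumed to run one at a time. Because any two majorities share at least one replica, a read always sees the most recent successful write, and a minority partition refuses to answer.

```python
class MajorityCluster:
    """Toy replicated register: every read and write needs a majority."""

    def __init__(self, replica_count):
        # Each replica stores a (version, value) pair.
        self.replicas = [(0, None)] * replica_count
        self.reachable = set(range(replica_count))
        self.quorum = replica_count // 2 + 1

    def _quorum_members(self):
        if len(self.reachable) < self.quorum:
            raise RuntimeError("no majority reachable; refusing to answer")
        return sorted(self.reachable)[: self.quorum]

    def write(self, value):
        members = self._quorum_members()
        # The new version exceeds every version seen by this majority.
        version = max(self.replicas[i][0] for i in members) + 1
        for i in members:
            self.replicas[i] = (version, value)

    def read(self):
        members = self._quorum_members()
        # Return the value carrying the highest version in the majority.
        newest = max((self.replicas[i] for i in members), key=lambda pair: pair[0])
        return newest[1]


cluster = MajorityCluster(replica_count=5)
cluster.write("v1")               # stored on replicas 0, 1, 2
cluster.reachable = {2, 3, 4}     # replicas 0 and 1 are cut off
cluster.write("v2")               # stored on replicas 2, 3, 4
cluster.reachable = {0, 1, 2}     # a different majority
value = cluster.read()            # replica 2 supplies the newest version

cluster.reachable = {0, 1}        # only a minority is left
try:
    cluster.read()
    minority_refused = False
except RuntimeError:
    minority_refused = True
```

The cost of this guarantee is that every operation waits for several replicas, which is one reason such designs are sensitive to network latency.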

Indicators of eventually consistent designs (which are not suitable for transactional applications) are:{{cn|date=May 2023}}
* write performance increases linearly with the number of connected devices in the cluster.
* all parts remain responsive while the storage cluster is partitioned, which creates a risk of conflicting updates.
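
The risk of conflicting updates during a partition can be shown with a small sketch. The helper names <code>local_write</code> and <code>find_conflicts</code>, the key <code>"balance"</code>, and the simple per-key version counter are assumptions made for illustration only; real systems generally use more elaborate bookkeeping to detect and resolve such conflicts.

```python
def local_write(replica, key, value):
    """Accept a write locally without contacting the other partition."""
    version, _ = replica.get(key, (0, None))
    replica[key] = (version + 1, value)


def find_conflicts(replica_a, replica_b):
    """Return keys whose copies share a version but differ in value."""
    conflicts = []
    for key in sorted(replica_a.keys() & replica_b.keys()):
        version_a, value_a = replica_a[key]
        version_b, value_b = replica_b[key]
        if version_a == version_b and value_a != value_b:
            conflicts.append(key)
    return conflicts


# Both sides are in sync before the partition: key -> (version, value).
side_a = {"balance": (1, 100)}
side_b = {"balance": (1, 100)}

# During the partition each side stays responsive and accepts a write.
local_write(side_a, "balance", 70)
local_write(side_b, "balance", 50)

# After the partition heals, the two copies cannot both be valid.
conflicts = find_conflicts(side_a, side_b)
```

Here both sides end up with version 2 of the same key but with different values, so <code>conflicts</code> contains that key. A transactional application cannot tolerate this outcome, whereas a web cache can simply pick one copy.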