
==Standardization==
The name "CSV" indicates the use of the comma to separate data fields. Nevertheless, the term "CSV" is widely used to refer to a large family of formats that differ in many ways. Some implementations allow or require single or double quotation marks around some or all fields, and some reserve the first record as a header containing a list of field names. The character set is not defined by the format: some applications require a Unicode [[byte order mark]] (BOM) to enforce Unicode interpretation (sometimes even a UTF-8 BOM).<ref name="rfc4180"/> Files that use the tab character instead of the comma can be more precisely referred to as "TSV", for tab-separated values.
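
As an illustration of the character-set issue, the following minimal Python sketch decodes a CSV payload that begins with a UTF-8 BOM. The sample data is hypothetical; <code>utf-8-sig</code> is the name of the standard Python codec that strips a leading BOM when one is present.

```python
import codecs
import csv
import io

# Hypothetical sample: a UTF-8 encoded CSV payload that starts with a BOM.
raw = codecs.BOM_UTF8 + "name,city\r\nZoë,Kraków\r\n".encode("utf-8")

# Decoding as plain "utf-8" keeps the BOM as U+FEFF at the start of the
# first field name, which can silently break header lookups.
with_bom = list(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))

# Decoding as "utf-8-sig" removes a leading BOM if there is one and leaves
# BOM-less input unchanged.
without_bom = list(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))

print(repr(with_bom[0][0]))     # first header cell, BOM still attached
print(repr(without_bom[0][0]))  # first header cell, BOM removed
```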

Other implementation differences include the handling of more commonplace field separators (such as space or semicolon) and newline characters inside text fields. One more subtlety is the interpretation of a blank line: it can equally be the result of writing a record of zero fields, or a record of one field of zero length; thus decoding it is ambiguous.
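
The blank-line ambiguity can be made concrete with a short Python sketch. The function <code>naive_encode</code> is a hypothetical, deliberately simplistic encoder; the second function uses Python's standard <code>csv</code> module, which is shown only as one example of how an implementation may choose to resolve the ambiguity.

```python
import csv
import io

def naive_encode(fields):
    """Join fields with commas and apply no quoting (deliberately simplistic)."""
    return ",".join(fields)

# A record of zero fields and a record of one zero-length field both
# encode to the empty string, so a reader cannot tell them apart.
zero_fields = naive_encode([])
one_empty_field = naive_encode([""])

def csv_module_encode(fields):
    """Encode one record with Python's csv module and its default dialect."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()

# Python's writer quotes a lone empty field, which keeps the two cases apart.
print(repr(csv_module_encode([])))
print(repr(csv_module_encode([""])))
```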

=== RFC <nowiki/>4180 and MIME standards === <!-- given markup prevents magic linking -->
The 2005 technical standard RFC 4180 formalizes the CSV file format and defines the [[MIME type]] "text/csv" for the handling of text-based fields. However, the interpretation of the text of each field is still application-specific. Files that follow the RFC 4180 standard can simplify CSV exchange and should be widely portable. Among its requirements:
* MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
* An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
* Each record ''should'' contain the same number of comma-separated fields.
* Any field ''may'' be quoted (with double quotes).
* Fields containing a line break, a double quote, or a comma ''should'' be quoted. (If they are not, the file will likely be impossible to process correctly.)
* ''If'' double-quotes are used to enclose fields, then a double-quote in a field ''must'' be represented by two double-quote characters.
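
The quoting and line-ending requirements listed above can be sketched as a small encoder. This is a minimal illustration written for this section, not a reference implementation; the function names <code>encode_field</code> and <code>encode_record</code> are hypothetical.

```python
def encode_field(value):
    """Quote a field only if it contains a comma, double quote, CR, or LF."""
    if any(ch in value for ch in ',"\r\n'):
        # Inside a quoted field, each double quote is written twice.
        return '"' + value.replace('"', '""') + '"'
    return value

def encode_record(fields):
    """Join encoded fields with commas and end the record with CR/LF."""
    return ",".join(encode_field(f) for f in fields) + "\r\n"

record = encode_record(["Smith, John", 'He said "hi"', "line1\nline2", "plain"])
print(repr(record))
```

Quoting only when needed is a design choice of this sketch; quoting every field would also satisfy the rule that any field may be quoted.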

The format can be processed by most programs that claim to read CSV files. The exceptions are that ''(a)'' some programs do not support line breaks within quoted fields, ''(b)'' some programs confuse the optional header with data or interpret the first data line as a header, and ''(c)'' some programs do not correctly parse double quotes inside a field.
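
The first and third failure modes can be reproduced with a short Python sketch that compares a line-oriented split with a quote-aware parser. The sample text is hypothetical, and Python's standard <code>csv</code> module stands in for any parser that honours quoted fields.

```python
import csv
import io

# Hypothetical sample: record 1 has a line break inside a quoted field,
# record 2 has doubled double quotes inside a quoted field.
text = 'id,comment\r\n1,"first line\r\nsecond line"\r\n2,"She said ""ok"""\r\n'

# A line-oriented approach treats every physical line as a record and
# therefore splits record 1 in two.
physical_lines = text.splitlines()

# A quote-aware parser keeps the embedded line break inside the field
# and collapses each doubled quote into a single one.
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(physical_lines), "physical lines")
print(len(records), "logical records")
```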

=== OKF frictionless tabular data package ===

In 2011, the [[Open Knowledge Foundation]] (OKF) and various partners created a data protocols working group, which later evolved into the Frictionless Data initiative. One of the main formats they released was the Tabular Data Package. It is heavily based on CSV, using it as the main data transport format and adding basic type and schema metadata (CSV itself lacks any type information to distinguish the string "1" from the number 1).<ref>{{cite web |title=Tabular Data Package |url=https://frictionlessdata.io/specs/tabular-data-package/ |website=Frictionless Data Specs}}</ref>
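
The role of the schema metadata can be sketched in Python as follows. The descriptor is a hypothetical example modelled on a <code>datapackage.json</code> file, and the casting code is written for this illustration; it does not use or represent the Frictionless software libraries, and it handles only two type names.

```python
import csv
import io
import json

# Hypothetical descriptor, modelled on a datapackage.json file.
descriptor = {
    "profile": "tabular-data-package",
    "name": "example-package",
    "resources": [{
        "name": "people",
        "path": "people.csv",
        "profile": "tabular-data-resource",
        "schema": {"fields": [
            {"name": "id", "type": "integer"},
            {"name": "label", "type": "string"},
        ]},
    }],
}
descriptor_json = json.dumps(descriptor, indent=2)

# Assumption: only these two field types are supported by this sketch.
CASTS = {"integer": int, "string": str}

def read_typed(text, schema):
    """Read CSV text and cast each value according to the schema's field types."""
    casts = {f["name"]: CASTS[f["type"]] for f in schema["fields"]}
    rows = csv.DictReader(io.StringIO(text, newline=""))
    return [{name: casts[name](value) for name, value in row.items()} for row in rows]

# Both cells contain the characters "1"; only the schema tells them apart.
typed = read_typed("id,label\r\n1,1\r\n", descriptor["resources"][0]["schema"])
print(typed)
```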

The Frictionless Data initiative has also provided a standard CSV Dialect Description Format for describing different dialects of CSV, for example by specifying the field separator or the quoting rules.<ref>{{cite web |title=CSV Dialect |url=https://frictionlessdata.io/specs/csv-dialect/ |website=Frictionless Data Specs}}</ref>
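
A dialect description of this kind can be mapped onto a parser's options. The sketch below assumes a small subset of the dialect properties (<code>delimiter</code>, <code>quoteChar</code>, <code>doubleQuote</code>, <code>skipInitialSpace</code>, <code>header</code>) and translates them into keyword arguments of Python's standard <code>csv.reader</code>; the helper <code>read_with_dialect</code> and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical dialect description using a subset of CSV Dialect properties.
dialect = {
    "delimiter": ";",
    "quoteChar": "'",
    "doubleQuote": True,
    "skipInitialSpace": True,
    "header": True,
}

def read_with_dialect(text, dialect):
    """Parse CSV text using the options named in a dialect description."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
        doublequote=dialect.get("doubleQuote", True),
        skipinitialspace=dialect.get("skipInitialSpace", False),
    )
    rows = list(reader)
    # Split off the first record only when the dialect declares a header.
    if dialect.get("header", True) and rows:
        return rows[0], rows[1:]
    return None, rows

sample = "name; note\r\nAnna; 'semi;colon'\r\nBob; 'it''s fine'\r\n"
header, rows = read_with_dialect(sample, dialect)
print(header, rows)
```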

=== W3C tabular data standard ===
In 2013, the [[World Wide Web Consortium|W3C]] "CSV on the Web" working group began to specify technologies providing higher interoperability for web applications using CSV or similar formats.<ref>{{cite web|url=http://www.w3.org/2013/csvw/wiki/Main_Page|title=CSV on the Web Working Group|year=2013|publisher=[[World Wide Web Consortium|W3C]] CSV WG|access-date=2015-04-22}}</ref> The working group completed its work in February 2016 and was officially closed in March 2016 with the release of a set of documents and W3C recommendations<ref>[https://github.com/w3c/csvw CSV on the Web Repository] (on GitHub)</ref>
for modeling "Tabular Data",<ref>[http://www.w3.org/TR/tabular-data-model/ Model for Tabular Data and Metadata on the Web] {{Webarchive|url=https://web.archive.org/web/20150424074413/http://www.w3.org/TR/tabular-data-model/ |date=2015-04-24 }} (W3C Recommendation)</ref> and enhancing CSV with [[metadata]] and [[Semantic Web|semantics]].
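
As a rough illustration of the metadata approach, the following Python sketch builds a small metadata document in the JSON style used by the W3C recommendations and compares it with a CSV header row. The file name, column names, and the helper <code>header_matches</code> are hypothetical, and the check shown is far simpler than the validation the recommendations describe.

```python
import json

# Hypothetical metadata describing a file named "countries.csv".
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "countries.csv",
    "tableSchema": {
        "columns": [
            {"name": "code", "titles": "Country code", "datatype": "string"},
            {"name": "population", "titles": "Population", "datatype": "integer"},
        ]
    },
}
metadata_json = json.dumps(metadata, indent=2)

def header_matches(header, metadata):
    """Check that a CSV header row lists exactly the declared column titles."""
    expected = [column["titles"] for column in metadata["tableSchema"]["columns"]]
    return header == expected

print(header_matches(["Country code", "Population"], metadata))
print(header_matches(["Population", "Country code"], metadata))
```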

While the [[well-formed element|well-formedness]] of CSV data can readily be checked, testing validity and canonical form is less well developed than for more precise data formats, such as [[XML]] and [[SQL]], which offer richer types and rules-based validation.<ref>{{cite web|url=https://www.csvpath.org/topics/validation/schemas-or-rules|title=Rules Or Schemas|year=2024|publisher=CsvPath Project|access-date=2025-02-13}}</ref>
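
The distinction between the two kinds of check can be sketched in Python. Here "well-formed" is taken to mean that the text parses and every record has the same number of fields, and "valid" means that each value satisfies a rule supplied by the caller; both definitions, the function names, and the sample rule are assumptions made for this illustration.

```python
import csv
import io

def is_well_formed(text):
    """Return True if the text parses strictly and all records have equal length."""
    try:
        rows = list(csv.reader(io.StringIO(text, newline=""), strict=True))
    except csv.Error:
        return False
    return len({len(row) for row in rows}) <= 1

def rule_violations(text, rules):
    """Return (line number, column, value) for every value that breaks its rule."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    problems = []
    for row in reader:
        for column, check in rules.items():
            if not check(row[column]):
                problems.append((reader.line_num, column, row[column]))
    return problems

# Hypothetical rule: "age" must be a non-negative integer below 150.
RULES = {"age": lambda value: value.isdigit() and int(value) < 150}

sample = "name,age\r\nAnn,34\r\nBob,abc\r\n"
print(is_well_formed(sample))          # structure only
print(rule_violations(sample, RULES))  # content rules
```

The sample passes the structural check but fails the content rule, which shows that well-formedness alone says nothing about whether the values are meaningful.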