Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Text file
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{short description|Computer file containing plain text}} {{more citations needed|date=December 2015}} {{Infobox file format | name = Text file | icon = Text-txt.svg | iconcaption = | icon_size = | screenshot = | screenshot_size = | caption = |_noextcode = | extension = .text, .txt |_nomimecode = | mime = text/plain | type code = TEXT | uniform_type = public.plain-text | conforms_to = public.text | magic = | developer = | released = <!-- {{start date and age|YYYY|mm|dd|df=yes/no}} --> | latest_release_version = | latest_release_date = <!-- {{start date and age|YYYY|mm|dd|df=yes/no}} --> | genre = [[Document file format]], [[Digital container format|Generic container format]] | container_for = | contained_by = | extended_from = | extended_to = | standard = <!-- or: | standards = --> | free = | url = }} A '''text file''' (sometimes spelled '''textfile'''; an old alternative name is '''flat file''') is a kind of [[computer file]] that is structured as a sequence of [[line (text file)|lines]] of [[electronic text]]. A text file exists [[Data storage|stored as data]] within a [[computer file system]]. In operating systems such as [[CP/M]], where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an [[end-of-file]] (EOF) marker, as padding after the last line in a text file. In modern operating systems such as [[DOS]], [[Microsoft Windows]] and [[Unix-like]] systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. Some operating systems, such as [[Multics]], Unix-like systems, CP/M, DOS, the [[classic Mac OS]], and Windows, store text files as a sequence of bytes, with an [[Newline|end-of-line]] [[delimiter]] at the end of each line. Other operating systems, such as [[OpenVMS]] and [[OS/360 and successors|OS/360 and its successors]], have [[record-oriented filesystem]]s, in which text files are stored as a sequence either of fixed-length records or of variable-length records with a record-length value in the record header. "Text file" refers to a type of container, while [[plain text]] refers to a type of content. At a generic level of description, there are two kinds of computer files: text files and [[binary file]]s.<ref name="Lewis000">{{cite book |last=Lewis |first=John |url=https://archive.org/details/computersciencei00nell |title=Computer Science Illuminated |publisher=Jones and Bartlett |year=2006 |isbn=0-7637-4149-3 |url-access=registration}}</ref> == Data storage == {{Unreferenced section|date=September 2024}}[[Image:CsvDelimited001.svg|thumb|right|200px|A stylized iconic depiction of a [[Comma-separated values|CSV]]-formatted '''text file''']] Because of their simplicity, text files are commonly used for [[Computer data storage|storage]] of information. They avoid some of the problems encountered with other file formats, such as [[endianness]], padding bytes, or differences in the number of bytes in a [[Word (computer architecture)|machine word]]. Further, when [[data corruption]] occurs in a text file, it is often easier to recover and continue processing the remaining contents. A disadvantage of text files is that they usually have a low [[Entropy (information theory)|entropy]], meaning that the information occupies more storage than is strictly necessary. A simple text file may need no additional [[metadata]] (other than knowledge of its [[character set]]) to assist the reader in interpretation. A text file may contain no data at all, which is a case of [[zero-byte file]]. == Encoding == {{Unreferenced section|date=September 2024}} The [[ASCII|ASCII character set]] is the most common compatible subset of character sets for English-language text files, and is generally assumed to be the default file format in many situations. It covers American English, but for the British [[pound sign]], the [[euro sign]], or characters used outside English, a richer character set must be used. In many systems, this is chosen based on the default [[Locale (computer software)|locale]] setting on the computer it is read on. Prior to UTF-8, this was traditionally single-byte encodings (such as [[ISO-8859-1]] through [[ISO-8859-16]]) for European languages and [[wide character]] encodings for Asian languages. Because encodings necessarily have only a limited repertoire of characters, often very small, many are only usable to represent text in a limited subset of human languages. [[Unicode]] is an attempt to create a common standard for representing all known languages, and most known character sets are subsets of the very large Unicode character set. Although there are multiple character encodings available for Unicode, the most common is [[UTF-8]], which has the advantage of being backwards-compatible with ASCII; that is, every ASCII text file is also a UTF-8 text file with identical meaning. UTF-8 also has the advantage that [[UTF-8#fallback and auto-detection|it is easily auto-detectable]]. Thus, a common operating mode of UTF-8 capable software, when opening files of unknown encoding, is to try UTF-8 first and fall back to a locale dependent legacy encoding when it definitely is not UTF-8. == Formats == On most operating systems, the name ''text file'' refers to a file format that allows only [[plain text]] content with very little formatting (e.g., no '''[[Emphasis (typography)|bold]]''' or ''[[Italic type|italic]]'' types). Such files can be viewed and edited on [[text terminal]]s or in simple [[text editor]]s. Text files usually have the [[MIME]] type <code>text/plain</code>, usually with additional information indicating an encoding. ===Microsoft Windows text files=== <!-- This Anchor tag serves to provide a permanent target for incoming section links. Please do not remove it, nor modify it, except to add another appropriate anchor. If you modify the section title, please anchor the old title. It is always best to anchor an old section header that has been changed so that links to it won't be broken. See [[Template:Anchor]] for details. This template is {{subst:Anchor comment}} --> DOS and [[Microsoft Windows]] use a common text file format, with each line of text separated by a two-character combination: [[carriage return]] (CR) and [[line feed]] (LF). It is common for the last line of text ''not'' to be terminated with a CR-LF marker, and many text editors (including [[Notepad (Windows)|Notepad]]) do not automatically insert one on the last line. On Microsoft Windows operating systems, a file is regarded as a text file if the suffix of the name of the file (the "[[filename extension]]") is <code>.txt</code>. However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files that have file name suffixes indicating the [[programming language]] in which the source is written. Most Microsoft Windows text files use ANSI, OEM, Unicode or UTF-8 encoding. What Microsoft Windows terminology calls "ANSI encodings" are usually single-byte [[ISO/IEC 8859]] encodings (i.e. ANSI in the Microsoft Notepad menus is really "System Code Page", non-Unicode, legacy encoding), except for in locales such as Chinese, Japanese and Korean that require double-byte character sets. ANSI encodings were traditionally used as default system locales within Microsoft Windows, before the transition to Unicode. By contrast, OEM encodings, also known as [[DOS code page]]s, were defined by [[IBM]] for use in the original [[IBM PC]] text mode display system. They typically include graphical and [[Box-drawing character|line-drawing characters]] common in DOS applications. "Unicode"-encoded Microsoft Windows text files contain text in [[UTF-16]] Unicode Transformation Format. Such files normally begin with [[byte order mark]] (BOM), which communicates the endianness of the file content. Although UTF-8 does not suffer from endianness problems, many Microsoft Windows programs (i.e. Notepad) prepend the contents of UTF-8-encoded files with BOM,<ref>{{cite web |date=Jan 7, 2021 |title=Using Byte Order Marks |url=https://docs.microsoft.com/en-gb/windows/win32/intl/using-byte-order-marks |url-status=live |archive-url=https://web.archive.org/web/20230221224807/https://learn.microsoft.com/en-gb/windows/win32/intl/using-byte-order-marks |archive-date=Feb 21, 2023 |access-date=2022-04-21 |work=Internationalization for Windows Applications |publisher=[[Microsoft]]}}</ref> to differentiate UTF-8 encoding from other 8-bit encodings.<ref>{{cite web |url=https://www.unicode.org/faq/utf_bom.html#BOM |title=FAQ β UTF-8, UTF-16, UTF-32 & BOM |first=Asmus |last=Freytag |publisher=The Unicode Consortium |date=2015-12-18 |access-date=2016-05-30 |quote=Yes, UTF-8 can contain a BOM. However, it makes ''no'' difference as to the endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is only used as a signature β an indication that an otherwise unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do not expect a BOM. Where UTF-8 is used ''transparently'' in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" of at the beginning of Unix shell scripts.}}</ref> === Unix text files === On [[Unix-like]] operating systems, text files format is precisely described: [[POSIX]] defines a text file as a file that contains characters organized into zero or more lines,<ref>{{cite web |url=http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 |title=3.403 Text File |work=[[POSIX|IEEE Std 1003.1, 2017 Edition]] |publisher=[[IEEE Computer Society]] |access-date=2019-03-01}}</ref> where lines are sequences of zero or more non-newline characters plus a terminating newline character,<ref>{{cite web |url=http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206 |title=3.206 Line |work=[[POSIX|IEEE Std 1003.1, 2013 Edition]] |publisher=[[IEEE Computer Society]] |access-date=2015-12-15}}</ref> normally LF. Additionally, POSIX defines a '''{{vanchor|printable file}}''' as a text file whose characters are printable or space or backspace according to regional rules. This excludes most control characters, which are not printable.<ref>{{cite web |url=http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_284 |title=3.284 Printable File |work=[[POSIX|IEEE Std 1003.1, 2013 Edition]] |publisher=[[IEEE Computer Society]] |access-date=2015-12-15}}</ref> === Apple Macintosh text files === Prior to the advent of [[macOS]], the [[classic Mac OS]] system regarded the content of a file (the data fork) to be a text file when its [[resource fork]] indicated that the type of the file was "TEXT".<ref name="mac-uti">{{cite web |url=https://developer.apple.com/library/prerelease/content/documentation/Miscellaneous/Reference/UTIRef/Articles/System-DeclaredUniformTypeIdentifiers.html |title=System-Declared Uniform Type Identifiers |work=Guides and Sample Code |publisher=[[Apple Inc.]] |date=2009-11-17 |access-date=2016-09-12}}</ref> Lines of classic Mac OS text files are terminated with CR characters.<ref name="mac-line-endings">{{cite web |url=https://developer.apple.com/library/mac/documentation/OpenSource/Conceptual/ShellScripting/PortingScriptstoMacOSX/PortingScriptstoMacOSX.html |title=Designing Scripts for Cross-Platform Deployment |work=Mac Developer Library |publisher=[[Apple Inc.]] |date=2014-03-10 |access-date=2016-09-12}}</ref> Being a Unix-like system, macOS uses Unix format for text files.<ref name="mac-line-endings"/> [[Uniform Type Identifier]] (UTI) used for text files in macOS is "public.plain-text"; additional, more specific UTIs are: "public.utf8-plain-text" for utf-8-encoded text, "public.utf16-external-plain-text" and "public.utf16-plain-text" for utf-16-encoded text and "com.apple.traditional-mac-plain-text" for classic Mac OS text files.<ref name="mac-uti" /> == Rendering == When opened by a text editor, human-readable content is presented to the user. This often consists of the file's plain text visible to the user. Depending on the application, control codes may be rendered either as literal instructions acted upon by the editor, or as visible [[escape character]]s that can be edited as plain text. Though there may be plain text in a text file, control characters within the file (especially the end-of-file character) can render the plain text unseen by a particular method. == Related concepts == The use of [[lightweight markup languages]] such as [[TeX]], [[markdown]] and [[wikitext]] can be regarded as an extension of plain text files, as marked-up text is still wholly or partially human-readable in spite of containing machine-interpretable annotations. Early uses of [[HTML]] could also be regarded in this way, although the HTML of modern websites is largely unreadable by humans. Other file formats such as [[enriched text]] and [[Comma-separated values|CSV]] can also be regarded as human-interpretable to some degree. == See also == * [[ASCII]] * [[EBCDIC]] * [[Filename extension]] * [[List of file formats]] * [[Newline]] * [[Syntax highlighting]] * [[Text-based protocol]] * [[Text editor]] * [[Unicode]] == Notes and references == {{reflist}} == External links == * [[c2:PowerOfPlainText|Power of Plain Text]] on C2 wiki {{Computer files}} [[Category:Text file formats|*]] [[Category:Computer data]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite book
(
edit
)
Template:Cite web
(
edit
)
Template:Computer files
(
edit
)
Template:Infobox file format
(
edit
)
Template:More citations needed
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)
Template:Unreferenced section
(
edit
)
Template:Vanchor
(
edit
)