== Character set ==
The UDF specifications<ref name="OSTA - UDF Specifications"/> allow only one character set, ''OSTA CS0'', which can store any [[Unicode]] [[Code point]] except U+FEFF and U+FFFE. The additional character sets defined in ECMA-167 are not used.<ref name=ecma167/>{{rp|at=7.2}} Errata DCN-5157 expanded the permitted range to all code points of Unicode 4.0 (or any newer or older version), which includes [[Plane (Unicode)#Supplementary Multilingual Plane|Plane]] 1–16 characters such as [[Emoji]]. DCN-5157 also recommends [[Unicode equivalence#Normalization|normalizing]] strings to Normalization Form C.<ref name=dcn-5157>{{cite web|title=UDF 2.60 approved errata|url=http://www.osta.org/specs/pdf/udf260_errata.pdf|access-date=22 April 2018}}</ref>

OSTA CS0 stores a 16-bit Unicode string "compressed" into either 8-bit or 16-bit units, preceded by a single-byte "compID" tag that indicates which form is used. The 8-bit form is functionally equivalent to [[ISO-8859-1]], and the 16-bit form is [[UTF-16]] in big-endian byte order. Because 8-bit file names need only half as much space per character, they should be used whenever every character of the name can be represented in 8 bits.<ref>[http://www.osta.org/specs/pdf/udf102.pdf UDF 1.02 specification]: 2.1.1 Character Sets (also present in later versions)</ref>

The reference algorithm neither checks for forbidden code points nor interprets [[Universal Character Set characters#Surrogates|surrogate pairs]], so, as with [[NTFS]], the stored string may be malformed.<ref name="OSTA - UDF Specifications" />{{rp|at=2.1.2, 6.4}} (DCN-5157 does not specify a particular storage form, but UTF-16BE is the only well-known method of storing all of Unicode while remaining mostly backward compatible with [[UCS-2]].)<ref name="dcn-5157" />
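The compression scheme described above can be illustrated with a short Python sketch. It is not the reference algorithm from the UDF specification: the function names <code>cs0_compress</code> and <code>cs0_decompress</code> are hypothetical, the optional NFC step reflects the DCN-5157 recommendation, and the 16-bit form is assumed to be UTF-16BE (with surrogate pairs for characters outside the Basic Multilingual Plane), as discussed above.

```python
import unicodedata


def cs0_compress(name: str, normalize: bool = True) -> bytes:
    """Encode a file name as an OSTA CS0 byte string (hypothetical helper).

    The first byte is the compID: 8 means one byte per character
    (code points U+0000..U+00FF, as in ISO-8859-1); 16 means big-endian
    UTF-16 code units (an assumption for characters beyond U+FFFF).
    """
    if normalize:
        # DCN-5157 recommends Normalization Form C; optional in this sketch.
        name = unicodedata.normalize("NFC", name)
    if all(ord(ch) < 0x100 for ch in name):
        # Every character fits in 8 bits, so use the compact form.
        return bytes([8]) + name.encode("latin-1")
    # Otherwise store UTF-16BE; supplementary characters become surrogate pairs.
    return bytes([16]) + name.encode("utf-16-be")


def cs0_decompress(data: bytes) -> str:
    """Decode an OSTA CS0 byte string such as one produced by cs0_compress."""
    comp_id, payload = data[0], data[1:]
    if comp_id == 8:
        return payload.decode("latin-1")
    if comp_id == 16:
        # Strict decoding raises UnicodeDecodeError on malformed data such as
        # unpaired surrogates, which the reference algorithm would not detect.
        return payload.decode("utf-16-be")
    raise ValueError(f"unsupported compID {comp_id}")
```

For example, a name such as <code>README.TXT</code> is stored with compID 8 in 11 bytes, while a name containing Japanese characters or emoji requires compID 16. Normalizing first can also matter for size: a decomposed "e" plus combining acute accent becomes the single character U+00E9, which fits the 8-bit form. The sketch does not model how the resulting byte string is embedded in on-disk structures.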