==Encoding interoperability==
There is no general encoding standard for filenames. Filenames have to be exchanged between software environments for network file transfer, file system storage, backup and file synchronization software, configuration management, data compression and archiving, etc. It is thus very important not to lose filename information between applications. This led to wide adoption of Unicode as a standard for encoding filenames, although legacy software might not be Unicode-aware.

===Encoding indication interoperability===
Traditionally, file systems allowed any character in filenames as long as it was file-system safe.<ref name="solaris presentations IUC29-FileSystems" /> Although this permitted the use of any encoding, and thus allowed the representation of any local text on any local system, it caused many interoperability issues. The same filename could be stored as different byte strings on distinct systems within a single country, for example if one system used the Japanese [[Shift JIS]] encoding and another the Japanese [[EUC-JP|EUC]] encoding. Conversion was not possible, as most systems did not record the encoding used for a filename as part of the extended file information. This forced costly guessing of the filename encoding on each file access.<ref name="solaris presentations IUC29-FileSystems">{{cite web|author=David Robinson|author2=Ienup Sung|author3=Nicolas Williams |date=March 2006 |url=http://developers.sun.com/global/products_platforms/solaris/reference/presentations/IUC29-FileSystems.pdf |title=Solaris presentations: File Systems, Unicode, and Normalization |location=San Francisco |publisher=Sun.com |url-status=dead |archive-url=https://web.archive.org/web/20120704003732/http://developers.sun.com/global/products_platforms/solaris/reference/presentations/IUC29-FileSystems.pdf |archive-date=July 4, 2012 }}</ref> A solution was to adopt Unicode as the encoding for filenames.
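The Python sketch below illustrates the Shift JIS/EUC-JP problem. It is an illustrative example, not drawn from the cited sources: the sample filename and the helper `encode_variants` are hypothetical, and it relies only on Python's built-in `shift_jis` and `euc_jp` codecs. It shows that one visible name maps to two different byte strings, and that decoding with the wrong encoding does not recover the name.

```python
# Illustrative sketch: one filename, two legacy Japanese encodings.
# The filename and the helper name are hypothetical examples.


def encode_variants(name: str) -> dict[str, bytes]:
    """Return the byte string `name` would occupy on disk under each legacy encoding."""
    return {enc: name.encode(enc) for enc in ("shift_jis", "euc_jp")}


name = "日本語.txt"
variants = encode_variants(name)

for enc, raw in variants.items():
    print(enc, raw.hex(" "))

# The same visible name is stored as two different byte strings.
print(variants["shift_jis"] == variants["euc_jp"])  # False

# A system that guesses wrongly (EUC-JP bytes read as Shift JIS)
# does not recover the original name.
guessed = variants["euc_jp"].decode("shift_jis", errors="replace")
print(guessed == name)  # False
```

Without a recorded encoding, nothing in the bytes themselves tells a reader which of the two codecs to apply.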
In the classic Mac OS, however, the encoding of a filename was stored with the filename attributes.<ref name="solaris presentations IUC29-FileSystems" />

===Unicode interoperability===
The Unicode standard solves the encoding determination issue. Nonetheless, some limited interoperability issues remain, such as normalization (equivalence) or the Unicode version in use. For instance, UDF is limited to Unicode 2.0, and macOS's [[HFS+]] file system applies NFD Unicode normalization and is optionally case-sensitive (case-insensitive by default). The maximum filename length is not standardized and might depend on the code unit size. Although these issues can be serious, in most cases their impact is limited.<ref name="solaris presentations IUC29-FileSystems" />

On Linux, where filenames are stored as uninterpreted byte strings, this means that knowing the characters of a filename is not enough to open a file: the exact byte representation of the filename on the storage device is also needed. This can be worked around at the application level with careful normalization calls.<ref>{{cite web|url=http://nedbatchelder.com/blog/201106/filenames_with_accents.html |title=Filenames with accents |date=June 2011 |publisher=Ned Batchelder |access-date=September 17, 2013}}</ref>

The issue of Unicode equivalence is known as "normalized-name collision". A solution is the ''Non-normalizing Unicode Composition Awareness'' used in the Subversion and Apache technical communities.<ref>{{cite web|url=https://cwiki.apache.org/confluence/display/SVN/NonNormalizingUnicodeCompositionAwareness |title=NonNormalizingUnicodeCompositionAwareness - Subversion Wiki |publisher=Wiki.apache.org |date=January 21, 2013 |access-date=October 8, 2023}}</ref> This solution does not normalize paths in the repository; paths are only normalized for the purpose of comparisons.
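A minimal Python sketch of comparison-only normalization follows, assuming NFC as the comparison form (NFD would detect canonical equivalence equally well). The helper names `comparison_key`, `same_name` and `find_collisions` are hypothetical and are not the Subversion API. Names are kept exactly as given and are normalized only when compared, which also makes it possible to flag canonically equivalent names in one directory.

```python
import unicodedata


def comparison_key(name: str) -> str:
    """Key used only for comparing names; the stored name is never changed."""
    # Assumption: NFC is chosen as the comparison form.
    return unicodedata.normalize("NFC", name)


def same_name(a: str, b: str) -> bool:
    """True if the two names are canonically equivalent."""
    return comparison_key(a) == comparison_key(b)


def find_collisions(names: list[str]) -> list[list[str]]:
    """Group names that are distinct strings but canonically equivalent."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(comparison_key(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


composed = "caf\u00e9.txt"     # 'é' as one code point (NFC)
decomposed = "cafe\u0301.txt"  # 'e' + combining acute accent (NFD)

print(composed == decomposed)           # False: different code points
print(same_name(composed, decomposed))  # True: canonically equivalent
print(composed.encode("utf-8"))         # b'caf\xc3\xa9.txt'
print(decomposed.encode("utf-8"))       # b'cafe\xcc\x81.txt'
print(find_collisions([composed, decomposed, "other.txt"]))
```

On a file system that stores names as raw bytes, the two example names are different files, because their UTF-8 encodings differ; only an application-level comparison like this one treats them as the same name.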
Nonetheless, some communities have patented this non-normalizing strategy, forbidding its use by other communities.{{clarify|A patent cannot be held by multiple communities (patent-holders)|date=February 2013}}

===Perspectives===
To limit interoperability issues, some ideas described by Sun are to:
* use one Unicode encoding (such as UTF-8)
* do transparent code conversions on filenames
* store filenames without normalizing them
* check for canonical equivalence among filenames, to avoid two canonically equivalent filenames in the same directory.<ref name="solaris presentations IUC29-FileSystems" />

These choices would prevent a future switch to an encoding other than UTF-8.

=== Unicode migration ===
One issue was migration to Unicode. For this purpose, several software companies provided software for migrating filenames to the new Unicode encoding:
* Microsoft provided migration that was transparent to the user through the VFAT technology.
* Apple provided "File Name Encoding Repair Utility v1.0".<ref>{{cite web|date=June 1, 2006|title=File Name Encoding Repair Utility v1.0|url=https://support.apple.com/kb/DL355|access-date=October 2, 2018|publisher=Support.apple.com}}</ref>
* The Linux community provided "[[convmv]]".<ref>{{cite web|title=convmv - converts filenames from one encoding to another|url=http://www.j3e.de/linux/convmv/man/|access-date=September 17, 2013|publisher=J3e.de}}</ref>

[[Mac OS X 10.3]] marked Apple's adoption of Unicode 3.2 character decomposition, superseding the Unicode 2.1 decomposition used previously.
This change caused problems for developers writing software for Mac OS X.<ref>{{cite web|date=May 7, 2010|title=Re: git on MacOSX and files with decomposed utf-8 file names|url=http://kerneltrap.org/mailarchive/git/2008/1/23/593749/thread|url-status=dead|archive-url=https://web.archive.org/web/20110315014244/http://kerneltrap.org/mailarchive/git/2008/1/23/593749/thread|archive-date=March 15, 2011|access-date=July 5, 2010|publisher=KernelTrap}}</ref>
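Migration tools of this kind essentially decode each legacy filename with its original encoding and re-encode it as UTF-8, optionally in a chosen normalization form. The Python sketch below shows that core step under stated assumptions: the legacy encoding is known in advance, names that already decode as UTF-8 are left untouched (a heuristic, since some legacy byte strings are also valid UTF-8), and the helpers `migrate_name` and `plan_renames` are hypothetical. It is not how convmv or Apple's utility is actually implemented, and it only plans renames without touching the disk.

```python
import unicodedata


def migrate_name(raw: bytes, legacy_encoding: str, form: str = "NFC") -> bytes:
    """Decode a legacy-encoded filename and re-encode it as normalized UTF-8."""
    # Raises UnicodeDecodeError if `legacy_encoding` is the wrong guess.
    text = raw.decode(legacy_encoding)
    return unicodedata.normalize(form, text).encode("utf-8")


def plan_renames(raw_names: list[bytes], legacy_encoding: str,
                 form: str = "NFC") -> list[tuple[bytes, bytes]]:
    """Return (old, new) byte-name pairs; no files are renamed."""
    plan = []
    for raw in raw_names:
        try:
            raw.decode("utf-8")
            continue  # Heuristic: already valid UTF-8, assume it needs no migration.
        except UnicodeDecodeError:
            pass
        new = migrate_name(raw, legacy_encoding, form)
        if new != raw:
            plan.append((raw, new))
    return plan


legacy = ["readme.txt".encode("shift_jis"), "日本語.txt".encode("shift_jis")]
for old, new in plan_renames(legacy, "shift_jis"):
    print(old, "->", new.decode("utf-8"))

# Choosing NFD yields decomposed names, resembling those stored by HFS+.
print(migrate_name("caf\u00e9.txt".encode("latin-1"), "latin-1", "NFD"))  # b'cafe\xcc\x81.txt'
```

The choice of normalization form matters: a tool producing NFC names and a file system storing NFD names will disagree byte for byte, which is the kind of mismatch that followed Apple's change of decomposition rules.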