ZIP (file format)
=== Internationalization issues ===
Versions of the format prior to 6.3.0 did not support storing file names in [[Unicode]].<ref name=":0">{{Cite web|url=https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.9.TXT|title=APPNOTE.TXT - .ZIP File Format Specification|language=en|author=PKWARE|website=PKWARE|date=July 15, 2020}}</ref> According to the standard,<ref name=":0" /> file names should be stored in the [[CP437]] encoding, which is standard for the [[IBM PC]],<ref name=":0" /> but in practice, [[DOS]] archivers used the system's installed [[character encoding]]. The built-in archiver of Windows up to 11 also used the DOS encoding corresponding to the selected system language for backward compatibility when creating archives. Subsequently, the standard was updated to include two options for storing file names in Unicode: 1) when bit 11 of the General purpose bit flag field is set, the file name in the "File name" field of the header is to be interpreted as [[UTF-8]] rather than a single-byte encoding, and 2) the Unicode Path Extra Field was added to store the file name in UTF-8 encoding.<ref name=":0" /> Some versions of archivers on the Windows platform have also used ANSI encoding in the past. Thus, to correctly extract files with names containing non-English characters, an unpacker needs to apply the following checks in order:<ref name="auto">{{Cite web|url=https://git.launchpad.net/ubuntu/+source/unzip/commit/?id=8d0362fcc3761dc75fe42de312eb5a067533f68d|title=ubuntu/+source/unzip - [no description]|website=git.launchpad.net}}</ref>
# Check for the presence of the Unicode Path Extra Field, and if it exists, use the filename from it, encoded in UTF-8.
# Check whether bit 11 of the General purpose bit flag field is set, and if it is, consider the filename encoding in the "File name" field to be UTF-8.
# If the "packing OS" field contains the value 11 (NTFS, Windows), and the "version of the packer" field value is greater than or equal to 20, consider the filename encoding in the "File name" field to be the ANSI (Windows) encoding corresponding to the system locale if one can be determined; otherwise, use CP437.
# If the "packing OS" field contains the value 0 (FAT, DOS), and the "version of the packer" field value is between 25 and 40 inclusive, consider the filename encoding in the local header's "File name" field to be ANSI (Windows) encoding, and in the central header's "File name" field to be OEM (DOS) encoding, corresponding to the system locale if one can be determined; otherwise, use CP437.
# In other cases, if the "packing OS" field contains the value 0 (FAT, DOS), 6 (HPFS, OS/2), or 11 (NTFS, Windows), consider the filename encoding in the "File name" field to be OEM (DOS) encoding, corresponding to the system locale if one can be determined; otherwise, use CP437.
# In all other cases, consider the filename encoding in the "File name" field to be the system encoding of the operating system the unpacker is running on.
Some implementations of zip unpackers did not implement this algorithm or implemented it only partially. As a result, when viewing the contents of an archive or extracting it, users saw a chaotic set of characters, known as "mojibake", instead of letters of the national alphabet.
In 2016, this problem was solved in the [[FAR Manager#Linux, MacOS and BSD version|far2l]] file and archive manager for Linux, BSD and Mac.<ref>{{Cite web|url=https://github.com/elfmz/far2l/issues/114|title=error processing archives with non-english characters in the names of archived files/folders · Issue #114 · elfmz/far2l|language=en|website=GitHub|access-date=2024-05-23}}</ref> In 2024, a similar solution was added<ref name=":1">{{Cite web|url=https://salsa.debian.org/debian/7zip/-/merge_requests/8/diffs|title=Use system locale to select codepage for legacy zip archives (!8) · Merge requests · Debian / 7zip · GitLab|language=en|website=GitLab|date=2024-05-22|access-date=2024-05-23}}</ref> to the version of 7zip used in the [[Debian]] distribution and its derivatives, and to the version of unzip used in the [[Ubuntu]] distribution and its derivatives.<ref name="auto"/>
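
The order of checks in the numbered list can be illustrated with a minimal Python sketch. It is not the code used by unzip, 7zip or far2l. The function names (<code>unicode_path</code>, <code>pick_encoding</code>, <code>decode_name</code>) and their parameters are hypothetical, and the sketch assumes that the caller has already read the header fields and supplies the locale's ANSI and OEM code pages, since detecting them is platform specific. The packing OS values are taken from the list as written.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # header ID of the Unicode Path Extra Field
UTF8_FLAG = 0x0800        # bit 11 of the general purpose bit flag

# "Packing OS" values exactly as given in the numbered list.
HOST_FAT, HOST_HPFS, HOST_NTFS = 0, 6, 11


def unicode_path(extra, raw_name):
    """Step 1: return the name stored in the Unicode Path Extra Field, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The field carries a CRC-32 of the header's file name; if it does
        # not match, the field is stale and is skipped.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            try:
                return data[5:].decode("utf-8")
            except UnicodeDecodeError:
                return None
    return None


def pick_encoding(flags, host, version, central=False,
                  ansi=None, oem=None, system="utf-8"):
    """Steps 2 to 6: choose a codec for the "File name" field.

    ansi and oem are the caller's guesses for the locale code pages;
    None means "could not be determined", which falls back to CP437.
    """
    if flags & UTF8_FLAG:                            # step 2
        return "utf-8"
    if host == HOST_NTFS and version >= 20:          # step 3
        return ansi or "cp437"
    if host == HOST_FAT and 25 <= version <= 40:     # step 4
        return (oem if central else ansi) or "cp437"
    if host in (HOST_FAT, HOST_HPFS, HOST_NTFS):     # step 5
        return oem or "cp437"
    return system                                    # step 6


def decode_name(raw_name, flags, host, version, extra=b"", **kwargs):
    """Apply all six steps and return the file name as text."""
    name = unicode_path(extra, raw_name)
    if name is not None:
        return name
    codec = pick_encoding(flags, host, version, **kwargs)
    return raw_name.decode(codec, errors="replace")
```

For example, with a Russian locale the caller might pass <code>ansi="cp1251"</code> and <code>oem="cp866"</code>, so that <code>decode_name("тест.txt".encode("cp866"), 0, HOST_FAT, 20, oem="cp866")</code> recovers the original name through step 5.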