Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
ZIP (file format)
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Structure === [[File:ZIP-64 Internal Layout.svg|thumb|400px|ZIP-64 Internal Layout]] A ZIP file is correctly identified by the presence of an ''end of central directory record'' which is located at the end of the archive structure in order to allow the easy appending of new files. If the end of central directory record indicates a non-empty archive, the name of each file or directory within the archive should be specified in a ''central directory'' entry, along with other metadata about the entry, and an offset into the ZIP file, pointing to the actual entry data. This allows a file listing of the archive to be performed relatively quickly, as the entire archive does not have to be read to see the list of files. The entries within the ZIP file also include this information, for redundancy, in a ''local file header''. Because ZIP files may be appended to, only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives), as the central directory may declare that some files have been deleted and other files have been updated. For example, we may start with a ZIP file that contains files A, B and C. File B is then deleted and C updated. This may be achieved by just appending a new file C to the end of the original ZIP file and adding a new central directory that only lists file A and the new file C. When ZIP was first designed, transferring files by floppy disk was common, yet writing to disks was very time-consuming. If you had a large zip file, possibly spanning multiple disks, and only needed to update a few files, rather than reading and re-writing all the files, it would be substantially faster to just read the old central directory, append the new files then append an updated central directory. The order of the file entries in the central directory need not coincide with the order of file entries in the archive. Each entry stored in a ZIP archive is introduced by a ''local file header'' with information about the file such as the comment, file size and file name, followed by optional "extra" data fields, and then the possibly compressed, possibly encrypted file data. The "Extra" data fields are the key to the extensibility of the ZIP format. "Extra" fields are exploited to support the ZIP64 format, WinZip-compatible AES encryption, file attributes, and higher-resolution NTFS or Unix file timestamps. Other extensions are possible via the "Extra" field. ZIP tools are required by the specification to ignore Extra fields they do not recognize. The ZIP format uses specific 4-byte "signatures" to denote the various structures in the file. Each file entry is marked by a specific signature. The end of central directory record is indicated with its specific signature, and each entry in the central directory starts with the 4-byte ''central file header signature''. There is no BOF or EOF marker in the ZIP specification. Conventionally the first thing in a ZIP file is a ZIP entry, which can be identified easily by its ''local file header signature''. However, this is not necessarily the case, as this is not required by the ZIP specification - most notably, a self-extracting archive will begin with an executable file header. Tools that correctly read ZIP archives must scan for the end of central directory record signature, and then, as appropriate, the other, indicated, central directory records. They must not scan for entries from the top of the ZIP file, because (as previously mentioned in this section) only the central directory specifies where a file chunk starts and that it has not been deleted. Scanning could lead to false positives, as the format does not forbid other data to be between chunks, nor file data streams from containing such signatures. However, tools that attempt to recover data from damaged ZIP archives will most likely scan the archive for local file header signatures; this is made more difficult by the fact that the compressed size of a file chunk may be stored after the file chunk, making sequential processing difficult. Most of the signatures end with the short integer 0x4b50, which is stored in [[little-endian]] ordering. Viewed as an [[ASCII]] string this reads "PK", the initials of the inventor Phil Katz. Thus, when a ZIP file is viewed in a text editor the first two bytes of the file are usually "PK". (DOS, OS/2 and Windows self-extracting ZIPs have an [[EXE]] before the ZIP so start with "MZ"; self-extracting ZIPs for other operating systems may similarly be preceded by executable code for extracting the archive's content on that platform.) The {{mono|.ZIP}} specification also supports spreading archives across multiple file-system files. Originally intended for storage of large ZIP files across multiple [[floppy disk]]s, this feature is now used for sending ZIP archives in parts over email, or over other transports or removable media. The [[File Allocation Table|FAT filesystem]] of DOS has a timestamp resolution of only two seconds; ZIP file records mimic this. As a result, the built-in timestamp resolution of files in a ZIP archive is only two seconds, though extra fields can be used to store more precise timestamps. The ZIP format has no notion of [[time zone]], so timestamps are only meaningful if it is known what time zone they were created in. In September 2006, PKWARE released a revision of the ZIP specification providing for the storage of file names using [[UTF-8]], finally adding Unicode compatibility to ZIP.<ref name="zip630"/>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)