== File format ==
A PDF file is organized using [[ASCII]] characters, except for certain elements that may have binary content. The file starts with a header containing a [[File format#Magic number|magic number]] (as a readable string) and the version of the format, for example <code>%PDF-1.7</code>. The format is a subset of a COS ("Carousel" Object Structure) format.<ref>{{cite web |url = http://jimpravetz.com/blog/2012/12/in-defense-of-cos/ |archive-url=https://web.archive.org/web/20140502013429/http://jimpravetz.com/blog/2012/12/in-defense-of-cos/ |archive-date=May 2, 2014 |url-status=usurped |title = In Defense of COS, or Why I Love JSON and Hate XML |first = Jim |last = Pravetz |website = jimpravetz.com }}</ref> A COS tree file consists primarily of ''objects'', of which there are nine types:<ref name=":0" />
* [[Boolean data type|Boolean]] values, representing ''true'' or ''false''
* [[Real number]]s
* [[Integer]]s
* [[String (computer science)|Strings]], enclosed within parentheses (<code>(...)</code>) or represented as hexadecimal within single angle brackets (<code>&lt;...&gt;</code>). Strings may contain 8-bit characters.
* Names, starting with a forward slash (<code>/</code>)
* [[Array data type|Arrays]], ordered collections of objects enclosed within square brackets (<code>[...]</code>)
* [[Dictionary (data structure)|Dictionaries]], collections of objects indexed by names enclosed within double angle brackets (<code>&lt;&lt;...&gt;&gt;</code>)
* [[Stream (computing)|Streams]], usually containing large amounts of optionally compressed binary data, preceded by a dictionary and enclosed between the <code>stream</code> and <code>endstream</code> keywords.
* The [[Pointer (computer programming)|null]] object

Comments using 8-bit characters prefixed with the percent sign (<code>%</code>) may be inserted.

Objects may be either ''direct'' (embedded in another object) or ''indirect''.
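
The object syntax listed above can be illustrated with a rough Python sketch that guesses an object's type from its leading delimiter. The function name <code>classify_pdf_object</code> is hypothetical and not part of any PDF library, and the sketch assumes it is handed one complete, well-formed serialized object; it is not a full COS parser and ignores comments, escapes and indirect references.

```python
import re


def classify_pdf_object(token: bytes) -> str:
    """Guess the COS object type of one serialized object (illustrative only)."""
    t = token.strip()
    if t in (b"true", b"false"):
        return "boolean"
    if t == b"null":
        return "null"
    if re.fullmatch(rb"[+-]?\d+", t):
        return "integer"
    if re.fullmatch(rb"[+-]?(\d+\.\d*|\.\d+)", t):
        return "real"
    if t.startswith(b"<<"):
        # A stream is a dictionary followed by the stream ... endstream keywords.
        if re.search(rb">>\s*stream\b", t):
            return "stream"
        return "dictionary"
    if t.startswith(b"<") or t.startswith(b"("):
        # Hexadecimal strings use <...>, literal strings use (...).
        return "string"
    if t.startswith(b"/"):
        return "name"
    if t.startswith(b"["):
        return "array"
    raise ValueError(f"unrecognized object syntax: {t[:20]!r}")


STREAM_SAMPLE = b"<< /Length 5 >>\nstream\nhello\nendstream"

for sample in (b"true", b"42", b"-3.14", b"(Hello)", b"<48656C6C6F>",
               b"/Type", b"[1 2 3]", b"<< /Type /Catalog >>",
               STREAM_SAMPLE, b"null"):
    print(sample[:20], "->", classify_pdf_object(sample))
```

The double angle bracket test has to run before the single angle bracket test, because a dictionary and a hexadecimal string both begin with <code>&lt;</code>.
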
Indirect objects are numbered with an ''object number'' and a ''generation number'' and defined between the <code>obj</code> and <code>endobj</code> keywords if residing in the document root. Beginning with PDF version 1.5, indirect objects (except other streams) may also be located in special streams known as ''object streams'' (marked <code>/Type /ObjStm</code>). This technique enables non-stream objects to have standard stream filters applied to them, reduces the size of files that have large numbers of small indirect objects, and is especially useful for ''Tagged PDF''. Object streams do not support specifying an object's ''generation number'' (other than 0).

An index table, also called the cross-reference table, is located near the end of the file and gives the byte offset of each indirect object from the start of the file.<ref>Adobe Systems, PDF Reference, pp. 39–40.</ref> This design allows for efficient [[random access]] to the objects in the file, and also allows for small changes to be made without rewriting the entire file (''incremental update''). Before PDF version 1.5, the table would always be in a special ASCII format, be marked with the <code>xref</code> keyword, and follow the main body composed of indirect objects. Version 1.5 introduced optional ''cross-reference streams'', which have the form of a standard stream object, possibly with filters applied. Such a stream may be used instead of the ASCII cross-reference table and contains the offsets and other information in binary format. The format is flexible in that it allows for integer width specification (using the <code>/W</code> array), so that, for example, a document not exceeding 64 [[KiB]] in size may dedicate only 2 bytes for object offsets.
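
As a minimal sketch of the cross-reference mechanism described above, the following Python code reads a classic ASCII <code>xref</code> table into a mapping from object number to byte offset, and computes the smallest offset width that a <code>/W</code> array could use for a file of a given size. The function names and the sample table are made up for illustration; the sketch assumes a single well-formed table and does not handle cross-reference streams or incremental updates.

```python
def parse_xref_table(section: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (byte offset, generation) for in-use entries."""
    tokens = section.split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError("not a classic cross-reference table")
    entries: dict[int, tuple[int, int]] = {}
    i = 1
    while i < len(tokens) and tokens[i] != b"trailer":
        # Each subsection starts with "<first object number> <entry count>".
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for obj_num in range(first, first + count):
            offset, gen, flag = int(tokens[i]), int(tokens[i + 1]), tokens[i + 2]
            i += 3
            if flag == b"n":  # "n" marks an in-use object, "f" a free one
                entries[obj_num] = (offset, gen)
    return entries


def min_offset_width(file_size: int) -> int:
    """Smallest number of bytes that can hold any offset in a file of this size."""
    return max(1, ((file_size - 1).bit_length() + 7) // 8)


# Hypothetical table describing two in-use objects.
SAMPLE_XREF = (
    b"xref\n"
    b"0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
)

table = parse_xref_table(SAMPLE_XREF)
print(table[2])                      # offset and generation of object 2
print(min_offset_width(64 * 1024))   # width needed for a 64 KiB file
```

With such a table in memory, a reader can seek directly to an object's offset instead of scanning the file, which is the random access property the design aims for.
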

At the end of a PDF file is a footer containing:
* The <code>startxref</code> keyword followed by an offset to the start of the cross-reference table (starting with the <code>xref</code> keyword) or the cross-reference stream object, followed by
* The <code>%%EOF</code> [[end-of-file]] marker.

If a cross-reference stream is not being used, the footer is preceded by the <code>trailer</code> keyword followed by a dictionary containing information that would otherwise be contained in the cross-reference stream object's dictionary:
* A reference to the root object of the tree structure, also known as the ''catalog'' (<code>/Root</code>)
* The count of indirect objects in the cross-reference table (<code>/Size</code>)
* Other optional information

Within each page, there are one or more content streams that describe the text, vector graphics, and images drawn on the page. The content stream is [[Stack-oriented programming language|stack-based]], similar to PostScript.<ref>{{cite web|url = https://pikepdf.readthedocs.io/en/latest/topics/content_streams.html|title = Working with content streams|subject = PikePdf documentation|access-date = May 8, 2022|archive-date = July 5, 2022|archive-url = https://web.archive.org/web/20220705084446/https://pikepdf.readthedocs.io/en/latest/topics/content_streams.html|url-status = live}}</ref>

[[File:Seitengroesse PDF 7.png|thumb|The maximum size of an Acrobat PDF page, superimposed on a map of Europe.]]
PDF files come in two layouts: non-linearized (not "optimized") and linearized ("optimized"). Non-linearized PDF files can be smaller than their linearized counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the PDF file.
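
Returning to the footer layout described earlier in this section, the following Python sketch shows one way a reader might locate the cross-reference data by scanning the tail of the file for <code>startxref</code>. The helper name <code>find_startxref</code>, the 1024-byte tail window and the toy file are assumptions made for illustration; the toy file is deliberately incomplete (its catalog has no page tree) and only serves to exercise the footer.

```python
import re


def find_startxref(pdf: bytes, tail_size: int = 1024) -> int:
    """Return the offset that follows the last startxref keyword in the file tail."""
    tail = pdf[-tail_size:]
    matches = list(re.finditer(rb"startxref\s+(\d+)\s+%%EOF", tail))
    if not matches:
        raise ValueError("no startxref / %%EOF footer found")
    # An incrementally updated file may contain several footers; use the last one.
    return int(matches[-1].group(1))


# Build a toy file: header, one indirect object, xref table, trailer, footer.
body = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)  # the xref table starts right after the body
toy_pdf = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
)

offset = find_startxref(toy_pdf)
print(offset, toy_pdf[offset:offset + 4])
```

Reading from the end is what makes incremental update workable: a writer can append new objects, a new cross-reference section and a new footer without touching the earlier bytes.
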
Linearized PDF files (also called "optimized" or "web optimized" PDF files) are constructed in a manner that enables them to be read in a web browser plugin without waiting for the entire file to download, since all objects required for the first page to display are optimally organized at the start of the file.<ref name="pdf-ref">{{cite web|url=https://www.adobe.com/devnet/pdf/pdf_reference.html|title=Adobe Developer Connection: PDF Reference and Adobe Extensions to the PDF Specification|publisher=Adobe Systems Inc.|access-date=December 13, 2010|url-status=dead|archive-url=https://web.archive.org/web/20061115132507/https://www.adobe.com/devnet/pdf/pdf_reference.html|archive-date=November 15, 2006}}</ref> PDF files may be optimized using [[Adobe Acrobat]] software or [[QPDF]].

Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in<sup>2</sup> (145,161 km<sup>2</sup>).<ref name="pdf-ref-1.7" />{{rp|1129}}
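
The area figures quoted for the Acrobat limit can be checked with a short conversion. This sketch assumes only the exact definition 1 inch = 25.4 mm and uses exact fractions to avoid floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

KM_PER_INCH = Fraction(254, 10_000_000)  # 25.4 mm expressed in kilometres

side_in = 15_000_000                 # Acrobat's limit per side, in inches
side_km = side_in * KM_PER_INCH      # length of one side in kilometres

area_in2 = side_in ** 2              # square inches
area_km2 = side_km ** 2              # square kilometres

print(side_km, area_in2, area_km2)
```
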