
== Imaging model ==
The basic design of how graphics are represented in PDF is very similar to that of PostScript, except for the use of transparency, which was added in PDF 1.4.

PDF graphics use a [[device independence|device-independent]] [[Cartesian coordinate system]] to describe the surface of a page. A PDF page description can use a [[matrix (mathematics)|matrix]] to [[scale (ratio)|scale]], [[rotate]], or [[Shear mapping|skew]] graphical elements. A key concept in PDF is that of the ''graphics state'', which is a collection of graphical parameters that may be changed, saved, and restored by a ''page description''. PDF has (as of version 2.0) 25 graphics state properties, of which some of the most important are:
* The ''current transformation matrix'' (CTM), which determines the coordinate system
* The ''[[clipping path]]''
* The ''[[color space]]''
* The ''[[alpha compositing|alpha constant]]'', which is a key component of transparency
* ''[[Black point compensation]]'' control (introduced in PDF 2.0)
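
The save/restore behavior of the graphics state and the role of the CTM can be illustrated with a minimal Python sketch. The class and function names (<code>GraphicsState</code>, <code>Interpreter</code>, <code>concat</code>) are hypothetical and chosen only for illustration. The sketch assumes the PDF convention that a matrix is written as six numbers <code>a b c d e f</code> and that a newly specified matrix is multiplied in front of the existing CTM; only two of the graphics state properties are modeled.

```python
from dataclasses import dataclass, replace

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m, ctm):
    """Return m x ctm for matrices given as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


@dataclass(frozen=True)
class GraphicsState:
    # Only two of the graphics state properties are modeled here.
    ctm: tuple = IDENTITY
    fill_alpha: float = 1.0


class Interpreter:
    """Hypothetical toy interpreter that tracks the graphics state."""

    def __init__(self):
        self.state = GraphicsState()
        self._stack = []

    def save(self):
        # Corresponds to the "q" operator: push a copy of the state.
        self._stack.append(self.state)

    def restore(self):
        # Corresponds to the "Q" operator: pop the most recently saved state.
        self.state = self._stack.pop()

    def cm(self, a, b, c, d, e, f):
        # Corresponds to the "cm" operator: concatenate a matrix onto the CTM.
        new_ctm = concat((a, b, c, d, e, f), self.state.ctm)
        self.state = replace(self.state, ctm=new_ctm)

    def to_device(self, x, y):
        # Map a point from user space through the current CTM.
        a, b, c, d, e, f = self.state.ctm
        return (a * x + c * y + e, b * x + d * y + f)


interp = Interpreter()
interp.save()
interp.cm(2, 0, 0, 2, 0, 0)    # scale by 2
interp.cm(1, 0, 0, 1, 10, 5)   # then translate within the scaled space
scaled_point = interp.to_device(1, 1)
interp.restore()
restored_point = interp.to_device(1, 1)
```

Because the state is saved before the two transformations and restored afterwards, the second call to <code>to_device</code> sees the identity matrix again.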

=== Vector graphics ===

As in PostScript, vector graphics in PDF are constructed with ''paths''. Paths are usually composed of lines and cubic [[Bézier curve]]s, but can also be constructed from the outlines of text. Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked, filled, filled and then stroked, or used for [[clipping path|clipping]]. Strokes and fills can use any color set in the graphics state, including ''patterns''. PDF supports several types of patterns. The simplest is the ''tiling pattern'', in which a piece of artwork is specified to be drawn repeatedly. This may be a ''colored tiling pattern'', with the colors specified in the pattern object, or an ''uncolored tiling pattern'', which defers color specification to the time the pattern is drawn. Beginning with PDF 1.3 there is also a ''shading pattern'', which draws continuously varying colors. There are seven types of shading patterns, of which the simplest are the ''axial shading'' (Type 2) and ''radial shading'' (Type 3). <!-- Pictures desperately needed here! -->
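
As an illustration of how a path is built and then painted, the following Python sketch emits a content stream fragment for a rectangle. The function name <code>rect_path</code> and the <code>PAINT_OPERATORS</code> table are hypothetical. The operators themselves (<code>m</code>, <code>l</code>, <code>h</code>, <code>S</code>, <code>f</code>, <code>B</code>, <code>W n</code>) are the standard PDF path construction and painting operators, but the sketch covers only straight line segments, not Bézier curves.

```python
# Painting choices mapped to PDF operators.
PAINT_OPERATORS = {
    "stroke": "S",
    "fill": "f",
    "fill_stroke": "B",
    "clip": "W n",  # intersect the clipping path, then end the path unpainted
}


def rect_path(x, y, width, height, paint="stroke"):
    """Return content stream text that draws and paints a rectangle."""
    lines = [
        f"{x} {y} m",                    # move to the first corner
        f"{x + width} {y} l",            # line segments along the edges
        f"{x + width} {y + height} l",
        f"{x} {y + height} l",
        "h",                             # close the subpath
        PAINT_OPERATORS[paint],
    ]
    return "\n".join(lines)


fragment = rect_path(0, 0, 100, 50, paint="fill_stroke")
```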

=== Raster images ===

Raster images in PDF (called ''Image XObjects'') are represented by dictionaries with an associated stream. The dictionary describes the properties of the image, and the stream contains the image data. (Less commonly, small raster images may be embedded directly in a page description as an ''inline image''.) Images are typically ''filtered'' for compression purposes. Image filters supported in PDF include the following general-purpose filters:

* ''ASCII85Decode'', a filter used to put the stream into 7-bit ASCII,
* ''ASCIIHexDecode'', similar to ASCII85Decode but less compact,
* ''FlateDecode'', a commonly used filter based on the [[deflate]] algorithm defined in {{IETF RFC|1951}} (deflate is also used in the [[gzip]], [[Portable Network Graphics|PNG]], and [[ZIP (file format)|zip]] file formats among others); introduced in PDF 1.2; it can use one of two groups of predictor functions for more compact zlib/deflate compression: ''Predictor 2'' from the [[TIFF]] 6.0 specification and predictors (filters) from the [[Portable Network Graphics|PNG]] specification ({{IETF RFC|2083}}),
* ''LZWDecode'', a filter based on [[LZW]] compression; it can use one of two groups of predictor functions for more compact LZW compression: ''Predictor 2'' from the TIFF 6.0 specification and predictors (filters) from the PNG specification,
* ''RunLengthDecode'', a simple compression method for streams with repetitive data using the [[run-length encoding]] algorithm,

and the following image-specific filters:

* ''DCTDecode'', a [[lossy]] filter based on the [[JPEG]] standard,
* ''CCITTFaxDecode'', a lossless [[bi-level image|bi-level]] (black/white) filter based on the Group 3 or [[Group 4 compression|Group 4]] [[CCITT]] (ITU-T) [[fax]] compression standard defined in ITU-T [[T.4]] and T.6,
* ''JBIG2Decode'', a lossy or [[lossless]] bi-level (black/white) filter based on the [[JBIG2]] standard, introduced in PDF 1.4, and
* ''JPXDecode'', a lossy or lossless filter based on the [[JPEG 2000]] standard, introduced in PDF 1.5.
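
Filters can be chained, with the decoder applying them in the order listed. The following Python sketch decodes three of the general-purpose filters above. The function names and the <code>DECODERS</code> table are hypothetical. The sketch assumes that FlateDecode data uses the zlib wrapper handled by Python's <code>zlib.decompress</code>, ignores predictors, and does not validate malformed input.

```python
import binascii
import zlib


def ascii_hex_decode(data):
    """Decode hex digits; whitespace is ignored and '>' marks end of data."""
    hex_part = data.split(b">", 1)[0]
    digits = b"".join(hex_part.split())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def run_length_decode(data):
    """Decode run-length data: a length byte followed by a literal run or one repeated byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:        # end-of-data marker
            break
        if length < 128:         # copy the next length + 1 bytes literally
            out += data[i:i + length + 1]
            i += length + 1
        else:                    # repeat the next byte 257 - length times
            out += data[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


def flate_decode(data):
    """Decode deflate data (assumes the zlib wrapper, no predictor)."""
    return zlib.decompress(data)


DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "RunLengthDecode": run_length_decode,
    "FlateDecode": flate_decode,
}


def decode_stream(data, filters):
    """Apply each named filter in order, as a reader would when decoding."""
    for name in filters:
        data = DECODERS[name](data)
    return data


raw = b"AAAAAAAABBBB" * 10
encoded = binascii.hexlify(zlib.compress(raw)) + b">"
decoded = decode_stream(encoded, ["ASCIIHexDecode", "FlateDecode"])
```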

Normally, all image content in a PDF is embedded in the file. However, PDF allows image data to be stored in external files through the use of ''external streams'' or ''Alternate Images''. Standardized subsets of PDF, including [[PDF/A]] and [[PDF/X]], prohibit these features.

=== Text ===

Text in PDF is represented by ''text elements'' in page content streams. A text element specifies that ''characters'' should be drawn at certain positions. The characters are specified using the ''encoding'' of a selected ''font resource''.
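
A text element can be illustrated by generating one in Python. The helper names <code>escape_pdf_string</code> and <code>text_element</code> and the font resource name <code>F1</code> are hypothetical. The operators <code>BT</code>, <code>Tf</code>, <code>Td</code>, <code>Tj</code>, and <code>ET</code> are the standard PDF text operators. The sketch assumes the text consists of single-byte character codes that are valid in the selected font's encoding.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def text_element(font_name, size, x, y, text):
    """Return content stream text that shows a string at position (x, y)."""
    return "\n".join([
        "BT",                                 # begin text object
        f"/{font_name} {size} Tf",            # select font resource and size
        f"{x} {y} Td",                        # move to the text position
        f"({escape_pdf_string(text)}) Tj",    # show the string
        "ET",                                 # end text object
    ])


element = text_element("F1", 12, 72, 720, "Hello (PDF)")
```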

A font object in PDF is a description of a digital [[typeface]]. It may either describe the characteristics of a typeface, or it may include an embedded ''font file''. The latter case is called an ''embedded font'' while the former is called an ''unembedded font''. The font files that may be embedded are based on widely used standard digital font formats: [[PostScript fonts|Type 1]] (and its compressed variant CFF), [[TrueType]], and (beginning with PDF 1.6) [[OpenType]]. PDF also supports ''Type 3'' fonts, in which the glyphs are described by PDF graphics operators rather than by an embedded font file.

Fourteen typefaces, known as the ''standard 14 fonts'', have a special significance in PDF documents:

* [[Times Roman|Times]] (v3) (in regular, italic, bold, and bold italic)
* [[Courier (typeface)|Courier]] (in regular, oblique, bold and bold oblique)
* [[Helvetica]] (v3) (in regular, oblique, bold and bold oblique)
* [[Symbol (typeface)|Symbol]]
* [[Zapf Dingbats]]

These fonts are sometimes called the ''base fourteen fonts''.<ref>{{cite web|url=http://desktoppub.about.com/od/glossary/g/base14fonts.htm|title=Desktop Publishing: Base 14 Fonts – Definition|last=Howard|first=Jacci|work=About.com Tech|archive-url=https://web.archive.org/web/20160614134144/http://desktoppub.about.com/od/glossary/g/base14fonts.htm|archive-date=June 14, 2016|url-status=dead}}</ref> These fonts, or suitable substitute fonts with the same metrics, should be available in most PDF readers, but they are not ''guaranteed'' to be available in the reader, and may only display correctly if the system has them installed.<ref name="aquarium">{{Cite web|url=http://www.planetpdf.com/planetpdf/pdfs/pdf2k/03e/merz_fontaquarium.pdf|title=The PDF Font Aquarium|last=Merz|first=Thomas|date=June 2003|url-status=usurped|archive-url=https://web.archive.org/web/20110718231502/http://www.planetpdf.com/planetpdf/pdfs/pdf2k/03e/merz_fontaquarium.pdf|archive-date=July 18, 2011}}</ref> Fonts may be substituted if they are not embedded in a PDF.

Within text strings, characters are shown using ''character codes'' (integers) that map to glyphs in the current font using an ''encoding''. There are several predefined encodings, including ''WinAnsi'', ''MacRoman'', and many encodings for East Asian languages; a font can also have its own built-in encoding. (Although the WinAnsi and MacRoman encodings are derived from the historical properties of the [[Microsoft Windows|Windows]] and [[classic Mac OS|Macintosh]] operating systems, fonts using these encodings work equally well on any platform.) A PDF can specify a predefined encoding, use the font's built-in encoding, or provide a lookup table of differences from a predefined or built-in encoding (not recommended with TrueType fonts).<ref name="pdf-ref-1.7" /> The encoding mechanisms in PDF were designed for Type 1 fonts, and the rules for applying them to TrueType fonts are complex.
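
The lookup table of differences can be sketched in Python. The function name <code>apply_differences</code> and the tiny <code>BASE_SUBSET</code> table are hypothetical; a real base encoding has up to 256 entries. The sketch assumes the table is a flat list in which each integer sets the next character code and each glyph name that follows is assigned to consecutive codes.

```python
def apply_differences(base, differences):
    """Return a code-to-glyph-name mapping with the differences applied."""
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item              # an integer starts a new run of codes
        else:
            if code is None:
                raise ValueError("glyph name before any character code")
            encoding[code] = item    # a name is assigned to the current code
            code += 1                # following names use consecutive codes
    return encoding


# Illustrative three-entry subset of a base encoding (assumption).
BASE_SUBSET = {65: "A", 66: "B", 128: "Euro"}

custom = apply_differences(BASE_SUBSET, [128, "bullet", "dagger", 65, "Alpha"])
```

The base mapping is copied rather than modified, so the same predefined encoding can be shared by several fonts that each apply their own differences.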

For large fonts or fonts with non-standard glyphs, the special encodings ''Identity-H'' (for horizontal writing) and ''Identity-V'' (for vertical) are used. With such fonts, it is necessary to provide a ''ToUnicode'' table if semantic information about the characters is to be preserved.
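
The role of the ''ToUnicode'' table in text extraction can be sketched as follows. The function name <code>extract_text</code> and the sample mapping are hypothetical. The sketch assumes that each character code in an Identity-H string occupies two bytes in big-endian order and that the ToUnicode table has already been parsed into a Python dictionary; parsing the table itself is not shown.

```python
def extract_text(data, to_unicode):
    """Map two-byte character codes to Unicode text via a lookup table."""
    chars = []
    for i in range(0, len(data) - 1, 2):
        code = int.from_bytes(data[i:i + 2], "big")
        # Without an entry, the meaning of the code cannot be recovered.
        chars.append(to_unicode.get(code, "\ufffd"))
    return "".join(chars)


# Hypothetical table: the codes are arbitrary glyph identifiers.
sample_map = {0x0024: "A", 0x0045: "b"}

text = extract_text(b"\x00\x24\x00\x45\x01\x00", sample_map)
```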

A text document which is [[Image scanner|scanned]] to PDF without the text being recognised by [[optical character recognition]] (OCR) is an image, with no fonts or text properties.

=== Transparency ===

The original imaging model of PDF was ''opaque,'' similar to PostScript, where each object drawn on the page completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model was extended to allow transparency. When transparency is used, new objects interact with previously marked objects to produce blending effects. The addition of transparency to PDF was done by means of new extensions that were designed to be ignored in products written to PDF 1.3 and earlier specifications. As a result, files that use a small amount of transparency might be viewed acceptably by older viewers, but files making extensive use of transparency could be viewed incorrectly by an older viewer.

The transparency extensions are based on the key concepts of ''transparency groups'', ''blending modes'', ''shape'', and ''alpha''. The model is closely aligned with the features of [[Adobe Illustrator]] version 9. The [[blend modes]] were based on those used by [[Adobe Photoshop]] at the time. When the PDF 1.4 specification was published, the formulas for calculating blend modes were kept secret by Adobe. They have since been published.<ref>{{Cite web|url=https://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/blend_modes.pdf|title=PDF Blend Modes Addendum|url-status=dead|archive-url=https://web.archive.org/web/20111014100004/https://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/blend_modes.pdf|archive-date=October 14, 2011|access-date=January 12, 2023}}</ref>
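
How alpha and a blend mode combine can be sketched for a single color component. The function names are hypothetical. The sketch assumes the basic compositing formula from the PDF specification, in which the result color is a weighted average of the backdrop color and a blended source color, and it ignores shape, transparency groups, and soft masks. All values are in the range 0 to 1.

```python
def blend_normal(cb, cs):
    """Normal mode: the source color replaces the backdrop color."""
    return cs


def blend_multiply(cb, cs):
    """Multiply mode: the two colors are multiplied."""
    return cb * cs


def blend_screen(cb, cs):
    """Screen mode: the complements of the two colors are multiplied."""
    return cb + cs - cb * cs


def composite(cb, alpha_b, cs, alpha_s, blend=blend_normal):
    """Composite a source over a backdrop; return (color, alpha)."""
    alpha_r = alpha_b + alpha_s - alpha_b * alpha_s   # union of the two alphas
    if alpha_r == 0:
        return 0.0, 0.0                               # nothing is painted
    ratio = alpha_s / alpha_r
    blended = (1 - alpha_b) * cs + alpha_b * blend(cb, cs)
    color_r = (1 - ratio) * cb + ratio * blended
    return color_r, alpha_r


# Half-transparent black over opaque white gives mid-gray.
gray, gray_alpha = composite(1.0, 1.0, 0.0, 0.5)

# Opaque mid-gray multiplied onto opaque mid-gray gives a darker gray.
dark, dark_alpha = composite(0.5, 1.0, 0.5, 1.0, blend=blend_multiply)
```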

The concept of a transparency group in the PDF specification is independent of existing notions of "group" or "layer" in applications such as Adobe Illustrator. Those groupings reflect logical relationships among objects that are meaningful when editing those objects, but they are not part of the imaging model.