
==Aspects==
The TeX software incorporates several aspects that were not available, or were of lower quality, in other typesetting programs at the time when TeX was released. Some of the innovations are based on interesting algorithms, and have led to several theses for Knuth's students. While some of these discoveries have now been incorporated into other typesetting programs, others, such as the rules for mathematical spacing, are still unique.

===Mathematical spacing===
[[File:AMS Euler sample math.svg|right|280px|thumb|Mathematical text typeset using TeX and the [[AMS Euler]] font]]

Since the primary goal of the TeX language is high-quality typesetting for publishers of books, Knuth paid close attention to the spacing rules for mathematical formulae.<ref>{{citation |title=Portraits in Silicon|first=Robert|last=Slater|publisher=MIT Press|date=1989|isbn=9780262691314 |page=349|url=https://books.google.com/books?id=aWTtMyYmKhUC&pg=PA349}}</ref><ref>{{citation|title=Digital Typography Using LaTeX|first1=Apostolos |last1=Syropoulos|first2=Antonis|last2=Tsolomitis|first3=Nick |last3=Sofroniou|publisher=Springer|date=2003|isbn=9780387952178 |url=https://books.google.com/books?id=LLYYisjrFdEC&pg=PA92|page=93}}</ref> He took three bodies of work that he considered to be standards of excellence for mathematical typography: the books typeset by the [[Addison-Wesley|Addison-Wesley Publishing]] house (the publisher of ''The Art of Computer Programming'') under the supervision of Hans Wolf; editions of the mathematical journal ''[[Acta Mathematica]]'' dating from around 1910; and a copy of ''[[Indagationes Mathematicae]]'', a [[Netherlands|Dutch]] mathematics journal. Knuth studied these printed works closely to work out a set of rules for spacing.<ref>{{Cite journal|last=Knuth |first=Donald E |title=Questions and Answers II |journal=TUGboat |volume=17 |date=1996 |pages=355–367}}</ref><ref name="DigitalTypography"/> While TeX provides some basic rules and the tools needed to specify proper spacing, the exact parameters depend on the font used to typeset the formula. For example, the spacing for Knuth's [[Computer Modern]] fonts has been precisely fine-tuned over the years and is now fixed; but when Knuth first used other fonts, such as [[AMS Euler]], new spacing parameters had to be defined.<ref>Knuth, Donald E. [http://www.tug.org/TUGboat/Articles/tb10-1/tb23knut.pdf ''Typesetting Concrete Mathematics''], TUGboat '''10''' (1989), pp. 31–36, 342. Reprinted as chapter 18 of ''Digital Typography'', pp. 367–378.</ref>

The typesetting of math in TeX is not without criticism, particularly with respect to technical details of the font metrics, which were designed in an era when significant attention was paid to storage requirements. This resulted in some "hacks" overloading some fields, which in turn required other "hacks". On an aesthetics level, the rendering of radicals has also been criticized.<ref>{{cite web |url=http://www.ntg.nl/maps/26/27.pdf |title=Math typesetting in TEX: The good, the bad, the ugly |last=Vieth |first=Ulrik |archive-url=https://web.archive.org/web/20220120120543/http://www.ntg.nl/maps/26/27.pdf |url-status=dead |archive-date=January 20, 2022}}</ref> The [[OpenType math]] font specification largely borrows from TeX, but has some new features/enhancements.<ref>{{cite web |url=http://blogs.msdn.com/b/murrays/archive/2006/09/13/752206.aspx|title=High-Quality Editing and Display of Mathematical Text in Office 2007}}</ref><ref>{{cite web |url=http://blogs.msdn.com/b/murrays/archive/2006/11/15/lineservices.aspx|title=LineServices}}</ref><ref>{{cite web|url=http://www.ntg.nl/maps/38/03.pdf |title=Map |website=ntg.nl}}</ref>

===Hyphenation and justification===
In comparison with manual typesetting, the problem of [[Justification (typesetting)|justification]] is easy to solve with a digital system such as TeX, which, provided that good points for line breaking have been defined, can automatically spread the spaces between words to fill in the line. The problem is thus to find the set of breakpoints that will give the most visually pleasing result. Many line-breaking algorithms use a [[greedy algorithm|''first-fit'' approach]], where the breakpoints for each line are determined one after the other, and no breakpoint is changed after it has been chosen.<ref>{{Citation | first = Michael P | last = Barnett | title = Computer Typesetting: Experiments and Prospects | place = [[Cambridge, Massachusetts|Cambridge]], [[Massachusetts|MA]] | publisher = [[MIT Press]] | date = 1965}}</ref> Such a system is not able to define a breakpoint depending on the effect that it will have on the following lines. In comparison, the ''total-fit'' line-breaking algorithm used by TeX and developed by Donald Knuth and Michael Plass<ref>http://svn.tug.org/interviews/plass.html</ref> considers ''all'' the possible breakpoints in a paragraph, and finds the combination of line breaks that will produce the most globally pleasing arrangement.
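
The first-fit approach can be illustrated with a minimal sketch. It assumes a monospaced model in which line width is counted in characters and words are separated by single spaces; the function name <code>first_fit</code> is hypothetical and not taken from any typesetting program.

```python
def first_fit(words, width):
    """Greedy line breaking: fill each line as far as it goes, never revisit a break."""
    lines, current = [], []
    length = 0  # characters used on the current line, including single spaces
    for word in words:
        # Length of the line if this word were appended to it.
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            # The word does not fit: close the line and start a new one.
            lines.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(" ".join(current))
    return lines


print(first_fit("aaa bb cc ddddd".split(), 6))
# prints ['aaa bb', 'cc', 'ddddd']
```

In this example the second line is left with four unused character positions, because the decision to end the first line after "bb" was made without looking at the words that follow.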

Formally, the algorithm defines a value called ''badness'' associated with each possible line break; the badness is increased if the spaces on the line must stretch or shrink too much to make the line the correct width. Penalties are added if a breakpoint is particularly undesirable: for example, if a word must be [[hyphen]]ated, if two lines in a row are hyphenated, or if a very loose line is immediately followed by a very tight line. The algorithm will then find the breakpoints that will minimize the sum of squares of the badness (including penalties) of the resulting lines. If the paragraph contains <math>n</math> possible breakpoints, the number of situations that must be evaluated naively is <math>2^n</math>. However, by using the method of [[dynamic programming]], the complexity of the algorithm can be brought down to <math>O(n^2)</math> (see [[Big O notation]]). Further simplifications (for example, not testing extremely unlikely breakpoints such as a hyphenation in the first word of a paragraph, or very overfull lines) lead to an efficient algorithm whose running time is <math>O(n w)</math>, where <math>w</math> is the width of a line. A similar algorithm is used to determine the best way to break paragraphs across two pages, in order to avoid [[Widow (typesetting)|widows]] or [[Orphan (typesetting)|orphans]] (lines that appear alone on a page while the rest of the paragraph is on the following or preceding page). However, in general, a thesis by Michael Plass shows how the page-breaking problem can be [[NP-complete]] because of the added complication of placing figures.{{Sfn | Knuth | Plass | 1981}} TeX's line-breaking algorithm has been adopted by several other programs, such as [[Adobe InDesign]] (a [[desktop publishing]] [[Computer application|application]])<ref>{{Citation | publisher = [[Advogato]] | url = http://www.advogato.org/article/28.html | type = interview | title = Donald E. Knuth | journal = TUGboat | volume = 21 | date = 2000 | pages = 103–10 | access-date = 26 December 2005 | archive-url = https://web.archive.org/web/20090122043044/http://www.advogato.org/article/28.html | archive-date = 22 January 2009 | url-status = dead }}</ref> and the [[GNU]] [[fmt (Unix)|fmt]] [[Unix]] [[command line]] utility.<ref>{{Citation|publisher=GNU Project |chapter-url=https://www.gnu.org/software/coreutils/manual/html_node/fmt-invocation.html#fmt-invocation |title=Core GNU utilities (GNU coreutils) manual |chapter=4.1 fmt: Reformat paragraph text |date=2016 }}</ref>
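
The dynamic-programming idea can be sketched in a heavily simplified form. The sketch below is not TeX's algorithm: as a stated assumption, it replaces badness and penalties with the square of the unused character positions on each line (the last line costs nothing), measures width in monospaced characters, and assumes that every word fits on a line by itself. The name <code>total_fit</code> is hypothetical.

```python
import math


def total_fit(words, width):
    """Choose all breakpoints together, minimizing the sum of squared leftover space."""
    n = len(words)
    # best[i]: minimal cost of setting words[i:]; nxt[i]: index where the next line starts.
    best = [math.inf] * n + [0.0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word of the line
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])  # line holds words[i:j]
            if length > width:
                break  # overfull line: longer candidates cannot fit either
            # Simplified cost: squared slack, with a free last line.
            cost = 0.0 if j == n else float((width - length) ** 2)
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    # Follow the recorded choices to rebuild the lines.
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


print(total_fit("aaa bb cc ddddd".split(), 6))
# prints ['aaa', 'bb cc', 'ddddd']
```

The two nested loops examine each pair of candidate breakpoints at most once, which gives the quadratic bound mentioned above. For this input, filling each line greedily would instead produce "aaa bb", "cc", "ddddd", leaving a much looser second line.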

If no suitable line break can be found for a line, the system will try to hyphenate a word. The original version of TeX used a hyphenation algorithm based on a set of rules for the removal of prefixes and suffixes of words, and for deciding if it should insert a break between the two consonants in a pattern of the form [[vowel]]–[[consonant]]–[[consonant]]–[[vowel]] (which is possible most of the time).{{Sfn | Liang | 1983 | p = 3}} TeX82 introduced a new hyphenation algorithm, designed by [[Frank Liang]] in 1983, to assign priorities to breakpoints in letter groups. A list of hyphenation patterns is first generated automatically from a corpus of hyphenated words (a list of 50,000 words). If TeX must find the acceptable hyphenation positions in the word ''encyclopedia'', for example, it will consider all the subwords of the extended word ''.encyclopedia.'', where ''.'' is a special marker to indicate the beginning or end of the word. The list of subwords includes all the subwords of length 1 (''.'', ''e'', ''n'', ''c'', ''y'', etc.), of length 2 (''.e'', ''en'', ''nc'', etc.), etc., up to the subword of length 14, which is the word itself, including the markers. TeX will then look into its list of hyphenation patterns, and find subwords for which it has calculated the desirability of hyphenation at each position. In the case of our word, 11 such patterns can be matched, namely <sub>1</sub>c<sub>4</sub>l<sub>4</sub>, <sub>1</sub>cy, <sub>1</sub>d<sub>4</sub>i<sub>3</sub>a, <sub>4</sub>edi, e<sub>3</sub>dia, <sub>2</sub>i<sub>1</sub>a, ope<sub>5</sub>d, <sub>2</sub>p<sub>2</sub>ed, <sub>3</sub>pedi, pedia<sub>4</sub>, y<sub>1</sub>c. For each position in the word, TeX will calculate the ''maximum value'' obtained among all matching patterns, yielding en<sub>1</sub>cy<sub>1</sub>c<sub>4</sub>l<sub>4</sub>o<sub>3</sub>p<sub>4</sub>e<sub>5</sub>d<sub>4</sub>i<sub>3</sub>a<sub>4</sub>. Finally, the acceptable positions are those indicated by an [[even and odd numbers|odd]] number, yielding the acceptable hyphenations ''en-cy-clo-pe-di-a''. This system based on subwords allows the definition of very general patterns (such as <sub>2</sub>i<sub>1</sub>a), with low indicative numbers (either odd or even), which can then be superseded by more specific patterns (such as <sub>1</sub>d<sub>4</sub>i<sub>3</sub>a) if necessary. These patterns find about 90% of the hyphens in the original dictionary; more importantly, they do not insert any spurious hyphen. In addition, a list of exceptions (words for which the patterns do not predict the correct hyphenation) is included with the Plain TeX format; additional ones can be specified by the user.{{Sfn | Liang | 1983}}{{Rp | needed = yes | date = March 2013}}<ref>{{Citation | title = The TeXbook | chapter = Appendix H: Hyphenation | pages = 449–55}}</ref>
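
The matching procedure described above can be reproduced with a short sketch. It uses only the 11 patterns quoted in the text, not a complete pattern file, and writes them in the usual plain-text notation in which the digits are interleaved with the letters. The function names <code>parse</code> and <code>hyphenate</code> are hypothetical; the sketch scans the word with simple substring search and ignores any limits on how short the first or last fragment may be.

```python
import re

# The 11 patterns quoted in the text, with digits written inline.
PATTERNS = ["1c4l4", "1cy", "1d4i3a", "4edi", "e3dia", "2i1a",
            "ope5d", "2p2ed", "3pedi", "pedia4", "y1c"]


def parse(pattern):
    """Split '1d4i3a' into the letters 'dia' and the gap values [1, 4, 3, 0]."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # value of the gap before letter number pos
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns=PATTERNS):
    extended = "." + word + "."
    # scores[g] is the highest value seen for the gap just before extended[g].
    scores = [0] * (len(extended) + 1)
    for letters, values in map(parse, patterns):
        start = extended.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
            start = extended.find(letters, start + 1)
    # word[k] is extended[k + 1], so the gap before word[k] is scores[k + 1].
    pieces, last = [], 0
    for k in range(1, len(word)):
        if scores[k + 1] % 2 == 1:  # odd value: hyphenation allowed here
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("encyclopedia"))
# prints en-cy-clo-pe-di-a
```

Taking the maximum at each gap is what lets a specific pattern override a general one: the value 4 from <sub>1</sub>d<sub>4</sub>i<sub>3</sub>a suppresses a break between ''d'' and ''i'', and its 3 outranks the 1 from <sub>2</sub>i<sub>1</sub>a between ''i'' and ''a''.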

===Metafont===
{{main|Metafont}}

Metafont, not strictly part of TeX, is a font description system which allows the designer to describe characters algorithmically. It uses [[Bézier curve]]s in a fairly standard way to generate the actual characters to be displayed, but Knuth devotes substantial attention to the [[font rasterization|rasterizing]] problem on [[Raster graphics|bitmapped]] displays. Another thesis, by [[John Hobby]], further explores this problem of digitizing "brush trajectories". This term derives from the fact that Metafont describes characters as having been drawn by abstract brushes (and erasers). It is commonly believed that TeX is based on bitmap fonts but, in fact, these programs "know" nothing about the fonts that they are using other than their dimensions. It is the responsibility of the device driver to appropriately handle fonts of other types, including PostScript Type 1 and TrueType. Computer Modern (commonly known as "the TeX font") is freely available in Type 1 format, as are the AMS math fonts. Users of TeX systems that output directly to PDF, such as pdfTeX, XeTeX, or LuaTeX, generally never use Metafont output at all.
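
As an illustration of the curves involved, the following sketch evaluates a single cubic Bézier segment by repeated linear interpolation (de Casteljau's construction). It is a generic textbook method shown under the assumption of two-dimensional control points given as tuples; it is not Metafont code, and the name <code>cubic_bezier</code> is hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point at parameter t (0 <= t <= 1) on the cubic Bézier curve with the given control points."""
    def lerp(a, b):
        # Linear interpolation between points a and b at parameter t.
        return tuple((1 - t) * x + t * y for x, y in zip(a, b))

    # Three rounds of interpolation reduce four control points to one curve point.
    a, b, c = lerp(p0, p1), lerp(p1, p2), lerp(p2, p3)
    d, e = lerp(a, b), lerp(b, c)
    return lerp(d, e)


print(cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), 0.5))
# prints (0.5, 0.75)
```

Rasterizing a glyph outline additionally requires deciding which pixels such a curve covers, which is the harder problem discussed above.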

===Macro language===

TeX documents are written and programmed using an unusual macro language. Broadly speaking, the running of this macro language involves expansion and execution stages which do not interact directly. Expansion includes both literal expansion of macro definitions as well as conditional branching, and execution involves such tasks as setting variables/registers and the actual typesetting process of adding glyphs to boxes.

The definition of a macro includes not only a list of commands but also the syntax of the call. TeX differs from most widely used [[lexical preprocessor]]s, such as [[M4 (computer language)|M4]], in that the body of a macro is tokenized at definition time.

The TeX macro language has been used to write larger document production systems, most notably including LaTeX and ConTeXt.