
== Television and video ==
For live programs, spoken words comprising the television program's [[soundtrack]] are transcribed by a human operator (a [[speech-to-text reporter]]) using [[stenotype]]- or [[stenomask]]-type machines, whose phonetic output is instantly translated into text by a computer and displayed on the screen. This technique was developed in the 1970s as an initiative of the [[BBC]]'s [[Ceefax]] [[teletext]] service.<ref>{{cite web|url=http://teletext.mb21.co.uk/timeline/early-ceefax-subtitling.shtml|title=mb21 - ether.net - The Teletext Museum - Timeline|work=mb21.co.uk}}</ref> In collaboration with the BBC, a university student took on the research project of writing the first phonetics-to-text conversion program for this purpose. Captions of live broadcasts, such as news bulletins, sports events, and live entertainment shows, sometimes fall behind by a few seconds. The delay arises because the text cannot be produced until the words have been spoken, so the captions appear only after the person on the show has said the sentence.<ref>{{cite web|url=http://www.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP065.pdf|title=Publications|work=bbc.co.uk|url-status=dead|archive-url=https://web.archive.org/web/20061012151554/http://www.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP065.pdf|archive-date=12 October 2006}}</ref> Automatic computer speech recognition works well when trained to recognize a single voice, so since 2003 the BBC has produced live subtitles by having someone re-speak what is being broadcast. Live captioning is also a form of [[real-time text]]. Sports events on ESPN, meanwhile, are captioned by [[court reporter]]s using a special (steno) keyboard and individually constructed "dictionaries".

In some cases, the transcript is available beforehand, and captions are simply displayed during the program after being edited. For programs that have a mix of prepared and live content, such as [[news bulletin]]s, a combination of techniques is used.

For prerecorded programs, commercials, and home videos, audio is transcribed and captions are prepared, positioned, and timed in advance.

For all types of [[NTSC]] programming, captions are encoded into [[EIA-608|Line 21]] of the [[vertical blanking interval]]{{dash}}a part of the TV picture that sits just above the visible portion and is usually unseen. For [[ATSC Standards|ATSC]] ([[digital television]]) programming, three streams are encoded in the video: two are backward-compatible ''Line 21'' captions, and the third is a set of up to 63 additional caption streams encoded in [[EIA-708]] format.<ref name="atsc.org">{{cite web|url=http://www.atsc.org/faq/faq_closed.html |title=Closed Captioning FAQ |access-date=2008-05-31 |url-status=dead |archive-url=https://web.archive.org/web/20080901221032/http://www.atsc.org/faq/faq_closed.html |archive-date=2008-09-01 }} - ATSC Closed Captioning FAQ ([http://www.evertz.com/resources/cc-imp-paper.pdf cached copy] {{webarchive |url=https://web.archive.org/web/20060322091039/http://www.evertz.com/resources/cc-imp-paper.pdf |date=2006-03-22 }})</ref>

Captioning is modulated and stored differently in [[PAL]] and [[SECAM]] countries (625 lines, 50 fields per second), where [[teletext]] is used rather than [[EIA-608]], but the methods of preparation and the Line 21 field used are similar. For home [[Betamax|Beta]] and [[VHS]] videotapes, this Line 21 field must be shifted down because of the greater number of VBI lines used in 625-line PAL countries, though only a small minority of European PAL VHS machines support this (or any) format for closed caption recording. Like all teletext fields, teletext captions cannot be stored by a standard 625-line VHS recorder (due to the lack of field-shifting support); they are available on all professional [[S-VHS]] recordings because all fields are recorded. Recorded Teletext caption fields also suffer from a higher number of caption errors due to the increased number of bits and a low [[signal-to-noise ratio]], especially on low-bandwidth VHS. This is why Teletext captions were stored on floppy disk, separate from the analogue master tape. DVDs have their own system for subtitles and captions, which are digitally inserted in the data stream and decoded on playback into video.

For older televisions, a set-top box or other decoder is usually required. In the US, since the passage of the Television Decoder Circuitry Act, manufacturers of most television receivers sold have been required to include closed captioning display capability. High-definition TV sets, receivers, and [[TV tuner card|tuner cards]] are also covered, though the technical specifications are different (high-definition display screens, as opposed to high-definition TVs, may lack captioning). Canada has no similar law but receives the same sets as the US in most cases.

During transmission, a single-byte error can be replaced by a white space, which can appear at the beginning of the program. Further byte errors during EIA-608 transmission can affect the screen momentarily: the decoder defaults to a real-time mode such as the "roll up" style, types random letters on screen, and then reverts to normal. Uncorrectable byte errors within the teletext page header cause whole captions to be dropped. Because EIA-608 carries only two characters per video frame, it sends these captions ahead of time and stores them in a second buffer until a command to display them arrives; Teletext sends them in real time.
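
The effect of the two-characters-per-frame limit can be estimated with simple arithmetic. The following is a minimal sketch rather than anything taken from a standard: it assumes the nominal NTSC frame rate of 30000/1001 frames per second, ignores the control codes that a real caption also needs, and uses hypothetical function names.

```python
import math

# Assumption: nominal NTSC frame rate, about 29.97 frames per second.
NTSC_FRAME_RATE = 30000 / 1001
# Two characters per video frame, as described in the text above.
CHARS_PER_FRAME = 2


def frames_needed(caption: str) -> int:
    """Frames required to send the caption text (control codes ignored)."""
    return math.ceil(len(caption) / CHARS_PER_FRAME)


def seconds_needed(caption: str) -> float:
    """Approximate transmission time for the caption text in seconds."""
    return frames_needed(caption) / NTSC_FRAME_RATE


caption = "I got the machine ready."
print(frames_needed(caption), round(seconds_needed(caption), 2))  # 12 0.4
```

Under these assumptions a 24-character caption occupies 12 frames, roughly 0.4 seconds of transmission time, before any positioning or display commands are counted.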

The use of capitalization varies among caption providers. Most caption providers set all text in capital letters, while others, such as WGBH and non-US providers, prefer mixed-case letters.

There are two main styles of Line 21 closed captioning:
* '''Roll-up''' or '''scroll-up''' or '''paint-on''' or '''scrolling''': Real-time words sent in paint-on or scrolling mode appear from left to right, up to one line at a time; when a line is filled in roll-up mode, the whole line scrolls up to make way for a new line, and the line on top is erased. The lines usually appear at the bottom of the screen, but can be placed on any of the 14 screen rows to avoid covering graphics or action. This method is used when captioning video in real time, such as for live events, where a sequential word-by-word captioning process is needed or a pre-made intermediary file is not available. This method is signaled on [[EIA-608]] by a two-byte caption command or in Teletext by replacing rows for a roll-up effect and duplicating rows for a paint-on effect. This allows for real-time caption line editing.

[[File:Closed Caption Demonstration Still-Felix.png|thumb|225px|A still frame showing simulated closed captioning in the pop-on style]]
* '''Pop-on''' or '''pop-up''' or '''block''': A caption appears on any of the 14 screen rows as a complete sentence, which can be followed by additional captions. This method is used when captions come from an intermediary file (such as the Scenarist or EBU STL file formats) for pre-taped television and film programming, commonly produced at captioning facilities. This method of captioning can be aided by digital scripts or voice recognition software, and if used for live events, would require a video delay to avoid a large delay in the captions' appearance on-screen, which occurs with Teletext-encoded live subtitles.
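
The difference between the two styles can be illustrated with a small model of the decoder's display state. This is a simplified sketch, not decoder code: the class and method names are hypothetical, the three-row roll-up depth is an assumption chosen for the example, and screen positioning is ignored.

```python
from collections import deque


class RollUpWindow:
    """Roll-up style: a fixed number of visible rows that scroll upward."""

    def __init__(self, rows=3):  # a depth of 3 rows is an assumption
        self.rows = deque(maxlen=rows)

    def add_line(self, text):
        # When the window is full, deque(maxlen=...) discards the oldest
        # entry, modelling the top line being erased as the text scrolls up.
        self.rows.append(text)

    def visible(self):
        return list(self.rows)


class PopOnDisplay:
    """Pop-on style: a caption is built off-screen, then shown as a block."""

    def __init__(self):
        self.displayed = []  # what the viewer currently sees
        self.pending = []    # buffer filled ahead of time

    def load(self, lines):
        self.pending = list(lines)

    def flip(self):
        # Models the display command: the buffered caption appears at once.
        self.displayed, self.pending = self.pending, []


window = RollUpWindow(rows=2)
for line in ["FIRST LINE", "SECOND LINE", "THIRD LINE"]:
    window.add_line(line)
print(window.visible())  # ['SECOND LINE', 'THIRD LINE']
```

In the roll-up model each new line pushes the oldest one off the top, whereas the pop-on model shows nothing new until `flip()` is called, which is why pop-on suits captions prepared in advance.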

=== Caption formatting ===
[[TVNZ]] Access Services and Red Bee Media for [[BBC]] and Australia example:
<pre style="color:#004000;">
I got the machine ready.</pre>
<pre style="color:red;">ENGINE STARTING
(speeding away)
</pre>
UK IMS for [[ITV (TV network)|ITV]] and Sky example:
<pre style="color:red;">
(man) I got the machine ready. (engine starting)
</pre>
US WGBH Access Services example:
<pre>
MAN: I got the machine ready. (engine starting)
</pre>
US [[National Captioning Institute]] example:
<pre>
Man: I GOT THE MACHINE READY. 
[ENGINE STARTING]
</pre>
US [[CaptionMax]] example:
<pre>
- I got the machine ready.
[engine starting]
</pre>
US in-house real-time roll-up example:
<pre>
>> Man: I GOT THE MACHINE READY.
[engine starting]
</pre>
Non-US in-house real-time roll-up example:
<pre style="color:#808000;">
MAN: I got the machine ready.
(ENGINE STARTING)
</pre>
US [[VITAC]] example:
<pre>
Man:
I got the machine ready.
[ Engine starting ]
</pre>

==== Syntax ====
For real-time captioning done outside of captioning facilities, the following syntax is used:
* '>>' (two prefixed [[greater-than sign]]s) indicates a change in single speaker.
** Sometimes appended with the speaker's name in alternate case, followed by a [[colon (punctuation)|colon]].
* '>>>' (three prefixed greater-than signs) indicates a change in news story.
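
Software that post-processes real-time caption text could recognize these markers with a short parser. The sketch below is an illustration under stated assumptions: the function name and the returned dictionary layout are hypothetical, and treating any text before the first colon as the speaker's name is a heuristic that will misread dialogue containing a colon.

```python
import re

# '>>>' must be tried before '>>' so that a story change is not read
# as a speaker change followed by a stray '>'.
MARKER = re.compile(r"^(>>>|>>)\s*(?:([^:]+):\s*)?(.*)$")


def parse_caption_line(line):
    """Classify a real-time caption line by its leading marker."""
    stripped = line.strip()
    match = MARKER.match(stripped)
    if match is None:
        return {"event": None, "speaker": None, "text": stripped}
    marker, speaker, text = match.groups()
    event = "new_story" if marker == ">>>" else "new_speaker"
    return {"event": event, "speaker": speaker, "text": text}


print(parse_caption_line(">> Man: I GOT THE MACHINE READY."))
```

A line without a leading marker is returned with `event` set to `None`, so it can be treated as a continuation of the current speaker.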

Styles of syntax that are used by various captioning producers:
* Capitals indicate main on-screen dialogue and the name of the speaker.
** Legacy [[EIA-608]] home caption decoder fonts had no [[descender]]s on lowercase letters.
** Outside North America, capitals with background coloration indicate a song title or sound effect description.
** Outside North America, capitals with black or no background coloration indicate when a word is stressed or emphasized.
* Descenders indicate background sound description and [[Offscreen|off-screen]] dialogue.
** Most modern caption producers, such as [[WGBH-TV]], use [[mixed case]] for both on-screen and [[Offscreen|off-screen]] dialogue.
* '-' (a prefixed dash) indicates a change in single speaker (used by [[National Captioning Institute]] or [[CaptionMax]]).
* Words in [[italic type|italics]] indicate when a word is stressed or emphasized and when real world names are quoted.
** Italics and [[bold type]] are only supported by [[EIA-608]].
** Some North American providers use this for [[Narration|narrated]] dialogue.
** Some providers use this for [[offscreen|off-screen]] dialogue.
** Italics are also applied when a word is spoken in a foreign language.
* Text coloration indicates captioning credits and sponsorship.
** Used by [[music video]]s in the past, but generally has declined due to system incompatibilities.
** In Ceefax/Teletext countries, it indicates a change in single speaker in place of '>>'.
** Some Teletext countries use coloration to indicate when a word is stressed or emphasized.
** Coloration is limited to white, green, blue, cyan, red, yellow and magenta.
** UK order of use for text is [[white]], [[green]], [[cyan]], [[yellow]]; and backgrounds is [[black]], [[red]], [[blue]], [[magenta]], [[white]].
** US order of use for text is [[white]], [[yellow]], [[cyan]], [[green]]; and backgrounds is [[black]], [[blue]], [[red]], [[magenta]], white.
* [[Square brackets]] or [[parentheses]] indicate a song title or sound effect description.
* [[Parentheses]] indicate speaker's vocal pitch e.g., (man), (woman), (boy) or (girl).
** Outside North America, [[parentheses]] indicate a silent on-screen action.
* A pair of [[eighth note]]s is used to bracket a line of [[lyrics]] to indicate singing.
** A pair of eighth notes on a line with no text is used during a section of instrumental music or even voice tones playing with the music.
** Outside North America, a single [[number sign]] is used on a line of [[lyrics]] to indicate singing; alternatively, the eighth notes alone may be shown without the lyrics.
** An additional musical notation character is appended to the end of the last line of lyrics to indicate the song's end.
** As the symbol is unsupported by [[Ceefax]]/[[Teletext]], a [[number sign]], which resembles a musical [[sharp (music)|sharp]], is substituted.
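
Some of the bracket and music-note conventions listed above are regular enough to be detected mechanically. The following is a rough sketch only: the category names and function names are hypothetical, the rules cover only the North American bracket and eighth-note conventions, and the Teletext conversion is a naive character substitution rather than a description of any broadcaster's tooling.

```python
NOTE = "\u266a"  # eighth note character


def classify(line):
    """Guess the role of a caption line from its punctuation."""
    s = line.strip()
    if s and set(s) <= {NOTE, " "}:
        return "instrumental"          # notes on a line with no text
    if len(s) > 1 and s.startswith(NOTE) and s.endswith(NOTE):
        return "lyrics"                # line bracketed by eighth notes
    if (s.startswith("[") and s.endswith("]")) or (
        s.startswith("(") and s.endswith(")")
    ):
        return "sound_or_description"  # fully bracketed line
    return "dialogue"


def for_teletext(line):
    """Substitute a number sign where the eighth note is unsupported."""
    return line.replace(NOTE, "#")


print(classify("[engine starting]"))  # sound_or_description
```

A line such as `(man) I got the machine ready.` falls through to `dialogue` because only fully bracketed lines are treated as descriptions in this sketch.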

==== Technical aspects ====
There were many shortcomings in the original Line 21 specification from a [[typography|typographic]] standpoint; for example, it lacked many of the characters required for captioning in languages other than English. Since that time, the core Line 21 character set has been expanded to include quite a few more characters, handling most requirements for languages common in North and South America such as [[French language|French]], [[Spanish language|Spanish]], and [[Portuguese language|Portuguese]], though those extended characters are not required in all decoders and are thus unreliable in everyday use. The problem has been almost eliminated with a market-specific full set of Western European characters and a privately adopted [[Norpak]] extension for the [[South Korea]]n and [[Japan]]ese markets. The full [[EIA-708]] standard for digital television has worldwide character set support, but it has seen little use because [[EBU]] Teletext, which has its own extended character sets, dominates [[Digital Video Broadcasting|DVB]] countries.

Captions are often edited to make them easier to read and to reduce the amount of text displayed onscreen. This editing can range from very minor, with only a few occasional unimportant lines omitted, to severe, where virtually every line spoken by the actors is condensed. The measure used to guide this editing is words per minute, commonly varying from 180 to 300, depending on the type of program. Offensive words are also captioned, but if the program is censored for TV broadcast, the broadcaster might not have arranged for the captioning to be edited or censored as well. The "TV Guardian", a television [[set-top box]], is available to parents who wish to censor offensive language in programs: the video signal is fed into the box, and if it detects an offensive word in the captioning, the audio signal is bleeped or muted for that period of time.
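
The words-per-minute measure is straightforward to compute for a single caption. The sketch below is illustrative only: the function names are hypothetical, words are counted by splitting on whitespace, and the default limit of 180 is simply the lower end of the range quoted above rather than a prescribed threshold.

```python
def words_per_minute(text, seconds_on_screen):
    """Presentation rate of one caption in words per minute."""
    if seconds_on_screen <= 0:
        raise ValueError("seconds_on_screen must be positive")
    return len(text.split()) * 60.0 / seconds_on_screen


def needs_condensing(text, seconds_on_screen, limit_wpm=180):
    """True if the caption exceeds the chosen presentation rate."""
    return words_per_minute(text, seconds_on_screen) > limit_wpm


caption = "I got the machine ready."
print(words_per_minute(caption, 1.5))  # 200.0
```

A five-word caption shown for 1.5 seconds works out to 200 words per minute, which would be flagged at a 180 limit but accepted at a 300 limit.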

=== Caption channels ===
[[File:Cc3tout.jpg|thumb|150px|right|A [[bug (television)|bug]] touting CC1 and CC3 captions (on [[Telemundo]])]]
The Line 21 data stream can consist of data from several data channels [[multiplexed]] together. Odd field 1 can have four data channels: two separate synchronized caption channels (CC1, CC2) and two channels of caption-related text, such as website [[URL]]s (T1, T2). Even field 2 can have five additional data channels: two separate synchronized caption channels (CC3, CC4), two channels of caption-related text (T3, T4), and [[Extended Data Services]] (XDS) for Now/Next [[EPG]] details. The XDS data structure is defined in CEA-608.
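
The channel layout described above can be written down as a small lookup table. This is only a restatement of the paragraph as data; the variable and function names are hypothetical.

```python
# Data channels carried on Line 21, grouped by video field.
LINE21_CHANNELS = {
    1: ("CC1", "CC2", "T1", "T2"),         # odd field
    2: ("CC3", "CC4", "T3", "T4", "XDS"),  # even field
}


def field_of(channel):
    """Return the field (1 or 2) that carries the given data channel."""
    for field, channels in LINE21_CHANNELS.items():
        if channel in channels:
            return field
    raise KeyError(f"unknown Line 21 channel: {channel}")


print(field_of("CC3"))  # 2
```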

CC1 and CC2 share bandwidth: if there is a lot of data in CC1, there will be little room for CC2 data, so this field is generally used only for the primary audio captions. Similarly, CC3 and CC4 share the second, even field of Line 21. Since some early caption decoders supported only single-field decoding of CC1 and CC2, captions for [[Second audio program|SAP]] in a second language were often placed in CC2. This led to bandwidth problems, and the U.S. [[Federal Communications Commission]] (FCC) recommends that bilingual programming carry the second caption language in CC3. Many Spanish-language television networks, such as [[Univision]] and [[Telemundo]], provide [[Telemundo#English subtitles|English subtitles]] for many of their [[Spain|Spanish]] programs in CC3. [[Canada|Canadian]] broadcasters use CC3 for French-translated SAPs, and a similar practice is followed in South Korea and Japan.
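
The bandwidth argument for placing a second language in CC3 can be made concrete with a rough budget. The sketch below rests on several assumptions: each field is taken to carry two bytes per frame at the nominal NTSC rate of 30000/1001 frames per second, the text channels and XDS are ignored, and the per-language data rates are invented figures for illustration, not measurements.

```python
# Assumption: two bytes per frame per field, about 59.94 bytes per second.
FIELD_CAPACITY = 2 * 30000 / 1001

FIELD_OF = {"CC1": 1, "CC2": 1, "CC3": 2, "CC4": 2}


def overloaded_fields(assignment, rates):
    """Return the fields whose combined caption data exceeds capacity.

    assignment maps a channel name to a stream name; rates maps a
    stream name to its data rate in bytes per second.
    """
    load = {1: 0.0, 2: 0.0}
    for channel, stream in assignment.items():
        load[FIELD_OF[channel]] += rates[stream]
    return sorted(f for f, used in load.items() if used > FIELD_CAPACITY)


# Hypothetical data rates for two caption languages.
rates = {"english": 45.0, "spanish": 40.0}
print(overloaded_fields({"CC1": "english", "CC2": "spanish"}, rates))  # [1]
print(overloaded_fields({"CC1": "english", "CC3": "spanish"}, rates))  # []
```

With these invented figures, putting both languages in field 1 exceeds its capacity, while splitting them across CC1 and CC3 leaves each field within budget.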

Ceefax and Teletext can have a larger number of captions for other languages due to the use of multiple VBI lines. However, only [[Europe|European countries]] used a second subtitle page for second-language audio tracks, where either [[NICAM]] dual mono or [[Zweikanalton]] was used.