Formant
{{Short description|Spectrum of phonetic resonance in speech production}}
[[Image:Spectrogram -iua-.png|250px|thumb|[[Spectrogram]] of American English vowels {{IPA|[i, u, ɑ]}} showing the formants ''F''<sub>1</sub> and ''F''<sub>2</sub>]]
{{IPA notice}}
In [[speech science]] and [[phonetics]], a '''formant''' is the broad spectral maximum that results from an [[acoustic resonance]] of the [[Vocal tract|human vocal tract]].<ref>Titze, I.R. (1994). Principles of Voice Production, Prentice Hall, {{ISBN|978-0-13-717893-3}}.</ref><ref>Titze, I.R., Baken, R.J., Bozeman, K.W., Granqvist, S., Henrich, N., Herbst, C.T., Howard, D.M., Hunter, E.J., Kaelin, D., Kent, R.D., Löfqvist, A., McCoy, S., Miller, D.G., Noé, H., Scherer, R.C., Smith, J.R., Story, B.H., Švec, J.G., Ternström, S. and Wolfe, J. (2015) "Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization." J. Acoust. Soc. America. 137, 3005–3007.</ref> In [[acoustics]], a formant is usually defined as a broad peak, or local maximum, in the spectrum.<ref>Jeans, J.H. (1938) Science & Music, reprinted by Dover, 1968.</ref><ref>Standards Secretariat, Acoustical Society of America, (1994). ANSI S1.1-1994 (R2004) American National Standard Acoustical Terminology, (12.41) Acoustical Society of America, Melville, NY.</ref> For harmonic sounds, with this definition, the formant frequency is sometimes taken as that of the [[harmonic]] that is most augmented by a resonance. The difference between these two definitions resides in whether "formants" characterise the production mechanisms of a sound or the produced sound itself. In practice, the frequency of a spectral peak differs slightly from the associated resonance frequency, except when harmonics happen to be aligned with the resonance frequency, or when the sound source is mostly non-harmonic, as in whispering and [[Vocal fry register|vocal fry]].
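The gap between a resonance frequency and the spectral peak it produces can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not a model taken from the cited sources: it assumes an idealised second-order resonance, a source whose harmonics all have equal amplitude, and hypothetical values (a 500 Hz resonance with an 80 Hz bandwidth). The function names are illustrative.

```python
import math

def resonance_gain(f, f_res, bandwidth):
    """Gain of an idealised second-order resonance at frequency f (Hz)."""
    q = f_res / bandwidth          # quality factor of the resonance
    ratio = f / f_res
    return 1.0 / math.sqrt((1.0 - ratio ** 2) ** 2 + (ratio / q) ** 2)

def most_augmented_harmonic(f0, f_res, bandwidth, f_max=5000.0):
    """Return the harmonic of f0 (Hz) that the resonance amplifies most.

    Assumes every harmonic of the source has the same amplitude.
    """
    harmonics = [k * f0 for k in range(1, int(f_max // f0) + 1)]
    return max(harmonics, key=lambda f: resonance_gain(f, f_res, bandwidth))

f_res = 500.0  # hypothetical resonance frequency (Hz)

# 150 Hz fundamental: harmonics fall at 450 Hz and 600 Hz, none at 500 Hz.
peak = most_augmented_harmonic(150.0, f_res, 80.0)
print("spectral peak:", peak, "Hz; offset from resonance:", peak - f_res, "Hz")

# 125 Hz fundamental: the fourth harmonic coincides with the resonance.
aligned = most_augmented_harmonic(125.0, f_res, 80.0)
print("spectral peak:", aligned, "Hz")
```

With the 150 Hz fundamental, the strongest harmonic lies 50 Hz below the resonance. With the 125 Hz fundamental, the two definitions give the same frequency.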
A room can be said to have formants characteristic of that particular room, due to its resonances, i.e., to the way sound reflects from its walls and objects. Room formants of this nature reinforce themselves by emphasizing specific frequencies and absorbing others, as exploited, for example, by [[Alvin Lucier]] in his piece ''[[I Am Sitting in a Room]]''. In acoustic [[digital signal processing]], the way a collection of formants (such as a room) affects a signal can be represented by an [[impulse response]]. In both speech and rooms, formants are characteristic features of the resonances of the space. They are said to be ''excited'' by acoustic sources such as the voice, and they shape (filter) the sources' sounds, but they are not sources themselves.

==History==
From an acoustic point of view, phonetics had a serious problem with the idea that the effective length of the vocal tract changed vowels.<ref>{{Cite book |last=Hermann |first=Ludimar |title=Phonophotographische Untersuchungen |year=1894 |edition=5th |language=de |trans-title=Phonophotographical Studies}}</ref> Indeed, when the length of the vocal tract changes, all the acoustic resonators formed by mouth cavities are scaled, and so are their resonance frequencies. Therefore, it was unclear how vowels could depend on frequencies when talkers with different vocal tract lengths, for instance [[Bass (voice type)|bass]] and [[soprano]] singers, can produce sounds that are perceived as belonging to the same phonetic category. There had to be some way to normalize the spectral information underpinning the vowel identity. [[Ludimar Hermann|Hermann]] suggested a solution to this problem in 1894, coining the term “formant”. A vowel, according to him, is a special acoustic phenomenon, depending on the intermittent production of a special partial, or “formant”, or “characteristique” feature. The frequency of the “formant” may vary a little without altering the character of the vowel.
For “long e” (''ee'' or ''iy''), for example, the lowest-frequency “formant” may vary from 350 to 440 Hz even in the same person.<ref name="McKendrick">McKendrick, J. G. (1903). Experimental phonetics. In Annual report of the board of regents of the Smithsonian institution for the year ending June 30, 1902 (pp. 241–259). Smithsonian Institution.</ref>

==Phonetics==
{| class="wikitable sortable plainrowheaders" style="margin:1em;float:right;text-align:center"
|+ Average vowel formants for a male voice (in Hz)<ref>{{cite book | last=Catford | first=John Cunnison|authorlink=J. C. Catford | title=A Practical Introduction to Phonetics | publisher=Oxford University Press |location=Oxford | year=2001|edition=2nd | isbn=0-19-924635-1 |page=154}}</ref>
|-
! scope="col" | Vowel<br />([[International Phonetic Alphabet|IPA]])
! scope="col" style="width:3.5em" | ''F''<sub>1</sub>
! scope="col" style="width:3.5em" | ''F''<sub>2</sub>
! scope="col" style="width:3.5em" | ''F''<sub>2</sub> – ''F''<sub>1</sub>
|-
! scope="row" style="text-align:center" | {{IPA|i}}
| 240 || 2400 || 2160
|-
! scope="row" style="text-align:center" | {{IPA|y}}
| 235 || 2100 || 1865
|-
! scope="row" style="text-align:center" | {{IPA|e}}
| 390 || 2300 || 1910
|-
! scope="row" style="text-align:center" | {{IPA|ø}}
| 370 || 1900 || 1530
|-
! scope="row" style="text-align:center" | {{IPA|ɛ}}
| 610 || 1900 || 1290
|-
! scope="row" style="text-align:center" | {{IPA|æ}}
| 585 || 1710 || 1125
|-
! scope="row" style="text-align:center" | {{IPA|a}}
| 850 || 1610 || 760
|-
! scope="row" style="text-align:center" | {{IPA|ɶ}}
| 820 || 1530 || 710
|-
! scope="row" style="text-align:center" | {{IPA|ɑ}}
| 750 || 940 || 190
|-
! scope="row" style="text-align:center" | {{IPA|ɒ}}
| 700 || 760 || 60
|-
! scope="row" style="text-align:center" | {{IPA|ʌ}}
| 600 || 1170 || 570
|-
! scope="row" style="text-align:center" | {{IPA|ɔ}}
| 500 || 700 || 200
|-
! scope="row" style="text-align:center" | {{IPA|ɤ}}
| 460 || 1310 || 850
|-
! scope="row" style="text-align:center" | {{IPA|o}}
| 360 || 640 || 280
|-
! scope="row" style="text-align:center" | {{IPA|ɯ}}
| 300 || 1390 || 1090
|-
! scope="row" style="text-align:center" | {{IPA|u}}
| 250 || 595 || 345
|}
Formants are distinctive frequency components of the acoustic signal produced by speech, musical instruments<ref>Reuter, Christoph (2009): The role of formant positions and micro-modulations in blending and partial masking of musical instruments. In: [[Journal of the Acoustical Society of America]] (JASA), Vol. 126,4, p. 2237</ref> or [[singing]]. The information that humans require to distinguish between speech sounds can be represented purely quantitatively by specifying peaks in the frequency spectrum. Most of these formants are produced by tube and chamber [[resonance]], but a few whistle tones derive from periodic collapse of [[Venturi effect]] low-pressure zones.<ref>{{cite book |last1=Flanagan |first1=James L. |title=Speech Analysis Synthesis and Perception |date=1972 |doi=10.1007/978-3-662-01562-9 |isbn=978-3-662-01564-3 |url=https://link.springer.com/book/10.1007/978-3-662-01562-9 |language=en}}</ref>

The formant with the lowest frequency is called ''F''<sub>1</sub>, the second ''F''<sub>2</sub>, the third ''F''<sub>3</sub>, and so forth. The [[fundamental frequency]] or [[Pitch (music)|pitch]] of the voice is sometimes referred to as ''F''<sub>0</sub>, but it is not a formant. Most often the first two formants, ''F''<sub>1</sub> and ''F''<sub>2</sub>, are sufficient to identify the vowel. The relationship between the perceived vowel quality and the first two formant frequencies can be appreciated by listening to "artificial vowels" that are generated by passing a click train (to simulate the glottal pulse train) through a pair of bandpass filters (to simulate vocal tract resonances).
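The artificial-vowel construction described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than a reference implementation: simple two-pole digital resonators stand in for the band-pass filters, the bandwidths (60 Hz and 100 Hz), the 80 Hz fundamental and the 16 kHz sampling rate are arbitrary choices, and all function names are hypothetical. The formant values for {{IPA|[i]}} come from the table above.

```python
import math

def click_train(f0, fs, duration):
    """Unit impulses one pitch period apart (a crude glottal pulse train)."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(int(fs * duration))]

def resonator(x, freq, bandwidth, fs):
    """Two-pole resonator with unit gain at its centre frequency.

    y[n] = g*x[n] + 2*r*cos(theta)*y[n-1] - r*r*y[n-2]
    """
    r = math.exp(-math.pi * bandwidth / fs)      # pole radius from bandwidth
    theta = 2.0 * math.pi * freq / fs            # pole angle from frequency
    g = (1.0 - r) * math.sqrt(1.0 - 2.0 * r * math.cos(2.0 * theta) + r * r)
    b1, b2 = 2.0 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = g * sample + b1 * y1 + b2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def artificial_vowel(f1, f2, f0=80.0, fs=16000, duration=0.5):
    """Sum of two resonators driven in parallel by the same click train."""
    source = click_train(f0, fs, duration)
    band1 = resonator(source, f1, 60.0, fs)      # assumed bandwidth for F1
    band2 = resonator(source, f2, 100.0, fs)     # assumed bandwidth for F2
    return [a + b for a, b in zip(band1, band2)]

def magnitude_at(x, freq, fs):
    """Magnitude of the single DFT component of x at freq (Hz)."""
    w = 2.0 * math.pi * freq / fs
    re = sum(s * math.cos(w * n) for n, s in enumerate(x))
    im = sum(s * math.sin(w * n) for n, s in enumerate(x))
    return math.hypot(re, im) / len(x)

fs = 16000
vowel_i = artificial_vowel(240.0, 2400.0, fs=fs)   # F1, F2 of [i] from the table
for f in (240.0, 1200.0, 2400.0):
    print(f, "Hz:", magnitude_at(vowel_i, f, fs))
```

The harmonics at 240 Hz and 2400 Hz come out much stronger than the one at 1200 Hz, which lies between the two resonances. Writing `vowel_i` to a sound file (for example with the standard `wave` module) would make the result audible.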
[[Front vowel]]s have higher ''F''<sub>2</sub>, while [[low vowel]]s have higher ''F''<sub>1</sub>. [[Lip rounding]] tends to lower ''F''<sub>1</sub> and ''F''<sub>2</sub> in back vowels and ''F''<sub>2</sub> and ''F''<sub>3</sub> in front vowels.<ref>{{cite book|last=Thomas|first=Erik R.|year=2011|title=Sociophonetics: An Introduction|publisher=Palgrave Macmillan|page=145|isbn=978-0-230-22455-1}}</ref>

Nasal consonants usually have an additional formant around 2500 Hz. The liquid {{IPA|[l]}} usually has an extra formant at 1500 Hz, whereas the [[English language|English]] "r" sound ({{IPA|[ɹ]}}) is distinguished by a very low third formant (well below 2000 Hz).

[[Plosives]] (and, to some degree, [[Fricative consonant|fricatives]]) modify the placement of formants in the surrounding vowels. [[Bilabial consonant|Bilabial]] sounds (such as {{IPA|/b/}} and {{IPA|/p/}} in "ball" or "sap") cause a lowering of the formants; on spectrograms, [[Velar consonant|velar]] sounds ({{IPA|/k/}} and {{IPA|/ɡ/}} in English) almost always show ''F''<sub>2</sub> and ''F''<sub>3</sub> coming together in a 'velar pinch' before the [[Velar consonant|velar]] and separating from the same 'pinch' as the velar is released; [[Alveolar consonant|alveolar]] sounds (English {{IPA|/t/}} and {{IPA|/d/}}) cause fewer systematic changes in neighbouring vowel formants, depending partially on exactly which vowel is present. The time course of these changes in vowel formant frequencies is referred to as 'formant transitions'.

In normal voiced speech, the underlying vibration produced by the vocal folds resembles a [[sawtooth wave]], rich in [[harmonic]] overtones. If the fundamental frequency or (more often) one of the overtones is higher than a resonance frequency of the system, then the resonance will be only weakly excited and the formant usually imparted by that resonance will be mostly lost.
This is most apparent in the case of [[soprano]] [[opera]] singers, who sing at pitches high enough that their vowels become very hard to distinguish.

Control of resonances is an essential component of the vocal technique known as [[overtone singing]], in which the performer sings a low fundamental tone, and creates sharp resonances to select upper [[harmonics]], giving the impression of several tones being sung at once.

[[Spectrogram]]s may be used to visualise formants. In spectrograms, it can be hard to distinguish formants from naturally occurring harmonics when one sings. However, one can hear the natural formants in a vowel shape through atonal techniques such as [[vocal fry]].

==Formant estimation==
Formants, whether they are seen as acoustic resonances of the vocal tract or as local maxima in the speech spectrum, are defined, like [[band-pass filter]]s, by their frequency and by their [[spectral width]] ([[Bandwidth (signal processing)|bandwidth]]).

Different methods exist to obtain this information. Formant frequencies, in their acoustic definition, can be estimated from the [[frequency spectrum]] of the sound, using a spectrogram (as in the figure) or a spectrum analyzer. However, to estimate the acoustic resonances of the vocal tract (i.e. the speech definition of formants) from a speech recording, one can use ''[[linear predictive coding]]''. An intermediate approach consists in extracting the spectral envelope by neutralizing the fundamental frequency,<ref>{{cite journal |last1=Kawahara |first1=Hideki |last2=Masuda-Katsuse |first2=Ikuyo |last3=de Cheveigné |first3=Alain |title=Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds |journal=Speech Communication |date=April 1999 |volume=27 |issue=3–4 |pages=187–207 |doi=10.1016/S0167-6393(98)00085-5}}</ref> and only then looking for local maxima in the spectral envelope.
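As a rough illustration of the linear-prediction route, the sketch below fits an all-pole model with the autocorrelation method (Levinson–Durbin recursion) and reads formant candidates off the peaks of the model's spectral envelope. It is a simplified sketch under stated assumptions, not the procedure of any particular tool: peak-picking on the envelope is used here in place of solving for the roots of the prediction polynomial, the test signal is a synthetic impulse response with hypothetical resonances at 500 Hz and 1500 Hz, and all function names are illustrative. Real recordings would normally also need framing, windowing and pre-emphasis, and a higher model order.

```python
import cmath
import math

def resonator(x, freq, bandwidth, fs):
    """Two-pole resonator used only to synthesise a test signal."""
    r = math.exp(-math.pi * bandwidth / fs)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    b2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + b1 * y1 + b2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1*z^-1 + ... + ap*z^-p from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def lpc_formants(x, fs, order, step_hz=5.0):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^jw)|."""
    a = levinson_durbin(autocorrelation(x, order), order)
    freqs = [i * step_hz for i in range(int((fs / 2) / step_hz) + 1)]
    env = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        env.append(1.0 / abs(sum(c * cmath.exp(-1j * w * k)
                                 for k, c in enumerate(a))))
    return [freqs[i] for i in range(1, len(env) - 1)
            if env[i - 1] < env[i] >= env[i + 1]]

fs = 8000
impulse = [1.0] + [0.0] * 1023
signal = resonator(resonator(impulse, 500.0, 80.0, fs), 1500.0, 100.0, fs)
formants = lpc_formants(signal, fs, order=4)
print(formants)
```

Because the test signal is itself the output of a fourth-order all-pole filter, an order-4 model can recover its resonances closely, and the two reported peaks fall near 500 Hz and 1500 Hz. Natural speech is not an exact all-pole signal, so estimates from recordings are less clean.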
==Formant plots==
[[File:Catford formant plot.png|thumb|A plot of the average formants listed in the above chart]]
The first two formants are important in determining the quality of vowels, and are frequently said to correspond to the open/close (or low/high) and front/back dimensions (which have traditionally been associated with the shape and position of the [[tongue]]). Thus the first formant ''F''<sub>1</sub> has a higher frequency for an open or low vowel such as {{IPA|[a]}} and a lower frequency for a closed or high vowel such as {{IPA|[i]}} or {{IPA|[u]}}; and the second formant ''F''<sub>2</sub> has a higher frequency for a front vowel such as {{IPA|[i]}} and a lower frequency for a back vowel such as {{IPA|[u]}}.<ref>Ladefoged, Peter (2006) ''A Course in Phonetics (Fifth Edition)'', Boston, MA: Thomson Wadsworth, p. 188. {{ISBN|1-4130-2079-8}}</ref><ref>Ladefoged, Peter (2001) ''Vowels and Consonants: An Introduction to the Sounds of Language'', Malden, MA: Blackwell, p. 40. {{ISBN|0-631-21412-7}}</ref> Vowels will almost always have four or more distinguishable formants, and sometimes more than six. However, the first two formants are the most important in determining vowel quality and are often plotted against each other in vowel diagrams,<ref>Deterding, David (1997) 'The Formants of Monophthong Vowels in Standard Southern British English Pronunciation', ''Journal of the International Phonetic Association'', 27, pp. 47–55.</ref> though this simplification fails to capture some aspects of vowel quality such as rounding.<ref>Hayward, Katrina (2000) ''Experimental Phonetics'', Harlow, UK: Pearson, p. 149. {{ISBN|0-582-29137-2}}</ref>

Many writers have addressed the problem of finding an optimal alignment of the positions of vowels on formant plots with those on the conventional vowel quadrilateral.
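To make the correspondence between formant values and quadrilateral positions concrete, the sketch below turns the average ''F''<sub>1</sub> and ''F''<sub>2</sub> values from the table in the Phonetics section into normalised plot coordinates. It is only an illustration under assumptions: the axis orientation (''F''<sub>2</sub> decreasing from left to right, ''F''<sub>1</sub> increasing from top to bottom) is a common plotting convention assumed here, the linear scaling in hertz is the simplest possible choice, and the names `CATFORD_MALE_HZ` and `plot_coordinates` are hypothetical.

```python
# Average (F1, F2) in Hz for a male voice, copied from the table above.
CATFORD_MALE_HZ = {
    "i": (240, 2400), "y": (235, 2100), "e": (390, 2300), "ø": (370, 1900),
    "ɛ": (610, 1900), "æ": (585, 1710), "a": (850, 1610), "ɶ": (820, 1530),
    "ɑ": (750, 940),  "ɒ": (700, 760),  "ʌ": (600, 1170), "ɔ": (500, 700),
    "ɤ": (460, 1310), "o": (360, 640),  "ɯ": (300, 1390), "u": (250, 595),
}

def plot_coordinates(formants):
    """Map (F1, F2) pairs to (x, y) in the unit square.

    x = 0 is the most front vowel (highest F2), x = 1 the most back;
    y = 0 is the closest vowel (lowest F1), y = 1 the most open.
    """
    f1_values = [f1 for f1, _ in formants.values()]
    f2_values = [f2 for _, f2 in formants.values()]
    f1_min, f1_max = min(f1_values), max(f1_values)
    f2_min, f2_max = min(f2_values), max(f2_values)
    coords = {}
    for vowel, (f1, f2) in formants.items():
        x = (f2_max - f2) / (f2_max - f2_min)   # reversed F2 axis
        y = (f1 - f1_min) / (f1_max - f1_min)   # F1 grows downwards
        coords[vowel] = (x, y)
    return coords

coords = plot_coordinates(CATFORD_MALE_HZ)
for vowel, (x, y) in coords.items():
    print(f"{vowel}: x={x:.2f}, y={y:.2f}")
```

With this orientation {{IPA|[i]}} lands in the top-left corner, {{IPA|[u]}} at the top right, and {{IPA|[a]}} at the bottom, roughly as on the vowel quadrilateral. A perceptual frequency scale could be substituted for the linear scaling.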
The pioneering work of Ladefoged<ref>{{cite book|last1=Ladefoged|first1=P.|title=Three Areas of Experimental Phonetics|date=1967|publisher=Oxford|page=87}}</ref> used the [[Mel scale]] because this scale was claimed to correspond more closely to the auditory scale of [[pitch (music)|pitch]] than does the acoustic measure of [[fundamental frequency]] expressed in hertz. Two alternatives to the Mel scale are the [[Bark scale]] and the [[Equivalent rectangular bandwidth|ERB-rate scale]].<ref>{{cite book|last1=Hayward|first1=K.|title=Experimental Phonetics|date=2000|publisher=Longman|isbn=0-582-29137-2}}</ref> Another widely adopted strategy is plotting the difference between ''F''<sub>1</sub> and ''F''<sub>2</sub> rather than ''F''<sub>2</sub> on the horizontal axis.{{Citation needed|date=November 2021}}

==Singer's formant==
{{Main|Squillo}}
Studies of the frequency spectrum of trained speakers and classical [[singing|singers]], especially male singers, indicate a clear formant around 3000 Hz (between 2800 and 3400 Hz) that is absent in speech or in the spectra of untrained speakers or singers. It is thought to be associated with one or more of the higher resonances of the vocal tract.<ref>Sundberg, J. (1974). "Articulatory interpretation of the 'singing formant'", ''Journal of the Acoustical Society of America'', 55, 838–844.</ref><ref>{{cite journal |last1=Bele |first1=Irene Velsvik |title=The Speaker's Formant |journal=J. Voice |date=December 2006 |volume=20 |issue=4 |pages=555–578 |doi=10.1016/j.jvoice.2005.07.001 |pmid=16325374 |access-date=}}</ref> It is this increase in energy at 3000 Hz which allows singers to be heard and understood over an [[orchestra]].
This formant is actively developed through [[vocal pedagogy|vocal training]], for instance through so-called ''[[voce di strega]]'' or "witch's voice"<ref name="FrisellBrandenBooks">{{cite book |author=Frisell, Anthony |title=Baritone Voice |publisher=Branden Books |location=Boston |year=2007 |pages=84 |isbn=978-0-8283-2181-5 }}</ref> exercises and is caused by a part of the vocal tract acting as a [[Helmholtz resonator|resonator]].<ref name="Sundberg">{{cite book |author=Sundberg, Johan |title=The science of the singing voice |publisher=[[Northern Illinois University Press]] |location=DeKalb, Ill |year=1987 |isbn=0-87580-542-6 }}</ref> In classical music and vocal pedagogy, this phenomenon is also known as ''[[squillo]]''.

==See also==
*[[Formant synthesis]]
*[[Human voice]]
*[[Linear predictive coding]]
*[[Praat]]
*[[Timbre]]
*[[Vocoder]]

==References==
{{reflist}}

==External links==
*[http://homepage.ntu.edu.tw/~karchung/Phonetics%20II%20page%20nineteen.htm Formants for fun and profit]
*[http://www.geofex.com/Article_Folders/wahpedl/voicewah.htm Formants and wah-wah pedals]
*[http://www.phys.unsw.edu.au/jw/formant.html What is a formant?] A discussion of the three different meanings of the word 'formant'
*[http://www.phys.unsw.edu.au/jw/soprane.html Formant tuning by soprano singers] from the University of New South Wales
*[http://www.phys.unsw.edu.au/jw/xoomi.html The acoustics of harmonic or overtone singing] from the University of New South Wales
*[http://videoweb.nie.edu.sg/phonetic/vowels/measurements.html Materials for measuring and plotting vowel formants]

{{Acoustics}}

[[Category:Human voice]]
[[Category:Sound synthesis types]]
[[Category:Acoustics]]