Mel scale
{{Short description|Conceptual scale}}
[[Image:Mel-Hz_plot.svg|right|thumb|450px|Plot of the mel pitch scale versus the hertz scale]]
[[Image:A440.png|thumb|A440 {{audio|A440.mid|Play}}. 440 Hz = 549.64 mels]]
The '''mel scale''' (after the word ''[[melody]]'')<ref name=stevens1937>{{cite journal |journal = Journal of the Acoustical Society of America |title = A scale for the measurement of the psychological magnitude pitch |author1 = Stevens, Stanley Smith |author2 = Volkmann, John |author3 = Newman, Edwin B. |volume = 8 |issue = 3 |pages = 185–190 |year = 1937 |url = http://asadl.org/jasa/resource/1/jasman/v8/i3/p185_s1 |bibcode = 1937ASAJ....8..185S |doi = 10.1121/1.1915893 |url-status = dead |archive-url = https://archive.today/20130414065947/http://asadl.org/jasa/resource/1/jasman/v8/i3/p185_s1 |archive-date = 2013-04-14 |url-access= subscription }}</ref> is a perceptual scale of [[pitch (music)|pitch]]es judged by listeners to be equal in distance from one another. The reference point between this scale and normal [[frequency]] measurement is defined by assigning a perceptual pitch of 1000 mels to a 1000 [[Hertz|Hz]] tone, 40 [[decibel|dB]] above the listener's threshold. Above about 500 Hz, increasingly large [[interval (music)|interval]]s are judged by listeners to produce equal pitch increments.
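
As a rough numerical illustration of the reference point and of the widening intervals described above, the following Python sketch applies the common formula ''m'' = 2595 log<sub>10</sub>(1 + ''f''/700) from the Formula section and its inverse. The function names <code>hz_to_mel</code> and <code>mel_to_hz</code> are illustrative choices rather than names taken from any cited source, and other formulas discussed in this article would give somewhat different numbers.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in hertz to mels: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse conversion: f = 700 * (10 ** (m / 2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def equal_mel_band_edges(max_mel: int = 3000, step: int = 500) -> list[float]:
    """Return the hertz values of points spaced `step` mels apart, from 0 to max_mel."""
    return [mel_to_hz(m) for m in range(0, max_mel + 1, step)]


if __name__ == "__main__":
    # Reference tones mentioned in the lead section and the A440 caption.
    for f in (440.0, 1000.0):
        print(f"{f:7.1f} Hz -> {hz_to_mel(f):8.2f} mels")

    # Equal steps in mels correspond to progressively wider steps in hertz.
    edges = equal_mel_band_edges()
    for lo, hi in zip(edges, edges[1:]):
        print(f"{lo:8.1f} Hz to {hi:8.1f} Hz  (width {hi - lo:7.1f} Hz)")
```

With this formula, 440 Hz maps to about 549.64 mels, in agreement with the A440 caption, while 1000 Hz maps to about 999.99 mels rather than exactly 1000, because the constant 2595 is rounded. The printed band widths grow with frequency, which is the numerical counterpart of increasingly large intervals being judged as equal pitch increments.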
==Formula==
A formula (O'Shaughnessy 1987) to convert ''f'' hertz into ''m'' mels is<ref>{{cite book | title = Speech communication: human and machine | author = Douglas O'Shaughnessy | publisher = Addison-Wesley | year = 1987 | isbn = 978-0-201-16520-3 | page = 150 | url = https://books.google.com/books?id=mHFQAAAAMAAJ&q=2595 }}</ref>
<math display="block">m = 2595 \log_{10}\left(1 + \frac{f}{700}\right).</math>
[[File:Mel-scale_from_200_to_1500,_in_intervals_of_50.ogg|thumb|Mel-scale from 200 to 1500, in intervals of 50]]

==History and other formulas==
The formula from O'Shaughnessy's book can be expressed with different logarithmic bases:
<math display="block">m = 2595 \log_{10}\left(1 + \frac{f}{700}\right) = 1127 \ln\left(1 + \frac{f}{700}\right).</math>
The corresponding inverse expressions are
<math display="block">f = 700\left(10^\frac{m}{2595} - 1\right) = 700\left(e^\frac{m}{1127} - 1\right).</math>

Curves and tables of psychophysical pitch scales have been published since Steinberg's 1937<ref>{{cite journal | journal = Journal of the Acoustical Society of America | title = Positions of stimulation in the cochlea by pure tones | author = John C. Steinberg | volume = 8 | issue = 3 | pages = 176–180 | year = 1937 | url = http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=JASMAN000008000003000176000001 | bibcode = 1937ASAJ....8..176S | doi = 10.1121/1.1915891 | url-access = subscription }}</ref> curves, which were based on [[just-noticeable difference]]s of pitch. More curves soon followed in the papers of Fletcher and Munson (1937),<ref>{{cite journal | journal = Journal of the Acoustical Society of America | title = Relation Between Loudness and Masking | author1 = Harvey Fletcher | author2 = W. A. Munson | volume = 9 | issue = 1 | pages = 1–10 | year = 1937 | bibcode = 1937ASAJ....9....1F |doi = 10.1121/1.1915904 }}</ref> Fletcher (1938),<ref>{{cite journal | journal = Journal of the Acoustical Society of America | title = Loudness, Masking and Their Relation to the Hearing Process and the Problem of Noise Measurement | author = Harvey Fletcher | volume = 9 | pages = 275–293 | year = 1938 | issue = 4 | url = http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=JASMAN000009000004000275000001 | bibcode = 1938ASAJ....9..275F | doi = 10.1121/1.1915935 | url-access = subscription }}</ref> Stevens et al. (1937),<ref name=stevens1937/> and Stevens and Volkmann (1940),<ref>{{cite journal | journal = American Journal of Psychology | title = The Relation of Pitch to Frequency: A Revised Scale | author1 = Stevens, S. | author2 = Volkmann, J. | volume = 53 | issue = 3 | pages = 329–353 | year = 1940 | doi=10.2307/1417526 | jstor = 1417526 }}</ref> which used a variety of experimental methods and analysis approaches.

In 1949 Koenig published an approximation based on separate linear and logarithmic segments, with a break at 1000 Hz.<ref>{{cite journal | journal = Bell Telephone Laboratory Record | title = A new frequency scale for acoustic measurements | author = W. Koenig | volume = 27 | pages = 299–301 | year = 1949 }}</ref> [[Gunnar Fant]] proposed the currently popular linear/logarithmic formula in 1949, but with a 1000 Hz corner frequency.<ref>Gunnar Fant (1949) "Analys av de svenska konsonantljuden : talets allmänna svängningsstruktur", LM Ericsson protokoll H/P 1064.</ref>

An alternative expression of the formula, which does not depend on the choice of logarithm base, is given in Fant (1968):<ref>Fant, Gunnar. (1968). Analysis and synthesis of speech processes. In B. Malmberg (ed.), ''Manual of phonetics'' (pp. 173–177).
Amsterdam: North-Holland.</ref><ref>{{cite book | title = Techniques in speech acoustics | author1 = Jonathan Harrington | author2 = Steve Cassidy | publisher = Springer | year = 1999 | isbn = 978-0-7923-5731-5 | page = 18 | url = https://books.google.com/books?id=E1SyZZN8WQkC&pg=PA18 }}</ref>
<math display="block">m = \frac{1000}{\log 2} \log\left(1 + \frac{f}{1000}\right).</math>

In 1976, [[John Makhoul|Makhoul]] and Cosell published the now-popular version with the 700 Hz corner frequency.<ref>{{Cite book | title = ICASSP '76. IEEE International Conference on Acoustics, Speech, and Signal Processing | chapter = LPCW: An LPC vocoder with linear predictive spectral warping | author1 = John Makhoul | author2 = Lynn Cosell | volume = 1 | publisher = IEEE | pages = 466–469 | year = 1976 | author1-link = John Makhoul | doi = 10.1109/ICASSP.1976.1170013 }}</ref> As Ganchev et al. have observed, "The formulae [with 700], when compared to [Fant's with 1000], provide a closer approximation of the Mel scale for frequencies below 1000 Hz, at the price of higher inaccuracy for frequencies higher than 1000 Hz."<ref>{{citation | work = Proceedings of the SPECOM-2005 | title = Comparative evaluation of various MFCC implementations on the speaker verification task | author1 = T. Ganchev | author2 = N. Fakotakis | author3 = G. Kokkinakis | pages = 191–194 | year = 2005 | citeseerx = 10.1.1.75.8303 }}</ref> Above 7 kHz, however, the situation is reversed, and the 700 Hz version again fits better.

The data that motivate some of these formulas are tabulated in Beranek (1949), as measured from the curves of Stevens and Volkmann:<ref>Beranek, Leo L. (1949). ''Acoustic measurements''. New York: McGraw-Hill.</ref>
{| class="wikitable" style="text-align:center"
|+ Beranek 1949 mel scale data from Stevens and Volkmann 1940
|-
! Hz
| 20 || 160 || 394 || 670 || 1000 || 1420 || 1900 || 2450 || 3120 || 4000 || 5100 || 6600 || 9000 || 14000
|-
! mel
| 0 || 250 || 500 || 750 || 1000 || 1250 || 1500 || 1750 || 2000 || 2250 || 2500 || 2750 || 3000 || 3250
|}

A formula with a break frequency of 625 Hz is given by Lindsay & Norman (1977);<ref>Lindsay, Peter H.; & Norman, Donald A. (1977). ''Human information processing: An introduction to psychology'' (2nd ed.). New York: Academic Press.</ref> the formula does not appear in their 1972 first edition:
<math display="block">m = 2410 \log_{10}(0.0016 f + 1).</math>
For direct comparison with other formulae, this is equivalent to
<math display="block">m = 2410 \log_{10}\left(1 + \frac{f}{625}\right).</math>

Most mel-scale formulas are scaled to give 1000 mels at 1000 Hz, either exactly or to within the rounding of the leading constant. The break frequency (e.g. 700 Hz, 1000 Hz, or 625 Hz) is therefore the only free parameter in the usual form of the formula. Some non-mel auditory-frequency-scale formulas use the same form but with a much lower break frequency, not necessarily mapping to 1000 at 1000 Hz; for example the [[Equivalent rectangular bandwidth|ERB-rate]] scale of Glasberg and Moore (1990) uses a break point of 228.8 Hz,<ref>B. C. J. Moore and B. R. Glasberg, "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns", Journal of the Acoustical Society of America 74: 750–753, 1983.</ref> and the cochlear frequency–place map of Greenwood (1990) uses 165.3 Hz.<ref>Greenwood, D. D. (1990). A cochlear frequency–position function for several species—29 years later. ''The Journal of the Acoustical Society of America'', 87, 2592–2605.</ref>

Other functional forms for the mel scale have been explored by Umesh et al., who point out that the traditional formulas with a logarithmic region and a linear region do not fit the data from Stevens and Volkmann's curves as well as some other forms do. Their comparison is based on the following table of measurements that they made from those curves:<ref>{{cite conference | conference = Proc.
ICASSP 1999 | url=https://www.researchgate.net/publication/3793925_Fitting_the_Mel_scale | doi=10.1109/ICASSP.1999.758101 | title = Fitting the mel scale | author1 = Umesh, S. | author2 = Cohen, L. | author3 = Nelson, D. | pages = 217–220 | isbn = 978-0-7803-5041-0 | year = 1999 }}</ref>
{| class="wikitable" style="text-align:center"
|+ Umesh et al. 1999 mel scale data from Stevens and Volkmann 1940
|-
! Hz
| 40 || 161 || 200 || 404 || 693 || 867 || 1000 || 2022 || 3000 || 3393 || 4109 || 5526 || 6500 || 7743 || 12000
|-
! mel
| 43 || 257 || 300 || 514 || 771 || 928 || 1000 || 1542 || 2000 || 2142 || 2314 || 2600 || 2771 || 2914 || 3228
|}

[[Malcolm Slaney|Slaney]]'s MATLAB Auditory Toolbox agrees with Umesh et al. and uses the following two-piece fit, although it does not follow the "1000 mels at 1000 Hz" convention:<ref>Slaney, M. Auditory Toolbox: A MATLAB Toolbox for Auditory Modeling Work. Technical Report, version 2, Interval Research Corporation, 1998. Translated to Python in [https://librosa.org/doc/0.10.0/_modules/librosa/core/convert.html#hz_to_mel librosa] ([https://librosa.org/doc/0.10.0/generated/librosa.mel_frequencies.html librosa documentation]).</ref>
<math display="block"> m(f) = \begin{cases} \dfrac{3f}{200}, & f < 1000, \\ 15 + 27 \log_{6.4} \left(\dfrac{f}{1000}\right), & f \geq 1000. \end{cases} </math>

==Applications==
The first version of [[Google]]'s [[Lyra (codec)|Lyra codec]] uses ''log mel spectrograms'' as the feature-extraction step. The transmitted data is a [[vector quantization|vector-quantized]] form of the spectrogram, which is then synthesized back to speech by a neural network.
Use of the mel scale is believed to weight the data in a way appropriate to human perception.<ref>{{cite web |title=Lyra: A New Very Low-Bitrate Codec for Speech Compression |url=https://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-codec-for.html |website=ai.googleblog.com |language=en |date=25 February 2021}} See also: {{arXiv|2102.11906}}, {{arXiv|2102.09660}}.</ref> MelGAN takes a similar approach.<ref>{{cite journal |last1=Kumar |first1=Kundan |last2=Kumar |first2=Rithesh |last3=de Boissiere |first3=Thibault |last4=Gestin |first4=Lucas |last5=Teoh |first5=Wei Zhen |last6=Sotelo |first6=Jose |last7=de Brebisson |first7=Alexandre |last8=Bengio |first8=Yoshua |last9=Courville |first9=Aaron |title=MelGAN: generative adversarial networks for conditional waveform synthesis |journal=Proceedings of the 33rd International Conference on Neural Information Processing Systems |date=8 December 2019 |pages=14910–14921 |url=https://dl.acm.org/doi/abs/10.5555/3454287.3455622 |publisher=Curran Associates Inc.}}</ref>
{{See also|Mel-frequency cepstrum#Applications}}

==Criticism==
Stevens' student Donald D. Greenwood, who had worked on the mel scale experiments in 1956, considers the scale biased by experimental flaws. In 2009 he posted to a mailing list:<ref>{{Cite web |url=http://lists.mcgill.ca/scripts/wa.exe?A2=ind0907d&L=auditory&P=389 |title=Archived copy |access-date=2012-12-12 |archive-date=2013-02-08 |archive-url=https://web.archive.org/web/20130208164732/http://lists.mcgill.ca/scripts/wa.exe?A2=ind0907d&L=auditory&P=389 |url-status=dead }}</ref>
{{blockquote|I would ask, why use the Mel scale now, since it appears to be biased? If anyone wants a Mel scale, they should do it over, controlling carefully for order bias and using plenty of subjects{{snd}} more than in the past{{snd}} and using both musicians and non-musicians to search for any differences in performance that may be governed by musician/non-musician differences or subject differences generally.}}

==See also==
*[[Bark scale]]
*[[Mel-frequency cepstrum]]
*[[Fletcher–Munson curves]]

==References==
{{reflist}}

==External links==
*{{Commons category-inline}}
*{{cite journal |title=A scale for the measurement of the psychological magnitude pitch |last1=Volkmann |first1=J |last2=Stevens |first2=SS |last3=Newman |first3=EB |journal=The Journal of the Acoustical Society of America |volume=8 |issue=3 |pages=208 |date=1937 |doi=10.1121/1.1901999 |bibcode=1937ASAJ....8..208V |doi-access=free }}
*[http://www.sfu.ca/sonic-studio-webdav/handbook/Mel.html Handbook for Acoustic Ecology]

{{Acoustics}}

[[Category:Scales]]
[[Category:Psychoacoustics]]