Harmonic Vector Excitation Coding
{{short description|Audio compression standard}}
'''Harmonic Vector Excitation Coding''' ('''HVXC''') is a [[speech coding]] [[algorithm]] specified in the [[MPEG-4 Part 3]] (MPEG-4 Audio) standard for very low [[bit rate]] speech coding. HVXC supports bit rates of 2 and 4 kbit/s in fixed and [[variable bit rate]] modes at a [[sampling frequency]] of 8 kHz. It also operates at lower bit rates, such as 1.2–1.7 kbit/s, using a variable bit rate technique.<ref name="mpeg4audio-version4-2009">{{Citation | url=http://webstore.iec.ch/preview/info_isoiec14496-3%7Bed4.0%7Den.pdf | title=ISO/IEC 14496-3:2009 - Information technology -- Coding of audio-visual objects -- Part 3: Audio | format=PDF | author=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] | publisher=IEC | date=2009-09-01 | access-date=2009-10-07}}</ref> The total algorithmic [[Latency (audio)|delay]] for the encoder and decoder is 36 ms.<ref name="hvxc">{{Citation | url=http://www.jstage.jst.go.jp/article/ast/27/6/27_6_375/_pdf | title=Harmonic vector excitation coding of speech | author=Masayuki Nishiguchi | publisher=Acoustical Science and Technology | format=PDF | date=2006-04-17 | access-date=2009-10-09}}</ref>

It was published as subpart 2 of [[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] 14496-3:1999 (MPEG-4 Audio) in 1999.<ref name="mpeg4audio">{{Cite web | url=http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=25035 | title=ISO/IEC 14496-3:1999 - Information technology -- Coding of audio-visual objects -- Part 3: Audio | author=[[International Organization for Standardization|ISO]] | publisher=ISO | year=1999 | access-date=2009-10-09}}</ref> An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000).<ref name="mpeg4audio-iso-2-amd">{{Cite web | url=http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=31568 | title=ISO/IEC 14496-3:1999/Amd 1:2000 - Audio extensions | author=[[International Organization for Standardization|ISO]] | publisher=ISO | year=2000 | access-date=2009-10-07}}</ref><ref name="mpeg4audio-version2">{{Cite FTP | url=ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/mpeg4/documents/w2803/w2803_n.pdf | title=ISO/IEC 14496-3:/Amd.1 - Final Committee Draft - MPEG-4 Audio Version 2 | format=PDF | author=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] JTC 1/SC 29/WG 11 | date=July 1999 | access-date=2009-10-07 | url-status=dead | server=FTP server }}</ref>

The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP ([[Code Excited Linear Prediction]]). HVXC is used at the low bit rates of 2 and 4 kbit/s. CELP covers 3.85 kbit/s and bit rates above 4 kbit/s.<ref name="speech-coding-chiariglione">{{Cite web | url=http://www.mp3-tech.org/programmer/docs/audio.pdf| title=MPEG-4 Natural Audio Coding - Natural Speech Coding Tools |author1=Karlheinz Brandenburg |author2=Oliver Kunz |author3=Akihiko Sugiyama | access-date=2013-03-25}}</ref>

==Technology==
===Linear Predictive Coding===
HVXC uses [[Linear predictive coding]] (LPC) with block-wise adaptation every 20 ms.<ref name="hvxc" /> The LPC parameters are transformed into [[Line spectral pairs|Line spectral pair]] (LSP) coefficients, which are jointly quantized.<ref name="hvxc" /> The LPC residual signal is classified as either [[voiced]] or [[unvoiced]]. For voiced speech, the residual is coded in a parametric representation (operating as a [[vocoder]]), while for unvoiced speech, the residual waveform is quantized (thus operating as a hybrid speech codec).
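The block-wise LPC analysis described above can be illustrated with a minimal sketch. It is not the procedure from the standard: it assumes the textbook autocorrelation method with the Levinson-Durbin recursion, one 160-sample block (20 ms at 8 kHz), and a prediction order of 10. The function names are hypothetical, and windowing, LSP conversion and quantization are omitted.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis block."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) with a[0] == 1.0, so that the prediction residual is
    e[n] = sum(a[k] * x[n - k] for k in 0..order), and err is the
    minimized prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_residual(frame, a):
    """Filter the block with A(z); samples before the block count as zero."""
    order = len(a) - 1
    return [sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(frame))]
```

Calling `levinson_durbin(autocorrelation(frame, 10), 10)` on a 160-sample block yields eleven filter coefficients, and `lpc_residual(frame, a)` gives the residual that the voiced and unvoiced coding stages would then operate on.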
===Voiced (Harmonic) Residual Coding===
In voiced segments, the residual signal is represented by two parameters: the pitch period and the spectral envelope.<ref name="hvxc" /> The pitch period is estimated from the peak values of the [[autocorrelation]] of the residual signal.<ref name="hvxc" /> In this process, the residual signal is compared against shifted copies of itself, and the shift that yields the greatest similarity, measured by linear dependence, is identified as the pitch period.

The spectral envelope is represented by a set of amplitude values, one per [[harmonic]].<ref name="hvxc" /> To extract these values, the LPC residual signal is [[linear transform|transformed]] into the [[Discrete Fourier transform|DFT]] domain.<ref name="hvxc" /> The DFT spectrum is segmented into bands, one band per harmonic. The frequency band for the m-th harmonic consists of the DFT coefficients from (m−1/2)ω<sub>0</sub> to (m+1/2)ω<sub>0</sub>, where ω<sub>0</sub> is the pitch frequency.<ref name="hvxc" /> The amplitude value for the m-th harmonic is chosen to optimally represent these DFT coefficients.<ref name="hvxc" /> Phase information is discarded in this process. The spectral envelope is then coded using variable-dimension weighted [[vector quantization]]. This process is also referred to as '''Harmonic VQ'''.

To make speech with a mixture of voiced and unvoiced excitation sound more natural and smooth, three modes of voiced speech (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) are distinguished.<ref name="hvxc" /> The degree of voicing is determined by the value of the normalized autocorrelation function at a shift of one pitch period. Depending on the chosen mode, different amounts of band-pass [[Gaussian noise]] are added to the synthesized harmonic signal by the decoder.
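The two analysis steps of this section, the autocorrelation pitch search and the per-harmonic band segmentation, can be sketched as follows. This is an illustration under stated assumptions, not the standard's procedure: the lag range of 20 to 147 samples, the tie-breaking margin that favours shorter lags, and the use of the RMS magnitude as the per-band amplitude are all choices made here, and the function names are hypothetical.

```python
import cmath
import math


def estimate_pitch_lag(residual, min_lag=20, max_lag=147, margin=1e-9):
    """Return (lag, score): the shift with the highest normalized
    autocorrelation. A longer lag replaces a shorter one only if it is
    better by more than `margin` (assumed guard against pitch doubling)."""
    n = len(residual)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(residual[i] * residual[i + lag] for i in range(n - lag))
        e1 = sum(residual[i] ** 2 for i in range(n - lag))
        e2 = sum(residual[i + lag] ** 2 for i in range(n - lag))
        den = math.sqrt(e1 * e2)
        score = num / den if den > 0.0 else 0.0
        if score > best_score + margin:
            best_lag, best_score = lag, score
    return best_lag, best_score


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 (phase is discarded)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def harmonic_amplitudes(frame, lag):
    """One amplitude per harmonic m, taken over the DFT bins that lie
    between (m - 1/2) * w0 and (m + 1/2) * w0, with w0 = 2 * pi / lag."""
    n = len(frame)
    mags = dft_magnitudes(frame)
    amps = []
    for m in range(1, lag // 2 + 1):        # harmonics up to Nyquist
        # Integer ceiling of (m -/+ 1/2) * n / lag, expressed in DFT bins.
        lo = ((2 * m - 1) * n + 2 * lag - 1) // (2 * lag)
        hi = min(((2 * m + 1) * n + 2 * lag - 1) // (2 * lag), n // 2 + 1)
        band = mags[lo:hi]
        if band:
            # RMS magnitude as a stand-in for the optimal amplitude.
            amps.append(math.sqrt(sum(v * v for v in band) / len(band)))
    return amps
```

For a 256-sample residual frame, `estimate_pitch_lag(frame)` returns the lag together with its normalized autocorrelation, which corresponds to the voicing measure mentioned above, and `harmonic_amplitudes(frame, lag)` returns the variable-length amplitude vector that Harmonic VQ would then quantize.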
===Voiceless (VXC) Residual Coding===
Unvoiced segments are encoded according to the [[CELP]] scheme, which is also referred to as '''vector excitation coding''' (VXC).<ref name="hvxc" /> The CELP coding in HVXC is performed using only a stochastic codebook. Other CELP codecs additionally use a dynamic codebook to perform [[Long Term Prediction|long-term prediction]] of voiced segments. However, since HVXC does not use CELP for voiced segments, the dynamic codebook is omitted from the design.

==See also==
* [[Opus (audio format)]]

==References==
{{Reflist}}

{{Compression formats}}

[[Category:MPEG-4]]
[[Category:Speech codecs]]