==Theory==
{{Unreferenced section|date=July 2019}}
The [[human voice]] consists of sounds generated by the periodic opening and closing of the [[glottis]] by the [[vocal cords]], which produces an acoustic waveform with many [[harmonic]]s. This initial sound is then [[Audio filter|filter]]ed by the nose, mouth and throat (a complicated [[resonant]] piping system known as the [[vocal tract]]), whose movements shape the harmonic content ([[formant]]s) in a controlled way, creating the wide variety of sounds used in speech. Another set of sounds, the [[unvoiced]] and [[plosive]] sounds, is created or modified by various disruptions of the airflow in the vocal tract.

The vocoder analyzes speech by measuring how its [[spectral energy distribution]] changes over time. This analysis results in a set of parallel [[Envelope (waves)|envelope]] signals, each representing the amplitude of one [[frequency band]] of the user's speech. Put another way, the voice signal is divided into a number of frequency bands (the larger this number, the more accurate the analysis), and the signal level in each band, measured simultaneously by an [[Envelope detector|envelope follower]], represents the spectral energy distribution over time. This set of envelope amplitude signals is called the [[Modulation|"modulator"]].

To recreate speech, the vocoder reverses the analysis process: it passes a broadband excitation signal such as noise (referred to alternately as the "source" or "carrier") through a set of [[Band-pass filter|band-pass filters]] whose output levels are controlled, in real time, by the corresponding envelope signals from the modulator.

The digital encoding process involves a periodic analysis of each of the modulator's band envelope amplitudes.
This analysis produces a [[Pulse-code modulation|pulse-code modulation]] (PCM) stream for each band. The PCM streams of all the bands are then transmitted to a decoder, which applies them as control signals to the amplifiers of the corresponding output filter channels.

Information about the [[fundamental frequency]] of the original voice signal (as distinct from its spectral characteristics) is discarded; preserving it was not important for the vocoder's original use as an encryption aid. It is this dehumanizing aspect of the vocoding process that has made it useful for creating special voice effects in popular music and audio entertainment.

Instead of transmitting the waveform point by point, the vocoder sends only the parameters of the vocal model over the communication link. Because these parameters change slowly compared with the original speech waveform, they need less bandwidth than the waveform itself. This allows more speech channels to share a given [[communication channel]], such as a radio channel or a [[Submarine communications cable|submarine cable]].

Analog vocoders typically analyze an incoming signal by splitting it into multiple tuned frequency bands. To reconstruct the signal, a [[carrier signal]] is sent through a matching series of tuned band-pass filters. For a typical robot voice, the carrier is noise or a [[sawtooth waveform]]. There are usually between 8 and 20 bands. The modulator's amplitude in each analysis band generates a voltage that controls the amplifier for the corresponding carrier band. As a result, the frequency content of the modulating signal is imposed on the carrier as independent amplitude changes in each frequency band.

Often there is an unvoiced band or [[sibilance]] channel.
This covers frequencies that lie outside the analysis bands for typical speech but are still important to it, such as those in words that begin with ''s'', ''f'', ''ch'' or other sibilant or fricative sounds. Using this band produces recognizable, if somewhat mechanical-sounding, speech. Vocoders often include a second system for generating unvoiced sounds, which uses a [[noise generator]] instead of the fundamental frequency; its output is mixed with the carrier output to increase clarity.

The channel vocoder algorithm keeps only the [[amplitude]] component of each band's [[analytic signal]] and ignores the [[instantaneous phase|phase]] component, which tends to make the voice unclear; for methods of rectifying this, see [[phase vocoder]].
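The analysis and resynthesis steps described above can be illustrated with a minimal, unoptimized sketch in Python using only the standard library. It is not a model of any particular instrument: each band-pass filter is a single second-order section taken from Robert Bristow-Johnson's Audio EQ Cookbook, and the 16 kHz sample rate, the 16 logarithmically spaced bands between 200 Hz and 5 kHz, the filter Q of 4, the 50 Hz envelope-follower cutoff and the 110 Hz sawtooth carrier are assumed, illustrative values. The helper names (`bandpass_coeffs`, `biquad`, `envelope`, `channel_vocoder`, `sawtooth`) are hypothetical.

```python
import math


def bandpass_coeffs(fc, q, fs):
    """Band-pass biquad coefficients (RBJ Audio EQ Cookbook, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, coeffs):
    """Filter a list of samples with a direct-form I biquad."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(x, fs, cutoff=50.0):
    """Envelope follower: full-wave rectification followed by a one-pole low-pass."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    level, env = 0.0, []
    for s in x:
        level += k * (abs(s) - level)
        env.append(level)
    return env


def channel_vocoder(modulator, carrier, fs, centers, q=4.0):
    """Impose the modulator's per-band envelopes on the same bands of the carrier."""
    out = [0.0] * len(modulator)
    for fc in centers:
        coeffs = bandpass_coeffs(fc, q, fs)
        env = envelope(biquad(modulator, coeffs), fs)  # analysis: level of this band in the voice
        band = biquad(carrier, coeffs)                  # synthesis: same band of the carrier
        for i, (e, c) in enumerate(zip(env, band)):
            out[i] += e * c                             # the envelope acts as this band's amplifier gain
    return out


# Assumed settings: 16 kHz sampling and 16 log-spaced band centers from 200 Hz to 5 kHz.
FS = 16000
CENTERS = [200.0 * (5000.0 / 200.0) ** (i / 15) for i in range(16)]


def sawtooth(freq, n, fs=FS):
    """Naive (non-band-limited) sawtooth carrier in the range [-1, 1)."""
    return [2.0 * ((freq * t / fs) % 1.0) - 1.0 for t in range(n)]
```

To process a recording, pass its samples (a list of floats at 16 kHz) as `modulator` and `sawtooth(110.0, len(modulator))` as `carrier`. Because a single biquad per band has gentle skirts, adjacent bands overlap considerably; practical designs generally use steeper filters.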
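The digital encoding and the bandwidth saving described above can be sketched as follows. Each band envelope is sampled once per frame and quantized to an integer code, giving one PCM stream per band; the decoder converts the codes back into gain signals for the corresponding output filter channels. The frame rate, bit depth and number of bands used here (50 frames per second, 6 bits, 16 bands) are hypothetical figures chosen only to make the arithmetic concrete, and the function names are illustrative.

```python
def encode_frames(envelopes, hop, bits=6, full_scale=1.0):
    """Sample every band envelope once per `hop` samples and quantize it to a `bits`-bit code."""
    top = (1 << bits) - 1  # largest code, e.g. 63 for 6 bits
    frames = []
    for start in range(0, len(envelopes[0]), hop):
        frame = []
        for env in envelopes:
            level = min(max(env[start] / full_scale, 0.0), 1.0)  # clip to [0, 1]
            frame.append(round(level * top))
        frames.append(frame)
    return frames  # frames[i][b] is sample i of band b's PCM stream


def decode_frames(frames, hop, bits=6, full_scale=1.0):
    """Turn codes back into per-band gain signals, holding each value for `hop` samples."""
    top = (1 << bits) - 1
    gains = [[] for _ in frames[0]]
    for frame in frames:
        for band, code in enumerate(frame):
            gains[band].extend([code / top * full_scale] * hop)
    return gains


def vocoder_bit_rate(n_bands, frame_rate, bits):
    """Bits per second needed to send every band's code at the given frame rate."""
    return n_bands * frame_rate * bits


waveform_rate = 8000 * 8                        # 8-bit PCM of the waveform at 8 kHz: 64000 bit/s
parameter_rate = vocoder_bit_rate(16, 50, 6)    # hypothetical vocoder parameters: 4800 bit/s
```

Under these assumptions the parameter streams need 4,800 bit/s, roughly a thirteenth of the 64,000 bit/s needed to send the waveform itself as 8-bit PCM at 8 kHz (the rate of standard G.711 telephony). With envelopes computed at a 16 kHz sample rate, 50 frames per second corresponds to `hop=320`.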
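The unvoiced (sibilance) channel and the noise generator described above can be combined in a simple way: a noise generator whose level follows the modulator's energy above the analysis bands, mixed into the vocoder output. The sketch below is a hypothetical illustration, not a standard circuit; the first-order high-pass filter, the assumed 5 kHz cutoff, the 50 Hz envelope cutoff, the `gain` parameter and the function names are all assumptions.

```python
import math
import random


def one_pole_highpass(x, fs, cutoff):
    """First-order (RC-style) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff)
    a = rc / (rc + 1.0 / fs)
    out, y, prev = [], 0.0, 0.0
    for s in x:
        y = a * (y + s - prev)
        prev = s
        out.append(y)
    return out


def envelope(x, fs, cutoff=50.0):
    """Envelope follower: full-wave rectification followed by a one-pole low-pass."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    level, env = 0.0, []
    for s in x:
        level += k * (abs(s) - level)
        env.append(level)
    return env


def add_unvoiced_channel(vocoded, modulator, fs, cutoff=5000.0, gain=1.0, seed=0):
    """Mix white noise, gated by the modulator's high-frequency level, into the vocoder output."""
    rng = random.Random(seed)  # seeded noise generator so that runs are repeatable
    hiss_level = envelope(one_pole_highpass(modulator, fs, cutoff), fs)
    return [v + gain * e * rng.uniform(-1.0, 1.0) for v, e in zip(vocoded, hiss_level)]
```

When the modulator has no high-frequency content (for example during a vowel or silence), the gate stays closed and the vocoder output passes through almost unchanged; a hissing ''s'' opens it and adds noise.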
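The amplitude and phase components mentioned above can be made concrete. Writing a band signal as <math>x(t) = A(t)\cos\varphi(t)</math>, its analytic signal is <math>z(t) = A(t)\,e^{i\varphi(t)}</math>; a channel vocoder keeps only the envelope <math>A(t) = |z(t)|</math> and ignores <math>\varphi(t)</math>. The sketch below computes the analytic signal of a short frame with a naive discrete Fourier transform, zeroing negative frequencies and doubling positive ones (the same frequency-domain construction used by SciPy's `scipy.signal.hilbert`). The function name `analytic_signal` and the test signal (amplitude 0.5, phase offset 0.3 rad, 5 cycles in 64 samples) are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(n^2) DFT (adequate for short frames)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC (and the Nyquist bin for even n), double positive frequencies, zero negative ones.
    weight = [0.0] * n
    weight[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weight[k] = 2.0
    if n % 2 == 0:
        weight[n // 2] = 1.0
    return [sum(spectrum[k] * weight[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


N = 64
band = [0.5 * math.cos(2 * math.pi * 5 * t / N + 0.3) for t in range(N)]
z = analytic_signal(band)
amplitude = [abs(v) for v in z]        # the envelope, which a channel vocoder keeps
phase = [cmath.phase(v) for v in z]    # the instantaneous phase, which it ignores
rebuilt = [a * math.cos(p) for a, p in zip(amplitude, phase)]  # both together recover the band
```

Only the amplitude and phase together reproduce the band exactly; a resynthesis that keeps only the amplitude must take its phase from the carrier instead, which is the loss that the phase vocoder addresses.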