==Design==

===Granular distortion and overload distortion===
Often the design of a quantizer involves supporting only a limited range of possible output values and performing clipping to limit the output to this range whenever the input exceeds the supported range. The error introduced by this clipping is referred to as ''overload'' distortion. Within the extreme limits of the supported range, the amount of spacing between the selectable output values of a quantizer is referred to as its ''granularity'', and the error introduced by this spacing is referred to as ''granular'' distortion. It is common for the design of a quantizer to involve determining the proper balance between granular distortion and overload distortion. For a given supported number of possible output values, reducing the average granular distortion may involve increasing the average overload distortion, and vice versa. A technique for controlling the amplitude of the signal (or, equivalently, the quantization step size <math>\Delta</math>) to achieve the appropriate balance is the use of ''[[automatic gain control]]'' (AGC). However, in some quantizer designs, the concepts of granular error and overload error may not apply (e.g., for a quantizer with a limited range of input data or with a countably infinite set of selectable output values).<ref name=GrayNeuhoff/>

===Rate–distortion quantizer design===
A scalar quantizer, which performs a quantization operation, can ordinarily be decomposed into two stages:

;Classification
:A process that classifies the input signal range into <math>M</math> non-overlapping ''[[interval (mathematics)|intervals]]'' <math>\{I_k\}_{k=1}^{M}</math>, by defining <math>M-1</math> ''decision boundary'' values <math>\{b_k\}_{k=1}^{M-1}</math>, such that <math>I_k = [b_{k-1}~,~b_k)</math> for <math>k = 1,2,\ldots,M</math>, with the extreme limits defined by <math>b_0 = -\infty</math> and <math>b_M = \infty</math>. All the inputs <math>x</math> that fall in a given interval range <math>I_k</math> are associated with the same quantization index <math>k</math>.

;Reconstruction
:Each interval <math>I_k</math> is represented by a ''reconstruction value'' <math>y_k</math> which implements the mapping <math>x \in I_k \Rightarrow y = y_k</math>.

These two stages together comprise the mathematical operation of <math>y = Q(x)</math>.

[[Entropy coding]] techniques can be applied to communicate the quantization indices from a source encoder that performs the classification stage to a decoder that performs the reconstruction stage. One way to do this is to associate each quantization index <math>k</math> with a binary codeword <math>c_k</math>. An important consideration is the number of bits used for each codeword, denoted here by <math>\mathrm{length}(c_k)</math>. As a result, the design of an <math>M</math>-level quantizer and an associated set of codewords for communicating its index values requires finding the values of <math>\{b_k\}_{k=1}^{M-1}</math>, <math>\{c_k\}_{k=1}^{M}</math> and <math>\{y_k\}_{k=1}^{M}</math> which optimally satisfy a selected set of design constraints such as the ''bit rate'' <math>R</math> and ''distortion'' <math>D</math>.

Assuming that an information source <math>S</math> produces random variables <math>X</math> with an associated PDF <math>f(x)</math>, the probability <math>p_k</math> that the random variable falls within a particular quantization interval <math>I_k</math> is given by:
:<math>p_k = P[x \in I_k] = \int_{b_{k-1}}^{b_k} f(x)\,dx</math>.
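As a concrete illustration of the two stages, the following sketch implements classification and reconstruction in NumPy. All numerical values here are arbitrary examples, not taken from the cited sources; inputs beyond the outermost decision boundaries are absorbed by the end intervals, which is where the overload distortion discussed above arises.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical 4-level (M = 4) quantizer; b_0 = -inf and b_M = +inf are implicit.
b = np.array([-1.0, 0.0, 1.0])        # decision boundaries {b_k}, k = 1..M-1
y = np.array([-1.5, -0.5, 0.5, 1.5])  # reconstruction values {y_k}, k = 1..M

def classify(x):
    """Classification stage: index k such that x lies in I_k = [b_{k-1}, b_k)."""
    return np.digitize(x, b)          # 0-based indices 0..M-1

def reconstruct(k):
    """Reconstruction stage: map each index k to its reconstruction value y_k."""
    return y[k]

x = np.array([-2.7, -0.2, 0.4, 3.1])
xq = reconstruct(classify(x))          # y = Q(x)
# The errors for -2.7 and 3.1 are overload distortion (the end intervals absorb
# all out-of-range inputs); the errors for -0.2 and 0.4 are granular distortion.
print(xq, x - xq)
</syntaxhighlight>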
The resulting bit rate <math>R</math>, in units of average bits per quantized value, for this quantizer can be derived as follows:
:<math>R = \sum_{k=1}^{M} p_k \cdot \mathrm{length}(c_k) = \sum_{k=1}^{M} \mathrm{length}(c_k) \int_{b_{k-1}}^{b_k} f(x)\,dx</math>.

If it is assumed that distortion is measured by mean squared error,{{efn|Other distortion measures can also be considered, although mean squared error is a popular one.}} the distortion <math>D</math> is given by:
:<math>D = E[(x-Q(x))^2] = \int_{-\infty}^{\infty} (x-Q(x))^2 f(x)\,dx = \sum_{k=1}^{M} \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)\,dx</math>.

A key observation is that the rate <math>R</math> depends on the decision boundaries <math>\{b_k\}_{k=1}^{M-1}</math> and the codeword lengths <math>\{\mathrm{length}(c_k)\}_{k=1}^{M}</math>, whereas the distortion <math>D</math> depends on the decision boundaries <math>\{b_k\}_{k=1}^{M-1}</math> and the reconstruction levels <math>\{y_k\}_{k=1}^{M}</math>.

After defining these two performance metrics for the quantizer, a typical rate–distortion formulation for a quantizer design problem can be expressed in one of two ways:
# Given a maximum distortion constraint <math>D \le D_\max</math>, minimize the bit rate <math>R</math>
# Given a maximum bit rate constraint <math>R \le R_\max</math>, minimize the distortion <math>D</math>

Often the solution to these problems can be equivalently (or approximately) expressed and solved by converting the formulation to the unconstrained problem <math>\min\left\{ D + \lambda \cdot R \right\}</math>, where the [[Lagrange multiplier]] <math>\lambda</math> is a non-negative constant that establishes the appropriate balance between rate and distortion. Solving the unconstrained problem is equivalent to finding a point on the [[convex hull]] of the family of solutions to an equivalent constrained formulation of the problem. However, finding a solution – especially a [[Closed-form expression|closed-form]] solution – to any of these three problem formulations can be difficult. Solutions that do not require multi-dimensional iterative optimization techniques have been published for only three PDFs: the uniform,<ref>{{cite journal | last1=Farvardin | first1=N. |author-link=Nariman Farvardin| last2=Modestino | first2=J. | title=Optimum quantizer performance for a class of non-Gaussian memoryless sources | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=30 | issue=3 | year=1984 | issn=0018-9448 | doi=10.1109/tit.1984.1056920 | pages=485–497}} (Section VI.C and Appendix B)</ref> [[Exponential distribution|exponential]],<ref name=SullivanIT>{{cite journal | last=Sullivan | first=G.J. |author-link=Gary Sullivan (engineer)| title=Efficient scalar quantization of exponential and Laplacian random variables | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=42 | issue=5 | year=1996 | issn=0018-9448 | doi=10.1109/18.532878 | pages=1365–1374}}</ref> and [[Laplace distribution|Laplacian]]<ref name=SullivanIT/> distributions. Iterative optimization approaches can be used to find solutions in other cases.<ref name=GrayNeuhoff/><ref name=Berger72>{{cite journal | last=Berger | first=T. |author-link=Toby Berger| title=Optimum quantizers and permutation codes | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=18 | issue=6 | year=1972 | issn=0018-9448 | doi=10.1109/tit.1972.1054906 | pages=759–765}}</ref><ref name=Berger82>{{cite journal | last=Berger | first=T. |author-link=Toby Berger| title=Minimum entropy quantizers and permutation codes | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=28 | issue=2 | year=1982 | issn=0018-9448 | doi=10.1109/tit.1982.1056456 | pages=149–157}}</ref>
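For illustration, the rate and distortion integrals above can be evaluated numerically. The following sketch assumes a hypothetical standard Gaussian source together with example boundaries, reconstruction values, and codeword lengths (chosen to satisfy the Kraft inequality), and evaluates <math>p_k</math>, <math>R</math>, <math>D</math>, and the Lagrangian cost <math>D + \lambda \cdot R</math>; none of these choices come from the cited references.

<syntaxhighlight lang="python">
import numpy as np

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # hypothetical Gaussian PDF

b = np.array([-1.0, 0.0, 1.0])            # example decision boundaries
y = np.array([-1.5, -0.5, 0.5, 1.5])      # example reconstruction values
length = np.array([3, 1, 2, 3])           # example length(c_k); satisfies Kraft

edges = np.concatenate(([-8.0], b, [8.0]))  # +/-8 stands in for +/-infinity

def integrate(g, lo, hi, n=10_001):
    """Trapezoidal-rule approximation of the integral of g over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    gx = g(x)
    return (gx[:-1] + gx[1:]).sum() * (x[1] - x[0]) / 2

# p_k, the average rate R, the MSE distortion D, and the Lagrangian cost J.
p = np.array([integrate(f, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])])
R = np.sum(p * length)
D = sum(integrate(lambda x, yk=yk: (x - yk)**2 * f(x), lo, hi)
        for yk, lo, hi in zip(y, edges[:-1], edges[1:]))
lam = 0.1                                  # Lagrange multiplier (arbitrary)
J = D + lam * R
print(f"R = {R:.3f} bits/value, D = {D:.4f}, J = {J:.4f}")
</syntaxhighlight>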
Note that the reconstruction values <math>\{y_k\}_{k=1}^{M}</math> affect only the distortion – they do not affect the bit rate – and that each individual <math>y_k</math> makes a separate contribution <math>d_k</math> to the total distortion as shown below:
:<math>D = \sum_{k=1}^{M} d_k</math>
where
:<math>d_k = \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)\,dx</math>.

This observation can be used to ease the analysis – given the set of <math>\{b_k\}_{k=1}^{M-1}</math> values, the value of each <math>y_k</math> can be optimized separately to minimize its contribution to the distortion <math>D</math>. For the mean-square error distortion criterion, it can be easily shown that the optimal set of reconstruction values <math>\{y^*_k\}_{k=1}^{M}</math> is given by setting the reconstruction value <math>y_k</math> within each interval <math>I_k</math> to the [[conditional expected value]] (also referred to as the ''[[centroid]]'') within the interval, as given by:
:<math>y^*_k = \frac1{p_k} \int_{b_{k-1}}^{b_k} x f(x)\,dx</math>.

The use of sufficiently well-designed entropy coding techniques can result in the use of a bit rate that is close to the true information content of the indices <math>\{k\}_{k=1}^{M}</math>, such that effectively
:<math>\mathrm{length}(c_k) \approx -\log_2\left(p_k\right)</math>
and therefore
:<math>R = \sum_{k=1}^{M} -p_k \cdot \log_2\left(p_k\right)</math>.

The use of this approximation can allow the entropy coding design problem to be separated from the design of the quantizer itself. Modern entropy coding techniques such as [[arithmetic coding]] can achieve bit rates that are very close to the true entropy of a source, given a set of known (or adaptively estimated) probabilities <math>\{p_k\}_{k=1}^{M}</math>.

In some designs, rather than optimizing for a particular number of classification regions <math>M</math>, the quantizer design problem may include optimization of the value of <math>M</math> as well. For some probabilistic source models, the best performance may be achieved when <math>M</math> approaches infinity.
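Continuing the same hypothetical Gaussian example, the centroid condition and the entropy-rate approximation can be evaluated numerically as follows (again a sketch with arbitrary example boundaries, not an implementation from the cited references):

<syntaxhighlight lang="python">
import numpy as np

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # hypothetical Gaussian PDF
b = np.array([-1.0, 0.0, 1.0])              # example decision boundaries
edges = np.concatenate(([-8.0], b, [8.0]))  # +/-8 stands in for +/-infinity

def integrate(g, lo, hi, n=10_001):
    """Trapezoidal-rule approximation of the integral of g over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    gx = g(x)
    return (gx[:-1] + gx[1:]).sum() * (x[1] - x[0]) / 2

# Interval probabilities p_k and centroid reconstruction values y*_k.
p = np.array([integrate(f, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])])
y_star = np.array([integrate(lambda x: x * f(x), lo, hi)
                   for lo, hi in zip(edges[:-1], edges[1:])]) / p

# Entropy-rate approximation: length(c_k) ~ -log2(p_k).
R = -np.sum(p * np.log2(p))
print("centroids:", np.round(y_star, 3), " entropy rate:", round(R, 3), "bits")
</syntaxhighlight>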
===Neglecting the entropy constraint: Lloyd–Max quantization===
In the above formulation, if the bit rate constraint is neglected by setting <math>\lambda</math> equal to 0, or equivalently if it is assumed that a fixed-length code (FLC) will be used to represent the quantized data instead of a [[variable-length code]] (or some other entropy coding technology such as arithmetic coding that is better than an FLC in the rate–distortion sense), the optimization problem reduces to minimization of the distortion <math>D</math> alone.

The indices produced by an <math>M</math>-level quantizer can be coded using a fixed-length code of <math>R = \lceil \log_2 M \rceil</math> bits/symbol. For example, with <math>M = 256</math> levels, the FLC bit rate <math>R</math> is 8 bits/symbol. For this reason, such a quantizer has sometimes been called an 8-bit quantizer. However, using an FLC eliminates the compression improvement that can be obtained by use of better entropy coding.

Assuming an FLC with <math>M</math> levels, the rate–distortion minimization problem can be reduced to distortion minimization alone. The reduced problem can be stated as follows: given a source <math>X</math> with PDF <math>f(x)</math> and the constraint that the quantizer must use only <math>M</math> classification regions, find the decision boundaries <math>\{b_k\}_{k=1}^{M-1}</math> and reconstruction levels <math>\{y_k\}_{k=1}^M</math> to minimize the resulting distortion
:<math>D = E[(x-Q(x))^2] = \int_{-\infty}^{\infty} (x-Q(x))^2 f(x)\,dx = \sum_{k=1}^{M} \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)\,dx = \sum_{k=1}^{M} d_k</math>.

Finding an optimal solution to the above problem results in a quantizer sometimes called a MMSQE (minimum mean-square quantization error) solution, and the resulting PDF-optimized (non-uniform) quantizer is referred to as a ''Lloyd–Max'' quantizer, named after the two people who independently developed iterative methods<ref name=GrayNeuhoff/><ref>{{cite journal | last=Lloyd | first=S. | title=Least squares quantization in PCM | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=28 | issue=2 | year=1982 | issn=0018-9448 | doi=10.1109/tit.1982.1056489 | pages=129–137| s2cid=10833328 | citeseerx=10.1.1.131.1338 }} (work documented in a manuscript circulated for comments at [[Bell Laboratories]] with a department log date of 31 July 1957 and also presented at the 1957 meeting of the [[Institute of Mathematical Statistics]], although not formally published until 1982).</ref><ref>{{cite journal | last=Max | first=J. | title=Quantizing for minimum distortion | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=6 | issue=1 | year=1960 | issn=0018-9448 | doi=10.1109/tit.1960.1057548 | pages=7–12| bibcode=1960ITIT....6....7M }}</ref> to solve the two sets of simultaneous equations resulting from <math>{\partial D / \partial b_k} = 0</math> and <math>{\partial D / \partial y_k} = 0</math>, as follows:
:<math>{\partial D \over \partial b_k} = 0 \Rightarrow b_k = {y_k + y_{k+1} \over 2},</math>
which places each threshold at the midpoint between each pair of reconstruction values, and
:<math>{\partial D \over \partial y_k} = 0 \Rightarrow y_k = {\int_{b_{k-1}}^{b_k} x f(x)\,dx \over \int_{b_{k-1}}^{b_k} f(x)\,dx} = \frac1{p_k} \int_{b_{k-1}}^{b_k} x f(x)\,dx,</math>
which places each reconstruction value at the centroid (conditional expected value) of its associated classification interval.

[[Lloyd's algorithm|Lloyd's Method I algorithm]], originally described in 1957, can be generalized in a straightforward way for application to vector data. This generalization results in the [[Linde–Buzo–Gray algorithm|Linde–Buzo–Gray (LBG)]] or [[k-means]] classifier optimization methods. Moreover, the technique can be further generalized in a straightforward way to also include an entropy constraint for vector data.<ref name=ChouLookabaughGray>{{cite journal | last1=Chou | first1=P.A. | last2=Lookabaugh | first2=T. | last3=Gray | first3=R.M. |author-link3=Robert M. Gray| title=Entropy-constrained vector quantization | journal=IEEE Transactions on Acoustics, Speech, and Signal Processing | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=37 | issue=1 | year=1989 | issn=0096-3518 | doi=10.1109/29.17498 | pages=31–42}}</ref>
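The two conditions above suggest an alternating fixed-point iteration: hold the reconstruction values fixed and place each threshold at a midpoint, then hold the thresholds fixed and move each reconstruction value to its interval centroid. The sketch below applies this iteration to a hypothetical standard Gaussian source; the initialization, finite integration limits, and iteration count are arbitrary choices.

<syntaxhighlight lang="python">
import numpy as np

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # hypothetical Gaussian PDF

def integrate(g, lo, hi, n=10_001):
    """Trapezoidal-rule approximation of the integral of g over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    gx = g(x)
    return (gx[:-1] + gx[1:]).sum() * (x[1] - x[0]) / 2

def lloyd_max(M, lo=-8.0, hi=8.0, iters=100):
    """Alternate the midpoint and centroid conditions until they settle."""
    y = np.linspace(-2.0, 2.0, M)             # arbitrary initial levels
    for _ in range(iters):
        b = (y[:-1] + y[1:]) / 2              # thresholds at midpoints
        edges = np.concatenate(([lo], b, [hi]))
        y = np.array([integrate(lambda x: x * f(x), a, c) /
                      integrate(f, a, c)      # centroid of each interval
                      for a, c in zip(edges[:-1], edges[1:])])
    return b, y

b, y = lloyd_max(M=4)
print("boundaries:", np.round(b, 3))  # approx. [-0.982, 0.0, 0.982]
print("levels:", np.round(y, 3))      # approx. [-1.510, -0.453, 0.453, 1.510]
</syntaxhighlight>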
===Uniform quantization and the 6 dB/bit approximation===
The Lloyd–Max quantizer is actually a uniform quantizer when the input PDF is uniformly distributed over the range <math>[y_1-\Delta/2,~y_M+\Delta/2)</math>. However, for a source that does not have a uniform distribution, the minimum-distortion quantizer may not be a uniform quantizer.

The analysis of a uniform quantizer applied to a uniformly distributed source can be summarized as follows: A symmetric source <math>X</math> can be modelled with <math>f(x) = \tfrac1{2X_{\max}}</math> for <math>x \in [-X_{\max}, X_{\max}]</math> and 0 elsewhere. The step size is <math>\Delta = \tfrac{2X_{\max}}{M}</math> and the ''signal-to-quantization-noise ratio'' (SQNR) of the quantizer is
:<math>{\rm SQNR} = 10\log_{10}{\frac{\sigma_x^2}{\sigma_q^2}} = 10\log_{10}{\frac{(M\Delta)^2/12}{\Delta^2/12}} = 10\log_{10}M^2 = 20\log_{10}M</math>.

For a fixed-length code using <math>N</math> bits, <math>M = 2^N</math>, resulting in
:<math>{\rm SQNR} = 20\log_{10}2^N = N\cdot(20\log_{10}2) = N\cdot 6.0206\,{\rm dB}</math>,
or approximately 6 dB per bit. For example, for <math>N = 8</math> bits, <math>M = 256</math> levels and SQNR ≈ 8 × 6 = 48 dB; and for <math>N = 16</math> bits, <math>M = 65536</math> levels and SQNR ≈ 16 × 6 = 96 dB.

The property of 6 dB improvement in SQNR for each extra bit used in quantization is a well-known figure of merit. However, it must be used with care: this derivation is only for a uniform quantizer applied to a uniform source. For other source PDFs and other quantizer designs, the SQNR may be somewhat different from that predicted by 6 dB/bit, depending on the type of PDF, the type of source, the type of quantizer, and the bit rate range of operation.

However, it is common to assume that for many sources, the slope of a quantizer SQNR function can be approximated as 6 dB/bit when operating at a sufficiently high bit rate. At asymptotically high bit rates, cutting the step size in half increases the bit rate by approximately 1 bit per sample (because 1 bit is needed to indicate whether the value is in the left or right half of the prior double-sized interval) and reduces the mean squared error by a factor of 4 (i.e., 6 dB) based on the <math>\Delta^2/12</math> approximation.

At asymptotically high bit rates, the 6 dB/bit approximation is supported for many source PDFs by rigorous theoretical analysis.<ref name=Bennett/><ref name=OliverPierceShannon/><ref name=GishPierce/><ref name=GrayNeuhoff/> Moreover, the structure of the optimal scalar quantizer (in the rate–distortion sense) approaches that of a uniform quantizer under these conditions.<ref name=GishPierce/><ref name=GrayNeuhoff/><!-- I don't think that was proved by anyone else before it was done by Gish & Pearce in '68. For example, was it done by Koshelev in '63? (I don't think so) Zador in '66? (I don't know - probably not) Goblick & Holsinger in '67? (I don't see it in that paper.) -->
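The 6 dB/bit rule is straightforward to check numerically. The sketch below is illustrative only: it picks a mid-rise uniform quantizer as one possible design and measures the SQNR of an <math>N</math>-bit quantizer applied to a uniformly distributed source, comparing the result with <math>6.02N</math> dB.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x_max = 1.0
x = rng.uniform(-x_max, x_max, 1_000_000)   # uniformly distributed source

for N in (4, 8, 12, 16):
    M = 2 ** N                               # number of levels
    delta = 2 * x_max / M                    # step size
    # Mid-rise uniform quantizer covering [-x_max, x_max).
    k = np.clip(np.floor(x / delta), -M // 2, M // 2 - 1)
    xq = (k + 0.5) * delta                   # reconstruct at cell centers
    sqnr = 10 * np.log10(np.mean(x**2) / np.mean((x - xq)**2))
    print(f"N = {N:2d}: measured {sqnr:6.2f} dB vs 6.02*N = {6.0206*N:6.2f} dB")
</syntaxhighlight>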