{{Short description|Video compression technique, used to efficiently predict and generate video frames}}
{{distinguish|motion compensator}}
[[Image:Elephantsdream vectorstill04 crop.png|thumb|Visualization of MPEG block motion compensation. Blocks that moved from one frame to the next are shown as white arrows, making the motions of the different platforms and the character clearly visible.]]
'''Motion compensation''' in computing is an algorithmic technique used to predict a frame in a video given the previous and/or future frames, by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video data for [[video compression]], for example in the generation of {{nowrap|[[MPEG-2]]}} files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future. When images can be accurately synthesized from previously transmitted/stored images, the compression efficiency can be improved.

Motion compensation is one of the two key [[video compression]] techniques used in [[video coding standards]], along with the [[discrete cosine transform]] (DCT). Most video coding standards, such as the [[H.26x]] and [[MPEG]] formats, typically use motion-compensated DCT hybrid coding,<ref>{{cite book |last1=Chen |first1=Jie |last2=Koc |first2=Ut-Va |last3=Liu |first3=KJ Ray |title=Design of Digital Video Coding Systems: A Complete Compressed Domain Approach |date=2001 |publisher=[[CRC Press]] |isbn=9780203904183 |page=71 |url=https://books.google.com/books?id=LUzFKU3HeegC&pg=PA71}}</ref><ref name="Li"/> known as block motion compensation (BMC) or motion-compensated DCT (MC DCT).

== Functionality ==
Motion compensation exploits the fact that, often, for many [[Film frame|frames]] of a movie, the only difference between one frame and another is the result of either the camera moving or an object in the frame moving. For a video file, this means that much of the information representing one frame will be the same as the information used in the next frame.

Using motion compensation, a video stream contains some full (reference) frames; the only information stored for the frames in between is the information needed to transform the previous frame into the next frame.

== Illustrated example ==
The following is a simplistic illustration of how motion compensation works. Two successive frames were captured from the movie ''[[Elephants Dream]]''. As the images show, the motion-compensated difference between the two frames (bottom row) contains significantly less detail than the plain difference image, and thus compresses much better. The information required to encode the compensated difference is therefore much smaller than that required for the plain difference. It is also possible to encode the video using the plain difference image, without motion compensation, at the cost of lower compression efficiency but with lower coding complexity; in fact, motion-compensated coding (together with [[motion estimation]]) can occupy more than 90% of the encoding complexity. A minimal numerical sketch of both differencing schemes follows the table.

{| class="wikitable"
|-
! Type
! Example Frame
! Description
|-
| Original
| [[File:Motion compensation example-original.jpg|160px]]
| Full original frame, as shown on screen.
|-
| Difference
| [[File:Motion compensation example-difference.jpg|160px]]
| Differences between the original frame and the next frame.
|-
| Motion compensated difference
| [[File:Motion compensation example-compensated difference.jpg|160px]]
| Differences between the original frame and the next frame, shifted right by 2 pixels. Shifting the frame ''compensates'' for the [[panning (camera)|panning]] of the camera, thus there is greater overlap between the two frames.
|}
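The two differencing schemes in the table can be sketched in a few lines of code. The following is a minimal, hypothetical illustration (not taken from any codec), assuming two 8-bit grayscale frames held as NumPy arrays and a pure two-pixel horizontal pan as above; the function names are invented for this example:

<syntaxhighlight lang="python">
import numpy as np

def plain_difference(prev_frame, next_frame):
    """Per-pixel difference between two frames, with no compensation."""
    return next_frame.astype(np.int16) - prev_frame.astype(np.int16)

def compensated_difference(prev_frame, next_frame, dx=2):
    """Difference after shifting the previous frame right by dx pixels,
    compensating for a horizontal camera pan as in the table above."""
    shifted = np.roll(prev_frame, dx, axis=1)  # crude wrap-around shift
    return next_frame.astype(np.int16) - shifted.astype(np.int16)
</syntaxhighlight>

For a panning scene, the compensated residual is mostly near zero, so a downstream entropy coder spends far fewer bits on it than on the plain difference.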
== MPEG ==
In [[MPEG]], images are predicted from previous frames {{nowrap|([[Video compression picture types|P frames]])}} or bidirectionally from previous and future frames {{nowrap|([[Video compression picture types|B frames]]).}} {{nowrap|B frames}} are more complex because the image sequence must be transmitted and stored out of order so that the future frame is available to generate the {{nowrap|B frames.}}<ref>{{Cite web|url=http://bmrc.berkeley.edu/research/mpeg/faq/mpeg2-v38/faq_v38.html#tag40|title=MPEG-2 FAQ|date=February 20, 2009|archive-url=https://web.archive.org/web/20090220062554/http://bmrc.berkeley.edu/research/mpeg/faq/mpeg2-v38/faq_v38.html#tag40 |archive-date=2009-02-20 }}</ref>

After predicting frames using motion compensation, the coder finds the residual, which is then compressed and transmitted.

== Global motion compensation ==
In [[global motion compensation]], the motion model basically reflects camera motions such as:
* Dolly – moving the camera forward or backward
* Track – moving the camera left or right
* Boom – moving the camera up or down
* Pan – rotating the camera around its Y axis, moving the view left or right
* Tilt – rotating the camera around its X axis, moving the view up or down
* Roll – rotating the camera around the view axis

It works best for still scenes without moving objects. There are several advantages of global motion compensation:
* It models the dominant motion usually found in video sequences with just a few parameters. The share in bit-rate of these parameters is negligible.
* It does not partition the frames. This avoids artifacts at partition borders.
* A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.

MPEG-4 ASP supports global motion compensation with three reference points, although some implementations can only make use of one. A single reference point only allows for translational motion, which, given its relatively large performance cost, provides little advantage over block-based motion compensation (see the sketch below).

Moving objects within a frame are not sufficiently represented by global motion compensation. Thus, local [[motion estimation]] is also needed.
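As a rough sketch of the simplest case mentioned above (a single reference point, translation only), the following hypothetical code predicts the current frame from one global displacement; it is only an illustration, since real codecs such as MPEG-4 ASP additionally support warping with up to three reference points and sub-pixel accuracy:

<syntaxhighlight lang="python">
import numpy as np

def global_motion_compensate(reference, dx, dy):
    """Predict the current frame by applying one global translation
    (dx, dy) to the reference frame. Pixels shifted in from outside
    the frame are left at zero."""
    h, w = reference.shape
    predicted = np.zeros_like(reference)
    # Copy the overlapping region of the shifted reference frame.
    src = reference[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    predicted[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return predicted
</syntaxhighlight>

The encoder then transmits only the two motion parameters plus the residual between the actual and predicted frames, which is why the bit-rate share of the global parameters is negligible.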
== Motion-compensated DCT ==

=== Block motion compensation ===
{{See also|Block-matching algorithm}}
'''Block motion compensation''' (BMC), also known as motion-compensated [[discrete cosine transform]] (MC DCT), is the most widely used motion compensation technique.<ref name="Li">{{cite book |last1=Li |first1=Jian Ping |title=Proceedings of the International Computer Conference 2006 on Wavelet Active Media Technology and Information Processing: Chongqing, China, 29-31 August 2006 |date=2006 |publisher=[[World Scientific]] |isbn=9789812709998 |page=847 |url=https://books.google.com/books?id=FZiK3zXdK7sC&pg=PA847}}</ref> In BMC, the frames are partitioned into blocks of pixels (e.g. macroblocks of 16×16 pixels in [[MPEG]]). Each block is predicted from a block of equal size in the reference frame. The blocks are not transformed in any way apart from being shifted to the position of the predicted block. This shift is represented by a ''motion vector''.

To exploit the redundancy between neighboring block vectors (e.g. for a single moving object covered by multiple blocks), it is common to encode only the difference between the current and previous motion vector in the bit-stream. The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning. Further down the encoding pipeline, an [[entropy encoding|entropy coder]] will take advantage of the resulting statistical distribution of the motion vectors around the zero vector to reduce the output size.

It is possible to shift a block by a non-integer number of pixels, which is called ''sub-pixel precision''. The in-between pixels are generated by interpolating the neighboring pixels. Commonly, half-pixel or quarter-pixel precision ([[Qpel]], used by H.264 and MPEG-4 ASP) is used. The computational expense of sub-pixel precision is much higher, due to the extra processing required for interpolation and, on the encoder side, the much greater number of potential source blocks to be evaluated.

The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts). These artifacts appear in the form of sharp horizontal and vertical edges which are easily spotted by the human eye, and produce false edges and ringing effects (large coefficients in high-frequency sub-bands) due to quantization of the coefficients of the [[List of Fourier-related transforms|Fourier-related transform]] used for [[transform coding]] of the [[residual frame]]s.<ref>Zeng, Kai, et al. "Characterizing perceptual artifacts in compressed video streams." IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2014.</ref>

Block motion compensation divides up the ''current'' frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come ''from'' (a common misconception is that the ''previous frame'' is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move ''to''). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously transmitted frames.

Frames can also be predicted from future frames. The future frames then need to be encoded before the predicted frames, and thus the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called [[Video compression picture types|''B-frames'']]. A coding scheme could, for instance, be IBBPBBPBBPBB.

Further, the use of triangular tiles has also been proposed for motion compensation. Under this scheme, the frame is tiled with triangles, and the next frame is generated by performing an affine transformation on these triangles.<ref>Aizawa, Kiyoharu, and Thomas S. Huang. "Model-based image coding advanced video coding techniques for very low bit-rate applications." Proceedings of the IEEE 83.2 (1995): 259-271.</ref> Only the affine transformations are recorded/transmitted. This is capable of dealing with zooming, rotation, translation, etc.
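Returning to rectangular blocks, the encoder-side search and prediction described in this section can be sketched as follows. This is a hypothetical, integer-precision exhaustive search, assuming grayscale NumPy frames; real encoders use fast [[block-matching algorithm]]s and sub-pixel refinement instead:

<syntaxhighlight lang="python">
import numpy as np

def best_motion_vector(current_block, reference, top, left, search=7):
    """Exhaustive block matching: find the displacement (dy, dx) whose
    reference block minimizes the sum of absolute differences (SAD)."""
    h, w = reference.shape
    n = current_block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - n and 0 <= x <= w - n:
                candidate = reference[y:y + n, x:x + n]
                sad = np.abs(current_block.astype(np.int16)
                             - candidate.astype(np.int16)).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best

def predict_frame(current, reference, n=16, search=7):
    """Build a motion-compensated prediction of the current frame by
    copying, for each n-by-n block, its best match from the reference."""
    predicted = np.zeros_like(current)
    for top in range(0, current.shape[0] - n + 1, n):
        for left in range(0, current.shape[1] - n + 1, n):
            block = current[top:top + n, left:left + n]
            dy, dx = best_motion_vector(block, reference, top, left, search)
            predicted[top:top + n, left:left + n] = \
                reference[top + dy:top + dy + n, left + dx:left + dx + n]
    return predicted
</syntaxhighlight>

Each (dy, dx) pair is a motion vector; per the discussion above, an encoder would typically transmit vector differences rather than the vectors themselves.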
=== Variable block-size motion compensation ===
'''Variable block-size motion compensation''' (VBSMC) is the use of BMC with the ability for the encoder to dynamically select the size of the blocks. When coding video, the use of larger blocks can reduce the number of bits needed to represent the motion vectors, while the use of smaller blocks can result in a smaller amount of prediction residual information to encode. Other areas of work have examined the use of variable-shape feature metrics, beyond block boundaries, from which interframe vectors can be calculated.<ref>{{Cite book|last=Garnham|first=Nigel W.|title=Motion Compensated Video Coding - PhD Thesis|publisher=University of Nottingham|year=1995|url=http://eprints.nottingham.ac.uk/13447/1/thesis.pdf|oclc=59633188}}</ref> Older designs such as [[H.261]] and [[MPEG-1]] video typically use a fixed block size, while newer ones such as [[H.263]], [[MPEG-4 Part 2]], [[H.264/MPEG-4 AVC]], and [[VC-1]] give the encoder the ability to dynamically choose what block size will be used to represent the motion.

=== Overlapped block motion compensation ===
'''Overlapped block motion compensation''' (OBMC) is a good solution to the blocking artifacts described above, because it not only increases prediction accuracy but also avoids the artifacts themselves. When using OBMC, blocks are typically twice as big in each dimension and overlap quadrant-wise with all 8 neighbouring blocks. Thus, each pixel belongs to 4 blocks. In such a scheme, there are 4 predictions for each pixel, which are combined as a weighted mean. For this purpose, blocks are associated with a window function that has the property that the sum of the 4 overlapped windows is equal to 1 everywhere.

Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the [[H.263]] Annex F Advanced Prediction mode.

== Quarter-pixel (QPel) and half-pixel motion compensation ==
In motion compensation, quarter- and half-sample positions are interpolated sub-samples arising from fractional motion vectors. Based on the vectors and the full samples, the sub-samples can be calculated by using bicubic or bilinear 2-D filtering. See subclause 8.4.2.2 "Fractional sample interpolation process" of the H.264 standard.
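As a concrete illustration of the bilinear variant, the following hypothetical function fetches one sub-sample at fractional coordinates. Note that H.264 itself specifies a 6-tap filter for half-sample positions, so this is only the simplest possible sketch, not the standard's interpolation process:

<syntaxhighlight lang="python">
import numpy as np

def bilinear_subsample(frame, y, x):
    """Sample the frame at fractional coordinates (y, x) by bilinear
    2-D filtering of the four surrounding full samples. Coordinates
    must lie inside the frame interior."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - y0, x - x0          # fractional parts of the vector
    p = frame.astype(np.float64)
    return ((1 - fy) * (1 - fx) * p[y0, x0]
            + (1 - fy) * fx * p[y0, x0 + 1]
            + fy * (1 - fx) * p[y0 + 1, x0]
            + fy * fx * p[y0 + 1, x0 + 1])
</syntaxhighlight>

A fractional motion vector such as (2.5, −1.25) would be realized by calling such an interpolation for every pixel of the predicted block, which is where the extra computational cost of sub-pixel precision comes from.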
== 3D image coding techniques ==
Motion compensation is utilized in [[stereoscopic video coding]]. In video, ''time'' is often considered as the third dimension, and still-image coding techniques can accordingly be expanded to an extra dimension. [[JPEG 2000]] uses wavelets, and these can also be used to encode motion without gaps between blocks, in an adaptive way. Fractional-pixel [[affine transformation]]s lead to bleeding between adjacent pixels. If no higher internal resolution is used, the delta images mostly fight against the smearing of the image. The delta image can also be encoded as wavelets, so that the borders of the adaptive blocks match. [[2D+Delta]] encoding techniques utilize [[H.264]]- and [[MPEG-2]]-compatible coding and can use motion compensation to compress between stereoscopic images.

== History ==
{{Main|Video coding format}}
A precursor to the concept of motion compensation dates back to 1929, when R.D. Kell in Britain proposed the concept of transmitting only the portions of an [[analog video]] scene that changed from frame to frame. In 1959, the concept of [[inter-frame]] motion compensation was proposed by [[NHK]] researchers Y. Taki, M. Hatori and S. Tanaka, who proposed predictive inter-frame [[video coding]] in the [[temporal dimension]].<ref name="ITU">{{cite web |title=History of Video Compression |url=https://www.itu.int/wftp3/av-arch/jvt-site/2002_07_Klagenfurt/JVT-D068.doc |website=[[ITU-T]] |publisher=Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) |date=July 2002 |pages=11, 24–9, 33, 40–1, 53–6 |access-date=3 November 2019}}</ref>

=== Motion-compensated DCT ===
Practical motion-compensated [[video compression]] emerged with the development of motion-compensated [[Discrete cosine transform|DCT]] (MC DCT) coding,<ref name="Lea">{{cite book |last1=Lea |first1=William |title=Video on demand: Research Paper 94/68 |date=1994 |publisher=[[House of Commons Library]] |url=https://researchbriefings.parliament.uk/ResearchBriefing/Summary/RP94-68 |access-date=20 September 2019 |archive-url=https://web.archive.org/web/20190920082623/https://researchbriefings.parliament.uk/ResearchBriefing/Summary/RP94-68 |archive-date=20 September 2019 |url-status=dead }}</ref> also called block motion compensation (BMC) or DCT motion compensation. This is a hybrid coding algorithm,<ref name="ITU"/> which combines two key [[data compression]] techniques: [[discrete cosine transform]] (DCT) coding<ref name="Lea"/> in the [[spatial dimension]], and predictive motion compensation in the [[temporal dimension]].<ref name="ITU"/>

DCT coding is a [[lossy compression|lossy]] block compression [[transform coding]] technique that was first proposed by [[N. Ahmed|Nasir Ahmed]], who initially intended it for [[image compression]], in 1972.<ref name="Ahmed">{{cite journal |last=Ahmed |first=Nasir |author-link=N. Ahmed |title=How I Came Up With the Discrete Cosine Transform |journal=[[Digital Signal Processing (journal)|Digital Signal Processing]] |date=January 1991 |volume=1 |issue=1 |pages=4–5 |doi=10.1016/1051-2004(91)90086-Z |bibcode=1991DSP.....1....4A |url=https://www.scribd.com/doc/52879771/DCT-History-How-I-Came-Up-with-the-Discrete-Cosine-Transform|url-access=subscription }}</ref> In 1974, Ali Habibi at the [[University of Southern California]] introduced hybrid coding,<ref>{{cite journal |last1=Habibi |first1=Ali |title=Hybrid Coding of Pictorial Data |journal=IEEE Transactions on Communications |date=1974 |volume=22 |issue=5 |pages=614–624 |doi=10.1109/TCOM.1974.1092258}}</ref><ref>{{cite journal |last1=Chen |first1=Z. |last2=He |first2=T. |last3=Jin |first3=X. |last4=Wu |first4=F. |title=Learning for Video Compression |journal=IEEE Transactions on Circuits and Systems for Video Technology |volume=30 |issue=2 |pages=566–576 |doi=10.1109/TCSVT.2019.2892608 |arxiv=1804.09869 |year=2020 |s2cid=13743007 }}</ref> which combines predictive coding with transform coding.<ref name="ITU"/><ref>{{cite book |last1=Ohm |first1=Jens-Rainer |title=Multimedia Signal Coding and Transmission |date=2015 |publisher=Springer |isbn=9783662466919 |pages=364 |url=https://books.google.com/books?id=e7xnBwAAQBAJ&pg=PA364}}</ref> However, his algorithm was initially limited to [[intra-frame]] coding in the spatial dimension. In 1975, John A. Roese and Guner S.
Robinson extended Habibi's hybrid coding algorithm to the temporal dimension, using transform coding in the spatial dimension and predictive coding in the temporal dimension, developing [[inter-frame]] motion-compensated hybrid coding.<ref name="ITU"/><ref name="Roese">{{cite journal |last1=Roese |first1=John A. |last2=Robinson |first2=Guner S. |editor-first1=Andrew G. |editor-last1=Tescher |title=Combined Spatial And Temporal Coding Of Digital Image Sequences |journal=Efficient Transmission of Pictorial Information |date=30 October 1975 |volume=0066 |pages=172–181 |doi=10.1117/12.965361 |bibcode=1975SPIE...66..172R |publisher=International Society for Optics and Photonics|s2cid=62725808 }}</ref> For the spatial transform coding, they experimented with the DCT and the [[fast Fourier transform]] (FFT), developing inter-frame hybrid coders for both, and found that the DCT is the most efficient due to its reduced complexity, capable of compressing image data down to 0.25 [[bit]]s per [[pixel]] for a [[videotelephone]] scene with image quality comparable to an intra-frame coder requiring 2 bits per pixel.<ref>{{cite book |last1=Huang |first1=T. S. |title=Image Sequence Analysis |date=1981 |publisher=[[Springer Science & Business Media]] |isbn=9783642870378 |page=29 |url=https://books.google.com/books?id=bAirCAAAQBAJ&pg=PA29}}</ref><ref name="Roese"/>

In 1977, Wen-Hsiung Chen developed a fast DCT algorithm with C.H. Smith and S.C. Fralick.<ref>{{cite journal |last1=Chen |first1=Wen-Hsiung |last2=Smith |first2=C. H. |last3=Fralick |first3=S. C. |title=A Fast Computational Algorithm for the Discrete Cosine Transform |journal=[[IEEE Transactions on Communications]] |date=September 1977 |volume=25 |issue=9 |pages=1004–1009 |doi=10.1109/TCOM.1977.1093941}}</ref> In 1979, [[Anil K. Jain (electrical engineer, born 1946)|Anil K. Jain]] and Jaswant R. Jain further developed motion-compensated DCT video compression,<ref>{{cite book |last1=Cianci |first1=Philip J.
|title=High Definition Television: The Creation, Development and Implementation of HDTV Technology |date=2014 |publisher=McFarland |isbn=9780786487974 |page=63 |url=https://books.google.com/books?id=0mbsfr38GTgC&pg=PA63}}</ref><ref name="ITU"/> also called block motion compensation.<ref name="ITU"/> This led to Chen developing a practical video compression algorithm, called motion-compensated DCT or adaptive scene coding, in 1981.<ref name="ITU"/> Motion-compensated DCT later became the standard coding technique for video compression from the late 1980s onwards.<ref name="Ghanbari">{{cite book |last1=Ghanbari |first1=Mohammed |title=Standard Codecs: Image Compression to Advanced Video Coding |date=2003 |publisher=[[Institution of Engineering and Technology]] |isbn=9780852967102 |pages=1–2 |url=https://books.google.com/books?id=7XuU8T3ooOAC&pg=PA1}}</ref><ref name="Li"/>

The first digital [[video coding standard]] was [[H.120]], developed by the [[ITU-T|CCITT]] (now ITU-T) in 1984.<ref name="history">{{cite web |title=The History of Video File Formats Infographic |url=http://www.real.com/resources/digital-video-file-formats/ |website=[[RealNetworks]] |access-date=5 August 2019 |date=22 April 2012}}</ref> H.120 used motion-compensated DPCM coding,<ref name="ITU"/> which was inefficient for video coding,<ref name="Ghanbari"/> and H.120 was thus impractical due to low performance.<ref name="history"/> The [[H.261]] standard was developed in 1988 based on motion-compensated DCT compression,<ref name="Ghanbari"/><ref name="Li"/> and it was the first practical video coding standard.<ref name="history"/> Since then, motion-compensated DCT compression has been adopted by all the major video coding standards (including the [[H.26x]] and [[MPEG]] formats) that followed.<ref name="Ghanbari"/><ref name="Li"/>

== See also ==
* [[Motion estimation]]
* [[Image stabilization]]
* [[Inter frame]]
* [[HDTV blur]]
* [[Television standards conversion]]
* [[VidFIRE]]
* [[X-Video Motion Compensation]]

== Applications ==
* [[Video compression]]
* Change of [[framerate]] for playback of 24 frames per second movies on 60 Hz [[LCD]]s or 100 Hz [[interlaced]] [[cathode-ray tube]]s

== References ==
{{reflist}}

== External links ==
* [https://msdn.microsoft.com/en-us/windows/hardware/gg463407 Temporal Rate Conversion] – article giving an overview of motion compensation techniques.
* [https://portal.acm.org/citation.cfm?id=784892.784978 A New FFT Architecture and Chip Design for Motion Compensation based on Phase Correlation]
* {{web archive |url=https://web.archive.org/web/20161229181448/https://vision.arc.nasa.gov/publications/mathjournal94.pdf |title=DCT and DFT coefficients are related by simple factors}}
* [http://actapress.com/PaperInfo.aspx?PaperID=26756&reason=500 DCT better than DFT also for video]
* {{cite web |author=John Wiseman |url=http://old.siggraph.org/education/materials/HyperGraph/video/mpeg/ |archive-url=https://web.archive.org/web/20170430222819/http://old.siggraph.org/education/materials/HyperGraph/video/mpeg/ |archive-date=2017-04-30 |title=An Introduction to MPEG Video Compression}}
* [https://ieeexplore.ieee.org/document/856453/ DCT and motion compensation]
* {{web archive |url=https://archive.today/20070623104949/http://www.hindawi.com/GetArticle.aspx?doi=10.1155/S1110865701000245 |title=Compatibility between DCT, motion compensation and other methods}}

{{Clear}}
{{Compression Methods}}

[[Category:Film and video technology]]
[[Category:H.26x]]
[[Category:Video compression]]
[[Category:Motion in computer vision]]
[[Category:Data compression]]