== Principle ==
The process of match moving can be broken down into two steps.

=== Tracking ===
The first step is identifying and [[video tracking|tracking]] features. A '''feature''' is a specific point in the image that a tracking algorithm can lock onto and follow through multiple frames ([https://www.ssontech.com/synovu.html SynthEyes] calls them ''blips''). Features are often selected because they are bright/dark spots, edges or corners, depending on the particular tracking algorithm. Popular programs use [[template matching]] based on [[Cross-correlation#Normalized cross-correlation|NCC score]] and [[Root mean square deviation|RMS error]]. What is important is that each feature represents a specific point on the surface of a real object. As a feature is tracked it becomes a series of two-dimensional coordinates that represent the position of the feature across a series of frames. This series is referred to as a "track". Once tracks have been created they can be used immediately for 2-D motion tracking, or they can then be used to calculate 3-D information.

=== Calibration ===
{{main|Geometric camera calibration}}
The second step involves solving for 3-D motion. This process attempts to derive the motion of the camera by solving the inverse projection of the 2-D paths for the position of the camera. This process is referred to as [[camera resectioning|calibration]].

When a point on the surface of a three-dimensional object is photographed, its position in the 2-D frame can be calculated by a [[3-D projection]] function. We can consider a camera to be an abstraction that holds all the parameters necessary to model a camera in a real or virtual world. Therefore, a camera is a vector that includes as its elements the position of the camera, its orientation, focal length, and other possible parameters that define how the camera focuses light onto the [[film plane]]. Exactly how this vector is constructed is not important as long as there is a compatible projection function ''P''.

The projection function ''P'' takes as its input a camera vector (denoted <u>''camera''</u>) and another vector, the position of a 3-D point in space (denoted <u>''xyz''</u>), and returns a 2-D point that has been projected onto a plane in front of the camera (denoted <u>''XY''</u>). We can express this:

:<u>''XY''</u> = P(<u>''camera''</u>, <u>''xyz''</u>)

[[Image:Match moving - projection de points 3D.jpg|thumb|right|An illustration of feature projection. Around the rendering of a 3-D structure, red dots represent points that are chosen by the tracking process. Cameras at frame ''i'' and ''j'' project the view onto a plane depending on the parameters of the camera. In this way features tracked in 2-D [[Correspondence problem|correspond]] to real points in a 3-D space. Although this particular illustration is computer-generated, match moving is normally done on real objects.]]

The projection function transforms the 3-D point and strips away the component of depth. Without knowing the depth of the point, an inverse projection function can only return a set of possible 3-D points that form a line emanating from the [[nodal point]] of the camera lens and passing through the projected 2-D point. We can express the inverse projection as:

:<u>''xyz''</u> ∈ P'(<u>''camera''</u>, <u>''XY''</u>)

or

:{<u>''xyz''</u> : P(<u>''camera''</u>, <u>''xyz''</u>) = <u>''XY''</u>}
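As a concrete illustration of the projection function ''P'' described above, the following is a minimal Python sketch assuming a simple pinhole camera model in which the camera vector holds only a position, an orientation matrix and a focal length. The names and structure are illustrative only and do not come from any particular match moving package.

<syntaxhighlight lang="python">
import numpy as np

def project(camera, xyz):
    """Project a 3-D point onto the 2-D image plane of a pinhole camera.

    `camera` is a dictionary standing in for the camera vector described
    above: position (3-vector), orientation (3x3 world-to-camera rotation)
    and focal length. These names are hypothetical, not from any package.
    """
    # Transform the world-space point into the camera's coordinate frame.
    p_cam = camera["orientation"] @ (np.asarray(xyz, float) - camera["position"])
    # Perspective divide: the depth component (z) is stripped away here.
    x, y, z = p_cam
    return np.array([camera["focal_length"] * x / z,
                     camera["focal_length"] * y / z])

# Example: a camera at the origin looking down the +z axis.
camera = {"position": np.zeros(3),
          "orientation": np.eye(3),
          "focal_length": 35.0}
print(project(camera, [1.0, 2.0, 10.0]))  # -> [3.5, 7.0]
</syntaxhighlight>

Because the depth has been discarded, running this function "in reverse" for a given <u>''XY''</u> can only recover the ray of candidate points described above, not a unique <u>''xyz''</u>.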
Let's say we are in a situation where the features we are tracking are on the surface of a rigid object such as a building. Since we know that the real point <u>''xyz''</u> will remain in the same place in real space from one frame of the image to the next, we can make the point a constant even though we do not know where it is. So:

:<u>''xyz''</u><sub>''i''</sub> = <u>''xyz''</u><sub>''j''</sub>

where the subscripts ''i'' and ''j'' refer to arbitrary frames in the shot we are analyzing. Since this is always true, we know that:

:P'(<u>''camera''</u><sub>''i''</sub>, <u>''XY''</u><sub>''i''</sub>) ∩ P'(<u>''camera''</u><sub>''j''</sub>, <u>''XY''</u><sub>''j''</sub>) ≠ {}

Because the value of <u>''XY''</u><sub>''i''</sub> has been determined by the tracking program for all frames that the feature is tracked through, we can solve the reverse projection function between any two frames as long as P'(<u>''camera''</u><sub>''i''</sub>, <u>''XY''</u><sub>''i''</sub>) ∩ P'(<u>''camera''</u><sub>''j''</sub>, <u>''XY''</u><sub>''j''</sub>) is a small set. The set of possible <u>''camera''</u> vector pairs that solve the equation at ''i'' and ''j'' is denoted C<sub>''ij''</sub>:

:C<sub>''ij''</sub> = {(<u>''camera''</u><sub>''i''</sub>, <u>''camera''</u><sub>''j''</sub>) : P'(<u>''camera''</u><sub>''i''</sub>, <u>''XY''</u><sub>''i''</sub>) ∩ P'(<u>''camera''</u><sub>''j''</sub>, <u>''XY''</u><sub>''j''</sub>) ≠ {}}

So there is a set of camera vector pairs C<sub>''ij''</sub> for which the intersection of the inverse projections of two points <u>''XY''</u><sub>''i''</sub> and <u>''XY''</u><sub>''j''</sub> is a non-empty, hopefully small, set centering on a theoretical stationary point <u>''xyz''</u>.

In other words, imagine a black point floating in a white void and a camera. For any position in space that we place the camera, there is a set of corresponding parameters (orientation, focal length, etc.) that will photograph that black point exactly the same way. Since ''C'' has an infinite number of members, one point is never enough to determine the actual camera position.

As we start adding tracking points, we can narrow the possible camera positions. For example, if we have a set of points {<u>''xyz''</u><sub>''i,0''</sub>, ..., <u>''xyz''</u><sub>''i,n''</sub>} and {<u>''xyz''</u><sub>''j,0''</sub>, ..., <u>''xyz''</u><sub>''j,n''</sub>}, where ''i'' and ''j'' still refer to frames and ''n'' is an index to one of many tracking points we are following, we can derive a set of camera vector pair sets {C<sub>''i,j,0''</sub>, ..., C<sub>''i,j,n''</sub>}. In this way multiple tracks allow us to narrow the possible camera parameters. The set of possible camera parameters that fit, F, is the intersection of all sets:

:F = C<sub>''i,j,0''</sub> ∩ ... ∩ C<sub>''i,j,n''</sub>

The fewer elements there are in this set, the closer we can come to extracting the actual parameters of the camera. In reality, errors introduced in the tracking process require a more statistical approach to determining a good camera vector for each frame; [[Optimization (mathematics)|optimization]] algorithms and [[bundle adjustment|bundle block adjustment]] are often utilized. Unfortunately there are so many elements to a camera vector that when every parameter is free we still might not be able to narrow F down to a single possibility, no matter how many features we track. The more we can restrict the various parameters, especially focal length, the easier it becomes to pinpoint the solution.

In all, the 3-D solving process is the process of narrowing down the possible solutions to the motion of the camera until we reach one that suits the needs of the composite we are trying to create.
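As a toy illustration of this statistical approach, the following Python sketch recovers a single camera position by minimizing the reprojection error over several tracked features. Purely for brevity it assumes the 3-D positions of the features, the camera orientation and the focal length are already known; real solvers optimize camera and point parameters jointly (bundle adjustment). All names and values here are hypothetical.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import least_squares

FOCAL = 35.0  # assume the focal length is known and fixed

def project(position, xyz):
    # Same pinhole model as above, with the orientation fixed to identity
    # to keep the illustration small.
    x, y, z = np.asarray(xyz, float) - position
    return np.array([FOCAL * x / z, FOCAL * y / z])

# Hypothetical scene: four stationary points and their tracked 2-D positions
# in one frame (generated here from a "true" camera at (0.5, -0.2, -1.0)).
points_3d = np.array([[0, 0, 10], [1, 0, 12], [0, 1, 11], [1, 1, 9]], float)
true_cam = np.array([0.5, -0.2, -1.0])
tracks_2d = np.array([project(true_cam, p) for p in points_3d])

def residuals(cam_position):
    # Reprojection error: difference between where a candidate camera would
    # see each point and where the tracker actually found it.
    return np.concatenate([project(cam_position, p) - uv
                           for p, uv in zip(points_3d, tracks_2d)])

solution = least_squares(residuals, x0=np.zeros(3))
print(solution.x)  # recovers approximately (0.5, -0.2, -1.0)
</syntaxhighlight>

Each additional track adds two residuals per frame, which is what allows the solver to narrow the otherwise huge space of camera vectors, just as the set intersections above narrow F.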
=== Point-cloud projection ===
Once the camera position has been determined for every frame it is then possible to estimate the position of each feature in real space by inverse projection (a sketch of this step is given after the following subsections). The resulting set of points is often referred to as a '''point cloud''' because of its raw appearance, like a [[nebula]]. Since point clouds often reveal some of the shape of the 3-D scene, they can be used as a reference for placing synthetic objects or by a '''reconstruction''' program to create a 3-D version of the actual scene.

=== Ground-plane determination ===
The camera and point cloud need to be oriented in some kind of space. Therefore, once calibration is complete, it is necessary to define a ground plane. Normally, this is a unit plane that determines the scale, orientation and origin of the projected space. Some programs attempt to do this automatically, though more often the user defines this plane. Since shifting ground planes performs a simple transformation of all of the points, the actual position of the plane is really a matter of convenience.

=== Reconstruction ===
'''[[3D reconstruction]]''' is the interactive process of recreating a photographed object using tracking data. This technique is related to [[photogrammetry]]. In this particular case we are referring to using match moving software to reconstruct a scene from incidental footage.

A reconstruction program can create three-dimensional objects that mimic the real objects from the photographed scene. Using data from the point cloud and the user's estimation, the program can create a virtual object and then extract a texture from the footage that can be projected onto the virtual object as a surface texture.
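The point-cloud projection step mentioned above can be sketched as follows: once camera vectors are known for two frames, a tracked feature is placed in 3-D space at the point nearest to both of its inverse-projection rays. This Python sketch again assumes the simplified pinhole model used earlier (fixed focal length, identity orientation) and is illustrative only.

<syntaxhighlight lang="python">
import numpy as np

FOCAL = 35.0

def ray_through_pixel(cam_position, xy):
    """Inverse projection: the ray of possible 3-D points behind a 2-D feature.

    With the simplified pinhole model used above, a feature at image position
    (X, Y) lies somewhere along the ray that starts at the camera position
    and points in the direction (X/f, Y/f, 1).
    """
    direction = np.array([xy[0] / FOCAL, xy[1] / FOCAL, 1.0])
    return np.asarray(cam_position, float), direction / np.linalg.norm(direction)

def triangulate(cam_i, xy_i, cam_j, xy_j):
    """Estimate xyz as the point nearest to both inverse-projection rays."""
    o1, d1 = ray_through_pixel(cam_i, xy_i)
    o2, d2 = ray_through_pixel(cam_j, xy_j)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b            # zero only if the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    # Midpoint of the closest approach between the two rays; with noisy
    # tracks the rays rarely intersect exactly, so the midpoint is used.
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))

# A feature seen from two calibrated frames; the recovered point joins the cloud.
xyz = triangulate([0, 0, 0], [3.5, 7.0], [2, 0, 0], [-3.5, 7.0])
print(xyz)  # approximately [1., 2., 10.]
</syntaxhighlight>

Repeating this for every track yields the point cloud that the ground-plane and reconstruction steps then work from.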