My Ph.D. thesis is entitled Compactly-Encoded Optical Flow
Fields for Motion-Compensated Video Coding and Processing, and my
thesis advisor was
Professor John Woods.
This work was performed in collaboration
with Dr. Pierre Moulin, who is currently a professor at the
University of Illinois, Urbana-Champaign. Dr. Moulin
was formerly at Bell Communications Research, Morristown, NJ,
and was my supervisor during my internships there in
the summers of 1993, 1994 and 1995.
My research involved the application of optical flow fields, obtained from a gradient-based algorithm, to forward motion-compensated video coding. In popular coding standards such as MPEG and H.263, the motion estimates are usually obtained by block-matching algorithms (BMAs) and are constant over a fairly large block (8x8 or 16x16). These motion vectors therefore require little overhead for transmission. In contrast, we concentrate on dense motion estimators that provide higher-resolution motion fields than BMAs. In fact, we allow a resolution of 1 vector/pixel and rely on the smoothness of the motion field for compression. The compression is accomplished by using a multiscale motion model and then quantizing the model coefficients.
For motion estimation, we first developed a basic algorithm that uses a gradient-based technique; this algorithm may be regarded as an extension of the popular Horn and Schunck method. The motion estimation is done under quantizer-set constraints, enabling us to perform motion estimation and quantization in one step. This basic algorithm cannot estimate large displacements, so we developed two extensions that can. One uses iterated registration, while the other uses a control (image) pyramid and a coarse-to-fine strategy. By using these extensions, we have demonstrated the ability to track large motion using smooth and compactly-encodable motion estimates.
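For readers unfamiliar with the Horn and Schunck method mentioned above, the following is a minimal sketch of the classic, textbook iteration only. It is not the thesis algorithm: it has no multiscale motion model, no quantizer-set constraints, and no large-displacement extension. The function name, the smoothness weight `alpha`, the iteration count, and the border handling are all assumptions made for illustration.

```python
def horn_schunck(frame1, frame2, alpha=1.0, iterations=200):
    """Classic Horn-Schunck optical flow between two grayscale frames.

    Frames are equally sized lists of rows of floats. Returns (u, v), the
    horizontal and vertical displacement fields, one vector per pixel.
    """
    h, w = len(frame1), len(frame1[0])

    def at(img, y, x):
        # Replicate border pixels so every neighbour lookup is defined
        # (an illustrative choice, not taken from the thesis).
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Image gradients: central differences in space (averaged over both
    # frames) and a plain frame difference in time.
    ix = [[0.25 * (at(frame1, y, x + 1) - at(frame1, y, x - 1)
                   + at(frame2, y, x + 1) - at(frame2, y, x - 1))
           for x in range(w)] for y in range(h)]
    iy = [[0.25 * (at(frame1, y + 1, x) - at(frame1, y - 1, x)
                   + at(frame2, y + 1, x) - at(frame2, y - 1, x))
           for x in range(w)] for y in range(h)]
    it = [[frame2[y][x] - frame1[y][x] for x in range(w)] for y in range(h)]

    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Smoothness term: average of the four neighbouring vectors.
                u_avg = 0.25 * (at(u, y, x - 1) + at(u, y, x + 1)
                                + at(u, y - 1, x) + at(u, y + 1, x))
                v_avg = 0.25 * (at(v, y, x - 1) + at(v, y, x + 1)
                                + at(v, y - 1, x) + at(v, y + 1, x))
                # Data term: correct the average along the image gradient
                # so that ix*u + iy*v + it moves toward zero.
                gx, gy = ix[y][x], iy[y][x]
                t = (gx * u_avg + gy * v_avg + it[y][x]) / (alpha ** 2 + gx * gx + gy * gy)
                new_u[y][x] = u_avg - gx * t
                new_v[y][x] = v_avg - gy * t
        u, v = new_u, new_v
    return u, v


# Toy input: a horizontal intensity ramp whose content moves one pixel right.
ramp = [[float(x) for x in range(9)] for _ in range(5)]
shifted = [[float(x - 1) for x in range(9)] for _ in range(5)]
u, v = horn_schunck(ramp, shifted)
print(u[2][4], v[2][4])
```

On this toy ramp the estimate at the centre pixel should come out close to one pixel of horizontal motion and no vertical motion. Because the linearized brightness model only holds for small shifts, a sketch like this fails on large displacements, which is the limitation the iterated-registration and coarse-to-fine extensions address.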
Using our high-resolution motion estimators, we have demonstrated substantial coding improvements, especially in visual quality, when compared to the popular BMA. We have also considered the application of our dense, compactly-encoded motion fields to frame interpolation at the decoder. Again, we have demonstrated substantial improvements when compared to BMA. Our motion fields also perform better than those obtained using triangle motion compensation (TMC).
Bidirectional prediction (or frame interpolation) is complicated by the presence of covered/uncovered regions. To handle these occlusions, we developed a label field that weights the influence of the forward and backward predictions. This may be considered a dense, high-resolution extension of ideas in the MPEG standard. The label field has potential for use in other applications as well. For example, it could be combined with the motion estimation and used to handle occlusions and illumination changes when predictively coding the P frames.
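The idea of a label field that weights the two predictions can be illustrated with a small sketch. It assumes, purely for illustration, that each label is a number between 0 and 1 giving the weight of the forward prediction at that pixel; the function name and this convention are hypothetical, and the way the thesis estimates and encodes the labels is not shown.

```python
def blend_bidirectional(forward_pred, backward_pred, labels):
    """Blend two motion-compensated predictions pixel by pixel.

    All inputs are equally sized lists of rows of floats. A label of 1.0
    trusts only the forward prediction, 0.0 only the backward prediction,
    and values in between mix the two (assumed convention).
    """
    blended = []
    for f_row, b_row, l_row in zip(forward_pred, backward_pred, labels):
        blended.append([lab * f + (1.0 - lab) * b
                        for f, b, lab in zip(f_row, b_row, l_row)])
    return blended


# One row of three pixels: forward only, equal mix, backward only.
forward = [[10.0, 10.0, 10.0]]
backward = [[20.0, 20.0, 20.0]]
labels = [[1.0, 0.5, 0.0]]
print(blend_bidirectional(forward, backward, labels))
```

Under this convention, a pixel that is uncovered in the later frame would get a label near 0 so that the prediction is drawn from that frame, while a pixel that becomes covered would get a label near 1.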
Publications related to the above work can be downloaded at http://cipr.rpi.edu/ravik/publications.html