
Overview of current research work:

 

Scalable video coding :

Due to advances in computer and network technology over the past decade, a single workstation may serve as a personal computer, a high-definition TV, a videophone, or a fax machine. The main transmission medium is a computer network, which is often a heterogeneous environment consisting of a diverse mixture of subnets and network users. In this situation, scalable transmission of video is essential to serve different clients with widely varying display and processing capabilities. Scalability refers to the ability to decode only a part of the video bitstream and still obtain video at the desired quality or spatiotemporal resolution.

 

Since its introduction, subband/wavelet coding has emerged as a powerful method for compressing still images and video. Besides its effectiveness in data compression, the presence of subbands makes this scheme a natural choice for scalability applications. In simple 3-D subband/wavelet schemes, the subband decomposition is extended to the temporal domain as well. Performance can be improved further by adding motion compensation.

 

MCTF with 2-tap Haar filters :

Most current schemes use 2-tap Haar filters for temporal analysis/synthesis. In this approach, the temporal low subband is the motion-compensated average of two frames, while the temporal high subband is the motion-compensated difference. Low temporal subbands generated in such coders can be sent as part of a low frame-rate sequence. With sub-pixel motion compensation, images need to be interpolated for MC temporal filtering (MCTF), and the resulting analysis/synthesis scheme is in general not invertible. To achieve invertibility for arbitrary sub-pixel accuracy, a lifting scheme is used. Besides its computational efficiency, this is the major advantage of the lifting scheme.
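The invertibility argument can be illustrated with a minimal Python sketch. It is not the coder described here: frames are 1-D lists, motion is a single global sub-pixel shift, interpolation is linear, and the usual subband scaling factors are omitted. The names warp, haar_mctf_analysis and haar_mctf_synthesis are hypothetical. The point is only that synthesis repeats the same interpolated predictions with opposite sign, so the interpolation never has to be inverted.

```python
def warp(frame, shift):
    """Sample a 1-D frame at positions i + shift using linear interpolation.

    Stand-in for sub-pixel motion compensation (assumption: one global
    shift, positions clamped at the frame edges).
    """
    n = len(frame)
    out = []
    for i in range(n):
        pos = min(max(i + shift, 0.0), n - 1.0)
        left = int(pos)                 # pos >= 0, so int() is a floor
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append((1.0 - frac) * frame[left] + frac * frame[right])
    return out


def haar_mctf_analysis(a, b, shift):
    """Lifting form of the Haar temporal filter for one frame pair (a, b)."""
    # Predict step: high band = B minus motion-compensated A.
    pred = warp(a, shift)
    high = [bv - pv for bv, pv in zip(b, pred)]
    # Update step: low band = A plus half of the inversely compensated high band.
    upd = warp(high, -shift)
    low = [av + 0.5 * uv for av, uv in zip(a, upd)]
    return low, high


def haar_mctf_synthesis(low, high, shift):
    """Undo the lifting steps in reverse order with opposite signs."""
    upd = warp(high, -shift)
    a = [lv - 0.5 * uv for lv, uv in zip(low, upd)]
    pred = warp(a, shift)
    b = [hv + pv for hv, pv in zip(high, pred)]
    return a, b
```

With a zero shift the low band reduces to the plain average of the two frames and the high band to their difference. With a half-pixel shift the interpolation is lossy, yet the pair is still recovered up to floating-point rounding.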

 

MCTF with 5/3 filters :

Instead of the 2-tap Haar filter, a longer filter can make better use of the correlation in the temporal domain. One main advantage of Haar filters over longer filters is that they need a motion field only between every other pair of input frames, whereas longer filters need one between every pair of consecutive frames.

 

In current work, we use a lifting-based 3-D subband/wavelet coder with 5/3 filters for temporal filtering and unidirectional motion estimation. In this approach, the backward motion field (i.e. the current frame comes before the reference frame) is used for the MCTF with quarter-pixel accuracy, as determined using hierarchical variable size block matching (HVSBM). We estimate and transmit a backward motion field between every pair of consecutive frames and infer the forward motion field from it. All the temporal subbands generated are then spatially analyzed and encoded using EZBC.

 

If we retain the fixed-size GOP structure of the Haar filter MCTF, we need to use symmetric extensions at both ends of the GOP. This causes a loss of coding efficiency at the GOP boundaries, resulting in significant PSNR drops there. Performance can be considerably improved by using a 'sliding window' in place of the GOP block. The non-orthogonality of the 5/3 filter causes PSNR variation, which can be reduced by employing filter-based weighting coefficients.
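The fixed-GOP case can be sketched in Python as follows. This is a simplified illustration, not the coder itself: motion compensation is left out, each frame is reduced to a single sample value, the GOP length is assumed even, and the function names are hypothetical. It shows the standard 5/3 lifting steps with whole-sample symmetric extension at both ends of the GOP.

```python
def mirror(i, n):
    """Whole-sample symmetric extension of an index into the range [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lifting_53_analysis(frames):
    """5/3 temporal lifting over one GOP (assumption: even length, no motion)."""
    n = len(frames)
    half = n // 2
    # Predict step: each odd frame minus the average of its two even neighbours.
    high = [frames[2 * k + 1]
            - 0.5 * (frames[2 * k] + frames[mirror(2 * k + 2, n)])
            for k in range(half)]
    # Update step: each even frame plus a quarter of the two neighbouring
    # high-band values; symmetric extension makes high[-1] equal to high[0].
    low = [frames[2 * k] + 0.25 * (high[max(k - 1, 0)] + high[k])
           for k in range(half)]
    return low, high


def lifting_53_synthesis(low, high):
    """Invert the two lifting steps in reverse order."""
    half = len(low)
    n = 2 * half
    frames = [0.0] * n
    # Undo the update step to recover the even frames first.
    for k in range(half):
        frames[2 * k] = low[k] - 0.25 * (high[max(k - 1, 0)] + high[k])
    # Undo the predict step; mirrored indices always point at even frames.
    for k in range(half):
        frames[2 * k + 1] = high[k] + 0.5 * (frames[2 * k]
                                             + frames[mirror(2 * k + 2, n)])
    return frames
```

In this toy setting, a linear ramp of eight frame values 0 to 7 gives a high band that is zero everywhere except at the last position, where the mirrored neighbour replaces the missing frame from the next GOP. That is the kind of boundary effect a sliding window avoids.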

 

Overall, the longer filters have a higher coding gain than the Haar filters and show significant improvement in average PSNR at high bit rates. However, doubling the number of motion vectors to be transmitted translates to a drop in PSNR at lower video bit rates.

 

Motion vector estimation and encoding :

 

Since motion vector data is available between each pair of consecutive frames, we can exploit its temporal redundancy in both motion vector estimation and motion vector coding.

 

The motion estimation can either be done independently at each temporal level, or the MVs at the previous level can be used as the starting point for the current level.

We do a pixel-by-pixel vector addition of the two motion fields at the previous level and use the result as the starting point for the motion vector search. We can generally use a smaller refinement range to generate the initial quadtree and then prune again. Thus, instead of using a spatial multiresolution pyramid like the one used in HVSBM, we use a temporal pyramid. The smaller refinement range gives rise to a more uniform motion field and can help in the motion vector encoding.
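A minimal Python sketch of this initialization is given below, under simplifying assumptions: motion fields are dense 2-D lists of integer (dx, dy) tuples, and the matching cost is passed in as a hypothetical callable rather than computed from image data. The function names are illustrative and the quadtree generation and pruning are not modelled.

```python
def add_motion_fields(field_a, field_b):
    """Add two dense motion fields pixel by pixel.

    Each field is a list of rows, each row a list of (dx, dy) tuples.
    """
    return [[(ax + bx, ay + by)
             for (ax, ay), (bx, by) in zip(row_a, row_b)]
            for row_a, row_b in zip(field_a, field_b)]


def refine_vector(initial, cost, search_range):
    """Search a small window around the initial vector for the lowest cost.

    `cost` is a hypothetical callable mapping a (dx, dy) candidate to a
    matching error; a real coder would evaluate a block-matching criterion.
    """
    best = initial
    best_cost = cost(initial)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cand = (initial[0] + dx, initial[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best
```

Because the summed field is usually already close to the true motion at the coarser temporal level, search_range can be kept small, which is what keeps the refined field smooth.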

 

The motion vector prediction residuals to be encoded can be computed by three methods: differentials along the scanning order (Scan), spatial prediction from neighboring blocks (Spatial), or temporal prediction from the MVs of the previous frame (Temporal). For Bus and Mobile, temporal prediction works better, but for Foreman and Football, spatial prediction works better.
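The three residual types can be compared with a short Python sketch. Several details are assumptions made for illustration only: motion vectors sit on a regular block grid, the scan order is raster, the spatial predictor is the component-wise median of the left, upper and upper-right neighbours (with missing neighbours treated as zero), and the sum of absolute components is used as a rough proxy for coding cost.

```python
ZERO = (0, 0)


def sub(a, b):
    """Component-wise difference of two motion vectors."""
    return (a[0] - b[0], a[1] - b[1])


def scan_residuals(field):
    """Scan: difference from the previous vector in raster order."""
    out = []
    prev = ZERO
    for row in field:
        for mv in row:
            out.append(sub(mv, prev))
            prev = mv
    return out


def spatial_residuals(field):
    """Spatial: difference from the median of three neighbours (assumed predictor)."""
    def at(r, c):
        if 0 <= r < len(field) and 0 <= c < len(field[r]):
            return field[r][c]
        return ZERO  # assumption: neighbours outside the frame count as zero

    out = []
    for r, row in enumerate(field):
        for c, mv in enumerate(row):
            nb = [at(r, c - 1), at(r - 1, c), at(r - 1, c + 1)]
            pred = (sorted(v[0] for v in nb)[1], sorted(v[1] for v in nb)[1])
            out.append(sub(mv, pred))
    return out


def temporal_residuals(field, prev_field):
    """Temporal: difference from the co-located vector of the previous frame."""
    return [sub(mv, pmv)
            for row, prow in zip(field, prev_field)
            for mv, pmv in zip(row, prow)]


def magnitude(residuals):
    """Sum of absolute residual components, a crude stand-in for bit cost."""
    return sum(abs(dx) + abs(dy) for dx, dy in residuals)
```

Evaluating magnitude for each residual list on a given sequence gives a quick indication of which predictor suits it, for example temporal prediction when the motion field changes little from frame to frame.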

 

In our old scheme, we used the adaptive arithmetic coding (AAC) described by Witten et al. We used one probability model for all the motion vector symbols in a given frame and updated it adaptively at the encoder and decoder. As the number of symbols increases, this scheme faces the zero-frequency problem, i.e. even the unused symbols must be assigned some initial probability. We can replace this m-ary arithmetic coding with a context-based binary arithmetic coding (CABAC) scheme similar to the one used for H.26L. CABAC proves more successful in adapting to different motion models.
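The zero-frequency problem can be made concrete with a small Python sketch. It does not implement an arithmetic coder; it only computes the ideal code length of an adaptive frequency-count model in which every symbol of the alphabet starts with a count of one. That initialization is an assumption chosen for illustration, as is the function name.

```python
import math


def adaptive_code_length(symbols, alphabet_size):
    """Ideal code length in bits under an adaptive frequency-count model.

    Every symbol starts with count 1 (assumed initialization), so unused
    symbols still take probability mass away from the ones that occur.
    """
    counts = [1] * alphabet_size
    total = alphabet_size
    bits = 0.0
    for s in symbols:
        bits += -math.log2(counts[s] / total)
        counts[s] += 1   # the decoder can mirror this update exactly
        total += 1
    return bits
```

For the same symbol sequence, a larger alphabet_size always gives a larger result, because the initial counts of symbols that never occur inflate the total. This overhead is what a binary, context-based model reduces.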

 

 

Related Publications: