MING-T - Scalable Coding

This page presents a summary of scalable coding techniques and their use in mobile television. For more details, please refer to the recent project report D3.6, which describes the final architecture for scalable video coding. Additional experimental data can be found in its predecessor, report D3.3, which describes our initial architecture and was published after the first nine months of project research. We also offer a tutorial video for download (65 MB, .mov format) that illustrates the image quality achievable at typical broadcasting bitrates.

The term Scalable Coding refers to techniques that allow a trade-off between the video quality experienced by the user and technical parameters such as channel bandwidth and bit error rate. The idea is to use a robust encoding scheme that provides a low-quality (low-resolution, low-frame-rate) image even under the worst reception conditions, and to combine it with extra (possibly less robust) encodings that provide much better image quality when reception conditions are good. Scalable coding techniques are well established for many video coding standards: they have not only been fully standardized for MPEG-2, H.263 and MPEG-4, but also extensively studied in different transmission environments and conditions.

As part of the MING-T project, we intend to use H.264, the most advanced video coding standard supported by both DTMB and DVB-H. While the H.264 baseline is already standardized, its scalable extensions are still under discussion. These extensions will likely inherit the conceptual models of the scalable extensions that already exist for the other codecs, i.e. spatial (picture size), SNR (quality) and temporal (frame skipping) scalability, plus Fine Granular Scalability (FGS), which allows the video quality to degrade gracefully when the bandwidth fluctuates. There is therefore already a framework that enables us to understand and experiment with the scalability features of H.264.

However, to the best of our knowledge, there has been no attempt to match layered coding to the DVB-H and DTMB constraints in terms of bandwidth, processing time and the like. Moreover, although recent studies propose combining these techniques (for instance, mixing temporal and SNR scalability), they do not consider using them in a dynamic configuration, for instance one that changes over time the weight given to temporal scalability relative to the weight given to SNR scalability.

Systems like DVB-H allow two video streams with different priorities to be transmitted using the so-called “hierarchical modulation” scheme. In this scheme, the High Priority (HP) stream bits select the coarse position of the constellation point (its quadrant), whereas the Low Priority (LP) stream bits select its fine position within that quadrant. A classical example is the hierarchical interpretation of 64-QAM as a QPSK constellation (carrying the HP bits) combined with a 16-QAM constellation inside each quadrant (carrying the LP bits).
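
To make the coarse/fine split concrete, here is a minimal Python sketch of a hierarchical 64-QAM mapper. It assumes a simplified, per-axis Gray-coded bit mapping (the exact DVB-T mapping is defined in ETSI EN 300 744) and uses the DVB-T constellation parameter alpha, where alpha = 1 gives uniform 64-QAM and larger values push the four quadrant clusters further apart. The function names are illustrative only. The HP demapper only needs the signs of I and Q, so the HP bits can survive noise that already corrupts the LP bits.

```python
# Simplified hierarchical 64-QAM sketch. The bit-to-point mapping below is an
# assumption (Gray coded per axis); it is NOT the exact DVB-T mapping.

# Two LP bits per axis pick one of four magnitude levels (Gray coded).
GRAY_LEVEL = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_BITS = {level: bits for bits, level in GRAY_LEVEL.items()}


def axis_value(hp_bit, lp_bits, alpha):
    """One HP bit sets the sign (coarse), two LP bits set the magnitude (fine)."""
    magnitude = alpha + 2 * GRAY_LEVEL[tuple(lp_bits)]  # alpha, alpha+2, alpha+4, alpha+6
    return -magnitude if hp_bit else magnitude


def map_symbol(hp, lp, alpha=1.0):
    """hp: 2 bits selecting the quadrant; lp: 4 bits selecting the point inside it."""
    return complex(axis_value(hp[0], lp[0:2], alpha), axis_value(hp[1], lp[2:4], alpha))


def demap_hp(symbol):
    """HP bits only need the signs of I and Q: effectively a QPSK decision."""
    return (int(symbol.real < 0), int(symbol.imag < 0))


def demap_lp(symbol, alpha=1.0):
    """LP bits need the nearest magnitude level on each axis, which is more fragile."""
    bits = []
    for v in (symbol.real, symbol.imag):
        level = min(range(4), key=lambda k: abs(abs(v) - (alpha + 2 * k)))
        bits.extend(LEVEL_BITS[level])
    return tuple(bits)


tx = map_symbol((0, 1), (1, 0, 0, 1), alpha=2)  # 8-4j
rx = tx + complex(-1.5, 1.2)                     # additive channel noise
print(demap_hp(rx), demap_lp(rx, alpha=2))       # → (0, 1) (1, 1, 0, 0)
```

In this example the HP bits (0, 1) are recovered correctly, while the LP bits (1, 0, 0, 1) are not; increasing alpha enlarges the gap between quadrants and makes the HP decision even more robust, at the cost of the LP stream.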

As a result, hierarchical modulation allows two streams with different bitrates and different robustness to be transmitted in the same RF channel: the HP stream is more immune to noise than the LP stream, which is carried by the less significant bits of each symbol. Combined with scalable coding at the application layer, hierarchical modulation enables a smart use of the available bandwidth, as explained below. This approach is recommended, for instance, by the Blue Book “DVB-H Implementation Guidelines” (DVB Document A092, July 2005).

Scalable coding structures the total bitstream into two or more layers: a standalone base layer plus a number of enhancement layers. The base layer carries the data for a base-quality picture using high compression, while the enhancement layers carry data that bring the base pictures up to a better quality. In this case the HP stream carries only the base layer, whereas the LP stream carries the enhancement layers, which together with the base layer yield the full-resolution video.
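
The layer-to-stream split can be sketched as follows. This is a hypothetical illustration: the Layer type, the function names and the bitrates are assumptions, not MING-T figures. The base layer goes to HP; enhancement layers fill LP in order, and once one does not fit, all higher layers are dropped as well, because each enhancement layer depends on the layers beneath it.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kbps: float  # bitrate of this layer alone (hypothetical figures below)


def assign_layers(layers, hp_kbps, lp_kbps):
    """Place the base layer (layers[0]) in HP and the enhancement layers in LP.

    Enhancement layers are added in order while they fit; once one does not fit,
    every later layer is dropped too, since it depends on the one below it.
    Returns (hp_layers, lp_layers, dropped_layers).
    """
    base, *enhancements = layers
    if base.kbps > hp_kbps:
        raise ValueError("base layer does not fit in the HP stream")
    lp, dropped, used = [], [], 0.0
    for layer in enhancements:
        if not dropped and used + layer.kbps <= lp_kbps:
            lp.append(layer)
            used += layer.kbps
        else:
            dropped.append(layer)
    return [base], lp, dropped


def decodable_layers(hp, lp, lp_received):
    """A receiver in poor conditions decodes HP only; a good one decodes HP + LP."""
    return hp + lp if lp_received else list(hp)


layers = [Layer("base", 300), Layer("enh1", 400), Layer("enh2", 500)]
hp, lp, dropped = assign_layers(layers, hp_kbps=350, lp_kbps=800)
print([x.name for x in decodable_layers(hp, lp, lp_received=False)])  # ['base']
print([x.name for x in decodable_layers(hp, lp, lp_received=True)])   # ['base', 'enh1']
```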

The above combination of hierarchical modulation and scalable video coding could be used to provide a differentiated service, which could be exploited in several application scenarios: different wireless reception conditions (e.g. outdoor/indoor), different terminal capabilities, and different billing schemes.

Although much work has been done in recent years on video scalability, at both the research and the standardization level, many practical issues remain to be addressed when dealing with transmission standards, especially those that have only recently been approved (such as DVB-H) or have not yet been finalized (such as DMB-T). These standards impose many constraints (such as the ratio between the modulation levels, the available bandwidths, and so on) that have to be taken into account when deploying hierarchical modulation combined with video scalability.

In this task we consider the video codec H.264, known for its high compression efficiency (and allowed by the DVB-H standard). These are the most important items which must be taken into account:

The ratio between the LP bandwidth and the HP bandwidth, which depends on the selected hierarchy mode of the hierarchical modulation and on the code rates assigned to the two streams
Different scalability choices are available, notably spatial, temporal, SNR and FGS. Each has different characteristics in terms of codec complexity, visual performance (which in turn may depend on the video content) and introduced overhead. Combined scalability modes could also be employed
The overall transmission robustness also depends on the selected FEC, which lowers the SNR required for reliable reception: the amount of Reed-Solomon parity data can be varied, resulting in different levels of noise immunity
Finally, various measures can be taken at the decoder side to counteract the effect of bit errors and/or packet loss
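
As an illustration of the first item, the HP and LP capacities can be computed from the DVB-T physical-layer parameters, which DVB-H reuses. This sketch assumes an 8 MHz channel and the standard useful-bitrate formula (data carriers × bits per carrier × code rate × 188/204, divided by the OFDM symbol duration including the guard interval); the function names are illustrative. It ignores the DVB-H MPE-FEC layer, whose RS(255,191) code, if used at full strength, devotes about a quarter of the capacity to parity.

```python
# DVB-T OFDM parameters for an 8 MHz channel (ETSI EN 300 744); the 4K mode is
# the DVB-H addition. Other channel bandwidths would scale the symbol duration.
DATA_CARRIERS = {"2K": 1512, "4K": 3024, "8K": 6048}
USEFUL_SYMBOL_S = {"2K": 224e-6, "4K": 448e-6, "8K": 896e-6}
RS_OUTER = 188 / 204  # outer Reed-Solomon RS(204,188) on transport-stream packets


def stream_bitrate(bits_per_carrier, code_rate, guard, mode="8K"):
    """Useful bitrate (bit/s) carried by `bits_per_carrier` bits of each data carrier."""
    symbol_s = USEFUL_SYMBOL_S[mode] * (1 + guard)  # useful part + guard interval
    return DATA_CARRIERS[mode] * bits_per_carrier * code_rate * RS_OUTER / symbol_s


def hierarchical_bitrates(constellation, hp_rate, lp_rate, guard, mode="8K"):
    """Return (HP, LP) bitrates. HP always takes 2 bits per carrier (the quadrant);
    LP takes the rest: 2 bits in 16-QAM, 4 bits in 64-QAM."""
    bits = {"16-QAM": 4, "64-QAM": 6}[constellation]
    return (stream_bitrate(2, hp_rate, guard, mode),
            stream_bitrate(bits - 2, lp_rate, guard, mode))


hp, lp = hierarchical_bitrates("64-QAM", hp_rate=1/2, lp_rate=2/3, guard=1/4)
print(f"HP {hp / 1e6:.2f} Mbit/s, LP {lp / 1e6:.2f} Mbit/s, LP/HP = {lp / hp:.2f}")
# → HP 4.98 Mbit/s, LP 13.27 Mbit/s, LP/HP = 2.67
```

With 64-QAM, HP at code rate 1/2 and LP at 2/3, the LP stream thus has about 2.67 times the HP capacity; with 16-QAM and equal code rates the two streams would have the same capacity.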

Our proposal is to develop a block that decides which scalability mode is best to employ at the coder side. Specifically, we want to switch between scalability based mainly on frame skipping (i.e. temporal scalability) and scalability based on degrading the quality of each individual frame (basically SNR scalability). The decision has to be taken according to the video content. This is a general perceptual problem: given a limited bandwidth, a video can be coded at a high frame rate with poor picture quality, or vice versa. From the user's point of view a trade-off between temporal and spatial quality is therefore needed: generally speaking, a higher frame rate is preferred only for high-motion video (e.g. sports events), while a lower frame rate is preferred for low-motion video (e.g. news).
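
The arithmetic behind this trade-off is simple. Assuming the HP stream offers a fixed bitrate R_HP and the video is coded at frame rate f, the average number of bits b available per frame is given below; the 256 kbit/s figure is only an illustrative assumption, not a MING-T parameter. Halving the frame rate doubles the bits available to each frame.

```latex
b = \frac{R_{HP}}{f}, \qquad
R_{HP} = 256\ \text{kbit/s}: \quad
f = 25\ \text{fps} \Rightarrow b = 10.24\ \text{kbit}, \qquad
f = 12.5\ \text{fps} \Rightarrow b = 20.48\ \text{kbit}
```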

In practice, the decision block has to choose the most suitable scalable mode for the HP stream. To accomplish this, we propose to measure in real time the degree of “jerky effect” present in the video sequence. This measure is computed for each pair of consecutive frames and compared against a threshold. The instantaneous frame size (i.e. the number of bits allotted to each frame) is set according to the result of this comparison. Since the bandwidth is limited, that frame size constrains the frame rate.
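
A minimal Python sketch of this per-frame-pair loop is shown here. The jerkiness measure is an assumption: the text does not define it, so the sketch uses the mean motion-vector magnitude between two consecutive frames as a hypothetical proxy. The threshold, the two frame sizes and the HP bitrate are likewise made-up values; in the scheme described here they would come from off-line visual experiments and from the hierarchy mode.

```python
import math


def jerkiness(motion_vectors):
    """Hypothetical jerkiness proxy: mean motion-vector magnitude (pixels)
    between two consecutive frames. The actual measure may differ."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)


def choose_frame_size(motion_vectors, hp_bps, threshold, small_bits, large_bits):
    """Set the instantaneous frame size (bits per frame) from the threshold test,
    then derive the frame rate the HP bandwidth allows: fps = hp_bps / frame_bits."""
    if jerkiness(motion_vectors) > threshold:
        frame_bits = small_bits  # high motion: keep the frame rate, lower per-frame quality
    else:
        frame_bits = large_bits  # low motion: skip frames, raise per-frame quality
    return frame_bits, hp_bps / frame_bits


fast = [(6, 8), (3, 4)]  # mean magnitude 7.5 px
slow = [(1, 0), (0, 1)]  # mean magnitude 1.0 px
print(choose_frame_size(fast, 250_000, 4.0, 10_000, 20_000))  # (10000, 25.0)
print(choose_frame_size(slow, 250_000, 4.0, 10_000, 20_000))  # (20000, 12.5)
```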

This can be viewed as a scalable mode selector: a greater weight is put on temporal scalability when frame dropping is preferable to a degradation of picture quality; otherwise SNR scalability should be employed. As a further enhancement, terminal capabilities will also be taken into account by using spatial scalability, which allows the size of the video to be controlled very efficiently.

This decision block would work in the architecture sketched in Figure 12. The block receives as inputs:

the parameters of the transmission and reception scenarios that motivate the use of hierarchical modulation
the constraints given by the DVB standards
the output of the Motion Analyzer, which in turn processes the motion information (basically the motion vectors) available at the coder side

Based on the output of the Motion Analyzer, the decision block selects the best scalability mode to be used by the scalable coder. Most importantly, no a priori setting of the outgoing frame rate is required, whereas in fixed scalability systems the frame rate has to be set according to what is being filmed, which is often unknown or unpredictable. This scheme requires suitable visual experiments, performed off-line, to provide the parameters (e.g. the threshold) that drive the behavior of the Decision Block.
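
The role of the Decision Block can be sketched as a small Python class. Everything here is hypothetical: the class and field names, the scalar motion score standing in for the Motion Analyzer output, and the rule that enables spatial scalability when the terminal is narrower than the coded picture. The sketch only shows the control flow: one decision per pair of frames, driven by a threshold calibrated off-line, with no frame rate fixed in advance.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DecisionBlock:
    threshold: float  # hypothetical: calibrated off-line by visual experiments
    coded_width: int  # hypothetical: picture width the base layer is coded at

    def select(self, motion_score: float, terminal_width: int) -> List[str]:
        """Return the scalability modes to use for one pair of frames."""
        modes = []
        if terminal_width < self.coded_width:
            modes.append("spatial")  # small terminal: reduce the picture size
        # High motion: dropped frames would look jerky, so degrade per-frame quality.
        # Low motion: frame dropping is barely visible, so prefer temporal scalability.
        modes.append("SNR" if motion_score > self.threshold else "temporal")
        return modes

    def run(self, motion_scores: Iterable[float], terminal_width: int) -> List[List[str]]:
        """One decision per frame pair; no output frame rate is fixed in advance."""
        return [self.select(score, terminal_width) for score in motion_scores]


block = DecisionBlock(threshold=4.0, coded_width=320)
print(block.run([1.0, 7.5, 2.0], terminal_width=320))  # [['temporal'], ['SNR'], ['temporal']]
print(block.select(7.5, terminal_width=176))           # ['spatial', 'SNR']
```

A practical selector would probably add hysteresis around the threshold so that the mode does not flip on every frame pair; that refinement is left out of this sketch.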
