
ERROR CONCEALMENT BY MEANS OF CLUSTERED BLOCKWISE PCA

Alessandra M. Coelho, Joaquim T. de Assis, Vania Vieira Estrela*

[email protected], [email protected], [email protected]

Universidade do Estado do Rio de Janeiro (UERJ), Instituto Politécnico do Rio de Janeiro (IPRJ), CP 972825, 28630-050, Nova Friburgo, RJ, Brazil

ABSTRACT

This paper analyzes two variants of Principal Component Analysis (PCA) for error concealment: blockwise PCA and clustered blockwise PCA. Realistic communication channels are not error free, and since the signals transmitted over real-world channels are highly compressed, the quality of images reconstructed from corrupted data can be very unsatisfactory, regardless of the cause of the corruption. Error concealment is intended to ameliorate the impact of channel impairments by using a priori information about typical images in conjunction with the available picture redundancy to provide subjectively acceptable renditions of the affected picture regions. Experiments with the two proposed algorithms are presented.

Index Terms— Principal component analysis, error concealment, clustering, region partitioning, mixtures.

1. INTRODUCTION

Since we are dealing with compressed images and videos, error concealment schemes are closely related to the type of coding/decoding algorithm used [2, 12]. Although there are many other competitive coding techniques in the literature, such as fractal-based coding [11], model-based coding [3, 9], and object-based coding [10], block transform-based coding is by far the most popular for video compression, while subband-based coding is the best suited for image compression.

Error concealment/correction algorithms permit reliable and timely delivery of image and video data over the Internet. In block transform coding, the spatial redundancies inside blocks are removed, and the energy is compacted into a small number of coefficients after the transformation. Realistic communication channels are not error free, although the loss mechanism may vary broadly from medium to medium. Data corruption may be caused by network congestion, thermal noise, switching noise, signal fading, bit errors in noisy channels, cell loss in packet networks, and so on. Because signals transmitted through real-world channels are highly compressed, the quality of images reconstructed from corrupted data can be unsatisfactory. Error concealment is intended to mitigate the impact of channel impairments by using a priori information about typical images combined with picture redundancy to provide subjectively acceptable renditions of the affected blocks. Some helpful a priori properties of natural image sequences are:

1) Smoothness of the reconstructed blocks, which requires them to be connected with their neighbors without abrupt image intensity changes;

2) Edge Continuity of the objects present in the scene; and

3) Consistency, which requires that correctly received blocks not be altered by the restoration process and that restored values lie in a known range.

The concealment process must be supported by a suitable transfer format that helps to identify the image regions corresponding to lost or damaged data [6, 7, 8]. Once the blocks to be concealed are identified, a mixture of spatial and temporal replacement techniques may be applied to fill in the lost picture elements. In conversational applications, such as video-telephony or videoconferencing, a peer-to-peer communication is established, and the same operations, i.e., video encoding and decoding, are performed at both ends. In video streaming, a client-server architecture is employed instead. The client queries the server for a specific video stream; a certain amount of data is pre-rolled; then video data are transmitted from the server and decoded and displayed by the client in real time. In case of packet losses, the initial delay due to the pre-roll operation usually allows the client to ask for a given number of retransmissions; nevertheless, a residual packet loss rate subsists. In order to reduce the effect of losses, error concealment at the decoder is highly desirable. Algorithms based on temporal, spatial, and spatio-temporal interpolation have been proposed [2, 12, 13]. These algorithms assume that either a single macro-block (MB) or a slice consisting of several consecutive MBs is lost. The latter case is quite realistic, while the former requires

that an MB interleaving scheme be employed. Information from the available neighboring MBs and from the MBs in the adjacent frames is used to estimate the missing information, namely the motion vectors (MVs) of the missing MB and its texture information [2, 4].
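As a rough illustration of this kind of temporal concealment (not the specific method of [2, 4]), the sketch below estimates the MV of a lost MB as the component-wise median of the MVs of its available neighbors and copies the motion-compensated block from the previous frame; the function name, the 16×16 block size, and the median rule are illustrative assumptions.

import numpy as np

def conceal_mb_temporal(prev_frame, neighbor_mvs, mb_row, mb_col, mb_size=16):
    """Estimate a lost macro-block by a motion-compensated copy from the
    previous frame, using the component-wise median of the motion vectors
    of the available neighboring macro-blocks as the estimated MV."""
    mvs = np.asarray(neighbor_mvs, dtype=int)       # shape (n, 2): (dy, dx) per neighbor
    dy, dx = np.median(mvs, axis=0).astype(int)     # median MV as the estimate
    top, left = mb_row * mb_size, mb_col * mb_size
    h, w = prev_frame.shape
    # Clip the motion-compensated block position to the frame borders.
    r0 = int(np.clip(top + dy, 0, h - mb_size))
    c0 = int(np.clip(left + dx, 0, w - mb_size))
    return prev_frame[r0:r0 + mb_size, c0:c0 + mb_size].copy()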

Computer vision and image processing have used PCA extensively because it provides the optimal linear subspace for representing data in a least-squares sense; it has been employed for dimensionality reduction and subspace analysis in various domains. This text focuses on the decoder, where errors can be detected in two places: (i) in the compression syntax, where an error is assumed whenever discrepancies are encountered; and (ii) in the reconstructed image, where a fault is assumed whenever features resemble artifacts caused by channel errors. For instance, one can look for damage in a single transform coefficient by examining the difference between the edge pixels of a block and those of its four neighboring blocks. Assuming that the change between blocks is smooth, this difference is small; otherwise, an error took place.
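A minimal sketch of such a boundary check follows; the 8×8 block size, the mean-absolute-difference statistic, and the threshold are illustrative assumptions rather than a prescribed detector.

import numpy as np

def block_looks_damaged(frame, row, col, bs=8, threshold=30.0):
    """Flag a block as suspicious when the mean absolute difference between
    its border pixels and the adjacent pixels of its four neighboring blocks
    is large, i.e. when the assumed smoothness across block edges is violated."""
    top, left = row * bs, col * bs
    block = frame[top:top + bs, left:left + bs].astype(float)
    diffs = []
    if top > 0:                                   # neighbor above
        diffs.append(np.abs(block[0, :] - frame[top - 1, left:left + bs]))
    if top + bs < frame.shape[0]:                 # neighbor below
        diffs.append(np.abs(block[-1, :] - frame[top + bs, left:left + bs]))
    if left > 0:                                  # neighbor to the left
        diffs.append(np.abs(block[:, 0] - frame[top:top + bs, left - 1]))
    if left + bs < frame.shape[1]:                # neighbor to the right
        diffs.append(np.abs(block[:, -1] - frame[top:top + bs, left + bs]))
    return float(np.mean(np.concatenate(diffs))) > threshold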

Figure 1 – Multi-modal data at time k and at time k'.

2. ERROR CONCEALMENT

In video coding standards such as MPEG-4 [10], region of interest (ROI) information is available. A model-based error concealment scheme can use this information to devise a better error concealment system. Given a set of observations, the multi-modal data (Fig. 1) are modeled with minimum representation error. The data are clustered into multiple components (two components in this example) in a multi-dimensional space. As mentioned, the data can be non-stationary, i.e., their stochastic properties are time-varying. At discrete time instant k the data are clustered one way, and at time instant k' they are clustered as shown in Fig. 1: the mean of each component shifts and the most representative axes of each component also rotate.
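As a toy illustration of fitting such components (not the paper's concealment scheme), the sketch below clusters the observations with plain k-means and then extracts each component's mean and principal axes from its covariance; the two-component choice, the iteration count, and k-means itself are illustrative assumptions.

import numpy as np

def cluster_and_fit_axes(data, n_clusters=2, n_iter=20, seed=0):
    """Toy multi-component model of multi-modal data: k-means clustering
    followed by per-cluster PCA (mean plus principal axes from the cluster
    covariance). Assumes well-separated data so that no cluster goes empty."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), n_clusters, replace=False)]
    for _ in range(n_iter):                                   # plain k-means
        labels = np.argmin(((data[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([data[labels == k].mean(axis=0) for k in range(n_clusters)])
    components = []
    for k in range(n_clusters):
        pts = data[labels == k]
        mean = pts.mean(axis=0)
        evals, evecs = np.linalg.eigh(np.cov(pts.T))          # principal axes
        order = np.argsort(evals)[::-1]                       # largest variance first
        components.append((mean, evecs[:, order], evals[order]))
    return labels, components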

PCA can be applied to any given data set by breaking the data into pieces and vectorizing them. PCA is then applied to each set of vectors to be compressed/coded, treating them as if they were independent and identically distributed. In this illustration, we assume the same reconstruction error for each of the cases along this axis.
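The sketch below makes this blockwise procedure concrete for a (frames × height × width) volume: each non-overlapping spatial tile is vectorized across the frames and given its own PCA subspace. The function names, the 8×8 tile size, and the number of retained components are illustrative assumptions, not a prescribed implementation.

import numpy as np

def blockwise_pca(volume, bs=8, n_components=4):
    """Blockwise PCA sketch: split a (frames, height, width) volume into
    bs x bs spatial blocks, vectorize each block over the frames, and fit an
    independent low-dimensional PCA subspace per block location.
    Assumes H and W are multiples of bs and at least n_components frames."""
    F, H, W = volume.shape
    model = {}
    for top in range(0, H - bs + 1, bs):
        for left in range(0, W - bs + 1, bs):
            # One bs*bs-dimensional vector per frame for this block location.
            X = volume[:, top:top + bs, left:left + bs].reshape(F, -1).astype(float)
            mean = X.mean(axis=0)
            U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
            basis = Vt[:n_components]                 # principal axes (eigenvectors)
            scores = (X - mean) @ basis.T             # per-frame coefficients
            model[(top, left)] = (mean, basis, scores)
    return model

def reconstruct(model, F, H, W, bs=8):
    """Rebuild the volume from the per-block means, bases, and coefficients."""
    out = np.zeros((F, H, W))
    for (top, left), (mean, basis, scores) in model.items():
        out[:, top:top + bs, left:left + bs] = (scores @ basis + mean).reshape(F, bs, bs)
    return out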

2.1. Blockwise PCA (BP)

BP explores various partitionings of the visual data volume (of size W × H × F = width × height × frames) into smaller blocks and applies PCA to each block individually. With BP, it is possible to take full advantage of the local linearity in the data. Since this approach constructs a subspace for each block, it is safe to expect that these subspaces will have very low dimensionality compared to that of the entire data set. The power of using blocks lies in the fact that, even when the data has global nonlinearities, it can be well approximated by linear models at a local level. Thus, the dimensionality of the blocks is typically expected to be much lower than that of the complete data set. As PCA extracts the best possible subspace of the data set in a least-squares sense, it is used to reduce redundancy while controlling the total RMS error between the original data and the data reconstructed from the extracted subspace. As each eigenvalue represents the variance along its corresponding eigenvector, a convenient measure often used to predict the RMS error is the eigenratio, defined as

R_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{M} \lambda_j} ,

where λ_i is the i-th largest eigenvalue (see [14]). In general, visual data display intricate variations over space and time. BP takes advantage of the local linearity in such largely nonlinear data and is therefore able to deliver better compression.

2.2. Clustered Blockwise PCA (CBP)

BP allows us to apply PCA to large data sets faster and, more notably, with higher compression rates than global PCA. It provides a convenient representation of the data as a set of subspaces, which can also be used to exploit the self-similarities that commonly occur within visual data. Subspaces of "similar" blocks can be merged, leading to an extra reduction in the total amount of storage. Note that such correlations in appearance can occur not only in the spatial domain but also in the temporal domain.
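A minimal sketch of how the eigenratio R_k from Section 2.1 can drive the choice of each block's subspace dimensionality is given below; the sorting step and the 0.95 target are illustrative assumptions.

import numpy as np

def choose_dimensionality(eigenvalues, target_ratio=0.95):
    """Pick the smallest k whose eigenratio R_k = sum_{i<=k} lambda_i / sum_j lambda_j
    reaches the target; the eigenvalues are sorted into decreasing order first."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratios = np.cumsum(lam) / lam.sum()            # R_1, R_2, ..., R_M
    k = int(np.searchsorted(ratios, target_ratio) + 1)
    return k, ratios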

From a statistical standpoint, the distribution of the visual data variation can be far from Gaussian, whereas PCA assumes the data to be normally distributed. Nevertheless, variations in local regions can often be modeled with normal distributions, and regions at different spatio-temporal locations may have identical Gaussian distributions. A recent analysis of PCA [4] suggests a trade-off between the number of subspaces, the dimensionality of each subspace, and the dimensionality of the data. This approach achieves higher efficiency not only in terms of storage, but also in computational cost. As a result, PCA is successfully scaled to large problems without losing any of its intrinsic valuable properties. Automatic methods for estimating the most advantageous spatial and temporal block size and/or allowing it to vary across the data volume are still under study. CBP clusters the subspaces obtained from BP, merging them into clusters, each represented by a single subspace. This merging of subspaces makes it possible to tie together blocks with similar local subspaces or statistics and achieve a more efficient representation by avoiding the redundant encoding of subspaces. To evaluate block similarity, a metric assessing the closeness or distance between subspaces is needed. Linear subspaces can be characterized by their orthogonal projection matrix, P = VV^T, where V is a matrix whose columns are the eigenvectors spanning the subspace. The distance between two subspaces can be taken as the difference between their projection matrices:

d(X, Y) = \| P_X - P_Y \|_2 ,

where P_X and P_Y are the projection matrices of subspaces X and Y, in that order, and

d(X, Y) = \sin \theta_1 ,

where θ_1 is the first and largest canonical angle. The canonical angles are the angles between all pairs of orthonormal vectors of the QR-decomposed eigenvector matrices of the two subspaces. Thus, the largest canonical angle gives a rotation-invariant distance between two subspaces, and the distance can be calculated without computing the projection matrices. This distance can be compared to a threshold to decide whether two subspaces of interest should be combined. A subspace should not be merged with another one whose dimensionality is larger, because the merge will not result in a reduction in the total amount of storage. Clustering and merging the block subspaces result in a reduction in the number of subspaces.
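A minimal sketch of this subspace-distance test follows, assuming each subspace is given by a matrix with orthonormal columns: the singular values of V_X^T V_Y are the cosines of the canonical angles, so the smallest one yields sin θ_1 without forming the projection matrices. The merge threshold is an illustrative assumption.

import numpy as np

def subspace_distance(Vx, Vy):
    """Distance between two subspaces spanned by the orthonormal columns of
    Vx and Vy: the sine of the largest canonical angle, obtained from the
    smallest singular value of Vx^T Vy. Assumes both subspaces have the same
    dimensionality, so no projection matrices are needed."""
    sigma = np.linalg.svd(Vx.T @ Vy, compute_uv=False)   # cosines of canonical angles
    cos_largest_angle = np.clip(sigma.min(), 0.0, 1.0)
    return float(np.sqrt(1.0 - cos_largest_angle ** 2))  # sin(theta_1)

def should_merge(Vx, Vy, threshold=0.2):
    """Merge the first subspace with the second only when its dimensionality
    is not smaller (otherwise there is no storage saving) and the two
    subspaces are closer than the chosen threshold."""
    return Vx.shape[1] >= Vy.shape[1] and subspace_distance(Vx, Vy) < threshold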

Figure 2 – Low motion GOP: (a) original frame; (b) frame estimated by the CBP algorithm. High motion GOP: (c) original frame; (d) frame estimated by the CBP algorithm.

3. EXPERIMENTAL RESULTS AND CONCLUSIONS

When it comes to performance assessment of the proposed image processing methods, a difficult issue is how to compare different error concealment techniques. Image and video quality can be assessed both subjectively and objectively. Undoubtedly, the best and decisive judges of image and video quality are human observers, but subjective evaluations are time-consuming and not repeatable. In contrast, objective measures are often mathematically tractable and reproducible. For images and videos, the most commonly used objective measure has been the peak signal-to-noise ratio (PSNR). The PSNR values for a single corrected frame k and for a sequence of frames are

\mathrm{PSNR}_k = 10 \log_{10} \frac{255^2 \, MN}{\sum_i (x_i - \hat{y}_i)^2} \quad \mathrm{and} \quad \mathrm{PSNR} = 10 \log_{10} \frac{255^2 \, MN}{\| \mathbf{x} - \hat{\mathbf{y}} \|^2} ,

where x_i and ŷ_i are, respectively, the original and reconstructed 8-bit pixel values, the vectors x and ŷ are, likewise, the original and restored images, and each frame has M × N pixels. The definition indicates that the larger the PSNR value, the better the image quality. However, PSNR values do not always correlate well with subjective quality evaluation.
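A minimal sketch of the PSNR computation above, assuming 8-bit data; passing a whole sequence stacked into a single array yields the sequence-level value.

import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between an original and a reconstructed 8-bit image (or a
    whole sequence stacked as one array): 10 log10(peak^2 / MSE)."""
    x = np.asarray(original, dtype=float)
    y = np.asarray(reconstructed, dtype=float)
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)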

Figure 3 – PSNR comparison for part of the “Foreman” sequence.

Some experiments have been performed with the two proposed algorithms. A few results for the "Foreman" QCIF sequence (176×144 pixels/frame) are presented in Figures 2 and 3. These results assume that 15% of the received information has been corrupted.

Fig. 2 shows results for low and high amounts of motion using the CBP algorithm, as one can visually verify from the reconstructed frames on the right. Fig. 3 shows a plot of PSNR versus frame number in which the behaviors of the BP and CBP algorithms are compared against error-free reception. There is little gain in PSNR between BP and CBP; however, CBP permits a much faster transmission rate due to its variable block size. Hence, CBP allows a more compact representation of a compressed frame.

CBP was introduced for representing visual data; it not only takes advantage of the fact that representative visual data contain restricted variations, but also exploits the spatio-temporal correlations in the data. It achieves superior effectiveness not only in terms of storage space, but also in computational cost. As a consequence, PCA was successfully scaled to large problems without losing any of its intrinsic valuable properties.

4. ACKNOWLEDGEMENTS

The work of Dr. Estrela and Dr. de Assis is supported by CAPES, CNPq and FAPERJ.

5. REFERENCES

[1] K. Blekas, A. Likas, N.P. Galatsanos, and I.E. Lagaris, "A Spatially-Constrained Mixture Model for Image Segmentation", IEEE Trans. on Neural Networks, vol. 16, pp. 494-498, 2005.
[2] A.K. Katsaggelos and N.P. Galatsanos, Signal Recovery Techniques for Image and Video Compression and Transmission, Springer, 1998.
[3] X. Li, A.K. Katsaggelos, and G.M. Schuster, "A Recursive Shape Error Concealment Algorithm", Proceedings of the IEEE International Conference on Image Processing, p. 177, 2002.
[4] I. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002.
[5] K.I. Kim, M. Franz, and B. Schölkopf, "Iterative Kernel Principal Component Analysis for Image Modeling", IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(9), pp. 1351-1366, 2005.
[6] A.M. Tekalp, Digital Video Processing, Prentice-Hall, New Jersey, 1995.
[7] H. Sun and W. Kwok, "Concealment of Damaged Block Transform Coded Images using Projections Onto Convex Sets", IEEE Trans. on Image Processing, vol. 4, pp. 470-477, April 1995.
[8] M. Yajnik, S. Moon, J. Kurose, and D. Towsley, "Measurement and Modeling of the Temporal Dependence in Packet Loss", Proc. of INFOCOM 1999, pp. 345-352, March 1999.
[9] P. Eisert, T. Wiegand, and B. Girod, "Model-Aided Coding: a New Approach to Incorporate Facial Information into Motion-Compensated Video Coding", IEEE Trans. on Circuits and Systems for Video Technology, 10:344-358, April 2000.
[10] T. Sikora, "The MPEG-4 Video Standard Verification Model", IEEE Trans. on Circuits and Systems for Video Technology, 7(1):19-31, February 1997.
[11] B. Wohlberg and G. DeJager, "A Review of the Fractal Image Coding Literature", IEEE Trans. on Image Processing, 8(12):1716-1729, December 1999.
[12] Y. Wang, S. Wenger, J. Wen, and A.K. Katsaggelos, "Error Resilient Video Coding Techniques", IEEE Signal Processing Magazine, pp. 61-82, July 2000.
[13] S. Belfiore, L. Crisa, M. Grangetto, E. Magli, and G. Olmo, "Robust and Edge-Preserving Video Error Concealment by Coarse-to-Fine Block Replenishment", Proc. of ICASSP, 2002.
[14] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.