EFFICIENT COMPRESSION TECHNIQUES FOR MULTI-DIMENSIONAL IMAGES

by

Hariharan G. Lalgudi

A Dissertation Submitted to the Faculty of the
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
In Partial Fulfillment of the Requirements For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA

2008

FINAL EXAMINING COMMITTEE APPROVAL FORM

As members of the Dissertation Committee, we verify that we have read the dissertation prepared by Hariharan G. Lalgudi, entitled Efficient Compression Techniques for Multi-Dimensional Images, and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

Michael W. Marcellin    Date: August 20, 2008
Ali Bilgin              Date: August 20, 2008
Bane Vasic              Date: August 20, 2008
Ivan Djordjevic         Date: August 20, 2008

Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

Dissertation Director: Michael W. Marcellin    Date: August 20, 2008
Dissertation Co-Director: Ali Bilgin           Date: August 20, 2008

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the library. Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship.
In all other instances, however, permission must be obtained from the author.

SIGNED: Hariharan G. Lalgudi

ACKNOWLEDGEMENTS

I would like to thank my advisors, Prof. Michael W. Marcellin and Prof. Ali Bilgin, for their continued support and valuable guidance throughout the PhD program. Thanks to Prof. Bane Vasic and Prof. Ivan Djordjevic for spending their valuable time reviewing my dissertation. I would also like to thank my colleagues at SPACL for their valuable input. Last but not least, I would like to thank my family and friends, without whom this work would not have been possible.

To my parents and my wife Rajeswari

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND
  2.1 Linear block transforms
  2.2 Subband transform
    2.2.1 Lifting
    2.2.2 Discrete wavelet transform
  2.3 JPEG2000
CHAPTER 3 COMPRESSION OF MULTI-DIMENSIONAL IMAGES USING JPEG2000
  3.1 Introduction
  3.2 N-D compression
  3.3 Scalability issues
  3.4 Results
  3.5 Conclusion
CHAPTER 4 VIEW COMPENSATED COMPRESSION OF VOLUME RENDERED IMAGES
  4.1 Introduction
  4.2 Volume rendering
  4.3 Geometric relationship between volume rendered images
  4.4 Compression of volume rendered images
    4.4.1 Lifting-based view compensated wavelet transform
    4.4.2 Compression of depth maps
  4.5 Results
  4.6 Conclusion
CHAPTER 5 SCALABLE LOW COMPLEXITY CODER
  5.1 Introduction
  5.2 Scalable low complexity coder
  5.3 Remote Volume Rendering
    5.3.1 Results
  5.4 Airborne video transmission
    5.4.1 Results
CHAPTER 6 CONCLUSION
REFERENCES

LIST OF FIGURES

2.1 Filter bank realization of ‘two-channel’ subband transforms.
2.2 Lifting implementation of subband analysis.
2.3 Lifting implementation of subband synthesis.
2.4 One dimensional DWT with 3 levels.
2.5 Two dimensional DWT.
2.6 JPEG2000 Part 1 encoder.
2.7 Three level 2-D DWT.
2.8 Codeblocks from different subbands (left) that correspond to a region of interest in the original image (right).
3.1 JPEG2000 Part 2 encoder.
3.2 Stage k of MCT.
3.3 Compression of 4-D data using 2 MCT stages.
4.1 Client-server model.
4.2 Ray casting.
4.3 Volume rendering example.
4.4 View-point geometry.
4.5 Depth map.
4.6 Compression/decompression of volume rendered images.
4.7 Forward lifting-based view compensated wavelet transform.
4.8 Processed depth map.
4.9 PSNR of frames in sequence 1 of dataset 1 at 200 Kbps.
4.10 PSNR of frames in sequence 1 of dataset 1 at 800 Kbps.
5.1 Schematic of the scalable low complexity coder.
5.2 Wavelet coefficient structure.
5.3 Low complexity entropy coding schemes.
5.4 Client-server communication.
5.5 Comparison of end-to-end decompression time for test sequence 1.
5.6 Comparison of end-to-end decompression time for test sequence 2.
5.7 Average PSNR of decompressed images at the client for test sequence 1.
5.8 Average PSNR of decompressed images at the client for test sequence 2.
5.9 Frame 1 of test sequence 1.
5.10 Compressed/Decompressed image – JPEG2000.
5.11 Compressed/Decompressed image – SLCC.
LIST OF FIGURES — Continued

5.12 Codeblocks from different subbands (left) that correspond to a region of interest in the image (right).
5.13 Comparison of end-to-end encoding times for SLCC and JPEG2000.
5.14 Comparison of rate vs. quality for SLCC and JPEG2000.
5.15 Achievable frame rate at different quality levels for SLCC and JPEG2000.

LIST OF TABLES

3.1 Compression performance of 5-D fMRI dataset.
4.1 Average PSNR of volume rendered images for Dataset 1.
4.2 Average PSNR of volume rendered images for Dataset 2.
4.3 Average PSNR of volume rendered images for Dataset 3.
4.4 Average PSNR of volume rendered images for Dataset 4.
4.5 Standard deviation of PSNR for Dataset 1.
5.1 Compression performance of test sequence 1.
5.2 Compression performance of test sequence 2.

ABSTRACT

With advances in imaging and communication systems, there is increased use of multi-dimensional images. Examples include multi-view image/video, hyperspectral image/video, and dynamic volume imaging in CT/MRI/Ultrasound. These datasets consume even larger amounts of resources for transmission or storage compared to 2-D images. Hence, it is vital to have efficient compression methods for multi-dimensional images. In this dissertation, first, a JPEG2000 Part 2 compliant scheme is proposed for compressing multi-dimensional datasets of any dimension N ≥ 3. Secondly, a novel view-compensated compression method is investigated for remote visualization of volumetric data. Experimental results indicate superior compression performance compared to state-of-the-art compression standards.
Thirdly, a new scalable low complexity coder is designed that sacrifices some compression efficiency for a substantial gain in throughput. Potential use of the scalable low complexity coder is illustrated for two applications: airborne video transmission and remote volume visualization.

CHAPTER 1

INTRODUCTION

With tremendous advances in imaging modalities, there has been increased use of multi-dimensional images. Multi-dimensional sources include Computed Tomography (CT), Magnetic Resonance Imaging (MRI), ultrasound imaging, and multi-view image/video [1, 2, 3]. These multi-dimensional images demand large amounts of resources for transmission/storage and hence call for efficient compression techniques. In this dissertation, we present several novel techniques for efficient compression of multi-dimensional images. Chapter 2 of the dissertation provides a brief review of image compression. Subsequent chapters present the proposed compression techniques as outlined below.

JPEG2000 is the current state-of-the-art still-image compression standard [4, 5]. In addition to state-of-the-art compression performance, it provides rich scalability features that are useful for a wide variety of applications. Part 2 of the standard provides tools for compressing multi-component images. These Part 2 extensions have been used for compressing 3-D images in applications such as medical imaging [6] and remote sensing [7]. It is widely believed that the tools are applicable only to 3-D data. In Chapter 3, we present a Part 2 compliant N-D¹ compression scheme for any N ≥ 3.

Volume visualization or volume rendering refers to the display of volumetric data (such as CT/MRI scans) as 2-D images from any desired view-point (a point in 3-D space from which the data is seen). It is a computationally intensive process, often requiring hardware acceleration to achieve a real-time viewing experience.
Remote volume visualization has gained importance over the past few years in medical and industrial applications. One remote visualization model that can accomplish this transmits rendered images from a server based on view-point requests from a client. For constrained server-client bandwidth, an efficient compression scheme is vital for transmitting high quality rendered images. In Chapter 4, we present a new compression scheme that utilizes the geometric relationship between view-points to exploit the correlation between successive rendered images. The proposed method performs better than H.264/AVC (Advanced Video Coding), the state-of-the-art video compression standard. Additionally, our scheme obviates motion estimation between rendered images, enabling a significant reduction in the complexity of the compressor compared to H.264/AVC.

¹ N refers to the number of dimensions of the data.

Applications that require real-time (25-30 frames/s) transmission of images often call for a low complexity compressor or decompressor. Additionally, scalability and compression efficiency can be very desirable properties in such applications. In Chapter 5, we present a new scalable low complexity image coder that has good compression efficiency and high throughput. We illustrate its use for two applications: remote volume visualization and airborne video transmission.

CHAPTER 2

BACKGROUND

Data compression (or source coding) refers to removing redundancy from a source such as speech, image, video, etc. It can be broadly categorized into lossless and lossy compression. With lossless compression, the original data can be recovered exactly. Lossy compression further reduces the compressed size by allowing loss in the fidelity of the data. A typical lossy image compression system consists of transformation, quantization, and entropy coding. The input image samples are first subjected to transformation.
The goal of transformation in image compression is to confine the energy to a few coefficients (‘energy compaction’), so that subsequent stages can exploit the redundancy efficiently. The transformed coefficients are reduced in precision by the quantizer. The quantized values are then compressed using an entropy coder [5]. Popular entropy coding techniques used in image compression include Huffman coding [8], arithmetic coding [9, 10], and run-length coding [5]. Overviews of linear block transforms and subband transforms are provided in Section 2.1 and Section 2.2, respectively. Section 2.3 gives a brief overview of JPEG2000, the current international standard for image compression. The reader is referred to [5] for comprehensive coverage of these topics. Much of the notation used in the following sections is adopted from [5].

2.1 Linear block transforms

A linear block transform maps an input vector I (size C_I × 1) into an output vector O (size C_O × 1). The analysis (forward transform) equation is given by

O = A^* I

where A is the analysis matrix (size C_I × C_O) and A^* is its conjugate transpose. We shall restrict our attention to ‘non-expansive’ transforms, for which the analysis matrix is square (C_I = C_O = C). The synthesis (inverse transform) equation is given by

I = S O

where the synthesis matrix is S = A^{-*}. If S = A, the transform is said to be orthonormal. Applying a linear block transform to an input vector containing all the pixels in an image is prohibitive for most practical systems. For example, with an image size of 1000 × 1000, applying the forward transform involves multiplying the analysis matrix (size 10^6 × 10^6) with 10^6 pixels. Hence, the image is typically subdivided into small blocks and the transform is applied to each block independently. Typical block sizes are 16 × 16 and 8 × 8. Pixels in a block are raster scanned to form the input vector.
The forward transform is then given by

O[n] = A^* I[n]

where n denotes the block index, I[n] is the input vector (size C × 1), and O[n] is the output vector (size C × 1). Brief discussions of two linear block transforms are given below.

The Karhunen-Loève transform (KLT) is an orthonormal transform that decorrelates the input data. Let the input vector I and the output vector O be realizations of random vectors with covariance matrices COV_I and COV_O, respectively. The eigenvectors of the covariance matrix COV_I form the columns of the KLT analysis matrix. It can then be shown [5] that the output vector of the KLT will be uncorrelated (COV_O will be diagonal). The disadvantages of the KLT are its high computational complexity and the fact that the statistics of the input data must be known a priori.

The discrete cosine transform (DCT) is an orthonormal transform with analysis matrix elements given by

a_{p,q} = ν_q cos( πq(p + 0.5) / C )

where a_{p,q} denotes the element of A at the p-th row and q-th column, and ν_q is a normalization constant equal to √(1/C) if q = 0 and √(2/C) if q ≠ 0. The DCT is a separable transform, and hence the 2-D transform can be implemented as a 1-D transform first along the rows and then along the columns (or vice versa). For natural imagery, the analysis matrix of the DCT closely resembles that of the KLT. Additionally, the DCT has lower complexity than the KLT, its analysis matrix does not depend on input statistics, and fast implementations exist. Hence, it is used in many compression systems.

2.2 Subband transform

One disadvantage of block transforms is that the redundancy between adjacent blocks is not removed. This manifests as blocky artifacts in reconstructed images. Subband transforms overcome this disadvantage by adding memory to block transforms. The analysis equation is given by

O[n] = Σ_i A^*[i] I[n − i]

where A[i] denotes a sequence of analysis matrices (size C × C).
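The DCT definition above translates directly into code. The following pure-Python sketch (function names are ours, not from the dissertation) builds the analysis matrix for an arbitrary block size C and applies the forward transform O[n] = A^* I[n] to one block; since the DCT matrix is real, the conjugate transpose is simply the transpose.

```python
import math

def dct_matrix(C):
    """Build the C x C DCT analysis matrix A with entries
    a[p][q] = nu_q * cos(pi * q * (p + 0.5) / C), where
    nu_q = sqrt(1/C) for q = 0 and sqrt(2/C) otherwise."""
    A = [[0.0] * C for _ in range(C)]
    for p in range(C):
        for q in range(C):
            nu = math.sqrt(1.0 / C) if q == 0 else math.sqrt(2.0 / C)
            A[p][q] = nu * math.cos(math.pi * q * (p + 0.5) / C)
    return A

def forward_dct(block):
    """O = A^* I; A is real, so A^* reduces to the transpose A^T."""
    C = len(block)
    A = dct_matrix(C)
    # (A^T I)[q] = sum over p of A[p][q] * I[p]
    return [sum(A[p][q] * block[p] for p in range(C)) for q in range(C)]
```

A quick check that A^T A = I confirms orthonormality, and transforming a constant block shows the energy-compaction property: all energy lands in the q = 0 (DC) coefficient.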
The synthesis equation is given by

I[n] = Σ_i S[i] O[n − i]

The size of the analysis/synthesis matrices used in image compression is typically equal to 2 (C = 2). Hence, we restrict discussions to this special class (‘two-channel’) of subband transforms.

Figure 2.1: Filter bank realization of ‘two-channel’ subband transforms.

Fig. 2.1 shows the filter bank interpretation of ‘two-channel’ subband transforms. In this interpretation, the input is considered as a single sequence I. The correspondence to the vector treatment is given by

I[2n + p] = (I[n])_p, the p-th sample of I[n], p ∈ {0, 1}

The input sequence is passed through a lowpass filter and a highpass filter with impulse responses h_0 and h_1, respectively. The outputs of the filters are down-sampled by 2 to form the lowpass coefficients (L band) and the highpass coefficients (H band). The lowpass and highpass coefficients form the even (O[2n]) and odd (O[2n + 1]) samples of the output sequence, respectively. Mathematically,

O[2n + q] = (O[n])_q = (I ⋆ h_q)[2n] = Σ_k h_q[k] I[2n − k], q ∈ {0, 1}

In the synthesis path, the upsampled (Fig. 2.1) lowpass and highpass coefficients are passed through filters with impulse responses g_0 and g_1, respectively. The outputs of the filters are added to reconstruct the input sequence I. The relationship between the impulse responses of the analysis/synthesis filters and the analysis/synthesis matrices can be shown [5] to be

h_q[2i − j] = (A^*[i])_{q,j}, g_q[2i + j] = (S[i])_{j,q}, q, j ∈ {0, 1}

2.2.1 Lifting

‘Lifting’ is a method of factorizing the analysis/synthesis filters that makes the design and implementation of subband transforms convenient. Fig. 2.2 shows the lifting implementation of subband analysis. The input sequence is separated into even and odd sub-sequences. In each lifting step, one sub-sequence is filtered and added to the other sub-sequence. Let l (= 1, 2, …
, Λ) denote the lifting step index and O_l denote the intermediate output sequence after l lifting steps. The update equations are given by:

If l is odd,
  O_l[2n] = O_{l−1}[2n]
  O_l[2n + 1] = O_{l−1}[2n + 1] + Σ_k λ_l[k] O_{l−1}[2(n − k)]
else,
  O_l[2n + 1] = O_{l−1}[2n + 1]
  O_l[2n] = O_{l−1}[2n] + Σ_k λ_l[k] O_{l−1}[2(n − k) + 1]

At the end of Λ lifting steps, the output sub-sequences O_Λ[2n] and O_Λ[2n + 1] are scaled by the subband gains K_0 and K_1, respectively.

Figure 2.2: Lifting implementation of subband analysis.

Fig. 2.3 shows the lifting implementation of subband synthesis. In the synthesis path, the subband gains are inverted and the lifting steps of the analysis path are applied in reverse order. In each lifting step, one sub-sequence is filtered and subtracted from the other sub-sequence. A chief advantage of the lifting implementation is that the invertibility of the transform is unaffected even if the convolution operators are replaced with any other linear/non-linear, fixed/time-varying operators. Other advantages include computation and memory savings [5].

Figure 2.3: Lifting implementation of subband synthesis.

2.2.2 Discrete wavelet transform

The Discrete Wavelet Transform (DWT) is intimately connected to the subband transform. It is generally understood as a dyadic ‘tree-structured’ subband transform, as illustrated in Fig. 2.4. The subband transform is recursively applied to the lowpass band. The suffix in the subband notation denotes the recursion level.

Figure 2.4: One dimensional DWT with 3 levels.

In a 2-D DWT, the subband transform is first applied along the columns and then along the rows. This procedure generates 4 subbands: LL, HL, LH and HH, as shown in Fig. 2.5a. As in the 1-D case, the procedure is recursively applied on the LL band to yield subbands at different levels. Fig. 2.5b shows the 2-D DWT with 3 levels of transform.
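To make the lifting steps of Section 2.2.1 concrete, the sketch below implements a reversible (5,3)-style analysis/synthesis pair on an even-length integer sequence: one predict step (odd samples updated from even neighbours) followed by one update step (even samples updated from the new odd samples). The boundary handling (sample repetition) and function names are our own simplifications rather than the exact rule of any standard; the point is that each lifting step is inverted exactly by subtracting what was added, so perfect reconstruction holds even with the integer rounding inside the steps.

```python
def fwd_53(x):
    """Forward reversible (5,3)-style lifting on an even-length list of ints."""
    assert len(x) % 2 == 0 and len(x) >= 2
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # predict: d[i] = odd[i] - floor((even[i] + even[i+1]) / 2)
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) // 2) for i in range(n)]
    # update: s[i] = even[i] + floor((d[i-1] + d[i] + 2) / 4)
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) // 4) for i in range(n)]
    return s, d  # lowpass (L) and highpass (H) subbands

def inv_53(s, d):
    """Inverse lifting: undo the steps in reverse order, with subtraction."""
    n = len(d)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) // 4) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) // 2) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x
```

Because the synthesis path recomputes exactly the quantities the analysis path added, the round trip is bit-exact in integer arithmetic, which is the property exploited by the reversible path of JPEG2000 discussed in Section 2.3.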
Figure 2.5: Two dimensional DWT. (a) 1 level. (b) 3 levels.

2.3 JPEG2000

The architectural layout of a typical Part 1 JPEG2000 encoder is shown in Fig. 2.6 [5]. The image samples are first level shifted to make the pixels nominally zero mean. The encoder can take two paths, reversible or irreversible. In the reversible case, the original image can be recovered exactly. In the irreversible case, complete recovery of the original image may not be possible. Both the reversible and the irreversible paths have an optional color transform. The color transform is typically used for natural color images to exploit the redundancy between RGB components. The reversible color transform (RCT) does integer-to-integer transformation, while the irreversible color transform (ICT) involves floating point computation.

Figure 2.6: JPEG2000 Part 1 encoder.

The next step in the encoder is a 2-D DWT. JPEG2000 Part 1 supports two wavelet transforms: (9,7) and (5,3). The numbers indicate the respective filter tap lengths of h_0 and h_1. The (9,7) transform is used in the irreversible path. The reversible (5,3) is an integer-to-integer transform that is obtained by introducing rounding operations in the lifting steps of a regular (5,3) transform. Perfect reconstruction is possible with the reversible (5,3) transform using integer arithmetic, and hence it is used in the reversible path. The 2-D DWT enables a multi-resolution representation of the image, as illustrated in Fig. 2.7. In the figure, R0 denotes the lowest resolution level and R3 denotes the highest resolution level (subbands belonging to different resolution levels are shaded differently).
The image can be reconstructed at a desired resolution by combining the subbands at that and lower resolution levels. For example, the LL3 subband can be used as a low resolution version (R0) of the original image. When the HL3, LH3, and HH3 subbands are used together with the LL3 subband, the image can be reconstructed at the next higher resolution (R1). Note that this resolution scalability of JPEG2000 is enabled by independent compression (and thus decompression) of subbands, as discussed below.

Figure 2.7: Three level 2-D DWT.

Each subband is subdivided into codeblocks which are compressed independently by a block coder. In addition to allowing independent compression of subbands (and thus resolution scalability), this scheme allows finer granularity of access to wavelet coefficients within each subband. This finer granularity of access enables spatial scalability, as illustrated in Fig. 2.8. In the figure, the codeblocks containing the wavelet coefficients which contribute to a spatial region of interest (ROI) are highlighted. Since each codeblock is compressed independently, decompressing the portions of the codestream corresponding to these codeblocks is sufficient to reconstruct the desired ROI. The block coder in JPEG2000 uses a variant of context-adaptive binary arithmetic coding to compress the quantized wavelet coefficients. It generates a compressed bit-stream for each codeblock. The arrangement of the bit-streams from the codeblocks into a final compressed data stream (the ‘code-stream’) can be seen as a secondary process. The PCRD-opt (post-compression rate-distortion optimization) algorithm can be used when generating the code-stream to obtain optimal rate-distortion performance [5].

Figure 2.8: Codeblocks from different subbands (left) that correspond to a region of interest in the original image (right).
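The reversible path of the encoder in Section 2.3 depends on exact integer-to-integer mappings, and the RCT is a compact example of one. The sketch below (function names are ours) follows the commonly published JPEG2000 Part 1 form of the RCT; exact inversion rests on the floor identity floor((t + 2G)/4) = floor((t − 2G)/4) + G with t = R + B, so no information is lost despite the division.

```python
def rct_forward(r, g, b):
    """Reversible color transform (RCT): integer in, integer out."""
    y = (r + 2 * g + b) // 4      # luma approximation (floor division)
    cb = b - g                    # blue-difference chroma
    cr = r - g                    # red-difference chroma
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """Exact inverse: recovers G first, then R and B by offsetting."""
    g = y - ((cb + cr) // 4)
    r = cr + g
    b = cb + g
    return r, g, b
```

Contrast this with the ICT (a floating point RGB-to-YCbCr matrix), which cannot guarantee bit-exact recovery and is therefore confined to the irreversible path.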
CHAPTER 3

COMPRESSION OF MULTI-DIMENSIONAL IMAGES USING JPEG2000

3.1 Introduction

JPEG2000 Part 2 [11] describes the use of multi-component transforms (MCT) to decorrelate a multi-component image in the component direction. The procedure is illustrated in Fig. 3.1. Input image components, each of size X × Y, can be grouped arbitrarily to form subsets known as ‘component collections.’ Each such collection of components is then processed by a ‘transform block’ which uses one of four transform options: linear block transform, dependency transform, wavelet transform, or null transform. This transform is applied independently at each spatial location (x, y) and hence is referred to as a point transform. It is important to note that multiple component collections can be defined, and that each collection can be processed by a different transform. Additionally, outputs from the processing of one collection can be used as inputs to another collection. The final output components are compressed using the 2-D (‘xy’) wavelet transform and block coding. PCRD-opt is then typically used to perform rate allocation across the components to create an embedded code-stream [5]. Indices describing which components belong to each collection are signalled in the header via the MCC and MIC marker segments. The particular transform performed on each collection is signalled through MCT marker segments [11].

A linear block transform operates on an input component collection I having C_I components, producing an output collection O having C_O components. Mathematically, for a given spatial location (x, y), I(x, y) is a C_I × 1 matrix containing the pixel at location (x, y) from each input component. Then,

O(x, y) = A^T [I(x, y) − B]   (3.1)

where A is the analysis matrix (size C_I × C_O) and B (size C_I × 1) is an offset vector. It is worth emphasizing that C_O can be different from C_I.
This can be useful in cases where it might be decided a priori that l output components will be discarded (C_O = C_I − l). The synthesis equation of the linear block transform is given by

R(x, y) = S Ô(x, y) + B̃   (3.2)

where Ô denotes the compressed/decompressed version of O, R denotes the reconstructed image components, and S is the synthesis matrix of size C_R × C_O. Here again, C_R need not be equal to C_O (nor C_I). This can be useful, for example, in creating pseudo-color components from compressed LANDSAT data when no such components were present at compression time [5].

Figure 3.1: JPEG2000 Part 2 encoder.

The synthesis equation for a dependency transform is given by

R(x, y) = S R(x, y) + Ô(x, y) + B̃   (3.3)

Unlike linear block transforms, the synthesis matrix S is restricted to be square and strictly lower triangular, so that R can be reconstructed recursively. This can be used to effect a predictive coding system as described in [5]. Apart from the two matrix based transforms described above, JPEG2000 Part 2 supports wavelet transforms for decorrelating a component collection. In this case, the 1-D wavelet transform is applied to I(x, y) to get O(x, y). Similarly, the 1-D inverse wavelet transform is applied to Ô(x, y) to get R(x, y). With a null transform, no processing is done. Hence, O(x, y) = I(x, y) and R(x, y) = Ô(x, y).

The MCT extensions described above have been used for efficient 3-D compression in applications such as medical [6] and hyper-spectral [7] imaging. In this dissertation, we propose a JPEG2000 Part 2 compliant compression scheme for N-D datasets for any N ≥ 3.
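Since the point transforms of Eqs. (3.1) and (3.2) act independently at each spatial location, they are simple to prototype. A minimal pure-Python sketch is given below (function names and the nested-list matrix representation are ours); it operates on the vector of co-located pixels from one component collection and allows C_O and C_R to differ from C_I, as discussed above.

```python
def mct_analysis(A, B, pixels):
    """Eq. (3.1): O(x,y) = A^T [I(x,y) - B].
    A is C_I x C_O (list of rows), B has length C_I, pixels has length C_I."""
    ci, co = len(B), len(A[0])
    centered = [pixels[i] - B[i] for i in range(ci)]
    # (A^T v)[q] = sum over i of A[i][q] * v[i]
    return [sum(A[i][q] * centered[i] for i in range(ci)) for q in range(co)]

def mct_synthesis(S, B_tilde, coeffs):
    """Eq. (3.2): R(x,y) = S O^(x,y) + B~.  S is C_R x C_O, so the number
    of reconstructed components C_R may differ from C_O."""
    co = len(coeffs)
    return [sum(S[r][q] * coeffs[q] for q in range(co)) + B_tilde[r]
            for r in range(len(S))]
```

With A and S chosen as a transform/inverse pair (for instance, identity matrices in the trivial case), synthesis recovers the input exactly; choosing A with C_O < C_I columns drops components at analysis time.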
Section 3.2 gives the algorithmic details of the N-D compression scheme. Scalability features of the proposed scheme are discussed in Section 3.3. In Section 3.4, the proposed methodology is tested on a 5-D functional Magnetic Resonance Imaging (fMRI) dataset. Experimental results indicate considerable improvement in compression performance for this dataset when the MCT is used with N > 3. Section 3.5 concludes the chapter.

3.2 N-D compression

Let (M_1, M_2, …, M_{N−2}, X, Y) denote the dimensions (size) of an N-D dataset. The data can then be interpreted as a collection of 2-D (X, Y) images (or components). The number of such components is

M = ∏_{k=1}^{N−2} M_k   (3.4)

Each component is indexed by an (N − 2)-dimensional vector [m_1, m_2, …, m_{N−2}], where m_k ∈ {1, 2, …, M_k}. To compress an N-D dataset, the M components are processed by N − 2 stages of MCT. Specifically, each stage has M input components and produces M output components, which in turn are used as inputs to the next stage. Consider the k-th (k ∈ {1, 2, …, N − 2}) stage of the MCT, which is used to decorrelate the data along the k-th dimension. The procedure is depicted in Fig. 3.2. The input components are first reordered to form collections. A collection is formed by grouping all components having the same value of the (N − 3)-dimensional index [m_1, m_2, …, m_{k−1}, m_{k+1}, …, m_{N−2}]. There are then M/M_k collections, each containing M_k components. For each collection, a point transform is applied, resulting in decorrelation of the data along the k-th dimension. The transform of a particular collection is referred to as a Transform Block (TB). The output components of the final MCT stage (N − 2) are referred to as code-stream components, and are compressed using a 2-D (x, y) DWT and block coding. It should be noted that the encoding procedure described above will result in a code-stream that is entirely compliant with JPEG2000 Part 2. Fig.
3.3 illustrates the procedure with a 4-D example. The data for this example contain two volumetric (‘xyz’) cubes, corresponding to two time instances (T = 2). Each ‘xyz’ cube contains three slices (Z = 3). Two MCT stages are used to process the 4-D data. In the first MCT stage, the input components are grouped to form two ‘xyz’ cubes. Each cube is processed by a transform block, thereby decorrelating the data along z. The output components of the first stage are passed to the second stage, where they are reordered to form three ‘xyt’ cubes. Each of these three cubes is processed by a transform block to decorrelate the data along t. The result is 6 code-stream components that are each subjected to 2-D compression.

Figure 3.2: Stage k of MCT.

For simplicity, we assume in what follows that the same transform is used for each transform block within a stage, but that a different transform can be used for each stage.

3.3 Scalability issues

JPEG2000 is a highly scalable compression scheme, in which portions of the compressed data can be selectively decoded to obtain a particular component, resolution, quality, or region of interest (ROI). In this section, spatial scalability is discussed for the proposed N-D compression scheme. Recall that each component is indexed by an (N − 2)-dimensional vector [m_1, m_2, …, m_{N−2}] and each transform block in the k-th (k ∈ {1, 2, …, N − 2}) stage is indexed by an (N − 3)-dimensional vector [m_1, m_2, …, m_{k−1}, m_{k+1}, …, m_{N−2}].
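The reordering rule of Section 3.2 (all components sharing every index except the k-th form one collection) can be sketched in a few lines. The function below is illustrative only (0-based indices, non-spatial dimension sizes passed as a list); it is not the signature of any codec API.

```python
from itertools import product

def collections_for_stage(dims, k):
    """Group component indices of an N-D dataset into the component
    collections of MCT stage k (1-based).  dims = [M1, ..., M_{N-2}]
    lists the non-spatial dimension sizes; a collection holds every
    component sharing all indices except the k-th, so there are
    M/Mk collections of Mk components each."""
    k -= 1  # convert to a 0-based axis
    groups = {}
    for comp in product(*(range(m) for m in dims)):
        key = comp[:k] + comp[k + 1:]   # all indices except the k-th
        groups.setdefault(key, []).append(comp)
    return list(groups.values())
```

For the 4-D example above (Z = 3, T = 2, so dims = [3, 2]), stage 1 yields the two ‘xyz’ cubes (2 collections of 3 components) and stage 2 yields the three ‘xyt’ cubes (3 collections of 2 components).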
[Figure 3.3: Compression of 4-D data using 2 MCT stages. In stage 1, the components are reordered into two 'xyz' cubes (corresponding to t=1 and t=2) and each transform block decorrelates one cube in the z direction; in stage 2, they are reordered into three 'xyt' cubes (corresponding to z=1, z=2 and z=3) and each transform block decorrelates one cube in the t direction.]

To decompress the N-D data, the MCT stages are applied in reverse order. Specifically, the reconstructed code-stream components are first processed by stage N-2, followed by stage N-3, and so on. Let us consider decompressing one 'xy' component of the N-D data. Let [m2, ..., m_{N-2}] denote the particular transform block in stage 1 that generates the desired 'xy' component. To output the desired component, the transform block will require a set of input components indexed by [j1, m2, m3, ..., m_{N-2}], where j1 ∈ J_1, ‖J_1‖ = J1. Each of these J1 components will be produced by a different transform block in stage 2. Specifically, the component [j1, m2, m3, ..., m_{N-2}] will be generated by the transform block [j1, m3, ..., m_{N-2}]. To output that component, the transform block will require a set of input components denoted by [j1, j2, m3, ..., m_{N-2}] (j2 ∈ J_2, ‖J_2‖ = J2). Thus, to generate each j1 ∈ J_1, stage 2 requires a different set of components (of size J2) from stage 3. In total, we need J1 × J2 components from stage 3. Arguing inductively, the set of code-stream components S that is necessary to reconstruct the desired 'xy' component is of size

‖S‖ = Π_{k=1}^{N-2} J_k.    (3.5)

The value of J_k will depend on the type of point transform used in the k-th stage of MCT. For matrix based transforms, J_k will depend on the number of nonzero coefficients in the synthesis matrix.
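Eqn. (3.5) is a simple product; a hypothetical helper (the name is ours, for illustration only) makes the bookkeeping explicit:

```python
def codestream_components_needed(J):
    """Size of the set S of code-stream components needed to reconstruct
    one 'xy' component: the product of J_k over the N-2 MCT stages (Eqn. 3.5)."""
    total = 1
    for j_k in J:
        total *= j_k
    return total

# E.g., if the point transform in each of two stages draws on 3 input
# components per output component:
assert codestream_components_needed([3, 3]) == 9
```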
In the case of wavelet transforms, J_k will be determined by the lengths of the wavelet filters and the number of wavelet transform levels used. It is important to note that J_k is the number of input components required to produce one output component by a transform block in the k-th stage of MCT. If consecutive components are desired, the required number of input components may increase only modestly, depending on the point transform employed (e.g., wavelet). If a subset of pixels from a component (or multiple components) is desired, the spatial scalability properties of 2-D JPEG2000 can be used to significantly reduce the amount of data that must be accessed and/or decompressed.

Table 3.1: Compression performance (PSNR in dB) of the 5-D fMRI dataset.

Treatment 1 (2D 'xy'): JPEG2000 Part 1.
Treatment 2 (3D 'xyz'): single-stage MCT; 'z' - 2 levels of (9,7).
Treatment 3 (4D 'xyzt'): stage 1 MCT 't' - 5 levels of (9,7); stage 2 MCT 'z' - 2 levels of (9,7).
Treatment 4 (5D 'xyztr'): stage 1 MCT 'r' - KLT; stage 2 MCT 't' - 5 levels of (9,7); stage 3 MCT 'z' - 2 levels of (9,7).

Bit-rate (bpp)   0.1    0.25   0.5    0.75   1.0    1.5
Treatment 1      33.12  42.49  48.93  53.52  57.12  62.56
Treatment 2      41.01  47.33  52.89  56.85  60.09  65.07
Treatment 3      55.06  60.78  65.01  67.73  69.86  72.97
Treatment 4      55.88  61.19  65.20  67.91  69.97  73.03

3.4 Results

Two examples of multi-dimensional sources are medical images (e.g., fMRI, 4-D cardiac and ultrasound) and multi-view video [2]. 4-D compression techniques have been developed in [12, 13, 14, 3] for specific classes of multi-dimensional datasets. In this section, we present experimental results on the compression performance of a 5-D fMRI dataset using the proposed N-D compression scheme. fMRI is a medical imaging modality that generates a time series of 3-D images of the brain while a subject is performing some task. Variations of voxel intensity as a function of time are analyzed to determine activations pertaining to a particular task. Multiple trials of the fMRI experiment are carried out to improve the accuracy of the analysis.
The 5-D dataset used here contains 'xyz' cubes with dimensions X = 64, Y = 64 and Z = 21. A single trial ('r') consists of 100 'xyz' cubes corresponding to 100 time ('t') instances (T = 100). There are three 4-D cubes ('xyzt') corresponding to three trials (R = 3) of the experiment. Each sample value is represented with 13 bits. Table 3.1 gives PSNR values (in dB) for 4 different compression treatments. In treatment 1, each 'xy' slice is compressed independently using a JPEG2000 Part 1 encoder. Post Compression Rate Distortion optimization (PCRD-opt) [5, 15] is used to perform rate allocation jointly across all 6300 slices (Z × T × R). Treatments 2-4 use one or more MCT stages to decorrelate the data along 'z', 't' and/or 'r'. Each of the output code-stream components is subjected to 2 levels of 2-D dyadic (9,7) wavelet transform and compression by the JPEG2000 block coder. Here again, PCRD-opt is used to perform rate allocation jointly across all 6300 components. As noted in Table 3.1, treatment 2 uses a single stage of MCT that applies 2 levels of 1-D dyadic (9,7) transform along 'z'. Treatment 3 decorrelates the data along 'z' and 't' using two stages of MCT. As noted in the table, the first stage applies 5 levels of (9,7) transform along 't', while the second stage uses 2 levels of (9,7) transform along 'z'. Treatment 4 exploits dependency in all five dimensions. In this treatment, the Karhunen-Loève transform (KLT) is used along 'r' in the first MCT stage. The (9,7) wavelet transform is then used along 't' and 'z' in the second and third stages, respectively. Treatments 2-4 described above result in JPEG2000 Part 2 compliant code-streams. As seen in Table 3.1, considerable compression gain is achieved with 4-D compression when compared to 3-D compression. The gain is 12 to 14 dB at low to moderate bit rates and 8 to 9 dB at high bit rates.
This gain is attributed to the high temporal correlation present in the fMRI dataset, which is well exploited by the (9,7) transform. It should be noted that the performance of the 3-D case shown in Table 3.1 (treatment 2) is a little better than the 'usual' 3-D JPEG2000, because rate allocation is performed across all cubes. The same is true for treatment 1. The 5-D method gives an additional gain of 0.4 to 0.8 dB at low bit rates. This is due to the decorrelation of the data along 'r' using the KLT. The (9,7) transform, when used in place of the KLT, yields smaller gains at low bit rates and small losses at high bit rates. Note that the gain of the 5-D treatment over 4-D is much less than the gain of the 4-D treatment over 3-D. This can be attributed to the fact that the size of the trial dimension (R = 3) is very small for this dataset. For the results reported here, the transform decomposition structure employed was separable for all dimensions except 'xy.' The 'xy' transform was dyadic. However, it is worth mentioning that any desired structure can be employed for the N - 2 dimensions, and any desired structure can be used for the 'xy' dimensions. The only restriction is that the 'xy' transform must be performed last (after all operations on the other N - 2 dimensions are complete). This restricts the usage of some N-D wavelet decompositions. For example, in the 3-D case, 2 or more levels of 3-D dyadic wavelet transform cannot be performed in a JPEG2000 Part 2 compliant scheme. This may not be a serious limitation, as JPEG2000 Part 2 compliant decompositions can be very efficient. For example, with hyperspectral images (3-D), a 1-D DWT in the spectral domain followed by a 2-D DWT has been found to be the most efficient [16]. For the fMRI dataset presented in this chapter, the performance of treatment 3 is only slightly worse (0.20 to 0.31 dB) than the best 4-D ('xyzt') decomposition structure for fMRI found in [17].
Also, the performance of treatment 3 is only slightly worse (0.13 to 0.24 dB) than that of 4-D SPIHT [13].

3.5 Conclusion

In this chapter, a JPEG2000 Part 2 compliant N-D compression scheme is presented for any N ≥ 3. The data are treated as multiple 'xy' slices. Three different transform coding tools can then be used to decorrelate the data in N - 2 dimensions. The resulting code-stream components are compressed using 2-D DWT and block coding. Experimental results indicate that the proposed scheme can give considerable compression gain when a 5-D dataset is decorrelated in more than 3 dimensions. Scalability is supported along all N dimensions. Additionally, incremental processing can be used to achieve considerable savings in memory usage. For example, line-based (sliding window) wavelet transforms can be employed in all dimensions, rather than just in the y-direction as typically used for 2-D imagery.

CHAPTER 4
VIEW COMPENSATED COMPRESSION OF VOLUME RENDERED IMAGES

4.1 Introduction

With tremendous advances in volume imaging modalities, remote visualization of volumetric images has gained importance in many applications such as medical imaging, non-destructive testing and computational fluid dynamics. One way to achieve remote visualization would be to transmit volumetric data to the client, where it is rendered for visualization. Volume rendering is a memory- and computation-intensive process, often requiring hardware acceleration for high quality, real-time rendering [18]. Thus, it is likely that the quality and frame rate would be limited in a client-side rendering system. Another remote visualization model would be to do the rendering at a PACS (Picture Archiving and Communication System) server with dedicated hardware. This client-server model is shown in Fig. 4.1. All the steps in a volume rendering pipeline are executed at the server, based on view-point requests from the client. The rendered images are compressed and transmitted to the client.
The client decompresses the images for display. This model has been investigated [19, 20], and it has been shown that a real-time viewing experience is achievable. Generic compression schemes such as LZO [21], ZLIB [22] and color cell compression [23, 24] were examined in [19] for compressing the rendered images. In this work, we present a compression scheme that exploits characteristics of volume rendered images to achieve significant improvement in compression efficiency.

[Figure 4.1: Client-server model. The client sends the desired view-point to the PACS server; a hardware-accelerated rendering engine produces the rendered image, which is compressed, transmitted, and decompressed at the client.]

The sequence of volume rendered images can be treated as video frames and compressed using video compression standards such as Motion JPEG2000 [25, 5] or Advanced Video Codec (AVC/H.264) [26]. Motion JPEG2000 (JPEG2000 Part 3) codes each frame in a video independently using JPEG2000 [5]. AVC exploits inter-frame correlation to achieve state-of-the-art compression performance for video. It uses a block-based motion model to capture the motion between successive frames. An essential part of a typical AVC encoder is motion estimation, which is a computationally intensive process. To exploit temporal redundancy without the complexity of motion estimation, we note that rendered images possess an underlying geometric relationship. We show that this relationship can be exploited with little computational complexity. A Lifting-based [27, 28] Invertible Motion Adaptive Transform (LIMAT) was developed in [29] for highly scalable video compression. In LIMAT, motion compensation between video frames is incorporated into the lifting steps of a temporal wavelet transform. A chief benefit of the LIMAT scheme is that the temporal transform remains invertible even with non-invertible motion warping operations. Thus, it gives a higher degree of freedom in choosing a motion warping model that accurately captures the actual motion between frames.
With accurate motion modelling and estimation, the LIMAT scheme can realize the temporal wavelet transform along the motion trajectories in a video [29]. Hence, excellent energy compaction can be achieved. We adopt an approach similar to LIMAT for compressing volume rendered images. Specifically, warping operations between rendered images are derived from their geometric relationship. They are incorporated into the lifting steps of a temporal wavelet transform to decorrelate the sequence of volume rendered images. The resulting subband images are compressed using JPEG2000. The chapter is organized as follows. Section 4.2 gives a brief review of volume rendering. In Section 4.3, we establish the geometric relationship between volume rendered images. The proposed compression scheme is laid out in Section 4.4. First, we discuss the incorporation of geometric warping operations into the lifting steps of a wavelet transform. Second, compression of side information is considered. Results are presented in Section 4.5, where the compression performance of our scheme is compared with AVC and Motion JPEG2000. Section 4.6 concludes the chapter.

4.2 Volume rendering

Ray casting [30, 31] is an effective method for achieving high quality volume rendering. Other techniques, such as light field rendering [32, 33] or rendering based on GPUs [34], can be used for rendering volumetric data. However, the quality of images produced by such techniques is poor compared to the ray casting approach [31]. It should be noted that in this chapter, we focus on dynamic visualization of a static dataset, as opposed to time-varying datasets [35]. In ray casting (depicted in Fig. 4.2), a ray from each pixel in the 2-D image is passed through the volume. The intensities at sample points on the ray are used for computing the value of the pixel from which the ray emerges. This operation of formulating the pixel value is known as ray composition.
Sample points on the ray do not typically coincide with voxels in the 3-D volume, and hence an interpolation scheme is required. (We refer to values in the 3-D volume as voxels, and to values in the 2-D rendered image as pixels.) Linear interpolation is a simple and effective choice. Higher quality images can be achieved through cubic-convolution or B-spline interpolation schemes at the expense of more computation. In this work, we use linear interpolation.

[Figure 4.2: Ray casting. Rays cast from the 2-D image pass through the volumetric data; sample points along each ray are composited into the pixel value.]

In one method of ray composition, referred to as Maximum Intensity Projection (MIP) [31], the maximum value along the ray is taken to be the intensity of the pixel. More advanced ray compositing operations are typically used to produce color images that aid in better visualization of the volumetric data. To this end, consider the ray emerging from pixel (x, y) in the rendered image I. Let N(x, y) be the number of sample points on this ray (Fig. 4.2). Color transfer functions F_r, F_g and F_b map the intensity s_i at sample point i to red (r_i), green (g_i) and blue (b_i) values, respectively. The scalar opacity transfer function (F_SO) maps s_i to opacity value α_i. A normalized opacity value between 0 and 1 is used, 1 being fully opaque. To summarize, the color and opacity values at sample point i are written as

r_i = F_r(s_i), g_i = F_g(s_i), b_i = F_b(s_i), α_i = F_SO(s_i);  r_i, g_i, b_i, α_i ∈ [0, 1].

The red, green and blue components of the pixel (x, y) in image I are given by

I^r(x, y) = Σ_{i=1}^{N(x,y)} r̂_i,  I^g(x, y) = Σ_{i=1}^{N(x,y)} ĝ_i,  I^b(x, y) = Σ_{i=1}^{N(x,y)} b̂_i,    (4.1)

where

r̂_i = r_i × α_i × Π_{j=1}^{i-1} (1 - α_j),  ĝ_i = g_i × α_i × Π_{j=1}^{i-1} (1 - α_j),  b̂_i = b_i × α_i × Π_{j=1}^{i-1} (1 - α_j).

This equation is referred to as the ray casting equation [30] in the literature. In what follows, only processing for the red component is discussed, with identical operations being carried out for the green and blue components.
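The ray casting equation can be evaluated front to back with a running transparency product. The following Python sketch composites one ray for a single color channel; the transfer functions are placeholder arguments, not VTK's API.

```python
def composite_ray(samples, F_color, F_opacity):
    """Composite one ray per Eqn. (4.1): each sample's color is weighted
    by its own opacity and by the transparency of all samples before it."""
    pixel = 0.0
    transparency = 1.0  # running product of (1 - alpha_j) for j < i
    for s in samples:
        alpha = F_opacity(s)
        pixel += F_color(s) * alpha * transparency
        transparency *= (1.0 - alpha)
    return pixel

# Toy example: identity color map, constant opacity 0.5.
# 0.8*0.5*1.0 + 0.4*0.5*0.5 = 0.5
val = composite_ray([0.8, 0.4], lambda s: s, lambda s: 0.5)
assert abs(val - 0.5) < 1e-12
```

The running product avoids recomputing Π(1 - α_j) from scratch at every sample, which is how compositing is typically implemented in practice.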
In many visualization scenarios, it is necessary for rendered images to show clear distinction between 3-D objects in the volumetric data. In this regard, use of gradient information helps in accentuating the edges in the data. First, a gradient volume is formed by computing the gradient magnitude at each voxel. As with the intensity s_i, the gradient magnitude ‖∇_i‖ at sample point i on the ray can be obtained through interpolation. The gradient opacity transfer function (F_GO) then maps ‖∇_i‖ to a gradient opacity value between 0 and 1. The α_i used in the ray casting equation is modified to incorporate the gradient term as

α_i = F_SO(s_i) × F_GO(‖∇_i‖).

In some volume acquisition systems, such as Magnetic Resonance Imaging (MRI), determining color and opacity values based on voxel intensity alone is difficult. In such cases, the volumetric data is divided into structural units using a segmentation algorithm. This, along with the intensity, is then used to determine the color and opacity at sample points on the ray. Fig. 4.3 illustrates volume rendering with an example. A CT dataset consisting of 200 slices, each of size 102×247, is rendered using transfer functions as shown in the figure. All rendering for this work was done with the Visualization Tool Kit (VTK) [36], an open source library for visualization algorithms.

[Figure 4.3: Volume rendering example. (a) Rendered image; (b) color transfer functions; (c) scalar opacity transfer function F_SO; (d) gradient opacity transfer function F_GO.]

4.3 Geometric relationship between volume rendered images

Fig. 4.4 shows the coordinate system convention we use for deriving the geometric relationship between volume rendered images. The axes X_w, Y_w and Z_w represent the 'world coordinate system.' It is fixed and does not change with view-points.
Let us consider rendering the volume with respect to view-point P_1, which is at a distance d from the origin O of the world coordinates. A view coordinate system v_1 ≡ 'X_1 Y_1 Z_1' is formed with P_1 as the origin and line OP_1 as the Z_1 axis. The line OP_1 subtends angles θ_1 (azimuth) and φ_1 (elevation) with planes 'Y_w Z_w' and 'X_w Z_w,' respectively. A 2-D image (I_1) of dimension L × L is formed on the 'X_1 Y_1' plane, where L is the diagonal length of the volumetric data. (This choice of L ensures sufficient pixels in the image to cover the volume at any view-point.) The image is formed by casting a ray (parallel to the Z_1 axis) from each pixel location through the volume.

[Figure 4.4: View-point geometry.]

Let us now consider another image I_2 rendered at view-point P_2. P_2 is parameterized by distance d, azimuth θ_2 and elevation φ_2. As before, a view coordinate system v_2 ≡ 'X_2 Y_2 Z_2' is constructed, and image I_2 is formed on the 'X_2 Y_2' plane by casting rays parallel to the Z_2 axis. Let us consider the ray from pixel (x_2, y_2) in I_2. The number of sample points on the ray is N(x_2, y_2). Let the i-th sample point on the ray be denoted by (x_2, y_2, z_2^i) in the view coordinate system v_2. Thus, the ray casting equation (Eqn. 4.1) for the red component of I_2 can be written as

I_2^r(x_2, y_2) = Σ_{i=1}^{N(x_2,y_2)} r̂(x_2, y_2, z_2^i),    (4.2)

where

r̂(x_2, y_2, z_2^i) = r(x_2, y_2, z_2^i) × α(x_2, y_2, z_2^i) × Π_{j=1}^{i-1} (1 - α(x_2, y_2, z_2^j)).

Our goal is to get a good prediction from I_1 for the pixel (x_2, y_2) in I_2. To this end, we first find a single point on the ray (emanating from (x_2, y_2)) that can well describe the pixel (x_2, y_2). Since every sample point on the ray can have some contribution to the pixel, we compute a 'centroid depth value' denoted by C_2(x_2, y_2). Specifically, each sample point depth z_2^i is weighted by the fractional contribution of that sample point to the pixel.
The equation for determining the red component of the centroid depth value is written as

C_2^r(x_2, y_2) = Σ_{i=1}^{N(x_2,y_2)} z_2^i × r̂(x_2, y_2, z_2^i) / I_2^r(x_2, y_2),  assuming I_2^r(x_2, y_2) ≠ 0.    (4.3)

Thus, a 'depth map' consisting of depth values for each non-zero pixel in I_2 is generated. Discussion of depth values for zero pixels in I_2 is deferred to Section 4.4.2; such depths are taken to be zero for now. Fig. 4.5 shows the depth map corresponding to the rendered image of Fig. 4.3a. The use of this depth map to determine the geometric mapping from I_2 to I_1 is discussed next.

[Figure 4.5: Depth map (red, green and blue components).]

Based on coordinate system transformations, it can be shown that a point represented as (x_2, y_2, z_2) in view coordinate system v_2 can be transformed to the world coordinate representation (x_w, y_w, z_w) through the matrix multiplication defined below:

[x_w, y_w, z_w, 1]^T = M_{2→w} × [x_2, y_2, z_2, 1]^T,    (4.4)

where

M_{2→w} =
| cos(θ_2)    -sin(θ_2)sin(φ_2)   sin(θ_2)cos(φ_2)   d·sin(θ_2)cos(φ_2) |
| 0            cos(φ_2)            sin(φ_2)           d·sin(φ_2)         |
| -sin(θ_2)   -cos(θ_2)sin(φ_2)   cos(θ_2)cos(φ_2)   d·cos(θ_2)cos(φ_2) |
| 0            0                   0                   1                  |

Similarly, the world coordinate system can be transformed to view coordinate system v_1 using M_{w→1}. Thus, transformation from coordinate system v_2 to v_1 can be accomplished using M_{2→1}, where M_{2→1} = M_{w→1} × M_{2→w}. For the red component of the pixel (x_2, y_2) in I_2, the coordinate system v_2 location corresponding to the centroid depth value is (x_2, y_2, C_2^r(x_2, y_2)). This representation can be transformed to view coordinate system v_1 to locate the pixel (x_1^r, y_1^r) in I_1 as

[x_1^r, y_1^r]^T = M̂_{2→1} × [x_2, y_2, C_2^r(x_2, y_2), 1]^T,    (4.5)

where M̂_{2→1} is a 2 × 4 matrix containing the first 2 rows of M_{2→1}. Thus, using the depth map C_2, a geometric mapping from I_2 to I_1 is obtained. In the ensuing sections, we denote this geometric mapping as µ_{2→1}.
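The matrix M_{2→w} can be built directly from (θ_2, φ_2, d). The following Python sketch (helper names are ours, for illustration) forms the matrix and applies it to a homogeneous point:

```python
import math

def view_to_world(theta, phi, d):
    """M_{2->w} from the text: maps view coordinates (azimuth theta,
    elevation phi, eye distance d) to world coordinates."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [
        [ ct, -st * sp, st * cp, d * st * cp],
        [0.0,       cp,      sp, d * sp],
        [-st, -ct * sp, ct * cp, d * ct * cp],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(M, p):
    """Multiply the homogeneous point (x, y, z, 1) by the first 3 rows of M."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in M[:3])

# Sanity check: with theta = phi = 0, the view origin maps to (0, 0, d),
# i.e. the view-point sits at distance d along Zw, as the geometry requires.
assert apply(view_to_world(0.0, 0.0, 5.0), (0.0, 0.0, 0.0)) == (0.0, 0.0, 5.0)
```

M_{w→1} would be the inverse of the corresponding view_to_world(θ_1, φ_1, d), and Eqn. (4.5) then keeps only the first two rows of the composed matrix.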
In a straightforward implementation, this mapping requires 6 multiplications and 6 additions per pixel. Specifically,

x_1^r = m_11 × x_2 + m_12 × y_2 + m_13 × C_2^r(x_2, y_2) + m_14
y_1^r = m_21 × x_2 + m_22 × y_2 + m_23 × C_2^r(x_2, y_2) + m_24

where m_ij represents the element in the i-th row and j-th column of M̂_{2→1}. A more efficient implementation can give considerable savings in computation. For example, with raster scanning of I_2, only x_2 and the depth values change while visiting pixels in row y_2. This observation leads to the following algorithm for computing (x_1^r, y_1^r) for all pixels of I_2 in row y_2:

    Σx = m_12 × y_2 + m_14
    Σy = m_22 × y_2 + m_24
    for x_2 = 1 to L {
        Σx = Σx + m_11
        Σy = Σy + m_21
        x_1^r = Σx + m_13 × C_2^r(x_2, y_2)
        y_1^r = Σy + m_23 × C_2^r(x_2, y_2)
    }

Ignoring initialization, the required computation is reduced to 2 multiplications and 4 additions per pixel.

4.4 Compression of volume rendered images

Fig. 4.6 shows the proposed compression scheme for volume rendered images. The rendered images are first subjected to the Lifting-based View Compensated Wavelet Transform (LVCWT) described below in subsection 4.4.1. The resulting subband images are then compressed using the JPEG2000 [5] encoder. The LVCWT requires depth maps for establishing the geometric relation between the rendered images. Hence, the depth maps need to be sent to the decoder as side information. Compression of this side information is discussed in subsection 4.4.2.

[Figure 4.6: Compression/decompression of volume rendered images. At the server, ray casting is followed by the view compensated forward wavelet transform and JPEG2000 encoding; the generated depth maps are compressed with JPEG2000 and sent as side information; the decompressor applies JPEG2000 decoding and the view compensated inverse wavelet transform before display.]

4.4.1 Lifting-based view compensated wavelet transform

Let I_0, I_1, ..., I_{2k}, I_{2k+1}, ... denote the sequence of rendered images corresponding to view-points P_0, P_1, ..., P_{2k}, P_{2k+1}, ..., respectively.
The rendered images are all of size L × L. Let us first consider the application of "the usual" 5/3 wavelet transform across the sequence of rendered images. In the first lifting step (prediction step), pixel (x, y) in an odd-indexed image is predicted from pixel (x, y) in the neighboring even-indexed images. The prediction residual (high pass coefficient) is given by

O_{2k+1}(x, y) = I_{2k+1}(x, y) - (1/2)[I_{2k}(x, y) + I_{2k+2}(x, y)].

In the second lifting step (update step), the low pass coefficient is obtained as follows:

O_{2k}(x, y) = I_{2k}(x, y) + (1/4)[O_{2k-1}(x, y) + O_{2k+1}(x, y)].

The high pass coefficients are finally scaled by half. A chief benefit of the lifting implementation is that the wavelet transform remains invertible, even if non-invertible operations are performed inside the lifting steps. Hence, we incorporate the geometric mapping derived in Section 4.3 inside the lifting steps to form the LVCWT. Fig. 4.7 shows the forward LVCWT. Let µ_{i→j} denote the geometric mapping from image I_i to image I_j. In the first lifting step of LVCWT, the pixel (x, y) in I_{2k+1} is predicted from pixels µ_{2k+1→2k}(x, y) and µ_{2k+1→2k+2}(x, y) in I_{2k} and I_{2k+2}, respectively. The prediction residual is then given by

O_{2k+1}(x, y) = I_{2k+1}(x, y) - (1/2)[I_{2k}(µ_{2k+1→2k}(x, y)) + I_{2k+2}(µ_{2k+1→2k+2}(x, y))].

Note that µ_{2k+1→2k}(x, y) and µ_{2k+1→2k+2}(x, y) will typically yield non-integer locations. Hence, linear interpolation is used to compute the pixel value. In the second lifting step, the pixel (x, y) in I_{2k} is updated with pixels µ_{2k→2k-1}(x, y) and µ_{2k→2k+1}(x, y) in O_{2k-1} and O_{2k+1}, respectively. The resulting low pass coefficient is given by

O_{2k}(x, y) = I_{2k}(x, y) + (1/4)[O_{2k-1}(µ_{2k→2k-1}(x, y)) + O_{2k+1}(µ_{2k→2k+1}(x, y))].

The prediction residuals are finally scaled by half to form the high pass coefficients. In the inverse LVCWT, the high pass coefficients are first scaled by 2. The lifting steps are then applied in reverse order, with the signs of the prediction and update values reversed.
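The "usual" 5/3 lifting steps can be sketched as follows, with each pixel treated as a scalar for brevity. This is a simplified illustration: boundary frames simply omit the missing neighbor term (the actual codec would use symmetric extension), and in the LVCWT the neighbor values would first be warped through µ before predicting and updating.

```python
def lift_53(frames):
    """One level of the 5/3 wavelet transform across a frame sequence.
    Assumes an odd number of frames so every odd frame has two even
    neighbors.  Returns (low pass, scaled high pass) keyed by frame index."""
    n = len(frames)
    high = {}
    for k in range(1, n - 1, 2):  # predict odd frames from even neighbors
        high[k] = frames[k] - 0.5 * (frames[k - 1] + frames[k + 1])
    low = {}
    for k in range(0, n, 2):      # update even frames with the residuals
        left = high.get(k - 1, 0.0)   # missing neighbor contributes 0
        right = high.get(k + 1, 0.0)
        low[k] = frames[k] + 0.25 * (left + right)
    return low, {k: 0.5 * v for k, v in high.items()}  # scale high pass

# A linear ramp is predicted exactly: all high pass coefficients vanish.
low, high = lift_53([0.0, 1.0, 2.0, 3.0, 4.0])
assert all(abs(v) < 1e-12 for v in high.values())
```

The point of the lifting structure, as noted above, is that replacing `frames[k - 1]` and `frames[k + 1]` by warped (and interpolated) versions leaves the transform perfectly invertible, whether or not the warp itself is invertible.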
Though we have described the LVCWT using the 5/3 wavelet transform, it is applicable to any transform implemented in lifting fashion.

[Figure 4.7: Forward lifting-based view compensated wavelet transform.]

4.4.2 Compression of depth maps

Depth maps are required at the encoder and decoder to determine the geometric mapping used in LVCWT. Hence, efficient compression of depth maps is necessary for transmitting them as side information. Before we consider the compression of depth maps, depth values for zero intensity pixels need to be determined. Recall that depth values for such pixels were taken to be zero in Section 4.3 (see Fig. 4.5). However, experimental results indicate that copying the depth value of a nearby 'non-zero pixel' gives better compression performance. Specifically, when I(x, y) = 0, the depth value at (x, y) is set to be the same as that of the nearest 'non-zero pixel' in row x (if any exist). The process is repeated along the columns. Fig. 4.8 shows the depth map obtained by processing the depth map in Fig. 4.5 using the aforementioned procedure. Since the red, green and blue components of the depth map differ little, we transmit only the average of the three component depth maps. The performance loss incurred due to this is small (0.04 to 0.11 dB). Each processed average depth map is compressed independently using JPEG2000.

[Figure 4.8: Processed depth map (red, green and blue components).]

At the decompressor, the side information is decoded to get the reconstructed depth maps that are used by the inverse LVCWT. As shown in Fig. 4.6, the forward LVCWT also uses the reconstructed version of the depth maps. A fraction f of the target bit rate is used for compressing the depth maps. The trade-off parameter f was set at 0.06, based on experiments with training sequences. The optimum f was found to vary only slightly with bit rate and/or training sequence.
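The row-wise copying of depth values to zero-intensity pixels can be sketched as follows. This is a simplified single-pass Python illustration (not the dissertation's code); the subsequent column pass is omitted, and ties between equidistant neighbors are broken arbitrarily toward the left.

```python
def fill_zero_depths(depth, image):
    """Copy the depth of the nearest non-zero pixel in the same row to each
    zero-intensity pixel.  Rows that are entirely zero are left for the
    column pass (omitted here)."""
    out = [row[:] for row in depth]
    for x, img_row in enumerate(image):
        nonzero = [y for y, v in enumerate(img_row) if v != 0]
        if not nonzero:
            continue  # nothing to copy from in this row
        for y, v in enumerate(img_row):
            if v == 0:
                nearest = min(nonzero, key=lambda j: abs(j - y))
                out[x][y] = depth[x][nearest]
    return out

# One row: the single non-zero pixel's depth propagates to its neighbors.
assert fill_zero_depths([[0.0, 2.0, 0.0, 0.0]],
                        [[0, 5, 0, 0]]) == [[2.0, 2.0, 2.0, 2.0]]
```

Filling these regions with nearby depths, rather than zeros, makes the depth map smoother and hence cheaper to compress with JPEG2000, which is the motivation given in the text.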
4.5 Results

In this section, we compare the compression performance of the proposed scheme with AVC and Motion JPEG2000. With all three methods, we do not use any color transform across the RGB components of the volume rendered images. The YCbCr transform that is generally used for decorrelating the RGB components of natural images did not yield any improvement for our test images. With AVC, every 8th image in the rendered image sequence is coded as an Intra frame (GOP length = 8). The case of GOP length = 16 is also considered. All other images in the sequence are coded as P frames. Profile 4:4:4/12 of AVC is used, and encoding parameters are tuned to achieve the best possible compression performance. In the proposed compression scheme, the sequence is subjected to 3 levels of dyadic LVCWT. (The images are not divided into groups of 8 for the purpose of the temporal transform.) The resulting subband images are partitioned into groups of 8. Each group is treated as an 8-component image to be compressed using JPEG2000 (Fig. 4.6). No (further) component transform is used. The advantage of this procedure is that PCRD-opt [5] (rate allocation) is performed jointly for the 8 subband images. Table 4.1 shows the compression performance of Motion JPEG2000, AVC and view compensated compression at different target bit-rates. Target bit-rates for AVC are shown in the first row of the table. The bit-rates achieved by AVC are shown in row 2. With our scheme, it is possible to achieve any target bit-rate very closely. Hence, to make a fair comparison, the AVC achieved bit-rate is used as the target bit-rate for view compensated compression and for Motion JPEG2000. The compression performance of the three schemes is presented for 4 different 'out of training' datasets (Tables 4.1-4.4). Dataset 1 corresponds to the rendering example of Fig. 4.3. Dataset 2 is obtained using the same parameters as dataset 1, but without gradient opacity mapping.
Datasets 3 and 4 are rendered sequences from another CT volume, consisting of 300 slices, each of size 512×512. Dataset 3 is obtained with gradient opacity mapping, while dataset 4 is obtained without gradient opacity mapping. Each dataset comprises 4 sequences, each corresponding to a different set of view-points, as listed below:

1. Constant elevation of 0°, azimuth varied from 0° to 59.5°, in steps of 0.5°.
2. Azimuth and elevation both varied from 0° to 59.5°, in steps of 0.5°.
3. Constant azimuth of 180°, elevation varied from 0° to 59.5°, in steps of 0.5°.
4. Azimuth and elevation varied from 45° to 104.5°, and 0° to 59.5° respectively, both in steps of 0.5°.

Each sequence contains 120 frames, and the encoder is restarted at the beginning of each sequence. The average PSNR of the datasets, defined below, is presented in Tables 4.1-4.4:

Average PSNR = (1 / (120 × 4)) Σ_{n=1}^{4} Σ_{f=1}^{120} PSNR(n, f),

PSNR(n, f) = 10 × log10( (255 × 255) / MSE(n, f) ),

where MSE(n, f) denotes the mean squared error of decompressed frame f in sequence n. As seen from the tables, Motion JPEG2000 has the lowest average PSNR for all datasets, as inter-frame correlation is not exploited. AVC performs better by exploiting inter-frame correlation. With AVC, gains from 0.1 to 1 dB are obtained at different bit-rates when the GOP length is increased from 8 to 16. View compensated compression performs better than AVC consistently across all datasets. The gains range from 1.97 to 2.13 dB at low bit-rates and from 0.68 to 2.53 dB at high bit-rates. By exploiting the underlying geometric relation between rendered images, we are able to achieve superior compression performance. Additionally, as noted in Section 4.3, the geometric mapping has very low complexity (2 multiplications and 4 additions per pixel) when compared to estimating motion between frames.
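The PSNR and averaging defined above amount to the following small Python sketch; `mse_table` is a hypothetical 4×120 array of per-frame MSE values, not data from the experiments.

```python
import math

def psnr(mse, peak=255.0):
    """PSNR in dB for 8-bit imagery, as defined in the text."""
    return 10.0 * math.log10(peak * peak / mse)

def average_psnr(mse_table):
    """Mean PSNR over all sequences and frames (4 x 120 in the experiments)."""
    vals = [psnr(m) for seq in mse_table for m in seq]
    return sum(vals) / len(vals)

assert abs(psnr(255.0 * 255.0)) < 1e-12   # MSE equal to peak^2 gives 0 dB
assert abs(psnr(1.0) - 48.1308) < 1e-3    # MSE = 1 for 8-bit imagery
```

Note that averaging PSNR (in dB) over frames, as done here and in the tables, is not the same as computing the PSNR of the average MSE; the text's definition averages in the dB domain.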
Table 4.1: Average PSNR (dB) of volume rendered images for Dataset 1

AVC target bit rate (Kbps)         200    400    600    800    1000
AVC achieved bit rate (Kbps)       206    411    617    825    1030
Motion JPEG2000, GOP length = 1    23.91  26.00  27.64  29.02  30.10
AVC, GOP length = 8                28.28  31.97  33.83  35.23  36.33
AVC, GOP length = 16               29.24  32.34  34.07  35.33  36.38
View compensated compression       31.37  34.09  35.76  36.92  37.88
Gain over AVC, GOP length = 16     +2.13  +1.75  +1.69  +1.59  +1.51

Table 4.2: Average PSNR (dB) of volume rendered images for Dataset 2

AVC target bit rate (Kbps)         200    400    600    800    1000
AVC achieved bit rate (Kbps)       206    411    617    823    1029
Motion JPEG2000, GOP length = 1    26.96  29.91  32.32  34.00  35.50
AVC, GOP length = 8                34.42  37.92  39.53  40.66  41.56
AVC, GOP length = 16               35.50  38.35  39.80  40.83  42.62
View compensated compression       37.46  40.28  41.93  43.00  43.30
Gain over AVC, GOP length = 16     +1.97  +1.93  +2.13  +2.17  +0.68

The PSNR values for all frames in sequence 1 of dataset 1, at bit-rates of 200 Kbps and 800 Kbps, are plotted in Fig. 4.9 and Fig. 4.10, respectively.

Table 4.3: Average PSNR (dB) of volume rendered images for Dataset 3

AVC target bit rate (Kbps)         1000   2000   3000   4000   5000   6000   7000
AVC achieved bit rate (Kbps)       1028   2052   3074   4099   5124   6158   7198
Motion JPEG2000, GOP length = 1    26.61  28.87  30.31  31.40  32.29  33.07  33.76
AVC, GOP length = 8                28.47  31.83  33.22  34.20  34.91  35.48  36.00
AVC, GOP length = 16               29.39  32.03  33.43  34.32  34.96  35.55  36.07
View compensated compression       31.40  33.15  34.29  35.17  35.91  36.52  37.05
Gain over AVC, GOP length = 16     +2.00  +1.12  +0.86  +0.85  +0.95  +0.97  +0.98

The variation of PSNR with AVC is due to the fact that more bits are allocated for I frames than for P frames. The variation pattern observed with view compensated compression is due to the cyclostationary property of the 5/3 transform. As seen in Fig. 4.9 and Fig. 4.10, the variation of PSNR is considerably less for view compensated compression than for AVC.
To quantify the variation in quality, the standard deviation of PSNR is computed as

\[
\sigma = \frac{1}{4} \sum_{n=1}^{4} \sqrt{\frac{1}{120} \sum_{f=1}^{120} \bigl(\mathrm{PSNR}(n, f) - MP(n, f)\bigr)^2}
\]

where

\[
MP(n, f) = \frac{1}{10} \sum_{i=10\lfloor (f-1)/10 \rfloor + 1}^{10\lfloor (f-1)/10 \rfloor + 10} \mathrm{PSNR}(n, i)
\]

\(MP(n, f)\) is the mean PSNR of the 10 consecutive frames of sequence n that contain frame f. \(MP(n, f)\) is calculated over every 10 frames, as opposed to taking the average of all 120 frames, to discount the variation in PSNR due to the variation in compressibility of frames, which can be high for long sequences.

Table 4.4: Average PSNR of volume rendered images for Dataset 4

AVC target bit rate (Kbps)         1000   2000   3000   4000   5000   6000   7000
AVC achieved bit rate (Kbps)       1028   2052   3074   4099   5124   6158   7198
Motion JPEG2000, GOP length = 1    25.93  28.08  29.43  30.48  31.32  32.04  32.67
AVC, GOP length = 8                29.10  32.13  33.35  34.20  34.86  35.40  35.90
AVC, GOP length = 16               30.14  32.46  33.55  34.31  34.93  35.38  35.84
View compensated compression       32.35  34.44  35.65  36.54  37.25  37.87  38.37
Gain over AVC, GOP length = 16     +2.21  +1.98  +2.10  +2.23  +2.32  +2.49  +2.53

Table 4.5 shows the standard deviation of PSNR for dataset 1. As noted before, a gain in average PSNR is obtained for AVC when the GOP length is increased from 8 to 16. However, this is generally accompanied by a marked increase in the standard deviation. View compensated compression has a decrease in standard deviation of 56% to 75% when compared to AVC at different bit-rates. Similar decreases were observed for the other datasets.

Figure 4.9: PSNR of frames in sequence 1 of dataset 1 at 200 Kbps.

4.6 Conclusion

In this chapter, we presented a new scheme for determining the underlying geometric relationship between volume rendered images.
The geometric mapping, so obtained, is incorporated into the lifting steps of the 5/3 wavelet transform, which is applied along the temporal dimension. The proposed method is tested on 4 different datasets, and the results indicate superior compression performance when compared to AVC, the state-of-the-art video compression standard. Additionally, the scheme obviates motion estimation between rendered images, enabling a significant reduction in the complexity of the encoder.

Figure 4.10: PSNR of frames in sequence 1 of dataset 1 at 800 Kbps.

Table 4.5: Standard deviation of PSNR for Dataset 1

AVC target bit rate (Kbps)                  200    400    600    800    1000
AVC achieved bit rate (Kbps)                206    411    617    825    1030
Motion JPEG2000, GOP length = 1             0.10   0.07   0.06   0.06   0.06
AVC, GOP length = 8                         1.53   0.86   1.11   1.38   1.72
AVC, GOP length = 16                        1.47   1.24   1.63   1.99   2.26
View compensated compression                0.36   0.55   0.68   0.77   0.86
Percentage gain over AVC, GOP length = 16   75.36  55.59  58.69  61.63  62.09

CHAPTER 5

SCALABLE LOW COMPLEXITY CODER

5.1 Introduction

Applications that require real-time transmission of images often call for a low complexity compressor/decompressor. Additionally, scalability and compression efficiency can also be desirable properties in such applications. JPEG2000 [4, 5] offers superior compression performance and highly scalable codestreams. The drawback of JPEG2000 is its high complexity, which may cause its encoding/decoding speed to be a bottleneck in some real-time applications. Hence, a ‘JPEG2000-like’ coder is proposed that sacrifices some compression efficiency to substantially increase the throughput of the encoder and decoder. The chapter is organized as follows. In Section 5.2, we present the Scalable Low Complexity Coder (SLCC).
The use of SLCC in remote volume visualization and airborne video transmission is described in Section 5.3 and Section 5.4, respectively.

5.2 Scalable low complexity coder

Fig. 5.1 gives the encoding architecture of SLCC. The input image samples are subjected to a 2-D DWT. Each subband is subdivided into codeblocks, which are compressed independently by a block coder. This procedure enables resolution and spatial scalability features in SLCC. In addition, a limited amount of quality scalability is obtained by dividing each codeblock into two parts (layers) as described below.

Figure 5.1: Schematic of the scalable low complexity coder (image samples → 2-D dyadic discrete wavelet transform → division of subbands into codeblocks → stack coding of each codeblock by the block coder → codestream with 2 quality layers).

Fig. 5.2 illustrates the data structure of codeblocks in different subbands. For codeblocks in the LL2 subband, all bit-planes above the 4th bit-plane are stacked and sent as the first quality layer. The other subbands have fewer bit-planes included in the first layer due to their lesser importance. The importance of each subband is measured from an MSE point of view, by taking into account the synthesis filter energy weights associated with the inverse wavelet transform. These energy weights are rounded to the nearest power of two and used to adjust the stack lengths. In the example of Fig. 5.2, codeblocks belonging to the HL2 and LH2 subbands will have one less bit-plane in the first layer, while the HH2 and level-1 subbands have two less. All-zero bit-planes in a codeblock are termed missing MSBs and are indicated in the header information.

Figure 5.2: Wavelet coefficient structure (bit-planes 11 down to 1 per codeblock: missing MSBs, followed by the first (initial) quality layer and the second (final) quality layer).

The stack of bit-planes (discounting missing MSBs) from each codeblock contributing to a layer is coded in one single pass. The coding scheme employed depends on the stack length of the codeblock and is shown in Fig. 5.3a. Entropy coding is restricted to the first three (or fewer) bit-planes in the first layer.
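The subband-weighting rule described above (synthesis energy weights rounded to the nearest power of two, shifting the first-layer boundary per subband) can be sketched as follows. The weight values here are illustrative placeholders chosen only to reproduce the example of Fig. 5.2; the true weights depend on the 5/3 synthesis filters:

```python
import math

# Illustrative (not actual) synthesis-filter energy weights for a
# 2-level dyadic DWT, chosen to match the Fig. 5.2 example.
ENERGY_WEIGHTS = {
    "LL2": 4.0, "HL2": 2.1, "LH2": 2.1, "HH2": 1.1,
    "HL1": 1.0, "LH1": 1.0, "HH1": 0.9,
}

def extra_planes_dropped(weights, ref="LL2"):
    """Bit-planes excluded from the first layer relative to the reference
    subband, after rounding each weight to the nearest power of two
    (i.e., rounding log2 of the weight to the nearest integer)."""
    ref_exp = round(math.log2(weights[ref]))
    return {band: ref_exp - round(math.log2(w)) for band, w in weights.items()}
```

With these placeholder weights, HL2/LH2 drop one extra bit-plane and HH2 and the level-1 subbands drop two, matching the example in the text.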
When there is one bit-plane, the position indices of ‘ones’ in that bit-plane are coded. Run-value and Quad-Comma coding are used for stack lengths of 2 and 3, respectively. These two techniques are described below. For codeblocks with more than 3 bit-planes, Quad-Comma coding is used for the three MSBs and raw bits are coded for the remaining bit-planes.

Stack length of a codeblock   Method in the first layer
1                             Position index of significant coefficients
2                             Run-value coding
3                             Quad-Comma codes
>3                            Quad-Comma codes for 3 MSBs, raw bits for remaining bit-planes

Figure 5.3: Low complexity entropy coding schemes ((a) coding scheme decisions; (b) quad coding with comma codes).

Run-value coding is chosen here on the assumption of a sparse distribution of non-zero values. The run length of zeros and the value of the significant coefficient terminating the run are coded. Additional gain is obtained by coding the values of three consecutive coefficients after the zero run. This is because the neighbors of a significant coefficient have a higher probability of being significant. With Quad-Comma coding (Fig. 5.3b), one bit is first spent to indicate the significance of a quad. Based on statistical experiments, coefficient values in a significant quad were found to follow a distribution that is close to geometric with parameter ρ ≤ 0.5. Comma codes [37] are optimal for such geometric distributions and hence are used to code each of the four coefficients in a significant quad. For both of the coding schemes described above, the sign bit is appended to the value of the significant coefficient.

5.3 Remote Volume Rendering

Fig. 4.1 shows a client-server communication setup for remote volume rendering. All the steps in the volume rendering pipeline are executed at the server using dedicated hardware.
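The Quad-Comma step described in the previous section can be sketched as follows: one significance bit per 2×2 quad, then a comma (unary) code per coefficient magnitude, with the sign bit appended to each significant value. The exact bit conventions (e.g., which polarity marks significance or sign) are assumptions for illustration, not the SLCC codestream syntax:

```python
def unary(v):
    """Comma (unary) code for a non-negative integer: v ones then a
    terminating zero -- optimal for geometric sources with rho <= 0.5 [37]."""
    return "1" * v + "0"

def encode_quad(quad):
    """Encode one 2x2 quad of wavelet coefficients (simplified sketch).

    One significance bit comes first; if any coefficient is non-zero,
    each of the four magnitudes is comma-coded, with a sign bit appended
    to each significant coefficient.
    """
    if all(c == 0 for c in quad):
        return "0"                          # insignificant quad: one bit total
    bits = "1"                              # significant quad
    for c in quad:
        bits += unary(abs(c))
        if c != 0:
            bits += "1" if c < 0 else "0"   # sign bit appended to the value
    return bits
```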
Based on view-point requests from a client, the server transmits the sequence of 2D rendered images interactively. With limited bandwidth, an efficient compression scheme is vital for transmitting high quality rendered images. In Chapter 4, an efficient view-compensated compression scheme was presented. In addition to efficiency, the complexity of the decompressor should be low, so that the transmitted data can be decompressed by a low-end client at a desired frame rate (25-30 frames per second). Hence, SLCC can be useful for remote visualization when low cost client stations do not have specialized hardware or high computational power.

With the two-layer scheme of SLCC, the server may transmit only the first layer during an interactive session. Once the interaction stops, the second layer can be sent to give a lossless representation of the image at that particular view point. The number of bit-planes in the first layer can be adjusted based on a desired bit rate, which may be computed from the available bandwidth and the desired frame rate. The compression and throughput performance of SLCC is presented below.

Figure 5.4: Client-server communication (desired view point → hardware accelerated rendering engine → rendered image → wavelet transform → block coding at the PACS server; compressed data → decompressor at the client).

5.3.1 Results

Volume rendering typically produces RGB color images. The RGB color components are compressed without the use of a decorrelating transform. The YCbCr transform that is generally used for natural RGB images did not yield any improvement for our test images. Experimental results are presented with 2 ‘out of training’ test sequences. These 2 sequences were obtained using different rendering parameters. Each sequence has 30 frames and the frame size is 600×600. Fig. 5.9 shows the 1st frame of test sequence 1. The throughput performance of SLCC is compared with Kakadu V5.0, an efficient JPEG2000 software implementation [38].
Timing experiments were carried out on a PC with an Intel Core2 Duo T5470 1.6GHz processor and 2GB RAM. The end-to-end decompression time consists of reading the compressed data from memory (RAM), block decoding, the inverse 2-D DWT, and writing the decompressed image to the display. The end-to-end decompression time of SLCC is compared to that of Kakadu V5.0 in Fig. 5.5 and Fig. 5.6 for test sequence 1 and test sequence 2, respectively. The fastest mode of JPEG2000, referred to as the ‘bypass’ mode, is used [5]. As seen from the figures, a 2 to 4 times speed-up is obtained with SLCC at moderate to high rates.

Figure 5.5: Comparison of end-to-end decompression time for test sequence 1.

Figure 5.6: Comparison of end-to-end decompression time for test sequence 2.

Table 5.1: Compression performance of test sequence 1 (PSNR in dB).

Rate (bits/pixel)   0.19   0.39   0.87   1.75   3.61
JPEG2000            30.46  31.50  33.49  36.54  41.69
SLCC                29.38  30.52  32.01  34.58  38.59

Table 5.1 and Table 5.2 show the compression performance of SLCC and Kakadu V5.0 for test sequence 1 and test sequence 2, respectively. As noted from the tables, SLCC has a 1.1 to 3.1 dB PSNR loss for sequence 1 and a 1.1 to 2.3 dB PSNR loss for sequence 2, compared to Kakadu V5.0. Fig. 5.8 compares the quality of decompressed images at the client under a decoding time constraint corresponding to a frame rate of 30 fps. At this frame rate, the decompressor will have to work with a time constraint of 33 ms/frame. In the case of Kakadu V5.0, the decompression time reaches this limit at 0.76 bpp (7.8 Mbps). Thus the image quality with Kakadu V5.0 will be limited to 33.03 dB (the PSNR at 0.76 bpp) at all bandwidths greater than 7.8 Mbps.
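The arithmetic connecting bits/pixel, frame size, frame rate, and channel bandwidth used in this discussion can be sketched generically (a plain conversion; the reported figures may additionally reflect the exact measured rates and packaging overhead, which are not modeled here):

```python
def bandwidth_mbps(bpp, width, height, fps):
    """Channel rate (Mbps) needed to sustain `fps` frames of
    width x height pixels compressed to `bpp` bits per pixel."""
    return bpp * width * height * fps / 1e6

def max_bpp(channel_mbps, width, height, fps):
    """Largest compressed rate (bits/pixel) the channel can carry."""
    return channel_mbps * 1e6 / (width * height * fps)
```

For the 600×600 test frames at 30 fps, for instance, a rate of 0.7 bpp corresponds to about 7.6 Mbps.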
If the available bandwidth exceeds approximately 15 Mbps, SLCC can use the available data to yield higher PSNR than Kakadu V5.0. Fig. 5.10 and Fig. 5.11 show the decompressed images from Kakadu V5.0 and SLCC, respectively, when the available bandwidth is 70 Mbps. As seen from the figures, SLCC can display very high quality images at this bandwidth.

Table 5.2: Compression performance of test sequence 2 (PSNR in dB).

Rate (bits/pixel)   0.17   0.29   0.70   1.48   2.97
JPEG2000            30.72  31.54  33.29  36.21  40.43
SLCC                29.63  30.71  32.12  34.39  38.14

Figure 5.7: Average PSNR of decompressed images at the client for test sequence 1.

Figure 5.8: Average PSNR of decompressed images at the client for test sequence 2.

5.4 Airborne video transmission

With technological advances in image acquisition systems, the use of high speed, high resolution video cameras is common in many telemetering applications. Such cameras [39, 40] have been developed for airborne applications including reconnaissance, earth survey [41], and radar and sonar systems [42]. These specialized video cameras can record images at 200-400 frames per second (fps) and acquire dual band imagery (visible and IR). Features required in an image compression algorithm for airborne reconnaissance are discussed in [43]. For real-time transmission, the image encoder needs to keep up with the image acquisition hardware. Additionally, the transmission of compressed images is often through an error prone wireless channel. Thus a fast, efficient, and error resilient image compression scheme is vital to realize the full potential of airborne reconnaissance [44]. The computational complexity of JPEG2000 can make software implementations impractical for some real-time airborne reconnaissance applications.
Other compression standards, such as MPEG-4 and H.264, may not be practical in these applications either. While the decoding complexities of these standards are lower, their encoding complexities are much higher than that of JPEG2000 [45]. In this section, we describe the use of SLCC for airborne reconnaissance.

Figure 5.9: Frame 1 of test sequence 1.

In airborne video transmission, the scalability features of SLCC can be used to achieve ample bandwidth savings and functionality. With resolution scalability, only the parts of the compressed data corresponding to the resolution required at the ground station need to be transmitted in real-time. Similarly, quality scalability can be very beneficial as well. While a small portion of the compressed codestream can be transmitted to the ground station for real-time analysis, the entire compressed codestream, yielding a much higher quality (perhaps even lossless), can be stored onboard for further processing at a later time. Spatial scalability may allow real-time transmission of the data that corresponds to a desired ROI to the ground station, while the data for the rest of the scene is saved on board for later processing. Alternatively, spatial scalability can be used in conjunction with quality scalability to separately adjust the quality of the ROI and the background. This can ensure high quality reconstruction of the ROI when a sufficient bit budget is not available to provide high quality throughout the entire image. This is illustrated in Fig. 5.12, where the ROI is reconstructed at a higher quality level than the background. This feature, when used with an object tracking mechanism, can be very useful in surveillance applications, as shown in [46].

Figure 5.10: Compressed/Decompressed image – JPEG2000.

5.4.1 Results

In this section, the performance of SLCC is compared to Kakadu V5.0 for a 720×576 grayscale aerial video sequence with 100 frames.
All the timing experiments were carried out on a PC with a 2.8GHz P4 processor and 512MB RAM.

Figure 5.11: Compressed/Decompressed image – SLCC.

Figure 5.12: Codeblocks from different subbands (left) that correspond to a region of interest in the image (right).

Fig. 5.13 compares the end-to-end encoding times of JPEG2000 and SLCC at different bit-rates (bits/pixel). The end-to-end encoding time comprises reading the input image from memory, the 2D DWT, block encoding, and writing the compressed data to memory. As seen in the figure, SLCC is 3 to 5 times faster than Kakadu V5.0. Fig. 5.14 shows the compression performance of SLCC and Kakadu V5.0 averaged over the 100 frames of the above aerial video sequence. Peak Signal to Noise Ratio (PSNR) is used as the quality metric. In the figure, SLCC incurs a 0.6 to 1 dB loss at low to moderate bit rates compared to Kakadu V5.0. Alternatively, for a given image quality, SLCC produces a 15-20% larger compressed codestream than Kakadu V5.0. However, due to its significantly reduced complexity, SLCC can deliver a much higher frame rate for a desired quality level. This can be seen in Fig. 5.15, where the achievable frame rate (the reciprocal of the end-to-end encoding time) is plotted against PSNR. For example, at a quality level of 30 dB PSNR, SLCC can deliver images at 98 fps while Kakadu V5.0 can only deliver 30 fps. The corresponding required transmission rates are 7.75 Mbps and 2.09 Mbps for SLCC and Kakadu V5.0, respectively. At a PSNR of 45 dB, SLCC can run at 70 fps whereas Kakadu V5.0 can only run at 15 fps. The corresponding required transmission rates are 60.91 Mbps and 11.57 Mbps for SLCC and Kakadu V5.0, respectively. Note that SLCC and Kakadu V5.0 have roughly symmetric encoder/decoder complexity. That is, the complexities of the encoder and the decoder are roughly equal. Thus, the above results are representative of decoder performance as well.
Figure 5.13: Comparison of end-to-end encoding times for SLCC and JPEG2000.

Figure 5.14: Comparison of rate vs. quality for SLCC and JPEG2000.

Figure 5.15: Achievable frame rate at different quality levels for SLCC and JPEG2000.

CHAPTER 6

CONCLUSION

In this dissertation, we presented several novel techniques for efficient compression of multi-dimensional images. In Chapter 3, a JPEG2000 Part 2 compliant N-D compression scheme was presented for any N ≥ 3. The data are treated as multiple ‘xy’ slices. Three different transform coding tools can then be used to decorrelate the data in the remaining N − 2 dimensions. The resulting code-stream components are compressed using a 2-D DWT and block coding. Experimental results indicate that the proposed scheme can give considerable compression gain when a 5-D dataset is decorrelated in more than 3 dimensions. Scalability is supported along all N dimensions. In Chapter 4, a new view compensation scheme was presented that utilizes the geometric relationship between view-points to exploit the correlation between successive volume rendered images. The proposed method performs better than H.264/AVC (Advanced Video Codec), the state-of-the-art video compression standard. Additionally, our scheme obviates motion estimation between rendered images, enabling a significant reduction in the complexity of the compressor compared to H.264/AVC. Applications that require real-time (25-30 frames/s) transmission of images often call for a low complexity compressor or decompressor.
Additionally, scalability and compression efficiency can be very desirable properties in such applications. In Chapter 5, we presented a new scalable low complexity image coder that has good compression efficiency and high throughput. We illustrated its use for two applications: remote volume visualization and airborne video transmission.

REFERENCES

[1] Z. H. Cho, J. P. Jones, and M. Singh, Foundations of Medical Imaging. Wiley-Interscience, 1993.
[2] J. G. Lou, H. Cai, and J. Li, “A real-time interactive multi-view video system,” 13th Annual ACM International Conference on Multimedia, pp. 161–170, 2005.
[3] M. Kitahara, H. Kimata, S. Shimizu, K. Kamikura, Y. Yashima, K. Yamamoto, K. Yendo, T. Fujii, and M. Tanimoto, “Multi-view video coding using view interpolation and reference picture selection,” IEEE Int. Conference on Multimedia and Expo, pp. 97–100, Jul 2006.
[4] Information technology - JPEG2000 image coding system - Part 1: Core coding system, ISO/IEC JTC1/SC29 WG1, 15444-1:2000 Std., Jul 2002.
[5] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Boston: Kluwer Academic Publishers, 2002.
[6] K. M. Siddiqui, E. L. Siegel, B. I. Reiner, O. Crave, J. P. Johnson, Z. Wu, J. C. Dagher, A. Bilgin, M. W. Marcellin, and M. Nadar, “Improved compressibility of multi-slice CT datasets using 3D JPEG2000 compression,” in Computer Assisted Radiology and Surgery, Jun 2004, pp. 28–31.
[7] J. H. Kasner, A. Bilgin, M. W. Marcellin, A. Lan, B. V. Brower, S. S. Shen, and T. Wilkinson, “JPEG2000 compression using 3D wavelets and KLT with application to HYDICE data,” in Applications of Digital Image Processing XXV, Proc. SPIE, vol. 4132, Jul 2000, pp. 157–166.
[8] D. A. Huffman, “A method for the construction of minimum redundancy codes,” in Proc. IRE, vol. 40, 1952, pp. 1098–1101.
[9] J. Rissanen, “Generalized Kraft inequality and arithmetic coding,” in IBM J. Res. Develop., vol. 20, 1976, pp. 198–203.
[10] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,” in Communications of the ACM, vol. 30, 1987, pp. 520–540.
[11] Information technology - JPEG2000 image coding system - Part 2: Extensions, ISO/IEC JTC1/SC29 WG1, 15444-2:2000 Std., Jul 2002.
[12] L. Zeng, C. P. Jansen, S. Marsch, M. Unser, and P. R. Hunziker, “Four-dimensional wavelet compression of arbitrarily sized echocardiographic data,” IEEE Transactions on Medical Imaging, vol. 21, pp. 1179–1187, Sept 2002.
[13] H. G. Lalgudi, A. Bilgin, M. W. Marcellin, and M. S. Nadar, “Compression of fMRI and ultrasound images using 4D SPIHT,” IEEE Int. Conference on Image Processing, vol. 2, pp. 746–749, Sept 2005.
[14] Y. Liu and W. A. Pearlman, “Four-dimensional wavelet compression of 4-D medical images using scalable 4-D SBHP,” Data Compression Conference, pp. 233–242, March 2007.
[15] D. S. Taubman, “High performance scalable image compression with EBCOT,” IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158–1170, Jul 2000.
[16] B. Penna, T. Tillo, E. Magli, and G. Olmo, “Progressive 3-D coding of hyperspectral images based on JPEG2000,” IEEE Geoscience and Remote Sensing Letters, vol. 2, no. 1, pp. 125–129, January 2006.
[17] H. G. Lalgudi, A. Bilgin, M. W. Marcellin, A. Tabesh, M. S. Nadar, and T. P. Trouard, “Four-dimensional compression of fMRI using JPEG2000,” in Proc. SPIE International Symposium on Medical Imaging, Feb 2005.
[18] http://www.terarecon.com/.
[19] S. Stegmaier, M. Magallón, and T. Ertl, “Visualization techniques I: A generic solution for hardware-accelerated remote visualization,” Proceedings of the Symposium on Data Visualisation, pp. 87–94, May 2002.
[20] S. Stegmaier, J. Diepstraten, M. Weiler, and T. Ertl, “Widening the remote visualization bottleneck,” Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, pp. 174–179, Sept 2003.
[21] M. Oberhumer, http://www.oberhumer.com/opensource/lzo/.
[22] J. Gailly and M. Adler, http://www.gzip.org/zlib.
[23] G. Campbell, T. A. DeFanti, I. Frederiksen, S. A. Joyce, and L. A. Leske, “Two bit/pixel full color encoding,” Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, pp. 215–223, 1986.
[24] T. Chu, J. Fowler, and R. Moorhead, “Evaluation and extension of SGI vizserver,” in Visualization of Temporal and Spatial Data for Civilian and Defense Applications III, Proc. SPIE, vol. 4368, Apr 2001.
[25] Information technology – JPEG 2000 image coding system – Part 3: Motion JPEG 2000, ISO/IEC 15444-3, 2002.
[26] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[27] I. Daubechies and W. Sweldens, “Factoring wavelet and subband transforms into lifting steps,” Journal of Fourier Analysis and Applications, vol. 4, no. 3, pp. 247–269, May 1998.
[28] R. Calderbank, I. Daubechies, W. Sweldens, and B. Yeo, “Wavelet transforms that map integers to integers,” Applied and Computational Harmonic Analysis, vol. 5, no. 3, pp. 332–369, July 1998.
[29] A. Secker and D. S. Taubman, “Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression,” IEEE Trans. Image Processing, vol. 12, no. 12, pp. 1530–1542, Dec 2003.
[30] M. Levoy, “Efficient ray tracing of volume data,” ACM Transactions on Graphics, vol. 9, no. 3, July 1990.
[31] B. Lichtenbelt, R. Crane, and S. Naqvi, Introduction to Volume Rendering. Prentice Hall, 1998.
[32] M. Levoy and P. Hanrahan, “Light field rendering,” Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1996.
[33] C. Chang, X. Zhu, P. Ramanathan, and B. Girod, “Light field compression using disparity-compensated lifting and shape adaptation,” IEEE Trans. Image Processing, vol. 15, no. 4, pp. 793–806, Apr 2006.
[34] D. Hearn and M. P. Baker, Computer Graphics with OpenGL. Prentice Hall, 2003.
[35] L. Zhanping and R. Moorhead, “A texture-based hardware-independent technique for time-varying volume flow visualization,” Journal of Visualization, vol. 8, no. 3, pp. 235–244, 2005.
[36] http://www.vtk.org/.
[37] R. G. Gallager and D. C. V. Voorhis, “Optimal source codes for geometrically distributed integer alphabets,” IEEE Trans. on Information Theory, vol. 21, no. 2, pp. 228–230, Mar 1975.
[38] D. Taubman, http://www.kakadusoftware.com/.
[39] D. A. Dobson, S. Agwani, W. D. Washkurak, and S. G. Chamberlain, “A high-speed, high resolution TDI image sensor for use in airborne reconnaissance applications,” in Airborne Reconnaissance XVIII, Proc. SPIE, vol. 2272, Jul 1994, pp. 221–229.
[40] B. Mathews, “An ultra high resolution, electro-optical framing camera for reconnaissance and other applications using a 9216 by 9216 pixel, wafer scale, focal plane array,” in Airborne Reconnaissance XXII, Proc. SPIE, vol. 3431, Jul 1998, pp. 144–154.
[41] B. Uhl, “Aerial data acquisition system for earth survey,” in Airborne Reconnaissance XIV, Proc. SPIE, vol. 1342, Jul 1990, pp. 51–60.
[42] A. Mason and S. Gills, “Real-time high resolution digital video for range and training applications,” in Proceedings of the International Telemetering Conference, Oct 2001, pp. 646–655.
[43] D. C. L. V. Berg and M. R. Kruer, “Image compression for airborne reconnaissance,” in Airborne Reconnaissance XXII, Proc. SPIE, vol. 3431, Jul 1998, pp. 2–13.
[44] D. R. Schmitt, H. Dorgeloh, J. Fries, H. Keil, W. Wetjen, and S. Kleindienst, “Airborne network system for the transmission of reconnaissance image data,” in Airborne Reconnaissance XXIV, Proc. SPIE, vol. 4127, Jul 2000, pp. 97–100.
[45] F. Dufaux and T. Ebrahimi, “Error-resilient video coding performance analysis of motion JPEG2000 and MPEG-4,” in Visual Communications and Image Processing, Proc. SPIE, vol. 5308, 2004, pp. 596–607.
[46] H. S. Kong, A. Vetro, T. Hata, and N. Kuwahara, “ROI-based SNR scalable JPEG2000 image transcoding,” in Visual Communications and Image Processing, Proc. SPIE, vol. 5690, Jan 2005, pp. 5Q1–5Q10.
