EFFICIENT COMPRESSION TECHNIQUES FOR MULTI-DIMENSIONAL
IMAGES
by
Hariharan G. Lalgudi
A Dissertation Submitted to the Faculty of the
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2008
FINAL EXAMINING COMMITTEE APPROVAL FORM
As members of the Dissertation Committee, we verify that we have read the dissertation prepared by Hariharan G. Lalgudi entitled Efficient Compression Techniques for Multi-Dimensional Images and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

Michael W. Marcellin    Date: August 20, 2008
Ali Bilgin    Date: August 20, 2008
Bane Vasic    Date: August 20, 2008
Ivan Djordjevic    Date: August 20, 2008
Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College. I
hereby certify that I have read this dissertation prepared under my direction and
recommend that it be accepted as fulfilling the dissertation requirement.
Dissertation Director: Michael W. Marcellin    Date: August 20, 2008
Dissertation Co-Director: Ali Bilgin    Date: August 20, 2008
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of requirements
for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the library.
Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for
permission for extended quotation from or reproduction of this manuscript in whole
or in part may be granted by the head of the major department or the Dean of the
Graduate College when in his or her judgment the proposed use of the material is
in the interests of scholarship. In all other instances, however, permission must be
obtained from the author.
SIGNED:
Hariharan G. Lalgudi
ACKNOWLEDGEMENTS
I would like to thank my advisors Prof. Michael W. Marcellin and Prof. Ali Bilgin
for their continued support and valuable guidance throughout the PhD program.
Thanks to Prof. Bane Vasic and Prof. Ivan Djordjevic for spending their valuable
time to review my dissertation.
I would also like to thank my colleagues at SPACL for their valuable inputs.
Last but not least, I would like to thank my family and friends, without whom
this work would not have been possible.
To my parents and my wife Rajeswari
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER 1  INTRODUCTION

CHAPTER 2  BACKGROUND
  2.1  Linear block transforms
  2.2  Subband transform
       2.2.1  Lifting
       2.2.2  Discrete wavelet transform
  2.3  JPEG2000

CHAPTER 3  COMPRESSION OF MULTI-DIMENSIONAL IMAGES USING JPEG2000
  3.1  Introduction
  3.2  N-D compression
  3.3  Scalability issues
  3.4  Results
  3.5  Conclusion

CHAPTER 4  VIEW COMPENSATED COMPRESSION OF VOLUME RENDERED IMAGES
  4.1  Introduction
  4.2  Volume rendering
  4.3  Geometric relationship between volume rendered images
  4.4  Compression of volume rendered images
       4.4.1  Lifting-based view compensated wavelet transform
       4.4.2  Compression of depth maps
  4.5  Results
  4.6  Conclusion

CHAPTER 5  SCALABLE LOW COMPLEXITY CODER
  5.1  Introduction
  5.2  Scalable low complexity coder
  5.3  Remote Volume Rendering
       5.3.1  Results
  5.4  Airborne video transmission
       5.4.1  Results

CHAPTER 6  CONCLUSION

REFERENCES
LIST OF FIGURES

2.1  Filter bank realization of ‘two-channel’ subband transforms.
2.2  Lifting implementation of subband analysis.
2.3  Lifting implementation of subband synthesis.
2.4  One dimensional DWT with 3 levels.
2.5  Two dimensional DWT.
2.6  JPEG2000 Part 1 encoder.
2.7  Three level 2-D DWT.
2.8  Codeblocks from different subbands (left) that correspond to a region of interest in the original image (right).
3.1  JPEG2000 Part 2 encoder.
3.2  Stage k of MCT.
3.3  Compression of 4-D data using 2 MCT stages.
4.1  Client-server model.
4.2  Ray casting.
4.3  Volume rendering example.
4.4  View-point geometry.
4.5  Depth map.
4.6  Compression/decompression of volume rendered images.
4.7  Forward lifting-based view compensated wavelet transform.
4.8  Processed depth map.
4.9  PSNR of frames in sequence 1 of dataset 1 at 200 Kbps.
4.10 PSNR of frames in sequence 1 of dataset 1 at 800 Kbps.
5.1  Schematic of the scalable low complexity coder.
5.2  Wavelet coefficient structure.
5.3  Low complexity entropy coding schemes.
5.4  Client-server communication.
5.5  Comparison of end-to-end decompression time for test sequence 1.
5.6  Comparison of end-to-end decompression time for test sequence 2.
5.7  Average PSNR of decompressed images at the client for test sequence 1.
5.8  Average PSNR of decompressed images at the client for test sequence 2.
5.9  Frame 1 of test sequence 1.
5.10 Compressed/decompressed image – JPEG2000.
5.11 Compressed/decompressed image – SLCC.
5.12 Codeblocks from different subbands (left) that correspond to a region of interest in the image (right).
5.13 Comparison of end-to-end encoding times for SLCC and JPEG2000.
5.14 Comparison of rate vs. quality for SLCC and JPEG2000.
5.15 Achievable frame rate at different quality levels for SLCC and JPEG2000.
LIST OF TABLES

3.1  Compression performance of 5-D fMRI dataset.
4.1  Average PSNR of volume rendered images for Dataset 1.
4.2  Average PSNR of volume rendered images for Dataset 2.
4.3  Average PSNR of volume rendered images for Dataset 3.
4.4  Average PSNR of volume rendered images for Dataset 4.
4.5  Standard deviation of PSNR for Dataset 1.
5.1  Compression performance of test sequence 1.
5.2  Compression performance of test sequence 2.
ABSTRACT
With advances in imaging and communication systems, there is increased use of
multi-dimensional images. Examples include multi-view image/video, hyperspectral
image/video and dynamic volume imaging in CT/MRI/Ultrasound. These datasets
consume far larger amounts of transmission and storage resources than 2-D images. Hence, it is vital to have efficient compression methods for multi-dimensional images. In this dissertation, first, a JPEG2000 Part 2 compliant scheme is proposed for compressing multi-dimensional datasets of any dimension N ≥ 3. Second, a novel view-compensated compression method is investigated for remote visualization of volumetric data. Experimental results indicate superior compression performance compared to state-of-the-art compression standards. Third, a new scalable low complexity coder is designed that sacrifices some compression efficiency in exchange for a substantial gain in throughput. Potential use of the scalable low complexity coder is illustrated for two applications: airborne video transmission and remote volume visualization.
CHAPTER 1
INTRODUCTION
With tremendous advances in imaging modalities, there has been increased use of
multi-dimensional images. Multi-dimensional sources include Computed Tomography (CT), Magnetic Resonance Imaging (MRI), ultrasound imaging and multi-view
image/video [1, 2, 3]. These multi-dimensional images demand large amounts of resources for transmission/storage and hence call for efficient compression techniques.
In this dissertation, we present several novel techniques for efficient compression of
multi-dimensional images. Chapter 2 of the dissertation provides a brief review of
image compression. Subsequent chapters present the proposed compression techniques as outlined below.
JPEG2000 is the current state-of-the-art still-image compression standard
[4, 5]. In addition to state-of-the-art compression performance, it provides rich
scalability features that are useful for a wide variety of applications. Part 2 of
the standard provides tools for compressing multi-component images. These Part 2 extensions have been used for compressing 3-D images in applications such as
medical imaging [6] and remote sensing [7]. It is widely believed that the tools are
applicable only to 3-D data. In Chapter 3, we present a Part 2 compliant N-D compression scheme for any N ≥ 3, where N refers to the number of dimensions of the data.
Volume visualization or volume rendering refers to the display of volumetric data (such as CT/MRI scans) as 2-D images from any desired view-point (a point in 3-D space from which the data is seen). It is a computationally intensive process, often requiring hardware acceleration to achieve a real-time viewing experience. Remote volume visualization has gained importance over the past few years in medical and industrial applications. One remote visualization model that can accomplish this would transmit rendered images from a server based on view-point requests from a client. For constrained server-client bandwidth, an efficient compression scheme is vital for transmitting high quality rendered images. In Chapter 4, we present a new compression scheme that utilizes the geometric relationship between view-points to exploit the correlation between successive rendered images. The proposed method performs better than H.264/AVC (Advanced Video Codec), the state-of-the-art video compression standard. Additionally, our scheme obviates motion estimation between rendered images, enabling a significant reduction in compressor complexity compared to H.264/AVC.
Applications that require real-time (25-30 frames/s) transmission of images often call for a low complexity compressor or decompressor. Additionally,
scalability and compression efficiency can be very desirable properties in such applications. In Chapter 5, we present a new scalable low complexity image coder
that has good compression efficiency and high throughput. We illustrate its use for
two applications: remote volume visualization and airborne video transmission.
CHAPTER 2
BACKGROUND
Data compression (or source coding) refers to removing redundancy from a source
such as speech, image, video, etc. It can be broadly categorized into lossless and
lossy compression. With lossless compression, the original data can be recovered
exactly. Lossy compression further reduces the compressed size by allowing loss
in the fidelity of the data. A typical lossy image compression system consists of
transformation, quantization and entropy coding. The input image samples are
first subjected to transformation. The goal of transformation in image compression
is to confine the energy to a few coefficients (‘energy compaction’), so that subsequent stages can exploit the redundancy efficiently. The transformed coefficients are
reduced in precision by the quantizer. The quantized values are then compressed
using an entropy coder [5]. Popular entropy coding techniques used in image compression include Huffman coding [8], arithmetic coding [9, 10] and run-length coding
[5].
Overviews of linear block transforms and subband transforms are provided
in Section 2.1 and Section 2.2, respectively. Section 2.3 gives a brief overview of
JPEG2000, the current international standard for image compression. The reader
is referred to [5] for comprehensive coverage of these topics. Much of the notation
used in the following sections is adopted from [5].
2.1 Linear block transforms
A linear block transform maps an input vector I (size CI × 1) into an output vector
O (size CO × 1). The analysis (forward transform) equation is given by

$$O = A^* I$$

where A is the analysis matrix (size CI × CO) and A^* is its conjugate transpose. We shall restrict our attention to ‘non-expansive’ transforms for which the analysis matrix is square (CI = CO = C). The synthesis equation (inverse transform) is given by

$$I = S\,O$$

where the synthesis matrix is S = A^{-*}. If S = A, the transform is said to be orthonormal.
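As an illustration (not from the dissertation), the following Python/NumPy sketch applies the analysis and synthesis equations to one block, using a randomly generated orthonormal matrix as a stand-in for A, and checks perfect reconstruction:

    import numpy as np

    # Hypothetical 8 x 8 orthonormal analysis matrix (stand-in for A).
    C = 8
    rng = np.random.default_rng(0)
    A, _ = np.linalg.qr(rng.standard_normal((C, C)))

    def analyze(I, A):
        # O = A* I: conjugate transpose of the analysis matrix times the block.
        return A.conj().T @ I

    def synthesize(O, A):
        # I = S O with S = A^{-*}; for an orthonormal A this reduces to S = A.
        S = np.linalg.inv(A.conj().T)
        return S @ O

    I = rng.standard_normal(C)
    assert np.allclose(synthesize(analyze(I, A), A), I)  # perfect reconstruction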
Applying a linear block transform to an input vector containing all the pixels in an image is prohibitive for most practical systems. For example, with an image size of 1000 × 1000, applying the forward transform involves multiplying the analysis matrix (size 10^6 × 10^6) with 10^6 pixels. Hence, the image is typically subdivided into small blocks and the transform is applied to each block independently. Typical block sizes are 16 × 16 and 8 × 8. Pixels in a block are raster scanned to form the input vector. The forward transform is then given by

$$O[n] = A^* I[n]$$

where n denotes the block index, I is the input vector (size C × 1) and O is the output vector (size C × 1). Brief discussions of two linear block transforms are given below.
The Karhunen-Loève transform (KLT) is an orthonormal transform that
decorrelates the input data. Let the input vector I and the output vector O be
realizations of random vectors with covariance matrices COV_I and COV_O respectively. The eigenvectors of the covariance matrix COV_I form the columns of the KLT analysis matrix. It can then be shown [5] that the output vector of the KLT will be uncorrelated (COV_O will be diagonal). The disadvantages of the KLT are its high computational complexity and the fact that the statistics of the input data must be known a priori.
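The construction can be sketched in a few lines of NumPy; the sample blocks here are synthetic stand-ins, since in practice the statistics must come from the data being coded:

    import numpy as np

    rng = np.random.default_rng(1)
    blocks = rng.standard_normal((1000, 8))   # rows: realizations of I (synthetic)
    COV_I = np.cov(blocks, rowvar=False)      # estimated input covariance
    _, A = np.linalg.eigh(COV_I)              # eigenvectors form the columns of A
    O = blocks @ A                            # per block: O = A^T I (A is real here)
    COV_O = np.cov(O, rowvar=False)           # approximately diagonal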
The discrete cosine transform (DCT) is an orthonormal transform with analysis matrix elements given by

$$a_{p,q} = \nu_q \cos\left(\frac{\pi q\,(p + 0.5)}{C}\right)$$

where a_{p,q} denotes the element of A at the p-th row and q-th column, and \nu_q is a normalization constant equal to \sqrt{1/C} if q = 0 and \sqrt{2/C} if q \neq 0. The DCT is a separable
transform, and hence the 2-D transform can be implemented as a 1-D transform first along the rows and then along the columns (or vice versa). For natural imagery, the analysis matrix of the DCT closely resembles that of the KLT. Additionally, the DCT has lower complexity than the KLT, and its analysis matrix does not depend on input statistics. Moreover, there exist fast implementations of the DCT. Hence, it is used in many compression systems.
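For concreteness, the analysis matrix can be generated directly from the formula above; a minimal sketch with an orthonormality check:

    import numpy as np

    def dct_matrix(C):
        # a[p, q] = nu_q * cos(pi * q * (p + 0.5) / C)
        p = np.arange(C)[:, None]
        q = np.arange(C)[None, :]
        nu = np.where(q == 0, np.sqrt(1.0 / C), np.sqrt(2.0 / C))
        return nu * np.cos(np.pi * q * (p + 0.5) / C)

    A = dct_matrix(8)
    assert np.allclose(A.T @ A, np.eye(8))  # orthonormal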
2.2 Subband transform
One disadvantage with block transforms is that the redundancy between adjacent
blocks is not removed. This manifests as blocky artifacts in reconstructed images.
Subband transforms overcome this disadvantage by adding memory to block transforms. The analysis equation is given by

$$O[n] = \sum_i A^*[i]\, I[n - i]$$
where A[i] denotes a sequence of analysis matrices (size C × C). The synthesis equation is given by

$$I[n] = \sum_i S[i]\, O[n - i]$$
The size C of the analysis/synthesis matrices used in image compression is typically 2. Hence, we restrict our discussion to this special (‘two-channel’) class of subband transforms.
Figure 2.1: Filter bank realization of ‘two-channel’ subband transforms.
Fig. 2.1 shows the filter bank interpretation of ‘two-channel’ subband transforms. In this interpretation, the input is considered as a single sequence I. The
correspondence to the vector treatment is given by

$$I[2n + p] = (I[n])_p, \quad \text{the } p\text{-th sample of } I[n], \quad p \in \{0, 1\}$$
The input sequence is passed through a lowpass filter and a highpass filter with
impulse responses h0 and h1, respectively. The outputs of the filters are down-sampled
by 2 to form the lowpass coefficients (L band) and the highpass coefficients (H band).
The lowpass coefficients and highpass coefficients will form the even (O[2n]) and
odd (O[2n + 1]) samples of the output sequence, respectively. Mathematically,
$$O[2n + q] = (O[n])_q = (I \star h_q)[2n] = \sum_k h_q[k]\, I[2n - k], \quad q \in \{0, 1\}$$

where $(O[n])_q$ denotes the $q$-th sample of $O[n]$.
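A minimal sketch of this filter-and-downsample view; the Haar pair below is only an assumed stand-in for h0 and h1:

    import numpy as np

    h0 = np.array([0.5, 0.5])    # assumed lowpass analysis filter
    h1 = np.array([0.5, -0.5])   # assumed highpass analysis filter

    def subband_analyze(x):
        # O[2n + q] = (x * h_q)[2n]: filter, then keep the even-indexed samples.
        L = np.convolve(x, h0)[::2]
        H = np.convolve(x, h1)[::2]
        return L, H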
In the synthesis path, upsampled (Fig. 2.1) lowpass coefficients and highpass coefficients are passed through filters with impulse responses g0 and g1, respectively. The outputs of the filters are added to reconstruct the input sequence I. The
relationship between the impulse response of the analysis/synthesis filters and the
analysis/synthesis matrices can be shown [5] to be
$$h_q[2i - j] = (A^*[i])_{q,j}, \qquad g_q[2i + j] = (S[i])_{j,q}, \qquad q, j \in \{0, 1\}$$

2.2.1 Lifting
‘Lifting’ is a method of factorizing the analysis/synthesis filters that makes the design and implementation of subband transforms convenient. Fig. 2.2 shows the lifting
implementation of subband analysis. The input sequence is separated into even and
odd sub-sequences. In each lifting step, one sub-sequence is filtered and added to
the other sub-sequence. Let l (= 1, 2, . . . , Λ) denote the lifting step index and Ol
denote the intermediate output sequence after l lifting steps. The update equations
are given by,
If l is odd
Ol [2n] = Ol−1 [2n]
Ol [2n + 1] = Ol−1 [2n + 1] +
X
λl [k]Ol−1 [2n − k]
k
else
Ol [2n + 1] = Ol−1 [2n + 1]
Ol [2n] = Ol−1 [2n] +
X
λl [k]Ol−1 [2n + 1 − k]
k
At the end of Λ lifting steps, the output sub-sequences O_Λ[2n] and O_Λ[2n + 1] are scaled by subband gains K0 and K1, respectively.
Figure 2.2: Lifting implementation of subband analysis.
Fig. 2.3 shows the lifting implementation of subband synthesis. In the synthesis path, the subband gains are inverted and lifting steps of the analysis path are
applied in reverse order. In each lifting step, one sub-sequence is filtered and subtracted from the other sub-sequence. A chief advantage of the lifting implementation
is that the invertibility of the transform is unaffected even if the convolution operators are replaced with any other linear/non-linear, fixed/time-varying operators.
Other advantages include computation and memory savings [5].
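To make the invertibility property concrete, here is a sketch of (5,3)-style lifting with rounding inserted into the steps (a nonlinear operator); synthesis still recovers the input exactly. Periodic boundary handling via np.roll is an assumption made for brevity:

    import numpy as np

    def analyze_53(x):
        even, odd = x[0::2].astype(np.int64), x[1::2].astype(np.int64)
        # Predict step: odd samples predicted from neighboring even samples.
        odd = odd - np.floor((even + np.roll(even, -1)) / 2).astype(np.int64)
        # Update step: even samples updated from the prediction residuals.
        even = even + np.floor((odd + np.roll(odd, 1) + 2) / 4).astype(np.int64)
        return even, odd   # L band, H band

    def synthesize_53(even, odd):
        # Undo the lifting steps in reverse order, with signs reversed.
        even = even - np.floor((odd + np.roll(odd, 1) + 2) / 4).astype(np.int64)
        odd = odd + np.floor((even + np.roll(even, -1)) / 2).astype(np.int64)
        x = np.empty(even.size + odd.size, dtype=np.int64)
        x[0::2], x[1::2] = even, odd
        return x

    x = np.arange(16, dtype=np.int64)
    assert np.array_equal(synthesize_53(*analyze_53(x)), x)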
Figure 2.3: Lifting implementation of subband synthesis.
2.2.2 Discrete wavelet transform
The Discrete Wavelet Transform (DWT) is intimately connected to the subband
transform. It is generally understood as a dyadic ‘tree-structured’ subband transform, as illustrated in Fig. 2.4. The subband transform is recursively applied to the
lowpass band. The suffix to the subband notation denotes the recursion level.
Figure 2.4: One dimensional DWT with 3 levels.
In a 2-D DWT, the subband transform is first applied along the columns
and then along the rows. This procedure generates 4 subbands: LL, HL, LH and
HH, as shown in Fig. 2.5a. As in the 1-D case, the procedure is recursively applied
on the LL band to yield subbands at different levels. Fig. 2.5b shows the 2-D DWT
with 3 levels of transform.
Figure 2.5: Two dimensional DWT. (a) 1 level; (b) 3 levels. [figure]
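A sketch of one separable 2-D level built from any 1-D analysis routine, for instance the analyze_53 sketch above; even image dimensions are assumed:

    import numpy as np

    def analyze_rows(a, analyze_1d):
        # Apply a 1-D subband analysis to every row; stack the L and H halves.
        pairs = [analyze_1d(row) for row in a]
        return (np.vstack([p[0] for p in pairs]),
                np.vstack([p[1] for p in pairs]))

    def dwt2_level(img, analyze_1d):
        # Columns first (via transposes), then rows, yielding LL/HL/LH/HH.
        lo_c, hi_c = analyze_rows(img.T, analyze_1d)
        lo_c, hi_c = lo_c.T, hi_c.T
        ll, hl = analyze_rows(lo_c, analyze_1d)
        lh, hh = analyze_rows(hi_c, analyze_1d)
        return ll, hl, lh, hh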
2.3 JPEG2000
The architectural layout of a typical Part 1 JPEG2000 encoder is shown in Fig. 2.6
[5]. The image samples are first level shifted to make the pixels nominally zero
mean. The encoder can take two paths, reversible or irreversible. In the reversible
case the original image can be recovered exactly. In the irreversible case, complete
recovery of the original image may not be possible. Both the reversible and the
irreversible paths have an optional color transform. The color transform is typically
used for natural color images to exploit the redundancy between RGB components.
The reversible color transform (RCT) does integer to integer transformation while
the irreversible color transform (ICT) involves floating point computation.
Figure 2.6: JPEG2000 Part 1 encoder. [Block diagram: image samples are level shifted; the reversible path applies the reversible color transform and the reversible (5,3) 2-D DWT followed by ranging, while the irreversible path applies the irreversible color transform, the (9,7) 2-D DWT, and quantization; a block coder then produces an embedded bit-stream from each codeblock.]
The next step in the encoder is a 2-D DWT. JPEG2000 Part 1 supports two wavelet transforms: (9,7) and (5,3). The numbers indicate the respective filter
tap lengths of h0 and h1 . The (9,7) transform is used in the irreversible path. The
reversible (5,3) is an integer-to-integer transform that is obtained by introducing
rounding operations in the lifting steps of a regular (5,3) transform. Perfect reconstruction is possible with the reversible (5,3) transform using integer arithmetic and
hence, it is used in the reversible path.
The 2-D DWT enables multi-resolution representation of the image as illustrated in Fig. 2.7. In the figure, R0 denotes the lowest resolution level and R3
denotes the highest resolution level (subbands belonging to different resolution levels are shaded differently). The image can be reconstructed at a desired resolution
by combining the subbands at that and lower resolution levels. For example, the
LL3 subband can be used as a low resolution version (R0) of the original image.
When the HL3, LH3, and HH3 subbands are used together with the LL3 subband,
the image can be reconstructed at the next higher resolution (R1). Note that this
resolution scalability of JPEG2000 is enabled by independent compression (and thus
decompression) of subbands, as discussed below.
Each subband is subdivided into codeblocks which are compressed independently by a block coder. In addition to allowing independent compression of
subbands (thus resolution scalability), this scheme allows finer granularity of access to wavelet coefficients within each subband. This finer granularity of access to
wavelet coefficients enables spatial scalability, as illustrated in Fig. 2.8. In the figure, the codeblocks containing the wavelet coefficients which contribute to a spatial
Figure 2.7: Three level 2-D DWT. [Subbands grouped into resolution levels R0 (LL3) through R3; a codeblock is outlined.]
region of interest (ROI) are highlighted. Since each codeblock is compressed independently, decompressing the portions of the codestream corresponding to these
codeblocks is sufficient to reconstruct the desired ROI.
The block coder in JPEG2000 uses a variant of context-adaptive binary
arithmetic coding to compress the quantized wavelet coefficients. It generates a compressed bit-stream for each codeblock. The arrangement of the bit-stream from each
codeblock into a final compressed data stream (‘code-stream’) could be seen as a
secondary process. The PCRD-opt (Post compression rate distortion optimization)
algorithm can be used for generating the code-stream, to get optimal rate distortion
performance [5].
Figure 2.8: Codeblocks from different subbands (left) that correspond to a region
of interest in the original image (right).
CHAPTER 3
COMPRESSION OF MULTI-DIMENSIONAL IMAGES USING
JPEG2000
3.1 Introduction
JPEG2000 Part 2 [11] describes the use of multi-component transforms (MCT) to
decorrelate a multi-component image in the component direction. The procedure
is illustrated in Fig. 3.1. Input image components, each of size X × Y , can be
grouped arbitrarily to form subsets known as ‘component collections.’ Each such
collection of components is then processed by a ‘transform block’ which uses one
of four transform options: linear block transform, dependency transform, wavelet
transform or null transform. This transform is applied independently at each spatial
location (x,y) and hence is referred to as a point transform. It is important to note
that multiple component collections can be defined, and that each collection can be
processed by a different transform. Additionally, outputs from the processing of one
collection can be used as inputs to another collection. The final output components
are compressed using the 2-D (‘xy’) wavelet transform and block coding. PCRD-opt
is then typically used to perform rate allocation across the components to create
an embedded code-stream [5]. Indices describing which components belong to each
collection are signalled in the header via the MCC and MIC marker segments. The
particular transform performed on each collection is signalled through MCT marker
segments [11].
A linear block transform operates on an input component collection I having CI components, producing an output collection O having CO components.
Mathematically, for a given spatial location (x, y), I(x, y) is a CI × 1 matrix containing the pixel at location (x, y) from each input component. Then,
$$O(x, y) = A^T \left[ I(x, y) - B \right] \qquad (3.1)$$
where A is the analysis matrix (size CI × CO ). Additionally, B (size CI × 1) is an
offset vector. It is worth emphasizing that CO can be different from CI . This can
be useful in cases where it might be decided apriori that l output components will
be discarded (CO = CI − l). The synthesis equation of the linear block transform
is given by
$$R(x, y) = S\, \hat{O}(x, y) + \tilde{B} \qquad (3.2)$$
where Ô denotes the compressed/decompressed version of O, R denotes the reconstructed image components and S is the synthesis matrix of size CR × CO . Here
again, CR need not be equal to CO (nor CI). This can be useful, for example, in
creating pseudo-color components from compressed LANDSAT data when no such
components were present at compression time [5].
Figure 3.1: JPEG2000 Part 2 encoder. [Block diagram: multi-component image data is grouped into component collections; each collection is processed by a transform block (linear block transform, dependency transform, wavelet transform, or null transform); each resulting component undergoes a 2-D wavelet transform and block coding, and PCRD-opt codestream formation produces the embedded bit stream.]
The synthesis equation for a dependency transform is given by
$$R(x, y) = S\, R(x, y) + \hat{O}(x, y) + \tilde{B} \qquad (3.3)$$
Unlike linear block transforms, the synthesis matrix S is restricted to be square and
strictly lower triangular, so that R can be reconstructed recursively. This can be
used to effect a predictive coding system as described in [5].
Apart from the two matrix based transforms described above, JPEG2000
Part 2 supports wavelet transforms for decorrelating a component collection. In
this case, the 1-D wavelet transform is applied to I(x, y) to get O(x, y). Similarly,
the 1-D inverse wavelet transform is applied to Ô(x, y) to get R(x, y). With a null
transform, no processing is done. Hence O(x, y) = I(x, y) and R(x, y) = Ô(x, y).
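A sketch of the linear block point transform of Eqn. (3.1), applied at every spatial location at once; the array shapes and function name are illustrative assumptions, not part of the standard:

    import numpy as np

    def mct_forward(components, A, B):
        # components: (C_I, X, Y); A: (C_I, C_O); B: (C_I, 1) offset vector.
        ci, X, Y = components.shape
        I = components.reshape(ci, -1)    # one column vector per location (x, y)
        O = A.T @ (I - B)                 # O(x, y) = A^T [I(x, y) - B]
        return O.reshape(A.shape[1], X, Y)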
The MCT extensions described above have been used for efficient 3-D compression applications such as medical [6] and hyper-spectral [7] imaging. In this
dissertation, we propose a JPEG2000 Part 2 compliant compression scheme for N-D datasets for any N ≥ 3. Section 3.2 gives the algorithmic details of the N-D
compression scheme. Scalability features of the proposed scheme are discussed in
Section 3.3. In Section 3.4, the proposed methodology is tested on a 5-D functional
Magnetic Resonance Imaging (fMRI) dataset. Experimental results indicate considerable improvement in compression performance for this dataset, when the MCT
is used with N > 3. Section 3.5 concludes the chapter.
3.2 N-D compression
Let (M1 , M2 , . . . , MN −2 , X, Y ) denote the dimensions (size) of an N -D dataset. The
data can then be interpreted as a collection of 2-D (X, Y ) images (or components).
The number of such components is
$$M = \prod_{k=1}^{N-2} M_k \qquad (3.4)$$
Each component is indexed by an N − 2 dimensional vector: [m1 , m2 , . . . , mN −2 ]
where mk ∈ {1, 2, . . . , Mk }. To compress an N -D dataset, the M components are
processed by N −2 stages of MCT. Specifically, each stage has M input components
and produces M output components, which in turn are used as inputs to the next
stage. Consider the k th (k ∈ {1, 2, . . . , N − 2}) stage of the MCT which is used to
decorrelate the data along the k th dimension. The procedure is depicted in Fig. 3.2.
The input components are first reordered to form collections. A collection is formed
by grouping all components having the same value of the N − 3 dimensional index
[m1 , m2 , . . . , mk−1 , mk+1 , . . . , mN −2 ]. There are then M/Mk collections, each containing Mk components. For each collection, a point transform is applied, resulting
in decorrelation of the data along the k th dimension. The transform of a particular
collection is referred to as a Transform Block (TB).
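The grouping rule can be made concrete with a short sketch; sizes and names follow the notation above and are otherwise illustrative:

    import itertools

    def collections_for_stage(sizes, k):
        # sizes = (M_1, ..., M_{N-2}); stage k (1-based) groups components
        # sharing the index values in every dimension other than k.
        other = [d for d in range(len(sizes)) if d != k - 1]
        groups = {}
        for idx in itertools.product(*(range(1, m + 1) for m in sizes)):
            key = tuple(idx[d] for d in other)
            groups.setdefault(key, []).append(idx)
        return groups   # M / M_k collections of M_k components each

    # 4-D example of Fig. 3.3 (Z = 3, T = 2), i.e. sizes = (3, 2): stage 1
    # yields two collections of three components; stage 2, three of two.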
The output components of the final MCT stage (N − 2) are referred to
as code-stream components, and are compressed using 2D (x, y) DWT and block
coding. It should be noted that the encoding procedure described above will result
in a code-stream that is entirely compliant with JPEG2000 Part 2.
Fig. 3.3 illustrates the procedure with a 4-D example. The data for this
example contain two volumetric (‘xyz’) cubes, corresponding to two time instances
(T = 2). Each ‘xyz’ cube contains three slices (Z = 3). Two MCT stages are
used to process the 4-D data. In the first MCT stage, the input components are
grouped to form two ‘xyz’ cubes. Each cube is processed by a transform block
thereby decorrelating the data along z. The output components of the first stage
are passed to the second stage, where they are reordered to form three ‘xyt’ cubes.
Each of these three cubes is processed by a transform block to decorrelate the data
along t. The result is 6 code-stream components that are each subjected to 2-D
compression.
Figure 3.2: Stage k of MCT. [Components from the previous stage are reordered into M/M_k collections of M_k components each; every collection is processed by a transform block, and the output components are passed to the next stage.]
For simplicity, we assume in what follows that the same transform is used
for each transform block within a stage, but that a different transform can be used
for each stage.
3.3 Scalability issues
JPEG2000 is a highly scalable compression scheme, in which portions of the compressed data can be selectively decoded to obtain a particular component, resolution,
quality, or region of interest (ROI). In this section, spatial scalability is discussed
for the proposed N -D compression scheme.
Recall that each component is indexed by an N − 2 dimensional vector:
[m1 , m2 , . . . , mN −2 ] and each transform block in the k th (k ∈ {1, 2, . . . , N −2}) stage
is indexed by an N − 3 dimensional vector: [m1 , m2 , . . . , mk−1 , mk+1 , . . . , mN −2 ].
Figure 3.3: Compression of 4-D data using 2 MCT stages. [In MCT stage 1, the input components are grouped into two ‘xyz’ cubes (t = 1 and t = 2) and each cube is decorrelated along z by a transform block. After component reordering, MCT stage 2 forms three ‘xyt’ cubes (z = 1, 2, 3) and each is decorrelated along t by a transform block.]
To decompress the N -D data, the MCT stages are applied in reverse order. Specifically, the reconstructed code-stream components are first processed by stage N − 2
followed by stage N − 3 and so on. Let us consider decompressing one ‘xy’ component of the N -D data. Let [m2 , . . . , mN −2 ] denote the particular transform block
in stage 1 that generates the desired ‘xy’ component. To output the desired component, the transform block will require a set of input components indexed by
[j1, m2, m3, . . . , mN−2], where j1 ∈ J1, ‖J1‖ = J1. Each of these J1 components will
be produced by a different transform block in stage 2. Specifically, the component
[j1 , m2 , m3 , . . . , mN −2 ] will be generated by the transform block [j1 , m3 , . . . , mN −2 ].
To output that component, the transform block will require a set of input components denoted by [j1, j2, m3, . . . , mN−2] (j2 ∈ J2, ‖J2‖ = J2). Thus, to generate
each j1 ∈ J1 , stage 2 requires a different set of components (of size J2 ) from stage
3. In total, we need J1 × J2 components from stage 3. Arguing inductively, the
set of code-stream components S that is necessary to reconstruct the desired ‘xy’
component is of size
$$\|S\| = \prod_{k=1}^{N-2} J_k \qquad (3.5)$$
The value of Jk will depend on the type of point transform used in the k th
stage of MCT. For matrix based transforms, Jk will depend on the number of nonzero coefficients in the synthesis matrix. In the case of wavelet transforms, Jk will
be determined by the lengths of the wavelet filters and the number of wavelet transform levels used. It is important to note that Jk is the number of input components
required to produce one output component by a transform block in the k th stage of
MCT. If consecutive components are desired, the required number of input components may increase only modestly depending on the point transform employed (e.g.,
wavelet). If a subset of pixels from a component (or multiple components) is desired, the spatial scalability properties of 2-D JPEG2000 can be used to significantly
reduce the amount of data that must be accessed and/or decompressed.
Table 3.1: Compression performance of 5-D fMRI dataset. PSNR (in dB) at bit-rates of 0.1, 0.25, 0.5, 0.75, 1 and 1.5 bpp.

Treatment  Transform     Description                              0.1    0.25   0.5    0.75   1      1.5
1          2D - ‘xy’     JPEG2000 Part 1                          33.12  42.49  48.93  53.52  57.12  62.56
2          3D - ‘xyz’    Single stage MCT - 2 levels of (9,7)     41.01  47.33  52.89  56.85  60.09  65.07
3          4D - ‘xyzt’   Stage 1 MCT ‘t’ - 5 levels of (9,7);     55.06  60.78  65.01  67.73  69.86  72.97
                         Stage 2 MCT ‘z’ - 2 levels of (9,7)
4          5D - ‘xyztr’  Stage 1 MCT ‘r’ - KLT;                   55.88  61.19  65.20  67.91  69.97  73.03
                         Stage 2 MCT ‘t’ - 5 levels of (9,7);
                         Stage 3 MCT ‘z’ - 2 levels of (9,7)
3.4 Results
Two examples of multi-dimensional sources are medical images (e.g., fMRI, 4D
cardiac and ultrasound) and multi-view video [2]. 4-D compression techniques have
been developed in [12, 13, 14, 3] for specific classes of multi-dimensional datasets.
In this section, we present experimental results on the compression performance of
a 5-D fMRI dataset using the proposed N -D compression scheme.
fMRI is a medical imaging modality that generates a time series of 3-D
images of the brain while a subject is performing some task. Variations of voxel
intensity as a function of time are analyzed to determine activations pertaining to a
particular task. Multiple trials of the fMRI experiment are carried out to improve
the accuracy of the analysis. The 5-D dataset used here contains ‘xyz’ cubes with
dimensions X = 64, Y = 64 and Z = 21. A single trial (‘r’) consists of 100 ‘xyz’
cubes corresponding to 100 time (‘t’) instances (T = 100). There are three 4D-cubes
(‘xyzt’) corresponding to three trials (R = 3) of the experiment. Each sample value
is represented with 13 bits.
Table 3.1 gives PSNR values (in dB) for 4 different compression treatments.
In treatment 1, each ‘xy’ slice is compressed independently using a JPEG2000 Part
1 encoder. Post Compression Rate Distortion optimization (PCRD-opt) [5, 15] is
used to perform rate allocation jointly across all 6300 slices (Z ×T ×R). Treatments
2-4 use one or more MCT stages to decorrelate the data along ‘z’, ‘t’ and/or ‘r’.
Each of the output code-stream components is subjected to 2 levels of 2-D dyadic
(9,7) wavelet transform and compression by the JPEG2000 block coder. Here again,
PCRD-opt is used to perform rate allocation jointly across all 6300 components.
As noted in Table 3.1, treatment-2 uses a single stage of MCT that applies
2 levels of 1-D dyadic (9,7) transform along ‘z’. Treatment-3 decorrelates the data
along ‘z’ and ‘t’, using two stages of MCT. As noted in the table, the first stage
applies 5 levels of (9,7) transform along ‘t’, while the second stage uses 2 levels of
(9,7) transform along ‘z’. Treatment-4 exploits dependency in all five dimensions.
In this treatment, the Karhunen-Loève transform (KLT) is used along ‘r’ in the first
MCT stage. The (9,7) wavelet transform is then used along ‘t’ and ‘z’ in the second
and third stages, respectively. Treatments 2-4 described above result in JPEG2000
Part 2 compliant codestreams.
As seen in Table 3.1, considerable compression gain is achieved with 4-D
compression when compared to 3-D compression. The gain is 12 to 14 dB at low
to moderate bit rates and 8 to 9 dB at high bit rates. This gain is attributed to
high temporal correlation present in the fMRI dataset, which is well-exploited by
the (9,7) transform. It should be noted that the performance of the 3-D case shown
in Table 3.1 (treatment 2) is a little better than the ‘usual’ 3-D JPEG2000 because
rate allocation is across all cubes. The same is true for treatment 1. The 5-D
method gives an additional gain of 0.4 to 0.8 dB at low bit rates. This is due to the
decorrelation of the data along ‘r’ using the KLT. The (9,7) transform, when used
in place of the KLT, yields smaller gains at low bit rates and small losses at high
bit rates. Note that the gain of the 5-D treatment over 4-D is much less compared
to the gain of the 4-D treatment over 3-D. This can be attributed to the fact that
the size of the trial dimension (R=3) is very small for this dataset.
For the results reported here, the transform decomposition structure employed was separable for all dimensions except ‘xy.’ The ‘xy’ transform was dyadic.
However, it is worth mentioning that any desired structure can be employed for
N − 2 dimensions, and any desired structure can be used for the ‘xy’ dimensions.
The only restriction is that the ‘xy’ transform must be performed last (after all
operations on the ‘N − 2’ other dimensions are complete). This restricts the usage
of some N -D wavelet decompositions. For example, in the 3-D case, 2 or more
levels of 3-D dyadic wavelet transform cannot be performed in a JPEG2000 Part
2 compliant scheme. This may not be a serious limitation as JPEG2000 Part 2
compliant decompositions can be very efficient. For example, with hyperspectral
images (3-D), a 1-D DWT in the spectral domain followed by a 2-D DWT has been
found to be the most efficient [16]. For the fMRI dataset presented in this chapter,
the performance of treatment-3 is only slightly worse (0.20 to 0.31 dB) than the best
4-D (‘xyzt’) decomposition structure for fMRI found in [17]. Also, the performance
of treatment-3 is only slightly worse (0.13 to 0.24 dB) compared to 4D SPIHT [13].
3.5 Conclusion
In this chapter, a JPEG2000 Part 2 compliant N -D compression scheme is presented
for any N ≥ 3. The data are treated as multiple ‘xy’ slices. Three different transform coding tools can then be used to decorrelate the data in N −2 dimensions. The
resulting code-stream components are compressed using 2-D DWT and block coding. Experimental results indicate that the proposed scheme can give considerable
compression gain when a 5-D dataset is decorrelated in more than 3 dimensions.
Scalability is supported along all N dimensions. Additionally, incremental processing can be used to achieve considerable savings in memory usage. For example,
line-based (sliding window) wavelet transforms can be employed in all dimensions
rather than just in the y-direction as typically used for 2-D imagery.
CHAPTER 4
VIEW COMPENSATED COMPRESSION OF VOLUME RENDERED
IMAGES
4.1 Introduction
With tremendous advances in volume imaging modalities, remote visualization
of volumetric images has gained importance in many applications such as medical imaging, non-destructive testing and computational fluid dynamics. One way to achieve remote visualization would be to transmit volumetric data to the
client where it is rendered for visualization. Volume rendering is a memory- and
computation-intensive process, often requiring hardware acceleration for high quality, real-time rendering [18]. Thus, it is likely that the quality and frame rate would
be limited in a client side rendering system. Another remote visualization model
would be to do the rendering at a PACS (Picture Communication and Communication System) server with dedicated hardware. This client-server model is shown
in Fig. 4.1. All the steps in a volume rendering pipeline are executed at the server,
based on view-point requests from the client. The rendered images are compressed
and transmitted to the client. The client decompresses the images for display. This
model has been investigated [19, 20] and it has been shown that a real time viewing
experience is achievable. Generic compression schemes such as LZO [21], ZLIB [22]
and color cell compression [23, 24] were examined [19] for compressing the rendered
images. In this work, we present a compression scheme that exploits characteristics of volume rendered images to achieve significant improvement in compression
efficiency.
Figure 4.1: Client-server model. [The client sends a desired view-point request to the PACS server; a hardware accelerated rendering engine produces the rendered image, which is compressed and transmitted, and the client decompresses it for display.]
The sequence of volume rendered images can be treated as video frames
and compressed using video compression standards such as Motion JPEG2000 [25,
5] or Advanced Video Codec (AVC/H.264) [26]. Motion JPEG2000 (JPEG2000
Part 3) codes each frame in a video independently using JPEG2000 [5]. AVC
exploits inter-frame correlation to achieve state-of-the-art compression performance
for video. It uses a block based motion model to capture the motion between
successive frames. An essential part of a typical AVC encoder is motion estimation,
which is a computationally intensive process.
To exploit temporal redundancy without the complexity of motion estimation, we note that rendered images possess an underlying geometric relationship. We
show that this relationship can be exploited with little computational complexity.
A Lifting-based [27, 28] Invertible Motion Adaptive Transform (LIMAT)
was developed in [29] for highly scalable video compression. In LIMAT, motion
compensation between video frames is incorporated into the lifting steps of a temporal wavelet transform. A chief benefit of the LIMAT scheme is that the temporal
transform remains invertible even with non-invertible motion warping operations.
Thus, it gives a higher degree of freedom in choosing a motion warping model that
accurately captures the actual motion between frames. With accurate motion modelling and estimation, the LIMAT scheme can realize the temporal wavelet transform
along the motion trajectories in a video [29]. Hence, excellent energy compaction
can be achieved. We adopt an approach similar to LIMAT for compressing volume
rendered images. Specifically, warping operations between rendered images are derived from their geometric relationship. They are incorporated into the lifting steps
of a temporal wavelet transform to decorrelate the sequence of volume rendered
images. Resulting subband images are compressed using JPEG2000.
The chapter is organized as follows. Section 4.2 gives a brief review of
volume rendering. In Section 4.3, we establish the geometric relationship between
volume rendered images. The proposed compression scheme is laid out in Section 4.4. First, we discuss the incorporation of geometric warping operations into
the lifting steps of a wavelet transform. Second, compression of side information
is considered. Results are presented in Section 4.5, where the compression performance of our scheme is compared with AVC and Motion JPEG2000. Section 4.6
concludes the chapter.
4.2 Volume rendering
Ray casting [30, 31] is an effective method for achieving high quality volume rendering. Other techniques such as light field rendering [32, 33] or rendering based
on GPUs [34] can be used for rendering volumetric data. However, the quality of
images produced by such techniques is poor compared to the ray casting approach
[31]. It should be noted that in this chapter, we focus on dynamic visualization of a
static dataset as opposed to time-varying datasets [35]. In ray casting (depicted in
Fig. 4.2), a ray from each pixel in the 2D image is passed through the volume. The
intensities at sample points on the ray are used for computing the pixel value from
which the ray emerges. This operation of formulating the pixel value is known as
ray composition. Sample points on the ray do not typically coincide with voxels in the 3-D volume and hence an interpolation scheme is required. (We refer to values in the 3-D volume as voxels and to values in the 2-D rendered image as pixels.) Linear interpolation is a simple and effective choice. Higher quality images can be achieved
through cubic-convolution or B-spline interpolation schemes at the expense of more
computation. In this work, we use linear interpolation.
Figure 4.2: Ray casting.
In one method of ray composition, referred to as Maximum Intensity Projection (MIP) [31], the maximum value along the ray is taken to be the intensity of
the pixel. More advanced ray compositing operations are typically used to produce
color images that aid in better visualization of the volumetric data. To this end,
consider the ray emerging from pixel (x, y) in the rendered image I. Let N (x, y) be
the number of sample points on this ray (Fig. 4.2). Color transfer functions Fr , Fg
and Fb map the intensity si , at sample point i, to red (ri ) , green (gi ) and blue (bi )
values respectively. The scalar opacity transfer function (FSO ) maps si to opacity
value αi . A normalized opacity value between 0 and 1 is used, 1 being fully opaque.
To summarize, the color and opacity values at sample point i are written as
$$r_i = F_r(s_i), \quad g_i = F_g(s_i), \quad b_i = F_b(s_i), \quad \alpha_i = F_{SO}(s_i); \qquad r_i, g_i, b_i, \text{ and } \alpha_i \in [0, 1].$$
The red, green and blue components of the pixel (x, y) in image I are given by
$$I^r(x, y) = \sum_{i=1}^{N(x,y)} \hat{r}_i, \qquad I^g(x, y) = \sum_{i=1}^{N(x,y)} \hat{g}_i, \qquad I^b(x, y) = \sum_{i=1}^{N(x,y)} \hat{b}_i \qquad (4.1)$$

where

$$\hat{r}_i = r_i \times \alpha_i \times \prod_{j=1}^{i-1}(1 - \alpha_j), \qquad \hat{g}_i = g_i \times \alpha_i \times \prod_{j=1}^{i-1}(1 - \alpha_j), \qquad \hat{b}_i = b_i \times \alpha_i \times \prod_{j=1}^{i-1}(1 - \alpha_j)$$
This equation is referred to as the ray casting equation [30] in the literature. In what
follows, only processing for the red component is discussed, with identical operations
being carried out for the green and blue components.
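A sketch of Eqn. (4.1) for one ray, keeping the running product of (1 − αj) as an accumulated transmittance; the transfer functions are assumed to be callables returning values in [0, 1]:

    def composite_red(samples, Fr, F_SO):
        # Front-to-back accumulation of r_hat_i along one ray.
        pixel, transmittance = 0.0, 1.0
        for s in samples:
            alpha = F_SO(s)
            pixel += Fr(s) * alpha * transmittance   # r_hat_i
            transmittance *= (1.0 - alpha)           # running prod of (1 - alpha_j)
        return pixel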
In many visualization scenarios, it is necessary for rendered images to show
clear distinction between 3D objects in the volumetric data. In this regard, use of
gradient information helps in accentuating the edges in the data. First, a gradient
volume is formed by computing the gradient magnitude at each voxel. As with
intensity si, the gradient magnitude ‖∇i‖ at sample point i on the ray can be obtained through interpolation. The gradient opacity transfer function (FGO) then maps ‖∇i‖ to a gradient opacity value between 0 and 1. The αi used in the ray casting equation is modified to incorporate the gradient term as

$$\alpha_i = F_{SO}(s_i) \times F_{GO}(\|\nabla_i\|).$$
In some volume acquisition systems such as Magnetic Resonance Imaging
(MRI), determining color and opacity values based on voxel intensity alone is difficult. In such cases, the volumetric data is divided into structural units using a
segmentation algorithm. This, along with the intensity, is then used to determine
the color and opacity at sample points on the ray.
Fig. 4.3 illustrates volume rendering with an example. A CT dataset consisting of 200 slices, each of size 102×247, is rendered using transfer functions as
shown in the figure. All rendering for this work was done with Visualization Tool
Kit (VTK) [36], an open source library for visualization algorithms.
Figure 4.3: Volume rendering example. (a) Rendered image; (b) color transfer functions; (c) scalar opacity transfer function; (d) gradient opacity transfer function. [figure]
4.3 Geometric relationship between volume rendered images
Fig. 4.4 shows the coordinate system convention we use for deriving the geometric
relationship between volume rendered images. The axes Xw , Yw and Zw represent
the ‘world coordinate system.’ It is fixed and does not change with view-points.
Let us consider rendering the volume with respect to view-point P1 which is at
a distance d from origin O of the world coordinates. A view coordinate system
v1 ≡‘X1 Y1 Z1 ’ is formed with P1 as the origin and line OP1 as the Z1 axis. The
line OP1 subtends angles θ1 (azimuth) and φ1 (elevation) with planes ‘Yw Zw ’ and
‘Xw Zw ,’ respectively. A 2D image (I1 ) of dimension L × L is formed on the ‘X1 Y1 ’
plane, where L is the diagonal length of the volumetric data. (This choice of L ensures sufficient pixels in the image to cover the volume at any view-point.) The image is formed by casting a ray (parallel to the Z1 axis) from each pixel location through the volume.
Figure 4.4: View-point geometry. [figure]
Let us now consider another image I2 rendered at view-point P2 . P2 is
parameterized by distance d, azimuth θ2 and elevation φ2 . As before, a view coordinate system v2 ≡‘X2 Y2 Z2 ’ is constructed and image I2 is formed on the ‘X2 Y2 ’
plane, by casting rays parallel to the Z2 axis. Let us consider the ray from pixel
(x2 , y2 ) in I2 . The number of sample points on the ray is N (x2 , y2 ). Let the ith
sample point on the ray be denoted by (x2 , y2 , z2i ) in the view coordinate system
v2 . Thus, the ray casting equation (Eqn.4.1) for the red component of I2 can be
written as
$$I_2^r(x_2, y_2) = \sum_{i=1}^{N(x_2, y_2)} \hat{r}(x_2, y_2, z_2^i) \qquad (4.2)$$

where

$$\hat{r}(x_2, y_2, z_2^i) = r(x_2, y_2, z_2^i) \times \alpha(x_2, y_2, z_2^i) \times \prod_{j=1}^{i-1}\left(1 - \alpha(x_2, y_2, z_2^j)\right)$$
Our goal is to get a good prediction from I1 for the pixel (x2 , y2 ) in I2 .
To this end, we first find a single point on the ray (emanating from (x2 , y2 )) that
can well describe the pixel (x2 , y2 ). Since every sample point on the ray can have
some contribution to the pixel, we compute a ‘centroid depth value’ denoted by
C2 (x2 , y2 ). Specifically, each sample point depth z2i is weighted by the fractional
contribution of that sample point to the pixel. The equation for determining the
red component of the centroid depth value is written as
$$C_2^r(x_2, y_2) = \sum_{i=1}^{N(x_2, y_2)} z_2^i \times \hat{r}(x_2, y_2, z_2^i) \,/\, I_2^r(x_2, y_2), \qquad \text{assuming } I_2^r(x_2, y_2) \neq 0 \qquad (4.3)$$
Thus, a ‘depth map’ consisting of depth values for each non-zero pixel in I2 is generated. Discussion of depth values for zero pixels in I2 is deferred to Section 4.4.2. Such depths are taken to be zero for now. Fig. 4.5 shows the depth map corresponding to the rendered image of Fig. 4.3a. The use of this depth map to determine the geometric mapping from I2 to I1 is discussed next.
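The centroid depth of Eqn. (4.3) falls out of the same compositing pass; a sketch with illustrative names, using the zero-depth convention above for zero pixels:

    def composite_with_depth(samples, depths, Fr, F_SO):
        # Accumulates the pixel value and the r_hat-weighted depth sum.
        pixel, depth_sum, transmittance = 0.0, 0.0, 1.0
        for s, z in zip(samples, depths):
            alpha = F_SO(s)
            r_hat = Fr(s) * alpha * transmittance
            pixel += r_hat
            depth_sum += z * r_hat
            transmittance *= (1.0 - alpha)
        centroid = depth_sum / pixel if pixel != 0 else 0.0
        return pixel, centroid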
Figure 4.5: Depth map (red, green, and blue components). [figure]
Based on coordinate system transformations, it can be shown that a point represented as (x2, y2, z2) in view coordinate system v2 can be transformed to the world coordinate representation (xw, yw, zw) through the matrix multiplication defined below:

$$\begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} = M_{2\to w} \times \begin{bmatrix} x_2 \\ y_2 \\ z_2 \\ 1 \end{bmatrix} \qquad (4.4)$$

where

$$M_{2\to w} = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \sin\phi_2 & \sin\theta_2 \cos\phi_2 & d \sin\theta_2 \cos\phi_2 \\ 0 & \cos\phi_2 & \sin\phi_2 & d \sin\phi_2 \\ -\sin\theta_2 & -\cos\theta_2 \sin\phi_2 & \cos\theta_2 \cos\phi_2 & d \cos\theta_2 \cos\phi_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Similarly, the world coordinate system can be transformed to view coordinate system v1 using Mw→1 . Thus, transformation from coordinate system v2 to v1 can be
accomplished using M2→1, where M2→1 = Mw→1 × M2→w.
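A sketch of Eqn. (4.4) and of the composition M2→1; angles are in radians and the sample view-point parameters are purely illustrative:

    import numpy as np

    def view_to_world(d, theta, phi):
        ct, st = np.cos(theta), np.sin(theta)
        cp, sp = np.cos(phi), np.sin(phi)
        return np.array([
            [ ct, -st * sp, st * cp, d * st * cp],
            [0.0,       cp,      sp, d * sp     ],
            [-st, -ct * sp, ct * cp, d * ct * cp],
            [0.0,      0.0,     0.0, 1.0        ]])

    # M_{2->1} = M_{w->1} x M_{2->w}, where M_{w->1} inverts view 1's matrix.
    M1 = view_to_world(2.0, 0.20, 0.10)   # view-point P1 (illustrative values)
    M2 = view_to_world(2.0, 0.35, 0.10)   # view-point P2 (illustrative values)
    M2_to_1 = np.linalg.inv(M1) @ M2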
For the red component of the pixel (x2, y2) in I2, the coordinate system v2 location corresponding to the centroid depth value is (x2, y2, C2r(x2, y2)). This representation can be transformed to view coordinate system v1 to locate the pixel (x1r, y1r) in I1 as

$$\begin{bmatrix} x_1^r \\ y_1^r \end{bmatrix} = \hat{M}_{2\to 1} \times \begin{bmatrix} x_2 \\ y_2 \\ C_2^r(x_2, y_2) \\ 1 \end{bmatrix} \qquad (4.5)$$
where M̂2→1 is a 2 × 4 matrix containing the first 2 rows of M2→1 . Thus, using
the depth map C2, a geometric mapping from I2 to I1 is obtained. In the ensuing sections, we denote this geometric mapping as µ2→1. In a straightforward implementation, this mapping requires 6 multiplications and 6 additions per pixel.
Specifically,
$$x_1^r = m_{11}\,x_2 + m_{12}\,y_2 + m_{13}\,C_2^r(x_2, y_2) + m_{14}$$

$$y_1^r = m_{21}\,x_2 + m_{22}\,y_2 + m_{23}\,C_2^r(x_2, y_2) + m_{24}$$
where m_{ij} represents the element in the i-th row and j-th column of M̂2→1. A more efficient implementation can give considerable savings in computation. For example,
with raster scanning of I2 , only x2 and the depth values change while visiting pixels
in row y2 . This observation leads to the following algorithm for computing (xr1 , y1r )
for all pixels of I2 in row y2 .
    # Restated in Python: incremental computation for one image row y2.
    sum_x = m12 * y2 + m14
    sum_y = m22 * y2 + m24
    for x2 in range(1, L + 1):
        sum_x += m11                       # accumulates m11 * x2
        sum_y += m21
        x1r = sum_x + m13 * C2r(x2, y2)    # 2 multiplications ...
        y1r = sum_y + m23 * C2r(x2, y2)    # ... and 4 additions per pixel
Ignoring initialization, the required computation is reduced to 2 multiplications and
4 additions per pixel.
4.4 Compression of volume rendered images
Fig. 4.6 shows the proposed compression scheme for volume rendered images. The
rendered images are first subjected to the Lifting-based View Compensated Wavelet
Transform (LVCWT) described below in subsection 4.4.1. The resulting subband
images are then compressed using the JPEG2000 [5] encoder. The LVCWT requires
depth maps for establishing the geometric relation between the rendered images. Hence, the depth maps need to be sent to the decoder as side information. Compression of this side information is discussed in subsection 4.4.2.
Hence, the depth maps should to be sent to the decoder as side information. Compression of this side information is discussed in subsection 4.4.2.
Figure 4.6: Compression/decompression of volume rendered images. [Block diagram: at the compressor, ray casting feeds the view compensated forward wavelet transform and depth map generation; the subband images and depth maps are coded with JPEG2000, and the depth-map side information is also decoded and fed back to the forward transform. At the decompressor, JPEG2000 decoding and the view compensated inverse wavelet transform reconstruct the images for display.]
4.4.1 Lifting-based view compensated wavelet transform
Let I0 , I1 , . . . , I2k , I2k+1 , . . . denote the sequence of rendered images corresponding to
view-points P0 , P1 , . . . , P2k , P2k+1 , . . ., respectively. The rendered images are all of
size L×L. Let us first consider the application of “the usual” 5/3 wavelet transform
across the sequence of rendered images. In the first lifting step (prediction step),
pixel (x, y) in an odd-indexed image is predicted from pixel (x, y) in the neighboring
even-indexed images. The prediction residual (high pass coefficient) is given by
O_{2k+1}(x, y) = I_{2k+1}(x, y) - (1/2)[I_{2k}(x, y) + I_{2k+2}(x, y)]
In the second lifting step (update step), the low pass coefficient is obtained as
follows.
O_{2k}(x, y) = I_{2k}(x, y) + (1/4)[O_{2k-1}(x, y) + O_{2k+1}(x, y)]
The high pass coefficients are finally scaled by half.
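As a concrete reference point, a minimal numpy sketch of this transform across a frame sequence is shown below; it is our own illustration, assuming the frames are stacked along the first axis of a 3-D array, the frame count is even, and boundaries are handled by a simple symmetric extension.

    import numpy as np

    def lift_53(frames):
        # One level of the 5/3 lifting transform across frames,
        # applied independently at each pixel position (x, y).
        I = frames.astype(np.float64)
        N = len(I)
        O = I.copy()
        # Prediction step: high pass coefficients at odd indices.
        for k in range(1, N, 2):
            right = I[min(k + 1, N - 2)]   # symmetric extension at the end
            O[k] = I[k] - 0.5 * (I[k - 1] + right)
        # Update step: low pass coefficients at even indices.
        for k in range(0, N, 2):
            left = O[max(k - 1, 1)]        # symmetric extension at the start
            O[k] = I[k] + 0.25 * (left + O[k + 1])
        O[1::2] *= 0.5                      # finally scale the high pass by half
        return O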
A chief benefit of the lifting implementation is that the wavelet transform
remains invertible, even if non-invertible operations are performed inside the lifting
steps. Hence, we incorporate the geometric mapping derived in Section 4.3 inside
the lifting steps to form the LVCWT. Fig. 4.7 shows the forward LVCWT. Let μ_{i→j} denote the geometric mapping from image I_i to image I_j. In the first lifting step of the LVCWT, the pixel (x, y) in I_{2k+1} is predicted from the pixels μ_{2k+1→2k}(x, y) and μ_{2k+1→2k+2}(x, y) in I_{2k} and I_{2k+2}, respectively. The prediction residual is then given by
O_{2k+1}(x, y) = I_{2k+1}(x, y) - (1/2)[I_{2k}(μ_{2k+1→2k}(x, y)) + I_{2k+2}(μ_{2k+1→2k+2}(x, y))]
Note that μ_{2k+1→2k}(x, y) and μ_{2k+1→2k+2}(x, y) will typically yield non-integer locations. Hence, linear interpolation is used to compute the pixel value. In the second lifting step, the pixel (x, y) in I_{2k} is updated with the pixels μ_{2k→2k-1}(x, y) and μ_{2k→2k+1}(x, y) in O_{2k-1} and O_{2k+1}, respectively. The resulting low pass coefficient is given by
O_{2k}(x, y) = I_{2k}(x, y) + (1/4)[O_{2k-1}(μ_{2k→2k-1}(x, y)) + O_{2k+1}(μ_{2k→2k+1}(x, y))]
The prediction residuals are finally scaled by half to form the high pass coefficients.
In the inverse LVCWT, high pass coefficients are first scaled by 2. The lifting steps
are then applied in reverse order with signs of the prediction and update values
reversed. Though we have described the LVCWT using the 5/3 wavelet transform, it is applicable to any transform implemented in lifting fashion.
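A sketch of the first (prediction) lifting step with view compensation is given below; it is our own illustration, using bilinear interpolation from scipy. The mapping arrays are assumed precomputed from the depth maps as in Section 4.3 and stored as (2, L, L) arrays of row and column coordinates. The update step is analogous, with the O frames, a weight of 1/4, and a plus sign; the inverse transform simply reverses the signs.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def vc_predict(I_odd, I_left, I_right, mu_left, mu_right):
        # Prediction step of LVCWT for one odd-indexed frame.
        # mu_left[0] / mu_left[1] hold, for every pixel of I_odd, the
        # (generally non-integer) row / column location of that pixel
        # in the left even-indexed frame; order=1 selects bilinear
        # interpolation at those non-integer locations.
        pred_l = map_coordinates(I_left,  mu_left,  order=1, mode='nearest')
        pred_r = map_coordinates(I_right, mu_right, order=1, mode='nearest')
        return I_odd - 0.5 * (pred_l + pred_r)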
4.4.2 Compression of depth maps
Depth maps are required at the encoder and decoder to determine the geometric mapping used in LVCWT. Hence, efficient compression of the depth maps is necessary for transmitting them as side information.

Figure 4.7: Forward lifting-based view compensated wavelet transform.

Before we consider the compression of
depth maps, depth values for zero intensity pixels need to be determined. Recall that depth values for such pixels were taken to be zero in Section 4.4 (see Fig. 4.5). However, experimental results indicate that copying the depth value of a nearby 'non-zero pixel' gives better compression performance. Specifically, when I(x, y) = 0, the depth value at (x, y) is set to be the same as that of the nearest 'non-zero pixel' in row x (if any exist). The process is then repeated along the columns. Fig. 4.8 shows the depth map obtained by processing the depth map in Fig. 4.5 using the aforementioned procedure. Since the red, green and blue components of the depth map differ little, we transmit only the average of the three component depth maps. The performance loss incurred due to this is small (0.04 to 0.11 dB).
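A sketch of this fill procedure is given below; it is our own illustration (a brute-force nearest search per line, quadratic in the line length but adequate for exposition).

    import numpy as np

    def fill_line(depth, hole):
        # Fill hole positions with the depth of the nearest non-hole pixel.
        keep = np.where(~hole)[0]
        pos = np.arange(len(depth))
        nearest = keep[np.argmin(np.abs(pos[:, None] - keep[None, :]), axis=1)]
        out = depth.copy()
        out[hole] = depth[nearest[hole]]
        return out

    def fill_depth(C, I):
        # Row pass first; a column pass then covers rows that had
        # no non-zero pixel at all.
        C = C.copy()
        hole = (I == 0)
        for r in range(C.shape[0]):
            if (~hole[r]).any():
                C[r] = fill_line(C[r], hole[r])
                hole[r] = False
        for c in range(C.shape[1]):
            if (~hole[:, c]).any():
                C[:, c] = fill_line(C[:, c], hole[:, c])
        return C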
Figure 4.8: Processed depth map (red, green, and blue components).

Each processed average depth map is compressed independently using JPEG2000. At the decompressor, the side information is decoded to obtain the reconstructed depth maps that are used by the inverse LVCWT. As shown in Fig. 4.6, the forward LVCWT also uses the reconstructed version of the depth maps. A fraction f of the target bit rate is used for compressing the depth maps. The trade-off parameter f was set to 0.06 based on experiments with training sequences. The optimum f was found to vary only slightly with bit rate and/or training sequence.
4.5 Results
In this section, we compare the compression performance of the proposed scheme with AVC and Motion JPEG2000. With all three methods, we do not use any color transform across the RGB components of the volume rendered images. The YCbCr transform that is generally used for decorrelating the RGB components of natural images did not yield any improvement for our test images. With AVC, every 8th image in the rendered image sequence is coded as an Intra frame (GOP length = 8). The case of GOP length = 16 is also considered. All other images in the sequence are coded as P frames. Profile 4:4:4/12 of AVC is used and the encoding parameters are tuned to achieve the best possible compression performance. In the proposed compression scheme, the sequence is subjected to 3 levels of dyadic LVCWT (the images are not divided into groups of 8 for the purpose of the temporal transform). The resulting subband images are partitioned into groups of 8. Each group is treated as an 8-component image to be compressed using JPEG2000 (Fig. 4.6). No (further) component transform is used. The advantage of this procedure is that PCRD-opt [5] (rate allocation) is performed jointly for the 8 subband images.
Table 4.1 shows the compression performance of Motion JPEG2000, AVC, and view compensated compression at different target bit-rates. Target bit-rates for AVC are shown in the first row of the table. The bit-rates achieved by AVC are shown in row 2. With our scheme, it is possible to achieve any target bit-rate very closely. Hence, to make a fair comparison, the AVC achieved bit-rate is used as the target bit-rate for view compensated compression and for Motion JPEG2000. The compression performance of the three schemes is presented for 4 different 'out of training' datasets (Tables 4.1-4.4). Dataset 1 corresponds to the rendering example of Fig. 4.3. Dataset 2 is obtained using the same parameters as dataset 1, but without gradient opacity mapping. Datasets 3 and 4 are rendered sequences from another CT volume consisting of 300 slices, each of size 512×512. Dataset 3 is obtained with gradient opacity mapping while dataset 4 is obtained without gradient opacity mapping. Each dataset comprises 4 sequences, each corresponding to a different set of view-points as listed below:
different set of view-points as listed below:
1. Constant elevation of 0◦ , azimuth varied from 0◦ to 59.5◦ , in steps of 0.5◦ .
2. Azimuth and elevation both varied from 0◦ to 59.5◦ , in steps of 0.5◦ .
3. Constant azimuth of 180◦ , elevation varied from 0◦ to 59.5◦ , in steps of 0.5◦ .
4. Azimuth and elevation varied from 45◦ to 104.5◦ , and 0◦ to 59.5◦ respectively,
both in steps of 0.5◦ .
Each sequence contains 120 frames, and the encoder is restarted at the beginning
of each sequence. The average PSNR of the datasets, defined below, is presented in
Tables 4.1-4.4.
\mathrm{Average\ PSNR} = \frac{1}{120 \times 4} \sum_{n=1}^{4} \sum_{f=1}^{120} \mathrm{PSNR}(n, f)

\mathrm{PSNR}(n, f) = 10 \times \log_{10} \frac{255 \times 255}{\mathrm{MSE}(n, f)}

where MSE(n, f) denotes the mean squared error of decompressed frame f in sequence n. As seen from the tables, Motion JPEG2000 has the lowest average PSNR
for all datasets as inter-frame correlation is not exploited. AVC performs better by
exploiting inter-frame correlation. With AVC, gains from 0.1 to 1 dB are obtained
at different bit-rates when the GOP length is increased from 8 to 16. View compensated compression performs better than AVC consistently across all datasets. The
gains range from 1.97 to 2.13 dB at low bit-rates and from 0.68 to 2.53 dB at high
bit-rates. By exploiting the underlying geometric relation between rendered images
we are able to achieve superior compression performance. Additionally, as noted in
Section 4.3, the geometric mapping has very low complexity (2 multiplications and
4 additions per pixel) when compared to estimating motion between frames.
Table 4.1: Average PSNR of volume rendered images for Dataset 1

AVC target bit rate (Kbps)      |   200 |   400 |   600 |   800 |  1000
AVC achieved bit rate (Kbps)    |   206 |   411 |   617 |   825 |  1030
Motion JPEG2000, GOP length = 1 | 23.91 | 26.00 | 27.64 | 29.02 | 30.10
AVC, GOP length = 8             | 28.28 | 31.97 | 33.83 | 35.23 | 36.33
AVC, GOP length = 16            | 29.24 | 32.34 | 34.07 | 35.33 | 36.38
View compensated compression    | 31.37 | 34.09 | 35.76 | 36.92 | 37.88
Gain over AVC, GOP length = 16  | +2.13 | +1.75 | +1.69 | +1.59 | +1.51
Table 4.2: Average PSNR of volume rendered images for Dataset 2

AVC target bit rate (Kbps)      |   200 |   400 |   600 |   800 |  1000
AVC achieved bit rate (Kbps)    |   206 |   411 |   617 |   823 |  1029
Motion JPEG2000, GOP length = 1 | 26.96 | 29.91 | 32.32 | 34.00 | 35.50
AVC, GOP length = 8             | 34.42 | 37.92 | 39.53 | 40.66 | 41.56
AVC, GOP length = 16            | 35.50 | 38.35 | 39.80 | 40.83 | 42.62
View compensated compression    | 37.46 | 40.28 | 41.93 | 43.00 | 43.30
Gain over AVC, GOP length = 16  | +1.97 | +1.93 | +2.13 | +2.17 | +0.68
The PSNR values for all frames in sequence 1 of dataset 1, at bit-rates of 200 Kbps and 800 Kbps, are plotted in Fig. 4.9 and Fig. 4.10, respectively.
Table 4.3: Average PSNR of volume rendered images for Dataset 3

AVC target bit rate (Kbps)      |  1000 |  2000 |  3000 |  4000 |  5000 |  6000 |  7000
AVC achieved bit rate (Kbps)    |  1028 |  2052 |  3074 |  4099 |  5124 |  6158 |  7198
Motion JPEG2000, GOP length = 1 | 26.61 | 28.87 | 30.31 | 31.40 | 32.29 | 33.07 | 33.76
AVC, GOP length = 8             | 28.47 | 31.83 | 33.22 | 34.20 | 34.91 | 35.48 | 36.00
AVC, GOP length = 16            | 29.39 | 32.03 | 33.43 | 34.32 | 34.96 | 35.55 | 36.07
View compensated compression    | 31.40 | 33.15 | 34.29 | 35.17 | 35.91 | 36.52 | 37.05
Gain over AVC, GOP length = 16  | +2.00 | +1.12 | +0.86 | +0.85 | +0.95 | +0.97 | +0.98
The variation of PSNR with AVC is due to the fact that more bits are allocated for I frames compared to P frames. The variation pattern observed with view compensated compression is due to the cyclostationary property of the 5/3 transform. As seen in Fig. 4.9 and Fig. 4.10, the variation of PSNR is considerably less for view compensated compression than for AVC. To quantify the variation in quality, the standard deviation of PSNR is computed as

\sigma = \frac{1}{4} \sum_{n=1}^{4} \sqrt{\frac{1}{120} \sum_{f=1}^{120} \left( \mathrm{PSNR}(n, f) - \mathrm{MP}(n, f) \right)^2}
where

\mathrm{MP}(n, f) = \frac{1}{10} \sum_{i = 10 \times \lfloor (f-1)/10 \rfloor + 1}^{10 \times \lfloor (f-1)/10 \rfloor + 10} \mathrm{PSNR}(n, i)
MP(n, f) is the mean PSNR for the 10 consecutive frames of sequence n that contain frame f. MP(n, f) is calculated for every 10 frames, as opposed to taking the average of all 120 frames, to discount the variation in PSNR due to variation in the compressibility of frames, which can be high for long sequences.
Table 4.4: Average PSNR of volume rendered images for Dataset 4

AVC target bit rate (Kbps)      |  1000 |  2000 |  3000 |  4000 |  5000 |  6000 |  7000
AVC achieved bit rate (Kbps)    |  1028 |  2052 |  3074 |  4099 |  5124 |  6158 |  7198
Motion JPEG2000, GOP length = 1 | 25.93 | 28.08 | 29.43 | 30.48 | 31.32 | 32.04 | 32.67
AVC, GOP length = 8             | 29.10 | 32.13 | 33.35 | 34.20 | 34.86 | 35.40 | 35.90
AVC, GOP length = 16            | 30.14 | 32.46 | 33.55 | 34.31 | 34.93 | 35.38 | 35.84
View compensated compression    | 32.35 | 34.44 | 35.65 | 36.54 | 37.25 | 37.87 | 38.37
Gain over AVC, GOP length = 16  | +2.21 | +1.98 | +2.10 | +2.23 | +2.32 | +2.49 | +2.53
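A compact numpy sketch of the PSNR statistics defined above (our own illustration; the MSE array is assumed given, one row per sequence) is shown below.

    import numpy as np

    def psnr_stats(mse):
        # mse: shape (4, 120), the MSE of decompressed frame f in sequence n.
        psnr = 10.0 * np.log10(255.0 * 255.0 / mse)
        avg = psnr.mean()                        # the average PSNR defined earlier
        blocks = psnr.reshape(4, 12, 10)         # 120 frames -> 12 blocks of 10
        mp = blocks.mean(axis=2, keepdims=True)  # MP(n, f), constant per block
        dev2 = (blocks - mp) ** 2
        sigma = np.sqrt(dev2.mean(axis=(1, 2))).mean()   # the sigma defined above
        return avg, sigma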
Table 4.5 shows the standard deviation of PSNR for dataset 1. As noted before, a gain in average PSNR is obtained for AVC when the GOP length is increased from 8 to 16. However, this is generally accompanied by a marked increase in the standard deviation. View compensated compression shows a decrease in standard deviation of 56% to 75% when compared to AVC at different bit-rates. Similar decreases were observed for the other datasets.
[Plot: PSNR (dB) versus frame number for view compensated compression, AVC with GOP length = 8, and AVC with GOP length = 16.]

Figure 4.9: PSNR of frames in sequence 1 of dataset 1 at 200 Kbps.
4.6 Conclusion

In this chapter, we presented a new scheme for determining the underlying geometric relationship between volume rendered images. The geometric mapping so obtained is incorporated into the lifting steps of the 5/3 wavelet transform, which is applied along the temporal dimension. The proposed method was tested on 4 different datasets, and the results indicate superior compression performance when compared to AVC, the state-of-the-art video compression standard. Additionally, the scheme obviates motion estimation between rendered images, enabling a significant reduction in the complexity of the encoder.
[Plot: PSNR (dB) versus frame number for view compensated compression, AVC with GOP length = 8, and AVC with GOP length = 16.]

Figure 4.10: PSNR of frames in sequence 1 of dataset 1 at 800 Kbps.
Table 4.5: Standard deviation of PSNR for Dataset 1

AVC target bit rate (Kbps)                |   200 |   400 |   600 |   800 |  1000
AVC achieved bit rate (Kbps)              |   206 |   411 |   617 |   825 |  1030
Motion JPEG2000, GOP length = 1           |  0.10 |  0.07 |  0.06 |  0.06 |  0.06
AVC, GOP length = 8                       |  1.53 |  0.86 |  1.11 |  1.38 |  1.72
AVC, GOP length = 16                      |  1.47 |  1.24 |  1.63 |  1.99 |  2.26
View compensated compression              |  0.36 |  0.55 |  0.68 |  0.77 |  0.86
Percentage gain over AVC, GOP length = 16 | 75.36 | 55.59 | 58.69 | 61.63 | 62.09
CHAPTER 5
SCALABLE LOW COMPLEXITY CODER
5.1 Introduction
Applications that require real-time transmission of images often call for a low complexity compressor/decompressor. Additionally, scalability and compression efficiency can also be desirable properties in such applications. JPEG2000 [4, 5] offers superior compression performance and highly scalable codestreams. The drawback of JPEG2000 is its high complexity, which may cause its encoding/decoding speed to be a bottleneck in some real-time applications. Hence, a 'JPEG2000-like' coder is proposed that sacrifices some compression efficiency to give a substantial increase in the throughput of the encoder and decoder.

The chapter is organized as follows. In Section 5.2, we present the Scalable Low Complexity Coder (SLCC). The use of SLCC in remote volume visualization and airborne video transmission is described in Section 5.3 and Section 5.4, respectively.
5.2 Scalable low complexity coder
Fig. 5.1 gives the encoding architecture of SLCC. The input image samples are subjected to a 2-D DWT. Each subband is subdivided into codeblocks, which are compressed independently by a block coder. This procedure enables resolution and spatial scalability features in SLCC. In addition, a limited amount of quality scalability is obtained by dividing each codeblock into two parts (layers) as described below.
[Block diagram: image samples → 2-D dyadic discrete wavelet transform → division of subbands into codeblocks → stack coding of each codeblock by the block coder → codestream with 2 quality layers.]

Figure 5.1: Schematic of the scalable low complexity coder.
Fig. 5.2 illustrates the data structure of codeblocks in different subbands. For codeblocks in the LL2 subband, all bit-planes above the 4th bit-plane are stacked and sent as the first quality layer. The other subbands have fewer bit-planes included in the first layer due to their lesser importance. The importance of each subband is measured from an MSE point of view, by taking into account the synthesis filter energy weights associated with the inverse wavelet transform. These energy weights are rounded to the nearest power of two and used to adjust the stack lengths. In the example of Fig. 5.2, codeblocks belonging to the HL2 and LH2 subbands will have one less bit-plane in the first layer while the HH2 and level-1 subbands have two less. All-zero bit-planes in a codeblock are termed missing MSBs and are indicated in the header information. The stack of bit-planes (discounting missing MSBs) from each codeblock contributing to a layer is coded in a single pass. The coding scheme employed depends on the stack length of the codeblock and is shown in Fig. 5.3a. Entropy coding is restricted to the first three (or fewer) bit-planes in the first layer. When there is one bit-plane, the position indices of 'ones' in that bit-plane are coded. Run-value and Quad-Comma coding are used for stack lengths of 2 and 3, respectively. These two techniques are described below. For codeblocks with more than 3 bit-planes, Quad-Comma coding is used for the three MSBs and raw bits are coded for the remaining bit-planes.
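One possible reading of the weight-based stack-length adjustment described above is sketched below; it is our own illustration, and the relative weights shown are the illustrative values implied by the Fig. 5.2 example (LL2 as reference, HL2/LH2 one plane fewer, HH2 and the level-1 subbands two fewer), not measured synthesis energies.

    import numpy as np

    def first_layer_planes(msb_plane, rel_weights, ll_cutoff=4):
        # Number of bit-planes per subband in the first quality layer:
        # planes above the cutoff, with the cutoff raised as the weight
        # (already rounded to a power of two, relative to LL) shrinks.
        return {band: max(msb_plane - ll_cutoff + int(round(np.log2(w))), 0)
                for band, w in rel_weights.items()}

    # Example matching Fig. 5.2 with 11 bit-planes:
    planes = first_layer_planes(11, {'LL2': 1.0, 'HL2': 0.5, 'LH2': 0.5,
                                     'HH2': 0.25, 'HL1': 0.25,
                                     'LH1': 0.25, 'HH1': 0.25})
    # -> LL2: 7 planes, HL2/LH2: 6, HH2 and level-1 subbands: 5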
[Diagram: for codeblocks in each subband (LL2, HL2, LH2, HH2, HL1, LH1, HH1), bit-planes 1-11 are divided into missing MSBs, the first (initial) quality layer, and the second (final) quality layer.]

Figure 5.2: Wavelet coefficient structure.
Run-value coding is chosen here on the assumption of a sparse distribution of non-zero values. The run length of zeros and the value of the significant coefficient terminating the run are coded. Additional gain is obtained by coding the values of three consecutive coefficients after the zero run. This is because neighbors of a significant coefficient have a higher probability of being significant. With Quad-Comma coding (Fig. 5.3b), one bit is first spent to indicate the significance of a quad. Based on statistical experiments, coefficient values in a significant quad were found to follow a distribution that is close to geometric with parameter ρ ≤ 0.5. Comma codes [37] are optimum for such geometric distributions and hence are used to code each of the four coefficients in a significant quad. For both of the coding schemes described above, the sign bit is appended to the value of the significant coefficient.
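A sketch of Quad-Comma coding for a single quad is given below; it is our own illustration, assuming the simple binary comma code in which a magnitude v is sent as v ones followed by a terminating zero (such a code is optimal for a geometric distribution with ρ ≤ 1/2 [37]).

    def encode_quad(magnitudes, signs):
        # magnitudes: four coefficient magnitudes; signs: 0/1 sign bits.
        bits = []
        if not any(magnitudes):
            bits.append(0)              # insignificant quad costs one bit
            return bits
        bits.append(1)                  # significance bit for the quad
        for v, s in zip(magnitudes, signs):
            bits.extend([1] * v)        # comma code of the magnitude
            bits.append(0)              # the terminating 'comma'
            if v > 0:
                bits.append(s)          # sign bit appended to the value
        return bits

    # Example: encode_quad([0, 2, 1, 0], [0, 1, 0, 0])
    # -> [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]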
(a) Coding scheme decisions:

Stack length of a codeblock in the first layer | Method
= 1 | Position index of significant coefficients
= 2 | Run-value coding
= 3 | Quad-Comma codes
> 3 | Quad-Comma codes for 3 MSBs and raw bits for remaining bit-planes

(b) [Diagram: quad coding with comma codes, each quad grouping 4 coefficients.]

Figure 5.3: Low complexity entropy coding schemes.
5.3 Remote volume rendering
Fig. 5.4 shows a client-server communication setup for remote volume rendering. All the steps in the volume rendering pipeline are executed at the server using dedicated hardware. Based on view-point requests from a client, the server transmits the sequence of 2D rendered images interactively. With limited bandwidth, an efficient compression scheme is vital for transmitting high quality rendered images. In Chapter 4, an efficient view-compensated compression scheme was presented. In addition to efficiency, the complexity of the decompressor should be low so that the transmitted data can be decompressed by a low-end client at a desired frame rate (25-30 frames per second). Hence, SLCC can be useful for remote visualization when low cost client stations do not have specialized hardware or high computational power.
With the two-layer scheme of SLCC, the server may transmit only the first layer during an interactive session. Once the interaction stops, the second layer can be sent to give a lossless representation of the image at that particular view-point. The number of bit-planes in the first layer can be adjusted based on a desired bit rate, which may be computed from the available bandwidth and desired frame rate. The compression and throughput performance of SLCC is presented below.
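The bit-rate computation mentioned above is simple arithmetic; the sketch below uses hypothetical numbers.

    def first_layer_budget(bandwidth_bps, fps, width, height):
        # Bits available per frame and the corresponding bits/pixel,
        # from which the number of first-layer bit-planes can be chosen.
        bits_per_frame = bandwidth_bps / fps
        return bits_per_frame, bits_per_frame / (width * height)

    # Example: a 20 Mbps link at 30 fps for 600x600 rendered frames
    bits, bpp = first_layer_budget(20e6, 30, 600, 600)  # ~667 kbits, ~1.85 bpp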
[Block diagram: the PACS server hosts a hardware accelerated rendering engine; the rendered image for the desired view-point passes through the wavelet transform and block coding, and the compressed data is sent to the client, which decompresses it; the client sends view-point requests back to the server.]

Figure 5.4: Client-Server communication.
5.3.1 Results
Volume rendering typically produces RGB color images. The RGB color components are compressed without the use of a decorrelating transform; the YCbCr transform that is generally used for natural RGB images did not yield any improvement for our test images. Experimental results are presented for 2 'out of training' test sequences. These 2 sequences were obtained using different rendering parameters. Each sequence has 30 frames and the frame size is 600×600. Fig. 5.9 shows the 1st frame of test sequence 1. The throughput performance of SLCC is compared with Kakadu V5.0, an efficient JPEG2000 software implementation [38]. Timing experiments were carried out on a PC with an Intel Core2 Duo T5470 1.6 GHz processor and 2 GB RAM.
The end-to-end decompression time consists of reading the compressed data from memory (RAM), block decoding, the inverse 2-D DWT, and writing the decompressed image to the display. The end-to-end decompression time of SLCC is compared to that of Kakadu V5.0 in Fig. 5.5 and Fig. 5.6 for test sequence 1 and test sequence 2, respectively. The fastest mode of JPEG2000, referred to as the 'bypass' mode, is used [5]. As seen from the figures, a 2 to 4 times speed-up is obtained with SLCC at moderate to high rates.
[Plot: end-to-end decompression time (ms) versus rate (bits/pixel), up to lossless, for Kakadu V5.0 and SLCC.]

Figure 5.5: Comparison of end-to-end decompression time for test sequence 1.
Table 5.1 and Table 5.2 show the compression performance of SLCC and Kakadu V5.0 for test sequence 1 and test sequence 2, respectively. As noted from the tables, SLCC has a 1.1 to 3.1 dB PSNR loss for sequence 1 and a 1.1 to 2.3 dB PSNR loss for sequence 2, compared to Kakadu V5.0.
[Plot: end-to-end decompression time (ms) versus rate (bits/pixel), up to lossless, for Kakadu V5.0 and SLCC.]

Figure 5.6: Comparison of end-to-end decompression time for test sequence 2.
Table 5.1: Compression performance (PSNR in dB) for test sequence 1.

Rate (bits/pixel) |  0.19 |  0.39 |  0.87 |  1.75 |  3.61
JPEG2000          | 30.46 | 31.50 | 33.49 | 36.54 | 41.69
SLCC              | 29.38 | 30.52 | 32.01 | 34.58 | 38.59
Fig. 5.8 compares the quality of decompressed images at the client under a decoding time constraint corresponding to a frame rate of 30 fps. At this frame rate, the decompressor has to work within a time constraint of 33 ms/frame. In the case of Kakadu V5.0, the decompression time reaches this limit at 0.76 bpp (7.8 Mbps). Thus the image quality with Kakadu V5.0 will be limited to 33.03 dB (the PSNR at 0.76 bpp) at all bandwidths greater than 7.8 Mbps. If the available bandwidth exceeds approximately 15 Mbps, SLCC can use the available data to yield a higher PSNR than Kakadu V5.0. Fig. 5.10 and Fig. 5.11 show the decompressed images from Kakadu V5.0 and SLCC, respectively, when the available bandwidth is 70 Mbps. As seen from the figures, SLCC can display very high quality images at this bandwidth.
Table 5.2: Compression performance (PSNR in dB) for test sequence 2.

Rate (bits/pixel) |  0.17 |  0.29 |  0.70 |  1.48 |  2.97
JPEG2000          | 30.72 | 31.54 | 33.29 | 36.21 | 40.43
SLCC              | 29.63 | 30.71 | 32.12 | 34.39 | 38.14
[Plot: average PSNR of decompressed frames (dB) versus bandwidth (Mbps) for Kakadu V5.0 and SLCC.]

Figure 5.7: Average PSNR of decompressed images at the client for test sequence 1.
5.4 Airborne video transmission
With technological advances in image acquisition systems, the use of high speed, high resolution video cameras is common in many telemetering applications. Such cameras [39, 40] have been developed for airborne applications including reconnaissance, earth survey [41], and radar and sonar systems [42]. These specialized video cameras can record images at 200-400 frames per second (fps) and acquire dual band imagery (visible and IR). Features required in an image compression algorithm for airborne reconnaissance are discussed in [43].
[Plot: average PSNR of decompressed frames (dB) versus bandwidth (Mbps) for Kakadu V5.0 and SLCC.]

Figure 5.8: Average PSNR of decompressed images at the client for test sequence 2.
For real-time transmission, the image encoder needs to keep up with the image acquisition hardware. Additionally, the transmission of compressed images is often through an error-prone wireless channel. Thus a fast, efficient, and error resilient image compression scheme is vital to realize the full potential of airborne reconnaissance [44].

The computational complexity of JPEG2000 can make software implementations impractical for some real-time airborne reconnaissance applications. Other compression standards such as MPEG-4 and H.264 may not be practical in these applications either: while the decoding complexities of these standards are lower, their encoding complexities are much higher than that of JPEG2000 [45]. In this section, we describe the use of SLCC for airborne reconnaissance.
Figure 5.9: Frame 1 of test sequence 1.
In airborne video transmission, the scalability features of SLCC can be used to achieve ample bandwidth savings and added functionality. With resolution scalability, only the part of the compressed data corresponding to the resolution required at the ground station needs to be transmitted in real-time. Similarly, quality scalability can be very beneficial as well. While a small portion of the compressed codestream can be transmitted to the ground station for real-time analysis, the entire compressed codestream, yielding a much higher quality (perhaps even lossless), can be stored onboard for further processing at a later time. Spatial scalability may allow real-time transmission of the data corresponding to a desired ROI to the ground station, while the data for the rest of the scene is saved on board for later processing. Alternatively, spatial scalability can be used in conjunction with quality scalability to separately adjust the quality of the ROI and the background. This can ensure high quality reconstruction of the ROI when a sufficient bit budget is not available to provide high quality throughout the entire image.
Figure 5.10: Compressed/Decompressed image – JPEG2000.
This is illustrated in Fig. 5.12, where the ROI is reconstructed at a higher quality level than the background. This feature, when used with an object tracking mechanism, can be very useful in surveillance applications as shown in [46].

5.4.1 Results

In this section, the performance of SLCC is compared to that of Kakadu V5.0 for a 720×576 grayscale aerial video sequence with 100 frames. All the timing experiments were carried out on a PC with a 2.8 GHz P4 processor and 512 MB RAM. Fig. 5.13 compares the end-to-end encoding times of JPEG2000 and SLCC at different bit-rates (bits/pixel). The end-to-end encoding time comprises reading the input image from memory, the 2D DWT, block encoding, and writing the compressed data to memory.
Figure 5.11: Compressed/Decompressed image – SLCC.
As seen in the figure, SLCC is 3 to 5 times faster than Kakadu V5.0.
Fig. 5.14 shows the compression performance of SLCC and Kakadu V5.0 averaged over 100 frames of the above aerial video sequence. Peak Signal to Noise Ratio (PSNR) is used as the quality metric. As seen in the figure, SLCC incurs a 0.6 to 1 dB loss at low to moderate bit rates compared to Kakadu V5.0. Alternatively, for a given image quality, SLCC produces a 15-20% larger compressed codestream when compared to Kakadu V5.0. However, due to its significantly reduced complexity, SLCC can deliver a much higher frame rate for a desired quality level. This can be seen in Fig. 5.15, where the achievable frame rate (the reciprocal of the end-to-end encoding time) is plotted against PSNR. For example, at a quality level of 30 dB PSNR,
Figure 5.12: Codeblocks from different subbands (left) that correspond to a region
of interest in the image (right).
SLCC can deliver images at 98 fps while Kakadu V5.0 can only deliver 30 fps. The corresponding required transmission rates are 7.75 Mbps and 2.09 Mbps for SLCC and Kakadu V5.0, respectively. At a PSNR of 45 dB, SLCC can run at 70 fps whereas Kakadu V5.0 can only run at 15 fps. The corresponding required transmission rates are 60.91 Mbps and 11.57 Mbps for SLCC and Kakadu V5.0, respectively. Note that SLCC and Kakadu V5.0 have roughly symmetric encoder/decoder complexity; that is, the complexities of the encoder and the decoder are roughly equal. Thus, the above results are representative of decoder performance as well.
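These transmission rates follow directly from the rate-distortion curve and the frame rate; the sketch below (our own arithmetic) converts bits/pixel to channel rate and back-solves the rate implied by the quoted figures.

    def transmission_rate_mbps(bpp, fps, width=720, height=576):
        # Channel rate needed to send every frame of the 720x576
        # grayscale sequence in real time at the given bit-rate.
        return bpp * width * height * fps / 1e6

    # Back-solving: 7.75 Mbps at 98 fps implies about 0.19 bits/pixel.
    implied_bpp = 7.75e6 / (98 * 720 * 576)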
[Plot: end-to-end encoding time (ms) versus rate (bits/pixel), up to lossless, for Kakadu V5.0 and SLCC.]

Figure 5.13: Comparison of end-to-end encoding times for SLCC and JPEG2000.
[Plot: PSNR (dB) versus rate (bits/pixel), up to lossless, for SLCC and Kakadu V5.0.]

Figure 5.14: Comparison of rate vs. quality for SLCC and JPEG2000.
[Plot: achievable frame rate (fps) versus PSNR (dB), up to lossless, for JPEG2000 and SLCC.]

Figure 5.15: Achievable frame rate at different quality levels for SLCC and JPEG2000.
CHAPTER 6
CONCLUSION
In this dissertation, we presented several novel techniques for efficient compression of multi-dimensional images. In Chapter 3, a JPEG2000 Part 2 compliant N-D compression scheme was presented for any N ≥ 3. The data are treated as multiple 'xy' slices. Three different transform coding tools can then be used to decorrelate the data in the remaining N − 2 dimensions. The resulting code-stream components are compressed using a 2-D DWT and block coding. Experimental results indicate that the proposed scheme can give considerable compression gain when a 5-D dataset is decorrelated in more than 3 dimensions. Scalability is supported along all N dimensions. In Chapter 4, a new view compensation scheme was presented that utilizes the geometric relationship between view-points to exploit the correlation between successive volume rendered images. The proposed method performs better than H.264/AVC (Advanced Video Coding), the state-of-the-art video compression standard. Additionally, our scheme obviates motion estimation between rendered images, enabling a significant reduction in the complexity of the compressor compared to H.264/AVC. Applications that require real-time (25-30 frames/s) transmission of images often call for a low complexity compressor or decompressor. Additionally, scalability and compression efficiency can be very desirable properties in such applications. In Chapter 5, we presented a new scalable low complexity image coder that has good compression efficiency and high throughput. We illustrated its use for two applications: remote volume visualization and airborne video transmission.
REFERENCES
[1] Z. H. Cho, J. P. Jones, and M. Singh, Foundations of Medical Imaging. Wiley-Interscience, 1993.
[2] J. G. Lou, H. Cai, and J. Li, “A real-time interactive multi-view video system,”
13th Annual ACM International Conference on Multimedia, pp. 161–170, 2005.
[3] M. Kitahara, H. Kimata, S. Shimizu, K. Kamikura, Y. Yashima, K. Yamamoto,
K. Yendo, T. Fujii, and M. Tanimoto, “Multi-view video coding using view
interpolation and reference picture selection,” IEEE Int. Conference on Multimedia and Expo, pp. 97–100, Jul 2006.
[4] Information technology - JPEG2000 image coding system - Part 1: Core coding
system, ISO/IEC JTC1/SC29 WG1, 15444-1:2000 Std., Jul 2002.
[5] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Boston: Kluwer Academic Publishers,
2002.
[6] K. M. Siddiqui, E. L. Siegel, B. I. Reiner, O. Crave, J. P. Johnson, Z. Wu, J. C.
Dagher, A. Bilgin, M. W. Marcellin, and M. Nadar, “Improved compressibility
of multi-slice CT datasets using 3D JPEG2000 compression,” in Computer
Assisted Radiology and Surgery, Jun 2004, pp. 28–31.
[7] J. H. Kasner, A. Bilgin, M. W. Marcellin, A. Lan, B. V. Brower, S. S. Shen, and T. Wilkinson, “JPEG2000 compression using 3D wavelets and KLT with application to HYDICE data,” in Applications of Digital Image Processing XXV, Proc. SPIE, vol. 4132, Jul 2000, pp. 157–166.
[8] D. A. Huffman, “A method for the construction of minimum redundancy codes,” in Proc. IRE, vol. 40, 1952, pp. 1098–1101.
[9] J. Rissanen, “Generalized kraft inequality and arithmetic coding,” in IBM J.
Res. Develop., vol. 20, 1976, pp. 198–203.
[10] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,” in Communications of the ACM, vol. 30, 1987, pp. 520–540.
[11] Information technology - JPEG2000 image coding system - Part 2: Extensions,
ISO/IEC JTC1/SC29 WG1, 15444-2:2000 Std., Jul 2002.
[12] L. Zeng, C. P. Jansen, S. Marsch, M. Unser, and P. R. Hunziker, “Four-dimensional wavelet compression of arbitrarily sized echocardiographic data,” IEEE Transactions on Medical Imaging, vol. 21, pp. 1179–1187, Sept 2002.
[13] H. G. Lalgudi, A. Bilgin, M. W. Marcellin, and M. S. Nadar, “Compression
of fMRI and ultrasound images using 4D SPIHT,” IEEE Int. Conference on
Image Processing, vol. 2, pp. 746–9, Sept 2005.
[14] Y. Liu and W. A. Pearlman, “Four-dimensional wavelet compression of 4-D
medical images using scalable 4-D SBHP,” Data Compression Conference, pp.
233–242, March 2007.
[15] D. S. Taubman, “High performance scalable image compression with EBCOT,” IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158–1170, Jul 2000.
[16] B. Penna, T. Tillo, E. Magli, and G. Olmo, “Progressive 3-D coding of hyperspectral images based on JPEG2000,” IEEE Geoscience and Remote Sensing Letters, vol. 2, no. 1, pp. 125–129, January 2006.
[17] H. G. Lalgudi, A. Bilgin, M. W. Marcellin, A. Tabesh, M. S. Nadar, and T. P.
Trouard, “Four-dimensional compression of fMRI using JPEG2000,” in Proc.
SPIE International Symposium on Medical Imaging, Feb 2005.
[18] http://www.terarecon.com/.
[19] S. Stegmaier, M. Magallón, and T. Ertl, “Visualization techniques I: A generic
solution for hardware-accelerated remote visualization,” Proceedings of the
Symposium on Data Visualisation, pp. 87–94, May 2002.
[20] S. Stegmaier, J. Diepstraten, M. Weiler, and T. Ertl, “Widening the remote
visualization bottleneck,” Proceedings of the 3rd International Symposium on
Image and Signal Processing and Analysis, pp. 174–179, Sept 2003.
[21] M. Oberhumer, http://www.oberhumer.com/opensource/lzo/.
[22] L. Gailly and M. Adler, http://www.gzip.org/zlib.
[23] G. Campbell, T. A. DeFanti, I. Frederiksen, S. A. Joyce, and L. A. Leske, “Two
bit/pixel full color encoding,” Proceedings of the 13th Annual Conference on
Computer Graphics and Interactive Techniques, pp. 215–223, 1986.
[24] T. Chu, J. Fowler, and R. Moorhead, “Evaluation and extension of SGI
vizserver,” Visualization of Temporal and Spatial Data for Civilian and Defense Applications III, Proceedings of the SPIE 4368, Apr 2001.
[25] Information technology – JPEG 2000 image coding system – Part 3: Motion
JPEG 2000, ISO/IEC 15444-3, 2002.
[26] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the
H.264/AVC video coding standard,” IEEE Trans. on Circuits and Systems for
Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[27] I. Daubechies and W. Sweldens, “Factoring wavelet and subband transforms
into lifting steps,” Journal of Fourier Analysis and Applications, vol. 4, no. 3,
pp. 247–269, May 1998.
[28] R. Calderbank, I. Daubechies, W. Sweldens, and B. Yeo, “Wavelet transforms
that map integers to integers,” Applied and Computational Harmonic Analysis,
vol. 5, no. 3, pp. 332–369, July 1998.
[29] A. Secker and D. S. Taubman, “Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression,” IEEE Trans.
Image Processing, vol. 12, no. 12, pp. 1530–1542, Dec 2003.
[30] M. Levoy, “Efficient ray tracing of volume data,” ACM Transactions on Graphics, vol. 9, no. 3, July 1990.
[31] B. Lichtenbelt, R. Crane, and S. Naqvi, Introduction to volume rendering.
Prentice Hall, 1998.
[32] M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1996.
[33] C. Chang, X. Zhu, P. Ramanathan, and B. Girod, “Light field compression
using disparity-compensated lifting and shape adaptation,” IEEE Trans. Image
Processing, vol. 15, no. 4, pp. 793–806, Apr 2006.
[34] D. Hearn and M. P. Baker, Computer Graphics with OpenGL. Prentice Hall, 2003.
[35] L. Zhanping and R. Moorhead, “A texture-based hardware-independent technique for time-varying volume flow visualization,” Journal of Visualization,
vol. 8, no. 3, pp. 235–244, 2005.
[36] http://www.vtk.org/.
[37] R. G. Gallager and D. C. V. Voorhis, “Optimal source codes for geometrically
distributed integer alphabets,” IEEE Trans. on Information Theory, vol. 21,
no. 2, pp. 228–230, Mar 1975.
[38] D. Taubman, http://www.kakadusoftware.com/.
[39] D. A. Dobson, S. Agwani, W. D. Washkurak, and S. G. Chamberlain, “A high-speed, high resolution TDI image sensor for use in airborne reconnaissance applications,” in Airborne Reconnaissance XVIII, Proc. SPIE, vol. 2272, Jul 1994, pp. 221–229.
[40] B. Mathews, “An ultra high resolution, electro-optical framing camera for reconnaissance and other applications using a 9216 by 9216 pixel, wafer scale,
focal plane array,” in Airborne Reconnaissance XXII, Proc. SPIE, vol. 3431,
Jul 1998, pp. 144–154.
[41] B. Uhl, “Aerial data acquisition system for earth survey,” in Airborne Reconnaissance XIV, Proc. SPIE, vol. 1342, Jul 1990, pp. 51–60.
[42] A. Mason and S. Gills, “Real-time high resolution digital video for range and
training applications,” in Proceedings of International Telemetering Conference, Oct 2001, pp. 646–655.
[43] D. C. L. V. Berg and M. R. Kruer, “Image compression for airborne reconnaissance,” in Airborne Reconnaissance XXII, Proc. SPIE, vol. 3431, Jul 1998, pp.
2–13.
[44] D. R. Schmitt, H. Dorgeloh, J. Fries, H. Keil, W. Wetjen, and S. Kleindienst, “Airborne network system for the transmission of reconnaissance image data,” in Airborne Reconnaissance XXIV, Proc. SPIE, vol. 4127, Jul 2000, pp. 97–100.
[45] F. Dufaux and T. Ebrahimi, “Error-resilient video coding performance analysis
of motion JPEG2000 and MPEG-4,” in Visual Communications and Image
Processing, Proc. SPIE, vol. 5308, 2004, pp. 596–607.
[46] H. S. Kong, A. Vetro, T. Hata, and N. Kuwahara, “ROI-based SNR scalable JPEG2000 image transcoding,” in Visual Communications and Image
Processing, Proc. SPIE, vol. 5690, Jan 2005, pp. 5Q1–5Q10.