INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.
The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.
Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.
Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.
UMI
University Microfilms International
A Bell & Howell Information Company
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
313/761-4700 800/521-0600
Order Number 9426576
Entropy-constrained predictive trellis coded quantization and compression of hyperspectral imagery
Abousleman, Glen Patrick, Ph.D.
The University of Arizona, 1994
U·M·I
300 N. Zeeb Rd.
Ann Arbor, MI 48106
ENTROPY-CONSTRAINED PREDICTIVE TRELLIS CODED
QUANTIZATION AND COMPRESSION OF HYPERSPECTRAL
IMAGERY
by
Glen Patrick Abousleman
A Dissertation Submitted to the Faculty of the
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
WITH A MAJOR IN ELECTRICAL ENGINEERING
In the Graduate College
THE UNIVERSITY OF ARIZONA
1994
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE
As members of the Final Examination Committee, we certify that we have read the dissertation prepared by Glen Patrick Abousleman entitled Entropy-Constrained Predictive Trellis Coded Quantization and Compression of Hyperspectral Imagery and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

Robert Schowengerdt                                        Date

                                                           Date

Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

B. R. Hunt                                                 Date
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of the requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under the rules of the Library.
Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when, in his or her judgment, the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.
SIGNED:
ACKNOWLEDGEMENTS
I would like to thank my advisors Dr. Michael W. Marcellin and Dr. Bobby R.
Hunt for their support, persistence, guidance, and willingness to help at a moment's notice. A special thanks goes to my fellow graduate students and lab colleagues
Ralph, Jim, Phil, Scott, Patrick, Dave, and Mike.
On a personal level, I wish to thank my parents for their continued support, confidence, and encouragement.
TABLE OF CONTENTS
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 ENTROPY-CONSTRAINED PREDICTIVE TCQ . . . . . . . . . . . . . 20
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Trellis Coded Quantization . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Entropy-Constrained Trellis Coded Quantization . . . . . . . . . . . . 34
2.4 Entropy-Constrained Predictive Trellis Coded Quantization . . . . . . 37
2.4.1 Performance of ECPTCQ . . . . . . . . . . . . . . . . . . . . . . . 40
3 HYPERSPECTRAL IMAGE CODER USING ECPTCQ . . . . . . . . . 43
3.1 System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Codebook Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Side Information and Rate Allocation . . . . . . . . . . . . . . . . . . 46
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4 3-D DCT HYPERSPECTRAL CODER . . . . . . . . . . . . . . . . . . 56
4.1 System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Codebook Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Side Information and Rate Allocation . . . . . . . . . . . . . . . . . . 60
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5 HYBRID DPCM/DCT HYPERSPECTRAL CODER . . . . . . . . . . . 64
5.1 System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Codebook Design, Rate Allocation, and Side Information . . . . . . . . 67
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 COMPARISONS OF HYPERSPECTRAL CODERS . . . . . . . . . . . 72
7 SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Appendix A. PHOTOGRAPHS . . . . . . . . . . . . . . . . . . . . . . . 81
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
LIST OF FIGURES
2.1 Input-output characteristics of a uniform scalar quantizer. . . . . . . . 23
2.2 A 4-state trellis with subset labeling and codebook. . . . . . . . . . . . 30
2.3 Granular gain for various trellis sizes (log_2 N). . . . . . . . . . . . . 31
2.4 Performance of 4-state and 256-state TCQ for encoding the memoryless Gaussian source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5 Performance of 4-state and 256-state TCQ for encoding the memoryless Laplacian source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Uniform TCQ codebook. . . . . . . . . . . . . . . . . . . . . . . . . 35
2.7 Performance of an 8-state entropy-coded uniform TCQ for encoding the memoryless Gaussian source. . . . . . . . . . . . . . . . . . . . . . . 36
2.8 Performance of an 8-state entropy-constrained TCQ for encoding the memoryless Gaussian source. . . . . . . . . . . . . . . . . . . . . . . 37
2.9 SNR performance of ECPTCQ. . . . . . . . . . . . . . . . . . . . . . 42
3.1 Hyperspectral image coder. . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Overall rate versus number of spectral bands. . . . . . . . . . . . . . . 51
3.3 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b, R_I = 1.0 bpp, and R_mu = 2.0 bpp. . . . . . . . . . . . . . . . . . . 53
3.4 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b, R_I = 0.75 bpp, and R_mu = 0.75 bpp. . . . . . . . . . . . . . . . . . 54
4.1 Hyperspectral image coder using the 3-D DCT and ECTCQ. . . . . . . 56
4.2 Probability density function for generalized Gaussian distributions with alpha values of 0.5, 1.0, 1.5, 2.0, and 2.5. . . . . . . . . . . . . . . . . 58
4.3 Kurtosis vs. alpha. . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b. . 62
4.5 Overall rate versus number of spectral bands. . . . . . . . . . . . . . . 63
5.1 Hybrid DPCM/DCT hyperspectral image encoder. . . . . . . . . . . . 65
5.2 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b. . 70
5.3 Overall rate versus number of spectral bands. . . . . . . . . . . . . . . 70
6.1 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b. . 73
6.2 Overall rate versus number of spectral bands. . . . . . . . . . . . . . . 74
A.1 Band 50 of a hyperspectral sequence (256 x 256). . . . . . . . . . . . . 81
A.2 Encoded image from ECPTCQ coder with high side information (44.37 dB at 0.19 b/p/b). . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.3 Difference image from ECPTCQ coder with high side information. . . . 82
A.4 Encoded image from ECPTCQ coder with low side information (42.11 dB at 0.17 b/p/b). . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.5 Difference image from ECPTCQ coder with low side information. . . . 83
A.6 Encoded image from 3-D DCT coder (41.66 dB at 0.104 b/p/b). . . . . 84
A.7 Difference image from 3-D DCT coder. . . . . . . . . . . . . . . . . . 84
A.8 Encoded image from hybrid coder (41.01 dB at 0.116 b/p/b). . . . . . . 85
A.9 Difference image from hybrid coder. . . . . . . . . . . . . . . . . . . . 85
ABSTRACT
A training-sequence-based entropy-constrained predictive trellis coded quantization (ECPTCQ) scheme is presented for encoding autoregressive sources. For encoding a first-order Gauss-Markov source, the MSE performance of an 8-state ECPTCQ system exceeds that of entropy-constrained DPCM by up to 1.0 dB. In addition, three systems — an ECPTCQ system, a 3-D discrete cosine transform (DCT) system, and a hybrid system — are presented for compression of hyperspectral imagery, all of which utilize trellis coded quantization (TCQ). Specifically, the first system utilizes a 2-D DCT and ECPTCQ. The 2-D DCT is used to transform all nonoverlapping 8 x 8 blocks of each band. Thereafter, ECPTCQ is used to encode the transform coefficients in the spectral dimension. The 3-D DCT system uses TCQ to encode transform coefficients resulting from the application of an 8 x 8 x 8 DCT. The hybrid system uses DPCM to spectrally decorrelate the data, while a 2-D DCT coding scheme is used for spatial decorrelation. Side information and rate allocation strategies for all systems are discussed. Entropy-constrained codebooks are optimized for various generalized Gaussian distributions using a modified version of the generalized Lloyd algorithm. The first system can compress a hyperspectral image sequence at 0.125 bits/pixel/band while retaining an average peak signal-to-noise ratio of greater than 43 dB over the spectral bands. The 3-D DCT and hybrid systems achieve compression ratios of 77:1 and 69:1 while maintaining average peak signal-to-noise ratios of 40.75 dB and 40.29 dB, respectively, over the coded bands.
CHAPTER 1
INTRODUCTION
The common operating mode for strategic and tactical reconnaissance sensors has been, from the earliest days of photography, panchromatic. That is, the usual focal plane sensor integrates a wide range of data wavelengths into a single response.
This single response is then usually displayed as a grayscale image in which the integrated wavelength response at a given spatial point is given a grayscale range between pure white (maximum response) and pure black (zero response). The choice of panchromatic sensing has been purely pragmatic in motivation. Although it is known that some narrowband responses are of interest, such as in infrared, there have been few problems of reconnaissance significance that were driven by the need to couple wavelength sensitivity to spatial resolution.
The preference for purely panchromatic sensing has begun to wane in recent years. A variety of questions are now important that can be answered only by precise recording of sensed energy in a number of narrow wavelength slices.
For example, various types of camouflage and concealment techniques are revealed by narrowband spectral sensing. The effluents of various manufacturing facilities, as sensed by fine spectral resolution, can be a critical clue to the type of processes employed in the facility. The agricultural yield and health of crops can be predicted from quantitative analysis of fine-resolution spectral images [1]. The development and utilization of fine-resolution spectral sensors is becoming of prominent interest for these and many other applications.
The Landsat series of earth satellites conclusively established the value of employing image sensors with multiple wavelength sensitivity [2]. Landsat images represent a rather coarse slicing of the optical wavelength spectrum, however, being only 4 to 6 overlapping bands through the visible and near-infrared wavelengths, with each band having a width of 100 to 200 nm. Given that many surface materials have absorption features that are only 20 to 40 nm wide [1], it is apparent that this class of "multispectral" sensors cannot record the narrow wavelength absorption features that are indicative of specific materials in laboratory-based spectroscopy [3].

The limitation associated with multispectral scanners has led to imaging spectrometers, which register many narrow-waveband images of a scene and allow the assemblage of a contiguous reflectance spectrum for each pixel in the scene. An early example of such a sensor is the Airborne Imaging Spectrometer (AIS) [4]. AIS was developed by the NASA Jet Propulsion Laboratory (JPL) for civil environmental applications. AIS could simultaneously record 128 near-infrared wavebands (each being 9.3 nm wide) with a 365 to 787 m swath.
A somewhat similar yet more complex JPL sensor was the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [5],[6]. This sensor collected 209 visible and near-infrared wavebands, each of width 10 nm, with an 11-km swath. The radiometric quantization of AVIRIS is 10 bits.
The Earth Observing Satellite (EOS), to be launched in the near future, will serve as the platform for new, advanced high-dimensional multispectral scanners. EOS will contain the High Resolution Imaging Spectrometer (HIRIS)* [7], a Moderate Resolution Imaging Spectrometer (MODIS) [8], and a synthetic aperture radar. HIRIS can collect 192 spectral bands in the 0.4-2.5 μm wavelength region, with 30-meter spatial and 10-nm spectral resolution, over a 24-km-wide swath. The 2-D CCD array aboard HIRIS is sampled and quantized at 12 bits. The 192 bands will comprise a single HIRIS (i.e., hyperspectral) image. HIRIS will be used by a wide diversity of remote sensing disciplines including geology, oceans, soil, vegetation, atmosphere, snow/ice, and others.

The extreme spectral resolution afforded by HIRIS is representative of future spectral sensors. However, because of the enhanced spectral (and spatial) resolution, tremendous amounts of raw data are produced. In fact, HIRIS will expel 512 megabits/second in its operational state. Transmission of the complete data record to an earth receiving station is difficult since the EOS downlink channel is capable of only 300 megabits/second and must service all onboard experiments, not just HIRIS [9].

*The HIRIS project has been cancelled with no replacement announced.
An obvious data reduction technique is to discard a portion of the data (i.e., so-called "spectral editing"). Although the remaining data is unscathed, many of the advantages of the fine spectral resolution imagery may be lost. Perhaps a more desirable solution is to use lossless, or information-preserving, compression [10]. The purpose of lossless compression is to represent the data using a minimum number of bits by reducing the statistical redundancies inherent to the data. Although certainly viable, this method can only provide compression ratios of about 3:1 [11].

Another alternative, which is the subject of this work, is lossy compression [12],[13],[14]. Although distortion is introduced in the reconstructed data, very high compression ratios can be obtained. Proper optimization of the compression system can yield an error small enough that visual degradations are practically nonexistent and classification errors are small. Compression ratios of greater than 30:1 would enable HIRIS to operate 100% of the time rather than at the anticipated 2 to 3% duty cycle [9].
Once the sensor information is downloaded to an earth receiving station, processing and handling of the data is very problematic. For example, the EOS platform will expel terabytes of data per day [15], with the total data collected being on the order of 10^16 bytes over its 15-year mission. Moreover, it is estimated that data handling expenses alone will amount to $300 million over the life of the system [16]. Ground-based operations include archival, browsing, dissemination, and rapid delivery and analysis of hyperspectral data.

In its pristine form, storage of the data may be feasible (although expensive) with the use of optical disk technology, but transmission from site to site places undue demands on the communications link, and browsing as a prelude to analysis is nearly impossible. It may be advantageous to have browse-quality data online to facilitate rapid viewing of several hyperspectral images. Fewer bytes on the disk equate to much faster read times, and transmission costs of the data are reduced significantly [16].
Lossy compression of imagery can manifest itself in sensor-based and/or ground-based systems. The former requires fast, efficient algorithms, implementable with existing low-complexity (low power consumption) hardware in near real time. The requirements of the latter are somewhat relaxed in that greater-complexity hardware can be utilized, and speed may not be of paramount importance. However, as discussed earlier, for uses such as rapid browsing, decoding should be fast enough to handle data rates in real time (50 kbit/sec). Also, savings in transmission and storage should not be negated by decoding computational costs.
HIRIS/AVIRIS is exemplary of the characteristics of future fine-spectral-resolution image sensors. The volume of data in such images requires fundamental rethinking of many image processing operations that have been developed for panchromatic and even low-dimensional multispectral data. A key property of fine-spectral-resolution imagery is interband correlation.

It is easy to observe, even in coarse-band imagery such as Landsat multispectral or three-primary color images, that many features of edge definition, contrast, texture, gray level, etc., remain substantially the same from spectral band to spectral band. The interband correlation facilitates substantial reduction of the data required for storing and/or transmitting such imagery. However, a careless approach to reducing the correlation could lead to disastrous loss of the information differences between bands that are the critical value of multispectral imagery. An improper reduction of correlation redundancy could make it impossible to exploit the imagery for significant utility.
Recent work related to lossy compression of multispectral and hyperspectral imagery has been reported in the literature. In [9], DPCM, block truncation coding, transform coding, and various vector quantization (VQ) schemes were investigated to compress hyperspectral data from the Airborne Imaging Spectrometer. It was found that mean-residual and gain-shape VQ yield the best results (in the context of spectral signature mapping and mixture component analysis, as performed by the Spectral Analysis Manager (SPAM) software package) with compression ratios of approximately 20:1. VQ was found to yield the best quantitative performance with a compression ratio of 32:1.
Nonlinear predictive VQ was developed in [17], which used interband vector prediction and intraband VQ. Simulations with Landsat Thematic Mapper (TM) imagery yielded average bit rates of 0.34 bpp with excellent quantitative results. This work was extended in [18] using variable-rate multistage vector quantization (MSVQ) to yield slightly better performance. In [19], spectral VQ was used to compress imagery from the Thermal Infrared Multispectral Scanner (TIMS) and the Calibrated Airborne Multispectral Scanner (CAMS) at 24:1 and 28:1, respectively, with very favorable results. The use of VQ to compress data from the Advanced Very High Resolution Radiometer (AVHRR) was also investigated in [20]. That system yielded compression ratios of 24:1 with visually pleasing results. Results of compressing AVHRR data using progressive VQ on a Single Instruction Multiple Data (SIMD) machine are also reported in [16]. The system so described can be operated at various image reconstruction levels ranging from browse quality to lossless.
Several non-VQ systems have also been introduced recently. A hierarchical data compression system was developed in [21] which allowed compression with high loss, moderate loss, and no loss. This system incorporated block averaging, quadtrees, and iterative parallel region growing. For compressing Landsat TM imagery, compression ratios of up to 27:1 were obtained. A system was introduced in [22] whereby the 3-D data is first decorrelated spectrally using the Karhunen-Loève (KL) transformation and then subjected to an adaptive discrete cosine transform (DCT) coding technique. A similar system was presented in [15]. That system also used the KL transform to spectrally decorrelate the multispectral data; however, the discrete wavelet transform was used to spatially decorrelate the principal components. It was reported that compression of the principal components yielded a 40% improvement in compression (as compared to spatial coding of the bands themselves), with a similar reduction in mean-squared error.
Lossless compression schemes have also been reported in the literature [23],[24]. In particular, [11] found that in Landsat TM imagery, exploitation of spatial correlation yields better compression than does the use of spectral correlation. Accordingly, compression ratios of 3:1 were obtained. Slightly better performance was obtained in [25] with the use of spatial prediction and arithmetic coding.
The algorithms developed in the present work for compression of hyperspectral imagery are based on a relatively new quantization scheme known as trellis coded quantization (TCQ). TCQ was introduced in [26] as an effective and computationally tractable method for encoding memoryless sources. TCQ is spawned from trellis coded modulation (TCM), originally developed by Ungerboeck [27]. In its simplest form, for encoding at R bits per sample using TCQ, a scalar codebook of size 2^(R+1) is partitioned into 4 subsets, each of size 2^(R-1), and an appropriate trellis is labeled with these subsets. The Viterbi algorithm [28] is then used to find the minimum mean-squared error (MSE) path through the trellis.
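The search just described can be sketched concretely. The block below is a minimal illustration of 4-state TCQ at R = 2 bits/sample; the trellis transitions, subset labeling, and union codebook values are illustrative assumptions, not the trained codebooks developed later in this work.

```python
import numpy as np

# 4-state trellis: state -> ((next_state, subset), (next_state, subset)).
# Subset labeling is an illustrative Ungerboeck-style assignment.
TRELLIS = {0: ((0, 0), (2, 2)),
           1: ((0, 2), (2, 0)),
           2: ((1, 1), (3, 3)),
           3: ((1, 3), (3, 1))}

def tcq_encode(x, codebook):
    """Viterbi search for the minimum-MSE path through the trellis.

    codebook[j] holds the reconstruction levels of subset D_j.
    Returns the sequence of quantized values along the best path.
    """
    n_states = 4
    cost = [0.0, np.inf, np.inf, np.inf]          # start in state 0
    paths = [[] for _ in range(n_states)]
    for sample in x:
        new_cost = [np.inf] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if cost[s] == np.inf:
                continue
            for nxt, subset in TRELLIS[s]:
                # best codeword within this branch's subset
                q = min(codebook[subset], key=lambda c: (sample - c) ** 2)
                c = cost[s] + (sample - q) ** 2
                if c < new_cost[nxt]:
                    new_cost[nxt] = c
                    new_paths[nxt] = paths[s] + [q]
        cost, paths = new_cost, new_paths
    return paths[int(np.argmin(cost))]

# R = 2 bits/sample: union codebook of size 2^(R+1) = 8,
# split into 4 subsets of size 2^(R-1) = 2 (values are illustrative).
codebook = [[-3.5, 0.5], [-2.5, 1.5], [-1.5, 2.5], [-0.5, 3.5]]
print(tcq_encode([0.4, -1.6, 2.3, 0.1], codebook))
```

One bit per sample selects the branch (and hence the subset), and the remaining R - 1 bits select the codeword within the subset, which is how the doubled codebook costs no extra rate.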
High-performance image coding algorithms using TCQ have recently been introduced in the literature. For example, [29] designed an image coder which uses TCQ to encode the coefficients resulting from the application of the 2-D DCT. For encoding the "Lena" image, it is shown that peak signal-to-noise ratios (PSNR) (10 log_10 (255^2/MSE)) of 39.33, 35.97, and 32.49 dB are obtained at encoding rates of 1.0, 0.5, and 0.25 bits/pixel, respectively. Further, the wavelet coder described in [30] codes "Lena" with PSNR values of 39.85, 36.61, and 33.77 dB at rates of 1.0, 0.5, and 0.25 bits/pixel, respectively. This coder uses wavelet filters to create 16 equal-sized subbands, each of which is encoded using TCQ, with the lowest frequency subband being transformed using a 2-D DCT prior to quantization. Both systems have excellent subjective performance, with the wavelet coder quantitatively surpassing all other systems reported in the literature to date.
In the present work, TCQ is used in three hyperspectral coding systems which are based on the DCT. The DCT is often rationalized as an approximation to the full optimality of the KL transform; the computational simplicity afforded by the DCT being a strong motivating factor. The DCT is very attractive in the computational cost versus performance category and has been adopted for international standards such as JPEG [31], CCITT H.261 [32], and MPEG [33]. Recent introduction of dedicated DCT processors [34],[35],[36] may facilitate implementation of the systems discussed herein in either sensorbased or groundbased applications.
Of the three systems described, the first uses the 2-D DCT to spatially decorrelate the bands, and ECPTCQ to encode the transform coefficients in the spectral dimension. The second system partitions the hyperspectral sequence into 8 x 8 x 8 cubes and applies a 3-D DCT. "Like" coefficients are collected and quantized using entropy-constrained trellis coded quantization (ECTCQ). The third system is a hybrid DPCM/DCT configuration whereby DPCM is used in the spectral dimension and each "error image" is coded using a 2-D DCT coder. In this coder, 8 x 8 blocks are transformed, and like coefficients are collected and quantized using ECTCQ. Codebooks for the 3-D DCT system are optimized for different generalized Gaussian distributions, with the appropriate codebook chosen based on the fourth central moment, while the hybrid system assumes Gaussian and Laplacian statistics. Spectral codebooks for the ECPTCQ system are optimized for first-order Gauss-Markov sources with various correlation coefficients. Codebook design for all systems uses a modified version of the generalized Lloyd algorithm in a training-sequence-based iterative scheme. Rate allocation is performed in an optimal fashion by an iterative technique which uses the rate-distortion performance of the various trellis-based quantizers.
CHAPTER 2
ENTROPYCONSTRAINED PREDICTIVE TCQ
2.1 Background
The function of any lossy image compression scheme is to represent a digital image with fewer bits than required by the original sampled image, while adhering to certain quantitative and subjective fidelity criteria. Generally speaking, data compression is achieved by decorrelating the redundancies (either spatial or spectral, depending on the type of imagery) inherent to the imagery, and quantizing the decorrelated data.
Decorrelation methods fall into one of two categories: spatial domain methods and frequency domain methods. Spatial domain methods attempt to remove redundancies by operating on the data directly. An example is DPCM, where the past data sample is used to predict the current sample. Ideally, the difference between the current sample and the predicted sample (i.e., the "error sample") is not correlated with error samples at other instances of time. Frequency domain methods can further be broken down into transform methods and subband methods. Transform methods involve applying an energy-preserving transform to the data such that the representation is in a different domain. In the transform domain, the majority of energy is concentrated in a small number of "transform coefficients," and the coefficients tend to be less correlated than the original data. There are many different transforms with good energy compaction properties [12],[37]. Of these, the DCT has emerged as the most popular, partially because there exist fast implementation algorithms [38]. Transform coding has proven to be a very effective technique for image coding [38],[39],[40],[37].
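As a concrete illustration of the spatial-domain approach, the sketch below implements a first-order DPCM loop; the prediction coefficient and the uniform quantizer step are arbitrary illustrative choices. Note that prediction is made from the previously reconstructed sample, so the decoder can track the encoder exactly.

```python
def dpcm_encode(x, a=0.95, step=0.5):
    """First-order DPCM: predict each sample from the previous
    *reconstructed* sample, quantize only the prediction error."""
    pred = 0.0
    recon = []
    for sample in x:
        e = sample - a * pred              # prediction error
        e_q = step * round(e / step)       # uniform quantizer on the error
        pred = a * pred + e_q              # reconstructed sample
        recon.append(pred)
    return recon
```

For highly correlated data the error samples have much smaller variance than the source, which is what makes them cheaper to quantize at a given fidelity.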
Subband methods divide the data into frequency components or "subbands" by applying bandpass filters to the data, each tuned to a different center frequency. Ideally, the subbands so created are mutually uncorrelated. Subband coding was first introduced in [41] for speech signals. Quadrature mirror filters (QMF) were introduced in [42] and shown to allow alias-free signal reconstruction in the absence of quantization errors. Subband decomposition was extended to multidimensional signals in [43], and subband image coders were subsequently introduced in [44] and [45]. Since then, a variety of subband coders have been developed [46],[47],[48].
A particular class of subband decomposition is wavelet decomposition [49],[50],[51]. Wavelets are a family of functions derived from scaling and translating a single function known as a "mother" wavelet [52]. At each stage in a wavelet decomposition, the input signal is divided into a coarse signal (i.e., a lowpass signal) and a detail signal (i.e., a highpass signal). The discrete wavelet transform effectively decomposes an input signal into a set of frequency subbands.
Once the data has been decorrelated, information reduction is achieved with the use of a quantizer. The simplest quantization scheme is the one-dimensional or scalar quantizer (SQ). For our discussion, we assume the source to be a discrete-time, zero-mean, memoryless stationary process {x_i} with probability density function (pdf) f_X(x) and variance \sigma^2. An N-level quantizer maps the source output x \in \mathbb{R} into one of N values Q_1, Q_2, \ldots, Q_N, each in \mathbb{R}, based on a set of thresholds T_1, T_2, \ldots, T_{N-1}. That is, an input sample x is represented by the discrete output level Q_l if x \in (T_{l-1}, T_l] (with T_0 = -\infty and T_N = \infty). The T_l are called threshold levels and the Q_l are called reconstruction levels. The performance of the quantizer is typically measured by the mean-squared error, given by

    \rho = \sum_{l=1}^{N} \int_{T_{l-1}}^{T_l} (x - Q_l)^2 f_X(x) \, dx.    (2.1)

Figure 2.1 shows an example of an 8-level uniform scalar quantizer (USQ). Note that for a uniform quantizer, the reconstruction levels (shown along the vertical axis) and the threshold levels (shown along the horizontal axis) have uniform spacing \Delta. For N levels and equal-length codeword assignment, the number of bits required to code each output index is R = \log_2 N bits/sample.

Two types of distortion are inherent to any finite-level scalar quantizer: granular distortion and overload distortion [12]. Granular distortion refers to errors incurred by representing x with Q_l, while overload distortion occurs when the input exceeds x_{OL}, as shown in Figure 2.1.
Figure 2.1: Inputoutput characteristics of a uniform scalar quantizer.
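The structure in Figure 2.1 can be expressed in a few lines of code. The sketch below assumes a midrise characteristic with N levels and spacing Δ, and clips inputs beyond the overload point to the outermost reconstruction levels:

```python
import numpy as np

def usq(x, n_levels=8, delta=1.0):
    """Midrise uniform scalar quantizer: thresholds at multiples of delta,
    reconstruction levels at interval midpoints, inputs beyond the
    overload point x_OL clipped to the outermost levels."""
    x = np.asarray(x, dtype=float)
    idx = np.floor(x / delta)                        # which cell x falls in
    idx = np.clip(idx, -n_levels // 2, n_levels // 2 - 1)
    return (idx + 0.5) * delta                       # midpoint reconstruction

print(usq([0.2, 1.7, -3.9, 10.0], n_levels=8, delta=1.0))
# -> [ 0.5  1.5 -3.5  3.5]  (the 10.0 input is in overload and is clipped)
```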
For nonuniform memoryless sources, the uniform scalar quantizer diverges from the rate-distortion function R(D) with increasing rate. (Recall that the rate-distortion function gives the minimum distortion possible at a given rate for a particular distribution.) For example, when quantizing the memoryless Gaussian source using USQ at R = 3 bits/sample, USQ is 3.79 dB from R(D), while at 7 bits/sample the margin increases to 7.01 dB.
The performance of SQ can be improved for nonuniform sources by using nonuniformly spaced reconstruction and threshold levels [53]. These so-called Lloyd-Max quantizers (LMQ) are optimum in that they minimize the average distortion \rho for a fixed number of levels N. The necessary conditions are obtained by differentiating \rho in equation (2.1) with respect to the Q_l and T_l and setting the derivatives equal to zero. The Q_l become the centroids of the area of the pdf bounded by T_{l-1} and T_l, and the T_l become the midpoints of Q_l and Q_{l+1}. Solutions for T_l and Q_l can be computed iteratively. For log-concave distributions (i.e., \partial^2 \log f_X(x) / \partial x^2 < 0), these conditions are also sufficient.
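The iteration alternates the two necessary conditions: thresholds at midpoints of adjacent reconstruction levels, then reconstruction levels at cell centroids. The sketch below runs this on an empirical Gaussian sample as a stand-in for the pdf integrals; the initialization, sample size, and iteration count are arbitrary choices.

```python
import numpy as np

def lloyd_max(samples, n_levels, n_iter=100):
    """Alternate the two Lloyd-Max conditions on an empirical sample:
    T_l = midpoint of adjacent Q_l; Q_l = centroid of its cell."""
    q = np.linspace(samples.min(), samples.max(), n_levels)  # initial levels
    for _ in range(n_iter):
        t = 0.5 * (q[:-1] + q[1:])                # midpoint thresholds
        cells = np.searchsorted(t, samples)       # assign samples to cells
        for l in range(n_levels):                 # centroid update
            members = samples[cells == l]
            if members.size:
                q[l] = members.mean()
    return q

rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000)
print(lloyd_max(gauss, 4))   # converges near the classic +-0.4528, +-1.510 levels
```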
For quantizing the memoryless Gaussian source, LMQ is 0.35 dB better than USQ at 3 bits/sample and 2.68 dB better at 7 bits/sample. The performance disparity is even greater for the memoryless Laplacian source, where at 7 bits/sample LMQ outperforms USQ by 5.46 dB. However, although the divergence from R(D) of LMQ is less than that of USQ, the performance of LMQ with respect to R(D) at high rates is still very poor, with LMQ being 4.33 dB and 7.08 dB from the respective rate-distortion functions of the Gaussian and Laplacian sources at R = 7 bits/sample.
The performance of LMQ (or USQ) can be improved for encoding nonuniform sources by entropy coding the quantizer output with the use of a variable-length coding scheme such as Huffman [54] or arithmetic [55] coding. These coding schemes enable the actual bit rate to approach the output entropy of the quantizer, which is defined as

    H_N = -\sum_{l=1}^{N} P_l \log_2 P_l  bits/sample.    (2.2)

In this way, more levels can be used (which minimizes overload distortion) and noninteger encoding rates can be accommodated. Drawbacks of entropy coding include the increased complexity of the encoder/decoder, the additional buffering required to maintain a constant channel bit rate [56], and the increased susceptibility to error propagation [57].
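For instance, the output entropy (2.2) can be estimated from observed quantizer output indices; the 8-level USQ and unit-variance Gaussian source below are illustrative choices.

```python
import numpy as np
from collections import Counter

def output_entropy(indices):
    """H_N = -sum_l P_l log2 P_l, with P_l estimated from index counts."""
    counts = np.array(list(Counter(indices).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
idx = np.clip(np.floor(x / 0.5), -4, 3)      # 8-level USQ, delta = 0.5
h = output_entropy(idx.tolist())
print(h)   # strictly below log2(8) = 3 bits/sample for this nonuniform source
```

The gap between H_N and log2 N is exactly the rate an ideal entropy coder recovers over fixed-length indexing.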
At high encoding rates, the optimum scalar quantizer is simply entropy-coded USQ [58]. However, at lower rates, improved performance can be obtained by using entropy-constrained scalar quantization (ECSQ). In this scheme, the average distortion is minimized while the output entropy is held below a certain value. The optimum reconstruction levels are computed using

Q_l = \frac{1}{P_l} \int_{T_{l-1}}^{T_l} x f_X(x) \, dx,   (2.3)
where P_l is the probability that the source output assumes a value in the lth partition:

P_l \equiv \mathrm{prob}(T_{l-1} < X \le T_l) = \int_{T_{l-1}}^{T_l} f_X(x) \, dx.   (2.4)
The optimum threshold levels (for a given output entropy) are computed using
(2.5)
In this fashion, the levels of the scalar quantizer are optimized so as to redistribute the output probabilities such that levels with high probability of occurrence are assigned short codeword lengths and low-probability levels are assigned long codewords. It is useful to note two important characteristics of entropy-constrained scalar quantizers. First, the output entropy depends only on the threshold levels T_l, and hence the Q_l have no effect on the entropy. Secondly, it is shown in [59] that nearly equal performance can be obtained by using uniform thresholds. That is, rather than designing the threshold levels according to (2.5), it suffices to simply choose the T_l uniformly, with spacing Δ, and calculate the Q_l using equation (2.3) as before. The design is thus greatly simplified. It is shown in [60] that for the Laplacian and exponential densities, uniformly spaced thresholds satisfy the necessary conditions for optimality.
At 1 bit/sample, ECSQ is 0.24 dB better than entropy-coded USQ for the memoryless Gaussian source, while the difference is 2.75 dB for the Laplacian source. Although these differences in performance are source dependent, they are also attributable to the number of reconstruction levels. In fact, when the constraint on the number of reconstruction levels is removed, ECSQ is known to provide a lower bound on the rate-distortion performance of all scalar quantizers [59]. For encoding all memoryless sources, ECSQ comes within 1.53 dB of the respective rate-distortion functions for all nonnegative rates.
The reason why LMQ and USQ do not achieve performance comparable to ECSQ (for nonuniform sources) is explained as follows. Let x^m = [x_1, x_2, ..., x_m] be a block of m samples from a memoryless source X with joint density function f_X(x) = f_X(x_1, x_2, ..., x_m). As m becomes larger, m-dimensional samples (blocks) become uniformly distributed inside an m-dimensional region P ⊂ R^m. This result is the so-called asymptotic equipartition principle (AEP) [61]. The shape or boundary of this region depends solely upon the source distribution. For the Gaussian source the region is an m-dimensional spherical shell, while for the Laplacian source, it is an m-dimensional pyramid [62]. Entropy coding exploits the AEP and, in effect, provides performance increases equivalent to those produced by matching the boundary of an m-dimensional codebook to that required by the source distribution. For the uniform source, this boundary is an m-dimensional cube and ECSQ provides no performance increase over USQ (at high rates).
Thus far, we have discussed quantization in the context of one dimension. However, there are substantial performance advantages to quantizing blocks of data rather than a single sample. Given an m-dimensional block of data [x_1, x_2, ..., x_m], we choose an m-dimensional codeword that is "closest" (as defined by some metric, usually MSE) to the input. For quantizing m-dimensional blocks at R bits/sample, a codebook of size 2^{Rm} is required. This form of quantization is known as vector quantization (VQ). VQ can come arbitrarily close to the rate-distortion function by using properly designed codebooks and large vector dimension [63]. In effect, VQ places all its codevectors inside the high-probability region P. Specifically, VQ gains advantages over uniform scalar quantization via three mechanisms: boundary gain, granular gain, and nonuniform density gain [64],[65].
As discussed earlier, for large vector dimension, the codevectors must be distributed uniformly inside an m-dimensional region P. The shape of this region depends only on the pdf of the source (spherical and pyramidal for Gaussian and Laplacian, respectively), and is independent of the distortion measure [65]. Uniform scalar quantizers place their codevectors in an m-dimensional cubic lattice (obtained by taking an m-fold Cartesian product of the scalar codebook), and cannot conform to the boundaries required by nonuniform sources. For uniform sources, the region P is an m-dimensional cube and VQ provides no boundary gain over USQ (at high encoding rates).
Granular gain is a function of the shape of the Voronoi regions (partitions) of the quantizer. Granular gain is dependent only on the distortion measure and is independent of the source distribution. For the MSE distortion measure, the m-dimensional region with the smallest normalized second moment is an m-sphere. Consequently, as m → ∞, the granular gain (as compared to the m-dimensional cubic Voronoi regions of USQ) of an m-dimensional spherical region approaches 1.53 dB [66]. This is the maximum granular gain achievable, or the ultimate granular gain.
For smaller vector dimension m, nonuniform density gain is the gain achievable by selecting the vector codewords to lie closely in high-probability regions and farther apart in other regions. This gain disappears as m gets large, since the codewords become uniformly distributed inside the high-probability region P, as dictated by the AEP. Of course, this gain does not exist for uniform sources.
For nonuniform vector quantizers, it is difficult to evaluate boundary and nonuniform density gains separately, since the shapes and volumes of the Voronoi regions are extremely varied [64]. Boundary gain and granular gain are also difficult to evaluate independently for both nonuniform and uniform (e.g., lattice) VQ at low encoding rates. However, at high encoding rates, it was shown in [65] that granular and boundary gain can be independently evaluated and that the maximum granular gain achievable is 1.53 dB.
2.2 Trellis Coded Quantization
Trellis coded quantization was developed in [26]. For encoding a memoryless source at R bits per sample, a codebook of size 2^{R+1} is partitioned into four subsets, each containing 2^{R-1} codewords. These subsets are labeled D0, D1, D2, and D3, and are used as labels for the branches of a suitably chosen trellis. An example is shown in Figure 2.2 for R = 2.
[Diagram: 4-state trellis with branches labeled D0-D3, and an 8-codeword codebook labeled cyclically D0, D1, D2, D3, D0, D1, D2, D3]
Figure 2.2: A 4-state trellis with subset labeling and codebook.
Sequences of codewords that can be produced by the TCQ system are those that result from "walks" along the trellis from left to right. For example, if beginning in the top left state of the trellis in Figure 2.2, the first codeword must be chosen from either D0 or D2. If a codeword from D2 is chosen, then we walk along the lower branch (shown with a heavy line) to the second state from the bottom, at which we must choose a codeword from either D1 or D3.
Given an input data sequence x_1, x_2, ..., the best (minimum mean-squared-error) allowable sequence of codewords is determined as follows. For the ith stage in the trellis (corresponding to x_i), the best codeword in the jth subset (j = 0, 1, 2, 3), say c_j, is chosen and the associated cost ρ_j = (x_i − c_j)^2 is calculated. Each branch in the ith stage of the trellis that is labeled with subset D_j is assigned cost ρ_j. The Viterbi algorithm [28] is then used to find the path through the trellis with the lowest overall cost.
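The search just described can be sketched in Python. The 4-state branch table below is one consistent labeling that matches the walk described for Figure 2.2, and the uniform codebook, step size, and cyclic subset assignment are illustrative assumptions rather than the trained codebooks used later in the chapter:

```python
import random

# branches[state] = [(subset, next_state), ...]; state 0 offers D0/D2,
# and choosing D2 from state 0 leads to state 2, which offers D1/D3
BRANCHES = [
    [(0, 0), (2, 2)],
    [(2, 0), (0, 2)],
    [(1, 1), (3, 3)],
    [(3, 1), (1, 3)],
]

def tcq_encode(x, codebook):
    """Viterbi search over the 4-state trellis; codebook[j] belongs to
    subset j % 4. Returns total squared error and the codeword sequence."""
    subsets = [[c for j, c in enumerate(codebook) if j % 4 == s] for s in range(4)]
    INF = float("inf")
    cost = [0.0, INF, INF, INF]           # start in the top (first) state
    paths = [[] for _ in range(4)]
    for xi in x:
        # best codeword and cost rho_j in each subset for this sample
        best = [min(subsets[s], key=lambda c: (xi - c) ** 2) for s in range(4)]
        rho = [(xi - best[s]) ** 2 for s in range(4)]
        new_cost = [INF] * 4
        new_paths = [None] * 4
        for s in range(4):
            if cost[s] == INF:
                continue
            for subset, nxt in BRANCHES[s]:
                c = cost[s] + rho[subset]
                if c < new_cost[nxt]:     # keep the cheaper survivor
                    new_cost[nxt] = c
                    new_paths[nxt] = paths[s] + [best[subset]]
        cost, paths = new_cost, new_paths
    s = min(range(4), key=lambda k: cost[k])
    return cost[s], paths[s]

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]
delta = 0.5
codebook = [(j - 3.5) * delta for j in range(8)]   # size 2^(R+1) = 8 for R = 2
err, seq = tcq_encode(data, codebook)
mse = err / len(data)
```

Each state keeps one survivor path; the returned sequence is the codeword sequence on the cheapest surviving path.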
[Plot: granular gain (dB) versus number of trellis states]
Figure 2.3: Granular gain for various trellis sizes (log2 N).
A 256-state TCQ system can achieve a granular gain of 1.36 dB (out of the maximum 1.53 dB). Further, a simple 4-state system can achieve 0.99 dB. Figure 2.3 shows the granular gain of TCQ as a function of the number of trellis states.
Figures 2.4 and 2.5 show the performance of encoding zero-mean, unit-variance, memoryless Gaussian and Laplacian sources, respectively, using TCQ with trellis sizes of 4 and 256. All TCQ codebooks were designed using the generalized Lloyd algorithm [67]. It is apparent that even a 4-state TCQ system outperforms LMQ for all integer encoding rates. In fact, 4-state TCQ outperforms LMQ by 1.57 dB and 2.23 dB when encoding the Gaussian and Laplacian sources, respectively, at 3
bits/sample, while a 256-state TCQ system achieves a performance increase of 2.02 dB and 2.73 dB, respectively.

[Plot: rate versus SNR; curves for the rate-distortion function, uniform scalar quantizer, Lloyd-Max quantizer, TCQ (4 states), and TCQ (256 states)]
Figure 2.4: Performance of 4-state and 256-state TCQ for encoding the memoryless Gaussian source.
As shown above, TCQ can realize a significant portion of the maximum achievable granular gain (e.g., 1.36 dB out of 1.53 dB with a 256-state trellis). However, it does not realize any boundary gain (at high rates). Some nonuniform density gain can be realized by using nonuniform codebooks.
The complexity of TCQ is roughly independent of the encoding rate. The computations required per data sample are four rate-(R − 1) scalar quantizations with associated distortion calculations, 2N adds, and N two-way compares, where N is the number of trellis states. For comparison, the encoding and design complexity of VQ
grows exponentially with Rm, where R is the encoding rate in bits/sample and m is the vector dimension. By evaluating the asymptotic VQ bound, it was shown in [26] that no VQ of dimension less than 69 can achieve performance comparable to that of a 256-state TCQ system. Further, a lattice VQ of dimension 24 would be required to match the performance of a simple 4-state TCQ [65]. Application of VQ with high vector dimension is computationally prohibitive (for both encoding and codebook design) at moderate to high rates. Even at low rates, a 4-state TCQ system is far easier to implement than a VQ with similar performance. It should be noted that several reduced-complexity VQ schemes exist, but at the expense of some reduced performance [62],[68].

[Plot: rate versus SNR (dB); curves for the Shannon lower bound, uniform scalar quantizer, Lloyd-Max quantizer, and TCQ (4 states)]
Figure 2.5: Performance of 4-state and 256-state TCQ for encoding the memoryless Laplacian source.
2.3 Entropy-Constrained Trellis Coded Quantization

Although the performance of TCQ is far superior to that of LMQ, there is still much performance to be gained by entropy coding the output of a specially designed TCQ system. Note that at each step in the encoding, the codeword must be chosen from either A0 = D0 ∪ D2 or A1 = D1 ∪ D3.
Each of these "supersets" contains 2^R codewords and hence, given an initial trellis state, the sequence of selected codewords can be transmitted using one R-bit label for each sample. In [69], the codeword labels (as described above) were noiselessly compressed using one variable-length code for each superset. The encoding rate achievable by this process is the conditional entropy of the codebook C, given the superset:

H(C|A) = -\sum_{i=0}^{1} \sum_{c \in A_i} P(c|A_i) P(A_i) \log_2 P(c|A_i).   (2.6)
For a codebook of size 2^{R+1} (as discussed above), this noiseless compression causes the encoding rate to fall below R bits/sample. Thus, the size of the codebook should be increased.
An obvious method of adjusting the output entropy of the TCQ system (as computed using (2.6)) is to use a midtread uniform codebook with spacing Δ, as shown in Figure 2.6. Inclusion of the zero codeword enables encoding rates below one bit/sample to be achieved [59]. As Δ gets large, the top state in Figure 2.2 will rarely be exited and the output entropy (computed using (2.6)) will drop to zero.
[Codebook diagram: codewords at 0, ±Δ, ±2Δ, ..., ±5Δ, labeled cyclically ..., D3, D0, D1, D2, D3, D0, D1, ...]
Figure 2.6: Uniform TCQ codebook.
Figure 2.7 shows the performance of an 8-state TCQ system using a uniform codebook. Points on the curve were generated by varying the value of Δ. For comparison, the curve generated by encoding the source using (entropy-coded) USQ is also shown (where the entropy is calculated using equation (2.2)). Note that at high rates, the TCQ system comes within 0.5 dB of the rate-distortion function and outperforms entropy-coded USQ by about 1.0 dB. However, at low rates, the performance of TCQ deteriorates, where, for example, the TCQ system is approximately 2.3 dB away from the rate-distortion function at 0.75 bits/sample.
It is shown in [69] that improved performance of the TCQ system can be obtained at low rates by employing the codebook design and encoding rule from [70]. This algorithm is a modified version of the generalized Lloyd algorithm for vector quantizer design [70], and is used to minimize the MSE of an encoding, subject to an entropy constraint. This is accomplished by minimizing the cost function

J = E[\rho(x, c)] + \lambda E[l(c)],   (2.7)

where x is the data, c is the encoded version of x, ρ(x, c) is the cost (usually MSE) of representing x by c, λ is a Lagrange multiplier, and l(c) = -\log_2 P(c|A_i) is the number of bits used by the variable-length code to represent c. This algorithm chooses the "best" codeword by considering both the MSE and the number of bits
required to represent the particular codeword. Utilization of the encoding rule in (2.7) significantly improves the low-rate performance of the TCQ system.

[Plot: rate versus SNR (dB); curves for the rate-distortion function, entropy-coded TCQ, and entropy-coded USQ]
Figure 2.7: Performance of an 8-state entropy-coded uniform TCQ for encoding the memoryless Gaussian source.
It is shown in [65] that for the memoryless Gaussian source, the maximum boundary gain obtainable is equivalent to the gain realized by entropy coding the quantizer output. Experimentation has shown that this is also true of various generalized Gaussian sources. For example, for encoding the memoryless Laplacian source using ECTCQ, performance within 0.5 dB (for an 8-state trellis) of the rate-distortion function is also obtained.
Figure 2.8 shows the performance of an 8-state entropy-constrained TCQ (ECTCQ) system for encoding the zero-mean, unit-variance Gaussian source. Also shown is the performance of ECSQ. The 8-state ECTCQ system comes within 0.5 dB of the rate-distortion function for all nonnegative rates and is about 1.0 dB better than ECSQ at high rates.

[Plot: rate versus SNR (dB); curves for the rate-distortion function, entropy-constrained TCQ, and entropy-constrained SQ]
Figure 2.8: Performance of an 8-state entropy-constrained TCQ for encoding the memoryless Gaussian source.
2.4 Entropy-Constrained Predictive Trellis Coded Quantization

For sources with memory, a predictive TCQ (PTCQ) system was developed in [26]. In that system, each path through the trellis corresponds to a potential output sequence of PTCQ. Predictions of the data sample x_i are made at each state using the potential output sequences leading into that state. The prediction residual for state k, say d_i^k, is calculated by subtracting the prediction from the current data sample. The codeword closest to the prediction residual in each subset corresponding to a branch exiting state k is chosen, and the branch cost is calculated as the squared error between that codeword and the prediction residual at state k. The quantized value of the current data sample (for each branch) is formed by adding the chosen codeword (quantized prediction residual) to the prediction. As in the memoryless case, the Viterbi algorithm is used to choose a path through the trellis.
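One stage of the PTCQ recursion just described can be sketched as follows. The first-order predictor with coefficient a = 0.8, the branch table, and the subset values are all illustrative assumptions, and a complete encoder would additionally record survivor paths for the Viterbi traceback; this sketch keeps only per-state costs and last reconstructions:

```python
def ptcq_stage(x_i, cost, last_recon, subsets, branches, a=0.8):
    """One trellis stage of PTCQ: per-state prediction, residual d_i^k,
    branch costs, and the surviving reconstruction for each next state."""
    INF = float("inf")
    n = len(cost)
    new_cost = [INF] * n
    new_recon = [0.0] * n
    for k in range(n):
        if cost[k] == INF:
            continue                        # state not yet reachable
        pred = a * last_recon[k]            # prediction from state k's survivor
        d = x_i - pred                      # prediction residual d_i^k
        for subset, nxt in branches[k]:
            c = min(subsets[subset], key=lambda w: (d - w) ** 2)
            total = cost[k] + (d - c) ** 2  # branch cost (d_i^k - c)^2
            if total < new_cost[nxt]:
                new_cost[nxt] = total
                new_recon[nxt] = pred + c   # prediction plus quantized residual
    return new_cost, new_recon

# one step from the initial state, with illustrative subset codewords
branches = [[(0, 0), (2, 2)], [(2, 0), (0, 2)],
            [(1, 1), (3, 3)], [(3, 1), (1, 3)]]
subsets = [[-1.5, 0.5], [-1.0, 1.0], [-0.5, 1.5], [0.0, 2.0]]
cost, recon = ptcq_stage(1.0, [0.0, float("inf"), float("inf"), float("inf")],
                         [0.0] * 4, subsets, branches)
```

Iterating this stage over the input sequence and tracing back the cheapest survivor yields the PTCQ output sequence.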
Consider now a codebook with separate decision and reconstruction codewords. The codeword chosen for each branch is that which minimizes the distance between the prediction residual (of the state from which the branch emanates) and a decision codeword t_j. However, the quantized prediction residual is represented by the corresponding reconstruction codeword c_j. The cost associated with each branch (for use in the Viterbi search) is set to ρ_i^k = (d_i^k − t_j)^2. The decision codewords t_j are formed by taking a uniform scalar codebook with stepsize Δ and assigning its codewords (from left to right) to subsets D0, D1, D2, D3, D0, D1, D2, D3, .... It is important that a "midtread" codebook be used, and that the "zero" codeword be included in the D0 subset [69]. By partitioning the decision codewords in this fashion, four codebooks are created, each corresponding to a uniform-threshold scalar quantizer with step size 4Δ. The reconstruction codewords c_j are the centroids of the support regions defined by t_j ± 2Δ.
Philosophically, this process is a generalization (to a trellis-based structure) of that employed in [71]. In that system, the codewords used for reconstruction are computed as centroids (based on the probability density function (pdf) of prediction residuals) while the decisions of which codewords to use are based on uniform thresholds. From the point of view of minimizing a cost function, this is equivalent to making codeword decisions based on minimizing the distance between prediction residuals and the codewords of a uniform codebook. This is exactly what is done by the ECPTCQ system. In fact, finding the best t_j in each subset can be accomplished by performing a uniform threshold quantization.
Design of the reconstruction codewords is facilitated by using the following algorithm.

1. For a given stepsize Δ and integer N, let c_j^(0) = t_j, j = 1, ..., N, be the codewords of a midtread uniform scalar quantizer with stepsize Δ. Partition these codewords into subsets and set i = 0.

2. Set i = i + 1; quantize the training data and assign the prediction residual, d, at each stage (of the surviving path) to one of the sets S_1, ..., S_N, corresponding to the codeword selected for that residual.

3. For each set S_j, compute a reconstruction codeword as c_j^(i) = (1/|S_j|) \sum_{d \in S_j} d, where |S_j| is the number of elements belonging to S_j.

4. If the difference in quantizer structure, (1/N) \sum_{j=1}^{N} |c_j^(i) − c_j^(i−1)|, is greater than some small ε > 0, go to step 2. Otherwise, stop with codebook C.
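Steps 1 through 4 can be sketched as follows, with plain uniform-threshold quantization of a fixed residual set standing in for the full ECPTCQ encoding pass of step 2 (an assumption made for brevity; the true algorithm re-encodes the training data through the trellis on every iteration):

```python
import random

def train_reconstruction_codebook(residuals, delta, n, eps=1e-4, max_iters=20):
    """Centroid training of reconstruction codewords (steps 1-4 above)."""
    half = n // 2
    t = [(j - half) * delta for j in range(n)]   # midtread decision codewords
    c = list(t)                                  # step 1: c_j^(0) = t_j
    for _ in range(max_iters):
        # step 2 (simplified): uniform-threshold decisions on the residuals
        cells = [[] for _ in range(n)]
        for d in residuals:
            j = min(n - 1, max(0, round(d / delta) + half))
            cells[j].append(d)
        # step 3: reconstruction codewords become cell centroids
        new_c = [sum(s) / len(s) if s else c[j] for j, s in enumerate(cells)]
        # step 4: stop when the average codeword movement is below eps
        change = sum(abs(a - b) for a, b in zip(new_c, c)) / n
        c = new_c
        if change < eps:
            break
    return c

random.seed(2)
residuals = [random.gauss(0.0, 1.0) for _ in range(20000)]
codebook = train_reconstruction_codebook(residuals, delta=0.5, n=16)
```

Because the decisions here use fixed uniform thresholds, each interior reconstruction codeword settles at the centroid of its decision cell, pulled toward the high-probability side of the cell.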
Experiments using different codebook sizes at various rates were performed to determine the appropriate codebook size N. It was found that excellent performance can be obtained when N is at least 2^{R+5}, where R is the desired entropy. It should be noted that the ECPTCQ training algorithm is not guaranteed to converge. However, suitable codebooks can typically be obtained in fewer than five iterations with N = 512 and ε = 0.0001.
The algorithm described above is similar to those employed in [71] and [72]. It is closest to the algorithm in [71] with the major difference being that we compute centroids from prediction residuals obtained from training data, while in [71], centroids are computed from approximate prediction residual pdfs obtained through polynomial expansions.
2.4.1 Performance of ECPTCQ
The performance curve for an 8-state ECPTCQ system is shown in Figure 2.9. For comparison, Figure 2.9 also shows the performance of an entropy-constrained DPCM system obtained by applying the same training algorithm to a scalar-quantizer-based system. In each case, the codebooks were trained using 50,000 data samples from a unit-variance first-order Gauss-Markov source with correlation coefficient ρ = 0.8. The results of Figure 2.9 were then obtained by encoding 50,000 samples from outside the training sequence. Each curve was produced parametrically by varying the value of the step size Δ (i.e., each point on a particular curve was obtained by applying the algorithm with a different choice of Δ). Although not shown here, experiments with different values of ρ provide similar results.
At rates above about 2.0 bits/sample, the performance of the DPCM system in Figure 2.9 is equivalent to that obtained in [71]. At rates below 2.0 bits/sample, the DPCM results reported here are superior to those in [71]. Apparently, at low rates, the polynomial model used there is not sufficiently accurate and the training-sequence-based approach yields superior performance. The ECPTCQ system provides a further performance increase over both scalar-quantizer-based schemes for all rates above about 0.25 bits/sample. For example, at 1.0 bits/sample, ECPTCQ is roughly 1.1 dB and 1.75 dB better than the ECDPCM results reported here and in [71], respectively.
Asymptotically, as the rate is increased and the quantization becomes very fine, the residual pdf should approach that of the memoryless Gaussian source, and the performance of ECPTCQ (and ECDPCM) should approach that of entropy-constrained TCQ (and entropy-constrained scalar quantization) operating on the memoryless Gaussian source. This is in fact what occurs. At high rates, the performance of the ECDPCM system is within 1.53 dB of the rate-distortion function (as expected from [59],[71]) and the ECPTCQ system is within 0.5 dB of the rate-distortion function (as expected from [73],[69]). It should be pointed out that this convergence is markedly faster for the scalar system than for the trellis-based system. The asymptotic result is nearly achieved for the DPCM systems at about 2.0 bits/sample, while the ECPTCQ system requires rates as high as 5 bits/sample. It is expected that this
disparity can be reduced by designing codebooks with optimized thresholds (rather than the uniform thresholds used here).

[Plot: rate (bits/sample) versus SNR (dB); curves for the rate-distortion function, entropy-constrained PTCQ, and entropy-constrained DPCM]
Figure 2.9: SNR performance of ECPTCQ.
CHAPTER 3
HYPERSPECTRAL IMAGE CODER USING ECPTCQ
3.1 System Description
In this chapter, we present a system for encoding hyperspectral imagery which uses the 2-D DCT and ECPTCQ. A block diagram illustrating the procedural flow of the system is shown in Figure 3.1. Each spectral band is partitioned into 8 × 8 blocks and the 2-D DCT is applied to each of these blocks. The transform coefficients at each spatial location are then collected to obtain spectral vectors. Thus, for a hyperspectral image with L bands, each of size N × N, there are N^2 spectral vectors, each of length L. Since the spectral vectors are highly correlated, they are encoded using ECPTCQ. Additionally, to avoid degradation due to start-up transients, the transform coefficients of the first band are encoded using ECTCQ.
3.2 Codebook Design
Experimentation using hyperspectral data from AVIRIS reveals that the correlation coefficient ρ of the spectral vectors within any transformed block ranges from about 0.6 to 0.95. More specifically, the spectral vectors corresponding to the DC coefficients have the highest correlation (e.g., 0.95) while the vectors corresponding to the highest-frequency coefficient (in both spatial dimensions) have the lowest (e.g., 0.60).

[Block diagram: hyperspectral input → 2-D DCT decomposition → calculation of spectral statistics (means, variances, correlation coefficients) and rate allocation (sent as side information) → TCQ coder for the first band and interband ECPTCQ encoders → MUX → digital channel → DEMUX → reconstructed coefficients → inverse DCT → reconstructed hyperspectral image]
Figure 3.1: Hyperspectral image coder.
Accordingly, codebooks were designed using the ECPTCQ training algorithm for correlation coefficient values ranging from 0.60 to 0.95, in 0.05 increments. For each allowable value of ρ, twenty-one codebooks were designed with rates ranging from 0.25 bits/sample to 5.25 bits/sample, in quarter-bit increments. Each training sequence consisted of 50,000 samples from a first-order Gauss-Markov pseudorandom number generator with the appropriate value of ρ (as discussed above).
The first spectral band in the hyperspectral sequence must also be encoded, with each quantized coefficient being used as the initial condition to encode the corresponding spectral vector using ECPTCQ. Coefficients corresponding to the same position within each block ("like-coefficients") are collected into sequences to be encoded using ECTCQ. Codebooks were designed using the algorithm in [69]. This algorithm uses a modified version of the generalized Lloyd algorithm [70] to minimize the MSE of an encoding, subject to an entropy constraint. This is accomplished by minimizing the cost function

J = E[\rho(x, c)] + \lambda E[l(c)],   (3.1)

where x is the data, c is the encoded version of x, ρ(x, c) is the mean-squared error of representing x by c, λ is a Lagrange multiplier, and l(c) = -\log_2 P(c|A_i) is the number of bits used by a variable-length code to represent c. This process considers both the MSE and the number of bits used by the variable-length code to choose the "best" codeword.
It was shown in [69] that for rates greater than 2.5 bpp, optimum codebooks do not yield increased performance over uniform codebooks. Thus, optimum codebooks with 256 elements were designed in one-tenth-bit increments for rates up to 2.4 bpp, with uniform codebooks used thereafter.
The sequence corresponding to the DC coefficients is assumed to have Gaussian statistics, while the remaining high-frequency sequences are assumed to have Laplacian statistics [74]. Therefore, training sequences consisted of 100,000 samples derived from Gaussian and Laplacian pseudorandom number generators, respectively.
3.3 Side Information and Rate Allocation
The side information required for this algorithm is substantial. In principle, the mean, standard deviation, and correlation coefficient for each spectral vector must be transmitted. Additionally, the transform coefficients of the first band must be transmitted, along with the initial states for the trellises used to encode spectral vectors having nonzero rates (assuming 4-state trellises, each initial state requires 2 bits).
Fortunately, for a given spatial frequency, the variance of the correlation coefficients is extremely small. That is, if a sequence is formed by collecting the spectral correlation coefficients from each block at a given frequency, the variance of the sequence is negligible compared to its mean or average value. Thus, only the average correlation coefficients (quantized using 16-bit uniform scalar quantizers) are transmitted as side information and used for encoding and rate allocation. Since for an 8 × 8 DCT there are 64 spatial frequencies, only 64 average correlation coefficients need to be transmitted.
Similarly, the variance of the spectral standard deviations is quite small for all spatial frequencies except DC. Thus, the average spectral standard deviations (again, quantized to 16 bits) are used in encoding and transmitted as side information for each of the 63 nonDC spatial frequencies. The DC spectral standard deviations change considerably throughout the image, but are highly correlated in neighboring blocks.
Thus, the DC spectral standard deviations are quantized using ECPTCQ with raster scan (back and forth) at 5.25 bits/sample.¹ The "extra" side information required for encoding this sequence is 66 bits (16 bits each for the initial value, mean, standard deviation, and correlation coefficient of the sequence, and 2 bits for the initial trellis state).
Unfortunately, the means of the spectral vectors are quite random in nature, except for the DC spectral means, which exhibit a very high degree of correlation (typically 0.99) if scanned in an order similar to that of the DC spectral standard deviations. Hence, ECPTCQ is also used to encode this sequence at 5.25 bits/sample (plus an additional 66 bits, as in the case of the DC spectral standard deviations).

¹The highest-rate codebook available was used to ensure accurate representation of these values.
Since the remaining spectral means exhibit no significant correlation, ECTCQ is used to encode the 63 sequences formed by grouping all spectral means for a given (non-DC) spatial frequency ("like-means"). These sequences are encoded at an average rate chosen so that the rate required for all spectral means (including the DC means) is R_μ bits/pixel. The first spectral band, which is encoded in the same manner as the non-DC spectral means, is assigned an average rate of R_1 bits/pixel.
Rate allocation is performed using the algorithm described in [75]. This scheme uses the rate-distortion performance of different quantizers to provide a near-optimal allocation of bits, given an overall bit quota. The overall MSE incurred by encoding the N^2 spectral vectors using ECPTCQ at an average rate of R_s bits/coefficient is represented by

E_s = \sum_{i=1}^{N^2} \sigma_i^2 E_{ij}(r_i),   (3.2)

where σ_i^2 is the variance of sequence i, and E_{ij}(r_i) denotes the rate-distortion performance of the jth quantizer (e.g., ρ = 0.6 to 0.95 in 0.05 increments) at r_i bits/sample. The rate allocation vector B = (r_1, r_2, ..., r_{N^2}) is chosen such that E_s is minimized, subject to an average rate constraint:

\frac{1}{N^2} \sum_{i=1}^{N^2} r_i \le R_s  bits/coefficient.   (3.3)
It is shown in [75] that the solution B* = (r_1^*, r_2^*, ..., r_{N^2}^*) to the unconstrained problem

\min_B \sum_{i=1}^{N^2} \left[ \sigma_i^2 E_{ij}(r_i) + \lambda r_i \right]   (3.4)

minimizes E_s subject to \sum_{i=1}^{N^2} r_i \le \sum_{i=1}^{N^2} r_i^*. Thus, to find a solution to the constrained problem of equations (3.2) and (3.3), it suffices to find λ such that the solution to equation (3.4) yields \sum_{i=1}^{N^2} r_i^* \le N^2 R_s. Procedures for finding the appropriate λ are given in [75].
For a given λ, the solution to the unconstrained problem is obtained by minimizing each term of the sum in (3.4) separately. If S_j = {p_j, ..., q_j} is the set of allowable rates for the jth quantizer and r_i^* is the ith component of the solution vector B*, then r_i^* solves

\min_{r_i \in S_j} \left\{ \sigma_i^2 E_{ij}(r_i) + \lambda r_i \right\}.   (3.5)
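The Lagrangian search of (3.4)-(3.5) can be sketched as follows; the bisection on λ, the toy high-rate distortion model D(r) = 2^{-2r}, and the variances used in the example are illustrative assumptions:

```python
def allocate_rates(variances, rd_curve, rates, budget, lam_hi=100.0):
    """Lagrangian bit allocation per (3.4)-(3.5): for a given multiplier,
    each rate is chosen independently; lambda is then bisected until the
    average rate meets the budget. rd_curve(r) is the (assumed) normalized
    distortion of the quantizer at rate r."""
    def solve(lam):
        # per-sequence minimization of sigma_i^2 * E(r) + lambda * r, eq. (3.5)
        return [min(rates, key=lambda r: v * rd_curve(r) + lam * r)
                for v in variances]
    lo, hi = 0.0, lam_hi
    for _ in range(60):                    # bisection on the multiplier
        lam = (lo + hi) / 2
        if sum(solve(lam)) / len(variances) > budget:
            lo = lam                       # too many bits: raise the rate penalty
        else:
            hi = lam
    return solve(hi)                       # feasible side of the bisection

# toy example: quarter-bit rates from 0 to 5.25, five sequence variances
rates = [0.25 * k for k in range(22)]
variances = [4.0, 2.0, 1.0, 0.5, 0.25]
alloc = allocate_rates(variances, lambda r: 2 ** (-2 * r), rates, budget=2.0)
```

With a convex distortion model, higher-variance sequences receive more bits, and the average allocated rate never exceeds the budget.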
Testing revealed that equal performance was obtained whether globally allocating bits using all N^2 spectral variances, or by using the 64 average spectral variances (as discussed above). The latter approach is adopted since the rate allocation algorithm need only take into account 64 spectral variances (rather than N^2), thus resulting in far fewer computations. Also, this results in all blocks having identical spectral bit maps. All spectral variances are quantized prior to being used in the spectral rate allocation algorithm.
As in the case of the spectral vectors themselves, rate allocation for the spectral means and the first spectral band is accomplished using the algorithm of [75]. The scalar-quantized variances of the "like-coefficient" and "like-mean" sequences are used in the rate allocation algorithms for the first band and spectral means, respectively.
It should be noted that the rate allocation for the spectral vectors is used to constrain the spatial rate allocation of the first spectral band. Thus, if any spectral vector is assigned zero rate, the corresponding transform coefficient in the first spectral band is also assigned zero rate. Finally, any spectral vector that is assigned zero rate is set to (the quantized value of) its corresponding spectral mean.
The total side information required for encoding an N × N hyperspectral image consists of the spectral correlation coefficients (64 · 16 bits), the spectral standard deviations (63 · 16 + 5.25(N^2/8^2) + 66 bits), the spectral means (R_μ N^2 bits), the first spectral band (R_1 N^2 bits), and the initial trellis states (2n bits, where n ≤ N^2 is the number of spectral vectors assigned a nonzero encoding rate). Combining these quantities yields (2098 + 2n + (R_μ + R_1 + 0.082)N^2) bits. The overall encoding rate for the system operating on a hyperspectral image with L bands is then (R_s(L − 1) + R_μ + R_1 + 0.082 + (2098 + 2n)/N^2)/L bits/pixel/band (b/p/b).
Figure 3.2 shows the overall encoding rate of a hyperspectral sequence as a function of the number of spectral bands. In this particular case, the asymptotic rate R_a is 0.10 bits/coefficient, and the encoding rates of the first band, R₁, and the spectral means, R_μ, are 1.0 and 2.0 bits/pixel (bpp), respectively. Note that for small sequences (e.g., fewer than ten bands), the side information dominates the overall rate.
3.4 Results
Coding simulations were carried out using a 140-band, 8-bit hyperspectral data sequence from Cuprite, Nevada, obtained by the AVIRIS system. The bands were
Figure 3.2: Overall rate versus number of spectral bands.
256 × 256 pixels and were taken from larger images (for computational simplicity).
The performance of the encoding system is reported using the peak signal-to-noise ratio, defined as

$$ \mathrm{PSNR} = 10 \log_{10} \frac{(255)^2}{\frac{1}{N^2} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \left| \hat{I}(u,v) - I(u,v) \right|^2} \qquad (3.6) $$

where Î(u,v) is the coded version of the original band I(u,v).
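Equation (3.6) translates directly into a few lines of NumPy. This is a generic sketch (the helper name is ours), assuming 8-bit data with peak value 255; it is not code from the original system.

```python
import numpy as np

def psnr(original, coded, peak=255.0):
    """Peak signal-to-noise ratio of Eq. (3.6), in dB."""
    err = coded.astype(np.float64) - original.astype(np.float64)
    mse = np.mean(err ** 2)  # (1/N^2) * sum of squared differences
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a coded band that differs from the original by exactly one gray level everywhere has MSE = 1 and therefore PSNR = 10 log₁₀(255²) ≈ 48.13 dB.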
For a desired asymptotic rate of R_a = 0.1 b/p/b, the first band in the sequence was quantized at R₁ = 1.0 bpp and the spectral means were quantized at R_μ = 2.0 bpp. It was found that these rates are a good compromise between MSE performance and side information. These choices lead to side information totaling about 3.65 bpp.
For short subsequences of hyperspectral data, this amount of side information can be quite significant. However, if all 140 bands are coded, the side information is only 0.025 b/p/b.
Figure 3.3 shows the PSNRs obtained by encoding bands 30 through 69 of the hyperspectral sequence with R_a = 0.10 b/p/b. If all 140 bands were encoded, this would correspond to an overall rate of about 0.125 b/p/b. The average PSNR of the coded sequence is 43.10 dB, with the PSNR of some bands approaching 46 dB.
The dip in PSNR around bands 56 and 57 is indicative of high sensor noise that is clearly evident upon visual examination. Figures A.1 and A.2 show the original and coded image from band 50, respectively, while Figure A.3 is the difference image obtained by displaying the magnitude of the error². The coded image is virtually indistinguishable from the original, with no artifacts or contrast variations. All fine spatial detail is preserved with essentially no blurring. This subjective performance is indicative of the entire sequence. Note the complete absence of structure in the difference image.
The overall encoding rate for short sequences can be significantly reduced (at the expense of MSE performance) by assigning less rate to the first spectral band and the spectral means. For example, Figure 3.4 shows the PSNR values of coding bands 30 through 69 at R_a = 0.1 b/p/b as before, but with R₁ = 0.75 bpp and R_μ = 0.75 bpp. Although the average PSNR for the sequence has dropped to 41.34 dB (a decrease of 1.76 dB from the previous case), the performance of the system is still
²For the purpose of visual display, the difference images discussed herein were multiplied by a factor of 45 and hard clipped at 255.
Figure 3.3: Performance of encoding hyperspectral sequence at R_a = 0.10 b/p/b, R₁ = 1.0 bpp, and R_μ = 2.0 bpp.

extremely good. The advantage gained, however, is the significant drop in overall rate for shorter subsequences. Figure A.4 shows the coded version of band 50 with the reduced side information. Note that this image is nearly identical to the image coded with greater side information (and to the original), as would be expected from its PSNR of 42.1 dB. The difference image of Figure A.5 reveals greater error and structure as compared to the case of high side information.
The system achieves a compression ratio of 42:1 at an average PSNR of 43.10 dB when encoding a 40band sequence. The same level of performance is attainable at compression ratios of 64:1 if 140 bands are coded. Additionally, compression ratios
Figure 3.4: Performance of encoding hyperspectral sequence at R_a = 0.10 b/p/b, R₁ = 0.75 bpp, and R_μ = 0.75 bpp.

approaching 70:1 can be obtained if coding all 209 bands of the AVIRIS system. Further, by decreasing the side information rates R₁ and R_μ to 0.75 bpp each, compression ratios of 40:1, 53:1, 70:1, and 73:1 can be obtained for sequences of length 20, 40, 140, and 209, respectively, with a decrease of less than 2 dB in average PSNR.
There are many operational modes of the system, each obtained by adjusting R_a, R₁, R_μ, and the number of trellis states. The two modes used here were chosen to give extremely high quantitative and subjective quality of the coded sequences (at very low rates), with varied amounts of side information. Depending on the quality required by the application, the system can easily be operated at compression ratios in excess of 100:1. This mode may be ideal for browsing or rapid analysis of the data.
The computational complexity of our coder is moderate, with the majority of computations being used to compute the DCT of each spectral band. Since 4-state trellises are used exclusively throughout the system, the quantization process requires minimal computation [26]. However, the transform coefficients of the entire hyperspectral sequence must reside in memory, which may limit its use to ground-based applications. Of course, much of this problem can be alleviated by breaking the sequence into shorter subsequences and coding each individually.
CHAPTER 4
3D DCT HYPERSPECTRAL CODER
4.1 System Description
A hyperspectral coding system using the 3D DCT and ECTCQ is shown in Figure 4.1. The image sequence is partitioned into 8 × 8 × 8 cubes and transformed using the 3D DCT. Coefficients corresponding to the same position within each cube ("like-coefficients") are collected into sequences to be encoded using ECTCQ.
For a hyperspectral sequence with L bands, each of size N × N, the total number of sequences to be encoded is 8³ = 512, each of length LN²/512.
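The cube partitioning and like-coefficient collection can be sketched as follows. This is our own illustrative reconstruction (the function name is ours), with SciPy's multidimensional DCT standing in for whatever transform implementation the original system used.

```python
import numpy as np
from scipy.fft import dctn

def like_coefficient_sequences(seq):
    """Partition an (L, N, N) hyperspectral array into 8x8x8 cubes,
    apply the 3-D DCT to each, and collect like-coefficients.

    Returns a (512, L*N*N//512) array: row k holds the sequence of
    coefficients occupying position k within every transformed cube.
    """
    L, N, _ = seq.shape
    out = np.empty((512, (L * N * N) // 512))
    idx = 0
    for z in range(0, L, 8):
        for y in range(0, N, 8):
            for x in range(0, N, 8):
                cube = dctn(seq[z:z+8, y:y+8, x:x+8], norm='ortho')
                out[:, idx] = cube.ravel()  # raveled position 0 is the DC term
                idx += 1
    return out
```

Row 0 is the DC sequence whose mean must be transmitted; the remaining 511 rows are the (nominally zero-mean) AC sequences.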
Figure 4.1: Hyperspectral image coder using the 3D DCT and ECTCQ.
4.2 Codebook Design
The probability distribution of each like-coefficient sequence is modeled by the so-called generalized Gaussian distribution (GGD), whose probability density function (pdf) is given by

$$ p(x) = \frac{\alpha\, \eta(\alpha,\sigma)}{2\, \Gamma(1/\alpha)} \exp\left\{ -\left[ \eta(\alpha,\sigma)\, |x| \right]^{\alpha} \right\} \qquad (4.1) $$

where

$$ \eta(\alpha,\sigma) = \frac{1}{\sigma} \left[ \frac{\Gamma(3/\alpha)}{\Gamma(1/\alpha)} \right]^{1/2}. \qquad (4.2) $$

The shape parameter α describes the exponential rate of decay, and σ is the standard deviation of the associated random variable [59]. The gamma function Γ(·) is defined as

$$ \Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx. \qquad (4.3) $$
Distributions corresponding to α = 1.0 and 2.0 are Laplacian and Gaussian, respectively. Figure 4.2 shows generalized Gaussian pdfs corresponding to α = 0.5, 1.0, 1.5, 2.0, and 2.5.
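Equations (4.1)–(4.2) are straightforward to evaluate numerically; the sketch below is our own illustration using SciPy's gamma function. At α = 2 it should reduce to the standard normal pdf, which makes a convenient check.

```python
import numpy as np
from scipy.special import gamma

def ggd_pdf(x, alpha, sigma=1.0):
    """Generalized Gaussian pdf of Eqs. (4.1)-(4.2)."""
    eta = (1.0 / sigma) * np.sqrt(gamma(3.0 / alpha) / gamma(1.0 / alpha))
    return (alpha * eta / (2.0 * gamma(1.0 / alpha))) \
        * np.exp(-(eta * np.abs(x)) ** alpha)
```

With α = 2 and σ = 1, `ggd_pdf(0.0, 2.0)` gives 1/√(2π) ≈ 0.3989, matching the Gaussian, while α = 1 yields the unit-variance Laplacian.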
It can be shown that

$$ E[X^4] = \sigma^4\, \frac{\Gamma(5/\alpha)\, \Gamma(1/\alpha)}{\Gamma^2(3/\alpha)} \qquad (4.4) $$

or

$$ K = \frac{E[X^4]}{\sigma^4} = \frac{\Gamma(5/\alpha)\, \Gamma(1/\alpha)}{\Gamma^2(3/\alpha)} \qquad (4.5) $$

where K is the normalized fourth central moment, or kurtosis. Recall that the kurtosis is a measure of the peakedness of a given distribution. If a pdf is symmetric about its
Figure 4.2: Probability density function for generalized Gaussian distributions with alpha values of 0.5, 1.0, 1.5, 2.0, and 2.5.

mean and is sharply peaked in the vicinity of its mean (with correspondingly heavy tails), the coefficient of kurtosis is relatively large. Similarly, a pdf that is flat about its mean has a small kurtosis value.
The sample kurtosis of any sequence can be calculated easily and used as a measure by which the distribution of the sequence can be determined. Figure 4.3 shows the relationship between the shape parameter α and K. This graph is used to determine the appropriate α for a particular sequence.
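The α-versus-K relationship can also be used programmatically rather than read off the graph. The following sketch is ours (helper names are hypothetical); the candidate α set matches the codebooks designed below, and the nearest model kurtosis wins.

```python
import numpy as np
from scipy.special import gamma

def ggd_kurtosis(alpha):
    """Model kurtosis K of a generalized Gaussian, Eq. (4.5)."""
    return gamma(5.0 / alpha) * gamma(1.0 / alpha) / gamma(3.0 / alpha) ** 2

def fit_alpha(seq, candidates=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Choose the codebook alpha whose model kurtosis is closest to the
    sample kurtosis of `seq` (Figure 4.3 used as a lookup)."""
    x = np.asarray(seq, dtype=np.float64)
    x = x - x.mean()
    k_sample = np.mean(x ** 4) / np.mean(x ** 2) ** 2
    return min(candidates, key=lambda a: abs(ggd_kurtosis(a) - k_sample))
```

As expected, Gaussian data (K ≈ 3) maps to α = 2.0 and Laplacian data (K ≈ 6) to α = 1.0.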
Codebooks were designed for generalized Gaussian distributions with α values of 0.5, 0.75, 1.0, 1.5, and 2.0, using the algorithm in [69]. It was shown in [69] that for the Gaussian distribution, optimum codebooks do not yield significant MSE improvement over uniform codebooks at rates greater than 2.5 bits/sample. Experimentation
Figure 4.3: Kurtosis versus alpha.

revealed that this is also true for α = 1.5 and α = 1.0. However, for α = 0.75, optimum codebooks are superior up to 3.0 bits/sample, while for α = 0.5, optimum codebooks should be used up to 3.5 bits/sample. Accordingly, for α values of 2.0, 1.5, and 1.0, optimum codebooks were designed in one-tenth-bit increments up to 2.5 bits/sample, while for α = 0.75 and α = 0.5, optimum codebooks were designed in one-tenth-bit increments up to 3.0 and 3.5 bits/sample, respectively. Thereafter, uniform codebooks were designed in one-tenth-bit increments up to 12 bits/sample.
Training sequences consisted of 100,000 samples derived from generalized Gaussian pseudo-random number generators, each tuned to the appropriate α value.
4.3 Side Information and Rate Allocation
Given the three-dimensional frequency space in the transform domain, where the x and y axes represent the spatial dimensions and the z axis represents the spectral dimension, all like-coefficient sequences should (theoretically) be zero mean except the DC sequence (i.e., (x, y, z) = (0, 0, 0)). In fact, the sample means of the like-coefficient sequences drop off very rapidly in any frequency direction relative to DC, with the mean of the DC sequence typically being two orders of magnitude greater than the means of those sequences corresponding to frequencies one position higher along any axis (i.e., (x, y, z) = (1,0,0), (0,1,0), and (0,0,1)). Thus, only the DC mean should need to be transmitted as side information. Experiments confirm this for the (x, 0, 0) and (0, y, 0) sequences, which have sample standard deviations at least an order of magnitude larger than their corresponding sample means. On the other hand, the extremely high correlation along the z-axis (spectral dimension) causes a sharp drop in standard deviation along that axis. As a result, the sample means along this axis are within an order of magnitude of their respective standard deviations. For this reason, we chose to transmit all eight sample means for the sequences with coordinates of the form (0, 0, z).
The side information then consists of 512 standard deviations and 8 spatial means. These quantities are quantized using 16-bit uniform scalar quantizers, resulting in a total of (520)(16) = 8320 bits. The initial trellis states are also transmitted, which (for 4-state trellises) total (512)(2) = 1024 bits. Combining these quantities yields 9344 bits of side information per hyperspectral sequence.
Rate allocation is performed by using the algorithm described in [75]. The overall MSE incurred by encoding the like-coefficient sequences using ECTCQ at an average rate of R_a bits/coefficient is represented by

$$ E = \sum_{i=1}^{512} \sigma_i^2\, E_{ij}(r_i) \qquad (4.6) $$

where σᵢ² is the variance of sequence i, and E_{ij}(rᵢ) denotes the rate-distortion performance of the jth quantizer (e.g., the quantizer corresponding to the kurtosis of sequence i) at rᵢ bits/sample.
4.4 Results
Coding simulations were performed using a 140-band hyperspectral image sequence obtained from the AVIRIS system. The bands were 256 × 256 pixels and were taken from larger images (for computational simplicity). The performance of the system is reported using PSNR.
Figure 4.4 shows the PSNRs obtained by encoding bands 30 through 69 of the hyperspectral sequence at an asymptotic bit rate of 0.1 b/p/b. The average PSNR of the coded sequence is 40.75 dB. Figure A.6 shows the coded version of band 50 from the sequence. The coded image is very similar to the original (shown in Figure A.1).
We note a very fine blurring in some areas, but no contrast variations or artifacts
Figure 4.4: Performance of encoding hyperspectral sequence at R_a = 0.10 b/p/b.

are present. This subjective performance is indicative of the entire sequence. The difference image of Figure A.7 shows a fairly small coding error with very little structure.
Figure 4.5 shows the overall coding rate of the system as a function of sequence length. The overall rate for encoding the 40-band sequence is 0.104 b/p/b (for R_a = 0.1 b/p/b). If all 140 bands were coded, the overall rate would be 0.101 b/p/b. It is evident that the overall rate (for a given asymptotic rate) required by the 3D DCT system is roughly independent of image sequence length and image size for sequences longer than 15 bands, after which the side information contributes less than 10% to the overall rate.
Figure 4.5: Overall rate versus number of spectral bands.
The 3D DCT system achieves a compression ratio of 77:1 at an average PSNR of
40.75 dB. The same level of performance is attainable at compression ratios of 79:1 if all 140 bands were encoded. Moreover, the subjective performance of the coder is excellent.
The complexity of the 3D DCT coder is moderate to high, being more computationally demanding than either the ECPTCQ coder discussed in Chapter 3, or the hybrid coder in Chapter 5, since the sequence is transformed in all three dimensions. Moreover, the entire sequence must reside in memory for calculation of sample statistics and encoding. The memory requirements can be lessened by breaking the sequence into shorter subsequences and coding each independently.
CHAPTER 5
HYBRID DPCM/DCT HYPERSPECTRAL CODER
5.1 System Description
DPCM is a simple and well-known method of achieving moderate compression of correlated sequences [13], [76], [12]. Given a pixel x_{i−1}, the next pixel in the sequence, x_i, is predicted. If x̂_{i|i−1} is the predicted value of x_i, then the difference e_i = x_i − x̂_{i|i−1} will, on average, be significantly smaller in magnitude than x_i.
Accordingly, fewer quantization bins, and thus fewer bits, are required to encode the error sequence than would be required to encode the sequence of pixels. It can be shown that for a nonzero-mean input sequence, the optimum (minimum-MSE) first-order linear predictor is given by

$$ \hat{x}_{i|i-1} = \mu + \rho\,(x_{i-1} - \mu) \qquad (5.1) $$

where μ and ρ are the mean and correlation coefficient of the sequence, respectively, and x̂_{i|i−1} is the predicted value of x_i. It is apparent that the input sequence must be normalized to zero mean, or the sequence mean must be included in (5.1). In either case, for applying DPCM to a nonzero-mean sequence, the mean must be transmitted as side information.
Figure 5.1: Hybrid DPCM/DCT hyperspectral image encoder.
The utilization of DPCM to exploit the spectral correlation of hyperspectral imagery is straightforward. For an L-band image of size N × N, an ordinary DPCM loop could be employed to encode each of the N² sequences that result from treating each hyperspectral pixel as a sequence of length L. Unfortunately, while this scheme exploits spectral correlation, it does not exploit the spatial correlation inherent to the data, and requires the transmission of N² spectral means.
These problems can be circumvented by using the encoder configuration shown in Figure 5.1 [77], [78], [79]. Here, the DPCM loop operates on entire images rather than on individual spectral sequences. Given an image X_n, the next image in the hyperspectral sequence is estimated, and an "error image" is formed from the difference E_n = X_n − X̂_{n|n−1}. The error image (at each instant in time) is spatially correlated and can be quantized using any image coding scheme. Note that the error image must be decoded within the encoder loop so that the quantized image, X̂_n, can be constructed and used to predict the next image.
The prediction error images have much lower energy than the original bands and can be subjected to very coarse quantization (less than 0.1 bits/pixel) without introducing "blocking effects." In fact, the bit rate chosen to encode each error image will be the asymptotic bit rate of the system.
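The image-level loop of Figure 5.1 can be sketched in a few lines. This is our own illustration: `quantize` stands in for the 2-D DCT/ECTCQ error-image coder, and mean handling (carried by the DC sequence in the real system) is omitted.

```python
import numpy as np

def spectral_dpcm(bands, rho=0.95, quantize=np.round):
    """Image-level DPCM along the spectral axis.

    Each error image E_n = X_n - rho * Xhat_{n-1} is coded by `quantize`
    and decoded inside the loop, so prediction always uses reconstructed
    (not original) bands, exactly as the decoder will.
    """
    xhat = bands[0].astype(np.float64)   # first band transmitted separately
    recon = [xhat]
    for X in bands[1:]:
        pred = rho * xhat
        e_hat = quantize(X - pred)       # coarsely coded error image
        xhat = pred + e_hat
        recon.append(xhat)
    return recon
```

Only two bands are ever needed at once (the incoming band and the previous reconstruction), which is the memory advantage noted in the results below.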
Testing of AVIRIS data revealed that the spectral correlation coefficient ρ for any pixel in the image is approximately 0.95. Accordingly, this value of ρ was used in the DPCM loop.
Coding of the error image is similar to the system described in [29]. In that system, an image is partitioned into 16 × 16 blocks and the 2D DCT is performed on each block. Coefficients corresponding to the same position within each block ("like-coefficients") are collected into sequences to be encoded using ECTCQ. Codebooks were designed by using a training-image approach. Specifically, a group of images (similar to the test image) was collected, partitioned, and transformed (as discussed above). The "DC" coefficients of each block were collected as a sequence, as were the remaining "high-frequency" coefficients. These two training sequences were then used to form codebooks in quarter-bit increments (one set each for the DC and high-frequency coefficients) using the ECTCQ training algorithm developed in [69]. A rate allocation scheme was developed in which an assumed distortion model was used.
5.2 Codebook Design, Rate Allocation, and Side Information
Our coder differs from that in [29] with respect to codebook design and rate allocation. In our system, the sequence corresponding to the DC coefficients is assumed to have Gaussian statistics, while the remaining high-frequency sequences are assumed to have Laplacian statistics [74]. Codebooks are designed in one-tenth-bit increments up to 2.5 bits per pixel (bpp) using 100,000-sample sequences derived from Gaussian and Laplacian pseudo-random number generators, respectively (using the algorithm in [69]). It was shown in [69] that for rates greater than 2.5 bpp, optimum codebooks do not yield increased performance over uniform codebooks. Thus, at rates greater than 2.4 bpp, uniform codebooks are used. Rate allocation is performed using the algorithm in [75].
The DC sequence is normalized by subtracting its mean and dividing by its standard deviation. The non-DC sequences are normalized by dividing by their respective standard deviations (the non-DC sequences are assumed to have zero mean). The sequences are then encoded using ECTCQ. The total number of sequences, M, when using 8 × 8 blocks, is 64 (one DC and 63 non-DC sequences).
The side information required to encode each error image (and the first spectral band) consists of the mean of the DC sequence and the standard deviations of all M = 64 sequences. These quantities are quantized using 16-bit uniform scalar quantizers to yield 16(M + 1) bits. In addition, the initial trellis state of each sequence requires 2 bits (for a 4-state trellis), which yields 2M bits. The total side information is then (18M + 16) bits, which corresponds to 1168/(256)² = 0.0178 bpp for a 256 × 256 image or 1168/(512)² = 0.00446 bpp for a 512 × 512 image. The first spectral band is encoded and transmitted (as the initial conditions for DPCM) at a total rate (including side information) of R₁ bits/pixel.
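The per-band side-information arithmetic checks out numerically; a small helper of our own, with the chapter's M = 64:

```python
def hybrid_side_info_bpp(N, M=64):
    """Per-error-image side information of the hybrid coder, in bits/pixel:
    16-bit quantities for the DC mean and the M standard deviations,
    plus a 2-bit initial trellis state per sequence -> (18M + 16) bits.
    """
    return (16 * (M + 1) + 2 * M) / (N * N)

print(hybrid_side_info_bpp(256))   # 1168/65536, about 0.0178 bpp
print(hybrid_side_info_bpp(512))   # 1168/262144, about 0.00446 bpp
```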
5.3 Results
Coding simulations were performed using a 140-band hyperspectral image sequence obtained from the AVIRIS system. The bands were 256 × 256 pixels, and the performance of the coder is reported using the peak signal-to-noise ratio.
The first band in the sequence is quantized at R₁ = 0.75 bits/pixel and is used as the initial condition for the spectral DPCM. This rate was chosen so that the PSNR of the coded first band did not significantly deviate from the average PSNR of the sequence when encoded at an asymptotic rate R_a of 0.10 bits/pixel/band (b/p/b).
Figure 5.2 shows the PSNRs obtained by encoding bands 30 through 69 of the hyperspectral sequence at an asymptotic bit rate¹ of R_a = 0.1 b/p/b. The average PSNR of the coded sequence is 40.29 dB. Figure A.8 shows the coded image from band 50 of the sequence. The coded image is very similar to that obtained from the 3D DCT system (shown in Figure A.6). We note a very slight blurring of the edges, especially in the higher gray-level regions. We also note that although each error image was subjected to very coarse quantization (e.g., 0.1 bpp), no blocking effects are observed. The error image shown in Figure A.9 reveals greater coding error and structure than either the ECPTCQ system (with high or low side information) or the 3D DCT system. This result is expected since the PSNR of the coded band is lower than that obtained from the coders discussed previously.

¹It should be noted that the asymptotic rate of the hybrid system includes the side information required to encode each error image, since the contribution of the side information to the overall rate will remain constant regardless of sequence length.
Figure 5.3 shows the overall coding rate (including all side information) of the system as a function of sequence length. Note that for encoding short subsequences (i.e., fewer than 10 bands), the overall rate of the hybrid system is dominated by the rate required to code the first spectral band. The overall rate for encoding the 40-band sequence is 0.116 b/p/b when R_a = 0.1 b/p/b. If all 140 bands were coded, the overall rate would be 0.105 b/p/b. It is evident that at least 65 bands are required such that the overall rate is within 10% of the asymptotic rate (when R₁ = 0.75 bpp).
The hybrid system achieves a compression ratio of 69:1 at an average PSNR of
40.29 dB. The same level of performance is attainable at a compression ratio of 76:1, if all 140 bands were encoded. Moreover, the subjective performance of the coder is excellent.
The hybrid system is the most computationally tractable of the three systems presented here, with the majority of computations being used to compute the 2D DCT.
The most significant advantage of the hybrid system, however, is the small amount of memory required for encoding/decoding. The encoder requires only 2 bands at
Figure 5.2: Performance of encoding hyperspectral sequence at R_a = 0.10 b/p/b.

Figure 5.3: Overall rate versus number of spectral bands.
once, while the ECPTCQ and 3D DCT coders require the entire hyperspectral sequence to reside in memory. Accordingly, the hybrid coder is an ideal candidate for sensor-based applications.
CHAPTER 6
COMPARISONS OF HYPERSPECTRAL CODERS
The three hyperspectral image coders presented in the present work produce very high-quality results at extremely low bit rates. Figure 6.1 shows the PSNRs obtained by encoding the sequence at R_a = 0.1 b/p/b with the ECPTCQ, 3D DCT, and hybrid systems. Note that the average PSNR of the ECPTCQ coder is 2.35 dB better than that of the 3D DCT coder and is 2.81 dB better than that of the hybrid coder. Additionally, the 3D DCT coder outperforms the hybrid coder by 0.46 dB, with average PSNRs of 40.75 dB and 40.29 dB, respectively.
The increased performance of the ECPTCQ system (for a given asymptotic rate) as compared to the hybrid and 3D DCT systems is largely attributable to the increased side information required by the ECPTCQ system. The ECPTCQ coder, in effect, has access to more information about the sequence than do the other two coders. For short sequences, the increased side information adds considerably to the overall rate and makes the results of Figure 6.1 slightly misleading.
A more accurate comparison of the coders is shown in Figure 6.2, where the overall rate of the coders is plotted as a function of the number of spectral bands.
For this figure, the encoding rates of the hybrid and 3D DCT coders were adjusted so that the average PSNR values for encoding the 40band sequence equaled that of
Figure 6.1: Performance of encoding hyperspectral sequence at R_a = 0.10 b/p/b.

the ECPTCQ system (with high side information). For the hybrid coder, R₁ (the encoding rate of the first spectral band) was increased to 1.0 bpp so that the PSNR of encoding the first band more closely matched the average PSNR of the sequence. To achieve performance comparable to the ECPTCQ system, the hybrid coder must be operated at an asymptotic rate of R_a = 0.262 b/p/b, while the 3D DCT system requires an asymptotic rate of 0.187 b/p/b.
Clearly, as the sequence length becomes shorter, the 3D DCT and hybrid systems achieve comparable performance at lower overall rates than the ECPTCQ system, since they require less side information.
For example, for sequences shorter than 20, the hybrid system has a lower overall rate than ECPTCQ, while the 3D DCT system has the lowest overall rate (of the
Figure 6.2: Overall rate versus number of spectral bands.

three coders) when encoding sequences of less than about 55 bands. In fact, the side information of the 3D DCT system contributes less than 10% to the overall rate for sequences longer than 8 bands, and the hybrid system requires a sequence of only 30 bands such that its overall rate is within 10% of its asymptotic rate. This is in sharp contrast to the ECPTCQ system, where sequences longer than 200 bands are needed such that the side information does not significantly contribute to the overall rate.
For performance comparisons, we have listed the results from [9] for coding hyperspectral data from the Airborne Imaging Spectrometer. It was shown that for encoding a 32-band sequence, compression ratios of 3.4:1, 6.7:1, and 8:1 were obtained by using DPCM, block truncation coding, and transform coding, with PSNR values of 35.73, 31.35, and 32.60 dB, respectively. Various VQ schemes yielded performances ranging from 32.09 to 33.53 dB at compression ratios ranging from 32:1 to 21.3:1, respectively.
For encoding a 40-band hyperspectral sequence, the ECPTCQ, 3D DCT, and hybrid systems achieve compression ratios of 42:1, 77:1, and 69:1 at average PSNRs of 43.1, 40.75, and 40.29 dB, respectively. The same level of performance is attainable at compression ratios of 64:1, 79:1, and 76:1, respectively, if all 140 bands were encoded. Moreover, the subjective performance of the coders is excellent. Further, by decreasing the side information of the ECPTCQ system to R₁, R_μ = 0.75 bpp, compression ratios of 53:1 and 70:1 can be obtained for sequences of length 40 and 140, respectively, with a decrease of less than 2 dB in average PSNR, as compared to the high-side-information case.
Note that when encoding the sequence with the ECPTCQ or hybrid systems, the PSNR values of the first few spectral bands are highly dependent upon the bit rate, R₁, chosen to encode the first spectral band. It seems reasonable to choose R₁ such that the PSNR of the first band is roughly equal to the average PSNR of the remaining bands. For the ECPTCQ system, R₁ was chosen as 1.0 bpp (when R_μ = 2.0 bpp and R_a = 0.1 b/p/b), while R₁ for the hybrid system was chosen as 0.75 bpp (when R_a = 0.1 b/p/b). However, when the ECPTCQ system is operated with low side information (e.g., R_μ = 0.75 bpp), R₁ was selected as 0.75 bpp to reflect the lower average PSNR of the coded sequence.
Despite having lower PSNR performance, the hybrid system gains advantages over the 3D DCT coder in terms of computational simplicity, and over both the 3D DCT and ECPTCQ systems in terms of memory requirements. The hybrid system requires only two bands at once to encode an image sequence, while the 3D DCT and ECPTCQ systems use the entire transformed sequence (of course, much of this problem can be alleviated by breaking the sequence into shorter subsequences and coding each individually). Moreover, the hybrid coder uses simple DPCM in the spectral dimension, while the 3D DCT coder uses transform operations in all three dimensions, thus affording the hybrid coder lower complexity. The complexity of the ECPTCQ coder is comparable to that of the hybrid system. Since 4-state trellises are used exclusively in the three systems, the quantization process requires minimal computations [26].
Finally, we note that our systems need not be optimized for any particular class of imagery, thus making the coders very robust to a wide range of scenery. Many of the coders mentioned in the literature use codebooks trained with specific imagery which may result in image class dependencies. Consequently, those coders require some form of scenebased codebook selection mechanism (either supervised or unsupervised) to operate optimally with varying terrain.
The interband correlation of hyperspectral imagery facilitates substantial reduction of the data required for storing and/or transmitting such imagery. However, a careless approach to reducing the correlation could lead to disastrous loss of the information differences between bands that are the critical value of multispectral imagery. Accordingly, subjective and quantitative performance are only starting points for the evaluation of any hyperspectral image coder. Future studies involving techniques such as spectral signature mapping and mixture component analysis [9] may help reveal the true performance of the systems.
CHAPTER 7
SUMMARY
In Chapter 1, we discussed the formation of hyperspectral imagery and its utility in identifying specific materials within a given scene. We defined the problems associated with storage and transmission of hyperspectral data, and found that lossless (i.e., information-preserving) compression can provide compression ratios of only 3:1, which is insufficient to deal with the immense quantities of data produced by hyperspectral sensors. Lossy compression schemes for multispectral data were reviewed and found to be capable of providing compression ratios of greater than 30:1 by exploiting spatial and spectral redundancies inherent to multispectral data.
Chapter 2 reviewed the quantization process and described various scalar quantizers. Vector quantizers were presented and shown to provide substantial gains over scalar quantization via three different mechanisms: granular gain, boundary gain, and nonuniform density gain. Trellis coded quantization was reviewed and shown to outperform scalar quantization as well, with far less complexity than VQ.
Entropy-constrained TCQ was shown to provide near-optimal performance (in a rate-distortion-theory sense) for encoding all memoryless sources at all nonnegative encoding rates. TCQ was shown to be far more computationally tractable than VQ for a given encoding rate (at comparable performance levels). A new predictive coding scheme called entropy-constrained predictive trellis coded quantization (ECPTCQ) was developed for encoding autoregressive sources. For encoding a Gauss-Markov source, an 8-state ECPTCQ system outperforms entropy-constrained DPCM by up to 1.0 dB at all nonnegative encoding rates, and comes within 0.5 dB of the rate-distortion function at high rates.
In Chapter 3, a hyperspectral coding scheme was developed which uses ECPTCQ. Specifically, the hyperspectral image sequence is spatially decorrelated by applying a 2D DCT to nonoverlapping 8 × 8 blocks. Thereafter, ECPTCQ is used to encode the transform coefficients in the spectral dimension. The first spectral band is encoded (as initial conditions for ECPTCQ) using entropy-constrained TCQ. ECPTCQ and ECTCQ codebooks are designed using a modified version of the generalized Lloyd algorithm. Spatial and spectral rate allocation is handled by an iterative optimization algorithm. For encoding an 8-bit hyperspectral sequence at 0.125 b/p/b, an average PSNR of 43.1 dB is obtained.
A 3D DCT hyperspectral coding system is presented in Chapter 4. The hyperspectral sequence is divided into nonoverlapping 8 × 8 × 8 cubes, and "like-coefficients" are collected and encoded using ECTCQ. Codebooks are optimized for various generalized Gaussian distributions, with the appropriate codebook being chosen based on the normalized fourth central moment, or kurtosis. For encoding a 40-band hyperspectral sequence, a compression ratio of 77:1 is obtained with an average PSNR of 40.75 dB.
A third system is presented in Chapter 5. This coder is a hybrid DPCM/DCT configuration whereby DPCM is used in the spectral dimension and each "error image" is coded using a 2D DCT coding scheme. The error-image coder transforms 8 × 8 blocks and encodes like-coefficients using ECTCQ. Compression ratios of 69:1 are obtainable while maintaining an average PSNR of 40.29 dB over the encoded bands.
Comparisons of the hyperspectral coders were provided in Chapter 6. It was shown that at a given asymptotic rate, the highest level of performance is obtained with the ECPTCQ coder, followed by the 3D DCT and hybrid coders, respectively. When the average PSNR levels of the coders are matched, the 3D DCT coder has the lowest overall rate when the sequence length is less than 55 bands. The computational complexity of the ECPTCQ and hybrid systems is roughly equivalent, with the 3D DCT system being the most computationally intensive. The hybrid system requires only two bands at a time for encoding purposes, while the ECPTCQ and 3D DCT systems require the entire hyperspectral sequence. Accordingly, the hybrid system is suitable for sensor-based applications, while the ECPTCQ and 3D DCT coders are better suited to ground-based applications.
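The PSNR figures quoted throughout are the standard peak-signal measure for 8-bit data; a minimal sketch:

```python
import numpy as np

def psnr(original, encoded, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 log10(peak^2 / MSE),
    with peak = 255 for 8-bit imagery."""
    o = np.asarray(original, dtype=float)
    e = np.asarray(encoded, dtype=float)
    mse = np.mean((o - e) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```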
Appendix A
PHOTOGRAPHS
Figure A.1: Band 50 of a hyperspectral sequence (256 x 256).
Figure A.2: Encoded image from ECPTCQ coder with high side information (44.37 dB at 0.19 b/p/b).
Figure A.3: Difference image from ECPTCQ coder with high side information.
Figure A.4: Encoded image from ECPTCQ coder with low side information (42.11 dB at 0.17 b/p/b).
Figure A.5: Difference image from ECPTCQ coder with low side information.
Figure A.6: Encoded image from 3D DCT coder (41.66 dB at 0.104 b/p/b).
Figure A.7: Difference image from 3D DCT coder.
Figure A.8: Encoded image from hybrid coder (41.01 dB at 0.116 b/p/b).
Figure A.9: Difference image from hybrid coder.
REFERENCES
[1] A. F. H. Goetz, G. Vane, J. E. Solomon, and B. N. Rock, "Imaging spectrometry for earth remote sensing," Science, vol. 228, pp. 1147-1153, June 1985.
[2] L. Blanchard and O. Weinstein, "Design challenges of the Thematic Mapper," IEEE Trans. Geosci. Remote Sensing, vol. GE-18, pp. 146-160, Apr. 1980.
[3] P. J. Curran and J. L. Dungan, "Estimation of signal-to-noise: A new procedure applied to AVIRIS data," IEEE Trans. Geosci. Remote Sensing, vol. 27, pp. 620-628, Sept. 1989.
[4] G. Vane and A. F. H. Goetz, "Terrestrial imaging spectrometry," Remote Sensing Environ., vol. 24, pp. 1-29, 1988.
[5] W. M. Porter and H. T. Enmark, "A system overview of the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," Imaging Spectroscopy II, G. Vane, Editor, Proc. SPIE 834, pp. 22-29, 1987.
[6] G. Vane, "First results from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," Imaging Spectroscopy II, G. Vane, Editor, Proc. SPIE 834, pp. 166-174, 1987.
[7] D. Rockey, "High Resolution Imaging Spectrometer (HIRIS) - A major advance in imaging spectrometry," Imaging Spectroscopy of the Terrestrial Environment, G. Vane, Editor, Proc. SPIE 1298, pp. 93-104, 1990.
[8] W. Esaias and W. Barnes, "Moderate Resolution Imaging Spectrometer (MODIS) instrument panel report," NASA, Washington, DC, EOS Rep., vol. 2b, 1986.
[9] R. L. Baker and Y. T. Tze, "Compression of high spectral resolution imagery," Applications of Digital Image Processing XI, A. G. Tescher, Editor, Proc. SPIE 974, pp. 255-264, 1988.
[10] A. Jain, "Image data compression: A review," Proc. IEEE, vol. 69, pp. 349-389, Mar. 1981.
[11] T. M. Chen, D. H. Staelin, and R. B. Arps, "Information content analysis of Landsat image data for compression," IEEE Trans. Geosci. Remote Sensing, vol. GE-25, pp. 499-501, July 1987.
[12] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[13] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[14] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression. New York, NY: Plenum Press, 1988.
[15] B. R. Epstein, R. Hingorani, J. M. Shapiro, and M. Czigler, "Multispectral KLT-wavelet data compression for Landsat Thematic Mapper images," Proc. Data Compression Conf., pp. 200-208, Apr. 1992.
[16] M. Manohar and J. C. Tilton, "Progressive vector quantization of multispectral image data using a massively parallel SIMD machine," Proc. Data Compression Conf., pp. 181-190, Apr. 1992.
[17] S. Gupta and A. Gersho, "Feature predictive vector quantization of multispectral images," IEEE Trans. Geosci. Remote Sensing, vol. 30, no. 3, pp. 491-501, 1992.
[18] S. Gupta and A. Gersho, "Variable rate multistage vector quantization of multispectral imagery with greedy bit allocation," Visual Communications and Image Processing, B. G. Haskell and H. Hang, Editors, Proc. SPIE 2094, pp. 890-901, 1993.
[19] S. Jaggi, "An investigative study of multispectral lossy data compression using vector quantization," Hybrid Image and Signal Processing III, D. P. Casasent and A. G. Tescher, Editors, Proc. SPIE 1702, pp. 238-249, 1992.
[20] J. C. Tilton, D. Han, and M. Manohar, "Compression experiments with AVHRR data," Proc. Data Compression Conf., pp. 411-420, Apr. 1991.
[21] J. C. Tilton, "Hierarchical data compression: Integrated browse, moderate loss, and lossless levels of data compression," Proc. of the International Geoscience and Remote Sensing Symp., pp. 1651-1654, 1990.
88
[22] J. A. Saghri and A. G. Tescher, "NearIossless bandwidth compression for radiometric data," Opt. Engr., vol. 30, pp. 934939, July 1991.
[23] R. F. Rice, "Some practical universal noiseless coding techniques," JPL Publi
cation, vol. 7922, Mar. 15 1979.
[24] R. F. Rice and J. Lee, "Some practical universal noiseless coding techniques II,"
JPL Publication, vol. 8317, Mar. 1 1983.
[25] P. G. Howard and J. S. Vitter, "New methods for lossless image compression using arithmetic coding," Proc. Data Compression Conf., pp. 257266, Apr.
1991.
[26] M. W. Marcellin and T. R. Fischer, "Trellis coded quantization of memoryless and GaussMarkov sources," IEEE Trans. Commun., vol. COM38, pp. 8293,
Jan. 1990.
[27] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans.
Inform. Th., vol. 28, pp. 5567, Jan. 1982.
[28] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268278,
Mar. 1973.
[29] M. W. Marcellin, P. Sriram, and K. Tong, "Transform coding of monochrome and color images using trellis coded quantization," IEEE Trans. Circuits and
Systems for Vidio Technology, vol. 3, pp. 270276, Aug. 1993.
[30] P. Sriram and M. W. Marcellin, "Image coding using wavelet transforms and entropyconstrained trellis coded quantization," Submitted to IEEE Trans. Im
age Proc.
[31] G. K. Wallace, "The JPEG still picture compression standard," Communications
of the ACM, vol. 34, pp. 3044, Apr. 1991.
[32] M. L. Liou, "Overwiew of the px64 kbps video coding standard," Communica
tions of the ACM, vol. 34, pp. 5963, Apr. 1991.
[33] D. L. Le Gall, "MPEG: A video compression standard for multimedia applications," Communications of the ACM, vol. 34, pp. 4658, Apr. 1991.
[34] M. T. Sun, T. C. Chen, A. Gottlieb, L. Wu, and M. L. Liou, "A 16 x 16 Discrete Cosine Transform chip," Visual Communications and Image Processing II, T. R. Hsing, Editor, Proc. SPIE 845, pp. 13-18, 1987.
[35] A. M. Chiang, "A video-rate CCD two-dimensional cosine transform processor," Visual Communications and Image Processing II, T. R. Hsing, Editor, Proc. SPIE 845, pp. 2-5, 1987.
[36] F. Jutand, N. Demassieux, M. Dana, J.-P. Durandeau, and G. Concordel, "A 13.5 MHz single chip multiformat discrete cosine transform," Visual Communications and Image Processing II, T. R. Hsing, Editor, Proc. SPIE 845, pp. 6-12, 1987.
[37] R. J. Clarke, Transform Coding of Images. Orlando, FL: Academic Press, 1985.
[38] W. H. Chen and C. H. Smith, "Adaptive coding of monochrome and color images," IEEE Trans. Commun., vol. COM-25, pp. 1285-1292, Nov. 1977.
[39] W. H. Chen and W. K. Pratt, "Scene adaptive coder," IEEE Trans. Commun., vol. COM-32, pp. 225-232, Mar. 1984.
[40] W. A. Pearlman, P. Jakatdar, and M. M. Leung, "Adaptive transform tree coding of images," IEEE J. Select. Areas in Commun., vol. 10, pp. 902-912, June 1992.
[41] R. E. Crochiere, S. A. Webber, and J. L. Flanagan, "Digital coding of speech in subbands," Bell Syst. Tech. J., vol. 55, pp. 1069-1085, Oct. 1976.
[42] A. Croisier, D. Esteban, and C. Galand, "Perfect channel splitting by use of interpolation/decimation/tree decomposition techniques," in Conf. Proceedings, 1976 IEEE Int. Conf. on Inform. Sci., Patras, Greece, May 1976.
[43] M. Vetterli, "Multidimensional subband coding: Some theory and algorithms," Signal Processing, vol. 6, pp. 97-112, Apr. 1984.
[44] J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-34, pp. 1278-1288, Oct. 1986.
[45] H. Gharavi and A. Tabatabai, "Subband coding of monochrome and color images," IEEE Trans. Circuits Syst., vol. CAS-35, pp. 207-214, Feb. 1988.
[46] P. H. Westerink, D. E. Boekee, J. Biemond, and J. W. Woods, "Subband coding of images using vector quantization," IEEE Trans. Commun., vol. COM-36, pp. 713-719, June 1988.
[47] M. J. T. Smith and S. L. Eddins, "Analysis/synthesis techniques for subband image coding," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-38, pp. 1446-1456, Aug. 1990.
[48] S. Nanda and W. A. Pearlman, "Tree coding of image subbands," IEEE Trans. Image Proc., vol. IP-1, pp. 133-147, Apr. 1992.
[49] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. and Mach. Intel., vol. 11, pp. 674-693, July 1989.
[50] S. G. Mallat, "Multifrequency channel decomposition of images and wavelet models," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-37, pp. 2091-2110, Dec. 1989.
[51] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Commun. Pure Appl. Math., vol. 41, pp. 909-996, Nov. 1988.
[52] A. Grossmann and J. Morlet, "Decomposition of Hardy functions into square integrable wavelets of constant shape," SIAM J. Math. Anal., vol. 15, pp. 723-736, July 1984.
[53] J. Max, "Quantizing for minimum distortion," IEEE Trans. Inform. Th., vol. IT-6, pp. 7-12, Mar. 1960.
[54] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[55] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, pp. 520-540, June 1987.
[56] F. Jelinek, "Buffer overflow in variable length coding of fixed rate sources," IEEE Trans. Inform. Th., vol. IT-14, pp. 490-501, May 1968.
[57] N. Farvardin and J. W. Modestino, "On overflow and underflow problems in buffer-instrumented variable-length coding of fixed-rate memoryless sources," IEEE Trans. Inform. Th., vol. IT-32, pp. 839-845, Nov. 1986.
[58] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Th., vol. IT-14, pp. 676-683, Sept. 1968.
[59] N. Farvardin and J. W. Modestino, "Optimum quantizer performance for a class of non-Gaussian memoryless sources," IEEE Trans. Inform. Th., vol. 30, pp. 485-497, May 1984.
[60] T. Berger, "Optimum quantizers and permutation codes," IEEE Trans. Inform. Th., vol. IT-18, pp. 759-765, Nov. 1972.
[61] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[62] T. R. Fischer, "Geometric source coding and vector quantization," IEEE Trans. Inform. Th., vol. 35, pp. 137-145, Jan. 1989.
[63] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer Academic Press, 1992.
[64] R. Laroia, "Design and analysis of a fixed-rate structured vector quantizer derived from variable-length scalar quantizers," Ph.D. dissertation, University of Maryland, College Park, MD, May 1992.
[65] M. V. Eyuboglu and G. D. Forney, Jr., "Lattice and trellis quantization with lattice- and trellis-bounded codebooks - High rate theory for memoryless sources," IEEE Trans. Inform. Th., vol. 39, pp. 46-59, Jan. 1993.
[66] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Inform. Th., vol. IT-25, pp. 373-380, July 1979.
[67] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, Jan. 1980.
[68] T. R. Fischer, "A pyramid vector quantizer," IEEE Trans. Inform. Th., vol. 32, pp. 568-583, July 1986.
[69] M. W. Marcellin, "On entropy-constrained trellis coded quantization," To appear in IEEE Trans. Commun., vol. 42, Jan. 1994.
[70] P. Chou, T. Lookabaugh, and R. M. Gray, "Entropy-constrained vector quantization," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-37, pp. 31-42, Jan. 1989.
[71] N. Farvardin and J. W. Modestino, "Rate-distortion performance of DPCM schemes for autoregressive sources," IEEE Trans. Inform. Th., vol. 31, pp. 402-418, May 1985.
[72] E. Ayanoglu and R. M. Gray, "The design of predictive trellis waveform coders using the generalized Lloyd algorithm," IEEE Trans. Commun., vol. COM-34, pp. 1073-1080, Nov. 1986.
[73] T. R. Fischer and M. Wang, "Entropy-constrained trellis coded quantization," IEEE Trans. Inform. Th., vol. 38, pp. 415-425, Mar. 1992.
[74] R. C. Reininger and J. D. Gibson, "Distributions of the two-dimensional DCT coefficients for images," IEEE Trans. Commun., vol. COM-31, pp. 835-839, June 1983.
[75] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Trans. Acoust., Speech, and Signal Proc., vol. 36, pp. 1445-1453, Sept. 1988.
[76] R. C. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, 1989.
[77] J. W. Woods, Editor, Subband Image Coding. Boston, MA: Kluwer Academic Press, 1991.
[78] Y. Q. Zhang and S. Zafar, "Motion-compensated wavelet transform coding for color video compression," IEEE Trans. Circuits and Systems for Video Technology, vol. 2, pp. 285-296, Sept. 1992.
[79] S. Zafar, Y. Q. Zhang, and B. Jabbari, "Multiscale video representation using multiresolution motion compensation and wavelet decomposition," IEEE J. Selected Areas in Commun., vol. 11, pp. 24-35, Jan. 1993.