Preferential Coding for Mobile Multimedia Services

Muhammad Imran Iqbal

Blekinge Institute of Technology Licentiate Dissertation Series No. 2010:07
ISSN 1650-2140
ISBN 978-91-7295-182-2
Blekinge Institute of Technology
Licentiate Dissertation Series No. 2010:07

Preferential Coding for Mobile Multimedia Services

Muhammad Imran Iqbal

Department of Electrical Engineering
School of Engineering
Blekinge Institute of Technology
SWEDEN
© 2010 Muhammad Imran Iqbal
Department of Electrical Engineering
School of Engineering
Publisher: Blekinge Institute of Technology
Printed by Printfabriken, Karlskrona, Sweden 2010
ISBN 978-91-7295-182-2
Blekinge Institute of Technology Licentiate Dissertation Series
ISSN 1650-2140
urn:nbn:se:bth-00468
Abstract
Different parts of source encoded multimedia streams, such as those associated with standard image or video formats, possess different levels of importance with respect to their contribution to the quality of the reconstructed image or video. This unequal importance among the data within a codestream gives rise to preferential treatment of the more significant parts of the codestream compared to the less important parts. Similarly, the visual information offered by certain regions of an image or video may attract viewers' attention more than other parts of the viewing area. As a consequence, preferential treatment of important data and information can play a vital role in mobile multimedia services in order to preserve satisfactory quality of service under the harsh conditions of a band-limited, error-prone wireless channel. In this thesis, we therefore investigate how preferential coding
can be used to protect multimedia services more efficiently against transmission
errors. For this purpose, an error sensitivity analysis of the specific application
is utilized as a basis to design efficient unequal error protection (UEP) schemes.
The performance of the proposed preferential coding schemes is evaluated using
objective perceptual quality metrics in order to account for the fact that humans
are the ultimate judges of service quality.
The thesis is divided into four parts. In the first part, region of interest (ROI)
identification, coding and advantages of ROI coding for different applications are
investigated. In addition, a framework is proposed for using ROI coding in wireless imaging. The second part analyses the error sensitivity of wireless JPEG2000 (JPWL) in terms of perceptual quality metrics. It is also shown that using reduced-reference perceptual quality metrics as an error sensitivity descriptor (ESD) in JPWL increases the effectiveness of the ESD. Specifically, these metrics correlate well with subjective quality assessment and provide an additional estimate of the obtained quality. In the third part, two UEP schemes for JPWL are proposed
and compared with equal error protection (EEP). Their performance is evaluated
in terms of perceptual quality metrics, such as the structural similarity index and the visual information fidelity criterion of the reconstructed image, and their benefit over EEP is revealed. Finally, in the fourth part, a framework for optimized preferential coding of ROI-based images and videos is proposed. Specifically, a dynamic programming algorithm for optimal parity distribution is provided.
Preface
This licentiate thesis summarizes part of my work within the field of mobile multimedia. The work has been performed at the Department of Electrical Engineering,
School of Engineering, at Blekinge Institute of Technology.
The thesis consists of four parts:
Part I: On Region of Interest Coding for Wireless Imaging
Part II: Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics
Part III: Quality Assessment of Error Protection Schemes for Wireless JPEG2000
Part IV: A Framework for Error Protection of Region of Interest Coded Images and Videos
Acknowledgements
I would like to take this opportunity to thank all the people, without whom it
would not have been at all possible for me to achieve what I have achieved so far
in my private and professional life.
First of all, I would like to express my deep and sincere gratitude to my supervisor Prof. Hans-Jürgen Zepernick for giving me the great opportunity to work with him as his graduate student. His guidance and support have been invaluable throughout this research. I have benefited from his knowledge and experience all these years. I also thank him for all the mentoring he has provided me in all aspects of professional life. It has been an absolute pleasure working with him.
I would also like to thank my co-supervisor Dr. Mats Pettersson for referring me to Prof. Hans-Jürgen Zepernick in the first place and for his encouragement and support later on during this work. I am very grateful to Dr. Benny Lövström for being so friendly and for the many discussions we have had from time to time on different topics. Many thanks to all colleagues at the department for their help and for providing a great work environment. Special thanks to Mikael Swartling for helping me resolve LaTeX issues. I am also grateful to Madeleine Jarlten, Lena Brandt Gustafsson and Marie Ahlgren for helping me with administrative matters.
I would like to thank my dear friends Robert and Anna for introducing me to the delights of Swedish culture and hospitality. I will never forget the evenings and the other times I spent with them on different occasions. Special thanks to all of my friends in Sweden and abroad who bring joy and pleasure to my life.
Finally, I want to express my gratitude to my parents, who provided me with the opportunity to study abroad and have given me immeasurable love and support. I am also thankful to my brother and sisters for their love and care. My brother Ishfan Iqbal has been a continuous source of motivation for me since my childhood.
This research was partially funded by the Graduate School of Telecommunications (GST), administered by the Royal Institute of Technology (KTH), Stockholm, Sweden. I have also received funding from the European Networks of Excellence EuroFGI and EuroNF for attending summer schools and PhD courses.
Muhammad Imran Iqbal
Karlskrona, July 2010
Publications
Publications included in this thesis:
Part I is published as
M. I. Iqbal and H.-J. Zepernick, “On Region of Interest Coding for Wireless
Imaging,” in Proceedings of International Conference on Signal Processing and
Telecommunication Systems, Gold Coast, Australia, pp. 198-208, Dec. 2007.
Part II is published as
M. I. Iqbal, H.-J. Zepernick, and U. Engelke, “Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics,” in Proceedings of International Conference on Signal Processing and Communication Systems, Gold Coast,
Australia, pp. 1-9, Dec. 2008.
Part III is published as
M. I. Iqbal and H.-J. Zepernick, “Quality Assessment of Error Protection Schemes
for Wireless JPEG2000,” in Research Report, Blekinge Institute of Technology, no.
4, 2010, ISSN: 1103-1581.
M. I. Iqbal, H.-J. Zepernick, and U. Engelke, “Perceptual-based Quality Assessment of Error Protection Schemes for Wireless JPEG2000,” in Proceedings of IEEE International Symposium on Wireless Communication Systems, Siena, Italy, pp. 348-352, Sept. 2009.
Part IV is submitted as
M. I. Iqbal and H.-J. Zepernick, “A Framework for Error Protection of Region of
Interest Coded Images and Videos,” EURASIP Journal: Image Communication,
2010, under review.
Other publications:
M. I. Iqbal and H.-J. Zepernick, “Error Protection for Wireless Imaging: Providing
a Trade-off Between Performance and Complexity,” in Proceedings of IEEE International Symposium on Communications and Information Technologies, Tokyo,
Japan, October 2010.
M. I. Iqbal, H.-J. Zepernick, M. Fiedler, and J. Shaikh, “Spatio-Temporal Quality of Experience Trade-offs for Mobile Imaging Applications,” in Proceedings
of Workshop on Quality of Experience for Multimedia Content Sharing, Tampere,
Finland, June 2010.
A. M. Aibinu, M. I. Iqbal, A. A. Shafie, M. J. E. Salami, and M. Nilsson, “Vascular
Intersection Detection in Retina Fundus Images Using a New Hybrid Approach,”
ELSEVIER Journal: Computers in Biology and Medicine, vol. 40, no. 1, pp. 81-89, Jan. 2010.
M. I. Iqbal, A. Aibinu, M. Nilsson, I. Tijani, and M. Salami, “Detection of Vascular Intersection in Retina Fundus Image Using Modified Cross Point Number
and Neural Network Technique,” in Proceedings of International Conference on
Computer and Communication Engineering, Kuala Lumpur, Malaysia, pp. 241-246, May 2008.
A. M. Aibinu, M. I. Iqbal, M. Nilsson and M. J. E. Salami, “Automatic Diagnosis of Diabetic Retinopathy from Fundus Images Using Digital Signal and Image
Processing Techniques,” in Proceedings of International Conference on Robotics,
Vision, Information, and Signal Processing, Penang, Malaysia, Nov. 2007.
A. M. Aibinu, M. I. Iqbal, M. Nilsson and M. J. E. Salami, “A New Method
of Correcting Uneven Illumination Problem in Fundus Images,” in Proceedings of
International Conference on Robotics, Vision, Information, and Signal Processing, Penang, Malaysia, Nov. 2007.
Contents

Abstract
Preface
Acknowledgements
Publications
Contents

Introduction
1 Mobile Multimedia Services
2 Properties of the Human Visual System
3 Multimedia Quality Assessment
4 Error Protection for Mobile Multimedia
5 Thesis Overview and Contributions
   5.1 Part I: On Region of Interest Coding for Wireless Imaging
   5.2 Part II: Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics
   5.3 Part III: Quality Assessment of Error Protection Schemes for Wireless JPEG2000
   5.4 Part IV: A Framework for Error Protection of Region of Interest Coded Images and Videos

I On Region of Interest Coding for Wireless Imaging
   1 Introduction
   2 General Concepts of ROI Coding
      2.1 Region of Interest
      2.2 Image Classes and Region of Interest
      2.3 Quality Assessment of ROI Coding Schemes
   3 Framework of ROI Coding for Wireless Imaging
   4 ROI Identification
      4.1 Visual Attention Based ROI Identification
      4.2 Knowledge Based ROI Identification
      4.3 ROI Identification for Document Images
   5 ROI Coding Techniques
      5.1 Non-Standard Techniques
      5.2 ROI Coding in the JPEG2000 Standard
      5.3 Amendments to ROI Coding in JPEG2000
      5.4 Comparison of ROI Coding Methods
   6 Advantages of ROI Coding in Different Applications
      6.1 General Performance Characteristics of ROI Coding
      6.2 Exploring ROI Coding for Wireless Imaging
   7 Conclusions

II Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics
   1 Introduction
   2 JPEG2000 and JPWL
      2.1 JPEG2000 Image Compression and Codestream
      2.2 JPWL Image Compression and Markers
   3 Error Sensitivity Description
   4 Objective Image Quality Metrics
      4.1 Image Fidelity Metrics
      4.2 Perceptual Relevance Weighted LP-norm
      4.3 Normalized Hybrid Image Quality Metric
      4.4 Structural Similarity Index
      4.5 Visual Information Fidelity
   5 Numerical Results
      5.1 System Settings
      5.2 Results of the Error Sensitivity Analysis
   6 Conclusions

III Quality Assessment of Error Protection Schemes for Wireless JPEG2000
   1 Introduction
   2 Wireless JPEG2000
      2.1 JPWL System Description
      2.2 JPWL Marker Segments
   3 Image Quality Assessment
      3.1 Fidelity Metrics
      3.2 Perceptual Quality Metrics
   4 Error Protection for Wireless JPEG2000
      4.1 Equal Error Protection
      4.2 Unequal Error Protection - Strategy 1
      4.3 Unequal Error Protection - Strategy 2
   5 Results and Discussions
      5.1 Performance Comparison for Additive White Gaussian Noise Channel
      5.2 Performance Comparison for Fading Channel
   6 Conclusions

IV A Framework for Error Protection of Region of Interest Coded Images and Videos
   1 Introduction
   2 Framework for Preferential Error Protection
   3 Splitting the Codestream into ROI and BG Cells
      3.1 Splitting Approach for Images
      3.2 Splitting Approach for Videos
   4 Parity Allocation to ROI and BG Cells
   5 Parity Allocation to ROI and BG Packets
      5.1 Problem Statement
      5.2 Solution to the Problem
      5.3 Complexity Considerations
   6 Numerical Results
      6.1 Simulation Setting
      6.2 Results and Discussion
   7 Conclusions
Introduction
Multimedia services have become an essential part of our daily life. We are surrounded by voice, audio and video based services in different technologies operated over wired and wireless links. A tremendous increase in the provision of these services over wireless links has become apparent in recent years, due to advancements in wireless technologies that can now offer high data rates to both stationary and mobile users. This trend is expected to continue in the years to come. According to recent estimates [1, 2], the global mobile data traffic in cellular networks will double every year, with video streaming constituting its major part.
User mobility is the key feature associated with modern wireless technologies, allowing users to benefit from their subscriptions to mobile multimedia services whether they are at home or on the move. Along with these benefits, it brings great challenges in providing services over wireless links with acceptable quality of service (QoS), because of the continuously varying wireless channel between the transmitter and receiver. When it comes to multimedia services such as wireless image and video services, which are inherently data rate demanding, the limited bandwidth of the wireless channel imposes additional challenges. From the perspective of a service provider, bandwidth is an asset, and its efficient utilization is the key to serving more users without the additional cost of obtaining more bandwidth.
Source coding partially solves the problem by reducing the bandwidth requirements for these services, but at the cost of increased vulnerability of the compressed data to errors. Channel coding, or error protection, is used to improve the error resilience of compressed codestreams. However, finding the optimal error protection for mobile multimedia services is a very complex problem. The complexity arises from its dependency on many factors, including channel conditions, the allowable data rate, the distortion-rate characteristics of the multimedia source, and the size of the multimedia data. Furthermore, an error protection scheme that is optimal for one channel condition may no longer be optimal if the channel changes, which
is very common when users are mobile. Efforts have been made [3–10] to provide unequal error protection (UEP) schemes that are both simple and effective. Yet, most of the developed techniques are either computationally demanding, and hence not applicable in live systems, or suitable only for specific applications, channel codes, and/or data packet formats.
Quality evaluation of mobile multimedia services is another issue, since the best quality assessment method, i.e., conducting a subjective experiment, is very expensive and not applicable to in-service quality monitoring [11, 12]. Fidelity metrics such as the mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are commonly used for quality assessment and the evaluation of error protection techniques, but they are not always consistent with subjective tests [11, 12]. Similarly, the evaluation of the above-mentioned UEP techniques is typically done in terms of MSE and/or PSNR. Perceptual quality metrics such as the structural similarity (SSIM) index [12] and the visual information fidelity (VIF) criterion [13] possess better correlation with human quality assessment. Using perceptual quality metrics in quality monitoring and in the evaluation of error protection strategies may therefore help to improve the quality of mobile multimedia services.
In this thesis, we aim at exploiting characteristics of the human visual system
(HVS) for channel coding to improve the end user quality of mobile multimedia
services, specifically, wireless imaging and video services. In particular, we have
used the concept of region of interest (ROI) to design error protection strategies
which give preferential treatment to ROI over the background and in this way take
into account the visual importance of ROI. As such, the ROI quality is largely
preserved at the receiver at the cost of some loss in background quality. Since
the background quality loss is typically less annoying to the viewer compared to
degradation in ROI quality, the overall visual quality of an image or video will be
improved. In addition, by dividing a codestream into multiple shorter streams representing the ROIs and the background, the rather complex task of error protecting a long codestream is translated into the smaller problems of providing separate error protection for relatively short streams. As a consequence, the overall complexity of designing an error protection scheme can be reduced. Furthermore, the evaluation of the proposed error protection schemes is done in terms of perceptual quality metrics, which tailors the error protection schemes to the quality perception of the end user.
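The idea of protecting the ROI and background substreams to different degrees can be sketched with a simple proportional rule. This is purely illustrative: the function, the stream weights, and the allocation rule below are assumptions for the sketch, not the optimized parity distribution developed in this thesis.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    name: str      # e.g. "ROI" or "BG"
    size: int      # payload bytes in this substream
    weight: float  # relative visual importance (assumed given)

def split_parity(streams, parity_budget):
    """Distribute a parity-byte budget across substreams in proportion to
    size times visual importance (a simple proportional rule, for
    illustration only)."""
    total = sum(s.size * s.weight for s in streams)
    return {s.name: int(parity_budget * s.size * s.weight / total)
            for s in streams}
```

For example, an ROI substream of 100 bytes with weight 3.0 and a background substream of 300 bytes with weight 1.0 would split a 120-byte parity budget evenly, even though the background is three times larger.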
The rest of this introduction provides some background on the topic and gives the reader an outline of the scope of the thesis. Section 1 gives an overview of some common mobile multimedia services. In Section 2, properties of the human visual system are described, along with how they can be used to improve source and channel coding. Various metrics used for multimedia quality assessment are discussed in Section 3. Section 4 describes error control coding and how error protection can be applied efficiently to mobile multimedia services in order to maintain acceptable QoS for various channel coding rates and channel conditions. Finally, Section 5 gives an overview of the thesis and summarizes its major contributions.
1 Mobile Multimedia Services
Multimedia can be defined as the use of multiple media such as text, graphics, audio, images, video and user interactivity to convey information [14]. It can also be described in a narrower sense as a set of software and hardware means used to create, store and transmit information presented in various digital formats [15]. Multimedia is a general term, but we will refer to multimedia data as data associated with speech, audio, images, and video. In this work, we will in particular focus on images and videos. Mobile multimedia services refer to multimedia services that are provided to mobile users. These services include the transmission of images, the streaming of audio or video, gaming and multimedia messaging services, as shown in Fig. 1.
Mobile audio and video services include live music, TV and video on demand, obtained on mobile devices. Wireless imaging services consist of wireless image transmission from one device to another. Wireless imaging is supported in modern wireless technologies such as Wi-Fi and Bluetooth. The latest digital cameras also support this service and enable the user to send photos to anyone at the time of capture. Voice over IP (VoIP) represents a family of technologies that use the Internet or other packet-switched networks for voice services. The Multimedia Messaging Service (MMS) supports any combination of text, graphics, photos, audio clips, and video clips within certain size limits. Gaming services comprise network games for mobile users.
Figure 1: Composition of mobile multimedia services (audio/video, imaging, VoIP, MMS, gaming, and others).
2 Properties of the Human Visual System
Alongside the radio technologies, source and channel coding have been a sharp focus of research over the last two decades. On the source coding front, novel techniques have been developed to compress the source information as much as possible with minimum loss in quality. These approaches include those that make use of our existing knowledge about the workings of the human auditory and visual systems. For example, the HVS is less sensitive to the color (chrominance) information in an image or video scene than to the brightness (luminance) information. Exploiting this property has allowed many image and video compression systems, including the well-known Joint Photographic Experts Group (JPEG) image format, to achieve better compression without a noticeable loss in visual quality. Other features of the HVS, some of which are exploited in modern compression systems, include different levels of sensitivity to moving and static objects in a video scene, foveated vision and ROI.
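As a minimal sketch of how this chrominance insensitivity is exploited, the following function performs 4:2:0-style chroma subsampling: the luma plane is kept at full resolution while each 2x2 block of the chrominance planes is averaged into a single sample. The function name and interface are illustrative assumptions, not part of any particular codec.

```python
import numpy as np

def subsample_chroma_420(y, cb, cr):
    """4:2:0-style subsampling: keep the luma plane at full resolution and
    average each 2x2 block of the chrominance planes into one sample."""
    def pool2x2(plane):
        h, w = plane.shape
        # trim to even dimensions, then average non-overlapping 2x2 blocks
        p = plane[: h // 2 * 2, : w // 2 * 2]
        return p.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, pool2x2(cb), pool2x2(cr)
```

The chroma planes then carry only a quarter of their original samples, yet the visual impact is small precisely because of the HVS property described above.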
In an image or a video scene, some objects or regions attract the viewer's attention more than the rest of the viewing scene. These regions are called ROIs. This is illustrated in Fig. 2, which shows a parked aircraft. A viewer of this image is more likely to be attracted to the aircraft than to the background, which makes the aircraft the ROI of this particular image, as highlighted by the white rectangle in Fig. 2a. The separate ROI and background are shown in Fig. 2b and Fig. 2c, respectively.

Figure 2: Aircraft image: (a) ROI highlighted, (b) ROI, (c) Background.
Based on the ROI and foveated vision properties of the HVS, it has been shown [16, 17] that the quality degradation due to distortions occurring in the ROI is more noticeable to an observer than the degradation due to the same amount of distortion occurring in the background. Thus, separate and prioritized processing of the ROI and background parts, spending more resources on preserving the ROI quality than on the background, can be useful in providing the underlying wireless imaging or video services with acceptable QoS.
3 Multimedia Quality Assessment
The digital representation of images and videos has enabled many features that were not possible with their analogue counterparts, such as compression and error control coding. Many lossy and lossless compression algorithms have been developed, providing trade-offs between compression ratio and the resulting quality. Lossy compression is most commonly used in real systems due to its higher compression ratio. In order to judge the performance of a compression system, we need to quantify the resulting quality or distortion after compression. Subjective assessment provides the best estimate of quality, as humans are the final judges of quality. However, subjective experiments are expensive and not suitable for in-service quality monitoring [11, 12]. Therefore, various objective quality metrics have been developed to assess image and video quality, which have also been used to examine the performance of various image/video processing and compression algorithms.
Objective quality metrics can be divided into three main groups: full-reference (FR), reduced-reference (RR) and no-reference (NR) quality metrics. The difference among these groups is the amount of reference information needed to estimate the quality of the distorted image or video. Specifically, FR metrics require the original image or video in order to quantify the quality of the impaired image, while NR metrics require nothing but the impaired image for quality assessment. RR metrics do not require the original image but only some reduced information from it, such as certain features, to estimate the quality of an impaired image or video. FR metrics include the PSNR, the SSIM index [12] and the VIF criterion [13]. RR metrics include the perceptual relevance weighted LP-norm [11], the extreme value normalized hybrid image quality metric (NHIQM) [18], and the metric proposed in [19], while the metrics proposed in [20] and [21] belong to the NR group.
In the following, a brief description of some of these metrics is provided:
• PSNR: As the name suggests, the PSNR is defined as the ratio of peak signal power to average noise power and is specified in decibels (dB). The peak signal corresponds to the dynamic range of the pixel values, while the noise is given by the difference between the pixels of the impaired and the original image. As such, the PSNR is based on a pixel-by-pixel comparison of the original and the distorted image.
• SSIM Index: This metric works on a block basis, dividing the impaired image into small rectangular windows. On this basis, it measures the structural similarity between each window of the impaired image and the corresponding window of the original image. A single quality value is then obtained by averaging the structural similarity values of all windows.
• VIF Criterion: Based on a statistical information model of natural scenes, the VIF criterion quantifies the visual information present in the reference image. The quality of an impaired image is then related to the extent to which the same information can be extracted from it. This model estimates the perceptual annoyance caused by different artifacts rather than measuring the artifacts themselves.
• LP-norm: The perceptual relevance weighted LP-norm is based on image features such as blocking, blur, image activity, and intensity masking that quantify the presence of certain artifacts in an image. It first extracts these features from both the distorted and the original image. The distortion measure for the impaired image, referred to as the LP-norm, is then obtained from the weighted difference between each feature value calculated for the impaired image and the same feature value for the original image. Different weights are used for different features depending on their relevance to the quality of the viewing experience. These relevance weights are deduced from subjective experiments so as to adequately represent human perception.
• NHIQM: Similar to the LP-norm, the NHIQM is centered around normalized feature values of an image. Specifically, it extracts features from the impaired image and computes an NHIQM value as a weighted sum of these features. A mapping function can also be derived to tailor these weights to mean opinion scores (MOS). The difference between the NHIQM values for the original image and for the impaired image quantifies the quality loss in the impaired image.
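The pixel-by-pixel nature of the PSNR makes it straightforward to compute. A minimal sketch for 8-bit images follows, assuming NumPy; the function name is chosen here for illustration:

```python
import numpy as np

def psnr(original, impaired, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images (peak = 255)."""
    original = np.asarray(original, dtype=np.float64)
    impaired = np.asarray(impaired, dtype=np.float64)
    mse = np.mean((original - impaired) ** 2)  # pixel-by-pixel squared error
    if mse == 0.0:
        return float("inf")  # identical images: noise power is zero
    return 10.0 * np.log10(peak ** 2 / mse)
```

For instance, an image differing from the original by one gray level at every pixel has an MSE of 1 and hence a PSNR of about 48.1 dB, regardless of the image content, which illustrates why the PSNR cannot reflect where in the image the distortion occurs.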
Classical approaches for image and video quality assessment are generally based on fidelity metrics such as MSE or PSNR, mainly due to their simplicity and mathematical convenience [12]. The poor correlation of these metrics with human perception has led to the development of perceptual quality metrics such as the SSIM index, the VIF criterion, and the LP-norm. Perceptual quality metrics tend to extract and use the structural information of an image in order to quantify its quality. As a result, they correlate better with human quality perception. For this reason, we have mainly used perceptual quality metrics in our work for quality assessment.
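As an illustration of a structural metric, the following is a minimal single-window variant of the SSIM statistic. The standard SSIM index of [12] computes this statistic over local windows and averages the resulting map, so this global version is only a sketch:

```python
import numpy as np

def global_ssim(x, y, L=255.0):
    """Simplified single-window SSIM; the standard metric uses local windows."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + C1) * (2 * cov + C2)
    den = (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    return num / den

rng = np.random.default_rng(1)
img = rng.uniform(0, 255, (32, 32))
noisy = img + rng.normal(0, 20, img.shape)
print(global_ssim(img, img))      # 1.0 for identical images
print(global_ssim(img, noisy))    # below 1 for the distorted image
```

The score is bounded above by 1, which it attains only for identical inputs; structural degradation pushes the covariance term down.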
4 Error Protection for Mobile Multimedia
Error control coding (ECC) has been commonly used to protect digital data against
errors for storage and transmission over error-prone channels. Several channel
codes have been developed to improve the coding performance while keeping the
complexity as low as possible. The basic principle of ECC is to add redundancy to
the information in order to detect and/or correct errors occurring during transmission or storage [22]. Error correcting codes can be divided into two classes: block codes and convolutional codes. Block codes encode the information stream on a block-by-block basis. They accept a block of k information symbols and generate a codeword of n symbols by introducing a redundancy of n − k symbols. Some commonly used block codes are Hamming codes, Golay codes, Bose-Chaudhuri-Hocquenghem (BCH) codes, and Reed-Solomon (RS) codes. Convolutional codes, on the other hand, convert the entire data stream into one single codeword. Convolutional codes have memory since the value of an encoded bit depends not only on the current k input bits but also on past input and coded bits. The powerful turbo codes, which are built from concatenated convolutional codes, are a well-known example.
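As a minimal illustration of a block code, the following sketch encodes k = 4 information bits into an n = 7 bit codeword of the (7,4) Hamming code (n − k = 3 parity symbols) and corrects a single bit error via the syndrome:

```python
import numpy as np

# Systematic generator matrix G = [I4 | P] and parity-check matrix
# H = [P^T | I3] of the (7,4) Hamming code.
G = np.array([[1,0,0,0,1,1,0],
              [0,1,0,0,1,0,1],
              [0,0,1,0,0,1,1],
              [0,0,0,1,1,1,1]])
H = np.array([[1,1,0,1,1,0,0],
              [1,0,1,1,0,1,0],
              [0,1,1,1,0,0,1]])

def encode(m):
    """Map 4 information bits to a 7-bit codeword."""
    return (np.array(m) @ G) % 2

def decode(r):
    """Correct any single bit error and return the 4 information bits."""
    r = r.copy()
    s = (H @ r) % 2                       # syndrome
    if s.any():                           # nonzero syndrome: locate the error
        err = np.where((H.T == s).all(axis=1))[0][0]
        r[err] ^= 1                       # flip the erroneous bit
    return r[:4]

cw = encode([1, 0, 1, 1])
rx = cw.copy()
rx[2] ^= 1                                # single channel bit error
print(decode(rx))                         # recovers [1 0 1 1]
```

The syndrome equals the column of H at the error position, which is what makes single-error correction a simple table lookup.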
Error control codes typically operate at the link layer. As a result, their performance is judged in terms of bit or symbol error correction capabilities for a given code rate. However, when these codes are used for the protection of image and video streams, the coding performance may not be judged in terms of the aforementioned link layer metrics for the following reasons. Firstly, there are generally different levels of importance among the data within an image or video stream. As a result, the effect of bit errors occurring in one part of the codestream on the reconstructed image or video quality may differ from that of the same number of errors occurring in some other part of the codestream. Secondly, for these services the objective of applying channel coding is not only to reduce the bit errors that occur in the codestream but also to minimize the effect of these errors on the reconstructed image/video quality. Finding a suitable error protection for these services is therefore generally carried out as follows. First, a suitable channel code family is chosen. Then, the optimal set of channel codes from the chosen code family is obtained, one code for each codestream packet, that provides the best quality of the reconstructed image or video at the receiver under given channel conditions and an overall allowable code rate. In addition, combining multiple channel code families and using interleaving techniques may further improve the error resilience for packet loss channels and channels with burst error characteristics.
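The selection step described above can be sketched as a small exhaustive search over per-packet code assignments. The packet importances, candidate codes, and residual failure probabilities below are hypothetical numbers chosen only to illustrate the optimization, not measured values:

```python
from itertools import product

# Hypothetical setup: 3 codestream packets with unequal importance
# (expected quality loss if the packet is corrupted) and a small family
# of candidate channel codes, each given as (code rate, assumed residual
# packet-failure probability for the channel under consideration).
importance = [10.0, 3.0, 1.0]
codes = [(1.0, 0.30), (2/3, 0.05), (1/2, 0.01)]
budget = 0.70        # minimum allowed average code rate (redundancy limit)

best = None
for assign in product(codes, repeat=len(importance)):
    avg_rate = sum(r for r, _ in assign) / len(assign)
    if avg_rate < budget:                 # too much redundancy: skip
        continue
    exp_loss = sum(w * p for w, (_, p) in zip(importance, assign))
    if best is None or exp_loss < best[0]:
        best = (exp_loss, assign)

print(best)   # the strongest code is assigned to the most important packet
```

Even this toy search shows why complexity matters: the solution space grows exponentially with the number of packets, which motivates the dynamic programming approach of Part IV.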
UEP is preferable over equal error protection (EEP) since it allows different parts of an image or video stream to be treated differently, based on their importance. As mentioned before, the performance of a UEP strategy depends on different factors such as the choice of channel codes, code rate, and channel conditions. It also depends strongly on how the levels of importance are defined within the codestream. As the final quality of an image or video service is judged by humans, it is beneficial to define the levels of importance within the codestream in relation to their influence on the visual quality of the reconstructed image or video as perceived by the end user. One means of dividing a codestream into levels of importance is to assign the ROI a higher priority than the less important background. As such, offering the ROI better protection than the background can result in an overall better visual quality for wireless imaging and video services.
In this thesis, we have therefore proposed an error protection framework based on preferential treatment of the ROI and have shown the advantages it may bring to the error protection of mobile multimedia services.
5 Thesis Overview and Contributions
The thesis consists of four parts. Part I describes ROI identification, ROI coding
and its potential uses in wireless imaging services. It also proposes a framework
for ROI coding for wireless imaging. In Part II, the error sensitivity of JPEG2000
[23] images is analyzed in terms of perceptual quality metrics. To further improve
the usefulness of the error sensitivity descriptor (ESD) marker segment of wireless
JPEG2000 (JPWL) [24], the use of reduced-reference perceptual quality metrics
as ESD is proposed. Part III proposes two UEP schemes for JPEG2000 images and analyzes their performance in various channel conditions and for different code rates. The performance evaluation is done in terms of both fidelity metrics
and perceptual quality metrics. The effectiveness of the proposed UEP techniques
is discussed for the wireless transmission of ROI coded images. Finally, in Part IV,
a framework for optimized preferential coding of ROI coded images and videos
is proposed. Specifically, the visual importance of ROI is exploited by providing
better protection to ROI against channel errors compared to the background. It is
shown by simulations that our approach provides a very similar performance to
that of optimal UEP [4], in particular, for multiple description image and video
streams. The complexity of our algorithm is very low compared to the optimal
UEP technique.
5.1 PART I: On Region of Interest Coding for Wireless Imaging
In this part, a brief survey of different aspects of ROI coding is presented. Different techniques for ROI identification and ROI coding are discussed. Some applications are also identified whose performance may benefit from ROI coding. In addition, a wireless imaging
framework is proposed which takes into account the importance of ROI for link
adaptation.
The main contributions of this part can be summarized as follows:
• A classification of images is made based on the sources, origins, and/or purposes of these images, such as digital photographs and medical images. This classification can be useful since different classes serve different purposes; hence, a transmission system can be optimized for the needs of the class of images it is used for. The most likely candidates for ROIs in these image classes are also identified, which may serve as a priori information for ROI identification algorithms.
• Given the importance of ROI identification in the framework of ROI coding
for wireless imaging, some of the approaches for ROI identification are
discussed. These approaches include visual attention based identification,
knowledge based identification, and ROI identification in document images.
• Various standard and nonstandard ROI coding techniques for image compression are discussed. In addition, a comparison is made among these
techniques based on different features. These features include the capability of coding multiple ROIs with different priorities and the support of arbitrarily shaped ROIs without the need to encode and transmit their shape information.
• Advantages of ROI coding are highlighted and some potential applications
are identified that can benefit from ROI coding.
• A framework is proposed for wireless transmission of images. This framework benefits from ROI coding in link adaptation and can be very useful
for wireless imaging systems in providing mobile multimedia services with
acceptable quality.
5.2 PART II: Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics
Accounting for the importance of error sensitivity information for wireless imaging systems, the sensitivity of different parts of a JPEG2000 codestream to channel errors is investigated in terms of perceptual quality metrics. The sensitivity
of different parts of an image or video stream provides an estimate of the quality when those parts are lost during transmission. The prior knowledge about the
quality of an image or video, reconstructed from such an erroneous codestream,
can be very useful for the receiver in deciding whether to display it to the viewer or to request a retransmission. Furthermore, the error sensitivity information may
assist the channel coding by highlighting the level of importance of different parts
of the codestream. The channel coder may use different channel codes to protect
various parts of the codestream depending on their error sensitivity values.
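The idea of per-part error sensitivity can be illustrated with a toy progressive decomposition (not an actual JPEG2000 codestream): drop each layer of the representation in turn and record the resulting distortion, yielding a sensitivity value per part:

```python
import numpy as np

def layers(img):
    """Toy progressive decomposition: coarse mean plus residual layers."""
    out = [np.full_like(img, img.mean())]          # coarsest layer
    for k in (8, 2):                               # coarser-to-finer blocks
        h, w = img.shape
        smooth = img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
        smooth = smooth.repeat(k, axis=0).repeat(k, axis=1)
        out.append(smooth - sum(out))              # residual refinement
    out.append(img - sum(out))                     # finest residual
    return out

def mse(a, b):
    return float(((a - b) ** 2).mean())

rng = np.random.default_rng(2)
img = rng.uniform(0, 255, (16, 16))
parts = layers(img)
# Error sensitivity of part i: distortion when only that part is missing.
sens = [mse(img, sum(parts[:i] + parts[i + 1:])) for i in range(len(parts))]
print(sens)
```

The resulting table of sensitivity values is exactly the kind of side information an ESD marker segment would carry, and could equally be computed with a perceptual metric instead of the MSE used here for brevity.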
Apart from offering an error protection tool set, JPWL provides means of
finding error sensitivity for JPEG2000 images. Nevertheless, the error sensitivity
mechanism has limitations, including its failure to quantify the quality in the presence of residual errors in the codestream and the very weak correlation of the fidelity metrics recommended for error sensitivity description with human perceived quality. We have proposed to include reduced-reference (RR) objective quality metrics as sensitivity descriptors, which eliminates both of the aforementioned shortcomings.
The major contributions of this part are as follows:
• We have identified some limitations of JPWL. Firstly, the metrics used
to specify the sensitivity of the JPEG2000 codestream such as MSE and
PSNR possess very weak correlation with human perceived quality. Secondly, these metrics fail to determine the quality of an image if it is reconstructed from an erroneous codestream, hence limiting the scope of the ESD
marker segment. We have proposed solutions to overcome these limitations
of JPWL.
• An alternative error sensitivity analysis is provided by suggesting the use of
a reduced-reference perceptual quality metric for error sensitivity description of JPEG2000 images. This can improve the performance of the JPWL
error control tool set in the following way. Firstly, the utilized LP -norm
possesses better correlation with human perceived quality [11, 18], which helps in designing the wireless system according to human perception. Secondly, in addition to being used as an ESD, it can also quantify the quality of an image reconstructed from a truncated and/or erroneous codestream.
This enhancement makes JPWL capable of monitoring the in-service quality, making the JPWL-enabled JPEG2000 image format more suitable for wireless imaging services. Further, the suggested quality metric can be added to the JPWL standard without losing backward compatibility.
5.3 PART III: Quality Assessment of Error Protection Schemes for Wireless JPEG2000
Suitable error control mechanisms play a vital role in wireless imaging and video
services in providing and maintaining acceptable QoS. We have, therefore, investigated the performance of two error protection schemes for JPEG2000 images for various channel code rates and for different channel conditions. Both of
the suggested UEP schemes outperform EEP in the medium signal-to-noise ratio
(SNR) regime for both additive white Gaussian noise (AWGN) and fading channels. Moreover, the schemes are comparable to EEP in complexity and may be deployed in real-time services. Since the proposed UEP schemes apply two levels
of error protection to the codestream, they can be very useful for the transmission
of ROI coded images by applying stronger protection to ROI and weaker or no
protection to the background. ROI quality can be preserved at the cost of some
background quality loss, resulting in an overall better image quality, compared to
the image quality obtained by EEP.
The major contributions of this part are summarized below:
• Two UEP strategies are proposed for JPEG2000 images using the error control tool set of JPWL. It is shown that both strategies outperform EEP in
various channel conditions and for different code rates. In addition, the
complexity of the proposed UEP strategies is comparable to that of EEP.
• The main novelty of this part is that the performance analysis of the proposed UEP strategies was done in terms of perceptual quality metrics such
as SSIM index [12] and VIF criterion [13].
• The proposed UEP schemes are very simple to implement, yet very effective, especially for codestreams in which the data can be divided into
different levels of importance. These levels can be based on, for example,
the spatial importance of their corresponding regions.
• Though the performance of the proposed UEP strategies is examined for
JPEG2000 images, they are equally applicable to Motion JPEG2000 streams
and other image and video coding standards that support ROI coding or
similar concepts. Modern video coding standards, e.g. MPEG-4 and H.264, support object-based coding [25], a concept similar to ROI in the sense of dividing the codestream into foreground objects and background. Therefore, the proposed UEP strategies are also applicable to these modern video
formats.
5.4 PART IV: A Framework for Error Protection of Region of Interest Coded Images and Videos
Designing an optimal error control scheme for a wireless multimedia service is a
very challenging problem due to many factors, including varying channel conditions and limited bandwidth. Error protection systems also face the problem of complexity arising from searching for the optimal error protection solution in a large solution space. In recent years, various approaches have been adopted to reduce the complexity of searching for the optimal channel codes for given channel conditions and bandwidth. Complexity is still a main issue for most of the developed techniques; for instance, the optimal techniques presented in [4] and [7], and the suboptimal techniques such as [5], are computationally demanding and not applicable in live systems. Further, no
optimal or suboptimal technique has been developed specifically for error protection of ROI coded images and videos. In this part, we have therefore proposed a
framework for error protection of ROI coded images and videos.
Specifically, the higher visual importance of the ROI in an image or a video scene, compared to the background, is exploited for protection against channel errors. It is shown that, with reduced complexity, our approach gives very similar
performance to that of the optimal UEP scheme of [4], for multiple description
image and video streams.
The main contributions of this part are described below:
• A framework is proposed for error protection of ROI coded images and
videos. The proposed framework benefits from the spatial importance of ROI and background when designing efficient channel coding.
• A dynamic programming algorithm is developed based on the proposed
framework.
• Performance of the algorithm is evaluated in terms of an augmented SSIM index using spatial weighting for ROIs and background. Taking into account
the higher viewer attention towards ROIs compared to background, higher
weights are given to ROIs and a lower weight is applied to the background
for computing the overall image and video quality.
• The proposed framework is a potential candidate for future wireless image
and video services which are based on ROI coding due to its good performance and low complexity.
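The ROI-weighted quality evaluation can be sketched as follows. Here, the per-pixel quality map stands in for a local SSIM map, and the weights 0.8/0.2 are illustrative values, not the weights used in the thesis:

```python
import numpy as np

def weighted_quality(quality_map, roi_mask, w_roi=0.8, w_bg=0.2):
    """Spatially weighted pooling of a per-pixel quality map.
    w_roi and w_bg are illustrative weights, not the thesis values."""
    w = np.where(roi_mask, w_roi, w_bg).astype(float)
    return float((w * quality_map).sum() / w.sum())

# Toy 8x8 quality map: the top half (the ROI) is strongly degraded.
q = np.full((8, 8), 0.9)
q[:4, :] = 0.5
roi = np.zeros((8, 8), bool)
roi[:4, :] = True

print(weighted_quality(q, roi))   # pulled below the plain mean of 0.7
```

Because the ROI carries the larger weight, degradation inside it lowers the pooled score more than the same degradation in the background would, mirroring viewer attention.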
Bibliography
[1] —, “Recognising the Promise of Mobile Broadband,” A White Paper from the UMTS Forum, Jun. 2010.
[2] J. A. Harmer, “Mobile Multimedia Services,” BT Technology Journal, vol. 21, no. 2, pp. 169-180, Jul. 2003.
[3] P. G. Sherwood and K. Zeger, “Error Protection for Progressive Image Transmission Over Memoryless and Fading Channels,” IEEE Trans. on Communications, vol. 46, no. 12, pp. 1555-1559, Dec. 1998.
[4] V. Chande and N. Farvardin, “Progressive Transmission of Images over Memoryless Noisy Channels,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 850-860, Jun. 2000.
[5] B. A. Banister, B. Belzer, and T. R. Fischer, “Robust Image Transmission Using JPEG2000 and Turbo-Codes,” IEEE Signal Processing Letters, vol. 9, no. 4, pp. 117-119, Apr. 2002.
[6] J. Kim, R. M. Mersereau, and Y. Altunbasak, “Error-Resilient Image and Video Transmission Over the Internet Using Unequal Error Protection,” IEEE Trans. on Image Proc., vol. 12, no. 2, pp. 121-131, Feb. 2003.
[7] S. Dumitrescu, X. Wu, and Z. Wang, “Globally Optimal Uneven Error-Protected Packetization of Scalable Code Streams,” IEEE Trans. on Multimedia, vol. 6, no. 2, pp. 230-239, Apr. 2004.
[8] N. Thomos, N. V. Boulgouris, and M. G. Strintzis, “Wireless Image Transmission Using Turbo Codes and Optimal Unequal Error Protection,” IEEE Trans. on Image Proc., vol. 14, no. 11, pp. 1890-1901, Nov. 2005.
[9] L. Cao, “On the Unequal Error Protection for Progressive Image Transmission,” IEEE Trans. on Image Proc., vol. 16, no. 9, pp. 2384-2388, Sep. 2007.
[10] G. Baruffa and P. Micanti, “Error Protection and Interleaving for Wireless Transmission of JPEG 2000 Images and Video,” IEEE Trans. on Image Proc., vol. 18, no. 2, pp. 346-356, Feb. 2009.
[11] U. Engelke, “Perceptual Quality Metric Design for Wireless Image and Video Communication,” Licentiate dissertation, no. 2008:08, Blekinge Institute of Technology, Karlskrona, Sweden, Jun. 2008.
[12] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. on Image Proc., vol. 13, no. 4, pp. 600-612, Apr. 2004.
[13] H. R. Sheikh and A. C. Bovik, “Image Information and Visual Quality,” IEEE Trans. on Image Proc., vol. 15, no. 2, pp. 430-444, Feb. 2006.
[14] —, “Wikipedia: The Free Encyclopedia,” [Online], Available: http://en.wikipedia.org/wiki/Multimedia, Last accessed: Jul. 2010.
[15] I. Bocharova, “Compression for Multimedia,” Department of Information Technology, Lund University, ISBN: 91-7167-030-0, Lund, Mar. 2008.
[16] R. Barland and A. Saadane, “Blind Quality Metric Using a Perceptual Importance Map for JPEG-2000 Compressed Images,” in Proc. IEEE International Conference on Image Processing, Atlanta, USA, pp. 2941-2944, Oct. 2006.
[17] E. C. Larson, C. Vu, and D. M. Chandler, “Can Visual Fixation Patterns Improve Image Fidelity Assessment?,” in Proc. IEEE International Conference on Image Processing, San Diego, USA, pp. 2572-2575, Oct. 2008.
[18] U. Engelke and H.-J. Zepernick, “Quality Evaluation in Wireless Imaging Using Feature-Based Objective Metrics,” in Proc. IEEE Int. Symp. on Wireless Pervasive Computing, San Juan, Puerto Rico, pp. 367-372, Feb. 2007.
[19] Z. Wang and E. P. Simoncelli, “Reduced-Reference Image Quality Assessment Using a Wavelet-Domain Natural Image Statistic Model,” in Proc. SPIE Human Vision and Electronic Imaging, vol. 5666, pp. 149-159, Mar. 2005.
[20] T. Jeong, Y. Kim, and C. Lee, “No-Reference Image-Quality Metric Based on Blur Radius and Visual Blockiness,” SPIE Journal: Optical Engineering, vol. 49, no. 4, pp. 045001-1–045001-9, Apr. 2010.
[21] Z. Wang, H. R. Sheikh, and A. C. Bovik, “No-Reference Perceptual Quality Assessment of JPEG Compressed Images,” in Proc. IEEE International Conference on Image Processing, Rochester, New York, USA, pp. I-477–I-480, Sep. 2002.
[22] R. H. Morelos-Zaragoza, “The Art of Error Correcting Coding,” Second Edition, John Wiley & Sons, 2006.
[23] International Organization for Standardization, “Information Technology – JPEG 2000 Image Coding System – Part 1: Core Coding System,” ISO/IEC 15444-1:2004(E), Sep. 2004.
[24] International Organization for Standardization, “Information Technology – JPEG 2000 Image Coding System – Part 11: Wireless,” ISO/IEC 15444-11:2007, May 2007.
[25] Y. Q. Shi and H. Sun, “Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards,” Second Edition, CRC Press, 2008.
Part I
On Region of Interest Coding for
Wireless Imaging
Part I is published as:
M. I. Iqbal and H.-J. Zepernick, “On Region of Interest Coding for Wireless
Imaging,” in Proceedings of International Conference on Signal Processing and
Telecommunication Systems, Gold Coast, Australia, pp. 198-208, Dec. 2007.
On Region of Interest Coding for Wireless Imaging
Muhammad Imran Iqbal and Hans-Jürgen Zepernick
Abstract
Numerous studies in the fields of human vision and electronic imaging have revealed that the human visual system (HVS) tends to focus on a
few preferred areas for given typical images/scenes. Subjective experiments
have also shown a strong correlation for these preferred areas among the involved test subjects provided that the same context is viewed. In light of the
limited available and expensive resources in technical systems such as mobile multimedia systems, it would therefore be favorable to explore findings
about the operation of the HVS in the design of technical communication
systems. This paper aims at stimulating such HVS driven approaches in
the context of preferential image coding; the region of interest (ROI) coding and its potential application in wireless imaging. In particular, we will
elaborate on the general concepts of ROI coding, propose a framework for
ROI coding for wireless imaging, review ROI identification mechanisms,
and discuss ROI support by non-standardized techniques and ROI support
in the JPEG2000 standard. As this paper is of conceptual nature, the work
will be consolidated in a classification of contemporary ROI coding techniques including a discussion of their advantages and disadvantages. As a
consequence, a number of application areas of ROI coding are identified
with the major focus given to the field of wireless imaging.
1 Introduction
The rapid growth of third-generation and the development of future-generation radio communication technologies have led to a significant increase in the demand for multimedia communications involving image and video services. However, the hostile nature of the radio channel, which is time-varying and susceptible to multipath fading, makes the deployment of such multimedia services much more challenging than would be the case in a wired system. Limitations are also imposed on these
services due to the constraints on the available spectrum resources. Especially,
image and video services require a substantial amount of bandwidth compared to
speech services, which in turn results in increased terminal power consumption
and potentially reduced range. Radio resource management (RRM) has therefore
gained particular attention in the design of modern mobile radio systems where
high bandwidth demanding services and mixed traffic characteristics pose key
challenges on the resource management. Efficient utilization of bandwidth in digital imaging and video services is achieved by the use of various compression
techniques. In this paper, we are interested in preferential image coding; the region of interest (ROI) coding and its potential application in wireless imaging.
The concept of ROI coding is motivated by studies in the fields of human
vision and electronic imaging, which have revealed that the human visual system
(HVS) tends to focus on a few preferred areas when viewing an image [1]. Factors
that have been identified to influence visual attention include contrast [2], shape
of objects [3], size of objects [4], color [5], location [6], and context of a given
image. It has also been shown that people in an image attract immediate attention
by the human observer [7]. Furthermore, it has been observed that objects in the
foreground of an image are given preferable attention over objects in the background of an image. As such, it is natural to exploit characteristics of the HVS,
especially ROI coding, in the design of technical systems. So far, applications
of ROI coding have mainly been focusing on source compression techniques, image quality evaluation, image databases, and strategies for advertising. However,
the use of ROI coding for wireless imaging in mobile radio systems or wireless
local area networks (WLAN) appears to mainly concentrate in combination with
approaches that deploy unequal error protection [8–12]. On the other hand, more
general approaches of using ROI coding with RRM can be expected to perform
favorably in terms of overall system capacity compared to traditional RRM techniques.
Moreover, ROI coding potentially allows specified regions of interest to receive preferential treatment at compression-time whereby these regions can be
compressed at a higher quality than the rest of the image. This allows further
gains in compression efficiency to be achieved at the expense of the quality of the
less important background image information while preserving the quality of the
more important ROI image information. Thus, ROI coding is applicable in a radio
environment where the aim is to efficiently utilize and manage the scarce radio
resources whilst at the same time guaranteeing a satisfactory end-user quality of service (QoS). As a consequence of the high compression efficiency of ROI coding,
the resulting data streams are highly susceptible to the effects of transmission errors. Even though a number of error resilience tools have been incorporated with
certain standards, these do not guarantee that the received data is free of errors.
Thus, it is recommended to use advanced error control coding along with ROI
coded multimedia services in mobile radio applications to minimize the effect of
transmission errors.
In view of the above, this paper aims at stimulating HVS driven signal processing approaches in terms of preferential image coding; the ROI coding and
its potential application in wireless imaging. In particular, we elaborate on the
general concepts of ROI coding, present ROI definitions and identification mechanisms, and discuss ROI support by non-standard techniques and as suggested in
the JPEG2000 standard. This will be consolidated in a classification of contemporary ROI coding techniques including a discussion about their advantages and
disadvantages.
This paper is organized as follows. Section 2 describes general concepts of
ROI coding. In Section 3, a framework for use of ROI coding in wireless imaging
is presented. Subsequently, Section 4 elaborates on approaches that support ROI
identification as this would be a crucial function in exploiting ROI coding in applications such as RRM in mobile multimedia systems. In Section 5, we provide
a survey of realizations of ROI coding techniques ranging from non-standardized
approaches to JPEG2000 and research focusing on improving JPEG2000. In Section 6, advantages of ROI coding in different applications are presented. Section 7
concludes the paper.
2 General Concepts of ROI Coding
The sizes of digital images and videos have been increasing continuously in recent
years, which in turn has increased storage demands as well as bandwidth requirements for their transmission. Source compression is the most common technique
used to cope with these resource problems. These techniques may take advantage
of different features of the HVS. For example, the HVS has been shown to be
more sensitive to the luminance (brightness) of an image/scene and less sensitive
to its chrominance (color) information. This has led to preferential treatment of luminance over chrominance during compression, i.e. more compression is imposed on the chrominance while less compression is applied to the luminance. As
far as imaging is concerned, the Joint Photographic Experts Group (JPEG) and
JPEG2000 standards have used these types of HVS characteristics.
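This preferential treatment of luminance over chrominance can be illustrated with 4:2:0 chroma subsampling, which stores each chroma plane at half resolution in both dimensions. The conversion coefficients below follow the common BT.601-style approximation; the savings figure refers to raw sample counts before entropy coding:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Split an RGB image into luma and chroma planes (BT.601-style)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr

def subsample420(plane):
    """Average each 2x2 block: half resolution in both dimensions."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rgb = np.random.default_rng(3).uniform(0, 255, (16, 16, 3))
y, cb, cr = rgb_to_ycbcr(rgb)
samples = y.size + subsample420(cb).size + subsample420(cr).size
print(samples / rgb.size)   # half the raw sample count is retained
```

The luma plane is kept at full resolution while each chroma plane shrinks by a factor of four, exploiting exactly the HVS asymmetry described above.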
2.1 Region of Interest
While luminance and other features are commonly exploited over whole images, additional compression gains can be obtained by exploring the spatial sensitivity of the HVS. Studies have shown that the HVS is more sensitive to certain areas than to the
rest of an image. For example, for the image shown in Fig. 1, the majority of
human observers will give preferential attention to the two helicopters while the
remainder of the image will hardly attract the observer's interest. In general, the
areas or regions of the image which attract the HVS attention more are called ROIs
while the rest of the image is called background. In literature, different terms are
used for ROI such as “preferential area”, “zone of interest”, “focus of attention”,
“object of interest” [13] and “targets” [14]. In keeping with the terminology used by the majority of publications, we will use the term ROI. In this sense, ROI coding refers to
image and video coding that gives preferential treatment to ROIs.
Apparently, ROIs can have arbitrary shapes and sizes and can be different for
different observers and for different image classes. Furthermore, ROIs can be
static or dynamic. Static or fixed ROIs are defined at encoding time and cannot be changed later, while dynamic ROIs are defined and/or
changed interactively by the users during the progressive transmission of images.
Figure 1: Image with helicopters as ROIs [58].
In view of defining an ROI in a technical system, different shapes may be considered for segmenting the image/scene content into ROI and background, such as circular, rectangular, elliptical, or even arbitrary shapes.
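Such segmentation shapes can be represented as boolean masks; a minimal sketch with illustrative coordinates:

```python
import numpy as np

def rect_mask(shape, top, left, height, width):
    """Rectangular ROI mask."""
    m = np.zeros(shape, bool)
    m[top:top + height, left:left + width] = True
    return m

def circle_mask(shape, cy, cx, radius):
    """Circular ROI mask centered at (cy, cx)."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

# Union of a rectangular and a circular ROI; the rest is background.
roi = rect_mask((64, 64), 10, 10, 20, 30) | circle_mask((64, 64), 40, 40, 8)
background = ~roi
```

Arbitrary shapes follow the same pattern: any boolean map over the pixel grid partitions the image into ROI and background.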
2.2
Image Classes and Region of Interest
As far as implementation of ROI concepts in technical systems is concerned, a
classification of image types with reference to their source of origin and/or application may be useful. In the sequel, some of the prominent image classes along
with the notion of ROI typically associated with them are identified and explained.
Digital photographs
This class of images is produced using a digital camera such as a pocket camera, professional camera or a mobile handheld equipped with a digital camera.
Accordingly, the content may range from portraits of people and photos of groups of people to architectural buildings, landmarks, and landscapes. As such,
an ROI is naturally related to the type of content, for example, faces of people in
a group.
Satellite images and aerial photographs
Satellite images include images of the Earth or other planets taken from satellites.
Aerial photographs are images of the ground taken from the air, for example, using
a helicopter or aircraft. These images can have a variety of ROIs depending on
the application. Typical examples include buildings, tanks, and planes on military
bases.
Medical images
This class refers to images of the human body or parts of it, taken either for diagnosis and examination of different diseases or for study purposes.
Endoscopic images, thermographic images, and retina fundus images are examples of medical images. ROI in this class of images can vary depending upon
many factors including the disease under consideration.
Document images
These are images of documents generated by scanning printed material. These documents can be of any type used in residential homes and offices. The ROIs in this class of images may include printed text, handwritten text, stamps, etc.
2.3 Quality Assessment of ROI Coding Schemes
Quality assessment of particular source compression or ROI coding schemes is often based on measures such as the peak signal-to-noise ratio (PSNR) and the mean squared error (MSE). However, both measures are fidelity metrics that rely on pixel-by-pixel comparisons and do not necessarily correlate well with human perception [17, 18]. In order to better relate the performance of image or
video processing approaches to the actual quality as perceived by humans, it is
suggested to deploy measures that incorporate aspects of the HVS. We will refer
to these measures as objective perceptual image quality measures when they are
based on algorithms that mimic characteristics of the HVS and subjective perceptual image quality measures when they are deduced from subjective experiments.
On Region of Interest Coding for Wireless Imaging
Figure 2: Sample image “Mandrill”: (a) Original image giving PSNR = 58.48 dB
and HIQM = 24.2, (b) Mirrored image giving PSNR = 14.59 dB and HIQM =
24.2 [16].
In particular with ROI coding, which itself is based on characteristics of the HVS,
it would be natural to replace fidelity metrics with objective perceptual quality
metrics.
To further motivate this perceptual-based image quality assessment, Fig. 2
presents the example of sample image “Mandrill” showing (a) the original image
and (b) the same image but mirrored with respect to a vertical axis placed at the
center of the horizontal axis. Clearly, PSNR cannot cope with this type of operation as pixels would not line up between the unprocessed and the mirrored image. In this
example, it would suggest a significant reduction in image quality indicated by
the decrease of PSNR from 58.48 dB to 14.59 dB. In contrast, perceptual-based
quality metrics such as the hybrid image quality metric (HIQM) [15, 16] can be
expected to better align with quality as perceived by humans. For the example
shown in Fig. 2, HIQM actually gives the same value of 24.2 for both the original
image and the mirrored image as the viewing experience is the same.
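The effect is easy to reproduce; a small sketch using a synthetic stand-in for the "Mandrill" image (an assumption, as the actual test image is not included here):

```python
import numpy as np

# Synthetic stand-in for the "Mandrill" test image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
mirrored = img[:, ::-1]  # mirror about the vertical axis

# Pixels no longer line up, so the pixel-wise error is large and
# PSNR collapses, although a viewer sees the same content.
err = np.mean((img - mirrored) ** 2)
psnr_mirrored = 10.0 * np.log10(255.0 ** 2 / err)
```

A feature-based metric computed on each image separately, as HIQM is, would be unaffected by the flip.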
In recent years, a number of metrics that are based on image features rather
than image fidelity have been proposed such as the following measures:
Reduced-reference image quality assessment
The reduced-reference image quality assessment (RRIQA) technique has been
proposed in [19]. It is based on the natural image statistic model in the wavelet
domain. In particular, it calculates the distortion between received and transmitted image using the Kullback-Leibler distance between the probability density
functions with respect to each subband in the transmitted and received image.
The attribute of being reduced-reference relates to the fact that this metric does
not rely on the availability of the original image but requires only information of
some image properties.
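The core computation can be sketched as follows, using empirical subband histograms for illustration (an assumption for compactness; RRIQA proper transmits fitted parameters of a natural image statistics model rather than histograms):

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """Kullback-Leibler distance between two discrete distributions."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def subband_distortion(sent, received, bins=64):
    """Histogram-based KL distance for one wavelet subband."""
    lo = min(sent.min(), received.min())
    hi = max(sent.max(), received.max())
    p, _ = np.histogram(sent, bins=bins, range=(lo, hi))
    q, _ = np.histogram(received, bins=bins, range=(lo, hi))
    return kl_distance(p, q)
```

The overall distortion estimate then aggregates this quantity over all subbands.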
Measure of structural similarity
A full-reference metric has been reported in [20] requiring availability of the original image for its operation as is the case with PSNR. Although the applicability
of this metric for wireless imaging is limited due to its full-reference nature, it
may serve as a benchmark test for the reduced-reference metrics. This metric is
based on the degradation of structural information. Its outcome is a measure of
structural similarity (SSIM) between the reference and the distorted image.
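A single-window sketch of the SSIM statistic is given below; the full metric in [20] computes this over local sliding windows and averages the results:

```python
import numpy as np

def ssim_global(x, y, peak=255.0):
    """Single-window structural similarity between two images."""
    c1 = (0.01 * peak) ** 2  # stabilizing constants
    c2 = (0.03 * peak) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

The score is 1 for identical images and decreases as structural information degrades.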
Hybrid image quality metric
The reduced-reference hybrid image quality metric (HIQM) as proposed in [15]
focuses on extracting different features of an image. Namely, it considers blocking, blur, image activity, and intensity masking each described by its own metric. The contribution of each feature to the overall image quality is calculated as
weighted sum of the involved feature metrics. It is noted that these weights were
extracted by statistical analysis of subjective experiments. The HIQM value is calculated for both the original image and distorted image, thus, quality degradation
is indicated simply by their difference. Due to the limited bandwidth of the radio
channel, HIQM seems to be well suited as the overall perceptual quality measure
can be represented by a single number. This number can be concatenated with the
data stream of each transmitted image without creating too much overhead.
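The construction can be sketched as a plain weighted sum (feature values and weights below are illustrative placeholders, not the trained weights from [15]):

```python
def hiqm_score(features, weights):
    """HIQM-style overall quality as a weighted sum of per-feature
    metrics (blocking, blur, image activity, intensity masking)."""
    return sum(w * f for w, f in zip(weights, features))

def degradation(orig_features, dist_features, weights):
    """Quality degradation indicated by the difference of the two
    HIQM values, so one number per image suffices as overhead."""
    return hiqm_score(dist_features, weights) - hiqm_score(orig_features, weights)
```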
Normalized hybrid image quality metric
Although HIQM uses feature value normalization, namely relevance normalization, an extreme value normalization would be more convenient in view of comparisons with other distance measures such as the Lp-norm. The related NHIQM
approach has been presented in [21, 22]. The normalization ensures that the
weights of the different feature measures fall into the same range. It is also suggested in this work to optionally clip the normalized feature values that are actually calculated in a real-time wireless imaging application to fall in the interval
[0, 1]. For example, severe signal fading due to multipath propagation could result in significant image impairments at certain times such that the user-perceived
quality is in a region where the HVS cannot differentiate anymore among quality
degradation levels.
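A sketch of the NHIQM computation with extreme-value normalization and clipping (feature extremes and weights are placeholders; the actual values stem from the subjective experiments in [21, 22]):

```python
import numpy as np

def nhiqm(features, weights, feat_min, feat_max):
    """Normalize each feature to [0, 1] by its observed extremes,
    clip real-time outliers, and form the weighted sum."""
    f = np.asarray(features, dtype=np.float64)
    lo = np.asarray(feat_min, dtype=np.float64)
    hi = np.asarray(feat_max, dtype=np.float64)
    norm = np.clip((f - lo) / (hi - lo), 0.0, 1.0)  # clip out-of-range values
    return float(np.dot(np.asarray(weights, dtype=np.float64), norm))
```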
3 Framework of ROI Coding for Wireless Imaging
Having motivated a HVS driven system design paradigm, key functionalities with
respect to wireless imaging are identified in this section. Fig. 3 illustrates the
associated conceptual framework of ROI coding for wireless imaging.
At the transmitting end, source encoding and channel encoding represent the
unique processing blocks in this framework that explore characteristics of the
HVS for efficient image compression and enable reliable transmission of the compressed image data over the radio channel, respectively. In contrast to the conventional image compression approaches of using spatial redundancy and psychovisual redundancy for lossy compression, we suggest to also deploy ROI coding for
the preferential areas. This allows for another degree of freedom in the design of
an efficient overall system supporting the following mechanisms among others:
• High quality source encoding of ROIs over background to reduce bandwidth
and storage requirements.
• Controlling of ROI and background compression rates subject to given QoS
constraints. In other words, bit rate may be traded off with QoS and vice
versa.
[Block diagram: the original image enters source encoding with ROI coding (ROI identification, preferential ROI encoding, bitstream generation) under quality/bitrate control, followed by channel encoding with UEP/EEP for transmission over the radio channel; at the receiving end, channel decoding is followed by source decoding with ROI decoding (bitstream parsing, image decoding as the reverse of ROI encoding) and the computation of quality measures, which are fed back to the transmitter; the output is the reconstructed image.]

Figure 3: A framework for ROI coded image transmission over an error-prone radio channel.
• Increasing the tool set for producing a desired bit rate budget other than by
the conventional bits per pixel (bpp) considerations.
• Offering additional source significant information that can be used to advise
more efficient channel coding schemes for source compressed image data
other than preferential treatment of headers over payload.
• Increasing the options for explicit link adaptation in terms of ROI versus
background compression to accompany power control, adaptive modulation, and adaptive channel coding.
• Enabling the production of HVS based key performance indicators that may be explored for charging models.
The conceptual components of ROI coding include automatic ROI identification as a first processing step and the subsequent encoding of the ROI area or
shape. Owing to the fact that contemporary image coding techniques use some
form of linear block transform, ROI representation may include a transform of
the ROI shape into an ROI mask with respect to the transform coefficients. The
subsequent preferential ROI encoding may then be combined with the usual image compression steps such as quantization, linear block transform, encoding, and
bitstream generation.
The bitstream released by the ROI encoder constitutes the input to the channel encoder. In addition to conventional bitstream compositions such as headers,
markers, and payload, the additional source significant information given by the
ROIs calls for replacing equal error protection (EEP) by unequal error protection
(UEP) or incremental redundancy concepts. This increases the flexibility in adaptation of channel coding depending on the source compressed image data and the
progression of the transmission conditions on the radio channel over time including the following options:
• Preferential channel encoding of header and ROI information over other
components of the bitstream.
• Adaptation between EEP and UEP depending on the channel conditions.
• Support of explicit link adaptation by providing a range of UEP codes.
• Implicit link adaptation with focus on the ROIs using error detection along
with automatic repeat request (ARQ) and soft-combining techniques.
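A minimal sketch of such an adaptive rate assignment (segment names and code rates are illustrative assumptions, not values from any standard):

```python
def assign_code_rates(segments, channel_good):
    """UEP sketch: give header and ROI data a stronger (lower-rate)
    channel code than the background; under good channel conditions
    fall back to EEP with one common rate for all segments."""
    if channel_good:
        return {seg: 0.8 for seg in segments}   # EEP: one rate for everything
    strong, weak = 0.5, 0.9                     # UEP: protect critical data more
    return {seg: strong if seg in ("header", "roi") else weak
            for seg in segments}
```

Calling `assign_code_rates(("header", "roi", "background"), channel_good=False)` protects the header and ROI most strongly.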
At the receiving end, the reverse operations to the ROI encoding and channel
encoding need to be performed, given by the related channel decoding and ROI
decoding algorithms. The major challenges at the receiver, however, may be imposed by the calculation of suitable quality measures. As far as mobile multimedia systems in general and wireless imaging in particular are concerned, approaches that support objective perceptual quality assessment have gained increased attention just recently. Hence, in particular, no-reference or reduced-reference objective
perceptual quality metrics are still needed as the original image content would not
be available at the receiver. These metrics may then be fed back to the transmitter
to be used for link adaptation purposes.
It shall be noted that the proposed framework of ROI coding for wireless
imaging may scale towards wireless video services. However, spatio-temporal redundancy would need to be considered in this case, requiring visual information to be processed across a sequence of frames.
4 ROI Identification
Given the important role of ROI identification within the framework of ROI coding
in wireless imaging as shown in Fig. 3, some of the related identification methods
are described in this section.
The importance of image content to viewers varies largely with the content
itself and the class of image under consideration. This variation makes the task
of automatic identification of ROIs more difficult and it becomes challenging for
an algorithm to identify ROIs such that they correlate well with the ones identified by the human observer. To increase this correlation, human perception and
visual attention (VA) should be taken into account when developing algorithms
to identify ROIs. Different approaches and algorithms for automatic ROI identification and extraction can be found in the literature [13, 23–29], some of which are
discussed in the sequel. It may be concluded from this overview that the area of
ROI identification constitutes a field which needs further work on algorithms supporting efficient wireless imaging. For example, although two methods for ROI
coding are defined in JPEG2000 [30–32], no procedures are given for automatic
ROI identification.
4.1 Visual Attention Based ROI Identification
An image coding scheme in conjunction with automatic ROI identification is presented in [33]. In this work, ROIs are first identified using an algorithm [23, 24]
that simulates VA. After refining these ROIs [34], the image is finally encoded
following JPEG2000 Part 1.
The VA simulation algorithm is based on the hypothesis that visual attention
is, to a certain extent, dependent upon the disparities between neighborhoods in
the image. Although the results described in [23, 24] lend some support to the
conjecture, more experiments are needed to further clarify this hypothesis and the
performance of the algorithm.
4.2 Knowledge Based ROI Identification
A knowledge based hierarchical ROI detection method is proposed in [26]. This
method comprises three steps, as will be explained in the following paragraphs.
In the first step, objects are grouped based on their optical characteristics and then, within each group, proper resolutions are assigned to objects of similar sizes.
In the next step, ROIs are detected for a given resolution commencing with
extracting different image features based on color, intensity, edges, and others.
Then, some morphological operations are performed to split overlapping objects
and to merge different regions to form a complete description of one object in one
region. Finally, the detected ROIs are verified on the basis of supervisory information.
In the last step, redundancy that may have been introduced in the preceding
ROI detection step due to downsampled versions of the input image is removed by
pixel grouping. The small candidate ROIs are then connected to form bigger ROIs
based on the existing knowledge of ROI sizes. A probability based voting method
is used for proper integration.
The advantages of the method include its ability to detect ROIs of arbitrary
shapes, applicability to images containing connected or broken objects, insensitivity to contrast levels, and robustness to noise. This approach is useful in situations where some of the information about the ROIs is available.
4.3 ROI Identification for Document Images
An approach for ROI detection for financial document images is presented in [25]. In this method, the ROIs are defined and classified into three types: filled
information (FI), stamps and seals (SS), and handwritings (HW). All three types
of ROIs are detected differently. For FI ROIs, document classification is first performed by matching the input document against a library of predefined document models to find the best match. Once the input document is classified, the exact locations of FI ROIs are known based on the document model of that category. The
class of SS ROIs is detected using connected component analysis based on color
and shape information and the HW ROIs are located by handwriting identification
using an incremental Fisher linear discriminant classifier. After merging all the
three types of ROIs, a final ROI mask is constructed and finally the document is
encoded with the JPEG2000 encoder using the generated ROI mask.
The method works well in the detection of ROIs for financial documents. After some modifications, the method may be used for ROI detection in other types of document images, but it cannot be used for other image classes.
5 ROI Coding Techniques
In this section, we consider some typical non-standard ROI coding techniques that
have been proposed as well as ROI coding in the JPEG2000 standard along with
amendments to this standard. This will reveal the tool set and existing approaches
that may be deployed in wireless imaging applications.
5.1 Non-Standard Techniques
Many different schemes for preferential image coding have been proposed in the
literature of which a representative selection shall be presented in the following.
ROI coding scheme for digital photography
A progressive ROI image coding technique based on the improved embedded zerotree wavelet (IEZW) algorithm [36] has been proposed in [37]. After applying a wavelet transform, the algorithm encodes the wavelet coefficients in three stages as follows.
Firstly, only the ROI coefficients for N successive approximation quantization
(SAQ) iterations are encoded. Then, the coefficients related to the background
region are encoded for the next N iterations. Finally, encoding is performed on
all wavelet coefficients of the image until the desired bit rate is achieved. The
quality of the ROI increases with the number N of SAQ iterations. The location information of the ROI is encoded using the coordinate data compression
(CDC) method [38] in the lowest frequency band. The simulation results presented in [37] indicate that, for low bit rates, this algorithm performs better for the
ROI compared to conventional progressive coding in terms of PSNR. For high bit
rates, it gives the same PSNR for the whole image as the conventional approaches.
Regarding applications for wireless imaging, progressive ROI image coding
is attractive, for example, in database searches. It allows the content of an image to be quickly identified from the small area given by the ROI and to proceed to
the next image if the viewed image is not of interest. In this way, transmission
capacity and service costs can be conserved as only a small amount of bits need to
be transmitted over the radio link. Progressive ROI image coding can also be used
to adapt to a given bit rate budget by trading off the quality of ROI and background
such that a given quality measure is fulfilled.
ROI coding scheme for satellite and aerial images
Another example of ROI coding can be found in [14] aiming at very low bit rates.
This scheme advocates an object-based wavelet technique for encoding the ROI.
In the first step, contour masks are generated for all ROIs indicating the shapes of
the targets. For this purpose, the automatic target recognition (ATR) system described in [39] is used. To reduce the amount of information, these contour masks are downsampled and then coded using a differential chain code (DCC). The significance mask in the wavelet domain is constructed by using the upsampled versions
of the downsampled masks. In this step, the image is decomposed into 22 subbands using a 5-3 biorthogonal 2-D discrete wavelet transform (DWT). Then, two
normalized sequences are produced for each subband; one consisting of wavelet
coefficients related to the ROI and the other to the coefficients of the background.
In the next step, these normalized subband coefficients are encoded using a fixed
rate trellis coded quantizer (TCQ) [14]. Finally, the bits are allocated optimally in
the MSE sense as suggested in [40].
In view of wireless imaging, one advantage of the above algorithm and similar
schemes is its ability to scale the quality of the regions of interest. In addition, the
potential of operating with very low bit rates, such as related to compression ratios
of 320:1, would be a beneficial feature for wireless applications.
ROI coding scheme for medical images
A detailed survey on ROI coding for the class of medical images can be found
in [41]. In the sequel, we will discuss the method presented in [42] as an example
of this class.
In particular, the image compression algorithm proposed in [42] aims at reducing storage requirements with lossless ROI but lossy background compression. The algorithm is based on the hierarchical subband decomposition called the S+P transform (Sequential transform + Prediction) [43], which is an integer wavelet transform. The algorithm is organized in the following steps. Firstly, normalization of coefficients is done after taking the S+P transform. Then, the ROI mask is calculated following the same steps as the forward S+P transform. Subsequently, a
progressive quantization of the calculated coefficients is performed using a modified version of the set partitioning in hierarchical trees (SPIHT) algorithm [44].
The results are kept in order of importance. Finally, the symbol stream is encoded
using an entropy encoder.
5.2 ROI Coding in the JPEG2000 Standard
As far as imaging is concerned, JPEG2000 is the latest version of a series of
standards for image compression developed by the Joint Photographic Experts
Group (JPEG). JPEG2000 is thought to provide superior performance and overcome limitations of the existing JPEG image compression standard, which suffers
from blocking and ringing artifacts, especially at high compression rates. The
rich feature set of JPEG2000 includes superior compression performance, multiple resolution representation, bitstream organization mechanisms to facilitate progressive decoding by quality and/or by spatial resolution, a single architecture for both lossless and lossy compression, ROI coding, and error resilience tools. Conceptually, JPEG2000 is a wavelet-based coding system drawing mainly on ideas
of embedded block-based coding with optimized truncation (EBCOT). In brief,
wavelet coefficients are calculated to reveal the redundancy contained in the image content; the resulting coefficients are quantized and subsequently entropy encoded. The information in the wavelet domain is organized within bitplanes with
each bitplane relating to a particular compression rate and image quality with respect to the reconstruction. While the lower bitplanes relate to large compression
rates and lowest quality, the higher bitplanes relate to small compression rates and
highest quality. The bitstream released by the encoder is organized accordingly,
commencing with the highest bitplane followed by the underlying bitplanes.
Preferential encoding of the ROI is one of the unique features of JPEG2000
[30, 45] that makes it suitable for applications such as imaging over error prone
channels. In the sequel, the two methods for ROI coding defined in the standard,
general scaling based (GSB) and maximum shift (MAXShift) method [30–32],
are presented.
General scaling based method
In GSB [30, 32] the quantized wavelet coefficients associated with the ROI are
scaled up by a given value, so that the corresponding bits are placed in higher bitplanes compared to the background as shown in Fig. 4. For this purpose, the ROI
shape needs to be defined, encoded, and included in the bitstream. This shape
is represented by the so-called ROI mask, a bitplane indicating those quantized
transform coefficients that are sufficient for the decoder to reconstruct the ROI. In
order to reduce processing complexity and overhead, only rectangular and elliptical ROI shapes are allowed.
Advantages of GSB include simple means of adjusting the preferential treatment of ROI compared to background by choosing the scaling value and the potential of defining multiple ROIs with different priorities.
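The scaling step can be sketched on quantized integer coefficients as follows; note that the decoder needs the transmitted ROI mask and scaling value to undo the shift:

```python
import numpy as np

def gsb_scale(coeffs, roi_mask, shift):
    """GSB sketch: scale up quantized ROI coefficients by `shift`
    bitplanes so their bits land above the background bitplanes."""
    out = coeffs.astype(np.int64).copy()
    out[roi_mask] <<= shift
    return out

def gsb_unscale(scaled, roi_mask, shift):
    """Decoder side: shift the ROI coefficients back down, which
    requires the transmitted ROI mask and scaling value."""
    out = scaled.copy()
    out[roi_mask] >>= shift
    return out
```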
Maximum shift method
In MAXShift, the quantized wavelet coefficients associated with the ROI are
scaled up by an amount such that ROI and background coefficient bitplanes do
not overlap. In this way, the ROI coefficients are elevated to bitplanes above those
for the background coefficients as shown in Fig. 4. As a consequence, ROI information is placed first in the bitstream and hence no background can be processed
before the whole ROI is decoded. Clearly, all coefficients above a distinct bitplane belong to the ROI while those coefficients below that bitplane relate to the background. As a result, neither overhead nor complexity has to be spent on
ROI mask features.
The advantages of MAXShift include the fact that no ROI shape needs to
be encoded and transmitted, which reduces bit rate and processing complexity
at encoder and decoder. Also, arbitrary ROI shapes are supported and different
subbands can be treated differently [32].
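A sketch of the MAXShift encoder-side scaling and the implicit, mask-free ROI recovery at the decoder:

```python
import numpy as np

def maxshift_scale(coeffs, roi_mask):
    """MAXShift sketch: shift ROI coefficients up by s bitplanes,
    where s is the number of magnitude bitplanes of the largest
    background coefficient, so ROI and background never overlap."""
    mags = np.abs(coeffs.astype(np.int64))
    bg_max = mags[~roi_mask].max()
    s = int(bg_max).bit_length()  # smallest shift clearing all BG bitplanes
    out = coeffs.astype(np.int64).copy()
    out[roi_mask] <<= s
    return out, s

def maxshift_decode_mask(scaled, s):
    """Decoder side: any coefficient at or above bitplane s is ROI,
    so no explicit ROI mask needs to be transmitted."""
    return np.abs(scaled) >= (1 << s)
```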
5.3 Amendments to ROI coding in JPEG2000
Despite all benefits, JPEG2000 has certain limitations. MAXShift has limitations
including those listed below:
• Relative importance between ROIs and background cannot be controlled.
• Several ROIs cannot be encoded with different priorities.
• No background information can be processed until all ROI coefficients are
fully decoded.
[Figure: bitplane arrangements of ROI and non-ROI coefficients without scaling, with the GSB method, and with the MAXShift method]
Figure 4: ROI coding in JPEG2000 [47, 51].
• Large bit shifts for ROI coding may cause bit overflows.
GSB overcomes some of the above-listed limitations but has its own shortcomings, such as those listed below:
• ROI shape information needs to be encoded, which increases overhead and
complexity.
• Arbitrary shaped ROIs other than rectangular and elliptical cannot be encoded with GSB.
Suggestions have been given to overcome these limitations and to improve
ROI coding in the existing JPEG2000 standard. These approaches fall into two
main categories, i.e., coefficient scaling based methods and packet rearrangement based methods. Some of the approaches covering the above-mentioned categories are described below, and a comparison is made at the end of this section.
MAXShift-like method
An improvement for MAXShift and GSB is proposed in [46]. Suggestions are
given on how to choose the optimal scaling value for MAXShift and the padding
of the extra bits appearing during the shift operation. Also a new ROI coding algorithm called MAXShift-like algorithm is presented. The MAXShift-like algorithm
uses a smaller scaling value compared to MAXShift.
The MAXShift-like algorithm reduces the bit rate compared to MAXShift at the cost of a slight decrease in ROI and background qualities. MAXShift-like improves over GSB by removing the need for encoding and transmitting ROI
shape information. The generated codestreams can be handled by any JPEG2000
decoder.
Bitplane-by-bitplane shift/Generalized bitplane-by-bitplane shift method
A bitplane-by-bitplane shift (BbBShift) based ROI coding method is presented
in [47]. Instead of scaling all the bitplanes of ROI coefficients with the same
amount as in JPEG2000, the scaling is done differently for different bitplanes. The
resulting bitplanes after scaling can be divided into three categories: the first and most significant category contains the s1 most significant bits (MSBs) of ROI coefficients, the second category comprises the subsequent 2s2 bitplanes containing
s2 MSBs of background and unassigned s2 bits of ROI coefficients alternately,
while the third category contains the remaining least significant bitplanes of the
background. Here the sum s1 + s2 is equal to the largest number of bitplanes for
ROI coefficients. This bit assignment scheme is illustrated in Fig. 5a.
The so-called generalized bitplane-by-bitplane shift (GBbBShift) method proposed in [48] expands on BbBShift. Specifically, after arbitrary non-overlapping
ROI and background bitplane shifting, a binary bitplane (BP) mask is created that
locates the shifted ROI and background bitplanes. Each bit of the BP mask represents one bitplane as shown in Fig. 5b. The BP mask is encoded and transmitted
which assists the decoder in shifting the bitplanes back to their original order.
The BbBShift/GBbBShift methods allow early decoding of significant bitplanes of the background once sufficient ROI quality is achieved. Clearly, more
flexible control of ROI and background is supported by BbBShift/GBbBShift
[Figure: bitplane arrangements of ROI and non-ROI coefficients for (a) BbBShift (s1 = 3, s2 = 5) and (b) GBbBShift with its binary BP mask]

Figure 5: ROI coding using BbBShift and GBbBShift as in [47, 48].
which would operate favorably within the considered framework for wireless imaging. Moreover, the ROI shape does not need to be encoded while arbitrary scaling values are allowed.
Most/partial significant bitplane shift method
In the most significant bitplane shift (MSBShift) method [49], also known as partial significant bitplane shift (PSBShift) [50], only the s most significant ROI bitplanes are shifted keeping the other bitplanes at their original places (Fig. 6).
The advantages of the method include that relative importance of ROI and
background can be adjusted, multiple ROIs can be handled using different priorities, and arbitrary shaped ROIs can be handled without shape encoding.
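For one quantized magnitude with mb magnitude bitplanes, shifting only the s most significant bitplanes can be sketched per coefficient (a toy version; actual encoders operate on whole bitplanes):

```python
def psbshift_encode(value, mb, s):
    """Shift the s most significant of the mb magnitude bitplanes up
    by s positions; the remaining low bitplanes stay in place and
    interleave with the background bitplanes."""
    low_bits = mb - s
    high = value >> low_bits          # s most significant bitplanes
    low = value & ((1 << low_bits) - 1)
    return (high << mb) | low         # high part moves above bitplane mb

def psbshift_decode(coded, mb, s):
    """Reassemble the original magnitude from the shifted layout."""
    high = coded >> mb
    low = coded & ((1 << (mb - s)) - 1)
    return (high << (mb - s)) | low
```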
[Figure: bitplane arrangements of ROI and non-ROI coefficients for (a) a single ROI, with the s most significant of the Mb ROI bitplanes shifted, and (b) multiple ROIs with shifts s1 and s2]

Figure 6: ROI coding using MSBShift/PSBShift.
Hybrid bitplane shift method
Another ROI coding method called hybrid bitplane shift (HBShift) method is presented in [51], which is actually a combination of the BbBShift and MSBShift
methods. Here, the coefficient bitplanes are divided into three parts: the most
significant bitplanes of ROI (MSR), the general significant bitplanes of ROI and
BG (GSRB), and the least significant bitplanes of ROI and BG (LSRB). The MSR
and GSRB bitplanes are shifted in a way similar to the BbBShift method while the
encoding of LSRB bitplanes is done without any shift, as in the MSBShift method
(see Fig. 7).
The advantages of the method include that arbitrary ROI shapes can be encoded without the need to encode and transmit ROI shape information, that the relative importance of ROI and background can be controlled, and that multiple ROIs can be encoded with different priorities.
[Figure: bitplane arrangements of ROI (MSR, GSRB) and non-ROI coefficients for (a) a single ROI and (b) multiple ROIs]
Figure 7: ROI coding using HBShift.
Demand driven packet rearrangement method
A dynamic ROI coding scheme based on the rearrangement of the packets in the
encoded JPEG2000 data stream is proposed in [52]. The main idea is to place
packets associated with ROI first and those belonging to background afterwards
in the data stream so that the ROI can be received and decoded before the background. The packets linked to the user-defined ROIs are selected after the computation of associated precincts. The remaining packets in the data stream are
rearranged in such a way that these selected packets are sent prior to the non-demanded packets, now considered as background for the rest of the transmission
or until a change in the user demands occurs.
The demand driven packet rearrangement method allows for dynamic ROI
encoding and supports regular polygonal-shaped ROIs.
Multi-level-priority based packet rearrangement method
This method allows for gradual priority change between ROIs and background
[10, 11]. Instead of defining two priority levels, one for ROIs and the other for
background, multiple priority levels are introduced between ROIs and the background. The highest priority level is assigned to ROI packets while for the background packets, the priority drops following a Gaussian distribution. After priority
assignment, the packets are rearranged among the layers according to their priority levels. The main header of the final codestream is modified to accommodate the newly generated layers, and empty packets are created where necessary.
Controlling relative importance between ROIs and background in each decomposition level and handling multiple ROIs with different priorities are among
the benefits of this method.
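The Gaussian priority profile and the resulting packet order can be sketched as follows (the distance measure and spread sigma are illustrative assumptions):

```python
import numpy as np

def packet_priorities(dists, sigma=2.0):
    """ROI packets (distance 0) get the highest priority; background
    priority drops with distance from the ROI along a Gaussian profile."""
    d = np.asarray(dists, dtype=np.float64)
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

def rearrange(packets, dists):
    """Transmit packets in order of decreasing priority."""
    order = np.argsort(-packet_priorities(dists), kind="stable")
    return [packets[i] for i in order]
```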
Low-priority packet suppression based packet rearrangement method
Another approach of ROI coding based on packet rearrangement is proposed in
[53], which is similar to the demand driven method discussed above. The only
difference is that no new layers are created; instead, the packets with priorities lower than certain values are suppressed, and therefore there is no need to update the main header.
As such, this method is simple and fast. Also, re-encoding of the image, i.e.
updating the main header, is not needed.
5.4 Comparison of ROI Coding Methods
A comparison among different approaches of scaling based ROI coding is given
here and summarized in Table 1. The features used for classification of these
methods are:
• Support of multiple ROIs with different priorities.
• Relative preference control for ROI and background.
• Support of arbitrary shaped ROIs without need for shape encoding and
transmission.
• Compatibility with JPEG2000 standard.
A more comprehensive comparison may be given if further features/parameters, such as complexity and memory requirements, could be measured accurately for all the methods. However, this is outside the scope of this conceptual paper as the presented methods serve only as an indication of potential amendments in contemporary image compression techniques towards wireless imaging.
6 Advantages of ROI Coding in Different Applications
It is important to understand the effects of preferential ROI image coding on perceptual quality in order to eventually benefit over non-ROI coding. As JPEG2000
is the only standard that supports ROI coding, preliminary research has been focusing on this standard and its ROI coding approach.
6.1 General Performance Characteristics of ROI Coding
Subjective experiment
The work presented in [33] investigates the perceptual quality of ROI coding versus non-ROI coding. Detection of the primary ROIs was done using a VA algorithm [23, 24, 54]. Images were ROI encoded using the MAXShift method
of JPEG2000. A subjective experiment was conducted involving ten subjects (8
male and 2 female viewers) to reveal the relationship between encoding/decoding
mechanisms and perceptual quality. It was concluded that ROI coding performs
well for image content that comprises of a distinguished ROI over background of
little relevance. The favorable operation of ROI coding seems to decrease with
an increase of contextual detail in the background. It was also concluded that the
benefits of ROI coding over non-ROI coding increases with the decrease of bit
rate in terms of bits per pixel. Specifically, it was suggested that coefficient scaling based ROI coding is best in situations where arbitrary shaped ROI needs to be
encoded in the bitstream and where ROI size is less than 25% of the image size.
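For illustration, the MAXShift principle can be sketched on integer wavelet coefficients as follows; the function names are ours, and a real JPEG2000 codec applies the shift within bitplane coding rather than on a plain coefficient array:

```python
import numpy as np

def maxshift_encode(coeffs, roi_mask):
    """Shift ROI coefficients up by s bitplanes, where s covers all
    background magnitude bitplanes, so every ROI coefficient ends up
    above every background coefficient and no ROI shape is transmitted."""
    s = int(np.abs(coeffs[~roi_mask]).max()).bit_length()
    shifted = coeffs.copy()
    shifted[roi_mask] = shifted[roi_mask] * (1 << s)
    return shifted, s

def maxshift_decode(shifted, s):
    """Coefficients with magnitude >= 2^s can only belong to the ROI;
    shift them back down to recover the original values."""
    roi_mask = np.abs(shifted) >= (1 << s)
    coeffs = shifted.copy()
    coeffs[roi_mask] = coeffs[roi_mask] // (1 << s)  # exact: multiples of 2^s
    return coeffs, roi_mask
```

Since the decoder recovers the ROI mask from the coefficient magnitudes alone, arbitrary shaped ROIs incur no shape-coding cost, which is the property exploited above.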
Table 1: Comparison between different ROI coding methods. The methods compared are GSB, MAXShift, MAXShift-Like, BbBShift/GBbBShift, MSBShift/PSBShift, and HBShift. For each method, the table indicates support of multiple ROIs with different priorities, relative preference control for ROI and background, support of arbitrary shaped ROIs without shape encoding, and compatibility with the JPEG2000 standard; of the listed methods, GSB and MAXShift are included in the standard.
On Region of Interest Coding for Wireless Imaging
Although this experimental work indicates benefits of ROI coding in certain
scenarios over non-ROI coding, larger subjective test campaigns would need to be
conducted. In order to produce statistically more relevant results, a methodology
following ITU-R Rec. BT.500-11 [55] is recommended. This would involve a group of at least 20 test subjects and require the experiment to be conducted in two independent visual laboratories. In view of applications in wireless imaging, it would
also be interesting to investigate whether the benefits of ROI coding are more pronounced once impairments to the JPEG2000 compressed image are imposed by
the error prone radio channel as an ultimate test of its applicability. Furthermore,
ROI coefficient scaling methods other than MAXShift may perform differently
and should be analyzed for comparison.
Mechanisms of processing spatial detail
In [54], the impact of the three different mechanisms provided in JPEG2000 for encoding/decoding of spatial detail on perceptual image quality is investigated. In particular, tiling of ROI and background, code block
size selection, and scaling of ROI coefficients are considered. Performance assessment was done using the fidelity metric PSNR and was not linked to a perceptual metric or a subjective experiment. The numerical results produced in this
work support the following conclusions. ROI coding using tiling was not found
to be efficient. This is thought to be due to the primary purpose of tiling being
a mechanism to reduce memory requirements rather than advancing ROI coding.
As for ROI coding using different code block sizes, it was observed that the selection depends highly on bit rate, ROI size, and the priority of the ROI compared to the background. For high bit rates, a large ROI, or a nearly equally important ROI and background, a block size of 32 × 32 was suggested, while a 16 × 16 block size should be used when the ROI size is small compared to the image size, the bit rate is low, or when the ROI has high priority over the background. In addition, it is suggested to use code block size selection where an arbitrarily shaped ROI is expected to be extracted
from the coded bitstream. Finally, the coefficient scaling based ROI coding seems
to perform best in scenarios where arbitrary shaped ROIs are needed and ROI
accounts for less than 25% of the image.
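As an illustration only, these rules of thumb can be collected in a small helper; the 0.25 bpp threshold for a "low" bit rate is our assumption rather than a value reported in [54]:

```python
def suggest_code_block_size(bit_rate_bpp, roi_fraction, roi_has_high_priority):
    """Heuristic code block size choice distilled from the observations
    above: 16 x 16 when the ROI is small (< 25% of the image area), the
    bit rate is low, or the ROI has high priority; 32 x 32 otherwise."""
    if roi_fraction < 0.25 or bit_rate_bpp < 0.25 or roi_has_high_priority:
        return (16, 16)
    return (32, 32)
```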
6.2 Exploring ROI Coding for Wireless Imaging
Combining ROI coding with channel coding, in particular using UEP over EEP, is a natural choice for exploiting spatial detail. Three UEP schemes for ROI coding
are reported in the sequel. They all support the favorable operation of ROI coding
in conjunction with UEP for wireless imaging.
UEP and ROI Coding
The scheme proposed in [56] takes advantage of the hierarchical nature of the ROI
coding of JPEG2000. It uses MAXShift for ROI coefficient scaling and applies
two levels of error protection. Strong protection is given to the ROI packets using the (24,12) extended Golay code and relatively weak protection is given to
the background packets using the (8,4) extended Hamming code. Conventional
EEP was considered for comparison. The average PSNR of the ROI and the background information over 100 transmissions of the image under test and the number of
openable images (openable with the Kakadu software [57]) were compared for
both schemes. The results show that the images protected by UEP offer higher
PSNR compared to those protected by EEP. For the scenario considered in this
work, it has also been observed that none of the received images was decodable
at a signal-to-noise ratio of 6 dB when EEP was used, whereas 42% of the received
images were decoded when UEP was utilized.
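It is worth noting that the (24,12) extended Golay and (8,4) extended Hamming codes are both of rate 1/2, so this particular UEP scheme differentiates error correction capability (up to three correctable bit errors per Golay codeword versus one per Hamming codeword) rather than redundancy. The overhead bookkeeping can be sketched as follows (the helper names are ours):

```python
def channel_bits(payload_bits, n, k):
    """Channel bits needed to carry a payload through an (n, k) block code."""
    blocks = -(-payload_bits // k)  # ceil(payload_bits / k)
    return blocks * n

def uep_channel_bits(roi_bits, bg_bits):
    """ROI packets via the (24,12) extended Golay code, background
    packets via the (8,4) extended Hamming code, as in [56]."""
    return channel_bits(roi_bits, 24, 12) + channel_bits(bg_bits, 8, 4)
```

Both component codes double the payload size, so the UEP and EEP variants compared in [56] spend similar overhead; the gain comes from where the correction power is concentrated.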
Adaptive UEP and ROI Coding
In [12], the prioritized adaptive unequal channel protection (PAUCP) scheme is
presented. It assigns protection to each JPEG2000 packet by making use of adaptive unequal channel protection (AUCP) [8, 9] and the priority of the packet based
on its distance from the center of the ROI as presented in [10]. Simulation results for a Rayleigh fading channel showed an improvement in the visual quality of
the reconstructed images compared to different channel protection techniques at
different channel conditions and bit rates.
On Region of Interest Coding for Wireless Imaging
51
JPEG2000 Wireless
In April 2007, the JPEG2000 image coding system Part 11: Wireless, also referred to as JPWL, became a published standard [59]. JPWL applies error detection and error correction techniques to the codestream in order to facilitate
transmission of JPEG2000 encoded image data over error prone wireless channels
and networks. It basically addresses the fact that the error resilience techniques advised in the JPEG2000 core standard operate on the premise of error-free headers, which in many applications, such as wireless imaging over radio channels, may not be fulfilled. In particular, JPWL specifies two cyclic redundancy check (CRC)
codes, the 16-bit CRC-CCITT (X.25) and the 32-bit Ethernet CRC, to be used in the
common way for error detection. In addition, a set of Reed-Solomon (RS) codes for a variety of block lengths and error correction capabilities is specified. The set of RS codes is used for forward error correction and enables UEP to be performed for different parts of the codestream depending on their importance for the reconstruction of the image at the receiver. In particular, more error protection may be
given to the main header and tile headers in the codestream.
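As an example, the 16-bit CRC is commonly parameterized as CRC-16/X.25 (reflected polynomial 0x1021, initial value 0xFFFF, final XOR 0xFFFF); the sketch below shows this common variant and is not lifted from the JPWL text itself:

```python
def crc16_x25(data: bytes) -> int:
    """Bit-reflected CRC-16 as used in X.25/HDLC framing; the returned
    16-bit checksum is appended to the data for error detection."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF
```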
7 Conclusions
In this conceptual paper, we have focused on stimulating HVS driven signal processing approaches in terms of preferential image coding, namely, ROI coding and its potential application in wireless imaging. For this purpose, the general concepts behind ROI coding have been presented including the notion of ROI, image classes,
and the important field of quality assessment of ROI coding and related perceptual
metrics. On this basis, a framework for ROI coding for wireless imaging has been
proposed and its key functions have been discussed. In particular, the areas of ROI
identification, ROI coding and quality measures may be seen as unique features
of such a framework allowing for a number of advanced system design options.
Especially, UEP can accompany or replace EEP while suitable quality measures
would drive link adaptation techniques and allow for trading off quality with bitrate and vice versa. Regarding the building blocks for the suggested framework,
it can be concluded that both ROI identification and perceptual-based image quality assessment need further research efforts in order to provide the complete tool set for the
wireless imaging scenario to function efficiently. On the other hand, ROI coding techniques themselves are available as non-standard algorithms and in the JPEG2000 standard. Both classes have been surveyed and discussed in this paper, revealing many
options for preference scaling between ROI and background in the JPEG2000
standard. It has also been noted that JPEG2000 has recently introduced a wireless part that includes some conventional channel coding schemes to account for the error prone wireless channel and to support UEP of headers and ROI over background information in the codestream. Finally, advantages of ROI coding and some related work that explores ROI coding are reported and discussed. Although ROI coding has been shown to operate favorably over non-ROI coding, the related
work appears to be inconclusive. It may be recommended to perform larger scale
subjective experiments as well as expanding towards studying the effects of the
error prone channel on the perceptual quality and not solely concentrate on the
ROI coding algorithms in isolation. Similar scope appears to exist when it comes
to exploiting the ROI feature as only a few UEP coding approaches have been
suggested so far while efficient link adaptation schemes could further strengthen
the benefits of ROI coding in wireless imaging. The work presented in this paper
can support the understanding of the relationship between the fundamental components of ROI coding for wireless imaging and stimulate the consolidation of those fields that may still be considered underdeveloped. It can also be seen as a platform to expand HVS based methodologies towards cross-layer design techniques
for wireless multimedia systems and general RRM strategies other than explicit
and implicit link adaptation.
References
[1] L. Stelmach, W. J. Tam, and P. J. Hearty, “Static and Dynamic Spatial Resolution in Image Coding: An Investigation of Eye Movements”, in Proc. SPIE Human Vision, Visual Processing and Digital Display II, San Jose, USA, vol. 1453, pp. 147-152, June 1991.
[2] L. Stark, I. Yamashita, G. Tharp, and H. X. Ngo, “Search Patterns and Search Paths in Human Visual Search”, in D. Brogan, A. Gale, and K. Carr (eds), Visual Search 2, pp. 37-58, Taylor and Francis, London, 1993.
[3] N. H. Mackworth and A. J. Morandi, “The Gaze Selects Informative Details Within Pictures”, Perception and Psychophysics, vol. 2, no. 11, pp. 547-552, 1967.
[4] J. M. Findlay, “The Visual Stimulus for Saccadic Eye Movements in Human Observers”, Perception, vol. 9, pp. 7-21, 1980.
[5] M. D’Zmura, “Color in Visual Search”, Vision Research, vol. 31, no. 6, pp. 951-966, 1991.
[6] J. Wise, “Eye Movements While Viewing Commercial NTSC Format Television”, White Paper, SMPTE Psychophysics Committee, 1984.
[7] A. L. Yarbus, “Eye Movements and Vision”, Plenum Press, New York, 1967.
[8] V. Sanchez and M. K. Mandal, “Adaptive Unequal Channel Protection for JPEG2000 Images”, in Proc. Indian Conf. on Computer Vision, Graphics and Image Processing, Ahmedabad, India, Dec. 2002.
[9] V. Sanchez, M. K. Mandal, and A. Basu, “Efficient Channel Protection for JPEG2000 Bitstream”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, no. 4, pp. 554-558, Apr. 2004.
[10] V. Sanchez, A. Basu, and M. K. Mandal, “Prioritized Region of Interest Coding in JPEG2000”, in Proc. Int. Conf. on Pattern Recognition, Cambridge,
UK, vol. 2, pp. 799-802, Aug. 2004.
[11] V. Sanchez, A. Basu, and M. K. Mandal, “Prioritized Region of Interest Coding in JPEG2000”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, pp. 1149-1155, Sept. 2004.
[12] V. Sanchez, M. Mandal, and A. Basu, “Robust Wireless Transmission of
Regions of Interest in JPEG2000”, in Proc. Int. Conf. on Image Processing,
Singapore, vol. 4, pp. 2491-2494, Oct. 2004.
[13] T. M. Stough and C. E. Brodley, “Focusing Attention on Objects of Interest
Using Multiple Matched Filters”, IEEE Trans. on Image Processing, vol. 10,
no. 3, pp. 419-426, March 2001.
[14] D. Giguet, L. J. Karam, and G. P. Abousleman, “Very Low Bit-Rate Target-Based Image Coding”, in Proc. Asilomar Conf. on Signals, Systems and
Computers, Pacific Grove, USA, pp. 778-782, Nov. 2001.
[15] T. M. Kusuma and H.-J. Zepernick, “A Reduced-Reference Perceptual Quality Metric for In-Service Image Quality Assessment”, in Proc. IEEE Symp.
on Trends in Commun., Bratislava, Slovakia, pp. 71-74, Oct. 2003.
[16] T. M. Kusuma, “A Perceptual-Based Objective Quality Metric for Wireless
Imaging”, Ph.D. thesis, Curtin University of Technology, Perth, Australia,
2005.
[17] S. Winkler, E. D. Gelasca, and T. Ebrahimi, “Perceptual Quality Assessment
for Video Watermarking”, in Proc. IEEE Int. Conf. on Inf. Technology: Coding and Compression, Las Vegas, USA, pp. 90–94, Apr. 2002.
[18] A. W. Rix, A. Bourret, and M. P. Hollier, “Models of Human Perception”,
Journal of BT Technology, vol. 17, no. 1, pp. 24–34, Jan. 1999.
[19] Z. Wang and E. P. Simoncelli, “Reduced-Reference Image Quality Assessment Using a Wavelet-Domain Natural Image Statistic Model”, in Proc.
SPIE Human Vision and Electronic Imaging, vol. 5666, pp. 149-159, Mar.
2005.
[20] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality
Assessment: From Error Visibility to Structural Similarity”, IEEE Trans. on
Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[21] U. Engelke and H.-J. Zepernick, “Quality Evaluation in Wireless Imaging
Using Feature-Based Objective Metrics”, in Proc. IEEE Int. Symp. on Wireless Pervasive Computing, San Juan, Puerto Rico, pp. 367-372, Feb. 2007.
[22] U. Engelke and H.-J. Zepernick, “An Artificial Neural Network for Quality
Assessment in Wireless Imaging Based on Extraction of Structural Information”, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing,
Honolulu, USA, pp. 1249-1252, Apr. 2007.
[23] F. W. M. Stentiford, “An Evolutionary Programming Approach to Simulation of Visual Attention”, in Proc. Congress on Evolutionary Computation,
Seoul, Korea, vol. 2, pp. 851-858, May 2001.
[24] F. W. M. Stentiford, “An Estimator for Visual Attention Through Competitive Novelty with Application to Image Compression”, in Proc. Picture Coding Symp., Seoul, Korea, pp. 101-104, April 2001.
[25] X. C. Yin, C. P. Liu, and Z. Han, “Financial Document Image Coding With
Regions of Interest Using JPEG2000”, in Proc. Int. Conf. on Document
Analysis and Recognition, Seoul, Korea, vol. 1, pp. 96-100, Aug/Sept. 2005.
[26] H. Lin, J. Si, and G. P. Abousleman, “Knowledge-Based Hierarchical
Region-of-Interest Detection”, in Proc. IEEE Int. Conf. on Acoustics, Speech
and Signal Processing, Orlando, USA, vol. 4, pp. IV-3628-IV-3631, May
2002.
[27] C. M. Privitera and L. W. Stark, “Algorithms for Defining Visual Regions-of-Interest: Comparison with Eye Fixations”, IEEE Trans. on Pattern Analysis
and Machine Intelligence, vol. 22, no. 9, pp. 970-982, Sept. 2000.
[28] A. Wong and W. Bishop, “Expert Knowledge Based Automatic Regions-of-Interest (ROI) Selection in Scanned Documents for Digital Image Encryption”, in Proc. Canadian Conf. on Computer and Robot Vision, Quebec City,
Canada, pp. 51-51, June 2006.
[29] S. L. Stoev and W. Straßer, “Extracting Regions of Interest Applying a Local
Watershed Transformation”, in Proc. IEEE Visualization 2000 Conf., Salt
Lake City, USA, pp. 21-28, Oct. 2000.
[30] D. S. Taubman and M.W. Marcellin, “JPEG2000: Image Compression
Fundamentals, Standards and Practice”, Kluwer Academic Publishers,
Boston/Dordrecht/London, 2002.
[31] A. N. Skodras and T. Ebrahimi, “JPEG2000 Image Coding System Theory
and Applications”, in Proc. IEEE Int. Symp. on Circuits and Systems, Kos,
Greece, pp. 4, May 2006.
[32] C. Christopoulos, J. Askelöf and M. Larsson, “Efficient Methods for Encoding Regions of Interest in the Upcoming JPEG2000 Still Image Coding
Standard”, IEEE Signal Processing Letters, vol. 7, no. 9, pp. 247-249, Sept.
2000.
[33] A. P. Bradley, “Can Region of Interest Coding Improve Overall Perceived
Image Quality?”, in Proc. APRS Workshop on Digital Image Computing,
Brisbane, Australia, pp. 41-44, Feb. 2003.
[34] A. P. Bradley and F. W. M. Stentiford, “Visual Attention for Region of Interest Coding in JPEG 2000”, Journal of Visual Communication and Image
Representation, vol. 14, pp. 232-250, 2003.
[35] A. Signoroni, “Exploitation and Extension of the Region-of-Interest Coding
Functionalities in JPEG2000”, IEEE Trans. on Consumer Electronics, vol.
49, no. 4, pp. 818-823, Nov. 2003.
[36] E. S. Kang, T. Tanaka, and S. J. Ko, “Improved Embedded Zerotree Wavelet
Coder”, Electronics Letters, vol. 35, no. 9, pp. 705-706, April 1999.
[37] E. S. Kang, H. J. Choi, and S. J. Ko, “Progressive Region of Interest Coding
Using an Improved Embedded Zerotree Wavelet Coding”, in Proc. IEEE
Region 10 Conf., Cheju Island, Korea, vol. 1, pp. 609-612, Sept. 1999.
[38] S. A. Mohamed and M. M. Fahmy, “Binary Image Compression Using Efficient Partitioning Into Rectangular Regions”, IEEE Trans. on Commun., vol.
43, no. 5, pp. 1888-1893, May 1995.
[39] L. Ma, J. Si and G.P. Abousleman, “Image Segmentation Using Watershed
Guided By Edge Tracing”, in Proc. Int. Conf. on Info-tech and Info-net,
Beijing, China, vol. 3, pp. 372-377, Oct./Nov. 2001.
[40] Y. Shoham and A. Gersho, “Efficient Bit Allocation for an Arbitrary Set of
Quantizers”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol.
36, no. 9, pp. 1445-1453, Sept. 1988.
[41] J. Strom and P. C. Cosman, “Medical Image Compression with Lossless
Regions of Interest”, Signal Processing, vol. 59, no. 2, pp. 155-171, June
1997.
[42] D. Nister and C. Christopoulos, “Lossless Region of Interest with a Naturally Progressive Still Image Coding Algorithm”, in Proc. IEEE Int. Conf.
on Image Processing, Chicago, USA, vol.3, pp. 856-860, Oct. 1998.
[43] A. Said and W. A. Pearlman, “An Image Multiresolution Representation for
Lossless and Lossy Compression”, IEEE Trans. on Image Processing, vol.
5, no. 9, Sept. 1996.
[44] A. Said and W. A. Pearlman, “A New, Fast and Efficient Image Codec Based
On Set Partitioning In Hierarchical Trees”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996.
[45] A. N. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG2000 Still
Image Compression Standard”, IEEE Signal Processing Mag., pp. 36–58,
Sept. 2001.
[46] R. Grosbois, D. S. Cruz and T. Ebrahimi, “New Approach to JPEG2000
Compliant Region of Interest Coding”, in Proc. SPIE, San Diego, California, USA, vol. 4472, pp. 267-275, Aug. 2001.
[47] Z. Wang and A. C. Bovik, “Bitplane-by-Bitplane Shift (BbBShift) - A Suggestion for JPEG2000 Region of Interest Coding”, IEEE Signal Processing
Letters, vol. 9, pp. 160-162, May 2002.
[48] Z. Wang, S. Banerjee, B. L. Evans, and A. C. Bovik, “Generalized Bitplaneby-Bitplane Shift Method for JPEG2000 ROI Coding”, in Proc. Int. Conf.
on Image Processing, Rochester, USA, vol. 3, pp. III-81-III-84, Sept. 2002.
[49] L. Liu and G. Fan, “A New Method for JPEG2000 Region-of-Interest Image
Coding: Most Significant Bitplanes Shift”, in Proc. Midwest Symp. on Circuits and Systems, Oklahoma State University, Oklahoma, USA, vol. 2, pp.
II-176-II-179, Aug. 2002.
[50] L. Liu and G. Fan, “A New JPEG2000 Region-of-Interest Image Coding
Method: Partial Significant Bitplanes Shift”, IEEE Signal Processing Letters, vol. 10, no. 2, pp. 35-38, Feb. 2003.
[51] L. B. Zhang and K. Wang, “New Approach for JPEG2000 Region of Interest Image Coding: Hybrid Bitplane Shift”, in Proc. Int. Conf. on Machine
Learning and Cybernetics, Shanghai, China, vol. 6, pp. 3955-3960, Aug.
2004.
[52] R. Rosenbaum and H. Schumann, “Flexible, Dynamic and Compliant Region of Interest Coding in JPEG2000”, in Proc. Int. Conf. on Image Processing, Rochester, USA, vol. 3, pp. III-101 - III-104, Sept. 2002.
[53] H. S. Kong, A. Vetro, T. Hata, and N. Kuwahara, “Fast Region-of-Interest
Transcoding for JPEG2000 Images”, in Proc. IEEE Int. Symp. on Circuits
and Systems, Kobe, Japan, vol. 2, pp. 952-955, May 2005.
[54] A. P. Bradley and F. W. M. Stentiford, “JPEG2000 and Region of Interest
Coding”, in Proc. Digital Image Computing Techniques and Applications,
Melbourne, Australia, pp. 303-308, Jan. 2002.
[55] ITU, “Methodology for the Subjective Assessment of the Quality of Television Pictures,” ITU-R, Rec. BT.500-11, 2002.
[56] Y. Yatawara, M. Caldera, T. M. Kusuma, and H. J. Zepernick, “Unequal
Error Protection for ROI Coded Images Over Fading Channels”, in Proc.
Systems Commun., Montreal, Canada, pp. 111-115, Aug. 2005.
[57] D. Taubman, “Kakadu Software”, [Online], Available: http://www.kakadusoftware.com, July 2007.
[58] Sample Image, [Online], Available: http://www.navy.gov.au/gallery/, July 2007.
[59] International Organization for Standardization, “JPEG2000 Image Coding
System: Wireless”, ISO/IEC 15444-11:2007, May 2007.
Part II
Error Sensitivity Analysis for
Wireless JPEG2000 Using
Perceptual Quality Metrics
Part II is published as:
M. I. Iqbal, H.-J. Zepernick, and U. Engelke, “Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics,” in Proceedings of International Conference on Signal Processing and Communication Systems, Gold Coast,
Australia, pp. 1-9, Dec. 2008.
Error Sensitivity Analysis for Wireless JPEG2000
Using Perceptual Quality Metrics
Muhammad Imran Iqbal, Hans-Jürgen Zepernick, and Ulrich Engelke
Abstract
Quality assessment of mobile and wireless multimedia services including image and video applications has gained increased attention in recent
years as a means of facilitating efficient radio resource management. In particular, approaches that utilize perceptual-based metrics are becoming more
dominant, as conventional fidelity metrics such as the peak signal-to-noise
ratio (PSNR) may not correlate well with quality as perceived by the human
observer. In this paper, we focus on the error sensitivity analysis for images
given in the wireless JPEG2000 (JPWL) format using perceptual quality
metrics. Specifically, the perceptual quality improvements obtained by progressively decoding an increasing number of image packets are examined. It
is shown that the considered perceptual quality metrics exploiting structural image features may accompany or replace the PSNR-based error sensitivity descriptor (ESD) marker segment in the wireless JPEG2000 standard.
This addition will increase the effectiveness of the ESD marker segment as
it facilitates the communication of reduced-reference information about the
image quality from the transmitter to the receiver. In addition, the proposed
approach can be used to guide the design of preferential error control coding
schemes, link adaptation techniques, and selective retransmission of packets
with respect to their contribution to overall quality as perceived by humans.
1 Introduction
Mobile multimedia applications such as image and video services have gained increased attention with the deployment of third generation mobile radio systems.
In order to conserve system resources, especially the required bandwidth, multimedia content is typically source encoded prior to transmission aiming at reducing
the redundancy inherent in the source signal. This leaves the generated data stream
highly vulnerable to transmission errors. This applies in particular to the radio
channel, which induces severe impairments due to multipath propagation. The
resulting time-varying signal fading in turn may cause significant degradations to
the service. Therefore, link adaptation techniques including power control, adaptive coding and modulation as well as retransmission techniques are widely used
to improve the quality of radio channels.
In the more general context of radio resource management (RRM), it would be beneficial to base link adaptation not only on the varying quality of the radio channel but also on characteristics of the multimedia service. As far as visual content is concerned, for example, this may be achieved by differentiating between region-of-interest (ROI) and background (BG) information. Specifically, strong error control may be imposed on the more important ROI components
while weaker schemes could be sufficient for the BG of an image or video. This
approach essentially leads to unequal error protection (UEP) schemes that would
consume less overhead for facilitating error control compared to a worst case design with equal error protection (EEP). As a matter of fact, the same ideas of preferential error control coding with sufficient granularity of differentiation could
also be applied to the actual digital data stream produced by a source encoder.
Specifically, reliable transmission of headers may be essential to the reconstruction of a source encoded image or video while the payload may be composed into
a number of classes of importance with respect to achieving certain visual quality.
The most recent addition to the family of JPEG2000 known as Wireless JPEG
2000 (JPWL) [1] specifies an advanced tool set that allows for the aforementioned
preferential error control coding strategies. As its name indicates, JPWL targets wireless imaging applications. Specifically, it recommends an error sensitivity description with respect to the level of importance of different codestream
portions for the quality of the reconstructed image. The error metrics used to
quantify these error sensitivities belong to the class of fidelity metrics and include
the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR). These
so-called full-reference metrics operate on pixel-by-pixel comparisons between
reference and impaired images. As a consequence, they are applied in JPWL at
the transmitter end, for instance to compose a UEP scheme, but are not readily applicable to explicitly assess the degree of quality degradation induced by an
error-prone wireless channel. In addition, fidelity metrics do not correlate well
with the quality as it would be perceived by humans [2, 3].
In this paper, we propose to extend the JPWL error sensitivity description by
objective perceptual image quality metrics. In particular, the perceptual relevance
weighted Lp-norm and the normalized hybrid image quality metric (NHIQM)
[4, 5] are utilized for error sensitivity analysis of JPWL codestreams. The considered perceptual quality metrics are based on extracting structural features from
the viewing area such as image activity, edges, and lost blocks. The perceptual
weights included in the feature pooling are instrumental in making the connection to the human visual system (HVS) and support excellent quality prediction
performance [5]. Moreover, both perceptual quality metrics belong to the class of
reduced-reference metrics and as such can be readily applied for RRM purposes
requiring only little overhead. The numerical results reveal that the Lp-norm and
NHIQM in fact provide an even more pronounced error sensitivity description for
JPWL codestream portions compared to the fidelity metrics adopted in JPWL. As
JPWL has reserved an error sensitivity descriptor field to be available for future
use, the proposed approach could easily be included in the JPWL system.
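The pooling step behind such reduced-reference metrics amounts to a perceptually weighted combination of normalized structural features; the schematic sketch below uses placeholder feature names and weights, not the trained values from [4, 5]:

```python
def pooled_quality_index(features, weights):
    """Weighted sum of normalized structural features (each in [0, 1]),
    in the spirit of NHIQM; larger values indicate stronger distortion."""
    return sum(weights[name] * value for name, value in features.items())

def reduced_reference_degradation(index_tx, index_rx):
    """Only the pooled index of the transmitted image travels with the
    stream, so degradation is judged from the index difference alone."""
    return abs(index_tx - index_rx)
```

Transmitting a single pooled value per image is what keeps the reduced-reference overhead small enough for RRM purposes.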
The remainder of the paper is organized as follows. Section 2 provides fundamentals on the examined JPEG2000 and JPWL image coding standards. Subsequently in Section 3, the error sensitivity description used in JPWL is outlined and
the rationale behind using a perceptual-based error sensitivity descriptor is given.
In Section 4, the quality metrics used for error sensitivity analysis are described.
Numerical results for the error sensitivity analysis of JPWL using conventional
and perceptual-based quality metrics are provided and discussed in Section 5.
Section 6 concludes the paper.
2 JPEG2000 and JPWL
In the sequel, some fundamentals on the JPEG2000 and JPWL encoding process
are given to an extent as needed for the understanding of the error sensitivity
analysis.
2.1 JPEG2000 Image Compression and Codestream
The primary goal for the development of JPEG2000 has been to provide better
coding performance, especially at low bit rates, and to remedy shortcomings associated with JPEG such as blocking artifacts. In addition, JPEG2000 offers a
number of advanced features including multiple resolution representation of an
image, bitstream organization mechanisms to provide progressive decoding by
quality and/or by spatial resolution, single architecture for both lossless and lossy
compression, ROI coding, and error resilience tools [6].
The fundamental components of the JPEG2000 encoding process are organized in the following four main functions:
• Pre-processing. The image pixels are DC level shifted. Optionally, the
preprocessing may also perform tiling and component transformation [7].
• Transformation. Each tile is decomposed into transform coefficients
through a reversible/irreversible discrete wavelet transform (DWT).
• Quantization. The wavelet coefficients are quantized using uniform scalar
quantization with extended dead-zone. However, quantization is omitted if
lossless compression is desired.
• Entropy encoding. The quantized wavelet coefficients are arithmetically
coded using a technique referred to as embedded block-based coding with
optimized truncation.
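As a small illustration of the quantization step, the dead-zone uniform scalar quantizer can be modeled per coefficient as follows (the function names are ours; in practice the step size is derived per subband):

```python
import math

def deadzone_quantize(y, step):
    """q = sign(y) * floor(|y| / step); the zero bin spans (-step, step),
    i.e. it is twice as wide as the other bins (the dead-zone)."""
    return int(math.copysign(math.floor(abs(y) / step), y))

def deadzone_dequantize(q, step, r=0.5):
    """Reconstruct at fraction r into the decoded bin (r = 0.5 is the
    midpoint commonly used by decoders)."""
    return 0.0 if q == 0 else math.copysign((abs(q) + r) * step, q)
```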
The bitstream released by the arithmetic encoder represents the coded image
data and is arranged into packets along with associated headers. The syntax of
the resulting codestream comprises headers, coded image data, and markers.
Figure 1 illustrates the organization of the JPEG2000 codestream. It consists of
Figure 1: Organization of a JPEG2000 codestream.
a main header followed by tilestreams and terminated by an end of codestream (EOC) marker. Similarly, each tilestream comprises a tile header and a packstream. Eventually, each packet in the packstream consists of a packet header and carries portions of the actual image bitstream as packet data.
It should be noted that the different headers are comprised of markers and
marker segments that are used to delimit and characterize the compressed information carried with the codestream. These modular principles allow for flexible
arrangements of the bitstream and progressive image representation. For example,
resolution and quality progression may be chosen to be reflected in the codestream
organization.
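The nesting just described maps naturally onto a small data model (illustrative only; the field names are ours, not identifiers from the standard):

```python
from dataclasses import dataclass, field
from typing import List

EOC_MARKER = 0xFFD9  # end of codestream marker terminating the stream

@dataclass
class Packet:
    header: bytes  # packet header
    data: bytes    # portion of the coded image bitstream

@dataclass
class Tilestream:
    tile_header: bytes
    packets: List[Packet] = field(default_factory=list)

@dataclass
class Codestream:
    main_header: bytes
    tilestreams: List[Tilestream] = field(default_factory=list)
```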
2.2 JPWL Image Compression and Markers
The JPWL component of the JPEG2000 family of image standards is specified
in JPEG2000 Part 11 and aims at wireless applications [1]. Although JPEG2000
offers some error resilience mechanisms to alleviate the effects of errors in noisy
Figure 2: JPWL system using transcoders at transmitter and receiver [1].
transmission channels, they are not sufficient to counteract the severe impairments
induced by error-prone wireless channels. As such, JPWL uses JPEG2000 as
baseline encoder and adds additional functionality to enable error detection and
correction. For this purpose, JPWL describes two types of cyclic redundancy
check (CRC) codes and a set of Reed-Solomon (RS) default codes. Given the
progressive data representation of the JPEG2000 baseline encoder, UEP can be
added with the JPWL extension providing the different levels of error protection
according to the different error sensitivity of the parts of the codestream. Accordingly, JPWL offers the tools to describe the error sensitivity of different image parts as a prerequisite for the selection of different CRC and RS codes used to compose a UEP scheme. It is also capable of locating residual errors that
may be present in the codestream after transmission over an error-prone wireless
channel and transcoding of JPWL to JPEG2000. The block diagram of a typical
JPWL transcoder configuration emphasizing the aforementioned additions to the
JPEG2000 baseline encoder/decoder is depicted in Fig. 2.
The JPWL format introduces four additional marker segments that provide
information about the error protection utilized in a codestream such as selected
code and generated parity symbols. In particular, these marker segments are referred to as error protection block (EPB), error sensitivity descriptor (ESD), error
protection capability (EPC), and residual errors descriptor (RED) [1]:
• EPB. Indicates the presence of JPWL protected data, including protection parameters and the generated parity data that enable the protection.
• ESD. Provides the sensitivity information for the different parts of the JPEG2000 codestream, which may then be used to select the suitable level of
error protection.
• EPC. Contains information about the JPWL tools that are used in the codestream for protection against errors.
• RED. Indicates the presence of residual errors in the codestream. It locates
and describes categories of residual errors to assist the decoder in reconstructing the image.
As the sensitivity information provided by the ESD marker segment constitutes a key aspect in the design of a UEP scheme for a progressively organized
JPWL codestream, it shall be discussed in the sequel in more detail. Specifically,
the fact that JPWL reserves an option for future ESD types motivated us to explore
such an opening for including objective perceptual quality metrics.
3 Error Sensitivity Description
The notion of error sensitivity relates to the effect of quality degradation in a decoded image that may be experienced due to different parts of a codestream being
truncated, corrupted, or lost during transmission. In other words, error sensitivity reflects the level of importance of different parts of the codestream for the
reconstruction of an image. The JPWL standard caters for three modes of subdividing the codestream into data units, namely, byte-range mode, packet mode,
and packet-range mode. Moreover, the standard differentiates between relative and absolute sensitivity. Relative sensitivity is likewise deduced from quality metrics but expresses the sensitivity of each considered codestream portion simply by an unsigned integer. Absolute sensitivity, on the other hand, utilizes distinct metrics to quantify the error sensitivity of the different codestream portions. Specifically, the
metrics defined in JPWL for absolute sensitivity are listed in Table 1.
Table 1: Error metrics used with absolute sensitivity

Specifier   Type of Error Metric
001         Mean squared error (MSE)
010         MSE reduction
011         PSNR
100         PSNR increase
101         Absolute peak error (MAXERR)
110         Total squared error (TSE)
111         Reserved for future use
The error sensitivity information provided with the JPWL system can be used
in many ways to tailor the image error protection such that it counteracts the impairments induced by error-prone wireless channels. For instance, error sensitivity may be exploited for optimizing codestream error protection through UEP and for designing intelligent automatic repeat request (ARQ) schemes in which a higher number of retransmissions is allowed for the more important codestream portions.
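To make the mapping from sensitivity values to protection levels concrete, the sketch below ranks codestream portions by sensitivity and assigns stronger codes to the more sensitive portions. The function name and the (n, k) code pairs are illustrative placeholders, not the JPWL default code set:

```python
def assign_protection(sensitivities, codes):
    """Map per-portion sensitivity values to error control codes.

    sensitivities: one value per codestream portion (higher = more
    important); codes: (n, k) pairs ordered from strongest (most
    parity) to weakest.  Portions are split into len(codes) rank
    groups of roughly equal size.
    """
    order = sorted(range(len(sensitivities)),
                   key=lambda i: -sensitivities[i])
    group = -(-len(order) // len(codes))  # ceiling division
    assignment = [None] * len(sensitivities)
    for rank, idx in enumerate(order):
        assignment[idx] = codes[min(rank // group, len(codes) - 1)]
    return assignment
```

For instance, with three portions of sensitivity 0.9, 0.1, and 0.5 and three codes, the most sensitive portion receives the first (strongest) code and the least sensitive one the last (weakest) code.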
Although JPWL provides the aforementioned additional tool set for dealing
with the particulars of wireless imaging applications, some limitations remain due
to the choice of error metrics. First of all, the types of error metrics given in Table 1 belong to the class of full-reference metrics. As such, they are applied to
calculate error sensitivity only at the transmitter end where the original image is
available. In this case, the error control coding advised in JPWL may be used to
deal with transmission errors. However, whether transmission errors or potential post-decoding errors have caused a perceptual quality degradation cannot be decided at the receiver, as the reference image is not available there. In addition, all error metrics that have been defined in JPWL so far are fidelity metrics, which again do not always correlate well with quality as perceived by humans.
In order to further increase the quality control tool set of JPWL, we propose
to include perceptual error sensitivity into the system based on an investigation of
suitable objective perceptual quality metrics. These quality metrics are inherently
well adapted to the HVS and may be chosen as reduced-reference metrics to account for wireless imaging scenarios. Also, the ESD specifier reserved for future use may be assigned to cater for such perceptual-based image quality metrics.
4 Objective Image Quality Metrics
In view of the above, the error sensitivity analysis was performed with respect
to objective perceptual quality metrics. The quality metrics utilized hereafter are
based on extracting structural image characteristics. In the following, the main concepts behind these quality metrics are therefore summarized, along with some remarks on image fidelity metrics to support performance comparisons.
4.1 Image Fidelity Metrics
Fidelity measurements are based on pixel-by-pixel comparisons between the reference and the distorted image. The related fidelity metrics, such as MSE and PSNR, are conceptually simple and computationally inexpensive. However, these metrics typically show weak correlation with quality as it would be perceived by humans. As the availability of the reference image is required to compute these
metrics, application for in-service quality assessment in communication systems
is not readily supported.
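As a point of reference, the two fidelity metrics can be computed as follows; a minimal sketch for 8-bit grayscale images represented as flat pixel sequences:

```python
import math

def mse(x, y):
    """Mean squared error between two equal-length pixel sequences."""
    assert len(x) == len(y) > 0
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit data)."""
    m = mse(x, y)
    return float('inf') if m == 0 else 10.0 * math.log10(peak ** 2 / m)
```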
4.2 Perceptual Relevance Weighted LP-norm
A different approach is taken by metrics that explore structural image information in an attempt to mimic the behavior of the HVS. The perceptual relevance
weighted LP -norm introduced in [4] belongs to this class of metrics. It combines
suitable image features that quantify the presence of certain artifacts in an image
such as blocking, blur, image activity, intensity masking, and lost blocks. Depending on the relevance of the particular feature to the quality of the viewing experience, perceptual weights are included into the feature pooling. Clearly, these
relevance weights need to be deduced from subjective experiments to adequately
represent human perception. With the given scenario of JPWL, the perceptual
relevance weighted LP-norm operates on features of the transmitted image (t) and the received image (r). Given a total of I considered features, the norm is calculated as [4]
L_P = \left[ \sum_{i=1}^{I} w_i^P \, |f_{t,i} - f_{r,i}|^P \right]^{1/P}    (1)

where f_{t,i} and f_{r,i} denote the ith extreme value normalized feature of the transmitted and received image, respectively, w_i represents the related perceptual relevance weight, and parameter P is a positive integer.
In order to facilitate detection of degradations in image quality at the receiver,
only the selected I feature values associated with the original image need to be
communicated to the receiver. A mapping function may be applied to translate
the obtained metric value into predicted mean opinion scores (MOS). Strong correlation between the perceptual relevance weighted LP -norm and human quality
perception has been reported in [5].
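A direct transcription of Eq. (1) may be sketched as follows; feature extraction is assumed to have been done beforehand, and the example values in the usage note are hypothetical:

```python
def lp_norm(f_t, f_r, w, P=1):
    """Perceptual relevance weighted L_P-norm of Eq. (1).

    f_t, f_r: extreme value normalized features in [0, 1] of the
    transmitted and received image; w: perceptual relevance weights;
    P: positive integer norm parameter.
    """
    assert len(f_t) == len(f_r) == len(w) and P >= 1
    return sum((wi * abs(ft - fr)) ** P
               for wi, ft, fr in zip(w, f_t, f_r)) ** (1.0 / P)
```

For P = 1 the metric reduces to a weighted sum of absolute feature differences, while larger P emphasizes the feature with the largest weighted deviation.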
4.3 Normalized Hybrid Image Quality Metric
The normalized hybrid image quality metric (NHIQM) is centered around accumulating extreme value normalized feature values using a weighted pooling approach [4]. Specifically, NHIQM is defined for a given image as
NHIQM = \sum_{i=1}^{I} w_i f_i    (2)
where wi denotes the perceptual relevance weight of the ith extreme value normalized feature fi ∈ [0, 1], i = 1, 2, . . . , I.
In order to quantify degradations between transmitted and received images
within the considered wireless imaging application of JPWL, the following difference may be used
\Delta_{NHIQM} = |NHIQM_t - NHIQM_r|    (3)
where N HIQMt and N HIQMr represent the NHIQM value for the transmitted
and received image, respectively. The operator | · | denotes the absolute value of
the argument. Again, a mapping function may be applied to the obtained difference value for translating it into predicted MOS. More details on the excellent
prediction performance of NHIQM and its strong correlation to human perception
can be found in [5].
As far as overhead for communicating the reduced-reference from transmitter
to receiver is concerned, further savings are gained with NHIQM compared to the
LP-norm, as only a single value needs to accompany the related image. This benefit is obtained at the expense of losing the ability to differentiate degradations with respect to each involved feature.
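Eqs. (2) and (3) can be transcribed directly; as above, the feature and weight values are assumed to be available from a preceding extraction step:

```python
def nhiqm(features, weights):
    """NHIQM of Eq. (2): weighted sum of the extreme value normalized
    features of a single image."""
    return sum(w * f for w, f in zip(weights, features))

def delta_nhiqm(feat_t, feat_r, weights):
    """Delta_NHIQM of Eq. (3): absolute difference between the NHIQM
    values of the transmitted and the received image."""
    return abs(nhiqm(feat_t, weights) - nhiqm(feat_r, weights))
```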
4.4 Structural Similarity Index
The structural similarity (SSIM) index proposed in [8] belongs to the class of full-reference metrics, i.e., it requires the original image. Although its applicability to
wireless imaging is not necessarily given due to its full-reference nature, it may
serve as benchmark for reduced-reference metrics. The SSIM index is based on
the degradation of structural information and produces a measure of structural
similarity between the reference and the distorted image
SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}    (4)

where \mu_x, \mu_y and \sigma_x, \sigma_y denote the mean intensity and contrast of image signals x and y, respectively, and \sigma_{xy} denotes their cross-covariance. The constants C_1 and C_2 are used to avoid instabilities in the structural similarity comparison that may occur for particular mean intensity and contrast combinations.
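Eq. (4) over a single window can be sketched as below. Note that the metric in [8] applies the comparison over local windows and averages the results; a single global window and the common constants C1 = (0.01 · 255)^2 and C2 = (0.03 · 255)^2 are used here for brevity:

```python
def ssim_index(x, y, C1=6.5025, C2=58.5225):
    """SSIM index of Eq. (4) computed over one window given as two
    equal-length pixel sequences x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n          # variance of x
    vy = sum((b - my) ** 2 for b in y) / n          # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + C1) * (2 * cov + C2)) /
            ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2)))
```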
4.5 Visual Information Fidelity
The visual information fidelity (VIF) criterion [9] employs information-theoretic
measures to quantify the loss of image information due to the distortion process.
Specifically, the VIF criterion uses natural scene statistics to connect image information with visual quality. The VIF criterion is a widely known full-reference
metric and as such will serve here also as a benchmark objective image quality
metric.
Figure 3: Perceptual image quality improvement with progression of decoded packets for image sample 'womanhat' [10]. Panels (a)-(h) show the image decoded from 1, 2, 3, 4, 8, 12, 16, and 24 packets, respectively.
5 Numerical Results
In the sequel, numerical results of the performed error sensitivity analysis for
JPWL using different image quality metrics are presented. A detailed description
of the system settings is provided first, followed by the results for several scenarios. It is shown that the proposed reduced-reference objective perceptual quality metrics, i.e., the perceptual relevance weighted LP-norm and ∆NHIQM, would be promising candidates for the ESD specifier that is currently reserved for future use (see Table 1).
5.1 System Settings
Given the JPWL system as shown in Fig. 2 with transcoders at transmitter and receiver, error sensitivity analysis can be based on the JPEG2000 codestream as a prerequisite for driving the JPWL transcoders. Specifically, we consider the so-called
packet-range mode [1] for error sensitivity analysis by progressively increasing
the codestream portion involved in the decoding packet-by-packet. Accordingly,
the analysis of the examined image samples commences with only one packet being used in the decoding and then increases until finally all packets in the codestream are utilized in the decoding.
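The packet-by-packet analysis loop can be expressed generically; the decoder and the metric are passed in as callables, since the concrete codec (here Kakadu) and metric live outside this sketch:

```python
def error_sensitivity_profile(num_packets, decode_with, metric):
    """Packet-range error sensitivity analysis: evaluate the metric on
    the image decoded from the first k packets for k = 1..num_packets
    and record the change contributed by each added packet.

    decode_with(k) -> image reconstructed from the first k packets;
    metric(image)  -> quality or distortion value of that image.
    """
    values, deltas = [], []
    for k in range(1, num_packets + 1):
        v = metric(decode_with(k))
        deltas.append(None if not values else v - values[-1])
        values.append(v)
    return values, deltas
```

With stub callables the loop can be exercised without a codec; in the real analysis, decode_with would truncate the codestream after k packets and invoke the JPEG2000 decoder.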
The error sensitivity analysis was performed for bitmap (bmp) image samples
taken from the LIVE Quality Assessment Database [10]. The sizes of these images
range from 480 × 720 to 768 × 512 pixels and the variety of scenes includes
vehicles, faces, buildings and natural scenes. For simplicity, the images were
converted from color to gray scale using the Matlab function ‘rgb2gray(·)’. The
gray scale versions of the images were then converted to JPEG2000 using the
Kakadu software [11]. Using a single tile, the conversion was realized for 4, 8, and
16 quality layers. A compression rate of 0.5 bits-per-pixel was imposed. Other
parameters used in the conversion included enabling the reversible wavelet transform and placing start of packet (SOP) markers in the codestream. Apart
from these settings, the default values provided in the Kakadu software were used.
The selected 4, 8, and 16 quality layers then translate to 24, 48, and 96 packets,
respectively.
The reduced-reference perceptual quality metrics used to quantify error sensitivity were the L1-norm, L2-norm, and ∆NHIQM. In this paper, we considered five image features relating to the presence of certain artifacts in the image.
Specifically, blocking f1, blur f2, edge-based image activity (IA) f3, gradient-based IA f4, and intensity masking f5 were extracted using the algorithms reported in [12–15], respectively. In order to derive the relevance weights wi,
i = 1, 2, . . . , 5, we calculated for each image all five features along with the SSIM
index. The Pearson correlations [16] between the obtained 29 values for each of the five features and the related 29 SSIM indices were then calculated and used as
the relevance weights. It should be noted that the SSIM index was selected for this
task as it has been shown to correlate well with human perception for images in
JPEG2000 format [8]. This approach resulted in the following relevance weights:
Blocking:           w1 = 0.2787
Blur:               w2 = 0.6735
Edge-based IA:      w3 = 0.8439
Gradient-based IA:  w4 = 0.9446
Intensity masking:  w5 = 0.2430
As expected, this result also confirms the large reduction of blocking artifacts in JPEG2000, reflected in the small relevance weight w1 = 0.2787, whereas blocking had been observed as the dominant feature in our earlier work on images in JPEG format. In addition, the full-reference perceptual quality metrics SSIM and VIF
as well as the fidelity metric PSNR were examined for comparison.
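The weight derivation described above can be sketched as follows. Taking the magnitude of the correlation is our reading of the procedure, since distortion-oriented features typically correlate negatively with a similarity index such as SSIM:

```python
def pearson(u, v):
    """Sample Pearson correlation coefficient of two sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def relevance_weights(feature_columns, ssim_values):
    """One weight per feature: |Pearson correlation| between that
    feature's values over the image set and the per-image SSIM."""
    return [abs(pearson(col, ssim_values)) for col in feature_columns]
```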
5.2 Results of the Error Sensitivity Analysis
Figures 3 (a)-(h) illustrate the improvements in perceptual quality for the image
sample ‘womanhat’ with the number of decoded packets successively increasing.
Given that 4 quality layers were used, the progression of decoded packets may
range from one to 24 packets. Clearly, significant quality improvements are obtained when progressing from one to four decoded packets. Further decoding of
up to 8 packets gives additional improvements in the more complex textures of the
image. A decoding of packets beyond this codestream portion refines the image
quality but provides only minor gains. For example, improvements are mainly
observed in the background and the hair below the hat for the cases of 12, 16, and
24 decoded packets.
Quantitative results of the error sensitivity analysis for the image sample ‘womanhat’ with 4 quality layers are given in Table 2. For each codestream portion from
1 to 24 decoded packets, the values of the considered six metrics are given together
with the difference δmetric compared to the preceding metric value. The results
confirm the findings from the visual inspection of Figs. 3 (a)-(h), namely, the significant improvement of quality or fidelity within the first 4-5 decoded packets.
Furthermore, it can be seen that four quality plateaus are present, which are thought to correspond to the four quality layers used in the encoding. Moreover, this
behavior is captured by all considered metrics, i.e. the proposed reduced-reference
perceptual quality metrics L1 -norm, L2 -norm, and ∆N HIQM , the full-reference
perceptual quality metrics SSIM and VIF, and the full-reference fidelity metric
PSNR. It is also noted that quality improvements with the examined reduced-reference metrics are represented by decreasing metric values, while the two considered full-reference metrics and the fidelity metric represent improvements by
increasing metric values. This is because the former metrics measure distortions
while the latter measure similarity.
In order to reveal more general insights into the error sensitivity, an analysis
over the considered 29 images from the LIVE database was performed to deduce
average quality characteristics. Figures 4 (a)-(c) shows the progression of the average PSNR with the number of decoded packets for the cases of 4, 8, and 16
quality layers, respectively. It can be seen from these figures that the number of
quality layers translates to the number of plateaus in the progression curves. For the
scenarios of 4 and 8 quality layers, the PSNR improves significantly with the first
five decoded packets and hence aligns with the observations from the inspection of
the image sample ‘womanhat’ (see Fig. 3). However, the large increase in fidelity
at the tail of the progression does not seem to align well with human perception.
Figures 5 (a)-(c) show the numerical results of the error sensitivity analysis
using the perceptual relevance weighted L1-norm, L2-norm, and ∆NHIQM. A similar progression as with PSNR can be observed for the average values of the three objective perceptual quality metrics, but with respect to plateaus of image degradations. This finding is of particular interest as these perceptual metrics only operate
Table 2: Progression of different metric values with increasing number of decoded packets

Packet  L1-norm  δL1     L2-norm  δL2     ∆NHIQM  δNHIQM  PSNR    δPSNR  SSIM   δSSIM  VIF    δVIF
1       1.110    —       0.594    —       0.291   —       22.291  —      0.601  —      0.164  —
2       1.227    −0.117  0.683    −0.089  0.326   −0.035  24.439  2.148  0.629  0.028  0.294  0.130
3       1.028    0.199   0.568    0.115   0.272   0.054   26.440  2.001  0.673  0.044  0.464  0.170
4       0.562    0.466   0.292    0.276   0.095   0.177   28.681  2.241  0.751  0.078  0.679  0.215
5       0.396    0.166   0.202    0.090   0.093   0.002   30.640  1.959  0.799  0.048  0.732  0.053
6       0.379    0.017   0.193    0.009   0.094   −0.001  31.179  0.539  0.803  0.004  0.732  0.000
7       0.379    0.000   0.193    0.000   0.094   0.000   31.179  0.000  0.803  0.000  0.732  0.000
8       0.378    0.001   0.193    0.000   0.093   0.001   31.212  0.033  0.803  0.000  0.734  0.002
9       0.379    −0.001  0.194    −0.001  0.094   −0.001  31.261  0.049  0.804  0.001  0.744  0.010
10      0.361    0.018   0.184    0.010   0.090   0.004   31.403  0.142  0.809  0.005  0.771  0.027
11      0.302    0.059   0.154    0.030   0.044   0.046   31.877  0.474  0.830  0.021  0.791  0.020
12      0.286    0.016   0.146    0.008   0.047   −0.003  32.321  0.444  0.835  0.005  0.792  0.001
13      0.286    0.000   0.146    0.000   0.047   0.000   32.321  0.000  0.835  0.000  0.792  0.000
14      0.286    0.000   0.146    0.000   0.047   0.000   32.329  0.008  0.835  0.000  0.793  0.001
15      0.285    0.001   0.146    0.000   0.046   0.001   32.353  0.024  0.836  0.001  0.800  0.007
16      0.279    0.006   0.143    0.003   0.043   0.003   32.448  0.095  0.839  0.003  0.813  0.013
17      0.239    0.040   0.125    0.018   0.017   0.026   32.975  0.527  0.856  0.017  0.831  0.018
18      0.223    0.016   0.117    0.008   0.014   0.003   33.457  0.482  0.864  0.008  0.832  0.001
19      0.223    0.000   0.117    0.000   0.014   0.000   33.457  0.000  0.864  0.000  0.832  0.000
20      0.222    0.001   0.117    0.000   0.013   0.001   33.497  0.040  0.864  0.000  0.833  0.001
21      0.222    0.000   0.116    0.001   0.014   −0.001  33.534  0.037  0.864  0.000  0.847  0.014
22      0.225    −0.003  0.117    −0.001  0.021   −0.007  33.786  0.252  0.870  0.006  0.890  0.043
23      0.196    0.029   0.104    0.013   0.009   0.012   34.269  0.483  0.885  0.015  0.904  0.014
24      0.172    0.024   0.092    0.012   0.002   0.007   34.972  0.703  0.897  0.012  0.905  0.001
on the reduced-reference information but do not require the presence of the original image. Furthermore, the rather small improvements in image quality at the tail of the progression correlate much better with human perception. As far
as the results for 4 and 8 quality layers are concerned, all three of the proposed
reduced-reference quality metrics behave in a similar manner and could be used as an alternative ESD in a JPWL system. On the other hand, for 16 quality layers, the
perceptual relevance weighted L1 -norm outperforms the L2 -norm and ∆N HIQM .
Specifically, the L1 -norm provides a better distinction among the quality levels
compared to the L2 -norm. It also decreases for the whole progression of decoded
packets instead of increasing at the beginning of the codestream.
To verify the applicability of objective perceptual quality metrics for error sensitivity analysis, results for the full-reference metrics SSIM and VIF are presented
in Fig. 6 for 4 quality layers. These benchmark metrics produce similar results
as their reduced-reference counterparts. Specifically, the significant increase in
image similarity with the initial five decoded packets and the reduced gains at
the tail of the progression of decoded packets aligns well with the results of the
reduced-reference metrics and also seem to correlate well with human perception
(see Fig. 3). However, SSIM and VIF are incapable of providing a quality measure at the receiver.
In view of managing computational complexity with respect to processing resources in a mobile terminal, one may consider reducing the number of image features in the pooling of the proposed reduced-reference perceptual quality metrics
or focus only on one representative feature. Figures 7 and 8 therefore provide the
progression of the extreme value normalized features with high and low relevance
weight, respectively, for the example of 4 quality layers. Clearly, the prevalent
features of edge-based image activity and gradient-based image activity perform
favorably in the error sensitivity analysis and may be used for quality assessment either in combination or just as a single feature. On the other hand, the less relevant features, namely blocking, blur, and intensity masking, do not capture the gains
obtained with the progression of decoded packets as distinctively as the prevalent
features. This behavior is also expected as blocking and intensity masking are not
prevalent in JPEG2000. These features may therefore be discarded if processing
power is of concern.
Figure 4: Progression of average PSNR (dB) with number of packets decoded: (a) 4 layers, (b) 8 layers, (c) 16 layers.
Figure 5: Progression of average LP-norms and average ∆NHIQM with number of packets decoded: (a) 4 layers, (b) 8 layers, (c) 16 layers.
Figure 6: Progression of average SSIM and average VIF with number of packets decoded.
Figure 7: Progression of average features having high relevance weight (edge-based and gradient-based image activity).
Figure 8: Progression of average features having low relevance weight (blocking, blur, and intensity masking).
6 Conclusions
In this paper, we have provided an alternative error sensitivity analysis for JPWL
systems using reduced-reference objective perceptual quality metrics. Specifically, it has been revealed that the considered perceptual quality metrics exploiting
structural image features may accompany the fidelity-based ESD marker segment
in the JPWL standard. The advantages of the proposed perceptual quality metrics include better correlation with human perception at the beginning and tails of
the progression of decoded packets. These proposed metrics also readily support
in-service quality monitoring and RRM. This benefit is due to the fact that only a
small amount of reduced-reference information in terms of image features needs
to be communicated from the transmitter to the receiver while the presence of the
original image is not required. As the current JPWL standard has reserved a specifier in the ESD marker for future use, the concepts proposed in this paper could be
easily incorporated into coming releases while keeping backward compatibility.
This would pave the way for designing more efficient preferential error control
coding schemes, link adaptation techniques, and selective retransmission of packets with respect to their contribution to overall quality as perceived by humans.
References

[1] International Organization for Standardization, "JPEG 2000 Image Coding System – Part 11: Wireless", ISO/IEC 15444-11:2007, May 2007.
[2] S. Winkler, E. D. Gelasca, and T. Ebrahimi, "Perceptual Quality Assessment for Video Watermarking", in Proc. IEEE Int. Conf. on Inf. Techn.: Coding and Compression, Las Vegas, USA, pp. 90–94, Apr. 2002.
[3] A. W. Rix, A. Bourret, and M. P. Hollier, "Models of Human Perception", Journal of BT Technology, vol. 17, no. 1, pp. 24–34, Jan. 1999.
[4] U. Engelke and H.-J. Zepernick, "Quality Evaluation in Wireless Imaging Using Feature-Based Objective Metrics", in Proc. IEEE Int. Symp. on Wireless Pervasive Computing, San Juan, Puerto Rico, pp. 367–372, Feb. 2007.
[5] U. Engelke, "Perceptual Quality Metric Design for Wireless Image and Video Communication", Licentiate dissertation, no. 2008:08, Blekinge Institute of Technology, Karlskrona, Sweden, June 2008.
[6] International Organization for Standardization, "JPEG2000 Image Coding System: Core Coding System", ISO/IEC 15444-1:2004(E), Apr. 2004.
[7] A. N. Skodras and T. Ebrahimi, "JPEG2000 Image Coding System Theory and Applications", in Proc. IEEE Int. Symp. on Circuits and Systems, Kos, Greece, p. 4, May 2006.
[8] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[9] H. R. Sheikh and A. C. Bovik, "Image Information and Visual Quality", IEEE Trans. on Image Processing, vol. 15, no. 2, pp. 430–444, Feb. 2006.
[10] LIVE Quality Assessment Database, Laboratory for Image and Video Engineering, [Online], Available: http://live.ece.utexas.edu/research/quality/subjective.htm
[11] D. Taubman, "Kakadu Software", [Online], Available: http://www.kakadusoftware.com, Mar. 2007.
[12] Z. Wang, H. R. Sheikh, and A. C. Bovik, "No-reference Perceptual Quality Assessment of JPEG Compressed Images", in Proc. IEEE Int. Conf. on Image Processing, pp. 477–480, Sep. 2002.
[13] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, "A No-reference Perceptual Blur Metric", in Proc. IEEE Int. Conf. on Image Processing, pp. 57–60, Sep. 2002.
[14] S. Saha and R. Vemuri, "An Analysis on the Effect of Image Features on Lossy Coding Performance", IEEE Signal Processing Letters, vol. 7, no. 5, pp. 104–107, May 2000.
[15] A. R. Weeks, Fundamentals of Electronic Image Processing, SPIE/IEEE Series on Imaging Science and Engineering, 1998.
[16] S. Winkler, Digital Video Quality, John Wiley & Sons, 2005.
Part III
Quality Assessment of Error
Protection Schemes for Wireless
JPEG2000
Parts of this work are published as:
M. I. Iqbal and H.-J. Zepernick, “Quality Assessment of Error Protection Schemes
for Wireless JPEG2000,” Research Report, Blekinge Institute of Technology, no.
4, 2010, ISSN: 1103-1581.
M. I. Iqbal, H.-J. Zepernick, and U. Engelke, "Perceptual-based Quality Assessment of Error Protection Schemes for Wireless JPEG2000," in Proc. IEEE International Symposium on Wireless Communication Systems, Siena, Italy, pp. 348–352, Sept. 2009.
Quality Assessment of Error Protection Schemes
for Wireless JPEG2000
Muhammad Imran Iqbal and Hans-Jürgen Zepernick
Abstract
Wireless imaging services suffer large impairments due to the hostile
nature of the wireless channel. Given the limited and expensive channel bandwidth and the highly data-demanding nature of these services, it becomes a challenging task to provide high quality of service over such error-prone
channels. Clearly, suitable error protection is necessary in order to maintain sufficient quality of these services under various channel conditions. In
this report, therefore, we have investigated different channel error protection
schemes for a wide range of channel conditions and coding rates. Two unequal error protection (UEP) schemes have been examined for JPEG2000
images exploiting useful features of the JPEG2000 codestream and using
the error protection tool set provided by wireless JPEG2000 (JPWL). Taking the importance of the initial codestream packets for the reconstruction of
the image at the receiver into account, the first scheme uses all the additional
bandwidth resources in protecting the initial packets of the codestream. The
rest of the packets, which are of relatively low importance, are transmitted
without any parity symbols assigned to them. In the second UEP scheme,
the initial parts of the codestream are strongly protected by assigning them
an increased amount of parity symbols. In addition, the tail packets of the
codestream are also protected but using a weaker error control code compared to the initial packets.
The performance of the proposed UEP schemes has been investigated in
terms of the peak signal-to-noise ratio as a typical fidelity metric and three
perceptual quality metrics, namely, the Lp -norm, the structural similarity
index, and the visual information fidelity criterion. Numerical results of
the proposed UEP schemes have been compared with conventional equal error protection (EEP) over an additive white Gaussian noise (AWGN) channel as well as a Rayleigh fading channel in the presence of AWGN. The results
reveal the superior performance of the suggested UEP schemes compared
to EEP over a range of channel signal-to-noise ratios and code rates.
1 Introduction
With the advent of third generation mobile radio networks, there has been a growing interest in multimedia services such as imaging along with stringent demands
on quality of service (QoS). This type of service is usually source encoded to conserve bandwidth but at the cost of rendering the compressed signals highly
susceptible to transmission errors. As such, powerful error control coding is typically utilized to offer sufficient error resilience to multimedia services in mobile
radio systems. In addition, unequal error protection (UEP) may be applied to exploit the different importance of source encoded signals for the reconstruction at
the receiver such that bandwidth is conserved by keeping the total overhead for
error control small.
JPEG2000 [1] is a suitable candidate for the deployment of wireless imaging
services due to its favorable features. These include excellent compression performance, error resilience, hierarchical nature of the generated codestream, support
of progressive encoding and decoding with respect to quality and spatial resolution, and region of interest (ROI) coding. These features of JPEG2000 may
help wireless imaging applications to deal with the errors that the image codestream may have encountered during transmission over a wireless channel, and to
help maintain the quality of these services. Moreover, its most recent part referred
to as wireless JPEG2000 (JPWL) [2] has provided a complete tool set for error
detection and correction by offering a wide range of error control codes. As such,
JPWL has further elevated the suitability of JPEG2000 images among the contemporary image coding standards for wireless imaging services. However, the task
of configuring this tool set and choosing the error control codes that meet the requirements of a particular wireless imaging system within the available resources
is left to the system designer to perform.
Accordingly, different approaches have been reported in the literature [3–5], trying to find suitable channel codes to cope with channel errors and revealing the superior performance of UEP schemes over the classical equal error protection (EEP) scheme. The UEP scheme proposed in [3] optimizes combined source and channel coding rates using bit error rate (BER) statistics and rate-distortion information of the JPEG2000 encoded codestream. Rate-compatible punctured convolutional
(RCPC) codes have been applied to provide error protection to the multilayered
JPEG2000 codestream at the given total coding rate. A binary symmetric channel (BSC) with a bit error rate of 10^{-2} was adopted in simulations for performance
analysis of the UEP scheme. The approach reported in [4] applies UEP to the
JPEG2000 image codestream such that only image header and the first few packets
are protected using RCPC codes along with an interleaver. Certain assumptions
have been made for the encoding of the JPEG2000 images including single tile
and single layer encoding as well as inclusion of the start of packet (SOP) marker
and the packed packet headers (PPM) marker segment into the codestream. In [5],
a technique has been proposed that searches for a suboptimal UEP scheme for
JPEG2000 codestream which maximizes the peak signal-to-noise ratio (PSNR)
for a given channel BER and coding rate using Reed-Solomon (RS) codes. A
virtual interleaving is also performed before transmission to help the decoder in
recovering the lost packets in a packet loss channel.
Apart from the fact that the above discussed schemes are not consistent with
JPWL, their performance assessment is based on link layer metrics such as BER
or fidelity metrics such as PSNR. These metrics are widely criticized for their
poor correlation with human quality assessment [6, 7]. On the other hand, subjective tests, as the ultimate tool to judge user-perceived quality, are expensive, time
consuming and not suitable for quality monitoring in live systems [6, 8]. Therefore, a number of perceptual quality metrics have been developed to mimic the
operation of the human visual system (HVS) including the perceptual relevance
weighted Lp -norm [9], structural similarity (SSIM) index [6] and visual information fidelity (VIF) criterion [12]. In contrast to fidelity metrics that quantify the
quality through pixel-by-pixel comparisons between distorted and reference images, perceptual quality metrics are based on analyzing impairments in the structural information of an image. As such, using perceptual quality metrics in technical systems for quality monitoring and for the evaluation of error protection schemes
instead of fidelity metrics may improve the system performance which better suits
human viewers.
In this report, we have introduced two UEP strategies for JPEG2000 images.
The first strategy utilizes all the additional bandwidth resources for initial and
most important packets of the JPWL codestream by applying error protection to
these packets while the ending packets are left unprotected. In the second strategy, the available bandwidth is shared among the packets in such a way that initial
packets get a better protection while the tail packets get a comparatively weaker
protection against channel errors. These strategies are computationally simple yet very effective in many scenarios, specifically in cases where the different parts of the image codestream can be divided into two levels of importance. One application of this type of strategy is ROI coded images, where the ROI part of the codestream is more important in terms of reconstructed image quality compared
to the rest of the codestream. Though these strategies are investigated for two levels of protection, they are easily extendable to any number of levels. Furthermore,
both of the proposed UEP strategies are fully consistent with JPWL.
As mentioned above, perceptual quality metrics correlate better with human perception; we therefore base our performance assessment of
UEP and EEP schemes on the aforementioned perceptual quality metrics. In particular, the examined UEP schemes take advantage of the progressive encoding of
JPEG2000 and the related different levels of importance of the involved packets
for image reconstruction at the receiver. The numerical results reveal the superior
perceptual quality performance of the selected UEP schemes over EEP for a range
of channel conditions and code rates.
The rest of this report is organized as follows. In Chapter 2, some background
on wireless JPEG2000 is provided. Chapter 3 gives a brief overview on image
quality assessment in terms of fidelity metrics and perceptual quality metrics. The
proposed UEP strategies are introduced in Chapter 4 and related numerical results
along with discussions are presented in Chapter 5. Finally, conclusions are drawn
in Chapter 6.
2 Wireless JPEG2000
The JPEG2000 image coding standard [1] contains several tools for error resilient
image coding including error concealment and decoding process synchronization
that help the decoder in dealing with errors [10]. These tools, however, do not
provide sufficient error resilience necessary for keeping the desirable quality of
wireless imaging services under different channel conditions and bandwidth constraints. JPWL [2], on the other hand, addresses this shortcoming by providing a
complete tool set for protection against channel errors. JPWL provides applications with the flexibility of choosing error control codes from a wide range of channel codes, ranging from simple 16-bit and 32-bit cyclic redundancy check codes to more powerful RS codes, to suit the given constraints.
2.1 JPWL System Description
A JPWL system can be applied to an input source image or to a JPEG2000 Part
1 encoded image. Typical configurations for both the aforementioned cases are
shown in Fig. 1. In the former case, on the transmitter side, the JPWL encoder
consists of three concurrent modules: a JPEG2000 Part 1 encoder, an error sensitivity generator and an error protection tool. After converting the input image to
JPEG2000 Part 1, computing error sensitivity of different parts of the generated
codestream for bit errors and applying error protection, the JPWL encoder generates the codestream which can be transmitted over any error prone channel. On the
receiving end, an error correcting process, a residual error description generation
and JPEG2000 Part 1 decoder constitute the JPWL decoder as shown in Fig. 1a.
In the latter case, the JPWL encoder and decoder are composed of JPWL
transcoders for both transmitter and receiver sides. Specifically, at the transmitter,
the JPWL transcoder applies error protection to the JPEG2000 Part 1 codestream
and generates the JPWL codestream while the JPWL transcoder at the receiver end
does the reverse and generates a JPEG2000 Part 1 codestream out of the JPWL
codestream as shown in Fig. 1b. In addition, the transcoder at the receiver generates a residual error descriptor which can be used by the JPEG2000 decoder to
deal with residual errors.
[Figure: block diagrams of the JPWL encoder/decoder chain (JPEG2000 Part 1 codec, error sensitivity, error protection/correction, residual errors) and of the JPWL transcoder chain, connected through an error-prone wireless channel]
Figure 1: JPWL system description [2]: (a) JPWL encoder and decoder, (b) JPWL transcoder.
2.2 JPWL Marker Segments
In order to provide error protection to JPEG2000 images, JPWL introduces the
following four new marker segments [2] in the JPEG2000 codestream. The error
protection capability (EPC) marker segment contains the information about different normative and informative tools used for error protection. The parity symbols
added to the codestream for its protection are contained in the error protection
block (EPB) marker segment. Locating the uncorrectable errors in the codestream
is the purpose of the residual error descriptor (RED) marker segment. In addition,
the RED marker segment describes the categories of these errors. The sensitivity
of different parts of the codestream to channel errors is described in the error sensitivity descriptor (ESD) marker segment. The ESD marker segment represents
the contribution of each part of the codestream in the reconstructed image quality.
In other words, it represents the quality loss that might occur in the case of losing
any part of the codestream.
Different fidelity metrics are included to indicate the error sensitivity of the
codestream including mean squared error (MSE), PSNR, absolute peak error and
total squared error. The error sensitivity information may assist in selecting a
suitable error protection strategy for protecting the codestream against channel
errors. Another powerful feature of the ESD marker segment is to give the decoder
an estimate of the image quality (or quality loss) for partial image decoding made
by combining only the initial error free parts of the codestream and discarding the
erroneous parts.
The usefulness of the ESD marker segment can be further enhanced in the
following two ways. Firstly, the perceptual quality metrics exhibit better correlation with human perception. Using a perceptual quality metric as error sensitivity
descriptor instead of the fidelity metrics suggested in JPWL will give a better perceptual quality estimate to the decoder for both partial and complete decoding of
the image. Secondly, the existing ESD constellations for JPWL fail to give any
quality estimate if the decoder has to decode erroneous parts of the codestream.
This shortcoming is due to the use of full reference metrics in ESD marker segments. Using reduced-reference quality metrics may overcome this shortcoming
by computing these error sensitivity values for the reconstructed image and comparing them with the same values calculated by the encoder. Further, this quality
assessment can be done even for codestreams having residual errors which is not
the case with the full-reference metrics specified for the JPWL-ESD marker segment.
It should be noted that JPWL offers an option for future use that allows for
inclusion of an additional error sensitivity descriptor other than the existing metrics. This may be exploited by using reduced-reference perceptual image quality
metrics such as the Lp -norm as suggested in [11]. In this way, objective image
quality assessment can be executed that is more consistent with subjective quality
without requiring the presence of the reference image at the receiver.
3 Image Quality Assessment
Several ways have been used to assess and quantify the quality of an image. Subjective experiments are considered to be the best as far as quality assessment is
concerned. On the other hand, as mentioned earlier, they are time consuming, expensive and not possible to implement in most of the live imaging systems [6, 8].
The objective quality metrics, such as fidelity metrics and perceptual quality metrics, are the most commonly used alternatives to subjective tests. Some of the
objective quality metrics are described in the sequel.
3.1 Fidelity Metrics
These classical approaches use simple mathematical techniques to quantify the
quality of an image without considering the properties of the human visual system.
Mean Squared Error
The MSE is a very commonly used error metric which provides a way to quantify
the difference between the reference signal and its estimate. For an image of size
M × N, the MSE is computed as

    MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} (\hat{x}_{i,j} - x_{i,j})^2        (1)
where \hat{x}_{i,j} and x_{i,j} are the values of the pixels located at the i-th row and j-th column of the distorted and reference images, respectively.
Peak Signal-to-Noise Ratio

As the name suggests, the PSNR relates the peak signal power to the noise power and is usually expressed in decibels (dB). The PSNR of a distorted image can be computed as

    PSNR = 10 \log_{10} \left( \frac{X_{\max}^2}{MSE} \right)        (2)
where X_{\max} is the dynamic range of the image pixel values. For pixels represented with 8 bits per pixel, for example, the dynamic range is X_{\max} = 255.
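Both fidelity metrics can be computed in a few lines. The following is a minimal sketch in NumPy; the function names are illustrative, and the default dynamic range X_max = 255 assumes 8-bit images:

```python
import numpy as np

def mse(ref: np.ndarray, dist: np.ndarray) -> float:
    """Mean squared error between reference and distorted images, as in (1)."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    return float(np.mean((dist - ref) ** 2))

def psnr(ref: np.ndarray, dist: np.ndarray, x_max: float = 255.0) -> float:
    """PSNR in dB for a given pixel dynamic range, as in (2)."""
    err = mse(ref, dist)
    if err == 0.0:
        return float("inf")  # identical images: no noise, PSNR is unbounded
    return 10.0 * np.log10(x_max ** 2 / err)
```

For example, a uniform error of 16 gray levels on an 8-bit image gives MSE = 256 and a PSNR of about 24 dB.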
3.2 Perceptual Quality Metrics
Due to the poor correlation of fidelity metrics with human perception [6, 7], including those suggested in the JPWL-ESD marker segment, perceptual quality metrics are
adopted for performance assessment of the suggested error protection schemes.
The perceptual quality metrics base their quality assessment on the structural information present in both reference and distorted images which leads to a better
quality estimate compared to fidelity metrics. Some of the perceptual metrics are
described in the sequel.
Perceptual Relevance Weighted Lp -norm
The perceptual relevance weighted Lp -norm was proposed in [9]. It extracts different image features including blocking, blur, image activity, and intensity masking from both the reference and distorted image. On this basis, it computes image
quality in terms of the following pooling [9]:
    L_p = \left\{ \sum_{i=1}^{I} w_i^p \, |f_{t,i} - f_{r,i}|^p \right\}^{1/p}        (3)
where f_{t,i} and f_{r,i} denote the i-th extreme-value normalized feature of the test (impaired) and reference image, respectively, I is the total number of used features,
and parameter p denotes a positive integer. The perceptual relevance weights wi ,
i = 1, 2, . . . , I, associated with each of the features have been derived from subjective experiments. A non-linear mapping function may be utilized to relate Lp norm values to predicted mean opinion scores (MOS). In the case of wi = 1, ∀i,
the L_p-norm reduces to the Manhattan distance for p = 1 and represents the Euclidean distance between feature values for p = 2. In [9], it is reported that
values larger than p = 2 do not improve the quality prediction performance. The
Lp -norm is a reduced-reference quality metric as only the feature values of the
transmitted image are needed at the receiver to assess the quality of the received
image using this metric. This is in contrast to the full-reference metrics that require the presence of the full reference image in order to assess the quality of the
received image.
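Once the I feature values have been extracted, the pooling in (3) reduces to a short weighted-distance computation. The sketch below assumes the feature extraction has already been performed and uses illustrative names; in practice the weights w_i would come from the subjective experiments in [9]:

```python
import numpy as np

def lp_norm(f_test, f_ref, weights, p=1):
    """Perceptual relevance weighted Lp-norm pooling over I features, as in (3).

    f_test, f_ref : extreme-value normalized feature vectors of the impaired
                    and reference image (blocking, blur, activity, ...).
    weights       : perceptual relevance weights w_i.
    p             : positive integer; p = 1 gives a weighted Manhattan
                    distance, p = 2 a weighted Euclidean distance.
    """
    f_test = np.asarray(f_test, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w ** p * np.abs(f_test - f_ref) ** p) ** (1.0 / p))
```

With w_i = 1 for all i, lp_norm([0.4, 0.1], [0.2, 0.5], [1, 1], p=1) equals the Manhattan distance 0.6 between the two feature vectors.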
Structural Similarity Index
The structural similarity (SSIM) index was proposed in [6] and works as follows.
For a small rectangular window x of the reference image, the mean intensity µ_x and the contrast (standard deviation) σ_x are computed. The similar quantities µ_y and σ_y are computed for the corresponding window y of the impaired image. The SSIM index is
calculated as

    SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}        (4)
where constants C1 and C2 are used to avoid instability in (4) that might occur
due to particular combinations of mean intensity and contrast, and σxy is the covariance between x and y. Eventually, the overall quality of the impaired image
is calculated by averaging the SSIM index values over all image windows.
Though being a full-reference quality metric and not applicable to quality
monitoring in a live system, the SSIM index may be used here for comparison
purposes.
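For one pair of co-located windows, (4) can be sketched as follows. The constants use the common choice C1 = (0.01 · 255)² and C2 = (0.03 · 255)² for 8-bit images, which is an assumption here rather than a value fixed by the text:

```python
import numpy as np

def ssim_window(x: np.ndarray, y: np.ndarray,
                c1: float = (0.01 * 255) ** 2,
                c2: float = (0.03 * 255) ** 2) -> float:
    """SSIM index for one reference window x and impaired window y, as in (4)."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()            # sigma_x^2, sigma_y^2
    cov_xy = np.mean((x - mu_x) * (y - mu_y))  # sigma_xy
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

The overall image quality is then the average of ssim_window over all windows; a pair of identical windows yields an index of 1.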
Visual Information Fidelity Criterion
The visual information fidelity (VIF) criterion has been proposed in [12] and also
belongs to the class of full-reference metrics. Based on a statistical information
model of natural scenes, it first quantifies the visual information present in the
reference image. The quality of an impaired image is then related to the extent to
which the same information is extractable from it. Natural scenes are considered
to be the output of a stochastic source and are modeled using Gaussian scale mixtures in the wavelet domain. Similarly, image distortions are modeled as signal
attenuation and additive noise. Accordingly, the HVS is treated as a distortion channel and modeled as additive white Gaussian noise in the wavelet domain.
The output of the HVS is considered to be the signal that the brain uses to extract
visual information. This model estimates the perceptual annoyance caused by different artifacts instead of the artifacts themselves. Based on the above models,
mutual information is extracted between the input and output of the HVS for the
reference image both with and without channel distortions for every subband. Finally, the VIF criterion is calculated as the ratio of these two mutual information
values.
4 Error Protection for Wireless JPEG2000
The release of JPWL has now equipped the JPEG2000 standard with a powerful
and flexible error control tool set to cope with transmission errors for a wide range
of channel conditions. However, the choice among the channel codes from the JPWL code set that fulfill the end-user quality requirements under the given channel conditions and bandwidth constraints is left to the wireless imaging service designer. In order to help select a suitable error protection for a wireless imaging
system, we have examined two simple but very effective UEP strategies. A comparison of the suggested UEP strategies will be made with the classic EEP scheme
to explore their effectiveness under different channel conditions and available bandwidth. To keep the scope of this research broad, the performance analysis will be based on a number of quality metrics, ranging from the fidelity metric PSNR to the perceptual quality metrics L_p-norm, SSIM index and VIF criterion. This will
connect the error protection scheme design and the quality monitoring in a live system through fidelity metrics and, more importantly, through perceptual quality metrics.

[Figure: main header, tile header, packets 1 to T, and the EOC marker in sequence]
Figure 2: The organization of the JPEG2000 codestream for single tile encoding.
Fig. 2 shows the organization of the JPEG2000 image codestream when it is encoded using a single tile. The codestream starts with the main header, followed by the tile-stream. The tile-stream consists of the tile header and the data packets. An end-of-codestream (EOC) marker indicates the end of the codestream. Further, if the image is coded using ROI coding, the initial packets contain the ROI while the tail packets represent the image background.
Specifically, the image main header and the tile header are protected using a strong (n_H, k_H) RS code for the following reasons. The number of errors introduced by the wireless channel increases under severe channel conditions and can exceed the correction capability of a weaker code used for protecting the header, leaving the header corrupted. A codestream with corrupted headers may not be decodable, making analysis of the error protection schemes impossible under these severe channel conditions. Hence, strong header protection makes the analysis of the considered error protection schemes possible over a wide range of the signal-to-noise ratio (SNR). The resulting decrease in code rate due to the strong header protection is negligible because the image headers are very small compared to the image data. As such, the same (n_H, k_H) RS code is used for header protection with all error control strategies considered in this report.
The remainder of the codestream is divided into two parts. The first part contains P initial packets while the second part contains T −P tail packets, where T
represents the total number of data packets in the codestream excluding headers.
Given that error protection is applied to the codestream, the total code rate R
associated with the whole image is defined as

    R = \frac{K}{N}        (5)
where K and N , respectively, are the codestream lengths before and after the
error control coding has been applied. It should be mentioned that we have kept
the same code rate for all examined error control strategies in order to facilitate fair
performance comparisons among them. Further, let SH denote the length of the
image header (i.e. main and tile headers combined), and let Si ; i = 1, 2, . . . , T ,
represent the length of the ith packet in the codestream. Then, the length N of the
protected codestream can be given as
    N = \frac{S_H}{R_H} + \sum_{i=1}^{P} \frac{S_i}{R_1} + \sum_{i=P+1}^{T} \frac{S_i}{R_2}        (6)
where RH , R1 and R2 are the code rates associated with the RS codes used for
the protection of header, initial and tail packets, respectively. These code rates are
given as

    R_H = \frac{k_H}{n_H}, \quad R_1 = \frac{k_1}{n_1}, \quad R_2 = \frac{k_2}{n_2}        (7)
where k1 , n1 and k2 , n2 are message lengths and codeword lengths used for the
P initial packets and the T −P tail packets, respectively.
4.1 Equal Error Protection
In this classic approach, all image packets are protected equally using the same
(n_1, k_1) RS code. In this case, in order to find the codestream length N for a given code rate, the following modification of (6) is needed. In general, the codestream packet lengths are not multiples of the message length k_1, so the last message of a packet may be padded with zeros. The same applies to the header, whose last message may be padded with zeros to align with the message length k_H of the (n_H, k_H) RS code used. As a consequence, the length N of the codestream after EEP is obtained from (6) with k_2 = k_1 and n_2 = n_1 as
    N = n_H \left\lceil \frac{S_H}{k_H} \right\rceil + n_1 \sum_{i=1}^{T} \left\lceil \frac{S_i}{k_1} \right\rceil        (8)
where ⌈x⌉ is the smallest integer greater than or equal to x.
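Equation (8) translates directly into code: every k-symbol message, zero-padded at the end of the header and of each packet, becomes an n-symbol RS codeword. The sketch below uses illustrative names, with the (128, 32) header code adopted later in this report taken as a default assumption:

```python
from math import ceil

def eep_length(s_h: int, packet_sizes, n1: int, k1: int,
               n_h: int = 128, k_h: int = 32) -> int:
    """Length N of the EEP-protected codestream, as in (8)."""
    n = n_h * ceil(s_h / k_h)                              # protected header
    n += n1 * sum(ceil(s_i / k1) for s_i in packet_sizes)  # protected packets
    return n
```

For instance, a 100-symbol header and packets of 500, 300 and 40 symbols protected with a (37, 32) RS code occupy 128·4 + 37·(16 + 10 + 2) = 1548 symbols.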
4.2 Unequal Error Protection - Strategy 1
The first UEP strategy, referred to as UEP1, utilizes all the available parity symbols for protecting the P initial packets in the codestream while the tail packets are
transmitted without protection. The main motivation for choosing this protection
strategy is the fact that initial packets contribute more to the reconstructed image
quality compared to the ending packets. One common application of such a strategy is ROI coded images. Since the ROI of an image captures the viewer's attention more than the background, preserving the ROI quality through strong protection against channel errors may improve the perceptual image quality.
Accounting for the fact that the header and packets need to be organized into multiples of the message lengths of the involved RS codes, with some zero padding of the ending messages, the length N of the protected codestream under UEP1 is given as
    N = n_H \left\lceil \frac{S_H}{k_H} \right\rceil + n_1 \sum_{i=1}^{P} \left\lceil \frac{S_i}{k_1} \right\rceil + \sum_{i=P+1}^{T} S_i        (9)
4.3 Unequal Error Protection - Strategy 2
With this strategy, referred to as UEP2, the P initial packets are protected with a
strong (n1 , k1 ) RS code while a weaker (n2 , k2 ) RS code has been applied to the
T −P tail packets. Performing similar size adjustments as with EEP and UEP1,
the resulting codestream length for UEP2 can be computed as
    N = n_H \left\lceil \frac{S_H}{k_H} \right\rceil + n_1 \sum_{i=1}^{P} \left\lceil \frac{S_i}{k_1} \right\rceil + n_2 \sum_{i=P+1}^{T} \left\lceil \frac{S_i}{k_2} \right\rceil        (10)
The motivation for UEP2 is the same as for UEP1: the initial parts of the image codestream are more important and accordingly require better protection than the remaining parts in order to maintain the desired quality of wireless imaging services. This strategy is also useful for the transmission of ROI coded images over a wireless channel, taking care of the ROI better than the rest of the image.
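Both UEP codestream lengths, (9) and (10), can be handled by one helper: leaving the tail code unspecified reproduces UEP1, while supplying a weaker (n2, k2) code reproduces UEP2. Names and defaults below are illustrative:

```python
from math import ceil

def uep_length(s_h, packet_sizes, p, n1, k1, n2=None, k2=None,
               n_h=128, k_h=32):
    """Length N of the UEP-protected codestream.

    With n2 = k2 = None the T - P tail packets are sent unprotected
    (UEP1, eq. (9)); with a weaker (n2, k2) RS code they are protected
    too (UEP2, eq. (10)).
    """
    n = n_h * ceil(s_h / k_h)                                  # header
    n += n1 * sum(ceil(s_i / k1) for s_i in packet_sizes[:p])  # initial packets
    tail = packet_sizes[p:]
    if n2 is None:
        n += sum(tail)                                         # UEP1: raw tail
    else:
        n += n2 * sum(ceil(s_i / k2) for s_i in tail)          # UEP2: weak code
    return n
```

Comparing uep_length for the same packet sizes with and without a tail code makes the bandwidth trade-off between the two strategies explicit for any candidate pair of RS codes.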
5 Results and Discussions
The performance of the suggested error control strategies has been examined in
terms of both fidelity metrics and perceptual quality metrics. Wireless imaging
scenarios were simulated using AWGN and Rayleigh fading channel models with
a wide range of channel conditions and channel code rates. The images used in
simulations were taken from the LIVE Quality Assessment Database [13] that
were converted to black and white images and eventually encoded in JPEG2000
format. The JPEG2000 codestream generated for each image contained 4 quality
layers and a bit rate of 0.5 bits per pixel with a total of 24 packets. The hierarchical nature of the JPEG2000 codestream and the powerful error control tool set
provided by JPWL are the main motivations for selecting JPEG2000 images for
simulations.
In all cases, image headers were strongly protected using an (nH , kH ) RS
code with message length kH = 32 and codeword length nH = 128. As for the
codestream packets, Table 1 shows the different settings of RS codes and corresponding total code rates for the examined EEP, UEP1 and UEP2 strategies. The
parameter P was chosen to be 18 on the basis of results obtained from experiments performed on a number of images. In particular, the values P = 6, 12,
and 18 were examined with the latter producing the best quality in terms of the
considered metrics. Due to the limited number of RS codes available in JPWL,
the code rates differ slightly among the different UEP strategies but still have
been kept within +0.07 to −0.02 of the code rates for EEP, in every case. Finally, after protecting the images with error control codes provided by these three
strategies, these images were transmitted over two different simulated channels:
AWGN and Rayleigh fading in the presence of AWGN. To produce results of statistical significance, 100 simulations were conducted for each system setting and
the performance results were averaged accordingly.
The numerical results that will be presented in the sequel were obtained for
the sample image of dimension 768 × 512 pixels shown in Fig. 3 and illustrate the
Table 1: RS codes used for different protection strategies.
 #    EEP: R (n1, k1)    UEP1: R (n1, k1)    UEP2: R (n1, k1), (n2, k2)
 1    0.84 (37, 32)      0.83 (40, 32)       0.84 (37, 32), (37, 32)
 2    0.82 (38, 32)      0.83 (40, 32)       0.83 (38, 32), (37, 32)
 3    0.78 (40, 32)      0.79 (43, 32)       0.79 (40, 32), (38, 32)
 4    0.73 (43, 32)      0.72 (48, 32)       0.73 (45, 32), (38, 32)
 5    0.70 (45, 32)      0.69 (51, 32)       0.70 (48, 32), (37, 32)
 6    0.65 (48, 32)      0.64 (56, 32)       0.66 (51, 32), (40, 32)
 7    0.62 (51, 32)      0.64 (56, 32)       0.62 (56, 32), (38, 32)
 8    0.59 (53, 32)      0.58 (64, 32)       0.60 (56, 32), (45, 32)
 9    0.56 (56, 32)      0.58 (64, 32)       0.56 (64, 32), (37, 32)
10    0.49 (64, 32)      0.48 (80, 32)       0.49 (75, 32), (37, 32)
11    0.42 (75, 32)      0.41 (96, 32)       0.42 (85, 32), (51, 32)
12    0.39 (80, 32)      0.41 (96, 32)       0.40 (96, 32), (40, 32)
13    0.37 (85, 32)      0.36 (112, 32)      0.37 (96, 32), (56, 32)
14    0.33 (96, 32)      0.32 (128, 32)      0.33 (112, 32), (56, 32)
15    0.28 (112, 32)     0.32 (128, 32)      0.29 (128, 32), (64, 32)
16    0.25 (128, 32)     0.32 (128, 32)      0.26 (128, 32), (112, 32)
typical behavior seen also for other images. The quality of the unimpaired image
is given as a reference.
5.1 Performance Comparison for Additive White Gaussian Noise Channel
In the first set of simulations, the AWGN channel model was considered and the
quality evaluation of the error protection strategies under consideration was done
in terms of PSNR.
Figure 3: Sample image ‘Motorbikes’ [13].
Fig. 4 shows performance for a fixed code rate of R = 0.84 (EEP) and varying SNR. Accordingly, both UEP strategies outperform EEP in the medium SNR
range of typically 5 to 7 dB, while all examined strategies perform fairly similarly outside this range. UEP1 is the best among the chosen strategies in this medium SNR range at a high coding rate. Similar trends were observed for other
total code rates.
A comparison among the three strategies for fixed channel SNR of 6 dB and
for various code rates is shown in Fig. 5. Still, both of the UEP strategies give
better PSNR performance compared to EEP. In particular, UEP1 outperforms both
EEP and UEP2 for all code rates. It is also clear from the figure that, for all strategies, decreasing the code rate improves the performance only up to a certain value; beyond it, a further decrease yields no gain and instead degrades the performance.
Fig. 6 illustrates the quality improvement of progressive image decoding with an increasing number of decoded packets for code rate R = 0.59 and SNR = 6 dB.
It can be seen that both UEP schemes provide better performance compared to
EEP for all intermediate stages of decoding. Similar behaviors were seen for
other values of R and SNR. It is clear from Figs. 4–6 that the simple UEP1 strategy of protecting only the P initial packets and leaving the remaining packets unprotected provides superior PSNR performance over both UEP2 and EEP for the AWGN channel.

[Figure: PSNR (dB) versus SNR (dB) for Reference, EEP, UEP1 and UEP2]
Figure 4: Performance of EEP and UEP over AWGN for fixed code rate with R(EEP) = 0.84 and varying channel conditions.

[Figure: PSNR (dB) versus code rate for Reference, EEP, UEP1 and UEP2]
Figure 5: Performance of EEP and UEP over AWGN for fixed channel conditions with SNR = 6 dB and different code rates R.

[Figure: PSNR (dB) versus number of packets decoded for Reference, EEP, UEP1 and UEP2]
Figure 6: Performance progression of the decoded image over AWGN for R(EEP) = 0.59 and SNR = 6 dB in terms of PSNR.
5.2 Performance Comparison for Fading Channel
In the following set of simulations, a Rayleigh fading channel in the presence of AWGN is considered as the transmission medium. Further, the perceptual quality
metrics are also used for quality assessment of reconstructed images in addition
to PSNR.
Fig. 7 shows performance in terms of the SSIM index for a fixed code rate of
R = 0.62 (EEP) and varying SNR. Accordingly, both UEP strategies outperform
EEP with respect to the SSIM index in the medium SNR range of typically 11 to 16 dB, while all examined strategies perform fairly similarly outside this range. It
is noted that the superior performance of UEP over EEP was also observed with
respect to the Lp -norm and VIF criterion as well as PSNR.
[Figure: SSIM index versus SNR (dB) for Reference, EEP, UEP1 and UEP2]
Figure 7: Performance of EEP and UEP over Rayleigh channel in terms of SSIM index for fixed code rate of R = 0.62 (EEP) and different channel conditions.

Fig. 8 provides a performance comparison of the three error protection strategies for the different code rates shown in Table 1 but a fixed SNR of 13 dB. Clearly, UEP1 results in a better SSIM index than UEP2 and EEP for all examined code rates. It is also evident from the figure that weak protection, as indicated by high code rates, results in poor performance for all strategies.
Figs. 9–11 show the reconstructed images for visual inspection of the performance of the suggested UEP techniques. Specifically, the images were transmitted over a simulated Rayleigh fading channel with SNR = 13 dB using coding rates of 0.56, 0.33 and 0.25. It can be seen from these figures that UEP1 yields the best reconstructed image quality among the considered strategies. It should also be noted from Figs. 9–11 that a decrease in coding rate improves the performance, as seen from Figs. 9 and 10, but provides no gain beyond a critical lowest value. This can be observed in Figs. 10 and 11, where a decrease in the coding rate from 0.33 to 0.25 does not improve the results for UEP1 and even degrades the performance of both the UEP2 and EEP strategies. This lowest critical value of the code rate may differ among the investigated strategies and, for any given strategy, may vary with the SNR.
[Figure: SSIM index versus code rate for Reference, EEP, UEP1 and UEP2]
Figure 8: Performance of EEP and UEP over Rayleigh channel in terms of SSIM index for fixed channel condition of SNR = 13 dB and different code rates (see also Table 1, Scenarios 1-16).

Fig. 12 illustrates the performance improvements of progressive image decoding with an increasing number of decoded packets in terms of the two full-reference metrics, the SSIM index and the VIF criterion, for code rate R = 0.49. PSNR performance is shown for comparison. It can be seen from Figs. 12(a)-(c) that both
UEP schemes outperform EEP in terms of all the considered metrics. It is also
observed that the simple UEP1 strategy, protecting only the P initial packets and
leaving the remaining packets unprotected, provides superior performance over
UEP2 in the considered case.
Fig. 13 shows a similar superior performance of the two UEP strategies over EEP for the reduced-reference metrics L1-norm and L2-norm. While the SSIM index and the VIF criterion successfully provide the transition from fidelity metrics to perceptual quality metrics, the Lp-norms additionally facilitate the adoption of such metrics for in-service quality monitoring in live systems due to their reduced-reference nature.
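As a simple illustration of the reduced-reference principle, an Lp-norm compares a short feature vector extracted at the sender with the corresponding vector extracted from the received image. The sketch below is only illustrative: the particular features used with JPWL are defined earlier in the paper, so the vectors here are placeholders.

```python
def lp_norm(f_ref, f_rec, p):
    """L_p-norm between a sender-side feature vector (f_ref) and the
    corresponding receiver-side feature vector (f_rec); a small distance
    indicates little perceptual degradation (reduced-reference measure)."""
    return sum(abs(a - b) ** p for a, b in zip(f_ref, f_rec)) ** (1.0 / p)
```

Only the short feature vector, not the whole reference image, needs to accompany the codestream, which is what makes such metrics suitable for in-service monitoring.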
Figure 9: Reconstructed images at R(EEP) = 0.56 and SNR = 13 dB: (a) EEP, (b) UEP1, (c) UEP2.
Figure 10: Reconstructed images at R(EEP) = 0.33 and SNR = 13 dB: (a) EEP, (b) UEP1, (c) UEP2.
Figure 11: Reconstructed images at R(EEP) = 0.25 and SNR = 13 dB: (a) EEP, (b) UEP1, (c) UEP2.
Figure 12: Performance progression of the decoded image over Rayleigh channel for R(EEP) = 0.49 and SNR = 13 dB: (a) SSIM, (b) VIF, (c) PSNR.
Figure 13: Performance progression of the decoded image over Rayleigh channel for R(EEP) = 0.49 and SNR = 13 dB: (a) L1-norm, (b) L2-norm.
6 Conclusions
In this report, we have introduced two simple yet very effective UEP strategies for wireless imaging systems. We have also examined the performance of these strategies and compared them with conventional EEP for various amounts of redundancy when used over AWGN and Rayleigh fading channels. The UEP1 strategy protects only a few initial packets of the JPWL codestream while the remaining packets are left unprotected. The UEP2 strategy applies strong protection to the initial packets and weaker protection to the tail packets. The numerical results for both AWGN and fading channels reveal that both UEP strategies outperform EEP in the medium SNR regime for all considered perceptual quality metrics as well as for the classical fidelity metric PSNR. Furthermore, the simple UEP1 scheme performs even better than UEP2.
It may also be concluded that the Lp-norm is a suitable candidate for filling the available space in the ESD marker segment of the JPWL standard to serve as an error sensitivity descriptor. In addition to providing better correlation with the quality of the reconstructed image as perceived by humans, it may also assist in-service quality monitoring and resource management for wireless imaging services.
References

[1] International Organization for Standardization, “Information Technology – JPEG 2000 Image Coding System – Part 1: Core Coding System,” ISO/IEC 15444-1:2004(E), Sept. 2004.

[2] International Organization for Standardization, “Information Technology – JPEG 2000 Image Coding System – Part 11: Wireless,” ISO/IEC 15444-11:2007, May 2007.

[3] Z. Wu, A. Bilgin, and M. W. Marcellin, “Unequal Error Protection for Transmission of JPEG2000 Codestreams Over Noisy Channels,” in Proc. International Conference on Image Processing, New York, USA, vol. 1, pp. I-213–I-216, Sept. 2002.

[4] V. Sanchez and M. K. Mandal, “Robust Transmission of JPEG 2000 Images Over Noisy Channels,” in Digest of Technical Papers, International Conference on Consumer Electronics, pp. 80-81, June 2002.

[5] G. Baruffa, P. Micanti, and F. Frescura, “Error Protection and Interleaving for Wireless Transmission of JPEG 2000 Images and Video,” IEEE Transactions on Image Processing, vol. 18, no. 2, pp. 346-356, Feb. 2009.

[6] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

[7] Z. Wang, A. C. Bovik, and L. Lu, “Why is Image Quality Assessment so Difficult?,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, USA, vol. 4, pp. IV-3313–IV-3316, May 2002.

[8] M. Martinez-Rach, O. López, P. Piñol, M. P. Malumbres, J. Oliver, and C. T. Calafate, “Quality Assessment Metrics vs. PSNR under Packet Loss Scenarios in MANET Wireless Networks,” in Proc. ACM International Workshop on Mobile Video, Augsburg, Germany, pp. 31-36, Sept. 2007.

[9] U. Engelke and H.-J. Zepernick, “Quality Evaluation in Wireless Imaging Using Feature-Based Objective Metrics,” in Proc. IEEE International Symposium on Wireless Pervasive Computing, San Juan, Puerto Rico, pp. 367-372, Feb. 2007.

[10] A. N. Skodras and T. Ebrahimi, “JPEG2000 Image Coding System: Theory and Applications,” in Proc. IEEE International Symposium on Circuits and Systems, Kos, Greece, May 2006.

[11] M. I. Iqbal, H.-J. Zepernick, and U. Engelke, “Error Sensitivity Analysis for Wireless JPEG2000 Using Perceptual Quality Metrics,” in Proc. International Conference on Signal Processing and Communication Systems, Gold Coast, Australia, Dec. 2008.

[12] H. R. Sheikh and A. C. Bovik, “Image Information and Visual Quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, Feb. 2006.

[13] Laboratory for Image and Video Engineering, “LIVE Image Quality Assessment Database,” [Online]. Available: http://live.ece.utexas.edu/research/quality/subjective.htm
Part IV
A Framework for Error Protection
of Region of Interest Coded Images
and Videos
Part IV is submitted as:
M. I. Iqbal and H.-J. Zepernick, “A Framework for Error Protection of Region
of Interest Coded Images and Videos,” EURASIP Journal: Image Communication, 2010, under review.
A Framework for Error Protection of Region of
Interest Coded Images and Videos
Muhammad Imran Iqbal and Hans-Jürgen Zepernick
Abstract
In this paper, we propose a framework for unequal error protection
(UEP) of image and video streaming over wireless channels. Our framework of allocating the parity symbols associated with error control coding
to the image or video codestream takes advantage of the different levels
of importance that particular spatial regions of visual multimedia content
have to human observers. As such, it provides stronger protection against
transmission impairments of those parts of an image or video stream that
correspond to the regions of interest (ROIs) while less protection is given
to the background (BG). For this purpose, an image or video stream represented by a sequence of packets is split into smaller cells in such a way
that certain cells contain the parts of a codestream that represent ROIs and
the last cell carries solely BG information. Accordingly, the available parity
budget given by the specific error control coding strategy is then distributed
among these cells based on their contribution to the overall perceptual quality of a reconstructed image or video. A dynamic programming approach
is utilized to facilitate optimal allocation of parity to ROIs and BG for ROI
based UEP. The performance of the proposed ROI based UEP scheme in
terms of objective perceptual quality is analyzed and compared with both
the optimal UEP without ROI processing and equal error protection (EEP)
using the structural similarity (SSIM) index. Numerical results validate the
effectiveness of our framework and the superior performance of the proposed UEP scheme compared to EEP. The performance of the proposed
UEP scheme matches well with that of the optimal UEP without ROI processing, especially for multiple spatial description image and video coding, while the computational complexity can be kept much lower.
1 Introduction
Recent advances in wireless communication systems such as the third generation mobile cellular systems provide technologies that support the high data rates
needed for mobile multimedia services. Specifically, wireless imaging and mobile video streaming services are expected to further increase in the coming years.
Given the source compressed representation of mobile multimedia signals and
their vulnerability against transmission errors, deployment of powerful error control coding becomes a crucial part to ensure delivery of services with acceptable
quality.
To develop an efficient error control coding scheme for this class of services,
it is beneficial to account for the fact that human observers typically give different
importance to particular spatial regions of an image or video. For example, scenes
that contain faces or people draw significant attention from viewers.
The region that attracts the viewer’s attention the most is called region of interest
(ROI) while the remaining region of the same image or video frame having less
importance is called background (BG). Exploiting this feature of the human visual
system (HVS) provides many benefits including improving the performance of
quality metrics as shown in [1].
Accordingly, prioritized error handling of data from a streaming service depending on its importance for the reconstruction of an image or video using unequal error protection (UEP) improves performance compared to straightforward
equal error protection (EEP). However, finding an optimal UEP scheme for a mobile multimedia service is a very challenging problem. It depends on many factors
including the source distortion-rate function, transmission channel scenario, allocated bandwidth, and available code rate. Many solutions to this complex multidimensional problem, such as those reported in [2]– [15], have been proposed
for image and video streaming services. Most of the developed error protection
techniques for preferential error control coding provide an approximation to the
optimal solution. They typically utilize certain classes of channel codes, consider
particular channel models, and aim for specific image or video formats.
The variety of deployed UEP schemes range from Reed-Solomon (RS) codes
[5, 6, 13, 15] over simple cyclic redundancy check (CRC) codes together with rate
compatible punctured convolutional (RCPC) codes [3, 14] to turbo codes [11].
Combinations of different codes include CRC and rate compatible punctured turbo
(RCPT) codes used in [4, 8, 12], RCPC/CRC and RS product codes as reported
in [2, 7], and CRC/Turbo in combination with RS codes used in [9].
The transmission channels considered in the development of UEP techniques
are often represented by discrete models such as binary symmetric channel (BSC)
[4, 8], Gilbert-Elliott channel (GEC) [10], or packet loss channel [5]– [7], [13].
Concerning real-valued channel models with respect to signal magnitude, an additive white Gaussian noise (AWGN) channel is used in [11] while a Rayleigh
fading channel is considered in [2, 6, 7, 9, 11].
Concerning the different encoding formats used with the above error protection techniques, JPEG2000 or Motion JPEG2000 is used in [4]–[6], [8, 13] while set partitioning in hierarchical trees (SPIHT) encoding [21] is used in [2], [5]–[9], [12]. The techniques in [4]–[9], [12] were developed for channel packets of fixed length while source/information packets of fixed length are considered in [13].
Alternatively, the discussed error protection schemes may be divided into two main classes: distortion based schemes and rate based schemes. With the former class of schemes, distortions are minimized by either minimizing the expected mean squared error (MSE) or maximizing the expected peak signal-to-noise ratio (PSNR). On the other hand, the rate based schemes aim at maximizing
the total number of correctly decoded bits.
The performance evaluation of the above mentioned error protection techniques is typically done with fidelity metrics, for instance, PSNR and MSE. However, it is widely accepted that these types of metrics sometimes possess only weak correlation with the quality of images and videos as it would be perceived by humans. A much better correlation with human perception can be achieved by accounting for structural information from the spatial domain and evaluating the related distortions due to transmission and other impairments. This is particularly true as the HVS extracts and operates on such structural information rather than processing the spatial viewing area pixel-by-pixel.
for the purpose of performance evaluation, for example, the structural similarity
(SSIM) index [19] and visual information fidelity (VIF) criterion [20]. It can be
expected that designing an error protection scheme for an image and video service
using this type of objective perceptual quality metrics will be more efficient and
reliable in terms of the quality as perceived by humans.
In view of the above, our main objective in this paper is to exploit the feature
of the HVS of prioritizing ROIs over BG for the design of efficient error control
schemes for image and video communications. In addition, optimization should
be based on perceptual quality metrics instead of fidelity metrics. In this context,
it can be expected that the HVS will react more sensitively to distortions occurring in an ROI than to the same level of distortions in the BG. Accordingly, the ROI part of the codestream carrying an image or video frame is given stronger error protection by allocating more parity symbols compared to the BG. This results in fewer distortions in the ROIs compared to the BG in the presence of channel errors and in turn leads to an overall better quality of the reconstructed image or video.
In this paper, we propose a framework for UEP of ROI coded images and
videos. It comprises the following three components. Firstly, a given image
or video stream is organized into cells with each cell containing a codestream
that represents an ROI and the last cell represents the BG. Secondly, the available
parity budget is distributed among ROIs and BG according to their perceptual importance to viewers. It should be noted that the respective weights of the different
spatial regions for the viewer perception have been deduced from results reported
in our publicly available ROI database [17]. Lastly, the portion of the parity budget
that was allocated to each cell is optimally distributed among the packets within
each cell using dynamic programming. For this purpose, we adopted the optimal
UEP approach suggested in [3] but only over the packets of a cell instead of over
all packets of the entire codestream of an image or video frame. This larger granularity in terms of codestream structure has been shown to be beneficial in [22]
for finding a trade-off between performance and complexity although ROIs and
BG have not been considered in our previous work. Due to the decomposition of
the spatial domain into ROIs and BG, our framework proposed in this paper offers
performance in terms of perceptual quality that closely matches that of optimal UEP but with the benefit of largely reduced computational complexity. Specifically, simulation results for the JPEG2000 and Motion JPEG2000 formats over a
wireless link validate the excellent performance of the proposed UEP scheme and
effectiveness of the proposed framework. For the purpose of performance evaluation, we allow for the inclusion of objective perceptual quality metrics into the
processing and utilized the SSIM index with our numerical results as the related
algorithm is publicly available.
The remainder of this paper is organized as follows. In Section 2, we introduce the main functions of the proposed framework for error protection of ROI
coded images and videos. The approach of splitting the codestream into ROI and
BG cells is presented in Section 3. Subsequently, in Section 4, the parity allocation to those cells subject to an overall budget of symbols for error protection and
distinguishing between ROIs and BG is described. In Section 5, the dynamic programming solution adopted to further provide optimal UEP to the different packets
within each cell is presented. Numerical results are then reported and discussed
in Section 6 giving performance comparisons between EEP, optimal UEP without
ROI processing, and our proposed ROI based UEP scheme. Finally, Section 7
concludes the paper.
2 Framework for Preferential Error Protection
Image and video scenes are often comprised of certain spatial regions that attract
the viewer’s attention more than other regions. Given the image sample shown in
Fig. 1, for example, it can be expected that the faces of the two parrots will attract
more attention than the rest of the image. In this paper, we utilize this
feature of the HVS for preferential error protection of image and video services
over wireless channels. Specifically, the overall budget of parity symbols given
by a particular error control scheme is allocated such that the ROIs of an image or
video receive stronger protection compared to the less important BG.
The main idea of our framework is to divide a single optimal rate allocation
problem into multiple smaller problems and then to solve these smaller problems
independently while making use of the spatial importance of ROIs. In this context,
Figure 1: Image sample ‘Parrots’ containing two regions of interest.
rate allocation to a packet means applying an error control code which provides a
specified code rate. Strong error protection is provided by low-rate codes which
require a large number of parity symbols to accompany the payload. On the other
hand, weak error protection is obtained with high-rate codes, which require fewer parity symbols. We have thus reformulated the optimal rate allocation problem as an optimal parity allocation problem.
Figure 2 shows the block diagram of the proposed framework for error protection of ROI coded images and videos. It comprises three main functions
as follows. In the first processing step, the codestream generated from an image
or video signal is split into ROIs and BG. In this way, the codestream is organized into so-called cells comprising of a number of packets where each of the
ROIs as well as the BG correspond to a particular cell. A codestream split into
M cells implies therefore that it carries M − 1 ROIs along with the BG. Subsequently, the available parity symbols of a given budget are allocated to these cells
in their respective order of significance for the quality of the reconstructed image/video. This may be controlled by perceptual weights that may be available
from subjective experiments. Finally, the amount of parity symbols given to each
cell is distributed among the packets constituting each cell. The related error control code is then applied to each packet based on the number of parity symbols
allocated. The following sections will describe these three key functions of the
Figure 2: Block diagram of the framework for error protection of ROI coded images and videos (input codestream → splitting into ROI and BG cells → parity allocation among cells, guided by perceptual weights → error protection within each cell → protected codestream).
proposed framework in more detail.
3 Splitting the Codestream into ROI and BG Cells
The most beneficial approach to split the codestream of a given image or video
format would be to generate separate and independent codestreams for each ROI
and the BG. This would require a multiple description image and video encoder
that can generate separate streams for each of the considered spatial regions. In
this way, it would be straightforward to protect these separate ROI and BG streams
independently.
Due to the absence of standardized multiple description image encoders, we
have chosen the JPEG2000 format for images as it supports ROI coding and also
may be modified to account for multiple descriptions. Similarly, we have chosen the Motion JPEG2000 format for video streaming and examined both single
description and multiple description streaming representing different regions of a
video frame.
3.1 Splitting Approach for Images
Let us consider an image format that produces a single codestream for the whole
image. In this case, the codestream shall be split into M cells where each cell
represents one spatial region of the image, i.e. either an ROI or the BG. Further,
let the image be encoded with multiple ROIs using a JPEG2000 encoder and let
the ROIs have different priorities to the viewer. To separate the codestream into
ROIs and BG, it is noted that the JPEG2000 encoder places the data relating to
ROIs at the beginning of the codestream and in the order of their priority. As we
used the generic scaling method for ROI coding in JPEG2000, there are no hard
boundaries among the representation of the ROIs in the codestream and between
the last ROI and the BG. The following approach is therefore used to separate the
ROIs as well as the BG and hence produce a structured codestream suitable for
UEP coding.
Given is a codestream that consists of NCS packets. As would be supported
by JPEG2000, let us assume that the codestream is organized into a sequence of
ROIs according to their priorities followed by the BG. Applying progressive decoding that commences with the first packet to proceed to the last packet in the
codestream, the quality of the image successively increases with the ROI of highest priority developing first and so on. In this setting, let ϕi denote the objective
perceptual quality of the ith ROI computed using a suitable metric such as the
SSIM index or the VIF criterion with respect to the spatial area of the considered
ROI. Similarly, let ϕi (Ni′ ) represent the objective perceptual quality obtained for
the ith ROI in case that a total of Ni′ subsequent packets are processed. In order
to quantify Ni′ , we define the following condition
\varphi_i(N_i') \geq \alpha \cdot \varphi_i, \quad i = 1, 2, \ldots, M-1 \qquad (1)
where α is a real number representing the fraction of quality obtained after processing Ni′ packets. It is noted that a value of α = 0.9 has turned out to be
beneficial for separating the codestream into a sequence of cells that represents
ROIs and BG. In view of (1), the number Ni of the packets contained in the ith
cell representing the ith ROI can be calculated as
N_i = N_i' - N_{i-1}', \quad i = 1, 2, \ldots, M-1 \qquad (2)

where N_0' \triangleq 0 is the initial value. Once the numbers of packets in the M − 1 ROI cells are given, the remaining

N_{BG} = N_{CS} - N_{M-1}' \qquad (3)

packets are then associated with the BG.
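The splitting rule in (1)–(3) can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis implementation: the per-ROI quality progressions phi_prog (obtained, e.g., with the SSIM index under progressive decoding) are assumed to be given.

```python
def split_into_cells(phi_prog, phi_roi, n_cs, alpha=0.9):
    """Split a codestream of n_cs packets into M-1 ROI cells plus one BG cell.

    phi_prog[i][n]: quality of the i-th ROI after decoding the first n packets
                    of the codestream, for n = 0, 1, ..., n_cs.
    phi_roi[i]:     quality of the i-th ROI for the fully decoded codestream.
    Returns the packet counts [N_1, ..., N_{M-1}, N_BG].
    """
    counts = []
    n_prev = 0  # N'_{i-1}, with the initial value N'_0 = 0
    for prog, phi_full in zip(phi_prog, phi_roi):
        # N'_i: smallest packet count with phi_i(N'_i) >= alpha * phi_i, cf. (1)
        n_i = next(n for n, q in enumerate(prog) if q >= alpha * phi_full)
        counts.append(n_i - n_prev)   # N_i = N'_i - N'_{i-1}, cf. (2)
        n_prev = n_i
    counts.append(n_cs - n_prev)      # N_BG = N_CS - N'_{M-1}, cf. (3)
    return counts
```

With alpha = 0.9, each ROI cell ends at the first packet that brings its region to 90% of its final quality, and everything beyond the last ROI cell forms the BG cell.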
Let us now focus on the case of multiple description encoding, where we utilize the tiling feature of the JPEG2000 format to generate a multiple description
image codestream. Here, the JPEG2000 encoder generates a codestream in which
the data associated with each tile of the image is kept separate from the other
tiles resulting in a so-called tile-stream. This type of codestream partitioning has
several advantages for designing an UEP scheme. Specifically, different priorities
can be assigned to each region that is represented by a tile and related error protection can be applied according to these preferences. Furthermore, this codestream
structure facilitates the decomposition of a single and complex optimization problem with respect to parity allocation into a number of M smaller and less complex
problems. It should also be noted that multiple description coding has the advantage of being more error resilient to packet loss compared to single description
coding [24], which can be a beneficial feature for wireless imaging and video
services.
3.2 Splitting Approach for Videos
As far as error protection of ROI coded videos is concerned, splitting of the related
video stream into ROIs and BG appears to be more involved due to the associated
temporal progression of the displayed content. However, an autocorrelation analysis for mobile video quality assessment has been reported in [18] to shed light
on the dynamics of the changes in spatial features of a video over time. In particular, the autocorrelation analysis was carried out to reveal the coherence time
for which the considered spatial features can be assumed as being constant. The
numerical results showed that the average coherence time assumes a value of 25
frames for all considered spatial features. Similarly, we assume here that the ROIs
would also change slowly with the progression of frames and may be considered
as constant for a sequence of several consecutive frames (see Fig. 3). Accordingly,
the region boundaries calculated for the different ROIs and BG remain the same
for the progression of a group of consecutive frames which in turn reduces the
computational load.
Figure 3: Splitting videos into ROI and BG cells to hold for groups of consecutive frames.

In this paper, we consider videos that are given in Motion JPEG2000 format. As Motion JPEG2000 does not involve inter-frame coding, a similar splitting approach as outlined in Section 3.1 for JPEG2000 images can therefore be performed for both single and multiple description video coding.
4 Parity Allocation to ROI and BG Cells
Given the decomposition of an image or video codestream into ROI and BG cells, the next task is to devise a suitable parity allocation to these portions that serves the design of an UEP scheme. For this purpose, let us assume that a total number of parity symbols is given for the entire codestream to constitute an overall parity budget. The amount of available parity symbols may be constrained by the limited bandwidth offered by a particular system. Specifically, bandwidth is typically scarce and expensive in mobile radio and wireless communication systems and hence warrants efficient allocation of parity symbols for error protection to ROI and BG cells.
A distribution of the available parity symbols among ROI and BG cells can
be based on the significance that each individual cell has for the quality of the
reconstructed image or video. The optimal parity allocation among all possible
solutions may be obtained using dynamic programming. However, in this work,
we have chosen a more direct approach by applying perceptual weights associated with the different spatial regions represented by the ROIs and the BG. These
perceptual weights may be obtained from subjective experiments. In this way, a
stronger relationship between human perception and preferential coding offered
by UEP is gained.
In view of the above, let the total code rate given for the entire codestream be

R = \frac{K}{K + B} \qquad (4)
where K denotes the number of codestream symbols associated with an image or
video frame and B is the overall budget of parity symbols that are available for
distribution among ROIs and BG.
Concerning the perceptual weights wi, i = 1, 2, . . . , M, given to the M different cells, we assume that these can only take on real values in the range

0 \leq w_i \leq 1, \quad i = 1, 2, \ldots, M \qquad (5)

and, under the premise that the image or video frame would be of 100% interest to the viewer, accumulate as

\sum_{i=1}^{M} w_i = 1 \qquad (6)
The parity allocation to the M cells can then be performed according to the level of importance of the ROIs and BG by solving the following set of linear equations:

B_i = w_i \cdot N_i \cdot C, \quad i = 1, 2, \ldots, M \qquad (7)

B = \sum_{i=1}^{M} B_i \qquad (8)

where Bi is the parity allocated to the ith cell, Ni is the number of packets contained in the ith cell, and C is a constant. It is instructive to solve (7)-(8) for C, which leads to the solution

C = \frac{B}{\sum_{i=1}^{M} w_i N_i} \qquad (9)
Accordingly, parity is allocated not only with respect to the weights of the ROI and BG cells but also with respect to the number of packets in the individual cells. This approach has been selected in order not to overprotect cells that have a high weight but contain only a small number of packets, and vice versa.
For the special case that all cells consist of the same number of packets, say Ni = Nc, i = 1, 2, . . . , M, we have a parity allocation of

B_i = w_i \cdot B, \quad i = 1, 2, \ldots, M \qquad (10)

while for a scenario where all cells have the same weight, we obtain

B_i = \frac{N_i}{N_{CS}} \cdot B, \quad i = 1, 2, \ldots, M \qquad (11)

where

N_{CS} = \sum_{i=1}^{M} N_i \qquad (12)
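The cell-level allocation of (7)–(9) reduces to a few lines of code. The following is a minimal sketch under the stated assumptions: the weights and packet counts are given, and rounding the resulting B_i to integer numbers of parity symbols is omitted.

```python
def allocate_parity(weights, packet_counts, budget):
    """Distribute the overall parity budget B among M cells via (7)-(9):
    B_i = w_i * N_i * C with C = B / sum_i(w_i * N_i)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one, cf. (6)"
    c = budget / sum(w * n for w, n in zip(weights, packet_counts))  # cf. (9)
    return [w * n * c for w, n in zip(weights, packet_counts)]       # cf. (7)
```

The special cases fall out directly: with equal packet counts the allocation reduces to B_i = w_i · B as in (10), and with equal weights to B_i = (N_i / N_CS) · B as in (11).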
5 Parity Allocation to ROI and BG Packets
Let \mathcal{C} denote the set of codes c_l, l = 1, 2, \ldots, L, available for unequal error protection and \mathcal{B} be the set of the numbers of parity symbols b_l, l = 1, 2, \ldots, L, associated with these codes.
5.1 Problem Statement
Let an image or video frame be transmitted using an overall UEP policy
\Psi = \{\psi_1, \psi_2, \ldots, \psi_M\} \qquad (13)
where ψi is the error protection policy applied to the ith cell of an image or video
frame. Given that parity allocation to ROI and BG cells has already been performed as described in Section 4 resulting in a parity budget for each cell, it is
sufficient to pose the problem simply with respect to the packets of a cell. For
ease of notation, let us therefore denote
\psi = \{c_{l_1}, c_{l_2}, \ldots, c_{l_N}\}, \quad l_j \in \{1, 2, \ldots, L\} \qquad (14)
as the error protection policy applied to a cell containing N packets.
In addition, let ϕ denote image or video frame quality if the cell is decoded
correctly at the receiver and let ϕi be the quality if the first i packets of the cell
are decoded correctly. In order to calculate the expected quality ϕ(ψ) for a given
policy ψ after transmitting the packets of a cell over an error prone channel, we
assume here a block fading channel in the presence of additive white Gaussian noise (AWGN). In this case, the fading channel can be considered as time-invariant for
the duration of a packet and the block error probabilities are independent among
packets. Then, the expected image quality ϕ(ψ) for policy ψ after transmission of
the codestream over the channel can be calculated as
\varphi(\psi) = \sum_{j=0}^{N} \varphi_j \, P_{e,j+1}(\psi) \prod_{l=0}^{j} \left[ 1 - P_{e,l}(\psi) \right] \qquad (15)

where P_{e,j}(\psi) is the packet error probability using the code c_{l_j} defined by the policy \psi of that cell, P_{e,0}(\psi) \triangleq 0, and P_{e,N+1}(\psi) \triangleq 1.
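The expected quality in (15) translates directly into code. The sketch below is illustrative only and assumes that the per-packet error probabilities for a given policy are already known, e.g., from channel simulations.

```python
def expected_quality(phi, p_err):
    """Expected quality per (15) for a cell of N packets.

    phi[j]  : quality when the first j packets decode correctly (phi[0] = phi_0).
    p_err[j]: error probability P_{e,j+1} of packet j+1 under the chosen policy.
    By definition, P_{e,0} = 0 and P_{e,N+1} = 1.
    """
    n = len(p_err)                    # number of packets N
    p = [0.0] + list(p_err) + [1.0]   # P_{e,0}, P_{e,1}, ..., P_{e,N+1}
    total, prefix = 0.0, 1.0
    for j in range(n + 1):
        prefix *= 1.0 - p[j]          # running product of (1 - P_{e,l}), l = 0..j
        total += phi[j] * p[j + 1] * prefix
    return total
```

Each term weights the quality phi_j reached after j correct packets by the probability that exactly the first j packets decode correctly and packet j + 1 fails.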
The problem of finding the optimal UEP policy ψ for an individual cell such
that the maximum quality is obtained subject to the total budget of parity symbols,
can then be formulated as
P1: \quad \max_{\psi} \varphi(\psi) \quad \text{subject to} \quad B(\psi) \leq B_{cell} \qquad (16)
where B(ψ) denotes the total number of parity symbols allocated to this cell using
error protection policy ψ and Bcell ∈ {B1 , B2 , . . . , BM } is the total number of
parity symbols available for protecting the considered cell. It should be noted
that we have B(ψ) ≤ Bcell since the number of codes cl ∈ C is finite and the
associated redundancy bl ∈ B is discrete.
5.2 Solution to the Problem
We have chosen dynamic programming to solve problem (16) and adopt the procedure reported in [3]. In contrast to [3], dynamic programming is performed for
each cell comprising N packets to account for ROIs and BG instead of using it
for the whole codestream. In this way, the M different parts of a codestream representing an image or video frame are equipped with optimal UEP while keeping
computational complexity low.
In order to solve this optimization problem using dynamic programming, we
consider progressive image and video encoding such as JPEG2000 and Motion
JPEG2000, respectively. With this type of encoding, the image or video quality
improves with the progression of received packets j given the j − 1 previous
packets. As suggested in [3], an incremental reward can be used to measure the
quality improvement due to the additional jth packet and shall be defined in the
examined problem setting as
\delta_j \triangleq \phi_j - \phi_{j-1}, \qquad j = 1, 2, \ldots, N \qquad (17)
where δj is the reward, or the quality gain, of the jth packet over the quality obtained by decoding j − 1 packets, and ϕ0 denotes a constant initial value. Alternatively, the quality of an image or video frame may be expressed with (17) as a sum of
non-negative rewards
\phi_j - \phi_0 = \sum_{l=1}^{j} \delta_l \qquad (18)
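As a minimal illustration of (17) and (18), the rewards can be obtained by differencing a quality profile, and the quality after j packets recovered by summing them back; the numbers below are made up for demonstration.

```python
def rewards(phi):
    """Incremental rewards of Eq. (17): delta_j = phi_j - phi_{j-1}."""
    return [phi[j] - phi[j - 1] for j in range(1, len(phi))]

def quality_after(phi0, delta, j):
    """Eq. (18): phi_j = phi_0 plus the first j rewards."""
    return phi0 + sum(delta[:j])
```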
In preparation of making problem (16) accessible to dynamic programming,
it is beneficial to define a partial expected image or video frame quality ϕk (ψ) for
policy ψ after transmission of the codestream over the channel as
\phi_k(\psi) = \sum_{j=k}^{N} \left( \sum_{l=k}^{j} \delta_l \right) P_{e,j+1}(\psi) \prod_{l=k}^{j} \left[ 1 - P_{e,l}(\psi) \right] \qquad (19)
which may be calculated recursively as
\phi_k(\psi) =
\begin{cases}
\left[ 1 - P_{e,N}(\psi) \right] \delta_N, & k = N \\
\left[ 1 - P_{e,k}(\psi) \right] \left[ \delta_k + \phi_{k+1}(\psi) \right], & k = 1, 2, \ldots, N-1
\end{cases} \qquad (20)
A Framework for Error Protection of Region of Interest Coded ...
Using (15), (17) and (19), it can be shown that
\phi_1(\psi) = \phi(\psi) - \phi_0 \qquad (21)

which allows us to rewrite (16) as
\mathrm{P2}: \quad \max_{\psi} \ \phi_1(\psi) \quad \text{subject to} \quad B(\psi) \leq B_{\mathrm{cell}} \qquad (22)
Given the recursive problem formulation, let us now define \phi_k^*(r) for brevity
as the optimal solution to (22) for parity allocation \rho_k at the kth recursion subject
to a parity budget r. Then, the optimal parity allocation to the different packets at
the kth recursion can be found using the dynamic programming recursion
\phi_k^*(r) =
\begin{cases}
0, & k > N \\
\max_{\psi} \left\{ \left[ 1 - P_{e,k}(\psi) \right] \left[ \delta_k + \phi_{k+1}^*(r - \rho_k) \right] \right\}, & \text{otherwise}
\end{cases} \qquad (23)
Solving recursion (23) for k = 1 and r = Bcell gives the optimal error protection policy for the considered cell.
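A compact sketch of recursion (23) using memoization is given below. The mapping `pe_of` from a packet's parity amount to its error probability, and the discrete `parity_options`, are hypothetical placeholders for the RS code family C and the redundancy set B of the actual scheme.

```python
from functools import lru_cache

def optimal_cell_uep(delta, parity_options, pe_of, budget):
    """Dynamic programming recursion of Eq. (23) for a single cell.

    delta          : incremental rewards [delta_1, ..., delta_N], Eq. (17)
    parity_options : discrete parity amounts per packet (should include 0)
    pe_of          : pe_of(b) -> packet error probability with b parity symbols
    budget         : parity budget B_cell of the cell
    Returns the optimal value phi_1^*(budget) and the per-packet allocation.
    """
    N = len(delta)

    @lru_cache(maxsize=None)
    def phi_star(k, r):
        if k > N:                        # base case of Eq. (23)
            return 0.0, ()
        best_val, best_alloc = -1.0, ()
        for b in parity_options:
            if b > r:                    # respect the remaining budget
                continue
            tail_val, tail_alloc = phi_star(k + 1, r - b)
            val = (1.0 - pe_of(b)) * (delta[k - 1] + tail_val)
            if val > best_val:
                best_val, best_alloc = val, (b,) + tail_alloc
        return best_val, best_alloc

    value, alloc = phi_star(1, budget)
    return value, list(alloc)
```

Calling the recursion at k = 1 with the full budget B_cell mirrors the statement above and returns both the expected quality gain and the chosen parity allocation.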
5.3 Complexity Considerations
Given are the total number N_{CS} of packets in a codestream, the set
\mathcal{N} = \{N_1, N_2, \ldots, N_M\} of the numbers of packets in the different cells, and the
total number M of cells in the codestream. Further, the largest number of packets
among the cells is obtained as

N_{\mathrm{cell,max}} = \max_{j} N_j, \qquad j = 1, 2, \ldots, M \qquad (24)
The computational complexity of the proposed variant of the dynamic programming approach is of the order of
C_{\mathrm{ROI}} = \mathcal{O}\left( c^{N_{\mathrm{cell,max}}} \right) \qquad (25)
where c denotes a positive number. For the special case of all cells having an equal
number of packets, the computational complexity of the proposed approach is
given as

C_{\mathrm{ROI}} = \mathcal{O}\left( c^{N_{CS}/M} \right) \qquad (26)
On the other hand, the optimal UEP approach reported in [3] operates on the
total number N_{CS} of packets in the codestream and possesses a much larger computational complexity compared to the ROI based approach, given as

C_{\mathrm{opt}} = \mathcal{O}\left( c^{N_{CS}} \right) \qquad (27)
This difference becomes particularly significant as the number of ROIs grows, since a larger number of ROIs increases M and hence further decreases the complexity of our ROI based approach.
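A back-of-the-envelope comparison illustrates the gap between (26) and (27). The base c = 2 and the packet counts below are arbitrary choices for illustration only.

```python
# Illustrative operation counts; c = 2 is an arbitrary base.
c, n_cs, m = 2, 60, 4
cost_opt = c ** n_cs             # Eq. (27): whole codestream at once
cost_roi = m * c ** (n_cs // m)  # Eq. (26) summed over the M cells
speedup = cost_opt // cost_roi
print(speedup)                   # the ratio grows exponentially with N_CS
```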
6 Numerical Results
The performance of the proposed ROI based UEP technique compared to EEP
and optimal UEP [3] has been examined for four representative examples. The
considered scenarios used with our simulations are as follows:
• Single description encoding of JPEG2000 images with one ROI
• Multiple description encoding of JPEG2000 images with one ROI
• Single description encoding of a Motion JPEG2000 video with one ROI
• Multiple description encoding of a Motion JPEG2000 video with two ROIs
6.1 Simulation Setting
In the sequel, we provide the particulars of the considered ROI coded images and
videos, the description of the underlying wireless link model, the derivation of
perceptual weights, and augmentation of quality metrics.
Figure 4: Sample images with their mean ROIs as specified in [16]: (a) Barbara, (b) Elaine, (c) Tiffany, (d) Lena, (e) Goldhill, (f) Mandrill, (g) Pepper.
Specifications for ROI coded images
Figure 4 shows the seven JPEG2000 images that were used with our simulations
of a wireless imaging system with ROI coding and UEP for preferential error
control. These grayscale images were all of size 512 × 512 pixels. The JPEG2000
encoding and decoding was performed using the Kakadu software [25]. Both cases
are considered: producing an image as a single tile, and producing it with multiple
tiles to generate independent ROI and BG streams.
The ROIs for these images were selected according to the results reported
in [16] as illustrated in Fig. 4 by the related rectangular shapes. Specifically, these
ROIs are the outcome of a subjective experiment involving 30 nonexpert viewers
that were given the task to select a region within each of the images that drew
most of their attention. The complete details of the experimental procedures and
a comprehensive statistical analysis of the results are presented in [16]. The localizations of the ROIs in terms of ROI center coordinates and ROI dimensions have
also been made publicly available at [17] and are used here for our simulations.
Specifications for ROI coded videos
The video sequence ‘toy train’ from the Video Quality Experts Group (VQEG)
database [26] was used here. The first frame of this video along with the selected
single ROI and two ROIs is presented in Fig. 5. This video has a duration of 8 seconds and a frame size of 702 × 486 pixels. It is provided in a special YUV
format, which was converted to the Motion JPEG2000 format according to the VQEG
guidelines using the Kakadu software. In particular, we generated one Motion JPEG2000 sequence with a single ROI using single tile encoding and another
Motion JPEG2000 sequence accounting for two ROIs using multiple tile encoding
to represent multiple description encoding.
Specification of the wireless link model
Forward error correction was applied to these images and videos using RS codes
suggested by the proposed UEP, EEP and the optimal UEP scheme of [3] to
achieve a total coding rate of 0.8 for all the schemes. The channel coded images
and videos were then transmitted over the simulated wireless channel modeling
Figure 5: First frame of sample video ‘toy train’ taken from the VQEG database [26]: (a) single ROI, (b) two ROIs.
Rayleigh fading in the presence of additive white Gaussian noise (AWGN). At
the receiver, after channel decoding the codestream is truncated for every image
or video frame when the first residual error occurs and the quality is computed
for this truncated codestream. The reason for not decoding the rest of the codestream is to avoid crashing or losing the synchronization of the JPEG2000 decoder
that might happen in the presence of errors in the codestream. The transmission
over the wireless channel was repeated 100 times and the outcomes were averaged
to obtain statistically more reliable results.
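The receiver-side truncation rule can be sketched as follows; the per-packet error outcomes are assumed to be available from the channel decoder, and the function names are our own.

```python
import random

def truncation_point(packet_ok):
    """Index of the first packet with a residual error; the codestream is
    cut there so the JPEG2000 decoder never processes corrupted data."""
    for j, ok in enumerate(packet_ok):
        if not ok:
            return j
    return len(packet_ok)            # error free: keep the whole codestream

def simulate_once(pe, rng=random):
    """One channel realization: packet j is decodable with prob. 1 - pe[j]."""
    outcomes = [rng.random() >= p for p in pe]
    return truncation_point(outcomes)
```

Averaging the quality of such truncated codestreams over, e.g., 100 realizations mirrors the Monte Carlo procedure described above.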
Derivation of Perceptual Weights
There are many different approaches to derive the perceptual weights for ROIs and
BG of a given image or video frame. In the case of an image or video frame with
a single ROI, we have taken advantage of the ROI database [16, 17]. Accordingly,
the ROI selections performed by each of the 30 nonexpert viewers are available
along with the mean ROI as shown in Fig. 4. It is intuitive to then compute the
weight of an ROI by counting the number of pixels of the individual ROI selected
by each nonexpert viewer that falls into the mean ROI of that image and dividing
the result by the number of pixels in the corresponding viewer ROI. In other words,
the weight wROI of the ROI averaged over all 30 viewers can be formulated as
w_{\mathrm{ROI}} = \frac{1}{30} \sum_{n=1}^{30} \frac{a_n}{A_n} \qquad (28)
where an denotes the area of the ROI in number of pixels of the nth viewer’s ROI
that falls into the area of the mean ROI of that image and An is the total area of
the nth viewer’s ROI. In view of (5), the weight wBG given to the BG can be
calculated as
w_{\mathrm{BG}} = 1 - w_{\mathrm{ROI}} \qquad (29)
In addition, we have averaged the results obtained for the seven images from
the ROI database to provide a mean weight, making the weighting more independent of the content. The obtained results are presented in Fig. 6.
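For rectangular ROIs as in Fig. 4, the weight computation of (28) and (29) amounts to rectangle intersection. The sketch below assumes axis-aligned rectangles given as (x0, y0, x1, y1) corner coordinates; this convention is our own and not the database format of [17].

```python
def roi_weights(viewer_rois, mean_roi):
    """Eq. (28)/(29): average overlap of each viewer's ROI with the mean ROI.

    ROIs are hypothetical axis-aligned rectangles (x0, y0, x1, y1)."""
    def area(r):
        return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

    def intersection(a, b):
        return (max(a[0], b[0]), max(a[1], b[1]),
                min(a[2], b[2]), min(a[3], b[3]))

    # a_n / A_n: pixels of viewer n's ROI falling into the mean ROI,
    # normalized by the viewer's ROI area; then averaged over viewers.
    w_roi = sum(area(intersection(v, mean_roi)) / area(v)
                for v in viewer_rois) / len(viewer_rois)
    return w_roi, 1.0 - w_roi        # (w_ROI, w_BG)
```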
For the sample video ‘toy train’ with two ROIs, we have selected the weights
for ROI_1, ROI_2, and BG as w_1 = 0.44, w_2 = 0.33, and w_3 = 0.23, respectively.
Figure 6: Perceptual weights for ROI and BG for the considered reference images (Barbara, Elaine, Goldhill, Lena, Mandrill, Pepper, Tiffany) and their mean.
Augmentation of Quality Metrics
Following a similar approach as in [1], we used an augmented version of the
quality metrics to account for spatial importance of different regions of an image
or a video frame to the overall perceived quality. Specifically, the quality metrics
were calculated separately for all identified spatial regions, i.e. ROIs and BG, and
then accumulated as a weighted sum as
\phi = \sum_{i=1}^{M} w_i \phi_i \qquad (30)
where ϕ is the overall quality of an image or video frame, ϕi is the quality value
for the ith cell representing a spatial region and M is the total number of cells.
In the case of video quality, a single quality value is obtained by averaging the
augmented quality over all frames. Although the described augmentation can be
applied to a variety of objective perceptual quality metrics, we have selected in
this paper the SSIM index as an example metric.
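The augmentation in (30) reduces to a weighted sum per frame and, for video, a mean over frames. The SSIM-like values and weights below are made up for demonstration.

```python
def augmented_quality(region_scores, weights):
    """Eq. (30): per-region quality (e.g. SSIM computed separately on each
    ROI and the BG) combined by perceptual weight."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to one"
    return sum(w * q for w, q in zip(weights, region_scores))

def video_quality(frame_region_scores, weights):
    """Average the augmented per-frame quality over all frames."""
    scores = [augmented_quality(f, weights) for f in frame_region_scores]
    return sum(scores) / len(scores)
```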
6.2 Results and Discussion
In the sequel, the performance results produced for the sample image ‘Barbara’
and sample video ‘toy train’ are reported. It is noted that similar findings were
obtained for the other image samples but these will not be reported here due to
space limitations.
Figure 7 shows the performance for the single description encoded JPEG2000
image ‘Barbara’ with one ROI over Rayleigh fading channel in the presence of
AWGN. It can be seen from the figure that all three considered error protection
techniques provide similar performance in terms of augmented SSIM index for
the low signal-to-noise ratio (SNR) regime, i.e. below 6 dB in this example. The
same applies to the high SNR regime of values above 21 dB, where all schemes finally reach the
quality of the reference image. On the other hand, in the range of medium SNR
values, the proposed UEP technique outperforms EEP and performs almost as
well as the optimal UEP [3]. The slight shortfall in performance of the proposed
UEP compared to the optimal UEP can be explained as follows. The
proposed ROI based UEP technique protects the initial packets of each cell with
strong codes while the remaining tail packets in the cell are weakly protected or
even not protected at all depending on the considered SNR and available parity
symbols. If a residual error occurs in any of these tail packets, the subsequent
cells are not decoded as the codestream is truncated with the appearance of the
first residual error. However, the slightly better performance of optimal UEP is
paid for by a significantly larger computational complexity (see Section 5.3).
Figure 8 shows the results for the same transmission scenario as before but using multiple description encoding of the JPEG2000 image ‘Barbara’. Clearly, the
performance of the ROI based UEP techniques matches very well with the computationally more complex optimal UEP. Further, both UEP schemes outperform
EEP over almost the entire examined range of SNR. It is also observed that the
performance in case of multiple description encoding increases more consistently
over a wider range of SNR values. Eventually, all three error protection schemes
reach the quality of the reference image given that the SNR is sufficiently high.
Figure 9 depicts the simulation results obtained for the Motion JPEG2000
video ‘toy train’ with single ROI, single description encoding, and transmission
over a Rayleigh fading channel in the presence of AWGN. Clearly, similar con-
Figure 7: Performance comparison of error protection schemes for JPEG2000 image ‘Barbara’ with single description encoding and single ROI (augmented SSIM versus SNR in dB for the proposed UEP, optimal UEP, EEP, and the reference).
Figure 8: Performance comparison of error protection schemes for JPEG2000 image ‘Barbara’ with multiple spatial description encoding and single ROI (augmented SSIM versus SNR in dB for the proposed UEP, optimal UEP, EEP, and the reference).
Figure 9: Performance comparison of error protection schemes for Motion JPEG2000 video ‘toy train’ with single description encoding and single ROI (augmented SSIM versus SNR in dB for the proposed UEP, optimal UEP, EEP, and the reference).
Figure 10: Performance comparison of error protection schemes for Motion JPEG2000 video ‘toy train’ with multiple description encoding and two ROIs (augmented SSIM versus SNR in dB for the proposed UEP, optimal UEP, EEP, and the reference).
clusions as for the single description encoded JPEG2000 image can be drawn for
this example.
Finally, Fig. 10 presents the performance obtained for multiple description
encoded Motion JPEG2000 video ‘toy train’ with two ROIs selected in this scenario. The results are consistently favorable, as with the multiple description encoded JPEG2000 image. This also validates that the proposed ROI based
UEP technique performs well when more than one ROI have to be protected with
higher priority compared to the background.
7 Conclusions
In this paper, we have proposed a framework for error protection of ROI coded
images and videos. In the first processing step, it splits the codestream associated
with an image or video frame into a number of ROI and BG cells, each containing a certain number of packets. Subsequently, the overall parity budget available
for error protection is distributed among the ROIs and BG cells according to their
perceptual importance. Finally, the parity symbols allocated to a cell are then
optimally distributed among the packets of the cell using dynamic programming.
In this way, the overall computational complexity of the proposed approach can
be significantly reduced compared to the optimal UEP. The numerical results presented for JPEG2000 and Motion JPEG2000 as examples of progressive encoding
formats reveal that the proposed UEP approach can achieve similar performance
as optimal UEP with significantly reduced computational complexity. Especially,
for multiple spatial description encoding, the performance of the proposed UEP
scheme matches closely to that of the optimal UEP.
References

[1] E. C. Larson, C. Vu, and D. M. Chandler, “Can Visual Fixation Patterns Improve Image Fidelity Assessment?,” in Proc. IEEE Int. Conf. on Image Processing, San Diego, California, USA, Oct. 2008, pp. 2572-2575.

[2] P. G. Sherwood and K. Zeger, “Error Protection for Progressive Image Transmission Over Memoryless and Fading Channels,” IEEE Trans. on Communications, vol. 46, no. 12, pp. 1555-1559, Dec. 1998.

[3] V. Chande and N. Farvardin, “Progressive Transmission of Images over Memoryless Noisy Channels,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 850-860, June 2000.

[4] B. A. Banister, B. Belzer, and T. R. Fischer, “Robust Image Transmission Using JPEG2000 and Turbo-Codes,” IEEE Signal Processing Letters, vol. 9, no. 4, pp. 117-119, April 2002.

[5] J. Kim, R. M. Mersereau, and Y. Altunbasak, “Error-Resilient Image and Video Transmission Over the Internet Using Unequal Error Protection,” IEEE Trans. on Image Processing, vol. 12, no. 2, pp. 121-131, Feb. 2003.

[6] S. Dumitrescu, X. Wu, and Z. Wang, “Globally Optimal Uneven Error-Protected Packetization of Scalable Code Streams,” IEEE Trans. on Multimedia, vol. 6, no. 2, pp. 230-239, April 2004.

[7] V. M. Stanković, R. Hamzaoui, and Z. Xiong, “Real-Time Error Protection of Embedded Codes for Packet Erasure and Fading Channels,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, no. 8, pp. 1064-1072, Aug. 2004.

[8] R. Hamzaoui, V. Stanković, and Z. Xiong, “Fast Algorithm for Distortion-Based Error Protection of Embedded Image Codes,” IEEE Trans. on Image Processing, vol. 14, no. 10, pp. 1417-1421, Oct. 2005.

[9] N. Thomos, N. V. Boulgouris, and M. G. Strintzis, “Wireless Image Transmission Using Turbo Codes and Optimal Unequal Error Protection,” IEEE Trans. on Image Processing, vol. 14, no. 11, pp. 1890-1901, Nov. 2005.

[10] Y. Sun, I. Ahmad, D. Li, and Y.-Q. Zhang, “Region-Based Rate Control and Bit Allocation for Wireless Video Transmission,” IEEE Trans. on Multimedia, vol. 8, no. 1, pp. 1-10, Feb. 2006.

[11] N. Ramzan, S. Wan, and E. Izquierdo, “Joint Source-Channel Coding for Wavelet-Based Scalable Video Transmission Using an Adaptive Turbo Code,” EURASIP Journal on Image and Video Processing, vol. 2007, no. 1, pp. 16-16, Jan. 2007.

[12] L. Cao, “On the Unequal Error Protection for Progressive Image Transmission,” IEEE Trans. on Image Processing, vol. 16, no. 9, pp. 2384-2388, Sept. 2007.

[13] G. Baruffa and P. Micanti, “Error Protection and Interleaving for Wireless Transmission of JPEG2000 Images and Video,” IEEE Trans. on Image Processing, vol. 18, no. 2, pp. 346-356, Feb. 2009.

[14] A. Nosratinia, J. Lu, and B. Aazhang, “Source-Channel Rate Allocation for Progressive Transmission of Images,” IEEE Trans. on Communications, vol. 51, no. 2, pp. 186-196, Feb. 2003.

[15] A. E. Mohr, E. A. Riskin, and R. E. Ladner, “Unequal Loss Protection: Graceful Degradation of Image Quality Over Packet Erasure Channels Through Forward Error Correction,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 819-828, June 2000.

[16] U. Engelke and H.-J. Zepernick, “A Framework for Optimal Region-Of-Interest Based Quality Assessment in Wireless Imaging,” SPIE Journal of Electronic Imaging, vol. 19, no. 1, pp. 011005-011005-13, Jan. 2010.

[17] U. Engelke and H.-J. Zepernick, “Region of Interest Database,” http://www.bth.se/tek/rcg.nsf/pages/roi-db, 2010.

[18] F. Wang and H.-J. Zepernick, “Autocorrelation Analysis of Spatial Features for Mobile Video Services,” in Proc. Int. Conf. on Signal Processing and Communication Systems, Gold Coast, Australia, Dec. 2008, pp. 1-9.

[19] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

[20] H. R. Sheikh and A. C. Bovik, “Image Information and Visual Quality,” IEEE Trans. on Image Processing, vol. 15, no. 2, pp. 430-444, Feb. 2006.

[21] A. Said and W. A. Pearlman, “A New, Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996.

[22] M. I. Iqbal and H.-J. Zepernick, “Error Protection for Wireless Imaging: Providing a Trade-off Between Performance and Complexity,” in Proc. IEEE International Symposium on Communications and Information Technologies, Tokyo, Japan, Oct. 2010, to appear.

[23] International Organization for Standardization, “Information Technology – JPEG 2000 Image Coding System: Core Coding System,” ISO/IEC 15444-1:2004(E), Sept. 2004.

[24] B. A. Heng, J. G. Apostolopoulos, and J. S. Lim, “End-to-End Rate-Distortion Optimized MD Mode Selection for Multiple Description Video Coding,” EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 1-12, Jan. 2006.

[25] D. Taubman, “Kakadu Software,” [Online], available at http://www.kakadusoftware.com, Aug. 2009.

[26] Video Quality Experts Group, “VQEG Test Sequences,” [Online], available at ftp://ftp.crc.ca/crc/vqeg/TestSequences/ALL 525/src15 ref 525.yuv, [Accessed] Oct. 2009.
Preferential Coding for Mobile Multimedia Services

Muhammad Imran Iqbal

ABSTRACT

Different parts of source encoded multimedia streams, such as those associated with standard image or video formats, possess different levels of importance with respect to their contribution to the quality of the reconstructed image or video. This unequal importance among data within a codestream gives rise to preferential treatment of the more significant parts of the codestream compared to the less important parts. Similarly, visual information offered by certain regions of an image or video may attract a viewer’s attention more than other parts of the viewing area. As a consequence, preferential treatment of important data and information can play a vital role in mobile multimedia services in order to preserve satisfactory quality of service under the harsh conditions of a band-limited, error-prone wireless channel. In this thesis, we therefore investigate how preferential coding can be used to protect multimedia services more efficiently against transmission errors. For this purpose, an error sensitivity analysis of the specific application is utilized as a basis to design efficient unequal error protection (UEP) schemes. The performance of the proposed preferential coding schemes is evaluated using objective perceptual quality metrics in order to account for the fact that humans are the ultimate judges of service quality.

The thesis is divided into four parts. In the first part, region of interest (ROI) identification, coding, and the advantages of ROI coding for different applications are investigated. In addition, a framework is proposed for using ROI coding in wireless imaging. The second part analyses the error sensitivity of wireless JPEG2000 (JPWL) in terms of perceptual quality metrics. It is also shown that using reduced-reference perceptual quality metrics as error sensitivity descriptor (ESD) in JPWL increases the effectiveness of the ESD. Specifically, this type of metrics correlates well with subjective quality assessment and provides an additional estimate for the obtained quality. In the third part, two UEP schemes for JPWL are proposed and compared with equal error protection (EEP). Their performance is evaluated in terms of perceptual quality metrics of the reconstructed image, such as the structural similarity index and the visual information fidelity criterion, and their benefit over EEP is revealed. Finally, in the fourth part, a framework for optimized preferential coding of ROI based images and videos is proposed. Specifically, a dynamic programming algorithm for optimal parity distribution is provided.

Blekinge Institute of Technology Licentiate Dissertation Series No. 2010:07
School of Engineering
ISSN 1650-2140
ISBN 978-91-7295-182-2