MPEG Audio - Edusources

Syllabus Outline
Topics in the module include the following:
• Introduction: Multimedia applications and requirements (e.g.,
overview of multimedia systems, video-on-demand, interactive
television, video conferencing, hypermedia courseware,
groupware, World Wide Web, and digital libraries).
• Audio/Video fundamentals including analog and digital
representations, human perception, and
audio/video equipment and applications.
• Audio and video compression
– perceptual transform coders for images/video
(e.g., JPEG, MPEG, H.263, etc.),
– scalable coders (e.g., pyramid coders),
– perceptual audio encoders, and
– image and video processing applications and algorithms.
Recommended Course Books
Supplied Text:
• Managing Multimedia:
Project Management for Interactive Media
(2nd Edition)
Elaine England and Andy Finney,
Addison Wesley, 1998
(ISBN 0-201-36058-6)
Recommended Course Book
Fundamentals of Multimedia
Mark S. Drew, Li Ze-Nian
Prentice Hall, 2003
(ISBN: 0130618721)
Other Good General Texts
• Multimedia Communications:
Applications, Networks, Protocols and
Standards,
Fred Halsall,
Addison Wesley, 2000
(ISBN 0-201-39818-4)
OR
• Networked Multimedia Systems,
Raghavan and Tripathi,
Prentice Hall,
(ISBN 0-13-210642)
The following books are highly recommended reading:
• Hypermedia and the Web: An Engineering Approach, D. Lowe and W. Hall, J. Wiley
and Sons, 1999 (ISBN 0-471-98312-8).
• Multimedia Systems, J.F.K. Buford, ACM Press, 1994 (ISBN 0-201-53258-1).
• Understanding Networked Multimedia, Fluckiger, Prentice Hall, (ISBN 0-13-190992-4)
• Design for Multimedia Learning, Boyle, Prentice Hall, (ISBN 0-13-242215-8)
• Distributed Multimedia: Technologies, Applications, and Opportunities in the Digital
Information Industry (1st Edition), P.W. Agnew and A.S. Kellerman, Addison Wesley,
1996 (ISBN 0-201-76536-5)
• Multimedia Communication, Sloane, McGraw Hill, (ISBN 0-077092228)
• Virtual Reality Systems, J. Vince, Addison Wesley, 1995 (ISBN 0-201-87687-6)
• Encyclopedia of Graphics File Formats, Second Edition by James D. Murray and William
vanRyper, O’Reilly & Associates, 1996 (ISBN: 1-56592-161-5)
Multimedia Authoring — Useful for Assessed Coursework
• Macromedia Director MX Demystified, Phil Gross, Macromedia Press (ISBN: 0321180976)
• Macromedia Director MX and Lingo: Training from the Source, Phil Gross, Macromedia Press (ISBN: 0321180968)
• Director 8 and Lingo (Inside Macromedia), Scott Wilson, Delmar (ISBN: 0766820084)
• Director/Lingo Manuals — Application Help and in Library
• SMIL: Adding Multimedia to the Web, Tim Kennedy and Mary Slowinski, Sams.net (ISBN: 067232167X)
The following provide good reference material for parts of the
module:
Multimedia Systems
• Hyperwave: The Next Generation Web Solution, H. Maurer,
Addison Wesley, 1996 (ISBN 0-201-40346).
Digital Audio
• A Programmer's Guide to Sound, T. Kientzle, Addison Wesley,
1997 (ISBN 0-201-41972-6)
• Audio on the Web — The Official IUMA Guide, Patterson and
Melcher, Peachpit Press.
• The Art of Digital Audio, Watkinson,
Focal/Butterworth-Heinemann.
• Synthesiser Basics, GPI Publications.
• Signal Processing: Principles and Applications, Brook and
Wynne, Hodder and Stoughton.
• Digital Signal Processing, Oppenheim and Schafer, Prentice
Hall.
Digital Imaging/Graphics/Video
• Digital video processing, A.M. Tekalp, Prentice Hall PTR,
1995.
• Encyclopedia of Graphics File Formats, Second Edition by
James D. Murray and William vanRyper, 1996, O’Reilly &
Associates.
Data Compression
• The Data Compression Book, Mark Nelson, M&T Books, 1995.
• Introduction to Data Compression, Khalid Sayood, Morgan
Kaufmann, 1996.
• G.K. Wallace, The JPEG Still Picture Compression Standard
• CCITT, Recommendation H.261
• D. Le Gall, MPEG: A Video Compression Standard for Multimedia
Applications
• K. Patel, et al., Performance of a Software MPEG Video Decoder
• P. Cosman, et al., Using Vector Quantization for Image Processing
Introduction to Multimedia

What is Multimedia?
Multimedia can have many definitions; these include:
Multimedia means that computer information can be
represented through audio, video, and animation in addition
to traditional media (i.e., text, graphics/drawings, images).
General Definition
A good general definition is:
Multimedia is the field concerned with the computer-controlled
integration of text, graphics, drawings, still and moving images
(Video), animation, audio, and any other media where every
type of information can be represented, stored, transmitted and
processed digitally.
Multimedia Application Definition
A Multimedia Application is an Application which uses a
collection of multiple media sources e.g. text, graphics, images,
sound/audio, animation and/or video.
What is HyperText and HyperMedia?
Hypertext is text which contains links to other texts.
The term was invented by Ted Nelson around 1965.
Figure 1: Definition of Hypertext
Hypertext is therefore usually non-linear (as indicated below).
Figure 2: Illustration of Hypertext Links
Hypermedia
HyperMedia is not constrained to be text-based. It can include
other media, e.g., graphics, images, and especially the
continuous media – sound and video.
Figure 3: Definition of HyperMedia
Example Hypermedia Applications?
• The World Wide Web (WWW) is the best example of a
hypermedia application.
• Powerpoint
• Adobe Acrobat
• Macromedia Director
• Many Others?
Multimedia Systems
A Multimedia System is a system capable of processing
multimedia data and applications.
A Multimedia System is characterised by the processing,
storage, generation, manipulation and rendition of Multimedia
information.
Characteristics of a Multimedia System
A Multimedia system has four basic characteristics:
• Multimedia systems must be computer controlled.
• Multimedia systems are integrated.
• The information they handle must be represented digitally.
• The interface to the final presentation of media is usually
interactive.
Challenges for Multimedia Systems
• Distributed Networks
• Temporal relationship between data
– Render different data at same time — continuously.
– Sequencing within the media: playing frames in the correct order/time frame in video.
– Synchronisation — inter-media scheduling.
E.g. Video and Audio: lip synchronisation is clearly important for humans watching playback of video and audio, and even animation and audio.
Ever tried watching an out of (lip) sync film for a long time?
Key Issues for Multimedia Systems
The key issues multimedia systems need to deal with are:
• How to represent and store temporal information.
• How to strictly maintain the temporal relationships on playback/retrieval.
• What processes are involved in the above.
• Data has to be represented digitally — Analog–Digital Conversion, Sampling etc.
• Large Data Requirements — bandwidth, storage, compression
Desirable Features for a Multimedia System
Given the above challenges, the following features are desirable (if
not a prerequisite) for a Multimedia System:
Very High Processing Power — needed to deal with large data
processing and real time delivery of media. Special hardware
commonplace.
Multimedia Capable File System — needed to deliver real-time
media — e.g. Video/Audio Streaming.
Special Hardware/Software needed — e.g. RAID technology.
Data Representations — File Formats that support multimedia
should be easy to handle yet allow for
compression/decompression in real-time.
Efficient and High I/O — input and output to the file subsystem
needs to be efficient and fast. Needs to allow for real-time
recording as well as playback of data. e.g. Direct to Disk
recording systems.
Special Operating System — to allow access to file system and
process data efficiently and quickly. Needs to support direct
transfers to disk, real-time scheduling, fast interrupt processing,
I/O streaming etc.
Storage and Memory — large storage units (of the order of
50–100 GB or more) and large memory (50–100 MB or more).
Large caches are also required, frequently with a Level 2 and 3
hierarchy, for efficient management.
Network Support — Client-server systems are common, as
distributed multimedia systems are common.
Software Tools — user friendly tools needed to handle media,
design and develop applications, deliver media.
Components of a Multimedia System
Now let us consider the Components (Hardware and Software)
required for a multimedia system:
Capture devices — Video Camera, Video Recorder, Audio
Microphone, Keyboards, mice, graphics tablets, 3D input
devices, tactile sensors, VR devices. Digitising/Sampling
Hardware
Storage Devices — Hard disks, CD-ROMs, Jaz/Zip drives, DVD,
etc
Communication Networks — Ethernet, Token Ring, FDDI, ATM,
intranets, the Internet.
Computer Systems — Multimedia Desktop machines,
Workstations, MPEG/VIDEO/DSP Hardware
Display Devices — CD-quality speakers, HDTV, SVGA, Hi-Res
monitors, Colour printers etc.
Applications
Examples of Multimedia Applications include:
• World Wide Web
• Hypermedia courseware
• Video conferencing
• Video-on-demand
• Interactive TV
• Groupware
• Home shopping
• Games
• Virtual reality
• Digital video editing and production systems
• Multimedia Database systems
Trends in Multimedia
Current big applications areas in Multimedia include:
World Wide Web — Hypermedia systems — embrace nearly
all multimedia technologies and application areas.
MBone — Multicast Backbone: Equivalent of conventional TV
and Radio on the Internet.
Enabling Technologies — developing at a rapid rate to support
ever increasing need for Multimedia. Carrier, Switching,
Protocols, Applications, Coding/Compression, Database,
Processing, and System Integration Technologies at the
forefront of this.
Multimedia Data: Input and format
Text and Static Data
• Source: keyboard, floppies, disks and tapes.
• Stored and input character by character:
– Storage of text is 1 byte per character (text or format character).
– For other forms of data, e.g. spreadsheet files, some formats
may store it as text (with formatting), while others use binary
encoding.
• Format: Raw text or formatted text, e.g. HTML, Rich Text Format
(RTF), Word, or programming language source (C, Pascal, etc.).
• Not temporal — BUT may have natural implied sequence e.g.
HTML format sequence, Sequence of C program statements.
• Size: not significant w.r.t. other multimedia data types.
Graphics
• Format: constructed by the composition of primitive objects
such as lines, polygons, circles, curves and arcs.
• Input: Graphics are usually generated by a graphics editor
program (e.g. Freehand) or automatically by a program (e.g.
Postscript).
• Graphics are usually editable or revisable (unlike Images).
• Graphics input devices: keyboard (for text and cursor control),
mouse, trackball or graphics tablet.
• Graphics standards: OpenGL, PHIGS, GKS
• Graphics files usually store the primitive assembly
• and so do not take up a very high storage overhead.
Images
• Still pictures which (uncompressed) are represented as a
bitmap (a grid of pixels).
• Input: Generated by programs similar to graphics or animation
programs.
• Input: scanned for photographs or pictures using a digital
scanner or from a digital camera.
• Analog sources will require digitising.
• Stored at 1 bit per pixel (Black and White), 8 bits per pixel
(Grey Scale, Colour Map) or 24 bits per pixel (True Colour)
• Size: a 512x512 grey scale image takes up 1/4 MB; a 512x512
24-bit image takes 3/4 MB with no compression.
• This overhead soon increases with image size
• Compression is commonly applied.
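As a quick check, those figures are just pixels times bytes per pixel:

  512 x 512 pixels x 1 byte (grey scale)     = 262,144 bytes ≈ 0.25 MB
  512 x 512 pixels x 3 bytes (24-bit colour) = 786,432 bytes ≈ 0.75 MB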
Audio
• Audio signals are continuous analog signals.
• Input: microphones and then digitised and stored
• usually compressed.
• CD Quality Audio requires 16-bit sampling at 44.1 kHz
• 1 minute of mono CD quality audio requires about 5 MB.
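The arithmetic behind that figure is sample rate x bytes per sample x duration:

  44,100 samples/s x 2 bytes x 60 s = 5,292,000 bytes ≈ 5 MB (double this for stereo)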
Video
• Input: Analog Video is usually captured by a video camera
and then digitised.
• There are a variety of video (analog and digital) formats
• Raw video can be regarded as being a series of single images.
There are typically 25, 30 or 50 frames per second.
• 512x512 monochrome video takes 25 x 0.25 MB = 6.25 MB per
second — some 375 MB for a minute — to store uncompressed.
• Digital video clearly needs to be compressed.
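Spelling out the calculation (frame size x frame rate x duration), assuming 1 byte per pixel as above:

  0.25 MB/frame x 25 frames/s = 6.25 MB per second
  6.25 MB/s x 60 s            = 375 MB per minute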
Output Devices
The output devices for a basic multimedia system include
• A High Resolution Colour Monitor
• CD Quality Audio Output
• Colour Printer
• Video Output to save Multimedia presentations to (Analog)
Video Tape, CD-ROM, DVD.
• Audio Recorder (DAT, DVD, CD-ROM, (Analog) Cassette)
• Storage Medium (Hard Disk, Removable Drives, CD-ROM)
Multimedia Authoring:
Systems and Applications
What is an Authoring System?
An Authoring System is a program which has pre-programmed
elements for the development of interactive multimedia software
titles.
Authoring systems vary widely in:
• orientation,
• capabilities, and
• learning curve.
Why should you use an authoring system?
• can speed up programming, and possibly content development
and delivery — development time can be cut to about 1/8th
• However, the content creation (graphics, text, video, audio,
animation, etc.) is not affected by choice of authoring system;
• time gains – accelerated prototyping
Authoring Vs Programming
• Big distinction between Programming and Authoring.
• Authoring —
– assembly of Multimedia
– possibly high level graphical interface design
– some high level scripting.
• Programming —
– involves low level assembly of Multimedia
– construction and control of Multimedia
– involves real languages like C and Java.
Multimedia Authoring Paradigms
The authoring paradigm, or authoring metaphor, is the
methodology by which the authoring system accomplishes its
task.
There are various paradigms, including:
Scripting Language
Iconic/Flow Control
Frame
Card/Scripting
Cast/Score/Scripting — Macromedia Director
Hypermedia Linkage
Tagging — SMIL
Scripting Language
• closest in form to traditional programming. The paradigm
is that of a programming language, which specifies (by
filename)
– multimedia elements,
– sequencing,
– hotspots,
– synchronization, etc.
• Usually a powerful, object-oriented scripting language
• in-program editing of elements (still graphics, video, audio,
etc.) tends to be minimal or non-existent.
• media handling can vary widely
Examples
• Apple's HyperTalk for HyperCard,
• Asymetrix's OpenScript for ToolBook and
• the Lingo scripting language of Macromedia Director
Here is an example lingo script to jump to a frame
global gNavSprite

on exitFrame
  go the frame  -- keep playing the current frame
  play sprite gNavSprite
end
Iconic/Flow Control
• tends to be the speediest in development time
• best suited for rapid prototyping and short-development
time projects.
• The core of the paradigm is the Icon Palette, which contains:
– possible functions/interactions of a program, and
– the Flow Line — shows the actual links between the
icons.
• slowest runtime programs, high interaction overheads
Examples:
• Authorware
• IconAuthor
Frame
• similar to the Iconic/Flow Control paradigm
• usually incorporates an icon palette
• the links drawn between icons are conceptual
• do not always represent the actual flow of the program.
Examples
• Quest (whose scripting language is C)
• Apple Media Kit.
Figure 4: Macromedia Authorware Iconic/Flow Control Examples
Card/Scripting
• paradigm provides a great deal of power
(via the incorporated scripting language)
• suffers from the index-card structure.
• Well suited for Hypertext applications, and especially
suited for navigation-intensive (e.g. Cyan's "MYST" game)
applications.
• extensible via XCMDs and DLLs;
• allows all objects (including individual graphic elements) to be
scripted;
• many entertainment applications are prototyped in a
card/scripting system prior to compiled-language coding.
Cast/Score/Scripting
• uses a music score as its primary authoring metaphor
• synchronous elements are shown in various horizontal
tracks
• simultaneity shown via the vertical columns.
• power of this metaphor lies in the ability to script the
behavior of each of the cast members.
• easily extensible to handle other functions (such as
hypertext) via XOBJs, XCMDs, and DLLs.
• best suited for animation-intensive or synchronized media
applications;
Examples
• Macromedia Director
• Macromedia Flash — a cut-down Director interface
Hierarchical Object
• paradigm uses an object metaphor (like OOP)
• visually represented by embedded objects and iconic
properties.
• learning curve is non-trivial,
• visual representation of objects can make very
complicated constructions possible.
Figure 5: Macromedia Director Score Window
Figure 6: Macromedia Director Cast Window
Figure 7: Macromedia Director Script Window
Hypermedia Linkage
• similar to the Frame paradigm
• shows conceptual links between elements
• lacks the Frame paradigm’s visual linkage metaphor.
Tagging
Uses tags in text files to:
• link pages,
• provide interactivity and
• integrate multimedia elements.
Examples:
• SGML/HTML,
• SMIL (Synchronised Media Integration Language),
• VRML,
• 3DML and
• WinHelp
Issues in Multimedia Applications Design
There are various issues in Multimedia authoring.
Issues involved:
• Content Design
• Technical Design
Content Design
Content design deals with:
• What to say, what vehicle to use.
"In multimedia, there are five ways to format and deliver your
message.
You can
• write it,
• illustrate it,
• wiggle it,
• hear it, and
• interact with it.”
Scripting (writing)
Rules for good writing:
1. Understand your audience and correctly address them.
2. Keep your writing as simple as possible. (e.g., write out the
full message(s) first, then shorten it.)
3. Make sure technologies used complement each other.
Graphics (illustrating)
• Make use of pictures to effectively deliver your messages.
• Create your own (draw, (color) scanner, PhotoCD, ...), or
keep "copy files" of art works. — "Cavemen did it first."
Graphics Styles
• fonts
• colors
– pastels
– earth-colors
– metallic
– primary color
– neon color
Animation (wiggling)
Types of Animation
• Character Animation – humanise an object
• Highlights and Sparkles
• Moving Text
• Video – live video or digitized video
When to Animate
• Enhance emotional impact
• Make a point (instructional)
• Improve information delivery
• Indicate passage of time
• Provide a transition to next subsection
Audio (hearing)
Types of Audio in Multimedia Applications:
1. Music – set the mood of the presentation, enhance the emotion,
illustrate points
2. Sound effects – to make specific points, e.g., squeaky doors,
explosions, wind, ...
3. Narration – most direct message, often effective
Interactivity (interacting)
• interactive multimedia systems!
• people remember 70% of what they interact with (according
to a late-1980s study)
Types of Interactive Multimedia Applications:
1. Menu driven programs/presentations
– often a hierarchical structure (main menu, sub-menus, ...)
2. Hypermedia
+: less structured, cross-links between subsections of the
same subject -> non-linear, quick access to information
+: easier for introducing more multimedia features, e.g., more
interesting "buttons"
-: could sometimes get lost navigating the hypermedia
3. Simulations / Performance-dependent Simulations
– e.g., Games – SimCity, Flight Simulators
Technical Design
Technological factors may limit the ambition of your multimedia
presentation.
Studied in detail later.
Storyboarding
The concept of storyboarding has been used by animators and
the like for many years.
Storyboarding
• used to help plan the general organisation of a presentation
• used to help plan the content of a presentation by recording
and organizing ideas on index cards, placed on a board/wall.
• The storyboard evolves as the media are collected and
organised: new ideas and refinements to the presentation are
made.
Storyboard Examples
• DVD Example
• Storyboarding Explained
• Acting With a Pencil
• The Storyboard Artist
• Star Wars Quicktime Storyboard
Overview of Multimedia Software Tools
Digital Audio
Macromedia SoundEdit — edits a variety of different format
audio files and applies a variety of effects (Fig 8)
Figure 8: Macromedia SoundEdit Main and Control Windows and
Effects Menu
CoolEdit/Adobe Audition — edits a variety of different format
audio files
Many public domain audio editing tools also exist.
Music Sequencing and Notation
Cakewalk
• Supports General MIDI
• Provides several editing views (staff, piano roll, event list)
and Virtual Piano
• Can insert WAV files and Windows MCI commands (animation
and video) into tracks
Cubase
• More capable software than Cakewalk Express
• Intuitive Interface to arrange and play Music (Figs 9 and 10)
• Wide variety of editing tools including Audio (Figs 11 and 12)
• Score Editing
Figure 9: Cubase Arrange Window (Main)
Figure 10: Cubase Transport Bar Window — Emulates a Tape
Recorder Interface
Figure 11: Cubase Audio Window
Figure 12: Cubase Audio Editing Window with Editing Functions
Logic Audio
• Cubase Competitor, similar functionality
Mark of the Unicorn Performer
• Cubase/Logic Audio Competitor, similar functionality
Figure 13: Cubase Score Editing Window
Image/Graphics Editing
Adobe Photoshop
• Allows layers of images, graphics and text
• Includes many graphics drawing and painting tools
• Sophisticated lighting effects filters
• A good graphics, image processing and manipulation tool
Adobe Premiere
• Provides large number (up to 99) of video and audio tracks,
superimpositions and virtual clips
• Supports various transitions, filters and motions for clips
• A reasonable desktop video editing tool
Macromedia Freehand
• Graphics drawing editing package
Many other editors exist, both public domain and commercial.
Image/Video Editing
Many commercial packages available
• Adobe Premiere
• Videoshop
• Avid Cinema
• SGI MovieMaker
Animation
Many packages available including:
• Avid SoftImage
• Animated Gif building packages e.g. GifBuilder
Multimedia Authoring
– Tools for making a complete multimedia presentation where
users usually have a lot of interactive controls.
Macromedia Director
• Movie metaphor (the cast includes bitmapped sprites, scripts,
music, sounds, and palettes, etc.)
• Can accept almost any bitmapped file formats
• Lingo script language with own debugger allows more control
including external devices, e.g., VCRs and video disk players
• Ready for building more interactivities (buttons, etc.)
• follows the cast/score/scripting paradigm,
• tool of choice for animation content (though Flash is preferred for the Web).
Authorware
• Professional multimedia authoring tool
• Supports interactive applications with hyperlinks,
drag-and-drop controls, and integrated animation
• Compatibility between files produced from PC version and
MAC version
Other Authoring Tools mentioned in notes later
Multimedia Authoring:
Scripting (Lingo)
Cast/Score/Scripting paradigm.
This section is a very brief introduction to Director.
For further Information, You should consult:
• Macromedia Director Using Director Manual — In Library
• Macromedia Director : Lingo Dictionary Manual — In Library
• Macromedia Director: Application Help — Select Help from within the Director
application. This is a very thorough resource of information.
• Macromedia Director Guided tours — see Help menu option.
• A variety of web sites contain director tutorials, hints and information including
http://www.macromedia.com
More Director References
• Macromedia Director MX Demystified, Phil Gross, Macromedia Press (ISBN: 0321180976)
• Macromedia Director MX and Lingo: Training from the Source, Phil Gross, Macromedia Press (ISBN: 0321180968)
• Director 8 and Lingo (Inside Macromedia), Scott Wilson, Delmar (ISBN: 0766820084)
Related Additional Material and Coursework
Tutorials with additional Director Instructional Material
See Lab Worksheets 1 + 2
Also Assessed Exercise 2
Director Overview/Definitions
movies — Basic Director Commodity:
interactive multimedia pieces that can include
• animation,
• sound,
• text,
• digital video,
• and many other types of media.
• links to external media
A movie can be as small and simple as an animated logo or
as complex as an online chat room or game.
Frames — Director divides lengths of time into a series of frames,
cf. celluloid movie.
Creating and editing movies
4 Key Windows:
the Stage — Rectangular area where the movie plays
the Score : Where the movie is assembled;
one or more Cast windows — Where the movie’s media elements
are assembled;
and
the Control Panel — Controls how the movie plays back.
To create a new movie:
• Choose File > New > Movie
Some other key Director Components (1)
Channels – the rows in the Score that contain sprites for
controlling media
• numbered
• contain the sprites that control all the visible media
• Special effects channels at the top contain behaviors as
well as controls for the tempo, palettes, transitions, and
sounds.
Sprites —
Sprites are objects that control when, where, and how media
appears in a movie.
Some other key Director Components (2)
Cast members —
• The media assigned to sprites.
• media that make up a movie.
• includes bitmap images, text, vector shapes, sounds, Flash
movies, digital videos, and more.
Lingo — Director’s scripting language, adds interactivity to a
movie.
Behaviors — pre-existing sets of Lingo instructions.
Markers — identify fixed locations at a particular frame in a
movie.
Lingo Scripting (1)
Commands — terms that instruct a movie to do something while the
movie is playing. For example, go to sends the playback head to
a specific frame, marker, or another movie.
Properties — attributes that define an object. For example
colorDepth is a property of a bitmap cast member,
Functions — terms that return a value. For example, the date function
returns the current date set in the computer. The key function
returns the key that was pressed last. Parentheses occur at the
end of a function,
Keywords — reserved words that have a special meaning.
For example, end indicates the end of a handler,
Lingo Scripting (2)
Events — actions that scripts respond to.
Constants — elements that don’t change. For example, the constants
TAB, EMPTY, and RETURN always have the same meaning, and
Operators — terms that calculate a new value from one or more
values. For example, the add operator (+) adds two or more
values together to produce a new value.
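To tie these categories together, here is a small illustrative handler (a sketch only — the marker name and cast member number are made up, and member 1 is assumed to be a bitmap):

on exitFrame
  go to "scene2"                   -- command: send the playback head to a marker
  put date()                       -- function: returns the current date
  put the colorDepth of member 1   -- property: of a bitmap cast member
  put "Sum: " & (2 + 3) & RETURN   -- operators (+, &) and the RETURN constant
end                                -- keyword: end closes the handler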
Lingo Data Types
Lingo supports a variety of data types:
• references to sprites and cast members,
• (Boolean) values: TRUE and FALSE,
• strings,
• constants,
• integers, and
• floating-point numbers.
Standard program structure syntax applies.
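For illustration, literals of some of these types (the variable names are made up):

  set myString = "hello"  -- string
  set myCount  = 42       -- integer
  set myRatio  = 3.14     -- floating-point number
  set myFlag   = TRUE     -- Boolean value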
Lingo Script Types (1)
Director uses four types of scripts.
Behaviors — Behaviors are attached to sprites or frames in the
Score.
Figure 14: Behavior Icon
Movie scripts — available to the entire movie
Figure 15: Movie script icon
Lingo Script Types (2)
Parent scripts — special scripts that contain Lingo used to create
child objects.
Figure 16: Parent script icon
Scripts attached to cast members — independent of the Score;
they don't appear in the Cast window.
Figure 17: Script button
Director Example 1: Simple Animation
A Bouncing Ball Graphic
Run Example in Browser (Shockwave)
Run Example in Browser (Lecture ONLY)
• No Lingo scripting.
• basic animation where a cast member is animated along a path.
Creating the Bouncing Ball Graphic
The following steps achieve a simple bouncing ball animation
along a path:
1. Let us begin by creating a new movie and setting the Stage size:
• Start a new movie: File > New > Movie (Shortcut = Command+N)
• Choose Modify > Movie > Properties. In stage size, choose 640 x 480.
2. Now let us create a ball, using the vector shape tool:
• Choose Window > Vector Shape
(Shortcut = Command+Shift+V)
• Click the filled ellipse button.
• Draw an ellipse (circle) about the size of
the Vector Shape Window
• Click on the Gradient fill button.
• To change the colours, click the colour
box on the left side of the Gradient
colour control
• Change the colour on the right side of
the Gradient Colours to a dark blue.
• Change the Gradient type pull-down
menu from Linear to Radial.
• Change the Stroke Colour to white.
3. Now let us change a few other properties of this ellipse
• Close the Vector Shape window.
• In the Cast Window, select the ellipse.
• Choose Edit > Duplicate (Shortcut = Command+D).
• Double click the new cast member, which opens it in the Vector
Shape Tool.
• Change the Cycles to 3 and the Spread to 200.
• Name the latest ellipse 'bouncing ball'
4. Now we are going to animate the ball.
• Drag ’bouncing ball’ from the cast
member window to the stage.
• You will notice the sprite (the
object that appears in the score)
is extended over 20 frames.
• Drag the right end of the sprite to
frame 40.
• Click anywhere in the middle of
the sprite to select it.
• resize the ellipse.
4. Ball Animation (Key Frames)
• Click on frame 40 in channel 1
(the end of the sprite), hold down
Option and shift and drag the
ellipse to the right end of the
stage.
• To curve the path, we are going to
insert keyframes within the sprite.
• Click on frame 10 of the sprite and
choose Insert > Keyframe
(Shortcut =
Command+Option+K)
• Create keyframes at frame 20
and 30.
• At each keyframe, a circle appears on the path shown on the stage.
• Click on the keyframe 10 circle
and drag it up.
• Change other Keyframes.
• Rewind and play the movie.
Further Animation: 1.1 Shrinking the ball
Run Example Shrinking the ball (Shockwave)
Run Shrinking the ball (Lecture Only)
• (Optional) Click on keyframe 40 in
the score and drag it to frame 60,
notice how all the keyframes spread out
proportionally.
• (Optional) Click on the keyframes in the
score and adjust the path if you feel like
it.
• While moving the keyframes, resize the
balls so they slowly get smaller. Notice
while you resize the balls, the path
changes and you will need to edit the
path again.
• Rewind and play the movie.
• Save your movie as example2.dir.
1.2. Animating sprite colour
Run Example: Animating sprite colour (Shockwave)
Run Example: Animating sprite colour (Lecture Only)
• Working still with example1.dir.
• Open Property Inspector for Sprite
– Right Mouse (or Ctrl) on Sprite
(Score or Stage)
– Select Properties...
• Click on the keyframes in the score,
and change the Foreground colour chip,
Forecolor, to different colours.
• Changing the foreground colour is like
putting a coloured film over your object.
The resulting colour is a mixture of the
object’s original colour and the ’film’. For
this reason, light colours work better
than dark colours for this effect.
• Rewind and play the movie.
• Save as example3.dir
1.3. Animating sprite transparency — Making the Ball
Disappear
Run Example: Making the Ball Disappear (Shockwave)
Run Example: Making the Ball Disappear (Lecture Only)
• Open example1.dir
• Open Property Inspector for Sprite
• Click on the keyframes in the score, and
• Change the Blend Transparency to 100, 75, 50, 25, 0 for the
consecutive keyframes.
• Rewind and play the movie.
• Save as example4.dir
1.4. Animating sprite shape — Deforming The Ball
Run Example: Deforming The Ball (Shockwave)
Run Example: Deforming The Ball (Lecture Only)
• Open example1.dir
• Open Property Inspector for Sprite
• Click on the keyframes in the score, and
• Change the Skew Angle to 0, 20, 40, 60 and 80 for the
consecutive keyframes.
• Rewind and play the movie
• Save as example5.dir
Director Example 2: Importing media
To import multimedia data there
are two basic ways:
• Choose File > Import ...
Useful for importing batches
of data (e.g. several image
sequences).
• Drag and drop source media
into a cast member location — quite intuitive.
Examples: Simple Image import and Manipulation
• Drag an image into a spare cast member.
• Drag this cast member to the Score
• Set suitable Properties for Sprite
– Manipulate as for a vector item above.
• Examples:
– ex dave roll.dir sets up some keyframes and alters the
rotation of the image (Shockwave)
– ex dave roll.dir sets up some keyframes and alters the
rotation of the image (Lecture Only)
– ex dave sq.dir alters the skew angle (Shockwave)
– ex dave sq.dir alters the skew angle (Lecture Only)
Example: Falling Over Movie, ex dave movie.dir
Example: Falling Over Movie, ex dave movie.dir (Shockwave)
Run Example: Falling Over Movie, ex dave movie.dir (Lecture
Only)
• Several Gif images depicting sequence
exist on disk.
• Choose File > Import
• Select items you wish to import by
double-clicking or pressing the Add
button
• Click on the Import Button
• Several new cast members should be
added
• Set looping on and play
Example: Pinching Movie Movie, ex dave pinch.dir
Example: Pinching Movie Movie, ex dave pinch.dir (Shockwave)
Example: Pinching Movie Movie, ex dave pinch.dir (Lecture Only)
• Photoshop has been used to set a pinch
effect of varying degree for an image.
• Import images as before
• To reverse the image set to obtain a smooth
back and forth animation:
– Select the sprite sequence in the score
– Copy sequence — press Command+C (Copy)
– Click on the frame just after the sprite sequence
– Paste sequence — press Command+V (Paste)
– Click on this second sprite sequence and choose Modify > Reverse Sequence.
– Select the 2 sprites by pressing Shift and clicking on both. Choose Modify > Join Sprites.
Simple Lingo Scripting
Director Example 3: Very Simple Action
Here we illustrate the basic mechanism of scripting in Director
by developing and extending a very basic example:
Making a button beep and attaching a message to a button
Making the button beep (Shockwave)
Making the button beep (Lecture Only)
Making the Button Beep Movie
• Open a new movie.
• Turn the looping on in the
control panel.
• Open the tool palette.
• Click the push button icon.
• Draw a button on the stage,
and type in a label ("button" here).
Our First Lingo
Now let's write a simple script for the button:
• Press Ctrl+click the button in the cast
window and choose Cast Member
Script.
• Director writes the first and last line for
us; add a beep command so the script
looks like this:

on mouseUp
  beep
end
• Close the window.
• Rewind and play the movie.
• Click the button a few times.
To pop up a message box on button press (and still beep)
• Reopen the cast member
script.
• Change the text so it now
reads.
on mouseUp
  beep
  alert "Button Pressed"
end
• Close the window.
• Play the movie and click the
button.
Director Example 4: Controlling Navigation with Lingo
A slightly more complex Lingo example.
This example illustrates how we may use Lingo Scripts as:
• Cast Member Scripts
• Sprite Scripts
• Behaviour Scripts
Director Example 4: Ready Made Example
To save time, we begin with a preassembled
Director movie:
Run Lingo Navigation Example (Shockwave)
Run Lingo Navigation Example
(Lecture Only)
• Open lingo ex.3.2.dir
• Play the movie —
press some of the buttons
– The numbered buttons record moves through Scenes/Frames
– The Next/Back buttons replay what has been recorded.
The Loop the Frame script
We are first going to create a loop the frame script:
• Cast Member 11 controls the Frame
Looping
• Note we have created a special frame marking channel in the Score.
• To create the associated script either
– Double click on the script icon in the
Score
– Ctrl-click on the Cast member and
select Member Script
• The scripting window appears. You can
edit the script text, it now reads:
on exitFrame
  go the frame
end
This frame script tells Director to keep
playing the same frame.
• The Loop lasts to frame 24
• Pressing down Alt and dragging the frame script in the Score can change this length.
Scene Markers (1)
Now we will create some markers
• To create a marker, click in the marking channel for the frame
and label the marker with some typed text.
In this example:
• Markers are at frames 1, 10 and 20, naming them scene1,
scene2 and scene3 respectively.
• You can delete a marker by clicking the triangle and dragging
it below the marker channel.
• Note: a cast member (9) script for the next button has also
been created:

on mouseUp
  go to next
end

• The go to next command tells Director to go to the next
consecutive marker in the score.
Scene Markers (2)
• A cast member (10) script for the back button has also been
created:
on mouseUp
  go to previous
end
The go to previous command tells Director to go to the
previous marker in the score.
• Once again, Play the movie, click on these buttons to see
how they work.
Sprite Scripts
Now We will create some sprite scripts:
• Sometimes a button will
– behave one way in one part of the movie and
– behave another way in a different part of the movie.
– This is a typical example use of sprite scripts.
The Next Button Sprite Scripts (1)
Desired Action of Next Button: Jump to next scene
The Next Button Sprite Scripts (2)
• Here we have split actions to map to our
Scene Markers. To achieve this:
– Click on frame 10 of channel 6 (the next button sprite) and
choose Modify > Split Sprite.
– Do the same at frame 20.
• To attach a script to each split action:
– Select each sprite sequence (here in channel 6).
– Ctrl-click on the sequence and select Script... from the
pull-down in the score to give a script window.
– We add a suitable jump to the next scene.
– In the example shown we have go to "scene2": this command
tells Director to send the movie to the marker "scene2".
– We could do the other sequences similarly — but alternatives exist.
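For reference, the complete sprite script is just this command wrapped in a mouseUp handler — a minimal sketch of what the script window contains:

on mouseUp
  go to "scene2"  -- send the movie to the marker named scene2
end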
Behaviour Scripts
Example here: another way to write the sprite script on the last
slide — using the Behaviour Inspector.
Behaviour Scripts can do A LOT MORE.
A Behaviour Script for Next Button (Scene 2) (1)
• We now work with the second sprite
sequence (channel 6 in the score).
• We will create (or have created) an
associated behaviour script:
– Ctrl-click on the second sequence
– Open the Behaviour Inspector
window
– Click on the Script Window icon next
to the Behaviour Inspector Tab.
• To Create/name a new Behaviour:
– Click the + icon at the top left of the
window and select new behaviour
from the pull-down.
– Give the behaviour a name, here it
is called next2.
A Behaviour Script for Next Button (Scene 2) (2)
• To add events/actions to the script you
can:
– Under Events click the + icon. In this example we have
added a mouseUp from the menu.
– Under Actions click the + icon. In this example we have
chosen Navigation > Go to marker, then find scene3 on the list.
– You can add/edit Lingo text
manually in the Script Editor
window for the particular behaviour
Summary: Sprite Script Order
We now have 2 scripts attached to a single object (achieving much
the same task):
• a Cast Member script
• a Sprite script.
• Sprite scripts take priority over cast member scripts
• So here the cast member script will be ignored.
Some more Lingo to add to our example
Another part of our Application:
The jump buttons 1-3 (Buttons 4-6 currently inactive).
We will be using Lingo play / play done to record actions.
We have created a Vector Graphic Image (Cast Member 2)
for the main Recorder Interface
A problem in Director?
In Director: a script can only be associated with a complete
Object
For the way we have created the Recorder Interface we require
(and this is clearly a common requirement in many other cases):
• Only part of an image to be linked, instead of the whole object.
• One part for each of the jump buttons 1-3.
There is a solution:
• Use invisible buttons.
• These are shape cast members with an invisible border.
Creating our Invisible Buttons
• We have added our invisible button as Cast
Member 14. To create this component:
– Open the Tool palette window.
– Click on the no line button.
– Click on the rectangle button and
draw a rectangle on the stage
around the 1 button.
• We have added this sprite to channel 8 and we
have attached a sprite script:
– Ctrl-click on frame 1 of channel 8
and select script.
– Attach a sprite script to this shape
with the command play "scene1".
– Extend the sprite sequence so it
covers frame 1 to 24.
• Repeat the steps placing the sprite over
the 2 and 3 button
Final Part: the Back Button (1)
Director provides the ability to record actions for future use
The Lingo Play command
The play command is similar to the go to command, but:
• Director records every time a play is initiated,
• keeping track of the user's path through the movie.
• You can move back along this path by using the play done command.
Final Part: the Back Button (2)
So in this Example
• Select the sprite sequence in channel 5 and Cast member
10.
• Attach a Sprite script reading
on mouseUp
  play done
end
• Rewind, play the movie, click all the 1, 2, 3 buttons in various
orders, click the back button also and observe the effect of
Back button
• Complete example: lingo ex3.2.dir (Web Based)
• Complete example: lingo ex3.2.dir (Local Version)
Multimedia Authoring: Tagging (SMIL)
• Last lecture — Lingo scripting
• This lecture — Tagging
• SMIL an extension of XML for synchronised media integration.
What is SMIL?
• SMIL is to synchronized multimedia what HTML is to hyperlinked text.
• Pronounced smile
SMIL is:
• A simple,
• Vendor-neutral
• Markup language
Designed, for all skill levels of WWW authors, to:
• Schedule audio, video, text, and graphics files across a
timeline
• No need to master development tools or complex
programming languages.
• HTML-like — you need only a text editor
• Links to media — media not embedded in SMIL file
Drawbacks of SMIL?
Good Points:
• A powerful tool for creating synchronized multimedia
presentations on the web
• Deals with low bandwidth connections.
Bad Points:
• Meant to work with linear presentations — several types of
media can be synchronized to one timeline.
• Does not work well with non-linear presentations
• Ability to skip around in the timeline is buggy.
For slideshow-style mixed media presentations it is the best the
web has to offer.
SMIL support
• The W3C recommended SMIL in June 1998
• Quicktime 4.0 supports SMIL (1999)
• Not universally supported across the Web.
• No Web browser directly supports SMIL
• RealPlayer G2 supports SMIL
• Many other SMIL-compliant players, authoring tools, and
servers available.
Running SMIL Applications
For this course there are basically four ways to run SMIL
applications (two use the same Java applet), so there are basically
three SMIL-supported mediums:
Quicktime — supported since Quicktime Version 4.0.
RealPlayer G2 — integrated SMIL support
Web Browser — use the SOJA SMIL applet viewer with html
wrapper
Applet Viewer — use the SOJA SMIL applet viewer with html
wrapper
Quicktime media support is richer (see later sections on
Quicktime).
You will need to use both, as RealPlayer and SOJA support
different media:
Media              Tag         RealPlayer  GRiNS  Soja
GIF                img         OK          OK     OK
JPEG               img         OK          OK     OK
Wav                audio       OK          OK
.au Audio          audio       OK          OK     OK
.auz Audio Zipped  audio                          OK
MP3                audio       OK
Plain text         text        OK          OK     OK
Real text          textstream  OK
Real movie         video       OK
AVI                video       OK          OK
MPEG               video       OK          OK
MOV                video       OK          -
Using Quicktime
• Load the SMIL file into a Quicktime plug-in (configure Browser
helper app or mime type) or
• the Quicktime movie player.
Using RealPlayer G2
The RealPlayer G2 is installed on the applications HD in the
RealPlayer folder.
RealPlayer supports lots of file formats and can use plugins.
The main supported formats are:
• Real formats: RealText, RealAudio, etc...
• Images: GIF, JPEG
• Audio: AU, WAV, MIDI, etc...
To run SMIL files
Real Player uses streaming to render presentations.
• works better when calling a SMIL file given by a Real Server,
• rather than from an HTTP one.
To run SMIL files locally:
• drag a SMIL file onto the RealPlayer G2 Application
• Open a local SMIL file inside RealPlayer G2 Application
Using the SOJA applet
SOJA stands for SMIL Output in Java Applet.
SOJA is an applet that renders SMIL in a web page or in a
separate window. It supports the following formats:
• Images: GIF, JPEG
• Audio: AU and AUZ (AU zipped) — SUN Audio files
• Text: plain text
Running SOJA
To run SMIL through an applet you have to
• call the applet from an HTML file:
<APPLET CODE="org.helio.soja.SojaApplet.class"
ARCHIVE="soja.jar" CODEBASE="../"
WIDTH="600" HEIGHT="300">
<PARAM NAME="source" VALUE="cardiff_eg.smil">
<PARAM NAME="bgcolor" VALUE="#000066">
</APPLET>
• the SOJA (soja.jar) archive is located in the SMIL folder on
the Macintoshes.
• You may need to alter the CODEBASE attribute for your own
applications
• The PARAM NAME="source" VALUE="MY SMILFILE.smil" line is
how the SMIL file to play is specified.
RUNNING APPLETS
This should be easy to do
• Run the html file through a java enabled browser
• Use Apple Applet Runner
– uses MAC OS Runtime Java (Java 1.2)
– less fat for SMIL applications (we don't really need a Web
connection for our examples)
– Efficient JAVA and MAC OS run.
– Located in Apple Extras:Mac OS Runtime For Java folder
– TO RUN: Drag files on to application, OR
– TO RUN: Open file from within application
Let us begin to SMIL — SMIL Authoring
SMIL Syntax Overview
• SMIL files are usually named with .smi or .smil extensions
• XML based syntax
Basic Layout
The basic layout of a SMIL document is as follows:
<smil>
<head>
<meta name="copyright"
content="Your Name" />
<layout>
<!-- layout tags -->
</layout>
</head>
<body>
<!-- media and synchronization tags -->
</body>
</smil>
A source begins with <smil> and ends with </smil>.
Note that SMIL is case sensitive
<smil>
....
</smil>
SMIL documents have two parts: head and body. Each of
them must have <smil> as a parent.
<smil>
<head>
....
</head>
<body>
....
</body>
</smil>
Some tags, such as meta, can have a slash at their end:
....
<head>
<meta name="copyright"
content="Your Name" />
</head>
....
This is because SMIL is XML-based.
Some tags are written:
• <tag> ... </tag>
• <tag />
SMIL Layout
Everything concerning layout (including window settings) is stored
between the <layout> and the </layout> tags in the header, as
shown in the above subsection.
A variety of layout tags define the presentation layout:
<smil>
<head>
<layout>
<!-- layout tags -->
</layout>
......
Window settings
You can set width and height for the window in which your
presentation will be rendered with <root-layout>.
The following source will create a window of 300x200
pixels and also sets the background to be white.
<layout>
<root-layout width="300" height="200"
background-color="white" />
</layout>
Positioning Media
It is really easy to position media with SMIL.
You can position media in 2 ways:
Absolute Positioning — Media are located with offsets from
the origin — the upper left corner of the window.
Relative Positioning — Media are located relative to the window's
dimensions.
We define position with a <region> tag.
The Region Tag
To insert a media item within our presentation we use the <region>
tag:
• we must specify the region (the place) where it will be displayed.
• we must also assign an id that identifies the region.
Let’s say we want to
• insert the Cardiff icon (533x250 pixels)
• at 30 pixels from the left border and
• at 25 pixels from the top border.
The header becomes:
<smil>
<head>
<layout>
<root-layout width="600" height="300"
background-color="white" />
<region id="cardiff_icon"
left="30" top="25"
width="533" height="250" />
</layout>
</head>
......
The img tag
To insert the Cardiff icon in the region called "cardiff_icon", we
use the <img> tag as shown in the source below.
Note that the region attribute is a pointer to the <region>
tag.
<head>
<layout>
<root-layout width="600" height="300"
background-color="white" />
<region id="cardiff_icon"
left="30" top="25"
width="533" height="250" />
</layout>
</head>
<body>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon" />
</body>
This produces the following output:
Figure 18: Simple Cardiff Image Placement in SMIL
Relative Position Example
If you wish to display the Cardiff icon at
• 10% from the left border and
• at 5% from the top border, modify the previous source and
replace the left and top attributes.
<head>
<layout>
<root-layout width="600" height="600"
background-color="white" />
<region id="cardiff_icon"
left="10%" top="5%"
width="533" height="250" />
</layout>
</head>
<body>
<img src="cardiff.gif"
region="cardiff_icon" />
</body>
Overlaying Regions
We have just seen how to position a media along x and y axes
(left and top).
What if two regions overlap?
• Which one should be displayed on top?
The following code points out the problem:
<smil>
<head>
<layout>
<root-layout width="300" height="200"
background-color="white" />
<region id="region_1" left="50" top="50"
width="150" height="125" />
<region id="region_2" left="25" top="25"
width="100" height="100" />
</layout>
</head>
<body>
<par>
<text src="text1.txt" region="region_1" />
<text src="text2.txt" region="region_2" />
</par>
</body>
</smil>
To ensure that one region is over the other, add a z-index
attribute to <region>.
When two regions overlap:
• the one with the greater z-index is on top.
• If both regions have the same z-index, the first rendered one
is below the other.
In the following code, we add z-index to region_1 and
region_2:
<smil>
<head>
<layout>
<root-layout width="300" height="200"
background-color="white" />
<region id="region_1" left="50"
top="50" width="150"
height="125" z-index="2"/>
<region id="region_2" left="25"
top="25" width="100"
height="100" z-index="1"/>
</layout>
</head>
<body>
<par>
<text src="text1.txt" region="region_1" />
<text src="text2.txt" region="region_2" />
</par>
</body>
</smil>
Fitting media to regions
You can set the fit attribute of the <region> tag to force
media to be resized etc.
The following values are valid for fit:
• fill — make media grow and fill the area (distorting if necessary).
• meet — make media grow (without any distortion) until it
meets the region frontier.
• slice — media grows (without distortion) until it entirely fills
its region; anything overflowing the region is sliced off.
• scroll — if the media is bigger than its region, the area gets scrolled.
• hidden — media is not resized; anything outside the region is hidden.
Obviously you set the value like this:
<region id="region_1"
fit="fill" />
.....
Synchronisation
There are two basic ways in which we may want to play media:
• play several media one after the other,
• play several media in parallel.
In order to do this we need to add synchronisation:
• we will need to add time parameters to media elements.
Adding a duration of time to media — dur
To add a duration of time to a media element simply specify a
dur attribute parameter in an appropriate media tag:
.....
<body>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon" dur="6s" />
</body>
.....
Delaying Media — the begin attribute
To specify a delay, i.e. when to begin, set the begin attribute
parameter in an appropriate media tag:
If you add begin="2s" to the cardiff image tag, you will see
that the Cardiff icon appears 2 seconds after the document
begins and remains for a further 6 seconds. Have a look at
the source:
.....
<body>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon"
dur="6s" begin="2s" />
</body>
.....
Sequencing Media — the seq tag
Scheduling media:
The <seq> tag is used to define a sequence of media.
• The media are executed one after the other:
.....
<seq>
<img src="img1.gif"
region="reg1" dur="6s" />
<img src="img2.gif"
region="reg2"
dur="4s" begin="1s" />
</seq>
.....
So the setting 1s makes the img2.gif icon appear 1 second
after img1.gif ends.
Parallel Media — the par tag
We use the <par> tag to play media at the same time:
<par>
<img src="cardiff.gif"
alt="The cardiff icon"
region="cardiff_icon" dur="6s" />
<audio src="music.au" alt="Some Music"
dur="6s" />
</par>
This will display an image and play some music along with it.
Synchronisation Example 1: Planets Soundtrack
The following SMIL code plays one long soundtrack along with
a series of images.
Essentially:
• The audio file and
• image sequences are played in parallel
• The Images are run in sequence with no break (begin =
0s)
The files are stored on the MACINTOSHES in the Multimedia
Lab (in the SMIL folder) as follows:
• planets.html — calls the SMIL source (below) with the SOJA
applet. This demo uses zipped (SUN) audio files (.auz)
which are not supported by RealPlayer.
• planets.smil — SMIL source (listed below).
SMIL HEAD DATA
<smil>
<head>
<layout>
<root-layout height="400" width="600"
background-color="#000000"
title="Dreaming out Loud"/>
<region id="satfam" width="564" height="400"
top="0" left="0" background-color="#000000"
z-index="2" />
<region id="jupfam" width="349" height="400"
top="0" left="251" background-color="#000000"
z-index="2" />
<region id="redsun" width="400" height="400"
top="0" left="100" background-color="#000000"
z-index="2" />
...........
</layout>
</head>
SMIL BODY DATA
<body>
<par>
<audio src="media/dreamworldb.auz"
dur="61.90s" begin="3.00s"
system-bitrate="14000" />
<seq>
<img src="media/satfam1a.jpg" region="satfam"
begin="1.00s" dur="4.50s" />
<img src="media/jupfam1a.jpg" region="jupfam"
begin="1.50s" dur="4.50s" />
<img src="media/redsun.jpg" region="redsun"
begin="1.00s" dur="4.50s" />
........
<img src="media/orion.jpg" region="orion"
begin="1.00s" dur="4.50s" />
<par>
<img src="media/pillarsb.jpg" region="pillars"
begin="1.00s" end="50s" />
<img src="media/blank.gif" region="blank"
begin="2.00s" end="50.00s" />
<text src="media/music.txt" region="music"
begin="3.00s" end="50.00s" />
..........
<text src="media/me.txt" region="me"
begin="20.00s" dur="3.00s" />
<text src="media/jose.txt" region="jose"
begin="23.00s" end="50.00s" />
</par>
<text src="media/title.txt" region="title"
begin="3.00s" end="25.00s" />
</seq>
</par>
</body>
</smil>
Synchronisation Example 2: Slides ’N’ Sound
Dr John Rosbottom of Plymouth University has come up with
a novel way of giving lectures.
This has
• one long sequence of
• parallel pairs of images and audio files
The files are stored on the MACINTOSHES in the Multimedia
Lab (in the SMIL folder) as follows:
• slides n sound.smil — SMIL source (listed below), play
with RealPlayer G2. NOTE: This demo uses RealAudio files,
which are not supported by SOJA:
<smil>
<head>
<layout>
<root-layout height="400" width="600" background-color="#000000"
title="Slides and Sound"/>
</layout>
</head>
<body>
<seq>
<par>
<audio src="audio/leconlec.rm" dur="24s" title="slide 1"/>
<img src="slides/img001.GIF" dur="24s"/>
</par>
<par>
<audio src="audio/leconlec.rm" clip-begin="24s" clip-end="51s" dur="27s"
title="slide 2"/>
<img src="slides/img002.GIF" dur="27s"/>
</par>
............
<par>
<audio src="audio/leconlec.rm" clip-begin="610s"
clip-end="634s" dur="24s" title="The Second Reason"/>
<img src="slides/img018.GIF" clip-begin="610s"
clip-end="634s" dur="24s" title="The Second Reason"/>
</par>
<par>
<audio src="audio/leconlec.rm" clip-begin="634s" clip-end="673s" dur="39s"
title="Slide 19"/>
<img src="slides/img019.GIF" clip-begin="634s" clip-end="673s" dur="39s"
title="Slide 19"/>
</par>
<img src="slides/img006.GIF" fill="freeze" title="And finally..."
author="Abbas Mavani ([email protected])"
copyright="Everything is so copyright protected (c)1999"/>
<!-- kept this in to remind me that you can have single things
<audio src="audio/AbbasTest.rm" dur="50.5s"/>
-->
</seq>
</body>
</smil>
SMIL Events
SMIL supports event-based synchronisation via begin events:
• When a media element begins, it sends a begin event.
• If another media element waits for this event, it catches it.
To make a media element wait for an event, one of its
synchronisation attributes (begin or end) should be written
as follows:
<!-- if you want tag to start when
another tag begins -->
<tag begin="id(specifiedId)(begin)" />
<!-- if you want tag to start 3s after
another tag begins -->
<tag begin="id(specifiedId)(3s)" />
<!-- if you want tag to start when
another tag ends -->
<tag begin="id(specifiedId)(end)" />
For example:
<body>
<par>
<img src="cardiff.gif" region="cardiff"
id="cf" begin="4s" />
<img src="next.gif" region="next"
begin="id(cf)(2s)" />
</par>
</body>
will make the next.gif image begin 2s after cardiff.gif
begins.
The switch Tag
The syntax for the switch tag is:
<switch>
<!-- child1 testAttributes1 -->
<!-- child2 testAttributes2 -->
<!-- child3 testAttributes3 -->
</switch>
The rule is:
• The first of the <switch> tag children whose test attributes
are all evaluated to TRUE is executed.
• A tag with no test attributes is evaluated to TRUE.
• See SMIL reference for list of valid test attributes
For example, you may wish to provide presentations in English
or Welsh:
<body>
  <switch>
    <!-- English only -->
    <par system-language="en">
      <img src="cardiff.gif" region="cardiff"/>
      <audio src="english.au"/>
    </par>
    <!-- Welsh only -->
    <par system-language="cy">
      <img src="caerdydd.gif" region="cardiff"/>
      <audio src="cymraeg.au"/>
    </par>
  </switch>
</body>

Somewhere (e.g. in the player's preferences) the system-language
test attribute value will be set.
Multimedia Systems Technology
Multimedia systems have to deal with the
• generation,
• manipulation,
• storage,
• presentation, and
• communication of information
Let's consider some broad implications of the above.
Discrete v Continuous Media
RECALL: Our Definition of Multimedia
• All data must be in the form of digital information.
• The data may be in a variety of formats:
– text,
– graphics,
– images,
– audio,
– video.
Synchronisation
A majority of this data is large and the different media may
need synchronisation:
• The data will usually have temporal relationships as an
integral property.
Static and Continuous Media
Static or Discrete Media — Some media is time independent:
Normal data, text, single images, graphics are examples.
Continuous media — Time dependent Media:
Video, animation and audio are examples.
Analog and Digital Signals
• Some basic definitions – Studied HERE
• Overview of technology — Studied HERE
• In depth study later.
Analog and Digital Signal Converters
The world we sense is full of analog signals:
• Electrical sensors convert the medium they sense into
electrical signals
– E.g. transducers, thermocouples, microphones.
– (usually) continuous signals
• Analog signals must be converted, or digitised
• Digital: discrete digital signals that computer can readily deal
with.
• Special hardware devices : Analog-to-Digital converters
• Playback – a converse operation: Digital-to-Analog .
Multimedia Data: Input and format
How to capture and store each Media format?
Note that Text, Graphics and some images are generated
directly by computer and do not require digitising:
they are generated directly in some binary format.
Handwritten text would have to be digitised, either by electronic
pen sensing or by scanning of the paper-based form.
Text and Static Data
• Source: keyboard, floppies, disks and tapes.
• Stored and input character by character:
– Storage of text is 1 byte per character (text or format
character).
– For other forms of data e.g. Spreadsheet files some
formats may store format as text (with formatting) others
may use binary encoding.
• Format: Raw text or formatted text, e.g. HTML, Rich Text
Format (RTF), Word, or programming language source (C, Java,
etc.).
• Not temporal — BUT may have natural implied sequence
e.g. HTML format sequence, Sequence of C program
statements.
• Size Not significant w.r.t. other Multimedia.
Graphics
• Format: constructed by the composition of primitive objects
such as lines, polygons, circles, curves and arcs.
• Input: Graphics are usually generated by a graphics editor
program (e.g. Freehand) or automatically by a program (e.g.
Postscript).
• Graphics are usually editable or revisable (unlike Images).
• Graphics input devices: keyboard (for text and cursor
control), mouse, trackball or graphics tablet.
• graphics standards : OpenGL, PHIGS, GKS
• Graphics files usually store the primitive assembly
• Do not take up a very high storage overhead.
Images
• Still pictures which (uncompressed) are represented as a
bitmap (a grid of pixels).
• Input: Generated by programs similar to graphics or
animation programs.
• Input: scanned for photographs or pictures using a digital
scanner or from a digital camera.
• Analog sources will require digitising.
• Stored at 1 bit per pixel (Black and White), 8 Bits per pixel
(Grey Scale, Colour Map) or 24 Bits per pixel (True Colour)
• Size: a 512x512 Grey scale image takes up 1/4 Mb, a 512x512
24 bit image takes 3/4 Mb with no compression.
• This overhead soon increases with image size
• Compression is commonly applied.
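The storage figures above follow directly from width × height × bytes per pixel. A minimal Python sketch (illustrative only, the function name is made up for this example):

def image_mb(width, height, bits_per_pixel):
    # Uncompressed bitmap size in megabytes (1 Mb = 1024 * 1024 bytes)
    return width * height * bits_per_pixel / 8 / (1024 * 1024)

print(image_mb(512, 512, 8))    # 0.25 -> the 1/4 Mb grey-scale figure
print(image_mb(512, 512, 24))   # 0.75 -> the 3/4 Mb true-colour figure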
Audio
• Audio signals are continuous analog signals.
• Input: microphones and then digitised and stored
• usually compressed.
• CD Quality Audio requires 16-bit sampling at 44.1 KHz
• 1 Minute of Mono CD quality audio requires 5 Mb.
Video
• Input: Analog Video is usually captured by a video camera
and then digitised.
• There are a variety of video (analog and digital) formats
• Raw video can be regarded as being a series of single images.
There are typically 25, 30 or 50 frames per second.
• 512x512 greyscale video at 25 frames per second takes
25 × 0.25 Mb = 6.25 Mb per second
(375 Mb per minute) to store uncompressed.
• Digital video clearly needs to be compressed.
Output Devices
The output devices for a basic multimedia system include
• A High Resolution Colour Monitor
• CD Quality Audio Output
• Colour Printer
• Video Output to save Multimedia presentations to (Analog)
Video Tape, CD-ROM, DVD.
• Audio Recorder (DAT, DVD, CD-ROM, (Analog) Cassette)
• Storage Medium (Hard Disk, Removable Drives, CD-ROM)
Storage Media
The major problems that affect storage media are:
• Large volume of data
• Real time delivery
• Data format
• Storage Medium
• Retrieval mechanisms
High performance I/O
There are four factors that influence I/O performance:
Data —
• high volume, continuous, contiguous vs distributed storage.
• Direct relationship between size of data and how long it
takes to handle.
• Compression
Data Storage —
• Depends on the storage hardware and
• The nature of the data.
• The following storage parameters affect how data is stored:
– Storage Capacity
– Read and Write Operations of hardware
– Unit of transfer of Read and Write
– Physical organisation of storage units
– Read/Write heads, Cylinders per disk,
Tracks per cylinder, Sectors per Track
– Read time
– Seek time
Data Transfer —
• Depends on how the data is generated and
• written to disk, and
• in what sequence it needs to be retrieved.
• Writing/Generation of Multimedia data is usually
sequential e.g. streaming digital audio/video direct to disk.
• Individual data (e.g. audio/video file) usually streamed.
• RAID architecture can be employed to accomplish high
I/O rates (parallel disk access)
Operating System Support —
• Scheduling of processes when I/O is initiated.
• Time critical operations can adopt special procedures.
• Direct disk transfer operations free up CPU/Operating
system space.
Basic Storage
Basic storage units have problems dealing with large multimedia
data
• Single Hard Drives — SCSI/IDE Drives.
• AV (Audio-Visual) drives
– avoid thermal recalibration between read/writes,
– suitable for desktop multimedia.
• New drives are fast enough for direct to disk audio and video
capture.
• not adequate for commercial/professional Multimedia.
• Removable Media —
– Floppies not adequate
– Jaz/Zip Drives,
– CD-ROM,
– DVD-ROM.
RAID — Redundant Array of Inexpensive Disks
Needed:
• To fulfill the needs of current multimedia and other data
hungry application programs,
• Fault tolerance built into the storage device.
• Parallel processing exploits arrangement of hard disks.
RAID technology offers some significant advantages as a
storage medium of multimedia data:
• Affordable alternative to mass storage
• High throughput and reliability
The key components of a RAID System are:
• Set of disk drives, disk arrays, viewed by user as one or more
logical drives.
• Data may be distributed across drives
• Redundancy added in order to allow for disk failure
• Disk arrays:
– store large amounts of data,
– have high I/O rates and
– take less power per megabyte (cf. high end disks)
– but they have very poor reliability
As more devices are added, reliability deteriorates
– N devices generally have 1/N the reliability of a single device.
Overcoming Reliability Problems
Redundancy — Files stored on arrays may be striped across
multiple disks.
Four ways to do this.
Four ways of Overcoming Reliability Problems
• Mirroring or shadowing of the contents of disk, which can
be a capacity kill approach to the problem.
– write on two disks - a 100% capacity overhead.
– Reads to disks may however be optimised.
• Horizontal Hamming Codes: A special means to
reconstruct information using an error correction encoding
technique.
• Parity and Reed-Solomon Codes: Also an error correction
coding mechanism. Parity may be computed in a number of
ways (see the XOR sketch after this list).
• Failure Prediction: There is no capacity overhead in this
technique.
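To illustrate the parity idea, here is a minimal Python sketch (assuming the simplest XOR scheme, not any specific RAID implementation): XOR-ing the surviving blocks with the parity block regenerates a lost block.

def xor_parity(blocks):
    # Parity block = byte-wise XOR of all data blocks (RAID 3/4/5 style)
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

data = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]
p = xor_parity(data)
# Lose data[1]; XOR of the survivors and the parity recovers it:
assert xor_parity([data[0], data[2], p]) == data[1]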
RAID Architecture
Each disk within the array needs to have its own I/O controller,
but interaction with a host computer may be mediated through
an array controller
[Figure: RAID architecture — the host processor connects via a host adaptor to an array controller (which manages the control logic and parity); the array controller feeds multiple disks, each with its own disk controller.]
Orthogonal RAID
Possible to combine the
disks together to produce a
collection of devices, where
• Each vertical array is now
the unit of data
redundancy.
• Such an arrangement
is called an orthogonal
RAID
• Other arrangements of
disks are also possible.
The Eight levels of RAID
There are 8 levels of RAID technology:
• each level providing a greater amount of resilience than the
lower levels:
Level 0: Disk Striping
Level 1: Disk Mirroring
Level 2: Bit Interleaving and HEC Parity
Level 3: Bit Interleaving with XOR Parity
Level 4: Block Interleaving with XOR Parity
Level 5: Block Interleaving with Parity Distribution
Level 6: Fault Tolerant System
Level 7: Heterogeneous System
First Six RAID levels
[Figure: the first six RAID levels —
RAID 0: simultaneous reads and writes on every drive (striping);
RAID 1: data duplication on drive pairs, simultaneous reads on every drive (mirroring);
RAID 2: each read and write spans all drives, simultaneous writes on every drive;
RAID 3: parallel access with parity — every write must update a dedicated parity drive;
RAID 4: reads and writes span all drives, with a dedicated parity drive;
RAID 5: simultaneous reads on every drive — each drive now also handles parity (indicated by the filled circle).]
Optical Storage
• The most popular storage medium in the multimedia context
• compact size,
• High density recording,
• Easy handling and
• Low cost per MB.
• CD and recently DVD (ROM) the most common
• Laser disc — older format.
CD Storage
There are now various formats of CD:
• CD-DA (Compact Disc-Digital Audio)
• CD-I (Compact Disc-Interactive)
• CD-ROM/XA (eXtended Architecture)
• Photo CD
The capacity of a CD-ROM is
• 620-700 Mbs depending on CD material,
• 650/700 Mb (74/80 Mins) is a typical write once CD-ROM
size.
• Drives that read and write CD-ROMs (CD-RW) offer similar capacities.
CD Standards
There are several CD standards for different types of media:
Red Book — Digital Audio: Most Music CDs.
Yellow Book — CD-ROM:
Mode 1 – computer data,
Mode 2 – compressed audio/video data.
Green Book — CD-I
Orange Book — Write once CDs
Blue Book — Enhanced CD (CD-Extra)
DVD
The current best generation of optical disc storage technology
for Multimedia:
• DVD — Digital Versatile Disc (formal),
Digital Video Disc (mistaken).
• Larger storage and faster than CD
– over 2 Hours of Video / Single sided DVD-ROM 2.4 Gb
• Formats: DVD-Video and DVD-ROM (DVD-R and DVD-RAM)
What are the features of DVD-Video?
The main features of DVD include:
• Over 2 hours of high-quality digital video (over 8 on a
double-sided, dual-layer disc).
• Support for widescreen movies on standard or widescreen
TVs (4:3 and 16:9 aspect ratios).
• Up to 8 tracks of digital audio (for multiple languages), each
with as many as 8 channels.
• Up to 32 subtitle/karaoke tracks.
• Automatic seamless branching of video (for multiple story
lines or ratings on one disc).
• Up to 9 camera angles (different viewpoints can be selected
during playback).
Main features of DVD (Cont)
• Menus and simple interactive features (for games, quizzes,
etc.).
• Multilingual identifying text for title name, album name, song
name, cast, crew, etc.
• Instant rewind and fast forward, including search to title,
chapter, track, and timecode.
• Durability (no wear from playing, only from physical damage).
• Not susceptible to magnetic fields. Resistant to heat.
• Compact size (easy to handle and store, players can be
portable, replication is cheaper).
What are the disadvantages of DVD?
Despite several positive attributes mentioned above there are
some potential disadvantages of DVD:
• It has built-in copy protection and regional lockout.
• It uses digital compression. Poorly compressed audio or
video may be blocky, fuzzy, harsh, or vague.
• The audio downmix process for stereo/Dolby Surround can
reduce dynamic range.
• It doesn’t fully support HDTV.
• Some DVD players and drives may not be able to read CD-Rs.
• Disputes over some DVD-R formats
Comparison of DVD and CD-ROM
The increase in capacity in
DVD-ROM (from CD-ROM)
is due to:
• smaller pit length (∼2.08x),
• tighter tracks (∼2.16x),
• slightly larger data area (∼1.02x),
• discs single or double sided
Comparison of DVD and CD-ROM (Cont.)
• another data layer added to each
side creating a potential for four
layers of data per disc
• more efficient channel bit modulation (∼1.06x),
• more efficient error correction (∼1.32x),
• less sector overhead (∼1.06x).
• capacity of a dual-layer disc is
slightly less than double that of a
single-layer disc.
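Multiplying these factors out gives roughly 2.08 × 2.16 × 1.02 × 1.06 × 1.32 × 1.06 ≈ 6.8, consistent with a single-layer DVD holding about seven times the data of a CD-ROM.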
Multimedia Data Representation
Issues to be covered:
• Digital Audio
• Graphics/Image Formats
• Digital Video (Next Lecture)
• Sampling/Digitisation
• Compression
Digital Audio
Application of Digital Audio — Selected Examples
Music Production
• Hard Disk Recording
• Sound Synthesis
• Samplers
• Effects Processing
Video — Audio is an important element: Music and Effects
Web — Many uses on Web
• Spice up Web Pages
• Listen to CDs
• Listen to Web Radio
What is Sound?
Source — Generates Sound
• Air Pressure changes
• Electrical — Loud Speaker
• Acoustic — Direct Pressure Variations
Destination — Receives Sound
• Electrical — Microphone produces electric signal
• Ears — respond to pressure: we hear sound (more later
(MPEG Audio))
Digitising Sound
• Microphone produces analog signal
• Computers like discrete entities
Need to convert Analog-to-Digital — Specialised Hardware
Also known as Sampling
Digital Sampling
Sampling basically involves:
• measuring the analog signal at regular discrete intervals
• recording the value at these points
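A minimal Python sketch of this process (illustrative only; the function and names are made up for this example):

import numpy as np

def sample_and_quantise(signal, fs, duration, bits):
    # Measure the signal at regular intervals of 1/fs seconds ...
    t = np.arange(0, duration, 1.0 / fs)
    x = signal(t)                              # analog values in [-1, 1]
    # ... and record each value as the nearest of 2**bits levels
    levels = 2 ** bits
    return np.round((x + 1) / 2 * (levels - 1)).astype(int)

tone = lambda t: np.sin(2 * np.pi * 440 * t)      # 440 Hz test tone
print(sample_and_quantise(tone, 8000, 0.001, 8))  # 8 samples, 8-bit depth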
Computer Manipulation of Sound
Writing Digital Signal Processing routines ranges from
trivial to highly complex:
• Volume
• Cross-Fading
• Looping
• Echo/Reverb/Delay
• Filtering
• Signal Analysis
Sound Demos
• Volume
• Cross-Fading
• Looping
• Echo/Reverb/Delay
• Filtering
Sample Rates and Bit Size
How do we store each sample value (Quantisation)?
8 Bit Value (0-255)
16 Bit Value (Integer) (0-65535)
How many Samples to take?
11.025 KHz — Speech (Telephone 8 KHz)
22.05 KHz — Low Grade Audio
(WWW Audio, AM Radio)
44.1 KHz — CD Quality
Nyquist’s Sampling Theorem
Sampling Frequency is Very Important in order to accurately
reproduce a digital version of an Analog Waveform
Nyquist’s Theorem:
The Sampling frequency for a signal must be at least twice
the highest frequency component in the signal.
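As a rough illustration in Python (a sketch, not from the slides): a 1 kHz tone needs at least 2 × 1 kHz = 2 kHz sampling; below that rate the samples cannot be distinguished from a lower-frequency alias.

import numpy as np

f = 1000.0                      # highest frequency component (Hz)
for fs in (8000.0, 1500.0):     # one rate above Nyquist, one below
    t = np.arange(0, 0.01, 1.0 / fs)       # 10 ms of sample instants
    samples = np.sin(2 * np.pi * f * t)    # sampled waveform
    print(f"fs={fs:.0f} Hz, {len(samples)} samples, aliased: {fs < 2 * f}")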
Figure 19: Sampling at Signal Frequency
Figure 20: Sampling at Twice Nyquist Frequency
Figure 21: Sampling at above Nyquist Frequency
Implications of Sample Rate and Bit Size
Affects Quality of Audio
• Ears do not respond to sound in a linear fashion (more later
(MPEG Audio))
• Decibel (dB) a logarithmic measurement of sound
• 16-Bit has a signal-to-noise ratio of 98 dB — virtually
inaudible
• 8-bit has a signal-to-noise ratio of 50 dB
• Therefore, 8-bit is noisier by 48 dB — eight 6 dB steps
– a 6 dB increment is twice as loud
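A standard rule of thumb behind these figures: each bit of sample depth adds about 6 dB of signal-to-noise ratio (SNR ≈ 6.02 × N dB), giving roughly 96–98 dB for 16-bit and 48–50 dB for 8-bit audio.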
Implications of Sample Rate and Bit Size (cont)
Affects Size of Data
File Type       44.1 KHz   22.05 KHz   11.025 KHz
16 Bit Stereo   10.1 Mb    5.05 Mb     2.52 Mb
16 Bit Mono     5.05 Mb    2.52 Mb     1.26 Mb
8 Bit Mono      2.52 Mb    1.26 Mb     630 Kb
Figure 22: Memory Required for 1 Minute of Digital Audio
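The entries in Figure 22 can be reproduced with a one-line calculation; a Python sketch (illustrative only):

def audio_mb_per_minute(rate_hz, bits, channels):
    # Uncompressed PCM bytes for 60 seconds, in megabytes
    return rate_hz * (bits // 8) * channels * 60 / (1024 * 1024)

print(audio_mb_per_minute(44100, 16, 2))   # ~10.1 (16-bit stereo, 44.1 KHz)
print(audio_mb_per_minute(22050, 16, 1))   # ~2.52
print(audio_mb_per_minute(11025, 8, 1))    # ~0.63, i.e. 630 Kb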
Practical Implications of Nyquist Sampling Theory
• Must (low pass) filter the signal before sampling:
• Otherwise strange artifacts (aliases) from high frequency signals appear.
Why are CD Sample Rates 44.1 KHz?
Upper range of human hearing is around
20-22 KHz — Apply Nyquist Theorem
Common Audio Formats
• Popular audio file formats include
– .au (Origin: Unix workstations),
– .aiff (MAC, SGI),
– .wav (PC, DEC workstations)
• Compression can be utilised in some of the above but is not
mandatory.
• A simple and widely used (by the above) audio compression
method is Adaptive Differential Pulse Code Modulation (ADPCM)
— see the sketch after this list.
– Based on past samples, it
predicts the next sample and
encodes the difference between the actual value and the
predicted value.
– More on this later (Audio Compression)
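A much-simplified sketch of the idea in Python (plain delta coding with the previous sample as the predictor; real ADPCM also adapts the quantiser step size, which is omitted here):

def delta_encode(samples):
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)   # store the prediction error only
        prev = s
    return out

def delta_decode(deltas):
    prev, out = 0, []
    for d in deltas:
        prev += d              # rebuild each sample from the running sum
        out.append(prev)
    return out

sig = [10, 12, 13, 13, 11, 8]
assert delta_decode(delta_encode(sig)) == sig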
Common Audio Formats (Cont.)
• Many formats linked to audio applications
• Can use some compression
– Soundblaster — .voc (can use Silence Deletion (more on
this later (Audio Compression)))
– Protools/Sound Designer – .sd2
– Realaudio — .ra.
– Ogg Vorbis — .ogg
• MPEG AUDIO — More Later (MPEG-3 and MPEG-4)
Delivering Audio across a network
• Trade off between desired fidelity and file size
• Bandwidth Considerations for Web and other media.
• Compress Files:
– Could affect live transmission on Web
Streaming Audio
• Buffered Data:
– Trick: get data to destination before it's needed
– Temporarily store in memory (Buffer)
– Server keeps feeding the buffer
– Client Application reads buffer
• Needs Reliable Connection, moderately fast too.
• Specialised client, Streaming Audio Protocol (PNM for
RealAudio).
Synthetic Sounds — reducing bandwidth?
• Synthesise sounds — hardware or software
• Client produces sound — only send parameters to control sound (MIDI next)
• Many synthesis techniques could be used, For example:
– FM (Frequency Modulation) Synthesis — used in low-end Sound Blaster
cards, the OPL-4 chip, and the Yamaha DX synthesiser range popular in the early 1980s.
– Wavetable synthesis – wavetable generated from sound waves of real instruments
– Additive synthesis — make up signal from smaller simpler waveforms
– Subtractive synthesis — modify a (complex) waveform by taking out elements
– Physical Modelling — model how acoustic sound is generated in software
• Modern Synthesisers use a mixture of sample and synthesis.
MIDI
What is MIDI?
• No Longer Exclusively the Domain of Musicians.
• MIDI provides a very low bandwidth alternative on the Web:
– transmit musical and
– certain sound effects data
• also now used as a compression control language (modified)
– See MPEG-4 Section soon
MIDI on the Web
Very Low Bandwidth (few 100K bytes)
• The responsibility of producing sound is moved to the client:
– Synthesiser Module
– Sample
– Soundcard
– Software Generated
• Most Web browsers can deal with MIDI.
Definition of MIDI:
A protocol that enables computers, synthesizers, keyboards,
and other musical devices to communicate with each other.
Components of a MIDI System
Synthesizer:
• It is a sound generator (various pitch, loudness, tone colour).
• A good (musician’s) synthesizer often has a microprocessor,
keyboard, control panels, memory, etc.
Sequencer:
• It can be a stand-alone unit or a software program for a
personal computer. (It used to be a storage server for MIDI
data; nowadays it is more a software music editor on the
computer.)
• It has one or more MIDI INs and MIDI OUTs.
Basic MIDI Concepts
Track:
• Track in sequencer is used to organize the recordings.
• Tracks can be turned on or off on recording or playing back.
Channel:
• MIDI channels are used to separate information in a MIDI
system.
• There are 16 MIDI channels in one cable.
• Channel numbers are coded into each MIDI message.
Timbre:
• The quality of the sound, e.g., flute sound, cello sound, etc.
• Multitimbral – capable of playing many different sounds at
the same time (e.g., piano, brass, drums, etc.)
Basic MIDI Concepts (Cont.)
Pitch:
• The Musical note that the instrument plays
Voice:
• Voice is the portion of the synthesizer that produces sound.
• Synthesizers can have many (12, 20, 24, 36, etc.) voices.
• Each voice works independently and simultaneously to produce
sounds of different timbre and pitch.
Patch:
• The control settings that define a particular timbre.
Hardware Aspects of MIDI
MIDI connectors:
Three 5-pin ports found on the back of every MIDI unit:
• MIDI IN: the connector via which the device receives all MIDI data.
• MIDI OUT: the connector through which the device transmits all the MIDI data it generates itself.
• MIDI THROUGH: the connector by which the device echoes the data it receives from MIDI IN.
MIDI Messages
MIDI messages are used by MIDI devices to communicate
with each other.
MIDI messages are very low bandwidth:
• Note On Command
– Which Key is pressed
– Which MIDI Channel (what sound to play)
– 3 Hexadecimal Numbers
• Note Off Command Similar
• Other command (program change) configure sounds to be
played.
Structure of MIDI messages:
• MIDI message includes a status byte and up to two data
bytes.
• Status byte
– The most significant bit of status byte is set to 1.
– The 4 low-order bits identify which channel it belongs to
(four bits produce 16 possible channels).
– The 3 remaining bits identify the message.
• The most significant bit of data byte is set to 0.
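A small Python sketch of unpacking a status byte along these lines (illustrative only; the function name is made up for this example):

def parse_status(byte):
    # Channel-message status byte layout: 1 mmm cccc
    assert byte & 0x80, "status bytes have the most significant bit set"
    message = (byte >> 4) & 0x07   # the 3 bits identifying the message
    channel = byte & 0x0F          # 4 bits: channel 0-15 (MIDI channels 1-16)
    return message, channel

print(parse_status(0x9C))   # -> (1, 12): Note On (status nibble 9), MIDI channel 13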
Classification of MIDI messages:
MIDI messages divide into two groups:
• Channel messages — voice messages and mode messages
• System messages — common messages, real-time messages and exclusive messages
Midi Channel messages:
– messages that are transmitted on individual channels rather
than globally to all devices in the MIDI network.
Channel voice messages:
• Instruct the receiving instrument to assign particular sounds
to its voice
• Turn notes on and off
• Alter the sound of the currently active note or notes
Midi Channel Control Messages
Voice Message             Status Byte   Data Byte1          Data Byte2
-----------------------   -----------   -----------------   ------------------
Note off                  8x            Key number          Note off velocity
Note on                   9x            Key number          Note on velocity
Polyphonic Key Pressure   Ax            Key number          Amount of pressure
Control Change            Bx            Controller number   Controller value
Program Change            Cx            Program number      None
Channel Pressure          Dx            Pressure value      None
Pitch Bend                Ex            MSB                 LSB
Notes: ‘x’ in status byte hex value stands for a channel
number.
Midi Command Example
A Note On message is followed by two bytes, one to identify
the note, and one to specify the velocity.
To play:
• Note number 80 (Hex 50)
• With maximum velocity (127, Hex 7F)
• On channel 13 (Hex C),
the MIDI device would send these three hexadecimal byte
values:
9C 50 7F
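The same three bytes can be assembled programmatically; a Python sketch (illustrative only):

NOTE_ON = 0x90                      # Note On status nibble
channel = 13                        # MIDI channels 1-16 are encoded as 0-15
status = NOTE_ON | (channel - 1)    # channel 13 -> low nibble 0xC
message = bytes([status, 0x50, 0x7F])   # note 80, velocity 127
print(message.hex())                # -> 9c507f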
Midi Channel mode messages:
• Channel mode messages are a special case of the Control
Change message (Bx (Hex) or 1011nnnn (Binary)).
• The difference between a Control message and a Channel
Mode message, is in the first data byte.
– Data byte values 121 through 127 have been reserved in
the Control Change message for the channel mode
messages.
– Channel mode messages determine how an instrument
will process MIDI voice messages.
System Messages:
• System messages carry information that is not channel
specific. Examples:
– Timing signal for synchronization,
– Positioning information in pre-recorded MIDI sequences,
and
– Detailed setup information for the destination device
– Setting up sounds, Patch Names etc.
Midi System Real-time Messages
• These messages are related to synchronization/timing etc.
System Real-Time Message   Status Byte
------------------------   -----------
Timing Clock               F8
Start Sequence             FA
Continue Sequence          FB
Stop Sequence              FC
Active Sensing             FE
System Reset               FF
System common messages
• These contain the following (unrelated) messages
System Common Message   Status Byte   Number of Data Bytes
---------------------   -----------   --------------------
MIDI Timing Code        F1            1
Song Position Pointer   F2            2
Song Select             F3            1
Tune Request            F6            None
Midi System exclusive messages
• Messages related to things that cannot be standardized:
– System dependent creation of sound
– System dependent organisation of sounds
(Not General Midi Compliant? (more soon))
• An addition to the original MIDI specification.
• Just a stream of bytes
– all with their high bits set to 0,
– bracketed by a pair of system exclusive start and end
messages:
F0 — Sysex Start
F7 — Sysex End
– Format of message byte stream system dependent.
General MIDI (GM)
Problem: MIDI music may not sound the same everywhere.
Basic GM Idea:
• MIDI + Instrument Patch Map + Percussion Key Map –> a
piece of MIDI music sounds (more or less) the same anywhere
it is played
– Instrument patch map is a standardised list consisting of
128 instruments (patches).
The same instrument type sounds everywhere, even if not an identical sound
– Percussion map specifies 47 percussion sounds.
Same Drum type sounds on keyboard map
– Key-based percussion is always transmitted on MIDI
channel 10 (Default)
Can be transmitted on other channels as well
Requirements for General MIDI Compatibility
• Support all 16 channels — Default standard Multitimbral MIDI
Specification
• Each channel can play a different instrument/program —
multitimbral
• Each channel can play many notes — polyphony
• Minimum of 24 (usually much higher 64/128) fully
dynamically allocated voices — shared across all channels
General MIDI Instrument Patch Map
Prog No.  Instrument                 Prog No.  Instrument
(1-8 PIANO)                          (9-16 CHROM PERCUSSION)
1    Acoustic Grand                  9    Celesta
2    Bright Acoustic                 10   Glockenspiel
3    Electric Grand                  11   Music Box
4    Honky-Tonk                      12   Vibraphone
5    Electric Piano 1                13   Marimba
6    Electric Piano 2                14   Xylophone
7    Harpsichord                     15   Tubular Bells
8    Clav                            16   Dulcimer
(17-24 ORGAN)                        (25-32 GUITAR)
17   Drawbar Organ                   25   Acoustic Guitar(nylon)
18   Percussive Organ                26   Acoustic Guitar(steel)
19   Rock Organ                      27   Electric Guitar(jazz)
20   Church Organ                    28   Electric Guitar(clean)
21   Reed Organ                      29   Electric Guitar(muted)
22   Accoridan                       30   Overdriven Guitar
23   Harmonica                       31   Distortion Guitar
24   Tango Accordian                 32   Guitar Harmonics
(33-40 BASS)                         (41-48 STRINGS)
33   Acoustic Bass                   41   Violin
34   Electric Bass(finger)           42   Viola
35   Electric Bass(pick)             43   Cello
36   Fretless Bass                   44   Contrabass
37   Slap Bass 1                     45   Tremolo Strings
38   Slap Bass 2                     46   Pizzicato Strings
39   Synth Bass 1                    47   Orchestral Strings
40   Synth Bass 2                    48   Timpani
(49-56 ENSEMBLE)                     (57-64 BRASS)
49   String Ensemble 1               57   Trumpet
50   String Ensemble 2               58   Trombone
51   SynthStrings 1                  59   Tuba
52   SynthStrings 2                  60   Muted Trumpet
53   Choir Aahs                      61   French Horn
54   Voice Oohs                      62   Brass Section
55   Synth Voice                     63   SynthBrass 1
56   Orchestra Hit                   64   SynthBrass 2
(65-72 REED)                         (73-80 PIPE)
65   Soprano Sax                     73   Piccolo
66   Alto Sax                        74   Flute
67   Tenor Sax                       75   Recorder
68   Baritone Sax                    76   Pan Flute
69   Oboe                            77   Blown Bottle
70   English Horn                    78   Skakuhachi
71   Bassoon                         79   Whistle
72   Clarinet                        80   Ocarina
(81-88 SYNTH LEAD)                   (89-96 SYNTH PAD)
81   Lead 1 (square)                 89   Pad 1 (new age)
82   Lead 2 (sawtooth)               90   Pad 2 (warm)
83   Lead 3 (calliope)               91   Pad 3 (polysynth)
84   Lead 4 (chiff)                  92   Pad 4 (choir)
85   Lead 5 (charang)                93   Pad 5 (bowed)
86   Lead 6 (voice)                  94   Pad 6 (metallic)
87   Lead 7 (fifths)                 95   Pad 7 (halo)
88   Lead 8 (bass+lead)              96   Pad 8 (sweep)
(97-104 SYNTH EFFECTS)               (105-112 ETHNIC)
97   FX 1 (rain)                     105  Sitar
98   FX 2 (soundtrack)               106  Banjo
99   FX 3 (crystal)                  107  Shamisen
100  FX 4 (atmosphere)               108  Koto
101  FX 5 (brightness)               109  Kalimba
102  FX 6 (goblins)                  110  Bagpipe
103  FX 7 (echoes)                   111  Fiddle
104  FX 8 (sci-fi)                   112  Shanai
(113-120 PERCUSSIVE)                 (121-128 SOUND EFFECTS)
113  Tinkle Bell                     121  Guitar Fret Noise
114  Agogo                           122  Breath Noise
115  Steel Drums                     123  Seashore
116  Woodblock                       124  Bird Tweet
117  Taiko Drum                      125  Telephone Ring
118  Melodic Tom                     126  Helicopter
119  Synth Drum                      127  Applause
120  Reverse Cymbal                  128  Gunshot
General MIDI Percussion Key Map
MIDI Key   Drum Sound           MIDI Key   Drum Sound
35   Acoustic Bass Drum          59   Ride Cymbal 2
36   Bass Drum 1                 60   Hi Bongo
37   Side Stick                  61   Low Bongo
38   Acoustic Snare              62   Mute Hi Conga
39   Hand Clap                   63   Open Hi Conga
40   Electric Snare              64   Low Conga
41   Low Floor Tom               65   High Timbale
42   Closed Hi-Hat               66   Low Timbale
43   High Floor Tom              67   High Agogo
44   Pedal Hi-Hat                68   Low Agogo
45   Low Tom                     69   Cabasa
46   Open Hi-Hat                 70   Maracas
47   Low-Mid Tom                 71   Short Whistle
48   Hi-Mid Tom                  72   Long Whistle
49   Crash Cymbal 1              73   Short Guiro
50   High Tom                    74   Long Guiro
51   Ride Cymbal 1               75   Claves
52   Chinese Cymbal              76   Hi Wood Block
53   Ride Bell                   77   Low Wood Block
54   Tambourine                  78   Mute Cuica
55   Splash Cymbal               79   Open Cuica
56   Cowbell                     80   Mute Triangle
57   Crash Cymbal 2              81   Open Triangle
58   Vibraslap
Digital Audio and MIDI
• Modern Recording Studio — Hard Disk Recording and MIDI
– Analog Sounds (Live Vocals, Guitar, Sax etc) — DISK
– Keyboards, Drums, Samples, Loops Effects — MIDI
• Sound Generators: use a mix of
– Synthesis
– Samples
• Samplers — Digitise (Sample) Sound then
– Playback
– Loop (beats)
– Simulate Musical Instruments
Digital Audio, Synthesis, Midi and Compression —
MPEG 4 Structured Audio
• We have seen the need for compression already in Digital
Audio — Large Data Files
• Basic Ideas of compression (next lecture) used as integral
part of audio format — MP3, real audio etc.
• MPEG-4 audio actually combines compression, synthesis
and MIDI to have a massive impact on compression.
• MIDI and synthesis encode what note to play and how to play it
with a small number of parameters
— a much greater reduction than simply having some encoded
bits of audio.
• Responsibility to create audio delegated to generation side.
MPEG 4 Structured Audio
A newer standard than MPEG-3 Audio — which we study in
detail later.
MPEG-4 covers the whole range of digital audio:
• From very low bit rate speech
• To full bandwidth high quality audio
• Built in anti-piracy measures
• Structured Audio
• Relation to MIDI so we study MPEG 4 audio here
Structured Audio Tools
The 6 MPEG-4 Structured Audio tools are:
SAOL – the Structured Audio Orchestra Language
SASL – the Structured Audio Score Language
SASBF – the Structured Audio Sample Bank Format
Set of MIDI semantics — describe how to control SAOL with
MIDI
Scheduler – describe how to take the above parts and create
sound
AudioBIFS – part of BIFS, which lets you make audio
soundtracks in MPEG-4 using a variety of tools and
effects-processing techniques
SAOL (Structured Audio Orchestra Language)
• Pronounced "sail"
• The central part of the Structured Audio toolset.
• A new software-synthesis language
• A language for describing synthesizers as programs, or instruments
• Specifically designed for use in MPEG-4.
• Not based on any particular method of synthesis – supports
many underlying synthesis methods.
SAOL Synthesis Methods
• Any known method of synthesis can be described in SAOL
(Open Support).
– FM synthesis,
– physical-modeling synthesis,
– sampling synthesis,
– granular synthesis,
– subtractive synthesis,
– FOF synthesis, and
– hybrids of all of these in SAOL.
SASL (Structured Audio Score Language)
• A very simple language to control the synthesizers specified
by SAOL instruments.
• A SASL program, or score, contains instructions that tell
SAOL:
– what notes to play,
– how loud to play them,
– what tempo to play them at,
– how long they last, and how to control them (vary them
while they’re playing).
• Similar to MIDI
– doesn’t suffer from MIDI’s restrictions on temporal resolution
or bandwidth.
– more sophisticated controller structure
SASL (Structured Audio Score Language) (Cont.)
• Lightweight Scoring Language: Does not support:
– looping,
– sections,
– repeats,
– expression evaluation,
– some other things.
• Most SASL scores will be created by automatic tools.
SASBF (Structured Audio Sample Bank Format)
• A format for efficiently transmitting banks of sound samples
• Used in wavetable, or sampling, synthesis.
• Partly compatible with the MIDI Downloadable Sounds (DLS)
format.
• The most active participants in this activity are EMu Systems
(sampler manufacturer) and the MIDI Manufacturers
Association (MMA).
MPEG-4 MIDI Semantics
SAOL can be controlled by
• SASL Scripts
• MIDI
• Scores in MPEG-4
Reasons to use MIDI:
• MIDI is today’s most commonly used representation for music
score data,
• Many sophisticated authoring tools (such as sequencers)
work with MIDI.
MPEG-4 Midi Control
• MIDI syntax external to MPEG-4 Structured Audio standard
• Use MIDI Manufacturers Association’s standard.
• Redefines some of the semantics for MPEG-4.
• The new semantics are carefully defined as part of the
MPEG-4 specification.
MPEG-4 Scheduler
• The main body of the Structured Audio definition.
• A set of carefully defined and somewhat complicated
instructions
• Specify how SAOL is used to create sound when it is driven
by MIDI or SASL.
AudioBIFS
• BIFS is the MPEG-4 Binary Format for Scene Description.
• Describes how the different ”objects” in a structured media
scene fit together:
– MPEG-4 also consists of video clips, sounds,
animations, and other pieces of multimedia
– Each has a special format to describe it.
– Need to put the pieces together
– BIFS lets you describe how to put the pieces together.
AudioBIFS (Cont.)
• AudioBIFS is designed for specifying the mixing and
post-production of audio scenes as they’re played back.
• For example,
– we can specify how the voice-track is mixed with the
background music, and
– that it fades out after 10 seconds and
– this other music comes in and has a nice reverb on it.
• Extended version of VRML: capabilities for
– streaming and
– mixing audio and video data
• Very advanced sound model.
AudioBIFS (Cont.)
How a simple sound is created from three elementary sound
streams:
Figure 23: AudioBIFS Subgraph
Graphic/Image File Formats
Common graphics and image file formats:
• http://www.dcs.ed.ac.uk/home/mxr/gfx/ —
comprehensive listing of various formats.
• See Encyclopedia of Graphics File Formats book in library
• Most formats incorporate compression
• Graphics, video and audio compression techniques in next
Chapter.
Graphic/Image Data Structures
• A digital image consists of many picture elements, termed
pixels.
• The number of pixels determines the quality of the image
(resolution).
• Higher resolution generally yields better quality.
• A bit-map representation stores the graphic/image data in the
same manner that the computer monitor contents are stored
in video memory.
Monochrome/Bit-Map Images
Figure 24: Sample Monochrome Bit-Map Image
• Each pixel is stored as a single bit (0 or 1)
• A 640 x 480 monochrome image requires 37.5 KB of storage.
• Dithering is often used for displaying monochrome images
Gray-scale Images
Figure 25: Example of a Gray-scale Bit-map Image
• Each pixel is usually stored as a byte (value between 0 and 255)
• A 640 x 480 greyscale image requires over 300 KB of storage.
8-bit Colour Images
Figure 26: Example of 8-Bit Colour Image
• One byte for each pixel
• Supports 256 of the millions of colours possible — acceptable
colour quality
• Requires Colour Look-Up Tables (LUTs)
• A 640 x 480 8-bit colour image requires 307.2 KB of storage (the
same as 8-bit greyscale)
24-bit Colour Images
Figure 27: Example of 24-Bit Colour Image
• Each pixel is represented by three bytes (e.g., RGB)
• Supports 256 x 256 x 256 possible combined colours (16,777,216)
• A 640 x 480 24-bit colour image would require 921.6 KB of
storage
• Most 24-bit images are 32-bit images,
– the extra byte of data for each pixel is used to store an alpha
value representing special effect information
Standard System Independent Formats
GIF (GIF87a, GIF89a)
• Graphics Interchange Format (GIF) devised by the UNISYS
Corp. and Compuserve, initially for transmitting graphical
images over phone lines via modems
• Uses the Lempel-Ziv-Welch (LZW) algorithm (a dictionary-based
coding scheme), modified slightly for image scan line packets (line
grouping of pixels) — Algorithm Soon
• Limited to only 8-bit (256) colour images, suitable for images
with few distinctive colours (e.g., graphics drawing)
• Supports interlacing
JPEG
• A standard for photographic image compression created by
the Joint Photographic Experts Group
• Takes advantage of limitations in the human vision system
to achieve high rates of compression
• Lossy compression which allows user to set the desired level
of quality/compression
• Algorithm Soon — Detailed discussions in next chapter on
compression.
TIFF
• Tagged Image File Format (TIFF), stores many different types
of images (e.g., monochrome, greyscale, 8-bit & 24-bit RGB,
etc.) –> tagged
• Developed by the Aldus Corp. in the 1980s and later
supported by Microsoft
• TIFF is a lossless format (when not utilizing the new JPEG
tag which allows for JPEG compression)
• It does not provide any major advantages over JPEG and is
not as user-controllable; it appears to be declining in popularity
Postscript/Encapsulated Postscript
• A typesetting language which includes text as well as
vector/structured graphics and bit-mapped images
• Used in several popular graphics programs (Illustrator,
FreeHand)
• Does not provide compression itself, so files are often large
• Although able to link to external compression applications
System Dependent Formats
Microsoft Windows: BMP
• A system standard graphics file format for Microsoft
Windows
• Used in Many PC Graphics programs, Cross-platform support
• It is capable of storing 24-bit bitmap images
Macintosh: PAINT and PICT
• PAINT was originally used in MacPaint program, initially only
for 1-bit monochrome images.
• PICT format was originally used in MacDraw (a vector based
drawing program) for storing structured graphics
• Still an underlying Mac format (although PDF on OS X)
X-windows: XBM
• Primary graphics format for the X Window system
• Stores monochrome (1-bit) bitmaps (the companion XPM format handles colour)
• Many public domain graphic editors, e.g., xv
• Used in X Windows for storing icons, pixmaps, backdrops,
etc.
Colour in Image and Video — Basics of Colour
Light and Spectra
• Visible light is an electromagnetic wave in the 400nm - 700
nm range.
• Most light we see is not one wavelength, it’s a combination
of many wavelengths (Fig. 28).
Figure 28: Light Wavelengths
• The profile above is called a spectrum.
The Human Retina
• The eye is basically similar to a camera
• It has a lens to focus light onto the retina of the eye
• The retina is full of neurons
• Each neuron is either a rod or a cone.
• Rods are not sensitive to colour.
Cones and Perception
• Cones come in 3 types: red, green and blue. Each responds
differently to various frequencies of light. The following figure
shows the spectral-response functions of the cones and the
luminous-efficiency function of the human eye.
Figure 29: Cones and Luminous-efficiency Function of the Human
Eye
• The colour signal to the brain comes from the response of
the 3 cones to the spectra being observed.
That is, the signal consists of 3 numbers:
    R = ∫ E(λ) S_R(λ) dλ
    G = ∫ E(λ) S_G(λ) dλ
    B = ∫ E(λ) S_B(λ) dλ

where E is the light spectrum and S_R, S_G, S_B are the cone
sensitivity functions.
Figure 30: Spectra Response
• A colour can be specified as the sum of three colours. So
colours form a 3 dimensional vector space.
• The following figure shows the amounts of three primaries
needed to match all the wavelengths of the visible spectrum.
Figure 31: Wavelengths of the Visible Spectrum
RGB Colour Space
Figure 32: Original Color Image
• Colour Space is made up of Red, Green and Blue intensity
components
Red, Green, Blue (RGB) Image Space
[Figure: the R, G and B component images of the original colour image]
CRT Displays
• CRT displays have three phosphors (RGB) which produce a
combination of wavelengths when excited with electrons.
• The gamut of colours is all colours that can be reproduced
using the three primaries
• The gamut of a colour monitor is smaller than that of some
colour models, e.g. the CIE LAB model — see later.
CIE Chromaticity Diagram
Does a set of primaries exist that span the space with only
positive coefficients?
• Yes, but no pure colours.
• In 1931, the CIE defined three standard primaries (X, Y, Z) .
The Y primary was intentionally chosen to be identical to the
luminous-efficiency function of human eyes.
• Figure 33 shows the amounts of X, Y, Z needed to exactly
reproduce any visible colour via the formulae:
Figure 33: Reproducing Visible Colour
    X = ∫ E(λ) x(λ) dλ
    Y = ∫ E(λ) y(λ) dλ
    Z = ∫ E(λ) z(λ) dλ
• All visible colours lie in a horseshoe-shaped cone in the
X-Y-Z space. Considering the plane X+Y+Z=1 and projecting it
onto the X-Y plane, we get the CIE chromaticity diagram as
shown in Fig. 34.
• The edges represent the pure colours (sine waves at the
appropriate frequency)
• White (a blackbody radiating at 6447 kelvin) is at the dot
• When added, any two colours (points on the CIE diagram)
produce a point on the line between them.
Figure 34: CIE Chromaticity Diagram
L*a*b (Lab) Colour Model
• A refined CIE model, named CIE L*a*b in 1976
• Luminance: L. Chrominance: a — ranges from green to red;
b — ranges from blue to yellow
• Used by Photoshop
Lab Image Space
[Figure: the original colour image and its L, a, b component images]
Colour Image and Video Representations
• Recap: A black and white image is a 2-D array of integers.
• Recap: A colour image is a 2-D array of (R,G,B) integer
triplets. These triplets encode how much the corresponding
phosphor should be excited in devices such as a monitor.
• An example is shown below.
Besides the RGB representation, YIQ and YUV are the two
most commonly used in video.
YIQ Colour Model
• YIQ is used in colour TV broadcasting; it is downward compatible
with B/W TV.
• Y (luminance) is the CIE Y primary.
Y = 0.299R + 0.587G + 0.114B
• The other two components:

    I = 0.596R − 0.275G − 0.321B
    Q = 0.212R − 0.528G + 0.311B

• The YIQ transform:

    [ Y ]   [ 0.299  0.587  0.114 ] [ R ]
    [ I ] = [ 0.596 −0.275 −0.321 ] [ G ]
    [ Q ]   [ 0.212 −0.528  0.311 ] [ B ]
• I is red-orange axis, Q is roughly orthogonal to I.
• Eye is most sensitive to Y, next to I, next to Q. In NTSC, 4 MHz is
allocated to Y, 1.5 MHz to I, 0.6 MHz to Q.
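A direct Python transcription of the transform (values assumed in [0, 1]; illustrative only):

def rgb_to_yiq(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.275 * g - 0.321 * b
    q = 0.212 * r - 0.528 * g + 0.311 * b
    return y, i, q

print(rgb_to_yiq(1.0, 1.0, 1.0))   # white: Y = 1.0, I and Q ~ 0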
YIQ Colour Space
[Figure: the original colour image and its Y, I, Q component images]
YUV (CCIR 601 or YCrCb) Color Model
• Established in 1982 to build digital video standard
• Video is represented by a sequence of fields (odd and even
lines). Two fields make a frame.
• Works in PAL (50 fields/sec) or NTSC (60 fields/sec)
• Uses the Y, Cr, Cb colour space (also called YUV)
Y = 0.299R + 0.587G + 0.114B
Cr = R − Y
Cb = B − Y
• The YCrCb (YUV) Transform:

    [ Y ]   [  0.299  0.587  0.114 ] [ R ]
    [ U ] = [ −0.169 −0.331  0.500 ] [ G ]
    [ V ]   [  0.500 −0.419 −0.081 ] [ B ]
YUV Colour Space
[Figure: the original colour image and its Y, U, V component images]
The CMY Colour Model
• Cyan, Magenta, and Yellow (CMY) are complementary colours
of RGB (Fig. 35). They can be used as Subtractive Primaries.
• CMY model is mostly used in printing devices where the
colour pigments on the paper absorb certain colours (e.g.,
no red light reflected from cyan ink).
Figure 35: The RGB and CMY Cubes
Conversion between RGB and CMY
E.g., convert White from (1, 1, 1) in RGB to (0, 0, 0) in CMY:

    [ C ]   [ 1 ]   [ R ]
    [ M ] = [ 1 ] − [ G ]
    [ Y ]   [ 1 ]   [ B ]

    [ R ]   [ 1 ]   [ C ]
    [ G ] = [ 1 ] − [ M ]
    [ B ]   [ 1 ]   [ Y ]
CMYK Color Model
• Sometimes, an alternative CMYK model (K stands for Black)
is used in colour printing (e.g., to produce darker black than
simply mixing CMY), where:

    K = min(C, M, Y)
    C = C − K
    M = M − K
    Y = Y − K
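In code (a direct Python transcription of the formulas above, values in [0, 1]):

def cmy_to_cmyk(c, m, y):
    k = min(c, m, y)            # K absorbs the common grey component
    return c - k, m - k, y - k, k

print(cmy_to_cmyk(0.2, 0.5, 0.9))   # -> (0.0, 0.3, 0.7, 0.2)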
CMYK Colour Space
[Figure: the original colour image and its C, M, Y, K component images]
Summary of Colour
• Colour images are encoded as triplets of values.
• Three common systems of encoding in video are RGB, YIQ,
and YCrCb.
• Besides the hardware-oriented colour models (i.e., RGB, CMY,
YIQ, YUV), HSB (Hue, Saturation, and Brightness, e.g., used
in Photoshop) and HLS (Hue, Lightness, and Saturation) are
also commonly used.
• YIQ uses properties of the human eye to prioritise information.
Y is the black and white (luminance) image, I and Q are the
colour (chrominance) images. YUV uses similar idea.
• YUV is a standard for digital video that specifies
image size, and decimates the chrominance images (for 4:2:2
video) — more soon.
Basics of Video
Types of Colour Video Signals
• Component video – each primary is sent as a separate video
signal.
– The primaries can either be RGB or a luminance-chrominance
transformation of them (e.g., YIQ, YUV).
– Best colour reproduction
– Requires more bandwidth and good synchronization of the
three components
• Composite video – colour (chrominance) and luminance signals
are mixed into a single carrier wave. Some interference between
the two signals is inevitable.
• S-Video (Separated video, e.g., in S-VHS) – a compromise
between component analog video and the composite video. It
uses two lines, one for luminance and another for composite
chrominance signal.
Analog Video
The following figures (Fig. 36 and 37) are from A.M. Tekalp,
Digital video processing, Prentice Hall PTR, 1995.
Figure 36: Raster Scanning
Figure 37: NTSC Signal
NTSC Video
• 525 scan lines per frame, 30 frames per second (or, to be exact,
29.97 fps, 33.37 msec/frame)
• Aspect ratio 4:3
• Interlaced, each frame is divided into 2 fields, 262.5 lines/field
• 20 lines reserved for control information at the beginning of
each field (Fig. 38)
– So a maximum of 485 lines of visible data
– Laser disc and S-VHS have an actual resolution of ~420 lines
– Ordinary TV — ~320 lines
NTSC Video Scan Line
• Each line takes 63.5 microseconds to scan. Horizontal retrace
takes 10 microseconds (with 5 microseconds horizontal synch
pulse embedded), so the active line time is 53.5 microseconds.
Figure 38: Digital Video Rasters
NTSC Video Colour Representation/Compression
• Colour representation:
– NTSC uses YIQ colour model.
– Composite = Y + I cos(Fsc t) + Q sin(Fsc t),
where Fsc is the frequency of colour subcarrier
– Basic Compression Idea
Eye is most sensitive to Y, next to I, next to Q.
– This is STILL Analog Compression:
In NTSC,
∗ 4 MHz is allocated to Y,
∗ 1.5 MHz to I,
∗ 0.6 MHz to Q.
– Similar (easier to work out) compression forms part of
digital compression — more soon
PAL Video
• 625 scan lines per frame, 25 frames per second
(40 msec/frame)
• Aspect ratio 4:3
• Interlaced, each frame is divided into 2 fields, 312.5 lines/field
• Colour representation:
– PAL uses YUV (YCbCr) colour model
– composite =
Y + 0.492 x U sin(Fsc t) + 0.877 x V cos(Fsc t)
– In PAL, 5.5 MHz is allocated to Y, 1.8 MHz each to U and
V.
Digital Video
• Advantages:
– Direct random access –> good for nonlinear video editing
– No problem for repeated recording
– No need for blanking and sync pulse
• Almost all digital video uses component video
Chroma Subsampling
Chroma subsampling is a method that stores color
information at lower resolution than intensity information.
• How to decimate for chrominance?
What do these numbers mean?
• 4:2:2 –> Horizontally subsampled colour signals by a factor
of 2. Each pixel is two bytes, e.g., (Cb0, Y0)(Cr0, Y1)(Cb2,
Y2)(Cr2, Y3)(Cb4, Y4) ...
• 4:1:1 –> Horizontally subsampled by a factor of 4
• 4:2:0 –> Subsampled in both the horizontal and vertical axes
by a factor of 2 between pixels.
• 4:1:1 and 4:2:0 are mostly used in JPEG and MPEG (see
Later).
Chroma Subsampling in Practice —
Analog/Digital Subsampling
• In analog video, this is simply a lower-frequency sampling of the chrominance signal
• Digital Subsampling: Perform 2x2 (or 1x2, or 1x4) chroma
subsampling:
– break the image into 2x2 (or 1x2, or 1x4) pixel blocks and
– only store the average colour information for each 2x2 (or
1x2, or 1x4) pixel group — see the sketch below.
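A minimal NumPy sketch of 2x2 chroma averaging (illustrative only; `chroma` stands for a Cb or Cr plane with even dimensions):

import numpy as np

def subsample_2x2(chroma):
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))   # one average per 2x2 pixel block

plane = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_2x2(plane))   # 2x2 result, each entry the mean of a block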
Digital Chroma Subsampling Errors (1)
This sampling process introduces two kinds of errors:
1. The major problem is that color is typically stored at only half
the horizontal and vertical resolution as the original image.
This is not a real problem:
• Recall: The human eye has lower resolving power for
color than for intensity.
• Nearly all digital cameras have lower resolution for color
than for intensity,
so there is no high resolution color information present in
digital camera images.
Digital Chroma Subsampling Errors (2)
2. The subsampling process demands two conversions of the
image:
• from the original RGB representation to an intensity+color
(YIQ/YUV) representation , and
• then back again (YIQ/YUV –> RGB) when the image is
displayed.
• Conversion is done in integer arithmetic — some round-off
error is introduced.
– This is a much smaller effect,
– But (slightly) affects the color of (typically) one or two
percent of the pixels in an image.
CCIR Standards for Digital Video
(CCIR – Consultative Committee for International Radio)
                       CCIR 601    CCIR 601     CIF         QCIF
                       525/60      625/50
                       NTSC        PAL/SECAM    NTSC
Luminance resolution   720 x 485   720 x 576    352 x 240   176 x 120
Chrominance resolut.   360 x 485   360 x 576    176 x 120   88 x 60
Colour Subsampling     4:2:2       4:2:2
Fields/sec             60          50           30          30
Interlacing            Yes         Yes          No          No
• CCIR 601 uses interlaced scan, so each field only has half as
much vertical resolution (e.g., 243 lines in NTSC).
The CCIR 601 (NTSC) data rate is ~165 Mbps.
• CIF (Common Intermediate Format) was introduced as an
acceptable temporary standard.
It delivers approximately VHS quality. CIF uses progressive
(non-interlaced) scan.
ATSC Digital Television Standard
(ATSC – Advanced Television Systems Committee)
The ATSC Digital Television Standard was recommended
to be adopted as the Advanced TV broadcasting standard by
the FCC Advisory Committee on Advanced Television Service
on November 28, 1995.
It covers the standard for HDTV (High Definition TV).
Video Format
The video scanning formats supported by the ATSC Digital
Television Standard are shown in the following table.
Vertical Lines   Horizontal Pixels   Aspect Ratio   Picture Rate
1080             1920                16:9           60I 30P 24P
720              1280                16:9           60P 30P 24P
480              704                 16:9 and 4:3   60I 60P 30P 24P
480              640                 4:3            60I 60P 30P 24P
• The aspect ratio for HDTV is 16:9 as opposed to 4:3 in NTSC,
PAL, and SECAM. (A 33% increase in horizontal dimension.)
• In the picture rate column, the I means interlaced scan, and
the P means progressive (non-interlaced) scan.
• Both NTSC rates and integer rates are supported (i.e., 60.00,
59.94, 30.00, 29.97, 24.00, and 23.98).
Compression I:
Basic Compression Algorithms
Recap: The Need for Compression
Raw Video, Image and Audio files are very large beasts:
Uncompressed Audio
1 minute of Audio:
Audio Type      44.1 KHz   22.05 KHz   11.025 KHz
16 Bit Stereo   10.1 Mb    5.05 Mb     2.52 Mb
16 Bit Mono     5.05 Mb    2.52 Mb     1.26 Mb
8 Bit Mono      2.52 Mb    1.26 Mb     630 Kb
Uncompressed Images
Image Type                      File Size
512 x 512 Monochrome            0.25 Mb
512 x 512 8-bit colour image    0.25 Mb
512 x 512 24-bit colour image   0.75 Mb
Video
Can involve: Stream of audio and images
Raw Video — uncompressed image frames, 512x512 True Colour, PAL: 1125 Mb per min
DV Video — 200-300 Mb per min (approx.), compressed
HDTV — Gigabytes per second.
• Relying on higher bandwidths is not a good option — M25
Syndrome.
• Compression HAS TO BE part of the representation of
audio, image and video formats.
Classifying Compression Algorithms
What is Compression?
Compression basically employs redundancy in the data:
• Temporal — in 1D data, 1D signals, Audio etc.
• Spatial — correlation between neighbouring pixels or data
items
• Spectral — correlation between colour or luminance
components.
This uses the frequency domain to exploit relationships
between frequency of change in data.
• Psycho-visual — exploit perceptual properties of the human
visual system.
Lossless v Lossy Compression
Compression can be categorised in two broad ways:
Lossless Compression — Entropy Encoding Schemes,
LZW algorithm used in GIF image file format.
Lossy Compression — Source Coding Transform Coding,
DCT used in JPEG/MPEG etc.
Lossy methods have to be employed for image and video
compression:
• The compression ratios of lossless methods (e.g., Huffman Coding,
Arithmetic Coding, LZW) are not high enough
Lossless Compression Algorithms:
Repetitive Sequence Suppression
• Fairly straightforward to understand and implement.
• Simplicity is their downfall: NOT best compression ratios.
• Some methods have their applications, e.g. Component of
JPEG, Silence Suppression.
Simple Repetition Suppression
If a series of n successive tokens appears:
• Replace series with a token and a count number of
occurrences.
• Usually need to have a special flag to denote when the
repeated token appears
For Example
89400000000000000000000000000000000
we can replace with
894f32
where f is the flag for zero.
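A minimal C sketch of this scheme (our own illustration; the flag handling is simplified and the input is assumed to contain only digits):

#include <stdio.h>

/* Sketch: zero-length suppression -- replace each run of '0' digits
   with the flag 'f' followed by the run length. */
void suppress_zeros(const char *s)
{
    while (*s) {
        if (*s == '0') {
            int run = 0;
            while (*s == '0') { run++; s++; }
            printf("f%d", run);
        } else {
            putchar(*s++);
        }
    }
    putchar('\n');
}

Calling suppress_zeros on the stream above would print 894f32.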
Simple Repetition Suppression: How Much Compression?
Compression savings depend on the content of the data.
Applications of this simple compression technique include:
• Suppression of zeros in a file (Zero Length Suppression)
– Silence in audio data, Pauses in conversation etc.
– Bitmaps
– Blanks in text or program source files
– Backgrounds in images
• Other regular image or data tokens
Lossless Compression Algorithms:
Run-length Encoding
This encoding method is frequently applied to images
(or pixels in a scan line).
It is a small compression component used in
JPEG compression.
In this instance:
• Sequences of image elements X1, X2, . . . , Xn (Row by Row)
• Mapped to pairs (c1, l1), (c2, l2), . . . , (cn, ln)
where ci represent image intensity or colour and li the length
of the ith run of pixels
• (Not dissimilar to zero length suppression above).
Run-length Encoding Example
Original Sequence:
111122233333311112222
can be encoded as:
(1,4),(2,3),(3,6),(1,4),(2,4)
How Much Compression?
The savings are dependent on the data.
In the worst case (random noise) the encoding is heavier
than the original file:
2 integers rather than 1 integer per value, if the data is represented as integers.
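A sketch of the encoder side in C (our own illustration, printing (symbol, length) pairs rather than a packed binary stream):

#include <stdio.h>

/* Sketch: run-length encode a string of symbols as (symbol,length) pairs. */
void rle_encode(const char *s)
{
    while (*s) {
        char c = *s;
        int len = 0;
        while (*s == c) { len++; s++; }
        printf("(%c,%d)", c, len);
    }
    putchar('\n');
}

rle_encode("111122233333311112222") prints (1,4)(2,3)(3,6)(1,4)(2,4), as above.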
Lossless Compression Algorithms:
Pattern Substitution
This is a simple form of statistical encoding.
Here we substitute a frequently repeating pattern(s) with a
code.
The code is shorter than the pattern, giving us
compression.
A simple Pattern Substitution scheme could employ predefined
codes
Simple Pattern Substitution Example
For example replace all occurrences of ‘The’ with the
predefined code ’&’.
So:
The code is The Key
Becomes:
& code is & Key
Similar for other codes — commonly used words
Token Assignment
More typically, tokens are assigned according to the frequency of
occurrence of patterns:
• Count occurrences of tokens
• Sort in descending order
• Assign some symbols to the highest-count tokens
A predefined symbol table may be used, i.e. assign code i to
token T. (E.g. some dictionary of common words/tokens)
However, it is more usual to dynamically assign codes to tokens.
The entropy encoding schemes below basically attempt to
decide the optimum assignment of codes to achieve the best
compression.
Lossless Compression Algorithms
Entropy Encoding
• Lossless Compression frequently involves some form of
entropy encoding
• Based on information theoretic techniques.
Basics of Information Theory
According to Shannon, the entropy of an information source S
is defined as:
H(S) = η = Σ_i p_i log2(1/p_i)

where p_i is the probability that symbol S_i in S will occur.
• log2(1/p_i) indicates the amount of information contained in S_i,
i.e., the number of bits needed to code S_i.
• For example, in an image with uniform distribution of gray-level
intensity, i.e. p_i = 1/256, then
– The number of bits needed to code each gray level is 8
bits.
– The entropy of this image is 8.
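The entropy formula translates directly into code. A minimal sketch (ours) over a probability table:

#include <math.h>

/* Sketch: Shannon entropy in bits/symbol from a probability table p[0..n-1].
   Zero-probability symbols contribute nothing. */
double entropy(const double *p, int n)
{
    double h = 0.0;
    for (int i = 0; i < n; i++)
        if (p[i] > 0.0)
            h += p[i] * log2(1.0 / p[i]);
    return h;
}

For the token stream in the next example (counts 15, 7, 6, 6, 5 out of 39), this gives roughly 2.19 bits/symbol — the ideal figure quoted later under Huffman Entropy.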
The Shannon-Fano Algorithm — Learn by Example
This is a basic information theoretic algorithm.
A simple example will be used to illustrate the algorithm:
A finite token Stream:
ABBAAAACDEAAABBBDDEEAAA........
Count symbols in stream:
Symbol   A    B    C    D    E
------------------------------
Count    15   7    6    6    5
Encoding for the Shannon-Fano Algorithm:
• A top-down approach
1. Sort symbols (Tree Sort) according to their
frequencies/probabilities, e.g., ABCDE.
2. Recursively divide into two parts, each with approx. same
number of counts.
3. Assemble code by depth first traversal of tree to symbol
node
Symbol   Count   log(1/p)   Code   Subtotal (# of bits)
------   -----   --------   ----   --------------------
A        15      1.38       00     30
B        7       2.48       01     14
C        6       2.70       10     12
D        6       2.70       110    18
E        5       2.96       111    15

                 TOTAL (# of bits): 89
4. Transmit Codes instead of Tokens
• Raw token stream 8 bits per (39 chars) token = 312 bits
• Coded data stream = 89 bits
Huffman Coding
• Based on the frequency of occurrence of a data item
(pixels or small blocks of pixels in images).
• Use a lower number of bits to encode more frequent data
• Codes are stored in a Code Book — as for Shannon (previous
slides)
• Code book constructed for each image or a set of images.
• Code book plus encoded data must be transmitted to enable
decoding.
Encoding for Huffman Algorithm:
• A bottom-up approach
1. Initialization: Put all nodes in an OPEN list, keep it sorted
at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest
frequencies/probabilities, create a parent node of them.
(b) Assign the sum of the children’s frequencies/probabilities
to the parent node and insert it into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and
delete the children from OPEN.
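A compact C sketch of this bottom-up build, hard-wired to the five-symbol example that follows (our own illustration; it uses array scans instead of a sorted OPEN list, and its tie-breaking may differ from the table below, but the code lengths give the same 87-bit total):

#include <stdio.h>

#define NSYM 5

/* Sketch: bottom-up Huffman build for the ABCDE example. */
int main(void)
{
    const char *sym = "ABCDE";
    int count[2 * NSYM - 1] = {15, 7, 6, 6, 5};    /* leaf counts        */
    int parent[2 * NSYM - 1], bit[2 * NSYM - 1];
    int live[2 * NSYM - 1];

    for (int i = 0; i < 2 * NSYM - 1; i++) live[i] = (i < NSYM);

    for (int m = NSYM; m < 2 * NSYM - 1; m++) {    /* NSYM-1 merges      */
        int lo1 = -1, lo2 = -1;                    /* two lowest counts  */
        for (int i = 0; i < m; i++) {
            if (!live[i]) continue;
            if (lo1 < 0 || count[i] < count[lo1]) { lo2 = lo1; lo1 = i; }
            else if (lo2 < 0 || count[i] < count[lo2]) { lo2 = i; }
        }
        parent[lo1] = m; bit[lo1] = 0;             /* branch codes 0, 1  */
        parent[lo2] = m; bit[lo2] = 1;
        live[lo1] = live[lo2] = 0;                 /* delete children    */
        count[m] = count[lo1] + count[lo2];        /* parent = sum       */
        live[m] = 1;
    }

    for (int s = 0; s < NSYM; s++) {               /* read codes off     */
        char code[NSYM];
        int len = 0;
        for (int i = s; i != 2 * NSYM - 2; i = parent[i])
            code[len++] = (char)('0' + bit[i]);
        printf("%c: ", sym[s]);
        while (len--) putchar(code[len]);          /* root-to-leaf order */
        putchar('\n');
    }
    return 0;
}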
Symbol   Count   log(1/p)   Code   Subtotal (# of bits)
------   -----   --------   ----   --------------------
A        15      1.38       0      15
B        7       2.48       100    21
C        6       2.70       101    18
D        6       2.70       110    18
E        5       2.96       111    15

                 TOTAL (# of bits): 87
The following points are worth noting about the above algorithm:
• Decoding for the above two algorithms is trivial as long as
the coding table/book is sent before the data.
– There is a bit of an overhead for sending this.
– But negligible if the data file is big.
• Unique Prefix Property: no code is a prefix to any other
code (all symbols are at the leaf nodes) –> great for decoder,
unambiguous.
• If prior statistics are available and accurate, then Huffman
coding is very good.
Huffman Entropy
In the above example:
Ideal entropy = (15 × 1.38 + 7 × 2.48 + 6 × 2.7 + 6 × 2.7 + 5 × 2.96)/39
              = 85.26/39
              = 2.19
Number of bits needed for Huffman Coding is: 87/39 = 2.23
Huffman Coding of Images
In order to encode images:
• Divide image up into (typically) 8x8 blocks
• Each block is a symbol to be coded
• Compute Huffman codes for the set of blocks
• Encode blocks accordingly
• In JPEG: Blocks are DCT coded first before Huffman may be
applied (More soon)
Coding image in blocks is common to all image coding methods
Adaptive Huffman Coding
Motivations:
(a) The previous algorithms require prior statistical knowledge
• This may not be available
• E.g. live audio, video
(b) Even when stats are dynamically available,
• there is a heavy overhead if many tables have to be sent — tables may
change drastically
• A non-order-0 model may be used,
• i.e. taking into account the impact of the previous symbol on
the probability of the current symbol can improve efficiency.
• E.g., "q" and "u" often come together, ....
Solution: Use adaptive algorithms
As an example, the Adaptive Huffman Coding is examined
below.
The idea is however applicable to other adaptive compression
algorithms.
ENCODER                                 DECODER
-------                                 -------
Initialize_model();                     Initialize_model();
while ((c = getc(input)) != eof)        while ((c = decode(input)) != eof)
{                                       {
    encode(c, output);                      putc(c, output);
    update_model(c);                        update_model(c);
}                                       }
• Key: encoder and decoder use same initialization and
update model routines.
• update model does two things:
(a) increment the count,
(b) update the Huffman tree.
– During the updates, the Huffman tree maintains
its sibling property, i.e. the nodes (internal and leaf) are
arranged in order of increasing weights.
– When swapping is necessary, the farthest node with weight
W is swapped with the node whose weight has just been
increased to W+1.
– Note: If the node with weight W has a subtree beneath it,
then the subtree will go with it.
– The Huffman tree could look very different after swapping
Arithmetic Coding
• A widely used entropy coder
• Also used in JPEG — more soon
• Its only problem is speed, due to the possibly complex
computations caused by large symbol tables,
• Good compression ratio (better than Huffman coding),
entropy around the Shannon Ideal value.
Why better than Huffman?
• Huffman coding etc. use an integer number (k) of bits for
each symbol,
– hence k is never less than 1.
• Sometimes, e.g., when sending a 1-bit image, compression
becomes impossible.
Decimal Static Arithmetic Coding
• Here we describe the basic approach of Arithmetic Coding:
• initially the basic static coding mode of operation,
• with an initial example in decimal coding,
• extended to binary and then machine word length later.
Basic Idea
The idea behind arithmetic coding is
• To have a probability line, 0–1, and
• Assign to every symbol a range in this line based on its
probability,
• The higher the probability, the larger the range assigned to it.
Once we have defined the ranges and the probability line,
• Start to encode symbols,
• Every symbol defines where the output floating point number
lands within the range.
Simple Basic Arithmetic Coding Example
Assume we have the following token symbol stream
BACA
Therefore
• A occurs with probability 0.5,
• B and C with probabilities 0.25.
Basic Arithmetic Coding Algorithm
Start by assigning each symbol to the probability range 0–1.
• Sort symbols highest probability first
Symbol   Range
A        [0.0, 0.5)
B        [0.5, 0.75)
C        [0.75, 1.0)
The first symbol in our example stream is B
• We now know that the code will be in the range 0.5 to 0.74999 . . ..
Range is not yet unique
• Need to narrow down the range to give us a unique code.
Basic arithmetic coding iteration
• Subdivide the range for the first token given the probabilities
of the second token then the third etc.
Subdivide the range as follows
For all the symbols
• Range = high - low
• High = low + range * high range of the symbol being coded
• Low = low + range * low range of the symbol being coded
Where:
• Range, keeps track of where the next range should be.
• High and low, specify the output number.
• Initially High = 1.0, Low = 0.0
Back to our example
For the second symbol we have
(now Range = 0.25, Low = 0.5, High = 0.75):

Symbol   Range
BA       [0.5, 0.625)
BB       [0.625, 0.6875)
BC       [0.6875, 0.75)
Third Iteration
We now reapply the subdivision of our scale again to get for
our third symbol
(Range = 0.125, Low = 0.5, High = 0.625):
Symbol   Range
BAA      [0.5, 0.5625)
BAB      [0.5625, 0.59375)
BAC      [0.59375, 0.625)
Fourth Iteration
Subdivide again
(Range = 0.03125, Low = 0.59375, High = 0.625):
Symbol   Range
BACA     [0.59375, 0.609375)
BACB     [0.609375, 0.6171875)
BACC     [0.6171875, 0.625)

So the (unique) output code for BACA is any number in the
range:
[0.59375, 0.609375).
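The whole iteration can be captured in a few lines of C. A sketch (ours), hard-wired to the example alphabet and using double precision, so it only works for short messages:

#include <stdio.h>

/* Sketch: static decimal arithmetic coding of "BACA" with the ranges
   A=[0.0,0.5), B=[0.5,0.75), C=[0.75,1.0) from the example. */
int main(void)
{
    double lo_tab[128] = {0}, hi_tab[128] = {0};
    lo_tab['A'] = 0.00; hi_tab['A'] = 0.50;
    lo_tab['B'] = 0.50; hi_tab['B'] = 0.75;
    lo_tab['C'] = 0.75; hi_tab['C'] = 1.00;

    double low = 0.0, high = 1.0;
    for (const char *p = "BACA"; *p; p++) {
        double range = high - low;           /* Range = high - low      */
        high = low + range * hi_tab[(int)*p];/* narrow to symbol's high */
        low  = low + range * lo_tab[(int)*p];/* narrow to symbol's low  */
    }
    printf("any number in [%g, %g) encodes BACA\n", low, high);
    return 0;
}

It prints the interval [0.59375, 0.609375) found above.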
Decoding
To decode is essentially the opposite:
• We compile the table for the sequence given the probabilities.
• Find the range within which the code number lies, emit the
corresponding symbol, and carry on.
Binary static arithmetic coding
This is very similar to above:
• except we use binary fractions.
Binary fractions are simply an extension of the binary system
into fractions, much like decimal fractions.
Binary Fractions — Quick Guide
Fractions in decimal:
0.1  decimal = 1/10^1 = 1/10
0.01 decimal = 1/10^2 = 1/100
0.11 decimal = 1/10^1 + 1/10^2 = 11/100

So in binary we get
0.1  binary = 1/2^1 = 1/2 decimal
0.01 binary = 1/2^2 = 1/4 decimal
0.11 binary = 1/2^1 + 1/2^2 = 3/4 decimal
Binary Arithmetic Coding Example
• Idea: Suppose alphabet was X, Y and token stream:
XXY
Therefore:
prob(X) = 2/3
prob(Y) = 1/3
• If we are only concerned with encoding length 2 messages,
then we can map all possible messages to intervals in the
range [0..1]:
• To encode message, just send enough bits of a binary fraction
that uniquely specifies the interval.
• Similarly, we can map all possible length 3 messages to
intervals in the range [0..1]:
Implementation Issues
FPU Precision
• Resolution of the number we represent is limited by
FPU precision
• Binary coding is an extreme example of rounding
• Decimal coding is the other extreme — theoretically no
rounding.
• Some FPUs may use up to 80 bits
• As an example let us consider working with 16 bit resolution.
16-bit arithmetic coding
We now encode the range 0–1 into 65535 segments:
0.000    0.250    0.500    0.750    1.000
0000h    4000h    8000h    C000h    FFFFh

If we take a number and divide it by the maximum (FFFFh) we
will clearly see this:

0000h:  0/65535     = 0.0
4000h:  16384/65535 = 0.25
8000h:  32768/65535 = 0.5
C000h:  49152/65535 = 0.75
FFFFh:  65535/65535 = 1.0
The operation of coding is similar to what we have seen with
the binary coding:
• Adjust the probabilities so the bits needed for operating with
the number aren’t above 16 bits.
• Define a new interval
• The way to deal with the infinite number is
– to have only loaded the 16 first bits, and when needed
shift more onto it:
1100 0110 0001 000 0011 0100 0100 ...
– work only with those bytes
– as new bits are needed they’ll be shifted.
Memory Problems
What about an alphabet with 26 symbols, or 256 symbols, ...?
• In general, number of bits is determined by the size of the
interval.
• In general, (from entropy) need − log p bits to represent interval
of size p.
• Can be memory and CPU intensive
Estimating Probabilities - Dynamic Arithmetic Coding?
How to determine probabilities?
• If we have a static stream we simply count the tokens.
We could use a priori information, for static or dynamic streams,
if the scenario is familiar.
But for Dynamic Data?
• Simple idea is to use adaptive model:
– Start with guess of symbol frequencies — or all equal
probabilities
– Update frequency with each new symbol.
• Another idea is to take account of inter-symbol probabilities,
e.g., Prediction by Partial Matching.
Lempel-Ziv-Welch (LZW) Algorithm
• A very common compression technique.
• Used in GIF files (LZW), Adobe PDF file (LZW), UNIX compress
(LZ Only)
• Patented — LZW not LZ.
Basic idea/Example by Analogy:
Suppose we want to encode the Oxford Concise English
dictionary which contains about 159,000 entries.
Why not just transmit each word as an 18 bit number?
Problems:
• Too many bits,
• Everyone needs a dictionary,
• Only works for English text.
Solution:
• Find a way to build the dictionary adaptively.
• Original methods (LZ) due to Lempel and Ziv in 1977/8.
• Terry Welch improved the scheme in 1984,
Patented LZW Algorithm
LZW Compression Algorithm
The LZW Compression Algorithm can summarised as follows:
w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
• Original LZW used dictionary with 4K entries, first 256 (0-255)
are ASCII codes.
Example:
Input string is "^WED^WE^WEE^WEB^WET".

w      k     output   index   symbol
-------------------------------------
NIL    ^
^      W     ^        256     ^W
W      E     W        257     WE
E      D     E        258     ED
D      ^     D        259     D^
^      W
^W     E     256      260     ^WE
E      ^     E        261     E^
^      W
^W     E
^WE    E     260      262     ^WEE
E      ^
E^     W     261      263     E^W
W      E
WE     B     257      264     WEB
B      ^     B        265     B^
^      W
^W     E
^WE    T     260      266     ^WET
T      EOF   T

A 19-symbol input has been reduced to a 7-symbol plus 5-code
output. Each code/symbol will need more than 8 bits, say 9 bits.
Usually, compression doesn't start until a large number of bytes
(e.g., > 100) are read in.
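For reference, a small runnable C sketch of the encoder (our own illustration, not from the original notes; it uses a naive linear-scan string dictionary rather than the fast fixed-bit table lookups noted later):

#include <stdio.h>
#include <string.h>

/* Sketch: LZW encoder for byte strings.  Codes 0-255 are single bytes;
   new dictionary entries start at 256, as in the example above. */
#define MAXD 4096
#define MAXS 64

static char dict[MAXD][MAXS];
static int ndict;

static int lookup(const char *s)
{
    for (int i = 0; i < ndict; i++)
        if (strcmp(dict[i], s) == 0) return i;
    return -1;
}

void lzw_encode(const char *input)
{
    ndict = 256;
    for (int i = 0; i < 256; i++) { dict[i][0] = (char)i; dict[i][1] = '\0'; }

    char w[MAXS] = "";
    for (const char *p = input; *p; p++) {
        char wk[MAXS];
        snprintf(wk, sizeof wk, "%s%c", w, *p);
        if (lookup(wk) >= 0) {
            strcpy(w, wk);                 /* extend the current match */
        } else {
            strcpy(dict[ndict++], wk);     /* add wk to the dictionary */
            printf("%d ", lookup(w));      /* output the code for w    */
            w[0] = *p; w[1] = '\0';        /* w = k                    */
        }
    }
    if (w[0]) printf("%d", lookup(w));
    putchar('\n');
}

Fed the string above it prints the 12 output codes (literal bytes appear as their ASCII values, e.g. 94 for '^').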
LZW Decompression Algorithm
The LZW Decompression Algorithm is as follows:
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
entry = dictionary entry for k;
output entry;
add w + entry[0] to dictionary;
w = entry;
}
Example (continued):
Input string is
"^WED<256>E<260><261><257>B<260>T"

w       k       output   index   symbol
----------------------------------------
        ^       ^
^       W       W        256     ^W
W       E       E        257     WE
E       D       D        258     ED
D       <256>   ^W       259     D^
<256>   E       E        260     ^WE
E       <260>   ^WE      261     E^
<260>   <261>   E^       262     ^WEE
<261>   <257>   WE       263     E^W
<257>   B       B        264     WEB
B       <260>   ^WE      265     B^
<260>   T       T        266     ^WET
Problems?
• What if we run out of dictionary space?
– Solution 1: Keep track of unused entries and use LRU
– Solution 2: Monitor compression performance and flush
dictionary when performance is poor.
• Implementation Note: LZW can be made really fast;
– it grabs a fixed number of bits from input stream,
– so bit parsing is very easy.
– Table lookup is automatic.
Entropy Encoding Summary
• Huffman maps fixed length symbols to variable length codes.
Optimal only when symbol probabilities are powers of 1/2.
• Arithmetic maps entire message to real number range based
on statistics. Theoretically optimal for long messages, but
optimality depends on data model. Also can be CPU/memory
intensive.
• Lempel-Ziv-Welch is a dictionary-based compression method.
It maps a variable number of symbols to a fixed length code.
• Adaptive algorithms do not need a priori estimation of
probabilities, they are more useful in real applications.
Lossy Compression: Source Coding Techniques
Source coding is based on changing the content of the original
signal.
Also called semantic-based coding.

Compression rates may be high, but at the price of a loss of
information. Good compression rates may be achieved with
source encoding with (occasionally) lossless or (mostly) little
perceivable loss of information.
There are three broad methods that exist:
• Transform Coding
• Differential Encoding
• Vector Quantisation
Transform Coding
A simple transform coding example
A simple transform encoding procedure may be described by
the following steps for a 2x2 block of monochrome pixels:

1. Take the top left pixel as the base value for the block, pixel A.
2. Calculate three other transformed values by taking the
difference between these (respective) pixels and pixel A,
i.e. B-A, C-A, D-A.
3. Store the base pixel and the differences as the values of the
transform.
Simple Transforms
Given the above we can easily form the forward transform:
X0 = A
X1 = B − A
X2 = C − A
X3 = D − A

and the inverse transform is:

An = X0
Bn = X1 + X0
Cn = X2 + X0
Dn = X3 + X0
Compressing data with this Transform?
Exploit redundancy in the data:
• Redundancy transformed to values, Xi.
• Compress the data by using fewer bits to represent the
differences.
– I.e. if we use 8 bits per pixel then the 2x2 block uses 32
bits
– If we keep 8 bits for the base pixel, X0,
– and assign 4 bits for each difference, then we only use 20 bits
– an average of only 5 bits/pixel
Example
Consider the following 2x2 image block:

120 130
125 120

then we get:

X0 = 120
X1 = 10
X2 = 5
X3 = 0
We can then compress these values by taking less bits to
represent the data.
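In code, the forward and inverse transforms are one line each per value; a sketch (ours):

/* Sketch: forward and inverse of the simple 2x2 difference transform.
   p holds the pixels {A, B, C, D}; X holds the transformed values. */
void forward_transform(const int p[4], int X[4])
{
    X[0] = p[0];
    X[1] = p[1] - p[0];
    X[2] = p[2] - p[0];
    X[3] = p[3] - p[0];
}

void inverse_transform(const int X[4], int p[4])
{
    p[0] = X[0];
    p[1] = X[1] + X[0];
    p[2] = X[2] + X[0];
    p[3] = X[3] + X[0];
}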
Inadequacies of Simple Scheme
• It is Too Simple
• Needs to operate on larger blocks (typically 8x8 min)
• Simple encoding of differences for large values will result in
loss of information
– Very poor results are possible here: 4 bits per pixel gives values 0-15
unsigned,
– or a signed value range of −7 to 7, so we must either quantise
in multiples of 255/(max value) or suffer massive overflow!!
• More advanced transform encoding techniques are very
common – DCT
Frequency Domain Methods
Frequency domains can be obtained through the
transformation from one (Time or Spatial) domain to the other
(Frequency) via
• Discrete Cosine Transform (DCT)— Heart of JPEG and
MPEG Video, (alt.) MPEG Audio.
• Fourier Transform (FT) — MPEG Audio
1D Example
Let's consider a 1D (e.g. Audio) example to see what the different
domains mean:
Consider a complicated sound such as the noise of a car
horn. We can describe this sound in two related ways:
• Sample the amplitude of the sound many times a second,
which gives an approximation to the sound as a function of
time.
• Analyse the sound in terms of the pitches of the notes, or
frequencies, which make the sound up, recording the
amplitude of each frequency.
An 8 Hz Sine Wave
In the example (next slide):
• A signal that consists of a sinusoidal wave at 8 Hz.
• 8 Hz means that the wave completes 8 cycles in 1 second,
• which is the frequency of that wave (8 Hz).
• From the frequency domain we can see that the composition
of our signal is
– one wave (one peak) occurring with a frequency of 8 Hz
– with a magnitude/fraction of 1.0 i.e. it is the whole signal.
An 8 Hz Sine Wave (Cont.)

[Figure: the 8 Hz sine wave in the time domain and its single-peak frequency spectrum]
2D Image Example
Now images are no more complex really:
• Brightness along a line can be recorded as a set of values
measured at equally spaced distances apart,
• Or equivalently, at a set of spatial frequency values.
• Each of these frequency values is a frequency component.
• An image is a 2D array of pixel measurements.
• We form a 2D grid of spatial frequencies.
• A given frequency component now specifies what contribution
is made by data which is changing with specified x and y
direction spatial frequencies.
What do frequencies mean in an image?
• Large values at high frequency components mean the data
is changing rapidly on a short distance scale,
e.g. a page of text.
• Large low frequency components mean the large scale features
of the picture are more important,
e.g. a single fairly simple object which occupies most of the
image.
So How Compress (colour) images?
• Compute the 2D matrix of frequency content with regard to
colour/chrominance:
• This shows whether values are changing rapidly or slowly.
• Where the fraction, or value in the frequency matrix is low,
the colour is changing gradually.
• Human eye is insensitive to gradual changes in colour and
sensitive to intensity.
• Ignore gradual changes in colour SO
• Basic Idea: Attempt to throw away data without the human
eye noticing, we hope.
How can the Frequency Domain Transforms Help to Compress?
Any function (signal) can be decomposed into purely sinusoidal
components (sine waves of different size/shape) which when
added together make up our original signal.
Figure 39: DFT of a Square Wave
Thus transforming a signal into the frequency domain allows us
• To see what sine waves make up our underlying signal
• E.g.
– One part sinusoidal wave at 50 Hz and
– Second part sinusoidal wave at 200 Hz.
More complex signals will give more complex graphs but the
idea is exactly the same. The graph of the frequency domain is
called the frequency spectrum.
Visualising this: Think Graphic Equaliser
An easy way to visualise what is happening is to think of a
graphic equaliser on a stereo.
Figure 40: A Graphic Equaliser
Fourier Theory
The tool which converts a spatial (real space) description of
an image into one in terms of its frequency components is called
the Fourier transform
The new version is usually referred to as the Fourier space
description of the image.
The corresponding inverse transformation which turns a Fourier
space description back into a real space one is called the
inverse Fourier transform.
1D Case
Considering a continuous function f(x) of a single variable x
representing distance, the Fourier transform of that function is
denoted F(u), where u represents spatial frequency, and is defined by

F(u) = ∫_{−∞}^{∞} f(x) e^{−2πixu} dx.   (1)
Note: In general F (u) will be a complex quantity even though
the original data is purely real.
The meaning of this is that not only is the magnitude of each
frequency present important, but that its phase relationship is
too.
Inverse 1D Fourier Transform
The inverse Fourier transform for regenerating f(x) from F(u) is
given by

f(x) = ∫_{−∞}^{∞} F(u) e^{2πixu} du,   (2)
which is rather similar, except that the exponential term has
the opposite sign.
Example Fourier Transform
Let’s see how we compute a Fourier Transform: consider a
particular function f(x) defined as

f(x) = 1 if |x| ≤ 1,  0 otherwise.   (3)

Figure 41: A top hat function
So its Fourier transform is:

F(u) = ∫_{−∞}^{∞} f(x) e^{−2πixu} dx
     = ∫_{−1}^{1} 1 × e^{−2πixu} dx
     = (−1/(2πiu)) (e^{−2πiu} − e^{2πiu})
     = sin(2πu) / (πu).   (4)
In this case F (u) is purely real, which is a consequence of the
original data being symmetric in x and −x.
A graph of F (u) is shown overleaf.
This function is often referred to as the Sinc function.
The Sinc Function
Figure 42: Fourier transform of a top hat function
2D Case
If f (x, y) is a function, for example the brightness in an image,
its Fourier transform is given by
F(u, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−2πi(xu+yv)} dx dy,   (5)

and the inverse transform, as might be expected, is

f(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(u, v) e^{2πi(xu+yv)} du dv.   (6)
Images are digitised !!
Thus, we need a discrete formulation of the Fourier transform,
which takes such regularly spaced data values, and returns the
value of the Fourier transform for a set of values in frequency
space which are equally spaced.
This is done quite naturally by replacing the integral by a
summation, to give the discrete Fourier transform or DFT for
short.
In 1D it is convenient now to assume that x goes up in steps
of 1, and that there are N samples, at values of x from 0 to N −1.
1D Discrete Fourier transform
So the DFT takes the form

F(u) = (1/N) Σ_{x=0}^{N−1} f(x) e^{−2πixu/N},   (7)

while the inverse DFT is

f(x) = Σ_{u=0}^{N−1} F(u) e^{2πixu/N}.   (8)
NOTE: Minor changes from the continuous case are a factor
of 1/N in the exponential terms, and also the factor 1/N in front
of the forward transform which does not appear in the inverse
transform.
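Equation (7) translates directly into a double loop. A naive O(N²) sketch in C99 (ours — production code would use an FFT):

#include <complex.h>

/* Sketch: naive 1D DFT following Eq. (7), with the 1/N factor in front
   of the forward transform only. */
void dft(const double *f, double complex *F, int N)
{
    const double PI = 3.14159265358979323846;
    for (int u = 0; u < N; u++) {
        double complex sum = 0;
        for (int x = 0; x < N; x++)
            sum += f[x] * cexp(-2.0 * PI * I * x * u / N);
        F[u] = sum / N;
    }
}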
2D Discrete Fourier transform
The 2D DFT works similarly. So for an N × M grid in x and y
we have

F(u, v) = (1/(NM)) Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} f(x, y) e^{−2πi(xu/N + yv/M)},   (9)

and

f(x, y) = Σ_{u=0}^{N−1} Σ_{v=0}^{M−1} F(u, v) e^{2πi(xu/N + yv/M)}.   (10)
Balancing the 2D DFT
Often N = M, and it is then more convenient to redefine
F(u, v) by multiplying it by a factor of N, so that the forward and
inverse transforms are more symmetrical:

F(u, v) = (1/N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) e^{−2πi(xu+yv)/N},   (11)

and

f(x, y) = (1/N) Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} F(u, v) e^{2πi(xu+yv)/N}.   (12)
Compression
How do we achieve compression?
• Low pass filter — ignore high frequency noise components
• Only store lower frequency components
• High pass filter — spot gradual changes
• If changes are too low, the eye does not respond, so ignore them?
• Where do we put the threshold to cut off?
Relationship between DCT and FFT
DCT (Discrete Cosine Transform) is actually a cut-down version
of the FFT:
• Only the real part of FFT
• Computationally simpler than FFT
• DCT — Effective for Multimedia Compression
• DCT MUCH more commonly used in Multimedia.
The Discrete Cosine Transform (DCT)
• Similar to the discrete Fourier transform:
– it transforms a signal or image from the spatial domain to
the frequency domain
– DCT can approximate lines well with fewer coefficients
Figure 43: DCT Encoding
• Helps separate the image into parts (or spectral sub-bands)
of differing importance (with respect to the image’s visual
quality).
1D DCT
For N data items the 1D DCT is defined by:

F(u) = (2/N)^(1/2) Σ_{i=0}^{N−1} Λ(i) · cos[ (π·u / 2·N) (2i + 1) ] f(i)

and the corresponding inverse 1D DCT transform is simply
F^(−1)(u), i.e.:

f(i) = F^(−1)(u)
     = (2/N)^(1/2) Σ_{u=0}^{N−1} Λ(u) · cos[ (π·u / 2·N) (2i + 1) ] F(u)

where

Λ(ξ) = 1/√2 for ξ = 0, 1 otherwise
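As a cross-check of the formula, a direct C sketch of the forward 1D DCT (ours) in the usual orthonormal form, where Λ is applied to the output index u:

#include <math.h>

/* Sketch: direct O(N^2) forward 1D DCT (orthonormal normalisation). */
void dct1d(const double *f, double *F, int N)
{
    const double PI = 3.14159265358979323846;
    for (int u = 0; u < N; u++) {
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += f[i] * cos(PI * u * (2 * i + 1) / (2.0 * N));
        double lambda = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
        F[u] = sqrt(2.0 / N) * lambda * sum;
    }
}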
2D DCT
For a 2D N by M image the 2D DCT is defined:

F(u, v) = (2/N)^(1/2) (2/M)^(1/2) Σ_{i=0}^{N−1} Σ_{j=0}^{M−1} Λ(i) · Λ(j) ·
          cos[ (π·u / 2·N) (2i + 1) ] · cos[ (π·v / 2·M) (2j + 1) ] · f(i, j)

and the corresponding inverse 2D DCT transform is simply F^(−1)(u, v),
i.e.:

f(i, j) = F^(−1)(u, v)
        = (2/N)^(1/2) (2/M)^(1/2) Σ_{u=0}^{N−1} Σ_{v=0}^{M−1} Λ(u) · Λ(v) ·
          cos[ (π·u / 2·N) (2i + 1) ] · cos[ (π·v / 2·M) (2j + 1) ] · F(u, v)

where

Λ(ξ) = 1/√2 for ξ = 0, 1 otherwise
Performing DCT Computations
The basic operation of the DCT is as follows:
• The input image is N by M;
• f(i,j) is the intensity of the pixel in row i and column j;
• F(u,v) is the DCT coefficient in row u and column v of the
DCT matrix.
• The DCT input is an 8 by 8 array of integers.
This array contains each image window’s gray scale pixel
levels;
• 8 bit pixels have levels from 0 to 255.
Compression with the DCT
• For most images, much of the signal energy lies at low
frequencies;
– These appear in the upper left corner of the DCT.
• Compression is achieved since the lower right values
represent higher frequencies, and are often small
– Small enough to be neglected with little visible distortion.
Computational Issues (1)
• Image is partitioned into 8 x 8 regions — The DCT input is
an 8 x 8 array of integers.
• An 8 point DCT would be:

F(u, v) = (1/4) Σ_{i,j} Λ(i) · Λ(j) · cos[ (π·u / 16) (2i + 1) ] · cos[ (π·v / 16) (2j + 1) ] f(i, j)

where

Λ(ξ) = 1/√2 for ξ = 0, 1 otherwise
• The output array of DCT coefficients contains integers; these can
range from -1024 to 1023.
Computational Issues (2)
• Computationally easier to implement and more efficient to
regard the DCT as a set of basis functions
– Given a known input array size (8 x 8) can be precomputed
and stored.
– Computing values for a convolution mask (8 x 8 window)
that gets applied:
∗ sum the mask values × the pixels the window overlaps,
applying the window across all rows/columns of the image.
– The mask values are simply calculated from the DCT formula.
Computational Issues (3)
Visualisation of DCT basis functions
Figure 44: The 64 (8 x 8) DCT basis functions
Computational Issues (4)
• Factoring reduces problem to a series of 1D DCTs
(No need to apply 2D form directly):
– apply 1D DCT (Vertically) to Columns
– apply 1D DCT (Horizontally) to resultant
Vertical DCT above.
– or alternatively Horizontal to Vertical.
Figure 45: 2x1D Factored 2D DCT Computation
Computational Issues (5)
• The equations are given by:

G(i, v) = (1/2) Σ_{j} Λ(v) · cos[ (π·v / 16) (2j + 1) ] f(i, j)

F(u, v) = (1/2) Σ_{i} Λ(u) · cos[ (π·u / 16) (2i + 1) ] G(i, v)
• Most software implementations use fixed point arithmetic.
Some fast implementations approximate coefficients so all
multiplies are shifts and adds.
Differential Encoding
The simple transform coding example mentioned earlier is an
instance of this approach.
Here:
• The difference between the actual value of a sample and a
prediction of that value is encoded.
• Also known as predictive encoding.
• Example of technique include: differential pulse code
modulation, delta modulation and adaptive pulse code
modulation — differ in prediction part.
• Suitable where successive signal samples do not differ much,
but are not zero. E.g. Video — difference between frames,
some audio signals.
Differential Encoding Methods
• Differential pulse code modulation (DPCM)
Simple prediction (also used in JPEG):
f_predict(t_i) = f_actual(t_(i−1))

I.e. a simple Markov model where the current value is used as the
prediction of the next value.

So we simply need to encode:

∆f(t_i) = f_actual(t_i) − f_actual(t_(i−1))

If successive samples are close to each other, we only need
to encode the first sample with a large number of bits:
Simple Differential Pulse Code Modulation Example
Actual Data:    9 10 7 6
Predicted Data: 0 9 10 7

∆f(t): +9, +1, −3, −1.
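In code the whole scheme is a running subtraction; a sketch (ours):

#include <stdio.h>

/* Sketch: DPCM with the predictor f_predict(t_i) = f_actual(t_(i-1)),
   with the first prediction fixed at 0 as in the example. */
int main(void)
{
    int actual[] = {9, 10, 7, 6};
    int prev = 0;
    for (int i = 0; i < 4; i++) {
        printf("%+d ", actual[i] - prev);   /* transmit the difference */
        prev = actual[i];
    }
    putchar('\n');                           /* prints: +9 +1 -3 -1 */
    return 0;
}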
Differential Encoding Methods (Cont.)
• Delta modulation is a special case of DPCM:
– Same predictor function,
– Coding error is a single bit or digit that indicates the
current sample should be increased or decreased by a
step.
– Not Suitable for rapidly changing signals.
• Adaptive pulse code modulation
Fuller Temporal/Markov model:
– Data is extracted from a function of a series of previous
values
– E.g. Average of last n samples.
– Characteristics of sample better preserved.
Vector Quantisation
The basic outline of this approach is:
• Data stream divided into (1D or 2D square) blocks — vectors
• A table or code book is used to find a pattern for each block.
• Code book can be dynamically constructed or predefined.
• Each block's pattern is encoded as a lookup value in the table
• Compression achieved as data is effectively subsampled and
coded at this level.
Compression II: Images (JPEG)
What is JPEG?
• JPEG: Joint Photographic Expert Group — an international
standard in 1992.
• Works with colour and greyscale images
• Up to 24 bit colour images (unlike GIF)
• Targets photographic quality images (unlike GIF)
• Suitable for many applications e.g., satellite, medical, general
photography...
Basic JPEG Compression Pipeline
JPEG compression involves the following:
• Encoding
Figure 46: JPEG Encoding
• Decoding – Reverse the order for encoding
Major Coding Algorithms in JPEG
The Major Steps in JPEG Coding involve:
• Colour Space Transform and subsampling (YIQ)
• DCT (Discrete Cosine Transformation)
• Quantization
• Zigzag Scan
• DPCM on DC component
• RLE on AC Components
• Entropy Coding — Huffman or Arithmetic
We have met most of the algorithms already:
• JPEG exploits them in the compression pipeline to achieve
maximal overall compression.
Quantization
Why do we need to quantise:
• To throw out bits from DCT.
• Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11.
Truncate to 3 bits: 101 = 5.
• Quantization error is the main source of Lossy Compression.
• DCT itself not Lossy
• How we throw away bits in Quantization Step is Lossy
Uniform quantization
• Divide by constant N and round result
(N = 4 or 8 in examples above).
• Non powers-of-two gives fine control
(e.g., N = 6 loses 2.5 bits)
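A sketch of the divide-and-round step (ours; the rounding shown assumes non-negative values):

/* Sketch: uniform quantisation -- divide by N and round to nearest,
   for value >= 0 and N > 0. */
int quantise(int value, int N)   { return (value + N / 2) / N; }
int dequantise(int q, int N)     { return q * N; }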
Quantization Tables
• In JPEG, each F[u,v] is divided by a constant q(u,v).
• Table of q(u,v) is called quantization table.
• Eye is most sensitive to low frequencies (upper left corner),
less sensitive to high frequencies (lower right corner)
• Standard defines 2 default quantization tables, one for
luminance (below), one for chrominance.
----------------------------------
16  11  10  16  24   40   51   61
12  12  14  19  26   58   60   55
14  13  16  24  40   57   69   56
14  17  22  29  51   87   80   62
18  22  37  56  68  109  103   77
24  35  55  64  81  104  113   92
49  64  78  87 103  121  120  101
72  92  95  98 112  100  103   99
----------------------------------
Quantization Tables (Cont)
• Q: How would changing the numbers affect the picture,
e.g., if we doubled them all?
Quality factor in most implementations is the scaling factor for
default quantization tables.
• Custom quantization tables can be put in image/scan header.
JPEG Quantisation Examples
• JPEG Quantisation Example (Java Applet)
Zig-zag Scan
What is the purpose of the Zig-zag Scan:
• to group low frequency coefficients in top of vector.
• Maps 8 x 8 to a 1 x 64 vector
Differential Pulse Code Modulation (DPCM) on DC component
• Another encoding method is employed
• DPCM on the DC component at least.
• Why is this strategy adopted:
– DC component is large and varied, but often close to
previous value (like lossless JPEG).
– Encode the difference from previous 8x8 blocks – DPCM
Run Length Encode (RLE) on AC components
Yet another simple compression technique is applied to the AC
component:
• 1x64 vector has lots of zeros in it
• Encode as (skip, value) pairs, where skip is the number of
zeros and value is the next non-zero component.
• Send (0,0) as end-of-block sentinel value.
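A sketch of the (skip, value) coder over the zig-zagged vector (ours):

#include <stdio.h>

/* Sketch: (skip, value) run-length coding of the 1x64 zig-zag vector.
   zz[0] is the DC term (handled by DPCM); zz[1..63] are the AC terms. */
void rle_ac(const int *zz)
{
    int skip = 0;
    for (int i = 1; i < 64; i++) {
        if (zz[i] == 0) { skip++; continue; }
        printf("(%d,%d) ", skip, zz[i]);
        skip = 0;
    }
    printf("(0,0)\n");   /* end-of-block sentinel */
}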
Entropy Coding
DC and AC components finally need to be represented by a
smaller number of bits:
• Categorize DC values into SSS (number of bits needed to
represent) and actual bits.
--------------------
Value          SSS
0              0
-1,1           1
-3,-2,2,3      2
-7..-4,4..7    3
--------------------
• Example: if DC value is 4, 3 bits are needed.
Send off SSS as Huffman symbol, followed by actual 3 bits.
• For AC components (skip, value), encode the composite symbol
(skip,SSS) using the Huffman coding.
• Huffman Tables can be custom (sent in header) or default.
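The SSS category is just the bit length of the magnitude; a sketch (ours):

/* Sketch: SSS category = number of bits needed for a DC value/difference. */
int sss(int v)
{
    int bits = 0;
    if (v < 0) v = -v;
    while (v) { bits++; v >>= 1; }
    return bits;   /* e.g. sss(4) == 3, matching the example above */
}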
Example JPEG Compression

[Figure: a worked JPEG compression example]

Another Enumerated Example

[Figure: a second worked JPEG example]
JPEG 2000
• New version released in 2002.
• Based on:
– discrete wavelet transform (DWT), instead of DCT,
– scalar quantization,
– context modeling,
– arithmetic coding,
– post-compression rate allocation.
• Application: variety of uses, ranging from digital photography
to medical imaging to advanced digital scanning and printing.
• Higher compression efficiency — visually lossless
compression at 1 bit per pixel or better.
Further Information
Basic JPEG Information:
• http://www.jpeg.org
• Online JPEG Tutorial
For more information on the JPEG 2000 standard for still image
coding, refer to
http://www.jpeg.org/JPEG2000.htm
Compression III:
Video Compression (MPEG and others)
We need to compress video (more so than audio/images) in
practice since:
1. Uncompressed video (and audio) data are huge.
In HDTV, the bit rate easily exceeds 1 Gbps. — big problems
for storage and network communications.
E.g. HDTV: 1920 x 1080 at 30 frames per second, 8 bits per
RGB (YCrCb actually) channel = 1.5 Gbps.
2. Lossy methods have to be employed since the compression ratio
of lossless methods (e.g., Huffman, Arithmetic, LZW) is not
high enough for image and video compression, especially
when the distribution of pixel values is relatively flat.
Not the complete picture studied here
Much more to MPEG — Plenty of other tricks employed.
We only concentrate on some basic principles of video
compression:
• Earlier H.261 and MPEG 1 and 2 standards.
Compression Standards(1)
Image, Video and Audio compression standards have been
specified and released by two main groups since 1985:

ISO — International Standards Organisation: JPEG, MPEG.

ITU — International Telecommunications Union: H.261–H.264.
Compression Standards (2)
Whilst in many cases the groups have specified separate
standards, there is some crossover between them.
For example:
• JPEG issued by ISO in 1989 (but adopted by ITU as ITU T.81)
• MPEG 1 released by ISO in 1991,
• H.261 released by ITU in 1993 (based on CCITT 1990 draft).
CCITT stands for Comité Consultatif International Téléphonique et
Télégraphique whose parent company is ITU.
• H.262 is alternatively better known as MPEG-2 released in 1994.
• H.263 released in 1996 extended as H.263+, H.263++.
• MPEG-4 released in 1998.
• H.264 released in 2002 for DVD quality and is now part of MPEG-4 (Part 10).
Quicktime 6 supports this.
How to compress video?
Basic Idea of Video Compression:
Motion Estimation/Compensation
• Spatial Redundancy Removal – Intraframe coding (JPEG)
NOT ENOUGH BY ITSELF?
• Temporal — Greater compression by noting the temporal
coherence/incoherence over frames. Essentially we note the
difference between frames.
• Spatial and Temporal Redundancy Removal – Intraframe and
Interframe coding (H.261, MPEG)
Simple Motion Estimation/Compensation Example
Things are much more complex in practice of course.
Which format should be used to represent the compressed data?
• Simply based on Differential Pulse Code Modulation (DPCM).
Simple Motion Example (Cont.)
Consider a simple image (block) of a moving circle.
Let's just consider the difference between 2 frames.
It is simple to encode/decode:
Now let's estimate the motion of blocks
We will examine methods of estimating motion vectors in due
course.
Figure 47: Motion estimation/compensation (encoding)
Decoding Motion of blocks
Figure 48: Motion estimation/compensation (decoding)
Why is this a better method than just frame differencing?
How is this used in Video Compression Standards?
Block Matching:
• In MPEG-1/H.261 this is done using block matching techniques.
For a certain area of pixels in a picture:
• find a good estimate of this area in a previous (or in a future)
frame, within a specified search area.
Motion compensation:
• uses the motion vectors to compensate the picture.
• parts of a previous (or future) picture can be reused in a
subsequent picture.
• individual parts spatially compressed
Any Overheads?
• Motion estimation/compensation techniques reduces the
video bitrate significantly
but
• introduce extra computational complexity and delay:
– need to buffer reference pictures - backward and forward
referencing.
– reconstruct from motion parameters
Let's see how such ideas are used in practice.
H.261 Compression
The basic approach to H.261 compression is summarised as
follows:
H.261 compression has been specifically designed for video
telecommunication applications:
• Developed by CCITT in 1988-1990
• Meant for videoconferencing, videotelephone applications
over ISDN telephone lines.
• Baseline ISDN is 64 kbits/sec, and integral multiples (px64)
Overview of H.261
• Frame types are CCIR 601 CIF (352x288) and
QCIF (176x144) images with 4:2:0 subsampling.
• Two frame types:
Intraframes (I-frames) and Interframes (P-frames)
• I-frames use basically JPEG — but YUV (YCrCb) and larger DCT windows,
different quantisation
• I-frames provide us with a (re)fresh access point — Key Frames
• P-frames use pseudo-differences from previous frame (predicted), so frames
depend on each other.
Intra Frame Coding
• Various lossless and lossy compression techniques are used
• Compression is contained only within the current frame
• Simpler coding — not enough by itself for high compression.
• However, we can't rely on inter-frame differences across a large
number of frames
– So when errors get too large: start a new I-frame
Intraframe coding is very similar to that of a JPEG still image
video encoder:
This is a basic Intra Frame Coding Scheme is as follows:
• Macroblocks are typically 16x16 pixel areas on Y plane of
original image.
• A macroblock usually consists of 4 Y blocks, 1 Cr block, and
1 Cb block. (4:2:0 chroma subsampling)
– The eye is most sensitive to luminance, less sensitive to chrominance.
– So operate in an effective colour space: YUV (YCbCr)
colour, which we have met.
– Typical to use 4:2:0 macroblocks: one quarter of the
chrominance information is used.
• Quantization is by constant value for all DCT coefficients.
I.e., no quantization table as in JPEG.
The Macroblock is coded as follows:
• Many macroblocks will be exact matches (or close enough).
So send address of each block in image –> Addr
• Sometimes no good match can be found, so send INTRA
block –> Type
• Will want to vary the quantization to fine tune compression,
so send quantization value –> Quant
• Motion vector –> vector
• Some blocks in macroblock will match well, others match
poorly. So send bitmask indicating which blocks are present
(Coded Block Pattern, or CBP).
• Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.
Inter-frame (P-frame) Coding
• Intra frame coding has a limited spatial basis relative to 1 frame
• Considerably more compression is possible if the inherent temporal basis
is exploited as well.
BASIC IDEA:
• Most consecutive frames within a sequence are very similar
to the frames both before (and after) the frame of interest.
• Aim to exploit this redundancy.
• Use a technique known as block-based motion compensated
prediction
• Need to use motion estimation
• Coding needs extensions for Inter, but the encoder can also
support an Intra subset.
Figure 49: P-Frame Coding
Forward Prediction Basics:
• Start with I frame (spatially with no reference to any other
frame)
• Predict a future P frame(s) in a forward time manner.
• As an example, Predict future 6 frame sequence:
I,P,P,P,P,P,I,P,P,P,P,
P-coding can be summarised as follows:
A Coding Example (P-frame)
• Previous image is called reference image.
• Image to code is called target image.
• Actually, the difference is encoded.
• Subtle points:
1. Need to use decoded image as reference image,
not original. Why?
2. We’re using ”Mean Absolute Difference” (MAD) to decide
best block.
Can also use ”Mean Squared Error” (MSE) = sum(E*E)
Hard Problems in H.261
There are however a few difficult problems in H.261:
• Motion vector search
• Propagation of Errors
• Bit-rate Control
Motion Vector Search
• C(x + k, y + l) — pixels in the macroblock with upper left
corner (x, y) in the Target.
• R(x + i + k, y + j + l) — pixels in the macroblock with upper
left corner (x + i, y + j) in the Reference.
• The cost function is the Mean Absolute Error (MAE) between
the two blocks.
• Goal is to find a vector (u, v) such that MAE (u, v) is minimum
– Full Search Method
– Two-Dimensional Logarithmic Search
Hierarchical Motion Estimation:
1. Form several low resolution versions of the target and reference
pictures
2. Find the best match motion vector in the lowest resolution version.
3. Modify the motion vector level by level when going up
Propagation of Errors
• Send an I-frame every once in a while
• Make sure you use decoded frame for comparison
Bit-rate Control
• Simple feedback loop based on ”buffer fullness”
If buffer is too full, increase the quantization scale factor to
reduce the data.
MPEG Compression
MPEG stands for:
• Motion Picture Expert Group — established circa 1990 to
create standard for delivery of audio and video
• MPEG-1 (1991). Target: VHS quality on a CD-ROM (320 x
240 + CD audio @ 1.5 Mbits/sec)
• MPEG-2 (1994): Target: television broadcast
• MPEG-3: HDTV, but subsumed into an extension of MPEG-2
• MPEG-4 (1998): Very Low Bitrate Audio-Visual Coding
• MPEG-7 (2001): "Multimedia Content Description Interface"
• MPEG-21 (2002): "Multimedia Framework"
Three Parts to MPEG
• The MPEG standard had three parts:
1. Video: based on H.261 and JPEG
2. Audio: based on MUSICAM technology
3. System: control interleaving of streams
MPEG Video
MPEG compression essentially attempts to overcome some
shortcomings of H.261 and JPEG:
• Recall H.261 dependencies:
• The problem here is that many macroblocks need information
that is not in the reference frame.
• For example:
• The MPEG solution is to add a third frame type which is a
bidirectional frame, or B-frame
• B-frames search for macroblock in past and future frames.
• Typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB
Actual pattern is up to encoder, and need not be regular.
MPEG Video Layers (1)
MPEG video is broken up into a hierarchy of layers to help
• Error handling,
• Random search and editing, and
• Synchronization, for example with an audio bitstream.
MPEG Video Layers (2)
From the top level, the layers are
Video sequence layer — any self-contained bitstream.
For example a coded movie or advertisement.
Group of pictures layer — composed of 1 or more intra (I)
frames and/or non-intra (P and/or B) pictures.

Picture layer — a single picture.

Slice layer — the layer beneath the Picture layer is called the slice layer.
Slice Layer
• Each slice is a contiguous sequence of raster-ordered
macroblocks,
• ordered on a row basis in typical video
applications
• Each macroblock is 16x16 arrays of
– luminance pixels, or
– picture data elements, with 2 8x8 arrays of associated
chrominance pixels.
• Macroblocks may be further divided into distinct 8x8 blocks,
for further processing such as transform coding.
Coding Layers in Macroblock
• Each of the layers has its own unique 32 bit start code:
– 23 zero bits followed by a one, followed by
– 8 bits for the actual start code.
– Start codes may have as many zero bits as desired
preceding them.
B-Frames
New from H.261
• MPEG uses forward/backward interpolated prediction.
• Frames are commonly referred to as bi-directional interpolated
prediction frames, or B frames for short.
Example I, P, and B frames
Consider a group of pictures that lasts for 6 frames:
• Given
I,B,P,B,P,B,I,B,P,B,P,B,
• I frames are coded spatially only (as before)
• P frames are forward predicted based on previous I and P frames(as before).
• B frames are coded based on a forward prediction from a previous I or P frame,
as well as a backward prediction from a succeeding I or P frame.
• Here: the 1st B frame is predicted from the 1st I frame and 1st P frame.
• The 2nd B frame is predicted from the 1st and 2nd P frames.
• The 3rd B frame is predicted from the 2nd P frame and the 1st I frame of the next
group of pictures.
Backward Prediction
Note: Backward prediction requires that the future frames that
are to be used for backward prediction be
• encoded and
• transmitted first,
• out of order.
This process is summarized in Figure 50.
Figure 50: B-Frame Encoding
Also NOTE:
• No defined limit to the number of consecutive B frames that
may be used in a group of pictures,
• Optimal number is application dependent.
• Most broadcast quality applications however, have tended
to use 2 consecutive B frames (I,B,B,P,B,B,P,) as the ideal
trade-off between compression efficiency and video quality.
Advantage of the usage of B frames
• Coding efficiency.
• Most B frames use less bits.
• Quality can also be improved in the case of moving objects
that reveal hidden areas within a video sequence.
• Better error resilience: B frames are not used to predict
future frames, so errors generated will not be propagated further
within the sequence.
Disadvantage:
• Frame reconstruction memory buffers within the encoder and
decoder must be doubled in size to accommodate the 2
anchor frames.
Motion Estimation
• The temporal prediction technique used in MPEG video is
based on motion estimation.
The basic premise:
• Consecutive video frames will be similar except for changes
induced by objects moving within the frames.
• In the trivial case of zero motion between frames (no other
differences except noise, etc.),
• it is easy for the encoder to predict the current frame as a duplicate
of the prediction frame.
• When there is motion in the images, the situation is not as
simple.
Example of a frame with 2 stick figures and a tree
The problem for motion estimation to solve is :
• How to adequately represent the changes, or differences,
between these two video frames.
Figure 51: Motion Estimation Example
Solution:
A comprehensive 2-dimensional spatial search is performed
for each luminance macroblock.
• Motion estimation is not applied directly to chrominance in
MPEG
• MPEG does not define how this search should be performed.
• A detail that the system designer can choose to implement
in one of many possible ways.
• Well known that a full, exhaustive search over a wide 2-D
area yields the best matching results in most cases, but at
extreme computational cost to the encoder.
• Motion estimation is usually the most computationally
expensive portion of the video encoder.
JJ
II
J
I
Back
Close
Figure 52: Motion Est. Macroblock Example
Motion Vectors, Matching Blocks
Figure 52 shows an example of a particular macroblock from
Frame 2 of Figure 51, relative to various macroblocks of Frame
1.
• The top frame has a bad match with the macroblock to be coded.
• The middle frame has a fair match, as there is some commonality between the
2 macroblocks.
• The bottom frame has the best match, with only a slight error between the 2
macroblocks.
• Because a relatively good match has been found, the encoder assigns motion
vectors to that macroblock.
• Each forward and backward predicted macroblock may contain 2 motion vectors,
• so true bidirectionally predicted macroblocks will utilize 4 motion vectors.
Figure 53: Final Motion Estimation Prediction
Figure 53 shows how a potential predicted Frame 2 can be
generated from Frame 1 by using motion estimation.
• The predicted frame is subtracted from the desired frame,
• Leaving a (hopefully) less complicated residual error frame
that can then be encoded much more efficiently than before
motion estimation.
• The more accurate the motion is estimated and matched,
the more likely it will be that the residual error will approach
zero,
• And the coding efficiency will be highest.
Further coding efficiency
• Motion vectors tend to be highly correlated between
macroblocks:
– The horizontal component is compared to the previously
valid horizontal motion vector and
– Only the difference is coded.
– Same difference is calculated for the vertical component
– Difference codes are then described with a variable length
code for maximum compression efficiency.
What happens if we find no acceptable match?
B/P blocks may not be what they appear to be:

If the encoder decides that no acceptable match exists then it
has the option of
• Coding that particular macroblock as an intra macroblock,
• even though it may be in a P or B frame!
• In this manner, high quality video is maintained at a slight
cost to coding efficiency.
Estimating the Motion Vectors
The basic idea is to search for the Macroblock (MB):
• Within a ±n x m pixel search window
• Work out Sum of Absolute Difference (SAD)
(or Mean Absolute Error (MAE) for each window but this is
computationally more expensive)
504
• Choose window where SAD is a minimum.
SAD Computation
SAD is computed by:

For i = -n to +n
For j = -m to +m

SAD(i, j) = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} | C(x + k, y + l) − R(x + i + k, y + j + l) |

where:
• N is the size of the macroblock window (typically 16 or 32 pixels),
• (x, y) is the position of the original macroblock, C, and
• R is the region over which the SAD is computed.
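A minimal Python sketch of this exhaustive search (the function names and the
NumPy array representation of frames are assumptions for illustration, not part
of any MPEG specification):

import numpy as np

def sad(C, R, x, y, i, j, N):
    """SAD between the N x N macroblock of current frame C at (x, y)
    and the block of reference frame R displaced by (i, j)."""
    c = C[y:y + N, x:x + N].astype(int)
    r = R[y + j:y + j + N, x + i:x + i + N].astype(int)
    return int(np.abs(c - r).sum())

def full_search(C, R, x, y, N=16, n=8, m=8):
    """Exhaustive search over the +/-(n x m) window; returns the
    (minimum SAD, motion vector) pair."""
    best = None
    for j in range(-m, m + 1):
        for i in range(-n, n + 1):
            # Skip candidate blocks that fall outside the reference frame
            if not (0 <= x + i and x + i + N <= R.shape[1] and
                    0 <= y + j and y + j + N <= R.shape[0]):
                continue
            s = sad(C, R, x, y, i, j, N)
            if best is None or s < best[0]:
                best = (s, (i, j))
    return best

This is the full search noted above as giving the best matches at extreme
computational cost; real encoders typically substitute a faster, suboptimal
search pattern.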
Allowing for an Alpha Mask in SAD
It is sometimes desirable for an alpha mask to be applied to the
SAD calculation to mask out certain pixels:

SAD(i, j) = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} | C(x + k, y + l) − R(x + i + k, y + j + l) | · [alpha_C(k, l) ≠ 0]
SAD Search Example
So for a ±2 pixel search area (given by the dashed lines) and a 2x2
macroblock window, the minimum-SAD match is given by the bold dot-dash
line (near the top right corner) in Figure 54.
Figure 54: SAD Window search Example
Selecting Intra/Inter Frame coding
Based upon the motion estimation, a decision is made on
whether INTRA or INTER coding is used.
To determine INTRA/INTER MODE we do the following
calculation:
MB_mean = ( Σ_{i=0,j=0}^{N-1} | C(i, j) | ) / N

A = Σ_{i=0,j=0}^{n,m} | C(i, j) − MB_mean | · [alpha_C(i, j) ≠ 0]

If A < (SAD − 2N), INTRA mode is chosen.
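A small Python sketch of this mode decision, as a direct transcription of the
formulas above (the alpha mask is omitted for brevity; the function name is
illustrative):

import numpy as np

def choose_mode(C, best_sad, N=16):
    """INTRA/INTER decision for one macroblock C (an N x N array),
    given the minimum SAD found by motion estimation."""
    mb_mean = np.abs(C).sum() / N            # MB_mean as defined above
    A = np.abs(C - mb_mean).sum()            # deviation from the mean
    return "INTRA" if A < (best_sad - 2 * N) else "INTER"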
Coding of Predicted Frames:Coding Residual Errors
• A predicted frame is subtracted from its reference and
• the residual error frame is generated,
• This information is spatially coded as in I frames:
– by coding 8x8 blocks with the DCT,
– DCT coefficient quantization,
– run-length/amplitude coding, and
– bitstream buffering with rate control feedback.
• The default quantization matrix for non-intra frames is a flat matrix
with a constant value of 16 for each of the 64 locations.
• The non-intra quantization step function contains a dead-zone
around zero that is not present in the intra version. This helps
eliminate any lone DCT coefficient quantization values that might
reduce the run-length amplitude efficiency (a sketch follows this list).
• Finally, the motion vectors for the residual block information are
calculated as differential values and are coded with a variable
length code.
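As an illustration of the dead-zone idea, here is a hedged Python sketch of a
non-intra quantiser (the dead-zone width and rounding are illustrative choices,
not the standard's exact rules):

def deadzone_quantise(coeff, step=16):
    """Quantise a DCT coefficient with a dead zone around zero, so lone
    small coefficients are forced to 0 rather than harming run-length/
    amplitude coding. Step 16 matches the flat default non-intra matrix;
    the dead-zone width here is illustrative."""
    deadzone = step // 2
    if abs(coeff) <= deadzone:
        return 0                              # inside the dead zone
    sign = 1 if coeff > 0 else -1
    return sign * ((abs(coeff) - deadzone) // step + 1)

print([deadzone_quantise(c) for c in (-40, -5, 0, 7, 25, 100)])
# -> [-3, 0, 0, 0, 2, 6]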
Differences from H.261
• Larger gaps between I and P frames, so expand motion
vector search range.
• To get better encoding, allow motion vectors to be specified
to a fraction of a pixel (1/2 pixel accuracy).
• Bitstream syntax must allow random access,
forward/backward play, etc.
• Added notion of slice for synchronization after loss/corrupt
data.
Differences from H.261 (Cont.)
• B frame macroblocks can specify two motion vectors (one to
past and one to future), indicating result is to be averaged.
MPEG-2, MPEG-3, and MPEG-4
• MPEG-2 target applications
----------------------------------------------------------------------
Level       Size           Pixels/sec   Bit-rate    Application
                                        (Mbits)
----------------------------------------------------------------------
Low         352 x 240      3 M          4           consumer tape equiv.
Main        720 x 480      10 M         15          studio TV
High 1440   1440 x 1152    47 M         60          consumer HDTV
High        1920 x 1080    63 M         80          film production
----------------------------------------------------------------------
• MPEG-2 differences from MPEG-1:
1. Search on fields, not just frames.
2. 4:2:2 and 4:4:4 macroblocks
3. Frame sizes as large as 16383 x 16383
4. Scalable modes: Temporal, Progressive, ...
5. Non-linear macroblock quantization factor
6. A bunch of minor fixes
MPEG-2, MPEG-3, and MPEG-4 (Cont.)
• MPEG-3: Originally for HDTV (1920 x 1080), got folded into
MPEG-2
• MPEG-4: very low bit-rate communication (4.8 to 64 kb/sec).
Video processing
Compression IV:
Audio Compression (MPEG and others)
As with video a number of compression techniques have been
applied to audio.
Simple Audio Compression Methods
RECAP (Already Studied)
Traditional lossless compression methods (Huffman, LZW, etc.)
usually don’t work well on audio compression
• For the same reason as in image and video compression:
there is too much variation in the data over a short time.
Some Simple But Limited Practical Methods
• Silence Compression - detect the ”silence”, similar to
run-length encoding (seen examples before)
• Differential Pulse Code Modulation (DPCM)
Relies on the fact that the difference in amplitude between
successive samples is small, so reduced bits can be used to store
the difference (seen examples before)
Simple But Limited Practical Methods Continued ....
• Adaptive Differential Pulse Code Modulation (ADPCM)
e.g., in CCITT G.721 – 16 or 32 Kbits/sec.
(a) Encodes the difference between two consecutive signals,
as in DPCM, but
(b) adapts the quantisation so fewer bits are used when the
value is smaller.
– It is necessary to predict where the waveform is headed
–> difficult
– Apple had a proprietary scheme called ACE/MACE: a lossy
scheme that tries to predict where the wave will go in the next
sample. About 2:1 compression.
Simple But Limited Practical Methods Continued ....
• Adaptive Predictive Coding (APC) typically used on Speech.
– Input signal is divided into fixed segments (windows)
– For each segment, some sample characteristics are
computed, e.g. pitch, period, loudness.
– These characteristics are used to predict the signal
– Computerised talking (Speech Synthesisers use such
methods) but low bandwidth:
acceptable quality at 8 kbits/sec
Simple But Limited Practical Methods Continued ....
• Linear Predictive Coding (LPC) fits signal to speech model
and then transmits parameters of model as in APC.
Speech Model:
– pitch, period, loudness, vocal tract
parameters (voiced and unvoiced sounds).
– Synthesised speech
– Still sounds like a computer talking,
– Bandwidth as low as 2.4 kbits/sec.
Simple But Limited Practical Methods Continued ....
• Code Excited Linear Predictor (CELP) does LPC, but also
transmits error term.
– Based on more sophisticated model of vocal tract than
LPC
– Better perceived speech quality
– Audio conferencing quality at 4.8 kbits/sec.
Psychoacoustics or Perceptual Coding
Basic Idea: Exploit areas where the human ear is less sensitive
to sound to achieve compression
E.g. MPEG audio
How do we hear sound?
Sound revisited
• Sound is produced by a vibrating source.
• The vibrations disturb air molecules
• Produce variations in air pressure:
lower than average pressure, rarefactions, and
higher than average, compressions.
This produces sound waves.
• When a sound wave impinges on a surface (e.g. eardrum or
microphone) it causes the surface to vibrate in sympathy:
• In this way acoustic energy is transferred from a source to a
receptor.
Human Hearing
• Upon receiving the waveform the eardrum vibrates in
sympathy.
• Through a variety of mechanisms the acoustic energy is
transferred to nerve impulses that the brain interprets as
sound.
The ear can be regarded as being made up of 3 parts:
• The outer ear,
• The middle ear,
• The inner ear.
Human Ear
We consider:
• The function of the main parts of the ear
• How the transmission of sound is processed.
⇒ FLASH EAR DEMO (Lecture ONLY)
Click Here to Run Flash Ear Demo over the Web (Shockwave
Required)
The Outer Ear
• Ear Canal: Focuses the incoming audio.
• Eardrum (Tympanic Membrane):
– Interface between the external and middle ear.
– Sound is converted into mechanical vibrations via the
middle ear.
– Sympathetic vibrations on the membrane of the eardrum.
The Middle Ear
• 3 small bones, the ossicles:
Malleus, Incus, and Stapes.
• Form a system of levers which are linked together and driven
by the eardrum
• Bones amplify the force of sound vibrations.
The Inner Ear
The Cochlea:
• Transforms mechanical ossicle forces into hydraulic pressure,
• The cochlea is filled with fluid.
• Hydraulic pressure imparts movement to the cochlear duct and to the organ of Corti.
• The cochlea itself is no bigger than the tip of a little finger!!
Semicircular canals
• The body's balance mechanism
• Thought to play no part in hearing.
How the Cochlea Works
• Pressure waves in the cochlea exert energy along a route
that begins at the oval window and ends abruptly at the
membrane-covered round window
• Pressure applied to the oval window is transmitted to all
parts of the cochlea.
Stereocilia
• Inner surface of the cochlea (the basilar membrane) is lined
with over 20,000 hair-like nerve cells — stereocilia,
• One of the most critical aspects of hearing.
Stereocilia Microscope Images
Hearing different frequencies
• Basilar membrane is tight at one end, looser at the other
• High tones create their greatest crests where the membrane
is tight,
• Low tones where the wall is slack.
• Causes resonant frequencies much like what happens in a
tight string.
• Stereocilia differ in length by minuscule amounts
• they also have different degrees of resiliency to the fluid
which passes over them.
Finally to nerve signals
• Compressional wave moves middle ear through to the cochlea
• Stereocilia will be set in motion.
• Each stereocilia sensitive to a particular frequency.
• Stereocilia cell will resonate with a larger amplitude of
vibration.
• Increased vibrational amplitude induces the cell to release
an electrical impulse which passes along the auditory nerve
towards the brain.
In a process which is not clearly understood, the brain is
capable of interpreting the qualities of the sound upon reception
of these electric nerve impulses.
Sensitivity of the Ear
• Range is about 20 Hz to 20 kHz, most sensitive at 2 to 4
kHz.
• Dynamic range (quietest to loudest) is about 96 dB
• Approximate threshold of pain: 130 dB
• Hearing damage: > 90 dB (prolonged exposure)
• Normal conversation: 60-70 dB
• Typical classroom background noise: 20-30 dB
• Normal voice range is about 500 Hz to 2 kHz
– Low frequencies are vowels and bass
– High frequencies are consonants
Question: How sensitive is human hearing?
The sensitivity of the human ear with respect to frequency is
given by the following graph.
Frequency dependence is also level dependent!
Ear response is even more complicated.
Complex phenomenon to explain.
Illustration : Loudness Curves or Fletcher-Munson Curves:
What do the curves mean?
• Curves indicate perceived loudness is a function of both the
frequency and the level (sinusoidal sound signal)
• Equal loudness curves. Each contour:
– Equal loudness
– Express how much a sound level must be changed as the
frequency varies,
to maintain a certain perceived loudness
Physiological Implications
Why are the curves accentuated where they are?
• The accentuated frequency range coincides with speech.
• Sounds like p and t have very important parts of their spectral
energy within the accentuated range,
• making them easier to discriminate between.
The ability to hear sounds in the accentuated range (around
a few kHz) is thus vital for speech communication.
Traits of Human Hearing
Frequency Masking
• With multiple-frequency audio, the ear's sensitivity changes
with the relative amplitude of the signals.
• If the frequencies are close and the amplitude of one is less
than that of the other, the quieter frequency
may not be heard.
Critical Bands
• Range of closeness for frequency masking depends on the
frequencies and relative amplitudes.
• Each band where frequencies are masked is called the Critical
Band
• Critical bandwidth for average human hearing varies with
frequency:
– Constant 100 Hz for frequencies less than 500 Hz
– Increases (approximately) linearly by 100 Hz for each
additional 500 Hz.
• Width of critical band is called a bark.
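A tiny Python sketch of this approximate critical bandwidth rule; the
piecewise-linear form below is a direct reading of the two bullets above:

def critical_bandwidth(f_hz):
    """Approximate critical bandwidth (Hz) at centre frequency f_hz:
    constant 100 Hz below 500 Hz, then growing by roughly 100 Hz for
    each additional 500 Hz."""
    if f_hz < 500:
        return 100.0
    return 100.0 + 100.0 * (f_hz - 500.0) / 500.0

for f in (200, 500, 1000, 5000):
    print(f, "Hz ->", critical_bandwidth(f), "Hz wide")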
What is the cause of Frequency Masking?
• The stereocilia are excited by air pressure variations,
transmitted via outer and middle ear.
• Different stereocilia respond to different ranges of
frequencies — the critical bands
Frequency Masking occurs because after excitation by one
frequency further excitation by a less strong similar frequency
of the same group of cells is not possible.
Example of frequency masking
• Example: Play a 1 kHz tone (the masking tone) at a fixed level (60
dB). Play a test tone at a different frequency (e.g., 1.1 kHz), and
raise its level until it is just distinguishable.
• Vary the frequency of the test tone and plot the threshold
at which it becomes audible.
• If we repeat this for various frequencies of masking tones we get:
Temporal masking
After the ear hears a loud sound:
• It takes a further short while before it can hear a quieter sound.
Why is this so?
• Stereocilia vibrate with a force corresponding to the input sound stimulus.
• If the stimulus is strong then the stereocilia will be in a high state of
excitation and get fatigued.
• After extended listening to loud music or headphones this
sometimes manifests itself with ringing in the ears and even
temporary deafness.
• Prolonged exposure to noise permanently damages the Stereocilia.
Temporal Masking occurs because the hairs take time to settle
after excitation to respond again.
Example of Temporal Masking
• Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz
at 40 dB. Test tone can’t be heard (it’s masked).
Stop masking tone, then stop test tone after a short delay.
Adjust delay time to the shortest time that test tone can be
heard (e.g., 5 ms).
Repeat with different level of the test tone and plot:
Example of Temporal Masking (Cont.)
• Try other frequencies for test tone (masking tone duration
constant). Total effect of masking:
Summary: How to Exploit?
• If we have a loud tone at, say, 1 kHz, then nearby quieter
tones are masked.
• Best compared on critical band scale – range of masking is
about 1 critical band
• Two factors for masking – frequency masking and temporal
masking
• Question: How to use this for compression?
Two examples:
– MPEG Audio
– Dolby
How to compute?
We have met basic tools:
• Fourier and Discrete Cosine Transforms
• Work in frequency space
• (Critical) Band Pass Filtering — Visualise a graphic equaliser
MPEG Audio Compression
• Exploits the psychoacoustic models above.
• Frequency masking is always utilised
• More complex forms of MPEG also employ temporal masking.
Basic Frequency Filtering Bandpass
MPEG audio compression basically works by:
• Dividing the audio signal up into a set of frequency subbands
• Subbands approximate critical bands.
• Each band quantised according to the audibility of
quantisation noise.
Quantisation is the key to MPEG audio compression and is
the reason why it is lossy.
How good is MPEG compression?
Although (data) lossy,
MPEG claims to be perceptually lossless:
• Human tests (part of standard development), expert
listeners.
• 6:1 compression ratio: stereo 16-bit samples at 48 kHz
compressed to 256 kbits/sec.
• Difficult, real world examples used.
• Under optimal listening conditions there is no statistically
distinguishable difference between the original and MPEG.
Basic MPEG: MPEG audio coders
• Set of standards for the use of video with sound.
• Compression methods or coders associated with audio
compression are called MPEG audio coders.
• MPEG allows for a variety of different coders to be employed.
• They differ in the level of sophistication in applying perceptual
compression.
• Different layers cater for the levels of sophistication.
An Advantage of MPEG approach
Complex psychoacoustic modelling is needed only in the coding phase:
• Desirable for real time (hardware or software)
decompression,
• Essential for broadcast purposes.
• Decompression is independent of the psychoacoustic
models used,
• Different models can be used,
• or, if there is enough bandwidth, no model at all.
Basic MPEG: MPEG Standards
Evolving standards for MPEG audio compression:
• MPEG-1 is by far the most prevalent.
• So-called mp3 files we get off the Internet are members of the
MPEG-1 family.
• The standards now extend to MPEG-4 (structured audio) —
Previous Lecture.
For now we concentrate on MPEG-1
Basic MPEG: MPEG Facts
• MPEG-1: 1.5 Mbits/sec for audio and video
About 1.2 Mbits/sec for video, 0.3 Mbits/sec for audio
(Uncompressed CD audio is
44,100 samples/sec * 16 bits/sample * 2 channels > 1.4 Mbits/sec)
• Compression factor ranging from 2.7 to 24.
• MPEG audio supports sampling frequencies of 32, 44.1 and 48
kHz.
• Supports one or two audio channels in one of the four modes:
1. Monophonic – single audio channel
2. Dual-monophonic – two independent channels
(functionally identical to stereo)
3. Stereo – for stereo channels that share bits, but not using
joint-stereo coding
4. Joint-stereo – takes advantage of the correlations between
stereo channels
Basic MPEG-1 Compression algorithm (1)
Basic encoding algorithm summarised below:
Basic MPEG-1 Compression algorithm (2)
The main stages of the algorithm are:
• The audio signal is first sampled and quantised using PCM.
– Application dependent: sample rate and number of bits
• The PCM samples are then divided up into a number of
frequency subbands, and subband scaling factors are computed:
Basic MPEG-1 Compression algorithm (3)
Analysis filters
• Also called critical-band filters.
• Break the signal up into equal-width subbands.
• Use the fast Fourier transform (FFT) (or discrete cosine
transform (DCT)).
• The filters divide the audio signal into frequency subbands that
approximate the 32 critical bands.
• Each output value of a band is known as a sub-band sample.
• Example: a 16 kHz signal bandwidth at a 32 kHz sampling rate
gives each subband a bandwidth of 500 Hz.
• The time duration of each sampled segment of input signal is the
time to accumulate 12 successive sets of 32 PCM (subband)
samples, i.e. 32*12 = 384 samples.
Basic MPEG-1 Compression algorithm (4)
Analysis filters (cont)
• In addition to filtering the input, the analysis banks determine:
– The maximum amplitude of the 12 subband samples in each
subband,
– known as the scaling factor of the subband,
– which is passed to the psychoacoustic model and quantiser blocks.
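A hedged Python sketch of the framing and scale-factor step (the reshape below
stands in for the real polyphase analysis filter bank, which is not shown;
the function name is illustrative):

import numpy as np

def frame_scalefactors(subband_values, n_subbands=32, n_samples=12):
    """Given 384 filtered values (12 successive sets of 32 subband
    samples), return them as a (12, 32) frame plus one scale factor
    per subband: the maximum amplitude of its 12 samples."""
    frame = np.asarray(subband_values, dtype=float)
    frame = frame[:n_subbands * n_samples].reshape(n_samples, n_subbands)
    scale_factors = np.abs(frame).max(axis=0)   # one per subband
    return frame, scale_factors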
Basic MPEG-1 Compression algorithm (5)
Psychoacoustic modeller:
• Uses frequency masking and may employ temporal masking.
• Performed concurrently with the filtering and analysis operations.
• Determines the amount of masking for each band caused by nearby
bands.
• Inputs: set hearing thresholds, subband masking
properties (model dependent) and the scaling factors (above).
Basic MPEG-1 Compression algorithm (6)
Psychoacoustic modeller (cont):
• Output: a set of signal-to-mask ratios:
– These indicate those frequency components whose amplitude
is below the audibility threshold.
– If the power in a band is below the masking threshold,
don't encode it.
– Otherwise, determine the number of bits (from the scaling
factors) needed to represent the coefficient such that the noise
introduced by quantisation is below the masking effect
(recall that 1 bit of quantisation introduces about 6 dB
of noise).
Basic MPEG-1 Compression algorithm (7)
Example of Quantisation:
• Assume that after analysis, the first levels of 16 of the 32
bands are these:
----------------------------------------------------------------------
Band        1  2  3   4   5  6  7   8   9   10  11  12 13 14 15 16
Level (dB)  0  8  12  10  6  2  10  60  35  20  15  2  3  5  3  1
----------------------------------------------------------------------
• If the level of the 8th band is 60 dB,
then assume (according to model adopted) it gives a masking of
12 dB in the 7th band, 15 dB in the 9th.
Level in 7th band is 10 dB ( < 12 dB ), so ignore it.
Level in 9th band is 35 dB ( > 15 dB ), so send it.
–> Can encode with up to 2 bits (= 12 dB) of quantisation error.
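The same worked example as a short Python sketch (the 12 dB and 15 dB masking
offsets come from the adopted model, as stated above; the function name and
dictionary representation are illustrative):

def masking_decision(levels_db, mask_db):
    """Keep only bands whose level is at or above the masking
    threshold imparted by loud neighbours; mask_db maps
    band index -> masking level in dB (model dependent)."""
    kept = {}
    for band, level in enumerate(levels_db, start=1):
        if level >= mask_db.get(band, 0):
            kept[band] = level          # encode this band
    return kept

levels = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]
# Band 8 at 60 dB masks band 7 by 12 dB and band 9 by 15 dB:
print(masking_decision(levels, {7: 12, 9: 15}))
# Band 7 (10 dB < 12 dB) is dropped; band 9 (35 dB > 15 dB) is sent.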
MPEG-1 Output bitstream
The basic output stream for a basic MPEG encoder is as follows:
• Header: contains information such as the sample frequency
and quantisation.
• Subband sample (SBS) format: Quantised scaling factors
and 12 frequency components in each subband.
– Peak amplitude level in each subband quantised using 6
bits (64 levels)
– 12 frequency values quantised to 4 bits
• Ancillary data: Optional. Used, for example, to carry
additional coded samples associated with special broadcast
format (e.g surround sound)
Decoding the bitstream
• Dequantise the subband samples after demultiplexing the
coded bitstream into subbands.
• Synthesis bank decodes the dequantised subband samples
to produce PCM stream.
– This essentially involves applying the inverse Fourier
transform (IFFT) on each substream and multiplexing the
channels to give the PCM bit stream.
MPEG Layers
MPEG defines 3 layers of processing for audio:
• Layer 1 is the basic mode,
• Layers 2 and 3 are more advanced (use temporal masking).
• Layer 3 is the most common form for audio files on the Web
– Our beloved MP3 files that record companies claim are
bankrupting their industry.
– Strictly speaking these files should be called
MPEG-1 Layer 3 files.
Each layer brings:
• Increasing levels of sophistication,
• Greater compression ratios,
• Greater computational expense.
Layer 1
• Best suited for bit rates above 128 kbits/sec per channel.
• Example: the Philips Digital Compact Cassette uses Layer 1 at
192 kbits/sec compression.
• Divides data into frames,
– Each of them contains 384 samples,
– 12 samples from each of the 32 filtered subbands as shown
above.
• Psychoacoustic model only uses frequency masking.
• Optional Cyclic Redundancy Code (CRC) error checking.
Layer 2
• Targeted at bit rates of around 128 kbits/sec per channel.
• Examples: Coding of Digital Audio Broadcasting (DAB) on
CD-ROM, CD-I and Video CD.
• Enhancement of level 1.
• Codes audio data in larger groups:
– Use three frames in filter:
before, current, next, a total of 1152 samples.
– This models a little bit of the temporal masking.
• Imposes some restrictions on bit allocation in middle and high
subbands.
• More compact coding of scale factors and quantised
samples.
• Better audio quality due to saving bits here so more bits can be
used in quantised subband values
Layer 3
• Targeted at bit rates of 64 kbits/sec per channel.
• Example: audio transmission over ISDN or a network of suitable bandwidth.
• Much more complex approach.
• Psychoacoustic model includes temporal masking effects,
• Takes into account stereo redundancy.
• Better critical band filter is used (non-equal frequencies)
• Uses a modified DCT (MDCT) for lossless subband transformation.
• Two different block lengths: 18 (long) or 6 (short)
• 50% overlap between successive transform windows gives window sizes of 36
or 12 — accounts for temporal masking
• Greater frequency resolution accounts for poorer time resolution
• Uses Huffman coding on quantised samples for better compression.
Comparison of MPEG Layers

--------------------------------------------------------------------
Layer     Target    Ratio   Quality @    Quality @    Theoretical
          bitrate           64 kbits     128 kbits    Min. Delay
--------------------------------------------------------------------
Layer 1   192 kbit  4:1     ---          ---          19 ms
Layer 2   128 kbit  6:1     2.1 to 2.6   4+           35 ms
Layer 3   64 kbit   12:1    3.6 to 3.8   4+           59 ms
--------------------------------------------------------------------
• 5 = perfect, 4 = just noticeable, 3 = slightly annoying,
2 = annoying, 1 = very annoying
• Real delay is about 3 times theoretical delay
Bit Allocation
• Process determines the number of code bits for each subband
• Based on information from the psychoacoustic model.
Bit Allocation For Layers 1 and 2
• Compute the mask-to-noise ratio (MNR) for all subbands:
MNR_dB = SNR_dB − SMR_dB

where
MNR_dB is the mask-to-noise ratio,
SNR_dB is the signal-to-noise ratio (SNR), and
SMR_dB is the signal-to-mask ratio from the psychoacoustic
model.
• Standard tables estimate the SNR for given quantiser levels.
• Designers are free to try other methods of SNR estimation.
Bit Allocation For Layers 1 and 2 (cont.)
Once MNR computed for all the subbands:
• Search for the subband with the lowest MNR
• Allocate code bits to that subband.
• When a subband gets allocated more code bits, the bit
allocation unit:
– looks up the new estimate for the SNR, and
– recomputes that subband's MNR.
• The process repeats until no more code bits can be
allocated.
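A hedged Python sketch of this greedy loop (the 6 dB-per-bit SNR table and the
one-bit-at-a-time accounting are illustrative simplifications of the standard's
tables and bit costs):

def allocate_bits(smr_db, total_bits, snr_table):
    """Repeatedly give a code bit to the subband with the lowest
    MNR = SNR - SMR, re-estimating the SNR from the quantiser table,
    until no more bits can be allocated."""
    bits = [0] * len(smr_db)
    while total_bits > 0:
        mnr = [snr_table[b] - s for b, s in zip(bits, smr_db)]
        worst = mnr.index(min(mnr))          # lowest MNR gets the bit
        if bits[worst] + 1 >= len(snr_table):
            break                            # cannot refine further
        bits[worst] += 1
        total_bits -= 1
    return bits

snr = [0, 6, 12, 18, 24, 30]                 # ~6 dB of SNR per bit
print(allocate_bits([10, 30, 20], total_bits=8, snr_table=snr))
# -> [1, 4, 3] : the band with the highest SMR gets the most bits.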
Bit Allocation For Layer 3
• Uses noise allocation, which employs Huffman coding.
• Iteratively varies the quantisers in an orderly way
– Quantises the spectral values,
– Counts the number of Huffman code bits required to code
the audio data
– Calculates the resulting noise in Huffman coding.
If there exist scale factor bands with more than the
allowed distortion:
• The encoder amplifies the values in those bands,
• which effectively decreases the quantiser step size for those
bands.
Bit Allocation For Layer 3 (Cont.)
After this the process repeats. The process stops if any of
these three conditions is true:
• None of the scale factor bands have more than the allowed
distortion.
• The next iteration would cause the amplification for any of
the bands to exceed the maximum allowed value.
• The next iteration would require all the scale factor bands to
be amplified.
Real-time encoders include a time-limit exit condition for this
process.
Stereo Redundancy Coding
Can we exploit redundancy in two coupled stereo channels?
• Another perceptual property of the human auditory system:
• Simply stated, at low frequencies the human auditory system
can't detect where the sound is coming from.
– So save bits and encode it mono.
Two types of stereo redundancy coding:
• Intensity stereo coding — all layers
• Middle/Side (MS) stereo coding — Layer 3 only.
Intensity stereo coding
Encoding:
• Code some upper-frequency subband outputs with:
– a single summed signal, instead of sending independent left
and right channel codes
– for each of the 32 subband outputs.
Decoding:
• Reconstruct the left and right channels
– based only on the single summed signal
– and independent left and right channel scale factors.
With intensity stereo coding,
• The spectral shape of the left and right channels is the same
within each intensity-coded subband,
• but the magnitude differs.
Middle/Side (MS) stereo coding
• Encodes the left and right channel signals in certain
frequency ranges:
– Middle — sum of left and right channels
– Side — difference of left and right channels.
• Encoder uses specially tuned threshold values to compress
the side channel signal further.
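A minimal Python sketch of the Middle/Side transform (normalised by 2 so that
decoding is exact; the specially tuned side-channel thresholds mentioned above
are not modelled here):

def ms_encode(left, right):
    """Middle = sum, Side = difference (halved for exact decoding)."""
    middle = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return middle, side

def ms_decode(middle, side):
    """Recover the left and right channels from middle/side."""
    left = [m + s for m, s in zip(middle, side)]
    right = [m - s for m, s in zip(middle, side)]
    return left, right

m, s = ms_encode([1.0, 0.9], [0.9, 1.0])
print(ms_decode(m, s))   # ([1.0, 0.9], [0.9, 1.0]) recovered exactly

For highly correlated channels the side signal is close to zero, which is why
it compresses so well.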
Further MPEG Audio Standards
MPEG-2 audio
Extension of MPEG-1:
• Completed in November 1994.
• Multichannel audio support:
– 5 high fidelity audio channels,
– Additional low frequency enhancement channel.
– Applicable for the compression of audio for High Definition Television or
digital movies.
• Multilingual audio support:
– Supports up to 7 additional commentary channels.
MPEG-2 audio (Cont.)
• Lower compressed audio bit rates:
– Supports bit rates down to 8 kbits/sec.
• Lower audio sampling rates:
– Besides 32, 44.1, and 48 kHz,
– Additional 16, 22.05, and 24 kHz.
– E.g. commentary channels can have half the sampling rate of the high fidelity channels.
MPEG-1/MPEG-2 Compatibility
Forward/backward compatibility?
• MPEG-2 decoders can decode MPEG-1 audio bitstreams.
• MPEG-1 decoders can decode two main channels of MPEG-2
audio bitstreams.
– Achieved by combining suitably weighted versions of each
of the up to 5 channels into a down-mixed left and right
channel.
– These two channels fit into the audio data framework of a
MPEG-1 audio bitstream.
– Information needed to recover the original left, right, and
remaining channels fit into:
• The ancillary data portion of a MPEG-1 audio bitstream,
or
• In a separate auxiliary bitstream.
MPEG-3/MPEG-4
MPEG-3 audio:
• does not exist anymore — merged with MPEG-2
MPEG-4 audio:
• Previously studied.
• Uses the structured audio concept.
• Delegates audio production to client-side synthesis where
appropriate.
• Otherwise compresses the audio stream as above.
Dolby Audio Compression
Application areas:
• FM radio, satellite transmission and broadcast TV audio
(DOLBY AC-1)
• Common compression format in PC sound cards
(DOLBY AC-2)
• High Definition TV standard advanced television (ATV)
(DOLBY AC-3). MPEG is a competitor in this area.
Differences with MPEG
• MPEG perceptual coders control the quantisation accuracy of
each subband by computing bit numbers for each sample.
• MPEG needs to store each quantiser value with each sample.
• The MPEG decoder uses this information to dequantise:
forward adaptive bit allocation.
• Advantage of MPEG: no need for psychoacoustic
modelling in the decoder, since every quantiser value is stored.
• DOLBY: uses a fixed bit rate allocation for each subband.
– No need to send it with each frame — as in MPEG.
– DOLBY encoders and decoders need this information.
Fixed Bit Rate Allocation
• Bit allocations are determined by known sensitivity
characteristics of the ear.
Different Dolby standards
DOLBY AC-1 :
Low complexity psychoacoustic model
• 40 subbands at a sampling rate of 32 kHz, or
• (proportionally more) subbands at 44.1 or 48 kHz.
• Typical compressed bit rate of 512 kbits per second for
stereo.
• Example: FM radio, satellite transmission and broadcast
TV audio.
DOLBY AC-2 :
A variation that allows subband bit allocations to vary:
• Now the decoder needs a copy of the psychoacoustic model.
• Minimised encoder bit stream overheads at the expense of
transmitting the encoded frequency coefficients of the sampled
waveform segment — known as the encoded spectral
envelope.
• This mode of operation is known as
backward adaptive bit allocation mode.
• High (hi-fi) quality audio at 256 kbits/sec.
• Not suited for broadcast applications:
– the encoder cannot change the model without changing
(remote/distributed) decoders.
• Example: common compression format in PC sound cards.
DOLBY AC-3 :
Development of AC-2 to overcome broadcast challenge
• Uses a hybrid backward/forward adaptive bit allocation mode.
• Any model modification information is encoded in a frame.
• Sample rates of 32, 44.1 and 48 kHz are supported, depending
on the bandwidth of the source signal.
• Each encoded block contains 512 subband samples, with 50%
(256) overlap between successive blocks.
• At a 32 kHz sample rate each block of samples is of 8
ms duration, so the duration of each encoded block is 16 ms.
• Audio bandwidth (at 32 kHz) is 15 kHz, so each subband
has 62.5 Hz bandwidth.
• Typical stereo bit rate is 192 kbits/sec.
• Example: High Definition TV standard advanced television
(ATV). MPEG is a competitor in this area.
Streaming Audio (and video)
Popular delivery medium for the Web and other Multimedia networks
Real Audio (http://www.realaudio.com/), Shockwave
(http://www.macromedia.com) and Quicktime audio
(http://www.apple.com/quicktime) are examples of streamed audio
(and video)
• Need to compress and uncompress data in realtime.
• Buffered Data:
– Trick: get data to the destination before it's needed
– Temporarily store it in memory (a buffer)
– The server keeps feeding the buffer
– The client application reads the buffer
• Needs a reliable, moderately fast connection.
• Specialised client, Streaming Audio Protocol (PNM for RealAudio).
Multimedia Integration, Interaction and
Interchange
Integrating Multimedia
• So far we studied media independently
• Certain media (individually) are based on spatial
and/or temporal representations,
• Others may be static.
Integrating media (Cont.):
• Spatial and temporal implications become even more critical.
• E.g. static text may need to index or label a portion of video
at a given instant or segment of time
– Integration becomes temporal and spatial if the label is
placed at a given location (or locations moving over time).
Synchronisation
• Important to know the tolerance and limits for each medium.
• Integration will require knowledge of these for synchronisation:
– Indeed it creates further limits,
– E.g. the bandwidth requirements of two media types increase if
audio encoded at a 48 kHz sampling rate needs to
accompany video being streamed out at 60 frames per
second.
• Inter-stream synchronisation is not necessarily
straightforward.
Integrated standards
• It is common (obvious) that media types are bundled together
for ease of delivery, storage etc.
• Formats have been developed to support, store and deliver
media in an integrated form.
Interchange Between Applications
The need for interchange between different multimedia
applications:
• running on different platforms
• has evolved common interchange file formats.
• These build on underlying individual media formats (MPEG, JPEG
etc.)
• Truly integrated to become multimedia — spatial, temporal,
structural and procedural constraints will exist between the
media.
• This is especially true now that interaction is a common feature
of multimedia.
Interactive Multimedia
Modern multimedia presentations and applications are
becoming increasingly interactive.
• Simple interactions that simply start movie clips, audio
segments, animations etc.
• Complex interaction between media is now available:
– Following hyperlinks is intrinsically non-linear, and
– the advent of digital TV is important.
• Interactivity now needs to be incorporated as part of the
media representation/format.
• The MHEG format (see below) has been developed
expressly for such purposes.
Multimedia Interchange
The need for interchange formats is significant in several
applications:
• As a final storage model for the creation and editing of
multimedia documents.
• As a format for delivery of final form digital media.
E.g. Compact Discs/DVDs to end-use players.
• As a format for real-time delivery over a distributed network
• As a format for interapplication exchange of data.
Quicktime
Introduction
• QuickTime is the most widely used cross-platform multimedia
technology available today.
• QuickTime now has powerful streaming capabilities, so you
can enjoy watching live events as they happen.
• Developed by Apple; QuickTime 6 (2002) is the latest
version.
• It includes streaming capabilities as well as the tools needed
to create, edit, and save QuickTime movies.
• These tools include the QuickTime Player, PictureViewer,
and the QuickTime Plug-in.
Quicktime Main Features
Versatile support for web-based media
• Access to live and stored streaming media content with the
QuickTime Player.
• High-Quality Low-Bandwidth delivery of multimedia
• Easy view of QuickTime movies (with enhanced control) in
Web Browsers and applications.
• Multi platform support.
• Built in support for most popular Internet media formats
(well over 40 formats).
• Easy import/export of movies in the QuickTime Player
Sophisticated playback capabilities
• Play back full-screen video
• Play slide shows and movies continuously
• Work with video, still-image, and sound files in all leading
formats
Easy content authoring and editing
• Create new QuickTime streaming movies by copying and
pasting content from any supported format
• Enhance movies and still pictures with filters for sharpening,
color tinting, embossing, and more
• Save files in multiple formats, including the new DV format
for high-quality video
• Create slide shows from pictures
• Add sound to a slide show
Quicktime Support of Media Formats
QuickTime is an open standard:
• Embraces other standards and incorporates them into its environment.
• It supports every major file format for pictures, including BMP, GIF, JPEG, PICT,
and PNG. Even JPEG 2000.
• QuickTime also supports every important professional file format for video,
including AVI, AVR, DV (Digital Video), M-JPEG, MPEG-1 – MPEG-4, and
OpenDML.
• All common Audio format — incl. MPEG-4 Structured Audio.
• MIDI standards support, including the Roland Sound Canvas sound set and
the GM/GS format extensions.
• Other multimedia — FLASH support.
• Other Multimedia integration standards — SMIL
• Key standards for web streaming, including HTTP, RTP, and RTSP as set forth
by the Internet Engineering Task Force, are supported as well.
• Speech Models — synthesised speech
• QuickTime supports Timecode tracks, including the critical
standard for video timecode (SMPTE) and for musicians.
QuickTime Concepts
The following concepts are used by QuickTime:
Movies and Media Data Structures —
• A continuous stream of data — cf. a traditional movie, whether
stored on film, laser disk, or tape.
• A QuickTime movie can consist of data in sequences from
different forms, such as analog video and CD-ROM.
• The movie is not the medium; it is the organizing principle.
• Contains several tracks.
• Each track refers to a media that contains references to the
movie data, which may be stored as images or sound on hard
disks, floppy disks, compact discs, or other devices.
• The data references constitute the track's media.
• Each track has a single media data structure.
Components —
• Provided so that every application doesn't need to know about
all possible types of audio, visual, and storage devices.
• A component is a code resource that is registered by the
Component Manager.
• The component's code can be available as a system wide
resource or in a resource that is local to a particular application.
• Each QuickTime component supports a defined set of features
and presents a specified functional interface to its client
applications.
• Applications are thereby isolated from the details of
implementing and managing a given technology.
• For example, you could create a component that supports a
certain data encryption algorithm.
• Applications could then use your algorithm by connecting to
your component through the Component Manager, rather than
by implementing the algorithm over again.
Image Compression —
• A QuickTime movie can demand substantially more storage than
single images.
• Minimizing the storage requirements for image data is
an important consideration for any application that works
with images or sequences of images.
• The Image Compression Manager provides the application
with an interface for compressing and decompressing.
• Independent of devices and algorithms.
Time —
• Time management in QuickTime is essential for
synchronisation
• QuickTime defines time coordinate systems, which anchor
movies and their media data structures to a common
temporal timeframe.
• A time coordinate system contains a time scale that
provides the translation between real time and the time
frame in a movie.
• Time scales are marked in time units.
Time (cont.) —
• The number of units that pass per second quantifies the
scale–that is, a time scale of 26 means that 26 units pass
per second and each time unit is 1/26 of a second.
• A time coordinate system also contains a duration, which
is the length of a movie or a media in the number of time
units it contains.
• Particular points in a movie can be identified by a time
value, the number of time units elapsed to that point.
• Each media has its own time coordinate system, which
starts at time 0.
• The Movie Toolbox maps each type of media data from
the movie’s time coordinate system to the media’s time
coordinate system.
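A trivial Python sketch of these time-scale conversions (the function names are
illustrative; this is not the QuickTime API):

def time_value_to_seconds(time_value, time_scale):
    """A time value of t units at a scale of s units/second is t/s
    seconds; e.g. 52 units at a time scale of 26 is 2.0 seconds."""
    return time_value / time_scale

def duration_in_units(seconds, time_scale):
    """Inverse mapping: a 3-second media at scale 26 lasts 78 units."""
    return int(seconds * time_scale)

print(time_value_to_seconds(52, 26), duration_in_units(3, 26))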
The QuickTime Architecture
QuickTime comprises two managers:
Movie Toolbox and
Image Compression Manager .
QuickTime also relies on the Component Manager, as well
as a set of predefined components.
The QuickTime Architecture (Cont.)
The relationships of these managers and an application that
is playing a movie:
Figure 55: Quicktime Architecture
The Movie Toolbox
allows you to:
• store,
• retrieve, and
• manipulate time-based data
that is stored in QuickTime movies.
The Image Compression Manager :
Comprises a set of functions that compress and decompress
images or sequences of graphic images.
• Device and driver independent means of compressing and
decompressing images and sequences of images.
• A simple interface for implementing software and hardware
image-compression algorithms.
• System integration functions for storing compressed
images as part of PICT files,
• Ability to automatically decompress compressed PICT files.
• Most applications use the Image Compression Manager
indirectly — by calling Movie Toolbox functions or by
displaying a compressed picture.
• Can call Image Compression Manager functions directly
The Component Manager :
The Component Manager allows you to define and register
types of components and communicate with components
using a standard interface.
• A component is a code resource that is registered by the
Component Manager.
• The component’s code can be stored in a system wide
resource or in a resource that is local to a particular
application.
QuickTime Components
QuickTime includes several components
• These components provide useful/essential services to your
application, and
• are essential to support the managers that make up the QuickTime
architecture.
QuickTime Components
Movie controller : Components which allow applications to play
movies using a standard user interface.
Standard image compression dialog : Components which allow the
user to specify the parameters for a compression operation by
supplying a dialog box or a similar mechanism.
Image compressor : Components which compress and decompress
image data.
Sequence grabber : Components which allow applications to
preview and record video and sound data as QuickTime movies.
Video digitizer : Components which allow applications to control
video digitization by an external device.
Media data-exchange : Components which allow applications to
move various types of data in and out of a QuickTime movie.
Derived media handler : Components which allow QuickTime to
support new types of data in QuickTime movies.
QuickTime Components (Cont.)
Clock : Components which provide timing services for QuickTime
applications.
Preview : Components which are used by the Movie Toolbox's
standard file preview functions to display and create visual
previews for files.
Sequence grabber : Components which allow applications to obtain
digitized data from sources that are external to a Macintosh
computer.
Sequence grabber channel : Components which manipulate captured
data for a sequence grabber component.
Sequence grabber panel : Components which allow sequence grabber
components to obtain configuration information from the user for
a particular sequence grabber channel component.
Open Media Framework Interchange (OMFI) Format
The OMFI is a common interchange framework developed
in response to an industry led standardisation effort (including
Avid — a major digital video/audio hardware/applications vendor)
Like Quicktime, the OMFI format is primarily concerned with the
temporal representation of media (such as video and audio), and a
track model is used.
Target: Video/Audio Production
The primary emphasis is video production, and a number of
additional features reflect this:
• Source (analogue) material objects represent videotape and
film so that the origin of the data is readily identified. Final
footage may revert to this original form so as to ensure the highest
possible quality.
• Special track types store (SMPTE) time codes for segments
of data.
• Transitions and effects for overlapping and sequences of
segments are predefined.
• Motion Control — the ability to play one track at a speed
which is a ratio of the speed of another track is supported.
OMFI Format/Support
The OMFI file format incorporates:
• A header — including references for objects contained in file
• Object dictionary — to enhance the OMFI class hierarchy in
an application
• Object data
• Track data
OMFI Support:
• Main Video development tools including
Apple Final Cut Pro, Xpress (Pro/DV), Softimage
• Main Audio development tools including:
Protools, Cakewalk/Sonar 2.0
Multimedia and Hypermedia Information
Encoding Expert Group (MHEG)
• Arose directly out of the increasing convergence of broadcast and interactive
technologies — DIGITAL INTERACTIVE TV
• Specifies an encoding format for multimedia applications independently of service
paradigms and network protocols.
• Like Quicktime and OMFI it is concerned with time-based media objects, whose
encodings are determined by other standards.
• Scope of MHEG is large in that it directly supports interactive media and real-time
delivery over networks.
• The current widespread standard is MHEG-5 but standards exist up to MHEG-8.
Practical MHEG: Digital Terrestrial TV
• Media interchange format in Digital TV set top boxes
• In the UK,
– ITV digital — WENT BUST (2002) !!!
– Freeview digital terrestrial (2002)
• MHEG is also widely used in European Digital TV.
Digital TV Group UK
• UK digital TV interests are managed by the Digital TV Group
UK — http://www.dtg.org.uk/.
• Alternative (satellite) digital TV interest: SKY,
– uses a proprietary API format, called OPEN (!!).
– MHEG advantage: is a truly open format (ISO standard).
– MHEG is the only open standard in this area.
Further reading:
http://www.dtg.org.uk/reference/mheg/mheg_index.html
Digital TV services
What sort of multimedia services does digital TV provide?
Figure 56: UK Digital TV Consortium
The family of MHEG standards
Version   Complete Name
MHEG-1    MHEG object representation - base notation (ASN.1)
MHEG-2    MHEG object representation - alternate notation (SGML)
MHEG-3    MHEG script interchange representation
MHEG-4    MHEG registration procedure
MHEG-5    Support for base-level interactive applications
MHEG-6    Support for enhanced interactive applications
MHEG-7    Interoperability and conformance testing for ISO/IEC 13522-5

Table 1: MHEG Standards
MHEG Standards Timeline
Version        Status
MHEG-1         International standard
MHEG-2         Withdrawn
MHEG-3         International standard
MHEG-4         International standard
MHEG-5         International standard
MHEG-6         International standard (1998)
MHEG-7         International standard (1999)
MHEG-8 (XML)   Draft international standard (Jan 1999)

Table 2: MHEG Standards Timeline
MHEG-5 overview
The major goals of MHEG-5 are:
• To provide a good standard framework for the development
of client/server multimedia applications intended to run on a
memory-constrained Client.
• To define a final-form coded representation for interchange
of applications across platforms of different versions and
brands.
• To provide the basis for concrete conformance levelling,
guaranteeing that a conformant application will run on all
conformant terminals.
• To allow the runtime engine on the Client to be compact and
easy to implement.
• To be free of strong constraints on the architecture of the
Client.
MHEG-5 Goals (Cont.)
• To allow the building of a wide range of applications by
providing access to external libraries (such applications may
only be partly portable).
• To allow for application code that is guaranteed to be “safe”.
• To allow automatic static analysis of (final-form) application
code in order to help insure bug-free applications and
minimize the debugging investment needed to get a robust
application.
• To promote rapid application development by providing
high-level primitives and provide a declarative paradigm for
the application development.
MHEG-5 Model
The MHEG-5 model is object-oriented.
The actions are methods targeted to objects from different
classes to perform a specific behavior and include:
• Preparation,
• Activation,
• Controlling the presentation,
• User interaction,
• Getting the value of attributes,
• and so on.
MHEG Client-Server Interaction
Figure 57: MHEG Client-Server Interaction
MHEG Programming Principles
OBJECT ORIENTED — simple Object-oriented implementation
MHEG-5 provides suitable abstractions for
• managing active, autonomous, and reusable entities
• pure object-oriented approach.
Basic MHEG Class Structure
An MHEG class is specified by three kinds of properties:
• Attributes that make up an object’s structure,
• Events that originate from an object, and
• Actions that target an object to accomplish a specific behavior
or to set or get an attribute’s value.
Main MHEG classes
The most significant classes of MHEG-5 are now briefly
described:
Root — A common Root superclass provides a uniform object
identification mechanism and specifies the general semantics
for preparation/destruction and activation/deactivation of
objects, including notification of changes of an object’s
availability and running status.
Group — This abstract class handles the grouping of objects in
the Ingredient class as a unique entity of interchange.
Group objects can be addressed and independently
downloaded from a server.
A Group can be specialized into Application and Scene
classes.
Main MHEG classes (Cont.)
Application — An MHEG-5 application is structurally organized
into one Application and one or more Scene objects.
• The Application object represents the entry point that
performs a transition to the presentation’s first Scene.
• Generally, this transition occurs at startup because a
presentation can’t happen without a Scene running.
• The Launch action activates an Application after quitting
the active Application.
• The Quit action ends the active Application, which also
terminates the active Scene’s presentation.
• The Ingredients of an Application are available to the
different Scenes that become active, thereby allowing an
uninterrupted presentation of contents
• E.g. a bitmap can serve as the common background for
all Scenes in an Application.
Main MHEG classes (Cont.)
Scene — This class allows spatially and temporally coordinated
presentations of Ingredients.
• At most, one Scene can be active at one time.
• Navigating within an Application is performed via the
TransitionTo action that closes the current Scene,
including its Ingredients, and activates the new one.
• The SceneCoordinateSystem attribute specifies the
presentation space’s 2D size for the Scene.
• If a user interaction occurs in this space, a UserInput
event is generated.
• A Scene also supports timers.
• A Timer event is generated when a timer expires.
Main MHEG classes (Cont.)
Ingredient — Abstract class provides the common behavior for
all objects included in an Application or a Scene.
• The OriginalContent attribute maps object and content
data
• The ContentHook attribute specifies the encoding format
for the content.
• The action Preload gives hints to the RTE for making
the content available for presentation.
– Especially for streams, this action does not completely
download the content, it just sets up the proper network
connection to the site where the content is stored.
• The action Unload frees allocated resources for new
content.
The Presentable, Stream, and Link classes are subclasses of the Ingredient
class.
Ingredient subclasses
Presentable — This abstract class specifies the common aspects
for information that can be seen or heard by the user. The
Run and Stop actions activate and terminate the
presentation, while generating the IsRunning and
IsStopped events.
Visible — The Visible abstract class specializes the
Presentable class with provisions for displaying objects in
the active Scene’s presentation space.
The OriginalBoxSize and OriginalPosition attributes
respectively specify the size and position of the object’s
bounding box relative to the Scene’s presentation space.
The actions SetSize and SetPosition change the current
values of these attributes.
Visible Object Classes
The specialized objects in the Visible class include:
• Bitmap — This object displays a 2D array of pixels. The
Tiling attribute specifies whether the content will be replicated
throughout the BoxSize area.
The action ScaleBitmap scales the content to a new size.
Example, to create a simple bitmap object:
(bitmap: BgndInfo
content-hook: #bitmapHook
content-data: referenced-content:
"Info.bitmap"
box-size: ( 320 240 )
original-position: ( 0 0 )
)
Visible Object Classes (Cont.)
• LineArt, DynamicLineArt — A LineArt is a
vector representation of graphical entities, like polylines and
ellipses.
DynamicLineArt draws lines and curves on the fly in the
BoxSize area.
• Text — This object represents a text string with a set of
rendition attributes. Essentially, these attributes specify fonts
and formatting information like justification and wrapping.
Ingredient subclasses (Cont.)
Stream — This class controls the synchronized presentation of
multiplexed audio-visual data (such as an MPEG-2 file).
• A Stream object consists of a list of components from the
Video, Audio, and RTGraphics (animated graphics) classes.
• The OriginalContent attribute of the Stream object refers
to the whole multiplex of data streams.
• When a Stream object is running, its streams can be switched
on and off independently — this allows users to switch between
different audio trails (different languages) or choose which video
stream(s) to present among a range of available ones.
• Specific events are associated with playback:
StreamPlaying/StreamStopped notifies the actual
initiation/termination and CounterTrigger notifies the system
when a previously booked time-code event occurs.
Ingredient subclasses (Cont.)
Link — The Link class implements event-action behavior by a
condition and an effect.
• The LinkCondition contains
– An EventSource — a reference to the object on which
the event occurs
– An EventType — specifies the kind of event and a
possible EventData that is a data value associated
with the event.
• MHEG-5 Action objects consist of a sequence of
elementary actions.
• Elementary actions are comparable to methods in standard
object-oriented terminology.
• The execution of an Action object means that each of its
elementary actions are invoked sequentially.
Simple Link Example
As an example, consider the following Link, which transitions
to another Scene when the character A is entered in the
EntryField EF1.
Example, to create a simple link:
(link: Link1
event-source: EF1
event-type: #NewChar
event-data: ’A’
link-effect:
(action: transition-to: Scene2)
)
Interactible Object Class
Interactible — This abstract class provides a way for users to
interact with objects within the following sub-classes:
Hotspot, PushButton, and SwitchButton —
These subclasses implement button selection capability
and generate the IsSelected event.
Example, to create a simple SwitchButton:
(switchbutton: Switch1
style: #radiobutton
position: ( 50 70 )
label: "On"
)
Interactible Object Class (Cont.)
Hypertext — This class extends the Text class with anchors.
When selected, these anchors link text content to associated
information.
Slider and EntryField — Respectively, these objects let users
adjust a numeric value (such as the volume of an audio
stream) and edit text.
Example: to create a simple slider:
(slider: Slider1
box-size: ( 40 5 )
original-position: ( 100 100 )
max-value: 20
orientation: #right
)
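An EntryField can be sketched in the same style (the max-length attribute spelling is an assumption; the name EF1 matches the EntryField used in the earlier Link example):
(entryfield: EF1
box-size: ( 200 20 )
original-position: ( 60 120 )
max-length: 32
)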
UK Digital Terrestrial MHEG Support: EuroMHEG
• Only some of the main MHEG classes have been addressed above;
a few others have been omitted.
• The aim is to gain a broad understanding of how MHEG works
and of the basic classes that support this.
MHEG Class Support:
• Not all MHEG engines support all MHEG classes.
• UK digital TV MHEG initially needed to be restricted to meet
production timescales for the launch.
• The EuroMHEG standard was thus defined.
• EuroMHEG is extensible, so updates and additions can be
included in due course.
EuroMHEG Classes
The MHEG classes supported by EuroMHEG are:
Root, Group, Application, Scene, Ingredient, Program,
ResidentProgram, RemoteProgram, Palette, Font, CursorShape,
Variable, BooleanVariable, IntegerVariable, OctetStringVariable,
ObjectRefVariable, ContentRefVariable, Presentable, TokenManager,
TokenGroup, ListGroup, Visible, Bitmap, LineArt, Rectangle,
DynamicLineArt, Text, Stream, Audio, Video, RTGraphics,
Interactible, Slider, EntryField, HyperText, Button, HotSpot,
PushButton, SwitchButton, Action, and Link.
Interaction within a Scene
The MHEG application is event-driven, in the sense that all
actions are called as the result of an event firing a link.
Events can be divided into two main groups:
• Asynchronous events are events that occur asynchronously
to the processing of Links in the MHEG engine. These include
timer events and user input events. An application area
of MHEG-5 (such as DAVIC) must specify the permissible
UserInput events within that area.
Asynchronous events are queued.
• Synchronous events are events that can only occur as the
result of an MHEG-5 action being targeted to some objects.
A typical example of a synchronous event is IsSelected,
which can only occur as the result of the MHEG-5 action
Select being invoked.
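A sketch of this chain (object names are illustrative): the first link reacts to an asynchronous UserInput event and invokes Select on a button; the resulting synchronous IsSelected event immediately fires the second link, before any further queued asynchronous events are processed:
(link: LinkA
event-source: InfoScene1
event-type: #UserInput
event-data: #Left
link-effect: (action: select: Switch1)
)
(link: LinkB
event-source: Switch1
event-type: #IsSelected
link-effect: (action: transition-to: Scene2)
)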
MHEG Engine Basics
The mechanism at the heart of the MHEG engine:
1. After a period of idleness, an asynchronous event occurs — e.g.,
a user input event, a timer event, a stream event, or some other
type of event.
2. A link that reacts to the event may be found; this link is then
fired. If no such link is found, the process starts again at step 1.
3. The result of a link being fired is the execution of an action object,
which is a sequence of elementary actions. These can change
the state of other objects, create or destroy other objects, or
cause events to occur.
4. As a result of the actions being performed, synchronous events
may occur. These are dealt with immediately, i.e., before processing
any other asynchronous events queued.
When all events have been processed, the process starts again at
step 1.
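For example, a scene can book a timer whose expiry later arrives as an asynchronous TimerFired event in step 1. A sketch (the set-timer spelling, timer identifier, and millisecond delay are assumptions in this document's pseudo-notation; names are illustrative):
(link: ArmTimer
event-source: InfoScene1
event-type: #UserInput
event-data: #Left
link-effect: (action: set-timer: InfoScene1 ( 1 5000 ))
)
(link: TimerLink
event-source: InfoScene1
event-type: #TimerFired
event-data: 1
link-effect: (action: transition-to: InfoScene2)
)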
Availability; Running Status
Before doing anything to an object, the MHEG-5 engine must
prepare it:
• Preparing an object typically entails retrieving it from the
server, decoding the interchange format and creating the
corresponding internal data structures, and making the
object available for further processing.
• The preparation of an object is asynchronous; its completion
is signalled by an IsAvailable event.
• All objects that are part of an application or a scene have a
RunningStatus, which is either true or false.
• Objects whose RunningStatus is true are said to be
running, which means that they perform the behaviour they
are programmed for.
RunningStatus (Cont.)
More concretely, the following rules are governed by
RunningStatus:
• Only running Visibles are actually visible on the screen,
• Only running Audio objects are played out through the
loudspeaker,
• Only running Links will execute the action part if the
associated event occurs, etc.
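RunningStatus is set through the Run and Stop elementary actions. A sketch (names illustrative; the run spelling extrapolates the pseudo-notation): a link that starts an Audio object playing when a button is selected:
(link: StartAudio
event-source: Switch1
event-type: #IsSelected
link-effect: (action: run: Audio1)
)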
Interactibles
The MHEG-5 mix-in class Interactible groups some
functionality associated with user interface-related objects:
e.g., Slider, HyperText, EntryField, and the Button classes.
These objects can all be highlighted by setting their
HighlightStatus to True.
They also have the attribute InteractionStatus, which,
when set to true, allows the object to interact directly with the
user, thus bypassing the normal processing of UserInput
events by the MHEG-5 engine.
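A sketch (the set-interaction-status spelling extrapolates the pseudo-notation; EF1 echoes the earlier EntryField example): a link that hands input focus to an EntryField by setting its InteractionStatus:
(link: FocusEF1
event-source: Switch1
event-type: #IsSelected
link-effect: (action: set-interaction-status: EF1 true)
)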
Interactibles (Cont.)
Exactly how an Interactible reacts when its
InteractionStatus is true is implementation-specific.
Example:
• The way that a user enters characters in an EntryField
can be implemented in different ways in different MHEG-5
engines.
At most one Interactible at a time can have its
InteractionStatus set to True.
Visual Representation
For objects that are visible on the screen, the following rules apply:
• Objects are drawn downwards and to the right of their position
on the screen. This point can be changed during the life
cycle of an object, thus making it possible to move objects.
• Objects are drawn without scaling. Objects that do not fit
within their bounding box are clipped.
• Objects are drawn with "natural" priority, i.e., on top of already
existing objects. However, it is possible to move objects to
the top or the bottom of the screen, as well as to put them
before or after another object (see the sketch after this list).
• The screen can be frozen, allowing the application to perform
many (possibly slow) changes and not update the screen
until it’s unfrozen.
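A sketch of the priority actions (the bring-to-front spelling extrapolates the pseudo-notation; names are illustrative): a link that moves a bitmap to the top of the display when a button is selected:
(link: RaiseLink
event-source: Switch1
event-type: #IsSelected
link-effect: (action: bring-to-front: BgndInfo)
)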
Object Sharing Between Scenes
It is possible within MHEG-5 to share objects between some
or all scenes of an Application.
For example, sharing can be used:
• To have variables retain their value over scene changes, or
• To have an audio stream play on across a scene change.
Shared objects are always contained in an Application
object:
• Since there is always exactly one Application object
running whenever a scene is running, the objects contained
in an Application object are visible to each of its scenes.
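A sketch of a shared object (the application name and variable are illustrative): an IntegerVariable declared among the Application's group items keeps its value across every scene change:
(application: MyApp
group-items:
(integervariable: Score
original-value: 0
)
)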
MHEG Object Encoding
The MHEG-5 specification does not prescribe any specific formats
for the encoding of content.
• For example, it is conceivable that a Video object is encoded
as MPEG or as motion-JPEG.
This means that the group using MHEG-5 (e.g., EuroMHEG)
must define which content encoding schemes to apply for the
different objects in order to achieve interoperability.
However, MHEG-5 does specify a final-form encoding of the
MHEG-5 objects themselves.
• This encoding is an instance of ASN.1, using the Basic
Encoding Rules (BER).
MHEG Coding Examples: A Simple MHEG Example
Consider a very simple scene that displays a bitmap and text.
• The user can press the 'Left' button on the mouse (or other
input device), and
• A transition is made from the current scene, InfoScene1,
to a new scene, InfoScene2.
The pseudo-code for the above scene may look like the
following:
(scene:InfoScene1
<other scene attributes here>
group-items:
(bitmap: BgndInfo
content-hook: #bitmapHook
original-box-size: (320 240)
original-position: (0 0)
content-data: referenced-content: "InfoBngd"
)
(text:
content-hook: #textHook
original-box-size: (280 20)
original-position: (40 50)
content-data: included-content: "1. Lubricate..."
)
links:
(link: Link1
event-source: InfoScene1
event-type: #UserInput
event-data: #Left
link-effect: action: transition-to: InfoScene2
)
)
An MHEG Player Java Applet — Further MHEG Examples
The Technical University of Berlin has produced an MHEG
Java engine:
http://www.prz.tu-berlin.de/~joe/mheg/mheg_engine.html
• Java Class libraries (with JavaDoc documentation) and details
on installation/compilation etc. are also available.
• Several examples of MHEG coding, including an MHEG
Introduction written in MHEG.
Running the MHEG Engine
The MHEG engine exists as a Java applet and supporting
class libraries:
• You can, of course, use the class library in your own Java
code (applications and applets).
The MHEG engine is available in the MHEG Examples on the
Multimedia Lecture Examples Web Page.
You can run the applet through any Java-enabled Web browser
or applet viewer.
Running the MHEG Engine Applet
Here is an example of how to run the main applet provided
for the demo MHEG example:
<applet name="MHEG 5 Engine"
        code="mheg5/POM/Mheg5Applet.class"
        codebase="applications/"
        archive="mhegwww.zip"
        width="510"
        height="346"
        align="center">
  <param name="objectBasePath" value="file:.">
  <param name="groupIdentifier" value="demo/startup">
  <param name="mon" value="false">
</applet>
Running the MHEG Engine Applet: Own Applications
If you use the applet yourself, you may need to change:
• The code and codebase paths — these specify where the
applications and applet classes reside.
• The groupIdentifier value — for most of the application
demos, a startup MHEG file is referenced first in a folder for
each application.
See other examples below.
MHEG Example — The Simple MHEG Presentation
The Simple example produces the following output:
Figure 58: MHEG Simple Application Example
The presentation creates:
• Two buttons, labelled “Hello” and “World” respectively, and
• Some rectangle graphics.
• When pressed, a button is brought to the foreground of the
display.
MHEG Example — The Simple MHEG Presentation Structure
The MHEG modules for this presentation are:
startup — calls helloworld.mheg
helloworld.mheg — sets up the main presentation and calls
scene1.mheg
scene1.mheg — called from helloworld.mheg
MHEG Example — The Demo MHEG Presentation
The Demo example produces the following output:
Figure 59: MHEG Demo application Example
As can be seen, many of the key features of MHEG are illustrated
in further sub-windows (click on a button to move to the respective
window). Try these out for yourself.
The following MHEG modules are used:
startup — Initial module
main.mhg — Called by startup
disp1.mhg — input from numeric keys 1 and 2 to tile rectangles
(Fig 60)
disp2.mhg — input from numeric keys 1 and 2 to tile rectangles
(different display) (Fig 61)
text.mhg — illustrates MHEG control of text display (Fig 62)
intact.mhg — illustrates MHEG interactive objects (Fig 63)
bitmap1.mhg — illustrates MHEG display of bitmaps
bitmap2.mhg — illustrates MHEG display of bitmaps
ea.mhg — illustrates MHEG elementary actions (Fig 64)
allcl.mhg — MHEG concrete classes and elementary actions
(Fig 65)
Figure 60: MHEG Demo application Display1 Example
Figure 61: MHEG Demo application Display2 Example
Figure 62: MHEG Demo application Text Example
Figure 63: MHEG Demo application Interactive Objects Example
Figure 64: MHEG Demo application Elementary Actions Example
Figure 65: MHEG Demo application Concrete Classes Example
token.mhg — MHEG token groups example (Fig 66)
Figure 66: MHEG Demo application Token Groups Example
More Examples
Further examples are available in the applications folder:
bitmap — further examples of bitmaps in MHEG
interacting — further examples of interaction in MHEG
intvar — integer variables
jmf — video and audio
quiz2 — a quiz written in MHEG
text — further text in MHEG.
MHEG Relationships to Major Standards
Important relationships exist between MHEG-5 and other
standards and specifications.
Davic (Digital Audio Visual Council) — aims to maximize
interoperability across applications and services for the
broadcast and interactive domains.
Davic 1.0 selected MHEG-5 for encoding base-level
applications, and Davic 1.1 relies on MHEG-6 to extend these
applications in terms of a Java virtual machine that uses
services from the MHEG-5 RTE.
DVB (Digital Video Broadcasting) — provides a complete
solution for digital television and data broadcasting across a
range of delivery media where audio and video signals are
encoded in MPEG-2.
MHEG Relationships to Major Standards (Cont.)
MPEG — a family of standards used for coding audiovisual
information (such as movies, video, and music) in a digital
compressed format.
MPEG-1 and MPEG-2 streams are likely to be used by
MHEG-5 applications, which can easily control their playback
through the facilities provided by the Stream class.
DSMCC (Digital Storage Media Command and Control) — a
set of protocols for controlling and managing MPEG streams
in a client-server environment.
The user-to-user protocol (both the client and server are
users) consists of VCR commands for playback of streams
stored on the server, as well as commands for downloading
other data (bitmaps, text, and so on).
MHEG Implementation
Several components may be required in implementing an MHEG
system:
Runtime Engine (RTE) — MHEG-5 runtime engines generally
run across a client-server architecture
• See the Armida (ATM) system (Figure 67), referenced
below, for an example application.
• Also see the Java MHEG engine previously mentioned.
MHEG Implementation (Cont.)
Figure 67: Armida Client Architecture
Armida is a client-server based interactive multimedia application
retrieval system.
MHEG Implementation (Cont.)
A preceding Start-up Module may be used to perform general
initialization etc.:
• The client can be launched either as an autonomous Windows
application or
• As a plug-in within an HTML browser, allowing seamless
navigation between the World Wide Web and the webs of
MHEG-5 applications. (See the Armida system for more details.)
• A Java RTE is also available.
Run Time Engine (RTE)
The MHEG-5 RTE is the kernel of the client’s architecture. It
performs
• The pure interpretation of MHEG-5 objects, and
• As a platform-independent module, it issues I/O and data
access requests to other components that are optimized for
the specific runtime platform.
The RTE performs two main tasks:
• Preparing the presentation — accessing, decoding, and
managing MHEG-5 objects in their internal format.
• The actual presentation — based on an event loop in which
events trigger actions. These actions then become requests to
the Presentation layer, along with other actions that internally
affect the engine.
Presentation layer
The presentation layer (PL)
• manages windowing resources,
• deals with low-level events, and
• performs decoding and rendering of contents from different
media to the user.
• It exposes its functionality via an object-oriented API.
Access module
This module provides a consistent API for accessing information
from different sources.
It is used by the RTE to get objects and by the PL to access
content data (either downloaded or streamed).
Typical applications should support:
• Bulk download for bitmaps, text, and MHEG-5 objects;
and
• Progressive download for audio and audiovisual streams.
DSMCC Interface
The implementation of these mechanisms occurs via the
DSMCC interface.
• The user has full interactive control of the data presentation,
including playback of higher quality MPEG-2 streams
delivered through an ATM network.
• Object and content access requests can also be issued to
the Web via HTTP, though this may not yet provide adequate
quality of service (QoS).
• When accessing the broadcast service, the Access module
requires the DVB channel selection component to select the
program referred to by a Stream object.
MHEG Authoring Tools: MediaTouch
An adequate authoring tool is essential for creating MHEG
applications. The MediaTouch (Figure 68) application is one
example, developed for the Armida System
(http://drogo.cselt.stet.it/ufv/ArmidaIS/home_en.htm).
MediaTouch (Cont.)
It is a visual, hierarchical iconic authoring tool, similar in
approach to Authorware.
Figure 68: MediaTouch MHEG Authoring Tool
(Hierarchy and Links Editor windows)
MHEG Authoring Tools: MHEGDitor
MHEGDitor is an MHEG-5 authoring tool based on
Macromedia Director, composed of:
• An Authoring Xtra to edit applications — it opens a window
to set preferences and links a specific external script
castLib to your movie so that you can create specific MHEG
behaviours rapidly. You can test your application on the
spot within Director, as if it were played by MHEGPlayer,
the MHEG interpreter companion of MHEGDitor.
• A Converter Xtra to convert the resulting movies into MHEG-5
applications — it converts Macromedia Director movies
(edited with the MHEGDitor Authoring Xtra) into a folder
containing all the items necessary for an MHEG-5 application.
• The two MHEGDitor Xtras work separately.
MHEG Writing Tools: MHEGWrite
MHEGWrite is an editor for creating and manipulating MHEG-5
applications by hand:
• It is based on the free software "VIM", which is available
via the Internet from various sites for virtually all
operating systems.
• The MHEGWrite extension supports only the MHEG-5
textual notation.
• It provides macros for object templates, syntax highlighting,
and syntax error detection.
Playing MHEG files — There are a few ways to play MHEG
files:
• MHEGPlayer is an MHEG-5 interpreter which is able to
execute MHEG-5 applications developed with
MHEGDitor or any other authoring tool.
• MHEG Java Engine — Java source code exists to compile
a platform-independent MHEG player
(http://enterprise.prz.tu-berlin.de/imw/).
• MHEG plug-ins for Netscape browsers and Internet
Explorer have been developed.
Note: a Web-to-MHEG converter also exists.
MHEG Future
Several companies and research institutes are currently
developing MHEG tools and applications and conducting
interoperability experiments for international projects and
consortia.
The MHEG Support Center is a European project that hopes
to implement and operate an MHEG support and conformance
testing environment for developers of multimedia systems and
applications.