Managing sound in a relational multimedia database system.
Sawyer, Gregory R.
Monterey, California. Naval Postgraduate School, 1988
Calhoun: The NPS Institutional Archive (Theses and Dissertation Collection)
http://hdl.handle.net/10945/23068
NAVAL POSTGRADUATE SCHOOL
Monterey, California

THESIS

MANAGING SOUND IN A RELATIONAL MULTIMEDIA DATABASE SYSTEM

by

Gregory Russell Sawyer

December 1988

Thesis Advisor: Vincent Y. Lum

Approved for public release; distribution is unlimited.
REPORT DOCUMENTATION PAGE

Report Security Classification: UNCLASSIFIED
Distribution/Availability of Report: Approved for public release; distribution is unlimited.
Performing Organization: Naval Postgraduate School (Code 52), Monterey, CA 93943-5000
Monitoring Organization: Naval Postgraduate School, Monterey, CA 93943-5000
Title (Include Security Classification): MANAGING SOUND IN A RELATIONAL MULTIMEDIA DATABASE SYSTEM
Personal Author(s): Sawyer, Gregory R.
Type of Report: Master's Thesis
Date of Report (Year, Month, Day): 1988 December
Supplementary Notation: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Subject Terms: multimedia, multimedia database, digitizing, encoding, abstract data type, sound, operations, sound handling
Name of Responsible Individual: Professor Vincent Y. Lum, (408) 646-2449, Code 52Lu
Abstract Security Classification: UNCLASSIFIED

Abstract: Sound, in all of its varied forms, is an important and widely used medium for the transmission of information. The widespread use of computers has greatly increased the breadth and depth of our information processing abilities. Yet the limited sensory functionality of computers has traditionally dictated a predominantly alphanumeric or "textual" communications interface standard. This thesis concentrates on the effective manipulation (i.e., capture, storage and retrieval) of sound data in a relational database system. It introduces the concept of an abstract data type of type SOUND which permits a level of sophistication in data manipulations that is beyond the capabilities of current systems. Such sophistication is accomplished through the use of a set of data manipulation operations which effectively hide the representation of the SOUND data structure from the user. As a result, the current familiarity of the user's view of the database remains unchanged when extended to the multimedia information processing environment.

DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted; all other editions are obsolete.
Approved for public release; distribution is unlimited.

MANAGING SOUND IN A RELATIONAL MULTIMEDIA DATABASE SYSTEM

by

Gregory R. Sawyer
Lieutenant Commander, United States Navy
B.S., United States Naval Academy, 1977
Submitted in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
IN
COMPUTER SCIENCE
from the
NAVAL POSTGRADUATE SCHOOL
December 1988
ABSTRACT

Sound, in all of its varied forms, is an important and widely used medium for the transmission of information. The widespread use of computers has greatly increased the breadth and depth of our information processing abilities. Yet the limited sensory functionality of computers has traditionally dictated a predominantly alphanumeric or "textual" communications interface standard. This thesis concentrates on the effective manipulation (i.e., capture, storage and retrieval) of sound data in a relational database system. It introduces the concept of an abstract data type of type SOUND which permits a level of sophistication in data manipulations that is beyond the capabilities of current systems. Such sophistication is accomplished through the use of a set of data manipulation operations which effectively hide the representation of the SOUND data structure from the user. As a result, the current familiarity of the user's view of the database remains unchanged when extended to the multimedia information processing environment.
TABLE OF CONTENTS

I. INTRODUCTION
   A. BACKGROUND
   B. PURPOSE OF THESIS
   C. AN OVERVIEW
   D. ORGANIZATION OF THESIS
II. A DESCRIPTION OF SOUND ENCODING TECHNIQUES
   A. INTRODUCTION
   B. INTRODUCTION TO SOUND
   C. TYPES OF ENCODING
      1. Waveform Encoding
      2. Parameter Encoding
   D. TRANSFORMATION BETWEEN TYPES
III. MULTIMEDIA MASS STORAGE DEVICES
   A. CONVENTIONAL SYSTEMS
   B. OPTICAL DISKS
IV. OVERVIEW OF RELATED WORK
   A. INTRODUCTION
   B. OVERVIEW
V. THE SOUND DATABASE AND INFORMATION SYSTEM
   A. SOUND MEDIA MANAGEMENT
   B. A MULTIMEDIA ARCHITECTURE
   C. THE SOUND DATA TYPE
      1. Sound Data Organization
      2. The User's View
      3. Implementation of the Abstract Data Type
VI. DESCRIPTION OF A SOUND MANAGEMENT PROTOTYPE
   A. ARCHITECTURE OF A PROTOTYPE
   B. MODEL SPECIFIC EQUIPMENT
   C. IMPLEMENTATION CONSIDERATIONS OF THE MODEL
VII. SUMMARY AND CONCLUSIONS
   A. REVIEW OF THESIS
   B. APPLICATIONS
   C. FUTURE RESEARCH AREAS
LIST OF REFERENCES
APPENDIX A - THE SQL PREPROCESSOR OVERVIEW
APPENDIX B - THE INTERNAL SOUND HANDLER FUNCTIONS
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
LIST OF FIGURES

Figure 1. Properties of a simple sinusoidal waveform
Figure 2. DSP Hardware Configuration for Coder Designs
Figure 3. How an Optical Disk Works
Figure 4. The Conceptual Model of a Multimedia Object
Figure 5. The Components of a Multimedia System
Figure 6. Architecture of the Sound Media Prototype
I. INTRODUCTION

A. BACKGROUND

Recent advances in both hardware and database applications have furthered the idea of achieving the representational storage of real world information objects within the computer system. Applications must be developed to effectively and efficiently handle the vast diversity of this real world multimedia data. In addition to text, media such as graphics, images, and sound are gaining greater importance and must also be effectively integrated. It is this ability to access more than mere textual information that embodies the principal driving force behind the development of multimedia information systems.
Sound, in all of its varied forms, is an important and widely used medium for the transmission of information. The widespread use of computers has greatly increased the breadth and depth of our information processing abilities. Yet the limited sensory functionality of computers has traditionally dictated a predominantly alphanumeric or "textual" communications interface standard.

The efficient storage and retrieval of multimedia information has recently sparked numerous research efforts. Several prototypes, primarily within the area of office automation, have been developed. Many of these systems will be discussed in more detail in Chapter IV. The ultimate use of multimedia information systems lies in the creation of an artificial reality which is totally controlled by the user. There are some obvious benefits to this approach. By involving more sensory stimuli, better data correlations and increased information accesses are achievable.
Multimedia information can be of special interest to the Department of Defense. Voice, for example, is much easier to encrypt digitally rather than through the use of an analog process. The superior quality of digital encryption is a definite advantage. Additionally, the ability to access computer systems through the use of the telephone provides far reaching applications in the area of sound management.
Each type of information medium requires its own unique modes of handling. But the fundamental difficulty faced by users of multimedia database systems lies in handling the rich semantics of the multimedia data. Unfortunately, the ability to properly manage unconventional data has proven an extremely difficult and highly complex task. Better ways must be found which reduce the complexity involved in handling multimedia data.

B. PURPOSE OF THESIS
The underlying focus of this thesis lies in the research of effective means by which sound can be managed (i.e., captured, stored, retrieved, edited) in a multimedia information and database system. This research is designed to answer the following questions:

1. What is the feasibility of developing audio (sound) storage and retrieval capabilities using only conventional programming tools in conjunction with existing off-the-shelf technologies?
2. What is an appropriate design for a database which permits the querying of unstructured data in the form of sound?
3. What characteristics of sound must be captured and stored which would aid in the realization of such a design?
4. What functions are required to allow users to manipulate sound data?
5. What kinds of queries of sound data are meaningful and how can these criteria be achieved?
6. How can the development of this technology be used to meet the growing real time requirements of today's highly technical Navy?
Once a discussion of sound data's capture, storage, retrieval and implementation details has been presented, an architecture of a prototype will be proposed. This phase will explore the feasibility of integrating existing hardware and software components into a functional multimedia information system which incorporates the management of sound data. The utilization of "off-the-shelf" technology will serve as the underlying criteria for the design of this prototype.
C. AN OVERVIEW

The management of sound data in a database management system offers several unique challenges that often go unnoticed or do not apply in other media forms. For example, the real time aspect of acoustic energy must be captured and expressed if a displayed (i.e., played) sound is to have relevance and meaning. That is, a playback that is too fast or too slow may be entirely unintelligible. When converting a sound into a digital representation which can be managed by the computer system, a fixed sampling rate must be employed. Otherwise, a substantial loss of quality and information will result.

Information retrieval in conventional (i.e., alphanumeric data to be called textual) database systems has enjoyed considerable success over the past two decades. However, extending these accomplishments to other media has not met with similar success. The management of sound, unfortunately, has also proven to be no exception. Unlike text based systems, sound features cannot be readily extracted from a data file without a complex sequence of steps, many of which remain to be defined. Additionally, the enormous data resulting from an acceptable sampling rate of an acoustic input is, by present system standards, far too large to be reasonably stored within the database itself. Future DBMS, however, are expected to handle such voluminous data storage requirements as part of their standard operations.
Current technology is limited in its ability to handle sound data storage to an equal extent as that afforded the more familiar textual form. This is not meant to imply that such limitations will continue to be the state of future systems. Many of the sound handling systems reviewed in the literature have centered their research and development efforts in the area of office automation. The use of sound data for inner office taskings such as audio memos and annotations constitutes the restricted integration of sound within the scope of current multimedia information systems.
A few prototypes under development, however, do offer the promise of increased sound data manipulation techniques and a wider variety of applications. In addition to the basic sound manipulation features of record, store and play, such systems as the Etherphone and Diamond (see Chapter IV) also offer the ability to edit and link sound data segments. It is from this perspective that the research direction for this thesis has evolved. Through the introduction of the concept of an abstract data type of type SOUND, we are able to achieve a level of sophistication in data manipulations that is beyond the capabilities of current systems. By providing a set of operations which can be used to manipulate the SOUND data structure, we effectively hide the representation of the data structure from the user. Through this approach, the current familiarity of the user's view of the database need never change.
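The abstract data type idea can be sketched in "C", the language of the prototype's internal handler functions. The operation names and the private representation below are illustrative assumptions, not the actual interface of the thesis prototype:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: the representation of SOUND sits behind a
 * handle, and users touch it only through a small set of operations. */
typedef struct Sound {
    unsigned char *samples;   /* private: digitized amplitude values */
    size_t length;            /* private: number of samples held */
    unsigned rate;            /* private: sampling rate in Hz */
} Sound;

Sound *sound_create(unsigned rate_hz) {
    Sound *s = calloc(1, sizeof *s);
    if (s) s->rate = rate_hz;
    return s;
}

/* Append newly captured samples; returns 0 on success, -1 on failure. */
int sound_append(Sound *s, const unsigned char *buf, size_t n) {
    unsigned char *p = realloc(s->samples, s->length + n);
    if (!p) return -1;
    memcpy(p + s->length, buf, n);
    s->samples = p;
    s->length += n;
    return 0;
}

/* Duration in seconds, derived from hidden state: the caller never
 * needs to know how the samples are actually stored. */
double sound_duration(const Sound *s) {
    return (double)s->length / s->rate;
}

void sound_destroy(Sound *s) {
    if (s) { free(s->samples); free(s); }
}
```

Because callers see only the operations, the internal sample buffer could later change (for instance, to packed ADPCM data) without altering the user's view of the database.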
One of the major drawbacks faced in handling digitized acoustic data is the large data volume which results. These problems have been under study for many years by telephone companies concerned with minimizing the amount of information being transmitted without a corresponding loss in signal quality. Since the Nyquist theorem states that sampling rates of twice the highest frequency present are sufficient to capture all of an input sound, megabyte volumes for a few minutes of captured sound is quite the norm. Understandably, the efficient management of large amounts of data in a database is always tricky, regardless of what the actual data may represent.

To answer queries with regard to sound, a person engages well established mental capacities to analyze, synthesize and interpret the information. More useful information, however, is often obtainable from the context of the sound. This includes emphasis such as the vocal intonations and inflections associated with speech. The ability to extract certain features from the sound data is, in itself, a wide open area for research. Realistically, however, the user of the multimedia database system should not expect this level of information extraction capability from current technology.
The approach proposed by Meyer-Wegener, et al. [Ref. 1], is to abstract the contents of sound data, image data and other forms into words or text. By storing the textual description of the media in the database, searching on the basis of data content is now possible. This description is manually entered by a human user. Such information extrapolation methods will always result in some loss of information. For querying purposes, however, this loss is acceptable. Moreover, current technology does not allow us to go beyond this level of sophistication when querying of sound data content is desired. The architecture presented in Chapter VI uses this approach. Each medium is represented by three parts: raw data, registration data and description data. For sound, the raw data will be the linearly stored digitized samples. Raw data is a bit string. Registration data is what enables the proper decoding of the raw data for the device on which it will be displayed. Description data relates to the "semantics" or contents of the raw data and will be entered into the database by the user.

One of the problems associated with multimedia databases is portability. For example, although sound data files may be transferred between different host computers, the information may be totally unusable unless the new host is aware of the encoding algorithm used on the original sound. Without the ability to properly decode a sound data file, properly accessing the data becomes a serious problem. The nature of these problems will be further discussed in Chapter II.
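The three-part representation of a medium (raw data, registration data, description data) might be sketched as a "C" structure; the field names and the particular registration attributes chosen here are hypothetical, not the schema of the thesis prototype:

```c
#include <stddef.h>

/* Hypothetical sketch of a three-part sound medium. */
typedef struct {
    /* Raw data: the linearly stored digitized samples, a bit string. */
    const unsigned char *raw;
    size_t raw_len;

    /* Registration data: what a device needs to decode the raw data. */
    struct {
        unsigned sample_rate_hz;   /* e.g., 8000 */
        unsigned bits_per_sample;  /* e.g., 8 */
        const char *encoding;      /* e.g., "PCM" or "ADPCM" */
    } registration;

    /* Description data: user-entered text describing the contents,
     * which is what makes content-based querying possible. */
    const char *description;
} SoundMedium;
```

The registration part is exactly what addresses the portability problem above: a receiving host reads it to learn which decoding algorithm the raw bit string requires.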
D. ORGANIZATION OF THESIS

Chapter II discusses the nature of sound and describes several sound encoding techniques. Also discussed are the problems relating to translations between different encoding algorithms.

Chapter III describes the operation of optical storage devices and their impact on database management. Effective mass storage devices are an essential part of multimedia database management systems.

Chapter IV discusses other related work with respect to multimedia database systems. Of primary interest is the handling of acoustic data within the database.

Chapter V provides an overview of the relational data model designed for the prototype. A discussion of the SOUND data type is presented. Also, several handling techniques with regard to data searches will be discussed.

Chapter VI describes the components of the prototype and the architecture upon which it was designed. The specific equipment used in the development of this prototype is presented. The functions required for the implementation of the model are also discussed.

Chapter VII presents conclusions drawn from the research and implementation of this thesis, plus a projection of the applications for which this prototype may be used.

Appendix A provides a brief overview of the SQL preprocessor functions as they relate to sound. Appendix B lists the various internal "C" functions that provide the interface to the software driver functions of the equipment used in the development of the model.
II. A DESCRIPTION OF SOUND ENCODING TECHNIQUES

A. INTRODUCTION

In this chapter, some of the basic properties of sound are discussed. Many of the definitions pertaining to sound are also presented. These form the basis of our discussions in the following chapters. With that background established, a description of several sound encoding techniques can be discussed. This backdrop will enable a viable, though limited, discussion to ensue regarding the transformation of data between different data encoding schemes.
B. INTRODUCTION TO SOUND

Sound is created by vibrations from some source. Vibrations can be transmitted through various media. Our primary concern will center around those transmitted through the air to the ear.

By vibrating air molecules to cause compression and rarefaction, acoustic energy is transmitted. Speech is a good example of acoustic energy. The motion of the vocal cords caused by air rushing past them sets the surrounding air molecules to vibrating. This vibration causes other adjacent molecules to vibrate. The motion is carried through the air to the outer ear where it is collected and focused into the inner ear through the auditory canal, causing the eardrum to vibrate. This vibration is sensed as sound.

Two primary attributes of sound are frequency and amplitude (or intensity). A sound generating source such as a tuning fork can be used to help visualize frequency. When a tuning fork is struck, it moves back and forth at a fixed rate. This alternation creates corresponding increases and decreases in the air pressure. The compressed and rarefied air pressure trail traveling away from the tuning fork follows a sinusoidal (or sine) function.
This trail, when plotted on X-Y axes in which compression or amplitude is represented by the Y-axis and time by the X-axis, forms a waveform. Some of the basic sound related properties of the simple sine wave are depicted in Figure 1. One feature of the simple sine wave is that the shape of the waveform above the midline is the same as the shape of the waveform below the midline. The distance between any two successive peaks (i.e., successive compressions) in a wave is called the period. The period is measured in seconds. Frequency is the number of peaks (or cycles) that occur in a second. These cycles per second are known as Hertz (Hz). Frequency is equal to the inverse of the period. The human ear is sensitive to frequencies in the range of 20 to 20,000 Hz. The height of the waveform above the midline is called the amplitude. Amplitude indicates the relative loudness of a sound. [Ref. 2]

Most sounds have complex waveforms. Complex waves are composed of many frequencies with various amplitudes superimposed (i.e., added together) on top of one another. The shape of the wave lends the sound its tonal qualities. In general, the smoother the wave, the cleaner, clearer and sweeter the sound. A square wave, for example, would sound harsh as compared with the steady resonance of a sinusoidal wave.
Since complex waveforms contain multiple frequencies, it is sometimes easier to visualize these frequencies as members of a group or range of frequencies. By grouping a set of frequencies together we form a frequency subset, referred to as a band. The range of the frequencies in the band is called a bandwidth. For example, frequencies within a range of 15,000 to 20,000 Hz could be considered a high band, while those occurring within a range of 20 to 400 Hz would constitute a relatively low band. A pass filter (or just filter) is designed to restrict or limit the relative bandwidths of frequencies that pass through it. A low pass filter, for example, is designed to permit only low frequencies to pass.
Figure 1. Properties of a simple sinusoidal waveform. (The figure shows a 4 Hz signal, 4 peaks per second, with its period and amplitude labeled.)
One of the most common methods used to store sound in a computer is to convert the analog sound signal into a digital representation through a technique known as digitizing. Samples of the amplitude of the waveform are taken at evenly spaced intervals of time using an analog-to-digital converter (ADC). The process of converting a sample of the waveform's energy level into a numerical quantity is known as sampling. Thus each sample captures the relative volume (i.e., contribution of multiple frequencies) and energy level (amplitudinal measurement) of a sound signal at a certain point in time. The original waveform can be reconstructed from the samples, provided the sampling rate was sufficiently high. By increasing the rate of sampling, more of the higher frequencies can be captured. The fidelity or quality of a digitized sound is a reflection of the bandwidth or range of frequencies that has been captured. A sound consisting of frequencies of up to 20,000 Hz when captured at 20,000 samples per second would have a higher quality than the same sound captured at 10,000 samples per second.

Human speech is generally within the range of 500 to 5000 Hz. At the upper end of this range (5 KHz), the Nyquist theorem says that a sampling rate of 10,000 Hz would be sufficient to capture all of the information within this bandwidth. More formally, Nyquist proved that

... if an arbitrary signal has been run through a low-pass filter of bandwidth H, the filtered signal can be completely reconstructed by making only 2*H (exact) samples per second. Sampling the line faster than 2*H times per second is pointless because the higher-frequency components that such samplings could recover have already been filtered out. [Ref. 3]
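The quoted rule can be stated directly in code. This tiny helper is an illustrative assumption, not part of the thesis prototype; it reproduces the speech example above (a 5 KHz bandwidth requiring a 10,000 Hz sampling rate):

```c
/* Minimum sampling rate per the Nyquist theorem: 2*H samples per
 * second completely reconstruct a signal band-limited to H Hz. */
unsigned nyquist_rate(unsigned bandwidth_hz) {
    return 2 * bandwidth_hz;
}
```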
The chart of Table 1 summarizes the effects of sampling at different sampling rates. Downsampling refers to sampling rates that are less than twice the highest filtered frequency present in a signal.

TABLE 1. THE EFFECTS OF DOWNSAMPLING

Sampling Rate | Frequency Range Recorded | Memory or Disk Bytes Used per Second | Max Length of Sound per Mbyte of RAM or Disk
22,000        | 0-10 KHz                 | 22K                                  | 45 seconds
16,000        | 0-8 KHz                  | 16K                                  | 62.5 seconds
11,000        | 0-5 KHz                  | 11K                                  | 90 seconds
8,000         | 0-4 KHz                  | 8K                                   | 125 seconds
5,500         | 0-2.5 KHz                | 5.5K                                 | 3 minutes
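The figures in Table 1 can be checked with a short calculation, assuming 8-bit single-channel samples (so bytes per second equals the sampling rate) and a decimal megabyte of 1,000,000 bytes, which is what the table's values imply:

```c
/* Seconds of 8-bit mono sound that fit in one megabyte of storage
 * at a given sampling rate (bytes/second == samples/second here). */
double seconds_per_mbyte(unsigned rate_hz) {
    const double MBYTE = 1000.0 * 1000.0;  /* decimal megabyte */
    return MBYTE / rate_hz;
}
```

For example, 16,000 samples per second yields exactly 62.5 seconds per megabyte, and 5,500 samples per second yields about 182 seconds, roughly the table's "3 minutes".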
When a sound is digitized, the value of the amplitude is restricted to a range specified by the number of bits used to digitize the sample. For example, if eight bits per sample were used, the range would be from 0 to 255; if 16 bits per sample were used, the range would be from 0 to 65535. Each sample is rounded off to the nearest integer. This process is known as quantization. If the amplitude of a wave is greater than the upper range, the "top" and "bottom" of the wave are cut off in order for the wave to "fit" within this range. This effect is known as clipping. Clipping causes distortion of the sound since it tends to produce sharp corners on waveforms. This results in sound signals that sound harsh. Dynamic range, an aspect of sound quality denoted in decibels (dB), is a measure of the difference between the loudest sound that can be recorded (without clipping) and the softest sound. The dynamic range of the human ear is around 120 dB. [Ref. 2]

One method for reducing the amount of storage required for sound data is through data compression algorithms. More specifically, compression algorithms reduce the average number of bits per sample used in the storage of digitized sound data. A compression algorithm described in [Ref. 2], as an example, stores only one bit per sample of the resulting sound data values instead of the entire eight bits or more from the analog-to-digital conversion process. With this method, however, we could expect the resulting recorded data to be virtually useless upon playback. The primary disadvantage of data compression is the general loss of discriminability between syllables, words and sometimes even phrases that occurs when the captured sound is reproduced. By taking advantage of the redundancy which occurs between successive samples of limited bandwidth waveforms, this loss can be perceptively reduced. There are many encoding algorithms in use which can reduce the data storage requirement without a corresponding loss in sound quality. A few of these techniques are briefly discussed in the next section.
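The quantization and clipping just described can be sketched as follows. The 0 to 255 range is the eight-bit case from the text; the particular rounding convention is an assumption for illustration:

```c
/* Quantize an amplitude to an 8-bit sample, clipping values that
 * fall outside the representable range of 0..255. */
int quantize8(double amplitude) {
    if (amplitude > 255.0) return 255;  /* "top" of the wave cut off */
    if (amplitude < 0.0)   return 0;    /* "bottom" cut off */
    return (int)(amplitude + 0.5);      /* round to nearest integer */
}
```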
C. TYPES OF ENCODING

There are several methods commonly used to encode voiced sounds. Many of these techniques were developed in the field of telephone communications and have found their way into other venues of speech technology and the general field of sound capture and storage. The use of software in the encoding process has been completely and effectively replaced by the growing technological advancements in hardware.

Once the sound has been digitized, the next step is to try and reduce the number of bits needed through an assortment of statistical techniques, some of which are discussed below. All compression methods are based upon the principle that the sound signal changes relatively slowly compared to the sampling frequency. This means that much of the information at the digital level is redundant. Through the use of special purpose signal processors and microprocessors, a variety of bandwidth reduction methods have been demonstrated. The analog signal is presented to a u-Law codec (coder-decoder) chip where it is filtered and digitized, and then fed to a digital signal processor (DSP) chip where it is encoded. Figure 2, reprinted from [Ref. 4], presents an overview of the operation of a low-to-medium complexity DSP coder design. The u-Law codec is a pulse code modulation chip (see next section) which performs the actual analog-to-digital and digital-to-analog (A/D and D/A) signal conversions. The DSP, a special purpose signal processor, is a powerful, single-chip, programmable microprocessor which performs the actual algorithmic encoding (statistical methods involved) of the digital PCM-signal.

All encoding techniques are realized by filtering and digitizing the analog signal, analyzing short segments of it, then encoding prior to transmission or storage. Waveform encoding and parameter encoding are the two broad categories we will use to summarize these schemes. [Refs. 4, 5, and 6] The definitions presented below are encapsulations of information found in the discussions of these references. The actual mathematical formulas embodied in the following techniques are not necessary to the scope of this thesis and therefore will not be expounded upon.
Figure 2. DSP Hardware Configuration for Coder Designs

1. Waveform Encoding

One of the most direct forms of waveform encoding is pulse code modulation (PCM) [Ref. 5]. It is the basic foundation of all waveform encoding schemes in use today. The amplitude of the sound is sampled at a fixed rate (typically 8,000 samples per second for speech) and converted into digital information using an A/D converter. Each amplitude measurement (or sample) may require from 6-16 bits, depending on the codec in use.
Differential pulse code modulation (DPCM) consists of not outputting the digitized amplitude, but rather the difference between the current value and the previous one. Since amplitude differences of 32 or more on a normalized scale of 0 to 255 are unlikely, five bits should suffice instead of eight. If the signal does jump wildly, the encoding logic may require several sampling periods to "catch up." The error introduced for speech can generally be ignored. [Ref. 5]
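A minimal sketch of the DPCM idea, with differences clamped to a five-bit signed range as suggested above (this is an illustration, not the exact coding of [Ref. 5]):

```c
#include <stddef.h>

/* Illustrative DPCM: store clamped differences between successive
 * samples rather than the samples themselves. A wildly jumping
 * signal needs several samples to "catch up", as the text notes. */
#define DPCM_MAX  15   /* 5-bit signed difference: -16..15 */
#define DPCM_MIN -16

void dpcm_encode(const int *samples, int *diffs, size_t n) {
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        int d = samples[i] - prev;
        if (d > DPCM_MAX) d = DPCM_MAX;
        if (d < DPCM_MIN) d = DPCM_MIN;
        diffs[i] = d;
        prev += d;            /* track what the decoder will see */
    }
}

void dpcm_decode(const int *diffs, int *samples, size_t n) {
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += diffs[i];
        samples[i] = prev;
    }
}
```

A slowly varying signal round-trips exactly, while a sudden jump is reproduced only partially in the first sample after the jump.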
Delta modulation (DM), a variation of the compaction method, requires each new sample to differ from its predecessor by a minimum amount, either +1 or -1. A single bit is generated, telling whether the new sample is above or below the previous one. Delta modulation differs from the compaction algorithm in that alternate samples are compared to determine the relative energy level of the new sample, vice reducing a digitized sample from eight or more bits down to only one. [Ref. 5] This approach is obviously unsatisfactory for rapidly changing signals since small level changes are assumed between samples. This is true even when each value represents or indicates different absolute amounts. For example, +1 may mean adding five to the previous amplitude. Delta modulation can also be implemented by encoding the slope of the changing waveform as one of several fixed values (other than +1 or -1) [Ref. 4]. These values can be permanently stored in the "on-chip" ROM of the DSP chip.

An improvement to differential PCM is achieved by taking a few of the previous samples and extrapolating (i.e., predicting) what the next value will be. Once the actual next value is obtained, the step size (the difference between the actual and the predicted signal) is adaptively quantized (or encoded). This method is known as adaptive differential pulse code modulation (ADPCM). [Ref. 4] ADPCM is a low complexity technique and was one of the first encoder algorithms realized on the DSP. The discussions of [Ref. 4] show that the design is based on a backward adaptive step-size algorithm with a fixed first-order predictor (possibly a "slope" computation) and a robust adapting step size. A predictor signal is generated by scaling the previously decoded signal by the predictor coefficient. This value is then subtracted from the input signal to form the difference signal. A table lookup is performed, based on the difference signal, to locate the quantization value. This table-based conversion process is known as adaptive quantization. By storing the step sizes and inverse step sizes in on-chip tables, the need for a divide in adaptively scaling the difference signal before and after quantization is avoided. This offers a tremendous advantage in real-time applications in terms of speed.
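The backward-adaptive step-size idea can be illustrated with a deliberately simplified one-bit coder, closer to an adaptive delta modulator than to the full ADPCM of [Ref. 4]; the step table and adaptation rule here are made-up assumptions:

```c
#include <stddef.h>

static const int step_table[4] = {1, 2, 4, 8};  /* illustrative steps */

/* Encode: one bit per sample telling whether the signal is above or
 * below the prediction. The step index grows after two bits in the
 * same direction and shrinks after a reversal (backward adaptation:
 * it is driven only by the transmitted bits). */
void adm_encode(const int *in, int *bits, size_t n) {
    int pred = 0, idx = 0, last = 0;
    for (size_t i = 0; i < n; i++) {
        int b = (in[i] >= pred) ? 1 : -1;
        bits[i] = b;
        pred += b * step_table[idx];
        if (i > 0) {
            if (b == last && idx < 3) idx++;        /* keep catching up */
            else if (b != last && idx > 0) idx--;   /* settle down */
        }
        last = b;
    }
}

/* Decode: runs the identical state machine, so it tracks the encoder
 * exactly without any side information. */
void adm_decode(const int *bits, int *out, size_t n) {
    int pred = 0, idx = 0, last = 0;
    for (size_t i = 0; i < n; i++) {
        pred += bits[i] * step_table[idx];
        out[i] = pred;
        if (i > 0) {
            if (bits[i] == last && idx < 3) idx++;
            else if (bits[i] != last && idx > 0) idx--;
        }
        last = bits[i];
    }
}
```

Because the step size is adapted from the transmitted bits alone, the decoder stays in lockstep with the encoder, which is the essence of backward adaptation.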
ADPCM offers tremendous flexibility in signal encoding and is in widespread use throughout the speech technology industry. This technique is used in the prototype development of the model described in Chapter VI. All data is packed 4-bit ADPCM. This means that at an 8 KHz sampling rate, only 4K bytes per second (32,000 bits/sec) are generated as opposed to 8K bytes per second (64,000 bits/sec) using standard 8 KHz PCM.

The next level of algorithmic complexity (there are many others) is that of subband coding (SBC). This approach has little advantage over ADPCM for 8 KHz sampled inputs. The two-band SBC scheme divides the input into two equally spaced high and low frequency bands with a filter bank. The two subband signals are then reduced in sampling rate and separately coded with ADPCM encoders. The reverse process takes place in the receiver (in the case of transmissions) and the digital-to-analog converter (DAC). Table 2 compares a few of the techniques mentioned above.
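The two-band division can be illustrated with the simplest possible filter pair, a Haar-style sum and difference, standing in here for a true quadrature mirror filter bank; each output band carries half the input sampling rate.

```python
# Minimal two-band subband split and merge. A real SBC coder would use
# a proper quadrature mirror filter bank and then feed each half-rate
# band to its own ADPCM encoder, as described above.

def two_band_split(samples):
    low, high = [], []
    for i in range(0, len(samples) - 1, 2):
        low.append((samples[i] + samples[i + 1]) / 2)    # low-frequency band
        high.append((samples[i] - samples[i + 1]) / 2)   # high-frequency band
    return low, high

def two_band_merge(low, high):
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])   # exact reconstruction for this pair
    return out
```

For this trivial filter pair the merge reconstructs the input exactly; with quantized subbands the reconstruction is only approximate.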
TABLE 2. DIGITAL ENCODING TECHNIQUES [Refs. 4 and 6]

    PCM      87   92   98  111  122  131  135  129
    DPCM     87   +5   +6  +13  +11   +9   +4   -6
    DM (1)   87   +1   +1   +1   +1   +1   +1   -1
    DM (2)   87   +3   +3   +4   +4   +4   +2   -3
    ADPCM    87   ...  (based on step sizes stored in tables)
Other encoding techniques have been developed, but these typically entail a level of complexity that far exceeds our need of a basic understanding of the processes involved. The use of more bands coupled with time domain harmonic scaling (TDHS), for example, can greatly reduce the amount of bits per second of encoded data. [Ref. 4] explains that the TDHS algorithm compresses the input signal by a factor of two (or more) in bandwidth and sampling rate before passing it on to a SBC coder. The trade-off, however, is a tremendous increase in the level of complexity required to realize the encoding algorithm. The multiple DSP approach (i.e., DSPs connected in a sequence) can also be used, but this leads to a multiprocessor system which requires precise communication and I/O synchronization between processors.
2. Parameter Encoding

Parameter encoders are typically referred to as vocoders. Vocoders have found widespread application in speech recognition and speech synthesis technology. [Ref. 5] explains how the spectral shape of the speech signal is encoded rather than the speech waveform. The spectral shape denotes an instance of the frequency spectrum of a signal for a fixed period of time. A mathematical model of human speech production is used to obtain a compact representation of the speech signal. Bit rates as low as 400 bits/sec can be achieved, but the speech quality, at best, is only fair. Signal characteristics are usually extracted in the frequency domain (i.e., certain frequency bands are used in the digitizing process). These are used to control a synthesis model to create an output speech signal whose waveform is perceived as similar to the original one. One example of this form of encoding is called formant synthesis, which operates by decoding the spectral peaks (formants) of the signal.

Linear predictive coding (LPC), explained in [Ref. 5], is another example, which can be interpreted as a simple model of the human vocal tract. It is basically a time domain operation which involves predicting the next sample of a waveform based on a linear combination of a set of spectral numbers of previous samples. LPC is common, but not as reliable as might be desired for general encoding. Nevertheless, LPC has demonstrated excellent applicability in speech generating devices such as talking consumer products. With LPC, encoding rates ranging between 1200 and 2400 bits per second (at a sampling rate of 8000 samples per second) have been achieved [Ref. 5].

Today, speech coding technology is available to achieve a high speech quality at 16K bits per second (or 2K bytes per second at eight bits per byte). These techniques are often hybrids of the waveform and parameter encoding methods. We will not go into detail on their operation except to say that these techniques are far more complex and more costly than those used in standard PCM, ADPCM and LPC coders. Their advantage is that they can be implemented on a single very-large-scale-integration (VLSI) digital signal processor chip for real time encoding.
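The time domain prediction at the heart of LPC can be illustrated as follows. The predictor coefficients here are placeholders; a real coder derives them per frame (e.g., via the Levinson-Durbin recursion over the autocorrelation sequence) and also models the excitation signal, which this sketch omits.

```python
# Sketch of linear prediction: each sample is predicted as a linear
# combination of the p previous samples, so only the (small) residual
# needs to be encoded.

def lpc_residual(samples, coeffs):
    """Prediction residuals for the given predictor coefficients."""
    p = len(coeffs)
    residual = []
    for n in range(len(samples)):
        history = samples[max(0, n - p):n][::-1]       # most recent first
        pred = sum(a * s for a, s in zip(coeffs, history))
        residual.append(samples[n] - pred)
    return residual

def lpc_reconstruct(residual, coeffs):
    """Invert lpc_residual: rebuild the samples from the residuals."""
    p = len(coeffs)
    out = []
    for n, r in enumerate(residual):
        history = out[max(0, n - p):n][::-1]
        pred = sum(a * s for a, s in zip(coeffs, history))
        out.append(r + pred)
    return out
```

With a well-chosen predictor the residual has far less energy than the signal itself, which is what permits the low bit rates quoted above.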
D. TRANSFORMATIONS BETWEEN TYPES

Once a sound signal has been successfully digitized, it can be stored or transmitted with considerable speed. However, for the receiving processor to properly use the received data, both the transmitter and the receiver must use the same encoding and decoding algorithms. Generally this is not a problem in a single workstation environment in which the ADC and the DAC conversions are both contained on the same chip. Nor is this a problem among different workstations, provided the same encoding algorithm is used in the sharing of data. This approach would serve equally well in the event that a central sound archive or file server were employed. The hardware implemented encoding and decoding algorithms are self-contained and perform all conversions automatically. Hardware implemented algorithms severely restrict the transformation of data between different encoding schemes.
The other compatibility aspect of data transformations involves the sampling rate used in the digitizing process. A 16 KHz sampling rate played back at the 8 KHz rate would produce a highly distorted output since the duration expected for each sample would be in error (at 16 KHz, 62.5 usec/sample; at 8 KHz, 125 usec/sample). This duration is built into the chip and may be hardware selectable based on the desired rate of play. The sample size (number of bits per sample) must also be known to ensure proper boundaries are maintained between data samples.
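The per-sample duration arithmetic above can be captured in two small helper functions (an illustrative sketch, not part of the prototype):

```python
# Duration bookkeeping for sampled sound.

def sample_period_us(rate_hz):
    """Microseconds spanned by one sample at the given sampling rate."""
    return 1_000_000 / rate_hz

def playback_duration(n_samples, rate_hz):
    """Seconds of output produced when playing at the given rate."""
    return n_samples / rate_hz
```

Playing one second of 16 KHz material (16,000 samples) through an 8 KHz device stretches it to two seconds, which is exactly the distortion described above.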
The final consideration for achieving universal transformation between different encoding algorithms is through the use of an intermediate encoding scheme. Software drivers can be installed which will convert all data into PCM, the more basic form of the digitizing process. Other encoding techniques can be employed to convert the PCM data into more suitable algorithms for the chip in use. The driver, however, must be smart enough to recognize the encoding used on the incoming data before an effective transformation can occur. The necessity for data conversions in an algorithmically heterogeneous encoding environment is a rather difficult problem which, unfortunately, cannot be avoided.
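The intermediate-encoding driver described above can be sketched as a registry of decoders to PCM and encoders from PCM, so that any-to-any conversion is routed through the common form. The codec names and function signatures here are purely illustrative:

```python
# Any-to-any conversion via a PCM intermediate.

DECODERS = {}   # codec name -> function(encoded data) -> PCM sample list
ENCODERS = {}   # codec name -> function(PCM sample list) -> encoded data

def register(name, decoder, encoder):
    DECODERS[name] = decoder
    ENCODERS[name] = encoder

def convert(data, source, target):
    """Convert between any two registered encodings via PCM."""
    if source not in DECODERS or target not in ENCODERS:
        # the driver must recognize the incoming encoding first
        raise ValueError("unknown encoding")
    return ENCODERS[target](DECODERS[source](data))

# the trivial identity codec: PCM in, PCM out
register("pcm", lambda d: list(d), lambda s: list(s))
```

Adding a new encoding then requires only one decoder and one encoder, rather than a converter for every pair of schemes.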
III. MULTIMEDIA MASS STORAGE DEVICES

The efficiency of multimedia storage requires the use of physical storage devices capable of handling massive data storage requirements. As a prelude to the discussion of specific types of storage devices, a reflection on the basic concepts of physical storage is offered. This brief synopsis is adapted from E. Bertino et al. [Ref. 7] in their discussions of query processing in a multimedia database.

To begin, physical storage is organized in devices, which can be either magnetic or optical. An example of a magnetic device is the familiar hard disk. Optical devices may consist of a single disk drive or be arranged in a jukebox, in which one device at a time may be mounted. Devices are divided into segments, where a segment is a set of extents. Finally, an extent is a physically contiguous region of secondary storage, such as a cylinder on magnetic storage or a set of sequential sectors on optical storage. Segments are used to store documents and indexes, text access structures, system tables and data instances. A segment may be open or frozen, depending on whether or not writing is allowed.
A. CONVENTIONAL SYSTEMS

Multimedia database systems, typically referring to the integration of text, graphics, images and sound, have storage requirements that dwarf most conventional magnetic storage media. The minicomputer explosion of the 1980's has introduced a number of easily integratable secondary storage devices (hard disks) which have managed to alleviate a portion of the problem. Since the highly common 5.25-inch magnetic floppy disk only holds a maximum of 800K bytes, the necessity for increased storage volume devices remains a crucial concern. The appearance of multi-megabyte secondary storage devices in the mid 1980's helped to ease the problem, but was not an optimum solution.

The capacity of magnetic hard disk storage devices has grown from a nominal 1M byte for microcomputers up to and exceeding 100+ Mbytes for large capacity (i.e., mainframe) systems. Additionally, the recent growth of local area networks and the use of network file servers has aided in the reduction of the data storage requirements. But even this increased capacity has proven somewhat limited. Superficially, the compression algorithms discussed in Chapter II have managed to reduce the massive data storage requirements resulting from the digitization of sound. Unfortunately, the necessity for high fidelity reproductions or extended (i.e., long playing) recordings obviates most of the advantages gained.

Maintaining a sufficient data storage medium for images and video presents an even more critical problem. Consider the following example regarding this limitation. An uncompressed two minute sound recording sampled at 8K bytes/sec would require approximately 1 Mbyte of storage. The use of standard 800 Kbyte 5.25-inch floppy disks is inherently unsuited for such enormous data storage requirements. Comparatively, a single image of 1024 x 1024 byte pixels would also require 1 Mbyte of storage. When color bands are added, the image storage requirement becomes even larger. Similarly, a moving picture requires up to 24 frames (or images) per second in order to provide the sensation of continuous motion. The storage of uncompacted digitized video would substantially overflow the boundaries of most magnetic media storage devices. Relatedly, the necessity for a high bandwidth transfer of data from the storage device to main memory and/or output devices could severely hamper the efficiency of video data management requirements. An interim solution to the mass storage dilemma has emerged in the form of the optical disk.
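The storage arithmetic behind these examples is simple to verify (an illustrative sketch; the 8K bytes/sec figure assumes 8-bit samples at an 8 KHz rate):

```python
# Uncompressed storage requirements for the media discussed above.

def sound_bytes(seconds, bytes_per_sec=8000):
    """Bytes needed for an uncompressed sound recording."""
    return seconds * bytes_per_sec

def image_bytes(width, height, bytes_per_pixel=1):
    """Bytes needed for one uncompressed image."""
    return width * height * bytes_per_pixel

def video_bytes_per_sec(width, height, fps=24, bytes_per_pixel=1):
    """Bytes per second of uncompressed motion video."""
    return image_bytes(width, height, bytes_per_pixel) * fps
```

Two minutes of sound comes to 960,000 bytes (just under 1 Mbyte), one 1024 x 1024 single-byte image is exactly 2**20 bytes, and motion video at 24 such frames per second would consume over 24 Mbytes every second.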
B. OPTICAL DISKS

The growing familiarity of optical disk technology is consummately linked to the music industry. The ability to store large amounts of digitized audio data with reproduction (i.e., playback) as clear, crisp and clean as the original was a welcomed relief to a highly disgruntled populace of audiophiles. Strangely enough, optical disk technology has quickly gained a foothold in office automation applications. Its continued use will have a significant impact on database management because it can incorporate data, text, image, video and audio information. This technology provides users with rapid access to a far greater amount of information on a single disk than ever before available on competing magnetic storage devices [Ref. 8].

Optical disks store information by means of microscopic pits. The data is recorded on the disk by a laser beam, which burns the pits onto the media's reflective surface. The pits translate into binary code. Figure 3, reprinted from [Ref. 9], describes how an optical disk works.

Two types of optical disks have been introduced, with both gaining firm footholds within the computer industry. The first is the CD-ROM (Compact Disk - Read Only Memory), on which the data is permanently stored at the time the disk is pressed. The recorded information is read by scanning the rapidly spinning disk for differences in reflectivity due to the pits. The data on the disk is protected by a layer of plastic. Thus, the disks are invulnerable to the damage that can be done to magnetic disks through head crashes or mishandling.
A single 4.25-inch CD-ROM (same size as used by the audio entertainment industry) is capable of storing up to 650 megabytes of data (of course the larger video disks can store more). That's as much data as can be stored on 1500 floppy disks (around 325,000 pages of text).

Figure 3. How An Optical Disk Works

The second type of optical disk is the WORM (Write-Once-Read-Many) disk. Data may be recorded only once, but can be read back or reviewed many times. WORM disk sizes have not been standardized and come in a variety of incompatible sizes (5.25-, 8-, 12- and 14-inch disks). A 12-inch WORM disk can store the equivalent of 400,000 pages of text. That equals 40 reels of magnetic tape or 10,000 frames of microfilm.

WORM disks are uniquely suited for use with multimedia databases. The storage capacity is large enough for most applications, including those involving graphics, sound and video presentations. Current WORM disk drives are SCSI adaptable to the IBM/PC/AT and compatibles and could be included as an off-the-shelf peripheral storage device for the prototype presented in Chapter VII. Multimedia products will undoubtedly play a key role in optical disk applications of the near future.
will
undoubtably play a key role
Most
current optical disk drives are dedicated to single, standalone PC's.
however, the technology has been standardized
same
size).
to allow
any drive
This makes disks and drives fully interchangeable.
to read
Fortunately,
any disk (of the
CD-ROM manufacturing
companies, pressing plants, publishers and software developers have agreed on standards
for both the hardware
and software elements of CD-ROM.
area of software standards
(ISO) to incorporate the
ISO 9660. [Ref
is
the
A significant milestone in the
1988 decision by the International Standards Organization
HIGH SIERRA CD-ROM file format into a word wide
standard-
9]
Erasable (or reversible) optical disk technology is another area poised for future growth. This technology will combine the capacity and reliability of optical disks with the erasability and flexibility of magnetic media. The user can revise any data stored on them. Speculative drawbacks include a relatively high cost, high access time and currently limited application suitability. The limitations on random access would also need to be addressed and suitably resolved.
IV. OVERVIEW OF RELATED WORK

A. INTRODUCTION

Several companies have embarked on programs to add voice applications to their environments. Additionally, much research work is being conducted at many of the major universities around the country in the overall area of multimedia information processing. Rather than duplicate the tremendous research efforts embodied in the included references, as well as other non-listed references, the emphasis of similar endeavors in the field of multimedia database design and implementation will be extracted "directly" from the synopsis of the listed reference materials. Where plausible, discussions relating to differences in the approach outlined in this thesis and other systems will be examined.
B. OVERVIEW

Terry and Swinehart in [Ref. 10] thoroughly discussed many of the systems that have been introduced in this area over the past few years. Several comprehensive comparisons between their Etherphone system and other related systems were offered. This thesis shall concentrate on the sound management aspects of such systems.

The Etherphone system uses a structure known as a voice rope. Voice ropes are encrypted recorded voice segments stored as files and maintained in a voice rope database. An entry in the voice rope database contains attributes for the identifier, creator, access-control lists and overall length of the voice rope. By using the identifier-attribute approach (flat table view), only a single database access is required to determine the voice rope's complete structure. [Ref. 10] Our approach is designed to limit the amount of data stored in the database by including certain physical characteristics of the recorded data within a header structure of the actual sound data file.
Lockemann in [Ref. 11] offered a section specifically directed at the implications of voice data management. His work effectively highlighted many of the operations performed by these systems as they relate to the concept of multimedia database systems management. Christodoulakis et al. in [Ref. 12] highlight similar aspects of the multimedia systems that were covered by Terry and Swinehart in [Ref. 10] and by Lockemann in [Ref. 11], but go on to also compare the suitability of their system to what they call an "open world" environment in which certain information may not be very well known in advance. MINOS emphasizes the idea of having multiple methods of finding and linking diverse information.

MINOS is an object-oriented multimedia information system that provides integrated facilities for creating and managing complex multimedia objects. MINOS incorporates functions that exploit the capabilities of a modern workstation equipped with image and voice input/output devices to accomplish an active multimedia document presentation and browsing system. [Ref. 12] states that "...queries on the attribute and text part are similar to those allowed by database management and text retrieval systems (conjunctions, disjunctions and parts of words)," and that "...the same query capabilities (on the voice part of multimedia documents) are allowed for text as for voice." The query specification interface assists the user in interactively specifying a query with the help of menus and some graphics capabilities [Ref. 12]. Voice segments and voice narrations are "dependent" components of a MINOS document and do not exist independently of the document. By contrast, the model presented in this thesis presents sound objects in general as independent entities within the multimedia database.
Systems such as MINOS and the Electronic Document System (built by the BALSA project group at Brown University) emphasize the ideas of hypertext and information webs. "Webs are links to some other information with relevant context" [Ref. 12]. The underlying textbook analogy is maintained in that each image or voice object is "dependently" linked to a particular document.

Up to this point there was nothing particularly new or revolutionary regarding our approach to the problem of integrating multimedia objects within a common database. However, there are some rather extraordinary exceptions. We are interested in developing an overall multimedia information system based upon an underlying relational multimedia database. A driving concern in the ongoing development of this research project remains the issue of capture and storage of values which represent real-world objects. It should permit the end user the ability to access stored real-world values (i.e., sounds or images, etc.) from within the database.

Our primary area of ingenuity resides in: 1) the introduction and use of an abstract data type of type SOUND; 2) the definition of the operations that are needed to manipulate this multimedia data of type SOUND; 3) the storage of the contents of the sound data object in the database via a description attribute which can be accessed using the extended syntax of a standard structured query language (e.g., SQL); and 4) the storing of physical characteristics of the sound object data within the sound object file inside a file header. This approach has unveiled an area of research well within its earliest stages of development.
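Point 3 above can be made concrete with a hypothetical example. The table, column and function names below are illustrative only; the actual extended syntax is developed later in the thesis.

```python
# A hypothetical extended-SQL query over a SOUND description attribute.
# Because the description is ordinary user-supplied text, the condition
# can be resolved by string matching, without analyzing the sound data.

query = """
    SELECT  name, recording
    FROM    employee
    WHERE   CONTAINS(recording.description, 'greeting')
"""

def matches(description, keyword):
    """Case-insensitive check a query processor might apply."""
    return keyword.lower() in description.lower()
```

The encoded sound data itself never participates in the selection; only its textual description attribute does.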
This thesis concentrates exclusively on the management of sound within such a framework. Although the final project is yet to be realized, we can see both clear distinctions and comparisons between this and other systems that have already been developed. Both IBM's experimental Speech Filing System (a stand-alone special purpose system), which was operational in 1975 [Ref. 13], and the Etherphone System [Ref. 10] rely heavily, although not exclusively, on the use of telephones to access stored voice data. This differs from our approach since we can manage more than just voice data and since most workstations of the future will have their own sound output facilities. In the event workstations do not offer self-contained sound handling devices, our research is adaptable to the full and complete integration of off-the-shelf products in order to achieve this capability.
Once sound data is captured, access to that data must be available. Etherphone uses voice ropes to access data, plus a sequence of files to form a "voice rope". The Experimental Multimedia Mail System of [Ref. 14] provides interactions with the voice data via three types of windows. The "voice editor" is used to create, record, play, select and display voice data. The "voice display" shows already existing voice data as an energy waveform on the computer display screen. Light and dark bars are displayed on the screen to represent intervals of voice and silence. The third window, the "voice buffer", serves as an intermediate storage location for speech segments. Once access to a particular voice object is obtained, editing can be performed. The Etherphone system can connect individual and arbitrary fragments of voice.
The Diamond Document Store system [Ref. 15] also uses a form of waveform interpretation of voice data for word selection and editing. Waveform interpretation, however, is both imprecise and time consuming. Maxemchuk in [Ref. 16] avoids the problem of analyzing the waveform for the beginning and end of words by associating various parameters of the recording (e.g., intervals of silence, playback rate, arbitrary start and stop points, etc.) with the recorded output. We have infused an adaptation of this approach in our research regarding the management of stored sounds within the database. For example, we also provide the user the ability to select specific start and stop points within the referenced sound data object. These points, however, are strictly based on elapsed time from the beginning of the (converted) sound object's data values.
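The elapsed-time selection just described reduces to simple arithmetic once the sampling rate and sample size are known (in our scheme, from the sound file's header). A minimal sketch, with default values chosen only for illustration:

```python
# Map user-selected start/stop times (elapsed seconds from the start of
# the sound object) to byte offsets in the stored data.

def time_to_offset(seconds, rate_hz, bytes_per_sample):
    """Byte offset of the sample that plays at the given elapsed time."""
    return int(seconds * rate_hz) * bytes_per_sample

def extract_segment(data, start_s, stop_s, rate_hz=8000, bytes_per_sample=1):
    """Slice the raw sound data between two elapsed-time points."""
    lo = time_to_offset(start_s, rate_hz, bytes_per_sample)
    hi = time_to_offset(stop_s, rate_hz, bytes_per_sample)
    return data[lo:hi]
```

No waveform analysis is required; the start and stop points are purely positional, which is exactly the property argued for above.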
Almost all of the systems reviewed use a file storage system for the management of sound data. This somewhat universal approach, in turn, forms the basis of our data storage scheme. However, additional unique information is also stored within both the database and the header of the data file to facilitate ease of query processing. The storage structure for the information is never revealed to the user. The Etherphone system reduces duplication of voice fragments by maintaining a list of pointers to the individual fragments in the voice tracks.

The above systems have established the foundation for the exploration of the management of voluminous, shared data among distributed and heterogeneous workstations [Ref. 10]. The techniques presented in this thesis are designed to build upon previous work and should be applicable and beneficial in the growing area of multimedia database systems design.
V. THE SOUND DATABASE AND INFORMATION SYSTEM

A. SOUND MEDIA MANAGEMENT

The integration of multiple media into the personal workstation offers tremendously increased potential for improved productivity across a wide range of applications and environments. Fortunately, sound and other non-textual information media can now be effectively and efficiently stored in an integrated information system. This is due primarily to the ever increasing improvements in hardware and the similarly decreasing costs of memory and storage.

Conceptually, each multimedia information object may be composed of a number of highly specific media (i.e., text, graphics, sound, images) and may contain several intrinsically exclusive attributes. Each medium object is an integral part of a complex record (i.e., tuple) and is logically represented as a field (i.e., tuple attribute), although the actual storage of the various data values may exist in several forms (i.e., hierarchical, normalized, unnormalized). By properly managing stored media objects in the database, a further increase in the accessibility of the entity object, and of the real world information it represents, can be made available to the user and other computers.
There are several distinct advantages that a typical database integrated with sound can offer:

• The unrestricted use of voice narratives and annotations in interactive applications.

• The recording and storage of naturally occurring, though textually undefinable, sounds.

• The retrieval of previously stored sound data.

• The sharing of sound data files among various users.
Many of these features are not exclusive to the sound management arena, but can be found in traditional text oriented databases and file servers as well. The characteristics of sound, however, are significantly different from those of text. Sounds cannot be easily captured, stored or manipulated. A number of the more basic characteristics with regard to the handling of sound (e.g., frequency, sampling rate, etc.) were discussed in Chapter II. As an example, consider the characteristic of size with respect to sound. The capacity needed for storage of a page of text is rather nominal (@ 3.5 Kbytes), and the written text would take about two minutes to read if read out loud. Yet the storage required for the same two minutes of spoken text, once digitized and recorded, would require several orders of magnitude more (@ 1 Mbyte) than its equivalent written counterpart. This brings into focus another important aspect of sound management which needs to be considered. Specifically, the handling of sound may also require special devices to enable its recording and playing. And finally, the real-time nature of sound imposes stringent requirements upon sound capture, presentation and synchronization. These differences dictate that special methods must be employed when managing sound in a multimedia information system.
B. A MULTIMEDIA ARCHITECTURE

An important issue of multimedia database management is the design of a suitable data model. The structure of multimedia data objects is generally more complex than that of the more familiar text-based data objects. Though it may be possible to provide to the user a seamless interface for interactions with the database, the functionality of the underlying DBMS must become much more highly specialized.

Although a great deal of research has been devoted to data modeling, there remain several problems relating standard database modeling techniques to a more generic model for multimedia database objects. The multimedia object may be composed of a collection of components, that is, specific medium objects that differ in the type of multimedia information contained within.
The user interface, another underlying concern, must always be considered when designing the multimedia database. For example, sound management in a multimedia database must permit the user many of the obvious functionalities that readily come to mind in dealing with sounds of any type. This functionality includes identification, capture, storage, retrieval, analysis and manipulation. By grouping related functionalities (or, in a sense, characteristics), we can describe the logical structure and the physical structure of the multimedia object [Ref. 7]. The logical structure determines how the logical components, such as basic data elements, are related to the contents of the object. The physical structure determines how the physical components, such as storage characteristics and presentation devices, are related to the contents of the object.
The structure of a multimedia object, however, is best described in terms of its conceptual components. We shall view the conceptual structure as the "big picture" or expanded view of the logical model. This convention is adopted exclusively for our use and therefore should not be viewed as a universal declaration of the terms. The conceptual structure is important since it is far more meaningful to the user. A formal data model can be used to define an information object's conceptual structure. Such models provide powerful abstractions for describing the semantics of data in complex applications. These abstractions shall be used in describing the conceptual structures of the SOUND information object. The SOUND information object (or just SOUND object) is the primary multimedia object upon which we shall focus our discussions throughout the remainder of this thesis.
Figure 4 depicts the conceptual makeup of a multimedia object. Each multimedia object may have one or more multimedia components. Each component may be viewed as creating an entirely new multimedia object or just an extension of the original. It is entirely up to the user as to which conceptual view is more meaningful. A data model should allow a very natural and flexible definition and evolution of the schema that represents the composition of, and the complex relationships among, the parts of a multimedia object. It should also provide a means for the sharing and distinct manipulation (i.e., storage, retrieval and transmission) of multimedia information.

We refer to the content structure (not depicted) as the textual description of the object (i.e., the "real-world" representation of a multimedia object), which is stored in the database and is always available for use. For example, when conducting an SQL query on an IMAGE attribute of a multimedia object, conditions on the object's conceptual structure are satisfied by locating the conceptual components (obtainable from internal functions) specified in the condition. Queries may have conditions on both the conceptual structure and the content structure. Because SQL does not support the content structure directly, a translation of the query in terms of the content structure is performed. At the end user's level, the level above the programmer's, a query (i.e., on an IMAGE or a SOUND object) would possibly involve the use of interactive menus and some graphics (i.e., icons) manipulations designed to call certain functions. These functions would perform the low-level operations required to satisfy the user's query. If graphics are used, they should serve as abstract representations of the desired end result.
Figure 4. The Conceptual Model of a Multimedia Object
Figure 5 shows the relation of the conceptual components of a multimedia database system. The user is shielded from the specific actions of the multimedia database system and should remain unaware of the activities involved in the presentation of multimedia information. The actual information retrieval process remains intrinsically invisible. At a much lower level, the query processor level, the query would be converted into suitable database function calls to the supervisor (or application level). Any preprocessing requirements of the query would also occur at this level. The supervisor would provide a sequence of function calls to the appropriate data handler (media manager). The data handler would provide the necessary interface to the hardware where the actual data is stored, generally in files, for retrieval and presentation. The data model of the multimedia information system, when presented in this fashion, can be seen as primarily a media management system with a few highly important added features to enhance its functionality. In particular, the multimedia data model offers functions for administrating the data, semantically linking the data, synchronization and coordination of the data, and data storage. By making the system open-ended, new applications and data types may be added when desired.
Figure 5. The Components of a Multimedia System
(application program, query processor, and data handlers for text, sound, image, video and graphics data, above a storage subsystem and devices such as a magnetic hard disk)
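The layering in Figure 5 can be sketched as follows. All class and method names here are illustrative; the thesis prototype defines its own interfaces.

```python
# Layered dispatch: query processor -> supervisor -> media data handler.

class SoundHandler:
    """Media manager for SOUND data; would read the stored data file."""
    def fetch(self, object_id):
        return f"sound-bytes({object_id})"   # placeholder for file I/O

class Supervisor:
    """Sequences function calls to the appropriate data handler."""
    def __init__(self):
        self.handlers = {"sound": SoundHandler()}

    def retrieve(self, medium, object_id):
        return self.handlers[medium].fetch(object_id)

class QueryProcessor:
    """Converts a user query into supervisor function calls."""
    def __init__(self, supervisor):
        self.supervisor = supervisor

    def run(self, query):
        # any preprocessing of the query would happen here;
        # we assume it is already parsed into (medium, object_id)
        medium, object_id = query
        return self.supervisor.retrieve(medium, object_id)
```

Because each layer only calls the one below it, new media (and their handlers) can be registered with the supervisor without touching the query processor, which is the open-endedness described above.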
C.
THE SOUND DATA TYPE
1.
Sound Data Organization
The foundations of the approach
images
For
to
unformatted (or unstructured) data as applied to
in [Ref. 1] will serve as the basis for the
clarification, the reference to
establishment of the
SOUND
values are not associated with the actual processing of the database.
characteristics of the data
stored and retrieved.
applied to sound.
need not be known
Many
that
The
traits
its
and
DBMS when the various data forms are
to the
of the ideas expressed for the
The crux of
data type.
unformatted data simply means that the meaning of
IMAGE data type can be directly
approach will be skeletally duplicated here for
clarity
and completeness, but with a directed emphasis towards the effective management of
sound.
The relational database model shall be adopted when discussing multimedia database query techniques. This allows for the conceptually familiar flat table view of the multimedia object. Each tuple of the multimedia object will consist of several attributes relating to a medium, such as graphics, images and sounds. With this idea in mind, sound, as with images [Ref. 17], can be regarded as an abstract data type with its own set of operators or functions. The specification of our structured abstract data type includes: 1) a component element type (SOUND); 2) a structure that relates the component element values (database relational table); 3) a domain of allowable structure values (encoded sound data); and 4) a set of operations on the values in the domain (sound handling functions).
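As a rough, purely illustrative sketch (ours, not the prototype's actual declarations), the first two parts of this specification can be expressed in C: a component element type, and the two-attribute structure that relates its values in a relational table.

```c
/* Sketch of the SOUND abstract data type specification.  All field
   names and sizes here are illustrative assumptions. */
typedef struct {
    char  filename[64];      /* unique file name of the raw data   */
    long  size;              /* bytes of encoded sound data        */
    float duration;          /* playing time in seconds            */
    int   encoding;          /* e.g. 0 = PCM, 1 = ADPCM            */
    int   samplerate;        /* samples per second (8000 or 16000) */
    int   resolution;        /* bits per sample                    */
    char  description[500];  /* natural-language contents          */
} SOUND;                     /* 1) the component element type      */

typedef struct {             /* 2) one tuple of the relational table */
    long  s_id;              /* unique identifier                    */
    SOUND s_sound;           /* the SOUND value itself               */
} SOUND_OBJECT;
```

The domain (3) would then be the set of legal field values, and the operations (4) are the handling functions introduced in the remainder of this section.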
Because of the rather primitive capabilities available regarding the semantic analysis of a waveform, this rudimentary approach shall be consciously avoided. Although certain characteristic features may be nice to have, our concern centers solely around the actual semantic contents of the sound source. Very little information can be obtained from the merely graphical depiction of a complex waveform without some prior knowledge and understanding of the contents of the sound the waveform portrays. Waveform analysis is best performed by experts in the field of acoustics and should not be a necessary skill required of the typical end user.
To circumvent this exceedingly low level, complex and inherently tedious operation, we have abstracted the contents of the sound data (and other non-textual forms) into descriptive words or text. By storing the "equivalent" textual contents of the sound as an attribute of the sound data relation, we can achieve a tremendous improvement in database access and query performance. The content is strictly determined by the description assigned by the human user during the storage of the sound data object, such as the name of the speaker, the nature of the speech or the subjective description of the sound itself. Meyer-Wegener et al. [Ref. 1] describe this abstraction process as consisting of three parts: raw data, registration data and description data.
Raw data is the actual bit sequence storage of the data. With sound, for example, this sequence could be the byte sized ADPCM digitally encoded samples. Registration data is the data related to the physical aspects of the raw data. This data incorporates the encoding algorithm needed by the presentation device to display the raw data in a high level "user understood" form. Examples would include the color intensity and the colormap for an image, and the sampling rate and encoding technique for a sound. Description data relates to the contents of the multimedia data captured (entered) by the user. It is a natural language (i.e., English, Dutch, Spanish, etc.) description of the contents of the data. For a short voice segment, for example, this could be the actual words that were spoken, as well as the time, the place and the name of the speaker. This data will be used for contents search during the multimedia database query process. As with any type of abstraction process, a loss of information will be present relative to the original source. Such losses are an unavoidable, though naturally occurring, phenomenon.
2. The User's View

Information systems should be designed with the end user in mind. Multimedia information systems are no exception. Users must be able to gain access to data in a manner which is most useful to them. The actual operation of the underlying database should be entirely of no consequence to the user, although the "hidden" database management processes should be relatively fast and accurate if they are to be considered effective. Keeping the user's view and the previous two sections in mind, the time has come to discuss the management of sound data within the framework of the multimedia database system.
Managing sound in a multimedia information system requires close ties with the presentation aspects (i.e., output devices or playback speeds) of the actual sound data. Clearly, such data management concerns encompass more than the mere storage and retrieval of information as performed by the multimedia database. The abstract data type SOUND will be used to model the actual occurrences of the sounds available in a multimedia database system. This type allows the specification of certain intangible sound related properties which can be managed by a multimedia DBMS. In some systems, the information presentation process may directly influence the structure of the underlying database and the database management system involved. On the other hand, a well-defined DBMS may require that the multimedia data be stored and retrieved in a "preset" or fixed manner. That is, the data may have to be "transformed" before it can be fed to an output device for presentation. Other criteria may dictate restricted random access to specified sub-parts of the data. Such considerations are crucial to the development of multimedia information systems, but the details of their implementation should always remain "invisible" to the end user.
Several speculative methods pertaining to the user interface have been presented in related works. Many of these works are discussed in Chapter IV. The unrealized end user's interface of this thesis related research remains an important part of ongoing studies. Different applications of the DBMS may dictate different user interfaces. In this thesis the actual structure of the end user's presentation and interface will not be discussed. The programmer's interface, on the other hand, will be examined since its functionality relates directly to our development of the prototype. The typical end user is not expected to know either the format or the structure required of an appropriate multimedia database query. The details of the "how" remain within the realm of the underlying application program and multimedia DBMS. The end user should need only to specify "what" is desired, possibly via a set of menu commands and/or icons.
By comparison, the programmer's interface requires the use of a structured query language, like SQL, to interface with the multimedia DBMS. The programmer is singularly responsible for the formulation of the appropriate query. This level also incorporates a lower level of "what" must be done. It seems more appropriate and considerably more convenient to make the components of a SOUND object accessible through a series of functions, rather than through a complex, variable length record. By introducing a series of sequential function calls, the desired data can be retrieved from the database and presented on the appropriately specified device. Different functions can be defined to produce different output types for the components of the SOUND data type of the multimedia information object.
Additionally, low level editing features, such as the interleaving of portions of different SOUND objects, can also be accomplished through the use of function calls. Consider, for instance, the input function

    CONSTRUCT_SOUND(filename, size, duration, encoding, samplerate, resolution, description)

which creates a database entry for a sound object denoted by the more general sound object identifier, and constructed with the following input parameters: filename, size, duration, encoding, samplerate, resolution and description. For purposes of this thesis, the filename represents a unique file name. Note that both size and duration can be used as optional parameter specifications which could denote either a fixed size in bytes or a fixed duration in some unit(s) of time (i.e., seconds). The description parameter would be used to describe the contents of the sound and any other relevant information pertaining to the sound being constructed.
A unique SOUND object will be denoted by a two attribute relation scheme consisting of

    SOUND_OBJECT = (S_ID, S_SOUND).

It may be helpful to view SOUND_OBJECT as a relational table of the form

    SOUND_OBJECT
    | S_ID | S_SOUND |

Here S_ID is a unique identifier of the SOUND_OBJECT and S_SOUND is the relation which stores all of the attributes or characteristics of a unique SOUND_OBJECT which are accessible through a series of function calls. The SOUND_OBJECT relation is (at least conceptually) created using SQL syntax of the form:
    CREATE TABLE SOUND_OBJECT
        (S_ID     integer1,
         S_SOUND  SOUND)

where integer1 denotes a long integer and S_SOUND is of type SOUND. A transient SOUND value is constructed that cannot be assigned to program variables (e.g., a rendition of the Gettysburg Address). However, its characteristics as denoted by the parameters of the CONSTRUCT_SOUND function above (i.e., filename, size, duration, encoding, etc.), can be stored in the database and used in UPDATE and INSERT statements of the query language. This approach permits the retrieval and presentation of a single value of type SOUND. An example using the query language SQL could be generated as follows:

    UPDATE SOUND_OBJECT
    SET    S_SOUND = CONSTRUCT_SOUND(filename, &size, &duration...)
    WHERE  <optional condition>;

Another example could be:

    INSERT INTO SOUND_OBJECT (S_ID, S_SOUND)
    VALUES (3212, CONSTRUCT_SOUND(filename,...));
But what if certain attributes or characteristics of type SOUND were needed? By extending the aforementioned concept even further, other sets of functions could be incorporated to do just that. Each would return a specific value. These values could then be assigned to certain program variables for later use. This set of functions is available to the user and are described as external functions, such as:

    SIZE (SOUND attribute): integer;
    DURATION (SOUND attribute): float;
    ENCODING (SOUND attribute): integer;
    SAMPLERATE (SOUND attribute): integer;
    RESOLUTION (SOUND attribute): integer;
    DESCRIPTION (SOUND attribute): char string;
    etc.
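To make the flavor of these external functions concrete, here is a hypothetical C rendering (our sketch only; the prototype reads most of these values from the data file's header rather than from an in-memory struct):

```c
/* Hypothetical C getters mirroring the external attribute functions.
   The SOUND value is modeled as a plain struct so that each function
   can simply return a stored field. */
typedef struct {
    long  size;             /* bytes of encoded data     */
    float duration;         /* playing time in seconds   */
    int   encoding;         /* encoding algorithm code   */
    int   samplerate;       /* samples per second        */
    int   resolution;       /* bits per sample           */
    char  description[500]; /* natural-language contents */
} SOUND;

long        SIZE(const SOUND *s)        { return s->size; }
float       DURATION(const SOUND *s)    { return s->duration; }
int         ENCODING(const SOUND *s)    { return s->encoding; }
int         SAMPLERATE(const SOUND *s)  { return s->samplerate; }
int         RESOLUTION(const SOUND *s)  { return s->resolution; }
const char *DESCRIPTION(const SOUND *s) { return s->description; }
```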
Different functions could be defined to produce different outputs, such as a general edit function which would permit the selection of segments of a SOUND type or allow several segments to be combined. A representative SQL statement which exemplifies the return of a database value can be seen in the following example:

    SELECT  SIZE(S_SOUND)
    INTO    :var1
    FROM    SOUND_OBJECT
    WHERE   S_ID = 217;

Here, the size of the sound data object with S_ID = 217 is retrieved from the SOUND_OBJECT relation via the SIZE function and assigned to the program variable :var1.
The use of description data within the database could greatly aid in the retrieval of certain SOUND objects. Its applications with regard to the type SOUND are identical to those described by Meyer-Wegener in [Ref. 1] for the type IMAGE. PLAY_SOUND, a function which permits the retrieval and presentation of a SOUND object from the database, can now be more formally introduced as:

    PLAY_SOUND (S_SOUND),
where S_SOUND contains all of the stored characteristics of that SOUND object. The appropriate output device is either implicitly designated by default or else can be explicitly designated as a condition of the query. We shall always assume a default output device for purposes of this thesis. By enabling the retrieval and storage of certain attributes or characteristics of the SOUND_OBJECT into program variables, SQL queries can be even more functionally employed. An example of such a query would be:

    SELECT  PLAY_SOUND(S_SOUND), DESCRIPTION(S_SOUND)
    INTO    :var1, :var2
    FROM    SOUND_OBJECT
    WHERE   IS_INCLUDED_IN(S_SOUND, "grey whale")

This query would retrieve the unique identifier of the SOUND object which had a description containing the term "grey whale." Once it is located from the SOUND_OBJECT database, the sound data would be routed to the output device and the stored sound would be heard. For clarity, we've taken the liberty of a rather syntactically casual condition clause to simply point out how these functions can be used. Here, the program variable :var2 would represent the description attribute of the SOUND object which contains the term "grey whale" and the variable :var1 expresses the boolean condition signifying the success or failure of the play process.
If a specific portion of the SOUND object were desired, another function, PLAY_SEGMENT for example, could have been used to designate a specific start and stop location within the SOUND object file. If more than one SOUND object were desired, a LINK_SOUNDS function could have been used instead of PLAY_SOUND. Similar functions based on the PLAY_SEGMENT and LINK_SOUNDS functions could be included which would create a new SOUND object file based on the input parameters. For example, a CUT_SEGMENT function could create a new database entry by removing a specific portion of the data from the input object file. A CONCATENATE_SOUNDS function could add another entry into the database by concatenating two input objects to produce a new object file as a result. It should be becoming clearer that we can achieve tremendous flexibility in the management of SOUND objects through the use of such functions.

To summarize the above discussions, an extensible list of basic functions pertaining to the manipulation of sound data has been collected below. These functions are designed to operate on attributes of the SOUND_OBJECT domain. In addition to the general operators, certain functions that are specific to our implementation are also included. These are denoted by the inclusion of the term "ANTEX." In general, most generic internal functions may have an "ANTEX" counterpart. Since our specific implementation is written in the "C" programming language, the constructs of this language will be used to explain the input parameters and output results.
Most of the function names are descriptive of the operations which they perform, perhaps with the exceptions of the functions DATA_ONLY_FILE and IS_INCLUDED_IN. The DATA_ONLY_FILE creates a copy of a data file minus any header information which may be stored in the file. Its ANTEX_FILE counterpart would produce an ADPCM encoded data file in PCM format and would also not include any header information. The IS_INCLUDED_IN function searches the database to determine whether or not a stored SOUND object contains the descriptive string pointed to by "*char." For now, this is merely a pattern matching function which operates on the description attribute and returns a boolean result. The "side effects" are simply the stored sounds being played via an output device.
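Since IS_INCLUDED_IN is, for now, plain pattern matching over the description attribute, a minimal stand-in (our sketch, not the prototype's code) is a substring search:

```c
#include <stdbool.h>
#include <string.h>

/* Minimal pattern-matching stand-in for IS_INCLUDED_IN: true when a
   SOUND object's description attribute contains the query string.
   The real function would first fetch the description from the
   database; here both strings are passed in directly. */
bool is_included_in(const char *description, const char *pattern)
{
    if (description == NULL || pattern == NULL)
        return false;
    return strstr(description, pattern) != NULL;
}
```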
    FUNCTION NAME          INPUT (arguments)     OUTPUT (result)
    CONSTRUCT_SOUND        (see above)           SOUND
    ANTEX_SOUND            (see above)           SOUND
    SIZE_OF_OBJECT         SOUND                 long integer
    SAMPLE_RATE            SOUND                 integer
    TYPE_ENCODING          SOUND                 integer
    DURATION               SOUND                 float
    BITS_PER_SAMPLE        SOUND                 integer
    DATA_ONLY_FILE         SOUND                 SOUND
    ANTEX_FILE             SOUND                 SOUND
    ADD_DESCRIPTION        SOUND, *char          integer (+side effects)
    REPLACE_DESCRIPTION    SOUND, *char          integer (+side effects)
    DESCRIPTION_LENGTH     SOUND                 integer
    DESCRIPTION            SOUND                 *char
    PLAY_SOUND             SOUND                 boolean (+side effects)
    ANTEX_PLAY             SOUND                 boolean (+side effects)
    PLAY_SEGMENT           SOUND, float, float   boolean (+side effects)
    CUT_SEGMENT            SOUND, float, float   SOUND
    LINK_SOUNDS            SOUND, SOUND          SOUND
    CONCATENATE_SOUNDS     SOUND, SOUND          SOUND
    RECORD_SOUND           SOUND, *char          SOUND
    ANTEX_RECORD           SOUND, *char          SOUND
    IS_INCLUDED_IN         SOUND, *char          boolean

Again, the "ANTEX" functions are model specific reflections of the more generic sound data manipulation functions. With this operational backdrop in mind, an implementation of the SOUND abstract data type can now be more formally described.
3. Implementation of the Abstract Data Type

For a concept of this nature to be of quantifiable value, it should be (preferably) implementable at some fixed or basic level. The following implementation is based on the concept of relating attributes and objects as a means for retrieving previously stored SOUND objects from within a multimedia database system.
As previously noted, a multimedia information object will consist of one or more computer manipulated media types. In the relational view, each representative media would be stored as an attribute of the parent multimedia object. Each attribute would have characteristics that are rather unique, including the requirements of special presentation and (possibly) storage devices. For the abstract data type SOUND, we proposed that the basic composition of the attribute S_SOUND of the SOUND_OBJECT relation contain the following fields or characteristics:
    - s_filename     char[64]                  /* a unique file name        */
    - s_size         long integer (bytes)      /* amount of data            */
    - s_duration     float (seconds)           /* can be computed           */
    - s_encoding     [PCM, ADPCM, LPC, etc.]   /* DSP algorithm to digitize */
    - s_samplerate   integer (8000, 16000)     /* # samples/sec             */
    - s_resolution   integer                   /* # bits                    */
    - s_description  char[500]                 /* text description          */

where the "s_" prefix represents a sound related characteristic of the SOUND object. In our approach, each unique SOUND object represents a different sound data file.
Unfortunately, a considerable amount of the characteristics information could be lost if the actual data is ever separated from the database. One method for reducing the amount of data being stored in the database is to store specific characteristics within the data file itself. This is the approach which we have used in establishing our model. Specifically, a header containing the sound data object's size, samplerate, encoding, duration and resolution has been stored in the data file along with the data. This reduces the items stored in the database to only the filename and the description. Although this information could also have been stored within the header, it is considered far more useful as a linked pair of a distinct database entry, assuming that a search based on the description attribute will be a fairly frequent event, as opposed to similar searches based on the size, duration, etc. of the SOUND object. This process is consistent with that which has been successfully implemented in the related work on image objects.
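The prototype's actual on-disk header layout is not given here, so the following fixed-layout header is an assumption used purely to illustrate keeping the registration data with the raw samples:

```c
#include <stdio.h>

/* Illustrative fixed-layout header carrying the registration data the
   model stores inside each sound file.  Field order, types and the
   use of native byte order are assumptions, not the actual format. */
typedef struct {
    long  size;        /* bytes of encoded samples that follow */
    int   samplerate;  /* 8000 or 16000 samples per second     */
    int   encoding;    /* e.g. 1 = ADPCM                       */
    float duration;    /* playing time in seconds              */
    int   resolution;  /* bits per sample                      */
} SoundHeader;

/* Write the header at the current file position; 0 on success. */
int write_header(FILE *fp, const SoundHeader *h)
{
    return fwrite(h, sizeof *h, 1, fp) == 1 ? 0 : -1;
}

/* Read the header back; 0 on success. */
int read_header(FILE *fp, SoundHeader *h)
{
    return fread(h, sizeof *h, 1, fp) == 1 ? 0 : -1;
}
```

With such a header in place, only the filename and the description need to live in the database, as described above.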
The information stored in the database will permit ease of access and better query processing of SOUND objects. Note, however, that not all of the characteristics of sound are explicitly stored in the database. These include such perceptual items as pitch and intensity, as well as the physical characteristics of the individual amplitudes and frequencies. Since our aim is to develop a model for the capture, storage and management of sound data, the additional information mentioned is somewhat unnecessary. Moreover, complex algorithms such as the Fast Fourier Transform (FFT) series would be needed to extract, for example, the multitude of frequencies residing in a typical complex waveform. Such a process is clearly undesirable. Furthermore, it does not support our need for multimedia information access.
The s_filename will permit a SOUND object to be uniquely identified in the database. The s_filename attribute of the SOUND object is linked to a particular multimedia object tuple of the complex multimedia relation. Thus the sound data may be individually queried. To avoid unnecessary duplication of the data within main memory, greater flexibility is achieved by feeding the data file directly to the device driver and the output device. The actual sound data (in the form of files) is typically passed as a parameter via a function call to the sound data handler (typically a device driver). An example of this can be seen in the previously introduced ANTEX_PLAY function where the name of the file to be played is passed as a parameter. Avoiding intermediate storage and unnecessary copying of massive data is a very important design issue of multimedia databases and one which received critical awareness in the design of our sound handling prototype. The prototype uses a background interrupt and an on-chip buffer for piping the data directly to the output device without main memory processing or storage.
The s_size attribute denotes the size of the SOUND object (in this case, the actual sound data file) in bytes. S_size is directly related to s_duration. The s_duration of the sound is the amount of time in seconds and hundredths of a second it takes the file to play from start to finish. This figure is inclusive of all sounds recorded during the original capture sequence, such as white noise or silence. S_duration is based on the sampling rate and the rate of playback. Compaction algorithms may differ between machines, although most computer related applications currently use ADPCM because of its reasonably high quality reproduction with respect to both voice and music. The s_samplerate is needed to alert the sound data handler (device driver) of the correct speed in which the stored sound should be played.
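Taken together, s_size, s_resolution and s_samplerate determine s_duration. As a simplified illustration (our sketch, ignoring header bytes and assuming a constant-rate encoding):

```c
/* Estimate playing time from stored size, assuming every stored bit
   is sample data at a constant resolution.  A simplification: real
   files also carry a header, which this calculation ignores. */
float estimate_duration(long size_bytes, int bits_per_sample, int samplerate)
{
    long total_samples = (size_bytes * 8L) / bits_per_sample;
    return (float)total_samples / (float)samplerate;
}
```

Under these assumptions, one second of 4-bit ADPCM sampled at 8000 samples/sec occupies 4000 bytes.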
The s_encoding attribute alerts the device driver of the type of algorithm used to convert the original analog signal into digitally encoded data. Should the decoding algorithm differ from that which was used to encode, there will be a general loss of quality in the audible output. Since almost all encoding is performed by DSP chips, it is vital that the algorithm employed in the decoding process be compatible.

The number of bits used per sample is known as s_resolution. Most speech digitizing chips have found excellent resolution in the use of eight bits per sample. However, depending on the architecture of the machine, even greater signal fidelity can be captured by increasing the resolution of the sample. We define s_resolution in terms of the digitizer (codec) rather than in terms of the compression algorithm used to encode. Unfortunately, extraction of this physical characteristic is exceedingly difficult, with the details remaining to be worked out.

Since waveform analysis should not be a required skill of the user, the s_description attribute will be used to determine the semantic contents of the SOUND object. This will permit browsing the contents without actually having to play the stored sounds in order to determine their contents. By storing the textual representation of each SOUND object in the database, a considerable savings in database storage requirements and query access time is achieved.
VI. DESCRIPTION OF A SOUND MANAGEMENT PROTOTYPE

A. ARCHITECTURE OF A PROTOTYPE
The prototype is intended to provide extensibility to the current relational DBMS by adding the objects of type SOUND to the database. In an effort to maintain our "off-the-shelf" technology approach, the functionality of the sound management system is built around the commercial relational DBMS "INGRES." The host language is standard "C" using the embedded SQL statements and commands provided by "INGRES."

Figure 6 reveals the high level view of the architecture. It is noticeably similar to that which was employed by [Ref. 1] for the discussion of the IMAGE DBMS. The dialogue manager, however, has been added to give more distinction to the concept of the sound devices. It serves as the interface between the device manager and the database management system. It can be viewed as the heart of the system, performing all exchanges of data and the employment of the various input and output devices. The device manager serves as the actual physical link between the main program and the various devices. It is composed of the various software drivers needed to activate or interact with the hardware. Figure 6 shows the different types of sound devices that the device manager may need to manage: namely, input devices such as microphones, audio players or VCRs, and output devices, such as amplifiers, computer based sound generation chips, etc.

The DBMS interface implements the query language integration within the system with respect to the various sound attributes. The DBMS interface is a general purpose set of functions whose level of integration is not exclusive to sound. The sound manager is a set of functions which perform the actual storage of the sound object, as well as performing other administrative activities such as generating a unique identifier or determining the size of a sound data file. These functions were described in general terms in Chapter V.
[Figure 6. Architecture of the Sound Media Prototype: a Dialogue Manager connected to the Device Manager (which drives the Sound Devices), the DBMS Interface, the Sound Manager, and the Standard DBMS; the standard DBMS holds the structured data while the sound manager handles the sound data.]
Although not depicted, for the sake of professional clarity, we can also assume an arrow exists between the device manager and the sound manager, as there exist sound related functions in which the DBMS interface is never invoked.

The next section will discuss the specific hardware requirements necessary for the development of the prototype. The coding of the various transformation functions needed to properly query a database integrated with sound remains an area for follow-on research.

B. MODEL SPECIFIC EQUIPMENT
The requirement for high resolution graphics and full image capabilities led to the SUN System workstation as a target machine for the research project; that of building a multimedia database system using relational database technology. Unfortunately, the lack of input/output audio facilities for the SUN 3 workstation required pursuing other areas of implementation for the SOUND object data handling. Incidentally, Digital Sound Corporation's DSC-200 Audio Data Conversion System is a unique peripheral subsystem specifically designed for optimum performance on the SUN 2/3 Workstations. The total system cost, however, was a prohibitive factor at this stage of our research. A reasonable alternative with similar or better operational functionality was needed.
We were able to find the necessary functionality through Antex Electronics' Model VP620E PC Compatible Digital Audio Processor. This highly versatile board performs real-time A/D and D/A conversions, with corresponding encoding and decoding of the digitized signals. Using this, in conjunction with other peripheral devices and software components, provided a platform for evaluating the management of sound in a multimedia database and information system environment.
The equipment employed in our model specific prototype is listed below.

Hardware:
- IBM PC/AT/386 compatible with 20MB internal hard disk (installed)
- Antex VP620E plug in Audio Processor board
- Standard cassette deck with min 1 VRMS audio output port
- Plug in microphone (standard 1/4" jack)
- Audio Amplifier (standard 1/4" input connection)
- Various connection cables

Software:
- Microsoft 'C' language (with embedded SQL) with standard libraries
- INGRES DBMS
- Audio driver routines for Antex VP620E Audio Processor board
- Driver interface functions
- Query language transformation functions (future research area)
Input signals are taken directly from the microphone or the output port of the audio cassette player into the standard 1/4" input port of the VP620E sound board. Microphone signals must be at least 1 VRMS in order to be received above normal threshold noise. The sound board filters and samples the incoming analog signal at either 8KHz or 16KHz (software selectable). Each sample is converted into an 8-bit digital number by the codec. This number is then encoded (compressed) using ADPCM by the DSP chip which results in a 4-bit sound data sample. Once in this form, the sound data files can be manipulated via the sound driver software and the associated function calls to it. Output is achieved by reversing the capture process. The 4-bit sample is decoded by the DSP chip into an 8-bit sample, then reconverted by the codec into an energy level representation which is routed via the output port of the sound board to the amplifier. The amplifier then translates the energy levels received into frequency responses which cause the speaker(s) to vibrate, thus reproducing the stored sound.
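The 8-bit-to-4-bit encoding step halves storage: two encoded samples fit in each stored byte. A toy packing routine (ours; real ADPCM derives each 4-bit code from the signal's recent history rather than by truncation) shows only the storage arithmetic:

```c
#include <stddef.h>

/* Pack pairs of 4-bit codes (low nibble of each input byte) into
   single bytes, demonstrating the 2:1 storage reduction of the
   4-bit encoding step.  n is the number of codes and must be even;
   the number of output bytes written is returned. */
size_t pack_nibbles(const unsigned char *codes, size_t n, unsigned char *out)
{
    size_t i, j = 0;
    for (i = 0; i + 1 < n; i += 2)
        out[j++] = (unsigned char)(((codes[i] & 0x0F) << 4)
                                  | (codes[i + 1] & 0x0F));
    return j;
}
```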
The Antex VP620E comes with a demo program which permits testing and evaluation of the various sound management capabilities needed for the future high level development of the prototype. This model also incorporates one of our long range desires. That is, the sound presentation can be performed in a background mode, allowing the user to perform other activities on the computer while a sound is being played. This is accomplished by piping the sound data directly from the database to the DSP chip of the sound board without first copying the data into memory. We are able to string several sound files together in this manner without a corresponding loss in user productivity.

Sound stringing can be performed by the user in one of two ways. First, the user can specify which SOUND objects are to be played in the sequence desired. These will be played as requested with little to no time lapse between selections. The other method is to specifically request segments of a SOUND object or objects to be played. For this option, the user must specify both the SOUND object, the start time in seconds and hundredths of a second, and the end time for each object. The default for each of these is the beginning and the end of the SOUND object, respectively.
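The second method, with its beginning-of-object and end-of-object defaults, can be modeled with a small request structure (all names here are our own illustration):

```c
/* One strung playback request.  A negative time stands for "use the
   default": the beginning of the SOUND object for start, and the
   object's full duration for end. */
typedef struct {
    long  s_id;   /* which SOUND object to play      */
    float start;  /* start time in seconds, or -1.0f */
    float end;    /* end time in seconds, or -1.0f   */
} Segment;

float effective_start(const Segment *seg)
{
    return seg->start < 0.0f ? 0.0f : seg->start;
}

float effective_end(const Segment *seg, float object_duration)
{
    return seg->end < 0.0f ? object_duration : seg->end;
}
```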
C. IMPLEMENTATION CONSIDERATIONS OF THE MODEL
At this stage, the prototype has demonstrated strong support of our thesis research. We have been able to achieve the capture, storage and retrieval of sound data. The foundation for the required internal database manipulations has been presented and discussed in Chapter V and is currently in the process of being formally implemented. The SOUND data model provides a highly interactive manner for manipulating stored SOUND objects in a database. However, the prototype implementation has also exposed a few of the false impressions that were being carried with respect to the manipulation of SOUND objects. The most prevalent of these encompasses the erroneous concept of hardware independence of the encoding algorithms. The Antex VP620E uses an ADPCM encoding of the digital signal emerging from the codec. If other machines are to be used, namely the SUN Workstation, the system must be able to successfully decode the encoding algorithm of a sound file which has been imported from another source. Since almost all current digitizers perform follow-on encoding in hardware, this could pose a serious problem.
In addition to the portability problem of the encoded sound files, additional peripheral equipment may also be needed in order to support presentation features of the multimedia information system. Without a means for input (capture) and output (play) of the SOUND object, the functionality of the underlying DBMS will be of no avail, regardless of how ingeniously it may be implemented.

Issues of compatibility and availability of highly specialized equipment (i.e., plug-in sound boards or sound generating chips) could severely hamper the power of the multimedia database. The ability to determine the sampling rate used on a sound data file, and then to emulate the reproduction of that rate on a sound presentation device, must also be taken into consideration when handling sound data. This information must then somehow be included in the header of the data file if the data is to be properly reproduced. By adding a header to the data only file created by the Antex VP-620E, we've circumvented a severe limitation of our model. Without this header, file specific information must be stored in the database at the time of file creation; otherwise it may be lost. When included in the header of the data file, it is much easier for this type of information to be transferred along with the data file if the sound object is to be of any meaningful use to the receiving user.

The ability to string various SOUND object segments together yields a high degree of flexibility for the user. It should be noted, however, that the sampling rate must be the same for the SOUND objects that are to be strung together. Sampling rate incompatibility, as could be expected, results in an unintelligible output.
VII. SUMMARY AND CONCLUSIONS

A. REVIEW OF THESIS

The 1980s have ushered in a new era of information management in the form of the computer. Our thesis set out to unravel the mysteries surrounding the management of sound data within a multimedia database management system. Several key questions were introduced in Chapter I. These have now been fully answered through the research milestones outlined in the chapters which followed. A synopsis of the results is included below.
For our work, sound data is exclusively stored in files. Each file is referred to as a SOUND object, where collections of these objects form SOUND relations. By extracting the semantic contents of a SOUND object into words, then storing this description within the database, we were able to introduce queries which could be used to locate specific types of SOUND objects. Each SOUND object is identified within the database through the use of a unique identifier. Additional information can be obtained by storing, along with the recorded sound data, a header of related attributes and characteristics pertaining to the sound data inside each file. These attributes denote the size, sample rate, encoding algorithm, duration and resolution (or bits-per-sample) of the sound data. This information is accessible through a set of function calls specifically designed to extract requested information from a SOUND object.
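As a hedged illustration of how those header attributes relate to one another: for 8-bit data (one byte per sample) the stored duration can be cross-checked against the size and sample rate. The struct and function names here are hypothetical:

```c
#include <assert.h>

/* Hypothetical slice of the header; names follow the thesis's SND_HDR. */
struct snd_hdr {
    long s_size;       /* number of bytes    */
    int  s_samplrate;  /* samples per second */
};

/* With 8-bit resolution there is one byte per sample, so the
   duration in seconds is simply size divided by rate. */
float expected_duration(const struct snd_hdr *h)
{
    return (float)h->s_size / (float)h->s_samplrate;
}
```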
B. APPLICATIONS
The ability to manage sound in a multimedia database offers tremendous advantages to the user. The user's concerns generally center around the presentation aspects of the multimedia information system. The user typically knows very little about the integrated components of the DBMS. Therefore, from a user's view, several applications can be suggested for the use of this powerful information medium.

One of the premier areas of multimedia information use is in training. By stimulating multiple senses at a time, the ability to process greater volumes of incoming information is increased, which in turn reduces training time. And since time is money, multimedia information systems offer a positive money saving alternative to single medium standard training programs. This is of particular importance to DoD agencies.

Security is another area which benefits from the ability to manage multiple media within a database. The use of voiceprints has already found a niche in the world of access control. A multimedia DBMS incorporating SOUND objects could be used to selectively attach sound bites to different passwords or documents.

Information retrieval is enhanced through the use of multimedia information systems. When sound is used, multiple personnel can take advantage of the audio information at a single workstation. In a closed work environment with limited output devices, this could effectively reduce the need (depending on the application) for duplicating or sharing large data files between users.

The real-time capture of sound signals at remote or isolated sites (i.e., an underwater acoustic signal from a submarine) can aid in the proper identification of unidentified signals. This is accomplished by comparing the captured signal with signals that have been previously stored within the database. Taking advantage of this capability could offer tremendous savings in time and effort during remote operations where support is either limited or lacking.
Perhaps the most widespread use of multimedia information has occurred within the area of office automation. Extending inter-office memos, mail and message systems with annotations and narrations has meant greater overall productivity for all employees involved.
C. FUTURE RESEARCH AREAS

Much work remains in the area of query transformations. A foundation has been established for the design and implementation of functions which will transform a user's query, entered as an embedded SQL request, into a standard "C" program which replaces each embedded statement with a corresponding set of function calls to the appropriate SOUND object relation(s) of the database. Appendix A provides a brief insight into how this process should occur. The transformation implementation should be considered as a possible area for follow-on thesis work.

Another area which has not received much attention in this thesis is the design and implementation of the end user's interface. This interface should be considerably more structured than that of the previously cited programmer's interface. The end user should be able to query the database through a sequence of menu calls or icon selections, rather than be required to learn and to use a structured query language such as SQL or EQUEL for the same results. And finally, the end user's interface for sound manipulations must be equally compatible with the other forms of multimedia information within the integrated database.
LIST OF REFERENCES

1. Meyer-Wegener, K., V. Lum and C. Wu, Image Database Management in a Multimedia System, Naval Postgraduate School, Monterey, California, April 1988.

2. MacRecorder: Introduction To Sound, Farallon Computing, Inc., Berkeley, California, pp. 2-17, 1987.

3. Tanenbaum, Andrew S., Computer Networks, Prentice Hall, Inc., Englewood Cliffs, New Jersey, pp. 104-110, 1981.

4. Crochiere, Ronald E. et al., "Real-Time Speech Coding," IEEE Transactions on Communications, Vol. COM-30, No. 4, pp. 621-631, April 1982.

5. Frantz, Gene and K. Ling, "Speech Technology in Consumer Products," Speech Technology, Vol. 1, No. 2, pp. 25-34, April 1982.

6. Myers, Andrew B. (ed.), "Speech Processing Technology," AT&T Technical Journal, Vol. 65, Issue 5, September/October 1986.

7. Bertino, Elisa et al., "Query Processing In A Multimedia Document System," ACM Transactions On Office Information Systems, Vol. 6, No. 1, pp. 1-41, January 1988.

8. Sanders, Mark S. and E. McCormick, Human Factors In Engineering and Design, McGraw-Hill Book Company, New York, 6th ed., pp. 140-147, 1987.

9. Steinbrecher, David, "Optical Disks Go Head To Head With Traditional Storage Media," Today's Office, pp. 24-30, October 1987.

10. Terry, Douglas B. and D. Swinhart, "Managing Stored Voice In The Etherphone System," ACM Transactions on Computer Systems, Vol. 6, No. 1, pp. 3-27, February 1988.

11. Lockemann, Peter C., Multimedia Databases: A Paradigm and an Architecture, Naval Postgraduate School, Monterey, California, August 1988.

12. Christodoulakis, S. et al., "Multimedia Document Presentation, Information Extraction, and Document Formation In MINOS: A Model And A System," ACM Transactions On Office Information Systems, Vol. 4, No. 4, pp. 345-383, October 1986.

13. Gould, J. D. and S. J. Boies, "Speech Filing-An Office System for Principals," IBM Systems Journal, Vol. 23, No. 1, pp. 65-81, January 1984.

14. Postel, Jonathan B. et al., "An Experimental Multimedia Mail System," ACM Transactions On Office Information Systems, Vol. 6, No. 1, pp. 63-81, January 1988.

15. Thomas, R. H. et al., "DIAMOND: A Multimedia Message System Built on a Distributed Architecture," Computer, Vol. 18, No. 12, pp. 65-78, December 1985.

16. Maxemchuk, N., "An Experimental Speech Storage and Editing Facility," Bell Systems Technical Journal, Vol. 59, pp. 1383-1395, 1980.

17. Meyer-Wegener, K., A Project on Multimedia Databases, Naval Postgraduate School, Monterey, California, April 1988.

18. Meyer-Wegener, K., Extending an SQL Interface with the Data Type IMAGE, Naval Postgraduate School, Monterey, California, July 1988.
APPENDIX A - THE SQL PREPROCESSOR OVERVIEW

The abstract data type SOUND is not recognizable to standard INGRES SQL. So every occurrence of a SOUND attribute must be translated by the preprocessor into a pair of declarations that are recognizable by standard INGRES SQL. In this case, the declarations could correspond to a pair of program variables consisting of the filename of the SOUND object and the description of the contents of this object, i.e., the internal representation of the structure. The resulting input query language used by the preprocessor is referred to as SOUND SQL (SSQL).

The preprocessor reads the source file, looks for SSQL statements, checks whether they affect SOUND attributes, and replaces them. While doing so, it collects information that has to be included into the source (e.g., the declaration of additional variables). To identify the associated filename attribute, a "_f" extension is attached to the name of the SOUND attribute; for the description attribute identification, a "_d" extension is attached. SSQL uses the "exec sql" syntax to initiate the SQL statements. The "SS" prefix is used to denote internal functions and variables. The following discussions concerning the transformation of SSQL statements follow both the SQL Quick Reference Summary and [Ref. 18]. A first demonstration of the use of embedded SQL in conjunction with the internal sound related functions is described below. See Chapter V for a review of these functions.
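The "_f"/"_d" attribute expansion can be sketched as a small string transformation. This is an illustration only (the function name is hypothetical); the real preprocessor also rewrites the surrounding statement:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Expand a SOUND attribute name into its filename ("_F") and
   description ("_D") column names, as the preprocessor does for
   the relation's stored columns. */
void expand_sound_attr(const char *attr, char *fcol, char *dcol)
{
    sprintf(fcol, "%s_F", attr);
    sprintf(dcol, "%s_D", attr);
}
```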
The UPDATE operation:

(user input)

exec sql UPDATE SOUND_OBJECT
         SET S_SOUND = CONSTRUCT_SOUND(size, duration,...)
         WHERE S_ID = 292;

(preprocessor transformations)

exec sql begin declare section;
    char SSfilename1[64];
    char SSdescr1[500];
exec sql end declare section;

SSconstruct_sound (size, duration, ..., SSfilename1, SSdescr1);

The bracketed "exec sql" sequence of statements declares and identifies variables. The internal function "SSconstruct_sound" behaves as a procedure and takes as input the same parameters presented to the external SOUND-SQL CONSTRUCT_SOUND function, and returns as output the filename (SSfilename1) and description (SSdescr1) of the SOUND object. The (SS) return values of the internal functions are used to pass error codes or status information.

The return parameters of the internal functions are then used for the translation of an SSQL statement into its INGRES Embedded SQL equivalent. This, in effect, removes the SSQL statements from the resulting program. The final conversion of the UPDATE statement appears below.

exec sql UPDATE SOUND_OBJECT
         SET S_SOUND_F = :SSfilename1,
             S_SOUND_D = :SSdescr1
         WHERE S_ID = 292;
Other functions can be similarly described. A few of these are listed below.

INSERT Example:

exec sql INSERT INTO SOUND_OBJECT (S_ID, S_SOUND)
         VALUES (3212, CONSTRUCT_SOUND(filename,...));

(preprocessor transformations)

exec sql begin declare section;
    char SSfilename1[64];
    char SSdescr1[500];
exec sql end declare section;

SSconstruct_sound (size, duration, ..., SSfilename1, SSdescr1);

exec sql INSERT INTO SOUND_OBJECT (S_ID, S_SOUND_F, S_SOUND_D)
         VALUES (3212, :SSfilename1, :SSdescr1);
ADD_DESCRIPTION Example:

exec sql UPDATE SOUND_OBJECT
         SET S_SOUND = ADD_DESCRIPTION(S_SOUND, new_descr)
         WHERE S_ID = 292;

(preprocessor transformations)

exec sql begin declare section;
    char SSfilename2[64], SSfilename3[64];
    char SSdescr2[500], SSdescr3[500];
exec sql end declare section;

exec sql SELECT S_SOUND_F, S_SOUND_D
         INTO :SSfilename2, :SSdescr2
         FROM SOUND_OBJECT
         WHERE S_ID = 292;

SSadd_description (SSfilename2, SSdescr2, new_descr, SSfilename3, SSdescr3);

exec sql UPDATE SOUND_OBJECT
         SET S_SOUND_F = :SSfilename3,
             S_SOUND_D = :SSdescr3
         WHERE S_ID = 292;
PLAY_SOUND Example:

exec sql SELECT PLAY_SOUND(S_SOUND), DESCRIPTION(S_SOUND)
         INTO :var1, :var2
         FROM SOUND_OBJECT
         WHERE IS_INCLUDED_IN(S_SOUND, "grey whale");

(preprocessor transformations)

exec sql begin declare section;
    char SSfilename4[64];
    char SSdescr4[500];
    int SSfound;
exec sql end declare section;

exec sql declare SSc1 cursor for
         SELECT S_SOUND_F, S_SOUND_D
         FROM SOUND_OBJECT;
exec sql open SSc1 cursor;
exec sql whenever not found goto SScloseSSc1;

for (;;)
{
    exec sql FETCH SSc1 INTO :SSfilename4, :SSdescr4;
    SSis_included_in(SSfilename4, SSdescr4, "grey whale", &SSfound);
    if (SSfound == TRUE)
    {
        SSplay_sound(SSfilename4, SSdescr4, &var1);
        SSdescription(SSfilename4, SSdescr4, &var2);
        goto SScloseSSc1;
    }
}

SScloseSSc1:
exec sql close SSc1;
exec sql whenever not found "old action";

The preprocessor is not required to generate exactly this code, but the effect should be the same.
APPENDIX B - THE INTERNAL SOUND HANDLER FUNCTIONS

/***********************************/
AUTHOR:       SAWYER, GREGORY R.
RANK:         LCDR, USN
ADVISOR:      PROF VINCENT Y. LUM
CO-ADVISOR:   ADJ PROF KLAUS MEYER-WEGENER
THESIS TITLE: MANAGING SOUND IN A RELATIONAL MULTIMEDIA DATABASE SYSTEM
GRAD. DATE:   15 DECEMBER 1988
/***********************************/

Submitted as partial fulfillment of a thesis requirement relative to the receipt of a Masters of Science Degree in Computer Science from the Naval Postgraduate School at Monterey, California.

This is a sound handling module for a multimedia information system which can be supported by a variety of multimedia data bases using standard "C" and an Antex VP620E sound board. The functions can be called from almost any program. The functions included in this module directly support the software driver of the VP620E. Modifications to this module to support other drivers would require replacing the function calls to the VP620E software driver with those which support the new driver.

An automatic header for each file, which includes the size, sampling rate, encoding, duration and resolution, is attached at the start of each recorded file. The following functions are the operations necessary to realize the full power of the abstract data type known as SOUND.
THE NAME OF THIS FILE IS: SND_STRU.C

/* This structure represents the sound object whose features
   will be stored in the file as a header prefix to each
   recorded file. The database information will consist only
   of the unique file identifier and the description data. */

struct SND_HDR
{
    long  s_size;        /* number of bytes            */
    int   s_samplrate;   /* 8K or 16K per sec          */
    int   s_encoding;    /* 0=none, 1=ADPCM            */
    float s_duration;    /* time in sec and hundredths */
    int   s_resolution;  /* bits per sample            */
} hdr_info;
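A minimal sketch of the header-prefix mechanism: the struct is written with one fwrite and recovered with one fread, just as read_snd_hdr does later in this appendix. Here tmpfile() stands in for a real sound file, and the helper name is an assumption:

```c
#include <stdio.h>
#include <assert.h>

struct SND_HDR {
    long  s_size;
    int   s_samplrate;
    int   s_encoding;
    float s_duration;
    int   s_resolution;
};

/* Write a header as a binary prefix, then read it back. */
int roundtrip_hdr(struct SND_HDR *in, struct SND_HDR *out)
{
    FILE *f = tmpfile();               /* scratch file, auto-deleted */
    if (f == NULL)
        return -1;
    fwrite(in, sizeof(struct SND_HDR), 1, f);
    rewind(f);
    if (fread(out, sizeof(struct SND_HDR), 1, f) != 1)
        return -1;
    fclose(f);
    return 0;
}
```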
THE NAME OF THIS FILE IS: SND_ERRS.C

/* This module contains a list of possible I/O error responses.
   This list is truly extensible. */

typedef enum
{
    PARS, WOPEN, WRITE, WCLOSE, ROPEN, READ, RCLOSE, SRATE,
    TOO_LONG, OK
} ERROR;

void displayerr(e)
ERROR e;
{
    switch (e)
    {
        case PARS:     printf("Incorrect parameters\n");                  return;
        case WOPEN:    printf("Cannot open file for output\n");           return;
        case WRITE:    printf("File write error\n");                      return;
        case WCLOSE:   printf("Cannot close output file\n");              return;
        case ROPEN:    printf("Cannot open file for input\n");            return;
        case READ:     printf("File read error\n");                       return;
        case RCLOSE:   printf("Cannot close input file\n");               return;
        case SRATE:    printf("Incompatible sampling rates for files\n"); return;
        case TOO_LONG: printf("Description is too long\n");               return;
    }
}
THE NAME OF THIS FILE IS: SF_NAME.C

/* This module converts the standard time parameters of GMT
   into 1-digit hexadecimal numbers which enables the
   construction of a unique filename for a sound file. */

char YR(yr)
int yr;
{
    switch (yr)
    {
        case 88: return '8'; break;
        case 89: return '9'; break;
        case 90: return '0'; break;
        case 91: return '1'; break;
        case 92: return '2'; break;
        case 93: return '3'; break;
        case 94: return '4'; break;
        case 95: return '5'; break;
        case 96: return '6'; break;
        case 97: return '7'; break;
        case 99: return 'a'; break;
    }
}

char MN(mn)
int mn;
{
    switch (mn)
    {
        case 1:  return '1'; break;
        case 2:  return '2'; break;
        case 3:  return '3'; break;
        case 4:  return '4'; break;
        case 5:  return '5'; break;
        case 6:  return '6'; break;
        case 7:  return '7'; break;
        case 8:  return '8'; break;
        case 9:  return '9'; break;
        case 10: return 'A'; break;
        case 11: return 'B'; break;
        case 12: return 'C'; break;
    }
}
char DAY(day)
int day;
{
    switch (day)
    {
        case 1:  return '1'; break;
        case 2:  return '2'; break;
        case 3:  return '3'; break;
        case 4:  return '4'; break;
        case 5:  return '5'; break;
        case 6:  return '6'; break;
        case 7:  return '7'; break;
        case 8:  return '8'; break;
        case 9:  return '9'; break;
        case 10: return 'a'; break;
        case 11: return 'b'; break;
        case 12: return 'c'; break;
        case 13: return 'd'; break;
        case 14: return 'e'; break;
        case 15: return 'f'; break;
        case 16: return 'g'; break;
        case 17: return 'h'; break;
        case 18: return 'i'; break;
        case 19: return 'j'; break;
        case 20: return 'k'; break;
        case 21: return 'l'; break;
        case 22: return 'm'; break;
        case 23: return 'n'; break;
        case 24: return 'o'; break;
        case 25: return 'p'; break;
        case 26: return 'q'; break;
        case 27: return 'r'; break;
        case 28: return 's'; break;
        case 29: return 't'; break;
        case 30: return 'u'; break;
        case 31: return 'v'; break;
    }
}
char HR(hr)
int hr;
{
    switch (hr)
    {
        case 1:  return '1'; break;
        case 2:  return '2'; break;
        case 3:  return '3'; break;
        case 4:  return '4'; break;
        case 5:  return '5'; break;
        case 6:  return '6'; break;
        case 7:  return '7'; break;
        case 8:  return '8'; break;
        case 9:  return '9'; break;
        case 10: return 'a'; break;
        case 11: return 'b'; break;
        case 12: return 'c'; break;
        case 13: return 'd'; break;
        case 14: return 'e'; break;
        case 15: return 'f'; break;
        case 16: return 'g'; break;
        case 17: return 'h'; break;
        case 18: return 'i'; break;
        case 19: return 'j'; break;
        case 20: return 'k'; break;
        case 21: return 'l'; break;
        case 22: return 'm'; break;
        case 23: return 'n'; break;
        case 24: return 'o'; break;
    }
}
/*********************************************/
AUTHOR:       SAWYER, GREGORY R.
RANK:         LCDR, USN
ADVISOR:      PROF. VINCENT Y. LUM
CO-ADVISOR:   ADJ. PROF. KLAUS MEYER-WEGENER
THESIS TITLE: MANAGING SOUND IN A RELATIONAL MULTIMEDIA DATABASE SYSTEM
GRAD. DATE:   15 DECEMBER 1988
/*********************************************/

THE NAME OF THIS FILE IS: SND_FNCS.C

#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include "snd_stru.c"
#include "snd_errs.c"
#include "sf_name.c"

#define NAME_LENGTH 13
#define MAX_DESCR   500
#define ERROR_FREE  0
#define SOUND_ERROR -1
#define BEGIN       1
#define SETREC      2
#define START       4
#define STOP        5
#define STATUS      6
#define END         8
#define PLAY        9

int VP620();
/* *** ADD DESCRIPTION *** */

SSadd_description(infilename, indescr, newdescr, outfilename, outdescr)
/* **** add to the description of a sound object */
char *infilename,    /* input  */
     *indescr,       /* input  */
     *newdescr,      /* input  */
     *outfilename,   /* output */
     *outdescr;      /* output */
{
    int i = 0;

    while (*outfilename++ = *infilename++)
        ;
    while (*outdescr++ = *indescr++)
        i++;
    outdescr--;                      /* reposition on '\0' */
    while ((*outdescr++ = *newdescr++) && i < MAX_DESCR)
        i++;
    if (i == MAX_DESCR && *outdescr != '\0')
    {
        *outdescr = '\0';
        displayerr(TOO_LONG);
        return SOUND_ERROR;
    }
    return ERROR_FREE;
}
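The bounded append above can also be expressed with the library string routines. This sketch (not the thesis's code) keeps the MAX_DESCR guard but reports the overflow before writing anything:

```c
#include <string.h>
#include <assert.h>

#define MAX_DESCR 500

/* Append extra text to a description, refusing to overflow the
   MAX_DESCR limit (unlike the pointer-walking version, nothing
   is written when the result would be too long). */
int append_descr(char *dest, const char *extra)
{
    if (strlen(dest) + strlen(extra) >= MAX_DESCR)
        return -1;                   /* would overflow: report error */
    strcat(dest, extra);
    return 0;
}
```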
/* *** REPLACE DESCRIPTION *** */

SSreplace_description(infilename, indescr, newdescr, outfilename, outdescr)
/* **** replace the description of a sound object */
char *infilename,    /* input  */
     *indescr,       /* input  */
     *newdescr,      /* input  */
     *outfilename,   /* output */
     *outdescr;      /* output */
{
    while (*outfilename++ = *infilename++)
        ;
    while (*outdescr++ = *newdescr++)
        ;
    return ERROR_FREE;
}
/* *** DESCRIPTION LENGTH *** */

SSdescription_length(filename, descr, char_count)
/* *** count the characters in the description of a sound object */
char *filename;             /* input, not used */
char *descr;                /* input */
unsigned int *char_count;   /* output */
{
    unsigned int i = 0;

    while (*descr++ != '\0')
        i++;
    *char_count = i;
    return ERROR_FREE;
}
/* *** DESCRIPTION *** */

SSdescription(filename, old_descr_name, new_descr_name)
/* **** copy the description to the output */
char *filename,         /* input, not used */
     *old_descr_name,   /* input */
     *new_descr_name;   /* output */
{
    while (*new_descr_name++ = *old_descr_name++)
        ;
    return ERROR_FREE;
}
/* *** IS_INCLUDED_IN *** */

SSis_included_in(filename, descr, pattern, match)
/* **** determine whether or not string "pattern" is contained
   **** within string "descr"; returns 1 if true, 0 if false */
char *filename,   /* input, not used */
     *descr,      /* input */
     *pattern;    /* input */
int *match;       /* output */
{
    int i, j, found;

    if (*pattern == '\0')
        found = 1;       /* NULL string always is contained */
    else
        found = 0;       /* initialize found for loop use */
    i = 0;
    while (*(descr+i) != '\0' && !found)
    {
        if (*(descr+i) == *pattern)
        {
            j = 0;
            while (*(descr+i+j) == *(pattern+j) && *(pattern+j) != '\0')
                j++;
            if (*(pattern+j) == '\0')        /* pattern matched */
                found = 1;
            else if (*(descr+i+j) == '\0')   /* pattern longer than remaining descr */
                i = i + j;                   /* terminate outer loop */
            else                             /* continue search starting with next letter in descr */
                i++;
        }
        else
            i++;
    }
    *match = found;
    return ERROR_FREE;
}
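For comparison, the same containment test can be written with the library routine strstr. This stand-in (not the thesis's code) follows SSis_included_in's convention that the empty pattern is always contained:

```c
#include <string.h>
#include <assert.h>

/* 1 if pattern occurs in descr, 0 otherwise. */
int included_in(const char *descr, const char *pattern)
{
    return strstr(descr, pattern) != NULL;
}
```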
/* *** SIZE *** */

SSsize(fname, descr, size)
/* ** This function reads a header from a designated file,
   returns the updated header, then passes the size
   as a long int back to the caller. No Error Checking Performed. */
char *fname;        /* input */
char *descr;        /* not used */
long int *size;     /* return value */
{
    struct SND_HDR hdr;

    read_snd_hdr(fname, &hdr);   /* read hdr fields */
    *size = hdr.s_size;
    return ERROR_FREE;
}
/* *** SAMPLING RATE *** */

SSsamplerate(fname, descr, samplrate)
/* ***** This function reads a header from a designated file,
   returns the updated header, then passes the samplerate
   as an int back to the caller. No Error Checking Performed! */
char *fname;        /* input */
char *descr;        /* not used */
int *samplrate;     /* return value */
{
    struct SND_HDR hdr;

    read_snd_hdr(fname, &hdr);   /* read hdr fields */
    *samplrate = hdr.s_samplrate;
    return ERROR_FREE;
}
/* *** ENCODING *** */

SSencoding(fname, descr, encoding)
/* ***** This function reads a header from a designated file,
   returns the updated header, then passes the encoding code
   as an int back to the caller. No Error Checking Performed! */
char *fname;        /* input */
char *descr;        /* not used */
int *encoding;      /* return value */
{
    struct SND_HDR hdr;

    read_snd_hdr(fname, &hdr);   /* read hdr fields */
    *encoding = hdr.s_encoding;
    return ERROR_FREE;
}
/* *** DURATION *** */

SSduration(fname, descr, duration)
/* ***** This function reads a header from a designated file,
   returns the updated header, then passes the duration in
   seconds and hundredths of a second as a float back to the
   caller. No Error Checking Performed! */
char *fname;        /* input */
char *descr;        /* not used */
float *duration;    /* return value */
{
    struct SND_HDR hdr;

    read_snd_hdr(fname, &hdr);   /* read hdr fields */
    *duration = hdr.s_duration;
    return ERROR_FREE;
}
/* *** RESOLUTION *** */

SSresolution(fname, descr, resolution)
/* ***** This function reads a header from a designated file,
   returns the updated header, then passes the resolution
   in number of bits-per-sample as an int back to the caller.
   Error Checking Performed to determine if the hdr field
   contains garbage! */
char *fname;        /* input */
char *descr;        /* not used */
int *resolution;    /* return value */
{
    struct SND_HDR hdr;

    read_snd_hdr(fname, &hdr);   /* read hdr fields */
    /* check to see if this field contains garbage */
    if ((hdr.s_encoding != 1) || (hdr.s_resolution > 32))
    {
        hdr.s_resolution = 0;
        return SOUND_ERROR;
    }
    *resolution = hdr.s_resolution;
    return ERROR_FREE;
}
/* *** READ SOUND HEADER *** */

read_snd_hdr(fname, h)
/* ***** This function reads a header from a designated file,
   and returns the header to the caller with the various
   fields updated. */
char *fname;            /* input */
struct SND_HDR *h;      /* output: sound object record */
{
    FILE *f;
    int num;

    if ((f = fopen(fname, "rb")) == NULL)    /* open for reading */
    {
        displayerr(ROPEN);
        return(SOUND_ERROR);
    }
    num = 1;                                 /* only one header */
    /* ****** read the header from the predesignated input file */
    if (fread(h, sizeof(struct SND_HDR), 1, f) < num)
    {
        displayerr(READ);                    /* read error */
        return(SOUND_ERROR);
    }
    if (fclose(f) != 0)
    {
        displayerr(WCLOSE);                  /* close error */
        return(SOUND_ERROR);
    }
    return(0);
}
/* *** CONCATENATE_SOUNDS *** */

SSconcatenate_sounds(fname1, fname2, newfile)
/* ***** This function concatenates two sound files and generates
   a new file as a result. This function should return a "0"
   if successful. */
char *fname1;    /* input file #1 */
char *fname2;    /* input file #2 */
char *newfile;   /* output file */
{
    struct SND_HDR hdr1, hdr2, hdr3;
    FILE *f, *fg, *fh;
    char buf[500];     /* input/output buffer */
    int num = 1;       /* only one header */

    read_snd_hdr(fname1, &hdr1);
    read_snd_hdr(fname2, &hdr2);
    if (hdr1.s_samplrate == hdr2.s_samplrate)
    {
        hdr3.s_size = hdr1.s_size + hdr2.s_size;
        hdr3.s_samplrate = hdr1.s_samplrate;
        hdr3.s_encoding = hdr1.s_encoding;
        hdr3.s_duration = hdr1.s_duration + hdr2.s_duration;
        hdr3.s_resolution = hdr1.s_resolution;
        generate_filename(newfile);
        if ((f = fopen(newfile, "wb")) == NULL)     /* open for writing */
        {
            displayerr(WOPEN);
            return SOUND_ERROR;
        }
        if ((fg = fopen(fname1, "rb")) == NULL)     /* open for reading */
        {
            displayerr(ROPEN);
            return SOUND_ERROR;
        }
        if ((fh = fopen(fname2, "rb")) == NULL)     /* open for reading */
        {
            displayerr(ROPEN);
            return SOUND_ERROR;
        }
        /* ****** write the header into the predesignated output file */
        if (fwrite(&hdr3, sizeof(struct SND_HDR), 1, f) < num)
        {
            displayerr(WRITE);                      /* write error */
            return SOUND_ERROR;
        }
        while (!feof(fg))
        {
            if (fread(buf, 500, 1, fg) < 0)         /* load buffer */
            {
                displayerr(READ);
                return SOUND_ERROR;
            }
            /* ****** append data from sound data buffer */
            if (fwrite(buf, 500, 1, f) < num)       /* write buffer */
            {
                displayerr(WRITE);
                return SOUND_ERROR;
            }
        }
        while (!feof(fh))
        {
            if (fread(buf, 500, 1, fh) < 0)         /* load buffer */
            {
                displayerr(READ);
                return SOUND_ERROR;
            }
            /* ****** append data from sound data buffer */
            if (fwrite(buf, 500, 1, f) < num)       /* write buffer */
            {
                displayerr(WRITE);
                return SOUND_ERROR;
            }
        }
        printf("Data successfully written...\n");
        if ((fclose(f) != 0) && (fclose(fg) != 0) && (fclose(fh) != 0))
        {
            displayerr(WCLOSE);                     /* close error */
            return SOUND_ERROR;
        }
        return ERROR_FREE;
    }
    else
    {
        displayerr(SRATE);
        return SOUND_ERROR;
    }
}
/* *** PLAY SEGMENT *** */

SSplay_segment(fname, start_time, stop_time)
/* ***** This function plays only the designated portion of a file
   which the user desires vice the entire sound data file.
   This function should return a "0" if successful. */
char *fname;        /* input file */
float start_time;
float stop_time;
{
    char *p;
    char *file_segment;

    /* concatenate 'fname' and 'times' to form a single
       combined filename for use by the 'play' function */
    file_segment = malloc(30);
    sprintf(file_segment, "%s/%07.2f/%07.2f", fname,
            start_time, stop_time);
    file_segment[26] = '\0';
    for (p = file_segment; *p; p++)
        if (*p == ' ')
            *p = '0';
    if (SSantex_play(fname, file_segment) != 0)
        return SOUND_ERROR;     /* oops... an error */
    else
        return ERROR_FREE;
}
/* *** LINK SOUNDS *** */

SSlink_sounds(fname1, fname2)
/* ***** This function plays two sound object files back-to-back.
   This function should return a "0" if successful. */
char *fname1;
char *fname2;
{
    char *p;
    char *file_segment;

    file_segment = malloc(30);
    sprintf(file_segment, "%s,%s", fname1, fname2);
    file_segment[25] = '\0';
    for (p = file_segment; *p; p++)
        if (*p == ' ')
            *p = '0';
    SSantex_play(fname1, file_segment);
    return ERROR_FREE;
}
/* *** CUT_SEGMENT *** */

SScut_segment(fname, time1, time2, newfile)
/* This function cuts a designated segment out of the
   specified file and creates a new file minus the segment. */
char *fname;     /* input */
float time1;     /* segment start */
float time2;     /* segment end */
char *newfile;   /* output */
{
    FILE *f, *fg;
    char buf[200];
    int temp = 0;
    int lower_bound = 0;
    int upper_bound = 0;
    int num = 1;
    struct SND_HDR h, r;

    read_snd_hdr(fname, &h);       /* get header info of input file */
    generate_filename(newfile);    /* create a new output file */
    /* compute the segment time boundaries */
    if (h.s_samplrate == 8000)
    {
        lower_bound = 4000 * time1;
        upper_bound = 4000 * time2;
    }
    else
    {
        lower_bound = 8000 * time1;
        upper_bound = 8000 * time2;
    }
    /* *** update header for new output file */
    r.s_size = h.s_size - (upper_bound - lower_bound);
    r.s_samplrate = h.s_samplrate;
    r.s_encoding = h.s_encoding;
    r.s_duration = h.s_duration - (time2 - time1);
    r.s_resolution = h.s_resolution;
    if ((f = fopen(newfile, "wb")) == NULL)    /* open for writing */
    {
        displayerr(WOPEN);
        return SOUND_ERROR;
    }
    if ((fg = fopen(fname, "rb")) == NULL)     /* open for reading */
    {
        displayerr(ROPEN);
        return SOUND_ERROR;
    }
    /* *** write the header into the predesignated output file */
    if (fwrite(&r, sizeof(struct SND_HDR), 1, f) < num)
    {
        displayerr(WRITE);
        return SOUND_ERROR;
    }
    /* *** now load the buffer and write to the output file */
    while (!feof(fg))
    {
        if ((fread(buf, 200, 1, fg)) >= 0)
        {
            temp = temp + 200;
            /* keep only the data outside the cut segment */
            if ((temp <= lower_bound) || (temp >= upper_bound))
                fwrite(buf, 200, 1, f);    /* hmm...no error checking */
        }
        else
        {
            displayerr(READ);
            return SOUND_ERROR;
        }
    }
    return ERROR_FREE;
}
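The byte-boundary arithmetic in SScut_segment can be isolated into a small helper. This is a sketch with constants taken from the code above, which treats 8000-samples/sec files as roughly 4000 bytes per second of encoded data and all others as 8000 bytes per second:

```c
#include <assert.h>

/* Map a time offset in seconds to a byte offset in the data stream. */
int time_to_bytes(int samplrate, float seconds)
{
    int bytes_per_sec = (samplrate == 8000) ? 4000 : 8000;
    return (int)(bytes_per_sec * seconds);
}
```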
/* ***** ANTEX_PLAY FUNCTION ***** */

SSantex_play(filename, temp_fname)
/* This is the actual function that plays the sound. Its input
   is two filenames, the second of which may not be used. A
   successful play will return a '0' to the caller. Failure will
   return an error message. */
char *filename;      /* primary input file */
char *temp_fname;    /* combined file */
{
    /* declarations */
    int port, useint;
    int vpfunction, samplerate;
    int state, error, sec, hundsec, overload;
    int monitor = 1;             /* record monitor always on */
    long int sz;
    int srate, sencode, sresol;
    float sdur;
    ERROR err;
    struct SND_HDR hdr;

    /* ***** Executable Statements ***** */
    if ((read_snd_hdr(filename, &hdr)) == 0)
    {
        if ((strpbrk(temp_fname, ".snd")) == NULL)
            strcpy(temp_fname, filename);
        /* ***** read header values to set parameters */
        printf("Current File = %s\n", temp_fname);
        printf("Size=%ld Srate=%d Enc=%d Dur=%5f Resol=%d\n",
               hdr.s_size, hdr.s_samplrate, hdr.s_encoding,
               hdr.s_duration, hdr.s_resolution);
        vpfunction = BEGIN;      /* alert to the driver */
        port = 0x280;
        useint = 2;
        VP620 (&vpfunction, &useint, &port);
        if (hdr.s_samplrate == 8000)
            srate = 0;
        else
            srate = 1;
        vpfunction = PLAY;       /* open file */
        VP620 (&vpfunction, &srate, temp_fname);
        vpfunction = STATUS;
        do {
            VP620 (&vpfunction, &overload, &hundsec, &sec, &error, &state);
            printf(" State=%d Error = %d Sec = %d.%d Overload = %d\r",
                   state, error, sec, hundsec, overload);
        } while (!kbhit() && state != 3);
        printf(" \n");
        printf("End of play! \n");
        /* these statements always required to close the file */
        vpfunction = STOP;
        VP620 (&vpfunction);
        vpfunction = END;
        VP620 (&vpfunction);
        return ERROR_FREE;
    }
    else
    {
        displayerr(READ);
        return SOUND_ERROR;
    }
}
/* *** Generate A New File Name *** */

generate_filename(sound_filename)
/* ***** Produce a unique 8-digit filename for the recording,
   composed of 1-digit each for year, month, day and hour, and
   2-digits each for minute and second. Each sound object can be
   identified by the ".snd" suffix. This code is similar to that
   used by the IMAGE functions. */
char *sound_filename;
{
    char *p;
    struct tm *t;
    time_t current_time;

    current_time = time(NULL);
    t = gmtime(&current_time);
    sprintf(sound_filename, "%1c%1c%1c%1c%2d%2d.%s",
            YR(t->tm_year), MN(t->tm_mon), DAY(t->tm_mday),
            HR(t->tm_hour), t->tm_min, t->tm_sec, "snd");
    sound_filename[NAME_LENGTH-1] = '\0';
    for (p = sound_filename; *p; p++)
        if (*p == ' ')
            *p = '0';
    return *sound_filename;
}
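The blank-to-'0' fix-up at the end of generate_filename exists because a %2d conversion pads single digits with a space, which cannot appear in the filename. The fix-up in isolation:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Replace every blank in a string with the digit '0'. */
void zero_pad(char *s)
{
    for (; *s; s++)
        if (*s == ' ')
            *s = '0';
}
```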
/* *** Store Sound Header and Data *** */

ERROR store_snd_hdr(fname, r, temp_file)
/* ***** This function stores a header into a designated file,
   then reads the recorded sound file, buffers the data,
   then writes the buffer into the designated file following
   the header. */
char *fname;             /* output */
struct SND_HDR r;        /* sound object record */
char *temp_file;         /* input */
{
    FILE *f, *fg;
    char buf[500];       /* input/output buffer */
    int num = 1;         /* only one header */

    if ((f = fopen(fname, "wb")) == NULL)          /* open for writing */
        return (WOPEN);
    if ((fg = fopen(temp_file, "rb")) == NULL)     /* open for reading */
        return (ROPEN);
    /* ***** write the header into the predesignated output file */
    if (fwrite(&r, sizeof(struct SND_HDR), 1, f) < num)
        return (WRITE);                            /* write error */
    while (!feof(fg))
    {
        if (fread(buf, 500, 1, fg) < 0)            /* load buffer */
            return (READ);
        /* ***** append data from sound data buffer */
        if (fwrite(buf, 500, 1, f) < num)          /* write buffer */
            return (WRITE);
    }
    if (fclose(f) != 0)
        return (WCLOSE);                           /* close error */
    if (fclose(fg) != 0)
        return (WCLOSE);
    return (OK);
}
/* *** Determine Size of DATA-ONLY File *** */

long FileSize(i_file)
char *i_file;    /* input file */
{
    FILE *f;
    long int f_size;

    if ((f = fopen(i_file, "rb")) == NULL)    /* open file */
        displayerr(ROPEN);
    if (fseek(f, 0L, 2) != 0)                 /* set position rel. to end */
        return EOF;
    f_size = ftell(f);
    if (fclose(f) != 0)                       /* close file */
        displayerr(RCLOSE);
    return f_size;
}
/*  ANTEX RECORD FUNCTION  */

antex_record(filename)
char *filename;
{
    int port, useint;
    int vpfunction, samplerate;
    int state, error, sec, hundsec, overload;
    int monitor = 1;                    /* record monitor always on */
    long sz;
    int srate, sencode, sresol;
    float sdur;
    int c;
    int i = 0, *pi;                     /* storage allocation stmts...unused */
    char *newname;
    ERROR err;
    struct SND_HDR hdr;
    /*
    ***** Executable statements *****
    */
    generate_filename(filename);
    vpfunction = BEGIN;                 /* vpbegin...wake-up call to driver */
    port = 0x280;                       /* use default IO address */
    useint = 2;                         /* use interrupt 2 */
    VP620(&vpfunction, &useint, &port);
    vpfunction = SETREC;                /* vpsetrec */
    samplerate = 0;
    sencode = 1;                        /* ANTEX recording w/8bit resolution */
    newname = "temp.snd";
    VP620(&vpfunction, &monitor, &samplerate, newname);
    puts("Press ENTER to begin...\n");
    c = getchar();
    vpfunction = START;                 /* vpstart */
    VP620(&vpfunction);
    printf("Recording in progress... Press any key to stop!\n");
    vpfunction = STATUS;                /* vpstatus */
    do
    {
        VP620(&vpfunction, &overload, &hundsec, &sec, &error, &state);
        printf(" State=%d Error=%d Seconds=%d.%02d Overload=%d\r",
            state, error, sec, hundsec, overload);
    }
    while (!kbhit() && state != 3);
    /* These statements always required to close the file */
    vpfunction = STOP;                  /* vpstop */
    VP620(&vpfunction);
    printf("\n");
    printf("End of recording session!\n");
    vpfunction = END;                   /* vpend */
    VP620(&vpfunction);
    /*
    Update the header fields now that the file has been recorded
    */
    sz = FileSize(newname) + sizeof(struct SND_HDR);
    hdr.s_size = sz;
    if (samplerate == 0)
        srate = 8000;
    else
        srate = 16000;
    hdr.s_samplrate = srate;
    hdr.s_encoding = sencode;
    sdur = (sec + ((float) hundsec / 100));
    hdr.s_duration = sdur;
    if (sencode == 1)                   /* ADPCM code */
        hdr.s_resolution = 8;           /* set #bits-per-sec */
    else
        hdr.s_resolution = 0;
    printf("Header info-> size=%ld srate=%d enc=%d dur=%5f resol=%d\n",
        hdr.s_size, hdr.s_samplrate, hdr.s_encoding,
        hdr.s_duration, hdr.s_resolution);
    /*
    ***** store header and data into designated file
    */
    if ((err = store_snd_hdr(filename, hdr, newname)) != OK)
        displayerr(err);
}
/***********************/
BIBLIOGRAPHY
Atal, Bishnu S., "Predictive Coding of Speech at Low Bit Rates," IEEE Transactions on Communications, Vol. COM-30, No. 4, April 1982.

Badgett, T., "Searching Through Files with Database Software," PC Magazine, Vol. 6, No. 18, October 1987.

Bertino, Elisa et al., "Query Processing in a Multimedia Document System," ACM Transactions on Office Information Systems, Vol. 6, No. 1, January 1988.

Christodoulakis, S. et al., "Multimedia Document Presentation, Information Extraction, and Document Formation in MINOS: A Model and a System," ACM Transactions on Office Information Systems, Vol. 4, No. 4, October 1986.

Crochiere, Ronald E. et al., "Real-Time Speech Coding," IEEE Transactions on Communications, Vol. COM-30, No. 4, April 1982.

Demurjian, S. A., and Hsiao, D. K., "The Multi-Lingual Database System," Proceedings of the Third International Conference on Data Engineering, Los Angeles, California, February 1987.

Demurjian, S. A., and Hsiao, D. K., Towards a Better Understanding of Data Models Through the Multi-Lingual Database System, NPS52-87-018, Naval Postgraduate School, Monterey, California, May 1987.

Dickey, Sam, "CD-ROM: A World of Information on a Disk," Today's Office, June 1987.

Flanagan, James L. et al., "Digital Voice Storage in a Microprocessor," IEEE Transactions on Communications, Vol. COM-30, No. 2, February 1982.

Fossum, Robert R. and V. Cerf, "Communications Challenges for the 80's," SIGNAL, October 1979.

Frantz, Gene and K. Ling, "Speech Technology in Consumer Products," Speech Technology, Vol. 1, No. 2, April 1982.

Georgiou, Bill, "Give an Ear to Your Computer: A Speech Recognition Primer for Computer Experimenters," BYTE, BYTE Publications, Inc., June 1978.

Gibbs, Simon et al., "MUSE: A Multimedia Filing System," Computer Society of the IEEE, Vol. 4, No. 2, March 1987.

Gould, J. D. and S. J. Boies, "Speech Filing--An Office System for Principals," IBM Systems Journal, Vol. 23, No. 1, January 1984.

Haskin, R., and Lorie, R., "On Extending the Functions of a Relational Database System," Proceedings ACM SIGMOD Conference, June 1982.

Lockemann, Peter C., Multimedia Databases: A Paradigm and an Architecture, Naval Postgraduate School, Monterey, California, August 1988.

Lum, V. Y., Wu, C. T., and Hsiao, D. K., Integrating Advanced Techniques into Multimedia DBMS, NPS52-87-050, Naval Postgraduate School, Monterey, CA, November 1987.

Lum, V. Y., Wu, C. T., and Hsiao, D. K., "Design of an Integrated DBMS to Support Advanced Applications," Proceedings International Conference on the Foundations of Data Organization, Kyoto, Japan, May 1985.

MacRecorder: Introduction to Sound, Farallon Computing, Inc., Berkeley, California, 1987.

Masunaga, Yoshifumi, "Multimedia Databases: A Formal Framework," IEEE Computer Society Office Automation Symposium, April 1987.

Maxemchuk, N., "An Experimental Speech Storage and Editing Facility," Bell Systems Technical Journal, Vol. 59, 1980.

Meyer-Wegener, K., A Project on Multimedia Databases, Naval Postgraduate School, Monterey, California, April 1988.

Meyer-Wegener, K., V. Lum and C. Wu, Image Database Management in a Multimedia System, Naval Postgraduate School, Monterey, California, April 1988.

Michel, Stephen L., "HYPERCARD: Apple's Illuminated Manuscript," CD-ROM Review, May 1988.

Myers, Andrew B. (ed.), "Speech Processing Technology," AT&T Technical Journal, Vol. 65, Issue 5, September/October 1986.

Ooi, B. C. et al., "Design of a Multimedia File Server Using Optical Disks for Office Applications," IEEE Computer Society Office Automation Symposium, April 1987.

Postel, Jonathan B. et al., "An Experimental Multimedia Mail System," ACM Transactions on Office Information Systems, Vol. 6, No. 1, January 1988.

Sanders, Mark S. and E. McCormick, Human Factors in Engineering and Design, McGraw-Hill Book Company, New York, 6th ed., 1987.

Steinbrecher, David, "Optical Disks Go Head to Head with Traditional Storage Media," Today's Office, October 1987.

Strukhoff, Roger, "The Industry Emerges: Apple Shines in Seattle," CD-ROM Review, May 1988.

Strukhoff, Roger, "IRIS EYES: Intermedia Prize," CD-ROM Review, May 1988.

Sventek, Joseph S., "An Architecture Supporting Multi-Media Integration," IEEE Computer Society Office Automation Symposium, April 1987.

Tanenbaum, Andrew S., Computer Networks, Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1981.

Terry, Douglas B. and D. Swinehart, "Managing Stored Voice in the Etherphone System," ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988.

Thomas, R. H. et al., "DIAMOND: A Multimedia Message System Built on a Distributed Architecture," Computer, Vol. 18, No. 12, December 1985.

"Understanding Voice I/O Systems and Their Applications," The American Voice Society, April 1985.

Woelk, D., and Luther, W., Multimedia Database Requirements, MCC Technical Report No. DB-042-85, July 1985.

Woelk, D. and Kim, W., "Multimedia Information Management in an Object-Oriented Database System," Proceedings of the Thirteenth VLDB Conference, Brighton, United Kingdom, 1987.

Woelk, D., Luther, W., and Kim, W., "Multimedia Applications and Database Requirements," Proceedings IEEE CS Office Automation Symposium, Gaithersburg, Maryland, April 1987.
INITIAL DISTRIBUTION LIST
                                                        No. Copies

Defense Technical Information Center                        2
Cameron Station
Alexandria, Virginia 22304-6145

Library, Code 0142                                          2
Naval Postgraduate School
Monterey, California 93943-5002

Department Chairman, Code 52                                1
Department of Computer Science
Naval Postgraduate School
Monterey, California 93943-5000

Professor Vincent Y. Lum, Code 52Lu                        10
Computer Science Department
Naval Postgraduate School
Monterey, California 93943-5000

Klaus Meyer-Wegener                                         1
Universitaet Kaiserslautern
Fachbereich Informatik
Postfach 30 49
6750 Kaiserslautern
West Germany

LCDR Gregory R. Sawyer                                      2
Attack Squadron Forty-Two
U.S. Naval Air Station Oceana
Virginia Beach, Virginia 23460

LT Cathy A. Thomas                                          1
13968 Stoney Gate Pl.
San Diego, California 92128

LT Diane M. Enbody                                          1
4693 Blue Pine Circle
Lake Worth, Florida 33463