1. Introduction and Problem Domain

1.1 The Problem
I am a part-time DJ and am also interested in music production. DJing involves beat-mixing,
where the tempo of the record to be played next is changed to match the tempo of the one currently
being played. This means that a large amount of creativity can be introduced into a DJ’s set. For
example, the record being played could be an instrumental track, allowing the DJ to beat-match a
vocal track and layer it over the top of the first track. This is effectively producing a live remix of
both tracks. Another way for a DJ to alter a record that is being played is through the use of effects
units, such as the Korg Kaoss pad, pictured in figure 1.1.
Figure 1.1 – A Korg Kaoss pad, a popular and powerful effects unit for DJs
This unit can be used to produce a wide variety of effects, such as high and low pass filters,
sampling and delays, in a very user friendly manner. The desired effect is chosen using the buttons
across the top of the unit, and the intensity and volume of the effect are controlled by moving a finger
across the touch-sensitive pad. Some of the possible effects are explained in greater detail in section
2.5. This piece of hardware is incredibly versatile, but as you would expect, it is also expensive, and
out of the price range of most amateur DJs. A similar problem exists with music production
programs for computers, such as Image-Line’s Fruity Loops. They are very powerful music
production tools, but priced for a professional market. Another problem exists with these programs
though, and that is the weak user interface design. The complicated and crowded nature of the GUI
for Fruity Loops, shown in figure 1.2, doesn’t inspire musical creativity (it doesn’t look much better
in colour). This is partly due to the large amount of features available and the professional market it
is aimed at, but it is also very bland and flat, making it difficult to find the function required even if
you are familiar with the layout and functions available.
There are plenty of music production programs available, but I haven’t found any programs that will
produce ‘live’ effects over music that is streamed to a computer for immediate playback.
With all of these facts in mind, I had an ideal problem to look at for my project – producing a
program for DJs that will produce effects over live music being played through a computer. The
program would also need to have a graphical user interface (referred to as GUI from here) that is easy
to understand, as the user might not have any previous experience with music production to draw on.
Figure 1.2 – The GUI for Fruity Loops is crowded and confusing
1.2 A Typical DJ Mixer Layout
The layout of a mixing console is one that is familiar to all DJs. An example is shown in figure 1.3.
This style of layout and set of features would be an appropriate one to imitate in my application. The
one pictured has three channels, two for turntables, and a third for a CD player, a microphone, or another
piece of equipment. The three large vertical sliders control the output level of each channel relative to the
other two channels.

Figure 1.3 – A typical 3-channel DJ mixer. This also has a sampler built in, as can be seen from the bottom left- and right-hand corner features. (Callouts on the picture label the three channels, the equalizer (EQ), the volume controls and the sampler functions.)

Each channel has a three-band equaliser above these sliders which allows the user
to alter treble, mid range and bass levels independently of each other on each channel (the mixer
pictured in figure 1.3 has a range of -32 to +12 decibels). The top-most knob on each EQ controls
the maximum output gain level of that channel so that the maximum output level remains the same
for all three channels. The horizontal slider, or cross fader, is what a DJ uses to fade between the
records being played on the turntables. Another function that a lot of mixers include is kill switches.
These are buttons that, when pressed, immediately drop the level of one of the EQ bands to the
lowest possible level, effectively removing it from the output signal. The example mixer doesn’t
have kill switches, but these are a feature often found on mixers, and are most often used to remove
the bass part of a track every other beat, creating a variation to the rhythm of the track.
2. Digital Sound Processing
2.1 Basics of Sound Waves
A brief review of some of the basics of sound waves and effects was needed to see what
would be practical and appropriate for this project.
Figure 2.1 – The graph shows a simple sound with a constant frequency that gets quieter over time.
Frequency, amplitude and phase define a sound wave. Sounds are made up of waves of different
frequencies, corresponding to the pitch of each note in the sound – the higher the frequency, the
higher the pitch of the note. Figure 2.1 shows a simple sound wave of constant frequency. The
period of the wave is how long it takes for one cycle of the wave to finish, and is the inverse of the
frequency. The frequency is therefore the number of cycles completed in one second, measured in
hertz. The amplitude of a sound determines how loud the sound is at a given point in time - the
smaller the amplitude, the quieter the sound. The amplitude of the wave in figure 2.1 gets smaller
with time, hence it gradually gets quieter. The phase of a sound is its position within its cycle,
measured as an angle relative to the starting position of the wave. Figure 2.2 shows
three sound waves of the same frequency and amplitude that have started out of phase with each other.
If just waves 1 and 2 are considered, 1 is exactly 180° out of phase with 2. The combined waves of 1 and 2
produce a sound that has an amplitude of zero at all points. Wave 3 is 90° behind 1, and 90° ahead of 2.
Figure 2.2 – The graph shows three waves with the same frequency that each have different phases
Sound waves are continuous, and cannot be represented exactly by computers. Instead, a sound
signal processed by a computer is split up into discrete values called samples which approximate
sound waves. Each sample represents an instantaneous value of the amplitude of the audio, and the
number of samples per second is determined by the sample rate in Hertz (Hz). The highest sample
rate to be widely used is 44100 Hz, which is the sample rate used on CDs and on the highest quality
MP3s.
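As an illustration of this, the short Java sketch below (illustrative code that is not part of the project; the tone frequency and amplitude are arbitrary values) generates one second of a 440 Hz sine wave as discrete samples at the CD sample rate of 44100 Hz.

```java
// Illustrative sketch: one second of a 440 Hz sine wave represented as discrete samples.
public class SineSketch {
    public static void main(String[] args) {
        int sampleRate = 44100;           // samples per second (CD quality)
        double frequency = 440.0;         // pitch of the tone in hertz
        double amplitude = 0.5;           // 1.0 would be full scale
        double[] samples = new double[sampleRate];   // one second of audio
        for (int n = 0; n < samples.length; n++) {
            // Each sample is the instantaneous amplitude of the wave at time n / sampleRate
            samples[n] = amplitude * Math.sin(2.0 * Math.PI * frequency * n / sampleRate);
        }
        System.out.println("First two samples: " + samples[0] + ", " + samples[1]);
    }
}
```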
2.2 Digital Sound Processing
Sound processing on computers can be split into two areas – sampled sound and synthesised
sound. Synthesised sound is primarily supported through the use of MIDI (Musical Instrument
Digital Interface) and sequencing programs, such as Fruity Loops, mentioned in chapter 1.
Sequencing and MIDI is used for music composition and production, and files contain more
information than just a stream of data representing sound. Each MIDI file is a time line, with a
number of channels representing different instruments. An analogy is having multiple musical score
sheets laid out in front of you. Events on the time line contain information about what sound should
be played, typically a single note from a particular synthesised instrument, plus the pitch of the note,
the reverb and any other information necessary to play the event as required. MIDI events have only
recently had functionality developed that allows them to handle real-time processing, and MIDI cannot be
used to process a real-time audio signal channelled through it, due to the way in which MIDI files are
constructed.
Sampled sound on the other hand is just the representation of audio as a stream of bytes of data,
and is how real-time processing is handled in both computer applications and DSP hardware. As
mentioned, the information on CDs is represented in this way. CDs have a sample rate of 44100
Hertz, and each sample is a 16-bit value made up of 2 bytes of information. A byte is an eight-digit
binary number. A byte can have 2^8 (256) discrete values, and therefore 16-bit sound allows 2^16
(65536) discrete values for a particular sample.
Figure 2.3 – 16-bit sound is represented by binary strings 16 characters long
The value of each sample is the amplitude of the sound wave at that point in time. A single
frequency can be represented in this way by changing the values in a cyclic fashion, and combinations
of frequencies producing more complicated sounds are represented by combining the byte streams for
all desired frequencies. So a very short snippet (0.159 milliseconds) of an audio signal in CD format
might look like the data in figure 2.3 if written out.
An 8-bit signal would only have a single byte for each channel per sample, not two as with a 16-bit signal. As the number of different possible values that each sample can take is greatly reduced, an
8-bit sound contains less detailed information about the original signal than a 16-bit signal. For
sound processing some detail can be sacrificed in order to speed up audio stream processing within an
application by using an 8-bit signal. A mono signal only has one set of data for a single channel, not
two as with a stereo signal, and is another way of reducing the amount of data that needs to be
processed by an application.
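A small hedged sketch of how these bytes map to sample values (illustrative code, not from the project): two bytes of a 16-bit little-endian PCM stream combine into one signed sample, whereas an 8-bit sample is a single byte with only 256 possible values.

```java
// Illustrative sketch: combining two bytes of 16-bit little-endian PCM into one sample value.
public class SampleValueSketch {
    /** Combine a little-endian byte pair into a signed 16-bit sample (-32768..32767). */
    static int toSample16(byte low, byte high) {
        return (high << 8) | (low & 0xFF);   // the sign comes from the high byte
    }

    public static void main(String[] args) {
        byte low = (byte) 0x34, high = (byte) 0x12;
        System.out.println("16-bit sample: " + toSample16(low, high));   // prints 4660 (0x1234)

        byte sample8 = (byte) 100;   // an 8-bit sample is just one byte
        System.out.println("8-bit sample: " + sample8);
    }
}
```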
2.3 The Fourier Series
For digital sound processing, audio data needs to be converted from a byte stream, which is in the
time domain, to spectral information representing the frequency domain of the sound. The use of
Fourier transforms is the standard method for analysing time domain data to obtain the series of
frequencies that make up the signal in the frequency domain. Fourier transforms are a result of the
discovery by Jean Baptiste Fourier in 1807 that any audio signal or waveform is the summation of
sine waves of different frequencies, phases and amplitudes [1]. This summation can be represented
by a Fourier series. The general form of a Fourier series, shown in figure 2.4, demonstrates that any
periodic function of time x(t) can be transformed into a summation of sine waves and cosine waves
starting at a frequency of zero that increases in integer multiples of a base frequency f0 = 1/T, where T
is the period of x(t).
Figure 2.4 – The general form of a Fourier series (image from [10])
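Since the original figure is an image, a standard statement of that general form is given here for reference (my notation; the constant term is sometimes written as a_0/2 depending on convention):

$$
x(t) \;=\; a_0 \;+\; \sum_{k=1}^{\infty} \Big( a_k \cos(2\pi k f_0 t) \;+\; b_k \sin(2\pi k f_0 t) \Big),
\qquad f_0 = \frac{1}{T}.
$$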
2.4 The Fast Fourier Transform
A Fourier transform calculates all a_k and b_k values in the equation in figure 2.4 for a given base
frequency and the function x(t). An infinite sum cannot be computed on a computer, so a finite set of sines
and cosines is found that represents a number of frequencies equal to the number of time samples in
the input sound. The discrete Fourier transform is solvable in O(n²) time, and produces a complete
range of frequencies.
Figure 2.5 – Discrete Fourier transform function (image from [10])
The quickest algorithms that can convert a time domain function into a frequency domain
function are fast Fourier transform (FFT) algorithms, first published in 1965 by J.W. Cooley
and J.W. Tukey, which complete in O(n log₂ n) time [2]. The compromise that an FFT
algorithm makes in order to be computationally fast is that the number of samples being handled by
the FFT at one time must always be a power of two. The FFT algorithm is derived from the discrete
Fourier transform function, which is shown in figure 2.5. The fast Fourier transform improves
computation time by taking advantage of the periodic nature of the discrete Fourier transform. The
discrete Fourier transform can be rewritten in terms of complex numbers rather than trigonometric
terms, as shown in figure 2.6.
$$
X(k) \;=\; \sum_{n=0}^{N-1} x(n)\, W_N^{kn},
\qquad \text{where } W_N = e^{-j\,2\pi/N}
$$
Figure 2.6 – Discrete Fourier transform function in terms of complex numbers
The series can be split into odd- and even-indexed values of n that can be calculated and added together.
This can be done to each successively smaller series until there are only two values in each set (hence the
need for the sample size to be a power of two).
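The Java sketch below illustrates this recursive even/odd splitting (an illustrative radix-2 implementation, not code from the project; it assumes the input length is a power of two and keeps real and imaginary parts in separate arrays):

```java
// Illustrative radix-2 FFT sketch. 're' and 'im' hold the real and imaginary parts of the
// input samples and are overwritten with the spectrum; their length must be a power of two.
public class FftSketch {
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return;                       // a series of one value is its own transform

        // Split the series into even- and odd-indexed samples
        double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
        double[] oddRe  = new double[n / 2], oddIm  = new double[n / 2];
        for (int i = 0; i < n / 2; i++) {
            evenRe[i] = re[2 * i];      evenIm[i] = im[2 * i];
            oddRe[i]  = re[2 * i + 1];  oddIm[i]  = im[2 * i + 1];
        }
        fft(evenRe, evenIm);                      // transform each half-size series
        fft(oddRe, oddIm);

        // Combine the halves using the twiddle factors W_N^k = e^(-j*2*pi*k/N)
        for (int k = 0; k < n / 2; k++) {
            double angle = -2.0 * Math.PI * k / n;
            double wRe = Math.cos(angle), wIm = Math.sin(angle);
            double tRe = wRe * oddRe[k] - wIm * oddIm[k];
            double tIm = wRe * oddIm[k] + wIm * oddRe[k];
            re[k]         = evenRe[k] + tRe;   im[k]         = evenIm[k] + tIm;
            re[k + n / 2] = evenRe[k] - tRe;   im[k + n / 2] = evenIm[k] - tIm;
        }
    }
}
```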
2.5 Some DSP Effects
Once the spectral information of a digital audio stream is known, it can be manipulated in a
number of ways. Not all effects require the audio stream to be converted into spectral information
however; some can be applied more easily in byte form.
Some effects that can be implemented with byte information are:
• Flanger – A flanger takes a copy of the sound signal and plays a delayed copy of the signal over the top of the original. This causes variations in the phase of the original signal and produces a sweeping sound often described as being similar to a jet engine.
• Echo/Delay – Part of the original audio signal is delayed, and then played back repeatedly at progressively quieter levels. The time between each echo can typically be varied, as can the rate at which the level is reduced, called the echo decay.
• Reverb – Reverb is a specific type of echo used to recreate how sound echoes and diffuses around different types of rooms, such as concert halls or studios, and has specific delay and decay times associated with different surroundings.
• Samples/Sampling – A sample in this sense is a clip of sound that is saved into memory, which can then be played again and looped repeatedly if desired. This is somewhat different to the type of sample mentioned in section 2.1, which is a single piece of information about a sound signal, despite having exactly the same name.
• Pan – Pan is the bias of the signal towards the left or right channel of a stereo signal.
Some of those that need spectral information to be implemented are:
• Filter – A filter changes the amplitude of a given range of frequencies, normally reducing, or attenuating, the amplitude of the selected frequencies to zero.
• Phaser – A phaser is similar to a flanger, but instead of delaying the sound signal, the phase of the signal is changed and the result is played immediately, combined with the original signal. A low frequency oscillator (LFO) is usually used to change the phase of the original signal; it multiplies the original signal by a low frequency sine wave.
All of these effects, except the sampler, work by passing the audio stream through different
algorithms that alter the audio stream in some way.
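As an illustration of one of these byte-domain effects, the sketch below (illustrative only, not the project's code) applies a simple echo to an array of samples by mixing in a scaled copy of the signal from a fixed number of samples earlier:

```java
// Illustrative sketch of a simple echo applied to an array of samples.
public class EchoSketch {
    static double[] addEcho(double[] input, int delaySamples, double decay) {
        double[] out = input.clone();
        for (int n = delaySamples; n < out.length; n++) {
            // Feeding back from 'out' rather than 'input' makes each repeat progressively quieter
            out[n] += decay * out[n - delaySamples];
        }
        return out;
    }
}
```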
2.6 Digital Filters
The generally accepted definition of a filter within music production is ‘a circuit or software
algorithm which permits certain frequencies to pass easily while inhibiting or preventing
others’ [4], though a filter technically means any algorithm or circuit that changes an audio
signal in any way at all. Filters in the common sense are the most widely used effects by DJs. They
can quickly remove the bass and low frequency drums from a track, which can create an acapella
effect on tracks with vocals. Also, filtering out mid-range frequencies effectively mutes vocals, so that
the opposite effect can be achieved. These effects can also be used at the same time as two tracks are mixed
together to create more complicated combinations of the parts of the two tracks. Both of these are
popular tricks for DJs when they are mixing live and filters would be expected on an effects unit such
as I intend to produce for this reason. The different types of filters widely used are:
• High-pass filters – All frequencies below a given cut-off frequency are attenuated to zero,
and all frequencies above the cut-off frequency are unaltered.
• Low-pass filters – The opposite of a high-pass filter. The frequencies above the cut-off
frequency are attenuated to zero.
• Band-pass filters – All frequencies other than those in a specified bandwidth, or range of
frequencies, are attenuated to zero, while the selected bandwidth remains unaltered.
A classic-style filter needs to have some settings defined – the pass-bands, the stop-bands and the
gain characteristic of the pass/stop-bands. The pass-bands are the frequency ranges that are relatively
unchanged by the filter, the stop-bands are the frequency ranges that are attenuated, and the gain
characteristic is the ratio of the output level of each band relative to the input level, measured in
decibels. There are various types of established filters [5], such as the Butterworth filter, which are
produced using different algorithms. These filters need to have the settings mentioned defined, but
what characterizes them is the frequency response in the transition band of the filter and the phase of
each frequency, which varies for each filter. The transition band is the frequency range in which the
response changes from the pass-band to the stop-band. The amplitude responses of some of these
filters are shown in figure 2.7.
A digital signal in the time domain would need to be converted to the frequency domain to have
any of these filters applied to them, which would mean that the signal would have to be processed by
a FFT, then the desired filter algorithm, then an inverse FFT to convert the signal back to time
domain information for playback.
In order to remove specific frequencies from an audio signal, I am going to consider using just a
fast Fourier transform to change the data from time domain to frequency domain so that the required
frequencies can be removed without the need of a filter function. The processed signal can then be
converted back to the time domain by an inverse FFT for playback. This would reduce the latency of
the system, but would also remove any character associated with any filters that I could use.
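A hedged sketch of this idea (illustrative code, not the project's implementation) is shown below: after a forward FFT, the bins corresponding to the unwanted frequency range are set to zero, and an inverse FFT would then return the signal to the time domain. Bin k is assumed to correspond to a frequency of k × sampleRate / n.

```java
// Illustrative sketch: remove a band of frequencies from a spectrum produced by an FFT.
// 're' and 'im' hold the forward FFT of one block of n samples (n a power of two).
public class SpectralFilterSketch {
    static void removeBand(double[] re, double[] im, double sampleRate,
                           double lowHz, double highHz) {
        int n = re.length;
        for (int k = 1; k <= n / 2; k++) {
            double freq = k * sampleRate / n;    // frequency represented by bin k
            if (freq >= lowHz && freq <= highHz) {
                re[k] = 0.0;      im[k] = 0.0;        // zero the positive-frequency bin
                re[n - k] = 0.0;  im[n - k] = 0.0;    // and its mirrored negative-frequency bin
            }
        }
        // An inverse FFT of re/im would then give the filtered time-domain samples.
    }
}
```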
Figure 2.7 – The frequency responses of various recognized filters, plotted as gain response against normalised frequency f/f3c (taken from [17], p. 87)
2.7 Latency
Latency is the most difficult problem to overcome with real-time processing on a computer. A
computer will always take a finite amount of time to process each instruction it is required to carry
out, and in the case of processing a live signal, this results in a delay between the instruction being
received and the instruction being carried out, often referred to as latency. In order to keep the
latency time as small as possible, it is necessary to keep all routines in the program as simple as
possible. Otherwise usability will be greatly reduced, as the delay between a button being pressed and
the associated effect being heard will be too long to be useful to a DJ.
2.8 Assessment of DSP Effects
My intention for this project is that the program will take an input signal from my mixer via the
line in on my computer’s sound card, process the signal and output it to an amplifier and speakers.
DJs typically have two outputs from their mixers – a master output that is routed to the main sound
system of a nightclub, and a monitor or booth output that goes to a separate sound system located next
to the DJ. This is so the DJ has a clear sound source nearby to monitor output levels and mixes. As
shown in figure 2.8, my application would process sound coming from the master output of a mixer,
then send it to the main sound system.

Figure 2.8 – The application will process the main audio signal before it reaches the sound system. (The diagram shows input audio from turntables/CD players entering the mixer in the DJ booth; the monitor speakers output unprocessed audio so the DJ can check levels and mixes; the computer running the application processes the audio signal, with headphones used to monitor the audio after effects processing; and the processed audio is output to the main nightclub sound system with a latency delay due to the processing.)

The latency caused by the application would therefore become a
minor problem, as the DJ can use the monitor speakers for levels and mixes, and a separate set of
headphones coming from the output of the computer to monitor the processed sound.
The most used effects by DJs are filters, as mentioned, so a low-pass and a high-pass filter are the
first choice effects to include in my application. Once these have been implemented, I will look at the
other possible effects.
3. Software Engineering, HCI & Usability
3.1 Software Design Techniques
The following software management approaches are all currently established techniques for software
development and problem solving (methods 1–3 and 5 are described in [6], method 4 in [7]), though not all of them place a high priority on usability:
1. Formal system development
As the name suggests, this is the most formal methodology in this list. The user
requirements are collected, and then form the basis for the whole development process.
From the requirements, a strict mathematical model is constructed to represent all
possible interactions in the system with a language such as Z. This model is refined by
transformations to include higher and higher levels of detail until a precise representation
of the system is created. Once the representation is complete it can be converted into
computer code.
2. The waterfall model
This is probably the most widely accepted methodology for software engineering. The
stages of development are split into steps, such as shown in figure 3.1. Each step is a
contained part of the software development and is completed fully before the next stage
of development is undertaken.

Figure 3.1 – Classic Waterfall model for software development

At the end of each stage, an assessment is made
as to whether changes need to be made at any earlier stage of development before the
next step can be undertaken. If changes are required, the development goes back to the
stage where the project needs changing and restarts from this point of the waterfall,
making the same assessments at the end of each subsequent step. This continues for the
entire lifecycle of the product.
3. Reuse-orientated development
This methodology lends itself to an object-oriented style of computer programming, as
requirements of the system are referenced against an existing library of software objects.
The development team would take modules from the library necessary for the program,
and any new features that didn’t exist in the library would be created by the team. The
new objects would be written in a style consistent with the rest of the library, and added
to the library for others to use. The library would be similar to a huge virtual lego kit,
with developers able to construct large programs from pieces of code in the library.
4. Scenario-based development
A scenario is a broader version of the user requirements that are defined in the waterfall
model. User requirements are only concerned with what a user might want the program
to do when he/she uses it. Scenarios also encompass things such as the possibility of
upgrading software, maintenance issues and compatibility with other programs that could
be used in conjunction with the program being developed. Each party concerned with the
project, such as the end user, a network administrator, or a software engineer, has to say
which specific quality of the project they are interested in. The scenarios are created
from these needs, and a scenario mapping is created which demonstrates how the new
scenarios will interact. It also shows how they would fit into an existing system, if there
is one.
5. Prototyping development
Prototyping is also sometimes described as evolutionary development and there are two
different ways of taking a prototyping approach to software engineering – exploratory
development and throw-away prototyping. Both involve a higher level of interaction
with the end user during development than any of the other techniques. In exploratory
development the user is involved directly with developing each bit of the project as it is
required so that the problem is understood properly and that it is implemented correctly
from the users’ point of view. Throw-away prototyping concentrates on trying to develop
the least understood parts of the problem, then consulting the user about how well they
have been implemented. If the solution is inadequate, it is analyzed again to produce a
more suitable solution.
6. Version-based development
An updated piece of software is often referred to as a new version i.e. version 2.5. The
updated software might not have been planned, but customer requirements might have
made a new release with updated functionality necessary. Version-based development
starts with this situation in mind, but applies it to the requirements which are known as
well. Development is done in stages with an updated release at the end of each stage. A
high level of system integrity is maintained right from the start because of this, so further
updates can be introduced as required more easily and more quickly than with other
methods I have described here.
3.2 Comparison of Design Techniques
The table in figure 3.2 contrasts the six different approaches to software development that I have
described above. I have ranked them against each other in three different categories that I consider
important to software design and development.
Technique            | Speed of development | Ease of introducing extra user requirements | Ease of keeping to schedule
Formal system        | 6                    | 6                                           | 1
The waterfall model  | 5                    | 5                                           | 2
Reuse-orientated     | 4                    | 3                                           | 4
Scenario-based       | 3                    | 4                                           | 5
Prototyping          | 1                    | 2                                           | 6
Version-based        | 2                    | 1                                           | 3
Figure 3.2 – A comparison of the pros and cons of some software design techniques
Though this comparison is not a quantitative analysis, it shows the areas where each technique is
strong. I have ranked version based development (VBD) as the best technique for software
development according to my criteria. Prototyping is very close to VBD, but suffers from the
difficulties that would occur due to not having a well defined initial schedule. Both are flexible when
new user requirements are introduced and produce solutions quicker than the other methods I have
looked at. This would be an advantage if software usability was considered from the start of software
development.
In the past, software engineering has tended to leave the usability testing towards the end of the
development process. This is noticeably apparent in the popular waterfall model, where system
testing is the last but one stage of the development waterfall. This is even more the case in formal
system development, where there is no user consultation about usability other than at the initial stages
of development and when the final software has been produced. There is a greater tendency to give
much more emphasis to the usability of a piece of software nowadays. After all, if the user can’t use
it then the software has failed at the most basic level.
3.3 Case Study – Design of Windows 95 GUI
A case study of the Windows 95 user interface demonstrates how a team of twelve people from
various disciplines, with varying experience in user interface design, were given a deadline of 18 months to
produce a new user interface for the update of Windows 3.1. The aim of the project given to the
development team was:
“1. Make Windows easier to learn for people just getting started with computers and
Windows;
2. Make Windows easier to use for people who already use computers-both the typical
Windows 3.1 user and the advanced, or ‘power’, Windows 3.1 user” [8]
It was decided that the traditional waterfall method of development would not be an appropriate
method of development for the project, as usability testing happens towards the end of the design
process. For this project usability was the most important factor, so it needed to be considered from
the start of the project. This led to the decision that an iterative, or evolutionary, design approach
should be taken. The team divided the process into three phases: exploration, rapid prototyping, and
fine-tuning. They followed the design process shown in figure 3.3.
Figure 3.3 – Windows 95 development model
The rapid iteration process allowed the development team to quickly adjust to problems that they
identified. However, they found that the speed with which they were doing rapid iteration phases
meant that the original design specification, which represented hundreds of person-hours, became
redundant due to the results of the initial lab tests. As a result, the team discarded the design
specification and decided to let the prototypes that were created be the specification of the system.
The team found that this way of combining the working prototype and the specification “…invites
richer feedback, because the reviewer has to imagine less about how the system would work.” In
other words, for less work the team could carry on prototyping at a rapid rate while producing a much
better picture of how the project was progressing. The specification was
always up to date, but it was also changing all the time. As a result, all of the team had to keep up with every
evolution of the project through regular meetings and people outside of the development team had to
be informed of the changes to the system in the same way. It also meant that there was no absolute
model that was being worked to, and therefore no definite idea of scheduling for the project.
This case study is a good example of how to approach a software project where usability is
the primary concern, and find a solution in a limited time. It wasn’t necessary for the Win95
team to build a whole piece of software from scratch, as the functionality of Windows was
basically the same as that of Windows 3.1.
The team were concerned that the end product should be equally easy to use for users unfamiliar
with computers, for users familiar with computers and Windows 3.1, and for Windows 3.1 “power”
users, who had experience of using Windows 3.1 and of setting up computers for various
different tasks. Because of this, a large number of subjects were used by the Win95 team to
conduct the usability tests needed (560 to be precise [8]).
However, a general consensus seems to exist now among people involved with usability
testing that testing five or six people produces the results necessary to indicate the success of the
test in question. This is because usability testing is not an exact numerical science, and its aims
are to get a common opinion on how the application in question should work. Extra subjects
don’t tend to raise useful new problems, but merely replicate and reinforce the results produced by the
initial few.
3.4 The User Model and Metaphors
Everyone who uses a piece of software has a ‘user model’ [9, 10] for that piece of software. A
user model is a user’s particular expectation of how that software will behave when they perform
each action on it. This is true for everyone using any piece of software, or in fact anything else, even
something as simple as a fork or pencil. Figure 3.4 shows the appearance of Nullsoft’s WinAmp audio player,
which is a good example of usability in an application. It demonstrates many of the techniques for creating a
GUI which is easy to understand. The user model for WinAmp is generally that which is intended by
Nullsoft. WinAmp is used to play a variety of different audio file types and CDs from a computer’s
CD-ROM drive, with additional functions such as volume control, stop and pause. Its appearance is
similar to the fascia of a real hi-fi system, so a new user would expect WinAmp to behave as intended.
One of the methods of making a GUI more intuitive is metaphors – buttons and other controls that have a real-world association. Because of WinAmp’s appearance, most users wouldn’t have much difficulty in working out how to play a song using WinAmp.
3.5 Affordances
WinAmp also demonstrates affordances – “Well-designed objects make it clear how they work
just by looking at them. Some doors have big metal plates at arm level. The only thing you can do to
a metal plate is push it. … The plate affords pushing. It makes you want to push it.” [9] The
buttons and sliders in WinAmp are affordances. They look like they should do something if they are
pressed.
Figure 3.4 – WinAmp’s appearance is much the same as a typical hi-fi system
Two conflicting opinions on metaphors are proposed in [9] and [10]. [10] says “The idea that
good user interface design relies on metaphors is one of the most insidious of the many myths that
permeate the software community.”, whereas [9] says that for a new piece of software, if ”the user
model is incomplete or wrong, the program can use affordances or metaphors to show the user
model.” I am more inclined to believe the opinion held in [9], and also believe that affordances and
metaphors can be used to enhance the user model further, such as on the buttons on the WinAmp
GUI. The buttons don’t simply say play, but have the play symbol used on all electronic equipment,
and appear three dimensional as well.
3.6 Skinnable User Interfaces
‘Skins’ are an increasingly popular way of being able to personalize applications. The icons
associated with each button, menu, slider, etc. in an application can be changed according to a user’s
personal tastes, either downloaded from the internet, or created from scratch. Again, WinAmp is a
prime example of this (it was one of the first applications to introduce the concept of skins), and a few
examples of WinAmp skins are provided in appendix C. Skins don’t add any extra functionality and
don’t change the basic layout of the application, but they give the user a level of customization, as the
generic appearance can be replaced with something more individual. This is an
effective way of improving human-computer interaction because of the added emphasis at a personal
level.
3.7 Usability Assessment of My Application
In order to assess the usability and functionality of my application, I will get five people to have a
go at using the application with as little instruction from me as possible, and then get each person to
answer the following questions:
• Do you understand the purpose of the application from its appearance?
• Is the GUI interesting?
• Do you understand what all of the functions do, and if not, what features are confusing and why?
• Do you find the application useful?
• Do you find that the application has a feature you haven’t used before?
• Do you have any suggestions to improve the application?
The questions are open ended so that comments can be made about particular aspects of the
application.
My intention is to get one occasional computer user, one beginner DJ, two semi-professional DJs
and a music producer to do the usability tests. An occasional computer user should give opinions
based primarily on the GUI rather than the precise functionality, as I will pick him or her also on the
basis of having no music production or DJ experience. The other four testers will provide a range of
opinions based on the practicality and usability of the application as a DJ tool.
4 Digital Sound Processing With Programming Languages
4.1 Comparison of Programming Languages
“The programming language C has become a predominant force in engineering and computer
science in the past 5 to 10 years…Other languages are available, and could be used for filter design,
but C provides the best combination of efficiency and effectiveness.” [7]
This quote shows the wide acceptance of C for use in filter design in particular, but is also true for
DSP programming in general, and C++, a direct descendant of C [11], is now the language of choice
for much DSP programming. However, I have found that Java is the most relevant language for
producing a solution to the problem I am investigating as explained in this chapter.
Java is the language that I have the most experience in, but it is also the language that most readily
lends itself to producing suitable solutions to every aspect of my project. I also have experience in
using C++, though not as much as Java. As stated above, it would be the first choice for DSP, as it
has a large number of libraries already available for signal processing that could be referenced. An
example is The Sound Object Library, an object oriented audio processing library that has the benefit
of cross-platform compatibility. Though this library would have advantages for use in a project such
as this with regards to the digital sound processing aspect, C++ requires programming at a more
abstract level, and doesn’t provide the same support for GUIs as Java. Java also has extensive online
support and documentation for all the APIs it implements [12, 13], and is similar enough to C++ to allow
code to be adapted between the two languages. Visual Basic is a language that is popular for
developing applications, but as I lack any experience with it, I decided against it as a possible
language quite quickly.
CSound is a language I found that is used with MIDI to produce synthesisers and process MIDI
files, and which also provides methods for processing sounds. It is a language in its own right and has
wide support, but it does not have any support for processing sounds from a line source or other real-time audio stream, and it had to be ruled out for use in this project because of this.
4.2 Producing GUIs with Java
There are two libraries within the Java Development Kit version 1.3 that can be used to create
GUIs – the Abstract Window Toolkit (AWT) and the Swing Application Programming Interface
(API). Swing is a newer, more advanced GUI development toolkit than AWT introduced in JDK 1.2.
This seems to make it an obvious choice over AWT, but AWT has some important advantages over
Swing. Most importantly, AWT is present in versions of the JDK older than 1.2, which means it is more
likely to be present on computers because, in general, updates aren’t always downloaded once they
have been released. AWT can be implemented in a ‘lightweight’ form as well. This means that AWT
components don’t require the use of as much of a system’s resources when running compared to
Swing components.
4.3 Java and Skins
“Why should you use skins? … [you] cannot guarantee that your user base already has JDK 1.2
installed … and there is significant overhead with downloading Swing. The same argument applies if
you write a downloadable application, or one which the application size may be an issue.” [14]
‘Skinnable’ GUIs can be implemented with AWT, which is a valid argument for using AWT over
Swing, provided that skins are the approach to be used for implementing the GUI. Otherwise, Swing
provides classes for far more of the elements that might be necessary in an application. Also, the quote is
taken from an article written almost two years ago, in which time Java has evolved from JDK 1.2 to JDK
1.4 and has become more widely used. It should also be noted that Swing is an older API than either
Java media framework or Java Sound, the Java APIs used for audio processing. Therefore Swing will
automatically be present in any JDK version used to develop an application that primarily processes
sound.
It is possible to use one of Swing’s methods, javax.swing.JButton.setIcon, to link specific images
to buttons created with Swing in much the same way as AWT’s approach to skins. However, this
method isn’t implemented with all of the Swing components, and the lightweight advantage of using
AWT is lost, limiting its usefulness. The appearance of all Swing components can be changed using
Pluggable Look And Feel (PLAF). JDK has an API for creating a new PLAF for an application, but
the documentation for using it is all but non-existent. Ultimately, skins provide a way of introducing
some creative HCI provided users go to the trouble of making or downloading skins, but essentially
don’t add anything to the functionality of an application, as stated in section 3.6.
4.4 Java Swing
The extra components provided by Swing and the need for me to use JDK 1.4 for either of the
sound APIs mean that I considered Swing to be the more appropriate Java API for my project.
A Swing component is any element of the GUI, such as buttons, sliders, or labels. Swing
constructs the GUI by creating all components inside a container, and positioning them according to
the selected layout manager. The possible layout managers are all predefined. The most flexible is
the GridBag layout manager, which splits the window into a number of areas that can be of different
sizes. A component can then be positioned in each area. An example of the GridBag layout manager
is shown in figure 4.1. Each interactive component of the GUI has an ActionListener and an
actionPerformed method associated with it. The ActionListener detects the component being used
and runs the associated actionPerformed method, which in turn calls any methods that are required to
be run when that particular action is performed.
Figure 4.1 – An example of the GridBag layout manager, showing the window split into nine areas
and containing five components. This also shows a different PLAF to the basic Windows one.
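The sketch below (illustrative only; the component names and layout are assumptions, not the project's GUI) shows the pattern described above: components placed in a container with a GridBagLayout, and a button whose ActionListener's actionPerformed method runs when it is pressed.

```java
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

// Illustrative Swing sketch: a GridBagLayout container, a slider and a button with an ActionListener.
public class GridBagSketch {
    public static void main(String[] args) {
        JFrame frame = new JFrame("GridBag sketch");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        JPanel panel = new JPanel(new GridBagLayout());
        GridBagConstraints c = new GridBagConstraints();

        final JSlider level = new JSlider(0, 10, 5);         // e.g. an effect level control
        c.gridx = 0; c.gridy = 0; c.fill = GridBagConstraints.HORIZONTAL;
        panel.add(level, c);

        JButton process = new JButton("Process");
        c.gridx = 0; c.gridy = 1; c.fill = GridBagConstraints.NONE;
        process.addActionListener(new ActionListener() {
            // actionPerformed is called by the ActionListener when the button is used
            public void actionPerformed(ActionEvent e) {
                System.out.println("Process pressed, level = " + level.getValue());
            }
        });
        panel.add(process, c);

        frame.getContentPane().add(panel);
        frame.pack();
        frame.setVisible(true);
    }
}
```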
4.5 Sound Processing With Java
As mentioned above, the JDK has two libraries that can be used to edit and play sound within Java
applications – Java Media Framework (JMF) and Java Sound (JS). Unlike Swing and AWT, these
are entirely different libraries that are intended for different uses. The purpose of JMF is to enable
control and manipulation of audiovisual media over the internet or in applications. Part of this would
be the implementation of sound, so this API appeared to have the potential to provide the necessary
controls over audio streams I require for my project. However, the methods in JMF are aimed at
developers who need to play and synchronize files of different types, i.e. an MPEG movie file and an
AIFF audio file. The methods contained in JS enable much more control over audio streams than those
in JMF.
4.6 Java Sound API and Class Interaction
JS is broken into two sub-packages: sound.sampled and sound.midi. The midi sub-package
contains methods and classes that are used to produce applets and applications capable of midi
compatible sound manipulation and synthesis, which is not as useful for real-time processing as the
sound.sampled sub-package. The sampled package contains a wide variety of classes and methods
capable of capturing, playing and manipulating audio streams in a large number of formats.
NB All references to JS from here refer to the sound.sampled sub-package of Java Sound.
The classes in JS are linked in much the same way as the various pieces of equipment in a
recording studio. Figure 4.2 illustrates how the various classes for audio input and output interact.
Figure 4.2 – How audio input and output is routed through a Mixer object in JS
A Mixer object is used to control audio streams, combine them from input for processing and
combine them after processing for output. Port objects are used to control the audio input and output
hardware in a computer, such as a CD player, a line in or a pair of headphones. Ports are an
implementation of the JS interface Line, as are SourceDataLines, TargetDataLines and Clips. The
Line interface has an abstract class called Control associated with it that is extended by various
subclasses. Control has methods that can be used for some simple audio processes on the line associated
with it, such as gain and pan, as long as the control is supported by the line, which in most cases it
would be.
4.8 JS and Audio Formats
JS supports a large number of audio formats and the format of a line must be specified when it is
opened from one of those supported. The format consists of:
• Data encoding: This is how the sound is represented as data. JS has four pre-defined encoding formats – PCM (pulse-code modulation) signed, PCM unsigned, mu-law and a-law. PCM encoding is a linear encoding method used for CDs, in which a string of values represents each part of a sound over time. Mu-law and a-law encoding are both non-linear methods that are more compressed, and therefore lower detail, than PCM encoding. They are typically used for telephony or speech recordings where the sound doesn’t need to be particularly detailed.
• Sample rate: The number of samples taken each second for each channel.
• Sample size in bits: Either 8- or 16-bit.
• Channels: The number of channels of the audio signal. A stereo signal has two channels, one for the left signal and the other for the right signal. A mono signal only has one channel.
• Frame size (in bytes): All of the data for all channels at a particular time is represented by each frame. The size of each frame depends on the encoding, as each encoding represents an audio signal in different ways. For PCM encoded data, the frame size in bytes is the number of channels multiplied by the sample size in bits divided by 8, the number of bits in a byte.
• Frame rate: The frame rate is also different for each encoding, as the rate at which a frame can be processed will depend on its size. For PCM encoded sound, the frame rate is the same as the sample rate.
• Big-endian or little-endian: For 16-bit signals, each sample for each channel has two bytes, and these can be ordered either most significant byte first (big-endian) or least significant byte first (little-endian).
The formats supported by a computer are determined using JS’s AudioSystem class, which is the
highest member of the JS hierarchy. All of JS’s resources are obtained from this class, such as Lines
and Mixers. The formats supported by a Mixer are found from the AudioSystem, which in turn
determines which Lines can be implemented, and the possible formats of these Lines. Although a
Mixer is required for Lines to be implemented, an application doesn’t need to explicitly create a
Mixer. All Lines are initialised using the open() method, which specifies the format that the Line will
have, and can be started and stopped using the start() and stop() methods, provided an audio stream has
been loaded into the internal buffer of the Line first.
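A hedged sketch of these steps (illustrative code, not the project's classes) is shown below: an AudioFormat describing CD-quality signed PCM is defined, the AudioSystem is asked whether a TargetDataLine supporting it is available, and the line is then opened and started.

```java
import javax.sound.sampled.*;

// Illustrative sketch: obtain, open and start a TargetDataLine in CD-quality signed PCM format.
public class FormatSketch {
    public static void main(String[] args) throws LineUnavailableException {
        AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                44100f,    // sample rate
                16,        // sample size in bits
                2,         // channels (stereo)
                4,         // frame size in bytes (2 channels x 2 bytes)
                44100f,    // frame rate (equal to the sample rate for PCM)
                false);    // little-endian byte order

        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            System.out.println("This format is not supported on this system");
            return;
        }
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(format);   // the format is fixed when the line is opened
        line.start();
        // ... capture audio here (see sections 4.9 and 4.10) ...
        line.stop();
        line.close();
    }
}
```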
4.9 JS Audio Input
A problem with JS that wasn’t apparent when I decided to use JS was that none of the port classes
have been implemented by Sun Microsystems yet. Input Ports represent the input hardware in a
computer, such as a microphone, and are used to take a stream of data representing sound directly from
those ports into the application, to then be processed and output to speakers as necessary.
Obviously, this is an integral part of my project. TargetDataLines are an alternative way of capturing
an audio stream, but whereas the port class takes data directly from the relevant port, sound that is
being played by the computer is streamed into the internal buffer of a TargetDataLine, and the application
retrieves it using the line’s read() method. The size of the buffer is specified when the TargetDataLine is initialized, but has to be an
integral number of sample frames. For audio in CD format, where the frame size is four bytes, the buffer
size would have to be a multiple of four. The data that is in the internal buffer can then be written to
an array of bytes that can then be processed as desired. The amount of data that can be written
doesn’t have to be the same size as the internal buffer, and can be written to any part of the byte array.
The important difference between the Port class and the TargetDataLine class is that TargetDataLines
capture sound being played, and don’t interact with the various line-ins of a computer’s hardware at
all.
4.10 JS Audio Output
Other than output Ports, there are two ways to output an audio stream using JS: SourceDataLines
and Clips (Clips are not shown in figure 4.2, but have the same interaction with the mixer as a
SourceDataLine). SourceDataLines stream audio data for playback in the same way that
TargetDataLines capture audio data, and are more appropriate for handling audio in real-time because
of this. Both SourceDataLines and Clips are initialized with a buffer that can be varied in size like
that of a TargetDataLine. SourceDataLines load audio into the internal buffer from a byte array and
then play the audio segment once the array is completely loaded, whereas Clips have to be initialised
with a byte array and the audio format as well as the buffer size, or a pre-defined audio stream. The
information is loaded into a buffer that can then be accessed as required and played back
immediately. Clips have a much lower latency because the data is loaded before it needs to be played,
not when it needs to be played.
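A hedged sketch of how a TargetDataLine and a SourceDataLine might be combined for capture and playback (illustrative only; the block size and loop count are arbitrary, and both lines are assumed to be open, started and using the same format):

```java
import javax.sound.sampled.*;

// Illustrative sketch: read blocks of audio from a TargetDataLine and queue them on a
// SourceDataLine for playback. Any processing would happen between the read and the write.
public class PassThroughSketch {
    static void passThrough(TargetDataLine in, SourceDataLine out, int blockSize, int blocks) {
        byte[] block = new byte[blockSize];          // blockSize must be a whole number of frames
        for (int i = 0; i < blocks; i++) {
            int n = in.read(block, 0, block.length); // blocks until the data has been captured
            out.write(block, 0, n);                  // queue the (optionally processed) data
        }
        out.drain();                                 // wait for the queued audio to finish playing
    }
}
```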
5 Software Design and Implementation
5.1 Borland JBuilder 5.0
In order to create my application, I chose to use Borland’s JBuilder 5.0, pictured in figure 5.1. I
chose it because it is a powerful visual Java development program for Windows which has
many features to help create applications and applets from scratch, and because it is available free from
Borland’s website. It automatically comes with version 1.3 of JDK, which contains all of the APIs I
need for the project as discussed in chapter 4. It creates packages for applications that contain all of
the classes that are necessary for the application, shown in the project pane. The structure pane shows
all of the variables, methods, etc contained in the selected class and is useful for quickly finding a
specific item in a long piece of code.
Figure 5.1 – Borland’s JBuilder provides many useful features to help program development,
referenced from JBuilder help file
Also, JBuilder has basic templates for applications, projects and classes which automatically
generate the basic code necessary for the item concerned. With applications for example, it creates a
basic window frame with the option to include a generic header for classes, a menu bar, a toolbar and
an About dialog box. This is very useful when creating classes in particular, as the standard methods
that need to be included are generated automatically for you, and the required libraries, extensions and
implementations are included automatically. This saves the time that would otherwise be spent
repeatedly typing out the same code.
The main window in JBuilder, called the content pane, is where all source code is written for the
selected object, but can also be toggled to a design view which is used to construct GUIs using AWT
and Swing (and other APIs for internet development). A graphical drag-and-drop layout manager
allows a GUI to be constructed in front of your eyes, instead of planning the exact coordinates of the
layout on a piece of paper and writing code to produce it.
5.2 Software Design Approach
Version-based and evolutionary developments appear to be the most suitable methods to develop
the application out of those discussed in sections 3.1 and 3.2. Formal methods are not flexible
enough to allow for adequate usability design, it would take too long to specify the requirements in a
suitable language, and I don’t have a thorough enough knowledge of any formal methods to be able to
create the program using formal methods. The waterfall model and scenario-based development are
orientated heavily to large projects involving a lot of people who have to interact with each other to
produce and oversee the project, which isn’t appropriate for an undergraduate final year project.
There is also a lack of emphasis on usability and human-computer interaction issues. Reuse-orientated development is more of a concept than a practical approach to software engineering at the
moment.
Java is a heavily object orientated language, so there will automatically be a certain level of
object orientated style to the coding. This is another advantage of using Java, as I can split the
application into separate parts and then tackle each part one at a time. The different parts would be
the GUI and each different effect that is used in the application. It might also be possible to create a
generic effect class that would be extended by each particular effect that I chose to implement.
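A hedged sketch of that generic effect class idea (the class and method names here are my own illustrations, not names from the project's code):

```java
// Illustrative sketch of a generic effect base class that specific effects could extend.
public abstract class AudioEffect {
    private boolean running;

    /** Each concrete effect overrides this to alter one block of raw audio data in place. */
    public abstract void process(byte[] block, int length);

    public void start()        { running = true; }
    public void stop()         { running = false; }
    public boolean isRunning() { return running; }
}
```

An echo or filter effect would then extend this base class and supply its own process() algorithm.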
5.3 User Requirements
Both evolutionary and version based development need some initial user requirements defined,
though these don’t need to be as thorough as with other techniques such as the waterfall model. The
user requirements for the project are:
• Be able to understand the controls of the application.
• Be able to use different effects units on a real-time audio stream being played by the computer.
• Be able to play back processed sound with low latency.
• Be able to stop and start any effect instantly.
5.4 Application Development
I intended to produce the application according to these development stages:
1. Design and implement a simple GUI using Swing that has buttons for basic features.
2. Create code using JS to provide functionality for the basic features of the GUI.
3. Integrate the functionality with the GUI.
4. Test the application’s usability and get suggestions about further improvements to usability
and functionality.
5. Assess whether any of the suggestions made are possible and/or suitable for the application.
6. Make the possible improvements
7. Extend the GUI to include buttons to accommodate further possible functions and produce
code using JS to provide functionality for functions, and then go back to stage 4.
5.5 Application Evolution 1.0
The first evolution of the GUI, shown in figure 5.2, did not take long to create, and consisted of four
sliders and four buttons, mimicking the common controls available on a mixing desk of master level,
and treble, mid range and bass levels. The kill buttons for each EQ were intended to reduce the level
of the associated frequency range to zero instantly, in the same way that kill switches on DJ mixers
work, as described in section 1.2. These functions are similar to those that a DJ would have on a
mixer, but I chose to start with these as creating a three-band EQ is essentially three notch filters with
a frequency range that cannot be altered. The code for the three notch filters would be identical
except that each would remove a different frequency range. This code could then be used to create a
variable frequency-range filter at a later stage of the application development.

Figure 5.2 – The first evolution of the GUI for the application
5.6 Application Evolution 1.1
After the initial GUI had been created, I then looked at introducing some functionality for the
features. This was when I found out about the lack of implementation of the Port class, and had to
think of a way to alter my intentions for the application. As it was still possible to capture music
while it was playing with a TargetDataLine, the solution would have to involve adding an effect that
wouldn’t require changing the original sound signal, unlike my original plans to implement low pass,
high pass and band pass filters.
Echo/reverb effects, flangers or phasers (as explained in section 2.5) are all effects that add to the
sound signal, not take bits away like filters. Flangers need to be processed and played with a very
short delay from the sound source which wouldn’t be possible due to the latency of the application.
Phasers have even worse latency problems, as the processed signal needs to be played at the same
time as the original signal which is an impossible goal to try and achieve. This means that the only
feasible effect out of these three is an echo or reverb effect. Echo has plenty of potential for
variations, such as echo speed, echo decay and volume. Reverb would need to have preset delay and
decay values, so I decided to implement an echo effect first.
Due to the fact that there was no functionality for the GUI, there was no user assessment for
version 1.0 of the application.
A new GUI was designed to accommodate functionality for an echo effect, shown in figure 5.3.
Figure 5.3 – The first version of the GUI with an echo effect
The button labelled ‘Process’ is a toggle button, which is a button that has two states. For the
echo effect, these are ‘Process’ and ‘Stop Process’. The first time the button is pressed, the
application starts the echo effect. When it is pressed a second time, the application knows that the
user wants to stop the echo effect. The reason for making the ‘Process’ button a toggle button was to
make use of metaphors – as the button stays pressed when the effect is on, it looks more like a switch
on a mixing console. The process_actionPerformed() method checks which state the button is
currently in so that the correct method of the EchoSound class is called to either start or stop the
effect.
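A hedged sketch of this toggle behaviour (illustrative code; the calls into the echo class are shown only as comments because its real method names are not reproduced here):

```java
import java.awt.event.*;
import javax.swing.*;

// Illustrative sketch of a two-state 'Process' / 'Stop Process' toggle button.
public class ToggleSketch {
    public static void main(String[] args) {
        JFrame frame = new JFrame("Toggle sketch");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        final JToggleButton process = new JToggleButton("Process");
        process.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                if (process.isSelected()) {
                    process.setText("Stop Process");
                    // start the echo effect here
                } else {
                    process.setText("Process");
                    // stop the echo effect here
                }
            }
        });

        frame.getContentPane().add(process);
        frame.pack();
        frame.setVisible(true);
    }
}
```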
The echo level slider is used to change the volume of the echo effect and can be assigned ten
possible values. The delay time radio buttons let the user select the length of the echo that is applied
to the audio signal. The application can be closed either by the Exit command in the File menu, or by
the ‘X’ button in the top-right of the window.
5.7 The EchoEffect Class
The implementation of the echo effect creates a TargetDataLine object and captures an audio stream
with it. Once the TargetDataLine buffer is full, the audio stream is then passed to a SourceDataLine
object to be played back in order to keep processing time short. The delay is created by the time
taken to fill up the TargetDataLine’s buffer, put the data into a byte array and fill the buffer of the
SourceDataLine with the data in the Byte array. Assuming that the time taken to write a byte of data
to any of the buffers is negligible due to the high processing speed of my computer (~500 MHz), this
would lead me to believe that the time between the original signal and the delayed signal would be
given by the equation shown in figure 5.4.
Time delay (secs) = (byte array length / 4) × (1 / 44100)

Figure 5.4 – The delay of the echo effect should be given by this equation
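As a worked example (the array length is chosen purely for illustration), a byte array of 17,640 bytes holds 17640 / 4 = 4,410 samples, which gives a predicted delay of 4410 × (1 / 44100) = 0.1 seconds.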
The only writing to a buffer that takes time is that determined by the sampling rate of the TargetDataLine, which is 44100 Hz, or one sample every 1/44100th of a second. The four bytes of data per sample would be written to the byte array almost instantaneously, and once the audio is in the data buffer, it should be loaded into the SourceDataLine at a rate comparable with the processing speed of my computer, taking an insignificant amount of time. However, the delay time appears to be bigger by a factor of twelve. It is unclear from the information I have about Java why this happens, but it may be because Java processes the audio signal at a slower rate than I assumed above. This doesn't have any adverse effect on the application though, as the smallest buffers would otherwise produce a far shorter delay time than I would find useful in the application.
The decay of the echo is due to the echoed sound being part of the sound signal of the subsequent
audio stream and is reduced by the same factor each time it is repeated. This means that the echo has
a negative exponential rate of decay. The buffer size of the TargetDataLine cannot be changed once
the TargetDataLine has been opened, so during processing the delay time has to remain the same. In
order to change the level of the echo, the GUI has an ActionListener running on the gain slider. If it
detects a change in the value of the slider, it calls the setGain method of the EchoSound object it
contains. The method changes the Gain variable in an EchoSound object, which in turn changes the
gain value of the SourceDataLine for the next loop of the sound output in the EchoSound.
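To make the mechanism concrete, the following is a minimal sketch of the kind of capture-and-replay loop described above. It is not the application's actual code; the class and field names are illustrative, and it assumes the output mixer supports the MASTER_GAIN control.

import javax.sound.sampled.*;

class EchoLoopSketch {
    private volatile boolean running = true;   // cleared when the 'Process' button is released
    private volatile float echoGain = -10.0f;  // echo level in dB, set from the slider

    void run() throws LineUnavailableException {
        AudioFormat format = new AudioFormat(44100f, 8, 1, true, false);  // signed PCM, 8-bit mono
        TargetDataLine in = (TargetDataLine) AudioSystem.getLine(new DataLine.Info(TargetDataLine.class, format));
        SourceDataLine out = (SourceDataLine) AudioSystem.getLine(new DataLine.Info(SourceDataLine.class, format));
        in.open(format);
        out.open(format);
        in.start();
        out.start();

        byte[] buffer = new byte[8820];  // the array length fixes the delay time
        FloatControl gain = (FloatControl) out.getControl(FloatControl.Type.MASTER_GAIN); // assumes this control exists
        while (running) {
            int read = in.read(buffer, 0, buffer.length);  // blocks until the buffer is full
            gain.setValue(echoGain);                       // pick up any slider change each loop
            out.write(buffer, 0, read);                    // written back late, producing the echo
        }
        in.close();
        out.close();
    }
}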
Sample code for using JS available on the internet, and an email-based discussion list run by Sun Microsystems, were helpful in learning how to use JS efficiently. Initially I based the EchoSound class on the CapturePlayback class contained within the Java Sound demo available from the Sun website. That class is used to record an audio stream and store it, then produce a simple waveform and let the user play the sample back as desired. Its code transformed the audio stream into several different stream types so that it could be stored and analysed to produce the waveform, which isn't necessary for the delay effect I wanted to produce.
The format for the sound signal processed by the EchoEffect class is signed PCM encoding at 44100 Hz, 8-bit and mono. This doesn't take advantage of the most detailed format available to JS, which is CD quality as explained in section 2.2, but it reduces the processing time of the audio signal by a factor of four. The slightly lower quality of the audio was not raised as a concern in any of the usability tests.
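For reference, the two formats can be expressed with Java Sound's AudioFormat class as follows. This is a sketch rather than code taken from the application; the variable names are illustrative. One second of audio is 44100 bytes in the first format and 176400 bytes in the second, which is where the factor of four comes from.

import javax.sound.sampled.AudioFormat;

AudioFormat echoFormat = new AudioFormat(44100f, 8, 1, true, false);   // signed PCM, 8-bit mono: 1 byte per frame
AudioFormat cdFormat   = new AudioFormat(44100f, 16, 2, true, false);  // CD quality, 16-bit stereo: 4 bytes per frame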
5.8 Usability Tests 1
Due to other commitments, the music producer I had hoped to involve in the usability tests was unable to take part. No other person was available to take his place, so only four people took part in the tests. These still provided useful assessments of the application.
Figure 5.5 – The two possible controls for the delay time function: with radio buttons, and with a slider
By the time of the usability tests, I had realised that there were two alternatives for controlling the delay time in the application – one was to use the original radio buttons to select the echo speed, the other was to use another slider. Both are shown in figure 5.5. Usability testing showed that the slider was the more popular way to alter the delay speed.
The overall opinion of the GUI from the DJs was that it was simple to understand, but minimal in
appearance. One concern the occasional computer user mentioned was that at first he didn't understand that the button labelled 'Process' started the effect. He also mentioned that the scales of the two sliders didn't make much sense. The scale on the echo level slider goes from 0 decibels to +10 decibels, labelled relative to the default gain level at which JS outputs audio, which is much quieter than the source signal. So although the scale makes sense to me as the software developer, to a user it is meaningless. In contrast, he commented that the scale of the delay
time slider wasn’t detailed enough. This was only a minor complaint, and doesn’t need as much
attention as the other two labelling problems that he mentioned. The problem with the echo level
slider was also mentioned as a usability problem by one of the DJs and during my progress meeting,
so is the biggest usability concern at this stage. The process button is also misleading, and will also
be changed in the next evolution of the project.
As this was only the first evolution of the application I expected comments about the lack of
features, and the usability tests at this stage were primarily for suggestions to improve functionality in
the next evolution. The suggestions made during testing can be split into two parts: extra functionality
for the EchoEffect class and addition of different effects.
Extra functionality for the EchoEffect class suggested was to:
• have a filter included as part of the EchoEffect’s functionality, which would filter the echoed
audio. Specifically, an echo on a signal with a lot of bass sounds distorted and messy, and a
filter could be used to quieten or remove the bass part of the audio stream before it is played
back by the EchoEffect class.
• allow alteration of the echo speed during processing. Because the echo is caused by the buffers of the TargetDataLine and SourceDataLine, the echo speed cannot currently be altered while the effect is on. This could be changed in a later evolution.
• allow alteration of the decay time of the echoed audio. This is a standard feature on most
echo/delay effects units.
The most popular extra feature that was suggested was, frustratingly, filters of various kinds. As explained in sections 4.9 and 5.6, filters are not currently possible to implement with JS. A sampler was the only effect suggested by the testers that could feasibly be implemented with JS.
Another interesting feature suggested was a ‘beat counter’. A beat counter can determine the
tempo of a record by analysing the low frequencies of the audio signal for the bass drum beats per
minute (BPMs). They are normally featured on mixers in pairs, one on each channel, so that the
tempo of two records can be matched without listening to them. This feature would be possible to
implement in the application using a fast Fourier transform to convert the audio stream to spectral
information. Latency wouldn’t be an issue as there is no immediate need for the results to be
calculated. Once the BPM was calculated, it could be used to alter the echo speed to be a factor of the
BPM so that it was in time with the audio signal. It couldn’t be used to determine the speeds of two
records however, as the audio input comes from the mixer, not two separate turntables. It would be
possible to synchronize MIDI signals to the tempo of records being played using a beat counter in the
application, but this would be another project in itself.
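As an indication of how such a beat counter might work, the following is a deliberately simplified sketch that estimates tempo by counting peaks in the signal's energy envelope, rather than performing the full FFT analysis described above; every name in it is illustrative and it assumes signed 8-bit mono PCM input.

import java.util.ArrayList;
import java.util.List;

static double estimateBpm(byte[] audio, float sampleRate) {
    int window = 1024;                        // ~23 ms analysis windows at 44.1 kHz
    int frames = audio.length / window;
    if (frames < 3) return 0;                 // not enough audio to analyse
    double[] energy = new double[frames];
    double mean = 0;
    for (int i = 0; i < frames; i++) {
        double sum = 0;
        for (int j = 0; j < window; j++) {
            double s = audio[i * window + j];
            sum += s * s;                     // energy of this window
        }
        energy[i] = sum / window;
        mean += energy[i];
    }
    mean /= frames;
    List<Integer> peaks = new ArrayList<Integer>();
    for (int i = 1; i < frames - 1; i++) {
        boolean loud = energy[i] > 1.5 * mean;
        boolean localMax = energy[i] > energy[i - 1] && energy[i] >= energy[i + 1];
        boolean farEnough = peaks.isEmpty() || i - peaks.get(peaks.size() - 1) > 10; // > ~0.23 s apart
        if (loud && localMax && farEnough) {
            peaks.add(i);                     // treat this window as a beat
        }
    }
    if (peaks.size() < 2) return 0;           // not enough beats detected
    double windowsPerBeat = (double) (peaks.get(peaks.size() - 1) - peaks.get(0)) / (peaks.size() - 1);
    double secondsPerBeat = windowsPerBeat * window / sampleRate;
    return 60.0 / secondsPerBeat;             // beats per minute
}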
5.9 Application Evolution 2.0
The tasks that need to be done for this evolution are:
• The labels for the process button and the echo level slider need to be altered.
• The functionality of the echo speed slider needs to be increased to allow the user to alter the
echo speed while the effect is on.
• A sampler feature needs to be added.
Introducing the sampler to the application meant that the GUI had to be expanded to
accommodate the extra functionality. There were two possible ways of doing this. One alternative was to open a new window containing all of the functions related to the sampler, while the echo controls remained in the original window. This would give more of an impression of two separate effects 'units', which is what they are. After trying both alternatives out, it became apparent that the best way was simply to extend the existing window. If
two windows are opened in the same application, the one that is drawn second is not independent of
the first. If the first window is closed or minimized, then both the first and the second window have
the action carried out on them. However, if the second window has the same action performed on it,
it is only the second window that is affected, not both windows. This is contrary to the fact that the
two windows should represent two independent effects. This master window and slave window
situation could be useful if the application had a main mixer that controlled the overall output from
various effects units. However, as the only controls I have are those of the effects, this would be a
pointless extra window for the application’s current functionality.
A simpler and, from a user’s point of view, less confusing way of including the sampler in the
application would be to extend the existing window, and then the necessary functions could be added
either underneath or alongside the ones for the echo effect. I chose to add the sampler functions to
the same window, underneath the existing buttons before any usability testing as it seemed like an
obvious choice. Also, in a music studio effects units are stacked on top of each other in racks, so
mimicking this kind of setup in the application takes advantage of usability metaphors.
5.10 The SampleSound Class
Creating the functionality for the sampler to capture sound was done in exactly the same way as for the EchoSound class, with a TargetDataLine. A Clip was more appropriate than a SourceDataLine for playing back the sample though, as once the sample was captured it didn't need to change at all, and a Clip gives quicker playback of the sample when it is needed.
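A minimal sketch of how a captured byte array might be loaded into a Clip for immediate playback is shown below. It is not the application's code: 'format' and 'sampleData' stand for the capture format and the bytes read from the TargetDataLine, and exception handling is omitted.

import javax.sound.sampled.*;

DataLine.Info info = new DataLine.Info(Clip.class, format);
Clip clip = (Clip) AudioSystem.getLine(info);
clip.open(format, sampleData, 0, sampleData.length);  // the whole sample is loaded up front
clip.setFramePosition(0);                             // rewind before playback
clip.start();                                         // plays back with almost no delay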
The SampleSound class created for the application contains two inner classes, CaptureSample and PlaySample. The functions they perform are self-explanatory. They are implemented inside the SampleSound class rather than as separate classes in their own right because they need to reference the same audio stream. The audio stream could be passed to each of them as a variable when they are constructed, but this would use a lot of system resources and cause the application to run more slowly than implementing the classes as part of SampleSound.
In order for the SampleSound to be able to play the previous sample while capturing a new one,
the CaptureSample class creates another instance of the audio stream after it captures a sample which
is handled by the PlaySample class. The original audio stream can be written over without a conflict
from the PlaySample class trying to access it at the same time. The sample length can be a maximum
of 23.2 seconds, defined by the buffer size of the Clip at 409600 bytes.
While implementing the sampler I encountered another problem with JS – the Clip doesn't close as it should, with no apparent solution to the problem. After a brief talk on the Java Sound discussion list (i), I found out that Clips and SourceDataLines use the same format as CDs for output instead of the format specified when they are opened. As TargetDataLines can only be opened in the same format as a Clip or SourceDataLine that is already open, this means no new TargetDataLines could be opened after the play function of the sampler was used if I continued using the audio format I had chosen. In order to overcome this problem, I needed to change the format from 8-bit mono to 16-bit stereo. The more complicated format has to be used at the expense of some of the operational speed of the program.
Due to a lack of implementation of the TargetDataLine class by Sun, only one TargetDataLine
can be open at any one time in an application. As the sample function and the echo effect each need to open a separate TargetDataLine, the sampler cannot be used to capture an audio clip
that is being processed by the EchoEffect class. This is a frustrating problem, and one highlighted by
the usability tests for this version of the application. A sampled sound can be played at the same time
as the EchoEffect is used though, as the play and reverse functions only use Clips which don’t have
the same lack of implementation.
(i) All articles from [17] titled “Using Clips with other data lines”
5.11 Application Evolution 2.1
After the basic sampler was working correctly, I looked at extending its functionality by
including a class that would reverse the captured audio stream, and load it into a separate stream that
could also be played back. Though it wasn’t mentioned during usability testing, it is a feature that a
lot of DJs would be interested in having on a sampler, but one that isn’t present on any samplers that I
have used.
5.12 The ReverseSample Class
Figure 5.6 – How an audio stream would be rearranged to play in reverse (bytes 1–4 of the original stream end up at positions n-3 to n, bytes 5–8 at positions n-7 to n-4, and so on, so that the four bytes of each sample stay together in their original order)
As explained in Chapter 1, an audio stream in CD format has four bytes of information for each time-frame sample, rather than the single byte that makes up an 8-bit mono signal. This presents extra complications when manipulating the sound for additional processes that could be included with the sampler, such as reversing the audio stream.
Figure 5.7 – Algorithm for reversing a 16-bit stereo audio stream
The principle of reversing a digital audio stream is a simple one:
just reverse the byte stream of the audio needed. With a 16-bit stereo signal this process becomes a
bit more involved, as the 4 bytes per sample need to be kept together in the correct order when the
signal is reversed. Figure 5.6 shows where each of the bytes of a 16-bit stereo audio stream are
initially, and where they end up after being reversed.
The code for the algorithm I used in my application to restructure the audio stream for reverse play is shown in figure 5.7. audioData is the array of bytes that defines the original audio stream. Once the data has been reversed, an audio stream (reverseAudioStream) is constructed from the reverseData array, which is then used by the play method for playback in the Clip.
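A sketch of this frame-by-frame reversal is given below, using the audioData and reverseData names from the text; the loop itself is a reconstruction of the approach described, not the original listing.

// Assumes audioData.length is a multiple of four (one 16-bit stereo frame = 4 bytes).
byte[] reverseData = new byte[audioData.length];
for (int i = 0; i < audioData.length; i += 4) {
    int dest = audioData.length - 4 - i;      // last frame goes first, first frame goes last
    reverseData[dest]     = audioData[i];     // the four bytes within each frame
    reverseData[dest + 1] = audioData[i + 1]; // keep their original order
    reverseData[dest + 2] = audioData[i + 2];
    reverseData[dest + 3] = audioData[i + 3];
}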
I gave both the play and reverse play functions their own stop buttons to increase versatility.
Initially there was a single stop button that stopped both functions if they were playing. This
evolution of the application is shown in figure 5.8.
Figure 5.8 – Evolution 2.1 of the application
5.13 Usability Tests 2
As this evolution of the application had introduced a considerable amount of functionality, I
decided to carry out the next round of usability tests to discover any problems with the SampleSound
class. Then problems with both EchoSound and SampleSound could be resolved in the next
evolution.
The appearance of the GUI had remained the same as the previous evolution, so the comments
were much the same as before. The occasional computer user didn't realise at first that there were two separate effects in this version, and suggested that something to split the two effects units up would make the GUI clearer. This was also a minor concern of one of the DJs, though he understood that the two were separate effects. A common misunderstanding with the GUI was that the echo level slider controlled the level of the sampler as well as the level of the echo effect; this is a usability problem that would be reduced by separating the two effects units in a more noticeable way. Other
comments from the DJs were that the GUI was clear, but was still minimal and would benefit from
some original graphics for the functions.
As mentioned earlier, a lack of implementation prevented the use of the echo effect at the same
time as a sample is being captured, and vice versa. This was a problem all testers mentioned.
However, the only way to resolve it would be to re-code both the EchoSound class and the SampleSound class; with hindsight, the problem would have been easy to avoid if I had taken a more object-oriented approach with the coding style.
The sample duration along the bottom of the GUI wasn’t implemented in this evolution, but will
be included in the next evolution.
As I expected, the DJs were impressed with the reverse sample function, but felt that the delay
between pressing the play button and the sample beginning was slightly too long. An interesting
point raised by the occasional computer user was that when he stopped either a normal sample or a
reverse sample, he expected them to be resumed from the place that he stopped them at. It would
appear that this particular usability problem highlights differences in user models for applications, as
none of the DJs mentioned this. As such, it isn’t a problem that needs to be addressed, as the intended
audience wouldn’t appear to have difficulties with the functionality of the play and reverse play
buttons. The comment by the occasional computer user did imply the possibility of including a
separate pause function for the sampler though.
Most samplers have the ability to put a sample on continuous loop, and perhaps because of this, it is a function that the DJ testers thought would be a useful addition.
5.14 Increasing The Usability of The Application
I still hadn’t addressed the usability problems of the EchoEffect class from the first usability tests.
The labels were changed without difficulty, as Java has methods to set the labels of any Swing
components to a desired string. ‘Process’ was changed to ‘Start Echo’, ‘Stop Process’ was changed
to ’Stop Echo’ and the echo level slider labels were changed to ‘High’ and ‘Low’.
Changing the delay speed function so that it could be altered during playback meant that the way
that the echo was created needed to be changed. Instead of changing the internal buffers of the
TargetDataLine when it was opened, I created a separate byte array, dataIn, which could be altered in
size while the echo effect was operating. Both the TargetDataLine and the SourceDataLine are
opened with buffers large enough to accommodate the longest possible echo. When the EchoEffect is
switched on, the TargetDataLine writes an amount of data to the byte array equal to the length of the
byte array, and it is then converted to an audio signal by the SourceDataLine. The delay is produced
by the TargetDataLine writing to the array and then the SourceDataLine loading the array into its
buffer, which is essentially the same as before, but the amount of audio captured in each loop is
restricted by the byte array, which can be changed in size dynamically, not the static TargetDataLine
and SourceDataLine buffers. The slider has an action listener, and when it detects that the slider has
moved, it uses the setDelay() method of EchoEffect to change the delayVal variable. This then
changes the size of the byte array in the next loop of the EchoEffect, as shown in figure 5.9. This
didn't work properly at first – the delay speed would increase as the slider was moved up, but it didn't decrease when the slider was moved back down. This was because TargetDataLines and SourceDataLines have to fill and empty their buffers at a steady rate, and the amount of data queued in them cannot be decreased while they are in use unless they are told to empty their buffers completely, using the flush() method. This was implemented in the if statement shown in figure 5.9. Once this had been done, the delay speed could be increased and decreased while the effect was on.
Figure 5.9 – Code showing how the EchoEffect class enables the delay time to be changed while the
effect is on
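A sketch of how this might look in code, reconstructed from the description above: dataIn, delayVal and setDelay() are the names used in the text, while in, out and running are assumed field names.

// Fields assumed: TargetDataLine in, SourceDataLine out, volatile boolean running.
private volatile int delayVal;   // current delay, expressed as a byte-array length

public void setDelay(int newDelayVal) {
    if (newDelayVal < delayVal) {   // a shorter delay means the lines must drop the
        in.flush();                 // audio already queued up in their buffers
        out.flush();
    }
    delayVal = newDelayVal;
}

private void echoLoop() {
    while (running) {
        byte[] dataIn = new byte[delayVal];            // array length = current delay
        int read = in.read(dataIn, 0, dataIn.length);  // capture one delay's worth of audio
        out.write(dataIn, 0, read);                    // replay it, one loop later
    }
}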
The separation of the two areas of the window was also a concern, so I increased the font size of the 'Echo' and
‘Sampler’ labels to try and make them clearer, and increased the top inset of the ‘Sampler’ label, so
that it was lower than previously. I wanted to include a line that would give a more definite
separation of the two areas, but this isn’t a possibility with the generic Swing components, and
designing my own would have been too time consuming at this stage of the project for a minor
usability problem.
The GUI for the final evolution of the application is shown in figure 6.1, and shows the final
changes made as a result of the last usability tests. This version didn’t get tested as the changes made
were only minor, and didn’t introduce any new features other than increased functionality of the delay
time slider. The sample duration feature hasn't been implemented due to a lack of time, which is a usability concern, but one that could be solved relatively easily.
Figure 6.1 – The GUI of the finished application
This evolution of the application is the final version for my project, and has documentation for
use which is included in appendix B. This documentation is in keeping with the user friendly ethos of
the application, and is best put by a quote from About Face [10] – “...imagine that users are
simultaneously very intelligent and very busy. They need some instruction, but not very much, and
the process has to be very rapid.” The documentation is a single-sided guide to setting up your computer to use the application and to the functions of the application. As the program isn't yet a stand-alone piece of software, there are no installation instructions.
6 Evaluation & Possible Further Areas of Interest
6.1 Initial Project Proposal
Initially I intended to investigate sound editing and recording from line sources to files, in order
to record and edit live DJ mixes for playback, but due to legal issues regarding music copyright this
proved to be an inappropriate project to undertake. An article regarding music copyright provided by the University Solicitor, Adrian Slater, states that copyright law stands for any academic purpose except “for the purposes of an examination by way of setting the questions, communicating the questions to the candidates or answering the questions.”[16] It is unclear from this statement whether coursework can be regarded as an examination, and to prevent any possible infringement of copyright law I had to present an alternative title to investigate. The problem of real-time DSP was a related topic that I also had an interest in, and is what I chose to look at.
6.2 Problems During Development
The progress of my project has been hindered at times – first by the copyright issues mentioned
above and then at various stages by implementation problems with JS. The biggest difficulty came
when I found that the Port class of JS is absent from the current version of JDK. After seeking advice
from the Java Sound discussion list, from which Florian Bomers and Michael Pfisterer were very
helpful, it became clear that I would have to alter some of the initial aims of the project stated in the
mid project report. In it, I said that high-pass, low-pass and band-pass filters were effects that would
be included in the application, but the absence of the Port class meant that I couldn’t alter the source
signal, as mentioned in section 4.9. I then had to redesign the GUI I had made, and implement
alternative effects chosen from a short list of possibilities.
The lack of implementation of the Port class also seriously hinders the potential of the application
to be developed for any more effects, but the application could be modified so that it is primarily a
sampler which can then have effects applied to the samples and played back over a DJ mix. This
would mean that latency would become much less of an issue as the program would do all necessary
processing before allowing the user to play back the sample.
Another bug I discovered in JS that I haven’t mentioned already is that the default gain level is
different for the two different formats I used during development of the application. Audio played in
16-bit stereo sound is louder at the same setting than 8-bit sound. Unlike the other bugs I encountered
with JS, this one was easy to overcome as the gain of an output stream can be controlled with the
Control class.
A problem I encountered with the application is that on tracks where there are low frequencies at
high volumes, distortion sometimes occurred when I ran the application. This wasn’t mentioned
during usability testing, so wasn’t a usability issue, but is a problem that might be resolved by
processing the signal with an FFT so that the low frequencies could be attenuated. This might also
cause a latency problem with the delayed signal though. Also the application isn’t a stand-alone
product yet, so it still needs to be run using JBuilder. This caused some occasional problems with the
operating speed of the application during usability tests, as JBuilder uses up a lot of resources. This
would only take a small amount of time to fix.
6.3 Other Mid Project Report Issues
My intended project schedule from my mid project report is included in appendix D. I managed
to follow it quite closely until the problems I experienced with JS. Evolution 1.1 of the application
was completed in time for the progress meeting, and programming was completed on 27th April, not 2nd April as specified in my schedule.
In my mid project report, I said that a possible enhancement would be to look at cross-platform compatibility; however, I didn't get a chance to assess the application on other platforms during the
development of the program. Java is supported on Linux and Mac operating systems, so it is likely
that the application will run on these operating systems provided JDK 1.3 is present on the particular
computer that the application is to be run on.
6.4 Object Oriented Style
The application doesn’t currently have a particularly structured hierarchy of classes or a very
object-orientated style. This problem might be due to taking a rapid prototyping approach to
developing the program, but could also be due to my lack of experience with program development or
my lack of experience with JS. Looking at the final code now, I can see some obvious improvements that could have been made to the coding style I intended to use, as stated in section 5.2 – both effects use almost identical code to capture sound, so an abstract class or interface could have been created and then implemented by both EchoSound and SampleSound. At the time of writing the code for the EchoSound class this approach didn't seem appropriate, and it would have been more obvious if I had had more detailed design plans to follow. This shows the disadvantages of the
software design techniques I used to develop the application. This drawback didn’t hinder the
production of the application in this case, but could cause more inconvenience in other projects.
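As an illustration of the suggestion above, the shared capture code could have been factored out along these lines; the class and method names here are illustrative, not taken from the application.

import javax.sound.sampled.*;

abstract class CapturingEffect {
    protected TargetDataLine input;

    // Both EchoSound and SampleSound open and start a TargetDataLine in the same
    // way, so that code could live here once instead of being duplicated.
    protected void openInput(AudioFormat format) throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        input = (TargetDataLine) AudioSystem.getLine(info);
        input.open(format);
        input.start();
    }

    // Each effect then only has to define what it does with the captured audio.
    protected abstract void process(byte[] captured, int length);
}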
6.5 Design and Usability Evaluation
At the moment the GUI has a basic, Windows-style appearance, which usability testing shows could do with changing. It is straightforward, as was one of my initial aims, but it could be improved by introducing some more appropriate imported graphics to present the application in a more DJ-orientated style. I have tried to take advantage of the usability methods I have discussed, such as user models and metaphors, but this has been restricted by Java Swing. Judging from the usability tests, the appearance and functionality of the application after the final usability alterations appear to fit the user model that a DJ has of it. The only problem that could still exist is the echo level control being mistaken for an overall level control.
A few features that were suggested in the usability tests, or that I have thought of myself, could realistically be implemented in the application in the future:
• A loop facility that would allow a sample to be repeated continuously. This is a common feature on most hardware samplers available, and would be a feature most DJs would expect to be included. This could be implemented with a simple while statement that checks whether a 'loop' check-box is selected or not (a short sketch of this idea follows this list).
• A sample bank that would allow multiple samples to be loaded into the sampler memory at once. This would enable greater versatility for the user, and could be included simply by either creating multiple instances of the SampleSound class or by having an array of byte arrays to store the different audio clips. Multiple instances of the SampleSound class would enable multiple samples to be played at the same time, but an array of audio clips would use fewer system resources.
• A "load from file" feature enabling different sound files to be loaded into the sampler's memory. This would give the application an advantage over most hardware samplers: some don't load from disk at all, and those that do require the audio to be in a specific format specified by the manufacturer. My application would be able to load files of various types, such as Windows wave files, MP3s or Macintosh AIFF files, due to JS's wide support of different file formats.
• More versatility with the echo effect, introduced by including different algorithms to produce different types of reverb, delay and echo. Though this would be limited by the latency of the application, it would still be possible to create algorithms that change the decay and delay time of the effect and play multiple delays at once.
• A BPM counter to display the tempo of the record being played. This could then be used for matching the tempo of MIDI tracks, but would require a considerable amount of development.
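The while statement mentioned in the first item would work; Java Sound's Clip class also provides a loop method that achieves the same result more directly. A minimal sketch of the check-box driven version follows, with clip and loopCheckBox as assumed names.

// Sketch only: 'clip' is the sampler's Clip and 'loopCheckBox' a JCheckBox on the GUI.
clip.setFramePosition(0);                 // always start from the beginning of the sample
if (loopCheckBox.isSelected()) {
    clip.loop(Clip.LOOP_CONTINUOUSLY);    // repeat until explicitly stopped
} else {
    clip.start();                         // play the sample once
}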
The usability testing during development proved to be an invaluable aid to both usability and functionality. Having the opinions of people who only had vague expectations of the functionality helped greatly in noticing problems and suggesting further areas to investigate. It often takes someone outside a project to notice obvious usability flaws, such as the labelling of the echo level slider and the echo start button. During the tests some differing points were brought up, but there were more similarities of opinion regarding the problems with the GUI. Some of the problems brought up during the usability tests were also concerns raised during the progress meeting for the project. This reinforces both the points made and the theory described in section 3.2 regarding the number of people needed to carry out usability tests. Taking a small group of people who are likely to use the program, plus one other to get an unbiased opinion of the functionality, produced usability test results that were useful in advancing my application.
6.6 Application Evaluation
Despite the setbacks I have experienced during my project, I have successfully fulfilled my
minimum aims stated in my mid project report, and have in fact implemented two effects in the
application. The application proves that some real-time effects are possible on a computer, and
presents these effects in a user-friendly way. If the Port class had been implemented, it would have been interesting to see the latency caused by using FFTs to alter the audio signal for different effects, and how successful different FFT implementations would be at converting between time-domain and frequency-domain information.
The only functionality that is missing from the application is that the sample duration method has
still not been implemented, due to a lack of time.
Every other usability problem from the usability tests and the progress meeting has been
addressed and fixed as far as possible.
Bibliography
1. The Fourier Transform and Its Application – Ronald N. Bracewell – McGraw-Hill, 1986, pp 462-464 (Available at www.swarthmore.edu/natsci/echeeve1/Ref/Fourier/FourierBio.html)
2. Discrete Fast Fourier Transforms – Don Cross (Available at www.intersrv.com/~dcross/fft.html)
3. Applications of Discrete and Continuous Fourier Analysis – H. Joseph Weaver – Wiley-Interscience Publications (1983)
4. Soundprobe 2 Digital Audio Glossary – HiSoft (2001) (Available at www.soundprobe.com/glossary/f.html)
5. Analog and Digital Filter Design Using C – Les Thede – Prentice Hall (1996)
6. Software Engineering (6th Edition) – Ian Sommerville – Addison Wesley (2001)
7. Toward A Discipline Of Scenario-Based Architectural Engineering – R. Kazman, S. J. Carriere, S. G. Woods – Carnegie Mellon Software Engineering Institute (2000) (Available at www.sei.cmu.edu/staff/rkazman/annals-scenario.pdf)
8. The Windows 95 User Interface: A Case Study in Usability Engineering – Kent Sullivan – Association for Computing Machinery, Inc (1996) (Available at www.acm.org/sigchi/chi96/proceedings/desbrief/Sullivan/kds_txt.htm)
9. User Interface Design for Programmers – Joel Spolsky – APress (2001)
10. About Face: The Essentials Of User Interface Design – Alan Cooper – IDG Books Worldwide (1995)
11. The Development of the C Language – Dennis M. Ritchie – Lucent Technologies (Available at cm.bell-labs.com/cm/cs/who/dmr/chist.html)
12. Java Sound API Programmer's Guide – supplied with JDK 1.3 help files – Sun Microsystems Inc (2000)
13. Learning Java – Patrick Niemeyer & Jonathan Knudsen – O'Reilly & Associates, Inc (2000)
14. Dynamic User Interface Is Only Skin Deep – Jason Briggs – JavaWorld.com (2001) (Available at www.javaworld.com/javaworld/jw-05-2000/jw-0518-skins.html)
15. Java Sound API discussion list – subscription information available from java.sun.com/products/java-media/sound/forDevelopers/interest_group.html
16. Copyright infringement article included in Appendix E, supplied by Adrian Slater (University Solicitor)
17. A Simple Approach To Digital Signal Processing – Craig Marven & Gillian Ewers – Texas Instruments (1994)
Appendix A
During the course of my project I have learnt a great deal about usability and human-computer
interaction. There is no substitute for usability testing at early stages of product development. It
takes an unbiased potential user to see what problems there are with a product from a usability and a
functionality point of view, and isn’t something that a programmer can assume to know. I feel that
this is one of the most important lessons I have learnt from this project.
I have learnt a lot about programming in Java as well. Java proved to be a straightforward
language to develop the program in apart from the problems I experienced with Java Sound. Even
these were fairly simple to work around. The object oriented style that is inherent in Java has taught
me more than just how to code in Java though – I have learned about how an application can be
constructed using an object oriented approach. It would have been better to take a more object
oriented approach to the coding, and this would have been done if the initial specification had been
more definite, rather than saying that I intended to implement ‘an effect’. This would have allowed
better planning of each effect and I would have been able to see common features of each in order to
split each effect up into smaller classes. This conflicts slightly with the usability inclined design
approach I took and is why it didn’t feature in the project, but I feel that a more specific initial outline
would have benefited the end product.
The problems with the Port class were unfortunate, but I have learnt a great deal about building GUIs with Swing and using the Java Sound API.
I have learnt to work around problems that arise during a project. The problems that I had with
my project were a hindrance, but I managed to find alternative solutions to them, though these
solutions were not as effective as the original plans.
The significance of the internet during my research surprised me, and provided me with many
articles that I found useful.
The software design techniques I chose to follow seemed appropriate for the tasks I was trying to
achieve, but again these didn’t highlight the initial need for greater detail in specification.
I am disappointed that I didn't get to implement a fast Fourier transform during my project, but after I realised that I wouldn't be able to implement the filters that I originally wanted to, it became difficult to find a way in which an FFT could be included as a useful feature.
Appendix B – Documentation
DJ Real-Time FX Unit
This program is intended to be used on an audio signal being played on a computer, either a sound
file or a signal from a mixer connected to the computer by a line in.
The FX unit incorporates two useful effects – an echo effect and a sampler.
The Echo Effect
Once running, a sound signal can be processed by the Echo Effect by pressing the ‘Start Echo’ button
of the Echo Effect section (i). While running, the button will remain pressed and display the text
‘Stop Echo’. The output level of the echo can be varied using the ‘Echo Level’ slider (ii), and the
delay time of the echo can be altered with the ‘Delay Time’ slider (iii). When processing is finished,
pressing the 'Start/Stop Echo' button will stop the effect.
(Screenshot of the application window, with the controls labelled (i) to (vi).)
The Sampler
The sampler can capture a short clip of audio, maximum length 23.2 seconds, by pressing the ‘Start’
button of the Sampler section (iv). Once the sample has been captured, the sample can be played
using the 'Play' button, or played in reverse using the 'Reverse' button (v). The adjacent 'Stop' buttons stop the respective function (vi).
The unit cannot currently be used to sample an audio signal that is being processed by the Echo Effect, or vice versa, but a sample can be processed by the Echo Effect. Also, the sample duration feature doesn't function yet.
Appendix C – WinAmp Skin Examples
Appendix D – Mid Project Report Schedule
Weeks commencing: 1st October, 15th October, 29th October, 12th November, 26th November, 10th December, 24th December, 7th January, 21st January, 4th February, 18th February, 4th March, 18th March, 1st April, 15th April, 29th April, 13th May, 27th May.

Project milestones:
3rd October – Hand in project preference form
12th October – Procedures/timetables meeting
Allocation of supervisor
2nd November – Complete aims and minimum requirements form
9th November – Mid-project report/update meeting; hand in questionnaire forms
13th November – Complete project schedule
12th December – Complete basic GUI prototype
13th December – Hand in mid-project report
15th December – Start of Christmas vacation
13th January – End of Christmas vacation
14th January – Start of exams
25th January – End of exams
28th January – Collect mid-project report; be able to send sound through program
18th February – Have one effect working
13th March – Submit table of contents & draft chapters
22nd March – Completion of progress meeting; have first draft of documentation
2nd April – Completion of programming / start week of debugging
8th April – Start project report write-up
12th April – Completion of documentation
1st May – Submit Project Report
3rd May – Submit Project Postscript
29th May – Return of feedback