Multimedia at Work: Capturing Conference Presentations

Lawrence A. Rowe, University of California, Berkeley
Vince Casalaina, Image Integration
Editor: Qibin Sun, Institute for Infocomm Research

Editor's Note
Working in the multimedia and e-learning areas, you might have heard of the Berkeley MPEG-1 Tools, the Berkeley Multimedia, Interfaces, and Graphics (MIG) Seminar/Lecture Webcasting System, or the Open Mash Streaming Media Toolkit. All of these were produced by the group led by Larry Rowe.
In this issue, we invite Rowe to introduce his latest low-cost system for automated presentation capture, covering both the technology and the process. In particular, he shares some valuable thoughts on improving the quality of the captured material, the process from capture to postproduction, the system's usability (such as the user interface), and the media streaming protocols that support playback.
—Qibin Sun

Many organizations have developed technology to capture and stream presentations.1-3 Yet, presentation capture is impractical
at many professional meetings and conferences
because of high costs. For example, the typical
expense of capturing and publishing presentations using conventional technology is $5,000 to
$20,000 per day, depending on the capture complexity and how you produce the final product.
We’ve developed an approach that’s similar
to the live-to-videotape recording process the
broadcast industry uses, except we record compressed material onto a computer disk. Captured
media files can be published immediately without offline editing or postproduction, significantly reducing publication cost. We tested our
new approach by capturing presentations at the
Association for Computing Machinery (ACM)
Workshop on Network and Operating System
Support for Digital Audio and Video (NOSSDAV)
2005.4
The total cost of the equipment we used in our
experiment (including audio, video, and computer equipment) was approximately $12,000. The
production team included one production assistant and one person who acted as both webcast
producer and director. Based on this experiment,
we estimate it’s possible to capture and publish
conferences for approximately $3,000 per day
plus expenses (such as travel, room, and board).
This estimate includes equipment rental.
This article describes the technology and
process we used to capture and publish the
NOSSDAV presentations. A longer version of this
article and a slide show showing pictures of the
equipment and the room are available at http://
bmrc.berkeley.edu/research/nossdav05.
Capture process
The basic idea behind presentation capture
is to capture audio, video, and graphics (that is,
RGB output from a presentation computer) and
encode it into a compressed digital media file
that users can replay on demand. The challenge
is to capture high-quality images of the projected presentation material inexpensively.
Most conference presentations use relatively
static slides with transition effects and builds.
Occasionally, a presentation will include animations to illustrate dynamic behavior. Some
presenters use continuous media (such as audio
and video) and live demonstrations in their presentations. Dynamic behaviors, continuous
media, and live demonstrations are especially
difficult to capture.
The conventional approach to lecture capture
is to use one or two cameras focused on the
speaker—for example, for a close-up of the
speaker and a wide-angle shot of the stage—and
a wireless microphone to capture the speaker’s
audio presentation. Some productions use a
third camera to capture audience members as
they ask questions.
Problems arise when you try to capture the
graphics material projected to the audience. Typically, the computer’s RGB signal is converted to
a video signal by using a scan converter or pointing a camera at the projection screen. Both
approaches have limitations because the RGB signal has too much data compared to a video signal. Digitizing and compressing these images discards 50 to 70 percent of the image's information (an XGA image has 1,024 × 768 pixels, whereas an NTSC frame has roughly 640 × 480, so scan conversion alone throws away about 60 percent of the pixels), which often results in unreadable presentation material.
Another approach is to acquire the presentation source files and create images in postproduction that can be synchronized with the
speaker’s audio and video.5 This approach produces high-quality slide images but raises the cost
of production unless speakers are constrained to
a limited set of presentation packages. Capturing
dynamic material is still problematic. Moreover,
some speakers won’t provide copies of their files.
A prior experiment at ACM Multimedia 2001
used this approach, and the published material
contained only 30 percent of the slides.6
In our approach, we directly capture the RGB
signal using an NCast Telepresenter G2. (We
should note that this article’s first author, Rowe, is
a cofounder and investor in the company.) With
this method, the image quality is substantially
better, and we capture the dynamic material. The
G2 (see http://www.ncast.com/telepresenterG2.
html) contains an embedded computer that runs
software to digitize audio and RGB signals and
then compresses them using MPEG-4 codecs. It
can webcast live streams and archive the material in an MP4 file for on-demand replay. The G2
produces material compatible with Internet Engineering Task Force (IETF) and International
Telecommunication Union (ITU) standards that
users can play using the Apple QuickTime Player. The G2 can be controlled by using the embedded Web interface or by a program that accesses
the G2 through a Transmission Control Protocol (TCP) or serial connection. The G2’s retail
cost is $5,500.
The G2 captures RGB images, so we need to
convert the National Television System Committee
(NTSC) video signal produced by cameras recording the speaker into an RGB signal. We used a
Kramer VP-720DS seamless switcher (see http://
www.kramerelectronics.com), which accepts up
to four video inputs and one RGB input and
produces an RGB output selected from one of
the inputs.
The VP-720DS has been discontinued, but
Kramer makes many similar products that can be
used for this application. The switcher scales the
selected input to the specified output format and
uses frame-accurate switching. It also provides a
picture-in-picture (PIP) function that will show
the RGB signal composed with one of the video signals or one of the video signals composed with the RGB signal. The retail cost of the VP-720DS is $1,595, although it's widely available for $1,200.
We used two cameras: a manually controlled camera at the back of the room and a pan, tilt, and zoom (PTZ) camera located in the aisle between the classroom seating tables. Figure 1 shows a schematic of the room and a picture from the podium. We used the manual camera to provide a wide-angle view of the stage and to show people asking questions. The PTZ camera was used for close-ups of the speakers and panel members. The captured presentation is a single video stream that shows the speaker, the presentation material, or the presentation material with the speaker in a PIP window. Figure 2 shows examples of each.
Figure 1. Pictures of the room in which the NOSSDAV conference was held: (a) room configuration, and (b) view of the room from the speaker's podium.
Figure 2. Three examples of the material captured for a presentation: (a) a close-up of the speaker, (b) the presentation material, and (c) a composition showing the speaker and the slides using picture-in-picture.
Figure 3 shows the equipment configuration
we used during the capture. The director
(Casalaina) operated the wide-angle camera, an
audio mixer to control sound levels, and two GUI
applications that ran on a laptop to control the
PTZ camera (a Canon VCC4) and the capture and
switching hardware.
Figure 3. The audio, video, and computer equipment used to capture the presentations. The presenter's PC and wireless microphone and the presentation projector are in the upper left corner. We brought all the other equipment to capture the event. The schematic depicts the various interconnections and signals (such as video and audio). The producer uses the audio mixer and monitor headphones, the preview and program monitors, and the control PC to capture the presentation. The preview monitor lets the producer set up the camera not currently selected for program output.
The house audio system provided a single
audio signal that combined output from the wireless microphone, a wired podium microphone,
and audio from the speaker’s presentation computer. The podium microphone captured audience questions and speaker introductions.
We designed the control software to be easy
to use and to provide only the functions required
for lecture capture. Our hope was to automate as much of the production process as possible.
One application controlled the PTZ camera and
a second application controlled the capture and
switching hardware. The camera control application provided an interface to pan, tilt, or zoom the
camera smoothly at a user-configured speed and
to set or recall up to six preset positions.
The capture/switching application provided
functions to control capture (for example, to
start, pause, resume, or stop), select the video
source (such as a wide-angle camera, close-up
camera, or RGB signal), control use of PIP, and
configure selected hardware properties such as
the capture format (video graphics array [VGA],
super video graphics array [SVGA], and Extended Graphics Array [XGA]).
We wrote the control applications in Tcl/Tk,
which together include approximately 3,500 lines
of code. The code sends commands to the VCC4
camera and VP-720DS using serial connections
and to the G2 using a TCP connection. (More
details on these applications, including screen
dumps that show the interfaces, are available at
http://bmrc.berkeley.edu/research/nossdav05/
capture/.)
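To give a sense of the structure, the fragment below is a minimal sketch of how such an application might open its control channels in Tcl; the device paths, TCP port, and command strings are placeholders, not the actual VCC4, VP-720DS, or G2 protocols.

# Sketch only: device paths, baud rate, port, and command strings are
# placeholders; the real VCC4, VP-720DS, and G2 protocols are not shown.
proc openSerial {dev} {
    set chan [open $dev r+]
    fconfigure $chan -mode 9600,n,8,1 -translation binary -blocking 0
    return $chan
}

proc openTcp {host port} {
    set chan [socket $host $port]
    fconfigure $chan -buffering line
    return $chan
}

set camera   [openSerial /dev/ttyS0]     ;# Canon VCC4 (PTZ camera)
set switcher [openSerial /dev/ttyS1]     ;# Kramer VP-720DS
set g2       [openTcp 192.168.1.20 7000] ;# NCast Telepresenter G2

# Send a raw command string and force it onto the wire.
proc sendCmd {chan cmd} {
    puts -nonewline $chan $cmd
    flush $chan
}

# GUI callbacks wrap sendCmd; the command strings here are hypothetical.
proc startCapture {}      { sendCmd $::g2 "record start\n" }
proc stopCapture  {}      { sendCmd $::g2 "record stop\n" }
proc selectSource {input} { sendCmd $::switcher "SOURCE $input\r" }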
The conference was scheduled for Monday
and Tuesday, so we arrived Sunday to prepare. It
took approximately three hours to set up and test
the equipment on site.
During the event, the director operated the
equipment and monitored the capture. We ran
the audio signal through a small mixer so we
could easily control sound levels. Video was
monitored on an RGB display from the G2 that
showed the captured video. The G2 display can
be configured to show a sound meter for captured audio. Hence, we were able to verify that
the sound was intelligible and the captured signal was acceptable. The production assistant
(Rowe) solicited performance releases, helped
speakers with RGB output settings, and tweaked
the control software. (ACM approved the performance release, which is available at http://bmrc.
berkeley.edu/research/nossdav05/capture/acmvid-release.pdf, before the event.)
Previous experience suggested that 67 percent
of the presenters would sign the release. Often
presenters decline because they are uncertain whether they have releases for material used in the talk or because they work for organizations that require corporate lawyers to sign the releases. All the NOSSDAV presenters signed the
release, probably because most speakers were
from universities.
The G2 stores the captured media files on an
internal disk. We captured the conference organizers’ welcome and introduction, presentations
for the 33 accepted papers, the keynote address,
and nine question and answer sessions, which produced 44 media files that occupied 8.7 Gbytes of disk space. Sadly, the video for one talk wasn't recorded correctly for unknown reasons. It might have been an operator error starting or stopping a capture or a software bug. We still published the audio for the talk.
It took approximately an hour to tear down the equipment and repack it for transportation after the conference ended. We also made a copy of the media files on a separate disk just in case there was a problem on the return trip.
Postproduction
As we mentioned earlier, the NCast G2 produces files that users can play on a QuickTime
Player. We installed a Darwin Streaming Server
(DSS) on a FreeBSD PC located at the University
of California, Berkeley, and loaded the captured
files onto it. We then played the material using
various Windows and Macintosh PCs from different places including high-speed connections
at Berkeley and other universities and broadband connections at home. The captured material didn’t play well for two reasons:
❚ The material was captured at 1.5 megabits per
second (Mbps), which is too demanding for
many broadband home connections.
❚ The material was captured at the native image
size of the presentation, typically XGA, at 30
frames per second (fps). A relatively new PC
was required to decode this material.
Consequently, we decided to recode the material so that more people could play it. We had
trouble finding an inexpensive software package
to transcode the files. We found several packages
that appeared to work, but they cost between
$400 and $1,000. While we were searching for
the best alternative, Apple released QuickTime
V7 Pro, which includes the required transcoding
functionality, runs on Macintosh and Windows
PCs, and costs $30.
After experimenting with different formats,
we decided to publish two versions of each presentation, specifically a low-quality version that
users can play anywhere and a high-quality version for people with fast network connections
and computers. The low-quality version uses 384
× 256 images at 15 fps that require 600 kilobits
per second (Kbps), and the high-quality version
uses 512 × 384 images at 15 fps that require 1,200
Kbps. We used the recently released QuickTime
H.264 video codec for the published material
because the transcoding software supported it
and it appeared to produce better results than the
MPEG-4 video codec.
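As a rough illustration of the batch step, the sketch below produces the same two target formats with ffmpeg; it is not the QuickTime 7 Pro workflow described here, and the audio codec and bitrates are assumptions.

# Sketch only: this drives ffmpeg, not the QuickTime 7 Pro encoder described
# above, and the audio codec and bitrates are assumptions.
file mkdir published
foreach src [glob -nocomplain *.mp4] {
    set base [file rootname [file tail $src]]
    # Low-quality version: 384 x 256 at 15 fps, roughly 600 Kbps video
    exec -ignorestderr ffmpeg -i $src -vf scale=384:256 -r 15 \
        -c:v libx264 -b:v 600k -c:a aac -b:a 64k published/${base}_low.mp4
    # High-quality version: 512 x 384 at 15 fps, roughly 1,200 Kbps video
    exec -ignorestderr ffmpeg -i $src -vf scale=512:384 -r 15 \
        -c:v libx264 -b:v 1200k -c:a aac -b:a 96k published/${base}_high.mp4
}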
Transcoding all the material was time consuming because it required three and nine times
real time to produce the 600 and 1,200 Kbps
material, respectively. The H.264 codec in QuickTime V7 Pro has one- and two-pass encoders. We
used the one-pass encoder even though the
results were better with the two-pass encoder
because the two-pass encoder required 40 times
real time to transcode a file. We had 14 hours of material to transcode at two settings. Using the one-pass encoder, it still took approximately 170 hours of machine time to transcode the material (14 hours × 3 for the 600-Kbps version plus 14 hours × 9 for the 1,200-Kbps version, or about 168 hours).
We produced Web pages to play the material,
including a listing of all talks and popup windows to play each talk. It took some effort, but
eventually we were able to get the HTML to work
correctly on all Web browsers using the embedded QuickTime Player.
Publication
The conference was held 13–14 June 2005,
and we published the material on 1 September
2005. (The presentations are available at http://
bmrc.berkeley.edu/research/nossdav05/.) The
ACM SIG Multimedia and NOSSDAV Web sites
and mailing lists advertised the material’s availability. Users were able to play the material successfully 329 times, which is 60 percent of the
attempted plays (548), in the 11 months between
September 2005 and July 2006. We’ve omitted
from these statistics plays by the site producer
during development and testing.
Of the 219 failed attempts, 179 (80 percent)
were logged as server timeout errors, which are
caused by users trying to play the material on a
computer behind a firewall or network address
translation (NAT) router using Real-Time Streaming Protocol (RTSP) transport, which carries the media over UDP datagrams, rather than TCP-based HTTP transport. Most of these errors occurred in the
first two months.
The player or server software didn’t work during November and December 2005, which we
discuss in more detail in the next section. We
changed the way the videos were played in early
January so that all playback used HTTP transport.
Since that time, we’ve noticed a significant
decrease in server timeout errors. The other 40
errors are bad requests (for example, the URL
doesn’t exist or a wrong format is requested).
Looking at the successful plays, 35 percent
used the high-speed version and 64 percent used
the low-speed version. The remaining 1 percent
played the audio-only talk. We’re surprised that
more people didn’t play the high-speed version
because we expected that most people interested
in the material would be at universities, which
typically have high-speed connections that can
access the Berkeley server.
Each talk and Q&A session was played between 0 and 40 times, with a mean of 7.7 plays (standard deviation 9.1). Surprisingly,
three talks have never been played.
The most popular talks are
❚ the keynote address “Multimedia Systems Research: A Retrospective” by Harrick Vin from the University of Texas (40 plays),
❚ “Supporting P2P Gaming When Players Have Heterogeneous Resources” by Brian Neil Levine from the University of Massachusetts (36 plays), and
❚ “Mirinae: A Peer-to-Peer Overlay Network for Large-Scale Content-Based Publish/Subscribe Systems” by Yongjin Choi from KAIST (31 plays).
The most popular panel discussion was on
“Network Gaming,” which has been played five
times and included researchers Brian Neil
Levine from the University of Massachusetts,
Chris Chambers from Portland State University,
Grenville Armitage from Swinburne University, and Kuan-Ta Chen from National Taiwan
University.
We’re disappointed the material has been
played only one to two times per day. Although
we expected replays to decline over time, we
thought people interested in the topics who didn't attend the workshop would play the material. The problem might be publicity (it's difficult to advertise the material's availability) and the content's one-time nature.
Lessons learned
Several things worked well, including the
switching and capture hardware and the low-cost
model for capture and publication. Although we
believe it’s possible to capture and publish a single-track conference for approximately $3,000 a
day plus expenses, this price will of course be
higher if you use additional equipment. Still, it’s
reasonable to expect the cost to remain well
under $5,000 per day. Also, we were able to capture all the workshop presentations, regardless of the slide and computer technology the speaker used, including all dynamic material. We believe the resulting published material is of reasonable quality given the playback constraints (network bandwidth and computer processing power) and a few production glitches we'll discuss shortly. Nevertheless, as in any production, there's room for improvement.
Improving quality
Generally speaking, the material we captured
is good quality, but it can be improved. First, we
captured the material at 30 fps using the native
resolution of the presenter’s projected material if
the resolution was XGA or smaller and XGA resolution if larger. Although it reduces visual quality, a lower resolution capture (such as SVGA) at
15 fps is good enough given the constraints of
current playback technology.
Scaling higher-resolution images to SVGA and
applying typical video coding algorithms produced some “ringing” around text on the slides—
that is, ghost edges around the characters.
Modern computers are exceptionally good at displaying material at different resolutions. Where
possible, we need to encourage presenters to use
lower resolution when projecting their material.
This problem is related to bandwidth available
for transmitting the material during playback
and decoding efficiency of the playback computer. Over time, these constraints will be relaxed,
and it will be practical to capture larger images at
higher frame rates.
We could also improve audio capture. Some
speakers didn’t use the wireless microphone. The
captured audio was good if they stayed at the
podium, but sometimes they strayed away from
the podium or looked at the screen, which
impacted quality. The obvious advice is to force
speakers to use the wireless microphone.
Audio capture of audience questions must be
improved because they were sometimes difficult
to hear. We thought the podium microphone
would pick up most audience questions, which it
did. However, sometimes the audience member
didn’t speak loudly, and it was difficult for the
director to change the sound level quickly enough
during interaction between the speaker and audience. We should have used several microphones
pointed at the audience and controlled them separately at the mixer to capture questions.
Finally, we had only one wireless microphone.
We needed several microphones so the session
moderator could always be wired and the next
speaker could get ready before it was time to talk.
We also had a minor problem positioning the
RGB image on the projection screen and at capture. The projector in the room had a remote
control to move the image left/right or up/down,
but we didn’t notice the problem during testing.
As a result, the RGB images in the first few talks
were shifted up and to the left when captured,
which led to video noise across the bottom of the
captured images. Both the VP-720DS and the G2
have controls to move the image, but we didn’t
have access to them in our control software. This
problem can be easily fixed.
Another way to improve the captured material’s quality is to use more cameras and provide
the director with more control. We didn’t incorporate an audience camera positioned at the
front of the room because we didn’t have an
extra camera. We will do so in the future as long
as audience members don’t object. And we will
use PTZ cameras for all sources rather than a
manual camera because it will simplify operation
for the director.
Wide-angle views of the stage were unusable
when slides were being projected because the
bright light bouncing off the screen caused the
camera auto exposure to close the aperture,
which produced a dark image that made it difficult to see the speaker. A good spotlight on the
speaker will fix this problem.
NCast has released the Telepresenter M3 with additional production features, including a PIP function and a graphic overlay function that can be used for titling. Moving these functions into the capture device would let us carry less hardware, which simplifies setup and operation and improves reliability.
Lastly, Automatic Sync Technologies (see
http://www.automaticsync.com) is a commercial
company that offers an automated captioning
service for streaming media. They produce a multimedia title using any of the popular streaming
media formats that scrolls the text of an audio transcript synchronized with the audio/video material. They can also produce a word-based search
index to the material. The service costs approximately $185 per hour of source material, which
means the NOSSDAV material could be processed
for less than $3,000. We think future publication
of presentations and discussions at conferences
should include this capability with the material
they publish.
Improving process
We could also make several changes to
improve the capture and postproduction process.
First, a preconfigured custom hard-shell case for
the production equipment would greatly simplify preparation before an event and setup at the
remote location. The case can incorporate a small
rack for the equipment with sound dampening
and access to the front and back panels. We can
also add small rack-mounted LCD displays for
monitoring various video sources in place of the
heavy, awkward-sized professional video monitor we used in this experiment. These cases are
relatively inexpensive and many companies will
custom design them for a specific application.
Second, we can substantially improve the
postproduction and publication process. Because
it was the first time we used this approach in a
conference setting, it took almost two months to
publish the captured material. This delay was
caused in part because we had to determine the
best playback representation and transcode the
material. In future productions, we will change
the capture parameters to avoid transcoding. We
also had to set up the media server and author
Web pages for the conference program and individual presentations. Most of this work only
needs to be done once or can be automated.
During the event, we spent considerable time
keeping track of the speaker and the recorded file
that corresponded to each talk. The G2 identifies
the talk by encoding the beginning date and time
of the capture into the file name. We copied the
material off the G2 by hand and then used scripts
to produce the Web pages given the files and
information about the talks (for example, title,
authors, affiliation, speaker, talk duration, and
start time). We can easily automate this step by
entering the conference program ahead of time
and relating it to the capture files. Moreover, we
could open up the G2 interface to the embedded
FTP server and automate the entire postproduction process.
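The sketch below illustrates one way that matching could work; the capture file-naming pattern and the program-file layout shown are assumptions, not the G2's actual conventions.

# Sketch only: assumes capture files are named by their start time, such as
# "2005-06-13_09-05.mp4", and that the program is a comma-separated file with
# "start time,title,speaker" per line.  Neither format is the G2's actual one.
proc loadProgram {programFile} {
    set talks {}
    set f [open $programFile r]
    foreach line [split [read $f] \n] {
        if {[string trim $line] eq ""} continue
        lassign [split $line ,] start title speaker
        lappend talks [list [clock scan $start -format "%Y-%m-%d %H:%M"] \
                            $title $speaker]
    }
    close $f
    return $talks
}

# Return the program entry whose scheduled start is closest to the time
# encoded in the capture file name.
proc matchCapture {talks captureFile} {
    set stamp [clock scan [file rootname [file tail $captureFile]] \
                   -format "%Y-%m-%d_%H-%M"]
    set best {}
    set bestDiff -1
    foreach talk $talks {
        set diff [expr {abs([lindex $talk 0] - $stamp)}]
        if {$bestDiff < 0 || $diff < $bestDiff} {
            set bestDiff $diff
            set best $talk
        }
    }
    return $best
}

set talks [loadProgram program.csv]
foreach f [glob -nocomplain *.mp4] {
    lassign [matchCapture $talks $f] when title speaker
    puts "$f -> $title ($speaker)"
}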
Finally, several research groups have explored
automating the decisions made by a webcast
director during a live event.2,7,8 Clearly, this technology should be incorporated into conference
presentation capture.
Improving usability
Numerous changes can be made to the control
software used to capture the event. First, we need
to fix the PIP interface. The control software
needs a simple configuration interface that lets
the director change the PIP location (to bottom
left or right) to more easily adapt to the spatial
positioning at the conference venue. If the PIP
window is on the lower right side of the image
and the speaker is standing to the left of the
screen as you face the stage, the speaker’s gesture
to the screen on his left is off the right edge of the
captured image. By moving the PIP window to
the lower left, the speaker gesture points to the
slide in the captured image. Figure 4 illustrates
this problem. If the speaker is on the screen’s right
side, you need to change the PIP position. Hence,
the control software must make it easy for the
director to change the PIP location dynamically.
Figure 4. Spatial relation between speaker and PIP window. It must be easy for the director to change the PIP location dynamically to avoid having (a) the speaker gesture off screen; preferably, (b) the speaker will gesture onto the screen.
This feature is easy to add because the Kramer
interface has the function. But one problem with
the Kramer is the absence of a function to switch
the PIP and main window source. This function
exists through the VP-720DS onscreen display
interface, but we couldn’t execute that function
remotely, even when we tried to mimic the
onscreen operations. The device clearly has the
function, but it’s unavailable through the serial
control interface.
This limitation caused problems because several times the director wanted to swap the PIP
and main window images. To do it, he had to
turn off the PIP window, switch to the alternative
source, and turn on the PIP window, which was
distracting and time consuming. Moreover, it
probably increased the compressed bits because
the codec produces encodings for the intermediate images.
To improve camera control, we need to
rewrite the PTZ camera control software. The
software we used was developed originally for a
Canon VCC3. We used the VCC3 emulation
mode on the VCC4 camera. The VCC4 also has
more functions (such as variable speed moves)
that we can exploit to improve the captured
images. The VCC3 has manual iris and focus controls, but we couldn’t get them to work in the
emulation mode on the VCC4. Presumably, the
VCC4 interface to these controls works.
Finally, we need to define some presets to
move the PTZ camera in one or two dimensions
and to add more presets. A preset in the current
software defines an absolute setting for pan, tilt,
and zoom. Several times the director wanted to
pan to the right or left at the same tilt and zoom
settings. In effect, he wanted a delta from the current position, rather than an absolute setting for
a preset. We also need to add groups of presets so
the director can easily switch between them. For
example, individual speakers and panel sessions
require different settings.
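A delta move could be layered on top of the existing absolute presets along these lines; moveAbsolute and queryPosition are hypothetical wrappers around the VCC4 serial protocol, which isn't shown.

# Sketch only: moveAbsolute and queryPosition are hypothetical wrappers
# around the VCC4 serial protocol.
array set preset {}

proc storePreset {name pan tilt zoom} {
    set ::preset($name) [list $pan $tilt $zoom]
}

proc recallPreset {name} {
    lassign $::preset($name) pan tilt zoom
    moveAbsolute $pan $tilt $zoom
}

# A relative ("delta") move: keep the current tilt and zoom and pan by an
# offset, which is what the director wanted during panel sessions.
proc panBy {deltaPan} {
    lassign [queryPosition] pan tilt zoom
    moveAbsolute [expr {$pan + $deltaPan}] $tilt $zoom
}

# Preset groups: the GUI can load a different set of preset buttons for,
# say, individual speakers versus panel sessions.
set group(speaker) {podium screen audience}
set group(panel)   {panelLeft panelCenter panelRight}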
Improving playback
We used the QuickTime Player embedded in a Web page to play the recorded material. Several users had problems playing the material. Generally speaking, it worked well on Macs running OS X using the Safari Web browser. Although users were able to play the material using Windows computers and other browsers (such as Firefox and Internet Explorer), most had problems with streaming transport because the user had to manually configure it. Users have no patience for configuring software to view material like these presentations. Playback must work like TV: go to a Web page and it works.
The QuickTime embedded player can transport content using either RTSP or HTTP streaming. Given the state of the Internet today, nearly everyone uses HTTP streaming because of firewalls and NAT routers. However, the player uses RTSP streaming by default so the user must reset the transport parameter manually. Most users,
including experienced computer scientists, were
confused by this requirement even though our
Web pages described the problem and explained
how to change the setting.
Moreover, a recent release of the QuickTime
software for Windows (version 7.0.3) exacerbated this problem. Prior to this release, the user
could set the transport to use port 8000 with
HTTP streaming. This release doesn’t let users
change the port—they must use default port 80.
This restriction, or more likely bug, caused problems because we run the DSS server on the same
machine as a Web server. We didn’t notice this
problem with the material for more than two
months because no one notified us that the
material was unplayable. The server logs show
that people just stopped playing the material.
We fixed this problem by explicitly including
the port number (8000) in the RTSP URL we used
to launch playback. This port uses HTTP streaming by default, so it removed the requirement that
people explicitly set the transport parameter.
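For reference, a minimal page-generation sketch after that change might look like the following; the server name, path, and image sizes are placeholders, and the embed attributes follow the QuickTime plug-in's standard parameters.

# Sketch only: server name, path, and sizes are placeholders.  Putting the
# DSS port (8000) in the RTSP URL selects HTTP transport by default, which
# is the fix described above.
proc popupPage {movie width height} {
    set url "rtsp://streaming.example.edu:8000/nossdav05/$movie"
    return "<html><body>
<embed src=\"$url\" type=\"video/quicktime\"
       width=\"$width\" height=\"[expr {$height + 16}]\"
       controller=\"true\" autoplay=\"true\"></embed>
</body></html>"
}

set f [open talk01_low.html w]
puts $f [popupPage talk01_low.mp4 384 256]
close $f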
Conclusion
This experiment demonstrates that it's possible to capture conference and workshop presentations for on-demand replay for approximately $3,000 per day.
We believe professional organizations such as the
ACM and IEEE should consider capturing presentations for many, if not all, conferences. Over
time, this cost should decline and the quality of
the captured material will improve.
Acknowledgments
We thank the ACM Special Interest Group
Multimedia Chair Ramesh Jain for funding this
experiment. We also thank Bobb Bottomley, who
is responsible for the audio/video technology at
Skamania Lodge where the conference was held.
Lastly, we thank all the speakers who agreed to
be captured for posterity.
References
1. L. Rowe et al., BIBS: A Lecture Webcasting System, Berkeley Multimedia Research Center technical report, Univ. of Calif., Berkeley, 2001; http://bmrc.berkeley.edu/bibs-report.
2. Y. Rui et al., “Automating Lecture Capture and Broadcast: Technology and Videography,” Multimedia Systems J., vol. 10, no. 1, 2004, pp. 3-15.
3. A. Steinmetz and M. Kienzle, “The E-Seminar Lecture Recording and Distribution System,” Multimedia Computing and Networking 2001, Proc. Int’l Soc. for Optical Engineering, vol. 4312, SPIE, 2001, pp. 25-36.
4. W.-C. Feng and K. Mayer-Patel, eds., Proc. 15th Int’l Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM Press, 2005.
5. S. Mukhopadhyay and B. Smith, “Passive Capture and Structuring of Lectures,” Proc. 7th ACM Int’l Conf. Multimedia, ACM Press, 1999, pp. 477-487.
6. SOMA Media, ACM Multimedia 2001: Conference Presentations DVD, ACM Press, 2002.
7. M. Bianchi, “Automatic Video Production of Lectures Using an Intelligent and Aware Environment,” Proc. 3rd Int’l Conf. Mobile and Ubiquitous Multimedia (MUM 04), vol. 83, ACM Press, 2004, pp. 117-123.
8. E. Machnicki and L.A. Rowe, “Virtual Director: Automating a Webcast,” Multimedia Computing and Networking 2002, Proc. Int’l Soc. for Optical Engineering, vol. 4673, SPIE, 2002, pp. 208-225.
Readers may contact the authors at [email protected] and [email protected].
Contact Multimedia at Work editor Qibin Sun at [email protected].