Automated visual image editing system
US007362946B1
(12) United States Patent
     Kowald
(10) Patent No.: US 7,362,946 B1
(45) Date of Patent: Apr. 22, 2008

(54) AUTOMATED VISUAL IMAGE EDITING SYSTEM
(75) Inventor: Julie Rae Kowald, Balmain (AU)
(73) Assignee: Canon Kabushiki Kaisha, Tokyo (JP)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/543,330
(22) Filed: Apr. 5, 2000
(30) Foreign Application Priority Data
     Apr. 12, 1999 (AU) PP9704
     Apr. 12, 1999 (AU) PP9705
     Apr. 12, 1999 (AU) PP9706
(51) Int. Cl. H04N 5/93 (2006.01)
(52) U.S. Cl. 386/52; 386/46; 386/125; 360/13; 369/83
(58) Field of Classification Search 386/52, 54-55, 46, 83, 126, 95, 111, 117, 125; 360/13; 369/83; 348/239, 515
     See application file for complete search history.

Primary Examiner: John Miller
Assistant Examiner: Christopher Onuaku
(74) Attorney, Agent, or Firm: Fitzpatrick, Cella, Harper & Scinto

(57) ABSTRACT
A method of editing a video sequence comprising a series of clips, in which each clip is formed by video content captured between two points in time. Duration data associated with the duration of each clip is extracted from the sequence. The duration data is processed according to at least one predetermined template of editing rules to form editing instruction data, the template indicating a plurality of predetermined edited segment durations, and the editing instruction data being configured to form output edited segments from the clips. The clip is then processed (edited) according to the editing instruction data to form an output edited sequence of output edited segments. Each of the output edited segments has a duration corresponding to one of the predetermined edited segment durations with at least a portion of the clip being discarded by the processing of the clip.

71 Claims, 11 Drawing Sheets
[Drawing Sheet 1 of 11. FIG. 1A: digital video camera 10, record button 12, metadata 14, clip sequence 16. FIG. 1B: analog video camera 20, film stock 26, digitizer 28, clip detector 30, metadata, output signal 34. FIGS. 6A and 6B: audio in, LPF, FWR, trigger.]
[Drawing Sheets 2 to 7 of 11: drawing content not recoverable.]
[Drawing Sheet 8 of 11. FIG. 9: input raw digital footage with metadata; extract metadata and raw digital footage; additional metadata; templates (Standard 10-4, Silent Movie, Romance, Action, User Defined); user selects template; MUX; apply; process; graphics, sound, effects; EDL; beat extract; beat adjust; overdub audio; edited video; display; reproduce.]
[Drawing Sheet 9 of 11. FIG. 10: modem device, video interface, I/O interfaces, processor, memory, storage device (HDD, FDD), CD-ROM, display, printer, keyboard, video input interface, digital video camera, analog VCR.]
[Drawing Sheet 10 of 11. FIG. 11: phrase database, typeset database, graphical database, metadata analysis, user entered metadata, rule-based application, user selection.]
[Drawing Sheet 11 of 11. FIG. 12: input raw digital footage with metadata; how many print frames required?; divide raw clips to provide segments; process segment to derive best frame (rules: audio, face); format best frame into thumbnail; format best frame for printing.]
AUTOMATED VISUAL IMAGE EDITING SYSTEM

FIELD OF THE INVENTION

The present invention relates to the editing of raw motion picture footage and, in particular, to the extraction of information from a sequence of image clips obtained from film or video image information to facilitate editing of the raw footage to provide a desired result. Specific implementations are concerned with the automated editing of the source image materials to provide a rhythmic sequence of clips that captures the essence of the raw footage whilst reducing the playback time so as to avoid reproduction of portions of footage likely to be of little interest, and also with the identification of significant events in the footage, the placement of titles, and the extraction of a series of individual frames for printing which are representative of the original footage.

BACKGROUND

The creation of smooth, rhythmic edited results from raw video or film stock requires specialised skill in order to produce interesting and entertaining results. When dealing with film, typically the film stock is converted into a video format so that the sequence of images can be readily manipulated with computerised assistance. Once the specific sequence is finalised using video editing, the original film stock may be cut and spliced in the traditional fashion, thereby ensuring high quality reproduction. Such a process therefore relates to the manipulation of video (either analog or digital-based), which requires skills in a number of areas including digital film effects, editing and sound design. Such skills are rarely possessed by one person and each takes advanced training, sometimes only ever achieved from years of working in the film production industry.

Amateur video makers rarely have the time, expertise and sophisticated equipment necessary to achieve the results a professional film maker might obtain given comparable source material. The amateur results are, in most cases, only subjectively interesting to participants of the video, and often the interest of non-participant audiences is found to wane early in the screening. Such a lack of interest in many cases arises from the poor application of editing techniques that can otherwise turn somewhat “ordinary” original footage into an entertaining final edited version. Basic editing and production techniques commonly used by professionals that are missing from amateur video include incorporation of attractive titles, a rhythmic approach to editing, the appropriate use of transitions and cuts, sound and backing tracks, the application of digital effects such as colour correction and particle animations, and the application of different shot types.

The editing of original footage requires placing clips in a sequence corresponding to the order in which they were originally derived. Current tools available to amateurs and professionals alike include software that may operate on personal computers (PCs), with or without a video card, and which is configured to manage a linear time line for editing purposes. Hardware such as dual video cassette recorders (VCRs) may be used to allow sequencing from the original source tape to a new tape. Editing by either method is a time consuming task, as both solutions require a “hands on” approach of manually slotting each clip into its place in the sequence. Transitions such as dissolves or cross-fades must also be placed manually and often impose heavy processing demands on computer aided production devices. Also, the correct understanding of transitions and where they should be used is often lacking with respect to the amateur video maker, and often results in inappropriate or excessive use or the draining of resources from the production system, only to achieve an unprofessional result. The current dual VCR approach is fraught with problems. For example, should the amateur wish to amend any part of the video after editing is completed, the entire process must be re-performed.

The placement of titles in the edited video must also be done by first analysing the footage to determine new scene locations. This task requires some time relative to the amount of footage the video maker has available, as the footage must be carefully reviewed with in-out points recorded, and then further time is required for the title mattes to be inserted. To achieve an optimal result, alternate transitions to the rest of the video must be inserted when a new scene is introduced.

Insert titles, or “inter-titles” as they are sometimes known, have been used historically in the production of silent movies to help convey information about characters and the story to the audience in the absence of sound. Insert titles are also used in modern day productions to facilitate comments on action, create humor, set time and location and provide for continuity between otherwise disparate scenes. The current method of producing insert titles has been performed by a person known as a typesetter who is given the written content by a writer of the movie or production. The typesetter is a skilled person who sets out the text either photographically, illustrated by hand or with the use of a desktop publishing system. Words are supplied in most cases by a writer who knows the context of the story and are often written in witty prose or, if conveying the setting of location or time, are generally direct and informative. Insert titles are incorporated into a short list for the editor to then sequence the titles into a movie. The duration of insert titles is largely set according to the number of words and syllables required to be comprehended by the audience. The genre and style of the production also alter the duration of titles, as does the skill of the editor in maintaining continuity within the movie.

As a consequence, producing insert titles in a traditional fashion requires a number of people each with specialised skills. Writing the text for insert titles requires knowledge of the movie story, genre and an understanding of the culture of the audience. Typesetting the text in a fashion that reflects the genre of the movie requires special design skills, and placing the insert title within the movie sequence at an appropriate place requires the specialised skill of an editor. Thus, creating insert titles is a complicated, expensive and time-consuming process.

Current methods of sound editing are highly specialised and the concept of embellishing the final edited rhythm with beat synchronisation is well beyond the scope of most amateur video makers. The time taken to analyse an audio waveform of a chosen sound track and then to synchronise video cuts is prohibitive, the cost of equipment is unjustified for most amateurs, and the techniques are even harder to manage with dual VCR editors.

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more of the deficiencies associated with amateur video production.

SUMMARY OF THE INVENTION
In accordance with one aspect of the present disclosure there is provided a method of editing a video sequence comprising at least one clip, each said clip having a determinable duration, said method comprising the steps of:
extracting from said sequence characteristic data associated with each said clip, said characteristic data including at least time data related to the corresponding said duration;
processing said characteristic data according to at least one template of editing rules to form editing instruction data, said editing rules comprising at least a predetermined cutting format configured to form edited segments based on a plurality of predetermined segment durations; and
processing said video sequence according to said editing instruction data to form an edited sequence of said edited segments.

In accordance with another aspect of the present disclosure there is provided a method of editing a video sequence comprising a plurality of individual clips and associated data including at least time data related to a real time at which said clip was recorded, said method comprising the steps of:
(a) examining said time data for each said clip to identify those of said clips that are associable by a predetermined time function, said associable clips being arranged into corresponding groups of clips;
(b) identifying at least one of a beginning and a conclusion of each said group as a title location;
(c) for at least one said title location, examining at least one of corresponding said time data and further data to generate an insert title including at least a text component; and
(d) incorporating said insert title into said sequence at said title location.

In accordance with another aspect of the present disclosure there is provided a method of extracting a first number of individual images from a video sequence comprising a second number of individual clips, said method comprising the steps of:
(a) dividing said sequence into segments corresponding to said first number, there being a substantially equal number of said segments divided from each said clip; and
(b) for each said segment
(ba) identifying a plurality of video frames within a predetermined portion of said segment;
(bb) processing said frames to select a single representative frame for said segment; and
(c) associating said representative frames to form said extracted images.

Other aspects of the present disclosure are specified later herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B depict the sourcing of digital video clips from each of digital and analog sources;
FIG. 2 provides a presentation histogram of a number of clips which together form original raw footage;
FIG. 3 represents an analysis of the clips of FIG. 2 according to a “10-4” rule defined herein;
FIG. 4 illustrates a segmentation of a clip based upon audio analysis;
FIG. 5 depicts the segmentation of the raw footage of FIG. 2 for use in frame printing;
FIGS. 6A and 6B depict various arrangements for implementing audio analysis;
FIG. 7 depicts a video frame presentation sampled from the clip segmentation of FIG. 5;
FIG. 8 depicts the insertion of titles based on a further example of a clip arrangement;
FIG. 9 is a data flow diagram of a preferred editing method;
FIG. 10 is a schematic block diagram representation of a general purpose computer upon which the arrangements specifically described herein can be practiced;
FIG. 11 is a schematic block diagram representation of an insert title generator; and
FIG. 12 is a flow chart depicting the print frame selection method.

DETAILED DESCRIPTION

The present disclosure includes a number of aspects all intended to assist in the automated editing of raw video footage to permit satisfying reproduction. In one aspect, an automated editing tool provides for rhythmic editing of the raw footage in such a fashion as to provide an edited version which captures the essence of the original raw footage whilst avoiding the inclusion of excessively long video cuts that might be perceived as non-entertaining to the viewer, or that surpass the attention span of the viewer. In another aspect, an arrangement is provided for extracting from video cuts a selection of individual frames representative of the raw footage so that a still-shot summary of the raw footage may be formed. In a further aspect, a method of providing insert titles into the edited versions to distinguish between different stages of the raw footage is disclosed.

Referring to FIGS. 1A and 1B, video footage is typically obtained from either one of a digital video camera 10 or an analog video camera 20. With the digital video camera, depression of a record button 12 results in the digital recording of a video signal upon a recording medium, typically magnetic tape, magnetic disk and/or semiconductor memory. One specific advantage of digital video cameras is that they incorporate an arrangement by which metadata 14 may be generated by the camera 10 automatically and/or manually generated by the user of the camera 10 for inclusion with, and alongside, the recorded digital video. From the digital video camera 10, digital video footage 16 may be output and typically comprises a number of individual clips, represented in FIG. 1A by the numbers 1, 2, 3, 4, . . . . Typically, the metadata recorded with the video includes reference points for the commencement and cessation of any individual clip, often associated with the specific real time at which recording was made. These times, and the date, may be automatically recorded. Other details, for example entered by the user or generated by other metadata tools, may include data as to the location (for example provided by a device such as a GPS locator) and/or event information being recorded at the time and other details as desired. Automatically generated metadata may be inserted into or associated with the clip sequence 16, typically coincident with the depression and/or release of the record button 12. The metadata in this fashion becomes a repository of information that is characteristic of the clip and/or its content.

Turning to FIG. 1B, an analog video camera 20 includes a record button 22 to enable recording of video footage, typically onto a magnetic tape recording medium or the like. A signal 24 may be output from the camera 20 for reproduction and/or editing of the recorded footage. The signal 24 is traditionally provided without any indicators as to the commencement or cessation of any individual clip within the overall footage that has been recorded. This is effectively the same as traditional celluloid film stock which typically has no specific mechanism for recognition of different clips.
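The clip metadata described for the digital camera (commencement and cessation reference points, optional location and event details) can be pictured as a small record per clip. The following is a minimal sketch only: the class name `ClipMetadata`, its field names, and the helper `clips_from_button_events` are hypothetical and do not come from the disclosure, and it assumes the record button yields alternating press and release timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ClipMetadata:
    """Hypothetical per-clip record; all field names are illustrative assumptions."""
    clip_number: int
    start: datetime                  # real time at which recording commenced
    end: datetime                    # real time at which recording ceased
    location: Optional[str] = None   # e.g. from a GPS locator, if available
    event: Optional[str] = None      # user-entered event information, if any

    @property
    def duration(self) -> timedelta:
        # Duration follows directly from the two reference points.
        return self.end - self.start


def clips_from_button_events(events: List[datetime]) -> List[ClipMetadata]:
    """Pair alternating press/release timestamps into clip records (assumed input shape)."""
    if len(events) % 2:
        raise ValueError("expected an even number of press/release timestamps")
    clips = []
    for number, i in enumerate(range(0, len(events), 2), start=1):
        clips.append(ClipMetadata(number, events[i], events[i + 1]))
    return clips
```

Analog footage carries no such reference points, which is why a separate clip detection step is needed before the same kind of record can be formed for it.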
In this regard, the traditional “clipboard” snapped at the commencement of a traditional film shoot is one indicator that is traditionally manually identified by the film editor and is specifically provided for the synchronising of film and audio rather than merely the identification of any one clip.

In order for either analog video derived from the camera 20 or film stock 26 to be processed in a manner akin to the digital video data 16, it is necessary for each of the signal 24 or film stock 26 as appropriate to be input to a digitizer 28 which converts the respective signals into a digital image signal. The output of the digitizer 28 is provided to a clip detector 30 which detects transitions between clips and forms metadata which is combined with the output of the digitizer 28 in a summer 32 to provide a digital video signal 34 effectively comparable to that of the signal 16 derived from the digital video camera 10.

The described arrangements may be implemented as a computer application program hosted in a Windows™ operating system environment developed by Microsoft Corporation. However, those skilled in the art will recognise that the described embodiment may be implemented on computer systems hosted by other operating systems. For example, the preferred embodiment can be performed on computer systems running UNIX™, OS/2™ or DOS™. The application program has a user interface which includes menu items and controls that respond to mouse and keyboard operations. The application program has the ability to transmit processed data to one or more displays, printers or storage arrangements, either directly connected to a host computer or accessed over a network. The application program also has the ability to transmit and receive data to a connected digital communications network (for example the “Internet”).

The described arrangements can be practiced using a conventional general-purpose (host) computer system, such as the computer system 40 shown in FIG. 10, wherein the application program discussed above and to be described with reference to the other drawings is implemented as software executed on the computer system 40. The computer system 40 comprises a computer module 41, input devices such as a keyboard 42 and mouse 43, and output devices including a printer 57 and an audio-video output device 56. A Modulator-Demodulator (Modem) transceiver device 52 is used by the computer module 41 for communicating to and from a communications network 59, for example connectable via a telephone line or other functional medium. The modem 52 can be used to obtain access to the Internet and other network systems.

The computer module 41 typically includes at least one processor unit 45, a memory unit 46, for example formed from semiconductor random access memory (RAM) and read only memory (ROM), and input/output (I/O) interfaces including an output interface 47 and an I/O interface 48 for the keyboard 42, the mouse 43 and optionally a joystick (not illustrated). The output interface 47 couples to an audio-visual output device 56 typically incorporating a video display and a loudspeaker arrangement. A storage device 49 is provided and typically includes a hard disk drive 53 and a floppy disk drive 54. A CD-ROM drive 55 is typically provided as a non-volatile source of data. The components 45 to 49 and 53 to 55 of the computer module 41 typically communicate via an interconnected bus 50 and in a manner which results in a conventional mode of operation of the computer system 40 known to those in the relevant art. Examples of computers on which the embodiments can be practiced include IBM-PCs and compatibles, Sun SPARCstations or similar computer systems evolved therefrom. Typically, the application program of the preferred embodiment is resident on a hard disk drive 53 and read and controlled using the processor 45. Intermediate storage of the program and any data fetched may be accomplished using the semiconductor memory 46, possibly in concert with the hard disk drive 53. In some instances, the application program may be supplied to the user encoded on a CD-ROM or floppy disk, or alternatively could be read by the user from the network via the modem device 52.

In particular, the digital audio stream 16 or raw footage 24 may be provided to the computer 41 in any appropriate manner including via a computer network and the modem 52, by means of a portable memory device such as CD-ROM 55, or directly, for example to a “video” input of the I/O interface 48. In this fashion, the entirety of the raw video footage including each of the clips is available for computerised processing within the computer 41.

As seen in FIG. 10, the modem device 52 allows for connection to a network 59 which may act as a source of digital video information including both video images and an accompanying audio track. Alternatively, a video input interface 90 may be provided which includes a digital input 91 configured to receive digital video information, for example from a digital video camera 10, and an analog input 92 configured to receive video information 93 and audio information 94, each in an analog format, from a device such as an analog video cassette recorder 95 or an analog video camera 20. The signals 93 and 94 are input to respective analog-to-digital converters 96 and 97, the outputs of which, like the digital input 91, are applied to the system bus 50 via an isolating buffer 78. Clip detection as shown in FIG. 1B may be performed by software within the computer module 41 so that metadata-enhanced digital video sequences comprising images and audio tracks comparable to the sequences 16 and 34 of FIGS. 1A and 1B may be stored within the system 40 for editing and other manipulation and reproduction via the output interface 47 and the audio-video output device 56.

Rhythmic Sequence Editing

FIG. 2 shows a histogram 300 representing a sequence of various video clips (numbered 01-16) obtained from a particular item of footage, in this case an excursion to a naval museum. It is seen from FIG. 2 that a total of 16 individual clips were taken, each of varying duration from a minimum of about 4 seconds (clip 15) through to 29 seconds (clip 09). A delineation between each of the individual clips is provided by the metadata mentioned above stored in association with each clip.

If viewed in linear (time line) order, portions of the original footage are likely to be construed as being boring, uninteresting and the like. Rhythmic sequence editing relates to the editing of the raw video footage in a way that enhances viewer appeal. Through careful review of professional edited productions, it was determined that the interest of an audience tends to wane after certain, relatively short periods of time, particularly where there is little or nothing in particular upon which the images are focussed. This was appreciated as particularly being the case in domestically produced (amateur) video productions where the content recorded typically has more relevance to the actual film maker than to any future audience, which is often comprised of family, friends or colleagues. This is to be distinguished from professional productions such as feature films, telemovies and the like where characters and/or action can maintain the interest of an audience even over what might be considered as an excessively long clip that may take numerous minutes to conclude.
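The timing rules set out in this section (ignore the first second of each raw clip, cut alternating 4 second and 10 second segments, leave 2 seconds between segments taken from the same raw clip, and reject a leftover shorter than 70% of the wanted segment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `Segment` and `cut_10_4` are hypothetical, the alternation is assumed to carry over from one raw clip to the next, and only leftovers between 70% and 100% of the wanted duration are kept for later time expansion.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List

SKIP = 1.0                 # seconds ignored at the start of every raw clip
GAP = 2.0                  # seconds left between segments cut from one raw clip
DURATIONS = (4.0, 10.0)    # alternating target segment lengths
MIN_FRACTION = 0.7         # leftovers shorter than this share of the target are rejected


@dataclass
class Segment:
    raw_clip: int    # 1-based index of the source raw clip
    start: float     # offset into the raw clip, in seconds
    length: float    # amount of raw footage used, in seconds
    target: float    # intended playback duration, in seconds


def cut_10_4(clip_lengths: List[float]) -> List[Segment]:
    """Cut alternating 4 s / 10 s segments from raw clips given their lengths in seconds."""
    targets = cycle(DURATIONS)
    target = next(targets)
    segments = []
    for index, clip_len in enumerate(clip_lengths, start=1):
        pos = SKIP
        while True:
            available = clip_len - pos
            if available >= target:
                length = target
            elif available >= MIN_FRACTION * target:
                length = available   # assumption: stretched to `target` on playback
            else:
                break                # assumption: same target is tried on the next clip
            segments.append(Segment(index, pos, length, target))
            pos += length + GAP
            target = next(targets)
    return segments
```

For a 7 second raw clip followed by a 20 second raw clip, this yields a 4 second segment from the first clip, then a 10 second and a 4 second segment from the second, which agrees with the first three edited clips described for FIG. 3.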
A number of rules were then determined which may be applied to any individual clip in an automated fashion so as to achieve a best chance of reproducing the interesting content of any individual clip. The rules determined by the present inventor are effectively two-fold. Firstly, the present inventor has determined that, more often than not, the first portion of a clip, obtained immediately after depression of the record button 12 or 22 as the case may be, is typically of little interest or of poorer quality in amateur circumstances as this is usually the time taken by the user to focus the camera onto the subject of the clip. This typically occupies approximately one second of the clip and, for this purpose, a first rule used in editing in a preferred implementation is to ignore the first second of any one clip. It is noted that the period of one second is relative and may be varied according to the duration of the clip in question or of the clips that form the original footage.

The second substantive rule is to divide the remainder of the clip into segments with each segment being one of a predetermined number of intervals each having a specific time period. In this regard, the present inventor has determined that dividing a clip into segments, each of a predetermined time period, and editing out other portions of the clip which do not match the predetermined time period, allows for an effective compression of the amount of footage to be reproduced, whilst maintaining the essence of the clip and the linearity of the overall footage. In the preferred implementation, the present inventor has determined that clip segments of duration of about 4 and 10 seconds are best used for the editing of domestic (amateur) video productions. It will be apparent that these time periods may be altered depending upon the specific requirements of the user, the type of source material provided, or, where one is used, the type of editing template selected (to be described below).

FIG. 3 shows a clip analysis according to the above-noted rules for the naval museum excursion depicted in FIG. 2. As can be seen from FIG. 3, the raw footage is indicated again as comprising sixteen raw clips (01-16) with each clip being divided in some way into one or more edited clips (001-026) desired for reproduction. From FIG. 3 it will be apparent that each of the edited clips (001-026) commences no sooner than 1 second into each of the raw clips (01-16). Further, the first raw clip (clip 01), which is seen as being approximately 7 seconds long, is edited to provide a first clip segment (clip 001) of a 4 second duration. Since the remainder of clip 01 of the raw footage is not sufficient to accommodate another edited segment, the next edited segment is derived from the next raw clip 02. In this particular example, editing takes place using alternate 4 second and 10 second clips and this is seen in respect of the second raw clip 02 where a 10 second edited segment 002 is extracted from that clip. Further, since the second raw clip 02 has a duration of 20 seconds, this provides a mechanism whereby a further 4 second clip 003 may be edited from the second raw clip 02. As can be seen from FIG. 3, a predetermined time period, in this embodiment of 2 seconds, is provided to separate edited clips derived from any one raw clip.

In this fashion, each of the raw clips 01-16 of the naval museum excursion is edited using alternate 4 and 10 second segments as required. As will be apparent from FIG. [...] rhythmic sequencing method acts to reject any edited clip which would be less than 70% of the sequence duration of the desired segment intervals. In a preferred embodiment, where a portion of a clip is between 70% and 200% of the desired segment duration, the portion may be modified (by time compression or expansion) so that the reproduction time of the modified portion, which forms an edited clip, matches that of an appropriate 10-4 second segment. For example, a software product marketed under the name QUICKTIME may be used to provide for the compression/expansion of video time frames over the range of about 25-400%.

From the example of FIG. 3, the total duration of the raw footage is 327 seconds spanning 16 raw clips and, as illustrated, this is edited to provide 26 edited clips spanning a total of 176 seconds of reproducible images. As a result, the overall play duration of the edited version is almost halved compared to the original footage. Further, the edited footage provides a rhythmic 4-10 second change between clips to maintain audience interest.

Based on the foregoing, a system for the presentation of a collection of clips can be based on the creation of a profile of the duration of clips and other time related metadata in order to apply a selected rule set, termed herein a “template”. A hierarchy of rules may be embedded in the template to accommodate clips of varying duration. For example, clips of only a few seconds or even frames can thus be managed in a manner different to those of hours or many minutes of duration.

Further, the manner in which individual segments are edited from the original footage may be varied according to the actual content of the footage. For example, whereas FIG. 3 utilises specific timing rules for the selection of edited clips from raw clips, alternative selections can be made. For example, as illustrated in FIG. 4, analysis of an audio track which accompanies the original raw video can be used to identify areas of interest. Examples of such areas of interest include the cheer of a crowd at a sporting event or the sound of a speaker at a conference. In this fashion, analysing the audio track to identify passages of increased audio level provides a point at which clip selection may be made, either commencing at that point or straddling that point so as to obtain the relevant and probably interesting content before, including and following the audio peak.

Although audio detection for identification of interesting [...] values compared to a predetermined threshold, it is often advantageous for that threshold to be variable and reflective of a background noise level rather than total noise level. With this, the system may generate a profile per presentation of a clip collection or on an individual clip basis, for thresholded peak examination and identification.

FIG. 6A illustrates an arrangement 60 that allows for the discrimination of audio peaks in the presence of substantial background noise, which may be of a highly variable nature. An audio signal 62 is input to a non-inverting input of a comparator 64 and also to a low pass filter 66. The time constant of the low pass filter 66 is set at a period sufficient to filter out low level background noise not desired for
3, the number of edited segments derived from any indi
vidual raW clip is dependent upon the duration of the original
raW clip. Further, as is apparent from raW clip 15, since that
clip is of a duration less than 5 seconds, the rules described
above do not permit editing any resulting clip from that raW
footage. This is because removing the ?rst 1 second of clip
15 leaves less than 4 seconds Which is less than the desired
10 second segment of the sequence. In this example, the
clip segments can be performed merely by examining peak
60
triggering the provision of metadata or the like. The output
of the loW pass ?lter 66 is provided to an inverting input of
the comparator 64 and provides What is in effect an audio
signal averaged over the time constant of the loW pass ?lter
66. The comparator 64 acts to compare the average and
65
instantaneous audio signals to provide a trigger signal 68
indicative of When the instantaneous signal exceeds the
average. The trigger signal 68 may be included With the
video sequences as (further) metadata.
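
As a rough digital illustration of the FIG. 6A arrangement (a sketch only, not the circuit of the specification), the low pass filter 66 can be modelled as a one-pole running average and the comparator 64 as a per-sample comparison. The function name, the assumption that the input is already a sequence of non-negative level samples, and the 2 second default time constant are hypothetical choices made for this example.

```python
def adaptive_peak_trigger(levels, sample_rate, time_constant=2.0):
    """Return one boolean per sample, modelling the trigger signal 68.

    levels        -- non-negative audio level samples (assumption: already rectified)
    sample_rate   -- samples per second
    time_constant -- averaging period of the low pass filter in seconds
                     (hypothetical default, not taken from the specification)
    """
    if not levels:
        return []
    # One-pole low pass filter coefficient, clamped so the filter stays stable.
    alpha = min(1.0, 1.0 / (time_constant * sample_rate))
    average = levels[0]  # start from the first sample to avoid a start-up trigger
    trigger = []
    for level in levels:
        # Comparator: instantaneous level against the running average.
        trigger.append(level > average)
        # Low pass filter update (exponential moving average).
        average += alpha * (level - average)
    return trigger


# A quiet background with a short burst triggers only during the burst.
levels = [0.1] * 50 + [1.0] * 3 + [0.1] * 50
peaks = [i for i, t in enumerate(adaptive_peak_trigger(levels, sample_rate=10)) if t]
print(peaks)  # prints [50, 51, 52]
```

Because the threshold follows the averaged signal rather than a fixed value, a steadily noisy background raises the threshold with it, which is the behaviour the variable threshold is intended to provide.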
FIG. 6B illustrates a more refined audio detection arrangement 70. An
audio input signal 84 is input to a full wave rectifier 82 which
provides a full wave rectified signal 72 to a pair of low pass filters
74 and 76, each having corresponding time constants τ1 and τ2. The low
pass filters 74 and 76 output to respective inputs of a comparator 78
which is also configured to provide a trigger signal 80. With this
configuration, the time constants τ1 and τ2 may be set to provide
appropriate discrimination between background noise and desired audio.
For example, τ2 may be set to a relatively long period (eg. 5 seconds)
so as to provide a fair representation of background noise, thus
allowing for its discrimination. τ1 may be set to a lower period
sufficient to allow for the detection of desired noise content (eg.
cheering of a crowd or a desired speaker’s voice) whilst still
providing for discrimination from momentary transient sounds. In this
regard, τ1 may be set at a time period of approximately 0.5 seconds.
As a consequence, the circuit 70 operates to provide a trigger signal
80 that varies between two logic levels sufficient to provide a marker
or metadata as to when a desired audio peak is established. This
metadata may be combined with the raw video footage and used in the
clip analysis procedures for identifying segments of interest for
possible selection in the ultimate edited version.

The arrangements of FIGS. 6A and 6B may be implemented using analog
electronics, for example at an input of the audio-ADC 97.
Alternatively, the arrangements may be implemented digitally, either
by hardware (a DSP device configured within the computer module 41) or
software (operating within the computer module 41) that implements
digital filtering and level comparison.

Further, the editing of raw footage may be substantially, or at least
perceptually, synchronised to an audio track intended to be dubbed
over the edited footage. This involves examining the audio track to
identify an appropriate beat and adjusting the reproduction rate of
either one or both of the audio or the video to achieve perceptual
synchronism. For example, music having a beat of 120 beats per minute
has 2 beats per second which divides equally into any rhythmic
sequence having edited clips of duration which is an integer multiple
of 0.5 second, such as the 10-4 sequence described above.

With the foregoing described automatic detection methods, and others
to be described, it is thus possible to process raw video footage
comprised of one or more clips to identify portions of interest which
may form clip segments in an edited production that provides a
rhythmic sequence of images more likely to attract and maintain the
interest of a viewer.

According to a particular implementation, the actual rules applied in
the interpretation of any raw video signal are established by a
template arrangement which provides for the creation of edited video
sequences based upon predetermined video production styles and which
may be suited to different types of raw video image. Examples of
templates each incorporating predetermined combinations of editing
rules which may be used to edit raw video images to provide an edited
sequence include:
standard 10-4 format,
music video,
music trailer,
quick look summary,
romance, and
action.

Each different template is configured to provide a stylistically and
structurally different result and is selected by the user to provide
an interpretation of the raw video footage whether or not that raw
video footage may be suited to the particular template selected.

EXAMPLE 1
Standard Template

The standard template is one that may be applied to provide a basic
editing of a wide variety of source footage. The various attributes of
the template are as follows:

(i) Sequence:
Sequence is a time basis upon which the footage is cut to give a final
edited result. Specifically a line sequence may specify the actual
duration of edited clips, which in the above example accords to a 10-4
second format. Other formats such as 12-4 or 12-6 may alternatively be
used.

(ii) Duration:
Duration is generally determined by the number and duration of clips
in the raw footage. The overall edited sequence duration may be forced
to map to the duration of an accompanying audio track intended to be
dubbed into the edited video. Such may not however be appropriate for
audio tracks longer than seven minutes.

(iii) Transitions:
Transitions between edited clips are preferably achieved using a four
frame cross fade between each clip.

(iv) Cutting Rule:
In a preferred implementation, a number of cutting rules are applied
as follows:
(a) Clips are cut in chronological order.
(b) Remove one second from the beginning and end of each original clip
before determining a new clip cut length.
(c) Add a 12 frame cross fade between two edited clips taken from the
same original raw clip.
(d) Where possible apply the 10-4 rhythmic cutting sequence.
(e) When the duration of the clip allows more than one clip to be cut,
always ensure the remaining duration allows for 1 second to be omitted
from the end, and 4 seconds to be omitted from between the two clips.

Cutting Rule Example - Standard Template
If the first raw clip is less than 7 seconds, cut to 4 seconds. If the
raw clip is 7 seconds or more, but less than 10, time stretch the
original raw clip to 12 seconds and then cut the stretched clip down
to provide a 10 second (somewhat slower motion) clip. If the next
original raw clip is 14 seconds or more, and less than 20 seconds,
omit the first second and cut the next 4 seconds, omit the next 4
seconds, cut the next 4 seconds, omit the remainder until the end of
the raw clip. If the next raw clip is 20 seconds or more, omit the
first second, cut 4 seconds, skip the next 4 seconds, cut the
remaining 10, omitting the remainder up to 27 seconds. If the next
clip is 28 seconds or more, omit the first second, cut 4 seconds, skip
the next 4 seconds, then cut 10 seconds, omit the next 4 seconds, cut
4 seconds, and omit the remainder up to 38 seconds.

(v) Effects:
This relates to any visual effects that may be applied to the video
footage. In the standard template no effects are applied.
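
The thresholds of the Cutting Rule Example for the standard template (item (iv) above) can be transcribed as a simple lookup. The following is only a sketch under stated assumptions: the names `Cut` and `standard_template_cuts` are hypothetical, clips under 5 seconds are rejected following the FIG. 3 discussion of raw clip 15, and the handling of clips of 10 to under 14 seconds and of clips longer than 38 seconds is an assumption, since the example does not cover those ranges.

```python
from typing import List, NamedTuple, Optional


class Cut(NamedTuple):
    start: float                        # seconds from the start of the (possibly stretched) raw clip
    length: float                       # seconds of footage kept
    stretch_to: Optional[float] = None  # if set, the raw clip is first time stretched to this duration


def standard_template_cuts(duration: float) -> List[Cut]:
    """Edited segments for one raw clip, following the standard template example."""
    if duration < 5:
        # Assumption from the FIG. 3 discussion: too short for a 4 second cut
        # once the first second is removed.
        return []
    if duration < 7:
        return [Cut(1, 4)]
    if duration < 10:
        # Stretch to 12 seconds, then drop the first and last second.
        return [Cut(1, 10, stretch_to=12)]
    if duration < 14:
        # Assumption: this range is not covered by the example; a single
        # 4 second cut is used here.
        return [Cut(1, 4)]
    if duration < 20:
        return [Cut(1, 4), Cut(9, 4)]    # omit 1, cut 4, omit 4, cut 4
    if duration < 28:
        return [Cut(1, 4), Cut(9, 10)]   # omit 1, cut 4, omit 4, cut 10
    # The example covers clips up to 38 seconds; applying the same pattern
    # to longer clips is an assumption.
    return [Cut(1, 4), Cut(9, 10), Cut(23, 4)]


for seconds in (6, 8, 15, 20, 30):
    print(seconds, standard_template_cuts(seconds))
```

Each threshold leaves 1 second unused at the end of the raw clip and 4 seconds between consecutive cuts, in line with rules (b) and (e).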
(vi) Time Stretching:
Time stretch the last clip of the edited video up to 200% to make a
new duration of 12 seconds. Omit the first and last seconds of the
clip by cutting it down to 10 seconds. Fade out to black or template
default for the last 3 seconds.

(vii) Audio:
The audio is beat stretched to suit the sequence (either increased or
decreased to achieve the best possible match).

(viii) Mattes:
(a) An editable title matte is placed in sequence during the first 10
seconds, from which a fade occurs to the first clip. An editable “The
End” matte is provided in sequence at the conclusion of the edited
clip.
(b) Editable scene and cast mattes may be provided and need not be
placed in sequence.

EXAMPLE 2
Romance Montage

(i) Sequence: 12-4 seconds
In this regard, romance type footage is typically more sedate, and
thus the sequence duration is extended slightly compared to the 10-4
sequence to give a more relaxed, slower pace.

(ii) Duration:
Duration is generally determined by the number and duration of clips
in a directory. The duration sequence can be forced to map to an audio
track duration although this is not recommended for tracks longer than
seven minutes.

(iii) Transitions:
For 12 second clips, fade-in to the next clip from 0 to 100% opaque
within the last 2 seconds before the current clip ends. Use a four
frame cross fade between each clip.

(iv) Time Stretching:
(a) Slow the speed of clips by stretching the duration to 150% thus
giving a more relaxed, romantic feel.
(b) Stretch the speed of the last clip up to 200% to make a new
duration of 12 seconds (creating the effect of slow motion), omit the
first and last second of the clip by cutting it down to 10 seconds,
and apply a fade out to black template over the last 3 seconds of
those 10 seconds.

(v) Cutting Rule:
(a) Cut in chronological order.
(b) Remove 1 second from the beginning and end of each original clip
before determining new clip cut length.
(c) Add a 2 second cross fade between the two clips taken from the
same shot.
(d) When the duration of a clip allows more than one clip to be cut,
always ensure the remaining duration allows for 1 second to be omitted
from the end and 4 seconds to be omitted from between the two clips.

Cutting Rule Example - Romance Montage
If the first raw clip is less than 8 seconds, cut to 4 seconds. If the
clip is 8 seconds or more but less than 12 seconds, time stretch to 14
and cut down to 12 seconds. If the next raw clip is 14 seconds or more
and less than 20 seconds, omit the first second, cut the next 4
seconds, omit the next 4 seconds, cut the next 4 seconds, omit the
remainder until 20 seconds. If the next raw clip is 20 seconds or
more, omit the first second, cut 4 seconds, skip the next 4 seconds,
then cut the remaining 12 seconds, omitting the remainder up to 27
seconds. If the next raw clip is 28 seconds or more, omit the first
second, cut 4 seconds, skip the next 4 seconds then cut 12 seconds,
omit the next 4 seconds, cut 4 seconds, omitting the remainder up to
38 seconds.

(vi) Effects:
Utilise an animated fog filter to provide a misty “romantic”
appearance.

(vii) Audio:
Beat stretch/compress the audio to suit the video sequence so as to
increase or decrease to achieve the best possible match.

(viii) Mattes:
(a) Editable title matte placed in sequence duration 10 seconds with a
fade to the first clip.
(b) Editable “The End” matte provided in sequence.
(c) Editable scene and cast matte provided but not placed into any
particular sequence.

EXAMPLE 3
Music Video Clip

(i) Sequence:
The sequence in this example is dependent on the audio beat, since
generally the video is intended to complement the audio, not vice
versa (as it sometimes may appear). For example, for music with less
than 100 beats per minute, the 10-4 sequence is used as a basis. For
beats equal to or exceeding 100 beats per minute, an 8-3 basis
sequence can be used. In each case the actual clip intervals are
adjusted to permit substantial beat synchronisation. For example,
music at 96 beats/minute gives 1.6 beats/second, so the footage may be
cut in a sequence of 10 seconds and 3.76 seconds thereby approximating
16 and 6 beats respectively and providing perceptual synchronism.

(ii) Transitions:
General four frame cross fade between each clip.

(iii) Duration:
Duration of the cut sequence is forced to map to audio track duration.
This is not recommended for tracks longer than six minutes.

(iv) Cutting Rule:
(a) Cut in chronological order.
(b) Remove 1 second from the beginning before determining a new clip
cut length.
(c) Add a 12 frame cross fade between clips taken from the same shot.
(d) Apply the (eg. 10-4) rhythmic cut sequence.
(e) When the duration of a clip allows for more than one clip to be
cut, always ensure the remaining duration allows for one second to be
omitted from the end and 4 seconds to be omitted from between the two
clips.

Cutting Rule Example - Music Video (for the 10-4 Sequence)
If the first raw clip is less than 7 seconds, cut to 4 seconds. If the
clip is 7 seconds or more but less than 10 seconds, time stretch to 12
seconds and cut down to 10 seconds. If the next raw clip is 14 seconds
or more and less than 20 seconds, omit the first second, cut the next
4 seconds, omit the next four, cut the next 4 seconds, omit the
remainder until 20 seconds. If the next raw clip is 20 seconds or
more, omit the first second, cut 4 seconds, skip the next 4 seconds
and then cut the remaining 10 seconds, omitting any remainder up to 27
seconds. If the next raw clip is 28 seconds or more, omit the
first second, cut 4 seconds, skip the next 4 seconds, then cut 10
seconds, omit the next 4 seconds, cut 4 seconds, omitting the
remainder up to 38 seconds.

(v) Effects: None.

(vi) Time Stretching:
For a 10-4 sequence, time stretch the last clip up to 200% to make a
new duration of 12 seconds, omit the first and last second of the clip
cutting it down to 10 seconds. Fade out to black or template default
for the last 3 seconds.

(vii) Audio:
Although not preferred in order to ensure audio integrity, the beat
may be stretched or compressed to suit the sequence and obtain a best
possible match.

(viii) Matte:
(a) Editable title matte placed in sequence duration 10 seconds for
the first clip, fade into the first clip.
(b) Editable “The End” matte provided in sequence.
(c) Editable scene and cast matte provided but not placed in sequence.

EXAMPLE 4
Quick Look Template

The Quick Look template provides the user with a short running preview
of all of the footage that may be presented within raw video content.
The Quick Look template provides the preview within a designated
default time period, for example 30 seconds, or to a time period
specified by the user. The rhythmic editing sequence is applied to
accommodate the original raw footage of a duration many times longer
than the predetermined time period (30 seconds) by cutting clips to
short durations of only frames in length. Clip speed may be altered in
the range 100-600% for example and cut durations may range from a few
frames to a few seconds. A variety of Quick Look templates may be
formed as desired.

Quick Look Example 1
Clips may be cut into segments of ten frames and four frames in a
fashion corresponding to the 10-4 rule mentioned above. In order to
present more footage into these short durations, the footage is
stretched sometimes up to 300% of the original play speed, and in some
cases, the frame rate of the original footage is reduced. For example,
using the arrangement shown in FIG. 3 where it was indicated that 176
seconds of standard edited clips were derived using the 10-4 second
rule, those same clips may be processed to extract ten and four frame
segments from each clip giving 176 frames for reproduction. At a frame
rate of, say, 25 frames per second as used in the PAL reproduction
system, this equates to approximately 7 seconds of replay time.
According to the Quick Look implementation, selected ones of the four
and ten frame segments, or alternatively their entirety, are stretched
to achieve the 30 second preview time. The user can select a longer
“preview” of the raw footage and can adjust system parameters such as
frame rate and time stretching.

Quick Look Example 2
In this example, a rhythmic cutting rule such as the 10-4 rule is not
applied. The user specifies the duration of the Quick Look, which
generally will be related to the overall length of the source footage.
For instance, 5 minutes of raw footage may be desired to be compressed
into 30 seconds of Quick Look reproduction. The user can adjust the
cut lengths to be an even fraction of the overall duration. For a 30
second output, this may be formed of 30 one second segments spliced
together. Each segment may be obtained by dividing each clip into 3
second portions, each separated by a 1 second (waste or cut) portion.
Each 3 second portion may be compressed in time by 300% to give the
desired reproduction duration. Thirty of these portions are then used
to form the Quick Look preview. Where the raw clips are of varying
duration, it may be desirable in the template to ensure a portion is
extracted from each raw clip.

Quick Look - Comparative Example
This Example compares a number of Quick Look sequence rules against an
Action sequence, as seen in Table 1 below:

TABLE 1

Rule                                                 Action    Quick Look
set ‘IN’ point from start of clip                    1 sec.    1 sec.
set ‘OUT’ point from end of clip                     2 sec.    2 sec.
period to pass before setting next ‘IN’ point in
  same clip when duration allows                     2 sec.    1 sec.
clip speed                                           100%      200%
skip clip if duration x% is smaller than cut
  duration                                           70%       70%
number of cuts that make a rhythm cycle              2         no cycle
duration of cut in rhythm cycle                      10 sec.   -
duration of cut in rhythm cycle                      4 sec.    -

The above Examples only describe a few different template arrangements
which may achieve a certain style of edited video reproduction. It
will be appreciated by those skilled in the art that the above rules
regarding automated editing can be modified to establish alternative
template configurations. An example of this is where different ranges
of compression/stretch may be used based on the particular genre being
manipulated. Examples of other types of templates can include those
that reflect various established movie styles, such as “martial arts”,
“sci-fi”, “horror”, “war” and “western”. Other styles such as “sports”
and “action” may be used. Where desired, multiple templates may be
applied to raw footage. For example, raw footage may be edited
according to the romance template, and the edited version then further
edited according to an action template. Further, where multiple
templates are used in sequence or combined, a hierarchy of the various
rules may be applied, not to override any particular effect, but to
permit priority ordering of the rules and their application.

In each instance, the particular style is founded upon the use of a
number of standard edited clip lengths (eg. 10-4, 12-4) which have
been found to be generally applicable to the style of edited
reproduction that is desired. Although the above templates supply only
two base clip lengths, a further number of clip lengths may be used
depending upon the particular circumstances. For example, wild life or
scenic footage may be well suited to the editing of longer clips, at
least intermittently with clips of shorter duration. Further, although
the 10-4 and 12-4 format is preferred in the various templates
described above, other durations may be used. Typically, the “shorter”
duration will be of a period of between 1 and 8 seconds with the
longer duration being between 12 and 20 seconds. Also, the 1 second
cutting from the commencement of each clip can be varied. Typically,
any period between 0.5 and 2 seconds may be used. Further, the 2
second interval between the selection of edited segments may be
varied. Durations of 1 to 5 seconds may be appropriate. Further,
whereas the embodiment of FIG. 3 depicts
alternate 4 and 10 second edited clip lengths, the selection between
the various clip lengths may be modified to an alternative pattern for
example, short, short-short-long or alternatively, a particular
duration for use may be selected on a pseudo-random basis. Such may be
found to be useful where there are more than two base durations.

The system cuts the raw footage according to the chosen template
structure using its rhythmic sequence in appropriate transitions, beat
synchronised music and to add digital effects. Examples of such
effects include altering the original colour palette, fog filtering
the image, and distorting the image. In this fashion, the rhythmic
sequence editing system described herein applies the skills of a film
editor, sound editor and special effects editor to the source video
taken by an amateur thereby allowing the amateur to be free to direct
the rearrangement of the video to modify, adjust or simply appreciate
the results. The process of applying these effects to the raw video is
fast and is well suited to off-line (ie. non real-time) processing
within the computer system 40. Such a process also frees the amateur
film maker to make higher level decisions regarding the content of the
edited video rather than consuming time through the repetitive task of
placing transitions and in and out points in their clips. The
arrangement also permits real-time operation. For example, for a given
raw video source, a desired template may be selected by a simple
keystroke or clicking of the mouse 43 which results in the automatic
editing of the video source by the system 40 and a rendering of the
edited sequence to the output device 56 for immediate viewing by the
user. Further, multiple windows may be operated permitting
simultaneous real-time editing and reproduction of source video
according to multiple templates. This, for example, can permit a
real-time comparison between a 10-4 template and a 12-4 template thus
permitting the user to select an output that is more appealing.

In operation, the rhythmic sequencing editing system described
achieves edited results by examining the metadata associated with the
raw video footage to produce an edit decision list (EDL) which
represents a combination of the information from the above-noted
templates. Since fundamental rhythmic sequence processing can be
performed solely upon the metadata, which includes clip number,
duration, frame number and the like, and without any knowledge or need
to access the actual video content, evaluation of the edit decision
list can be achieved quickly and without requiring the video maker to
devote (typically hours of) time setting appropriate in and out
points. Once the edit decision list is created, the list is applied to
the raw footage to select the appropriate bit sequences for
reproduction. This may be performed in real-time or alternatively by
copying the edited output to a contiguous reproduction file.

FIG. 9 depicts a data flow arrangement for a method of editing raw
video footage in accordance with a specific implementation. Raw
digital footage comprising video images and any accompanying audio
track is input at step 100 together with metadata associated with that
footage and identifying at least the various clips within the footage.
The input footage is provided to an extraction process 102 that
extracts the metadata 104 from the raw digital footage 106. The raw
digital footage 106, including images and any accompanying audio, is
stored in an appropriate manner, typically on a magnetic hard disk
storage arrangement 108. Where the particular configuration permits,
the input raw footage may be derived from the store 108, as indicated
by the line 154.

Where it is desired for further metadata to be derived from the raw
digital footage, that footage is extracted from the storage 108 and
provided to a metadata determination process 110 which acts to process
the raw digital footage 106 so as to extract additional metadata 112
for combination with the original metadata 104 in a summing
arrangement 114. The metadata extraction process 110 may include an
audio extraction arrangement such as those indicated in FIG. 4, FIG.
6A or FIG. 6B. Alternatively, or additionally, other data extraction
processes may be used. These may include comparison of individual
frames of the raw footage to identify motion of parts of the image and
any collision therebetween, such for example being useful for the
provision of captioned graphics and/or sound effects. Other metadata
extraction processes include face detection, line detection and motion
detection, to name but a few. As seen in FIG. 9, the process 110
includes an input 111 that permits the user to select a particular
metadata extraction process to be performed. Where metadata 112 is
extracted using the process 110, such may be saved in the store 108
with the raw video alongside existing metadata. By default, no
additional metadata extraction processes are performed. The summer 114
outputs combined metadata 116 to an application module 118.

In order for beat synchronisation to be performed, an overdub audio
source 136 is analysed by a beat extraction process 138 that
identifies the beat of the source 136 which may be used in rhythmic
sequence editing. The extracted beat 139 is input to the application
module 118.

Also input to the application module 118 is a specific editing
template 120 selected by the user via a multiplexer 122 from a
repository of templates 124. As seen in FIG. 9, in addition to
predetermined templates that may be provided direct to the user,
provision is also included for the user to define their own template
structure which may be an original creation or alternatively a
modification of one or more existing templates. The application module
118 applies the selected template to the metadata 116 and extracted
beat 139 to form the edit display list (EDL) 126 which represents the
actual clip segments and their corresponding periods to be selected
from the raw digital footage for reproduction in the final edited
version. The edit display list 126 also includes an input 128
permitting the user to edit any title segments associated with the
edited version.

The combined metadata 116 may be represented as a list and retained
with the edit display list 126 and may be used to mark edited clips of
importance in the final edited sequence.

The edit display list 126 is input to a further application module 130
which interprets the edit display list to cut the raw digital footage
stored in the storage 108 and extract appropriate edited segments. The
application module 130 also extracts graphics, including animation and
captions, together with any appropriate sound effects from a storage
132 for combination with the edited video to provide an edited video
output 134. Where appropriate, the edit display list 126 can output
beat control commands 156 to a beat adjustment unit 158 which is
configured to alter the reproduction rate of the overdub audio source
136 so as to match the rhythmic sequence editing formed by the
application module 130. It will be appreciated in this regard that in
some instances it may be appropriate to substantially match the audio
reproduction rate to specific edit intervals (eg. 10-4) or
alternatively adjust the edit intervals (eg. from 12-4 to 11.5-3.5) to
substantially match the beat of the audio source 136.

The edited video 134 may then be combined in a summing unit 140 with
the overdub audio track derived either directly from the source 136 or
the beat adjustment unit 160 as required. The summing unit 140 outputs
edited audio-visual