US 20070245375A1

(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2007/0245375 A1
     Tian et al.                       (43) Pub. Date: Oct. 18, 2007

(54) METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING
     CONTENT DEPENDENT MEDIA CONTENT MIXING

(75) Inventors: Jilei Tian, Tampere (FI); Jani Nurminen, Lempaala (FI)

     Correspondence Address:
     ALSTON & BIRD LLP
     BANK OF AMERICA PLAZA
     101 SOUTH TRYON STREET, SUITE 4000
     CHARLOTTE, NC 28280-4000 (US)

(73) Assignee: Nokia Corporation

(21) Appl. No.: 11/385,578

(22) Filed: Mar. 21, 2006

Publication Classification

(51) Int. Cl.
     H04N 5/445  (2006.01)
     H04N 7/10   (2006.01)
     H04N 7/025  (2006.01)
(52) U.S. Cl. .................... 725/45; 725/34; 725/35

(57) ABSTRACT

A method of providing content dependent media content mixing includes
automatically determining an emotional property of a first media content
input, determining a specification for a second media content in response
to the determined emotional property, and producing the second media
content in accordance with the specification.

[Representative figure: block diagram of the mobile terminal 10 (FIG. 1).]
Patent Application Publication    Oct. 18, 2007    Sheet 1 of 5    US 2007/0245375 A1

[FIG. 1: block diagram of the mobile terminal 10 — antenna 12, transmitter
14, receiver 16, controller 20, speaker 22, ringer 24, microphone 26,
display 28, keypad 30, volatile memory 40, non-volatile memory 42, and
UIM 38.]
[FIG. 2: schematic block diagram of a wireless communications system —
terminals 10 with antennas 12, base station (BS) 44, MSC 46, GTW 48,
Internet 50, computing systems 52, origin server 54, SGSN 56, GGSN 60, and
wireless access points (APs) 62. Sheet 2 of 5.]
[FIG. 3: block diagram of the content dependent expressive music mixing
system — input text 82 feeding a text content analyzer and a TTS module
(text processor 76, prosodic processor 78, acoustic synthesizer 80
producing synthetic speech 84), an expressive performance and/or selection
module 86 (rules/models), a music player 88 (MIDI, MP3, etc.) producing
music content 90, and an audio mixer 92. Sheet 3 of 5.]
[FIG. 4: graph of time-varying mixing gain — fade gain versus time for a
"Happy Birthday" example. Sheet 4 of 5.]
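The time-varying mixing gain of FIG. 4 can be illustrated with a short sketch. The patent does not give an implementation, so everything below is an assumption made for illustration: the linear fade envelope, the function names, the `ramp` parameter, and the fixed `music_level` scaling.

```python
# Minimal sketch of mixing synthetic speech with background music under a
# time-varying fade gain, as suggested by FIG. 4. Illustrative only; not
# the patent's actual implementation.

def fade_gain(t, t_start, t_end, ramp):
    """Gain that ramps from 0 to 1 after t_start and back to 0 before t_end."""
    if t < t_start or t > t_end:
        return 0.0
    rise = min(1.0, (t - t_start) / ramp)
    fall = min(1.0, (t_end - t) / ramp)
    return min(rise, fall)

def mix(speech, music, rate, t_start, t_end, ramp=0.5, music_level=0.3):
    """Sample-wise mix: speech stays at full level while the music is
    scaled by the fade gain so it swells under the relevant text segment."""
    out = []
    for i in range(len(speech)):
        t = i / rate
        m = music[i] if i < len(music) else 0.0
        out.append(speech[i] + music_level * fade_gain(t, t_start, t_end, ramp) * m)
    return out
```

A real mixer would operate on audio frames at a proper sample rate; the per-sample loop here only makes the shape of the gain curve concrete.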
[FIG. 5: block diagram of an exemplary method — determine an emotional
property of a text input (100); determine a specification for musical
content in response to the determined emotional property (110); deliver the
musical content to an output device in accordance with the specification
(120); mix the musical content with synthetic speech derived from the text
at the output device (130). Sheet 5 of 5.]
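The four steps of FIG. 5 can be sketched in miniature. This is an illustrative sketch, not the patent's implementation: the tiny keyword table stands in for the natural language processing described in the detailed description, and the function names, the dictionary "specification", and the string returned by the final step are all assumptions.

```python
# Toy sketch of the FIG. 5 method: determine an emotional property of a
# text input (100), derive a specification for musical content (110), and
# deliver/mix per the specification (120-130). All names and the keyword
# table are illustrative assumptions.

EMOTION_KEYWORDS = {
    "anger":     {"angry", "furious", "rage"},
    "sadness":   {"sad", "somber", "sorrowful", "unhappy"},
    "happiness": {"happy", "glad", "joyful", "celebrate"},
    "fear":      {"afraid", "scared", "terrified"},
}

def determine_emotional_property(text):
    """Step 100: naive keyword vote over the text input."""
    words = set(text.lower().split())
    scores = {e: len(words & kws) for e, kws in EMOTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"

def determine_specification(emotion):
    """Step 110: map the emotion onto a (hypothetical) music specification."""
    return {"emotion": emotion, "tempo": "fast" if emotion == "happiness" else "slow"}

def deliver_and_mix(text, specification):
    """Steps 120-130: stand-in for delivering musical content and mixing it
    with synthetic speech derived from the text."""
    return f"speech({text!r}) + music({specification['emotion']}, {specification['tempo']})"
```

For example, `determine_specification(determine_emotional_property("we are so happy to celebrate with you"))` yields a "happiness" specification that the mixing step can then consume.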
METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING CONTENT
DEPENDENT MEDIA CONTENT MIXING

FIELD OF THE INVENTION

[0001] Embodiments of the present invention relate generally to mobile
terminal technology and, more particularly, relate to a method, apparatus,
and computer program product for providing content dependent media content
mixing.

BACKGROUND OF THE INVENTION

[0002] The modern communications era has brought about a tremendous
expansion of wireline and wireless networks. Computer networks, television
networks, and telephony networks are experiencing an unprecedented
technological expansion, fueled by consumer demand. Wireless and mobile
networking technologies have addressed related consumer demands, while
providing more flexibility and immediacy of information transfer.

[0003] Current and future networking technologies continue to facilitate
ease of information transfer and convenience to users. One area in which
there is a demand to increase ease of information transfer relates to the
delivery of services to a user of a mobile terminal. The services may be in
the form of a particular media or communication application desired by the
user, such as a music player, a game player, an electronic book, short
messages, email, etc. The services may also be in the form of interactive
applications in which the user may respond to a network device in order to
perform a task or achieve a goal. The services may be provided from a
network server or other network device, or even from the mobile terminal
such as, for example, a mobile telephone, a mobile television, a mobile
gaming system, etc.

[0004] In many applications, it is necessary for the user to receive audio
information such as oral feedback or instructions from the network. An
example of such an application may be paying a bill, ordering a program,
receiving driving instructions, etc. Furthermore, in some services, such as
audio books, for example, the application is based almost entirely on
receiving audio information. It is becoming more common for such audio
information to be provided by computer generated voices. Accordingly, the
user's experience in using such applications will largely depend on the
quality and naturalness of the computer generated voice. As a result, much
research and development has gone into improving the quality and
naturalness of computer generated voices.

[0005] One specific application of such computer generated voices that is
of interest is known as text-to-speech (TTS). TTS is the creation of
audible speech from computer readable text. TTS is often considered to
consist of two stages. First, a computer examines the text to be converted
to audible speech to determine specifications for how the text should be
pronounced, what syllables to accent, what pitch to use, how fast to
deliver the sound, etc. Next, the computer tries to create audio that
matches the specifications.

[0006] With the development of improved means for delivery of natural
sounding and high quality speech via TTS, there has come a desire to
further enhance the user's experience when receiving TTS output.
Accordingly, one way to improve the user's experience is to deliver
background music that is appropriate to the text being delivered via an
audio mixer. In this regard, background music may be considered appropriate
to the text if the background music conveys the same mood or emotional
qualities as the associated text with, for example, upbeat music being
played in the background for text that conveys a positive or uplifting
message. This is especially enhancing for gaming experiences and audio
books, for example. However, the effect can be equally enhancing for short
messages, emails, and other applications as well. Currently, methods for
mixing music and TTS involve embedding explicit tags into the text through
manual effort. The text is examined and tags for particular sound effects
are inserted. Each sound effect is treated as an independent track with an
independent timeline, volume and sample rate. Accordingly, a large amount
of storage space is required to store such information. Although either the
user or creator of the text may perform the tagging, a time consuming and
laborious process results since each command such as Mix, Play, Stop,
Pause, Resume, Loop, Fade, etc., must be manually inserted. Furthermore,
the music is sometimes not appropriately selected for the mood or emotion
of a particular content section. Thus, a need exists for providing a user
with the ability to enjoy music that is tailored to a particular text
automatically, and without a requirement for such significant effort.

BRIEF SUMMARY OF THE INVENTION

[0007] A method, apparatus and computer program product are therefore
provided that allow automatic content dependent music mixing. Additionally,
the music mixing does not require embedded tags, thereby reducing memory
requirements and, more importantly, eliminating the laborious process of
tag insertion. Furthermore, the music is selected or generated responsive
to the emotion expressed in the text.

[0008] In one exemplary embodiment, a method of providing content dependent
media content mixing is provided. The method includes automatically
determining an emotional property of a first media content input,
determining a specification for a second media content in response to the
determined emotional property, and producing the second media content in
accordance with the specification.

[0009] In another exemplary embodiment, a computer program product for
providing content dependent media content mixing is provided. The computer
program product includes at least one computer-readable storage medium
having computer-readable program code portions stored therein. The
computer-readable program code portions include first, second and third
executable portions. The first executable portion is for automatically
determining an emotional property of a first media content input. The
second executable portion is for determining a specification for a second
media content in response to the determined emotional property. The third
executable portion is for producing the second media content in accordance
with the specification.

[0010] In another exemplary embodiment, a device for providing content
dependent media content mixing is provided. The device includes a first
module and a second
module. The first module is configured to automatically determine an
emotional property of a first media content input. The second module is
configured to determine a specification for a second media content in
response to the determined emotional property and produce the second media
content in accordance with the specification.

[0011] In another exemplary embodiment, a mobile terminal for providing
content dependent media content mixing is provided. The mobile terminal
includes an output device, a first module and a second module. The first
module is configured to automatically determine an emotional property of a
first media content input. The second module is configured to determine a
specification for a second media content in response to the determined
emotional property and produce the second media content in accordance with
the specification.

[0012] In an exemplary embodiment, the first module is a text content
analyzer and the first media content is text, while the second module is a
music module and the second media content is musical content.

[0013] Embodiments of the invention provide a method, apparatus and
computer program product for providing content dependent music mixing for a
TTS system. As a result, users may enjoy automatically and appropriately
selected music associated with a particular textual content based on the
mood, expression or emotional theme of the particular textual content.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)

[0014] Having thus described the invention in general terms, reference will
now be made to the accompanying drawings, which are not necessarily drawn
to scale, and wherein:

[0015] FIG. 1 is a schematic block diagram of a mobile terminal according
to an exemplary embodiment of the present invention;

[0016] FIG. 2 is a schematic block diagram of a wireless communications
system according to an exemplary embodiment of the present invention;

[0017] FIG. 3 illustrates a block diagram of portions of a mobile terminal
according to an exemplary embodiment of the present invention;

[0018] FIG. 4 illustrates a graph of time-varying mixing gain according to
an exemplary embodiment of the present invention; and

[0019] FIG. 5 is a block diagram according to an exemplary method of
providing content dependent music mixing.

DETAILED DESCRIPTION OF THE INVENTION

[0020] Embodiments of the present invention will now be described more
fully hereinafter with reference to the accompanying drawings, in which
some, but not all embodiments of the invention are shown. Indeed, the
invention may be embodied in many different forms and should not be
construed as limited to the embodiments set forth herein; rather, these
embodiments are provided so that this disclosure will satisfy applicable
legal requirements. Like reference numerals refer to like elements
throughout.

[0021] FIG. 1 illustrates a block diagram of a mobile terminal 10 that
would benefit from the present invention. It should be understood, however,
that a mobile telephone as illustrated and hereinafter described is merely
illustrative of one type of mobile terminal that would benefit from the
present invention and, therefore, should not be taken to limit the scope of
the present invention. While several embodiments of the mobile terminal 10
are illustrated and will be hereinafter described for purposes of example,
other types of mobile terminals, such as portable digital assistants
(PDAs), pagers, mobile televisions, laptop computers and other types of
voice and text communications systems, can readily employ the present
invention.

[0022] In addition, while several embodiments of the method of the present
invention are performed or used by a mobile terminal 10, the method may be
employed by other than a mobile terminal. Moreover, the system and method
of the present invention will be primarily described in conjunction with
mobile communications applications. It should be understood, however, that
the system and method of the present invention can be utilized in
conjunction with a variety of other applications, both in the mobile
communications industries and outside of the mobile communications
industries.

[0023] The mobile terminal 10 includes an antenna 12 in operable
communication with a transmitter 14 and a receiver 16. The mobile terminal
10 further includes a controller 20 or other processing element that
provides signals to and receives signals from the transmitter 14 and
receiver 16, respectively. The signals include signaling information in
accordance with the air interface standard of the applicable cellular
system, and also user speech and/or user generated data. In this regard,
the mobile terminal 10 is capable of operating with one or more air
interface standards, communication protocols, modulation types, and access
types. By way of illustration, the mobile terminal 10 is capable of
operating in accordance with any of a number of first, second and/or
third-generation communication protocols or the like. For example, the
mobile terminal 10 may be capable of operating in accordance with
second-generation (2G) wireless communication protocols IS-136 (TDMA), GSM,
and IS-95 (CDMA).

[0024] It is understood that the controller 20 includes circuitry required
for implementing audio and logic functions of the mobile terminal 10. For
example, the controller 20 may be comprised of a digital signal processor
device, a microprocessor device, and various analog to digital converters,
digital to analog converters, and other support circuits. Control and
signal processing functions of the mobile terminal 10 are allocated between
these devices according to their respective capabilities. The controller 20
thus may also include the functionality to convolutionally encode and
interleave message and data prior to modulation and transmission. The
controller 20 can additionally include an internal voice coder, and may
include an internal data modem. Further, the controller 20 may include
functionality to operate one or more software programs, which may be stored
in memory. For example, the controller 20 may be capable of operating a
connectivity program, such as a conventional Web browser. The connectivity
program may
then allow the mobile terminal 10 to transmit and receive Web content, such
as location-based content, according to a Wireless Application Protocol
(WAP), for example. Also, for example, the controller 20 may be capable of
operating a software application capable of analyzing text and selecting
music appropriate to the text. The music may be stored on the mobile
terminal 10 or accessed as Web content.

[0025] The mobile terminal 10 also comprises a user interface including an
output device such as a conventional earphone or speaker 22, a ringer 24, a
microphone 26, a display 28, and a user input interface, all of which are
coupled to the controller 20. The user input interface, which allows the
mobile terminal 10 to receive data, may include any of a number of devices
allowing the mobile terminal 10 to receive data, such as a keypad 30, a
touch display (not shown) or other input device. In embodiments including
the keypad 30, the keypad 30 includes the conventional numeric (0-9) and
related keys (#, *), and other keys used for operating the mobile terminal
10. The mobile terminal 10 further includes a battery 34, such as a
vibrating battery pack, for powering various circuits that are required to
operate the mobile terminal 10, as well as optionally providing mechanical
vibration as a detectable output.

[0026] The mobile terminal 10 may further include a universal identity
module (UIM) 38. The UIM 38 is typically a memory device having a processor
built in. The UIM 38 may include, for example, a subscriber identity module
(SIM), a universal integrated circuit card (UICC), a universal subscriber
identity module (USIM), a removable user identity module (R-UIM), etc. The
UIM 38 typically stores information elements related to a mobile
subscriber. In addition to the UIM 38, the mobile terminal 10 may be
equipped with memory. For example, the mobile terminal 10 may include
volatile memory 40, such as volatile Random Access Memory (RAM) including a
cache area for the temporary storage of data. The mobile terminal 10 may
also include other non-volatile memory 42, which can be embedded and/or may
be removable. The non-volatile memory 42 can additionally or alternatively
comprise an EEPROM, flash memory or the like, such as that available from
the SanDisk Corporation of Sunnyvale, Calif., or Lexar Media Inc. of
Fremont, Calif. The memories can store any of a number of pieces of
information, and data, used by the mobile terminal 10 to implement the
functions of the mobile terminal 10. For example, the memories can include
an identifier, such as an international mobile equipment identification
(IMEI) code, capable of uniquely identifying the mobile terminal 10.

[0027] Referring now to FIG. 2, an illustration of one type of system that
would benefit from the present invention is provided. The system includes a
plurality of network devices. As shown, one or more mobile terminals 10 may
each include an antenna 12 for transmitting signals to and for receiving
signals from a base site or base station (BS) 44. The base station 44 may
be a part of one or more cellular or mobile networks each of which includes
elements required to operate the network, such as a mobile switching center
(MSC) 46. As well known to those skilled in the art, the mobile network may
also be referred to as a Base Station/MSC/Interworking function (BMI). In
operation, the MSC 46 is capable of routing calls to and from the mobile
terminal 10 when the mobile terminal 10 is making and receiving calls. The
MSC 46 can also provide a connection to landline trunks when the mobile
terminal 10 is involved in a call. In addition, the MSC 46 can be capable
of controlling the forwarding of messages to and from the mobile terminal
10, and can also control the forwarding of messages for the mobile terminal
10 to and from a messaging center. It should be noted that although the MSC
46 is shown in the system of FIG. 2, the MSC 46 is merely an exemplary
network device and the present invention is not limited to use in a network
employing an MSC.

[0028] The MSC 46 can be coupled to a data network, such as a local area
network (LAN), a metropolitan area network (MAN), and/or a wide area
network (WAN). The MSC 46 can be directly coupled to the data network. In
one typical embodiment, however, the MSC 46 is coupled to a GTW 48, and the
GTW 48 is coupled to a WAN, such as the Internet 50. In turn, devices such
as processing elements (e.g., personal computers, server computers or the
like) can be coupled to the mobile terminal 10 via the Internet 50. For
example, as explained below, the processing elements can include one or
more processing elements associated with a computing system 52 (two shown
in FIG. 2), origin server 54 (one shown in FIG. 2) or the like, as
described below.

[0029] The BS 44 can also be coupled to a signaling GPRS (General Packet
Radio Service) support node (SGSN) 56. As known to those skilled in the
art, the SGSN 56 is typically capable of performing functions similar to
the MSC 46 for packet switched services. The SGSN 56, like the MSC 46, can
be coupled to a data network, such as the Internet 50. The SGSN 56 can be
directly coupled to the data network. In a more typical embodiment,
however, the SGSN 56 is coupled to a packet-switched core network, such as
a GPRS core network 58. The packet-switched core network is then coupled to
another GTW 48, such as a GTW GPRS support node (GGSN) 60, and the GGSN 60
is coupled to the Internet 50. In addition to the GGSN 60, the
packet-switched core network can also be coupled to a GTW 48. Also, the
GGSN 60 can be coupled to a messaging center. In this regard, the GGSN 60
and the SGSN 56, like the MSC 46, may be capable of controlling the
forwarding of messages, such as MMS messages. The GGSN 60 and SGSN 56 may
also be capable of controlling the forwarding of messages for the mobile
terminal 10 to and from the messaging center.

[0030] In addition, by coupling the SGSN 56 to the GPRS core network 58 and
the GGSN 60, devices such as a computing system 52 and/or origin server 54
may be coupled to the mobile terminal 10 via the Internet 50, SGSN 56 and
GGSN 60. In this regard, devices such as the computing system 52 and/or
origin server 54 may communicate with the mobile terminal 10 across the
SGSN 56, GPRS core network 58 and the GGSN 60. By directly or indirectly
connecting mobile terminals 10 and the other devices (e.g., computing
system 52, origin server 54, etc.) to the Internet 50, the mobile terminals
10 may communicate with the other devices and with one another, such as
according to the Hypertext Transfer Protocol (HTTP), to thereby carry out
various functions of the mobile terminals 10.

[0031] Although not every element of every possible mobile network is shown
and described herein, it should be appreciated that the mobile terminal 10
may be coupled to one or more of any of a number of different networks
through the BS 44. In this regard, the network(s) can be capable of
supporting communication in accordance with
any one or more of a number of first-generation (1G), second-generation
(2G), 2.5G and/or third-generation (3G) mobile communication protocols or
the like. For example, one or more of the network(s) can be capable of
supporting communication in accordance with 2G wireless communication
protocols IS-136 (TDMA), GSM, and IS-95 (CDMA). Also, for example, one or
more of the network(s) can be capable of supporting communication in
accordance with 2.5G wireless communication protocols GPRS, Enhanced Data
GSM Environment (EDGE), or the like. Further, for example, one or more of
the network(s) can be capable of supporting communication in accordance
with 3G wireless communication protocols such as Universal Mobile Telephone
System (UMTS) network employing Wideband Code Division Multiple Access
(WCDMA) radio access technology. Some narrow-band AMPS (NAMPS), as well as
TACS, network(s) may also benefit from embodiments of the present
invention, as should dual or higher mode mobile stations (e.g.,
digital/analog or TDMA/CDMA/analog phones).

[0032] The mobile terminal 10 can further be coupled to one or more
wireless access points (APs) 62. The APs 62 may comprise access points
configured to communicate with the mobile terminal 10 in accordance with
techniques such as, for example, radio frequency (RF), Bluetooth (BT),
infrared (IrDA) or any of a number of different wireless networking
techniques, including wireless LAN (WLAN) techniques such as IEEE 802.11
(e.g., 802.11a, 802.11b, 802.11g, 802.11n, etc.), WiMAX techniques such as
IEEE 802.16, and/or ultra wideband (UWB) techniques such as IEEE 802.15 or
the like. The APs 62 may be coupled to the Internet 50. Like with the MSC
46, the APs 62 can be directly coupled to the Internet 50. In one
embodiment, however, the APs 62 are indirectly coupled to the Internet 50
via a GTW 48. Furthermore, in one embodiment, the BS 44 may be considered
as another AP 62. As will be appreciated, by directly or indirectly
connecting the mobile terminals 10 and the computing system 52, the origin
server 54, and/or any of a number of other devices, to the Internet 50, the
mobile terminals 10 can communicate with one another, the computing system,
etc., to thereby carry out various functions of the mobile terminals 10,
such as to transmit data, content or the like to, and/or receive content,
data or the like from, the computing system 52. As used herein, the terms
"data," "content," "information" and similar terms may be used
interchangeably to refer to data capable of being transmitted, received
and/or stored in accordance with embodiments of the present invention.
Thus, use of any such terms should not be taken to limit the spirit and
scope of the present invention.

[0033] Although not shown in FIG. 2, in addition to or in lieu of coupling
the mobile terminal 10 to computing systems 52 across the Internet 50, the
mobile terminal 10 and computing system 52 may be coupled to one another
and communicate in accordance with, for example, RF, BT, IrDA or any of a
number of different wireline or wireless communication techniques,
including LAN, WLAN, WiMAX and/or UWB techniques. One or more of the
computing systems 52 can additionally, or alternatively, include a
removable memory capable of storing content, which can thereafter be
transferred to the mobile terminal 10. Further, the mobile terminal 10 can
be coupled to one or more electronic devices, such as printers, digital
projectors and/or other multimedia capturing, producing and/or storing
devices (e.g., other terminals). Like with the computing systems 52, the
mobile terminal 10 may be configured to communicate with the portable
electronic devices in accordance with techniques such as, for example, RF,
BT, IrDA or any of a number of different wireline or wireless communication
techniques, including USB, LAN, WLAN, WiMAX and/or UWB techniques.

[0034] An exemplary embodiment of the invention will now be described with
reference to FIG. 3, in which certain elements of a system for content
dependent expressive music mixing are displayed. The system of FIG. 3 may
be employed, for example, on the mobile terminal 10 of FIG. 1. However, it
should be noted that the system of FIG. 3 may also be employed on a variety
of other devices, both mobile and fixed, and therefore, the present
invention should not be limited to application on devices such as the
mobile terminal 10 of FIG. 1. It should also be noted, however, that while
FIG. 3 illustrates one example of a configuration of a system for content
dependent expressive music mixing, numerous other configurations may also
be used to implement the present invention. Furthermore, although FIG. 3
shows a text-to-speech (TTS) module, the present invention need not
necessarily be practiced in the context of TTS, but instead applies more
generally to delivering information, in a first media, that is related to
the emotional content of information delivered simultaneously in a second
media.

[0035] Referring now to FIG. 3, a system for content dependent expressive
music mixing is provided. The system includes a TTS module 70, a music
module 72 and a text content analyzer 74. Each of the TTS module 70, the
music module 72 and the text content analyzer 74 may be any device or means
embodied in either hardware, software, or a combination of hardware and
software. In an exemplary embodiment, the TTS module 70, the music module
72 and the text content analyzer 74 are embodied in software as
instructions that are stored on a memory of the mobile terminal 10 and
executed by the controller 20.

[0036] The TTS module 70 may be any means known in the art for producing
synthesized speech from computer text. As such, elements of the TTS module
70 of FIG. 3 are merely exemplary and the descriptions provided below are
given merely to explain an operation of the TTS module 70 in general terms
for the sake of clarity. The TTS module 70 includes a text processor 76, a
prosodic processor 78 and an acoustic synthesizer 80. The text processor 76
receives a media input, such as an input text 82, and begins processing the
input text 82 before communicating processed text to the prosodic processor
78. The text processor 76 can perform any of numerous processing operations
known in the art. The text processor 76 may include a table or other means
to correlate a particular text word or sequence of letters with a
particular specification or rule for pronunciation. The prosodic processor
78 analyzes the processed text to determine specifications for how the text
should be pronounced, what syllables to accent, what pitch to use, how fast
to deliver the sound, etc. The acoustic synthesizer 80 produces a
synthetically created audio output in the form of computer generated
speech. The acoustic synthesizer 80 applies stored rules or models to an
input from the prosodic processor 78 to generate synthetic speech 84 that
audibly reproduces the computer text in a way that conforms to the
specifications determined by the prosodic processor 78. The synthetic
speech 84 may then be communicated to an output device
Oct. 18, 2007
US 2007/0245375 A1
such as an audio mixer 92 for appropriate mixing prior to
delivery to another output device such as the speaker 22.
[0037] The text content analyzer 74 divides the input text
82 into segments. The segments may correspond to, for
example, paragraphs or chapters. Alternatively, the segments
may correspond to arbitrarily chosen portions of text. The
text content analyZer 74 then analyZes each of the segments
by applying natural language processing. Using the natural
language processing, the text content analyZer 74 identi?es
portions of the input text 82 that correspond to certain
emotions or certain types of expressiveness. Portions of the
input text 82 corresponding to certain emotions or types of
expressiveness are then marked, labeled, tagged, or other
Wise identi?ed by the text content analyZer 74 to identify the
text portions With emotions or expressions that correspond.
In this Way, an emotional property of each of the segments
may be determined.
[0038] The natural language processing may be performed, for example, by use of a key word search. For example, words such as sad, somber, sorrowful, unhappy, etc. may correlate to an emotion of sadness. The natural language processing may alternatively be performed, for example, by using a pre-trained statistical model. The model may include tables or other means for dividing specific words, combinations of words, or words within proximity to each other into particular emotional groups. In an exemplary embodiment, text portions may be classified as belonging to one of four basic emotions such as anger, sadness, happiness and fear. More sophisticated classifications may also be implemented including additional emotions such as, for example, excitement, drama, tension, etc. Accordingly, each of the segments may be analyzed by comparison to the table of the model. In an exemplary embodiment, a probabilistic determination may be made by an algorithm that determines the entry in the table to which a particular segment most closely corresponds. The tables include, for example, words, combinations of words, and words in proximity to each other which are often associated with a particular emotional property. Accordingly, a phrase such as "I find that it is increasingly rare that I feel happy" could be associated with sadness, rather than with happiness as may occur with a simple word search for "happy".

[0039] In an exemplary embodiment, a user of the mobile terminal 10 may manually supplement the automatic processing of the text content analyzer 74. In such a situation, the user may manually tag particular text segments and associate a desired emotion with that text segment. For example, the user may select a text portion using a click and drag operation and select the desired emotion from, or input the desired emotion into, a dialog box. Furthermore, the user may have the option to bypass the text content analyzer 74 completely and perform all associations between text segments and corresponding emotions manually.

[0040] The music module 72 includes an expressive performance and/or selection module 86 and a music player 88. The expressive performance and/or selection module 86 employs particular rules or models to control playback of sounds and/or music that correlates to the emotion or expression associated with each of the text segments as determined by the text content analyzer 74. The expressive performance and/or selection module 86 then sends instructions to the music player 88. The music player 88 plays music according to the instructions generated by the expressive performance and/or selection module 86. The instructions may include a command to play, for example, a stored MP3 or a stored selection of musical notes. The stored MP3 or the stored selection of musical notes may be associated with a particular emotion or expression. Thus, the text content analyzer 74 may associate a particular emotion with a text segment based on the natural language, and the expressive performance and/or selection module 86 will send instructions to the music player 88 to cause the music player 88 to play or generate music that is associated with the particular emotion or expression. In an exemplary embodiment the music player 88 may employ the well known technology of musical instrument digital interface (MIDI). However, other suitable technologies for playing music may also be employed, such as MP3 or others. Accordingly, the music player 88 outputs music content 90 that is associated with a particular emotion, mood or expression. The music content 90 may then be communicated to an output device such as the audio mixer 92 for mixing with the synthetic speech 84. Alternatively, the music content 90 may be stored prior to communication to the output device. Additionally, mixing may occur somewhere other than at the output device.

[0041] The expressive performance and/or selection module 86 may, in one exemplary embodiment, select background music or sound that is appropriate to the text based on results from the text content analyzer 74. In this regard, a list of available music elements may be stored either in the memory of the mobile terminal 10 or at a network server that may be accessed by the mobile terminal 10. The list of available music elements may have each musical element (or piece) classified according to different emotions or expressions. In an exemplary embodiment, the text content analyzer 74 may classify text according to a set of various emotional themes and the expressive performance and/or selection module 86 may access musical elements that are classified by the same set of various emotional themes to select a musical element that is appropriate to the emotional theme of a particular text section as determined by the text content analyzer 74. The musical elements associated with each of the emotional themes may be predetermined at the network by a network operator and updated or changed as desired or required during routine server maintenance. Alternatively, the user may manually select musical elements that the user wishes to associate with each of the emotional themes. Selections for a particular user may be stored locally in the memory of the mobile terminal 10, or stored remotely at a network server, i.e., as a part of the user's profile. In an exemplary embodiment, a series of musical selections, stored in MP3 form and classified according to emotional theme, may be stored on either the memory of the mobile terminal 10 or at a network server. The mobile terminal 10 then automatically associates text segments with particular ones of the musical selections for mixing of synthetic speech from the text segments with corresponding musical selections that have an emotional theme associated with each of the text segments.
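A minimal sketch of the key word approach from paragraph [0038], mapping cue words to the four basic emotions named there. The keyword table and scoring rule are invented for illustration; the patent's alternative, a pre-trained statistical model over word combinations, would replace this lookup.

```python
# Illustrative cue-word table; a deployed system would instead use the
# pre-trained statistical model the text describes.
EMOTION_KEYWORDS = {
    "anger":     {"angry", "furious", "outraged"},
    "sadness":   {"sad", "somber", "sorrowful", "unhappy"},
    "happiness": {"happy", "joyful", "delighted"},
    "fear":      {"afraid", "scared", "terrified"},
}

def classify_segment(segment):
    """Return the emotion with the most cue-word hits, or None."""
    words = [w.strip(".,!?\"'").lower() for w in segment.split()]
    scores = {emotion: sum(w in cues for w in words)
              for emotion, cues in EMOTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

label = classify_segment("A somber, sorrowful scene unfolded.")
```

As the text itself observes, such a lookup would mislabel "I feel happy" in a negated context, which is exactly why the statistical model over word combinations and proximity is offered as the stronger alternative.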
[0042] In another exemplary embodiment, the expressive performance and/or selection module 86 may generate music that is intelligently selected to correspond to the emotional theme determined by the text content analyzer 74. For example, the expressive performance and/or selection module 86 may present a musical piece with specific content-dependent emotional coloring. In other words, although the musical piece, which is essentially a collection of musical notes, is normally rendered as generically described by a composer of the musical piece, the present invention provides a mechanism by which the emotional theme determined by the text content analyzer 74 may be used to modify the musical piece in accordance with the determined emotional theme. As such, notes in the musical piece or score are rendered in terms of, for example, intensity, duration and timbre in a way that expresses the determined emotional theme. In other words, the expressive performance and/or selection module 86 is capable of adding expressive or emotional content to the score by rendering the score modified according to the determined emotional theme.
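The note-level re-rendering described in paragraph [0042] — the same score, played with emotion-dependent intensity and duration — can be pictured with a sketch like the following. The note format and the scaling factors are assumptions made for illustration, not values from the patent:

```python
# A note is modeled here as (midi_pitch, velocity, duration_ms).
# The scaling factors below are illustrative, not from the patent.
EXPRESSION = {
    "sadness":   {"velocity": 0.7, "duration": 1.3},  # softer and slower
    "happiness": {"velocity": 1.1, "duration": 0.9},  # brighter and quicker
}

def render(score, emotion):
    """Re-render each note's intensity and duration for an emotional theme."""
    factors = EXPRESSION[emotion]
    return [(pitch,
             min(127, round(velocity * factors["velocity"])),
             round(duration * factors["duration"]))
            for pitch, velocity, duration in score]

sad_score = render([(60, 100, 500), (64, 100, 500)], "sadness")
```

The clamp to 127 reflects the MIDI velocity range; a fuller treatment would also vary timbre, which the text mentions but which a tuple of this shape cannot express.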
[0043] The expressive performance and/or selection module 86 may be programmed to perform the addition of expressive or emotional content to the score by any suitable means. For example, case based reasoning systems, multiple regression analysis algorithms, spectral interpolation synthesis, rule based systems, fuzzy logic-based rule systems, etc. may be employed. Alternatively, analysis-by-measurement to model musical expression and the extraction of rules from performances by a machine learning system may also be employed. In an exemplary embodiment, the expressive performance and/or selection module 86 provides at least one specification based on emotion determined from a text to the music player 88 along with a musical element. The music player 88 then produces musical content responsive to the specification and the musical element. Accordingly, pre-composed music may be stored in note form on either the memory of the mobile terminal 10 or at a network server and played in different ways by the music player 88, dependent upon a mood or emotion determined from the text. In an exemplary embodiment, the pre-composed music may be predetermined according to the text (i.e., a musical score associated with a particular book title) or pre-selected by the user. For example, the user may select the works of Bach or Handel to be modified according to the emotion determined from the text. Alternatively, the pre-composed music may be selected from a playlist determined by, for example, the user, a network operator or a producer of an electronic book.

[0044] Thus, the expressive performance and/or selection module 86 either selects, generates, or modifies music based on text content analysis, thereby producing music that matches an emotional or expressive coloring of the text content. In other words, for example, the expressive performance and/or selection module 86 may select music that is predefined to correlate to a particular emotion or expression responsive to the emotional or expressive coloring of the text content. Alternatively, the expressive performance and/or selection module 86 may modify selected music (i.e., change notes, instruments, tempo, etc.) to correlate an expression or emotion of the music with the emotional or expressive coloring of the text content. The music player 88 then plays the music that is either selected, generated or modified by the expressive performance and/or selection module 86. It should be noted that although the expressive performance and/or selection module 86 and the music player 88 are shown as separate elements in FIG. 3, the expressive performance and/or selection module 86 and the music player 88 may be combined into a single element capable of performing all of the functions described above. It should also be noted that although the text content analyzer 74 and the text processor 76 are shown as separate elements in FIG. 3, the text content analyzer 74 and the text processor 76 may be combined into a single element capable of performing all of the functions described above.

[0045] The audio mixer 92 is any known device or means, embodied in software, hardware or a combination of hardware and software, which is capable of mixing two audio inputs to produce a resultant output or combined signal. In an exemplary embodiment, the audio mixer 92 generates a combined signal x(n) by mixing synthetic speech s(n) and background music/sound m_ij(n). Accordingly, the combined signal x(n) may be described by the equation: x(n) = s(n) + α(n)·m_ij(n), in which α denotes the time-varying mixing gain and i and j indicate the ith expressive mode of the jth selected music. In a TTS system, prosodic parameters include pitch, duration, intensity, etc. Accordingly, based on the parameters, energy and word segmentation values may be defined. The synthetic speech to background music ratio (SMR) may then be defined as: SMR = 10 log[E(s²)/E(m²)], where E(s²) is the energy of the synthetic speech and E(m²) is the energy of the background music. Since the energy of the synthetic speech would be a known value, the time-varying mixing gain α may be derived given an SMR. The time-varying mixing gain α may be implemented at a word level or a sentence level. Accordingly, a template function can be used to reshape the time-varying mixing gain α to, for example, fade in when beginning a word and lift the gain during a pause, such as between a paragraph or chapter in an audio book, as shown roughly in FIG. 4.

[0046] Thus, any computer readable text may be accompanied by emotionally appropriate background music. Accordingly, media such as electronic books, emails, SMS messages, games, etc. may be enhanced, not just by the addition of music, but rather by the addition of music that corresponds to the emotional tone expressed in the media. Additionally, since the addition of the music is automatic, and is performed at the mobile terminal 10, the labor intensive, time consuming and expensive process of tagging media for correlation to emotionally appropriate music can be avoided.

[0047] FIG. 5 is a flowchart of a system, method and computer program product according to exemplary embodiments of the invention. It will be understood that each block or step of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by various means, such as hardware, firmware, and/or software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of the mobile terminal and executed by a built-in processor in the mobile terminal. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (i.e., hardware) to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions specified in the flowcharts block(s) or step(s). These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article
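One way to read the relations in paragraph [0045] numerically: treating E(m²) as the pre-gain music energy, a constant per-word gain α that achieves a target SMR follows from SMR = 10·log10(E(s²)/(α²·E(m²))). The sketch below derives α and applies x(n) = s(n) + α·m(n); this is a simplified reading in which the template-function fade-in shaping is omitted, and the example energies are invented inputs:

```python
import math

def mixing_gain(speech_energy, music_energy, smr_db):
    """Solve SMR = 10*log10(E_s / (alpha**2 * E_m)) for alpha."""
    return math.sqrt(speech_energy / (music_energy * 10 ** (smr_db / 10)))

def mix(speech, music, alpha):
    """Combined signal x(n) = s(n) + alpha * m(n), sample by sample."""
    return [s + alpha * m for s, m in zip(speech, music)]

# Invented example energies: speech four times as energetic as the music.
alpha = mixing_gain(speech_energy=4.0, music_energy=1.0, smr_db=6.0)
combined = mix([0.5, -0.5], [0.2, 0.1], alpha)
```

A word-level implementation would recompute α per word from the segmentation values, then reshape it with the template function so the music fades in at word onsets and lifts during pauses.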
of manufacture including instruction means which implement the function specified in the flowcharts block(s) or step(s). The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowcharts block(s) or step(s).

[0048] Accordingly, blocks or steps of the flowcharts support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that one or more blocks or steps of the flowcharts, and combinations of blocks or steps in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.

[0049] In this regard, one embodiment of a method for content dependent music mixing includes determining an emotional property of a text input at operation 100. At operation 110, a specification for musical content is determined in response to the emotional property. In an exemplary embodiment, determining the specification includes selecting the musical content from a group of musical elements that are arranged according to emotional properties. In another exemplary embodiment, determining the specification includes providing instructions to modify a pre-composed musical element according to the determined emotional property. At operation 120, musical content is delivered to an output device, such as an audio mixer or a speaker, in accordance with the specification. If the present invention is used in the context of enhancing a TTS system, then the musical content is mixed with synthetic speech derived from the text at operation 130. The mixed musical content and synthetic speech may then be synchronized to be played at the same time by an audio output device. Additionally, a mixing gain of the output device may be varied in response to timing instructions. In other words, the mixing gain may be time variable in accordance with predetermined criteria.

[0050] The above described functions may be carried out in many ways. For example, any suitable means for carrying out each of the functions described above may be employed to carry out the invention. In one embodiment, all or a portion of the elements of the invention generally operate under control of a computer program product. The computer program product for performing the methods of embodiments of the invention includes a computer-readable storage medium, such as the non-volatile storage medium, and computer-readable program code portions, such as a series of computer instructions, embodied in the computer-readable storage medium. It should also be noted that although the above described principles have been applied in the context of delivering background music related to emotional themes of text, similar principles would also apply to the delivery of background music related to emotional themes of other media including, for example, pictures. Additionally, the present invention should not be limited to presenting music related to an emotional theme of a first media content. Thus, a second media content such as a visual image may be displayed according to a specification determined based on the emotional content of the first media content, such as text.

[0051] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

What is claimed is:

1. A method of providing content dependent media content mixing, the method comprising:
automatically determining an emotional property of a first media content input;
determining a specification for a second media content in response to the determined emotional property; and
producing the second media content in accordance with the specification.

2. A method according to claim 1, wherein the second media content is musical content.

3. A method according to claim 2, wherein the first media content is text content.

4. A method according to claim 3, wherein determining the emotional property comprises dividing the text content into segments and determining by text analysis the emotional property associated with each of the segments.

5. A method according to claim 4, further comprising mixing the musical content with synthetic speech derived from the text content.

6. A method according to claim 2, wherein determining the specification comprises selecting the musical content from a group of musical elements that are associated with respective emotional properties.

7. A method according to claim 2, wherein determining the specification comprises providing instructions to modify a pre-composed musical element according to the determined emotional property.
8. A method according to claim 5, further comprising varying a mixing gain in response to timing based instructions.

9. A method according to claim 8, wherein the mixing gain is increased during pauses in the text content.

10. A method according to claim 1, wherein producing the second media content comprises one of:
generating music;
modifying a musical score; and
selecting an appropriate musical score.

11. A computer program product for providing content dependent media content mixing, the computer program product comprising at least one computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion for automatically determining an emotional property of a first media content input;
a second executable portion for determining a specification for a second media content in response to the determined emotional property; and
a third executable portion for producing the second media content in accordance with the specification.

12. A computer program product according to claim 11, wherein the second media content is musical content.

13. A computer program product according to claim 12, wherein the first executable portion further includes instructions for dividing a text into segments and determining by text analysis the emotional property associated with each of the segments.

14. A computer program product according to claim 13, further comprising a fourth executable portion for mixing the musical content with synthetic speech derived from the text.

15. A computer program product according to claim 12, wherein the second executable portion further includes instructions for selecting the musical content from a group of musical elements that are associated with respective emotional properties.

16. A computer program product according to claim 12, wherein the second executable portion further includes instructions for providing instructions to modify a pre-composed musical element according to the determined emotional property.

17. A computer program product according to claim 14, further comprising a fourth executable portion for varying a mixing gain in response to timing based instructions.
18. A computer program product according to claim 17, wherein the fourth executable portion further includes instructions for increasing the mixing gain during pauses in the text.

19. A computer program product according to claim 11, wherein the third executable portion comprises one of:
generating music;
modifying a musical score; and
selecting an appropriate musical score.

20. A device for providing content dependent media content mixing, the device comprising:
a first module configured to automatically determine an emotional property of a first media content input; and
a second module configured to determine a specification for a second media content in response to the determined emotional property and produce the second media content in accordance with the specification.

21. A device according to claim 20, wherein the first module is a text content analyzer and the first media content is a text, and wherein the second module is a music module and the second media content is musical content.

22. A device according to claim 21, wherein the music module is capable of accessing musical elements associated with respective emotional properties.

23. A device according to claim 21, wherein the music module is capable of accessing at least one pre-composed musical element and the music module is further configured to modify the pre-composed musical element according to the determined property.

24. A device according to claim 21, wherein the text content analyzer is capable of dividing a text into segments and determining by text analysis the emotional property associated with each of the segments.

25. A mobile terminal for providing content dependent media content mixing, the mobile terminal comprising:
an output device capable of delivering media in a user perceptible manner;
a first module configured to automatically determine an emotional property of a first media content input; and
a second module configured to determine a specification for a second media content in response to the determined emotional property and produce the second media content in accordance with the specification.

26. A mobile terminal according to claim 25, wherein the first module is a text content analyzer and the first media content is a text, and wherein the second module is a music module and the second media content is musical content.

27. A mobile terminal according to claim 26, wherein the text content analyzer is capable of dividing a text into segments and determining by text analysis the emotional property associated with each of the segments.

28. A mobile terminal according to claim 26, wherein the output device is an audio mixer capable of mixing a plurality of audio signals.

29. A mobile terminal according to claim 28, wherein the audio mixer is configured to vary a mixing gain in response to timing based instructions.

30. A mobile terminal according to claim 29, wherein the mixing gain is increased during pauses in the text.

31. A mobile terminal according to claim 28, further comprising a text-to-speech module capable of producing synthetic speech responsive to the input text, the text-to-speech module delivering the synthetic speech to the audio mixer,
wherein the audio mixer mixes the synthetic speech and the musical content.

32. A mobile terminal according to claim 26, wherein the music module is capable of accessing musical elements associated with respective emotional properties.

33. A mobile terminal according to claim 26, wherein the music module is capable of accessing at least one pre-composed musical element and the music module is further configured to modify the pre-composed musical element according to the determined property.