US007421394B2

(12) United States Patent
Omi et al.
(10) Patent No.: US 7,421,394 B2
(45) Date of Patent: Sep. 2, 2008

(54) INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND RECORDING MEDIUM, AND PROGRAM

(75) Inventors: Hiromi Omi, Kanagawa (JP); Tsuyoshi Yagisawa, Kanagawa (JP); Makoto Hirota, Tokyo (JP)

(73) Assignee: Canon Kabushiki Kaisha, Tokyo (JP)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 237 days.

(21) Appl. No.: 10/577,493

(22) PCT Filed: Oct. 26, 2004

(86) PCT No.: PCT/JP2004/016195
§ 371 (c)(1), (2), (4) Date: Apr. 27, 2006

(87) PCT Pub. No.: WO2005/045804
PCT Pub. Date: May 19, 2005

(65) Prior Publication Data: US 2007/0043552 A1, Feb. 22, 2007

(30) Foreign Application Priority Data: Nov. 7, 2003 (JP) 2003-378877

(51) Int. Cl.: G10L 21/00 (2006.01); G06F 17/28 (2006.01)

(52) U.S. Cl.: 704/277; 704/9; 704/270; 704/2; 704/257

(58) Field of Classification Search: 704/2, 9, 10, 231, 257, 260, 270, 275, 277; 710/65; 707/535, 532, 533, 530. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,231,670 A *  7/1993  Goldhor et al. ....... 704/275
5,652,898 A *  7/1997  Kaji ................. 704/10
5,778,344 A *  7/1998  Attwater et al. ...... 704/275
5,787,455 A *  7/1998  Seybold .............. 711/100
(Continued)

FOREIGN PATENT DOCUMENTS
JP  63-48040  2/1988
(Continued)

Primary Examiner: Vijay B Chawan
(74) Attorney, Agent, or Firm: Fitzpatrick, Cella, Harper & Scinto

(57) ABSTRACT

This invention has as its object to save labor time required for correction and adjustments when an error has occurred during either the recognition of input data or the process of the recognition result of an information processing apparatus which recognizes input data and outputs the recognition result. The information processing method described in this embodiment includes a recording step of recording input data (step S708), a recognition step of recognizing the input data (step S707), a determination step of determining whether or not the input data can be recognized in the recognition step (step S709), and an output step of outputting, when it is determined in the determination step that the input data can be recognized, data generated based on a recognition result in the recognition step (step S713), and outputting, when it is determined in the determination step that the input data cannot be recognized, output data generated based on the input data which is recorded in the recording step (step S710).

19 Claims, 13 Drawing Sheets

[Representative drawing: input speech is recorded; speech recognition succeeds, Japanese → English translation succeeds, and "How can I get to the Eiffel Tower?" is output.]
U.S. PATENT DOCUMENTS (Continued)
5,890,182     3/1999  Yagisawa et al.
5,963,892    10/1999  Tanaka et al. ........ 704/2
5,991,721    11/1999  Asano et al. ......... 704/257
6,167,368    12/2000  Wacholder ............ 704/9
6,192,332     2/2001  Golding .............. 704/2
6,266,642     7/2001  Franz et al. ......... 704/277
6,356,865     3/2002  Franz et al. ......... 704/2
6,779,060     8/2004  Azvine et al. ........ 710/65
2001/0042082  11/2001  Ueguri et al.
2003/0046076 A1   3/2003  Hirota et al.
2003/0061030 A1   3/2003  Kuboyama et al.
2004/0143441 A1   7/2004  Aizawa et al.

FOREIGN PATENT DOCUMENTS
JP   1-293490   11/1989
JP   7-121651    5/1995
JP   7-160289    6/1995
JP  2000-29492   1/2000

* cited by examiner
[FIG. 1 (Sheet 1 of 13): schematic block diagram of the information processing apparatus 101 and its units.]
[FIG. 2 (Sheet 2 of 13): user interface example 201; a Japanese utterance 202 undergoes speech recognition and Japanese → English translation, and the generated English sentence "How can I get to ..." 203 is output.]
[FIG. 3 (Sheet 3 of 13): the input speech is recorded; speech recognition and Japanese → English translation both succeed, and "How can I get to the Eiffel Tower?" is output.]
[FIG. 4 (Sheet 4 of 13): the input speech is recorded; speech recognition fails, so the defined sentence "How can I get to" is output and the recorded input speech is played back.]
[FIG. 5 (Sheet 5 of 13): the input speech is recorded; speech recognition succeeds but Japanese → English translation fails, so the defined sentence "How can I get to" is output and the recorded input speech is played back.]
[FIG. 6 (Sheet 6 of 13): in the recording/playback mode, the input speech "Mike's house" is recorded without recognition; on output, the defined sentence "How can I get to" is followed by playback of "Mike's house".]
[FIG. 7 (Sheet 7 of 13): flowchart of the first embodiment.]
Setup is loaded (S702); speech recognition mode?
  NO: start speech input (S703) → record input speech (S704) → synthetic speech of form sentence + playback of recorded speech (S705) → end.
  YES: start speech input (S706) → speech recognition (S707) → record input speech (S708) → recognition successful (S709)?
    NO: synthetic speech of form sentence + playback of recorded speech (S710) → end.
    YES: process recognized word, e.g., Japanese → English (S711) → process successful (S712)?
      NO: synthetic speech of form sentence + playback of recorded speech (S710) → end.
      YES: synthetic speech of form sentence + synthetic speech of processed word or phrase (S713) → end (S714).
[FIG. 8 (Sheet 8 of 13): user interface example of the second embodiment; the input characters "How can I get to the Tokyo Tower?" 801 undergo character recognition and English → Japanese translation 802.]
[FIG. 9 (Sheet 9 of 13): the input character image "How can I get to the Tokyo Tower?" is recorded; character recognition and English → Japanese translation both succeed, and the translated Japanese sentence is output.]
[FIG. 10 (Sheet 10 of 13): character recognition fails, so a text output of the defined sentence and the recorded input character image are combined.]
[FIG. 11 (Sheet 11 of 13): the translation process fails, so a text output of the defined sentence and the recorded input character image are combined.]
[FIG. 13 (Sheet 13 of 13): flowchart of the second embodiment.]
Setup is loaded (S1301); character recognition mode (S1302)?
  NO: input character (S1303) → record input character image (S1304) → text display of form sentence + playback of input character image (S1305) → end.
  YES: input character (S1306) → character recognition (S1307) → record input character image (S1308) → recognition successful (S1309)?
    NO: text display of form sentence + playback of input character image (S1310) → end.
    YES: process recognized word, e.g., English → Japanese (S1311) → process successful (S1312)?
      NO: text display of form sentence + playback of input character image (S1310) → end.
      YES: text display of form sentence + processed text display (S1313) → end.
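The flowcharts of FIGS. 7 and 13 share one branch structure: recognize the recorded input, gate on recognition success and a certainty factor (the specification's example threshold is 30%), attempt the post-process (translation), and on any failure fall back to a defined sentence combined with the recorded input. A minimal sketch of that shared skeleton follows; `process_input`, `Recognition`, and the callables are hypothetical names for illustration and do not appear in the patent:

```python
from dataclasses import dataclass
from typing import Callable, Optional

CERTAINTY_THRESHOLD = 0.30   # "low certainty factor (e.g., 30% or less)"

@dataclass
class Recognition:
    text: str
    certainty: float   # 0.0-1.0

def process_input(recorded,
                  recognize: Callable[[object], Optional[Recognition]],
                  translate: Callable[[str], Optional[str]]):
    """Shared skeleton of the FIG. 7 / FIG. 13 flowcharts.

    Returns (defined_sentence, payload, payload_is_recording). On any
    failure, the recorded raw input itself becomes the payload, to be
    played back (speech) or redisplayed (character image).
    """
    result = recognize(recorded)                 # recognition step (S707 / S1307)
    if result is None or result.certainty <= CERTAINTY_THRESHOLD:
        # "NO" at S709 / S1309: fall back to defined sentence + recording
        return "How can I get to", recorded, True
    translated = translate(result.text)          # post-process (S711 / S1311)
    if translated is None:
        # "NO" at S712 / S1312: same fallback
        return "How can I get to", recorded, True
    # "YES" path (S713 / S1313): full generated sentence
    return "How can I get to", translated, False
```

For instance, `process_input("<audio>", recognize=lambda _: Recognition("Efferu-to", 0.9), translate={"Efferu-to": "the Eiffel Tower"}.get)` yields the success tuple `("How can I get to", "the Eiffel Tower", False)`, while an unregistered word falls through to the recorded input.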
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND RECORDING MEDIUM, AND PROGRAM

TECHNICAL FIELD

The present invention relates to a user interface in a recognition process of input data.

BACKGROUND ART

In recent years, various user interfaces that use recognition techniques such as a speech recognition technique, text recognition technique, and the like have been proposed. For example, Japanese Patent Laid-Open No. 7-160289 has proposed a user interface which allows the user to easily correct recognition results, which cannot be determined by a speech recognition apparatus, in correspondence with input speech. With this user interface, the user can easily correct recognition results that cannot be recognized.

Japanese Patent Laid-Open No. 63-48040 has proposed a user interface which records input speech and plays it back for a partner user to make him or her confirm, in a private branch exchange which recognizes the callee's name uttered by a caller. In this way, even when a wrong recognition result is obtained, the partner user can recognize it by hearing the playback tone. Hence, when the partner user notices a wrong recognition result, he or she can correct it by himself or herself.

However, with both prior arts, when an error has occurred during a recognition process, or when a wrong recognition result is obtained, the user himself or herself must correct it, resulting in poor convenience. Even when a recognition result is correct, if an error has occurred upon executing a post-process (e.g., a translation process) using the recognition result, the user himself or herself must correct it.

On the other hand, it is difficult to register all words and phrases including proper nouns in a grammar and standard patterns for recognition, and improvement of the recognition rate has its limits. For this reason, when an error has occurred during a recognition process or a post-process using the recognition result, it is desired to save labor required for the user's correction as much as possible.

DISCLOSURE OF INVENTION

The present invention has been made in consideration of the above situation, and has as its object to save labor required for the user's correction and to improve the user's convenience even when an error has occurred during recognition of input data or a post-process using the recognition result in an information processing apparatus which recognizes input data and outputs the recognition result.

In order to achieve the above object, an information processing apparatus according to the present invention comprises the following arrangement.

That is, an information processing apparatus comprises:

recording means for recording input data;

recognition means for recognizing the input data;

determination means for determining whether or not the recognition means can recognize the input data; and

output means for, when the determination means determines that the recognition means can recognize the input data, outputting data generated based on a recognition result of the recognition means, and for, when the determination means determines that the recognition means cannot recognize the input data, outputting output data generated based on the input data which is recorded in the recording means.

According to this invention, even when an error has occurred during recognition of input data or a post-process using the recognition result in an information processing apparatus which recognizes input data and outputs the recognition result, labor required for the user's correction can be saved and the user's convenience can be improved.

BRIEF DESCRIPTION OF DRAWINGS

Features and advantages of the present invention will be sufficiently understood from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.

FIG. 1 is a schematic block diagram showing the arrangement of an information processing apparatus according to each embodiment of the present invention;

FIG. 2 shows an example of a user interface of an information processing apparatus according to the first embodiment of the present invention;

FIG. 3 is a view for explaining the operation of the information processing apparatus according to the first embodiment of the present invention;

FIG. 4 is a view for explaining the operation of the information processing apparatus according to the first embodiment of the present invention;

FIG. 5 is a view for explaining the operation of the information processing apparatus according to the first embodiment of the present invention;

FIG. 6 is a view for explaining the operation of the information processing apparatus according to the first embodiment of the present invention;

FIG. 7 is a flowchart showing the operation of the information processing apparatus according to the first embodiment of the present invention;

FIG. 8 shows an example of a user interface of an information processing apparatus according to the second embodiment of the present invention;

FIG. 9 is a view for explaining the operation of the information processing apparatus according to the second embodiment of the present invention;

FIG. 10 is a view for explaining the operation of the information processing apparatus according to the second embodiment of the present invention;

FIG. 11 is a view for explaining the operation of the information processing apparatus according to the second embodiment of the present invention;

FIG. 12 is a view for explaining the operation of the information processing apparatus according to the second embodiment of the present invention; and

FIG. 13 is a flowchart showing the operation of the information processing apparatus according to the second embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will be described hereinafter with reference to the accompanying drawings.

First Embodiment

An embodiment of the present invention will be described hereinafter with reference to the accompanying drawings. FIG. 1 is a schematic block diagram showing the arrangement of an information processing apparatus according to an
embodiment of the present invention. An information processing apparatus 101 includes a communication unit 102, operation unit 103, storage unit 104, display unit 105, OCR unit 106, control unit 107, speech input unit 108, speech output unit 109, speech synthesis unit 110, and speech recognition unit 111.

The communication unit 102 is connected to a network and makes data communications with external devices and the like. The operation unit 103 includes buttons, a keyboard, mouse, touch panel, pen, tablet, and the like, and is used to operate the apparatus. The storage unit 104 comprises a storage medium such as a magnetic disk, optical disk, hard disk device, or the like, and stores application programs, input text data, image data, speech data, and the like. The display unit 105 comprises a display device such as a liquid crystal display or the like, and displays pictures, text, and the like.

The OCR unit 106 optically reads handwritten or printed characters, specifies characters by collation with patterns which are stored in advance, and inputs text data. The OCR unit 106 may comprise a scanner, and OCR software which identifies characters from read images and converts the characters into document data. The control unit 107 comprises a work memory, microcomputer, and the like, and reads out and executes programs stored in the storage unit 104. The speech input unit 108 includes a microphone and the like, and inputs speech uttered by the user. The speech output unit 109 comprises a loudspeaker, headphone, or the like, and outputs speech synthesized by the speech synthesis unit 110, speech stored in the storage unit 104, and the like. The speech synthesis unit 110 generates synthetic speech of text stored in the storage unit 104. The speech recognition unit 111 applies speech recognition to speech input via the speech input unit 108. As the speech recognition technique and speech synthesis technique, existing ones are used.

A feature of the information processing apparatus according to the first embodiment of the present invention will be described below. FIG. 2 shows an example of the user interface used when the speech recognition unit 111 recognizes speech input via the speech input unit 108, the recognition result is translated from Japanese into English, the speech synthesis unit 110 generates synthetic speech of the generated English text, and the speech output unit 109 outputs the synthetic speech. In such a case, the user often utters a proper noun, but it is difficult to register all words and phrases in a grammar for speech recognition. Likewise, in a translation process, it is difficult to register English translations of all words and phrases. Hence, in the apparatus of this embodiment, the user's input speech is recorded, and when a word or phrase which is not registered in the speech recognition grammar is input, when an error has occurred during a speech recognition process, when the speech recognition result has a low certainty factor, when no corresponding English translation is registered, when an error has occurred during a translation process, when the translation result has a low certainty factor, or the like, a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined upon output.

When the user recognizes beforehand that a word or phrase to be input is a word or phrase which is not registered in the recognition grammar or a word or phrase which cannot undergo, e.g., a translation process or the like, he or she can select a recording/playback mode in which input speech is recorded, and a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined upon output. The speech recognition technique, speech synthesis technique, and translation technique use existing ones.

Various operations of the information processing apparatus according to this embodiment will be described below using examples shown in FIGS. 3 to 6.

FIG. 3 shows an example wherein the user's input speech "Efferu-to" can be successfully recognized. In this case, the speech recognition result "Efferu-to" is translated into "the Eiffel Tower" in English. As a result, a generated sentence "How can I get to the Eiffel Tower?" is output as synthetic speech. Note that the output may be displayed as a text message or an icon on the display screen in addition to synthetic speech.

By contrast, FIG. 4 shows an example when a word "Eiferu-tawa" input by the user is not registered in the speech recognition grammar, when an error has occurred during a recognition process, or when the recognition result has a poor certainty factor (e.g., 30% or less). In this case, a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined. In the example of FIG. 4, a defined sentence "How can I get to" is output as synthetic speech, and the user's input speech "Eiferu-tawa" is played back after that speech. At this time, a text message or icon which indicates that the word is not registered in the speech recognition grammar, an error has occurred during the recognition process, or the recognition result has a low certainty factor, or the like may be displayed.

FIG. 5 shows an example when no corresponding English translation is registered, when an error has occurred, or when the processing result has a low certainty factor during the application process (translation process). The user's input speech "Biggu-Ben" is recognized, and the word "Biggu-Ben" as the speech recognition result is translated from Japanese into English. When no English translation of the recognition result "Biggu-Ben" is registered in the system in the translation process, when an error has occurred during the translation process, or when the translation result has a low certainty factor (e.g., 30% or less), a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined. In the example of FIG. 5, a defined sentence "How can I get to" is output as synthetic speech, and the recorded user's input speech "Biggu-Ben" is played back after that output. At this time, a text message or icon indicating that no corresponding English translation is available, that an error has occurred during the translation process, that the translation result has a low certainty factor, or the like may be displayed on the display screen. When no English translation of the recognition result "Biggu-Ben" is registered in the translation process, text "Biggu-Ben" as the recognition result may be output, and "How can I get to Biggu-Ben" may be output as synthetic speech.

When the user recognizes beforehand that a word or phrase to be input is a word or phrase which is not registered in the recognition grammar or a word or phrase which cannot undergo, e.g., a translation process or the like, he or she can select a recording/playback mode in which input speech is recorded, and a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined upon output. As shown in FIG. 6, in the recording/playback mode, the user's input speech "Mike's house" is recorded, and the speech recognition and translation processes are skipped. Upon output, a defined sentence "How can I get to" is output as synthetic speech, and the recorded user's input speech "Mike's house" is played back. At this time, a text message or icon which indicates that the recorded input speech is played back may be displayed on the display screen.
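The three output cases illustrated in FIGS. 3 to 6 can be sketched as a small rendering function: on success the full generated sentence is synthesized; on any failure (or in the recording/playback mode) the defined sentence is synthesized and the recorded input speech is played back after it. This is a sketch only; `render_output` and the `speak`/`play back` action strings are hypothetical, not part of the patent:

```python
from typing import List, Optional

def render_output(defined_sentence: str,
                  translated: Optional[str],
                  recorded_speech: str) -> List[str]:
    """Return the ordered output actions as plain strings (illustration only)."""
    if translated is not None:
        # FIG. 3 case: the full generated sentence is output as synthetic speech
        return ["speak: {} {}?".format(defined_sentence, translated)]
    # FIGS. 4-6 cases: synthetic speech of the defined sentence,
    # followed by playback of the recorded input speech
    return ["speak: " + defined_sentence,
            "play back: " + recorded_speech]

render_output("How can I get to", "the Eiffel Tower", "...")
# -> ["speak: How can I get to the Eiffel Tower?"]
render_output("How can I get to", None, "Mike's house")
# -> ["speak: How can I get to", "play back: Mike's house"]
```

The same two-branch rendering applies, with text display in place of synthetic speech, to the character-image embodiment described later.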
The aforementioned operations will be described below using the flowchart shown in FIG. 7. Initially, a setup indicating whether or not the speech recognition mode is selected is loaded (step S702). If the speech recognition mode is not selected (the recording/playback mode is selected) and speech is input (step S703), the speech is recorded (step S704). Upon output, a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined (step S705, FIG. 6).

On the other hand, if the speech recognition mode is selected and speech is input, the input speech is recognized (step S707) and is recorded (step S708). If the user's input word or phrase is not registered in the speech recognition grammar, if an error has occurred during the recognition process, or if the recognition result has a low certainty factor (e.g., 30% or less) (i.e., if "NO" in step S709), a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined (step S710, FIG. 4). If the user's input word or phrase is registered in the speech recognition grammar, or if the recognition result has a high certainty factor (e.g., 30% or higher) (i.e., if "YES" in step S709), the recognized word or phrase is processed (translation process) (step S711).

In the translation process, if no corresponding English translation of the recognition result is registered, if an error has occurred during the translation process of the recognition result, or if the translation result has a low certainty factor (e.g., 30% or less) (i.e., if "NO" in step S712), a speech synthesis output generated from a defined sentence and playback of the recorded input speech are combined (step S710, FIG. 5). If the corresponding English translation of the recognition result is registered in the system, or if the translation result has a high certainty factor (e.g., 30% or higher) (i.e., if "YES" in step S712), the full generated sentence is output as synthetic speech (step S713, FIG. 3). The output may be displayed as a text message or icon on the display screen in addition to synthetic speech.

As described above, according to this embodiment, input speech is recorded, and when a word or phrase which is not registered in the speech recognition grammar is input, when an error has occurred during speech recognition, when the recognition result has a low certainty factor, when no corresponding English translation is registered in the system, when an error has occurred during the translation process, or when the processing result has a low certainty factor, a speech synthesis output generated from a defined sentence and playback of the recorded input speech are output in combination, thus reducing the number of times of the user's manual correction upon occurrence of a recognition error or any other errors, and improving the convenience.

Second Embodiment

An information processing apparatus according to the second embodiment of the present invention will be described below. The first embodiment has exemplified a case wherein speech is recognized. This embodiment will exemplify a case wherein handwritten characters are recognized. Note that the apparatus arrangement is the same as that shown in FIG. 1, and a description thereof will be omitted. An existing technique is used to recognize handwritten characters. Note that characters are not limited to handwritten characters, and characters which are specified by optically scanning printed characters by the OCR unit 106 and collating them with pre-stored patterns may be used.

FIG. 8 shows the operation of the apparatus when the control unit 107 recognizes characters input via the operation unit 103, the recognized characters are translated from English into Japanese, and a generated Japanese sentence is displayed as text on the display unit 105. As in the first embodiment, the user's input character image is recorded in the storage unit 104, and when characters which are not registered in standard patterns for character recognition are input, when an error has occurred during character recognition, when the character recognition result has a low certainty factor, when no corresponding Japanese translation is registered, when an error has occurred during the translation process, or when the translation result has a low certainty factor, a text output of a defined sentence and an output of the recorded input character image are combined.

When the user recognizes beforehand that characters (a word or phrase) to be input are not registered in the standard patterns for recognition, or cannot undergo a translation process or the like, he or she can select a recording/output mode in which an input character image is recorded, and a text output of a defined sentence and an output of the recorded input character image are combined upon output. The text output technique and translation technique use existing ones.

Various operations of the information processing apparatus according to this embodiment will be described below using examples shown in FIGS. 9 to 12.

FIG. 9 shows an example wherein the user's input characters "the Tokyo Tower" can be successfully recognized. In this case, the character recognition result "the Tokyo Tower" is translated into "[Japanese]" in Japanese. As a result, a generated sentence "[Japanese]" is output as text. Note that the output may be made using synthetic speech of the text in addition to the text output.

By contrast, FIG. 10 shows an example when the user's input characters are not registered in standard patterns for character recognition, when an error has occurred during character recognition, or when the recognition result has a low certainty factor (e.g., 30% or less). In this case, a text output of a defined sentence and an output of the recorded input character image are combined. In the example of FIG. 10, the user's input character image "the Tokyo Tower" is output, and a defined sentence "[Japanese]" is output as text. At this time, a text message, icon, or voice message indicating that the user's input characters are not registered in standard patterns for character recognition, an error has occurred during character recognition, or the recognition result has a low certainty factor may be output.

FIG. 11 shows an example when no corresponding Japanese translation is registered in the system, when an error has occurred during an application process (translation process), or when the processing result has a low certainty factor. The user's input characters "the Tokyo Towr" are recognized, and the character recognition result "the Tokyo Towr" is translated from English into Japanese. In the translation process, when no corresponding Japanese translation of the recognition result "the Tokyo Towr" is registered in the system, when an error has occurred during the translation process, or when the translation result has a low certainty factor (e.g., 30% or less), a text output of a defined sentence and an output of the recorded input character image are combined.

In the example of FIG. 11, the user's input character image "the Tokyo Towr" is output, and a defined sentence "[Japanese]" is output as text. At this time, a text message, icon, or voice message indicating that no corresponding Japanese translation is registered, an error has occurred during the translation process, or the translation result has a low certainty factor may be output. In the translation process, when no corresponding Japanese translation of the recognition result "the Tokyo Towr" is registered in the system, text "the
Tokyo Towr" as the recognition result may be output, and "the Tokyo Towr [Japanese]" may be output as text.

When the user recognizes beforehand that characters to be input are a word or phrase which is not registered in the standard patterns for recognition, or that no corresponding Japanese translation is registered, he or she can select a recording/output mode in which an input character image is recorded, and a text output of a defined sentence and an output of the recorded input character image are combined upon output. As shown in FIG. 12, in the recording/output mode, the user's input character image "Taro's house" is recorded, and the character recognition and translation processes are skipped. Upon output, the user's input character image "Taro's house" is output, and a defined sentence "[Japanese]" is output as text. At this time, a text message, icon, or voice message indicating that the recorded input character image is output may be output.

The aforementioned operations will be explained below using the flowchart of FIG. 13. Initially, a setup indicating whether or not the character recognition mode is selected is loaded (step S1301). If the character recognition mode is not selected (if the recording/output mode is selected) and characters are input (step S1303), the character image is recorded (step S1304). Upon output, a text output of a defined sentence and an output of the recorded input character image are combined (step S1305, FIG. 12).

On the other hand, if the character recognition mode is selected and characters are input (step S1306), the input characters are recognized (step S1307), and the input character image is recorded (step S1308). If the user's input characters are not registered in the standard patterns for character recognition, if an error has occurred during the recognition process, or if the recognition result has a low certainty factor (e.g., 30% or less) (i.e., if "NO" in step S1309), a text output of a defined sentence and an output of the recorded input character image are combined (step S1310, FIG. 10). If the user's input characters are registered in the standard patterns for character recognition, or if the recognition result has a high certainty factor (e.g., 30% or higher) (i.e., if "YES" in step S1309), the recognized word or phrase is processed (translation process) (step S1311). In the translation process, if no Japanese word or phrase corresponding to the recognition result is registered, if an error has occurred during the translation process, or if the translation result has a low certainty factor (i.e., if "NO" in step S1312), a text output of a defined sentence and an output of the recorded input character image are combined (step S1310, FIG. 11). If the Japanese word or phrase corresponding to the recognition result is registered in the system, or if the translation result has a high certainty factor (e.g., 30% or higher) (i.e., if "YES" in step S1312), the full generated sentence is output as text (step S1313, FIG. 9). The output may be made by synthetic speech of the text in addition to the text output.

Note that the character recognition may use image recognition that exploits an existing image recognition technique, and text according to the user's input image may be output after translation, or the recorded input image may be output.

As described above, according to the second embodiment, an input character image is recorded, and when characters unregistered in the standard patterns for character recognition are input, when an error has occurred during character recognition, when the recognition result has a low certainty factor, when no corresponding Japanese translation is registered in the system, when an error has occurred during the translation process, or when the processing result has a low certainty factor, a text output of a defined sentence and an output of the recorded input character image are displayed in combination, thus reducing the number of times of the user's manual correction upon occurrence of a recognition error or any other errors, and improving the convenience.

Other Embodiment

Note that the present invention may be applied to either a system constituted by a plurality of devices (e.g., a host computer, interface device, reader, printer, and the like) or an apparatus consisting of a single equipment (e.g., a copying machine, facsimile apparatus, or the like).

The objects of the present invention are also achieved by supplying a storage medium, which contains a program code of a software program that can implement the functions of the above-mentioned embodiments, to the system or apparatus. The program code is then read and executed by a computer (or a CPU or MPU) of the system or apparatus. As the storage medium for supplying the program code, for example, a floppy® disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, and the like may be used.

The functions of the above-mentioned embodiments may be implemented not only by executing the readout program code with the computer, but also by executing instructions of the program code, like some or all of actual processing operations, executed by an OS (operating system) running on the computer. Furthermore, the functions of the above-mentioned embodiments may be implemented by some or all of actual processing operations executed by a CPU or the like arranged in a function extension board or a function extension unit, which is inserted in or connected to the computer, once the program code read out from the storage medium is written in a memory in the function extension board or function extension unit.

As many apparently widely different embodiments of the present invention can be made without departing from the spirit and scope thereof, it is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.

CLAIM OF PRIORITY

This application claims priority from Japanese Patent Application No. 2003-378877, filed on Nov. 7, 2003, which is hereby incorporated by reference herein.

The invention claimed is:

1. An information processing apparatus for processing (1) a sentence having a defined portion and a missing portion, and
Written in a memory of the extension board or unit.
tion result is registered in the system, if an error has occurred
during the translation process, or if the translation result has a
loW certainty factor (e.g., 30% or less) (i.e., if “NO” in step
memory card, ROM, and the like may be used.
The functions of the above-mentioned embodiments may
be implemented either by executing the readout program code
are not registered in the standard patterns for character rec
of a de?ned sentence and an output of the recorded input
In this case, the program code, itself, read out from the
storage medium implements the functions of the above-men
tioned embodiments, and the storage medium Which stores
the program code constitutes the present invention.
As the storage medium for supplying the program code, for
55
60
(2) input speech data corresponding to the missing portion, the apparatus comprising:
(a) registration means for registering a defined portion translation, which is a translation of the defined portion;
(b) recording means for recording the input speech data corresponding to the missing portion;
(c) generation means for, when the speech data is input, simultaneously generating:
(i) defined portion translation speech data, which is speech data to output speech corresponding to the defined portion translation registered in said registration means, and
(ii) missing portion translation speech data, which is speech data to output speech corresponding to a missing portion translation, the missing portion translation being a recognized and translated result of the input speech data;
(d) first determination means for determining whether all of the input speech data has been recognized;
(e) second determination means for, if said first determination means determines that all of the input speech data has been recognized, determining whether all of a recognition result has been translated, the recognition result being obtained by speech-recognizing the input speech data; and
(f) speech output means for outputting speech under the following conditions:
(i) if said first determination means determines that all of the input speech data has not been recognized or said second determination means determines that all of the recognition result has not been translated, said speech output means outputs speech by combining the speech data recorded by said recording means with the defined portion translation speech data,
(ii) otherwise, said speech output means outputs speech by combining the defined portion translation speech data with the missing portion translation speech data.

2. The apparatus according to claim 1, wherein said first determination means determines that all of the input speech data has not been recognized when a phrase corresponding to the input speech data is not registered in a syntax for speech recognition or when an error occurs during a speech recognition process.

3. The apparatus according to claim 1, wherein if said first determination means determines that all of the input speech data has not been recognized, said speech output means outputs information representing that all of the input speech data was not recognized.

4. The apparatus according to claim 1, wherein if said second determination means determines that all of the recognition result has not been translated, said speech output means outputs information representing that all of the recognition result was not translated.

5. An information processing apparatus for processing (1) a sentence having a defined portion and a missing portion, and (2) input speech data corresponding to the missing portion, the apparatus comprising:
(a) registration means for registering a defined portion translation, which is a translation of the defined portion;
(b) recording means for recording the input speech data corresponding to the missing portion;
(c) generation means for, when the speech data is input, simultaneously generating:
(i) defined portion translation speech data, which is speech data to output speech corresponding to the defined portion translation registered in said registration means, and
(ii) missing portion translation speech data, which is speech data to output speech corresponding to a missing portion translation, the missing portion translation being a recognized and translated result of the input speech data;
(d) acquisition means for acquiring a recognition certainty factor of a recognition result, the recognition result being obtained by speech-recognizing the input speech data;
(e) determination means for, if the recognition certainty factor is more than a predetermined threshold value, determining whether all of the recognition result has been translated; and
(f) speech output means for outputting speech under the following conditions:
(i) if the recognition certainty factor is less than the predetermined threshold value or said determination means determines that all of the recognition result has not been translated, said speech output means outputs speech by combining the speech data recorded by said recording means with the defined portion translation speech data,
(ii) otherwise, said speech output means outputs speech by combining the defined portion translation speech data with the missing portion translation speech data.

6. The apparatus according to claim 5, wherein if the recognition certainty factor is less than the predetermined threshold value, said speech output means outputs information representing that the recognition certainty factor of the recognition result is low.

7. An information processing apparatus for processing (1) a sentence having a defined portion and a missing portion, and (2) input speech data corresponding to the missing portion, the apparatus comprising:
(a) registration means for registering a defined portion translation, which is a translation of the defined portion;
(b) recording means for recording the input speech data corresponding to the missing portion;
(c) generation means for, when the speech data is input, simultaneously generating:
(i) defined portion translation speech data, which is speech data to output speech corresponding to the defined portion translation registered in said registration means, and
(ii) missing portion translation speech data, which is speech data to output speech corresponding to a missing portion translation, the missing portion translation being a recognized and translated result of the input speech data;
(d) determination means for determining whether all of the input speech data has been recognized;
(e) acquisition means for, if said determination means determines that all of the input speech data has been recognized, acquiring a translation certainty factor of the missing portion translation; and
(f) speech output means for outputting speech under the following conditions:
(i) if said determination means determines that all of the input speech data has not been recognized or the translation certainty factor is less than a predetermined threshold value, said speech output means outputs speech by combining the speech data recorded by said recording means with the defined portion translation speech data,
(ii) otherwise, said speech output means outputs speech by combining the defined portion translation speech data with the missing portion translation speech data.

8. The apparatus according to claim 7, wherein if the translation certainty factor is less than the predetermined threshold value, said speech output means outputs information representing that the translation certainty factor of the translation result of the missing portion translation is low.

9. An information processing apparatus for processing (1) a sentence having a defined portion and a missing portion, and (2) input speech data corresponding to the missing portion, the apparatus comprising:
(a) registration means for registering a defined portion translation, which is a translation of the defined portion;
(b) recording means for recording the input speech data
corresponding to the missing portion;
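The output-selection logic that recurs through the second embodiment (steps S1309–S1313) and through claims 1, 5 and 7 can be sketched as follows. This is an illustrative sketch only, not code from the patent: the function names (process, recognize, translate), the tuple return convention, and the CERTAINTY_THRESHOLD constant are all hypothetical, with the 0.30 value taken from the specification's 30% example.

```python
# Illustrative sketch of the fallback logic described in the patent
# (steps S1309-S1313). All names are hypothetical; the patent itself
# contains no source code.

CERTAINTY_THRESHOLD = 0.30  # the specification's example threshold (30%)

def process(defined_translation, recorded_input, recognize, translate):
    """Select the output as in steps S1309-S1313.

    recognize and translate are callables returning (result, certainty),
    with result set to None when an error occurs or nothing is registered.
    """
    result, rec_certainty = recognize(recorded_input)
    # "NO" in step S1309: unregistered input, a recognition error, or a
    # low certainty factor all trigger the same fallback (step S1310):
    # combine the defined-sentence output with the recorded raw input.
    if result is None or rec_certainty < CERTAINTY_THRESHOLD:
        return ("defined+recorded", defined_translation, recorded_input)
    translation, tr_certainty = translate(result)
    # "NO" in step S1312: the same fallback applies when no translation
    # is registered, a translation error occurs, or certainty is low.
    if translation is None or tr_certainty < CERTAINTY_THRESHOLD:
        return ("defined+recorded", defined_translation, recorded_input)
    # "YES" in step S1312: output the full generated sentence (step S1313).
    return ("full-sentence", defined_translation, translation)
```

The claims phrase the low-certainty branch as "less than the predetermined threshold value", so the sketch uses a strict comparison; either way, the recorded raw input is replayed alongside the defined portion whenever any stage of recognition or translation cannot be trusted.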