Speaker Verification - TMH

Speaker Recognition and Verification
Mats Blomberg
Speech, Music and Hearing, KTH

Person identification
• Methods rely on
– Something you possess
• (e.g. key, magnetic card)
– Something you know
• (PIN code)
– Something you are
• (physical attributes, behaviour: biometrics)
Speaker verification 2005-10-31 [ 1 ]
Biometric identification features
(Figure: features arranged from physical attribute to activity/behaviour)
• Physical attributes: height and weight, fingerprint, hand shape, retina, face
• Activity/behaviour: handwriting, typing pattern, gestures, facial expressions
• SPEECH covers both ends
– Physical: vocal tract size, nasal cavities, glottal folds
– Behaviour: speech rate, phonetic realisation, intonation, choice of words and grammar

Verification / Identification
• Speaker verification
– The claimed identity is verified by voice
– Binary decision: "accept or reject?", "true customer or impostor?"
– The performance is independent of the number of registered users
• Speaker identification
– Choose 1 of N: "Who is the speaker?"
• Closed set: the utterance is known to come from one of the N trained speakers
• Open set: the utterance may be spoken by persons outside the N
– The performance decreases with an increasing number of identities
(Figure: speaker verification accepts or rejects a claimed identity; speaker identification answers "who?")
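The difference between the two tasks can be sketched in a few lines. This is a minimal illustration, assuming each registered speaker model has already produced a similarity score for the utterance (higher means more similar); the speaker names, scores and thresholds are hypothetical. Verification makes one comparison for the claimed identity, whereas identification must pick the best of all N, which is why its error rate grows with N.

```python
from typing import Dict, Optional


def verify(score: float, threshold: float) -> bool:
    """Speaker verification: a binary accept/reject decision on ONE claimed
    identity, so only one comparison is needed however many users exist."""
    return score >= threshold


def identify(scores: Dict[str, float],
             threshold: Optional[float] = None) -> Optional[str]:
    """Speaker identification: choose 1 of N registered speakers.

    Closed set (threshold=None): always return the best-scoring speaker.
    Open set: return None when even the best score is below the threshold,
    i.e. the speaker is judged to be outside the N registered identities.
    """
    best = max(scores, key=lambda name: scores[name])
    if threshold is not None and scores[best] < threshold:
        return None
    return best


# Hypothetical similarity scores of one utterance against three speaker models.
scores = {"anna": 1.2, "bertil": 3.4, "cecilia": 0.7}
print(verify(scores["anna"], threshold=2.0))  # False: claim "anna" is rejected
print(identify(scores))                       # bertil (closed set)
print(identify(scores, threshold=5.0))        # None (open set: unknown speaker)
```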
Advantages/problems with speaker verification
+ Speech is natural
+ Simple to record, non-obtrusive
+ In many applications, speech may already be used for other purposes
+ Low extra cost if the application already uses speech recognition
+ Not 100% secure, but
• That's the case for other techniques as well
• Can be combined with other methods
• Makes it less worthwhile for organised crime
• Deterrent effect
– Large variability for a speaker on different occasions
• Behaviour
• Different microphones or microphone positions
• Physical and mental condition
– Speech recognition problems

Application examples
• Telecommunication
– Bank services, also complementary to manual methods
– Credit cards
– Information access by phone
– Telephone call charging
• On-site
– Entrance control
– Authorisation
– Home incarceration (widely used in the USA)
• Crime investigation
– Objective automatic techniques
• Speaker tracking
– Find the intervals during a conversation when a certain person is speaking
– E.g. during telephone conversations and in radio and TV
Voice characteristics vary with time
(Figure: Variability within one speaker. Acoustic variation among identical utterances as a function of the time interval between the recordings. Average for nine male speakers (Furui, 1986).)

Text dependence
• SV systems have varying requirements on what the user should say (listed in order of decreasing text dependence)
– Fixed password
• Highest text dependence
– User-specific password
– Limited vocabulary
• E.g. digits
– The system presents the text to be spoken
• Text-prompted
• Combination of speaker and text verification
• Prevents playback of recorded speech
– Any word sequence is allowed
• Text independent
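The text-prompted mode combines two checks. The sketch below is a minimal illustration, assuming a speech recogniser that returns the recognised digit string and a speaker verifier that returns a score; both are hypothetical and not shown, and the function names and threshold are made up. A fresh random prompt for every attempt is what makes replaying an old recording unlikely to succeed.

```python
import secrets


def make_prompt(n_digits: int = 4) -> str:
    """Draw a fresh random digit string for each attempt, so a recording of an
    earlier session is unlikely to contain the requested text."""
    return " ".join(str(secrets.randbelow(10)) for _ in range(n_digits))


def text_prompted_decision(prompt: str, recognised_text: str,
                           speaker_score: float, threshold: float) -> bool:
    """Accept only if both checks pass:
    - text verification: the prompted digits were actually spoken
    - speaker verification: the voice matches the claimed identity
    """
    text_ok = recognised_text.split() == prompt.split()
    speaker_ok = speaker_score >= threshold
    return text_ok and speaker_ok


prompt = make_prompt()
print("Please say:", prompt)
# Hypothetical recogniser outputs and speaker scores:
print(text_prompted_decision("3 1 4 1", "3 1 4 1", speaker_score=2.5, threshold=1.0))  # True
print(text_prompted_decision("3 1 4 1", "2 7 1 8", speaker_score=2.5, threshold=1.0))  # False: wrong text
```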
Same or different analysis as in speech recognition?
• SPEECH recognition should be SPEAKER independent
• SPEAKER recognition should be SPEECH independent
• This suggests that the optimal acoustic features differ between speech and speaker recognition
• However, experiments have shown that the best SPEECH representation is at the same time one of the best SPEAKER representations
• Why? Maybe the optimal representation contains both SPEECH and SPEAKER information

Modelling techniques
• HMM
– Text-dependent systems
– The state sequence represents allowed utterances
– Should extract phonetic information but not speaker information
• GMM (Gaussian Mixture Models)
– Text-independent systems
– Single-state HMM with a large number of Gaussian mixture components (~1000) representing any utterance by the speaker
– Sequential information is not used
– Should extract speaker information but not speech information
• Combined GMM + HMM systems
(Figure: a left-to-right HMM with states q1 … q7 and self-transition probability a2,2; the GMM drawn as a single-state HMM, q1)

Two phases in speaker verification
• Registration (training, enrolment)
– Training utterances from a new client → spectral analysis → train model → trained speaker model (HMM with states q1 … q7)
• Verification
– Access utterance and claimed identity → spectral analysis → matching against the trained model of the claimed identity → accept / reject

Probabilistic decision criterion
• If P(The client sounds like this) / P(Anybody could sound like this) > R, then accept, else reject
• Bayes decision theory
– The ratio between the probability scores of a client and an anti-client model is compared with a decision threshold

$$\frac{P(O \mid \theta_C)}{P(O \mid \theta_{\bar{C}})} \ge R \;\Rightarrow\; \text{accept}$$

$O$: utterance
$\theta_C$: client C's model
$\theta_{\bar{C}}$: anti-client model for client C
• The threshold R can be adjusted for
– Required balance between errors
– Minimum total error
– Minimum error cost
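The GMM scoring and the likelihood-ratio decision can be sketched as follows. This is a minimal illustration, assuming diagonal-covariance GMMs whose parameters are already trained (training, e.g. by EM, is not shown) and an anti-client model that is itself a single GMM; the data layout and the toy one-dimensional models are made up. Log probabilities are used because multiplying many frame likelihoods would underflow, so the test against R becomes a test against log R.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of a diagonal-covariance GMM:
# a list of (weight, means, variances), one tuple per mixture component.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_frame_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log p(x | GMM) = log sum_k w_k N(x; mu_k, var_k), via log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def utterance_loglik(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """log P(O | model): frames are scored independently and summed, i.e. no
    sequential information is used (as in text-independent GMM systems)."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames)


def accept(frames, client: GMM, anti_client: GMM, R: float) -> bool:
    """Accept if P(O | theta_C) / P(O | theta_anti) >= R, in the log domain."""
    llr = utterance_loglik(frames, client) - utterance_loglik(frames, anti_client)
    return llr >= math.log(R)


# Toy one-dimensional models (made-up parameters, not trained on real speech).
client_gmm = [(0.5, [0.0], [1.0]), (0.5, [2.0], [1.0])]
anti_client_gmm = [(1.0, [5.0], [4.0])]
print(accept([[0.1], [1.9], [0.3]], client_gmm, anti_client_gmm, R=1.0))  # True
print(accept([[5.0], [6.0]], client_gmm, anti_client_gmm, R=1.0))         # False
```

Practical systems score MFCC vectors against far more components (~1000 above) and often divide the log ratio by the number of frames, so that the threshold does not depend on utterance length.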
Standard system
(Figure: block diagram)
• Utterance → spectral analysis (MFCC)
• Client matching against the speaker model (HMM) of the claimed identity, giving $\log P(O \mid \text{Client})$
• Background model matching against the background model(s) (HMM), giving $\log P(O \mid \text{Non-client})$
• The difference $\log P(O \mid \text{Client}) - \log P(O \mid \text{Non-client})$ is compared with a threshold → decision: accept / reject

Two types of errors
                     Claimed identity: True    Claimed identity: False
Decision: Accept     OK                        False Accept (FA)
Decision: Reject     False Reject (FR)         OK
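The decision step of the standard system and the error table above can be tied together in a short sketch. It assumes the two log-likelihoods per trial have already been computed by the client and background models; the trial values and the threshold are hypothetical. Each trial is mapped to one of the four cells of the error table.

```python
from collections import Counter


def decide(log_p_client: float, log_p_background: float, threshold: float) -> str:
    """Standard system: log P(O|Client) - log P(O|Non-client) vs. a threshold."""
    return "accept" if log_p_client - log_p_background >= threshold else "reject"


def outcome(claim_is_true: bool, decision: str) -> str:
    """Map (claimed identity true/false, decision) to a cell of the error table."""
    if decision == "accept":
        return "OK" if claim_is_true else "False Accept (FA)"
    return "False Reject (FR)" if claim_is_true else "OK"


# Hypothetical trials: (claim is true?, log P(O|Client), log P(O|Non-client))
trials = [(True, -10.0, -14.0), (True, -12.0, -11.5),
          (False, -13.0, -15.5), (False, -16.0, -12.0)]
counts = Counter(outcome(true_claim, decide(c, b, threshold=1.0))
                 for true_claim, c, b in trials)
print(counts)  # two OK, one False Reject (FR), one False Accept (FA)
```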
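Score distribution for true and false speaker identities

$$\hat{s} = \frac{p(O \mid \text{Client})}{p(O \mid \text{Non-client})}$$

(Figure: score densities $f(\hat{s} \mid \text{"true speaker"})$ and $f(\hat{s} \mid \text{"false speaker"})$ plotted against $\hat{s}$, with a decision threshold. The part of the false-speaker density above the threshold is P("false accept"); the part of the true-speaker density below it is P("false reject").)

The error balance depends on the decision threshold
(Figure: error rate versus threshold T. The false accept curve FA(T) falls and the false reject curve FR(T) rises as T increases; they cross at $T_{EER}$.)
EER: Equal Error Rate, $\mathrm{EER} = FA(T_{EER}) = FR(T_{EER})$, at an a posteriori determined threshold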
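The EER can be estimated from labelled test scores by sweeping the threshold. This is a minimal sketch, assuming scores where higher means more client-like and accepting when score ≥ T; the score lists are hypothetical. It picks the candidate threshold where FA(T) and FR(T) are closest, which illustrates why the EER threshold is only known a posteriori. With small score sets FA and FR rarely coincide exactly, so larger evaluations usually interpolate between thresholds.

```python
from typing import Sequence, Tuple


def fa_rate(impostor_scores: Sequence[float], t: float) -> float:
    """FA(T): fraction of impostor attempts scoring at or above the threshold."""
    return sum(s >= t for s in impostor_scores) / len(impostor_scores)


def fr_rate(true_scores: Sequence[float], t: float) -> float:
    """FR(T): fraction of true-identity attempts scoring below the threshold."""
    return sum(s < t for s in true_scores) / len(true_scores)


def equal_error_rate(true_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (T_EER, EER). Candidate thresholds are the observed scores; the one
    where FA and FR are closest is kept and EER is their mean there. The
    threshold is found a posteriori, i.e. using the labelled test scores."""
    best = None
    for t in sorted(set(true_scores) | set(impostor_scores)):
        fa, fr = fa_rate(impostor_scores, t), fr_rate(true_scores, t)
        if best is None or abs(fa - fr) < best[0]:
            best = (abs(fa - fr), t, (fa + fr) / 2)
    return best[1], best[2]


# Hypothetical likelihood-ratio scores (higher = more client-like).
true_scores = [2.1, 2.8, 3.5, 1.2, 3.0]
impostor_scores = [0.2, 1.5, 0.9, 2.5, -0.3]
print(equal_error_rate(true_scores, impostor_scores))  # (2.1, 0.2)
```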
Performance measures
• False Rejection rate (FR)
– FR = (No. of falsely rejected utterances) / (No. of true-identity attempts)
• False Acceptance rate (FA)
– FA = (No. of falsely accepted utterances) / (No. of impostor attempts)
• Half Total Error Rate (HTER)
– HTER = (FR + FA) / 2
• Equal Error Rate (EER)
– EER = FR = FA at an a posteriori determined threshold
– Well-defined measure, but the threshold cannot be selected this way in practice, since it requires knowing the test outcomes in advance
• Detection Error Trade-off (DET)
– Exhibits FR and FA at different thresholds
– Similar to the "Receiver Operating Characteristic" (ROC)

Application-dependent operating point
• The appropriate operating point (balance FA/FR) depends on the costs of each error type
• Bank transactions (high security)
– The FA cost is high
– The customer can accept a few false rejects to achieve high security
• Telephone call charges (high convenience)
– The FA cost is low
– The customer can accept a few false accepts for high convenience
(Figure: DET curve, False Reject [%] versus False Accept [%], logarithmic axes from 0.1 to 10; the high-security operating point lies at low FA, the high-convenience point at low FR.)
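The measures and the cost-dependent choice of operating point can be computed directly from scores. This is a minimal sketch, assuming a score ≥ T means "accept" and that the two cost weights already account for how often each kind of attempt occurs; the scores and cost values are hypothetical. Making false accepts expensive moves the chosen threshold up (high security), making them cheap moves it down (high convenience).

```python
from typing import Sequence, Tuple


def error_rates(true_scores: Sequence[float], impostor_scores: Sequence[float],
                t: float) -> Tuple[float, float]:
    """Return (FR, FA) at threshold t; a score >= t means 'accept'."""
    fr = sum(s < t for s in true_scores) / len(true_scores)
    fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
    return fr, fa


def hter(true_scores, impostor_scores, t: float) -> float:
    """Half Total Error Rate: HTER = (FR + FA) / 2."""
    fr, fa = error_rates(true_scores, impostor_scores, t)
    return (fr + fa) / 2


def min_cost_threshold(true_scores, impostor_scores,
                       cost_fa: float, cost_fr: float) -> float:
    """Pick the observed score that minimises cost_fa * FA + cost_fr * FR.
    Assumes the costs already include how often each kind of attempt occurs."""
    def cost(t: float) -> float:
        fr, fa = error_rates(true_scores, impostor_scores, t)
        return cost_fa * fa + cost_fr * fr
    return min(sorted(set(true_scores) | set(impostor_scores)), key=cost)


# Hypothetical scores (higher = more client-like).
true_scores = [2.1, 2.8, 3.5, 1.2, 3.0]
impostor_scores = [0.2, 1.5, 0.9, 2.5, -0.3]
print(hter(true_scores, impostor_scores, 2.1))  # 0.2
# Bank-like costs (FA expensive) push the threshold up: high security.
print(min_cost_threshold(true_scores, impostor_scores, cost_fa=10, cost_fr=1))  # 2.8
# Call-charging-like costs (FA cheap) push it down: high convenience.
print(min_cost_threshold(true_scores, impostor_scores, cost_fa=1, cost_fr=10))  # 1.2
```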
Performance in different applications
(Figure: DET curves, False Reject [%] versus False Accept [%] from 0.1 to 10, for three conditions:
– Text independent, telephone (several types), medium training size
– Text dependent (system combinations), HiFi speech, known microphone, large training size
– Text dependent (e.g. digit strings), telephone (several types), small training size)

Security aspects
– Performance is measured using casual impostors
– What is the immunity against real impostor attempts?
• Imitations? Recordings? "Personal" speech synthesis?
– The security of conventional systems can be raised by combination with voice
• E.g. protection if a credit card and its PIN code are stolen
– Preventive effect
• Recordings can be saved for later manual control
User aspects
• As little training as possible, preferably none
– With little training data, the speaker's variability cannot be measured
• Speaker verification should make things simpler for the user, preferably be transparent
• Door guard or warning bell?
• What balance FA / FR?
– Depends on the security demands and the costs
– True clients should not be disturbed

The CTT project PER
(Prototype Entrance Receptionist)
• Visually detects the presence of a person at the TMH entrance
• Identifies personnel using speaker verification and unlocks the gate
– Say your name and a prompted digit sequence
– Animated talking face
• Combined HMM and GMM system
– Performance comparable to a commercial system
• In practical use since 1998
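The slides do not say how PER combines its HMM and GMM subsystems. One common option is linear score fusion, sketched below purely as an illustration: it assumes each subsystem outputs a log-likelihood-ratio score for the claimed identity, and the weight, threshold and scores are hypothetical values that would normally be tuned on development data.

```python
def fuse(hmm_score: float, gmm_score: float, weight: float = 0.5) -> float:
    """Linear fusion: weighted sum of the two subsystems' log-likelihood-ratio
    scores. 'weight' is hypothetical and would be tuned on development data."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * hmm_score + (1.0 - weight) * gmm_score


def combined_accept(hmm_score: float, gmm_score: float,
                    threshold: float, weight: float = 0.5) -> bool:
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse(hmm_score, gmm_score, weight) >= threshold


# Hypothetical scores from the two subsystems for one access attempt.
print(combined_accept(hmm_score=2.0, gmm_score=0.4, threshold=1.0))  # True
```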
PER at the TMH entrance
(Photo subject: PER's creator, Håkan Melin)
Summary
• Speaker verification is useful today in certain applications
• Can be combined with other methods to increase security
• User aspects have to be taken into account