Multi-speaker, Narrowband, Continuous Marathi Speech Database

Tejas Godambe, Nandini Bondale, Samudravijaya K
Preeti Rao
School of Technology and Computer Science
Tata Institute of Fundamental Research
Mumbai, India
{tejas.godambe, drnandini.bondale,
samudravijaya}@gmail.com
Department of Electrical Engineering
Indian Institute of Technology Bombay
Mumbai, India
prao@ee.iitb.ac.in
Abstract—We describe the development of a continuous speech database in the Marathi language. Speech data was collected from about 1500 literate speakers from the 34 districts of Maharashtra, covering a variety of speaker characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system from personal mobile handsets and read specially designed sentence sets. The sentence data acquisition was conducted in the field rather than in a quiet environment. As a result, the acquired speech data captured a large number of nonspeech sounds as well as incompletely spoken words. The speech data was therefore transcribed using additional labels that denote frequently occurring nonspeech sounds and different kinds of incomplete and invalid words. We characterize the database in terms of statistics such as the gender distribution of speakers, phonemic richness, amount of nonspeech sounds, and average sentence and word lengths for both reference and actual sentences.
Keywords—speech recognition; speech data; Marathi;
transcription
I. INTRODUCTION
Agriculture has been, and still remains, the major source of income for the majority of the Indian population. To promote agricultural growth, the Ministry of Agriculture, Government of India, has set up a website at http://agmarknet.nic.in/ that displays (and also archives) the selling prices of commodities traded in markets across India on a daily basis. But this information can be accessed only by literate and computer-savvy farmers. Hence, a spoken interface to the online price database, which enables all farmers to access the information in their local language using personal mobile phones, has been developed in six Indian languages by a consortium of seven institutions with support from the Department of Information Technology, Government of India [1]. The spoken interface employs Automatic Speech Recognition (ASR) and speech synthesis technologies. Tata Institute of Fundamental Research, Mumbai, and Indian Institute of Technology Bombay have together developed a voice interface in the Marathi language for Maharashtra's farmers. This voice interface, which we refer to as the Agricultural Information System (AIS), prompts the farmer to speak, one at a time, the name of the commodity whose price (s)he is interested in, the name of the district, and the nearest mandi (market). In practice, however, farmers utter extra words. We observed this in the recordings of a few telephone calls dialed to the AIS by farmers. Some farmers uttered "Give the price of Potato" (English translation of the Marathi query) instead of saying just "Potato" in response to the system's prompt for a commodity name. In order to enable the existing ASR system, primarily trained with isolated-word utterances of district, market and commodity names, to infer the keywords of interest (Potato in the above example) from a sequence of spoken words, we decided to record continuous speech in the form of sentences read over the telephone/mobile channel. Our volunteers traversed all 34 districts of Maharashtra to collect speech data from around 1500 speakers, which helped us capture regional speech and environmental variations as well as varied speaker characteristics.
This paper is organized as follows. Section II explains the process of developing the continuous Marathi speech database using read sentences. In Section III, we present a statistical analysis of the acquired data. Section IV discusses the mapping of rare sounds to other phonemes for ASR applications. We conclude the paper in Section V.
II. DEVELOPMENT OF MARATHI SENTENCE DATABASE
Development of the Marathi sentence database is explained in
three steps as follows: (1) Generation of Marathi sentence sets;
(2) Speech data recording; and (3) Speech data transcription.
A. Generation of Marathi Sentence Sets
For acquiring continuous speech data, each speaker (literate enough to read Marathi, and not necessarily a farmer) was asked to read out a set of sentences. The design of these sentence sets, and the refinement step required before releasing them for data collection [2], are described below.
1) Semi-automatic design of sentence sets
A few key things we kept in mind while designing sentence
sets were as follows:
• A set should contain sentences from diverse sources.
• The sentences should be grammatically correct,
meaningful and natural.
• The sentences should be easy to read, i.e. they should
be simple and short.
• The number of sentences in a set should not be large.
So, each set, containing 10 sentences, was automatically generated by pooling sentences from several sources: six sentences from books, two proverbs, one sentence from online stories, and one digit sequence (seven digits long), to incorporate variety. The number of sentences selected from each source was proportional to the size of the source. However, we did not select sentences containing fewer than four words (too short) or more than 10 words (too long), since too-short sentences yield little speech data and long sentences may be difficult to read without fumbling in a single attempt. In all, 340 unique sentence sets containing 3400 sentences were automatically generated. We fit two sentence sets on each side of a single A4 sheet, i.e. four sets per sheet, so that a volunteer did not need to carry many pages into the field, while making sure the font size remained large enough for the speaker to read the sentences comfortably.
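The pooling procedure above can be sketched in a few lines. This is an illustrative reconstruction, not the actual generation script; the source pools (books, proverbs, stories) are placeholder arguments.

```python
import random

def make_sentence_set(books, proverbs, stories, rng):
    """Build one 10-sentence set: six sentences from books, two proverbs,
    one story sentence, and one seven-digit sequence."""
    def ok(sentence):
        n = len(sentence.split())
        return 4 <= n <= 10          # drop too-short and too-long sentences

    sset = (rng.sample([s for s in books if ok(s)], 6)
            + rng.sample([s for s in proverbs if ok(s)], 2)
            + rng.sample([s for s in stories if ok(s)], 1))
    # one seven-digit sequence, read digit by digit
    sset.append(" ".join(str(rng.randrange(10)) for _ in range(7)))
    return sset
```

With 340 such sets one covers all 3400 sentences while keeping each set short enough to fit on a printed sheet.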
2) Manual refinement of sentence sets
We noticed that sentences copied from the web contained many spelling errors. Some sentences were syntactically or semantically incorrect, and some contained invalid words. There were even a few sentences that touched socially and politically sensitive topics. A few sentences contained words which we felt might be difficult for speakers to pronounce and might give rise to more speech disfluencies. Some sentences, even with fewer than 10 words, could not be accommodated in a single line. All 3400 sentences were manually validated to remove such defects.
B. Speech Data Recording
During data collection, our aim was to acquire continuous speech data, using the 340 specially designed sentence sets, from many speakers in various background conditions over a narrowband telephone channel. In this subsection, we briefly describe the data acquisition process, discuss the motivation for collecting wideband speech in parallel with narrowband speech, analyze the metadata statistics, and examine the proportion of male and female data obtained from each administrative division of Maharashtra.
1) Data acquisition process
Two Data Acquisition Systems (DAS) employing dedicated phone lines were set up, one each at IITB and TIFR. Our volunteers traversed all 34 districts of Maharashtra to collect read speech data, using the 340 specially designed sentence sets, from around 1500 literate speakers. Each speaker read exactly one sentence set, and each set was read by 4-5 speakers. Before dialing a call to the DAS and beginning the actual recording, the volunteer would ask the speaker to familiarize himself/herself with the pronunciation of the words in the sentence set provided, so that the speaker would not fumble during the actual recording. Then, the volunteer would dial a call to the DAS using the speaker's calling device (mobile or landline), enter the speaker's metadata via DTMF keys in response to the system prompts, and hand over the phone to the speaker. The DAS would then prompt the speaker to read out each numbered sentence, one by one. The recording duration for a sentence was kept long enough to allow the speakers to read the sentences at their natural pace. A facility to pause the call (by pressing the '*' key) and to replay the current system prompt (by pressing the '#' key) was also provided. This entire endeavor yielded around 25 hours of speech data.
2) Wideband speech data
The speech signal that traversed the wireless, narrowband channel was recorded in the lab. In addition, we provided high-quality Sony handheld recorders (model ICD-PX820) to the volunteers to simultaneously record wideband speech. The hope was that one could compare the wideband speech recorded in situ with the same speech recorded after passing through the narrowband (mobile) channel, and derive insight into the convolutional distortions of mobile speech. This information could potentially be used to minimize the effects of channel and handset variations. However, we found that the recorded wideband speech data is highly noisy, with widely varying SNR. The recorded files have not yet been segmented into sentence-length chunks for comparison with the corresponding narrowband data, and the wideband recordings also contain conversations between the farmer and the volunteer.
3) Metadata statistics
Before handing the phone to the speaker for recording, volunteers entered metadata (various attributes of the speaker, environment and calling device) via DTMF keys in response to system prompts. Table I shows the metadata statistics.
TABLE I. METADATA STATISTICS

Metadata                    Stats (%)
Gender                      Male 83.4; Female 16.6
Age group                   15-25: 29.7; 25-50: 64.0; >50: 6.3
Mother tongue               Marathi 90.6; Other 9.4
Educational qualification   Primary 9.7; High school 37.3; College 50.1
Phone loudspeaker mode      ON 89.2; OFF 10.8
Recording environment       Quiet 25.5; Music in background 9.1; People talking in background 45.1; Other noises in background 20.3
Calling device              Mobile 92.5; Landline 7.5
Phone manufacturer          Nokia 42.3; Samsung 15.2; Other 42.5
Service provider            Airtel 27.1; Idea Cellular 26.0; Vodafone 15.4; BSNL 11.1; Other 20.4
The salient features of the metadata statistics in Table I are as follows.
a) We expect that the AIS will mostly be used by Indian farmers belonging to the 25-50 (middle) age group, and particularly by male farmers. Female farmers may call the AIS, but surely not as frequently as male farmers. Also, it is not easy to acquire speech data from middle-aged female farmers in rural areas. Hence, we instructed our volunteers to collect speech data from male and female speakers in an 80:20 proportion. Analysis shows that our volunteers collected speech data in an 83.4:16.6 proportion, and that the majority of speakers participating in data collection belonged to the 25-50 age group.
b) More than 90% of the speakers were fluent Marathi speakers.
c) Half of the speakers were college educated.
d) As expected, the recording environment was nonquiet in the majority (~75%) of cases.
e) Given the increasing usage of mobile phones over landlines, we asked volunteers to dial calls to the DAS from speakers' mobile and landline phones in a 90:10 proportion; the calls were actually made in a 92.5:7.5 proportion.
f) Most speakers owned Nokia or Samsung handsets.
g) Most speakers were subscribers of Airtel, Idea Cellular, Vodafone or BSNL. Since other handset brands and service providers did not have a comparable number of customers, we do not report their individual figures in Table I.
4) Division-wise proportion of male and female speakers
The state of Maharashtra in India has six administrative divisions [3]: Amaravati, Aurangabad, Konkan, Nagpur, Nashik and Pune. Table II shows the percentage of the total Marathi sentence data collected in each division, along with the gender distribution in each division. To interpret the numbers in Table II: the second column shows that the Amaravati division contains five districts; the third column states that 16.4% of the total sentence speech data was collected in the Amaravati division; and the next two columns show that 14.2% of the total male data and 2.2% of the total female data came from the Amaravati division.
TABLE II. PROPORTION OF TOTAL MARATHI SENTENCE DATA AND MALE / FEMALE DATA COLLECTED IN EACH ADMINISTRATIVE DIVISION

Administrative division   No. of dist.   Speech files (%)   Male spkrs (%)   Female spkrs (%)
Amaravati                 5              16.4               14.2             2.2
Aurangabad                8              24.7               19.5             5.2
Konkan                    5              13.8               11.6             2.2
Nagpur                    6              17.4               14.2             3.2
Nashik                    5              14.4               12.0             2.4
Pune                      5              13.3               11.8             1.5
C. Speech Data Transcription
In addition to acquiring speech data that represents test conditions well, accurate transcriptions are essential for building a robust ASR system. We transcribed the sentence utterances at the word level with the help of the Indic Language Transliteration tool [4] in IT3 format [5]. Since the sentence data to be transcribed was large, and contained a significant amount of nonspeech events of different kinds as well as incomplete and invalid words, a set of transcription guidelines [6] was written for all transcribers to refer to, which helped maintain consistency in the transcription process. The scheme we followed for transcribing nonspeech events and incomplete/invalid words is described below.
1) Transcribing nonspeech events
Nonspeech (filler) sounds are hard to ignore, as they occur in large numbers in the acquired database. If not labeled, they can corrupt the statistical parameters estimated for speech sounds. Table III lists the labels of nine filler sounds with their meanings, sorted according to their frequency of occurrence in the database. Following the convention of the CMU Sphinx ASR tool [7], we enclosed the filler labels in double '+' signs to distinguish them from other words in the transcriptions. A detailed explanation of the usage of filler labels is given in [6]. We discuss the frequency distribution of fillers in Section III.
TABLE III. LIST OF FILLER LABELS WITH MEANING, SORTED ACCORDING TO THEIR FREQUENCY OF OCCURRENCE IN THE DATABASE

Filler label   Meaning                              Count   (%)
++bn++         Background noise                     7931    52.33
++babble++     People talking in background         3172    20.93
++horn++       Machine-generated narrowband sound   1543    10.18
++pau++        Noticeable silence                   1229    8.11
++vn++         Vocal noise                          470     3.10
++bang++       Impulsive sound                      385     2.54
++aah++        Hesitation sound                     294     1.94
++laugh++      Laughing sound                       79      0.52
++hmm++        Affirmative sound                    52      0.35
2) Transcribing incomplete and invalid spoken words
The following excerpt from the transcription guidelines illustrates four situations in which an incomplete word is spoken, and explains how we transcribed it in each situation.
• The speaker mumbled something and then said "ster". If the transcriber was able to predict that the speaker wanted to say "minister", (s)he transcribed the partially intelligible word as [mini]*ster.
• The speaker said only "ster". If the transcriber could predict that the speaker wanted to say "minister", the incompletely spoken word was transcribed as *ster.
• The speaker said only "ster". If the transcriber could not predict what the speaker wanted to say, i.e. if the incomplete word sounded invalid to the transcriber, it was transcribed as ++babble++.
• The speaker said just "mini". Since "mini" is a valid word in the English language, we transcribed it as mini.
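The marker conventions above are regular enough to be checked mechanically. The following sketch (a hypothetical helper, not part of the actual transcription toolchain) classifies a single transcription token according to the four cases:

```python
import re

def classify_token(token):
    """Classify one transcription token per the marker conventions:
    ++label++        -> filler
    [pred]*spoken    -> partially intelligible word with predicted prefix
    *spoken          -> incomplete but predictable word
    anything else    -> ordinary (complete) word
    """
    if re.fullmatch(r"\+\+\w+\+\+", token):
        return "filler"
    if re.fullmatch(r"\[\w+\]\*\w+", token):
        return "partial-intelligible"
    if token.startswith("*"):
        return "incomplete"
    return "word"
```

Such a checker can be run over all transcriptions to catch malformed markers before training.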
The following lines show an exemplar sentence in Devanagari script, the expected (reference) transcription in IT3 format, and the manually corrected actual transcription containing filler labels for unexpected sounds.
a) Reference sentence: मी िमिनसटर चं बोलणं टेिलिवहजन वर
ऐकलं (which means “I heard the minister’s speech on television”)
b) Reference transcription: mii < minister > chan'
boland-an' < television > vara aikalan’
c) Actual transcription: ++babble++ mii < mini*[ster] >
++horn++ chan' boland-an' < television > vara aikalan’
++babble++
We can see that the reference and actual transcriptions differ significantly. In the utterance, the first part of the word "minister", up to "mini", was intelligible, but the latter part was not. The partially intelligible word was therefore transcribed as mini*[ster], as the transcriber could also predict (from the reference transcription) that the speaker wanted to say "minister". A ++horn++ label was included after the word "minister" because of a horn sound following it. We also noticed people talking in the background throughout the utterance; this was indicated by including ++babble++ at the start and end of the actual transcription. Also, the words "minister" and "television" are enclosed in angular brackets to indicate English words.
III. STATISTICAL ANALYSIS OF MARATHI UTTERANCES
In this section, we present a few statistics of the acquired Marathi sentence database, so as to understand its advantages and shortcomings and to judge its usefulness for different applications.
A. Phonemic Richness of Sentence Data
The transcription of Marathi speech contains 70 phoneme-like labels; some labels represent nasalized or geminated sounds. In this paper, we use the terms phonemes and phoneme-like labels interchangeably. Table IV shows all 70 phonemes with their percentage relative frequencies of occurrence in the database. The percentage relative frequency of the ith phoneme is defined as
fi = 100 * ni / Σ_{j=1}^{N} nj,
where nj is the number of times the jth phoneme occurs in the database and N is the total number of phonemes (N = 70 here). Each phoneme cell in Table IV is shaded in one of three colors: white, yellow or orange. The 29 least frequent phonemes, which together constitute 1% of the accumulated count of all 70 phonemes, are shaded in orange; we call them rare phonemes. The next 34 phonemes, which occupy 49% of the database, are shaded in yellow; we call them medium-frequency phonemes. The final seven phonemes, which cover the remaining 50%, are shaded in white; we call them frequent phonemes. We note that geminated consonants (16) make up more than half of the 29 rare phonemes, and vowels (4) more than half of the seven frequent phonemes.

TABLE IV. PHONEMES WITH THEIR PERCENTAGE RELATIVE FREQUENCY OF OCCURRENCE IN THE DATABASE

Vowels (short, long, nasalized) and diphthongs:
ʌ 12.413 | ʌ̆ 0.360 | a 15.205 | ã 0.010 | i 1.900 | i: 4.511 | u 1.793 | u: 1.375 | e 4.630 | æ 0.005 | o 2.204 | ai 0.142 | aĩ 0.006 | au 0.053

Stops and affricates (unaspirated/aspirated, unvoiced/voiced) with nasals:
Velar: k 3.322 | kʰ 0.592 | g 1.367 | gʰ 0.514 | ŋ 0.279 | kkʰ 0.003
Palatal: t͡ʃ 2.930 | t͡ʃʰ 0.038 | d͡ʒ 0.795 | d͡ʒʰ 0.184 | d͡z 0.272 | d͡zʰ 0.246 | ɲ 0.290 | t͡ʃt͡ʃʰ 0.018
Alveolar: ʈ 0.684 | ʈʰ 0.604 | ɖ 1.008 | ɖʰ 0.005 | ɳ 1.595 | ʈʈʰ 0.004
Dental: t̪ 5.028 | tʰ 0.238 | d̪ 1.370 | d̪ʰ 0.456 | n 4.403 | d̪d̪ʰ 0.011
Labial: p 2.216 | b 0.873 | bʰ 0.450 | m 2.530

Semivowels and fricatives:
j 3.717 | r 4.356 | l 3.795 | ɭ 0.871 | v 2.744 | ʃ 0.994 | ʂ 0.397 | s 2.774 | f 0.206 | h 2.748

Rare geminated phonemes:
t̪t̪ 0.053 | kk 0.038 | ll 0.034 | ʈʈ 0.020 | pp 0.016 | nn 0.016 | dd 0.015 | ɳɳ 0.011 | cc 0.008 | ss 0.002 | ɖɖ 0.002 | vv 0.002 | bb 0.002 | d͡ʒd͡ʒ 0.002 | jj 0.001 | d͡zʰd͡zʰ 0.001
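The rare/medium/frequent split described above can be computed directly from phoneme counts. This is an illustrative sketch assuming a simple {phoneme: count} dictionary as input, with the 1% and 50% mass thresholds as parameters:

```python
def phoneme_bins(counts, rare_mass=0.01, frequent_mass=0.50):
    """Given {phoneme: count}, return (rare, medium, frequent) sets.
    Rare phonemes together account for at most `rare_mass` of all tokens
    (accumulating from the least frequent); frequent phonemes cover the
    top `frequent_mass` (accumulating from the most frequent)."""
    total = sum(counts.values())
    rel = {p: c / total for p, c in counts.items()}
    asc = sorted(rel, key=rel.get)            # least frequent first

    rare, acc = set(), 0.0
    for p in asc:                             # grow the rare set up to 1% mass
        if acc + rel[p] > rare_mass:
            break
        rare.add(p)
        acc += rel[p]

    frequent, acc = set(), 0.0
    for p in reversed(asc):                   # grow the frequent set to 50% mass
        frequent.add(p)
        acc += rel[p]
        if acc >= frequent_mass:
            break

    medium = set(rel) - rare - frequent
    return rare, medium, frequent
```

On the counts underlying Table IV, this procedure would yield the 29/34/7 split reported above.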
It is interesting to compare the phonetic richness (relative frequency of rare phonemes) of the current Marathi sentence database with that of an earlier work [8], where phonetically rich sentences were culled from a large general text corpus. Their transcription had 50 labels corresponding to the Devanagari alphabet, whereas we use 70 phonemes here; most of the 20 extra phonemes correspond to rare geminated sounds. Figure 1 shows a plot of the cumulative relative frequency (in log scale) of the 70 phonemes in the current database. We compare the cumulative relative frequencies of the top 50 frequent phonemes in our database with those in [8]. We note that the relative frequency (0.001) of the 50th most frequent phoneme (the least frequent phoneme) in [8] matches that of the 50th most frequent phoneme in our database. Also, the curve in Figure 1 shows the same trend as that in [8] for the top 50 phonemes.
Fig. 1. Cumulative relative frequency of phonemes in the database
B. Proportion of Filler Labels in Actual Transcriptions
We transcribed 14,662 utterances, corresponding to the 3400 reference sentences, collected from around 1500 literate speakers. The utterances contain 94,306 spoken words (complete and incomplete) and 15,155 filler labels, so the proportion of filler labels in the transcriptions is 13.8%.
The last two columns in Table III show the frequency distribution of the 15,155 filler labels among the nine distinct fillers, in both absolute counts and percentages. We observe that ++bn++ (background noise) tops the list by a huge margin. The reason is that every other filler tag represents a definite category of nonspeech sound; whenever an uncharacterizable or unclear nonspeech sound that did not fit into one of the eight predefined categories was observed, it was tagged as ++bn++.
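Since fillers are the only tokens enclosed in double '+' signs, the 13.8% figure can be recomputed from the transcriptions alone. A minimal sketch, assuming transcriptions are whitespace-separated strings:

```python
import re

def filler_proportion(transcriptions):
    """Proportion of ++...++ filler labels among all tokens
    (spoken words plus fillers) in a list of transcription strings."""
    fillers = words = 0
    for line in transcriptions:
        for tok in line.split():
            if re.fullmatch(r"\+\+\w+\+\+", tok):
                fillers += 1
            else:
                words += 1
    return fillers / (fillers + words)
```

Run over the full database (94,306 words and 15,155 fillers), this ratio works out to the reported 13.8%.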
C. Hints from Average Sentence and Word Length
Here, we first calculate the average sentence length to get a feel for the number of words the speakers had to articulate per sentence. Average sentence lengths may differ between reference and actual transcriptions. Table V shows the average sentence length of the reference transcriptions and of the actual transcriptions (counting only complete words, and counting both complete and incomplete words). The reference sentence sets contain 22,257 words spread across 3400 sentences, so the average number of words per reference sentence is 6.54. The 14,662 actual transcriptions contain 93,589 complete words and 717 incomplete words. The average number of words per actual transcription is therefore 6.38 considering only complete words, which increases to 6.43 when incomplete words are counted as well.
TABLE V. COMPARISON BETWEEN AVERAGE SENTENCE LENGTHS OF REFERENCE AND ACTUAL TRANSCRIPTIONS

Type of sentence transcriptions                  Total no. of words   Total no. of sentences   Average sentence length
Reference transcriptions                         22257                3400                     6.54
Actual trans. (only complete words)              93589                14662                    6.38
Actual trans. (complete and incomplete words)    94306                14662                    6.43
A comparison of the average sentence lengths of reference and actual transcriptions shows that not all words in the reference sentences were read out by the speakers. But the presence of a large number of incompletely spoken words indicates that the speakers may not have intentionally skipped whole words (to finish reading quickly); they seem to have attempted to utter the words, possibly unsuccessfully. Looking for the reason why so many incomplete words occurred, we noticed that of the 717 incomplete words, 100 occurred at the start of a sentence and 215 towards the end. This possibly happened because, in the 100 instances, speakers started reading the sentence before the recording began, while in the 215 instances, the speakers probably had not finished reading the sentence when the recording stopped. As a precautionary measure, we had played a soft beep to hint to the speakers that recording had begun and that they could start reading; it appears that in 315 out of 14,662 cases, speakers did not notice or follow it.
We also calculated the average word length (number of phonemes per word) to know whether the words are short, moderate or long. Using a preliminary pronunciation dictionary, we mapped the 93,589 complete words in the actual transcriptions to 669,513 phonemes. The average word length thus comes to 7.1 phonemes per word, which indicates that the words in the database, like the sentences, are neither short nor long, but moderate in length.
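Both averages can be computed in one pass over the transcriptions. A sketch, assuming a pronunciation dictionary that maps each word to its phoneme list (names hypothetical):

```python
def average_lengths(transcriptions, pron_dict):
    """Return (average words per sentence, average phonemes per word).
    Word length is averaged only over words found in the hypothetical
    pronunciation dictionary."""
    n_words = n_phones = n_dict_words = 0
    for sent in transcriptions:
        toks = sent.split()
        n_words += len(toks)
        for w in toks:
            if w in pron_dict:
                n_phones += len(pron_dict[w])
                n_dict_words += 1
    avg_sent = n_words / len(transcriptions)
    avg_word = n_phones / n_dict_words if n_dict_words else 0.0
    return avg_sent, avg_word
```

Applied to the 14,662 transcriptions with the preliminary dictionary, this yields the 6.38 and 7.1 figures quoted above.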
D. Occurrence of English Words in Marathi Sentences
We often see people using foreign words (words from other languages) in conversation. The reason is that some foreign words are more popular than their domestic counterparts, which helps the speaker convey his/her idea effectively and the listener grasp it quickly with minimal effort. The reference sentences we distributed for reading contain 60 unique foreign (English) words among 9383 unique words in total. Among the 60 unique English words are 38 common nouns, one proper noun, 15 verbs and six adjectives.
IV. APPLICATIONS
This database will be used to enable the existing speaker-independent ASR system, employed as a part of the AIS, to infer keywords of interest from continuous queries. It can also be used for other tasks such as speaker recognition, and for building a TTS system following a statistical modeling approach. While building an ASR system, it is essential to have enough tokens of each phoneme to model its statistical properties. But, as we saw in Section III, 29 phonemes occur only rarely in the database. We have two choices to mitigate this data-scarcity problem: either discard the utterances containing these rare sounds during training, or map the rare phonemes to acoustically similar, frequent phonemes. Since many phonemes occur rarely, it is unwise to discard the utterances in which they occur, as that would also mean losing speech data for the other sounds in those utterances. So, we choose to map them.
Table VI shows a mapping chart for the rare phonemes. We
classify them into five categories: geminated stop sounds,
geminated continuant sounds, diphthongs, nasalized vowels,
and stops followed by their aspirated counterparts.
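Applying such a mapping to transcriptions is a token-wise substitution in which one rare phoneme may expand to a sequence of phonemes. A sketch, using an illustrative subset of the mapping entries (the full chart is in Table VI):

```python
# Illustrative subset of the rare-phoneme mapping: geminated stops map to
# their single counterparts, geminated continuants and diphthongs to a
# two-symbol sequence, nasalized/short vowels to their plain counterparts.
RARE_MAP = {
    "kk": ["k"],                         # geminated stop -> single stop
    "ll": ["l", "l"],                    # geminated continuant -> sequence
    "nn": ["n", "n"],
    "ai": ["a", "i"],                    # diphthong -> component vowels
    "au": ["a", "u"],
    "ã": ["a"],                          # nasalized vowel -> oral vowel
}

def map_rare(phones):
    """Replace rare phonemes in a phoneme sequence, expanding where needed."""
    out = []
    for p in phones:
        out.extend(RARE_MAP.get(p, [p]))  # unmapped phonemes pass through
    return out
```

The same dictionary can be applied to pronunciation-lexicon entries before training, so the acoustic models never see the rare labels.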
TABLE VI. MAPPING CHART FOR RARE PHONEMES

Rare geminated stops, mapped to their single nongeminated counterparts:
t̪t̪ → t̪; kk → k; ʈʈ → ʈ; pp → p; dd → d; t͡ʃt͡ʃ → t͡ʃ; ɖɖ → ɖ; bb → b; d͡ʒd͡ʒ → d͡ʒ; d͡zʰd͡zʰ → d͡zʰ
Rare geminated continuants, mapped to a sequence of two nongeminated counterparts:
ll → l l; nn → n n; ɳɳ → ɳ ɳ; ss → s s; vv → v v; jj → j j
Rare diphthongs and nasalized vowels:
ai → a i; au → a u; æ → e; aĩ → ai; ã → a; ʌ̆ → ʌ
Stops followed by their aspirated counterparts, mapped to the aspirated stops:
kkʰ → kʰ; ʈʈʰ → ʈʰ; d̪d̪ʰ → d̪ʰ; t͡ʃt͡ʃʰ → t͡ʃʰ

Stop sounds (geminated or nongeminated) contain a closure portion followed by one burst (release). Continuant sounds, in contrast, do not contain sequential parts; they are produced by an incomplete closure of the vocal tract which allows a continuous passage of air. So, we map a geminated continuant to a sequence of its nongeminated counterparts. Mapping a geminated stop to a sequence of its nongeminated counterparts, i.e. a sequence of two (closure + release) portions, would not be correct; hence, we map a geminated stop to its single nongeminated counterpart. Next, we map the rare diphthongs to the sequence of their component vowels, nasalized vowels to the corresponding nonnasalized vowels, and stops followed by their aspirated counterparts to the aspirated stops.

V. CONCLUSIONS
In this paper, we described the development of a multi-speaker, narrowband, continuous speech database in the Marathi language. Its primary objective is to enable the existing ASR system, employed as the front end of an agricultural price information delivery application, to infer keywords of interest from continuous queries. Since the majority of the speakers participating in data collection had an agrarian background, the sentence-level speech data collected from them contains a significant amount of speech disfluencies and incompletely spoken words in addition to various background sounds. We explained a method to label these events in the word-level transcriptions. A preliminary statistical analysis of the speech database was carried out. This speech data can also be used for other tasks such as speaker recognition, and for building a speech synthesis system following a statistical modeling approach.

ACKNOWLEDGMENT
We thank the Department of Information Technology for supporting this work. We also thank Pranav Jawale, Nikul Prajapati, Pinki Kumawat and Jigar Gada from IIT Bombay, and Namrata Karkera, Nikita Acharekar, Joel Mathias, Tauseef Hussain and Vishal Khadake from TIFR Mumbai, without whom this work would not have been possible.

REFERENCES
[1] Department of Information Technology (DIT), http://www.mit.gov.in/
[2] Tejas Godambe and Samudravijaya K, "Speech data acquisition for voice based agricultural information retrieval", presented at the 39th All India DLA Conference, Punjabi University, Patiala, 14-16 June 2011.
[3] Administrative divisions in the state of Maharashtra, https://www.maharashtra.gov.in/Site/Common/District.aspx
[4] Indic Language Transliteration Tool, http://ravi.iiit.ac.in/~speech/Transliteration/
[5] L. Prahallad, K. Prahallad, and G. Madhavi, "A simple approach for building transliteration editors for Indian languages", Journal of Zhejiang University Science, vol. 6A, no. 11, pp. 1354-1361, 2005.
[6] Transcription guidelines, http://speech.tifr.res.in/asrProject/transcription/transcriptionGuidelines_v1.2.doc
[7] Carnegie Mellon University's Sphinx ASR tool, http://cmusphinx.sourceforge.net/
[8] Samudravijaya K and Mandar R Gogate, "Marathi speech database", Proc. of Int. Symp. on Speech Technology and Processing Systems and Oriental COCOSDA-2006, Penang, Malaysia, pp. 21-24.