Meter detection from audio for Indian music

CMMR/FRSM 2011 - 9-12 March, 2011, Utkal University, Bhubaneswar
Sankalp Gulati, Vishweshwara Rao and Preeti Rao
Department of Electrical Engineering
Indian Institute of Technology Bombay, Mumbai 400076, India
{sankalpg, vishu, prao}
The meter of a musical excerpt provides high-level rhythmic information and is crucial to many Music
Information Retrieval tasks. We investigate the use of a computationally efficient approach to
metrical analysis based on psycho-acoustically motivated decomposition of the audio signal. A two-stage
comb filter-based approach, originally proposed for double/triple meter estimation, is extended
to a septuple meter (such as the 7/8 time signature) and its performance is evaluated on a sizable Indian
music database. We find that this system works well for Indian music and that the distribution of musical
stress/accents across a temporal grid can be utilized to obtain the metrical structure of audio.
Keywords: Meter detection, Indian music, complex meter, comb filtering.
1. Introduction
Music typically comprises events occurring at regular times. Meter is a hierarchical temporal
framework consisting of pulses at different levels (time-scales), where pulses represent regularly
occurring musical events [1]. Perception of meter is an innate cognitive ability in humans. Meter
provides rhythmic information that is essential to understanding musical structure and is useful in
various MIR applications such as similarity-based music classification [2], beat tracking and tempo
estimation of music [3]. In this study we investigate automatic meter detection for Indian music.
1.1 Previous work on meter detection
Considerable research has been directed towards the extraction of low-level rhythmic information, such as
onset detection and beat tracking [4]. However, less attention has been paid to higher-level metrical
analysis. Most of the earlier work on meter analysis concentrated on symbolic data (MIDI). The
system proposed by Goto and Muraoka [8] is considered the first to achieve reasonable
accuracy for the meter analysis task on audio signals. Their system used an agent-based
architecture that tracked competing meter hypotheses and operated in real time. Meter detection requires
tempo-independent information about the rhythmic structure, and hence tempo normalization
becomes a crucial stage in a meter detection system. In the approach proposed by Gouyon and
Herrera [9], the beat indices are manually extracted and then an autocorrelation function, computed on
chosen low-level features (energy flux, spectral flatness, energy in the upper half of the first Bark band), is
used to detect the meter type. This approach also simplified the meter detection problem by
restricting the result to double (2/4, 4/4) and triple (3/4, 6/8) meter. Metrical analysis of non-Western
music using the scale transform for tempo normalization is proposed by Holzapfel and
Stylianou [2]. A more detailed description of previous work on meter analysis from audio can be
found in [1].
1.2 Meter in Indian music
Meter, from a perspective of Indian music, is discussed in depth by Clayton [10]. Rhythmic
organization in Indian Classical Music is described by the Tāl system [10]. Tal can be viewed as a
hierarchical structure organized on three temporal levels: the smallest time unit ‘matra’, the section
‘vibhag’ and the complete rhythmic cycle ‘avart’. Matra may be interpreted as the beat in most cases.
Automatic metrical analysis from audio of Indian music is a relatively unexplored area despite the
well-established Tal framework of rhythmic organization. There are multiple Tals containing a given
number of beats in a rhythmic cycle but which differ from each other in terms of sectional divisions
and distribution of stressed/unstressed beats. In the current work we do not discriminate between the
different possible sectional structures within a cycle but restrict ourselves to obtaining higher metrical
level information by mapping the number of beats in a cycle to a meter type. This is similar to
treating the 3/4 and 6/8 metrical structures as both belonging to triple meter [11].
In the current work, we implement the meter detection system proposed by Schuller, Eyben, and
Rigoll [11] in which the tatum duration is extracted to establish the temporal grid on which metrical
analysis is then performed. Tatum can be defined as that regular time division which coincides
most highly with all note onsets [12]. This approach does not explicitly use any knowledge about the
note onsets, beat positions or downbeat locations. We evaluate the above system on a previously used
database of ballroom dance music and also a new database of Indian music. The latter, in addition to
songs having double or triple meter, also includes songs in a complex meter, in this case septuple
meter (7 beats in a cycle).
2. System implementation
The meter detection system is described in Figure 1. The method relies on finding the tatum duration
and on how well integer multiples of this duration resonate with a sizable segment of the song. We
follow the implementation procedure described in [11]. As can be seen in Figure 1, the whole system can
be divided into three stages. The implementation of each of these stages is described next.
Figure 1 (a) Block diagram of meter detection system, (b) Comb filter bank
2.1 Pre-processing
The input audio signal is downsampled to 16 kHz and converted to a single (mono) channel. The data is split
into 32 ms frames with a hop size of 5 ms, corresponding to a frame rate of 200 Hz. A Hamming
window is applied to each frame and a 512-point FFT is computed. Using 12 overlapping
triangular filters, equidistant on the Mel-frequency scale, these DFT frequency bins are reduced to 12
non-linear frequency bands. The band envelope is then converted to a log scale (dB) and low-pass
filtered by convolving with a half-wave raised cosine filter of length 15 frames (75 ms). This filters
out the noise and high frequencies in the envelope signal without diminishing the fast transient
attacks. From this envelope a weighted differential $d_{wtd}$ is computed according to
$$d_{wtd}(i) = (o_i - \bar{o}_{i,l}) \cdot \bar{o}_{i,r} \qquad (1)$$
where $o_i$ is the sample at position (frame) $i$, $\bar{o}_{i,l}$ is the moving average over one window of 10
samples to the left of the sample $o_i$, and $\bar{o}_{i,r}$ that of the window of 20 samples to the right of the sample $o_i$.
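The pre-processing chain above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the input is taken to be already mono at 16 kHz, the Mel formula is the common 2595·log10(1 + f/700) variant, and the "half-wave raised cosine" is interpreted as the falling half of a raised cosine applied causally. The function names (`mel_filterbank`, `band_envelopes`, `weighted_differential`) are hypothetical. The dB reference level is not stated in the text, so the absolute scale of the output is arbitrary here.

```python
import numpy as np

def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters equidistant on the Mel scale (Mel formula is an assumption)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_bands + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def band_envelopes(x, sr=16000, frame_len=512, hop=80, n_bands=12, smooth_len=15):
    """Smoothed log-energy envelope per Mel band, shape (n_frames, n_bands)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop            # 32 ms frames, 5 ms hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    bands = power @ mel_filterbank(n_bands, frame_len, sr).T
    env_db = 10.0 * np.log10(bands + 1e-10)               # small floor avoids log(0)
    # Falling half of a raised cosine, normalised to unit sum (assumed kernel shape).
    kernel = 0.5 * (1.0 + np.cos(np.pi * np.arange(smooth_len) / smooth_len))
    kernel /= kernel.sum()
    return np.stack(
        [np.convolve(env_db[:, b], kernel)[:n_frames] for b in range(n_bands)],
        axis=1,
    )

def weighted_differential(env, left=10, right=20):
    """(o_i - mean of `left` frames before i) * (mean of `right` frames after i)."""
    env = np.asarray(env, dtype=float)
    d = np.zeros(len(env))
    for i in range(left, len(env) - right):
        mean_left = env[i - left:i].mean()
        mean_right = env[i + 1:i + 1 + right].mean()
        d[i] = (env[i] - mean_left) * mean_right
    return d
```

Applying `weighted_differential` to each column of the array returned by `band_envelopes` gives one differential signal per Mel band, which is the input assumed by the later stages.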
2.2 Tatum extraction
Tatum can be defined as that regular time division which coincides most highly with all note onsets
[12]. It is the lowest metrical level of the song. The tatum extraction method of [11] uses a comb
filter bank-based approach, originally proposed by Scheirer [13]. The comb filter bank consists of
100 filters with delays ranging from 0.1 to 0.6 s. The filter bank processes the extracted
differential signal for each Mel band, and the total energy over all bands of the output of each filter is
computed. These values, one per comb filter, form the tatum vector. The location of the maximum
peak of this function is the delay corresponding to the tatum duration.
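The tatum search can be sketched as follows. The text does not give the comb filter equation or its feedback gain, so this sketch assumes a feedback comb filter y[n] = (1 - a)·x[n] + a·y[n - delay] whose gain a is chosen so that the feedback amplitude halves every `half_time` frames for every delay. The value of `half_time`, the one-frame delay spacing (delays 20 to 119 frames at 200 Hz, i.e. 100 filters) and all function names are assumptions made for illustration. Plain Python loops are used for readability.

```python
def comb_energy(signal, delay, half_time=300.0):
    """Output energy of y[n] = (1 - a) x[n] + a y[n - delay] (delay in frames).

    a = 0.5 ** (delay / half_time): the feedback amplitude halves every
    `half_time` frames regardless of the delay (an assumed design choice).
    """
    a = 0.5 ** (delay / half_time)
    y = [0.0] * len(signal)
    energy = 0.0
    for n, x_n in enumerate(signal):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = (1.0 - a) * x_n + a * feedback
        energy += y[n] * y[n]
    return energy

def tatum_vector(bands, delays, half_time=300.0):
    """Total comb-filter output energy over all bands, one value per delay."""
    return [sum(comb_energy(band, d, half_time) for band in bands) for d in delays]

def extract_tatum(bands, frame_rate=200, min_delay_s=0.1, max_delay_s=0.6):
    """Return (tatum delay in frames, tatum vector) for a list of band signals."""
    delays = list(range(round(min_delay_s * frame_rate), round(max_delay_s * frame_rate)))
    vector = tatum_vector(bands, delays)
    best = max(range(len(delays)), key=vector.__getitem__)
    return delays[best], vector
```

Here `bands` is a list of differential signals, one per Mel band; dividing the returned delay by the frame rate gives the tatum duration in seconds.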
2.3 Meter extraction
The meter vector is computed from the extracted differential signal by setting up narrow comb
filter banks around integer multiples of the tatum duration. For the k-th multiple, the filter bank
contains 2k + 1 comb filters (twice the integer multiple plus one) to compensate for the round-off
error in the tatum duration. For each filter bank, the filter with the highest output energy is selected
and the total energy of this filter over all Mel bands is taken as the salience value in the meter vector
at the position of the integer multiple. In the current implementation, multiples from 1 to 19 are
considered. An example of meter vectors for different meters is shown in Figure 2.
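A minimal sketch of the meter vector computation is given below. It assumes the same feedback comb filter as a plausible reading of the method (the filter equation and gain are not given in the text), measures the tatum in frames, and spaces the 2k + 1 filters of each narrow bank one frame apart around k times the tatum. These choices and the names `comb_energy` and `meter_vector` are assumptions, not the authors' implementation.

```python
def comb_energy(signal, delay, half_time=300.0):
    """Output energy of y[n] = (1 - a) x[n] + a y[n - delay] (delay in frames).

    a = 0.5 ** (delay / half_time) is an assumed gain rule that gives all
    delays the same decay time.
    """
    a = 0.5 ** (delay / half_time)
    y = [0.0] * len(signal)
    energy = 0.0
    for n, x_n in enumerate(signal):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = (1.0 - a) * x_n + a * feedback
        energy += y[n] * y[n]
    return energy

def meter_vector(bands, tatum, max_multiple=19, half_time=300.0):
    """Salience m(k) for k = 1..max_multiple, with `tatum` given in frames."""
    m = {}
    for k in range(1, max_multiple + 1):
        # Narrow bank of 2k + 1 filters centred on k * tatum
        # (one-frame spacing between filters is an assumption).
        candidates = range(k * tatum - k, k * tatum + k + 1)
        m[k] = max(
            sum(comb_energy(band, d, half_time) for band in bands)
            for d in candidates
            if d > 0
        )
    return m
```

For a synthetic signal with a pulse on every tatum and a stronger pulse on every third tatum, the saliences at multiples of three come out larger than those at neighbouring multiples, which is the behaviour the decision rule relies on.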
$$S_2 = [m(4) + m(8) + m(16)] \cdot \tfrac{1}{3} \qquad (2)$$
$$S_3 = [m(3) + m(6) + m(9) + m(18)] \cdot \tfrac{1}{4} \qquad (3)$$
$$S_7 = [m(7) + m(14)] \cdot \tfrac{1}{2} \qquad (4)$$
The final meter value is determined using a simple rule-based approach. For each possible
meter, i.e. double, triple and septuple, we calculate a salience value as in Eq. 2, 3 and 4 respectively. The
maximum of S2, S3 and S7 determines the final meter of the song.
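The rule-based decision can be written in a few lines. In this sketch the meter vector is assumed to be a mapping from the tatum multiple k to the salience m(k); the function names and the toy values in the usage example are hypothetical.

```python
def meter_saliences(m):
    """Average the meter-vector entries associated with each meter (Eq. 2 to 4)."""
    s2 = (m[4] + m[8] + m[16]) / 3.0
    s3 = (m[3] + m[6] + m[9] + m[18]) / 4.0
    s7 = (m[7] + m[14]) / 2.0
    return {"double": s2, "triple": s3, "septuple": s7}

def detect_meter(m):
    """Return the meter whose salience is largest."""
    saliences = meter_saliences(m)
    return max(saliences, key=saliences.get)

# Toy meter vector (made-up values): flat except for peaks at multiples of 7.
toy = {k: 1.0 for k in range(1, 20)}
toy[7] = toy[14] = 3.0
print(detect_meter(toy))  # septuple
```

Dividing each sum by its number of terms keeps the three saliences comparable even though they combine different numbers of meter-vector entries.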
Figure 2 Meter vector for (a) double meter, (b) triple meter and (c) septuple meter (horizontal axis: multiple of tatum)
3. Experimental evaluation
3.1 Database
We have used two databases in the evaluation of the above system. The first is the well-known
ballroom dance database containing 698 audio clips of 30 s each [14]. The audio is categorized into
8 different ballroom dance styles (Jive, Quickstep, Tango, Waltz, Viennese Waltz, Samba, Cha cha
cha and Rumba). Each of these styles belongs to either the double or the triple meter category, and we have
annotated the clips accordingly. The total duration of this database is 5 hrs 49 min.
The second database includes 620 audio clips of 30 s each from Indian film songs. Most of the
songs from old Indian films tend to rigidly follow the tal framework and use mostly acoustic
instruments, whereas the songs from new movies also contain drum loops and electronic
instrumentation. In this database we have included an equal number of popular songs from old and
new films. These audio clips belong to the three metrical structures most commonly
found in Indian film music: 470 clips belong to double meter (4/4, 2/4 time signature), 109 to triple
meter (3/4, 6/8) and 41 to septuple meter (7/8 time signature). The total duration of the database
is 5 hrs 10 min. The ground truth meter values for the database have been annotated by the authors.
3.2 Evaluation and results
The performance of the meter detection system for both databases, the Indian music database
(IMDB) and the ballroom dance database (BDDB), is summarized in Table 1 in the form of a confusion
matrix. Note that although the BDDB did not contain any audio clips in the septuple meter
category, this category was still included as a possible output of the meter detection system.
Removing this category from the system naturally increases system accuracy for this dataset.
We note that although the overall accuracies for both datasets are quite high, the performance of the
system on triple meter is quite low for both datasets. The performance on double meter, for
both databases, and on the complex (septuple) meter, for the IMDB, is equally high. The overall
accuracy of meter extraction over both databases is 87.1%.
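For reference, a confusion matrix and the per-class and overall accuracies of the kind reported in Table 1 can be computed from lists of annotated and detected meters as sketched below. The label names, function names and the toy label lists are hypothetical and are not the paper's data.

```python
from collections import Counter

LABELS = ("double", "triple", "septuple")

def confusion_matrix(truth, predicted):
    """Rows are annotated meters, columns are detected meters, both in LABELS order."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in LABELS] for t in LABELS]

def accuracies(matrix):
    """Per-class accuracy (None for an empty class) and overall accuracy."""
    per_class = [
        row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)
    ]
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return per_class, correct / total

# Toy example with made-up labels, for illustration only.
truth = ["double"] * 4 + ["triple"] * 2 + ["septuple"] * 2
predicted = ["double", "double", "double", "triple",
             "triple", "double", "septuple", "septuple"]
matrix = confusion_matrix(truth, predicted)
per_class, overall = accuracies(matrix)
```

A class with no annotated clips, such as septuple meter in the ballroom database, yields `None` for its per-class accuracy while its column still records any false detections.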
Table 1 Confusion matrix for meter detection and performance accuracies for both databases, BDDB and IMDB
Double Triple Septuple Accuracy (%)
3.3 Discussion
As seen from Table 1, the largest number of errors occurs in the detection of triple meter;
specifically, there is substantial confusion between triple and double meters for both datasets. An analysis of
these erroneous cases revealed that for many songs the error was due to incorrect estimation of the tatum
duration. Such errors in tatum estimation for triple meter songs occur more often in songs
with a fast tempo. Here, periodicities at metrical levels higher than the tatum, such as the half rhythm cycle,
fall within the search range of tatum delays (0.1-0.6 sec). Peaks in the tatum vector at such locations
have salience comparable to that of the true tatum duration, leading to incorrect tatum estimation.
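The effect of tempo on this ambiguity can be illustrated with a small calculation. The sketch below lists which multiples of the tatum have periods inside the 0.1-0.6 sec search range; the two tatum durations and the six-tatum cycle are hypothetical values chosen only for illustration.

```python
def levels_in_tatum_range(tatum_s, multiples, min_delay_s=0.1, max_delay_s=0.6):
    """Return the multiples k whose period k * tatum_s lies inside the search range."""
    return [k for k in multiples if min_delay_s <= k * tatum_s <= max_delay_s]

# Hypothetical six-tatum cycle: multiple 3 is the half cycle, multiple 6 the full cycle.
fast = levels_in_tatum_range(0.11, range(1, 7))  # fast song, tatum of 0.11 s
slow = levels_in_tatum_range(0.25, range(1, 7))  # slower song, tatum of 0.25 s
print(fast)  # [1, 2, 3, 4, 5]
print(slow)  # [1, 2]
```

For the fast example the half cycle (multiple 3, 0.33 s) competes with the true tatum inside the search range, whereas for the slower example it lies outside the range (0.75 s) and cannot be picked as the tatum.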
This incorrect tatum estimation leads to incorrect meter detection. Consider the example in Figure 3,
which displays meter vectors for a triple-meter song whose meter was incorrectly detected as
double. Figures 3(a) and 3(b) display the meter vectors computed from an incorrectly estimated
tatum duration value (half-cycle duration) and the true tatum duration, respectively. Clearly the
salience of the double meter (S2) is high in Figure 3(a) and that of the triple meter (S3) is high in
Figure 3(b). This is a result of analyzing the song structure at a larger time-scale, where the double
meter dominates, as opposed to a smaller time-scale, where the triple meter dominates.
Figure 3 Meter vector of a song from the Indian music dataset in triple meter, computed from (a) an incorrectly estimated tatum and (b) the true tatum duration (horizontal axis: multiple of tatum)
4. Conclusion and future work
The high overall accuracy of 87.1% indicates that this approach to accent extraction from audio is quite
successful in automatically detecting the meter for two culturally distinct datasets: ballroom and
Indian music. The approach, which was originally proposed for simple meters like double and triple,
can also be used to determine complex meters like septuple. However, confusions between triple and
double meters are found to occur as a result of incorrect tatum duration estimation. These errors
need to be analyzed in detail and corrected. We also intend to test the above system on other complex
meters like 5/8 and 9/8, and to use this system for detecting tals in Indian music, since these can
often be categorized on the basis of time signature.
5. References
[1] A. P. Klapuri, A. J. Eronen, and J. T. Astola, “Analysis of the meter of acoustic musical signals,”
IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 342–355, 2006.
[2] A. Holzapfel and Y. Stylianou, “Rhythmic similarity in traditional Turkish music,” in
Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2009.
[3] S. Gulati and P. Rao, “Rhythm Pattern Representation for Tempo Detection in Music,” in
Proceedings of the First International Conference on Intelligent Interactive Technologies and
Multimedia, Dec. 2010, Allahabad, India.
[4] S. Dixon, “Onset Detection Revisited,” in Proceedings of the International Conference on
Digital Audio Effects (DAFx’06), Montreal, Canada, 2006.
[5] A. Klapuri, “Sound Onset Detection by Applying Psychoacoustic Knowledge,” in Proceedings
of the IEEE International Conference on Acoustics, Speech, and Signal Processing, March
[6] S. Dixon, “Automatic extraction of tempo and beat from expressive performances,” J. New
Music Res., vol. 30, no. 1, pp. 39–58, 2001.
[7] D. P. Ellis, “Beat tracking by dynamic programming,” J. New Music Res., vol. 36, no. 1, pp.
51–60, 2007.
[8] M. Goto and Y. Muraoka, “Music Understanding At The Beat Level: Real-time Beat Tracking
For Audio Signals,” in Proceedings of the IJCAI-95 Workshop on Computational Auditory Scene
Analysis, pp. 68–75, 1995.
[9] F. Gouyon and P. Herrera, “Determination of the meter of musical audio signals: Seeking
recurrences in beat segment descriptors,” 114th Audio Engineering Society Convention, March
[10] M. Clayton, Time in Indian Music: Rhythm, Metre, and Form in North Indian Rāg Performance,
Oxford University Press Inc., New York, 2000.
[11] B. Schuller, F. Eyben, and G. Rigoll, “Fast and robust meter and tempo recognition for the
automatic discrimination of ballroom dance styles,” in Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP ’07), pp. 217–220, Honolulu,
Hawaii, USA, April 2007.
[12] F. Gouyon, “A computational approach to rhythm description: Audio features for the
computation of rhythm periodicity functions and their use in tempo induction and music
content processing,” PhD dissertation, Music Technology Group, Pompeu Fabra University,
[13] E. Scheirer, “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Amer.,
vol. 103, no. 1, pp. 588–601, 1998.
[14] F. Gouyon, S. Dixon, E. Pampalk, and G. Widmer, “Evaluating rhythmic descriptors for
musical genre classification,” in Proceedings of the AES 25th Int. Conf., New York, 2004, pp. 196–