Audio Content Analysis

Burred, Juan José; Haller, M.; Jin, S.; Samour, Amjad; Sikora, Thomas

doi:10.1007/978-1-84800-076-6_5

Cited by 8 publications

(2 citation statements)

References 38 publications

“…Such metrics have often been augmented with temporal information, which was found to improve the robustness of content identification [17,18]. Common modeling of temporal dynamics also ranged from simple summary statistics such as onsets, attack time, velocity, acceleration and higher-order moments to more sophisticated statistical temporal modeling using Hidden Markov Models, Artificial Neural Networks, Adaptive Resonance Theory models, Liquid State Machine systems and Self-Organizing Maps [19,20]. Overall, the choice of features was very dependent on the task at hand, the complexity of the dataset, and the desired performance level and robustness of the system.…”

Section: Introduction (mentioning)

confidence: 99%

Correction: Music in Our Ears: The Biological Bases of Musical Timbre Perception

Patil, Pressnitzer, Shamma et al. 2013. PLoS Comput Biol

Timbre is the attribute of sound that allows humans and other animals to distinguish among different sound sources. Studies based on psychophysical judgments of musical timbre, ecological analyses of sound's physical characteristics as well as machine learning approaches have all suggested that timbre is a multifaceted attribute that invokes both spectral and temporal sound features. Here, we explored the neural underpinnings of musical timbre. We used a neuro-computational framework based on spectro-temporal receptive fields, recorded from over a thousand neurons in the mammalian primary auditory cortex as well as from simulated cortical neurons, augmented with a nonlinear classifier. The model was able to perform robust instrument classification irrespective of pitch and playing style, with an accuracy of 98.7%. Using the same front end, the model was also able to reproduce perceptual distance judgments between timbres as perceived by human listeners. The study demonstrates that joint spectro-temporal features, such as those observed in the mammalian primary auditory cortex, are critical to provide the rich-enough representation necessary to account for perceptual judgments of timbre by human listeners, as well as recognition of musical instruments.

“…In the literature on knowledge-based methods, the dominant approach to multimedia analysis is segmentation, which can be spatial (spatial segmentation) [58], spatio-temporal (spatio-temporal segmentation) [56], by speaker in audio (speaker segmentation) [98]…”

Section: Positioning and contribution of the work (unclassified)

Ανάλυση Πολυμέσων Με Χρήση Γνώσης (Multimedia Analysis Using Knowledge)

Φαλελάκης

This thesis introduces tools for the semantic analysis of multimedia documents based on prior knowledge, and its main goal is to turn the computational complexity into a controllable parameter of such systems. Entities are divided into (i) directly measurable quantities (syntactic entities) and (ii) high-level concepts, closer to human perception (semantic entities), and are organized within a hierarchical fuzzy model. Moreover, appropriate metrics for quantifying the semantic search procedure and its results are proposed. The methodology is equipped with inference mechanisms that fit various scenarios, while appropriate methods for computing the fuzzy weights of the knowledge model are also described. Although the proposed expressivity is limited w.r.t. Description Logics, it is fully adequate and compatible with the way classifiers treat multimedia documents. On these grounds, this thesis combines the results of other measurement methods (e.g. classifiers), by using a knowledge model that does not require complicated computations during inference. This can be achieved because the truth factors of the entities under examination are computed using closed mathematical expressions that stem directly from knowledge, eliminating the need for ABox reasoning. Furthermore, through the proposed methodology, semantic search can be efficiently used under any restrictions posed by computational complexity, by selecting optimal subsets of the available measurements. The subset selection problem is efficiently solved using dynamic programming, minimizing the extra computational burden it may pose. Experiments demonstrate that the proposed method can achieve very good accuracy while searching for and retrieving new entities, together with improving the scores given by existing classifiers. This method can be adapted to various domains/datasets through a process of fuzzy weight re-computation.
An extra application scenario is presented, where the mathematical tools provided here are used for software agent evaluation. Finally, we theoretically prove that, even in the case of a more expressive language, the execution of a fuzzy tableau algorithm on the measurements (i.e. using the instantiated ABox) yields results identical to those our method can achieve using closed-form mathematical expressions. Corresponding experiments illustrate the virtues of this language and also indicate that the performance of our methodology can be estimated using the development set.
