Since 2017, the goal of the two-million-song WASABI database has been to build a knowledge graph linking collected metadata (artists, discography, producers, dates, etc.) with metadata generated by the analysis of both the songs' lyrics (topics, places, emotions, structure, etc.) and audio signal (chords, sound, etc.). It relies on natural language processing and machine learning methods for extraction, and on semantic Web frameworks for representation and integration. It describes more than 2 million commercial songs, 200K albums and 77K artists. It can be exploited by music search engines, music professionals (e.g. journalists, radio presenters, music teachers) or scientists wishing to analyze popular music published since 1950. It is available under an open license, in multiple formats, with online and open-source services including an interactive navigator, a REST API and a SPARQL endpoint.
We present the WASABI Song Corpus, a large corpus of songs enriched with metadata extracted from music databases on the Web and with metadata resulting from the processing of song lyrics and from audio analysis. More specifically, given that lyrics encode an important part of the semantics of a song, we focus here on the description of the methods we proposed to extract relevant information from the lyrics, such as their structure segmentation, their topics, the explicitness of the lyrics content, the salient passages of a song and the emotions conveyed. The corpus contains 1.73M songs with lyrics (1.41M unique lyrics) annotated at different levels with the output of the above-mentioned methods. The corpus labels and the provided methods can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, enabling intelligent browsing, categorization and recommendation of songs. We demonstrate the utility and versatility of the WASABI Song Corpus in three concrete application scenarios. Together with the work on the corpus, we present the work done to transition the dataset into a knowledge graph, the WASABI RDF Knowledge Graph, and we show how this will enable an even richer set of applications.
This paper focuses on the fundamental role played by annotations in supporting provenance analysis during the visual exploration of large datasets. In particular, we investigate the use of annotations during the visual exploration of semantic datasets assisted by chained visualization techniques. We identify three potential uses of annotations: (i) documenting findings (including errors in the dataset), (ii) supporting collaborative reasoning among teammates, and (iii) analysing provenance during the exploratory process. To demonstrate the feasibility of our approach, we implemented it in a supporting tool and illustrate its usage and effectiveness through a series of use case scenarios. We identify the attributes and metadata that describe the dependencies between annotations and visual representations, and we illustrate these dependencies through a domain-specific model.