Increasingly, the web produces massive volumes of texts, alone or associated with images, videos, photographs, together with some metadata, indispensable for their finding and retrieval. Keywords/keyphrases that characterize the semantic content of documents should be, automatically or manually, extracted, and/or associated with them. The paper presents a novel method to address the problem of the automatic unsupervised extraction of keywords/phrases from texts, expressed both in English and in Italian. The main feature of this approach is the integration of two methods that have given interesting results: word embedding models, such as Word2Vec or GloVe able to capture the semantics of words and their context, and clustering algorithms, able to identify the essence of the terms and choose the more significant one(s), to represent the contents of a text. In the paper, the datasets used are presented, together with the method implemented and the results obtained. These results will be discussed, commented, and compared with those obtained in previous experimentations, using TextRank, Rapid Automatic Keyword Extraction (RAKE), and TF-IDF.
Cultural heritage inventories have been created to collect and preserve the culture and to allow the participation of stakeholders and communities, promoting and disseminating their knowledges. There are two types of inventories: those who give data access via web services or open data, and others which are closed to external access and can be visited only through dedicated web sites, generating data silo problems. The integration of data harvested from different archives enables to compare the cultures and traditions of places from opposite sides of the world, showing how people have more in common than expected. The purpose of the developed portal is to provide query tools managing the web services provided by cultural heritage databases in a transparent way, allowing the user to make a single query and obtain results from all inventories considered at the same time. Moreover, with the introduction of the ICH-Light model, specifically studied for the mapping of intangible heritage, data from inventories of this domain can also be harvested, indexed and integrated into the portal, allowing the creation of an environment dedicated to intangible data where traditions, knowledges, rituals and festive events can be found and searched all together.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.