Klink-2: Integrating Multiple Web Sources to Generate Semantic Topic Networks

Osborne, Francesco; Motta, Enrico

doi:10.1007/978-3-319-25007-6_24

Cited by 57 publications

(94 citation statements)

References 17 publications


“…As a reference topic ontology, we used the Computer Science Ontology (CSO), created to represent topics in the Rexplore system [3], which is currently being trialled by Springer Nature to classify proceedings in the field of Computer Science [17], such as the well-known LNCS series. CSO was created by applying the Klink-2 algorithm [18] to the 16 million publications of our Scopus-derived dataset [3]. The Klink-2 algorithm combines semantic technologies, machine learning and knowledge from external sources (e.g., DBpedia, calls for papers, web pages) to automatically generate a fully populated ontology of research areas, which uses the Klink data model.…”

Section: Input Knowledge Bases (mentioning)

confidence: 99%

“…However, as extensively discussed in previous works [3,17,18], these solutions ignore the rich network of semantic relationships between research topics and are often unable to distinguish research areas from other terms that may be used to annotate publications. Therefore, we exploit the topic ontology by associating to each paper i) all the concepts in CSO whose label is found either in the title, the abstract or the keyword set, as well as ii) all skos:broaderGeneric and iii) all relatedEquivalent areas of the initial set of topics extracted from the scholarly dataset.…”

Section: Generation Of Technology-topic Matrices (mentioning)

confidence: 99%
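The annotation step described in the statement above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy ontology, the function names (`match_topics`, `expand_topics`, `annotate_paper`) and the choice to expand by a single hop only are all assumptions made for the example.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical toy ontology. These labels and links are illustrative only
# and are not taken from the actual CSO data.
BROADER: Dict[str, Set[str]] = {
    "linked data": {"semantic web"},
    "semantic web": {"world wide web"},
}
EQUIVALENT: Dict[str, Set[str]] = {
    "linked data": {"linked open data"},
}


def match_topics(text_fields: Iterable[str], labels: Iterable[str]) -> Set[str]:
    """Return every label that occurs as a whole term in the given text fields."""
    text = " ".join(text_fields).lower()
    found = set()
    for label in labels:
        # Lookarounds keep "web" from matching inside "cobweb".
        pattern = r"(?<!\w)" + re.escape(label.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            found.add(label)
    return found


def expand_topics(
    topics: Set[str],
    broader: Dict[str, Set[str]],
    equivalent: Dict[str, Set[str]],
) -> Set[str]:
    """Add the broader and equivalent areas of each initial topic.

    Assumption: only one hop is followed; the source does not say
    whether the expansion is transitive.
    """
    result = set(topics)
    for topic in topics:
        result |= broader.get(topic, set())
        result |= equivalent.get(topic, set())
    return result


def annotate_paper(title, abstract, keywords, labels, broader, equivalent):
    """Match labels in title, abstract and keywords, then expand the matches."""
    initial = match_topics([title, abstract, *keywords], labels)
    return expand_topics(initial, broader, equivalent)
```

For a paper titled "Publishing linked data" with the keyword "RDF", the sketch would first match "linked data" and "rdf" (if both are in the label set), then add "semantic web" and "linked open data" through the two relation maps.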


Forecasting the Spreading of Technologies in Research Communities

Osborne, Mannocci, Motta. 2017. Proceedings of the Knowledge Capture Conference. Self cite.


Technologies such as algorithms, applications and formats are an important part of the knowledge produced and reused in the research process. Typically, a technology is expected to originate in the context of a research area and then spread and contribute to several other fields. For example, Semantic Web technologies have been successfully adopted by a variety of fields, e.g., Information Retrieval, Human Computer Interaction, Biology, and many others. Unfortunately, the spreading of technologies across research areas may be a slow and inefficient process, since it is easy for researchers to be unaware of potentially relevant solutions produced by other research communities. In this paper, we hypothesise that it is possible to learn typical technology propagation patterns from historical data and to exploit this knowledge i) to anticipate where a technology may be adopted next and ii) to alert relevant stakeholders about emerging and relevant technologies in other fields. To do so, we propose the Technology-Topic Framework, a novel approach which uses a semantically enhanced technology-topic model to forecast the propagation of technologies to research areas. A formal evaluation of the approach on a set of technologies in the Semantic Web and Artificial Intelligence areas has produced excellent results, confirming the validity of our solution.


“…It takes as input the IDs, the titles and the abstracts of a number of research papers in the Scopus dataset and a variety of knowledge bases (DBpedia [12], WordNet [15], the Klink-2 Computer Science ontology [16], and others) and returns an OWL ontology describing a number of technologies and their related research entities. These include: 1) the authors who most published on it, 2) related research areas, 3) the publications in which they appear, and, optionally, 4) the team of authors who introduced the technology and 5) the URI of the related DBpedia entity.…”

Section: Techminer (mentioning)

confidence: 99%

“…The Klink-2 Computer Science Ontology (CSO) is a very large ontology of Computer Science that was created by running the Klink-2 algorithm [16] on about 16 million publications in the field of Computer Science extracted from the Scopus repository. The Klink-2 algorithm combines semantic technologies, machine learning and external sources to generate a fully populated ontology of research areas.…”

Section: Background Data (mentioning)

confidence: 99%

“…Our intention was not to create 'yet another ontology' of the scholarly domain, but to craft a simple scheme for representing our output. For this reason we reused concepts and relationships from a number of well-known scholarly ontologies (including FABIO [22], FOAF, CITO, SKOS, SRO, FRBR) and introduced new entities and properties only when necessary. The main classes of the TechMiner OWL ontology are Technology, foaf:Person, to represent the researchers associated to the technology, Topic (equivalent to frbr:concept and skos:concept) and Category, representing the category of the technology (e.g., application, format, language).…”

Section: Triple Generation (mentioning)

confidence: 99%


TechMiner: Extracting Technologies from Academic Publications

Osborne, Ribaupierre, Motta. 2016. Lecture Notes in Computer Science. Self cite.


Abstract. In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture 'standard' scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.


GitRanking: A ranking of GitHub topics for software classification using active sampling

et al. 2023


Context: GitHub is the world's most prominent host of source code, with more than 327M repositories. However, most of these repositories are unlabelled or inadequately labelled, making it harder for users to find relevant projects. Various approaches to software application domain classification have been proposed over the past years. However, several of those approaches suffer from multiple issues, called antipatterns of software classification, that reduce their usability.

Objective: In this paper, we propose a new taxonomy in the GitHub ecosystem, called GitRanking, starting from a well-structured data set, composed of curated repositories annotated with topics. The main objective is to create a baseline methodology for software classification that is expandable, hierarchical, grounded in a knowledge base, and free of antipatterns.

Method: We collected 121K topics from GitHub and used GitRanking to create a taxonomy of 301 ranked application domains. GitRanking (1) uses active sampling to ensure a minimal number of annotations to create the ranking; and (2) links each topic to Wikidata, reducing ambiguities and improving the reusability of the taxonomy. Furthermore, we adopt the conceived taxonomy in a classification task by considering a state-of-the-art classifier.

Results: Our results show that GitRanking can effectively rank terms in a hierarchy according to how general or specific their meaning is. Furthermore, we show that GitRanking is a dynamically extensible method: it can currently accept further terms to be ranked, and with a minimum number of annotations (). Concerning the classification task, we show that the model achieves an F1-score of 34%, with a precision of 54%.

Conclusion: This paper is the first collective attempt at building a ground-up taxonomy of software domains. Our vision is that our taxonomy, and its extensibility, can be used to better and more precisely label software projects.
