Leveraging Wikipedia knowledge to classify multilingual biomedical documents

García, Marcos Antonio Mouriño; Rodríguez, Ramón López; Anido‐Rifón, Luis E.

doi:10.1016/j.artmed.2018.04.007

Cited by 12 publications

(11 citation statements)

References 19 publications

“…• It can also be clearly seen that the performance metrics are higher when enriching the tweets representations with the features extracted from the text of the tweets by using Wikipedia Miner, achieving F1-score improvements up to 13% for the Random Forest algorithm and up to 22.23% for the CART algorithm. This is a clear evidence that the knowledge contained in Wikipedia provides very relevant information to the classifier, thus improving its performance, which is in line with what was stated in previous studies [14]-[16]. • Finally, after the analysis of the results presented, we concluded that the best option for this particular case is the CART algorithm, since it shows performance values similar to Random Forests with significantly lower training and classification times.…”

Section: Classifier Results and Analysis (supporting)

confidence: 89%

“…Wikipedia Miner is a general purpose semantic annotator based on natural language processing, machine learning techniques, and the use of Wikipedia as background knowledge. This approach has been successfully applied in previous studies for the classification of, among others, biomedical documents [14], documents of legal nature [15], and news [16]. The main characteristics of Wikipedia Miner are: 1) It identifies concepts that appear in documents, thus avoiding the generation of irrelevant features; 2) it performs word sense disambiguation, thus tackling synonymy and polysemy problems; 3) it links the extracted concepts from documents to Wikipedia entries; and 4) it assigns a weight to each extracted concept according to its relevance in the text.…”

Section: A Document (Tweet) Representation (mentioning)

confidence: 99%
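
The concept-based representation described in the statement above (concepts identified in the text, synonyms resolved to one entry, each concept weighted by relevance) can be illustrated with a minimal sketch. This is not Wikipedia Miner and does not use its API: the lookup table `SURFACE_TO_CONCEPT`, the function `bag_of_concepts`, and the frequency-share weighting are all assumptions chosen for illustration, and a real annotator would disambiguate using context rather than a fixed table.

```python
import re
from collections import Counter

# Hypothetical surface-form -> concept table (illustrative only).
# Two different phrases point at the same concept, which is how
# synonymy collapses into a single feature.
SURFACE_TO_CONCEPT = {
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "aspirin": "Aspirin",
}


def bag_of_concepts(text: str) -> dict[str, float]:
    """Return {concept: weight}; weight is the concept's share of all matched mentions."""
    lowered = text.lower()
    counts = Counter()
    for surface, concept in SURFACE_TO_CONCEPT.items():
        # Whole-phrase match so that partial words are not counted.
        pattern = r"\b" + re.escape(surface) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[concept] += hits
    total = sum(counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in counts.items()}


if __name__ == "__main__":
    doc = "Aspirin after a heart attack; myocardial infarction risk."
    print(bag_of_concepts(doc))
```

Words that match no concept contribute no feature at all, which mirrors the first characteristic in the list (avoiding irrelevant features).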

Detection of Barriers to Mobility in the Smart City Using Twitter

et al. 2020

We present a system that analyzes data extracted from the microblogging site Twitter to detect the occurrence of events and obstacles that can affect pedestrian mobility, with a special focus on people with impaired mobility. First, the system extracts tweets that match certain predefined terms. Then, it obtains location information from them by using the location provided by Twitter when available, as well as searching the text of the tweet for locations. Finally, it applies natural language processing techniques to confirm that an actual event that affects mobility is reported and extract its properties (which urban element is affected and how). We also present some empirical results that validate the feasibility of our approach.

“…The Wikimedia Foundation maintains a large set of multilingual data in the form of articles across multiple Wikipedia projects. Garcia et al. previously used the interlanguage links of Wikipedia to apply concept mapping in their efforts to develop a classifier for multilingual biomedical documents [18].…”

Section: Discussion (mentioning)

confidence: 99%
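
The concept mapping via interlanguage links mentioned in the statement above can be sketched as a dictionary lookup that moves a weighted concept vector from one language into another. This is a sketch under stated assumptions, not the cited authors' implementation: the table `INTERLANGUAGE_LINKS`, the function `map_concepts`, and the choice to drop concepts that have no link are hypothetical.

```python
# Hypothetical interlanguage-link table (illustrative only):
# (source language code, article title) -> English article title.
INTERLANGUAGE_LINKS = {
    ("es", "Neumonía"): "Pneumonia",
    ("es", "Antibiótico"): "Antibiotic",
}


def map_concepts(vector, source_lang, links=INTERLANGUAGE_LINKS):
    """Translate a {concept: weight} vector into the target language via link lookup."""
    mapped = {}
    for concept, weight in vector.items():
        target = links.get((source_lang, concept))
        if target is None:
            # Assumption: concepts without an interlanguage link are dropped.
            continue
        # Sum weights in case several source concepts map to one target.
        mapped[target] = mapped.get(target, 0.0) + weight
    return mapped


if __name__ == "__main__":
    spanish_doc = {"Neumonía": 0.7, "Antibiótico": 0.2, "Fiebre": 0.1}
    print(map_concepts(spanish_doc, "es"))
```

Once documents in every language are expressed over the same concept space, a single classifier can be trained and applied across languages.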

Towards Multi-Lingual Pneumonia Research Data Collection Using the Community-Acquired Pneumonia International Cohort Study Database

Mattingly, Buckner, Pena. 2019. JRI

Sharing the results of clinical research continues to play a large role in scientific discovery [1], and more and more funders and publishers are encouraging investigators to adopt sustainable means to share the data used along with their results. For ad-hoc studies the structure of data is not crucial beyond the needs of accurate analysis. However, when data sharing and reproducibility of research are important, time and effort must be invested to structure a dataset to be efficiently combined with other datasets. Efficient means of integrating data from different sources are needed to leverage the full benefits of data sharing because new research questions can be examined using integrated, multi-source data. Studies that are international in scope can provide large, diverse samples of patient populations, but require investment in translation, and the appropriate ontological structuring of data. In this paper we describe the process of extending the Community Acquired Pneumonia Organization (CAPO) clinical research database to support data collection for multiple languages. After English, Spanish is the most common spoken language of members of the Community Acquired Pneumonia Organization with almost 40% of member sites being in Spanish speaking countries. Starting with Spanish we establish a general multi-language workflow for data entry into the CAPO database, with the eventual goal of supporting all CAPO member languages.

Methods

The study database for CAPO currently resides in a web-hosted instance of the REDCap electronic data capture software. REDCap is a secure web application for building and managing online surveys and databases. Members of CAPO access the REDCap instance remotely from its web URL https://id.research.louisville.edu/capo using a web browser and their assigned user credentials. Demographic and clinical history information can then be entered for new cases in the database.

“…Wikipedia and Wikipedia Miner have been used in many fields such as automatic topic indexing [14], document clustering [15], document summarization [16], the classification of multilingual biomedical documents [17], converting concept-based representations of documents from one language to another [18], identifying the prerequisite relationships among learning objects [19], classifying news articles [20], evaluating and classifying Open Educational Resources (OERs) and OpenCourseware (OCW) based on quality criteria [21], and for group recommendation by combining topic identification and social networks [22].…”

Section: Literature Review (mentioning)

confidence: 99%

Topic Extraction and Interactive Knowledge Graphs for Learning Resources

et al. 2021

Humanity development through education is an important method of sustainable development. This guarantees community development at present time without any negative effects in the future and also provides prosperity for future generations. E-learning is a natural development of the educational tools in this era and current circumstances. Thanks to the rapid development of computer sciences and telecommunication technologies, this has evolved impressively. In spite of facilitating the educational process, this development has also provided a massive amount of learning resources, which makes the task of searching and extracting useful learning resources difficult. Therefore, new tools need to be advanced to facilitate this development. In this paper we present a new algorithm that has the ability to extract the main topics from textual learning resources, link related resources and generate interactive dynamic knowledge graphs. This algorithm accurately and efficiently accomplishes those tasks no matter how big or small the texts are. We used Wikipedia Miner, TextRank, and Gensim within our algorithm. Our algorithm’s accuracy was evaluated against Gensim, largely improving its accuracy. This could be a step towards strengthening self-learning and supporting the sustainable development of communities, and more broadly of humanity, across different generations.
