Identifying medical terms in patient-authored text: a crowdsourcing-based approach

MacLean, Diana; Heer, Jeffrey

doi:10.1136/amiajnl-2012-001110

Cited by 66 publications

(53 citation statements)

References 23 publications

“…MTurk has also been used for evaluating biomedical informatics research. For instance, Maclean and Heer [31] presented crowdsourcing patient-authored text medical word identification tasks to MTurk non-experts and achieved results that were comparable in quality to those achieved by medical experts. Therefore, we used MTurk for evaluating the similarity within our clusters.…”

Section: Methods (mentioning)

confidence: 99%

Clustering clinical trials with similar eligibility criteria features

Hao, Rusanov, Boland, et al. 2014

Journal of Biomedical Informatics


Objectives: To automatically identify and cluster clinical trials with similar eligibility features. Methods: Using the public repository ClinicalTrials.gov as the data source, we extracted semantic features from the eligibility criteria text of all clinical trials and constructed a trial-feature matrix. We calculated the pairwise similarities for all clinical trials based on their eligibility features. For all trials, by selecting one trial as the center each time, we identified trials whose similarities to the central trial were greater than or equal to a predefined threshold and constructed center-based clusters. Then we identified unique trial sets with distinctive trial membership compositions from center-based clusters by disregarding their structural information. Results: From the 145,745 clinical trials on ClinicalTrials.gov, we extracted 5,508,491 semantic features. Of these, 459,936 were unique and 160,951 were shared by at least one pair of trials. Crowdsourcing the cluster evaluation using Amazon Mechanical Turk (MTurk), we identified the optimal similarity threshold, 0.9. Using this threshold, we generated 8,806 center-based clusters. Evaluation of a sample of the clusters by MTurk resulted in a mean score 4.331 ± 0.796 on a scale of 1–5 (5 indicating “strongly agree that the trials in the cluster are similar”). Conclusions: We contribute an automated approach to clustering clinical trials with similar eligibility features. This approach can be potentially useful for investigating knowledge reuse patterns in clinical trial eligibility criteria designs and for improving clinical trial recruitment. We also contribute an effective crowdsourcing method for evaluating informatics interventions.
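The center-based clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so Jaccard similarity over feature sets is assumed here as a stand-in, and the function names (`jaccard`, `center_based_clusters`, `unique_trial_sets`) and the toy trial identifiers are hypothetical. Only the threshold of 0.9 comes from the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def center_based_clusters(trial_features, threshold=0.9):
    """Treat each trial in turn as a center and collect every trial whose
    similarity to that center is greater than or equal to the threshold."""
    clusters = {}
    for center, center_feats in trial_features.items():
        clusters[center] = frozenset(
            other
            for other, feats in trial_features.items()
            if jaccard(center_feats, feats) >= threshold
        )
    return clusters


def unique_trial_sets(clusters):
    """Drop the center/member structure and keep each distinct membership once."""
    return set(clusters.values())


# Toy data: hypothetical trial IDs mapped to sets of integer feature IDs.
trials = {
    "A": set(range(1, 11)),
    "B": set(range(1, 11)),
    "C": set(range(1, 10)) | {11},
    "D": {20, 21},
}
clusters = center_based_clusters(trials, threshold=0.9)
trial_sets = unique_trial_sets(clusters)
```

In this toy data, trials A and B share all features and fall into the same set regardless of which one is the center, so the two center-based clusters collapse into a single unique trial set. Trial C overlaps A on 9 of 11 features in the union, which falls below the 0.9 threshold.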

“…Various medical entity extractors are available for the purpose, but only ADEPT [25] has been specifically trained on medical forums. The algorithm is based on Conditional Random Fields, and the authors have shown that it achieved F1 score of 0.84 while all the other algorithms that were trained on non-medical forum domains, including MetaMap [1] which is popularly used for literature data, achieved F1 scores of below 0.5.…”

Section: Medical Entity Extraction (mentioning)

confidence: 99%

Resolving healthcare forum posts via similar thread retrieval

Cho, Sondhi, Zhai, et al. 2014

Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics


Web communities such as healthcare web forums serve as popular platforms for users to get their complex medical queries resolved. A typical forum thread contains a query in its first post, and a discussion around it in subsequent posts. However, many users do not receive satisfactory responses from other members in the community, leaving them dissatisfied. We propose to help these users by exploiting an existing collection of discussion threads. Often many users suffer from the same medical condition and start multiple discussion threads on very similar queries. In this paper we develop and evaluate a plethora of specialized search methods that treat an entire unresolved forum post as a query, and retrieve forum threads discussing similar problems to help resolve it. The task is more challenging than a traditional document retrieval problem, since forum posts can contain a lot of irrelevant background information. The discussion threads to be retrieved are also quite different from traditional unstructured text documents. We evaluate our results on a dataset comprising over 350K discussion threads and show that our proposed methods outperform state-of-the-art retrieval methods for the task. In particular, methods based on non-uniform weighting of thread posts and semantic analysis of the query text perform quite well.


“…Hence, we used a pre-trained machine learning algorithm that extracts only medical-related keywords from patient-authored text [26]. After removing general stop words, our initial keyword set consisted of 4,633 words, which were then used to further compute meaningful measures and to highlight such keywords in different views.…”

Section: Designing VisOHC (mentioning)

confidence: 99%

VisOHC: Designing Visual Analytics for Online Health Communities

Kwon, Kim, Lee, et al. 2016

IEEE Trans. Visual. Comput. Graphics


Through online health communities (OHCs), patients and caregivers exchange their illness experiences and strategies for overcoming the illness, and provide emotional support. To facilitate healthy and lively conversations in these communities, their members should be continuously monitored and nurtured by OHC administrators. The main challenge of OHC administrators' tasks lies in understanding the diverse dimensions of conversation threads that lead to productive discussions in their communities. In this paper, we present a design study, conducted to find a visual analytic solution, in which three domain expert groups participated: an OHC researcher and two OHC administrators. Through our design study, we characterized the domain goals of OHC administrators and derived tasks to achieve these goals. As a result of this study, we propose a system called VisOHC, which visualizes individual OHC conversation threads as collapsed boxes, a visual metaphor of conversation threads. In addition, we augmented the posters' reply authorship network with marks and/or beams to show conversation dynamics within threads. We also developed unique measures tailored to the characteristics of OHCs, which can be encoded for thread visualizations at the users' requests. Our observation of the two administrators while using VisOHC showed that it supports their tasks and reveals interesting insights into online health communities. Finally, we share our methodological lessons on probing visual designs together with domain experts by allowing them to freely encode measurements into visual variables.
