Background: A major barrier to the practice of evidence-based medicine is efficiently finding scientifically sound studies on a given clinical topic.

Objective: To investigate a deep learning approach to retrieving scientifically sound treatment studies from the biomedical literature.

Methods: We trained a convolutional neural network on a noisy dataset of 403,216 PubMed citations, using title and abstract as features. The deep learning model was compared with state-of-the-art search filters: PubMed's Clinical Queries Broad treatment filter, McMaster's textword search strategy (no Medical Subject Heading [MeSH] terms), and the Clinical Queries Balanced treatment filter. A previously annotated dataset (Clinical Hedges) was used as the gold standard.

Results: The deep learning model obtained significantly lower recall than the Clinical Queries Broad treatment filter (96.9% vs 98.4%; P<.001), and recall equivalent to McMaster's textword search (96.9% vs 97.1%; P=.57) and the Clinical Queries Balanced filter (96.9% vs 97.0%; P=.63). Deep learning obtained significantly higher precision than the Clinical Queries Broad filter (34.6% vs 22.4%; P<.001) and McMaster's textword search (34.6% vs 11.8%; P<.001), but significantly lower precision than the Clinical Queries Balanced filter (34.6% vs 40.9%; P<.001).

Conclusions: Deep learning performed well compared with state-of-the-art search filters, especially when citations were not indexed. Unlike previous machine learning approaches, the proposed deep learning model does not require feature engineering, or time-sensitive or proprietary features such as MeSH terms and bibliometrics. Deep learning is a promising approach to identifying reports of scientifically rigorous clinical research. Further work is needed to optimize the deep learning model and to assess generalizability to other areas, such as diagnosis, etiology, and prognosis.
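The recall and precision figures above follow the standard retrieval definitions. A minimal sketch, using hypothetical counts chosen only to illustrate the arithmetic (not the study's actual confusion matrix):

```python
# Standard retrieval metrics, as reported in the abstract above.
# The counts below are hypothetical, for illustration only.

def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of scientifically sound studies the filter retrieves."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of retrieved citations that are scientifically sound."""
    return true_positives / (true_positives + false_positives)

# e.g., a filter retrieving 969 of 1,000 sound studies,
# alongside 1,831 unsound ones:
print(f"recall:    {recall(969, 31):.1%}")      # 96.9%
print(f"precision: {precision(969, 1831):.1%}") # 34.6%
```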
Background: More than 90% of clinical-trial compounds fail to demonstrate sufficient efficacy and safety. To help alleviate this issue, systematic literature review and meta-analysis (SLR), which synthesize the current evidence for a research question, can be applied to preclinical evidence to identify the most promising therapeutics. However, these methods remain time-consuming and labor-intensive. Here, we introduce an economic formula to estimate the expense of SLR for academic institutions and pharmaceutical companies.

Methods: We estimate the manual effort involved in SLR by quantifying the amount of labor required and the total associated labor cost. We begin with an empirical estimation and derive a formula that quantifies and describes the cost.

Results: The formula estimated that each SLR costs approximately $141,194.80. We found that, on average, the ten largest pharmaceutical companies publish 118.71 SLRs per year and the ten major academic institutions publish 132.16. On average, the total annual cost of SLRs amounts to $18,660,304.77 for each academic institution and $16,761,234.71 for each pharmaceutical company.

Discussion: SLR is an important but costly mechanism for assessing the totality of evidence.

Conclusions: With the increase in the number of publications, the significant time and cost of SLRs may pose a barrier to their consistent application in thoroughly assessing the promise of clinical trials. We call on investigators and developers to build automated solutions to help with the assessment of preclinical evidence in particular. The formula we introduce provides a cost baseline against which the efficiency of automation can be measured.
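The abstract does not spell out the formula itself, but the reported annual totals are reproduced by a simple product of the per-SLR cost and the annual SLR count. A minimal sketch under that assumption:

```python
# Sketch of the reported cost arithmetic. The per-SLR cost and annual
# SLR counts come from the abstract; the simple product below is an
# assumption that happens to reproduce the reported annual totals.

COST_PER_SLR = 141_194.80  # estimated cost of one SLR (USD)

def annual_slr_cost(slrs_per_year: float) -> float:
    """Total annual SLR cost for an organization publishing this many SLRs."""
    return round(COST_PER_SLR * slrs_per_year, 2)

print(annual_slr_cost(132.16))  # academic institutions: 18660304.77
print(annual_slr_cost(118.71))  # pharmaceutical companies: 16761234.71
```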
In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay. We call this unstructured, ungrammatical data "posts." The unstructured nature of posts makes querying and integration difficult because the attributes are embedded within the text. Also, these attributes do not conform to standardized values, which prevents queries based on a common attribute value: the schema is unknown, and the values may vary dramatically, making accurate search difficult. Creating relational data for easy querying requires that we define a schema for the embedded attributes and extract values from the posts while standardizing those values. Traditional information extraction (IE) is inadequate for this task because it relies on clues from the data, such as structure or natural language, neither of which is found in posts. Furthermore, traditional information extraction does not incorporate data cleaning, which is necessary to accurately query and integrate the source. The two-step approach described in this paper creates relational data sets from unstructured and ungrammatical text by addressing both issues. To do this, we require a set of known entities called a "reference set." The first step aligns each post to each member of each reference set. This allows our algorithm to define a schema over the post and include standard values for the attributes defined by this schema. The second step performs information extraction for the attributes, including attributes not easily represented by reference sets, such as price.
In this manner we create a relational structure over previously unstructured data, supporting deep and accurate queries over the data as well as standard values for integration. Our experimental results show that our technique matches the posts to the reference set accurately and efficiently and outperforms state-of-the-art extraction systems on the extraction task from posts.
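The first step above, aligning a post to reference-set records, can be sketched as a similarity match over tokens. The reference set, the Jaccard scoring function, and the threshold below are illustrative assumptions; the paper's actual matching algorithm is more sophisticated:

```python
# Hedged sketch of step one: aligning an unstructured post to a
# reference set via token overlap (Jaccard similarity). The reference
# set, tokenizer, and threshold are illustrative assumptions only.
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def align(post: str, reference_set: list[dict], threshold: float = 0.2):
    """Return the best-matching reference record, or None if no match."""
    post_toks = tokens(post)
    score_of = lambda rec: jaccard(post_toks, tokens(" ".join(rec.values())))
    best = max(reference_set, key=score_of)
    return best if score_of(best) >= threshold else None

# A hypothetical car-classifieds reference set:
cars = [
    {"make": "Honda", "model": "Civic", "trim": "LX"},
    {"make": "Toyota", "model": "Corolla", "trim": "LE"},
]
post = "93 civic lx runs great $1800 obo"
print(align(post, cars))  # {'make': 'Honda', 'model': 'Civic', 'trim': 'LX'}
```

Once a post is aligned to a record, the record's standardized attribute values (make, model, trim) define a schema over the post, which is what enables relational queries and integration.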