Automated Classification of Free-text Pathology Reports for Registration of Incident Cases of Cancer

Jouhet, Vianney; Défossez, Gautier; Burgun, Anita; Beux, P. Le; Levillain, P.; Ingrand, Pierre; Claveau, Vincent

doi:10.3414/me11-01-0005

Cited by 44 publications

(20 citation statements)

References 25 publications


“…Two of the dissimilar sequences nevertheless comprised all the treatment periods, because the absence of a pathology report on the surgical piece led to the creation of an intermediate surgery state in the sequence (“C” - surgery alone - rather than “D” - surgery and pathology evidence). An earlier study [34] implemented a text categorisation method using a machine-learning technique for the purpose of automatically categorising pathology reports solely on their content, which has demonstrated very good performances. It is therefore likely that the performance of the algorithm could be improved further by adding a supplementary check of the coding of pathology reports.…”

Section: Discussion (mentioning)

confidence: 99%

Temporal representation of care trajectories of cancer patients using data from a regional information system: an application in breast cancer

Défossez, Rollet, Dameron et al. 2014. BMC Med Inform Decis Mak


Background: Ensuring that all cancer patients have access to the appropriate treatment within an appropriate time is a strategic priority in many countries. There is in particular a need to describe and analyse cancer care trajectories and to produce waiting time indicators. We developed an algorithm for extracting temporally represented care trajectories from coded information collected routinely by the general cancer Registry in Poitou-Charentes region, France. The present work aimed to assess the performance of this algorithm on real-life patient data in the setting of non-metastatic breast cancer, using measures of similarity.

Methods: Care trajectories were modeled as ordered dated events aggregated into states, the granularity of which was defined from standard care guidelines. The algorithm generates each state from the aggregation over a period of tracer events characterised on the basis of diagnoses and medical procedures. The sequences are presented in simple form showing presence and order of the states, and in an extended form that integrates the duration of the states. The similarity of the sequences, which are represented in the form of chains of characters, was calculated using a generalised Levenshtein distance.

Results: The evaluation was performed on a sample of 159 female patients whose itineraries were also calculated manually from medical records using the same aggregation rules and dating system as the algorithm. Ninety-eight per cent of the trajectories were correctly reconstructed with respect to the ordering of states. When the duration of states was taken into account, 94% of the trajectories matched reality within three days. Dissimilarities between sequences were mainly due to the absence of certain pathology reports and to coding anomalies in hospitalisation data.

Conclusions: These results show the ability of an integrated regional information system to formalise care trajectories and automatically produce indicators for time-lapse to care instatement, of interest in the planning of care in cancer. The next step will consist in evaluating this approach and extending it to more complex trajectories (metastasis, relapse) and to other cancer localisations.
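
The abstract above compares care trajectories encoded as character strings using a generalised Levenshtein distance. The following is a minimal sketch of such a distance, not the authors' implementation: the function name, the per-operation cost parameters, and the unit default costs are assumptions, and the state letters "A", "B" and "E" in the usage note are hypothetical placeholders (only "C" and "D" are named in the citation statement above).

```python
def generalised_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Edit distance between two state sequences with configurable costs.

    Assumption: one cost per operation type; the cited study's actual
    cost settings are not given in this text.
    """
    # prev[j] = cost of turning the empty prefix of `a` into b[:j]
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, state_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i * del_cost]
        for j, state_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0.0 if state_a == state_b else sub_cost)
            deletion = prev[j] + del_cost
            insertion = curr[j - 1] + ins_cost
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical trajectories that differ in a single state ("C" vs "D").
algorithm_sequence = "ABCE"
manual_sequence = "ABDE"
distance = generalised_levenshtein(algorithm_sequence, manual_sequence)
```

With unit costs, two trajectories that differ in one state are at distance 1.0; raising `sub_cost` would penalise a mismatched state more heavily than a missing one.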


“…Many use dictionary-based methods (Coden, 2009) (Ashish et al, 2014) for extracting entities before structuring them using specific algorithms. However statistical named entity recognition methods (Ou and Patrick, 2014) and document classification methods are also used (Jouhet et al, 2012) (Kavuluru et al, 2013).…”

Section: Cancer Information Extraction (mentioning)

confidence: 99%

In-depth annotation for patient level liver cancer staging

Yim, Kwan, Yetisgen 2015. Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis


Cancer stages, which summarize the extent of cancer progression, are an important tool for evidence-based medical research. However, they are not always recorded in the electronic medical record. In this paper, we describe work for annotating a medical text corpus with the goal of predicting patient level liver cancer staging in hepatocellular carcinoma (HCC) patients.

Our annotation consisted of identifying 11 parameters, used to calculate liver cancer staging, at the text span level as well as at the patient level. Also at the patient level, we annotated stages for three commonly-used liver cancer staging schemes. Our inter-rater agreement showed text annotation consistency of 0.73 F1 for partial text match and 0.91 F1 at the patient level.

After annotation, we performed several document classification experiments for the text span annotations using standard machine learning classifiers, including decision trees, maximum entropy, naive Bayes and support vector machines. Thereby, we identified baseline performances for our task at 0.63 F1 as well as strategies for future improvement.


“…Martinez and Li [13] explore a machine learning methodology for populating a colorectal cancer template with six attributes including the tumor site. They report an F score of 58.1, for a model whose most predictive features are based on UMLS and SNOMED-CT. Jouhet et al [14] work with pathology notes from the French Poitou-Charentes Cancer Registry automatically to discover the primary tumor site and code to the International Classification of Diseases—Oncology (ICD-O) [15] codes using machine learning techniques. Kuvuluru et al [16] focus on extracting the generic ICD-O code for primary cancers reported in pathology reports.…”

Section: Background and Significance (mentioning)

confidence: 99%

Discovering body site and severity modifiers in clinical texts

Dligach, Bethard, Becker et al. 2014. J Am Med Inform Assoc


Objective: To research computational methods for discovering body site and severity modifiers in clinical texts.

Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors.

Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740–0.908 and our method for discovering severity modifiers achieves F1 of 0.905–0.929.

Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures.

Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
