Background: Papillary thyroid carcinoma is the most common endocrine malignancy. Since most nodules are benign, the challenge for the clinician is to identify those most likely to harbour malignancy while limiting exposure to surgical risks among those with benign nodules. Methods: Random Forests (augmented to select features based on our clinical measure of interest), in conjunction with interpretable rule sets, were applied to demographic, ultrasound and biopsy data of thyroid nodules from children <18 years at a tertiary pediatric hospital. Accuracy, False Positive Rate (FPR), False Negative Rate (FNR) and Area Under the Receiver Operating Characteristic curve (AUROC) are reported. Results: Our models predict non-benign cytology and malignant histology better than historical outcomes. Specifically, we expect a 68.04% improvement in the FPR, an 11.90% increase in accuracy and a 24.85% increase in AUROC for biopsy predictions in 67 patients (28 with benign and 39 with non-benign histology). We expect a 23.22% decrease in FPR, a 32.19% increase in accuracy, and a 3.84% decrease in AUROC for surgery prediction in 53 patients (42 with benign and 11 with non-benign histology). This improvement comes at the expense of the FNR: we expect that 10.27% of patients with malignancy would be discouraged from undergoing biopsy, and 11.67% from surgery. Given the small number of patients, these improvements are estimates and have not been tested on an independent test set. Conclusions: This work presents a first attempt at developing an interpretable, machine-learning-based clinical tool to aid clinicians. Future work will involve sourcing more data and developing probabilistic estimates for predictions.
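For readers less familiar with the reported metrics, the following minimal Python sketch shows how accuracy, FPR and FNR are derived from a confusion matrix. The class name `ConfusionCounts`, the helper functions and the counts in the example are illustrative assumptions; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (hypothetical container)."""
    tp: int  # malignant nodules flagged as malignant
    fp: int  # benign nodules flagged as malignant
    tn: int  # benign nodules flagged as benign
    fn: int  # malignant nodules flagged as benign


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of truly benign cases that are wrongly flagged."""
    return c.fp / (c.fp + c.tn)


def false_negative_rate(c: ConfusionCounts) -> float:
    """Share of truly malignant cases that are missed."""
    return c.fn / (c.fn + c.tp)


# Toy counts chosen only for illustration; NOT results from the study.
toy = ConfusionCounts(tp=30, fp=15, tn=45, fn=10)
print(f"accuracy={accuracy(toy):.2f}")
print(f"FPR={false_positive_rate(toy):.2f}")
print(f"FNR={false_negative_rate(toy):.2f}")
```

Lowering the FPR spares patients with benign nodules from unnecessary procedures, while a higher FNR means more malignancies are missed, which is the trade-off the abstract describes.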
Background: South Africa’s National Health Laboratory Service (NHLS), the only clinical laboratory service in the country’s public health sector, is an important resource for monitoring public health programmes. Objectives: We describe NHLS data quality, particularly patient demographics among infants, and the effect this has on linking multiple test results to a single patient. Methods: A retrospective descriptive analysis of NHLS data from 1 January 2017 to 1 September 2020 was performed. A validated probabilistic record-linking algorithm linked multiple results to individual patients in lieu of a unique patient identifier. Paediatric HIV PCR data were used to illustrate the effect on monitoring and evaluating a public health programme. Descriptive statistics including medians, proportions and interquartile ranges are reported, with univariate chi-square tests for independence used to assess associations between variables. Results: During the period analysed, 485 300 007 tests, 98 217 642 encounters and 35 771 846 patients met criteria for analysis. Overall, 15.80% (n = 15 515 380) of all encounters had a registered national identity (ID) number, 2.11% (n = 2 069 785) were registered without a given name, 63.15% (n = 62 020 107) were registered to women and 32.89% (n = 32 304 329) of all folder numbers were listed as either the patient’s date of birth or unknown. For infants tested at < 7 days of age (n = 2 565 329), 0.099% (n = 2 534) had an associated ID number and 48.87% (n = 1 253 620) were registered without a given name. Encounters with a given name were linked to a subsequent encounter 40.78% (n = 14 180 409 of 34 775 617) of the time, significantly more often than the 21.85% (n = 217 660 of 996 229) of encounters registered with a baby-derivative name (p-value < 0.001). Conclusion: Unavailability and poor capture of patient demographics, especially among infants and children, affect the ability to accurately monitor routine health programmes. A unique national patient identifier, other than the national ID number, is urgently required and must be available at birth if South Africa is to accurately monitor programmes such as the Prevention of Mother-to-Child Transmission of HIV.
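The linkage comparison reported in the Results can be re-derived from the published counts. The sketch below recomputes the two proportions and runs a Pearson chi-square test on the resulting 2x2 table. It assumes no continuity correction and uses a hypothetical helper, `chi_square_2x2`; the abstract does not state which variant of the test the authors applied, so this is a consistency check rather than a reproduction of their analysis.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the abstract.
linked_named, total_named = 14_180_409, 34_775_617  # given name
linked_baby, total_baby = 217_660, 996_229          # baby-derivative name

pct_named = 100 * linked_named / total_named
pct_baby = 100 * linked_baby / total_baby
print(f"given name: {pct_named:.2f}% linked")
print(f"baby-derivative name: {pct_baby:.2f}% linked")

chi2, p_value = chi_square_2x2(
    linked_named, total_named - linked_named,
    linked_baby, total_baby - linked_baby,
)
print(f"chi-square = {chi2:.0f}, p < 0.001: {p_value < 0.001}")
```

With samples this large, the p-value underflows to zero in floating point, so the 18.93 percentage point gap in linkage rates is the more informative figure.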