Svetislav Grbich scite author profile

Svetislav Grbich

5 Publications

6 Citation Statements Received

55 Citation Statements Given

Publications

Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study

Surodina¹, Lam², Grbich³ et al. 2021

JMIRx Med

Background: Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. Objective: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk. Methods: The US National Health and Nutrition Examination Survey (NHANES, 2015-2016) database includes the confirmed HSV type 1 and type 2 (HSV-1 and HSV-2, respectively) status of American participants (14-49 years) and a wealth of demographic and health-related data. The questionnaires and data sets from this survey were used to form two data sets: one for HSV-1 and one for HSV-2. These data sets were used to train and test a model that used a random forest algorithm (devised using Python) to minimize the number of anonymous lifestyle-based questions needed to identify risk groups for HSV. Results: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for the HSV-1 and HSV-2 data sets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model, therefore, provided high predictability of risk of infection with minimal required input. Conclusions: This machine learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals’ levels of risk of HSV infection. A limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymization options, explicit permissions, and a standardized data schema that meet the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and third-party interface connectivity requirements.
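
The abstract reports accuracy and recall scores. As a reminder of what those two numbers measure, here is a minimal Python sketch that computes them for binary labels. The function name and the toy labels are hypothetical and are not taken from the study; the label 1 is assumed to mean HSV-positive.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Accuracy: share of all predictions that are correct.
    accuracy = (true_pos + true_neg) / len(pairs)
    # Recall: share of actual positives that the model found.
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical toy labels, for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

Recall matters in a screening setting because a missed positive (a false negative) lowers recall even when overall accuracy stays high.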

Authors’ Response to Peer Reviews of “Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study”

Surodina¹, Lam², Grbich³ et al. 2021

JMIRx Med

Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study (Preprint)

Surodina¹, Lam², Grbich³ et al. 2020

Preprint

BACKGROUND: Conducting research about people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. OBJECTIVE: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk. METHODS: The US National Health and Nutrition Examination Survey (NHANES, 2015-16) database includes the confirmed HSV1 and HSV2 status of American participants (14-49 years) as well as a wealth of demographic and health-related data. The questionnaires and datasets from this survey were used to form two datasets (for HSV1 and HSV2). These datasets were used to train and test a model that used a Random Forest algorithm (devised using Python) to minimize the number of anonymous lifestyle-based questions needed to identify risk groups for HSV. RESULTS: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for HSV1 and HSV2 datasets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model therefore provided high predictability of risk of infection with minimal required input. CONCLUSIONS: This machine-learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals’ levels of risk of HSV infection. A current limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymisation options, explicit permissions, and a standardised data schema that meet General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and third-party interface connectivity requirements.
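
The abstract says the number of questions was reduced from 150 to an average of 40 but does not describe the selection rule. One common approach, sketched here purely as an assumption, is to rank questions by a per-question importance score (for example, the importances a random forest can report) and keep the top-ranked ones until they cover a chosen share of the total. The function name, the `coverage` parameter, the question names, and the scores below are all hypothetical.

```python
def select_questions(importances, coverage=0.9):
    """Keep the highest-scoring questions until they cover `coverage` of total importance.

    `importances` maps a question identifier to a non-negative score.
    This is an illustrative rule, not the method used in the study.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")

    # Rank questions from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

    selected, running = [], 0.0
    for question, score in ranked:
        selected.append(question)
        running += score
        if running / total >= coverage:
            break
    return selected


# Hypothetical scores, for illustration only (not study results).
importances = {
    "question_a": 0.40,
    "question_b": 0.30,
    "question_c": 0.15,
    "question_d": 0.10,
    "question_e": 0.05,
}
kept = select_questions(importances, coverage=0.8)
```

With these made-up scores, the first three questions already cover 85% of the total, so the remaining two are dropped. A lower `coverage` value gives a shorter questionnaire at the cost of discarding more signal.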

Authors’ Response to Peer Reviews of “Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study” (Preprint)

Surodina¹, Lam², Grbich³ et al. 2021

Preprint

Requirements Engineering of a Herpes Simplex Virus Patient Registry: Alpha Phase

Surodina¹, Lam², Grbich³ et al. 2020

Preprint

Abstract. Background: Collecting data from people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users in order to determine their HSV infection risk group. Methods: The US National Health and Nutrition Examination Survey (NHANES, 2015-16) database includes the confirmed HSV1 and HSV2 status of American participants (14-49 years) as well as a wealth of demographic and health-related data. Two datasets, one for HSV1 and one for HSV2, were formed using this database, and an anonymous questionnaire based on lifestyle data, driven by a Random Forest algorithm, was devised using Python. The algorithm was optimised to reduce the number of questions and to identify risk groups for HSV. The data were split into subsets to train and test the model. Results: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for HSV1 and HSV2 datasets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender; together, these questions provide high predictability of infection. Conclusions: This machine-learning algorithm for risk identification of people infected with HSV can be used in a real-world evidence registry to collect relevant lifestyle data. A current limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymisation options, explicit permissions, and a standardised data schema that meet GDPR, HIPAA, and third-party interface connectivity requirements.
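
The Methods mention that the data was split into subsets to train and test the model, without giving the proportions. The following is a minimal Python sketch of a shuffled hold-out split, assuming a 70/30 ratio and a fixed seed for reproducibility. The function name, the ratio, and the seed are illustrative assumptions, not details from the study.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle `rows` with a fixed seed and return (train, held_out) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")

    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)

    # Keep at least one row on each side of the split.
    n_test = round(len(shuffled) * test_fraction)
    n_test = min(len(shuffled) - 1, max(1, n_test))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for participant records (row indices only).
rows = list(range(10))
train, held_out = train_test_split(rows, test_fraction=0.3, seed=42)
```

Fixing the seed makes the split repeatable, so accuracy and recall measured on the held-out rows can be compared across model adjustments.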
