Medical diagnoses have important implications for improving patient care, research, and policy. For a medical diagnosis, health professionals use different kinds of pathological methods to make decisions on medical reports in terms of the patients’ medical conditions. Recently, clinicians have been actively engaged in improving medical diagnoses. The use of artificial intelligence and machine learning in combination with clinical findings has further improved disease detection. In the modern era, with the advantage of computers and technologies, one can collect data and visualize many hidden outcomes such as dealing with missing data in medical research. Statistical machine learning algorithms based on specific problems can assist one to make decisions. Machine learning (ML), data-driven algorithms can be utilized to validate existing methods and help researchers to make potential new decisions. The purpose of this study was to extract significant predictors for liver disease from the medical analysis of 615 humans using ML algorithms. Data visualizations were implemented to reveal significant findings such as missing values. Multiple imputations by chained equations (MICEs) were applied to generate missing data points, and principal component analysis (PCA) was used to reduce the dimensionality. Variable importance ranking using the Gini index was implemented to verify significant predictors obtained from the PCA. Training data (ntrain=399) for learning and testing data (ntest=216) in the ML methods were used for predicting classifications. The study compared binary classifier machine learning algorithms (i.e., artificial neural network, random forest (RF), and support vector machine), which were utilized on a published liver disease data set to classify individuals with liver diseases, which will allow health professionals to make a better diagnosis. The synthetic minority oversampling technique was applied to oversample the minority class to regulate overfitting problems. The RF significantly contributed (p<0.001) to a higher accuracy score of 98.14% compared to the other methods. Thus, this suggests that ML methods predict liver disease by incorporating the risk factors, which may improve the inference-based diagnosis of patients.
For a medical diagnosis, health professionals use different kinds of pathological ways to make a decision for medical reports in terms of patients medical condition. In the modern era, because of the advantage of computers and technologies, one can collect data and visualize many hidden outcomes from them. Statistical machine learning algorithms based on specific problems can assist one to make decisions. Machine learning data driven algorithms can be used to validate existing methods and help researchers to suggest potential new decisions. In this paper, multiple imputation by chained equations was applied to deal with missing data, and Principal Component Analysis to reduce the dimensionality. To reveal significant findings, data visualizations were implemented. We presented and compared many binary classifier machine learning algorithms (Artificial Neural Network, Random Forest, Support Vector Machine) which were used to classify blood donors and non-blood donors with hepatitis, fibrosis and cirrhosis diseases. From the data published in UCI-MLR, all mentioned techniques were applied to find one better method to classify blood donors and non-blood donors (hepatitis, fibrosis, and cirrhosis) that can help health professionals in a laboratory to make better decisions. Our proposed ML-method showed better accuracy score (e.g. 98.23% for SVM). Thus, it improved the quality of classification.
Hepatocellular carcinoma (HCC) is the primary liver cancer that occurs the most frequently. The risk of developing HCC is highest in those with chronic liver diseases, such as cirrhosis brought on by hepatitis B or C infection and the most common type of liver cancer. Knowledge-based interpretations are essential for understanding the HCC microarray dataset due to its nature, which includes high dimensions and hidden biological information in genes. When analyzing gene expression data with many genes and few samples, the main problem is to separate disease-related information from a vast quantity of redundant gene expression data and their noise. Clinicians are interested in identifying the specific genes responsible for HCC in individual patients. These responsible genes may differ between patients, leading to variability in gene selection. Moreover, ML approaches, such as classification algorithms, are similar to black boxes, and it is important to interpret the ML model outcomes. In this paper, we use a reliable pipeline to determine important genes for discovering HCC from microarray analysis. We eliminate redundant and unnecessary genes through gene selection using principal component analysis (PCA). Moreover, we detect responsible genes with the random forest algorithm through variable importance ranking calculated from the Gini index. Classification algorithms, such as random forest (RF), naïve Bayes classifier (NBC), logistic regression, and k-nearest neighbor (kNN) are used to classify HCC from responsible genes. However, classification algorithms produce outcomes based on selected genes for a large group of patients rather than for specific patients. Thus, we apply the local interpretable model-agnostic explanations (LIME) method to uncover the AI-generated forecasts as well as recommendations for patient-specific responsible genes. Moreover, we show our pathway analysis and a dendrogram of the pathway through hierarchical clustering of the responsible genes. There are 16 responsible genes found using the Gini index, and CCT3 and KPNA2 show the highest mean decrease in Gini values. Among four classification algorithms, random forest showed 96.53% accuracy with a precision of 97.30%. Five-fold cross-validation was used in order to collect multiple estimates and assess the variability for the RF model with a mean ROC of 0.95±0.2. LIME outcomes were interpreted for two random patients with positive and negative effects. Therefore, we identified 16 responsible genes that can be used to improve HCC diagnosis or treatment. The proposed framework using machine-learning-classification algorithms with the LIME method can be applied to find responsible genes to diagnose and treat HCC patients.
Knowledge-based interpretations are essential for understanding the omic data set because of its nature, such as high dimension and hidden biological information in genes. When analyzing gene expression data with many genes and few samples, the main problem is to separate disease-related information from a vast quantity of redundant data and noise. This paper uses a reliable framework to determine important genes for discovering Hepato-cellular Carcinoma (HCC) from micro-array analysis and eliminating redundant and unnecessary genes through gene selection. Several machine learning models were applied to find significant predictors responsible for HCC. As we know, classification algorithms such as Random Forest, Naive Bayes classifier, or a k-Nearest Neighbor classifier can help us to classify HCC from responsible genes. Random Forests shows 96.53% accuracy with p < 0.00001, which is better than other discussed Machine Learning(ML) approaches. Each gene is not responsible for a particular patient. Since ML approaches are like black boxes and people/practitioners do not rely on them sometimes. Artificial Intelligence(AI) technologies with high optimization interoperability shed light on what is happening inside these systems and aid in the detection of potential problems; including causality, information leakage, model bias, and robustness when determining responsible genes for a specific patient with a high probability score of almost 0.99, from one of the samples mentioned in this study.
In this paper, our main aim is to show a better dimension reduction process of high dimensional image data sets from several existing techniques. To verify it we start with most useful singular value decomposition to reduce the dimensionality of data to incorporate principal components. On the other hand, we classify data in advance to work out Fisher’s discriminant. From many real-world examples, we set a very well-known paradigm of analysis using Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) or Fisher Discriminant Analysis (FDA) and Simple Projection (SP) to recognize people from their facial images. We consider that we have some images of known people that can be used to compare and recognize new images (of the same set of face images). Moreover, we show graphical and tabular representation for average performance of correct recognition as well as analyze the effectiveness of three different machine learning techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.