[Objective] To address the shortcomings of traditional cross-language text clustering methods, a Chinese-English cross-language text clustering algorithm based on Latent Semantic Analysis (LSA) is proposed. [Method] Singular Value Decomposition is applied to the characteristic word-text matrix to construct a Chinese-English bilingual latent semantic space, which captures cross-language latent semantic associations while reducing dimensionality and noise. A K-means algorithm that chooses its initial cluster centers on the basis of minimum similarity is adopted, so that the clustering result is not affected by random selection of the initial centers. [Results] Experimental results show that both the number of characteristic words retained for each text, s, and the dimension of the latent space, k, affect the clustering result. The F-measure is highest when each text retains its top 15 characteristic words and k=200, an improvement of 13.96 percentage points over CLTC. [Conclusions] The method greatly reduces the dimension of the text space and effectively improves cross-language text clustering quality, outperforming CLTC.
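The abstract does not spell out how the minimum-similarity rule picks the initial centers, so the following is only a minimal sketch under stated assumptions: the input vectors stand in for texts already projected into the k-dimensional latent semantic space, similarity is cosine similarity, the first two centers are the least similar pair, and each further center is the text whose highest similarity to the already chosen centers is lowest. The function names `cosine` and `min_similarity_centers` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_similarity_centers(vectors, n_clusters):
    """Return indices of initial K-means centers chosen without randomness.

    Assumed rule (not taken from the paper): seed with the least similar
    pair, then repeatedly add the vector whose maximum similarity to the
    centers chosen so far is smallest.
    """
    n = len(vectors)
    if not 2 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 2 and len(vectors)")

    # Find the least similar pair; ties keep the first pair encountered.
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if best is None or sim < best[0]:
                best = (sim, i, j)
    chosen = [best[1], best[2]]

    # Greedily add the vector that is least similar to every chosen center.
    while len(chosen) < n_clusters:
        candidates = [i for i in range(n) if i not in chosen]
        nxt = min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[c]) for c in chosen),
        )
        chosen.append(nxt)
    return chosen


# Toy 3-dimensional "latent space" vectors forming three loose groups.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
initial_centers = min_similarity_centers(docs, 3)
```

Because the selection is deterministic, repeated runs start K-means from the same centers, which is the property the abstract attributes to the minimum-similarity initialization.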
Background: Advanced artificial intelligence technologies are developing rapidly and have been applied in many types of applications, especially in the medical field. Cancer is one of the biggest problems in medical science. If cancer can be detected and treated early, the possibility of a cure is greatly increased. Malignant skin cancer is among the cancers with the highest mortality rates, and it cannot be diagnosed in time through doctors’ experience alone. Artificial intelligence algorithms can be employed to detect skin cancer at an early stage, for example by examining skin damage or spots to determine whether a patient has skin cancer. Objective: We use the real HAM10000 image dataset to analyze and predict skin cancer. Methods: (1) We introduce a lightweight attention module to discover the relationships between features, and we fine-tune the pre-trained model (i.e., ResNet-50) on the HAM10000 dataset to extract the hidden high-level features from the images; (2) we integrate these high-level features with generic statistical features, and use the SMOTE oversampling technique to augment samples from the minority classes; and (3) we input the augmented samples into the XGBoost model for training and prediction. Results: The experimental results show that the accuracy, sensitivity, and specificity of the proposed SkinDet (Skin cancer detector based on transfer learning and feature fusion) model reached 98.24%, 97.84%, and 98.13%, respectively. The proposed model has stronger classification capability for the minority classes, such as dermatofibroma and actinic keratoses.
Conclusion: SkinDet contains a lightweight attention module and extracts the hidden high-level features of the images by fine-tuning the pre-trained model on the skin cancer dataset. In particular, SkinDet integrates these high-level features with statistical features and augments samples of the minority classes. Importantly, SkinDet can be applied to classify samples into the minority classes.
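The oversampling step in the Methods can be illustrated with a simplified, standard-library sketch of the idea behind SMOTE: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is an assumption-laden illustration, not the authors' code and not the imbalanced-learn implementation; the function name `smote_like`, its parameters, and the toy feature vectors are hypothetical. In the pipeline described above, the inputs would be the fused feature vectors and the output would be passed on to the classifier.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating between minority samples.

    minority: list of equal-length feature vectors from one minority class.
    k: number of nearest neighbours considered for each base sample.
    seed: seed for the local random generator, so results are repeatable.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: four 2-dimensional feature vectors.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = minority_features + smote_like(minority_features, n_new=6)
```

Every synthetic vector lies between two existing minority samples, so the augmented class stays inside the region the original samples already occupy rather than duplicating them exactly.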