2020
DOI: 10.1017/s1351324920000066

Unsupervised modeling anomaly detection in discussion forums posts using global vectors for text representation

Abstract: Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses a possibly promising but relatively uncommon application of anomaly detection to text data. Two English-language and one Polish-language Internet discussion forums devoted to psychoactive substances received from home-grown plants, such as hashish or marijuana, serve as text sources that are both realistic and possibly interest…

Cited by 11 publications (6 citation statements)
References 59 publications
“…Matlab 7 simulation experiment was used to verify the application performance of the proposed method in realizing the abnormal detection of portable multidimensional control software testing. The parameters of abnormal feature detection of portable multidimensional control software testing were set as 1400, the sequence length of the training set was 400, and the fuzzy matching coefficient was 0.35 [21]. Related parameter settings are shown in Table 1.…”
Section: Results Analysis
confidence: 99%
“…The most commonly used methods in text representation include bag of words and term frequency–inverse document frequency, which perform well in classification and clustering tasks; however, there are still some problems, such as an extremely high vector dimension, sparse data, failure to account for the word order in sentences, and failure to learn text semantic information [ 34 , 35 ]. Multiple word embedding representation methods have been developed to overcome these limitations, such as Word2Vec [ 36 ], GloVe [ 37 , 38 ], and Embeddings from Language Models [ 39 ], which can effectively address the semantic problems of words in the text. In this study, GloVe was applied for text representation owing to its advantages of high accuracy and a short training period.…”
Section: Methods
confidence: 99%
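The contrast drawn in the statement above (sparse, vocabulary-sized TF-IDF vectors versus dense, fixed-size word embeddings) can be illustrated with a minimal pure-Python sketch. The toy corpus, the TF-IDF variant (raw term frequency times log inverse document frequency, no smoothing), and the two-dimensional stand-in "embeddings" are all illustrative assumptions, not the pipeline used in the cited study:

```python
import math
from collections import Counter

# Toy corpus (illustrative only).
docs = [
    "anomaly detection in forum posts",
    "anomaly detection in text data",
    "dense word vectors capture semantics",
]

# Bag-of-words / TF-IDF: one dimension per vocabulary word,
# so vectors are high-dimensional and mostly zero.
vocab = sorted({w for d in docs for w in d.split()})

def tfidf(doc):
    tf = Counter(doc.split())
    n = len(docs)
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d.split())  # document frequency
        vec.append(tf[w] * math.log(n / df))         # tf * idf
    return vec

v = tfidf(docs[0])
print(len(vocab))              # dimension equals vocabulary size: 12
print(sum(x == 0 for x in v))  # 7 of 12 entries are zero (sparse)

# Embedding-style representation: average dense word vectors into one
# fixed-size document vector. Real GloVe vectors would be pretrained;
# these hash-derived 2-d vectors are placeholders for the shape argument.
emb = {w: [hash(w) % 7 / 7.0, hash(w[::-1]) % 5 / 5.0] for w in vocab}

def embed(doc):
    ws = doc.split()
    return [sum(emb[w][i] for w in ws) / len(ws) for i in range(2)]

print(len(embed(docs[0])))     # fixed low dimension (2), independent of vocabulary
```

Adding documents grows the TF-IDF dimension with the vocabulary, while the averaged-embedding representation stays at the embedding size; this is the dimensionality and sparsity trade-off the quoted passage refers to.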
“…Language Models Based on Neural Networks. From 2018 to 2021, studies have investigated language models based on neural networks, such as Word2Vec and Bidirectional Encoder Representations from Transformers (BERT), that generate more semantic representations through word embeddings [Ruff et al 2019, Cichosz 2020, Mayaluru 2020]. These methods, mainly BERT, obtained state-of-the-art results for ATC through OCL.…”
Section: Related Work
confidence: 99%