2020
DOI: 10.1017/s1351324920000066

Unsupervised modeling anomaly detection in discussion forums posts using global vectors for text representation

Abstract: Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses a possibly promising but relatively uncommon application of anomaly detection to text data. Two English-language and one Polish-language Internet discussion forums devoted to psychoactive substances received from home-grown plants, such as hashish or marijuana, serve as text sources that are both realistic and possibly interest…

Cited by 11 publications (6 citation statements)
References 59 publications
“…Matlab 7 simulation experiment was used to verify the application performance of the proposed method in realizing the abnormal detection of portable multidimensional control software testing. The parameters of abnormal feature detection of portable multidimensional control software testing were set as 1400, the sequence length of the training set was 400, and the fuzzy matching coefficient was 0.35 [21]. Related parameter settings are shown in Table 1.…”
Section: Results Analysis
confidence: 99%
“…The most commonly used methods in text representation include bag of words and term frequency–inverse document frequency, which perform well in classification and clustering tasks; however, there are still some problems, such as an extremely high vector dimension, sparse data, failure to account for the word order in sentences, and failure to learn text semantic information [ 34 , 35 ]. Multiple word embedding representation methods have been developed to overcome these limitations, such as Word2Vec [ 36 ], GloVe [ 37 , 38 ], and Embeddings from Language Models [ 39 ], which can effectively address the semantic problems of words in the text. In this study, GloVe was applied for text representation owing to its advantages of high accuracy and a short training period.…”
Section: Methods
confidence: 99%
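The contrast drawn in the statement above (sparse, vocabulary-sized TF-IDF vectors versus dense, fixed-size word embeddings) can be illustrated with a minimal pure-Python sketch. The toy corpus, the TF-IDF variant (raw term frequency times log inverse document frequency, no smoothing), and the two-dimensional stand-in "embeddings" are all illustrative assumptions, not the pipeline used in the cited study:

```python
import math
from collections import Counter

# Toy corpus (illustrative only).
docs = [
    "anomaly detection in forum posts",
    "anomaly detection in text data",
    "dense word vectors capture semantics",
]

# Bag-of-words / TF-IDF: one dimension per vocabulary word,
# so vectors are high-dimensional and mostly zero.
vocab = sorted({w for d in docs for w in d.split()})

def tfidf(doc):
    tf = Counter(doc.split())
    n = len(docs)
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d.split())  # document frequency
        vec.append(tf[w] * math.log(n / df))         # tf * idf
    return vec

v = tfidf(docs[0])
print(len(vocab))              # dimension equals vocabulary size: 12
print(sum(x == 0 for x in v))  # 7 of 12 entries are zero (sparse)

# Embedding-style representation: average dense word vectors into one
# fixed-size document vector. Real GloVe vectors would be pretrained;
# these hash-derived 2-d vectors are placeholders for the shape argument.
emb = {w: [hash(w) % 7 / 7.0, hash(w[::-1]) % 5 / 5.0] for w in vocab}

def embed(doc):
    ws = doc.split()
    return [sum(emb[w][i] for w in ws) / len(ws) for i in range(2)]

print(len(embed(docs[0])))     # fixed low dimension (2), independent of vocabulary
```

Adding documents grows the TF-IDF dimension with the vocabulary, while the averaged-embedding representation stays at the embedding size; this is the dimensionality and sparsity trade-off the quoted passage refers to.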
“…Language Models Based on Neural Networks. From 2018 to 2021, studies have investigated language models based on neural networks, such as Word2Vec and Bidirectional Encoder Representations from Transformers (BERT), that generate more semantic representations through word embeddings [Ruff et al 2019, Cichosz 2020, Mayaluru 2020]. These methods, mainly BERT, obtained state-of-the-art results for ATC through OCL.…”
Section: Related Work
confidence: 99%