2022
DOI: 10.1145/3527838
A Novel Deep Auto-Encoder Based Linguistics Clustering Model for Social Text

Abstract: The wide adoption of media and social media has increased the amount of digital content to an enormous level. Natural language processing (NLP) techniques provide an opportunity to extract and explore meaningful information from large amounts of text. Among natural languages, Urdu is one of the most widely used languages worldwide for spoken and written communication. Due to its wide adoptability, digital content in the Urdu language is increasing briskly, especially with social media and…

Cited by 16 publications (18 citation statements: 0 supporting, 18 mentioning, 0 contrasting). References 41 publications.
“…Secondly, we need to simplify the representation of the text, which will make our model more advantageous for industrial replication. We will try to further extract textual information through short text clustering [43], ensemble learning, and DistilBERT [44], and try to optimize the time complexity [45]. Finally, we would like to incorporate external data (e.g., knowledge graphs, speech, images) into CTR prediction to mine the intent behind users with richer features.…”
Section: Discussion (mentioning)
confidence: 99%
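The future-work items quoted above (short text clustering over DistilBERT representations) are concrete enough to sketch. A minimal illustration, assuming the Hugging Face transformers and scikit-learn packages; the model name, toy texts, and cluster count are illustrative and not taken from the cited paper:

```python
# Minimal sketch: DistilBERT embeddings + k-means for short-text clustering.
# Assumes the Hugging Face `transformers` and `scikit-learn` packages;
# model name and cluster count are illustrative, not from the cited paper.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

texts = ["cheap flight deals", "book a hotel room", "flight ticket prices"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean-pool over non-padding tokens to get one vector per text.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings.numpy())
print(labels)  # cluster assignment per input text
```

Mean-pooling over non-padding tokens is one common way to collapse token embeddings into a single vector per text; the cited work may use a different pooling or clustering scheme.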
“…In the past few years, RNNs have achieved great success in speech recognition, language modeling, text translation, and other tasks, but an RNN suffers from exploding and vanishing gradients due to the continuous cyclic input of information. The text-component LSTM [34] introduces a gating unit, which solves the abovementioned problems faced by RNNs (see Figure 8). The mathematical expression of the text-component LSTM is shown in Equation (11), where x_t represents the current input value, h_t represents the output of the current hidden layer, and h_{t-1} represents the output of the hidden layer at the previous moment.…”
Section: Methods (mentioning)
confidence: 99%
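The passage above references Equation (11) without reproducing it (the equation was lost in extraction). For orientation, these are the textbook LSTM cell equations, which the gated unit described presumably follows; the cited paper's exact form may differ:

```latex
% Standard LSTM cell equations; the exact Equation (11) of the cited
% paper is not reproduced on this page, so this is the textbook form.
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\ % forget gate
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\ % input gate
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\ % output gate
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\ % candidate cell state
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ % cell state update
h_t &= o_t \odot \tanh(c_t) % hidden-state output
\end{aligned}
```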
“…The proposed DNN-based model outperforms other machine learning approaches. Akram et al. [14] proposed a linguistic model for social text based on a deep autoencoder. They implemented this model for the low-resource language Urdu.…”
Section: Literature Survey (mentioning)
confidence: 99%
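The deep-autoencoder clustering idea attributed to Akram et al. can be sketched generically. A minimal sketch, not the paper's actual architecture: TF-IDF vectors are compressed by an autoencoder and the latent codes are clustered with k-means; the layer sizes, toy corpus, and cluster count are all illustrative:

```python
# Minimal sketch of deep-autoencoder text clustering (generic idea, not the
# cited paper's exact architecture): compress TF-IDF vectors into a
# low-dimensional latent space, then run k-means on the latent codes.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["urdu news headline one", "cricket match report", "urdu poetry verse"]
x = torch.tensor(TfidfVectorizer().fit_transform(docs).toarray(),
                 dtype=torch.float32)

dim = x.shape[1]
encoder = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 8))
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, dim))
optim = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(200):  # train on reconstruction loss only
    z = encoder(x)
    loss = nn.functional.mse_loss(decoder(z), x)
    optim.zero_grad()
    loss.backward()
    optim.step()

with torch.no_grad():  # cluster the learned latent codes
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(encoder(x).numpy())
print(labels)
```

Training on reconstruction first and clustering afterward is the simplest pipeline; many published variants instead train the reconstruction and clustering objectives jointly.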
“…Early research was carried out with traditional machine learning methods such as support vector machines (SVM) [2], Gaussian mixture models (GMM) [3], hidden Markov models (HMM) [4], and K-Nearest Neighbors (KNN) [5] for processing speech features. In recent years, with the development of deep learning, convolutional neural networks (CNNs) [6], recurrent neural networks (RNNs) [7], long short-term memory (LSTM) [8], deep belief networks (DBNs) [9], auto-encoders (AEs) [10], and other methods have been applied to speech feature extraction, and these data-driven deep learning methods have achieved excellent performance improvements in SER tasks. Notably, these methods are also widely used in semantic sentiment analysis, with some studies [11, 12] using CNNs and AEs to learn text feature representations; these techniques, together with speech sentiment recognition, have driven the development of affective computing research.…”
Section: Introduction (mentioning)
confidence: 99%