A Hidden Topic-Based Framework toward Building Applications with Short Web Documents

Phan, Xuan-Hieu; Nguyen, Cam-Tu; Le, Dieu-Thu; Nguyen, Le-Minh; Horiguchi, Susumu; Ha, Quang-Thuy

doi:10.1109/tkde.2010.27

Cited by 103 publications

(65 citation statements)

References 34 publications

“…[21] improved Sahami's work by involving a learning process to make the measure more appropriate for the target corpus. Phan et al [12,11] proposed to convert additional knowledge base to topics to improve the representation of the short texts. The knowledge base is crawled with selected seeds from several topics to avoid noise.…”

Section: Mining Short Text (mentioning)

confidence: 99%

“…Directly learning topic models on short text is much harder than on traditional long text. For this reason, [12,11] proposed to train topic models on a collection long text in the same domain and then make inference on short text to help the learning task on short texts. However, in highly dynamic domains like Twitter where novel topics and trends constantly emerge, it is not always possible to find strongly related long texts via a search engine or a static knowledge base such as Wikipedia.…”

Section: Dual Latent Dirichlet Allocation (DLDA) (mentioning)

confidence: 99%

“…In the presence of such inconsistencies between short texts and auxiliary long texts, it would be unreasonable to assume that the topical structure of the two domains is completely identical, as done in several previous works [12,11,20]. In this section, we describe a better solution to the problem by designing a novel topic model, referred to as the Dual Latent Dirichlet Allocation (DLDA), which can distinguish between consistent and inconsistent topical structures across domains when learning topics from short texts with an additional set of auxiliary long texts.…”

Section: Dual Latent Dirichlet Allocation (DLDA) (mentioning)

confidence: 99%

“…This can be achieved by sending the input short texts as queries to a search engine to retrieve a set of most relevant results [18]. Another popular method is to match short texts with topics learned from general knowledge repositories such as Wikipedia or ODP [12,11]. Once the auxiliary data or auxiliary topic is obtained, the data or topics are often directly combined with the original short texts, which are then processed by some traditional text mining models.…”

Section: Introduction (mentioning)

confidence: 99%

Transferring topical knowledge from auxiliary long texts for short text clustering

Jin, Liu, Zhao, et al. 2011

Proceedings of the 20th ACM International Conference on Information and Knowledge Management

With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming more and more important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many of the existing techniques are not effective due to the sparseness of text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be utilized as the auxiliary data when mining the target short text data. In this article, we present a novel approach to cluster short text messages via transfer learning from auxiliary long text data. We show that while some previous works for enhancing short text clustering with related long texts exist, most of them ignore the semantic and topical inconsistencies between the target and auxiliary data and may hurt the clustering performance on the short texts. To accommodate the possible inconsistencies between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistencies between data sets. We demonstrate through large-scale clustering experiments on both advertisements and Twitter data that we can obtain superior performance over several state-of-the-art techniques for clustering short text documents.

“…In this frame work, they solved two problems like data sparseness, synonyms problem using LDA method through MaxEnt classifier [8] .…”

Section: Related Work (mentioning)

confidence: 99%

Improved Keyword and Keyphrase Extraction from Meeting Transcripts

Sheeba, Vivekanandan. 2012

IJCA

Interest prediction in social networks based on Markov chain modeling on clustered users

Zheng, Chen, et al. 2015

Concurrency and Computation

Summary: Effective user interest prediction is significant for service providers in a set of application scenarios such as user behavior analysis and resource recommendation. However, existing approaches are either incomplete or proprietary. In this paper, user interest prediction based on the Markov chain modeling on clustered users is proposed with the following procedure: collect dataset from 4613 users and more than 16 million messages from Sina Weibo; obtain each user's interest eigenvalue sequence and establish single-Markov chain model; and implement user clustering algorithm for the multi-Markov chain construction in order to divide users into a set of predefined interest categories. The proposed solution is capable of predicting both long-term and short-term user interests based on a suitable selection of the initial state distribution, λ. The proposed solution also proves that short-term interests are consistent with long-term interests if the influences of social or user-related events that cause interruptions (e.g., earthquake and birthday) are not considered. Furthermore, experiments show that the proposed solution is feasible and efficient and can achieve a higher accuracy of prediction than that of the other approaches such as Support Vector Machine (SVM) and K-means.
