Abstract. With the rapid development of social networks and e-commerce, we are exposed to an enormous volume of short text every day, ranging from tweets, movie comments, and search snippets to news summaries. Classifying such short, sparse text accurately is a basic requirement for handling information efficiently. However, previous methods fail to achieve high performance because their text representations are sparse and carry little semantic meaning. The key to a breakthrough lies in an appropriate representation of the words, on which we build a new framework. Latent topics are discovered in related data crawled from the web, so that the topic distribution of a text describes its content in general terms. Combined with word embeddings generated from universal online data, this yields a denser representation that contains semantic information from two different aspects. With this semantic representation of the texts, the framework greatly outperforms previous methods even with the most common SVM classifier, improving accuracy by 11.58% on a standard data set.
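
As a rough illustration of the combined representation described above, the following minimal sketch joins a topic distribution with word-embedding information into one dense feature vector. It rests on assumptions not stated in the abstract: the word embeddings of a text are averaged, out-of-vocabulary words are skipped, and the two parts are combined by simple concatenation. The function names and the toy vectors are hypothetical, and the topic distribution is taken as given rather than inferred by a topic model.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the embedding vectors of the tokens found in the vocabulary."""
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # assumption: out-of-vocabulary words are skipped
        for i, value in enumerate(vector):
            total[i] += value
        known += 1
    if known == 0:
        return total  # all-zero vector when no token is known
    return [value / known for value in total]


def combined_representation(tokens: Sequence[str],
                            topic_distribution: Sequence[float],
                            embeddings: Dict[str, List[float]],
                            dim: int) -> List[float]:
    """Concatenate the topic distribution with the averaged word embedding.

    Concatenation is an assumption; the abstract only says the two are combined.
    """
    return list(topic_distribution) + average_embedding(tokens, embeddings, dim)


# Toy, made-up inputs: a 2-topic distribution and 2-dimensional embeddings.
toy_embeddings = {"movie": [1.0, 0.0], "great": [0.0, 1.0]}
toy_topics = [0.75, 0.25]
features = combined_representation(
    ["great", "movie", "tonight"], toy_topics, toy_embeddings, dim=2
)
print(features)  # [0.75, 0.25, 0.5, 0.5]
```

A vector built this way could then be passed to any standard classifier, such as the SVM mentioned in the abstract.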