Short text message streams are produced by Instant Messaging and the Short Message Service, both of which are widely used nowadays. Each stream usually contains more than one thread. Detecting the threads in a stream is helpful to various applications, such as business intelligence, crime investigation, and public opinion analysis. Existing work, which is mainly based on text similarity, encounters many challenges, including the sparse feature vectors and anomalies of short text messages. This paper introduces a novel concept, contextual correlation, in place of traditional text similarity in the single-pass clustering algorithm to address the challenges of thread detection. We first analyze the contextually correlative nature of conversations in short text message streams and then propose an unsupervised method to compute the degree of correlation. As a reference, a single-pass algorithm employing contextual correlation is developed to detect threads in massive short text streams. Experiments on large real-life online chat logs show that our approach improves performance by 11% in terms of the F1 measure when compared with the best similarity-based algorithm.
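The single-pass scheme described above can be illustrated with a minimal Python sketch. The abstract does not specify how the contextual correlation is computed, so `toy_correlation` below is only a hypothetical stand-in (token overlap with recent messages, discounted by the time gap); the names `Message`, `Thread`, `single_pass`, and the `threshold`, `window`, and `half_life` parameters are likewise assumptions made for illustration, not the paper's method.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    sender: str
    time: float  # seconds since the start of the stream
    text: str


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)


def toy_correlation(msg: Message, thread: Thread,
                    window: int = 3, half_life: float = 60.0) -> float:
    """Hypothetical placeholder score, NOT the paper's contextual correlation.

    Takes the best Jaccard token overlap between `msg` and the last
    `window` messages of `thread`, halved for every `half_life` seconds
    that separate the two messages.
    """
    tokens = set(msg.text.lower().split())
    best = 0.0
    for prev in thread.messages[-window:]:
        prev_tokens = set(prev.text.lower().split())
        union = tokens | prev_tokens
        overlap = len(tokens & prev_tokens) / len(union) if union else 0.0
        decay = 0.5 ** (max(msg.time - prev.time, 0.0) / half_life)
        best = max(best, overlap * decay)
    return best


def single_pass(stream: List[Message],
                score: Callable[[Message, Thread], float] = toy_correlation,
                threshold: float = 0.2) -> List[Thread]:
    """Assign each message, in arrival order, to the best-scoring thread.

    A message opens a new thread when no existing thread reaches
    `threshold`. Each message is visited exactly once.
    """
    threads: List[Thread] = []
    for msg in stream:
        best_thread, best_score = None, threshold
        for thread in threads:
            s = score(msg, thread)
            if s >= best_score:
                best_thread, best_score = thread, s
        if best_thread is None:
            best_thread = Thread()
            threads.append(best_thread)
        best_thread.messages.append(msg)
    return threads
```

Because the scoring function is passed in as a parameter, a similarity-based baseline and a correlation-based score could be compared within the same loop by swapping `score`.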
To address the current low performance of slot-filling methods in Chinese entity-attribute extraction, this paper presents a distant supervision relation extraction method based on a bidirectional long short-term memory (LSTM) neural network. First, we obtain the Infoboxes of Baidu Baike and use their relation triples to collect a training corpus from the internet; we then train a classifier based on bidirectional LSTM networks. Compared with classical methods, the proposed method is fully automatic in both data annotation and feature extraction. Experimental results show that the proposed method is effective and suitable for information extraction in high-dimensional space. Compared with the SVM algorithm, accuracy is significantly improved.
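The corpus-building step can be sketched in Python under the usual distant supervision assumption: a sentence that mentions both the entity and the value of a known triple is treated as a (possibly noisy) example of that triple's attribute. The abstract does not state the exact matching rule, so the substring test below, the `Triple` and `Example` types, and the `distant_label` function are hypothetical names chosen for illustration; the bidirectional LSTM classifier itself is not shown.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    entity: str
    attribute: str
    value: str


class Example(NamedTuple):
    sentence: str
    entity: str
    value: str
    label: str


def distant_label(triples: Iterable[Triple],
                  sentences: Iterable[str]) -> List[Example]:
    """Label sentences automatically from knowledge-base triples.

    Assumed rule (not taken from the paper): a sentence containing both
    the entity string and the value string of a triple is labeled with
    that triple's attribute. No manual annotation is involved.
    """
    triples = list(triples)  # allow several passes over the triples
    examples: List[Example] = []
    for sentence in sentences:
        for t in triples:
            if t.entity in sentence and t.value in sentence:
                examples.append(
                    Example(sentence, t.entity, t.value, t.attribute))
    return examples
```

The resulting `Example` records would then serve as training data for the classifier; sentences that match no triple are simply skipped in this sketch.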