When processing Chinese documents and queries for information retrieval (IR), one must identify the units to use as index terms. Words and n-grams have both been used as index terms in several previous studies, which showed that the two kinds of index terms lead to comparable IR performance. In this study, we carry out further experiments on different ways to segment documents and queries and to combine words with n-grams. Our experiments show that combining the longest-matching algorithm with single characters is the best choice.
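As a rough illustration of the combination named above, the following sketch pairs forward longest matching with single-character units. The toy lexicon, the `max_len` limit, and the function names are assumptions made for this example only; the abstract does not specify the dictionary or the exact matching variant used in the experiments.

```python
# Hypothetical toy lexicon; a real system would load a large dictionary.
LEXICON = {"信息", "检索", "信息检索", "系统"}


def longest_match_segment(text, lexicon, max_len=4):
    """Forward longest matching: at each position take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def index_units(text, lexicon, max_len=4):
    """Index terms = multi-character words from longest matching + every single character."""
    words = [w for w in longest_match_segment(text, lexicon, max_len) if len(w) > 1]
    return words + list(text)


print(longest_match_segment("信息检索系统", LEXICON))
print(index_units("信息检索系统", LEXICON))
```

Keeping the single characters alongside the matched words means a query can still hit a document when the two are segmented differently, which is one plausible reading of why the combination performs well.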
Traffic flow forecasting is a critical task for urban traffic control and dispatch in the field of transportation, and it is characterized by high nonlinearity and complexity. In this paper, we propose an end-to-end, deep-learning-based dual-path framework, the Spatial-Temporal Graph Attention Network (STGAT), for traffic flow forecasting. Unlike previous structure-based approaches, STGAT can be directly generalized to graphs with arbitrary structure. Furthermore, STGAT is capable of handling long temporal sequences by stacking gated temporal convolution layers. The dual-path architecture is proposed to take both potential and existing spatial dependencies into account; capturing potential spatial dependencies naturally provides more useful information for forecasting. We design a gated fusion mechanism to combine the outputs from each path. By introducing a graph attention mechanism into the spatial-temporal framework, the proposed model is directly applicable to inductive learning tasks, which means it can be generalized to completely unseen graphs. Moreover, experimental results on two public real-world traffic network datasets, METR-LA and PEMS-BAY, show that STGAT outperforms the state-of-the-art baselines. Additionally, we demonstrate that the proposed model is capable of efficient migration between graphs with different structures.
INDEX TERMS Traffic flow forecasting, spatial-temporal graph neural networks, intelligent transportation systems.
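The abstract does not give the form of the gated fusion mechanism. One common formulation, shown here purely as an assumption, computes a sigmoid gate from the two path outputs and uses it to take an element-wise convex combination of them. The names `h_a`, `h_b`, `w_a`, `w_b`, and `bias` are hypothetical, and per-element scalar weights stand in for the learned weight matrices a real model would use.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_a, h_b, w_a, w_b, bias):
    """Fuse two path outputs element-wise.

    gate = sigmoid(w_a * h_a + w_b * h_b + bias)
    out  = gate * h_a + (1 - gate) * h_b
    All arguments are equal-length lists of floats (an illustrative simplification).
    """
    fused = []
    for a, b, wa, wb, c in zip(h_a, h_b, w_a, w_b, bias):
        gate = sigmoid(wa * a + wb * b + c)
        fused.append(gate * a + (1.0 - gate) * b)
    return fused


# With zero weights and bias the gate is 0.5, so the result is the plain average.
print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

Because the gate depends on the inputs, the model can lean on one path for some nodes or time steps and on the other path elsewhere, rather than using a fixed mixing weight.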
We present the first known empirical study on speech summarization without lexical features for Mandarin broadcast news. We evaluate acoustic, lexical, and structural features as predictors of summary sentences. We find that the summarizer yields good performance, with an average F-measure of 0.5646, even when using only the combination of acoustic and structural features, which are independent of lexical features. In addition, we show that structural features are superior to lexical features and that our summarizer performs surprisingly well, with an average F-measure of 0.3914, when using only acoustic features. These findings enable us to summarize speech without placing a stringent demand on speech recognition accuracy.
This paper proposes a new method for feature extraction and recognition of epileptiform activity in EEG signals. The method improves the speed of feature extraction for epileptiform activity without reducing the recognition rate. First, principal component analysis (PCA) is applied to the original EEG for dimension reduction and for decorrelation of epileptic EEG and normal EEG. Then, the discrete wavelet transform (DWT) combined with approximate entropy (ApEn) is applied to epileptic EEG and normal EEG, respectively. Finally, Neyman-Pearson criteria are applied to classify EEG as epileptic or normal. The main procedure is as follows: the principal component of the EEG after PCA is decomposed into several sub-band signals using DWT, and the ApEn algorithm is applied to the sub-band signals at different wavelet scales. A distinct difference is found between the ApEn values of epileptic and normal EEG. The method allows recognition of epileptiform activity and discriminates it from normal EEG. The algorithm performs well at epileptiform activity recognition in clinical EEG data and offers a flexible tool that is intended to be generalized to the simultaneous recognition of many waveforms in EEG.
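The ApEn step can be sketched as follows, using the standard definition of approximate entropy with Chebyshev distance between embedded windows. This is a minimal illustration and not the paper's implementation: the PCA and DWT stages are omitted, the input is a plain list standing in for one sub-band signal, and the defaults `m=2` and `r=0.2` (an absolute tolerance) are assumptions, since the abstract does not state the parameters used.

```python
import math


def _phi(series, m, r):
    """Average log-fraction of length-m windows lying within tolerance r of each window."""
    n = len(series) - m + 1
    windows = [series[i:i + m] for i in range(n)]
    total = 0.0
    for w_i in windows:
        # Chebyshev distance; each window matches itself, so count >= 1.
        count = sum(
            1 for w_j in windows
            if max(abs(a - b) for a, b in zip(w_i, w_j)) <= r
        )
        total += math.log(count / n)
    return total / n


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) = phi_m(r) - phi_{m+1}(r); lower values indicate a more regular signal."""
    if len(series) <= m + 1:
        raise ValueError("series is too short for the chosen embedding dimension m")
    return _phi(series, m, r) - _phi(series, m + 1, r)


# A constant signal is perfectly regular; a strictly alternating one is nearly so.
print(approximate_entropy([1.0] * 20))
print(approximate_entropy([0.0, 1.0] * 20))
```

In practice `r` is often set relative to the standard deviation of the signal, and the function would be called once per wavelet scale to build the feature vector passed to the classifier.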