The rapid growth of the internet and other media has produced a tremendous amount of text data, making it both challenging and valuable to find more effective ways to analyze text by machine. Text representation is the first step for a machine to understand text, and the most commonly used representation method is the Bag-of-Words (BoW) model. To form the vector representation of a document, the BoW model matches and counts each element in the document separately, neglecting much of the correlation information among words. In this paper, we propose a network-based bag-of-words model, which captures high-level structural and semantic information about the words. Because the structural and semantic information of a network reflects the relationships between nodes, the proposed model can distinguish relationships among words. We apply the proposed model to text classification and compare its performance with different text representation methods on four document datasets. The results show that the proposed method achieves the best performance with high efficiency. Using the eccentricity property of the network as the feature yields the highest accuracy. We also investigate the influence of different network structures on the proposed method. Experimental results reveal that, for text classification, the dynamic network is more suitable than the static network and the hybrid network.
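The contrast between plain word counting and a network-derived feature can be illustrated with a minimal sketch. This is not the authors' implementation: the co-occurrence window, the function names, and the choice to emit one eccentricity value per vocabulary word are all assumptions made for illustration, since the abstract does not specify how the network is built.

```python
from collections import Counter, deque


def bag_of_words(tokens, vocabulary):
    """Classic BoW: count each vocabulary word independently."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur within `window` positions.

    The window size is an assumption, not taken from the abstract.
    """
    graph = {word: set() for word in tokens}
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def eccentricity(graph, source):
    """Largest shortest-path distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def network_features(tokens, vocabulary, window=2):
    """One eccentricity value per vocabulary word; 0 if the word is absent."""
    graph = cooccurrence_graph(tokens, window)
    return [eccentricity(graph, w) if w in graph else 0 for w in vocabulary]


tokens = "the cat sat on the mat".split()
vocabulary = ["cat", "dog", "mat", "on", "sat", "the"]
print(bag_of_words(tokens, vocabulary))
print(network_features(tokens, vocabulary))
```

In this toy document the count vector only records that "the" appears twice, whereas the eccentricity vector depends on how each word is positioned relative to the others in the co-occurrence graph, which is the kind of relational information the abstract says BoW neglects.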
The reliability of the high-speed railway network is an important issue for the sustainable development of railway traffic. A highly reliable railway network not only has a longer service life but also has a greater ability to withstand damage to the network. In this article, based on complex network theory, we construct a topological network model to study and analyze the reliability of the high-speed railway network with respect to damage caused by natural disasters, geological disasters, equipment failure, or man-made disasters. In the real world, heavy rain and snowstorms frequently occur on a large scale. The regions they damage are represented by network communities. Here, we put forward an evaluation index to quantify network reliability. Taking the China high-speed railway network as an example, the results show that some key communities have a great influence on network reliability. When these key communities are damaged by natural factors, the reliability of the railway network decreases sharply, and the network may even break down. In addition, we find that network reliability approximately follows an exponential law as a function of the number of deleted communities.
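The idea of deleting whole communities and measuring what remains can be sketched as follows. The abstract does not define its evaluation index, so this example substitutes a common stand-in, the fraction of stations left in the largest connected component; the toy graph, the community assignment, and all function names are hypothetical.

```python
from collections import deque


def largest_component_size(graph, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def reliability_after_removal(graph, communities, removed_ids):
    """Stand-in index: share of all stations in the largest surviving component."""
    removed = set()
    for community_id in removed_ids:
        removed |= communities[community_id]
    return largest_component_size(graph, removed) / len(graph)


# Hypothetical six-station line A-B-C-D-E-F split into three communities.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
communities = {0: {"A", "B"}, 1: {"C", "D"}, 2: {"E", "F"}}

for removed_ids in ([], [0], [1]):
    print(removed_ids, reliability_after_removal(graph, communities, removed_ids))
```

In this toy line, removing the middle community splits the network in two and lowers the index more than removing an end community does, which mirrors the abstract's observation that some key communities matter far more than others.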
Root cause identification is an important task in railway transportation: it provides prompt assistance for diagnosis and security monitoring, and guidance for specific routine maintenance measures. However, most methods that address rail faults are based on state detection, which relies on structured data. Manually identifying causes from the text records of railway equipment maintenance and management is a time-consuming and laborious task. To quickly obtain root cause text from unstructured data, this paper proposes an approach for root cause factor identification that uses a root cause identification-new word sentence (RCI-NWS) keyword extraction method. The experimental results demonstrate that railway fault text data can be extracted using the keyword extraction method, and that RCI-NWS obtains the highest values.