Recent years, Chinese text classification has attracted more and more research attention. However, most existing techniques which specifically aim at English materials may lose effectiveness on this task due to the huge difference between Chinese and English. Actually, as a special kind of hieroglyphics, Chinese characters and radicals are semantically useful but still unexplored in the task of text classification. To that end, in this paper, we first analyze the motives of using multiple granularity features to represent a Chinese text by inspecting the characteristics of radicals, characters and words. For better representing the Chinese text and then implementing Chinese text classification, we propose a novel Radicalaware Attention-based Four-Granularity (RAFG) model to take full advantages of Chinese characters, words, characterlevel radicals, word-level radicals simultaneously. Specifically, RAFG applies a serialized BLSTM structure which is context-aware and able to capture the long-range information to model the character sharing property of Chinese and sequence characteristics in texts. Further, we design an attention mechanism to enhance the effects of radicals thus model the radical sharing property when integrating granularities. Finally, we conduct extensive experiments, where the experimental results not only show the superiority of our model, but also validate the effectiveness of radicals in the task of Chinese text classification.
Cognitive psychology research shows that humans have the instinct for abstract thinking, where association plays an essential role in language comprehension. Especially for Chinese, its ideographic writing system allows radicals to trigger semantic association without the need of phonetics. In fact, subconsciously using the associative information guided by radicals is a key for readers to ensure the robustness of semantic understanding. Fortunately, many basic and extended concepts related to radicals are systematically included in Chinese language dictionaries, which leaves a handy but unexplored way for improving Chinese text representation and classification. To this end, we draw inspirations from cognitive principles between ideography and human associative behavior to propose a novel Radical-guided Associative Model (RAM) for Chinese text classification. RAM comprises two coupled spaces, namely Literal Space and Associative Space, which imitates the real process in people's mind when understanding a Chinese text. To be specific, we first devise a serialized modeling structure in Literal Space to thoroughly capture the sequential information of Chinese text. Then, based on the authoritative information provided by Chinese language dictionaries, we design an association module and put forward a strategy called Radical-Word Association to use ideographic radicals as the medium to associate prior concept words in Associative Space. Afterwards, we design an attention module to imitate people's matching and decision between Literal Space and Associative Space, which can balance the importance of each associative words under specific contexts. Finally, extensive experiments on two real-world datasets prove the effectiveness and rationality of RAM, with good cognitive insights for future language modeling.
Recent years have witnessed the increasing interests in research of crowdfunding mechanism. In this area, dynamics tracking is a significant issue but is still under exploration. Existing studies either fit the fluctuations of time-series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that could interact with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characters) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, for the purpose of switching from different kinds of patterns, the actor component of TC3 is extended with a structure of options, which comes to the TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed the U-shaped one.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.