Encouraged by the demonstration that a relatively low-cost grid environment can speed up the processing of continuous streams of English-language financial news, we have attempted to replicate our methods for automatic sentiment analysis in two other major world languages: Arabic and Chinese. We show that our local grammar approach, developed on an archive of English (Indo-European) texts, works equally well on the typologically different Chinese (Sino-Tibetan) and Arabic (Semitic) languages.
An unsupervised learning method, based on corpus linguistics and special language terminology, is described that can extract time-varying information from text streams. The method is shown to be 'language-independent' in that its use leads to sets of regular expressions that can be used to extract the information in typologically distinct languages such as English and Arabic. The method uses the distribution of n-grams to automatically extract 'meaning-bearing' patterns of usage from a training corpus. The analysis of an English news wire corpus (1,720,142 tokens) and an Arabic news wire corpus (1,720,154 tokens) shows encouraging results.
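The general idea of turning n-gram distributions into regular expressions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ngrams_around` and `patterns_to_regex`, the seed word, the frequency threshold, and the toy corpus are all assumptions chosen for illustration, and the abstract does not specify how patterns are selected.

```python
import re
from collections import Counter


def ngrams_around(tokens, seed, n=3):
    """Count the n-grams (as tuples) that contain the seed token."""
    found = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if seed in gram:
            found[gram] += 1
    return found


def patterns_to_regex(grams, min_count=2):
    """Join n-grams seen at least min_count times into one alternation regex."""
    frequent = [g for g, c in grams.items() if c >= min_count]
    # Most frequent first; ties broken alphabetically for a stable result.
    frequent.sort(key=lambda g: (-grams[g], g))
    alternatives = [r"\s+".join(re.escape(tok) for tok in g) for g in frequent]
    if not alternatives:
        return None
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


# Hypothetical toy corpus, whitespace-tokenised (an assumption, not the paper's data).
corpus = ("shares rose sharply today . shares rose sharply again . "
          "bonds fell slightly . shares rose sharply once more .")
tokens = corpus.split()

grams = ngrams_around(tokens, seed="rose", n=3)
regex = patterns_to_regex(grams, min_count=3)
```

Because the sketch relies only on token counts and not on language-specific rules, the same steps could in principle be run on a tokenised corpus in another language, which is the sense of 'language-independent' used above. The `\b` anchors assume the patterns begin and end with word characters.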