Abstract-In this study, we aim to obtain "natural groupings" of 151 local non-government organizations and institutions mentioned in a news archive of 77,000 articles spanning a decade (May 1999 to Jan 2010) from Indonesia. One of our goals is to enhance our understanding of counter-radical movements in critical locations in the Muslim world. We present information extraction techniques to recognize entities, and their beliefs and practices in text as a step towards identifying socially significant scales with explanatory power. Then, we proceed to cluster organizations based on these scales. We present experimental results, and discuss challenges in reasoning with the complex interactions of many simultaneous beliefs, practices and attitudes held by the leaders and followers of various organizations.
Abstract-With the large volume of news stories published daily and the range of channels (RSS feeds, blogs, etc.) used to disseminate them, even an expert at tracking newly developing stories can experience information overload. Often, when a user is reading a news story, she would like to know "what happened before this?" or "how did things progress after this incident?". In this paper, we present a novel, simple, real-time method to detect new events related to violence and terrorism in news streams and to track them over their lifetime along a timeline. We do this by first extracting a signature of the event at the microscopic level, rather than at the topic or macroscopic level, and then tracking and linking this event with mentions of the same event signature in other incoming news articles, thereby forming a thread that links all the news articles describing this specific event, with no training data or machine learning algorithms. We also present experimental evaluations conducted with Document Understanding Conference (DUC) datasets that validate our observations and methodology.
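The signature-and-linking idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how an event signature is built or matched, so everything here is an assumption for illustration: the signature is approximated by capitalized words and numbers, matching uses Jaccard overlap, and the names `extract_signature`, `EventTracker`, and the `threshold` value are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumption: a tiny stopword list so sentence-initial words like "The" are ignored.
STOPWORDS = {"the", "a", "an", "in", "on", "at"}


def extract_signature(text):
    """Crude stand-in for an event signature: capitalized words and numbers.

    The paper's actual signature extraction is not described in the abstract.
    """
    tokens = re.findall(r"\b(?:[A-Z][a-z]+|\d+)\b", text)
    return frozenset(t.lower() for t in tokens if t.lower() not in STOPWORDS)


def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Thread:
    signature: frozenset  # signature of the first article in the thread
    articles: list = field(default_factory=list)


class EventTracker:
    """Links each incoming article to the best-matching thread, or starts a new one."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical cut-off, not from the paper
        self.threads = []

    def add(self, article_id, text):
        sig = extract_signature(text)
        best, best_score = None, 0.0
        for i, thread in enumerate(self.threads):
            score = jaccard(sig, thread.signature)
            if score > best_score:
                best, best_score = i, score
        if best is not None and best_score >= self.threshold:
            self.threads[best].articles.append(article_id)
            return best
        self.threads.append(Thread(sig, [article_id]))
        return len(self.threads) - 1


tracker = EventTracker()
t1 = tracker.add("a1", "A bomb exploded in Jakarta on Friday, killing 12 people near the Marriott hotel.")
t2 = tracker.add("a2", "Police said the Jakarta blast at the Marriott left 12 dead.")
t3 = tracker.add("a3", "Floods hit Manila, displacing 300 families.")
```

In this sketch a thread keeps the signature of its first article and needs no training data, which matches the abstract's claim only in spirit; a real system would likely use a richer signature (for example, named entities, dates, and locations).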
Document classification is the task of splitting a document set into distinct classes or groups of closely related documents based on the nature of the document contents. Here, an improved approach to document classification called keyword-based document classification (KBDC) is introduced. It focuses on splitting an unstructured text document set into K dissimilar classes based on K predetermined keyword text models using an improved probability technique. The system comprises the following stages: pre-processing, classification, and classifier. First, the proposed system (KBDC) recognizes all the irrelevant content in the input text documents through a constructed Predetermined Irrelevant Text Pattern Model (PITPM). Next, it divides the pre-processed document set into K different groups or classes using K Predetermined Keyword Text Pattern Models (PKTPM) through the probability technique, where K denotes the number of groups, classes, or models. Finally, the KBDC system assigns an unlabeled test document to one of the existing groups based on the K different class models (PKTPMs). Experimental results show that KBDC is suitable for splitting an unstructured text document set into K distinct classes of closely related documents and for identifying the class of a new document.
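The three stages described above can be sketched roughly as follows. This is not the KBDC implementation: the abstract does not define the PITPM, the PKTPMs, or the "improved probability technique", so the sketch assumes a small irrelevant-word list in place of the PITPM, plain keyword sets in place of the PKTPMs, and a Laplace-smoothed keyword frequency as the probability score. All names (`IRRELEVANT`, `preprocess`, `class_scores`, `classify`) are hypothetical.

```python
import re
from collections import Counter

# Assumption: a small word list standing in for the irrelevant-text pattern model.
IRRELEVANT = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "as"}


def preprocess(text):
    """Pre-processing stage: lowercase, tokenize, drop irrelevant tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in IRRELEVANT]


def class_scores(tokens, keyword_models):
    """Score each class by the smoothed share of tokens that match its keywords."""
    counts = Counter(tokens)
    total = len(tokens)
    k = len(keyword_models)
    scores = {}
    for label, keywords in keyword_models.items():
        hits = sum(counts[word] for word in keywords)
        # Laplace smoothing so a class with no hits still gets a non-zero score.
        scores[label] = (hits + 1) / (total + k)
    return scores


def classify(text, keyword_models):
    """Classifier stage: assign an unlabeled document to the highest-scoring class."""
    scores = class_scores(preprocess(text), keyword_models)
    return max(scores, key=scores.get)


# K = 2 illustrative keyword models (invented for this example).
models = {
    "sports": {"match", "goal", "team"},
    "finance": {"bank", "stock", "market"},
}
label_a = classify("The team scored a late goal in the match", models)
label_b = classify("Stock market falls as bank shares drop", models)
```

With these two illustrative models, the first document is assigned to `sports` and the second to `finance`. Ties go to whichever class appears first in the dictionary, a simplification the original system may handle differently.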