We propose a pipeline for learning event templates from a large corpus of textual news articles. An event template is a machine-usable semantic data structure (in our case, a graph) describing a particular event type. For example, most earthquake news reports semantically fit the template "x people dead, town y shook, at time z". Such templates can serve as input for information extraction tasks or automated ontology extension. We also present preliminary results in the form of sample templates extracted from Google News articles.
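To make the notion of a graph-shaped template concrete, the earthquake example can be written down as a small data structure. This is a minimal sketch under stated assumptions, not the pipeline's actual representation: the `EventTemplate` class, the `"EVENT"` node label, the relation names (`killed`, `shook`, `occurred_at`) and the `fill` helper are all hypothetical. Slot variables are nodes, relations to the central event node are labeled edges, and filling a template binds each slot to a value extracted from one article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EventTemplate:
    """Hypothetical graph-based event template (illustrative only)."""
    event_type: str
    # slot variable -> semantic type expected to fill it (hypothetical labels)
    slots: Dict[str, str]
    # labeled edges (source, relation, target); "EVENT" denotes the event node itself
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def fill(self, extracted: Dict[str, str]) -> Dict[str, Optional[str]]:
        """Bind each slot to a value extracted from one article (None if absent)."""
        return {var: extracted.get(var) for var in self.slots}


# The template "x people dead, town y shook, at time z" as a small star-shaped graph.
earthquake = EventTemplate(
    event_type="earthquake",
    slots={"x": "people (count)", "y": "town", "z": "time"},
    edges=[("EVENT", "killed", "x"), ("EVENT", "shook", "y"), ("EVENT", "occurred_at", "z")],
)

# Hypothetical output of an extraction step on one article that omits the time.
print(earthquake.fill({"x": "12", "y": "Springfield"}))
# {'x': '12', 'y': 'Springfield', 'z': None}
```

Keeping unfilled slots as `None` rather than dropping them makes it easy to see which parts of the template a given article leaves unreported.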
For most events of at least moderate significance, there are likely tens, often hundreds or even thousands, of online articles reporting on them, each from a slightly different perspective. To understand an event in depth and from multiple perspectives, we need to aggregate multiple sources and understand the relations between them. However, current news aggregators do not offer this kind of functionality. As a step towards a solution, we propose DiversiNews, a real-time news aggregation and exploration platform whose main feature is a novel set of controls that allow users to contrast reports of a selected event based on topical emphasis, sentiment differences and/or publisher geolocation. News events are presented as a ranked list of articles pertaining to the event, together with an automatically generated summary. Both the ranking and the summary are interactive and respond in real time as the user changes the controls. We validated the concept and the user interface through user tests, with positive results.
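The abstract does not say how the controls are combined into a ranking, so the following is only a minimal sketch of one plausible scheme: a linear score over topic match, sentiment closeness and publisher-region match, re-sorted whenever the controls change. The `Article` and `Controls` classes, the weights `w_topic`, `w_sentiment`, `w_region`, and the sentiment range of [-1, 1] are all assumptions for illustration, not DiversiNews's actual model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    title: str
    topics: Dict[str, float]  # topic -> weight in this article (hypothetical)
    sentiment: float          # assumed to lie in [-1, 1]
    region: str               # publisher geolocation, coarsened to a region label


@dataclass
class Controls:
    topic_emphasis: Dict[str, float]  # user-chosen emphasis per topic
    target_sentiment: float           # user-chosen sentiment, in [-1, 1]
    region: str                       # preferred publisher region
    w_topic: float = 1.0
    w_sentiment: float = 1.0
    w_region: float = 1.0


def score(a: Article, c: Controls) -> float:
    # Dot product between the article's topics and the user's topic emphasis.
    topic = sum(c.topic_emphasis.get(t, 0.0) * w for t, w in a.topics.items())
    # 1.0 when sentiment equals the target, 0.0 at the maximum distance of 2.
    sentiment = 1.0 - abs(a.sentiment - c.target_sentiment) / 2.0
    # Binary bonus for articles from the preferred region.
    region = 1.0 if a.region == c.region else 0.0
    return c.w_topic * topic + c.w_sentiment * sentiment + c.w_region * region


def rank(articles: List[Article], c: Controls) -> List[Article]:
    """Re-rank articles for the current control settings (highest score first)."""
    return sorted(articles, key=lambda a: score(a, c), reverse=True)


a1 = Article("A", {"economy": 0.8, "casualties": 0.2}, -0.5, "Europe")
a2 = Article("B", {"economy": 0.1, "casualties": 0.9}, 0.3, "Asia")

print([a.title for a in rank([a1, a2], Controls({"economy": 1.0}, -0.5, "Europe"))])   # ['A', 'B']
print([a.title for a in rank([a1, a2], Controls({"casualties": 1.0}, 0.3, "Asia"))])   # ['B', 'A']
```

Because scoring is a cheap pass over already-processed articles, re-ranking on every control change is compatible with the real-time interaction the abstract describes; a real system would likely use learned topic and sentiment models rather than hand-set values.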
In recent years, both academia and industry have seen a push to convert unstructured data, most commonly text, into structured representations. A relatively poorly explored challenge in this area is domain template construction: for a given domain, we wish to find the attributes with which texts from that domain can be meaningfully represented. For example, given the domain of news reports on bombing attacks, we would like to identify the existence of concepts like "victim" and "perpetrator". We introduce two new methods for this task, both operating on semantic representations of the input data and exploiting the hierarchical organization of features, which prior work has not explored. We evaluate on multiple datasets/domains and achieve performance at least comparable to a state-of-the-art method on a set of "real world" scenarios, while additionally identifying fine-grained type information for properties: for example, the bombing attack victim is found to be of type "defender" (policeman, guard, ...). We also provide the first fully documented evaluation methodology, publicly available labeled datasets, and gold-standard outputs for this research problem, supporting and facilitating future work in the area.
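One simple way a feature hierarchy can yield fine-grained property types such as "defender" is to take the lowest common ancestor of the concepts observed filling a slot. The sketch below illustrates that idea only; it is not either of the two methods described above. The toy `PARENT` is-a hierarchy (loosely WordNet-style) and the helper names `ancestors` and `most_specific_type` are hypothetical, and the code assumes every filler appears in the hierarchy.

```python
from typing import Dict, List

# Toy is-a hierarchy (child -> parent); hypothetical, for illustration only.
PARENT: Dict[str, str] = {
    "policeman": "defender",
    "guard": "defender",
    "soldier": "defender",
    "defender": "person",
    "civilian": "person",
    "person": "entity",
}


def ancestors(concept: str) -> List[str]:
    """Path from a concept up to the root, including the concept itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_specific_type(fillers: List[str]) -> str:
    """Lowest common ancestor of all observed slot fillers (assumes a single-rooted tree)."""
    common = set(ancestors(fillers[0]))
    for f in fillers[1:]:
        common &= set(ancestors(f))
    # The first shared node on the upward path is the most specific common type.
    return next(c for c in ancestors(fillers[0]) if c in common)


print(most_specific_type(["policeman", "guard"]))     # defender
print(most_specific_type(["policeman", "civilian"]))  # person
```

With victims mentioned as "policeman" and "guard", the slot generalizes to "defender" rather than the much less informative "person"; adding a single civilian victim pushes the type back up the hierarchy, which is why a practical method would need some tolerance for noisy fillers.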