Geographically-grounded situational awareness (SA) is critical to crisis management and is essential in many other decision making domains that range from infectious disease monitoring, through regional planning, to political campaigning. Social media are becoming an important information input to support situational assessment (to produce awareness) in all domains. Here, we present a geovisual analytics approach to supporting SA for crisis events using one source of social media, Twitter. Specifically, we focus on leveraging explicit and implicit geographic information for tweets, on developing place-time-theme indexing schemes that support overview+detail methods and that scale analytical capabilities to relatively large tweet volumes, and on providing visual interface methods to enable understanding of place, time, and theme components of evolving situations. Our approach is user-centered, using scenario-based design methods that include formal scenarios to guide design and validate implementation as well as a systematic claims analysis to justify design choices and provide a framework for future testing. The work is informed by a structured survey of practitioners and the end product of Phase-I development is demonstrated / validated through implementation in SensePlace2, a map-based, web application initially focused on tweets but extensible to other media.
In this article we present GeoTxt, a scalable geoparsing system for the recognition and geolocation of place names in unstructured text. GeoTxt offers six named entity recognition (NER) algorithms for place name recognition, and utilizes an enterprise search engine for the indexing, ranking, and retrieval of toponyms, enabling scalable geoparsing for streaming text. GeoTxt offers a flexible application programming interface (API), allowing for customized attribute and/or spatial ranking of retrieved toponyms. We evaluate the system on a corpus of manually geo-annotated tweets.First, we benchmark the performance of the six NERs that GeoTxt provides access to. Second, we assess GeoTxt toponym resolution accuracy incrementally, demonstrating improvements in toponym resolution achieved (or not achieved) by adding specific heuristics and disambiguation methods. Compared to using the GeoNames web service, GeoTxt's toponym resolution demonstrates a 20% accuracy gain. Our results show that places mentioned in the same tweet do not tend to be geographically proximate.
The Solr search algorithm is customizable and uses the following factors: tf (term frequency) -The more times a search term appears in a document, the higher the score. idf (inverse document frequency) -Matches on rarer terms count more than matches on common terms. coordination factor -If there are multiple terms in a query, the more terms that match, the higher the score. lengthNorm -Matches on a smaller field score higher than matches on a larger field. Larger documents have a higher probability of containing a search term simply due to size. If a document is short and contains the search term, it is scored as more relevant. index-time boost -A boost is a prioritization of certain values in a field set by the Solr user. If a boost was specified for a document at index-time, scores for searches that match that document will be boosted. The advantage of boosting during the indexing process is that the boost is already calculated and stored in the index and means a faster query. The negative of boosting at index-time is that the boost cannot be changed (Shahi, 2015).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.