User interaction in social networks, such as Twitter and Facebook, is increasingly becoming a source of useful information on daily events. The online monitoring of short messages posted in such networks often provides insight on the repercussions of events of several different natures, such as (in the recent past) the earthquake and tsunami in Japan, the royal wedding in Britain and the death of Osama bin Laden. Studying the origins and the propagation of messages regarding such topics helps social scientists in their quest for improving the current understanding of human relationships and interactions. However, the actual location associated to a tweet or to a Facebook message can be rather uncertain. Some tweets are posted with an automatically determined location (from an IP address), or with a user‐informed location, both in text form, usually the name of a city. We observe that most Twitter users opt not to publish their location, and many do so in a cryptic way, mentioning non‐existing places or providing less specific place names (such as “Brazil”). In this article, we focus on the problem of enriching the location of tweets using alternative data, particularly the social relationships between Twitter users. Our strategy involves recursively expanding the network of locatable users using following‐follower relationships. Verification is achieved using cross‐validation techniques, in which the location of a fraction of the users with known locations is used to determine the location of the others, thus allowing us to compare the actual location to the inferred one and verify the quality of the estimation. With an estimate of the precision of the method, it can then be applied to locationless tweets. Our intention is to infer the location of as many users as possible, in order to increase the number of tweets that can be used in spatial analyses of social phenomena. The article demonstrates the feasibility of our approach using a dataset comprising tweets that mention keywords related to dengue fever, increasing by 45% the number of locatable tweets.
Addresses are the most common georeferencing resource people use to communicate to others a location within a city. Urban GIS applications that receive data directly from citizens, or from legacy information systems, need to be able to quickly and efficiently obtain a spatial location from addresses. In this paper we understand addresses in a broader perspective, in which not only the conventional elements of postal addresses are considered, but other kinds of direct or indirect references to places, such as building names, postal codes, or telephone area codes, which are also valuable as locators to urban places. This broader view on addresses allows us to work with two perspectives. First, in the ontological definition, modeling, and implementation of an addressing database that is flexible enough to accommodate the variety of concepts and address formats used worldwide, along with direct and indirect references to places. Second, in the definition of an indicator that is able to quantify the degree of certainty that could be reached when a user-given, semi-structured address is geocoded into a spatial position, as a function of the type and completeness of the available addressing data and of the geocoding method that has been employed. This indicator, which we call Geocoding Certainty Indicator (GCI), can be used as a threshold, beyond which the geocoded event should be left out of any statistical analysis, or as a weight that allows spatial analysis methods to reduce the influence of events that have been less reliably located. In order to support geocoding activities and the determination of the GCI, we propose a conceptual schema for addressing databases. The schema is flexible enough to accommodate a variety of addressing systems, at various levels of detail, and in different countries. Our intention is to depart from the usual geocoding strategy employed in commercial GIS products, which is usually limited to the average American or British address format. The schema also extends the notion of
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.