Local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers because of their vernacular nature, their relative insignificance to a gazetteer covering a large area (e.g., the entire world), their recent establishment (e.g., the name of a newly opened shopping center), or other reasons. Although often unrecorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We focus on advertisements posted on locally oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted candidates and performs multi-scale geospatial clustering to filter out non-place names. We evaluate our framework by comparing its performance with that of six baselines. We also compare our results with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.
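The geospatial filtering stage can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each candidate name from the NLP stage arrives with the (lat, lon) coordinates of the advertisements that mention it, and it substitutes a simple neighborhood-density test, repeated at several radii, for the paper's clustering method. The function names, the radii in `scales_km`, and the thresholds `min_share` and `min_points` are all hypothetical. The intuition is that a genuine place name, such as a neighborhood, should concentrate in one area at some scale, whereas a generic word such as "spacious" appears throughout the region.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_cluster_share(points, radius_km):
    """Largest fraction of points lying within radius_km of any single point
    (each point counts itself)."""
    best = 0
    for p in points:
        count = sum(1 for q in points if haversine_km(p, q) <= radius_km)
        best = max(best, count)
    return best / len(points)


def is_place_name(points, scales_km=(0.5, 1.0, 2.0, 5.0), min_share=0.6, min_points=3):
    """Hypothetical rule: keep a candidate if, at ANY of the tested scales,
    at least min_share of its advertisements fall into one neighborhood."""
    if len(points) < min_points:
        return False  # too few geotagged mentions to judge
    return any(max_cluster_share(points, r) >= min_share for r in scales_km)


def filter_candidates(candidates, **kwargs):
    """candidates: dict mapping candidate name -> list of (lat, lon) tuples."""
    return [name for name, pts in candidates.items() if is_place_name(pts, **kwargs)]
```

Testing several radii lets the same rule accept both small places (a block or a building) and larger ones (a whole district). The pairwise loop is quadratic in the number of mentions, so a larger deployment would likely need a spatial index or an off-the-shelf density-based clustering algorithm instead.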
To a large degree, the attraction of Big Data lies in the variety of its heterogeneous, multi-thematic, and multidimensional data sources and not merely in its volume. Fully exploiting this variety, however, requires conflation. This is a two-step process. First, one has to establish identity relations between information entities across the different data sources; second, attribute values have to be merged according to procedures that avoid logical contradictions. The first step, also called matching, can be thought of as a weighted combination of common attributes according to some similarity measures. In this work, we propose such a matching based on multiple attributes of Points of Interest (POI) from the Location-based Social Network Foursquare and the Yelp local directory service. While both contain overlapping attributes that can be used for matching, they have specific strengths and weaknesses, which make their conflation desirable. For instance, Foursquare offers information about user check-ins to places, while Yelp specializes in user-contributed reviews. We present a weighted multi-attribute matching strategy, evaluate its performance, and discuss application areas that benefit from a successful matching. Finally, we also outline how the established POI matches can be stored as Linked Data on the Semantic Web. Our strategy can automatically match 97% of randomly selected Yelp POI to their corresponding Foursquare entities.
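A matching step that combines weighted attribute similarities can be sketched as follows. This is a hypothetical illustration, not the authors' strategy. The three attributes (name, location, category), the weights in `WEIGHTS`, the 200 m distance cutoff, and the 0.6 acceptance threshold are assumptions chosen for readability. The sketch also assumes that Yelp and Foursquare categories have already been mapped to a shared vocabulary. Name similarity uses the standard library's `difflib.SequenceMatcher`.

```python
import math
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class POI:
    id: str
    name: str
    lat: float
    lon: float
    # Assumption: categories already mapped to a vocabulary shared by both sources.
    categories: set = field(default_factory=set)


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_similarity(a, b, max_m=200.0):
    """1 at zero distance, falling linearly to 0 at max_m metres (hypothetical cutoff).
    Uses an equirectangular approximation, adequate at the ~100 m scale of POI matching."""
    mean_lat = math.radians((a.lat + b.lat) / 2)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat)
    dy = math.radians(b.lat - a.lat)
    d = 6371000.0 * math.hypot(dx, dy)
    return max(0.0, 1.0 - d / max_m)


def category_similarity(a, b):
    """Jaccard overlap of category sets; 0 if either set is empty."""
    if not a.categories or not b.categories:
        return 0.0
    return len(a.categories & b.categories) / len(a.categories | b.categories)


# Hypothetical weights; in practice they would be tuned on labelled matches.
WEIGHTS = {"name": 0.5, "distance": 0.3, "category": 0.2}


def match_score(a, b, weights=WEIGHTS):
    """Weighted average of per-attribute similarities, normalized to [0, 1]."""
    sims = {
        "name": name_similarity(a.name, b.name),
        "distance": distance_similarity(a, b),
        "category": category_similarity(a, b),
    }
    return sum(weights[k] * sims[k] for k in weights) / sum(weights.values())


def best_match(yelp_poi, foursquare_pois, threshold=0.6):
    """Return the highest-scoring Foursquare POI, or None if no score reaches threshold."""
    scored = [(match_score(yelp_poi, f), f) for f in foursquare_pois]
    if not scored:
        return None
    score, best = max(scored, key=lambda t: t[0])
    return best if score >= threshold else None
```

Normalizing by the sum of the weights keeps every score in [0, 1], so a single threshold decides whether the best-scoring Foursquare candidate is accepted or the Yelp POI is left unmatched.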
As location-enabled technologies become ubiquitous, our location is being shared with an ever-growing number of external services. Issues revolving around location privacy, or geoprivacy, therefore concern the vast majority of the population, most of whom do not know how the underlying technologies work or what can be inferred from an individual's location, especially if it is recorded over longer periods of time. Research, on the other hand, has largely treated this topic from isolated standpoints, most prominently from the technological and ethical points of view. This article therefore reflects upon the current state of geoprivacy from a broader perspective. It integrates technological, ethical, legal, and educational aspects and clarifies how they interact and shape how we deal with the corresponding technology, both individually and as a society. It does so in the form of a manifesto consisting of 21 theses that summarise the main arguments made in the article. These theses argue that location information is different from other kinds of personal information and, in combination, show why geoprivacy (and privacy in general) needs to be protected and should not become a mere illusion. The fictional couple of Jane and Tom is used as a running example to illustrate how common it has become to share our location information, and how it can be used both for good and for worse.