Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information 2013
DOI: 10.1145/2534732.2534736

Automatic gazetteer enrichment with user-geocoded data

Abstract: Geographical knowledge resources or gazetteers that are enriched with local information have the potential to add geographic precision to information retrieval. We have identified sources of novel local gazetteer entries in crowd-sourced OpenStreetMap and Wikimapia geotags that include geo-coordinates. We created a fuzzy match algorithm using machine learning (SVM) that checks both for approximate spelling and approximate geocoding in order to find duplicates between the crowd-sourced tags and the gazetteer in…
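The abstract describes a matcher that judges a crowd-sourced tag and a gazetteer entry on both approximate spelling and approximate geocoding before an SVM labels the pair as duplicate or not. Below is a minimal sketch of two pair features such a classifier could consume. The use of `difflib.SequenceMatcher` ratio for spelling, haversine great-circle distance on a spherical Earth of radius 6371 km for location, and all function names are assumptions for illustration; the paper's actual features may differ, and the SVM itself is not shown.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def name_similarity(a, b):
    """Case-insensitive spelling similarity in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pair_features(tag_name, tag_lat, tag_lon, gaz_name, gaz_lat, gaz_lon):
    """Hypothetical feature vector for one (crowd-sourced tag, gazetteer entry) pair."""
    return (name_similarity(tag_name, gaz_name),
            haversine_km(tag_lat, tag_lon, gaz_lat, gaz_lon))
```

Feature vectors like these, computed for pairs labelled as duplicates or non-duplicates, would form the classifier's training set; a pair with high spelling similarity and a small distance is what a duplicate would typically look like.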

Cited by 20 publications
(13 citation statements)
References 29 publications
“…The unsupervised method surpasses the supervised method when the overlap ratio is less than 60% (when the overlap ratio is at 0.6, CHF still outperforms CustomAdaptive with a 1% margin). This observation confirms that the unsupervised technique, namely CHF, can handle unknown data better than the supervised method, namely Adaptive (CustomAdaptive implementation).…”
Footnote 15: Among the datasets used in the TopoCluster paper [8], LGL is the only dataset to which we have access.
Footnote 16: Mean error distance for TopoCluster in LGL is derived from the original paper [8].
Section: Unseen Data Analysis (supporting)
confidence: 63%
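The comparison above reports mean error distance, a common toponym-resolution metric that averages the distance between each predicted location and its gold-standard location. A minimal sketch, assuming great-circle distances on a spherical Earth of radius 6371 km; the citing paper's exact computation is not given here and may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth

def great_circle_km(p, q):
    """Haversine distance in km between (lat, lon) pairs given in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def mean_error_distance(predicted, gold):
    """Average great-circle error over aligned lists of predicted and gold coordinates."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(great_circle_km(p, g) for p, g in zip(predicted, gold)) / len(gold)
```

For example, one exact prediction and one prediction off by one degree of latitude average to roughly 55.6 km.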
“…Provided that it co-occurred with either Alberta or Canada, we can pinpoint it (i.e., the city of Edmonton located in Canada). For each toponym $t_i$, the preliminary disambiguation measures a score for each interpretation $l_{i,j}$ (lines 8-13) and picks the interpretation with the maximum score (lines 14-15); in case of a tie, the most populous interpretation is selected (lines 16-18). The score is acquired by finding the maximum similarity between the mentions of $l_{i,j}$ and the mentions of its ancestors; similarity here is the inverse of term distance (line 11), as used by Yu and Rafiei [37].…”
Section: Algorithm 1: Preliminary Toponym Disambiguation in CBH (mentioning)
confidence: 99%
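The quoted preliminary disambiguation step can be sketched as follows. This is an illustrative reading of the description, not the citing paper's code: the `Interpretation` record, whitespace tokenisation, single-token place names, and the population values are all assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Interpretation:
    label: str              # hypothetical display label, e.g. "Edmonton, Canada"
    ancestors: List[str]    # names of containing regions (assumed single tokens)
    population: int

def mention_positions(tokens: List[str], term: str) -> List[int]:
    return [i for i, tok in enumerate(tokens) if tok == term]

def score(tokens: List[str], toponym: str, interp: Interpretation) -> float:
    """Maximum inverse term distance between a toponym mention and any ancestor mention."""
    best = 0.0
    for p in mention_positions(tokens, toponym):
        for ancestor in interp.ancestors:
            for q in mention_positions(tokens, ancestor):
                if q != p:
                    best = max(best, 1.0 / abs(p - q))
    return best

def disambiguate(tokens: List[str], toponym: str,
                 candidates: List[Interpretation]) -> Interpretation:
    """Highest score wins; ties go to the most populous interpretation."""
    return max(candidates, key=lambda c: (score(tokens, toponym, c), c.population))

tokens = "Edmonton is the capital of Alberta".split()
candidates = [
    Interpretation("Edmonton, Canada", ["Alberta", "Canada"], population=100),  # illustrative
    Interpretation("Edmonton, England", ["London", "England"], population=10),  # illustrative
]
print(disambiguate(tokens, "Edmonton", candidates).label)  # Edmonton, Canada
```

Using a `(score, population)` tuple as the `max` key applies the tie-break in a single pass: population only matters when two scores are equal.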
“…While such a platform is useful, it can be challenging to constantly encourage people to contribute, especially over a long time period. In another study, Gelernter et al. (2013) proposed a matching algorithm which can compare the tags in OpenStreetMap and Wikimapia with the place entries in a gazetteer, and can add the place information that is not contained in the gazetteer. Our work aligns with the general direction of these two studies, but utilizes geotagged housing advertisements posted on local-oriented websites for harvesting local place names.…”
Section: Related Work (mentioning)
confidence: 99%
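The match-then-add workflow described in this statement reduces to a simple loop: keep every crowd-sourced tag that duplicates no gazetteer entry. A minimal sketch, in which the `Place` record and the toy `same_place` predicate are hypothetical stand-ins; in the described approach the duplicate decision would come from the trained fuzzy matcher instead.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Place:
    name: str
    lat: float
    lon: float

def enrich(gazetteer: List[Place], tags: List[Place],
           is_duplicate: Callable[[Place, Place], bool]) -> List[Place]:
    """Return the gazetteer plus every tag that matches no entry already present."""
    enriched = list(gazetteer)
    for tag in tags:
        # Compare against `enriched`, so tags added earlier also suppress repeats.
        if not any(is_duplicate(tag, entry) for entry in enriched):
            enriched.append(tag)
    return enriched

def same_place(a: Place, b: Place) -> bool:
    """Toy rule (assumption): same name ignoring case, within ~0.01 degrees."""
    return (a.name.lower() == b.name.lower()
            and abs(a.lat - b.lat) + abs(a.lon - b.lon) < 0.01)

gazetteer = [Place("Edmonton", 53.5, -113.5)]
tags = [Place("edmonton", 53.501, -113.5),       # near-duplicate of an entry
        Place("Riverside Cafe", 53.52, -113.5),  # hypothetical local place
        Place("Riverside Cafe", 53.52, -113.5)]  # repeated tag
print([p.name for p in enrich(gazetteer, tags, same_place)])  # ['Edmonton', 'Riverside Cafe']
```

Passing the duplicate test in as a function keeps the enrichment loop independent of how matches are decided, so a learned classifier can replace the toy rule without changing it.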
“…A common intuition is that users often mention places that are near their current location. Several approaches have been presented to automatically geolocate non-geotagged textual clips using textual content [5,7,11,23,29]. Most of these methods rely on a training phase, during which they construct language models, in order to probabilistically infer the location of unseen messages.…”
Section: Related Work (mentioning)
confidence: 99%
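The train-then-infer pattern described here can be illustrated with the simplest such model: one unigram language model per region with add-one smoothing, choosing the region with the highest posterior (multinomial naive Bayes). This is a generic sketch, not the method of any of the cited works [5,7,11,23,29]; the region names and training texts are toy assumptions.

```python
import math
from collections import Counter

def train(docs_by_region):
    """docs_by_region: region -> list of texts. Builds per-region unigram counts and priors."""
    counts = {r: Counter(tok for d in docs for tok in d.lower().split())
              for r, docs in docs_by_region.items()}
    n_docs = sum(len(docs) for docs in docs_by_region.values())
    priors = {r: len(docs) / n_docs for r, docs in docs_by_region.items()}
    vocab = set().union(*counts.values())
    return counts, priors, vocab

def geolocate(text, model):
    """Return the region with the highest log-posterior for `text`."""
    counts, priors, vocab = model
    best_region, best_lp = None, -math.inf
    for region, c in counts.items():
        denom = sum(c.values()) + len(vocab)  # add-one (Laplace) smoothing
        lp = math.log(priors[region])
        for tok in text.lower().split():
            lp += math.log((c[tok] + 1) / denom)
        if lp > best_lp:
            best_region, best_lp = region, lp
    return best_region

model = train({
    "edmonton": ["oilers game downtown", "river valley walk"],
    "toronto": ["leafs game downtown", "cn tower view"],
})
print(geolocate("oilers by the river", model))  # edmonton
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.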