2020
DOI: 10.1016/j.ipm.2020.102312

A Google Trends spatial clustering approach for a worldwide Twitter user geolocation

Abstract: User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating a worldwide city-level Twitter user location considering only historical tweets. We propose a purely unsupervised approach (no location data is used) that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically by using a recently collected dataset, with 3,268 worldw…

Cited by 29 publications (11 citation statements)
References: 32 publications
“…As for practical implications, we recommend performing the spatial characterization as per city basis, this is mostly due to how the density based clusters vary according to the observed region and the computational power available. Moreover, in similar cases to Twitter, where none or only a small portion of messages carry the location context, incorporating other solutions to predict location context from messages lacking this information will significantly increase the observation surface [18]. Furthermore, practitioners might want to tune or extend upon our classification results, particularly for the classes with lower F1-scores.…”
Section: A Research Discussion (mentioning)
confidence: 99%
“…We considered this to be of interest because marketing campaigns can be costly and may suffer from low user response compared to the original investment. Moreover, there might be other reasons to estimate the optimal location for a particular activity [17], such as opening a new store [6], [7], or estimating the interest for something in particular locations to validate assumptions [18]. Our work focuses on how to characterize geographic areas in relation to a selected set of product categories.…”
Section: Research Objective and Contribution Overview (mentioning)
confidence: 99%
“…The most used algorithms are distance-based solutions such as K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), or Ordering Points To Identify the Clustering Structure (OPTICS), and probabilistic ones like Probabilistic Latent Semantic Indexing (PLSI) [20]. The work of Zola et al [40] is a good example of how some of these clustering techniques are currently used in the spatial data context to identify patterns in text collections. They estimate Twitter user location based on their tweets using Google Trends frequencies of tweet nouns and clustering to identify the most probable location.…”
Section: Related Work (mentioning)
confidence: 99%
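
The clustering idea described in the statement above can be illustrated with a minimal sketch. This is not the method of Zola et al.; it only assumes that some earlier step has already produced candidate (latitude, longitude) points, for instance cities where a user's tweet nouns are frequently searched. The sample coordinates, the `eps_km` and `min_pts` values, and the function names are all hypothetical. The sketch runs a small pure-Python DBSCAN with great-circle distance and takes the centroid of the largest cluster as the location estimate.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual definition.
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def estimate_location(points, eps_km=100.0, min_pts=3):
    """Centroid of the largest cluster, or None if every point is noise.

    Averaging raw lat/lon is a rough approximation that is only reasonable
    for compact clusters that do not straddle the antimeridian or a pole.
    """
    labels = dbscan(points, eps_km, min_pts)
    sizes = Counter(label for label in labels if label != -1)
    if not sizes:
        return None
    best = sizes.most_common(1)[0][0]
    members = [p for p, label in zip(points, labels) if label == best]
    lat = sum(p[0] for p in members) / len(members)
    lon = sum(p[1] for p in members) / len(members)
    return lat, lon


# Hypothetical candidate points: four near Lisbon, two near Tokyo, one in Sao Paulo.
candidates = [
    (38.72, -9.14), (38.76, -9.20), (38.70, -9.10), (38.80, -9.05),
    (35.68, 139.69), (35.70, 139.70),
    (-23.55, -46.63),
]
estimate = estimate_location(candidates)
```

With these sample values only the Lisbon-area points form a dense enough group, so the estimate falls there; the Tokyo pair and the isolated point are treated as noise. A density-based method fits this kind of input because the number of clusters does not need to be fixed in advance, unlike K-means.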
“…Several studies estimate reliability, based on the tweet's location, at above 90% in the United Kingdom or the United States, 85.8% in Spain or 83.95% in the Philippines, with a worldwide average of 77.84% and 88.15% in Europe (van der Veen et al., 2015), or a mean location error of 256 kilometers (Holbrook et al., 2016). These percentages can be improved by using various complementary techniques, on which there is a varied literature, that were not applied in this work (Zola, Ragno and Cortez, 2020).…”
Section: Methodology (unclassified)