The Use of Exhaustive Micro-Data Firm Databases for Economic Geography: The Issues of Geocoding and Usability in the Case of the Amadeus Database

Lennert, Moritz

doi:10.3390/ijgi4010062

Cited by 3 publications

(3 citation statements)

References 27 publications

(25 reference statements)


“…In these studies, enterprise or firm data is widely used, including aggregated data, and micro enterprise data. As distinct from aggregated data, micro enterprise data allows users to analyze information at varying spatial levels or partitions, and provides much more fine-grained individual information, offering the potential for theoretical innovation in economic geography and regional studies that are invisible in aggregated data sets (Domenech, Lazzeretti, Molina, & Ruiz, 2011; Lennert, 2011).…”

Section: Industrial Spatial Distribution Analysis (mentioning)

confidence: 99%

“…Different from numerical data imputation, text data imputation can harness NLP for semantic analysis. For text data imputation, to label a text with predefined categories, text classification is required; to extract unambiguous location information and even accurate coordinates from georeferenced text data, location estimation and geocoding are often needed (Chen, David, & Yang, 2013; Lennert, 2011). For example, there is a need to classify, estimate and geocode text location for social media data (Barapatre, Meena, & Ibrahim, 2016; Ghahremanlou, Sherchan, & Thom, 2015; Krumm & Horvitz, 2015).…”

Section: Data Imputation In Big Data Era (mentioning)

confidence: 99%


Big enterprise registration data imputation: Supporting spatiotemporal analysis of industries in China

Gui et al. 2018

Computers, Environment and Urban Systems


Abstract: Big, fine-grained enterprise registration data that includes time and location information enables us to quantitatively analyze, visualize, and understand the patterns of industries at multiple scales across time and space. However, data quality issues like incompleteness and ambiguity hinder such analysis and application. These issues become more challenging when the volume of data is immense and constantly growing. High Performance Computing (HPC) frameworks can tackle big data computational issues, but few studies have systematically investigated imputation methods for enterprise registration data in this type of computing environment. In this paper, we propose a big data imputation workflow based on Apache Spark as well as a bare-metal computing cluster, to impute enterprise registration data. We integrated external data sources, employed Natural Language Processing (NLP), and compared several machine-learning methods to address incompleteness and ambiguity problems found in enterprise registration data. Experimental results illustrate the feasibility, efficiency, and scalability of the proposed HPC-based imputation framework, which also provides a reference for other big georeferenced text data processing. Using these imputation results, we visualize and briefly discuss the spatiotemporal distribution of industries in China, demonstrating the potential applications of such data when quality issues are resolved.

[Figure: address-matching flowchart. Recoverable step labels: Chinese word segmentation and address noun selection; SQL fuzzy matching; number of matched address records counting; output.]


“…However, traditional indexes are unable to accurately reflect agglomeration degrees, such as locational entropy, the Theil index, spatial Gini coefficient, Herfindahl index, and EG index [13,15,27-31]. This is mainly because these indexes were primarily designed for a fixed spatial scale [17], which makes them inevitably influenced by the zoning scheme of the administrative unit, i.e., the existence of the Modifiable Areal Unit Problem (MAUP) [22,32,33].…”

Section: Introduction (mentioning)

confidence: 99%

An Integrated Duranton and Overman Index and Local Duranton and Overman Index Framework for Industrial Spatial Agglomeration Pattern Analysis

Huang, Zhuo, Cao 2024

IJGI


Accurately measuring industrial spatial agglomeration patterns is crucial for promoting regional economic development. However, few studies have considered both agglomeration degrees and cluster locations of industries. Moreover, the traditional multi-scale cluster location mining (MCLM) method still has limitations in terms of accuracy, parameter setting, calculation efficiency, etc. This study proposes a new framework for analyzing industrial spatial agglomeration patterns, which uses the Duranton and Overman (DO) index for estimating agglomeration degrees and a newly developed local DO (LDO) index for mining cluster locations. The MCLM-LDO method was proposed by incorporating the LDO index into the MCLM method, and it was validated via comparisons with three baseline methods based on two synthetic datasets. The results proved that the MCLM-LDO method can achieve accuracies of 0.945 and 1 with computational times of 0.15 s and 0.11 s on two datasets, which are superior to existing MCLM methods. The proposed framework was further applied to analyze the spatial agglomeration patterns of the industry of computer, communication, and other electronic equipment manufacturing in Guangdong Province, China. The results showed that the framework gives a more holistic perspective of spatial agglomeration patterns, which can serve as more meaningful references for industrial sustainable development.


Contrasting patterns and dynamics of patent offshoring in European regions

et al. 2022
