2018
DOI: 10.1553/giscience2018_01_s82

Generating Big Spatial Data on Firm Innovation Activity from Text-Mined Firm Websites

Abstract: Innovation is one of the major drivers of economic growth, where spatial processes of knowledge spillover play a vital role. Current practices in assessing firms' innovation activity, including patent analysis and questionnaires, suffer from severe limitations. In this paper, we propose a novel approach to estimate firms' innovation activity based on the texts on their websites. We use an automated web-scraper to harvest text from the websites, then extract semantic topics in a self-learning, generative topic-…

Cited by 8 publications (6 citation statements)
References 19 publications
“…For example, the information in company websites is self-reported. Moreover, it is not standardized, and thus needs accurate processing for appropriate analysis (Arora et al., 2016; Kinne and Resch, 2018). Nevertheless, there is an increasing number of applications to facilitate the use of data collected from the Web for economics and tourism analysis.…”
Section: Methods (mentioning)
confidence: 99%
“…For the extraction of the diabetes apps' metadata, we first devised the architecture [28] and subsequently developed the corresponding software module for the automatic extraction of mobile app metadata using the web-based API of 42Matters. The output of this module is a data set stored locally in a comma-separated values (CSV) file.…”
Section: Methods (mentioning)
confidence: 99%
“…The main difference between the two studies is the way the text on websites has been prepared, the type of model used, and the innovation indicator considered. Lenz (2019, 2021) use a web scraping approach described in Axenbeck (2018, 2020) and Kinne and Resch (2018) and developed a deep neural network for analysing website content. The text analysis rests on a dictionary of all words that occurred on websites as long as the document frequency is between 1.5% and 65% (i.e.…”
Section: Large-scale Web Scraping (mentioning)
confidence: 99%
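
The document-frequency rule mentioned in the last statement (keep a word only if it occurs on between 1.5% and 65% of websites) can be illustrated with a minimal sketch. This is not the cited authors' code: the function name `build_vocabulary`, the whitespace tokenisation, the inclusive bounds, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def build_vocabulary(documents, min_df=0.015, max_df=0.65):
    """Return the sorted words whose document frequency lies in [min_df, max_df].

    Document frequency is the share of documents containing the word at
    least once. Treating both bounds as inclusive is an assumption.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return []

    doc_freq = Counter()
    for text in documents:
        # Count each word once per document, however often it repeats.
        # Lowercasing and whitespace splitting are simplifying assumptions.
        doc_freq.update(set(text.lower().split()))

    return sorted(
        word
        for word, count in doc_freq.items()
        if min_df <= count / n_docs <= max_df
    )


# Hypothetical toy corpus: one string per scraped website.
websites = [
    "we build innovative sensors",
    "we sell shoes",
    "we build software",
    "we repair bikes",
]

# With only four documents the lower bound is raised so the filter is visible.
vocab = build_vocabulary(websites, min_df=0.3, max_df=0.65)
print(vocab)
```

On a corpus this small the default 1.5% lower bound removes nothing, which is why the usage example raises it; on a large corpus of websites the same bounds would drop both very rare words and near-universal ones.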