2018
DOI: 10.2139/ssrn.3240470

Web Mining of Firm Websites: A Framework for Web Scraping and a Pilot Study for Germany

Abstract: Standard terms of use: Documents on EconStor may be saved and copied for your own scholarly purposes and for private use. You may not reproduce the documents for public or commercial purposes, exhibit them publicly, make them publicly available, distribute them, or otherwise use them. Where the authors have made the documents available under open-content licenses (in particular CC licenses), then, in deviation from these terms of use, the…

Cited by 19 publications (30 citation statements)
References 46 publications
“…Reaching 90% would require raising this threshold to 250, highlighting that web-based studies have to deal with outlier websites, as some firms (especially large ones) have massive websites with ten-thousands of webpages. (4) found that the amount of text per webpage is not statistically related to a firm's age, size, and sector. Following the suggested practice by (4), we excluded websites which redirect to a different domain when requesting their first webpage (i.e.…”
Section: Mip Innovation Survey Data the Mannheim Innovation
Citation type: mentioning
confidence: 90%
“…The resulting dataset contains 2.52 million firms and 1.15 million URLs (URL coverage of 46%). A prior analysis of this dataset by (4) showed that URL coverage differs systematically with firm characteristics. Only a fraction of very young (younger than two year) and very small firms (fewer than five employees) are covered by an URL after controlling for the search quality of the data provider.…”
Section: Data
Citation type: mentioning
confidence: 95%
“…These innovation indicators suffer, however, from some major drawbacks (i.e. Axenbeck & Kinne 2018, Pukelis & Stanciauskas 2019. The MIP, for example, surveys around 18,000 firms every year.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Data of 4,485 German firms from the Mannheim Innovation Panel (MIP) 2019 is used. We extract their website's text and hyperlink structure by applying the ARGUS web-scraper (Kinne 2018). Several methods including topic modelling and natural language processing tools are applied to generate features that potentially relate to the firm-level innovation status.…”
Section: Introduction
Citation type: mentioning
confidence: 99%