2015
DOI: 10.1002/asi.23596

Map of science with topic modeling: Comparison of unsupervised learning and human‐assigned subject classification

Abstract: The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-à-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification f…

Citations: cited by 124 publications (80 citation statements)
References: 45 publications
“…For example, several topics identified different types of triggers for returning to Facebook, but no single topic aligned with the higher‐level theme of triggers. In applying LDA (Blei et al, ) to Finnish scientific papers, Suominen and Toivanen () similarly found that, in most cases, the topic model included finer distinctions than existing classification schemes. For instance, the Organization for Economic Cooperation and Development classification for medical research includes 10 subfields, but documents in that classification came from over 26 different topics (Suominen & Toivanen, , p. 7).…”
Section: Results (mentioning)
confidence: 99%
“…For instance, topic modeling has been used to uncover the temporal evolution of topics in documents (Song et al, ), to compare between the popularity of scientific themes in open and closed access publications (Hu et al, ), and to detect communities of taggers in social recommendation systems (Li et al, ). Some preliminary work has compared machine classification with existing categorization schemes (Suominen & Toivanen, ).…”
Section: Related Work (mentioning)
confidence: 99%
“…However, compared to citations, the relationships derived from semantic structures are not as direct and clear as citation links. In recent years, the latent Dirichlet allocation approach (Blei, Ng, & Jordan, 2003) has played an active role in science maps (Suominen & Toivanen, 2015; Yau, Porter, Newman, & Suominen, 2014), as well as ontology and semantic webs (Wang, Lu, & Zhang, 2007).…”
Section: Co-word Analysis (mentioning)
confidence: 99%
“…With that in mind, we resorted to matching the most prominent tokens with EuroVoc 4 terms, allowing us to base our work on a solid, well-documented foundation on the one hand, and to harness the power of a hierarchical, multilingual thesaurus on the other. Despite the fact that LDA provides a distribution of topics for each document, we resorted, in this case study, to hard-partitioning (Suominen and Toivanen, 2015): as this is a proof-of-concept approach to a semi-automatic classification of historical archives, soft-partitioning – associating several topics to a document – would have proven too time-consuming for too low a gain. Indeed, it can be argued that if topic models do detect the main topic of a document, as it is the case in our study, subsequent less-important topics can be assumed to be correct.…”
Section: Methods (mentioning)
confidence: 99%
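
The distinction this statement draws between hard- and soft-partitioning can be illustrated with a minimal sketch. It assumes a document-topic matrix of the kind LDA produces, where each row is one document's topic distribution. The matrix values, the function names, and the 0.2 threshold are all hypothetical, chosen for illustration only, and are not taken from the cited papers.

```python
# Hypothetical document-topic matrix: one row per document, one column per topic.
# Each row sums to 1, as an LDA document-topic distribution would.
theta = [
    [0.70, 0.25, 0.05],
    [0.05, 0.15, 0.80],
    [0.40, 0.45, 0.15],
]


def hard_partition(doc_topic):
    """Assign each document to its single most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def soft_partition(doc_topic, threshold=0.2):
    """Keep, for each document, every topic whose weight reaches the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [[k for k, w in enumerate(row) if w >= threshold] for row in doc_topic]


hard = hard_partition(theta)
soft = soft_partition(theta)
print(hard)  # [0, 2, 1]
print(soft)  # [[0, 1], [2], [0, 1]]
```

Under hard-partitioning the third document lands in topic 1 alone, although topic 0 carries nearly as much weight. That is the information soft-partitioning would retain, at the cost of reviewing several labels per document.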
“…This approach, along with other text-mining routines, has gained momentum for document classification, as pointed out by Suominen and Toivanen (2015) in the specific field of bibliometrics or by Newman et al (2010) in a library context. Similarily, Roe et al (2016) uses LDA in order to draw a map of all human knowledge – as seen by d'Alembert and Diderot – contained in the French Encyclopédie.…”
Section: Introduction (mentioning)
confidence: 99%