“…This process reduces researcher bias because foreknowledge of document content does not affect the topic classifications (Zhang et al., 2021). The LDA topic model is widely used in patent content analysis (Wang et al., 2015; Zhang et al., 2021) and technology topics evaluation (Li et al., 2021; Wang et al., 2020; Savin et al., 2022a, 2022b). In order to apply LDA to the STO white papers, we first pre-processed the corpus by 1) converting words to lowercase, 2) removing standard English stop words and punctuation, and 3) lemmatizing all the words by means of the Natural Language Toolkit lemmatiser. We then analysed the distribution of terms with domain experts and filtered out generic terms that appeared in more than 60% of the white papers (Zhang et al., 2021).…”
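The pre-processing steps described in the excerpt can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' actual pipeline. The names `preprocess`, `filter_frequent_terms`, `STOP_WORDS`, and `max_doc_share` are hypothetical, the stop-word list is a tiny placeholder, and the default lemmatizer is an identity function. In the setting described, the stop words and the lemmatizer would presumably come from NLTK (for example `nltk.corpus.stopwords.words("english")` and `nltk.stem.WordNetLemmatizer().lemmatize`), and the final filtering step was carried out together with domain experts rather than by a threshold alone.

```python
import re
from collections import Counter

# Placeholder stop-word list (assumption); the excerpt refers to a
# standard English list, which is much longer than this one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def preprocess(text, lemmatize=lambda w: w, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, drop stop words, lemmatize each token.

    `lemmatize` defaults to the identity function (assumption); pass a real
    lemmatizer, such as NLTK's WordNetLemmatizer().lemmatize, to reduce
    words to their base form.
    """
    # Keeping only runs of ASCII letters removes punctuation; as a
    # simplification it also drops digits and non-ASCII characters.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lemmatize(t) for t in tokens if t not in stop_words]


def filter_frequent_terms(docs, max_doc_share=0.6):
    """Drop terms that occur in more than `max_doc_share` of the documents."""
    # Count each term once per document (document frequency).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    limit = max_doc_share * len(docs)
    too_common = {t for t, n in doc_freq.items() if n > limit}
    filtered = [[t for t in doc if t not in too_common] for doc in docs]
    return filtered, too_common


# Toy corpus standing in for the white papers (invented for illustration).
corpus = [
    "The token is issued on the blockchain.",
    "Token holders receive dividends.",
    "A security token for real estate.",
]
docs = [preprocess(d) for d in corpus]
filtered, too_common = filter_frequent_terms(docs)
print(too_common)  # terms found in more than 60% of the toy documents
print(filtered)
```

In this toy corpus the word "token" occurs in every document, so it is the only term removed. The resulting token lists could then be passed to an LDA implementation of the reader's choice.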