“…The first corpora for automated keyphrase extraction were likewise assembled from publications in scientific fields, including technical reports (Witten et al, 1999), paper abstracts (Hulth, 2003), and scientific papers (Nguyen and Kan, 2007; Medelyan et al, 2009; Kim et al, 2010). To this day, scientific publications serve as a fundamental fixed-domain benchmark for neural KPE methods (Meng et al, 2017; Alzaidy et al, 2019; Sahrawat et al, 2019) because ample data of this kind is available. However, experiments have shown that KPE methods trained directly on such corpora do not generalize well to web-related genres or other types of documents (Chen et al, 2018; Xiong et al, 2019), where topics, content, and structure may be far more heterogeneous and where the position at which a keyphrase appears may vary more.…”