Proceedings of the KDD Cup 2013 Workshop
DOI: 10.1145/2517288.2517291

KDD Cup 2013 - author-paper identification challenge

Abstract: This paper describes our submission to the KDD Cup 2013 Track 1 Challenge: Author-Paper Identification in the Microsoft Academic Search database. Our approach is based on the Gradient Boosting Machine (GBM) of Friedman [5] and deep feature engineering. The method placed second in the final standings with a Mean Average Precision (MAP) of 0.98144, while the winning submission scored 0.98259.
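The MAP metric quoted above averages, over authors, the average precision of each author's ranked paper list. A minimal sketch of that computation (function and variable names are illustrative, not from the paper):

```python
# Hedged sketch of Mean Average Precision (MAP), the leaderboard metric
# for KDD Cup 2013 Track 1. For each author, candidate papers are ranked
# by predicted confidence; AP rewards placing confirmed papers early.

def average_precision(ranked, relevant):
    """AP of one ranked list given the set of truly confirmed papers."""
    hits, score = 0, 0.0
    for i, paper in enumerate(ranked, start=1):
        if paper in relevant:
            hits += 1
            score += hits / i  # precision at each relevant position
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(predictions, ground_truth):
    """MAP over authors; both dicts are keyed by author id."""
    aps = [average_precision(predictions[a], ground_truth[a])
           for a in ground_truth]
    return sum(aps) / len(aps)

# Toy example with two authors and hypothetical paper ids.
preds = {1: [10, 11, 12], 2: [20, 21]}
truth = {1: {10, 12}, 2: {21}}
print(round(mean_average_precision(preds, truth), 4))  # → 0.6667
```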

Cited by 8 publications (8 citation statements)
References 8 publications
“…• Sup: Triggered by KDD Cup 2013, the problem of author identification has recently garnered attention, and top solutions of the challenge heavily relied on feature engineering followed by supervised ranking models on these features [8,17]. Following them, we extract 16 features for each pair of paper and author in the training set.…”
Section: Methods (mentioning)
confidence: 99%
“…• Supervised feature-based baselines. As widely used in similar author identification/disambiguation problems [12,13,34,9,33], this line of methods first extracts features for each pair of training data and then applies a supervised learning algorithm to learn ranking/classification functions. Following them, we extract 20+ related features for each pair of paper and author in the training set (details can be found in the appendix).…”
Section: Baselines and Experimental Settings (mentioning)
confidence: 99%
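The feature-based supervised baseline described above can be sketched as follows: build a feature vector per (paper, author) pair, then fit a gradient boosted model whose scores induce the ranking that MAP evaluates. Feature semantics and the synthetic data here are illustrative assumptions, not the paper's actual feature set:

```python
# Hedged sketch of a feature-based supervised baseline: per-pair features
# plus a gradient boosted classifier. Features such as name similarity or
# shared-coauthor count are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy (paper, author) pair features, e.g. [name similarity,
# shared-coauthor count, affiliation match, venue overlap].
X = rng.random((200, 4))
# Synthetic labels: 1 = confirmed author-paper pair.
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)

model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Score new pairs; sorting each author's papers by this score yields
# the ranked list that MAP evaluates.
scores = model.predict_proba(X[:5])[:, 1]
print(scores.shape)  # → (5,)
```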
“…Unlike traditional supervised learning, dense vectorized representations [16,15] are not directly available in networked data [26]. Hence, many traditional methods under network settings rely heavily on problem-specific feature engineering [12,13,34,9,33]. Although feature engineering can incorporate prior knowledge of the problem and network structure, it is usually time-consuming, problem-specific (thus not transferable), and the extracted features may be too simple for complicated data sets [3]. Several network embedding methods [17,26,25] have been proposed to automatically learn feature representations for networked data.…”
mentioning
confidence: 99%
“…In the past few years, some works have been devoted to the paper-author pair identification problem in big scholarly data, such as the studies in [9,19] and various solutions in [5,15,35] for the 2013 KDD Cup author-paper identification challenge. Most of these works focused on feature engineering and utilized supervised learning algorithms to infer the correlation between paper and author.…”
Section: Target (mentioning)
confidence: 99%
“…To solve the author identification problem, supervised learning models have been applied to predict the correlation between paper and author, such as the ones used in the top solutions [5,15,35] of the 2013 KDD Cup author-paper pair identification challenge and the multimodal approach in [19]. However, these methods rely heavily on time-consuming and storage-intensive feature engineering, which may extract irrelevant and redundant features or miss important ones.…”
Section: Introduction (mentioning)
confidence: 99%