Proceedings of the 2013 KDD Cup 2013 Workshop, 2013
DOI: 10.1145/2517288.2517290

Combination of feature engineering and ranking models for paper-author identification in KDD Cup 2013

Abstract: The track 1 problem in KDD Cup 2013 is to discriminate papers confirmed by the given authors from the other, deleted papers. This paper describes the winning solution of team National Taiwan University for track 1 of KDD Cup 2013. First, we conduct feature engineering to transform the various provided text information into 97 features. Second, we train classification and ranking models using these features. Last, we combine our individual models to boost the performance by using results on the inter…

Cited by 15 publications (10 citation statements)
References: 12 publications

“…• Sup: Triggered by KDD Cup 2013, the problem of author identification has recently garnered attention, and top solutions of the challenge heavily relied on feature engineering followed by supervised ranking models on these features [8,17]. Following them, we extract 16 features for each pair of paper and author in the training set.…”
Section: Methods (mentioning)
confidence: 99%
“…The problem of author identification has been briefly studied before [11]. And we also notice KDD Cup 2013 has similar author identification/disambiguation problem [12,13,34,9,33], where participants are asked to predict which paper is truly written by some author. However, different from the KDD Cup, our setting is different from them in the sense that (1) existing authors are unknown in our double-blind setting, and (2) we consider the reference of the paper, which is one of the most important sources of information.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unlike traditional supervised learning, dense vectorized representations [16,15] are not directly available in networked data [26]. Hence, many traditional methods under network settings heavily rely on problem specific feature engineering [12,13,34,9,33]. Although feature engineering can incorporate prior knowledge of the problem and network structure, usually it is time-consuming, problem specific (thus not transferable), and the extracted features may be too simple for complicated data sets [3]. Several network embedding methods [17,26,25] have been proposed to automatically learn feature representations for networked data.…”
mentioning
confidence: 99%
“…In organizational networks, graph-based models, largely based on random walk [23], are widely used to estimate individual authority. In this field, extensive researches have demonstrated strong correlations between centrality and authority [10,16,26,27]. The famous PageRank algorithm proposed by L. Page et al [18] and later the Topic-sensitive Pagerank [11] have proven the value of the citation graph for web pages.…”
Section: Related Work (mentioning)
confidence: 99%