Proceedings of the KDD Cup 2013 Workshop
DOI: 10.1145/2517288.2517291

KDD Cup 2013 - author-paper identification challenge

Abstract: This paper describes our submission to the KDD Cup 2013 Track 1 Challenge: Author-Paper Identification in the Microsoft Academic Search database. Our approach is based on the Gradient Boosting Machine (GBM) of Friedman [5] and deep feature engineering. The method placed second in the final standings with a Mean Average Precision (MAP) of 0.98144, while the winning submission scored 0.98259.
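The MAP metric quoted above averages, over authors, the average precision of each author's ranked paper list. A minimal sketch of that computation (function and variable names are illustrative, not from the paper):

```python
# Hedged sketch of Mean Average Precision (MAP), the leaderboard metric
# for KDD Cup 2013 Track 1. For each author, candidate papers are ranked
# by predicted confidence; AP rewards placing confirmed papers early.

def average_precision(ranked, relevant):
    """AP of one ranked list given the set of truly confirmed papers."""
    hits, score = 0, 0.0
    for i, paper in enumerate(ranked, start=1):
        if paper in relevant:
            hits += 1
            score += hits / i  # precision at each relevant position
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(predictions, ground_truth):
    """MAP over authors; both dicts are keyed by author id."""
    aps = [average_precision(predictions[a], ground_truth[a])
           for a in ground_truth]
    return sum(aps) / len(aps)

# Toy example with two authors and hypothetical paper ids.
preds = {1: [10, 11, 12], 2: [20, 21]}
truth = {1: {10, 12}, 2: {21}}
print(round(mean_average_precision(preds, truth), 4))  # → 0.6667
```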

Cited by 8 publications (8 citation statements)
References 8 publications
“…• Sup: Triggered by KDD Cup 2013, the problem of author identification has recently garnered attention, and top solutions of the challenge heavily relied on feature engineering followed by supervised ranking models on these features [8,17]. Following them, we extract 16 features for each pair of paper and author in the training set.…”
Section: Methods (mentioning)
confidence: 99%
“…• Supervised feature-based baselines. As widely used in similar author identification/disambiguation problems [12,13,34,9,33], this line of methods first extracts features for each pair of training data and then applies a supervised learning algorithm to learn ranking/classification functions. Following them, we extract 20+ related features for each pair of paper and author in the training set (details can be found in the appendix).…”
Section: Baselines and Experimental Settings (mentioning)
confidence: 99%
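The feature-based supervised baseline described above can be sketched as follows: build a feature vector per (paper, author) pair, then fit a gradient boosted model whose scores induce the ranking that MAP evaluates. Feature semantics and the synthetic data here are illustrative assumptions, not the paper's actual feature set:

```python
# Hedged sketch of a feature-based supervised baseline: per-pair features
# plus a gradient boosted classifier. Features such as name similarity or
# shared-coauthor count are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy (paper, author) pair features, e.g. [name similarity,
# shared-coauthor count, affiliation match, venue overlap].
X = rng.random((200, 4))
# Synthetic labels: 1 = confirmed author-paper pair.
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)

model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Score new pairs; sorting each author's papers by this score yields
# the ranked list that MAP evaluates.
scores = model.predict_proba(X[:5])[:, 1]
print(scores.shape)  # → (5,)
```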
“…Unlike traditional supervised learning, dense vectorized representations [16,15] are not directly available in networked data [26]. Hence, many traditional methods under network settings rely heavily on problem-specific feature engineering [12,13,34,9,33]. Although feature engineering can incorporate prior knowledge of the problem and network structure, it is usually time-consuming, problem-specific (thus not transferable), and the extracted features may be too simple for complicated data sets [3]. Several network embedding methods [17,26,25] have been proposed to automatically learn feature representations for networked data.…”
mentioning
confidence: 99%
“…In the past few years, some works have been devoted to the paper-author pair identification problem in big scholarly data, such as the studies in [9,19] and various solutions in [5,15,35] for the 2013 KDD Cup author-paper identification challenge. Most of these works focused on feature engineering and utilized supervised learning algorithms to infer the correlation between paper and author.…”
Section: Target (mentioning)
confidence: 99%
“…To solve the author identification problem, supervised learning models have been applied to predict the correlation between paper and author, such as the ones used in the top solutions [5,15,35] of the 2013 KDD Cup author-paper pair identification challenge and the multimodal approach in [19]. However, these methods rely heavily on time-consuming and storage-intensive feature engineering, which may extract irrelevant and redundant features or miss important ones.…”
Section: Introduction (mentioning)
confidence: 99%