Bridging Collaborative Filtering and Semi-Supervised Learning

Yang, Carl; Bai, Lanxiao; Zhang, Chao; Yuan, Quan; Han, Jiawei

doi:10.1145/3097983.3098094

Cited by 294 publications

(193 citation statements)

References 36 publications

Supporting

Mentioning

192

Contrasting

Unclassified


“…Heterogeneous network has been intensively studied due to its power of accommodating multi-typed interconnected data [21,22,3,30]. In this work, we stress that rich contents are prevalently available on nodes in the networks, and we define content-rich heterogeneous networks as follows.…”

Section: Heterogeneous Network Modeling (mentioning)

confidence: 99%


Similarity Modeling on Heterogeneous Networks via Automatic Path Discovery

Yang, Liu, Fang, et al. 2019

Lecture Notes in Computer Science

Self Cite


Heterogeneous networks are widely used to model real-world semi-structured data. The key challenge of learning over such networks is the modeling of node similarity under both network structures and contents. To deal with network structures, most existing works assume a given or enumerable set of meta-paths and then leverage them for the computation of meta-path-based proximities or network embeddings. However, expert knowledge for given meta-paths is not always available, and as the length of considered meta-paths increases, the number of possible paths grows exponentially, which makes the path searching process very costly. On the other hand, while there are often rich contents around network nodes, they have hardly been leveraged to further improve similarity modeling. In this work, to properly model node similarity in content-rich heterogeneous networks, we propose to automatically discover useful paths for pairs of nodes under both structural and content information. To this end, we combine continuous reinforcement learning and deep content embedding into a novel semi-supervised joint learning framework. Specifically, the supervised reinforcement learning component explores useful paths between a small set of example similar pairs of nodes, while the unsupervised deep embedding component captures node contents and enables inductive learning on the whole network. The two components are jointly trained in a closed loop to mutually enhance each other. Extensive experiments on three real-world heterogeneous networks demonstrate the supreme advantages of our algorithm.


“…Recently, increasing research attention has been paid to heterogeneous networks, highlighting multityped nodes and connections. Their modeling of rich semantics in terms of both node contents and typed links enables the integration of real-world data from various sources and facilitates wide applications [22,13,30,31,33].…”

Section: Introduction (mentioning)

confidence: 99%


“…Motivated by a recent work on place recommendation [50], we refine the place embedding through unsupervised embedding smoothing on a place network. The idea is to require the embeddings of places that have similar coordinates or same categories to be close.…”

Section: Leveraging Coordinate and Category (mentioning)

confidence: 99%

“…Following [50], we derive the loss that enforces smoothness among places that are close on the place network as…”

Section: Leveraging Coordinate and Category (mentioning)

confidence: 99%

Place Deduplication with Embeddings

Yang, Hoang, Mikolov 2019

The World Wide Web Conference

Self Cite


Thanks to the advancing mobile location services, people nowadays can post about places to share visiting experience on-the-go. A large place graph not only helps users explore interesting destinations, but also provides opportunities for understanding and modeling the real world. To improve coverage and flexibility of the place graph, many platforms import places data from multiple sources, which unfortunately leads to the emergence of numerous duplicated places that severely hinder subsequent location-related services. In this work, we take the anonymous place graph from Facebook as an example to systematically study the problem of place deduplication: We carefully formulate the problem, study its connections to various related tasks that lead to several promising basic models, and arrive at a systematic two-step data-driven pipeline based on place embedding with multiple novel techniques that works significantly better than the state-of-the-art.


“…To make a fair comparison, we also replace the weighted nonnegative MF approach with CMF. • PACE [49] proposes a neural approach to bridge collaborative filtering and SSL, and we have distinguished our work from it in Section 2.3. Note that PACE can only support binary ratings and thus we only compare with it on Delicious and Lastfm datasets.…”

Section: Experiments Setting (mentioning)

confidence: 99%

Semi-supervised Learning Meets Factorization

Chen, Chang, et al. 2018

ACM Trans. Knowl. Discov. Data


Recently latent factor model (LFM) has been drawing much attention in recommender systems due to its good performance and scalability. However, existing LFMs predict missing values in a user-item rating matrix only based on the known ones, and thus the sparsity of the rating matrix always limits their performance. Meanwhile, semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem by performing label propagation, which is mainly based on the smoothness insight on affinity graphs. However, graph-based SSL suffers serious scalability and graph unreliable problems when directly being applied to do recommendation. In this paper, we propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM. The proposed CGM is a combination of Bayesian network and Markov random field. The Bayesian network is used to model the rating generation and regression procedures, and the Markov random field is used to model the confidence-aware smoothness constraint between the generated ratings. Experimental results show that our proposed CGM significantly outperforms the state-of-the-art approaches in terms of four evaluation metrics, and with a larger performance margin when data sparsity increases.

Latent factor model. Among the existing recommendation approaches, latent factor model (LFM) has been drawing much attention due to its good performance and scalability. LFM uses a low dimensional user and item latent factors to represent the characteristics of each user and each item, and uses the product of them to represent the user's rating on the item. LFM has drawn much attention recently due to its good performance and scalability [1,4,19,24,26,28,34,35,37,43,44,47]. As users have actions (e.g., rate and buy) on some items, LFM aims to predict the users' unknown actions on other items. The tendency of a user's action on an item can be indicated by a real-valued number, i.e., rating or label. Thus, the recommendation problem is also known as the unknown ratings prediction problem [41]. In practice, however, many LFMs have to evaluate very large user and item sets, where the user-item (U-I) matrix is extremely sparse; such data sparsity has always been its main challenge [42].

Semi-supervised learning. SSL uses unlabeled data to either modify or reprioritize hypotheses obtained from labeled data alone, and thus can alleviate the label sparsity problem by adopting the graph information between data [40]. Towards effective SSL, affinity graph-based smoothness approaches have attracted much research interests, which follow the smoothness insight: close nodes on an affinity graph have similar labels. Graph-based SSL is appealing recently because it is easy to implement and gives rise to closed-form solutions [9,13,16,45,46,52]. However, graph-based SSL directly predicts the unknown ratings in the original U-I matrix, and thus suffers from the scalability problem.

As a key insight of this paper, we identify the marriage of SSL and LFM. The main insights of SSL (i.e., smoothness...

