Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable endto-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Twitter has rapidly grown to a popular social network in recent years and provides a large number of real-time messages for users. Tweets are presented in chronological order and users scan the followees' timelines to find what they are interested in. However, an information overload problem has troubled many users, especially those with many followees and thousands of tweets arriving every day. In this paper, we focus on recommending useful tweets that users are really interested in personally to reduce the users' effort to find useful information. Many kinds of information on Twitter are available for helping recommendation, including the user's own tweet history, retweet history and social relations between users. We propose a method of making tweet recommendations based on collaborative ranking to capture personal interests. It can also conveniently integrate the other useful contextual information. Our final method considers three major elements on Twitter: tweet topic level factors, user social relation factors and explicit features such as authority of the publisher and quality of the tweet. The experiments show that all the proposed elements are important and our method greatly outperforms several baseline methods.
Collaborative filtering techniques rely on aggregated user preference data to make personalized predictions. In many cases, users are reluctant to explicitly express their preferences and many recommender systems have to infer them from implicit user behaviors, such as clicking a link in a webpage or playing a music track. The clicks and the plays are good for indicating the items a user liked (i.e., positive training examples), but the items a user did not like (negative training examples) are not directly observed. Previous approaches either randomly pick negative training samples from unseen items or incorporate some heuristics into the learning model, leading to a biased solution and a prolonged training period. In this paper, we propose to dynamically choose negative training samples from the ranked list produced by the current prediction model and iteratively update our model. The experiments conducted on three large-scale datasets show that our approach not only reduces the training time, but also leads to significant performance gains.
SignificanceIdentifying predictive biomarkers of therapeutic response for melanoma patients treated with immune checkpoint inhibitors is a major challenge. By combining microfluidic enrichment for melanoma circulating tumor cells (CTCs) together with RNA-based droplet digital PCR quantitation, we have established a highly sensitive and robust platform for noninvasive, blood-based monitoring of tumor burden. Serial monitoring of melanoma patients treated with immune checkpoint inhibitors shows rapid changes in CTC score, which precede standard clinical assessment and are highly predictive of long-term clinical outcome. Early on-treatment digital monitoring of CTC dynamics may thus help identify patients likely to benefit from immune checkpoint inhibition therapy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.