Response modeling is concerned with identifying potential customers who are likely to purchase a promoted product, based on customers’ demographic and behavioral data. Constructing a response model requires a database of results from a preliminary campaign. Customers who responded to the campaign are labeled as respondents, while those who did not are labeled as non-respondents. Customers who were not chosen for the preliminary campaign have no labels and are therefore called unlabeled. Then, using only the labeled customer data, a classification model is built in the supervised learning framework to predict the responses of all existing customers. However, in response modeling often only a small fraction of customers is labeled and thus available for model building, while the large pool of unlabeled data may carry valuable information. As a method to exploit the unlabeled data, we introduce semi-supervised learning to the interactive marketing community. A case study on the CoIL Challenge 2000 and the Direct Marketing Educational Foundation data sets shows that the transductive support vector machine, one of the widely used semi-supervised models, can identify more respondents than conventional supervised models, especially when only a small amount of data is labeled. Semi-supervised learning is a viable alternative and merits further investigation.
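The idea of letting unlabeled customers inform the model can be illustrated with a minimal sketch. It does not reproduce the transductive support vector machine used in the study; as a plainly simpler stand-in it uses self-training with a nearest-centroid classifier, which repeatedly refits one centroid per class, pseudo-labels the unlabeled customers it is most confident about, and refits again. The function names `self_train`, `centroid`, and `sq_dist`, the `max_rounds` and `per_round` parameters, and the toy one-feature customers are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_train(
    X: Sequence[Vector],
    y: Sequence[Optional[int]],
    max_rounds: int = 100,
    per_round: int = 1,
) -> List[Optional[int]]:
    """Self-training with a nearest-centroid classifier (hypothetical sketch).

    y[i] is 1 (respondent), 0 (non-respondent) or None (unlabeled customer).
    Returns a new label list; the input y is not modified.
    """
    labels: List[Optional[int]] = list(y)
    for _ in range(max_rounds):
        unlabeled = [i for i, lab in enumerate(labels) if lab is None]
        if not unlabeled:
            break
        # Refit the "model": one centroid per class from the current labels.
        cents: Dict[int, Vector] = {}
        for c in (0, 1):
            members = [X[i] for i, lab in enumerate(labels) if lab == c]
            if not members:
                raise ValueError(f"class {c} has no labeled examples")
            cents[c] = centroid(members)
        # Confidence = gap between the distances to the two centroids.
        scored = []
        for i in unlabeled:
            d0, d1 = sq_dist(X[i], cents[0]), sq_dist(X[i], cents[1])
            scored.append((abs(d0 - d1), i, 1 if d1 < d0 else 0))
        scored.sort(reverse=True)
        # Pseudo-label only the most confident unlabeled customers this round.
        for _, i, pred in scored[:per_round]:
            labels[i] = pred
    return labels


if __name__ == "__main__":
    X = [(0.0,), (10.0,), (1.0,), (2.0,), (8.0,), (9.0,)]
    y = [0, 1, None, None, None, None]
    print(self_train(X, y))  # → [0, 1, 0, 0, 1, 1]
```

Labeling only a few customers per round lets each refit use the newly pseudo-labeled data; the trade-off of self-training is that an early wrong pseudo-label is carried into every later refit.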
Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, spanning recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a substantial opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.
Background: In computational biology, new knowledge has been obtained mostly by identifying 'intra-relations,' the relations between entities at a specific biological level, such as gene expression or microRNA (miRNA) expression, and many such studies have been successful. However, intra-relations do not fully explain complex cancer mechanisms because they miss the inter-relation information between different levels of genomic data, e.g., between a miRNA and its target genes. The 'inter-relation' between different levels of genomic data can be constructed from biological experimental data as well as from genomic knowledge.
Methods: Previously, we proposed a graph-based framework that integrates multiple layers of genomic data (copy number alteration, DNA methylation, gene expression, and miRNA expression) for cancer clinical outcome prediction. However, a limitation of that work was that it integrated the layers of genomic data without considering inter-relationship information between genomic features. In this paper, as a pilot study, we propose a new integrative framework for clinical outcome prediction that combines a gene expression dataset with genomic knowledge of the inter-relation between miRNA and gene expression.
Results: To demonstrate the validity of the proposed method, we adopted the prediction of short-term versus long-term survival for 82 patients with glioblastoma multiforme (GBM) as a base task. Our results show that the accuracy of the predictive model increases through the incorporation of information fused from the gene expression dataset and genomic knowledge of the inter-relation between miRNA and gene expression.
Conclusions: In the present study, the intra-relation of gene expression was reconstructed from the inter-relation between miRNA and gene expression to predict short-term/long-term survival of GBM patients. Our findings suggest that external knowledge representing miRNA-mediated regulation of gene expression is substantially useful for elucidating the cancer phenotype.
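One way to picture reconstructing gene-gene intra-relations from miRNA-gene inter-relations is sketched below, under assumptions that are ours and not the authors': two genes are linked when a common miRNA targets both, the edge weight counts the shared miRNAs, and each gene's expression value for one patient is then blended with the weighted mean of its neighbors. The names `gene_graph_from_mirna` and `smooth_expression`, the `alpha` parameter, and the miRNA/gene identifiers are hypothetical; this is an illustration, not the paper's graph-based framework.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def gene_graph_from_mirna(targets: Dict[str, Iterable[str]]) -> Dict[Edge, int]:
    """Link two genes when a common miRNA targets both (hypothetical rule).

    Edge keys are sorted gene pairs; the weight counts shared miRNAs.
    """
    weights: Dict[Edge, int] = defaultdict(int)
    for genes in targets.values():
        for g1, g2 in combinations(sorted(set(genes)), 2):
            weights[(g1, g2)] += 1
    return dict(weights)


def smooth_expression(
    expr: Dict[str, float], edges: Dict[Edge, int], alpha: float = 0.5
) -> Dict[str, float]:
    """Blend each gene's value with the weighted mean of its neighbors.

    Genes with no neighbors present in `expr` keep their original value.
    """
    nbr_sum: Dict[str, float] = defaultdict(float)
    nbr_w: Dict[str, float] = defaultdict(float)
    for (g1, g2), w in edges.items():
        if g1 in expr and g2 in expr:
            nbr_sum[g1] += w * expr[g2]
            nbr_w[g1] += w
            nbr_sum[g2] += w * expr[g1]
            nbr_w[g2] += w
    return {
        g: (1 - alpha) * v + alpha * nbr_sum[g] / nbr_w[g] if nbr_w[g] > 0 else v
        for g, v in expr.items()
    }


if __name__ == "__main__":
    targets = {"miR-A": ["G1", "G2", "G3"], "miR-B": ["G2", "G3"], "miR-C": ["G4"]}
    edges = gene_graph_from_mirna(targets)
    print(edges)  # → {('G1', 'G2'): 1, ('G1', 'G3'): 1, ('G2', 'G3'): 2}
    print(smooth_expression({"G1": 1.0, "G2": 2.0, "G3": 4.0, "G4": 3.0}, edges))
```

The smoothed values could then serve as features for a survival classifier; `alpha` controls how strongly the miRNA-derived relations override each gene's own measurement.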