Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics

Krogel, Mark-A.; Scheffer, Tobias

doi:10.1023/b:mach.0000035472.73496.0c

Cited by 52 publications (27 citation statements)

References 31 publications

“…For inferring annotation rules, a machine learning algorithm takes these feature vectors and known annotations as an input to train [a] model [31]. In order to adapt to [the] topic model that has been developed for text mining, we set up a parallelism between text documents and proteins in our framework.…”

Section: The BoW of Protein Sequence (mentioning; confidence 99%)

Predicting protein function via multi-label supervised topic model on gene ontology. Liu, Tang, et al. (2017). Biotechnology & Biotechnological Equipment.

As biological datasets accumulate rapidly, computational methods designed to automate protein function prediction are critically needed. The problem of protein function prediction can be considered a multi-label classification problem resulting in protein functional annotations. Nevertheless, biologists prefer to discover the correlations between protein attributes and functions. We introduce a multi-label supervised topic model into protein function prediction and investigate the advantages of this approach. This topic model can not only work out the function probability distributions over protein instances effectively, but also directly provide the word probability distributions over functions. To the best of our knowledge, this is the first effort to apply a multi-label supervised topic model to protein function prediction. In this paper, we model a protein as a document and a function label as a topic. First, a set of protein sequences is formalized into a bag of words. Then, we perform inference and estimate the model parameters to predict protein functions. Experimental results on yeast and human datasets demonstrate the effectiveness of this multi-label supervised topic model for protein function prediction. The experiments also show that this model outperforms the compared algorithms. In summary, the method discussed in this paper provides a new, efficient approach to protein function prediction and reveals more information about functions.
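The abstract's first modeling step, turning each protein into a "document" made of a bag of words, can be sketched as follows. This is a minimal sketch under an assumption the abstract does not state: a "word" is taken to be an overlapping k-mer of amino-acid letters (k = 3 here). The helper names `protein_to_bag_of_words` and `build_corpus` and the toy sequences are hypothetical, and the topic-model inference step itself is not reproduced.

```python
from collections import Counter


def protein_to_bag_of_words(sequence: str, k: int = 3) -> Counter:
    """Map one protein sequence to a bag of overlapping k-mer 'words'.

    Assumption: a "word" is an overlapping k-mer of amino-acid letters;
    the abstract does not say how words are defined.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_corpus(proteins: dict, k: int = 3):
    """Treat each protein as a document: return a sorted vocabulary and
    one word-count vector per protein id."""
    bags = {pid: protein_to_bag_of_words(seq, k) for pid, seq in proteins.items()}
    vocab = sorted(set().union(*bags.values()))
    index = {word: j for j, word in enumerate(vocab)}
    vectors = {}
    for pid, bag in bags.items():
        vec = [0] * len(vocab)
        for word, count in bag.items():
            vec[index[word]] = count
        vectors[pid] = vec
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    vocab, vectors = build_corpus({"P1": "MKVLAMKV", "P2": "MKVQ"})
    print(vocab)          # ['AMK', 'KVL', 'KVQ', 'LAM', 'MKV', 'VLA']
    print(vectors["P1"])  # [1, 1, 0, 1, 2, 1]
```

The resulting count vectors play the role of documents; in the framing described above, a supervised topic model would then relate these words to function labels treated as topics.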

“…Another recent theoretical analysis treats co-training as a combinative label propagation over multiple views and provides a sufficient and necessary condition desired for co-training [163]. However, the performance could be dramatically degraded if the classifiers do not complement each other or the independency assumption does not hold [88]. Though co-training is conceptually treated as a semi-supervised learning paradigm due to the way unlabeled data is incorporated, the classifier training procedure is often supervised [22].…”

Section: Co-training (mentioning; confidence 99%)
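The co-training procedure discussed in the quoted passage can be illustrated with a generic two-view loop. This is a minimal sketch, not the setup of Krogel and Scheffer or of the cited analyses: the nearest-centroid base learners, the margin-based confidence score, and adding one example per view per round are all assumptions, and every function name here is hypothetical.

```python
import math


def centroids(points, labels):
    """Mean feature vector per class, computed on a single view."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(cents, x):
    """Nearest-centroid label plus a confidence score: the gap between the
    nearest and second-nearest centroid distances."""
    ranked = sorted((math.dist(c, x), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else math.inf
    return ranked[0][1], margin


def co_train(labeled, unlabeled, rounds=10):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2).

    In each round, a classifier trained on each view in turn labels the
    pool example it is most confident about, and that example joins the
    shared labeled set used by both views.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            cents = centroids([x[view] for x, _ in labeled], [y for _, y in labeled])
            scored = [(predict(cents, x[view]), i) for i, x in enumerate(pool)]
            (label, _), best = max(scored, key=lambda s: s[0][1])
            labeled.append((pool.pop(best), label))
    return labeled
```

Nothing in this loop checks whether a confidently assigned label is correct, so when the two views do not complement each other, mislabeled examples accumulate in the shared pool; this is the kind of degradation the quoted passage warns about.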

Semi-supervised learning for scalable and robust visual search. Wang (2011). SIGMultimedia Rec.

Semi-Supervised Learning for Scalable and Robust Visual Search (Jun Wang)

Unlike textual document retrieval, searching of visual data is still far from satisfactory. There exist major gaps between the available solutions and practical needs in both accuracy and computational cost. This thesis aims at the development of robust and scalable solutions for visual search and retrieval. Specifically, we investigate two classes of approaches: graph-based semi-supervised learning and hashing techniques. The graph-based approaches are used to improve accuracy, while hashing approaches are used to improve efficiency and cope with large-scale applications. A common theme shared between these two subareas of our work is the focus on the semi-supervised learning paradigm, in which a small set of labeled data is complemented with large unlabeled datasets.

Graph-based approaches have emerged as methods of choice for general semi-supervised tasks when no parametric information is available about the data distribution. They treat both labeled and unlabeled samples as vertices in a graph and then instantiate pairwise edges between these vertices to capture affinity between the corresponding samples. A quadratic regularization framework has been widely used for label prediction over such graphs. However, most existing graph-based semi-supervised learning methods are sensitive to the graph construction process and the initial labels. We propose a new bivariate graph transduction formulation and an efficient solution via an alternating minimization procedure. Based on this bivariate framework, we also develop new methods to filter unreliable and noisy labels. Extensive experiments over diverse benchmark datasets demonstrate the superior performance of our proposed methods.

However, graph-based approaches suffer from a critical scalability bottleneck: graph construction has quadratic complexity, and the inference procedure costs even more. The widely used graph construction method relies on nearest neighbor search, which is prohibitive for large-scale applications. In addition, most large-scale visual search problems involve handling high-dimensional visual descriptors, which poses another challenge: excessive storage requirements.

To handle the scalability issue of both computation and storage, the second part of the thesis focuses on efficient techniques for conducting approximate nearest neighbor (ANN) search, which is key to many machine learning algorithms, including graph-based semi-supervised learning and clustering. Specifically, we propose Semi-Supervised Hashing (SSH) methods that leverage semantic similarity over a small set of labeled data while preventing overfitting. We derive a rigorous formulation in which a supervised term minimizes the empirical errors on the labeled data and an unsupervised term provides effective regularization by maximizing the variance and independence of individual bits.

Experiments on several large datasets demonstrate the clear performance gain over several state-of-the-art ...
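The quadratic regularization framework for label prediction over graphs, mentioned in the abstract above, can be illustrated with the label-spreading iteration of Zhou et al. ("Learning with Local and Global Consistency"). This is one standard instance of that framework, not the bivariate graph transduction the thesis proposes. The Gaussian kernel, the parameters `sigma`, `alpha`, and `iters`, and the function names are assumptions chosen for illustration. The dense affinity matrix also makes the quadratic graph-construction cost noted in the abstract directly visible.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix W with W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Building it touches all n^2 pairs, which is the quadratic construction
    cost the abstract identifies as the scalability bottleneck."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def propagate_labels(points, labels, num_classes, alpha=0.9, sigma=1.0, iters=200):
    """labels[i] is a class index for labeled points and None for unlabeled ones.

    Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2};
    the fixed point minimizes a quadratic cost that balances smoothness over
    the graph against agreement with the initial labels Y."""
    n = len(points)
    W = gaussian_affinity(points, sigma)
    d = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(num_classes)] for i in range(n)]
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]
```

With one labeled point in each of two well-separated clusters, the labels spread along the high-affinity edges within each cluster; the sensitivity to graph construction that the abstract mentions shows up here as dependence on `sigma`.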

“…Krogel and Scheffer [22] have explored the effectiveness of using cotraining in functional genomic data that includes relational information. The authors perform an experimental analysis where they show that cotraining fails to improve classification results.…”

Section: Potential Bronchovascular Pair Detection (mentioning; confidence 99%)

“…A model L built by the relational learner using an initial training set is used in the parameter-tuning algorithm. The examples corresponding to the highest f1 measure [22] of completeness and correctness are used as positive examples. Completeness is also known as recall and sensitivity, while correctness is also known as precision in the pattern recognition literature.…”

Section: Image Preprocessing (mentioning; confidence 99%)
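The F1 measure used in the quoted parameter-tuning step is the harmonic mean of correctness (precision) and completeness (recall). A minimal sketch follows; `best_setting` is a hypothetical helper that assumes each candidate parameter value has already been scored with true-positive, false-positive, and false-negative counts, which the quote does not specify.

```python
def f1_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of correctness (precision) and completeness (recall)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    correctness = true_positives / predicted if predicted else 0.0
    completeness = true_positives / actual if actual else 0.0
    if correctness + completeness == 0:
        return 0.0
    return 2 * correctness * completeness / (correctness + completeness)


def best_setting(results):
    """Hypothetical tuning helper: results maps a parameter value to
    (tp, fp, fn) counts; return the value with the highest F1."""
    return max(results, key=lambda p: f1_measure(*results[p]))
```

For example, a setting with perfect correctness but only half the true positives recovered scores F1 = 2/3, so it loses to a setting with 0.8 correctness and 0.8 completeness.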

Automatic Detection of Bronchial Dilatation in HRCT Lung Images (2008)

Bronchiectasis is an airway disease caused by the dilatation of the bronchial tree, and a bronchovascular pair is formed between a bronchus and a vessel. An abnormal bronchovascular pair is one that has a larger bronchus compared to its accompanying vessel. Typically, bronchi and vessels running perpendicular to the plane of section appear as near-circular rings on computed tomography (CT) scans. This paper describes BV_pairs, a system capable of detecting abnormal bronchovascular pairs in high-resolution CT scans of sparse datasets using a three-stage process: (1) detection of potential bronchovascular pairs, (2) detection of discrete pairs, where there exists no ambiguity as to the artery that accompanies a bronchus, and (3) identification of abnormal pairs with severity levels. The system was evaluated at every stage. The automated scoring for the presence and severity of bronchial abnormalities was demonstrated to be comparable to that of an experienced radiologist (i.e., kappa statistics κ90.5). In addition, BV_pairs was also evaluated on images containing honeycombing regions, since honeycombing cysts appear very similar to bronchi, and the system could successfully differentiate honeycombing cysts from bronchi.
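Two quantities in this abstract can be made concrete. First, the definition of an abnormal pair (a bronchus larger than its accompanying vessel) reduces to a size comparison; `is_abnormal_pair` is a hypothetical helper using a plain diameter comparison, and the system's severity grading is not reproduced. Second, agreement with the radiologist is reported as a kappa statistic. The abstract does not say which variant was used, and because severity levels are ordinal, a weighted kappa is also plausible; the sketch below computes unweighted two-rater Cohen's kappa as an assumption.

```python
from collections import Counter


def is_abnormal_pair(bronchus_diameter, vessel_diameter):
    """Abnormal per the abstract's definition: the bronchus is larger than
    its accompanying vessel. Severity grading is not reproduced here."""
    return bronchus_diameter > vessel_diameter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length label sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal frequency per category.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance: two raters who agree on 3 of 4 cases but whose marginals imply 50% chance agreement get κ = 0.5, not 0.75.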
