The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
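The two practical realizations mentioned in this abstract are commonly known as the normalized compression distance (NCD) and the normalized Google distance (NGD). The following is a minimal sketch of both, not the chapter's own code. It assumes zlib as the stand-in compressor, and the function and parameter names (`compressed_size`, `ncd`, `ngd`, `fx`, `fy`, `fxy`, `n`) are illustrative. The page counts passed to `ngd` are supplied by the caller; no search engine is queried.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance from page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: assumed total number of indexed pages. All counts must be positive.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

With a real compressor the NCD of an object with itself is small but usually not exactly zero, and values slightly above 1 can occur for unrelated inputs. A distance matrix built from `ncd` can be handed to any standard clustering routine.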
Error-tolerant backbone resonance assignment is the cornerstone of the NMR structure determination process. Although a variety of assignment approaches have been developed, none works sufficiently well on noisy, fully automatically picked peaks to enable the subsequent automatic structure determination steps. We have designed an integer linear programming (ILP) based assignment system (IPASS) that has enabled fully automatic protein structure determination for four test proteins. IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the (automatically picked) 15N-edited NOESY peaks, which are then used to fix reliable fragments. When applied to automatically picked peaks for real proteins, IPASS achieves an average precision and recall of 82% and 63%, respectively. In contrast, the next best method, MARS, achieves an average precision and recall of 77% and 36%, respectively. The assignments generated by IPASS are then fed into our protein structure calculation system, FALCON-NMR, to determine the 3D structures without human intervention. The final models have backbone RMSDs of 1.25 Å, 0.88 Å, 1.49 Å, and 0.67 Å to the reference native structures for proteins TM1112, CASKIN, VRAR, and HACS1, respectively. The web server is publicly available at http://monod.uwaterloo.ca/nmr/ipass.
In a typical algorithmic learning model, a learner has to identify a target object from partial information. Conversely, in a teaching model a teacher has to give information that allows the learners to identify a target object. We devise two variants of the classical teaching model for Boolean concept classes, based on the teaching dimension, and describe them by teaching-dimension-like combinatorial parameters. In the first model, the learners choose consistent hypotheses with least complexity. We show that 1-decision lists become harder to teach the longer they are and that 2-term DNFs become harder to teach the more terms they have. This contrasts with the teachability results for these classes in the teaching-dimension model. In our second model, the learners choose consistent hypotheses based on the assumption that the teacher is optimal. We show that monomials can be taught with a linear number of examples, whereas some 1-decision lists need exponentially many.
The present paper surveys recent developments in algorithmic teaching. First, the traditional teaching dimension model is recalled. Starting from the observation that the teaching dimension model sometimes leads to counterintuitive results, recently developed approaches are presented. Here, the main emphasis is put on the following aspects derived from human teaching/learning behavior: the order in which examples are presented should matter; teaching should become harder when the memory size of the learners decreases; teaching should become easier if the learners provide feedback; and it should be possible to teach infinite concepts and/or finite and infinite concept classes. Recent developments in algorithmic teaching that achieve some of these aspects are presented and compared.
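The traditional teaching dimension can be illustrated by brute force on a small finite class. The sketch below is not taken from the surveyed papers. It assumes concepts are represented as frozensets over a finite domain, and the names `teaching_set_size` and `teaching_dimension` are illustrative. A teaching set for a target concept is a set of labeled instances consistent with the target and with no other concept in the class; the teaching dimension is the largest minimum teaching set size over the class.

```python
from itertools import combinations


def teaching_set_size(target, concept_class, domain):
    """Smallest number of labeled instances that rule out every other concept."""
    others = [c for c in concept_class if c != target]
    for k in range(len(domain) + 1):
        for instances in combinations(domain, k):
            # Each rival concept must disagree with the target on some instance.
            if all(any((x in c) != (x in target) for x in instances)
                   for c in others):
                return k
    raise ValueError("concepts must be distinct subsets of the domain")


def teaching_dimension(concept_class, domain):
    """Worst case of teaching_set_size over all concepts in the class."""
    return max(teaching_set_size(c, concept_class, domain)
               for c in concept_class)


# Example class: the empty concept plus all singletons over {0, 1, 2}.
domain = [0, 1, 2]
concepts = [frozenset()] + [frozenset([i]) for i in domain]
print(teaching_dimension(concepts, domain))  # prints 3
```

In this example each singleton is taught by one positive example, while the empty concept needs a negative example for every domain element. The class therefore has a teaching dimension equal to the domain size even though most of its concepts are trivial to teach, which is one way a worst-case measure can look counterintuitive.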