A truly distributed (as opposed to parallelized) support vector machine (SVM) algorithm is presented. Training data are assumed to come from the same distribution and are stored locally at a number of different locations with processing capabilities (nodes). In several examples, we have found that exchanging a reasonably small amount of information among nodes yields an SVM solution that is better than that of classifiers trained only on local data, and comparable to (although slightly worse than) that of the centralized approach, in which all the training data are available at the same place. We propose and analyze two distributed schemes: a "naïve" distributed chunking approach, in which raw data (support vectors) are communicated, and the more elaborate distributed semiparametric SVM, which aims to further reduce the total amount of information passed between nodes while providing a privacy-preserving mechanism for information sharing. We show the feasibility of our proposal by evaluating the performance of the algorithms on benchmarks with both synthetic and real-world datasets.
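The naïve distributed chunking scheme can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a linear SVM trained by Pegasos-style stochastic subgradient descent, a stand-in solver chosen here for brevity. It treats points on or inside the margin (up to a tolerance) as support vectors and runs a fixed number of exchange rounds. The names `train_linear_svm`, `support_vectors` and `distributed_chunking` are hypothetical. In each round, every node trains on its local data plus the support vectors received from the other nodes, then sends its own local support vectors to every other node.

```python
import random


def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient training of a linear SVM (stand-in solver).

    data: list of (x, y) with x a tuple of floats and y in {-1, +1}.
    A constant 1.0 feature is appended, so the last weight acts as a bias
    (which, for simplicity, is regularized too).
    """
    rng = random.Random(seed)
    pts = [(list(x) + [1.0], y) for x, y in data]
    w = [0.0] * len(pts[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(pts)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = pts[i]
            margin = y * _dot(w, x)                  # margin before the update
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularization step
            if margin < 1.0:                          # hinge loss active: move toward y*x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    return _dot(w, list(x) + [1.0])


def support_vectors(w, data, tol=0.1):
    """Points on or inside the margin, up to a tolerance (approximate SVs)."""
    return [(x, y) for x, y in data if y * decision(w, x) <= 1.0 + tol]


def distributed_chunking(node_data, rounds=3, **svm_kwargs):
    """Hypothetical sketch: nodes exchange only their local support vectors."""
    k = len(node_data)
    received = [[] for _ in range(k)]  # SVs each node got from the others
    sent = 0                           # number of raw vectors transmitted
    for _ in range(rounds):
        outgoing = []
        for j in range(k):
            w = train_linear_svm(node_data[j] + received[j], **svm_kwargs)
            outgoing.append(support_vectors(w, node_data[j]))
        sent += sum(len(s) for s in outgoing) * (k - 1)  # each node broadcasts
        received = [[p for i, s in enumerate(outgoing) if i != j for p in s]
                    for j in range(k)]
    weights = [train_linear_svm(node_data[j] + received[j], **svm_kwargs)
               for j in range(k)]
    return weights, sent


# Usage: two Gaussian classes split round-robin across three nodes.
rng = random.Random(1)
data = []
for i in range(150):
    y = 1 if i % 2 == 0 else -1
    data.append(((rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0)), y))
nodes = [data[j::3] for j in range(3)]
weights, sent = distributed_chunking(nodes)
accs = [sum(1 for x, y in data if y * decision(w, x) > 0) / len(data)
        for w in weights]
```

The `sent` counter gives a rough measure of the communication cost that the semiparametric variant described above aims to reduce. Only support vectors travel, so it is typically well below shipping every local sample to every other node.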
In this paper we propose a new method for training classifiers for multi-class problems in which classes are not (necessarily) mutually exclusive and may be related by a probabilistic tree structure. It is based on a Bayesian model relating network parameters, feature vectors, and categories. Learning is formulated as maximum likelihood estimation of the classifier parameters. The proposed algorithm is especially suited to situations where each training sample is labeled with respect to only one or some of the categories in the tree. Our experiments on information retrieval scenarios show the advantages of the proposed method.
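To make the partial-label idea concrete, here is a minimal sketch of a likelihood objective of this kind. It is a hypothetical model, not the paper's. The model makes several assumptions. Each category has a logistic unit giving the probability that it is active given that its parent is active. A category can be active only if its parent is, so P(c = 1 | x) is the product of these conditionals along the path to the root. Each training sample is labeled with respect to a single category. The names `prob_positive` and `partial_log_likelihood` are illustrative only. Under this model, a sample contributes just the term for the category it was labeled on, and siblings need not be mutually exclusive.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob_positive(c, x, params, parent):
    """P(c = 1 | x) under the assumed tree model: product of the node-wise
    conditional probabilities from c up to a top-level category.

    params[c] = (weights, bias); parent[c] is c's parent or None at the top.
    """
    p = 1.0
    node = c
    while node is not None:
        w, b = params[node]
        p *= sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        node = parent[node]
    return p


def partial_log_likelihood(samples, params, parent, eps=1e-12):
    """Sum of log-probabilities of the observed labels only.

    samples: list of (x, c, y), meaning sample x is labeled only with respect
    to category c, with y in {0, 1}. Unlabeled categories contribute nothing.
    """
    ll = 0.0
    for x, c, y in samples:
        p = min(max(prob_positive(c, x, params, parent), eps), 1.0 - eps)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


# Usage: category "B" is a child of "A"; all parameters start at zero.
parent = {"A": None, "B": "A"}
params = {"A": ([0.0, 0.0], 0.0), "B": ([0.0, 0.0], 0.0)}
x = (1.0, -2.0)
ll = partial_log_likelihood([(x, "A", 1), (x, "B", 0)], params, parent)
```

Maximum likelihood training would then adjust `params` with any gradient-based optimizer to maximize this objective. With zero parameters, P(A = 1) = 0.5 and P(B = 1) = 0.25 in the usage example.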