Ensemble methods are considered the state-of-the-art solution for many machine learning challenges. Such methods improve the predictive performance of a single model by training multiple models and combining their predictions. This paper introduces the concept of ensemble learning, reviews traditional, novel, and state-of-the-art ensemble methods, and discusses current challenges and trends in the field.

This article is categorized under:
Algorithmic Development > Ensemble Methods
Technologies > Machine Learning
Technologies > Classification

KEYWORDS
boosting, classifier combination, ensemble models, machine learning, mixtures of experts, multiple classifier system, random forest

1 | INTRODUCTION

Ensemble learning is an umbrella term for methods that combine multiple inducers to make a decision, typically in supervised machine learning tasks. An inducer, also referred to as a base learner, is an algorithm that takes a set of labeled examples as input and produces a model (e.g., a classifier or regressor) that generalizes from these examples. The produced model can then be used to make predictions for new, unlabeled examples. An ensemble's inducers can be any type of machine learning algorithm (e.g., decision tree, neural network, or linear regression model). The main premise of ensemble learning is that by combining multiple models, the errors of a single inducer are likely to be compensated for by the other inducers, so that the overall predictive performance of the ensemble is better than that of any single inducer (see the brief sketch at the end of this section).

Ensemble learning is usually regarded as the machine learning interpretation of the wisdom of the crowd. This concept can be illustrated through the story of Sir Francis Galton, an English philosopher and statistician who conceived the basic concepts of standard deviation and correlation. While visiting a livestock fair, Galton came across a simple weight-guessing contest in which participants were asked to guess the weight of an ox. Hundreds of people took part, but no one guessed the exact weight: 1,198 pounds. Much to his surprise, Galton found that the average of all the guesses came remarkably close to that exact weight. The experiment revealed the power of combining many predictions to obtain a single accurate prediction.

Ensemble methods manifest this concept in machine learning challenges, where they often yield improved predictive performance compared to a single model. In addition, when the computational cost of the participating inducers is low (e.g., decision trees), ensemble models can also be very efficient.
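To make the premise above concrete, the following is a minimal sketch, written in Python with scikit-learn, of one of the simplest combination schemes: bagging with majority voting. Each tree is trained on a bootstrap resample of the training data, and the ensemble prediction is the class receiving the most votes. This sketch is not the method of any specific approach surveyed here; the synthetic dataset, the number of trees (100), and all other parameters are arbitrary illustrative choices.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification task (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: a single, fully grown decision tree (a high-variance inducer).
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("single tree:    ", accuracy_score(y_test, single.predict(X_test)))

# Ensemble: 100 trees, each trained on a bootstrap resample of the
# training set, combined by majority vote (bagging).
rng = np.random.default_rng(0)
votes = np.zeros((len(X_test), 2), dtype=int)  # vote counts per class
for _ in range(100):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    votes[np.arange(len(X_test)), tree.predict(X_test)] += 1

# Majority vote: errors of the individual trees tend to cancel out.
ensemble_pred = votes.argmax(axis=1)
print("bagged ensemble:", accuracy_score(y_test, ensemble_pred))

On most runs, the bagged ensemble outperforms the single tree on the held-out test set, because averaging over bootstrap replicates reduces the variance of the individual high-variance trees without substantially increasing their bias.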