Machine learning (ML) is a branch of computer science concerned with computational algorithms that can learn from data. Learning approaches such as data clustering, artificial neural networks, genetic programming, and support vector machines have found wide application in engineering, business, and science. Among these approaches, the greatest advantage of artificial neural networks (ANNs) is their ability to approximate an arbitrary function by learning from observed data. Single hidden layer feedforward neural networks (SLFNs) are probably the most popular ANNs and have been studied extensively.

The extreme learning machine (ELM) introduced by Huang et al. is a learning algorithm based on generalized SLFNs with a wide variety of hidden nodes. It randomly generates the hidden node parameters and then determines the output weights analytically, as sketched in the code below. The key advantage of ELM is that no iteration is needed to determine the hidden node parameters, which dramatically reduces the computational time required to train the model. In addition, ELM is very simple and tends to reach both the smallest training error and the smallest norm of output weights, which leads to good generalization performance.

However, the good performance of ELM holds only when the network architecture is chosen correctly. For any application, a network that is too small cannot learn the problem well, while one that is too large may overfit; both cases result in poor generalization. Architecture design for the original ELM relies on trial and error, which can be very tedious in practice. How to choose an appropriate network structure therefore remains an open problem.
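To make the two-step training procedure concrete, the following is a minimal NumPy sketch of a basic ELM regressor, not the authors' implementation: the tanh activation, the uniform initialization range, and the function names `elm_train` and `elm_predict` are assumptions chosen for illustration. The sketch shows the defining structure of ELM: random, fixed hidden node parameters followed by an analytic (pseudoinverse) solution for the output weights.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=None):
    """Train a basic ELM regressor (illustrative sketch).

    X : (n_samples, n_features) input matrix
    T : (n_samples, n_outputs) target matrix
    n_hidden : number of hidden nodes (must be chosen by the user)
    """
    rng = np.random.default_rng(seed)
    # Step 1: hidden node parameters are drawn at random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = np.tanh(X @ W + b)  # hidden layer output matrix

    # Step 2: output weights are the minimum-norm least-squares solution
    # beta = pinv(H) @ T, computed analytically -- no iterative training.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass through the trained SLFN."""
    return np.tanh(X @ W + b) @ beta

# Toy usage: fit a one-dimensional regression target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
T = np.sin(X[:, :1])
W, b, beta = elm_train(X, T, n_hidden=50, seed=0)
Y = elm_predict(X, W, b, beta)
print("training MSE:", np.mean((Y - T) ** 2))
```

Note that `n_hidden` is passed in by the caller; the sketch deliberately leaves it as a free parameter, since choosing it is exactly the open architecture-selection problem described above.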