Abstract: The echo state network (ESN) employs a large reservoir of sparsely and randomly connected internal nodes and trains only the output weights. This avoids convergence to suboptimal solutions, exploding and vanishing gradients, high complexity and other disadvantages of traditional recurrent neural network (RNN) training. Because it adapts well to nonlinear dynamical systems, the ESN has been used in a wide range of applications. However, in the era of Big Data, enormous amounts of data are generated continuously every day and are often stored in a distributed manner in real applications, so a centralized ESN training process is often technologically unsuitable. To meet the requirements of real-world Big Data applications, in this study we propose an algorithm for distributed ESN training, together with its implementation. The algorithm is based on the parallel particle swarm optimization (P-PSO) technique, and the implementation uses Spark, a widely used large-scale data processing framework. Four extremely large-scale datasets, including artificial benchmarks, real-world data and image data, are adopted to verify our framework on a scalable platform. Experimental results indicate that the proposed framework performs well in terms of speed, accuracy and generalization capabilities.
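To make the core idea concrete, the following is a minimal single-machine sketch, not the authors' implementation: a small fixed random reservoir is driven by a sine wave, and only the output weights are searched with a plain serial PSO. Everything in it is an assumption made for illustration, including the function names (`make_reservoir`, `run_reservoir`, `pso_train`), the reservoir size, the PSO coefficients and the toy task. The parallel P-PSO scheme and the Spark implementation proposed in the study are not reproduced.

```python
import math
import random


def make_reservoir(n_res, n_in, density, scale, rng):
    """Build fixed random weights; these are never trained (illustrative values)."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_res)]
    # Sparse recurrent matrix: each entry is nonzero with probability `density`.
    w = [[rng.uniform(-scale, scale) if rng.random() < density else 0.0
          for _ in range(n_res)] for _ in range(n_res)]
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Collect states x(t) = tanh(W_in u(t) + W x(t-1)), starting from x = 0."""
    x = [0.0] * len(w)
    states = []
    for u in inputs:
        x = [math.tanh(sum(a * b for a, b in zip(w_in[i], u)) +
                       sum(a * b for a, b in zip(w[i], x)))
             for i in range(len(w))]
        states.append(x)
    return states


def mse(w_out, states, targets):
    """Mean squared error of the linear readout y(t) = w_out . x(t)."""
    total = 0.0
    for x, y in zip(states, targets):
        diff = sum(a * b for a, b in zip(w_out, x)) - y
        total += diff * diff
    return total / len(targets)


def pso_train(states, targets, n_particles, n_iters, rng,
              inertia=0.7, c1=1.5, c2=1.5):
    """Serial PSO over the output weights only (assumed coefficients)."""
    dim = len(states[0])
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_err = [mse(p, states, targets) for p in pos]
    g = min(range(n_particles), key=best_err.__getitem__)
    g_pos, g_err = best_pos[g][:], best_err[g]
    history = [g_err]
    for _ in range(n_iters):
        # Move every particle using the global best from the previous iteration.
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (best_pos[k][d] - pos[k][d])
                             + c2 * r2 * (g_pos[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
        # Fitness evaluations are independent of one another.
        errs = [mse(p, states, targets) for p in pos]
        for k, e in enumerate(errs):
            if e < best_err[k]:
                best_err[k], best_pos[k] = e, pos[k][:]
            if e < g_err:
                g_err, g_pos = e, pos[k][:]
        history.append(g_err)
    return g_pos, history


rng = random.Random(0)
# Toy task (assumption): predict the next sample of a sine wave.
inputs = [[math.sin(0.2 * t)] for t in range(100)]
targets = [math.sin(0.2 * (t + 1)) for t in range(100)]
w_in, w = make_reservoir(n_res=10, n_in=1, density=0.2, scale=0.4, rng=rng)
states = run_reservoir(w_in, w, inputs)
w_out, history = pso_train(states, targets, n_particles=15, n_iters=30, rng=rng)
print("initial best MSE:", history[0])
print("final best MSE:  ", history[-1])
```

In this sketch each particle's error within an iteration is computed independently of the others, so the fitness evaluation is the step that lends itself most naturally to parallel execution. How the study itself partitions the work across a cluster is not specified in this abstract.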