Automatic sound event classification (SEC) has attracted growing attention in recent years. Feature extraction is a critical component of an SEC system, and deep neural network (DNN) algorithms have achieved state-of-the-art performance for SEC. The extreme learning machine-based auto-encoder (ELM-AE) is a new deep learning algorithm that combines excellent representation performance with a very fast training procedure. However, ELM-AE suffers from instability. In this work, a bilinear multi-column ELM-AE (B-MC-ELM-AE) algorithm is proposed to improve the robustness, stability, and feature representation of the original ELM-AE, and it is applied to learn feature representations of sound signals. Moreover, a feature learning and classification framework based on B-MC-ELM-AE and two-stage ensemble learning (TsEL) is developed to perform robust and effective SEC. Experimental results on the Real World Computing Partnership Sound Scene Database show that the proposed SEC framework outperforms the state-of-the-art DNN algorithm.