Feature extraction is a critical stage in bearing fault diagnosis. Pattern spectrum (PS) and pattern spectrum entropy (PSE) have been successfully applied to feature extraction in recent years, but they tend to miss some of the impulse signatures hidden in bearing vibration data. In this paper, the pattern gradient spectrum (PGS) and pattern gradient spectrum entropy (PGSE) are first presented to improve the fault feature extraction performance of PS and PSE. Nonetheless, PSE and PGSE can evaluate the dynamic behavior of a time series only on a single scale, so feature information at other scales is not considered. To address this problem, a novel approach named multiscale pattern gradient spectrum entropy (MPGSE) is further developed to extract fault features across multiple scales, with its key parameters determined adaptively by grey wolf optimization (GWO). A Laplacian score (LS)-based feature selection strategy is then employed to choose the sensitive features and build a new feature set. Finally, the selected feature set is fed into an extreme learning machine (ELM) to identify different health conditions of rolling bearings. The performance of the proposed algorithm is tested on two experimental cases. The results confirm the effectiveness of the proposed algorithm in feature extraction and show that it can effectively recognize different bearing fault categories and severities. More importantly, the proposed approach achieves higher recognition accuracy and better stability than the other entropy-based methods considered in this paper.
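To make the single-scale versus multiscale distinction concrete, the following is a minimal sketch, not the method of this paper. It assumes the classical pattern spectrum built from morphological openings with a flat structuring element, and it assumes that the multiscale extension uses window-averaging coarse-graining as in conventional multiscale entropy. The gradient-based PGS, the GWO parameter search, the LS selection, and the ELM classifier are not reproduced, and all function names are hypothetical.

```python
import math

def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale` (assumed coarse-graining)."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]

def opening(x, length):
    """Morphological opening of a 1-D signal by a flat structuring element."""
    n = len(x)
    if length > n:
        raise ValueError("structuring element longer than signal")
    # Erosion: minimum of every window that fits entirely inside the signal.
    mins = [min(x[i:i + length]) for i in range(n - length + 1)]
    # Dilation: each sample takes the largest window minimum among windows covering it.
    out = []
    for i in range(n):
        lo = max(0, i - length + 1)
        hi = min(i, n - length)
        out.append(max(mins[lo:hi + 1]))
    return out

def pattern_spectrum(x, max_scale):
    """Classical PS: signal content removed between successive opening sizes."""
    openings = [opening(x, length) for length in range(1, max_scale + 2)]
    return [sum(a_i - b_i for a_i, b_i in zip(a, b))
            for a, b in zip(openings, openings[1:])]

def pattern_spectrum_entropy(x, max_scale):
    """Shannon entropy of the normalized pattern spectrum (single scale)."""
    ps = pattern_spectrum(x, max_scale)
    total = sum(ps)
    if total <= 0:
        return 0.0  # flat signal: no morphological content to distribute
    return -sum((v / total) * math.log(v / total) for v in ps if v > 0)

def multiscale_pse(x, n_scales, max_scale):
    """One entropy value per coarse-graining scale (illustrative feature vector)."""
    return [pattern_spectrum_entropy(coarse_grain(x, s), max_scale)
            for s in range(1, n_scales + 1)]
```

For a signal `x`, `multiscale_pse(x, n_scales, max_scale)` returns one feature per scale; the signal must be long enough that each coarse-grained series has at least `max_scale + 1` samples. In the pipeline described above, such a vector would be the input to feature selection and classification.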