Gear fault signals under different working conditions are non-linear and non-stationary, which makes it difficult to distinguish faulty signals from normal ones. Currently, gear fault diagnosis under different working conditions is mainly based on vibration signals. However, vibration signal acquisition is limited by its requirement for contact measurement, and vibration signal analysis methods rely heavily on diagnostic expertise and prior knowledge of signal processing technology. To address these limitations, this paper proposes a novel acoustic-based diagnosis (ABD) method for gear fault diagnosis under different working conditions, built on a multi-scale convolutional learning structure and an attention mechanism. The multi-scale convolutional learning structure is designed to automatically mine features at multiple scales from raw acoustic signals using different filter banks. The novel attention mechanism, which is built on the multi-scale convolutional learning structure, then adaptively allows the multi-scale network to focus on relevant fault pattern information under different working conditions. Finally, a stacked convolutional neural network (CNN) model is proposed to detect the fault mode of gears. The experimental results show that the proposed method achieves substantially better performance in acoustic-based gear fault diagnosis under different working conditions than a standard CNN model (without an attention mechanism), an end-to-end CNN model based on time- and frequency-domain signals, and other traditional fault diagnosis methods involving feature engineering.
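
The idea of weighting multi-scale convolutional features with an attention mechanism can be illustrated with a toy sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration only: the kernel widths (3, 9, 27), the random untrained filters, the global-average pooling, the softmax scoring, and the synthetic sine-plus-noise signal standing in for a raw acoustic frame. It is not the model proposed in the paper.

```python
import math
import random


def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding (CNN-style cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_attention(signal, kernels, scoring_weights):
    """Return (fused_feature, attention_weights) for one raw signal.

    Each kernel is one scale branch. Its ReLU response is global-average
    pooled into a scalar descriptor, and a softmax over the scored
    descriptors gives one attention weight per scale.
    """
    descriptors = []
    for kernel in kernels:
        response = relu(conv1d_valid(signal, kernel))
        descriptors.append(sum(response) / len(response))
    scores = [w * d for w, d in zip(scoring_weights, descriptors)]
    attention = softmax(scores)
    fused = sum(a * d for a, d in zip(attention, descriptors))
    return fused, attention


rng = random.Random(0)
# Hypothetical stand-in for a raw acoustic frame: a tone plus Gaussian noise.
signal = [
    math.sin(2 * math.pi * 0.05 * n) + 0.1 * rng.gauss(0, 1) for n in range(256)
]
# Hypothetical filter bank: one random (untrained) kernel per scale.
kernels = [[rng.uniform(-1, 1) for _ in range(width)] for width in (3, 9, 27)]
# Hypothetical scoring weights; in a trained network these would be learned.
scoring_weights = [1.0, 1.0, 1.0]

fused, attention = multi_scale_attention(signal, kernels, scoring_weights)
```

In a trained network the kernels and scoring weights would be learned jointly, so the attention weights could shift toward whichever scale carries the most fault-relevant information for a given working condition. Here they only demonstrate the data flow from raw signal to a single fused feature.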