As an important machine component, the gearbox is widely used in industry for power transmission. Condition monitoring (CM) of a gearbox is critical for providing timely information on which to base necessary maintenance actions. Extensive research effort has been devoted over the last two decades to developing vibration-based techniques. However, vibration-based methods usually have several inherent shortcomings, including contact measurement, localized information, noise contamination, and high computational cost, which make it difficult for them to serve as a cost-effective CM technique. In this paper, infrared thermal (IRT) images, which can capture information over a large area and can be acquired remotely, are used as the basis for developing a cost-effective CM method. Moreover, a convolutional neural network (CNN) is employed to process the raw IRT images automatically and obtain more comprehensive feature parameters, which avoids the incomplete information that results from the various feature-extraction methods used in vibration analysis. Thus, an IRT–CNN method is developed to achieve online remote monitoring of a gearbox. A performance evaluation on a bevel gearbox shows that the proposed method can achieve nearly 100% accuracy in identifying several common gear faults, such as tooth pitting, cracks, breakage, and their combinations. The method is also especially robust to changes in ambient temperature. In addition, the IRT-based method significantly outperforms its vibration-based counterparts.