Tuberculosis (TB) remains the leading cause of morbidity and mortality from infectious disease in developing countries. Sputum smear microscopy remains the primary diagnostic laboratory test. However, microscopic examination is time‐consuming and tedious. Therefore, an effective computer‐aided image identification system is needed to provide timely assistance in diagnosis. Current identification systems often suffer from complex color variations in the images, resulting in many false object detections. To overcome this problem, we propose a two‐stage Mycobacterium tuberculosis identification system, consisting of candidate detection and classification using convolutional neural networks (CNNs). A refined Faster region‐based CNN was used to detect candidate M. tuberculosis bacilli, and a CNN‐based classifier then identified the true bacilli among these candidates. We first compared three different CNN classifiers: an ensemble CNN, a single‐member CNN, and a deep CNN. The experimental results showed that the ensemble and deep CNNs achieved comparable identification performance when analyzing more than 19,000 images. The proposed system achieved a much higher recall than the conventional pixel‐based support vector machine method for M. tuberculosis bacilli detection.
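
The two‐stage design can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Candidate`, `identify_bacilli`, `detect`, and `classify`, as well as the 0.5 thresholds, are hypothetical placeholders. In the proposed system, the detector role would be played by the refined Faster region‐based CNN and the classifier role by the CNN‐based classifier. The `recall` helper uses the standard definition, true positives divided by the sum of true positives and false negatives.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class Candidate:
    box: Box
    score: float  # detector confidence in [0, 1]


def identify_bacilli(
    image: object,
    detect: Callable[[object], Sequence[Candidate]],
    classify: Callable[[object, Box], float],
    detect_threshold: float = 0.5,    # hypothetical value, not from the paper
    classify_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> List[Box]:
    """Stage 1 proposes candidate boxes; stage 2 keeps those classified as bacilli."""
    # Stage 1: candidate detection (stand-in for the Faster region-based CNN).
    candidates = [c for c in detect(image) if c.score >= detect_threshold]
    # Stage 2: candidate classification (stand-in for the CNN-based classifier).
    return [c.box for c in candidates if classify(image, c.box) >= classify_threshold]


def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no positives at all."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0
```

Passing the two stages as callables keeps the sketch independent of any particular deep learning framework, so either stage can be swapped out, for example to compare different classifiers on the same candidates.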