Background
Although the incidence of tuberculosis (TB) has dropped substantially, it remains a serious threat to human health. In recent years, the emergence of resistant bacilli and inadequate disease control and prevention have led to a significant rise in the global TB epidemic. TB is caused by Mycobacterium tuberculosis infection, but it is not clear why some infected patients develop active disease while others remain latent.

Methods
We analyzed the blood gene expression profiles of 69 latent TB patients and 54 active pulmonary TB patients from the GEO (Gene Expression Omnibus) database.

Results
By applying minimal redundancy maximal relevance and incremental feature selection, we identified 24 signature genes that can predict TB activation. The support vector machine predictor based on these 24 genes had a sensitivity of 0.907, a specificity of 0.913, and an accuracy of 0.911. Although these genes need to be validated in a large independent dataset, the biological analysis of the 24 genes showed great promise.

Conclusion
We found that cytokine production was a key process during TB activation and that genes such as CYBB, TSPO, CD36, and STAT1 are worth further investigation.
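
As a reading aid for the reported performance figures, the sketch below shows how sensitivity, specificity, and accuracy relate to the cohort sizes given in the Methods (54 active, 69 latent), treating active TB as the positive class. The counts of correctly classified patients (49 active, 63 latent) are not stated in the text; they are an assumption, back-calculated as the only whole-number counts that reproduce the reported rates. The function name `classification_metrics` is illustrative only.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of active cases detected
    specificity = tn / (tn + fp)                 # fraction of latent cases detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all cases correct
    return sensitivity, specificity, accuracy


# Cohort sizes taken from the Methods.
n_active, n_latent = 54, 69

# ASSUMPTION: counts inferred from the reported rates, not given in the source.
tp = 49  # active patients predicted as active
tn = 63  # latent patients predicted as latent

sens, spec, acc = classification_metrics(
    tp=tp, fn=n_active - tp, tn=tn, fp=n_latent - tn
)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Under these assumed counts, the three values round to the reported 0.907, 0.913, and 0.911, which suggests the figures are mutually consistent for a cohort of 123 patients.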