Abstract. Computer-aided diagnosis (CAD) of cancerous anatomical structures in 3D medical images has emerged as an intensively studied research area. In this paper, we present a principled three-tiered image feature learning approach that captures task-specific, data-driven, class-discriminative statistics from an annotated image database. It integrates voxel-, instance-, and database-level feature learning, aggregation, and parsing. The initial segmentation is performed as robust voxel labeling followed by thresholding. After instance-level spatial aggregation, the extracted features can also be flexibly tuned for classifying lesions or for discriminating among lesion subcategories. We demonstrate the effectiveness of our approach on the lung nodule detection task, handling all solid, partial-solid, and ground-glass nodule types with the same set of learned features. Our hierarchical feature learning framework, extensively trained and validated on large-scale multi-site datasets of 879 CT volumes (510 training, 369 validation), achieves superior performance to other state-of-the-art CAD systems. The proposed method is also shown to be applicable to colonic polyp detection, covering all polyp morphological subcategories, on 770 tagged-prep CT scans from multiple medical sites (358 training, 412 validation).
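To make the voxel-to-instance-to-database flow concrete, the following is a minimal sketch of the three-tiered pipeline as described above, not the paper's actual implementation. All function names are hypothetical, the voxel features are reduced to raw intensity for brevity, and off-the-shelf components (scipy connected-component labeling, a scikit-learn random forest) stand in for the paper's unspecified voxel labeler, aggregation statistics, and database-level classifier.

```python
# Hypothetical sketch of the three-tiered feature learning pipeline.
# Assumptions: binary voxel classifier with predict_proba, intensity-only
# voxel features, simple pooled statistics as the instance descriptor.
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier


def voxel_level_labeling(volume, voxel_clf, threshold=0.5):
    """Tier 1: per-voxel class probabilities, thresholded into a candidate mask."""
    feats = volume.reshape(-1, 1)  # placeholder voxel features (intensity only)
    probs = voxel_clf.predict_proba(feats)[:, 1].reshape(volume.shape)
    return probs, probs > threshold


def instance_level_aggregation(volume, probs, mask):
    """Tier 2: spatially aggregate voxel statistics over each connected candidate."""
    labels, n_candidates = ndimage.label(mask)
    descriptors = []
    for i in range(1, n_candidates + 1):
        region = labels == i
        descriptors.append([
            region.sum(),           # candidate volume (voxel count)
            volume[region].mean(),  # mean intensity
            probs[region].mean(),   # mean voxel probability
            probs[region].max(),    # peak voxel probability
        ])
    return np.asarray(descriptors), labels


def database_level_training(X, y):
    """Tier 3: learn a lesion classifier over instance descriptors pooled
    from all annotated training volumes (X: descriptors, y: lesion labels)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)
    return clf
```

At test time, a new CT volume would pass through `voxel_level_labeling` and `instance_level_aggregation`, and the tier-3 classifier would score each candidate; swapping the descriptor or the final classifier is how such a pipeline could be retuned for lesion subcategory discrimination rather than detection.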