Ultrasound (US)-based classification systems stratify thyroid nodules by their risk of malignancy. This systematic review aimed to assess the evidence for the performance of US-based thyroid nodule classification systems by correlating their results with fine needle aspiration biopsy (FNAB) findings. PubMed and Scopus were searched using keywords that included 'ultrasound classification', 'thyroid nodules', 'fine needle aspiration', and 'malignancy'. Studies and reviews reporting on US imaging for the classification of thyroid nodules were included. Publications were excluded if they did not compare US imaging findings with histology reports based on FNAB, or if no full English text was available or accessible. The database searches identified 66 publications, of which 12 studies met the inclusion criteria after evaluation. Two US-based classification systems for thyroid nodules were assessed: the Thyroid Imaging Reporting and Data System (TIRADS) and the American Thyroid Association (ATA) guidelines. For TIRADS, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) ranged from 70.6% to 97.4%, 29.3% to 90.4%, 23.3% to 64.3%, and 87.1% to 99.0%, respectively. The median sensitivity, specificity, PPV, and NPV for TIRADS were 90.0%, 57.4%, 49.0%, and 91.0%, respectively. One study comparing TIRADS with the ATA guidelines demonstrated that TIRADS was superior in terms of sensitivity, whereas the ATA guidelines were superior in terms of specificity and PPV. The high sensitivity and NPV of the US-based TIRADS classification system give it excellent utility for detecting malignant nodules and for predicting the absence of malignant disease. The paucity of studies assessing the ATA guidelines highlights the need for further research comparing TIRADS with other thyroid nodule classification systems.
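For readers less familiar with the four accuracy measures reported above, the following minimal sketch shows how sensitivity, specificity, PPV, and NPV are derived from a 2x2 table that compares a US-based classification against the reference result. The class name `TwoByTwo`, the function name `diagnostic_metrics`, and all counts are hypothetical and chosen purely for illustration; they are not data from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a US classification with the reference result."""
    tp: int  # US suspicious, reference malignant
    fp: int  # US suspicious, reference benign
    fn: int  # US not suspicious, reference malignant
    tn: int  # US not suspicious, reference benign


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Return the four standard diagnostic accuracy measures as fractions."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),  # malignant nodules correctly flagged
        "specificity": t.tn / (t.tn + t.fp),  # benign nodules correctly cleared
        "ppv": t.tp / (t.tp + t.fp),          # flagged nodules that are malignant
        "npv": t.tn / (t.tn + t.fn),          # cleared nodules that are benign
    }


# Hypothetical counts for illustration only (not taken from the review).
example = TwoByTwo(tp=90, fp=170, fn=10, tn=230)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, sensitivity and NPV are high while specificity and PPV are modest, which is the same qualitative pattern as the median TIRADS values. PPV and NPV also depend on how common malignancy is in the sample, so they are less directly comparable between study populations than sensitivity and specificity.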