Cases of laryngeal cancer are rising, and diagnosis often involves invasive biopsy procedures. An alternative approach is to identify high-risk patients by analysing voice recordings, which can alert clinical teams to patients who need prioritisation. We propose a pipeline for evaluating speech classifier performance in the presence of noise, and we use it to run experiments with several classifiers and denoising techniques. The random forest classifier performed best, with an accuracy of 81.2% on clean data, dropping to 63.8% when noise was added to the recordings. Added noise reduced the accuracy of all classifiers; signal denoising improved accuracy but could not fully reverse the effects of noise. The effect of noise on classification is a complex issue that must be resolved before these detection systems can be implemented in clinical practice. We show that the proposed pipeline allows classifier performance to be evaluated in the presence of noise.
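
As an illustration of the noise-addition stage of such a pipeline, the following is a minimal sketch of mixing noise into a recording at a controlled signal-to-noise ratio. The abstract does not state the noise type or levels used, so the white Gaussian noise, the 10 dB target, the synthetic tone standing in for a voice recording, and the function names `add_noise_at_snr` and `measured_snr_db` are all assumptions made for this example.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return the signal plus white Gaussian noise scaled to a target SNR in dB.

    Assumption: white Gaussian noise; the source does not specify the noise type.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the noise power so that 10*log10(signal_power / noise_power) == snr_db.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of a noisy signal relative to its clean version."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((y - s) ** 2 for s, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a voice recording: 1 s of a 220 Hz tone at 8 kHz.
clean = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(8000)]
noisy = add_noise_at_snr(clean, snr_db=10.0)
```

In an evaluation of this kind, the same classifier would be scored on `clean`, on `noisy`, and on a denoised version of `noisy`, so that the three accuracies can be compared directly.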