This paper analyzes and evaluates the performance of different classification algorithms on medical datasets and offers guidelines for choosing a suitable algorithm for a given dataset. For this purpose, twelve medical datasets were selected from the UCI repository: breast cancer, chronic kidney disease, cryotherapy, hepatitis, immunotherapy, Indian Liver Patient Dataset (ILPD), liver disorders, Pima diabetes, cervical cancer risk factors, Statlog (heart), dermatology, and lung cancer. Nine classifiers were selected for evaluation: Bagging, IBK, J48, JRip, Multilayer Perceptron (MP), Naïve Bayes (NB), Support Vector Machine (SVM), ZeroR, and Voting Feature Intervals (VFI). The algorithms were applied to the datasets using the WEKA tool and evaluated with 10-fold cross-validation. Experimental results and analysis for all evaluation measures are presented to guide researchers on what to aim for and how to balance the different measures. Challenges and future directions are presented at the end of the paper.
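The evaluation protocol named above, 10-fold cross-validation, can be illustrated with a minimal sketch. It is not the WEKA implementation used in the study: it uses plain shuffled folds rather than stratified folds, the function names (`zero_r_fit`, `k_fold_indices`, `cross_validate_zero_r`) are hypothetical, and the labels are synthetic rather than taken from any of the twelve datasets. The classifier shown is a ZeroR-style baseline, which always predicts the majority class of the training fold.

```python
from collections import Counter
import random


def zero_r_fit(labels):
    """ZeroR-style baseline: return the most frequent class in the training labels."""
    return Counter(labels).most_common(1)[0][0]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate_zero_r(labels, k=10, seed=0):
    """Estimate accuracy by training on k-1 folds and testing on the held-out fold."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = zero_r_fit(train_labels)
        correct += sum(1 for i in test_idx if labels[i] == majority)
    return correct / len(labels)


# Synthetic, illustrative labels only (assumption: a 70/30 class split).
labels = ["benign"] * 70 + ["malignant"] * 30
folds = k_fold_indices(len(labels))
accuracy = cross_validate_zero_r(labels)
print(f"ZeroR 10-fold accuracy: {accuracy:.2f}")
```

Because a ZeroR-style baseline ignores all attributes, its cross-validated accuracy tracks the majority-class proportion, which makes it a useful floor when comparing the other classifiers on imbalanced medical data.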