Recent work suggests the ThinPrep method can improve diagnostic sensitivity and accuracy in bile duct brushings. However, the proportion of atypical and suspicious diagnoses remains high. The aim of this study was to identify the most useful morphologic features in ThinPrep bile duct cytology and to evaluate interobserver reliability. We evaluated 100 bile duct brushings prepared by ThinPrep, all with either histology or long-term clinical follow-up (55 malignant, 45 benign). Morphologic features were evaluated by four experienced cytopathologists blinded to clinical information and follow-up diagnoses. These features included cellularity, blood or diathesis, mitoses, inflammation, three-dimensional groups, discohesive atypical cells, macronucleoli, well-defined cytoplasmic borders, and nuclear features of malignancy (nuclear membrane irregularity, chromatin clumping). The data were analyzed by intraclass correlation (ICC) and stepwise multiple logistic regression. Reviewers showed unanimous agreement in 29% of cases, one degree of disagreement in 58% of cases, and full disagreement in 13% of cases. Of benign cases, 38% were thought to be diagnostic of malignancy by at least one of the four reviewers. Sensitivity of the morphologic parameters ranged from 18% to 67%; the highest sensitivities were for discohesive atypical cells, well-defined cytoplasmic borders, nuclear features of malignancy, and cellularity (67%, 62%, 51%, and 46%, respectively). Specificity of the parameters ranged from 16% to 100%; the highest specificities were for mitoses, three-dimensional groups, nuclear features of malignancy, and macronucleoli (100%, 98%, 93%, and 93%, respectively). Interobserver reliability (ICC) was very good for specimen cellularity (0.72) and nuclear features of malignancy (0.60). In logistic regression analysis, only nuclear features of malignancy and increasing patient age distinguished benign from malignant cases.
On ThinPrep bile duct brushings, nuclear features of malignancy are the most useful in distinguishing benign from malignant cases, and interobserver reliability for this parameter is very good. Discohesive atypical cells show moderate sensitivity and specificity, while three-dimensional clusters and macronucleoli are specific but not sensitive for malignancy and are not significant in multivariate logistic regression models. The relatively high proportion of benign cases thought to be diagnostic of malignancy by at least one reviewer argues for a consensus approach to this diagnosis.
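
For readers less familiar with the per-feature statistics reported above, the following is a minimal sketch of how sensitivity and specificity are conventionally computed for a single morphologic feature from a 2×2 table of feature presence against final diagnosis. The group totals match the 55 malignant and 45 benign cases stated above, but the per-feature counts are hypothetical and chosen only for illustration; they are not data from this study. The function names are also assumptions, and how the study pooled the four reviewers' calls is not described here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of malignant cases in which the feature was recorded as present."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no malignant cases supplied")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of benign cases in which the feature was recorded as absent."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no benign cases supplied")
    return true_neg / total


# Hypothetical counts for one feature (NOT taken from the study).
present_in_malignant = 30  # true positives
absent_in_malignant = 25   # false negatives (30 + 25 = 55 malignant cases)
present_in_benign = 5      # false positives
absent_in_benign = 40      # true negatives (5 + 40 = 45 benign cases)

sens = sensitivity(present_in_malignant, absent_in_malignant)
spec = specificity(absent_in_benign, present_in_benign)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A feature such as mitoses, reported above as 100% specific but not among the most sensitive, corresponds to a table with no false positives but many false negatives: its presence strongly supports malignancy, while its absence is uninformative.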