Recent interest in applying deep learning to seismic interpretation tasks, such as facies classification, has faced a significant obstacle: the absence of large, publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes or use different train and test splits. In addition, papers that apply machine learning to facies classification commonly report no quantitative results and rely solely on visual inspection. These practices have led to subjective results and have greatly hindered the ability to compare different machine learning models against each other and to understand the advantages and disadvantages of each approach. To address these issues, we open-source a fully annotated 3D geological model of the Netherlands F3 Block. This model is based on the study of the 3D seismic data together with 26 well logs, and is grounded in a careful study of the geology of the region. Furthermore, we propose two baseline models for facies classification based on a deconvolution network architecture and make their code publicly available. Finally, we propose a scheme for evaluating different models on this dataset, and we share the results of our baseline models. In addition to making the dataset and the code publicly available, this work helps advance research in this area by creating an objective benchmark for comparing the results of different machine learning approaches to facies classification.