Electrocardiography (ECG) is a very common, non-invasive diagnostic procedure, and its interpretation is increasingly supported by algorithms. Progress in automatic ECG analysis has so far been hampered by a lack of appropriate training datasets and of well-defined evaluation procedures that ensure comparability between algorithms. To alleviate these issues, we present first benchmarking results for the recently published, freely accessible clinical 12-lead ECG dataset PTB-XL, covering tasks that range from different ECG statement prediction tasks to age and sex prediction. Among the investigated deep-learning-based time series classification algorithms, we find that convolutional neural networks, in particular ResNet- and Inception-based architectures, show the strongest performance across all tasks. We find consistent results on the ICBEB2018 challenge ECG dataset and discuss prospects of transfer learning using classifiers pretrained on PTB-XL. These benchmarking results are complemented by deeper insights into the classification algorithm in terms of hidden stratification, model uncertainty, and an exploratory interpretability analysis, which provide connecting points for future research on the dataset. Our results emphasize the prospects of deep-learning-based algorithms in ECG analysis, not only in terms of quantitative accuracy but also in terms of further, clinically equally important quality metrics such as uncertainty quantification and interpretability. With this work, we aim to establish PTB-XL as a resource for structured benchmarking of ECG analysis algorithms and encourage other researchers in the field to join these efforts.