Test smells are bad practices that can reduce test code quality, thus harming software testing goals and maintenance activities. Prior studies have investigated the diffusion of test smells and their impact on test code maintainability. However, the outcomes of these studies cannot be directly compared, as most of them rely on customized datasets. In response, we introduce the TSSM (Test Smells and Structural Metrics) dataset, which contains test smells detected with the JNose Test tool and structural metrics (for both test code and production code) calculated with the CK metrics tool for 13,703 open-source Java systems from GitHub. In addition, we performed an empirical study on this large-scale dataset to investigate the relationship between test smells and structural metrics of test code, as well as the relationships among test smells themselves. We split the projects into three clusters to analyze the distribution of test smells, the co-occurrences among test smells, and the correlation between test smells and structural metrics of test code. The ratio of smelly test classes affected by a specific test smell is similar across the clusters, but we observed a significant difference in the number of test smells among them. The test smells Sleepy Test, Mystery Guest, and Resource Optimism rarely occur in the three clusters, and the latter two are strongly correlated, indicating that these test smells are more severe than others. Our results show that most test smells have a moderate correlation with high complexity, large size, and coupling of the test code, indicating that they can also negatively affect its quality. To support further studies, we have made our dataset publicly available.