Hematoma expansion (HE) is an independent predictor of poor outcomes and a modifiable treatment target in intracerebral hemorrhage (ICH). Evaluating HE requires segmentation of hematomas on admission and follow-up CT scans, a process that is time-consuming and labor-intensive in large-scale studies. Automated hematoma segmentation can expedite this process; however, segmentation errors on the admission and follow-up scans accumulate and can hamper accurate HE classification. In this study, we combined a tandem deep-learning classification model with automated segmentation to generate probability measures for false HE classifications. This strategy limits expert review of automated hematoma segmentations to a subset of the dataset, selected according to the research team’s preferred sensitivity or specificity threshold and its tolerance for false-positive versus false-negative results. We used three separate multicenter cohorts (n = 2261) for cross-validation/training, internal testing, and external validation to develop and test a pipeline for automated hematoma segmentation and to generate ground truth binary HE annotations (≥3, ≥6, ≥9, and ≥12.5 mL). Applying a 95% sensitivity threshold for HE classification proved to be a practical and efficient strategy for HE annotation in large ICH datasets. Across the different HE definitions, this threshold excluded 47–88% of test-negative predictions from expert review of automated segmentations, with less than 2% false-negative misclassification in both the internal and external validation cohorts. Our pipeline offers a time-efficient and optimizable method for generating ground truth HE classifications in large ICH datasets, reducing the burden of expert review of automated hematoma segmentations while minimizing the misclassification rate.
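The thresholding idea described above can be illustrated with a minimal sketch. It is not the study's implementation. It assumes that each admission/follow-up scan pair comes with a model probability of HE and, in a labeled validation set, a ground-truth HE label. It also assumes that HE is defined as absolute volume growth at or above the listed cutoff. All function names, and the choice of reporting fractions over all scan pairs, are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Sequence

# HE cutoffs in mL as listed in the abstract; treating them as absolute
# growth (follow-up minus admission volume) is an assumption of this sketch.
HE_CUTOFFS_ML = (3.0, 6.0, 9.0, 12.5)


def he_labels(admission_ml: float, followup_ml: float) -> Dict[float, bool]:
    """Return a binary HE label for each cutoff from two hematoma volumes."""
    growth = followup_ml - admission_ml
    return {cutoff: growth >= cutoff for cutoff in HE_CUTOFFS_ML}


def threshold_for_sensitivity(
    probs: Sequence[float], labels: Sequence[bool], target: float = 0.95
) -> float:
    """Highest probability cutoff that still flags >= target of true HE cases."""
    positives = sorted((p for p, y in zip(probs, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one HE-positive case")
    # Smallest number of positives that must be flagged to reach the target;
    # the small epsilon guards against floating-point round-up.
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


def review_summary(
    probs: Sequence[float], labels: Sequence[bool], threshold: float
) -> Dict[str, float]:
    """Summarize what a cutoff implies for expert review.

    Scan pairs with probability >= threshold are flagged for review; the
    rest are test-negative and skipped. Fractions are taken over all scan
    pairs, which is an assumption of this sketch.
    """
    flagged = [p >= threshold for p in probs]
    n_total = len(probs)
    n_pos = sum(1 for y in labels if y)
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y)
    false_neg = sum(1 for f, y in zip(flagged, labels) if (not f) and y)
    test_neg = sum(1 for f in flagged if not f)
    return {
        "sensitivity": true_pos / n_pos,
        "excluded_fraction": test_neg / n_total,
        "false_negative_fraction": false_neg / n_total,
    }


if __name__ == "__main__":
    # Toy data only; these numbers are not from the study.
    toy_probs = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05, 0.4, 0.15]
    toy_labels = [True, True, True, True, False, False, False, False, False, False]
    cutoff = threshold_for_sensitivity(toy_probs, toy_labels, target=0.95)
    print(cutoff, review_summary(toy_probs, toy_labels, cutoff))
```

Lowering `target` raises the cutoff, so more scan pairs are excluded from review at the cost of more false negatives. This is the sensitivity-versus-workload trade-off that a research team would tune.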