“…Other studies instead addressed whether the segmentations obtained from a specific model (e.g., an HMM) matched the annotations provided by human raters, using different two-step procedures. First, a measure of similarity or distance is defined: for example, the number of matching boundaries between sequences within a pre-specified time window (e.g., 3 seconds; Baldassano et al, 2017; Williams et al, 2022), the point-biserial correlation between different segmentations (Franklin et al, 2020; Zacks et al, 2006), or the Jaccard index between the two time series (e.g., see Cohen et al, 2022; Lee et al, 2021). Second, the selected metric is re-computed many times (e.g., 1000) within a permutation-based approach, shuffling the event boundaries of one of the two sequences in time while keeping both the number of boundaries and the distances between them constant.…”
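
The two-step procedure described in the passage can be illustrated with a minimal sketch. It is not the implementation used in any of the cited studies. The function names (`match_count`, `shuffle_boundaries`, `permutation_test`) are hypothetical, boundaries are assumed to be given as times in seconds, and "keeping the distance between boundaries constant" is read here as permuting the order of the event durations, which is only one possible interpretation (a circular shift would be another).

```python
import random
from itertools import accumulate


def match_count(model_bounds, human_bounds, window=3.0):
    """Count human boundaries that have a model boundary within `window` seconds."""
    if not model_bounds:
        return 0
    return sum(
        1
        for h in human_bounds
        if min(abs(h - m) for m in model_bounds) <= window
    )


def shuffle_boundaries(bounds, duration, rng):
    """Return surrogate boundaries by permuting the order of event durations.

    Assumption: this preserves the number of boundaries and the set of
    inter-boundary distances, but not their original order.
    """
    edges = [0.0] + sorted(bounds) + [duration]
    durations = [b - a for a, b in zip(edges, edges[1:])]
    rng.shuffle(durations)
    # Cumulative sums give the new boundary times; the last one equals
    # `duration` (the end of the recording) and is not a boundary.
    return list(accumulate(durations))[:-1]


def permutation_test(model_bounds, human_bounds, duration,
                     window=3.0, n_perm=1000, seed=0):
    """Compare the observed match count against a shuffled-boundary null."""
    rng = random.Random(seed)
    observed = match_count(model_bounds, human_bounds, window)
    null = [
        match_count(shuffle_boundaries(model_bounds, duration, rng),
                    human_bounds, window)
        for _ in range(n_perm)
    ]
    # One-sided p-value with the usual +1 correction so it is never zero.
    p_value = (1 + sum(n >= observed for n in null)) / (1 + n_perm)
    return observed, null, p_value


if __name__ == "__main__":
    # Made-up boundary times (seconds) for a 100-second stimulus.
    observed, null, p_value = permutation_test(
        model_bounds=[10, 50, 90], human_bounds=[11, 48, 70], duration=100
    )
    print(observed, p_value)
```

Here the model sequence is the one shuffled; shuffling the human annotations instead would fit the description in the passage equally well. The same permutation loop could wrap a point-biserial correlation or a Jaccard index in place of `match_count`.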