“…Two name instances in a block are compared for similarity over these four features as follows. Each name string is lower‐cased, converted to ASCII, and segmented into an array of character 2–4‐grams, following several studies (Han, Xu, Zha, & Giles; Kim & Kim; Kim, Kim, & Owen‐Smith; Louppe et al.; Treeratpituk & Giles). For example, “Mark” is converted into the list “ma,” “ar,” “rk,” “mar,” “ark,” and “mark.” The cosine similarity between the term‐frequency (TF) vectors of the two name instances' 2–4‐gram lists is then calculated as the forename similarity score for the instance pair.…”
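
The forename comparison described in the excerpt can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`to_ascii_lower`, `char_ngrams`, `tf_cosine`) are hypothetical, and the ASCII conversion step is assumed to mean stripping diacritics via Unicode NFKD decomposition, which the excerpt does not specify.

```python
import math
import unicodedata
from collections import Counter
from typing import List


def to_ascii_lower(name: str) -> str:
    """Lower-case a name and drop non-ASCII marks (assumed ASCII conversion)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


def char_ngrams(name: str, n_min: int = 2, n_max: int = 4) -> List[str]:
    """Return all character n-grams of length n_min..n_max, shortest first."""
    text = to_ascii_lower(name)
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tf_cosine(name_a: str, name_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of the n-gram lists."""
    tf_a = Counter(char_ngrams(name_a))
    tf_b = Counter(char_ngrams(name_b))
    # Counter returns 0 for n-grams that do not occur in the other name.
    dot = sum(count * tf_b[gram] for gram, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a name too short to yield any n-gram matches nothing
    return dot / (norm_a * norm_b)


print(char_ngrams("Mark"))
print(round(tf_cosine("Mark", "Marc"), 3))
```

Under these assumptions, "Mark" and "Marc" each produce six n-grams and share three of them ("ma", "ar", "mar"), so the score is 3 / (√6 · √6) = 0.5. Names shorter than two characters produce no n-grams, and the sketch returns 0 for them; how the original study treats such cases is not stated in the excerpt.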