Abstract: Machines have difficulty using people’s names to link medical and other records pertaining to the same individuals, because of nicknames, ethnic synonyms, truncations, misspellings and typographical errors. Present algorithms for computing the discriminating powers (or ODDS) associated with partial agreements of names are based, inappropriately, on the degree of outward similarity alone. They are particularly ineffective in dealing with names that look alike but are unrelated, and with related names that have little apparent similarity. A fundamentally different rationale is therefore proposed which, like the human mind, assesses the relatedness of two alternative forms of a name in terms of how often they are used interchangeably in practice. This frequency of interchangeable use must be taken into account if the associated discriminating powers (ODDS) are to be computed correctly. A way of implementing this more precise approach is described and illustrated, using the given names on linked records from an earlier epidemiological study. This first of two papers describes the logical basis of the approach; the second reports the empirical test.
Abstract: The preceding paper examined the logical basis of an exact way of calculating the discriminating powers of people’s names when they only partially agree. The method has application to automated file searching and record linkage. The present account describes an empirical test of the approach. Use is made of some 2000 comparison pairs of male given names, obtained as a byproduct of an earlier linkage study. The test shows that exact value-specific ODDS can indeed be calculated for common names when compared with their accepted synonyms (e.g. JOSEPH versus JOE). Moreover, the approach can be extended to include rare variants by arranging these into groups defined in value-specific terms (e.g. as selected blocks in an alphabetically sequenced listing, or combinations of such blocks). A majority of all name comparisons may be handled in this manner. The added precision serves to reduce the number of records that are ambiguously linked and require labour-intensive clerical resolution.
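As a rough illustration of what a value-specific calculation might look like, the sketch below treats the ODDS for a name pair as the ratio of its relative frequency among linked record pairs to its relative frequency among unlinked pairs. That definition, the function name `value_specific_odds`, the treatment of pairs as ordered tuples, and the toy counts are all assumptions made for illustration; they are not taken from the papers, whose actual procedure and data may differ.

```python
from collections import Counter


def value_specific_odds(linked_pairs, unlinked_pairs):
    """Return {(name_a, name_b): odds} for pairs seen in both samples.

    Assumed definition (not confirmed by the abstracts):
        odds = (share of linked pairs with this value)
               / (share of unlinked pairs with this value)
    """
    linked = Counter(linked_pairs)
    unlinked = Counter(unlinked_pairs)
    n_linked = sum(linked.values())
    n_unlinked = sum(unlinked.values())

    odds = {}
    for pair, count in linked.items():
        if unlinked[pair] == 0:
            # No unlinked evidence for this exact pair: too rare for a
            # pair-specific estimate, so it would need to be grouped.
            continue
        # Multiply before dividing to keep the arithmetic exact for ints.
        odds[pair] = (count * n_unlinked) / (unlinked[pair] * n_linked)
    return odds


# Invented toy counts, purely for illustration.
linked = [("JOSEPH", "JOE")] * 3 + [("JOSEPH", "JOSEPH")] * 7
unlinked = [("JOSEPH", "JOE")] * 1 + [("JOSEPH", "JOHN")] * 99

odds = value_specific_odds(linked, unlinked)
print(odds[("JOSEPH", "JOE")])  # 30.0
```

In this toy sample, JOSEPH versus JOE occurs in 3 of 10 linked pairs but only 1 of 100 unlinked pairs, so the pair counts as evidence for a link even though the two spellings differ. Pairs with no unlinked occurrences are skipped here; rare variants of that kind are the ones the abstract suggests pooling into groups before estimating.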