Record linkage is the process of identifying records from different data sources that refer to the same entities. While most research efforts are concerned with linking individual records, new approaches have recently been proposed to link groups of records across databases. Group record linkage aims to determine whether two groups of records in two databases refer to the same entity. One application where group record linkage is of high importance is the linking of census data that contain household information across time. In this paper we propose a novel method for group record linkage based on multiple instance learning. Our method treats group links as bags and individual record links as instances. We extend multiple instance learning from bag classification to instance classification in order to reconstruct bags from candidate instances. The classified bag and instance samples lead to a significant reduction in multiple group links (a single group linked to several groups in the other database), thereby improving the overall quality of the linked data. We evaluate our method with both synthetic data and real historical census data.
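To make the bag/instance vocabulary concrete, here is a minimal sketch, assuming candidate household links (bags) each hold similarity scores for their record pairs (instances). It is not the authors' method. A fixed threshold stands in for a learned instance classifier, the standard multiple instance assumption (a bag is positive if at least one instance is positive) labels bags, and a greedy one-to-one pass removes multiple group links. The household identifiers, scores, threshold, and function names are all hypothetical.

```python
# Hypothetical candidate group links (bags) between 1871 and 1881 households.
# Each bag holds similarity scores in [0, 1] for its record pairs (instances).
candidate_bags = {
    ("h1_1871", "h7_1881"): [0.92, 0.85, 0.40],
    ("h1_1871", "h9_1881"): [0.82, 0.30, 0.10],
    ("h2_1871", "h9_1881"): [0.88, 0.81],
    ("h3_1871", "h4_1881"): [0.20, 0.35],
}

INSTANCE_THRESHOLD = 0.8  # assumed cut-off standing in for a trained classifier


def classify_instances(scores, threshold=INSTANCE_THRESHOLD):
    """Label each record pair (instance) as a match (True) or a non-match."""
    return [s >= threshold for s in scores]


def bag_score(scores, threshold=INSTANCE_THRESHOLD):
    """Standard MI assumption: a bag is positive if any instance is positive.
    Positive bags are ranked by the fraction of matching instances."""
    labels = classify_instances(scores, threshold)
    return sum(labels) / len(labels) if any(labels) else 0.0


def resolve_one_to_one(bags):
    """Greedily keep the highest-scoring positive bags so that every group
    in either database takes part in at most one group link."""
    scored = sorted(((bag_score(s), pair) for pair, s in bags.items()), reverse=True)
    used_left, used_right, kept = set(), set(), []
    for score, (left, right) in scored:
        if score == 0.0 or left in used_left or right in used_right:
            continue  # negative bag, or a group that is already linked
        kept.append((left, right, score))
        used_left.add(left)
        used_right.add(right)
    return kept


if __name__ == "__main__":
    for left, right, score in resolve_one_to_one(candidate_bags):
        print(left, right, round(score, 2))
```

In this toy data, household `h1_1871` is a positive bag in two candidate links. The greedy pass keeps only its stronger link to `h7_1881`, because `h9_1881` has already been claimed by the fully matching `h2_1871` bag. A real system would replace the threshold with a trained instance classifier and might use an optimal assignment instead of a greedy one.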
In this paper a system will be described that enables the design of a series of computer programs ("modules") for the automatic treatment of homography. By homography we mean the ambiguity that arises when word forms belonging to different syntactic classes share the same graphic representation. By automatic treatment of homography we mean assigning the correct syntactic class to homographs with the help of an algorithm. Homography is a problem connected with written text only. In our opinion it is not a linguistic problem, but a problem of information processing using linguistic and extralinguistic signs. As linguistic signs, the syntactic rules of the so-called surface structure are used. They are used from two different points of view: the string properties are used to shape the algorithm, and the constituent structure is used to determine which branch of the algorithm is taken. Thus the design of the system is more or less comparable with Winograd's model for semantic information processing or Salkoff's String Grammar (Salkoff, 1973). However, we are not developing a model for parsing structures. The paper is devoted to homography because the systems developed in the past neglected this kind of linguistic information processing. In addition, the output of homography-solving programs forms useful input for a variety of sciences studying aspects of language use. Not only purely linguistic studies could use text input free from homography; literary, sociological, and even legal text studies, as well as question answering systems using databases (Berry-Rogghe, 1978; Kragelöh; Lockemann), also need such input. A more detailed description will be given of a concrete problem: the definition of the information that should be computed to disambiguate the homography of the German determiner. We developed a linear bounded automaton, because this proved to be the most efficient way to solve the problem of homography.
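As an illustration of what "disambiguating the homography of the German determiner" involves, consider that forms such as *der*, *die*, and *das* can be articles, relative pronouns, or demonstrative pronouns. The sketch below is a toy heuristic, not the authors' linear bounded automaton. It only inspects one token to the left and one to the right. It assumes the input is already split into (word, tag) pairs, with the surrounding words carrying hypothetical tags (`N`, `ADJ`, `ADV`, `V`, `PUNCT`) and the determiner forms left untagged (`None`). The output labels `ART`, `REL`, and `DEM` are likewise illustrative.

```python
# Determiner forms that may be article or pronoun (a simplified, assumed list).
DETERMINER_FORMS = {"der", "die", "das", "den", "dem", "des"}
NOMINAL_TAGS = {"N", "ADJ"}  # hypothetical tags for nouns and adjectives


def disambiguate(tokens):
    """tokens: list of (word, tag) pairs; determiner forms carry tag None.
    Returns a new list in which each determiner form is labelled
    ART (article), REL (relative pronoun) or DEM (demonstrative pronoun)."""
    result = []
    for i, (word, tag) in enumerate(tokens):
        if word.lower() not in DETERMINER_FORMS:
            result.append((word, tag))
            continue
        next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else None
        prev_word = tokens[i - 1][0] if i > 0 else None
        if next_tag in NOMINAL_TAGS:
            label = "ART"  # followed by an adjective or noun: opens a noun phrase
        elif prev_word == ",":
            label = "REL"  # after a comma, not followed by a noun: opens a clause
        else:
            label = "DEM"  # stands on its own: demonstrative pronoun
        result.append((word, label))
    return result


# "Der Mann, der dort steht, kennt das." (The man who is standing there knows that.)
sentence = [
    ("Der", None), ("Mann", "N"), (",", "PUNCT"), ("der", None),
    ("dort", "ADV"), ("steht", "V"), (",", "PUNCT"), ("kennt", "V"),
    ("das", None), (".", "PUNCT"),
]

if __name__ == "__main__":
    for word, label in disambiguate(sentence):
        if word.lower() in DETERMINER_FORMS:
            print(word, label)
```

The three occurrences in the example sentence are labelled article, relative pronoun, and demonstrative pronoun respectively. The one-token window is exactly what makes this heuristic fragile: in *die Frau, die schöne Lieder singt* the second *die* is a relative pronoun but is followed by an adjective, so it would be mislabelled as an article. Resolving such cases requires inspecting a larger stretch of the clause, as the approach described above does.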