An algorithm and data structure are presented for searching a file containing N records, each described by k real-valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that, except for very small files, this algorithm is considerably faster than other methods.

(Submitted to ACM Transactions on Mathematical Software)

Work supported in part by the U.S. Energy Research and Development Administration under contract E(043)515.

The Best Match or Nearest Neighbor Problem

The best match or nearest neighbor problem applies to data files that store records with several real-valued keys or attributes. The problem is to find those records in the file most similar to a query record according to some dissimilarity or distance measure. Formally, given a file of N records (each of which is described by k real-valued attributes) and a dissimilarity measure D, find the m closest records to a query record (possibly not in the file) with specified attribute values.

A data file, for example, might contain information on all cities with post offices. Associated with each city is its longitude and latitude. If a letter is addressed to a town without a post office, the closest town that has a post office might be chosen as the destination.

The solution to this problem is of use in many applications. Information retrieval might involve searching a catalog for those items most similar to a given query item; each item in the file would be cataloged by numerical attributes that describe its characteristics. Classification decisions can be made by selecting prototype features from each category and finding which of these prototypes is closest to the record to be classified.
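The problem statement above can be sketched as a brute-force search over all N records; the function names and the choice of Euclidean distance for the dissimilarity measure D are illustrative assumptions, not part of the paper:

```python
import heapq
import math

def dissimilarity(record, query):
    # One possible choice of D: Euclidean distance over k real-valued keys.
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(record, query)))

def brute_force_best_matches(records, query, m):
    # Examine all N records; O(kN) per query -- the baseline the
    # paper's log N expected-time algorithm improves on.
    return heapq.nsmallest(m, records, key=lambda rec: dissimilarity(rec, query))

# Toy version of the post-office example: (latitude, longitude) pairs.
cities = [(40.7, -74.0), (34.1, -118.2), (41.9, -87.6)]
print(brute_force_best_matches(cities, (39.0, -77.0), 2))
```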
Multivariate density estimation can be performed by calculating the volume about a given point containing the closest m neighbors.

Structures Used for Associative Searching

One straightforward technique for solving the best match or nearest neighbor problem is the cell method. The k-dimensional key space is divided into small, identically sized cells. A spiral search of the cells from any query record will find the best matches of that record. Although this procedure minimizes the number of records examined, it is extremely costly in space and time, especially when the dimensionality of the space is large.

... what is termed an optimized k-d tree.

The Search Algorithm

The k-d tree data structure provides an efficient mechanism for examining only those records closest to the query record, thereby greatly reducing the computation required to find the best matches. The search algorithm is most easily described as a recursive procedure. The argument to the procedure is the node under investigation. The first invocation passes the root of the tree as this argument. Available as a global array ...
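A minimal sketch of the recursive search just described, under assumed names and a simple cyclic median-split tree (the paper's optimized k-d tree instead chooses the split key by spread, and its full ball-within-bounds test is reduced here to a one-coordinate pruning check):

```python
import math

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points, depth=0, k=2):
    # Simple cyclic-key median split; the paper splits on the key with the
    # greatest spread, producing what it terms an optimized k-d tree.
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1, k),
                build(points[mid + 1:], depth + 1, k))

def nearest(node, query, best=None):
    # Recursive search: descend toward the query first, then visit the far
    # subtree only if it could still contain a closer record.
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    if abs(diff) < best[0]:  # pruning test: far side may hold a closer point
        best = nearest(far, query, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))
```

The pruning test is what makes the expected number of records examined independent of N: most subtrees are rejected without being visited.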
We describe two new algorithms for implementing barrier synchronization on a shared-memory multicomputer. Both algorithms are based on a method due to Brooks. We first improve Brooks' algorithm by introducing double buffering. Our dissemination algorithm replaces Brooks' communication pattern with an information dissemination algorithm described by Han and Finkel. Our tournament algorithm uses a different communication pattern and generally requires fewer total instructions. The resulting algorithms improve Brooks' original barrier by a factor of two when the number of processes is a power of two. When the number of processes is not a power of two, these algorithms improve even more upon Brooks' algorithm, because absent processes need not be simulated. These algorithms share with Brooks' barrier the limitation that each of the n processes meeting at the barrier must be assigned identifiers i such that 0 ≤ i < n.
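The dissemination pattern mentioned above can be illustrated by enumerating who signals whom in each round; the round structure (ceil(log2 n) rounds, with process i signaling process (i + 2^r) mod n in round r) follows Han and Finkel's scheme, and the function name is an assumption:

```python
import math

def dissemination_rounds(n):
    # In round r, process i signals process (i + 2**r) mod n.
    # After ceil(log2 n) rounds every process has transitively heard from
    # every other process, so all n have reached the barrier -- and when n
    # is not a power of two, no absent processes need to be simulated.
    return [[(i, (i + 2 ** r) % n) for i in range(n)]
            for r in range(math.ceil(math.log2(n)))]

for r, pairs in enumerate(dissemination_rounds(5)):
    print("round", r, pairs)
```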
Like the numbers in a sudoku puzzle, a lexeme's principal parts provide enough information, but only enough, to deduce all of the remaining forms in its paradigm. Because principal parts are a distillation of the implicative relations that exist among the members of a lexeme's paradigm, they afford an important (but heretofore neglected) basis for typological classification. We recognize three logically distinct sorts of principal-part systems that might be postulated for a given language: static, adaptive, and dynamic. Focusing for present purposes on dynamic systems, we propose five crosscutting criteria for the typological classification of principal-part systems. These criteria relate to (i) how many principal parts are needed to determine a lexeme's paradigm; (ii) whether distinct lexemes possess parallel sets of principal parts; (iii) how many principal parts are needed to determine a given word in a lexeme's paradigm; (iv) what sort of morphological relation exists between a principal part and the forms that it is used to deduce; and (v) whether lexemes' nonprincipal parts are inferred from their principal parts in the same way from one inflection class to another. Drawing on these criteria, we propose a novel classification of a range of typologically diverse languages.
DIB is a general-purpose package that allows a wide range of applications such as recursive backtrack, branch and bound, and alpha-beta search to be implemented on a multicomputer. It is very easy to use. The application program needs to specify only the root of the recursion tree, the computation to be performed at each node, and how to generate children at each node. In addition, the application program may optionally specify how to synthesize values of tree nodes from their children's values and how to disseminate information (such as bounds) either globally or locally in the tree. DIB uses a distributed algorithm, transparent to the application programmer, that divides the problem into subproblems and dynamically allocates them to any number of (potentially nonhomogeneous) machines. This algorithm requires only minimal support from the distributed operating system. DIB can recover from failures of machines even if they are not detected. DIB currently runs on the Crystal multicomputer at the University of Wisconsin-Madison. Many applications have been implemented quite easily, including exhaustive traversal (N queens, knight's tour, negamax tree evaluation), branch and bound (traveling salesman) and alpha-beta search (the game of NIM). Speedup is excellent for exhaustive traversal and quite good for branch and bound.
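The application interface described above (a root, a per-node computation, and a child generator, with an optional value synthesizer) can be sketched as a sequential driver; DIB itself distributes the subtrees across machines, and all names here are illustrative assumptions:

```python
def traverse(root, compute, children, synthesize=None):
    # Sequential stand-in for DIB's distributed driver: visit a node,
    # recurse on its children, and optionally synthesize a value for the
    # parent from the children's values (as in negamax tree evaluation).
    compute(root)
    vals = [traverse(c, compute, children, synthesize) for c in children(root)]
    return synthesize(root, vals) if synthesize else None

# Toy use: exhaustive traversal of a 4-queens backtrack tree.
def queens_children(rows):
    n = 4
    if len(rows) == n:
        return []
    return [rows + (c,) for c in range(n)
            if all(c != r and abs(c - r) != len(rows) - i
                   for i, r in enumerate(rows))]

solutions = []
traverse((), lambda rows: solutions.append(rows) if len(rows) == 4 else None,
         queens_children)
print(len(solutions))
```

In DIB, the recursion tree this driver walks is what gets carved into subproblems and handed to other machines.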