Lexicographic errors that occur during the computer translation of knowledge are repaired with approximate string matching algorithms. This article presents a new algorithm that attempts to repair lexical errors in inorganic chemical names. The algorithm can repair an incorrect string containing one or more errors due to deletion, insertion, or transposition of multiple characters. Repair may be carried out on all or only part of the incorrect string, so the algorithm can also handle strings containing more than one incorrect morpheme. A model is proposed for applying the algorithm to inorganic chemical names; it attempts to generate a set of possible repair strings free from both syntactic and lexicographic errors. Repair is performed by comparing all or part of the error string with the set of terminal symbols, or morphemes, included in the grammar. A new model is also proposed for the calculation of similarity; it is highly discriminating and allows the repair of parts of an error string.
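As an illustration of the kind of comparison described above, the following minimal sketch ranks candidate morphemes by their similarity to an error string. It is not the algorithm or the similarity model proposed in the article: it substitutes the standard optimal string alignment distance (insertion, deletion, substitution, adjacent transposition) and a simple length-normalized similarity. The function names, the threshold, and the candidate list are assumptions made for the example.

```python
from typing import List, Tuple


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "rl" <-> "lr".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: 1.0 for identical strings,
    approaching 0.0 as the edit distance approaches the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def rank_repairs(error: str, morphemes: List[str],
                 threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return (morpheme, similarity) pairs at or above the threshold,
    most similar first. The threshold value is an arbitrary assumption."""
    scored = [(m, similarity(error, m)) for m in morphemes]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical candidate set; a real system would use the grammar's terminals.
CANDIDATES = ["chloride", "chlorate", "sulfide"]
best = rank_repairs("chlroide", CANDIDATES)
```

Here `best[0]` is the pair for `"chloride"`, since the error string differs from it by a single adjacent transposition. Repairing only part of an error string, as the article describes, would require applying such a comparison to substrings, which this sketch does not attempt.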
The development of a system for the repair of lexical errors detected while recognizing inorganic chemical names is described. Repairs are based on the calculation of similarity between strings, using the model and approximate matching algorithm detailed in the previous article of this series. A hierarchical data structure is proposed for storing the information generated during the repair process. The structure comprises two hierarchically related abstract data types, which store the partial and total repairs generated by the process. The large amount of information the system generates makes it a candidate for applications that need explanatory subsystems to report on how successfully user input was recognized. Problems in the repair process arising from the structures of the inorganic grammar are also examined, and solutions are proposed for cases such as the delimiters between significant parts of the inorganic substance name.
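The two hierarchically related data types mentioned above could be sketched as follows. The abstract does not specify their fields or operations, so every name here (`PartialRepair`, `TotalRepair`, `repaired`, `explain`, and all attributes) is a hypothetical choice for illustration, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartialRepair:
    """One repaired fragment of the input (assumed fields)."""
    start: int          # index of the first character replaced
    end: int            # index one past the last character replaced
    original: str       # the erroneous fragment
    replacement: str    # the morpheme proposed in its place
    similarity: float   # similarity score that justified the repair


@dataclass
class TotalRepair:
    """A full repair of the input, built from partial repairs."""
    source: str
    parts: List[PartialRepair] = field(default_factory=list)

    def repaired(self) -> str:
        """Apply the non-overlapping partial repairs from left to right."""
        out, pos = [], 0
        for part in sorted(self.parts, key=lambda p: p.start):
            out.append(self.source[pos:part.start])
            out.append(part.replacement)
            pos = part.end
        out.append(self.source[pos:])
        return "".join(out)

    def explain(self) -> List[str]:
        """One message per partial repair, for an explanatory subsystem."""
        return [
            f"'{p.original}' at {p.start}-{p.end} replaced by "
            f"'{p.replacement}' (similarity {p.similarity:.2f})"
            for p in sorted(self.parts, key=lambda p: p.start)
        ]


# Usage with made-up values: two fragments separated by a space delimiter.
total = TotalRepair(source="sodum chlroide")
total.parts.append(PartialRepair(0, 5, "sodum", "sodium", 0.83))
total.parts.append(PartialRepair(6, 14, "chlroide", "chloride", 0.88))
result = total.repaired()
```

Keeping the partial repairs attached to the total repair preserves the intermediate information that an explanatory subsystem would report, at the cost of storing more data per candidate.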