SUMMARY
The aim of the paper is to propose adequate models for describing dictionary optimization, together with appropriate solutions to it. A dictionary intended for human users is (as with any human artefact) full of errors and inconsistencies that a computer, as an end user, cannot handle. It is therefore necessary to check and optimize the contents. Using 'pairwise optimal clustering' approaches, similarity aggregation and the new quadri-decomposition methods, we show how these techniques can be adapted to handle linguistic data in two configurations: a dictionary of synonyms and a bilingual dictionary. On these real-life applications (our processed dictionary of synonyms alone corresponds to more than 2.5 billion matrix cells), we describe how different aspects of such a method can serve as a good tool for designing new dictionaries for both applications.
KEY WORDS Optimal clustering; Relational analysis; Clustering algorithms; Consensus theory; Linguistics; Lexicography
NATURAL LANGUAGE PROCESSING AND LEXICOGRAPHY
Over the past twenty years, computerized processing in linguistics has led to the creation of a new research field: computational linguistics. Lexicon, syntax and semantics have all been the subject of much research. Within the wider context of the language industries, dictionaries play a fundamental part as an obligatory component of any system: speech synthesis, speech recognition, text processing and advanced office systems, computerized publishing, text-generation systems, natural language interfaces, computer-aided translation, etc., all use information stored in dictionaries dedicated to the specific needs of these applications. It is now possible to consider computational lexicography as a research field in its own right.
COMPUTATIONAL LEXICOGRAPHY
Relationships between dictionaries and computers are widely illustrated.
SUMMARY
After a short methodological presentation of similarity aggregation in automatic classification, we will present an application to computational linguistics. Starting from an existing dictionary of synonyms, we will try to explain how we have (a) defined what was, in our opinion, the meaning of the synonymous relation we wanted to reveal in a new optimized dictionary, (b) transformed the existing dictionary into a sequence of matrices of synonymy, (c) checked with an adapted algorithm (similarity aggregation technique) whether the links appearing in the existing dictionary corresponded to our synonymy definition, and (d) tried to improve the synonymous relation, in order to propose more accurate data facilitating the management of a new dictionary and providing a classification of synonyms according to a semic separate valuation.
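As an illustration of steps (b) and (c), the following Python sketch turns a small dictionary of synonyms into a binary synonymy matrix, lists the links that are not reciprocated, and groups words with a simple greedy heuristic. It is only a sketch under stated assumptions, not the algorithm of the paper: the toy dictionary, the function names, the symmetrisation rule (two words count as linked if either lists the other) and the +1/-1 pairwise score are all hypothetical choices made for the example.

```python
# Hypothetical toy dictionary of synonyms: headword -> listed synonyms.
TOY_DICTIONARY = {
    "big": ["large", "huge"],
    "large": ["big", "huge"],
    "huge": ["big"],            # does not list "large": an inconsistency
    "quick": ["fast"],          # does not list "rapid": an inconsistency
    "fast": ["quick", "rapid"],
    "rapid": ["fast", "quick"],
}


def synonymy_matrix(dictionary):
    """Return (words, matrix); matrix[i][j] is 1 if words[i] lists words[j]."""
    words = sorted(set(dictionary) | {s for syns in dictionary.values() for s in syns})
    index = {word: i for i, word in enumerate(words)}
    matrix = [[0] * len(words) for _ in words]
    for head, synonyms in dictionary.items():
        for synonym in synonyms:
            matrix[index[head]][index[synonym]] = 1
    return words, matrix


def asymmetric_links(words, matrix):
    """Return pairs (a, b) where a lists b but b does not list a."""
    n = len(words)
    return [
        (words[i], words[j])
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] and not matrix[j][i]
    ]


def aggregate(words, matrix):
    """Greedy one-pass grouping (assumed heuristic).

    Each word joins the existing cluster with the highest strictly positive
    score, where a linked pair scores +1 and an unlinked pair scores -1;
    otherwise the word opens a new cluster.
    """
    def score(i, j):
        return 1 if (matrix[i][j] or matrix[j][i]) else -1

    clusters = []
    for i in range(len(words)):
        best, best_score = None, 0
        for cluster in clusters:
            total = sum(score(i, j) for j in cluster)
            if total > best_score:
                best, best_score = cluster, total
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return [[words[i] for i in cluster] for cluster in clusters]


if __name__ == "__main__":
    words, matrix = synonymy_matrix(TOY_DICTIONARY)
    print(asymmetric_links(words, matrix))
    print(aggregate(words, matrix))
```

On the toy data the one-way links ("large" to "huge", "rapid" to "quick") are the kind of inconsistency that step (c) is meant to reveal, while the grouping suggests which classes of synonyms a corrected dictionary might contain. The result of a greedy pass of this kind depends on the order in which words are processed, so it should be read as a rough approximation only.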