Combined one sense disambiguation of abbreviations

HaCohen-Kerner, Yaakov; Kass, Ariel; Peretz, Ariel

doi:10.3115/1557690.1557707

Cited by 25 publications

(23 citation statements)

References 5 publications

“…in the field of biochemistry, HMM is generally an abbreviation for heavy meromyosin. Associating abbreviations with their fully expanded forms is of great importance in various natural language processing (NLP) applications [HaCohen-Kerner et al 2008; Pakhomov 2002; Yu et al 2006]. …”

Section: Introduction (mentioning)

confidence: 99%

Learning Abbreviations from Chinese and English Terms by Modeling Non-Local Information

Sun, Okazaki, Tsujii, et al. 2013

ACM Transactions on Asian Language Information Processing

The present article describes a robust approach for abbreviating terms. First, in order to incorporate non-local information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model and the label encoding with global information. Although the two approaches compete with one another, we find they are also highly complementary. We propose a combination of the two approaches, and we will show that the proposed method outperforms all of the existing methods on abbreviation generation datasets. In order to reduce the computational complexity of learning non-local information, we further present an online training method, which can arrive at the objective optimum with accelerated training speed. We used a Chinese newswire dataset and an English biomedical dataset for experiments. Experiments revealed that the proposed abbreviation generator with non-local information achieved the best results for both the Chinese and English languages. ACM Reference Format: Sun, X., Okazaki, N., Tsujii, J., and Wang, H. 2013. Learning abbreviations from Chinese and English terms by modeling non-local information.

“…The research described in this paper is clearly developed and expanded beyond the conference papers written by us (HaCohen‐Kerner, Kass, & Peretz, 2004, 2008a, b) as follows: (1) The background in various subdomains was enlarged significantly; (2) Additional experiments were applied; and (3) Terms, examples, analyses, and conclusions were added, explained, and detailed.…”

Section: Introduction (mentioning)

confidence: 99%

HAADS: A Hebrew Aramaic abbreviation disambiguation system

HaCohen-Kerner, Kass, Peretz 2010

J. Am. Soc. Inf. Sci.

Self Cite

In many languages abbreviations are very common and are widely used in both written and spoken language. However, they are not always explicitly defined and in many cases they are ambiguous. This research presents a process that attempts to solve the problem of abbreviation ambiguity using modern machine learning (ML) techniques. Various baseline features are explored, including context-related methods and statistical methods. The application domain is Jewish Law documents written in Hebrew and Aramaic, which are known to be rich in ambiguous abbreviations. Two research approaches were implemented and tested: general and individual. Our system applied four common ML methods to find a successful integration of the various baseline features. The best result was achieved by the SVM ML method in the individual research, with 98.07% accuracy.

Introduction

In the field of natural language processing (NLP), one of the attractive research subjects is the word sense disambiguation (WSD) problem. Word sense disambiguation is the task of assigning to each occurrence of an ambiguous word in a text one of its possible senses. To solve this widespread problem, many research systems have been developed and executed for a variety of languages, e.g.: (1) WSD system in Thai, disambiguating both verbs and nouns (the system result was not reported).

In this research project, the goal is to solve a subproblem of WSD, the abbreviations disambiguation problem in Jewish Law documents, which are written in the Hebrew script, but they mix the Hebrew and Aramaic languages. This problem has been researched by a mere handful of previously developed systems, none of which with the above languages.

It is important to note that previous research concerning this subproblem did not focus on defining a generic model or generic model creation process. The various researches attempted to create human-like computational and decision processes for specific contexts, such as medical articles or Latin literature. Each research is composed of a set of context-specific assumptions, which helped improve the system performance or limit the system to solve specific types of abbreviation instances, but in turn lessened the generality of the developed system or solution method.

In this research, in addition to its uniqueness in handling Jewish texts, specifically law documents, the research aspires to find a generic model creation process. The developed process considers other languages and does not define preexecution assumptions, albeit additional languages were not tested using this process. The only limitation to this process is the input itself: the languages of the different text documents and the man-made solution database inputted during the learning process limit the context of documents that may be solved by the resulting disambiguation system. This claim is supported by the fact that the researched domain contains a mixture of the Hebrew and Aramaic languages, thus exampling the generic nature of the learning process. In addition to the generic model, ...

“…Previous research mainly focuses on "abbreviation disambiguation", and machine learning approaches are commonly used (Park and Byrd, 2001; HaCohen-Kerner et al, 2008; Yu et al, 2006; Ao and Takagi, 2005). These ways of linking abbreviation pairs are effective, however, they cannot solve our problem directly.…”

Section: Related Work (mentioning)

confidence: 99%

Predicting Chinese Abbreviations with Minimum Semantic Unit and Global Constraints

Zhang, Wang, et al. 2014

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a new Chinese abbreviation prediction method which can incorporate rich local information while generating the abbreviation globally. Different to previous character tagging methods, we introduce the minimum semantic unit, which is more fine-grained than character but more coarse-grained than word, to capture word level information in the sequence labeling framework. To solve the "character duplication" problem in Chinese abbreviation prediction, we also use a substring tagging strategy to generate local substring tagging candidates. We use an integer linear programming (ILP) formulation with various constraints to globally decode the final abbreviation from the generated candidates. Experiments show that our method outperforms the state-of-the-art systems, without using any extra resource.
