In this information-accumulating world, each of us must learn continuously. To participate in a new field, or even a sub-field, one must be aware of the terminology, including the acronyms that specialists know so well but newcomers do not.
Building on state-of-the-art acronym tools, our end-to-end acronym expander system, called AcX, takes a document, identifies its acronyms, and suggests expansions that are either found in the document or appropriate given the document's subject matter. As far as we know, AcX is the first open-source and extensible system for acronym expansion that allows mixing and matching of different inference modules. AcX currently works for English, French, and Portuguese, with other languages in progress.
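To make the in-document step concrete, the following is a minimal sketch, not AcX's actual implementation: it detects a parenthesized acronym and checks whether the initials of the preceding words form a matching expansion. The function name and regular expression are illustrative only.

```python
import re

def find_in_document_expansions(text):
    """Toy in-document expansion: for patterns like 'Long Form Here (LFH)',
    accept the preceding words as the expansion if their initials match."""
    expansions = {}
    # A run of 2-6 words followed by an all-caps acronym in parentheses.
    for match in re.finditer(r"((?:[A-Za-z-]+\s+){2,6})\(([A-Z]{2,})\)", text):
        words, acronym = match.group(1).split(), match.group(2)
        candidate = words[-len(acronym):]            # last |acronym| words
        initials = "".join(w[0].upper() for w in candidate)
        if initials == acronym:
            expansions[acronym] = " ".join(candidate)
    return expansions

print(find_in_document_expansions(
    "We study Automatic Term Recognition (ATR) in scientific text."))
# {'ATR': 'Automatic Term Recognition'}
```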
This paper describes the design and implementation of AcX, proposes three new acronym expansion benchmarks, compares state-of-the-art techniques on them, and introduces ensemble techniques that outperform any single technique.
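As an illustration only, and not the specific ensembles evaluated in the paper, combining several expanders by weighted voting might look as follows; the expander functions and weights are hypothetical placeholders.

```python
from collections import Counter

def ensemble_expand(acronym, context, expanders, weights=None):
    """Toy weighted-vote ensemble: each expander proposes an expansion for
    `acronym` given `context`; the expansion with the largest total weight wins."""
    weights = weights or [1.0] * len(expanders)
    votes = Counter()
    for expander, weight in zip(expanders, weights):
        prediction = expander(acronym, context)      # e.g. a candidate expansion string
        if prediction is not None:
            votes[prediction] += weight
    return votes.most_common(1)[0][0] if votes else None

# Usage with two stand-in expanders that return fixed answers.
expander_a = lambda a, c: "support vector machine"
expander_b = lambda a, c: "state vector machine"
print(ensemble_expand("SVM", "kernel methods", [expander_a, expander_b, expander_a]))
# 'support vector machine'
```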
Finally, the paper evaluates the performance of AcX and the related MadDog system in end-to-end experiments on a new human-annotated dataset of Wikipedia documents. Our experiments show that AcX outperforms MadDog but that human performance is still substantially better than the best automated approaches. Thus, achieving human-level acronym expansion remains a rich and open challenge.