Volume 15, No. 11

AcX: System, Techniques, and Experiments for Acronym Expansion

Authors:
João L. M. Pereira (INESC-ID and IST, Universidade de Lisboa, and University of Amsterdam)*; João Casanova (Hitachi Vantara); Helena Galhardas (INESC-ID and IST, Universidade de Lisboa); Dennis Shasha (NYU, USA)

Abstract

In this information-accumulating world, each of us must learn continuously. To participate in a new field, or even a sub-field, one must be aware of its terminology, including the acronyms that specialists know so well but newcomers do not. Building on state-of-the-art acronym tools, our end-to-end acronym expander system, called AcX, takes a document, identifies its acronyms, and suggests expansions that are either found in the document or appropriate given the subject matter of the document. As far as we know, AcX is the first open-source and extensible system for acronym expansion that allows mixing and matching of different inference modules. As of now, AcX works for English, French, and Portuguese, with other languages in progress. This paper describes the design and implementation of AcX, proposes three new acronym expansion benchmarks, compares state-of-the-art techniques on them, and proposes ensemble techniques that improve on any single technique. Finally, the paper evaluates the performance of AcX in end-to-end experiments on a human-annotated dataset of Wikipedia documents. Our experiments show that human performance is still better than the best automated approaches. Thus, achieving acronym expansion at a human level is still a rich and open challenge.

PVLDB is part of the VLDB Endowment Inc.