UBY-LMF[1][2] is a format for standardizing lexical resources for Natural Language Processing (NLP).[3] UBY-LMF conforms to the ISO standard for lexicons: LMF, designed within the ISO-TC37, and constitutes a so-called serialization of this abstract standard.[4] In accordance with the LMF, all attributes and other linguistic terms introduced in UBY-LMF refer to standardized descriptions of their meaning in ISOCat.

UBY-LMF has been implemented in Java and is actively developed as an open-source project on Google Code. Based on this Java implementation, the large-scale electronic lexicon UBY[5] has been created automatically; it is the result of using UBY-LMF to standardize a range of diverse lexical resources frequently used in NLP applications.

As of 2013, UBY contains 10 lexicons that are pairwise interlinked at the sense level.[6][7][8]
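
To illustrate what pairwise interlinking at the sense level means, the following is a minimal Java sketch. It is not the UBY-LMF API: the names SenseLinkSketch, Sense, SenseLink and linkedSenses, as well as the lexicon names and sense identifiers, are hypothetical and chosen only for this illustration. The sketch assumes that each link connects exactly two senses taken from two different lexicons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these types are NOT part of the UBY-LMF API.
class SenseLinkSketch {

    // A sense is identified by the lexicon it comes from plus a local identifier.
    record Sense(String lexicon, String id) {}

    // A link pairs exactly two senses that belong to two different lexicons.
    record SenseLink(Sense first, Sense second) {
        SenseLink {
            if (first.lexicon().equals(second.lexicon())) {
                throw new IllegalArgumentException("a link must connect two different lexicons");
            }
        }
    }

    // Returns every sense linked to the given sense, in either direction.
    static List<Sense> linkedSenses(Sense sense, List<SenseLink> links) {
        List<Sense> result = new ArrayList<>();
        for (SenseLink link : links) {
            if (link.first().equals(sense)) {
                result.add(link.second());
            } else if (link.second().equals(sense)) {
                result.add(link.first());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder lexicon names and sense ids, not real UBY data.
        Sense a = new Sense("LexiconA", "bank-1");
        Sense b = new Sense("LexiconB", "bank-n-01");
        Sense c = new Sense("LexiconC", "bank-3");
        List<SenseLink> links = List.of(new SenseLink(a, b), new SenseLink(c, a));
        System.out.println(linkedSenses(a, links));
    }
}
```

Because the links are stored as plain pairs, a lookup starting from a sense in any one lexicon reaches its counterparts in the others without either lexicon having to be modified.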

A subset of the lexicons integrated in UBY has been converted to a Semantic Web format according to the lemon lexicon model.[9] This conversion is based on a mapping of UBY-LMF to the lemon lexicon model.

External references


  1. Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, Christian M Meyer: UBY-LMF - exploring the boundaries of language-independent lexicon models, in Gil Francopoulo, LMF Lexical Markup Framework, ISTE / Wiley 2013 (ISBN 978-1-84821-430-9)
  2. Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek and Christian M. Meyer. UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF. In: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk and Stelios Piperidis: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), p. 275–282, May 2012.
  3. Gottfried Herzog, Laurent Romary, Andreas Witt: Standards for Language Resources. Poster Presentation at the META-FORUM 2013 - META Exhibition, September 2013, Berlin, Germany.
  4. Laurent Romary: TEI and LMF crosswalks. CoRR abs/1301.2444 (2013)
  5. Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer, Christian Wirth: UBY – a large-scale unified lexical-semantic resource based on LMF, Proceedings of EACL, pp. 580–590, 2012, Avignon, France.
  6. Christian M. Meyer and Iryna Gurevych. What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage, in: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), p. 883–892, November 2011. Chiang Mai, Thailand.
  7. Silvana Hartmann and Iryna Gurevych. FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), vol. 1, p. 1363–1373, Association for Computational Linguistics, August 2013.
  8. Michael Matuschek and Iryna Gurevych. Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment. In: Transactions of the Association for Computational Linguistics (TACL), vol. 1, p. 151–164, May 2013.
  9. John McCrae, Guadalupe Aguado-de-Cea, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Asunción Gómez-Pérez, Jorge Gracia, Laura Hollink, Elena Montiel-Ponsoda, Dennis Spohr, Tobias Wunner. (2012) Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation 46:701–719.