BulNet

The Bulgarian WordNet (BulNet) is a lexical-semantic network of Bulgarian built on the Princeton WordNet (PWN) framework. Like other traditional semantic networks, its structure consists of nodes and relations between those nodes.[1][2][3]

General information

BulNet was started within the EU-funded project BalkaNet (a Multilingual Semantic Network of the Balkan Languages), which aimed to build synchronized semantic databases for five Balkan languages (Bulgarian, Greek, Romanian, Serbian and Turkish) and to expand the Czech lexical-semantic network. After BalkaNet was completed, development of the Bulgarian WordNet continued within the nationally funded projects BulNet: a Lexical-semantic Network of Bulgarian (2005-2010) and Language E-resources and Processing Tools (2011-2013). The latter was co-funded under the project CESAR: Central and South-East European Resources (Information and Communication Technologies Policy Support Programme, Call: CIP ICT-PSP-2010-4).

Contents of BulNet

Categories

As of April 15, 2015, the Bulgarian WordNet comprises more than 80,000 synonym sets distributed across nine parts of speech: nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, particles and interjections. The words included in the Bulgarian WordNet were selected according to several criteria. The main ones are frequency analysis of word occurrences in large text corpora (counting occurrences of citation forms rather than of individual wordforms), the inclusion of synsets that already feature in the wordnets of other languages, and the inclusion of synsets that correspond to high-frequency word senses found in parallel corpora.

Synsets

Each synonym set (SYNSET) encodes the relation of equivalence between a number of lexical items (LITERALS), at least one of which must be explicitly represented in the SYNSET. Each literal has a unique meaning (specified by the value of SENSE), and all literals in a synset belong to one and the same part of speech (specified as the value of POS) and express one and the same lexical meaning (specified as the value of DEF). Each synset is linked to its counterpart in PWN 3.0 by means of a unique identification number (ID). Synsets that are common to the Balkan languages are marked as belonging to the common concepts subsets (BCS). In a monolingual database, a synset should be linked to at least one other synset through an intralingual relation. Optional information may also be encoded, such as usage examples, stylistic peculiarities, morphological or syntactic properties, and author and last-edit details.
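The synset record described above can be sketched as a small data structure. This is only an illustration, not the storage format BulNet actually uses: the field names follow the labels in the text (ID, POS, DEF, LITERALS, SENSE, BCS), while the class names, the integer type of the BCS marker, the relation name "hypernym", the `validate` helper and all sample values are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Literal:
    word: str   # the lexical item itself
    sense: int  # SENSE: sense number that makes the literal's meaning unique


@dataclass
class Synset:
    id: str                    # ID: shared with the PWN 3.0 counterpart
    pos: str                   # POS: one part of speech for all literals
    definition: str            # DEF: the meaning shared by all literals
    literals: List[Literal]    # LITERALS: at least one is required
    bcs: Optional[int] = None  # BCS marker (integer type is an assumption)
    # Intralingual relations: relation name -> IDs of target synsets.
    relations: Dict[str, List[str]] = field(default_factory=dict)
    usage: List[str] = field(default_factory=list)  # optional examples


def validate(synset: Synset) -> List[str]:
    """Return the constraints from the description that the synset violates."""
    problems = []
    if not synset.literals:
        problems.append("no LITERAL")
    if not any(targets for targets in synset.relations.values()):
        problems.append("no intralingual relation")
    return problems


# Made-up placeholder entries, not real BulNet data.
good = Synset(
    id="toy-0001-n",
    pos="n",
    definition="a domesticated canine",
    literals=[Literal("dog", 1)],
    relations={"hypernym": ["toy-0002-n"]},
)
bad = Synset(id="toy-0003-n", pos="n", definition="", literals=[])

print(validate(good))
print(validate(bad))
```

Keeping the checks in a separate function rather than in the constructor lets an editor save an incomplete synset and see what is still missing, which is one plausible design for a validation tool.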

Semantic relations

The large number of relations encoded in the Bulgarian wordnet reflects the semantic and derivational richness of the language and supports a range of applications of the multilingual database. At the semantic level, the database supports synonym selection, queries for the semantic relations of a word within the lexical system of the language (antonymy, holonymy, etc.), queries for explanatory definitions, and lookup of translation equivalents for a lexical item. In effect, the Bulgarian wordnet is an electronic multilingual dictionary of synonym sets, together with their explanatory definitions and their semantic relations to other words in the language.[4][5]
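The kinds of query listed above (synonyms, related words, definitions) can be illustrated on a toy in-memory network. This is a minimal sketch under stated assumptions: the synset IDs, words, definitions, relation names ("antonym", "holonym") and function names are all invented for the example and are neither BulNet data nor its query interface.

```python
from typing import Dict, List

# Toy network with made-up entries; each synset has literals, a definition
# and named relations pointing at the IDs of other synsets.
SYNSETS: Dict[str, dict] = {
    "toy-1": {"literals": ["hot"], "def": "having a high temperature",
              "rel": {"antonym": ["toy-2"]}},
    "toy-2": {"literals": ["cold", "chilly"], "def": "having a low temperature",
              "rel": {"antonym": ["toy-1"]}},
    "toy-3": {"literals": ["wheel"], "def": "a circular part that rotates",
              "rel": {"holonym": ["toy-4"]}},
    "toy-4": {"literals": ["car", "automobile"], "def": "a motor vehicle",
              "rel": {}},
}


def synsets_of(word: str) -> List[str]:
    """IDs of all synsets that contain the word as a literal."""
    return [sid for sid, s in SYNSETS.items() if word in s["literals"]]


def synonyms(word: str) -> List[str]:
    """Other literals that share a synset with the word."""
    return [lit for sid in synsets_of(word)
            for lit in SYNSETS[sid]["literals"] if lit != word]


def related(word: str, relation: str) -> List[str]:
    """Literals of the synsets reached from the word via the given relation."""
    return [lit for sid in synsets_of(word)
            for target in SYNSETS[sid]["rel"].get(relation, [])
            for lit in SYNSETS[target]["literals"]]


def definitions(word: str) -> List[str]:
    """Explanatory definitions of every synset the word belongs to."""
    return [SYNSETS[sid]["def"] for sid in synsets_of(word)]


print(synonyms("car"))
print(related("hot", "antonym"))
print(related("wheel", "holonym"))
print(definitions("cold"))
```

Relations are stored between synsets rather than between individual words, so a single link from "hot" reaches every literal of the opposite synset.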

Hydra

Hydra is an OS-independent system designed for wordnet development, validation and exploration. The program enables users to browse and edit any number of monolingual wordnets at a time. The individual wordnets are synchronised, so that equivalent synonym sets, or synsets, may be viewed and explored in parallel.[6]
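Parallel viewing of equivalent synsets presupposes some shared key across the wordnets. The sketch below assumes that the key is a common synset ID and pairs up entries from several toy wordnets on that basis. It does not reproduce Hydra's actual implementation; the function name, the IDs and the data layout are hypothetical.

```python
from typing import Dict, List

# language code -> (synset ID -> literals); IDs are made-up placeholders
Wordnets = Dict[str, Dict[str, List[str]]]


def parallel_view(wordnets: Wordnets) -> Dict[str, Dict[str, List[str]]]:
    """For every ID present in all wordnets, collect the literals per language."""
    if not wordnets:
        return {}
    # Keep only the IDs that every wordnet contains.
    shared = set.intersection(*(set(wn) for wn in wordnets.values()))
    return {
        sid: {lang: wn[sid] for lang, wn in wordnets.items()}
        for sid in sorted(shared)
    }


toy = {
    "bg": {"toy-1": ["куче"], "toy-2": ["котка"]},
    "en": {"toy-1": ["dog"], "toy-2": ["cat"], "toy-3": ["wheel"]},
}

for sid, row in parallel_view(toy).items():
    print(sid, row)
```

An ID that is missing from any one wordnet ("toy-3" here) is left out of the view. A real tool might instead show such a synset with an empty slot for the missing language.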

Access

BulNet search engine

Hydra

BulNet in META-SHARE

BulSemCor - Bulgarian sense-annotated corpus

BulNC: Bulgarian National Corpus

References

  1. Koeva, S., G. Totkov and A. Genov. Towards Bulgarian WordNet. Romanian Journal of Information Science and Technology, Vol. 7, No. 1-2, 45-61, 2004. ISSN 1453-8245.
  2. Koeva, S. Bulgarian WordNet - development and perspectives. In International Conference Cognitive Modeling in Linguistics, Varna, 2005, 270-271.
  3. Koeva, S. Bulgarian Wordnet - current state, applications and prospects. In Bulgarian-American Dialogues, Prof. M. Drinov Academic Publishing House, Sofia, 2010, 120-132. ISBN 978-954-322-383-1.
  4. Koeva, S. Derivational and morphosemantic relations in Bulgarian Wordnet. In Intelligent Information Systems, XVI, Warsaw, Academic Publishing House, 2008, 359-389. ISBN 978-83-60434-44-4.
  5. Dimitrova, T., E. Tarpomanova and B. Rizov. Coping with Derivation in the Bulgarian Wordnet. In: Heili Orav, Christiane Fellbaum and Piek Vossen (Eds.) Proceedings of the Seventh Global Wordnet Conference, Tartu, Estonia, 2014, 109-117.
  6. Rizov, B. Hydra: A Software System for Wordnet. In: Heili Orav, Christiane Fellbaum and Piek Vossen (Eds.) Proceedings of the Seventh Global Wordnet Conference, Tartu, Estonia, 2014, 142-147.

Sources