The Bulgarian WordNet (BulNet) is a lexical-semantic network of Bulgarian built on the Princeton WordNet (PWN) framework. Like a traditional semantic network, its structure consists of nodes and relations between the nodes.[1][2][3]

General information

BulNet was started within the EU-funded project BalkaNet - a Multilingual Semantic Network of the Balkan Languages, which aimed to build synchronized semantic databases for the Balkan languages Bulgarian, Greek, Romanian, Serbian and Turkish, and to expand the Czech lexical-semantic network. After BalkaNet's completion, development of the Bulgarian WordNet continued within the nationally funded projects BulNet - a Lexical-semantic Network of Bulgarian (2005-2010) and Language E-resources and Processing Tools (2011-2013); the latter was co-funded under the project CESAR: Central and South-East European Resources (Information and Communication Technologies Policy Support Programme, call CIP ICT-PSP-2010-4).

Contents of BulNet


As of April 15, 2015, the Bulgarian WordNet comprises more than 80,000 synonym sets (synsets) distributed across nine parts of speech: nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, particles and interjections. The words included have been selected according to several criteria. The main ones are frequency analysis of word occurrences in large text corpora (counting occurrences of citation forms rather than of individual wordforms), the inclusion of synsets already present in the wordnets of other languages, and the inclusion of synsets that correspond to high-frequency word senses found in parallel corpora.
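The difference between counting citation forms and counting wordforms can be illustrated with a minimal Python sketch. The small lemma table and token list are hypothetical examples, not BulNet data or tooling; a real pipeline would presumably rely on a morphological lexicon or lemmatizer.

```python
from collections import Counter

# Hypothetical wordform -> citation form table (illustrative only).
LEMMAS = {
    "куче": "куче",
    "кучето": "куче",
    "кучета": "куче",
    "котка": "котка",
    "котки": "котка",
}

def citation_form_counts(tokens):
    """Count occurrences per citation form; unknown tokens count as themselves."""
    return Counter(LEMMAS.get(t.lower(), t.lower()) for t in tokens)

tokens = ["Кучето", "кучета", "куче", "котки", "котка"]

# Counting raw wordforms spreads one word over several entries...
wordform_counts = Counter(t.lower() for t in tokens)
# ...while counting citation forms groups them under a single entry.
lemma_counts = citation_form_counts(tokens)
```

Here each wordform occurs once, whereas the citation-form count groups the three forms of "куче" into one entry.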


Each synonym set (SYNSET) encodes the relation of equivalence between a number of lexical items (LITERALs), at least one of which must be explicitly represented in the SYNSET. Each literal has a unique meaning (specified by the value of SENSE), and all literals in a synset belong to the same part of speech (specified as the value of POS) and express the same lexical meaning (specified as the value of DEF). Each synset is linked to its counterpart in PWN 3.0 by means of a unique identification number (ID). Synsets common to the Balkan languages are marked as common concept subsets (BCS). In a monolingual database, a synset should be linked to at least one other synset through an intralingual relation. Optional information may also be encoded, such as usage examples, stylistic peculiarities, morphological or syntactic properties, and author and last-edit details.
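The synset structure described above can be modelled roughly as follows. This is a minimal Python sketch, not BulNet's actual storage format: the class names, the `problems` helper, the ID strings and the sample entry are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Literal:
    word: str   # the lexical item itself
    sense: int  # value of SENSE, distinguishing the meanings of the word

@dataclass
class Synset:
    id: str           # ID shared with the counterpart synset in PWN 3.0
    pos: str          # value of POS, common to all literals
    definition: str   # value of DEF, common to all literals
    literals: List[Literal] = field(default_factory=list)
    # Intralingual relations as (relation type, target synset ID) pairs.
    relations: List[Tuple[str, str]] = field(default_factory=list)
    bcs: Optional[str] = None                       # BCS marker, if any
    usage: List[str] = field(default_factory=list)  # optional usage examples

    def problems(self) -> List[str]:
        """Return violations of the two constraints stated in the text."""
        issues = []
        if not self.literals:
            issues.append("no LITERAL")
        if not self.relations:
            issues.append("no intralingual relation")
        return issues

# Hypothetical entry; the IDs and the definition are placeholders.
dog = Synset(
    id="example-id-1",
    pos="n",
    definition="(illustrative definition)",
    literals=[Literal("куче", 1)],
    relations=[("hypernym", "example-id-2")],
)
```

Calling `dog.problems()` returns an empty list, while a synset created without literals or relations reports both missing elements.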

Semantic relations

The large number of relations encoded in the Bulgarian wordnet reflects the language's semantic and derivational richness and supports numerous applications of the multilingual database. At the semantic level, the database supports synonym selection, queries for the semantic relations of a word within the language's lexical system (antonymy, holonymy, etc.), lookup of explanatory definitions, and retrieval of translation equivalents for a lexical item. The Bulgarian wordnet thus serves as an electronic multilingual dictionary of synonym sets together with their explanatory definitions and their semantic relations to other words in the language.[4][5]
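The kinds of query listed above can be sketched over a toy in-memory wordnet. The following Python example is an illustration under stated assumptions: the dictionary layout, the function names, the synset IDs and the handful of entries are invented for the example and do not reflect BulNet's actual data or interface. Translation lookup relies only on the idea that equivalent synsets in two wordnets share an ID.

```python
from typing import Dict, List

Wordnet = Dict[str, dict]  # synset ID -> synset record

# Toy Bulgarian and English wordnets (hypothetical IDs and glosses).
BG: Wordnet = {
    "id-dog": {"literals": ["куче", "пес"], "def": "domestic canine",
               "relations": {"hypernym": ["id-animal"]}},
    "id-animal": {"literals": ["животно"], "def": "living creature",
                  "relations": {}},
    "id-hot": {"literals": ["горещ"], "def": "having a high temperature",
               "relations": {"antonym": ["id-cold"]}},
    "id-cold": {"literals": ["студен"], "def": "having a low temperature",
                "relations": {"antonym": ["id-hot"]}},
}
EN: Wordnet = {
    "id-dog": {"literals": ["dog"], "def": "domestic canine", "relations": {}},
    "id-cold": {"literals": ["cold"], "def": "having a low temperature",
                "relations": {}},
}

def synsets_of(wn: Wordnet, word: str) -> List[str]:
    """IDs of all synsets that contain the word as a literal."""
    return [sid for sid, s in wn.items() if word in s["literals"]]

def synonyms(wn: Wordnet, word: str) -> List[str]:
    """Other literals sharing a synset with the word."""
    found = {lit for sid in synsets_of(wn, word) for lit in wn[sid]["literals"]}
    return sorted(found - {word})

def definitions(wn: Wordnet, word: str) -> List[str]:
    """Explanatory definitions of every synset containing the word."""
    return [wn[sid]["def"] for sid in synsets_of(wn, word)]

def related(wn: Wordnet, word: str, relation: str) -> List[str]:
    """Literals of synsets reached from the word via the given relation."""
    out: List[str] = []
    for sid in synsets_of(wn, word):
        for target in wn[sid]["relations"].get(relation, []):
            out.extend(wn[target]["literals"])
    return out

def translations(src: Wordnet, tgt: Wordnet, word: str) -> List[str]:
    """Literals of the target-language synsets that share an ID with the word's synsets."""
    out: List[str] = []
    for sid in synsets_of(src, word):
        if sid in tgt:
            out.extend(tgt[sid]["literals"])
    return out
```

For instance, `synonyms(BG, "куче")` yields the other literal in the same toy synset, `related(BG, "горещ", "antonym")` follows the antonymy link, and `translations(BG, EN, "студен")` looks up the English synset with the same ID.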


Hydra is an OS-independent system designed for wordnet development, validation and exploration. The program enables users to browse and edit any number of monolingual wordnets at a time. The individual wordnets are synchronized, so that equivalent synonym sets (synsets) can be viewed and explored in parallel.[6]


BulNet search engine



BulSemCor - Bulgarian sense-annotated corpus

BulNC: Bulgarian National Corpus


