PlWordNet


plWordNet is a lexico-semantic database of the Polish language. It includes sets of synonymous lexical units (synsets) accompanied by short definitions. plWordNet serves as a thesaurus-dictionary in which concepts (synsets) and individual word meanings (lexical units) are defined by their location in a network of mutual relations, reflecting the lexico-semantic system of the Polish language.[1] plWordNet is also one of the basic resources used to build natural language processing tools for Polish.[1]

History

plWordNet is being developed at Wrocław University of Technology. The work has been carried out by the WrocUT Language Technology Group G4.19 since 2005,[2] with funding from the Ministry of Science and Higher Education and from the EU. The thesaurus has been built from the ground up by lexicographers and natural language engineers.[3] The first version of plWordNet, published in 2009, contained 20 223 lemmas, 26 990 lexical units and 17 695 synsets.[4] The most recent version, plWordNet 2.2, was made available on May 13, 2014.

Content

Currently, plWordNet contains 148k lemmas, 207k lexical units and 151k synsets.[5] It has already outgrown Princeton WordNet in the number of lexical units. plWordNet consists of nouns (116k), verbs (18k) and adjectives (13k).[5] Each meaning of a given word is a separate lexical unit. Units that represent the same concept and do not differ significantly in stylistic register are combined into synsets (sets of synonyms). Each lexical unit is assigned to one domain (semantic category), which indicates its general meaning. plWordNet domains correspond to the Princeton WordNet lexicographers’ files.
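
The organisation described above (lemmas, lexical units, synsets and domains) can be pictured with a small data model. The following is a minimal Python sketch, not plWordNet's actual storage format or API: the class names, field names and the helper index_by_lemma are hypothetical, and placing kot in the animal domain (zw) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LexicalUnit:
    """One meaning of one word: a lemma plus a sense number."""
    lemma: str
    sense: int
    pos: str      # part of speech: "noun", "verb" or "adjective"
    domain: str   # semantic category code, e.g. "zw" (animal)


@dataclass(frozen=True)
class Synset:
    """Lexical units that represent the same concept."""
    units: Tuple[LexicalUnit, ...]

    def label(self) -> str:
        # Render the synset in the {lemma+sense; ...} notation used in this article.
        return "{" + "; ".join(f"{u.lemma}{u.sense}" for u in self.units) + "}"


def index_by_lemma(synsets: List[Synset]) -> Dict[str, List[LexicalUnit]]:
    """Group lexical units by lemma, so each lemma lists all of its senses."""
    index: Dict[str, List[LexicalUnit]] = {}
    for synset in synsets:
        for unit in synset.units:
            index.setdefault(unit.lemma, []).append(unit)
    return index


# Hypothetical data: the domain codes here are assumed, not read from plWordNet.
cat = Synset((
    LexicalUnit("kot", 2, "noun", "zw"),
    LexicalUnit("kot domowy", 1, "noun", "zw"),
))
senses = index_by_lemma([cat])
print(cat.label())           # {kot2; kot domowy1}
print(len(senses["kot"]))    # 1
```

Keeping the lemma separate from the lexical unit mirrors the distinction drawn in the text: one lemma may own several lexical units, and each unit belongs to exactly one synset in this sketch.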

Semantic categories in plWordNet

Noun domains[6]
  • the highest in the hierarchy (bhp)
  • attribute (cech)
  • motive (cel)
  • time (czas)
  • body (czc)
  • emotion (czuj)
  • act (czy)
  • group (grp)
  • quantity (il)
  • food (jedz)
  • shape (ksz)
  • location (msc)
  • person (os)
  • communication (por)
  • possession (pos)
  • process (prc)
  • plant (rsl)
  • natural object (rz)
  • substance (sbst)
  • state (st)
  • classification (sys)
  • cognition (umy)
  • artefact (wytw)
  • event (zdarz)
  • natural phenomenon (zj)
  • animal (zw)

Verb domains[7]
  • emotion (cczuj)
  • consumption (cjedz)
  • communication (cpor)
  • possession (cpos)
  • state (cst)
  • cognition (cumy)
  • creation (cwytw)
  • contact (dtk)
  • body (hig)
  • weather (pog)
  • perception (pst)
  • motion (ruch)
  • social (sp)
  • competition (wal)
  • change (zmn)

Adjective domains[8]
  • deadjectival (grad)
  • quality (jak)
  • deverbal (odcz)
  • relation (rel)

Lexical unit description

Some lexical units are provided with information about stylistic register, a short definition, usage examples and a link to the relevant Wikipedia article.

noun: miasto ‘town, city’
domain: miejsce i umiejscowienie ‘place and location’
definition: duży, gęsto zabudowany i zaludniony teren posiadający odrębną administrację; miejsce życia ludzi pracujących w przemyśle lub usługach ‘big, densely built-up and populated area with a separate administration; living place of people working in industry or services’
example: W mieście człowiek ma większą szansę na zrobienie kariery i zarobienie pieniędzy, choć jednocześnie łatwiej tam niż na wsi popaść w ubóstwo. ‘It is much easier to make a career in a city than in a village, but it is also much easier to fall into poverty.’

The most important elements defining word meanings are the lexico-semantic and derivational relations that hold between synsets and between lexical units. A synset groups lexical units that share the same set of relations.[9] From the relations assigned to synsets and units, natural language processing tools can draw conclusions about the meaning of a lemma, which is important, for example, in word-sense disambiguation.

Selected noun relations[9]

Relation: synonymy
Test:
  • If he/she/it is X, then he/she/it is also Y
  • If he/she/it is Y, then he/she/it is also X
Example: {kot2; kot domowy1}, ‘cat, domestic cat’

Relation: inter-register synonymy
Test:
  • X and Y share a hypernym, their sets of hyponyms do not overlap
  • X and Y are not synonyms
  • If he/she/it is X, then he/she/it is also Y [to the extent of the stylistic register difference]
  • If he/she/it is Y, then he/she/it is also X [to the extent of the stylistic register difference]
Example: {chłopiec1}, {gówniarz1}, ‘boy, ~brat, squirt’

Relation: hypo-/hypernymy
Test:
  • If he/she/it is X, then he/she/it must be Y
  • If he/she/it is Y, then he/she/it is not necessarily X
  • If he/she/it is not Y, then he/she/it cannot be X
Example: {buk1} jest rodzajem {drzewo liściaste1}, ‘beech’ is a kind of ‘deciduous tree’

Relation: mero-/holonymy
Test:
  • X is a part of Y
  • Y is not a part of X
  • Y is a whole of which X is a part
Example: {poduszka powietrzna1} jest częścią {samochód1}, ‘air bag’ is a part of ‘car’
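
The relation tests above suggest how a tool might reason over the network. The following is a minimal Python sketch under stated assumptions: the RelationGraph class, the relation names "hypernym" and "holonym", and the extra link from drzewo liściaste1 to drzewo1 (‘tree’) are hypothetical and are not taken from plWordNet. It follows hypernym links transitively to answer the question "if it is X, must it be Y?".

```python
from collections import defaultdict


class RelationGraph:
    """Directed, labelled links between synsets (identified here by plain strings)."""

    def __init__(self):
        self._edges = defaultdict(set)  # (relation, source) -> set of targets

    def add(self, relation, source, target):
        self._edges[(relation, source)].add(target)

    def targets(self, relation, source):
        """Direct neighbours of `source` under one relation."""
        return set(self._edges.get((relation, source), ()))

    def closure(self, relation, source):
        """Everything reachable from `source` by following one relation repeatedly."""
        seen = set()
        stack = [source]
        while stack:
            node = stack.pop()
            for nxt in self._edges.get((relation, node), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def is_a(graph, x, y):
    """Hypernymy test: if something is X, then it must be Y."""
    return y in graph.closure("hypernym", x)


g = RelationGraph()
g.add("hypernym", "buk1", "drzewo liściaste1")        # example from the table
g.add("hypernym", "drzewo liściaste1", "drzewo1")     # assumed link, for illustration
g.add("holonym", "poduszka powietrzna1", "samochód1")  # air bag is a part of car

print(is_a(g, "buk1", "drzewo1"))             # True
print(is_a(g, "drzewo liściaste1", "buk1"))   # False
```

The asymmetry in the two printed results corresponds to the second hypernymy test: a beech must be a deciduous tree, but a deciduous tree is not necessarily a beech.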

Polish synsets are connected to the corresponding Princeton WordNet synsets by a set of inter-lingual lexico-semantic relations (for instance synonymy, partial synonymy and hyponymy). So far 91 578 synsets have been mapped, about 2/3 of plWordNet synsets, most of them nouns.[10] The mapping enables the use of plWordNet in machine translation, e.g. in the online service offered by Google Translate.
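
As a rough illustration of the mapping, the sketch below stores inter-lingual links in a dictionary and recomputes the coverage figure. It is a sketch under stated assumptions only: the InterlingualLink type, the lookup function and the two sample links are hypothetical, and the total of 151 000 synsets is the rounded count quoted in the Content section, which may not match the exact total behind the cited statistic.

```python
from typing import Dict, NamedTuple, Optional


class InterlingualLink(NamedTuple):
    relation: str   # e.g. "synonymy", "partial synonymy", "hyponymy"
    target: str     # label of the Princeton WordNet synset


# Hypothetical sample links, chosen for illustration and not read from plWordNet.
links: Dict[str, InterlingualLink] = {
    "{kot2; kot domowy1}": InterlingualLink("synonymy", "{cat, true cat}"),
    "{buk1}": InterlingualLink("synonymy", "{beech, beech tree}"),
}


def lookup(polish_synset: str) -> Optional[InterlingualLink]:
    """Return the inter-lingual link for a Polish synset, or None if unmapped."""
    return links.get(polish_synset)


mapped_synsets = 91_578    # figure quoted in the text
total_synsets = 151_000    # rounded figure from the Content section (assumption)
coverage = mapped_synsets / total_synsets

print(lookup("{buk1}").target)   # {beech, beech tree}
print(lookup("{miasto1}"))       # None
print(f"{coverage:.1%}")         # 60.6%
```

With these rounded inputs the coverage comes out slightly below the "about 2/3" stated in the text.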

Applications

plWordNet is available under an open-access license that allows free browsing. It has been made available to users as an online dictionary, a mobile application and web services. Some applications of plWordNet:

References

  1. http://plwordnet.pwr.wroc.pl/wordnet/about
  2. Maziarz M., Piasecki M., Szpakowicz S., Approaching plWordNet 2.0, http://nlp.pwr.wroc.pl/ltg/files/publications/paper%2042.pdf
  3. http://nlp.pwr.wroc.pl/plwordnet/download/?lang=eng
  4. Piasecki M., Szpakowicz S., Broda B., A Wordnet from the Ground Up, Wrocław 2009, p. 170, http://www.plwordnet.pwr.wroc.pl/main/content/files/publications/A_Wordnet_from_the_Ground_Up.pdf
  5. Detailed comparative statistics of plWN and PWN can be found on the plWN webpage: http://plwordnet.pwr.wroc.pl/wordnet/stats [accessed 30.06.2014]
  6. Rabiega-Wiśniewska J., Maziarz M., Piasecki M., Szpakowicz S., Opis relacji leksykalno-semantycznych w Słowosieci 2.0. Rzeczownik, p. 4.
  7. Hojka B., Maziarz M., Piasecki M., Rabiega-Wiśniewska J., Szpakowicz S., Opis relacji leksykalno-semantycznych w Słowosieci 2.0. Czasownik, pp. 15–16.
  8. Maziarz M., Szpakowicz S., Piasecki M., Semantic Relations among Adjectives in Polish WordNet 2.0: A New Relation Set, Discussion and Evaluation, Cognitive Studies / Études Cognitives, vol. 12, pp. 149–179, 2012.
  9. Maziarz M., Piasecki M., Szpakowicz S., Rabiega-Wiśniewska J., Semantic Relations Among Nouns in Polish Wordnet Grounded in Lexicographic and Semantic Tradition, Cognitive Studies / Études Cognitives, vol. 11, pp. 161–181, 2011.
  10. http://plwordnet.pwr.wroc.pl/wordnet/stats [accessed 30.05.2014]