Variation of information
In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality. Even more, it is a universal metric, in that if any other distance measure places two items close to each other, then the variation of information will also judge them close.[1]
Definition
Suppose we have two partitions $X$ and $Y$ of a set $A$ into disjoint subsets, namely $X = \{X_1, X_2, \ldots, X_k\}$ and $Y = \{Y_1, Y_2, \ldots, Y_l\}$. Let

$$n = \sum_i |X_i| = \sum_j |Y_j| = |A|, \qquad p_i = \frac{|X_i|}{n}, \qquad q_j = \frac{|Y_j|}{n}, \qquad r_{ij} = \frac{|X_i \cap Y_j|}{n}.$$

Then the variation of information between the two partitions is

$$\mathrm{VI}(X; Y) = - \sum_{i,j} r_{ij} \left[ \log \frac{r_{ij}}{p_i} + \log \frac{r_{ij}}{q_j} \right].$$
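As a concrete illustration, here is a minimal Python sketch of this definition; the function name `variation_of_information` and the example partitions are illustrative choices, not taken from the literature. Partitions are represented as lists of disjoint Python sets, and logarithms are natural, so the result is in nats.

```python
import math

def variation_of_information(X, Y):
    """Variation of information between two partitions of the same set.

    X and Y are lists of pairwise-disjoint sets covering the same elements.
    """
    n = float(sum(len(x) for x in X))   # total number of elements, |A|
    vi = 0.0
    for x in X:
        p = len(x) / n                  # p_i = |X_i| / n
        for y in Y:
            r = len(x & y) / n          # r_ij = |X_i ∩ Y_j| / n
            if r > 0.0:                 # terms with r_ij = 0 contribute nothing
                q = len(y) / n          # q_j = |Y_j| / n
                vi -= r * (math.log(r / p) + math.log(r / q))
    return vi

# Two partitions of {1, ..., 5}; identical partitions give VI = 0.
X = [{1, 2}, {3, 4, 5}]
Y = [{1}, {2, 3}, {4, 5}]
print(variation_of_information(X, X))   # 0.0
print(variation_of_information(X, Y))   # ≈ 0.9365 nats
```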
This definition of $\mathrm{VI}$ is equivalent to the shared information distance between the random variables $i$ and $j$ with respect to the uniform probability measure on $A$ defined by $\mu(B) := |B|/n$ for $B \subseteq A$.

The variation of information satisfies

$$\mathrm{VI}(X; Y) = H(X) + H(Y) - 2 I(X, Y),$$

where $H(X)$ is the entropy of $X$, and $I(X, Y)$ is the mutual information between $X$ and $Y$ with respect to the uniform probability measure on $A$.
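To make this identity concrete, the sketch below continues the previous snippet, reusing `variation_of_information` and the partitions `X` and `Y` defined there; `entropy` and `mutual_information` are likewise illustrative names, not a standard API. It computes both sides of the identity from the same counts and checks that they agree up to floating-point error.

```python
def entropy(P):
    """Entropy H(P) of a partition under the uniform measure, in nats."""
    n = float(sum(len(p) for p in P))
    return -sum((len(p) / n) * math.log(len(p) / n) for p in P)

def mutual_information(X, Y):
    """Mutual information I(X, Y) between two partitions, in nats."""
    n = float(sum(len(x) for x in X))
    mi = 0.0
    for x in X:
        for y in Y:
            r = len(x & y) / n          # joint probability r_ij
            if r > 0.0:
                mi += r * math.log(r / ((len(x) / n) * (len(y) / n)))
    return mi

# Check VI(X; Y) = H(X) + H(Y) - 2 I(X, Y) on the example partitions.
lhs = variation_of_information(X, Y)
rhs = entropy(X) + entropy(Y) - 2.0 * mutual_information(X, Y)
assert abs(lhs - rhs) < 1e-9
```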
References
- Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, and Peter Grassberger, "Hierarchical Clustering Based on Mutual Information" (2003), arXiv:q-bio/0311039