Conditional random field

Machine learning and data mining 

Machine learning venues

Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account; e.g., the linear chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples.
CRFs are a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between observations and construct consistent interpretations. It is often used for labeling or parsing of sequential data, such as natural language text or biological sequences^{[1]} and in computer vision.^{[2]} Specifically, CRFs find applications in shallow parsing,^{[3]} named entity recognition,^{[4]} gene finding and peptide critical functional region finding,^{[5]} among other tasks, being an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation.
Contents
Description
Lafferty, McCallum and Pereira^{[1]} define a CRF on observations and random variables as follows:
Let be a graph such that
, so that is indexed by the vertices of . Then is a conditional random field when the random variables , conditioned on , obey the Markov property with respect to the graph: , where means that and are neighbors in .
What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets and , the observed and output variables, respectively; the conditional distribution is then modeled.
Inference
For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold.^{[6]} However there exist special cases for which exact inference is feasible:
 If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forwardbackward and Viterbi algorithm for the case of HMMs.
 If the CRF only contains pairwise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions.
If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include:
 Loopy belief propagation
 Alpha expansion
 Mean field inference
 Linear programming relaxations
Parameter Learning
Learning the parameters is usually done by maximum likelihood learning for . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex.^{[6]} It can be solved for example using gradient descent algorithms, or QuasiNewton methods such as the LBFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used.
Examples
In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables represents a sequence of observations and represents a hidden (or unknown) state variable that needs to be inferred given the observations. The are structured to form a chain, with an edge between each and . As well as having a simple interpretation of the as "labels" for each element in the input sequence, this layout admits efficient algorithms for:
 model training, learning the conditional distributions between the and feature functions from some corpus of training data.
 decoding, determining the probability of a given label sequence given .
 inference, determining the most likely label sequence given .
The conditional dependency of each on is defined through a fixed set of feature functions of the form , which can informally be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for .
Linearchain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.
Notably in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence at any point during inference, and the range of the feature functions need not have a probabilistic interpretation.
Variants
Higherorder CRFs and semiMarkov CRFs
CRFs can be extended into higher order models by making each dependent on a fixed number of previous variables . Training and inference are only practical for small values of (such as o ≤ 5),^{[citation needed]} since their computational cost increases exponentially with . Largemargin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs.
There exists another generalization of CRFs, the semiMarkov conditional random field (semiCRF), which models variablelength segmentations of the label sequence .^{[7]} This provides much of the power of higherorder CRFs to model longrange dependencies of the , at a reasonable computational cost.
Latentdynamic conditional random field
Latentdynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively.
In an LDCRF, like in any sequence tagging task, given a sequence of observations x = x₁, … xₙ, the main problem the model must solve is how to assign a sequence of labels y = y₁, … yₙ from one finite set of labels Y. Instead of directly modeling P(yx) as an ordinary linearchain CRF would do, instead a set of latent variables h is "inserted" between x and y using the chain rule of probability:^{[8]}
This allows the capturing of latent structure between the observations and labels.^{[9]} While LDCRFs can be trained using quasiNewton methods, a specialized version of the perceptron algorithm called the latentvariable perceptron has been developed for them as well, based on Collins' structured perceptron algorithm.^{[8]} These models find applications in computer vision, specifically gesture recognition from video streams,^{[9]} and shallow parsing.^{[8]}
Software
This is a partial list of software that implement generic CRF tools.
 RNNSharp CRFs based on recurrent neural networks (C#, .NET)
 CRFADF Linearchain CRFs with fast online ADF training (C#, .NET)
 CRFSharp Linearchain CRFs (C#, .NET)
 GCO CRFs with submodular energy functions (C++, Matlab)
 DGM General CRFs (C++)
 GRMM General CRFs (Java)
 factorie General CRFs (Scala)
 CRFall General CRFs (Matlab)
 Sarawagi's CRF Linearchain CRFs (Java)
 HCRF library Hiddenstate CRFs (C++, Matlab)
 Accord.NET Linearchain CRF, HCRF and HMMs (C#, .NET)
 Wapiti Fast linearchain CRFs (C)^{[10]}
 CRFSuite Fast restricted linearchain CRFs (C)
 CRF++ Linearchain CRFs (C++)
 FlexCRFs Firstorder and secondorder Markov CRFs (C++)
 crfchain1 Firstorder, linearchain CRFs (Haskell)
 imageCRF CRF for segmenting images and image volumes (C++)
 MALLET Linearchain for sequence tagging (Java)
 PyStruct Structured Learning in Python (Python)
 Figaro Probabilistic programming language capable of defining CRFs and other graphical models
This is a partial list of software that implement CRF related tools.
 Conrad CRF based gene predictor (Java)
 Stanford NER Named Entity Recognizer (Java)
 BANNER Named Entity Recognizer (Java)
See also
References
 ↑ ^{1.0} ^{1.1} Lafferty, J., McCallum, A., Pereira, F. (2001). "Conditional random fields: Probabilistic models for segmenting and labeling sequence data". Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. pp. 282–289.CS1 maint: uses authors parameter (link)<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ He, X.; Zemel, R.S.; CarreiraPerpinñán, M.A. (2004). "Multiscale conditional random fields for image labeling". IEEE Computer Society. CiteSeerX: 10.1.1.3.7826.<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ Sha, F., Pereira, F. (2003). shallow parsing with conditional random fields.CS1 maint: multiple names: authors list (link)<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ Settles, B. (2004). "Biomedical named entity recognition using conditional random fields and rich feature sets" (PDF). Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications. pp. 104–107.<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ Chang KY, Lin Tp, Shih LY, Wang CK (2015). Analysis and Prediction of the Critical Regions of Antimicrobial Peptides Based on Conditional Random Fields. PLoS ONE.CS1 maint: multiple names: authors list (link)<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ ^{6.0} ^{6.1} Sutton, Charles; McCallum, Andrew (2010). "An Introduction to Conditional Random Fields". arXiv:1011.4088v1 [stat.ML].<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ Sarawagi, Sunita; Cohen, William W. (2005). "SemiMarkov conditional random fields for information extraction" (PDF). In Lawrence K. Saul, Yair Weiss, Léon Bottou (eds.) (eds.). Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press. pp. 1185–1192.CS1 maint: uses editors parameter (link)<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ ^{8.0} ^{8.1} ^{8.2} Xu Sun; Takuya Matsuzaki; Daisuke Okanohara; Jun'ichi Tsujii (2009). Latent Variable Perceptron Algorithm for Structured Classification. IJCAI. pp. 1236–1242.<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ ^{9.0} ^{9.1} Morency, L. P.; Quattoni, A.; Darrell, T. (2007). "LatentDynamic Discriminative Models for Continuous Gesture Recognition". 2007 IEEE Conference on Computer Vision and Pattern Recognition (PDF). p. 1. doi:10.1109/CVPR.2007.383299. ISBN 1424411793.<templatestyles src="Module:Citation/CS1/styles.css"></templatestyles>
 ↑ T. Lavergne, O. Cappé and F. Yvon (2010). Practical very large scale CRFs. Proc. 48th Annual Meeting of the ACL, pp. 504513.
Further reading
 McCallum, A.: Efficiently inducing features of conditional random fields. In: Proc. 19th Conference on Uncertainty in Artificial Intelligence. (2003)
 Wallach, H.M.: Conditional random fields: An introduction. Technical report MSCIS0421, University of Pennsylvania (2004)
 Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields for Relational Learning. In "Introduction to Statistical Relational Learning". Edited by Lise Getoor and Ben Taskar. MIT Press. (2006) Online PDF
 Klinger, R., Tomanek, K.: Classical Probabilistic Models and Conditional Random Fields. Algorithm Engineering Report TR072013, Department of Computer Science, Dortmund University of Technology, December 2007. ISSN 18644503. Online PDF
 CS1 maint: uses authors parameter
 CS1 maint: multiple names: authors list
 CS1 maint: uses editors parameter
 Wikipedia articles needing context from January 2013
 Wikipedia introduction cleanup from January 2013
 Wikipedia articles that are too technical from June 2012
 All articles that are too technical
 Articles needing expert attention from June 2012
 Articles with unsourced statements from December 2008
 Graphical models
 Loglinear models
 Machine learning