Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function $\displaystyle f(x)$ :

\displaystyle f(x)=\|Ax-b\|^2

The minimum of $f$ is obtained when the gradient is 0:

\nabla_x f=2 A^\top(Ax-b)=0

.

Whereas linear conjugate gradient seeks a solution to the linear equation $\displaystyle A^\top Ax=A^\top b$ , the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient $\nabla_x f$ alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum.

Given a function $\displaystyle f(x)$ of $N$ variables to minimize, its gradient $\nabla_x f$ indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

\Delta x_0=-\nabla_x f (x_0)

with an adjustable step length $\displaystyle \alpha$ and performs a line search in this direction until it reaches the minimum of $\displaystyle f$ :

\displaystyle \alpha_0:= \arg \min_\alpha f(x_0+\alpha \Delta x_0)

,

\displaystyle x_1=x_0+\alpha_0 \Delta x_0

After this first iteration in the steepest direction $\displaystyle \Delta x_0$ , the following steps constitute one iteration of moving along a subsequent conjugate direction $\displaystyle s_n$ , where $\displaystyle s_0=\Delta x_0$ :

Calculate the steepest direction: $\Delta x_n=-\nabla_x f (x_n)$ ,
Compute $\displaystyle \beta_n$ according to one of the formulas below,
Update the conjugate direction: $\displaystyle s_n=\Delta x_n+\beta_n s_{n-1}$
Perform a line search: optimize $\displaystyle \alpha_n=\arg \min_{\alpha} f(x_n+\alpha s_n)$ ,
Update the position: $\displaystyle x_{n+1}=x_{n}+\alpha_{n} s_{n}$ ,

With a pure quadratic function the minimum is reached within N iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.

Within a linear approximation, the parameters $\displaystyle \alpha$ and $\displaystyle \beta$ are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for $\displaystyle \beta_n$ are named after their developers and are given by the following formulas:

Fletcher–Reeves^{[citation needed]}:

\beta_{n}^{FR} = \frac{\Delta x_n^\top \Delta x_n} {\Delta x_{n-1}^\top \Delta x_{n-1}}

Polak–Ribière^{[citation needed]}:

\beta_{n}^{PR} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})} {\Delta x_{n-1}^\top \Delta x_{n-1}}

Hestenes-Stiefel^{[citation needed]}:

\beta_n^{HS} = -\frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})} {s_{n-1}^\top (\Delta x_n-\Delta x_{n-1})}

Dai–Yuan^{[citation needed]}:

\beta_{n}^{DY} = -\frac{\Delta x_n^\top \Delta x_n} {s_{n-1}^\top (\Delta x_n-\Delta x_{n-1})}

.

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is $\displaystyle \beta=\max\{0,\,\beta^{PR}\}$ which provides a direction reset automatically^{[citation needed]}.

Newton based methods - Newton-Raphson Algorithm, Quasi-Newton methods (e.g., BFGS method) - tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration as Newton-like methods require computing the Hessian (matrix of second derivatives) in addition to the gradient. Quasi-Newton methods also require more memory to operate (see also the limited memory L-BFGS method).

External links

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain by Jonathan Richard Shewchuk.
A NONLINEAR CONJUGATE GRADIENT METHOD WITH A STRONG GLOBAL CONVERGENCE PROPERTY by Y. H. DAI and Y. YUAN.

Nonlinear conjugate gradient method

See also

External links

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools