Polynomial least squares

From Infogalactic: the planetary knowledge core
Jump to: navigation, search

In mathematical statistics, polynomial least squares refers to a broad range of statistical methods for estimating an underlying polynomial that describes observations. These methods include polynomial regression, curve fitting, linear regression, least squares, ordinary least squares, simple linear regression, linear least squares, approximation theory and method of moments. Polynomial least squares has applications in radar trackers, estimation theory, signal processing, statistics, and econometrics.

Two common applications of polynomial least squares methods are approximating a low-degree polynomial that approximates a complicated function and estimating an assumed underlying polynomial from corrupted (also known as "noisy") observations. The former is commonly used in statistics and econometrics to fit a scatter plot with a first degree polynomial (that is, a line).[1][2][3] The latter is commonly used in target tracking in the form of Kalman filtering, which is effectively a recursive implementation of polynomial least squares.[4][5][6][7] Estimating an assumed underlying deterministic polynomial can be used in econometrics as well.[8] In effect, both applications produce average curves as generalizations of the common average of a set of numbers, which is equivalent to zero degree polynomial least squares.[1][2][9]

In the above applications, the term "approximate" is used when no statistical measurement or observation errors are assumed, as when fitting a scatter plot. The term "estimate", derived from statistical estimation theory, is used when assuming that measurements or observations of a polynomial are corrupted.

Polynomial least squares estimate of a deterministic first degree polynomial corrupted with observation errors

Assume the deterministic first degree polynomial equation y with unknown coefficients \alpha and \beta as follows:

y=\alpha+\beta t

which is corrupted with an additive stochastic process  \varepsilon, described as an error (noise in tracking) written as

z=y+\varepsilon=\alpha+\beta t+\varepsilon

Given samples z_n where the subscript n is the sample index, the problem is to apply polynomial least squares to estimate y(t), and to determine its variance along with its expected value.

Definitions and assumptions

(1) The term linearity in mathematics may be considered to take two forms that are sometimes confusing: a linear "system" or transformation (sometimes called an operator)[9] and a linear equation. The term "function" is often used to describe both a system and an equation, which may lead to confusion. A linear system is defined by

f(ax +by)= af(x) +bf(y)

where a and b are constants, and where x and y are variables. In a linear "system" E[f(x)]=f(E[x]), where E is the linear expectation operator. A linear equation is a straight line as is the first degree polynomial described above.

(2) The error \varepsilon  is modeled as a zero mean stochastic process, samples of which are random variables that are uncorrelated and assumed to have identical probability distributions (specifically same mean and variance), but not necessarily Gaussian, treated as inputs to polynomial least squares. Stochastic processes and random variables are described only by probability distributions.[1][9][2]

(3) Polynomial least squares is modeled as a linear signal processing "system" which processes statistical inputs deterministically, the output being the linearly processed empirically determined statistical estimate, variance, and expected value.[6][7][8]

(4) Polynomial least squares processing produces deterministic moments (analogous to mechanical moments), which may be considered as moments of sample statistics, but not of statistical moments.[8]

Polynomial least squares and the orthogonality principle

Approximating a function z(t) with a polynomial

\hat z(t)=\sum_{j=1} ^J a_j t^{j-1}

where hat (^) denotes the estimate and (J − 1) is the polynomial degree, can be performed by applying the orthogonality principle. The error e in the sum of the squared errors can be written as

 e=\sum_{n=1}^N (z_n - \hat z_n)^2

According to the orthogonality principle,[4][5][6][7][8][9][10][11] e is minimum when the error (z-\hat z) is orthogonal to the estimate \hat z, that is

\sum_{n=1}^N (z_n - \hat z_n)\hat z_n=0

This can be described as the orthogonal projection of the data  z_n onto a solution in the form of the polynomial \hat z(t).[4][6][7] For N > J, orthogonal projection yields the standard overdetermined system of equations (often called normal equations) used to compute the coefficients in the polynomial approximation.[1][10][11] The minimum e is then

e_\min = \sum_{n=1} ^N (z_n - \hat z_n)z_n

The advantage of using orthogonal projection is that e_\min can be determined for use in the polynomial least squares processed statistical variance of the estimate.[8][9][11]

The empirically determined polynomial least squares output of a first degree polynomial corrupted with observation errors

To fully determine the output of polynomial least squares, a weighting function describing the processing must first be structured and then the statistical moments can be computed.

The weighting function describing the linear polynomial least squares "system"

The weighting function  w_n (\tau) can be formulated from polynomial least squares to estimate the unknown y(t) as follows:[8]

\hat y (\tau) = \frac {1} {N}\sum_{n=1} ^N z_n w_n (\tau) =  \frac {1} {N}\sum_{n=1} ^N (\alpha+\beta t_n + \varepsilon_n) w_n (\tau)

where N is the number of samples,  z_n are random variables as samples of the stochastic z (noisy signal), and the first degree polynomial data weights are

w_n(\tau)\equiv\frac{[\bar{t^2}-\bar{t}t_n+(t_n-\bar{t})\tau]}{(\bar{t^2}- \bar{t}^2)}

which represent the linear polynomial least squares "system" and describe its processing.[8] The Greek letter \tau is the independent variable t when estimating the dependent variable y(t) after data fitting has been performed. (The letter \tau is used to avoid confusion with t before and sampling during polynomial least squares processing.) The overbar ( ¯ ) defines the deterministic centroid of u_n as processed by polynomial least squares [8] – i.e., it defines the deterministic first order moment, which may be considered a sample average, but does not here approximate a first order statistical moment:

\bar{u}\overset{\underset{\mathrm{def}}{}}{=}\frac {1} {N}\sum_{n=1} ^N u_n

Empirically determined statistical moments

Applying  w_n (\tau) yields

\hat y(\tau)=\hat\alpha+\hat\beta \tau

where

\hat\alpha=\frac{(\bar{z}\bar{t^2}-\bar{zt}\bar{t})}{(\bar{t^2}-\bar{t}^2)}=\alpha+\frac{(\bar{\varepsilon}\bar{t^2}-\bar{{\varepsilon}t}\bar{t})}{(\bar{t^2}-\bar{t}^2)}

and

\hat\beta=\frac{(\bar{zt}-\bar{z}\bar{t})}{(\bar{t^2}-\bar{t}^2)}=\beta+\frac{(\bar{\varepsilon t}-\bar{\varepsilon}\bar{t})}{(\bar{t^2}-\bar{t}^2)}

As linear functions of the random variables \varepsilon_n , both coefficient estimates \hat\alpha and \hat\beta are random variables.[8] In the absence of the errors \varepsilon_n, \hat\alpha=\alpha and \hat\beta=\beta, as they should to meet that boundary condition.

Because the statistical expectation operator E[•] is a linear function and the sampled stochastic process errors \varepsilon_n are zero mean, the expected value of the estimate \hat y is the first order statistical moment as follows:[1][2][3][8]

E[\hat y (\tau)] =\alpha+\beta\tau+ \frac {1} {N}\sum_{n=1} ^N E[\varepsilon_n] w_n (\tau)= \alpha+\beta\tau =\alpha+\beta t

The statistical variance in \hat y is given by the second order statistical central moment as follows:[1][2][3][8]

\sigma_\hat y ^2 = E[(\hat y)-E[\hat y])^2 ]= \frac {1} {N}\frac {1} {N}\sum_{n=1} ^N \sum_{i=1} ^N w_n (\tau) E[\varepsilon_n \varepsilon_i] w_i (\tau)

=\sigma_\varepsilon ^2 \frac {1} {N}\frac {1} {N}\sum_{n=1} ^N \sum_{i=1} ^N w_n ^2 (\tau)

because

 \sum_{i=1} ^N E[\varepsilon_n \varepsilon_i] w_i (\tau)=\sigma_\varepsilon^2 w_n (\tau)

where \sigma_\varepsilon^2  is the statistical variance of random variables  \varepsilon_n ; i.e.,  E[\varepsilon_n \varepsilon_i]= \sigma_\varepsilon ^2  for i = n and (because  \varepsilon_n are uncorrelated)  \sigma_\varepsilon ^2=0 for i \ne n [8]

Carrying out the multiplications and summations in \sigma_\hat y^2 yields

\sigma_\hat y^2=\sigma_\varepsilon^2\frac{(\bar{t^2}-2\bar{t}\tau+\tau^2)}{N(\bar{t^2}- \bar{t}^2)}

[8]

Measuring or approximating the statistical variance of the random errors

In a hardware system, such as a tracking radar, the measurement noise variance \sigma_\varepsilon^2 can be determined from measurements when there is no target return – i.e., by just taking measurements of the noise alone.

However, if polynomial least squares is used when the variance \sigma_\varepsilon^2 is not measurable (such as in econometrics or statistics), it can be estimated with observations in e_\min from orthogonal projection as follows:

\sigma_\varepsilon^2\approx\hat {\sigma_\varepsilon^2}= (\bar {z^2}-\hat\alpha\bar{z} - \hat \beta \bar{zt}) [8]

As a result, to the first order approximation from the estimates \hat\alpha and  \hat\beta as functions of sampled z and  t

\sigma_\hat y^2 \approx \bigg[\frac{(\bar{z^2}-\bar{z}^2)}{(\bar{t^2}-\bar{t}^2)}- \Biggl(\frac{(\bar{zt}-\bar{z}\bar{t})}{(\bar{t^2}-\bar{t})}\Biggl)^2 \bigg]{\frac{(\bar{t^2}-2\bar{t}\tau+\tau^2)}N}

which goes to zero in the absence of the errors \varepsilon_n , as it should to meet that boundary condition.[8]

As a result, the samples  z_n (noisy signal) are considered to be the input to the linear polynomial least squares "system" which transforms the samples into the empirically determined statistical estimate  \hat y (\tau), the expected value E[\hat y] , and the variance \sigma_\hat y^2 .[8]

Properties of polynomial least squares modeled as a linear "system"

(1) The empirical statistical variance \sigma_\hat y^2 is a function of \sigma_\varepsilon ^2 , N and \tau. Setting the derivative of \sigma_\hat y^2 with respect to \tau equal to zero shows the minimum to occur at \tau=\bar t; i.e., at the centroid (sample average) of the samples t_n . The minimum statistical variance thus becomes \frac{\sigma_\varepsilon ^2 } {N} . This is equivalent to the statistical variance from polynomial least squares of a zero degree polynomial – i.e., of the centroid (sample average) of \alpha .[1][2][8] [9]

(2) The empirical statistical variance \sigma_\hat y^2 is a function of the quadratic \tau^2 . Moreover, the further \tau deviates from \bar t (even within the data window), the larger is the variance \sigma_\hat y^2 due to the random variable errors \varepsilon_n . The independent variable \tau can take any value on the t axis. It is not limited to the data window. It can extend beyond the data window – and likely will at times depending on the application. If it is within the data window, estimation is described as interpolation. If it is outside the data window, estimation is described as extrapolation. It is both intuitive and well known that the further is extrapolation, the larger is the error.[8]

(3) The empirical statistical variance \sigma_\hat y^2 due to the random variable errors \varepsilon_n is inversely proportional to N. As N increases, the statistical variance decreases. This is well known and what filtering out the errors  \varepsilon_n is all about.[1][2][8][12] The underlying purpose of polynomial least squares is to filter out the errors to improve estimation accuracy by reducing the empirical statistical estimation variance. In reality, only two data points are required to estimate \alpha and \beta ; albeit the more data points with zero mean statistical errors included, the smaller is the empirical statistical estimation variance as established by N samples.

(4) There is an additional issue to be considered when the noise variance is not measurable: Independent of the polynomial least squares estimation, any new observations would be described by the variance \sigma_\varepsilon^2\approx\hat {\sigma_\varepsilon^2}= (\bar {z^2}-\hat\alpha\bar{z} - \hat \beta \bar{zt}).[8][9]

Thus, the polynomial least squares statistical estimation variance \sigma_\hat y^2 and the statistical variance of any new sample in \sigma_\varepsilon ^2 would both contribute to the uncertainty of any future observation. Both variances are clearly determined by polynomial least squares in advance.

(5) This concept also applies to higher degree polynomials. However, the weighting function  w_n (\tau) is obviously more complicated. In addition, the estimation variances increase exponentially as polynomial degrees increase linearly (i.e., in unit steps). However, there are ways of dealing with this as described in.[6][7]

The synergy of integrating polynomial least squares with statistical estimation theory

Modeling polynomial least squares as a linear signal processing "system" creates the synergy of integrating polynomial least squares with statistical estimation theory to deterministically process samples of an assumed polynomial corrupted with a statistically described stochastic error ε. In the absence of the error ε, statistical estimation theory is irrelevant and polynomial least squares reverts back to the conventional approximation of complicated functions and scatter plots.

See also


References

  1. 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 Lua error in package.lua at line 80: module 'strict' not found.
  2. 2.0 2.1 2.2 2.3 2.4 2.5 2.6 Lua error in package.lua at line 80: module 'strict' not found.
  3. 3.0 3.1 3.2 Lua error in package.lua at line 80: module 'strict' not found.
  4. 4.0 4.1 4.2 Lua error in package.lua at line 80: module 'strict' not found.
  5. 5.0 5.1 Sorenson, H. W., Least-squares estimation: Gauss to Kalman, IEEE Spectrum, July, 1970.
  6. 6.0 6.1 6.2 6.3 6.4 Bell, J. W., Simple Disambiguation Of Orthogonal Projection In Kalman’s Filter Derivation, Proceedings of the International Conference on Radar Systems, Glasgow, UK. October, 2012.
  7. 7.0 7.1 7.2 7.3 7.4 Bell, J. W., A Simple Kalman Filter Alternative: The Multi-Fractional Order Estimator, IET-RSN, Vol. 7, Issue 8, October 2013.
  8. 8.00 8.01 8.02 8.03 8.04 8.05 8.06 8.07 8.08 8.09 8.10 8.11 8.12 8.13 8.14 8.15 8.16 8.17 8.18 8.19 [1]
  9. 9.0 9.1 9.2 9.3 9.4 9.5 9.6 Papoulis, A., Probability, RVs, and Stochastic Processes, McGraw-Hill, New York, 1965
  10. 10.0 10.1 Wylie, C. R., Jr., Advanced Engineering Mathematics, McGraw-Hill, New York, 1960.
  11. 11.0 11.1 11.2 Schied, F., Numerical Analysis, Schaum's Outline Series, McGraw-Hill, New York, 1968.
  12. Ordinary least squares