Association (statistics)

From Infogalactic: the planetary knowledge core

In statistics, an association is any relationship between two measured quantities that renders them statistically dependent.[1] The term "association" is closely related to the term "correlation." Both terms imply that two or more variables vary according to some pattern. However, correlation is more rigidly defined, by a correlation coefficient that measures the degree to which the association of the variables tends to a certain pattern. Sometimes the pattern of association is a simple linear relationship, as in the case of the popular Pearson product-moment correlation coefficient (commonly called simply "the correlation coefficient"), although other forms of correlation are better suited to non-linear associations.[2][3][4]
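The distinction between association and linear correlation can be illustrated with a small sketch. The helper name pearson_r and the sample data below are assumptions made for illustration only; they do not come from the cited sources. The second data set is fully determined by x (so the two quantities are clearly dependent), yet its Pearson coefficient is zero because the relationship is symmetric rather than linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative data (hypothetical): x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # exact linear relationship
quadratic = [x * x for x in xs]    # exact but non-linear relationship

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print("linear:", r_linear, "quadratic:", r_quadratic)
```

Here r_linear comes out at 1 and r_quadratic at 0 (up to floating-point rounding), even though the quadratic values depend entirely on x. A measure designed for non-linear association would be needed to detect the second relationship.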

Measures of association

There are many statistical measures of association that can be used to infer the presence or absence of association in a sample of data. Examples include the Pearson product-moment correlation coefficient (mentioned above); the odds ratio (OR), risk ratio (RR), and absolute risk reduction (ARR), all three of which are used for dichotomous measurements; other measures such as distance correlation, the tetrachoric correlation coefficient, Goodman and Kruskal's lambda, Tschuprow's T, and Cramér's V; and, in information theory, measures such as mutual information.
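As a rough sketch of how the three dichotomous measures relate to one another, the following computes OR, RR, and ARR from a 2×2 table. The function name two_by_two_measures, the "treated"/"control" labels, and the counts are hypothetical and chosen only for illustration. ARR is taken here as the control-group risk minus the treated-group risk, which is one common convention.

```python
def two_by_two_measures(treated_events, treated_total,
                        control_events, control_total):
    """Return OR, RR and ARR for a 2x2 table of a dichotomous outcome."""
    treated_non = treated_total - treated_events
    control_non = control_total - control_events
    # The odds ratio and risk ratio are undefined if a cell is zero.
    if min(treated_events, treated_non, control_events, control_non) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    risk_treated = treated_events / treated_total
    risk_control = control_events / control_total
    return {
        # Ratio of the odds of the event in the two groups.
        "OR": (treated_events * control_non) / (treated_non * control_events),
        # Ratio of the risks (event probabilities) in the two groups.
        "RR": risk_treated / risk_control,
        # Difference of the risks; a difference, not a ratio.
        "ARR": risk_control - risk_treated,
    }


# Hypothetical counts: 10 of 100 treated and 20 of 100 controls have the event.
measures = two_by_two_measures(10, 100, 20, 100)
for name, value in measures.items():
    print(name, round(value, 4))
```

With these made-up counts the risks are 0.1 and 0.2, so RR is 0.5, ARR is 0.1, and OR is 800/1800, or about 0.44. The same table yields three different numbers because each measure summarizes the association on a different scale.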

Association vs. causality

Neither association nor correlation establishes causality. This needs stating because studies that show a correlation are frequently misinterpreted as though the association by itself proved a causal claim. As an example, suppose it is reported that 89 percent of Washington, D.C.-based journalists voted for Bill Clinton in 1992[5] (the statement is assumed to be factual for the purposes of this example). Such an association may be construed as proof of a hiring bias in the American media. However, the same data could be interpreted differently: if journalists are the most politically informed portion of the population, then 89 percent of the most politically informed persons in America voted for Clinton that year, which implies a completely different form of causality (namely, that being informed caused these persons to vote for Clinton). Neither interpretation can be established from the numbers alone; both conclusions may be incorrect, both may be correct, or each may be partially correct.

In other words, association by itself neither proves nor disproves a causal explanation; at best it shows that two things are statistically related, whether or not they are causally related. Likewise, it is common (and yet erroneous) for people or groups to state that "studies show..." some causal conclusion when the studies in question established only a statistical association. This is not to say that such studies are invalid (although they may be), but rather that a study which looks only for correlation can establish only that a correlation exists, not why it exists.
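One way an association can arise without either quantity causing the other is a shared underlying factor. The simulation below is a minimal sketch of that situation under assumed conditions: the variable names, the normally distributed noise, and the sample size are all hypothetical. Both x_observed and y are driven by a common factor z, and neither influences the other. When x is instead assigned at random, independently of z, the association with y disappears.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical common factor z drives both x_observed and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_observed = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# x assigned at random, independently of z: y does not respond to it.
x_assigned = [rng.gauss(0, 1) for _ in range(n)]

r_observed = pearson_r(x_observed, y)
r_assigned = pearson_r(x_assigned, y)
print("observed:", round(r_observed, 3), "assigned:", round(r_assigned, 3))
```

Under these assumptions the observed correlation is clearly positive (about 0.5 in expectation), while the correlation under random assignment is close to zero. The numbers alone in the first case would not reveal that the relationship is produced entirely by z.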

References

  1. Upton, G., Cook, I. (2006) Oxford Dictionary of Statistics, 2nd Edition, OUP. ISBN 978-0-19-954145-4
  2. Croxton, Frederick Emory; Cowden, Dudley Johnstone; Klein, Sidney (1968) Applied General Statistics, Pitman. ISBN 9780273403159 (page 625)
  3. Dietrich, Cornelius Frank (1991) Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement, 2nd Edition, A. Hilger. ISBN 9780750300605 (page 331)
  4. Aitken, Alexander Craig (1957) Statistical Mathematics, 8th Edition, Oliver & Boyd. ISBN 9780050013007 (page 95)
  5. Media Bias Basics