It’s common to hear that correlation does not imply causation. It’s certainly true in the strong sense that observing a correlation between X and Y does not prove causation X->Y, because the true causality might be Y->X or Z->(X,Y).
Some wag (Feynman?) pointed out that “correlation does not imply causation, but it’s a good start.” This is also true. If you’re trying to understand the causes of Y, data mining for things that are correlated is a useful exploratory step, even if it proves nothing. If you find something, then you can look for plausible mechanisms, try experiments, etc.
Some go a little further than this, and combine Popper’s falsification with causality criteria to argue that lack of correlation does imply lack of causation. Unfortunately, this is untrue, for a number of reasons:
- Measurement error – in OLS regression, the slope is just the correlation coefficient scaled by the ratio of standard deviations (slope = r × σ_Y/σ_X). However, if there’s measurement error in the RHS variables, not just equation error affecting the LHS, attenuation bias shrinks the estimated slope toward zero. In other words, a poor signal-to-noise ratio destroys apparent correlation, even when causality is present.
- Integration – bathtub dynamics (stocks accumulating their flows) render pattern matching incorrect and destroy correlations, even in synthetic data experiments where causation is known to exist.
- Nonlinearity – there are many possible bivariate patterns that result in a linear correlation coefficient of 0 despite an obvious (possibly causal) relationship.
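The three effects in the list above are easy to reproduce with synthetic data where the causal link is built in by construction. What follows is only a minimal sketch using the Python standard library. The helper names (`pearson`, `ols_slope`), the coefficients, the noise levels and the sine-wave inflow are all illustrative assumptions, not taken from any particular study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Slope of the OLS regression of ys on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 1. Measurement error: y is caused by x (true slope 2), but x is only
#    observed with noise. All parameter values are arbitrary assumptions.
n = 20000
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * x + rng.gauss(0, 0.5) for x in x_true]   # equation error on the LHS
x_obs = [x + rng.gauss(0, 2.0) for x in x_true]     # measurement error on the RHS
slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_obs, y)  # theory: about 2 * 1 / (1 + 2**2) = 0.4
r_clean = pearson(x_true, y)
r_noisy = pearson(x_obs, y)

# 2. Integration: a stock accumulates a sinusoidal inflow (Euler steps).
#    The inflow fully determines the stock, yet the two are a quarter
#    cycle out of phase, so their correlation over whole cycles is near 0.
steps_per_cycle, cycles = 1000, 10
dt = 2 * math.pi / steps_per_cycle
inflow, stock, net_change = [], [], []
level = 0.0
for k in range(steps_per_cycle * cycles):
    flow = math.sin(k * dt)
    inflow.append(flow)
    stock.append(level)            # stock level before this step's inflow
    net_change.append(flow * dt)   # change in the stock during this step
    level += flow * dt
r_inflow_stock = pearson(inflow, stock)
r_inflow_change = pearson(inflow, net_change)

# 3. Nonlinearity: y = x^2 on a grid symmetric about zero.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]
r_parabola = pearson(xs, ys)

print(f"slope: clean {slope_clean:.2f}, noisy {slope_noisy:.2f}")
print(f"r:     clean {r_clean:.2f}, noisy {r_noisy:.2f}")
print(f"r(inflow, stock)  = {r_inflow_stock:.3f}")
print(f"r(inflow, change) = {r_inflow_change:.3f}")
print(f"r(x, x^2)         = {r_parabola:.3f}")
```

In the integration case the correlation reappears once the inflow is compared with the change in the stock rather than its level, which is a hint that the right comparison depends on knowing the structure first.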
Most systems have all three of these features to some extent, and they gain strength in combination. Noise integrates into the system stocks, and the slope or correlation of a relationship may reverse, depending on system state. Sugihara et al. show that Granger causality fails in such systems, because “in deterministic dynamic systems (even noisy ones), if X is a cause for Y, information about X will be redundantly present in Y itself and cannot formally be removed….”
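To see how a correlation can reverse with system state, here is a small sketch of a stock that grows logistically while driving noise accumulates in it. Logistic growth is an assumed example chosen for illustration, and the parameter values (`growth_rate`, `capacity`, `noise_sd`) are arbitrary. The stock level causally drives its own net change throughout, but the sign of the relationship depends on whether the stock is below or above half its capacity.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical parameters for a noisy logistic stock.
growth_rate, capacity, dt = 0.5, 100.0, 0.1
noise_sd = 0.05  # driving noise added to the stock at every step

level = 1.0
levels, changes = [], []
for _ in range(400):
    # Net change per step: logistic growth plus noise that integrates
    # into the stock along with the deterministic flow.
    change = growth_rate * level * (1 - level / capacity) * dt
    change += rng.gauss(0, noise_sd)
    levels.append(level)
    changes.append(change)
    level += change

# Split the observations by system state: below vs. above half capacity.
low = [(x, dx) for x, dx in zip(levels, changes) if x < capacity / 2]
high = [(x, dx) for x, dx in zip(levels, changes) if x >= capacity / 2]

r_low = pearson(*zip(*low))     # early phase: more stock, faster growth
r_high = pearson(*zip(*high))   # late phase: more stock, slower growth
r_all = pearson(levels, changes)

print(f"r below half capacity: {r_low:.2f}")
print(f"r above half capacity: {r_high:.2f}")
print(f"r over the whole run:  {r_all:.2f}")
```

The pooled correlation mixes the two regimes, so its value says more about how long the system happened to spend in each state than about the strength of the underlying link.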
The common thread here is that no method can say much about causality if the assumptions neglect features of the system dynamics (integration or nonlinearity) or stochastic processes (measurement error and driving noise). Sometimes you get lucky, because you have a natural experiment, or high precision measurements, or simply loads of data about benign dynamics, but luck rarely coincides with big novel problems. Presence or absence of correlation is suggestive but far from definitive.