Correlation Does not Imply Causation

Many fitness and health websites miss this point when they report on research in these fields. They present findings as causation when what the research actually found is correlation. For example, researchers found that early risers tend to have a lower BMI and are less likely to be obese. This correlation can be misrepresented as 'Waking up early reduces the chance of obesity'. We do not know that waking up early caused the outcome (lower obesity); what we have found is a correlation.

Informally, correlation means that when event A happens, event B also tends to happen, and vice versa. In the example above, people who wake up early tend to be at the lower end of the weight spectrum: the two tend to occur together. But that does not mean one caused the other.

Causation means that event A caused, or led to, event B. For example, if I stand in the sun, I get tanned. Here the second event occurs because of the first.

In statistics, a correlation is a relationship between two variables: when one changes, the other tends to change in a consistent direction. Causation refers to a relationship where a change in one variable is responsible for the change in another variable. This is also known as a causal relationship.
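A common way to put a number on how strongly two variables are (linearly) correlated is the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below computes it from scratch; the function name `pearson` and the sample data are illustrative assumptions, not part of any particular library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the deviations from each mean move together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical data: hours of sun per week vs. a made-up "tan score".
sun_hours = [2, 5, 8, 11, 14]
tan_score = [1, 3, 4, 7, 9]
print(round(pearson(sun_hours, tan_score), 3))
```

A value close to 1 only says the two lists rise together. The number by itself does not say whether sun exposure causes tanning; that knowledge comes from outside the data.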

When there is a causal relationship between two variables, the two are typically also correlated. However, a correlation between two variables does not imply a causal relationship between them; assuming that it does is a logical fallacy.

A correlation between two variables can have several explanations:

  • One variable influences the other. This is a causal relationship. For example, a person's salary is correlated with the number of cars they own, since a higher salary makes buying more cars affordable.
  • Both variables influence each other. This is a two-way causal relationship. For example, a person's education level and wealth are correlated: more education can lead to higher earnings, and more wealth makes further education easier to afford.
  • A third variable influences both variables under examination (often called a confounding variable). There is no causal relationship between the two variables themselves. For example, the number of cars owned and the size of the house may be correlated, but both are influenced by another variable: salary. Owning more cars does not make the house bigger.
  • The correlation could be a coincidence. This is not a causal relationship either. It is the explanation for the often-cited example of margarine consumption and the divorce rate in Maine, which happen to track each other closely.

In machine learning, correlation is enough to build a predictive model. However, just because two variables are correlated does not mean one influences the other. In other words, although machine learning can help find a relationship between two variables, it does not necessarily reveal the reason for that relationship. Because of this, explanatory applications, which need to say why something happens or what would happen if we changed a variable, require not only correlation but also causation.