---
title: Correlation Does not Imply Causation
---

## Correlation Does not Imply Causation

Many fitness and health websites miss this point when they report on research in these fields. They present the findings as causation when what was actually found is correlation. For example, researchers found that early risers have a lower BMI and are found to be less obese. This correlation can be misrepresented as 'Waking up early can reduce the chances of obesity'. We do not know that waking up early 'caused' the outcome (lower obesity). What we have found here is a correlation.

An informal definition of correlation: when event A happens, event B also tends to happen, and vice versa. In the example, people who wake up early tend to be towards the lower end of the weight spectrum. Both events tend to happen together, but it is not necessary that one event caused the other.

Causality means that event A 'caused' or led to event B. For example, if I stand in the sun, I get tanned. Here the second event occurs because of the first.

In statistics, there is a lot of talk about **correlated variables**. A correlation is a relationship between two variables in which their values tend to change together.

**Causation** refers to a relationship where a change in one variable **is responsible for** the change in another variable. This is also known as a **causal relationship**.

When there is a causal relationship between two variables, there is generally also a correlation between them. But a correlation between two variables does not imply a causal relationship between them. Inferring causation from correlation alone is a logical fallacy.

This is because a correlation between two variables can have several explanations:

- One variable influences the other. This _would_ be a causal relationship. For example, there is a correlation between household salary and the number of cars owned.
- Both variables influence each other. This _would_ be a two-way causal relationship.
  For example, a correlation between education level and the wealth of a person.
- There is another variable that influences both variables under examination. This would _not_ be a causal relationship. For example, the number of cars owned and the size of the house may be correlated, but both are influenced by another variable: salary. An increase in the number of cars owned does not influence the size of the house.
- The correlation could be a random accident. This would _not_ be a causal relationship. This is the explanation for the often-cited correlation between margarine consumption and the divorce rate in Maine.

In machine learning, correlation suffices for making a predictive model. However, just because two variables are correlated does not mean that one variable influences the other. In other words, although machine learning may help find a relationship between two variables, it does not necessarily help find the reason for the relationship.
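
The case where another variable influences both variables under examination can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: the numbers, the linear relationships, and the helper names `pearson` and `residuals` are all hypothetical and not taken from any real data set. Salary drives both the number of cars and the house size, while the two never influence each other.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def residuals(y, x):
    """What is left of y after subtracting a least-squares line fitted on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Invented numbers: salary (in thousands) is the hidden common cause.
salary = [rng.gauss(60, 15) for _ in range(n)]
cars = [0.03 * s + rng.gauss(0, 0.3) for s in salary]  # never reads house
house = [20 * s + rng.gauss(0, 200) for s in salary]   # never reads cars

raw = pearson(cars, house)
adjusted = pearson(residuals(cars, salary), residuals(house, salary))

print(f"cars vs. house size: {raw:.2f}")
print(f"same, after removing the salary effect: {adjusted:.2f}")
```

Under these assumptions the first number should come out clearly positive and the second close to zero: cars and house size move together only because salary moves both. A predictive model could still use the number of cars to guess house size, but buying another car would not make the house any bigger.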