Correlation and Causation
Understanding correlation
Within any population, the variables that concern a researcher will take different values. This variation in the values of a variable is the basis for types of analysis that go beyond simply counting categories of a phenomenon. Such analysis uses variation to make statements about the nature of the relationship between variables. One way to measure the association between two variables is correlation. Correlation is a useful tool because it provides a quantitative measure of the presumed relationship between variables.
Correlation is therefore a statistical technique that provides a numerical or quantitative assessment of the degree to which two variables co-vary. The idea of association is tied to the concept of co-variation. Co-variation occurs when the values of two variables change together. This joint change is a conceptual association that exists as a consequence of the way in which we try to make sense of the world. Within the mind of the observer it is possible to consider that the presence of x is linked to the presence of y. This link is made by observing instances of x and seeing instances of y occurring in close proximity to x. One may observe, for example, that changes in diet may result in the loss or gain of weight. Observations of this kind form the basis of common understandings about the relationship between things. What scientists have attempted to do is to measure the strength of that relationship, thus providing a number that can be compared to other numbers to indicate different features of the observed relationship.
The main way to represent a correlation is the correlation coefficient (r). The correlation coefficient is the product of a series of statistical calculations, and is produced when either Pearson's product-moment correlation or Spearman's rho is computed. The correlation coefficient ranges in value from -1.0 to +1.0. The larger the absolute size of the coefficient (that is, the closer it is to 1 or -1), the stronger the relationship between the variables being tested. Moderate correlations are understood to begin at around 0.6 and weak correlations at around 0.4; these values may be positive or negative. A correlation coefficient of 0 suggests that there is no linear relationship between the variables being tested.
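The series of calculations behind Pearson's r can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `pearson_r` and the small data set are hypothetical and are not taken from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations: how much x and y vary together.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations: how much each variable varies on its own.
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical illustrative data (an assumption, not data from the text).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 3))
```

Dividing the co-variation by the square root of the product of the two separate variations is what keeps the result between -1.0 and +1.0, whatever units the variables are measured in.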
The positive and negative signs are very important in interpreting the correlation between two variables. While the number tells the magnitude or size of the correlation, the sign before the number indicates the direction of the correlation. The direction of the correlation can be positive or negative. These directions are also known as a direct correlation and an inverse correlation (Cooper & Schindler 2011). With a direct correlation the values of both variables increase together. For example, as the number of calories that an individual ingests increases, their weight may also increase. The relationship that has been described is a positive correlation, where as one variable increases the other also increases. In an inverse correlation, as the independent variable increases the dependent variable decreases, that is, it moves in the opposite direction. This negative correlation is represented by the negative sign in front of the correlation coefficient.
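Reading the sign and the size of a coefficient separately can be illustrated with a small helper. This is only a sketch: the function name `describe` is hypothetical, the 0.6 and 0.4 cut-offs follow the rough thresholds mentioned in the text, and the 0.8 cut-off for "strong" and the "negligible" label are assumptions added for illustration.

```python
def describe(r):
    """Return (direction, strength) labels for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    # The sign gives the direction of the relationship.
    if r > 0:
        direction = "direct (positive)"
    elif r < 0:
        direction = "inverse (negative)"
    else:
        direction = "none"
    # The magnitude ignores the sign and gives the strength.
    size = abs(r)
    # 0.6 and 0.4 follow the text; 0.8 and "negligible" are assumptions.
    if size >= 0.8:
        strength = "strong"
    elif size >= 0.6:
        strength = "moderate"
    elif size >= 0.4:
        strength = "weak"
    else:
        strength = "negligible"
    return direction, strength

print(describe(0.85))
print(describe(-0.45))
```

Because the strength is taken from the absolute value, coefficients of 0.85 and -0.85 are equally strong; only their direction differs.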
Correlations can also be represented graphically. A scatterplot can be used to plot or graph the relationship between two variables (Howell 2011). In constructing a scatterplot, the independent variable is usually placed on the x axis and the dependent variable on the y axis. The values of x are then plotted against the corresponding values of y. If the relationship between the variables is positive, a line drawn through the points will slope upward. This upward slope signifies that as the values of x increase, the corresponding values of y also increase. The opposite is true for a negative relationship: the line will slope downward and the correlation will be negative.
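The link between the direction of a relationship and the slope of a line through the points can be checked numerically. The sketch below assumes the line is an ordinary least-squares line, which the text does not specify; the function name `least_squares_slope` and the calorie and weight figures are hypothetical.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation of x and y, and variation of x alone.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    if s_xx == 0:
        raise ValueError("slope is undefined when x does not vary")
    return s_xy / s_xx

# Hypothetical illustrative data (an assumption, not data from the text).
calories = [1800, 2000, 2200, 2400, 2600]   # independent variable, x axis
weight_kg = [60, 64, 63, 70, 73]            # dependent variable, y axis

rising = least_squares_slope(calories, weight_kg)
# Reversing the y values turns the same points into an inverse relationship.
falling = least_squares_slope(calories, list(reversed(weight_kg)))
print(rising > 0, falling < 0)
```

The slope and the correlation coefficient share the same numerator (the co-variation of x and y), so they always carry the same sign.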
In interpreting the correlation coefficient there is one other factor to consider, and that is the p value. The statistical program will produce a probability statistic that indicates how likely it would be to observe a correlation at least as large as the one obtained if chance alone were operating, that is, if the null hypothesis of no relationship were true. The researcher then compares the p value to the alpha level that was set in advance for this particular test. If the p value is less than the alpha level, the researcher rejects the null hypothesis. If, however, the p value is greater than the alpha level, the researcher fails to reject the null hypothesis; this is not the same as showing that the null hypothesis is true. The correct interpretation of a correlation statistic involves consideration of the magnitude of the correlation coefficient, the sign in front of the coefficient and the p value produced by the statistical test.
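The logic of comparing a p value with alpha can be sketched with a permutation test. This is a swapped-in technique chosen because it needs only the Python standard library; statistical packages typically derive the p value for Pearson's r from a t distribution instead. The function names, the data, the number of shuffles and the alpha of 0.05 are all assumptions made for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences that both vary."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Two-sided p value: share of random pairings whose |r| is at least
    as large as the observed |r|."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Count the observed pairing itself so the estimate is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject only when p is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical illustrative data (an assumption, not data from the text).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
p = permutation_p_value(x, y)
print(decide(p))
```

Shuffling one variable simulates a world in which the null hypothesis is true, so the share of shuffles that match or beat the observed coefficient estimates how often chance alone would produce a result that extreme.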
This type of interpretation can be observed with the following example. If the pre-loan and post-loan expenditure on school books by students is examined, a correlation can be determined.
A scatterplot of the data would suggest that the correlation between pre-loan and post-loan expenditure is positive.
Pre-Loan
Post-Loan
36
74
33
62
75
55
73
93
65
50
83
62
77
74
67
44
70
99
87
40
83
37
71
83
59
85
72
57
86
55
88
89
79
91
73
Using Excel, a correlation of the above data produces a value of 0.85. This r value suggests that there is a strong positive relationship between the two variables.