How to Measure the Covariance and Correlation of Data Samples

When comparing data samples from different populations, two of the most popular measures of association are covariance and correlation. Both indicate whether two variables have a positive relationship, a negative relationship, or no linear relationship at all.
A sample is a randomly chosen selection of elements from an underlying population.
Sample covariance measures the strength and the direction of the relationship between the elements of two samples, and the sample correlation is derived from the covariance. The sample covariance between two variables, X and Y, is
$$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
Here’s what each element in this equation means:
  • $s_{XY}$ = the sample covariance between variables X and Y (the two subscripts indicate that this is the sample covariance, not the sample standard deviation).
  • $\bar{X}$ = the sample mean of X.
  • n = the number of elements in both samples.
  • i = an index that assigns a number to each sample element, ranging from 1 to n.
  • $X_i$ = a single element in the sample for X.
  • $Y_i$ = a single element in the sample for Y.
  • $\bar{Y}$ = the sample mean of Y.
The sample covariance may have any positive or negative value.
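The covariance formula translates almost line for line into code. The following is a minimal Python sketch, not a reference implementation; the function name `sample_covariance` and the three-point data set are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length samples (divides by n - 1)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("both samples must have the same number of elements")
    if n < 2:
        raise ValueError("at least two elements are needed")
    x_bar = sum(xs) / n  # sample mean of X
    y_bar = sum(ys) / n  # sample mean of Y
    # Numerator: sum of the products of paired deviations from the means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / (n - 1)


# Hypothetical data: Y rises whenever X rises, so the covariance is positive.
print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
```

Reversing the order of the second list makes the result negative, which matches the idea that the sign of the covariance gives the direction of the relationship.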
You calculate the sample correlation (also known as the sample correlation coefficient) between X and Y directly from the sample covariance with the following formula:
$$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$
The key terms in this formula are
  • $r_{XY}$ = sample correlation between X and Y
  • $s_{XY}$ = sample covariance between X and Y
  • $s_X$ = sample standard deviation of X
  • $s_Y$ = sample standard deviation of Y
The formula used to compute the sample correlation coefficient ensures that its value ranges between –1 and 1.
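One way to see why the range is limited is the Cauchy-Schwarz inequality, a standard result that this post does not derive. The sketch below assumes both standard deviations are nonzero:

$$\left| \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \right| \le \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

$$\Rightarrow\quad |s_{XY}| \le s_X s_Y \quad\Rightarrow\quad |r_{XY}| = \frac{|s_{XY}|}{s_X s_Y} \le 1$$

Dividing both sides of the first inequality by n – 1 gives the second line, because each square root on the right then becomes a sample standard deviation.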
For example, suppose you take a sample of stock returns from the Excelsior Corporation and the Adirondack Corporation from the years 2008 to 2012, as shown here:
Year | Excelsior Corp. Annual Return (percent) (X) | Adirondack Corp. Annual Return (percent) (Y)
2008 | 1 | 3
2009 | –2 | 2
2010 | 3 | 4
2011 | 0 | 6
2012 | 3 | 0
What are the covariance and correlation between the stock returns? To figure that out, you first have to find the mean of each sample. In this example, X represents the returns to Excelsior and Y represents the returns to Adirondack.
  • The sample mean of X is
    $$\bar{X} = \frac{1 + (-2) + 3 + 0 + 3}{5} = \frac{5}{5} = 1$$
You obtain the sample mean by summing all the elements of the sample and then dividing by the sample size. In this case, the sample elements sum to 5 and the sample size is 5. Dividing these numbers gives a sample mean of 1.
  • The sample mean of Y is
    $$\bar{Y} = \frac{3 + 2 + 4 + 6 + 0}{5} = \frac{15}{5} = 3$$
This table shows the remaining calculations for the sample covariance:
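Year | $X_i$ | $Y_i$ | $(X_i - \bar{X})$ | $(Y_i - \bar{Y})$ | $(X_i - \bar{X})(Y_i - \bar{Y})$
2008 | 1 | 3 | 0 | 0 | 0
2009 | –2 | 2 | –3 | –1 | 3
2010 | 3 | 4 | 2 | 1 | 2
2011 | 0 | 6 | –1 | 3 | –3
2012 | 3 | 0 | 2 | –3 | –6
Sum | | | | | –4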
In the table, the $(X_i - \bar{X})$ column represents the differences between each return to Excelsior in the sample and the sample mean; similarly, the $(Y_i - \bar{Y})$ column represents the same calculations for Adirondack. The entries in the $(X_i - \bar{X})(Y_i - \bar{Y})$ column equal the product of the entries in the previous two columns. The sum of the $(X_i - \bar{X})(Y_i - \bar{Y})$ column gives the numerator in the sample covariance formula:
$$\sum_{i=1}^{5} (X_i - \bar{X})(Y_i - \bar{Y}) = 0 + 3 + 2 + (-3) + (-6) = -4$$
The denominator equals the sample size minus one, which is 5 – 1 = 4. (Both samples have five elements, n = 5.) Therefore, the sample covariance equals
$$s_{XY} = \frac{-4}{5 - 1} = \frac{-4}{4} = -1$$
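The same covariance calculation can be reproduced step by step in Python. This is a sketch for checking the arithmetic; the variable names are illustrative choices, not part of the original example.

```python
# Annual returns (percent), 2008 to 2012, from the example.
x = [1, -2, 3, 0, 3]   # Excelsior
y = [3, 2, 4, 6, 0]    # Adirondack

n = len(x)
x_bar = sum(x) / n     # sample mean of X: 1.0
y_bar = sum(y) / n     # sample mean of Y: 3.0

dx = [xi - x_bar for xi in x]               # deviations of X from its mean
dy = [yi - y_bar for yi in y]               # deviations of Y from its mean
products = [a * b for a, b in zip(dx, dy)]  # paired products of deviations

numerator = sum(products)                   # -4.0
s_xy = numerator / (n - 1)                  # -1.0

# One row per year: X, Y, X deviation, Y deviation, product.
for row in zip(x, y, dx, dy, products):
    print(row)
print("sum of products:", numerator)
print("sample covariance:", s_xy)
```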
To calculate the sample correlation coefficient, divide the sample covariance by the product of the sample standard deviation of X and the sample standard deviation of Y:
$$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$
You find the sample standard deviation of X by computing the sample variance of X and then taking the square root of the result. The table shows the calculations for the sample variance of X.
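Year | $X_i$ | $(X_i - \bar{X})$ | $(X_i - \bar{X})^2$
2008 | 1 | 0 | 0
2009 | –2 | –3 | 9
2010 | 3 | 2 | 4
2011 | 0 | –1 | 1
2012 | 3 | 2 | 4
Sum | | | 18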
In the table, the $(X_i - \bar{X})$ column represents the differences between each return to Excelsior in the sample and the sample mean; the $(X_i - \bar{X})^2$ column represents the squared difference between each return to Excelsior and the sample mean. The sum of the $(X_i - \bar{X})^2$ column gives the numerator in the sample variance formula. You divide this number by the sample size minus one (5 – 1 = 4) to get the sample variance of X:
$$s_X^2 = \frac{18}{4} = 4.5$$
The sample standard deviation of X is the square root of 4.5, or
$$s_X = \sqrt{4.5} \approx 2.1213$$
The table shows the calculations for the sample variance of Y.
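Year | $Y_i$ | $(Y_i - \bar{Y})$ | $(Y_i - \bar{Y})^2$
2008 | 3 | 0 | 0
2009 | 2 | –1 | 1
2010 | 4 | 1 | 1
2011 | 6 | 3 | 9
2012 | 0 | –3 | 9
Sum | | | 20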
Based on the calculations in the table, the sample variance of Y equals
$$s_Y^2 = \frac{20}{4} = 5$$
The sample standard deviation of Y equals the square root of 5, or
$$s_Y = \sqrt{5} \approx 2.2361$$
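As a cross-check, Python's standard `statistics` module computes the same quantities: `statistics.variance` and `statistics.stdev` both use the n – 1 denominator. A short sketch, with variable names chosen here for illustration:

```python
import statistics

x = [1, -2, 3, 0, 3]   # Excelsior returns (percent)
y = [3, 2, 4, 6, 0]    # Adirondack returns (percent)

var_x = statistics.variance(x)   # sample variance of X: 18 / 4
var_y = statistics.variance(y)   # sample variance of Y: 20 / 4
sd_x = statistics.stdev(x)       # square root of the sample variance of X
sd_y = statistics.stdev(y)       # square root of the sample variance of Y

print(var_x, var_y)
print(round(sd_x, 4), round(sd_y, 4))   # 2.1213 2.2361
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead, so they would give different numbers for a sample.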
Substituting these values into the sample correlation formula gives you
$$r_{XY} = \frac{s_{XY}}{s_X s_Y} = \frac{-1}{(2.1213)(2.2361)} \approx -0.2108$$
The negative result shows that there’s a weak negative correlation between the stock returns of Excelsior and Adirondack.
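Putting the pieces together, the whole calculation fits in a few lines of Python. This is a sketch using the example's data; `sample_correlation` is a name invented here, and the cross-check relies on `statistics.correlation`, which exists only in Python 3.10 and later, so it is guarded.

```python
import math
import statistics


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in xs) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in ys) / (n - 1))
    return s_xy / (s_x * s_y)


x = [1, -2, 3, 0, 3]   # Excelsior returns (percent)
y = [3, 2, 4, 6, 0]    # Adirondack returns (percent)

r = sample_correlation(x, y)
print(round(r, 4))  # -0.2108

# Optional cross-check against the standard library (Python 3.10+ only).
if hasattr(statistics, "correlation"):
    assert math.isclose(r, statistics.correlation(x, y))
```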
If two variables are perfectly negatively correlated (they always move in opposite directions), their correlation will be –1.

The correlation between the returns to Excelsior and Adirondack stock is –0.2108, which indicates that the two variables show a slight tendency to move in opposite directions.

Zero result: If two variables are independent (unrelated to each other), their correlation will be 0.

Correlation of 1: If two variables are perfectly positively correlated (they always move in the same direction), their correlation will be 1.
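The two extreme cases, –1 and 1, are easy to verify numerically. In this sketch the data are hypothetical: Y is built as an exact linear function of X, once with a negative slope and once with a positive slope. `pearson_r` is a helper name chosen for the example, and results are compared with a tolerance because floating-point rounding can, in general, leave them a hair away from the exact value.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length samples."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    # The (n - 1) factors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
opposite = [10 - 2 * v for v in xs]   # always moves against xs
together = [10 + 2 * v for v in xs]   # always moves with xs

print(math.isclose(pearson_r(xs, opposite), -1.0))  # True
print(math.isclose(pearson_r(xs, together), 1.0))   # True
```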

Correlation does not imply causation: the cause of the relationship may be something completely different.
