Covariance and correlation

    To "measure" the tendency of two variables to vary proportionally, the concept of covariance is used, which derives its name from the relationship with the formula of variance: instead of the square of the difference from mean, we take the product of the two differences:

    Var(X) = mean( (X−mean(X))^2 )
    Var(Y) = mean( (Y−mean(Y))^2 )

    covariance:   Cov(X,Y) = mean( (X−mean(X))·(Y−mean(Y)) )

    I can interpret it as an indicator whose absolute value decreases as the points tend to be arranged with a vertical or horizontal symmetry, and grows as the points tend to be arranged along an oblique line. In fact the terms of the summation (inside the mean) represent "signed" areas of rectangles whose sides are the "signed" distances of the coordinates of the points from the coordinates of the center of gravity. In the figure below on the left (horizontal symmetry) the terms of the sum cancel each other out two by two, so the covariance is zero. If the cloud of points is squashed obliquely, the compensation becomes only partial. In the case of the figure on the right (X and Y in a linear relation) there is no compensation (all the terms are positive). The sign of the covariance equals the sign of the slope of the line along which the points tend to be arranged.
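
    As a small sketch of this compensation (with hypothetical data, not taken from the text): a cloud symmetric about a horizontal axis gives terms that cancel two by two, while points lying on an oblique line give all-positive terms.

x=c(1,2,3,4); y=c(5,7,7,5)                 # cloud with horizontal symmetry
(x-mean(x))*(y-mean(y))                    # "signed areas": they cancel two by two
#  1.5 -0.5  0.5 -1.5
mean( (x-mean(x))*(y-mean(y)) )
#  0
y2=2*x+1                                   # points along an oblique line
(x-mean(x))*(y2-mean(y2))                  # all terms positive
#  4.5 0.5 0.5 4.5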

    Another possible interpretation is based on the observation that Cov(X,Y) = mean(X·Y) − mean(X)·mean(Y): the covariance measures the deviation of mean(X·Y) from mean(X)·mean(Y), i.e. from the value that mean(X·Y) would take if X and Y were independent.
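
    This identity can be checked numerically in R, with the same data used in the session below:

x=c(220,300,210,350,270); y=c(32,38,27,50,25)
mean(x*y) - mean(x)*mean(y)
#  384
mean( (x-mean(x))*(y-mean(y)) )
#  384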

    To make the indicator independent of the units of measurement in which X and Y are expressed (and to pass from an "area" to a pure number), the covariance is normalized by dividing it by the standard deviations (i.e. the square roots of the variances) of X and Y, introducing the:

correlation coefficient:   r(X,Y)  =  Cov(X,Y) / √( Var(X)·Var(Y) )  =  Cov(X,Y) / ( Sd(X)·Sd(Y) )
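
    The normalization makes r insensitive to changes of scale: rescaling X (e.g. changing its unit of measurement) changes the covariance but not the correlation coefficient, as this small check with the data of the session below shows:

x=c(220,300,210,350,270); y=c(32,38,27,50,25)
cov(x/1000,y); cov(x,y)                    # the covariance depends on the units
#  0.48   480
cor(x/1000,y); cor(x,y)                    # the correlation coefficient does not
#  0.8239751   0.8239751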

    These values refer to the situation in which X and Y range over "all possible values". In the experimental case (where I have N observations) the variance (and the standard deviation) is computed by dividing by N−1 instead of N, and therefore comes out slightly higher:

var(X) = Var(X) · N/(N−1)       Var(X) = var(X) · (N−1)/N
sd(X) = Sd(X) · √(N/(N−1))      Sd(X) = sd(X) · √((N−1)/N)
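
    For instance, with the data of the session below (N = 5) the conversion for the standard deviation can be checked directly:

x=c(220,300,210,350,270); n=length(x)
sqrt( mean( (x-mean(x))^2 ) ) * sqrt( n/(n-1) ); sd(x)
#  57.87918   57.87918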

    Here are the commands in R. The commands whose initial letter is capitalized give the theoretical values (division by N); the lowercase ones, built into R, give the experimental estimates (division by N−1).

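    Note that Var and Sd are not commands of base R: the session below assumes they have already been defined (for instance by a script loaded at start-up). A minimal sketch of definitions consistent with the outputs below:

Var = function(x) mean( (x-mean(x))^2 )    # theoretical variance: division by N
Sd  = function(x) sqrt( Var(x) )           # theoretical standard deviation
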
x=c(220,300,210,350,270); y=c(32,38,27,50,25); n=length(x); n
#  5
mean(x); mean(y)
#  270    34.4
mean( (x-mean(x))^2 ); mean( (y-mean(y))^2 )                   # theoretical variances: division by n
#  2680   81.04
Var(x); Var(y)
#  2680   81.04
mean( (x-mean(x))^2 )*n/(n-1); mean( (y-mean(y))^2 )*n/(n-1)   # experimental: division by n-1
#  3350  101.3
var(x); var(y)
#  3350  101.3
mean( (x-mean(x))*(y-mean(y)) )                                # theoretical covariance
#  384
mean( (x-mean(x))*(y-mean(y)) )*n/(n-1)                        # experimental covariance
#  480
cov(x,y)
#  480
Cov = cov(x,y)*(length(x)-1)/length(x); Cov                    # back to the theoretical value
#  384
sd(x); sd(y)
#  57.87918  10.06479
Sd(x); Sd(y)
#  51.76872   9.002222

    In both the theoretical and the experimental case the same value is obtained for the correlation coefficient: the correction factor N/(N−1) appears in the numerator and in the denominator, so it cancels out.

cov(x,y)/(sd(x)*sd(y))                            # experimental values
# 0.8239751
cov(x,y)*(length(x)-1)/length(x)/(Sd(x)*Sd(y))    # theoretical values
# 0.8239751
cor(x,y)                                          # built-in command
# 0.8239751