\(\newcommand{L}[1]{\| #1 \|}\newcommand{VL}[1]{\L{ \vec{#1} }}\newcommand{R}[1]{\operatorname{Re}\,(#1)}\newcommand{I}[1]{\operatorname{Im}\, (#1)}\)
Correlation and projection
Here we phrase the Pearson product-moment correlation coefficient in terms of vectors.
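Say we have two vectors of \(n\) values:
\[
\begin{aligned}
\vec{x} &= [x_1, x_2, \ldots, x_n] \\
\vec{y} &= [y_1, y_2, \ldots, y_n]
\end{aligned}
\]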
Write the mean of the values in \(\vec{x}\) as \(\bar{x}\), and likewise the mean of \(\vec{y}\) as \(\bar{y}\):
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
\]
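Define two new vectors, \(\vec{x^c}, \vec{y^c}\), that contain the values in \(\vec{x}, \vec{y}\) with their respective means subtracted:
\[
\begin{aligned}
\vec{x^c} &= [x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}] \\
\vec{y^c} &= [y_1 - \bar{y}, y_2 - \bar{y}, \ldots, y_n - \bar{y}]
\end{aligned}
\]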
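As a minimal sketch of the definitions so far, assuming plain Python lists stand in for the vectors, the mean and the centered vectors can be computed as below. The helper names `mean` and `centered` and the example values are hypothetical, chosen only for illustration.

```python
import math

def mean(values):
    """Mean of a sequence of numbers: (1 / n) * sum of the values."""
    return sum(values) / len(values)

def centered(values):
    """Return a new list with the mean subtracted from every value."""
    m = mean(values)
    return [v - m for v in values]

# Hypothetical example vectors with n = 4 values each.
x = [1.0, 2.0, 3.0, 6.0]
y = [2.0, 1.0, 5.0, 4.0]

x_c = centered(x)  # mean of x is 3.0, so x_c = [-2.0, -1.0, 0.0, 3.0]
y_c = centered(y)  # mean of y is 3.0, so y_c = [-1.0, -2.0, 2.0, 1.0]

# The values of a centered vector always sum to zero (up to rounding).
print(x_c, math.isclose(sum(x_c), 0.0, abs_tol=1e-12))
```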
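Define the sample variance of \(\vec{x}\) as the mean of the squared deviations from the mean (note that the divisor is \(n\), not \(n - 1\)):
\[
\sigma_x^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2
\]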
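The sample standard deviation of \(\vec{x}\) is the square root of the variance, and likewise \(\sigma_y\) for \(\vec{y}\):
\[
\sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2}
\]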
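A small sketch of the variance and standard deviation with the divide-by-\(n\) convention, again using plain lists and hypothetical helper names. For comparison, the standard library's `statistics.pvariance` also divides by \(n\), whereas `statistics.variance` divides by \(n - 1\).

```python
import math
import statistics

def variance(values):
    """Mean of squared deviations from the mean (divides by n, not n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def std(values):
    """Standard deviation: square root of the divide-by-n variance."""
    return math.sqrt(variance(values))

x = [1.0, 2.0, 3.0, 6.0]  # hypothetical example values
print(variance(x))  # (4 + 1 + 0 + 9) / 4 = 3.5
print(std(x))

# statistics.pvariance uses the same divide-by-n definition.
print(math.isclose(variance(x), statistics.pvariance(x)))
```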
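Now define the standardized versions of \(\vec{x}, \vec{y}\) as:
\[
\begin{aligned}
\vec{x^z} &= \frac{\vec{x^c}}{\sigma_x} \\
\vec{y^z} &= \frac{\vec{y^c}}{\sigma_y}
\end{aligned}
\]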
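One way to check the standardization is to confirm that the result has mean 0 and (divide-by-\(n\)) standard deviation 1. This is a sketch assuming list inputs; `standardize` is a hypothetical helper name.

```python
import math

def standardize(values):
    """Subtract the mean, then divide by the divide-by-n standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

x = [1.0, 2.0, 3.0, 6.0]  # hypothetical example values
x_z = standardize(x)
n = len(x_z)

z_mean = sum(x_z) / n
# The mean of x_z is (close to) zero, so its standard deviation is the
# square root of the mean of the squared values.
z_std = math.sqrt(sum(v ** 2 for v in x_z) / n)
print(z_mean, z_std)  # close to 0 and 1
```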
The Pearson product-moment correlation coefficient of \(\vec{x}, \vec{y}\) is \(1 / n\) times the dot product of \(\vec{x^z}\) and \(\vec{y^z}\):
\[
r_{xy} = \frac{1}{n} \, \vec{x^z} \cdot \vec{y^z}
\]
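The definition translates directly into code: standardize both vectors, take their dot product, and divide by \(n\). This sketch uses only the standard library; `standardize`, `dot`, and `pearson_r` are hypothetical names, and the example values are made up.

```python
import math

def standardize(values):
    """Values minus their mean, divided by the divide-by-n standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def pearson_r(x, y):
    """r_xy = (1 / n) * (x^z . y^z)."""
    return dot(standardize(x), standardize(y)) / len(x)

x = [1.0, 2.0, 3.0, 6.0]
y = [2.0, 1.0, 5.0, 4.0]
print(pearson_r(x, y))
```

Any exact increasing linear function of \(\vec{x}\), such as \(2x_i + 1\), gives \(r_{xy} = 1\) (up to rounding), and \(-x_i\) gives \(-1\).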
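The equivalent expression in terms of sums rather than vectors is:
\[
r_{xy} = \frac{1}{n} \sum_{i=1}^n \frac{x_i - \bar{x}}{\sigma_x} \, \frac{y_i - \bar{y}}{\sigma_y}
\]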
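The sum form can be written as a single loop over \(i\) without building any intermediate vectors. This is a sketch with a hypothetical function name, `pearson_r_sums`, that mirrors each term of the sum.

```python
import math

def pearson_r_sums(x, y):
    """r_xy = (1/n) * sum_i ((x_i - x_bar) / sigma_x) * ((y_i - y_bar) / sigma_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Divide-by-n standard deviations, as defined above.
    sigma_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    total = 0.0
    for xi, yi in zip(x, y):
        total += ((xi - x_bar) / sigma_x) * ((yi - y_bar) / sigma_y)
    return total / n

x = [1.0, 2.0, 3.0, 6.0]  # hypothetical example values
y = [2.0, 1.0, 5.0, 4.0]
print(pearson_r_sums(x, y))
```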
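Rewrite in terms of \(\vec{x^c}, \vec{y^c}\) (see Vectors and dot products). The sum of squared deviations is the squared length of the centered vector, \(\sum_{i=1}^n (x_i - \bar{x})^2 = \|\vec{x^c}\|^2\), so \(\sigma_x = \|\vec{x^c}\| / \sqrt{n}\), and likewise \(\sigma_y = \|\vec{y^c}\| / \sqrt{n}\). Substituting:
\[
r_{xy} = \frac{1}{n} \, \frac{\vec{x^c}}{\sigma_x} \cdot \frac{\vec{y^c}}{\sigma_y}
= \frac{1}{n} \, \frac{\sqrt{n} \, \vec{x^c}}{\|\vec{x^c}\|} \cdot \frac{\sqrt{n} \, \vec{y^c}}{\|\vec{y^c}\|}
= \frac{\vec{x^c}}{\|\vec{x^c}\|} \cdot \frac{\vec{y^c}}{\|\vec{y^c}\|}
\]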
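A sketch checking two facts numerically: that \(\sigma_x = \|\vec{x^c}\| / \sqrt{n}\), and that the correlation is the dot product of the unit-length centered vectors. The helpers `centered`, `norm`, `unit`, and `pearson_r_unit` are hypothetical names.

```python
import math

def centered(values):
    """Values with their mean subtracted."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def norm(a):
    """Euclidean length ||a|| = sqrt(a . a)."""
    return math.sqrt(sum(ai * ai for ai in a))

def unit(a):
    """Scale a (nonzero) vector to unit length."""
    length = norm(a)
    return [ai / length for ai in a]

def pearson_r_unit(x, y):
    """r_xy as the dot product of the unit-length centered vectors."""
    x_hat, y_hat = unit(centered(x)), unit(centered(y))
    return sum(a * b for a, b in zip(x_hat, y_hat))

x = [1.0, 2.0, 3.0, 6.0]  # hypothetical example values
y = [2.0, 1.0, 5.0, 4.0]
x_c = centered(x)
n = len(x)
sigma_x = math.sqrt(sum(v ** 2 for v in x_c) / n)

print(math.isclose(sigma_x, norm(x_c) / math.sqrt(n)))  # sigma_x = ||x^c|| / sqrt(n)
print(pearson_r_unit(x, y))
```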
The Pearson product-moment correlation coefficient is the dot product between the vectors \(\vec{x^c}, \vec{y^c}\) after normalizing the vectors to unit length.
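\(r_{xy}\) is therefore the cosine of the angle \(\theta\) between \(\vec{x^c}\) and \(\vec{y^c}\), because for any two nonzero vectors \(\vec{a} \cdot \vec{b} = \|\vec{a}\| \, \|\vec{b}\| \cos \theta\).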
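To make the geometric reading concrete, this sketch recovers the angle between the centered vectors with `math.acos` and checks that its cosine is the correlation. The clamp to \([-1, 1]\) is a defensive choice of this sketch, guarding against rounding pushing the cosine just outside the domain of `acos`; `angle_between` is a hypothetical name.

```python
import math

def centered(values):
    """Values with their mean subtracted."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def angle_between(a, b):
    """Angle in radians between a and b, from a . b = ||a|| ||b|| cos(theta)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    cos_theta = dot / (norm_a * norm_b)
    # Clamp so that rounding cannot push the argument outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

x = [1.0, 2.0, 3.0, 6.0]  # hypothetical example values
y = [2.0, 1.0, 5.0, 4.0]
theta = angle_between(centered(x), centered(y))
print(math.degrees(theta))
print(math.cos(theta))  # equals r_xy
```

A positive correlation corresponds to an angle below 90 degrees, zero correlation to a right angle, and a negative correlation to an angle above 90 degrees.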