\(\newcommand{L}[1]{\| #1 \|}\newcommand{VL}[1]{\L{ \vec{#1} }}\newcommand{R}[1]{\operatorname{Re}\,(#1)}\newcommand{I}[1]{\operatorname{Im}\, (#1)}\)

# Correlation and projection

Here we phrase the Pearson product-moment correlation coefficient in terms of vectors.

Say we have two vectors of \(n\) values:

\[
\vec{x} = [x_1, x_2, \ldots, x_n]
\]

\[
\vec{y} = [y_1, y_2, \ldots, y_n]
\]

Write the mean of the values in \(\vec{x}\) as \(\bar{x}\):

\[
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
\]

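Define two new vectors, \(\vec{x^c}, \vec{y^c}\), that contain the values in \(\vec{x}, \vec{y}\) with their respective means subtracted:

\[
\vec{x^c} = [x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}]
\]

\[
\vec{y^c} = [y_1 - \bar{y}, y_2 - \bar{y}, \ldots, y_n - \bar{y}]
\]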

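Define the sample variance of \(\vec{x}\) as the mean of the squared deviations from the mean (note the divisor is \(n\), not \(n - 1\)):

\[
s_x^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2
\]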

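The sample standard deviation of \(\vec{x}\) is the square root of the sample variance:

\[
s_x = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2}
\]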

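Now define the standardized versions of \(\vec{x}, \vec{y}\) as the mean-centered vectors divided by their sample standard deviations \(s_x, s_y\):

\[
\vec{x^z} = \frac{1}{s_x} \vec{x^c}
\]

\[
\vec{y^z} = \frac{1}{s_y} \vec{y^c}
\]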
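As a rough numerical illustration of the definitions so far, the sketch below centers a vector, computes its standard deviation with the divisor \(n\), and returns the standardized vector. The function name `standardize` and the example values are assumptions made for this illustration; only the Python standard library is used.

```python
import math


def standardize(values):
    """Return the standardized copy of `values`, using the 1/n standard deviation."""
    n = len(values)
    mean = math.fsum(values) / n
    centered = [v - mean for v in values]              # the vector x^c
    variance = math.fsum(c * c for c in centered) / n  # mean squared deviation
    std = math.sqrt(variance)                          # sample standard deviation
    return [c / std for c in centered]                 # the vector x^z


# Hypothetical example data: mean 5, variance 4, standard deviation 2.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_z = standardize(x)
print(x_z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The standardized values have mean 0 and mean square 1. A constant vector has a standard deviation of zero, so this sketch would raise `ZeroDivisionError` for it.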

The Pearson product-moment correlation coefficient of \(\vec{x}, \vec{y}\) is given by \(1 / n\) times the dot product of \(\vec{x^z}, \vec{y^z}\):

\[
r_{xy} = \frac{1}{n} \vec{x^z} \cdot \vec{y^z}
\]

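The equivalent expression in terms of sums rather than vectors is:

\[
r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}
\]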
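To check that the vector form and the sum form agree, here is a minimal sketch that computes the coefficient both ways. The helper names (`mean`, `std`, `corr_from_z`, `corr_from_sums`) and the data are hypothetical, and the standard deviation is assumed to use the divisor \(n\), as in the text.

```python
import math


def mean(values):
    return math.fsum(values) / len(values)


def std(values):
    """Standard deviation with divisor n (not n - 1)."""
    m = mean(values)
    return math.sqrt(math.fsum((v - m) ** 2 for v in values) / len(values))


def corr_from_z(x, y):
    """r as 1/n times the dot product of the standardized vectors."""
    mx, my, sx, sy = mean(x), mean(y), std(x), std(y)
    x_z = [(v - mx) / sx for v in x]
    y_z = [(v - my) / sy for v in y]
    return math.fsum(a * b for a, b in zip(x_z, y_z)) / len(x)


def corr_from_sums(x, y):
    """r from sums of products of deviations, with no explicit 1/n factor."""
    mx, my = mean(x), mean(y)
    numerator = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sum_sq_x = math.fsum((a - mx) ** 2 for a in x)
    sum_sq_y = math.fsum((b - my) ** 2 for b in y)
    return numerator / (math.sqrt(sum_sq_x) * math.sqrt(sum_sq_y))


# Hypothetical example data; by hand, r = 8 / 10 = 0.8.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r_z = corr_from_z(x, y)
r_sums = corr_from_sums(x, y)
print(math.isclose(r_z, r_sums, rel_tol=1e-12))
```

The two results are compared with `math.isclose` rather than `==` because the two routes round differently in floating point.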

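Rewrite in terms of \(\vec{x^c}, \vec{y^c}\) (see Vectors and dot products):

\[
r_{xy} = \frac{\vec{x^c} \cdot \vec{y^c}}{\| \vec{x^c} \| \, \| \vec{y^c} \|} = \frac{\vec{x^c}}{\| \vec{x^c} \|} \cdot \frac{\vec{y^c}}{\| \vec{y^c} \|}
\]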

The Pearson product-moment correlation coefficient is the dot product between the vectors \(\vec{x^c}, \vec{y^c}\) after normalizing the vectors to unit length.
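One way to see why the factor \(1 / n\) disappears, assuming the standard deviation is defined with the divisor \(n\) and writing \(s_x, s_y\) for the sample standard deviations of \(\vec{x}, \vec{y}\) (the symbol names are a notational assumption here): the length of a centered vector is \(\sqrt{n}\) times its standard deviation,

\[
\| \vec{x^c} \| = \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} = \sqrt{n} \, s_x
\]

so that \(n \, s_x s_y = \| \vec{x^c} \| \, \| \vec{y^c} \|\), and therefore

\[
\frac{1}{n} \vec{x^z} \cdot \vec{y^z} = \frac{\vec{x^c} \cdot \vec{y^c}}{n \, s_x s_y} = \frac{\vec{x^c} \cdot \vec{y^c}}{\| \vec{x^c} \| \, \| \vec{y^c} \|}
\]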

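Because \(\vec{a} \cdot \vec{b} = \| \vec{a} \| \, \| \vec{b} \| \cos \theta\), where \(\theta\) is the angle between \(\vec{a}\) and \(\vec{b}\), \(r_{xy}\) is therefore the cosine of the angle between \(\vec{x^c}\) and \(\vec{y^c}\).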
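The geometric reading can be illustrated with a small sketch that centers each vector, scales it to unit length, and takes the dot product. The function names (`center`, `unit`, `pearson_r`, `angle_degrees`) and the example vectors are assumptions chosen for illustration, not part of the original text.

```python
import math


def center(values):
    """Subtract the mean from every element."""
    m = math.fsum(values) / len(values)
    return [v - m for v in values]


def unit(vec):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(math.fsum(v * v for v in vec))
    return [v / length for v in vec]


def pearson_r(x, y):
    """Dot product of the centered vectors after scaling each to unit length."""
    u, w = unit(center(x)), unit(center(y))
    return math.fsum(a * b for a, b in zip(u, w))


def angle_degrees(r):
    """Angle whose cosine is r; r is clipped to [-1, 1] to absorb rounding error."""
    return math.degrees(math.acos(max(-1.0, min(1.0, r))))


x = [1.0, 2.0, 3.0, 4.0]
r_same = pearson_r(x, [10.0, 20.0, 30.0, 40.0])  # centered vectors point the same way
r_flip = pearson_r(x, [4.0, 3.0, 2.0, 1.0])      # centered vectors point in opposite directions
r_orth = pearson_r(x, [1.0, -1.0, -1.0, 1.0])    # centered vectors are perpendicular

for r in (r_same, r_flip, r_orth):
    print(r, angle_degrees(r))
```

The three cases give correlations close to 1, -1, and 0, corresponding to angles near 0, 180, and 90 degrees. A constant vector has a zero-length centered vector, so `unit` would fail for it; the correlation is undefined in that case.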