\(\newcommand{L}[1]{\| #1 \|}\newcommand{VL}[1]{\L{ \vec{#1} }}\newcommand{R}[1]{\operatorname{Re}\,(#1)}\newcommand{I}[1]{\operatorname{Im}\, (#1)}\)

Correlation and projection

Here we phrase the Pearson product-moment correlation coefficient in terms of vectors.

Say we have two vectors of \(n\) values:

\[ \begin{align}\begin{aligned}\vec{x} = [x_1, x_2, ... , x_n]\\\vec{y} = [y_1, y_2, ... , y_n]\end{aligned}\end{align} \]

Write the mean of the values in \(\vec{x}\) as \(\bar{x}\), and likewise \(\bar{y}\) for the mean of the values in \(\vec{y}\):

\[\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i\]

Define two new vectors, \(\vec{x^c}\) and \(\vec{y^c}\), that contain the values in \(\vec{x}\) and \(\vec{y}\) with their respective means subtracted:

\[ \begin{align}\begin{aligned}\vec{x^c} = [x_1 - \bar{x}, x_2 - \bar{x}, ... , x_n - \bar{x}]\\\vec{y^c} = [y_1 - \bar{y}, y_2 - \bar{y}, ... , y_n - \bar{y}]\end{aligned}\end{align} \]

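Define the sample variance of \(\vec{x}\) as the mean of the squared deviations from the mean. Note that this definition divides by \(n\), not \(n - 1\). The variance \(v_y\) of \(\vec{y}\) is defined in the same way: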

\[v_x = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2\]

The sample standard deviation of \(\vec{x}\) is the square root of the sample variance, and similarly \(s_y = \sqrt{v_y}\) for \(\vec{y}\):

\[s_x = \sqrt{v_x}\]
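
The definitions so far can be sketched in plain Python. This is only an illustration: the function names `mean`, `center`, `variance` and `std`, and the example values in `x`, are assumptions made for the sketch and do not come from the text.

```python
import math

def mean(values):
    """Arithmetic mean: (1/n) times the sum of the values."""
    return sum(values) / len(values)

def center(values):
    """Subtract the mean from every value, giving the centered vector."""
    m = mean(values)
    return [v - m for v in values]

def variance(values):
    """Mean of squared deviations from the mean (divisor n, not n - 1)."""
    return sum(c * c for c in center(values)) / len(values)

def std(values):
    """Standard deviation: square root of the variance above."""
    return math.sqrt(variance(values))

# Hypothetical example data, chosen so the results are round numbers.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar = mean(x)    # 5.0
v_x = variance(x)  # 4.0
s_x = std(x)       # 2.0
```

The values of the centered vector `center(x)` always sum to zero, up to floating point rounding, because the mean has been removed.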

Now define the standardized versions of \(\vec{x}, \vec{y}\) as:

\[ \begin{align}\begin{aligned}\vec{x^z} = \frac{1}{s_x} \vec{x^c}\\\vec{y^z} = \frac{1}{s_y} \vec{y^c}\end{aligned}\end{align} \]

The Pearson product-moment correlation coefficient of \(\vec{x}, \vec{y}\) is given by \(1 / n\) times the dot product of \(\vec{x^z}, \vec{y^z}\):

\[r_{xy} = \frac{1}{n} \vec{x^z} \cdot \vec{y^z}\]

The equivalent expression in terms of sums rather than vectors is:

\[r_{xy} =\frac{1}{n} \sum ^n _{i=1} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)\]
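
As a rough check of the formula above, the following sketch standardizes two vectors and takes \(1 / n\) times their dot product. The names `standardize` and `pearson_r` and the example data are assumptions for illustration; the sketch also assumes neither vector is constant, because a constant vector has zero standard deviation and the division would fail.

```python
import math

def standardize(values):
    """Center the values, then divide by the standard deviation (divisor n)."""
    n = len(values)
    m = sum(values) / n
    centered = [v - m for v in values]
    s = math.sqrt(sum(c * c for c in centered) / n)
    return [c / s for c in centered]

def pearson_r(x, y):
    """1/n times the dot product of the standardized vectors."""
    xz, yz = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xz, yz)) / len(x)

# Hypothetical example data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)  # approximately 0.775
```

With these definitions, `pearson_r(x, x)` is 1 and `pearson_r(x, [-v for v in x])` is -1, up to floating point rounding.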

Rewrite in terms of \(\vec{x^c}, \vec{y^c}\), using the fact that the dot product of a vector with itself is the square of its length (see Vectors and dot products):

\[ \begin{align}\begin{aligned}r_{xy} = \frac{1}{n} \frac{\vec{x^c} \cdot \vec{y^c}} {s_x s_y}\\s_x = \sqrt{ \frac{1}{n} \vec{x^c} \cdot \vec{x^c} } = \sqrt{\frac{1}{n}} \; \VL{x^c}\\s_y = \sqrt{\frac{1}{n}} \; \VL{y^c}\\r_{xy} = \frac{\vec{x^c} \cdot \vec{y^c}} {\VL{x^c} \VL{y^c}}\end{aligned}\end{align} \]
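
The last line follows from substituting the expressions for \(s_x\) and \(s_y\) into the first. Written out in full, the two factors of \(\sqrt{1 / n}\) in the denominator multiply to \(1 / n\), which cancels the leading \(1 / n\):

\[r_{xy} = \frac{1}{n} \frac{\vec{x^c} \cdot \vec{y^c}} {\sqrt{\frac{1}{n}} \; \VL{x^c} \sqrt{\frac{1}{n}} \; \VL{y^c}} = \frac{1}{n} \frac{\vec{x^c} \cdot \vec{y^c}} {\frac{1}{n} \VL{x^c} \VL{y^c}} = \frac{\vec{x^c} \cdot \vec{y^c}} {\VL{x^c} \VL{y^c}}\]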

In other words, the Pearson product-moment correlation coefficient is the dot product of \(\vec{x^c}\) and \(\vec{y^c}\) after each vector has been normalized to unit length.

\(r_{xy}\) is therefore the cosine of the angle between \(\vec{x^c}\) and \(\vec{y^c}\), which is why it always lies between \(-1\) and \(1\).
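
A minimal sketch of this geometric reading follows. The helper names and the example data are assumptions for illustration. The data are chosen so that the centered vectors are perpendicular, which gives a correlation of zero.

```python
import math

def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def cosine_of_centered(x, y):
    """Dot product of the centered vectors divided by the product of their lengths."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / (math.sqrt(dot(xc, xc)) * math.sqrt(dot(yc, yc)))

# Hypothetical example: centered x is [-1, 0, 1], centered y is [1, -2, 1].
x = [1.0, 2.0, 3.0]
y = [1.0, -2.0, 1.0]
r = cosine_of_centered(x, y)

# Clamp before acos to guard against rounding slightly outside [-1, 1].
angle_degrees = math.degrees(math.acos(max(-1.0, min(1.0, r))))
```

Here `r` is zero and the angle is 90 degrees: uncorrelated data correspond to perpendicular centered vectors, while correlations of 1 and -1 correspond to angles of 0 and 180 degrees.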