$$\newcommand{\L}[1]{\| #1 \|}\newcommand{\VL}[1]{\L{ \vec{#1} }}\newcommand{\R}[1]{\operatorname{Re}\,(#1)}\newcommand{\I}[1]{\operatorname{Im}\, (#1)}$$

# Correlation and projection

Here we phrase the Pearson product-moment correlation coefficient in terms of vectors.

Say we have two vectors of $$n$$ values:

\begin{align}\begin{aligned}\vec{x} = [x_1, x_2, ... , x_n]\\\vec{y} = [y_1, y_2, ... , y_n]\end{aligned}\end{align}

Write the mean of the values in $$\vec{x}$$ as $$\bar{x}$$ (the mean $$\bar{y}$$ of $$\vec{y}$$ is defined in the same way):

$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$

Define two new vectors, $$\vec{x^c}, \vec{y^c}$$ that contain the values in $$\vec{x}, \vec{y}$$ with their respective means subtracted:

\begin{align}\begin{aligned}\vec{x^c} = [x_1 - \bar{x}, x_2 - \bar{x}, ... , x_n - \bar{x}]\\\vec{y^c} = [y_1 - \bar{y}, y_2 - \bar{y}, ... , y_n - \bar{y}]\end{aligned}\end{align}

Define the sample variance of $$\vec{x}$$ as the mean of the squared deviations from the mean (note that the divisor here is $$n$$, not $$n - 1$$):

$v_x = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$

The sample standard deviation of $$\vec{x}$$ is the square root of the variance, with $$s_y$$ defined in the same way from $$\vec{y}$$:

$s_x = \sqrt{v_x}$

Now define the standardized versions of $$\vec{x}, \vec{y}$$ as:

\begin{align}\begin{aligned}\vec{x^z} = \frac{1}{s_x} \vec{x^c}\\\vec{y^z} = \frac{1}{s_y} \vec{y^c}\end{aligned}\end{align}
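The definitions so far can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the helper names `mean`, `center`, `sample_std` and `standardize` and the example values are assumptions made for this sketch, and the standard deviation uses the $$1 / n$$ divisor from the definition above.

```python
import math


def mean(values):
    """Arithmetic mean: (1 / n) * sum of the values."""
    return sum(values) / len(values)


def center(values):
    """Return the values with their mean subtracted (the x^c vector)."""
    m = mean(values)
    return [v - m for v in values]


def sample_std(values):
    """Standard deviation using the 1 / n divisor, as defined in the text."""
    centered = center(values)
    return math.sqrt(sum(c * c for c in centered) / len(values))


def standardize(values):
    """Return the centered values divided by the standard deviation (x^z)."""
    s = sample_std(values)
    return [c / s for c in center(values)]


# Example values chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_z = standardize(x)
```

By construction, a standardized vector such as `x_z` has mean 0 and standard deviation 1, which is a quick way to check the helpers.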

The Pearson product-moment correlation coefficient of $$\vec{x}, \vec{y}$$ is given by $$1 / n$$ times the dot product of $$\vec{x^z}, \vec{y^z}$$:

$r_{xy} = \frac{1}{n} \vec{x^z} \cdot \vec{y^z}$

The equivalent expression in terms of sums rather than vectors is:

$r_{xy} =\frac{1}{n} \sum ^n _{i=1} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
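The sum formula translates almost directly into code. The following is a sketch in plain Python, where the function name `pearson_r` and the example data are assumptions made for illustration; it follows the $$1 / n$$ convention used on this page.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from the sum formula, using 1 / n throughout."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Standard deviations with the 1 / n divisor.
    s_x = math.sqrt(sum((v - x_bar) ** 2 for v in x) / n)
    s_y = math.sqrt(sum((v - y_bar) ** 2 for v in y) / n)
    # Mean of the products of the standardized values.
    products = (
        ((a - x_bar) / s_x) * ((b - y_bar) / s_y) for a, b in zip(x, y)
    )
    return sum(products) / n


# Example values chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)  # equals 6 / sqrt(60), about 0.7746
```

As a sanity check, correlating a vector with itself should give 1, and correlating it with its negation should give -1.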

Rewrite in terms of $$\vec{x^c}, \vec{y^c}$$ (see Vectors and dot products):

\begin{align}\begin{aligned}r_{xy} = \frac{1}{n} \frac{\vec{x^c} \cdot \vec{y^c}} {s_x s_y}\\s_x = \sqrt{ \frac{1}{n} \vec{x^c} \cdot \vec{x^c} } = \sqrt{\frac{1}{n}} \; \VL{x^c}\\s_y = \sqrt{\frac{1}{n}} \; \VL{y^c}\\r_{xy} = \frac{\vec{x^c} \cdot \vec{y^c}} {\VL{x^c} \VL{y^c}}\end{aligned}\end{align}
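The step from the first line to the last may be easier to follow with the substitution written out. Using the expressions for $$s_x$$ and $$s_y$$ above, their product is:

$s_x s_y = \sqrt{\frac{1}{n}} \, \| \vec{x^c} \| \, \sqrt{\frac{1}{n}} \, \| \vec{y^c} \| = \frac{1}{n} \, \| \vec{x^c} \| \, \| \vec{y^c} \|$

Substituting this into the first expression for $$r_{xy}$$, the factors of $$1 / n$$ cancel:

$r_{xy} = \frac{1}{n} \frac{\vec{x^c} \cdot \vec{y^c}}{\frac{1}{n} \, \| \vec{x^c} \| \, \| \vec{y^c} \|} = \frac{\vec{x^c} \cdot \vec{y^c}}{\| \vec{x^c} \| \, \| \vec{y^c} \|}$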

The Pearson product-moment correlation coefficient is the dot product of $$\vec{x^c}$$ and $$\vec{y^c}$$ after each vector has been normalized to unit length.

$$r_{xy}$$ is therefore the cosine of the angle between $$\vec{x^c}$$ and $$\vec{y^c}$$.
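To see the geometric reading in numbers, the sketch below computes the cosine of the angle between the two centered vectors in plain Python. The helper names `center`, `dot` and `cosine_of_angle` and the example data are assumptions made for this illustration.

```python
import math


def center(values):
    """Return the values with their mean subtracted."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def cosine_of_angle(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Example values chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Correlation as the cosine of the angle between the centered vectors.
r = cosine_of_angle(center(x), center(y))  # 6 / sqrt(60), about 0.7746

# The angle itself, in degrees.
angle_degrees = math.degrees(math.acos(r))
```

Note that no division by $$n$$ or by the standard deviations appears here: normalizing the centered vectors to unit length does the same job.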