This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
The code below shows you what your “current directory” is. This is the folder that R is working from.
getwd()
[1] "/Users/mb312/dev_trees/excel-but-simpler"
Use the “Session” menu in RStudio, and select “Set Working Directory” to navigate to the folder that contains the data for the workshop. This folder should contain files that include `calibration.csv`. When you’ve done that, execute the chunk above to confirm the folder is what you expected.
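As well as looking at the folder name, you can ask R whether it can see the data file from the current directory. This is a minimal sketch using only base R; the variable names `data_file` and `has_file` are made up for illustration.

```r
# Check that the workshop data file is visible from the working directory.
data_file <- "calibration.csv"
has_file <- file.exists(data_file)

if (has_file) {
  message("Found ", data_file, " in ", getwd())
} else {
  # List any CSV files that are here, to help spot a wrong folder.
  message(data_file, " is not in ", getwd(), "; CSV files here: ",
          paste(list.files(pattern = "\\.csv$"), collapse = ", "))
}
```

If the file is not found, set the working directory again and re-run the check.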
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
The first thing we need to do is to load the various `tidyverse` packages. We do this with the R `library` command. `library` loads code packages so that we can use commands from those packages in our R session:
library('tidyverse')
package ‘tidyverse’ was built under R version 3.2.5
Note: the specification for S3 class “difftime” in package ‘lubridate’ seems equivalent to one from package ‘hms’: not turning on duplicate class definitions for this class.
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
package ‘ggplot2’ was built under R version 3.2.5
package ‘tibble’ was built under R version 3.2.5
package ‘tidyr’ was built under R version 3.2.5
package ‘readr’ was built under R version 3.2.5
package ‘purrr’ was built under R version 3.2.5
Conflicts with tidy packages ------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
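The “Conflicts” lines report that `dplyr` defines `filter()` and `lag()` functions with the same names as functions in R’s built-in `stats` package. After loading, the plain names refer to the `dplyr` versions. If you need to be explicit, you can put the package name and `::` in front of the function. The sketch below uses a small made-up data frame called `scores`; it is not part of the workshop data.

```r
library(dplyr)

# Hypothetical data, for illustration only.
scores <- data.frame(id = 1:4, value = c(2, 8, 5, 9))

# dplyr's filter keeps the rows that match a condition.
kept <- dplyr::filter(scores, value > 4)

# The stats version is still reachable with its package prefix.
# It applies a linear filter (here a two-point moving average) to a series.
smoothed <- stats::filter(scores$value, rep(1 / 2, 2))
```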
Let’s load the data first:
calibration = read_csv('calibration.csv')
Parsed with column specification:
cols(
BSA = col_integer(),
Absorbance = col_double()
)
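The “Parsed with column specification” message shows the column types that `read_csv` guessed. If you would rather state the types yourself, you can pass the same specification back in with the `col_types` argument. The sketch below writes a tiny stand-in file to a temporary location so that it runs on its own; the three rows are illustrative values, not the real contents of `calibration.csv`.

```r
library(readr)

# Hypothetical stand-in for calibration.csv; values are illustrative only.
example_path <- tempfile(fileext = ".csv")
writeLines(c("BSA,Absorbance", "0,0", "50,0.3", "100,0.6"), example_path)

# State the column types explicitly instead of letting readr guess them.
example <- read_csv(
  example_path,
  col_types = cols(
    BSA = col_integer(),
    Absorbance = col_double()
  )
)
```

With the types given explicitly, `read_csv` does not need to print its guessed specification.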
calibration
Next we would like to do a simple scatterplot of the data. To do this we will use the `ggplot` function from the `ggplot2` package. `ggplot2` is part of the `tidyverse` set of packages, so we have already loaded it with our `library('tidyverse')` command above. Have a look at the command output above to confirm.
ggplot(calibration, aes(x=BSA, y=Absorbance)) +
geom_point(shape=1) # Use hollow circles
We would also like to see a regression line through these points. We do this by adding a `geom_smooth` layer that uses `lm` (Linear Model) as its method, like this:
ggplot(calibration, aes(x=BSA, y=Absorbance)) +
geom_point(shape=1) + # Use hollow circles
geom_smooth(method=lm) # linear model regression line
What is the formula for this line? We can show this (the least squares regression line) by estimating it again using R’s `lm` command (Linear Model again):
fit = lm(Absorbance ~ BSA, data=calibration)
summary(fit)
essentially perfect fit: summary may be unreliable
Call:
lm(formula = Absorbance ~ BSA, data = calibration)
Residuals:
       Min         1Q     Median         3Q        Max
-3.348e-17 -1.118e-17 -7.213e-18  5.671e-18  5.574e-17
Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 1.850e-17  1.227e-17 1.508e+00    0.175
BSA         6.000e-03  4.952e-19 1.212e+16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.6e-17 on 7 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.468e+32 on 1 and 7 DF, p-value: < 2.2e-16
Notice the Intercept and BSA slope in the `Estimate` column above.
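You can also pull these two estimates out of a fitted model with `coef`, rather than reading them off the printed summary. The sketch below is self-contained, so it builds a small hypothetical data frame that follows the same slope (0.006) as the summary above; the `BSA` values and the names `example` and `example_fit` are assumptions for illustration, not the workshop data.

```r
# Hypothetical data that follows Absorbance = 0.006 * BSA exactly.
example <- data.frame(BSA = c(0, 50, 100, 150, 200))
example$Absorbance <- 0.006 * example$BSA

example_fit <- lm(Absorbance ~ BSA, data = example)

# coef() returns a named vector of the estimates.
estimates <- coef(example_fit)
intercept <- estimates[["(Intercept)"]]
slope <- estimates[["BSA"]]

# Write the fitted line out as a formula.
cat(sprintf("Absorbance = %.3g + %.3g * BSA\n", intercept, slope))
```

The same calls should work on the `fit` object from the chunk above.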
So far we have the line for predicting `Absorbance` from `BSA`, but it’s a simple switch to predict `BSA` from `Absorbance`:
reverse_fit = lm(BSA ~ Absorbance, data=calibration)
summary(reverse_fit)
essentially perfect fit: summary may be unreliable
Call:
lm(formula = BSA ~ Absorbance, data = calibration)
Residuals:
       Min         1Q     Median         3Q        Max
-6.646e-15 -2.927e-15 -2.687e-15  1.480e-15  1.645e-14
Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 7.105e-15  3.382e-15 2.101e+00   0.0738 .
Absorbance  1.667e+02  2.274e-14 7.330e+15   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.163e-15 on 7 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 5.372e+31 on 1 and 7 DF, p-value: < 2.2e-16
What is the relationship between the estimate of slope for this line, and the estimate for the `Absorbance ~ BSA` line?
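One way to check your answer, using the two estimates printed in the summaries: with an essentially perfect fit, the two slopes are reciprocals of each other. More generally, the product of the two least-squares slopes equals the R-squared value, which is 1 for these data. The symbol names below are shorthand introduced here, not notation from R.

```latex
b_{\mathrm{BSA} \sim \mathrm{Absorbance}}
  = \frac{1}{b_{\mathrm{Absorbance} \sim \mathrm{BSA}}}
  = \frac{1}{0.006} \approx 166.7
\qquad
b_{\mathrm{Absorbance} \sim \mathrm{BSA}} \times b_{\mathrm{BSA} \sim \mathrm{Absorbance}} = R^2 = 1
```

With noisier data, R-squared would be less than 1 and the two slopes would no longer be exact reciprocals.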
We can use this fit to predict `BSA` from other values of `Absorbance`. To do this, we make a new data table with the `Absorbance` values to predict for. The `tibble` function makes a new data table. We pass it the name of the new column, in our case `Absorbance`, and the values for that column:
new_absorbance = tibble(Absorbance=c(0.25, 0.04, 0.4))
new_absorbance
Now we can predict `BSA` values from the new `Absorbance` values:
predict(reverse_fit, new_absorbance)
        1         2         3
41.666667  6.666667 66.666667
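As a rough check on these numbers, you can do the same calculation by hand. The first fit gave `Absorbance = 0.006 * BSA` with an intercept that is effectively zero, so rearranging gives `BSA = Absorbance / 0.006`. The sketch below copies the slope from the summary output rather than using the fit object, and the variable names are made up for illustration.

```r
# Slope copied from the Absorbance ~ BSA summary; intercept treated as 0
# because the printed estimate (1.850e-17) is effectively zero.
forward_slope <- 0.006

# The Absorbance values we predicted for above.
new_values <- c(0.25, 0.04, 0.4)

# Rearranged line: BSA = Absorbance / slope.
by_hand <- new_values / forward_slope
by_hand
```

The results should agree with the `predict` output to the precision shown.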