This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.

The code below shows you what your “current directory” is. This is the folder that R is working from.

getwd()
[1] "/Users/mb312/dev_trees/excel-but-simpler"

Use the “Session” menu in RStudio and select “Set Working Directory” to navigate to the folder that contains the data for the workshop. This folder should contain the workshop files, including calibration.csv. When you’ve done that, run the getwd() chunk above again to confirm that the folder is what you expected.
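If you prefer code to menus, a quick check like the one below reports whether calibration.csv is visible from the working directory. The helper name has_data_file is hypothetical, introduced only for this sketch. setwd() is the base R function for changing the working directory from code, and the path shown in the comment is a placeholder for your own folder.

```r
# Hypothetical helper: TRUE if the named data file is in the given folder
# (by default, the current working directory reported by getwd()).
has_data_file <- function(name = 'calibration.csv', dir = getwd()) {
  file.exists(file.path(dir, name))
}

has_data_file()

# If this is FALSE, point R at the workshop folder, for example:
# setwd('/path/to/workshop/folder')   # hypothetical path, use your own
```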

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The first thing we need to do is to load the tidyverse packages. We do this with R’s library command, which loads code packages so that we can use commands from those packages in our R session:

library('tidyverse')
package ‘tidyverse’ was built under R version 3.2.5
Note: the specification for S3 class “difftime” in package ‘lubridate’ seems equivalent to one from package ‘hms’: not turning on duplicate class definitions for this class.
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
package ‘ggplot2’ was built under R version 3.2.5
package ‘tibble’ was built under R version 3.2.5
package ‘tidyr’ was built under R version 3.2.5
package ‘readr’ was built under R version 3.2.5
package ‘purrr’ was built under R version 3.2.5
Conflicts with tidy packages ------------------------------
filter(): dplyr, stats
lag():    dplyr, stats

Let’s load the data first:

calibration = read_csv('calibration.csv')
Parsed with column specification:
cols(
  BSA = col_integer(),
  Absorbance = col_double()
)
calibration
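The “Parsed with column specification” message shows the column types that read_csv guessed. If you want those types fixed rather than guessed on every run, readr lets you pass the same specification through the col_types argument. The sketch below writes a small temporary file with hypothetical values (not the workshop data) so that it runs on its own; with the workshop data you would pass 'calibration.csv' instead of csv_path.

```r
library(readr)

# Hypothetical stand-in data, written to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = '.csv')
writeLines(c('BSA,Absorbance', '0,0', '10,0.06', '20,0.12'), csv_path)

# Pass the column specification explicitly instead of letting read_csv guess it.
calibration_typed <- read_csv(
  csv_path,
  col_types = cols(BSA = col_integer(), Absorbance = col_double())
)
calibration_typed
```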

Next we would like to make a simple scatterplot of the data. To do this we will use the ggplot function from the ggplot2 package. ggplot2 is part of the tidyverse set of packages, so our library('tidyverse') command above has already loaded it; the “Loading tidyverse: ggplot2” line in that command’s output confirms this.

ggplot(calibration, aes(x=BSA, y=Absorbance)) +
    geom_point(shape=1)      # Use hollow circles

We would also like to see a regression line through these points. We do this by adding a geom_smooth layer that uses lm (Linear Model) as its fitting method, like this:

ggplot(calibration, aes(x=BSA, y=Absorbance)) +
    geom_point(shape=1) +    # Use hollow circles
    geom_smooth(method=lm)   # linear model regression line

What is the formula for this line? We can find it (this is the least-squares regression line) by fitting the same model again with R’s lm (Linear Model) command:

fit = lm(Absorbance ~ BSA, data=calibration)
summary(fit)
essentially perfect fit: summary may be unreliable

Call:
lm(formula = Absorbance ~ BSA, data = calibration)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.348e-17 -1.118e-17 -7.213e-18  5.671e-18  5.574e-17 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 1.850e-17  1.227e-17 1.508e+00    0.175    
BSA         6.000e-03  4.952e-19 1.212e+16   <2e-16 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.6e-17 on 7 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 1.468e+32 on 1 and 7 DF,  p-value: < 2.2e-16

Notice the Intercept and BSA slope in the Estimate column above.
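Reading the Estimate column, the fitted line is Absorbance ≈ 1.850e-17 + 0.006 × BSA, where the intercept is effectively zero. For a single predictor, the least-squares slope and intercept have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope × mean(x). The base R sketch below checks this against coef(lm(...)) on a small hypothetical data set (not the workshop data), which has a little scatter so the fit is not perfect.

```r
# Hypothetical example data (not the workshop's calibration.csv), with a little noise.
example <- data.frame(
  BSA        = c(0, 10, 20, 30, 40, 50),
  Absorbance = c(0.002, 0.058, 0.123, 0.179, 0.242, 0.298)
)

x <- example$BSA
y <- example$Absorbance

# Closed-form least-squares estimates for the line y = intercept + slope * x.
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
c(intercept = intercept, slope = slope)

# The same estimates, computed by lm().
coef(lm(Absorbance ~ BSA, data = example))
```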

So far we have the line for predicting Absorbance from BSA, but it’s a simple switch to predict BSA from Absorbance:

reverse_fit = lm(BSA ~ Absorbance, data=calibration)
summary(reverse_fit)
essentially perfect fit: summary may be unreliable

Call:
lm(formula = BSA ~ Absorbance, data = calibration)

Residuals:
       Min         1Q     Median         3Q        Max 
-6.646e-15 -2.927e-15 -2.687e-15  1.480e-15  1.645e-14 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 7.105e-15  3.382e-15 2.101e+00   0.0738 .  
Absorbance  1.667e+02  2.274e-14 7.330e+15   <2e-16 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.163e-15 on 7 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 5.372e+31 on 1 and 7 DF,  p-value: < 2.2e-16

What is the relationship between the estimate of the slope for this line and the estimate of the slope for the Absorbance ~ BSA line?
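One way to check your answer is to compare the two slopes numerically. In a simple linear regression, the slope of y on x multiplied by the slope of x on y equals R-squared, so the reverse slope is exactly 1 / slope only when R-squared is 1, as it is for the workshop data (1 / 0.006 ≈ 166.7). The sketch below checks the general rule on hypothetical, slightly noisy data (not the workshop data).

```r
# Hypothetical example data (not the workshop data), with some scatter around a line.
example <- data.frame(
  BSA        = c(0, 10, 20, 30, 40, 50),
  Absorbance = c(0.002, 0.058, 0.123, 0.179, 0.242, 0.298)
)

forward <- lm(Absorbance ~ BSA, data = example)   # predict Absorbance from BSA
reverse <- lm(BSA ~ Absorbance, data = example)   # predict BSA from Absorbance

slope_forward <- coef(forward)[['BSA']]
slope_reverse <- coef(reverse)[['Absorbance']]

# The product of the two slopes equals the R-squared of either fit.
slope_forward * slope_reverse
summary(forward)$r.squared
```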

We can use this fit to predict BSA from other values of Absorbance. To do this, we make a new data table containing the Absorbance values we want predictions for. The tibble function makes a new data table; we give it the name of the new column and the values for that column. The column name must match the predictor in the model, in our case Absorbance:

new_absorbance = tibble(Absorbance=c(0.25, 0.04, 0.4))
new_absorbance

Now we can predict BSA values from the new Absorbance values:

predict(reverse_fit, new_absorbance)
        1         2         3 
41.666667  6.666667 66.666667 
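predict() evaluates the fitted line, intercept + slope × Absorbance, at each new value. With the workshop fit the intercept is essentially zero and the slope is about 166.7, so an Absorbance of 0.25 gives about 166.7 × 0.25 ≈ 41.67, matching the first value above. The base R sketch below shows the same calculation by hand, using hypothetical data and data.frame in place of tibble (either works as the new data table).

```r
# Hypothetical example data (not the workshop data).
example <- data.frame(
  BSA        = c(0, 10, 20, 30, 40, 50),
  Absorbance = c(0.002, 0.058, 0.123, 0.179, 0.242, 0.298)
)
reverse_example <- lm(BSA ~ Absorbance, data = example)

# New Absorbance values to predict BSA for; the column name must match the model.
new_values <- data.frame(Absorbance = c(0.25, 0.04, 0.4))

# predict() and the hand calculation intercept + slope * Absorbance should agree.
by_predict <- predict(reverse_example, new_values)
b <- coef(reverse_example)
by_hand <- b[['(Intercept)']] + b[['Absorbance']] * new_values$Absorbance
by_predict
by_hand
```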