There are two main sets of tools in data science:

  • The Python programming language and its libraries, including Pandas for handling data in tables, and Matplotlib for plotting. Python is a general-purpose programming language that is excellent for teaching, and also for serious coding in business and academia.
  • The R language. The main page for R describes it as “a free software environment for statistical computing and graphics”. R is a favorite for people who already know they need to do “statistics”, and want the largest range of statistical routines. R is the standard language used by statisticians.

Both languages have Notebook environments.

We will be using the Jupyter Notebook with Python.
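
As a small taste of what a notebook cell might contain, here is a minimal sketch that uses Pandas to build a table and summarize one column. The table, its column names, and its values are invented for illustration; they do not come from the class materials.

```python
import pandas as pd

# Hypothetical data, made up for this example only.
scores = pd.DataFrame({
    "student": ["Ana", "Ben", "Cal"],
    "score": [72, 85, 90],
})

# Take the mean of the "score" column.
mean_score = scores["score"].mean()

# Show the table, then the summary value.
print(scores)
print("Mean score:", mean_score)
```

In a Jupyter Notebook, code like this would typically go in a single cell, with the output appearing directly beneath it.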

To get set up for the class, see the setup instructions.