Coding for Data - 2020 edition


Setting up software and exercises

  • Installing the base software on your computer.

  • Starting a terminal application on your computer.

  • Installing extra packages such as OKpy.

  • Extracting zip archives such as OKpy exercise .zip files.
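
As a rough illustration of the last step, a zip archive can also be unpacked from Python itself using the standard library's zipfile module. This is only a minimal sketch, not the procedure the linked page describes: the helper name extract_zip and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every file in the archive at zip_path into dest_dir.

    Returns the list of member names stored in the archive.
    (Hypothetical helper for illustration only.)
    """
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, assuming a downloaded file called exercise.zip in the current folder, extract_zip("exercise.zip", "exercise") would unpack it into a folder named exercise and return the names of the files it contained.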

By Matthew Brett, Ani Adhikari, John Denero, David Wagner
© Copyright 2021.