1.3.2 Statistical techniques

Modern data analysis involves techniques from several fields, including statistics and machine learning.

Statistics is the body of knowledge about drawing conclusions from incomplete information. Put another way, statistics allows us to reason about data in the presence of uncertainty.

The statistical tools that we will use include hypothesis testing and methods for predicting new data.

Because we are doing data science, we will take a rather different approach from standard statistics teaching. Data science puts coding at the heart of data analysis. Coding gives us a much more powerful set of tools than was available in the past. This has great benefits for analysis, but also for teaching.

The benefits for analysis are that we can get, clean, and analyze a much wider range of data. It becomes natural to extend our methods of analysis to techniques based in computation, like machine learning.

The benefits for teaching are twofold. First, we can analyze real data. The greater power and range of our tools let us work with real-world, messy data instead of cleaned-up toy datasets, so you are better prepared for the real data you will soon have to deal with in your education and work. Second, an emphasis on computation lets us use techniques based in resampling that are richer, simpler, and more powerful. These techniques are easier to explain and have a deeper relationship to the models that we are using. We will rely much less on mathematics, which gives us time to explain the ideas in another way and in more depth.
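
To give a flavor of what a resampling technique can look like, here is a minimal sketch of a permutation test, one common resampling approach to hypothesis testing. It is an illustration under stated assumptions, not necessarily the method used later in the course: the function name `permutation_test`, its parameters, and the two small groups of numbers are all made up for this example. The idea is to shuffle the group labels many times and ask how often chance alone gives a difference in means at least as large as the one we observed.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_iters=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_iters):
        # Shuffling the pooled values simulates a world where the
        # group labels carry no information about the values.
        rng.shuffle(pooled)
        fake_diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(fake_diff) >= abs(observed):
            n_extreme += 1
    # Proportion of shuffles at least as extreme as the real data.
    return observed, n_extreme / n_iters


# Made-up numbers, for illustration only.
group_a = [12.1, 14.3, 11.8, 15.2, 13.9, 14.8]
group_b = [10.2, 11.1, 9.8, 12.4, 10.9, 11.5]

observed, p_value = permutation_test(group_a, group_b)
print("Observed difference in means:", round(observed, 2))
print("Estimated p-value:", p_value)
```

The whole test is a loop, a shuffle, and a comparison of means. A small p-value suggests that a difference as large as the observed one would be unusual if the group labels did not matter.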

See the Statistical techniques page from the UC Berkeley course for more discussion.