Why Data Science?

Most important decisions are made with only partial information and uncertain outcomes. However, the degree of uncertainty for many decisions can be reduced sharply by public access to large data sets and the computational tools required to analyze them effectively. Data-driven decision making has already transformed a tremendous breadth of industries, including finance, advertising, manufacturing, and real estate. At the same time, a wide range of academic disciplines are evolving rapidly to incorporate large-scale data analysis into their theory and practice.

Studying data science enables individuals to bring these techniques to bear on their work, their scientific endeavors, and their personal decisions. Critical thinking has long been a hallmark of a rigorous education, but critiques are often most effective when supported by data. A critical analysis of any aspect of the world, be it business or social science, involves inductive reasoning: conclusions can rarely be proven outright, only supported by the available evidence. Data science provides the means to make precise, reliable, and quantitative arguments about any set of observations. With unprecedented access to information and computing, critical thinking about any aspect of the world that can be measured would be incomplete without the inferential techniques that are core to data science.

Data science uses code. Code opens up new ways of analyzing all the data you can store on your computer, be they numbers, text, pictures, music, speech, or video. Just as you can communicate with your computer, you can analyze communication, by studying the text and network structure in data such as social networks, tweets, and emails.

Code gives you a new language to describe data and the algorithms we use to draw conclusions from data. Many people find that this new language offers greater precision and clarity, which helps them think about the process of drawing conclusions under uncertainty. Code is a powerful way of explaining statistical reasoning, and it can be much more persuasive than traditional explanations using mathematics [Cob15, Cob07, Sim97].

Algorithms applied to data already have a great impact on us and our society. Some of this impact is harmful or frightening. Cathy O’Neil’s book “Weapons of Math Destruction” gives a readable and alarming overview of misuses of code and algorithms that can have serious consequences for individuals and society [ONeil16]:

Among many examples of misuse, prison services in the USA have started to use code and algorithms to inform criminal sentencing, but the apparently objective output from these algorithms appears to be biased in favor of white offenders at the expense of black offenders.

Learning about data science will help you reason about these uses and abuses. Data science makes it possible for us to engage with the difficult decisions ahead about the role of code and data in making good decisions for our future society.

The world has too many unanswered questions and difficult challenges to leave this critical reasoning to only a few specialists. Fortunately, all educated adults can build the capacity to reason about data. The tools, techniques, and data sets are all readily available; this text aims to make them accessible to everyone.

Note

This page has content from the Why_Data_Science notebook of an older version of the UC Berkeley data science course. See the Berkeley course section of the license file.

References

Cob15

George W. Cobb. Mere renovation is too little too late: we need to rethink our undergraduate curriculum from the ground up. The American Statistician, 69(4):266–282, 2015.

Cob07

George W. Cobb. The introductory statistics course: a Ptolemaic curriculum? Technology Innovations in Statistics Education, 2007. URL: https://escholarship.org/uc/item/6hb3k0nz.

ONeil16

Cathy O'Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Publishing Group, New York, 2016.

Sim97

Julian Lincoln Simon. Resampling: The new statistics. Resampling Stats, Arlington, VA, 2nd edition, 1997. URL: https://www.resample.com/intro-text-online.