{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# HIDDEN\n", "import numpy as np\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Describing distributions\n", "\n", "We have seen several examples of *distributions*.\n", "\n", "We can describe a distribution by its *center* and its *spread*.\n", "\n", "In [the mean as predictor](../08/mean_meaning), we saw that the mean is a useful measure of the center of a distribution.\n", "\n", "What measure should we use for the spread?\n", "\n", "## Chronic kidney disease\n", "\n", "We're going to work with a dataset that was collected to help doctors diagnose chronic kidney disease (CKD). Each row in the dataset represents a single patient who was treated in the past and whose diagnosis is known. For each patient, we have a set of clinical measurements, including results from a blood test.\n", "\n", "You will see more of this dataset soon.\n", "\n", "If you are running this notebook on your own computer, you should download the\n", "[ckd.csv]({{ site.baseurl }}/data/ckd.csv) file to the same\n", "directory as this notebook." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "lines_to_next_cell": 0 }, "outputs": [ { "data": { "text/html": [ "
\n", "|   | Age | Blood Pressure | Specific Gravity | Albumin | Sugar | Red Blood Cells | Pus Cell | Pus Cell clumps | Bacteria | Blood Glucose Random | ... | Packed Cell Volume | White Blood Cell Count | Red Blood Cell Count | Hypertension | Diabetes Mellitus | Coronary Artery Disease | Appetite | Pedal Edema | Anemia | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | 70 | 1.005 | 4 | 0 | normal | abnormal | present | notpresent | 117 | ... | 32 | 6700 | 3.9 | yes | no | no | poor | yes | yes | 1 |
| 1 | 53 | 90 | 1.020 | 2 | 0 | abnormal | abnormal | present | notpresent | 70 | ... | 29 | 12100 | 3.7 | yes | yes | no | poor | no | yes | 1 |
| 2 | 63 | 70 | 1.010 | 3 | 0 | abnormal | abnormal | present | notpresent | 380 | ... | 32 | 4500 | 3.8 | yes | yes | no | poor | yes | no | 1 |
| 3 | 68 | 80 | 1.010 | 3 | 2 | normal | abnormal | present | present | 157 | ... | 16 | 11000 | 2.6 | yes | yes | yes | poor | yes | no | 1 |
| 4 | 61 | 80 | 1.015 | 2 | 0 | abnormal | abnormal | notpresent | notpresent | 173 | ... | 24 | 9200 | 3.2 | yes | yes | yes | poor | yes | yes | 1 |

5 rows × 25 columns\n", "