{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Around Simpson's Paradox" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data are available in the MOOC repository. The CSV file contains data for the 1314 women who were polled in Whickham, England, in 1972-1974 and were categorized as \"currently smoking\" or \"never smoked\". Each row describes one woman: whether she smoked, whether she was alive or dead twenty years after the survey, and her age at the time of the survey." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of rows: 1314\n" ] }, { "data": { "text/html": [ "
\n", " | Smoker | Status | Age |
---|---|---|---|
0 | Yes | Alive | 21.0 |
1 | Yes | Alive | 19.3 |
2 | No | Dead | 57.5 |
3 | No | Alive | 47.1 |
4 | Yes | Alive | 81.4 | \n", "