"# Subject 2: Purchasing power of English workers from the 16th to the 19th century"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"import isoweek"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"William Playfair was one of the pioneers of the graphical presentation of data and is credited in particular with the invention of the bar chart. One of his famous graphs, taken from his book \"A Letter on Our Agricultural Distresses, Their Causes and Remedies\", shows the evolution of wheat prices and average wages from 1565 to 1821. We first replicate this graph and then present alternative versions that improve its readability."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting the original graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data used by Playfair are available on [GitHub](https://vincentarelbundock.github.io/Rdatasets/doc/HistData/Wheat.html) in CSV format; the notebook reads them from the address stored in `data_url`.\n",
"\n",
"We load the data and remove the first column, which is unnecessary. The table has three columns: the year, the wheat price (in shillings per quarter) and the wages (in shillings per week)."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Year</th>\n",
" <th>Wheat</th>\n",
" <th>Wages</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1565</td>\n",
" <td>41.0</td>\n",
" <td>5.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1570</td>\n",
" <td>45.0</td>\n",
" <td>5.05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1575</td>\n",
" <td>42.0</td>\n",
" <td>5.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1580</td>\n",
" <td>49.0</td>\n",
" <td>5.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1585</td>\n",
" <td>41.5</td>\n",
" <td>5.15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1590</td>\n",
" <td>47.0</td>\n",
" <td>5.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1595</td>\n",
" <td>64.0</td>\n",
" <td>5.54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1600</td>\n",
" <td>27.0</td>\n",
" <td>5.61</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1605</td>\n",
" <td>33.0</td>\n",
" <td>5.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1610</td>\n",
" <td>32.0</td>\n",
" <td>5.78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1615</td>\n",
" <td>33.0</td>\n",
" <td>5.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>1620</td>\n",
" <td>35.0</td>\n",
" <td>6.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>1625</td>\n",
" <td>33.0</td>\n",
" <td>6.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>1630</td>\n",
" <td>45.0</td>\n",
" <td>6.22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1635</td>\n",
" <td>33.0</td>\n",
" <td>6.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>1640</td>\n",
" <td>39.0</td>\n",
" <td>6.37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>1645</td>\n",
" <td>53.0</td>\n",
" <td>6.45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>1650</td>\n",
" <td>42.0</td>\n",
" <td>6.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>1655</td>\n",
" <td>40.5</td>\n",
" <td>6.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>1660</td>\n",
" <td>46.5</td>\n",
" <td>6.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>1665</td>\n",
" <td>32.0</td>\n",
" <td>6.80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>1670</td>\n",
" <td>37.0</td>\n",
" <td>6.90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>1675</td>\n",
" <td>43.0</td>\n",
" <td>7.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>1680</td>\n",
" <td>35.0</td>\n",
" <td>7.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>1685</td>\n",
" <td>27.0</td>\n",
" <td>7.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>1690</td>\n",
" <td>40.0</td>\n",
" <td>8.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>1695</td>\n",
" <td>50.0</td>\n",
" <td>8.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>1700</td>\n",
" <td>30.0</td>\n",
" <td>9.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>1705</td>\n",
" <td>32.0</td>\n",
" <td>10.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>1710</td>\n",
" <td>44.0</td>\n",
" <td>11.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>1715</td>\n",
" <td>33.0</td>\n",
" <td>11.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>1720</td>\n",
" <td>29.0</td>\n",
" <td>12.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>1725</td>\n",
" <td>39.0</td>\n",
" <td>13.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>1730</td>\n",
" <td>26.0</td>\n",
" <td>13.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>1735</td>\n",
" <td>32.0</td>\n",
" <td>13.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>1740</td>\n",
" <td>27.0</td>\n",
" <td>14.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>1745</td>\n",
" <td>27.5</td>\n",
" <td>14.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>1750</td>\n",
" <td>31.0</td>\n",
" <td>15.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>1755</td>\n",
" <td>35.5</td>\n",
" <td>15.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>1760</td>\n",
" <td>31.0</td>\n",
" <td>16.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>1765</td>\n",
" <td>43.0</td>\n",
" <td>17.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>1770</td>\n",
" <td>47.0</td>\n",
" <td>18.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>1775</td>\n",
" <td>44.0</td>\n",
" <td>19.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>1780</td>\n",
" <td>46.0</td>\n",
" <td>21.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>1785</td>\n",
" <td>42.0</td>\n",
" <td>23.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>1790</td>\n",
" <td>47.5</td>\n",
" <td>25.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>1795</td>\n",
" <td>76.0</td>\n",
" <td>27.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>1800</td>\n",
" <td>79.0</td>\n",
" <td>28.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>1805</td>\n",
" <td>81.0</td>\n",
" <td>29.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>1810</td>\n",
" <td>99.0</td>\n",
" <td>30.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>1815</td>\n",
" <td>78.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>1820</td>\n",
" <td>54.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>1821</td>\n",
" <td>54.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Year Wheat Wages\n",
"0 1565 41.0 5.00\n",
"1 1570 45.0 5.05\n",
"2 1575 42.0 5.08\n",
"3 1580 49.0 5.12\n",
"4 1585 41.5 5.15\n",
"5 1590 47.0 5.25\n",
"6 1595 64.0 5.54\n",
"7 1600 27.0 5.61\n",
"8 1605 33.0 5.69\n",
"9 1610 32.0 5.78\n",
"10 1615 33.0 5.94\n",
"11 1620 35.0 6.01\n",
"12 1625 33.0 6.12\n",
"13 1630 45.0 6.22\n",
"14 1635 33.0 6.30\n",
"15 1640 39.0 6.37\n",
"16 1645 53.0 6.45\n",
"17 1650 42.0 6.50\n",
"18 1655 40.5 6.60\n",
"19 1660 46.5 6.75\n",
"20 1665 32.0 6.80\n",
"21 1670 37.0 6.90\n",
"22 1675 43.0 7.00\n",
"23 1680 35.0 7.30\n",
"24 1685 27.0 7.60\n",
"25 1690 40.0 8.00\n",
"26 1695 50.0 8.50\n",
"27 1700 30.0 9.00\n",
"28 1705 32.0 10.00\n",
"29 1710 44.0 11.00\n",
"30 1715 33.0 11.75\n",
"31 1720 29.0 12.50\n",
"32 1725 39.0 13.00\n",
"33 1730 26.0 13.30\n",
"34 1735 32.0 13.60\n",
"35 1740 27.0 14.00\n",
"36 1745 27.5 14.50\n",
"37 1750 31.0 15.00\n",
"38 1755 35.5 15.70\n",
"39 1760 31.0 16.50\n",
"40 1765 43.0 17.60\n",
"41 1770 47.0 18.50\n",
"42 1775 44.0 19.50\n",
"43 1780 46.0 21.00\n",
"44 1785 42.0 23.00\n",
"45 1790 47.5 25.50\n",
"46 1795 76.0 27.50\n",
"47 1800 79.0 28.50\n",
"48 1805 81.0 29.50\n",
"49 1810 99.0 30.00\n",
"50 1815 78.0 NaN\n",
"51 1820 54.0 NaN\n",
"52 1821 54.0 NaN"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
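],
"source": [
"# data_url is expected to hold the CSV address, defined earlier in the notebook\n",
"raw_data = pd.read_csv(data_url)\n",
"# Drop the first column, which only repeats the row index\n",
"data = raw_data.drop(columns='Unnamed: 0')\n",
"data"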
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We sort by increasing years and verify that the gap between two points is not more than 5 years:"