no commit message

parent 29d5c5f0
...@@ -158,13 +158,17 @@ ...@@ -158,13 +158,17 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Loading the data from GIT does not work\n", "Loading the data from GIT does not work \n",
"To be checked later, probably a problem with the link!!" "To be checked later, probably a problem with the link!!\n",
"\n",
"Looking at the beginning of the table, we can see that the first column serves no purpose, so we can drop it later.\n",
"The Year column will need to be set as the index (after checking that Python correctly recognizes it as a date format, in case that turns out to be useful).\n",
"The last 2 columns are already numeric, so no conversion is needed."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 10, "execution_count": 14,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
...@@ -196,355 +200,126 @@ ...@@ -196,355 +200,126 @@
" </thead>\n", " </thead>\n",
" <tbody>\n", " <tbody>\n",
" <tr>\n", " <tr>\n",
" <th>0</th>\n", " <th>count</th>\n",
" <td>1</td>\n", " <td>53.000000</td>\n",
" <td>1565</td>\n", " <td>53.000000</td>\n",
" <td>41.0</td>\n", " <td>53.000000</td>\n",
" <td>5.00</td>\n", " <td>50.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1570</td>\n",
" <td>45.0</td>\n",
" <td>5.05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1575</td>\n",
" <td>42.0</td>\n",
" <td>5.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1580</td>\n",
" <td>49.0</td>\n",
" <td>5.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>1585</td>\n",
" <td>41.5</td>\n",
" <td>5.15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>1590</td>\n",
" <td>47.0</td>\n",
" <td>5.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>1595</td>\n",
" <td>64.0</td>\n",
" <td>5.54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>1600</td>\n",
" <td>27.0</td>\n",
" <td>5.61</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>1605</td>\n",
" <td>33.0</td>\n",
" <td>5.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>1610</td>\n",
" <td>32.0</td>\n",
" <td>5.78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>11</td>\n",
" <td>1615</td>\n",
" <td>33.0</td>\n",
" <td>5.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>12</td>\n",
" <td>1620</td>\n",
" <td>35.0</td>\n",
" <td>6.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>13</td>\n",
" <td>1625</td>\n",
" <td>33.0</td>\n",
" <td>6.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>14</td>\n",
" <td>1630</td>\n",
" <td>45.0</td>\n",
" <td>6.22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>15</td>\n",
" <td>1635</td>\n",
" <td>33.0</td>\n",
" <td>6.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>16</td>\n",
" <td>1640</td>\n",
" <td>39.0</td>\n",
" <td>6.37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>1645</td>\n",
" <td>53.0</td>\n",
" <td>6.45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>18</td>\n",
" <td>1650</td>\n",
" <td>42.0</td>\n",
" <td>6.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>19</td>\n",
" <td>1655</td>\n",
" <td>40.5</td>\n",
" <td>6.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>20</td>\n",
" <td>1660</td>\n",
" <td>46.5</td>\n",
" <td>6.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>21</td>\n",
" <td>1665</td>\n",
" <td>32.0</td>\n",
" <td>6.80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>22</td>\n",
" <td>1670</td>\n",
" <td>37.0</td>\n",
" <td>6.90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>23</td>\n",
" <td>1675</td>\n",
" <td>43.0</td>\n",
" <td>7.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>24</td>\n",
" <td>1680</td>\n",
" <td>35.0</td>\n",
" <td>7.30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>25</td>\n",
" <td>1685</td>\n",
" <td>27.0</td>\n",
" <td>7.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>26</td>\n",
" <td>1690</td>\n",
" <td>40.0</td>\n",
" <td>8.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>27</td>\n",
" <td>1695</td>\n",
" <td>50.0</td>\n",
" <td>8.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>1700</td>\n",
" <td>30.0</td>\n",
" <td>9.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>29</td>\n",
" <td>1705</td>\n",
" <td>32.0</td>\n",
" <td>10.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>30</td>\n",
" <td>1710</td>\n",
" <td>44.0</td>\n",
" <td>11.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>31</td>\n",
" <td>1715</td>\n",
" <td>33.0</td>\n",
" <td>11.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>32</td>\n",
" <td>1720</td>\n",
" <td>29.0</td>\n",
" <td>12.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>33</td>\n",
" <td>1725</td>\n",
" <td>39.0</td>\n",
" <td>13.00</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>33</th>\n", " <th>mean</th>\n",
" <td>34</td>\n", " <td>27.000000</td>\n",
" <td>1730</td>\n", " <td>1694.924528</td>\n",
" <td>26.0</td>\n", " <td>43.264151</td>\n",
" <td>13.30</td>\n", " <td>11.581600</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>34</th>\n", " <th>std</th>\n",
" <td>35</td>\n", " <td>15.443445</td>\n",
" <td>1735</td>\n", " <td>77.089571</td>\n",
" <td>32.0</td>\n", " <td>15.410287</td>\n",
" <td>13.60</td>\n", " <td>7.336287</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>35</th>\n", " <th>min</th>\n",
" <td>36</td>\n", " <td>1.000000</td>\n",
" <td>1740</td>\n", " <td>1565.000000</td>\n",
" <td>27.0</td>\n", " <td>26.000000</td>\n",
" <td>14.00</td>\n", " <td>5.000000</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>36</th>\n", " <th>25%</th>\n",
" <td>37</td>\n", " <td>14.000000</td>\n",
" <td>1745</td>\n", " <td>1630.000000</td>\n",
" <td>27.5</td>\n", " <td>33.000000</td>\n",
" <td>14.50</td>\n", " <td>6.145000</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>37</th>\n", " <th>50%</th>\n",
" <td>38</td>\n", " <td>27.000000</td>\n",
" <td>1750</td>\n", " <td>1695.000000</td>\n",
" <td>31.0</td>\n", " <td>41.000000</td>\n",
" <td>15.00</td>\n", " <td>7.800000</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>38</th>\n", " <th>75%</th>\n",
" <td>39</td>\n", " <td>40.000000</td>\n",
" <td>1755</td>\n", " <td>1760.000000</td>\n",
" <td>35.5</td>\n", " <td>47.000000</td>\n",
" <td>15.70</td>\n", " <td>14.875000</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th>39</th>\n", " <th>max</th>\n",
" <td>40</td>\n", " <td>53.000000</td>\n",
" <td>1760</td>\n", " <td>1821.000000</td>\n",
" <td>31.0</td>\n", " <td>99.000000</td>\n",
" <td>16.50</td>\n", " <td>30.000000</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " </tbody>\n",
" <th>40</th>\n", "</table>\n",
" <td>41</td>\n", "</div>"
" <td>1765</td>\n", ],
" <td>43.0</td>\n", "text/plain": [
" <td>17.60</td>\n", " Unnamed: 0 Year Wheat Wages\n",
" </tr>\n", "count 53.000000 53.000000 53.000000 50.000000\n",
" <tr>\n", "mean 27.000000 1694.924528 43.264151 11.581600\n",
" <th>41</th>\n", "std 15.443445 77.089571 15.410287 7.336287\n",
" <td>42</td>\n", "min 1.000000 1565.000000 26.000000 5.000000\n",
" <td>1770</td>\n", "25% 14.000000 1630.000000 33.000000 6.145000\n",
" <td>47.0</td>\n", "50% 27.000000 1695.000000 41.000000 7.800000\n",
" <td>18.50</td>\n", "75% 40.000000 1760.000000 47.000000 14.875000\n",
" </tr>\n", "max 53.000000 1821.000000 99.000000 30.000000"
" <tr>\n", ]
" <th>42</th>\n", },
" <td>43</td>\n", "execution_count": 14,
" <td>1775</td>\n", "metadata": {},
" <td>44.0</td>\n", "output_type": "execute_result"
" <td>19.50</td>\n", }
" </tr>\n", ],
" <tr>\n", "source": [
" <th>43</th>\n", "raw_data.describe()\n"
" <td>44</td>\n", ]
" <td>1780</td>\n", },
" <td>46.0</td>\n", {
" <td>21.00</td>\n", "cell_type": "markdown",
" </tr>\n", "metadata": {},
" <tr>\n", "source": [
" <th>44</th>\n", "Now let's check whether there is any missing data."
" <td>45</td>\n", ]
" <td>1785</td>\n", },
" <td>42.0</td>\n", {
" <td>23.00</td>\n", "cell_type": "code",
" </tr>\n", "execution_count": 15,
" <tr>\n", "metadata": {},
" <th>45</th>\n", "outputs": [
" <td>46</td>\n", {
" <td>1790</td>\n", "data": {
" <td>47.5</td>\n", "text/html": [
" <td>25.50</td>\n", "<div>\n",
" </tr>\n", "<style scoped>\n",
" <tr>\n", " .dataframe tbody tr th:only-of-type {\n",
" <th>46</th>\n", " vertical-align: middle;\n",
" <td>47</td>\n", " }\n",
" <td>1795</td>\n", "\n",
" <td>76.0</td>\n", " .dataframe tbody tr th {\n",
" <td>27.50</td>\n", " vertical-align: top;\n",
" </tr>\n", " }\n",
" <tr>\n", "\n",
" <th>47</th>\n", " .dataframe thead th {\n",
" <td>48</td>\n", " text-align: right;\n",
" <td>1800</td>\n", " }\n",
" <td>79.0</td>\n", "</style>\n",
" <td>28.50</td>\n", "<table border=\"1\" class=\"dataframe\">\n",
" </tr>\n", " <thead>\n",
" <tr>\n", " <tr style=\"text-align: right;\">\n",
" <th>48</th>\n", " <th></th>\n",
" <td>49</td>\n", " <th>Unnamed: 0</th>\n",
" <td>1805</td>\n", " <th>Year</th>\n",
" <td>81.0</td>\n", " <th>Wheat</th>\n",
" <td>29.50</td>\n", " <th>Wages</th>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>50</td>\n",
" <td>1810</td>\n",
" <td>99.0</td>\n",
" <td>30.00</td>\n",
" </tr>\n", " </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n", " <tr>\n",
" <th>50</th>\n", " <th>50</th>\n",
" <td>51</td>\n", " <td>51</td>\n",
...@@ -572,97 +347,114 @@ ...@@ -572,97 +347,114 @@
], ],
"text/plain": [ "text/plain": [
" Unnamed: 0 Year Wheat Wages\n", " Unnamed: 0 Year Wheat Wages\n",
"0 1 1565 41.0 5.00\n",
"1 2 1570 45.0 5.05\n",
"2 3 1575 42.0 5.08\n",
"3 4 1580 49.0 5.12\n",
"4 5 1585 41.5 5.15\n",
"5 6 1590 47.0 5.25\n",
"6 7 1595 64.0 5.54\n",
"7 8 1600 27.0 5.61\n",
"8 9 1605 33.0 5.69\n",
"9 10 1610 32.0 5.78\n",
"10 11 1615 33.0 5.94\n",
"11 12 1620 35.0 6.01\n",
"12 13 1625 33.0 6.12\n",
"13 14 1630 45.0 6.22\n",
"14 15 1635 33.0 6.30\n",
"15 16 1640 39.0 6.37\n",
"16 17 1645 53.0 6.45\n",
"17 18 1650 42.0 6.50\n",
"18 19 1655 40.5 6.60\n",
"19 20 1660 46.5 6.75\n",
"20 21 1665 32.0 6.80\n",
"21 22 1670 37.0 6.90\n",
"22 23 1675 43.0 7.00\n",
"23 24 1680 35.0 7.30\n",
"24 25 1685 27.0 7.60\n",
"25 26 1690 40.0 8.00\n",
"26 27 1695 50.0 8.50\n",
"27 28 1700 30.0 9.00\n",
"28 29 1705 32.0 10.00\n",
"29 30 1710 44.0 11.00\n",
"30 31 1715 33.0 11.75\n",
"31 32 1720 29.0 12.50\n",
"32 33 1725 39.0 13.00\n",
"33 34 1730 26.0 13.30\n",
"34 35 1735 32.0 13.60\n",
"35 36 1740 27.0 14.00\n",
"36 37 1745 27.5 14.50\n",
"37 38 1750 31.0 15.00\n",
"38 39 1755 35.5 15.70\n",
"39 40 1760 31.0 16.50\n",
"40 41 1765 43.0 17.60\n",
"41 42 1770 47.0 18.50\n",
"42 43 1775 44.0 19.50\n",
"43 44 1780 46.0 21.00\n",
"44 45 1785 42.0 23.00\n",
"45 46 1790 47.5 25.50\n",
"46 47 1795 76.0 27.50\n",
"47 48 1800 79.0 28.50\n",
"48 49 1805 81.0 29.50\n",
"49 50 1810 99.0 30.00\n",
"50 51 1815 78.0 NaN\n", "50 51 1815 78.0 NaN\n",
"51 52 1820 54.0 NaN\n", "51 52 1820 54.0 NaN\n",
"52 53 1821 54.0 NaN" "52 53 1821 54.0 NaN"
] ]
}, },
"execution_count": 10, "execution_count": 15,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
], ],
"source": [ "source": [
"raw_data\n" "raw_data[raw_data.isnull().any(axis=1)]"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "source": [
"source": [] "There are 3 rows with missing data, only in the wages column, so we will keep them for now.\n",
"Next, we drop the first column and set the Year column as the index."
]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 37,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
"source": [] {
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Wheat</th>\n",
" <th>Wages</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Year</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1565</th>\n",
" <td>41.0</td>\n",
" <td>5.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1570</th>\n",
" <td>45.0</td>\n",
" <td>5.05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1575</th>\n",
" <td>42.0</td>\n",
" <td>5.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1580</th>\n",
" <td>49.0</td>\n",
" <td>5.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1585</th>\n",
" <td>41.5</td>\n",
" <td>5.15</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Wheat Wages\n",
"Year \n",
"1565 41.0 5.00\n",
"1570 45.0 5.05\n",
"1575 42.0 5.08\n",
"1580 49.0 5.12\n",
"1585 41.5 5.15"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"colonne0=list(raw_data)[0]\n",
"sorted_data = raw_data.set_index('Year').sort_index().drop(colonne0,axis=1) \n",
"# this does not work if I try to combine the 2 lines into one!!\n",
"sorted_data.head()"
]
} }
], ],
"metadata": { "metadata": {
......
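The cleaning steps described in the notebook's markdown cells (checking for missing values, dropping the leftover first column, using Year as a sorted index) can be sketched in plain pandas as below. This is a minimal sketch, not the notebook's own code: it rebuilds a small excerpt of the table from values shown in the outputs above instead of loading the CSV (whose loading cell is not part of this diff), and it shows one way of chaining the two-line cell into a single expression by dropping the column with `drop(columns=...)` before calling `set_index`.

```python
import numpy as np
import pandas as pd

# Small excerpt of the rows shown in the notebook outputs; the real notebook
# works on the full 53-row table loaded from a CSV file.
raw_data = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3, 51, 52, 53],
    "Year": [1565, 1570, 1575, 1815, 1820, 1821],
    "Wheat": [41.0, 45.0, 42.0, 78.0, 54.0, 54.0],
    "Wages": [5.00, 5.05, 5.08, np.nan, np.nan, np.nan],
})

# Rows with at least one missing value (same test as the notebook cell).
missing_rows = raw_data[raw_data.isnull().any(axis=1)]

# Number of missing values per column: only Wages is expected to be non-zero.
missing_per_column = raw_data.isnull().sum()

# Single-expression version of the two-line cell: drop the leftover index
# column by position, then use Year as a sorted index.
sorted_data = (
    raw_data
    .drop(columns=raw_data.columns[0])
    .set_index("Year")
    .sort_index()
)

print(missing_rows)
print(missing_per_column)
print(sorted_data.head())
```

Dropping the column first means its name is looked up on `raw_data` directly, which is the same thing `colonne0=list(raw_data)[0]` does in the notebook; the result matches `set_index('Year').sort_index().drop(colonne0, axis=1)`. Year is left as a plain integer index here: with pandas' default nanosecond resolution, timestamps only cover roughly 1677 to 2262, so converting these 16th-century years with `pd.to_datetime` would be expected to raise an out-of-bounds error.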