From 69e4777abdf430c3e61e734a84b201b40927ed05 Mon Sep 17 00:00:00 2001
From: 264af2e9a1e4e844f861df089a5604e3
 <264af2e9a1e4e844f861df089a5604e3@app-learninglab.inria.fr>
Date: Fri, 1 Nov 2024 12:31:38 +0000
Subject: [PATCH] no commit message

---
 module3/exo3/exerciceTabac.ipynb     |   6 +-
 module4/src_Python3_challenger.ipynb | 380 +++++++++++++++++++++++++++
 2 files changed, 383 insertions(+), 3 deletions(-)
 create mode 100644 module4/src_Python3_challenger.ipynb

diff --git a/module3/exo3/exerciceTabac.ipynb b/module3/exo3/exerciceTabac.ipynb
index c26fe7c..56c8b96 100644
--- a/module3/exo3/exerciceTabac.ipynb
+++ b/module3/exo3/exerciceTabac.ipynb
@@ -2651,7 +2651,7 @@
       "Model:                          Logit   Df Residuals:                      580\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Thu, 31 Oct 2024   Pseudo R-squ.:                  0.2492\n",
-      "Time:                        21:26:13   Log-Likelihood:                -240.21\n",
+      "Time:                        23:20:10   Log-Likelihood:                -240.21\n",
       "converged:                       True   LL-Null:                       -319.94\n",
       "                                        LLR p-value:                 1.477e-36\n",
       "==============================================================================\n",
@@ -2715,7 +2715,7 @@
       "Model:                          Logit   Df Residuals:                      730\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Thu, 31 Oct 2024   Pseudo R-squ.:                  0.4304\n",
-      "Time:                        21:26:13   Log-Likelihood:                -259.54\n",
+      "Time:                        23:20:10   Log-Likelihood:                -259.54\n",
       "converged:                       True   LL-Null:                       -455.62\n",
       "                                        LLR p-value:                 2.808e-87\n",
       "==============================================================================\n",
@@ -2796,7 +2796,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 28,
    "metadata": {},
    "outputs": [],
    "source": [
diff --git a/module4/src_Python3_challenger.ipynb b/module4/src_Python3_challenger.ipynb
new file mode 100644
index 0000000..f108b9b
--- /dev/null
+++ b/module4/src_Python3_challenger.ipynb
@@ -0,0 +1,380 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this document we reproduce part of the analysis presented in \n",
+    "*Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure* by *Siddhartha R. Dalal, Edward B. Fowlkes, Bruce Hoadley* published in *Journal of the American Statistical Association*, Vol. 84, No. 408 (Dec., 1989), pp. 945-957 and available at http://www.jstor.org/stable/2290069. \n",
+    "\n",
+    "On the fourth page of this article, the authors indicate that the maximum likelihood estimates of the logistic regression using only temperature are $\\hat{\\alpha}=5.085$ and $\\hat{\\beta}=-0.1156$, and that their asymptotic standard errors are $s_{\\hat{\\alpha}}=3.052$ and $s_{\\hat{\\beta}}=0.047$. The goodness of fit indicated for this model was $G^2=18.086$ with 21 degrees of freedom. Our goal is to reproduce the computation behind these values and Figure 4 of this article, possibly in a nicer-looking way."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Technical information on the computer on which the analysis is run"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will use the Python 3 language with the pandas, statsmodels, numpy, matplotlib and seaborn libraries."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57) \n",
+      "[GCC 7.2.0]\n",
+      "uname_result(system='Linux', node='49d9c65916ca', release='4.15.0-142-generic', version='#146~16.04.1-Ubuntu SMP Tue Apr 13 09:27:15 UTC 2021', machine='x86_64', processor='x86_64')\n",
+      "IPython 7.12.0\n",
+      "IPython.core.release 7.12.0\n",
+      "PIL 7.0.0\n",
+      "PIL.Image 7.0.0\n",
+      "PIL._version 7.0.0\n",
+      "_csv 1.0\n",
+      "_ctypes 1.1.0\n",
+      "_curses b'2.2'\n",
+      "decimal 1.70\n",
+      "argparse 1.1\n",
+      "backcall 0.1.0\n",
+      "cffi 1.13.2\n",
+      "csv 1.0\n",
+      "ctypes 1.1.0\n",
+      "cycler 0.10.0\n",
+      "dateutil 2.8.1\n",
+      "decimal 1.70\n",
+      "decorator 4.4.1\n",
+      "distutils 3.6.4\n",
+      "ipaddress 1.0\n",
+      "ipykernel 5.1.4\n",
+      "ipykernel._version 5.1.4\n",
+      "ipython_genutils 0.2.0\n",
+      "ipython_genutils._version 0.2.0\n",
+      "ipywidgets 7.2.1\n",
+      "ipywidgets._version 7.2.1\n",
+      "jedi 0.16.0\n",
+      "json 2.0.9\n",
+      "jupyter_client 6.0.0\n",
+      "jupyter_client._version 6.0.0\n",
+      "jupyter_core 4.6.3\n",
+      "jupyter_core.version 4.6.3\n",
+      "kiwisolver 1.1.0\n",
+      "logging 0.5.1.2\n",
+      "matplotlib 2.2.3\n",
+      "matplotlib.backends.backend_agg 2.2.3\n",
+      "numpy 1.15.2\n",
+      "numpy.core 1.15.2\n",
+      "numpy.core.multiarray 3.1\n",
+      "numpy.lib 1.15.2\n",
+      "numpy.linalg._umath_linalg b'0.1.5'\n",
+      "numpy.matlib 1.15.2\n",
+      "optparse 1.5.3\n",
+      "pandas 0.22.0\n",
+      "_libjson 1.33\n",
+      "parso 0.6.0\n",
+      "patsy 0.5.1\n",
+      "patsy.version 0.5.1\n",
+      "pexpect 4.8.0\n",
+      "pickleshare 0.7.5\n",
+      "platform 1.0.8\n",
+      "prompt_toolkit 3.0.3\n",
+      "ptyprocess 0.6.0\n",
+      "pygments 2.5.2\n",
+      "pyparsing 2.4.6\n",
+      "pytz 2019.3\n",
+      "re 2.2.1\n",
+      "scipy 1.1.0\n",
+      "scipy._lib.decorator 4.0.5\n",
+      "scipy._lib.six 1.2.0\n",
+      "scipy.fftpack._fftpack b'$Revision: $'\n",
+      "scipy.fftpack.convolve b'$Revision: $'\n",
+      "scipy.integrate._dop b'$Revision: $'\n",
+      "scipy.integrate._ode $Id$\n",
+      "scipy.integrate._odepack  1.9 \n",
+      "scipy.integrate._quadpack  1.13 \n",
+      "scipy.integrate.lsoda b'$Revision: $'\n",
+      "scipy.integrate.vode b'$Revision: $'\n",
+      "scipy.interpolate._fitpack  1.7 \n",
+      "scipy.interpolate.dfitpack b'$Revision: $'\n",
+      "scipy.linalg 0.4.9\n",
+      "scipy.linalg._fblas b'$Revision: $'\n",
+      "scipy.linalg._flapack b'$Revision: $'\n",
+      "scipy.linalg._flinalg b'$Revision: $'\n",
+      "scipy.ndimage 2.0\n",
+      "scipy.optimize._cobyla b'$Revision: $'\n",
+      "scipy.optimize._lbfgsb b'$Revision: $'\n",
+      "scipy.optimize._minpack  1.10 \n",
+      "scipy.optimize._nnls b'$Revision: $'\n",
+      "scipy.optimize._slsqp b'$Revision: $'\n",
+      "scipy.optimize.minpack2 b'$Revision: $'\n",
+      "scipy.signal.spline 0.2\n",
+      "scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $'\n",
+      "scipy.sparse.linalg.isolve._iterative b'$Revision: $'\n",
+      "scipy.special.specfun b'$Revision: $'\n",
+      "scipy.stats.mvn b'$Revision: $'\n",
+      "scipy.stats.statlib b'$Revision: $'\n",
+      "seaborn 0.8.1\n",
+      "seaborn.external.husl 2.1.0\n",
+      "seaborn.external.six 1.10.0\n",
+      "six 1.14.0\n",
+      "statsmodels 0.9.0\n",
+      "statsmodels.__init__ 0.9.0\n",
+      "traitlets 4.3.3\n",
+      "traitlets._version 4.3.3\n",
+      "urllib.request 3.6\n",
+      "zlib 1.0\n",
+      "zmq 17.1.2\n",
+      "zmq.sugar 17.1.2\n",
+      "zmq.sugar.version 17.1.2\n"
+     ]
+    }
+   ],
+   "source": [
+    "def print_imported_modules():\n",
+    "    import sys\n",
+    "    for name, val in sorted(sys.modules.items()):\n",
+    "        if(hasattr(val, '__version__')): \n",
+    "            print(val.__name__, val.__version__)\n",
+    "#        else:\n",
+    "#            print(val.__name__, \"(unknown version)\")\n",
+    "def print_sys_info():\n",
+    "    import sys\n",
+    "    import platform\n",
+    "    print(sys.version)\n",
+    "    print(platform.uname())\n",
+    "\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import matplotlib.pyplot as plt\n",
+    "import statsmodels.api as sm\n",
+    "import seaborn as sns\n",
+    "\n",
+    "print_sys_info()\n",
+    "print_imported_modules()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading and inspecting data\n",
+    "Let's start by reading data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "ParserError",
+     "evalue": "Error tokenizing data. C error: Expected 1 fields in line 30, saw 21\n",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mParserError\u001b[0m                               Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-2-dd5c5ce9e9a8>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/blob/master/data/shuttle.csv\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)\u001b[0m\n\u001b[1;32m    707\u001b[0m                     skip_blank_lines=skip_blank_lines)\n\u001b[1;32m    708\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 709\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    710\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    711\u001b[0m     \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m    453\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    454\u001b[0m     \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 455\u001b[0;31m         \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    456\u001b[0m     \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    457\u001b[0m         \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m   1067\u001b[0m                 \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'skipfooter not supported for iteration'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1068\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1069\u001b[0;31m         \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1070\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1071\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'as_recarray'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m   1837\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1838\u001b[0m         \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1839\u001b[0;31m             \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1840\u001b[0m         \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1841\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[0;34m()\u001b[0m\n",
+      "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[0;34m()\u001b[0m\n",
+      "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[0;34m()\u001b[0m\n",
+      "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[0;34m()\u001b[0m\n",
+      "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[0;34m()\u001b[0m\n",
+      "\u001b[0;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 1 fields in line 30, saw 21\n"
+     ]
+    }
+   ],
+   "source": [
+    "data = pd.read_csv(\"https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/raw/master/data/shuttle.csv\")\n",
+    "data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We know from our previous experience with this data set that filtering data is a really bad idea. We will therefore process it as is, without filtering."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "pd.set_option('mode.chained_assignment',None) # this removes a useless warning from pandas\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "data[\"Frequency\"]=data.Malfunction/data.Count\n",
+    "data.plot(x=\"Temperature\",y=\"Frequency\",kind=\"scatter\",ylim=[0,1])\n",
+    "plt.grid(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Logistic regression\n",
+    "\n",
+    "Let's assume that O-rings fail independently, with the same probability, which depends solely on temperature. A logistic regression should allow us to estimate the influence of temperature."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import statsmodels.api as sm\n",
+    "\n",
+    "data[\"Success\"]=data.Count-data.Malfunction\n",
+    "data[\"Intercept\"]=1\n",
+    "\n",
+    "logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], \n",
+    "                family=sm.families.Binomial(sm.families.links.logit)).fit()\n",
+    "\n",
+    "logmodel.summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The maximum likelihood estimates of the intercept and of the Temperature coefficient are thus $\\hat{\\alpha}=5.0849$ and $\\hat{\\beta}=-0.1156$. This **corresponds** to the values from the article of Dalal *et al.* The standard errors are $s_{\\hat{\\alpha}} = 7.477$ and $s_{\\hat{\\beta}} = 0.115$, which is **different** from the $3.052$ and $0.04702$ reported by Dalal *et al.* The deviance is $3.01444$ with 21 degrees of freedom. I cannot find any value similar to the goodness of fit ($G^2=18.086$) reported by Dalal *et al.* There seems to be something wrong. Oh I know, I haven't indicated that each of my observations is actually the result of 6 observations for each rocket launch. Let's indicate these weights (since the weights are the same throughout all experiments, this does not change the estimates of the fit, but it does influence the variance estimates)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "logmodel=sm.GLM(data['Frequency'], data[['Intercept','Temperature']], \n",
+    "                family=sm.families.Binomial(sm.families.links.logit),\n",
+    "                var_weights=data['Count']).fit()\n",
+    "\n",
+    "logmodel.summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Good, now I have recovered the asymptotic standard errors $s_{\\hat{\\alpha}}=3.052$ and $s_{\\hat{\\beta}}=0.047$.\n",
+    "The Goodness of fit (Deviance) indicated for this model is $G^2=18.086$ with 21 degrees of freedom (Df Residuals).\n",
+    "\n",
+    "**I have therefore managed to fully replicate the results of the Dalal *et al.* article**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Predicting failure probability\n",
+    "The temperature when launching the shuttle was 31°F. Let's try to estimate the failure probability for such a temperature using our model:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "data_pred = pd.DataFrame({'Temperature': np.linspace(start=30, stop=90, num=121), 'Intercept': 1})\n",
+    "data_pred['Frequency'] = logmodel.predict(data_pred[['Intercept','Temperature']])\n",
+    "data_pred.plot(x=\"Temperature\",y=\"Frequency\",kind=\"line\",ylim=[0,1])\n",
+    "plt.scatter(x=data[\"Temperature\"],y=data[\"Frequency\"])\n",
+    "plt.grid(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "hideCode": false,
+    "hidePrompt": false,
+    "scrolled": true
+   },
+   "source": [
+    "This figure is very similar to Figure 4 of Dalal *et al.* **I have managed to replicate Figure 4 of the Dalal *et al.* article.**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Computing and plotting uncertainty"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Following the documentation of [Seaborn](https://seaborn.pydata.org/generated/seaborn.regplot.html), I use regplot."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sns.set(color_codes=True)\n",
+    "plt.xlim(30,90)\n",
+    "plt.ylim(0,1)\n",
+    "sns.regplot(x='Temperature', y='Frequency', data=data, logistic=True)\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**I think I have managed to correctly compute and plot the uncertainty of my prediction.** Although the shaded area seems very similar to [the one obtained with R](https://app-learninglab.inria.fr/moocrr/gitlab/moocrr-session3/moocrr-reproducibility-study/tree/master/challenger.pdf), I can spot a few differences (e.g., the blue point for temperature 63 is outside)... Could this be a numerical error? Or a difference in the statistical method? It is not clear which one is \"right\"."
+   ]
+  }
+ ],
+ "metadata": {
+  "celltoolbar": "Hide code",
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
-- 
2.18.1