diff --git a/module2/exo4/exercice_R_fr.org b/module2/exo4/exercice_R_fr.org
index 1bb8f61f1d11b486ceb724afcdd14d11e5329545..7d261b8ecd3327d21c2ece1c453a5e53efcc86d8 100644
--- a/module2/exo4/exercice_R_fr.org
+++ b/module2/exo4/exercice_R_fr.org
@@ -1,6 +1,6 @@
-#+TITLE:  Votre titre
-#+AUTHOR: Votre nom
-#+DATE:   La date du jour
+#+TITLE:  Exercice 4
+#+AUTHOR: Waad ALMASRI
+#+DATE:   25/08/2020
 #+LANGUAGE: fr
 # #+PROPERTY: header-args :eval never-export
 
@@ -11,74 +11,224 @@
 #+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
 #+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
 
-* Quelques explications
+* Exploring the repository
+First, run =git pull= to fetch the data that was uploaded to the Git repository.\\
+*Warning!* If you have already started writing in the notebook, save your work first so that you do not lose it.\\
+Then check that the data is present in the directory
+with the =list.files()= command.
+
+#+begin_src R :results output :session *R* :exports both
+list.files(".")
+#+end_src
+
+#+RESULTS:
+:  [1] "#exercice_python_fr.org#" "#exercice_R_fr.org#"     
+:  [3] "bar-chart.html"           "cosxsx.png"              
+:  [5] "data.csv"                 "exercice_en.ipynb"       
+:  [7] "exercice_en.Rmd"          "exercice_fr.ipynb"       
+:  [9] "exercice_fr.Rmd"          "exercice_python_en.org"  
+: [11] "exercice_python_fr.org"   "exercice_R_en.org"       
+: [13] "exercice_R_fr.org"        "exercice.ipynb"          
+: [15] "fig2_python_org.png"      "fig3_python_org.png"     
+: [17] "fig4_python_org.png"      "fig5_python_org.png"
 
-Ceci est un document org-mode avec quelques exemples de code
-R. Une fois ouvert dans emacs, ce document peut aisément être
-exporté au format HTML, PDF, et Office. Pour plus de détails sur
-org-mode vous pouvez consulter https://orgmode.org/guide/.
 
-Lorsque vous utiliserez le raccourci =C-c C-e h o=, ce document sera
-compilé en html. Tout le code contenu sera ré-exécuté, les résultats
-récupérés et inclus dans un document final. Si vous ne souhaitez pas
-ré-exécuter tout le code à chaque fois, il vous suffit de supprimer
-le # et l'espace qui sont devant le ~#+PROPERTY:~ au début de ce
-document.
+* Exploring the dataset
+Now that we have the data, we can start exploring it.\\
+*NB:* the data is already stored in =data.csv=, which is tab-separated despite its extension.
 
-Comme nous vous l'avons montré dans la vidéo, on inclut du code
-R de la façon suivante (et on l'exécute en faisant ~C-c C-c~):
+#+begin_src R :results output :session *R* :exports both
+df <- read.csv(file = "data.csv", sep="\t")
+print(nrow(df))
+head(df)
+#+end_src
 
-#+begin_src R :results output :exports both
-print("Hello world!")
+#+RESULTS:
+#+begin_example
+
+[1] 7569
+
+        date           edited_by        job       researched_by
+1 2013-08-29 Angie Drobnic Holan Republican       Jon Greenberg
+2 2013-08-29 Angie Drobnic Holan Republican      Louis Jacobson
+3 2013-08-29       Greg Borowski                  Tom Kertscher
+4 2013-08-28    Aaron Sharockman                  Rochelle Koff
+5 2013-08-28    Aaron Sharockman            Angie Drobnic Holan
+6 2013-08-28    W. Gardner Selby Republican            Sue Owen
+                                       source     state
+1                                Scott Walker Wisconsin
+2                               Mike Huckabee  Arkansas
+3               League of Conservation Voters          
+4 National Republican Congressional Committee          
+5                            Janet Napolitano          
+6                              Steve Stockman     Texas
+                                                                                                                                                                                                         statement
+1 In the Wisconsin health insurance exchange, "the Society of Actuaries points out that there'll be, according to them, an 82 percent increase in individual premiums over the next couple years under Obamacare."
+2           "America’s gun-related homicide rate … would be about the same as Belgium’s if you left out California, Illinois, D.C. and New Jersey, places with some of the strictest gun control laws in the U.S."
+3                                                                                                     Says U.S. Sen. Ron Johnson voted to let oil and gas companies emit "unlimited carbon pollution into our air"
+4                                                                                                              "Congressman Patrick Murphy voted to keep the scandal-ridden IRS in charge of enforcing Obamacare."
+5                                                                                                                                                   The 2010 DREAM Act failed despite "strong bipartisan support."
+6                                                                                                                                           Says U.N. arms treaty will mandate a "new international gun registry."
+                                                       subjects truth
+1                                               ['Health Care']     3
+2                                  ['Crime', 'Guns', 'Pundits']     0
+3 ['Climate Change', 'Energy', 'Environment', 'Transportation']     5
+4                                               ['Health Care']     2
+5                             ['Bipartisanship', 'Immigration']     2
+6                                                      ['Guns']     1
+#+end_example
+
+Let us add the year column to the dataframe.
+#+begin_src R :results output :session *R* :exports both
+df$year <- substring(df$date,1,4)
 #+end_src
 
 #+RESULTS:
-: [1] "Hello world!"
 
-Voici la même chose, mais avec une session R (c'est le cas le
-plus courant, R étant vraiment un langage interactif), donc une
-persistance d'un bloc à l'autre (et on l'exécute toujours en faisant
-~C-c C-c~).
+Now let us check what's in the dataframe:
+#+begin_src R :results output :session *R* :exports both
+summary(df)
+#+end_src
+
+#+RESULTS:
+#+begin_example
+     date            edited_by             job            researched_by     
+ Length:7569        Length:7569        Length:7569        Length:7569       
+ Class :character   Class :character   Class :character   Class :character  
+ Mode  :character   Mode  :character   Mode  :character   Mode  :character  
+                                                                            
+                                                                            
+                                                                            
+    source             state            statement           subjects        
+ Length:7569        Length:7569        Length:7569        Length:7569       
+ Class :character   Class :character   Class :character   Class :character  
+ Mode  :character   Mode  :character   Mode  :character   Mode  :character  
+                                                                            
+                                                                            
+                                                                            
+     truth           year          
+ Min.   :0.000   Length:7569       
+ 1st Qu.:1.000   Class :character  
+ Median :3.000   Mode  :character  
+ Mean   :2.741                     
+ 3rd Qu.:4.000                     
+ Max.   :5.000
+#+end_example
+
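+Let us remove the rows with missing data in =job= or =state=. Note that =read.csv= loaded empty fields in these character columns as empty strings rather than =NA= (see the blank cells in the =head= output above), so =drop_na= keeps those rows.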
+#+begin_src R :results output :session *R* :exports both
+library(tidyr)
+library(plyr)
+library(dplyr)
+df <- df %>% drop_na(job, state)
+#+end_src 
+
+#+RESULTS:
+
+* Basic statistics
+#+begin_src R :results output :session *R* :exports both
+print(paste0("There are ", length(unique(df$job)), " unique jobs."))
+print(paste0("There are ", length(unique(df$edited_by)), " unique editors."))
+print(paste0("There are ", length(unique(df$state)), " unique state"))
+#+end_src
+
+#+RESULTS:
+: [1] "There are 20 unique jobs."
+: 
+: [1] "There are 127 unique editors."
+: 
+: [1] "There are 60 unique state"
 
+Number of jobs per state per year
 #+begin_src R :results output :session *R* :exports both
-summary(cars)
+jobs_per_state_year <-ddply(df,.(state,year),summarise,number_of_jobs=length((job)))
+jobs_per_state_year <-jobs_per_state_year[order(jobs_per_state_year$number_of_jobs, decreasing=TRUE),]
+#+end_src
+
+#+RESULTS:
+
+* Graphical representations
+We will start by plotting the number of jobs per year for New York versus
+Texas.
+#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R* 
+library(ggplot2)
+df %>% 
+  filter(state %in% c("Texas", "New York")) %>%
+  group_by(state, year) %>%
+  dplyr::summarise(Nbr_of_jobs=n()) %>%
+  ggplot(aes(x=year, y=Nbr_of_jobs))+ 
+    geom_bar(aes(fill=state),stat="identity") + 
+    theme_bw()
 #+end_src
 
 #+RESULTS:
-:      speed           dist       
-:  Min.   : 4.0   Min.   :  2.00  
-:  1st Qu.:12.0   1st Qu.: 26.00  
-:  Median :15.0   Median : 36.00  
-:  Mean   :15.4   Mean   : 42.98  
-:  3rd Qu.:19.0   3rd Qu.: 56.00  
-:  Max.   :25.0   Max.   :120.00
-
-Et enfin, voici un exemple de sortie graphique:
-#+begin_src R :results output graphics :file "./cars.png" :exports results :width 600 :height 400 :session *R* 
-plot(cars)
+[[file:/var/folders/7s/_r7s0qgj0nlbng33j4v38z9h0000gn/T/babel-dXCm2H/figureLo0GIk.png]]
+
+Let us check the most frequent jobs in the dataset:
+#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R* 
+top_jobs <-ddply(df,.(job),summarise,number_of_jobs=length((state)))
+top_jobs <-top_jobs[order(top_jobs$number_of_jobs, decreasing=TRUE),]
+ggplot(data=top_jobs, aes(x=reorder(job, -number_of_jobs), y=number_of_jobs)) +
+  geom_bar(stat="identity", color="blue", fill="white")+ 
+  theme(axis.text.x = element_text(angle = 90))
+#+end_src
+
+#+RESULTS:
+[[file:/var/folders/7s/_r7s0qgj0nlbng33j4v38z9h0000gn/T/babel-dXCm2H/figurejq9iem.png]]
+
+** Discussion
+It seems that this dataset is mostly about politics, since the top 2 jobs are Republican and Democrat.
+Let us check the share of Republicans versus Democrats in the top US states.
+But first, let us identify the top US states.
+#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R* 
+jobs_per_state <-ddply(df,.(state),summarise,number_of_jobs=length((job)))
+jobs_per_state <-jobs_per_state[order(jobs_per_state$number_of_jobs, decreasing=TRUE),]
+ggplot(data=jobs_per_state, aes(x=reorder(state, -number_of_jobs), y=number_of_jobs)) +
+  geom_bar(stat="identity", color="white", fill="red")+ 
+  theme(axis.text.x = element_text(angle = 90))
+#+end_src
+
+#+RESULTS:
+[[file:/var/folders/7s/_r7s0qgj0nlbng33j4v38z9h0000gn/T/babel-dXCm2H/figureDIwvHt.png]]
+
+Thus, the 7 US states with the most entries in the dataset are: Texas, Florida, Illinois, Ohio, Wisconsin, Georgia and Rhode Island.
+
+Now let us compare the distribution of Republicans versus Democrats
+in the top 7 US states:
+#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R* 
+df %>% 
+  filter(state %in% c("Texas", "Florida", "Illinois", "Ohio", "Wisconsin", "Georgia", "Rhode Island") & job %in% c("Republican", "Democrat")) %>%
+  group_by(state, job) %>%
+  dplyr::summarise(Nbr_of_jobs=n()) %>%
+  ggplot(aes(x=state, y=Nbr_of_jobs))+ 
+    geom_bar(aes(fill=job),stat="identity") + 
+    theme_bw()
+#+end_src
+
+#+RESULTS:
+[[file:/var/folders/7s/_r7s0qgj0nlbng33j4v38z9h0000gn/T/babel-dXCm2H/figurekHTcAX.png]]
+
+** Word Cloud
+We could also have found the top states and top jobs using a word cloud.
+
+Top Jobs:
+#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R* 
+library(wordcloud)
+library(RColorBrewer)
+pal2 <- brewer.pal(8, "Set2")
+wordcloud(top_jobs$job, top_jobs$number_of_jobs,
+     random.order=TRUE, rot.per=.10, colors=pal2, vfont=c("sans serif","plain"))
+#+end_src
+
+#+RESULTS:
+[[file:/var/folders/7s/_r7s0qgj0nlbng33j4v38z9h0000gn/T/babel-dXCm2H/figureSVtKM2.png]]
+
+Top US states:
+#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session *R* 
+pal2 <- brewer.pal(8,"Accent")
+wordcloud(jobs_per_state$state, jobs_per_state$number_of_jobs,
+     random.order=FALSE, rot.per=.15, colors=pal2, vfont=c("sans serif","plain"))
 #+end_src
 
 #+RESULTS:
-[[file:./cars.png]]
-
-Vous remarquerez le paramètre ~:exports results~ qui indique que le code
-ne doit pas apparaître dans la version finale du document. Nous vous
-recommandons dans le cadre de ce MOOC de ne pas changer ce paramètre
-(indiquer ~both~) car l'objectif est que vos analyses de données soient
-parfaitement transparentes pour être reproductibles.
-
-Attention, la figure ainsi générée n'est pas stockée dans le document
-org. C'est un fichier ordinaire, ici nommé ~cars.png~. N'oubliez pas
-de le committer si vous voulez que votre analyse soit lisible et
-compréhensible sur GitLab.
-
-Enfin, pour les prochains exercices, nous ne vous fournirons pas
-forcément de fichier de départ, ça sera à vous de le créer, par
-exemple en repartant de ce document et de le commiter vers
-gitlab. N'oubliez pas que nous vous fournissons dans les ressources de
-ce MOOC une configuration avec un certain nombre de raccourcis
-claviers permettant de créer rapidement les blocs de code R (en
-faisant ~<r~ ou ~<R~ suivi de ~Tab~).
-
-Maintenant, à vous de jouer! Vous pouvez effacer toutes ces
-informations et les remplacer par votre document computationnel.
+[[file:/var/folders/7s/_r7s0qgj0nlbng33j4v38z9h0000gn/T/babel-dXCm2H/figure3i2g5j.png]]
diff --git a/module2/exo4/exercice_fr.Rmd b/module2/exo4/exercice_fr.Rmd
index baa5f24fdc36e3171ef81ff5fbe56ec7742c3c24..3679fc32dc5a8dc62643ff84493327af5dba03d9 100644
--- a/module2/exo4/exercice_fr.Rmd
+++ b/module2/exo4/exercice_fr.Rmd
@@ -10,13 +10,12 @@ output: html_document
 knitr::opts_chunk$set(echo = TRUE)
 ```
 
-*Les données utilisées dans cet exercice sont open source et ne sont pas en relation avec ma thèse parce que les données utilisées dans la thèse sont confidentielles.*
 
 ## Exploration du répertoire
 
 D'abord, on fait un git pull pour récupérer les données qu'on a téléversé dans le répertoire GIT.<br>
 **Attention!** Si tu as commencé à écrire dans le notebook, enregistre les données pour ne pas les perdre.<br>
-Ensuite, on s'assure que nous avons les données dans le répertoire avec la commande "ls".
+Ensuite, on s'assure que nous avons les données dans le répertoire avec la commande "list.files()".
 
 
 ```{r }
@@ -59,7 +58,6 @@ print(paste0("There are ", length(unique(df$state)), " unique state"))
 Number of jobs per state per year
 ```{r, echo=FALSE}
 library(plyr)
-library(tidyr)
 jobs_per_state_year <-ddply(df,.(state,year),summarise,number_of_jobs=length((job)))
 jobs_per_state_year <-jobs_per_state_year[order(jobs_per_state_year$number_of_jobs, decreasing=TRUE),]
 ```
diff --git a/module2/exo4/exercice_python_fr.org b/module2/exo4/exercice_python_fr.org
index c7157ba42216cf2e1d291112bb351ce48811115c..f007170ccb0d0253f9bf6b34591cba947391f991 100644
--- a/module2/exo4/exercice_python_fr.org
+++ b/module2/exo4/exercice_python_fr.org
@@ -1,6 +1,6 @@
-#+TITLE:  Votre titre
-#+AUTHOR: Votre nom
-#+DATE:   La date du jour
+#+TITLE:  Exercice 4
+#+AUTHOR: Waad ALMASRI
+#+DATE:   25/08/2020
 #+LANGUAGE: fr
 # #+PROPERTY: header-args :eval never-export
 
@@ -11,83 +11,171 @@
 #+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/lib/js/jquery.stickytableheaders.js"></script>
 #+HTML_HEAD: <script type="text/javascript" src="http://www.pirilampo.org/styles/readtheorg/js/readtheorg.js"></script>
 
-* Quelques explications
+* Exploring the repository
+First, run =git pull= to fetch the data that was uploaded to the Git repository.\\
+*Warning!* If you have already started writing in the notebook, save your work first so that you do not lose it.\\
+Then check that the data is present in the directory
+with the =os.listdir()= function.
 
-Ceci est un document org-mode avec quelques exemples de code
-python. Une fois ouvert dans emacs, ce document peut aisément être
-exporté au format HTML, PDF, et Office. Pour plus de détails sur
-org-mode vous pouvez consulter https://orgmode.org/guide/.
+#+begin_src python :results output :exports both
+import os
+files = os.listdir()
+print(files)
+#+end_src
+
+#+RESULTS:
+: ['exercice_en.ipynb', 'exercice.ipynb', 'exercice_python_en.org', 'exercice_python_fr.org', 'data.csv', 'exercice_R_en.org', 'bar-chart.html', 'exercice_R_fr.org', 'cosxsx.png', 'exercice_fr.Rmd', 'exercice_en.Rmd', 'exercice_fr.ipynb']
 
-Lorsque vous utiliserez le raccourci =C-c C-e h o=, ce document sera
-compilé en html. Tout le code contenu sera ré-exécuté, les résultats
-récupérés et inclus dans un document final. Si vous ne souhaitez pas
-ré-exécuter tout le code à chaque fois, il vous suffit de supprimer
-le # et l'espace qui sont devant le ~#+PROPERTY:~ au début de ce
-document.
+* Exploring the dataset
+Now that we have the data, we can start exploring it.\\
+*NB:* the data is already stored in =data.csv=, which is tab-separated despite its extension.
 
-Comme nous vous l'avons montré dans la vidéo, on inclue du code
-python de la façon suivante (et on l'exécute en faisant ~C-c C-c~):
 
 #+begin_src python :results output :exports both
-print("Hello world!")
+print("Reading Data...")
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+print("Checking Data...")
+print("In this dataframe there are ",len(df), "data samples") 
+print(df.head())
+print("Adding a column for 'year'")
+df['year'] = df.date.apply(lambda x: int(x[:4]))
+print("Checking Missing Data...")
+print(df.isnull().sum())
+print("Dropping rows having a Null job i.e. missing job info")
+df_ = df.dropna(subset=['job'])
+print("The number of data samples left are",len(df_))
+print("\nBasic statistics")
+print("There are ",len(set(df_.job)), " unique jobs.")
+print("There are ",len(set(df_.edited_by)), " unique editors.")
+print("There are ",len(set(df_.state)), " unique states.")
+print("Number of jobs per state per year")
+pivot_table = pd.pivot_table(df_, index=['state'], columns=['year'], values=['job'], aggfunc='count', fill_value=0)
+print(pivot_table)
+
 #+end_src
 
 #+RESULTS:
-: Hello world!
-
-Voici la même chose, mais avec une session python, donc une
-persistance d'un bloc à l'autre (et on l'exécute toujours en faisant
-~C-c C-c~).
-#+begin_src python :results output :session :exports both
-import numpy
-x=numpy.linspace(-15,15)
-print(x)
+
+
+
+* Graphical representations
+We will start by plotting the number of jobs per year for New York versus Texas.
+
+#+begin_src python :results file :session :var matplot_lib_filename="fig1_python_org.png" :exports both
+import pandas as pd
+import plotly.graph_objs as go
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job']).copy()
+# data.csv has no 'year' column, so derive it from the first four characters of 'date'
+df_['year'] = df_.date.str[:4].astype(int)
+# Count the rows per year for each of the two states
+ny = df_[df_.state == "New York"].groupby("year").size()
+tx = df_[df_.state == "Texas"].groupby("year").size()
+# Create two traces, first "New York" and second "Texas"
+trace1 = go.Bar(x=ny.index, y=ny.values, name="New York")
+trace2 = go.Bar(x=tx.index, y=tx.values, name="Texas")
+data = [trace1, trace2]
+# Create layout and specify title, legend and so on
+layout = go.Layout(title="Number of jobs per state per year",
+                   xaxis=dict(title="Year"),
+                   yaxis=dict(title="Count of Jobs"),
+                   barmode="group")
+# Create figure with all prepared data for plot
+fig = go.Figure(data=data, layout=layout)
+# Writing a static image needs an export engine such as the kaleido package
+fig.write_image(matplot_lib_filename)
 #+end_src
 
-#+RESULTS:
-#+begin_example
-[-15.         -14.3877551  -13.7755102  -13.16326531 -12.55102041
- -11.93877551 -11.32653061 -10.71428571 -10.10204082  -9.48979592
-  -8.87755102  -8.26530612  -7.65306122  -7.04081633  -6.42857143
-  -5.81632653  -5.20408163  -4.59183673  -3.97959184  -3.36734694
-  -2.75510204  -2.14285714  -1.53061224  -0.91836735  -0.30612245
-   0.30612245   0.91836735   1.53061224   2.14285714   2.75510204
-   3.36734694   3.97959184   4.59183673   5.20408163   5.81632653
-   6.42857143   7.04081633   7.65306122   8.26530612   8.87755102
-   9.48979592  10.10204082  10.71428571  11.32653061  11.93877551
-  12.55102041  13.16326531  13.7755102   14.3877551   15.        ]
-#+end_example
-
-Et enfin, voici un exemple de sortie graphique:
-#+begin_src python :results output file :session :var matplot_lib_filename="./cosxsx.png" :exports results
+Checking the top 7 jobs present in the United States
+#+begin_src python :results output :exports both
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+print(df_.job.value_counts()[:7])
+#+end_src
+
+#+begin_src python :results file :session :var matplot_lib_filename="fig2_python_org.png" :exports both
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+import matplotlib.pyplot as plt
+import seaborn as sns
+sns.countplot(x=df_[df_.job.isin(df_.job.value_counts()[:7].keys())].job)
+plt.xticks(rotation=90)
+plt.savefig(matplot_lib_filename)
+#+end_src
+
+** Discussion
+It seems that this dataset is mostly about politics, since the top 2 jobs are Republican and Democrat.
+Let us check the share of Republicans versus Democrats in the top US states.
+But first, let us identify the top US states.
+
+Checking the top 7 US states present in the dataset
+#+begin_src python :results output :exports both
+
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+print(df_.state.value_counts()[:7])
+#+end_src
+
+#+begin_src python :results file :session :var matplot_lib_filename="fig3_python_org.png" :exports both
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+import seaborn as sns
+import matplotlib.pyplot as plt
+sns.countplot(x=df_[df_.state.isin(df_.state.value_counts()[:7].keys())].state)
+plt.xticks(rotation=90)
+plt.savefig(matplot_lib_filename)
+#+end_src
+
+Now let us compare the distribution of Republicans versus Democrats
+in the top 7 US states.
+#+begin_src python :results file :session :var matplot_lib_filename="fig4_python_org.png" :exports both
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+import seaborn as sns
 import matplotlib.pyplot as plt
+df1 = df_[(df_.job.isin(["Republican","Democrat"])) & df_.state.isin(df_.state.value_counts()[:7].keys())]
+sns.countplot(data=df1, x='state', hue='job' )
+plt.title("Distribution of Republican vs Democrat in the top 7 US states in the database")
+plt.xticks(rotation=90)
+plt.savefig(matplot_lib_filename)
+#+end_src
 
-plt.figure(figsize=(10,5))
-plt.plot(x,numpy.cos(x)/x)
-plt.tight_layout()
+* Word Cloud
+We could also have found the top states and top jobs using a word cloud.
 
+#+begin_src python :results file :session :var matplot_lib_filename="fig5_python_org.png" :exports both
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+import matplotlib.pyplot as plt
+from wordcloud import WordCloud
+text = ' '.join(df_.job.tolist())
+wordcloud = WordCloud(background_color="white").generate(text)
+# Display the generated image:
+plt.figure(figsize=(15,8))
+plt.imshow(wordcloud)
+plt.axis("off")
+# Save without calling plt.show() first, which can leave an empty figure to save
+plt.savefig(matplot_lib_filename)
-print(matplot_lib_filename)
 #+end_src
 
-#+RESULTS:
-[[file:./cosxsx.png]]
-
-Vous remarquerez le paramètre ~:exports results~ qui indique que le code
-ne doit pas apparaître dans la version finale du document. Nous vous
-recommandons dans le cadre de ce MOOC de ne pas changer ce paramètre
-(indiquer ~both~) car l'objectif est que vos analyses de données soient
-parfaitement transparentes pour être reproductibles.
-
-Attention, la figure ainsi générée n'est pas stockée dans le document
-org. C'est un fichier ordinaire, ici nommé ~cosxsx.png~. N'oubliez pas
-de le committer si vous voulez que votre analyse soit lisible et
-compréhensible sur GitLab.
-
-Enfin, n'oubliez pas que nous vous fournissons dans les ressources de
-ce MOOC une configuration avec un certain nombre de raccourcis
-claviers permettant de créer rapidement les blocs de code python (en
-faisant ~<p~, ~<P~ ou ~<PP~ suivi de ~Tab~).
-
-Maintenant, à vous de jouer! Vous pouvez effacer toutes ces
-informations et les remplacer par votre document computationnel.
+#+begin_src python :results file :session :var matplot_lib_filename="fig6_python_org.png" :exports both
+import pandas as pd
+df = pd.read_csv("./data.csv", sep="\t")
+df_ = df.dropna(subset=['job'])
+import matplotlib.pyplot as plt
+from wordcloud import WordCloud
+# Skip missing states: str.join fails on NaN values
+text = ' '.join(df_.state.dropna().tolist())
+wordcloud = WordCloud(background_color="pink").generate(text)
+plt.figure(figsize=(15,8))
+plt.imshow(wordcloud)
+plt.axis("off")
+plt.savefig(matplot_lib_filename)
+#+end_src