# -*- mode: org -*-
#+TITLE: Gérer un workflow avec snakemake
#+DATE: August, 2019
#+STARTUP: overview indent
#+OPTIONS: num:nil toc:t
#+PROPERTY: header-args :eval never-export

* Préambule
Avant de lancer =org-babel-tangle=, il faut créer tous les répertoires qui vont accueillir les fichiers:
#+begin_src sh :results output :exports both
for directory in incidence_syndrome_grippal incidence_syndrome_grippal_par_region incidence_syndrome_grippal_par_region_v2
do
  rm -rf $directory
  mkdir $directory
  mkdir $directory/data
  mkdir $directory/scripts
done
#+end_src

#+RESULTS:

Puis:
#+begin_src emacs-lisp
(org-babel-tangle)
#+end_src

#+RESULTS:
| incidence_syndrome_grippal/scripts/annual-incidence-histogram.R     |
| incidence_syndrome_grippal/scripts/annual-incidence.R               |
| incidence_syndrome_grippal/scripts/incidence-plots.R                |
| incidence_syndrome_grippal_par_region_v2/scripts/split-by-region.py |
| incidence_syndrome_grippal_par_region/scripts/peak-years.py         |
| incidence_syndrome_grippal_par_region/scripts/split-by-region.py    |
| incidence_syndrome_grippal/scripts/preprocess.py                    |
| incidence_syndrome_grippal_par_region_v2/Snakefile                  |
| incidence_syndrome_grippal_par_region/Snakefile                     |
| incidence_syndrome_grippal/Snakefile                                |

* Installer snakemake
** Linux par les distributions
** Mac, Windows par Anaconda
* L'analyse de l'incidence du syndrome grippal revisitée
Je vais reprendre l'exemple du module 3, l'analyse de l'incidence du syndrome grippal, et je vais refaire exactement la même analyse sous forme d'un workflow géré par =snakemake=. Ceci veut dire que, pour l'instant, nous quittons le monde des documents computationnels que nous vous avons montrés dans les modules 2 et 3, pour passer dans l'univers de la ligne de commande. Il y a de bonnes raisons pour cela, que je vous donnerai plus tard. Et vous verrez aussi le retour des documents computationnels à la fin de ce tutoriel, même si ce sera dans un rôle moins central.

Un workflow est composé de tâches dont chacune correspond à un bout du calcul total. Une tâche est typiquement l'exécution d'une commande ou d'un script. Les tâches communiquent entre elles par des fichiers - au moins dans la vision de =snakemake= (et d'autres descendants de =make=). Pour faire le lien avec la programmation dans un langage comme Python ou R, une tâche est l'appel à une fonction, et les paramètres et les valeurs de retour sont stockés dans des fichiers.

Il y a beaucoup de liberté dans la décomposition d'un calcul en tâches d'un workflow. Souvent les critères sont plutôt techniques que scientifiques: une tâche peut alors correspondre à l'exécution d'un logiciel, ou à une étape du calcul qui est faite sur un ordinateur précis. Pour mon analyse, je propose la décomposition suivante, qui est assez arbitraire:
1. Téléchargement des données du site du Réseau Sentinelles
2. Pré-traitement: extraction des données utilisées, vérifications
3. Visualisation: génération des plots
4. Calcul des incidences annuelles
5. Calcul de l'histogramme des incidences annuelles
Pour faire les calculs, je vais recycler le code du module 3, sans le commenter de nouveau ici.
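Petit rappel pratique avant de commencer (voir la section « Installer snakemake » ci-dessus): =snakemake= s'installe par le gestionnaire de paquets de votre distribution Linux, ou par Anaconda sous Mac et Windows. À titre purement indicatif, voici quelques commandes possibles; c'est une esquisse à adapter, les noms de paquets et de canaux pouvant varier selon votre système:
#+begin_src sh :eval never
# Avec conda (canaux bioconda et conda-forge, qui distribuent snakemake):
conda install -c bioconda -c conda-forge snakemake

# Ou avec pip, dans un environnement Python 3:
pip3 install --user snakemake

# Vérification rapide de l'installation:
snakemake --version
#+end_src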
** Préparation
Un workflow finit par utiliser beaucoup de fichiers, il est donc prudent de les regrouper dans un répertoire, avec des sous-répertoires pour les scripts et les données:
#+begin_src sh :session *snakemake* :results output :exports both
# déjà fait: mkdir incidence_syndrome_grippal
cd incidence_syndrome_grippal
# déjà fait: mkdir data
# déjà fait: mkdir scripts
#+end_src

#+RESULTS:

** 1ère tâche: le téléchargement des données
Pour télécharger un fichier, inutile d'écrire du code: l'utilitaire =wget= fait ce qu'il faut. La ligne de commande
#+begin_src sh :session *snakemake* :results output :exports both
wget -O data/weekly-incidence.csv http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
#+end_src

#+RESULTS:
: --2019-09-24 15:00:23--  http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
: Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17
: Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected.
: HTTP request sent, awaiting response... 200 OK
: Length: unspecified [text/csv]
: Saving to: 'data/weekly-incidence.csv'
: data/weekly-incidence.c     [ <=>                ]  80.00K  --.-KB/s    in 0.06s
:
: 2019-09-24 15:00:24 (1.38 MB/s) - 'data/weekly-incidence.csv' saved [81916]

fait ce qu'il faut, et dépose les données dans le fichier =data/weekly-incidence.csv=. Je le supprime parce que je veux faire le téléchargement dans mon workflow!
#+begin_src sh :session *snakemake* :results output :exports both
rm data/weekly-incidence.csv
#+end_src

#+RESULTS:

Je vais commencer la rédaction du =Snakefile=, le fichier qui définit mon workflow:
#+begin_src :exports both :tangle incidence_syndrome_grippal/Snakefile
rule download:
    output:
        "data/weekly-incidence.csv"
    shell:
        "wget -O {output} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv"
#+end_src
Un =Snakefile= est constitué de /règles/ qui définissent les tâches. Chaque règle a un nom; ici j'ai choisi /download/. Une règle liste aussi les fichiers d'entrée (aucun dans ce cas) et de sortie (notre fichier de données). Enfin, il faut dire ce qui est à faire pour exécuter la tâche, ce qui est ici la commande =wget=.

Pour exécuter cette tâche, il y a deux façons de faire: on peut demander à =snakemake= d'exécuter la règle =download=, ou bien lui demander de produire le fichier =data/weekly-incidence.csv= (voir l'esquisse un peu plus bas). Commençons par la première:
#+begin_src sh :session *snakemake* :results output :exports both
snakemake download
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	download
	1

[Tue Sep 24 15:00:41 2019]
rule download:
    output: data/weekly-incidence.csv
    jobid: 0

--2019-09-24 15:00:41--  http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17
Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: 'data/weekly-incidence.csv'

2019-09-24 15:00:41 (1.08 MB/s) - 'data/weekly-incidence.csv' saved [81916]

[Tue Sep 24 15:00:41 2019]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150041.026347.snakemake.log
#+end_example

Si on relance la même commande une deuxième fois, =snakemake= se rend compte qu'il n'y a rien à faire, parce que le fichier souhaité existe déjà. Voici un premier avantage important d'un workflow: une tâche n'est exécutée que si c'est nécessaire. Quand une tâche met deux heures à s'exécuter, c'est appréciable.
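La deuxième façon, évoquée ci-dessus, consiste à demander à =snakemake= non pas une règle mais directement un fichier: il retrouve alors tout seul la règle capable de le produire. Esquisse (non exécutée ici):
#+begin_src sh :eval never
# Demander un fichier plutôt qu'une règle: snakemake cherche la règle
# dont la sortie correspond, ici la règle "download".
snakemake data/weekly-incidence.csv
#+end_src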
** 2ème tâche: le pré-traitement des données
La deuxième tâche est le pré-traitement: en partant du fichier téléchargé du Réseau Sentinelles, il faut extraire juste les éléments nécessaires, et il faut vérifier s'il y a des données manquantes ou des erreurs. Dans un document computationnel, j'avais procédé pas à pas, en inspectant les résultats à chaque étape. Dans mon workflow, le pré-traitement devient une seule tâche, exécutée en bloc. Il faut donc bien réfléchir à ce qu'on attend comme résultat. En fait, il faut deux fichiers de sortie: un qui contient les données qui seront analysées par la suite, et un autre qui contient les éventuels messages d'erreur. Avec ça, la deuxième règle s'écrit assez vite:
#+begin_src :exports both :tangle incidence_syndrome_grippal/Snakefile
rule preprocess:
    input:
        "data/weekly-incidence.csv"
    output:
        data="data/preprocessed-weekly-incidence.csv",
        errorlog="data/errors-from-preprocessing.txt"
    script:
        "scripts/preprocess.py"
#+end_src
Il y a donc un fichier d'entrée, qui est le résultat de la tâche /download/. Et il y a les deux fichiers de sortie, un pour les résultats et un pour les messages d'erreur. Enfin, pour faire le travail, j'ai opté pour un script Python cette fois. =snakemake= reconnaît le langage par l'extension =.py=. Le contenu de ce script est presque un copier-coller d'un document computationnel du module 3, plus précisément du document que j'ai montré dans le parcours Emacs/Org-Mode:
#+begin_src python :exports both :tangle incidence_syndrome_grippal/scripts/preprocess.py
# Libraries used by this script:
import datetime   # for date conversion
import csv        # for writing output to a CSV file

# Read the CSV file into memory
data = open(snakemake.input[0], 'rb').read()

# Decode the Latin-1 character set,
# remove white space at both ends,
# and split into lines.
lines = data.decode('latin-1') \
            .strip() \
            .split('\n')

# Discard the first line, which contains a comment
data_lines = lines[1:]

# Split each line into columns
table = [line.split(',') for line in data_lines]

# Remove records with missing data and write
# the removed records to a separate file for inspection.
with open(snakemake.output.errorlog, "w") as errorlog:
    valid_table = []
    for row in table:
        missing = any([column == '' for column in row])
        if missing:
            errorlog.write("Missing data in record\n")
            errorlog.write(str(row))
            errorlog.write("\n")
        else:
            valid_table.append(row)

# Extract the two relevant columns, "week" and "inc"
week = [row[0] for row in valid_table]
assert week[0] == 'week'
del week[0]
inc = [row[2] for row in valid_table]
assert inc[0] == 'inc'
del inc[0]
data = list(zip(week, inc))

# Check for obviously out-of-range values
with open(snakemake.output.errorlog, "a") as errorlog:
    for week, inc in data:
        if len(week) != 6 or not week.isdigit():
            errorlog.write(f"Suspect value in column 'week': {week}\n")
        if not inc.isdigit():
            errorlog.write(f"Suspect value in column 'inc': {inc}\n")

# Convert year/week to the date of the corresponding Monday,
# then sort by increasing date
converted_data = \
    [(datetime.datetime.strptime(year_and_week + ":1" , '%G%V:%u').date(), inc)
     for year_and_week, inc in data]
converted_data.sort(key = lambda record: record[0])

# Check that consecutive dates are seven days apart
with open(snakemake.output.errorlog, "a") as errorlog:
    dates = [date for date, _ in converted_data]
    for date1, date2 in zip(dates[:-1], dates[1:]):
        if date2-date1 != datetime.timedelta(weeks=1):
            errorlog.write(f"{date2-date1} between {date1} and {date2}\n")

# Write data to a CSV file with two columns:
# 1. the date of the Monday of each week, in ISO format
# 2. the incidence estimate for that week
with open(snakemake.output.data, "w") as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(["week_starting", "incidence"])
    for row in converted_data:
        csv_writer.writerow(row)
#+end_src

Ce qui saute aux yeux d'abord, c'est =snakemake.input[0]= comme nom de fichier. Le nom =snakemake= semble venir de nulle part: il n'est ni défini dans le script, ni importé d'un module. En fait, c'est bien =snakemake= qui définit ce nom dans l'interprète Python avant de lancer le script. Il permet d'accéder aux définitions du =Snakefile=, et notamment aux noms des fichiers. Sinon, il y a deux modifications par rapport au code du module 3. Premièrement, les messages d'erreur sont écrits dans un fichier. Deuxièmement, les données finales sont également écrites dans un fichier, en utilisant le format CSV.

Pour appliquer le pré-traitement, demandons à =snakemake=:
#+begin_src sh :session *snakemake* :results output :exports both
snakemake preprocess
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	preprocess
	1

[Tue Sep 24 15:02:32 2019]
rule preprocess:
    input: data/weekly-incidence.csv
    output: data/preprocessed-weekly-incidence.csv, data/errors-from-preprocessing.txt
    jobid: 0

[Tue Sep 24 15:02:33 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150232.768339.snakemake.log
#+end_example

Voyons s'il y a eu des problèmes:
#+begin_src sh :session *snakemake* :results output :exports both
cat data/errors-from-preprocessing.txt
#+end_src

#+RESULTS:
: Missing data in record
: ['198919', '3', '0', '', '', '0', '', '', 'FR', 'France']
: 14 days, 0:00:00 between 1989-05-01 and 1989-05-15

En effet, on avait vu dans le module 3 qu'il y a un point manquant dans ce jeu de données.
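Petit aparté: comme le nom =snakemake= n'existe que lorsque le script est lancé par =snakemake=, on ne peut pas exécuter =scripts/preprocess.py= tel quel avec =python=. Pour le mettre au point de façon interactive, une astuce possible (ceci n'est qu'une esquisse, avec des noms de fichiers donnés à titre d'exemple) est de fabriquer soi-même un objet qui imite =snakemake.input= et =snakemake.output=:
#+begin_src python :eval never
# Esquisse: simuler l'objet "snakemake" pour tester le script à la main.
from types import SimpleNamespace

class Sorties(list):
    """Liste de fichiers de sortie, avec accès par attribut comme dans snakemake."""

sorties = Sorties(["data/preprocessed-weekly-incidence.csv",
                   "data/errors-from-preprocessing.txt"])
sorties.data = sorties[0]
sorties.errorlog = sorties[1]

snakemake = SimpleNamespace(input=["data/weekly-incidence.csv"],
                            output=sorties)

# Le script lit alors snakemake.input[0], snakemake.output.data, etc. :
exec(open("scripts/preprocess.py").read())
#+end_src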
Quant aux données, je vais afficher juste le début: #+begin_src sh :session *snakemake* :results output :exports both head -10 data/preprocessed-weekly-incidence.csv #+end_src #+RESULTS: #+begin_example week_starting,incidence 1984-10-29,68422 1984-11-05,135223 1984-11-12,87330 1984-11-19,72029 1984-11-26,78620 1984-12-03,101073 1984-12-10,123680 1984-12-17,101726 1984-12-24,84830 #+end_example Ça a l'air pas mal! ** 3ème tâche: préparer les plots La règle pour faire les plots ne présente plus aucune surprise: #+begin_src :exports both :tangle incidence_syndrome_grippal/Snakefile rule plot: input: "data/preprocessed-weekly-incidence.csv" output: "data/weekly-incidence-plot.png", "data/weekly-incidence-plot-last-years.png" script: "scripts/incidence-plots.R" #+end_src Il y a les données pré-traitées à l'entrée, et deux fichiers image à la sortie, créées par un script, cette fois en langage R: #+begin_src R :exports both :tangle incidence_syndrome_grippal/scripts/incidence-plots.R # Read in the data and convert the dates data = read.csv(snakemake@input[[1]]) data$week_starting <- as.Date(data$week_starting) # Plot the complete incidence dataset png(filename=snakemake@output[[1]]) plot(data, type="l", xlab="Date", ylab="Weekly incidence") dev.off() # Zoom on the last four years png(filename=snakemake@output[[2]]) plot(tail(data, 4*52), type="l", xlab="Date", ylab="Weekly incidence") dev.off() #+end_src Comme pour le script Python de l'étape précedente, l'accès aux noms des fichiers se fait par le nom =snakemake= qui est créé par... =snakemake=. Passons à l'exécution: #+begin_src sh :session *snakemake* :results output :exports both snakemake plot #+end_src #+RESULTS: #+begin_example Building DAG of jobs... Using shell: /bin/bash Provided cores: 1 Rules claiming more threads will be scaled down. Job counts: count jobs 1 plot 1 [Tue Sep 24 15:03:17 2019] rule plot: input: data/preprocessed-weekly-incidence.csv output: data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png jobid: 0 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' null device 1 null device 1 [Tue Sep 24 15:03:18 2019] Finished job 0. 
) done Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150317.752684.snakemake.log #+end_example Voici les deux plots: [[file:incidence_syndrome_grippal/data/weekly-incidence-plot.png]] [[file:incidence_syndrome_grippal/data/weekly-incidence-plot-last-years.png]] ** 4ème tâche: calculer l'incidence annuelle Écrire les règles pour =snakemake= devient vite une routine: #+begin_src :exports both :tangle incidence_syndrome_grippal/Snakefile rule annual_incidence: input: "data/preprocessed-weekly-incidence.csv" output: "data/annual-incidence.csv" script: "scripts/annual-incidence.R" #+end_src Et le script en langage R ressemble fortement au code du module 3: #+begin_src R :exports both :tangle incidence_syndrome_grippal/scripts/annual-incidence.R # Read in the data and convert the dates data = read.csv(snakemake@input[[1]]) names(data) <- c("date", "incidence") data$date <- as.Date(data$date) # A function that extracts the peak for year N yearly_peak = function(year) { start = paste0(year-1,"-08-01") end = paste0(year,"-08-01") records = data$date > start & data$date <= end sum(data$incidence[records]) } # The years for which we have the full peak years <- 1986:2018 # Make a new data frame for the annual incidences annual_data = data.frame(year = years, incidence = sapply(years, yearly_peak)) # write output file write.csv(annual_data, file=snakemake@output[[1]], row.names=FALSE) #+end_src Allons-y! #+begin_src sh :session *snakemake* :results output :exports both snakemake annual_incidence #+end_src #+RESULTS: #+begin_example Building DAG of jobs... Using shell: /bin/bash Provided cores: 1 Rules claiming more threads will be scaled down. Job counts: count jobs 1 annual_incidence 1 [Tue Sep 24 15:03:37 2019] rule annual_incidence: input: data/preprocessed-weekly-incidence.csv output: data/annual-incidence.csv jobid: 0 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' [Tue Sep 24 15:03:37 2019] Finished job 0. 
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150337.607638.snakemake.log
#+end_example

Voyons le début du résultat:
#+begin_src sh :session *snakemake* :results output :exports both
head -10 data/annual-incidence.csv
#+end_src

#+RESULTS:
#+begin_example
"year","incidence"
1986,5100540
1987,2861556
1988,2766142
1989,5460155
1990,5233987
1991,1660832
1992,2576347
1993,2703708
1994,3515735
#+end_example

** 5ème tâche: l'histogramme
Et pour finir, encore un petit script en R:
#+begin_src :exports both :tangle incidence_syndrome_grippal/Snakefile
rule histogram:
    input:
        "data/annual-incidence.csv"
    output:
        "data/annual-incidence-histogram.png"
    script:
        "scripts/annual-incidence-histogram.R"
#+end_src

#+begin_src R :exports both :tangle incidence_syndrome_grippal/scripts/annual-incidence-histogram.R
# Read in the data
data = read.csv(snakemake@input[[1]])

# Plot the histogram
png(filename=snakemake@output[[1]])
hist(data$incidence,
     breaks=10,
     xlab="Annual incidence",
     ylab="Number of observations",
     main="")
dev.off()
#+end_src

#+begin_src sh :session *snakemake* :results output :exports both
snakemake histogram
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	histogram
	1

[Tue Sep 24 15:03:55 2019]
rule histogram:
    input: data/annual-incidence.csv
    output: data/annual-incidence-histogram.png
    jobid: 0

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
null device
          1
[Tue Sep 24 15:03:55 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150355.592895.snakemake.log
#+end_example

[[file:incidence_syndrome_grippal/data/annual-incidence-histogram.png]]

* Travailler avec un workflow
Jusqu'ici, j'ai lancé chaque tâche de mon workflow à la main, une par une. Avec le même effort, j'aurais pu lancer directement les divers scripts qui font le travail de fond. Autrement dit, =snakemake= ne m'a rien apporté, si ce n'est d'avoir sorti les noms des fichiers des scripts, qui deviennent ainsi un peu plus généraux, pour les transférer dans le grand script maître qu'est le =Snakefile=.

J'ai déjà évoqué un avantage du workflow: les tâches ne sont exécutées qu'en cas de besoin. Par exemple, la commande =snakemake plot= exécute le script =scripts/incidence-plots.R= seulement si l'une des conditions suivantes est satisfaite:
1. Un des deux fichiers =data/weekly-incidence-plot.png= et =data/weekly-incidence-plot-last-years.png= est absent.
2. Un des deux fichiers =data/weekly-incidence-plot.png= et =data/weekly-incidence-plot-last-years.png= a une date de modification antérieure à la date de modification du fichier d'entrée, =data/preprocessed-weekly-incidence.csv=.
Vérifions cela, en demandant en plus à =snakemake= d'expliquer son raisonnement avec l'option =-r=:
#+begin_src sh :session *snakemake* :results output :exports both
snakemake -r plot
#+end_src

#+RESULTS:
: Building DAG of jobs...
: Nothing to be done.
: Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150412.176683.snakemake.log

Maintenant les plots sont là et à jour. Je vais simuler la modification du fichier d'entrée avec la commande =touch= et relancer:
#+begin_src sh :session *snakemake* :results output :exports both
touch data/preprocessed-weekly-incidence.csv
snakemake -r plot
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	plot
	1

[Tue Sep 24 15:04:19 2019]
rule plot:
    input: data/preprocessed-weekly-incidence.csv
    output: data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png
    jobid: 0
    reason: Updated input files: data/preprocessed-weekly-incidence.csv

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
null device
          1
null device
          1
[Tue Sep 24 15:04:19 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150419.657205.snakemake.log
#+end_example

Attention, =snakemake= ne regarde que les fichiers listés sous "input", pas les fichiers listés sous "script". Autrement dit, la modification d'un script n'entraîne pas la ré-exécution de la tâche !
#+begin_src sh :session *snakemake* :results output :exports both
touch scripts/incidence-plots.R
snakemake -r plot
#+end_src

#+RESULTS:
: Building DAG of jobs...
: Nothing to be done.
: Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150429.536212.snakemake.log

Je considère que c'est un défaut de =snakemake=, car le script est une donnée d'entrée du calcul tout comme la séquence de chiffres à plotter. Une petite astuce permet de corriger ce défaut (à condition d'y penser chaque fois qu'on écrit une règle !): on peut rajouter le fichier script à la liste "input":
#+begin_src :exports both
rule plot:
    input:
        "data/preprocessed-weekly-incidence.csv",
        "scripts/incidence-plots.R"
    output:
        "data/weekly-incidence-plot.png",
        "data/weekly-incidence-plot-last-years.png"
    script:
        "scripts/incidence-plots.R"
#+end_src

On peut aussi demander à =snakemake= de lancer une tâche même si cela ne lui semble pas nécessaire, avec l'option =-f= (force):
#+begin_src sh :session *snakemake* :results output :exports both
snakemake -f plot
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	plot
	1

[Tue Sep 24 15:04:41 2019]
rule plot:
    input: data/preprocessed-weekly-incidence.csv
    output: data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png
    jobid: 0

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
null device
          1
null device
          1
[Tue Sep 24 15:04:41 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150441.339904.snakemake.log
#+end_example

Le plus souvent, ce qu'on veut, c'est une mise à jour de tous les résultats suite à une modification. La bonne façon d'y arriver est de rajouter une nouvelle règle, par convention appelée =all=, qui ne fait rien mais demande en entrée tous les fichiers créés par toutes les autres tâches :
#+begin_src :exports both :tangle incidence_syndrome_grippal/Snakefile
rule all:
    input:
        "data/weekly-incidence.csv",
        "data/preprocessed-weekly-incidence.csv",
        "data/weekly-incidence-plot.png",
        "data/weekly-incidence-plot-last-years.png",
        "data/annual-incidence.csv",
        "data/annual-incidence-histogram.png"
#+end_src

La mise à jour complète se fait alors avec
#+begin_src sh :session *snakemake* :results output :exports both
snakemake all
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	1	annual_incidence
	1	histogram
	3

[Tue Sep 24 15:04:52 2019]
rule annual_incidence:
    input: data/preprocessed-weekly-incidence.csv
    output: data/annual-incidence.csv
    jobid: 4

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
[Tue Sep 24 15:04:52 2019]
Finished job 4.
) done

[Tue Sep 24 15:04:52 2019]
rule histogram:
    input: data/annual-incidence.csv
    output: data/annual-incidence-histogram.png
    jobid: 5

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
null device
          1
[Tue Sep 24 15:04:52 2019]
Finished job 5.
) done

[Tue Sep 24 15:04:52 2019]
localrule all:
    input: data/weekly-incidence.csv, data/preprocessed-weekly-incidence.csv, data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png, data/annual-incidence.csv, data/annual-incidence-histogram.png
    jobid: 0

[Tue Sep 24 15:04:52 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150452.405932.snakemake.log
#+end_example

Les plus paresseux mettent la règle =all= au début du =Snakefile=, parce qu'en l'absence de tâche (ou de fichier) nommé sur la ligne de commande, =snakemake= utilise la première règle qu'il trouve; pour la mise à jour totale, il suffit alors de taper =snakemake=.

Pour redémarrer de zéro, donc exécuter toutes les tâches, on fait:
#+begin_src sh :session *snakemake* :results output :exports both
snakemake --forceall all
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	1	annual_incidence
	1	download
	1	histogram
	1	plot
	1	preprocess
	6

[Tue Sep 24 15:05:03 2019]
rule download:
    output: data/weekly-incidence.csv
    jobid: 1

--2019-09-24 15:05:03--  http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17
Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected.
HTTP request sent, awaiting response...
200 OK Length: unspecified [text/csv] Saving to: 'data/weekly-incidence.csv' ] 0 --.-KB/s data/weekly-incidence.c [ <=> ] 80.00K --.-KB/s in 0.02s 2019-09-24 15:05:04 (3.55 MB/s) - 'data/weekly-incidence.csv' saved [81916] [Tue Sep 24 15:05:04 2019] Finished job 1. ) done [Tue Sep 24 15:05:04 2019] rule preprocess: input: data/weekly-incidence.csv output: data/preprocessed-weekly-incidence.csv, data/errors-from-preprocessing.txt jobid: 2 [Tue Sep 24 15:05:04 2019] Finished job 2. ) done [Tue Sep 24 15:05:04 2019] rule annual_incidence: input: data/preprocessed-weekly-incidence.csv output: data/annual-incidence.csv jobid: 4 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' [Tue Sep 24 15:05:04 2019] Finished job 4. ) done [Tue Sep 24 15:05:04 2019] rule plot: input: data/preprocessed-weekly-incidence.csv output: data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png jobid: 3 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' null device 1 null device 1 [Tue Sep 24 15:05:04 2019] Finished job 3. ) done [Tue Sep 24 15:05:04 2019] rule histogram: input: data/annual-incidence.csv output: data/annual-incidence-histogram.png jobid: 5 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" null device 1 [Tue Sep 24 15:05:04 2019] Finished job 5. ) done [Tue Sep 24 15:05:04 2019] localrule all: input: data/weekly-incidence.csv, data/preprocessed-weekly-incidence.csv, data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png, data/annual-incidence.csv, data/annual-incidence-histogram.png jobid: 0 [Tue Sep 24 15:05:04 2019] Finished job 0. ) done Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150503.912061.snakemake.log #+end_example Comme =snakemake= gère bien toutes les dépendances entre les données, il peut même nous en faire un dessin, ce qui est fort utile quand les workflows augmentent en taille: #+begin_src sh :session *snakemake* :results output :exports both snakemake --forceall --dag all | dot -Tpng > graph.png #+end_src #+RESULTS: : Building DAG of jobs... [[file:incidence_syndrome_grippal/graph.png]] Pour comprendre cette ligne de commande, il faut savoir que =snakemake= produit ce graphe en exécutant les tâches. Voilà pourquoi il faut les arguments =--forceall all= pour être sûr que toutes les tâches seront exécutées. =dot= est un logiciel qui fait partie de la collection [[https://graphviz.org/][Graphviz]], son rôle est de traduire une description textuelle d'un graph en graphique. Le sigle "DAG" veut dire "Directed Acyclic Graph", graphe orienté acyclique. C'est un type de graphe qu'on trouve naturellement dans les descriptions formelles de dépendences parce que "acyclique" veut simplement dire qu'aucun fichier de données produit ne peut avoir soi-même comme dépendence, directement ou indirectement. 
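Petite remarque au passage: quand un workflow compte des centaines de tâches, le graphe complet devient vite illisible. Si votre version de =snakemake= propose l'option =--rulegraph= (à vérifier sur votre installation), on obtient un dessin plus compact, avec un nœud par règle plutôt qu'un nœud par tâche. Esquisse:
#+begin_src sh :eval never
# Variante plus compacte du graphe: un noeud par règle plutôt que par tâche.
snakemake --forceall --rulegraph all | dot -Tpng > rulegraph.png
#+end_src
Revenons maintenant au graphe complet produit ci-dessus.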
En regardant bien ce dessin, vous remarquez peut-être qu'il y a deux branches indépendantes. Une fois qu'on a fait "preprocess", on peut attaquer ou "plot" ou "annual_incidence" suivi de "histogram". Mais ça veut dire aussi qu'on peut exécuter ces deux branches en parallèle et gagner du temps, pourvu qu'on a un ordinateur avec au moins deux processeurs. En fait, =snakemake= s'en charge automatiquement si on lui indique combien de processeurs utiliser: #+begin_src sh :session *snakemake* :results output :exports both snakemake --cores 2 --forceall all #+end_src #+RESULTS: #+begin_example Building DAG of jobs... Using shell: /bin/bash Provided cores: 2 Rules claiming more threads will be scaled down. Job counts: count jobs 1 all 1 annual_incidence 1 download 1 histogram 1 plot 1 preprocess 6 [Tue Sep 24 15:05:25 2019] rule download: output: data/weekly-incidence.csv jobid: 1 --2019-09-24 15:05:25-- http://www.sentiweb.fr/datasets/incidence-PAY-3.csv Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17 Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/csv] Saving to: 'data/weekly-incidence.csv' ] 0 --.-KB/s data/weekly-incidence.c [ <=> ] 80.00K --.-KB/s in 0.02s 2019-09-24 15:05:25 (4.87 MB/s) - 'data/weekly-incidence.csv' saved [81916] [Tue Sep 24 15:05:25 2019] Finished job 1. ) done [Tue Sep 24 15:05:25 2019] rule preprocess: input: data/weekly-incidence.csv output: data/preprocessed-weekly-incidence.csv, data/errors-from-preprocessing.txt jobid: 2 [Tue Sep 24 15:05:25 2019] Finished job 2. ) done [Tue Sep 24 15:05:25 2019] rule plot: input: data/preprocessed-weekly-incidence.csv output: data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png jobid: 3 [Tue Sep 24 15:05:25 2019] rule annual_incidence: input: data/preprocessed-weekly-incidence.csv output: data/annual-incidence.csv jobid: 4 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' [Tue Sep 24 15:05:26 2019] Finished job 4. ) done [Tue Sep 24 15:05:26 2019] rule histogram: input: data/annual-incidence.csv output: data/annual-incidence-histogram.png jobid: 5 null device 1 null device 1 [Tue Sep 24 15:05:26 2019] Finished job 3. ) done During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" null device 1 [Tue Sep 24 15:05:26 2019] Finished job 5. ) done [Tue Sep 24 15:05:26 2019] localrule all: input: data/weekly-incidence.csv, data/preprocessed-weekly-incidence.csv, data/weekly-incidence-plot.png, data/weekly-incidence-plot-last-years.png, data/annual-incidence.csv, data/annual-incidence-histogram.png jobid: 0 [Tue Sep 24 15:05:26 2019] Finished job 0. 
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal/.snakemake/log/2019-09-24T150525.566515.snakemake.log
#+end_example

* Vers la gestion de données plus volumineuses
Le workflow que je viens de montrer produit 7 fichiers. Ce n'est pas beaucoup. On peut les nommer à la main, un par un, sans difficulté. Dans la vraie vie, par exemple en bioinformatique, un workflow peut facilement gérer des centaines ou des milliers de fichiers, par exemple un fichier par séquence d'acides aminés dans une étude de protéomique. Dans une telle situation, il faut définir un schéma pour nommer les fichiers de façon systématique, et introduire des boucles dans le workflow dont les itérations seront idéalement exécutées en parallèle.

Je vais illustrer ceci avec une variante de l'analyse de l'incidence du syndrome grippal. Elle utilise une forme plus détaillée des données brutes, dans laquelle les incidences sont répertoriées par région plutôt que pour la France entière. Il faut donc répéter le calcul de l'incidence annuelle 13 fois, une fois pour chaque région. Pour simplifier un peu, le résultat principal de ce nouveau workflow sera un fichier qui contient, pour chaque région, l'année dans laquelle l'incidence était la plus élevée. Il n'y a donc pas d'histogramme.

Pour cette deuxième version, je crée un nouveau répertoire, et j'y fais une copie des scripts qui seront réutilisés sans modification:
#+begin_src sh :session *snakemake2* :results output :exports both
# déjà fait: mkdir incidence_syndrome_grippal_par_region
cd incidence_syndrome_grippal_par_region
# déjà fait: mkdir data
# déjà fait: mkdir scripts
cp -r ../incidence_syndrome_grippal/scripts/preprocess.py ./scripts/
cp -r ../incidence_syndrome_grippal/scripts/annual-incidence.R ./scripts/
cp -r ../incidence_syndrome_grippal/scripts/incidence-plots.R ./scripts/
#+end_src

#+RESULTS:

Et puis je vais vous montrer le =Snakefile=, tout de suite en entier, que je vais commenter après.
#+begin_src :exports both :tangle incidence_syndrome_grippal_par_region/Snakefile
rule all:
    input:
        "data/peak-year-all-regions.txt"

rule download:
    output:
        "data/weekly-incidence-all-regions.csv"
    shell:
        "wget -O {output} http://www.sentiweb.fr/datasets/incidence-RDD-3.csv"

REGIONS = ["AUVERGNE-RHONE-ALPES",
           "BOURGOGNE-FRANCHE-COMTE",
           "BRETAGNE",
           "CENTRE-VAL-DE-LOIRE",
           "CORSE",
           "GRAND EST",
           "HAUTS-DE-FRANCE",
           "ILE-DE-FRANCE",
           "NORMANDIE",
           "NOUVELLE-AQUITAINE",
           "OCCITANIE",
           "PAYS-DE-LA-LOIRE",
           "PROVENCE-ALPES-COTE-D-AZUR"]

rule split_by_region:
    input:
        "data/weekly-incidence-all-regions.csv"
    output:
        expand("data/weekly-incidence-{region}.csv", region=REGIONS)
    script:
        "scripts/split-by-region.py"

rule preprocess:
    input:
        "data/weekly-incidence-{region}.csv"
    output:
        data="data/preprocessed-weekly-incidence-{region}.csv",
        errorlog="data/errors-from-preprocessing-{region}.txt"
    script:
        "scripts/preprocess.py"

rule plot:
    input:
        "data/preprocessed-weekly-incidence-{region}.csv"
    output:
        "data/weekly-incidence-plot-{region}.png",
        "data/weekly-incidence-plot-last-years-{region}.png"
    script:
        "scripts/incidence-plots.R"

rule annual_incidence:
    input:
        "data/preprocessed-weekly-incidence-{region}.csv"
    output:
        "data/annual-incidence-{region}.csv"
    script:
        "scripts/annual-incidence.R"

rule peak_years:
    input:
        expand("data/annual-incidence-{region}.csv", region=REGIONS)
    output:
        "data/peak-year-all-regions.txt"
    script:
        "scripts/peak-years.py"
#+end_src

Commençons en haut: j'ai mis la règle =all= au début pour pouvoir être paresseux à l'exécution: la simple commande =snakemake= déclenchera l'ensemble des calculs. Et ce que demande =all=, c'est simplement le fichier qui résume les années du pic maximal pour chaque région.

Dans la règle =download=, seul le nom du fichier de données a changé par rapport à avant. J'ai trouvé le nom du fichier "par région" sur le site Web du Réseau Sentinelles.

C'est après qu'il y a le plus grand changement: la définition d'une variable =REGIONS=, qui est une liste des 13 régions administratives, dont les noms sont écrits exactement comme dans le fichier des données. On devrait récupérer cette liste du fichier de façon automatique, et je montrerai plus tard comment faire. Pour l'instant, je préfère copier la liste manuellement dans le =Snakefile= afin de ne pas introduire trop de nouveautés d'un seul coup.

La variable =REGIONS= est utilisée immédiatement après, pour définir les fichiers de sortie de la règle =split_by_region=. La fonction =expand= produit une liste de noms de fichiers en insérant le nom de la région au bon endroit dans le modèle. Le rôle de la règle =split_by_region= est de découper les données téléchargées en un fichier par région, afin de pouvoir traiter les régions en parallèle et avec les mêmes scripts que nous avons déjà. Le script appliqué par la règle est assez simple:
#+begin_src python :exports both :tangle incidence_syndrome_grippal_par_region/scripts/split-by-region.py
import os

# Read the CSV file into memory
data = open(snakemake.input[0], 'rb').read()

# Decode the Latin-1 character set,
# remove white space at both ends,
# and split into lines.
lines = data.decode('latin-1') \ .strip() \ .split('\n') # Separate header from data table comment = lines[0] header = lines[1] table = [line.split(',') for line in lines[2:]] # Find all the regions mentioned in the table regions = set(record[-1] for record in table) # Write CSV files for each region for region in regions: filename = 'data/weekly-incidence-' + region + '.csv' with open(filename, 'w') as output_file: # The other scripts expect a comment in the first line, # so write a minimal one to make them happy. output_file.write('#\n') output_file.write(header) output_file.write('\n') for record in table: # Write only the records for right region if record[-1] == region: output_file.write(','.join(record)) output_file.write('\n') #+end_src Avant de continuer, voyons déjà ce que ça donne: #+begin_src sh :session *snakemake2* :results output :exports both snakemake split_by_region #+end_src #+RESULTS: #+begin_example Building DAG of jobs... Using shell: /bin/bash Provided cores: 1 Rules claiming more threads will be scaled down. Job counts: count jobs 1 download 1 split_by_region 2 [Tue Sep 24 15:11:23 2019] rule download: output: data/weekly-incidence-all-regions.csv jobid: 1 --2019-09-24 15:11:23-- http://www.sentiweb.fr/datasets/incidence-RDD-3.csv Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17 Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/csv] Saving to: 'data/weekly-incidence-all-regions.csv' 2019-09-24 15:11:23 (10.3 MB/s) - 'data/weekly-incidence-all-regions.csv' saved [1112021] [Tue Sep 24 15:11:23 2019] Finished job 1. ) done [Tue Sep 24 15:11:23 2019] rule split_by_region: input: data/weekly-incidence-all-regions.csv output: data/weekly-incidence-AUVERGNE-RHONE-ALPES.csv, data/weekly-incidence-BOURGOGNE-FRANCHE-COMTE.csv, data/weekly-incidence-BRETAGNE.csv, data/weekly-incidence-CENTRE-VAL-DE-LOIRE.csv, data/weekly-incidence-CORSE.csv, data/weekly-incidence-GRAND EST.csv, data/weekly-incidence-HAUTS-DE-FRANCE.csv, data/weekly-incidence-ILE-DE-FRANCE.csv, data/weekly-incidence-NORMANDIE.csv, data/weekly-incidence-NOUVELLE-AQUITAINE.csv, data/weekly-incidence-OCCITANIE.csv, data/weekly-incidence-PAYS-DE-LA-LOIRE.csv, data/weekly-incidence-PROVENCE-ALPES-COTE-D-AZUR.csv jobid: 0 [Tue Sep 24 15:11:23 2019] Finished job 0. ) done Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal_par_region/.snakemake/log/2019-09-24T151123.361735.snakemake.log #+end_example Et les fichiers sont bien là où il faut: #+begin_src sh :session *snakemake2* :results output :exports both ls data #+end_src #+RESULTS: #+begin_example weekly-incidence-AUVERGNE-RHONE-ALPES.csv weekly-incidence-BOURGOGNE-FRANCHE-COMTE.csv weekly-incidence-BRETAGNE.csv weekly-incidence-CENTRE-VAL-DE-LOIRE.csv weekly-incidence-CORSE.csv weekly-incidence-GRAND EST.csv weekly-incidence-HAUTS-DE-FRANCE.csv weekly-incidence-ILE-DE-FRANCE.csv weekly-incidence-NORMANDIE.csv weekly-incidence-NOUVELLE-AQUITAINE.csv weekly-incidence-OCCITANIE.csv weekly-incidence-PAYS-DE-LA-LOIRE.csv weekly-incidence-PROVENCE-ALPES-COTE-D-AZUR.csv weekly-incidence-all-regions.csv #+end_example Les trois règles suivantes, =preprocess=, =plot=, et =annual_incidence= sont presques les mêmes qu'avant. Ce qui a changé, c'est la partie =-{region}= dans les noms des fichiers. Il faut interpréter le mot entre les accolades ("region") comme un nom de variable. 
La règle =preprocess=, par exemple, peut produire tout fichier qui a la forme "data/preprocessed-weekly-incidence-{region}.csv" si on lui donne le fichier "data/weekly-incidence-{region}.csv" avec la même valeur pour ={region}=. Étant donné les fichiers que nous avons obtenus par =split_by_region=, nous pouvons donc demander à snakemake le fichier "data/preprocessed-weekly-incidence-CORSE.csv", et snakemake va appliquer la règle =preprocess= au fichier d'entrée "data/weekly-incidence-CORSE.csv" que nous avons déjà. Faisons-le:
#+begin_src sh :session *snakemake2* :results output :exports both
snakemake data/preprocessed-weekly-incidence-CORSE.csv
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	preprocess
	1

[Tue Sep 24 15:11:55 2019]
rule preprocess:
    input: data/weekly-incidence-CORSE.csv
    output: data/preprocessed-weekly-incidence-CORSE.csv, data/errors-from-preprocessing-CORSE.txt
    jobid: 0
    wildcards: region=CORSE

[Tue Sep 24 15:11:55 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal_par_region/.snakemake/log/2019-09-24T151155.206496.snakemake.log
#+end_example

#+begin_src sh :session *snakemake2* :results output :exports both
ls data/preprocessed*
#+end_src

#+RESULTS:
: data/preprocessed-weekly-incidence-CORSE.csv

Le script =preprocess.py= n'a d'ailleurs pas changé du tout. Un workflow permet donc de séparer la logistique de la gestion des données du code qui fait les calculs. Le même mécanisme permet de demander l'incidence annuelle pour la Corse:
#+begin_src sh :session *snakemake2* :results output :exports both
snakemake data/annual-incidence-CORSE.csv
#+end_src

#+RESULTS:
#+begin_example
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	annual_incidence
	1

[Tue Sep 24 15:12:03 2019]
rule annual_incidence:
    input: data/preprocessed-weekly-incidence-CORSE.csv
    output: data/annual-incidence-CORSE.csv
    jobid: 0
    wildcards: region=CORSE

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
[Tue Sep 24 15:12:04 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal_par_region/.snakemake/log/2019-09-24T151203.779223.snakemake.log
#+end_example

Snakemake nous dit d'ailleurs explicitement quelle règle a été appliquée (=annual_incidence=), avec quel fichier d'entrée (=data/preprocessed-weekly-incidence-CORSE.csv=), et avec quel fichier de sortie (=data/annual-incidence-CORSE.csv=).

À la fin du workflow, il y a une nouvelle règle, =peak_years=, qui extrait l'année du pic maximal de chaque fichier d'incidence annuelle, et produit un fichier résumant ces années par région. Sa seule particularité est la spécification des fichiers d'entrée, qui utilise la fonction =expand= exactement comme on l'a vu pour les fichiers résultats de la règle =split_by_region=.
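Au passage, pour fixer les idées sur =expand=: cette fonction ne fait que construire une liste de noms de fichiers par substitution dans un modèle. L'esquisse suivante, en Python pur et avec une liste volontairement raccourcie à trois régions, montre le principe:
#+begin_src python :eval never
# Esquisse du comportement de expand(): une simple substitution dans un modèle.
# (Dans un Snakefile, expand est fournie directement par snakemake.)
REGIONS = ["BRETAGNE", "CORSE", "OCCITANIE"]   # liste raccourcie pour l'exemple

modele = "data/annual-incidence-{region}.csv"
fichiers = [modele.format(region=r) for r in REGIONS]
print(fichiers)
# ['data/annual-incidence-BRETAGNE.csv',
#  'data/annual-incidence-CORSE.csv',
#  'data/annual-incidence-OCCITANIE.csv']
#+end_src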
Le script Python associé est assez simple: #+begin_src python :exports both :tangle incidence_syndrome_grippal_par_region/scripts/peak-years.py # Libraries used by this script: import csv # for reading CSV files import os # for path manipulations with open(snakemake.output[0], 'w') as result_file: for filename in snakemake.input: region = '-'.join(os.path.splitext(filename)[0].split('-')[2:]) with open(filename, 'r') as csv_file: csv_reader = csv.reader(csv_file) csv_reader.__next__() peak_year = None peak_incidence = 0 for year, incidence in csv_reader: incidence = int(incidence) if incidence > peak_incidence: peak_incidence = incidence peak_year = year result_file.write(region) result_file.write(', ') result_file.write(peak_year) result_file.write('\n') #+end_src Dans ce workflow, nous avons donc introduit une boucle sur les régions en jouant avec les noms des fichiers. Chaque fichier du workflow précédent a été remplacé par une version "régionalisée", avec le suffix =-{region}= dans le nom. Ce qui déclenche la boucle, c'est la fonction =expand= dans notre =Snakefile=. Le grand avantage d'une telle boucle, par rapport à une boucle standard en Python ou R, est la parallélisation automatique. Sur une machine avec suffisamment de processeurs, toutes les 13 régions seront traitées en même temps. Mon ordinateur portable n'a qu'un processeur à deux coeurs, donc =snakemake= traite seulement deux régions à la foi. Je vais maintenant lancer le calcul total - avec une petite précaution, l'option =-q= ("quiet") qui dit à snakemake d'être moins bavard: #+begin_src sh :session *snakemake2* :results output :exports both snakemake -q #+end_src #+RESULTS: #+begin_example Job counts: count jobs 1 all 12 annual_incidence 1 peak_years 12 preprocess 26 During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning 
messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' During startup - Warning messages: 1: Setting LC_COLLATE failed, using "C" 2: Setting LC_TIME failed, using "C" 3: Setting LC_MESSAGES failed, using "C" 4: Setting LC_MONETARY failed, using "C" Warning message: Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris' #+end_example En regardant bien le début du rapport que snakemake a fourni, on voit que =preprocess= et =annual_incidence= sont comptés 12 fois: une fois par région, moins la Corse que j'ai déjà traitée à la main. Une fois =all= et =peak_years=, ça a l'air bon. Et le résultat est là: #+begin_src sh :session *snakemake2* :results output :exports both cat data/peak-year-all-regions.txt #+end_src #+RESULTS: #+begin_example AUVERGNE-RHONE-ALPES, 2009 BOURGOGNE-FRANCHE-COMTE, 1986 BRETAGNE, 1996 CENTRE-VAL-DE-LOIRE, 1996 CORSE, 1989 GRAND EST, 2000 HAUTS-DE-FRANCE, 2013 ILE-DE-FRANCE, 1989 NORMANDIE, 1990 NOUVELLE-AQUITAINE, 1989 OCCITANIE, 2013 PAYS-DE-LA-LOIRE, 1989 PROVENCE-ALPES-COTE-D-AZUR, 1986 #+end_example Un dernier détail à noter: la règle =plot= est bien dans mon =Snakefile=, mais elle n'a jamais été appliquée, et il n'y a aucun plot. C'est simplement parce que la règle =all= ne réclame que la production du fichier =data/peak-year-all-regions.txt=. J'aurais pu rajouter les plots, mais je ne l'ai pas fait. Ceci ne m'empêche pas de les demander explicitement: #+begin_src sh :session *snakemake2* :results output :exports both snakemake data/weekly-incidence-plot-last-years-CORSE.png #+end_src #+RESULTS: #+begin_example Building DAG of jobs... Using shell: /bin/bash Provided cores: 1 Rules claiming more threads will be scaled down. 
Job counts:
	count	jobs
	1	plot
	1

[Tue Sep 24 15:12:46 2019]
rule plot:
    input: data/preprocessed-weekly-incidence-CORSE.csv
    output: data/weekly-incidence-plot-CORSE.png, data/weekly-incidence-plot-last-years-CORSE.png
    jobid: 0
    wildcards: region=CORSE

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
null device
          1
null device
          1
[Tue Sep 24 15:12:47 2019]
Finished job 0.
) done
Complete log: /home/hinsen/projects/RR_MOOC/repos-session02/mooc-rr-ressources/module6/ressources/incidence_syndrome_grippal_par_region/.snakemake/log/2019-09-24T151246.742101.snakemake.log
#+end_example

[[file:incidence_syndrome_grippal_par_region/data/weekly-incidence-plot-last-years-CORSE.png]]

Enfin, je vais tenter de produire le dessin du graphe des tâches, comme je l'ai fait avant pour un workflow nettement plus simple. Voyons...
#+begin_src sh :session *snakemake2* :results output :exports both
snakemake -q --forceall --dag all | dot -Tpng > graph.png
#+end_src

#+RESULTS:

[[file:incidence_syndrome_grippal_par_region/graph.png]]

On voit bien la structure du calcul, y compris le traitement des régions en parallèle.

* Cherchez l'erreur!
Il y a un point très important que j'ai laissé de côté jusqu'à maintenant pour me concentrer sur la partie technique: comment écrire et exécuter un workflow. Ce point est plutôt de nature méthodologique: il s'agit de la surveillance de possibles erreurs.

Dans les documents computationnels du module 3, nous avons exécuté de petits bouts de code à la main et regardé les résultats. Nous l'avons fait pour bien comprendre ce qui se passe, et pour décider comment procéder. Dans un workflow, tout est automatisé, et tous les résultats finissent dans des fichiers que nous pourrions regarder, mais que la plupart du temps nous ne regardons pas. Regarder un plot d'une incidence hebdomadaire sur quelques décennies, c'est intéressant. En regarder 13 qui se ressemblent devient une corvée. Mais si nous ne regardons jamais les résultats, nous pouvons facilement passer à côté d'erreurs insoupçonnées. Eh oui, ça arrive dans la vraie vie, même à des chercheurs expérimentés.

Il est donc primordial d'introduire des vérifications dans un workflow, et de faciliter l'inspection manuelle des possibles erreurs. Dans ma règle =preprocess=, j'ai écrit les messages d'erreur dans des fichiers spécifiques, sans les mélanger avec les résultats comme on le ferait dans un document computationnel. Alors regardons-les!
#+begin_src sh :session *snakemake2* :results output :exports both for file in data/error* do echo $file cat "$file" echo " " done #+end_src #+RESULTS: #+begin_example data/errors-from-preprocessing-AUVERGNE-RHONE-ALPES.txt Missing data in record ['198919', '3', '0', '', '', '0', '', '', '84', 'AUVERGNE-RHONE-ALPES'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 data/errors-from-preprocessing-BOURGOGNE-FRANCHE-COMTE.txt Missing data in record ['200632', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] Missing data in record ['200232', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] Missing data in record ['200227', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] Missing data in record ['200132', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] Missing data in record ['200131', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] Missing data in record ['198919', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] Missing data in record ['198752', '3', '0', '', '', '0', '', '', '27', 'BOURGOGNE-FRANCHE-COMTE'] 14 days, 0:00:00 between 1987-12-14 and 1987-12-28 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 21 days, 0:00:00 between 2001-07-23 and 2001-08-13 14 days, 0:00:00 between 2002-06-24 and 2002-07-08 14 days, 0:00:00 between 2002-07-29 and 2002-08-12 14 days, 0:00:00 between 2006-07-31 and 2006-08-14 data/errors-from-preprocessing-BRETAGNE.txt Missing data in record ['198919', '3', '0', '', '', '0', '', '', '53', 'BRETAGNE'] Missing data in record ['198752', '3', '0', '', '', '0', '', '', '53', 'BRETAGNE'] 14 days, 0:00:00 between 1987-12-14 and 1987-12-28 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 data/errors-from-preprocessing-CENTRE-VAL-DE-LOIRE.txt Missing data in record ['200729', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200728', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200628', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200330', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200329', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200227', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200137', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200136', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200135', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200134', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200133', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200132', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200131', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200130', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['200129', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199037', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199036', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199025', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199024', '3', '0', '', '', '0', '', '', '24', 
'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199023', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199022', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199013', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199012', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199005', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['199004', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] Missing data in record ['198919', '3', '0', '', '', '0', '', '', '24', 'CENTRE-VAL-DE-LOIRE'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 21 days, 0:00:00 between 1990-01-15 and 1990-02-05 21 days, 0:00:00 between 1990-03-12 and 1990-04-02 35 days, 0:00:00 between 1990-05-21 and 1990-06-25 21 days, 0:00:00 between 1990-08-27 and 1990-09-17 70 days, 0:00:00 between 2001-07-09 and 2001-09-17 14 days, 0:00:00 between 2002-06-24 and 2002-07-08 21 days, 0:00:00 between 2003-07-07 and 2003-07-28 14 days, 0:00:00 between 2006-07-03 and 2006-07-17 21 days, 0:00:00 between 2007-07-02 and 2007-07-23 data/errors-from-preprocessing-CORSE.txt Missing data in record ['200544', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200543', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200542', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200541', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200540', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200539', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200538', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200537', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200536', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200535', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200534', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200533', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200532', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200531', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200530', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200529', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200528', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200527', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200526', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200525', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200524', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200523', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200522', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200521', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200520', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200519', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200518', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200517', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200516', '3', '0', '', '', '0', '', '', '94', 'CORSE'] 
Missing data in record ['200515', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200514', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200513', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200512', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200511', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200510', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200509', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200508', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200507', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200506', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200505', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200504', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200503', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200502', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200501', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200453', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200452', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200451', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200450', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200449', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200448', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200447', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200446', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200445', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200444', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200443', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200442', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200441', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200440', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200439', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200438', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200437', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200436', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200435', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200434', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200433', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200432', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200431', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200430', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200429', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200428', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200427', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200426', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200425', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200424', 
'3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200423', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200422', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200421', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200420', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200419', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200418', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200417', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200416', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200415', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200414', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200413', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200412', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200411', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200410', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200409', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200401', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200352', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200344', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200343', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200342', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200341', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200340', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200339', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200338', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200337', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200336', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200335', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200334', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200333', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200332', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200331', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200330', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200329', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200328', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200327', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200326', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200325', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200324', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200323', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200322', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200321', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200320', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200319', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200318', '3', '0', '', '', '0', '', '', 
'94', 'CORSE'] Missing data in record ['200317', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200316', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200315', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200314', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200313', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200312', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200311', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200310', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200309', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200308', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200307', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200306', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200305', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200304', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200303', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200302', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200301', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200252', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200251', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200250', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200249', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200248', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200247', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200246', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200245', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200244', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200243', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200242', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200241', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200240', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200239', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200238', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200237', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200236', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200235', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200234', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200233', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200232', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200231', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200230', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200229', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200228', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200227', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200226', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in 
record ['200225', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200224', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200223', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200222', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200221', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200220', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200219', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200218', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200217', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200216', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200215', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200214', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200213', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200212', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200211', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200210', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200209', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200208', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200207', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200206', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200205', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200204', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200203', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200202', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200201', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200152', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200151', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200150', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200149', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200148', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200147', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200146', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200145', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200144', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200143', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200142', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200141', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200140', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200139', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200138', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200137', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200136', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200135', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200134', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200133', '3', '0', '', 
'', '0', '', '', '94', 'CORSE'] Missing data in record ['200132', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200131', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200130', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200129', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200128', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200127', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200126', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200125', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200124', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200123', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200122', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200121', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200120', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200119', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200118', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200117', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200116', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200115', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200114', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200113', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200112', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200111', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200110', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200109', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200108', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200107', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200106', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200105', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200104', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200103', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200102', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200101', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200052', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200051', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200050', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200049', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200048', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200047', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200046', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200045', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200044', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200043', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200042', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200041', '3', '0', '', '', '0', '', '', '94', 'CORSE'] 
Missing data in record ['200040', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200039', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200038', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200037', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200036', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200035', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200034', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200033', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200032', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200031', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200030', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200029', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200028', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200027', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200026', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200025', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200024', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200023', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200022', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200021', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200020', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200019', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200018', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200017', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200016', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200015', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200014', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200013', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200012', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200011', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200010', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200009', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200008', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200007', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200006', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200005', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200004', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200003', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200002', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['200001', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199952', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199951', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199950', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199949', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199948', 
'3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199947', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199946', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199945', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199944', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199943', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199942', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199941', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199940', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199939', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199938', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199937', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199936', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199935', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199934', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199933', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199932', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199931', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199930', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199929', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199928', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199927', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199926', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199925', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199924', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199923', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199922', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199921', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199920', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199919', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199918', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199917', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199916', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199915', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199914', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199913', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199912', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199911', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199910', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199909', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199908', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199907', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199906', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199905', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199904', '3', '0', '', '', '0', '', '', 
'94', 'CORSE'] Missing data in record ['199903', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199902', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199901', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199853', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199852', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199851', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199850', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199849', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199848', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199847', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199846', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199845', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199844', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199843', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199842', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199841', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199840', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199839', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199838', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199837', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199836', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199835', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199834', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199833', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199832', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199831', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199830', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199829', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199828', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199827', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199826', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199825', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199824', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199823', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199822', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199821', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199820', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199819', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199818', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199817', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199816', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199815', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199814', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199813', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in 
record ['199812', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199811', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199810', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199809', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199808', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199807', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199806', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199805', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199804', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199803', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199802', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199801', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199752', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199751', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199750', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199749', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199748', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199747', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199746', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199745', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199744', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199743', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199742', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199741', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199740', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199739', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199738', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199737', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199736', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199735', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199734', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199733', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199732', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199731', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199730', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199729', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199728', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199727', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199726', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199725', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199724', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199723', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199722', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199721', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199720', '3', '0', '', 
'', '0', '', '', '94', 'CORSE'] Missing data in record ['199719', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199718', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199717', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199716', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199715', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199714', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199713', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199712', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199711', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199710', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199709', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199708', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199707', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199706', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199705', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199704', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199646', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199645', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199644', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199643', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199642', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199641', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199640', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199639', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199638', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199637', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199636', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199635', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199634', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199633', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199632', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199631', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199630', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199629', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199628', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199627', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199626', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199625', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199624', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199623', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199622', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199621', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199620', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199619', '3', '0', '', '', '0', '', '', '94', 'CORSE'] 
Missing data in record ['199618', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199617', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199616', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199615', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199612', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199611', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199607', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199606', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199605', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199552', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199544', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199543', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199540', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199539', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199538', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199537', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199536', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199535', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199534', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199533', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199532', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199531', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199530', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199529', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199528', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199527', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199526', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199525', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199502', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199441', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199423', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199422', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199330', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199329', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199328', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199327', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199326', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199323', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199322', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199321', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199320', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199241', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199228', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199210', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199209', 
'3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199208', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199207', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199142', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199141', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199140', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199135', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199110', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199109', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199101', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199048', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199045', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199044', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199040', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199039', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199038', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199035', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199034', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199033', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199021', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199020', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['199019', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198943', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198935', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198934', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198933', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198927', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198919', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198918', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198836', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198816', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198814', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198752', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198751', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198748', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198747', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198746', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198745', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198744', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198743', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198739', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198738', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198737', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198736', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198735', '3', '0', '', '', '0', '', '', 
'94', 'CORSE'] Missing data in record ['198734', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198728', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198727', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198726', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198725', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198724', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198723', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198722', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198721', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198720', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198719', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198718', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198717', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198716', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198715', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198712', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198709', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198706', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198703', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198702', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198701', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198652', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198638', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198637', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198636', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198635', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198632', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198631', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198630', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198629', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198628', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198627', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198626', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198625', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198624', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198623', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198622', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198621', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198620', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198619', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198618', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198617', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198616', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198615', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in 
record ['198614', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198613', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198612', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198611', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198610', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198609', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198608', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198607', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198606', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198605', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198604', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198603', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198602', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198601', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198552', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198551', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198550', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198549', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198536', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198535', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198534', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198533', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198532', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198515', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198514', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198513', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198503', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198502', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198501', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198452', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198451', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198450', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198449', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198448', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198447', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198446', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198445', '3', '0', '', '', '0', '', '', '94', 'CORSE'] Missing data in record ['198444', '3', '0', '', '', '0', '', '', '94', 'CORSE'] 28 days, 0:00:00 between 1985-03-18 and 1985-04-15 42 days, 0:00:00 between 1985-07-29 and 1985-09-09 259 days, 0:00:00 between 1985-11-25 and 1986-08-11 35 days, 0:00:00 between 1986-08-18 and 1986-09-22 35 days, 0:00:00 between 1986-12-15 and 1987-01-19 14 days, 0:00:00 between 1987-01-26 and 1987-02-09 14 days, 0:00:00 between 1987-02-16 and 1987-03-02 14 days, 0:00:00 between 1987-03-09 and 1987-03-23 105 days, 0:00:00 between 1987-03-30 and 1987-07-13 49 days, 0:00:00 between 1987-08-10 and 1987-09-28 49 days, 0:00:00 
between 1987-10-12 and 1987-11-30 21 days, 0:00:00 between 1987-12-07 and 1987-12-28 14 days, 0:00:00 between 1988-03-28 and 1988-04-11 14 days, 0:00:00 between 1988-04-11 and 1988-04-25 14 days, 0:00:00 between 1988-08-29 and 1988-09-12 21 days, 0:00:00 between 1989-04-24 and 1989-05-15 14 days, 0:00:00 between 1989-06-26 and 1989-07-10 28 days, 0:00:00 between 1989-08-07 and 1989-09-04 14 days, 0:00:00 between 1989-10-16 and 1989-10-30 28 days, 0:00:00 between 1990-04-30 and 1990-05-28 28 days, 0:00:00 between 1990-08-06 and 1990-09-03 28 days, 0:00:00 between 1990-09-10 and 1990-10-08 21 days, 0:00:00 between 1990-10-22 and 1990-11-12 14 days, 0:00:00 between 1990-11-19 and 1990-12-03 14 days, 0:00:00 between 1990-12-24 and 1991-01-07 21 days, 0:00:00 between 1991-02-18 and 1991-03-11 14 days, 0:00:00 between 1991-08-19 and 1991-09-02 28 days, 0:00:00 between 1991-09-23 and 1991-10-21 35 days, 0:00:00 between 1992-02-03 and 1992-03-09 14 days, 0:00:00 between 1992-06-29 and 1992-07-13 14 days, 0:00:00 between 1992-09-28 and 1992-10-12 35 days, 0:00:00 between 1993-05-10 and 1993-06-14 42 days, 0:00:00 between 1993-06-21 and 1993-08-02 21 days, 0:00:00 between 1994-05-23 and 1994-06-13 14 days, 0:00:00 between 1994-10-03 and 1994-10-17 14 days, 0:00:00 between 1995-01-02 and 1995-01-16 119 days, 0:00:00 between 1995-06-12 and 1995-10-09 21 days, 0:00:00 between 1995-10-16 and 1995-11-06 14 days, 0:00:00 between 1995-12-18 and 1996-01-01 28 days, 0:00:00 between 1996-01-22 and 1996-02-19 21 days, 0:00:00 between 1996-03-04 and 1996-03-25 231 days, 0:00:00 between 1996-04-01 and 1996-11-18 2485 days, 0:00:00 between 1997-01-13 and 2003-11-03 21 days, 0:00:00 between 2003-12-15 and 2004-01-05 630 days, 0:00:00 between 2004-02-16 and 2005-11-07 data/errors-from-preprocessing-GRAND EST.txt Missing data in record ['200352', '3', '0', '', '', '0', '', '', '44', 'GRAND EST'] Missing data in record ['198919', '3', '0', '', '', '0', '', '', '44', 'GRAND EST'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 14 days, 0:00:00 between 2003-12-15 and 2003-12-29 data/errors-from-preprocessing-HAUTS-DE-FRANCE.txt Missing data in record ['200631', '3', '0', '', '', '0', '', '', '32', 'HAUTS-DE-FRANCE'] Missing data in record ['198919', '3', '0', '', '', '0', '', '', '32', 'HAUTS-DE-FRANCE'] Missing data in record ['198444', '3', '0', '', '', '0', '', '', '32', 'HAUTS-DE-FRANCE'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 14 days, 0:00:00 between 2006-07-24 and 2006-08-07 data/errors-from-preprocessing-ILE-DE-FRANCE.txt Missing data in record ['198919', '3', '0', '', '', '0', '', '', '11', 'ILE-DE-FRANCE'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 data/errors-from-preprocessing-NORMANDIE.txt Missing data in record ['198919', '3', '0', '', '', '0', '', '', '28', 'NORMANDIE'] Missing data in record ['198752', '3', '0', '', '', '0', '', '', '28', 'NORMANDIE'] 14 days, 0:00:00 between 1987-12-14 and 1987-12-28 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 data/errors-from-preprocessing-NOUVELLE-AQUITAINE.txt Missing data in record ['198919', '3', '0', '', '', '0', '', '', '75', 'NOUVELLE-AQUITAINE'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 data/errors-from-preprocessing-OCCITANIE.txt Missing data in record ['200352', '3', '0', '', '', '0', '', '', '76', 'OCCITANIE'] Missing data in record ['198919', '3', '0', '', '', '0', '', '', '76', 'OCCITANIE'] 14 days, 0:00:00 between 1989-05-01 and 1989-05-15 14 days, 0:00:00 between 2003-12-15 and 2003-12-29 
data/errors-from-preprocessing-PAYS-DE-LA-LOIRE.txt
Missing data in record ['200632', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200534', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200533', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200336', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200335', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200329', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200328', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200238', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['200237', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
Missing data in record ['198919', '3', '0', '', '', '0', '', '', '52', 'PAYS-DE-LA-LOIRE']
14 days, 0:00:00 between 1989-05-01 and 1989-05-15
21 days, 0:00:00 between 2002-09-02 and 2002-09-23
21 days, 0:00:00 between 2003-06-30 and 2003-07-21
21 days, 0:00:00 between 2003-08-18 and 2003-09-08
21 days, 0:00:00 between 2005-08-08 and 2005-08-29
14 days, 0:00:00 between 2006-07-31 and 2006-08-14

data/errors-from-preprocessing-PROVENCE-ALPES-COTE-D-AZUR.txt
Missing data in record ['198919', '3', '0', '', '', '0', '', '', '93', 'PROVENCE-ALPES-COTE-D-AZUR']
Missing data in record ['198444', '3', '0', '', '', '0', '', '', '93', 'PROVENCE-ALPES-COTE-D-AZUR']
14 days, 0:00:00 between 1989-05-01 and 1989-05-15
#+end_example

Without inspecting these messages, would you have suspected that so much data is missing for Corsica, for example? And if I had not taken care to make them easy to inspect, would you have done it anyway? One may also wonder how it is possible to have so many missing points in the regional data, but only a single missing point in the national data, which should, in theory, simply be the sum over the regions. But that is a question that only the Réseau Sentinelles can answer.

* The limits of file-driven workflows

Snakemake belongs to a large family of workflow managers whose common ancestor is the tool =make=, which dates back to 1976. The operating principle of this family is that the execution of tasks is driven by the names and modification dates of the files that contain the data. It is a simple principle that allows a lot of flexibility in its application, for example automatic parallelization. But this principle also has its limitations, and we have run into one of them: the need to provide an explicit list of the regions in our =Snakefile=. In principle, the downloaded data file contains the exhaustive list of the regions to be processed. One would therefore expect to be able to simply say "I want to apply my rules to all the regions referenced in this file". Of course, one also expects to have to supply some code to extract the list of regions from that file. But the problem is more fundamental. Snakemake uses the =Snakefile= and the data files it references to deduce which tasks have to be executed and in which order. The list of tasks therefore cannot depend on the /content/ of such a file, and even less on the content of a file that is not even available before the execution of the very first task, which is =download=!
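As a reminder of what this workaround looks like, here is a schematic version of the explicit region list. It is only an illustration, not the exact =Snakefile= of the previous workflow, and the =expand= target shown here is just one example of where such a list gets used:

#+begin_src :exports code
# Schematic illustration only (not the exact Snakefile used earlier):
# the region names form a hard-coded constant, even though the downloaded
# data file already contains the complete list.
REGIONS = ["CORSE", "BRETAGNE", "NORMANDIE"]   # ... and ten more, written by hand

rule all:
    input:
        expand("data/annual-incidence-{region}.csv", region=REGIONS)
#+end_src

Whenever the upstream data set adds or renames a region, this list has to be edited by hand, which is exactly the kind of manual step a workflow is supposed to eliminate.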
In other words, the fundamental limitation of Snakemake is that the dependency graph of the tasks is established before any execution. Recent versions of Snakemake offer a way to work around this limitation, and I am going to show it to you. The principle is to tell Snakemake to rebuild the task graph after some of the tasks have been executed.

I will first create a new directory for this third workflow, and copy over the scripts from the second one that will not be modified:

#+begin_src sh :session *snakemake3* :results output :exports both
# already done: mkdir incidence_syndrome_grippal_par_region_v2
cd incidence_syndrome_grippal_par_region_v2
# already done: mkdir data
# already done: mkdir scripts
cp -r ../incidence_syndrome_grippal_par_region/scripts/*.R ./scripts/
cp -r ../incidence_syndrome_grippal_par_region/scripts/preprocess.py ./scripts/
cp -r ../incidence_syndrome_grippal_par_region/scripts/peak-years.py ./scripts/
#+end_src

#+RESULTS:

The =Snakefile= starts with two unchanged rules:

#+begin_src :exports both :tangle incidence_syndrome_grippal_par_region_v2/Snakefile
rule all:
    input:
        "data/peak-year-all-regions.txt"

rule download:
    output:
        "data/weekly-incidence-all-regions.csv"
    shell:
        "wget -O {output} http://www.sentiweb.fr/datasets/incidence-RDD-3.csv"
#+end_src

The rule =split_by_region= becomes a "checkpoint", which means that Snakemake rebuilds its task graph /after/ it has been executed:

#+begin_src :exports both :tangle incidence_syndrome_grippal_par_region_v2/Snakefile
checkpoint split_by_region:
    input:
        "data/weekly-incidence-all-regions.csv"
    output:
        directory("data/weekly-incidence-by-region")
    script:
        "scripts/split-by-region.py"
#+end_src

The particularity of a checkpoint is that its output files are not known in advance. We therefore give only the name of a directory; it is the entire directory that is considered the result of the task. It is thus up to the script to create it:

#+begin_src python :exports both :tangle incidence_syndrome_grippal_par_region_v2/scripts/split-by-region.py
import os

# Read the CSV file into memory
data = open(snakemake.input[0], 'rb').read()

# Decode the Latin-1 character set,
# remove white space at both ends,
# and split into lines.
lines = data.decode('latin-1') \
            .strip() \
            .split('\n')

# Separate header from data table
comment = lines[0]
header = lines[1]
table = [line.split(',') for line in lines[2:]]

# Find all the regions mentioned in the table
regions = set(record[-1] for record in table)

# Create the output directory
directory = snakemake.output[0]
if not os.path.exists(directory):
    os.makedirs(directory)

# Write CSV files for each region
for region in regions:
    filename = os.path.join(directory, region + '.csv')
    with open(filename, 'w') as output_file:
        # The other scripts expect a comment in the first line,
        # so write a minimal one to make them happy.
        output_file.write('#\n')
        output_file.write(header)
        output_file.write('\n')
        for record in table:
            # Write only the records for the right region
            if record[-1] == region:
                output_file.write(','.join(record))
                output_file.write('\n')
#+end_src

This reorganization of the files requires a small modification of the inputs of the =preprocess= rule:

#+begin_src :exports both :tangle incidence_syndrome_grippal_par_region_v2/Snakefile
rule preprocess:
    input:
        "data/weekly-incidence-by-region/{region}.csv"
    output:
        data="data/preprocessed-weekly-incidence-{region}.csv",
        errorlog="data/errors-from-preprocessing-{region}.txt"
    script:
        "scripts/preprocess.py"
#+end_src

But nothing changes for the next two rules:

#+begin_src :exports both :tangle incidence_syndrome_grippal_par_region_v2/Snakefile
rule plot:
    input:
        "data/preprocessed-weekly-incidence-{region}.csv"
    output:
        "data/weekly-incidence-plot-{region}.png",
        "data/weekly-incidence-plot-last-years-{region}.png"
    script:
        "scripts/incidence-plots.R"

rule annual_incidence:
    input:
        "data/preprocessed-weekly-incidence-{region}.csv"
    output:
        "data/annual-incidence-{region}.csv"
    script:
        "scripts/annual-incidence.R"
#+end_src

Finally, it is the rule =peak_years= that has to change, because it must construct its list of input files from the outputs of the checkpoint =split_by_region=. This requires some code, but Snakemake allows Python functions to be defined in the =Snakefile=:

#+begin_src :exports both :tangle incidence_syndrome_grippal_par_region_v2/Snakefile
def annual_incidence_files(wildcards):
    directory = checkpoints.split_by_region.get().output[0]
    pattern = os.path.join(directory, "{region}.csv")
    return expand("data/annual-incidence-{region}.csv",
                  region=glob_wildcards(pattern).region)

rule peak_years:
    input:
        annual_incidence_files
    output:
        "data/peak-year-all-regions.txt"
    script:
        "scripts/peak-years.py"
#+end_src

This calls for a few explanations. Under "input", I have given the name of a function instead of a list of files. This function returns the list that Snakemake needs. To construct that list, it uses =expand=, which we have seen before. But the list of region names is no longer a constant defined in the =Snakefile=: it is the result of a search for files matching "{region}.csv" in the directory populated by the rule =split_by_region=.

We thus see that the price to pay for letting the workflow discover the list of regions itself is a more complex workflow. Whether that is really worthwhile has to be decided case by case.

To conclude, let us run this third workflow and check whether it gives the same results as the second one!

#+begin_src sh :session *snakemake3* :results output :exports both
snakemake -q
#+end_src

#+RESULTS:
#+begin_example
Job counts:
        count   jobs
        1       all
        1       download
        1       peak_years
        1       split_by_region
        4
--2019-09-24 15:55:01-- http://www.sentiweb.fr/datasets/incidence-RDD-3.csv
Resolving www.sentiweb.fr (www.sentiweb.fr)... 134.157.220.17
Connecting to www.sentiweb.fr (www.sentiweb.fr)|134.157.220.17|:80... connected.
HTTP request sent, awaiting response...
200 OK
Length: unspecified [text/csv]
Saving to: 'data/weekly-incidence-all-regions.csv'
2019-09-24 15:55:01 (8.35 MB/s) - 'data/weekly-incidence-all-regions.csv' saved [1112021]
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
Warning message:
Y-%m-%d", tz = "GMT") : unknown timezone 'zone/tz/2019b.1.0/zoneinfo/Europe/Paris'
#+end_example

#+begin_src sh :session *snakemake3* :results output :exports both
cat data/peak-year-all-regions.txt
#+end_src

#+RESULTS:
#+begin_example
NORMANDIE, 1990
BRETAGNE, 1996
GRAND EST, 2000
NOUVELLE-AQUITAINE, 1989
OCCITANIE, 2013
CORSE, 1989
PAYS-DE-LA-LOIRE, 1989
HAUTS-DE-FRANCE, 2013
BOURGOGNE-FRANCHE-COMTE, 1986
AUVERGNE-RHONE-ALPES, 2009
CENTRE-VAL-DE-LOIRE, 1996
ILE-DE-FRANCE, 1989
PROVENCE-ALPES-COTE-D-AZUR, 1986
#+end_example
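The peak years are there, but the order of the regions in this file simply reflects the order in which the region set was traversed, so a plain =diff= against the second workflow's result may report differences even when every value agrees. Here is a minimal sketch, assuming the second workflow's directory sits next to this one as set up earlier, that compares the two tables irrespective of order:

#+begin_src python :exports both
# Minimal comparison sketch, not part of the workflow: check that this
# third workflow reproduces the peak years computed by the second one.
# Assumption: the second workflow's directory is a sibling of this one.
def read_peak_years(filename):
    # Each line has the form "REGION, YEAR".
    peaks = {}
    with open(filename) as f:
        for line in f:
            if not line.strip():
                continue
            region, year = line.rsplit(',', 1)
            peaks[region.strip()] = int(year)
    return peaks

v2 = read_peak_years("../incidence_syndrome_grippal_par_region/data/peak-year-all-regions.txt")
v3 = read_peak_years("data/peak-year-all-regions.txt")
print("identical" if v2 == v3 else "different")
#+end_src

It prints "identical" when every region has the same peak year in both files.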