Getting information about your Git repository

When taking notes, it may be difficult to remember which version of the code or of a file was used. This is exactly what version control is for. Here are a few commands that we typically insert at the top of our notebooks, in shell cells:

git log -1
commit 741b0088af5b40588493c23c46d6bab5d0adeb33
Author: Arnaud Legrand <arnaud.legrand@imag.fr>
Date:   Tue Sep 4 12:45:43 2018 +0200

    Fix a few typos and provide information on jupyter-git plugins.

git status -u
On branch master
Your branch is ahead of 'origin/master' by 4 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   resources.org

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	../../module2/ressources/replicable_article/IEEEtran.bst
	../../module2/ressources/replicable_article/IEEEtran.cls
	../../module2/ressources/replicable_article/article.bbl
	../../module2/ressources/replicable_article/article.tex
	../../module2/ressources/replicable_article/data.csv
	../../module2/ressources/replicable_article/figure.pdf
	../../module2/ressources/replicable_article/logo.png
	.#resources.org

no changes added to commit (use "git add" and/or "git commit -a")

Note: the -u option tells git to also list the individual files inside new directories it did not previously know about.
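
The same information can also be collected from a Python cell, which makes it easy to store next to the results. The following is a minimal sketch, assuming git is available on the PATH; the helper name git_snapshot and the layout of the returned dictionary are illustrative choices, not part of any library.

```python
import subprocess


def git_snapshot(repo_dir="."):
    """Return the current commit hash and the modified/untracked files.

    Hypothetical helper: commit is None when git is unavailable or when
    repo_dir is not inside a repository with at least one commit.
    """
    def run(*args):
        result = subprocess.run(
            ["git", *args], cwd=repo_dir, check=True,
            capture_output=True, text=True,
        )
        return result.stdout

    try:
        commit = run("rev-parse", "HEAD").strip()
        # --porcelain gives a stable, script-friendly format;
        # --untracked-files=all lists files inside new directories (like -u).
        status = run("status", "--porcelain", "--untracked-files=all")
    except (OSError, subprocess.CalledProcessError):
        return {"commit": None, "changes": []}
    return {"commit": commit, "changes": status.splitlines()}


snapshot = git_snapshot()
print("commit:", snapshot["commit"])
print("uncommitted or untracked files:", len(snapshot["changes"]))
```

A non-empty list of changes is a hint that the results in the notebook may not correspond exactly to the commit that is printed.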

Then, I often include commands at the end of my notebook indicating how to commit the results (adding the new files, committing with a clear message and pushing). E.g.,

git add resources.org
git commit -m "Completing the section on getting Git information"
git push
[master 514fe2c1 ] Completing the section on getting Git information
 1 file changed, 61 insertions(+)
Counting objects: 25, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (25/25), 7.31 KiB | 499.00 KiB/s, done.
Total 25 (delta 11), reused 0 (delta 0)
To ssh://app-learninglab.inria.fr:9418/learning-lab/mooc-rr-ressources.git
   6359f8c..1f8a567  master -> master

Obviously, in this case you need to save the notebook before running this cell, so the output of this final command (with the new git hash) will not be stored in the committed version of the cell. This is not really a problem and is the price to pay for running git from within the notebook itself.

Getting information about Python(3) libraries

Getting the list of installed packages and their versions

This topic is discussed on StackOverflow. When using pip (the Python package installer) in a shell command, it is easy to query the versions of all installed packages. Note that on your system you may have to use either pip or pip3, depending on how it is named and which versions of Python are available on your machine.

pip3 freeze
asn1crypto==0.24.0
attrs==17.4.0
bcrypt==3.1.4
beautifulsoup4==4.6.0
bleach==2.1.3
...
pandas==0.22.0
pandocfilters==1.4.2
paramiko==2.4.0
patsy==0.5.0
pexpect==4.2.1
...
traitlets==4.3.2
tzlocal==1.5.1
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5
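
A similar list can be built from within Python itself, without calling pip. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; the function name installed_packages is only illustrative, and the result may differ slightly from what pip3 freeze prints (for instance for packages installed in editable mode).

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages():
    """Return sorted 'name==version' strings for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in installed_packages():
    print(line)
```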

Once you know which packages are installed, you can easily get additional information about a given package and in particular check whether it was installed "locally" through pip or whether it is installed system-wide. Again, in a shell command:

pip3 show pandas
echo "            "
pip3 show statsmodels
Name: pandas
Version: 0.22.0
Summary: Powerful data structures for data analysis, time series,and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /usr/lib/python3/dist-packages
Requires: 
            
Name: statsmodels
Version: 0.9.0
Summary: Statistical computations and models for Python
Home-page: http://www.statsmodels.org/
Author: None
Author-email: None
License: BSD License
Location: /home/alegrand/.local/lib/python3.6/site-packages
Requires: patsy, pandas
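
The Location field is what tells a system-wide installation from a "local" one. As a rough sketch (assuming Python 3.8 or later), the same check can be done in Python by comparing the package location with the per-user site-packages directory. The helper describe_package and its user_install flag are hypothetical names, and the comparison is only a heuristic: it does not distinguish virtual environments from system directories.

```python
import site
from importlib import metadata
from pathlib import Path


def describe_package(name):
    """Return version, location and install scope of a package, or None."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    location = Path(dist.locate_file("")).resolve()
    user_site = Path(site.getusersitepackages()).resolve()
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        "location": str(location),
        # True when the package lives in the per-user site-packages directory
        "user_install": location == user_site,
        "requires": dist.requires or [],
    }


print(describe_package("pandas"))  # None if pandas is not installed
```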

How to list imported modules?

Without resorting to pip (which lists all available packages), you may want to know which modules are loaded in a Python session, as well as their versions. Inspired by StackOverflow, here is a simple function that lists the loaded packages (those that have a __version__ attribute, which is unfortunately not completely standard).

def print_imported_modules():
    """Print the name and version of every loaded module that has __version__."""
    import sys
    # sys.modules maps the names of already imported modules to the modules
    for name, val in sorted(sys.modules.items()):
        if hasattr(val, '__version__'):
            print(val.__name__, val.__version__)

print("**** Package list in the beginning ****")
print_imported_modules()
print("**** Package list after loading pandas ****")
import pandas
print_imported_modules()

**** Package list in the beginning ****
**** Package list after loading pandas ****
_csv 1.0
_ctypes 1.1.0
decimal 1.70
argparse 1.1
csv 1.0
ctypes 1.1.0
cycler 0.10.0
dateutil 2.7.3
decimal 1.70
distutils 3.6.5rc1
ipaddress 1.0
json 2.0.9
logging 0.5.1.2
matplotlib 2.1.1
numpy 1.14.5
numpy.core 1.14.5
numpy.core.multiarray 3.1
numpy.core.umath b'0.4.0'
numpy.lib 1.14.5
numpy.linalg._umath_linalg b'0.1.5'
pandas 0.22.0
_libjson 1.33
platform 1.0.8
pyparsing 2.2.0
pytz 2018.5
re 2.2.1
six 1.11.0
urllib.request 3.6
zlib 1.0
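
Since the __version__ attribute is not standard, another option is to ask the packaging metadata which installed distribution provides each imported module. The following is a sketch under the assumption that Python 3.10 or later is used, where importlib.metadata.packages_distributions() is available; the function name imported_distribution_versions is illustrative. Standard library modules belong to no distribution, so they do not show up in the result.

```python
import sys
from importlib import metadata


def imported_distribution_versions():
    """Map each distribution backing an imported module to its version."""
    # packages_distributions() exists from Python 3.10 on; on older
    # interpreters this sketch simply returns an empty result.
    lookup = getattr(metadata, "packages_distributions", None)
    if lookup is None:
        return {}
    module_to_dists = lookup()
    # Only top-level names (e.g. "numpy" for "numpy.linalg") are registered.
    top_level = {name.partition(".")[0] for name in list(sys.modules)}
    versions = {}
    for module in top_level:
        for dist_name in module_to_dists.get(module, []):
            try:
                versions[dist_name] = metadata.version(dist_name)
            except metadata.PackageNotFoundError:
                pass  # metadata missing for this distribution
    return versions


for name, version in sorted(imported_distribution_versions().items()):
    print(name, version)
```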

Setting up an environment with pip

The easiest way to go is as follows:

pip3 freeze > requirements.txt # to obtain the list of packages with their versions
pip3 install -r requirements.txt # to install the previous list of packages, possibly on another machine
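
Before rerunning a notebook, it can be useful to check whether the current environment still matches such a requirements.txt. Here is a minimal sketch, assuming Python 3.8 or later and a file that only contains exact name==version pins as produced by pip3 freeze; the function name check_requirements and the example package name are hypothetical.

```python
from importlib import metadata


def check_requirements(lines):
    """Compare 'name==version' pins against the running environment.

    Returns {name: (pinned_version, installed_version_or_None)} for every
    pin that is missing or installed at a different version.
    """
    mismatches = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if "==" not in line:
            continue  # this sketch only understands exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example with in-memory lines; in practice pass open("requirements.txt").
example = ["surely-not-installed-xyz==1.0  # hypothetical package", ""]
print(check_requirements(example))
```

An empty dictionary means that every pinned package is installed at the recorded version.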

If you want to have several installed Python environments, you may want to use Pipenv. I doubt it correctly tracks the Fortran or C dynamic libraries that Python packages wrap, though.

Getting information about R libraries

The best way seems to be to rely on the devtools package.

sessionInfo()
devtools::session_info()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.5.1
Session info ------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  fr_FR.UTF-8                 
 tz       Europe/Paris                
 date     2018-08-01                  

Packages ----------------------------------------------------------------------
 package   * version date       source        
 base      * 3.5.1   2018-07-02 local         
 compiler    3.5.1   2018-07-02 local         
 datasets  * 3.5.1   2018-07-02 local         
 devtools    1.13.6  2018-06-27 CRAN (R 3.5.1)
 digest      0.6.15  2018-01-28 CRAN (R 3.5.0)
 graphics  * 3.5.1   2018-07-02 local         
 grDevices * 3.5.1   2018-07-02 local         
 memoise     1.1.0   2017-04-21 CRAN (R 3.5.1)
 methods   * 3.5.1   2018-07-02 local         
 stats     * 3.5.1   2018-07-02 local         
 utils     * 3.5.1   2018-07-02 local         
 withr       2.1.2   2018-03-15 CRAN (R 3.5.0)

Some actually advocate writing a reproducible research compendium as an R package. Those of you who want clean R dependency management should also have a look at Packrat.

Finally, it is good to know that there is a built-in R command (installed.packages) that retrieves and lists the details of all installed packages.

head(installed.packages())
| Package | LibPath | Version | Priority | Depends | Imports | LinkingTo | Suggests | Enhances | License | License_is_FOSS | License_restricts_use | OS_type | MD5sum | NeedsCompilation | Built |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BH | /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5 | 1.66.0-1 | nil | nil | nil | nil | nil | nil | BSL-1.0 | nil | nil | nil | nil | no | 3.5.1 |
| Formula | /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5 | 1.2-3 | nil | R (>= 2.0.0), stats | nil | nil | nil | nil | GPL-2 GPL-3 | nil | nil | nil | nil | no | 3.5.1 |
| Hmisc | /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5 | 4.1-1 | nil | lattice, survival (>= 2.40-1), Formula, ggplot2 (>= 2.2) | methods, latticeExtra, cluster, rpart, nnet, acepack, foreign, gtable, grid, gridExtra, data.table, htmlTable (>= 1.11.0), viridis, htmltools, base64enc | nil | chron, rms, mice, tables, knitr, ff, ffbase, plotly (>= 4.5.6) | nil | GPL (>= 2) | nil | nil | nil | nil | yes | 3.5.1 |
| Matrix | /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5 | 1.2-14 | recommended | R (>= 3.2.0) | methods, graphics, grid, stats, utils, lattice | nil | expm, MASS | MatrixModels, graph, SparseM, sfsmisc | GPL (>= 2) file LICENCE | nil | nil | nil | nil | yes | 3.5.1 |
| StanHeaders | /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5 | 2.17.2 | nil | nil | nil | nil | RcppEigen, BH | nil | BSD_3_clause + file LICENSE | nil | nil | nil | nil | yes | 3.5.1 |
| acepack | /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5 | 1.4.1 | nil | nil | nil | nil | testthat | nil | MIT + file LICENSE | nil | nil | nil | nil | yes | 3.5.1 |