---
title: MOOC, Reproducible Research
author: Thomas Rushton
---
# MOOC: Reproducible Research
- [22/03/2024](#22032024)
- [26/03/2024](#26032024)
# 22/03/2024
I started this MOOC a few days ago (18/03) and this is my first day tackling
any of the practical exercises.
I've been using VCS for many years, mostly GitHub and a bit of self-hosted git,
and rarely collaboratively.
You *can* teach an old dog new tricks, and it's interesting to get some new
perspectives on otherwise familiar tools.
## Thoughts/Musings
- Some of the exercises in module 1 are posed quite ambiguously;
- I'm looking forward to getting to the part about org-mode;
- I've been meaning to cultivate a proper note-taking system/personal
knowledge-base for a couple of years, since I first became aware of
Obsidian;
- I really didn't know that scholars were using XML (well, TEI) to make
linguistic/historical/literary annotations, but, well, that makes sense;
- Markdown comments, containing delimited labels/tags — now, there's a
thought;
- Here follows a comment that looks ``
;
- [Apparently](https://stackoverflow.com/a/4829998) Pandoc prefers the
triple-hyphen;
- I thought markdown comments had to be some awful thing like
`[//] # (this is a comment)`;
- Anyway, here's an example: ``;
- And now there's a(n invisible) tag against which to search this document.
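
A tag like that only earns its keep if it's easy to search for. Here's a minimal sketch, assuming GNU or BSD `grep`; the file name `tagged-notes.md` is made up, and the `:mylabel:` tag and comment form are only placeholders:

```shell
# Create a hypothetical notes file with a tag hidden in an HTML-style comment.
printf '%s\n' '# Notes' '<!--- :mylabel: -->' 'Some text.' > tagged-notes.md

# Search all Markdown files below the current directory for the tag.
# -r recurses, -n prints line numbers, -F matches the tag literally.
grep -rnF --include='*.md' ':mylabel:' .
```

Each hit is reported as `file:line:content`, so the tag doubles as a crude index into the notes.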
## Interesting links
- [Sustainable Authorship in Plain Text using Pandoc and Markdown](https://programminghistorian.org/en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown)
I genuinely had no idea about Pandoc prior to this MOOC, and converting from
markdown to PDF is pretty cool.
Despite years working with markdown for READMEs and documentation, I didn't know
that one can add a YAML metadata block to the top of a markdown file.
I recently learnt about markdown footnotes, but I didn't know you could cite
entries in a .bib file.
Typora supports shorthand links, e.g. ``; I wonder whether
other platforms do too; let's give it a try:
- [The Measure of All Things](https://www.amazon.fr/Measure-All-Things-Seven-Year-Transformed/dp/0743216768)
Who would've thought that a book about white dudes on ships (presumably
measuring stuff) would be held in such high esteem?
- [Maintaining a laboratory notebook](https://colinpurrington.com/tips/lab-notebooks/)
Colin Purrington's guide is pretty rad.
Probably a bit too hard-science to be of relevance to my work, but you never
know.
If I can incorporate even 25% of the rigour that's described there into my own
practice, I'll be in a much better place than I have been to date.
- [Transcribing medieval manuscripts with TEI](https://andrewdunning.ca/transcribing-medieval-manuscripts-tei)
(See above.)
Chances are I'll never (need to) use TEI, but I'm into it.
## Some software I wasn't previously aware of
### Pandoc
Markdown to PDF? To HTML? With metadata-awareness? Yes please.
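
For future reference, a rough sketch of the metadata-plus-citations workflow. Everything named here (`example.md`, `refs.bib`, the `example2024` key and its entry) is invented for illustration, and `--citeproc` assumes Pandoc 2.11 or later:

```shell
# Write a small Markdown file with a YAML metadata block and one citation.
cat > example.md <<'EOF'
---
title: Example notes
author: A. N. Author
bibliography: refs.bib
---

Pandoc resolves citations such as [@example2024] against the .bib file.
EOF

# A matching, made-up BibTeX entry.
cat > refs.bib <<'EOF'
@misc{example2024,
  author = {Author, A. N.},
  title  = {An Example Entry},
  year   = {2024}
}
EOF

# Convert only if pandoc is installed; report (rather than abort) on failure.
if command -v pandoc >/dev/null 2>&1; then
  pandoc example.md --citeproc -o example.html ||
    echo "pandoc conversion failed (is it older than 2.11?)" >&2
fi
```

Swapping `-o example.html` for `-o example.pdf` should give a PDF, provided a LaTeX engine is installed for Pandoc to call.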
One issue... Pandoc doesn't appear to like display-math blocks containing
nested environments, e.g.
```markdown
$$
\begin{align}
e^{i\pi} = -1
\end{align}
$$
```
GitHub doesn't appear to have a problem with that sort of action.
What about GitLab?..
$$
\begin{align}
e^{i\pi} = -1
\end{align}
$$
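
A possible explanation, assuming Pandoc's PDF route goes through LaTeX (its default): `align` is a top-level amsmath environment and LaTeX rejects it inside display math, whereas `aligned` is designed to sit inside `$$ ... $$`. I haven't confirmed that this is the cause here, but the following variant may sidestep the problem:

```latex
$$
\begin{aligned}
e^{i\pi} = -1
\end{aligned}
$$
```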
### DocFetcher
Not sure I'll use it, but then again there have been various times when I've had
to resort to trying `find` or `locate` at the (Mac) command line, and it's been
painful.
### ExifTool
Perhaps I *should* be adding metadata to my images and audio files.
```shell
# The tag name is a choice: Comment is used here; other writable tags work the same way.
exiftool -Comment=":mylabel:" img.jpg
```
Worth remembering that, in addition to EXIF,
[XMP](https://en.wikipedia.org/wiki/Extensible_Metadata_Platform) exists.
# 26/03/2024
Working my way through Module 2.
## Reproducibility Problems
- [Reinhart & Rogoff](https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt):
Growth in a Time of Debt
In short, the basis for the economic orthodoxy with regard to _austerity_ in the
wake of the 2008/9 financial crisis.
Ultimately based on the unsubstantiated assertion that national debt exceeding
90% of GDP has "[dramatic consequences for growth]".
By the time their dubious data-handling and slipshod statistical practice had
been discovered, their conclusions were already in the hands of conservative
economic policymakers.
- Chang et al. and the database column-swap
[debacle](https://people.ligo-wa.caltech.edu/~michael.landry/calibration/S5/getsignright.pdf)
Papers had to be retracted.
Sure, methodological problems, but driven by _sociological/cultural_ ones;
high productivist pressure — _publish or perish_.
The real problem is the risk of a lack of rigour and transparency...
## Why is reproducibility difficult?
- Lack of info leading to inability to replicate decisions made by original
researchers.
- Profusion of errors caused by the use of computers;
  - computers permit us to go further and faster, but also to make errors more
    readily and rapidly;
  - there's also the black-box effect of proprietary software, and the daftness
    of opinionated design decisions being confused for helpful ones, e.g.
    "MARCH1" and "2310009E13" being interpreted by Excel as a date and a very
    large number respectively.
- Lack of rigour and organisation;
  - no VCS, manual file-naming conventions, etc.;
  - no code-review or continuous integration.
- And, as ever, cultural/social issues;
  - an article can be (uncharitably) described as an advert for the _real_ work
    of research and result-gathering, but why?;
  - well, perhaps we feel like we could have been more rigorous, so we take a
    few liberties with documenting our work, we get selective with our results,
    etc.;
  - ultimately we don't wish to suffer embarrassment or humiliation, or (worse
    still!) miss an opportunity to publish;
  - the irony being that we'd be better-placed to publish if we were open,
    transparent, etc.; but we're far from alone in all this.
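
On the "2310009E13" point above, the mechanism is easy to reproduce outside Excel. In this small sketch `awk` stands in for a spreadsheet's number parser (an assumption for illustration; it is not Excel, and it won't reproduce the date conversion):

```shell
# Kept as a string, the identifier survives untouched.
awk 'BEGIN { id = "2310009E13"; print id }'

# Adding 0 forces a numeric reading, so the "E" is taken as an exponent.
awk 'BEGIN { id = "2310009E13"; printf "%.0f\n", id + 0 }'
```

The first command prints `2310009E13`; the second prints `23100090000000000000`, i.e. 2310009 × 10^13. The safe habit is to tell the tool explicitly that such columns are text.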