---
title: MOOC, Reproducible Research
author: Thomas Rushton
---

# MOOC: Reproducible Research

- [22/03/2024](#22/03/2024)
- [26/03/2024](#26/03/2024)

# 22/03/2024

I started this MOOC a few days ago (18/03) and this is my first day tackling any of the practical exercises. I've been using VCS for many years, mostly GitHub and a bit of self-hosted git, and rarely collaboratively. You *can* teach an old dog new tricks, and it's interesting to get some new perspectives on otherwise familiar tools.

## Thoughts/Musings

- Some of the exercises in module 1 are posed quite ambiguously;
- I'm looking forward to getting to the part about org-mode;
- I've been meaning to cultivate a proper note-taking system/personal knowledge-base for a couple of years, since I first became aware of Obsidian;
- I really didn't know that scholars were using XML (well, TEI) to make linguistic/historical/literary annotations, but, well, that makes sense;
- Markdown comments containing delimited labels/tags: now, there's a thought;
  - Here follows a comment that looks `` ;
  - [Apparently](https://stackoverflow.com/a/4829998) Pandoc prefers the triple-hyphen;
  - I thought markdown comments had to be some awful thing like `[//]: # (this is a comment)`;
  - Anyway, here's an example: ``;
  - And now there's a(n invisible) tag against which to search this document.

## Interesting links

- [Sustainable Authorship in Plain Text using Pandoc and Markdown](https://programminghistorian.org/en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown)

  I genuinely had no idea about Pandoc prior to this MOOC, and converting from markdown to PDF is pretty cool. Despite years working with markdown for READMEs and documentation, I didn't know that one can add a YAML metadata block to the top of a markdown file. I recently learnt about markdown footnotes, but I didn't know you could cite entries in a .bib file.

  Typora supports shorthand links, e.g.
  ``; I wonder whether other platforms do too; let's give it a try:
- [The Measure of All Things](https://www.amazon.fr/Measure-All-Things-Seven-Year-Transformed/dp/0743216768)

  Who would've thought that a book about white dudes on ships (presumably measuring stuff) would be held in such high esteem?
- [Maintaining a laboratory notebook](https://colinpurrington.com/tips/lab-notebooks/)

  Colin Purrington's guide is pretty rad. Probably a bit too hard-science to be of relevance to my work, but you never know. If I can incorporate even 25% of the rigour that's described there into my own practice, I'll be in a much better place than I have been to date.
- [Transcribing medieval manuscripts with TEI](https://andrewdunning.ca/transcribing-medieval-manuscripts-tei)

  (See above.) Chances are I'll never (need to) use TEI, but I'm into it.

## Some software I wasn't previously aware of

### Pandoc

Markdown to PDF? To HTML? With metadata-awareness? Yes please.

One issue... Pandoc doesn't appear to like display-math blocks containing nested environments, e.g.

```markdown
$$
\begin{align}
e^{i\pi} = -1
\end{align}
$$
```

GitHub doesn't appear to have a problem with that sort of action. What about GitLab?

$$
\begin{align}
e^{i\pi} = -1
\end{align}
$$

### DocFetcher

Not sure I'll use it, but then again there have been various times when I've had to resort to trying `find` or `locate` at the (Mac) command line, and it's been painful.

### ExifTool

Perhaps I *should* be adding metadata to my images and audio files.

```shell
# Shorthand in my notes was -[comment|notes]: pick one tag name (Comment shown here)
exiftool -Comment=":mylabel:" img.jpg
```

Worth remembering that, in addition to EXIF, [XMP](https://en.wikipedia.org/wiki/Extensible_Metadata_Platform) exists.

# 26/03/2024

Working my way through Module 2.

## Reproducibility Problems

- [Reinhart & Rogoff](https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt): Growth in a Time of Debt

  In short, the basis for the economic orthodoxy with regard to _austerity_ in the wake of the 2008/9 financial crisis.
  Ultimately based on the unsubstantiable assertion that national debt exceeding 90% of GDP has "[dramatic consequences for growth]". By the time their dubious data-handling and slipshod statistical practice had been discovered, their conclusions were already in the hands of conservative economic policymakers.
- Chang et al. and the database column-swap [debacle](https://people.ligo-wa.caltech.edu/~michael.landry/calibration/S5/getsignright.pdf)

  Papers had to be retracted.

Sure, methodological problems, but driven by _sociological/cultural_ ones: high productivist pressure, i.e. _publish or perish_. The real problem is the risk of a lack of rigour and transparency...

## Why is reproducibility difficult?

- Lack of info leading to inability to replicate decisions made by original researchers.
- Profusion of errors caused by the use of computers;
  - computers permit us to go further and faster, but also to make errors more readily and rapidly;
  - there's also the black-box effect of proprietary software, and the daftness of opinionated design decisions being confused for helpful ones, e.g. "MARCH1" and "2310009E13" being interpreted by Excel as a date and a very large number respectively.
- Lack of rigour and organisation;
  - no VCS, manual file-naming conventions, etc.;
  - no code-review or continuous integration.
- And, as ever, cultural/social issues;
  - an article can be (uncharitably) described as an advert for the _real_ work of research and result-gathering, but why?
  - well, perhaps we feel like we could have been more rigorous, so we take a few liberties with documenting our work, we get selective with our results, etc.;
  - ultimately we don't wish to suffer embarrassment or humiliation, or (worse still!) miss an opportunity to publish;
  - the irony being that we'd be better-placed to publish if we were open, transparent, etc.; but we're far from alone in all this.
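
To make the Excel point above a bit more concrete, here's a rough Python sketch of what a "helpful" type-guessing importer does to those two identifiers. This is not Excel's actual logic: `guess_type` and `keep_text` are hypothetical helpers of my own, and the fixed year 2024 is an assumption added purely so that the date parse is unambiguous.

```python
from datetime import datetime


def guess_type(cell: str):
    """Hypothetical 'helpful' importer: try a number, then a date, else keep the text."""
    try:
        # "2310009E13" is valid scientific notation, so it parses as a float.
        return float(cell)
    except ValueError:
        pass
    try:
        # "MARCH1" reads as month name + day; the year 2024 is an arbitrary assumption.
        return datetime.strptime(f"{cell} 2024", "%B%d %Y").date()
    except ValueError:
        # Nothing matched, so the original string survives.
        return cell


def keep_text(cell: str) -> str:
    """The boring, reproducible alternative: identifiers stay as strings."""
    return cell


for identifier in ["MARCH1", "2310009E13", "TP53"]:
    print(f"{identifier!r}: guessed={guess_type(identifier)!r}, kept={keep_text(identifier)!r}")
```

The guessing version silently changes the first two identifiers and leaves the third alone, which is exactly what makes this kind of error hard to spot after the fact.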