Towards reproducible computational biology
An introductory tutorial
This is an introductory tutorial on working reproducibly in bioinformatics, genomics and related fields of study. You will learn how to analyse some next-generation sequencing (NGS) data. The aim is not to perform the best possible analysis of this data or to use the best available tools, but to learn some of the tools available to us for making our analyses as reproducible as possible. The data you will be using is real research data, albeit down-sampled so that the analyses finish in a reasonable time.
Currently, Dr. Sebastian Schmeier is teaching this material at Massey University in Auckland, New Zealand.
A PDF version of this tutorial can be downloaded
- 1. Introduction
- 2. Working reproducibly
- 3. Tool and package management
- 4. Creating analysis workflows
- 4.1. What is a workflow management system?
- 4.2. What is Snakemake?
- 4.3. Setup
- 4.4. The analysis without a workflow management system
- 4.5. Using a workflow management system
- 4.6. Making your work available
- 5. Containerization
- 5.1. What is containerization?
- 5.2. What does it accomplish for us?
- 5.3. Using a Singularity container
- 5.4. Building your own Singularity container locally
- 5.5. Building a container on Singularity Hub
- 5.6. Using a container in our workflow
- 5.7. Ready made containers
- 5.8. Using one container for the whole workflow
- 5.9. Background reading on containers
- 6. Downloads