3. Tool and package management
Here, we will tackle the first item on the list towards more reproducibility:
Keeping track of the used tools and their versions.
3.1. Installing the Conda package manager
We will use the package and tool manager conda to install the programs that we need during the course. It is not installed by default, so we have to install it first.
# download latest conda installer
$ curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
# run the installer
$ bash Miniconda3-latest-Linux-x86_64.sh
# delete the installer after successful run
$ rm Miniconda3-latest-Linux-x86_64.sh
Should the conda installer download fail, please find links to alternative locations on the Downloads page.
Before we are able to use conda we need to tell our shell where it can find the program. We add the right path to the conda installation to our shell config files:
# $HOME is used instead of ~ because a tilde inside double quotes is not expanded
$ echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.bashrc
$ echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.zshrc
So what is actually happening here? We are appending a line to a file (either ~/.bashrc or ~/.zshrc).
When we start a new command-line shell, one of these files gets executed first (which one depends on whether you are using the bash or the zsh shell).
The line permanently puts the directory ~/miniconda3/bin first on your PATH variable.
The PATH variable contains the directories in which our computer looks for installed programs, one directory after the other, until the program you requested is found (or not, in which case the shell will complain).
By adding the above line we make sure that the program conda can be found any time we open a new shell.
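The PATH lookup described above can be tried out without touching the conda installation. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the program name demo_tool is made up for illustration.

```shell
# Create a throwaway directory holding a tiny executable (hypothetical name).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo_tool"\n' > "$demo_dir/demo_tool"
chmod +x "$demo_dir/demo_tool"

# Before the directory is on PATH, the shell cannot find the program.
before=$(command -v demo_tool || echo "not found")

# Prepend the directory, the same way the export line prepends ~/miniconda3/bin.
PATH="$demo_dir:$PATH"
export PATH
after=$(command -v demo_tool)

echo "before: $before"
echo "after:  $after"
```

Unlike the line written to the shell config file, this change only lasts for the current shell session.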
Close the shell/terminal and open a new one. Now we should be able to use the conda command:
$ conda update conda
3.1.2. Installing conda channels to make tools available
Different tools are packaged in what conda calls channels. We need to add some channels to make the bioinformatics and genomics tools available for installation. In particular, we need the Bioconda channel, which pre-packages many bioinformatics tools.
# Install some conda channels
# A channel is where conda looks for packages
$ conda config --add channels defaults
$ conda config --add channels conda-forge
$ conda config --add channels bioconda
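The order of the three commands matters, because conda config --add puts each new channel at the front of the channel list, so the channel added last gets the highest priority. The sketch below only models this prepend behaviour with a hypothetical helper, add_channel; it does not call conda itself and ignores details such as duplicate handling.

```shell
# Hypothetical helper: mimic how `conda config --add channels NAME`
# places NAME at the front of the channel list.
channels=""
add_channel() {
    if [ -z "$channels" ]; then
        channels="$1"
    else
        channels="$1 $channels"
    fi
}

# Same order as the commands above.
add_channel defaults
add_channel conda-forge
add_channel bioconda

# Highest priority first.
echo "$channels"
```

On a real installation, conda config --show channels should list the configured channels in their priority order.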
3.2. Using conda to search and install tools
Let us first look for a tool, e.g. the aligner BWA:
# Look for available tools/packages
$ conda search bwa
Loading channels: done
# Name                  Version           Build  Channel
bwa                       0.5.9               0  bioconda
bwa                       0.5.9               1  bioconda
bwa                       0.6.2               0  bioconda
bwa                       0.6.2               1  bioconda
bwa                      0.7.3a               0  bioconda
...
We can see that the tool is available and several versions can be installed.
To install software (here BWA) using conda, one uses the command conda install:
# install a tool into the environment
$ conda install bwa
# to install a particular version of a tool do
$ conda install bwa=0.6.2
Without a version number conda tries to install the latest version for you.
Conda was not originally developed for bioinformatics/genomics tools, but clever people took the system and packaged bioinformatics tools for it. To keep these apart from the original conda packages, related software is distributed through “channels”. We already made three channels available to our conda installation: defaults, conda-forge, and bioconda. The Bioconda channel is of particular importance to us, as it makes ~3000 bioinformatics packages available [GRUENING2017].
3.3. Create isolated environments
While having one software manager for all your bioinformatics tools is great already, conda has one particular strength that we are going to exploit often during the course of this tutorial. Conda can create isolated environments for sets of user-defined tools. The tools and their version numbers within environments, once created, can be easily saved in a file. Using these files we can easily re-create an environment from scratch with the same tool-set with the same version numbers. Awesome!
# Create a base environment
$ conda create -n tutorial python=3
# Activate the environment
$ conda activate tutorial
So what happens when you type conda activate tutorial in a shell?
The PATH variable (mentioned above) gets temporarily manipulated and set to:
$ conda activate tutorial
# Let's look at the content of the PATH variable
(tutorial) $ echo $PATH
~/miniconda3/envs/tutorial/bin:~/miniconda3/bin:/usr/local/bin: ...
Now the shell will look first in your environment’s bin directory (~/miniconda3/envs/tutorial/bin) and only afterwards in the general conda bin directory (~/miniconda3/bin).
So basically everything you install generally with conda install (without being in an environment) is also available to you, but it gets overshadowed if a program of the same name is in ~/miniconda3/envs/tutorial/bin and you have activated the tutorial environment.
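The overshadowing described above can be reproduced with two throwaway directories that stand in for the general bin directory and the environment's bin directory. This is a minimal sketch, assuming a POSIX shell with mktemp; the program name mytool is hypothetical, and the PATH assignments only imitate what activation and deactivation do.

```shell
# Two throwaway directories stand in for the general bin and the env bin.
base_bin=$(mktemp -d)
env_bin=$(mktemp -d)

# The same (hypothetical) program name exists in both and reports its origin.
printf '#!/bin/sh\necho base\n' > "$base_bin/mytool"
printf '#!/bin/sh\necho env\n'  > "$env_bin/mytool"
chmod +x "$base_bin/mytool" "$env_bin/mytool"

old_path=$PATH

# "Installed generally": only the base directory is on PATH.
PATH="$base_bin:$old_path"
without_env=$(mytool)

# "Environment activated": its bin directory is placed in front.
PATH="$env_bin:$base_bin:$old_path"
with_env=$(mytool)

# "Environment deactivated": the environment directory is removed again.
PATH="$base_bin:$old_path"
after_deactivate=$(mytool)

echo "$without_env $with_env $after_deactivate"
```

The base copy is never deleted or changed; it is simply found later in the search order while the environment directory is in front.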
To tell if you are in the correct conda environment, look at the command prompt.
Do you see the name of the environment in round brackets at the very beginning of the prompt, e.g. (tutorial)?
If not, activate the tutorial environment with conda activate tutorial before installing the tools.
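Inside a script there is no prompt to look at, but conda also records the name of the active environment in the CONDA_DEFAULT_ENV variable. The helper below, require_env, is a hypothetical name and only a sketch of how such a check could look.

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        return 0
    fi
    echo "Expected environment '$1', found '${CONDA_DEFAULT_ENV:-none}'." >&2
    return 1
}

# Example use at the top of an analysis script:
# require_env tutorial || exit 1
```

Placing such a check at the start of an analysis script helps avoid accidentally running it with tools from a different environment.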
To leave an environment just type:
(tutorial) $ conda deactivate
# Let's look at the content of the PATH variable
$ echo $PATH
~/miniconda3/bin:/usr/local/bin: ...
conda list will show you the packages that are installed within the environment:
$ conda activate tutorial
# list all installed
(tutorial) $ conda list
If bwa was installed into this environment, it should appear in the list.
Ok, now we want to get a snapshot of the current environment so that we could recreate it either here or on another machine running the same operating system.
# Let's export the environment into a yaml-file
(tutorial) $ conda env export > tutorial.yaml
Let's have a look at the file:
(tutorial) $ cat tutorial.yaml
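The exported file lists the channels and the installed packages with their versions and build strings, and usually ends with a machine-specific prefix: line. The following sketch works on a made-up example file (the package entries are illustrative, not real output) and shows one way to drop the prefix: line and to list only the pinned names and versions.

```shell
# A hypothetical exported file; real package names, versions and builds will differ.
workdir=$(mktemp -d)
cat > "$workdir/tutorial.yaml" <<'EOF'
name: tutorial
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bwa=0.6.2=1
  - python=3.9.7=hypothetical_0
prefix: /home/user/miniconda3/envs/tutorial
EOF

# Drop the machine-specific "prefix:" line.
grep -v '^prefix:' "$workdir/tutorial.yaml" > "$workdir/tutorial.portable.yaml"

# List the pinned packages as name=version, without the build string.
pinned=$(sed -n 's/^  - \([^=]*=[^=]*\)=.*$/\1/p' "$workdir/tutorial.portable.yaml")
echo "$pinned"
```

conda env export also accepts a --no-builds option that leaves out the build strings at export time, which may be the simpler route when the file is meant for another machine.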
To deactivate the environment again type:
# Deactivate environment
(tutorial) $ conda deactivate
Now we delete the environment, specifying the name again with -n:
# Delete original "tutorial" environment
$ conda env remove -n tutorial
Now we can use the created yaml-file to recreate the former environment:
# Let's recreate an environment using the tutorial.yaml file
$ conda env create -n tutorial -f tutorial.yaml
# Activate environment
$ conda activate tutorial
So we learned that we can create conda environments for a certain tool or set of tools/packages and store the installed tools and their version numbers in a yaml-file that can be used to recreate the environment.
This gives us a very easy way to keep track of the tools and versions used in our analysis.
It is good practice to include a yaml-file of your environment in your analysis directory and submit it together with the rest of your code.
3.4. General Conda commands
# search for packages
conda search PACKAGE
# install a package
conda install PACKAGE
# update all packages
conda update --all --yes
# list all packages installed
conda list [-n ENV]
# list environments
conda env list
# create new environment with packages
conda create -n ENV PACKAGE [PACKAGE] ...
# activate environment
conda activate ENV
# deactivate environment
conda deactivate
# export environment
conda env export > env.yaml
# recreate environment from file
conda env create -n ENV -f env.yaml