velvet
Introduction
Source: http://www.ebi.ac.uk/~zerbino/velvet/
Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
Velvet currently takes in short read sequences, removes errors, then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs.
Manual: http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf
Module usage
module load bioinformatics/velvet/1.2.03
module load bioinformatics/velvet/1.2.03-omp
module load bioinformatics/velvet/1.2.03-max101
module load bioinformatics/velvet/1.2.03-color
module load bioinformatics/velvet/1.2.03-category57
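Which module to load depends mainly on the hash (k-mer) length you plan to use. The following is a minimal sketch, not part of Velvet: `velvet_module_for_k` is a hypothetical helper, and it assumes that the `max101` module corresponds to the `MAXKMERLENGTH=101` build described in the installation notes, while the plain module keeps the default limit of 31.

```shell
#!/bin/sh
# Hypothetical helper: print the module name that supports hash length $1.
# Assumption: 1.2.03 keeps the default limit (31) and 1.2.03-max101 was
# built with MAXKMERLENGTH=101.
velvet_module_for_k() {
    k=$1
    if [ "$k" -le 31 ]; then
        echo "bioinformatics/velvet/1.2.03"
    elif [ "$k" -le 101 ]; then
        echo "bioinformatics/velvet/1.2.03-max101"
    else
        echo "no listed module supports k=$k" >&2
        return 1
    fi
}

velvet_module_for_k 51   # prints bioinformatics/velvet/1.2.03-max101
```

On the cluster, the result could be passed straight to the module command, for example `module load "$(velvet_module_for_k 51)"`.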
Installation
module load library/zlib/1.2.5
Requirements
Velvet should function on any standard 64bit Linux environment with gcc. A
good amount of physical memory (12GB to start with, more is no luxury) is
recommended.
It can in theory function on a 32bit environment, but such systems have
memory limitations which might ultimately be a constraint for assembly.
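The requirements above can be checked before starting a build or an assembly. This is a rough sketch for Linux only: `mem_enough` is a hypothetical helper, the `*64*` pattern is a heuristic for 64-bit machine names, and 12 GB is treated as 12 GiB. Note that `MemTotal` is usually a little below the installed RAM, so a machine with exactly 12 GB installed may be reported as falling short.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 (memory in kB) is at least $2 GiB
# (default 12, the starting point recommended above).
mem_enough() {
    need_kb=$(( ${2:-12} * 1024 * 1024 ))
    [ "$1" -ge "$need_kb" ]
}

# Heuristic architecture check based on the machine name.
arch=$(uname -m)
case "$arch" in
    *64*) echo "architecture: $arch (64-bit)" ;;
    *)    echo "architecture: $arch (possibly 32-bit; memory may limit assembly)" ;;
esac

# /proc/meminfo exists on Linux; skip the check elsewhere.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if mem_enough "$total_kb"; then
        echo "memory: ${total_kb} kB, meets the 12 GB starting point"
    else
        echo "memory: ${total_kb} kB, below the 12 GB starting point"
    fi
fi
```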
mkdir -p /sw/bioinformatics/velvet/1.2.03
cd /sw/bioinformatics/velvet/1.2.03
tar -zxvf velvet_latest.tgz
cd /sw/bioinformatics/velvet/1.2.03-omp
make 'OPENMP=1' 2>&1 | tee make_velvet.txt
cd /sw/bioinformatics/velvet/1.2.03-color
make color 2>&1 | tee make_velvet-color.txt
cd /sw/bioinformatics/velvet/1.2.03
make 2>&1 | tee make_velvet.txt
cd /sw/bioinformatics/velvet/1.2.03-category57
make 'CATEGORIES=57' 2>&1 | tee make_velvet-category57.txt
cd /sw/bioinformatics/velvet/1.2.03-MAXKMERLENGTH101
make 'MAXKMERLENGTH=101' 2>&1 | tee make_velvet-MAXKMERLENGTH101.txt
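The build commands above pair each install directory with one make setting. The sketch below prints that plan as a dry run without executing anything. `make_args_for` is a hypothetical helper; the directory names and make arguments are copied from the commands above, and the sketch assumes that each variant directory already holds an unpacked copy of the Velvet source, which the commands above do not show.

```shell
#!/bin/sh
# Hypothetical helper: print the make argument used for a build variant.
make_args_for() {
    case "$1" in
        1.2.03)                  echo "" ;;
        1.2.03-omp)              echo "OPENMP=1" ;;
        1.2.03-color)            echo "color" ;;
        1.2.03-category57)       echo "CATEGORIES=57" ;;
        1.2.03-MAXKMERLENGTH101) echo "MAXKMERLENGTH=101" ;;
        *) return 1 ;;
    esac
}

prefix=/sw/bioinformatics/velvet
for variant in 1.2.03 1.2.03-omp 1.2.03-color 1.2.03-category57 1.2.03-MAXKMERLENGTH101; do
    args=$(make_args_for "$variant")
    # Dry run: print the command instead of running it.
    echo "cd $prefix/$variant && make${args:+ $args}"
done
```

Keeping one directory per variant matters because every setting is fixed at compile time, so each set of executables only supports the options it was built with.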
From a GNU environment, simply type:
> make

2.3 Compilation settings

2.3.1 Colorspace Velvet
To produce the colorspace version of Velvet, compile with the instruction:
> make color
All the rest of the manual remains valid, except that the executables are now called velveth_de and velvetg_de. Beware that colorspace and sequence space are incompatible, hence the separate sets of executables. In other words, don't try to hash sequence files with colorspace Velvet or vice versa, under penalty of meaningless results!

2.3.2 CATEGORIES
Because of the use of fixed-length arrays, a number of variables have to be set at compilation time. One of them is the number of channels, or categories of reads, which can be handled independently. This is useful, for example, if you want to distinguish reads from different insert libraries, or from different samples altogether. By default, there are only two short read categories, but this variable can be extended to your needs. For example, to obtain 57 different channels, compile with the parameter:
make 'CATEGORIES=57'
(Note the single quotes and absence of spacing.)
The greater the number, the longer the corresponding arrays and the more memory Velvet requires to run. Adjust this variable according to your needs and your memory resources.

2.3.3 MAXKMERLENGTH
Another useful compilation parameter is MAXKMERLENGTH. As explained in 5.2, the hash length can be crucial to getting optimal assemblies. Depending on the dataset, you might wish to use long hash lengths. By default, hash lengths are limited to 31bp, but you can raise this limit by adjusting the MAXKMERLENGTH parameter at compilation time:
make 'MAXKMERLENGTH=57'
(Note the single quotes and absence of spacing.)
By storing longer words, Velvet requires more memory, so adjust this variable according to your needs and memory resources.

2.3.4 BIGASSEMBLY
Read IDs are stored as signed 32bit integers, so a big assembly with more than about 2.1 billion reads (the signed 32bit limit, 2^31 - 1) needs more memory to track the reads. To enable this, simply add the following option to the make command:
make 'BIGASSEMBLY=1'
(Note the single quotes and absence of spacing.)
This will cost more memory overhead.

2.3.5 LONGSEQUENCES
Read lengths are stored as signed 16bit integers, so if you are assembling contigs longer than 32kb, more memory is required to store coordinates. To enable this, simply add the following option to the make command:
make 'LONGSEQUENCES=1'
(Note the single quotes and absence of spacing.)
This will cost more memory overhead.

2.3.6 OPENMP
To turn on multithreading, simply use the OPENMP option at compilation. This should not significantly affect the memory overhead or results:
make 'OPENMP=1'
OpenMP allows a program to make use of multiple CPU cores on the same machine. You might have to set the environment variables OMP_NUM_THREADS and OMP_THREAD_LIMIT. Velvet will then use up to OMP_NUM_THREADS + 1 or OMP_THREAD_LIMIT threads. More information at http://www.ats.ucla.edu/clusters/common/computing/parallel/using_openmp.htm
Only parts of the Velvet algorithm make use of OpenMP, so don't expect run time to decrease linearly with the number of CPUs.

2.3.7 BUNDLEDZLIB
By default, Velvet uses an existing zlib installed on your system. If there isn't one, or if it is unsuitable for any reason, zlib source code is also distributed within the Velvet source package, and Velvet can be compiled to use this bundled zlib by adding the following option to the make command:
make 'BUNDLEDZLIB=1'
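
For the OpenMP build described in 2.3.6, the two environment variables can be set before running Velvet. The sketch below is one possible policy, not a recommendation from the manual: because Velvet may use up to OMP_NUM_THREADS + 1 threads, it sets OMP_NUM_THREADS to one less than the core count and caps OMP_THREAD_LIMIT at the core count. `omp_threads_for` is a hypothetical helper, and the use of `getconf _NPROCESSORS_ONLN` to count cores is an assumption about the host system.

```shell
#!/bin/sh
# Hypothetical helper: leave one core for the extra thread Velvet may start.
omp_threads_for() {
    cores=$1
    if [ "$cores" -gt 1 ]; then
        echo $(( cores - 1 ))
    else
        echo 1
    fi
}

# Assumption: getconf reports the online core count; fall back to 1.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
OMP_NUM_THREADS=$(omp_threads_for "$cores")
OMP_THREAD_LIMIT=$cores
export OMP_NUM_THREADS OMP_THREAD_LIMIT
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT"
```

On a shared cluster node, the core count granted by the scheduler is a better input than the machine total.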