velvet

Introduction

Source: http://www.ebi.ac.uk/~zerbino/velvet/

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Velvet currently takes in short read sequences, removes errors then produces high quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs.

Manual: http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf

Module usage

module load bioinformatics/velvet/1.2.03
module load bioinformatics/velvet/1.2.03-omp
module load bioinformatics/velvet/1.2.03-max101
module load bioinformatics/velvet/1.2.03-color
module load bioinformatics/velvet/1.2.03-category57
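
The variants above differ only in their compile-time settings, so which module to load depends on the job. The following is a minimal sketch, assuming the `-max101` module corresponds to the MAXKMERLENGTH=101 build described under Installation and that the plain module keeps the default 31 bp hash-length limit. The helper name `pick_velvet_module` is hypothetical.

```shell
# Hypothetical helper: print the module to load for a given hash length (k).
# Assumption: the plain build accepts k <= 31 and the -max101 build k <= 101.
pick_velvet_module() {
    k=$1
    if [ "$k" -le 31 ]; then
        echo "bioinformatics/velvet/1.2.03"
    elif [ "$k" -le 101 ]; then
        echo "bioinformatics/velvet/1.2.03-max101"
    else
        echo "no listed module supports k=$k" >&2
        return 1
    fi
}

# Example: choose the module for k=57 and show the command to run.
velvet_module=$(pick_velvet_module 57)
echo "module load $velvet_module"
```

On the cluster this could be followed by `velveth out_k57 57 -fasta -short reads.fa` and `velvetg out_k57`, where `reads.fa` and `out_k57` are placeholder names.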

Installation

module load library/zlib/1.2.5

Requirements
Velvet should function on any standard 64bit Linux environment with gcc. A
good amount of physical memory (12GB to start with, more is no luxury) is
recommended.
It can in theory function on a 32bit environment, but such systems have
memory limitations which might ultimately be a constraint for assembly.

mkdir -p /sw/bioinformatics/velvet/1.2.03
cd /sw/bioinformatics/velvet/1.2.03
tar -zxvf velvet_latest.tgz

cd /sw/bioinformatics/velvet/1.2.03-omp
make 'OPENMP=1' 2>&1 | tee make_velvet.txt

cd /sw/bioinformatics/velvet/1.2.03-color
make color 2>&1 | tee make_velvet-color.txt

cd /sw/bioinformatics/velvet/1.2.03
make 2>&1 | tee make_velvet.txt

cd /sw/bioinformatics/velvet/1.2.03-category57
make 'CATEGORIES=57' 2>&1 | tee make_velvet-category57.txt

cd /sw/bioinformatics/velvet/1.2.03-MAXKMERLENGTH101
make 'MAXKMERLENGTH=101' 2>&1 | tee make_velvet-MAXKMERLENGTH101.txt
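
The per-variant builds above all follow one pattern, which a small loop can capture. This is only a sketch, assuming each variant directory already holds its own unpacked copy of the Velvet source (the commands above do not show that step). The function name `build_velvet_variant`, the `DRY_RUN` switch, and the log file naming are hypothetical.

```shell
PREFIX=/sw/bioinformatics/velvet   # install root used in the commands above
DRY_RUN=${DRY_RUN:-1}              # hypothetical switch: 1 = only print commands

# Build one variant.
#   $1 = directory suffix ("" for the default build)
#   $2 = make argument: a target such as "color" or a VAR=value setting
build_velvet_variant() {
    dir="$PREFIX/1.2.03$1"
    log="make_velvet$1.txt"
    if [ "$DRY_RUN" = 1 ]; then
        echo "cd $dir && make $2 2>&1 | tee $log"
    else
        # $2 is left unquoted on purpose so an empty value adds no argument.
        ( cd "$dir" && make $2 2>&1 | tee "$log" )
    fi
}

build_velvet_variant ""                  ""
build_velvet_variant "-omp"              "OPENMP=1"
build_velvet_variant "-color"            "color"
build_velvet_variant "-category57"       "CATEGORIES=57"
build_velvet_variant "-MAXKMERLENGTH101" "MAXKMERLENGTH=101"
```

With `DRY_RUN=1` (the default in this sketch) the script only prints the commands, which makes it easy to review them before running a real build with `DRY_RUN=0`.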
From a GNU environment, simply type:
> make
2.3 Compilation settings

2.3.1 Colorspace Velvet
To produce the colorspace version of Velvet, compile with the instruction:
> make color

All the rest of the manual remains valid, except that the executables are
now called velveth_de and velvetg_de.
Beware that colorspace and sequence space are incompatible, hence the separate
sets of executables. In other words, do not hash sequence files with colorspace
Velvet or vice versa: the results would be meaningless.
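
Because the two builds must not be mixed, a wrapper script can make the choice explicit. A minimal sketch, assuming the colorspace executables are named `velveth_de` and `velvetg_de` as in the Velvet manual; the helper name `velvet_tools` is hypothetical.

```shell
# Hypothetical helper: print the hashing and graph executables for a read space.
velvet_tools() {
    case "$1" in
        sequence) echo "velveth velvetg" ;;
        color)    echo "velveth_de velvetg_de" ;;
        *)        echo "unknown read space: $1" >&2; return 1 ;;
    esac
}

# Split the pair into two variables; both steps must come from the same build.
set -- $(velvet_tools color)
hash_tool=$1
graph_tool=$2
echo "hash with $hash_tool, build the graph with $graph_tool"
```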

2.3.2 CATEGORIES
Because of the use of fixed-length arrays, a number of variables have to be set
at compilation time.
One of these is the number of channels, or categories of reads, which can be
handled independently. This is useful, for example, if you want to distinguish
reads from different insert libraries, or from different samples altogether.

By default, there are only two short read categories, but this variable can be
extended to your needs. For example, to obtain 57 different channels, compile
with the parameter:

make 'CATEGORIES=57'

(Note the single quotes and absence of spacing.)
The greater the number, the longer the corresponding arrays and the more
memory Velvet will need to run. Adjust this variable according to your needs
and your available memory.
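
Once a build supports enough categories, each library can be passed to velveth under its own read-type flag. The sketch below assumes velveth numbers the short-read categories as `-short`, `-short2`, `-short3`, and so on, and that flags beyond `-short2` are only accepted by a binary compiled with a sufficiently large CATEGORIES. The helper name `category_args` and the file names are hypothetical.

```shell
# Hypothetical helper: print velveth read arguments, one category per file.
# The first file is assigned -short, the second -short2, the third -short3, ...
category_args() {
    n=1
    args=""
    for f in "$@"; do
        if [ "$n" -eq 1 ]; then
            flag="-short"
        else
            flag="-short$n"
        fi
        args="$args $flag $f"
        n=$((n + 1))
    done
    # Drop the leading space before printing.
    echo "${args# }"
}

category_args libA.fa libB.fa libC.fa
```

The printed arguments could then be appended to a hashing command such as `velveth out 31 -fasta`, with `out` as a placeholder output directory.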

2.3.3 MAXKMERLENGTH
Another useful compilation parameter is the MAXKMERLENGTH. As explained
in 5.2, the hash length can be crucial to getting optimal assemblies.
Depending on the dataset, you might wish to use long hash lengths.
By default, hash-lengths are limited to 31bp, but you can push up this limit
by adjusting the MAXKMERLENGTH parameter at compilation time:


make 'MAXKMERLENGTH=57'
(Note the single quotes and absence of spacing.)
Storing longer words makes Velvet require more memory, so adjust
this variable according to your needs and memory resources.
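
A quick pre-flight check can catch a hash length that the loaded binary cannot handle. This is a sketch under two assumptions: the caller knows which MAXKMERLENGTH the binary was compiled with, and the hash length should be odd, as the hash-length section of the Velvet manual recommends. The helper name `check_hash_length` is hypothetical.

```shell
# Hypothetical pre-flight check.
#   $1 = requested hash length (k)
#   $2 = MAXKMERLENGTH of the binary (defaults to 31, the stock limit)
check_hash_length() {
    k=$1
    max=${2:-31}
    if [ "$k" -gt "$max" ]; then
        echo "k=$k exceeds MAXKMERLENGTH=$max; use a build with a higher limit" >&2
        return 1
    fi
    # Assumption: the manual asks for an odd hash length.
    if [ $((k % 2)) -eq 0 ]; then
        echo "k=$k is even; choose an odd hash length" >&2
        return 1
    fi
    echo "k=$k is usable with MAXKMERLENGTH=$max"
}

check_hash_length 57 57
```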

2.3.4 BIGASSEMBLY
Read IDs are stored as signed 32-bit integers, so an assembly with more than
about 2.1 billion reads (2^31 - 1) needs wider IDs, and therefore more memory,
to track the reads. To enable this, add the following option to the make command:

make 'BIGASSEMBLY=1'
(Note the single quotes and absence of spacing.)
This increases memory overhead.

2.3.5 LONGSEQUENCES
Read lengths are stored as signed 16-bit integers, so if you are assembling
contigs longer than 32 kb, more memory is required to store
coordinates. To enable this, add the following option to the make command:

make 'LONGSEQUENCES=1'
(Note the single quotes and absence of spacing.)
This increases memory overhead.
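
The two integer limits from sections 2.3.4 and 2.3.5 can be checked before compiling. A minimal sketch: the thresholds are the largest signed 32-bit and 16-bit values (2^31 - 1 = 2147483647 and 2^15 - 1 = 32767), while the helper name `size_flags` and its two inputs are hypothetical and would have to be estimated from your own dataset.

```shell
# Hypothetical helper: print the make command implied by the dataset size.
#   $1 = number of reads
#   $2 = longest sequence or expected contig length, in bp
size_flags() {
    flags=""
    # Signed 32-bit read IDs top out at 2^31 - 1.
    if [ "$1" -gt 2147483647 ]; then
        flags="$flags 'BIGASSEMBLY=1'"
    fi
    # Signed 16-bit coordinates top out at 2^15 - 1.
    if [ "$2" -gt 32767 ]; then
        flags="$flags 'LONGSEQUENCES=1'"
    fi
    echo "make$flags"
}

size_flags 3000000000 100      # prints: make 'BIGASSEMBLY=1'
size_flags 5000000 50000       # prints: make 'LONGSEQUENCES=1'
```

Both options can appear in the same make invocation when a dataset exceeds both limits.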

2.3.6 OPENMP
To turn on multithreading, simply use the OPENMP option at compilation.
This should not significantly affect the memory overhead or results:

make 'OPENMP=1'
OpenMP allows a program to make use of multiple CPU cores on the same
machine. You might have to set the environment variables OMP_NUM_THREADS
and OMP_THREAD_LIMIT. Velvet will then use up to OMP_NUM_THREADS + 1
or OMP_THREAD_LIMIT threads. More information at
http://www.ats.ucla.edu/clusters/common/computing/parallel/using_openmp.htm
Only parts of the Velvet algorithm make use of OpenMP, so do not expect run
time to decrease linearly with the number of CPUs.
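
Since Velvet is described above as using up to OMP_NUM_THREADS + 1 threads, a job that should stay within N cores can set the two variables accordingly. A minimal sketch; the helper name `set_velvet_threads` and the choice of 8 threads are hypothetical.

```shell
# Hypothetical helper: set the OpenMP variables so that Velvet uses
# at most $1 threads in total.
set_velvet_threads() {
    total=$1
    # OMP_NUM_THREADS must stay positive, so at least 2 threads are needed.
    if [ "$total" -lt 2 ]; then
        echo "thread count must be at least 2" >&2
        return 1
    fi
    # One less than the total, because Velvet may use OMP_NUM_THREADS + 1.
    OMP_NUM_THREADS=$((total - 1))
    OMP_THREAD_LIMIT=$total
    export OMP_NUM_THREADS OMP_THREAD_LIMIT
}

set_velvet_threads 8
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT"
```

In a batch job the argument would normally be the number of cores requested from the scheduler, so that the OpenMP build does not oversubscribe the node.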

2.3.7 BUNDLEDZLIB
By default, Velvet uses an existing zlib installed on your system. If there isn’t
one or if it is unsuitable for any reason, zlib source code is also distributed within
the Velvet source package and Velvet can be compiled to use this bundled zlib
by adding the following option to the make command:

make 'BUNDLEDZLIB=1'