RepeatSeq

Introduction

RepeatSeq determines genotypes for microsatellite repeats in high-throughput sequencing data.

If you use this code, please cite the following manuscript:

G. Highnam, C. Franck, A. Martin, C. Stephens, A. Puthige, and D. Mittelman (2012) Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles, Nucleic Acids Res, Epub Oct 22.

Usage

 

Readme

Required Input
RepeatSeq requires three inputs at minimum: a BAM file of aligned reads, a FASTA file of the reference sequence, and a region file listing the repeats to genotype.

Optional Input

The user can specify a number of command-line options to customize the behavior of RepeatSeq:

Command-line Options:
	-r          use only a specific read length or range of read lengths (e.g. LENGTH or MIN:MAX)
	-L          required number of reference-matching bases BEFORE the repeat [3]
	-R          required number of reference-matching bases AFTER the repeat [3]
	-M          minimum mapping quality for a read to be used for allele determination
	-multi      exclude reads flagged with the XT:A:R tag (reads that BWA marks as repeats, i.e. with multiple equally good alignments)
	-pp         exclude reads that are not properly paired (paired-end reads only)
	-error      override the RepeatSeq error model with a constant error rate [0.05]
	-haploid    assume a haploid rather than diploid genome
	-repeatseq  write a .repeatseq file (format described in the RepeatSeq README, reference 1)
	-calls      write a .calls file (format described in the RepeatSeq README, reference 1)
	-t          include a user-defined tag in the output filename
	-o          number of flanking bases to output from each read
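The -r option accepts either a single LENGTH or a MIN:MAX range. As an illustration of how the two forms relate, here is a minimal bash sketch of a hypothetical helper, parse_read_length (not part of RepeatSeq), that validates a -r value and normalizes it to "MIN MAX", treating a single LENGTH as MIN = MAX. It only checks the format described above; how RepeatSeq itself parses the value is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of RepeatSeq: split a -r value written as
# LENGTH or MIN:MAX into "MIN MAX" (a single LENGTH is treated as MIN=MAX).
parse_read_length() {
  local spec="$1" min max
  if [[ "$spec" =~ ^([0-9]+)$ ]]; then
    min="${BASH_REMATCH[1]}"
    max="$min"
  elif [[ "$spec" =~ ^([0-9]+):([0-9]+)$ ]]; then
    min="${BASH_REMATCH[1]}"
    max="${BASH_REMATCH[2]}"
  else
    echo "invalid -r value: $spec (expected LENGTH or MIN:MAX)" >&2
    return 1
  fi
  # Force base 10 so values such as 08 are not read as octal.
  if (( 10#$min > 10#$max )); then
    echo "invalid -r value: $spec (MIN is greater than MAX)" >&2
    return 1
  fi
  echo "$min $max"
}
```

For example, `parse_read_length 100` prints `100 100` and `parse_read_length 90:110` prints `90 110`, while `110:90` or a non-numeric value is rejected with a message on stderr.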

Running RepeatSeq

Usage: repeatseq [options] <in.bam> <in.fasta> <in.regions>

If an unrecognized command-line option is given, RepeatSeq prints usage information and exits.
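Because RepeatSeq needs all three positional inputs, a small wrapper can catch a missing or unreadable file before the run starts. The following is a minimal bash sketch under that assumption; run_repeatseq and the example file names are hypothetical, and only flags from the option list above that take no value are used in the example.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not shipped with RepeatSeq: check that the three
# required inputs are readable, then call repeatseq.
# Usage: run_repeatseq <in.bam> <in.fasta> <in.regions> [options...]
run_repeatseq() {
  if (( $# < 3 )); then
    echo "usage: run_repeatseq <in.bam> <in.fasta> <in.regions> [options...]" >&2
    return 2
  fi
  local bam="$1" fasta="$2" regions="$3"
  shift 3
  local f
  for f in "$bam" "$fasta" "$regions"; do
    if [[ ! -r "$f" ]]; then
      echo "run_repeatseq: cannot read input file: $f" >&2
      return 1
    fi
  done
  # Options go before the positional arguments, as in the usage line above.
  repeatseq "$@" "$bam" "$fasta" "$regions"
}
```

For example, `run_repeatseq sample.bam reference.fa repeats.regions -pp -multi -calls` would run `repeatseq -pp -multi -calls sample.bam reference.fa repeats.regions` once all three files are found.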

Installation

RepeatSeq also requires the following packages, which must be downloaded separately:

BamTools (http://sourceforge.net/projects/bamtools/)
fastahack (https://github.com/ekg/fastahack)

Steps to install (requires CMake):

From the repeatseq directory:

(1) Download BamTools and place it in a directory named "bamtools" inside repeatseq/ (https://github.com/pezmaster31/bamtools).
(2) Download fastahack and place it in a directory named "fastahack" inside repeatseq/ (https://github.com/ekg/fastahack).

(3) Build BamTools:
$ mkdir bamtools/build
$ cd bamtools/build/
$ cmake ..
$ make

(4) Build RepeatSeq:
$ cd ../..
$ make
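Steps (3) and (4) fail in confusing ways if steps (1) and (2) put the sources in the wrong place. A minimal bash sketch of a pre-build check, assuming the layout described above: bamtools/ must contain the CMakeLists.txt that `cmake ..` reads from bamtools/build, and fastahack/ must exist and be non-empty. The function name check_repeatseq_layout is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check, not part of RepeatSeq: confirm that steps (1)
# and (2) left bamtools/ and fastahack/ inside the repeatseq directory.
check_repeatseq_layout() {
  local root="${1:-.}" ok=0
  # `cmake ..` in bamtools/build needs bamtools/CMakeLists.txt.
  if [[ ! -f "$root/bamtools/CMakeLists.txt" ]]; then
    echo "missing $root/bamtools/CMakeLists.txt (step 1)" >&2
    ok=1
  fi
  if [[ ! -d "$root/fastahack" ]] || [[ -z "$(ls -A "$root/fastahack")" ]]; then
    echo "missing or empty $root/fastahack/ (step 2)" >&2
    ok=1
  fi
  return "$ok"
}
```

Run from the repeatseq directory, `check_repeatseq_layout && make` (after building BamTools) only starts the RepeatSeq build once both dependencies are in place.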

Actual Installation on gowonda:

cd /sw/bioinformatics/repeatseq/0.6.4/
unzip /sw/bioinformatics/repeatseq/0.6.4/src/repeatseq-master.zip
cd /sw/bioinformatics/repeatseq/0.6.4/repeatseq-master
mv * ../
cd ../
rmdir repeatseq-master
mkdir /sw/bioinformatics/repeatseq/0.6.4/bamtools
mkdir /sw/bioinformatics/repeatseq/0.6.4/fastahack

>>>>>>>> bamtools build <<<<<<<<<<<
cd /sw/bioinformatics/repeatseq/0.6.4/bamtools
unzip /sw/bioinformatics/repeatseq/0.6.4/src/bamtools-master.zip
cd /sw/bioinformatics/repeatseq/0.6.4/bamtools/bamtools-master
mv * ../
cd /sw/bioinformatics/repeatseq/0.6.4/bamtools/
rmdir bamtools-master

cd /sw/bioinformatics/repeatseq/0.6.4/
mkdir bamtools/build
cd bamtools/build/
cmake ..
[to build a debug version that can be used with gdb, run this instead:]
[cmake -DCMAKE_BUILD_TYPE=Debug ../ ]
.....................
$ cmake ..
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /sw/gcc/4.7.1/bin/gcc
-- Check for working C compiler: /sw/gcc/4.7.1/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /sw/gcc/4.7.1/bin/g++
-- Check for working CXX compiler: /sw/gcc/4.7.1/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /sw/bioinformatics/repeatseq/0.6.4/bamtools/build


......................
make
>>>>>>>>>>>>>>>>>>>>

>>>>>>>> fastahack setup <<<<<<<<<<<
cd /sw/bioinformatics/repeatseq/0.6.4/fastahack
unzip /sw/bioinformatics/repeatseq/0.6.4/src/fastahack-master.zip
cd /sw/bioinformatics/repeatseq/0.6.4/fastahack/fastahack-master
mv * ../
mv .gitignore ../      # mv * does not match hidden files, so .gitignore is moved separately
cd /sw/bioinformatics/repeatseq/0.6.4/fastahack/
rmdir fastahack-master
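The unzip, `mv * ../`, `rmdir` sequence above is repeated for repeatseq, bamtools and fastahack. Because bash's `*` skips hidden files by default, any dotfile left in a `*-master` directory has to be moved by hand (as with .gitignore for fastahack), otherwise `rmdir` fails. A minimal bash sketch of a hypothetical helper, flatten_dir (not part of the original install log), that moves hidden files too by enabling `dotglob` in a subshell:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of RepeatSeq or the gowonda install log:
# move everything out of an unzipped "*-master" directory into its parent,
# hidden files included, then remove the emptied directory.
flatten_dir() {
  local src="${1%/}"
  local dest
  dest="$(dirname "$src")"
  (
    # dotglob lets * match names such as .gitignore (never . or ..);
    # nullglob makes an empty directory expand to an empty list.
    shopt -s dotglob nullglob
    entries=("$src"/*)
    if (( ${#entries[@]} > 0 )); then
      mv -- "${entries[@]}" "$dest"/
    fi
  ) || return 1
  rmdir -- "$src"   # fails if anything was left behind
}
```

For example, after `unzip /sw/bioinformatics/repeatseq/0.6.4/src/fastahack-master.zip` in /sw/bioinformatics/repeatseq/0.6.4/fastahack, running `flatten_dir fastahack-master` would take the place of the two `mv` lines and the `rmdir`.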

.....build repeatseq.............
cd /sw/bioinformatics/repeatseq/0.6.4/
make

..................................

References

1. https://github.com/BioinformaticsArchive/repeatseq#readme
2. https://github.com/pezmaster31/bamtools
3. https://github.com/ekg/fastahack