RepeatSeq
Introduction
RepeatSeq determines genotypes for microsatellite repeats in high-throughput sequencing data.
If you use this code, please cite the following manuscript:
G. Highnam, C. Franck, A. Martin, C. Stephens, A. Puthige, and D. Mittelman (2012) Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles, Nucleic Acids Res, Epub Oct 22.
Usage
Readme
Required Input
RepeatSeq requires a BAM file, a FASTA file, and a region file as the minimal parameters.
Optional Input
The user can specify a number of command-line options to customize the behavior of RepeatSeq.
Command-line Options (defaults in brackets):
-r          use only a specific read length or range of read lengths (e.g. LENGTH or MIN:MAX)
-L          required number of reference-matching bases BEFORE the repeat [3]
-R          required number of reference-matching bases AFTER the repeat [3]
-M          minimum mapping quality for a read to be used for allele determination
-multi      exclude reads flagged with the XT:A:R tag (BWA's marker for reads that map to multiple locations)
-pp         exclude reads that are not properly paired (paired-end reads only)
-error      manually override the RepeatSeq error model and set a constant error rate [0.05]
-haploid    assume a haploid rather than diploid genome
-repeatseq  write a .repeatseq file (see the RepeatSeq README [1] for more information)
-calls      write a .calls file (see the RepeatSeq README [1] for more information)
-t          include a user-defined tag in the output filename
-o          number of flanking bases to output from each read
Running RepeatSeq
Usage: repeatseq [options] <in.bam> <in.fasta> <in.regions>
If an improper command-line option is found, RepeatSeq exits and prints usage information.
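In job scripts it can help to assemble the command line in one place, so that options always come before the three positional inputs as the usage line requires. The sketch below is a hypothetical wrapper, not part of RepeatSeq: the function name run_repeatseq, the DRY_RUN variable, and the example file names and option values are all placeholders. When DRY_RUN is unset it assumes the repeatseq binary is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RepeatSeq).
# run_repeatseq BAM FASTA REGIONS [repeatseq options...]
# Options are placed before the positional inputs, following the documented
# usage: repeatseq [options] <in.bam> <in.fasta> <in.regions>
# With DRY_RUN=1 the assembled command is printed instead of executed.
run_repeatseq() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_repeatseq BAM FASTA REGIONS [repeatseq options...]" >&2
    return 1
  fi
  local bam="$1" fasta="$2" regions="$3"
  shift 3
  # Remaining arguments are RepeatSeq options, e.g. -r 90:110 -M 20 -calls
  local cmd=(repeatseq "$@" "$bam" "$fasta" "$regions")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"      # print the command only
  else
    "${cmd[@]}"           # assumes repeatseq is on PATH
  fi
}
```

For example, DRY_RUN=1 run_repeatseq sample.bam ref.fa repeats.regions -r 90:110 -M 20 -calls prints "repeatseq -r 90:110 -M 20 -calls sample.bam ref.fa repeats.regions", which can be checked before submitting the real job.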
Installation
RepeatSeq also depends on the following packages, which must be downloaded separately:
BamTools (http://sourceforge.net/projects/bamtools/)
fastahack (https://github.com/ekg/fastahack)
Steps to install (requires CMake), run from the repeatseq directory:
(1) Download bamtools (https://github.com/pezmaster31/bamtools) and place it in repeatseq/ in a directory named "bamtools".
(2) Download fastahack (https://github.com/ekg/fastahack) and place it in repeatseq/ in a directory named "fastahack".
(3) Build bamtools:
    $ mkdir bamtools/build
    $ cd bamtools/build/
    $ cmake ..
    $ make
(4) Build repeatseq:
    $ cd ../..
    $ make
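Steps (3) and (4) assume the dependency sources sit directly in repeatseq/bamtools and repeatseq/fastahack; an archive that unpacks into a nested folder such as bamtools-master/ breaks that assumption. The sketch below checks the layout from steps (1) and (2) and then runs the same build commands. The function names are hypothetical, the check for bamtools/CMakeLists.txt is an assumption based on bamtools being a CMake project configured with "cmake .." from bamtools/build, and the script neither downloads anything nor guarantees that a current bamtools release is compatible with the RepeatSeq Makefile.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of RepeatSeq) wrapping install steps (1)-(4).

# check_repeatseq_layout DIR: verify that DIR (the repeatseq directory) has
# bamtools/ with a top-level CMakeLists.txt and a non-empty fastahack/.
check_repeatseq_layout() {
  local dir="$1" status=0
  if [ ! -f "$dir/bamtools/CMakeLists.txt" ]; then
    echo "missing $dir/bamtools/CMakeLists.txt (is bamtools nested one level too deep?)" >&2
    status=1
  fi
  if [ -z "$(ls -A "$dir/fastahack" 2>/dev/null)" ]; then
    echo "missing or empty $dir/fastahack" >&2
    status=1
  fi
  return "$status"
}

# build_repeatseq DIR: steps (3) and (4), run only if the layout check passes.
build_repeatseq() {
  local dir="$1"
  check_repeatseq_layout "$dir" || return 1
  mkdir -p "$dir/bamtools/build" || return 1
  (cd "$dir/bamtools/build" && cmake .. && make) || return 1
  (cd "$dir" && make)
}
```

Usage: build_repeatseq /path/to/repeatseq, where /path/to/repeatseq is a placeholder for the directory containing the RepeatSeq sources. Running the check first turns a nested-archive mistake into a clear message instead of a failed cmake run.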
Actual install on gowonda:
    cd /sw/bioinformatics/repeatseq/0.6.4/
    unzip /sw/bioinformatics/repeatseq/0.6.4/src/repeatseq-master.zip
    cd /sw/bioinformatics/repeatseq/0.6.4/repeatseq-master
    mv * ../
    cd ../
    rmdir repeatseq-master
    mkdir /sw/bioinformatics/repeatseq/0.6.4/bamtools
    mkdir /sw/bioinformatics/repeatseq/0.6.4/fastahack
Build bamtools:
    cd /sw/bioinformatics/repeatseq/0.6.4/bamtools
    unzip /sw/bioinformatics/repeatseq/0.6.4/src/bamtools-master.zip
    cd /sw/bioinformatics/repeatseq/0.6.4/bamtools/bamtools-master
    mv * ../
    cd /sw/bioinformatics/repeatseq/0.6.4/bamtools/
    rmdir bamtools-master
    cd /sw/bioinformatics/repeatseq/0.6.4/
    mkdir bamtools/build
    cd bamtools/build/
    cmake ..
      (To build a debug version that can be used with gdb, run "cmake -DCMAKE_BUILD_TYPE=Debug ../" instead.)
      Output of cmake ..:
      -- The C compiler identification is GNU
      -- The CXX compiler identification is GNU
      -- Check for working C compiler: /sw/gcc/4.7.1/bin/gcc
      -- Check for working C compiler: /sw/gcc/4.7.1/bin/gcc -- works
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working CXX compiler: /sw/gcc/4.7.1/bin/g++
      -- Check for working CXX compiler: /sw/gcc/4.7.1/bin/g++ -- works
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /sw/bioinformatics/repeatseq/0.6.4/bamtools/build
    make
Set up fastahack:
    cd /sw/bioinformatics/repeatseq/0.6.4/fastahack
    unzip /sw/bioinformatics/repeatseq/0.6.4/src/fastahack-master.zip
    cd /sw/bioinformatics/repeatseq/0.6.4/fastahack/fastahack-master
    mv * ../
    mv .gitignore ../
    cd /sw/bioinformatics/repeatseq/0.6.4/fastahack/
    rmdir fastahack-master
Build repeatseq:
    cd /sw/bioinformatics/repeatseq/0.6.4/
    make
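The gowonda steps repeatedly unzip a GitHub "-master" archive and then move its contents up one level with "mv * ../" followed by "rmdir". In bash, * does not match names beginning with a dot, which is why the fastahack step needs a separate "mv .gitignore ../"; any other leftover dotfile would make rmdir fail. A hypothetical helper that flattens the extracted folder, dotfiles included, might look like the sketch below. The name flatten_dir is illustrative, and the sketch assumes bash and that no entry of the same name already exists in the destination.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RepeatSeq or the gowonda notes).
# flatten_dir SRC DEST: move every entry of SRC, including dotfiles, into DEST,
# then remove the now-empty SRC (the "mv * ../; rmdir" pattern above).
flatten_dir() {
  local src="$1" dest="$2"
  if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
    echo "flatten_dir: SRC and DEST must both be existing directories" >&2
    return 1
  fi
  (
    # dotglob lets * match dotfiles (never . or ..); nullglob handles an empty SRC
    shopt -s dotglob nullglob
    for entry in "$src"/*; do
      mv -- "$entry" "$dest"/ || exit 1
    done
  ) || return 1
  rmdir -- "$src"
}
```

For example, after unzip /sw/bioinformatics/repeatseq/0.6.4/src/bamtools-master.zip -d /sw/bioinformatics/repeatseq/0.6.4/bamtools, running flatten_dir /sw/bioinformatics/repeatseq/0.6.4/bamtools/bamtools-master /sw/bioinformatics/repeatseq/0.6.4/bamtools replaces the cd, mv and rmdir lines for that package.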
Reference
1. https://github.com/BioinformaticsArchive/repeatseq#readme
2. https://github.com/pezmaster31/bamtools
3. https://github.com/ekg/fastahack