repeatmasker
Introduction
http://www.repeatmasker.org/
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence is currently masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including cross_match, ABBlast/WUBlast, RMBlast and DeCypher.
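To see how much of a sequence ended up masked, the Ns in the masked output can be counted directly. The following is a minimal sketch, not part of RepeatMasker: masked_percent is a hypothetical helper name, and it assumes a plain FASTA file in which masked bases appear as N (or n). Ns that were already present in the query before masking are counted too.

```shell
#!/bin/sh
# Sketch: report the percentage of bases that are N/n in a FASTA file.
# masked_percent is a hypothetical helper, not a RepeatMasker command.
masked_percent() {
    awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            total += length($0)            # count every base on this line
            masked += gsub(/[Nn]/, "")     # gsub returns how many Ns it matched
        }
        END {
            if (total == 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * masked / total
        }
    ' "$1"
}
```

For a query named query.fa, RepeatMasker writes the masked sequence to query.fa.masked, so a call could look like: masked_percent query.fa.masked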
Usage
Current Version:
================
module load bioinformatics/repeatmasker/4.0.7 (gowonda)
module load bioinformatics/repeatmasker/4.0.7-gu (awoonga)

Other versions:
===============
module load bioinformatics/repeatmasker/3.3.0-p1
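Once the module is loaded, a typical run could look like the sketch below. It is an illustration only: run_repeatmasker, RM_MODULE and DRY_RUN are hypothetical names introduced here, and the query file, species and output directory are placeholders. The -species and -dir options are standard RepeatMasker options; note that species-specific masking needs the complete repeat library, not the minimal one shipped by default. With DRY_RUN=1 (the default in this sketch) the commands are only printed.

```shell
#!/bin/sh
# Sketch: load the RepeatMasker module and mask one FASTA file.
# RM_MODULE, DRY_RUN and run_repeatmasker are hypothetical names for illustration.
RM_MODULE=bioinformatics/repeatmasker/4.0.7

run_repeatmasker() {
    fasta=$1      # query sequence file (placeholder)
    species=$2    # value passed to -species, e.g. human
    outdir=$3     # directory passed to -dir for the output files

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed without touching anything.
        echo "module load $RM_MODULE"
        echo "RepeatMasker -species $species -dir $outdir $fasta"
        return 0
    fi

    module load "$RM_MODULE"
    mkdir -p "$outdir"
    RepeatMasker -species "$species" -dir "$outdir" "$fasta"
}
```

Example: DRY_RUN=0 followed by run_repeatmasker query.fa human rm_out would load the module and write the annotation and masked sequence into rm_out.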
Installation
mkdir -p /sw/bioinformatics/repeatmasker/3.3.0-p1/SRC
tar -zxvf RepeatMasker-open-3-3-0-p1.tar.gz
cd /sw/bioinformatics/repeatmasker/3.3.0-p1/SRC/RepeatMasker
mv * /sw/bioinformatics/repeatmasker/3.3.0-p1/
cd /sw/bioinformatics/repeatmasker/3.3.0-p1/
perl ./configure
perl ./configure

RepeatMasker Configuration Program

This program assists with the configuration of the
RepeatMasker program. The next set of screens will ask
you to enter information pertaining to your system
configuration. At the end of the program your RepeatMasker
installation will be ready to use.

<PRESS ENTER TO CONTINUE>

**PERL INSTALLATION PATH**

This is the full path to the Perl interpreter.
e.g. /usr/local/bin/perl or enter "env" if you prefer to use
the "/usr/bin/env perl" mechanism to locate perl.

Enter path [ /usr/bin/perl ]:

**REPEATMASKER INSTALLATION PATH**

This is the path to the location where
the RepeatMasker program has been installed.

Enter path [ /sw/bioinformatics/repeatmasker/3.3.0-p1 ]:

-- Building monolithic RM database...

**TRF INSTALLATION PATH**

This is the path to the location where
the TRF program can be found. This is used by
the RepeatProteinMask program.
[NOTE: This script assumes the trf program is called "trf".
If it is name trf###-linux.exe you will need to create a
symlink called "trf" to it or rename it first.

Enter path [ ]: /sw/bioinformatics/repeatmasker/trf/4.04/bin

Add a Search Engine:
1. CrossMatch: [ Un-configured ]
2. RMBlast - NCBI Blast with RepeatMasker extensions: [ Un-configured ]
3. WUBlast/ABBlast (required by DupMasker): [ Un-configured ]
4. DeCypher (TimeLogic): [ Un-configured ]
5. Done

Enter Selection: 2

**RMBlast (rmblastn) INSTALLATION PATH**

This is the path to the location where
the rmblastn and makeblastdb programs can be found.

Enter path [ ]: /sw/bioinformatics/repeatmasker/rmblast/2.2.23/bin

Building RMBlast frozen libraries..

Do you want RMBlast to be your default
search engine for Repeatmasker? (Y/N) [ Y ]: Y

Add a Search Engine:
1. CrossMatch: [ Un-configured ]
2. RMBlast - NCBI Blast with RepeatMasker extensions: [ Configured, Default ]
3. WUBlast/ABBlast (required by DupMasker): [ Un-configured ]
4. DeCypher (TimeLogic): [ Un-configured ]
5. Done

Enter Selection: 5

-- Setting perl interpreter...

Congratulations! RepeatMasker is now ready to use.
The program is installed with a minimal repeat library
by default. This library only contains simple, low-complexity,
and common artefact ( contaminate ) sequences. These are
adequate for use with your own custom repeat library. If you
plan to search using common species specific repeats you will
need to obtain the complete RepeatMasker repeat library from
GIRI ( www.girinst.org ) and install it in
/sw/bioinformatics/repeatmasker/3.3.0-p1.

Further documentation on the program may be found here:
/sw/bioinformatics/repeatmasker/3.3.0-p1/repeatmasker.help
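Before running configure, it can help to confirm that the helper programs it asks for are actually present under the names it expects (trf, rmblastn and makeblastdb). The sketch below is an assumption-laden convenience check, not part of the RepeatMasker distribution: check_tool and check_install are hypothetical function names, and the two directories are passed in by the caller.

```shell
#!/bin/sh
# Sketch: verify that the programs requested by ./configure exist and are executable.
# check_tool and check_install are hypothetical helpers for illustration.

# check_tool DIR NAME: succeed if DIR/NAME is an executable file.
check_tool() {
    if [ -x "$1/$2" ] && [ ! -d "$1/$2" ]; then
        echo "ok: $1/$2"
    else
        echo "missing: $1/$2"
        return 1
    fi
}

# check_install TRF_DIR RMBLAST_DIR: check all three programs, report each one,
# and return non-zero if any of them is missing.
check_install() {
    trf_dir=$1
    rmblast_dir=$2
    status=0
    check_tool "$trf_dir" trf || status=1
    check_tool "$rmblast_dir" rmblastn || status=1
    check_tool "$rmblast_dir" makeblastdb || status=1
    return $status
}
```

With the paths used in the session above, the call would be: check_install /sw/bioinformatics/repeatmasker/trf/4.04/bin /sw/bioinformatics/repeatmasker/rmblast/2.2.23/bin. If trf is reported missing because the binary carries a versioned name, create a symlink called trf as the configure note describes.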
Complete RepeatMasker repeat Library
Sign in and download the libraries from: http://www.girinst.org/

cd /sw/bioinformatics/repeatmasker/3.3.0-p1/SRC
>>>>>>>>>
ls
RepBase14.11_REPET.embl.tar.gz  RepBase17.08.fasta.tar.gz  repeatmaskerlibraries-20090604.tar.gz
RepBase17.08.embl.tar.gz  repeatmaskerlibraries-20120418.tar.gz
>>>>>>>>>>>>
tar -zxvf RepBase14.11_REPET.embl.tar.gz
tar -zxvf RepBase17.08.embl.tar.gz
tar -zxvf RepBase17.08.fasta.tar.gz
tar -zxvf repeatmaskerlibraries-20090604.tar.gz
tar -zxvf repeatmaskerlibraries-20120418.tar.gz

These are the repeat libraries for the program RepeatMasker. To install, move or copy the file "RepeatMaskerLib.embl" into the "Libraries" subdirectory of the RepeatMasker directory.

cd Libraries
cp -i RepeatMaskerLib.embl /sw/bioinformatics/repeatmasker/3.3.0-p1/Libraries
cd ../RepBase14.11_REPET.embl
cp -i *.fa /sw/bioinformatics/repeatmasker/3.3.0-p1/Libraries
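After copying, a quick sanity check is to confirm that the library file is readable and contains entries. The sketch below relies only on the EMBL flat-file convention that every record begins with an "ID" line; count_embl_entries is a hypothetical helper name, and the count it prints is not compared against any official figure.

```shell
#!/bin/sh
# Sketch: count the records in an EMBL-format repeat library.
# count_embl_entries is a hypothetical helper for illustration.
count_embl_entries() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    # Each EMBL record starts with a line beginning "ID" followed by spaces.
    awk '/^ID / { n++ } END { print n + 0 }' "$1"
}
```

For the installation above, the call would be: count_embl_entries /sw/bioinformatics/repeatmasker/3.3.0-p1/Libraries/RepeatMaskerLib.embl. A result of 0, or a "cannot read" message, suggests the copy step did not complete.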