Reptile

What is Reptile?

Reptile is a C++ program for correcting sequencing errors in short reads from next-generation sequencing platforms. Reptile has several favorable properties:

  • Memory efficiency. Reptile can process input data larger than main memory. For instance, processing 160x-coverage Illumina data for E. coli (3.8 GB) requires only about 1 GB of memory, which is readily available on a desktop computer.
  • High speed. Processing Illumina data for a microbe typically takes 0.5 to 2 hours, depending on the number and quality of the reads.
  • Handles reads containing non-ACGT characters and reads of unequal length.
  • Makes simple use of quality score information.

Reptile was developed by Xiao Yang, Karin Dorman, and Srinivas Aluru.

Highlights

  • The largest Illumina GAIIx dataset tested so far contains 60 million reads of 100 bp each from a plant genome. Several test runs used between 4 GB and 11 GB of memory.
  • An experiment on a bacterial dataset showed that, after Reptile correction, the N50 length improved threefold when assembling with the YAGA assembler (default values).
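
For readers unfamiliar with the N50 statistic mentioned above, the following is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name and the contig lengths are made up for illustration; they are not taken from the Reptile experiments or from YAGA.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (in bp), chosen only to illustrate the metric.
before_correction = [100, 80, 60, 40, 20]
after_correction = [240, 60]

n50_before = n50(before_correction)
n50_after = n50(after_correction)
```

In this made-up example both assemblies contain 300 bases in total, but the second one concentrates them in fewer, longer contigs, so its N50 is higher.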

Requirements

  • Perl – for running some preprocessing scripts.
  • GNU make – Reptile is built with the Make build system; GNU make is the implementation we tested.
  • C++ compiler – the GNU C++ compiler is recommended.

Download

The program is available under the GNU Lesser General Public License version 3, with some components under the Boost Software License version 1.0.

Note: the default values of the program parameters are dataset dependent; they vary from dataset to dataset and are therefore not “fixed” or “standard”. Their calculation can be automated, but currently many of them must be set manually using the method explained in the paper, which assumes no information about the reference genome. In general, the default parameters are chosen from the histograms of quality scores, tile occurrences, and so on, of the dataset under consideration.
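
As an illustration of the kind of histogram the note refers to, here is a minimal sketch that tallies base quality scores from FASTQ input. It is not the tool shipped with Reptile. The function names are hypothetical, and the sketch assumes four-line FASTQ records and a Phred+33 quality encoding (older Illumina pipelines used an offset of 64, so the offset may need to change).

```python
from collections import Counter
import io


def quality_histogram(fastq_lines, phred_offset=33):
    """Count how often each Phred quality score occurs in a FASTQ stream.

    Assumes 4-line FASTQ records and the given ASCII offset
    (33 by default; this is an assumption about the input).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # every fourth line holds the quality string
            counts.update(ord(ch) - phred_offset for ch in line.rstrip("\n"))
    return counts


def cumulative_fraction(counts):
    """Return {score: fraction of bases with quality <= score}."""
    total = sum(counts.values())
    running = 0
    result = {}
    for score in sorted(counts):
        running += counts[score]
        result[score] = running / total
    return result


# Two made-up reads, used only to show the shape of the output.
example = io.StringIO("@r1\nACGT\n+\nIII#\n@r2\nACNT\n+\nI5#I\n")
hist = quality_histogram(example)
cumulative = cumulative_fraction(hist)
```

The cumulative fractions show what share of bases falls at or below each score, which is one way to inspect a dataset before choosing a quality cutoff. How such a histogram maps onto Reptile's actual parameters is described in the paper and the readme, not here.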

  • Release 1.0 Click here to download (includes a short documentation (readme) file, all source files, and a preprocessing Perl script)
  • Release 1.1 Click here to download (includes a short documentation (readme) file, a release note, and all source files) – August 2010
  • Release 1.1 update Click here to download (updated folder structure; includes a tool to help choose default parameter values for each input dataset) – April 2011. A parallelized version of Release 1.1 (OpenMP, developed by Daniel S. Standage, http://standage.public.iastate.edu) is also available. Click here to download – June 2011

Tutorial

Here is a brief tutorial demonstrating how to use Reptile, including data preparation, parameter tuning, error correction, and generation of corrected short reads.

References

When using Reptile, please cite:

X. Yang, K. Dorman and S. Aluru, “Reptile: Representative tiling for short read error correction”, Bioinformatics, 26(20), 2526-2533, 2010.

reptile.txt · Last modified: 2012/07/29 14:53 by xyang