Title: MEEPTOOLS: a maximum expected error based FASTQ read filtering and trimming toolkit
Authors: Vishal N. Koparde; Hardik I. Parikh; Steven P. Bradley; Nihar U. Sheth
Addresses: Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for the Study of Biological Complexity and Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for the Study of Biological Complexity and Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA
Abstract: Quality-based trimming of next-generation sequencing (NGS) reads is essential: low-quality bases can cause missed or incorrect alignments downstream, or cause misassembly in de novo assembly by over-representing false k-mers. Trimming algorithms in tools such as sickle and trimmomatic depend either on a running sum of base qualities or on the average base quality within a sliding window. These approaches operate directly on PHRED quality scores, which are exponentially related to the probability of an erroneous base call. Here we present MEEPTOOLS, an open-source toolkit that uses the maximum expected error (MEE) as a percentage of read length (the MEEP score) to filter, trim, truncate and assess NGS data in FASTQ files. By treating read trimming as a minimum subarray problem, MEEPTOOLS can simultaneously retain more reliable bases and remove more unreliable bases than traditional quality-filtering strategies. MEEPTOOLS is readily available for download under the GNU GPLv3 at https://github.com/nisheth/meeptools.
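Illustrative sketch: the abstract defines the MEEP score only in words, so the snippet below shows one plausible reading of it and is not the MEEPTOOLS implementation. It assumes Phred+33 quality encoding, takes the expected error of a read to be the sum of its per-base error probabilities (p = 10^(-Q/10)), and reports that sum as a percentage of read length. The function names `phred_to_error_prob` and `meep_score` are hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a PHRED quality score Q to an error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def meep_score(quality: str, offset: int = 33) -> float:
    """Expected number of erroneous bases as a percentage of read length.

    Assumptions (not taken from the paper): `quality` is a FASTQ quality
    string in Phred+33 encoding, and the expected error of a read is the
    sum of its per-base error probabilities.
    """
    if not quality:
        raise ValueError("quality string must not be empty")
    expected_errors = sum(phred_to_error_prob(ord(c) - offset) for c in quality)
    return 100.0 * expected_errors / len(quality)


# A read made only of Q20 bases ('5' in Phred+33) has a 1% per-base error
# probability, so its score under this definition is about 1.
print(meep_score("5555"))
```

Under this reading, a score of 1 means roughly one expected error per 100 bases. A filter could then keep only reads whose score stays below a user-chosen cutoff.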
Keywords: read trimming; FASTQ; QC; read processing; Illumina.
DOI: 10.1504/IJCBDD.2017.085409
International Journal of Computational Biology and Drug Design, 2017 Vol.10 No.3, pp.237 - 247
Received: 17 Aug 2016
Accepted: 13 Feb 2017
Published online: 25 Jul 2017