
yttria-aniseia/fastq-lengths


Introduction

fastq-lengths is a simple utility for quickly determining a limited set of summary statistics related to the length of sequences in a FASTQ file.

fastq-lengths generally assumes well-formed FASTQ and does little validation. Unlike some tools, however, it does not assume that each sequence or quality entry sits on a single line: if the FASTQ file conforms to the spec, it should work.
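To illustrate what handling multi-line entries involves, here is a minimal sketch in C of reading one record whose sequence and quality may wrap across lines. It is not taken from the fastq-lengths source; the names `line_length` and `next_record_length` are hypothetical, and the approach (sequence ends at the `+` line, quality ends once it is as long as the sequence) is one common way to parse the format, assumed here for illustration.

```c
#include <stdio.h>

/* Count the characters on the current line, excluding the line terminator.
 * Stores the first character of the line in *first_char.
 * Returns -1 if end of input is reached before any character is read. */
static long line_length(FILE *fp, int *first_char)
{
    int c = fgetc(fp);
    if (c == EOF)
        return -1;
    *first_char = c;

    long n = 0;
    while (c != EOF && c != '\n') {
        if (c != '\r')          /* tolerate CRLF line endings */
            n++;
        c = fgetc(fp);
    }
    return n;
}

/* Hypothetical helper: read one FASTQ record and return its sequence length.
 * Returns -1 at end of input or if the record is malformed or truncated. */
long next_record_length(FILE *fp)
{
    int first;
    long n;

    /* Header line: must start with '@'. */
    n = line_length(fp, &first);
    if (n < 0 || first != '@')
        return -1;

    /* Sequence lines: accumulate lengths until the '+' separator line. */
    long seq_len = 0;
    for (;;) {
        n = line_length(fp, &first);
        if (n < 0)
            return -1;          /* input ended inside a record */
        if (first == '+')
            break;
        seq_len += n;
    }

    /* Quality lines: '@' and '+' are valid quality characters, so a line's
     * first character cannot mark the end of the record. Instead, stop once
     * as many quality characters as sequence characters have been read. */
    long qual_len = 0;
    while (qual_len < seq_len) {
        n = line_length(fp, &first);
        if (n < 0)
            return -1;
        qual_len += n;
    }
    return seq_len;
}
```

Calling `next_record_length` in a loop until it returns -1 yields one length per record. The sketch reads character by character for simplicity and makes no attempt to match the tool's speed.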

It has no external dependencies beyond functions in the POSIX C standard library.
Finding the median sequence length is approximately 12x faster than the naive equivalent using Python and Biopython's SeqIO, with runtime memory usage in the KB range (vs. tens of MB for Python).

usage:
fastq-lengths [subcommand [stopafter]] file.fastq
    subcommand: lengths | median | count
        lengths - report all sequence lengths and occurrence counts (default)
        median  - report the median sequence length
        count   - report the total number of records in the file

    stopafter:  only read the first <stopafter> records in the file, e.g.
                when 1000 records is enough to characterize the contents.
fastq-lengths lengths 100 example.fq
32  1
150 99
fastq-lengths median example.fq
150
fastq-lengths count example.fq
45101612
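
The `lengths` output above (length, then occurrence count) is enough to derive a median without storing every individual length. The following C sketch shows one way to do that from a table of counts. It is an illustration only, not the tool's actual implementation: the function name `median_from_counts` is hypothetical, and taking the lower median when the record count is even is an assumption, since the README does not state the convention used.

```c
#include <stddef.h>

/* counts[len] holds how many records have sequence length len.
 * Returns the lower median length, or -1 if there are no records.
 * (Hypothetical helper; the lower-median convention is an assumption.) */
long median_from_counts(const unsigned long *counts, size_t n_lengths)
{
    /* First pass: total number of records. */
    unsigned long total = 0;
    for (size_t i = 0; i < n_lengths; i++)
        total += counts[i];
    if (total == 0)
        return -1;

    /* 1-based rank of the lower median among the sorted lengths. */
    unsigned long target = (total + 1) / 2;

    /* Second pass: walk lengths in increasing order until the running
     * count reaches the target rank. */
    unsigned long seen = 0;
    for (size_t i = 0; i < n_lengths; i++) {
        seen += counts[i];
        if (seen >= target)
            return (long)i;
    }
    return -1; /* not reached when total > 0 */
}
```

With the example above (one record of length 32 and 99 of length 150), the running count reaches the target rank of 50 at length 150. Memory use in this scheme grows with the longest sequence length rather than with the number of records.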

Build

git clone git@github.com:yttria-aniseia/fastq-lengths.git
cd fastq-lengths
make
    Command being timed: "./fastq-lengths median ERR1757416_2.fastq"
    User time (seconds): 0.45
    System time (seconds): 0.00
    Percent of CPU this job got: 100%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.45
    Maximum resident set size (kbytes): 1436