FastaRead
— FASTA file reader module¶
This module provides the type FastaReader
which allows to parse FASTA files in Julia.
It is designed to be lightweight and fast, and it’s inspired to
kseq.h. It reads files on the fly,
keeping only one entry at a time in memory, and it can read gzip-compressed files.
Here is a quick example:
julia> using FastaRead
julia> fr = FastaReader("somefile.fasta")
julia> for (name, seq) in fr
println("$name : $seq")
end
julia> close(fr)
Usage¶
There is one central type provided by the module, FastaReader
, which takes a file name as argument, and has
a parameter to set the output type:
-
FastaReader{T}(filename::String)
creates an object which is able to parse FASTA files. The file can be in plain text format or in gzip-compressed format. The type
T
determines the output type of the sequences: the default isASCIIString
, which is the fastest option; another fast output type isVector{Uint8}
, which is less readable but has the advantage of being mutable; any other containerT
for whichconvert(T, ::Vector{Uint8})
is defined will work, but it will pass through a conversion step and therefore be slower.The data can be read out by iterating the
FastaReader
object:for (name, seq) in FastaReader("somefile.fasta") # do something with name and seq end
As shown, the iterator returns a tuple containing the description (always an
ASCIIString
) and the data (whose type is set when creating theFastaReader
object (e.g.FastaReader{Vector{Uint8}}(filename)
).The
FastaReader
type has a fieldnum_parsed
which contains the number of entries parsed so far.Other ways to read out the data are via the
readentry()
andreadall()
functions.
-
readentry
(fr::FastaReader)¶ This function can be used to read entries one at a time:
fr = FastaReader("somefile.fasta") name, seq = readentry(fr)
See also the
eof()
function.
-
readall
(fr::FastaReader)¶ This function extends
Base.readall()
: it parses a whole FASTA file at once, and returns an array of tuples, each one containing the description and the sequence.
-
rewind
(fr::FastaReader)¶ This function rewinds the reader, so that it can restart the parsing again without closing and re-opening it. It also resets the value of the
num_parsed
field.
-
eof
(fr::FastaReader)¶ This function extends
Base.eof()
and tests for end-of-file condition; it is useful when usingreadentry()
:fr = FastaReader("somefile.fasta") while !eof(fr) name, seq = readentry(fr) # do something end close(fr)
-
close
(fr::FastaReader)¶ This function extends
Base.close()
and closes the stream associated with theFastaReader
; the reader must not be used any more after this function is called.