nucleus.io.gff -- Classes for reading and writing GFF files.
Source code: nucleus/io/gff.py
Documentation index: doc_index.md
The GFF format is described at https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md.
API for reading:

    from nucleus.io import gff

    # Iterate through all records.
    with gff.GffReader(input_path) as reader:
        for record in reader:
            print(record)

where record is a nucleus.genomics.v1.GffRecord protocol buffer.
API for writing:

    from nucleus.io import gff
    from nucleus.protos import gff_pb2

    # records is an iterable of nucleus.genomics.v1.GffRecord protocol buffers.
    records = ...
    header = gff_pb2.GffHeader()

    # Write all records to the desired output path.
    with gff.GffWriter(output_path, header) as writer:
        for record in records:
            writer.write(record)

For both reading and writing, if the path provided to the constructor contains '.tfrecord' as an extension, the file is assumed to be a TFRecord file and is read or written as such. Otherwise, the file is treated as a true GFF file.
A path ending in '.gz' causes the file to be treated as compressed: with BGZF if it is a true GFF file, and with gzip if it is a TFRecord file.
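
The dispatch rules above can be illustrated with a small, standalone sketch. The helper `classify_path` is hypothetical and is not part of nucleus; it only mirrors the rules as stated in this section and assumes that "contains '.tfrecord' as an extension" means one of the dot-separated components of the file name is `tfrecord`.

```python
import os


def classify_path(path):
    """Infers (file_format, compression) from a path.

    Hypothetical helper for illustration only; not a nucleus API.
    """
    name = os.path.basename(path)
    # Everything after the first dot, e.g. 'a.tfrecord.gz' -> ['tfrecord', 'gz'].
    extensions = name.split('.')[1:]
    is_tfrecord = 'tfrecord' in extensions
    is_compressed = name.endswith('.gz')

    file_format = 'tfrecord' if is_tfrecord else 'gff'
    if not is_compressed:
        compression = None
    elif is_tfrecord:
        # Compressed TFRecord files are treated as gzip.
        compression = 'gzip'
    else:
        # Compressed true GFF files are treated as BGZF.
        compression = 'bgzf'
    return file_format, compression
```

Under these assumptions, `classify_path('genes.gff.gz')` would report a BGZF-compressed GFF file, while `classify_path('genes.tfrecord.gz')` would report a gzip-compressed TFRecord file.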
Classes overview
Name | Description |
---|---|
GffReader | Class for reading GffRecord protos from GFF or TFRecord files. |
GffWriter | Class for writing GffRecord protos to GFF or TFRecord files. |
NativeGffReader | Class for reading from native GFF files. |
NativeGffWriter | Class for writing to native GFF files. |
Classes
GffReader
Class for reading GffRecord protos from GFF or TFRecord files.
GffWriter
Class for writing GffRecord protos to GFF or TFRecord files.
NativeGffReader
Class for reading from native GFF files.
Most users will want to use GffReader instead, because it dynamically
dispatches between reading native GFF files and TFRecord files based on the
filename's extension.
Methods:
__init__(self, input_path)
Initializes a NativeGffReader.
Args:
input_path: string. A path to a resource containing GFF records.
iterate(self)
Returns an iterable of GffRecord protos in the file.
query(self)
Returns an iterator for going through the records in the region.
NOTE: NativeGffReader does not currently implement this function, although it
could be implemented for sorted, tabix-indexed GFF files.
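
Since query is not implemented, one possible workaround is to filter records while iterating. The following is a minimal sketch, not a nucleus API: `records_in_region` is a hypothetical helper, and it assumes each record exposes `range.reference_name`, `range.start`, and `range.end` with 0-based, half-open coordinates.

```python
def records_in_region(records, reference_name, start, end):
    """Yields records that overlap [start, end) on reference_name.

    Hypothetical helper for illustration only. Assumes each record has a
    `range` attribute with `reference_name`, `start`, and `end` fields,
    using 0-based, half-open coordinates.
    """
    for record in records:
        rng = record.range
        # Two half-open intervals overlap when each starts before the other ends.
        if (rng.reference_name == reference_name
                and rng.start < end
                and rng.end > start):
            yield record
```

Because a reader can be iterated directly, the helper could be applied as `records_in_region(reader, 'chr1', 0, 1000)`. This is a linear scan over the whole file, so it does not replace an indexed query on large inputs.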
NativeGffWriter
Class for writing to native GFF files.
Most users will want GffWriter, which will write to either native GFF
files or TFRecord files, based on the output filename's extension.
Methods:
__init__(self, output_path, header)
Initializer for NativeGffWriter.
Args:
output_path: str. The path to which to write the GFF file.
header: nucleus.genomics.v1.GffHeader. The header that defines all
information germane to the constituent GFF records.