nucleus.io.genomics_writer -- Classes that provide the interface for writing genomics data.
Source code: nucleus/io/genomics_writer.py
Documentation index: doc_index.md
GenomicsWriter
defines the core API supported by writers, and is subclassed
directly or indirectly (via DispatchingGenomicsWriter
) for all concrete
implementations.
TFRecordWriter
is an implementation of the GenomicsWriter
API for reading
TFRecord
files. This is usable for all data types when writing data as
serialized protocol buffers.
DispatchingGenomicsWriter
is an abstract class defined for convenience on top
of GenomicsWriter
that supports writing to either the native file format or to
TFRecord
files of the corresponding protocol buffer used to encode data of
that file type. The output format chosen is dependent upon the filename to which
the data are being written.
Concrete implementations for individual file types (e.g. BED, SAM, VCF, etc.) reside in type-specific modules in this package. A general example of the write functionality is shown below.
# options is a writer-specific set of options.
options = ...
# records is an iterable of protocol buffers of the specific data type.
records = ...
with GenomicsWriterSubClass(output_path, options) as writer:
for proto in records:
writer.write(proto)
Classes overview
Name | Description |
---|---|
DispatchingGenomicsWriter |
A GenomicsWriter that dispatches based on the file extension. |
GenomicsWriter |
Abstract base class for writing genomics data. |
TFRecordWriter |
A GenomicsWriter that writes to a TFRecord file. |
Classes
DispatchingGenomicsWriter
A GenomicsWriter that dispatches based on the file extension.
If '.tfrecord' is present in the filename, a TFRecordWriter is used.
Otherwise, a native writer is.
Sub-classes of DispatchingGenomicsWriter must define a _native_writer()
method.
Methods:
__init__(self, output_path, **kwargs)
Initializer.
Args:
output_path: str. The output path to which the records are written.
**kwargs: k=v named args. Keyword arguments used to instantiate the native
writer, if applicable.
write(self, proto)
GenomicsWriter
Abstract base class for writing genomics data.
A GenomicsWriter only has one method, write, which writes a single
protocol buffer to a file.
Methods:
write(self, proto)
Writes proto to the file.
Args:
proto: A protocol buffer.
TFRecordWriter
A GenomicsWriter that writes to a TFRecord file.
Example usage:
writer = TFRecordWriter('/tmp/my_output.tfrecord.gz')
for record in records:
writer.write(record)
Note that TFRecord files do not need to be wrapped in a "with" block.
Methods:
__init__(self, output_path, header=None, compression_type=None)
Initializer.
Args:
output_path: str. The output path to which the records are written.
header: An optional header for the particular data type. This can be
useful for file types that have logical headers where some operations
depend on that header information (e.g. VCF using its headers to
determine type information of annotation fields).
compression_type: Either 'ZLIB', 'GZIP', '' (uncompressed), or
None. If None, __init__ will guess the compression type based on
the input_path's suffix.
Raises:
IOError: if there was any problem opening output_path for writing.
close(self)
Explicitly closes writer.
write(self, proto)
Writes the proto to the TFRecord file.