nucleus.util.cigar -- Utility functions for working with alignment CIGAR operations.
Source code: nucleus/util/cigar.py
Documentation index: doc_index.md
The CIGAR format is defined within the SAM spec, available at https://samtools.github.io/hts-specs/SAMv1.pdf
This module provides utility functions for interacting with the parsed representations of CIGAR strings.
Functions overview
Name | Description |
---|---|
alignment_length (cigar_units) |
Computes the span in basepairs of the cigar units. |
format_cigar_units (cigar_units) |
Returns the string version of an iterable of CigarUnit protos. |
parse_cigar_string (cigar_str) |
Parse a cigar string into a list of cigar units. |
to_cigar_unit (source) |
Creates a cigar_pb2 CigarUnit from source. |
to_cigar_units (source) |
Converts object to a list of CigarUnit. |
Functions
alignment_length(cigar_units)
Computes the span in basepairs of the cigar units.
Args:
cigar_units: iterable[CigarUnit] whose alignment length we want to compute.
Returns:
The number of basepairs spanned by the cigar_units.
format_cigar_units(cigar_units)
Returns the string version of an iterable of CigarUnit protos.
Args:
cigar_units: iterable[CigarUnit] protos.
Returns:
A string representation of the CigarUnit protos that conforms to the
CIGAR string specification.
parse_cigar_string(cigar_str)
Parse a cigar string into a list of cigar units.
For example, if cigar_str is 150M2S, this function will return:
[
CigarUnit(operation=ALIGNMENT_MATCH, operation_length=150),
CigarUnit(operation=SOFT_CLIP, operation_length=2)
]
Args:
cigar_str: str containing a valid cigar.
Returns:
list[cigar_pb2.CigarUnit].
Raises:
ValueError: If cigar_str isn't a well-formed CIGAR.
to_cigar_unit(source)
Creates a cigar_pb2 CigarUnit from source.
This function attempts to convert source into a CigarUnit protobuf. If
source is a string, it must be a single CIGAR string specification like
'12M'. If source is a tuple or a list, must have exactly two elements
(operation_length, opstr). operation_length can be a string or int, and must
be >= 1. opstr should be a single character CIGAR specification (e.g., 'M').
If source is already a CigarUnit, it is just passed through unmodified.
Args:
source: many types allowed. The object we want to convert to a CigarUnit
proto.
Returns:
CigarUnit proto with operation_length and operation set to values from
source.
Raises:
ValueError: if source cannot be converted or is malformed.
to_cigar_units(source)
Converts object to a list of CigarUnit.
This function attempts to convert source into a list of CigarUnit protos.
If source is a string, we assume it is a CIGAR string and call
parse_cigar_string on it, returning the result. It not, we assume it's an
iterable containing element to be converted with to_cigar_unit(). The
resulting list of converted elements is returned.
Args:
source: str or iterable to convert to CigarUnit protos.
Returns:
list[CigarUnit].