Contig documentation.

A contig is a set of sequences, locally aligned to each other, so
that every sequence has overlapping regions with at least one other
sequence in the contig, such that a continuous stretch of overlapping
sequences is formed, allowing the deduction of a consensus sequence
which may be longer than any of the sequences from which it was deduced.
In this documentation we refer to the overlapping sequences used to
build the contig as "aligned sequences" and to the sequence deduced
from the overlap of aligned sequences as the "consensus". Methods to
deduce the consensus sequence from aligned sequences are not yet
implemented in this module, but it is possible to add a consensus
sequence deduced by other means, e.g. by the assembly program used to
build the alignment.
All aligned sequences in a Bio::Assembly::Contig must be Bio::Assembly::Locatable
objects and have a unique ID. The unique ID restriction is due to the
nature of the module's internal data structures and is also a requirement
of some assembly programs. If two sequences with the same ID are added
to a contig, the first sequence added is replaced by the second one.
There are four base coordinate systems in Bio::Assembly::Contig. When
you need to access contig elements or data that exists on a certain
range or location, you may be specifying coordinates in relation to
different sequences, which may be either the contig consensus or one
of the aligned sequences that were used to do the assembly.

 =========================================================
          Name           | Referenced sequence
 ---------------------------------------------------------
   "gapped consensus"    | Contig (with gaps)
   "ungapped consensus"  | Contig (without gaps)
   "aligned $seqID"      | sequence $seqID (with gaps)
   "unaligned $seqID"    | sequence $seqID (without gaps)
 =========================================================

"gapped consensus" refers to positions in the aligned consensus
sequence, which is the consensus sequence including the gaps inserted
to align it against the aligned sequences that were used to assemble
the contig. So, its limits are [ 1, (consensus length + number of gaps
in consensus) ].
"ungapped consensus" is a coordinate system based on the consensus
sequence, but excluding consensus gaps. This is just the coordinate
system that you have when considering the consensus sequence alone,
instead of aligned to other sequences.
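The relation between the two consensus coordinate systems can be illustrated with a minimal sketch in plain Perl. It does not use Bio::Assembly::Contig: the helper gapped_to_ungapped(), the example consensus string and the '-' gap character are assumptions made for illustration only. In this sketch a position that falls on a gap maps to the last residue before the gap.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Assembly::Contig: map a 1-based
# "gapped consensus" position to "ungapped consensus" by subtracting the
# number of gap characters found up to and including that position.
sub gapped_to_ungapped {
    my ($gapped_consensus, $pos, $gap_char) = @_;
    $gap_char = '-' unless defined $gap_char;
    my $prefix = substr($gapped_consensus, 0, $pos);
    my $gaps   = () = $prefix =~ /\Q$gap_char\E/g;   # count gaps in prefix
    return $pos - $gaps;
}

my $consensus = 'AC--GT-A';          # 5 residues and 3 gaps (made-up data)

# Limits of "gapped consensus": [1, consensus length + number of gaps]
my $gapped_length = length $consensus;
(my $ungapped = $consensus) =~ tr/-//d;
my $ungapped_length = length $ungapped;

print "gapped limits:   1..$gapped_length\n";     # 1..8
print "ungapped limits: 1..$ungapped_length\n";   # 1..5
print gapped_to_ungapped($consensus, 5), "\n";    # 3 (the G)
print gapped_to_ungapped($consensus, 4), "\n";    # 2 (gap maps back to the C)
```

Note that this sketch returns 0 for a gap located before the first residue; how such a case should be handled is left open here.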
"aligned $seqID" refers to locations in the sequence $seqID after
alignment of $seqID against the consensus sequence (reverse
complementing the original sequence, if needed). Coordinate 1 in
"aligned $seqID" is equivalent to the start location (first base) of
$seqID in the consensus sequence, just as if the aligned sequence
$seqID were a feature of the consensus sequence.
"unaligned $seqID" is equivalent to a location in the isolated
sequence, just like you would have when considering the sequence
alone, outside of an alignment. When changing coordinates from "aligned
$seq2" to "unaligned $seq2", if $seq2 was reverse complemented when
included in the alignment, the output coordinates will be reversed to
reflect that fact, i.e. 1 will be changed to length($seq2), 2 will be
changed to length($seq2)-1, and so on.
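The reversal described above amounts to simple arithmetic, sketched below in plain Perl. The helper name flip_position() and the sequence length of 10 are assumptions for illustration; the sketch ignores gaps, which a real conversion between "aligned" and "unaligned" coordinates would also have to account for.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Assembly::Contig: convert a
# 1-based position on a reverse complemented sequence back to the
# position on the original (unaligned) sequence. Gaps are ignored.
sub flip_position {
    my ($pos, $length) = @_;
    die "position $pos out of range 1..$length\n"
        if $pos < 1 || $pos > $length;
    return $length - $pos + 1;
}

my $length  = 10;                                  # made-up sequence length
my @flipped = map { flip_position($_, $length) } (1, 2, 10);
print "@flipped\n";                                # prints "10 9 1"
```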
An important note: when you change gap coordinates from a gapped
system ("gapped consensus" or "aligned $seqID") to a system that does
not include gaps ("ungapped consensus" or "unaligned $seqID"), the
position returned will be the first location before all gaps
neighboring the input location.
Bio::Assembly::Contig stores much information about a contig in a
Bio::Assembly::SeqFeature::Collection object. Relevant information on the
alignment is accessed by selecting features based on their primary
tags (e.g. all features which have a primary tag of the form
'_aligned_coord:$seqID', where $seqID is an aligned sequence ID, are
coordinates for sequences in the contig alignment) and, by using
methods from Bio::Assembly::SeqFeature::Collection, it's possible to select
features by overlap with other features.
We suggest that you use the primary tags of features as identifiers
for feature classes. By convention, features with primary tags
starting with a '_' are generated by modules that populate the contig
data structure and return the contig object, possibly as part of an
assembly object, e.g. drivers from the Bio::Assembly::IO set.
Features in the features collection may be associated with particular
aligned sequences. To do this, you must attach the sequence to the
feature, using attach_seq() from Bio::Assembly::SeqFeatureI, before you add the
feature to the feature collection. We also suggest adding the sequence
ID to the primary tag, so that it is easy to select features for a
particular sequence.
There is only one feature class that some methods in
Bio::Assembly::Contig expect to find in the feature collection: features
with primary tags of the form '_aligned_coord:$seqID', where $seqID is
the aligned sequence ID (as returned by $seq->id()). These features
describe the position (in "gapped consensus" coordinates) of aligned
sequences, and the method set_seq_coord() automatically changes a
feature's primary tag to this form whenever the feature is added to
the collection by this method. Only two methods in Bio::Assembly::Contig
will not work unless there are features from this class:
change_coord() and get_seq_coord().
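The primary tag convention can be sketched in plain Perl as follows. This is not the module's implementation: features are represented by plain hash references rather than feature objects, and the helper aligned_coord_tag(), the read IDs, the coordinates and the '_quality_clipping:read1' tag are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the primary tag that marks a feature as the
# "gapped consensus" coordinates of the aligned sequence $seq_id.
sub aligned_coord_tag {
    my ($seq_id) = @_;
    return "_aligned_coord:$seq_id";
}

# Stand-ins for features in a collection (plain hashes, made-up values).
my @features = (
    { primary_tag => aligned_coord_tag('read1'), start => 1,  end => 40 },
    { primary_tag => aligned_coord_tag('read2'), start => 25, end => 70 },
    { primary_tag => '_quality_clipping:read1',  start => 5,  end => 38 },
);

# Select the feature class by its primary tag prefix.
my @coords = grep { $_->{primary_tag} =~ /^_aligned_coord:/ } @features;

# Recover the sequence IDs embedded in the selected tags.
my @ids = map { (split /:/, $_->{primary_tag}, 2)[1] } @coords;
print "@ids\n";                                    # prints "read1 read2"
```

Embedding the sequence ID in the tag is what makes the per-sequence lookup a simple string match.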
Other feature classes will be automatically available only when
Bio::Assembly::Contig objects are created by a specific module. Such
feature classes are (or should be) documented in the documentation of
the module which creates them, to which the user should refer.

Methods

_binary_search
_compare
_nof_gaps
_padded_unpadded
_register_gaps
_unpadded_padded
add_features
add_seq
assembly
average_percentage_identity
change_coord
column_from_residue_number
consensus_iupac
consensus_string
displayname
downstream_neighbor
each_alphabetically
each_seq
each_seq_with_id
gap_char
get_consensus_length
get_consensus_quality
get_consensus_sequence
get_features_collection
get_qual_by_name
get_seq_by_name
get_seq_by_pos
get_seq_coord
get_seq_feat_by_tag
get_seq_ids
id
is_flush
length
map_chars
match
match_char
match_line
maxname_length (no description)
missing_char
new
no_residues
no_sequences
overall_percentage_identity
percentage_identity
purge
remove_features
remove_seq
select
select_noncont
set_consensus_quality
set_consensus_sequence
set_displayname_count
set_displayname_flat
set_displayname_normal
set_seq_coord
set_seq_qual
slice
sort_alphabetically
source
strand
symbol_chars
unmatch
uppercase
upstream_neighbor

Methods description

_binary_search