=cut
#-----
sub n { my $self = shift; $self->{'_n'} || ''; }
#-----
=head2 frame
Usage : $hsp_obj->frame()
Purpose : Get the reading frame number (-/+ 1, 2, 3) (TBLASTN/X only).
Returns : Integer or null string if not defined.
Argument : n/a
Throws : n/a
=cut
#---------
sub frame { my $self = shift; $self->{'_frame'} || ''; }
#---------
=head2 signif()
Usage : $hsp_obj->signif()
Purpose : Get the P-value or Expect value for the HSP.
Returns : Float (0.001 or 1.3e-43)
: Returns P-value if it is defined, otherwise, Expect value.
Argument : n/a
Throws : n/a
Comments : Provided for consistency with Sbjct::signif()
: Support for returning the significance data in different
: formats (e.g., exponent only), is not provided for HSP objects.
: This is only available for the Sbjct or Blast object.
See Also : L, L, B
=cut
#-----------
sub signif {
#-----------
my $self = shift;
my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
$val;
}
=head2 expect
Usage : $hsp_obj->expect()
Purpose : Get the Expect value for the HSP.
Returns : Float (0.001 or 1.3e-43)
Argument : n/a
Throws : n/a
Comments : Support for returning the expectation data in different
: formats (e.g., exponent only), is not provided for HSP objects.
: This is only available for the Sbjct or Blast object.
See Also : L
=cut
#----------
sub expect { my $self = shift; $self->{'_expect'}; }
#----------
=head2 p
Usage : $hsp_obj->p()
Purpose : Get the P-value for the HSP.
Returns : Float (0.001 or 1.3e-43) or undef if not defined.
Argument : n/a
Throws : n/a
Comments : P-value is not defined with NCBI Blast2 reports.
: Support for returning the expectation data in different
: formats (e.g., exponent only) is not provided for HSP objects.
: This is only available for the Sbjct or Blast object.
See Also : L
=cut
#-----
sub p { my $self = shift; $self->{'_p'}; }
#-----
=head2 length
Usage : $hsp->length( [seq_type] )
Purpose : Get the length of the aligned portion of the query or sbjct.
Example : $hsp->length('query')
Returns : integer
Argument : seq_type: 'query' | 'sbjct' | 'total' (default = 'total')
Throws : n/a
Comments : 'total' length is the full length of the alignment
: as reported in the denominators in the alignment section:
: "Identical = 34/120 Positives = 67/120".
: Developer note: when using the built-in length function within
: this module, call it as CORE::length().
See Also : L
=cut
#-----------
sub length {
#-----------
my( $self, $type ) = @_;
$type ||= 'total';
$type ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
## Sensitive to member name format.
$type = "_\L$type\E";
$self->{$type.'Length'};
}
=head2 gaps
Usage : $hsp->gaps( [seq_type] )
Purpose : Get the number of gaps in the query, sbjct, or total alignment.
: Also can return query gaps and sbjct gaps as a two-element list
: when in array context.
Example : $total_gaps = $hsp->gaps();
: ($qgaps, $sgaps) = $hsp->gaps();
: $qgaps = $hsp->gaps('query');
Returns : scalar context: integer
: array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
Argument : seq_type: 'query' | 'sbjct' | 'total'
: (default = 'total', scalar context)
: Array context can be "induced" by providing an argument of 'list' or 'array'.
Throws : n/a
See Also : L, L
=cut
#---------
sub gaps {
#---------
my( $self, $seqType ) = @_;
$self->_set_seq_data() unless $self->{'_set_seq_data'};
$seqType ||= (wantarray ? 'list' : 'total');
if($seqType =~ /list|array/i) {
return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
}
if($seqType eq 'total') {
return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
} else {
## Sensitive to member name format.
$seqType = "_\L$seqType\E";
return $self->{$seqType.'Gaps'} || 0;
}
}
=head2 matches
Usage : $hsp->matches([seq_type], [start], [stop]);
Purpose : Get the total number of identical and conservative matches
: in the query or sbjct sequence for the given HSP. Optionally can
: report data within a defined interval along the seq.
: (Note: 'conservative' matches are called 'positives' in the
: Blast report.)
Example : ($id,$cons) = $hsp_object->matches('sbjct');
: ($id,$cons) = $hsp_object->matches('query',300,400);
Returns : 2-element array of integers
Argument : (1) seq_type = 'query' | 'sbjct' (default = query)
: (2) start = Starting coordinate (optional)
: (3) stop = Ending coordinate (optional)
Throws : Exception if the supplied coordinates are out of range.
Comments : Relies on seq_str('match') to get the string of alignment symbols
: between the query and sbjct lines which are used for determining
: the number of identical and conservative matches.
See Also : L, L, L, B
=cut
#-----------
sub matches {
#-----------
my( $self, %param ) = @_;
my(@data);
my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});
$seqType ||= 'query';
if(!defined $beg && !defined $end) {
## Get data for the whole alignment.
push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
} else {
## Get the substring representing the desired sub-section of aln.
$beg ||= 0;
$end ||= 0;
my($start,$stop) = $self->range($seqType);
if($beg == 0) { $beg = $start; $end = $beg+$end; }
elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
else { $end += 1;} ##ML moved from commented position below, makes
##more sense here
# if($end > $stop) { $end = $stop; }
if($beg < $start) { $beg = $start; }
# else { $end += 1;}
# my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));
## ML: START fix for substr out of range error ------------------
my $seq = "";
if (($self->{'_prog'} eq 'TBLASTN') and ($seqType eq 'sbjct'))
{
$seq = substr($self->seq_str('match'),
int(($beg-$start)/3), int(($end-$beg+1)/3));
} elsif (($self->{'_prog'} eq 'BLASTX') and ($seqType eq 'query'))
{
$seq = substr($self->seq_str('match'),
int(($beg-$start)/3), int(($end-$beg+1)/3));
} else {
$seq = substr($self->seq_str('match'),
$beg-$start, ($end-$beg));
}
## ML: End of fix for substr out of range error -----------------
## ML: debugging code
## This is where we get our exception. Try printing out the values going
## into this:
##
# print STDERR
# qq(*------------MY EXCEPTION --------------------\nSeq: ") ,
# $self->seq_str("$seqType"), qq("\n),$self->name,",( index:";
# print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:",
# CORE::length $self->seq_str("$seqType");
# print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ),
# ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n";
## ML: END DEBUGGING CODE----------
if(!CORE::length $seq) {
$self->throw("Undefined sub-sequence ($beg,$end). Valid range = $start - $stop");
}
## Get data for a substring.
# printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
# printf "Original match seq:\n%s\n",$self->seq_str('match');
$seq =~ s/ //g; # remove space (no info).
my $len_cons = CORE::length $seq;
$seq =~ s/\+//g; # remove '+' characters (conservative substitutions)
my $len_id = CORE::length $seq;
push @data, ($len_id, $len_cons);
# printf " HSP = %s\n id = %d; cons = %d\n", $self->name, $len_id, $len_cons; ;
}
@data;
}
=head2 frac_identical
Usage : $hsp_object->frac_identical( [seq_type] );
Purpose : Get the fraction of identical positions within the given HSP.
Example : $frac_iden = $hsp_object->frac_identical('query');
Returns : Float (2-decimal precision, e.g., 0.75).
Argument : seq_type: 'query' | 'sbjct' | 'total'
: default = 'total' (but see comments below).
Throws : n/a
Comments : Different versions of Blast report different values for the total
: length of the alignment. This is the number reported in the
: denominators in the stats section:
: "Identical = 34/120 Positives = 67/120".
: BLAST-GP uses the total length of the alignment (with gaps)
: WU-BLAST uses the length of the query sequence (without gaps).
: Therefore, when called without an argument or an argument of 'total',
: this method will report different values depending on the
: version of BLAST used.
:
: To get the fraction identical among only the aligned residues,
: ignoring the gaps, call this method with an argument of 'query'
: or 'sbjct'.
See Also : L, L, L
=cut
#-------------------
sub frac_identical {
#-------------------
# The value is calculated as opposed to storing it from the parsed results.
# This saves storage and also permits flexibility in determining for which
# sequence (query or sbjct) the figure is to be calculated.
my( $self, $seqType ) = @_;
$seqType ||= 'total';
if($seqType ne 'total') {
$self->_set_seq_data() unless $self->{'_set_seq_data'};
}
## Sensitive to member name format.
$seqType = "_\L$seqType\E";
sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
}
=head2 frac_conserved
Usage : $hsp_object->frac_conserved( [seq_type] );
Purpose : Get the fraction of conserved positions within the given HSP.
: (Note: 'conservative' positions are called 'positives' in the
: Blast report.)
Example : $frac_cons = $hsp_object->frac_conserved('query');
Returns : Float (2-decimal precision, e.g., 0.75).
Argument : seq_type: 'query' | 'sbjct'
: default = 'total' (but see comments below).
Throws : n/a
Comments : Different versions of Blast report different values for the total
: length of the alignment. This is the number reported in the
: denominators in the stats section:
: "Identical = 34/120 Positives = 67/120".
: BLAST-GP uses the total length of the alignment (with gaps)
: WU-BLAST uses the length of the query sequence (without gaps).
: Therefore, when called without an argument or an argument of 'total',
: this method will report different values depending on the
: version of BLAST used.
:
: To get the fraction conserved among only the aligned residues,
: ignoring the gaps, call this method with an argument of 'query'
: or 'sbjct'.
See Also : L, L, L
=cut
#--------------------
sub frac_conserved {
#--------------------
# The value is calculated as opposed to storing it from the parsed results.
# This saves storage and also permits flexibility in determining for which
# sequence (query or sbjct) the figure is to be calculated.
my( $self, $seqType ) = @_;
$seqType ||= 'total';
if($seqType ne 'total') {
$self->_set_seq_data() unless $self->{'_set_seq_data'};
}
## Sensitive to member name format.
$seqType = "_\L$seqType\E";
sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
}
=head2 num_identical
Usage : $hsp_object->num_identical();
Purpose : Get the number of identical positions within the given HSP.
Example : $num_iden = $hsp_object->num_identical();
Returns : integer
Argument : n/a
Throws : n/a
See Also : L, L
=cut
#-------------------
sub num_identical {
#-------------------
my( $self) = shift;
$self->{'_numIdentical'};
}
=head2 num_conserved
Usage : $hsp_object->num_conserved();
Purpose : Get the number of conserved positions within the given HSP.
Example : $num_iden = $hsp_object->num_conserved();
Returns : integer
Argument : n/a
Throws : n/a
See Also : L, L
=cut
#-------------------
sub num_conserved {
#-------------------
my( $self) = shift;
$self->{'_numConserved'};
}
=head2 range
Usage : $hsp->range( [seq_type] );
Purpose : Gets the (start, end) coordinates for the query or sbjct sequence
: in the HSP alignment.
Example : ($qbeg, $qend) = $hsp->range('query');
: ($sbeg, $send) = $hsp->range('sbjct');
Returns : Two-element array of integers
Argument : seq_type = string, 'query' or 'sbjct' (default = 'query')
: (case insensitive).
Throws : n/a
See Also : L, L
=cut
#----------
sub range {
#----------
my ($self, $seqType) = @_;
$self->_set_seq_data() unless $self->{'_set_seq_data'};
$seqType ||= 'query';
## Sensitive to member name changes.
$seqType = "_\L$seqType\E";
return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
}
=head2 start
Usage : $hsp->start( [seq_type] );
Purpose : Gets the start coordinate for the query, sbjct, or both sequences
: in the HSP alignment.
Example : $qbeg = $hsp->start('query');
: $sbeg = $hsp->start('sbjct');
: ($qbeg, $sbeg) = $hsp->start();
Returns : scalar context: integer
: array context without args: list of two integers
Argument : In scalar context: seq_type = 'query' or 'sbjct'
: (case insensitive). If not supplied, 'query' is used.
: Array context can be "induced" by providing an argument of 'list' or 'array'.
Throws : n/a
See Also : L, L
=cut
#----------
sub start {
#----------
my ($self, $seqType) = @_;
$seqType ||= (wantarray ? 'list' : 'query');
$self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /list|array/i) {
return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
} else {
## Sensitive to member name changes.
$seqType = "_\L$seqType\E";
return $self->{$seqType.'Start'};
}
}
=head2 end
Usage : $hsp->end( [seq_type] );
Purpose : Gets the end coordinate for the query, sbjct, or both sequences
: in the HSP alignment.
Example : $qbeg = $hsp->end('query');
: $sbeg = $hsp->end('sbjct');
: ($qbeg, $sbeg) = $hsp->end();
Returns : scalar context: integer
: array context without args: list of two integers
Argument : In scalar context: seq_type = 'query' or 'sbjct'
: (case insensitive). If not supplied, 'query' is used.
: Array context can be "induced" by providing an argument of 'list' or 'array'.
Throws : n/a
See Also : L, L
=cut
#----------
sub end {
#----------
my ($self, $seqType) = @_;
$seqType ||= (wantarray ? 'list' : 'query');
$self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /list|array/i) {
return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
} else {
## Sensitive to member name changes.
$seqType = "_\L$seqType\E";
return $self->{$seqType.'Stop'};
}
}
=head2 strand
Usage : $hsp_object->strand( [seq_type] )
Purpose : Get the strand of the query or sbjct sequence.
Example : print $hsp->strand('query');
: ($qstrand, $sstrand) = $hsp->strand();
Returns : -1, 0, or 1
: -1 = Minus strand, +1 = Plus strand
: Returns 0 if strand is not defined, which occurs
: for non-TBLASTN/X reports.
: In scalar context without arguments, returns queryStrand value.
: In array context without arguments, returns a two-element list
: of strings (queryStrand, sbjctStrand).
: Array context can be "induced" by providing an argument of 'list' or 'array'.
Argument : seq_type: 'query' | 'sbjct' or undef
Throws : n/a
See Also : L<_set_seq()|_set_seq>, L<_set_match_stats()|_set_match_stats>
=cut
#-----------
sub strand {
#-----------
my( $self, $seqType ) = @_;
$seqType ||= (wantarray ? 'list' : 'query');
return '' if $seqType eq 'query' and $self->{'_prog'} eq 'TBLASTN';
## Sensitive to member name format.
$seqType = "_\L$seqType\E";
# $seqType could be '_list'.
$self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /list|array/i) {
return ('','') unless defined $self->{'_queryStrand'};
return ($self->{'_queryStrand'}, $self->{'_sbjctStrand'});
}
local $^W = 0;
$STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
}
#####################################################################################
## INSTANCE METHODS ##
#####################################################################################
=head2 seq
Usage : $hsp->seq( [seq_type] );
Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object.
Example : $seqObj = $hsp->seq('query');
Returns : Object reference for a Bio::Seq.pm object.
Argument : seq_type = 'query' or 'sbjct' (default = 'query').
Throws : Propagates any exception that occurs during construction
: of the Bio::Seq.pm object.
Comments : The sequence is returned in an array of strings corresponding
: to the strings in the original format of the Blast alignment.
: (i.e., same spacing).
See Also : L, L, B
=cut
#-------
sub seq {
#-------
my($self,$seqType) = @_;
$seqType ||= 'query';
my $str = $self->seq_str($seqType);
my $num = $self->name;
my $name = $seqType =~ /query/i
? $self->parent->parent->name
: $self->parent->name;
require Bio::Seq;
new Bio::Seq (-ID => $name,
-SEQ => $str,
-DESC => "Blast HSP #$num, $seqType sequence",
);
}
=head2 seq_str
Usage : $hsp->seq_str( seq_type );
Purpose : Get the full query, sbjct, or 'match' sequence as a string.
: The 'match' sequence is the string of symbols in between the
: query and sbjct sequences.
Example : $str = $hsp->seq_str('query');
Returns : String
Argument : seq_Type = 'query' or 'sbjct' or 'match'
Throws : Exception if the argument does not match an accepted seq_type.
Comments : Calls _set_residues() to set the 'match' sequence if it has
: not been set already.
See Also : L, L, L<_set_match_seq()|_set_match_seq>
=cut
#------------
sub seq_str {
#------------
my($self,$seqType) = @_;
## Sensitive to member name changes.
$seqType = "_\L$seqType\E";
$self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /sbjct|query/) {
my $seq = join('',@{$self->{$seqType.'Seq'}});
$seq =~ s/\s+//g;
return $seq;
} elsif( $seqType =~ /match/i) {
# Only need to call _set_match_seq() if the match seq is requested.
my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
$aref = $self->{'_matchSeq'};
## DEBUGGING CODE:
# if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') {
# print "seq_str():\n @$aref";;
# }
return join('',@$aref);
} else {
$self->throw("Invalid or undefined sequence type: $seqType",
"Valid types: query, sbjct, match");
}
}
=head2 seq_inds
Usage : $hsp->seq_inds( seq_type, class, collapse );
Purpose : Get a list of residue positions (indices) for all identical
: or conserved residues in the query or sbjct sequence.
Example : @ind = $hsp->seq_inds('query', 'identical');
: @ind = $hsp->seq_inds('sbjct', 'conserved');
: @ind = $hsp->seq_inds('sbjct', 'conserved', 1);
Returns : List of integers
: May include ranges if collapse is true.
Argument : seq_type = 'query' or 'sbjct' (default = query)
: class = 'identical' or 'conserved' (default = identical)
: (can be shortened to 'id' or 'cons')
: (actually, anything not 'id' will evaluate to 'conserved').
: collapse = boolean, if true, consecutive positions are merged
: using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
: collapses to "1-5 7 9-11". This is useful for
: consolidating long lists. Default = no collapse.
Throws : n/a.
Comments : Calls _set_residues() to set the 'match' sequence if it has
: not been set already.
See Also : L, L<_set_residues()|_set_residues>, L, B
=cut
#---------------
sub seq_inds {
#---------------
my ($self, $seq, $class, $collapse) = @_;
$seq ||= 'query';
$class ||= 'identical';
$collapse ||= 0;
$self->_set_residues() unless defined $self->{'_identicalRes_query'};
$seq = ($seq !~ /^q/i ? 'sbjct' : 'query');
$class = ($class !~ /^id/i ? 'conserved' : 'identical');
## Sensitive to member name changes.
$seq = "_\L$seq\E";
$class = "_\L$class\E";
my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seq"}};
return $collapse ? &collapse_nums(@ary) : @ary;
}
=head2 get_aln
Usage : $hsp->get_aln()
Purpose : Get a Bio::UnivAln.pm object constructed from the query + sbjct
: sequences of the present HSP object.
Example : $aln_obj = $hsp->get_aln();
Returns : Object reference for a Bio::UnivAln.pm object.
Argument : n/a.
Throws : Propagates any exception ocurring during the construction of
: the Bio::UnivAln object.
Comments : Requires Bio::UnivAln.pm.
: The Bio::UnivAln.pm object is constructed from the query + sbjct
: sequence objects obtained by calling seq().
: Gap residues are included (see $GAP_SYMBOL). It is important that
: Bio::UnivAln.pm recognizes the gaps correctly. A strategy for doing
: this is being considered. Currently it is hard-wired.
See Also : L, B
=cut
#------------
sub get_aln {
#------------
my $self = shift;
require Bio::UnivAln;
my $qseq = $self->seq('query');
my $sseq = $self->seq('sbjct');
my $desc = sprintf "HSP #%s of query %s vs. sbjct %s",
$self->name, $self->parent->parent->name, $self->parent->name;
my $type = $self->{'_prog'} =~ /P$|^T/ ? 'amino' : 'dna';
Bio::UnivAln->new( -seqs => [$qseq, $sseq],
-desc => $desc,
-type => $type,
);
}
=head2 display
Usage : $sbjct_object->display( %named_parameters );
Purpose : Display information about Bio::Tools::Blast::Sbjct.pm data members
: including: length, gaps, score, significance value,
: sequences and sequence indices.
Example : $object->display(-SHOW=>'stats');
Argument : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE)
: -SHOW => 'hsp',
: -WHERE => filehandle (default = STDOUT)
Returns : n/a
Status : Experimental
Comments : For more control over the display of sequence data,
: use seq(), seq_str(), seq_inds().
See Also : L<_display_seq()|_display_seq>, L, L, L, L<_display_matches()|_display_matches>, B
=cut
#-----------
sub display {
#-----------
my( $self, %param ) = @_;
my $sbjctName = $self->parent->name();
my $queryName = $self->parent->parent->name();
my $layout = $self->parent->parent->_layout();
my $OUT = $self->set_display(%param);
printf( $OUT "%-15s: %d\n", "LENGTH TOTAL", $self->length('total') );
printf( $OUT "%-15s: %d\n", "LENGTH QUERY", $self->length('query') );
printf( $OUT "%-15s: %d\n", "LENGTH SBJCT", $self->length('sbjct') );
printf( $OUT "%-15s: %d\n", "GAPS QUERY", $self->gaps('query') );
printf( $OUT "%-15s: %d\n", "GAPS SBJCT", $self->gaps('sbjct') );
printf( $OUT "%-15s: %d\n", "SCORE", $self->{'_score'} );
printf( $OUT "%-15s: %0.1f\n", "BITS", $self->{'_bits'} );
if($layout == 1) {
printf( $OUT "%-15s: %.1e\n", "P-VAL", $self->{'_p'} );
printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} );
} else {
printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} );
}
my $queryLength = $self->length('query');
printf( $OUT "%-15s: %d (%0.0f%%)\n", "IDENTICAL", $self->{'_numIdentical'},
$self->{'_numIdentical'}/$queryLength * 100 );
printf( $OUT "%-15s: %d (%0.0f%%) %s \n", "CONSERVED", $self->{'_numConserved'},
$self->{'_numConserved'}/$queryLength * 100,
"includes identical" );
$self->_display_seq('query', $queryName, $OUT);
$self->_display_seq('sbjct', $sbjctName, $OUT);
$self->_display_matches($queryName, $sbjctName, $OUT);
}
=head2 _display_seq
Usage : n/a; called automatically by display()
Purpose : Display information about query and sbjct HSP sequences.
: Prints the start, stop coordinates and the actual sequence.
Example : n/a
Argument :
Returns : printf call.
Status : Experimental
Comments : For more control, use seq(), seq_str(), or seq_inds().
See Also : L, L, L, L, L<_display_matches()|_display_matches>
=cut
#------------------
sub _display_seq {
#------------------
my( $self, $seqType, $name, $OUT ) = @_;
$self->_set_seq_data() unless $self->{'_set_seq_data'};
# Sensitive to member name changes.
my $mem = "_\L$seqType\E";
printf( $OUT "\n%10s: %s\n%10s %s\n", "\U$seqType\E", "$name", "-----",
('-'x ((CORE::length $name) + 2)) );
printf( $OUT "%13s: %d\n", "START", $self->{$mem.'Start'} );
printf( $OUT "%13s: %d\n", "STOP", $self->{$mem.'Stop'} );
printf( $OUT "%13s: \n", "SEQ" );
foreach( @{ $self->{$mem.'Seq'}} ) {
printf( $OUT "%15s%s\n", "", $_);
}
}
=head2 _display_matches
Usage : n/a; called automatically by display()
Purpose : Display information about identical and conserved positions
: within both the query and sbjct sequences.
Example : n/a
Argument :
Returns : printf call.
Status : Experimental
Comments : For more control, use seq_inds().
See Also : L, L, L<_display_seq()|_display_seq>,
=cut
#--------------------
sub _display_matches {
#--------------------
my( $self, $queryName, $sbjctName, $OUT) = @_;
my($resNum, $count);
$self->_set_residues() unless defined $self->{'_identicalRes_query'};
printf( $OUT "\n%10s: \n%10s\n", "HITS", "-----" );
foreach( @{ $self->{'_matchSeq'}} ) {
printf( $OUT "%15s%s\n", "", $_ );
}
print $OUT "\n\U$queryName\E\n------------\n";
printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $queryName (n=$self->{'_numIdentical'})",
"", "--------------------------------------------" );
$count = 0;
foreach $resNum ( sort keys %{ $self->{'_identicalRes_query' }} ) {
$count++;
print $OUT "$resNum";
$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
}
print $OUT "\n";
my $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'});
printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $queryName (n=$justConserved)",
"", "--------------------------------------------" );
$count = 0;
foreach $resNum ( sort keys %{ $self->{'_conservedRes_query' }} ) {
$count++;
print $OUT "$resNum";
$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
}
print $OUT "\n\n\U$sbjctName\E\n------------\n";
printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $sbjctName (n=$self->{'_numIdentical'})",
"", "--------------------------------------------" );
$count = 0;
foreach $resNum ( sort keys %{ $self->{'_identicalRes_sbjct' }} ) {
$count++;
print $OUT "$resNum";
$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
}
print $OUT "\n";
$justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'});
printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $sbjctName (n=$justConserved)",
"", "--------------------------------------------" );
$count = 0;
foreach $resNum ( sort keys %{ $self->{'_conservedRes_sbjct' }} ) {
$count++;
print $OUT "$resNum";
$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
}
}
=head2 homol_data
Usage : $data = $hsp_object->homo_data( %named_params );
Purpose : Gets similarity data for a single HSP.
Returns : String:
: "Homology data" for each HSP is in the format:
: " "
: where integer is the value returned by homol_score().
Argument : Named params: (UPPER OR LOWERCASE TAGS)
: currently just one param is used:
: -SEQ =>'query' or 'sbjct'
Throws : n/a
Status : Experimental
Comments : This is a very experimental method used for obtaining a
: coarse indication of:
: 1) how strong the similarity is between the sequences in the HSP,
: 3) the endpoints of the alignment (sequence monomer numbers)
See Also : L, B, B
=cut
#---------------
sub homol_data {
#---------------
my ($self, %param) = @_;
my $seq = $param{-SEQ} || $param{'-seq'} || 'sbjct'; # 'query' or 'sbjct'
my $homolScore = $self->homol_score();
# Sensitive to member name changes.
$seq = "_\L$seq\E";
$self->_set_seq_data() unless $self->{'_set_seq_data'};
return ( $homolScore.' '.$self->{$seq.'Start'}.' '.$self->{$seq.'Stop'});
}
=head2 homol_score
Usage : $self->homol_score();
Purpose : Get a homology score (integer 1 - 3) as a coarse representation of
: the strength of the similarity independent of sequence composition.
: Based on the Blast bit score.
Example : $hscore = $hsp->homol_score();
Returns : Integer
Argument : n/a
Throws : n/a
Status : Experimental
Comments : See @Bio::Tools::Blast::HSP::SCORE_CUTOFFS for the specific values.
: Currently, BIT_SCORE HOMOL_SCORE
: --------- -----------
: >=100 --> 3
: 30-100 --> 2
: < 30 --> 1
See Also : L
=cut
#----------------
sub homol_score {
#----------------
my $self = shift;
if( $self->{'_bits'} >= $SCORE_CUTOFFS[0] ) { 1 }
elsif($self->{'_bits'} < $SCORE_CUTOFFS[0] and
$self->{'_bits'} >= $SCORE_CUTOFFS[1] ) { 2 }
else { 3 }
}
#####################################################################################
## CLASS METHODS ##
#####################################################################################
=head1 CLASS METHODS
=head2 collapse_nums
Usage : @cnums = collapse_nums( @numbers );
Purpose : Collapses a list of numbers into a set of ranges of consecutive terms:
: Useful for condensing long lists of consecutive numbers.
: EXPANDED:
: 1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
: COLLAPSED:
: 1-6 10 12-15 17 18 20-22 24 26 30-32
Argument : List of numbers and sorted numerically.
Returns : List of numbers mixed with ranges of numbers (see above).
Throws : n/a
Comments : Probably belongs in a more general utility class.
See Also : L
=cut
#------------------
sub collapse_nums {
#------------------
# This is not the slickest connectivity algorithm, but will do for now.
my @a = @_;
my ($from, $to, $i, @ca, $consec);
$consec = 0;
for($i=0; $i < @a; $i++) {
not $from and do{ $from = $a[$i]; next; };
if($a[$i] == $a[$i-1]+1) {
$to = $a[$i];
$consec++;
} else {
if($consec == 1) { $from .= ",$to"; }
else { $from .= $consec>1 ? "\-$to" : ""; }
push @ca, split(',', $from);
$from = $a[$i];
$consec = 0;
$to = undef;
}
}
if(defined $to) {
if($consec == 1) { $from .= ",$to"; }
else { $from .= $consec>1 ? "\-$to" : ""; }
}
push @ca, split(',', $from) if $from;
@ca;
}
1;
__END__
#####################################################################################
# END OF CLASS
#####################################################################################
=head1 FOR DEVELOPERS ONLY
=head2 Data Members
Information about the various data members of this module is provided for those
wishing to modify or understand the code. Two things to bear in mind:
=over 4
=item 1 Do NOT rely on these in any code outside of this module.
All data members are prefixed with an underscore to signify that they are private.
Always use accessor methods. If the accessor doesn't exist or is inadequate,
create or modify an accessor (and let me know, too!).
=item 2 This documentation may be incomplete and out of date.
It is easy for these data member descriptions to become obsolete as
this module is still evolving. Always double check this info and search
for members not described here.
=back
An instance of Bio::Tools::Blast::HSP.pm is a blessed reference to a hash containing
all or some of the following fields:
FIELD VALUE
--------------------------------------------------------------
(member names are mostly self-explanatory)
_score :
_bits :
_p :
_n : Integer. The 'N' value listed in parenthesis with P/Expect value:
: e.g., P(3) = 1.2e-30 ---> (N = 3).
: Not defined in NCBI Blast2 with gaps.
: To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps().
_expect :
_queryLength :
_queryGaps :
_queryStart :
_queryStop :
_querySeq :
_sbjctLength :
_sbjctGaps :
_sbjctStart :
_sbjctStop :
_sbjctSeq :
_matchSeq : String. Contains the symbols between the query and sbjct lines
which indicate identical (letter) and conserved ('+') matches
or a mismatch (' ').
_numIdentical :
_numConserved :
_identicalRes_query :
_identicalRes_sbjct :
_conservedRes_query :
_conservedRes_sbjct :
_match_indent : The number of leading space characters on each line containing
the match symbols. _match_indent is 13 in this example:
Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
Q +APWGLARIS G+ + Y YD+ AG
^^^^^^^^^^^^^
INHERITED DATA MEMBERS
_name : From Bio::Root::Object.pm.
:
_parent : From Bio::Root::Object.pm. This member contains a reference to the
: Bio::Tools::Blast::Sbjct.pm object to which this hit belongs.
=cut
1;