Bio::Search::HSP
BlastHSP
Toolbar
Summary
Bio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object
Package variables
No package variables defined.
Included modules
Inherit
Synopsis
The construction of BlastHSP objects is performed by
Bio::Factory::BlastHitFactory in a process that is
orchestrated by the Blast parser (
Bio::SearchIO::psiblast).
The resulting BlastHSPs are then accessed via
Bio::Search::Hit::BlastHit). Therefore, you do not need to
use
Bio::Search::HSP::BlastHSP) directly. If you need to construct
BlastHSPs directly, see the new() function for details.
For
Bio::SearchIO BLAST parsing usage examples, see the
examples/search-blast directory of the Bioperl distribution.
Description
A
Bio::Search::HSP::BlastHSP object provides an interface to data
obtained in a single alignment section of a Blast report (known as a
"High-scoring Segment Pair"). This is essentially a pairwise
alignment with score information.
BlastHSP objects are accessed via
Bio::Search::Hit::BlastHitobjects after parsing a BLAST report using the
Bio::SearchIOsystem.
Sequence endpoints are swapped so that start is always less than
end. This affects For TBLASTN/X hits on the minus strand. Strand
information can be recovered using the strand() method. This
normalization step is standard Bioperl practice. It also facilitates
use of range information by methods such as match().
* Supports BLAST versions 1.x and 2.x, gapped and ungapped.
Bio::Search::HSP::BlastHSP.pm has the ability to extract a list of all
residue indices for identical and conservative matches along both
query and sbjct sequences. Since this degree of detail is not always
needed, this behavior does not occur during construction of the BlastHSP
object. These data will automatically be collected as necessary as
the BlastHSP.pm object is used.
Methods
Methods description
Title : algorithm Usage : $alg = $hsp->algorithm(); Function: Gets the algorithm specification that was used to obtain the hsp For BLAST, the algorithm denotes what type of sequence was aligned against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated dna-prt, TBLASTN prt-translated dna, TBLASTX translated dna-translated dna). Returns : a scalar string Args : none |
Usage : $hsp->end( [seq_type] ); Purpose : Gets the end coordinate for the query, sbjct, or both sequences : in the HSP alignment. : NOTE: Start will always be less than end. : To determine strand, use $hsp->strand() Example : $query_end = $hsp->end('query'); : $hit_end = $hsp->end('hit'); : ($query_end, $hit_end) = $hsp->end(); Returns : scalar context: integer : array context without args: list of two integers Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query') : ('sbjct' is synonymous with 'hit') : Array context can be "induced" by providing an argument of 'list' or 'array'. Throws : n/a
See Also : start(), range(), strand() |
Usage : $hsp_obj->evalue() Purpose : Get the Expect value for the HSP. Returns : Float (0.001 or 1.3e-43) Argument : n/a Throws : n/a Comments : Support for returning the expectation data in different : formats (e.g., exponent only), is not provided for HSP objects. : This is only available for the BlastHit or Blast object.
See Also : p() |
Usage : $hsp_object->frac_conserved( [seq_type] ); Purpose : Get the fraction of conserved positions within the given HSP. : (Note: 'conservative' positions are called 'positives' in the : Blast report.) Example : $frac_cons = $hsp_object->frac_conserved('query'); Returns : Float (2-decimal precision, e.g., 0.75). Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total' : ('sbjct' is synonymous with 'hit') : default = 'total' (but see comments below). Throws : n/a Comments : Different versions of Blast report different values for the total : length of the alignment. This is the number reported in the : denominators in the stats section: : "Identical = 34/120 Positives = 67/120". : NCBI-BLAST uses the total length of the alignment (with gaps) : WU-BLAST uses the length of the query sequence (without gaps). : Therefore, when called without an argument or an argument of 'total', : this method will report different values depending on the : version of BLAST used. : : To get the fraction conserved among only the aligned residues, : ignoring the gaps, call this method with an argument of 'query' : or 'sbjct'.
See Also : frac_conserved(), num_conserved(), matches() |
Usage : $hsp_object->frac_identical( [seq_type] ); Purpose : Get the fraction of identical positions within the given HSP. Example : $frac_iden = $hsp_object->frac_identical('query'); Returns : Float (2-decimal precision, e.g., 0.75). Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total' : ('sbjct' is synonymous with 'hit') : default = 'total' (but see comments below). Throws : n/a Comments : Different versions of Blast report different values for the total : length of the alignment. This is the number reported in the : denominators in the stats section: : "Identical = 34/120 Positives = 67/120". : NCBI-BLAST uses the total length of the alignment (with gaps) : WU-BLAST uses the length of the query sequence (without gaps). : Therefore, when called without an argument or an argument of 'total', : this method will report different values depending on the : version of BLAST used. : : To get the fraction identical among only the aligned residues, : ignoring the gaps, call this method with an argument of 'query' : or 'sbjct' ('sbjct' is synonymous with 'hit').
See Also : frac_conserved(), num_identical(), matches() |
Usage : $hsp->gaps( [seq_type] ) Purpose : Get the number of gaps in the query, sbjct, or total alignment. : Also can return query gaps and sbjct gaps as a two-element list : when in array context. Example : $total_gaps = $hsp->gaps(); : ($qgaps, $sgaps) = $hsp->gaps(); : $qgaps = $hsp->gaps('query'); Returns : scalar context: integer : array context without args: (int, int) = ('queryGaps', 'sbjctGaps') Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total' : ('sbjct' is synonymous with 'hit') : (default = 'total', scalar context) : Array context can be "induced" by providing an argument of 'list' or 'array'. Throws : n/a
See Also : length(), matches() |
Usage : $hsp->get_aln() Purpose : Get a Bio::SimpleAlign object constructed from the query + sbjct : sequences of the present HSP object. Example : $aln_obj = $hsp->get_aln(); Returns : Object reference for a Bio::SimpleAlign.pm object. Argument : n/a. Throws : Propagates any exception ocurring during the construction of : the Bio::SimpleAlign object. Comments : Requires Bio::SimpleAlign. : The Bio::SimpleAlign object is constructed from the query + sbjct : sequence objects obtained by calling seq(). : Gap residues are included (see $GAP_SYMBOL).
See Also : seq(), Bio::SimpleAlign |
Title : hit_string Usage : my $hseq = $hsp->hit_string; Function: Retrieves the hit sequence of this HSP as a string Returns : string Args : none |
Title : homology_string Usage : my $homo_string = $hsp->homology_string; Function: Retrieves the homology sequence for this HSP as a string. : The homology sequence is the string of symbols in between the : query and hit sequences in the alignment indicating the degree : of conservation (e.g., identical, similar, not similar). Returns : string Args : none |
Usage : $hsp->length( [seq_type] ) Purpose : Get the length of the aligned portion of the query or sbjct. Example : $hsp->length('query') Returns : integer Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total' (default = 'total') ('sbjct' is synonymous with 'hit') Throws : n/a Comments : 'total' length is the full length of the alignment : as reported in the denominators in the alignment section: : "Identical = 34/120 Positives = 67/120".
See Also : gaps() |
Usage : $hsp->matches([seq_type], [start], [stop]); Purpose : Get the total number of identical and conservative matches : in the query or sbjct sequence for the given HSP. Optionally can : report data within a defined interval along the seq. : (Note: 'conservative' matches are called 'positives' in the : Blast report.) Example : ($id,$cons) = $hsp_object->matches('hit'); : ($id,$cons) = $hsp_object->matches('query',300,400); Returns : 2-element array of integers Argument : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query) : ('sbjct' is synonymous with 'hit') : (2) start = Starting coordinate (optional) : (3) stop = Ending coordinate (optional) Throws : Exception if the supplied coordinates are out of range. Comments : Relies on seq_str('match') to get the string of alignment symbols : between the query and sbjct lines which are used for determining : the number of identical and conservative matches.
See Also : length(), gaps(), seq_str(), Bio::Search::Hit::BlastHit::_adjust_contigs() |
Usage : $hsp_obj->n() Purpose : Get the N value (num HSPs on which P/Expect is based). : This value is not defined with NCBI Blast2 with gapping. Returns : Integer or null string if not defined. Argument : n/a Throws : n/a Comments : The 'N' value is listed in parenthesis with P/Expect value: : e.g., P(3) = 1.2e-30 ---> (N = 3). : Not defined in NCBI Blast2 with gaps. : This typically is equal to the number of HSPs but not always. : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
See Also : Bio::SeqFeature::SimilarityPair::score() |
Usage : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params ); : Bio::Search::HSP::BlastHSP.pm objects are constructed : automatically by Bio::SearchIO::BlastHitFactory.pm, : so there is no need for direct instantiation. Purpose : Constructs a new BlastHSP object and Initializes key variables : for the HSP. Returns : A Bio::Search::HSP::BlastHSP object Argument : Named parameters: : Parameter keys are case-insensitive. : -RAW_DATA => array ref containing raw BLAST report data for : for a single HSP. This includes all lines : of the HSP alignment from a traditional BLAST or PSI-BLAST (non-XML) report, : -RANK => integer (1..n). : -PROGRAM => string ('TBLASTN', 'BLASTP', etc.). : -QUERY_NAME => string, id of query sequence : -HIT_NAME => string, id of hit sequence : Comments : Having the raw data allows this object to do lazy parsing of : the raw HSP data (i.e., not parsed until needed). : : Note that there is a fair amount of basic parsing that is : currently performed in this module that would be more appropriate : to do within a separate factory object. : This parsing code will likely be relocated and more initialization : parameters will be added to new(). : See Also : Bio::SeqFeature::SimilarityPair::new(), Bio::SeqFeature::Similarity::new() |
Usage : $hsp_object->num_conserved(); Purpose : Get the number of conserved positions within the given HSP. Example : $num_iden = $hsp_object->num_conserved(); Returns : integer Argument : n/a Throws : n/a
See Also : num_identical(), frac_conserved() |
Usage : $hsp_object->num_identical(); Purpose : Get the number of identical positions within the given HSP. Example : $num_iden = $hsp_object->num_identical(); Returns : integer Argument : n/a Throws : n/a
See Also : num_conserved(), frac_identical() |
Usage : $hsp_obj->p() Purpose : Get the P-value for the HSP. Returns : Float (0.001 or 1.3e-43) or undef if not defined. Argument : n/a Throws : n/a Comments : P-value is not defined with NCBI Blast2 reports. : Support for returning the expectation data in different : formats (e.g., exponent only) is not provided for HSP objects. : This is only available for the BlastHit or Blast object.
See Also : expect() |
Title : query_string Usage : my $qseq = $hsp->query_string; Function: Retrieves the query sequence of this HSP as a string Returns : string Args : none |
Usage : $hsp->range( [seq_type] ); Purpose : Gets the (start, end) coordinates for the query or sbjct sequence : in the HSP alignment. Example : ($query_beg, $query_end) = $hsp->range('query'); : ($hit_beg, $hit_end) = $hsp->range('hit'); Returns : Two-element array of integers Argument : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query') : ('sbjct' is synonymous with 'hit') Throws : n/a
See Also : start(), end() |
Usage : $hsp->rank( [string] ); Purpose : Get the rank of the HSP within a given Blast hit. Example : $rank = $hsp->rank; Returns : Integer (1..n) corresponding to the order in which the HSP appears in the BLAST report. |
Usage : $hsp->seq( [seq_type] ); Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object. Example : $seqObj = $hsp->seq('query'); Returns : Object reference for a Bio::Seq.pm object. Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query'). : ('sbjct' is synonymous with 'hit') Throws : Propagates any exception that occurs during construction : of the Bio::Seq.pm object. Comments : The sequence is returned in an array of strings corresponding : to the strings in the original format of the Blast alignment. : (i.e., same spacing).
See Also : seq_str(), seq_inds(), Bio::Seq |
Usage : $hsp->seq_inds( seq_type, class, collapse ); Purpose : Get a list of residue positions (indices) for all identical : or conserved residues in the query or sbjct sequence. Example : @s_ind = $hsp->seq_inds('query', 'identical'); : @h_ind = $hsp->seq_inds('hit', 'conserved'); : @h_ind = $hsp->seq_inds('hit', 'conserved', 1); Returns : List of integers : May include ranges if collapse is true. Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = query) : ('sbjct' is synonymous with 'hit') : class = 'identical' or 'conserved' (default = identical) : (can be shortened to 'id' or 'cons') : (actually, anything not 'id' will evaluate to 'conserved'). : collapse = boolean, if true, consecutive positions are merged : using a range notation, e.g., "1 2 3 4 5 7 9 10 11" : collapses to "1-5 7 9-11". This is useful for : consolidating long lists. Default = no collapse. Throws : n/a. Comments : Calls _set_residues() to set the 'match' sequence if it has : not been set already.
See Also : seq(), _set_residues(), Bio::Search::BlastUtils::collapse_nums(), Bio::Search::Hit::BlastHit::seq_inds() |
Usage : $hsp->seq_str( seq_type ); Purpose : Get the full query, sbjct, or 'match' sequence as a string. : The 'match' sequence is the string of symbols in between the : query and sbjct sequences. Example : $str = $hsp->seq_str('query'); Returns : String Argument : seq_Type = 'query' or 'hit' or 'sbjct' or 'match' : ('sbjct' is synonymous with 'hit') Throws : Exception if the argument does not match an accepted seq_type. Comments : Calls _set_seq_data() to set the 'match' sequence if it has : not been set already.
See Also : seq(), seq_inds(), _set_match_seq() |
Usage : $hsp_obj->signif() Purpose : Get the P-value or Expect value for the HSP. Returns : Float (0.001 or 1.3e-43) : Returns P-value if it is defined, otherwise, Expect value. Argument : n/a Throws : n/a Comments : Provided for consistency with BlastHit::signif() : Support for returning the significance data in different : formats (e.g., exponent only), is not provided for HSP objects. : This is only available for the BlastHit or Blast object.
See Also : p(), expect(), Bio::Search::Hit::BlastHit::signif() |
Usage : $hsp->start( [seq_type] ); Purpose : Gets the start coordinate for the query, sbjct, or both sequences : in the HSP alignment. : NOTE: Start will always be less than end. : To determine strand, use $hsp->strand() Example : $query_beg = $hsp->start('query'); : $hit_beg = $hsp->start('hit'); : ($query_beg, $hit_beg) = $hsp->start(); Returns : scalar context: integer : array context without args: list of two integers Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query') : ('sbjct' is synonymous with 'hit') : Array context can be "induced" by providing an argument of 'list' or 'array'. Throws : n/a
See Also : end(), range() |
Usage : $hsp_object->strand( [seq_type] ) Purpose : Get the strand of the query or sbjct sequence. Example : print $hsp->strand('query'); : ($query_strand, $hit_strand) = $hsp->strand(); Returns : -1, 0, or 1 : -1 = Minus strand, +1 = Plus strand : Returns 0 if strand is not defined, which occurs : for BLASTP reports, and the query of TBLASTN : as well as the hit if BLASTX reports. : In scalar context without arguments, returns queryStrand value. : In array context without arguments, returns a two-element list : of strings (queryStrand, sbjctStrand). : Array context can be "induced" by providing an argument of 'list' or 'array'. Argument : seq_type: 'query' or 'hit' or 'sbjct' or undef : ('sbjct' is synonymous with 'hit') Throws : n/a
See Also : _set_seq(), _set_match_stats() |
Title : to_string Usage : print $hsp->to_string; Function: Returns a string representation for the Blast HSP. Primarily intended for debugging purposes. Example : see usage Returns : A string of the form: [BlastHSP] <rank> e.g.: [BlastHit] 1 Args : None |
Methods code
sub _id_str
{ my $self = shift;
if( not defined $self->{'_id_str'}) {
my $qname = $self->query->seqname;
my $hname = $self->hit->seqname;
$self->{'_id_str'} = "QUERY=\"$qname\" HIT=\"$hname\"";
}
return $self->{'_id_str'};
}
} |
sub _set_data
{ my $self = shift;
my @data = @_;
my @queryList = (); my @sbjctList = (); my @matchList = ();
my $matchLine = 0; my @linedat = ();
my($line, $aln_row_len, $length_diff);
$length_diff = 0;
foreach $line( @data ) {
next if $line =~ /^\s*$/;
if( $line =~ /^ ?Score/ ) {
$self->_set_score_stats( $line );
} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
$self->_set_match_stats( $line );
} elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
my $frame = abs($1) - 1;
$self->frame( $frame );
} elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
push @queryList, $line;
$self->{'_match_indent'} = CORE::length $1;
$aln_row_len = (CORE::length $1) + (CORE::length $2);
$matchLine = 1;
} elsif( $matchLine ) {
$length_diff = $aln_row_len - CORE::length $line;
$length_diff and $line .= ' 'x $length_diff;
push @matchList, $line;
$matchLine = 0;
} elsif( $line =~ /^Sbjct/ ) {
push @sbjctList, $line;
}
}
$self->{'_queryList'} =\@ queryList;
$self->{'_sbjctList'} =\@ sbjctList;
$self->{'_matchList'} =\@ matchList;
if(not defined ($self->{'_numIdentical'})) {
my $id_str = $self->_id_str;
$self->throw( -text => "Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)");
}
if(!scalar @queryList or !scalar @sbjctList) {
my $id_str = $self->_id_str;
$self->throw( "Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. ($id_str)");
}
}
} |
sub _set_match_seq
{ my $self = shift;
if( ! ref($self->{'_matchList'}) ) {
my $id_str = $self->_id_str;
$self->throw("Can't set HSP match sequence: No data ($id_str)");
}
my @data = @{$self->{'_matchList'}};
my(@sequence);
foreach( @data ) {
chomp($_);
s/^ {$self->{'_match_indent'}}//;
push @sequence, $_;
}
@{$self->{'_matchList'}} = undef;
$self->{'_matchList'} = undef;
$self->{'_matchSeq'} =\@ sequence;
return $self->{'_matchSeq'}; } |
sub _set_match_stats
{ my ($self, $data) = @_;
if($data =~ m!Identities = (\d+)/(\d+)!) { # blast1 or 2 format $self->{'_numIdentical'} = $1; $self->{'_totalLength'} = $2;
}
if($data =~ m!Positives = (\d+)/(\d+)!) { # blast1 or 2 format $self->{'_numConserved'} = $1; $self->{'_totalLength'} = $2;
}
if($data =~ m!Frame = ([\d+-]+)!) { $self->frame($1); }
if($data =~ m!Strand = (\w+) / (\w+)!) { $self->{'_queryStrand'} = $1; $self->{'_sbjctStrand'} = $2;
}
}
} |
sub _set_residues
{ my $self = shift;
my @sequence = ();
$self->_set_seq_data() unless $self->{'_set_seq_data'};
my %identicalList_query = ();
my %identicalList_sbjct = ();
my %conservedList_query = ();
my %conservedList_sbjct = ();
my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
$aref ||= $self->{'_matchSeq'};
my $seqString = join('', @$aref );
my $qseq = join('',@{$self->{'_querySeq'}});
my $sseq = join('',@{$self->{'_sbjctSeq'}});
my $resCount_query = $self->{'_queryStop'} || 0;
my $resCount_sbjct = $self->{'_sbjctStop'} || 0;
my $prog = $self->algorithm;
if($prog !~ /^BLASTP|^BLASTN/) {
if($prog eq 'TBLASTN') {
$resCount_sbjct /= 3; } elsif($prog eq 'BLASTX') {
$resCount_query /= 3; } elsif($prog eq 'TBLASTX') {
$resCount_query /= 3; $resCount_sbjct /= 3; }
}
my ($mchar, $schar, $qchar);
while( $mchar = chop($seqString) ) {
($qchar, $schar) = (chop($qseq), chop($sseq));
if( $mchar eq '+' ) {
$conservedList_query{ $resCount_query } = 1;
$conservedList_sbjct{ $resCount_sbjct } = 1;
} elsif( $mchar ne ' ' ) {
$identicalList_query{ $resCount_query } = 1;
$identicalList_sbjct{ $resCount_sbjct } = 1;
}
$resCount_query-- if $qchar ne $GAP_SYMBOL;
$resCount_sbjct-- if $schar ne $GAP_SYMBOL;
}
$self->{'_identicalRes_query'} =\% identicalList_query;
$self->{'_conservedRes_query'} =\% conservedList_query;
$self->{'_identicalRes_sbjct'} =\% identicalList_sbjct;
$self->{'_conservedRes_sbjct'} =\% conservedList_sbjct;
}
} |
sub _set_score_stats
{ my ($self, $data) = @_;
my ($expect, $p);
if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
$self->bits($1);
$self->score($2);
$expect = $3;
} elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
$self->bits($1);
$self->score($2);
$self->{'_n'} = $3;
$expect = $4;
} elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
$self->score($1);
$self->bits($2);
$expect = $3;
$p = $4;
} elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
$self->score($1);
$self->bits($2);
$expect = $3;
$self->{'_n'} = $4;
$p = $5;
} else {
my $id_str = $self->_id_str;
$self->throw(-class => 'Bio::Root::Exception',
-text => "Can't parse score statistics: unrecognized format. ($id_str)",
-value => $data);
}
$expect = "1$expect" if $expect =~ /^e/i;
$p = "1$p" if defined $p and $p=~ /^e/i;
$self->{'_expect'} = $expect;
$self->{'_p'} = $p || undef;
$self->significance( $p || $expect );
}
} |
sub _set_seq
{ my $self = shift;
my $seqType = shift;
my @data = @_;
my @ranges = ();
my @sequence = ();
my $numGaps = 0;
foreach( @data ) {
if( m/(\d+) *([^\d\s]+) *(\d+)/) { push @ranges, ( $1, $3 ) ; push @sequence, $2;
} else {
$self->warn("Bad sequence data: $_");
}
}
if( !(scalar(@sequence) and scalar(@ranges))) {
my $id_str = $self->_id_str;
$self->throw("Can't set sequence: missing data. Possibly unrecognized Blast format. ($id_str)");
}
$seqType = "_\L$seqType\E";
$self->{$seqType.'Start'} = $ranges[0];
$self->{$seqType.'Stop'} = $ranges[ $#ranges ];
$self->{$seqType.'Seq'} =\@ sequence;
$self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;
my $prog = $self->algorithm;
if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
$self->{$seqType.'Length'} /= 3; } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
$self->{$seqType.'Length'} /= 3; } elsif($prog eq 'TBLASTX') {
$self->{$seqType.'Length'} /= 3; }
if( $prog ne 'BLASTP' ) {
$self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;
$self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');
if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =
($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
$self->{$seqType.'Strand'} = 'Minus';
}
}
my $seqstr = join('', @sequence);
$seqstr =~ s/\s//g;
my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};
$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
}
} |
sub _set_seq_data
{ my $self = shift;
$self->_set_seq('query', @{$self->{'_queryList'}});
$self->_set_seq('sbjct', @{$self->{'_sbjctList'}});
@{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();
undef $self->{'_queryList'};
undef $self->{'_sbjctList'};
$self->{'_set_seq_data'} = 1;
}
} |
sub algorithm
{ my ($self,@args) = @_;
return $self->{'_prog'}; } |
sub end
{ my ($self, $seqType) = @_;
$seqType ||= (wantarray ? 'list' : 'query');
$seqType = 'sbjct' if $seqType eq 'hit';
$self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /list|array/i) {
return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
} else {
$seqType = "_\L$seqType\E";
return $self->{$seqType.'Stop'};
} } |
sub evalue
{ shift->{'_expect'} } |
sub expect
{ shift->evalue( @_ ); } |
sub frac_conserved
{
my( $self, $seqType ) = @_;
$seqType ||= 'total';
$seqType = 'sbjct' if $seqType eq 'hit';
if($seqType ne 'total') {
$self->_set_seq_data() unless $self->{'_set_seq_data'};
}
$seqType = "_\L$seqType\E";
sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
} |
sub frac_identical
{
my( $self, $seqType ) = @_;
$seqType ||= 'total';
$seqType = 'sbjct' if $seqType eq 'hit';
if($seqType ne 'total') {
$self->_set_seq_data() unless $self->{'_set_seq_data'};
}
$seqType = "_\L$seqType\E";
sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
} |
sub gaps
{ my( $self, $seqType ) = @_;
$self->_set_seq_data() unless $self->{'_set_seq_data'};
$seqType ||= (wantarray ? 'list' : 'total');
$seqType = 'sbjct' if $seqType eq 'hit';
if($seqType =~ /list|array/i) {
return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
}
if($seqType eq 'total') {
return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
} else {
$seqType = "_\L$seqType\E";
return $self->{$seqType.'Gaps'} || 0;
} } |
sub get_aln
{ my $self = shift;
require Bio::SimpleAlign;
require Bio::LocatableSeq;
my $qseq = $self->seq('query');
my $sseq = $self->seq('sbjct');
my $type = $self->algorithm =~ /P$|^T/ ? 'amino' : 'dna';
my $aln = new Bio::SimpleAlign();
$aln->add_seq(new Bio::LocatableSeq(-seq => $qseq->seq(),
-id => 'query_'.$qseq->display_id(),
-start => 1,
-end => CORE::length($qseq)));
$aln->add_seq(new Bio::LocatableSeq(-seq => $sseq->seq(),
-id => 'hit_'.$sseq->display_id(),
-start => 1,
-end => CORE::length($sseq)));
return $aln;
}
1;
__END__ } |
sub hit_string
{shift->seq_str('hit'); } |
sub homology_string
{shift->seq_str('match'); } |
sub length
{
my( $self, $seqType ) = @_;
$seqType ||= 'total';
$seqType = 'sbjct' if $seqType eq 'hit';
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
$seqType = "_\L$seqType\E";
$self->{$seqType.'Length'}; } |
sub matches
{ my( $self, %param ) = @_;
my(@data);
my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});
$seqType ||= 'query';
$seqType = 'sbjct' if $seqType eq 'hit';
my($start,$stop);
if(!defined $beg && !defined $end) {
push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
} else {
$beg ||= 0;
$end ||= 0;
($start,$stop) = $self->range($seqType);
if($beg == 0) { $beg = $start; $end = $beg+$end; }
elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
if($end >= $stop) { $end = $stop; } else { $end += 1;}
if($beg < $start) { $beg = $start; }
my $seq = "";
my $prog = $self->algorithm;
if (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))
{
$seq = substr($self->seq_str('match'),
int(($beg-$start)/3), int(($end-$beg+1)/3));
} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))
{
$seq = substr($self->seq_str('match'),
int(($beg-$start)/3), int(($end-$beg+1)/3));
} else {
$seq = substr($self->seq_str('match'),
$beg-$start, ($end-$beg));
}
if(!CORE::length $seq) {
my $id_str = $self->_id_str;
$self->throw("Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)");
}
$seq =~ s/ //g; my $len_cons = CORE::length $seq;
$seq =~ s/\+//g; my $len_id = CORE::length $seq;
push @data, ($len_id, $len_cons);
}
@data; } |
sub n
{ my $self = shift; $self->{'_n'} || ''; } |
sub new
{ my ($class, @args ) = @_;
my $self = $class->SUPER::new( @args );
$self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;
my ($raw_data, $qname, $hname, $qlen, $hlen);
($self->{'_prog'}, $self->{'_rank'}, $raw_data,
$qname, $hname) =
$self->_rearrange([qw( PROGRAM
RANK
RAW_DATA
QUERY_NAME
HIT_NAME
)], @args );
$self->_set_data( @{$raw_data} );
my ($qb, $hb) = ($self->start());
my ($qe, $he) = ($self->end());
my ($qs, $hs) = ($self->strand());
my ($qf,$hf) = ($self->query->frame(),
$self->hit->frame);
$self->query( Bio::SeqFeature::Similarity->new (-start =>$qb,
-end =>$qe,
-strand =>$qs,
-bits =>$self->bits,
-score =>$self->score,
-frame =>$qf,
-seq_id => $qname,
-source =>$self->{'_prog'} ));
$self->hit( Bio::SeqFeature::Similarity->new (-start =>$hb,
-end =>$he,
-strand =>$hs,
-bits =>$self->bits,
-score =>$self->score,
-frame =>$hf,
-seq_id => $hname,
-source =>$self->{'_prog'} ));
$self->query->seqlength($qlen); $self->hit->seqlength($hlen);
$self->query->frac_identical($self->frac_identical('query'));
$self->hit->frac_identical($self->frac_identical('hit'));
return $self;
}
} |
sub num_conserved
{ my( $self) = shift;
$self->{'_numConserved'}; } |
sub num_identical
{ my( $self) = shift;
$self->{'_numIdentical'}; } |
sub p
{ my $self = shift; $self->{'_p'}; } |
sub pvalue
{ shift->p(@_); } |
sub query_string
{shift->seq_str('query'); } |
sub range
{ my ($self, $seqType) = @_;
$self->_set_seq_data() unless $self->{'_set_seq_data'};
$seqType ||= 'query';
$seqType = 'sbjct' if $seqType eq 'hit';
$seqType = "_\L$seqType\E";
return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'}); } |
sub rank
{ shift->{'_rank'} } |
sub seq
{ my($self,$seqType) = @_;
$seqType ||= 'query';
$seqType = 'sbjct' if $seqType eq 'hit';
my $str = $self->seq_str($seqType);
require Bio::Seq;
new Bio::Seq (-ID => $self->to_string,
-SEQ => $str,
-DESC => "$seqType sequence",
); } |
sub seq_inds
{ my ($self, $seqType, $class, $collapse) = @_;
$seqType ||= 'query';
$class ||= 'identical';
$collapse ||= 0;
$seqType = 'sbjct' if $seqType eq 'hit';
$self->_set_residues() unless defined $self->{'_identicalRes_query'};
$seqType = ($seqType !~ /^q/i ? 'sbjct' : 'query');
$class = ($class !~ /^id/i ? 'conserved' : 'identical');
$seqType = "_\L$seqType\E";
$class = "_\L$class\E";
my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seqType"}};
require Bio::Search::BlastUtils if $collapse;
return $collapse ? &Bio::Search::BlastUtils::collapse_nums(@ary) : @ary; } |
sub seq_str
{ my($self,$seqType) = @_;
$seqType ||= 'query';
$seqType = 'sbjct' if $seqType eq 'hit';
$seqType = "_\L$seqType\E";
$self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /sbjct|query/) {
my $seq = join('',@{$self->{$seqType.'Seq'}});
$seq =~ s/\s+//g;
return $seq;
} elsif( $seqType =~ /match/i) {
my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
$aref = $self->{'_matchSeq'};
return join('',@$aref);
} else {
my $id_str = $self->_id_str;
$self->throw(-class => 'Bio::Root::BadParameter',
-text => "Invalid or undefined sequence type: $seqType ($id_str)\n" .
"Valid types: query, sbjct, match",
-value => $seqType);
} } |
sub signif
{ my $self = shift;
my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
$val; } |
sub start
{ my ($self, $seqType) = @_;
$seqType ||= (wantarray ? 'list' : 'query');
$seqType = 'sbjct' if $seqType eq 'hit';
$self->_set_seq_data() unless $self->{'_set_seq_data'};
if($seqType =~ /list|array/i) {
return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
} else {
$seqType = "_\L$seqType\E";
return $self->{$seqType.'Start'};
} } |
sub strand
{ my( $self, $seqType ) = @_;
return if $self->{'_initializing'};
$seqType ||= (wantarray ? 'list' : 'query');
$seqType = 'sbjct' if $seqType eq 'hit';
$seqType = "_\L$seqType\E";
$self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
my $prog = $self->algorithm;
if($seqType =~ /list|array/i) {
my ($qstr, $hstr);
if( $prog eq 'BLASTP') {
$qstr = 0;
$hstr = 0;
}
elsif( $prog eq 'TBLASTN') {
$qstr = 0;
$hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};
}
elsif( $prog eq 'BLASTX') {
$qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};
$hstr = 0;
}
else {
$qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};
$hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};
}
$qstr ||= 0;
$hstr ||= 0;
return ($qstr, $hstr);
}
local $^W = 0;
$STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0; } |
sub to_string
{ my $self = shift;
return "[BlastHSP] " . $self->rank();
}
} |
General documentation
Relationship to SimpleAlign.pm & Seq.pm | Top |
BlastHSP.pm can provide the query or sbjct sequence as a
Bio::Seqobject via the
seq() method. The BlastHSP.pm object can also create a
two-sequence
Bio::SimpleAlign alignment object using the the query
and sbjct sequences via the
get_aln() method. Creation of alignment
objects is not automatic when constructing the BlastHSP.pm object since
this level of functionality is not always required and would generate
a lot of extra overhead when crunching many reports.
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bio.perl.org/MailList.html - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution. Bug reports can be submitted via email
or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
Steve Chervitz <sac@bioperl.org>
See
the FEEDBACK section for where to send bug reports and comments.
This software was originally developed in the Department of Genetics
at Stanford University. I would also like to acknowledge my
colleagues at Affymetrix for useful feedback.
Bio::Search::Hit::BlastHit.pm - Blast hit object.
Bio::Search::Result::BlastResult.pm - Blast Result object.
Bio::Seq.pm - Biosequence object
Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
This software is provided "as is" without warranty of any kind.
The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _
Information about the various data members of this module is provided for those
wishing to modify or understand the code. Two things to bear in mind:
1 Do NOT rely on these in any code outside of this module.
All data members are prefixed with an underscore to signify that they are private.
Always use accessor methods. If the accessor doesn't exist or is inadequate,
create or modify an accessor (and let me know, too!).
2 This documentation may be incomplete and out of date.
It is easy for these data member descriptions to become obsolete as
this module is still evolving. Always double check this info and search
for members not described here.
An instance of Bio::Search::HSP::BlastHSP.pm is a blessed reference to a hash containing
all or some of the following fields:
FIELD VALUE
--------------------------------------------------------------
(member names are mostly self-explanatory)
_score :
_bits :
_p :
_n : Integer. The 'N' value listed in parenthesis with P/Expect value:
: e.g., P(3) = 1.2e-30 ---> (N = 3).
: Not defined in NCBI Blast2 with gaps.
: To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
_expect :
_queryLength :
_queryGaps :
_queryStart :
_queryStop :
_querySeq :
_sbjctLength :
_sbjctGaps :
_sbjctStart :
_sbjctStop :
_sbjctSeq :
_matchSeq : String. Contains the symbols between the query and sbjct lines
which indicate identical (letter) and conserved ('+') matches
or a mismatch (' ').
_numIdentical :
_numConserved :
_identicalRes_query :
_identicalRes_sbjct :
_conservedRes_query :
_conservedRes_sbjct :
_match_indent : The number of leading space characters on each line containing
the match symbols. _match_indent is 13 in this example:
Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
Q +APWGLARIS G+ + Y YD+ AG
^^^^^^^^^^^^^