Synopsis

  @params = ('database' => 'swissprot', 'outfile' => 'blast1.out',
             '_READMETHOD' => 'Blast');
  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

  # Blast a sequence against a database:
  $str = Bio::SeqIO->new(-file => 't/amino.fa', '-format' => 'Fasta');
  $input  = $str->next_seq();
  $input2 = $str->next_seq();
  $blast_report = $factory->blastall($input);

  # Run an iterated Blast (psiblast) of a sequence against a database:
  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
  $factory->j(3);    # 'j' is blast parameter for # of iterations
  $factory->outfile('psiblast1.out');
  $blast_report = $factory->blastpgp($input);

  # Use blast to align 2 sequences against each other:
  $factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
  $factory->bl2seq($input, $input2);

Various additional options and input formats are available. See the
Description section for details.

Description

  @params = ('program' => 'blastn', 'database' => 'ecoli.nt');
  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Any parameters not explicitly set will remain as the defaults of the
BLAST executable. Parameters can also be set after the factory has been
created:

  $expectvalue = 0.01;
  $factory->e($expectvalue);

Note that for improved script readability one can modify the name of
a BLAST parameter as long as its initial letter (and case) is preserved,
because only the first letter is used as the key. Thus
$factory->expectvalue($expectvalue) is equivalent to
$factory->e($expectvalue).

Once the factory has been created and the appropriate parameters set,
one can call one of the supported BLAST executables (blastall, blastpgp
or bl2seq). The input may be the name of a file:

  $inputfilename = 't/testquery.fa';
  $blast_report = $factory->blastall($inputfilename);

In addition, sequence input may be in the form of either a Bio::Seq
object or an array of Bio::Seq objects:

  $input = Bio::Seq->new(-id => "test query", -seq => "ACTACCCTTTAAATCAGTGGGGG");
  $blast_report = $factory->blastall($input);

For blastall and non-psiblast blastpgp runs, the report object is either a
Bio::SearchIO object or a Bio::Tools::BPlite object, selected with the
_READMETHOD parameter ('Blast' or 'BPlite').

  # psiblast "jumpstart" from an existing alignment plus a mask
  $str = Bio::AlignIO->new(-file => "cysprot.msf", '-format' => 'msf');
  $aln = $str->next_aln();
  $len = $aln->length_aln();
  $mask = '1' x $len;    # simple case where PSSM's to be used at all residues
  $report = $factory->blastpgp("cysprot1.fa", $aln, $mask);

For bl2seq execution, StandAloneBlast.pm can be combined with
AlignIO.pm to obtain a SimpleAlign object from the bl2seq report:

  # Get 2 sequences
  $str = Bio::SeqIO->new(-file => 't/amino.fa', '-format' => 'Fasta');
  my $seq3 = $str->next_seq();
  my $seq4 = $str->next_seq();

  # Run bl2seq on them
  $factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
  my $bl2seq_report = $factory->bl2seq($seq3, $seq4);

  # Use AlignIO.pm to create a SimpleAlign object from the bl2seq report
  $str = Bio::AlignIO->new(-file => 'bl2seq.out', '-format' => 'bl2seq');
  $aln = $str->next_aln();

For more examples of syntax and use of Blast.pm, the user is ...
_generic_local_blast
  Title : _generic_local_blast
_runblast
  Title : _runblast
_setinput
  Title : _setinput
_setparams
  Title : _setparams
bl2seq
  Title : bl2seq
blastall
  Title : blastall
blastpgp
  Title : blastpgp
executable
  Title : executable
program_dir
  Title : program_dir
program_path
  Title : program_path
BEGIN

  BEGIN {
      @BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q
                             D a O J M W z K L Y S T l U y Z);
      @BLASTPGP_PARAMS = qw(d i A f e m o y P F G E X N g S H a I h c
                            j J Z O M v b C R W z K L Y p k T Q B l U);
      @BL2SEQ_PARAMS   = qw(i j p g o d a G E X W M q r F e S T);

      # Non BLAST parameters start with underscore to differentiate them
      # from BLAST parameters
      @OTHER_PARAMS = qw(_READMETHOD);
      # _READMETHOD = 'Blast' (the value set in new) or 'BPlite'
      # my @other_switches = qw(QUIET);

      # Authorize attribute fields
      foreach my $attr (@BLASTALL_PARAMS, @BLASTPGP_PARAMS,
                        @BL2SEQ_PARAMS, @OTHER_PARAMS)
      { $OK_FIELD{$attr}++; }

      # You will need to enable Blast to find the Blast program. This can be
      # done in (at least) two different ways:
      #  1. define an environmental variable BLASTDIR:
      #       export BLASTDIR=/home/peter/blast
      #     or
      #  2. include a definition of an environmental variable BLASTDIR in
      #     every script that will use StandAloneBlast.pm:
      #       BEGIN { $ENV{BLASTDIR} = '/home/peter/blast/'; }
      $PROGRAMDIR = $ENV{'BLASTDIR'} || '';

      # If local BLAST databases are not stored in the standard
      # /data directory, the variable BLASTDATADIR will need to be set
      # explicitly
      $DATADIR = $ENV{'BLASTDATADIR'} || $ENV{'BLASTDB'} || '';
  }
AUTOLOAD

  sub AUTOLOAD {
      my $self = shift;
      my $attr = $AUTOLOAD;
      $attr =~ s/.*:://;
      my $attr_letter = substr($attr, 0, 1);

      # The actual key is the first letter of $attr. Unless the first
      # letter is an underscore (as in _READMETHOD), $attr is a BLAST
      # parameter and should be truncated to its first letter only.
      $attr = ($attr_letter eq '_') ? $attr : $attr_letter;
      $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr};
      # $self->throw("Unallowed parameter: $attr !") unless $ok_field{$attr_letter};
      $self->{$attr_letter} = shift if @_;
      return $self->{$attr_letter};
  }
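
The first-letter convention used by AUTOLOAD can be tried without Bioperl installed. The following is only a minimal sketch: the package name Demo::Factory and its small whitelist are hypothetical, and, unlike the module code above, it stores underscore-prefixed names under their full name rather than under their first character.

```perl
package Demo::Factory;
use strict;
use warnings;

our $AUTOLOAD;

# Hypothetical whitelist; the real module fills %OK_FIELD in its BEGIN block.
my %OK_FIELD = map { $_ => 1 } qw(e d o _READMETHOD);

sub new { return bless {}, shift }

sub AUTOLOAD {
    my $self = shift;
    (my $attr = $AUTOLOAD) =~ s/.*:://;
    return if $attr eq 'DESTROY';    # object destruction is not a parameter

    # BLAST parameters collapse to their first letter;
    # underscore-prefixed names are kept whole.
    my $key = ($attr =~ /^_/) ? $attr : substr($attr, 0, 1);
    die "Unallowed parameter: $key\n" unless $OK_FIELD{$key};
    $self->{$key} = shift if @_;
    return $self->{$key};
}

package main;

my $factory = Demo::Factory->new;
$factory->expectvalue(0.01);    # stored under the key 'e'
print $factory->e, "\n";        # same slot as expectvalue
```

Because only the first letter is significant, two long names that start with the same letter (and case) address the same parameter, so descriptive names should be chosen with that in mind.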
DESTROY

  sub DESTROY {
      my $self = shift;
      unless ( $self->save_tempfiles ) {
          $self->cleanup();
      }
      $self->SUPER::DESTROY();
  }

  1;
  __END__
_generic_local_blast

  sub _generic_local_blast {
      my $self       = shift;
      my $executable = shift;

      # Create parameter string to pass to Blast program
      my $param_string = $self->_setparams($executable);

      # run Blast
      my $blast_report = &_runblast($self, $executable, $param_string);
  }
_runblast

  sub _runblast {
      my ($self, $executable, $param_string) = @_;
      my ($blast_obj, $exe);
      if ( !($exe = $self->executable($executable)) ) {
          $self->warn("cannot find path to $executable");
          return undef;
      }
      my $commandstring = $exe . $param_string;

      # next line for debugging
      $self->debug("$commandstring\n ");
      my $status = system($commandstring);
      $self->throw("$executable call crashed: $? $commandstring\n")
          unless ($status == 0);
      my $outfile = $self->o();    # get outputfilename
      # set significance cutoff to set expectation value or default value
      # (may want to make this value vary for different executables)
      my $signif = $self->e() || 1e-5;

      # If running bl2seq or psiblast (blastpgp with multiple iterations),
      # the specific parsers for these programs must be used (ie BPbl2seq or
      # BPpsilite). Otherwise either the Blast parser or the BPlite
      # parsers can be selected.
      if ($executable =~ /bl2seq/i) {
          if ( $self->verbose > 0 ) {
              open(OUT, $outfile) || $self->throw("cannot open $outfile");
              while (<OUT>) { $self->debug($_) }
              close(OUT);
          }
          # Added program info so BPbl2seq can compute strand info
          $blast_obj = Bio::Tools::BPbl2seq->new(-file        => $outfile,
                                                 -REPORT_TYPE => $self->p);
          # $blast_obj = Bio::Tools::BPbl2seq->new(-file => $outfile);
      } elsif ($executable =~ /blastpgp/i && defined $self->j() && $self->j() > 1) {
          print "using psilite parser\n";
          $blast_obj = Bio::Tools::BPpsilite->new(-file => $outfile);
      } elsif ($self->_READMETHOD =~ /^Blast/i) {
          $blast_obj = Bio::SearchIO->new(-file => $outfile, -format => 'blast');
      } elsif ($self->_READMETHOD =~ /^BPlite/i) {
          $blast_obj = Bio::Tools::BPlite->new(-file => $outfile);
      } else {
          $self->warn("Unrecognized readmethod " . $self->_READMETHOD .
                      " or executable $executable\n");
      }
      return $blast_obj;
  }
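
The parser selection at the end of _runblast can be summarised as a small dispatch function. This is a sketch only: choose_parser is a hypothetical helper that returns the class name as a string instead of constructing the parser, so it runs without Bioperl. The class names are the ones used in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the if/elsif chain in _runblast.
# Returns the parser class name, or nothing if no rule matches.
sub choose_parser {
    my ($executable, $iterations, $readmethod) = @_;
    return 'Bio::Tools::BPbl2seq' if $executable =~ /bl2seq/i;
    return 'Bio::Tools::BPpsilite'
        if $executable =~ /blastpgp/i && defined $iterations && $iterations > 1;
    return 'Bio::SearchIO'      if $readmethod =~ /^Blast/i;
    return 'Bio::Tools::BPlite' if $readmethod =~ /^BPlite/i;
    return;
}

print choose_parser('blastpgp', 3, 'Blast'), "\n";
```

The order matters: the executable-specific parsers are checked first, so _READMETHOD only takes effect for blastall and single-iteration blastpgp runs.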
_setinput

  sub _setinput {
      my ($self, $executable, $input1, $input2, $mask) = @_;
      my ($seq, $temp, $infilename1, $infilename2, $fh);

      # If $input1 is not a reference it better be the name of a file with
      # the sequence/alignment data...
      $self->io->_io_cleanup();

      SWITCH: {
          unless (ref $input1) {
              $infilename1 = (-e $input1) ? $input1 : 0;
              last SWITCH;
          }

          # $input may be an array of BioSeq objects...
          if (ref($input1) =~ /ARRAY/i) {
              ($fh, $infilename1) = $self->io->tempfile();
              $temp = Bio::SeqIO->new(-fh => $fh, '-format' => 'Fasta');
              foreach $seq (@$input1) {
                  unless ($seq->isa("Bio::PrimarySeqI")) { return 0; }
                  $temp->write_seq($seq);
              }
              close $fh;
              $fh = undef;
              last SWITCH;
          }
          # $input may be a single BioSeq object...
          elsif ($input1->isa("Bio::PrimarySeqI")) {
              ($fh, $infilename1) = $self->io->tempfile();

              # just in case $input1 is taken from an alignment and has spaces
              # (ie deletions) indicated within it, we have to remove them -
              # otherwise the BLAST programs will be unhappy
              my $seq_string = $input1->seq();
              $seq_string =~ s/\W+//g;    # get rid of spaces in sequence
              $input1->seq($seq_string);
              $temp = Bio::SeqIO->new(-fh => $fh, '-format' => 'Fasta');
              $temp->write_seq($input1);
              close $fh;
              undef $fh;
              # $temp->write_seq($input1);
              last SWITCH;
          }
          $infilename1 = 0;    # Set error flag if you get here
      }    # End SWITCH

      unless ($input2) { return $infilename1; }

      SWITCH2: {
          unless (ref $input2) {
              $infilename2 = (-e $input2) ? $input2 : 0;
              last SWITCH2;
          }
          if ($input2->isa("Bio::PrimarySeqI") && $executable eq 'bl2seq') {
              ($fh, $infilename2) = $self->io->tempfile();
              $temp = Bio::SeqIO->new(-fh => $fh, '-format' => 'Fasta');
              $temp->write_seq($input2);
              close $fh;
              undef $fh;
              last SWITCH2;
          }
          # Option for using psiblast's pre-alignment "jumpstart" feature
          elsif ($input2->isa("Bio::SimpleAlign") && $executable eq 'blastpgp') {
              # a bit of a lie since it won't be a fasta file
              ($fh, $infilename2) = $self->io->tempfile();

              # first we retrieve the "mask" that determines which residues
              # should be scored according to their position and which should
              # be scored using the non-position-specific matrices
              my @mask = split("", $mask);    # get mask (fifth argument)

              # then we have to convert all the residues in every sequence to
              # upper case at the positions that we want psiblast to use
              # position specific scoring
              foreach $seq ( $input2->each_seq() ) {
                  my @seqstringlist = split("", $seq->seq());
                  for (my $i = 0; $i < scalar(@mask); $i++) {
                      unless ( $seqstringlist[$i] =~ /[a-zA-Z]/ ) { next }
                      $seqstringlist[$i] = $mask[$i] ? uc $seqstringlist[$i]
                                                     : lc $seqstringlist[$i];
                  }
                  my $newseqstring = join("", @seqstringlist);
                  $seq->seq($newseqstring);
              }

              # Now we need to write out the alignment to a file
              # in the "psi format" which psiblast is expecting
              $input2->map_chars('\.', '-');
              $temp = Bio::AlignIO->new(-fh => $fh, '-format' => 'psi');
              $temp->write_aln($input2);
              close $fh;
              undef $fh;
              last SWITCH2;
          }
          $infilename2 = 0;    # Set error flag if you get here
      }    # End SWITCH2

      return ($infilename1, $infilename2);
  }
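
The mask handling in the jumpstart branch is easier to see on a plain string. The sketch below assumes a hypothetical helper, apply_mask, that works on one sequence string rather than on a Bio::SimpleAlign object: residues are upper-cased where the mask character is true ('1') and lower-cased where it is '0', and gap characters are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a psiblast-style mask to one sequence string.
sub apply_mask {
    my ($seq, $mask) = @_;
    my @residues = split //, $seq;
    my @mask     = split //, $mask;
    for my $i (0 .. $#mask) {
        # skip positions past the end of the sequence and non-letters (gaps)
        next unless defined $residues[$i] && $residues[$i] =~ /[a-zA-Z]/;
        $residues[$i] = $mask[$i] ? uc $residues[$i] : lc $residues[$i];
    }
    return join '', @residues;
}

print apply_mask('ac-DEfg', '1100011'), "\n";
```

Note that the test `$mask[$i] ? ... : ...` relies on the string '0' being false in Perl, which is why the mask is written with the characters 0 and 1.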
_setparams

  sub _setparams {
      my ($self, $executable) = @_;
      my ($attr, $value, @execparams);

      if ($executable eq 'blastall') { @execparams = @BLASTALL_PARAMS; }
      if ($executable eq 'blastpgp') { @execparams = @BLASTPGP_PARAMS; }
      if ($executable eq 'bl2seq')   { @execparams = @BL2SEQ_PARAMS; }

      my $param_string = "";
      for $attr ( @execparams ) {
          $value = $self->$attr();
          next unless (defined $value);

          # Need to prepend datadirectory to database name
          if ($attr eq 'd' && ($executable ne 'bl2seq')) {
              # This is added so that you can specify a DB with a full path
              if (! (-e $value . ".nin" || -e $value . ".pin")) {
                  $value = File::Spec->catdir($DATADIR, $value);
              }
          }

          # put params in format expected by Blast
          $attr = '-' . $attr;
          $param_string .= " $attr $value ";
      }

      # if ($self->quiet()) { $param_string .= ' >/dev/null'; }
      return $param_string;
  }
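
To illustrate how the parameter string is assembled, here is a reduced sketch that takes the allowed letters and the set values as plain data. The helper name build_param_string and the sample values are assumptions for the example; the database-path handling of the real method is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: emit " -<letter> <value> " for every allowed
# parameter that has a defined value, in the order of the allowed list.
sub build_param_string {
    my ($allowed, $values) = @_;
    my $param_string = '';
    for my $attr (@$allowed) {
        my $value = $values->{$attr};
        next unless defined $value;
        $param_string .= " -$attr $value ";
    }
    return $param_string;
}

my @allowed = qw(p d i e o);
my %values  = (p => 'blastn', d => 'ecoli.nt', e => 0.01);
print build_param_string(\@allowed, \%values), "\n";
```

Parameters that were never set are skipped entirely, which is how unset options fall back to the defaults of the BLAST executable.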
bl2seq

  sub bl2seq {
      my $self       = shift;
      my $executable = 'bl2seq';
      my $input1     = shift;
      my $input2     = shift;

      # Create input file pointer
      my ($infilename1, $infilename2) =
          $self->_setinput($executable, $input1, $input2);
      if (!$infilename1) { $self->throw(" $input1 not Seq Object or file name!"); }
      if (!$infilename2) { $self->throw("$input2 not Seq Object or file name!"); }

      $self->i($infilename1);    # set file name of first sequence to
                                 # be aligned to inputfilename1
                                 # (-i param of bl2seq)
      $self->j($infilename2);    # set file name of second sequence to
                                 # be aligned to inputfilename2
                                 # (-j param of bl2seq)

      my $blast_report = &_generic_local_blast($self, $executable);
  }
blastall

  sub blastall {
      my ($self, $input1) = @_;
      $self->io->_io_cleanup();
      my $executable = 'blastall';
      my $input2;

      # Create input file pointer
      my $infilename1 = $self->_setinput($executable, $input1);
      if (!$infilename1) {
          $self->throw(" $input1 ($infilename1) not Bio::Seq object or array of Bio::Seq objects or file name!");
      }

      # set file name of sequence to be blasted to inputfilename1
      # (-i param of blastall)
      $self->i($infilename1);

      my $blast_report = &_generic_local_blast($self, $executable, $input1, $input2);
  }
blastpgp

  sub blastpgp {
      my $self       = shift;
      my $executable = 'blastpgp';
      my $input1     = shift;
      my $input2     = shift;
      # used by blastpgp's -B option to specify which residues are
      # position aligned
      my $mask = shift;

      my ($infilename1, $infilename2) =
          $self->_setinput($executable, $input1, $input2, $mask);
      if (!$infilename1) {
          $self->throw(" $input1 not Bio::Seq object or array of Bio::Seq objects or file name!");
      }

      # set file name of sequence to be blasted to inputfilename1
      # (-i param of blastpgp)
      $self->i($infilename1);

      if ($input2) {
          unless ($infilename2) {
              $self->throw("$input2 not SimpleAlign Object in pre-aligned psiblast\n");
          }
          # set file name of partial alignment to inputfilename2
          # (-B param of blastpgp)
          $self->B($infilename2);
      }

      my $blast_report = &_generic_local_blast($self, $executable, $input1, $input2);
  }
executable

  sub executable {
      my ($self, $exename, $exe, $warn) = @_;
      $exename = 'blastall' unless defined $exename;

      if ( defined $exe && -x $exe ) {
          $self->{'_pathtoexe'}->{$exename} = $exe;
      }
      unless ( defined $self->{'_pathtoexe'}->{$exename} ) {
          my $f = $self->program_path($exename);
          $exe = $self->{'_pathtoexe'}->{$exename} = $f if (-e $f && -x $f);

          # This is how I meant to split up these conditionals --jason
          # if exe is null we will execute this (handle the case where
          # PROGRAMDIR pointed to something invalid)
          unless ( $exe ) {    # we didn't find it in that last conditional
              if ( ($exe = $self->io->exists_exe($exename)) && -x $exe ) {
                  $self->{'_pathtoexe'}->{$exename} = $exe;
              } else {
                  $self->warn("Cannot find executable for $exename") if $warn;
                  $self->{'_pathtoexe'}->{$exename} = undef;
              }
          }
      }
      return $self->{'_pathtoexe'}->{$exename};
  }
new

  sub new {
      my ($caller, @args) = @_;

      # chained new
      my $self = $caller->SUPER::new(@args);

      # to facilitate tempfile cleanup
      my ($tfh, $tempfile) = $self->io->tempfile();
      close($tfh);    # we don't want the filehandle, just a temporary name
      $self->outfile($tempfile);
      $self->_READMETHOD('Blast');

      while (@args) {
          my $attr  = shift @args;
          my $value = shift @args;
          next if ( $attr eq '-verbose' );    # the workaround to deal with initializing
          $attr = 'p' if $attr =~ /^\s*program\s*$/;
          $self->$attr($value);
      }
      return $self;
  }
program

  sub program {
      my $self = shift;
      if ( wantarray ) {
          return ($self->executable, $self->p());
      } else {
          return $self->executable(@_);
      }
  }
program_dir

  sub program_dir {
      $PROGRAMDIR;
  }

program_path

  sub program_path {
      my ($self, $program_name) = @_;
      my @path;
      push @path, $self->program_dir if $self->program_dir;
      push @path, $program_name . ($^O =~ /mswin/i ? '.exe' : '');
      return Bio::Root::IO->catfile(@path);
  }
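
The path construction in program_path can be reproduced with the core File::Spec module. This is a sketch under assumptions: program_path_sketch is a hypothetical stand-alone function, it takes the program directory and the OS name as arguments instead of reading $PROGRAMDIR and $^O, and it uses File::Spec->catfile in place of Bio::Root::IO->catfile.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical stand-alone version of program_path.
sub program_path_sketch {
    my ($program_dir, $program_name, $os) = @_;
    my @path;
    push @path, $program_dir if $program_dir;    # empty dir: rely on PATH
    push @path, $program_name . ($os =~ /mswin/i ? '.exe' : '');
    return File::Spec->catfile(@path);
}

print program_path_sketch('', 'blastall', $^O), "\n";
print program_path_sketch('', 'blastall', 'MSWin32'), "\n";
```

When no program directory is configured the result is just the bare executable name, which is why the executable method falls back to searching for it elsewhere if this path does not exist.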
DEVELOPERS NOTES
FEEDBACK
Mailing Lists
  bioperl-l@bioperl.org - General discussion
  http://bio.perl.org/MailList.html - About the mailing lists
Reporting Bugs
  bioperl-bugs@bio.perl.org
  http://bio.perl.org/bioperl-bugs/
AUTHOR - Peter Schattner
APPENDIX
BLAST parameters
Blastall
  -p  Program Name [String]
      Input should be one of "blastp", "blastn", "blastx",
      "tblastn", or "tblastx".
  -d  Database [String] default = nr
      The database specified must first be formatted with formatdb.
      Multiple database names (bracketed by quotations) will be accepted.
      An example would be -d "nr est"
  -i  Query File [File In] Set by StandAloneBlast.pm from script.
      default = stdin. The query should be in FASTA format. If multiple
      FASTA entries are in the input file, all queries will be searched.
  -e  Expectation value (E) [Real] default = 10.0
  -o  BLAST report Output File [File Out] Optional,
      default = ./blastreport.out ; set by StandAloneBlast.pm
  -S  Query strands to search against database (for blast[nx], and
      tblastx). 3 is both, 1 is top, 2 is bottom [Integer]
      default = 3
Blastpgp (including Psiblast)
  -j  is the maximum number of rounds (default 1; i.e., regular BLAST)
  -h  is the e-value threshold for including sequences in the
      score matrix model (default 0.001)
  -c  is the "constant" used in the pseudocount formula specified in
      the paper (default 10)
  -B  Multiple alignment file for PSI-BLAST "jump start mode" Optional
  -Q  Output File for PSI-BLAST Matrix in ASCII [File Out] Optional
Bl2seq
  -i  First sequence [File In]
  -j  Second sequence [File In]
  -p  Program name: blastp, blastn, blastx. For blastx 1st argument
      should be nucleotide [String]
      default = blastp
  -o  alignment output file [File Out] default = stdout
  -e  Expectation value (E) [Real] default = 10.0
  -S  Query strands to search against database (blastn only).
      3 is both, 1 is top, 2 is bottom [Integer]
      default = 3
Methods
Bio::Tools::Run::Wrapper methods
no_param_checks
  Title   : no_param_checks
  Usage   : $obj->no_param_checks($newval)
  Function: Boolean flag as to whether or not we should
            trust the sanity checks for parameter values
  Returns : value of no_param_checks
  Args    : newvalue (optional)
save_tempfiles
  Title   : save_tempfiles
  Usage   : $obj->save_tempfiles($newval)
  Function: Get/set the flag that keeps temporary files; when it is
            false, DESTROY calls cleanup()
  Returns : value of save_tempfiles
  Args    : newvalue (optional)
outfile_name
  Title   : outfile_name
  Usage   : my $outfile = $factory->outfile_name();
  Function: Get/Set the name of the output file for this run
            (if you wanted to do something special)
  Returns : string
  Args    : [optional] string to set value to
tempdir
  Title   : tempdir
  Usage   : my $tmpdir = $self->tempdir();
  Function: Retrieve a temporary directory name (which is created)
  Returns : string which is the name of the temporary directory
  Args    : none
cleanup
  Title   : cleanup
  Usage   : $factory->cleanup();
  Function: Cleans up the tempdir directory after a run
  Returns : none
  Args    : none
io
  Title   : io
  Usage   : $obj->io($newval)
  Function: Gets a Bio::Root::IO object
  Returns : Bio::Root::IO
  Args    : none