Summary
Bio::DB::SwissProt - Database object interface to SwissProt retrieval
Package variables
No package variables defined.
Synopsis
use strict;
use warnings;
use Bio::DB::SwissProt;

my $sp = Bio::DB::SwissProt->new;

# Fetch by SwissProt ID (entry name):
#   <4-letter-identifier>_<5-letter species code>
my $seq = $sp->get_Seq_by_id('KPY1_ECOLI');

# ... or by SwissProt accession number: [OPQ]xxxxx
$seq = $sp->get_Seq_by_acc('P43780');

# In this implementation both methods call the same web script,
# so they can be used interchangeably.

# Choose a different server location to query
$sp = Bio::DB::SwissProt->new('-servertype'   => 'expasy',
                              '-hostlocation' => 'us');
$seq = $sp->get_Seq_by_id('BOLA_HAEIN');   # SwissProt ID
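The synopsis comments describe two identifier styles: entry names such as KPY1_ECOLI and accessions such as P43780. Because this implementation sends both to the same web script, callers do not have to tell them apart. If you still want to sanity-check input before querying, a simplified classifier could look like the following. The function name identifier_type is hypothetical (it is not part of Bio::DB::SwissProt), and the patterns are assumptions that cover only the two forms shown above.

```perl
use strict;
use warnings;

# Simplified, ASSUMED patterns based on the synopsis comments:
#   entry name (ID): <mnemonic>_<species code>, e.g. KPY1_ECOLI
#   accession (AC):  O, P or Q, a digit, three letters/digits, a digit, e.g. P43780
# Real UniProt identifiers come in more forms than these two patterns cover.
sub identifier_type {
    my ($id) = @_;
    return 'id'  if $id =~ /^[A-Z0-9]{1,5}_[A-Z0-9]{1,5}$/;
    return 'acc' if $id =~ /^[OPQ][0-9][A-Z0-9]{3}[0-9]$/;
    return 'unknown';
}

for my $id (qw(KPY1_ECOLI BOLA_HAEIN P43780 X77802)) {
    printf "%-10s %s\n", $id, identifier_type($id);
}
```

X77802, an accession in the one-letter-plus-five-digits style used by EMBL/GenBank, matches neither pattern and is reported as unknown.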
Description
SwissProt is a curated database of proteins managed by the Swiss
Institute of Bioinformatics. This is in contrast to EMBL/GenBank/DDBJ,
which are archives of nucleotide sequence information. Additional tools
for parsing and manipulating SwissProt files can be found at
ftp://ftp.ebi.ac.uk/pub/localsw/swissprot/Swissknife/.
This module allows dynamic retrieval of sequence objects (Bio::Seq) from
the SwissProt database via the ExPASy web server. Retrieval through SRS
may be added later.
To keep changes transparent, the server type (expasy, or ebi for the
EBI server) and the host location (defaulting to Switzerland) are kept
separate. This lets the user pick the closest ExPASy mirror for their
queries.
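Server type and location are resolved in two steps in the methods code below: servertype() selects an entry in the module's %HOSTS table and resets the location to that entry's default, and location_url() fills the chosen host into the entry's base URL with sprintf. The standalone sketch below mimics those two steps. The real contents of %HOSTS are not shown in this documentation, so the host table here is HYPOTHETICAL: only its key names (baseurl, default, hosts) mirror what the module reads, and the hostnames and path are placeholders. The resolve_url helper is likewise illustrative.

```perl
use strict;
use warnings;

# HYPOTHETICAL host table. Only the key names (baseurl, default, hosts)
# mirror the fields read by servertype() and location_url(); the URLs
# are placeholders, not the real ExPASy mirrors.
my %HOSTS = (
    expasy => {
        baseurl => 'http://%s/cgi-bin/placeholder',
        default => 'switzerland',
        hosts   => {
            switzerland => 'ch.example.org',
            us          => 'us.example.org',
        },
    },
);

# Resolve a server type plus an optional location into a URL, falling
# back to the server type's default location (as servertype() does) and
# rejecting unknown locations (as hostlocation() does).
sub resolve_url {
    my ($servertype, $location) = @_;
    my $server = $HOSTS{ lc($servertype) }
        or die "unknown server type '$servertype'\n";
    $location = defined $location ? lc $location : $server->{default};
    my $host = $server->{hosts}{$location}
        or die "unknown location '$location'; known: "
             . join(',', sort keys %{ $server->{hosts} }) . "\n";
    return sprintf $server->{baseurl}, $host;
}

print resolve_url('expasy'), "\n";        # default location
print resolve_url('expasy', 'US'), "\n";  # location names are case-insensitive
```

Keeping the location as a key into a per-server table, rather than storing a full URL, is what lets a caller switch mirrors without knowing each server's URL layout.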
Methods
Methods description
Title : default_format
Usage : my $format = $self->default_format
Function: Returns the default sequence format for this module
Returns : string
Args : none
Title : get_Stream_by_batch
Usage : $stream = $db->get_Stream_by_batch($ref);
Function: Retrieves Seq objects from SwissProt 'en masse', rather than one
at a time. This simply delegates to get_Stream_by_id, but is provided
here in keeping with the access methods of the NCBI modules.
Returns : a Bio::SeqIO stream object
Args : $ref : either an array reference, a filename, or a filehandle
from which to get the list of unique ids/accession numbers.
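Title : get_request
Usage : my $req = $self->get_request(-uids => $uids, -format => $format)
Function: Builds the HTTP::Request object used to query the server
Returns : an HTTP::Request object
Args : %qualifiers = a hash of qualifiers: -uids (a single id or a
reference to an array of ids; required) and -format (sequence format)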
Title : hostlocation
Usage : my $location = $self->hostlocation()
$self->hostlocation($location)
Function: Get/Set the host location (mirror). Names are case-insensitive
and must be known for the current server type.
Returns : string representing the host location
Args : string specifying the host location [optional]
Title : location_url
Usage : my $url = $self->location_url()
Function: Get the URL of the host for the current server type and host
location
Returns : string representing the URL
Args : none
Title : postprocess_data
Usage : $self->postprocess_data('type' => 'string',
'location' => \$datastr);
Function: Processes downloaded data before loading it into a Bio::SeqIO
stream; in this module it does nothing
Returns : void
Args : hash with two keys:
- 'type' can be 'string' or 'file'
- 'location' is either a file location or a string reference
containing the data
Title : request_format
Usage : my ($req_format, $ioformat) = $self->request_format;
$self->request_format("swiss");
$self->request_format("fasta");
Function: Get/Set the sequence format for retrieval. The get form will
normally not be used outside of this and derived modules.
Returns : Array of two strings: the format for retrieval, and the
corresponding SeqIO format.
Args : $format = sequence format ('swiss'/'sprot' or 'fasta'; any
other value triggers a warning and a fallback format)
Title : servertype
Usage : my $servertype = $self->servertype
$self->servertype($servertype);
Function: Get/Set the server type. Setting it also resets the host
location to the new server type's default.
Returns : string
Args : server type string [optional]
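The request_format entry above maps a requested format onto a pair: the name the server expects and the Bio::SeqIO format used to parse the reply. The standalone sketch below restates the request_format() code listed under "Methods code" as a lookup table, so the per-server differences are easier to see. The names %FORMATS and map_format are hypothetical, and the sketch matches the server type exactly, whereas the module uses a pattern match (/expasy/, /ebi/).

```perl
use strict;
use warnings;

# Table-driven restatement of request_format(): for each server type,
# each pair is [format sent to the server, Bio::SeqIO format for parsing].
my %FORMATS = (
    expasy => { swiss => [qw(sprot swiss)],     fasta => [qw(fasta fasta)],
                fallback => 'fasta' },
    ebi    => { swiss => [qw(swissprot swiss)], fasta => [qw(fasta fasta)],
                fallback => 'swiss' },
);

sub map_format {
    my ($servertype, $value) = @_;
    return unless defined $value;                  # module keeps the current format
    my $table = $FORMATS{$servertype} or return;   # other server types: no change
    my $key = ($value =~ /sprot|swiss/) ? 'swiss'
            : ($value =~ /^fa/)         ? 'fasta'
            :                             undef;
    unless (defined $key) {
        warn "Unrecognized format $value requested\n";
        $key = $table->{fallback};
    }
    return @{ $table->{$key} };
}

print join(',', map_format('expasy', 'swissprot')), "\n";
print join(',', map_format('ebi', 'swiss')), "\n";
```

Note that the fallback differs by server: an unrecognized format falls back to FASTA on ExPASy but to SwissProt format on EBI.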
Methods code
sub default_format {
    return $DEFAULTFORMAT;
}
sub get_Stream_by_batch {
    my ($self, $ids) = @_;
    return $self->get_Stream_by_id($ids);
}
sub get_request {
    my ($self, @qualifiers) = @_;
    my ($uids, $format) = $self->_rearrange([qw(UIDS FORMAT)],
                                            @qualifiers);

    if( !defined $uids ) {
        $self->throw("Must specify a value for uids to query");
    }
    # only the server-side format is needed here; the SeqIO format is ignored
    my ($f, undef) = $self->request_format($format);

    my %vars = (
        @{$HOSTS{$self->servertype}->{'basevars'}},
        ( 'format' => $f )
    );

    my $url = $self->location_url;
    my $uid;
    # separator for multiple ids and name of the id form field,
    # with defaults when the host table does not specify them
    my $jointype = $HOSTS{$self->servertype}->{'jointype'} || ' ';
    my $idvar    = $HOSTS{$self->servertype}->{'idvar'}    || 'id';

    if( ref($uids) =~ /ARRAY/i ) {
        $uid = join($jointype, @$uids);
    } else {
        $uid = $uids;
    }
    $vars{$idvar} = $uid;

    # POST (from HTTP::Request::Common) builds an HTTP::Request
    # carrying %vars as form data
    return POST $url, \%vars;
}
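The form that get_request() posts is assembled from the host table's base variables, the server-side format, and the ids joined with the server's separator. The sketch below reproduces just that assembly step, without building the HTTP request. The function name build_form and the basevars value are hypothetical; the defaults for the separator (' ') and the id field name ('id') are taken from the code above.

```perl
use strict;
use warnings;

# Assemble the POST form fields the way get_request() does.
# $server stands in for one entry of the module's %HOSTS table.
sub build_form {
    my ($server, $uids, $format) = @_;
    die "Must specify a value for uids to query\n" unless defined $uids;
    my %vars = (@{ $server->{basevars} || [] }, format => $format);
    my $jointype = $server->{jointype} || ' ';    # same defaults as get_request()
    my $idvar    = $server->{idvar}    || 'id';
    # an array reference of ids is joined into one field; a single id is used as-is
    $vars{$idvar} = ref($uids) eq 'ARRAY' ? join($jointype, @$uids) : $uids;
    return \%vars;
}

my $server = { basevars => [ db => 'placeholder' ] };    # HYPOTHETICAL values
my $form = build_form($server, [qw(KPY1_ECOLI P43780)], 'sprot');
print "$_=$form->{$_}\n" for sort keys %$form;
```

In the module, the resulting hash is then passed to POST together with location_url(); keeping the separator and field name in the host table lets each server type use its own query conventions.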
sub hostlocation {
    my ($self, $location) = @_;
    # lowercase only when a value was given, to avoid an
    # "uninitialized value" warning when called as a getter
    $location = lc $location if defined $location;
    my $servertype = $self->servertype;
    $self->throw("Must have a valid servertype defined not $servertype")
        unless defined $servertype;
    my %hosts = %{$HOSTS{$servertype}->{'hosts'}};
    if( defined $location && $location ne '' ) {
        if( ! $hosts{$location} ) {
            $self->throw("Must specify a known host, not $location,".
                         " possible values (".
                         join(",", sort keys %hosts ). ")");
        }
        $self->{'_hostlocation'} = $location;
    }
    return $self->{'_hostlocation'};
}
sub location_url {
    my ($self) = @_;
    my $servertype = $self->servertype();
    my $location   = $self->hostlocation();

    if( ! defined $location || !defined $servertype ) {
        $self->throw("must have a valid hostlocation and servertype set before calling location_url");
    }
    return sprintf($HOSTS{$servertype}->{'baseurl'},
                   $HOSTS{$servertype}->{'hosts'}->{$location});
}
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($format, $hostlocation, $servertype) =
        $self->_rearrange([qw(FORMAT HOSTLOCATION SERVERTYPE)],
                          @args);

    if( $format && $format !~ /(swiss)|(fasta)/i ) {
        $self->warn("Requested Format $format is ignored because only SwissProt and Fasta formats are currently supported");
        $format = $self->default_format;
    }

    $servertype = $DEFAULTSERVERTYPE unless $servertype;
    $servertype = lc $servertype;
    $self->servertype($servertype);
    if ( $hostlocation ) {
        $self->hostlocation(lc $hostlocation);
    }
    $self->request_format($format);
    return $self;
}
sub postprocess_data {
    my ($self, %args) = @_;
    # no post-processing: downloaded data goes to Bio::SeqIO unchanged
    return;
}
sub request_format {
    my ($self, $value) = @_;
    if( defined $value ) {
        if( $self->servertype =~ /expasy/ ) {
            if( $value =~ /sprot/ || $value =~ /swiss/ ) {
                $self->{'_format'} = [ 'sprot', 'swiss'];
            } elsif( $value =~ /^fa/ ) {
                $self->{'_format'} = [ 'fasta', 'fasta'];
            } else {
                $self->warn("Unrecognized format $value requested");
                $self->{'_format'} = [ 'fasta', 'fasta'];
            }
        } elsif( $self->servertype =~ /ebi/ ) {
            if( $value =~ /sprot/ || $value =~ /swiss/ ) {
                $self->{'_format'} = [ 'swissprot', 'swiss' ];
            } elsif( $value =~ /^fa/ ) {
                $self->{'_format'} = [ 'fasta', 'fasta'];
            } else {
                $self->warn("Unrecognized format $value requested");
                $self->{'_format'} = [ 'swissprot', 'swiss'];
            }
        }
    }
    return @{$self->{'_format'}};
}
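sub servertype {
    my ($self, $servertype) = @_;
    if( defined $servertype && $servertype ne '') {
        # join the names: "keys %HOSTS" in string concatenation
        # would only yield the number of server types
        $self->throw("You gave an invalid server type ($servertype)".
                     " - available types are ".
                     join(",", sort keys %HOSTS)) unless( $HOSTS{$servertype} );
        $self->{'_servertype'} = $servertype;
        # switching server type resets the location to that server's default
        $self->{'_hostlocation'} = $HOSTS{$servertype}->{'default'};
        # re-apply the current format so it maps onto the new server's names
        my ($existingformat, $seqioformat) = $self->request_format;
        $self->request_format($existingformat);
    }
    return $self->{'_servertype'} || $DEFAULTSERVERTYPE;
}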
General documentation
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bio.perl.org/MailList.html - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email
or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
AUTHOR - Jason Stajich
Email jason@bioperl.org
Thanks go to Alexandre Gattiker <gattiker@isb-sib.ch> of Swiss
Institute of Bioinformatics for helping point us in the direction of
the correct expasy scripts and for swissknife references.
Also thanks to Heikki Lehvaslaiho <heikki@ebi.ac.uk> for help with
adding EBI swall server.
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with an underscore (_).
Routines from Bio::DB::RandomAccessI
Title : get_Seq_by_id
Usage : $seq = $db->get_Seq_by_id('ROA1_HUMAN')
Function: Gets a Bio::Seq object by its name
Returns : a Bio::Seq object
Args : the id (as a string) of a sequence
Throws : "id does not exist" exception
Title : get_Seq_by_acc
Usage : $seq = $db->get_Seq_by_acc('P43780');
Function: Gets a Bio::Seq object by accession number
Returns : A Bio::Seq object
Args : accession number (as a string)
Throws : "acc does not exist" exception
Title : get_Stream_by_id
Usage : $stream = $db->get_Stream_by_id( [$uid1, $uid2] );
Function: Gets a series of Seq objects by unique identifiers
Returns : a Bio::SeqIO stream object
Args : $ref : a reference to an array of unique identifiers for
the desired sequence entries
Title : get_Stream_by_acc
Usage : $stream = $db->get_Stream_by_acc([$acc1, $acc2]);
Function: Gets a series of Seq objects by accession numbers
Returns : a Bio::SeqIO stream object
Args : $ref : a reference to an array of accession numbers for
the desired sequence entries
Note : For GenBank, this just calls the same code as get_Stream_by_id()
Implemented Routines from Bio::DB::WebDBSeqI interface
Bio::DB::SwissProt specific routines