Bio::SeqIO
genbank
Summary
Bio::SeqIO::GenBank - GenBank sequence input/output stream
Package variables
No package variables defined.
Synopsis
It is probably best not to use this object directly, but rather to go
through the SeqIO handler system:
$stream = Bio::SeqIO->new(-file => $filename, -format => 'GenBank');
while ( my $seq = $stream->next_seq() ) {
# do something with $seq
}
Description
This object can transform Bio::Seq objects to and from GenBank flat
file databases.
There is a lot of flexibility in how records are written out, and it is
not yet fully documented.
This section is supposed to document which sections and properties of
a GenBank databank record end up where in the Bioperl object model. It
is far from complete and presently focuses only on those mappings
which may be non-obvious. $seq in the text refers to the
Bio::Seq::RichSeqI implementing object returned by the parser for each
record.
GI number
$seq->primary_id
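_show_dna()
(output only) controls whether the sequence itself is written. It defaults
to 1; when set to 0, write_seq() closes the record right after the feature
table and omits the BASE COUNT, ORIGIN and sequence lines.
_post_sort()
(output only) holds a comparison function which is used to sort the
FTHelpers before printing. It is called with the two FTHelper objects to
compare and must return a negative, zero or positive value, as in sort.
_id_generation_func()
This is a function which is called as
$func->($seq)
and whose return value is printed, followed by a newline, as the complete
LOCUS line; no prefix is added. If it is not set, a sensible LOCUS line is
generated from the id, length, alphabet, molecule type, topology, division
and date of the sequence object.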
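The callback contracts described above can be sketched as follows. This is a minimal illustration, not code from the module: the names locus_line, by_feature_key and StubSeq are invented here, and StubSeq is only a stand-in so the example runs without BioPerl. The LOCUS layout is deliberately simplified.

```perl
use strict;
use warnings;

# Hypothetical callback for _id_generation_func(): it receives the sequence
# object and returns the complete LOCUS line, without a trailing newline.
sub locus_line {
    my ($seq) = @_;
    return sprintf('%-12s%-15s%13s bp', 'LOCUS', $seq->display_id, $seq->length);
}

# Hypothetical comparator for _post_sort(): it receives two FTHelper objects
# and orders them by feature key.
sub by_feature_key {
    my ($left, $right) = @_;
    return $left->key cmp $right->key;
}

# With BioPerl installed, the callbacks would be registered like this:
#   my $out = Bio::SeqIO->new(-file => '>out.gb', -format => 'genbank');
#   $out->_id_generation_func(\&locus_line);
#   $out->_post_sort(\&by_feature_key);
#   $out->_show_dna(0);    # stop after the feature table

# Minimal stand-in object (an assumption, not a BioPerl class) so that the
# LOCUS callback can be tried without BioPerl.
{
    package StubSeq;
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub display_id { return $_[0]{id} }
    sub length     { return $_[0]{length} }
}

my $line = locus_line(StubSeq->new(id => 'EXAMPLE1', length => 8));
print "$line\n";
```

Because the return value of the ID function is printed verbatim, the callback is responsible for the LOCUS keyword and all column padding.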
If you want to output annotations in GenBank format, they need to be
stored in a Bio::Annotation::Collection object, which is accessible
through the Bio::SeqI interface method annotation().
The following are the names of the keys which are read from the
Bio::Annotation::Collection object:
reference - Should contain Bio::Annotation::Reference objects
comment - Should contain Bio::Annotation::Comment objects
segment - Should contain a Bio::Annotation::SimpleValue object
origin - Should contain a Bio::Annotation::SimpleValue object
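As a hedged sketch of the key names listed above, the following builds a collection by hand and attaches it to a sequence. It assumes BioPerl is installed; the sequence, comment text and reference details are invented for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Annotation::Collection;
use Bio::Annotation::Comment;
use Bio::Annotation::Reference;
use Bio::Annotation::SimpleValue;

# Each annotation is stored under the key that the GenBank writer looks up.
my $coll = Bio::Annotation::Collection->new();
$coll->add_Annotation('comment',
    Bio::Annotation::Comment->new(-text => 'Invented example comment.'));
$coll->add_Annotation('reference',
    Bio::Annotation::Reference->new(-authors  => 'Doe,J.',
                                    -title    => 'An invented title',
                                    -location => 'Unpublished',
                                    -start    => 1,
                                    -end      => 8));
$coll->add_Annotation('origin',
    Bio::Annotation::SimpleValue->new(-value => ' invented origin text'));

# Attach the collection through the annotation() method of the sequence.
my $seq = Bio::Seq->new(-seq => 'atgcatgc', -display_id => 'EXAMPLE1');
$seq->annotation($coll);

# The writer retrieves the objects again by key.
my @comments = $seq->annotation->get_Annotations('comment');
print $comments[0]->text, "\n";
```

Annotations stored under other key names are not in the list above and so would not be picked up by the writer.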
Methods
Methods description
 Title   : _ac_generation_func
 Usage   : $obj->_ac_generation_func($newval)
 Function: get/set the function that write_seq() calls with the sequence
           object to produce the text of the ACCESSION line
 Returns : value of _ac_generation_func
 Args    : newvalue (optional)

 Title   : _id_generation_func
 Usage   : $obj->_id_generation_func($newval)
 Function: get/set the function that write_seq() calls with the sequence
           object to produce the complete LOCUS line
 Returns : value of _id_generation_func
 Args    : newvalue (optional)

 Title   : _kw_generation_func
 Usage   : $obj->_kw_generation_func($newval)
 Function: get/set the function that write_seq() calls with the sequence
           object to produce the text of the KEYWORDS line
 Returns : value of _kw_generation_func
 Args    : newvalue (optional)

 Title   : _post_sort
 Usage   : $obj->_post_sort($newval)
 Function: get/set the comparison function used to sort the FTHelpers
           before they are printed
 Returns : value of _post_sort
 Args    : newvalue (optional)

 Title   : _print_GenBank_FTHelper
 Usage   : $obj->_print_GenBank_FTHelper($fth, $always_quote)
 Function: writes the key, location and qualifiers of one FTHelper
 Example :
 Returns :
 Args    : Bio::SeqIO::FTHelper object, always-quote flag (optional)

 Title   : _read_FTHelper_GenBank
 Usage   : _read_FTHelper_GenBank($buffer)
 Function: reads the next FT key line
 Example :
 Returns : Bio::SeqIO::FTHelper object
 Args    : reference to a scalar holding the current line

 Title   : _read_GenBank_References
 Usage   : @refs = $obj->_read_GenBank_References($buffer)
 Function: Reads references from GenBank format. Internal function really
 Returns : list of Bio::Annotation::Reference objects
 Args    : a reference to the current line buffer

 Title   : _read_GenBank_Species
 Usage   :
 Function: Reads the GenBank Organism species and classification lines.
 Example :
 Returns : A Bio::Species object
 Args    : a reference to the current line buffer

 Title   : _show_dna
 Usage   : $obj->_show_dna($newval)
 Function: get/set the flag that controls whether write_seq() writes the
           sequence itself
 Returns : value of _show_dna
 Args    : newvalue (optional)

 Title   : _sv_generation_func
 Usage   : $obj->_sv_generation_func($newval)
 Function: get/set the function that write_seq() calls with the sequence
           object to produce the text of the VERSION line
 Returns : value of _sv_generation_func
 Args    : newvalue (optional)

 Title   : _write_line_GenBank
 Usage   :
 Function: internal function for writing text broken into lines of a
           fixed length
 Example :
 Returns :
 Args    : first header, second header, text-line, total line length

 Title   : _write_line_GenBank_regex
 Usage   :
 Function: internal function for writing lines of specified length, with
           different first and the next line left hand headers and split
           at specific points in the text
 Example :
 Returns : nothing
 Args    : first header, second header, text-line, regex for line breaks,
           total line length

 Title   : next_seq
 Usage   : $seq = $stream->next_seq()
 Function: returns the next sequence in the stream
 Returns : Bio::Seq object (a Bio::Seq::RichSeq with the default factory)
 Args    :

 Title   : write_seq
 Usage   : $stream->write_seq($seq)
 Function: writes the $seq object (must be a Bio::SeqI) to the stream
 Returns : 1 for success and 0 for error
 Args    : array of 1 to n Bio::SeqI objects
Methods code
sub _ac_generation_func {
    my ($obj,$value) = @_;
    if( defined $value ) {
        $obj->{'_ac_generation_func'} = $value;
    }
    return $obj->{'_ac_generation_func'};
}
sub _add_ref_to_array {
    my ($self, $refs, $ref) = @_;
    my $au = $ref->authors();
    my $title = $ref->title();
    $au =~ s/;\s*$//g if $au;
    $title =~ s/;\s*$//g if $title;
    $ref->authors($au);
    $ref->title($title);
    push(@{$refs}, $ref);
}
sub _id_generation_func {
    my ($obj,$value) = @_;
    if( defined $value ) {
        $obj->{'_id_generation_func'} = $value;
    }
    return $obj->{'_id_generation_func'};
}
sub _initialize {
    my($self,@args) = @_;
    $self->SUPER::_initialize(@args);
    $self->{'_func_ftunit_hash'} = {};
    $self->_show_dna(1);
    if( ! defined $self->sequence_factory ) {
        $self->sequence_factory(Bio::Seq::SeqFactory->new
                                (-verbose => $self->verbose(),
                                 -type => 'Bio::Seq::RichSeq'));
    }
}
sub _kw_generation_func {
    my ($obj,$value) = @_;
    if( defined $value ) {
        $obj->{'_kw_generation_func'} = $value;
    }
    return $obj->{'_kw_generation_func'};
}
sub _post_sort {
    my ($obj,$value) = @_;
    if( defined $value) {
        $obj->{'_post_sort'} = $value;
    }
    return $obj->{'_post_sort'};
}
sub _print_GenBank_FTHelper
{ my ($self,$fth,$always_quote) = @_;
if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
# warn through $self: $fth may not be an object at this point
$self->warn("$fth is not a FTHelper class. Attempting to print, but there could be tears!");
}
if( defined $fth->key &&
$fth->key eq 'CONTIG' ) {
$self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
' 'x12,$fth->loc,"\,\|\$",80);
} else {
$self->_write_line_GenBank_regex(sprintf("%s%-16s",' 'x5,$fth->key),
" "x21,
$fth->loc,"\,\|\$",80);
}
if( !defined $always_quote) { $always_quote = 0; }
foreach my $tag ( keys %{$fth->field} ) {
foreach my $value ( @{$fth->field->{$tag}} ) {
$value =~ s/\"/\"\"/g;
if ($value eq "_no_value") {
$self->_write_line_GenBank_regex(" "x21,
" "x21,
"/$tag","\.\|\$",80);
}
elsif( $always_quote == 1 || $value !~ /^\d+$/ ) {
my ($pat) = ($value =~ /\s/ ? '\s|$' : '.|$');
$self->_write_line_GenBank_regex(" "x21,
" "x21,
"/$tag=\"$value\"",$pat,80);
} else {
$self->_write_line_GenBank_regex(" "x21,
" "x21,
"/$tag=$value","\.\|\$",80);
}
}
}
}
sub _read_FTHelper_GenBank
{ my ($self,$buffer) = @_;
my ($key, $loc );
my @qual = ();
if ($$buffer =~ /^ {5}(\S+)\s+(.+?)\s*$/o) {
$key = $1;
$loc = $2;
while ( defined($_ = $self->_readline) ) {
if (/^(\s+)(.+?)\s*$/o) {
if (length($1) > 6) {
if (@qual || (index($2,'/') == 0)) {
push(@qual, $2);
}
else {
$loc .= $2;
}
} else {
last;
}
} else {
last;
}
}
} else {
$self->debug("no feature key!\n");
$$buffer = $self->_readline();
return;
}
$$buffer = $_;
my $out = new Bio::SeqIO::FTHelper();
$out->verbose($self->verbose());
$out->key($key);
$out->loc($loc);
QUAL: for (my $i = 0; $i < @qual; $i++) {
$_ = $qual[$i];
my( $qualifier, $value ) = (m{^/([^=]+)(?:=(.+))?})
or $self->warn("cannot see new qualifier in feature $key: ".
$qual[$i]);
$qualifier = '' unless( defined $qualifier);
if (defined $value) {
if (substr($value, 0, 1) eq '"') {
while ($value !~ /\"$/ or $value =~ tr/"/"/ % 2) {
if($i >= $#qual) {
$self->warn("Unbalanced quote in:\n" .
join('', map("$_\n", @qual)) .
"No further qualifiers will " .
"be added for this feature");
last QUAL;
}
$i++;
my $next = $qual[$i];
if(($value.$next) =~ /[^A-Za-z"-]/) {
$value .= " ";
}
$value .= $next;
}
$value =~ s/^"|"$//g;
$value =~ s/""/\"/g;
}
} else {
$value = '_no_value';
}
$out->field->{$qualifier} ||= [];
push(@{$out->field->{$qualifier}},$value);
}
return $out;
}
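The qualifier handling in _read_FTHelper_GenBank can be illustrated in isolation. The sketch below is a simplified stand-alone version, not the module's code: parse_qualifier is a name invented here, and it only handles a qualifier whose value fits on one line, whereas the method above also joins quoted values that continue over several lines.

```perl
use strict;
use warnings;

# Split one qualifier line into its name and value. A qualifier without
# '=' gets the placeholder value '_no_value', as in the listing above.
sub parse_qualifier {
    my ($line) = @_;
    my ($name, $value) = $line =~ m{^/([^=]+)(?:=(.+))?} or return;
    if (defined $value) {
        # Strip the surrounding quotes, then undo the doubling of embedded quotes.
        if ($value =~ s/^"(.*)"$/$1/s) {
            $value =~ s/""/"/g;
        }
    } else {
        $value = '_no_value';
    }
    return ($name, $value);
}

for my $line ('/gene="abc"', '/codon_start=1', '/pseudo', '/note="a ""b"" c"') {
    my ($name, $value) = parse_qualifier($line);
    print "$name => $value\n";
}
```

The '_no_value' placeholder is what lets _print_GenBank_FTHelper write such a qualifier back as a bare /name on output.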
sub _read_GenBank_References
{ my ($self,$buffer) = @_;
my (@refs);
my $ref;
if( $$buffer !~ /^REFERENCE/ ) {
$self->warn("Not parsing line '$$buffer' which may be important");
}
$_ = $$buffer;
my (@title,@loc,@authors,@com,@medline,@pubmed);
REFLOOP: while( defined($_) || defined($_ = $self->_readline) ) {
if (/^ {2}AUTHORS\s+(.*)/) {
push (@authors, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push (@authors, $1);next;};
last;
}
$ref->authors(join(' ', @authors));
}
if (/^ {2}TITLE\s+(.*)/) {
push (@title, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push (@title, $1);
next;
};
last;
}
$ref->title(join(' ', @title));
}
if (/^ {2}JOURNAL\s+(.*)/) {
push(@loc, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push(@loc, $1);
next;
};
last;
}
$ref->location(join(' ', @loc));
redo REFLOOP;
}
if (/^ {2}REMARK\s+(.*)/) {
push (@com, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push(@com, $1);
next;
};
last;
}
$ref->comment(join(' ', @com));
redo REFLOOP;
}
if( /^ {2}MEDLINE\s+(.*)/ ) {
push(@medline,$1);
while ( defined($_ = $self->_readline) ) {
/^\s{4,}(.*)/ && do { push(@medline, $1);
next;
};
last;
}
$ref->medline(join(' ', @medline));
redo REFLOOP;
}
if( /^ {3}PUBMED\s+(.*)/ ) {
push(@pubmed,$1);
while ( defined($_ = $self->_readline) ) {
/^\s{5,}(.*)/ && do { push(@pubmed, $1);
next;
};
last;
}
$ref->pubmed(join(' ', @pubmed));
redo REFLOOP;
}
/^REFERENCE/ && do {
$self->_add_ref_to_array(\@refs,$ref) if $ref;
@authors = ();
@title = ();
@loc = ();
@com = ();
@pubmed = ();
@medline = ();
$ref = Bio::Annotation::Reference->new();
if (/^REFERENCE\s+\d+\s+\([a-z]+ (\d+) to (\d+)/){
$ref->start($1);
$ref->end($2);
}
};
/^(FEATURES|COMMENT)/ && last;
$_ = undef;
}
$self->_add_ref_to_array(\@refs,$ref) if $ref;
$$buffer = $_;
return @refs;
}
sub _read_GenBank_Species
{ my( $self,$buffer) = @_;
my @organell_names = ("chloroplast", "mitochondr");
$_ = $$buffer;
my( $sub_species, $species, $genus, $common, $organelle, @class );
while (defined($_) || defined($_ = $self->_readline())) {
s/<[^>]+>//g;
if (/^SOURCE\s+(.*)/) {
$common = $1;
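$common =~ s/\.$//;
} elsif (/^\s+ORGANISM/) {
my @spflds = split(' ', $_);
shift(@spflds);
# an organelle prefix is recognised when the first field starts with
# one of the (possibly truncated) names in @organell_names
if(grep { $spflds[0] =~ /^$_/i; } @organell_names) {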
$organelle = shift(@spflds);
}
$genus = shift(@spflds);
if(@spflds) {
$species = shift(@spflds);
} else {
$species = "sp.";
}
$sub_species = shift(@spflds) if(@spflds);
} elsif (/^\s+(.+)/) {
push(@class, map { s/^\s+//; s/\s+$//; $_; } split /[;\.]+/, $1);
} else {
last;
}
$_ = undef;
}
$$buffer = $_;
return unless $genus and $genus !~ /^(Unknown|None)$/i;
if ($class[$#class] eq $genus) {
push( @class, $species );
} else {
push( @class, $genus, $species );
}
@class = reverse @class;
my $make = Bio::Species->new();
$make->classification(\@class, "FORCE" );
$make->common_name( $common ) if $common;
$make->sub_species( $sub_species ) if $sub_species;
$make->organelle($organelle) if $organelle;
return $make;
}
sub _show_dna {
    my ($obj,$value) = @_;
    if( defined $value) {
        $obj->{'_show_dna'} = $value;
    }
    return $obj->{'_show_dna'};
}
sub _sv_generation_func {
    my ($obj,$value) = @_;
    if( defined $value ) {
        $obj->{'_sv_generation_func'} = $value;
    }
    return $obj->{'_sv_generation_func'};
}
sub _write_line_GenBank {
    my ($self,$pre1,$pre2,$line,$length) = @_;
    $length || $self->throw("Miscalled write_line_GenBank without length. Programming error!");
    my $subl = $length - length $pre2;    # text width on continuation lines
    my $linel = length $line;
    my $i;
    # first line: header $pre1 followed by as much text as fits
    my $sub = substr($line,0,$length - length $pre1);
    $self->_print("$pre1$sub\n");
    # remaining text: header $pre2 followed by chunks of $subl characters
    for($i= ($length - length $pre1);$i < $linel;) {
        $sub = substr($line,$i,($subl));
        $self->_print("$pre2$sub\n");
        $i += $subl;
    }
}
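The fixed-width splitting done by _write_line_GenBank can be tried on its own. The following is a sketch under the assumption that the lines are returned instead of printed to the stream; wrap_fixed is a name invented here.

```perl
use strict;
use warnings;

# Break $text into lines of at most $width characters. The first line
# starts with $pre1, every following line with $pre2.
sub wrap_fixed {
    my ($pre1, $pre2, $text, $width) = @_;
    my $first = $width - length $pre1;    # text that fits on the first line
    my $step  = $width - length $pre2;    # text that fits on later lines
    die "headers must be shorter than the line width\n"
        if $first <= 0 || $step <= 0;
    my @lines = ($pre1 . substr($text, 0, $first));
    for (my $i = $first; $i < length($text); $i += $step) {
        push @lines, $pre2 . substr($text, $i, $step);
    }
    return @lines;
}

print "$_\n" for wrap_fixed('AB  ', '    ', 'abcdefghij', 8);
```

Unlike _write_line_GenBank_regex, this kind of splitting ignores word boundaries, so it can cut a word in two.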
sub _write_line_GenBank_regex
{ my ($self,$pre1,$pre2,$line,$regex,$length) = @_;
$length || $self->throw( "Miscalled write_line_GenBank without length. Programming error!");
my $subl = $length - (length $pre1) - 2;
my @lines = ();
CHUNK: while($line) {
foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
if($line =~ m/^(.{1,$subl})($pat)(.*)/) {
$line = $3;
my $l = $1.$2;
$l =~ s/\s+$//;
push(@lines, $l);
next CHUNK;
}
}
$self->warn("trouble dissecting \"$line\" into chunks ".
"of $subl chars or less - this tag won't print right");
$line = substr($line,0,$subl) . " " . substr($line,$subl);
}
my $s = shift @lines;
$self->_print("$pre1$s\n");
foreach my $s ( @lines ) {
$self->_print("$pre2$s\n");
}
}
sub next_seq
{ my ($self,@args) = @_;
my $builder = $self->sequence_builder();
my $seq;
my %params;
RECORDSTART: while (1) {
my $buffer;
my (@acc, @features);
my ($display_id, $annotation);
my $species;
@features = ();
$annotation = undef;
@acc = ();
$species = undef;
%params = (-verbose => $self->verbose);
local($/) = "\n";
while(defined($buffer = $self->_readline())) {
last if index($buffer,'LOCUS ') == 0;
}
return undef if( !defined $buffer );
$buffer =~ /^LOCUS\s+(\S.*)$/ ||
$self->throw("GenBank stream with bad LOCUS line. Not GenBank in my book. Got '$buffer'");
my @tokens = split(' ', $1);
$display_id = shift(@tokens);
$params{'-display_id'} = $display_id;
$params{'-length'} = shift(@tokens);
$params{'-alphabet'} = (lc(shift @tokens) eq 'bp') ? 'dna' : 'protein';
if (($params{'-alphabet'} eq 'dna') || (@tokens > 2)) {
$params{'-molecule'} = shift(@tokens);
my $circ = shift(@tokens);
if ($circ eq 'circular') {
$params{'-is_circular'} = 1;
$params{'-division'} = shift(@tokens);
} else {
$params{'-division'} =
(CORE::length($circ) == 3 ) ? $circ : shift(@tokens);
}
} else {
# -alphabet is either 'dna' or 'protein' at this point
$params{'-molecule'} = 'PRT' if($params{'-alphabet'} eq 'protein');
$params{'-division'} = shift(@tokens);
}
my $date = join(' ', @tokens);
if($date =~ s/\s*((\d{1,2})-(\w{3})-(\d{2,4})).*/$1/) {
if( length($date) < 11 ) {
my ($d,$m,$y) = ($2,$3,$4);
if( length($d) == 1 ) {
$d = "0$d";
}
if( length($y) == 2 ) {
if( $y > 60 ) {
$y = "19$y";
} else {
$y = "20$y";
}
$self->warn("Date was malformed, guessing the century for $date to be $y\n");
}
$params{'-dates'} = [join('-',$d,$m,$y)];
} else {
$params{'-dates'} = [$date];
}
}
$builder->add_slot_value(%params);
%params = ();
if(! $builder->want_object()) {
$builder->make_object();
next RECORDSTART;
}
if($builder->want_slot('annotation')) {
$annotation = new Bio::Annotation::Collection;
}
$buffer = $self->_readline();
until( !defined ($buffer) ) {
$_ = $buffer;
if (/^DEFINITION\s+(\S.*\S)/) {
my @desc = ($1);
while ( defined($_ = $self->_readline) ) {
if( /^\s+(.*)/ ) { push (@desc, $1); next };
last;
}
$builder->add_slot_value(-desc => join(' ', @desc));
}
if( /^ACCESSION\s+(\S.*\S)/ ) {
push(@acc, split(/\s+/,$1));
while( defined($_ = $self->_readline) ) {
/^\s+(.*)/ && do { push (@acc, split(/\s+/,$1)); next };
last;
}
$buffer = $_;
next;
}
elsif( /^PID\s+(\S+)/ ) {
$params{'-pid'} = $1;
}
elsif( /^VERSION\s+(.+)$/ ) {
my ($acc,$gi) = split(' ',$1);
if($acc =~ /^\w+\.(\d+)/) {
$params{'-version'} = $1;
$params{'-seq_version'} = $1;
}
if($gi && (index($gi,"GI:") == 0)) {
$params{'-primary_id'} = substr($gi,3);
}
}
elsif( /^KEYWORDS\s+(.*)/ ) {
my @kw = split(/\s*\;\s*/,$1);
while( defined($_ = $self->_readline) ) {
chomp;
/^\s+(.*)/ && do { push (@kw, split(/\s*\;\s*/,$1)); next };
last;
}
@kw && $kw[-1] =~ s/\.$//;
$params{'-keywords'} = \@kw;
$buffer = $_;
next;
}
elsif (/^SOURCE/) {
if($builder->want_slot('species')) {
$species = $self->_read_GenBank_Species(\$buffer);
$builder->add_slot_value(-species => $species);
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
}
elsif (/^REFERENCE/) {
if($annotation) {
my @refs = $self->_read_GenBank_References(\$buffer);
foreach my $ref ( @refs ) {
$annotation->add_Annotation('reference',$ref);
}
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
}
elsif (/^COMMENT\s+(.*)/) {
if($annotation) {
my $comment = $1;
while (defined($_ = $self->_readline)) {
last if (/^\S/);
$comment .= $_;
}
$comment =~ s/\n/ /g;
$comment =~ s/ +/ /g;
$annotation->add_Annotation(
'comment',
Bio::Annotation::Comment->new(-text => $comment));
$buffer = $_;
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
} elsif( /^SEGMENT\s+(.+)/ ) {
if($annotation) {
my $segment = $1;
while (defined($_ = $self->_readline)) {
last if (/^\S/);
$segment .= $_;
}
$segment =~ s/\n/ /g;
$segment =~ s/ +/ /g;
$annotation->add_Annotation(
'segment',
Bio::Annotation::SimpleValue->new(-value => $segment));
$buffer = $_;
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
}
last if( /^(FEATURES|ORIGIN)/ );
$buffer = $self->_readline;
}
return undef if(! defined($buffer));
$builder->add_slot_value(-accession_number => shift(@acc),
-secondary_accessions => \@acc,
%params);
$builder->add_slot_value(-annotation => $annotation) if $annotation;
%params = ();
if(! $builder->want_object()) {
$builder->make_object();
next RECORDSTART;
}
if($builder->want_slot('features') && defined($_) && /^FEATURES/) {
$buffer = $self->_readline;
while( defined($buffer) ) {
last if(($buffer =~ /^BASE/) || ($buffer =~ /^ORIGIN/) ||
($buffer =~ /^CONTIG/) );
my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
if( !defined $ftunit ) {
# %params has been emptied by now, so report the id kept in $display_id
$self->warn("Unexpected error in feature table for ".$display_id." Skipping feature, attempting to recover");
unless( ($buffer =~ /^\s{5,5}\S+/) or ($buffer =~ /^\S+/)) {
$buffer = $self->_readline();
}
next;
}
my $feat =
$ftunit->_generic_seqfeature($self->location_factory(),
$display_id);
if($species && ($feat->primary_tag eq 'source') &&
$feat->has_tag('db_xref') && (! $species->ncbi_taxid())) {
foreach my $tagval ($feat->get_tag_values('db_xref')) {
if(index($tagval,"taxon:") == 0) {
$species->ncbi_taxid(substr($tagval,6));
}
}
}
push(@features, $feat);
}
$builder->add_slot_value(-features => \@features);
$_ = $buffer;
}
if( defined ($_) ) {
if( /^CONTIG/ && $builder->want_slot('features')) {
# indent the CONTIG line by 5 spaces so that it is parsed like a feature key line
my $contig_line = (' ' x 5) . $_;
my $ftunit = $self->_read_FTHelper_GenBank(\$contig_line);
if( ! defined $ftunit ) {
$self->warn("unable to parse the CONTIG feature\n");
} else {
push(@features,
$ftunit->_generic_seqfeature($self->location_factory(),
$display_id));
}
} elsif(! /^(ORIGIN|\/\/)/ ) {
while (defined( $_ = $self->_readline) ) {
last if /^(ORIGIN|\/\/)/;
}
}
}
if(! $builder->want_object()) {
$builder->make_object();
next RECORDSTART;
}
if($builder->want_slot('seq')) {
if(defined($_) && s/^ORIGIN//) {
chomp;
if( $annotation && length($_) > 0 ) {
$annotation->add_Annotation('origin',
Bio::Annotation::SimpleValue->new(-value => $_));
}
my $seqc = '';
while( defined($_ = $self->_readline) ) {
/^\/\// && last;
$_ = uc($_);
s/[^A-Za-z]//g;
$seqc .= $_;
}
$self->debug("sequence length is ". length($seqc) ."\n");
$builder->add_slot_value(-seq => $seqc);
}
} elsif ( defined($_) && (substr($_,0,2) ne '//')) {
while( defined($_ = $self->_readline) ) {
last if substr($_,0,2) eq '//';
}
}
$seq = $builder->make_object();
next RECORDSTART unless $seq;
last RECORDSTART;
}
return $seq;
}
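The date handling in next_seq pads a one-digit day and guesses the century of a two-digit year, treating years above 60 as 19xx and all others as 20xx. A stand-alone sketch of that rule follows; normalize_locus_date is a name invented here, and unlike the method above it emits no warning.

```perl
use strict;
use warnings;

# Find a DD-MON-YY or DD-MON-YYYY date in $text and return it as
# DD-MON-YYYY. Returns nothing if no date is present.
sub normalize_locus_date {
    my ($text) = @_;
    my ($d, $m, $y) = $text =~ /(\d{1,2})-(\w{3})-(\d{2,4})/ or return;
    $d = sprintf('%02d', $d);                 # pad a one-digit day
    if (length($y) == 2) {
        $y = ($y > 60 ? '19' : '20') . $y;    # guess the century
    }
    return join('-', $d, $m, $y);
}

print normalize_locus_date($_), "\n"
    for ('PLN 1-JAN-99', 'PLN 15-MAR-03', 'PLN 21-JUN-2002');
```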
sub write_seq
{ my ($self,@seqs) = @_;
foreach my $seq ( @seqs ) {
$self->throw("Attempting to write with no seq!") unless defined $seq;
if( ! ref $seq || ! $seq->isa('Bio::SeqI') ) {
$self->warn(" $seq is not a SeqI compliant module. Attempting to dump, but may fail!");
}
my $str = $seq->seq;
my ($div, $mol);
my $len = $seq->length();
if ( $seq->can('division') ) {
$div=$seq->division;
}
if( !defined $div || ! $div ) { $div = 'UNK'; }
my $alpha = $seq->alphabet;
if( !$seq->can('molecule') || ! defined ($mol = $seq->molecule()) ) {
$mol = $alpha || 'DNA';
}
my $circular = 'linear ';
$circular = 'circular' if $seq->is_circular;
local($^W) = 0;
my $temp_line;
if( $self->_id_generation_func ) {
$temp_line = &{$self->_id_generation_func}($seq);
} else {
my $date = '';
if( $seq->can('get_dates') ) {
($date) = $seq->get_dates();
}
$temp_line = sprintf ("%-12s%-15s%13s %s%4s%-8s%-8s %3s %-s",
'LOCUS', $seq->id(),$len,
(lc($alpha) eq 'protein') ? ('aa','', '') :
('bp', '',$mol),$circular,
$div,$date);
}
$self->_print("$temp_line\n");
$self->_write_line_GenBank_regex("DEFINITION  ", " "x12,
$seq->desc(),"\\s\+\|\$",80);
if( $self->_ac_generation_func ) {
$temp_line = &{$self->_ac_generation_func}($seq);
$self->_print("ACCESSION   $temp_line\n");
} else {
my @acc = ();
push(@acc, $seq->accession_number());
if( $seq->isa('Bio::Seq::RichSeqI') ) {
push(@acc, $seq->get_secondary_accessions());
}
$self->_print("ACCESSION   ", join(" ", @acc), "\n");
}
if($seq->isa('Bio::Seq::RichSeqI') && $seq->pid()) {
$self->_print("PID         ", $seq->pid(), "\n");
}
if( defined $self->_sv_generation_func() ) {
$temp_line = &{$self->_sv_generation_func}($seq);
if( $temp_line ) {
$self->_print("VERSION     $temp_line\n");
}
} else {
if($seq->isa('Bio::Seq::RichSeqI') && defined($seq->seq_version)) {
my $id = $seq->primary_id();
$self->_print("VERSION     ",
$seq->accession_number(), ".", $seq->seq_version,
($id && ($id =~ /^\d+$/) ? "  GI:".$id : ""),
"\n");
}
}
if( defined $self->_kw_generation_func() ) {
$temp_line = &{$self->_kw_generation_func}($seq);
$self->_print("KEYWORDS    $temp_line\n");
} else {
if( $seq->can('keywords') ) {
my $kw = $seq->keywords;
if( ref($kw) =~ /ARRAY/i ) {
$kw = join("; ", @$kw);
}
$kw .= '.' if( $kw !~ /\.$/ );
$self->_print("KEYWORDS    $kw\n");
}
}
foreach my $ref ( $seq->annotation->get_Annotations('segment') ) {
$self->_print(sprintf ("%-11s %s\n",'SEGMENT',
$ref->value));
}
if (my $spec = $seq->species) {
my ($species, $genus, @class) = $spec->classification();
my $OS;
if( $spec->common_name ) {
$OS = $spec->common_name;
} else {
$OS = "$genus $species";
}
if (my $ssp = $spec->sub_species) {
$OS .= " $ssp";
}
$self->_print("SOURCE      $OS\n");
$self->_print("  ORGANISM  ",
($spec->organelle() ? $spec->organelle()." " : ""),
"$genus $species", "\n");
my $OC = join('; ', (reverse(@class), $genus)) .'.';
$self->_write_line_GenBank_regex(' 'x12,' 'x12,
$OC,"\\s\+\|\$",80);
}
my $count = 1;
foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
$temp_line = sprintf ("REFERENCE   $count  (%s %d to %d)",
($seq->alphabet() eq "protein" ?
"residues" : "bases"),
$ref->start,$ref->end);
$self->_print("$temp_line\n");
$self->_write_line_GenBank_regex("  AUTHORS   ",' 'x12,
$ref->authors,"\\s\+\|\$",80);
$self->_write_line_GenBank_regex("  TITLE     "," "x12,
$ref->title,"\\s\+\|\$",80);
$self->_write_line_GenBank_regex("  JOURNAL   "," "x12,
$ref->location,"\\s\+\|\$",80);
if ($ref->comment) {
$self->_write_line_GenBank_regex("  REMARK    "," "x12,
$ref->comment,"\\s\+\|\$",80);
}
if( $ref->medline) {
$self->_write_line_GenBank_regex("  MEDLINE   "," "x12,
$ref->medline, "\\s\+\|\$",80);
if( $ref->pubmed ) {
$self->_write_line_GenBank_regex("   PUBMED   "," "x12,
$ref->pubmed, "\\s\+\|\$",
80);
}
}
$count++;
}
foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {
$self->_write_line_GenBank_regex("COMMENT     "," "x12,
$comment->text,"\\s\+\|\$",80);
}
$self->_print("FEATURES             Location/Qualifiers\n");
my $contig;
if( defined $self->_post_sort ) {
my $post_sort_func = $self->_post_sort();
my @fth;
foreach my $sf ( $seq->top_SeqFeatures ) {
push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));
}
@fth = sort { &$post_sort_func($a,$b) } @fth;
foreach my $fth ( @fth ) {
$self->_print_GenBank_FTHelper($fth);
}
} else {
foreach my $sf ( $seq->top_SeqFeatures ) {
my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);
foreach my $fth ( @fth ) {
if( ! $fth->isa('Bio::SeqIO::FTHelper') ) {
$sf->throw("Cannot process FTHelper... $fth");
}
$self->_print_GenBank_FTHelper($fth);
}
}
}
if( $seq->length == 0 ) { $self->_show_dna(0) }
if( $self->_show_dna() == 0 ) {
$self->_print("\n//\n");
return;
}
$str =~ tr/A-Z/a-z/;
unless( $mol eq 'protein' ) {
my $alen = $str =~ tr/a/a/;
my $clen = $str =~ tr/c/c/;
my $glen = $str =~ tr/g/g/;
my $tlen = $str =~ tr/t/t/;
my $olen = $len - ($alen + $tlen + $clen + $glen);
if( $olen < 0 ) {
$self->warn("Weird. More atgc than bases. Problem!");
}
my $base_count = sprintf("BASE COUNT %8s a %6s c %6s g %6s t%s\n",
$alen,$clen,$glen,$tlen,
( $olen > 0 ) ? sprintf("%6s others",$olen) : '');
$self->_print($base_count);
}
my ($o) = $seq->annotation->get_Annotations('origin');
$self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
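my $nuc = 60;                 # residues per sequence line
my $whole_pat = 'a10' x 6;    # unpack pattern: six blocks of 10 residues
my $out_pat = 'A11' x 6;      # pack pattern: each block space-padded to 11 chars
my $length = length($str);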
my $whole = int($length / $nuc) * $nuc;
my $i;
for ($i = 0; $i < $whole; $i += $nuc) {
my $blocks = pack $out_pat,
unpack $whole_pat,
substr($str, $i, $nuc);
chop $blocks;
$self->_print(sprintf("%9d $blocks\n", $i + $nuc - 59));
}
if (my $last = substr($str, $i)) {
my $last_len = length($last);
my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;
my $blocks = pack $out_pat,
unpack($last_pat, $last);
$blocks =~ s/ +$//;
$self->_print(sprintf("%9d $blocks\n", $length - $last_len + 1));
}
$self->_print("//\n");
$self->flush if $self->_flush_on_write && defined $self->_fh;
return 1;
}
}
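The sequence lines at the end of write_seq carry the position of the first residue in a 9-character field, followed by up to six blocks of 10 residues. The sketch below reproduces that layout with a regular expression in place of the pack/unpack patterns used above; origin_lines is a name invented here, and the lines are returned instead of printed.

```perl
use strict;
use warnings;

# Format a residue string as GenBank sequence lines: 60 residues per
# line, in space-separated blocks of 10, prefixed by the 1-based
# position of the first residue on the line.
sub origin_lines {
    my ($residues) = @_;
    my @lines;
    for (my $i = 0; $i < length($residues); $i += 60) {
        my @blocks = substr($residues, $i, 60) =~ /(.{1,10})/g;
        push @lines, sprintf('%9d %s', $i + 1, join(' ', @blocks));
    }
    return @lines;
}

print "$_\n" for origin_lines('acgt' x 20);
```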
General documentation
Where does the data go?
Data parsed in Bio::SeqIO::genbank is stored in a variety of data
fields in the sequence object that is returned. The HOWTOs give more
information about exactly what each field means and where it goes.
Here is a partial list of fields.
Items listed as RichSeq, Seq or PrimarySeq followed by NAME() tell you
the top-level object which defines a method called NAME() which
stores this information.
Items listed as Annotation 'NAME' tell you the data is stored in the
Bio::Annotation::Collection object which is associated with the
Bio::Seq object. If it is explicitly requested that no annotations
be stored when parsing a record, they will of course not be
available when you try to get them. If you are having this problem,
look at the type of SeqBuilder that is being used to construct your
sequence object.
Comments               Annotation   'comment'
References             Annotation   'reference'
Segment                Annotation   'segment'
Origin                 Annotation   'origin'
Accessions             PrimarySeq   accession_number()
Secondary accessions   RichSeq      get_secondary_accessions()
Keywords               RichSeq      keywords()
Dates                  RichSeq      get_dates()
Molecule               RichSeq      molecule()
Seq Version            RichSeq      seq_version()
PID                    RichSeq      pid()
Division               RichSeq      division()
Features               Seq          get_SeqFeatures()
Alphabet               PrimarySeq   alphabet()
Definition             PrimarySeq   description() or desc()
Version                PrimarySeq   version()
Sequence               PrimarySeq   seq()
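Several of the accessors in the list above can be tried on an object built by hand. This is a sketch that assumes BioPerl is installed; the object is constructed directly rather than parsed, so that no input file is needed, and all values are invented.

```perl
use strict;
use warnings;
use Bio::Seq::RichSeq;

# Invented record data, passed through the constructor arguments that
# correspond to the accessors named in the list.
my $seq = Bio::Seq::RichSeq->new(
    -seq                  => 'atgcatgc',
    -display_id           => 'EXAMPLE1',
    -accession_number     => 'X00001',
    -desc                 => 'invented example record',
    -alphabet             => 'dna',
    -division             => 'SYN',
    -molecule             => 'DNA',
    -seq_version          => 2,
    -secondary_accessions => ['X00002'],
    -dates                => ['01-JAN-2003'],
);

printf "%-22s %s\n", 'Accessions',           $seq->accession_number;
printf "%-22s %s\n", 'Secondary accessions', join(' ', $seq->get_secondary_accessions);
printf "%-22s %s\n", 'Dates',                join(' ', $seq->get_dates);
printf "%-22s %s\n", 'Molecule',             $seq->molecule;
printf "%-22s %s\n", 'Seq Version',          $seq->seq_version;
printf "%-22s %s\n", 'Division',             $seq->division;
printf "%-22s %s\n", 'Definition',           $seq->desc;
printf "%-22s %s\n", 'Sequence',             $seq->seq;
```

An object returned by next_seq() is queried the same way.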
User feedback is an integral part of the evolution of this
and other Bioperl modules. Send your comments and suggestions preferably
to one of the Bioperl mailing lists.
Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://www.bioperl.org/MailList.shtml - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution.
Bug reports can be submitted via email or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _.