Bio::DB::Flat
Summary
Bio::DB::Flat - Interface for indexed flat files
Package variables
No package variables defined.
Included modules
Inherit
Synopsis
$db = Bio::DB::Flat->new(-directory  => '/usr/share/embl',
                         -format     => 'embl',
                         -write_flag => 1);
$db->build_index('/usr/share/embl/primate.embl','/usr/share/embl/protists.embl');
$seq = $db->get_Seq_by_id('BUM');
@sequences = $db->get_Seq_by_acc('DIV' => 'primate');
$raw = $db->fetch_raw('BUM');
Description
This object provides the basic mechanism to associate positions in
files with primary and secondary name spaces. Unlike
Bio::Index::Abstract, this is specialized to work with the
"flat index" and BerkeleyDB indexed flat file formats
worked out at the 2002 BioHackathon.
This object is a general front end to the underlying databases.
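The idea of associating an identifier with a position in a flat file can be sketched in plain Perl. This is only an illustration under stated assumptions: the in-memory hash, the helpers `index_records` and `fetch_record`, and the record layout (an "ID" line up to a "//" terminator) are hypothetical stand-ins, not part of Bio::DB::Flat or of its on-disk index formats.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a flat file and remember where each record lives.
# Assumes records start with "ID <name>" and end with a line starting "//".
sub index_records {
    my ($path) = @_;
    my %index;
    open my $fh, '<', $path or die "open error on $path: $!";
    my ($id, $start);
    my $pos = 0;                       # byte offset of the line about to be read
    while (my $line = <$fh>) {
        ($id, $start) = ($1, $pos) if $line =~ /^ID\s+(\S+)/;
        $pos = tell($fh);              # offset just past the current line
        if ($line =~ m{^//} && defined $id) {
            $index{$id} = { file => $path, offset => $start, length => $pos - $start };
            undef $id;
        }
    }
    close $fh or die "close error on $path: $!";
    return \%index;
}

# Hypothetical helper: jump straight to a record using the stored position.
sub fetch_record {
    my ($index, $id) = @_;
    my $entry = $index->{$id} or return;
    open my $fh, '<', $entry->{file} or die "open error on $entry->{file}: $!";
    seek($fh, $entry->{offset}, 0) or die "seek error: $!";
    my $n = read($fh, my $raw, $entry->{length});
    die "read error: $!" unless defined $n;
    close $fh;
    return $raw;
}
```

A lookup then costs one seek and one read, however large the flat file is, which is the property the real index back ends are built to provide.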
Methods
DESTROY | No description | Code |
_catfile | No description | Code |
_config_path | No description | Code |
_fileno2path | No description | Code |
_filenos | No description | Code |
_files | No description | Code |
_initialize | No description | Code |
_path2fileno | No description | Code |
_read_config | No description | Code |
_set_namespaces | No description | Code |
_store_index | No description | Code |
add_flat_file | No description | Code |
close | No description | Code |
default_file_format | No description | Code |
default_primary_namespace | No description | Code |
default_secondary_namespaces | No description | Code |
directory | No description | Code |
fetch | Description | Code |
fetch_raw | No description | Code |
file_format | No description | Code |
files | No description | Code |
get_Seq_by_acc | No description | Code |
get_Seq_by_id | No description | Code |
indexing_scheme | No description | Code |
new | Description | Code |
out_file | No description | Code |
parse_one_record | No description | Code |
primary_namespace | No description | Code |
secondary_namespaces | No description | Code |
seq_to_ids | No description | Code |
verbose | No description | Code |
write_config | No description | Code |
write_flag | No description | Code |
write_seq | No description | Code |
Methods description
Title   : fetch
Usage   : $index->fetch( $id )
Function: Returns a Bio::Seq object from the index
Example : $seq = $index->fetch( 'dJ67B12' )
Returns : Bio::Seq object
Args    : ID

Deprecated. Use get_Seq_by_id instead.
Title   : new
Usage   : my $db = Bio::DB::Flat->new(
                     -directory  => $root_directory,
                     -write_flag => 0,
                     -index      => 'bdb'|'flat',
                     -verbose    => 0,
                     -out        => 'outputfile',
                     -format     => 'genbank');
Function: create a new Bio::DB::Flat object
Returns : new Bio::DB::Flat object (blessed into an index- and
          format-specific subclass)
Args    : -directory    Root directory containing "config.dat"
          -write_flag   If true, allows reindexing.
          -index        Type of index, either 'bdb' or 'flat'
          -verbose      Verbose messages
          -out          File to write to when write_seq invoked
          -format       Format of the flat files, e.g. 'genbank'
Status  : Public
The root -directory indicates where the flat file indexes will be stored.
The build_index() and write_seq() methods will automatically create a
human-readable configuration file named "config.dat" in this directory.

The -write_flag enables writing new entries into the database as well as
the creation of the indexes. By default the indexes will be opened read only.

-index is one of "bdb" or "flat" and indicates the type of index to
generate. "bdb" corresponds to Berkeley DB. You *must* be using BerkeleyDB
version 2 or higher, and have the Perl BerkeleyDB extension installed
(DB_File will *not* work).

The -out argument specifies the output file for writing objects created
with write_seq().
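The layout of "config.dat" can be read off the write_config and _read_config code listed on this page: one tab-separated line per setting, with the tags index, format, fileid_N, primary_namespace and secondary_namespaces. The sketch below parses text of that shape. The helper name `parse_flat_config` is hypothetical, and the sample paths, sizes and the "ID" secondary namespace are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the parsing loop in _read_config:
# every line is "tag<TAB>value<TAB>value...".
sub parse_flat_config {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        next unless length $line;
        my ($tag, @values) = split /\t/, $line;
        $config{$tag} = \@values;
    }
    return \%config;
}

# Made-up sample in the shape that write_config prints.
my $sample = join "\n",
    "index\tBerkeleyDB/1",
    "format\tembl",
    "fileid_0\t/usr/share/embl/primate.embl\t1024",
    "fileid_1\t/usr/share/embl/protists.embl\t2048",
    "primary_namespace\tACC",
    "secondary_namespaces\tID";

my $config = parse_flat_config($sample);
printf "index scheme: %s\n", $config->{index}[0];
for my $tag (sort grep { /^fileid_/ } keys %$config) {
    my ($path, $size) = @{ $config->{$tag} };
    print "$tag -> $path ($size bytes)\n";
}
```

Each fileid_N line carries the file size recorded at indexing time, which is what lets add_flat_file detect a flat file that has changed since the index was built.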
Methods code
sub DESTROY {
  my $self = shift;
  $self->close;
}
sub _catfile {
  my $self = shift;
  my $component = shift;
  Bio::Root::IO->catfile($self->directory,$component);
}
sub _config_path {
  my $self = shift;
  $self->_catfile($self->_config_name);
}
sub _fileno2path {
  my $self = shift;
  my $fileno = shift;
  $self->{flat_flat_file_path}{$fileno};
}
sub _filenos {
  my $self = shift;
  return unless $self->{flat_flat_file_path};
  return keys %{$self->{flat_flat_file_path}};
}
sub _files {
  my $self = shift;
  my $paths = $self->{flat_flat_file_no};
  return keys %$paths;
}
sub _initialize {
  my $self = shift;
  my ($flat_write_flag,$flat_indexing,$flat_verbose,$flat_outfile,$flat_format)
    = $self->_rearrange([qw(WRITE_FLAG INDEX VERBOSE OUT FORMAT)],@_);
  $self->write_flag($flat_write_flag) if defined $flat_write_flag;
  if (defined $flat_indexing) {
    # map the user-facing -index value onto the scheme name stored in config.dat
    $flat_indexing = 'BerkeleyDB/1' if $flat_indexing =~ /bdb/;
    $flat_indexing = 'flat/1'       if $flat_indexing =~ /flat/;
    $self->indexing_scheme($flat_indexing);
  }
  $self->verbose($flat_verbose)     if defined $flat_verbose;
  $self->out_file($flat_outfile)    if defined $flat_outfile;
  $self->file_format($flat_format)  if defined $flat_format;
}
sub _path2fileno {
  my $self = shift;
  my $path = shift;
  return $self->add_flat_file($path)
    unless exists $self->{flat_flat_file_no}{$path};
  $self->{flat_flat_file_no}{$path};
}
sub _read_config {
  my $self = shift;
  my $config = shift;
  my $path = defined $config ? Bio::Root::IO->catfile($config,CONFIG_FILE_NAME)
                             : $self->_config_path;
  return unless -e $path;
  open (F,$path) or $self->throw("open error on $path: $!");
  my %config;
  # each line is "tag<TAB>value<TAB>value..."
  while (<F>) {
    chomp;
    my ($tag,@values) = split "\t";
    $config{$tag} = \@values;
  }
  CORE::close F or $self->throw("close error on $path: $!");
  $config{index}[0] =~ m~(flat/1|BerkeleyDB/1)~
    or $self->throw("invalid configuration file $path: no index line");
  $self->indexing_scheme($1);
  $self->file_format($config{format}[0]) if $config{format};
  my $primary_namespace = $config{primary_namespace}[0]
    or $self->throw("invalid configuration file $path: no primary namespace defined");
  $self->primary_namespace($primary_namespace);
  $self->secondary_namespaces($config{secondary_namespaces});
  # re-register every flat file listed under a "fileid_N" tag
  my @normalized_files = grep {$_ ne ''} map {/^fileid_(\S+)/ && $1} keys %config;
  for my $nf (@normalized_files) {
    my ($file_path,$file_length) = @{$config{"fileid_${nf}"}};
    $self->add_flat_file($file_path,$file_length,$nf);
  }
  1;
}
sub _set_namespaces {
  my $self = shift;
  $self->primary_namespace($self->default_primary_namespace)
    unless defined $self->{flat_primary_namespace};
  $self->secondary_namespaces($self->default_secondary_namespaces)
    unless defined $self->{flat_secondary_namespaces};
  $self->file_format($self->default_file_format)
    unless defined $self->{flat_format};
}
sub _store_index {
  # $self must be unpacked here; the listing used it without declaring it
  my ($self,$ids,$file,$offset,$length) = @_;
  $self->throw_not_implemented;
}
sub add_flat_file {
  my $self = shift;
  my ($file_path,$file_length,$nf) = @_;
  File::Spec->file_name_is_absolute($file_path)
    or $self->throw("the flat file path $file_path must be absolute");
  -r $file_path or $self->throw("flat file $file_path cannot be read: $!");
  my $current_size = -s _;
  if (defined $file_length) {
    # a size mismatch means the file changed after it was indexed
    $current_size == $file_length
      or $self->throw("flat file $file_path has changed size.  Was $file_length bytes; now $current_size");
  } else {
    $file_length = $current_size;
  }
  unless (defined $nf) {
    # assign the next free file number
    $self->{flat_file_index} = 0 unless exists $self->{flat_file_index};
    $nf = $self->{flat_file_index}++;
  }
  $self->{flat_flat_file_path}{$nf} = $file_path;
  $self->{flat_flat_file_no}{$file_path} = $nf;
  $nf;
}
sub close {
  my $self = shift;
  return unless $self->{flat_outfile_dirty};
  $self->write_config;
  delete $self->{flat_outfile_dirty};
  delete $self->{flat_cached_parsers}{$self->out_file};
}
sub default_file_format {
  my $self = shift;
  $self->throw_not_implemented;
}
sub default_primary_namespace {
  return "ACC";
}
sub default_secondary_namespaces {
  return;
}
sub directory {
  my $self = shift;
  my $d = $self->{flat_directory};
  $self->{flat_directory} = shift if @_;
  $d;   # returns the previous value
}
sub fetch {
  shift->get_Seq_by_id(@_);
}
sub fetch_raw {
  my ($self,$id,$namespace) = @_;
  $self->throw_not_implemented;
}
sub file_format {
  my $self = shift;
  my $d = $self->{flat_format};
  $self->{flat_format} = shift if @_;
  $d;   # returns the previous value
}
sub files {
  my $self = shift;
  return unless $self->{flat_flat_file_no};
  return keys %{$self->{flat_flat_file_no}};
}
sub get_Seq_by_acc {
  my $self = shift;
  # a single argument is treated as a primary ID
  return $self->get_Seq_by_id(shift) if @_ == 1;
  my ($ns,$key) = @_;
  $self->throw_not_implemented;
}
sub get_Seq_by_id {
  my $self = shift;
  my $id = shift;
  $self->throw_not_implemented;
}
sub indexing_scheme {
  my $self = shift;
  my $d = $self->{flat_indexing};
  $self->{flat_indexing} = shift if @_;
  $d;   # returns the previous value
}
sub new {
  my $class = shift;
  $class = ref($class) if ref($class);
  my $self = $class->SUPER::new(@_);
  my ($flat_directory) = @_ == 1 ? shift
                                 : $self->_rearrange([qw(DIRECTORY)],@_);
  $self->directory($flat_directory);
  $self->_read_config() if -e $flat_directory;
  $self->_initialize(@_);
  # choose the child class from the indexing scheme and the file format
  my $index_type = $self->indexing_scheme eq 'BerkeleyDB/1' ? 'BDB'
                  :$self->indexing_scheme eq 'flat/1'       ? 'Flat'
                  :$self->throw("unknown indexing scheme: ".$self->indexing_scheme);
  my $format     = $self->file_format;
  my $child_class= "Bio\:\:DB\:\:Flat\:\:$index_type\:\:\L$format";
  eval "use $child_class";
  $self->throw($@) if $@;
  # rebless into the child class, then initialize again as that class
  bless $self,$child_class;
  $self->_initialize(@_);
  $self->_set_namespaces(@_);
  $self;
}
sub out_file {
  my $self = shift;
  my $d = $self->{flat_outfile};
  $self->{flat_outfile} = shift if @_;
  $d;   # returns the previous value
}
sub parse_one_record {
  my $self = shift;
  my $fh = shift;
  $self->throw_not_implemented;
  # not reached; shows the expected return shape for subclasses
  my (%keys,$offset);
  return (\%keys,$offset);
}
sub primary_namespace {
  my $self = shift;
  my $d = $self->{flat_primary_namespace};
  $self->{flat_primary_namespace} = shift if @_;
  $d;   # returns the previous value
}
sub secondary_namespaces {
  my $self = shift;
  my $d = $self->{flat_secondary_namespaces};
  $self->{flat_secondary_namespaces} = (ref($_[0]) eq 'ARRAY' ? shift : [@_]) if @_;
  return unless $d;
  $d = [$d] if $d && ref($d) ne 'ARRAY';
  return wantarray ? @$d : $d;
}
sub seq_to_ids {
  my $self = shift;
  my $seq  = shift;
  my %ids;
  $ids{$self->primary_namespace} = $seq->accession_number;
  \%ids;
}
sub verbose {
  my $self = shift;
  my $d = $self->{flat_verbose};
  $self->{flat_verbose} = shift if @_;
  $d;   # returns the previous value
}
sub write_config {
  my $self = shift;
  $self->write_flag or $self->throw("cannot write configuration file because write_flag is not set");
  my $path = $self->_config_path;
  open (F,">$path") or $self->throw("open error on $path: $!");
  my $index_type = $self->indexing_scheme;
  print F "index\t$index_type\n";
  my $format = $self->file_format;
  print F "format\t$format\n";
  my @filenos = $self->_filenos or $self->throw("cannot write config file because no flat files defined");
  # one "fileid_N<TAB>path<TAB>size" line per registered flat file
  for my $nf (@filenos) {
    my $path = $self->{flat_flat_file_path}{$nf};
    my $size = -s $path;
    print F join("\t","fileid_$nf",$path,$size),"\n";
  }
  my $primary_ns = $self->primary_namespace
    or $self->throw('cannot write config file because no primary namespace defined');
  print F join("\t",'primary_namespace',$primary_ns),"\n";
  my @secondary = $self->secondary_namespaces;
  print F join("\t",'secondary_namespaces',@secondary),"\n";
  # CORE::close, as in _read_config, because this package defines its own close()
  CORE::close F or $self->throw("close error on $path: $!");
}
sub write_flag {
  my $self = shift;
  my $d = $self->{flat_write_flag};
  $self->{flat_write_flag} = shift if @_;
  $d;   # returns the previous value
}
sub write_seq {
  my $self = shift;
  my $seq  = shift;
  $self->write_flag or $self->throw("cannot write sequences because write_flag is not set");
  my $file = $self->out_file or $self->throw('no outfile defined; use the -out argument to new()');
  my $seqio = $self->{flat_cached_parsers}{$file}
    ||= Bio::SeqIO->new(-Format => $self->file_format,
                        -file   => ">$file")
      or $self->throw("couldn't create Bio::SeqIO object");
  my $fh = $seqio->_fh or $self->throw("couldn't get filehandle from Bio::SeqIO object");
  # record where the entry starts and how many bytes it occupies
  my $offset = tell($fh);
  $seqio->write_seq($seq);
  my $length = tell($fh)-$offset;
  my $ids = $self->seq_to_ids($seq);
  $self->_store_index($ids,$file,$offset,$length);
  $self->{flat_outfile_dirty}++;
}
General documentation
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/MailList.shtml - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
AUTHOR - Lincoln Stein
The rest of the documentation details each of the object methods. Internal
methods are usually preceded with an "_" (underscore).
To Be Implemented in Subclasses
The following methods MUST be implemented by subclasses.
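Judging from the code listing, the methods that call throw_not_implemented in this class are _store_index, default_file_format, fetch_raw, get_Seq_by_id, parse_one_record and the two-argument form of get_Seq_by_acc. The pattern can be sketched without BioPerl installed. Everything below is an assumption made for illustration: `My::FlatBase` is a stand-in for the abstract parent (not the real Bio::DB::Flat), `My::FlatBase::Memory` is a hypothetical subclass, and its fetch_raw returns the stored position triple rather than record text to keep the sketch short.

```perl
use strict;
use warnings;

# Stand-in for the abstract parent (NOT the real Bio::DB::Flat):
# each abstract method dies, much as throw_not_implemented would.
package My::FlatBase;
sub new { my $class = shift; return bless { records => {} }, $class }
sub throw_not_implemented {
    my $self   = shift;
    my $method = (caller(1))[3];       # name of the abstract method that was called
    die "Abstract method $method is not implemented by " . ref($self) . "\n";
}
sub default_file_format { shift->throw_not_implemented }
sub fetch_raw           { shift->throw_not_implemented }
sub _store_index        { shift->throw_not_implemented }

# Hypothetical concrete subclass backed by a simple in-memory store.
package My::FlatBase::Memory;
our @ISA = ('My::FlatBase');
sub default_file_format { return 'embl' }
sub _store_index {
    my ($self, $ids, $file, $offset, $length) = @_;
    # file the same position under every ID the record is known by
    $self->{records}{$_} = [$file, $offset, $length] for values %$ids;
    return 1;
}
sub fetch_raw {
    my ($self, $id) = @_;
    return $self->{records}{$id};      # [file, offset, length], not record text
}

package main;

my $db = My::FlatBase::Memory->new;
$db->_store_index({ ACC => 'X12345' }, '/tmp/example.embl', 0, 120);
print join(',', @{ $db->fetch_raw('X12345') }), "\n";
```

Calling an unimplemented method on the stand-in parent dies with a message naming the method, which is the failure a subclass author would see after forgetting one of the required overrides.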
May Be Overridden in Subclasses
The following methods MAY be overridden by subclasses.
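In the code listing, default_primary_namespace returns "ACC", default_secondary_namespaces returns an empty list, and seq_to_ids maps the primary namespace to the accession number. A subclass that wants an extra namespace could override the last two along these lines. This is a sketch under assumptions: `My::Defaults` only imitates those defaults, `My::Defaults::WithID` and the "ID" namespace are hypothetical, and a plain hash stands in for a sequence object.

```perl
use strict;
use warnings;

# Stand-in parent reproducing only the defaults shown in the code listing.
package My::Defaults;
sub new { my $class = shift; return bless {}, $class }
sub default_primary_namespace    { return 'ACC' }
sub default_secondary_namespaces { return }
sub seq_to_ids {
    my ($self, $seq) = @_;
    my %ids;
    $ids{ $self->default_primary_namespace } = $seq->{accession_number};
    return \%ids;
}

# Hypothetical subclass that also files records under an "ID" namespace.
package My::Defaults::WithID;
our @ISA = ('My::Defaults');
sub default_secondary_namespaces { return ('ID') }
sub seq_to_ids {
    my ($self, $seq) = @_;
    my $ids = $self->SUPER::seq_to_ids($seq);   # keep the primary mapping
    $ids->{ID} = $seq->{display_id};            # add the secondary one
    return $ids;
}

package main;

my $seq = { accession_number => 'X12345', display_id => 'HSEXAMPLE' };
my $ids = My::Defaults::WithID->new->seq_to_ids($seq);
print "$_ => $ids->{$_}\n" for sort keys %$ids;
```

Calling the parent implementation first keeps the primary namespace handling in one place, so the override only has to describe what it adds.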