Summary
Bio::EnsEMBL::Mapper
Package variables
No package variables defined.
Included modules
Bio::EnsEMBL::Utils::Exception qw( throw deprecate warning stack_trace_dump )
Bio::EnsEMBL::Utils::Exception qw( throw )
Synopsis
my $mapper = Bio::EnsEMBL::Mapper->new( 'rawcontig', 'chromosome' );

# add a coordinate mapping: supply the source (contig) region,
# its orientation and the target (chromosome) region
$mapper->add_map_coordinates(
  $contig_id, $contig_start, $contig_end, $contig_ori,
  $chr_name,  $chr_start,    $chr_end
);

# map from one coordinate system to another
my @coordlist =
  $mapper->map_coordinates( 627012, 2, 5, -1, "rawcontig" );
Description
Generic mapper that provides coordinate transforms between two disjoint
coordinate systems. This mapper is intended to be 'context neutral', in
that it contains no code relating to any particular coordinate system.
Such code is provided by, for example, Bio::EnsEMBL::AssemblyMapper.
Methods
Methods description
add_Mapper
  Arg 1       Bio::EnsEMBL::Mapper $mapper2
  Example     $mapper->add_Mapper($mapper2)
  Function    Adds all the map coordinates from $mapper2 to this mapper.
              This object will then contain mapping pairs from both the
              old object and $mapper2.
  Returntype  int 1 (failures throw instead of returning 0)
  Exceptions  throws if the 'to' and 'from' coordinate systems of the two
              Bio::EnsEMBL::Mapper objects are incompatible, or if the two
              pair indexes of $mapper2 hold different numbers of pairs
  Caller      $mapper->methodname()

add_indel_coordinates
  Arg 1       int $id        id of 'source' sequence
  Arg 2       int $start     start coordinate of 'source' sequence
  Arg 3       int $end       end coordinate of 'source' sequence
  Arg 4       int $strand    relative orientation of source and target (+/- 1)
  Arg 5       int $id        id of 'target' sequence
  Arg 6       int $start     start coordinate of 'target' sequence
  Arg 7       int $end       end coordinate of 'target' sequence
  Function    Stores details of a mapping between two regions, 'source'
              and 'target', as a Bio::EnsEMBL::Mapper::IndelPair. Use this
              method when adding an indel; unlike add_map_coordinates, it
              does not require the two regions to have the same length.
  Returntype  int 1 (no check is made for pairs that are already stored)
  Exceptions  throws if any of the 7 arguments is undefined
  Caller      Bio::EnsEMBL::Mapper

add_map_coordinates
  Arg 1       int $id        id of 'source' sequence
  Arg 2       int $start     start coordinate of 'source' sequence
  Arg 3       int $end       end coordinate of 'source' sequence
  Arg 4       int $strand    relative orientation of source and target (+/- 1)
  Arg 5       int $id        id of 'target' sequence
  Arg 6       int $start     start coordinate of 'target' sequence
  Arg 7       int $end       end coordinate of 'target' sequence
  Function    Stores details of a mapping between two regions, 'source'
              and 'target', as a Bio::EnsEMBL::Mapper::Pair. Both regions
              must have the same length.
  Returntype  none (no explicit return value; no check is made for pairs
              that are already stored)
  Exceptions  throws if any of the 7 arguments is undefined, or if the
              source and target regions differ in length
  Caller      Bio::EnsEMBL::Mapper

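fastmap
  Arg 1       string $id     id of 'source' sequence
  Arg 2       int $start     start coordinate of 'source' sequence
  Arg 3       int $end       end coordinate of 'source' sequence
  Arg 4       int $strand    raw contig orientation (+/- 1)
  Arg 5       string $type   nature of transform: the name of the
                             coordinate system to transform *from*
  Function    Inferior map method. It only does ungapped, unsplit mapping:
              the region must lie entirely within a single mapping pair.
              Returns the id, start, end, strand and coordinate system of
              the mapped region in a list, or an empty list if no single
              pair contains the region. Insertion points (start = end + 1)
              are passed on to map_insert.
  Returntype  list of results
  Exceptions  throws if $type is neither the 'to' nor the 'from'
              coordinate system
  Caller      Bio::EnsEMBL::AssemblyMapper
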
flush
  Args        : none
  Example     : none
  Description : Removes all cached information (both pair indexes and the
                pair count) from this mapper.
  Returntype  : none
  Exceptions  : none
  Caller      : AssemblyMapper, ChainedAssemblyMapper

from
  Arg 1       (optional) string $value  new name of the 'from' coordinate system
  Function    Getter/setter for the name of the 'from' coordinate system
  Returntype  string
  Exceptions  none
  Caller      Bio::EnsEMBL::Mapper

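list_pairs
  Arg 1       int $id        id of 'source' sequence
  Arg 2       int $start     start coordinate of 'source' sequence
  Arg 3       int $end       end coordinate of 'source' sequence
  Arg 4       string $type   nature of transform: the name of the
                             coordinate system to transform *from*
  Function    Lists all mapping pairs that overlap the region. If both
              $start and $end are -1, all pairs for $id are returned.
  Returntype  list of Bio::EnsEMBL::Mapper::Pair
  Exceptions  throws if $type is undefined, if $start is greater than $end,
              or if $type is neither the 'to' nor the 'from' coordinate system
  Caller      Bio::EnsEMBL::Mapper
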
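map_coordinates
  Arg 1       string $id     id of 'source' sequence
  Arg 2       int $start     start coordinate of 'source' sequence
  Arg 3       int $end       end coordinate of 'source' sequence
  Arg 4       int $strand    raw contig orientation (+/- 1)
  Arg 5       string $type   nature of transform: the name of the
                             coordinate system to transform *from*
  Function    Generic map method. Maps the region onto the other coordinate
              system. Parts of the region not covered by any mapping pair
              are returned as gaps (in source coordinates); if $strand is -1
              the results are returned in reverse order. Insertion points
              (start = end + 1) are handled by map_insert.
  Returntype  list of Bio::EnsEMBL::Mapper::Coordinate and/or
              Bio::EnsEMBL::Mapper::Gap objects (and
              Bio::EnsEMBL::Mapper::IndelCoordinate objects for indel pairs)
  Exceptions  throws if any argument is undefined, or if $type is neither
              the 'to' nor the 'from' coordinate system
  Caller      Bio::EnsEMBL::Mapper
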
map_indel
  Arg [1]    : string $id
  Arg [2]    : int $start - start coord. Since this is an indel, it should
               always be one greater than end.
  Arg [3]    : int $end - end coord. Since this is an indel, it should
               always be one less than start.
  Arg [4]    : int $strand (0, 1, -1)
  Arg [5]    : string $type - the name of the coordinate system the coords
               are from.
  Example    : @coords = $mapper->map_indel($id, $start, $end, $strand, $type);
  Description: This is an internal function which handles the special
               mapping case for indels (start = end + 1). It is used to map
               from a coordinate system with a gap to another that contains
               an insertion, mainly by the Variation API.
  Returntype : list of Bio::EnsEMBL::Mapper::Unit objects
  Exceptions : throws if $type is neither the 'to' nor the 'from'
               coordinate system
  Caller     : general

map_insert
  Arg [1]    : string $id
  Arg [2]    : int $start - start coord. Since this is an insert, it should
               always be one greater than end.
  Arg [3]    : int $end - end coord. Since this is an insert, it should
               always be one less than start.
  Arg [4]    : int $strand (0, 1, -1)
  Arg [5]    : string $type - the name of the coordinate system the coords
               are from.
  Arg [6]    : boolean $fastmap - if true, this is being called from
               fastmap. The mapping is not any faster for inserts, but the
               return value is different.
  Example    : none
  Description: This is an internal function which handles the special
               mapping case for inserts (start = end + 1). It is called
               automatically by the map functions, so there is no reason to
               call it directly.
  Returntype : list of Bio::EnsEMBL::Mapper::Coordinate and/or Gap objects;
               with $fastmap, a list of (id, start, end, strand,
               coord_system), or undef if the insert does not map to
               exactly one coordinate
  Exceptions : throws if the flanking region maps to neither one nor two
               pieces
  Caller     : map_coordinates(), fastmap()

new
  Arg [1]    : string $from - the name of the 'from' coordinate system
  Arg [2]    : string $to - the name of the 'to' coordinate system
  Arg [3]    : (optional) Bio::EnsEMBL::CoordSystem $from_cs - the 'from'
               coordinate system
  Arg [4]    : (optional) Bio::EnsEMBL::CoordSystem $to_cs - the 'to'
               coordinate system
  Example    : my $mapper = Bio::EnsEMBL::Mapper->new('FROM', 'TO');
  Description: Constructor. Creates a new Bio::EnsEMBL::Mapper object.
  Returntype : Bio::EnsEMBL::Mapper
  Exceptions : throws if $from or $to is not supplied
  Caller     : general
Methods code
sub _dump {
  my ($self, $fh) = @_;

  if ( !defined $fh ) {
    $fh = \*STDERR;
  }

  # Pairs are indexed under "_pair_<from>", keyed by upper-cased id.
  my $from = $self->{'from'};
  foreach my $id ( keys %{ $self->{"_pair_$from"} } ) {
    print $fh "From Hash $id\n";
    foreach my $pair ( @{ $self->{"_pair_$from"}->{$id} } ) {
      print $fh " ", $pair->from->start, " ", $pair->from->end, ":",
        $pair->to->start, " ", $pair->to->end, " ", $pair->to->id, "\n";
    }
  }
}
sub _is_sorted {
  my ($self, $value) = @_;
  $self->{'_is_sorted'} = $value if ( defined($value) );
  return $self->{'_is_sorted'};
}
sub _merge_pairs {
  my $self = shift;

  my ( $lr, $lr_from, $del_pair, $next_pair, $current_pair );

  my $map_to   = $self->{'to'};
  my $map_from = $self->{'from'};
  $self->{'pair_count'} = 0;

  for my $key ( keys %{ $self->{"_pair_$map_to"} } ) {
    $lr = $self->{"_pair_$map_to"}->{$key};

    my $i      = 0;
    my $next   = 1;
    my $length = $#$lr;
    while ( $next <= $length ) {
      $current_pair = $lr->[$i];
      $next_pair    = $lr->[$next];
      $del_pair     = undef;

      if ( exists $current_pair->{'indel'} || exists $next_pair->{'indel'} ) {
        # Indel pairs are never merged.
        $next++;
        $i++;
      } else {
        if ( $current_pair->{'to'}->{'start'} == $next_pair->{'to'}->{'start'}
          and $current_pair->{'from'}->{'id'} eq $next_pair->{'from'}->{'id'} )
        {
          # Same 'to' start and same 'from' sequence: drop the second pair.
          $del_pair = $next_pair;
        } elsif ( ( $current_pair->{'from'}->{'id'} eq $next_pair->{'from'}->{'id'} )
          && ( $next_pair->{'ori'} == $current_pair->{'ori'} )
          && ( $next_pair->{'to'}->{'start'} - 1 == $current_pair->{'to'}->{'end'} ) )
        {
          # The pairs abut in the 'to' system; merge them if they also
          # abut in the 'from' system, in the direction given by 'ori'.
          if ( $current_pair->{'ori'} == 1 ) {
            if ( $next_pair->{'from'}->{'start'} - 1 == $current_pair->{'from'}->{'end'} ) {
              $current_pair->{'to'}->{'end'}   = $next_pair->{'to'}->{'end'};
              $current_pair->{'from'}->{'end'} = $next_pair->{'from'}->{'end'};
              $del_pair = $next_pair;
            }
          } else {
            if ( $next_pair->{'from'}->{'end'} + 1 == $current_pair->{'from'}->{'start'} ) {
              $current_pair->{'to'}->{'end'}     = $next_pair->{'to'}->{'end'};
              $current_pair->{'from'}->{'start'} = $next_pair->{'from'}->{'start'};
              $del_pair = $next_pair;
            }
          }
        }

        if ( defined $del_pair ) {
          # Remove the deleted pair from both indexes.
          splice( @$lr, $next, 1 );
          $lr_from = $self->{"_pair_$map_from"}->{ uc( $del_pair->{'from'}->{'id'} ) };
          for ( my $j = 0; $j <= $#$lr_from; $j++ ) {
            if ( $lr_from->[$j] == $del_pair ) {
              splice( @$lr_from, $j, 1 );
              last;
            }
          }
          $length--;
          if ( $length < $next ) { last; }
        } else {
          $next++;
          $i++;
        }
      }
    }
    $self->{'pair_count'} += scalar(@$lr);
  }
}
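The merging step can be illustrated with a simplified, standalone sketch. It assumes forward orientation only, a single 'from' sequence, and plain hashes instead of Bio::EnsEMBL::Mapper::Pair objects; the helper name `merge_adjacent` is hypothetical, and the sketch leaves out the duplicate removal, indel handling and reverse-orientation case of `_merge_pairs`. It sorts first, as `_sort` does before calling `_merge_pairs`.

```perl
use strict;
use warnings;

# Hypothetical helper. Each pair is a plain hash
#   { from => [start, end], to => [start, end] }
# with orientation +1, all on the same 'from' sequence.
sub merge_adjacent {
    # Sort by 'to' start first, as _sort does before merging.
    my @pairs = sort { $a->{to}[0] <=> $b->{to}[0] } @_;
    my @merged;
    for my $p (@pairs) {
        my $last = $merged[-1];
        if (   defined $last
            && $p->{to}[0] == $last->{to}[1] + 1
            && $p->{from}[0] == $last->{from}[1] + 1 )
        {
            # Abutting in both systems: extend the previous pair.
            $last->{to}[1]   = $p->{to}[1];
            $last->{from}[1] = $p->{from}[1];
        }
        else {
            # Keep a copy so the caller's pairs are not modified.
            push @merged, { from => [ @{ $p->{from} } ], to => [ @{ $p->{to} } ] };
        }
    }
    return @merged;
}

my @merged = merge_adjacent(
    { from => [ 11, 20 ], to => [ 101, 110 ] },
    { from => [ 1,  10 ], to => [ 91,  100 ] },
    { from => [ 31, 40 ], to => [ 121, 130 ] },
);
printf "%d-%d => %d-%d\n", @{ $_->{from} }, @{ $_->{to} } for @merged;
# prints "1-20 => 91-110" and "31-40 => 121-130"
```

The first two pairs abut on both sides and collapse into one; the third is separated by a gap on the target and stays on its own.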
sub _sort {
  my ($self) = @_;

  my $to   = $self->{'to'};
  my $from = $self->{'from'};

  foreach my $id ( keys %{ $self->{"_pair_$from"} } ) {
    @{ $self->{"_pair_$from"}->{$id} } =
      sort { $a->{'from'}->{'start'} <=> $b->{'from'}->{'start'} }
      @{ $self->{"_pair_$from"}->{$id} };
  }

  foreach my $id ( keys %{ $self->{"_pair_$to"} } ) {
    @{ $self->{"_pair_$to"}->{$id} } =
      sort { $a->{'to'}->{'start'} <=> $b->{'to'}->{'start'} }
      @{ $self->{"_pair_$to"}->{$id} };
  }

  $self->_merge_pairs();
  $self->_is_sorted(1);
}
sub add_Mapper {
  my ($self, $mapper) = @_;

  my $mapper_to   = $mapper->{'to'};
  my $mapper_from = $mapper->{'from'};

  if ( $mapper_to ne $self->{'to'} or $mapper_from ne $self->{'from'} ) {
    throw("Trying to add an incompatible Mapper");
  }

  my $count_a = 0;
  foreach my $seq_name ( keys %{ $mapper->{"_pair_$mapper_to"} } ) {
    push( @{ $self->{"_pair_$mapper_to"}->{$seq_name} },
      @{ $mapper->{"_pair_$mapper_to"}->{$seq_name} } );
    $count_a += scalar( @{ $mapper->{"_pair_$mapper_to"}->{$seq_name} } );
  }

  my $count_b = 0;
  foreach my $seq_name ( keys %{ $mapper->{"_pair_$mapper_from"} } ) {
    push( @{ $self->{"_pair_$mapper_from"}->{$seq_name} },
      @{ $mapper->{"_pair_$mapper_from"}->{$seq_name} } );
    $count_b += scalar( @{ $mapper->{"_pair_$mapper_from"}->{$seq_name} } );
  }

  # Both indexes must hold the same number of pairs.
  if ( $count_a == $count_b ) {
    $self->{'pair_count'} += $count_a;
  } else {
    throw("Trying to add a funny Mapper");
  }

  $self->{'_is_sorted'} = 0;
  return 1;
}
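add_Mapper appends each sequence's pair list from the other mapper onto this one, once per index, and accepts the result only if both indexes report the same number of pairs. A minimal standalone sketch of that bookkeeping follows; strings stand in for pair objects, and `append_index` and the toy index names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append every list in %$src onto the matching key
# in %$dst and return how many elements were copied.
sub append_index {
    my ( $dst, $src ) = @_;
    my $count = 0;
    for my $seq_name ( keys %$src ) {
        push @{ $dst->{$seq_name} }, @{ $src->{$seq_name} };
        $count += scalar @{ $src->{$seq_name} };
    }
    return $count;
}

# Toy 'to' and 'from' indexes of this mapper and of the other mapper.
my %to_index   = ( CHR1 => ['p1'] );
my %from_index = ( CTG1 => ['p1'] );
my %other_to   = ( CHR1 => ['p2'], CHR2 => ['p3'] );
my %other_from = ( CTG2 => [ 'p2', 'p3' ] );

my $count_a = append_index( \%to_index,   \%other_to );
my $count_b = append_index( \%from_index, \%other_from );

# Same consistency check as add_Mapper: both sides must agree.
die "Trying to add a funny Mapper\n" unless $count_a == $count_b;
print "added $count_a pairs\n";    # prints "added 2 pairs"
```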
sub add_indel_coordinates {
  my ($self, $contig_id, $contig_start, $contig_end,
    $contig_ori, $chr_name, $chr_start, $chr_end) = @_;

  unless ( defined($contig_id) && defined($contig_start) && defined($contig_end)
    && defined($contig_ori) && defined($chr_name) && defined($chr_start)
    && defined($chr_end) ) {
    throw("7 arguments expected");
  }

  # No length check here: an indel may map regions of different lengths.
  my $from =
    Bio::EnsEMBL::Mapper::Unit->new($contig_id, $contig_start, $contig_end);
  my $to =
    Bio::EnsEMBL::Mapper::Unit->new($chr_name, $chr_start, $chr_end);

  my $pair = Bio::EnsEMBL::Mapper::IndelPair->new($from, $to, $contig_ori);

  my $map_to   = $self->{'to'};
  my $map_from = $self->{'from'};

  push( @{ $self->{"_pair_$map_to"}->{uc($chr_name)} },    $pair );
  push( @{ $self->{"_pair_$map_from"}->{uc($contig_id)} }, $pair );

  $self->{'pair_count'}++;
  $self->{'_is_sorted'} = 0;
  return 1;
}
sub add_map_coordinates {
  my ($self, $contig_id, $contig_start, $contig_end,
    $contig_ori, $chr_name, $chr_start, $chr_end) = @_;

  unless ( defined($contig_id) && defined($contig_start) && defined($contig_end)
    && defined($contig_ori) && defined($chr_name) && defined($chr_start)
    && defined($chr_end) ) {
    throw("7 arguments expected");
  }

  if ( ($contig_end - $contig_start) != ($chr_end - $chr_start) ) {
    throw("Cannot deal with mis-lengthed mappings so far");
  }

  my $from =
    Bio::EnsEMBL::Mapper::Unit->new($contig_id, $contig_start, $contig_end);
  my $to =
    Bio::EnsEMBL::Mapper::Unit->new($chr_name, $chr_start, $chr_end);

  my $pair = Bio::EnsEMBL::Mapper::Pair->new($from, $to, $contig_ori);

  my $map_to   = $self->{'to'};
  my $map_from = $self->{'from'};

  # The same pair is indexed under both sequence ids.
  push( @{ $self->{"_pair_$map_to"}->{uc($chr_name)} },    $pair );
  push( @{ $self->{"_pair_$map_from"}->{uc($contig_id)} }, $pair );

  $self->{'pair_count'}++;
  $self->{'_is_sorted'} = 0;
}
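Every pair is stored twice, under the upper-cased id of each sequence, so that either coordinate system can be looked up directly. The sketch below models that layout with a plain hash in place of a mapper object; `add_pair` is a hypothetical stand-in for add_map_coordinates, and plain hashes replace Bio::EnsEMBL::Mapper::Unit and Bio::EnsEMBL::Mapper::Pair.

```perl
use strict;
use warnings;

# Toy mapper with the same key layout that new() creates.
my %mapper = ( from => 'rawcontig', to => 'chromosome' );

# Hypothetical stand-in for add_map_coordinates.
sub add_pair {
    my ( $m, $from_id, $from_start, $from_end, $ori,
         $to_id, $to_start, $to_end ) = @_;
    die "Cannot deal with mis-lengthed mappings so far\n"
        if ( $from_end - $from_start ) != ( $to_end - $to_start );

    my $pair = {
        from => { id => $from_id, start => $from_start, end => $from_end },
        to   => { id => $to_id,   start => $to_start,   end => $to_end },
        ori  => $ori,
    };

    # One reference, pushed into both indexes.
    my $to_index   = '_pair_' . $m->{to};
    my $from_index = '_pair_' . $m->{from};
    push @{ $m->{$to_index}{ uc($to_id) } },     $pair;
    push @{ $m->{$from_index}{ uc($from_id) } }, $pair;
    return 1;
}

add_pair( \%mapper, 'ctg_a', 1, 100, 1, 'x', 5001, 5100 );
```

Because both indexes hold the same reference, a change made through one index (as `_merge_pairs` does when it extends a pair) is visible through the other.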
sub fastmap {
  my ($self, $id, $start, $end, $strand, $type) = @_;

  my ($from, $to, $cs);

  # Insertion point (start = end + 1).
  if ( $end + 1 == $start ) {
    return $self->map_insert($id, $start, $end, $strand, $type, 1);
  }

  if ( !$self->{'_is_sorted'} ) { $self->_sort() }

  if ( $type eq $self->{'to'} ) {
    $from = 'to';
    $to   = 'from';
    $cs   = $self->{'from_cs'};
  } else {
    $from = 'from';
    $to   = 'to';
    $cs   = $self->{'to_cs'};
  }

  my $hash = $self->{"_pair_$type"}
    or throw("Type $type is neither to or from coordinate systems");

  my $pairs = $hash->{uc($id)};

  foreach my $pair (@$pairs) {
    my $self_coord   = $pair->{$from};
    my $target_coord = $pair->{$to};

    # Only a pair that contains the whole region can be used.
    if ( $start < $self_coord->{'start'}
      || $end > $self_coord->{'end'} ) {
      next;
    }

    if ( $pair->{'ori'} == 1 ) {
      return ( $target_coord->{'id'},
        $target_coord->{'start'} + $start - $self_coord->{'start'},
        $target_coord->{'start'} + $end - $self_coord->{'start'},
        $strand, $cs );
    } else {
      return ( $target_coord->{'id'},
        $target_coord->{'end'} - ( $end - $self_coord->{'start'} ),
        $target_coord->{'end'} - ( $start - $self_coord->{'start'} ),
        -$strand, $cs );
    }
  }

  return ();
}
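The offset arithmetic in fastmap can be checked in isolation. This standalone sketch reproduces it for a single pair, using plain `{ start, end }` hashes; the helper name `map_within_pair` is hypothetical. On a forward pair the offset is added to the target start; on a reverse pair it is subtracted from the target end and the strand is flipped.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing fastmap's arithmetic for one pair.
# $src and $tgt are { start, end } hashes; $ori is +1 or -1.
sub map_within_pair {
    my ( $src, $tgt, $ori, $start, $end, $strand ) = @_;
    # Only regions that fall entirely inside the pair can be mapped.
    return () if $start < $src->{start} || $end > $src->{end};
    if ( $ori == 1 ) {
        return ( $tgt->{start} + $start - $src->{start},
                 $tgt->{start} + $end - $src->{start}, $strand );
    }
    # Reverse orientation: count back from the target end, flip strand.
    return ( $tgt->{end} - ( $end - $src->{start} ),
             $tgt->{end} - ( $start - $src->{start} ), -$strand );
}

my $src = { start => 1,    end => 100 };
my $tgt = { start => 1001, end => 1100 };
my @fwd = map_within_pair( $src, $tgt,  1, 11, 20, 1 );    # (1011, 1020, 1)
my @rev = map_within_pair( $src, $tgt, -1, 11, 20, 1 );    # (1081, 1090, -1)
my @none = map_within_pair( $src, $tgt, 1, 95, 105, 1 );   # () - spills over the pair end
```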
sub flush {
  my $self = shift;

  my $from = $self->from();
  my $to   = $self->to();

  $self->{"_pair_$from"} = {};
  $self->{"_pair_$to"}   = {};
  $self->{'pair_count'}  = 0;
}
sub from {
  my ( $self, $value ) = @_;
  if ( defined($value) ) {
    $self->{'from'} = $value;
  }
  return $self->{'from'};
}
sub list_pairs {
  my ($self, $id, $start, $end, $type) = @_;

  if ( !$self->{'_is_sorted'} ) { $self->_sort() }

  if ( !defined $type ) {
    throw("Must start,end,id,type as coordinates");
  }

  if ( $start > $end ) {
    throw("Start is greater than end for id $id, start $start, end $end\n");
  }

  my $hash = $self->{"_pair_$type"};

  my ($from, $to);
  if ( $type eq $self->{'to'} ) {
    $from = 'to';
    $to   = 'from';
  } else {
    $from = 'from';
    $to   = 'to';
  }

  unless ( defined $hash ) {
    throw("Type $type is neither to or from coordinate systems");
  }

  my @list;
  unless ( exists $hash->{uc($id)} ) {
    return ();
  }
  @list = @{ $hash->{uc($id)} };

  my @output;
  if ( $start == -1 && $end == -1 ) {
    # Special case: return every pair for this sequence.
    return @list;
  } else {
    # Pairs are sorted by start, so stop at the first one past $end.
    foreach my $p (@list) {
      if ( $p->{$from}->{'end'} < $start ) {
        next;
      }
      if ( $p->{$from}->{'start'} > $end ) {
        last;
      }
      push(@output, $p);
    }
    return @output;
  }
}
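The scan in list_pairs relies on the pairs being sorted by start: pairs ending before the region are skipped, and the loop stops at the first pair starting after it. A standalone sketch of that filter, with plain `{ start, end }` hashes in place of pair objects; `overlapping` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring list_pairs' scan over pairs sorted by
# start. (-1, -1) returns everything, as in list_pairs.
sub overlapping {
    my ( $start, $end, @sorted ) = @_;
    return @sorted if $start == -1 && $end == -1;
    my @out;
    for my $p (@sorted) {
        next if $p->{end} < $start;    # entirely before the region
        last if $p->{start} > $end;    # this and all later pairs are after it
        push @out, $p;
    }
    return @out;
}

my @pairs = map { +{ start => $_->[0], end => $_->[1] } }
    [ 1, 10 ], [ 11, 20 ], [ 21, 30 ], [ 41, 50 ];
my @hits = overlapping( 15, 25, @pairs );    # the 11-20 and 21-30 pairs
my @all  = overlapping( -1, -1, @pairs );    # all four pairs
```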
sub map_coordinates {
  my ($self, $id, $start, $end, $strand, $type) = @_;

  unless ( defined($id) && defined($start) && defined($end)
    && defined($strand) && defined($type) ) {
    throw("Must start,end,strand,id,type as coordinates");
  }

  # Insertion point (start = end + 1): handled separately.
  if ( $start == $end + 1 ) {
    return $self->map_insert($id, $start, $end, $strand, $type);
  }

  if ( !$self->{'_is_sorted'} ) { $self->_sort() }

  my $hash = $self->{"_pair_$type"};

  my ($from, $to, $cs);
  if ( $type eq $self->{'to'} ) {
    $from = 'to';
    $to   = 'from';
    $cs   = $self->{'from_cs'};
  } else {
    $from = 'from';
    $to   = 'to';
    $cs   = $self->{'to_cs'};
  }

  unless ( defined $hash ) {
    throw("Type $type is neither to or from coordinate systems");
  }

  # No pairs for this sequence: the whole region is a gap.
  if ( !defined $hash->{uc($id)} ) {
    my $gap = Bio::EnsEMBL::Mapper::Gap->new($start, $end);
    return $gap;
  }

  my $last_used_pair;
  my @result;

  my ( $start_idx, $end_idx, $mid_idx, $pair, $self_coord );
  my $lr = $hash->{uc($id)};

  # Binary search for the first pair that could overlap $start.
  $start_idx = 0;
  $end_idx   = $#$lr;
  while ( ( $end_idx - $start_idx ) > 1 ) {
    $mid_idx    = ( $start_idx + $end_idx ) >> 1;
    $pair       = $lr->[$mid_idx];
    $self_coord = $pair->{$from};
    if ( $self_coord->{'end'} < $start ) {
      $start_idx = $mid_idx;
    } else {
      $end_idx = $mid_idx;
    }
  }

  my $orig_start        = $start;
  my $last_target_coord = undef;
  for ( my $i = $start_idx; $i <= $#$lr; $i++ ) {
    $pair = $lr->[$i];
    my $self_coord   = $pair->{$from};
    my $target_coord = $pair->{$to};

    # A pair on a different target sequence than the first one that
    # starts before the current position restarts from the original start.
    if ( defined($last_target_coord)
      and $target_coord->{'id'} ne $last_target_coord ) {
      if ( $self_coord->{'start'} < $start ) {
        $start = $orig_start;
      }
    } else {
      $last_target_coord = $target_coord->{'id'};
    }

    if ( $self_coord->{'end'} < $orig_start ) {
      next;
    }

    if ( $self_coord->{'start'} > $end ) {
      last;
    }

    # Uncovered stretch before this pair becomes a gap.
    if ( $start < $self_coord->{'start'} ) {
      my $gap = Bio::EnsEMBL::Mapper::Gap->new($start,
        $self_coord->{'start'} - 1);
      push(@result, $gap);
      $start = $gap->{'end'} + 1;
    }

    my ($target_start, $target_end, $target_ori);
    my $res;
    if ( exists $pair->{'indel'} ) {
      # Indel pair: report the gap together with the whole target region.
      $target_start = $target_coord->{'start'};
      $target_end   = $target_coord->{'end'};

      my $gap = Bio::EnsEMBL::Mapper::Gap->new($start,
        ( $self_coord->{'end'} < $end ? $self_coord->{'end'} : $end ));
      my $coord = Bio::EnsEMBL::Mapper::Coordinate->new($target_coord->{'id'},
        $target_start,
        $target_end,
        $pair->{'ori'} * $strand,
        $cs);
      $res = Bio::EnsEMBL::Mapper::IndelCoordinate->new($gap, $coord);
    } else {
      # Target position of the current start.
      if ( $pair->{'ori'} == 1 ) {
        $target_start =
          $target_coord->{'start'} + ( $start - $self_coord->{'start'} );
      } else {
        $target_end =
          $target_coord->{'end'} - ( $start - $self_coord->{'start'} );
      }

      # Target position of the end, clipped to this pair if necessary.
      if ( $end > $self_coord->{'end'} ) {
        if ( $pair->{'ori'} == 1 ) {
          $target_end = $target_coord->{'end'};
        } else {
          $target_start = $target_coord->{'start'};
        }
      } else {
        if ( $pair->{'ori'} == 1 ) {
          $target_end =
            $target_coord->{'start'} + ( $end - $self_coord->{'start'} );
        } else {
          $target_start =
            $target_coord->{'end'} - ( $end - $self_coord->{'start'} );
        }
      }

      $res = Bio::EnsEMBL::Mapper::Coordinate->new($target_coord->{'id'},
        $target_start,
        $target_end,
        $pair->{'ori'} * $strand,
        $cs);
    }

    push(@result, $res);

    $last_used_pair = $pair;
    $start = $self_coord->{'end'} + 1;
  }

  # Anything left after the last used pair is a trailing gap.
  if ( !defined $last_used_pair ) {
    my $gap = Bio::EnsEMBL::Mapper::Gap->new($start, $end);
    push(@result, $gap);
  } elsif ( $last_used_pair->{$from}->{'end'} < $end ) {
    my $gap = Bio::EnsEMBL::Mapper::Gap->new(
      $last_used_pair->{$from}->{'end'} + 1,
      $end);
    push(@result, $gap);
  }

  if ( $strand == -1 ) {
    @result = reverse(@result);
  }

  return @result;
}
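The core of map_coordinates, a binary search followed by a walk that emits coordinates and gaps, can be sketched standalone. This is a simplified sketch under stated assumptions: forward-orientation pairs only, a single target sequence, no indel pairs and no strand reversal; `map_region` is a hypothetical name and the `['coord', ...]` / `['gap', ...]` arrays stand in for Bio::EnsEMBL::Mapper::Coordinate and Gap objects. As in the module, gaps are reported in source coordinates and mapped pieces in target coordinates.

```perl
use strict;
use warnings;

# Hypothetical, simplified map_coordinates. Each pair is
# { from => [s, e], to => [s, e] }, sorted by from start, non-overlapping.
sub map_region {
    my ( $start, $end, @pairs ) = @_;
    return ( [ 'gap', $start, $end ] ) unless @pairs;

    # Binary search, as in the module, for a pair at or just before $start.
    my ( $lo, $hi ) = ( 0, $#pairs );
    while ( $hi - $lo > 1 ) {
        my $mid = ( $lo + $hi ) >> 1;
        if ( $pairs[$mid]{from}[1] < $start ) { $lo = $mid } else { $hi = $mid }
    }

    my @result;
    my $last_end;
    for my $p ( @pairs[ $lo .. $#pairs ] ) {
        my ( $fs, $fe ) = @{ $p->{from} };
        next if $fe < $start;
        last if $fs > $end;
        # Uncovered stretch before this pair becomes a gap.
        if ( $start < $fs ) {
            push @result, [ 'gap', $start, $fs - 1 ];
            $start = $fs;
        }
        my $clip_end = $end < $fe ? $end : $fe;
        my $offset   = $p->{to}[0] - $fs;
        push @result, [ 'coord', $start + $offset, $clip_end + $offset ];
        $last_end = $fe;
        $start    = $fe + 1;
    }
    # Whatever is left after the last used pair is a trailing gap.
    if ( !defined $last_end ) {
        push @result, [ 'gap', $start, $end ];
    }
    elsif ( $last_end < $end ) {
        push @result, [ 'gap', $last_end + 1, $end ];
    }
    return @result;
}

my @pairs = ( { from => [ 1,  10 ], to => [ 101, 110 ] },
              { from => [ 21, 30 ], to => [ 201, 210 ] } );
my @out = map_region( 5, 35, @pairs );
print join( ' ', map { "$_->[0]:$_->[1]-$_->[2]" } @out ), "\n";
# prints "coord:105-110 gap:11-20 coord:201-210 gap:31-35"
```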
sub map_indel {
  my ($self, $id, $start, $end, $strand, $type) = @_;

  # For an indel start = end + 1; swap so that $start < $end.
  ($start, $end) = ($end, $start);

  if ( !$self->{'_is_sorted'} ) { $self->_sort() }

  my $hash = $self->{"_pair_$type"};

  my ($from, $to, $cs);
  if ( $type eq $self->{'to'} ) {
    $from = 'to';
    $to   = 'from';
    $cs   = $self->{'from_cs'};
  } else {
    $from = 'from';
    $to   = 'to';
    $cs   = $self->{'to_cs'};
  }

  unless ( defined $hash ) {
    throw("Type $type is neither to or from coordinate systems");
  }

  my $last_used_pair;
  my @indel_coordinates;

  my ( $start_idx, $end_idx, $mid_idx, $pair, $self_coord );
  my $lr = $hash->{uc($id)};

  # Binary search for the pair nearest $start.
  $start_idx = 0;
  $end_idx   = $#$lr;
  while ( ( $end_idx - $start_idx ) > 1 ) {
    $mid_idx    = ( $start_idx + $end_idx ) >> 1;
    $pair       = $lr->[$mid_idx];
    $self_coord = $pair->{$from};
    if ( $self_coord->{'end'} <= $start ) {
      $start_idx = $mid_idx;
    } else {
      $end_idx = $mid_idx;
    }
  }

  # Return the target region of the first indel pair from that point on.
  for ( my $i = $start_idx; $i <= $#$lr; $i++ ) {
    $pair = $lr->[$i];
    my $self_coord   = $pair->{$from};
    my $target_coord = $pair->{$to};

    if ( exists $pair->{'indel'} ) {
      my $to = Bio::EnsEMBL::Mapper::Unit->new($target_coord->{'id'},
        $target_coord->{'start'},
        $target_coord->{'end'},
      );
      push @indel_coordinates, $to;
      last;
    }
    $last_used_pair = $pair;
  }

  return @indel_coordinates;
}
sub map_insert {
  my ($self, $id, $start, $end, $strand, $type, $fastmap) = @_;

  # Swap start and end so the two flanking bases map as a normal region.
  ($start, $end) = ($end, $start);

  my @coords = $self->map_coordinates($id, $start, $end, $strand, $type);

  if ( @coords == 1 ) {
    # One piece: swap back so the result is again an insertion point.
    my $c = $coords[0];
    ($c->{'start'}, $c->{'end'}) = ($c->{'end'}, $c->{'start'});
  } else {
    throw("Unexpected: Got " . scalar(@coords) . " expected 2.")
      if ( @coords != 2 );

    # Two pieces: trim each Coordinate back to the insertion point;
    # Gap objects are dropped.
    my ($c1, $c2);
    if ( $strand == -1 ) {
      ($c2, $c1) = @coords;
    } else {
      ($c1, $c2) = @coords;
    }
    @coords = ();

    if ( ref($c1) eq 'Bio::EnsEMBL::Mapper::Coordinate' ) {
      if ( $c1->{'strand'} * $strand == -1 ) {
        $c1->{'end'}--;
      } else {
        $c1->{'start'}++;
      }
      @coords = ($c1);
    }

    if ( ref($c2) eq 'Bio::EnsEMBL::Mapper::Coordinate' ) {
      if ( $c2->{'strand'} * $strand == -1 ) {
        $c2->{'start'}++;
      } else {
        $c2->{'end'}--;
      }
      if ( $strand == -1 ) {
        unshift @coords, $c2;
      } else {
        push @coords, $c2;
      }
    }
  }

  if ( $fastmap ) {
    return undef if ( @coords != 1 );
    my $c = $coords[0];
    return ($c->{'id'}, $c->{'start'}, $c->{'end'},
      $c->{'strand'}, $c->{'coord_system'});
  }

  return @coords;
}
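For the single-piece case, the swap-map-swap idea in map_insert can be sketched on its own. An insertion point is written as start = end + 1, meaning "between bases end and start"; the two flanking bases are mapped as an ordinary region and the result is swapped back. This is a minimal sketch assuming one pair that contains both flanking bases; `map_insert_point` and the pair-hash keys are hypothetical, and the two-piece trimming branch of map_insert is not modelled.

```perl
use strict;
use warnings;

# Hypothetical sketch of map_insert's single-pair case.
sub map_insert_point {
    my ( $start, $end, $pair ) = @_;
    die "not an insertion point\n" unless $start == $end + 1;
    ( $start, $end ) = ( $end, $start );    # the two flanking bases
    my ( $fs, $ts, $te, $ori ) = @{$pair}{qw(from_start to_start to_end ori)};
    my ( $s, $e );
    if ( $ori == 1 ) {
        ( $s, $e ) = ( $ts + $start - $fs, $ts + $end - $fs );
    }
    else {
        ( $s, $e ) = ( $te - ( $end - $fs ), $te - ( $start - $fs ) );
    }
    return ( $e, $s );    # swap back: start = end + 1 again
}

my %fwd = ( from_start => 1, to_start => 101,  to_end => 200,  ori => 1 );
my %rev = ( from_start => 1, to_start => 1001, to_end => 1100, ori => -1 );

# Insert between source bases 10 and 11.
my @f = map_insert_point( 11, 10, \%fwd );    # (111, 110)
my @r = map_insert_point( 11, 10, \%rev );    # (1091, 1090)
```

In both orientations the result again satisfies start = end + 1, so it is still an insertion point in the target system.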
sub new {
  my ( $proto, $from, $to, $from_cs, $to_cs ) = @_;

  if ( !defined($to) || !defined($from) ) {
    throw("Must supply 'to' and 'from' tags");
  }

  my $class = ref($proto) || $proto;

  # Pairs are indexed twice, once per coordinate system, in hashes
  # keyed by upper-cased sequence id.
  my $self = bless( {
      "_pair_$from" => {},
      "_pair_$to"   => {},
      'pair_count'  => 0,
      'to'          => $to,
      'from'        => $from,
      'to_cs'       => $to_cs,
      'from_cs'     => $from_cs
    },
    $class );

  return $self;
}
sub to {
  my ( $self, $value ) = @_;
  if ( defined($value) ) {
    $self->{'to'} = $value;
  }
  return $self->{'to'};
}
General documentation
Copyright (c) 1999-2009 The European Bioinformatics Institute and
Genome Research Limited. All rights reserved.
This software is distributed under a modified Apache license.
For license details, please see
/info/about/code_licence.html