Gene Families in Compara

Introduction

Ensembl families are determined through clustering of all Ensembl proteins along with metazoan sequences from UniProtKB. They therefore provide a way of exploring orthologues and closely related homologues across a range of animal species.

Ensembl Protein Family Report

Family ID: The family ID encodes the Ensembl version in which the family was built. For example, fam50v... belongs to Ensembl version 50. The ID is not stable, i.e. it can change with each new Ensembl release. However, because the version is part of the ID, recording the family ID alone should be sufficient to find the family on the archive sites.
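As a small illustration of reading the version out of a family ID, the sketch below assumes only what is stated above: that an ID starts with "fam", followed by the Ensembl version number and a "v". The function name is hypothetical, and nothing is assumed about the characters after the "v".

```python
import re

# Assumption: IDs begin with "fam<version>v", as in the "fam50v..." example.
# The remainder of the ID is not inspected.
FAMILY_PREFIX = re.compile(r"^fam(\d+)v")

def ensembl_version_from_family_id(family_id):
    """Return the Ensembl version encoded in a family ID, or None if absent."""
    match = FAMILY_PREFIX.match(family_id)
    return int(match.group(1)) if match else None

# The example prefix from the text above, with its elided suffix left as-is.
version = ensembl_version_from_family_id("fam50v...")
```

The returned number indicates which archive site to consult for that family.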
Consensus Annotation: For each cluster obtained, a consensus annotation is automatically generated from the UniProt/Swiss-Prot and UniProt/TrEMBL description lines of all UniProtKB members using the following approach:
If the description covers less than 40% of the UniProtKB members in the cluster, the family description is set to 'AMBIGUOUS'. If the annotation confidence score, described below, is zero, the description is set to 'UNKNOWN'. Be aware that 'UNCHARACTERIZED' is a UniProt description for a protein and does not reflect the score.
The annotation confidence score is the percentage of UniProtKB family members with this description, or part of it. Note that only family members with 'informative' UniProt descriptions are taken into account.
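The rules above can be sketched roughly as follows. This is a simplified illustration, not the Ensembl implementation: it counts only exact (case-insensitive) matches rather than partial descriptions, it assumes the score is computed over all UniProtKB members of the cluster, and the list of 'uninformative' words is a hypothetical stand-in because the real list is not given here.

```python
from collections import Counter

# Hypothetical markers of an uninformative description (assumption).
UNINFORMATIVE_WORDS = ("UNKNOWN", "HYPOTHETICAL", "FRAGMENT", "PUTATIVE")

def is_informative(description):
    """True if the description is non-empty and has no uninformative word."""
    text = description.strip().upper()
    return bool(text) and not any(word in text for word in UNINFORMATIVE_WORDS)

def consensus_annotation(descriptions, threshold=40.0):
    """Return (family description, confidence score in percent).

    descriptions: one UniProtKB description line per cluster member.
    """
    informative = [d.strip().upper() for d in descriptions if is_informative(d)]
    if not informative:
        # No informative description at all: the score is zero.
        return "UNKNOWN", 0.0
    best, count = Counter(informative).most_common(1)[0]
    # Assumption: the percentage is taken over all UniProtKB members.
    score = 100.0 * count / len(descriptions)
    if score < threshold:
        return "AMBIGUOUS", score
    return best, score

# Made-up description lines for a four-member cluster.
label, score = consensus_annotation(
    ["Histone H3", "Histone H3", "histone h3", "Hypothetical protein"]
)
```

In this made-up cluster, three of the four members share the same informative description, so that description is kept with a score of 75%.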
Prediction Method: A brief summary of the protein family clustering algorithm is given.
Multiple Alignments: Ensembl provides pre-calculated multiple sequence alignments of all members for each cluster.
If the Java Runtime Environment (JRE) is properly installed on your computer, the buttons open a new window with multiple alignments of family members displayed in JalView. The first option includes Ensembl protein predictions from the current species, as well as from all other species supported by Ensembl. The second option also includes UniProtKB members. You can also export a text file with the alignments of all the family members; a wide range of formats is available from the control panel.
Alternatively, export alignments using the Compara Perl API.
