Endeavour

ENDEAVOUR: A web resource for gene prioritization in multiple species

The identification of key genes involved in health and disease remains a formidable challenge. Experimental approaches often produce lists of candidate genes, among which the disease-causing genes are hidden (i.e., their disease associations are still unknown). These lists can be rather large, so experimentally validating every candidate gene would be too expensive and time-consuming. There is therefore a need to predict the most promising candidate genes in order to maximise the yield of experimental validation, a task known as 'gene prioritization'.

We have developed a bioinformatics approach to prioritize candidate genes underlying biological processes or diseases, and implemented it in a software application named 'Endeavour'. Our strategy ranks a candidate gene by how similar it is to a profile derived from genes already known to be involved in the process of interest. The approach relies on the integration of multiple heterogeneous data sources (e.g., coding sequence, gene expression, functional annotation, literature, regulatory information) that cover what we currently know about these genes.

More precisely, Endeavour consists of three stages: training, scoring and fusion. In the first stage, information about the training genes (genes already known to play a role in the process under study) is retrieved from the genomic data sources in order to build models (one per data source). In the second stage, the models are used to score the candidate genes and to rank them according to their scores. In the last stage, the rankings (one per data source) are fused into a global ranking using order statistics. Endeavour is currently available for human, mouse, rat, fruit fly, zebrafish and worm.
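The three stages can be illustrated with a minimal Python sketch. It is not Endeavour's actual implementation: the function names, the mean-vector model, the squared Euclidean distance, and the toy data are all assumptions made for illustration, and every gene is assumed to be present in every data source. The fusion step uses one standard recursive formulation of the order-statistics Q value, which may differ in detail from what Endeavour computes.

```python
from math import factorial

def train(training_genes, source):
    """Stage 1 (assumed model): mean feature vector of the training genes."""
    vectors = [source[g] for g in training_genes]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def score(candidates, source, model):
    """Stage 2: rank candidates by squared Euclidean distance to the model.
    Returns rank ratios in (0, 1]; a smaller ratio means a better rank."""
    dist = {g: sum((a - b) ** 2 for a, b in zip(source[g], model))
            for g in candidates}
    ordered = sorted(candidates, key=lambda g: dist[g])
    return {g: (i + 1) / len(ordered) for i, g in enumerate(ordered)}

def q_statistic(rank_ratios):
    """Stage 3: order-statistics Q value for one gene's rank ratios.
    Uses V_0 = 1, V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i,
    and Q = N! * V_N, with r sorted in ascending order."""
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0]
    for k in range(1, n + 1):
        x = r[n - k]  # r_{N-k+1} in 1-based notation
        v.append(sum((-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
                     for i in range(1, k + 1)))
    return factorial(n) * v[n]

def prioritize(training_genes, candidates, sources):
    """Run all three stages; returns candidates sorted from best to worst."""
    per_source = [score(candidates, s, train(training_genes, s))
                  for s in sources]
    q = {g: q_statistic([ranks[g] for ranks in per_source])
         for g in candidates}
    return sorted(candidates, key=lambda g: q[g])

# Toy data (hypothetical): two data sources, genes A and B used for training.
expression = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.8, 0.2]}
annotation = {"A": [1.0], "B": [1.0], "C": [0.0], "D": [0.7]}

print(prioritize(["A", "B"], ["C", "D"], [expression, annotation]))
# prints ['D', 'C']
```

In this toy example, gene D lies closer to the training profile in both data sources, so it receives the smaller Q value and is ranked first.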

We have successfully used Endeavour to prioritize candidate genes in a DiGeorge syndrome-associated region and in a region associated with congenital heart defects, and to optimize a genetic screen in Drosophila melanogaster. Other researchers have used Endeavour to look for genes involved in cleft lip / cleft palate from aCGH data and to analyze the proteome of adipocytes. Please browse our reference section for a list of Endeavour-related publications.

Our Approach

Data

Data from multiple heterogeneous sources are collected and integrated in our local database for better performance.

Algorithms

Our algorithm uses basic machine learning techniques to model the biological process under study and then to prioritize the candidate genes.

Alternatives

An alternative is available for cases in which the already known disease genes cannot easily be identified.