Endeavour

Help

Manual

A manual is available to help researchers to use Endeavour. It briefly explains the general concept of gene prioritization and then describes a step-by-step example.

Lectures

Gene prioritization through genomic data fusion (18th May 2009 - Institut Curie, Paris, France)
Link: presentation.

How to make the best of your aCGH data? (9th-11th March 2009 - Genomic Disorders - Wellcome Trust Conference Center, Hinxton, United Kingdom)
Link: presentation.

Gene prioritization (18th November 2008 - Master Bioinformatics - Université de Liège, Liège, Belgium)
Link: presentation.

Optimization of a genetic screen (15th-17th September 2008 - EURO-CBBM - Consiglio Nazionale delle Ricerche, Rome, Italy)
Link: presentation.

Flash demonstration of Endeavour (Java application - ISMB 2007 - Vienna, Austria)
In this demonstration, you will learn how to start the Java client from the website, how to load lists of genes, how to train the model, and how to score the candidates. We demonstrate the power of our approach by applying it to Usher syndrome, and more precisely to the recent discovery of DFNB31 as an Usher syndrome-causing gene. The demo is 20 minutes long; start it now!
Link: presentation and demonstration.

Video of the lecture by Professor Yves Moreau (MLSB 2007 - Evry, France)
The first 30 minutes of the talk present the general principles of Endeavour. The last 20 minutes are intended for a machine learning audience and will be less relevant to biologists using Endeavour. The lecture is fully available on the video lecture website.

Abstract: The overwhelming amount of biological data makes the assignment of candidate genes to diseases and biological pathways a formidable challenge. We present ENDEAVOUR, a generally applicable computational methodology to prioritize candidate genes based on their similarity to case-specific reference gene sets. Unlike previous methods, ENDEAVOUR is capable of flexibly utilizing multiple data sets from diverse sources. It allows the modular incorporation of de novo generated data sets and integrates distinct prioritizations into a global ranking by applying order statistics. We first validate the overall performance in a statistical cross validation of 29 diseases and 3 biological pathways. We validate a novel candidate for DiGeorge syndrome in a zebrafish model and present several new candidates for congenital heart disease. We extend the basic ENDEAVOUR methodology using data from multiple species (human, mouse, rat, drosophila and C. elegans). We also present an alternative machine learning methodology for gene prioritization using kernel methods for novelty detection that outperforms our previous results.
Link: lecture.

FAQ

General

What is the current version of the software?
The current version is v3.71; it corresponds to our latest paper.
What is gene prioritization?
Gene prioritization is a process in which a list of candidate genes is analyzed. The goal is to give priority to the interesting genes while discarding the uninteresting ones. In our case, we are mainly interested in disease-causing genes, and we therefore favor the genes that are likely to be involved in the disease of interest. In practice, the result of our approach is a ranking of the candidate genes with the most promising candidates at the top.
When do I need gene prioritization?
Basically, any time you have a list of candidate genes from which you want to select the most promising genes for further validation. One example is the comparative analysis of gene expression for a disease tissue and a reference tissue. This gives rise to a list of genes differentially expressed between the two conditions. One way to select the most promising genes from this list is to use Endeavour. A second example is the use of array CGH technology for a patient with a known disorder but without a diagnosis (known loci are normal). The result of the analysis, if successful, is a genomic region that is thought to harbor the disease-causing gene. One way to optimize the search is to start by validating the genes ranked highly by Endeavour.
What is the difference between the academic version and the commercial version?
By default, there is no conceptual difference between the two versions. Of course, every commercial agreement is different and can contain specific conditions (e.g., the setup of a secured connection between the client and the server to ensure the privacy of the transferred data).
Why do I need a training set of genes?
Our approach relies on similarity between any candidate and the biological process under study (e.g., a disease). A simple way to model a biological process is to use all the genes known to play a role in that process (i.e., the training genes). Endeavour uses the training genes to build a model of the process of interest and will then compare the candidate genes to that model. Without training genes (and therefore without a model), it is thus impossible to prioritize the candidate genes.
How small/large can my training set be?
There is no limitation. You can prioritize the whole genome using the 'Full genome' checkbox. However, prioritizing the full genome takes some time and it is recommended to input an e-mail address so that the system can tell you exactly when the results are ready.
How can I estimate the homogeneity of my training set?
The simplest way is to perform a leave-one-out cross-validation on that training set. The procedure is simple: each gene from the training set is, in turn, left out, and the model is trained using the remaining training genes. Then, the left-out gene is scored against the model together with 99 genes randomly selected from the genome. The position of the left-out gene among the 100 candidates is recorded. As the procedure is repeated for every training gene, we obtain the positions of all the genes, which gives us an estimate of the quality of the training set and indicates which data sources are informative. Unfortunately, this procedure is not yet available through our online tools but only through the batch mode.
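The leave-one-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Endeavour implementation: the one-dimensional `features` dictionary, the averaging `train_model`, the distance-based `score`, and the function name `leave_one_out` are all hypothetical stand-ins for the real data sources and models. The sketch also assumes that the 99 random genes are drawn from genes outside the training set.

```python
import random


def train_model(training_genes, features):
    """Toy model (assumption): the mean feature value of the training genes."""
    values = [features[g] for g in training_genes]
    return sum(values) / len(values)


def score(gene, model, features):
    """Toy score (assumption): distance to the model; smaller is more similar."""
    return abs(features[gene] - model)


def leave_one_out(training_genes, genome, features, n_random=99, seed=0):
    """Return the position (1 = best) of each left-out training gene
    among itself plus n_random randomly selected genes."""
    rng = random.Random(seed)
    training = set(training_genes)
    # Assumption: random genes are drawn from genes not in the training set.
    background = [g for g in genome if g not in training]
    positions = {}
    for left_out in training_genes:
        # Train on every training gene except the one left out.
        remaining = [g for g in training_genes if g != left_out]
        model = train_model(remaining, features)
        # Score the left-out gene together with the random genes.
        candidates = rng.sample(background, n_random) + [left_out]
        ranked = sorted(candidates, key=lambda g: score(g, model, features))
        positions[left_out] = ranked.index(left_out) + 1
    return positions


if __name__ == "__main__":
    # Synthetic data: 1000 background genes and 3 clearly distinct training genes.
    data_rng = random.Random(42)
    features = {f"bg{i}": data_rng.random() for i in range(1000)}
    features.update({"t1": 10.0, "t2": 10.1, "t3": 10.2})
    genome = list(features)
    print(leave_one_out(["t1", "t2", "t3"], genome, features))
```

With a homogeneous training set, the left-out genes tend to land near the top of their 100-gene lists; positions scattered across the whole list suggest that the training genes have little in common in the data used.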
How small/large can my candidate set be?
Depending on which tool you are using, the procedure differs, but you can always prioritize the whole genome. For the web client, the 'Full genome' checkbox allows you to prioritize the full genome and to receive the results by e-mail. If you manually enter a large candidate set, only the top 200 genes will be displayed in the graphical user interface. For the Java client, the 'Full genome' option, available from the 'Tools' menu, allows you to score the full genome and, once again, to receive the results by e-mail. Depending on the models you are using, a manually defined training set of 1500 genes can still be scored; using more than 1500 genes, however, can produce Java memory errors. In that case, it is better to score the full genome and apply a filter afterwards.
What reasonable threshold can I use for the final p-values?
We recommend not using the p-values to make decisions, since they are not true p-values (they do not fulfill all the properties of p-values). They represent the probability that a candidate gene would obtain these ranks by chance, but they depend on the number of candidate genes considered. As a result, there is no reasonable threshold for the p-values, and we advise treating the Endeavour output as a ranking of the candidate genes from the most promising to the least promising. We are well aware of this limitation and are currently working on the problem.
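To see why the values depend on the number of candidates, the following Python sketch combines per-data-source ranks with a recursive order-statistics formula of the kind used for rank fusion. It is an assumption that this matches what the current tool computes; the function names `q_statistic` and `fused_score` are hypothetical. Each rank is first converted to a rank ratio (rank divided by the number of candidates), and the statistic is the probability that uniformly random rank ratios would all be at least as good as the observed ones.

```python
from math import factorial


def q_statistic(rank_ratios):
    """Order-statistics Q value for a list of rank ratios in (0, 1].

    Uses the recursion V_0 = 1,
    V_k = sum_{i=1..k} (-1)^(i-1) * V_(k-i) / i! * r_(N-k+1)^i,
    and Q = N! * V_N, with the ratios sorted in increasing order.
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0]
    for k in range(1, n + 1):
        x = r[n - k]  # r_(N-k+1) in 1-based notation
        total = 0.0
        for i in range(1, k + 1):
            total += (-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
        v.append(total)
    return factorial(n) * v[n]


def fused_score(ranks, n_candidates):
    """Convert raw ranks to rank ratios and combine them."""
    return q_statistic([rank / n_candidates for rank in ranks])


if __name__ == "__main__":
    # Same ranks (5th in three data sources), different candidate set sizes.
    print(fused_score([5, 5, 5], 100))
    print(fused_score([5, 5, 5], 1000))
```

The same ranks yield a much smaller value when the candidate set is ten times larger, which is why a fixed threshold on these values is not meaningful across different prioritizations.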
Is this approach limited to human?
No, you can perform gene prioritization for human, mouse, rat, fly, fish and worm.
Can I get the Endeavour data?
Genomic data is the core of our system. Collecting and updating that data is a challenging and time-consuming task. We currently collect data from more than 75 different databases for 6 different species. If you want access to that data, please contact Prof. Moreau to reach an agreement.
Why is database xxxxxx not included in the tool?
If that database contains some specific information not already present via other databases and if that database is freely accessible, you might consider dropping us an email. We will fully consider your request and do our best to include that data source for the community.
I have a problem related to Endeavour, what can I do?
The simplest thing is to send us an e-mail through the contact form. Ideally, this message should include as many details as possible about your problem. For instance, you should mention your operating system, the browser you are running, and a complete description of the problem encountered. We will get back to you as soon as possible.

Publications

Endeavour main publications

Tranchevent L., Ardeshirdavani A., ElShal S., Alcaide D., Aerts J., Auboeuf D., Moreau Y., "Candidate gene prioritization with Endeavour", Nucleic Acids Research, Web Server Issue, vol. 44, no. 1, Jul. 2016, pp. 117-121.

Aerts S., Vilain S., Hu S., Tranchevent L., Barriot R., Yan J., Moreau Y., Hassan B., Quan X., "Integrating Computational Biology and Forward Genetics in Drosophila", PLoS Genetics, vol. 5, no. 1, Jan. 2009, pp. 351.

Tranchevent L., Barriot R., Yu S., Van Vooren S., Van Loo P., Coessens B., Aerts S., De Moor B., Moreau Y., "ENDEAVOUR update: a web resource for gene prioritization in multiple species", Nucleic Acids Research, Web Server issue, vol. 36, no. 1, Jun. 2008, pp. 377-384.

Aerts S., Lambrechts D., Maity S., Van Loo P., Coessens B., De Smet F., Tranchevent L., De Moor B., Marynen P., Hassan B., Carmeliet P., Moreau Y., "Gene prioritization through genomic data fusion", Nature Biotechnology, vol. 24, no. 5, May 2006, pp. 537-544.

Endeavour related publications

Yu S., Van Vooren S., Tranchevent L.C., De Moor B., Moreau Y., "Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining", Bioinformatics, vol. 24, no. 16, Aug. 2008, pp. i119-125.

De Bie T., Tranchevent L., van Oeffelen L., Moreau Y., "Kernel-based data fusion for gene prioritization", Bioinformatics, vol. 23, no. 13, Jul. 2007, pp. i125-32.

Endeavour applications

Storey E., Bahlo M., Fahey M., Sisson O., Lueck C.J., Gardner R.J., "A new dominantly inherited pure cerebellar ataxia, SCA 30", J Neurol Neurosurg Psychiatry, 2009 Apr;80(4):408-11.

Cheung C.L., Sham P.C., Chan V., Paterson A.D., Luk K.D., Kung A.W., "Identification of LTBP2 on chromosome 14q as a novel candidate gene for bone mineral density variation and fracture risk association", J Clin Endocrinol Metab, 2008 Nov;93(11):4448-55.

Liu X.G., Liu Y.J., Liu J., Pei Y., Xiong D.H., Shen H., Deng H.Y., Papasian C.J., Drees B.M., Hamilton J.J., Recker R.R., Deng H.W., "A bivariate whole genome linkage study identified genomic regions influencing both BMD and bone structure", J Bone Miner Res, 2008 Nov;23(11):1806-14.

Huang Q.Y., Li G.H., Cheung W.M., Song Y.Q., Kung A.W., "Prediction of osteoporosis candidate genes by computational disease-gene identification strategy", J Hum Genet, 2008;53(7):644-55.

Osoegawa K., Vessere G.M., Utami K.H., Mansilla M.A., Johnson M.K., Riley B.M., L'Heureux J., Pfundt R., Staaf J., van der Vliet W.A., Lidral A.C., Schoenmakers E.F., Borg A., Schutte B.C., Lammer E.J., Murray J.C., de Jong P.J., "Identification of novel candidate genes associated with cleft lip and palate using array comparative genomic hybridisation", J Med Genet, 2008 Feb;45(2):81-6.

Tzouvelekis A., Harokopos V., Paparountas T., Oikonomou N., Chatziioannou A., Vilaras G., Tsiambas E., Karameris A., Bouros D., Aidinis V., "Comparative expression profiling in pulmonary fibrosis suggests a role of hypoxia-inducible factor-1alpha in disease pathogenesis", Am J Respir Crit Care Med, 2007 Dec 1;176(11):1108-19.

Windelinckx A., Vlietinck R., Aerssens J., Beunen G., Thomis M.A., "Selection of genes and single nucleotide polymorphisms for fine mapping starting from a broad linkage region", Twin Res Hum Genet, 2007 Dec;10(6):871-85.

Vanden Bempt I., Drijkoningen M., De Wolf-Peeters C., "The complexity of genotypic alterations underlying HER2-positive breast cancer: an explanation for its clinical heterogeneity", Curr Opin Oncol, 2007 Nov;19(6):552-7.

Adachi J., Kumar C., Zhang Y., Mann M., "In-depth analysis of the adipocyte proteome by mass spectrometry and bioinformatics", Mol Cell Proteomics, 2007 Jul;6(7):1257-73.

Elbers C.C., Onland-Moret N.C., Franke L., Niehoff A.G., van der Schouw Y.T., Wijmenga C., "A strategy to search for common obesity and type 2 diabetes genes", Trends Endocrinol Metab, 2007 Jan-Feb;18(1):19-26.

Katsanou V., Milatos S., Yiakouvaki A., Sgantzis N., Kotsoni A., Alexiou M., Harokopos V., Aidinis V., Hemberger M., Kontoyiannis D.L., "The RNA-binding protein Elavl1/HuR is essential for placental branching morphogenesis and embryonic development", Mol Cell Biol. 2009 May;29(10):2762-76.

Letra A., Menezes R., Govil M., Fonseca R.F., McHenry T., Granjeiro J.M., Castilla E.E., Orioli I.M., Marazita M.L., Vieira A.R., "Follow-up association studies of chromosome region 9q and nonsyndromic cleft lip/palate", Am J Med Genet A 2010 Jul;152A(7):1701-10.

Sookoian S., Gianotti T.F., Gemma C., Burgueno A.L., Pirola C.J., "Role of genetic variation in insulin-like growth factor 1 receptor on insulin resistance and arterial hypertension", J Hypertens. 2010 Jun;28(6):1194-202.

Désir J., Sznajer Y., Depasse F., Roulez F., Schrooyen M., Meire F., Abramowicz M., "LTBP2 null mutations in an autosomal recessive ocular syndrome with megalocornea, spherophakia, and secondary glaucoma", Eur J Hum Genet. 2010 Jul;18(7):761-7.

Thienpont B., Zhang L., Postma A.V., Breckpot J., Tranchevent L.C., Van Loo P., Mollgard K., Tommerup N., Bache I., Tumer Z., van Engelen K., Menten B., Mortier G., Waggoner D., Gewillig M., Moreau Y., Devriendt K., Larsen L.A., "Haploinsufficiency of TAB2 causes congenital heart defects in humans", Am J Hum Genet. 2010 Jun 11;86(6):839-49.

Zhang R., Sun P., Jiang Y., Chen Z., Huang C., Zhang X., Zhang R., "Genome-wide haplotype association analysis and gene prioritization identify CCL3 as a risk locus for rheumatoid arthritis", Int J Immunogenet. 2010 Aug;37(4):273-8.

Tanaka D., Nagashima K., Sasaki M., Yamada C., Funakoshi S., Akitomo K., Takenaka K., Harada K., Koizumi A., Inagaki N., "GCKR mutations in Japanese families with clustered type 2 diabetes", Mol Genet Metab. 2011 Apr;102(4):453-60.

Ambegaokar S.S., Jackson G.R., "Functional genomic screen and network analysis reveal novel modifiers of tauopathy dissociated from tau phosphorylation", Hum Mol Genet. 2011 Dec 15;20(24):4947-77.

Benitez B.A., Alvarado D., Cai Y., Mayo K., Chakraverty S., Norton J., Morris J.C., Sands M.S., Goate A., Cruchaga C., "Exome-sequencing confirms DNAJC5 mutations as cause of adult neuronal ceroid-lipofuscinosis", PLoS One 2011;6(11):e26741.

Erlich Y., Edvardson S., Hodges E., Zenvirt S., Thekkat P., Shaag A., Dor T., Hannon G.J., Elpeleg O., "Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis", Genome Res. 2011 May;21(5):658-64.

Thiel C., Kessler K., Giessl A., Dimmler A., Shalev S.A., von der Haar S., Zenker M., Zahnleiter D., Stoss H., Beinder E., Abou Jamra R., Ekici A.B., Schroder-Kress N., Aigner T., Kirchner T., Reis A., Brandstatter J.H., Rauch A., "NEK1 mutations cause short-rib polydactyly syndrome type majewski", Am J Hum Genet. 2011 Jan 7;88(1):106-14.

Breckpot J., Tranchevent L.C., Thienpont B., Bauters M., Troost E., Gewillig M., Vermeesch J.R., Moreau Y., Devriendt K., Van Esch H., "BMPR1A is a candidate gene for congenital heart defects associated with the recurrent 10q22q23 deletion syndrome", Eur J Med Genet. 2012 Jan;55(1):12-6.

Hussain M.S., Baig S.M., Neumann S., Nurnberg G., Farooq M., Ahmad I., Alef T., Hennies H.C., Technau M., Altmuller J., Frommolt P., Thiele H., Noegel A.A., Nurnberg P., "A truncating mutation of CEP135 causes primary microcephaly and disturbed centrosomal function", Am J Hum Genet. 2012 May 4;90(5):871-8.

Melchionda L., Fang M., Wang H., Fugnanesi V., Morbin M., Liu X., Li W., Ceccherini I., Farina L., Savoiardo M., D'Adamo P., Zhang J., Costa A., Ravaglia S., Ghezzi D., Zeviani M., "Adult-onset Alexander disease, associated with a mutation in an alternative GFAP transcript, may be phenotypically modulated by a non-neutral HDAC6 variant", Orphanet J Rare Dis. 2013 May 1;8:66.

Yu L., Wynn J., Cheung Y.H., Shen Y., Mychaliska G.B., Crombleholme T.M., Azarow K.S., Lim F.Y., Chung D.H., Potoka D., Warner B.W., Bucher B., Stolar C., Aspelund G., Arkovitz M.S., Chung W.K., "Variants in GATA4 are a rare cause of familial and sporadic congenital diaphragmatic hernia", Hum Genet. 2013 Mar;132(3):285-92.

Zhu J., Cui L., Wang W., Hang X.Y., Xu A.X., Yang S.X., Dou J.T., Mu Y.M., Zhang X., Gao J.P., "Whole exome sequencing identifies mutation of EDNRA involved in ACTH-independent macronodular adrenal hyperplasia", Fam Cancer. Jun 11.

Is your publication missing? Contact us with the reference.
