ACKNOWLEDGMENTS AND CITING

When using results from ExonMine in publications, you are requested to cite the following reference:

Unconstrained mining of transcript data reveals increased alternative splicing complexity in the human transcriptome. Mollet IG, Ben-Dov C, Felício-Silva D, Grosso AR, Eleutério P, Alves R, Staller R, Silva TS, Carmo-Fonseca M. Nucleic Acids Res. 2010 Apr 12.

Please also quote the ExonMine webserver, including the link http://www.imm.fm.ul.pt/exonmine/, and the UCSC Genome Browser, including the link http://genome.ucsc.edu/, and cite any relevant publications detailed below.

Financial Support
This project was supported by the Muscular Dystrophy Association (MDA3662), the European Commission (LSHG-CT-2005-518238, EURASNET), and Fundação para a Ciência e Tecnologia, Portugal (PTDC/SAU-GMG/69739/2006).

Personal Acknowledgements
We are indebted to Juan Valcárcel (CRG-Centre de Regulació Genómica, Barcelona, Spain) and members of his lab Britta Hartmann and Josefin Lundgren for detailed discussions on the data, and to Samuel Aparicio (BC Cancer Agency, Vancouver, British Columbia, Canada) for initial ideas critical to the success of this project.

Restrictions on the use of the sequences are particular to each assembly and are detailed below.

Genomic Sequence Data

Human genomic sequence data (Homo sapiens)

Chimp genomic sequence data (Pan troglodytes)

Rhesus genomic sequence data (Macaca mulatta)

Mouse genomic sequence data (Mus musculus)

Rat genomic sequence data (Rattus norvegicus)

Cow genomic sequence data (Bos taurus)

Dog genomic sequence data (Canis lupus familiaris)

Chicken genomic sequence data (Gallus gallus)

Xenopus genomic sequence data (Xenopus tropicalis)

Zebrafish genomic sequence data (Danio rerio)

Drosophila genome sequence data (Drosophila melanogaster)

Nematode genome sequence data (Caenorhabditis elegans)

Ciona genome sequence data (Ciona intestinalis)

Data Sources

Tools

Financial Support

Personal Acknowledgements

Genomic Sequence Data

All sequences presented on this webserver were obtained from the relevant assemblies made available through the UCSC Genome Browser [Karolchik 2002, Kuhn 2007]. These sequences result from large collaborative efforts of research institutes throughout the world. Please refer to the credits page at the UCSC Genome Browser at http://genome.ucsc.edu/goldenPath/credits.html for details on the production of each assembly.

Human genomic sequence data (Homo sapiens)

The human genome sequence data used, referred to as hg18, the March 2006 human reference sequence (NCBI Build 36.1), were produced by the International Human Genome Sequencing Consortium. All the human genomic sequences are freely available for public use, and references to these data should cite the Human Genome Consortium paper:

International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409(6822), 860-921 (2001).

Chimp genomic sequence data (Pan troglodytes)

The March 2006 chimp genome draft assembly data used, referred to as the panTro2 assembly, were produced by the Chimpanzee Genome Sequencing Consortium (Build 2, Version 1 chromosome-based assembly). References to these data should cite the paper:

The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005 Sep 1;437(7055):69-87.

The chimpanzee sequence is made freely available before scientific publication from the Chimpanzee Genome Sequencing Consortium with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged in databases.

2. Users are free to use the data in scientific papers analyzing particular genes and regions if the providers of this data (the Chimpanzee Genome Sequencing Consortium) are properly acknowledged.

3. The centers producing the data reserve the right to publish the initial large-scale analyses of the data set, including large-scale identification of regions of evolutionary conservation and large-scale genomic assembly. Large-scale refers to regions with size on the order of a chromosome (that is, 30 Mb or more).

4. Any redistribution of the data should carry this notice.

Rhesus genomic sequence data (Macaca mulatta)

The sequencing and assembly of the Macaca mulatta genome is a project of the Rhesus Macaque Genome Sequencing Consortium (RMGSC) led by the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC), in collaboration with the J. Craig Venter Institute Joint Technology Center, and the Genome Sequencing Center at Washington University School of Medicine, St. Louis.

This assembly is provided with the following acknowledgements:

- Funding: National Human Genome Research Institute (NHGRI), USA

- Sequencing/Assembly: BCM HGSC, Houston, TX, USA in collaboration with: the J. Craig Venter Science Foundation Joint Technology Center, Rockville, MD, USA; and the Genome Sequencing Center at Washington University School of Medicine, St. Louis, MO, USA

- BAC resources: Children's Hospital Oakland Research Institute (CHORI), Oakland, CA, USA

- BAC-based fingerprint map: Genome Sciences Centre, Vancouver, B.C.

- UCSC Rhesus Genome Browser (rheMac2) and Initial Annotations: UCSC Genome Bioinformatics Group, Santa Cruz, CA, USA - Robert Baertsch, Kayla Smith, Ann Zweig, Robert Kuhn, and Donna Karolchik.

For more information on the rhesus genome project, see the BCM HGSC Rhesus Monkey Genome Project web page.

These data are made available before scientific publication with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged in databases.

2. Users are free to use the data in scientific papers analyzing particular genes and regions if the providers of this data (Baylor College of Medicine Human Genome Sequencing Center and the Rhesus Macaque Genome Sequencing Consortium) are properly acknowledged. Please cite the BCM-HGSC web site or publications from BCM-HGSC referring to the genome sequence.

3. The BCM-HGSC and RMGSC plan to publish the assembly and genomic annotation of the dataset, including large-scale identification of regions of evolutionary conservation.

4. This is in accordance with, and with the understandings in the Fort Lauderdale meeting discussing Community Resource Projects and the resulting NHGRI policy statement.

5. Any redistribution of the data should carry this notice.

Mouse genomic sequence data (Mus musculus)

The mouse genome sequence data used, referred to as mm9, the July 2007 mouse genome data (Build 37), are made available by the UCSC Mouse Genome Project in collaboration with the Mouse Sequencing Consortium and the Mouse Genome Sequencing Consortium [Mouse Genome Sequencing Consortium].

All the mouse sequence data is freely available for public use.

Mouse genome sequence data are released weekly into a public repository maintained by EBI and NCBI.

References to these data should cite the following publication:

Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520-562 (2002).

Rat genomic sequence data (Rattus norvegicus)

The Rat genome sequence data used is referred to as the Nov. 2004 update of the rat genome (rn4, Nov. 2004, version 3.4) from the Rat Genome Sequencing Consortium.

The assembly was produced at the Baylor College of Medicine Human Genome Sequencing Center.

For more information on the rat genome, see the Rat Genome Project website for the Baylor College of Medicine Human Genome Sequencing Center at http://www.hgsc.bcm.tmc.edu/.

The rat genome sequence is made freely available by the Rat Genome Project at the Baylor College of Medicine Human Genome Sequencing Center.

Please cite the following publications when using these data:

Havlak, P. et al. The Atlas genome assembly system. Genome Res. 14(4), 721-32 (2004).

Rat Genome Sequencing Project Consortium. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428(6982), 493-521 (2004).

These data are made available before scientific publication with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged in databases.

2. Users are free to use the data in scientific papers analyzing particular genes and regions if the providers of this data (the Rat Genome Sequencing Consortium) are properly acknowledged.

3. The Centers producing the data reserve the right to publish the initial large-scale analyses of the dataset, including large-scale identification of regions of evolutionary conservation and large-scale genomic assembly. Large-scale refers to regions with size on the order of a chromosome (that is, 30 Mb or more).

4. This is in accordance with, and with the understandings in the Fort Lauderdale meeting discussing Community Resource Projects (see http://www.wellcome.ac.uk/en/1/awtpubrepdat.html) and the resulting NHGRI policy statement (http://www.genome.gov/page.cfm?pageID=10506537).

5. Any redistribution of the data should carry this notice.

Cow genomic sequence data (Bos taurus)

The Cow genome sequence data used is referred to as the August 2006 assembly of the cow genome (bosTau3, Baylor Release 3.1).

This assembly was produced by the Baylor College of Medicine Human Genome Sequencing Center.

For more information on the cow genome, see the project website: http://www.hgsc.bcm.tmc.edu/.

This data is made available before scientific publication with the following understanding:

- The data may be freely downloaded, used in analyses, and repackaged in databases.

- Users are free to use the data in scientific papers analyzing particular genes and regions if the providers of this data are properly acknowledged. Please cite the BCM-HGSC web site or publications from BCM-HGSC referring to the genome sequence.

- BCM HGSC plans to publish the assembly and genomic annotation of the dataset, including large-scale identification of regions of evolutionary conservation.

- This is in accordance with, and with the understandings in the Fort Lauderdale meeting discussing Community Resource Projects and the resulting NHGRI policy statement.

- Any redistribution of the data should carry this notice.

For conditions of use regarding the Cow genome sequence data, see http://www.hgsc.bcm.tmc.edu/projects/conditions_for_use.html .

Dog genomic sequence data (Canis lupus familiaris)

The dog genomic sequence data used, referred to as the UCSC version canFam2, May 2005 dog genome assembly, was produced by the Broad Institute at MIT.

The dog genome sequence is made freely available by the Dog Genome Sequencing Project.

Please cite the following publication when using these data:

Lindblad-Toh K, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005 Dec 8;438:803-19.

Chicken genomic sequence data (Gallus gallus)

The chicken sequence data used, referred to as galGal3, May 2006 chicken v2.1 draft assembly, was produced by the Genome Sequencing Center at the Washington University School of Medicine in St. Louis, MO, USA (WUSTL).

The G. gallus sequence is made freely available to the community by the Genome Sequencing Center, Washington University School of Medicine, with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged in databases.

2. Users are free to use the data in scientific papers analyzing particular genes and regions if the providers of this data (Genome Sequencing Center, Washington University School of Medicine) are properly cited as:

International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 2004 Dec 9;432(7018): 695-716. PMID: 15592404.

3. Any redistribution of the data should carry this notice.

Xenopus genomic sequence data (Xenopus tropicalis)

The Xenopus sequence data used, referred to as xenTro2, the August 2005 frog (Xenopus tropicalis) whole genome shotgun (WGS) assembly version 4.1, were sequenced and assembled by the DOE Joint Genome Institute (JGI).

These sequence data are made freely available by the DOE Joint Genome Institute (JGI). Preliminary drafts of the X. tropicalis sequence are made freely available before scientific publication by the JGI and the X. tropicalis Genome Consortium, with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged in databases.

2. Users are free to use the data in scientific papers analyzing particular genes and regions if the provider of this data (DOE Joint Genome Institute) is properly acknowledged.

3. Additional shotgun sequencing is ongoing, and future assembly releases will be made in a timely fashion. We expect to publish an initial analysis of a high quality draft X. tropicalis genome sequence in 2005 (with submission targeted for the spring of 2005) which will include descriptions of the large scale organization of the frog genome as well as genome-scale comparisons of the frog sequence and gene set with those of other animals. Others who would like to coordinate other genome-wide analysis with this work should contact Paul Richardson (pmrichardson@lbl.gov), JGI. We welcome a coordinated approach to describing this community resource.

4. Any redistribution of the data should carry this notice.

Please refer to the JGI data release policy for data use guidelines at http://genome.jgi-psf.org/Xentr4/Xentr4.download.html .

Zebrafish genomic sequence data (Danio rerio)

The zebrafish sequence data used, referred to as the Jul. 2007 Zv7 assembly of the zebrafish genome (danRer5), were produced by the Zebrafish Sequencing Group at the Sanger Institute, a collaboration between the Wellcome Trust Sanger Institute in Cambridge, UK; the Max Planck Institute for Developmental Biology in Tuebingen, Germany; the Netherlands Institute for Developmental Biology (Hubrecht Laboratory), Utrecht, The Netherlands; and Yi Zhou and Leonard Zon from the Children's Hospital in Boston, Massachusetts.

For more information on the zebrafish genome, see the project website at http://www.sanger.ac.uk/Projects/D_rerio/ or the FTP site at ftp://ftp.sanger.ac.uk/pub/zebrafish/ .

All sequence data are made available before scientific publication with the understanding that the groups involved in generating the data intend to publish the initial large-scale analyses of the dataset.

This will include a summary detailing the data that have been generated and key features of the genome identified from genomic assembly and clone mapping/sequencing.

Any redistribution of the data should carry this notice.

Please adhere to the data use guidelines detailed in the use-policy pages at http://www.sanger.ac.uk/notices/use-policy.shtml .

Drosophila genome sequence data (Drosophila melanogaster)

The Drosophila sequence data used, referred to as dm3, the Apr. 2006 Drosophila melanogaster draft assembly (BDGP Release 5), were provided by the Berkeley Drosophila Genome Project (BDGP). For more information on the D. melanogaster genome, see the release notes: http://www.fruitfly.org/sequence/release5genomic.shtml .

All sequence data is freely available for public use.

For additional information about these data, including citation guidelines, see the BDGP web site at http://www.fruitfly.org/ .

Nematode genome sequence data (Caenorhabditis elegans)

The C. elegans sequence data used, referred to as ce4, the Jan. 2007 Caenorhabditis elegans assembly, are based on sequence version WS170 deposited into WormBase as of 19 January 2007. These data were produced jointly by the Sanger Institute in Hinxton, England and the Genome Sequencing Center at Washington University in St. Louis (WUSTL) School of Medicine. All of these sequence data are freely available for public use.

Ciona genome sequence data (Ciona intestinalis)

The Ciona intestinalis data used is referred to as the Mar. 2005 freeze of the C. intestinalis genome (ci2) from the US Department of Energy's (DOE) Joint Genome Institute (JGI).

The C. intestinalis sequence is made freely available by JGI (http://www.jgi.doe.gov/).

For restrictions on the use of these data, see the JGI Data Release Policy at http://genome.jgi-psf.org/ciona4/ciona4.download.html.

Data Sources

The transcribed sequence data used are those deposited in GenBank [Benson et al. 2007] as mRNA or EST at the time of each update of our data. This includes the NCBI reference sequences (RefSeq) [Pruitt et al. 2007]. Gene loci were determined using the gene-centered information produced by Entrez Gene [Maglott et al. 2007] at NCBI. BLAT [Kent 2002] mappings of transcribed sequences onto the relevant genome assemblies were collected from the UCSC Table Browser [Karolchik et al. 2004] at the UCSC Genome Browser [Kent et al. 2002, Kuhn et al. 2007]. Variants of the canonical polyadenylation site AATAAA were based on signal usage in human genes [Beaudoing et al. 2000].

Tools

The data were processed using Perl and MySQL, and the web interface was generated using MySQL, PHP, R and Apache. We are deeply indebted to the communities that provide these tools as open source software.

Financial Support

This project was supported by the Muscular Dystrophy Association (MDA) and the European Commission EURASNET.

Personal Acknowledgements

We are indebted to Juan Valcárcel from the CRG-Centre de Regulació Genómica, Barcelona, Spain, for many detailed discussions on the data, and to Sam Aparicio from the BC Cancer Research Centre in Vancouver, British Columbia, Canada, for initial ideas critical to the success of this project.