The genome sequence, when printed, fills a huge book of close print.

Genome projects are scientific endeavours that aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features.[1] The genome sequence of an organism includes the collected sequences of each chromosome in the organism's DNA. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.

The Human Genome Project was a landmark genome project that is having a major impact on research across the life sciences, with the potential to spur numerous medical and commercial developments.[2]

Genome assembly

Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 1000 nucleotides or bases at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.) A genome assembly algorithm works by taking all the pieces, aligning them to one another, and detecting all the places where two or more of the short sequences, or "reads", overlap. These overlapping reads can be merged, and the process continues.
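The overlap-and-merge idea described above can be illustrated with a minimal sketch. It assumes error-free reads, a single strand, and a simple greedy strategy; the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and real assemblers use far more elaborate methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGT", "CGTACC", "ACCTGA"]
print(greedy_assemble(reads))
```

The pairwise comparison here is quadratic in the number of reads, which is acceptable for a toy example but not for the millions of reads in a real project.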

Genome assembly is a computational problem, made more difficult by the fact that many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations, especially in the large genomes of plants and animals.
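One way to see why repeats cause trouble is to count substrings of a fixed length k (k-mers) that occur more than once: a read that falls entirely inside such a substring fits equally well at every copy. The sketch below is illustrative only; the function name `repeated_kmers` and the example sequence are made up.

```python
from collections import Counter


def repeated_kmers(sequence: str, k: int) -> dict[str, int]:
    """Return every k-mer that occurs more than once, with its count."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# A toy sequence containing the same 7-base stretch twice.
toy = "GATTACACCGATTACA"
print(repeated_kmers(toy, 7))
```

A read consisting only of the repeated stretch cannot be placed unambiguously, which is why assemblers need reads or linking information that span the repeat.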

The resulting genome sequence is produced by combining the sequenced contigs and then using linking information to create scaffolds. The scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
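As a rough illustration of the scaffolding step, the sketch below chains contigs according to linking information and pads each gap with runs of `N`, a common way of marking unknown bases. The data layout (a dictionary of named contigs and a list of `(left, right, estimated_gap)` links) and the function name `build_scaffold` are assumptions made for this example, not the format of any particular tool.

```python
def build_scaffold(contigs: dict[str, str],
                   links: list[tuple[str, str, int]]) -> str:
    """Chain contigs following links, padding each gap with N characters.

    Each link is (left_contig, right_contig, estimated_gap_in_bases).
    The links are assumed to form one linear chain.
    """
    next_link = {left: (right, gap) for left, right, gap in links}
    rights = {right for _, right, _ in links}
    # The chain starts at the only contig that never appears on the right.
    starts = [name for name in next_link if name not in rights]
    if len(starts) != 1:
        raise ValueError("links must form a single linear chain")
    name = starts[0]
    parts = [contigs[name]]
    while name in next_link:
        name, gap = next_link[name]
        parts.append("N" * gap)  # unknown sequence between the contigs
        parts.append(contigs[name])
    return "".join(parts)


contigs = {"c1": "ACGT", "c2": "GGCC", "c3": "TTAA"}
links = [("c2", "c3", 2), ("c1", "c2", 3)]
print(build_scaffold(contigs, links))
```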

Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an assembler is the Short Oligonucleotide Analysis Package, developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.[3][4][5]

Genome annotation

Genome annotation is the process of attaching biological information to sequences.[6] It consists of three main steps:

  1. identifying portions of the genome that do not code for proteins
  2. identifying elements on the genome, a process called gene prediction, and
  3. attaching biological information to these elements.

Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

The basic level of annotation is using BLAST for finding similarities, and then annotating genomes based on that.[1] However, nowadays more and more additional information is added to the annotation platform. The additional information allows manual annotators to deconvolute discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources as well as a range of different software tools in their automated genome annotation pipeline.[7]
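The idea of annotating by similarity can be sketched with a toy stand-in for a similarity search. The example below is not BLAST: it scores two sequences by the fraction of shared k-mers and copies the label of the best-scoring reference sequence when the score passes a threshold. The function names, the threshold value, and the reference sequences are all hypothetical.

```python
from typing import Optional


def kmer_set(seq: str, k: int = 3) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def annotate(query: str, reference: dict[str, str],
             threshold: float = 0.5) -> Optional[str]:
    """Return the label of the most similar reference sequence, if any."""
    best_label, best_score = None, 0.0
    for label, seq in reference.items():
        score = similarity(query, seq)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Made-up protein fragments used only for illustration.
reference = {"kinase": "MKTAYIAKQR", "transporter": "GGLLVVAAPP"}
print(annotate("MKTAYIAKQK", reference))
```

Annotations transferred this way inherit any errors in the reference, which is one reason additional evidence and manual curation are layered on top.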

Structural annotation consists of the identification of genomic elements.

  • ORFs and their localisation
  • gene structure
  • coding regions
  • location of regulatory motifs
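Locating ORFs, the first item above, can be sketched as a scan for a start codon followed in the same reading frame by a stop codon. This minimal version assumes the standard stop codons, searches the forward strand only, and ignores introns; the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Find forward-strand ORFs in all three reading frames.

    Returns (start, end, sequence) tuples; `end` is exclusive and the
    sequence includes the stop codon. `min_codons` counts codons before
    the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATGGTAACC"))
```

Gene prediction programs add statistical models of coding sequence on top of this kind of scan, since many ORFs found this way are not real genes.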

Functional annotation consists of attaching biological information to genomic elements.

  • biochemical function
  • biological function
  • involved regulation and interactions
  • expression

These steps may involve both biological experiments and in silico analysis. Proteogenomics based approaches utilize information from expressed proteins, often derived from mass spectrometry, to improve genomics annotations.[8]

A variety of software tools have been developed to permit scientists to view and share genome annotations.

Genome annotation remains a major challenge for scientists investigating the human genome, now that the genome sequences of more than a thousand human individuals and several model organisms are largely complete.[9][10] Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism.[1] Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".[11]

Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means. Here is an alphabetical listing of on-going projects relevant to genome annotation:

At Wikipedia, genome annotation has started to become automated under the auspices of the Gene Wiki portal which operates a bot that harvests gene data from research databases and creates gene stubs on that basis.[12]

When is a genome project finished?

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to only determining a DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.

Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans) it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with prior knowledge of approximately where each piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (although there are caveats to this approach when compared with the traditional approach).

Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide which new genomes to sequence, the emphasis has been on species that are of high importance as model organisms, that are relevant to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitos), or that have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.

Example genome projects

 
L1 Dominette 01449, the Hereford who serves as the subject of the Bovine Genome Project

Many organisms have genome projects that have either been completed or will be completed shortly, including:

See also

References

  1. ^ a b c Pevsner, Jonathan (2009). Bioinformatics and functional genomics (2nd ed.). Hoboken, N.J: Wiley-Blackwell. ISBN 9780470085851.
  2. ^ "Potential Benefits of Human Genome Project Research". Department of Energy, Human Genome Project Information. 2009-10-09. Retrieved 2010-06-18.
  3. ^ Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun (February 2010). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Research. 20 (2): 265–272. doi:10.1101/gr.097261.109. ISSN 1549-5469. PMC 2813482. PMID 20019144.
  4. ^ a b Rasmussen, Morten; et al. (2010-02-11). "Ancient human genome sequence of an extinct Palaeo-Eskimo". Nature. 463 (7282): 757–762. doi:10.1038/nature08835. ISSN 1476-4687. PMC 3951495. PMID 20148029.
  5. ^ Wang, Jun; et al. (2008-11-06). "The diploid genome sequence of an Asian individual". Nature. 456 (7218): 60–65. doi:10.1038/nature07484. ISSN 0028-0836. PMC 2716080. PMID 18987735.
  6. ^ Stein, L. (2001). "Genome annotation: from sequence to biology". Nature Reviews Genetics. 2 (7): 493–503. doi:10.1038/35080529. PMID 11433356.
  7. ^ "Ensembl's genome annotation pipeline online documentation".
  8. ^ Gupta, Nitin; Tanner, Stephen; Jaitly, Navdeep; Adkins, Joshua N.; Lipton, Mary; Edwards, Robert; Romine, Margaret; Osterman, Andrei; Bafna, Vineet; Smith, Richard D.; Pevzner, Pavel A. (September 2007). "Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation". Genome Research. 17 (9): 1362–1377. doi:10.1101/gr.6427907. ISSN 1088-9051. PMC 1950905. PMID 17690205.
  9. ^ ENCODE Project Consortium (2011). Becker PB (ed.). "A User's Guide to the Encyclopedia of DNA Elements (ENCODE)". PLOS Biology. 9 (4): e1001046. doi:10.1371/journal.pbio.1001046. PMC 3079585. PMID 21526222.
  10. ^ McVean, G. A.; Abecasis, D. M.; Auton, R. M.; Brooks, G. A. R.; Depristo, D. R.; Durbin, A.; Handsaker, A. G.; Kang, P.; Marth, E. E.; McVean, P.; Gabriel, S. B.; Gibbs, R. A.; Green, E. D.; Hurles, M. E.; Knoppers, B. M.; Korbel, J. O.; Lander, E. S.; Lee, C.; Lehrach, H.; Mardis, E. R.; Marth, G. T.; McVean, G. A.; Nickerson, D. A.; Schmidt, J. P.; Sherry, S. T.; Wang, J.; Wilson, R. K.; Gibbs (Principal Investigator), R. A.; Dinh, H.; Kovar, C. (2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.
  11. ^ Dunham, I.; Bernstein, A.; Birney, S. F.; Dunham, P. J.; Green, C. A.; Gunter, F.; Snyder, C. B.; Frietze, S.; Harrow, J.; Kaul, R.; Khatun, J.; Lajoie, B. R.; Landt, S. G.; Lee, B. K.; Pauli, F.; Rosenbloom, K. R.; Sabo, P.; Safi, A.; Sanyal, A.; Shoresh, N.; Simon, J. M.; Song, L.; Trinklein, N. D.; Altshuler, R. C.; Birney, E.; Brown, J. B.; Cheng, C.; Djebali, S.; Dong, X.; Dunham, I. (2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
  12. ^ Huss, Jon W.; Orozco, C; Goodale, J; Wu, C; Batalov, S; Vickers, TJ; Valafar, F; Su, AI (2008). "A Gene Wiki for Community Annotation of Gene Function". PLOS Biology. 6 (7): e175. doi:10.1371/journal.pbio.0060175. PMC 2443188. PMID 18613750.
  13. ^ Yates, Diana (2009-04-23). "What makes a cow a cow? Genome sequence sheds light on ruminant evolution" (Press Release). EurekAlert!. Retrieved 2012-12-22.
  14. ^ Elsik, C. G.; et al. (2009). "The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution". Science. 324 (5926): 522–528. doi:10.1126/science.1169588. PMC 2943200. PMID 19390049.
  15. ^ http://www.genome.gov/20519480

External links