Evolution of Genomes

Definitions

  1. Genomics: the study of whole sets of genes and their interactions within a species, as well as genome comparisons between species.
  2. Bioinformatics: the use of computers, software, and mathematical models to process and integrate biological information from large data sets.
  3. Linkage map: a genetic map based on the frequencies of recombination between markers during crossing over of homologous chromosomes.
  4. Physical map: A genetic map in which the physical distances between genes or other genetic markers are expressed (often based on the number of base pairs along the DNA).
  5. Dideoxy chain-termination method (Sanger sequencing): “To sequence the DNA, it must first be separated into two strands. The strand to be sequenced is copied using chemically altered bases. These altered bases cause the copying process to stop each time one particular letter is incorporated into the growing DNA chain. This process is carried out for all four bases, and then the fragments are put together like a jigsaw to reveal the sequence of the original piece of DNA,” (Source 4).
  6. Metagenomics: the collection and sequencing of DNA from a group of species. Computer software sorts partial sequences and assembles them into genome sequences of individual species making up the sample.
  7. Gene annotation: analysis of genomic sequences to identify protein-coding genes and determine the function of their products.
  8. Proteomics: the systematic study of the full protein sets (proteomes) encoded by genomes.
  9. Pseudogenes: DNA segments that closely resemble functional genes but do not yield a functional product; a pseudogene is a DNA segment that formerly functioned as a gene but has become inactivated in a particular species because of mutation.
  10. Repetitive DNA: Nucleotide sequences, usually noncoding, that are present in many copies in a eukaryotic genome. The repeated units may be short and arranged tandemly or long and dispersed in the genome.
  11. Transposable element: a segment of DNA that can move within the genome of a cell by means of a DNA or RNA intermediate; also called a transposable genetic element.
  12. Transposon: a transposable element that moves within a genome by means of a DNA intermediate, “by a “cut and paste” mechanism, which removes the element from the original site, or by a “copy and paste” mechanism, which leaves a copy behind” (Source 1).
  13. Retrotransposon: a transposable element that moves within a genome by means of an RNA intermediate, a transcript of the retrotransposon DNA.
  14. Simple sequence DNA: a DNA sequence that contains many copies of tandemly repeated short sequences. The repeated unit can be as long as 500 nucleotides but is often shorter than 15. Simple sequence DNA makes up 3% of the human genome and has a different density than the rest of the genome. It is often located at chromosomal telomeres and centromeres, which suggests a structural role.
  15. Short tandem repeat (STR): simple sequence DNA containing multiple tandemly repeated units of two to five nucleotides. Variations in STRs act as genetic markers in STR analysis, used to prepare genetic profiles.
  16. Multigene family: a collection of genes with similar or identical sequences, presumably of common origin.
  17. The Human Genome Project: began in 1990, as an international effort to sequence the human genome. It involved 20 large sequencing centers in 6 countries.
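To make the short tandem repeat (STR) definition above concrete, here is a minimal Python sketch of how a genetic profile can be read as repeat counts. The sequences, the repeat unit GATA, and the function name longest_tandem_run are made up for illustration; real STR analysis works from laboratory fragment data, not from a regular expression.

```python
import re

def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # (?:unit)+ matches one or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical alleles at one STR locus (made-up flanking sequence, unit GATA).
allele_a = "TTC" + "GATA" * 7 + "CCG"
allele_b = "TTC" + "GATA" * 11 + "CCG"

# A profile for this locus is simply the pair of repeat counts.
profile = (longest_tandem_run(allele_a, "GATA"), longest_tandem_run(allele_b, "GATA"))
print(profile)
```

Two people who differ in the number of repeats at a locus will have different profiles, which is why STR variation works as a genetic marker.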

Important Concepts In the Evolution of Genomes:

Three-Stage Approach:

  1. Create a linkage map of genetic markers spaced throughout the chromosomes; their order and relative distances are based on recombination frequencies. The markers may be individual genes or other identifiable DNA sequences (Hancock HS Outline).
  2. Create a physical map, marking the distance between genes based on the number of base pairs between them.
  3. Identify the complete nucleotide sequence of each chromosome, using the dideoxy chain-termination method (Sanger sequencing). There are 3.2 billion base pairs in a haploid set of human chromosomes.

Researchers used the three-stage approach in the Human Genome Project (Source 1).
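The first stage depends on recombination frequencies, so a small worked calculation may help. The sketch below is in Python and assumes the usual convention that 1 map unit corresponds to a 1% recombination frequency; the offspring counts and the function name map_units are hypothetical and are not taken from the sources.

```python
def map_units(recombinant_offspring: int, total_offspring: int) -> float:
    """Convert offspring counts into a linkage-map distance in map units.

    Assumes 1 map unit = 1% recombination frequency.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total_offspring")
    return 100 * recombinant_offspring / total_offspring

# Hypothetical cross: 391 recombinant offspring out of 2,300 scored.
distance = map_units(391, 2300)
print(f"Markers are about {distance:.1f} map units apart")
```

Markers that recombine more often are placed farther apart on the linkage map; the physical map in stage 2 then replaces these relative distances with base-pair counts.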

Whole-Genome Shotgun Approach:

This approach skips linkage and physical mapping and goes straight to sequencing random DNA fragments. Computer software then reassembles the whole genome from these jumbled pieces by finding where their sequences overlap. “The DNA fragments are cloned into three different vectors, each of which takes a defined size of insert. The computer uses the known distance between the ends of the inserted DNA, along with other information, to assemble the sequences” (Source 1, 2).

The whole-genome shotgun approach may be less accurate than the three-stage approach, because it may miss some duplicated sequences and so underestimate the size of the genome under examination (Source 1, 2).
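The idea of reassembling a genome from jumbled fragments can be illustrated with a toy example. The Python sketch below uses a simple greedy strategy (repeatedly merge the two fragments with the longest overlap). This is an assumption chosen for clarity, not the algorithm used by real assemblers, and the fragments, function names, and min_len threshold are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len: int = 3):
    """Merge fragments pairwise, always taking the largest overlap first."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to join; return the separate pieces
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Made-up reads from a made-up 15-base "genome", given in jumbled order.
reads = ["CGTTAGC", "ATGGCGTAC", "CGTACGTTA"]
print(greedy_assemble(reads))
```

Because the method relies only on overlaps, two copies of a duplicated sequence look identical to it and can be merged into one, which is one way an assembly can end up shorter than the real genome.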

Identifying Protein-Coding Genes Within DNA Sequences:

  • To identify genes and their functions, scientists use software that scans DNA sequences for “transcriptional and translational start and stop signals, RNA splicing sites, and other signs of protein-coding genes” (Source 1).
    • The software also looks for short sequences that match known mRNAs; these are called expressed sequence tags (ESTs)
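As a rough illustration of scanning for translational start and stop signals, the Python sketch below looks for open reading frames: an ATG start codon followed, in the same reading frame, by a stop codon. It is a deliberately simplified assumption. It reads only one strand, ignores introns, splice sites, and promoters, and the function name find_orfs and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for each ATG...stop stretch on this strand.

    `min_codons` counts codons from ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three possible reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":           # translational start signal
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:   # translational stop signal
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j                   # resume scanning after this stop
                        break
            i += 3
    return orfs

# Made-up sequence: two filler bases, a short ORF, two filler bases.
example = "CC" + "ATGGCTGAATTTTAG" + "GC"
print(find_orfs(example))
```

Real gene-finding software combines many more signals than this, but the core step is the same: read the sequence codon by codon and flag stretches that look like they could be translated.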

Biochemical vs. Functional approach:

  • The biochemical approach determines the 3-D structure of the protein.
  • The functional approach disables the gene to see the effect on the phenotype.

An example of a protein’s 3D structure (Source 6)

Genes and Gene Expression at the Systems Level (Source 1)

  • Researchers are using new information to create catalogs of genes to see how they fit into the larger picture of biology
  • An application of systems biology is defining gene circuits and networks of protein interactions
  • Another application of systems biology is the Cancer Genome Atlas, where many interacting genes are examined together
  • From 2007 to 2010, researchers compared gene sequences and patterns of expression in lung cancer, ovarian cancer, and brain glioblastoma cells with those in normal cells
  • The GeneChip is a microarray that holds most known human genes

Genomes vary widely

  • Bacterial genomes tend to have between 1 and 6 million base pairs (Mb); E. coli has 4.6 Mb
  • Archaeal genomes mostly fall within the same size range
  • Eukaryotic genomes are larger; most multicellular plants/animals have at least 100 Mb
  • Bacteria and archaea have 1,500 to 7,500 genes, while eukaryotes range from about 5,000 to 40,000. Humans have an estimated 20,488 genes.
  • Gene density also varies; eukaryotes tend to have larger genomes but lower gene density than prokaryotes (Source 1). Multiple reasons for this discrepancy include the fact that bacteria do not have introns and “nontranscribed regulatory sequences” make up only a small section of their DNA.
  • Humans have 10,000 times as much noncoding DNA as bacteria (1.5% of human genome codes for proteins or gives rise to rRNAs or tRNAs).
  • Human genes have 27,000 base pairs on average, while bacterial genes have 1,000 base pairs on average.
  • Humans have a relatively low number of genes for their genome size; they compensate through alternative splicing of RNA transcripts, which generates several functional proteins from a single gene.
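The gene density comparison above can be checked with simple arithmetic. The Python sketch below uses the genome sizes and the human gene count from these notes; the E. coli gene count of about 4,400 is an assumption that does not appear in the notes, and the function name genes_per_mb is hypothetical.

```python
def genes_per_mb(gene_count: int, genome_size_mb: float) -> float:
    """Gene density expressed as genes per million base pairs (Mb)."""
    return gene_count / genome_size_mb

# E. coli: 4.6 Mb (from the notes); about 4,400 genes (assumed, not in the notes).
ecoli_density = genes_per_mb(4_400, 4.6)

# Human: 3.2 billion bp = 3,200 Mb haploid genome; 20,488 genes (from the notes).
human_density = genes_per_mb(20_488, 3_200)

print(f"E. coli: about {ecoli_density:.0f} genes per Mb")
print(f"Human:   about {human_density:.1f} genes per Mb")
print(f"E. coli packs roughly {ecoli_density / human_density:.0f} times more genes into each Mb")
```

Under these numbers the bacterium has close to a thousand genes per Mb while the human genome has fewer than ten, which is the sense in which eukaryotes have larger genomes but lower gene density.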

A reminder of how eukaryotic DNA is organized (Source 7)

What do transposable elements do?

  • During a process called transposition, a transposable element (TE) moves from one site in a cell’s DNA to another by a type of recombination; the original and new DNA sites are brought together by DNA bending.
  • In eukaryotes, there are two types of TEs:
    • Transposons (see definitions)
    • Retrotransposons make up the majority of transposable elements
  • Retrotransposons always leave a copy at the original site, because they are first transcribed into RNA, which is then converted back to DNA by reverse transcriptase (an enzyme encoded by the retrotransposon) and inserted at a new site.
    • Retroviruses may have evolved from retrotransposons.
  • Alu elements are TEs that make up about 10% of the human genome and are about 300 nucleotides long. They do not code for a protein, although many are transcribed into RNA molecules, and they have no known function (Source 1, 2, 5).
  • LINE-1 (L1) elements are retrotransposons that make up 17% of the human genome, are about 6,500 base pairs long, have a low rate of transposition, and contain sequences that block the progress of RNA polymerase (the enzyme necessary for transcription).
    • might help in regulating gene expression (Source 1). 
  • If a genome contains multiple copies of a transposable element, the similar sequences scattered around the genome could facilitate crossing over between different sites.
    • their insertion could interrupt protein-coding sequences, affect promoters, or carry new genes to an area
    • could create new sites for alternative RNA splicing (Source 8).
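The difference between "cut and paste" and "copy and paste" movement can be shown by treating a genome as a string. This Python sketch is only a model: the bracketed marker [TE], the host sequence, and both function names are hypothetical, and the copy-and-paste function stands in for any mechanism that leaves the original behind, including retrotransposition through an RNA intermediate.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Remove genome[start:end] and reinsert it at `target`.

    `target` is a position in the genome after the element has been removed.
    """
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]

def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Insert a new copy of genome[start:end] at `target`, keeping the original.

    `target` is a position in the original genome, assumed to lie outside the element.
    """
    element = genome[start:end]
    return genome[:target] + element + genome[target:]

# Toy genome: lowercase letters are host DNA, [TE] marks the transposable element.
genome = "aaaa[TE]bbbbcccc"

moved = cut_and_paste(genome, 4, 8, target=8)
copied = copy_and_paste(genome, 4, 8, target=12)

print(moved, moved.count("[TE]"))
print(copied, copied.count("[TE]"))
```

Only the copy-and-paste style increases the number of elements in the genome, which fits the observation that retrotransposons make up the majority of transposable elements.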

Example of transposon movement (Source 8)

Image (Source 9)

Other parts of the human genome:

  • About 15% of human DNA consists of repetitive sequences unrelated to transposable elements; these likely resulted from mistakes in DNA replication
  • About 5% of the human genome consists of duplications of long stretches of DNA, 10,000 to 300,000 bp each
  • Simple-sequence DNA makes up 3% of the human genome.
  • Gene-related DNA makes up 25% of the human genome.
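To get a feel for what these percentages mean in base pairs, the Python sketch below converts each figure quoted in these notes into megabases (Mb), using the 3.2 billion base pair haploid genome size given earlier. The category labels and variable names are hypothetical. The categories overlap (for example, coding DNA is part of gene-related DNA), so the values are not meant to be added together.

```python
HAPLOID_GENOME_MB = 3_200  # 3.2 billion base pairs, expressed in Mb

# Percentages as quoted in these notes; categories overlap, so do not sum them.
fractions_percent = {
    "gene-related DNA": 25,
    "protein-, rRNA-, or tRNA-coding DNA": 1.5,
    "L1 elements": 17,
    "Alu elements": 10,
    "repeats unrelated to TEs": 15,
    "simple-sequence DNA": 3,
}

# Convert each percentage into an approximate size in Mb.
sizes_mb = {
    name: HAPLOID_GENOME_MB * percent / 100
    for name, percent in fractions_percent.items()
}

for name, size in sizes_mb.items():
    print(f"{name}: about {size:.0f} Mb")
```

Seen this way, the DNA that actually codes for proteins, rRNAs, or tRNAs is a small slice compared with the L1 elements alone.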

Genes:

  • Many eukaryotic genes occur as one copy per haploid set of chromosomes.
  • In the human genome, more than 1/2 of the transcribed DNA occurs in multigene families (e.g., the genes for RNA products and for hemoglobin)

How do genomes evolve?

  • Chromosome sets occasionally duplicate (polyploidy)
  • Chromosome structure may also be altered. For example, segments of chromosomes may be duplicated or inverted
  • Exon shuffling (a molecular mechanism that creates new genes)
  • Movement of TEs

Facts about genomic evolution:

  • Since humans have 23 pairs of chromosomes while chimpanzees have 24 pairs, it is likely that two ancestral chromosomes fused in the human lineage.
  • The rate of duplications and inversions appears to have accelerated about 100 million years ago, which Source 8 links to the extinction of large dinosaurs and the diversification of mammalian species.

A video explaining the evolution of genomes

Evo-Devo: 

  • Homeobox (a 180-nucleotide sequence found in homeotic genes, which regulate gene expression, usually during embryonic development; found in both invertebrates and vertebrates)
  • Scientists first recognized the homeobox in the fruit fly Drosophila melanogaster.

 

Note: The textual information on this page comes from sources 1, 2, 5, and 6


Sources:

  1. http://www.hancockhs.org/ourpages/auto/2010/9/20/47399010/chap21.pdf
  2. AP Biology textbook, Eighth edition
  3. Image: http://sites.duke.edu/dukeresearch/files/2012/05/header_evolutionary.jpg
  4. https://www.dnalc.org/view/15479-Sanger-method-of-DNA-sequencing-3D-animation-with-narration.html
  5. AP Biology textbook, Ninth edition
  6. http://web.stanford.edu/class/cs279/lectures/lecture2.pdf
  7. http://www.nature.com/scitable/content/ne0000/ne0000/ne0000/ne0000/113743374/1_2.jpg
  8. http://www.slideshare.net/smithbio/ap-chapter-21-presentation
  9. http://image.slidesharecdn.com/21lecturegenomeandevolution-150106205000-conversion-gate02/95/21-lecture-genomeandevolution-54-638.jpg?cb=1420599180