CHAPTER 9

 

Genomics is the study of the entire set of genes in a genome, in search of patterns that can be understood only by looking at the genetic system as a whole. Figure 9-2 summarizes various research approaches.

One goal is to obtain a complete consensus sequence of the nucleotides that define a species. To do this, a variety of mapping projects must be performed. Previously we studied how maps can be made for genes that produce discernible phenotypic effects. Although these maps provide a coarse subdivision of the genome, they are of minimal use for the finer-scale studies needed to map the genome in detail.
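
To make the idea of a consensus sequence concrete, here is a minimal Python sketch that builds a majority-vote consensus from sequencing reads that are assumed to be already aligned to one another. The function name `consensus` and the use of `-` for uncovered positions are illustrative choices, not part of any particular sequencing pipeline; real assemblers also weigh base-quality scores and resolve far more complex overlaps.

```python
from collections import Counter

def consensus(aligned_reads: list[str]) -> str:
    """Majority-vote consensus of equal-length, pre-aligned reads.

    '-' marks a position a read does not cover and is ignored.
    Columns with no coverage become 'N'. Ties go to the base seen first
    in the column (Counter.most_common keeps first-encountered order).
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")
    result = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)

# Three hypothetical overlapping reads; the middle one has a base that disagrees.
reads = [
    "ATGCCGTA--",
    "ATGACGTAGC",
    "--GCCGTAGC",
]
print(consensus(reads))  # ATGCCGTAGC
```

At column 4 the reads disagree (C, A, C), and the majority base C is kept; that is the sense in which a consensus smooths over individual differences.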

High-resolution maps employ molecular markers such as restriction endonuclease cutting sites (Figure 9-5) and simple sequence length polymorphisms (SSLPs) (Figures 9-6, 9-7, 9-8). Studying SSLPs often involves using VNTRs (variable number of tandem repeats) as molecular markers. VNTRs are often categorized as either minisatellites (repeat units 15-100 nucleotides long, forming tandem arrays of 1-5 kb) or microsatellites (typically dinucleotide repeats). Because the number of repeats varies among DNA molecules, the different lengths of these regions can serve as alternative alleles of a marker locus, much like the alleles of a gene. These alleles can be distinguished by electrophoresis when the repeat region lies between two restriction endonuclease cutting sites (or between two PCR primers). Such procedures can generate DNA fingerprints. Specific bands may be linked to alleles of other genes and can therefore serve as surrogates when one tries to determine whether those alleles are present.
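
How variation in repeat number becomes distinguishable bands can be sketched in Python. This is a simplified model under stated assumptions: a hypothetical CA microsatellite with 40 bp between each PCR primer and the repeat array, so that product length = left flank + (repeat count × unit length) + right flank. The helper names and the locus are hypothetical.

```python
import re

def count_tandem_repeats(seq: str, unit: str) -> int:
    """Longest uninterrupted run of `unit`, measured in repeat units."""
    # (?:unit)+ matches maximal back-to-back copies of the repeat unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

def amplicon_length(flank_left: int, flank_right: int, unit_len: int, n_repeats: int) -> int:
    """Size (bp) of a PCR product spanning the repeat array plus both flanks."""
    return flank_left + unit_len * n_repeats + flank_right

# Hypothetical marker locus: a CA repeat with 40 bp between each primer and the array.
FLANK_LEFT, FLANK_RIGHT, UNIT = 40, 40, "CA"

def band_sizes(repeat_counts: tuple[int, int]) -> list[int]:
    """Expected gel bands for one diploid individual (a single band if homozygous)."""
    return sorted({amplicon_length(FLANK_LEFT, FLANK_RIGHT, len(UNIT), n)
                   for n in repeat_counts})

allele = "GATT" + "CA" * 12 + "TTGA"
print(count_tandem_repeats(allele, UNIT))  # 12
print(band_sizes((12, 17)))                # [104, 114]
print(band_sizes((15, 15)))                # [110]
```

Each extra repeat adds one unit length (2 bp here) to the product, which is why alleles differing by a few repeats separate on a high-resolution gel.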

High-resolution cytogenetic maps associate particular sequence patterns with visually identifiable chromosomal regions. One way this can be done is by using FISH (fluorescent in situ hybridization) where a fluorescently labeled probe binds to a partially denatured chromosome (Figure 9-11).

Human-rodent somatic cell hybridization studies have been used effectively to assign human loci to their chromosomal locations. Operationally, this works because human and rodent cells can fuse in cell culture, forming heterokaryons (i.e., cells with two nuclei). The rate of fusion increases if Sendai virus is added to the culture: the virus can bind to multiple cells simultaneously, positioning them close to each other and encouraging fusion. The nuclei then fuse, forming a single nucleus with chromosomes from both species (a synkaryon). As the cell goes through subsequent divisions, chromosomes are randomly lost. The lost chromosomes come preferentially from one species (the human), so eventually a cell line develops that has a full set of mouse chromosomes and a subset of human chromosomes. If human cells are subjected to high-energy irradiation before they are fused with mouse cells, their chromosomes break (i.e., the sugar-phosphate backbones break), and smaller sections (smaller linkage units) of human chromosomes become incorporated at random into mouse chromosomes; the resulting cells are called radiation hybrids. Different hybrid cell lines contain different sets of human chromosomes (or chromosome fragments). The presence or absence of a particular human gene product (e.g., an enzyme) across these cell lines allows one to assign the gene for that product to a chromosome: the product appears only in the lines that retain that chromosome.
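
The logic of assigning a gene with a hybrid panel can be sketched as a concordance test: a chromosome is a candidate only if it is present in every line that makes the human product and absent from every line that does not. The panel contents and enzyme scores below are invented for illustration, and only the autosomes (1-22) are checked.

```python
# Hypothetical panel: human chromosomes retained by each hybrid cell line.
panel = {
    "line_A": {1, 5, 11, 17},
    "line_B": {2, 5, 9},
    "line_C": {1, 11, 20},
    "line_D": {3, 9, 17, 20},
    "line_E": {5, 11},
}
# Hypothetical scoring: does each line produce the human form of some enzyme?
enzyme_present = {"line_A": True, "line_B": False, "line_C": True,
                  "line_D": False, "line_E": True}

def concordant_chromosomes(panel: dict[str, set[int]],
                           phenotype: dict[str, bool],
                           chromosomes=range(1, 23)) -> list[int]:
    """Chromosomes present in every line with the product and absent from every line without it."""
    return [c for c in chromosomes
            if all((c in panel[line]) == phenotype[line] for line in panel)]

print(concordant_chromosomes(panel, enzyme_present))  # [11]
```

Chromosome 1 is ruled out because line_E makes the enzyme without it, and chromosome 5 because line_B has it but makes no enzyme; only chromosome 11 is concordant in all five lines.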

 

Physical maps of the genome may be made by creating multiple clones that together represent the entire genome and then ordering these clones so that they reflect their actual order along the chromosomes. Usually these clones contain overlapping regions of genomic DNA. Each clone can be digested with restriction endonucleases to develop a fingerprint (a characteristic set of fragment sizes) for that clone. The number of fragments shared between two clones reflects their relative degree of overlap (Figure 9-15). Sequence tagged sites (STSs) are short DNA regions of known sequence found within clones; clones can also be ordered by determining which ones share STSs (Figure 9-16). Once a complete collection of ordered clones has been produced, their DNA can be arrayed on nitrocellulose filters and denatured. Any nucleic acid obtained from that species can then be labeled for use as a probe, and its position within the genome established by determining which of these clones it is complementary to (Figure 9-17).
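
The two ordering ideas in this paragraph can be sketched in Python: counting shared restriction fragments between two clone fingerprints, and chaining clones by shared STSs. This is a toy greedy approach under simplifying assumptions (exact fragment sizes, error-free STS scoring, a single linear contig); the clone and STS names are hypothetical, and real contig-assembly software is considerably more sophisticated.

```python
def shared_fragments(fp_a: list[int], fp_b: list[int]) -> int:
    """Number of restriction-fragment sizes (bp) two clone fingerprints have in common."""
    # Exact-size matching on sets; real gels would need a size tolerance.
    return len(set(fp_a) & set(fp_b))

def order_clones(sts_content: dict[str, set[str]]) -> list[str]:
    """Greedy STS-content ordering of clones into one contig."""
    def neighbours(clone: str) -> list[str]:
        return [o for o in sts_content
                if o != clone and sts_content[clone] & sts_content[o]]

    # Start at a clone that overlaps only one other clone: a likely contig end.
    ends = sorted(c for c in sts_content if len(neighbours(c)) == 1)
    current = ends[0] if ends else min(sts_content)
    order = [current]
    remaining = set(sts_content) - {current}
    while remaining:
        # Step to the unplaced clone sharing the most STSs with the current one.
        best = max(sorted(remaining),
                   key=lambda o: len(sts_content[current] & sts_content[o]))
        if not sts_content[current] & sts_content[best]:
            break  # no shared STS: the rest would belong to a separate contig
        order.append(best)
        remaining.remove(best)
        current = best
    return order

fingerprints = {"cB": [1200, 3400, 800, 2100], "cD": [2100, 800, 4500, 1700]}
print(shared_fragments(fingerprints["cB"], fingerprints["cD"]))  # 2

sts = {"cA": {"s4", "s5"}, "cB": {"s1", "s2"},
       "cC": {"s5", "s6"}, "cD": {"s2", "s3", "s4"}}
print(order_clones(sts))  # ['cB', 'cD', 'cA', 'cC']
```

The recovered order cB, cD, cA, cC places the STSs s1 through s6 in a consistent left-to-right sequence, even though the clones were listed in a scrambled order.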

STRUCTURAL GENOMICS

Eukaryotic DNA can be categorized as single-copy functional genes, repetitive DNA, or spacer DNA. Single-copy functional genes carry the unique information for the polypeptides necessary for cellular functioning. Spacer DNA is all DNA that does not fit into either of the other categories. Repetitive DNA is subdivided into a number of subcategories, and it may be either coding or noncoding. Repetitive coding DNA can be dispersed throughout the genome (e.g., the human hemoglobin gene family) and may include pseudogenes, which do not code for functional polypeptides but clearly arose from functional members of the gene family. Other gene families are not dispersed throughout the genome but instead form tandem arrays (e.g., the histone genes and the nucleolar organizer genes) (Figure 9-22). Some repetitive DNA is noncoding but still has a function; an example is the repetitive DNA found in the telomeres.

Repetitive DNA with no known function also exists. Some of it resides in the centromeric regions and makes up most of the constitutive heterochromatin. These tandem repeats have an average base composition different from that of the rest of the DNA, so they form a separate band with a distinct density when density gradient centrifugation is performed. Because of this density pattern, this centromeric DNA is referred to as satellite DNA.
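
Why a difference in base composition produces a separate band can be worked through with the empirical relation for double-stranded DNA in cesium chloride gradients, in which buoyant density rises roughly linearly with G+C content: ρ ≈ 1.660 + 0.098 × (G+C fraction) g/cm³. The sketch below assumes that relation; the 42% G+C "bulk" value and the repeat unit are illustrative, not measurements for any particular species.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def buoyant_density(gc: float) -> float:
    """Approximate CsCl buoyant density (g/cm^3) from G+C fraction, assuming
    the empirical linear relation rho = 1.660 + 0.098 * gc."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("G+C fraction must be between 0 and 1")
    return 1.660 + 0.098 * gc

bulk = buoyant_density(0.42)                            # illustrative bulk-genome G+C
satellite = buoyant_density(gc_fraction("ACATTTAAAT"))  # hypothetical AT-rich repeat unit
print(f"main band {bulk:.4f}, satellite {satellite:.4f}")  # main band 1.7012, satellite 1.6698
```

Because a satellite is made of many copies of the same short unit, the whole array shares one base composition and therefore one density; an AT-rich satellite bands lighter than the main DNA, and a GC-rich one would band heavier.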

Another category of nonfunctional repetitive DNA is variable number of tandem repeat (VNTR) DNA. VNTRs form minisatellites (or microsatellites): short stretches of DNA that are repeated in tandem a differing number of times, with such sites scattered throughout the genome. VNTRs are useful because each individual has an effectively unique pattern of VNTRs, which can then serve as a genetic fingerprint for that individual.
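
One way to see why a multi-locus VNTR pattern is effectively unique: if the loci examined are inherited independently, the chance that an unrelated person shares the whole pattern is the product of the genotype frequencies at each locus (the product rule). The frequencies below are hypothetical, and independence of the loci is an assumption of this sketch, not something the text establishes.

```python
from math import prod

def random_match_probability(genotype_freqs: list[float]) -> float:
    """Probability an unrelated person matches at every locus, assuming the
    loci are independent so the per-locus frequencies multiply."""
    return prod(genotype_freqs)

# Hypothetical genotype frequencies at five VNTR loci.
freqs = [0.05, 0.1, 0.02, 0.08, 0.04]
p = random_match_probability(freqs)
print(f"{p:.1e}  (about 1 in {round(1 / p):,})")  # 3.2e-07  (about 1 in 3,125,000)
```

No single locus is very discriminating here (the most common genotype occurs in 1 person in 10), yet five loci together already narrow a match to roughly one person in three million.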

Some DNA in the genome consists of sequences that have been inserted at new locations, often as copies of other genomic DNA. Such DNA is called a transposable element. Transposable elements that contain genes are referred to as transposons. Transposable elements can be conservative (i.e., the DNA itself moves from one site to another) or replicative (i.e., additional copies are made from a sequence that stays in place, and these copies are inserted at new positions elsewhere in the genome). Retrotransposons arise when an RNA copy of a genomic region is converted back into DNA by reverse transcriptase; the DNA so produced is then inserted at additional genomic locations. Retrotransposons make up a considerable amount of eukaryotic DNA. Retrotransposon DNA is further subdivided into long interspersed elements (LINEs), which consist of 1-5 kb regions repeated 20,000-40,000 times per human genome, and short interspersed elements (SINEs). The most common SINE is the Alu sequence, which makes up about 5% of the human genome. See Figures 9-27 and 9-28.
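
The difference between conservative and replicative transposition can be shown with a toy model in which the genome is a string and the element is a marked substring. This sketch models only the outcome (how many copies exist afterward and where); it ignores the enzymology, target-site duplications, and everything else real elements do, and the function names are hypothetical. At the level of the final sequence, retrotransposition also behaves like the copy-and-paste case, since the original stays in place.

```python
def conservative_transpose(genome: str, element: str, target: int) -> str:
    """Cut-and-paste: excise the element, then insert it at `target`
    (an index into the genome after excision)."""
    start = genome.index(element)  # raises ValueError if the element is absent
    excised = genome[:start] + genome[start + len(element):]
    return excised[:target] + element + excised[target:]

def replicative_transpose(genome: str, element: str, target: int) -> str:
    """Copy-and-paste: the original stays put; a new copy is inserted at `target`."""
    if element not in genome:
        raise ValueError("element not found in genome")
    return genome[:target] + element + genome[target:]

genome = "aaaa[TE]bbbbcccc"
moved = conservative_transpose(genome, "[TE]", 8)
copied = replicative_transpose(genome, "[TE]", 12)
print(moved, moved.count("[TE]"))    # aaaabbbb[TE]cccc 1
print(copied, copied.count("[TE]"))  # aaaa[TE]bbbb[TE]cccc 2
```

The copy count is the key contrast: conservative movement leaves it unchanged, while each replicative event adds one, which is how replicative and retrotransposing elements can come to occupy a large share of a genome.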

BIOINFORMATICS

Complete genomic nucleotide sequences have been determined for a number of taxa. Interpreting these databases is the purview of the evolving discipline of bioinformatics. A significant part of this challenge is recognizing signals along the DNA molecule. These signals typically represent docking sites for the interaction of various molecules. Sometimes this involves interactions between DNA and protein. Alternatively, because these sites may be transcribed, they may indirectly represent docking sites on RNA for its interaction with proteins or other RNAs. Finally, some of these sites serve as codon templates and yield polypeptides that in turn interact with other proteins, DNA, or RNA (Figure 9-30). One current problem is that, in many cases, alternative DNA sequences may be “equally” effective at serving as the same docking site, and the same sequence may act as a docking site at one location in the genome but serve another purpose at another location (Figure 9-31).
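
A docking site that tolerates alternative sequences is often written as a degenerate consensus using IUPAC ambiguity codes (for example, W = A or T). The Python sketch below converts such a consensus into a regular expression and reports every, possibly overlapping, match on one strand. TATAWAW is used only as an example of a TATA-box-like consensus; a real scan would also search the reverse complement and would usually score matches with a weight matrix rather than an exact pattern.

```python
import re

# IUPAC nucleotide codes (a subset) as regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """(0-based start, matched text) for every hit of a degenerate motif on the given strand."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping matches be reported.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq.upper())]

seq = "GCTATAAAAGGCTATATATCC"
print(find_motif(seq, "TATAWAW"))  # [(2, 'TATAAAA'), (12, 'TATATAT')]
```

The two hits are different sequences, TATAAAA and TATATAT, yet both satisfy the same consensus, which is the "alternative sequences, same docking site" situation described above.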

Bioinformatics also involves identifying the set of polypeptides generated from the genome under study. This collection of proteins is termed the organism’s proteome. Identifying it involves examining all the open reading frames (ORFs) in the genome, a task made quite difficult in higher eukaryotes by RNA splicing, because introns interrupt the coding sequence. In many taxa, the same pre-mRNA may be spliced in alternative ways, differentially including or excluding some exons and thus creating more than one type of polypeptide (estimates in humans are that 60% of genes are alternatively spliced and that there are on average 3 splice variants per gene). cDNA sequences are often valuable for locating ORFs in the genome. Searches for ORFs can also be aided by using the known amino acid sequences of proteins (or DNA sequences) from other taxa.
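
A basic ORF search can be sketched in Python: scan each of the three forward reading frames for an ATG followed in frame by a stop codon (TAA, TAG, or TGA). This assumes the standard genetic code and looks at only one strand. It also treats the sequence as uninterrupted coding DNA, which is exactly why introns make ORF finding in eukaryotic genomic DNA hard and why cDNA sequence helps. The `min_codons` cutoff is an arbitrary illustrative parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """ORFs (ATG ... stop) in the three forward frames.

    Returns (frame, start, end): 0-based start, end exclusive and including
    the stop codon. `min_codons` counts codons from ATG up to, not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs

seq = "CCATGAAACCCGGGTAAGG"
print(find_orfs(seq))  # [(2, 2, 17)]
print(seq[2:17])       # ATGAAACCCGGGTAA
```

Note that frame 0 contains a TGA at position 3, but it is ignored because no ATG precedes it in that frame; a stop codon only ends an ORF that has already been opened.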

 

FUNCTIONAL GENOMICS

DNA chips are glass chips onto which DNA is dried and treated so that it sticks to the chip. The DNA is arranged in an orderly pattern, and the chip can be exposed to any of a variety of treatments. For example, fluorescently labeled cDNA produced from mRNA collected at different times during an organism's development will bind selectively to the locations on the chip that carry complementary sequences. Therefore, the binding locations, which reflect the genes that are turned on during that developmental period, can be identified.
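
Reading such an experiment comes down to comparing spot intensities. The sketch below assumes background-corrected intensities for each gene's spot at two developmental stages; the gene names, values, and the "on" threshold are hypothetical. It reports which genes are called expressed at each stage, plus a log2 ratio of late to early signal. Real analyses add normalization and replicate-based statistics.

```python
from math import log2

# Hypothetical background-corrected spot intensities (arbitrary units).
early = {"geneA": 1200.0, "geneB": 40.0, "geneC": 800.0}
late = {"geneA": 300.0, "geneB": 950.0, "geneC": 820.0}
THRESHOLD = 100.0  # hypothetical cut-off for calling a spot "on"

def expressed(intensities: dict[str, float], threshold: float = THRESHOLD) -> list[str]:
    """Genes whose spot intensity reaches the threshold."""
    return sorted(g for g, v in intensities.items() if v >= threshold)

def log2_fold_change(a: dict[str, float], b: dict[str, float],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """log2(b / a) per gene; the pseudocount avoids dividing by zero."""
    return {g: log2((b[g] + pseudocount) / (a[g] + pseudocount)) for g in a}

print(expressed(early))  # ['geneA', 'geneC']
print(expressed(late))   # ['geneA', 'geneB', 'geneC']
print({g: round(v, 2) for g, v in log2_fold_change(early, late).items()})
```

In this made-up data, geneB switches on between the two stages (a large positive log2 ratio), geneA drops about fourfold, and geneC is essentially unchanged.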