The hereditary nature of every living organism is defined by its
genome, which consists of a long sequence of nucleic acid that
provides the information needed to construct the organism. We use the
term "information" because the genome does not itself perform any active
role in building the organism; rather it is the sequence of the individual
subunits (bases) of the nucleic acid that determines hereditary
features. By a complex series of interactions, this sequence is used to
produce all the proteins of the organism at the appropriate time and
place. The proteins either form part of the structure of the organism, or
have the capacity to build the structures or to perform the metabolic
reactions necessary for life.
The genome contains the complete set of hereditary information for
any organism. Physically the genome may be divided into a number of
different nucleic acid molecules. Functionally it may be divided into
genes. Each gene is a sequence within the nucleic acid that represents a
single protein. Each of the discrete nucleic acid molecules comprising
the genome may contain a large number of genes. Genomes of living
organisms may contain fewer than 500 genes (for a mycoplasma, a type
of bacterium) or more than 40,000 (for humans).
In this chapter, we analyze the properties of the gene in terms of its
basic molecular construction. Figure 1.1 summarizes the stages in the
transition from the historical concept of the gene to the modern definition
of the genome.
The basic behavior of the gene was defined by Mendel more than a
century ago. His two laws summarize this behavior: the gene was recognized
as a "particulate factor" that passes unchanged from parent to progeny.
A gene may exist in alternative forms. These forms are called alleles.
In diploid organisms, which have two sets of chromosomes, one
copy of each chromosome is inherited from each parent. This is the
same behavior that is displayed by genes. One of the two copies of each
gene is the paternal allele (inherited from the father), the other is the
maternal allele (inherited from the mother). The equivalence led to the
discovery that chromosomes in fact carry the genes.
Each chromosome consists of a linear array of genes. Each gene resides
at a particular location on the chromosome. This is more formally
called a genetic locus. We can then define the alleles of this gene as the
different forms that are found at this locus.
The key to understanding the organization of genes into chromosomes
was the discovery of genetic linkage. This describes the observation that
alleles on the same chromosome tend to remain together in the progeny
instead of assorting independently as predicted by Mendel's laws. Once
the unit of recombination (reassortment) was introduced as the measure
of linkage, the construction of genetic maps became possible.
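As a rough illustration of how recombination can serve as a measure of linkage, the following Python sketch computes a recombination frequency from progeny counts. The function names and the counts are hypothetical, chosen only for illustration. The conversion of 1% recombination to 1 map unit is the conventional approximation and holds only for closely linked loci.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Return the fraction of scored progeny that carry a recombinant
    (non-parental) combination of alleles at two loci."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny were scored")
    return recombinant / total


def map_distance(parental: int, recombinant: int) -> float:
    """Convert recombination frequency to map units (1% = 1 map unit).

    Assumption: the loci are close enough that multiple crossovers
    between them can be ignored.
    """
    return 100.0 * recombination_frequency(parental, recombinant)


if __name__ == "__main__":
    # Hypothetical cross: 830 parental-type and 170 recombinant progeny.
    print(recombination_frequency(parental=830, recombinant=170))
    print(map_distance(parental=830, recombinant=170))
```

A frequency near 0.5 would be consistent with independent assortment; values well below 0.5 indicate linkage, and smaller values indicate loci that lie closer together on the map.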
On the genetic maps of higher organisms established during the first
half of the twentieth century, the genes are arranged like beads on a string. They
occur in a fixed order, and genetic recombination involves transfer of
corresponding portions of the string between homologous chromosomes.
In this view, the gene is to all intents and purposes a mysterious object
(the bead), whose relationship to its surroundings (the string) is unclear.
The resolution of the recombination map of a higher eukaryote is restricted
by the small number of progeny that can be obtained from each
mating. Recombination occurs so infrequently between nearby points
that it is rarely observed between different mutations in the same gene.
By moving to a microbial system in which a very large number of progeny
can be obtained from each genetic cross, it became possible to
demonstrate that recombination occurs within genes. It follows the same
rules that were previously deduced for recombination between genes.
Mutations within a gene can be arranged into a linear order, showing
that the gene itself has the same linear construction as the array of
genes on a chromosome. So the genetic map is linear within as well as
between loci: it consists of an unbroken sequence within which the
genes reside. This conclusion leads naturally into the modern view that
the genetic material of a chromosome consists of an uninterrupted
length of DNA representing many genes.
A genome consists of the entire set of chromosomes for any particular
organism. It therefore comprises a series of DNA molecules (one for
each chromosome), each of which contains many genes. The ultimate
definition of a genome is to determine the sequence of the DNA of each
chromosome.
The first definition of the gene as a functional unit followed from
the discovery that individual genes are responsible for the production of
specific proteins. The difference in chemical nature between the DNA
of the gene and its protein product led to the concept that a gene codes
for a protein. This in turn led to the discovery of the complex apparatus
that allows the DNA sequence of a gene to generate the amino acid sequence
of a protein.
Understanding the process by which a gene is expressed allows us to
make a more rigorous definition of its nature. Figure 1.2 shows the
basic theme of this book. A gene is a sequence of DNA that produces another
nucleic acid, RNA. The DNA has two strands of nucleic acid, and
the RNA has only one strand. The sequence of the RNA is determined by
the sequence of the DNA (in fact, it is identical to one of the DNA
strands, except that uracil replaces thymine). In many, but not all,
cases the RNA is in turn used to direct
production of a protein. Thus a gene is a sequence of DNA that codes for
an RNA; in protein-coding genes, the RNA in turn codes for a protein.
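The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is a minimal illustration rather than a model of the cellular machinery: the function names are hypothetical, the input is assumed to be the DNA strand whose sequence matches the RNA, and the codon table is a deliberately tiny subset of the standard genetic code.

```python
# Assumption: only four codons of the standard genetic code are listed here,
# just enough for the example sequence below.
CODON_TABLE_SUBSET = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """Return the RNA whose sequence matches the given DNA strand,
    with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def translate(rna: str) -> list[str]:
    """Read the RNA three bases at a time and return the amino acids,
    stopping at the first stop codon. Unknown codons are marked '???'."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE_SUBSET.get(rna[i:i + 3], "???")
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    dna = "ATGTTTGGCTAA"          # hypothetical gene fragment
    rna = transcribe(dna)
    print(rna)                    # AUGUUUGGCUAA
    print(translate(rna))         # ['Met', 'Phe', 'Gly']
```

For a gene whose final product is RNA rather than protein, only the transcription step would apply.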
From the demonstration that a gene consists of DNA, and that a
chromosome consists of a long stretch of DNA representing many
genes, we move to the overall organization of the genome in terms of its
DNA sequence. In Chapter 2, The interrupted gene, we take up in more detail the
organization of the gene and its representation in proteins. In Chapter 3, The
content of the genome, we consider the total number of genes, and in Chapter 4,
Clusters and repeats, we discuss other components of the genome and
the maintenance of its organization.