Sunday, August 4, 2024

Genetics 101

Genetics is inescapable in the popular media these days. The Human Genome Project, the HapMap and personalized medicine come up as buzzwords on a regular basis. But what are they really about, and what is realistically feasible at the current time?

As everyone probably knows by now, the human genome consists of roughly 3 billion "bases" of information, packaged biochemically as long DNA molecules assembled into chromosomes. Humans have 23 pairs of chromosomes: two each of chromosomes 1 through 22, plus a pair of sex chromosomes (males have an X and a Y, females have two X chromosomes). In other words, except for the special properties of the X and Y, cells in our bodies have two complete copies of the genome. The chromosomes vary in length: chromosome 1 has around 250 million bases, whereas chromosome 22 has only about 50 million (very approximately).

Without getting into the details of DNA structure and chemistry, the computer analogy is fairly helpful. If DNA is a computer program for the assembly of an organism, then the bases are like bits of information. When we write computer programs, the hardware ultimately sees each bit as either a 0 or a 1, so there are two states. In contrast, each base of DNA can have 4 different states (chemically G, A, T or C), so one base carries the equivalent of two binary bits.
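To make the bits-versus-bases comparison concrete, here is a minimal Python sketch. It assumes an arbitrary two-bit code for each base (the particular assignment of A, C, G and T to bit patterns is my own choice for illustration, not a biological fact), packs a short sequence into an integer, and estimates how much storage 3 billion bases would need at two bits per base.

```python
# Hypothetical two-bit code per base; the assignment is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence: str) -> int:
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from a packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(bases))


def genome_megabytes(n_bases: int) -> float:
    """Storage for n_bases at 2 bits per base, in megabytes (10**6 bytes)."""
    return n_bases * 2 / 8 / 1_000_000


print(bin(pack("GATC")))                 # 0b10001101
print(unpack(pack("GATC"), 4))           # GATC
print(genome_megabytes(3_000_000_000))   # 750.0
```

So under this simple encoding, one copy of the genome would fit in roughly 750 megabytes, which is about the capacity of an old-fashioned CD.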

Genes are simply segments of DNA sequence that encode particular activities. There is a lot of argument these days about the exact definition of a gene, and some slightly new-age holistic arguments that there are not actually any such things, but most working scientists still find it necessary to think of genes as real entities in nature. In terms of the computer analogy, one can say that only the whole program carries the complete information, yet programs can usually be helpfully broken down into segments of code, subroutines or even individual lines. In any case, the exact nature of a gene is not relevant to my train of thought here.

There is not actually one definable "human genome" sequence. Extensive experimental work has clearly shown that any two of us differ at millions of places in our DNA sequence (with the exception of genetically identical twins, nature's clones), a place being defined as a particular element of the code that can usually be compared in two different individuals. What is more, since we each carry two copies of most of the chromosomes, one received from dad's sperm and one from mom's egg, those two copies differ from each other in much the same way. So internally we also carry two different bits of information at millions of sites in our own genomes. That is not to say that the human genome sequence generated by the Human Genome Project is not real; it is simply a composite or consensus combined from different individual human sources.
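The idea of a "place" where two copies of the code differ can be sketched in a few lines of Python. This is only a toy illustration: the two ten-base sequences are invented, and it assumes the sequences are already lined up position by position, which real genomes are not without a good deal of extra work.

```python
def differing_sites(seq_a: str, seq_b: str) -> list[int]:
    """Return the positions (0-based) where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Invented example: the same short stretch on the two copies of a
# chromosome, one inherited from each parent.
maternal_copy = "GATTACAGGC"
paternal_copy = "GATCACAGTC"

print(differing_sites(maternal_copy, paternal_copy))  # [3, 8]
```

The same comparison works between two different people as between the two copies inside one person; only the source of the two sequences changes.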

It is clear that these millions of differences account for many of the physiological differences between individuals. Together they define the genetic component of biological individuality. There is of course a strong environmental component as well, but it is not necessary for now to try to figure out how much each contributes to any particular trait such as height, eye color or disease susceptibility. The problem is this: for any given DNA sequence variant, a G in one person but a T in another (or on the two different copies of a particular chromosome in one person), what is the effect of that variation on the physiology of that person? This in essence is the discipline of genetics: the study of genotype/phenotype causation. In practice it is very difficult. With so much DNA and so much variation, how can one possibly assign a cause/effect role to any particular variant and any particular physiological trait? I will start to address this in my next post on the subject. Suffice it for now to say that one starts with the simplest, most straightforward examples, where many different kinds of evidence can be used to support a causal claim, and proceeds to more and more difficult examples where the evidence is far more tenuous and sometimes no more than speculation. This is how experimental science usually works: few genuine moments of absolute definitiveness and many moments of incrementally growing certainty.


