I want to get back to considering some ideas to build infrastructure, but I need to take one other detour first. I've used the terms "high-throughput" and "omics" quite a bit, but what, exactly, do they mean? Simply, high-throughput refers to just that: a technology with which a large (or even exhaustive) number of measurements can be taken in a fairly short time period. "Ome" and "omics" are suffixes derived from genome (the whole collection of a person's DNA, coined by Hans Winkler as a combination of "gene" and "chromosome"1) and genomics (the study of the genome). Scientists like to append them to any large-scale system (or really, just about anything complex), such as the collection of proteins in a cell or tissue (the proteome), the collection of metabolites (the metabolome), and the collection of RNA that's been transcribed from genes (the transcriptome). High-throughput analysis is essential for considering data at the "omic" level, that is to say, considering all DNA sequences, gene expression levels, or proteins at once (or, to be slightly more precise, a significant subset of them). Without the ability to rapidly and accurately measure tens or hundreds of thousands of data points in a short period of time, there is no way to perform analyses at this level.
There are four major types of high-throughput measurements that are commonly performed: genomic SNP analysis (i.e., the large-scale genotyping of single nucleotide polymorphisms), transcriptomic measurements (i.e., the measurement of all gene expression values in a cell or tissue type simultaneously), proteomic measurements (i.e., the identification of all proteins present in a cell or tissue type), and metabolomic measurements (i.e., the identification and quantification of all metabolites present in a cell or tissue type). Each of these four is distinct and offers a different perspective on the processes underlying disease initiation and progression as well as on ways of predicting, preventing, or treating disease.
Genomic SNP genotyping measures a person's genotypes at several hundred thousand single nucleotide polymorphisms spread throughout the genome. Other assays exist to genotype ten thousand or so polymorphic sites that are near known genes (under the assumption that these are more likely to have some effect on those genes). The genotyping technology is quite accurate, but the SNPs themselves offer only limited information. These SNPs tend to be quite common (typically at least 5% of the population has at least one copy of the less frequent allele) and are not strictly causal of disease. Rather, SNPs can act in unison with other SNPs and with environmental variables to increase or decrease a person's risk of a disease. This makes identifying important SNPs difficult; the variation in a trait that can be accounted for by a single SNP is fairly small relative to the total variation in the trait. Even so, because genotypes remain constant throughout life (barring mutations to individual cells), SNPs are potentially among the most useful measurements for predicting risk.
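To make the "common variant" point concrete, here is a minimal sketch (in Python, with invented genotype counts) of how a SNP's minor allele frequency is computed from genotype data:

```python
# Minimal sketch: computing a SNP's minor allele frequency (MAF) from
# genotype counts. The counts below are invented for illustration.

def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Return the frequency of the less common allele, given counts of
    the three genotypes at a biallelic SNP."""
    n_people = n_AA + n_Aa + n_aa
    # Each person carries two alleles; heterozygotes carry one of each.
    freq_a = (2 * n_aa + n_Aa) / (2 * n_people)
    return min(freq_a, 1 - freq_a)

# Hypothetical genotype counts for 1,000 people:
maf = minor_allele_frequency(n_AA=640, n_Aa=320, n_aa=40)
print(f"MAF = {maf:.2%}")  # 20.00% -- "common" by the 5% rule of thumb
```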
Transcriptomic measurements (often referred to as gene expression microarrays or "gene chips") are the oldest and most established of the high-throughput methodologies. The most common are commercially produced "oligonucleotide arrays", which have hundreds of thousands of small (25-base) probes, between 11 and 20 per gene. RNA extracted from cells is hybridized to the chip, and the expression levels of ~30,000 different mRNAs can be assessed simultaneously. More so than with SNP genotypes, there is the potential for a significant amount of noise in transcriptomic measurements. The source of the RNA, the preparation and purification methods, and variations in the hybridization and scanning process can all lead to differences in measured expression levels; statistical methods to normalize, quantify, and analyze these measures have been one of the hottest areas of research in the last five years. Gene expression levels influence traits more directly than SNPs do, and so significant associations are easier to detect. Transcriptomic measures are not as useful for pre-disease prediction (expression levels change so much over time that measurements taken far in advance of disease initiation are unlikely to be informative), but they are very well suited for either early identification of a disease (i.e., finding people who have gene expression levels characteristic of a disease but who have not yet manifested other symptoms) or classifying patients with a disease into subgroups (by identifying gene expression levels associated with better or worse outcomes or with higher or lower values of some disease phenotype).
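As a flavor of what that normalization work looks like, here is a minimal sketch of quantile normalization, one widely used approach for making expression distributions comparable across arrays. The matrix values are invented, and real microarray pipelines are considerably more involved:

```python
# Sketch of quantile normalization: force every column (one array/sample)
# of a genes-x-samples matrix to share the same distribution, namely the
# mean of the sorted columns. Toy values only; ties are handled naively.
import numpy as np

def quantile_normalize(X):
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each value in its column
    mean_sorted = np.sort(X, axis=0).mean(axis=1)      # reference distribution
    return mean_sorted[ranks]                          # map ranks back to reference values

# Three toy "arrays" measuring four "genes":
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.5, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))
```

After this step, every array has an identical distribution of values, so remaining differences between samples reflect rank changes in gene expression rather than array-wide artifacts.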
Proteomics is similar in character to transcriptomics. The most significant difference lies in how the measurements are made. Unlike transcriptomics, where all gene expression levels are assessed simultaneously, protein identification is done in a rapid serial fashion. After a sample has been prepared, the proteins are separated using chromatography, two-dimensional protein gels (which separate proteins based on charge and then size), or one-dimensional protein gels (which separate based on size alone); digested, typically with trypsin (which cuts proteins after each arginine and lysine); and then run through a mass spectrometer. The mass spectrometer measures the size of each of the peptides, and the proteins can be identified by comparing the sizes of the peptides to the theoretical digests of all known proteins in a database. This searching is the key to the technology, and a number of algorithms, both commercial and open-source, have been created for it. Unlike transcriptomic measures, the overall quantity of a protein cannot be assessed, just its presence or absence. Like transcriptomic measures, though, proteomic measures are excellent for early identification of disease or for classifying people into subgroups.
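To give a sense of how those theoretical digests are generated, here is a minimal sketch of the in-silico digest step: cutting a sequence after each arginine and lysine and computing each resulting peptide's mass. The sequence is invented, and refinements such as the "no cut before proline" rule and missed cleavages are ignored:

```python
# Sketch of the in-silico digest behind peptide mass fingerprinting.
import re

# Standard average masses of amino acid residues (Da)
RESIDUE_MASS = {
    'G': 57.05, 'A': 71.08, 'S': 87.08, 'P': 97.12, 'V': 99.13,
    'T': 101.10, 'C': 103.14, 'L': 113.16, 'I': 113.16, 'N': 114.10,
    'D': 115.09, 'Q': 128.13, 'K': 128.17, 'E': 129.12, 'M': 131.19,
    'H': 137.14, 'F': 147.18, 'R': 156.19, 'Y': 163.18, 'W': 186.21,
}
WATER = 18.02  # mass of the H2O at the peptide termini

def tryptic_digest(protein):
    """Split a protein sequence after each K or R."""
    return [p for p in re.split(r'(?<=[KR])', protein) if p]

def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

seq = "MKTAYIAKQRQISFVK"  # invented sequence
for pep in tryptic_digest(seq):
    print(f"{pep:10s} {peptide_mass(pep):8.2f} Da")
```

A search engine essentially runs this digest over every protein in a database and scores how well the theoretical peptide masses match the masses the instrument observed.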
Last up is metabolomics, the high-throughput measurement of the metabolites present in a cell or tissue. As with proteomics, the metabolites are measured in a very fast serial process. NMR is typically used to both identify and quantify metabolites. This technology is newer and less frequently used than the others, but similar caveats apply. Metabolite levels are dynamic, as gene expression levels and proteins are, and so are best suited for either early disease detection or disease subclass identification.
These are obviously foreshortened descriptions of each of these technologies, but a passing familiarity with the state of the technology is really important to understanding what personalized medicine can and can't accomplish and what the best strategies are. By understanding what current technologies can accurately measure and what that in turn can tell us, we can make informed choices about where to focus time, money, and effort in developing tools and encouraging infrastructure growth.
In my post defining "personalized medicine" I mentioned trying to tailor a person's drug treatment to get the best possible effect. Using a person's genetic makeup to choose the optimal drug treatment is called pharmacogenomics. Put another way, this is the study of the way a person's genome influences the effect of drug treatments. Drug response is a very complex phenotype that's influenced by both genetic and environmental factors, but high-throughput technologies such as gene expression microarrays and SNP genotyping arrays allow these genetic factors to be considered on a scale never before possible.
Drug response has two separate components, each of which can be studied in pharmacogenomic terms. The first component of drug response is pharmacokinetics, or the way the body metabolizes a drug. This can be crudely estimated now with some simple genotyping. Polymorphisms in the genes CYP2C19 and CYP2D6 are known to affect the rates of metabolism of many drugs. By modifying the effective concentration of medications, these polymorphisms can either decrease a drug's effectiveness or increase the risk of toxic side effects. The second component is pharmacodynamics, which is how the drug acts to treat the specific condition. Complex diseases from cancer to hypertension are heterogeneous both in their symptoms and in their response to drugs, and some of this variability is due to genetic factors. The underlying molecular cause of the disease, then, can be used to decide which drug is best suited for which patient.
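As a toy illustration of how that kind of genotyping might translate into a prediction, here is a sketch that maps a CYP2D6 diplotype to a crude metabolizer category, in the spirit of activity-score systems. The scores and thresholds here are simplified placeholders, not clinical values:

```python
# Illustrative only: toy lookup from a CYP2D6 diplotype to a crude
# metabolizer category. Allele activity values are simplified placeholders.

ALLELE_ACTIVITY = {
    "*1": 1.0,   # assumed normal-function allele
    "*4": 0.0,   # assumed no-function allele
    "*10": 0.5,  # assumed reduced-function allele
}

def metabolizer_phenotype(allele1, allele2):
    score = ALLELE_ACTIVITY[allele1] + ALLELE_ACTIVITY[allele2]
    if score == 0:
        return "poor metabolizer"
    elif score < 1.5:
        return "intermediate metabolizer"
    else:
        return "normal metabolizer"

print(metabolizer_phenotype("*1", "*4"))   # intermediate metabolizer
print(metabolizer_phenotype("*4", "*4"))   # poor metabolizer
```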
Pharmacogenomic profiling for pharmacokinetic information should lead to a deeper understanding of drug metabolism. As it stands now, a few important polymorphisms are well-known, but have next to no role in clinical practice, because the predictions that can be made with that information are of uncertain accuracy. Knowing more factors that influence metabolism of certain drugs should lead to more nuanced predictions about individual responses. Determining what genetic factors affect the metabolism of a specific drug, however, is a very difficult problem, one that requires a great deal of genetic information and a fairly large sample of people, with blood drawn at regular intervals to measure drug concentrations. This process must be repeated for every drug (and every combination of drugs, too) under investigation.
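To show why those measured drug concentrations matter, here is a hedged sketch of a standard one-compartment oral-dose model. It illustrates how a slower elimination rate (the kind of difference a metabolic polymorphism might cause) raises drug concentration; all parameter values are invented for illustration:

```python
# One-compartment oral-dose model (the Bateman equation), showing the
# effect of a slower elimination rate constant. Parameter values invented.
import math

def concentration(t, dose=100.0, F=0.9, V=50.0, ka=1.0, ke=0.2):
    """Plasma concentration (mg/L) at time t (h) after one oral dose.
    F: bioavailability, V: volume of distribution (L),
    ka/ke: absorption/elimination rate constants (1/h)."""
    return (F * dose * ka) / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

for label, ke in [("normal metabolizer", 0.2), ("slow metabolizer", 0.05)]:
    print(f"{label}: C(8h) = {concentration(8.0, ke=ke):.2f} mg/L")
```

The slow metabolizer ends up with a substantially higher concentration at the same dose, which is exactly the gap between effectiveness and toxicity that pharmacokinetic profiling aims to predict.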
The application of pharmacogenomics to pharmacodynamic research is similar, and requires a great deal of data and effort. By considering genetic information about a group of patients treated with a specific drug, polymorphisms that mark response can be identified. Finding a number of these can lead to a signature or profile that can be used for prediction. In fact, this has already been done successfully in cancer research[1,2,3], but it is not routine in cancer treatment, both because the technologies involved are expensive and because the outcomes that can be predicted by these methods are limited. Larger-scale investigations should fix the latter problem; technological advances will hopefully fix the former.
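In spirit, building such a signature looks something like the following sketch, which fits a simple classifier to simulated expression profiles labeled by drug response. A real study would of course use measured data and far more careful validation:

```python
# Sketch of the "signature" idea: predict drug response from the
# expression of a handful of genes. Data is randomly generated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_patients, n_genes = 200, 50
X = rng.normal(size=(n_patients, n_genes))      # expression matrix
true_signature = np.zeros(n_genes)
true_signature[:5] = 1.0                        # pretend 5 genes drive response
y = (X @ true_signature + rng.normal(size=n_patients)) > 0  # responder yes/no

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)     # held-out predictive accuracy
print(f"cross-validated accuracy: {scores.mean():.2f}")
```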
Pharmacogenomics is only beginning as an area of research, but it has the potential to improve the way drugs are developed and prescribed. At the same time, hyperbole about the benefits of pharmacogenomics (and about how close it is to real clinical impact) has led to disappointment4. Being realistic about the potential of pharmacogenomics is important, but optimism is definitely warranted. The lessons learned in predicting other complex phenotypes will be of service in pharmacogenomics, and it is realistic to think that in the next 10 or 15 years it will be routine to screen people with common, complex diseases to choose the medicine and dosage most likely to work for them. This profiling may take significantly longer to arrive for rarer diseases, particularly uncommon forms of cancer, but the momentum is there, and as the cost of high-throughput measurements drops, the number of samples being collected will climb, pushing this knowledge closer to clinical usage.
Okay, now that we've spent some time defining our terms and delineating some of the problems it's time to start thinking about solutions. I'm going to start with the one I think is most fundamental: infrastructure. I'll talk about some strategies for developing infrastructure and encouraging investment in infrastructure by both private industry and health care providers. I have another presentation this Thursday, so the post may be delayed a day or so.
If I want to talk about personalized medicine (and I do), I have to begin by saying what I mean by it. (As a side note, I'll use the term individualized medicine interchangeably. Occasionally, people use them to slightly different effect, but for my purposes, they're the same thing.) And what I mean is pretty simple: combining all the different types of data (clinical, environmental, and genetic) to predict which diseases a person is at risk for and to identify medical treatments that will work for that specific person.
It's easy to lose sight of how far medicine has come in the past 100 years. We take for granted that most diseases can be treated, if not cured, and we dedicate significant resources to medical research. Modern chemistry has led to hundreds of drugs that have saved countless lives. For all that, medicine can still be a crude endeavor.
Consider hypertension. It is one of the most prevalent diseases in America, and the single most common reason that people visit their doctors. In spite of that, less than half of the people taking drugs to treat their hypertension actually have their blood pressure under control. Why is that? Well, partly because people don't change their lifestyles to combat the disease, but also because there is no way to identify which patient will respond to which drug. Hypertension is extremely heterogeneous, and it stands to reason that different subsets of patients will respond to different medicines. For now, though, there is neither a way to easily assign a person to a subtype of hypertension, nor a mapping of which drug best treats which subtype.
But let's back up for a second. Why is hypertension so common? What leads to a person developing hypertension? For now the best predictors of hypertension are age (the older you are, the higher your risk of high blood pressure) and family history (if your relatives have high blood pressure, you're more likely to have it, too). But that casts a very wide net, and it's difficult to identify the people who would most benefit from early interventions to prevent them from developing hypertension. One potential application of personalized medicine is being able to combine all of the information available to make better predictions about who is really at risk of developing hypertension.
Finding and targeting those at risk, though, will not stop everyone from getting hypertension. And so the next potential application of personalized medicine is determining who will respond to which drug. This type of prediction is currently not even considered as part of treating a patient; rather, the physician makes an educated guess about which drug may work and then monitors to see if the dosage needs to be increased or if another drug needs to be tried. But by identifying subsets of hypertensives and identifying which drugs work best in each subset, hypertension treatment will not only be more effective, there will also likely be fewer adverse reactions and less wasted money.
Now that we know what personalized medicine is, the next three posts will cover the scientific, policy, and ethical issues that face the field. I don't intend to lay out much in the way of answers, and I also doubt that my listing of problems will be exhaustive. Rather, I want to convey a sense of the breadth of the issues that I'll be discussing in more depth over the next few months.
Reagan Kelly is a PhD student at the University of Michigan studying bioinformatics. His thesis is focused on risk prediction algorithms for personalized medicine systems, and he is also interested in the policy and societal implications of individualized healthcare. You can read his CV for more information about him. If you would like to contact him, please send an email to reagank -at- reagank.com.