I want to get back to considering some ideas to build infrastructure, but I need to take one other detour first. I've used the terms "high-throughput" and "omics" quite a bit, but what, exactly, do they mean? Simply put, high-throughput refers to just that: a technology in which a large (or even exhaustive) number of measurements can be taken in a fairly short time period. "Ome" and "omics" are suffixes derived from genome (the whole collection of a person's DNA, a term coined by Hans Winkler as a combination of "gene" and "chromosome"[1]) and genomics (the study of the genome). Scientists like to append these to any large-scale system (or really, just about anything complex), such as the collection of proteins in a cell or tissue (the proteome), the collection of metabolites (the metabolome), and the collection of RNA that's been transcribed from genes (the transcriptome). High-throughput analysis is essential to considering data at the "omic" level, that is to say considering all DNA sequences, gene expression levels, or proteins at once (or, to be slightly more precise, a significant subset of them). Without the ability to rapidly and accurately measure tens or hundreds of thousands of data points in a short period of time, there is no way to perform analyses at this level.
There are four major types of high-throughput measurements that are commonly performed: genomic SNP analysis (i.e., the large-scale genotyping of single nucleotide polymorphisms), transcriptomic measurements (i.e., the measurement of all gene expression values in a cell or tissue type simultaneously), proteomic measurements (i.e., the identification of all proteins present in a cell or tissue type), and metabolomic measurements (i.e., the identification and quantification of all metabolites present in a cell or tissue type). Each of these four is distinct and offers a different perspective on the processes underlying disease initiation and progression as well as on ways of predicting, preventing, or treating disease.
Genomic SNP genotyping measures a person's genotypes for several hundred thousand single nucleotide polymorphisms spread throughout the genome. Other assays exist to genotype ten thousand or so polymorphic sites that are near known genes (under the assumption that these are more likely to have some effect on those genes). The genotyping technology is quite accurate, but the SNPs themselves offer only limited information. These SNPs tend to be quite common (with typically at least 5% of the population having at least one copy of the less frequent allele), and they are not strictly causal of disease. Rather, SNPs can act in unison with other SNPs and with environmental variables to increase or decrease a person's risk of a disease. This makes identifying important SNPs difficult; the variation in a trait that can be accounted for by a single SNP is fairly small relative to the total variation in the trait. Even so, because genotypes remain constant (barring mutations to individual cells) throughout life, SNPs are potentially among the most useful measurements for predicting risk.
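To make "common" concrete, here is a minimal sketch of how the frequency of the less frequent allele, and the fraction of people carrying at least one copy of it, could be computed for a single SNP. The genotype encoding (two-letter strings such as "AG"), the function name, and the toy sample are all assumptions made for illustration; they are not the output format of any particular genotyping platform.

```python
from collections import Counter

def allele_summary(genotypes):
    """Summarize one SNP from a list of two-letter genotypes such as 'AG'.

    Returns (minor_allele, minor_allele_frequency, carrier_fraction).
    If only one allele is observed, returns (None, 0.0, 0.0).
    """
    # Count every allele copy: each person contributes two.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) < 2:
        return None, 0.0, 0.0
    # The minor allele is the one with the fewest copies in the sample.
    minor_allele, minor_count = min(allele_counts.items(), key=lambda kv: kv[1])
    total_copies = sum(allele_counts.values())
    maf = minor_count / total_copies
    # Carriers have at least one copy of the minor allele.
    carriers = sum(1 for g in genotypes if minor_allele in g)
    return minor_allele, maf, carriers / len(genotypes)

# Hypothetical sample of 100 people for one SNP with alleles A and G.
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
minor, maf, carrier_fraction = allele_summary(sample)
print(minor, maf, carrier_fraction)
```

In this made-up sample, 10% of people carry at least one copy of the G allele even though G accounts for only 5.5% of all allele copies, which is why the two ways of stating a SNP's frequency should not be confused.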
Transcriptomic measurements (often referred to as gene expression microarrays or "gene chips") are the oldest and most established of the high-throughput methodologies. The most common are commercially produced "oligonucleotide arrays", which have hundreds of thousands of small (25 bases) probes, between 11 and 20 per gene. RNA that has been extracted from cells is hybridized to the chip, and the expression level of ~30,000 different mRNAs can be assessed simultaneously. More so than with SNP genotypes, there is the potential for a significant amount of noise in transcriptomic measurements. The source of the RNA, the preparation and purification methods, and variations in the hybridization and scanning process can all lead to differences in expression levels; statistical methods to normalize, quantify, and analyze these measures have been one of the hottest areas of research in the last five years. Gene expression levels influence traits more directly than SNPs, and so significant associations are easier to detect. Transcriptomic measures are not as useful for pre-disease prediction, because expression levels can change so much over time that measurements taken far in advance of disease initiation are not likely to be informative. They are, however, very well-suited for either early identification of a disease (i.e., finding people who have gene expression levels characteristic of a disease but who have not yet manifested other symptoms) or classifying patients with a disease into subgroups (by identifying gene expression levels that are associated with either better or worse outcomes or with higher or lower values of some disease phenotype).
Proteomics is similar in character to transcriptomics. The most significant difference is in how the measurements are made. Unlike transcriptomics, where the gene expression levels are assessed simultaneously, protein identification is done in a rapid serial fashion. After a sample has been prepared, the proteins are separated using chromatography, two-dimensional protein gels (which separate proteins based on charge and then size), or one-dimensional protein gels (which separate based on size alone). They are then digested, typically with trypsin (which cuts proteins after each arginine and lysine), and run through mass spectrometry. The mass spec measures the mass of each of the peptides, and the proteins can be identified by comparing the masses of the peptides created with the theoretical digests of all known proteins in a database. This searching is the key to the technology, and a number of algorithms, both commercial and open-source, have been created for it. Unlike transcriptomic measures, the overall quantity of a protein cannot be assessed, just its presence or absence. Like transcriptomic measures, though, proteomic measures are excellent for early identification of disease or classifying people into subgroups.
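The database search described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any commercial or open-source search engine actually scores matches: the two "proteins" are invented sequences, the cleavage rule is the simple one given above (cut after every lysine, K, and arginine, R, ignoring the exceptions real trypsin has), the mass tolerance is arbitrary, and the score is just a count of matched peptides. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (in daltons) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide

def tryptic_digest(sequence):
    """Cut a protein sequence after every K or R (simplified trypsin rule)."""
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # trailing peptide with no K/R at its end
        peptides.append(current)
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER

def score_candidates(observed_masses, database, tolerance=0.01):
    """Count, per protein, how many observed masses match its theoretical digest."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(observed - t) <= tolerance for t in theoretical)
            for observed in observed_masses
        )
    return scores

# Hypothetical two-protein "database" and masses "observed" from protein_a.
database = {"protein_a": "GASKPVTR", "protein_b": "MKWYR"}
observed = [peptide_mass(p) for p in tryptic_digest("GASKPVTR")]
print(score_candidates(observed, database))
```

The protein whose theoretical digest explains the most observed masses is the best candidate. Real search algorithms refine this idea with statistical scoring, missed cleavages, and chemical modifications, none of which are modeled here.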
Last up is metabolomics, the high-throughput measure of the metabolites present in a cell or tissue. As with proteomics, the metabolites are measured in a very fast serial process. NMR is typically used to both identify and quantify metabolites. This technology is newer and less frequently used than the other technologies, but similar caveats apply. Measurements of metabolites are dynamic as are gene expression levels and proteins, and so are best suited for either early disease detection or disease subclass identification.
These are obviously foreshortened descriptions of each of these technologies, but a passing familiarity with the state of the technology is really important to understanding what personalized medicine can and can't accomplish and what the best strategies are. By understanding what current technologies can accurately measure and what that in turn can tell us, we can make informed choices about where to focus time, money, and effort in developing tools and encouraging infrastructure growth.
On Friday, the Secretary's Advisory Committee on Genetics, Health, and Society (SACGHS), an advisory body for the Secretary of Health and Human Services (HHS), released its draft report Realizing the Promise of PGx: Challenges and Opportunities for public comment. I want to talk about my impressions of their findings and recommendations. I'm going to constrain myself to the Executive Summary and the Introduction (with the occasional stop into the main text for more context), mainly because I haven't had time to thoroughly read the report's hefty 100 pages.
To begin with, I want to mention one caveat. This report focuses (like the title says) on pharmacogenomics (for brevity I'll use their abbreviation, PGx). This is distinct from personalized medicine, both because personalized medicine is broader (it incorporates a number of facets other than a patient's response to a specific drug) and because PGx is broader (there are some important basic science problems that can be addressed by pharmacogenomic research that, while tangentially related to medicine, are not directly clinically relevant). There is significant overlap, however, and many of the problems and challenges of PGx also apply to personalized medicine more broadly.
The report makes recommendations in fifteen areas. I'm going to focus on just a few of these and talk about their recommendations for
The recommendations suggest that the FDA needs to provide guidance for companies intending to develop drugs and associated diagnostic tools to assess a drug's efficacy for a specific person. In particular, the report says the FDA needs to address the review process for the case where the drug is subject to FDA review but the diagnostic test is not. I have a simpler question: should there ever BE a case where a diagnostic test intended to identify people who will respond to a specific drug is not subject to FDA review? I'm not sure how far the regulatory oversight of the FDA extends here, or what precisely is covered, but one important step in both making PGx effective and ensuring public confidence is extensive and exhaustive validation by an objective body. For drugs and diagnostics this means FDA review. The report also recommends providing incentives to the private sector for developing PGx technologies. To a limited extent I think this is an excellent idea, but it has to be executed properly. Being first into a market is potentially expensive, yes, but there are definite benefits to it. I think that providing some financial incentives is a reasonable way to encourage investment, and I think that expedited FDA review (another suggestion) is an excellent idea. The last suggested way of encouraging investment (and I understand that these are simply ideas for discussion and not concrete recommendations) is increasing the intellectual property rights of these early investors. This is a very bad idea, and one that is at odds with another goal - equitable and widespread access to PGx technology. Private industry will be an important driver of the field, but the financial rewards it stands to reap should be enough. Strengthening IP protection will only serve to limit access due to cost.
Analytic validity, clinical validity, clinical utility, and cost-effectiveness are the foundations that clinical practice modifications are based on. Unless a new test or technique sufficiently demonstrates these traits, no physician is going to adopt it. The report recommends that HHS work to assess these for PGx applications and develop ways to improve them, such as better datasets and improvements to study methodologies, as well as quantifying the differing levels of evidence required for different uses of PGx technology. More importantly, it says pharmaceutical manufacturers should publish the results of studies on the clinical validity and utility of PGx, even (I would say especially) non-significant or negative results, or make the data available to be studied by others. I think a better approach may be to require drug makers to report these results to the FDA as part of the approval and surveillance process. They will still want to publish positive results in peer-reviewed journals, and I think that's a fine thing, but the results from all of their studies should be available to other researchers in some other form.
Data sharing is a potential goldmine for researchers. Right now obtaining datasets can be quite difficult, both mechanically (because of their size and format) and politically (because they are well-protected, even by government-funded researchers). The report recommends that HHS identify the obstacles to data sharing and encourage companies and academic institutions to participate. It is also important for future research to develop ways to share and use patient data, and again, the report suggests that HHS work in coordination with other agencies and programs to ensure the interoperability of the various electronic health records systems in use and in development.
Of course, if this work stays in the basic research phase indefinitely, it doesn't do a lot of good. The report recommends that HHS help to catalog and disseminate applications of PGx technology, work with professional and licensing organizations to improve physician education, publish systematic reviews of PGx and its applications as they become available to help inform usage guidelines, and ensure that package inserts and labels on both drugs and PGx tests contain all available PGx information. This last point is especially important, because over 70% of current drugs have some PGx information available about them, but almost none contain this on their labels.
Last is health information technology. HHS needs to encourage both the growth of health IT expertise and the inclusion of PGx into current and future electronic health records (EHR). The report recommends working with the Office of the National Coordinator for Health Information Technology (did you know there was such a thing?) and other agencies to ensure that both EHR and clinical decision support tools take into account currently available PGx information. Also, for the time being (while EHR are not universal), HHS should develop a way for physicians to retrieve and utilize PGx information.
Overall, I think the report strikes the right tone - hopeful for the future applications but realistic about both the current state and the challenges that face the field. A number of the recommendations the report makes will also directly benefit personalized medicine more broadly. My sense is that this report won't change very extensively before becoming finalized, and when it is, it will be an important roadmap within HHS, and the NIH specifically, as to what PGx projects should have priority. Anyone who is working in the field should read this, both to get a sense for how the wind is blowing and for the chance to have some impact, through comments, on the direction of PGx in the next decade.
In my post defining "personalized medicine" I mentioned trying to tailor a person's drug treatment to get the best possible effect. Using a person's genetic makeup to choose the optimal drug treatment is called pharmacogenomics. Put another way, this is the study of the way a person's genome influences the effect of drug treatments. Drug response is a very complex phenotype that's influenced by both genetic and environmental factors, but high-throughput technologies such as gene expression microarrays and SNP genotyping arrays allow these genetic factors to be considered on a scale never before possible.
Drug response has two separate components, each of which can be studied in pharmacogenomic terms. The first component of drug response is pharmacokinetics, or the way the body metabolizes a drug. This can be crudely estimated now with some simple genotyping. Polymorphisms in the genes CYP2C19 and CYP2D6 are known to affect the rates of metabolism of many drugs. By modifying the effective concentration of medications, these polymorphisms can either decrease the drug's effectiveness or increase the risk of toxic side effects. The second component is pharmacodynamics, which is how the drug acts to treat the specific condition. Complex diseases from cancer to hypertension are heterogeneous both in their symptoms and in their response to drugs, and some of this variability is due to genetic factors. The underlying molecular cause of the disease, then, can be used to decide which drug is best suited for which patient.
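To see why a change in metabolic rate matters, here is a rough sketch using the simplest textbook pharmacokinetic model: a single compartment, a single dose that is absorbed instantly, and first-order elimination. That model is my simplifying assumption, not something specific to CYP2C19 or CYP2D6, and the dose, volume of distribution, and half-lives below are invented numbers chosen only to show the shape of the effect.

```python
import math

def concentration(dose_mg, volume_l, half_life_h, t_h):
    """Drug concentration (mg/L) at time t_h after a single instant dose.

    One-compartment model with first-order elimination:
    C(t) = (dose / volume) * exp(-k * t), where k = ln(2) / half-life.
    """
    k = math.log(2) / half_life_h  # elimination rate constant (per hour)
    return (dose_mg / volume_l) * math.exp(-k * t_h)

# Hypothetical values: same dose and volume, different elimination half-lives.
dose_mg, volume_l = 100.0, 50.0
for label, half_life_h in [("typical metabolizer", 4.0), ("slow metabolizer", 12.0)]:
    c = concentration(dose_mg, volume_l, half_life_h, t_h=12.0)
    print(f"{label}: {c:.2f} mg/L at 12 hours")
```

With these made-up parameters the slow metabolizer still has four times as much drug in circulation after twelve hours. If doses are repeated on a fixed schedule, that difference accumulates, which is how a polymorphism can push one person toward toxicity while a fast metabolizer on the same dose never reaches an effective level.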
Pharmacogenomic profiling for pharmacokinetic information should lead to a deeper understanding of drug metabolism. As it stands now, a few important polymorphisms are well-known, but have next to no role in clinical practice, because the predictions that can be made with that information are of uncertain accuracy. Knowing more factors that influence metabolism of certain drugs should lead to more nuanced predictions about individual responses. Determining what genetic factors affect the metabolism of a specific drug, however, is a very difficult problem, one that requires a great deal of genetic information and a fairly large sample of people, with blood drawn at regular intervals to measure drug concentrations. This process must be repeated for every drug (and every combination of drugs, too) under investigation.
The application of pharmacogenomics to pharmacodynamic research is similar, and requires a great deal of data and effort. By considering genetic information about a group of patients treated with a specific drug, polymorphisms that mark response can be identified. Finding a number of these can lead to a signature or profile that can be used for prediction. In fact, this has already been done successfully in cancer research[1,2,3], but it is not routine in cancer treatment, both because the technologies involved are expensive and because the outcomes that are being predicted by these methods are limited. Larger scale investigations will fix the latter problem, and technological advances will hopefully fix the former.
Pharmacogenomics is only beginning as an area of research, but it has the potential to improve the way drugs are developed and prescribed. At the same time, hyperbole about the benefits of pharmacogenomics (and about how close it is to real clinical impact) has led to disappointment[4]. Being realistic about the potential of pharmacogenomics is important, but optimism is definitely warranted. The lessons learned in predicting other complex phenotypes will be of service in pharmacogenomics, and it is realistic to think that in the next 10 or 15 years it will be routine to screen people with common, complex diseases to choose the medicine and dosage most likely to work for them. This profiling may take significantly longer for rarer diseases, particularly uncommon forms of cancer, but the momentum is there, and as the cost of high-throughput measurements drops, the number of samples being collected will climb, pushing this knowledge closer to clinical usage.
Okay, now that we've spent some time defining our terms and delineating some of the problems it's time to start thinking about solutions. I'm going to start with the one I think is most fundamental: infrastructure. I'll talk about some strategies for developing infrastructure and encouraging investment in infrastructure by both private industry and health care providers. I have another presentation this Thursday, so the post may be delayed a day or so.
The promise of personalized medicine is one that is fundamentally rooted in science. It's based, at least partly, on the belief that drives all science: knowing more (relevant) information about a process can lead to a deeper understanding of how that process works. Much science, however, (and particularly molecular biology) has followed a fundamentally reductionist paradigm. Each part of a system is studied in isolation, and the information it provides is considered additive to the information provided by a separate piece of the system.
But R.B. Laughlin and David Pines write:
"So the triumph of the reductionism of the Greeks is a pyrrhic victory: We have succeeded in reducing all of ordinary physical behavior to a simple, correct Theory of Everything only to discover that it has revealed exactly nothing about many things of great importance."[1]
Laughlin & Pines are talking about the Theory of Everything in physics, but the principle holds. Human biology is inordinately complex, with variables working not in isolation but in concert. To individualize medical care requires a deeper understanding of that biology, and that is no small order. I'm going to cover three major problems in this post, which is by no means an exhaustive list of the scientific issues facing personalized medicine. Rather, it's a subset of issues that are very interesting to me and present significant hurdles to the field. These are the effect of single nucleotide polymorphisms on disease risk, the prediction of a person's risk of developing a disease, and pharmacogenomics.
The first problem is how single nucleotide polymorphisms (SNPs) affect a person's risk of disease or how they influence a complex phenotype. Some of these polymorphisms have an obvious impact. They change either an amino acid or an exon splice site, leading to a different protein product. Others might affect promoter binding sites or cause no discernible change whatsoever. Genetic association studies, the typical tool for identifying SNPs with an impact on a disease or quantitative phenotype, have run up against a roadblock in recent years; the findings of many promising studies have not been replicated when the studies were repeated in independent samples. The reasons for this lack of reproducibility are numerous, ranging from spurious initial findings to poor choices for replication samples. Although a number of potential solutions, from cross-validation to requiring all association studies to include an independent validation sample, have been proposed, the field still lacks a consistent, realistic, and tractable method for assessing which associations are "real" (i.e., have real predictive power or mechanistic insight and are worth devoting further resources to) and which are either artifacts of the data or are exclusive to the population being studied.
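One source of spurious initial findings is simple arithmetic: test enough SNPs and some will look significant by chance alone. The sketch below works through that arithmetic. The number of SNPs tested (500,000) and the 0.05 significance level are assumptions chosen for illustration, and the Bonferroni correction shown is just one common, conservative adjustment; I'm not claiming it is the solution to the replication problem described above.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of 'significant' results if no SNP has any real effect."""
    return num_tests * alpha

def bonferroni_threshold(num_tests, alpha):
    """Per-test p-value cutoff that keeps the chance of any false positive near alpha."""
    return alpha / num_tests

num_snps = 500_000  # hypothetical number of SNPs tested in one study
alpha = 0.05        # conventional significance level

print(expected_false_positives(num_snps, alpha))  # chance hits at p < 0.05
print(bonferroni_threshold(num_snps, alpha))      # corrected per-SNP cutoff
```

Under these assumptions roughly 25,000 SNPs would pass an uncorrected 0.05 cutoff with no true association at all, while the corrected cutoff is on the order of one in ten million. A threshold that strict is hard for any single SNP with a small effect to clear, which is part of why the choice of validation strategy matters so much.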
The next problem is predicting a person's risk of developing a disease. Biology and disease etiology are very complex, and attempting to create a mathematical model of who will develop a disease is impractical. At the same time, being able to estimate a person's risk of a disease is an important goal; it allows people who will benefit from early interventions to be identified, and it can both improve people's health and lower the long-term cost of a person's care. Right now, the only way to predict a person's risk of a disease is to have a medium to large study population in which the outcome of interest has been measured and to develop a prediction scheme from that. It's laborious, and if there isn't a study that looks at your outcome, there's no way to create one. Worse, it's difficult to know how well a risk prediction scheme will work when applied to a new patient, especially if that patient is very different from the people used to construct the scheme. These systems are imperative to personalized medicine (and, not coincidentally, the subject of my dissertation research, but that's a topic for another time - we're laying out problems, not solutions today), and ways to make them more accurate and more broadly applicable must be found.
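One common way to ask how well a risk prediction scheme works on new patients is to score people who were not used to build it and check how often someone who developed the disease was given a higher risk than someone who did not. That proportion is the concordance statistic, or area under the ROC curve. The sketch below is a minimal version of that calculation; the risk scores and outcomes are invented, and this is an illustration of the evaluation idea rather than a description of any particular prediction system.

```python
def concordance(risk_scores, outcomes):
    """Fraction of (case, non-case) pairs in which the case has the higher score.

    outcomes: 1 if the person developed the disease, 0 otherwise.
    Ties count as half. 0.5 means no better than chance; 1.0 is perfect ranking.
    """
    case_scores = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    control_scores = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                total += 1.0
            elif case == control:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))

# Hypothetical held-out patients: predicted risk and what actually happened.
held_out_scores = [0.9, 0.8, 0.3, 0.2]
held_out_outcomes = [1, 0, 1, 0]
print(concordance(held_out_scores, held_out_outcomes))
```

Computing this on the people used to build the scheme will generally flatter it. The number that matters is the one from an independent group, and ideally from a group that resembles the patients the scheme will actually be used on.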
Last is a group of problems that fall into their own category: pharmacogenomics. Creating drugs that have fewer adverse side effects and are more effective has long been a goal of researchers and pharmaceutical companies. The genomics revolution means that more information than ever before can be integrated into this process. But to design more effective drugs requires two very difficult things: a deeper understanding of the mechanisms that underlie different disease subtypes and a more thorough understanding of how drugs act. As an exercise, flip through a drug reference book like the Physicians' Desk Reference. The precise mechanism of many (if not most) recent blockbuster drugs is unknown. Doctors understand that some drugs work better for some people than for others, but there is no systematic way to tell who will respond. Targeting drugs to specific disease subtypes should help alleviate that problem. But as an extra layer of consternation, there is significant variability not only in how people respond to a drug, but in what dose they need and what side effects they suffer. This is because a number of factors influence drug metabolism. Polymorphisms in the genes CYP2C19 and CYP2D6 are known to affect the rates of metabolism of many drugs. By modifying the effective concentration of medications, these polymorphisms can either decrease the drug's effectiveness or increase the risk of toxic side effects. Despite the fact that the CYP subunits responsible for processing many drugs are known and the effect of polymorphisms in these subunits is fairly well studied, it is still not typical clinical practice to base drug dosage on this information. Hopefully increased accuracy will change this.
This list of problems just begins to scratch the surface, but it should provide a good sense of the scale of the scientific issues that face personalized medicine. None of the problems are easy, and none of them will be solved overnight. It's not even clear right now what the best solution to some of these problems looks like. In fact, the answer to some of these problems depends on the answer to some of the policy and ethical issues facing personalized medicine. On Monday I'll begin to describe what those policy issues are.
If I want to talk about personalized medicine (and I do), I have to begin by saying what I mean by it. (As a side note, I'll use the term individualized medicine interchangeably. Occasionally, people will use them to slightly different effect, but for my purposes, they're the same thing.) And what I mean is pretty simple - the combining of all different types of data (clinical, environmental, and genetic) to predict what diseases a person is at risk for and to identify medical treatments that will work for that specific person.
It's easy to lose sight of how far medicine has come in the past 100 years. We take for granted that most diseases are able to be treated if not cured, and we dedicate significant resources to medical research. Modern chemistry has led to hundreds of drugs that have saved countless lives. For all that, medicine can still be a crude endeavor.
Consider hypertension. It is one of the most prevalent diseases in America, and the single most common reason that people visit their doctors. In spite of that, less than half of people taking drugs to treat their hypertension actually have their blood pressure under control. Why is that? Well, partly because people don't change their lifestyles to combat the disease, but also because there is no way to identify which patient will respond to which drug. Hypertension is extremely heterogeneous, and it stands to reason that different subsets will respond to different medicines. For now, though, there is neither a way to easily assign a person to a subtype of hypertension, nor a mapping for which drug best treats which subtype.
But let's back up for a second. Why is hypertension so common? What leads to a person developing hypertension? For now the best predictors of hypertension are age (the older you are, the higher your risk of high blood pressure) and family history (if your relatives have high blood pressure, you're more likely to, also). But that casts a very wide net, and it's difficult to identify the people who would most benefit from early interventions to prevent them from developing hypertension. One potential application of personalized medicine is being able to combine all of the information available to make better predictions about who is really at risk of developing hypertension.
Finding and targeting those at risk, though, will not stop everyone from getting hypertension. And the next potential application of personalized medicine is determining who will respond to which drug. This type of prediction is currently not even considered as part of treating a patient; rather, the physician makes an educated guess about what drug may work and then monitors to see if the dosage needs to be increased or if another drug needs to be tried. But by identifying subsets of hypertensives and identifying which drugs work best in each subset, hypertension treatment will not only be more effective, there will likely be fewer adverse reactions and less wasted money.
Now that we know what personalized medicine is, the next three posts will cover the scientific, policy, and ethical issues that face the field. I don't intend to lay out much in the way of answers, and I also doubt that my listing of problems will be exhaustive. Rather, I want to convey a sense of the breadth of the issues that I'll be discussing in more depth over the next few months.
Reagan Kelly is a PhD student at the University of Michigan studying bioinformatics. His thesis is focused on risk prediction algorithms for personalized medicine systems, and he is also interested in the policy and societal implications of individualized healthcare. You can read his CV for more information about him. If you would like to contact him, please send an email to reagank -at- reagank.com