I want to get back to considering some ideas for building infrastructure, but I need to take one other detour first. I've used the terms "high-throughput" and "omics" quite a bit, but what, exactly, do they mean? Simply, high-throughput refers to just that: a technology in which a large (or even exhaustive) number of measurements can be taken in a fairly short time period. "Ome" and "omics" are suffixes derived from genome (the whole collection of a person's DNA, a term coined by Hans Winkler as a combination of "gene" and "chromosome"¹) and genomics (the study of the genome). Scientists like to append these to any large-scale system (or really, just about anything complex), such as the collection of proteins in a cell or tissue (the proteome), the collection of metabolites (the metabolome), and the collection of RNA that's been transcribed from genes (the transcriptome). High-throughput analysis is essential to considering data at the "omic" level, that is to say considering all DNA sequences, gene expression levels, or proteins at once (or, to be slightly more precise, a significant subset of them). Without the ability to rapidly and accurately measure tens or hundreds of thousands of data points in a short period of time, there is no way to perform analyses at this level.
There are four major types of high-throughput measurements that are commonly performed: genomic SNP analysis (i.e., the large-scale genotyping of single nucleotide polymorphisms), transcriptomic measurements (i.e., the measurement of all gene expression values in a cell or tissue type simultaneously), proteomic measurements (i.e., the identification of all proteins present in a cell or tissue type), and metabolomic measurements (i.e., the identification and quantification of all metabolites present in a cell or tissue type). Each of these four is distinct and offers a different perspective on the processes underlying disease initiation and progression as well as on ways of predicting, preventing, or treating disease.
Genomic SNP genotyping measures a person's genotypes for several hundred thousand single nucleotide polymorphisms spread throughout the genome. Other assays exist to genotype ten thousand or so polymorphic sites that are near known genes (under the assumption that these are more likely to have some effect on these genes). The genotyping technology is quite accurate, but the SNPs themselves offer only limited information. These SNPs tend to be quite common (with typically at least 5% of the population having at least one copy of the less frequent allele), and are not strictly causal of disease. Rather, SNPs can act in unison with other SNPs and with environmental variables to increase or decrease a person's risk of a disease. This makes identifying important SNPs difficult; the variation in a trait that can be accounted for by a single SNP is fairly small relative to the total variation in the trait. Even so, because genotypes remain constant (barring mutations to individual cells) throughout life, SNPs are potentially among the most useful measurements for predicting risk.
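To make the "at least one copy of the less frequent allele" figure concrete, here is a minimal sketch of how allele frequency and carrier fraction relate. The 0/1/2 genotype coding, the function names, and the toy cohort are all my own assumptions for illustration, and the last function assumes Hardy-Weinberg proportions, which real populations only approximate.

```python
def allele_frequency(genotypes):
    """Frequency of the less frequent allele.

    genotypes: one value per person, giving the number of copies (0, 1, or 2)
    of the less frequent allele. This coding is an assumption for the sketch.
    """
    return sum(genotypes) / (2 * len(genotypes))


def carrier_fraction(genotypes):
    """Fraction of people carrying at least one copy of the less frequent allele."""
    return sum(1 for g in genotypes if g > 0) / len(genotypes)


def expected_carrier_fraction(q):
    """Carrier fraction expected for allele frequency q, assuming
    Hardy-Weinberg proportions: 1 - (1 - q)^2."""
    return 1 - (1 - q) ** 2


# Hypothetical cohort of 100 people: 90 with no copies, 9 with one, 1 with two.
cohort = [0] * 90 + [1] * 9 + [2] * 1
q = allele_frequency(cohort)
print(q, carrier_fraction(cohort), expected_carrier_fraction(q))
```

The point of separating the two numbers is that a SNP can have a fairly low allele frequency while still being carried by a noticeably larger share of the population, since most carriers have only one copy.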
Transcriptomic measurements (often referred to as gene expression microarrays or "gene chips") are the oldest and most established of the high-throughput methodologies. The most common are commercially produced "oligonucleotide arrays", which have hundreds of thousands of small (25 bases) probes, between 11 and 20 per gene. RNA that has been extracted from cells is then hybridized to the chip, and the expression level of ~30,000 different mRNAs can be assessed simultaneously. More so than SNP genotypes, there is the potential for a significant amount of noise in transcriptomic measurements. The source of the RNA, the preparation and purification methods, and variations in the hybridization and scanning process can lead to differences in expression levels; statistical methods to normalize, quantify, and analyze these measures have been among the hottest areas of research in the last five years. Gene expression levels influence traits more directly than SNPs, and so significant associations are easier to detect. Transcriptomic measures are not as useful for pre-disease prediction, because a person's gene expression levels can change so significantly that measurements taken far in advance of disease initiation are not likely to be informative. They are, however, very well-suited for either early identification of a disease (i.e., finding people who have gene expression levels characteristic of a disease but who have not yet manifested other symptoms) or classifying patients with a disease into subgroups (by identifying gene expression levels that are associated with either better or worse outcomes or with higher or lower values of some disease phenotype).
Proteomics is similar in character to transcriptomics. The most significant difference is with regard to the measurements. Unlike transcriptomics, where the gene expression levels are assessed simultaneously, protein identification is done in a rapid serial fashion. After a sample has been prepared, the proteins are separated using chromatography, 2 dimensional protein gels (which separate proteins based on charge and then size), or 1 dimensional protein gels (which separate based on size alone). They are then digested, typically with trypsin (which cuts proteins after each arginine and lysine), and run through mass spectrometry. The mass spec measures the mass of each of the peptides, and the proteins can be identified by comparing the masses of the peptides created with the theoretical digests of all known proteins in a database. This searching is the key to the technology, and a number of algorithms, both commercial and open-source, have been created for this. Unlike transcriptomic measures, the overall quantity of a protein cannot be assessed, just its presence or absence. Like transcriptomic measures, though, proteomic measures are excellent for early identification of disease or classifying people into subgroups.
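The database search described above can be sketched in a few lines. This is a toy illustration rather than how any particular search algorithm works: the function names, the two-protein "database", and the 0.5 Da tolerance are hypothetical, the digest ignores missed cleavages and the rule that trypsin generally does not cut before proline, and the scoring is a simple count of matching peptide masses.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Theoretical digest: cut after every lysine (K) or arginine (R).

    Simplified: no missed cleavages and no proline exception.
    """
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # trailing peptide that does not end in K or R
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed, sequence, tolerance=0.5):
    """Number of observed masses within `tolerance` Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(o - t) <= tolerance for t in theoretical) for o in observed)


def best_match(observed, database, tolerance=0.5):
    """Name of the database protein whose digest explains the most observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tolerance))


# Hypothetical two-entry database; the sequences are made up for illustration.
database = {"protein_a": "AKGRC", "protein_b": "MKWVTF"}
observed = [peptide_mass(p) for p in tryptic_digest(database["protein_a"])]
print(best_match(observed, database))
```

Real search tools score matches statistically rather than by raw counts, since with thousands of candidate proteins some peptide masses will coincide by chance.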
Last up is metabolomics, the high-throughput measure of the metabolites present in a cell or tissue. As with proteomics, the metabolites are measured in a very fast serial process. NMR is typically used to both identify and quantify metabolites. This technology is newer and less frequently used than the other technologies, but similar caveats apply. Measurements of metabolites are dynamic as are gene expression levels and proteins, and so are best suited for either early disease detection or disease subclass identification.
These are obviously foreshortened descriptions of each of these technologies, but a passing familiarity with the state of technology is really important to understanding what personalized medicine can and can't accomplish and what the best strategies are. By understanding what current technologies can accurately measure and what that in turn can tell us, we can make informed choices about where to focus time, money, and effort developing tools and encouraging infrastructure growth.
On Friday, the Secretary's Advisory Committee on Genetics, Health, and Society (SACGHS), an advisory body for the Secretary of Health and Human Services (HHS), released its draft report Realizing the Promise of PGx: Challenges and Opportunities for public comment. I want to talk about my impressions of their findings and recommendations. I'm going to constrain myself to the Executive Summary and the Introduction (with the occasional stop into the main text for more context), mainly because I haven't had time to thoroughly read the report's hefty 100 pages.
To begin with, I want to mention one caveat. This report focuses (like the title says) on pharmacogenomics (for brevity I'll use their abbreviation, PGx). This is distinct from personalized medicine, both because personalized medicine is broader (it incorporates a number of facets other than a patient's response to a specific drug) and because PGx is broader (there are some important basic science problems that can be addressed by pharmacogenomic research that, while tangentially related to medicine, are not directly clinically relevant). There is significant overlap, however, and many of the problems and challenges of PGx also apply to personalized medicine more broadly.
The report makes recommendations in fifteen areas. I'm going to focus on just a few of these and talk about their recommendations for
I'm going to deviate a little from the planned topic for today. A bill I've mentioned before, the Genetic Information Nondiscrimination Act, has been in the news recently, and will hopefully pass within the next few weeks. I have a ton of respect for Congresswoman Slaughter, the bill's primary sponsor in the House (she represents Rochester, NY, where I went to college, and was a big supporter of RIT), and she has real science bona fides, with a degree in microbiology and a master's degree in Public Health, but how good is this bill?
I want to spend a little bit of time dissecting it (not parsing phrase-for-phrase, but rather pulling out important points), and trying to assess its potential impact. Most of the press this bill has received has been positive (if uncritical, but what do I expect from the mainstream media on science?), but I'm always uneasy when I see a very diverse group of people supporting something. If all of these people like it, how can it possibly be doing much of anything? At the same time, at least some health insurers are opposed, and that gives me some visceral, if not intellectual, confirmation that the bill is on the right track. Continue reading "The Genetic Information Nondiscrimination Act of 2007"
In my post defining "personalized medicine" I mentioned trying to tailor a person's drug treatment to get the best possible effect. Using a person's genetic makeup to choose the optimal drug treatment is called pharmacogenomics. Put another way, this is the study of the way a person's genome influences the effect of drug treatments. Drug response is a very complex phenotype that's influenced by both genetic and environmental factors, but high-throughput technologies such as gene expression microarrays and SNP genotyping arrays allow these genetic factors to be considered on a scale never before possible.
Drug response has two separate components, each of which can be studied in pharmacogenomic terms. The first component of drug response is pharmacokinetics, or the way the body metabolizes a drug. This can be crudely estimated now with some simple genotyping. Polymorphisms in the genes CYP2C19 and CYP2D6 are known to affect the rates of metabolism of many drugs. By modifying the effective concentration of medications, these polymorphisms can either decrease the drug's effectiveness or increase the risk of toxic side effects. The second component is pharmacodynamics, which is how the drug acts to treat the specific condition. Complex diseases from cancer to hypertension are heterogeneous both in their symptoms and in their response to drugs, and some of this variability is due to genetic factors. The underlying molecular cause of the disease, then, can be used to decide which drug is best suited for which patient. Continue reading "What is Pharmacogenomics?"
I've discussed some of the scientific and policy challenges that surround personalized medicine, but I've left for last a much harder task: defining the ethical issues. From my perspective, policy and science problems share at least one important common feature - even if no one can agree on the optimal solution, it's possible to propose a solution, consider its appropriateness in a reasonably objective fashion (hopefully using some pre-determined metric), and make adjustments based on its performance. Ethical issues are a bit trickier. I would guess that most people are willing to agree on what (most of) the issues are, but that's about as far as things go. Trying to decide on appropriate solutions is much harder, because it's essentially a balancing act, and success looks less like meeting an objective criterion and more like alienating the fewest people possible.
Because these issues are so much more difficult, I'm going to limit myself to three different classes of issues:
As with any significant undertaking, the challenges facing personalized medicine are not limited to the science behind it. A large number of public policy challenges exist that must be addressed before personalized medicine can become a reality. Each of these challenges must be dealt with not by a single person or group, but by all of the stakeholders that are affected by it. Who are the stakeholders? That seems like an easier question than it actually is, but in general, the stakeholders are physicians, health care organizations like hospitals and health networks, private insurance providers, public insurance providers such as Medicare and Medicaid, pharmaceutical companies, state governments, the federal government, and, of course, patients. Not all of these are affected by each issue, but solutions will only be possible when the affected stakeholders work together.
As with the scientific issues, in no way is my listing complete, nor is the discussion about the problems. Rather, I want to give a sense for how broad the policy issues are and who they affect. The main issues I want to describe are
The promise of personalized medicine is one that is fundamentally rooted in science. It's based, at least partly, on the belief that drives all science: knowing more (relevant) information about a process can lead to a deeper understanding of how that process works. Much science, however, (and particularly molecular biology) has followed a fundamentally reductionist paradigm. Each part of a system is studied in isolation, and the information it provides is considered additive to the information provided by a separate piece of the system.
But R.B. Laughlin and David Pines write
So the triumph of the reductionism of the Greeks is a pyrrhic victory: We have succeeded in reducing all of ordinary physical behavior to a simple, correct Theory of Everything only to discover that it has revealed exactly nothing about many things of great importance.¹
Laughlin & Pines are talking about the Theory of Everything in physics, but the principle holds. Human biology is inordinately complex, with variables working not in isolation but in concert. To individualize medical care requires a deeper understanding of that biology, and that is no small order. I'm going to cover three major problems in this post, which is by no means an exhaustive list of the scientific issues facing personalized medicine. Rather, it's a subset of issues that are very interesting to me and present significant hurdles to the field. These are
If I want to talk about personalized medicine (and I do), I have to begin by saying what I mean by it. (As a side note, I'll use the term individualized medicine interchangeably. Occasionally, people will use them to slightly different effect, but for my purposes, they're the same thing.) And what I mean is pretty simple - the combining of all different types of data (clinical, environmental, and genetic) to predict what diseases a person is at risk for and to identify medical treatments that will work for that specific person.
It's easy to lose sight of how far medicine has come in the past 100 years. We take for granted that most diseases are able to be treated if not cured, and we dedicate significant resources to medical research. Modern chemistry has led to hundreds of drugs that have saved countless lives. For all that, medicine can still be a crude endeavor. Continue reading "What is Personalized Medicine?"
After a long time of thinking about it (and a long time spent procrastinating), I've decided to resurrect the blog. So I've slapped on a new coat of paint, added a few new gew-gaws, and I'm off to the races.
The blog is now actually an experiment of sorts. I'm waist-deep in writing my thesis, which describes a risk prediction system that is able to sit at the heart of a personalized medicine system. It's fascinating work and I'm learning incredible amounts both about the mechanism of making a prediction and about what extra steps are necessary to make an algorithm clinically relevant and doctor friendly. Continue reading "Welcome - Let's Get Personal!"
Reagan Kelly is a PhD student at University of Michigan studying bioinformatics. His thesis is focused on risk prediction algorithms for personalized medicine systems, and he is also interested in the policy and societal implications of individualized healthcare. You can read his CV for more information about him. If you would like to contact him, please send an email to reagank -at- reagank.com