I have recently explored the development of the biggest Genomics facility in the world at present, the BGI in China, to contextualize what kind of role Africa can play in the ‘omics’ revolution.
Here, I am starting to investigate some of the challenges the ‘omics’ discipline is experiencing as it generates increasing amounts of data at ever lower costs, while the benefits promised by the ‘omics’ revolution remain few and far between.
This post will explore why the ‘omics’ innovation engine is currently in a state of crisis by highlighting problems in ‘omics’ research, exploring the likely consequences if proper fixes are not found, and outlining potential solutions.
Ever since human DNA was decoded in its entirety more than 10 years ago, the hype about the possible benefits emanating from this landmark achievement has been never-ending. Clearly, the excitement has been warranted, considering (i) the efforts that went into the project and (ii) the basis it has laid for further scientific and technological developments.
While an ever-growing number of scientists seem to be jumping on the ‘omics’ bandwagon, the exponentially increasing volume of reported data (a historical development of ‘Genomics’ citations in PubMed is provided in the following image) hasn’t led to the promised breakthrough revelations in drug discovery and human medicine.
The feverish buzz that is characterising the ‘omics’ field at the moment has prompted some to call for a more cautious approach in advancing the entire discipline, in the interest of reaping the real benefits from the high-throughput biological revolution (1).
In fact, in spite of individual breakthrough biomedical innovations, the field is plagued by cases of botched science and a research and development process that is hardly fit for tackling the complex biomedical questions it is meant to solve.
Problems in the ‘omics’ arena
In 2009, 3 clinical trials led by Duke University were suspended because of the irreproducibility of genomic ‘signatures’ used to select cancer therapies for patients (2, 3).
The trials were based on filed patent applications and published papers describing genomic predictors — computer algorithms that take gene-expression data from a cancer cell and predict whether the cancer will be sensitive to a particular therapy. This is the kind of application clinicians are keen to use in treating patients with complex diseases, such as cancer, in order to develop fit-for-purpose therapies (‘Personalised Medicine’).
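To make the idea of a genomic predictor concrete, here is a minimal sketch of one of the simplest approaches of this kind: a nearest-centroid classifier that labels a tumour sample as ‘sensitive’ or ‘resistant’ to a therapy based on its gene-expression profile. This is an illustration only; the gene counts and expression values below are hypothetical, and the actual algorithms in the Duke trials were considerably more elaborate.

```python
# Illustrative nearest-centroid "genomic predictor" (hypothetical data).
# Each profile is a list of expression values, one per gene.

def centroid(profiles):
    """Per-gene mean expression across a list of equal-length profiles."""
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(profiles[0]))]

def predict(sample, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda label: dist(sample, centroids[label]))

# Hypothetical training profiles (rows: tumour samples, columns: genes)
sensitive = [[2.1, 0.3, 1.8], [1.9, 0.5, 2.0]]
resistant = [[0.4, 2.2, 0.6], [0.6, 2.0, 0.5]]

centroids = {
    "sensitive": centroid(sensitive),
    "resistant": centroid(resistant),
}

print(predict([2.0, 0.4, 1.9], centroids))  # closest to the 'sensitive' centroid
```

Even in this toy form, the sketch hints at why validation matters so much: the prediction depends entirely on the quality and labelling of the training profiles, so errors introduced upstream propagate silently into clinical decisions.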
The trials were suspended following an intense investigation by a group of researchers around Keith Baggerly at the University of Texas MD Anderson Cancer Center, who had raised concerns over the quality and reproducibility of the data reported in these publications. As a consequence, a high-profile paper published in the Journal of Clinical Oncology was retracted on November 16, 2010 (4).
Unfortunately, cases like this – as extreme as they may be – are merely the tip of the proverbial ‘omics’ iceberg. A recent study found that only 2 out of 18 (11%) published microarray gene-expression results were exactly reproducible (5). Ten of the studies (55%) could not be repeated at all, mainly due to inadequate reporting of results. One would have thought that a technology such as DNA microarrays, which has been around for more than 15 years, would by now guarantee a steady supply of good-quality results, even in larger-scale and more complex studies.
Considering this, one wonders how much worse the problem might be in technology areas that are less mature and prone to higher analytical variability, such as mass-spectrometry-based proteomics. Indeed, a recent study tested the ability of 27 laboratories to evaluate standardized samples containing 20 highly purified recombinant human proteins with mass spectrometry, a simple challenge compared to the thousands of proteins involved in clinical samples. Only seven laboratories (26%) reported all 20 proteins correctly, and only one lab captured all tryptic peptides of 1,250 daltons (6).
Despite the efforts that go into large-scale ‘omics’-driven biomarker discovery, few biomarkers reach clinical practice. In fact, few discoveries (currently far less than 1%) progress far in the translation process (7). And even those that do carry a huge risk of failure because of the flaws in the process leading to the generation of ‘omics’ signatures.
Considering available public information and our own experience in this field, it really seems as if ‘omics’ powered research and development is fraught with problems in every step of the process.
If the discovery stage is divided into design, execution and outcomes (see image below), the process seemingly suffers from a number of flaws, amongst others:
- inadequate project sizes and a lack of research hypotheses;
- a lack of workflow validation and of adequate data management (leading to basic errors in the reporting process), and over-compensating bioinformatics (trying to use sophisticated tools to make sense of large-scale data-sets that are fraught with errors);
- incomplete reporting of data, publication bias (exaggeration of results and hyping-up of findings), and inadequate peer review (missing fundamental flaws in manuscripts submitted for publication).
Following publication, hardly any of the ‘omics’ data emerging from the discovery phase move downstream along the innovation chain. Those that do typically lack proper validation at virtually every stage of the process, including simple replication of results in the originating laboratory, independent external replication, independent validation (with different samples), and proper clinical validation.
The current approach seems to shift the onus of creating value from ‘omics’ research down the innovation chain. It appears as if all the activities are concentrated on the discovery phase, with hardly any of the outcomes moving into product development. This could be due to the poor quality of the data generated in the discovery phase; equally likely, it could be due to the programmatic focus on and the incentives for scientists to concentrate on discovery work while disregarding validation of results.
With public funding organisations spending in the order of 3 billion USD per annum on Genomics research alone (8), at face value the return on this mega-investment is meagre. At the moment, at least, ‘omics’ doesn’t seem to be the right fix for the world’s ailing health care issues.
Consequences of a fix that fails
The predominant approach to tackling complex biological and biomedical questions nowadays is to aim the big ‘omics’ guns at every research problem that moves, regardless of the adequacy of the (technical) solution employed. Once the dust of the ‘high-throughput’ data-generation barrage has settled, bioinformaticians are deployed to reconnoitre what has actually been hit. Their task is to sift through the debris and look for high-profile casualties, because that is what we are ultimately after. If this sounds like going after the proverbial terrorist hiding in a cave somewhere in ‘Problemistan’ with weapons of mass destruction, that is because it is. Often it is difficult to see what was actually hit, let alone to know who is friend and who is foe. In the ‘omics’ arena, this approach is not really solving any problems either.
Large-scale ‘omics’ efforts (e.g. genome-wide association studies and whole-genome sequencing initiatives) are employed to tackle problems in health care, amongst others the prevalence of communicable and non-communicable diseases, the lack of adequate diagnosis or cure, and rising health care costs. In an ideal scenario, pictured as a reflexive loop system below, the situation would actually improve. Why is this not the case?
Because the unintended consequence of current ‘omics’ efforts is that fundamental principles of good-quality research are often neglected, eventually reinforcing the problem. The (unintentionally) overlooked issues include the formulation of proper research questions or hypotheses; the preparation of proper study plans, including data-analysis plans; the proper documentation and validation of analytical workflows; and the proper documentation and publication of results.
This looks like a ‘Fixes that Fail’ archetype (9), a pattern commonly found in systems where quick fixes are applied to complex problems while the real underlying causes are overlooked.
With respect to the ‘omics revolution’ and the promises it makes, the consequence is that the problems in health (and other areas) don’t disappear; they may actually get worse because policy makers, funders and investors, and the public will eventually lose faith in the empty promises made by ‘omics’.
Solutions to enhance ‘omics’ research and translation
There appear to be 3 problem areas: the public domain (where the key drivers of science reside), the science domain (where research ideas are being conceived and developed) and the technological domain (where data are being generated and interpreted). These areas are interlinked; therefore, solutions should be applied across the 3 fields simultaneously in order to achieve a desired effect, e.g. to enhance the number of good quality projects that are eventually translated into practical applications with proven clinical utility and validity.
Below is a high-level summary of solutions proposed by experts in the field (7, 10) and blended with strategies employed by the CPGR.
Finally, it will be helpful to consider ‘translational omics’ as a system where key development stages are connected in a feedback-loop fashion. 2 things are of key relevance, in my mind, in order to enhance the performance of the system: (i) the individual components must be viable. Research strategies, technologies, processes and procedures used in the discovery phase must support the creation of good-quality data-sets. This, in turn, is the basis for validation and further development efforts downstream of early-stage discovery. Likewise, these downstream stages have to be resourced properly in order to be viable. (ii) Aspects of validation and development should be considered as early as possible in the discovery phase to get a true understanding of total ‘omics’ R&D efforts and costs. Even though it may seem unusual to scientists who believe in the independence and ‘purity’ of the research they do, working in a genuinely multidisciplinary fashion from the word go has the potential not only to enhance research outputs but also to boost translational efforts. Since this is the impact that funders, clinicians and concerned patients at large are looking for, everyone involved in the process will eventually benefit.
1. Evans JP, Meslin EM, Marteau TM & Caulfield T. (2011) Deflating the genomic bubble. Science, 331: 861 – 862.
2. Baggerly K. (2010) Disclose all data in publications. Nature, 467: 401.
4. Hsu DS, Balakumaran BS, Acharya CR, Vlahovic V, Walters KS, Garman K, Anders C, Riedel RF, Lancaster J, Harpole D, Dressman HK, Nevins JR, Febbo PG & Potti A. (2007) Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer. J Clin Oncol, 25: 4350 – 4357.
5. Ioannidis JP, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, Mangion J, Mehta T, Nitzberg M, Page GP, Petretto E & van Noort V. (2009) Repeatability of published microarray gene expression analyses. Nature Genetics, 41: 149 – 155.
6. Bell AW, Deutsch EW, Au CE, Kearney RE, Beavis R, Sechi S, Nilsson T & Bergeron JJ; HUPO Test Sample Working Group. (2009) A HUPO test sample study reveals common problems in mass spectrometry-based proteomics. Nat Methods, 6: 423 – 430.
7. Ioannidis JP & Khoury MJ (2011) Improving validation practices in “omics” research. Science, 334: 1230 – 1232.
8. Pohlhaus JR & Cook-Deegan RM. (2008) Genomics research: world survey of public funding. BMC Genomics, 9: 472-490.
10. Baggerly KA & Coombes KR. (2011) What information should be required to support clinical “omics” publications? Clin Chem, [Epub ahead of print].