World Class Biotech. Made in Africa

The “Omics” Revolution

Fixes that fail in ‘omics’ research and translation

I have recently explored the development of the biggest Genomics facility in the world at present, the BGI in China, to contextualize what kind of role Africa can play in the ‘omics’ revolution.

Here, I am starting to investigate some of the challenges the ‘omics’ discipline is experiencing as it generates increasing amounts of data at increasingly lower costs, while the benefits that had been promised by the ‘omics’ revolution to date are few and far between.

This post will explore why the ‘omics’ innovation engine is currently in a state of crisis by way of highlighting problems in ‘omics’ research, exploring likely consequences if proper fixes are not found and highlighting potential solutions.


Ever since human DNA was decoded in its entirety more than 10 years ago, the hype about the possible benefits emanating from this landmark achievement has been never-ending. Clearly, the excitement has been warranted, considering (i) the effort that went into the project and (ii) the basis it has laid for further scientific and technological development.

While an ever-growing number of scientists seem to be jumping on the ‘omics’ bandwagon, the exponentially increasing volume of reported data (the historical development of ‘Genomics’ citations in PubMed is shown in the following image) hasn’t led to the promised breakthrough revelations in drug discovery and human medicine.

The feverish buzz that is characterising the ‘omics’ field at the moment has prompted some to call for a more cautious approach in advancing the entire discipline, in the interest of reaping the real benefits from the high-throughput biological revolution (1).

In fact, in spite of occasional breakthrough biomedical innovations, the field seems to be plagued by cases of botched science and a research and development process hardly fit for tackling the complex biomedical questions it is meant to solve.

Problems in the ‘omics’ arena

In 2009, 3 clinical trials led by Duke University were suspended because of the irreproducibility of genomic ‘signatures’ used to select cancer therapies for patients (2, 3).

The trials were based on filed patent applications and published papers describing genomic predictors — computer algorithms that take gene-expression data from a cancer cell and predict whether the cancer will be sensitive to a particular therapy. This is the kind of application clinicians are keen to use in treating patients with complex diseases, such as cancer, in order to develop fit-for-purpose therapies (‘Personalised Medicine’).
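Since the argument turns on what such a predictor actually does, here is a minimal, hedged sketch of the idea: a nearest-centroid classifier over synthetic gene-expression profiles. Everything in it (the gene counts, the 10-gene ‘signature’, the class labels) is invented for illustration; it is not the algorithm from the Duke papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: expression of 100 genes in 30 tumours,
# half labelled "sensitive" (1), half "resistant" (0) to a drug.
n_genes, n_per_class = 100, 15
sensitive = rng.normal(0.0, 1.0, (n_per_class, n_genes))
resistant = rng.normal(0.0, 1.0, (n_per_class, n_genes))
sensitive[:, :10] += 2.0  # 10 genes form the "signature": elevated in sensitive tumours

X = np.vstack([sensitive, resistant])
y = np.array([1] * n_per_class + [0] * n_per_class)

# Nearest-centroid predictor: one average expression profile per class.
centroid_s = X[y == 1].mean(axis=0)
centroid_r = X[y == 0].mean(axis=0)

def predict(profile):
    """Predict 1 (sensitive) if the profile is closer to the sensitive centroid."""
    d_s = np.linalg.norm(profile - centroid_s)
    d_r = np.linalg.norm(profile - centroid_r)
    return 1 if d_s < d_r else 0

# A new tumour whose signature genes are elevated should be called sensitive.
new_tumour = rng.normal(0.0, 1.0, n_genes)
new_tumour[:10] += 2.0
print(predict(new_tumour))  # → 1 (sensitive)
```

The fragility that Baggerly and colleagues documented lives in everything around such a classifier: how samples were labelled, how columns were aligned, and whether the published data allow the centroids to be recomputed at all.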

The trials were suspended following an intense investigation conducted by a group of researchers around Keith Baggerly at the University of Texas MD Anderson Cancer Center, who had raised concerns over the quality and reproducibility of the data reported in these publications. As a consequence, a high-profile paper published in the Journal of Clinical Oncology was retracted on November 16, 2010 (4).

Unfortunately, cases like this – as extreme as they may be – are merely the tip of the proverbial ‘omics’ iceberg. A recent study found that only 2 out of 18 (=11%) published microarray gene-expression results were exactly reproducible (5). Ten of the studies (=56%) could not be repeated at all, mainly due to inadequate reporting of results. One would have thought that a technology such as DNA microarrays, which has been around for more than 15 years, would by now guarantee a steady supply of good quality results, even in larger-scale and more complex studies.

Considering this, one wonders how much worse the problem would be in technology areas that are less mature and prone to higher analytical variability, such as mass spectrometry-based Proteomics. Indeed, a recent study tested the ability of 27 laboratories to evaluate standardized samples containing 20 highly purified recombinant human proteins with mass spectrometry, a simple challenge compared to the thousands of proteins involved in clinical samples. Only seven laboratories (=26%) reported all 20 proteins correctly, and only one lab captured all tryptic peptides of 1,250 daltons or larger (6).

Despite the efforts that go into large-scale ‘omics’ driven biomarker discovery, few biomarkers reach clinical practice. In fact, few discoveries (currently much less than 1%) progress far in the translation process (7). And if they do, a huge risk of failure remains because of the flaws in the process leading to the generation of ‘omics’ signatures.

Considering available public information and our own experience in this field, it really seems as if ‘omics’-powered research and development is fraught with problems at every step of the process.

If the discovery stage is divided into design, execution and outcomes (see image below), the process seemingly suffers from a number of flaws, among others

  1. inadequate project size and a lack of research hypotheses;
  2. lack of workflow validation, inadequate data management (leading to basic errors in the reporting process), and over-compensating Bioinformatics (trying to use sophisticated tools to make sense of large-scale data-sets that are fraught with errors);
  3. incomplete reporting of data, publication bias (exaggeration of results and hyping up of findings), and inadequate peer review (missing fundamental flaws in manuscripts filed for publication).

Following publication, hardly any of the ‘omics’ data emerging from the discovery phase move downstream along the innovation chain. If they do, the development lacks proper validation at virtually every stage of the process, including simple replication of results in the originating laboratory, independent external replication, independent validation (with different samples), and proper clinical validation.


The current approach seems to shift the onus of creating value from ‘omics’ research down the innovation chain. It appears as if all the activities are concentrated on the discovery phase, with hardly any of the outcomes moving into product development. This could be due to the poor quality of the data generated in the discovery phase; equally likely, it could be due to the programmatic focus on and the incentives for scientists to concentrate on discovery work while disregarding validation of results.

With public funding organisations spending in the order of 3 billion USD per annum on Genomics research alone (8), at face value the return on this mega-investment is meagre. At the moment, at least, ‘omics’ doesn’t seem to be the right fix for the world’s ailing health care issues.

Consequences of a fix that fails  

The predominant approach to tackling complex biological and biomedical questions nowadays is to aim with the big ‘omics’ guns at every possible research problem that moves, regardless of the adequacy of the employed (technical) solution. Once the dust of the ‘high-throughput’ data generation barrage has settled, bioinformaticists are deployed to recon what has actually been hit. Their task is to sieve through the debris and look for high-profile casualties, because that’s what we are eventually after. If this sounds like going after the proverbial terrorist hiding in a cave somewhere in ‘Problemistan’ with weapons of mass destruction – then that’s what it actually is. Often, it’s difficult to see what was actually hit, let alone knowing who is friend and who is foe. In the ‘omics’ arena, this is not really solving any problems either.

Large-scale ‘omics’ efforts (e.g. genome-wide association studies and whole-genome sequencing initiatives) are employed to tackle problems in health care, among them the prevalence of communicable and non-communicable diseases, the lack of adequate diagnosis or cure, and rising health care costs. In an ideal scenario, pictured as a reflexive loop system below, the situation would actually improve. Why is this not the case?

Because the unintended consequence of current ‘omics’ efforts seems to be that fundamental principles of good quality research are often neglected, eventually reinforcing the problem. The (unintentionally) overlooked issues include the formulation of proper research questions or hypotheses; the preparation of proper study plans, including data analysis plans; the proper documentation and validation of analytical workflows; and the proper documentation and publication of results.

This looks like a ‘Fixes that Fail’ archetype (9), a common problem found in systems where quick fixes are applied to complex problems whereas the underlying real causes of the problem are overlooked.
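For readers unfamiliar with the archetype, a toy simulation makes the dynamic concrete. The rates below are purely illustrative, not calibrated to any real system: a quick fix suppresses the symptom immediately, but its side effect feeds the underlying problem, so the symptom eventually rebounds above its starting level.

```python
# Minimal numeric sketch of the 'Fixes that Fail' archetype.
def simulate(steps, fix_strength, side_effect_rate):
    problem, symptom = 1.0, 1.0
    history = []
    for _ in range(steps):
        fix = fix_strength * symptom        # fix applied in proportion to the symptom
        symptom = max(0.0, problem - fix)   # the fix masks the symptom short-term...
        problem += side_effect_rate * fix   # ...but its side effect grows the problem
        history.append(symptom)
    return history

no_fix = simulate(steps=30, fix_strength=0.0, side_effect_rate=0.0)
quick_fix = simulate(steps=30, fix_strength=0.5, side_effect_rate=0.3)

print(quick_fix[0] < no_fix[0])    # True: better at first
print(quick_fix[-1] > no_fix[-1])  # True: worse in the end
```

The analogy to ‘omics’ is direct: cheap large-scale data generation is the quick fix, and neglected research fundamentals are the side effect that grows the underlying problem.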

With respect to the ‘omics revolution’ and the promises it makes, the consequence is that the problems in health (and other areas) don’t disappear; they may actually get worse because policy makers, funders and investors, and the public will eventually lose faith in the empty promises made by ‘omics’.

Solutions to enhance ‘omics’ research and translation 

There appear to be 3 problem areas: the public domain (where the key drivers of science reside), the science domain (where research ideas are being conceived and developed) and the technological domain (where data are being generated and interpreted). These areas are interlinked; therefore, solutions should be applied across the 3 fields simultaneously in order to achieve a desired effect, e.g. to enhance the number of good quality projects that are eventually translated into practical applications with proven clinical utility and validity.

Below is a high-level summary of solutions proposed by experts in the field (7, 10) and blended with strategies employed by the CPGR.

Finally, it will be helpful to consider ‘translational omics’ as a system where key development stages are connected in a feedback-loop fashion. Two things are of key relevance, in my mind, in order to enhance the performance of the system:

(i) The individual components must be viable. Research strategies, technologies, processes and procedures used in the discovery phase must support the creation of good quality data-sets. This, in turn, is the basis for validation and further development efforts downstream of early-stage discovery. Likewise, these stages have to be resourced properly in order to be viable.

(ii) Aspects of validation and development should be considered as early as possible in the discovery phase to get a true understanding of total ‘omics’ R&D efforts and costs. Even though it may be considered unusual by scientists who believe in the independence and ‘purity’ of the research they do, working in a genuinely multidisciplinary fashion from the word go has the potential not only to enhance research outputs but also to boost translational efforts. Since this is the impact that funders, clinicians and concerned patients at large are looking for, everyone involved in the process will eventually benefit.


1 Evans JP, Meslin EM, Marteau TM & Caulfield T. (2011) Deflating the genomic bubble. Science, 331: 861 – 862.

2 Baggerly K. (2010) Disclose all data in publications. Nature, 467: 401.

3 Cancer trial errors revealed.

4 Hsu DS, Balakumaran BS, Acharya CR, Vlahovic V, Walters KS, Garman K, Anders C, Riedel RF, Lancaster J, Harpole D, Dressman HK, Nevins JR, Febbo PG & Potti A. (2007) Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer. J Clin Oncol, 25: 4350 – 4357.

5 Ioannidis JP, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, Mangion J, Mehta T, Nitzberg M, Page GP, Petretto E & van Noort V. (2009) Repeatability of published microarray gene expression analyses. Nature Genetics, 41: 149 – 155.

6 Bell AW, Deutsch EW, Au CE, Kearney RE, Beavis R, Sechi S, Nilsson T & Bergeron JJ; HUPO Test Sample Working Group. (2009) A HUPO test sample study reveals common problems in mass spectrometry-based proteomics. Nat Methods, 6: 423 – 430.

7 Ioannidis JP & Khoury MJ (2011) Improving validation practices in “omics” research. Science, 334: 1230 – 1232.

8 Pohlhaus JR & Cook-Deegan RM. (2008) Genomics research: world survey of public funding. BMC Genomics, 9: 472-490.

10 Baggerly KA & Coombes KR. (2011) What Information Should Be Required to Support Clinical “Omics” Publications? Clin Chem, [Epub ahead of print].

Can Africa play a role in the ‘omics’ revolution?


This is a question I am frequently asked by colleagues locally and internationally. If yes, what role could it be?

I’d like to explore the role Africa can play through 3 different lenses:

  1. China’s role as an emerging superpower in biotechnology, with a particular focus on Genomics, and the lessons we can learn from it;
  2. Common issues in ‘omics’ research (Genomics, Proteomics, Bioinformatics, Biomarkers), in particular in the biomedical sciences;
  3. Resources & capabilities in Africa, mainly based on my experience in working in South Africa.

Ultimately, I aim to synthesize these viewpoints into a business idea for positioning Africa as a serious player in the global ‘omics’ arena.

1 China’s role as an emerging biotech superpower and a world leader in the Genomics arena

Recently, China published its 12th Five-Year Plan for national economic and social development (1). The plan features biotechnology as one of 7 priority industries that are meant to contribute to a 7% annual GDP growth over the 5-year period. It sets an innovation target of 3.3 patents per 10,000 people and targets an increase in R&D spending to 2.2% of GDP (1).

Impressively, Chinese officials have mapped a plan to generate a million new biotech jobs by the end of 2015. In order to achieve this ambitious goal, they plan to spend in the order of USD 300 billion on science and technology development (2).

In the Genomics arena, China has created the most powerful Genomics hub in the world: BGI, formerly known as the Beijing Genomics Institute. The BGI is located in a converted shoe factory in Shenzhen and houses about 128 high-end 2nd generation sequencers from Illumina and another 27 SOLiD systems from rival company Life Technologies (formerly trading as AB) (3, 4).

Today, BGI has the capacity to generate the equivalent of 10,000 human genomes per annum. In 2010, it produced 500 Tb of data – 10 times the amount of data the NCBI generated in the past 20 years – and it is set to produce 100 Pb of data in 2011 (3).
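As a rough sanity check on these figures, one can do the arithmetic under the common assumptions of a ~3-gigabase human genome resequenced at ~30x coverage. These are round, illustrative numbers, not BGI's actual pipeline parameters:

```python
# Back-of-envelope estimate of BGI-scale raw sequence output.
GENOME_SIZE_GB = 3   # gigabases in one human genome (approximate)
COVERAGE = 30        # typical whole-genome resequencing depth (assumed)

bases_per_genome = GENOME_SIZE_GB * COVERAGE   # ~90 gigabases per genome

genomes_per_year = 10_000
total_gigabases = genomes_per_year * bases_per_genome
print(f"{total_gigabases / 1_000:.0f} terabases/year")  # → 900 terabases/year
```

At those assumptions, 10,000 genomes correspond to roughly 900 terabases of raw sequence per year, the same order of magnitude as the 2010 output quoted above.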

BGI was originally created as a sequencing factory but has progressed into developing its own scientific muscle and reputation (5). From its initial involvement in the ‘Human Genome Sequencing’ project, BGI has been involved in or has actively spearheaded a series of large-scale genomic projects, such as the ‘1000 Genomes’ project, the ‘10,000 Microbial Genomes’ project, and the ‘1000 Mendelian Disorders in Humans’ project (4).

Backed by the Chinese government and endowed with a USD 1.5 bn loan from the Chinese Development Bank in 2009 (6), the company’s strategy to catapult itself, and China, into an entirely different league in the Genomics and biotechnology arena seems to be paying off.

Amongst others, the company has managed to draw attention from virtually everyone involved in Genomics research, simply because of the organisation’s massive economies of scale and the consequential savings in cost and time. In 2011, when a deadly foodborne strain of E. coli was threatening Germany and other European countries, the BGI teamed up with experts in Hamburg (Germany) to sequence the bacterium in no time. Not only did they do that, BGI experts also developed a test for the rapid detection of the E. coli strain, all based on the initial sequencing effort (7).

BGI has grown into a massive operation with some 3000 staff, most of them at an average age of 25 (1). Their ability to generate unprecedented amounts of data has attracted attention from academia and industry. In 2010, the company’s revenue hit $150 million and is expected to triple by the end of 2011 (6). In order to address the needs of industry, BGI has created a commercial interface (BGI Americas), with a strong interest in gaining a foothold in the lucrative US pharmaceutical market. To date, BGI is serving 15 of the top 20 pharmaceutical companies (7).


  • BGI has built unprecedented critical mass and economies of scale in the Genomics arena by way of creating a next generation sequencing (NGS) factory. This has led to a significant reduction in cost and time-to-data for everyone with an interest in getting results quickly. For the BGI, this means that they can continuously improve their processes when dealing with an increasing number of projects.
  • By way of teaming up with scientific groups all over the world, the BGI uses its NGS muscle in collaborative projects to gain co-ownership of data and IP. Moving away from a pure service provider model, the company has seemingly started to subsidize interesting projects in-kind and in-cash. As a consequence, by way of engaging in collaborative scientific efforts (5), the BGI manages to complement its ‘production power’ with the biological and biomedical knowledge residing in the minds of scientists all over the world to stimulate the creation of IP, services and products.
  • This kind of exposure puts the BGI into a position to select and prioritise the most lucrative projects, treating the rest as either process-improvement or money-making exercises.
  • Building a track record in successfully turning around NGS projects has surely helped the BGI to attract widespread interest from academia and industry. This is important for at least two reasons: (1), the company needs to make money to service the $1.5 bn loan from the Chinese Development Bank. Although there is some information in the public domain about the revenues BGI makes, little is known about any surplus (profit) it makes from any of the projects it runs. It would be interesting to see how successful the company is going to be in blending a front-end contract research organisation (CRO) business with a large-scale collaborative research effort. However, the ability to attract unprecedented numbers of talented scientists should be a decent basis for building the capacity to tackle these ventures effectively; (2), BGI will have to complement its expertise in ‘NGS production’ with relevant areas of science. It’ll be interesting to see if the company chooses a collaborative approach or if it rather opts to build its own scientific expertise. In case of the latter, it may eventually look more like a large-scale academic institute as opposed to a factory. If so, will it choose to focus on a particular field of science or expand even further to tackle a wider range of research areas?
  • Turning out massive amounts of data is merely the starting point in modern-day genomic projects. What follows when the lab has generated data is often a larger-scale effort to crunch and interpret the results generated in these projects. It will be interesting to see if turning out ‘primary’ data quickly is the real value-add for scientists turning to BGI for help. With the volume of data increasing on the current scale, who will take care of the down-stream value creation? It is unlikely that the BGI will be able to do all of this; therein is a possible gap in the BGI business model and an opportunity for other solution providers to step in. Unless, of course, all the BGI customers build their own bioinformatic capacity to deal with the massive amount of data that the BGI can generate. But this creates another conundrum for everyone who turns to the BGI for cost-effective sequencing: Bioinformaticists cost money too! I doubt that all of the projects run by the BGI, on behalf of its clients, have considered these properly. We may in the end sit with nothing but a pile of data that hardly anyone can interpret or put to use.
  • It will be interesting to see how BGI is going to capitalise on the IP it generates. Translation of genomic data into products is a long-term process, fraught with challenges, all in all requiring further investment. Taking the development of a predictive genomic signature as an example, the corresponding efforts range from the validation of data to demonstrating utility in clinical trials. Will the company be able to convert the intellectual capital it creates into products that can render a decent return? What kind of strategies is BGI going to pursue in this regard? Will it employ a licensing model, prominently used by some of the big Universities; or rather choose to set up special purpose vehicles (start-up companies) to keep the development of technologies or products, and therefore value creation, in China?
  • Lastly, how is BGI planning to maintain a competitive edge in terms of NGS technology? The company’s factory is built on state-of-the-art 2nd generation instruments and it is putting these to impressive use. How will BGI respond to the next wave of instruments that are already promising to reduce the sequencing costs per base pair even further? It’s not inconceivable that other countries, and companies, decide to emulate the BGI model to build NGS factories using 3rd generation NGS instrumentation. It will also be interesting to see how BGI will leverage its muscle in using 2nd generation NGS instruments to develop its own proprietary sequencing technologies. It is conceivable, considering the expertise China and other Asian countries have built in engineering and manufacturing electronic devices, that the 4th generation of NGS technology will emerge in Asia, rather than in the US.

So, all of this looks very impressive! Where does it leave the rest of the world? Where does it leave Africa?

Before eventually getting to this point, I’ll explore current issues in ‘omics’ research in part 2 of this post.


1         China’s 12th Five-Year Plan: Overview (2011).

2         China to spend $308B, gain 1 M new jobs in 5-year biotech plan (2011).

3         China genomics institute outpaces the world (2011).

4         BGI – China’s Genomic Centre has a hand in everything (2011).

5         The sequence factory. (2010) Nature 464, 22-24.

6         High quality DNA (2011).

7         Chinese Genomics Firm Expands Operations.

Constant gardeners – How we can stimulate innovation in emerging economies

In a recent blog post on Harvard Business Review online, Vijay Govindarajan and Justin Chakma argued that some VC firms operating in emerging economies follow a systems-based philosophy when making investment decisions. More specifically, some of these companies treat investments in discrete entities as part of value chains, with multiple inter-relationships and inter-dependencies, that need to function as a whole in order for the individual components to be viable. In scarce environments, many of the components that new ventures require to thrive (e.g. support services, manufacturing at higher scales, or distribution channels) often do not exist. Therefore, some VC firms, acting like gardeners who want to make sure that their ‘start-ups’ grow and bloom, plant seeds that are meant to benefit each other in a symbiotic fashion; by creating a more sustainable ecosystem, the gardener facilitates the growth of a specific investment as well as of entire value chains.

When analysing the influence of government support on innovation in Korean biotechnology SMEs, Kang & Park (2011) found that certain activities, in particular partnerships with upstream and downstream entities, are particularly effective in producing innovation outputs. For this to happen, the players in the system need access to a broad pool of information, technologies, and financial and human resources (Kang & Park, 2011).

The CPGR was created to support the development of the biotech sector in South Africa (SA), as an enabler of innovation. Applying an ecosystem view, the SA government made the strategic decision to build enabling support infrastructure and resources to make sure biotech activity develops and, ultimately, grows into an economic power of its own.

Against this backdrop, the CPGR was built as an enabler of research and development in the ‘omics’ arena and, more broadly, of innovation in a system characterised by a lack of human resources and a shortage of funding, to name just two of the major challenges. We found that the limited funding available for academic research restricts scientists’ access to an offering provided on a fee-for-service basis; what’s more, with teaching being part of the academic value chain, catering to the research component alone meant that the value scientists derived from our services was incomplete. In addition, because of the local biotech sector’s infant state, we also found ourselves in a difficult situation regarding the generation of sustainable income streams from local industry.

We realised that the value chains in our environment were incomplete. Therefore, we decided to expand the scope of our activities to build capacity upstream and downstream of our core offering, not least by forging stronger relationships with the key players in the innovation system. Notably, the corresponding activities include a focus on human capital development, collaborative projects with academia and industry, and the creation of dedicated offerings for the international biotech industry.

Achieving our mandate is a long-term project that requires patience and the ability to understand the complex interplay of a multitude of components in the innovation system and its impact on the viability of a specific asset, be it a project, a company or a value chain. It requires the ability to respond to changes in the environment and a careful understanding of the signals that the system sends in response to the interventions that we put in place. Below are some of the changes that we have made to our business model in tackling perceived gaps upstream and downstream of our value chain.


What Venture Capital Can Learn from Emerging Markets by Vijay Govindarajan

Kyung-Nam Kang & Hayoung Park (2011) Influence of government R&D support and inter-firm collaborations on innovation in Korean biotechnology SMEs. In press