One of the main reasons behind the conception of the CPGR was to create a facility that could facilitate ‘omics-based research in a resource-limited setting. That was over a decade ago, and in that time we have learnt several lessons as a service provider.
A regular issue we have had to deal with in this time is funding; the best science does not always come with a corresponding budget, and consequently researchers need to make some decisions to get as much “bang for their buck” as possible. Grants can only be stretched so far, and there are often more questions to answer than resources to address them adequately.
We recently contributed to an article covering some of the challenges involved in setting up or running a biological mass spectrometry-based core facility in Africa. While that paper offered some interesting perspectives from the point of view of a service provider and the challenges they face, I’d like to cover one of the main components of any experiment that we deal with on an almost daily basis (and consequently of interest to many researchers): biological variance. The impact of biological variance, and the need for replicates to control for it, is clear – especially when attempting to discern the signal of interest from the noise. Replicates also have a very real impact on a researcher’s budget, and consequently on how far they may try to push the limits scientifically to get as much data as possible for a given financial input.
In preparing for a talk I delivered at the recent HUPO PSI conference held in Cape Town earlier this year, I came across an interesting paper by Maes et al. (Interindividual Variation in the Proteome of Human Peripheral Blood Mononuclear Cells) looking at biological variance and its effects in human peripheral blood mononuclear cells. I found the figure below from this paper really insightful, in that it concisely demonstrates the problem with a large portion of scientific experiments still being run with an “n = 3” approach.
Below is the final figure from that article, along with its legend.
Panel A shows the influence of the coefficient of variation on sample size, assuming a power of 0.8 and a fold change of 1.5. The higher the variation in a setup, the more replicates are needed to obtain the same power. Panel B illustrates power versus number of replicates when detecting various fold changes with the following parameters: CV = 30% and a significance level of 0.05. The more subtle the changes one wants to observe, the more replicates are required (B).
The main takeaway from this is that, as systems biology studies grow in complexity, being able to resolve subtle differences will become increasingly important. It’s clear from panel B above that to gain the resolution to delve deeper into these subtle changes in protein expression profiles at a systems level – for example, by dropping from a 2-fold to a 1.5-fold change threshold – your experiment will need to jump from three to eight replicates in this specific case. This has a big impact on a project budget, and understandably many researchers may opt for the “n = 3” route as a result. But at what risk? What happens when, months after collecting your samples, you finish acquiring your data only to realise there was a problem batch of samples whose data you need to exclude? As a researcher you’ve spent a lot of money to produce this data, but you may have to throw everything away: with “n = 3” there is no room to absorb any problems or artifacts.
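For readers who want a feel for these numbers, the sample-size relationship in the figure can be approximated in a few lines of Python. This is a hedged sketch, not the authors’ calculation: it assumes log-normally distributed intensities (so a CV maps to a log-scale standard deviation), a two-sided two-sample comparison, and a normal approximation to the t-test, which is slightly optimistic at very small n.

```python
from math import ceil, log, sqrt
from statistics import NormalDist


def replicates_per_group(fold_change, cv, power=0.8, alpha=0.05):
    """Approximate replicates per group needed to detect a given fold change
    at a given CV (two-sided two-sample test, normal approximation)."""
    z = NormalDist().inv_cdf
    # Standard deviation on the natural-log scale implied by the CV,
    # assuming log-normally distributed intensities
    sigma = sqrt(log(1 + cv ** 2))
    # Standardised effect size (Cohen's d) for the fold change
    effect = log(fold_change) / sigma
    # Classic two-sample sample-size formula per group
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect) ** 2
    return ceil(n)
```

At a CV of 30%, this sketch suggests roughly 3 replicates per group to detect a 2-fold change at 80% power, but around 9 for a 1.5-fold change – the same ballpark as the jump shown in the figure.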
Given what’s at stake, it is better to obtain more reliable and powerful statistics for a more limited number of conditions than to potentially leave your entire study underpowered. This may seem like a mundane topic to cover…until you fall victim to not having the replicates in place to separate the signal from the noise, and you’ve already blown your budget.
Hypothetically speaking, you may have a budget of R20 000 – R30 000 and decide to compare six different conditions and a control, each with three biological replicates, and it takes you six months to obtain the samples. Once your samples have been analysed, it becomes clear to the analyst that three of the conditions each contain at least one replicate that is a clear outlier. You investigate and determine that a different buffer, made up differently, was used for their extraction. You must now throw away three branches of your study, at a cost you won’t see a scientific return on.

Alternatively, suppose you had initially decided to cut the study down to three conditions and a control, with five biological replicates in each condition (for approximately the same cost). You could then discard the three problem samples, split across two conditions, and still retain enough replicates in every condition to run some basic statistical analyses. This would give you insight into how to plan your next experiment and how to move forward, providing scientific value for your budget.
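A quick back-of-the-envelope power calculation shows why the second design is more robust. Again, this is only a sketch under the same assumptions as before (log-normal intensities, normal approximation to the t-test, which is optimistic at very small n), with numbers chosen to match the hypothetical scenario above:

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(n, fold_change, cv, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison with n
    replicates per group (normal approximation; optimistic for small n)."""
    nd = NormalDist()
    sigma = sqrt(log(1 + cv ** 2))      # log-scale SD implied by the CV
    effect = log(fold_change) / sigma   # standardised effect size
    return nd.cdf(effect * sqrt(n / 2) - nd.inv_cdf(1 - alpha / 2))


# Detecting a 1.5-fold change at CV = 30%:
# a triplicate that loses one outlier drops to n = 2,
# while a five-replicate condition that loses one still has n = 4.
print(round(approx_power(2, 1.5, 0.3), 2))  # ~0.28
print(round(approx_power(4, 1.5, 0.3), 2))  # ~0.5
```

Neither design reaches the nominal 80% power after losing samples, but the five-replicate design retains roughly twice the power of the crippled triplicate – and, crucially, still supports basic statistics at all.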
There are obviously going to be exceptions and interesting situations, but I would argue that getting the number of replicates and your study design right at the start of any ‘omics-based experiment is absolutely critical to ultimately getting useful and plausible data out on the far side.
Hopefully this blog makes you think about these issues for any future experimental questions you may have. Good luck with your research, and feel free to contact us should you wish to talk about any of the issues raised above – or others relating to how best to get reliable results for a burning research question you may have.