South Africa’s CPGR invites members of the genomics community to enhance variant calling from African genome data

November 03, 2017 - Cape Town, South Africa - Today the Centre for Proteomic & Genomic Research (CPGR), in partnership with Edico Genome, announced that it will host a first-of-its-kind hackathon to address genome data analysis challenges in resource-constrained settings and with African genome data.

Variant calling from information-rich sequencing data, such as whole exomes or genomes, can be complicated by unequal (ineffective) mapping of sequencing reads to reference genomes. This problem is exacerbated in low read-depth data (e.g. in cost-constrained settings) and when working with African genomes. Failure to address these problems may lead to genome misinterpretation and under-reporting during variant interpretation. In a first-of-its-kind hackathon, CPGR will use a hardware-accelerated approach to genome data analysis, as provided by Edico Genome’s DRAGEN Bio-IT Platform, to develop a purpose-designed pipeline for high-quality variant calling from low read-depth and/or African genome data.

DRAGEN is a highly reconfigurable platform based on a field-programmable gate array (FPGA), designed to provide hardware-accelerated implementations of genome pipeline algorithms, such as BCL conversion, compression, mapping, alignment, sorting, duplicate marking, haplotyping and variant calling. As an example of its capabilities, DRAGEN recently set a new Guinness World Record for Fastest Analysis of 1,000 Whole Human Genomes, accomplishing the feat in 2 hours and 25 minutes. The speed and accuracy of DRAGEN make it an ideal platform for running a large number of different datasets, or for running an individual dataset repeatedly with varying input parameters, which makes it particularly useful for rapid prototyping and pipeline development.

CPGR is a non-profit organization dedicated to providing state-of-the-art ‘omics’ services to South Africa’s life sciences and biotech communities, originating from an initiative by the South African Department of Science & Technology (DST). Based in Cape Town, CPGR combines innovative, information-rich genomic and proteomic technologies with bio-computational pipelines to develop customized offerings for users in both academia and industry. CPGR has recently implemented a DRAGEN system to support the development of NGS applications and data analysis pipelines.

When reads from an NGS (Next Generation Sequencing) run are aligned to a reference genome, they often map to multiple locations. This is especially true for whole-genome and exome data. For efficiency's sake, these multi-mapping reads are assigned at random among the sites with equal mapping scores. This has a downstream effect on variant calling, especially in low read-depth applications like whole-genome and exome sequencing, and it has a strong impact on low-coverage regions. As a result, the same dataset, processed from raw data through variant calling, can yield slightly different variant lists on repeated runs.
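The effect of random tie-breaking can be illustrated with a toy sketch. This is not DRAGEN's actual placement logic; the function names, site labels and scores below are hypothetical and serve only to show why coverage at a site can differ from run to run.

```python
import random
from collections import Counter

def place_read(candidates, rng):
    """Pick one alignment site among those sharing the best mapping score.

    candidates: list of (site, score) pairs for a single read (hypothetical format).
    """
    best = max(score for _, score in candidates)
    tied = [site for site, score in candidates if score == best]
    # A uniquely mapped read has one tied site; a multi-mapping read has several,
    # and the choice among them is random.
    return rng.choice(tied)

def site_coverage(reads, seed):
    """Count how many reads land on each site for one simulated run."""
    rng = random.Random(seed)
    return Counter(place_read(candidates, rng) for candidates in reads)

# Illustrative data: two uniquely mapped reads and six reads that map
# equally well to two sites.
reads = [[("chr1:100", 60)]] * 2 + [[("chr1:100", 30), ("chr5:900", 30)]] * 6

for seed in (1, 2, 3):
    print(seed, dict(site_coverage(reads, seed)))
```

Each seed stands in for one pipeline run. The total number of placed reads is always the same, but the split between the two sites may change, which at low read depth can be enough to push a variant above or below a caller's evidence threshold.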

This difference in variant lists is aggravated in the Southern African context. While the region has incredibly rich human genetic diversity, this is not adequately reflected in the composition of human reference genome databases. As a consequence, data analysis of African genomes may result in a higher probability of multi-mapping reads and subsequently in more missed variants downstream.

DRAGEN offers a unique opportunity to address this issue. The DRAGEN system is able to perform sequence alignment in a fraction of the time required by any other platform, making it possible to run multiple sequence alignments on a single sample without significantly increasing turnaround times.

To address this problem, CPGR, in collaboration with Edico Genome, plans to host an inaugural South African hackathon.

The event is intended to produce a computational tool that:

  1. Compares variant lists of the same dataset run through the DRAGEN pipeline multiple times
  2. Identifies inconsistently called variants
  3. Investigates these variants by performing targeted alignment on the regions in which they are found (eliminating the possibility of multiple alignment sites)
  4. Builds a consensus variant list by adding confirmed variants from multiple runs, and outputs the list as a VCF file
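As a rough illustration of steps 1, 2 and 4, the following Python sketch compares variant lists from repeated runs and assembles a consensus list. It is a minimal sketch under stated assumptions, not the tool the hackathon will produce: all function names are hypothetical, variants are keyed only by chromosome, position, reference and alternate allele, and the targeted realignment of step 3 is represented by a placeholder `confirm` callback that rejects every variant by default.

```python
from collections import Counter

def parse_vcf(text):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF-formatted text."""
    variants = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            variants.add((chrom, int(pos), ref, allele))
    return variants

def compare_runs(runs):
    """Split variants into those called in every run and those called in only some."""
    counts = Counter(variant for run in runs for variant in run)
    consistent = {v for v, n in counts.items() if n == len(runs)}
    inconsistent = set(counts) - consistent
    return consistent, inconsistent

def build_consensus(runs, confirm=lambda variant: False):
    """Keep consistent variants plus inconsistent ones accepted by `confirm`.

    `confirm` is a hypothetical stand-in for targeted realignment (step 3).
    """
    consistent, inconsistent = compare_runs(runs)
    confirmed = {v for v in inconsistent if confirm(v)}
    return sorted(consistent | confirmed)

def to_vcf(variants):
    """Write variants as minimal VCF text (sites only, no sample columns)."""
    lines = ["##fileformat=VCFv4.2",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for chrom, pos, ref, alt in variants:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"
```

A caller would parse each run's VCF with `parse_vcf`, pass the resulting sets to `build_consensus` together with a real confirmation function, and write the result with `to_vcf`. Note that `sorted` orders chromosomes lexicographically here; a production tool would follow the contig order of the reference.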

The prospective tool is intended to address two major constraints in the (South) African context:

  • Variant calling from sequencing data where lack of funding may not allow sufficient read depth
  • Effective variant calling from African sequence data in the absence of high-quality reference genomes



We invite and welcome participation from all members of the Bioinformatics and Genomics community in South Africa.

We have room for up to 15 participants in total and welcome interest from prospective participants near and far. Hackateers are encouraged to bring their own laptops. We will supply space, drinks and snacks.

Hackathon program (prospective)

  1. Welcome (CPGR)
  2. DRAGEN introduction (Edico Genome)
  3. Hackathon / 1 (all participants)
  4. Feedback session (CPGR, all participants)
  5. Lunch (all)
  6. Hackathon / 2 (all participants)
  7. Feedback session (CPGR, all participants)
  8. Closure (all)


Venue

Centre for Proteomic & Genomic Research (CPGR), Upper Level, St Peter’s Mall, Cnr Anzio and Main Road, Observatory 7925, Cape Town, South Africa

Time & date

25 November 2017, 10:00 to 17:00


For information, please contact


About CPGR

The CPGR is one of Africa’s first fully integrated ‘omics’ service providers, built to help South Africa’s information-rich biomedical research leapfrog to a globally competitive level. Among other things, the organization offers the following ‘omics’ capacity: Next-Generation Sequencing: NextSeq500 (1x), MiSeq (1x), MiniSeq (1x), IonTorrent PGM (2x), IonProton (1x), for sequencing projects; Microarrays: Affymetrix GS 3000 and Affymetrix GeneTitan for genotyping and gene-expression analysis; Mass spectrometry: ABI 4800 MALDI-ToF/ToF; Waters Xevo TQS triple quadrupole, SCIEX API4000 triple quadrupole and Thermo Q Exactive for MS-Proteomics; High-throughput PCR: QuantStudio 12K Flex Real-Time PCR System, QuantStudio 3D Digital PCR System and ABI 7900 for qRT-PCR and genotyping applications; Automated DNA/RNA QC, library handling and sample processing; dedicated IT infrastructure and bioinformatic applications for data analysis and interpretation.

The CPGR is a non-profit company located in Cape Town, South Africa, based on an initiative by the Department of Science and Technology (DST), and financially supported by the Technology Innovation Agency (TIA). The CPGR combines state-of-the-art, information-rich genomic and proteomic (‘omics’) technologies with bio-computational pipelines to render services and support projects in the life science and biomedical arena in (South) Africa, all run in an ISO 9001:2015 certified and ISO 17025 compliant quality management system. Among other initiatives, the CPGR has recently launched an accelerator program to stimulate the creation of South African start-ups based on ‘omics’ technologies and set up Artisan Biomed to develop and implement Precision Medicine solutions in (South) Africa.

Information about the CPGR can be obtained at and


About Edico Genome

The use of next-generation sequencing is growing at an unprecedented pace, creating a need for easy-to-implement infrastructure that enables rapid, accurate and cost-effective processing and storage of this big data. Edico Genome has created a patented end-to-end platform solution for analysis of next-generation sequencing data, DRAGEN™, which cuts whole-genome data analysis time from hours to minutes while maintaining high accuracy and reducing costs. Top clinicians and researchers are utilizing the platform to achieve faster diagnoses for critically ill newborns, cancer patients and expecting parents waiting on prenatal tests, and faster results for scientists and drug developers. For more information, visit or follow @EdicoGenome.