COG-UK
COG-UK passes 100K genomes
This week saw COG-UK pass an important milestone, having now sequenced more than 100K SARS-CoV-2 genomes.
When we began in March, few in the consortium envisaged having the capacity or need to sequence and analyse such a staggering number of viral genomes in such a short space of time. And yet just eight months later, as the 100K mark whizzes by, it is worth reflecting on four things: why this unprecedented viral genomic surveillance effort has been needed; how the genome data generated has been used to better understand the transmission dynamics underpinning the COVID-19 pandemic in the UK; how the same data has provided insights that have helped to inform outbreak responses from the local to the national level; and how working collaboratively and in an open manner has been of fundamental importance in getting this far.
A sombre milestone
Naturally, while applauding the often herculean efforts of the many hundreds of individuals whose work has been instrumental to our progress to date, it is important to note that this is not an occasion on which anyone involved with COG-UK is celebrating. The reality behind these genomes, and the samples from which they were sequenced, is a grim one. Our thoughts are with the families and loved ones of the people who died of COVID-19 in the UK, with the individuals still suffering as a result of infection, and with those whose health has been impacted in other ways as a result of interventions put in place to tackle the virus. As will be true of other strands of national and global COVID-19 responses, the unprecedented health, social and economic impacts owing to the pandemic focus the mind and have been an important motivating force behind our work.
From 0 to 100K in eight months
Prior to SARS-CoV-2, the largest dataset for real-time genomic viral epidemiology during an epidemic was ~1,500 genomes from the West African Ebola outbreak, which were sequenced over the course of 2014-2016. By comparison, COG-UK surpassed this total within the first month and has continued to push viral genome surveillance on to an entirely different scale ever since.
We have previously used the imagery of assembling an aeroplane from parts while already in the process of taking off to illustrate the manner in which COG-UK was launched and functioned through much of 2020. Indeed, genome sequencing began at pace from the very beginning. Between the first meeting at the Wellcome Trust in London on the 11th of March, held to explore the opportunity to harness national capabilities to support genomic surveillance of COVID-19, and the first report sent to SAGE on the 20th of March, 260 SARS-CoV-2 genomes had been sequenced.
So the consortium was hurtling down the runway while still in the process of establishing exactly how this unprecedented national network would collaborate and function. Over the ensuing months COG-UK has developed into a consortium through which the public health agencies from all four UK nations are working together with researchers from academic institutions across the whole country in a truly national endeavour.
During these eight months, many logistical, technical, organisational and scientific hurdles have been overcome. Just skimming the surface, these included: working out how to route samples from testing laboratories to COG-UK sequencing laboratories; how to store, pick and sequence the positives; how to stock, staff and supply the sequencing laboratories; how to handle and analyse the resulting data; and how to integrate sufficient metadata to ensure the genome data is relevant for tackling outbreaks. All the while, the consortium has been moving faster and handling more samples month by month.
The work involved has occupied hundreds of individuals in and out of laboratories across the country, many of whom have worked long hours and sacrificed time with loved ones, who themselves will often have had to bear a larger burden of household and childcare duties as a result.
Of course, while the number of genomes resulting from all of this hard work is impressive in itself, some still question what need is addressed by sequencing at such scale rather than just sampling a random fraction. The answer is that only with a dataset on this scale is it possible to get the high resolution needed to make genome surveillance useful for investigating individual outbreaks. At this resolution, we can look at the relatedness of the viruses within a health care institution, workplace, or community and compare it with the transmission patterns in the surrounding areas, revealing the patterns that link individual cases and spotting otherwise unidentifiable opportunities for intervention.