COG-UK passes 100K genomes

This week saw COG-UK pass an important milestone, having now sequenced more than 100K SARS-CoV-2 genomes.

When we began in March, few in the consortium envisaged having the capacity or need to sequence and analyse such a staggering number of viral genomes in such a short space of time. And yet just eight months later, as the 100K mark whizzes by, it is worth reflecting on why this unprecedented viral genomic surveillance effort has been needed, how the genome data generated has been used to better understand the transmission dynamics underpinning the COVID-19 pandemic in the UK, how the same data has provided insights that have helped to inform outbreak responses from the local to national level, and how working collaboratively and in an open manner has been of fundamental importance in getting this far.   

A sombre milestone

Naturally, while applauding the often herculean efforts of the many hundreds of individuals whose work has been instrumental to our progress to date, it is important to note that this is not an occasion on which anyone involved with COG-UK is celebrating. The reality behind these genomes, and the samples from which they were sequenced, is a grim one. Our thoughts are with the families and loved ones of the people who have died of COVID-19 in the UK, with the individuals still suffering as a result of infection, and with those whose health has been impacted in other ways as a result of interventions put in place to tackle the virus. As will be true of other strands of national and global COVID-19 responses, the unprecedented health, social and economic impacts of the pandemic focus the mind and have been an important motivating force behind our work.

From 0 to 100K in eight months

Prior to SARS-CoV-2, the largest dataset for real-time genomic viral epidemiology during an epidemic was ~1500 genomes from the West African Ebola outbreak, sequenced over the course of 2014-2016. By comparison, COG-UK surpassed this total within the first month and has continued to push viral genome surveillance on to an entirely different scale ever since.

We have previously used the imagery of assembling an aeroplane from parts while already in the process of taking off to illustrate the manner in which COG-UK was launched and functioned through much of 2020. Indeed, genome sequencing began at pace from the very beginning. Between the first meeting at the Wellcome Trust in London on the 11th of March, held to explore the opportunity to harness national capabilities to support genomic surveillance of COVID-19, and the first report sent to SAGE on the 20th of March, 260 SARS-CoV-2 genomes had been sequenced.

So the consortium was hurtling down the runway while still in the process of establishing exactly how this unprecedented national network would collaborate and function. Over the ensuing months COG-UK has developed into a consortium through which the public health agencies from all four UK nations are working together with researchers from academic institutions across the whole country in a truly national endeavour.

During these eight months, many logistical, technical, organisational and scientific hurdles have been overcome. Just skimming the surface, these included: working out how to route samples from testing laboratories to COG-UK sequencing laboratories; how to store, pick and sequence the positives; how to stock, staff and supply the sequencing laboratories; how to handle and analyse the resulting data; and how to integrate sufficient metadata to ensure the genome data is relevant for tackling outbreaks. All the while, the consortium has been moving faster and handling more samples month by month.

The work involved has occupied hundreds of individuals in and out of laboratories across the country, many of whom have worked long hours and sacrificed time with loved ones, who themselves will often have had to bear a larger burden of household and childcare duties as a result.

Of course, while the number of genomes resulting from all of this hard work is impressive in itself, some still question what need is addressed by sequencing at such scale, as compared to just sampling a random fraction. The answer is that only with a dataset on this scale is it possible to get the high resolution needed to make genome surveillance useful for investigating individual outbreaks: to look at the relatedness of the viruses within a health care institution, workplace, or community, and to compare them with the transmission patterns in the surrounding areas in order to reveal the links between individual cases and spot otherwise unidentifiable opportunities for intervention.

Sequencing at this level of resolution has meant that individual outbreaks can be investigated – in hospitals, care homes, factories, and larger geographical regions. But it needs the background genomes to put it in context and find connections.

Andrew Rambaut, University of Edinburgh

By the 15th of October, COG-UK researchers and data had contributed to more than 120 retrospective or real-time outbreak investigations, working closely with local health and infection management teams and demonstrating the real-world utility and impact of integrating genomic insights into outbreak responses.

Textbook demonstration of the benefits of open-science and data sharing

Trying to bring the long hours spent, hard work and sacrifice of all involved with COG-UK into focus is simultaneously exhausting and invigorating. It is a tale of cooperation and collaboration to tackle problems that for individual laboratories and institutions would otherwise have been insurmountable. 

Only through an open science approach has it been possible to make the advances seen. From day one, genome data produced by COG-UK has been shared rapidly through the European Nucleotide Archive (ENA) and Global Initiative on Sharing All Influenza Data (GISAID). Similarly, the bioinformatic tools developed through COG-UK that enable analyses of the large viral genome datasets have been made publicly available. 

We believe in open science, but also know the risks of sharing data before publication. We are extraordinarily lucky to have fantastic collaborators in COG-UK, which has seen us being able to be involved with/publish work that would not have otherwise been possible.

Tom Connor, Cardiff University & Public Health Wales

This open approach to sharing data and tools has not only enabled researchers and investigation teams to investigate outbreaks across the UK, but has also seen many of the tools developed adopted as standard internationally. The UK dataset is used as a baseline against which researchers in other countries can understand what they are seeing (and what they are not seeing) in their more limited viral genomic datasets. Furthermore, COG-UK is working to establish links with researchers around the world, to share insights into the experiences gained over the last eight months as they begin to come up against some of the same hurdles that we have overcome.

The next 100K?

Everyone within the consortium (and beyond) surely hopes that this is the only time we will need to mark the passing of a milestone that measures in the hundreds of thousands of SARS-CoV-2 genomes sequenced. Especially as green shoots begin to emerge from other arms of the COVID-19 response, with news of vaccines performing well in phase III clinical trials. 

Yet despite the potential for physical, mental and emotional exhaustion as we head into the winter in the UK after the longest of years, COG-UK partner institutions and individual members will be redoubling efforts to improve the volume and speed with which we can deliver insights from genome sequencing to infection management teams. 

Furthermore, with the impending roll-out of national vaccination programmes, which will begin to apply a new selective pressure to SARS-CoV-2 as it continues to spread, COG-UK researchers are working with others to monitor changes in the evolution of the virus, to identify mutations that may impact on the efficacy of particular vaccines (as potentially seen with the recent SARS-CoV-2 mink outbreak) and prioritise them for further investigation, and to inform national and local policy for vaccine choices and non-pharmaceutical interventions going forward.

To borrow an image from the cricket field, COG-UK can now raise its collective bat in acknowledgement of a hard-fought century on a tough wicket, before settling back in at the crease to refocus on the rest of the innings to come.


COVID-19 Genomics UK (COG-UK)

The current COVID-19 pandemic, caused by the SARS-CoV-2 virus, represents a major threat to health. The COVID-19 Genomics UK (COG-UK) consortium has been created to deliver large-scale and rapid whole-genome virus sequencing to local NHS centres and the UK government.

Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is made up of an innovative partnership of NHS organisations, the four Public Health Agencies of the UK, the Wellcome Sanger Institute and twelve academic partners providing sequencing and analysis capacity. A full list of collaborators can be found here: 

COG-UK was established in March 2020 supported by £20 million funding from the UK Department of Health and Social Care (DHSC), UK Research and Innovation (UKRI) and the Wellcome Sanger Institute, administered by UK Research and Innovation. For more information, visit: