22 Jan 2021

Reflections on the achievements of COG-UK

Sharon Peacock, on behalf of consortium members

The COVID-19 Genomics UK (COG-UK) Consortium is ten months old this January. As with many other scientific efforts worldwide that have arisen as a direct response to the COVID-19 pandemic, many years’ worth of effort and achievement from COG-UK members have been compressed into this short time period. Here are ten of these achievements, one for each month of our existence.


1. Getting started

Growing the capabilities of COG-UK from zero to a fully functioning genomics consortium involving the four Public Health Agencies (PHAs) of the United Kingdom (UK), numerous academic institutional partners across the four nations, the Wellcome Sanger Institute, and diagnostic laboratory partners including the Lighthouse Labs and multiple NHS laboratories, has required a herculean effort by many hundreds of people.

To have assembled and maintained the cohesion of COG-UK despite the frequently contrasting approaches and underlying goals of different partner members is also noteworthy, particularly when set against a backdrop of a constant churn of events as the pandemic has unfolded. There are many reasons for our successes, but central to them all has been our common goal to generate genome data to provide scientific evidence that supports PHAs, governments and the wider world. 


2. Generating data for Public Health Agencies

One of our major objectives and achievements has been to generate a large body of data for use by PHAs, and by researchers who provide interpretations based on phylogeny, modelling and other approaches that inform policy decisions.

All data generated by COG-UK becomes rapidly available to the four UK PHAs who gain access via MRC-CLIMB (the Cloud Infrastructure for Microbial Bioinformatics — funded by the UK’s Medical Research Council). Within this computing infrastructure, SARS-CoV-2 genome sequence data is rapidly analysed using a range of tools developed by consortium members (see below) that enable public health teams to answer questions about local, regional and national viral transmission, the success of interventions, and the ongoing evolution of the virus. This capability has been used by PHAs to investigate hundreds of suspected COVID-19 outbreaks. CLIMB has been recognised in the annual HPCwire Readers’ and Editors’ Choice Awards 2020 for its role in supporting COG-UK.

COG-UK had sequenced more than 200,000 SARS-CoV-2 genomes by mid-Jan 2021, which equates to around 10% coverage in the UK (defined as the number of genome sequences from COG-UK, out of the total number of COVID-19-positive tests in the UK). Weekly coverage report summaries on the genome numbers sequenced from the four nations are made available here, with a more detailed report provided to PHAs that includes information on methods as well as coverage by testing pillar and COG-UK sequencing site.
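
The coverage definition above is a simple ratio, and the short Python sketch below spells it out. The 200,000 genome count comes from the text; the number of positive tests is a hypothetical placeholder chosen only so that the result reproduces the quoted figure of around 10%, and the function name is our own for illustration.

```python
def sequencing_coverage(genomes_sequenced: int, positive_tests: int) -> float:
    """Return coverage as a fraction: genomes sequenced / positive tests."""
    if positive_tests <= 0:
        raise ValueError("positive_tests must be greater than zero")
    return genomes_sequenced / positive_tests


# Genome count quoted in the text (mid-January 2021).
genomes = 200_000
# Hypothetical denominator, NOT an official figure: picked only so the
# ratio matches the "around 10%" coverage quoted in the text.
hypothetical_positive_tests = 2_000_000

coverage = sequencing_coverage(genomes, hypothetical_positive_tests)
print(f"Coverage: {coverage:.1%}")
```

The same function could be applied per nation or per week by substituting the relevant counts.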


3. Research priorities

The constant flow of new evidence on SARS-CoV-2 and COVID-19 means that COG-UK needs to regularly refresh its research priorities. Our latest review, based on an information-gathering exercise within the consortium, focused on three research areas: those that improved our understanding of SARS-CoV-2 evolution, biology and disease pathogenesis; priority questions for surveillance, epidemiology and infection management; and questions that did not fall neatly into these two categories. This review was completed and published in November 2020. Of note, mutations in the viral genome were highlighted as a major area for research. The full report is available here, together with a past blog that provides more details of the review process.


4. Research on viral transmission

Much of our early research focused on the fine-scale genetic lineage structure of SARS-CoV-2 and on analysing the dynamics of transmission and introduction. This included striking expositions of the history of SARS-CoV-2 importations into the UK. An analysis of the first wave of infection in the UK found rapid fluctuations in virus importation rates that resulted in the introduction of >1000 lineages, with lineage importation and regional lineage diversity declining after lockdown (Ref 1). This, and a second study of introductions into Scotland during the first wave (Ref 2), highlighted the role of European travel in COVID-19 emergence in the UK. Separate studies of the first wave plus the early part of the second wave in Wales and Scotland demonstrated that, following the first lockdown, many SARS-CoV-2 lineages that were circulating in the population became extinct once the first wave had been suppressed. The second wave was driven by numerous new importations into both countries, largely through visitors and returning travellers from other parts of the UK and from European countries (Ref 3, Ref 4).

By combining genomes with time and place information for the people who had their virus sample sequenced, we have been able to generate evidence on SARS-CoV-2 transmission in particular environments, including hospitals, long-term care facilities, schools, universities and work environments. For example, the impact of real-time genomic surveillance of SARS-CoV-2 was evaluated in a hospital in the East of England. This demonstrated the benefit of combined genomic and epidemiological analysis for the investigation of healthcare-associated COVID-19 infections (Ref 5), and was used to identify the sources of outbreaks and to control them.

Another of our research initiatives, the COG-UK HOCI (hospital-onset COVID-19 infection) study, is a phase III prospective, interventional, cohort, superiority study to evaluate the benefit of rapid COVID-19 genomic sequencing on infection control in preventing the spread of the virus in UK NHS settings. This work is essential if we are to achieve the level of confidence to support widespread roll-out of genome sequencing into the clinic in the longer term. The ability to support routine sequencing depends on simple and intuitive automated genome analysis tools. Several have been developed (Ref 6, Ref 7), one of which is in routine use in the HOCI trial.


5. Research on mutations

The appearance of mutations in the SARS-CoV-2 genome was inevitable following its emergence towards the end of 2019, reflecting the pattern of natural evolution for viruses as well as other microorganisms. The SARS-CoV-2 genome is made up of around 30,000 nucleotides, and it naturally accumulates one to two mutations every month as it replicates and circulates in the human population.
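
To put these figures on a per-site basis, the sketch below converts "one to two mutations every month" across a genome of around 30,000 nucleotides into an approximate rate per site per year. Both inputs are the rounded values given above, and the function name is our own, so the result is an order-of-magnitude illustration rather than a formal estimate of the evolutionary rate.

```python
GENOME_LENGTH_NT = 30_000        # approximate genome length quoted in the text
MUTATIONS_PER_MONTH = (1, 2)     # range quoted in the text
MONTHS_PER_YEAR = 12


def per_site_rate_per_year(mutations_per_month: float,
                           genome_length: int = GENOME_LENGTH_NT) -> float:
    """Convert genome-wide mutations per month to mutations per site per year."""
    return mutations_per_month * MONTHS_PER_YEAR / genome_length


# Evaluate both ends of the quoted range.
low_rate, high_rate = (per_site_rate_per_year(m) for m in MUTATIONS_PER_MONTH)
print(f"Approximately {low_rate:.1e} to {high_rate:.1e} mutations per site per year")
```

In other words, 12 to 24 changes per year spread over roughly 30,000 positions, which is why individual genomes sampled weeks apart often differ by only a handful of nucleotides.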

We and others noted the emergence of the D614G replacement, a mutation in the spike protein of the virus, which decorates the outside of the virus and mediates viral attachment to, and then entry into, the host cell, where the virus can replicate. This mutation was not present when the virus first emerged but has now become almost ubiquitous. Population genetic analyses indicate that 614G confers increased transmission fitness, but there was no evidence that patients infected with the Spike 614G variant had higher COVID-19 mortality or clinical severity (Ref 8).

Approaches developed to study D614G have been rapidly put to further use, including following the detection and appearance of lineage B.1.1.7, termed Variant of Concern 202012/01 (VOC) by Public Health England. This lineage is more transmissible than other lineages tracked previously (Ref 9, Ref 10), but again there is no evidence of increased disease severity or of adverse effects on immunity and vaccine efficacy.

Although speculative, it has been proposed that variants with numerous mutations (such as VOC) may first arise in chronically infected people (Ref 9). This is based on the observation that high rates of mutation accumulation over short time periods have been reported in a small number of studies of immunodeficient or immunosuppressed patients who are chronically infected with SARS-CoV-2 (Ref 11). However, more work is required to understand how frequent this is in this population.


6. Applied research to improve sequencing efficiency 

The emergence of SARS-CoV-2 required the development of sequencing methodologies that reliably generate high quality genome data at the lowest cost and in the fastest time. The ARTIC protocol, which is open access and used worldwide, has become the mainstay for SARS-CoV-2 sequencing.

The time it takes to generate a genome is an important issue when it comes to urgent outbreak situations. Generally, the fastest achievable turnaround time from arrival of an extract from a positive PCR test to first interpretation of genome data is around 24 hours. COG-UK can currently meet such rapid turnaround for urgent samples in low volume. But the higher volume sequencing that we aim to achieve (10,000 samples per week at the moment, rising to 20,000 samples per week by March 2021) is delivered using instruments that sequence a large number of samples at the same time (so-called multiplexing). As a result, we rely on the flexibility of combining rapid sequencing of lower volumes of samples as required, with high volume sequencing with a turnaround time target of five days.

Reducing cost has also remained an important target. In April 2020, we estimated that the cost of reagents to sequence one sample was £56, but through a process of continuous improvement this can now be achieved at £40 per sample in our smaller sequencing sites, and £20 per sample in the high volume sequencing sites.
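
The size of these savings is easier to see as a percentage of the April 2020 baseline. The sketch below uses only the three reagent costs quoted above; the function and variable names are our own for illustration.

```python
def percent_reduction(old_cost: float, new_cost: float) -> float:
    """Return the reduction from old_cost to new_cost as a percentage of old_cost."""
    if old_cost <= 0:
        raise ValueError("old_cost must be greater than zero")
    return (old_cost - new_cost) / old_cost * 100


BASELINE_GBP = 56  # estimated reagent cost per sample, April 2020 (from the text)
CURRENT_COSTS_GBP = {
    "smaller sequencing sites": 40,      # from the text
    "high volume sequencing sites": 20,  # from the text
}

# Map each site type to its percentage saving against the baseline.
reductions = {
    site: percent_reduction(BASELINE_GBP, cost)
    for site, cost in CURRENT_COSTS_GBP.items()
}
for site, reduction in reductions.items():
    print(f"{site}: {reduction:.0f}% lower than in April 2020")
```

This works out at a saving of a little under 30% at the smaller sites and a little under 65% at the high volume sites.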


7. Data interpretation tools and data release 

COG-UK members have produced a suite of tools capable of analysing SARS-CoV-2 genome data on the scales required. For example, Pangolin (Phylogenetic Assignment of Named Global Outbreak LINeages) allows users to assign lineages to genome sequences, view descriptive characteristics of the assigned lineages, view placement of the lineage in a global phylogeny and view the temporal and geographic distribution of the assigned lineages. It also enables user samples to be placed in a global context by linking to Microreact, a web application that provides a simple but powerful data linkage and visualisation method for linking genomics to epidemiology.

By linking phylogenetic trees together with geographic, temporal or other associated metadata, Microreact allows research and public health audiences to interpret data easily. CoV-GLUE is a web application for the interpretation and analysis of SARS-CoV-2 genome sequences that allows users to browse a database of amino acid replacements and coding region insertions and deletions in sequences. It also allows users to analyse their own SARS-CoV-2 sequences and receive an interactive report.

An important principle for the consortium is open access to data, methods and software analysis tools. All genome data are released to GISAID and ENA.


8. Administration and logistics achievements

Three foundational achievements are indispensable to the way that we work. First, our Consortium Agreement brought together the four UK PHAs and fifteen academic institution partners, which represents an unprecedented achievement for UK science and public health. Second, our Data Sharing Agreement supports the integration of epidemiological data from public health agencies with the genome sequences, enabling certain analyses using information not released into the public domain. Third, our Publication Policy ensures that consortium publications are fair and recognise the efforts of its many members, including those who are undertaking the extensive laboratory work that goes towards generating the genome data. Reaching each of these milestones has involved countless administrative and legal departments, and much patience and willingness to work through differences as they arose.

Having created a network of sequencing sites, we needed mechanisms to ensure a highly interconnected whole. This has been a huge achievement in terms of movement of samples and data, consumables, staffing, management, governance and communication. We also maintain numerous operational and working group meetings internally, and with partners and stakeholders.

At the Wellcome Sanger Institute, where high volume samples from the Lighthouse Labs are sequenced, cherry picking of positive samples is required from the 96 well plates used in testing laboratories, which may have a positivity rate as low as 1-2% when virus prevalence in the community is low. Storage of more than 6 million samples is a major achievement. Our working relationship with the Lighthouse Labs is fundamental to our success, and the data that we generate. 
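
A rough calculation shows why cherry picking is needed at low prevalence. The sketch below estimates the expected number of positive wells on a 96 well plate at the 1-2% positivity rates mentioned above. It assumes, purely for illustration, that every well holds a patient sample (real plates also carry control wells), and the function name is our own.

```python
WELLS_PER_PLATE = 96  # plate format mentioned in the text


def expected_positives(positivity_rate: float,
                       wells: int = WELLS_PER_PLATE) -> float:
    """Expected number of positive wells, assuming every well holds a sample."""
    if not 0 <= positivity_rate <= 1:
        raise ValueError("positivity_rate must be between 0 and 1")
    return positivity_rate * wells


# The 1% and 2% positivity rates are the low-prevalence range from the text.
low_expected = expected_positives(0.01)
high_expected = expected_positives(0.02)
print(f"Expected positives per plate: {low_expected:.2f} to {high_expected:.2f}")
```

With only around one or two positive wells expected per plate, sequencing whole plates would spend most of the capacity on negative samples, so positives are picked out and consolidated first.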


9. Events, seminars and external information

Our December 2020 Science Showcase event highlighted many aspects of the cutting-edge science undertaken by the consortium during the past 10 months, and we are making plans to host further such showcases in the coming months. The recording is available here for those who missed it. We also organise regular seminars to update consortium members and others on progress and outputs, on topics such as the genomics of hospital-onset infection and detecting SARS-CoV-2 in wastewater.

Our existing website is being revamped with a new version, which will be launched in February 2021. But in the meantime, we continue to provide regular blogs, commentaries, news items, reports and explainers for the research community, media and lay public, and post our latest updates on our LinkedIn and Twitter accounts. This year we are also developing an initiative which will be focused on showcasing and celebrating women in COG-UK and the work they do. 


10. National and international collaboration

Collaborating with others is a key part of sharing our knowledge. Important partners include GenOMICC (helping to understand critical illness in people with COVID-19 through human genome sequencing), CanCOGen (Genome Canada), the UK Coronavirus Immunology Consortium, the Genotype to Phenotype Consortium (G2P), REACT (real-time assessment of community transmission of coronavirus (COVID-19)), the Office for National Statistics coronavirus (COVID-19) infection survey, and the UK COVID-19 wastewater consortium. COG-UK also works closely with the Foreign and Commonwealth Office and responds to requests for collaboration and discussion.


Looking to the future 

Managing to have realised all of the above set against a backdrop of lockdowns, restrictions, remote working (for many), endless Zoom meetings, home schooling, supporting family members, and much more is perhaps the biggest achievement of all. This comes down to the dedication, commitment and support of our consortium members.

Our journey began with backing and support from Sir Patrick Vallance and Professor Chris Whitty, and funding from government sources and the Wellcome Sanger Institute. It is now time to head into a busy year during which there will be an orchestrated process of handover of our foundations, deep subject matter expertise, and technical know-how to a new nationalised pathogen sequencing capability that will be led by the PHAs and the NHS. This transition has already begun, will gather pace in the coming months and continue through to early 2022, and promises to result in one of the strongest public health pathogen sequencing networks worldwide.


Image Credit: Alex Cagan.


COVID-19 Genomics UK (COG-UK)

The current COVID-19 pandemic, caused by SARS-CoV-2, represents a major threat to health. The COVID-19 Genomics UK (COG-UK) consortium has been created to deliver large-scale and rapid whole-genome virus sequencing to local NHS centres and the UK government.

Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is made up of an innovative partnership of NHS organisations, the four Public Health Agencies of the UK, 15 academic partners providing sequencing and analysis capacity, and the central sequencing hub of the Wellcome Sanger Institute. A full list of collaborators can be found here. Professor Peacock is also on a part-time secondment to PHE as Director of Science, where she focuses on the development of pathogen sequencing through COG-UK.

COG-UK was established in April 2020 supported by £20 million funding from the COVID-19 rapid-research-response “fighting fund” from Her Majesty’s Treasury (established by Professor Chris Whitty and Sir Patrick Vallance), and administered by the National Institute for Health Research (NIHR), UK Research and Innovation (UKRI), and the Wellcome Sanger Institute. The consortium was also backed by the Department of Health and Social Care’s Testing Innovation Fund on 16 November 2020 to facilitate the genome sequencing capacity needed to meet the increasing number of COVID-19 cases in the UK over the winter period.