Report 1: 23rd March 2020 – COVID-19 Genomics UK (COG-UK) Consortium
Please Note: This report is provided at the request of SAGE and includes information on the ongoing state of the research being carried out. It should not be considered formal or informal advice. The conclusions of the ongoing scientific studies may be subject to change as further evidence becomes available and as such any firm conclusions would be premature.
- The COG-UK consortium has rapidly established a multi-agency national network to deliver coordinated large-scale sequencing and analysis of SARS-CoV-2 genomes.
- In the space of 12 days, the COG-UK centres already online have sequenced and analysed 260 SARS-CoV-2 genomes.
- Initial analyses indicate a large number of independent SARS-CoV-2 introductions to the UK, from multiple locations around the world.
- Enabling access to patient electronic health records to allow a detailed epidemiological analysis of the early spread of cases in the UK COVID-19 epidemic is a crucial next step.
About the COG-UK Consortium
The COVID-19 Genomics UK Consortium (COG-UK) – composed of the NHS, UK Public Health Agencies, multiple UK Universities, and the Wellcome Sanger Institute – has been established to deliver large-scale, rapid sequencing of SARS-CoV-2 genomes.
In just 12 days, COG-UK has assembled a truly national network of sequencing centres and analysis groups located in Belfast, Birmingham, Cambridge, Cardiff, Edinburgh, Exeter, Glasgow, Liverpool, London, Norwich, Nottingham, Oxford and Sheffield. These initial locations are based on sequencing capacity that is immediately available; we envisage incorporating sequencing capacity at other locations in future.
A national operations hub and large-scale sequencing facility has been established at the Wellcome Sanger Institute and the University of Cambridge. It currently includes approximately 30 people drawn from Sanger core staff and has the potential to grow to meet demand.
The coordinated sequencing of samples from patients with confirmed cases of COVID-19, and subsequent analysis of genomic data by the COG-UK consortium, will enable clinicians and public health teams to rapidly investigate clusters of cases, to understand how the virus spreads and to implement appropriate infection control measures.
Achievements to date
- A meeting at the Wellcome Trust in London on 11th March brought together key stakeholders to explore the opportunity to harness national capabilities to support genomic surveillance of COVID-19.
- A proposal was submitted to the UK government on 16th March outlining the COG-UK consortium plan to deliver large-scale, rapid SARS-CoV-2 sequencing capacity to local NHS centres and the UK government.
- By the data cut-off for this report on 20th March, COG-UK has sequenced, assembled, collated and analysed 260 complete SARS-CoV-2 genomes from 7 sequencing centres in the network. To place this in context, in just 11 days COG-UK generated more virus genomes than reported by any other country, other than China, during the entire epidemic to date.
- Working groups have been established in five key areas:
  - Sample collection: finding effective ways of working with NHS and public health labs at both the local and national level.
  - Methodology: sharing and standardising protocols for sample processing and genome sequencing.
  - Metadata: developing protocols and informatics solutions to link genomic data to clinical, epidemiological and biological data collected by the NHS, public health agencies and academic groups.
  - Analysis: inferring epidemiological processes from genome sequence data using phylogenetic trees and other computational methods.
  - Report writing: producing weekly reports based on the analytical findings and tailored for the needs of the NHS, public health agencies and government.
- A coordinated naming system for all sequenced UK samples has been developed and agreed.
- A digital analysis and data sharing environment has been established on MRC CLIMB (Cloud Infrastructure for Microbial Bioinformatics), where all samples were collated and analysed.
- A system for sharing visualizations of the relationships and distribution of samples has been deployed, using Microreact (Appendix Fig S3).
- Plans for the collation of virus genome sequence and linked, anonymised eHealth data have been initiated with HDR-UK.
- An initial outline plan for the types of required metadata has been developed and circulated to NHS partners for approval.
- A system for metadata submission and collection from the diverse range of organisations involved has been created using Epicollect.
- A comprehensive system for collaborative online working and teleconferencing has been established.
Questions COG-UK will address
We have identified multiple public health applications of large-scale SARS-CoV-2 genome sequencing:
- Enhance our understanding of epidemiology and transmission of SARS-CoV-2.
- Infer the relative contribution of local transmission versus imported cases.
- Distinguish individual chains of transmission at a local level.
- Estimate rates of epidemic growth, unreported cases and rate of sampling.
- Determine patterns of within-country virus spread among locations and population sub-groups.
- Provide insights into mechanisms and drivers of long-distance dispersal.
- Enable monitoring of interventions and treatment
- Monitor the effect of non-pharmacological interventions (such as travel restrictions, social distancing and isolation) on SARS-CoV-2 spread, epidemiology and biology.
- Monitor changes in virus antigenicity and the emergence of resistance mutations in response to the introduction of vaccines and drugs.
- Identify virus genetic markers associated with clinical severity to prioritize for functional investigation.
- Expand biological understanding and further research
- Assess the functional and phenotypic relevance of mutations.
Initial analytical findings
Only 12 days since the initial conception of the COG-UK consortium, the genomic data being generated and analysed have already allowed us to begin to address important questions regarding the spread of COVID-19 in the UK and beyond.
- We have identified 12 viral lineages in the 260 UK SARS-CoV-2 genomes sequenced to date. However, under-sampling in the UK and elsewhere means the number of independent introductions of SARS-CoV-2 is very likely substantially higher.
- The data are consistent with a large number of independent SARS-CoV-2 introductions to the UK, from multiple locations around the world.
- Major import to the UK seems to have occurred from locations with large epidemics and high travel volumes, notably Italy and other parts of Europe.
- Community transmission of multiple lineages/clusters following introduction is likely, as revealed by cases where genomes have come from patients with no known contact with an international source.
- A crucial next step for COG-UK is to access patient electronic health records to allow a detailed epidemiological analysis of the early spread of cases in the UK COVID-19 epidemic. This analysis could provide important insight to help design future early interventions against viral introductions and transmission.
- Access to electronic health records will also enable COG-UK to investigate any association between viral lineage, genotype and severity of disease.
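The kind of tally behind the findings above can be illustrated with a minimal sketch. It assumes each genome has already been assigned a lineage and carries a flag for whether the patient had a known international travel link. The records, field layout, sample identifiers and lineage labels below are hypothetical placeholders, not COG-UK data or the consortium's actual pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical records, for illustration only:
# (sample_id, nation, lineage, known_travel_link)
records = [
    ("sample-001", "England", "lineage-A", True),
    ("sample-002", "England", "lineage-A", False),
    ("sample-003", "Scotland", "lineage-B", True),
    ("sample-004", "Wales", "lineage-A", False),
    ("sample-005", "England", "lineage-C", True),
]


def summarise(records):
    """Count genomes per lineage and list genomes with no known travel link."""
    lineage_counts = Counter(lineage for _, _, lineage, _ in records)
    no_travel = defaultdict(list)
    for sample_id, _, lineage, travel_link in records:
        if not travel_link:
            # Candidate community transmission: no known international source.
            no_travel[lineage].append(sample_id)
    return lineage_counts, dict(no_travel)


lineage_counts, no_travel = summarise(records)
print(len(lineage_counts), "lineages observed in the sampled genomes")
for lineage, samples in sorted(no_travel.items()):
    print(lineage, "has", len(samples), "genome(s) with no known travel link")
```

As the report notes, the number of lineages observed in a sample is likely to understate the number of independent introductions, and genomes without a known travel link are only suggestive of community transmission until linked epidemiological data are available.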
Figure S1 | Phylogenetic tree of the first 260 UK genome sequences in the context of all global data. Larger circles denote cases from England (red), Northern Ireland (pink), Scotland (blue) and Wales (green). The distribution of UK cases across the entire global diversity reveals the many imports of the virus from across the world. Bars on the right denote the lineages with cases in the UK and correspond to those in Figure 2. The lineages B.11 and B.12.1 at the bottom of the tree are the predominant lineages circulating in Italy and continental Europe. Clustering of UK genomes may indicate community spread, but this must be interpreted with caution because such groupings would also be expected when travellers return from common destinations.
Figure S2 | Detailed phylogenetic tree of all global SARS-CoV-2 genomes from GISAID as of 21-Mar-2020 combined with 260 UK virus genomes sequenced by COG-UK. Viruses are coloured by location of sampling and lineages are labelled using the same scheme as Figures 2 and S1.
Figure S3 | Data are live-linked and visualised using Microreact (web application), enabling ongoing monitoring of trends across the consortium.