10 Apr 2020

Report 3: 9th April 2020 – COVID-19 Genomics UK (COG-UK) Consortium

Report by the COVID-19 Genomics UK (COG-UK) Consortium


Executive Summary

  • COG-UK has brought online an additional sequencing centre (Exeter) and increased the total number of SARS-CoV-2 genomes sequenced and analysed to 1679 (up from 806 on 31st March). The UK has now reported more genomes than any other individual country in the pandemic to date, accounting for around one third of the global total.
  • Roadblocks remain in accessing epidemiological data and integrating it with viral genome sequence data in some regions of the UK. This limits the ability of COG-UK to realise the full potential of genomic surveillance to impact the course of the COVID-19 pandemic.
  • Analyses are underway to develop weekly phylodynamic situation reports with outputs, such as reproduction number estimates, at the level of cities or local authorities, and to integrate aggregate mobility data with SARS-CoV-2 genomic data.

COG-UK update

In the past week, an additional sequencing centre at the University of Exeter came online, taking the total number of active COG-UK sites to nine. A further seven sequencing centres are expected to come online shortly (Figure 1).

Sequencing capacity currently online is sufficient to sequence ~9,000 samples per week, which exceeds our present ability to access positive samples.

The pipeline for downloading the latest datasets from GISAID, performing quality control, integrating them with the latest UK data and installing the results on CLIMB is now essentially automated. With a firm data cut-off on Friday afternoon, the data and phylogenetic trees should be available for annotation and subsequent analysis from Monday morning each week.

By the data cut-off for this report, the total number of viral genomes stood at 1679 (Table 1). To place this in context, in the single month since the consortium became active, COG-UK has generated more genomes than any other country has reported during the entire epidemic, accounting for approximately one third of all SARS-CoV-2 genomes reported to date (Table 2).

COG-UK has proposed (and adopted) a workable and practical nomenclature system for SARS-CoV-2. It describes virus lineages and facilitates real-time genomic epidemiology by providing commonly agreed labels for viruses circulating in different parts of the world. A full report on the nomenclature is available separately.

From the genomes available, we have identified at least 39 viral lineages that have been, or are, in circulation within the UK. Figure 2 illustrates the duration, size and geographic distribution of these lineages. As noted previously, this is likely a substantial underestimate of the number of independent virus introductions and active transmission chains in the country.

Using data generated through COG-UK, we are working towards a system for producing weekly phylodynamic situation reports, with outputs such as reproduction number estimates at the level of cities or local authorities. (Phylodynamics studies how epidemiological, immunological and evolutionary processes act and interact to shape viral phylogenies.)

Access to aggregated UK mobility data from the mobile phone operator O2, covering the period from 20th March onwards, has been secured. Work is underway to integrate these mobility data with SARS-CoV-2 genomic data, to help discriminate among different scenarios of spatial spread when the genomic data alone are not sufficiently informative. An initial trial in a location with a high density of sampling and good metadata for this type of analysis is being considered.


  • Building the infrastructure, and embedding COG-UK personnel, to access patient electronic health records and obtain the metadata our analyses need to provide actionable insights remains a roadblock in some areas. To realise the full potential of COG-UK genomic surveillance and to begin addressing pressing epidemiological questions, it is imperative that these barriers are overcome as a high priority.
  • Sequencing capacity now outstrips the ability to access positive samples. Further work to enable the smooth transfer of samples from NHS laboratories to COG-UK sequencing centres needs to be prioritised. For instance, a direction could state that material transfer agreements are not essential.

Analysis Summaries

This weekly report focuses on providing an update on COG-UK progress, reporting key numbers and statistics, and highlighting specific bespoke analyses that are being undertaken. This section will provide brief summaries of these analyses. In-depth reports will be submitted separately, where appropriate.

Visualising transmission chains

As more genomic data become available, our analyses can map the duration, size and geographic distribution of the SARS-CoV-2 lineages circulating in the UK and reconstruct transmission chains. As can be seen in Figure 2, we can identify at least 39 lineages that have been, or are currently, circulating. Some of these lineages are probably no longer in circulation (as suggested by the absence of genomes from later time points), although additional data in the coming weeks will be required to confirm this.

Figure 2 | Reconstructed UK transmission lineages with at least 5 viral genomes. Each row represents a single introduction and subsequent UK spread, reconstructed from the virus genomes captured in the COG-UK sample in combination with global genome data. The diameter of each circle represents the number of virus genomes sampled on that day. The bars denote the total number of virus genomes in each lineage.
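The tallying behind a plot like Figure 2 can be illustrated with a minimal sketch. The sketch counts genomes per reconstructed lineage per sampling day, which corresponds to the circle sizes. It also computes the total per lineage, which corresponds to the bars, and keeps only lineages with at least 5 genomes. It assumes each genome has already been assigned to a UK transmission lineage. That phylogenetic reconstruction step is not shown. The `(lineage, sampling date)` input format, the function name and the lineage labels are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def summarise_lineages(genomes, min_genomes=5):
    """Tally genomes per lineage and per sampling day.

    `genomes` is an iterable of (lineage_label, sampling_date) pairs; this
    input format is a hypothetical assumption for illustration.
    Returns {lineage: (total_genomes, {sampling_date: count})}, keeping only
    lineages with at least `min_genomes` genomes (5 in the Figure 2 caption).
    """
    per_day = defaultdict(Counter)
    for lineage, sampled in genomes:
        per_day[lineage][sampled] += 1  # per-day count -> circle size

    summary = {}
    for lineage, days in per_day.items():
        total = sum(days.values())  # total count -> bar length
        if total >= min_genomes:
            summary[lineage] = (total, dict(sorted(days.items())))
    return summary


if __name__ == "__main__":
    # Hypothetical example records, not real COG-UK data.
    records = (
        [("UK_lineage_1", date(2020, 3, 20))] * 3
        + [("UK_lineage_1", date(2020, 3, 22))] * 3
        + [("UK_lineage_2", date(2020, 3, 21))] * 3
    )
    print(summarise_lineages(records))
```

In this toy input, only the hypothetical `UK_lineage_1` (6 genomes) passes the threshold. `UK_lineage_2` (3 genomes) is dropped, just as small lineages are omitted from Figure 2.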


Figure S1 | Phylogenetic tree of the first 1679 UK genome sequences in the context of all global data. Larger circles denote cases from England (red), Northern Ireland (pink), Scotland (blue) and Wales (green). The distribution of UK cases across the entire global diversity reveals the many importations of the virus from across the world. Bars on the right denote the lineages with cases in the UK and correspond to those in Figure 2. Clustering of UK genomes may indicate community spread, but this must be interpreted with caution, as such groupings would also be expected from travellers returning from common destinations.

Figure S2 | Geographic distribution of known SARS-CoV-2 lineages in the UK – pie charts summarise the distribution of lineages at each location.

Figure S3 | Current live view of data linked and visualised using Microreact – geographic and temporal patterns in lineages can be shared and investigated. The view shown highlights the geographic and temporal distribution of lineage B.1.11.

COVID-19 Genomics UK (COG-UK)

The COVID-19 Genomics UK (COG-UK) consortium works in partnership to harness the power of SARS-CoV-2 genomics in the fight against COVID-19.

Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is an innovative collaboration of NHS organisations, the four public health agencies of the UK, the Wellcome Sanger Institute and sixteen academic partners. A full list of collaborators is available separately.

The COVID-19 pandemic, caused by SARS-CoV-2, represents a major threat to health. The COG-UK consortium was formed in March 2020 to deliver SARS-CoV-2 genome sequencing and analysis to inform public health policy and to support the establishment of a national pathogen sequencing service, with sequence data now predominantly generated by the Wellcome Sanger Institute and the Public Health Agencies.

SARS-CoV-2 genome sequencing and analysis plays a key role in the COVID-19 public health response by enabling the identification, tracking and analysis of variants of concern, and by informing the design of vaccines and therapeutics. COG-UK works collaboratively to deliver world-class research on pathogen sequencing and analysis, maximise the value of genomic data by ensuring fair access and data linkage, and provide a training programme to enable equity in global sequencing.