Commentary: COG-UK report 9 - 25th June 2020

The COVID-19 Genomics UK (COG-UK) Consortium has now sequenced more than 32,000 SARS-CoV-2 virus genomes from the UK. In the 9th report from the Consortium, dated 25th June, 29,593 of these were analysed.

The number of genomes is now large enough to analyse the effect of a mutation in the SARS-CoV-2 Spike protein on viral transmission, a question that has attracted much interest over the past few months.

Overall, the researchers found slightly different epidemic growth rates for two variants of the virus. It is not currently possible to confidently determine the size of this effect, but an effect does appear to be present.

Importantly, there is no evidence of a link between the variants and severity of COVID-19 disease.

The Spike protein

The SARS-CoV-2 virus, like all coronaviruses, has characteristic proteins that stick out of the core of the virus and form the ‘crowns’ that give coronaviruses their name. This characteristic protein, the Spike protein, allows the virus to attach to, and then enter, human cells.

Researchers around the world have been monitoring a change in the Spike protein, which was first seen in February. This mutation in the virus genome results in a single amino acid change at position 614 of the Spike protein, from aspartate (D) to glycine (G).

The frequency of SARS-CoV-2 genomes carrying this change, named the D614G variant, or 614G, or just ‘G’, has increased rapidly in both the UK and global datasets since it was first seen.

Assessing transmissibility

Research is underway to try to understand if a virus with the D614G variant is more transmissible. Laboratory studies are still in the early stages, but some suggest that this variant could be better at entering human cells. However, laboratory studies are currently limited in what they can reveal about transmission between patients and across a population.

Analysis of worldwide genome data can also allow assessments of transmissibility. Some analyses are consistent with the possibility of increased transmissibility of virus with the D614G variant. However, again, there are limitations. Studies of genomic data are limited by the inherent bias of global databases. Geographical or sampling biases mean the data may not be representative of the whole virus population, and so findings could be false signals. For example, there are very few genome sequences available from low- and middle-income countries. For countries where more data is available, there could be unseen biases in how the data has been collected. Any such bias distorts the picture of the virus population that the data provides, and so could affect any findings.

The ability to determine whether a genomic variant affects transmissibility can also be influenced by ‘founder effects’. This is where multiple introductions of a particular variant into a population create a signal in the data that looks like increased transmission potential, when in fact there is no difference in transmissibility.
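The effect of introductions described above can be illustrated with a toy calculation. The sketch below is not taken from the COG-UK report: the function name, the growth rate and the introduction days are all hypothetical, and every lineage is assumed to grow exponentially at exactly the same rate. Even so, the share of the G variant rises over time, simply because it is introduced more often.

```python
import math

def cases_at(day, introduction_days, rate):
    """Total cases on `day` from lineages seeded on `introduction_days`.

    Each introduction starts with one case and grows as exp(rate * age).
    """
    return sum(math.exp(rate * (day - t0)) for t0 in introduction_days if t0 <= day)

# Hypothetical values for illustration only (not COG-UK data).
RATE = 0.1                  # identical growth rate for both variants
D_INTROS = [0]              # one early introduction of D
G_INTROS = [5, 10, 15, 20]  # several later introductions of G

g_share = []
for day in (0, 10, 20):
    g = cases_at(day, G_INTROS, RATE)
    d = cases_at(day, D_INTROS, RATE)
    g_share.append(g / (g + d))  # fraction of all cases that are G
```

Here `g_share` increases from day 0 to day 20 although `RATE` is shared by both variants, which is the kind of false signal that an analysis of frequencies alone could mistake for higher transmissibility.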

Large scale analysis

The COG-UK dataset, with over 30,000 genome sequences, is now approaching the scale needed to attempt analyses that address some of these issues. In the 9th COG-UK report, Dr Erik Volz, Dr Thomas Connor and Dr Andrew Rambaut present an analysis of the growth rates of the D and G variants of the virus.

The researchers used several methods to estimate growth rates. The findings show that groups of related viruses (lineages) with the 614G variant have grown at a slightly faster rate than lineages with the 614D variant in the UK.
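The commentary does not describe the methods used in the report, so the following is only a minimal sketch of one simple way a difference in growth rates could be estimated. It assumes both variants grow exponentially, in which case the logarithm of the ratio of G to D counts is a straight line in time whose slope is the difference between the two growth rates. The function name and the weekly counts are hypothetical; the counts are synthetic and noise-free.

```python
import math

def growth_rate_difference(times, g_counts, d_counts):
    """Least-squares slope of log(G/D) against time.

    Under exponential growth of both variants, this slope equals r_G - r_D.
    """
    log_ratios = [math.log(g / d) for g, d in zip(g_counts, d_counts)]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_ratios) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_ratios))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

# Synthetic counts generated from known rates (NOT COG-UK data).
R_D, R_G = 0.10, 0.12  # assumed per-week growth rates
weeks = list(range(8))
d_counts = [100 * math.exp(R_D * t) for t in weeks]
g_counts = [50 * math.exp(R_G * t) for t in weeks]

estimated_difference = growth_rate_difference(weeks, g_counts, d_counts)
```

With real data the counts would be noisy and subject to the sampling biases and founder effects discussed earlier, so a slope obtained this way would need an uncertainty estimate and careful interpretation.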

These findings rely on underlying assumptions that, in time, may need to be adjusted as more data becomes available, and so the researchers urge caution when interpreting their findings.

The fact that such a huge dataset is needed to see these effects suggests that the link between the different lineages and transmission is a weak one. As the transmission of SARS-CoV-2 is affected by factors beyond how the virus gets into cells, not least the behaviours and movements of people, this is perhaps unsurprising.

Continued monitoring and analysis of D614G are essential to determine whether this mutation has any bearing on the progression of the pandemic in the UK and elsewhere.

Disease severity

To assess any potential links between the D and G variants and disease severity, the team analysed viral genome sequences and clinical characteristics from 1,870 patients. There was no statistically significant association between the D and G virus variants and patient outcomes.

While there are potential sources of bias in this dataset, the team is confident these have been accounted for as far as possible, and that this finding is robust.
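The commentary does not say which statistical test the team used. As an illustration of what testing for an association between variant and outcome can involve, the sketch below applies Fisher's exact test to a two-by-two table of variant (D or G) against outcome (severe or not severe). The choice of test, the function name and the counts are all assumptions made for this example; the counts are invented and are not the 1,870 patients in the report.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more likely than the observed one
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Invented counts for illustration only:
# rows are variants (D, G), columns are outcomes (severe, not severe).
p_value = fisher_exact_two_sided(12, 88, 30, 170)
```

A large p-value here would mean the data give no evidence of an association between variant and outcome; it would not by itself prove that there is none. A real analysis would also need to adjust for factors such as age and where each patient was sampled.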

Local analysis

As well as looking at genome sequences nationally, Dr Tom Connor, working with colleagues in Public Health Wales, has been analysing the data at a local level. The team has not assessed the Spike protein variants specifically, but has used genomic data together with patient data to assess local clusters of cases in North Wales. The work builds on an existing system developed to support the genomic surveillance of C. difficile bacteria.

The genomic data supported the identification of complex outbreaks across multiple wards in a hospital. By analysing the sequence data from hospital cases together with sequences from the community, the team was able to clearly identify different clusters.

The team has shown that systems which integrate genomic information with patient timelines and other information can provide powerful tools for infection prevention and control in hospitals. This can inform outbreak response in real time, and can help to understand the progression of past outbreaks, in order to improve future practice.

These same systems can also act as powerful public health tools for understanding outbreaks in communities and on a larger scale.