12 Jun 2020

The COVID-19 genome



What studying the virus’s genomes can tell us about the pandemic

More than 20,000 SARS-CoV-2 genomes from viruses that have caused infection in people in the UK have been sequenced by the COVID-19 Genomics UK (COG-UK) Consortium to date. Analysing these and future sequences will help build the evidence for how genome data can be used as a vital part of controlling the pandemic.

Information from the genome sequences will help track the spread of the coronavirus in the UK and support public health planning and clinical decision making.

In such a fast-moving situation, careful interpretation of information from genome sequences, together with additional data, is essential. Here, we reflect on what the genome data can, and can’t, tell us.

Viral genomes

Unlike the genomes of most organisms, the SARS-CoV-2 genome is made of RNA rather than DNA. The genome sequence of the SARS-CoV-2 virus was determined several months ago, after the virus was first detected in China[1]. It is small, at just under 30,000 letters, or bases (29 kilobases), with only 15 genes. Humans, by comparison, have around 20,000 genes in a genome of around 3 billion base pairs (roughly 3 gigabases). More about COVID-19 biology is available on the UKRI website.

Why sequence genomes of the SARS-CoV-2 virus?

Building genome-based trees to define transmission

Genomes mutate. Letters in the genome sequence change as organisms replicate. Virus genomes usually mutate at a steady rate, although that rate differs between viruses: HIV mutates extremely rapidly, influenza more slowly, and coronaviruses more slowly still. Researchers can use the mutation rate as a molecular clock. The number of genetic differences between two viruses is roughly proportional to the time since they last shared a common ancestor. Individual virus sequences can therefore be placed back in time on a phylogenetic tree, much like a family tree, which shows how closely related two or more SARS-CoV-2 viruses are.

With a new virus, it is initially hard to define how fast the clock is ticking. The SARS-CoV-2 mutation rate was first estimated from that of related viruses, though researchers now estimate that the virus accumulates approximately 2.5 base changes a month, which is slow in evolutionary terms.
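As a rough illustration of the molecular clock idea, the Python sketch below counts the differences between two aligned sequences and converts that count into an approximate time since their common ancestor. It rests on assumptions that are not from the research itself: that the rate of about 2.5 changes a month applies to each lineage separately (hence the factor of two), and that the sequences are already aligned and of equal length. The function names and the short toy sequences are hypothetical and are not part of any COG-UK pipeline.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def months_since_common_ancestor(differences: int, rate_per_month: float = 2.5) -> float:
    """Estimate months since two viruses shared a common ancestor.

    Assumption: each lineage accumulates changes independently at
    rate_per_month, so differences between the two build up at twice that rate.
    """
    return differences / (2 * rate_per_month)


# Toy 12-base sequences, made up purely for illustration.
virus_a = "ACGUACGUACGU"
virus_b = "ACGUACCUACGA"

d = count_differences(virus_a, virus_b)
print(d, months_since_common_ancestor(d))
```

Real analyses work on whole genomes of about 29,000 bases and use statistical models rather than a simple division, so this only conveys the principle.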

Together with the fact that the virus has a very recent common ancestor – in December 2019 – the slow mutation rate means that there is limited genomic diversity in the circulating viruses so far, although that will change over time as mutations accumulate. Despite this, it has been possible to trace the virus’s history, from the centre of the outbreak, to all corners of the world. Researchers are constantly refining and updating the picture as more evidence becomes available. To view global data for SARS-CoV-2 to date, visit

Local transmission

The same principles of building a phylogenetic tree can be used on a more local scale, too. The virus in a particular area, be that a hospital, town, or region, may have a particular genomic change. This change may be different from a virus that is multiplying and spreading in another area. If a third area is tested, researchers can, in some cases, trace where it has come from, based on its sequence.
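The comparison described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the area names and short sequences are invented, each area is represented by a single sequence, and similarity is measured by simply counting differing positions rather than by building a full phylogenetic tree.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def closest_area(sample: str, area_sequences: dict[str, str]) -> tuple[str, int]:
    """Return the area whose representative sequence is nearest to the sample."""
    distances = {
        area: count_differences(sample, seq)
        for area, seq in area_sequences.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical representative sequences for two areas (toy data).
areas = {
    "area_1": "ACGUACGUACGU",
    "area_2": "ACGUACCUACGU",  # carries a local change at the seventh base
}
new_sample = "ACGUACCUACGA"  # hypothetical virus sampled in a third area

area, distance = closest_area(new_sample, areas)
print(area, distance)
```

Here the new sample shares the local change seen in the second area, so it sits closest to that area. A small distance on its own does not prove a link; other epidemiological information is still needed.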

COG-UK researchers envisage that the UK will soon reach this point, once enough data has accumulated to reveal ‘local’ mutations in the virus. If sequencing can be done in real time, this is important information for public health officials: outbreaks can be spotted and brought to a rapid close, and other interventions can be introduced to reduce the chances of the same thing happening again.

Recent research led by Professor Ian Goodfellow and Dr Estée Török at the University of Cambridge assessed how useful genomic sequencing of the virus can be within a hospital. They assessed hundreds of virus sequences from Cambridge University Hospitals NHS Foundation Trust during March and April. Together with data about the movement of patients and staff, they were able to identify clusters of infections that were linked, and some that weren’t. This, in turn, helped inform infection control procedures. The genomic data provided evidence to support or refute transmission between potentially linked cases.

False connections

But caution is needed when interpreting such data. Sequences from two or more people could be the same through chance rather than because they are part of an outbreak. Other information, such as whether the people involved have been in direct contact or shared the same environment, is an essential part of the process when investigating possible outbreaks.

As a result, it is easier to rule out an outbreak when people with COVID-19 have viruses that are genetically distinct than it is to confirm an outbreak when the genomes are the same. Extensive spread of the virus means that identical genomes can be seen even in different countries despite the lack of a direct epidemiological link.
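To see why identical genomes are weak evidence on their own, a rough back-of-the-envelope sketch may help. It assumes that changes arise as a Poisson process at the rate of roughly 2.5 a month mentioned earlier, and that a month is 30 days. Both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
import math


def prob_no_change(days: float, rate_per_month: float = 2.5,
                   days_per_month: float = 30.0) -> float:
    """Probability that a lineage picks up no changes over the given days.

    Assumption: changes follow a Poisson process with a constant rate.
    """
    expected_changes = rate_per_month * days / days_per_month
    return math.exp(-expected_changes)


# Chance of a virus lineage staying unchanged over a few time spans.
for days in (5, 10, 20):
    print(days, round(prob_no_change(days), 2))
```

Under these assumptions, a lineage has roughly a two-in-three chance of staying unchanged over five days of spread, so the virus can pass through several people without its genome changing. Matching genomes therefore cannot show by themselves that one person infected another.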

An important pitfall is using too few sequences in an outbreak analysis. This can lead to false connections being made between genomes that turn out to be more distantly related once more sequences are added to the analysis. Genomic analysis is a dynamic process, and it depends on having the right number of sequences and the right sampling strategy.


COVID-19 Genomics UK (COG-UK)

The COVID-19 Genomics UK (COG-UK) consortium works in partnership to harness the power of SARS-CoV-2 genomics in the fight against COVID-19.

Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is an innovative collaboration of NHS organisations, the four public health agencies of the UK, the Wellcome Sanger Institute and sixteen academic partners. A full list of collaborators can be found here.

The COVID-19 pandemic, caused by SARS-CoV-2, represents a major threat to health. The COG-UK consortium was formed in March 2020 to deliver SARS-CoV-2 genome sequencing and analysis to inform public health policy and to support the establishment of a national pathogen sequencing service, with sequence data now predominantly generated by the Wellcome Sanger Institute and the Public Health Agencies.

SARS-CoV-2 genome sequencing and analysis plays a key role in the COVID-19 public health response by enabling the identification, tracking and analysis of variants of concern, and by informing the design of vaccines and therapeutics. COG-UK works collaboratively to deliver world-class research on pathogen sequencing and analysis, maximise the value of genomic data by ensuring fair access and data linkage, and provide a training programme to enable equity in global sequencing.