The terms used to describe the genetics of viruses are easily misunderstood and can contribute to inaccurate reporting or cause unnecessary confusion. With this Explainer, we aim to define some of these terms using clear language and straightforward examples.
At the COVID-19 Genomics UK (COG-UK) Consortium we sequence genomes of SARS-CoV-2, the virus that causes COVID-19, in order to help understand and control its spread. The term ‘sequencing’ means reading out the order of the roughly 30,000 letters of the RNA genetic code making up the virus, each of which can be A, U, C or G.
Each time the SARS-CoV-2 virus replicates in human cells it has the opportunity to accumulate small changes in its genome. Many of these changes will be silent and have no impact on the properties of the virus, but others can alter how it behaves. This characteristic of gradual genetic change is inevitable and common not only to all viruses but to all living things, although the speed with which viruses multiply means they change relatively fast.
This Explainer is a simple guide to using terms such as ‘mutation’, ‘variant’ and ‘strain’ when talking about the virus that causes COVID-19.
First the easy bit – when to use the word ‘strain’ when talking about the virus which causes COVID-19. The simple answer is never. There is no need to use ‘strain’ to talk about the virus, as there is only one strain of SARS-CoV-2. Some viruses do have more than one strain, and these strains differ substantially in their biological properties and behaviour, but there are currently no clear dividing lines separating the SARS-CoV-2 virus into strains.
Mutation and variant
The words ‘mutation’ and ‘variant’ are sometimes used interchangeably, but they mean quite different things. A mutation is a single change in the genetic material of the virus (RNA in this case). A variant is the whole sequence of the virus (the genome), which may contain one or more mutations.
We can use an imaginary sequence to illustrate the difference between a mutation and a variant. Instead of 30,000 letters long, this imaginary sequence is 30 letters long (not enough to code for a virus but sufficient for explanatory purposes):
Imagine the mini sequence is copied, but the copying sometimes incorporates mistakes, shown in yellow:
The first change is a C to a U; the altered sequence then changes again when an A is replaced with a G.
The original sequence, equivalent to the ‘wild-type’ or ‘reference sequence’, is:
The change in the letter C to U is a single mutation, as is the change of A to G.
The following sequence is a variant that contains one mutation:
This sequence is a variant with two mutations:
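The same distinction can be sketched in a few lines of Python. The 30-letter reference sequence below is invented for this sketch and is not the sequence from the original illustration; the function names and the positions of the two changes are likewise assumptions. The sketch only handles single-letter substitutions, not insertions or deletions.

```python
# Hypothetical 30-letter reference ("wild-type") sequence, invented for illustration.
REFERENCE = "AUGGCUAACGUUAGCCGAUUCGGAUACCUA"


def substitute(sequence, position, new_letter):
    """Return a copy of `sequence` with the letter at a 1-based position replaced."""
    index = position - 1
    return sequence[:index] + new_letter + sequence[index + 1:]


def find_mutations(reference, variant):
    """List single-letter changes as strings such as 'C5U' (old letter, position, new letter)."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        f"{old}{position}{new}"
        for position, (old, new) in enumerate(zip(reference, variant), start=1)
        if old != new
    ]


# First copying mistake: a C becomes a U (position 5 in this made-up sequence).
variant_one = substitute(REFERENCE, 5, "U")
# Second copying mistake: an A becomes a G (position 13 in this made-up sequence).
variant_two = substitute(variant_one, 13, "G")

print(find_mutations(REFERENCE, variant_one))  # prints ['C5U']
print(find_mutations(REFERENCE, variant_two))  # prints ['C5U', 'A13G']
```

Each entry in the printed lists is a mutation; `variant_one` and `variant_two` are variants, that is, whole sequences carrying one and two mutations respectively.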
Many viral mutations will either have no impact on virus behaviour or make the virus less competitive against other variants. The important mutations are those making the virus more competitive, for example, by increasing how quickly the virus spreads or the seriousness of the resulting illness.
Variants of concern
COG-UK, in partnership with the public health authorities, carefully tracks all interesting variants. If there is clear evidence that a variant is causing problems, it is designated a Variant of Concern. An example of a Variant of Concern is the cluster of viruses first identified in Kent, sometimes referred to as the B.1.1.7 lineage.
The vast majority of variants are no more than normal variations in the virus (it would be very strange, in virus terms, if all copies of SARS-CoV-2 viruses around the world were identical in every one of their 30,000 RNA bases). At the time of writing, many thousands of unique combinations of viral mutations have been identified in the UK, but only three have been designated Variants of Concern.
Clade and lineage
Finally, the terms ‘clade’ and ‘lineage’ are heard less often in the media than ‘variant’ and ‘mutation’. Both terms are ways of grouping related virus sequences to generate order and answer particular questions.
Clade is used to describe relatively large, long-lived groups to help understand persisting trends. An example is the Nextstrain system, which classifies SARS-CoV-2 viruses into groups according to sequence similarities. The clades are named with a number representing the year of first identification followed by a letter; for example, 20A is the first clade defined in 2020.
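As a small illustration of this naming pattern, the sketch below splits a clade name into its year and letter. The function name `parse_clade` is hypothetical, and the sketch assumes a name is exactly two digits followed by one capital letter, as in the example above; names that carry anything further are rejected rather than interpreted.

```python
import re


def parse_clade(name):
    """Split a year-letter clade name such as '20A' into (year, letter)."""
    # Assumption: two digits for the year, then a single capital letter.
    match = re.fullmatch(r"(\d{2})([A-Z])", name)
    if match is None:
        raise ValueError(f"not a year-letter clade name: {name!r}")
    year = 2000 + int(match.group(1))
    return year, match.group(2)


print(parse_clade("20A"))  # prints (2020, 'A')
```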
Lineage is used to describe groups that change more often than clades, and is better suited to describing sequences at the leading edge of the pandemic. Lineages are useful for outbreak investigations and for tracking relationships on a finer scale. The classification of lineages takes into account information about how the virus is spreading as well as genetic sequence data. The labelling system is hierarchical and is responsible for the name “B.1.1.7”. In this case the lineage B.1 evolved and a subset became B.1.1, which changed again to give at least 7 lineages, one of which was B.1.1.7.
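The hierarchy in a name like B.1.1.7 can be read directly off the dots, as the sketch below shows. The function names are hypothetical, and the sketch treats the name purely as dot-separated text; it does not handle the shortened alias names that the real lineage system uses for deeply nested lineages.

```python
def lineage_ancestors(lineage):
    """Return the parent lineages of a dotted name, nearest parent first."""
    parts = lineage.split(".")
    # Drop one trailing component at a time: B.1.1.7 -> B.1.1 -> B.1 -> B
    return [".".join(parts[:end]) for end in range(len(parts) - 1, 0, -1)]


def is_descendant(lineage, ancestor):
    """True if `ancestor` appears anywhere in the parent chain of `lineage`."""
    return ancestor in lineage_ancestors(lineage)


print(lineage_ancestors("B.1.1.7"))     # prints ['B.1.1', 'B.1', 'B']
print(is_descendant("B.1.1.7", "B.1"))  # prints True
```

Comparing whole dot-separated components, rather than checking whether one name starts with another, avoids mistaking a name such as B.11 for a descendant of B.1.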
Coming to terms with viral evolution
At COG-UK we understand that talk of new strains, mutants and variants of SARS-CoV-2 can be confusing or even frightening. However, genetic change in viruses is a normal and inevitable process, and it is the very reason that COG-UK began sequencing viruses early in the pandemic.
New mutations, variants and lineages come and go over time, and we need to keep using genomic sequencing to monitor for the minor fraction of variants that might behave differently when infecting people or encountering immune responses primed by vaccination. The good news from around the world is that whether through vaccinations, clinical treatments, or changes in behaviour, we have plenty of tools with which to tackle all variants of the virus.
COVID-19 Genomics UK (COG-UK)
The current COVID-19 pandemic, caused by SARS-CoV-2, represents a major threat to health. The COVID-19 Genomics UK (COG-UK) consortium has been created to deliver large-scale and rapid whole-genome virus sequencing to local NHS centres and the UK government.
Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is made up of an innovative partnership of NHS organisations, the four Public Health Agencies of the UK, the Wellcome Sanger Institute and twelve academic partners providing sequencing and analysis capacity. A full list of collaborators can be found here. Professor Peacock is also on a part-time secondment to PHE as Director of Science, where she focuses on the development of pathogen sequencing through COG-UK.
COG-UK was established in April 2020 with £20 million of funding from the COVID-19 rapid-research-response “fighting fund” from Her Majesty’s Treasury (established by Professor Chris Whitty and Sir Patrick Vallance), administered by the National Institute for Health Research (NIHR), UK Research and Innovation (UKRI), and the Wellcome Sanger Institute. On 16 November 2020 the consortium also received backing from the Department of Health and Social Care’s Testing Innovation Fund to provide the genome sequencing capacity needed to meet the increasing number of COVID-19 cases in the UK over the winter period.