7 Jun 2021

Genomics in a pandemic – Shedding light on the invisible

Genomic sequencing is giving researchers unprecedented detail about the coronavirus pandemic in the UK. The speed and scale of genome sequencing for SARS-CoV-2 means that data can help identify variants of concern, and feed into public health responses. Experts from the COVID-19 Genomics UK Consortium discuss the huge genomic sequencing effort, variants, and viral fitness landscapes.

By Alison Cranage, Science Writer at the Wellcome Sanger Institute. Illustrations by Laura Olivares Boldú, Graphic Designer and Illustrator at the Wellcome Genome Campus.

Covid connections, a series of events organised by Wellcome Connecting Science, brings together experts from across the COVID-19 Genomics UK Consortium (COG-UK). Genomics has proven to be a vital tool in the response to the pandemic and data from genomic sequencing is helping public health agencies trace outbreaks and identify new variants of interest and concern.

COG-UK was formed in March 2020 and includes UK public health agencies, NHS organisations and academic institutions, including the Wellcome Sanger Institute. The consortium works to sequence and analyse coronavirus genomes. It’s the largest sequencing effort ever undertaken for a viral pathogen, and to date, the consortium has sequenced over 490,000 SARS-CoV-2 genomes – around one fifth of the world’s publicly available sequences.

In the first event of the series, chaired by science writer Philip Ball, Verity Hill, Catherine Ludden, and Jeffrey Barrett discussed the huge genomic sequencing effort, variants, and viral mutational landscapes.

You can read a summary of their talks below, or visit YouTube to watch a recording of the full event.

Please note that the talks took place on 12th May 2021, and so statistics and figures mentioned relate to the situation at the time.

To sign up for the next talks, ‘When does a variant become a villain’, visit the Covid connections website.

How are genomes sequenced?

Sequencing coronavirus genomes from infected patients is a huge logistical challenge. Dr Catherine Ludden, Director of Operations for COG-UK, oversees the flow of samples as they move from swabs to sequence data, which is reported back to public health agencies.

Over 120 diagnostic labs send coronavirus samples to 16 COG-UK sequencing centres. Samples also arrive from ONS surveys, vaccine studies, prisons and care homes.

Catherine: “There are so many people who deserve recognition. Everyone in the diagnostic laboratories, couriers, everyone in the sequencing labs, plus those who manage and analyse the data. I’d like to thank everyone involved. I think it’s something that the UK should be very proud of.”

Why sequence a genome?

When COG-UK was first set up, researchers saw the most use for genomic data in tracking outbreaks. Genome sequence data, together with other anonymised data – for example about people’s movements – can help determine if one virus is likely to be related to another, and so help to untangle the routes of transmission.

Verity Hill, a PhD student at the University of Edinburgh, described how this information is being used effectively in Scotland. “We’ve built a computational tool, called CIVET, where you can enter in the genome sequence and it will identify the cluster it’s part of. That can be very useful for working out whether a hospital outbreak is really an outbreak, or whether it’s lots of different introductions from the community, for example.”

Catherine highlighted the importance of good epidemiological data alongside genome data. “You can have two sequences that are identical – zero SNPs [single nucleotide polymorphisms] apart. But you really need that epidemiological data to understand if they are part of an outbreak.” SARS-CoV-2, like all viruses, naturally mutates as it replicates in our bodies, though relatively slowly. Two indistinguishable sequences could be from different corners of the globe, with no connection at all. Researchers need to know more, like the time and place a sample came from, to be able to analyse outbreaks and patterns of transmission.
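To make “zero SNPs apart” concrete, here is a minimal Python sketch that counts single-nucleotide differences between two aligned sequences. The function name `snp_distance` and the short toy sequences are assumptions made up for illustration; they are not part of any COG-UK tool, and real analyses work on alignments of the full genome.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are skipped, because they give no evidence of a real difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


# Toy 12-base fragments, invented for this example.
sample_1 = "ATGGCTAGCTAA"
sample_2 = "ATGGCTAGCTAA"
sample_3 = "ATGACTAGNTAG"

print(snp_distance(sample_1, sample_2))  # 0
print(snp_distance(sample_1, sample_3))  # 2
```

A distance of zero only says that two genomes are indistinguishable. It cannot show, by itself, that two cases are linked, which is why the time and place of each sample matter.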

Genome data is also helping researchers understand the biology of the virus. And now, a major focus is on identifying variants.


With variants in the headlines, the event’s discussion quickly turned to them. Audience members first asked, “What are the variants we should be worried about?”

Dr Jeffrey Barrett, Director of the COVID-19 Genomics Initiative at the Wellcome Sanger Institute, explained that there are several properties that are assessed for each new variant that arises. The first is transmissibility – its ability to spread. The second is its ability to evade an immune response, either from a vaccine or a previous infection.

Jeff discussed the B.1.351 variant, first identified in South Africa. There was concern that it might be able to evade the immune response. Laboratory experiments consistently show that blood serum from vaccinated individuals doesn’t neutralise the B.1.351 virus as well as it does other variants. That might mean that the vaccines are slightly less effective at preventing infection. But he stressed that the key thing was whether the vaccines prevent people getting sick.

“What we’ve seen in real-world data, from Qatar and Israel where vaccination rates are high, is very reassuring. The vaccines work to prevent serious illness. And that’s the absolutely critical thing. There has not been any variant yet that has just fundamentally blasted straight through the vaccines.”

Jeff also discussed the B.1.1.7 variant, first identified in Kent in late 2020. Several different analyses, from Public Health England, COG-UK, the Sanger Institute and others, all indicated that this variant was more transmissible than any previous one.

“Until a few weeks ago [April/May 2021], 99 per cent of every new case in the UK was this variant [B.1.1.7] that had just completely taken over. And in fact, we saw the same pattern happening in country after country in Europe, in North America, in other parts of the world. Once this variant arrived someplace it tended to spread very, very fast.”

“I would say the one that is now on people’s mind, and for which the book is still very much open, though, is the so-called B.1.617.2. This is the one that was first seen in India. We are seeing it now in the UK, and it does seem to be growing pretty fast.”

“Public Health England recently designated it a ‘variant of concern’ alongside the ones first seen in South Africa, Brazil, and Kent. And that was primarily on this transmissibility evidence. And so I think we have to watch and see how fast it will grow. Will it possibly grow even faster than the Kent variant? I don’t think we know yet.”

Jeff highlighted that scientists and public health officials are working hard to get this information, plus information on vaccine efficacy, as rapidly as possible.

He also described some of the difficulties in interpreting genomic data. He shared an example where you might see a rapid rise in a variant in the boroughs of Crawley and Hillingdon. It could, in theory, be a new variant that is more transmissible, spreading quickly in communities there. But there could be an alternative explanation. The boroughs are home to Gatwick and Heathrow airports, and those cases could be people in quarantine, inside hotel rooms. It may not be a variant causing an uncontrolled outbreak; it may just be that a lot of people arrived with the same variant at the same time.

Low case numbers also make interpretation difficult. Factors including socio-economic status, housing density and levels of restrictions on people’s movements all need to be accounted for when assessing the transmissibility of a new variant.

“You have to feed in these other kinds of data to separate out what the virus is doing biologically from what the human community in which it’s circulating is doing,” said Jeff.

Verity: “We are working to understand how the virus spreads in space and time. That is what genomic epidemiology is, at heart.”

Fitness landscapes

The discussion turned to fitness landscapes – the conditions under which the virus is evolving. How do new variants arise?

Verity described patterns of viral evolution. “This is part of the problem with this pandemic – we call it the fitness landscape. The fitness landscape of the virus is really, really complicated. We have a lot of questions to ask about why variants seem to be arising at once. And what impact does population immunity have on that? And what is the impact of varying levels of lockdown? Because you don’t have people mixing freely – you have them kind of chopped up in small populations. We have a lot of discussions about this. We haven’t yet come to a good conclusion of what is driving variant selection. So basically, it’s really complicated.”

The panel discussed the origins of B.1.1.7. This variant is very different from others in terms of its mutations – it has 23 in its genome. These were all identified at once – previously, new mutations had been seen at a rate of about two or three a month. It is possible that this variant evolved in a patient who was infected with the virus for a very long time, perhaps months.
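A rough back-of-envelope calculation shows why 23 mutations appearing together stood out. This sketch simply divides the mutation count by the rate quoted above; it assumes a constant rate and is not how researchers formally date a lineage.

```python
# Figures taken from the paragraph above.
mutations_observed = 23
rate_low, rate_high = 2, 3  # new mutations seen per month

# Time needed to accumulate 23 mutations at the usual pace.
months_at_fast_rate = mutations_observed / rate_high
months_at_slow_rate = mutations_observed / rate_low

print(f"{months_at_fast_rate:.1f} to {months_at_slow_rate:.1f} months")
# 7.7 to 11.5 months
```

At the usual pace, that many changes would be expected to build up over most of a year, rather than turning up together in a single step.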


Verity and Jeff discussed some of the individual mutations that have occurred in the coronavirus’s 30,000-base genome. Researchers watch out for mutations that affect the viral spike protein, which attaches to human cells and enables the virus to invade, as these are more likely to affect how the virus functions. These, plus mutations in certain other areas of the genome, are on a ‘red list’ – though researchers are not certain how many of the mutations they’ve found affect the virus’s function. And yet some of the same individual mutations are occurring in different variants that have arisen in different parts of the world.

N501Y and E484K are two mutations that have popped up in a lot of distant places. Verity: “It suggests that the virus is always choosing the same selective pathways out of whatever pressure it’s under, which is interesting from an evolutionary point of view. Maybe the virus only has a limited box of tricks. But I suspect it’s more likely that these mutations are just the easiest route, because viruses are good at adapting.”


Constellations of mutations

A coronavirus variant is a version of the virus that has accumulated a specific set of mutations. It won’t just have a single mutation in its genome – but usually a cluster of several. Verity described these as ‘constellations’.
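One way to picture a constellation is as a set of mutations that must all be present together. The short Python sketch below treats each variant’s defining mutations as a set and checks a sample against them. The variant names, mutation labels and the function `carries_constellation` are placeholders invented for illustration, not real variant definitions or an existing tool.

```python
def carries_constellation(sample_mutations: set, constellation: set) -> bool:
    """Return True if the sample has every mutation in the constellation."""
    return constellation <= sample_mutations


# Placeholder constellations: each variant is defined by a set of mutations.
constellations = {
    "variant_X": {"mut_1", "mut_2", "mut_3"},
    "variant_Y": {"mut_2", "mut_4"},
}

# Mutations found in one hypothetical sequenced sample.
sample = {"mut_1", "mut_2", "mut_3", "mut_9"}

matches = [
    name
    for name, defining in constellations.items()
    if carries_constellation(sample, defining)
]
print(matches)  # ['variant_X']

# The same mutation can appear in more than one constellation.
shared = constellations["variant_X"] & constellations["variant_Y"]
print(shared)  # {'mut_2'}
```

The sample carries mut_2, which is also part of variant_Y, but it is only assigned to variant_X because a single shared mutation is not enough to match a whole constellation.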

Jeff: “Because of the number of genomes we’ve sequenced, I think we’ve discovered fundamental aspects of viral biology. For example, I think this virus is teaching us that it is often combinations of mutations that are important. Lots of experts weren’t looking for that in November. In a space of just a few months I think we’ve learned something about the biology of this virus – and maybe of viruses or pathogens more broadly.”


Catherine summarised the situation, and current genomic surveillance in the UK.

“The more transmission there is, the more opportunity for the virus to mutate and for new variants to arise. So we need to reduce [transmission] as much as possible. I think the most important thing that we can promote is the uptake of vaccines. We know that vaccines work to reduce cases. We’ve seen it here in the UK and we’ve seen it in other countries.”

“We’re in a situation now in the UK, with relatively low cases and high sequencing capacity, where nearly every positive SARS-CoV-2 sample received can be sequenced. It isn’t always possible – because the viral load in the sample may be too low (high CT), there may not be enough sample volume left over after the diagnostic test, or it’s not of sufficient quality, but if these factors are not encountered – it will be sequenced.”

“There are so many benefits to having this data at scale. We sequence as much as we can, and as fast as we can, to put these data into the hands of people who can then try to understand what the virus is doing, and make recommendations on how we might react.”
