16 Jan 2023

The use of whole genome sequencing for SARS-CoV-2 variant monitoring

Between 2020 and 2022, over 50,000 COVID-19 samples were collected for sequencing from the population of Northern Ireland through national swab tests. To avoid becoming overwhelmed by the large number of sequencing samples, the public health agency sought rapid assistance. We spoke to Professor David Simpson, from the Queen’s University Belfast School of Medicine, Dentistry and Biomedical Sciences, about how he and his team, supported by COG-UK, worked on the sequencing of these samples.

When a test for the SARS-CoV-2 virus, taken from a person living in Northern Ireland with suspected COVID-19, proved positive, the test material was sent for whole genome sequencing. This allowed public health officials to better understand which viral variants were circulating. “We were in a unique position because we were essentially sequencing all of the samples from Northern Ireland as they came through our lab”, explains David, “we started to wonder what additional analysis we could perform from our extensive dataset.”

With the news at the time dominated by reporting on the rise of COVID-19 variants across the country, David and his team were interested in analysing their data to understand if new mutations were arising – or being imported from abroad – and if so, whether these were similar to mutations and variants being reported by other bodies in the UK and globally. “When we analysed our sequence data, rather than looking only at mutations associated with the dominant variants, such as Omicron or Delta, we focused our attention on new mutations that occurred with low frequency. They were the ones we were interested in.”

David and his team examined low-frequency mutations that were present in 3-5% of the sequenced genomes, which may arise as single nucleotide variants (differences at the level of a single building block of the viral genome, called a nucleotide). These single nucleotide variants may be acquired through viral evolution within a host, or as a result of genetic recombination (the process of exchanging genetic material) during co-infection with two variants. By mapping such mutations, it is possible to begin to understand the evolutionary pressures exerted on the virus during infection of a person, and potentially to identify mutations that significantly alter the behaviour of the virus. “That’s why we were particularly interested in analysing new low-frequency mutations as they could indicate new mutations that hadn’t been picked up yet or if co-infection was occurring in Northern Ireland”, explains David.

Many studies don’t capture variants that occur in less than 5% of a single sequencing run, because it is difficult to obtain accurate readings of variants at such low frequencies. This was a challenge that David and his team knew they would need to overcome. “These readings have caused us to question if it’s a real variant as it’s quite typical for potential mutations to result from the low-level error rate inherent to most genome sequencing protocols”, states David. To ensure that the results were reliable, the team simulated sequencing data containing known errors and ran it through their bioinformatics pipeline to check whether those errors would be flagged as errors rather than as variants. “I hadn’t really thought about the value of doing these simulations until we got into this project. Simulations allow us to check how sensitively and how specifically we can detect genuine low-frequency variants as well as how reliable our detection system is,” elaborates David, “it wasn’t something we initially planned for, but has turned out to be quite useful in the long run.”
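
The idea behind this kind of simulation check can be illustrated with a toy model. The sketch below is not the team's pipeline, which the article does not describe; it is a minimal illustration that assumes a single genome position, a fixed per-read error rate, and a simple frequency threshold as the "variant caller". All function names and parameter values (`simulate_alt_reads`, `call_variant`, `evaluate`, the read depth, the 2% threshold) are hypothetical. It simulates sites that carry a genuine low-frequency variant alongside sites where every non-reference read is a sequencing error, then reports how often the threshold gets each kind of site right.

```python
import random


def simulate_alt_reads(depth, true_freq, error_rate, rng):
    """Count reads showing the alternate base at one simulated position.

    Assumption: a sequencing error simply flips the observed base between
    the reference and the alternate allele.
    """
    alt = 0
    for _ in range(depth):
        carries_variant = rng.random() < true_freq
        misread = rng.random() < error_rate
        # The read shows the alternate base if exactly one of these holds.
        if carries_variant != misread:
            alt += 1
    return alt


def call_variant(alt_reads, depth, min_freq):
    """Toy caller: report a variant if the alternate fraction reaches min_freq."""
    return alt_reads / depth >= min_freq


def evaluate(n_sites, depth, true_freq, error_rate, min_freq, seed=0):
    """Estimate sensitivity and specificity of the toy caller by simulation."""
    rng = random.Random(seed)
    tp = fn = tn = fp = 0
    for _ in range(n_sites):
        # Site with a genuine variant present at true_freq.
        alt = simulate_alt_reads(depth, true_freq, error_rate, rng)
        if call_variant(alt, depth, min_freq):
            tp += 1
        else:
            fn += 1
        # Site with no variant: any alternate reads are errors.
        alt = simulate_alt_reads(depth, 0.0, error_rate, rng)
        if call_variant(alt, depth, min_freq):
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Illustrative values only: 4% variant, 0.5% error rate, 2% threshold.
    sens, spec = evaluate(n_sites=200, depth=1000, true_freq=0.04,
                          error_rate=0.005, min_freq=0.02)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy setting, raising `error_rate` towards `min_freq` makes error-only sites cross the threshold more often, so specificity falls. That is the effect David describes, where the background error rate of sequencing can masquerade as a low-frequency mutation.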

“Whole genome sequencing enabled us to shed light on the development of new variants with potential health impact, both in terms of individual and mixed infection and whether they can lead to recombinant infections. Through whole genome sequencing, we were able to confirm findings of previous studies with enhanced sensitivity and detect more of these variants occurring.”

As we are still living through the COVID-19 pandemic and remain susceptible to future pandemics caused by other pathogens, it’s essential that we have tools and systems in place to help inform public health decisions. David believes these approaches and tools “will allow us to analyse equivalent sequence data for other viral infections or diseases”. David went on to state that undertaking such research is significant to our understanding of viral behaviour as “it helps to shed light on the development of new potential variants of concern, it helps to determine how much of a risk genetic variations arising within a host organism pose, and can help with the tracking of variant transmission.”

Analysing whole genome sequences of samples collected from a defined geographical region also offers plenty of possibilities for future research. David and his team have collaborated with the Wellcome Sanger Institute and colleagues in the geography department at Queen’s University Belfast to develop a dashboard showing the geographic spread of different variants in Northern Ireland. He is also excited at the prospect of publishing a paper on how the sequenced dataset compares to what can be seen in the wastewater in Northern Ireland. “These weren’t outputs that we initially envisaged from this research, but we can now see them turning into the next exciting routes of investigation.”

David hopes his research, once completed, can be used to narrow down a list of potentially troublesome genetic variants, to highlight variants that occur within the same timeframe, and to have those insights shared with the Public Health Agency. “I can see this helping with mapping and transmission of the virus within, for example, a hospital which obviously has public health implications.”

COVID-19 Genomics UK (COG-UK)

The COVID-19 Genomics UK (COG-UK) consortium works in partnership to harness the power of SARS-CoV-2 genomics in the fight against COVID-19.

Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is an innovative collaboration of NHS organisations, the four public health agencies of the UK, the Wellcome Sanger Institute and sixteen academic partners. A full list of collaborators can be found here.

The COVID-19 pandemic, caused by SARS-CoV-2, represents a major threat to health. The COG-UK consortium was formed in March 2020 to deliver SARS-CoV-2 genome sequencing and analysis to inform public health policy and to support the establishment of a national pathogen sequencing service, with sequence data now predominantly generated by the Wellcome Sanger Institute and the Public Health Agencies.

SARS-CoV-2 genome sequencing and analysis plays a key role in the COVID-19 public health response by enabling the identification, tracking and analysis of variants of concern, and by informing the design of vaccines and therapeutics. COG-UK works collaboratively to deliver world-class research on pathogen sequencing and analysis, maximise the value of genomic data by ensuring fair access and data linkage, and provide a training programme to enable equity in global sequencing.