Between 2020 and 2022, over 50,000 COVID-19 samples were collected for sequencing from the population of Northern Ireland through national swab tests. To avoid becoming overwhelmed by the large number of samples requiring sequencing, the Public Health Agency sought rapid assistance. We spoke to Professor David Simpson, from the Queen’s University Belfast School of Medicine, Dentistry and Biomedical Sciences, about how he and his team, supported by COG-UK, worked on the sequencing of these samples.
When a swab test taken from a person in Northern Ireland with suspected COVID-19 proved positive for the SARS-CoV-2 virus, the test material was sent for whole genome sequencing. This allowed public health officials to better understand which viral variants were circulating. “We were in a unique position because we were essentially sequencing all of the samples from Northern Ireland as they came through our lab”, explains David, “we started to wonder what additional analysis we could perform from our extensive dataset.”
With the news at the time dominated by reporting on the rise of COVID-19 variants across the country, David and his team were interested in analysing their data to understand if new mutations were arising – or being imported from abroad – and if so, whether these were similar to mutations and variants being reported by other bodies in the UK and globally. “When we analysed our sequence data, rather than looking only at mutations associated with the dominant variants, such as Omicron or Delta, we focused our attention on new mutations that occurred with low frequency. They were the ones we were interested in.”
David and his team examined low-frequency mutations that were present in between 3% and 5% of the sequenced genomes, which may arise as single nucleotide variants (differences at the level of a single building block of genetic material, called a nucleotide). These single nucleotide variants may be acquired through viral evolution within a host, or as a result of genetic recombination (the process of exchanging genetic material) during co-infection with two variants. By mapping such mutations, it is possible to begin to understand the evolutionary pressures exerted on the virus while it infects a person, and potentially to identify mutations that significantly alter the behaviour of the virus. “That’s why we were particularly interested in analysing new low-frequency mutations as they could indicate new mutations that hadn’t been picked up yet or if co-infection was occurring in Northern Ireland”, explains David.
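To make the idea of a "low-frequency" mutation concrete, the short Python sketch below works out how often an alternative base is seen at a position and keeps only those mutations that fall in the 3% to 5% band described above. It is an illustration only, not the team's actual pipeline: the `Observation` record, the `low_frequency` helper and all example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One candidate mutation (all field names are hypothetical)."""
    position: int   # 1-based position in the genome (made-up example values)
    ref: str        # base in the reference genome
    alt: str        # alternative base that was observed
    alt_count: int  # number of sequences showing the alternative base
    total: int      # number of sequences covering this position


def frequency(obs: Observation) -> float:
    """Fraction of sequences that carry the alternative base."""
    return obs.alt_count / obs.total if obs.total else 0.0


def low_frequency(observations, low=0.03, high=0.05):
    """Keep mutations whose frequency lies in the band [low, high]."""
    return [o for o in observations if low <= frequency(o) <= high]


# Made-up example data: one dominant mutation, one low-frequency
# mutation, and one that is too rare to fall inside the band.
examples = [
    Observation(position=100, ref="A", alt="G", alt_count=900, total=1000),
    Observation(position=200, ref="C", alt="T", alt_count=40, total=1000),
    Observation(position=300, ref="G", alt="A", alt_count=10, total=1000),
]

selected = low_frequency(examples)
for obs in selected:
    print(obs.position, obs.ref, obs.alt, frequency(obs))
```

Only the mutation at the made-up position 200 survives the filter, because it is seen in 4% of the sequences. Dominant mutations such as those that define Delta or Omicron would sit far above the band and be set aside.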
Many studies don’t capture variants that occur in less than 5% of a single sequencing run. This is because it is difficult to obtain accurate readings of variants that occur at such low frequencies. This was a challenge that David and his team knew they would need to overcome. “These readings have caused us to question if it’s a real variant as it’s quite typical for potential mutations to result from the low-level error rate inherent to most genome sequencing protocols”, states David. To ensure that the results were reliable, the team simulated sequencing data containing known errors and ran it through their bioinformatics pipeline to see whether those errors would be flagged as errors rather than called as variants. “I hadn’t really thought about the value of doing these simulations until we got into this project. Simulations allow us to check how sensitively and how specifically we can detect genuine low-frequency variants as well as how reliable our detection system is,” elaborates David, “it wasn’t something we initially planned for, but has turned out to be quite useful in the long run.”
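The reasoning behind such a simulation can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the team's bioinformatics pipeline: every function name, parameter and number is an assumption made for the example. It plants genuine variants at a known frequency at some sites, adds random sequencing errors everywhere, applies a simple frequency threshold as a stand-in "variant caller", and then counts how many planted variants were found (sensitivity) and how many error-only sites were correctly ignored (specificity).

```python
import random


def simulate_site(depth, true_freq, error_rate, rng):
    """Return how many reads show the alternative base at one site.

    Simplification (assumption): error_rate is the chance that a read
    without the variant shows the alternative base purely by error.
    """
    true_alt = round(depth * true_freq)  # reads carrying the planted variant
    false_alt = sum(rng.random() < error_rate for _ in range(depth - true_alt))
    return true_alt + false_alt


def evaluate(n_sites=200, n_true=20, depth=1000, true_freq=0.04,
             error_rate=0.005, threshold=0.03, seed=0):
    """Simulate sites with known truth; return (sensitivity, specificity)."""
    if not 0 < n_true < n_sites:
        raise ValueError("need at least one true and one error-only site")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    true_sites = set(rng.sample(range(n_sites), n_true))
    tp = fp = tn = fn = 0
    for site in range(n_sites):
        is_true = site in true_sites
        alt = simulate_site(depth, true_freq if is_true else 0.0,
                            error_rate, rng)
        called = alt / depth >= threshold  # stand-in for a variant caller
        if is_true and called:
            tp += 1
        elif is_true:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


sensitivity, specificity = evaluate()
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Because the true answer is known for every simulated site, changing `error_rate` or `threshold` shows directly how the trade-off shifts: a lower threshold catches more genuine low-frequency variants but lets more sequencing errors through.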
“Whole genome sequencing enabled us to shed light on the development of new variants with potential health impact. Both in terms of individual and mixed infection and whether they can lead to recombinant infections. Through whole genome sequencing, we were able to confirm findings of previous studies with enhanced sensitivity and detect more of these variants occurring.”
As we are still living through the COVID-19 pandemic and remain susceptible to future pandemics caused by other pathogens, it’s essential that we have tools and systems in place to help inform public health decisions. David believes these approaches and tools “will allow us to analyse equivalent sequence data for other viral infections or diseases”. David went on to state that undertaking such research is significant to our understanding of viral behaviour as “it helps to shed light on the development of new potential variants of concern, it helps to determine how much of a risk genetic variations arising within a host organism pose, and can help with the tracking of variant transmission.”
Analysing whole genome sequences of samples collected from a defined geographical region also offers plenty of possibilities for future research. David and his team have collaborated with the Wellcome Sanger Institute and colleagues in the geography department at Queen’s University Belfast to develop a dashboard that shows the geographic spread of different variants in Northern Ireland. He is also excited at the prospect of publishing a paper on how the sequenced dataset compares to what can be seen in the wastewater in Northern Ireland. “These weren’t outputs that we initially envisaged from this research, but we can now see them turning into the next exciting routes of investigation.”
David hopes his research, once completed, can be used to narrow down a list of potentially troublesome genetic variants, to highlight variants that occur within the same timeframe, and to have those insights shared with the Public Health Agency. “I can see this helping with mapping and transmission of the virus within, for example, a hospital which obviously has public health implications.”