Three tools, Grinch, CoVal and COG-UK-ME, are now available to help understand the genetic information produced by COG-UK about mutations in the SARS-CoV-2 genome.
The viral genomic dataset produced by COG-UK can be challenging to analyse because of its size and how quickly it changes. Interpreting variants, mutations and associated data such as geographical location is also high-stakes work, because it informs public health policy.
This blog explores three tools, developed by COG-UK Consortium members and associates, that are designed to visualise, query and explore SARS-CoV-2 mutation data. Each is freely available and offers a number of pre-defined settings along with detailed drill-down options, so that scientists and public health professionals can explore the genome datasets.
The tools explain how their output has been generated and give tips for interpretation. Two key caveats apply. First, the number of viral variants displayed for each country depends on how much viral sequencing that country performs; the UK is the obvious example of a high-throughput sequencing nation with a correspondingly high volume of genetic data. Second, the most interesting data are often the most recent, but these data may be skewed by lags in reporting.
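To illustrate the first caveat, the short Python sketch below compares raw variant counts with the variant's frequency, i.e. the fraction of sequenced genomes that carry it. The countries and figures are made up purely for illustration, and `CountrySample` and `variant_frequency` are hypothetical names; this is not how any of the three tools computes its output.

```python
from dataclasses import dataclass


@dataclass
class CountrySample:
    country: str
    sequenced: int     # total genomes sequenced in the period
    with_variant: int  # genomes assigned to the variant of interest


def variant_frequency(sample: CountrySample) -> float:
    """Fraction of sequenced genomes carrying the variant (0.0 if none were sequenced)."""
    if sample.sequenced == 0:
        return 0.0
    return sample.with_variant / sample.sequenced


# Hypothetical, made-up figures for illustration only.
samples = [
    CountrySample("Country A", sequenced=50_000, with_variant=5_000),
    CountrySample("Country B", sequenced=500, with_variant=100),
]

for s in samples:
    print(f"{s.country}: {s.with_variant} sequences, frequency {variant_frequency(s):.1%}")
```

Here the heavily sequencing country reports 50 times as many variant sequences yet a lower frequency (10.0% versus 20.0%), which is why raw counts alone can mislead when comparing countries.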
Grinch provides a global overview and is the daily early-morning go-to report for COG-UK Consortium Director, Professor Sharon Peacock. Priority development of the tool, carried out by the University of Edinburgh, began on Christmas Eve 2020 in response to the urgent need to understand and track the new B.1.1.7 variant, first identified in Kent, UK. Grinch’s name reflects “the lineages that stole Christmas”, or more formally “Global report investigating novel coronavirus haplotypes”. It is now well placed to report on new variants of interest as they arise.
As an example, the summary page for B.1.1.7 contains the following graph, which illustrates how the variant has spread around the world over time by colouring each country according to the date the variant was first detected there. The data show early detection in Australia, a country that prioritises the tracking and reporting of viral sequence information.
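Colouring countries by first detection date amounts to finding, for each country, the earliest sample date among sequences assigned to the lineage. A minimal Python sketch under that assumption follows; the `(country, sample_date)` input format, the `first_detection` function and the example records are all hypothetical and do not reflect Grinch's actual code or data.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def first_detection(records: Iterable[Tuple[str, date]]) -> Dict[str, date]:
    """Return the earliest sample date seen for each country.

    `records` holds (country, sample_date) pairs for sequences already
    assigned to the lineage of interest (a hypothetical input format).
    """
    earliest: Dict[str, date] = {}
    for country, sample_date in records:
        if country not in earliest or sample_date < earliest[country]:
            earliest[country] = sample_date
    return earliest


# Made-up records, for illustration only.
records = [
    ("Country A", date(2020, 12, 20)),
    ("Country B", date(2021, 1, 3)),
    ("Country A", date(2020, 12, 14)),
    ("Country C", date(2020, 12, 28)),
]

# Order countries by first detection, e.g. to assign colours along a timeline.
for country, first in sorted(first_detection(records).items(), key=lambda kv: kv[1]):
    print(country, first.isoformat())
```

Because of the reporting lag mentioned above, a "first detected" date can only reflect sequences that have been uploaded so far, and countries that sequence less may appear to detect a variant later than they actually acquired it.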
There are many graphic goodies on the summary page, including airline passenger data that allow viral spread to be analysed in terms of the movement of people between countries.
CoVal, developed by Harwell’s Scientific Computing Department at the Science and Technology Facilities Council, is a tool that adds three-dimensional context to viral mutations. More than 300 SARS-CoV-2 virus structures have been solved by cryo-electron microscopy, enabling visualisation of the viral proteins and their assemblies. Mapping mutation sites onto the structures of viral proteins helps to understand the effects of mutations on structure, function and molecular interactions. Using CoVal, the consequences of selected mutations can be explored, in some cases including interactions with the human receptor protein, antibodies and drug molecules. The resource is updated every two weeks with the latest data from GISAID and the Electron Microscopy Data Bank.
The screenshot below shows the output for the E484K mutation, an amino acid change at position 484 of the viral spike protein. Two applets can be rotated to visualise the position and interactions of the selected mutation site. The applet on the right shows a zoomed-out view of the interaction between the receptor-binding domain of the viral spike protein (green) and the larger human receptor protein (grey), with residue 484 highlighted in red.
COG-UK Mutational Explorer (ME)
The Mutational Explorer has been developed by the MRC-University of Glasgow Centre for Virus Research and is described in more detail in the accompanying COG-UK blog post. Like CoVal, COG-UK-ME displays information about mutations of interest, in this case using COG-UK genome sequence data stored on MRC-CLIMB. A selection of tables summarises mutation counts, with a focus on the spike protein of the virus. A clear, bar-graph-based visualiser allows the frequency of selected mutations to be interrogated over time.
One very useful feature is the integration of information about the immunological consequences of mutations. At the time of writing there are 450 entries, each describing how an individual mutation may affect antibody binding. The data have been gathered from antibodies produced by patients, both after natural infection and in response to vaccination, and from monoclonal (lab-generated) antibodies. Search tools allow this important and fast-growing pool of experimental data to be mined to help understand the real-life consequences of viral mutations.
The screenshot below shows the easy-to-use visualiser for tracking mutations over time.
The direct and immediate release of SARS-CoV-2 mutation data into the MRC-CLIMB database by the COG-UK Consortium provides researchers all over the world with a rich resource for tool development and analysis. At COG-UK we are grateful to the developers of these tools, who have put in long hours so that we can better understand such detailed, fast-moving and high-consequence information.
COVID-19 Genomics UK (COG-UK)
The current COVID-19 pandemic, caused by SARS-CoV-2, represents a major threat to health. The COVID-19 Genomics UK (COG-UK) consortium has been created to deliver large-scale and rapid whole-genome virus sequencing to local NHS centres and the UK government.
Led by Professor Sharon Peacock of the University of Cambridge, COG-UK is made up of an innovative partnership of NHS organisations, the four Public Health Agencies of the UK, the Wellcome Sanger Institute and twelve academic partners providing sequencing and analysis capacity. A full list of collaborators is available on the COG-UK website. Professor Peacock is also on a part-time secondment to Public Health England (PHE) as Director of Science, where she focuses on the development of pathogen sequencing through COG-UK.
COG-UK was established in April 2020 with £20 million of funding from the COVID-19 rapid-research-response “fighting fund” from Her Majesty’s Treasury (established by Professor Chris Whitty and Sir Patrick Vallance), administered by the National Institute for Health Research (NIHR), UK Research and Innovation (UKRI) and the Wellcome Sanger Institute. On 16 November 2020 the consortium was also backed by the Department of Health and Social Care’s Testing Innovation Fund, to provide the genome sequencing capacity needed to meet the increasing number of COVID-19 cases in the UK over the winter period.