History of COG-UK

A short history of the COG-UK consortium, by Professor Sharon Peacock

This short history of the development of the COG-UK consortium is dedicated to everyone who has contributed to its success. I want to offer my thanks and gratitude for your unlimited energy, enthusiasm, commitment and support.


The early days

The Chief Scientific Adviser to the UK Government, Sir Patrick Vallance, sought early involvement of the pathogen genomics community. The blueprint for what became the COVID-19 Genomics UK (COG-UK) consortium can be dated back to Wednesday 4th March 2020, a day when 84 confirmed cases of COVID-19 were recorded in the UK. It started with a series of hurried phone calls between pathogen genome experts and enthusiasts for the application of genome sequencing to real-world problems. A follow-up email about a possible discussion meeting was circulated on Friday 6th March, by which time there were 159 recorded UK COVID-19 cases. The email resulted in around 20 people coming together on Wednesday 11th March to explore the application of genomics to SARS-CoV-2. The urgency of these discussions was underscored by the quickening drumbeat of recorded UK cases, which had by then risen to 456. Those who attended and contributed for the entire day were generous with their time during a period when many were already working excessive hours, even as COVID-19 was just becoming established in Europe. I am grateful to the Wellcome Trust for giving us the space and location to meet in London.

Most of the objectives and framework for COG-UK were negotiated by the end of the meeting. We heard ten flash talks on relevant topics that focused largely on methodologies and technologies, but the majority of our time was spent in discussion. We sought rapid consensus on a range of questions. For example, we debated relevant outcomes for public health-driven sequencing of COVID-19. We agreed on the individual components of an end-to-end pipeline for COVID-19 sequencing — from sample processing to use of data. We considered and discussed a variety of models for the application of COVID-19 sequencing (including centralised, distributed, and mixed models), before agreeing on the most suitable to adopt. We described in outline each separate component of the pipeline (the ‘what, where and who’) and then reconstructed the pathway and workflow (the ‘how’). Finally, after a discussion about governance, we agreed to put together a funding proposal.

The following few days were frantic and frazzled as we assembled our thoughts into a coherent written argument for the development of COG-UK, supported by a scientific proposal. The backdrop to this was a working life when many of us were receiving as many as 250 emails a day, together with a meetings schedule that spanned from dawn to dusk. We are by no means unique in having to bear such demands, and many readers will have experienced (and continue to experience) similar work pressures. But it is to the great credit of my colleagues that a high-quality proposal was produced within days. Throughout our initial meeting and subsequent discussions and writing, there was no doubt about what our early contribution should be towards the COVID-19 response. We were a research consortium that wanted to contribute to the UK response by providing genome data, associated sequencing methods and analysis tools that could be used to inform public health actions and policy decisions. Our confidence that such contributions would be of significant value was rooted in an extensive collective experience of applying genomic analyses to tackle other infectious disease threats, for example in the West African Ebola outbreak of 2014–2016.

There were doubters who felt we were wasting our time, and they made their views known to us and to others. Coronaviruses do not mutate as frequently as some other viruses, including influenza and HIV. We were challenged (and we challenged ourselves) about whether enough genetic variation would arise in the viral population for sequencing to be worthwhile for outbreak detection and other purposes. Even now, it remains uncertain whether a virus will ever emerge with a mutation (or combination of mutations) that allows it to evade vaccines, an issue of growing potential importance now that vaccination programmes are just beginning. Why bother getting ahead of the worry curve? We took the view that waiting until the worst happens, only to realise that one is totally unprepared, is not where we collectively wanted to find ourselves. We reasoned that under such circumstances, it would be better to be wrong and to have generated new tools and a great deal of data that, if not of direct relevance to the national COVID-19 response, would at least be of immense value to the scientific community and for innovation more generally. So we pushed ahead with a sense of optimism, whilst continuing to reflect on the evidence and basis for any concerns and criticism raised.

A proposal for the development of COG-UK was circulated to senior UK government and funding body advisors by the close of Sunday 15th March. The scientific and logistical basis of our plans went through a process of rapid internal critical review, the consensus from which was that a genomics network in the UK was essential and important. Funding was awarded and the grant began on 1st April 2020. We are deeply grateful to our funders for providing around £20M to support the establishment of the consortium: the COVID-19 rapid-research-response “fighting fund” from Her Majesty’s Treasury, established by Professor Chris Whitty and Sir Patrick Vallance and administered by the National Institute for Health Research (NIHR); UK Research and Innovation (UKRI); and the Wellcome Sanger Institute. COG-UK became affiliated to SAGE (Scientific Advisory Group for Emergencies), and we have been reporting key findings to them ever since; we also publish these findings on our website in real time.


COG-UK gets going

What happened next has been likened to assembling an aeroplane from its component parts while already in the process of taking off. Indeed, genome sequencing began at pace from the very beginning. Between the first meeting on the 11th of March and the first report sent to SAGE on the 23rd of March, at least 260 SARS-CoV-2 genomes had been sequenced. The varied pressures associated with moving at such a lightning pace will have been felt by all involved in the early days of COG-UK (and indeed by others involved in the disparate parts of the global COVID-19 response), and they are difficult to capture in the space available here. Under normal circumstances, a £20M award to develop a major research consortium would be preceded by 12 months or more of detailed planning and preparation. However, circumstances were anything but normal, and we had to develop relevant processes at the same time as beginning to deliver on our objectives. Since then, COG-UK has developed into an unprecedented national pathogen genomics network.

In the last eight months we have grown into a consortium of many hundreds of people. COG-UK supports 16 sequencing hubs distributed across the country, and it includes the four Public Health Agencies and researchers from academic partners across the UK. Our academic partner institutions include (in alphabetical order) the University of Birmingham, University of Cambridge, Cardiff University, University of Edinburgh, University of Exeter, University of Glasgow, Imperial College London, University of Liverpool, University of Nottingham, Northumbria University, University of Oxford, University of Portsmouth, the Quadram Institute (Norwich), Queen’s University Belfast, University of Sheffield, University College London, and the Wellcome Sanger Institute. A large number of other institutions and partners have also been essential to the COG-UK effort. The work has occupied hundreds of individuals in and out of laboratories across the country, many of whom have worked long hours and sacrificed time with loved ones. Those loved ones will often have had to bear a larger share of household and childcare duties as a result, which in my view makes them a vital part of the COG-UK family too.

In terms of delivery, we sequence positive samples from people with COVID-19 that are tested through the so-called ‘pillar 1’ and ‘pillar 2’ diagnostic pathways. Pillar 1 is made up of hospital diagnostic laboratories, which are largely focused on testing samples from people who are unwell and present to hospitals. Pillar 2 includes the Lighthouse Lab ‘super-laboratories’ that receive swabs from people who are not in hospital, as well as samples from people in long-term care facilities. Broadly speaking, the pillar 1 samples are sequenced in our COG-UK regional laboratories, and pillar 2 samples are sent to the Wellcome Sanger Institute for sequencing.

Existing bioinformatic tools for analysing sequence data were simply not designed to cope with the volume of viral genomes that was about to be sequenced at the beginning of the pandemic. We were fortunate to have the necessary world-class expertise in the consortium to quickly develop tools and pipelines that were up to the task. We were given rapid access to MRC CLIMB (Cloud Infrastructure for Microbial Bioinformatics) by researchers who shared the goals of the consortium and were enthusiastic partners. CLIMB was an established and readily accessible cloud bioinformatic infrastructure capable of handling the volume of data and the tools required. COVID-19 has demonstrated how research investment in infrastructure such as CLIMB can reap unforeseen benefits when circumstances change rapidly during pandemic emergencies. I was particularly pleased that the contribution made by CLIMB was recognised through an HPCwire award in November 2020. Consortium members have developed numerous analysis tools and sequencing methods specifically for COVID-19. These can be accessed and used by others, and they represent a major contribution to the global effort to sequence, understand and control SARS-CoV-2.

While we purchased some sequencing equipment (of note, we are agnostic to sequencing technology, and multiple platforms have been deployed across the consortium) and made some funding available for staff, the vast majority of the funding awarded to COG-UK has been used to meet the cost of sequencing. This is only right, but it is important to acknowledge that many COG-UK participants have worked on a voluntary basis. I also want to acknowledge the huge effort of hundreds of scientist volunteers across the Wellcome Sanger Institute, the central hub for our sequencing capability, and the regional sequencing sites that form the spokes. Samples tested in the Lighthouse Labs for the presence of SARS-CoV-2 are placed into 96-well plates that contain both positive and negative samples. These plates are transported to the Sanger, where they are held in huge walk-in freezers that are difficult working environments for staff because they are cramped and cold. The Sanger have also taken on the enormous task of picking out the positive samples from the hundreds of thousands of mixed plates transferred to them from the Lighthouse Labs. Similar challenges have been met by COG-UK members across all of our regional sequencing sites, albeit at a somewhat lower sample throughput than the Sanger. Of course, many spokes have contributed not just sequencing but also work that reflects the diverse expertise of the COG-UK members at these sites, including creating and maintaining bioinformatic analysis tools, managing the CLIMB bioinformatic infrastructure, leading outbreak investigations and much more besides. The scale of operations felt, and continues to feel, entirely unprecedented and awe-inspiring.

At the time of writing, COG-UK has sequenced more than 137,000 SARS-CoV-2 genomes. While the number of genomes resulting from all of this hard work is impressive in itself, some still question the need to sequence at such scale rather than simply sampling a random fraction. The simple answer is that only a dataset on this scale provides the high resolution needed to make genome surveillance useful for investigating individual outbreaks. With it, we can examine the relatedness of viruses within a health care institution, workplace or community, compare this with transmission patterns in the surrounding area, reveal the links between individual cases, and spot otherwise unidentifiable opportunities for intervention. We are also on the hunt for mutations that change the biology of the virus in ways that could lead to treatment or vaccine failure or alter disease severity.

Since our inception, we have overcome many logistical, technical, organisational and scientific hurdles. Just skimming the surface, these have included working out how to route samples from testing laboratories to COG-UK sequencing laboratories; how to store, pick and sequence the positive samples; how to stock, staff and supply the sequencing laboratories; how to handle and analyse the resulting data; and how to integrate sufficient metadata to ensure that the genome data are relevant for tackling outbreaks. All the while, we have been moving faster and handling more samples month by month.

Looking back, COG-UK sprang into existence suddenly and occasionally awkwardly. It has been a process of rapid learning and adaptation as circumstances have changed from week to week during a turbulent year. Holding together a consortium built from members who are largely volunteering their time and expertise can be extremely challenging. I want to recognise my management team, without whom we would not have stayed afloat. Numerous other colleagues have made major contributions to our success, in particular those who sit on the COG-UK steering, report and working groups. The host institution for the COG-UK award (University of Cambridge) also deserves acknowledgement for its enthusiastic support.


Reflecting on our progress

COG-UK consortium members have achieved a great deal. Prior to SARS-CoV-2, the largest dataset for real-time genomic viral epidemiology during an epidemic was around 1,500 genomes from the West African Ebola outbreak, sequenced over the course of 2014–2016. By comparison, COG-UK surpassed this total within its first month and has continued to push viral genome surveillance to an entirely different scale ever since.

COG-UK has led the development of analytical software to define viral lineages, and shares methods and data globally. We provide sequence data at scale to Public Health Agencies. Our data and tools have been used during the investigation of hundreds of outbreaks in the UK and are shared with the world. We continue to develop capabilities to track mutations of concern and provide expertise to the MHRA (Medicines and Healthcare products Regulatory Agency) Vaccine Expert Groups. We provide evidence to SAGE and numerous SAGE sub-groups. We have created strong links with other related core national studies, including the UK Coronavirus Immunology Consortium and the GenOMICC (Genetics of Mortality in Critical Care) study. We have written and published scores of scientific papers, which collectively provide one of the most prominent voices on SARS-CoV-2 genomics worldwide.


How did we achieve this success?

Some observers have asked how we have managed to develop and deliver COG-UK in such a short space of time, especially given the complexities involved. Mixing research with operational delivery can feel like mixing oil and water, yet we needed to draw together people from a wide range of different disciplines and perspectives to deliver on the consortium’s mission. Furthermore, we are living through a period of ongoing churn, with circumstances changing from month to month that reach all aspects of life, including political, scientific, operational and social.

I can boil our success down to three principles. First, we acted quickly and left extensive debate and discussion of finer points and details until later. We began with a strong governance framework and a series of working groups, but we did not begin with a perfectly crafted 200-page document that provided a detailed description of what was required before action was taken. Given the circumstances facing the UK, our approach required bold and decisive steps, despite the risks this entailed. The avoidance of lengthy discussion and debate may prove to be our undoing, but I doubt it. Second, we recognised the importance of an open science culture. We post details of all our methods, protocols and sequence data on open access portals. Consortium members make manuscripts available as preprints as soon as possible and publish final papers in open access formats. Similarly, our website reflects our outward-facing approach, and we provide as much information and analyses as possible. Third, it is important to recognise that COG-UK is a coalition of the willing. I lead by consensus. We deliver through consensus. And when we disagree, we try hard to work through our differences. We value each other. I cannot thank people enough.

There may well be management textbooks that lay out the principles and philosophy by which COG-UK has been established and shaped, but in truth we simply developed, implemented, managed and governed as quickly and efficiently as possible as we went along, based on a wish to do something of value. The extent of our success, the value of the work delivered by COG-UK as part of the COVID-19 response in the UK, and the relevance of rapid genome sequencing as part of COVID-19 control, will be for history to judge.


Next steps for the consortium

We have just been awarded a further £12.2M from the Department of Health and Social Care Testing Innovation Fund. This is set to take the work started by the consortium to the next stage. We are entering a period of transition. Our commitment to research will remain as strong as ever, and the original COG-UK remit will continue unabated. The world-leading expertise of the consortium means that we are ideally placed to bring the best scientific enquiry to one of the largest single bodies of viral sequence data in the world. But we increasingly inhabit the space between our roots as a research network and service delivery. We are particularly well placed to provide ongoing sample acquisition, sequencing, and the infrastructure and tools required to support them, until such time as this work becomes fully integrated into the operations of the relevant national Public Health Agencies.

The next few months will see us expanding our operations. We are currently sequencing around 8,000–10,000 SARS-CoV-2 genomes every week, and we aim to double this in the coming months. Strengthening our infrastructure will include further investment in our data repository, which is powered and enabled through CLIMB, and the purchase of new equipment so that we can sequence more effectively. For example, we will invest in equipment at PHE (Public Health England) Colindale so that more samples can be sequenced with a shorter turnaround time. We will also put more funding into staff, since many members are volunteers and this is not sustainable. Getting the leadership and governance right as we become increasingly delivery-focused will also be of utmost importance.

We also need to reduce the time from the point at which a swab is taken to the moment that genomic data are available for that sample. Comparing the sequences of viruses from different people can help to reveal whether a cluster of people with COVID-19 were linked (or not), but this will only be meaningful if it is completed on a timescale that can influence infection control and public health interventions.

We also want to increase our chances of detecting early any mutations in the virus that could be important for human health, for example because they are associated with more severe disease or reduced susceptibility to vaccines. The sooner these can be detected and evaluated, the better prepared we will be.


Into the future

The latest funding award comes with support to develop a vision for the future. We need to ask ourselves: what does the UK want and need in relation to distributed pathogen sequencing? Who will deliver this? What else might it be used for?

From my perspective, the ideal outcome would be a future in which pathogen genomic research and public health service delivery are inseparable. Through the hard work of 2020, we have manufactured our very own golden ticket; we should spend this wisely as we plan the future of pathogen genomics in the UK.


COG-UK meets Mount Olympus. Beginning at the bottom, we have the hurried meeting leading to a proposal to government. This is followed by the Lighthouse testing facilities. From there, Hermes the Courier delivers samples to the laboratory and the pipeline begins in earnest, with laboratory staff preparing samples for sequencing and bioinformatics (with a little help from Hephaestus, building the pipeline on the go). The data are then sent up into ‘the cloud’ for analysis, and finally the phylogenetic trees emerge at the top, with the ability to monitor and ultimately (we hope) control COVID-19. Image Credit: Alex Cagan


Meet our management team