
Report 4: 16th April 2020 – COVID-19 Genomics UK (COG-UK) Consortium

Please Note: This report is provided at the request of SAGE and includes information on the ongoing state of the research being carried out. It should not be considered formal or informal advice. The conclusions of the ongoing scientific studies may be subject to change as further evidence becomes available and as such any firm conclusions would be premature.

Executive Summary

  • COG-UK has nine active sequencing centres which combined have sequenced a total of 3202 SARS-CoV-2 genomes, which accounts for over one third of the global total.
  • Progress is being made on approving and accessing metadata from patient electronic health records and the pipeline for moving samples from NHS labs to sequencing centres.
  • Two predominant long-established SARS-CoV-2 lineages (UK5 and UK17) can be distinguished in genome sequences from across the UK.
  • A preliminary phylodynamic model of SARS-CoV-2 infection in London has been used to infer that by 29th March ~278K individuals (1.9% of the population of the London metropolitan area) had been infected and that the reproductive number of the virus had reduced from 3.26 to 0.796.
  • An initial analysis of variation among primers and probes being used for diagnostic testing has revealed low-frequency potential mismatches to SARS-CoV-2 lineage consensus sequences, but at this stage no effect of these mismatches on the effectiveness of any primer or probe can be inferred.

 

COG-UK update

There are currently nine active COG-UK sites, with an additional four sequencing centres coming online this week, and a further three expected in due course.

By the data cut-off for this report, the total number of viral genomes now stands at 3202 (Table 1). To place this in context, since its inception, COG-UK has generated more genomes than reported by any other country, accounting for >1/3 of the total number of SARS-CoV-2 genomes reported globally (Table 2).

Progress has been made in receiving the necessary ethical approvals to access patient electronic health records and obtain the metadata. Work is now underway to prioritise those metadata fields most urgently needed to enable COG-UK analyses to provide actionable insights.

Work on improving communication between COG-UK and NHS laboratories to smooth the process for accessing samples is ongoing.

COG-UK is working to align with the national diagnostic testing strategy; the first batch of samples has been sent from the national testing centre in Milton Keynes to the Wellcome Sanger Institute and is undergoing evaluation in preparation for scale-up.


 

Recommendations

  • Progress in access to samples and sequencing data means that COG-UK is now in a position to provide genomic insights into local and national transmission. The framework established for systematic genomic surveillance of most of the UK will help ensure inferences are robust. However, an urgent requirement for COG-UK is to link to contact tracing and detailed epidemiological data regarding patients and key workers. This is the key priority of the COG-UK data and informatics working group.
  • A priority for the coming weeks will be to answer the question of whether we can observe a shift in transmission patterns from diverse imports to more homogenous geographical lineages circulating in the UK. This will provide a platform from which to monitor for introductions beginning again once the current restrictions have been relaxed.

 

Analysis updates

Current population structure of SARS-CoV-2 in UK

Public Health Questions

  1. Can we define specific viral lineages and if so have we mapped them at any level?
  2. In which global lineages do our isolates appear and which countries are these lineages associated with?
  3. Are there any obvious transmission hotspots anywhere or diminishing lineages?
  4. Does the distribution of lineages differ between population subgroups, and what are the implications of this?
  5. Are cryptic clusters identified and where are they occurring? (geographical and type of place; e.g. care, hospital etc.)
  6. Are currently identified epidemiological clusters supported by sequencing data?

Summary

Our analyses are able to map the duration, size, and geographic distribution of the SARS-CoV-2 lineages circulating in the UK and to reconstruct transmission chains. As can be seen in Figure S1 (Appendix 1), we can identify 85 lineages (with at least 5 viral genomes) that have been or are currently circulating in the UK. Many of these lineages are very likely no longer in circulation (as demonstrated by the lack of genomes from later time points) and as such can be considered extinct.

Data so far indicate that London and the South East have had more lineages introduced and in circulation than regions such as Scotland or Wales.

As can be seen in Figures S1 and S2 (see Appendix 1), the three most common lineages (by number of genomes sequenced) in the data available to date are UK5, UK16 and UK17.

UK16 is the previously reported hospital outbreak in Wales. The high number of sequences reflects the intensity of sampling and not the size of the outbreak relative to other lineages. Notable is a single genome from this lineage sampled from an individual in Norfolk, a link that could be prioritised for epidemiological follow-up (Figure 2).

The UK5 and UK17 lineages have been present in samples from across the UK since early March and likely represent the predominant long-established lineages currently in circulation (Figure 2). The wide distribution of these lineages is likely a result of both some transmission within the UK and multiple introductions of the same lineage (e.g. from a common holiday destination visited by individuals from across the UK).

Focusing on London (Figure 3), a notable number of lineages are or have been in circulation and identified in the genome sequences sampled to date. Incorporation of postcode and other epidemiological data will enable the within-city patterns of transmission and spread to be assessed.

It should be noted that there is no current evidence of different lineages being associated with any distinct biological or clinical characteristics (e.g. transmissibility, disease severity, drug resistance).

With the framework established for systematic genomic surveillance of most of the UK, we can begin to look for changing patterns in transmission of viral lineages. For example, can we detect a shift from diverse imports to more homogenous geographical lineages circulating in the UK?

 

Phylodynamic insights into SARS-CoV-2 in the UK

Public Health Questions

  1. What is the estimated rate of epidemic growth, estimate of unreported cases and rate of sampling?

Summary

A preliminary phylodynamic model (led by Dr Erik Volz on behalf of the MRC GIDA COVID-19 phylodynamics working group at Imperial College London) has been developed to estimate specific parameters of the SARS-CoV-2 epidemic by comparing genome sequence data from a region (a city or local health authority) with data from a larger national or international area.

The model is designed to account for non-linear epidemic dynamics within the region, a range of different lengths of incubation and infectious period, migration into and from the region, and variations in transmission rates.

The first analysis using this model looked at SARS-CoV-2 in London using data collected between 27th February and 29th March 2020. It compared 69 whole genomes sampled within London with 73 whole genomes sampled outside of London (from across the rest of the UK).

The analysis indicated that as of the 29th March, ~278K individuals had been cumulatively infected in London (median; range: 48,322 to 1,202,252; 95% confidence interval).

Assuming a 15 million population in the London metropolitan area, this corresponds with 1.9% of the London population having been infected cumulatively (range: 0.32% to 8.00%; 95% confidence interval).

There had been a total of 5957 confirmed infections reported in London by 29th March.
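
As a quick check of the arithmetic behind these figures, the following minimal sketch recomputes the percentage of the population infected from the quoted estimates and the assumed 15 million population. The variable and function names are illustrative only. The final quantity (confirmed infections as a share of the estimated median) is a derived illustration and is not a figure stated in this report, and bounds recomputed from the quoted counts may differ slightly from the published percentages.

```python
# Figures quoted in this report (London, cumulative to 29th March 2020).
POPULATION = 15_000_000                      # assumed London metropolitan area population
EST_MEDIAN = 278_638                         # estimated cumulative infections (median)
EST_LOWER, EST_UPPER = 48_322, 1_202_252     # bounds of the 95% interval
CONFIRMED = 5_957                            # confirmed infections reported


def percent_of(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


pct_median = percent_of(EST_MEDIAN, POPULATION)
pct_lower = percent_of(EST_LOWER, POPULATION)
pct_upper = percent_of(EST_UPPER, POPULATION)

# Illustrative only: share of estimated infections that had been confirmed.
pct_confirmed = percent_of(CONFIRMED, EST_MEDIAN)

print(f"Estimated infected: {pct_median:.2f}% ({pct_lower:.2f}% to {pct_upper:.2f}%)")
print(f"Confirmed as a share of the estimated median: {pct_confirmed:.2f}%")
```

Under these assumptions the confirmed count corresponds to only a few percent of the estimated cumulative infections, which is why the estimate is so sensitive to the sampling assumptions discussed below.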

Estimates of the reproduction number for SARS-CoV-2 in London reduced from 3.26 (range 2.66 to 4.10; 95% confidence interval) in early February to 0.796 (range: 0.168 to 2.49; 95% confidence interval) on the 29th March.

The model does have limitations. The estimates provided should be viewed as preliminary and subject to revision as additional data become available. The validity of the estimates relies on a strong assumption of random sampling, which could not be assessed with current data, and the estimates are also sensitive to assumptions made by the epidemiological model (see Appendix 2 for details).

The full report can be seen in Appendix 2 and details on methods and priors can be found here [http://whoinfectedwhom.org/seijr0.1.0_methods.pdf]

This model for London will be updated with more genomic information in the coming weeks, and applied to other regions in the UK.

 

SARS-CoV-2 variation affecting diagnostics and therapeutics

Public Health Questions

  1. Is there any evidence of genomic changes potentially affecting common diagnostic tests or even direct or indirect therapies?

a) Variation within SARS-CoV-2 Diagnostic Primers

Richard Myers (Public Health England)

Summary

The aim of the ongoing analysis is to continually assess the various diagnostic primers and probes available, to identify mismatches of their sequences to COG-UK genome sequence alignments. Primer/probe set names were identified using published GISAID information, and direct communication with relevant individuals. This analysis may not represent all primers/probes currently, or previously, used and work is being undertaken to better understand the diagnostic assays that are in use within the UK.

In the initial analysis, primer and probe sequences were aligned with the lineage-specific consensus sequences from the COG-UK genome sequence data (from 11/04/2020) and compared against a matrix of positional information to report the level of conservation across the primer/probe sites.

An overview of the results is given in the full report (see Appendix 3). The only primers that were found to have zero mismatches with any of the COG sequences analysed were nCoV_IP2-12669Fw and nCoV_IP4-14059Fw. All other primers and probes had at least one mismatch in the COG sequences; however, these mismatches were usually detected at low frequency.

The initial analysis identified a range of potential mismatches to every set of primers/probes identified in this report. There were two primers which contained no mismatches (forward IP2 and IP4 primers). The only variation specific to UK sequences was seen with the N3 reverse primer, in clade B_1_20. This clade contains 15 samples sequenced in England, and 100% of the sequences contain a mismatch at position seven, G>T. In future investigations we hope to do more work linking mismatches within sequences, and linking them to more detailed clade information (e.g. country of sequencing).

There are limitations to this analysis. For low-proportion mismatches, errors introduced during RT-PCR or sequencing cannot be excluded. This analysis does not identify multiple mismatches in a primer sequence within the whole genome sequence of a single sample. This report does not attempt to infer the effect of any of these mismatches on the effectiveness of the primer or probe, only to report where mismatches are seen within the COG alignments.

The full report can be seen in Appendix 3.

b) Drug target analysis:

No update this week.

Other analyses:

Introduction of new viral lineages to the UK

No update this week.

Nosocomial transmission of SARS-CoV-2 in the UK

No update this week.

Population mobility and SARS-CoV-2 transmission dynamics

No update this week; next report due on 23/04/20

Functional insights from SARS-CoV-2 Genomic data

No update this week.


 

Appendix 1

Figure S1 | Reconstructed UK transmission lineages with at least 5 viral genomes. Each row represents a single introduction and subsequent UK spread reconstructed from the virus genomes captured in the COG-UK sample in combination with global genome data. The diameter of the circles represents the number of virus genomes for the day of sampling.

Figure S2 | UK transmission lineages with at least 5 viral genomes. The bars denote the number of virus genomes in that lineage.

Figure S3 | Recent live view of data linked and visualised using Microreact – geographic and temporal patterns in lineages can be shared and investigated.


 

Appendix 2 – Phylodynamic Analysis

Location: London

Most recent sample: 29-03-2020

Primary author: Erik Volz

Report prepared: 12-04-2020

Authors:

Lily Geidelberg, Olivia Boyd, Manon Ragonnet, David Jorgensen, Igor Siveroni, Erik Volz, On behalf of the MRC GIDA COVID-19 response team at Imperial College

Background information

This analysis is based on:

  • 69 whole genomes sampled from within London
  • 73 whole genomes sampled from outside of London
  • Samples within London were collected between 2020-02-27 and 2020-03-29

Model Assumptions and Limitations

These estimates are preliminary and subject to revision as new data become available. The analysis is based on the assumption that the 69 genomes from London were sampled uniformly at random from the infected population within London at each point in time that a sample was collected. The representativeness of this sample could not be assessed with metadata available at the time the analysis was carried out.

This analysis is based on a structured coalescent population genetic model which links genetic diversity in our sample, summarized in a phylogenetic tree, with an epidemic process. The epidemic process is described by a SEIR model. Both the population genetic model and the epidemiological model encode assumptions about how COVID-19 has spread in London, and the validity of these results depends on these assumptions being satisfied.

Important assumptions of the coalescent model:

  • Sampling rate may vary through time but at each sample collection time sampling is random.
  • There is ignorable genetic diversity within hosts.
  • The rate of migration of virus lineages is constant on a per lineage basis, and there is bidirectional migration between London and the international reservoir of virus. The international reservoir (population of virus lineages outside of London) is assumed to grow exponentially at a rate we estimate
  • Multiple simultaneous transmissions (point-source transmission events) are not accounted for
  • The model does not account for recombination

Important assumptions of the epidemiological model:

  • Incidence is a nonlinear function of time described by depletion of a susceptible host population. The size of the susceptible reservoir is an estimated parameter and does not reflect the actual number susceptible in London.
  • The serial interval is 6.5 days on average with a 5 day incubation period.
  • There is overdispersion in transmission rates corresponding to k of approximately 0.2
  • The model is deterministic and may not accurately reflect dynamics early in the epidemic when stochastic effects dominate
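
To illustrate the kind of deterministic SEIR structure referred to above, the following is a minimal sketch only. It is not the seijr0.1.0 model used for this analysis, which additionally handles migration, overdispersion and the coalescent likelihood. The 5 day incubation period and the two reproduction numbers are taken from this report; the 1.5 day infectious period (chosen so that incubation plus infectious period equals the 6.5 day serial interval), the seed size, the step size and the function name `simulate_seir` are assumptions made purely for illustration.

```python
def simulate_seir(r0, days, population=15_000_000, seed_infected=10,
                  incubation_days=5.0, infectious_days=1.5, dt=0.1):
    """Euler integration of a deterministic SEIR model.

    Returns a list of (S, E, I, R) tuples, one per day including day 0.
    """
    sigma = 1.0 / incubation_days   # rate of leaving the exposed state
    gamma = 1.0 / infectious_days   # rate of leaving the infectious state
    beta = r0 * gamma               # transmission rate implied by r0
    s = float(population - seed_infected)
    e, i, r = 0.0, float(seed_infected), 0.0
    steps_per_day = round(1.0 / dt)
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        trajectory.append((s, e, i, r))
    return trajectory


def cumulative_infections(trajectory):
    """Total infections so far: everyone who has left the susceptible state."""
    return trajectory[0][0] - trajectory[-1][0]


growing = simulate_seir(r0=3.26, days=30)     # early reproduction number in this report
declining = simulate_seir(r0=0.796, days=30)  # reproduction number at last sample
print(f"R=3.26:  {cumulative_infections(growing):.0f} new infections in 30 days")
print(f"R=0.796: {cumulative_infections(declining):.0f} new infections in 30 days")
```

Because the sketch is deterministic, it shares the limitation noted in the final assumption above: it cannot represent the chance effects that dominate when case numbers are very small.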

How many are infected in London?

Figure 1a: Cumulative estimated infections through time. Points represent reported cases in the region.

Figure 1b: Cumulative estimated infections through time. Points represent reported cases in the region.

 

Estimated cumulative infections at last sample (2020-03-29): 278638 [48322-1202252] median [95%CI]

Cumulative %infected on March 29 assuming 15 million population in London metro area: 1.90 (95%CI: 0.32 – 8.00)

Cumulative confirmed infections reported at 2020-03-29: 5957

Figure 2a: Daily estimated infections through time. Points represent reported cases in the region.

Figure 2b: Daily estimated infections through time. Points represent reported cases in the region.

Figure 3: Reproduction number through time

 

Reproduction number at last sample (2020-03-29): 0.796 [0.168-2.49] median [95% CI]

How quickly has the epidemic in London grown?

Table 1: Reproduction number, growth rate and doubling times

 

Molecular clock rate of evolution: 0.000893 [0.000747-0.0011] median [95% CrI]
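
For a sense of scale, the clock rate can be converted into an expected number of substitutions per genome. The sketch below assumes that the rate is expressed in substitutions per site per year (the unit is not stated above) and uses the 29,903 nucleotide length of the Wuhan-Hu-1 reference genome; both are assumptions for illustration and not part of the analysis itself.

```python
CLOCK_RATE = 0.000893                       # median clock rate quoted above
CLOCK_RATE_INTERVAL = (0.000747, 0.0011)    # 95% CrI quoted above
GENOME_LENGTH = 29_903                      # assumption: Wuhan-Hu-1 reference length (nt)


def substitutions_per_genome_per_year(rate_per_site_per_year,
                                      genome_length=GENOME_LENGTH):
    """Expected substitutions accumulated across the whole genome in one year."""
    return rate_per_site_per_year * genome_length


per_year = substitutions_per_genome_per_year(CLOCK_RATE)
per_year_low, per_year_high = (
    substitutions_per_genome_per_year(rate) for rate in CLOCK_RATE_INTERVAL
)
per_month = per_year / 12

print(f"{per_year:.1f} substitutions per genome per year "
      f"({per_year_low:.1f} to {per_year_high:.1f}), about {per_month:.1f} per month")
```

Under these assumptions, genomes sampled a few weeks apart would be expected to differ by only a handful of substitutions, which limits how finely individual transmission events can be resolved from sequence data alone.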

Methods summary

Details on methods and priors can be found here: http://whoinfectedwhom.org/seijr0.1.0_methods.pdf

Model version: seijr0.1.0

Report version: 20200412-131812-a5be3f65

Acknowledgements

This work was supported by the MRC Centre for Global Infectious Disease Analysis at Imperial College London.

Sequence data were provided by GISAID and the laboratories listed here.


 

Appendix 3 – COG Primer Analysis

15/04/2020

Summary

Introduction

The aim of this work is to continually assess the various primers and probes available, to identify mismatches of their sequences to the COG alignments. Primer/probe set names were found using published GISAID information, and other personal communication. This analysis may not represent all primers/probes currently, or previously, used.

In this initial analysis, the COG-UK alignment file from 11.04.2020 was split into lineage-specific alignments using the information encoded in the FASTA header. Each alignment was converted into a consensus sequence representing the alignment and a matrix of the percentage of each base present at each position in the alignment. Primer and probe sequences in FASTA format were reverse complemented, and both orientations were aligned with the alignment consensus sequence. The best match (forward or reverse complement) was then compared against the matrix of positional information to report the level of conservation across the primer/probe sites.
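
The steps described above can be sketched as follows. This is an illustrative outline under stated assumptions and not the code used for this analysis: all function names are hypothetical, the primer site is located by a simple exhaustive scan and not by a sequence aligner, gaps and ambiguous bases in the alignment are ignored, and the four-sequence alignment is a toy example constructed for demonstration and is not COG-UK data. Degenerate primer bases are interpreted using the standard IUPAC nucleotide codes.

```python
from collections import Counter

# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def base_percentages(alignment):
    """Percentage of A/C/G/T in each alignment column (gaps and Ns ignored)."""
    matrix = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        matrix.append({b: 100.0 * n / total for b, n in counts.items()})
    return matrix


def consensus(matrix):
    """Most common base per column; N where a column has no called base."""
    return "".join(max(col, key=col.get) if col else "N" for col in matrix)


def count_mismatches(oligo, window):
    return sum(base not in IUPAC[code] for code, base in zip(oligo, window))


def best_site(oligo, cons):
    """Scan both orientations; return (start, oriented_oligo, is_reverse)."""
    best = None
    for is_reverse, oriented in ((False, oligo), (True, reverse_complement(oligo))):
        for start in range(len(cons) - len(oligo) + 1):
            n = count_mismatches(oriented, cons[start:start + len(oligo)])
            if best is None or n < best[0]:
                best = (n, start, oriented, is_reverse)
    return best[1:]


def mismatch_report(oligo, alignment):
    """Map 1-based oligo position -> {genome-strand base: % of sequences}."""
    matrix = base_percentages(alignment)
    start, oriented, is_reverse = best_site(oligo, consensus(matrix))
    report = {}
    for offset, code in enumerate(oriented):
        off_target = {b: pct for b, pct in matrix[start + offset].items()
                      if b not in IUPAC[code]}
        if off_target:
            position = len(oligo) - offset if is_reverse else offset + 1
            report[position] = off_target
    return report


# Toy alignment: three sequences match the primer site, one carries T>A at position 7.
primer = "ATGAGCTTAGTCCTGTTG"  # nCoV_IP2.12669Fw, from the table in this report
variant = primer[:6] + "A" + primer[7:]
toy_alignment = ["GGCC" + primer + "TTAA"] * 3 + ["GGCC" + variant + "TTAA"]
report = mismatch_report(primer, toy_alignment)
print(report)  # {7: {'A': 25.0}}
```

Positions where every sequence matches the primer are omitted from the result, mirroring the convention used in the figures of this appendix. Note that when the reverse complement is the best match, the sketch reports positions in the primer's own 5' to 3' numbering but reports bases as they appear on the genome strand.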

For low-proportion mismatches, errors introduced during RT-PCR or sequencing cannot be excluded. This analysis does not identify multiple mismatches in a primer sequence within the whole genome sequence of a single sample. This report does not attempt to infer the effect of any of these mismatches on the effectiveness of the primer or probe, only to report where mismatches are seen within the COG alignments.

An overview of the results is given here. The only primers that were found to have zero mismatches with any of the COG sequences analysed were nCoV_IP2-12669Fw and nCoV_IP4-14059Fw. All other primers and probes had at least one mismatch in the COG sequences.

Primer Mismatches

The figures contained within this report show any position within the given primer or probe where a mismatch is observed within the sample sequences. The proportion of each base observed at the relevant positions is shown in the bar (A=blue, C=orange, G=yellow, T=green). A bar is not displayed for any positions where 100% of sequences match the primer. The figures are split by clade and the number of sequences within each clade is displayed in the top right hand corner of the figure.

ChinaCDC_N

The ChinaCDC_N_F primer shows mismatches to the first three nucleotides, GGG>AAC, in six clades. Progression through the B lineage shows that the proportion of mismatches at these nucleotides increases in clades B_1_1, B_1_14 and B_1_15, where 100% of sequences contain the mismatch to the primer, although the number of sequences in these clades is much smaller. There are five other mismatches seen with the forward primer across seven clades, all at low proportions. The reverse primer contains seven different mismatches across eight clades, with the same mismatch occurring in clades B_1_11, B_2, and B_2_2; again, these are all at low proportions.

Mismatches observed compared to the forward (F) and reverse (R) Chinese CDC N gene primers.

The Chinese CDC N probe contains mismatches in two of the clades, with a different mismatch in each clade, in a fairly low proportion of sequences.

Mismatches observed compared to the Chinese CDC N gene probe.

ChinaCDC_ORF1ab

Clade B_1 and Clade B_2_1 both contained mismatches to the Chinese CDC ORF1ab F primer, at the 7th nucleotide G>T, at a low proportion. Clade B_1 also contains a very low proportion of sequences with C>T at nucleotide 17. The reverse primer has very low proportion mismatches at three different positions, in clades B_1 and B_2_1.

Mismatches observed compared to the forward (F) and reverse (R) Chinese CDC ORF1ab primers.

The probe sequences in this set contain mismatches at the 3’ end in a number of lineages in the B clade. Position 26 shows a mismatch (T>G) in 10 of the clades. The proportion of mismatches varies from 0.15% to 90%. Generally the very small proportions (<1%) are seen in larger clades. Clades B_1_15 and B_1_16 both show >90% mismatch at this position. Sequences within clade B_1_15 are mainly associated with Belgium but some from England and Wales are also present. Sequences in clade B_1_16 are also mainly associated with Belgium but sequences from Congo, Denmark and Portugal are also present.

Mismatches observed compared to the Chinese CDC ORF1ab gene probe.

nCoV_IP2

The forward primer contains no mismatches to any sequences. The probe contains low proportion mismatches in clade B_1, at positions six and 17. The reverse primer contains low proportion mismatches at three sites in clades A_2 and B_2_4, where 1.45% and 12.5% of sequences, respectively, were mismatched at position two (C>T).

Mismatches observed compared to the IP2 probe (P) and reverse primer (R).

nCoV_IP4

The probe sequence had two low proportion mismatches towards the 3’ end of the probe. The sequences show 5 mismatches to the reverse primer, from position 12 to 20. All mismatches occur in less than 5% of sequences (clades B_1, B_1_1, B_2_1, B_4_1). Both mismatches in clade B_2_1 occur at the extreme 3’ end of the primer sequence.

Mismatches observed compared to the IP4 probe (P) and reverse primer (R).

HKU-N

In the forward primer, four mismatches occur at low proportion in three clades, A, B and B_1. Three mismatches occur in low proportion across clades A_1, A_2 and B_1. One of these mismatches, in the third nucleotide (G>T), occurs in clades A_2 and B_1 at low proportions. The reverse primer contains six different mismatches across nine clades. Three of the mismatches occur in more than one clade, but all are at low proportions. A mismatch occurs in one of the last two nucleotides in five of the clades (A_1, B_1, B_1_1, B_2, and B_2_1).

Mismatches observed compared to the forward (F) and reverse (R) primers for the Hong Kong University N gene.

Three low level mismatches were seen in the probe, in three clades. Clade A_1 contains a low level mismatch at the 12th nucleotide, A>T. In clade A_2 the third nucleotide has low level mismatches (G>T), also seen in clade B_1. Clade B_1 has low level mismatches to the second from last nucleotide of the probe, G>T.

Mismatches observed compared to the Hong Kong University N gene probe.

Orf1b

In the forward primer, two mismatches are seen at very low proportion. The first is in clade B at the R nucleotide, where most of the sequences contain an A, and a small proportion a C. The other mismatch is seen in clade B_1, where a small proportion of samples show a C instead of T at nucleotide 11.

Mismatches observed compared to the forward (F) and reverse (R) primers for the Hong Kong University ORF1b gene.

The probe shows two different mismatches, again at low proportion. In clade A_1, nucleotide 15 shows T>C, and in clades B_1 and B_3_1 the 11th nucleotide has a small proportion of G>T. The reverse primer contains four different mismatches, all represented in clade B_1. All of these are at very low proportions, and are all either C>T or G>T. None of the mismatches occur at a frequency of above 5%.

Mismatches observed compared to the Hong Kong University ORF1b probe.

N1

The forward primer contains four different mismatches across four clusters. Two of these mismatches occur in two clades. All mismatches are at a low proportion of sequences. The reverse primer also contains several mismatches. The most obvious is in clade B_2_4, containing 16 sequences, where the fourth nucleotide A is mismatched to a G in 93% of samples. Sequences in this clade are from Australia, New Zealand, Wales, and England. It is the only clade containing this mismatch. Other mismatches occur at low proportions, and include C>T/A at position 10 in clades A_1, B_1, B_1_15, B_2_1 and B_7. Three other mismatches are seen, in clade B T>C position 6, B_2 nucleotide 23 G>T, and B_1 nucleotide 18 A>T. All of these are at very low proportions.

Mismatches observed compared to the USA CDC forward (F) and reverse (R) primers for the N gene (version 1).

The probe sequence shows several mismatches. Clade B_6 shows a 100% mismatch at nucleotide 3 (C>T). The origin of sequences from this clade includes Australia, Canada, England, Malaysia, Saudi Arabia and USA. Other clades also show some sequences containing this mismatch, including B_2 and A_2 at 1% and 11% respectively. Clade B_1 contains low level mismatches at position two of C>A/T, and two very low level variants (<0.1%) towards the end of the sequence of C>C and G>T. Clade B_1_11 contains a mismatch at position 5, C>T, at 0.9%. These three mismatches all occur in a run of four C nucleotides in the probe. Clade B_2_1 shows low proportion mismatches to the second to last nucleotide, C>T. Clade A contains a mismatch of C>T at 0.8% at position 7, and clade A_1_2 contains the same mismatch at 7%, but this clade only contains 14 sequences.

Mismatches observed compared to the USA CDC N gene probe (version 1).

N2

The forward primer contains low proportion mismatches in clade A_2 and B_1 at position 16 G>T. Clade B_2_1 and B_1 contain mismatches at the fourth nucleotide, at C>T (low proportion), and finally B_1 contains one more mismatch at position eight, C>T.

The reverse primer contains five low proportion mismatches. Clades B and B_1 mismatch at the 15th nucleotide, G>T. Clade B also contains a mismatch in the final nucleotide of the primer, C>T (0.15%). Clade B_1 also contains a mismatch in the second to last nucleotide at low levels, C>T. Clade B_2_1 contains a mismatch at nucleotide six, C>T, and clade B_7 a mismatch at nucleotide eight, G>A. All are at low proportions.

Mismatches observed compared to the USA CDC forward (F) and reverse (R) primers for the N gene (version 2).

The probe sequence contains three different mismatches, all at low proportions. Position 13 C>T occurs in sequences in clades A, B and B_1. This C is in a run of five C’s in the probe. In clade A_1 a very small number of sequences (0.4%) contain an A>T mismatch in the first nucleotide. The final mismatch observed is in clade B_1 at nucleotide eight, G>T (0.04%).

Mismatches observed compared to the USA CDC N gene probe (version 2).

N3

Three mismatches are observed across five clades for the forward primer. In clade B_1, most of the sequences show the same nucleotide as the primer sequence (T) at position 7 and only a small proportion are a C (0.037%). However, in clades B_4, B_4_1 and B_4_2, 3.22%, 0% and 0% respectively, show a T at this position.

In the reverse primer, there are three mismatches across 5 clades. At position 7 three clades show G>T mismatches. Clades B_1 and B_2 show a very small number of sequences with the change. However, in clade B_1_20, which contains 15 samples sequenced in England, 100% of the sequences contain a T at this position.

Mismatches observed compared to the USA CDC forward (F) and reverse (R) primers for the N gene (version 3).

The N3 probe shows low level mismatches at seven positions throughout the probe sequence. In clade B_8, 16.25% of sequences show a T at position 21 instead of a C in the probe sequence.

Mismatches observed compared to the USA CDC N gene probe (version 3).

E gene

For the forward primer, only one sequence in clade A_5 contains a mismatch at position 11 (A>G). No other sequences show any mismatches.

The reverse primer shows two mismatches across three clades. No clade contains more than one mismatch and the highest proportion of mismatches is in B_4_1 (5%) but this clade contains only 17 isolates.

Mismatches observed compared to the forward (F) and reverse (R) primers for the E gene.

In the probe there are 3 different mismatches across 4 clades with no clade containing more than 4% mismatches.

Mismatches observed compared to the E gene probe.

RdRP

In the forward primer, clades A_1 and B_1_5 show a mismatch at position 7 (G>T) at 0.19% and 4%, respectively. Clades B and B_1_11 show a C at position 6, again at low frequencies. Due to repetitive bases, it may be prudent to investigate potential mapping issues in this region when generating consensus sequences for samples. One other mismatch is observed in clade B_1 at base 12 (G>T) in a single sequence.

Mismatches observed compared to the RdRP gene forward primer.

The reverse primer contains a single mismatch (S>T) at position 13 in all clades at 100% frequency.

Mismatches observed compared to the RdRP gene reverse primer.

Probe 1 contains two high proportion mismatches across all 49 clades at positions 10 and 19. For the mismatch at position 10 the probe contains R but all but 2 clades contain a C at 100% frequency. Clades A and B_2_5 also contain a T at this position at 0.81% and 7.41% respectively. All samples in all clades show an A at position 19 where the probe contains a T. Clade B also shows a mismatch (G) at position 7 in 0.15% of sequences.

Mismatches observed compared to the RdRP gene probe 1.

Probe 2 contains two mismatches across 3 clades at low proportions. However, these mismatches are in a nucleotide repeat region which may be prone to errors in WGS and mapping.

Mismatches observed compared to the RdRP gene probe 2.

RP

The frequency of mismatches for both the forward and reverse primers as well as the probe is high across the whole sequence and among all clades. Due to the high volume of mismatches, these will not be detailed further.

Mismatches observed compared to the RP gene forward primer.

Mismatches observed compared to the RP gene reverse primer.

Mismatches observed compared to the RP gene probe.

Conclusion

This initial analysis identified a range of potential mismatches to every set of primers/probes identified in this report. There were two primers which contained no mismatches (forward IP2 and IP4 primers). The only variation specific to UK sequences was seen with the N3 reverse primer, in clade B_1_20. This clade contains 15 samples sequenced in England, and 100% of the sequences contain a mismatch at position seven, G>T. In future investigations we hope to do more work linking mismatches within sequences, and linking them to more detailed clade information (e.g. country of sequencing). Once again, low proportion mismatches may be due to WGS or RT-PCR errors and no inferences are made as to the effect the mismatches will have on the probe efficiency.

Appendix 1

The table below gives a list of primers and probes with the corresponding sequence.

Primer and probe sequences included in this analysis

Name                        Sequence
RP.R                        ACTTGTGGAGACAGCCGCTC
RP.F                        AGATTTGGACCTGCGAGCG
RP.P                        CGCGCAGAGCCTTCAGGTCAGAA
RdRP_SARSr.P1               CCAGGTGGWACRTCATCMGGTGATGC
RdRP_SARSr.R1               TATGCTAATAGTGTSTTTAACATYTG
RdRP_SARSr.P2               CAGGTGGAACCTCATCAGGAGATGC
RdRP_SARSr.F2               GTGARATGGTCATGTGTGGCGG
E_Sarbeco_F1                ACAGGTACGTTAATAGTTAATAGCGT
E_Sarbeco_R2                TGTGTGCGTACTGCTGCAATAT
E_Sarbeco_P1                ACACTAGCCATCCTTACTGCGCTTCG
N1_USA_CDC_2019.nCoV_N1.P   ACCCCGCATTACGTTTGGTGGACC
N1_USA_CDC_2019.nCoV_N1.R   CAGATTCAACTGGCAGTAACCAGA
N1_USA_CDC_2019.nCoV_N1.F   GACCCCAAAATCAGCGAAAT
N2_USA_CDC_2019.nCoV_N2.R   TTCTTCGGAATGTCGCGC
N2_USA_CDC_2019.nCoV_N2.P   ACAATTTGCCCCCAGCGCTTCAG
N2_USA_CDC_2019.nCoV_N2.F   TTACAAACATTGGCCGCAAA
N3_USA_CDC_2019.nCoV_N3.R   CAATGCTGCAATCGTGCTACA
N3_USA_CDC_2019.nCoV_N3.P   AYCACATTGGCACCCGCAATCCTG
N3_USA_CDC_2019.nCoV_N3.F   GGGAGCCTTGAATACACCAAAA
HKU.N_F                     TAATCAGACAAGGAACTGATTA
HKU.N_R                     CATGGAAGTCACACCTTCG
HKU.N_P                     CCGCAAATTGCACAATTTGC
HKU.ORF1b.nsp141P           TAGTTGTGATGCWATCATGACTAG
HKU.ORF1b.nsp141R           GAGTGCTTTGTTAAGCGYGTT
HKU.ORF1b.nsp14F            TGGGGYTTTACRGGTAACCT
nCoV_IP2.12669Fw            ATGAGCTTAGTCCTGTTG
nCoV_IP2.12696bProbe        AGATGTCTTGTGCTGCCGGTA
nCoV_IP2.12759Rv            ACAACACAACAAAGGGAG
nCoV_IP4.14146Rv            CCTATATTAACCTTGACCAG
nCoV_IP4.14084Probe         TCATACAAACCACGCCAGG
nCoV_IP4.14059Fw            GGTAACTGGTATGATTTCG
ChinaCDC_ORF1ab_F           CCCTGTGGGTTTTACACTTAA
ChinaCDC_ORF1ab_R           TCAGCTGATGCACAATCGT
ChinaCDC_ORF1ab_P           CCGTCTGCGGTATGTGGAAAGGTTATGG
ChinaCDC_N_R                CAGCTTGAGAGCAAAATGTCTG
ChinaCDC_N_P                TTGCTGCTGCTTGACAGATT
ChinaCDC_N_F                GGGGAACTTCTCCTGCTAGAAT


 
