“We mustn’t spend a long time with a lot of virus circulating among a partially vaccinated population”

Above: Molecular epidemiologist Emma Hodcroft. Image by Oliver Hochstrasser. 

The pandemic is entering a ‘danger period’ where increasing but not total immunity encourages the spread of variants that can reinfect people or evade vaccines, Emma Hodcroft tells The Biologist

Feb 23rd 2021 

Emma Hodcroft is a molecular epidemiologist at the University of Bern studying the evolution of human pathogens. She is one of the co-developers of Nextstrain.org, a massive open source project tracking the evolutionary history of different bacterial and viral strains, and CoVariants.org, a new website tracking the spread of the most significant variants of SARS-CoV-2.

As part of The Biologist’s latest COVID Q&A on research to understand the virus, Hodcroft explains how epidemiologists identify and track significant variants, and why current conditions are just right for evasive new variants to spread and become dominant.

She tells Tom Ireland that the UK and Denmark are now leading the way in tracking the evolution of the virus, but for too long variants were able to emerge and spread unchecked.

Hi Emma. Can you explain first how scientists track SARS-CoV-2 variants around the world, given that most COVID tests just identify a few genes, not whole genome sequences?

Samples from positive COVID tests are chosen to be taken for full sequencing, and how this is done from country to country is really variable. The best systems for monitoring variants randomly select a certain percentage of all tests from across the country for sequencing, like the UK is now doing. This gives you a representative overview of what's circulating in your country. Denmark is sampling something like 10% of all tests, but very few countries are operating on the same level as that. 

The next in line are countries that are doing some random sequencing but not in as high numbers; Switzerland, for example, probably falls into this category. We have some great partnerships with the testing companies where they can send a random selection of the PCR tests they do to be sequenced. But it's an academic effort, so it's professors and people who work at universities or public health groups who have just decided that sequencing is important. Depending on how they've managed to organise themselves, that may just mean that they're getting samples from their local hospital, or their local university testing group. So it may not be representative of the whole country.

Then of course we also have a lot of countries where sequence data is still not available or where we don't have enough sequences. As a result of the news about the new variants, we've had a lot of countries start to invest in getting more sequences done in the past couple of months. But worldwide, there are still some huge gaps, where the number of sequences is just nowhere near representative of the size of the outbreak that they're having. This makes it very hard to know what's going on – we're sampling the ice cube on top of the iceberg.

Do you feel like governments and public health organisations were prepared for the inevitable appearance of variants with worrying characteristics? Or have they only started to take it seriously after the first more transmissible variants rose to prominence?

I think in general governments were not so prepared for the appearance of such variants, no. Otherwise, we would have seen wider investment in genetic sequencing, which is the only way to identify and track these variants. Thankfully we are seeing this investment increasing now in many places, but we would have been better served if this was in place already.

When tracking viral evolution, are the basic principles the same as evolution in other organisms, just much faster?

Essentially, yes. Mutations arise randomly and what matters is if they rise to a noticeable level: do they become more prevalent or the most prevalent mutation that we see? And of course that depends on whether they offer an advantage in a particular environment.

I think that's something that's really worth talking about right now: it’s not that the 484 or 501 mutation is suddenly arising a lot. We've changed the evolutionary landscape – for example, at the beginning of the outbreak, there was no point in the virus being able to reinfect people because no one had been infected. Now, we might be in a scenario where being able to reinfect people offers the virus an advantage, and so we start to see these mutations become more dominant instead of just disappearing back into the background.

So introducing vaccination into that landscape will introduce a new selection pressure, encouraging variants that can still spread within a vaccinated population?

Yes, potentially. I think the easiest way to explain this is with two extremes: early in the pandemic there were no vaccines and nobody had been infected. Because there are no vaccines and no one has immunity, there's no selection pressure for the changes that help the virus adapt to either of those things.

On the other end of the scale, hopefully in a year or so we'll have almost everyone vaccinated and because of that there will be almost no viral circulation. If there's no viral circulation the rate at which the virus can adapt really slows down. It doesn't have that opportunity to evolve as quickly if it can’t jump from person to person.

The danger period is here in the middle where we've got a population that's partially vaccinated, where there's a reward for the virus being able to infect people who have been vaccinated. What we don't want to do is spend a lot of time in this middle period where we've got a lot of virus circulating and a partially vaccinated population. We want to move through that as quickly as possible.

We should vaccinate and keep the numbers of actual cases as low as possible. Then you can get to that really safe period on the other side as quickly as possible.

We hear a lot about the genes of the SARS-CoV-2 spike protein and how mutations in this region can affect both its ability to bind to our cells and the efficacy of vaccines. Are other areas of the genome mutating in ways that could be significant for the course of the pandemic, or is it really all about the binding site?

That's a really good question. I think that in a lot of ways we just don't know. We know the structure of the spike and we know a lot about how it can impact how the virus works; it can impact binding to the human ACE2 receptor, it can impact how well the body’s immune system recognises the virus.

For the other genes we very often just don't have a good idea of exactly what they do. So for example, we also see mutations in the N gene - the nucleocapsid. We have some idea of what this does, but we still don't understand how changes in this area might affect the virulence or the transmissibility of the virus, or other factors.

When we aren’t sure what a gene does, then it becomes really hard to figure out if mutations here will matter. Okay, they probably matter to the virus or else it wouldn't have the gene, but do they matter for us on a pandemic or on a clinical level?

There have been a couple of early studies that suggested that maybe a truncation of a couple of these genes might impact the clinical outcome, but the numbers are often small, and again it's just really hard for us to test these hypotheses when often we don't even understand the mechanism by which they make a difference. So I think there's still unfortunately a lot of work to go as far as understanding mutations in other parts of the genome.

You’ve helped build tools for tracking the emerging variants of SARS-CoV-2, but obviously this is all reliant on how different countries collect data on the variants circulating in their population. If you could set up the 'perfect' global surveillance system for monitoring and tracking variants, what would that look like?

We actually have a really good framework already in global flu surveillance. With flu monitoring we have many, many countries around the world, of all different developmental and economic statuses, that have signed up to make sure that they generate at least a couple of hundred flu sequences a year from all corners of the Earth.

So really it’s just about trying to emulate this global sequencing capacity so that more countries are sending us a few hundred sequences every month. Then we have a better global picture of what's going on and what kind of variants are emerging in places where we just don't have that surveillance right now.

How comparable is the evolution of SARS-CoV-2 to the seasonal changes in flu?

With flu, we have some level of global immunity or resistance to it because it has been around for so long and almost everyone has had it at some point in their lives.

We've been living with flu for a long time, so we're in a kind of parallel race with each other. Of course there's going to be regional variation in which flu strains went around in which year, but globally there are a finite number of flu strains circulating in any year and so we have some picture of the global immunity landscape. This can help us see where flu has been and kind of where it's going – although we are not fortune tellers and can't predict it perfectly.

With COVID-19 we don't really know what direction it's come from or what direction it's going. We don’t know if it can make big leaps in fitness, or if we're now at a point where it's going to be inching along with mutations that marginally increase how well it can do. Where there have been big outbreaks or lots of vaccination, we don't know yet how coronavirus will respond to this – we don't know what tricks it has up its sleeve.

Even as a specialist biology journalist, I find the nomenclature and taxonomy of viral strains and variants really confusing. I’ve read other virologists say it’s a mess, and that the taxonomy is why variants end up being given misleading common names like the ‘Kent’ or ‘South African’ variant. What is your take on this? Can a new system be developed?

There are essentially two naming systems that scientists use: there's the PANGO lineage system (e.g. B.1.1.7) and there's the Nextstrain system (501Y.V1). However, in the media we hear others, for example government designations like VOC-202012/01 [a numbered 'variant of concern']. I will admit that this can be confusing, especially when they're used kind of interchangeably and some people are using one and some people are using the other.

The two systems are actually fairly complementary in that they have different goals. The PANGO lineage system’s aim is to give a name to many little parts of the tree, so you can have a lineage that might only have three or four sequences in it. When you're looking at local outbreaks it can be really useful to have these detailed labels that describe very specific outbreaks. Of course there's no way to give nice happy memorable names to thousands of closely related things.

With the Nextstrain system the idea is not to try and label every variant but to talk about the major evolutionary points in the tree, or variants of concern. It’s a different system which is useful if you're talking about the larger pandemic evolution over time and the patterns you see in big datasets.

Reconciling those becomes difficult because they have different aims. But there are WHO meetings happening to try and figure out how we can do this, at least for the variants of concern, because it does matter how we talk about things that are going to be in headlines for a long time.

People resort to calling things by the region they were identified in, as even we have in this conversation. We're really trying to stay away from that, but if we have unpronounceable names it's understandable that people will fall back on the geographic names. The challenge is, can we come up with a standardised name that's easy to remember and easy to say and doesn't have a geographical association? It’s important that people who aren't intimately involved in science can get their heads around and identify these things.

And what are some of the ideas floating around about how you would do that?

I don't know exactly what the proposals are at the moment, but from conversations that have been happening on Twitter I found out that there is research on this type of naming that is totally unconnected to the pandemic. There are some really useful algorithms and systems people have already set up to tackle this problem, which some drug companies use for naming new medicines, apparently.

You can even feed in some words or characteristics that are associated with the thing you're trying to name, for example that it is more virulent. The system will then come up with letter combinations that are not a real word but that you can pronounce and sound like they could be a word. And you can even try to ensure they are not all English, Western-focused words.

Emma Hodcroft is a molecular epidemiologist at the University of Bern and a co-developer of the open source pathogen tracking projects Nextstrain.org and CoVariants.org

More information on global SARS-CoV-2 variation can be found on Nextstrain.org and CoVariants.org