Does SARS-CoV-2 reverse transcribe and integrate into our genome? — Deplatform Disease

The short version: A preprint has emerged claiming that there is evidence that SARS-CoV-2 is reverse transcribed and integrated into the human genome. None of the evidence it provides justifies such a conclusion, and it demonstrates a failure to understand fundamental aspects of coronavirus biology and, frankly, the limitations of the methods used to reach that conclusion. Furthermore, there even appears to be an attempt by the preprint authors to make their data more difficult to scrutinize, because it is available only upon request and is not included in the paper. Its findings, even if true (something I have significant doubts about), have no relevance for mRNA vaccines.

A preprint has recently surfaced and been seized upon as proof that SARS-CoV-2 is being reverse transcribed into our genome, and somehow the argument has been extended to mean that vaccines against SARS-CoV-2, in particular mRNA vaccines, are unsafe. If you want a short read of the problems, Marius Walter, a postdoc in the Verdin lab, summarized the implausibility of the paper’s claims fairly comprehensively.

The crux of the argument per the preprint is based mainly on a few observations:

The claim that SARS-CoV-2 is being reverse transcribed into our genome is an extraordinary claim, and in science we have a saying: extraordinary claims require extraordinary evidence. Let me be very explicit here: by no stretch of the imagination does this paper provide any convincing evidence to support this idea, let alone something close to the Sagan standard.

First, though, let’s discuss why this claim is extraordinary:

Firstly, the leap from “persistently positive PCR” to “reverse transcription and integration” is absolutely not justified. The idea that the only way RNA viruses can possibly cause persistent infection is by integrating into the host genome is false; it is in fact not even the only strategy that HIV, the virus probably best known for reverse transcription, uses to establish persistent infection. The persistence of a virus inside a host requires that the host’s immune system be unable to clear it. Some examples of how that may be accomplished other than reverse transcription:

I should point out that reverse transcription into the genome of an errant cell would never, on its own, be sufficient to result in a persistent infection, because as long as the cell kept producing viral transcripts, the immune system would destroy it (unless the host had certain immunological defects).

Secondly, although deep profiling of the genome does appear to identify RNA viral sequences that are not from retroviruses, this is still an extraordinarily rare event (though that rarity is subject to the limitation that the virus in question would have to be able to infect germ cells; notably, it has never been observed for coronaviruses, even though sequences from many other RNA viruses have been found). So in short, the leap to “SARS-CoV-2 is routinely being reverse transcribed and integrating into our genome” from this foundation is a truly extraordinary claim, and the persistently positive PCRs are explainable by simpler mechanisms.

To be clear, the assumption that someone with persistently positive PCRs has actual infectious virus in them is not necessarily correct, which, to their credit, the authors do acknowledge. The nature of the replication of SARS-CoV-2 and other coronaviruses (something I will return to shortly, as it explains the next issue) means that viral RNA can persist for a prolonged time within double-membraned vesicles that may not be readily accessible to the immune system or to nucleases within the infected cell. Alternatively, replication could be occurring within the cell at such a low level that it is not even lytic, so these individuals are not infectious; this seems to account for at least some of the persistent positives. (Persistent positives in someone with significant immunocompromise should be treated with caution, however, as they would be expected to have difficulty clearing virus, and thus the pre-test probability of a positive PCR is high.) However, if YOU (the person reading this) have persistently positive PCR results, you should not assume that they are artifacts of the method.

The other point has to do with the presence of chimeric sequences via RNA-seq analysis. RNA-seq is a method to analyze which genes are “on” within a particular cell by attempting to profile the RNA within. There are many variations, but in general it starts by taking primers that are complementary to a bunch of genes (so that the RNA transcripts stick to them) and then running a reverse transcription reaction. Here’s the key point: the reverse transcription reaction often undergoes template switching. What that means is the reverse transcriptase starts a reaction on one RNA, then pauses, and then wanders onto another RNA. The result is it makes a DNA which has a piece of the sequence from one RNA and then a piece of the sequence of the other RNA. After this is done, bioinformatics analysis is used to align the sequences to the genome and see which genes were “on.” In other words: THIS IS LITERALLY A PROCEDURE WHICH GENERATES CHIMERIC TRANSCRIPTS. If a cell is infected with SARS-CoV-2, some of those transcripts will have pieces of SARS-CoV-2’s genome on them which will result in… SARS-CoV-2/human (or whatever type of cell it is) chimeric sequences. If I decided to infect the cell with any other virus, I would expect to get chimeric transcripts of human/my favorite virus. To state it more bluntly, the findings of this paper are explainable entirely by artifactual findings that result from the nature of the method.
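To make the template-switching point concrete, here is a toy simulation. It is only a sketch under loud assumptions: the two "RNAs" are made-up strings, the function names are hypothetical, the per-base switching probability is an arbitrary illustrative number rather than a measured rate, and the copy is not complemented the way real cDNA would be. The only thing it is meant to show is that an enzyme which occasionally hops between templates will produce host/virus chimeras with no integration step anywhere in the model.

```python
import random

# Made-up toy sequences; they do not correspond to any real transcript.
TOY_RNAS = {
    "host_mRNA": "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
    "viral_RNA": "AUGUUUGUUUUUCUUGUUUUAUUGCCACUAGUCUCUAGU",
}


def simulate_read(rnas, read_len, switch_prob, rng):
    """Copy up to `read_len` bases from a randomly chosen template.

    Before each base is copied, the enzyme jumps to a random position on a
    different template with probability `switch_prob`. Requires at least two
    templates. Returns the copied string and the source template of each base.
    """
    names = sorted(rnas)
    name = rng.choice(names)
    pos = rng.randrange(len(rnas[name]))
    bases, sources = [], []
    for _ in range(read_len):
        if rng.random() < switch_prob:
            # Template switch: land somewhere on a different RNA.
            name = rng.choice([n for n in names if n != name])
            pos = rng.randrange(len(rnas[name]))
        if pos >= len(rnas[name]):
            break  # ran off the end of the template, so the read stops here
        bases.append(rnas[name][pos])
        sources.append(name)
        pos += 1
    return "".join(bases), sources


def count_chimeric(rnas, n_reads, read_len, switch_prob, seed=0):
    """Count simulated reads that contain bases from more than one template."""
    rng = random.Random(seed)
    chimeric = 0
    for _ in range(n_reads):
        _, sources = simulate_read(rnas, read_len, switch_prob, rng)
        if len(set(sources)) > 1:
            chimeric += 1
    return chimeric


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.05):
        n = count_chimeric(TOY_RNAS, n_reads=10_000, read_len=20, switch_prob=p)
        print(f"switch_prob={p}: {n} chimeric reads out of 10000")
```

With the switching probability set to zero there are no chimeras at all; any nonzero value produces some. Nothing in this model touches a genome, which is the point.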

There is also a secondary explanation having to do with the biology of the coronaviruses themselves. Coronaviruses recombine and mutate very well because their RNA-dependent RNA polymerase (the machinery which replicates their genome) is also prone to template switching. In other words, a coronavirus RNA polymerase could start with copying a coronavirus gene, pause, then wander and pick up a host RNA and then copy over that to make a chimeric transcript.

This would actually be very easy to show. Our genes undergo splicing to remove sequences called introns so that only exons remain. If this were an artifact, we would expect that essentially all of the chimeric sequences would contain exons (i.e. no introns) from the cell in question. So let’s do that. Except…

So basically, if you want to scrutinize the data, you have to ask the author for the findings. Why this would not be included in the supplementary data is a complete mystery to me.
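For what it’s worth, the exon/intron check described above is not complicated once you have the reads. Below is a minimal sketch of the logic, assuming the human portion of each chimeric read has already been aligned and is available as a list of genomic intervals. The coordinates, the gene model, and the function names are all hypothetical; a real analysis would take them from an aligner’s output and a gene annotation.

```python
from collections import Counter


def classify_human_segment(blocks, exons):
    """Classify the human part of one chimeric read.

    blocks: aligned genomic intervals (start, end), half-open, in read order.
    exons:  exon intervals (start, end), half-open, for the gene in question.
    """
    def inside_exon(start, end):
        return any(es <= start and end <= ee for es, ee in exons)

    # Any aligned base outside an annotated exon (intron or intergenic) is
    # what you would not expect from mature, spliced mRNA.
    if not all(inside_exon(s, e) for s, e in blocks):
        return "contains_non_exonic"

    # Consecutive blocks that end exactly at one exon's end and resume at a
    # later exon's start indicate a splice junction, i.e. an mRNA origin.
    exon_starts = {es for es, _ in exons}
    exon_ends = {ee for _, ee in exons}
    for (_, e1), (s2, _) in zip(blocks, blocks[1:]):
        if e1 in exon_ends and s2 in exon_starts and s2 > e1:
            return "spliced"

    # Entirely within exon sequence but no junction: cannot tell either way.
    return "exonic_unspliced"


def summarize(reads, exons):
    """Tally classifications over many reads (each read is a list of blocks)."""
    return Counter(classify_human_segment(blocks, exons) for blocks in reads)


if __name__ == "__main__":
    toy_exons = [(100, 200), (300, 400)]   # hypothetical two-exon gene
    toy_reads = [
        [(150, 200), (300, 350)],          # crosses the exon-exon junction
        [(120, 180)],                      # sits inside exon 1
        [(150, 250)],                      # runs into the intron
    ]
    print(summarize(toy_reads, toy_exons))
```

If the chimeras are artifacts made from mRNA, the tally should be dominated by the two exonic categories; a substantial share of reads carrying intronic or intergenic sequence would be harder to explain that way.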

I would also add that the paper never examines the cells for evidence of a complete SARS-CoV-2 genome, and thus even if we are to take its findings as being truthful and rigorous (which there is strong reason to suspect they are not), the fragments of SARS-CoV-2’s genome are not sufficient for pathogenesis or persistent infection.

This paper does not substantiate the claim that SARS-CoV-2 is being reverse transcribed and integrating into the genome, and seems to be totally unaware of what an extraordinary claim that is. The experiments it does are not a good representation of what may be going on inside a real human. People aren’t cell lines; cell lines have complex, multifaceted genomic differences from our cells (that’s how they get to be immortal).

Overexpressing LINE-1 and then observing more reverse transcription does not support anything. LINE-1 RT levels inside the cell are low, and despite the abundance of LINE elements in the genome, only about 60 of them are active (consider that LINEs account for ~21% of the human genome and there are an estimated 860,000 such elements within it). I could conceive of a very rare event within the cell in which a LINE-1 sequence grabs the wrong RNA, traffics back into the nucleus, and integrates; that’s sort of how LINE-1 elements work (though again, they are sequence-specific, and given the short lifetime of RNAs within the cell, the probability that any specific non-LINE RNA could be picked up by mistake is infinitesimal). But coronavirus replication occurs in replication transcription complexes (RTCs) that are segregated from the rest of the cytoplasm. I find it very hard to believe that a LINE-1 RT could access these RNA sequences and reverse transcribe them.

On the point of what this means for an mRNA vaccine: literally nothing. This paper has absolutely nothing to do with them. If you’re wondering what would happen if the RNA from a vaccine were accidentally picked up by this proposed mechanism and integrated into the host cell, any of the following scenarios:

I hope that gives you some appreciation for how incredibly hard successful gene therapy is.

This preprint draws conclusions that are not supported by its data. Its findings are most readily explained by artifacts of the methods used, and it doesn’t consider key aspects of coronavirus biology that would also explain the results. I am unconvinced, and even if the findings were true, I would have no concerns about what they would mean for an mRNA vaccine.


I write about vaccines here. You can find me on Twitter @enirenberg and at (where I publish the same content without a paywall)