Scientific Puzzles Surrounding the Wuhan Novel Coronavirus
By Yuhong Dong
The sudden outbreak of the Wuhan Novel Coronavirus (2019-nCoV) has resulted in all of China’s Hubei Province and three major cities in Zhejiang Province being subjected to quarantine. Other nations are anxiously trying to get their people out of China, and restrictions are being placed on flights to China. Because this novel virus has an extremely high transmission speed (high R0) and a high fatality rate, it is posing a significant challenge to public health, not only in China, but around the world.
Based on the limited information currently coming from China, there are major gaps in our knowledge of the virus’s origin, the duration of human-to-human transmission, and the clinical management of those infected. Nevertheless, the findings of scientists who have recently published research papers about this virus are summarized below.
Lancet Article Reports Wuhan Virus Not Likely Caused by Natural Recombination
Most papers reported that 2019-nCoV is only 88 percent related to the closest bat coronavirus, only 79 percent to SARS, and just 50 percent to MERS. Professor Roujian Lu from the China Key Laboratory of Biosafety, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, and his co-authors commented in a Jan. 30 paper in The Lancet that “recombination is probably not the reason for emergence of this virus.”
A Jan. 27, 2020, study by five Greek scientists analyzed the genetic relationships of 2019-nCoV and found that “the new coronavirus provides a new lineage for almost half of its genome, with no close genetic relationships to other viruses within the subgenus of sarbecovirus,” and has an unusual middle segment never seen before in any coronavirus. All this indicates that 2019-nCoV is a brand new type of coronavirus. The study’s authors rejected the hypothesis that 2019-nCoV emerged through a recent recombination event between different coronaviruses. (Paraskevis et al., 2020, bioRxiv) The article is a preprint made available through bioRxiv and has not been peer-reviewed.
Very High Genetic Identity in Patients Indicates a Recent Transmission to Humans
2019-nCoV is an RNA virus. RNA viruses have high natural mutation rates. The Lancet study by Lu et al. states: “As a typical RNA virus, the average evolutionary rate for coronaviruses is roughly 10⁻⁴ nucleotide substitutions per site per year, with mutations arising during every replication cycle. It is, therefore, striking that the sequences of 2019-nCoV from different patients described here were almost identical, with greater than 99.9% sequence identity. This finding suggests that 2019-nCoV originated from one source within a very short period and detected relatively rapidly.”
A Jan. 31 article by Jon Cohen in Science said: “The longer a virus circulates in a human population, the more time it has to develop mutations that differentiate strains in infected people, and given that the 2019-nCoV sequences analyzed to date differ from each other by seven nucleotides at most, this suggests it jumped into humans very recently. But it remains a mystery which animal spread the virus to humans.”
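The arithmetic behind these two quotes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome length of roughly 29,900 nucleotides is not given in the quoted passages and is assumed here, and the function names are hypothetical helpers, not part of any published analysis.

```python
# Rate quoted by Lu et al.: roughly 10^-4 substitutions per site per year.
SUBSTITUTION_RATE = 1e-4

# ASSUMPTION (not stated in the article): genome length of about 29,900 nt.
GENOME_LENGTH_NT = 29_900


def expected_substitutions_per_year(rate: float, genome_length: int) -> float:
    """Expected number of substitutions across the whole genome in one year."""
    return rate * genome_length


def percent_identity(differences: int, genome_length: int) -> float:
    """Percent identity of two equal-length genomes that differ at `differences` sites."""
    return 100.0 * (1 - differences / genome_length)


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    per_year = expected_substitutions_per_year(SUBSTITUTION_RATE, GENOME_LENGTH_NT)
    print(f"Expected substitutions per genome per year: {per_year:.2f}")

    # Cohen's article: sequences differ by seven nucleotides at most.
    identity = percent_identity(7, GENOME_LENGTH_NT)
    print(f"Identity with 7 differences: {identity:.3f}%")
```

Under the assumed genome length, the quoted rate works out to about three substitutions per genome per year, and a difference of seven nucleotides corresponds to an identity above the 99.9% figure cited by Lu et al.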
Bat or Huanan Market Source Is Not the Whole Story
Lu et al. also discussed the natural host of the virus. An early hypothesis was that the virus had passed to humans from bats sold at Wuhan’s Huanan Seafood Market.
Lu et al. write: “First, the outbreak was first reported in late December 2019, when most bat species in Wuhan are hibernating. Second, no bats were sold or found at the Huanan seafood market, whereas various non-aquatic animals (including mammals) were available for purchase. Third, the sequence identity between 2019-nCoV and its close relatives bat-SL-CoVZC45 and bat-SL-CoVZXC21 was less than 90%. Hence, bat-SL-CoVZC45 and bat-SL-CoVZXC21 are not direct ancestors of 2019-nCoV.”
The authors point out that while the 2019-nCoV causing the Wuhan outbreak might have initially been hosted by bats, it may have been transmitted to humans via other as yet unknown mechanisms.
The Science article said: “Huanan marketplace played an early role in spreading 2019-nCoV, but whether it was the origin of the outbreak remains uncertain. Many of the initially confirmed 2019-nCoV cases—27 of the first 41 in one report, 26 of 47 in another—were connected to the Wuhan market, but up to 45%, including the earliest handful, were not. This raises the possibility that the initial jump into people happened elsewhere.”
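The “up to 45%” figure can be checked against the case counts quoted above. The short sketch below uses only the numbers given in the Science excerpt; the function name and report labels are hypothetical and chosen for illustration.

```python
def share_not_linked(linked: int, total: int) -> float:
    """Percentage of confirmed cases with no connection to the market."""
    return 100.0 * (total - linked) / total


# (cases linked to the market, total cases) as quoted in the Science article.
reports = {
    "first report": (27, 41),
    "second report": (26, 47),
}

for name, (linked, total) in reports.items():
    print(f"{name}: {share_not_linked(linked, total):.1f}% not linked to the market")
# first report: 34.1% not linked to the market
# second report: 44.7% not linked to the market
```

The second report’s 21 unlinked cases out of 47 account for the roughly 45% upper figure.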
Spike Protein Has 4 Precise Mutations Without Impacting Its Affinity for Human Receptor
A virus that infects humans must bind to a receptor on human cells; it can replicate only inside those cells and relies on their machinery to do so. Without these capabilities, viruses found circulating in blood or tissue fluids are easily cleared by the human immune system.
Viruses enter human cells via specific surface protein channels. The binding of viral surface proteins to human cells is similar to how keys open locks.
Previous studies have shown that different coronaviruses bind to several receptors, such as angiotensin-converting enzyme 2 (ACE2) for SARS-CoV. ACE2 receptors, which are abundant in human tissue, especially along the epithelial linings of the lung and small intestine, provide routes of entry into cells for SARS-CoV.
According to Lu et al.’s Lancet paper, there is a structural similarity between the receptor-binding domains of SARS-CoV and 2019-nCoV. The 2019-nCoV spike protein (S-protein) is responsible for binding to cell receptors and is crucial for viral targeting of host tissue. The molecular modelling data of Lu et al. suggest that, despite the presence of amino acid mutations in the 2019-nCoV receptor-binding domain, 2019-nCoV might use the ACE2 receptor to gain entry into host cells.
On Jan. 21, 2020, Xintian Xu et al. from the Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, China, published a paper entitled “Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission” in SCIENCE CHINA Life Sciences. This paper provided a more precise analysis of the S-protein of Wuhan 2019-nCoV.
Xu et al. write: “The S-protein was known to usually have the most variable amino acid sequences compared to other gene domains from coronavirus. However, despite considerable genetic distance between the Wuhan CoV and the human-infecting SARS-CoV, and the overall low homology of the Wuhan CoV S-protein to that of SARS-CoV, the Wuhan CoV S-protein had several patches of sequences in the receptor binding (RBD) domain with a high homology to that of SARS-CoV. The residues at positions 442, 472, 479, 487, and 491 in SARS-CoV S-protein were reported to be at receptor complex interface and considered critical for cross species and human-to-human transmission of SARS-CoV. So to our surprise, despite replacing four out of five important interface amino acid residues, the Wuhan CoV S-protein was found to have a significant binding affinity to human ACE2. The replacing residues at positions 442, 472, 479, and 487 in the Wuhan CoV S-protein did not alter the structural conformation. The Wuhan CoV S-protein and SARS-CoV S-protein shared an almost identical 3-D structure in the RBD domain, thus maintaining similar van der Waals and electrostatic properties in the interaction interface. Thus the Wuhan CoV is still able to pose a significant public health risk for human transmission via the S protein–ACE2 binding pathway.” (emphasis added)
We already know that the novel 2019-nCoV is a different virus from SARS. It is understood that the S-protein is highly variable. It would be no surprise if the genetic sequence, protein structure, and even the function of 2019-nCoV’s S-protein were different from those of the SARS virus. But how could this novel virus be so intelligent as to mutate precisely at selected sites while preserving its binding affinity to the human ACE2 receptor? How did the virus change just four amino acids of the S-protein? Did the virus know how to use Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) to make sure this would happen?
Stunning Finding: S-Protein Insertions From HIV
On Jan. 27, 2020, Prashant Pradhan et al. from the Indian Institute of Technology published a paper entitled “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag,” which is currently being revised. The corresponding author of this paper, Professor Bishwajit Kundu, specializes in protein genetics and genetic engineering and has about 41 papers indexed on PubMed over the past 17 years, including some in high-impact biomedical journals.
The authors found 4 insertions in the spike glycoprotein (S) which are unique to the 2019-nCoV and are not present in other coronaviruses. “Importantly, amino acid residues in all 4 inserts have identity or similarity to those of HIV-1 gp120 or HIV-1 Gag. Interestingly, despite the inserts being discontinuous on the primary amino acid sequence, 3D-modelling of the 2019-nCoV suggests that they converge to constitute the receptor binding site. The finding of 4 unique inserts in the 2019-nCoV, all of which have identity/similarity to amino acid residues in key structural proteins of HIV-1 is unlikely to be fortuitous in nature.” (emphasis added)
Pradhan et al. added, “To our surprise, these sequence insertions were not only absent in S-protein of SARS but were also not observed in any other member of the Coronaviridae family. This is startling as it is quite unlikely for a virus to have acquired such unique insertions naturally in a short duration of time.”
“Unexpectedly, all the insertions got aligned with Human immunodeficiency Virus-1 (HIV-1). Further analysis revealed that aligned sequences of HIV-1 with 2019-nCoV were derived from surface glycoprotein gp120 (amino acid sequence positions: 404-409, 462-467, 136-150) and from Gag protein (366-384 amino acid). Gag protein of HIV is involved in host membrane binding, packaging of the virus and for the formation of virus-like particles. Gp120 plays crucial role in recognizing the host cell by binding to the primary receptor CD4. This binding induces structural rearrangements in GP120, creating a high affinity binding site for a chemokine co-receptor like CXCR4 and/or CCR5.”
It is well known that CD4 cells are essential to human immunity and are the direct targets of the Human Immunodeficiency Virus, or HIV. HIV attaches to CD4 cells, enters them, and infects them. The virus then turns each infected CD4 cell into a factory creating more HIV until eventually all CD4 cells are destroyed. People infected with HIV lose their immunity, or defense system, which is like a country losing the function of its army.
If we take a closer look at the 4 insertions of the S-protein in Figure 3 (from Pradhan et al., 2020, bioRxiv), they are all located on the binding surface of the protein, seemingly designed to be able to bind to target cell receptor sites. Natural random mutations would be expected to be distributed across the whole length of the S-protein. It is highly unlikely that all of these insertions would coincidentally appear on the binding site of the S-protein.
The article by Pradhan et al. is a preprint made available through bioRxiv and has not been peer-reviewed.
bioRxiv reports: “This paper has been withdrawn by its authors. They intend to revise it in response to comments received from the research community on their technical approach and their interpretation of the results. If you have any questions, please contact the corresponding author.”