In a recent study published on the bioRxiv* preprint server, researchers developed and validated an approach for the joint inference of measurement noise and genetic drift by analyzing time-series data of lineage frequencies.
Random genetic drift in the population-level dynamics of infectious disease outbreaks results from the randomness of transmission between hosts and of host death or recovery. Studies have reported strong genetic drift in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences due to superspreading events, which are predicted to significantly affect the viral evolution and epidemiology of coronavirus disease 2019 (COVID-19). Noise due to the measurement process, including bias in acquiring location and time data, can distort estimates of genetic drift.
About the study
In the current study, researchers developed an approach to jointly infer the strength of measurement noise and genetic drift from time-series lineage frequency data. The approach allowed measurement noise to be overdispersed (rather than assuming uniform sampling) and allowed the strength of overdispersion to vary over time (instead of being constant). They also validated the accuracy of the approach through simulations.
A hidden Markov model (HMM) was used with continuous hidden and observed states representing the true and observed frequencies, respectively. The transition probability between hidden states was determined by genetic drift, with the mean true frequency given by the true frequency in the previous period. For rare frequencies, the variance scaled with the mean, depending on the effective population size [Ne
The emission probability between the hidden and observed states was determined by measurement noise, so that the mean of the observed frequency was equal to the true frequency. For rare frequencies, the variance of the observed frequency scaled with the mean, with a time-dependent factor capturing deviations from uniform sampling. Modeling was performed assuming that the number of individuals and the lineage frequencies were high enough to apply the central limit theorem.
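The two-layer structure described above (drift acting on the hidden true frequency, measurement noise acting on the observed frequency) can be illustrated with a small forward simulation. This is only a sketch under stated assumptions, not the study's implementation: the function name `simulate_lineage`, the parameters `ne`, `c`, and `n_seq`, and the exact Gaussian variance forms are hypothetical choices made for illustration, and both `ne` and `c` are held constant here for brevity even though the study lets them vary over time.

```python
import random


def simulate_lineage(f0, ne, c, n_seq, steps, seed=0):
    """Simulate true (hidden) and observed frequencies of one lineage.

    Assumed parameterization, for illustration only:
      transition: f[t+1] ~ Normal(f[t], f[t] * (1 - f[t]) / ne)
      emission:   obs[t] ~ Normal(f[t], c * f[t] * (1 - f[t]) / n_seq)
    Here c = 1 stands for uniform sampling and c > 1 for overdispersed
    measurement noise; n_seq is the number of sequences sampled per step.
    """
    rng = random.Random(seed)
    true_f, obs_f = [], []
    f = f0
    for _ in range(steps):
        true_f.append(f)

        # Emission: observed frequency is centered on the true frequency.
        sd_obs = (c * f * (1 - f) / n_seq) ** 0.5
        obs = rng.gauss(f, sd_obs)
        obs_f.append(min(max(obs, 0.0), 1.0))

        # Transition: genetic drift perturbs the true frequency; a smaller
        # effective population size gives a larger variance.
        sd_drift = (f * (1 - f) / ne) ** 0.5
        f = min(max(rng.gauss(f, sd_drift), 0.0), 1.0)
    return true_f, obs_f


if __name__ == "__main__":
    true_f, obs_f = simulate_lineage(f0=0.01, ne=1000, c=5.0, n_seq=2000, steps=20)
    for t, (ft, ot) in enumerate(zip(true_f, obs_f)):
        print(t, round(ft, 4), round(ot, 4))
```

For a rare lineage (f much smaller than 1), both variances are approximately proportional to f, which matches the statement that the variance scales with the mean. Inference would run in the opposite direction, recovering the drift and noise parameters from the observed series alone.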
The team generated “superlineages” by grouping lineages based on phylogenetic distance so that the total abundance and frequency of each superlineage exceeded a threshold, representing 486, 4,083, 6,225, and 24,867 SARS-CoV-2 lineages for the pre-B.1.177, B.1.177, Alpha, and Delta variants, respectively. The team assumed that the Ne
Next, the most likely parameters for the dataset were determined. The model was validated by running simulations with time-varying Ne
The derived Ne
Even after adjusting for measurement noise, the strength of genetic drift was consistently, over time, one to three orders of magnitude higher than expected from the observed number of SARS-CoV-2-positive individuals in England. Superspreading could not explain the elevated genetic drift, but deme (community) structure in the hosts’ contact network may partly explain it. Corrections for epidemiological dynamics (SIR or SEIR modeling) also could not account for the discrepancy.
Sampling of SARS-CoV-2-infected individuals from the population of England was largely uniform across the dataset. The team found evidence of spatial structure in the transmission dynamics of the B.1.177, Alpha, and Delta variants. The estimated Ne
The HMM-derived Ne
Overall, the study results showed that the strength of genetic drift in SARS-CoV-2 transmission in England was greater than expected, and indicated that further modeling research is needed to better understand the mechanisms behind the high levels of genetic drift for SARS-CoV-2 in England.
bioRxiv publishes preliminary scientific reports that have not been peer-reviewed and therefore should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.