Viral evolution analysis and visualization based on a Shannon's Entropy approach
Felipe Tambasco, Gerardo Martínez Jaunarena, Diego Simón, Gonzalo Moratorio, Maria Ines Fariello, Marco Vignuzzi, Federico Lecumberry
2018 Institut Pasteur International Network Symposium: Combating Resistance: Microbes and Vectors, Paris, France, 15-16 Nov 2018
Research Group(s): Tratamiento de Imagenes (gti)
Department(s): Procesamiento de Señales


Drug-resistant pathogens are currently considered a major public health problem. The emergence of drug resistance can be monitored by deep sequencing over short periods of time, fine-tuning new methods, mathematically modeling the data and then detecting sites that are subject to positive selection. Due to their high mutation rates and short generation times, viruses are a good model for studying this phenomenon. Because several alleles of a viral population are highly likely to be found at any given position of the genome just by chance, the consensus allele will appear at high frequency alongside several codons at low frequency. For each viral population traced, we have a multidimensional array containing the codon frequencies across temporal passages for each position of the viral genome. Three Coxsackie virus B3 (CVB3) variants differing in their mutation rate were tested in this work. We use Shannon's entropy to represent the variability of codon frequencies: entropy is close to zero at highly conserved positions and increases when several codons are present. Using this transformation, the dimensionality of the data was significantly reduced without losing key information related to the underlying evolutionary processes. Given that the 3D crystal structure of CVB3 is known, our approach allowed us to directly visualize the evolution of codon variation through time. We mapped specific regions of the capsid where the virus tolerates this variation: sites with the highest entropy values coded for amino acids on the outer part of the capsid. Entropy was decomposed according to its rate of temporal evolution into two processes, Leading and Random Variations, associated with the slower and faster changes through passages, respectively. Several statistical and machine learning analyses were applied to these data to cluster sites in the genome based on their evolutionary behavior and to differentiate among the three viral variants.
Some of the outliers pinpointed by these methods have been shown by other authors to be sites under selection. Altogether, we are testing new analysis tools and visualization methods for rapidly detecting relevant sites under ongoing selection, for example to differentiate the evolution of viruses in a new environment, such as a new drug treatment.
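
As an illustration of the entropy transformation described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the codon frequencies are stored as a NumPy array of shape (passages, positions, codons) with 64 possible codons. The function name `shannon_entropy`, the array layout, the use of base-2 logarithms and the toy data are all assumptions made for this example. Collapsing the codon axis into one entropy value per passage and position is the dimensionality reduction the abstract refers to.

```python
import numpy as np


def shannon_entropy(freqs, axis=-1, base=2.0):
    """Shannon entropy along `axis` of an array of non-negative frequencies.

    Hypothetical helper: name, signature and log base are assumptions.
    """
    freqs = np.asarray(freqs, dtype=float)
    totals = freqs.sum(axis=axis, keepdims=True)
    # Normalise so the frequencies at each site sum to 1
    # (sites with no observations are left as all zeros).
    p = np.divide(freqs, totals, out=np.zeros_like(freqs), where=totals > 0)
    # By convention 0 * log(0) is treated as 0.
    logp = np.log(p, out=np.zeros_like(p), where=p > 0)
    h = -(p * logp).sum(axis=axis) / np.log(base)
    return h + 0.0  # turn -0.0 into 0.0 for conserved sites


# Toy data (invented for illustration): 2 passages, 3 positions, 64 codons.
n_passages, n_positions, n_codons = 2, 3, 64
freqs = np.zeros((n_passages, n_positions, n_codons))
freqs[:, 0, 0] = 1.0             # fully conserved site
freqs[:, 1, [0, 1]] = 0.5        # two codons equally frequent
freqs[:, 2, :] = 1.0 / n_codons  # every codon equally frequent

# One entropy value per (passage, position): the codon axis is collapsed.
entropy = shannon_entropy(freqs)
print(entropy.shape)
print(entropy)
```

In this toy case the conserved site has zero entropy, the site with two equally frequent codons has one bit, and the uniformly distributed site reaches the maximum of six bits for 64 codons.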
