TY - JOUR
T1 - Different evolutionary trends form the twilight zone of the bacterial pan-genome
AU - Horesh, Gal
AU - Taylor-Brown, Alyce
AU - McGimpsey, Stephanie
AU - Lassalle, Florent
AU - Corander, Jukka
AU - Heinz, Eva
AU - Thomson, Nicholas R.
PY - 2021/11/1
Y1 - 2021/11/1
N2 - The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species’ pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7500 Escherichia coli genomes, one of the most-studied bacterial species and used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species’ pan-genome.
AB - The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species’ pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7500 Escherichia coli genomes, one of the most-studied bacterial species and used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species’ pan-genome.
KW - E. coli
KW - Evolutionary dynamics
KW - HGT
KW - Pan-genome
UR - http://www.scopus.com/inward/record.url?scp=85117537499&partnerID=8YFLogxK
UR - https://github.com/ghoresh11/twilight/tree/master/manuscript_scripts
U2 - 10.1099/MGEN.0.000670
DO - 10.1099/MGEN.0.000670
M3 - Article
C2 - 34559043
AN - SCOPUS:85117537499
SN - 2057-5858
VL - 7
JO - Microbial Genomics
JF - Microbial Genomics
IS - 9
M1 - 000670
ER -