Abstract
Gene arrays and operons that encode functionally linked proteins form the most basic unit of transcriptional regulation in bacteria. Rules that govern the order and orientation of genes in these systems have been defined; however, these were based on a small set of genomes that may not be representative. The growing availability of large genomic datasets presents an opportunity to test these rules, to define the full range and diversity of these systems, and to understand their evolution. Here we present SLING, a tool to Search for LINked Genes that looks for a single functionally essential gene together with its neighbours within a rule-defined proximity (https://github.com/ghoresh11/sling/wiki). Examining this subset of genes enables us to understand the basic diversity of these genetic systems in large datasets. We demonstrate the utility of SLING on a clinical collection of enteropathogenic Escherichia coli for two relevant operons: toxin-antitoxin (TA) systems and RND efflux pumps. By examining the diversity of these systems, we gain insight into distinct classes of operons that show variable levels of prevalence and ability to be lost or gained. The importance of this analysis is not limited to TA systems and RND pumps, and can be expanded to understand the diversity of many other relevant gene arrays.
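The idea of collecting a gene's neighbours within a rule-defined proximity can be illustrated with a minimal Python sketch. This is not SLING's implementation or API: the `Gene` record, the `find_linked_neighbours` function, and the `max_distance` and `same_strand` rules are hypothetical names and parameters chosen only for illustration, and the coordinates are made-up example data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical gene record; not a SLING data structure."""
    name: str
    contig: str
    start: int
    end: int
    strand: str  # "+" or "-"


def find_linked_neighbours(
    genes: List[Gene],
    hit_name: str,
    max_distance: int = 50,
    same_strand: bool = True,
) -> List[Gene]:
    """Return genes lying within max_distance bp of the named hit gene.

    The two rules (maximum gap, optional same-strand requirement) are
    assumptions standing in for whatever rules a real analysis defines.
    """
    hit = next(g for g in genes if g.name == hit_name)
    linked = []
    for g in genes:
        if g is hit or g.contig != hit.contig:
            continue
        if same_strand and g.strand != hit.strand:
            continue
        # Gap between the two intervals; a negative value means they overlap.
        gap = max(g.start, hit.start) - min(g.end, hit.end)
        if gap <= max_distance:
            linked.append(g)
    return linked


# Made-up example: a toxin gene with a candidate antitoxin directly upstream.
genes = [
    Gene("antitoxin", "contig1", 100, 400, "+"),
    Gene("toxin", "contig1", 420, 700, "+"),
    Gene("opposite", "contig1", 710, 1000, "-"),
    Gene("far", "contig1", 5000, 5600, "+"),
    Gene("elsewhere", "contig2", 430, 600, "+"),
]

strict = [g.name for g in find_linked_neighbours(genes, "toxin")]
relaxed = [g.name for g in find_linked_neighbours(genes, "toxin", same_strand=False)]
print(strict)
print(relaxed)
```

With the same-strand rule enabled, only the adjacent upstream gene is reported; relaxing it also admits the nearby gene on the opposite strand. Changing the rules in this way is what lets one search strategy be reused for differently organised systems.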
Original language | English |
---|---|
Article number | e128 |
Number of pages | 10 |
Journal | Nucleic Acids Research |
Volume | 46 |
Issue number | 21 |
Early online date | 16 Aug 2018 |
DOIs | |
Publication status | Published - 30 Nov 2018 |
Keywords
- genomics
- computational methods
- bacterial datasets