Identifying pathogens is essential for effective surveillance and outbreak detection, and the decreasing cost of whole-genome sequencing (WGS) has recently made this easier. However, extracting relevant virulence genes from WGS data remains a challenge. In this study, we developed a web-based tool to predict virulence-associated genes in enterotoxigenic Escherichia coli (ETEC), which is a major concern for human and animal health. The database includes genes encoding the heat-labile toxin (LT) (eltA and eltB), heat-stable toxin (ST) (est), colonization factors CS1 through CS30, F4, F5, F6, F17, F18, and F41, as well as toxigenic invasion and adherence loci (tia, tibAC, etpBAC, eatA, yghJ, and tleA). To construct the database, we revised the existing ETEC nomenclature and used the VirulenceFinder webtool at the CGE website [VirulenceFinder 2.0 (dtu.dk)]. The database was tested on 1,083 preassembled ETEC genomes, two BioProjects (PRJNA421191 with 305 sequences and PRJNA416134 with 134 sequences), and the ETEC reference genome H10407. In total, 455 new virulence gene alleles were added, 50 alleles were replaced or renamed, and two were removed. Overall, our tool has the potential to greatly facilitate ETEC identification and improve the accuracy of WGS analysis. It can also help identify potential new virulence genes in ETEC. The revised nomenclature and expanded gene repertoire provide a better understanding of the genetic diversity of ETEC. Additionally, the user-friendly interface makes the tool accessible to users with limited bioinformatics experience.
IMPORTANCE
Detecting colonization factors in enterotoxigenic Escherichia coli (ETEC) is challenging due to their large number, their heterogeneity, and the lack of standardized tests. Therefore, it is important to include these ETEC-related genes in a more comprehensive VirulenceFinder database to obtain more complete coverage of the virulence gene repertoire of pathogenic types of E. coli. ETEC vaccines are of great importance due to the severity of the infections, primarily in children. A tool such as this could assist in the surveillance of ETEC by determining the prevalence of relevant types in different parts of the world, allowing vaccine developers to target the most prevalent types and thus develop a more effective vaccine.