The host specificity of bacteriophages is determined by their receptor-binding proteins (RBPs), which mediate the initial contact with the receptor on the bacterial cell envelope. The Klebsiella pneumoniae capsule, which serves as the receptor for many Klebsiella phages, is a crucial virulence factor with a highly diverse sugar composition. This diversity is matched by a correspondingly large variety of Klebsiella phage RBPs, each containing a specific polysaccharide-depolymerizing domain. RBP characterisation is therefore essential for the development of phage therapy. Identifying RBPs in phage genomes manually is laborious, and many tools are being developed to streamline the process. DPO [1], DePP [2], and DepoScope [3] are examples of machine learning tools (MLTs) specifically designed to identify depolymerase sequences in bacteriophage genomes. Our objective is to assess how reliable depolymerase prediction by MLTs is compared with a manual search. The analysis started from a collection of proteins from Przondoviruses and Drulisviruses (with KP32 and KP34 as the respective model phages). These proteins were analysed with DPO, DePP, and DepoScope. In parallel, depolymerases were identified manually in the same genomes, following previously described criteria [4] and using PHYRE2 [5], AlphaFold [6], and genome inspection. First, we searched for the combination of thresholds that recovers all manually selected depolymerases from the MLT output. Then, we analysed the false positives produced by each tool. The examined MLTs differ markedly in both the number of false positives and the number of correctly predicted depolymerases, indicating that they are useful as a first step in a depolymerase search but that their results still require manual curation.
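The threshold search and false-positive count described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each tool reports one score per protein and calls a protein a depolymerase when the score is at or above a threshold. The tool names, protein identifiers, scores, and function names are all hypothetical, and the real output formats of DPO, DePP, and DepoScope are not modelled.

```python
from typing import Dict, Set

# Hypothetical per-protein scores for two illustrative tools (not real data).
SCORES: Dict[str, Dict[str, float]] = {
    "tool_A": {"p1": 0.95, "p2": 0.60, "p3": 0.70, "p4": 0.10},
    "tool_B": {"p1": 0.80, "p2": 0.90, "p3": 0.30, "p4": 0.85},
}
# Hypothetical set of manually curated depolymerases.
CURATED: Set[str] = {"p1", "p2"}


def max_threshold_full_recall(scores: Dict[str, float], curated: Set[str]) -> float:
    """Return the highest threshold at which every curated protein is still predicted.

    With the rule "predicted if score >= threshold", this is the lowest score
    among the curated proteins.
    """
    if not curated:
        raise ValueError("curated set is empty")
    missing = curated - set(scores)
    if missing:
        raise ValueError(f"no score available for: {sorted(missing)}")
    return min(scores[p] for p in curated)


def false_positives(scores: Dict[str, float], curated: Set[str], threshold: float) -> Set[str]:
    """Return proteins predicted at the given threshold that are not in the curated set."""
    return {p for p, s in scores.items() if s >= threshold and p not in curated}


def summarise(all_scores: Dict[str, Dict[str, float]], curated: Set[str]) -> Dict[str, dict]:
    """For each tool, report the full-recall threshold and the resulting false positives."""
    summary = {}
    for tool, scores in all_scores.items():
        threshold = max_threshold_full_recall(scores, curated)
        summary[tool] = {
            "threshold": threshold,
            "false_positives": sorted(false_positives(scores, curated, threshold)),
        }
    return summary


for tool, result in summarise(SCORES, CURATED).items():
    print(tool, result)
```

Choosing the threshold per tool as the lowest score among curated proteins guarantees full recall of the manual set by construction, so the remaining difference between tools shows up only in the false positives, which are the candidates that would go on to manual curation.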
Focus: Academic