Abstract
Background Cancer immunotherapies have become an effective tool for fighting cancer by manipulating the cellular immune response. A key step of cellular immunity involves peptides binding to Human Leukocyte Antigen (HLA) molecules to form a stable peptide-HLA (pHLA) complex. pHLAs are transported to the cell surface, where they can be ‘inspected’ by T cells through highly specific T-cell receptors (TCRs). Understanding these interactions is central to designing peptide-based vaccines and T-cell-based immunotherapies. Key limitations to the widespread use of cancer immunotherapies include the lack of immunogenicity of tumor-associated peptide antigens and the risk of off-target effects causing immune-related adverse events. Different approaches are being explored to address and minimize these issues. Accurate prediction of pHLA immunogenicity is therefore a critical area for advancing the design of cancer immunotherapies. However, current immunogenicity prediction tools are limited to identifying peptide motifs, neglecting critical structural features and TCR-specific recognition. We developed a structure-based machine learning tool that uses models of HLA-peptide-TCR complexes to extract features predictive of immunogenicity.
Methods A labelled dataset of over 15 million pHLA sequences with experimentally determined immunogenicity results was selected from BigMHC [1], a previously developed sequence-based immunogenicity prediction tool. APE-Gen 2.0 [2] and Boltz-2 [3] were used to generate structural models of each pHLA, from which structural features are extracted. APE-Gen 2.0 was selected for its tailored, scalable pHLA modeling and docking-based scoring of conformational ensembles [2]. Boltz-2, a novel AI-based tool, was selected because it outperforms AlphaFold2 by incorporating improved biophysical refinement [3]. It also predicts complex binding affinity using a new AI-based approach.
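As an illustration of what extracting structural features from a modeled pHLA can look like, the sketch below counts peptide-HLA residue contacts within a distance cutoff. It is a minimal example under stated assumptions, not the authors' featurization: the chain identifiers (`"C"` for the peptide, `"A"` for the HLA heavy chain), the 4.0 Å cutoff, the in-memory atom-tuple format, and the function name `contact_features` are all hypothetical, and parsing the structure files produced by APE-Gen 2.0 or Boltz-2 is left out.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical atom record: (chain ID, residue number, (x, y, z) in angstroms).
Atom = Tuple[str, int, Tuple[float, float, float]]


def contact_features(
    atoms: List[Atom],
    peptide_chain: str = "C",  # assumed chain ID of the peptide
    hla_chain: str = "A",      # assumed chain ID of the HLA heavy chain
    cutoff: float = 4.0,       # assumed atom-atom contact cutoff (angstroms)
) -> Dict[str, float]:
    """Count peptide-HLA residue pairs with any atom pair within `cutoff`."""
    peptide = [a for a in atoms if a[0] == peptide_chain]
    hla = [a for a in atoms if a[0] == hla_chain]

    residue_pairs = set()       # (peptide residue, HLA residue) pairs in contact
    contacting_peptide = set()  # peptide residues with at least one contact
    for _, pep_res, pep_xyz in peptide:
        for _, hla_res, hla_xyz in hla:
            if math.dist(pep_xyz, hla_xyz) <= cutoff:
                residue_pairs.add((pep_res, hla_res))
                contacting_peptide.add(pep_res)

    n_peptide_residues = len({res for _, res, _ in peptide})
    return {
        "n_residue_contacts": float(len(residue_pairs)),
        "frac_peptide_residues_in_contact": (
            len(contacting_peptide) / n_peptide_residues if n_peptide_residues else 0.0
        ),
    }
```

The all-pairs loop keeps the sketch dependency-free; at the scale of millions of complexes a spatial index such as SciPy's `cKDTree` would be the more practical choice for the neighbor search.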
Results We modeled a pilot dataset of 77 complexes with binding affinity and immunogenicity labels using APE-Gen 2.0 and Boltz-2. The top-scored conformations from the APE-Gen 2.0 ensembles are being analyzed to determine which structural features contribute to immunogenicity, and different properties and featurization approaches will be explored. We are also evaluating the accuracy of Boltz-2 affinity predictions in comparison with APE-Gen 2.0 and Rosetta. Large-scale modeling of the entire dataset of 15 million complexes is ongoing.
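One simple way to score an affinity comparison of this kind is a rank correlation between each tool's predictions and the measured affinities of the pilot complexes. The sketch below computes Spearman's ρ in plain Python; the choice of Spearman, the function names, and the input format are assumptions for illustration, not the study's evaluation protocol. Because tools can report scores on different scales and with different sign conventions (for example, energies where lower means tighter binding), a rank-based metric avoids rescaling, although a sign may need to be flipped before methods are compared.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))
```

With a hypothetical mapping `predictions` from method name to per-complex scores, the comparison reduces to `{name: spearman(scores, measured) for name, scores in predictions.items()}`.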
Conclusions By integrating structural features of the entire pHLA complex, we aim to overcome the limitations of current sequence-based approaches and enable more accurate screening of therapeutic peptides, improving the safety of next-generation immunotherapies. Once modeled with both independent approaches (~30 million pHLA complexes), our dataset will be the largest available for machine-learning training. We will then explore how to further improve immunogenicity prediction by leveraging different structural sources, featurization methods, and AI models.
| Original language | English |
|---|---|
| Pages (from-to) | A1234 |
| Number of pages | 1 |
| Journal | Journal for ImmunoTherapy of Cancer |
| Volume | 13 |
| Issue number | Suppl 2 |
| DOIs | |
| Publication status | Published - 4 Nov 2025 |
| Event | SITC 40th Annual Meeting - National Harbor, United States Duration: 5 Nov 2025 → 9 Nov 2025 |
Keywords
- Cancer immunotherapies
- immunogenicity