INTRODUCTION: Chagas disease is a neglected illness caused by the protozoan Trypanosoma cruzi and is prevalent in underprivileged areas of Brazil. New drugs against T. cruzi infection are urgently needed [1,2]. Although numerous medicinal chemistry studies have explored molecular targets and potential drugs for treating T. cruzi infection, only a few candidates have the pharmacokinetic and safety profiles required for human use. However, the wealth of data on potential targets and on the molecular features of active compounds makes it possible to apply in silico methods, including machine learning, to identify new drug candidates against T. cruzi [3]. Among these approaches, Quantitative Structure-Activity Relationship (QSAR) modeling is a widely used tool in medicinal chemistry [4,5]. Recent studies have identified thiazolidines, a class of heterocyclic compounds, as antiparasitic agents [3,6,7], which makes them attractive candidates for drug discovery against T. cruzi. AIMS: To provide insight into the molecular features that govern activity against T. cruzi and to predict new active compounds, this study aimed to build and validate a QSAR model that predicts the pIC50 values of new thiazolidines against T. cruzi amastigotes, and to interpret the descriptors associated with trypanocidal activity. METHODS: Statistical analysis was performed with QSARINS version 2.2.4 [8] on a dataset of 60 molecules, split into a training set (80%, 48 molecules) and a test set (20%, 12 molecules). Descriptors were calculated with PaDEL-Descriptor 2.21 (1444 descriptors) and the E-Dragon online platform (511 descriptors), giving a pool of 1955 molecular descriptors. The QSAR model was built by Multiple Linear Regression (MLR), which yields a simple model whose coefficients can be interpreted directly.
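The abstract gives only the 80/20 proportions, which for 60 molecules means 48 training and 12 test compounds. The Python sketch below assumes a seeded random split; the helper name split_train_test and the seed are hypothetical, and the study may instead have used an activity-ranked or structure-based division.

```python
import random

def split_train_test(n_molecules, train_fraction=0.8, seed=42):
    """Randomly partition molecule indices into training and test sets.

    Hypothetical helper: the abstract states the 80/20 proportions,
    not how molecules were assigned to each set.
    """
    indices = list(range(n_molecules))
    random.Random(seed).shuffle(indices)          # seeded so the split is reproducible
    n_train = round(train_fraction * n_molecules)  # 0.8 * 60 = 48
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = split_train_test(60)
print(len(train_idx), len(test_idx))  # 48 12
```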
Before fitting the MLR models, the descriptor pool was filtered to remove nearly constant and highly intercorrelated descriptors: descriptors with the same value in more than 90% of the molecules and descriptors with a pairwise correlation above 0.90 were discarded. After this step, 590 descriptors remained eligible for model selection. Variable selection used a Genetic Algorithm (GA), generating initial three-descriptor subsets and extending the search to models of up to eight descriptors. For each model size, the GA ran for 5000 generations with a population of 500 models per generation and a mutation rate of 20%, using the leave-one-out cross-validated Q2 (Q2LOO) as the fitness function. The ten best models from the GA were then compared on Q2, R2 and RMSE to select the final model. The selected model was further validated internally by leave-one-out (LOO) and leave-many-out (LMO) cross-validation, to assess its robustness, and externally by predicting the test set and computing the external R2, Q2 and RMSE. Finally, a Y-scrambling (permutation) test was performed: 2000 models were built with the same descriptors but with randomly shuffled pIC50 values, to verify that the performance of the original model did not arise from chance correlation. With the validated model, descriptors were computed for a library of 336 untested thiazolidines, and each prediction was checked to lie within the model's applicability domain. RESULTS AND DISCUSSION: The selected final model contains seven descriptors. It has an R2 of 0.82 and a Q2LOO of 0.73, indicating a good fit and good internal predictive ability. The descriptors are MATS1s, GATS7m, MLFER_BO, PJI2, MATS4m, E3m and R7p+, all with coefficient p-values below 0.05.
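The two descriptor filters in the Methods can be sketched in Python with NumPy. This is a minimal illustration, assuming that a descriptor is discarded when one value is shared by more than 90% of the molecules, and that correlated pairs are resolved greedily by keeping the column seen first; the tie-breaking rule QSARINS applies is not described in the abstract, and filter_descriptors is a hypothetical name.

```python
import numpy as np

def filter_descriptors(X, max_constant=0.90, max_corr=0.90):
    """Return indices of descriptor columns that survive two filters.

    X: array of shape (n_molecules, n_descriptors).
    1) near-constant filter: drop a column if its most frequent value occurs
       in more than `max_constant` of the molecules;
    2) correlation filter: scan columns left to right and drop a column if its
       absolute Pearson correlation with an already kept column exceeds `max_corr`.
    """
    n_molecules = X.shape[0]
    candidates = []
    for j in range(X.shape[1]):
        _, counts = np.unique(X[:, j], return_counts=True)
        if counts.max() / n_molecules <= max_constant:
            candidates.append(j)

    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_corr for k in kept):
            kept.append(j)
    return kept

# Toy matrix (made-up values): 10 molecules, 4 descriptors.
c0 = np.arange(1.0, 11.0)
X = np.column_stack([
    c0,                                                   # reference descriptor
    2.0 * c0 + 1.0,                                       # perfectly correlated with c0 -> removed
    np.zeros(10),                                         # constant -> removed
    np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float),  # weakly correlated -> kept
])
print(filter_descriptors(X))  # [0, 3]
```

In the study, filtering of this kind reduced the 1955-descriptor pool to 590 candidates.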
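Because Q2LOO drives both the GA fitness and the internal validation, it helps to see how it is computed for an MLR model. The NumPy sketch below (synthetic data, hypothetical function names) uses the standard least-squares identity e_(i) = e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix, so PRESS is obtained without refitting; an explicit refit loop is included as a check. Whether QSARINS uses this shortcut internally is not stated, but for ordinary least squares both routes give the same value.

```python
import numpy as np

def design(X):
    """Prepend a column of ones so the MLR model has an intercept."""
    return np.column_stack([np.ones(X.shape[0]), X])

def r2_fit(X, y):
    """Coefficient of determination of the least-squares MLR fit."""
    A = design(X)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return 1.0 - rss / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Q2_LOO = 1 - PRESS / TSS, with each leave-one-out residual obtained from
    the hat-matrix identity e_(i) = e_i / (1 - h_ii) instead of n refits."""
    A = design(X)
    H = A @ np.linalg.pinv(A)                 # hat matrix: fitted values = H @ y
    e = y - H @ y                             # ordinary residuals
    press = np.sum((e / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_loo_refit(X, y):
    """Same statistic computed by explicitly refitting without each molecule."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(design(X[keep]), y[keep], rcond=None)
        press += (y[i] - design(X[i:i + 1]) @ coef)[0] ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic example (made-up data): 48 "molecules", 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 3))
y = 5.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=48)
print(abs(q2_loo(X, y) - q2_loo_refit(X, y)) < 1e-9)  # True
```

Since |e_i / (1 - h_ii)| is never smaller than |e_i|, PRESS cannot fall below the residual sum of squares, so Q2LOO is always at most R2, consistent with the reported 0.73 versus 0.82.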
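The GA-based variable selection can be illustrated with a deliberately small genetic algorithm. This is a hypothetical sketch, not the QSARINS implementation: the crossover and elitist selection operators are assumptions, and the default population and generation counts are far below the study's settings (500 models per generation, 5000 generations). Only the 20% mutation rate and the Q2LOO fitness follow the abstract; ga_select is an illustrative name.

```python
import numpy as np
from functools import lru_cache

def q2_loo(X, y):
    """Leave-one-out Q2 of an intercept MLR model (hat-matrix shortcut)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)
    e = y - H @ y
    press = np.sum((e / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, k, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Toy GA for choosing k descriptor columns.

    Individual = frozenset of k column indices; fitness = Q2_LOO of its MLR model.
    Each generation builds pop_size children by crossover (sampling k genes from
    the union of two parents) plus, with probability mutation_rate, a mutation
    that swaps one gene for an unused descriptor. The best distinct individuals
    among parents and children survive (elitism).
    """
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]

    @lru_cache(maxsize=None)
    def fitness(genes):
        return q2_loo(X[:, sorted(genes)], y)

    population = [frozenset(rng.choice(n_desc, size=k, replace=False).tolist())
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            i, j = rng.choice(len(population), size=2, replace=False)
            pool = sorted(population[i] | population[j])
            genes = set(rng.choice(pool, size=k, replace=False).tolist())
            if rng.random() < mutation_rate:
                unused = [d for d in range(n_desc) if d not in genes]
                genes.remove(int(rng.choice(sorted(genes))))
                genes.add(int(rng.choice(unused)))
            children.append(frozenset(genes))
        population = sorted(set(population) | set(children),
                            key=fitness, reverse=True)[:pop_size]
    best = population[0]
    return sorted(best), fitness(best)

# Synthetic data: activity depends on descriptors 1, 4 and 7 out of 10.
rng = np.random.default_rng(1)
X = rng.normal(size=(48, 10))
y = 5.0 + 2.0 * X[:, 1] - 1.5 * X[:, 4] + X[:, 7] + rng.normal(scale=0.05, size=48)
best, best_q2 = ga_select(X, y, k=3)
print(best, round(best_q2, 3))
```

To mirror the study's search over model sizes, ga_select could be called for k = 3 to 8 and the resulting models compared on Q2, R2 and RMSE.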
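The LMO and Y-scrambling checks can both be written as loops around an MLR fit. In this NumPy sketch the number of LMO iterations and the per-iteration Q2 convention (left-out molecules scored against the mean activity of the molecules used for fitting) are assumptions, since the abstract specifies only the 30% leave-out fraction and the 2000 permutations; function names and data are hypothetical.

```python
import numpy as np

def design(X):
    """Descriptor matrix with a leading column of ones (model intercept)."""
    return np.column_stack([np.ones(X.shape[0]), X])

def r2_q2loo(X, y):
    """R2 and leave-one-out Q2 of an intercept MLR model (hat-matrix shortcut)."""
    A = design(X)
    H = A @ np.linalg.pinv(A)
    e = y - H @ y
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(e ** 2) / tss
    q2 = 1.0 - np.sum((e / (1.0 - np.diag(H))) ** 2) / tss
    return r2, q2

def q2_lmo(X, y, leave_out=0.30, n_iter=500, seed=0):
    """Mean Q2 over repeated random leave-many-out splits (assumed convention)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = int(round(leave_out * n))          # 30% of 48 molecules -> 14 left out
    scores = []
    for _ in range(n_iter):
        order = rng.permutation(n)
        out, fit = order[:n_out], order[n_out:]
        coef, *_ = np.linalg.lstsq(design(X[fit]), y[fit], rcond=None)
        press = np.sum((y[out] - design(X[out]) @ coef) ** 2)
        scores.append(1.0 - press / np.sum((y[out] - y[fit].mean()) ** 2))
    return float(np.mean(scores))

def y_scrambling(X, y, n_perm=2000, seed=0):
    """R2 and Q2_LOO of models refitted after shuffling y; X is left unchanged."""
    rng = np.random.default_rng(seed)
    stats = np.array([r2_q2loo(X, rng.permutation(y)) for _ in range(n_perm)])
    return stats[:, 0], stats[:, 1]

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 3))
y = 5.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=48)
r2, q2 = r2_q2loo(X, y)
scrambled_r2, scrambled_q2 = y_scrambling(X, y)
print(round(q2_lmo(X, y), 2), r2 > scrambled_r2.max(), q2 > scrambled_q2.max())
```

A model passes the Y-scrambling check when its R2 and Q2LOO are clearly higher than those of the scrambled models, which indicates that the descriptor-activity relationship is not a chance correlation.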
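The abstract does not define its external R2 and Q2. The sketch below uses common definitions, stated here as assumptions: external R2 as the squared Pearson correlation between observed and predicted test values, and Q2F1 and Q2F2, which differ only in whether the training-set or the test-set mean is the reference. RMSE and MAE are included with their usual definitions. The function name and the three test values are made up.

```python
import numpy as np

def external_metrics(y_test, y_pred, y_train_mean):
    """Common external-validation statistics for a QSAR test set.

    r2_ext: squared Pearson correlation between observed and predicted values
    q2_f1:  1 - PRESS / sum((y_test - mean of training activities)^2)
    q2_f2:  1 - PRESS / sum((y_test - mean of test activities)^2)
    """
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_test - y_pred
    press = np.sum(residuals ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2_ext": float(np.corrcoef(y_test, y_pred)[0, 1] ** 2),
        "q2_f1": float(1.0 - press / np.sum((y_test - y_train_mean) ** 2)),
        "q2_f2": float(1.0 - press / np.sum((y_test - y_test.mean()) ** 2)),
    }

# Hypothetical observed and predicted pIC50 values for three test compounds.
m = external_metrics([5.0, 5.5, 6.0], [5.1, 5.4, 6.1], y_train_mean=5.4)
print({name: round(value, 3) for name, value in m.items()})
```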
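The applicability-domain check on the 336 screened thiazolidines is not detailed in the abstract. A common approach, sketched here as an assumption, is the leverage criterion used in Williams plots: a compound is inside the domain when h_i = x_i^T (A^T A)^-1 x_i, with A the training descriptor matrix plus an intercept column, does not exceed h* = 3(p + 1)/n. For the reported model (p = 7 descriptors) and 48 training molecules, h* = 3 × 8 / 48 = 0.5. Function names and data are hypothetical.

```python
import numpy as np

def leverages(X_train, X_query):
    """h_i = x_i^T (A^T A)^-1 x_i for each query row, A = training matrix plus intercept column."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, inv, Q)

def warning_leverage(n_train, n_descriptors):
    """Usual cut-off h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train

def in_domain(X_train, X_query):
    """True where a query compound's leverage does not exceed h*."""
    n_train, p = X_train.shape
    return leverages(X_train, X_query) <= warning_leverage(n_train, p)

print(warning_leverage(48, 7))   # 0.5

# Synthetic check with 3 descriptors: the training centroid is inside the domain,
# a compound far outside the training descriptor ranges is not.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(48, 3))
queries = np.vstack([X_train.mean(axis=0), np.full(3, 50.0)])
print(in_domain(X_train, queries))   # [ True False]
```

Predictions for screened compounds would then be ranked only among those flagged True by in_domain.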
The model's prediction error was estimated with RMSE and MAE, giving a cross-validation RMSE of 0.121 and an MAE of 0.10. LMO validation, in which 30% of the training set was left out, gave a Q2LMO of 0.69, further supporting the model's robustness. Applying the model to the untested thiazolidines identified TF06H as the most promising candidate, with a predicted IC50 of 1.37 µM, followed by five other molecules (TF07D, TF06B, TF07B, TF04b and TF01H) with predicted IC50 values between 1.4 and 1.9 µM. Analysis of the descriptor coefficients also revealed molecular features that are critical for activity. MATS1s, GATS7m, MATS4m and R7p+ are autocorrelation descriptors of physicochemical properties of the molecules. MATS and GATS are the Moran and Geary autocorrelation descriptors, respectively, which measure how an atomic property is correlated between atoms separated by a given topological distance, or lag [9]: MATS1s uses lag 1 with I-state weighting, while MATS4m and GATS7m use lags 4 and 7 with atomic mass weighting. R7p+ is a GETAWAY descriptor, the R maximal autocorrelation at lag 7 weighted by atomic polarizability. MLFER_BO encodes the overall solute hydrogen-bond basicity from molecular linear free energy relationships [10], and PJI2 is the 2D Petitjean shape index. E3m belongs to the WHIM (Weighted Holistic Invariant Molecular) descriptors [11], a set of statistical indices that extract steric and electrostatic information from the 3D structure of a molecule; E3m is the third-component accessibility directional WHIM index weighted by atomic mass. Taken together, the descriptors in the model appear to be closely linked to the physicochemical and spatial properties of the molecules, connecting structural information to pIC50 values. CONCLUSION: Computational methods play a pivotal role in modern medicinal chemistry. The QSAR model proposed in this study is a simple, interpretable machine learning model that predicts the activity of thiazolidines against T. cruzi amastigotes and can serve as a useful screening tool in drug development.
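The predicted hits are reported as IC50 values, while the model predicts pIC50 = -log10(IC50 in mol/L). The short Python conversion below shows that 1.37 µM corresponds to a pIC50 of about 5.86, and 1.4 to 1.9 µM to roughly 5.85 to 5.72. It also converts the cross-validation RMSE of 0.121 log units into a multiplicative factor of about 1.32 on the IC50 scale. Function names are illustrative.

```python
import math

def ic50_uM_to_pic50(ic50_uM):
    """pIC50 = -log10(IC50 in mol/L); 1 uM = 1e-6 mol/L."""
    return -math.log10(ic50_uM * 1e-6)

def pic50_to_ic50_uM(pic50):
    """Inverse conversion, returning IC50 in uM."""
    return 10.0 ** (-pic50) * 1e6

for label, ic50 in [("TF06H", 1.37), ("range start", 1.4), ("range end", 1.9)]:
    print(label, round(ic50_uM_to_pic50(ic50), 2))
# TF06H 5.86
# range start 5.85
# range end 5.72

# An error of 0.121 pIC50 units corresponds to a multiplicative factor on IC50:
print(round(10 ** 0.121, 2))   # 1.32
```

Roughly speaking, an RMSE of 0.121 means predicted IC50 values differ from observed ones by a factor of about 1.3.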
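The Moran (MATS) and Geary (GATS) descriptors can be made concrete with a small example. For an atomic property w on a molecule with A atoms, let δ_ij = 1 when atoms i and j are d bonds apart, and let Δ_d be the number of such ordered pairs. Moran's index is I(d) = [(1/Δ_d) Σ_i Σ_j δ_ij (w_i - w̄)(w_j - w̄)] / [(1/A) Σ_i (w_i - w̄)^2], and Geary's index is c(d) = [(1/(2Δ_d)) Σ_i Σ_j δ_ij (w_i - w_j)^2] / [(1/(A - 1)) Σ_i (w_i - w̄)^2]. The Python sketch computes both on a hypothetical four-atom chain; descriptor software such as PaDEL may differ in details like hydrogen handling and weight scaling.

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs shortest-path lengths, in bonds, via breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist

def moran_geary(weights, adjacency, lag):
    """Moran I(lag) and Geary c(lag) of an atomic property on a molecular graph."""
    n = len(weights)
    dist = topological_distances(adjacency)
    mean = sum(weights) / n
    dev = [w - mean for w in weights]
    ss = sum(d * d for d in dev)
    # ordered atom pairs (i, j) separated by exactly `lag` bonds
    pairs = [(i, j) for i in range(n) for j in range(n) if dist[i][j] == lag]
    if not pairs or ss == 0:
        raise ValueError("no atom pairs at this lag, or the property is constant")
    moran = (sum(dev[i] * dev[j] for i, j in pairs) / len(pairs)) / (ss / n)
    geary = (sum((weights[i] - weights[j]) ** 2 for i, j in pairs)
             / (2 * len(pairs))) / (ss / (n - 1))
    return moran, geary

# Hypothetical 4-atom chain with alternating weights (not a real molecule).
chain = [[1], [0, 2], [1, 3], [2]]
print(moran_geary([1.0, 2.0, 1.0, 2.0], chain, lag=1))   # (-1.0, 1.5)
print(moran_geary([1.0, 2.0, 1.0, 2.0], chain, lag=2))   # (1.0, 0.0)
```

Positive I(d) and c(d) below 1 indicate that atoms d bonds apart carry similar property values, while negative I(d) and c(d) above 1 indicate alternating values, as in this toy chain at lag 1. Both indices are unchanged by a linear rescaling of the weights, so raw or carbon-scaled masses give the same result.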
Given earlier studies of thiazolidines as potential drug candidates for various diseases, this work highlights the value of screening molecular libraries for trypanocidal activity. Applying the model yielded promising predicted hits, supporting its ability to propose new, potentially active thiazolidines. Furthermore, predicting pIC50 values in silico can substantially reduce the cost and time of the search for new drugs against T. cruzi.
ACKNOWLEDGMENTS:
This research is supported by CAPES, CNPq, and UFAL.
REFERENCES
1. Scarim CB, Jornada DH, Chelucci RC, de Almeida L, dos Santos JL, Chung MC. Current advances in drug discovery for Chagas disease. Eur J Med Chem. 2018;155:824-838. doi:10.1016/j.ejmech.2018.06.040
2. Bonney KM. Chagas disease in the 21st Century: a public health success or an emerging threat? Parasite. 2014;21:11. doi:10.1051/parasite/2014012
3. Moreira DRM, Lima Leite AC, Cardoso MVO, et al. Structural Design, Synthesis and Structure-Activity Relationships of Thiazolidinones with Enhanced Anti-Trypanosoma cruzi Activity. ChemMedChem. 2014;9(1):177-188. doi:10.1002/cmdc.201300354
4. Melo-Filho CC, Braga RC, Muratov EN, et al. Discovery of new potent hits against intracellular Trypanosoma cruzi by QSAR-based virtual screening. Eur J Med Chem. 2019;163:649-659. doi:10.1016/j.ejmech.2018.11.062
5. Chatterjee M, Roy K. Quantitative structure-activity relationships (QSARs) in medicinal chemistry. In: Cheminformatics, QSAR and Machine Learning Applications for Novel Drug Development. Elsevier; 2023:3-38. doi:10.1016/B978-0-443-18638-7.00029-3
6. Jain VS, Vora DK, Ramaa CS. Thiazolidine-2,4-diones: Progress towards multifarious applications. Bioorg Med Chem. 2013;21(7):1599-1620. doi:10.1016/j.bmc.2013.01.029
7. de Oliveira Filho GB, Cardoso MV de O, Espíndola JWP, et al. Structural design, synthesis and pharmacological evaluation of thiazoles against Trypanosoma cruzi. Eur J Med Chem. 2017;141:346-361. doi:10.1016/j.ejmech.2017.09.047
8. Gramatica P, Chirico N, Papa E, Cassani S, Kovarich S. QSARINS: A new software for the development, analysis, and validation of QSAR MLR models. J Comput Chem. 2013;34(24):2121-2132. doi:10.1002/jcc.23361
9. Horne DS. Prediction of protein helix content from an autocorrelation analysis of sequence hydrophobicities. Biopolymers. 1988;27(3):451-477. doi:10.1002/bip.360270308
10. Antanasijevic J, Antanasijevic D, Pocajt V, Trišovic N, Fodor-Csorba K. A QSPR study on the liquid crystallinity of five-ring bent-core molecules using decision trees, MARS and artificial neural networks. RSC Adv. 2016;6(22):18452-18464. doi:10.1039/C5RA20775D
11. Zaliani A, Gancia E. MS-WHIM Scores for Amino Acids: A New 3D-Description for Peptide QSAR and QSPR Studies. J Chem Inf Comput Sci. 1999;39(3):525-533. doi:10.1021/ci980211b
Organizing Committee
Francisco Mendonça Junior
Pascal Marchand
Teresinha Gonçalves da Silva
Isabelle Orliac-Garnier
Gerd Bruno da Rocha
Scientific Committee
Ricardo Olimpio de Moura