Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors

Abstract

Microtubules are attracting considerable interest in drug discovery because of the important roles they play in cellular functions. Targeting tubulin polymerization presents an excellent opportunity for the development of anti-tubulin drugs. Drug resistance and the high toxicity of currently used tubulin-binding agents have necessitated the pursuit of novel drug candidates with increased therapeutic potency. Efficient computational techniques can support existing efforts to design such candidates. Proteochemometric (PCM) modeling is a computational technique that can be employed to elucidate the bioactivity relationships between multiple related targets and multiple ligands. We have developed a PCM-based support vector machine (SVM) approach for predicting the bioactivity between tubulin receptors and small, drug-like molecules. The bioactivity datasets used for training the SVM algorithm were obtained from the BindingDB database. The SVM-based PCM model yielded good overall predictive performance, with an area under the curve (AUC) of 87%, a Matthews correlation coefficient (MCC) of 72%, an overall accuracy of 93%, and a classification error of 7%. Given query datasets comprising ligands in SMILES format and protein sequences of tubulin targets, the algorithm predicts the likelihood of new interactions based on confidence scores. The algorithm has been implemented as a web server known as TubPred, accessible via http://35.167.90.225:5000/.
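
The reported accuracy, classification error, and MCC can all be derived from the confusion matrix of a binary classifier. The following is a minimal sketch of those standard formulas, not the authors' evaluation code. The function name `classification_metrics` and the confusion-matrix counts are hypothetical and are not taken from the study. MCC is conventionally expressed on a scale from -1 to 1, so the 72% quoted above presumably corresponds to 0.72.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, classification error, and MCC from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives for a binary classifier.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    # Classification error is the complement of accuracy.
    error = (fp + fn) / total

    # The Matthews correlation coefficient uses all four cells of the matrix.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column sum is zero; returning 0.0
    # in that case is a common convention (an assumption of this sketch).
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "error": error, "mcc": mcc}


# Hypothetical counts for illustration only; NOT data from the paper.
example = classification_metrics(tp=40, tn=50, fp=5, fn=5)
print(example)
```

Because accuracy and classification error always sum to one, an accuracy of 93% implies the 7% error quoted in the abstract. MCC carries extra information when the active and inactive classes are imbalanced.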

Data availability

https://github.com/odam23/Hookworm-Drug-Discovery.git.

Author information

Contributions

SKK, MDW and OA conceptualized the project. OA and SKK undertook the computational work with inputs from WAM and MDW. OA and SKK co-wrote the first draft. All authors read, revised, and accepted the final draft for submission. SKK was the principal supervisor with MDW as the co-supervisor of the work.

Corresponding author

Correspondence to Samuel K. Kwofie.

Cite this article

Agyapong, O., Miller, W.A., Wilson, M.D. et al. Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors. Mol Divers (2021). https://doi.org/10.1007/s11030-021-10329-w

Keywords

  • Proteochemometric
  • Support vector machine
  • Tubulin
  • Bioactivity
  • Machine learning