Abstract
Microtubules are receiving enormous interest in drug discovery because of the important roles they play in cellular functions. Targeting tubulin polymerization presents an excellent opportunity for the development of anti-tubulin drugs. The drug resistance and high toxicity associated with currently used tubulin-binding agents have necessitated the pursuit of novel drug candidates with increased therapeutic potency. Efficient computational techniques can support existing efforts to design such candidates. Proteochemometric (PCM) modeling is a computational technique that can be employed to elucidate the bioactivity relations between related targets and multiple ligands. We have developed a PCM-based Support Vector Machine (SVM) approach for predicting the bioactivity between tubulin receptors and small, drug-like molecules. The bioactivity datasets used for training the SVM algorithm were obtained from the BindingDB database. The SVM-based PCM model yielded a good overall predictive performance, with an area under the curve (AUC) of 87%, a Matthews correlation coefficient (MCC) of 72%, an overall accuracy of 93%, and a classification error of 7%. For query datasets comprising ligands in SMILES format and protein sequences of tubulin targets, the algorithm predicts the likelihood of new interactions as confidence scores. The algorithm has been implemented as a web server known as TubPred, accessible via http://35.167.90.225:5000/.
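As an illustration of how the reported accuracy, classification error, and MCC relate to one another, the following minimal sketch computes all three from the counts of a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper; the sketch only shows the standard definitions, under which the classification error is one minus the accuracy (consistent with the 93% and 7% reported above).

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, classification error, and MCC for binary counts.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative,
    and false-negative counts of a binary classifier.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    error = 1.0 - accuracy  # classification error complements accuracy

    # Matthews correlation coefficient; conventionally set to 0 when
    # any marginal total is zero and the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "error": error, "mcc": mcc}


# Hypothetical counts for illustration only (not the paper's data).
metrics = classification_metrics(tp=40, fp=5, tn=50, fn=5)
print(metrics)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when active and inactive classes are imbalanced, which is one reason it is commonly reported alongside accuracy for bioactivity classifiers.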
Author information
Contributions
SKK, MDW and OA conceptualized the project. OA and SKK undertook the computational work with inputs from WAM and MDW. OA and SKK co-wrote the first draft. All authors read, revised, and accepted the final draft for submission. SKK was the principal supervisor with MDW as the co-supervisor of the work.
About this article
Cite this article
Agyapong, O., Miller, W.A., Wilson, M.D. et al. Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors. Mol Divers (2021). https://doi.org/10.1007/s11030-021-10329-w
Keywords
- Proteochemometric
- Support vector machine
- Tubulin
- Bioactivity
- Machine learning