Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors

Agyapong, Odame; Miller, Whelton A.; Wilson, Michael D.; Kwofie, Samuel K.

doi:10.1007/s11030-021-10329-w

Original Article
Published: 09 October 2021

Odame Agyapong^1,2, Whelton A. Miller^3,4,5, Michael D. Wilson^2,3 & Samuel K. Kwofie^1,6 (ORCID: orcid.org/0000-0002-1093-1517)

Molecular Diversity (2021)

Abstract

Microtubules are receiving enormous interest in drug discovery because of the important roles they play in cellular functions. Targeting tubulin polymerization presents an excellent opportunity for the development of anti-tubulin drugs. Drug resistance and the high toxicity of currently used tubulin-binding agents have necessitated the pursuit of novel drug candidates with increased therapeutic potency. The design of such candidates can be supported by efficient computational techniques. Proteochemometric (PCM) modeling is a computational technique that can be employed to elucidate the bioactivity relations between related targets and multiple ligands. We have developed a PCM-based Support Vector Machine (SVM) approach for predicting the bioactivity between tubulin receptors and small, drug-like molecules. The bioactivity datasets used for training the SVM algorithm were obtained from the BindingDB database. The SVM-based PCM model yielded a good overall predictive performance, with an area under the curve (AUC) of 87%, a Matthews correlation coefficient (MCC) of 72%, an overall accuracy of 93%, and a classification error of 7%. The algorithm predicts the likelihood of new interactions, reported as confidence scores, for query datasets comprising ligands in SMILES format and protein sequences of tubulin targets. The algorithm has been implemented as a web server known as TubPred, accessible via http://35.167.90.225:5000/.
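
As a minimal sketch of how the threshold-based metrics quoted in the abstract relate to one another, the snippet below computes overall accuracy, classification error, and MCC from the four counts of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the paper; AUC is left out because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, classification error, and MCC for a binary classifier.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative,
    and false-negative counts of a confusion matrix.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    # Classification error is the complement of accuracy, which is why
    # an accuracy of 93% goes together with an error of 7%.
    error = 1.0 - accuracy

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    # By convention it is set to 0 when any marginal sum is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {"accuracy": accuracy, "error": error, "mcc": mcc}


# Hypothetical counts chosen only for illustration (not the paper's data).
metrics = classification_metrics(tp=40, fp=5, tn=50, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when actives and inactives are imbalanced; this is one reason the two values can differ noticeably for the same model.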

Data availability

https://github.com/odam23/Hookworm-Drug-Discovery.git.

Author information

Affiliations

1. Department of Biomedical Engineering, School of Engineering Sciences, College of Basic and Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana (Odame Agyapong & Samuel K. Kwofie)
2. Department of Parasitology, Noguchi Memorial Institute for Medical Research (NMIMR), College of Health Sciences (CHS), University of Ghana, P.O. Box LG 581, Legon, Accra, Ghana (Odame Agyapong & Michael D. Wilson)
3. Department of Medicine, Loyola University Medical Center, Maywood, IL, 60153, USA (Whelton A. Miller & Michael D. Wilson)
4. School of Engineering and Applied Science, Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA (Whelton A. Miller)
5. Department of Molecular Pharmacology and Neuroscience, Loyola University Medical Center, Maywood, IL, 60153, USA (Whelton A. Miller)
6. West African Centre for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, College of Basic and Applied Sciences, University of Ghana, Accra, Ghana (Samuel K. Kwofie)

Contributions

SKK, MDW, and OA conceptualized the project. OA and SKK undertook the computational work with input from WAM and MDW. OA and SKK co-wrote the first draft. All authors read, revised, and approved the final draft for submission. SKK was the principal supervisor of the work, with MDW as co-supervisor.

Corresponding author

Correspondence to Samuel K. Kwofie.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Agyapong, O., Miller, W.A., Wilson, M.D. et al. Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors. Mol Divers (2021). https://doi.org/10.1007/s11030-021-10329-w

Received: 29 March 2021
Accepted: 23 September 2021
Published: 09 October 2021
DOI: https://doi.org/10.1007/s11030-021-10329-w

Keywords

Proteochemometric
Support vector machine
Tubulin
Bioactivity
Machine learning