University of Ghana  http://ugspace.ug.edu.gh
 
COMPUTER–AIDED APPROACHES TO DISCOVERY OF NOVEL DRUGS 
AGAINST THE HUMAN HOOKWORM NECATOR AMERICANUS 
(NEMATODA: ANCYLOSTOMATIDAE) 
 
BY 
 
ODAME AGYAPONG  
(10204283) 
 
 
 
 
THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, LEGON IN 
PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF 
MPHIL BIOMEDICAL ENGINEERING DEGREE  
 
 
DEPARTMENT OF BIOMEDICAL ENGINEERING 
COLLEGE OF BASIC AND APPLIED SCIENCES 
UNIVERSITY OF GHANA 
  
  
JULY, 2017 
DECLARATION 
I, ODAME AGYAPONG, do hereby declare that this work, COMPUTER–AIDED 
APPROACHES TO DISCOVERY OF NOVEL DRUGS AGAINST THE HUMAN 
HOOKWORM NECATOR AMERICANUS (NEMATODA: ANCYLOSTOMATIDAE), 
with the exception of the cited references, was written and submitted by me in the 
University of Ghana from AUGUST 2015 to JULY 2017, under the supervision of Dr. 
Samuel K. Kwofie and Prof. Michael Wilson.  
I further declare that this work has not been submitted to University of Ghana or any other  
university.  
...........................................                                                      .............................................      
Odame Agyapong (10204283)                                                                (Date)  
(Student)  
 
............................................                                                    ..............................................                 
Samuel K. Kwofie (PhD)                                                                        (Date)  
(Principal Supervisor)  
 
............................................                                                     ..............................................                 
Prof. Michael Wilson                                                                              (Date)  
   (Co-Supervisor) 
ABSTRACT 
There is a crucial need to develop novel anthelminthic drugs due to the mounting disease 
burden and increasing evidence of hookworm resistance to drugs such as albendazole and 
mebendazole, which have been used for decades to treat the infection. Consequently, it is 
exigent to develop alternative drugs with improved therapeutic efficacy. Natural products, 
owing to their unique active ingredients, have been shown to possess exceptional structures 
with a chemical diversity that is unmatched by synthetic libraries. It is therefore imperative 
to leverage natural products to augment hookworm drug discovery. This study 
aimed to: (i) identify potential novel anthelminthic lead compounds by screening African 
natural product-derived ligands against beta tubulin of Necator americanus, a known 
hookworm receptor, and (ii) develop a support vector machine-based proteochemometric 
modelling (PCM) approach for bioactivity profiling of beta tubulin receptors.  
The 3D structure of the hookworm beta tubulin, with UniProt entry W2T758, was 
generated using homology modelling. The model was subjected to molecular dynamics 
simulations and prediction of active site interactions. The first ligand library, 
comprising 885 natural product compounds obtained from the African medicinal plants 
database (AfroDb) combined with Dichapetalin A, was screened against the receptor. 
The compounds ZINC14760755 and ZINC28462577 were found to be potential leads due to 
promising binding affinity, active site interactions and pharmacokinetic profiles. 
Additionally, a second library comprising 2297 compounds from the Northern African 
Natural Product Database (NANPDB) was virtually screened. The compound 
S,5Z,8Z,11Z,13E,17Z-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
exhibited a plausible binding affinity, toxicity and pharmacokinetic profile. The 
aforementioned natural compounds are potential leads which 
can be experimentally characterised for possible pre-clinical trials.  
A support vector machine-based proteochemometric model was also developed to predict 
the bioactivity relations between beta tubulin variants and small compounds using an 
interaction dataset retrieved from BindingDB. Although it was trained on a small dataset, 
the model achieved reasonably good performance, with a ROC-AUC of 87%, an MCC of 0.75 
and a classification error of approximately 4%. The model allows the 
prediction of the likelihood of interactions between query datasets comprising ligands in 
SMILES format and protein sequences of beta tubulin targets. In future, larger bioactivity 
datasets of beta tubulins originating from high throughput experiments can be utilised to 
possibly enhance the performance of the hookworm PCM model. 
 
 
 
 
 
 
 
 
DEDICATION 
I dedicate this thesis to the Almighty God for his protection, knowledge and guidance 
throughout the research journey. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ACKNOWLEDGEMENTS 
I would first like to extend my deepest and sincere gratitude to my supervisors, whose 
guidance has brought me this far. I would not have been able to finish this research without 
their expertise, insightful comments and considerable guidance. I would like to specially 
thank Dr Samuel Kojo Kwofie, my main supervisor, for patiently guiding me to develop 
my background in computational drug discovery and machine learning. His insightful 
comments, enthusiasm, corrections to the thesis, suggestions, guidance and scholarly 
knowledge were of great use. I also want to take this opportunity to thank my 
co-supervisor, Professor Michael Wilson, for the financial support given to me throughout this 
research. I am deeply grateful for all the support he gave me, including correction of the 
thesis and the willingness to always support me wherever and whenever needed. I also wish to 
express special thanks to all the lecturers at the Department of Biomedical 
Engineering, University of Ghana, for their useful criticisms during my thesis defence. 
Special thanks to Dr Whelton Miller from the University of Pennsylvania, USA, for 
guidance on molecular dynamics simulation of the beta tubulin of Necator americanus. 
I am also grateful to Dr Christian Parry from Howard University, USA, and Dr 
Michael Bosu from Waikato Institute of Technology, New Zealand, for their useful 
suggestions. Last but not least, I would like to thank my family and friends for 
all their encouragement, support and love towards the completion and success of this research. 
 
 
TABLE OF CONTENTS 
DECLARATION
ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 Background
1.2 Problem Statement, Rationale and Overall Goal of the Study
1.2.1 Problem statement
1.2.2 Rationale of the study
1.2.3 Overall goal of the study
1.2.4 Expected outcome (Contribution to knowledge)
CHAPTER 2
LITERATURE REVIEW
2.1 The Hookworm, Necator americanus
2.1.1 Life cycle
2.1.2 Geographical distribution of nematode (hookworm) infections
2.2 Hookworm Drug Targets
2.2.1 Beta-tubulin
2.2.2 Other potential targets
2.3 Existing Treatment Methods and their Molecular Targets
2.4 Natural Products (NP) and their Utility as Anthelminthic Therapeutics
2.4.2 Other naturally derived compounds effective against hookworm
2.5 Computer-Aided Drug Design (CADD)
2.5.1 Economic significance and time factor of CADD
2.5.3 Structure based drug design
2.5.4 Ligand based drug design
2.5.6 Proteochemometric modelling (PCM)
2.6 Recent Efforts in Hookworm Drug Discovery
CHAPTER 3
METHODS
3.1 Template Identification and Homology Modelling of Proteins
3.2 Molecular Dynamic Simulations of Modelled Protein
3.3 Prediction and Analysis of Binding Site
3.4 Protein Preparation
3.5 Ligands Preparation
3.6 Virtual Screening Analysis
3.7 Interaction Profiling using LIGPLOT
3.8 Absorption, Distribution, Metabolism and Excretion (ADME) Prediction
3.9 Toxicity Prediction using OSIRIS Property Explorer in DataWarrior
3.10 Scaffold Analysis
3.11 Proteo-Chemometric Predictive Model of Anti-Tubulin Activity
3.11.1 Data collection
3.11.2 Pre-processing of dataset
3.11.3 Ligand descriptions (Compound descriptors)
3.11.4 Target descriptions (Protein descriptors)
3.11.5 Exploratory principal component analysis (PCA) of compounds and target datasets
3.11.6 Model development
3.11.7 Validation of model performance
CHAPTER 4
RESULTS AND DISCUSSION
4.1 Template Identification, Homology Modelling of Proteins and Validation
4.2 Molecular Dynamics Simulation
4.3 Prediction and Analysis of Binding Site
4.4 Virtual Screening Analysis results
4.5 Interaction Profile using LIGPLOT
4.6 ADME Prediction and Pharmacokinetic Properties
4.7 Toxicity Prediction Analysis
4.8 Scaffold Analysis
4.9 Proteochemometric Modelling
4.9.1 Exploratory principal component analysis (PCA) of compounds and target datasets
4.9.2 Model development
4.9.2.1 Model validation

CHAPTER 5
CONCLUSION AND RECOMMENDATION
REFERENCES
APPENDICES
  
 
 
 
 
 
 
 
LIST OF FIGURES 
Figure 2.1. Life cycle of Hookworm
Figure 2.2. Global distribution of the human hookworm infection
Figure 2.3. The drug design pipeline
Figure 2.4. Structure based drug design
Figure 2.5. Ligand based drug design
Figure 2.6. Proteochemometric modelling
Figure 3.1. Workflow of protein modelling to scaffold analysis
Figure 3.2. Workflow for PCM modelling of beta tubulin bioactivity profiling
Figure 4.1. A pairwise sequence alignment between the beta tubulin sequence of N. americanus and D chain of the crystal structure with PDB ID, 5c8y
Figure 4.2. Predicted binding site from I-TASSER and rendered in PYMOL
Figure 4.3. 3D model of the beta tubulin of N. americanus
Figure 4.4. Ramachandran plot of beta tubulin model from N. americanus
Figure 4.5. Errat plot
Figure 4.6. RMSD plot of the Molecular dynamic simulation using GROMACS
Figure 4.7. Predicted colchicine binding site of beta tubulin from Necator americanus
Figure 4.8. Docking pose of ZINC14760755-beta-tubulin receptor complex
Figure 4.9. Docking pose of Dichapetalin A and albendazole beta tubulin receptor complex
Figure 4.10. Docking pose of S,5Z,8Z,11Z,13E,17Z-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one from NANPDB
Figure 4.11. Interaction profile of ZINC14760755 and ZINC28462577
Figure 4.12. Interaction profile of Dichapetalin A and albendazole
Figure 4.13. Interaction profile of S,5Z,8Z,11Z,13E,17Z-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one from NANPDB as predicted by LIGPLOT
Figure 4.14. A bar plot of scaffold counts versus the ring systems of 201 compounds from list A compared to list B and list C
Figure 4.15. Distribution of response variable (class) in the dataset
Figure 4.16. Chemical and biological space (compound–target interaction space) of beta tubulin-inhibitor dataset
Figure 4.17. Area under Receiver Operating Curve (AUC)
 
 
 
LIST OF TABLES 
Table 1.1. Existing therapeutic agents and their mechanism of action
Table 3.1. Complete dataset used for PCM
Table 4.1. Results of the molecular docking scores of the top 20 ligands from AfroDB library plus Dichapetalin A and albendazole
Table 4.2. Results of the molecular docking scores of the top 20 ligands from the Northern African Natural Product Database
Table 4.3. Results of number of hydrogen-bonds/hydrophobic-bonds and contact residues of top ten ligands from AfroDB, and that of Dichapetalin A and albendazole
Table 4.4. Results of number of hydrogen-bonds/hydrophobic-bonds and contact residues of top ten ligands from the NANPDB
Table 4.5. Results of ADME prediction of top ten virtually screened compounds and that of Dichapetalin A and albendazole as predicted by SwissADME
Table 4.6. Results of ADME prediction of top ten ranking compounds from NANPDB
Table 4.7. Toxicological profile results of top ten ranking compounds from both set of virtual library compounds as predicted by DataWarrior
Table 4.8. Scaffold diversity analysis of natural products and anthelminthics
Table 4.9. Proteins and compounds descriptors used in the development of the model
Table 4.10. SVM model parameters and evaluation of classification performance
 
 
 
 
 
 
 
 
 
 
 
 
 
 
LIST OF ABBREVIATIONS 
ADME – Absorption, Distribution, Metabolism and Excretion
CADD – Computer-Aided Drug Design
CV – Cross Validation
DOPE – Discrete Optimized Protein Energy
EST – Expressed Sequence Tag
FDA – Food and Drugs Authority
GTP – Guanosine-5'-triphosphate
H-bond – Hydrogen bond
HTS – High-throughput Screening
LBDD – Ligand Based Drug Design
MDA – Mass Drug Administration
NANPDB – Northern African Natural Product Database
NMR – Nuclear Magnetic Resonance
PCM – Proteochemometric Modelling
PDB – Protein Data Bank
QSAR – Quantitative Structure Activity Relationship
SBDD – Structure Based Drug Design
SVM – Support Vector Machine
CHAPTER 1 
INTRODUCTION 
1.1 Background 
The need for the development of new drugs cannot be overemphasised in light of the current 
global burden of debilitating diseases such as hookworm infection. Drug discovery relies heavily 
on protein structures and the docking of potential compounds in the search for lead drugs. 
The latter is considered a more deterministic approach for finding drugs against diseases 
and has led to what is called rational drug design [1].
 
Benzimidazoles, including albendazole and mebendazole, have been around for many years 
as potent therapeutic drugs for hookworm infections, but there are concerns about their low 
efficacies and drug resistance [2–5]. There is, therefore, a need to develop alternative 
novel drugs with improved therapeutic efficacy. Natural products present an excellent 
opportunity for novel drug discovery. Researchers from the Noguchi Memorial Institute for 
Medical Research and the Chemistry Department of the University of Ghana, for example, 
have found potential anthelminthic activity of a natural compound, Dichapetalin A, 
against hookworm infection, highlighting the importance of studies to identify further 
potential anthelminthic drugs from natural products [6]. Natural products possess highly 
diverse and novel scaffold structures that make them potentially better drug candidates than 
synthetic compounds [7–12]. The discovery of unique scaffolds different from the known 
anthelminthic drugs is key to the identification of different mechanisms of action [13]. In 
addition, a greater scaffold diversity may suggest a wider coverage of chemical space, thus 
increasing the chances of identifying compounds that interact with more biological targets 
to elicit anthelminthic activity [13]. For this reason, scaffold diversity analysis is normally 
essential for exploring the presence of unique scaffolds and structures within compounds 
that show more favourable binding than benzimidazoles. 
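
The idea of scaffold diversity can be illustrated with a minimal sketch. It assumes that each compound has already been reduced to a scaffold label by a cheminformatics toolkit (that step is not shown), and the labels `scaffold_A` to `scaffold_C`, the function names and the example lists are hypothetical placeholders rather than data from this study. The sketch counts distinct scaffolds in a candidate list and reports which of them are absent from a reference set of known anthelminthics.

```python
from collections import Counter


def scaffold_summary(scaffolds):
    """Summarise scaffold diversity for a list of scaffold labels."""
    counts = Counter(scaffolds)
    total = len(scaffolds)
    distinct = len(counts)
    # Singletons are scaffolds represented by exactly one compound.
    singletons = sum(1 for n in counts.values() if n == 1)
    ratio = distinct / total if total else 0.0
    return {
        "total": total,
        "distinct": distinct,
        "singletons": singletons,
        "distinct_ratio": ratio,
    }


def scaffolds_absent_from(candidates, reference):
    """Return candidate scaffolds that do not occur in the reference set."""
    return sorted(set(candidates) - set(reference))


# Hypothetical labels: both reference drugs share a benzimidazole core.
reference_scaffolds = ["benzimidazole", "benzimidazole"]
candidate_scaffolds = [
    "scaffold_A", "scaffold_A", "scaffold_B", "benzimidazole", "scaffold_C",
]

summary = scaffold_summary(candidate_scaffolds)
unique_to_candidates = scaffolds_absent_from(candidate_scaffolds, reference_scaffolds)
print(summary)
print(unique_to_candidates)
```

A higher ratio of distinct scaffolds to compounds indicates a more diverse library, and the scaffolds absent from the reference set are the ones of interest when looking for mechanisms of action that differ from those of the benzimidazoles.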
 
In silico approaches are generally advantageous methods that can provide more 
insight into the bioactivity profile of experimentally determined drugs and potentially 
predict additional drug targets implicated in various disease mechanisms and biological 
pathways. They are very useful strategies for shortening the time and reducing the cost 
and effort required for drug development [1]. These computer-aided drug design 
approaches, including molecular docking and proteochemometric modelling, can unravel 
the potential anthelminthic properties of natural compounds. In addition, they have the 
potential to identify new receptors, novel drug leads and different modes of action of potential 
natural product therapeutic agents and their implicated biological pathways.  
 
Two major in silico approaches, molecular docking and proteochemometric 
modelling, are the focus of this study. Molecular docking is an approach which involves the 
interaction between two or more molecules to give a stable complex with an optimized 
conformation and a lower binding free energy [14]. Proteochemometric modelling (PCM), on 
the other hand, is a computational method that can predict the bioactivity relations between 
a series of ligands and a series of targets [15]. 
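
The way PCM relates ligands to targets can be sketched as follows. This is a minimal illustration, not the model built in this thesis: it assumes that each ligand and each target is already described by a numeric descriptor vector, and the identifiers `lig_a`, `tub_1` and so on, the descriptor values and the function names are all hypothetical. Each ligand-target pair becomes one input row formed by joining the two descriptor vectors, optionally extended with pairwise cross-terms, and the resulting rows could then be passed to any classifier.

```python
from itertools import product


def pcm_features(ligand_desc, target_desc, cross_terms=False):
    """Build one PCM input row from ligand and target descriptors."""
    features = list(ligand_desc) + list(target_desc)
    if cross_terms:
        # Cross-terms: products of every ligand/target descriptor pair.
        features += [l * t for l, t in product(ligand_desc, target_desc)]
    return features


def build_dataset(pairs, ligands, targets):
    """Turn (ligand_id, target_id, is_active) records into rows and labels."""
    rows, labels = [], []
    for ligand_id, target_id, is_active in pairs:
        rows.append(pcm_features(ligands[ligand_id], targets[target_id]))
        labels.append(1 if is_active else 0)
    return rows, labels


# Hypothetical descriptor vectors and interaction records.
ligands = {"lig_a": [0.2, 1.0], "lig_b": [0.9, 0.0]}
targets = {"tub_1": [1.0, 0.0, 0.5], "tub_2": [0.0, 1.0, 0.5]}
pairs = [("lig_a", "tub_1", True), ("lig_b", "tub_2", False)]

rows, labels = build_dataset(pairs, ligands, targets)
with_cross = pcm_features([2, 3], [5, 7], cross_terms=True)
print(rows, labels, with_cross)
```

Because every row carries both ligand and target information, a single model can be trained across several related targets, which is what distinguishes PCM from an approach that models one target at a time.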
 
Overall, this thesis provides an in-depth look at computer-aided design for predicting 
anti-tubulin activities and at computational techniques for optimizing leads from naturally 
derived drugs specific for hookworm. The computer-aided techniques cover structure-based drug 
design studies, which include homology modelling, binding site identification, docking to 
potential binding pockets, virtual screening, toxicity prediction and scaffold analysis, as 
well as the construction of a support vector machine-based proteochemometric classification model. 
 
1.2 Problem Statement, Rationale and Overall Goal of the Study 
1.2.1 Problem statement 
A major problem with current anthelminthic treatments is drug resistance [16].  
The development of varying degrees of drug tolerance among different species of nematodes, 
including Necator americanus, has been widely reported. This is largely due to the 
frequent and unnecessary use of anthelminthic drugs or increasing drug pressure, especially 
in mass drug administration [2]. The increasing problem of drug resistance is a major 
concern because the older active drugs are becoming less effective, thereby drawing the 
attention of major stakeholders including the World Health Organisation (WHO) [17]. 
Drug development is a highly resource-intensive, time-consuming and expensive 
endeavour. Most research in hookworm drug discovery programs has employed low 
throughput methods to isolate, characterise and evaluate the anthelminthic activities of both 
synthetic and natural compounds. These techniques are expensive and laborious, and some 
isolated natural products, although they demonstrated anthelminthic activities, have not 
yielded the desired results for further drug development. Moreover, an exhaustive 
literature review found that very little research has been done in the search for 
natural products or naturally inspired products against the human hookworm, Necator 
americanus.   
 
Efforts in screening natural compounds, both in vivo and in vitro, involve the purchase of 
millions of compounds as libraries from pharmaceutical companies. There appears to be 
diminishing investment by pharmaceutical companies in therapeutic research areas owing 
to the prospect of drug failure that could lead to huge financial losses. This is because these 
compounds sometimes fail basic Absorption, Distribution, Metabolism, Excretion and 
Toxicity (ADMET) testing, and their cost is so high that it becomes 
an economic burden in drug discovery, thus serving as a rate-limiting step. Moreover, most 
debilitating diseases such as hookworm disease are endemic in resource-constrained 
countries which may not be able to afford the cost of newly developed drugs. The onus is, 
therefore, on all key stakeholders in these countries to identify novel potential 
anthelminthic agents, especially from natural products.  
 
1.2.2 Rationale of the study 
In view of the widespread and increasing resistance of helminths to current anthelminthic 
drugs, together with the high cost, complicated drug administration procedures and the 
high-risk, long-term nature of drug development, it has become important to identify 
alternative drug treatment methods and agents. This should be done by integrating more 
natural products into current drug development pipelines using computational strategies. 
Computational methods, including rational drug design, present advantages in the 
prioritisation of potential lead compounds for preclinical development within the drug 
design landscape. Computational methods have been used in hookworm drug discovery; 
however, the screening datasets need to be expanded with natural products or compounds 
with similar properties to natural products (naturally inspired). Limited use of 
computational techniques leads to the wholesale application of expensive laboratory 
techniques from drug screening to the pre-clinical trial level. 
 
In silico approaches that can be used include molecular docking, pharmacophore 
modelling, similarity search, Quantitative Structure Activity Relationship (QSAR) and an 
emerging area, Proteochemometric Modelling (PCM). However, these computational 
techniques have their successes and pitfalls. PCM has been shown in previous studies 
to be robust in predicting bioactivity profiles of untested compounds and protein targets 
[18–21]. Both molecular docking and PCM, however, have their limitations: molecular 
docking is sometimes challenged in terms of accuracy and speed, while PCM is unable to 
predict compound bioactivity for unrelated targets [21]. An exhaustive literature 
search found no reports on using PCM for anti-tubulin activity of chemical 
compounds and very few studies on the application of computational docking of natural 
products against beta tubulins of nematodes including hookworms [22]. Given the limited 
exploration of molecular docking of natural products against beta tubulins from hookworm 
and the unexplored implementation of PCM in the prediction of anti-tubulin activity, there 
is a need to create an integrated approach to anthelminthic drug discovery combining 
proteochemometric modelling and molecular docking for the prediction of novel active 
compounds that target tubulins. 
 
1.2.3 Overall goal of the study 
The main aims of this study were: (i) the application of computational methods in the 
identification of novel anthelminthic drugs from natural products, and (ii) the use of 
support vector machine-based proteochemometric modelling to predict the bioactivity 
profile of compounds against beta tubulin targets in hookworm.  
 
1.2.3.1 Specific objectives 
The specific objectives of the study are: 
1. Homology modelling of the 3D structure of beta tubulin of Necator americanus. 
2. Virtual screening of naturally derived compounds for the identification of potential 
anthelminthic agents. 
3. In silico evaluation of pharmacological, drug-likeness and toxicity profiles of lead 
compounds. 
4. Comparative analysis of scaffolds of the docked natural products and known 
synthetic anthelminthics, specifically albendazole and mebendazole. 
5. Preliminary exploration of proteochemometric based machine learning model as a 
plausible technique for bioactivity profiling of beta tubulin receptors. 
1.2.4 Expected outcome (Contribution to knowledge)  
It is expected that this research will accelerate hookworm drug design efforts by including 
and prioritising potential naturally derived lead compounds as alternative anthelminthic 
drugs. The support vector machine-based proteochemometric predictive model that was 
built in this research can be further explored by leveraging experimental bioassays of 
hookworm activity for the development of an enhanced model for predicting anthelminthic 
activity. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7 
 
CHAPTER 2 
LITERATURE REVIEW 
2.1 The Hookworm, Necator americanus 
Hookworm infection remains a significant health burden globally. The worms currently
infect over 700 million persons in resource limited countries and result in 135,000 deaths
annually [23, 24]. Areas largely affected by hookworm include South Asia, Latin America
and the Caribbean, the Middle East, and North Africa [24–26]. The morbidity and
mortality of helminths far exceed those of other tropical diseases including African
trypanosomiasis and dengue [27, 28]. Children and pregnant women are more susceptible.
In children, hookworm infection causes stunted growth and diminished physical fitness as
well as impaired memory and cognition. In pregnant women, it can also cause infant death.
According to the WHO, hookworm infection is classified as moderate when an individual
passes 2,000 to 3,999 eggs per gram (epg) of faeces and as heavy at 4,000 epg or more
[28, 29].
 
2.1.1 Life cycle  
Human hookworm infection is primarily caused by Ancylostoma duodenale and Necator
americanus [27], with the latter accounting for more than 85% of all hookworm infections
[23]. Hookworm gains access to a human host by penetrating the skin, undergoes growth and
development, and tends to reside mainly in the duodenum. The life cycle of N. americanus
begins with eggs embryonating in the soil where, under favourable conditions, the
first-stage larvae hatch and feed on environmental microbes. They then moult twice to become
infective third-stage larvae (IL3). These larvae penetrate the skin through the epidermis,
causing cutaneous larva migrans, and invade the human circulatory system as shown in
Figure 2.1. After skin penetration, they penetrate the pulmonary alveoli, migrate to the
pharynx through the bronchial tree and reach the small intestine, where they reside and
mature into adults. Adult hookworms attach to the intestinal lining and rupture
capillaries to gain access to blood. A female worm can release about 10,000 eggs which,
when excreted, may contaminate soil and water. The eggs hatch in the soil, releasing
larvae that pass through the larval stages before infecting a new host. Laceration of the
capillaries can lead to anaemia. The life cycle of the hookworm is illustrated in
Figure 2.1. Hookworms also play immunomodulatory roles, suggesting a mechanism by
which they could be used to suppress autoimmune, allergic and atopic dermatitis diseases
[31–33].
 
Figure 2.1. Life cycle of hookworm [32]. Hookworm eggs transferred through faeces hatch in the
faeces (1). The larvae that are released mature in the faeces and become infective L3 after
5 to 10 days (2-3). L3 penetrate the skin upon contact and migrate to the heart and lungs
through the blood vessels (4). L3 reach the pharynx through the pulmonary alveoli and
bronchial tree. The larvae then migrate to the small intestine (5) and attach themselves to
the intestinal wall.
 
2.1.2 Geographical distribution of nematode (hookworm) infections 
The most affected regions are in the tropics and subtropics, with A. duodenale and N.
americanus each somewhat geographically restricted [34, 35]. An understanding of the
epidemiology across countries and of trends over time is of great importance in designing
cost-effective mass drug administration programmes. In Ghana,
hookworm infection is seasonal, with a high prevalence between April and August [34].
The disease is co-endemic with Oesophagostomum bifurcum in Northern Togo and Ghana
with a 50% greater prevalence [35].
 
Figure 2.2. Global distribution of human hookworm infection [191]. Sub-Saharan
Africa and eastern Asia show the highest prevalence of hookworm infection.
 
2.2 Hookworm Drug Targets 
2.2.1 Beta-tubulin 
Beta tubulin is a subunit of the microtubule, which plays a crucial role in cell division and
maintenance of the cytoskeleton. It binds to two molecules of guanosine-5'-triphosphate
(GTP) at the positive end of microtubules [36]. Beta tubulin has so far been exploited as a
crucial target for anthelminthics and as a target for several other compounds [37]. Tubulin
can be selectively targeted by benzimidazole anthelmintics (including albendazole and
fenbendazole), which inhibit microtubule polymerization [38]. Benzimidazoles exert
their anthelminthic effect on susceptible nematodes by binding to beta-tubulin and
preventing microtubule polymerisation, which destabilises intracellular processes and
cellular division within the parasite and leads to overall immobility [39]. Glu198, Phe167
and Phe200, located within the colchicine binding site of hookworm beta tubulin, are
implicated in anthelmintic resistance [40]. Genetic changes such as single nucleotide
polymorphisms (SNPs) in beta tubulin have been widely reported to confer resistance
in several parasitic nematodes, including the human hookworm N. americanus.
These mutations have been found to occur at codons 167, 198 and 200 [40–42]. In many
cases, the changes lead to the substitution of phenylalanine with tyrosine at amino acid
positions 167 and 200, and of glutamate with alanine at amino acid position 198 [43].
These mutations or SNPs have been reported to be predominant in several
benzimidazole-resistant nematodes.
 
Benzimidazoles have also been used as fungicides, bacteriostatics, insecticides, antivirals
and anti-cancer agents due to their low toxicity in mammals. They are believed to bind close to
the colchicine-binding site on the beta tubulin molecule and disrupt microtubule function,
but the precise mechanisms of action of these anthelminthics are poorly understood [30, 44–46].
They have been enormously successful drugs, but their continued use as antiparasitics is
being threatened by the development of resistance [46].
 
2.2.2 Other potential targets 
Other potential drug targets include ion channels, which are pore-forming membrane
proteins and protein complexes that play an important role in electrical signalling and fast
synaptic transmission in cells [47]. Activation of ion channels makes them particularly
useful as targets for anthelmintics [48]. Examples of ion channels in helminths include
nicotinic acetylcholine receptors (nAchR), choline receptors, slo-1 K+ channels, latrophilin
receptors, voltage-gated Ca2+ channels and gamma-aminobutyric acid (GABA) receptors
[48]. Glutamate-gated chloride (GluCl) channels are also targets of macrocyclic lactones,
including some avermectin anthelmintics [49]. Cell signalling pathway targets such as G
protein-coupled receptors (GPCRs) are implicated in the pathology of many diseases,
including hookworm infection. Neuropeptides, which signal through G protein-coupled receptors,
are presumed to be either neurotransmitters or neurohormones involved in the regulation
of both the physiology and behaviour of nematodes [23]. Other targets include proteases [50],
kinases [51], hydrolases, and catalases, which are all required to help the adult worms feed on
blood. Single-domain serine proteases in N. americanus, for example, are potential drug
targets since they play a key role in immunomodulation.
 
2.3 Existing Treatment Methods and their Molecular Targets 
Efforts to clearly understand and combat hookworm disease date to 1916, when the
Department of Helminthology at the Johns Hopkins School of Hygiene and Public Health
was set up to combat hookworm using quantitative methods, thus providing a
framework for understanding the pathogenesis of the disease. These efforts, in addition to
many others [27], resulted in a better understanding of the disease and a renewed interest in
its control with chemotherapeutic agents.
 
Recent years have seen notable advances in several control and treatment methods for
improved therapeutic intervention. Mass Drug Administration (MDA) is currently used as
a strategy for treating hookworm, and it usually involves the combined administration of
benzimidazoles (albendazole or mebendazole) along with other drugs, including pyrantel
pamoate, imidazothiazoles (levamisole), oxantel and avermectin. It is, however, noteworthy
that drug treatment against hookworm, whether through MDA or other treatment strategies,
does not prevent re-infection. In addition, existing drugs are becoming significantly less
effective because their repeated and excessive use leads to drug resistance and treatment
failures [52]. The parasitic resistance to current anthelmintics has been attributed to
changes in drug translocation, receptor modification or post-receptor
modification, and mutation [4, 44]. Moreover, the mechanisms of action of some of these
drugs remain unknown despite their recurrent and frequent usage [53, 54]. Table 1.1
shows the most widely used anthelminthic drugs and their molecular targets.
 
Table 1.1 Existing therapeutic agents and their mechanism of action [2]. All known
anthelminthic drugs are being resisted by the nematodes.
Anthelmintic Group | Examples | Target | Issues
Benzimidazoles | Albendazole, mebendazole, fenbendazole | Nematodes, trematodes; β-tubulin | Resistance, Ineffective
Imidazothiazoles | Levamisole | Nematodes; nAchR agonists | Resistance, Ineffective
Tetrahydropyrimidines | Pyrantel, oxantel, morantel | Nematodes; nAchR agonists | Resistance, Ineffective
Amino-acetonitriles | Monepantel | Nematodes; choline receptor agonists | Resistance, Ineffective
Tribendimidine |  | Nematodes; nAchR agonist | Resistance, Ineffective
Spiroindoles | Derquantel | Nematodes; nAchR antagonist | Resistance, Ineffective
Macrocyclic lactones | Ivermectin, moxidectin | Nematodes; GluCl activation | Resistance, Ineffective
Piperazines | Piperazine | Nematodes; GABA receptor agonist | Resistance, Ineffective
 
2.4 Natural Products (NP) and their Utility as Anthelminthic Therapeutics  
Nature has given us abundantly rich sources of medicinal agents that are used to treat many
diseases. Some naturally derived drugs that have become forerunner drugs in modern
pharmaceutical care include quinine, cocaine, salicylic acid, digitalis, morphine,
penicillin, ergotamine, reserpine, paclitaxel, digoxin, cyclosporine, and vitamin A [55,
56]. Documented medical practice was recorded in the ancient Egyptian Ebers Papyrus
as far back as 1500 BC [57]. The papyrus listed over 700 drugs, most of
which were plant derived, with detailed formulations and uses [57]. The first natural product
to be isolated and analysed was morphine, in the 19th century. Thousands of natural products
described in that era are still relevant today, although some are no longer in use. Owing to
their unique active ingredients, natural products are an important cornerstone of the
pharmaceutical industry. They are known to possess enormous, exceptional structural and
chemical diversity unmatched by any synthetic library [10]. Analysis of their chemical
composition has revealed that more than 50% of them pass Lipinski's rule of five for
drug-likeness [58, 59]. The remainder of NPs, however, are characterised by higher
molecular weights and more rotatable bonds with desirably low logP values. This makes
natural products more absorbable than their synthetic counterparts. Interestingly, about 40%
of the chemical scaffolds found in natural products are not represented in today's drugs
[60]. According to the Scripps Research Institute (2016) [61], from the 1940s to date, 131
(74.8%) out of 175 small molecule anticancer drugs are naturally inspired. Similarly,
half of the 20 small molecule New Chemical Entities (NCEs) approved in 2010 are natural
products [62]. The efforts geared towards exploring natural products as
anti-cancer lead compounds must be extended to drug discovery for infectious diseases such
as hookworm. Ghana, like the rest of the African continent, is endowed with vast natural
flora and fauna that could be exploited to identify new natural product derived lead
compounds.
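
The rule of five check referred to in this section can be sketched in a few lines of Python. This is only a minimal illustration, assuming the four descriptors have already been computed with a cheminformatics toolkit. The class name, function names and descriptor values are hypothetical, and the tolerance of one violation is the commonly quoted convention rather than a setting taken from this study.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed descriptors for one compound (values are supplied by the caller)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds, for illustration only.
small_ligand = Descriptors(mol_weight=265.3, logp=2.9, h_bond_donors=2, h_bond_acceptors=4)
large_ligand = Descriptors(mol_weight=820.0, logp=6.1, h_bond_donors=3, h_bond_acceptors=12)

print(passes_rule_of_five(small_ligand), passes_rule_of_five(large_ligand))
```

Counting violations, rather than returning a single yes or no, makes it easy to see why a large natural product fails the filter.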
 
2.4.1 Contribution of natural products to anthelminthic therapy 
Some natural products have been used for many years for the treatment of human
hookworm infection. Chenopodium oil, obtained from Jerusalem oak (Chenopodium
ambrosioides), is one example. It contains about 60% of a
terpene peroxide known as ascaridol [63]. It is, however, reported to have many side effects
and is not recommended for use in children and pregnant women. A few natural products
have recently become the fulcrum around which alternative therapies are being developed.
Two of these are extracts of the plants Dalea ornata and Oemleria cerasiformis,
which have shown potential anthelminthic activity against Ancylostoma
ceylanicum [64]. Another example comes from Ghana,
where researchers from the Noguchi Memorial Institute for Medical Research and the
chemistry department have found residual anthelminthic activity in a group of dichapetalin
compounds, notably dichapetalin A [6]. This shows how natural products can be exploited
as anthelminthics aside from the well-known benzimidazoles.
 
 
2.4.2 Other naturally derived compounds effective against hookworm 
Halogenated hydrocarbons such as carbon tetrachloride and hexachloroethane, among others,
have long been recognised to possess varying degrees of anthelminthic activity.
However, only a few of these compounds have been used due to their wide range of side
effects [63]. For example, Tetrachloroethylene (Nema®, Tetracap®), which has been
used since 1925 as a human anti-hookworm drug, has many side effects including anaemia,
somnolence, dizziness and headache. The phenols and their derivatives have also been
shown to have marked activity against hookworm, although they are no longer used in
clinical practice [63]. Some of the phenolic compounds that have been used include
1-Bromo-beta-naphthol (III), Hexylresorcinol (IV), 2,4,5-Trichlorophenol (V) (Ranestol®),
2,6-Diiodo-4-nitrophenol (VI) (Disofen) and diospyrol (X) [63].
 
2.5 Computer-Aided Drug Design (CADD) 
The traditional approach of high-throughput screening (HTS) in the late 1990s has paved
the way for modern drug discovery techniques that comprise ligand and structure based
drug design in the search for lead compounds as drug candidates. Modern computational
drug development techniques are advantageous because considerably less time and effort
need to be invested in screening and searching for
promising drug candidates compared to traditional drug development. Especially important
is the use of computational tools for identifying potential candidates through virtual
screening to filter out inactive and toxic compounds before performing clinical evaluation.
 
2.5.1 Economic significance and time factor of CADD 
The process of designing a drug, meeting the strict regulatory requirements of the Food
and Drugs Authority (FDA) and obtaining marketing authorization for use in humans
requires a lot of research, money and time [1]. This normally requires screening millions
of compounds to obtain a drug candidate, which is then taken through many years of
experimental testing and pre-clinical studies. In silico methods such as virtual screening of
small compounds against drug targets have proven advantageous over in vivo or in vitro
experimental methods in terms of cost, effort and time by significantly decreasing the
number of compounds and retaining only lead hits for further HTS.
 
2.5.2 Steps involved in CADD
The availability of public libraries of compounds, bioactivity data and target databases has
aided the application of in silico techniques to predict potential lead compounds and their
binding affinities for therapeutically interesting targets. These techniques succeed in
their predictions by using well-performing classification models and algorithms encoded
within docking programs [1]. The drug discovery pipeline generally involves target
identification, target validation, virtual screening, lead identification and subsequent
optimization [1, 65, 66]. Figure 2.3 shows the drug design pipeline. Target identification
is the process in which drug targets are identified through literature review and by searching
databases that hold experimental results [66]. In target validation, identified
targets are compared based on their association with each other, their
effect on the behaviour of diseased cells and their interaction with metabolites in the
body. Lead identification involves identifying compounds with plausible potential
to treat a disease, referred to as lead compounds, often using molecular docking
techniques. Molecular docking of drugs to large libraries of proteins is also capable of
identifying potential targets. Hits are generally compounds that exhibit favourable activity
in the virtual screening process [67, 68]. Virtual screening uses computational approaches
such as molecular docking to search for potential compounds against the target protein.
Hits with good binding affinities, as represented by
low binding energies, are optimized by structural modification (normally using QSAR
methods) to obtain improved potency and pharmacokinetic properties and, desirably,
reduced toxicity [69]. The major approaches to current drug design are (i) structure based
and (ii) ligand based [67, 71]. In both cases, a library of compounds is virtually screened
against the target of interest. Virtual screening of potential leads generates several
conformations of the complex with different inhibitors and can provide valuable
insights into the mechanism of interaction between lead and receptor. It can also
help predict the occurrence of resistance, identify new binding sites and potential targets,
and support the design and optimisation of lead compounds as therapeutic agents. Some CADD
studies also employ a technique generally referred to as drug repurposing, whereby drugs known
to be efficacious against one disease are tested against other diseases [71].
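
To make the link between low binding energy and good binding affinity concrete, the sketch below ranks hits by docking score and converts a binding free energy into a dissociation constant with the standard relation ΔG = RT ln(Kd). It is an illustration under stated assumptions: the compound names and scores are hypothetical, and treating a docking score as a true free energy is itself an approximation, since scoring functions only estimate it.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # gas constant R in kcal/(mol*K)


def dissociation_constant(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy to Kd (in mol/L) using dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def rank_by_binding_energy(scores):
    """Sort (compound, energy) pairs so the most negative energy comes first."""
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical docking scores in kcal/mol, for illustration only.
docking_scores = {"compound_a": -7.2, "compound_b": -9.1, "compound_c": -5.4}

for name, energy in rank_by_binding_energy(docking_scores):
    print(name, energy, f"{dissociation_constant(energy):.2e} M")
```

Because Kd depends exponentially on the energy, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in predicted affinity.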
 
Figure 2.3. The drug design pipeline. Targets can be identified using data mining tools and
databases. Depending on the availability of target structures, a structure based or ligand based
approach is employed for hit identification. Hits become leads after prediction of their
pharmacokinetic properties such as drug likeness and toxicity. Quantitative Structure-Activity
Relationship (QSAR) and Quantitative Structure-Property Relationship (QSPR) methods are used
for lead optimisation before leads are experimentally characterised in pre-clinical trials
and subsequently taken to the market.
  
These in silico approaches, among others, are clearly advantageous in pre-filtering
drugs with potentially low in vivo activity that do not show an ideal pharmacokinetic profile
and mode of action [69]. CADD can be used to provide more insight into the bioactivity
profile of experimentally determined drugs and be employed to potentially predict
additional drug targets implicated in various disease mechanisms and biological pathways.
Its effectiveness in expediting drug discovery has been recognized for decades, and the
exploration of natural products through CADD is no exception.
 
2.5.3 Structure based drug design  
Structure based drug design (SBDD) is employed when the structure of the target protein
is available. With an exponential increase in the number of protein structures deposited in
the Protein Data Bank (PDB, www.rcsb.org/pdb/), the volume of research using SBDD
has increased significantly [72]. The process of SBDD can be summarised in three
steps with the main goal of finding potential ligand binders to targets. The first step is target
selection. There are several protein databases with extensive information about proteins whose
structures have been solved experimentally by either X-ray crystallography or Nuclear Magnetic
Resonance (NMR). Notable among these databases are the Protein Data Bank and UniProt
(www.uniprot.org). If the target structure has not been solved experimentally, it is obtained
computationally using homology or comparative modelling. Homology modelling is a
technique used to construct a three-dimensional (3D) model of a target of unknown structure
based on the structure of a suitable homologous template. It comprises four
steps: (i) search and identification of template, (ii) alignment of target to template sequence,
(iii) construction of models and (iv) model quality evaluation [73].
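
Template selection in steps (i) and (ii) usually relies on how similar the target and template sequences are once aligned. The sketch below computes percent sequence identity from two pre-aligned sequences. It is a minimal illustration: the function name and the short sequences are hypothetical, gaps are written as '-', and dividing by the number of aligned (non-gap) positions is one of several conventions in use.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_pairs = [
        (a, b)
        for a, b in zip(aligned_target.upper(), aligned_template.upper())
        if a != "-" and b != "-"
    ]
    if not aligned_pairs:
        return 0.0
    matches = sum(1 for a, b in aligned_pairs if a == b)
    return 100.0 * matches / len(aligned_pairs)


# Hypothetical aligned fragments, for illustration only.
target = "MREIV-HIQ"
template = "MREIVAHLQ"
print(percent_identity(target, template))
```

In practice the alignment itself would come from a dedicated alignment program; this function only summarises an alignment that already exists.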
 
Beyond that, there are several machine learning techniques that can be used to predict the
secondary structure of the target even before constructing the 3D model. For example, the PHD
predictor [74] uses a neural network that takes evolutionary information from multiple
sequence alignments to predict beta strands, PSIPRED [75] uses a feed-forward neural
network based on PSI-BLAST [76] outputs, and HHpred [77] uses hidden Markov
models for homology detection. Figure 2.4 shows the structure
based drug design pipeline.
 
Using the obtained structure of the target, the second step is determining possible binding
sites within the receptor. Several computational tools predict the binding sites
and/or druggable regions of targets, including Fpocket [78], DoGSiteScorer
[79], PRANK [80] and MetaPocket [81]. These programs employ machine learning
algorithms to identify cavities/pockets and/or “druggable” regions. For example,
SitePredict [82] uses Random Forest algorithms to predict binding sites using information
about the solvent accessible surface area, pocket volume, pocket principal components and
nearby residue pair count.
 
An important third step, after determining the druggable sites, is to study the structure of
the ligand-target interactions. Here, the intermolecular interactions, binding conformations and
conformational changes induced by ligands are studied. The most widely used technique for
studying binding conformations is molecular docking, which explores receptor-ligand
interactions and the conformations of residues in the binding pocket. Potential bioactive
ligands are then identified, purchased and subjected to various pre-clinical biological tests.
 
Figure 2.4. Structure based drug design [83]. With the availability of a target of
interest, molecular docking or pharmacophore mapping (similarity searches by
fingerprints or topology using ligand models) can be used for hit identification in SBDD.
Potential leads from either molecular modelling or ligand modelling can be further
characterised experimentally.
 
2.5.4 Ligand based drug design  
This technique is used when structural information about the biological target is not
available, that is, when the structure has not been experimentally determined and cannot be
homology modelled. Most ligand based drug design methods involve similarity searching and the
construction of classification models using multivariate statistical analysis [84]. Similarity
searches normally employ two-dimensional (2D) and three-dimensional (3D) descriptors. 2D
descriptors include molecular fingerprints, topological descriptors and molecular
properties, whereas 3D descriptors may be molecular shapes and MACCS fingerprints [85].
The goal of similarity searches is to calculate a similarity index (Tanimoto, Dice or Tversky
coefficients), or a fitting score in the case of 3D descriptors such as QSAR models, based
on available information on active compounds in order to rank unknown compounds [86]. Ligand
based drug design is generally faster and has a lower computational cost than structure based
drug design approaches. Virtual screening using the ligand based approach is founded on the
principle that compounds that share structural similarities have similar biological activities
[87]. Several programs have been developed for ligand based virtual
screening; LigandScout is one example [88]. Figure
2.5 illustrates ligand based drug design.
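
The three similarity indices named above can be written directly in terms of fingerprint bits. The sketch below is a minimal illustration in which each fingerprint is represented as a Python set of "on" bit positions. The bit sets are hypothetical, and a real workflow would generate fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Shared bits divided by all bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Twice the shared bits divided by the total bits set in both fingerprints."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def tversky(a, b, alpha=0.5, beta=0.5):
    """Asymmetric index; alpha and beta weight the bits unique to a and to b."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def rank_by_similarity(query, library):
    """Return library entries sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints given as sets of 'on' bit positions.
known_active = {1, 4, 7, 9}
library = {"candidate_1": {1, 4, 7, 12}, "candidate_2": {2, 9, 15}}
print(rank_by_similarity(known_active, library))
```

With alpha = beta = 1 the Tversky index reduces to the Tanimoto coefficient, and with alpha = beta = 0.5 it reduces to the Dice coefficient, which is why it is often described as a generalisation of both.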
 
 
 
 
 
 
 
 
Figure 2.5 Ligand based drug design [83]. This involves a similarity search for ligands that bind to
the same binding site in target molecules, represented using pharmacophore models. After leads have
been identified, they are usually optimised by modification of their moieties.
 
 
 
2.5.5 Molecular docking and virtual screening 
Molecular docking studies are used to determine protein-protein and/or protein-ligand
interactions and evaluate their binding affinities. The two most widely used approaches in
SBDD are protein-ligand docking and virtual screening. Most docking programs
rely on conformational sampling of the protein in complex with potential ligands, or on
predicting the probability of activity of the protein receptor with several
small compound ligands [14]. The protein-ligand interactions are modelled using different
approximations/objectives: force fields, search and optimization algorithms, and
scoring functions [14]. The degrees of freedom of both ligand and target are also considered
in molecular docking simulations. In most cases, it is desirable for the receptor to be less
flexible and the ligand more flexible to aid the docking simulations and avoid producing
false positive results [89]. The scoring function and the optimization/search algorithm are
used to evaluate the performance of a docking simulation, and they serve as the major
determinants of the efficiency of the docking algorithm. An efficient docking algorithm is
generally considered to be one that has a good and fast scoring function and a good search
or optimization function [14]. Most docking programs rely on an energy scoring function
to evaluate the quality of a docking simulation, and they all have similar
accuracies, making the energy scoring function an ideal choice for evaluating docking
simulations [89, 90]. One of the challenges faced by most docking programs is the
susceptibility of the receptor to conformational changes [90]. Regardless of any
energy minimisation technique employed to reduce conformational changes
within targets, accounting for the complete flexibility of the targets used in a docking study
remains the major challenge faced by most docking programs. Some docking programs have addressed
this issue by allowing partially flexible targets [91]. The approach normally used is a
combination of Monte Carlo methods with molecular dynamics, simulated annealing and
others. Popular molecular docking programs include GSADOCK [92], Glide [93], Fred [94],
AutoDock [95], AutoDock Vina [96], GOLD [97] and FlexX [98]. GOLD and AutoDock,
for example, use genetic algorithms as their search or optimization function. AutoDock Vina
uses a hybrid scoring function that combines knowledge-based and empirical scoring
functions [96].
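
The interplay between a scoring function and a stochastic search can be illustrated with a toy Monte Carlo search under a cooling temperature schedule (simulated annealing). This is a sketch under strong assumptions and not the algorithm of any docking program named above: the "pose" is just a pair of coordinates, `toy_score` is a hypothetical energy surface with a single minimum, and all parameter values are arbitrary.

```python
import math
import random


def toy_score(pose):
    """Hypothetical energy surface with its minimum (-8.0) at pose (1.0, -2.0)."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 - 8.0


def anneal(score, start, steps=5000, t_start=5.0, t_end=0.01, step_size=0.5, seed=0):
    """Metropolis Monte Carlo search with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for i in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = tuple(c + rng.uniform(-step_size, step_size) for c in current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best_pose, best_energy = anneal(toy_score, start=(5.0, 5.0))
print(best_pose, best_energy)
```

Accepting some unfavourable moves at high temperature lets the search escape local minima, while the falling temperature gradually turns it into a local refinement around the best-scoring pose.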
 
The protein-ligand docking described in this section predicts the probability of interaction
of a ligand with a target but does not provide the pIC50 or pKi values that are obtained from
experimental bioassays. Beyond the approaches employed by various docking programs to overcome
the challenge of conformational changes inherent in proteins, there is still a need to improve
on the performance of docking programs by resorting to other approaches in which the results
of bioactivity prediction depend directly on activity values expressed as pIC50 or pKi.
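
For reference, pIC50 and pKi are the negative base-10 logarithms of the IC50 and Ki expressed in mol/L. The sketch below shows the conversion, assuming the bioassay value is reported in nanomolar. The function name and the example values are hypothetical.

```python
import math


def to_p_activity(value_nanomolar):
    """Convert an IC50 or Ki given in nM to pIC50 or pKi (-log10 of mol/L)."""
    if value_nanomolar <= 0:
        raise ValueError("activity value must be positive")
    return -math.log10(value_nanomolar * 1e-9)


# Hypothetical assay values in nM, for illustration only.
for ic50_nm in (10.0, 1000.0, 50000.0):
    print(ic50_nm, round(to_p_activity(ic50_nm), 2))
```

A higher pIC50 therefore means a more potent compound, and each unit corresponds to a tenfold change in IC50.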
 
2.5.6 Proteochemometric modelling (PCM) 
PCM is a quantitative bioactivity prediction technique that is used to predict the bioactivity
of compound-target pairs, usually reported as pIC50 or pKi values that come directly
from experimental bioassays and are taken as the true binding [99]. PCM uses information on
compounds and related targets in the construction of a single machine learning model [99],
allowing the simultaneous prediction of compound affinities across multiple targets. In terms
of cost, PCM is not as computationally expensive as molecular docking. However, PCM is limited
by its inability to account for the bioactivity of compounds against unrelated targets [21].
In PCM, the descriptions of both the ligand and the protein, and an additional term called the
ligand-protein cross term, can be correlated with the binding interactions. To create a PCM
model, the binding interactions of a series of ligands and targets are needed to train the model
to enable the prediction and exploration of known targets. In PCM, compounds can be
described by structural descriptors including molecular fingerprints, topological
descriptors, geometrical descriptors and three-dimensional grid-independent descriptors
(GRINDs) [99]. Receptors can be described by calculating their
amino acid sequence compositions. PCM can be applied whether 3D information on the
target is available or not. PCM, like QSAR, draws on a wide range of machine learning
techniques (including both linear and non-linear methods) to develop models.
Proteochemometric modelling is used to alleviate the limitations associated with QSAR.
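
How ligand descriptors, protein descriptors and cross terms come together in a single PCM input row can be sketched as follows. This is a minimal illustration under stated assumptions rather than the descriptor set used in this study: the ligand descriptor is a short hypothetical bit vector, the protein is described by its amino acid composition, and the cross terms are simple pairwise products.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


def cross_terms(ligand, protein):
    """Pairwise products of ligand and protein descriptors (one possible cross term)."""
    return [l * p for l in ligand for p in protein]


def pcm_features(ligand_descriptors, sequence, use_cross_terms=True):
    """Concatenate ligand, protein and optional cross-term descriptors into one row."""
    protein = amino_acid_composition(sequence)
    features = list(ligand_descriptors) + protein
    if use_cross_terms:
        features += cross_terms(ligand_descriptors, protein)
    return features


# Hypothetical ligand bit vector and protein fragment, for illustration only.
row = pcm_features([1.0, 0.0, 1.0], "MREIVHIQ")
print(len(row))
```

Each compound-target pair yields one such row, paired with its measured pIC50 or pKi, and the resulting table can then be passed to a learner such as a support vector machine.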
 
 
 
 
 
 
 
 
 
 
  
Figure 2.6 Proteochemometric modelling [102]. The bioactivity profiles of multiple
targets and ligands are used to construct a single model that can be used to predict the
bioactivity of untested target-ligand pairs, say target A and compound 2.
 
2.5.6.1 Advantages of PCM over QSAR 
Quantitative structure activity relationship (QSAR) modelling, normally employed in ligand
based drug design (LBDD), has over the last decades been one of the mainstream computational
methods, in addition to molecular docking, in the search for viable lead compounds. The basic
assumption underpinning the success of QSAR is that compounds that share similar chemical activity
should share similar targets, and targets sharing similar ligands should share similar
properties [87]. It is a method used to quantitatively determine the relationship
between the structure and biological activity of a compound using statistical analytical
methods. With QSAR, a model of the output variable can be constructed from
computed molecular descriptors using statistical methods [100]. There are, however, some
notable drawbacks and limitations with QSAR. One drawback is that it
only considers the interaction of groups of compounds with a single target and thus
requires sufficient data about that target before a meaningful model can be
constructed, which is rarely the case when searching for hits for previously
identified targets. Conventional QSAR approaches are limited in terms of finding new
ligand classes or binding interactions for a set of new compounds. This is because, in the
strictest sense, the binding of multiple ligands to targets is determined not only by chemical
structure but also by the interacting binding residues. A further pitfall of QSAR is that it
is not able to describe all aspects of binding interactions when the model was
trained on descriptors of a certain class of compounds. That is, it will fail to predict anything
outside its applicability domain.
 
PCM outperforms QSAR in many ways and these findings are corroborated by many
literature reports [21, 15]. One of the main advantages of PCM over QSAR is that it
models not only similar targets but also dissimilar ones, allowing scientists to exploit the
extensive applicability domain of PCM for highly distant targets [99]. In terms of
bioactivity, however, models that are built using PCM techniques are difficult to interpret
when the dataset covers unrelated targets. In silico prediction algorithms that have been
used over the years include Naïve Bayesian classifiers, Support Vector Machines (SVM),
neural networks, Random Forests (RF), and regression analysis.
 
2.5.6.2 Application of machine learning in PCM 
Machine learning algorithms employed in PCM include SVM, Naïve Bayesian classifiers,
and decision tree algorithms. Prior to constructing PCM models, the data should be
pre-processed as described by Andersson et al. [101] and van Westen et al.
[100]. Following that, chemical and protein descriptors are calculated, feature selection
is performed on them, and models are subsequently constructed. Three popular machine
learning algorithms are discussed here, namely SVM, RF and Gaussian Processes.
 
2.5.6.2.1 Support Vector Machine (SVM) 
Support vector machines are a group of non-linear machine learning techniques that have
gained a lot of popularity in PCM [102]. They are used for
classification and/or regression and use linear or non-linear kernel functions to project
data into a high-dimensional feature space [103]. SVMs are able to produce
high-performance models and to deal efficiently with large datasets in high-dimensional
space [103, 104]. Interpretability is normally the major challenge with SVMs, but
model accuracy can be improved by tuning the so-called hyperparameters, the
most important being the kernel function parameter, γ, and the error penalty parameter, C.
SVMs generally use kernel methods internally, with Radial Basis Function (RBF) kernels
being the most dominant [99]. RBF kernels have been shown to produce reliable results in
PCM. Wu et al. [106] improved the mapping power of their PCM
models for a set of histone deacetylases (HDACs) by using a Pearson function-based
Universal Kernel (PUK). Various authors have applied different variants of the classical
SVM including Dual Component SVMs (DC-SVM), Transductive SVMs and
Relevance Vector Machines (RVMs). DC-SVM based PCM models were shown to outperform
classical SVM based QSAR models [102]. Notable is the RVM, where the authors demonstrated
how well it performed by employing binary classifiers trained on a dataset from the
MDL Drug Data Report (MDDR) database and concluded that it should be applied in future
PCM studies [102]. SVMs have thus proven to be a useful algorithm in several
PCM studies.
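
To make the role of the kernel parameter γ concrete, the RBF kernel between two descriptor
vectors can be sketched as below. This is only an illustration: the function name and the toy
vectors (standing in for concatenated compound and target descriptors) are assumptions and
are not taken from the cited studies.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical descriptor vectors for two compound-target pairs.
pair_a = [1.0, 0.0, 0.5]
pair_b = [1.0, 1.0, 0.5]

# A larger gamma makes the similarity decay faster with distance.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(pair_a, pair_b, gamma))
```

A small γ treats distant pairs as still similar (smoother decision boundary), whereas a large
γ makes the model sensitive to only very close neighbours, which is why γ is tuned together
with C.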
 
2.5.6.2.2 Random Forest (RF) 
Random Forests form a group of non-linear machine learning techniques which have
a comparable performance to SVM [107]. An RF is an ensemble of decision trees, each
comprising nodes and branches. Each node represents a point where the dataset is divided
based on a selected attribute value so that instances of different classes are moved to
different branches. Classification by each tree proceeds from the root node along the tree
to a leaf node. The collective result of all trees is used as the final
classification. Unlike SVMs, RFs involve relatively short training times with less
hyperparameter tuning. Although highly interpretable, RF suffers from its inability to output
error estimates, which are tremendously important given the level of error and noisy
annotations associated with public bioactivity databases [108]. This is normally addressed by
applying Quantile Regression Forests (QRF), which are based on quantile inferences from the
conditional distribution of the class variable [108].
 
2.5.6.2.3 Gaussian Process 
Gaussian Processes (GP) are a group of kernel-based non-parametric machine learning
methods based upon a Bayesian framework. As there are huge concerns with errors, the
so-called “noise”, in bioactivity databases arising from data curation and experimental
inaccuracies, GP aims to address these concerns by constructing probabilistic models using
the uncertainties contained in the data as input [89]. For a given compound-target
combination, the GP predicts a Gaussian distribution whose variance defines a
confidence interval and serves as a measure of the distance of the compound-target pair to
the training set. GP models can generally be validated by the conventional statistical
metrics, such as the square of the correlation coefficient (R² or Q²) [108, 109], but also have
internal validations and assessments. GP has seen many applications in the chemogenomic
space [110 – 112]. The downside of GPs, however, is the longer training time due to the O(N³)
time complexity of the algorithm [114].
 
2.6 Recent Efforts in Hookworm Drug Discovery 
Herein, previous efforts geared towards hookworm drug discovery are enumerated. A new
cysteine protease inhibitor was discovered as an oral single-dose anthelmintic that is active
in an animal model of hookworm infection and has a mechanism of action distinct from current
anthelmintics [115]. Drug repositioning and pharmacophore identification
were utilised in the discovery of hookworm MIF inhibitors by targeting AceMIF [116].
A library of about 1600 FDA-approved compounds was screened against laboratory models
of human intestinal nematode infections. The hits identified were suggested to serve
as a starting point for drug discovery for soil-transmitted helminths [117]. Also, lead
chemotherapeutic agents from medicinal plants were identified against blood flukes and
whipworms [118]. As reported elsewhere [119], a set of compounds known to
show activity against parasitic nematodes was collated from various literature sources
including PubChem, while the inactive dataset was retrieved from the DrugBank database
based on a Tanimoto cutoff range of 0.25 to 0.75. An SVM algorithm was used to construct
a model and stratified 10-fold cross-validation was used to evaluate the performance of
each classifier using the radial basis function kernel. An accuracy of 81.79% was achieved
for the model when an external independent test set was applied. The model was then used
to identify novel compounds with potential
anthelmintic activity. In another work, Ponce-Marrero et al. [120] used linear discriminant
analysis to obtain a quantitative model for the classification of anthelminthics and
non-anthelminthics. This approach resulted in a model that correctly classified 88.18%
of the compounds in an external test set. Virtual screening was used to validate the
performance of the model, where it identified several compounds annotated as
anthelminthic in the Merck Index and Negwer’s handbook. Train-Match-Fit-Streamline
(TMFS), a novel rapid computational proteochemometric method, was used to map new
interaction space and new drug targets. The method combined shape, topology and
chemical signatures, including docking score and functional contact points of the ligand,
to predict potential drug-target interactions. Extensive molecular fit computations were
performed on 3,671 FDA-approved drugs across 2,335 human protein crystal structures.
The algorithm predicted drug-target associations with 91% accuracy for most drugs. Over
58% of the known best ligands for each target were correctly predicted as top ranked [121].
Furthermore, the TMFS method was used to discover that mebendazole had the structural
potential to inhibit VEGFR2. In another work [120], a support vector machine approach was
employed to predict compounds active against parasitic nematodes, suggesting the
importance of employing computational approaches for anti-parasitic drug discovery. The
method presented an alternative approach to the existing traditional methods and may be
useful for predicting hitherto novel anthelmintic compounds.
 
 
 
 
 
 
 
CHAPTER 3 
METHODS 
 
This chapter presents the methods used: homology modelling of the protein of interest,
molecular dynamics simulations, virtual screening, Absorption, Distribution, Metabolism and
Excretion (ADME) and toxicity predictions, and scaffold analysis of the most favourable
docked compounds. The proteochemometric modelling techniques for
anti-tubulin bioactivity are also presented. The entire workflow from homology
modelling to scaffold analysis is shown in Figure 3.1 and the details are explained
subsequently.
 
3.1 Template Identification and Homology Modelling of Proteins 
A search of the PDB (http://www.rcsb.org/) revealed that the tertiary structure of none of the
beta tubulins of N. americanus was publicly available. The primary sequence of the beta
tubulin protein with Gene ID: NECAME_01536 was retrieved from UniProt [122]
(accession number: W2T758; length: 449 amino acids). The sequence was submitted for
template and binding site identification via the Iterative Threading ASSEmbly Refinement
(I-TASSER) server [123]. The I-TASSER server is an online server for automated protein
structure prediction and structure-based function annotation. I-TASSER predicted a
number of plausible templates. The D chain, which is a subunit of the multimeric
structure of tubulin tyrosine ligase (T2R-TTL) (PDB ID: 5c8y), was selected as the
most plausible template based on the presence of amino acid residues associated with
nematode resistance in the binding site as found by I-TASSER, as well as a high sequence
identity between the target and the template.

Figure 3.1. Workflow from protein modelling to scaffold analysis. The target of interest is
modelled using homology modelling. The model is subjected to molecular dynamics simulation
and binding site identification. Virtual screening of small compounds from AfroDB and the
Northern African Natural Product Database (NANPDB) against the target is used for the
identification of potential lead compounds. Binding affinity scores are used for ranking the
docked compounds. The pharmacokinetic and pharmacological properties of the top-ranking
compounds are identified by predicting their ADMET properties. Scaffold analysis is used to
compare the scaffold diversity and/or similarity between the top-ranking natural products
and anthelminthics.

The crystallographic structure of tubulin tyrosine ligase was
downloaded from PDB (PDB ID: 5c8y, resolution: 2.59 Å) and used as a template in
modelling. MODELLER [124] is software used to generate three-dimensional
models of proteins, which are known as homology models. MODELLER aligns the target
sequence with the template structure and builds 3D models based on the target/template
alignment. The major characteristic of MODELLER is the extraction of spatial restraints
such as Cα-Cα distances, backbone dihedrals (φ/ψ), sidechain dihedrals and van
der Waals contacts from the template, which are applied to the target sequence to generate
the modelled target protein [124]. The align2d function in MODELLER v.9.16 was used to
align the sequence of the target with the template (files in Appendix A). Once a
target-template alignment was constructed, MODELLER 9.17 was used to compute five
candidate 3-D models of the target using the whole sequence of the target protein. The best
model was selected based on the lowest value of the MODELLER 9.17 objective function or the
Discrete Optimized Protein Energy (DOPE) and a high GA341 score [124]. DOPE and
GA341 are in-built assessment scores used to assess the quality of the protein models
generated by MODELLER. Protein models constructed using homology modelling
often contain unfavourable bond lengths, bond angles, torsion angles and contacts.
The model was therefore refined to fix steric clashes and bumps by submitting it
in protein data bank (pdb) format to the WHAT IF server [125], and energy minimized using
Swiss-PdbViewer 4.10 [126] to correct local bond and angle geometry and to relax close
contacts in the chain. The WHAT IF server implements the WHAT_CHECK
program to check and fix steric clashes based on the overlap of two
non-bonded atoms, with the distance cutoff set at 0.4 Å [125]. Swiss-PdbViewer is an
application that can be used for visualization, homology modelling and 3D structural analysis
of proteins, and it includes the GROMOS 43B1 force field for minimization of protein
structures. The refined model was visualized using the educational version of PYMOL 1.74
[127] and further subjected to molecular dynamics simulation as described in subsequent
steps. PYMOL is open-source software for interactive visualization and analysis of
molecular structures.
 
 
3.1.1 Model assessment and refinement 
Further assessment of the selected best model was done by generating a Ramachandran
plot [128] using the PROCHECK 3.5.4 software. Other programs such as ERRAT,
VERIFY3D and Qmean [128, 129] were used to corroborate the PROCHECK results.
Homology models of proteins are usually subject to prediction errors. Therefore,
PROCHECK 3.5.4 was used to assess the stereochemical quality of the three-dimensional
homology model. PROCHECK [131], a suite of C and Fortran programs, provides a way to
check the stereochemistry of a protein through a detailed residue-by-residue listing and an
assessment of the overall quality of the structure compared with refined structures produced
at the same resolution.
 
3.2 Molecular Dynamic Simulations of Modelled Protein 
The modelled structure of the tubulin may show good accuracy, but to use it for virtual
screening it is required to show good molecular dynamics behaviour as well. To evaluate
the stability and folding, and to obtain insights into the conformational changes as well as
the dynamics of the modelled protein in solution, a 1 nanosecond (ns) molecular dynamics
(MD) simulation was performed. The MD simulation of the modelled tubulin
receptor was carried out with the Linux version of the GROMACS 5.1.4 [132] software
package, employing the GROMOS96 43a1 force field and the extended Simple Point Charge
(SPC/E) water model, selected by passing the “-water spce” option. The modelled structure
was first immersed in a periodic water box of cubic shape (1 nm thick). After solvating the
receptor, the net charge on the protein was +8e. The genion command in GROMACS was used
to add 8 Cl⁻ ions to neutralise the net charge on the protein. Electrostatic energy was
calculated using the particle mesh Ewald method with a computational load of 0.19. The
cutoff distance for the calculation of the Coulomb and van der Waals interactions was 1.0 Å.
The cutoff scheme used was Verlet.
 
After energy minimization using the steepest descent algorithm for 50000 steps, the system
was subjected to equilibration at 300 K and normal pressure for two picoseconds (ps) under
position restraints for backbone atoms. Linear Constraint Solver (LINCS)
constraints were applied to all bonds, keeping the whole protein molecule fixed and
allowing only the water molecules to move and equilibrate with respect to the protein
structure. The final molecular dynamics calculations were performed for 1 ns under the
same conditions. The results were analysed using GROMACS 5.1.4 [132] and the GRACE
5.1.4 [190] plotting software, invoked with the command xmgrace in a Linux terminal. The
stabilised receptor file in gro format was loaded as frames and saved in pdb format using
Visual Molecular Dynamics (VMD) version 1.9.3 [133]. VMD is a molecular graphics
program for the visualisation of molecular structures.

3.3 Prediction and Analysis of Binding Site
The potential colchicine binding site of the receptor, containing the amino acid
residues of interest, was predicted with the MetaPocket 2.0 server [134], complemented with
the Computed Atlas of Surface Topography of proteins (CASTp) server [135], and analysed
with the AutoDock/Vina v2.2.0 plugin [136] in the educational version of PYMOL 1.74 [127]
before undertaking molecular docking. MetaPocket and CASTp are online servers for the
prediction of ligand-binding sites.
 
3.4 Protein Preparation 
AutoDockTools version 4.2.6 [137] was used to prepare both receptors and ligands.
Gasteiger charges were calculated, polar hydrogens were added and non-polar hydrogens
were merged using AutoDockTools. All water or solvent molecules were removed to eliminate
the influence of solvent interactions in the protein-ligand docking. The receptor file was
converted to pdbqt format, which was used as the input receptor file for
AutoDock Vina (Vina). Receptor energy grid and parameters were generated using
AutoDockTools. The grid box for the receptor was set to dimensions of 22.5 Å x 22.5 Å x
22.5 Å with coordinates of -18.35, -8.23, -22.48, and centred around amino acids
Glu198, Phe167 and Phe200.
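
As an illustration of how such a grid box translates into a docking search space, the sketch
below formats the values above as AutoDock Vina configuration entries and derives the box
corners. It assumes that the reported coordinates are the box centre; the function names and
the receptor file name are hypothetical and are not the files used in this work.

```python
def box_bounds(center, size):
    """Return (min_corner, max_corner) of an axis-aligned box, in angstroms."""
    lower = tuple(c - s / 2.0 for c, s in zip(center, size))
    upper = tuple(c + s / 2.0 for c, s in zip(center, size))
    return lower, upper


def vina_config(center, size, receptor):
    """Format the receptor and search-space entries of a Vina config file."""
    lines = [f"receptor = {receptor}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value}")
    return "\n".join(lines)


# Values reported above; treating them as the box centre is an assumption.
CENTER = (-18.35, -8.23, -22.48)
SIZE = (22.5, 22.5, 22.5)

print(vina_config(CENTER, SIZE, "tubulin_model.pdbqt"))  # hypothetical file name
print(box_bounds(CENTER, SIZE))
```

Checking the corners in this way is a quick sanity test that residues of interest such as
Glu198, Phe167 and Phe200 fall inside the search space before a long screening run.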
 
3.5 Ligands Preparation 
The AfroDb [138] subset of natural compounds from the ZINC [139] database was downloaded
as a single batch file in Structure Data File (SDF) format on 17th November, 2016. ZINC
is a free collection of millions of small molecules that can be used for virtual
screening. AfroDB is a collection of highly potent natural products isolated from African
medicinal plants and a subset of the ZINC database [138]. The file retrieved from the
AfroDb subset of ZINC contained a total of 885 molecules. Dichapetalin A and
albendazole were added to bring the total to 887. A second virtual compound library
was retrieved from the Northern African Natural Product Database (NANPDB) [140] on
12th June, 2017. The NANPDB contains a large collection of over 4,500 annotated natural
products originating from North Africa [141]. The retrieved file was a single SDF file
containing the 3D structures of all the compounds. The file was split into 2297 molecules
with a custom bash script using Open Babel 2.3.1 [142] (Appendix A). All ligand files were
first optimized and energy minimised using the PRODRG server [143] and Open Babel 2.3.1
within the Pyrx 0.8 interface [142]. They were then converted to pdbqt files using Pyrx 0.8
[144]. Pyrx is a computer program that can be used for small molecule virtual screening.
 
3.6 Virtual Screening Analysis 
To find the preferred binding modes of the ligands in the active site of the receptor,
molecular docking analysis was performed using AutoDock Vina 1.1.1 [145] on a Linux
machine with a four-core Intel Core i7 processor. Docking involves three main
steps: (i) protein preparation and grid box specification, (ii) ligand preparation and (iii)
docking of ligand against protein. Protein and ligand preparation had already been
performed, so the next step was virtual screening. The docking
simulation of each compound in the first set of AfroDb virtual library compounds was
conducted and the different binding conformations of the docked ligands were generated
and scored. Lastly, the top-ranking results were selected based on their binding energies in
the final output log files. Virtual screening analysis was conducted separately for the
NANPDB virtual library compounds and the different binding conformations were
obtained as well. Conformational analysis of the ligands was employed to fit ligand
molecules into the receptor using AutoDock Vina version 1.1.1, with details of the docking
generated as docking log files (Appendix A). The log files were analysed and tabulated.
AutoDock Vina is molecular docking software that uses an empirical scoring function to
calculate the binding affinity of a protein-ligand complex by summing up contributions of
the energies of the protein-ligand binding (measured as the sum of the distance-dependent
atom pair interactions) [112]. The compound with the lowest binding affinity score is
normally considered to exhibit the strongest binding. Docking of the ligands to the receptor
for the AfroDb compounds, in addition to Dichapetalin A and albendazole, completed in 1 day,
15 hours, 8 minutes, and 45.327 seconds, and that of NANPDB, due to its larger size,
completed in 3 days, 2 hours and 45 minutes. All protein-ligand binding affinities were
expressed in kcal/mol.
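
The ranking step described above can be sketched as follows. This is a minimal illustration
and not the script in Appendix A: it assumes that each result row of a docking log holds four
whitespace-separated fields (mode, affinity, and two RMSD values), and the compound names
and scores in the example are invented.

```python
def best_affinity(log_text):
    """Return the lowest affinity (kcal/mol) found in a docking log, or None."""
    scores = []
    for line in log_text.splitlines():
        parts = line.split()
        # Assumed row layout: mode, affinity, RMSD lower bound, RMSD upper bound.
        if len(parts) == 4 and parts[0].isdigit():
            scores.append(float(parts[1]))
    return min(scores) if scores else None


def rank_compounds(logs):
    """Sort (name, affinity) pairs from strongest (most negative) to weakest."""
    best = {name: best_affinity(text) for name, text in logs.items()}
    scored = [(name, score) for name, score in best.items() if score is not None]
    return sorted(scored, key=lambda item: item[1])


# Invented example logs for two hypothetical compounds.
EXAMPLE_LOGS = {
    "compound_2": "   1         -7.4      0.000      0.000\n",
    "compound_1": "   1         -9.2      0.000      0.000\n"
                  "   2         -8.7      1.912      2.544\n",
}

for name, score in rank_compounds(EXAMPLE_LOGS):
    print(name, score)
```

Sorting in ascending order places the most negative, and therefore most favourable, binding
energies at the top of the table.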
 
3.7 Interaction Profiling using LIGPLOT 
The protein-ligand complex interactions were computed using LIGPLOT 1.4.5 [146].
LIGPLOT provides a schematic 2-D representation of the hydrogen-bond and hydrophobic
interactions between the ligand and active site residues of the protein-ligand complex.
Hydrogen bond interactions were represented by dashed green lines, while hydrophobic
interactions were represented by arcs with spokes radiating towards the ligand. The
number of hydrogen bonds with the active site residues was also noted.

3.8 Absorption, Distribution, Metabolism and Excretion (ADME) Prediction
ADME profiling was carried out on the SwissADME server [147], which predicted the relevant
ADME properties. The latter constitute the pharmacokinetic profile of a drug, which has a
direct effect on the pharmacodynamics of the drug molecule immediately after the drug is
orally administered. SwissADME [147] is an online webserver that allows the calculation
of several physicochemical descriptors and the prediction of ADME parameters,
pharmacokinetic properties and drug-likeness of small molecules. It requires the user to
upload the SMILES format of the query molecule. SwissADME was used for the
calculation of pharmacokinetic properties such as ESOL logS, molecular weight, Lipinski's
rule (drug-likeness), gastrointestinal (GI) absorption, blood-brain barrier (BBB) permeation
and bioavailability score.
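
For orientation, the Lipinski drug-likeness filter mentioned above can be sketched as a simple
count of rule-of-five violations. This is not SwissADME's implementation; the function names
are assumptions, the tolerance of at most one violation is a common convention rather than
something stated here, and the descriptor values in the example are invented.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of five."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def is_drug_like(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Treat a compound as drug-like if it has at most max_violations violations."""
    violations = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return violations <= max_violations


# Invented descriptor values for two hypothetical compounds.
print(is_drug_like(265.3, 2.9, 2, 3))
print(is_drug_like(650.0, 6.2, 2, 3))
```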
 
3.9 Toxicity Prediction using OSIRIS Property Explorer in DataWarrior 
The toxicity profiles of the top ten compounds from the first set of screened AfroDb
compounds (in addition to Dichapetalin A and albendazole) and from the second set of
screened NANPDB compounds, whose interaction profiles had previously been investigated,
were further analysed using the OSIRIS property explorer
embedded in DataWarrior 4.5.2 [148] in order to assess the toxicity of the drug candidates.
This explorer gives drug-relevant properties such as mutagenicity, irritancy and
reproductive effect. The compounds were submitted to DataWarrior 4.5.2 in SMILES format
and the results were investigated.
 
3.10 Scaffold Analysis 
A scaffold analysis was conducted to compare the scaffold diversity and/or similarity
between the docked natural products from AfroDb and NANPDB with Dichapetalin A
included (hereafter referred to as “list A”) and 16 anthelminthics including albendazole,
mebendazole, fenbendazole, levamisole and piperazine (hereafter referred to as “list B”)
(files in Appendix A). List A included Dichapetalin A and the top 100 compounds each from
AfroDB and NANPDB. The scaffolds of list A were compared with those of list B. Another
analysis was conducted between list A and a dataset of only albendazole and mebendazole
(hereafter referred to as “list C”). There are several ways to represent scaffolds [147, 148].
One such representation is the Murcko framework/scaffold as proposed by Bemis and
Murcko [149]. This framework has been used to analyse the structures of known drugs and
to identify similarities in screening compound libraries [11, 153]. The Murcko
framework preserves only the ring systems and the linkers that connect them, and removes
all side-chain substituents. It
contains no three-dimensional structure or any stereochemistry [152]. Murcko scaffold
analysis was used to explore similarity and/or diversity amongst the scaffolds in
the datasets: between list A and list B, and between list A and list C. This was
conducted in DataWarrior v.4.5.2 using the SMILES notation of the different
datasets, and the scaffold architecture was analysed using the Murcko framework. Scaffold
frequencies and counts were used to measure the distribution of compounds over the unique
scaffolds present in the data subsets. Comparative studies were performed between the
subsets of natural products and anthelminthics. Bar charts were used to characterise the
distribution, diversity and/or similarity of the scaffolds within the datasets using
DataWarrior.
 
3.11 Proteo-Chemometric Predictive Model of Anti-Tubulin Activity 
3.11.1 Data collection 
Since beta tubulins are the primary targets of benzimidazoles, the keyword “beta tubulin”
was used to query a chemogenomic database, BindingDB [153]. The query produced a
dataset comprising bioassays of tubulins that was subsequently retrieved from BindingDB
on 20th of December 2016. The dataset comprised active and inactive ZINC compounds
tested against mostly beta tubulins and a few other tubulins with UniProt IDs: Q25270,
P02554 and Q6B856. BindingDB is a publicly accessible database currently containing over
20 000 experimentally determined binding affinities of protein–ligand complexes [153]. A
total of four hundred and thirty-seven (437) bioactivity data points on tubulins were obtained
from BindingDB. The retrieved bioassay data reported 129 compounds with potency
against beta tubulins and other tubulins ranging from 100 nM to 8100000 nM across the
measures Kd, IC50, Ki and EC50 (all in nM), covering beta tubulins
from Leishmania donovani, Sus scrofa and Bos taurus. Due to the different assay
conditions of the experimental dataset, the bioactivity values provided in the dataset were
not used directly in developing the PCM-SVM model. The dataset was instead labelled as
actives (Ki, Kd, IC50 or EC50 bioactivity values equal to or lower than 1 μM or 1000 nM) and
inactives (all remaining observations/protein–ligand combinations) [154], as explained in
a subsequent section of the chapter (section 3.11.2). The schematic workflow for the PCM
modelling is summarised in Figure 3.2.
 
Figure 3.2. Workflow for PCM modelling of beta tubulin bioactivity profiling. A
bioassay dataset is first retrieved from a chemogenomic database, BindingDB. The dataset
comprises bioassays of small compounds tested against beta tubulin variants. Target and
compound descriptors are computed, which are used in the construction of a single SVM
based PCM. The PCM could be used for predicting the bioactivity between untested or new
compounds and beta tubulins.

3.11.2 Pre-processing of dataset
Ligand structures based on SMILES notations were processed with the R package camb
version 2.0 [155] using the function StandardiseMolecules, which enables the depiction
of molecular structures in the same (standardised) form [155]. The function also allows
the removal of inorganic molecules. All the data were then further pre-processed by
removing salts within the software PaDEL version 6.0 [156]. After calculation of
descriptors (described in sections 3.11.3 and 3.11.4), constant and near-constant predictors
(called zero and near-zero variance predictors respectively [157]), which can sometimes be
found in the dataset and do not add any information to the data, were checked for removal
with the nearZeroVar function of the R package caret [158]. The function flags
predictors that have one unique value across samples (zero variance predictors), as well as
predictors that have few unique values relative to the number of samples and a large
frequency ratio, that is, the frequency of the most common value divided by the frequency of
the second most common value. In general,
a predictor is classified as near-zero variance if the percentage of unique values in the
samples is less than 10% and the frequency ratio is greater than 19
(95/5) [159]. The cor function from the R package stats version 3.50 [160] was then used
to compute the pairwise correlation between descriptors. Descriptors with a pairwise
Pearson’s correlation coefficient greater than a threshold of 0.7 were checked and filtered out
using the findCorrelation function from the R package caret version 6.0-76, set at a cutoff
of 0.7. The rationale behind normalising the dataset by the removal of highly correlated
features and near-zero variance features was to avoid bias in the final model that would be
built. The feature elimination pipeline, however, retained the features/descriptors that were
computed (described in sections 3.11.3 and 3.11.4).
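
The two filters described above can be sketched as follows. This is an illustrative Python
re-implementation of the stated rules and not the caret code used in this work: the function
names and toy columns are assumptions, the strict "less than 10%" comparison follows the
wording above, and the pair listing is simpler than the selection strategy of findCorrelation.

```python
import math
from collections import Counter
from itertools import combinations


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a predictor that is constant, or both sparse in unique values and skewed."""
    counts = Counter(values).most_common()
    if len(counts) == 1:
        return True  # zero variance: a single unique value
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, cutoff=0.7):
    """List descriptor name pairs whose absolute correlation exceeds the cutoff."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > cutoff
    ]


# Toy descriptor columns (invented values).
toy = {"d1": [1, 2, 3, 4], "d2": [2, 4, 6, 8], "d3": [1, -1, -1, 1]}
print(near_zero_variance([0] * 96 + [1] * 4))  # skewed, few unique values
print(correlated_pairs(toy))
```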
 
The dataset was subdivided into two groups based on their experimental anti-tubulin
bioactivities using the rule: “active” (IC50: < 1 µM) and “inactive” (IC50: ≥ 1 µM). This is
because an activity value of approximately 1000 nM is normally considered an appropriate
IC50 threshold for distinguishing active from inactive compounds [153, 160]. A drug
candidate is generally expected to show low nanomolar on-target potency, where IC50 is the
concentration required to reduce the target activity or biological process by half [154].
 
SVM training and testing sets require class labels encoded as binary values; hence
the active data points were labelled with a “1” and the
inactive data points with a “0”. The dataset was split into training and testing sets, with 67%
for training and validation and the remaining 33% as a held-out or independent test set (not
included in the training set). Thus, the training data contained 292 data samples, while the
test set comprised 145 as listed in Table 3.1.
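
The labelling rule and the 292/145 partition can be sketched as below. The random shuffle,
the seed and the function names are assumptions made for illustration; the text does not state
how samples were assigned to each set, and the strict "< 1 µM" rule from this section is used.

```python
import random


def label_activity(value_nm, threshold_nm=1000.0):
    """Return 1 (active) if the potency is below the threshold, else 0 (inactive)."""
    return 1 if value_nm < threshold_nm else 0


def split_dataset(samples, n_train, seed=0):
    """Shuffle the samples reproducibly and split them into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 437 placeholder sample indices, split into 292 training and 145 test samples.
train, test = split_dataset(range(437), n_train=292)
print(len(train), len(test))
print(label_activity(100.0), label_activity(8100000.0))
```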
 
Table 3.1. Complete dataset used for PCM. The training dataset comprises 292 samples
while the test set consists of 145 samples.
 
  
 
3.11.3 Ligand descriptions (Compound descriptors)  
In this work, circular molecular fingerprints and physicochemical descriptors were used to
represent the ligands. Compounds were described with unhashed Morgan fingerprints
using the MorganFPs function of the package camb version 2.0 in R version 3.4.0. R is
a statistical programming language with a set of inbuilt functions and object-oriented
features for software development [162]. Morgan fingerprints encode chemical structures by
considering atom neighbourhoods [119]. A user-defined maximal bond diameter is normally
set for the substructures. In this study, the maximum diameter of the substructures
considered was set to 4, whereas the length of the fingerprints was set to 128.
Physicochemical descriptors were also computed using the GeneratePadelDescriptors
function from the R package camb version 2.0 [119], which invokes the software PaDEL.
The unhashed circular fingerprints were used because they are interpretable, perform well
in describing inhibition of the tubulins and are widely adopted [84, 101,
162].
 
3.11.4 Target descriptions (Protein descriptors) 
To describe the target space in PCM models, whole protein sequence descriptors were 
calculated using composition/transition/distribution (CTD) descriptors. CTD descriptors 
based on properties such as relative hydrophobicity, predicted secondary structure, and 
predicted solvent exposure were employed because they are widely used and have been 
shown to be interpretable [164]. CTD amino acid descriptors were calculated with the 
function SeqDesc from the R package camb version 2.0 [155].  
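
To illustrate what the composition and transition parts of CTD capture, the following Python sketch computes them for a single property, hydrophobicity. The three-class residue grouping is the one commonly used for CTD and is treated here as an assumption; it is not taken from camb's SeqDesc, and the toy sequence is hypothetical.

```python
# Assumed three-class hydrophobicity grouping commonly used for CTD descriptors.
HYDROPHOBICITY_CLASSES = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}

def class_of(residue):
    """Return the hydrophobicity class of a residue, or None if unrecognised."""
    for name, members in HYDROPHOBICITY_CLASSES.items():
        if residue in members:
            return name
    return None

def ctd_composition(sequence):
    """Fraction of residues falling in each class (the 'C' of CTD)."""
    classes = [class_of(r) for r in sequence.upper()]
    return {name: classes.count(name) / len(classes)
            for name in HYDROPHOBICITY_CLASSES}

def ctd_transition(sequence):
    """Fraction of adjacent residue pairs that switch between two classes (the 'T')."""
    classes = [class_of(r) for r in sequence.upper()]
    pairs = list(zip(classes, classes[1:]))
    keys = [("polar", "neutral"), ("polar", "hydrophobic"), ("neutral", "hydrophobic")]
    return {k: sum(1 for a, b in pairs if {a, b} == set(k)) / len(pairs)
            for k in keys}

toy_sequence = "MKVLAG"  # hypothetical six-residue peptide
composition = ctd_composition(toy_sequence)
transition = ctd_transition(toy_sequence)
```

The distribution part of CTD, which records where along the sequence each class occurs, is omitted from this sketch for brevity.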
 
3.11.5 Exploratory principal component analysis (PCA) of compounds and target 
datasets 
Principal component analysis (PCA) was performed in R to reduce the high-dimensional 
dataset to a low-dimensional set of variables. PCA is a mathematical method for 
dimensionality reduction that allows multidimensional datasets to be visualised using 
two- or three-dimensional plots with minimal loss of information or variance [164 – 166]. 
It finds the linear combinations of the original variables with the greatest possible 
variance, and can reveal important characteristics of the data structure, including the 
similarities and differences within the dataset based on calculated descriptors, which are 
otherwise difficult to distinguish. PCA thus identifies the features or descriptors that show 
the most variation across the data. The prcomp and autoplot functions of the R package 
FactoMineR version 1.2.4 [168] were used to perform PCA and the result was plotted by 
clustering the dataset with ellipses. PCA was performed on the descriptors of the 
compounds and proteins separately.  
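
The idea behind PCA can be sketched in plain Python: centre the data, form the covariance matrix, and find the direction of greatest variance. The sketch below uses power iteration to obtain only the first principal component, which is an illustrative simplification and not the method used by the R functions named above. The two-descriptor toy data are hypothetical.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of a covariance matrix by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical data: the second descriptor is roughly twice the first.
toy_rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
cov = covariance(center(toy_rows))
pc1, pc1_variance = first_component(cov)
explained_ratio = pc1_variance / sum(cov[i][i] for i in range(len(cov)))
```

Because the two toy descriptors are almost collinear, nearly all of the variance is captured by the first component, which is the situation in which a two-dimensional PCA plot loses little information.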
 
3.11.6 Model development 
Among the multitude of available machine learning binary classification algorithms, 
Support Vector Machine learning algorithm was employed because it is highly effective, 
robust and has been extensively successful in the field of drug discovery [169–171]. 
 
Support Vector Machine (SVM), developed by Vapnik [172], is a statistical learning method 
that is popular owing to its effectiveness and robust performance. The downside of 
using SVM, or any other machine learning algorithm, is that the user generally has no 
control over, and is not privy to, the underlying statistical and computational algorithms. 
SVM learns by finding the maximum-margin hyperplane that separates data points in a 
feature vector space. The models are optimised by tuning the hyperparameters (i.e., the 
gamma (γ) and C parameters). The dataset was first mean-centred and scaled to unit 
variance using a preprocessing module (StandardScaler) from the Python library 
scikit-learn 0.17. The dataset was then split into 67% and 33% for the internal 
training/validation and hold-out (test) sets, respectively. The SVM estimator module of 
scikit-learn 0.17 [173] was used to train an SVM model with a radial basis function (RBF) 
kernel and 10-fold stratified cross-validation, owing to the paucity of the dataset. The 
performance of the model was optimised by tuning the hyperparameters C and gamma (γ) 
over the values: {"C": [0.1, 1, 10, 100, 
1000], "gamma": [0.1, 0.01, 0.001, 0.0001, 0.00001]} (See Appendix A for scripts). 
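
The search described above covers 25 combinations of C and γ, each evaluated with 10 stratified folds. The sketch below is a plain-Python illustration of how such a grid and such folds can be enumerated; it is not the script in Appendix A. The helper names, the seed and the class sizes (100 actives, 192 inactives) are hypothetical.

```python
import itertools
import random

PARAM_GRID = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [0.1, 0.01, 0.001, 0.0001, 0.00001],
}

def grid_points(grid):
    """Enumerate every combination of hyperparameter values in the grid."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % n_folds].append(index)
    return folds

candidates = grid_points(PARAM_GRID)
toy_labels = [1] * 100 + [0] * 192  # hypothetical class sizes, 292 training samples
folds = stratified_folds(toy_labels, n_folds=10)
```

Each candidate would be trained on nine folds and scored on the tenth, ten times over, and the combination with the best mean validation score retained. In scikit-learn this pattern is provided by the grid search and stratified k-fold utilities.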
 
The basic mathematics underpinning SVM classification is as follows [174]. 
Suppose we are given a set of N samples $(x_1, y_1), \ldots, (x_N, y_N)$, where each input 
vector $x_i \in \mathbb{R}^d$ ($i = 1, \ldots, N$) has a corresponding target 
$y_i \in \{-1, +1\}$; -1 and +1 are used to represent the two classes, in our case “inactive” 
and “active”, respectively. The goal in SVM is to construct a binary classifier, or derive a 
decision function from the available data sample, which has a low probability of 
misclassifying a new data sample. It accomplishes that using an optimised linear separator, 
that is, it constructs a hyperplane, $w^{T}x + b = 0$, that separates the two classes (this can 
be extended to multi-class problems) [174]. Different mappings construct different SVMs. 
The mapping $\Phi$ of the input vectors $x_i \in \mathbb{R}^d$ ($i = 1, \ldots, N$) into a 
feature space is performed implicitly by a kernel function [174]: 
\[ K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \tag{1} \]
which is an inner (dot) product in the feature space H mapped by $\Phi$. In the case of the 
radial basis function (RBF) kernel, where γ is the width parameter, the equation is given as [174]: 
\[ K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \tag{2} \]
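
The RBF kernel of equation (2) and its role in the SVM decision function can be sketched as follows. This is an illustration only: the support vectors, dual coefficients and bias passed to `decision_value` are hypothetical inputs, whereas in practice they are learned during training.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def gram_matrix(samples, gamma):
    """Pairwise kernel values for a list of samples."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b; the sign gives the class."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

toy_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_gram = gram_matrix(toy_samples, gamma=0.5)
```

A larger γ makes the kernel value fall off faster with distance, so each training point influences a narrower region of the feature space. This is why γ is tuned together with C.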
 
3.11.7 Validation of model performance 
In this work, the robustness of the model was assessed by 10-fold cross-validation and 
the prediction accuracy or predictability of the models was evaluated on the internal 
validation set. The internal validation set (i.e., the 67% data subset) was subjected to 
training and 10-fold cross-validation (10-fold CV). In a 10-fold CV scheme, one fold (10%) 
of the data was left out as the test set, while the remaining 90% was used as the training 
set for constructing the predictive model. This procedure was repeated iteratively until 
every fold had been left out once. The SVM model was internally validated by 10-fold 
cross-validation in the scikit-learn package. In addition, the model was evaluated using 
the counts of True Positives (TP), False Positives (FP) or over-predictions, True Negatives 
(TN) and False Negatives (FN) or missed predictions. Specificity and sensitivity were 
calculated from these counts. The binary classification model was also evaluated by the 
receiver operating characteristic (ROC).  
 
Accuracy was calculated as the percentage of correctly classified instances and computed 
as  
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{3} \]
Owing to the unequal size of the classes used, and because certain errors may be considered 
more serious than others (e.g. false negatives compared to false positives), accuracy did 
not serve as an optimal measure of model performance. Instead, the area under the 
receiver operating characteristic (ROC) curve (AUC) was used, since it is a measure of 
discriminatory power that is insensitive to changes in class distribution in the test 
dataset. The ROC curve was obtained by calculating sensitivity and specificity at various 
discrimination threshold levels. Sensitivity is the fraction of actual positives that are 
correctly identified (the true positive rate (TPR)) and specificity is the fraction of actual 
negatives that are correctly identified (the true negative rate (TNR)).  

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{4} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{5} \]

In addition, a balanced measure, the Matthews correlation coefficient (MCC), was also 
computed: 
\[ \text{MCC} = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}} \tag{6} \]

An AUC greater than 50% indicates prediction better than random, an MCC equal to 1 
indicates a perfect prediction, and an MCC equal to 0 indicates a random prediction. 
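
The performance measures in this section can be computed directly from the confusion counts. The sketch below uses the standard definitions (sensitivity as TP/(TP+FN), MCC numerator as TP×TN − FP×FN) and computes AUC as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one. All counts and scores in the example are hypothetical, not results from this study.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "mcc": ((tp * tn) - (fp * fn)) / denominator if denominator else 0.0,
    }

def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

toy_metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)  # hypothetical counts
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])            # hypothetical scores
```

The pairwise form of AUC makes explicit why the measure does not depend on the ratio of actives to inactives: only the ordering of scores between the two classes matters.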
 
CHAPTER 4 
RESULTS AND DISCUSSION 
The results of the homology modelling, molecular dynamics simulation, molecular 
docking, active site interaction profiling, ADMET prediction and PCM undertaken in this 
study are presented and discussed in this chapter. 
 
4.1 Template Identification, Homology Modelling of Proteins and Validation 
The D chain of the crystal structure of tubulin tyrosine ligase (T2R-TTL) with PDB ID 
5c8y was chosen as the best template for homology modelling based on the presence of 
amino acid residues comprising Phe167, Glu198 and Phe200, which are associated with 
nematode resistance within the binding site. A sequence alignment with the template also 
demonstrated that the beta tubulin is highly homologous to the subunit of the T2R-TTL 
with 96% sequence identity (Figure 4.1). In addition, the results of the sequence alignment 
indicated that beta tubulin contained conserved residues Phe167, Phe200 and Glu198 
within the active site of interest, the colchicine binding site (Figure 4.2). 
 
The best modelled protein of N. americanus (UniProt ID W2T75) produced using 
MODELLER was selected based on low DOPE score and high GA341 score (Figure 4.3). 
The modelled protein is a monomer, folded into a β domain consisting of 11-stranded β-
sheets and 11 α-helices (Figure 4.3). 
 
   
Figure 4.1. A pairwise sequence alignment between the  beta tubulin sequence of N. 
americanus and D chain of the crystal structure with PDB ID, 5c8y. The initials represent the 
amino acid residues. The PDB ID of the homologous template and the accession number 
of the beta tubulin of N. americanus are provided on the left.  The highlighted residues 
supported with an asterisk (*) show the conserved residues between the sequences of the 
homologous template and N. americanus beta tubulin sequence.  
 
 
 
 
 
 
                  
Figure 4.2. Predicted binding site from I-TASSER [88] and rendered in PYMOL. The D chain 
of the crystal structure of the template, PDB ID, 5c8y is represented in gray. The binding pocket is 
represented as a green surface. 
 
       
Figure 4.3. 3D model of the beta tubulin of N. americanus (UniProt ID W2T75), where helices 
are shown in red, beta sheets in yellow and loops in green. Monomer subunit of tetramer template 
with PDB ID, 5c8y. 
 
The quality of the generated 3D model was evaluated via a Ramachandran plot (Figure 4.4) 
using PROCHECK software. Ramachandran plots highlight the most favoured, allowed, 
generously allowed and disallowed regions of the modelled protein structure. Ideally, a 
model of reasonably high quality should have at least 90% of its residues in the core regions 
[128]. The Ramachandran plot for the predicted model showed that 92.3% of residues were 
within the most favourable region whilst 4.9% were in the allowed region, suggesting that 
the predicted model was of reasonably high quality. In addition, the overall quality factor 
predicted by the ERRAT server for the model was 89.327 (Figure 4.5), which corroborates 
the quality of the model. ERRAT [175] provides the overall quality factor for non-bonded 
atomic interactions, and the generally accepted range for a high-quality model is greater 
than 50 [131]. When the model was further validated using the VERIFY 3D server [176], 
88.77% of the residues were predicted as having an average 3D-1D score greater than 0.2. 
 
 
 
 
 
Figure 4.4. Ramachandran plot of beta tubulin model from N. americanus 
obtained by PROCHECK: 92.3% residues in favourable regions (A, B, L); 7.7% 
residues in additional allowed region (a, b, l, p); 0.0% residues in generously allowed 
regions (-a, -b, -p, -l); 0% residues in disallowed regions. 
 
  
 Figure 4.5. Errat plot. Black bars identify the misfolded region located distantly from the 
active site, gray bars demonstrate the error region between 95% and 99%, and white bars 
 indicate the region with a lower error rate for protein folding. 
 
 
 
 
4.2 Molecular Dynamics Simulation  
The results of the MD simulation of the receptor, obtained using GROMACS, indicated that 
the root mean square deviation (RMSD) increased from the beginning of the run but, after 
0.5 ns, remained almost constant for the rest of the simulation (Figure 
4.6). This suggests that the backbone of the model has a low RMSD with little root 
mean square (RMS) fluctuation and flexibility, indicating that the model maintained a 
stable structure during the MD simulation.  
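
For reference, the RMSD plotted against time is the square root of the mean squared displacement of the atoms relative to a reference structure. The sketch below shows that calculation for two small coordinate sets. It assumes the frames are already superimposed, whereas trajectory tools such as GROMACS normally perform a least-squares fit to the reference first. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical backbone coordinates in nm; the second frame is shifted by 0.3 nm in x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(x + 0.3, y, z) for x, y, z in reference]
frame_rmsd = rmsd(reference, frame)
```

A trajectory whose RMSD rises and then plateaus, as in Figure 4.6, indicates that the structure has relaxed away from the starting model and is fluctuating around a stable conformation.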
 
 
Figure 4.6. RMSD plot of the molecular dynamics simulation using GROMACS. A plot of 
RMSD in nanometres (nm) against time in nanoseconds (ns). The RMSD increased from 0 ns to 
0.5 ns and then levelled off with slight fluctuations to the end of 1 ns. 
  
 
4.3 Prediction and Analysis of Binding Site 
The predicted binding pocket was found to contain all the residues whose mutation is 
associated with anthelminthic resistance (Figure 4.7A). The binding site had 41 residues 
which form the putative binding pocket. These 41 residues include Phe167, Glu198 and 
Phe200 (Figure 4.7B). 
 
 
Figure 4.7 Predicted colchicine binding site of beta tubulin from N. americanus. A. The 
binding pocket indicated in blue mesh surface and enclosed in a box is depicted in PYMOL. B. The 
amino acid residues of the colchicine binding site in the modelled beta tubulin receptor of 
hookworm are shown in green. 
 
4.4 Virtual Screening Analysis results 
The top 20 ligands from AfroDb, in addition to Dichapetalin A and albendazole (Table 4.1), 
and the top 20 ligands from NANPDB (Table 4.2) were ranked after virtual screening from 
the most negative to the least negative binding affinity score. The complexes of the 
top-ranked ligands show how firmly the ligands fit within the binding pocket of the protein 
(Figures 4.8, 4.9, 4.10 and Appendix B).  
 
Table 4.1. Results of the molecular docking scores of the top 20 ligands from AfroDB library 
plus Dichapetalin A and albendazole. Two variants of RMSD metrics are also 
provided, rmsd/lb (RMSD lower bound) and rmsd/ub (RMSD upper bound). 

| Ligand | Binding affinity (kcal/mol) | rmsd/ub | rmsd/lb |
|---|---|---|---|
| ZINC14760755 | -8.6 | 0 | 0 |
| ZINC95485927 | -8.5 | 0 | 0 |
| ZINC95486082 | -8.5 | 0 | 0 |
| ZINC95486263 | -8.5 | 0 | 0 |
| ZINC14780716 | -8.3 | 0 | 0 |
| ZINC95485922 | -8.3 | 0 | 0 |
| ZINC95486052 | -8.3 | 0 | 0 |
| ZINC95485928 | -8.2 | 0 | 0 |
| ZINC13480348 | -8.1 | 0 | 0 |
| ZINC28462577 | -8.0 | 0 | 0 |
| ZINC95486072 | -8.0 | 0 | 0 |
| ZINC95486073 | -8.0 | 0 | 0 |
| ZINC95486081 | -7.9 | 0 | 0 |
| ZINC33833639 | -7.8 | 0 | 0 |
| ZINC95485992 | -7.8 | 0 | 0 |
| ZINC95486074 | -7.8 | 0 | 0 |
| ZINC95486075 | -7.8 | 0 | 0 |
| ZINC13365959 | -7.7 | 0 | 0 |
| ZINC13485435 | -7.7 | 0 | 0 |
| ZINC15120680 | -7.7 | 0 | 0 |
| Dichapetalin A | -5.8 | 0 | 0 |
| Albendazole | -5.6 | 0 | 0 |
 
 
ZINC14760755 showed the strongest binding, with the most negative binding affinity score 
of -8.6 kcal/mol (Table 4.1), in the first library screening. A total of 504 ligands had a more 
negative binding affinity than albendazole (Appendix A). A more negative binding affinity 
indicates stronger binding to the receptor, so these ligands could be potential 
anthelminthic leads as well as inhibitors of the beta tubulin receptor of N. americanus. 
Notably, Dichapetalin A exhibited a more negative binding affinity than albendazole, with 
scores of -5.8 kcal/mol and -5.6 kcal/mol, respectively.  
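
The ranking and the comparison with albendazole amount to a sort and a filter on the docking scores. The sketch below uses a handful of the scores listed in Table 4.1; the variable names are illustrative and the full screening output is not reproduced.

```python
# A few binding affinities (kcal/mol) taken from Table 4.1.
docking_scores = {
    "Albendazole": -5.6,
    "ZINC28462577": -8.0,
    "Dichapetalin A": -5.8,
    "ZINC14760755": -8.6,
    "ZINC95485927": -8.5,
}

# More negative scores indicate stronger predicted binding, so sort ascending.
ranked = sorted(docking_scores.items(), key=lambda item: item[1])

# Ligands predicted to bind more strongly than the reference drug.
reference_score = docking_scores["Albendazole"]
better_than_reference = [name for name, score in ranked if score < reference_score]
```

Applied to the whole screened library, the same filter yields the count of ligands that outscore albendazole.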
 
Table 4.2. Results of the molecular docking scores of the top 20 ligands from the Northern 
African Natural Product Database. Two variants of RMSD metrics are also provided, 
rmsd/lb (RMSD lower bound) and rmsd/ub (RMSD upper bound). 

| Ligand | Binding affinity (kcal/mol) | rmsd/ub | rmsd/lb |
|---|---|---|---|
| (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one | -8.7 | 0 | 0 |
| campesterol | -8.4 | 0 | 0 |
| orthidine_A | -8.2 | 0 | 0 |
| robustaflavone | -8.2 | 0 | 0 |
| tetrahydrorobustaflavone | -8.2 | 0 | 0 |
| siphonellinol_C | -8.1 | 0 | 0 |
| 6,10-dimethyl-9-methylene-2-(4-methyl-1,2-dioxabicyclo[2.2.2]oct-5-en-l-yl)undec-5-ene | -7.9 | 0 | 0 |
| spinescen | -7.9 | 0 | 0 |
| euphohelionon | -7.9 | 0 | 0 |
| anchinopeptolide_A | -7.9 | 0 | 0 |
| uzarigenin | -7.9 | 0 | 0 |
| isorhamnetin_3- [3'''-feruloylrhamnosyl (16) galactoside | -7.8 | 0 | 0 |
| 1,2,3,6-tetra-O-galloyl-beta-D-glucose | -7.8 | 0 | 0 |
| (+)-silychristin | -7.8 | 0 | 0 |
| scopofarnol | -7.8 | 0 | 0 |
| isoquercitrin_6''-O-p-hydroxybenzoate | -7.8 | 0 | 0 |
| (-)-(R,R)-7'-O-methylcuspidaline | -7.8 | 0 | 0 |
| quercetin-3-rutinoside | -7.7 | 0 | 0 |
| auraptene | -7.7 | 0 | 0 |
| rutin | -7.7 | 0 | 0 |
 
The results for the top 20 ligands from NANPDB listed in Table 4.2 revealed that the 
compound (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
showed the strongest binding to the beta tubulin, with the most negative binding affinity 
score of -8.7 kcal/mol. This suggests that it could also be a potential lead compound. 
 
Figure 4.8. Docking pose of ZINC14760755 in complex with the beta-tubulin receptor. The pose 
shows how well fitted the ligand is in the binding pocket as visualised in PYMOL. The 
structure in cyan represents the receptor and the region encircled in black shows the docked 
ligand in the binding pocket. 
 
 
 
 
 
Figure 4.9. Docking pose of Dichapetalin A and albendazole beta-tubulin receptor complex. A. 
Dichapetalin A, due to its long molecular chain, has side chains projecting out of the pocket. B. 
Albendazole is well fitted inside the pocket. Image was rendered in PYMOL. The structure in cyan 
represents the receptor and the region encircled in black shows the docked ligand in the binding 
pocket. 
 
 
Figure 4.10. Docking pose of (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
from North African Database. Image was rendered in PYMOL. The structure in red, blue and 
white representation is the receptor and the region encircled in black shows the docked 
ligand in the binding pocket. 
 
 
4.5 Interaction Profile using LIGPLOT 
The interactions within the protein-ligand complexes were further analysed using the 
program LIGPLOT. The number of hydrogen bonds formed, the bond distances and the 
interacting residues are shown in Tables 4.3 and 4.4. Hydrogen bonds contribute to the 
stability of the ligands within the active site. Even though ZINC14760755 bound most 
strongly to the receptor, it had fewer hydrogen bond interactions and more hydrophobic 
interactions with residues of the active site than ZINC28462577, whose complex was 
stabilised by four hydrogen bonds through GLU198, GLN134, ASN256 and LYS350 
(Figure 4.11). Remarkably, ZINC28462577 formed a relatively strong (short) hydrogen bond 
with GLU198, which is a key residue associated with anthelminthic activity. The length of 
this bond is 3.05 Å, which is the shortest hydrogen bond to GLU198 among the compounds 
in Table 4.3. This suggests that both ZINC28462577 and ZINC14760755 are potential 
inhibitors. The results also indicate that most of the ligands are stabilised inside the 
binding site mainly by hydrogen bonds with ASN247 and GLN134 (Table 4.3). Albendazole 
and Dichapetalin A did not form hydrogen bonds with Phe167, Phe200 and Glu198, which 
are key residues associated with anthelminthic activity, but were stabilised by hydrophobic 
interactions instead (Figure 4.12). Dichapetalin A was involved in hydrophobic interactions 
with residues Val255, Lys350, Asn247 and Ala315, while albendazole had hydrophobic 
contacts with Ala315, Glu198 and Phe200. It is therefore tempting to suggest that 
Dichapetalin A, a potential anthelminthic compound, may have a different mechanism of 
interaction from albendazole, even though both are predicted to bind Ala315 through 
hydrophobic interactions within the colchicine binding site of the beta tubulin target in 
N. americanus.  
 
The interaction profile of the ligand (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
(Table 4.4) reveals stabilisation of the ligand inside the binding site mainly by hydrogen 
bonds with VAL236, ASN256 and GLN134, with relatively short bond lengths of 2.87, 3.21 
and 0.00 Å respectively (Figure 4.13). The results also show that the docked ligands in 
Table 4.4 make hydrophobic contacts with Phe200, Phe167 and Glu198, which suggests a 
weak stabilisation with those key residues (Appendix C). ZINC14760755 and 
(S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
do not form hydrogen bonds with Phe167, Phe200 and Glu198 but rather hydrophobic 
contacts. As reported, these key residues of the beta tubulin protein of hookworm are 
implicated in drug resistance [46, 178, 179]. Perhaps ZINC14760755 and 
(S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
have alternative binding modes. 
 
 
 
Figure 4.11. Interaction profile of ZINC14760755 and ZINC28462577 from AfroDb. A. 
ZINC14760755 and B. ZINC28462577. The green dotted lines indicate 
hydrogen bond interactions between the residues in green and ligand in blue. The residues behind 
red radiating spokes are involved in hydrophobic interaction with the ligand. 
 
 
Figure 4.12. Interaction profile of the Dichapetalin A and albendazole ligands. Interaction 
profile of A. Dichapetalin A and B. Albendazole. The green dotted lines indicate hydrogen 
bond (H-bond) interactions between the residues and ligand in blue. The residues behind red 
radiating spokes are involved in hydrophobic interaction with the ligand. 
 
 
 
 
Figure 4.13. Interaction profile of (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
from North African Database as predicted by LIGPLOT [146]. The green dotted lines indicate 
H-bond interactions between the residues in green and ligand in blue. The residues behind 
red radiating spokes are involved in hydrophobic interaction with the ligand. 
 
Table 4.3. Results of number of hydrogen-bonds/hydrophobic-bonds and contact 
residues of top ten ligands from AfroDB, and that of Dichapetalin A and 
albendazole.  

| Ligand | Binding affinity (kcal/mol) | Number of hydrogen bonds | Key residue contact residues and hydrophobic residues | Bond distance (Å) |
|---|---|---|---|---|
| ZINC14760755 | -8.6 | 2 | ALA315, LYS350 | 2.77, 3.01 |
| ZINC95485927 | -8.5 | 0 | Hydrophobic contacts | none |
| ZINC95486082 | -8.5 | 2 | ASN247 | 3.07, 3.32 |
| ZINC95486263 | -8.5 | 3 | GLN134, GLU198, ASN247 | 2.89, 3.10, 3.25 |
| ZINC14780716 | -8.3 | 2 | GLU198, ALA315 | 3.16, 2.95 |
| ZINC95485922 | -8.3 | 3 | ASN247, ASN256, LYS350 | 2.86, 2.97, 3.13 |
| ZINC95486052 | -8.3 | 0 | Hydrophobic | |
| ZINC95485928 | -8.2 | 2 | CYS239, ALA315 | 3.34, 3.00 |
| ZINC13480348 | -8.1 | 1 | ASN247 | 3.00 |
| ZINC28462577 | -8.0 | 4 | GLU198, GLN134, ASN256, LYS350 | 3.05, 2.75, 3.31, 3.32 |
| Dichapetalin A | -5.8 | 0 | Hydrophobic (VAL255, LYS350, ASN247, ALA315) | none |
| Albendazole | -5.6 | 0 | Hydrophobic (PHE200, GLU198, ALA315) | none |
 
 
 
 
 
Table 4.4. Results of number of hydrogen-bonds/hydrophobic-bonds and contact 
residues of top ten ligands from NANPDB. 

| Ligand | Binding affinity (kcal/mol) | Number of hydrogen bonds | Key residue contact residues and hydrophobic residues | Bond distance (Å) |
|---|---|---|---|---|
| (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one | -8.7 | 3 | GLN134, VAL236, ASN256 | 2.87, 3.21, 0.00 |
| campesterol | -8.4 | 0 | Hydrophobic contacts (GLU198) | none |
| orthidine_A | -8.2 | 2 | ASN256, ASN247, THR351, ASN348 | 3.02, 3.12, 3.13, (2.95, 3.22) |
| robustaflavone | -8.2 | 3 | ASN256, GLN134 | 3.21, 2.64 |
| tetrahydrorobustaflavone | -8.2 | 2 | ASN256, GLN134 | 3.22, 2.64 |
| siphonellinol_C | -8.1 | 3 | ASN256, VAL313, ASN348 | 2.69, 3.26 |
| 6,10-dimethyl-9-methylene-2-(4-methyl-1,2-dioxabicyclo[2.2.2]oct-5-en-l-yl)undec-5-ene | -7.9 | 0 | Hydrophobic (GLU198, PHE200, PHE167) | none |
| spinescen | -7.9 | 2 | Hydrophobic (GLU198, PHE200) | 3.34, 3.00 |
| euphohelionon | -7.9 | 1 | ASN256 | 3.19 |
| anchinopeptolide_A | -7.9 | 3 | ASN256, ASN247, THR351 | 3.15, 3.17, (2.95, 3.10) |
 
 
4.6 ADME Prediction and Pharmacokinetic Properties 
The results of the ADME properties (Tables 4.5 and 4.6) revealed that most of the virtually 
screened compounds had relatively low ESOL LogS values, which indicates that they 
belong to the poorly soluble class of compounds. ZINC14760755 from Table 4.1 had high 
gastrointestinal (GI) absorption, which suggests that the compound can be absorbed from 
the intestinal tract when administered orally. Blood brain barrier (BBB) penetration, 
as the name implies, gives an indication of the likelihood of the drug being delivered to the 
central nervous system (CNS). The top ten ligands from AfroDB, together with 
Dichapetalin A and albendazole, were predicted not to permeate the blood brain barrier, 
with the exception of ZINC95486082 (Table 4.5). In terms of distribution, the 
P-glycoproteins (P-gp) are important members of the ATP transporters responsible for 
active efflux through membranes. Knowing whether or not a compound is a substrate of 
P-gp provides an indication of how well it will be distributed. ZINC14760755, 
ZINC28462577, Dichapetalin A and albendazole were all predicted to be 
non-substrates/inhibitors of P-gp, suggesting desirable distribution of the compounds in 
the circulatory system when administered. In terms of metabolism, Dichapetalin A was 
found to be a relatively better non-inhibitor of the CYP450 proteins, while the others each 
inhibited at least one CYP450 protein (Table 4.5). The CYP450 are a superfamily of 
isoenzymes and key players in drug elimination [180]. Inhibition of the CYP450 enzymes 
leads to drug accumulation and drug-drug interactions owing to low clearance of the 
drugs [180]. In terms of drug-likeness, the Lipinski rule of five (ro5) [181] was used as a 
measure, with the following criteria: the molecular weight should not exceed 500, the 
lipophilicity, LogP (the logarithm of the partition coefficient between water and 
1-octanol), should not exceed 5, the number of hydrogen bond donor atoms in the molecule 
should not exceed 5, and the number of hydrogen bond acceptors should not exceed 10. 
The physicochemical descriptors computed using DataWarrior and SwissADME were used 
to determine whether or not a drug violated the ro5. From the results provided in 
Table 4.5, ZINC14760755 and albendazole passed all Lipinski ro5 criteria, while 
ZINC28462577 and Dichapetalin A failed the ro5 with one and two violations, respectively. 
This suggests that ZINC14760755 exhibits high drug-likeness and is potentially relevant in 
hookworm drug discovery [181].  
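
Counting rule-of-five violations is a simple threshold check on four descriptors. The sketch below uses the usual form of the rule, in which a violation is a value above 500 Da, 5, 5 and 10 respectively. The descriptor values in the example are hypothetical and are not those of any compound in Tables 4.5 or 4.6.

```python
def lipinski_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Count how many of the four rule-of-five criteria a compound violates."""
    checks = [
        mol_weight > 500,        # molecular weight above 500 Da
        logp > 5,                # lipophilicity above 5
        h_bond_donors > 5,       # more than 5 hydrogen bond donors
        h_bond_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)

# Hypothetical descriptor sets for two made-up compounds.
small_compound = lipinski_violations(mol_weight=265.3, logp=3.0,
                                     h_bond_donors=2, h_bond_acceptors=3)
bulky_compound = lipinski_violations(mol_weight=650.0, logp=6.2,
                                     h_bond_donors=2, h_bond_acceptors=8)
```

A count of zero corresponds to a compound that passes all ro5 criteria, and counts of one or two correspond to the violation numbers reported in the "Lipinski #violations" column.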
 
The results of the ADME properties computed by SwissADME for the top ten NANPDB 
compounds (Table 4.6) revealed that most of the compounds, like the top ten 
compounds in Table 4.5, had a relatively low ESOL Log S (poor solubility). The 
potential lead compound, (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one, 
had an ESOL Log S of -5.55 and low GI absorption. The predictions indicate favourable 
distribution and excretion for most of the compounds in terms of Pgp inhibition and 
inhibition of CYP450 isoenzymes. In terms of the Lipinski ro5, 
(S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one 
satisfied all ro5 criteria, which may indicate that it is more drug-like than its 
counterparts [181]. All top ten compounds from NANPDB were non-permeant into the 
blood brain barrier. Table 4.6 provides a comprehensive list of the ADME prediction results 
for the top ten compounds from NANPDB. 
 
 
Table 4.5. Results of ADME prediction of top ten virtually screened compounds and that of Dichapetalin A and albendazole. 
Abbreviations: GI, gastrointestinal; BBB, blood brain barrier; ESOL, estimated solubility; Pgp, P-glycoprotein; CYP, 
cytochrome P450. 

| Compound | GI absorption | BBB permeant | ESOL Log S | ESOL Class | Lipinski #violations | Bioavailability Score | Pgp substrate | CYP1A2 inhibitor | CYP2C19 inhibitor | CYP2C9 inhibitor | CYP2D6 inhibitor | CYP3A4 inhibitor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ZINC14760755 | High | No | -6.52 | Poorly soluble | 0 | 0.55 | No | No | Yes | No | No | No |
| ZINC95485927 | Low | No | -10.55 | Insoluble | 2 | 0.17 | Yes | No | No | No | No | No |
| ZINC95486082 | High | Yes | -5.82 | Moderately soluble | 0 | 0.55 | Yes | No | Yes | Yes | No | Yes |
| ZINC95486263 | Low | No | -6.99 | Poorly soluble | 3 | 0.17 | No | No | No | Yes | No | No |
| ZINC14780716 | High | No | -6.55 | Poorly soluble | 0 | 0.55 | No | Yes | No | Yes | No | Yes |
| ZINC95485922 | Low | No | -7.48 | Poorly soluble | 0 | 0.55 | No | No | No | No | No | No |
| ZINC95486052 | High | No | -6.03 | Poorly soluble | 0 | 0.55 | Yes | No | Yes | Yes | No | Yes |
| ZINC95485928 | Low | No | -7.19 | Poorly soluble | 0 | 0.55 | No | No | Yes | No | No | No |
| ZINC13480348 | High | No | -6.26 | Poorly soluble | 0 | 0.55 | No | No | No | Yes | No | Yes |
| ZINC28462577 | Low | No | -7.14 | Poorly soluble | 1 | 0.55 | No | No | No | Yes | No | No |
| Dichapetalin A | High | No | -7.29 | Poorly soluble | 2 | 0.17 | No | No | No | No | No | No |
| Albendazole | High | No | -3.23 | Soluble | 0 | 0.55 | Yes | No | No | No | No | Yes |
 
Table 4.6. Results of ADME prediction of top ten ranking compounds from NANPDB. Abbreviations: GI, 
gastrointestinal; BBB, blood brain barrier; ESOL, estimated solubility; Pgp, P-glycoprotein; CYP, cytochrome P450. 

| Compound | GI absorption | BBB permeant | ESOL Log S | ESOL Class | Lipinski #violations | Bioavailability Score | Pgp substrate | CYP1A2 inhibitor | CYP2C19 inhibitor | CYP2C9 inhibitor | CYP2D6 inhibitor | CYP3A4 inhibitor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one | Low | No | -5.55 | Moderately soluble | 0 | 0.55 | No | No | No | Yes | No | Yes |
| campesterol | Low | No | -7.54 | Poorly soluble | 1 | 0.55 | No | No | No | No | No | No |
| orthidine_A | Low | No | -2.5 | Soluble | 1 | 0.55 | No | No | No | No | No | No |
| robustaflavone | Low | No | -6.75 | Poorly soluble | 2 | 0.17 | No | No | No | No | No | No |
| tetrahydrorobustaflavone | Low | No | -6.75 | Poorly soluble | 2 | 0.17 | No | No | No | No | No | No |
| siphonellinol_C | High | No | -4.76 | Moderately soluble | 0 | 0.55 | No | No | No | No | No | Yes |
| 6,10-dimethyl-9-methylene-2-(4-methyl-1,2-dioxabicyclo[2.2.2]oct-5-en-l-yl)undec-5-ene | High | No | -5.33 | Moderately soluble | 1 | 0.55 | No | Yes | Yes | Yes | Yes | No |
| spinescen | Low | No | -7.42 | Poorly soluble | 1 | 0.55 | No | No | No | No | No | Yes |
| euphohelionon | Low | No | -8.95 | Poorly soluble | 2 | 0.17 | No | No | No | No | Yes | No |
| anchinopeptolide_A | Low | No | -2.51 | Soluble | 2 | 0.17 | Yes | No | No | No | No | No |
 
4.7 Toxicity Prediction Analysis 
The results of the toxicity study summarised in Table 4.7 suggest that most of the drug 
candidates are neither tumorigenic nor irritant. Notably, ZINC28462577 stood out in terms 
of mutagenicity, tumorigenicity, reproductive effect and irritation, since it was predicted 
to be safe under all four conditions. This suggests that ZINC28462577 may have minimal 
and tolerable harmful effects when administered [182]. Dichapetalin A was predicted to be 
an irritant, while albendazole, a well-known anthelminthic drug [3, 186], was predicted to 
be safe under all conditions. Most compounds from the second set (NANPDB), including 
(S,5Z,8Z,11Z,13E,17Z)-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one, 
were predicted to be toxicologically safe, except euphohelionon, which was predicted to 
have high tumorigenicity and irritation effects. Anchinopeptolide A was also predicted to 
have high irritation effects but to be safe under the other conditions of mutagenicity, 
tumorigenicity and reproductive effects (Table 4.7). 
 
 
Table 4.7. Toxicological profile of the top ten ranking compounds from both sets of
virtual library compounds, as predicted by DataWarrior.

| Compound | Mutagenic | Tumorigenic | Reproductive Effective | Irritant |
|---|---|---|---|---|
| ZINC14760755 | none | none | none | none |
| ZINC95485927 | none | none | none | none |
| ZINC95486082 | none | none | none | none |
| ZINC95486263 | none | none | none | none |
| ZINC14780716 | low | none | none | none |
| ZINC95485922 | none | none | none | none |
| ZINC95486052 | high | none | none | none |
| ZINC95485928 | high | none | none | high |
| ZINC13480348 | none | none | high | none |
| ZINC28462577 | none | none | none | none |
| Dichapetalin A | none | none | none | high |
| Albendazole | none | none | none | none |
| S,5Z,8Z,11Z,13E,17Z-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-5,8,11,13,17-pentaen-1-one | none | none | none | none |
| campesterol | none | none | none | none |
| orthidine A | none | none | none | none |
| robustaflavone | none | none | none | none |
| tetrahydrorobustaflavone | none | none | none | none |
| siphonellinol C | none | none | none | none |
| 6,10-dimethyl-9-methylene-2-(4-methyl-1,2-dioxabicyclo [2.2.2] oct-5-en-l-yl) undec-5-ene | none | high | none | none |
| spinescen | none | none | none | none |
| euphohelionon | none | high | none | high |
| anchinopeptolide A | none | none | none | high |
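
The reading of Table 4.7 described in Section 4.7 amounts to a simple filter: a compound is
treated as safe only if every endpoint is predicted as "none". The short Python sketch below
illustrates that filter on a few rows transcribed from the table. The function and variable
names are hypothetical and this is not the DataWarrior workflow itself.

```python
# Predicted risk levels transcribed from a few rows of Table 4.7.
# Tuple order: (mutagenic, tumorigenic, reproductive effective, irritant).
TOXICITY = {
    "ZINC28462577": ("none", "none", "none", "none"),
    "ZINC95485928": ("high", "none", "none", "high"),
    "Dichapetalin A": ("none", "none", "none", "high"),
    "Albendazole": ("none", "none", "none", "none"),
    "campesterol": ("none", "none", "none", "none"),
    "euphohelionon": ("none", "high", "none", "high"),
}

ENDPOINTS = ("mutagenic", "tumorigenic", "reproductive effective", "irritant")


def flagged_endpoints(profile):
    """Return the endpoints whose predicted risk is anything other than 'none'."""
    return [name for name, level in zip(ENDPOINTS, profile) if level != "none"]


def split_by_safety(table):
    """Partition compounds into clean ones and ones with at least one flag."""
    clean, flagged = [], {}
    for compound, profile in table.items():
        flags = flagged_endpoints(profile)
        if flags:
            flagged[compound] = flags
        else:
            clean.append(compound)
    return clean, flagged


clean, flagged = split_by_safety(TOXICITY)
print("No predicted risk:", clean)
for compound, flags in flagged.items():
    print(f"{compound}: flagged for {', '.join(flags)}")
```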
 
 
4.8 Scaffold Analysis 
We conducted one scaffold analysis between list A and list B and another between list A
and list C (Section 3.10). This was done to assess the scaffold diversity of the docked
compounds, or otherwise their similarity to currently used anthelminthics. In addition, the
analysis was performed to determine how unique the scaffolds are within the natural
products, because such findings may give an indication of unique mechanisms of action
owing to unique affinity for biological targets [184]. Murcko scaffolds were used to
characterise the compounds, and scaffold counts (frequencies) were used to characterise the
distribution of molecules over unique scaffolds. The comparison of list A with list B
identified 142 unique scaffolds in list A and 13 unique scaffolds in list B, with a single
scaffold shared between the two lists (Figure 4.14A). The proportion of compounds uniquely
represented by a Murcko framework was 71% for list A and 81% for the known
anthelminthics in list B (Table 4.8). This indicates high scaffold diversity between the
docked natural compounds and the anthelminthics. The comparison of list A with list C
shows that different ring systems are present, suggesting a high diversity between the
docked natural products and albendazole and mebendazole (Figure 4.14B). The results of
both analyses suggest a high diversity within the natural compounds when compared to their
anthelminthic counterparts. These findings appear to support the recognised high chemical
diversity of natural products compared with synthetic libraries [183, 184].
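
The Ns/M ratio used in this section can be illustrated with a minimal Python sketch. It
assumes that a scaffold string has already been generated for each compound by a
cheminformatics toolkit. The function name and the six toy scaffold strings are
hypothetical and are not taken from the thesis data; only the figures 142 and 201 come
from the text.

```python
from collections import Counter


def scaffold_diversity(scaffolds):
    """Return (Ns, M, Ns/M) for a list holding one scaffold string per compound."""
    counts = Counter(scaffolds)   # scaffold -> number of compounds carrying it
    n_unique = len(counts)        # Ns: number of distinct scaffolds
    n_compounds = len(scaffolds)  # M: total number of compounds
    return n_unique, n_compounds, n_unique / n_compounds


# Hypothetical scaffold SMILES for six compounds (not taken from the thesis).
toy = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1", "c1ccc2ccccc2c1", "C1CCCCC1"]
ns, m, ratio = scaffold_diversity(toy)
print(ns, m, round(ratio, 2))

# Figures reported in the text for list A: 142 unique scaffolds, 201 compounds.
print(round(142 / 201, 2))
```

A ratio close to 1 means that almost every compound has its own scaffold, whereas a ratio
close to 0 means that many compounds share a few scaffolds.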
 
 
 
 
 
 
Figure 4.14. Bar plots of scaffold counts versus the ring systems of the 201 compounds from
list A compared with list B and list C. A. Comparison between list A and list B. B. Comparison
between list A and list C. The bar marked with an asterisk (*) represents compounds that failed
to generate a Murcko scaffold.
 
Table 4.8. Scaffold diversity analysis of natural products and anthelminthics. Ns/M is the
ratio of the number of unique scaffolds (Ns) to the total number of compounds (M).

| Category list | Ns/M |
|---|---|
| Natural products (AfroDB, NANPDB, Dichapetalin A) | 0.71 |
| Anthelminthics | 0.81 |
 
  
  
4.9 Proteochemometric Modelling 
The dataset retrieved from BindingDB comprised bioactivity profiles of three tubulins
(UniProt IDs Q25270, P02554 and Q6B856), with a sample size of 437. Although the sample
size was small, this limitation was mitigated by the choice of algorithm and the
optimisation strategies used to obtain the best model. The bioactivity assays within the
dataset were reported differently because of the different assay conditions. In addition,
the binding affinities were given as different constants (Kd, IC50, Ki and EC50, all in
nM), which makes it difficult to relate the bioactivity predicted by the PCM to a single
inhibition constant. The data points were therefore labelled as active or inactive using
the criteria provided in Section 3.11.1. Figure 4.15 shows the distribution of the active
and inactive subsets of the labelled dataset (class variables). The plot also shows that
the dataset was highly imbalanced.
 
Developing a PCM model requires both protein and ligand descriptors. The highly
interpretable compound and protein descriptors that were computed included solvent
accessibility, polarizability, relative hydrophobicity, predicted secondary structure, ring
counts, substructure counts and electro-topological state descriptors. In total, 859
compound descriptors and 146 protein descriptors were computed (Table 4.9). This resulted
in a high-dimensional dataset of 437 samples, each described by 1005 descriptors.
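
The descriptor counts above add up as 859 + 146 = 1005. One common way to build such a PCM
input, assumed here because the text does not state how the two blocks were combined, is to
concatenate each compound vector with the vector of its target protein. The following Python
sketch uses placeholder values and a hypothetical function name.

```python
N_COMPOUND_DESCRIPTORS = 859  # reported in the text
N_PROTEIN_DESCRIPTORS = 146   # reported in the text


def pcm_row(compound_descriptors, protein_descriptors):
    """Join one compound vector and one protein vector into a single PCM feature row."""
    if len(compound_descriptors) != N_COMPOUND_DESCRIPTORS:
        raise ValueError("unexpected number of compound descriptors")
    if len(protein_descriptors) != N_PROTEIN_DESCRIPTORS:
        raise ValueError("unexpected number of protein descriptors")
    # ASSUMPTION: plain concatenation, with no cross-term descriptors.
    return list(compound_descriptors) + list(protein_descriptors)


# Placeholder descriptor values; real values would come from the descriptor software.
compound = [0.0] * N_COMPOUND_DESCRIPTORS
protein = [1.0] * N_PROTEIN_DESCRIPTORS

row = pcm_row(compound, protein)
print(len(row))  # 859 + 146 = 1005
```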
 
 
 
 
 
 
 
 
  
Figure 4.15.  Distribution of response variable (class) in the dataset. A bar plot of frequencies 
 of the response variables or class labels within the dataset. 
 
 
Table 4.9. Protein and compound descriptors used in the development of the model
 
 
4.9.1 Exploratory principal component analysis (PCA) of compounds and target 
datasets 
The high-dimensional descriptors of the 437 interacting compounds in the dataset were
analysed using PCA to determine how far groups within the dataset differ from each other
in terms of biological and chemical space, respectively. The PCA of the compounds showed
three clusters (blue, green and red). The closeness of the clusters indicates that most of
the compounds occupy the same chemical space [186]. Because the computed descriptors
separated the compounds into three clusters, suggesting variability within the dataset,
they were included in the construction of the PCM [186]. However, the PCA plot also shows
some outliers (Figure 4.16A), indicating compounds that do not share chemical space with
the rest. The distribution of the beta tubulin protein variants suggests that they occupy a
wide biological space and indicates high variability within the dataset (Figure 4.16B).
The first principal component (PC1) explained 95% of the variance for the compounds and
97% for the proteins, which is large enough to describe most of the variability within the
dataset. The PCA plots thus provide a visualisation of how the descriptors separate the
compounds and the proteins into clusters or groups.
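
The "variance explained" by a principal component is the corresponding eigenvalue of the
covariance matrix divided by the sum of all eigenvalues. The Python sketch below works this
out for the simplest case of two descriptors, where the eigenvalues have a closed form. The
descriptor values are hypothetical and the function name is illustrative; the thesis
dataset has 1005 descriptors and would need a numerical library.

```python
import math


def explained_variance_ratio_2d(xs, ys):
    """Explained-variance ratios (PC1, PC2) for two descriptors, via the 2x2 covariance matrix."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (divide by n - 1).
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] from the characteristic polynomial.
    trace = var_x + var_y
    det = var_x * var_y - cov_xy ** 2
    root = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    return lam1 / trace, lam2 / trace


# Hypothetical, strongly correlated descriptor values (not thesis data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
pc1, pc2 = explained_variance_ratio_2d(xs, ys)
print(f"PC1: {pc1:.3f}, PC2: {pc2:.3f}")
```

When descriptors are strongly correlated, as in this toy example, PC1 alone captures almost
all of the variance.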
 
 
 
 
 
 
Figure 4.16. Chemical and biological space (compound–target interaction space) of the beta
tubulin-inhibitor dataset. A. PCA of the chemical descriptors shows an overlap of
compound descriptors in PC1 and PC2 space. B. The wide distribution of the amino
acid descriptors suggests a wide variation within the orthologue information.
 
4.9.2 Model development 
An SVM model was developed to predict the antitubulin activity of compounds based on the
bioassay dataset from BindingDB. The model was constructed using a combination of the
protein and ligand descriptors. A non-linear radial basis function (RBF) kernel was used,
and the model was trained with stratified 10-fold cross-validation (CV) while a grid search
over the hyperparameters C and gamma was performed to optimise it. The CV was used to
select the best kernel parameters and to estimate the performance of the model. The model
was then evaluated on a held-out test set that was not used in training. Stratified CV was
chosen because of the class imbalance in the dataset (Figure 4.15): with more non-active
than active samples, the model could otherwise overfit to the majority class.
 
4.9.2.1 Model validation 
To determine the predictive ability of the model on the test set, the Matthews correlation
coefficient (MCC), the area under the ROC curve (AUC) and the classification error were
used. The AUC of 87% indicates that the PCM model achieved good predictive ability
(Figure 4.17). The best hyperparameters found were {'C': 1000, 'gamma': 0.001}. The model
correctly classified 129 non-actives (97%) and 10 actives (77%), with an overall accuracy
of 96% and an MCC of 0.75 (Table 4.10). Although inactive compounds outnumbered active
compounds in the training set, the stratified cross-validation yielded a model with good
sensitivity and specificity, with the specificity being the higher of the two (Table
4.10). The classification error of the model was 0.04, in line with the overall accuracy,
and the MCC suggests a reasonably balanced performance across the two classes. Based on
these performance metrics, it can be concluded that the model performed well overall when
tested on the independent test set (Table 4.10).
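
The metrics named in this section can all be derived from the four confusion-matrix counts,
as the minimal Python sketch below shows. Only the correctly classified counts (129 and 10)
are stated in the text. The misclassified counts are assumptions back-calculated from the
stated percentages, so the printed values need not match Table 4.10 exactly.

```python
import math

# Correct classifications reported in the text.
TN = 129  # non-actives correctly classified
TP = 10   # actives correctly classified
# ASSUMPTION: the misclassified counts are not reported in the text; they are
# back-calculated from the stated rates (129 is ~97% of 133, 10 is ~77% of 13).
FP = 4
FN = 3


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, error, sensitivity, specificity and MCC from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


for name, value in classification_metrics(TP, TN, FP, FN).items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the MCC uses all four counts, which is why it is often preferred for
imbalanced datasets such as this one.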
 
After an exhaustive search of the literature, this appears to be the first time a PCM has
been applied to the bioactivity profiling of beta tubulin receptors. The performance of the
model was therefore compared with that of other PCM models [19, 186]. Cao et al. [188]
trained a random forest classifier on 13,079 data samples retrieved from BindingDB and the
PDSP Ki database. The target space was described with CTD descriptors and the ligand space
with hashed circular Morgan fingerprints. The classifier performed well, with an AUC of
0.96. Fernandez et al. [189] reported an SVM-based PCM for ligand-target modelling trained
on a total of 8,235 inhibitors for 95 kinase sequences. The SVM could classify 82% of the
data as stable or unstable, indicating a reasonably high performance of the model. Lapins
et al. [187] developed a unified PCM model which showed excellent predictive ability, with
an internal AUC of 0.923 and an external AUC of 0.940, for predicting the inhibition of
five major drug-metabolizing CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4)
using 63,391 data samples. Compared with these models, the SVM-based PCM model built in
this study performed reasonably well, with an AUC of 87%, although it was trained on a
relatively small experimental dataset of 437 samples for 3 beta tubulin variants. To
enhance the performance of the PCM model and allow a meaningful comparison with other
models, especially those of Cao et al. [188] and Fernandez et al. [189], it is recommended
that a large bioactivity dataset be generated from experimental high-throughput screening
against beta tubulin receptors and used to retrain the classifier.
 
Figure 4.17. Area under the receiver operating characteristic (ROC) curve (AUC). The ROC curve,
a plot of the true positive rate (TPR) against the false positive rate (FPR), is shown in blue.
The red dashes represent the chance level. A ROC curve lying above the chance level has an AUC
greater than 0.5.
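
The AUC can also be read as the probability that a randomly chosen active receives a higher
score than a randomly chosen inactive, which is why 0.5 corresponds to the chance level. The
Python sketch below computes the AUC in that pairwise form. The labels and scores are
hypothetical and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random active outscores a random inactive (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (not thesis data): 1 = active, 0 = inactive.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
print(roc_auc(labels, scores))
```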
 
 
Table 4.10. SVM model parameters and evaluation of classification performance  
 
 
CHAPTER 5 
CONCLUSION AND RECOMMENDATION 
This study set out to apply computational methods to the identification of novel
anthelminthic drugs from natural products, and to use support vector machine-based
proteochemometric modelling to predict the bioactivity of compounds against beta tubulin
targets. The specific aims were homology modelling of the 3D structure of the beta tubulin
of Necator americanus; virtual screening of naturally derived compounds for the
identification of potential anthelminthic agents; evaluation of the pharmacological, drug-
likeness and toxicity profiles of lead compounds; comparative analysis of the scaffolds of
the docked natural products and known synthetic anthelminthics, specifically albendazole
and mebendazole; and preliminary exploration of a proteochemometric machine learning
model as a plausible technique for bioactivity profiling of beta tubulin receptors. The main
conclusions are presented herein:
• Homology modelling was used to model a monomeric protein that is folded into a
β domain consisting of an 11-stranded β-sheet and 11 α-helices.
• Analysis of the modelled protein using molecular dynamics simulation showed
good dynamic behaviour over a period of 1 ns, which allowed us to subject the
stabilised receptor to molecular docking.
• Molecular docking and computational modelling techniques were used to identify
potential natural product-derived compounds against hookworm from the AfroDB
and NANPDB databases. ZINC28462577 from AfroDB and
S,5Z,8Z,11Z,13E,17Z-15-hydroxy-1-(2,4,6-trihydroxyphenyl)-15-methylicosa-
5,8,11,13,17-pentaen-1-one from NANPDB were selected as the most favourable
potential inhibitors when binding energy, interaction profile and pharmacological
properties were considered.
• Analysis of the scaffolds of the docked natural compounds compared with
albendazole and mebendazole revealed different ring systems, which led us to
infer a high diversity between the docked natural products and the known
anthelminthics.
• In addition, this study developed a PCM model for predicting the inhibition of
beta tubulin variants by compounds, using a curated experimental dataset. The
chemical compounds retrieved from the curated dataset were represented with
circular (Morgan) fingerprints, while the proteins were described by CTD
descriptors. The training dataset retained 437 data samples and 1005 total
descriptors after pre-processing. The PCM model was built using a support vector
machine with a radial basis function kernel and stratified 10-fold CV, and this
yielded a model with good overall predictive performance: an AUC of 87%, an MCC
of 0.75, an overall accuracy of 96% and a classification error of 4%.
One of the challenges encountered in the virtual screening was the huge computational cost
involved. It is recommended that future virtual screening be done on high-performance
computing clusters. Furthermore, the compounds identified in this study must be
experimentally characterised for possible pre-clinical trials. PCM has a significant
advantage in that its predictive ability can be extrapolated to related targets that were
not used to train the model. Because experimental bioactivity data on beta tubulin are
scarce, specifically for hookworm, a future direction could include high-throughput
screening of compounds in the wet lab to generate a larger dataset on hookworm. The larger
dataset could then be used to train the PCM-SVM model to enhance its performance and
increase its reliability for predictions pertaining to hookworm.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
REFERENCES 
[1] G. Sliwoski, S. Kothiwale, J. Meiler, and E. W. Lowe, “Computational Methods in 
Drug Discovery,” Pharmacol. Rev., vol. 66, no. 1, pp. 334–395, Jan. 2014. 
[2] S. Geerts and B. Gryseels, “Drug Resistance in Human Helminths:  Current 
Situation and Lessons from Livestock,” Clin. Microbiol. Rev., vol. 13, no. 2, pp. 
207–222, Apr. 2000. 
[3] P. A. Soukhathammavong et al., “Low efficacy of single-dose albendazole and 
mebendazole against hookworm and effect on concomitant helminth infection in 
Lao PDR,” PLoS Negl. Trop. Dis., vol. 6, no. 1, p. e1417, Jan. 2012. 
[4] H. A. Shalaby, “Anthelmintics Resistance; How to Overcome it?,” Iran. J. 
Parasitol., vol. 8, no. 1, pp. 18–32, 2013. 
[5] I. A. Sutherland and D. M. Leathwick, “Anthelmintic resistance in nematode 
parasites of cattle: a global issue?,” Trends Parasitol, vol. 27, 2011. 
[6] M. A. Chama et al., “Isolation, characterization, and anthelminthic activities of a 
novel dichapetalin and other constituents of Dichapetalum filicaule,” Pharm. Biol., 
vol. 54, no. 7, pp. 1179–1188, Jul. 2016. 
[7] M. L. Lee and G. Schneider, “Scaffold architecture and pharmacophoric properties 
of natural products and trade drugs: application in the design of natural product-
based combinatorial libraries,” J. Comb. Chem., vol. 3, no. 3, pp. 284–289, Jun. 
2001. 
[8] J. Clardy and C. Walsh, “Lessons from natural molecules,” Nature, vol. 432, no. 
7019, pp. 829–837, Dec. 2004. 
[9] A. M. Boldi, “Libraries from natural product-like scaffolds,” Curr. Opin. Chem. 
Biol., vol. 8, no. 3, pp. 281–286, Jun. 2004. 
[10] M. S. Butler, “The Role of Natural Product Chemistry in Drug Discovery,” J. Nat. 
Prod., vol. 67, no. 12, pp. 2141–2153, Dec. 2004. 
[11] K. Grabowski, K.-H. Baringhaus, and G. Schneider, “Scaffold diversity of natural 
products: inspiration for combinatorial library design,” Nat. Prod. Rep., vol. 25, no. 
5, pp. 892–904, Oct. 2008. 
[12] D. Morton, S. Leach, C. Cordier, S. Warriner, and A. Nelson, “Synthesis of natural-
product-like molecules with over eighty distinct scaffolds,” Angew. Chem. Int. Ed 
Engl., vol. 48, no. 1, pp. 104–109, 2009. 
[13] R. S. Bon and H. Waldmann, “Bioactivity-guided navigation of chemical space,” 
Acc. Chem. Res., vol. 43, no. 8, pp. 1103–1114, Aug. 2010. 
[14] X.-Y. Meng, H.-X. Zhang, M. Mezei, and M. Cui, “Molecular Docking: A powerful 
approach for structure-based drug discovery,” Curr. Comput. Aided Drug Des., vol. 
7, no. 2, pp. 146–157, Jun. 2011. 
[15] T. Qiu et al., “The recent progress in proteochemometric modelling: focusing on 
target descriptors, cross-term descriptors and application scope,” Brief. Bioinform., 
vol. 18, no. 1, pp. 125–136, Jan. 2017. 
[16] N. C. Sangster and J. Gill, “Pharmacology of anthelmintic resistance,” Parasitol. 
Today Pers. Ed, vol. 15, no. 4, pp. 141–146, Apr. 1999. 
[17] WHO, “The World Health Report 2002: Reducing risks, promoting healthy life,” Geneva:
World Health Organization, 2002, p. 192.
[18] M. Lapins and J. E. S. Wikberg, “Proteochemometric modeling of drug resistance over the
mutational space for multiple HIV protease variants and multiple protease
inhibitors.” [Online]. Available:
https://www.ncbi.nlm.nih.gov/pubmed/19391634. [Accessed: 19-Jul-2017].
[19] M. Lapinsh, P. Prusis, S. Uhlén, and J. E. S. Wikberg, “Improved approach for 
proteochemometrics modeling: application to organic compound—amine G protein-
coupled receptor interactions,” Bioinformatics, vol. 21, no. 23, pp. 4289–4296, Dec. 
2005. 
[20] J. E. S. Wikberg, O. Spjuth, M. Eklund, and M. Lapins, “Chemoinformatics Taking 
Biology into Account: Proteochemometrics,” in Computational Approaches in 
Cheminformatics and Bioinformatics, R. Guha and A. Bender, Eds. John Wiley & 
Sons, Inc., 2011, pp. 57–92. 
[21] S. Paricharak, I. Cortés-Ciriano, A. P. IJzerman, T. E. Malliavin, and A. Bender, 
“Proteochemometric modelling coupled to in silico target prediction: an integrated 
approach for the simultaneous prediction of polypharmacology and binding 
affinity/potency of small molecules,” J. Cheminformatics, vol. 7, Apr. 2015. 
[22] “Modeling, docking, simulation, and inhibitory activity of the benzimidazole 
analogue against b-tubulin protein from Brugia malayi for treating lymphatic 
filariasis,” ResearchGate. [Online]. Available: 
https://www.researchgate.net/publication/232238315_Modeling_docking_simulatio
n_and_inhibitory_activity_of_the_benzimidazole_analogue_against_b-
tubulin_protein_from_Brugia_malayi_for_treating_lymphatic_filariasis. [Accessed: 
30-Jul-2017]. 
[23] Y. T. Tang et al., “Genome of the human hookworm Necator americanus,” Nat. 
Genet., vol. 46, no. 3, pp. 261–269, Mar. 2014. 
[24] R. L. Pullan, J. L. Smith, R. Jasrasaria, and S. J. Brooker, “Global numbers of 
infection and disease burden of soil transmitted helminth infections in 2010,” 
Parasit. Vectors, vol. 7, p. 37, 2014. 
[25] N. R. Stoll, “This wormy world,” J. Parasitol., vol. 33, no. 1, pp. 1–18, Feb. 1947. 
[26] J. Bethony et al., “Soil-transmitted helminth infections: ascariasis, trichuriasis, and 
hookworm,” Lancet Lond. Engl., vol. 367, no. 9521, pp. 1521–1532, May 2006. 
[27] S. Brooker, J. Bethony, and P. J. Hotez, “Human Hookworm Infection in the 21st 
Century,” Adv. Parasitol., vol. 58, pp. 197–288, 2004. 
[28] “WHO | Prevention and control of schistosomiasis and soil-transmitted 
helminthiasis: WHO Technical Report Series N° 912,” WHO. [Online]. Available: 
http://www.who.int/intestinal_worms/resources/who_trs_912/en/. [Accessed: 29-
Nov-2017]. 
[29] A. Forrer et al., “Risk Profiling of Hookworm Infection and Intensity in Southern 
Lao People’s Democratic Republic Using Bayesian Models,” PLoS Negl. Trop. 
Dis., vol. 9, no. 3, Mar. 2015. 
[30] A. J. Daveson et al., “Effect of hookworm infection on wheat challenge in celiac 
disease--a randomised double-blinded placebo controlled trial,” PloS One, vol. 6, 
no. 3, p. e17366, 2011. 
[31] H. J. McSorley and A. Loukas, “The immunology of human hookworm infections,” 
Parasite Immunol., vol. 32, no. 8, pp. 549–559, Aug. 2010. 
[32] P. J. Hotez, P. J. Brindley, J. M. Bethony, C. H. King, E. J. Pearce, and J. Jacobson, 
“Helminth infections: the great neglected tropical diseases,” J. Clin. Invest., vol. 
118, no. 4, pp. 1311–1321, Apr. 2008. 
[33] J. G. Shaw and J. F. Friedman, “Iron deficiency anemia: focus on infectious diseases 
in lesser developed countries,” Anemia, vol. 2011, p. 260380, 2011. 
[34] W. Walana, E. N. K. Aidoo, and S. C. K. Tay, “Prevalence of hookworm infection: 
a retrospective study in Kumasi,” Asian Pac. J. Trop. Biomed., vol. 4, no. Suppl 1, 
pp. S158–S161, May 2014. 
[35] J. J. Verweij et al., “Determining the prevalence of Oesophagostomum bifurcum 
and Necator americanus infections using specific PCR amplification of DNA from 
faecal samples,” Trop. Med. Int. Health TM IH, vol. 6, no. 9, pp. 726–731, Sep. 
2001. 
[36] “Cell Biology 06: The Cytoskeleton Part II: Tubulin.” [Online]. Available:
http://www.cureffi.org/2013/03/10/cell-biology-06-the-cytoskeleton-part-ii-tubulin/.
[Accessed: 19-Jul-2017].
[37] B. Fennell et al., “Microtubules as antiparasitic drug targets,” Expert Opin. Drug 
Discov., vol. 3, no. 5, pp. 501–518, May 2008. 
[38] M. S. Kwa, J. G. Veenstra, M. Van Dijk, and M. H. Roos, “Beta-tubulin genes from 
the parasitic nematode Haemonchus contortus modulate drug resistance in 
Caenorhabditis elegans,” J. Mol. Biol., vol. 246, no. 4, pp. 500–510, Mar. 1995. 
[39] E. Lacey, “The role of the cytoskeletal protein, tubulin, in the mode of action and 
mechanism of drug resistance to benzimidazoles,” Int. J. Parasitol., vol. 18, no. 7, 
pp. 885–936, Nov. 1988. 
[40] T. V. Hansen, S. M. Thamsborg, A. Olsen, R. K. Prichard, and P. Nejsum, “Genetic 
variations in the beta-tubulin gene and the internal transcribed spacer 2 region of 
Trichuris species from man and baboons,” Parasit. Vectors, vol. 6, p. 236, 2013. 
[41] M. H. Roos, “The molecular nature of benzimidazole resistance in helminths,” 
Parasitol. Today, vol. 6, no. 4, pp. 125–127, Apr. 1990. 
[42] E. Redman et al., “The Emergence of Resistance to the Benzimidazole
Anthlemintics in Parasitic Nematodes of Livestock Is Characterised by Multiple
Independent Hard and Soft Selective Sweeps,” PLoS Negl. Trop. Dis., vol. 9, no. 2,
Feb. 2015.
[43] J. Vercruysse et al., “Is anthelmintic resistance a concern for the control of human 
soil-transmitted helminths?,” Int. J. Parasitol. Drugs Drug Resist., vol. 1, no. 1, pp. 
14–27, Dec. 2011. 
[44] K. Lalchhandama, “Anthelmintic resistance: the song remains the same,” Sci. Vis. 
[45] Y. Ruckebusch, P.-L. Toutian, and G. D. Koritz, Veterinary Pharmacology and 
Toxicology. Springer Science & Business Media, 2012. 
[46] L. F. V. Furtado, A. C. P. de Paiva Bello, and É. M. L. Rabelo, “Benzimidazole 
resistance in helminths: From problem to diagnosis,” Acta Trop., vol. 162, no. 
Supplement C, pp. 95–102, Oct. 2016. 
[47] H. Lodish, A. Berk, S. L. Zipursky, P. Matsudaira, D. Baltimore, and J. Darnell, 
“Molecular Properties of Voltage-Gated Ion Channels,” 2000. 
[48] R. M. Greenberg, “Ion Channels and Drug Transporters as Targets for 
Anthelmintics,” Curr. Clin. Microbiol. Rep., vol. 1, no. 3, pp. 51–60, 2014. 
[49] W. C. Campbell, M. H. Fisher, E. O. Stapley, G. Albers-Schönberg, and T. A. 
Jacob, “Ivermectin: a potent new antiparasitic agent,” Science, vol. 221, no. 4613, 
pp. 823–828, Aug. 1983. 
[50] J. Wei et al., “The hookworm Ancylostoma ceylanicum intestinal transcriptome 
provides a platform for selecting drug and vaccine candidates,” Parasit. Vectors, 
vol. 9, no. 1, p. 518, 2016. 
[51] P. Cohen, “Protein kinases--the major drug targets of the twenty-first century?,” 
Nat. Rev. Drug Discov., vol. 1, no. 4, pp. 309–315, Apr. 2002. 
[52] J. Keiser and J. Utzinger, “Efficacy of current drugs against soil-transmitted helminth
infections: Systematic review and meta-analysis,” JAMA, vol. 299, no. 16, pp.
1937–1948, Apr. 2008.
[53] M. Katz, “Anthelmintics,” Drugs, vol. 32, no. 4, pp. 358–371, Oct. 1986. 
[54] P. Köhler, “The biochemical basis of anthelmintic action and resistance.,” Int. J. 
Parasitol., vol. 31, no. 4, pp. 336–345, Apr. 2001. 
[55] D. J. Newman, G. M. Cragg, and K. M. Snader, “Natural products as sources of new 
drugs over the period 1981-2002,” J. Nat. Prod., vol. 66, no. 7, pp. 1022–1037, Jul. 
2003. 
[56] M. Lahlou, “The Success of Natural Products in Drug Discovery,” vol. 2013, Jun. 
2013. 
[57] “History of Iran: History of ancient Medicine in Mesopotamia & Iran.” [Online]. 
Available: 
http://www.iranchamber.com/history/articles/ancient_medicine_mesopotamia_iran.
php. [Accessed: 30-Jun-2017]. 
[58] D. G. I. Kingston, “Modern natural products drug discovery and its relevance to 
biodiversity conservation,” J. Nat. Prod., vol. 74, no. 3, pp. 496–511, Mar. 2011. 
[59] Y.-W. Chin, M. J. Balunas, H. B. Chai, and A. D. Kinghorn, “Drug discovery from 
natural sources,” AAPS J., vol. 8, no. 2, pp. E239-253, Apr. 2006. 
[60] “Natural Products Drug Discovery |.” [Online]. Available: 
https://www.omicsonline.org/conferences-list/natural-products-drug-discovery. 
[Accessed: 19-Jul-2017]. 
[61] “EXPANDING NATURAL PRODUCT SPACE,” Chemdiv, 19-Aug-2014. 
[Online]. Available: http://www.chemdiv.com/natural-product-libraries/. [Accessed: 
30-Jun-2017]. 
[62] | A., “Why Natural Products? – JCC Marketing, LLC.” . 
[63] E. Ravina, The Evolution of Drug Discovery: From Traditional Medicines to 
Modern Drugs. John Wiley & Sons, 2011. 
[64] “Chemistry | CWU Professors Awarded $360,000 to Fight Scourge of Hookworms.” 
[Online]. Available: https://www.cwu.edu/chemistry/cwu-professors-awarded-
360000-fight-scourge-hookworms. [Accessed: 30-Jun-2017]. 
[65] N. Prakash and P. Devangi, “Drug Discovery,” J. Antivir. Antiretrovir., vol. 2, no. 4, 
Dec. 2010. 
[66] C.-L. Hung and C.-C. Chen, “Computational approaches for drug discovery,” Drug 
Dev. Res., vol. 75, no. 6, pp. 412–418, Sep. 2014. 
[67] T. Zhu et al., “Hit Identification and Optimization in Virtual Screening: Practical 
Recommendations Based Upon a Critical Literature Analysis,” J. Med. Chem., vol. 
56, no. 17, pp. 6560–6572, Sep. 2013. 
[68] K. M. M. Jr, D. Ringe, and C. H. Reynolds, Drug Design: Structure- and Ligand-
Based Approaches. Cambridge University Press, 2010. 
[69] “Computer-Aided Drug Design of Bioactive Natural Products (PDF Download 
Available),” ResearchGate. [Online]. Available: 
https://www.researchgate.net/publication/274892654_Computer-
Aided_Drug_Design_of_Bioactive_Natural_Products. [Accessed: 14-Apr-2017]. 
[70] P. Aparoy, K. Kumar Reddy, and P. Reddanna, “Structure and Ligand Based Drug 
Design Strategies in the Development of Novel 5-LOX Inhibitors,” Curr. Med. 
Chem., vol. 19, no. 22, pp. 3763–3778, Aug. 2012. 
[71] H.-M. Lee and Y. Kim, “Drug Repurposing Is a New Opportunity for Developing 
Drugs against Neuropsychiatric Disorders,” Schizophr. Res. Treat., vol. 2016, p. 
e6378137, Mar. 2016. 
[72] “Exponential growth in the number of X-ray protein structures deposited... - Figure
2 of 7,” ResearchGate. [Online]. Available:
https://www.researchgate.net/figure/233541013_fig2_Exponential-growth-in-the-
number-of-X-ray-protein-structures-deposited-in-the-Protein. [Accessed: 15-Apr-
2017].
[73] N. Eswar et al., “Comparative Protein Structure Modeling Using Modeller,” Curr. 
Protoc. Bioinforma. Ed. Board Andreas Baxevanis Al, vol. 0 5, p. Unit-5.6, Oct. 
2006. 
[74] B. Rost, “PHD: predicting one-dimensional protein structure by profile-based neural 
networks,” Methods Enzymol., vol. 266, pp. 525–539, 1996. 
[75] L. J. McGuffin, K. Bryson, and D. T. Jones, “The PSIPRED protein structure 
prediction server,” Bioinforma. Oxf. Engl., vol. 16, no. 4, pp. 404–405, Apr. 2000. 
[76] A. Agrawal and X. Huang, “PSIBLAST_PairwiseStatSig: reordering PSI-BLAST 
hits using pairwise statistical significance,” Bioinforma. Oxf. Engl., vol. 25, no. 8, 
pp. 1082–1083, Apr. 2009. 
[77] J. Söding, A. Biegert, and A. N. Lupas, “The HHpred interactive server for protein 
homology detection and structure prediction,” Nucleic Acids Res., vol. 33, no. Web 
Server issue, pp. W244-248, Jul. 2005. 
[78] V. Le Guilloux, P. Schmidtke, and P. Tuffery, “Fpocket: An open source platform 
for ligand pocket detection,” BMC Bioinformatics, vol. 10, p. 168, 2009. 
[79] A. Volkamer, D. Kuhn, F. Rippmann, and M. Rarey, “DoGSiteScorer: a web server 
for automatic binding site prediction, analysis and druggability assessment,” 
Bioinforma. Oxf. Engl., vol. 28, no. 15, pp. 2074–2075, Aug. 2012. 
[80] “Improving protein-ligand binding site prediction accuracy by classification of inner 
pocket points using local features (PDF Download Available),” ResearchGate. 
[Online]. Available: 
https://www.researchgate.net/publication/275663912_Improving_protein-
ligand_binding_site_prediction_accuracy_by_classification_of_inner_pocket_points
_using_local_features. [Accessed: 15-Apr-2017]. 
[81] B. Huang, “MetaPocket: a meta approach to improve protein ligand binding site 
prediction,” Omics J. Integr. Biol., vol. 13, no. 4, pp. 325–330, Aug. 2009. 
[82] C. Zheng, M. Wang, K. Takemoto, T. Akutsu, Z. Zhang, and J. Song, “An 
Integrative Computational Framework Based on a Two-Step Random Forest 
Algorithm Improves Prediction of Zinc-Binding Sites in Proteins,” PLOS ONE, vol. 
7, no. 11, p. e49716, Nov. 2012. 
[83] Y.-C. Lo, R. Gui, H. Honda, and J. Z. Torres, “Quantitative Methods in System-
Based Drug Discovery,” 2016. 
[84] C.-H. Lee, H.-C. Huang, and H.-F. Juan, “Reviewing Ligand-Based Rational Drug 
Design: The Search for an ATP Synthase Inhibitor,” Int. J. Mol. Sci., vol. 12, no. 8, 
pp. 5304–5318, Aug. 2011. 
[85] R. C. Glen, A. Bender, C. H. Arnby, L. Carlsson, S. Boyer, and J. Smith, “Circular 
fingerprints: flexible molecular descriptors with applications from physical 
chemistry to ADME,” IDrugs Investig. Drugs J., vol. 9, no. 3, pp. 199–204, Mar. 
2006. 
[86] M. Kuhn, “Quantitative-Structure Activity Relationship Modeling and 
Cheminformatics,” in Nonclinical Statistics for Pharmaceutical and Biotechnology 
Industries, L. Zhang, Ed. Springer International Publishing, 2016, pp. 141–155. 
[87] D. Rognan, “Chemogenomic approaches to rational drug design,” Br. J. 
Pharmacol., vol. 152, no. 1, pp. 38–52, Sep. 2007. 
[88] G. Wolber and T. Langer, “LigandScout: 3-D pharmacophores derived from 
protein-bound ligands and their use as virtual screening filters,” J. Chem. Inf. 
Model., vol. 45, no. 1, pp. 160–169, Feb. 2005. 
[89] R. S. Armen, J. Chen, and C. L. Brooks, “An Evaluation of Explicit Receptor 
Flexibility in Molecular Docking Using Molecular Dynamics and Torsion Angle 
Molecular Dynamics,” J. Chem. Theory Comput., vol. 5, no. 10, pp. 2909–2923, 
Oct. 2009. 
[90] A. M. Dar and S. Mir, “Molecular Docking: Approaches, Types, Applications and 
Basic Challenges,” J. Anal. Bioanal. Tech., Apr. 2017. 
[91] Z. Zhou, A. K. Felts, R. A. Friesner, and R. M. Levy, “Comparative Performance of 
Several Flexible Docking Programs and Scoring Functions:  Enrichment Studies for 
a Diverse Set of Pharmaceutically Relevant Targets,” J. Chem. Inf. Model., vol. 47, 
no. 4, pp. 1599–1608, Jul. 2007. 
[92] S. S. da R. Pita, T. V. A. Fernandes, E. R. Caffarena, and P. G. Pascutti, 
“Studies of molecular docking between fibroblast growth factor and heparin using 
generalized simulated annealing,” Int. J. Quantum Chem., vol. 108, pp. 2608–2614. 
[93] M. P. Repasky, M. Shelley, and R. A. Friesner, “Flexible ligand docking with 
Glide,” Curr. Protoc. Bioinforma., vol. Chapter 8, p. Unit 8.12, Jun. 2007. 
[94] “FRED — OEDocking, v3.2.0.2.” [Online]. Available: 
https://docs.eyesopen.com/oedocking/fred.html. [Accessed: 16-Apr-2017]. 
[95] S. Forli, R. Huey, M. E. Pique, M. F. Sanner, D. S. Goodsell, and A. J. Olson, 
“Computational protein-ligand docking and virtual drug screening with the 
AutoDock suite,” Nat. Protoc., vol. 11, no. 5, pp. 905–919, May 2016. 
[96] O. Trott and A. J. Olson, “AutoDock Vina: improving the speed and accuracy of 
docking with a new scoring function, efficient optimization and multithreading,” J. 
Comput. Chem., vol. 31, no. 2, pp. 455–461, Jan. 2010. 
[97] S. Joy, P. S. Nair, R. Hariharan, and M. R. Pillai, “Detailed comparison of the 
protein-ligand docking efficiencies of GOLD, a commercial package and ArgusLab, 
a licensable freeware,” In Silico Biol., vol. 6, no. 6, pp. 601–605, 2006. 
[98] “Center for Bioinformatics: Universität Hamburg - FlexX: Molecular Docking.” 
[Online]. Available: http://www.zbh.uni-hamburg.de/en/research/research-group-
for-computational-molecular-design/software-server/flexx-molecular-docking.html. 
[Accessed: 16-Apr-2017]. 
[99] I. Cortés-Ciriano et al., “Polypharmacology modelling using proteochemometrics 
(PCM): recent methodological developments, applications to target families, and 
future prospects,” MedChemComm, vol. 6, no. 1, pp. 24–50, 2015. 
[100] G. J. P. van Westen, J. K. Wegner, A. P. IJzerman, H. W. T. van Vlijmen, and A. 
Bender, “Proteochemometric modeling as a tool to design selective compounds and 
for extrapolating to novel targets,” MedChemComm, vol. 2, no. 1, pp. 16–30, Jan. 
2011. 
[101] C. R. Andersson, M. G. Gustafsson, and H. Strömbergsson, “Quantitative Chemogenomics: Machine-
Learning Models of Protein-Ligand Interaction,” http://www.eurekaselect.com. 
[Online]. Available: http://www.eurekaselect.com/88475/article. [Accessed: 24-
Apr-2017]. 
[102] I. Cortés-Ciriano et al., “Polypharmacology modelling using proteochemometrics 
(PCM): recent methodological developments, applications to target families, and 
future prospects,” MedChemComm, vol. 6, no. 1, pp. 24–50, Jan. 2015. 
[103] A. L. Tarca, V. J. Carey, X. Chen, R. Romero, and S. Drăghici, “Machine Learning 
and Its Applications to Biology,” PLoS Comput. Biol., vol. 3, no. 6, Jun. 2007. 
[104] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to 
kernel-based learning algorithms,” IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 
181–201, 2001. 
[105] A. Ben-Hur, C. S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch, “Support 
Vector Machines and Kernels for Computational Biology,” PLOS Comput. Biol., 
vol. 4, no. 10, p. e1000173, Oct. 2008. 
[106] D. Wu et al., “Screening of selective histone deacetylase inhibitors by 
proteochemometric modeling,” BMC Bioinformatics, vol. 13, p. 212, Aug. 2012. 
[107] R. Casanova, S. Saldana, E. Y. Chew, R. P. Danis, C. M. Greven, and W. T. 
Ambrosius, “Application of Random Forests Methods to Diabetic Retinopathy 
Classification Analyses,” PLoS ONE, vol. 9, no. 6, Jun. 2014. 
[108] I. Cortes-Ciriano, G. J. van Westen, D. S. Murrell, E. B. Lenselink, A. Bender, and 
T. E. Malliavin, “Applications of proteochemometrics - from species extrapolation 
to cell line sensitivity modelling,” BMC Bioinformatics, vol. 16, no. 3, p. A4, 2015. 
[109] A. Golbraikh and A. Tropsha, “Beware of q2!,” J. Mol. Graph. Model., vol. 20, no. 
4, pp. 269–276, Jan. 2002. 
[110] A. Tropsha, P. Gramatica, and V. K. Gombar, “The Importance of Being Earnest: 
Validation is the Absolute Essential for Successful Application and Interpretation of 
QSPR Models,” QSAR Comb. Sci., vol. 22, no. 1, pp. 69–77, Apr. 2003. 
[111] A. Schwaighofer et al., “Accurate Solubility Prediction with Error Bars for 
Electrolytes:  A Machine Learning Approach,” J. Chem. Inf. Model., vol. 47, no. 2, 
pp. 407–424, Mar. 2007. 
[112] P. Zhou, X. Chen, Y. Wu, and Z. Shang, “Gaussian process: an alternative 
approach for QSAM modeling of peptides,” Amino Acids, vol. 38, no. 1, pp. 199–
212, Jan. 2010. 
[113] O. Obrezanova, G. Csányi, J. M. R. Gola, and M. D. Segall, “Gaussian Processes:  
A Method for Automatic QSAR Modeling of ADME Properties,” J. Chem. Inf. 
Model., vol. 47, no. 5, pp. 1847–1857, Sep. 2007. 
[114] M. Belyaev, E. Burnaev, and Y. Kapushev, “Exact Inference for Gaussian Process 
Regression in case of Big Data with the Cartesian Product Structure,” 
ArXiv14036573 Math Stat, Mar. 2014. 
[115] J. J. Vermeire, L. D. Lantz, and C. R. Caffrey, “Cure of Hookworm Infection with 
a Cysteine Protease Inhibitor,” PLoS Negl. Trop. Dis., vol. 6, no. 7, Jul. 2012. 
[116] Y. Cho et al., “Drug Repositioning and Pharmacophore Identification in the 
Discovery of Hookworm MIF Inhibitors,” Chem. Biol., vol. 18, no. 9, pp. 1089–
1101, Sep. 2011. 
[117] J. Keiser, G. Panic, R. Adelfio, N. Cowan, M. Vargas, and I. Scandale, “Evaluation 
of an FDA approved library against laboratory models of human intestinal nematode 
infections,” Parasit. Vectors, vol. 9, no. 1, p. 376, 2016. 
[118] P. Wangchuk, P. R. Giacomin, M. S. Pearson, M. J. Smout, and A. Loukas, 
“Identification of lead chemotherapeutic agents from medicinal plants against blood 
flukes and whipworms,” Sci. Rep., vol. 6, p. 32101, Aug. 2016. 
[119] V. Khanna and S. Ranganathan, “In silico approach to screen compounds active 
against parasitic nematodes of major socio-economic importance,” BMC 
Bioinformatics, vol. 12, no. Suppl 13, p. S25, Nov. 2011. 
[120] Y. Marrero-Ponce et al., “TOMOCOMD-CARDD, a novel approach for computer-
aided ‘rational’ drug design: I. Theoretical and experimental assessment of a 
promising method for computational screening and in silico design of new 
anthelmintic compounds,” J. Comput. Aided Mol. Des., vol. 18, no. 10, pp. 615–
634, Oct. 2004. 
[121] S. Dakshanamurthy et al., “Predicting New Indications for Approved Drugs Using 
a Proteo-Chemometric Method,” J. Med. Chem., vol. 55, no. 15, pp. 6832–6848, 
Aug. 2012. 
[122] “UniProt: a hub for protein information,” Nucleic Acids Res., vol. 43, no. D1, pp. 
D204–D212, Jan. 2015. 
[123] A. Roy, A. Kucukural, and Y. Zhang, “I-TASSER: a unified platform for 
automated protein structure and function prediction,” Nat. Protoc., vol. 5, no. 4, pp. 
725–738, Apr. 2010. 
[124] N. Eswar et al., “Comparative Protein Structure Modeling Using Modeller,” Curr. 
Protoc. Bioinforma., vol. Chapter 5, p. Unit 5.6, Oct. 2006. 
[125] “WHAT IF homepage.” [Online]. Available: http://swift.cmbi.ru.nl/whatif/. 
[Accessed: 22-Mar-2017]. 
[126] “Swiss PDB Viewer - Home.” [Online]. Available: http://spdbv.vital-it.ch/. 
[Accessed: 29-Jun-2017]. 
[127] S. Yuan, H. C. S. Chan, and Z. Hu, “Using PyMOL as a platform for computational 
drug design,” Wiley Interdiscip. Rev. Comput. Mol. Sci., vol. 7, no. 2, Mar. 2017. 
[128] S. A. Hollingsworth and P. A. Karplus, “A fresh look at the Ramachandran plot and 
the occurrence of standard structures in proteins,” Biomol. Concepts, vol. 1, no. 3–4, 
pp. 271–283, Oct. 2010. 
[129] M. Kalman and N. Ben-Tal, “Quality assessment of protein model-structures using 
evolutionary conservation,” Bioinformatics, vol. 26, no. 10, pp. 1299–1307, May 
2010. 
[130] D. Eisenberg, R. Lüthy, and J. U. Bowie, “VERIFY3D: assessment of protein 
models with three-dimensional profiles,” Methods Enzymol., vol. 277, pp. 396–404, 
1997. 
[131] R. A. Laskowski, J. A. Rullmann, M. W. MacArthur, R. Kaptein, and J. M. 
Thornton, “AQUA and PROCHECK-NMR: programs for checking the quality of 
protein structures solved by NMR,” J. Biomol. NMR, vol. 8, no. 4, pp. 477–486, 
Dec. 1996. 
[132] “Gromacs - Gromacs.” [Online]. Available: http://www.gromacs.org/. [Accessed: 
22-Mar-2017]. 
[133] W. Humphrey, A. Dalke, and K. Schulten, “VMD: visual molecular dynamics,” J. 
Mol. Graph., vol. 14, no. 1, pp. 33–38, 27–28, Feb. 1996. 
[134] B. Huang, “MetaPocket: A Meta Approach to Improve Protein Ligand Binding Site 
Prediction,” Omics J. Integr. Biol., vol. 13, no. 4, pp. 325–330, Aug. 2009. 
[135] T. A. Binkowski, S. Naghibzadeh, and J. Liang, “CASTp: Computed Atlas of 
Surface Topography of proteins,” Nucleic Acids Res., vol. 31, no. 13, pp. 3352–
3355, Jul. 2003. 
[136] D. Seeliger and B. L. de Groot, “Ligand docking and binding site analysis with 
PyMOL and Autodock/Vina,” J. Comput. Aided Mol. Des., vol. 24, no. 5, pp. 417–
422, May 2010. 
[137] G. M. Morris et al., “AutoDock4 and AutoDockTools4: Automated Docking with 
Selective Receptor Flexibility,” J. Comput. Chem., vol. 30, no. 16, pp. 2785–2791, 
Dec. 2009. 
[138] F. Ntie-Kang et al., “AfroDb: A Select Highly Potent and Diverse Natural Product 
Library from African Medicinal Plants,” PLOS ONE, vol. 8, no. 10, p. e78085, Oct. 
2013. 
[139] J. J. Irwin and B. K. Shoichet, “ZINC – A Free Database of Commercially 
Available Compounds for Virtual Screening,” J. Chem. Inf. Model., vol. 45, no. 1, 
pp. 177–182, 2005. 
[140] “NANPDB | NANPDB.” [Online]. Available: http://african-
compounds.org/nanpdb/. [Accessed: 29-Jun-2017]. 
[141] “NANPDB: A Resource for Natural Products from Northern African Sources - 
Journal of Natural Products (ACS Publications).” [Online]. Available: 
http://pubs.acs.org/doi/ipdf/10.1021/acs.jnatprod.7b00283. [Accessed: 28-Jul-2017]. 
[142] N. M. O’Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch, and G. R. 
Hutchison, “Open Babel: An open chemical toolbox,” J. Cheminformatics, vol. 3, 
no. 1, p. 33, Oct. 2011. 
[143] “The PRODRG Server.” [Online]. Available: 
http://davapc1.bioch.dundee.ac.uk/cgi-bin/prodrg/. [Accessed: 28-Jul-2017]. 
[144] S. Dallakyan and A. J. Olson, “Small-molecule library screening by docking with 
PyRx,” Methods Mol. Biol. Clifton NJ, vol. 1263, pp. 243–250, 2015. 
[145] O. Trott and A. J. Olson, “AutoDock Vina: improving the speed and accuracy of 
docking with a new scoring function, efficient optimization and multithreading,” J. 
Comput. Chem., vol. 31, no. 2, pp. 455–461, Jan. 2010. 
[146] A. C. Wallace, R. A. Laskowski, and J. M. Thornton, “LIGPLOT: a 
program to generate schematic diagrams of protein-ligand interactions,” Protein 
Eng., vol. 8, no. 2, pp. 127–134, Feb. 1995. 
[147] A. Daina, O. Michielin, and V. Zoete, “SwissADME: a free web tool to evaluate 
pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small 
molecules,” Sci. Rep., vol. 7, p. 42717, Mar. 2017. 
[148] T. Sander, J. Freyss, M. von Korff, and C. Rufener, “DataWarrior: an open-source 
program for chemistry aware data visualization and analysis,” J. Chem. Inf. Model., 
vol. 55, no. 2, pp. 460–473, Feb. 2015. 
[149] G. W. Bemis and M. A. Murcko, “Properties of Known Drugs. 2. Side Chains,” J. 
Med. Chem., vol. 42, no. 25, pp. 5095–5099, Dec. 1999. 
[150] S. Wetzel et al., “Interactive exploration of chemical space with Scaffold Hunter,” 
Nat. Chem. Biol., vol. 5, no. 8, pp. 581–583, Aug. 2009. 
[151] A. A. Shelat and R. K. Guy, “Scaffold composition and biological relevance of 
screening libraries,” Nat. Chem. Biol., vol. 3, no. 8, pp. 442–446, Aug. 2007. 
[152] A. H. Lipkus et al., “Structural diversity of organic chemistry. A scaffold analysis 
of the CAS Registry,” J. Org. Chem., vol. 73, no. 12, pp. 4443–4451, Jun. 2008. 
[153] M. K. Gilson, T. Liu, M. Baitaluk, G. Nicola, L. Hwang, and J. Chong, 
“BindingDB in 2015: A public database for medicinal chemistry, computational 
chemistry and systems pharmacology,” Nucleic Acids Res., vol. 44, no. Database 
issue, pp. D1045–D1053, Jan. 2016. 
[154] X. Ning, M. Walters, and G. Karypis, “Improved Machine Learning Models for 
Predicting Selective Compounds,” J. Chem. Inf. Model., vol. 52, no. 1, pp. 38–50, 
Jan. 2012. 
[155] D. S. Murrell et al., “Chemically Aware Model Builder (camb): an R package for 
property and bioactivity modelling of small molecules,” J. Cheminformatics, vol. 7, 
Aug. 2015. 
[156] C. W. Yap, “PaDEL-descriptor: an open source software to calculate molecular 
descriptors and fingerprints,” J. Comput. Chem., vol. 32, no. 7, pp. 1466–1474, May 
2011. 
[157] M. Kuhn and K. Johnson, Applied Predictive Modeling. Springer, 2013. 
[158] M. Kuhn, The caret Package. 
[159] D. Krstajic, L. J. Buturovic, D. E. Leahy, and S. Thomas, “Cross-validation pitfalls 
when selecting and assessing regression and classification models,” J. 
Cheminformatics, vol. 6, no. 1, p. 10, Mar. 2014. 
[160] “R: The R Stats Package.” [Online]. Available: https://stat.ethz.ch/R-manual/R-
devel/library/stats/html/00Index.html. [Accessed: 29-Jun-2017]. 
[161] D. Stumpfe, H. E. A. Ahmed, I. Vogt, and J. Bajorath, “Methods for computer-
aided chemical biology. Part 1: Design of a benchmark system for the evaluation of 
compound selectivity,” Chem. Biol. Drug Des., vol. 70, no. 3, pp. 182–194, Sep. 
2007. 
[162] S. J. Eglen, “A Quick Guide to Teaching R Programming to Computational 
Biology Students,” PLoS Comput. Biol., vol. 5, no. 8, Aug. 2009. 
[163] D. Rogers and M. Hahn, “Extended-connectivity fingerprints,” J. Chem. Inf. 
Model., vol. 50, no. 5, pp. 742–754, May 2010. 
[164] I. Dubchak, I. Muchnik, S. R. Holbrook, and S. H. Kim, “Prediction of protein 
folding class using global description of amino acid sequence.,” Proc. Natl. Acad. 
Sci. U. S. A., vol. 92, no. 19, pp. 8700–8704, Sep. 1995. 
[165] I. T. Jolliffe and J. Cadima, “Principal component analysis: a review and recent 
developments,” Philos. Transact. A Math. Phys. Eng. Sci., vol. 374, no. 2065, p. 
20150202, Apr. 2016. 
[166] M. E. Kutcher, A. R. Ferguson, and M. J. Cohen, “A principal component analysis 
of coagulation after trauma,” J. Trauma Acute Care Surg., vol. 74, no. 5, pp. 1223–
1230, May 2013. 
[167] K. Y. Yeung and W. L. Ruzzo, “Principal component analysis for clustering gene 
expression data,” Bioinforma. Oxf. Engl., vol. 17, no. 9, pp. 763–774, Sep. 2001. 
[168] “FactoMineR: Exploratory Multivariate Data Analysis with R.” [Online]. 
Available: http://factominer.free.fr/. [Accessed: 18-Mar-2017]. 
[169] D. Zhang et al., “A Genetic Algorithm Based Support Vector Machine Model for 
Blood-Brain Barrier Penetration Prediction,” BioMed Res. Int., vol. 2015, 2015. 
[170] L. Y. Han et al., “A support vector machines approach for virtual screening of 
active compounds of single and multiple mechanisms from large libraries at an 
improved hit-rate and enrichment factor,” J. Mol. Graph. Model., vol. 26, no. 8, pp. 
1276–1286, Jun. 2008. 
[171] R. N. Jorissen and M. K. Gilson, “Virtual screening of molecular databases using a 
support vector machine,” J. Chem. Inf. Model., vol. 45, no. 3, pp. 549–561, Jun. 
2005. 
[172] “The Nature of Statistical Learning Theory | Vladimir Vapnik | Springer.” [Online]. 
Available: http://www.springer.com/gp/book/9780387987804. [Accessed: 11-Jul-
2017]. 
[173] “scikit-learn: machine learning in Python — scikit-learn 0.18.1 documentation.” 
[Online]. Available: 
http://webcache.googleusercontent.com/search?q=cache:http://scikit-
learn.org/&gws_rd=cr&ei=WJvTWL64GojOgAboy7moDw. [Accessed: 23-Mar-
2017]. 
[174] “Support Vector Machines for Classification and Regression.” [Online]. Available: 
https://www.researchgate.net/publication/37535445_Support_Vector_Machines_for
_Classification_and_Regression. [Accessed: 17-Jun-2017]. 
[175] “ERRAT.” [Online]. Available: http://services.mbi.ucla.edu/ERRAT/. [Accessed: 
27-Jul-2017]. 
[176] “Verify_3D.” [Online]. Available: http://services.mbi.ucla.edu/Verify_3D/. 
[Accessed: 27-Jul-2017]. 
[177] “Drug resistance in nematodes of veterinary importance: A status report,” 
ResearchGate. [Online]. Available: 
https://www.researchgate.net/publication/8352055_Drug_resistance_in_nematodes_
of_veterinary_importance_A_status_report. [Accessed: 29-Jul-2017]. 
[178] S. Geerts and B. Gryseels, “Drug resistance in human helminths: current situation 
and lessons from livestock,” Clin. Microbiol. Rev., vol. 13, 2000. 
[179] J. Vercruysse et al., “Is anthelmintic resistance a concern for the control of human 
soil-transmitted helminths?,” Int. J. Parasitol. Drugs Drug Resist., vol. 1, no. 1, pp. 
14–27, Dec. 2011. 
[180] B. S. Kalra, “Cytochrome P450 enzyme isoforms and their therapeutic 
implications: an update,” Indian J. Med. Sci., vol. 61, no. 2, pp. 102–116, Feb. 2007. 
[181] C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, “Experimental and 
computational approaches to estimate solubility and permeability in drug discovery 
and development settings,” Adv. Drug Deliv. Rev., vol. 46, no. 1–3, pp. 3–26, Mar. 
2001. 
[182] A. B. Raies and V. B. Bajic, “In silico toxicology: computational methods for the 
prediction of chemical toxicity,” Wiley Interdiscip. Rev. Comput. Mol. Sci., vol. 6, 
no. 2, pp. 147–172, Mar. 2016. 
[183] S. Solaymani-Mohammadi, J. M. Genkinger, C. A. Loffredo, and S. M. Singer, “A 
Meta-analysis of the Effectiveness of Albendazole Compared with Metronidazole as 
Treatments for Infections with Giardia duodenalis,” PLoS Negl. Trop. Dis., vol. 4, 
no. 5, p. e682, May 2010. 
[184] S. Egieyeh, J. Syce, A. Christoffels, and S. F. Malan, “Exploration of Scaffolds 
from Natural Products with Antiplasmodial Activities, Currently Registered 
Antimalarial Drugs and Public Malarial Screen Data,” Mol. Basel Switz., vol. 21, 
no. 1, p. 104, Jan. 2016. 
[185] M. Pascolutti, M. Campitelli, B. Nguyen, N. Pham, A.-D. Gorse, and R. J. Quinn, 
“Capturing Nature’s Diversity,” PLOS ONE, vol. 10, no. 4, p. e0120942, Apr. 2015. 
[186] Q. U. Ain, O. Méndez-Lucio, I. Cortés Ciriano, T. Malliavin, G. J. P. van Westen, 
and A. Bender, “Modelling ligand selectivity of serine proteases using integrative 
proteochemometric approaches improves model performance and allows the multi-
target dependent interpretation of features,” Integr. Biol., vol. 6, no. 11, pp. 1023–
1033, 2014. 
[187] M. Lapins et al., “A Unified Proteochemometric Model for Prediction of Inhibition 
of Cytochrome P450 Isoforms,” PLoS ONE, vol. 8, no. 6, Jun. 2013. 
[188] D.-S. Cao et al., “Genome-Scale Screening of Drug-Target Associations Relevant 
to Ki Using a Chemogenomics Approach,” PLOS ONE, vol. 8, no. 4, p. e57680, 
Apr. 2013. 
[189] M. Fernandez, S. Ahmad, and A. Sarai, “Proteochemometric recognition of stable 
kinase inhibition complexes using topological autocorrelation and support vector 
machines,” J. Chem. Inf. Model., vol. 50, no. 6, pp. 1179–1188, Jun. 2010. 
[190] P. J. Turner, XMGRACE, Version 5.1.19. Beaverton, OR: Center for Coastal and 
Land-Margin Research, Oregon Graduate Institute of Science and Technology, 2005. 
[191] P. J. Hotez, J. Bethony, M. E. Bottazzi, S. Brooker, and P. Buss, “Hookworm: ‘The 
Great Infection of Mankind,’” PLOS Med., vol. 2, no. 3, p. e67, Mar. 2005. 
 
 
 
 
APPENDICES 
APPENDIX I. REPOSITORY OF SUPPORTING FILES 
All Python scripts related to this research have been deposited in a GitHub repository, 
available at https://github.com/odam23/Hookworm-Drug-Discovery.git. The repository 
holds the FASTA sequence of the tubulin used to build the homology model, the Python 
scripts used to build it, and the resulting homology model in PDB format. The shell scripts 
used to perform molecular docking with Vina are stored alongside the docking results and 
the resulting protein-ligand complexes in PDB format. The dataset and all the PCM model 
scripts described in the PCM predictive server section are stored in the “PCM” folder, 
together with the related Python script and instructions for installing the required 
libraries. The SVM model used to predict a new set of data resides in the “model” 
subfolder of the PCM directory within the repository. 
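
As a rough illustration of the kind of batch docking that the Vina scripts carry out, the sketch below assembles one AutoDock Vina command line per ligand. It is not taken from the repository: the function name, the file names, the output folder and the grid-box centre and size are all hypothetical placeholders. Only the command-line options themselves (--receptor, --ligand, --out, --center_x/y/z, --size_x/y/z, --exhaustiveness) are standard Vina options.

```python
from pathlib import Path


def build_vina_command(receptor, ligand, out_dir, center, size, exhaustiveness=8):
    """Return the argument list for docking one ligand with AutoDock Vina.

    All arguments are illustrative; none are values used in the thesis.
    """
    ligand = Path(ligand)
    # Name the docked poses after the ligand file, e.g. X.pdbqt -> X_out.pdbqt.
    out_file = Path(out_dir) / f"{ligand.stem}_out.pdbqt"
    cmd = ["vina", "--receptor", str(receptor), "--ligand", str(ligand),
           "--out", str(out_file)]
    # Grid-box centre and dimensions (in angstroms) along x, y and z.
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    cmd += ["--exhaustiveness", str(exhaustiveness)]
    return cmd


if __name__ == "__main__":
    # Hypothetical ligand files and grid box, for demonstration only.
    ligands = ["ligands/ZINC14760755.pdbqt", "ligands/ZINC95485927.pdbqt"]
    for lig in ligands:
        cmd = build_vina_command("tubulin_model.pdbqt", lig, "docked",
                                 center=(0.0, 0.0, 0.0), size=(20.0, 20.0, 20.0))
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine where the vina executable is installed. Building the arguments as a list rather than as one shell string avoids quoting problems with unusual file names.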
 
 
 
 
 
 
 
APPENDIX II 
PyMOL visualisation of protein-ligand complexes for the natural products, 
Dichapetalin A and albendazole 

[Figure, panels A to L: PyMOL renderings of the docked protein-ligand complexes] 

Complexes of docked compounds from the first set of the screened virtual library: 
ZINC14760755 (A), ZINC95485927 (B), ZINC95486082 (C), ZINC95486263 (D), 
ZINC14780716 (E), ZINC95485922 (F), ZINC95486052 (G), ZINC95485928 (H), 
ZINC13480348 (I), ZINC28462577 (J), Dichapetalin A (K) and albendazole (L). 

APPENDIX III 
Interaction profile of the protein-ligand complexes using LIGPLOT 

[Figure: LIGPLOT interaction diagrams] 
Binding interactions of ZINC95485927, ZINC95486082, ZINC95486263, ZINC14780716, 
ZINC95485922 and ZINC95486052, respectively 

[Figure: LIGPLOT interaction diagrams] 
Binding interactions of ZINC95485928, ZINC13480348, robustaflavone and 
tetrahydrorobustaflavone, respectively 

[Figure: LIGPLOT interaction diagrams] 
Binding interactions of anchinopeptolide_A and tetrahydrorobustaflavone 

[Figure: LIGPLOT interaction diagrams] 
Binding interactions of campesterol and tetrahydrorobustaflavone 

[Figure: LIGPLOT interaction diagrams] 
Binding interactions of orthidine_A and tetrahydrorobustaflavone 

[Figure: LIGPLOT interaction diagrams] 
Binding interactions of euphohelionon and tetrahydrorobustaflavone 

APPENDIX IV 
Chemical names or formulae of the top 20 compounds from AfroDB, 
Dichapetalin A and albendazole 

Ligand          Chemical name or formula 
ZINC14760755    3-[(2E)-3,7-dimethylocta-2,6-dienoxy]-1,8-dihydroxy-6-methyl-10H-anthracen-9-one 
ZINC95485927    [(3S,4aR,6aR,6bS,8aR,12aS,14aR,14bR)-4,4,6a,6b,8a,11,11,14b-octamethyl-1,2,3,4a,5,6,7,8,9,10,12,12a, 
ZINC95486082    (2S)-2-[2,2-dimethyl-8-(3-methylbut-2-enyl)chroman-6-yl]-7-hydroxy-chroman-4-one 
ZINC95486263    2-[4-[5-(5,7-dihydroxy-4-oxo-chromen-2-yl)-2-hydroxy-phenoxy]-3-hydroxy-phenyl]-5,7-dihydroxy-chrome 
ZINC14780716    Stipulin 
ZINC95485922    2-[(2E)-3,7-dimethylocta-2,6-dienyl]-1,3,5,8-tetrahydroxy-4-(3-methylbut-2-enyl)xanthen-9-one 
ZINC95486052    (2S)-2-[2,2-dimethyl-8-(3-methylbut-2-enyl)chroman-6-yl]-5,7-dihydroxy-chroman-4-one 
ZINC95485928    10-[(2Z)-3,7-dimethylocta-2,6-dienyl]-5,9,11-trihydroxy-3,3-dimethyl-pyrano[3,2-a]xanthen-12-one 
ZINC13480348    [(2R)-7-[(2E)-3,7-dimethylocta-2,6-dienoxy]-5,10-dihydroxy-2-methyl-4-oxo-1,3-dihydroanthracen-2-yl] 
ZINC28462577    DNC006449 
ZINC95486072    heptamethylBLAHdione 
ZINC95486073    hydroxy(heptamethyl)BLAHone 
ZINC95486081    (2S)-7-hydroxy-2-[(2R,3S)-2-hydroxy-3-(3-methylbut-2-enyl)chroman-6-yl]chroman-4-one 
ZINC33833639    (4aS,6aS,6aS,6bR,8aR,10S,12aR,14bS)-10-hydroxy-4a-(hydroxymethyl)-2,2,6a,6b,9,9,12a-heptamethyl-3,4, 
ZINC95485992    (E)-1-[2,4-dihydroxy-5-[(3S)-3-hydroxy-4-methyl-pent-4-enyl]phenyl]-3-[4-hydroxy-3-[(E)-3-methylpent 
ZINC95486074    heptamethylBLAHdiol 
ZINC95486075    (3S,4aR,6aR,6bS,8R,8aS,12aS,14aS,14bR)-8a-(hydroxymethyl)-4,4,6a,6b,11,11,14b-heptamethyl-1,2,3,4a,5 
ZINC13365959    3-[(1S)-1-(1H-indol-6-yl)-3-methyl-but-2-enyl]-6-(3-methylbut-2-enyl)-1H-indole 
ZINC13485435    Erybraedin C 
ZINC15120680    DNC014426 
Dichapetalin A  C38H48O5 
Albendazole     (5-(propylthio)-1H-benzimidazol-2-yl)carbamic acid methyl ester 