Received: 26 November 2022 | Revised: 2 January 2023 | Accepted: 5 January 2023
DOI: 10.1002/jid.3751

REVIEW ARTICLE

A review of machine learning and satellite imagery for poverty prediction: Implications for development research and applications

Ola Hall1 | Francis Dompae2 | Ibrahim Wahab1 | Fred Mawunyo Dzanku2

1 Department of Human Geography, Lund University, Lund, Sweden
2 Institute of Statistical, Social and Economic Research, University of Ghana, Legon, Ghana

Correspondence: Ola Hall, Department of Human Geography, Lund University, Lund, Sweden. Email: ola.hall@keg.lu.se

Funding information: Swedish Research Council, Grant/Award Number: 2019-04253; Riksbankens Jubileumsfond, Grant/Award Number: MXM19-1104:1

Abstract
The field of artificial intelligence is seeing the increased application of satellite imagery to analyse poverty in its various manifestations. This nascent but rapidly growing intersection of scholarship holds the potential to help us better understand poverty by leveraging big data and recent advances in machine vision. In this study, we statistically analyse the literature in the expanding field of welfare and poverty predictions from the combination of machine learning and satellite imagery. Here, we apply an integrative review method to extract key data on factors related to the predictive power of welfare. We found that the most important factors correlated to the predictive power of welfare are the number of pre-processing steps employed, the number of datasets used, the type of welfare indicator targeted and the choice of AI model. Studies that used stock measure indicators (assets) as targets achieved better performance (17 percentage points higher) in predicting welfare than those that targeted flow measures (income and consumption).
Additionally, we found that the combination of machine learning and deep learning significantly increases predictive power, by as much as 15 percentage points, compared to using either alone. Surprisingly, we found that the spatial resolution of the satellite imagery used is important but not critical to the performance, as the relationship is positive but not statistically significant. These findings have important implications for future research in this domain and for anyone aspiring to use the methodology.

KEYWORDS
deep learning, machine learning, poverty analysis, satellite imagery, welfare

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2023 The Authors. Journal of International Development published by John Wiley & Sons Ltd.
J. Int. Dev. 2023;1–16. wileyonlinelibrary.com/journal/jid

1 | INTRODUCTION

There is a need to measure the welfare of people and nations. The world's population should be counted, measured, weighed and evaluated (Jerven, 2017). Behind these rather raw statements hides a more humanistic perspective in which measuring and counting people supports humanitarian and developmental efforts: targeting, mapping and monitoring people at risk of food insecurity, famine, poverty and disease (McBride et al., 2021). The effort is supported by the 17 Sustainable Development Goals, along with their 169 targets, that were adopted by member states of the United Nations as part of the 2030 Agenda (United Nations, 2015). To balance the agenda's economic, social and environmental aspects, more timely, reliable and appropriate ways of collecting and interpreting information on a broad range of human development outcomes are needed (Head et al., 2017).
Traditional approaches such as household surveys, often rich in detail but infrequent in time and space, especially in the poor regions of the world, have long served the development community with data and are now augmented with new methods and new types of data (Burke et al., 2021). The new approaches are predominantly digital in nature, for example, using cell phone data, harvesting social media and other Internet data, crowd-sourced data, and imagery including Google Street View and other forms of remotely sensed imagery, for measuring welfare and poverty and monitoring progress towards the attainment of the Sustainable Development Goals (SDGs). Many are computationally intensive and often qualify as ‘big data’. For close to a decade now, the most interesting approach in this area of research has been the combination of satellite imagery (SI) with different machine learning (ML) algorithms, including deep learning (DL), for the estimation of human outcomes. When put to work, wealth and poverty can be estimated from a single satellite image almost as well as from surveys. A recent review of 12 studies indicates that the methodology can predict, for example, the Demographic and Health Survey (DHS) welfare asset index with R² between 0.45 and 0.80 (Burke et al., 2021).

This study focuses on studies at the intersection between ML/DL/TL (transfer learning), SI and poverty analysis. The SIML (Satellite Image Machine Learning) methodology combines some of the recent achievements from computer science and object recognition research and applies them to the field of human development research. At a conceptual level, the SIML approach shares many similarities with well-known applications such as learning different object categories from imagery, for example, distinguishing dogs from cats in photographs. For several reasons, it is more complicated to train algorithms to estimate, for example, poverty from imagery.
One reason is that labelled training data are less abundant for human development targets than for the everyday objects typically found in image databases such as ImageNet, on which networks such as AlexNet were trained. Another is that poverty and welfare, and how to measure them, are strongly contested issues (Gibson, 2016), a fact that is not yet adequately reflected in recent works on SIML. Still, this is a rapidly growing area of scholarship.

Like in most other fields of study, context matters in this domain. Some data types perform better than others in different contexts. For example, in poor and extremely poor regions, night-time light (NTL) satellite data have been shown to underperform compared to daytime SI, as NTL intensity does not vary significantly in such regions (Yeh et al., 2020). This is particularly true for rural regions where the presence of economic activity, including large farms, does not necessarily mean increasing intensity or even the presence of illumination. At the fundamental level, for a dataset to be useful in this endeavour, there must be sufficient variability in the input variable at different welfare levels, and daytime imagery tends to provide that.

The conceptualization and measurement of poverty is a complex endeavour. The success of the SIML approach in this domain is influenced by the kind of indicator or measure of poverty that is being targeted. In this regard, it is important to note the different forms and manifestations of poverty or welfare so that in measuring or predicting it, both the so-called ‘soft’ measures (income, nutrition, food consumption, literacy rates, etc.)
and the ‘hard’ measures (mainly physical assets) are taken into consideration or at least acknowledged. Machine vision would be expected to yield higher accuracy in poverty prediction with the latter than with the former. That is, physical indicators of welfare such as roofing quality and type, infrastructure and farm sizes can be detected and classified from high-resolution SI more effectively than the quantity of meat consumed by a household in the last week. This constrains the applicability of the SIML approach to types of poverty that are place-based and have a physical manifestation, rather than those measures that are transient in character.

The present study introduces this paradigm of SI/ML/poverty analysis to a wider audience in the humanities, the social sciences and the development community. For anyone interested in this topic, the available literature is difficult to navigate and filled with discipline-specific notation. The aim of this study is to quantitatively synthesize the findings and methods in the field, from the perspective of a potential user.

1.1 | Shifting the frontiers of poverty analysis

A major strand of poverty geography studies distributional characteristics, that is, the identification of poor areas and impoverished populations (Zhou & Liu, 2022). Spatial identification of poverty is important to the extent that it can reveal the spatial heterogeneity and geographical character of poverty and thus aid the prioritization of poverty alleviation efforts and resource allocation activities (Erenstein et al., 2010). This is critical given the relative and multidimensional character of poverty. The current standard approaches to quantifying welfare, based on face-to-face household interviews, can deliver detailed estimates of poverty, gender roles, the experience of hunger and many other important indicators of poverty. Such surveys on poverty often measure incomes, consumption, expenditure or assets.
However, surveys are expensive, time-consuming, error-prone and difficult to scale up beyond the community or site level without substantial financial investment. Two of the more well-known survey programmes are the World Bank's flagship household Living Standards Measurement Survey and USAID's DHS. With their roots in the 1980s, they have evolved substantially in terms of coverage, methods and technology. For example, GPS receivers are nowadays part of the standard equipment when visiting sample villages. While unprecedentedly rich in indicators, most large-scale surveys, such as these two, tend to suffer from long lags between surveys, limited spatial coverage and high aggregation levels (nation or region), which impede effective monitoring and evaluation of poverty and welfare indicators at the village and household levels where they are most needed (Burke et al., 2021). Again, given the multidimensional character of poverty, poverty indices are often constructed from these surveys, as a single variable would not suffice as an indicator of poverty. A major determinant in the construction of a poverty index is the kind of data available (Zhou & Liu, 2022).

Increasingly, the data requirements, as well as approaches to the measurement of poverty, are becoming more sophisticated as the scale of measurement shifts from national and regional levels to the district, town, household and even individual levels. Work in recent decades has explored poverty estimation using remote techniques (Blumenstock, 2016). New forms of data are aiding this transition towards finer resolutions, though some have met with limited success. Applications based on NTLs, such as Keola et al. (2015), were a step forward.
While initial efforts in this domain relied on NTL data, given the strong correlation between nightlight luminosity and traditional measures of economic growth, the approach has seen limited application at finer resolutions and in the poorest of regions (Blumenstock, 2016; Jean et al., 2016). Other major approaches include the use of high-resolution daytime satellite data (Head et al., 2017; Jean et al., 2016), mobile phone metadata (Aiken, Bedoya, et al., 2021; Blumenstock et al., 2015), internet search history and social media activities (Choi & Varian, 2012; Fatehkia et al., 2020; Llorente et al., 2015) and various combinations of these data sources (Pokhriyal & Jacques, 2017; Steele et al., 2017). These applications have been possible largely because of the proliferation of big data as well as newer methods in ML to process them.

1.2 | Application of artificial intelligence to poverty analysis

The introduction of ML, especially DL algorithms, has been instrumental in the application of these new ‘big data’ sources in poverty studies. An array of ML approaches exists, but the most utilized in this domain of research could be classified as feature extraction algorithms or feature selection ones. Feature extraction methods include Principal Component Analysis, Linear Discriminant Analysis and Singular Value Decomposition. The most popular feature extraction algorithms include Neural Networks, Random Forests, Gradient Boosting, Decision Trees, Naïve Bayes, Gaussian models, Support Vector Machines and Linear and Logistic Regressions.
These algorithms can also be categorized into unsupervised (using information that is not labelled to guide the learning of patterns, similarities and differences); supervised (using labelled data to train the machine to produce correct outcomes); and semi-supervised (working with both labelled and unlabelled data so that the former is used to train the algorithm while the latter is used to make predictions). In this domain of research, the outcome could be whether specific locations, areas, households or individuals are poor or not, as well as an estimate of the degree of deprivation. The choice, utility and propriety of each technique are largely informed by the structure of the dataset and the task at hand.

While NTLs, by themselves, have proven less accurate at differentiating between the poor and the ultra-poor, especially in less-developed regions (Yeh et al., 2020), they have proven useful when used in conjunction with daytime SI, which tends to have higher spatial resolutions. Jean et al. (2016), for instance, successfully trained a Convolutional Neural Network (CNN), a DL algorithm, on detail-rich daytime SI in which paved roads and metal roofs are visible. In doing so, they developed a technique that shows the relationship between features from daytime SI and night-time imagery; in the latter, lighted areas are indicative of economic activity. Through this approach, the authors were able to predict indicators of poverty at the regional level.

With regard to mobile phone metadata, there is a strong association between mobile phone use and regional distribution of wealth. Eagle et al. (2010), for instance, found that network diversity alone accounted for more than three quarters of the variance in a region's economic status in the United Kingdom. The application of this approach is not limited to developed countries though, given the increasingly high mobile penetration rates even in developing countries. Blumenstock et al.
(2015), for instance, constructed a composite wealth index using principal components of various wealth indicators gleaned from the 2007 and 2010 DHS for Rwanda as well as a phone survey and call detail records (CDR) such as logs of calls and text messages. Through this, the authors demonstrate that a mobile phone subscriber's wealth status can be inferred from their historical phone use pattern, with cross-validated correlation coefficients of 0.68. The authors accomplish this through a combination of feature engineering and feature selection to transform phone users' transaction logs into metrics that are then winnowed through various dimension reduction techniques. The authors also demonstrate how alternative supervised learning models, including decision tree-based regressors and classifiers, can produce comparable results. In addition to call and text messaging history, other key CDR features include top-up patterns, handset type and user mobility between and among cell towers (Steele et al., 2017). Indeed, merely owning a mobile device is indicative of a certain level of welfare.

Exponential growth in computing power and more effective and efficient learning algorithms are providing conducive conditions for combining disparate data sources to predict and estimate poverty and welfare. Pokhriyal and Jacques (2017), for example, employed a Bayesian Gaussian Process regression on CDR, SI and environmental data to accurately predict poverty even at the individual level in Senegal. They demonstrate superior prediction accuracy when using such disparate data sources compared to using single datasets. Similarly, Steele et al. (2017) employed multiple data sources (CDR, poverty data and remote sensing covariates such as NTLs, vegetation indices and metrics on distances to roads or urban centres) to predict poverty levels in Bangladesh using hierarchical Bayesian Geostatistical Models. It is important to note that CDRs are usually not easily accessible from mobile network operators due to commercial rights and privacy concerns. Even with scanty CDR, enough insights can be gleaned when such data are combined with other data sources. Njuguna and McSharry (2017), for instance, demonstrate how such sparse CDR can be combined with per capita mobile handset ownership and call volume per handset, normalized NTL from SI, and population density to estimate a multidimensional poverty level in Rwanda, with a cross-validated correlation coefficient of 0.88. The accuracy of these approaches is not greatly reduced when they are applied to multiple countries either. For example, using a variety of datasets including the DHS malnutrition and asset poverty data, remotely sensed solar-induced chlorophyll fluorescence, and precipitation and conflict data, Browne et al. (2021) demonstrate how random forest models can be used to estimate malnutrition and poverty prevalence across 11 countries: Bangladesh, Ethiopia, Ghana, Guatemala, Honduras, Kenya, Mali, Nepal, Nigeria, Senegal and Uganda.

Here, we hypothesize that while the combination of methods and data sources can improve the accuracy of prediction of welfare, performance reduces as the application is spread over multiple countries and regions. This may be due mainly to the multidimensional nature of poverty and the different ways in which poverty spatially manifests.
Thus, while it is useful to assess how accurately human development can be predicted, it is equally important to evaluate the performance of the different approaches and datasets.

While the performance of models for poverty analysis in this area of scholarship is reported to be generally improving (Burke et al., 2021), certain factors play a role in the predictive performance. Some important indicators can be measured with higher success than others. Model performance may, thus, be influenced by the type, spatial resolution and nature of the data used in the modelling. Poverty data, for example, are usually sourced from variables based on income, consumption and/or assets. Asset-based variables have been shown to be predicted more accurately than consumption- and income-based variables (Jean et al., 2016). This is partly because consumption and income levels tend to vary more significantly within shorter periods of time, as they relate more directly to harvest outcomes, job losses and gains, and even household size (Steele et al., 2017). Assets such as ownership of mobile phones, tractors and the type and quality of roofing of buildings are more durable and tend to vary less often over time. Even more importantly, some poverty indicators, such as assets (the so-called hard indicators), are potentially more easily discernible in high-resolution SI than other indicators such as consumption and income.

Furthermore, despite the increasing access to big data, higher resolution datasets are usually more difficult or expensive to access and computationally more taxing to process. Some studies employ pre-processing steps such as pan-sharpening to improve the spatial resolution of satellite data. For example, NTLs, which tend to have a lower spatial resolution of about 1 km, can be fused with higher resolution daytime SI to produce a higher resolution composite dataset.
Still, ML algorithms have been used to infer individual subscribers' socioeconomic status directly from their individual phone use habits (highest resolution) and then aggregate such predictions to town, district and regional levels (lower resolutions) (Blumenstock et al., 2015). We hypothesize that certain types of assets and features can more effectively be measured in higher resolution imagery than others. This will mean the spatial resolution of the dataset that is used for the analysis becomes important for model performance. Given the trade-off between increasing spatial resolution and accessibility of data, a clearer understanding of the cut-off point at which ML algorithms most accurately predict welfare is most relevant for the further development of the SIML approach.

For the wider user community (social science researchers and development practitioners) concerned with tackling poverty and inequality and improving welfare, understanding which ML approaches best predict welfare, how accurately, and at what spatial resolution this can most optimally be done are fundamental questions. Shedding light on these, based on the state of the art in this domain of scholarship, is not only important for future methodological development of the field but also has long-term policy implications. It can, for instance, help to understand why there are currently scanty downstream applications of this approach in development (Burke et al., 2021), with a few recent exceptions (Aiken, Bedoya, et al., 2021; Aiken, Bellue, et al., 2021; Blumenstock et al., 2021).
2 | METHODOLOGY

2.1 | Inclusion criteria

Our review method can be described as integrative rather than systematic (Snyder, 2019). This body of knowledge, which mixes preprints, working papers, technical reports, peer-reviewed papers and conference papers with contributions from various disciplines, is notoriously difficult to capture with one single approach. We used Xie et al. (2015), one of the first publications to examine the application of ML and SI for measuring poverty and economic well-being, as the benchmark and narrative for our analysis. On this basis, papers completed prior to 2014 and papers that do not apply ML to study socioeconomic wellbeing from SI were excluded from the study. We include literature from published journal articles and grey literature such as working papers and validation studies that have clear empirical application; that is, we excluded reviews of literature. We did, however, limit inclusion by the year of publication or, for grey literature, the year in which the draft was completed. For study design, we included any study that sufficiently describes the application of AI, ML and DL on SI. On selection criteria based on population and geographical location, we had no restrictions, meaning that studies from high-, middle- and low-income countries were eligible for inclusion. For thematic focus, we included studies that explicitly describe or propose either conventional or new ways of measuring the welfare or poverty levels of populations, or proxies for doing so, within social science disciplines.

We gathered papers from multiple sources using different search words, phrases and topics related to the subject of the study. We focused on the use of SI or satellite data and the prediction of socioeconomic welfare indicators within the timeframe specified earlier. Since our interest was in both peer-reviewed papers and grey literature, we did not restrict our search to any specific search engine. We accessed the peer-reviewed papers, however, through Google Scholar and ScienceDirect.
Our final database from this search comprised 60 papers drawn from peer-reviewed journal articles, preprints, conference presentations, working papers and other grey literature.

2.2 | Data preparation, regression model specification and description of variables

2.2.1 | Data preparation

Some studies report multiple results for the same metric used in analysing the target outcome, making it difficult to record results for such studies as a single variable. The multiple results may be estimated and reported for different datasets, models, study locations, years or a combination of these. To identify the actual performance of models, where separate results are presented for training and validation datasets (e.g., Hofer et al., 2020), we recorded the results for the latter. For results of different models, we recorded the results of the model with the highest precision (e.g., Mahabir et al., 2020). Where separate results are reported for satellite and ground truth data, the former is used (e.g., Bruederle & Hodler, 2018). Where several models are run to observe control effects, the results of the full model are used (e.g., Bruederle & Hodler, 2018).

A number of studies looked at several target outcome variables. Where studies report multiple indicators, we captured the main target welfare outcome of interest specified or inferred by the authors. Unless otherwise specified, we report the composite target where there is one among the targets of a study. These are usually related to indices of poverty, inequality and related measures or proxies of welfare. In the absence of such explicit targets, we capture data on the target closest or related to the indices of economic wellbeing. The primary targets include asset wealth index, poverty rates, socioeconomic status and slum mapping. Other studies target socioeconomic indicators (which were rarely captured in our study because of our focus on the main indicators), including access to electricity, NTL, access to water, access to a toilet, educational attainment, monetary income, body mass index and population (see, e.g., Lee et al., 2021; Steele et al., 2017; Tingzon et al., 2019). These are used as proxies for measuring poverty and inequality and yet are not readily observable from daytime SI.

TABLE 1  Description and summary statistics of study variables

Variable                                                   Min     Max     Mean       Std. Dev.
Proportion of variability explained by models              0.36    0.96    0.75       0.16
Spatial resolution of satellite imagery (metres)           0.05    1000    198.82     2.81
Number of preprocessing methods                            0       6       2.25       1.61
Number of datasets used                                    1       8       3.67       1.68
Target welfare indicator studied (0 if soft, 1 if hard)    0       1       28.33 (%)  -
Proportion using Deep Learning (DL)                        0       1       0.38       -
Proportion using Machine Learning (ML)                     0       1       0.50       -
Proportion using both DL and ML                            0       1       0.12       -
Number of countries studied                                1       57      5.90       11.45
Year of study or publication                               2014    2021    2018       1.81

Note: N = 60.

We define the target as the outcome variable, that is, what each paper tries to estimate or predict. The target indicators are measured at different levels: individual, household, neighbourhood, village, enumeration area, and so forth. Some authors used multiple indicators for measuring their poverty/welfare outcome variable. In such cases, we extracted the main target outcome reported in such papers for the purpose of our regression model. These were referenced in the title of the study, reported in the abstract or in the main text of the papers.
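The recording rules described in this subsection can be sketched in code. The following is a minimal illustration, not the authors' procedure: the record fields (`score`, `split`, `source`, `full_model`) are hypothetical names, the order in which the rules are applied is an assumption, and "highest precision" is read here as the highest reported score.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReportedResult:
    """One performance figure reported by a paper (field names are hypothetical)."""
    score: float       # e.g. R-squared, expressed as a proportion between 0 and 1
    split: str         # "training" or "validation"
    source: str        # "satellite" or "ground_truth"
    full_model: bool   # True if the model includes all controls


def select_result(results: List[ReportedResult]) -> Optional[ReportedResult]:
    """Reduce a paper's reported results to the single one that is recorded."""
    candidates = list(results)
    # Assumed order of preference; a rule narrows the pool only if
    # at least one remaining result satisfies it.
    rules = [
        lambda r: r.split == "validation",   # validation over training results
        lambda r: r.source == "satellite",   # satellite over ground truth results
        lambda r: r.full_model,              # full model over partial models
    ]
    for rule in rules:
        kept = [r for r in candidates if rule(r)]
        if kept:
            candidates = kept
    # Among the remaining results, record the best-performing model.
    return max(candidates, key=lambda r: r.score, default=None)
```

Applying each rule only when it leaves at least one candidate keeps a paper in the sample even when it reports, say, no validation results at all.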
2.2.2 | Regression model specification and description of variables

What determines the predictive power of the various SIML approaches employed for studying welfare and poverty? Answering this question helps shed light on the relevance for human development and the current methodological complexities and data requirements for predicting welfare using methods other than ground truthing and surveys. We answer the above question using regression analysis. Our dependent variable is the explained proportion of variability in the welfare variables measured using the SIML methods applied by the 60 papers. With our dependent variable measured as a proportion, our interest is in the conditional expectation of the share of variability explained by paper $i$, $y_i$, conditional on a vector of covariates, $x_i$. We therefore specify the following fractional probit model:

$$E(y_i \mid x_i) = \alpha + \beta' x_i + \varepsilon_i,$$

where $0 < y_i < 1$, $\alpha$ is the intercept, $\beta$ is the coefficient vector associated with the explanatory variables described in Table A1 and $\varepsilon_i$ is the random error term. The vector $x_i$ has seven variables, which are as follows.

Spatial resolution
The first is the spatial resolution of the satellite images. The papers reported satellite spatial resolutions in centimetres, metres and kilometres. A majority (about 67%) of the resolutions were reported in metres, and as a result, we converted all resolutions to metres.
We expect the predictive power of a study to be positively correlated with spatial resolution.¹

Number of pre-processing methods
The second variable in the vector $x_i$ is the number of pre-processing methods used by the various publications. Interpreting and preparing satellite images for analytical purposes, as well as linking them to ground truth data, requires additional data transformation steps, which are referred to as pre-processing. Depending on the number and complexity of SI involved, one or more methods of pre-processing the data may be required. Therefore, we expect the proportion of variability explained by the papers to increase with the number of pre-processing methods used.

Number of datasets used
The vector $x_i$ also contains the number of datasets used by each paper for predicting welfare. While some of the studies use SI and related datasets, others use a combination of SI methods and ground truth data. We expect the power of the predictions to increase as more datasets are used for training.

Target welfare indicator
Different welfare indicators were used in the sample of published papers included in this study. These included household-level measures such as poverty and inequality indices, mobile phone use and expenditure; community and neighbourhood-level indicators such as slum mapping, infrastructure quality and areas lit at night in square kilometres; and city and country-level indicators such as economic development, gross domestic product and employment rates. Given the sample size and the concentration of indicators, we constructed a binary variable that took on the value one for ‘hard’ welfare indicators (e.g., physical assets such as type of roof, rail tracks and bridges) and zero for ‘soft’ welfare indicators (e.g., income, expenditure and quantity of proteins consumed).
We expect that SI and related methods used by the published papers would more accurately predict ‘hard’ indicators than ‘soft’ indicators; that is, we expect this variable to be positively correlated with the predictive power of the models estimated by the papers.

Type of method applied
Fifth, the vector $x_i$ captures methods of Artificial Intelligence (AI). The 60 papers apply two AI methods, DL and/or ML. Our hypothesis is that the type of method matters for the predictive power of the models presented in the papers. Therefore, we constructed three dummy variables indicating whether the study applied (a) DL only, (b) ML only or (c) a combination of DL and ML. We expect that combining both approaches would increase the predictive power of the models over and above using either one of the approaches.

Number of countries
Penultimately, $x_i$ also contains a binary covariate that captures whether the datasets used for the studies covered one country or more than one. We expect an inverse relationship between the number of countries included and the performance of the model, as spatial poverty tends to manifest differently in various countries.

Year study was published
Lastly, our regression model includes year of publication as a covariate. As knowledge improves and better approaches and methods become available, one could expect prediction accuracy to rise. Thus, we expect prediction power to increase with time, which is why we included the year of publication, expecting a positive and significant association with the proportion of variability explained by the models.

¹ We inverted the spatial resolution variable so that high values would intuitively be interpreted as better resolution.
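The covariate construction described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Study` record and all of its field names are hypothetical, the reciprocal is only one possible way to invert spatial resolution (the footnote does not say which transformation was used), and the dummy codings follow the row labels of Table 2 (DL as the reference method, ‘hard’ coded as one, cross-country coded as one).

```python
from dataclasses import dataclass

# Conversion factors to metres for the three units the papers report.
UNIT_TO_METRES = {"cm": 0.01, "m": 1.0, "km": 1000.0}


@dataclass
class Study:
    """One reviewed paper; every field name here is hypothetical."""
    r2: float             # share of welfare variability explained, 0 < r2 < 1
    resolution: float     # spatial resolution as reported by the paper
    unit: str             # "cm", "m" or "km"
    n_preprocessing: int  # number of pre-processing methods
    n_datasets: int       # number of datasets used
    hard_target: bool     # True for asset-type ('hard') indicators
    method: str           # "DL", "ML" or "ML+DL"
    n_countries: int      # number of countries covered
    year: int             # year of publication


def covariates(s: Study) -> dict:
    """Build the dependent variable and the covariates for one study."""
    if not 0.0 < s.r2 < 1.0:
        raise ValueError("dependent variable must lie strictly between 0 and 1")
    metres = s.resolution * UNIT_TO_METRES[s.unit]
    return {
        "y": s.r2,
        # Assumption: reciprocal inversion, so larger values mean finer imagery.
        "inv_resolution": 1.0 / metres,
        "n_preprocessing": s.n_preprocessing,
        "n_datasets": s.n_datasets,
        "hard_target": int(s.hard_target),
        # DL only is the omitted reference category.
        "ml_only": int(s.method == "ML"),
        "ml_and_dl": int(s.method == "ML+DL"),
        "cross_country": int(s.n_countries > 1),
        "year": s.year,
    }


# Hypothetical study: 50 cm imagery, ML and DL combined, a single country.
example = Study(r2=0.79, resolution=50, unit="cm", n_preprocessing=2,
                n_datasets=4, hard_target=True, method="ML+DL",
                n_countries=1, year=2020)
row = covariates(example)
```

Leaving out a separate DL-only dummy avoids perfect collinearity with the intercept, which is why only two of the three method indicators enter the covariate row.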
Table 1 shows the summary statistics of the dependent variable and explanatory variables. The predictive power of the models in the 60 articles included in our analysis ranged between 36% and 96%, with a mean of 75% and a median of about 79%. Spatial resolution of satellite images reported in the papers ranges from about 5 cm to 1000 m; the mean and median were about 199 and 3.8 m, respectively, meaning that the resolutions reported are highly skewed to the right. In fact, if we use 5 m/pixel as the benchmark for high resolution, we observe that about 53% of the papers report using high-resolution imagery. Some studies reported no pre-processing; the mean number of pre-processing methods was about two. On average, the studies used nearly four different datasets for their analysis, and less than half of the studies (about 28%) targeted ‘hard’ welfare indicators. On the type of AI method applied, about half of the studies reported using ML, 38% used DL and 12% used both. About 58% of the papers were based on country-specific studies, but the mean number of countries covered per study was approximately six. The 60 papers considered in this article were published between 2014 and 2021.

3 | REGRESSION RESULTS

The regression results are presented in Table 2, and the key insights are as follows. First, the mean spatial resolution of SI has a positive but statistically insignificant effect on the predictive power of the published welfare analyses. This is a surprising result because our a priori assumption was that imagery with a higher spatial resolution would be significantly associated with a better prediction of welfare.
After examining several functional forms of the variable, the aggregate results from the pool of studies in our sample do not show a significant effect.

Second, the number of preprocessing methods used has a significant positive effect on predictive power at the 0.10 level. An additional method of preprocessing is associated with an increase of almost two percentage points in the proportion of variation in welfare explained by the papers included in this article.

TABLE 2  Determinants of welfare prediction performance

| Variables | (1) Coefficient | (2) Average marginal effects |
|---|---|---|
| Inverse spatial resolution of satellite images | 0.008 (0.011) | 0.002 (0.003) |
| Number of preprocessing methods | 0.052* (0.030) | 0.016* (0.009) |
| Number of datasets used | 0.125*** (0.032) | 0.038*** (0.010) |
| Hard vs. soft target welfare indicator | 0.607*** (0.130) | 0.167*** (0.033) |
| AI method (reference is deep learning) | | |
| Machine learning | 0.212* (0.122) | 0.064* (0.036) |
| Machine learning + deep learning | 0.534*** (0.142) | 0.138*** (0.032) |
| Cross-country vs. country-specific studies | −0.036 (0.103) | −0.011 (0.031) |
| Year of publication | 0.015 (0.034) | 0.004 (0.010) |
| Intercept | −30.071 (69.005) | |
| Observations | 60 | |
| Pseudo R-squared | 0.050 | |
| Model Chi-squared | 63.587 | |
| p value for model test | 0.000 | |

Note: Robust standard errors in parentheses. Abbreviation: AME, average marginal effect. *p < 0.1. **p < 0.05. ***p < 0.01.

Third, the number of datasets used by a study has a positive and highly significant (p < 0.01) correlation with the predictive power of the welfare estimates.
This typically means that studies that use a combination of ground truth and satellite data have higher welfare prediction power. Specifically, an additional dataset is associated with an increase of about four percentage points in the predictive power of the welfare estimate.

Fourth, the type of welfare indicator targeted by SI and the AI methods matters. As could be expected, studies targeting ‘hard’ welfare indicators have higher predictive power than those targeting ‘soft’ indicators, and the difference in predictive power is significant at the 1% level. Compared with ‘soft’ indicators, targeting ‘hard’ indicators is associated with about 17 percentage points higher predictive power; this is a large difference.

Fifth, we find that the choice of tool for a study (whether ML, DL or a combination of the two) matters for the predictive power of the welfare models. Using ML increases predictive power by about seven percentage points compared with using DL, but the effect is statistically significant only at the 0.10 level; using a combination of the two increases predictive power by 15 percentage points relative to using DL, and this effect is significant at the 1% level. Similarly, Figure 1 shows that using both ML and DL increases the proportion of explained variation in welfare indicators by about eight percentage points above what could be realized using ML alone (p value = 0.022). This means that combining DL and ML should be preferred to using either of them as a single tool in predicting welfare.

Sixth, although we observe, as expected, that the number of countries included in a study is negatively associated with the predictive power of the welfare estimates, the effect is not statistically significant at conventional levels. One possible reason for this is the lack of standardization of welfare measurements and indicators across countries.
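Table 2 reports both coefficients and average marginal effects (AMEs), and the percentage-point statements above are read from the AME column. The sketch below shows how the two columns relate, assuming the standard fractional probit conditional mean Φ(α + β′x), where Φ is the standard normal distribution function. For a continuous covariate the AME is the coefficient times the sample mean of the normal density evaluated at the index; for a 0/1 covariate it is the mean change in the predicted share when the dummy switches from zero to one. The intercept and the three data rows are hypothetical, so the printed numbers illustrate the calculation and do not reproduce Table 2.

```python
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def linear_index(intercept, coefs, row):
    """Return alpha + beta'x for one observation (row maps name -> value)."""
    return intercept + sum(coefs[name] * row[name] for name in coefs)


def ame_continuous(intercept, coefs, rows, name):
    """AME of a continuous covariate: beta_k times the mean normal density."""
    return coefs[name] * fmean(
        STD_NORMAL.pdf(linear_index(intercept, coefs, r)) for r in rows
    )


def ame_dummy(intercept, coefs, rows, name):
    """AME of a 0/1 covariate: mean change in predicted share for 0 -> 1."""
    diffs = []
    for r in rows:
        on = STD_NORMAL.cdf(linear_index(intercept, coefs, {**r, name: 1}))
        off = STD_NORMAL.cdf(linear_index(intercept, coefs, {**r, name: 0}))
        diffs.append(on - off)
    return fmean(diffs)


# Two coefficients taken from Table 2; intercept and data are hypothetical.
intercept = 0.2
coefs = {"n_datasets": 0.125, "hard_target": 0.607}
rows = [
    {"n_datasets": 2, "hard_target": 0},
    {"n_datasets": 4, "hard_target": 1},
    {"n_datasets": 6, "hard_target": 0},
]

print(round(ame_continuous(intercept, coefs, rows, "n_datasets"), 3))
print(round(ame_dummy(intercept, coefs, rows, "hard_target"), 3))
```

Because the normal density never exceeds about 0.399, the AME of a continuous covariate is always smaller in magnitude than 0.4 times its coefficient, which is consistent with the gap between the two columns of Table 2.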
Finally, although we expected a priori that the predictive power of the welfare estimates would have improved over time, we find no evidence of a statistically significant effect even though the expected positive sign is observed.

FIGURE 1  Difference in average marginal effects (AME) of AI tools on predictive power. Note: The line caps represent 95% confidence intervals.

4 | DISCUSSIONS

The present paper set out to analyse the state of the art in applying a cluster of ML and DL tools to SI to predict poverty or welfare. Key findings from this nascent but rapidly growing field suggest the following. First, our finding that the relationship between the mean spatial resolution of the individual studies and their predictive power is not statistically significant was quite surprising. Conventional wisdom holds that higher resolution SI would contain more abundant information about the landscape and its features that could be correlated with economic activity (Jean et al., 2016). It then stands to reason that training datasets based on such higher resolution imagery would produce more accurate predictions and models with higher predictive power (Engstrom et al., 2022; Head et al., 2017). Our result suggesting a positive but statistically insignificant relationship between spatial resolution and accuracy has important implications. It suggests, for instance, that previous poor results were due to factors other than the unavailability of higher spatial resolution satellite data per se.
For researchers, this implies that going forward, additional resources need not be expended to acquire higher resolution imagery, which is often only commercially available at high cost (Ayush et al., 2020), and that publicly available SI would suffice in most cases.

While we do not find any evidence of a statistically significant increase in prediction performance over time, Burke et al. (2021) find the contrary. It must be noted, however, that they assessed the performance of these approaches within the specific domains of smallholder agriculture, economic livelihoods, population and informal settlements. They also attribute the improving performance they measure to three main factors: more creative application of advances in computer vision, more abundant and higher quality SI, and more numerous and accurate training datasets. The last of these accords with our findings, which suggest that the number of datasets used has a positive and statistically significant association with prediction performance. In this vein, the increasing proliferation of more accurate and higher quality training datasets bodes well for this field of scholarship.

Most studies in this area previously relied more heavily on NTL datasets with coarse spatial resolutions (1 km/pixel) for estimating the level of welfare or development. The cluster of ML approaches recently applied in this intersection has been shown to improve significantly on the predictions achievable using NTL. For example, one of the most important findings from Yeh et al. (2020) was that NTLs tend to perform relatively poorly compared to daytime imagery in predicting asset wealth, largely because the former does not vary sufficiently in poor regions. The review also notes the limited downstream application, which it attributes, in part, to the novelty of the approaches and their lack of interpretability.
With regard to the latter, explainable AI is the next rung in the ladder of applying ML to everyday social development issues such as poverty analysis. This requires transparency in model building (Hall et al., 2022). We argue that a necessary, even if insufficient, condition for the development of transparent, explainable and interpretable rather than black box (Rudin, 2019) ML models is adequate domain knowledge, which, in turn, requires co-option of development researchers and practitioners.

An important consideration in the use of SI is the type and number of pre-processing operations that are employed to format images before they are fed to the models for training. In the reviewed papers, the main pre-processing operations include radiometric correction, rotation and flipping, channel re-scaling, normalization, cropping and pan-sharpening. The most-used pre-processing step, pan-sharpening, entails enhancing the lower spatial resolution of multispectral band images by combining them with higher resolution panchromatic images (Hofer et al., 2020). That it is the most employed pre-processing operation is unsurprising given the conventional view that higher spatial resolution datasets invariably translate into better training data and more accurate predictions. However, if it turns out that the relationship between the spatial resolution of the SI and accuracy of the result is positive but statistically insignificant, then pan-sharpening might become a redundant operation. This notwithstanding, other pre-processing steps would remain critical to ensure more accurate prediction of welfare. In the reviewed papers, radiance correction remains key for filtering out ephemeral light sources in DMSP data (Kim et al., 2016), especially in studies that rely on NTLs (Bruederle & Hodler, 2018; Rybnikova & Portnov, 2020; Zhao et al., 2019). Given how fundamental some of these pre-processing operations are, some SI datasets such as PlanetScope imagery are often radiometrically corrected before delivery to users (Warth et al., 2020), while for others, users need to implement the correction (Duque et al., 2017; Leonita et al., 2018).

Key takeaways from our results are that the total number of datasets used in training of the model, the nature of the target welfare indicator and the specific learning model contribute the most to explaining the level of welfare that can be predicted. With regard to the capability of the various learning models, our results suggest that a combination of more conventional ML models and those using DL approaches has the most predictive power in welfare estimation studies using SI data. A vast majority of the reviewed studies employed either ML or DL alone (24 and 30, respectively), with only 7 of them (Gram-Hansen et al., 2019; Hofer et al., 2020; Lee & Braithwaite, 2020; Li et al., 2019; Mboga et al., 2017; Puttanapong et al., 2022) combining both ML and DL in their analyses. These papers combine the approaches differently, though. While Li et al. (2019) compare their performance concurrently to determine which performs relatively better, Lee and Braithwaite (2020) combine them in an iterative manner. The latter use the ML algorithm of eXtreme Gradient Boosting (XGBoost) to estimate welfare levels for all populated places in 25 countries and then use this predicted welfare level to train the DL model of CNN. This was done to circumvent the need to use night-time luminosity.
Lee and Braithwaite (2020) then fed the featurized information that was the output from the CNN model back to the XGBoost model. They then apply transfer learning from the second iteration onwards to augment learning and speed up the process. Combining these different models in an iterative manner through transfer learning tends to generate better training data, which contribute to prediction accuracy (Burke et al., 2021; Head et al., 2017; Hofer et al., 2020).

Other ML models might also contribute to better results in this field. Long Short-Term Memory (LSTM) networks, a class in the recurrent neural networks (RNNs) family, have been shown to be capable of learning order dependence in sequence prediction problems. This quality makes them most useful in complex machine vision tasks such as predicting poverty from SI. However, thus far, LSTMs have seen limited application in this domain as none of our 60 reviewed papers employed this type of network; CNNs are the most common. Among our final 60 papers, 48% (29 of 60) of the studies employed some form of NN, with 21 of these studies using CNNs as the main model. The limited application of LSTM networks in studies at the intersection of poverty, SI and ML is not too surprising, though, as CNNs are predominantly useful for spatial predictions while RNNs such as LSTM networks are more effective for temporal predictions without suffering the optimization hurdles that tend to plague other RNNs (Greff et al., 2016). This, among other advantages, makes LSTM networks ideal candidate models for higher dimensional data analysis tasks such as handwriting recognition, and video and imagery analysis. More recently, LSTM networks are being used to further enhance the already impressive prediction results achieved by traditional NNs. For example, LSTM models have been applied in the field of infectious diseases to predict the spread of the coronavirus in Bangladesh (Absar et al., 2022).
Similarly, in an innovative approach of combining CNNs and LSTMs, Yang et al. (2022) used a hybrid approach to achieve even better results for predicting surface erosion rates at significantly reduced time and computational cost. This holds great promise for the poverty-satellite data-ML domain of research, and we look forward to more studies adopting this hybrid ML approach in this niche area of research.

Our finding of significantly higher predictive power of models that are based on visible features, the so-called ‘hard’ indicators, is instructive even if unsurprising, given that ‘soft’ indicators such as income levels, expenditure or the quantity of meat consumed by a household, for example, are more difficult to estimate from SI than the existence and size of buildings or quality of roofing in a scene. In this sense, these models may be grouped into feature-based algorithms (those that rely on quantifiable geospatial features such as the number of buildings, length of road and number of junctions) and image-based models (those that can recognize the qualitative characteristics of these features) (Lee & Braithwaite, 2020). The choice between these then comes down to the resolution of the satellite data available, since lower resolution SI tends to provide more information about the spatial context, such as whether the data are from a rural or urban landscape, while higher resolution SI is more useful for extracting the qualitative characteristics of the features (Kim et al., 2016).
This suggests that the increasingly accurate results obtained by studies in this area of scholarship could be driven more by a combination of improving spatial resolution of readily available SI data and, perhaps more importantly, the potency and effectiveness of new tools and approaches as well as the computational power to implement these, as Burke et al. (2021) contend. As an illustration, Lee et al. (2021) show the importance of ‘hard’ indicators such as infrastructure (rail tracks and bridges) as well as other physical features like vehicles, street lights and billboards as indicators of the existence of services and development, and by extension, welfare. It is little wonder that the recent trend is one of combining feature-based and image-based approaches in either a sequential or a complementary manner to predict welfare (Abitbol & Karsai, 2020; Lee & Braithwaite, 2020; Warth et al., 2020).

5 | CONCLUSIONS

The paper sets out to analyse the state of the art of studies that fall at the intersection of SI (operationalized broadly as all remotely sensed data), poverty analysis and the artificial intelligence tools of machine, deep and transfer learning. Studies in this nascent domain of scholarship are reported to be seeing consistently improving accuracy over the last couple of years, though our results suggest a weak but positive relationship between the spatial resolution of the SI in use and prediction accuracy. The strong explanatory power of models that are based on ‘hard’ indicators, which are, in turn, more accurate on higher resolution SI data, suggests that marked improvements in the tools and computational capabilities are instrumental in these continuous improvements in prediction accuracy in the last few years. It also points to the importance of intermediate data pre-processing steps, especially those related to improving SI resolution such as pan-sharpening.
Further progress in algorithms as well as less expensive access to more accurate and numerous training datasets bodes well for the application of welfare measurement using SI and ML tools.

With regard to the specific models and their efficacy in predicting poverty and welfare, we found that a combination of ML and DL algorithms has the best performance, compared to either individual group of models in its own right. This further supports our view that rather than the sheer improvements in the resolution of SI, it is the increasing efficacy of newer tools and models that could be spurring any improving results that we may be seeing in this area. Here, we note the recent advances in LSTM ML approaches (Absar et al., 2022; Rußwurm & Korner, 2017). This holds promise as LSTMs seem to be specifically well suited for keeping track of long-term dependencies in data, which resonates well with the multi-temporal characteristics of man-made landscapes.

While the application of the SIML approach continues to see improving model performance, more transparency is needed to achieve the next target: explainable AI. This is quite a daunting task given the multidimensional nature and place-based character of poverty. Thus, the differences in spatial manifestation of poverty and its markers in different locations, regions or countries, from an ML model perspective, further complicate this approach. It is mainly for this reason that the combination of multiple datasets or data sources enhances the performance of these models.

ACKNOWLEDGEMENTS
The authors would like to acknowledge financial support during the project from the Swedish Research Council 2019-04253 and Riksbankens Jubileumsfond MXM19-1104:1. We would also like to thank the reviewers for valuable input.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supporting information of this article.
ORCID Ola Hall https://orcid.org/0000-0002-9231-4028 Fred Mawunyo Dzanku https://orcid.org/0000-0002-2271-7876 10991328, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/jid.3751 by University of Ghana - Accra, Wiley Online Library on [18/04/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License 14 HALL et AL. REFERENCES Abitbol, J. L., & Karsai, M. (2020). Socioeconomic correlations of urban patterns inferred from aerial images: Interpreting activation maps of Convolutional Neural Networks. arXiv preprint arXiv:2004.04907. Absar, N., Uddin, N., Khandaker, M. U., & Ullah, H. (2022). The efficacy of deep learning based LSTM model in forecast- ing the outbreak of contagious diseases. Infectious Disease Modelling, 7(1), 170–183. https://doi.org/10.1016/j. idm.2021.12.005 Aiken, E., Bedoya, G., Blumenstock, J. E., & Coville, A. (2021). Program targeting with machine learning and mobile phone data: Evidence from an anti-poverty intervention in Afghanistan. Aiken, E., Bellue, S., Karlan, D., Udry, C. R., & Blumenstock, J. (2021). Machine learning and mobile phone data can improve the targeting of humanitarian assistance. Ayush, K., Uzkent, B., Tanmay, K., Burke, M., Lobell, D., & Ermon, S. (2020). Efficient poverty mapping using deep reinforce- ment learning. arXiv preprint arXiv:2006.04224. Blumenstock, J., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350(6264), 1073–1076. https://doi.org/10.1126/science.aac4420 Blumenstock, J., Karlan, D., & Udry, C. (2021). Using mobile phone and satellite data to target emergency cash transfers. CEGA Blog Post. poverty-action.org/study/using-mobile-phone-and-satellite-data-target-emergency-cash-transfers-togo Blumenstock, J. E. (2016). Fighting poverty with data. Science, 353(6301), 753–754. https://doi.org/10.1126/science. 
aah5217 Browne, C., Matteson, D. S., McBride, L., Hu, L., Liu, Y., Sun, Y., Wen, J., & Barrett, C. B. (2021). Multivariate random forest prediction of poverty and malnutrition prevalence. PLoS ONE, 16(9), e0255519. https://doi.org/10.1371/journal. pone.0255519 Bruederle, A., & Hodler, R. (2018). Nighttime lights as a proxy for human development at the local level. PLoS ONE, 13(9), e0202231. https://doi.org/10.1371/journal.pone.0202231 Burke, M., Driscoll, A., Lobell, D. B., & Ermon, S. (2021). Using satellite imagery to understand and promote sustainable devel- opment. Science, 371(6535), eabe8628. https://doi.org/10.1126/science.abe8628 Choi, H., & Varian, H. (2012). Predicting the present with Google Trends. Economic Record, 88(s1), 2–9. https://doi. org/10.1111/j.1475-4932.2012.00809.x Duque, J. C., Patino, J. E., & Betancourt, A. (2017). Exploring the potential of machine learning for automatic slum identifica- tion from VHR imagery. Remote Sensing, 9(9), 895. https://doi.org/10.3390/rs9090895 Eagle, N., Macy, M., & Claxton, R. (2010). Network Diversity And Economic Development. Science, 328(5981), 1029–1031. https://doi.org/10.1126/science.1186605 Engstrom, R., Hersh, J., & Newhouse, D. (2022). Poverty from space: Using high resolution satellite imagery for estimating economic well-being. The World Bank Economic Review, 36(2), 382–412. https://doi.org/10.1093/wber/lhab015 Erenstein, O., Hellin, J., & Chandna, P. (2010). Poverty mapping based on livelihood assets: A meso-level application in the Indo-Gangetic Plains, India. Applied Geography, 30(1), 112–125. https://doi.org/10.1016/j.apgeog.2009.05.001 Fatehkia, M., Tingzon, I., Orden, A., Sy, S., Sekara, V., Garcia-Herranz, M., & Weber, I. (2020). Mapping socioeconomic indicators using social media advertising data. EPJ Data Science, 9(1), 22. https://doi.org/10.1140/epjds/s13688-020-00235-w Gibson, J. (2016). 
“Poverty measurement: we know less than policy makers realize.” Asia & the Pacific Policy Studies, 3(3), 430–442. Gram-Hansen, B. J., Helber, P., Varatharajan, I., Azam, F., Coca-Castro, A., Kopackova, V., & Bilinski, P. (2019). Mapping infor- mal settlements in developing countries using machine learning and low resolution multi-spectral data. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2016). LSTM: A search space odyssey. IEEE Trans- actions on Neural Networks and Learning Systems, 28(10), 2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924 Hall, O., Ohlsson, M., & Rögnvaldsson, T. (2022). Satellite image and machine learning based knowledge extraction in the poverty and welfare domain. Patterns, 3(10), 1–15 https://doi.org/10.2139/ssrn.4102620 Head, A., Manguin, M., Tran, N., & Blumenstock, J. E. (2017). Can human development be measured with satellite imagery? Proceedings of the Ninth International Conference on Information and Communication Technologies and Develop- ment, Lahore, Pakistan. Hofer, M., Sako, T., Martinez, A. Jr., Addawe, M., Bulan, J., Durante, R. L., & Martillan, M. (2020). Applying Artificial Intelligence On Satellite Imagery To Compile Granular Poverty Statistics. In Asian development bank economics working paper series (Vol. Working Paper No 629). Asian Development Bank. Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894 Jerven, M. (2017). How much will a data revolution in development cost? Forum for Development Studies, 44(1), 31–50. https://doi.org/10.1080/08039410.2016.1260050 Keola, S., Andersson, M., & Hall, O. (2015). Monitoring economic development from space: using nighttime light and land cover data to measure economic growth. World Development, 66, 322–334. 
https://doi.org/10.1016/j.worlddev.2014.08.017 10991328, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/jid.3751 by University of Ghana - Accra, Wiley Online Library on [18/04/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License HALL et AL. 15 Kim, J. H., Xie, M., Jean, N., & Ermon, S. (2016). Incorporating spatial context and fine-grained detail from satellite imagery to predict poverty. Lee, J., Grosz, D., Uzkent, B., Zeng, S., Burke, M., Lobell, D., & Ermon, S. (2021). Predicting livelihood indicators from community-generated street-level imagery. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 268–276. https://doi.org/10.1609/aaai.v35i1.16101 Lee, K., & Braithwaite, J. (2020). High-resolution poverty maps in Sub-Saharan Africa. arXiv preprint arXiv:2009.00544. https://doi.org/10.48550/arXiv.2009.00544 Leonita, G., Kuffer, M., Sliuzas, R., & Persello, C. (2018). Machine learning-based slum mapping in support of slum upgrading programs: The case of Bandung City, Indonesia. Remote Sensing, 10(10), 1522. https://doi.org/10.3390/rs10101522 Li, G., Cai, Z., Liu, X., Liu, J., & Su, S. (2019). A comparison of machine learning approaches for identifying high-poverty counties: Robust features of DMSP/OLS night-time light imagery. International Journal of Remote Sensing, 40(15), 5716–5736. https://doi.org/10.1080/01431161.2019.1580820 Llorente, A., Garcia-Herranz, M., Cebrian, M., & Moro, E. (2015). Social media fingerprints of unemployment. PLoS ONE, 10(5), e0128692. https://doi.org/10.1371/journal.pone.0128692 Mahabir, R., Agouris, P., Stefanidis, A., Croitoru, A., & Crooks, A. T. (2020). Detecting and mapping slums using open data: A case study in Kenya. International Journal of Digital Earth, 13(6), 683–707. 
https://doi.org/10.1080/17538947.2018.1 554010 Mboga, N., Persello, C., Bergado, J. R., & Stein, A. (2017). Detection of informal settlements from VHR images using convolu- tional neural networks. Remote Sensing, 9(11), 1106. https://doi.org/10.3390/rs9111106 McBride, L., Barrett, C. B., Browne, C., Hu, L., Liu, Y., Matteson, D. S., Sun, Y., & Wen, J. (2021). Predicting poverty and malnu- trition for targeting, mapping, monitoring, and early warning. Applied Economic Perspectives and Policy, 44, 879–892. https://doi.org/10.1002/aepp.13175 Njuguna, C., & McSharry, P. (2017). Constructing spatiotemporal poverty indices from big data. Journal of Business Research, 70, 318–327. https://doi.org/10.1016/j.jbusres.2016.08.005 Pokhriyal, N., & Jacques, D. C. (2017). Combining disparate data sources for improved poverty prediction and mapping. Proceedings of the National Academy of Sciences, 114(46), E9783–E9792. https://doi.org/10.1073/ pnas.1700319114 Puttanapong, N., Martinez, A., Bulan, J. A. N., Addawe, M., Durante, R. L., & Martillan, M. (2022). Predicting poverty using geospatial data in Thailand. ISPRS International Journal of Geo-Information, 11(5), 293. https://doi.org/10.3390/ ijgi11050293 Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x Rußwurm, M., & Korner, M. (2017). Temporal vegetation modelling using long short-term memory networks for crop identifi- cation from medium-resolution multi-spectral satellite images. Proceedings of the IEEE conference on computer vision and pattern recognition workshops. Rybnikova, N., & Portnov, B. A. (2020). Testing the generality of economic activity models estimated by merging night-time satellite images with socioeconomic data. Advances in Space Research, 66(11), 2610–2620. https://doi.org/10.1016/j. asr.2020.09.003 Snyder, H. (2019). 
Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104, 333–339. https://doi.org/10.1016/j.jbusres.2019.07.039
Steele, J. E., Sundsøy, P. R., Pezzulo, C., Alegana, V. A., Bird, T. J., Blumenstock, J., Bjelland, J., Engø-Monsen, K., de Montjoye, Y.-A., & Iqbal, A. M. (2017). Mapping poverty using mobile phone and satellite data. Journal of the Royal Society Interface, 14(127), 20160690. https://doi.org/10.1098/rsif.2016.0690
Tingzon, I., Orden, A., Go, K., Sy, S., Sekara, V., Weber, I., Fatehkia, M., García-Herranz, M., & Kim, D. (2019). Mapping poverty in the Philippines using machine learning, satellite imagery, and crowd-sourced spatial information. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, XLII, XLII-4/W19, 425–431. https://doi.org/10.5194/isprs-archives-XLII-4-W19-425-2019
United Nations. (2015). Transforming our world: The 2030 agenda for sustainable development. (A/RES/70/1). New York: United Nations Organization. Retrieved from sustainabledevelopment.un.org/content/documents/21252030%20Agenda%20for%20Sustainable%20Development%20web.pdf
Warth, G., Braun, A., Assmann, O., Fleckenstein, K., & Hochschild, V. (2020). Prediction of socio-economic indicators for urban planning using VHR satellite imagery and spatial analysis. Remote Sensing, 12(11), 1730. https://doi.org/10.3390/rs12111730
Xie, M., Jean, N., Burke, M., Lobell, D., & Ermon, S. (2015). Transfer learning from deep features for remote sensing and poverty mapping. arXiv preprint arXiv:1510.00098.
Yang, S. D., Ali, Z. A., Kwon, H., & Wong, B. M.
(2022). Predicting complex erosion profiles in steam distribution headers with convolutional and recurrent neural networks. Industrial & Engineering Chemistry Research, 61(24), 8520–8529. https://doi.org/10.1021/acs.iecr.1c04712
Yeh, C., Perez, A., Driscoll, A., Azzari, G., Tang, Z., Lobell, D., Ermon, S., & Burke, M. (2020). Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nature Communications, 11(1), 2583. https://doi.org/10.1038/s41467-020-16185-w
Zhao, X., Yu, B., Liu, Y., Chen, Z., Li, Q., Wang, C., & Wu, J. (2019). Estimation of poverty using random forest regression with multi-source data: A case study in Bangladesh. Remote Sensing, 11(4), 375. https://doi.org/10.3390/rs11040375
Zhou, Y., & Liu, Y. (2022). The geography of poverty: Review and research prospects. Journal of Rural Studies, 93, 408–416. https://doi.org/10.1016/j.jrurstud.2019.01.008
How to cite this article: Hall, O., Dompae, F., Wahab, I., & Dzanku, F. M. (2023). A review of machine learning and satellite imagery for poverty prediction: Implications for development research and applications. Journal of International Development, 1–16. https://doi.org/10.1002/jid.3751