Earth Syst. Sci. Data, 14, 5387–5410, 2022 https://doi.org/10.5194/essd-14-5387-2022 © Author(s) 2022. This work is distributed under the Creative Commons Attribution 4.0 License. Location, biophysical and agronomic parameters for croplands in northern Ghana Jose Luis Gómez-Dans1,2, Philip Edward Lewis1,2, Feng Yin1,2, Kofi Asare3, Patrick Lamptey3, Kenneth Kobina Yedu Aidoo3, Dilys Sefakor MacCarthy4, Hongyuan Ma2, Qingling Wu2, Martin Addi3, Stephen Aboagye-Ntow3, Caroline Edinam Doe3, Rahaman Alhassan5, Isaac Kankam-Boadu5, Jianxi Huang6,7, and Xuecao Li6,7 1National Centre for Earth Observation, Leicester, UK 2Dept. of Geography, University College London, London, UK 3Remote Sensing, GIS & Climate Center, Ghana Space Science Technology Institute, Accra, Ghana 4Soil and Irrigation Research Centre, University of Ghana, Accra, Ghana 5ADRA Ghana, Tamale, Ghana 6Department of Geographical Information Engineering, China Agricultural University, Beijing, China 7Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing 100083, China Correspondence: Jose Luis Gómez-Dans (jgomezdans@gmail.com) Received: 25 July 2022 – Discussion started: 11 August 2022 Revised: 27 October 2022 – Accepted: 4 November 2022 – Published: 14 December 2022 Abstract. Smallholder agriculture is the bedrock of the food production system in sub-Saharan Africa. Yields in Africa are significantly below potentially attainable yields for a number of reasons, and they are particularly vulnerable to climate change impacts. 
Monitoring of these highly heterogeneous landscapes is needed to respond to farmer needs, develop appropriate policy and ensure food security, and Earth observation (EO) must be part of these efforts. However, there is a lack of ground data for developing and testing EO methods in western Africa. In this paper, we present data on (i) crop locations, (ii) biophysical parameters and (iii) crop yield and biomass, collected in 2020 and 2021 in Ghana. In 2020, crop type was surveyed in more than 1800 fields in three different agroecological zones across Ghana (the Guinea Savannah, Transition and Deciduous zones). In 2021, a smaller number of fields were surveyed in the Guinea Savannah zone, and additionally, repeated measurements of leaf area index (LAI) and leaf chlorophyll concentration were made on a set of 56 maize fields. Yield and biomass were also sampled at harvesting. LAI in the sampled fields ranged from 0.1 to 5.24 m2 m−2, whereas leaf chlorophyll concentration varied between 6.1 and 60.3 µg cm−2. Yield varied between 190 and 4580 kg ha−1, with an important within-field variability (average per-field standard deviation 381 kg ha−1). The data are used in this paper to (i) evaluate the Digital Earth Africa 2019 cropland masks, where 61 % of sampled 2020/21 cropland is flagged as cropland by the data set, (ii) develop and test an LAI retrieval method from Earth observation Planet surface reflectance data (validation correlation coefficient R = 0.49, root mean square error (RMSE) 0.44 m2 m−2), (iii) create a maize classification data set for Ghana for 2021 (overall accuracy within the region tested: 0.84), and (iv) explore the relationship between maximum LAI and crop yield using a linear model (correlation coefficient R = 0.66 and R = 0.53 for in situ and Planet-derived LAI, respectively).
The data set, made available here within the context of the Group on Earth Observations Global Agricultural Monitoring (GEOGLAM) initiative, is an important contribution to understanding crop evolution and distribution in smallholder farming systems and will be useful for researchers developing/validating methods to monitor these systems using Earth observation data. The data described in this paper are available from https://doi.org/10.5281/zenodo.6632083 (Gomez-Dans et al., 2022).

Published by Copernicus Publications.

5388 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana

1 Introduction

Agricultural production in sub-Saharan Africa is dominated by smallholder farms that support most households (Giller et al., 2021; Antonaci et al., 2014). In Ghana, agriculture contributes around 20 % of gross domestic product (GDP) and employs around half of the population (SRID, 2010). Maize accounts for more than half of the country’s cereal production (Ragasa et al., 2014), and in the north of the country, the crop is grown in rain-fed conditions, with low inputs (limited use of fertiliser, low uptake of modern/hybrid varieties, low mechanisation), suffering additionally from considerable post-harvesting losses and nutrient-poor soils (Freduah et al., 2019; MacCarthy et al., 2017; Sánchez, 2010; SRID, 2010). These factors result in an important yield gap compared to potential attainable yields (van Loon et al., 2019; Cairns et al., 2013). This yield gap is exacerbated in a climate change context, where agricultural production in Ghana is likely to be further limited by increased temperature and more erratic rain regimes (Sultan and Gaetani, 2016; Chemura et al., 2020), with complex relationships with nitrogen use (Falconnier et al., 2020) and other social factors (Nyantakyi-Frimpong and Bezner-Kerr, 2015) adding further vulnerability to yields in the region.

Timely monitoring of smallholder maize production is important for understanding the developing food security situation but also for providing information to food producers and other value creators. Decision-makers at regional or national levels need this for planning policy, import–export requirements or other advance planning or support mechanisms for farmers (Nations, 2013; Nakalembe et al., 2021). Monitoring capabilities are also important for developing crop insurance and maximising the economic potential (and hence livelihoods) of smallholder farmers (Benami et al., 2021). At more local scales, such information can be used by extension workers to assist farmers in improving their practice. Carletto et al. (2013) argue that a lack of good-quality agricultural data in Africa has hampered innovation and growth in this crucial economic sector.

In Ghana, 60 % of farms have an area of less than 1.2 ha (SRID, 2010), giving rise to a highly heterogeneous landscape. Some factors that explain this heterogeneity are also common with the yield gap: limited access to e.g. irrigation, mechanisation and fertiliser use, workforce scarcity, low labour productivity, or limited access to finance (van Loon et al., 2019). Local patterns of crop yield are known to be impacted by local soil and meteorological conditions as well as farmer choices. For example, Freduah et al. (2019) note that maize planting occurs over a 3-month period, with further crop development variation depending on the use of fertiliser and other management practices. This, alongside the varying quality of seed inputs, results in very different inter- and intra-field crop evolution even for crops subject to very similar weather patterns. This complicates both monitoring and modelling efforts.

There is a strong need for better information on cropland area, crop productivity and the factors that affect crop production in Africa. Earth observation (EO) has been shown to provide a practical data source for monitoring croplands over large areas for much of the world and contributing to the collection of better agronomic statistics. This in turn can be used by decision-makers, agronomists and other parties to better understand and plan actions to improve food security and alleviate poverty (e.g. Baruth et al., 2008; Carletto et al., 2015; Brown, 2016).

To monitor agriculture, the first layer of information maps the areas of croplands to distinguish them from other land uses, but even this basic information is currently very uncertain for large parts of Ghana. To be able to estimate total production or even average productivity in some region, an additional level of sophistication is needed on top of that, in which the crop type is identified. While there have been some recent advances in cropland masks derived from Earth observation for the region (Burton et al., 2022; Estes et al., 2022; Xiong et al., 2017), accurate, timely data sets that allow the location of individual crops over large areas are still mainly lacking.

The advent of frequent, medium- to high-resolution EO data over the last ∼ 5 years from sensors such as Sentinel 2 (Drusch et al., 2012) and the Planet constellation (Planet, 2018) allows repeated data acquisitions over the growing season that give the potential to infer the status of crops at the field and sub-field levels, even for smallholders. This is currently mainly done using empirical relationships between satellite-derived indicators (such as spectral vegetation indices) and in situ measurements of yield and/or above-ground biomass. Typical approaches relate either the maximum value or the time integral of the signal over the season to yield or biomass (Becker-Reshef et al., 2010; Kouadio et al., 2012; Unganai and Kogan, 1998; Mkhabela et al., 2011; Franch et al., 2015; Petersen, 2018). The EO data act as indicators of green leaf biomass or green-up or senescence rates.

There is a concerted effort to provide a minimal set of so-called “Essential Agricultural Variables” (EAVs) that are required to monitor agriculture globally (Whitcraft et al., 2019). Some of the EAVs are more directly related to the status of the crop, e.g. leaf area index (LAI), the fraction of photosynthetically active radiation absorbed by the canopy (fAPAR), soil moisture, above-ground biomass and leaf pigment concentrations such as chlorophyll concentration. Some of these biophysical parameters have been successfully derived from EO data (Verrelst et al., 2018). The derivation of biophysical parameters is more involved than calculating a vegetation index but simplifies the interpretation of EO data, reducing the effect of extrinsic/nuisance processes on the EO signal, such as soil colour or brightness variations, acquisition geometry effects, or different sensor spectral configurations. As the derivation is indirect, careful validation and assessment of uncertainties in the inference of these parameters are critical (Loew et al., 2017).

LAI is an important indicator of crop development, and its use for yield estimation has been shown to be superior to using simple indices (Baez-Gonzalez et al., 2005; Lambert et al., 2018). Although the relationship between surface reflectance and LAI is complex, it is possible to develop empirical mappings between LAI and vegetation indices, although these mappings may not be very general. Having estimates of LAI allows us to leverage mechanistic crop growth models to e.g. train the relationship between modelled LAI (as a function of meteorological data, typical management and soils) and EO observations (Jin et al., 2017, 2019; Azzari et al., 2017; Jain et al., 2016), or to use more sophisticated data assimilation methods, based on combining the uncertain evidence from the model predictions with the (incomplete) EO-derived observations of LAI (Huang et al., 2019).

Ultimately, all of these approaches rely on in situ data for method development and validation. The need for new data collection is particularly acute for smallholder croplands in sub-Saharan Africa, as most studies have concentrated on croplands and crops in the Global North (Pritchard et al., 2022). The contribution of this paper is to provide and describe a data set covering three main aspects: (i) location of crops; (ii) biophysical parameters over the growing season for maize; (iii) crop yield and biomass data. We demonstrate the application of the crop-type/location data to validate cropland data sets and to train a maize classifier using Sentinel 2 observations. This data set can be used to understand crop dynamics over three different regions. We describe a biophysical parameter and yield data set collected over a set of maize fields in northern Ghana between July and November 2021. An associated crop-type data set for Ghana is also introduced, covering the same area as the biophysical parameters for the 2021 season but with a wider-area mapping undertaken during 2020.

Estimating biophysical parameters from EO data is an indirect problem, and in situ data are needed to validate EO retrievals for particular environments. For croplands, tracking the vegetation over the entire growing season is particularly important. Here, we concentrate on LAI and leaf chlorophyll concentration as the parameters of interest. LAI was selected for its known relationship with yield and the use of LAI as a critical state variable in crop growth models. Leaf chlorophyll content has also been related to gross primary production (GPP) (Gitelson et al., 2006) and yield (Croft et al., 2019). Leaf chlorophyll has also been linked to the CO2-saturated photosynthetic rate (Vmax) (Wang et al., 2020), which would provide a linkage to crop models in addition to LAI as well as potential information on nitrogen stress. The biophysical parameter data set is enhanced by collecting additional data on grain yield and biomass, both fundamental for understanding food production and for allowing the development of models that link EO data to this.

Releasing the data set described in this paper is a contribution to data-sharing efforts championed by the Group on Earth Observations Global Agricultural Monitoring (GEOGLAM, https://www.earthobservations.org/geoglam.php, last access: 16 November 2022), an initiative launched by the G20 international forum in 2011 (Becker-Reshef et al., 2020). The crops in the fields surveyed in this data set are grown by smallholder farmers, represent a typical sample of the variability found in this region and provide a strong foundation for developing and assessing land cover or crop-type maps. The crop biophysical and agronomic parameters provide an important source of data to develop and adapt crop-monitoring methods to typical western African conditions as well as a useful source of data to validate the performance of crop growth models parameterised for maize in the region using typical fields.

The data are available from https://doi.org/10.5281/zenodo.6632083 (Gomez-Dans et al., 2022).

2 Materials and methods

2.1 Location

In this study we present a new data set of farm boundaries and biophysical parameter measurements in Ghana. The data set includes information on crop type, collected in an extensive campaign covering areas of 50 km by 50 km in three agroecological zones across the country in December 2020 (see Fig. 1 for locations) and an intensive maize-focused campaign in 2021 in the north of Ghana, specifically the Northern and Savannah zones (see Fig. 2).

In the intensive campaign, data were collected from the Tamale, Mion, Salaga North, Gushegu, Karaga and Nanton districts. The intensive study area covers around 55 km east to west and close to 70 km north to south. It is within the Guinea Savannah agroecological zone of the country. Rainfall patterns in the area are unimodal, with a rainy season starting in April/May and ending in September/October. Mean daily temperatures oscillate between 31 °C in the hottest month (February/March) and around 22 °C in the coolest month (August/September). Annual rainfall ranges between 900 and 1100 mm (see Fig. 3). During the rainy season, when most crops are grown, cloud cover is persistent, often making it difficult to acquire observations using optical EO sensors. Additional difficulties for remote sensing of crops in this area include the prevalence of trees within fields, inter-cropping practices, and often the presence of significant weed cover early in the season, all of which complicate the interpretation of EO signals (see Fig. 4 for an example of this).

Figure 1.
Locations of samples for crop-type mapping in the 2020 extensive campaign. (a) Transition zone and (b) Deciduous and (c) Northern Savannah sites. The top right sub-image shows the locations of these areas within Ghana.

Figure 2. Location of the crop-type mapping for the 2021 campaign. Red dots show agricultural fields. Green dots show locations where biophysical parameters were collected. The top right sub-image shows the location of the area within Ghana.

Figure 3. Monthly climatologies for temperature (a), precipitation (b) and shortwave downwelling radiation (c) over the Tamale area (derived from the ERA5/Land data set between 1990 and 2021) as well as daily temperatures, precipitation and downwelling radiation for 2021. The grey area indicates the biophysical parameter collection period.

Figure 4. Pictures of two maize fields within the study area, showing the crop heterogeneity and presence of weeds and trees within fields. (a) Field 7071ZIN (17 September 2021) in situ LAI: 1.1 m2 m−2, chlorophyll conc.: 39.3 µg cm−2. (b) Field 3075TAM (14 September 2021) LAI: 1.9 m2 m−2, chlorophyll conc.: 45.2 µg cm−2.

2.2 Crop-type mapping

Two crop-type mapping campaigns were conducted: a preliminary campaign in December 2020, covering three agroecological zones, and a maize-focused campaign in 2021. The 2020 campaign served partly as a training exercise for 2021 data collection but also to provide extensive sampling over different crops and cropping systems within Ghana. Three teams surveyed three ∼ 50 km × 50 km sites in the Guinea Savanna, Transition and Deciduous forest regions (see Fig. 1 for the locations) between 12 December and 21 December 2020. The site locations and main crops associated with each site were chosen based on previous knowledge
and in discussion with local agricultural extension workers. lected for cases where (i) a crop could be identified (the cam- Within each site, crop types were initially identified from a paign took place in December, when many fields had already drive-through “windshield survey” (Defourny et al., 2014), been harvested, so crop identification was sometimes based and GPS locations and photographs were simultaneously on crop residue inspection) and (ii) a single crop was present gathered using mobile phones inside individual fields. Crop (so plots with evidence of intercropping were discarded). The type was mapped at a point location within each field. Fields crop types to map were selected as being the most represen- that were less than 0.25 ha were ignored, following the rec- tative of the region, with the goal of collecting around 600 ommendations from Defourny et al. (2014). Fields were se- points for each region. The crop distribution per region is https://doi.org/10.5194/essd-14-5387-2022 Earth Syst. Sci. Data, 14, 5387–5410, 2022 5392 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana shown in Table 1. For the Deciduous, Savannah and Transi- Measurements were taken from 6 September 2021 to tion zones, 644, 630 and 660 fields, respectively, were sam- 5 November 2021. Measurement of LAI and leaf chlorophyll pled. content was performed weekly on the sample farms using an A second crop-type campaign was carried out in Au- Li-Cor LAI-2200-C device and a Minolta SPAD 500 device, gust 2021, more focused on identifying maize fields. It was respectively. The full field protocol for LAI and leaf chloro- conducted near Tamale (Guinea Savannah region) (Fig. 2), phyll is presented in Appendix A. with the primary aim of identifying maize fields to compare In each field visited, four locations were selected and to the data collected in 2020 and the secondary aim of scop- marked and measurements taken at these. 
For the leaf chloro- ing fields that could be used for the biophysical parameter phyll measurements, the fifth and sixth leaves (relative to the campaign. Fields were selected following the same consider- bottom of the canopy) of individual plants in each sampling ations as the 2020 campaign, but rather than points, polygons site were tagged on the first visit. Measurements were then of the field boundaries were collected. This required physical repeatedly taken on these leaves in subsequent visits. At the access to the fields, for which access permission was needed time of measurements, the general state of the crop and its from the owners. This resulted in a smaller number of fields phenology were also noted. (375) being surveyed compared to the 2020 campaign (Ta- ble 1). Field boundaries were collected by walking around 2.4 Crop yield and biomass measurements the field using a GPS tracker and taking care to exclude trees and buildings to ensure that the mapped field area was as ho- Crop yield was measured by crop cutting. For each farm, mogeneous as possible and measurements in that area would three 6 m× 6 m plots inside the field were harvested, cobs re- relate to the crop. Only minor post-collection editing was moved, and grain weighted. We report the quadrant yields as performed (e.g. closing polygons). well as the field-averaged mean yield and associated standard deviation. For a subset of 10 farms, above-ground biomass 2.3 Biophysical parameter collection was also sampled. The full protocol is given in Appendix B. The collection of biophysical parameters in 2021 focused on maize farms around Tamale. These farms were selected from 2.5 Satellite data maize fields mapped in 2021 (Sect. 2.2) after permission was Together with the ground data described above, we have obtained from the farmers to allow repeated visits and har- also produced a ready-processed data set of contemporane- vesting at the end of the growing season. 
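As a minimal sketch of the field-level summary described in Sect. 2.4, the snippet below converts the grain mass of each 6 m × 6 m crop-cut plot to kg ha−1 and reports the per-field mean and standard deviation. The grain masses are hypothetical values for illustration only, the function names are not part of the published data set, and the use of the sample (n − 1) standard deviation is an assumption, as the text does not state which estimator was used.

```python
from statistics import mean, stdev

PLOT_AREA_M2 = 6.0 * 6.0   # each crop-cut plot is 6 m x 6 m (Sect. 2.4)
M2_PER_HA = 10_000.0


def plot_yield_kg_ha(grain_kg):
    """Scale the grain mass harvested in one plot to kg per hectare."""
    return grain_kg * M2_PER_HA / PLOT_AREA_M2


def field_yield_summary(grain_kg_per_plot):
    """Per-plot yields plus field mean and sample standard deviation.

    Assumption: the sample (n - 1) estimator is used for the spread.
    """
    yields = [plot_yield_kg_ha(g) for g in grain_kg_per_plot]
    return {"plot_yields": yields, "mean": mean(yields), "std": stdev(yields)}


# Hypothetical grain masses (kg) for the three plots of one field.
summary = field_yield_summary([3.6, 5.4, 7.2])
print(summary)
```

Any moisture correction or other adjustment specified in the full protocol (Appendix B) is deliberately left out of this sketch.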
The field campaign ous satellite observations to facilitate training and experi- was delayed by the arrival of the measurement equipment in mentation. We will use the ground data to develop an em- Ghana and training requirements (Covid and related delays), pirical estimation of LAI. Figure 5 shows time series of so the fields that were selected were those that appeared to the field-averaged normalised difference vegetation index be less developed towards the start of the measurement cam- (NDVI) over four fields derived from Sentinel 2 and Planet. paign (August 2021). In the study area, maize is typically Sentinel 2 observations are restricted by clouds, whereas sown around June, so it is possible that the selected fields are Planet data are more frequent, particularly towards the end sown later than what is usual for this area. As for the crop of the growing season. When both sensors collect data, the mapping, the selected fields show no intercropping, and the NDVI value is comparable, although it is also clear that the presence of trees is limited to the edges and has been masked Planet data show a larger instability in time as well as the out. These decisions limit the selection of fields but provide presence of outliers. It would have been preferable to use a simpler setting to validate EO-derived products and to test Sentinel 2 observations to produce estimates of LAI as the the link between biophysical parameters and crop produc- data have a richer spectral information content, but given the tion. Heterogeneous fields required different measurement scarcity of matchups with the ground measurements, we de- strategies to characterise the nature of the crop combination, cided to use the Planet data and to develop a simple mapping and the presence of several canopy layers (e.g. crop–tree) is using a vegetation index as a pragmatic trade-off. 
not considered in most EO LAI products, so tree detection We have used the Planet Surface Reflectance (SR) version and masking using very high-resolution (VHR) data would 2 product (Planet, 2018) downloaded from Planet Explorer be needed to make any comparison fair. (https://www.planet.com/explorer/, last access: 16 Novem- An initial set of 56 maize fields was selected for both ber 2022). We calculate the NDVI as it is a commonly used biophysical parameters and yield characterisation. The field vegetation index that is frequently used to describe crop con- sizes ranged from 0.25 to 2.2 ha, with an average field size dition and yield (Turner et al., 1999; Smith et al., 2002; of 0.78 ha. From the initial 56 farms, 8 farms were dropped le Maire et al., 2004; Ferwerda and Skidmore, 2007; le Maire later in the season as the crop had been damaged by an- et al., 2008). The Planet SR product is derived from the top- imals, flooded or abandoned by the farmer and overgrown of-atmosphere (TOA) radiance images acquired by the Plan- with weeds. These fields are representative of typical maize etScope constellation which collects data in the red, green, fields grown in the Guinea Savannah region of Ghana. blue and near-infrared bands with a nominal resolution of Earth Syst. Sci. Data, 14, 5387–5410, 2022 https://doi.org/10.5194/essd-14-5387-2022 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana 5393 Table 1. Number of mapped fields per crop-type class for the 2020 and 2021 campaigns. 
2020 campaign 2021 campaign Deciduous Guinea Savannah Transition Guinea Savannah Crop class Number Crop class Number Crop class Number Crop class Number Oil palm 117 Maize 90 Maize 127 Maize 214 Maize 116 Soybean 87 Cassava 83 Groundnut 71 Cocoa 90 Sorghum 75 Cashew 78 Cassava 4 Cassava 74 Rice 74 Yam 60 Rice 50 Plantain 66 Groundnut 68 Plantain 54 Soybean 20 Orange 61 Pigeon pea 56 Cocoa 53 Cowpea 5 Rubber 46 Yams 56 Tomatoes 53 Yam 6 Rice 45 Millet 52 Mango 41 Millet 1 Coconut 10 Cassava 39 Garden eggs 33 Garden eggs 7 Cowpea 30 Pepper 26 Bush 12 Bush 3 Cabbage 23 Orange 18 Cowpea 11 ∼ 3.7 m. The SR product has a ground sampling distance of We have implemented the double-logistic fitting as a two- 3 m and a positional accuracy better than 10 m (Planet, 2018). stage process to account for both the large variability and The data are atmospherically corrected and have an associ- the limited observational opportunity at the early start of ated cloud, cloud shadow, etc., pixel mask (Planet, 2018). the growing season. As a first stage, the six parameters pi , The vast changes in acquisition geometry, sensor proper- i ∈ (0. . .5) in Eq. (1) are estimated for individual pixels over ties, failure of the cloud and cloud/shadow mask and incon- a field. Then, the per-field median values for the six param- sistencies in the atmospheric correction result in the measure- eters are calculated and used to define bounds in parameter ments from Planet being very noisy and contaminated with values for the second pass. In the second pass, the double lo- outliers, as is clear from Fig. 5 (see also Houborg and Mc- gistic is again fitted to the per-pixel NDVI time series, with Cabe, 2016). Outliers and gaps in the time series (particularly all parameters except p1 (the amplitude) being constrained at the start of the measurement period) require treatment: we to be within 5 % of the median field value. 
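The second pass of the two-stage fit in Sect. 2.5 can be illustrated with a short sketch. It assumes the double-logistic form of Eq. (1), NDVI(t) = p0 − p1 [1/(1 + exp(p2 (t − p3))) + 1/(1 + exp(−p4 (t − p5))) − 1], and it interprets "within 5 % of the median field value" as a relative tolerance around the per-field median. The function names and the example parameter values are hypothetical, and the curve fitting itself (for example with a bounded least-squares routine) is not shown.

```python
import math
from statistics import median


def double_logistic(t, p):
    """Double-logistic NDVI model, assuming the form of Eq. (1)."""
    p0, p1, p2, p3, p4, p5 = p
    green_up = 1.0 / (1.0 + math.exp(p2 * (t - p3)))
    senescence = 1.0 / (1.0 + math.exp(-p4 * (t - p5)))
    return p0 - p1 * (green_up + senescence - 1.0)


def second_pass_bounds(first_pass_params, tolerance=0.05, free_index=1):
    """Bounds for the second pass from per-pixel first-pass estimates.

    Every parameter except p1 (the amplitude, index 1) is limited to the
    per-field median +/- tolerance * |median|; p1 is left unbounded.
    """
    bounds = []
    for i in range(6):
        if i == free_index:
            bounds.append((-math.inf, math.inf))
            continue
        med = median(p[i] for p in first_pass_params)
        delta = tolerance * abs(med)
        bounds.append((med - delta, med + delta))
    return bounds


# Hypothetical first-pass estimates for three pixels of one field.
pixels = [
    (0.20, 0.5, 0.10, 200.0, 0.10, 300.0),
    (0.22, 0.6, 0.12, 210.0, 0.09, 310.0),
    (0.18, 0.4, 0.11, 190.0, 0.11, 290.0),
]
for name, b in zip(["p0", "p1", "p2", "p3", "p4", "p5"], second_pass_bounds(pixels)):
    print(name, b)
```

Leaving p1 unbounded is one way to express that the amplitude is the only parameter not tied to the field median; a fitting routine may still need finite limits in practice.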
In this way, we develop here a robust smoothing and interpolation approach ensure that the mapped timing information is spatially corre- that allows us to achieve the desired NDVI-to-LAI mapping lated at the field level and that the variation in the amplitude along with an estimate of LAI uncertainty. of the vegetation index will be greater. We use an efficient and robust smoothing filter with a bi- Although the outlier filtering method described above is square weighting to flag and remov∣e gro∣ss outliers in the based on smoothing and statistical tests, the spatially awarePlanet NDVI time series (Heiberger ∣and B∣ecker, 1992; Gar- field constraints and typical consistency in reconstructed VIcia, 2010). An outlier is flagged if ui4 685 ≥ 1, where ui is trajectories (Fig. 6c) over a field suggest that the outlier fil-. the studentised residual for sample i (Garcia, 2010). An ex- tering is appropriate and does not introduce large biases. The ample application of the smoother is shown in Fig. 6a, where processing described above results in more stable estimates the Sentinel 2 field-averaged NDVI is also shown for com- of NDVI over time, as can be seen in Fig. 6c, particularly parison. tightening up the temporal trajectory towards the start of the To further reduce the large remaining variability, we fit time series. a double-logistic function (Zhang et al., 2003; Beck et al., We use the interpolated and smoothed NDVI data to de- 2006; Atkinson et al., 2012; Yang et al., 2019) (Eq. 1) to velop the mapping to the LAI. The large uncertainty in the NDVI as a function of time t for each pixel, an effective way individual elemental sampling unit (ESU) LAI ground mea- to both reduce the noise (Hird and McDermid, 2009; Jönsson surements suggests that the model is fitted at field level. Po- and Eklundh, 200[2) and allow temporal interpolation. tential further issues with a mapping from NDVI to LAI aresaturation effects with high LAI (Baret and Guyot, 1991). 
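The field-level linear calibration of Sect. 2.5 (LAI = m · NDVI + c, fitted on repeated random 70 %/30 % splits) could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the ordinary least-squares fit, the function names, the random seed and the synthetic, noise-free data generated with arbitrary coefficients (m = 4.0, c = −1.0) are all hypothetical.

```python
import random
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least squares for y = m * x + c."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, my - m * mx


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def repeated_split_calibration(ndvi, lai, n_splits=20, train_frac=0.7, seed=0):
    """Fit LAI = m * NDVI + c on repeated random train/validation splits."""
    rng = random.Random(seed)
    idx = list(range(len(ndvi)))
    n_train = int(round(train_frac * len(idx)))
    slopes, intercepts, errors = [], [], []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        m, c = fit_line([ndvi[i] for i in train], [lai[i] for i in train])
        predicted = [m * ndvi[i] + c for i in valid]
        slopes.append(m)
        intercepts.append(c)
        errors.append(rmse([lai[i] for i in valid], predicted))
    return {
        "m": (mean(slopes), stdev(slopes)),          # mean and spread of the slope
        "c": (mean(intercepts), stdev(intercepts)),  # mean and spread of the intercept
        "rmse": mean(errors),                        # mean validation RMSE
    }


# Synthetic field-averaged values (hypothetical, noise-free).
ndvi = [0.30 + 0.02 * i for i in range(20)]
lai = [4.0 * v - 1.0 for v in ndvi]
result = repeated_split_calibration(ndvi, lai)
print(result)
```

The spread of m and c across splits gives the kind of initial parameter uncertainty the text refers to; with real, noisy field averages it would be non-zero.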
For 1 NDVI p p ] maize in the study area, a very high LAI is never achieved,= 0− 1 1+ exp(p2(t −p3)) and the field measurements never exceed an LAI of 3, so 1 we might suppose that saturation of the signal should not + − 1 (1) 1 exp( p (t p )) be a problem here. The limited range of the field LAI data+ − 4 − 5 also suggests that a linear model is an acceptable model https://doi.org/10.5194/essd-14-5387-2022 Earth Syst. Sci. Data, 14, 5387–5410, 2022 5394 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana choice. We estimate the value of NDVI on the day of the SWIR1 and SWIR2) are then smoothed/interpolated using a in situ observations from the smoothed/interpolated Planet robust Whittaker smoother (Eilers, 2003; Garcia, 2010), with data and average both the EO-estimated NDVI and the in a smoothing strength parameter of 0.5. The classifier used situ LAI over the field. We randomly split the data set into was a random forest with 100 trees, and in order to train the 70 % for training and 30 % validation. We fit the linear model classifier, the individual pixels underlying the field-surveyed LAI=m ·NDVI+ c to the training data and test its perfor- polygons were used. The pixels were split into two sets: 70 % mance on the validation samples. We repeat this fitting pro- and 30 % of the points were used for training and validation cedure using 20 random splits to avoid biases in the estimates (respectively). For the production of the final mask, all pixels of m and c and to provide an initial uncertainty in these pa- were used to train the classifier. rameters. 3 Results 2.6 Validation of cropland masks 3.1 Biophysical parameter measurements We use the collected data to partially validate the Digital Earth Africa cropland mask (Burton et al., 2022). This bi- Pictures from two typical fields are shown in Fig. 
4, which nary (crop/no-crop) mask has been developed for 2019, but show the clear row structure and the low plant density that there are plans to extend it to other years. Using crop masks was common to most fields. Time series of the evolution of from prior years is a pragmatic choice to monitor the same in situ measurements are shown in Figs. 7 and 8 for LAI and region in the current season (Becker-Reshef et al., 2018). We chlorophyll, respectively, for the sample fields. can use the crop location data to assess the accuracy of the LAI values ranged from 0.1 to 5.24 m 2 m−2, with a mean crop mask for other years and to test its suitability to enable value of 1.37. The 10th, 50th and 90th percentiles were within-season crop mapping. The Digital Earth Africa crop- 0.5281, 1.15 and 2.13 m 2 m−2, respectively. LAI values are land mask is based on a random forest (RF) classifier, but lower compared to other regions where irrigation and fer- as a binary mask of the cropland/non-cropland final prod- tilisation are common but are in line with other studies for uct, it provides an estimate of the probability of each 10 m the area (Srivastava et al., 2016; MacCarthy et al., 2015). In pixel being cropland. The final binary mask is derived from some of the fields the decrease in LAI from around its max- the pixel probability data set, and cropland is assumed if the imum is obvious (e.g. fields labelled 1029ZIN, 5034TUG, pixel probability is greater than 0.5. We can assess the quality 5036TUG and 1056ZIN). The pattern for other fields is not of the mask by testing the fraction of surveyed cropland loca- clear, with some having the LAI peak towards day of year tions (pixels) and their associated probability. A good mask (DoY) 275 (e.g. fields labelled 7021YAM, 7068ZIN and would be characterised by a large proportion of the visited 7069ZIN), whereas other fields show no clear dynamics. cropland pixels having a probability larger than 0.5. 
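The mask assessment described in Sect. 2.6 reduces to counting how many surveyed cropland locations have a mask probability above the 0.5 threshold. A minimal sketch is given below; the probabilities are hypothetical, they are assumed to be expressed on a 0 to 1 scale, and extracting them from the Digital Earth Africa product at the surveyed coordinates is not shown.

```python
def cropland_hit_fraction(probabilities, threshold=0.5):
    """Fraction of surveyed cropland pixels that the mask flags as cropland.

    A pixel counts as cropland when its probability is strictly greater
    than the threshold, following the rule stated in Sect. 2.6.
    """
    if not probabilities:
        raise ValueError("at least one surveyed location is required")
    flagged = sum(1 for p in probabilities if p > threshold)
    return flagged / len(probabilities)


# Hypothetical mask probabilities at five surveyed cropland points.
surveyed = [0.9, 0.7, 0.4, 0.55, 0.2]
print(cropland_hit_fraction(surveyed))
```

Because only surveyed cropland is used, this measures how much known cropland the mask recovers; it says nothing about non-cropland pixels wrongly flagged as cropland.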
Con- For leaf chlorophyll concentration, the values ranged −2 versely, a poor mask would show most visited pixels having between 6.1 and 60.3 µg cm , with a mean value of a probability lower than 0.5. 34.2 µg cm −2 and the 10th, 50th and 90th percentiles given by 15.71, 35.9 and 49.39 µg cm−2. The trends for leaf chlorophyll concentration are clearer 2.7 Crop mask classification than those of the LAI. Most fields show an expected decay of The data sets collected in 2020 and 2021 can be used to chlorophyll as the season progresses, with most of the differ- extend previous efforts to provide cropland/crop-type maps ences between fields relating to the timing and magnitude of (Xiong et al., 2017; Jolivot et al., 2021; Estes et al., 2022; the leaf chlorophyll reduction. Burton et al., 2022), but here we illustrate the use of the 2021 data set in developing a 10 m maize mask for the whole 3.2 Crop yield and biomass measurements Northern zone in Ghana using data from Sentinel 2. The clas- sification experiment is done using the Google Earth Engine The distributions of measured yield are shown in Fig. 9a, c. platform (Gorelick et al., 2017). The approach taken was to Biomass and harvest index histograms are shown in Fig. 9b, collect Sentinel 2 observations of surface reflectance (atmo- d. For the individual quadrant measurements, yields varied−1 spherically corrected with the sen2cor package, Louis et al., between 35 and 5036 kg ha . The per-field averaged val-−1 2016, GEE data set “COPERNICUS/S2_SR”) between May ues were between 190 (field 7033FUU) and 4580 kg ha−1 and October 2021, when rain-fed crops are being grown. Af- (field 7021YAM), with an average of 1379 kg ha and a−1 ter applying a cloud mask and only processing pixels labelled standard deviation of 872 kg ha . 
The uncertainties within “Crops” in the ESRI Sentinel 2 land cover map (Karra et al., the fields were also important, with an average within-field 2021) (although the code provided is flexible and users can standard deviation in yield of ∼ 350 kg ha −1. Total above- −1 modify the base map and its classes easily), temporal series ground biomass was between 1000 and 16 000 kg ha , and of a number of vegetation indices (NDVI, LSWI, IRECI and the harvest index varied between 0.21 and 0.77. GCVI) and a subset of spectral bands (Red Edge 1, NIR, Earth Syst. Sci. Data, 14, 5387–5410, 2022 https://doi.org/10.5194/essd-14-5387-2022 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana 5395 Figure 5. Field-averaged NDVI from Sentinel 2 (green dots) and Planet (purple squares) over four of the visited maize fields in 2021. Vertical purple lines indicate the extent of the in situ data-gathering campaign. Error bars indicated 2 %–98 % field NDVI percentiles. Figure 6. Planet VI time series processing step example for field 7074ZIN. (a) Outlier filtering. (b) First-pass single-pixel double-logistic fitting (unconstrained). (c) Second-pass single-pixel double-logistic fitting (phenology parameters constrained by the median field). The vertical lines show the extent of the ground campaign period. 3.3 EO-derived LAI squared error (RMSE) around 0.43 m2 m−2, a mean absolute error (MAE) of 0.35 m2 m−2 and a negligible bias (Fig. 10). The approach described in Sect. 2.5 results in a simple trans- Figure 10 clearly shows an underestimation of the Planet formation between Planet NDVI and LAI. The calibration NDVI signal for LAI> 1.5. A comparison of the field LAI and validation of this approach are shown in Fig. 10. The measurements and the Planet-derived LAI time series is conversion equation is given by LAIpred = 3.95·NDVI−1.21, presented in Fig. 
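The calibration procedure described in Sect. 2.5 (a linear model LAI = m · NDVI + c fitted on 20 random 70/30 splits) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `fit_line` and `repeated_split_calibration` are hypothetical, the fit is a plain ordinary least-squares fit (an assumption, since the fitting routine is not specified in the text), and the data in the usage section are synthetic values generated around the reported coefficients, not the field measurements.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = m * x + c; returns (m, c)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def repeated_split_calibration(ndvi, lai, n_splits=20, train_frac=0.7, seed=0):
    """Fit LAI = m * NDVI + c on repeated random train/validation splits.

    Returns the mean and standard deviation of m and c over the splits,
    plus the mean validation RMSE.
    """
    rng = random.Random(seed)
    idx = list(range(len(ndvi)))
    n_train = int(round(train_frac * len(idx)))
    slopes, intercepts, rmses = [], [], []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        m, c = fit_line([ndvi[i] for i in train], [lai[i] for i in train])
        sq_err = [(m * ndvi[i] + c - lai[i]) ** 2 for i in valid]
        slopes.append(m)
        intercepts.append(c)
        rmses.append(statistics.fmean(sq_err) ** 0.5)
    return {
        "m": (statistics.fmean(slopes), statistics.stdev(slopes)),
        "c": (statistics.fmean(intercepts), statistics.stdev(intercepts)),
        "rmse": statistics.fmean(rmses),
    }


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): NOT the field data from the paper.
    rng = random.Random(42)
    ndvi = [rng.uniform(0.35, 0.85) for _ in range(56)]
    lai = [3.95 * v - 1.21 + rng.gauss(0.0, 0.4) for v in ndvi]
    print(repeated_split_calibration(ndvi, lai))
```

The spread of m and c across splits plays the role of the initial parameter uncertainty mentioned in the text; with the real field-averaged NDVI and LAI pairs, the same loop would be run on those values instead of the synthetic ones.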
3.3 EO-derived LAI

The approach described in Sect. 2.5 results in a simple transformation between Planet NDVI and LAI. The calibration and validation of this approach are shown in Fig. 10. The conversion equation is given by LAIpred = 3.95 · NDVI − 1.21, with the two coefficients having bootstrapped uncertainties of 0.16 and 0.09, respectively. In validation, the model shows a modest correlation (R = 0.5, R2 = 0.25), but in absolute terms, the model performs in line with medium-resolution products (Fang et al., 2019), with a validation root mean squared error (RMSE) of around 0.44 m2 m−2, a mean absolute error (MAE) of 0.35 m2 m−2 and a negligible bias (Fig. 10). Figure 10 clearly shows that the LAI derived from the Planet NDVI signal is underestimated for LAI > 1.5. A comparison of the field LAI measurements and the Planet-derived LAI time series is presented in Fig. 11, where a correspondence between the model-predicted LAI and the field measurements is shown.

Figure 7. Per-field temporal evolution of the in situ leaf area index (LAI). For any date, the mean is shown, and error bars represent ±1σ.

Figure 8. Per-field temporal evolution of in situ leaf chlorophyll concentration (Cab). For any date, the mean is shown, and error bars represent ±1σ.

Figure 9. Crop yield distribution (a: field-averaged, c: individual quadrants), field-averaged above-ground biomass (b) and field-averaged harvest index (d).

Figure 10. NDVI-to-LAI calibration (grey lines show bootstrap uncertainty) (a) and validation (b).

Figure 11. Predicted LAI time series and associated standard deviation (blue line and grey area) in comparison with field measurements (per-field mean and standard deviation, green lines).

3.4 Validation of cropland masks

Figure 12 shows the distribution of Digital Earth Africa’s cropland mask probabilities for the surveyed pixels in 2020 and 2021. For a perfect mapping, one would expect the cumulative distribution to be a Heaviside step function changing from 0 to 1 at a high probability value, indicating that all the pixels are detected as cropland with a high confidence. At the very least, the change point should be around 50 %. Any samples that appear with less than 50 % probability would be omission errors.

There are clear differences between years and sites. For the semi-deciduous zone in 2020, the vast majority of pixels are labelled non-crop, with the cropland mask consistently underreporting crop area. The best results are obtained for maize, cassava and plantain, where ≈ 12 % of the crop area is reported as such. For the Transition zone in 2020, the performance is better. For pepper, cowpea and cabbage, ≈ 50 % of the surveyed pixels are reported as cropland. For maize in this region, only 40 % of the pixels have a probability of more than 0.5. Results for the Savannah region in 2020 are better: ≈ 60 % of maize fields are labelled cropland, but for other popular crops such as sorghum, millet, groundnut and soybean, less than 40 % of the samples are labelled cropland. For the Savannah region in 2021, the results are similar to the same region in 2020, with rice and maize having around half of the pixels detected as cropland by the mask. Soybean is only detected in 20 % of the pixels. The conditions towards the south of Ghana (persistent cloud, mixing of crops and trees) make it harder to identify cropland. Towards the north, with fewer trees and (comparatively) less cloud, the cropland mask improves but still misses around half of the cropland area, and for some crops (soybean), the situation is even worse.

It is worth noting that, for 2020, each pixel belongs to a different field, whereas for 2021, all pixels belonging to the visited fields have been considered, and as such, pixels within a field are spatially correlated, which suggests that the result for this site is probably optimistic. An example of the cropland mask and the visited fields around a small area within the Mion district is shown in Fig. 13, where it is clear that the cropland mask misses many of the visited fields. The reported cropland probabilities are also fairly low (very few pixels have probabilities over 70 %), suggesting a large uncertainty for most pixels.

Our results are hard to compare with the official validation report from Burton et al. (2022), as we are considering using the mask for different periods than those used to develop the data set, but in our case, the results show a large omission error of more than 50 % for all the crops, with the results being markedly worse towards the south of Ghana. These figures are at least double the 25 % omission error reported in Burton et al. (2022).
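The assessment described above (the fraction of surveyed cropland pixels whose mask probability exceeds 0.5, and the cumulative distribution plotted in Fig. 12) can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, probabilities are assumed to be expressed on a 0 to 1 scale, and the example values are made up rather than sampled from the Digital Earth Africa product.

```python
def cropland_detection_summary(probabilities, threshold=0.5):
    """Summarise mask probabilities sampled at surveyed cropland locations.

    A surveyed pixel counts as detected when its probability is strictly
    greater than the threshold; all other pixels are omission errors.
    """
    if not probabilities:
        raise ValueError("at least one surveyed pixel is required")
    n = len(probabilities)
    detected = sum(1 for p in probabilities if p > threshold)
    return {
        "n": n,
        "detected_fraction": detected / n,
        "omission_fraction": 1 - detected / n,
    }


def cumulative_frequency(probabilities, levels):
    """Fraction of surveyed pixels with probability <= each level."""
    n = len(probabilities)
    return [sum(1 for p in probabilities if p <= level) / n for level in levels]


if __name__ == "__main__":
    # Made-up probabilities for five surveyed pixels (assumption).
    sampled = [0.1, 0.4, 0.5, 0.6, 0.9]
    print(cropland_detection_summary(sampled))
    print(cumulative_frequency(sampled, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

For an ideal mask the cumulative frequency would stay near 0 until a high probability level and then jump to 1; a curve that has already risen well above 0 at the 0.5 level indicates omission errors.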
3.5 Crop-type classification

The maize mask derived as presented in Sect. 2.7 is shown in Fig. 14. The validation results (using the 70 : 30 data split introduced earlier) were an overall accuracy of 0.84 and a kappa score of 0.68 (see Table 2 for details). Five-fold cross-validation was also used to evaluate the robustness of the results shown in Table 2, which resulted in an overall accuracy of 0.72 (standard deviation 0.12). The importance of each considered feature is shown in the Gini index plot in Fig. 15, which ranks different features by classifier importance.

3.6 Yield prediction testing

The relationship between leaf area and crop yield has been widely explored, often via vegetation indices (e.g. Becker-Reshef et al., 2010), and it usually follows that a green, healthy and dense canopy results in a higher yield. Many authors relate yield to the magnitude of the maximum value of the leaf area (or vegetation) index. This is a fairly crude relationship, but for regional applications in areas with large fields and monocultures, it can be quite effective (Mkhabela et al., 2011; Petersen, 2018; Kouadio et al., 2012). Here, we look at this relationship between maximum field-measured LAI and field-measured yield as well as Planet-derived LAI and field-measured yield. For in situ data, a linear relationship yields a reasonable fit (coefficient of correlation R = 0.66, coefficient of determination R2 = 0.44) once fields 7021YAN, 7033FUU and 7036SAN are removed as outliers. The slope and intercept of the linear model are 866.33 and −180.45, respectively (Fig. 16a). Including the left-out fields reduces the coefficient of determination to R2 = 0.31. Using the maximum LAI value from the Planet data introduced in Sect. 2.5, the correlation degrades to R = 0.53 (R2 = 0.28, ignoring fields 7021YAN, 7033FUU and 7036SAN) (Fig. 16b). Slope and intercept for the Planet-derived maximum LAI are 1606.51 and −893.10, respectively. The lower correlation from the satellite-derived relationship is probably explained by the underestimation of the maximum LAI for the higher yield fields (Fig. 10). For the in situ case, the value of the slope in the maximum LAI relationship is close to 900 kg ha−1 per unit of LAI, suggesting that, with errors in the Planet LAI estimates of around 0.44 m2 m−2, the typical error in the yield estimate would be around 381 kg ha−1 (866.33 × 0.44), a value close to the average uncertainty in the field-measured yield.

4 Discussion

4.1 Crop measurements

The in situ data gathered are a useful contribution to advancing crop-monitoring methods for smallholder farmers in western Africa. It is important to note that the data collection started when the crops were already established in the fields, as is clear from the satellite temporal trajectories (e.g. Figs. 5 and 6), so that the initial growth period after emergence was not captured. Additionally, due to logistical issues, the field campaign could only start towards the end of August/start of September 2021, whereas planting in this area had started 3 months earlier. To capture the longest possible dynamics over the campaign, the chosen fields were all late-sown, which may result in them not being very representative of the majority of fields in the region. The choice of fields with the least amount of tree cover and no intercropping may also impact the generality of the data gathered with respect to other fields where these conditions are not met. The in situ LAI measurements show some variability (mean standard deviation for samples collected on the same date in the same field of ≈ 0.25 m2 m−2). The effect of this variation is relatively more important here as the maximum LAI is quite small (≈ 2 m2 m−2, compared to e.g. the US, where the maximum LAI will be between around 5 and 6 m2 m−2; Nguy-Robertson et al., 2012).

The variation of leaf chlorophyll content appears different to the LAI trajectory but is broadly characterised by a plateau followed by a steady drop. Here, we have not looked at its use, either for EO-product validation or other applications, as the scarcity of Sentinel 2 reflectance data over the fields makes retrievals unreliable.

There is a large variability of yields within individual fields. This is due to differential management practices and small-scale soil and/or topography variations. Three crop cuts over a standardised area were used to determine yield, conforming to common practice. However, the spatial locations of these estimates were not surveyed, which limits our ability to answer whether EO data provide useful information on within-field yield variability. Yield estimates should also be co-located with the LAI/leaf chlorophyll measurement points to fully exploit the connection between canopy variables and yield.

Only limited data on above-ground biomass were collected, but the distribution of the harvest index between 0.2 and 0.7 suggests that the relationship between biomass and yield is not well established. Lambert et al. (2018) find a strong relationship between above-ground biomass (AGB) and yield for maize in Mali. Karlson et al. (2020) show that, for a range of cereals, the relationship varies from year to year in Burkina Faso. Hay and Gilbert (2001) show a harvest index (HI) ranging from 0.19 to 0.53 for maize in Malawi. It is interesting that Moser et al. (2006) point out that water stress can have an impact on HI in maize, which suggests that the relationship between AGB and yield may not be constant. This has important implications, as in many applications of EO, the target variable is AGB, which is then converted into yield assuming a fixed HI (Jin et al., 2017; Lambert et al., 2018).

Figure 12. Cumulative frequency of surveyed pixels against the probability of cropland per Digital Earth Africa cropland probability mask for 2019. Horizontal dashed lines mark a cumulative frequency of 0.5, and the vertical line marks a cropland probability of 50 %. Numbers between brackets in legends indicate the number of samples of each class included in the calculations.

Table 2. Results from the crop classification for 2021.

Crop                | Other | Maize | User's accuracy
Other               | 7823  | 1234  | 0.86
Maize               | 1800  | 7975  | 0.82
Producer's accuracy | 0.81  | 0.87  |

Overall accuracy: 0.84. Kappa score: 0.68.
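The summary statistics quoted for the maize classifier can be recomputed from the four counts in Table 2. The sketch below is a generic confusion-matrix calculation, not code from the paper; the function name `classification_metrics` is hypothetical, and it assumes that table rows hold the mapped class and columns the reference class, so that row-wise accuracy corresponds to user's accuracy.

```python
def classification_metrics(matrix):
    """Accuracy metrics for a square confusion matrix.

    matrix[i][j] is the number of samples in row class i and column class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]
    diagonal = [matrix[k][k] for k in range(n_classes)]
    overall = sum(diagonal) / total
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return {
        "row_accuracy": [d / r for d, r in zip(diagonal, row_totals)],
        "column_accuracy": [d / c for d, c in zip(diagonal, col_totals)],
        "overall_accuracy": overall,
        "kappa": (overall - expected) / (1 - expected),
    }


# Counts from Table 2, class order: Other, Maize.
table2 = [[7823, 1234], [1800, 7975]]
metrics = classification_metrics(table2)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(name, value)
```

Rounded to two decimals, the row-wise accuracies are 0.86 and 0.82, the column-wise accuracies 0.81 and 0.87, the overall accuracy 0.84 and the kappa score 0.68, which agrees with the values in Table 2.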
4.2 LAI estimations

We presented a very simple approach to exploit the ground data set and calibrate a simple LAI relationship with satellite NDVI. The outlier filtering and logistic function fitting are fairly standard approaches and are necessary pre-processing stages to establish a linkage between ground and satellite measurements. While the results of the mapping appear in line with similar approaches (e.g. Fig. 4 in Fang et al., 2019, shows a median RMSE for LAI of around 0.5, although larger correlations are typical) and better than universal relationships for maize (Kang et al., 2016, report an RMSE of ∼ 1 m2 m−2 for maize, against our reported ∼ 0.5 m2 m−2), there are issues with poor correlation and high LAI samples in Fig. 10. The small dynamic range of the field data and the large uncertainties of the measurements are behind both of these effects. Ground uncertainties arise from measurements in a heterogeneous, sparse and discontinuous canopy, whereas the low variability of the ground data is caused by the period of data gathering not providing a full description of all of the vegetation growth dynamics (see Fig. 7). A further potential source of uncertainty is the contribution of the soil to the NDVI signal (Baret and Guyot, 1991; Carlson and Ripley, 1997).

The evaluation metrics presented here and the suitability of these data and this method have to be evaluated for particular applications. In some applications, the low bias estimate of LAI and acceptable RMSE performance of the model will make these data useful, whereas for others, just using the filtered and smoothed NDVI trajectory may be more appropriate.

Figure 13. Detail of some of the visited field sites (yellow polygons) and the Digital Earth Africa cropland probability mask (Burton et al., 2022) around the Mion district in the Northern zone in 2021.

Figure 14. Left panel: maize mask (maize shown in orange). Insets (A) and (B) illustrate the field data. Maize fields are shown as green outlines and other crops as red outlines.

4.3 Cropland mask validation

We have used the crop location data to assess a continent-wide cropland mask and to derive a local maize mask. Both were developed with Sentinel 2 data, the use of which in Ghana (and similar tropical regions) is challenging because clouds limit the observation opportunities. In Ghana, this is exacerbated as the vast majority of crops are grown during the wet season. Additionally, smallholder landscapes show a large within-class variability, due to large differences in management choices as well as the presence of trees and other non-crop vegetation. So, infrequent temporal sampling and most crops growing simultaneously make using Sentinel 2 challenging for developing a cropland mask. The task of developing a crop-type mask is even harder and in all likelihood would need vast numbers of crop labels covering different seasons and locations.

The Burton et al. (2022) cropland mask was assessed for the 2 years subsequent to the one for which it was produced. We found that in areas towards the south of Ghana (semi-deciduous agroecological zones), the cropland mask underestimates cropland area for most crops. Towards the Transition, Northern and Savannah zones, the performance improves, but in the best case, omission errors are larger than 50 %. This is in marked contrast to the validation report for western Africa, where omission errors are 25 %. It would be of great interest to repeat the exercise here for the cropland masks for the years 2020 and 2021 and to ascertain whether the performance is similar (suggesting that the issue is with the data source and method) or whether they improve, suggesting a very dynamic interannual cropland variation. As more of these data sets become available (e.g. Estes et al., 2022), the data we provided in this contribution can be used as a source of independent validation but also as a source of training data, with the important caveat that no non-cropland samples were collected. Non-cropland samples are needed to fully characterise the masks.

The maize mask that was developed in this paper demonstrates that the data can be used as an input to a classifier. However, the limited number of samples for 2021 (where the main aim of the field campaign was biophysical parameter collection) results in a crop mask that is probably only reliable around the collected data points. Also, since the surveyed fields were selected as late-sown, the training data may be biased towards late-sown crops. Figure 15 indicates that the classifier is mostly being driven by observations around the first half of June (DoYs 150–165), suggesting that early crop development may be more informative for crop discrimination than late crop development. We note that even methods based on more training data (> 4000 samples), more complex classifiers and a very rich set of data combining Sentinel 1, Sentinel 2 and Planet (Rustowicz et al., 2019) still report overall accuracies for crops in Ghana of around 60 %.

It is instructive to compare the results for Ghana to those in Nigeria. In Nigeria, cropland and crop-type masks appear feasible, with accuracies of over 70 % for both cropland and crop type reported by Ibrahim et al. (2021) and even over 90 % (using Sentinel 1 and 2) (Abubakar et al., 2020). These two studies suggest that, if cloud cover is not an issue, Sentinel 2 is able to use the temporal signal to map different crops, and adding Sentinel 1 (as in Abubakar et al., 2020) only marginally improves on the Sentinel 2 results. The importance of the optical data suggests that there might be limited improvements in crop-type mapping using Sentinel 1 for Ghana.

Figure 15. Gini index for the random forest maize mask classifier, showing the 10 highest-ranking features in the classifier. Features are given by their abbreviation, followed by time step. For example, IRECI_165 represents the IRECI at day of year 165.

4.4 Yield prediction

We show the effect of using maximum LAI to predict yield using both the in situ and satellite-derived LAI estimates at field level. After removing three outlier fields, we find that the relationships are quite weak (R2 = 0.44 and R2 = 0.28 for ground and EO-derived maximum LAI, respectively). The poor performance of the EO-derived method stems from the saturation effect, which strongly affects the maximum LAI estimation. For the ground data, the results still show a large dispersion. The slope of the regression is also large: 866.33 kg ha−1 per unit of LAI. Any error in maximum LAI estimation will therefore translate into a considerable error in yield estimation. Nevertheless, the results reported here compare favourably to more complex field-level studies done in the US corn belt (R2 = 0.45, RMSE 1850 kg ha−1) using a more complex model and Landsat data (Deines et al., 2021). Kang and Özdoğan (2019) use a data assimilation system ingesting MODIS and Landsat data over the US corn belt and report yield estimates with R2 = 0.41 and an RMSE of 2170 kg ha−1.

The above discussion hints at some of the challenges of monitoring yield in smallholder landscapes: the large within- and between-field variations in yield only have a modest effect on LAI. Given that these variations occur over small areas with roughly the same meteorological drivers, the variability of the system can only be studied by making use of wall-to-wall observations of e.g. LAI, as other sources of information (e.g. agrometeorological crop models) will not exhibit variation in their drivers over these spatial scales.

Our results suggest that extracting relationships between yield and EO-derived diagnostics such as maximum LAI is uncertain, in part due to yield showing a large spread even within single fields, indicating that the scale of analysis should be the point within the field rather than the field average. With precisely co-located yield and LAI ground measurements, a clearer understanding of the uncertainty of the canopy variable to yield mapping could be developed.

Figure 16. Relationship between maximum LAI estimates and yield. For in situ measured LAI vs. yield (panel a), and discarding fields 7021YAN, 7033FUU and 7036SAN, the linear fit has R2 = 0.44 and is shown as the dotted line y = 866.33 · x − 180.45. For Planet-derived LAI vs. yield (panel b), and discarding fields 7021YAN, 7033FUU and 7036SAN, the linear fit has R2 = 0.28 and is shown as the dotted line y = 1606.51 · x − 893.10.
5 Data availability

The following data are available from http://doi.org/10.5281/zenodo.6632083 (Gomez-Dans et al., 2022).

Detailed field sampling locations: GeoJSON files with the locations of maize fields where detailed measurements of biophysical parameters were made. Filenames: Biophysical_Data_Collection_Points_V1.geojson and Biophysical_Data_Collection_Polygons_V1.geojson.

Field location campaigns: Four GeoJSON files with the locations of fields and the crop type.
– Transition/2020: Points located inside fields. Filename: CropTypes_Transition_2020.geojson
– Deciduous/2020: Points located inside fields. Filename: CropTypes_SemiDeciduous_2020.geojson
– Savannah/2020: Points located inside fields. Filename: CropTypes_NSavannah_2020.geojson
– Savannah/2021: Field outline polygons (avoiding trees). Filename: CropTypes_NSavannah_2021.geojson

In situ biophysical parameter time series: Time series of repeated measurements of leaf area index and leaf chlorophyll concentration, as well as phenology observations and general comments. CSV format. Filename: Ghana_ground_data_v5.csv

Crop yield and biomass: Crop yield measured in three quadrants per field and biomass measurements on a reduced set of 10 fields. CSV format. Filename: Yield_Maize_Biomass.csv

Planet-derived time series of LAI: Per-field GeoTIFF files that have been derived from the original Planet data. Provided as a Zip archive. Filename: PlanetDerivedLAI.zip

Maize mask: A maize/no-maize mask for 2021 derived from Sentinel 2 data (Sect. 2.7) in GeoTIFF format. Filename: CAU_maize_classification.tif

6 Code availability

Code for the classification introduced in Sect. 2.7 is available from https://code.earthengine.google.com/4795796bb1f47ff7e9ca9c0aae263c11 (last access: 16 November 2022).

7 Conclusions

We have gathered a rich data set to understand and develop crop-monitoring methods using EO for maize in northern Ghana. The data set includes the locations of crops, a comprehensive set of repeated biophysical parameters (LAI and leaf chlorophyll concentration) over the growing season and measurements of crop yield and biomass. These measurements were acquired in 2021 (with some crop locations also reported in 2020) and were taken from an agricultural area east of the city of Tamale. The collected data are novel in that they focus on smallholder maize farms, an important and understudied agricultural landscape that supports many farmers in Africa.

This data set has a number of uses, some of which we illustrate throughout this paper. The crop location data presented here complement recent contributions such as Jolivot et al. (2021) and can also be used to validate and create cropland/crop-type masks. We demonstrate both of these uses in the paper. In western Africa, producing these data sets faces important challenges due to similar timing of crop development, persistent cloud cover and huge heterogeneity in crop development within and between growing seasons. This points to a need for more multi-location, multi-season in situ data, to which this paper is a small contribution.

The in situ collected biophysical parameters have been used to develop and validate a simple method to infer LAI from Planet data, which resulted in local estimates having a low error (RMSE 0.44 m2 m−2, negligible bias) but a low coefficient of correlation (R = 0.49), due to an underestimation of high LAI. The measurements described in this contribution can also be used to validate biophysical parameters retrieved using other sensors and methods (Fang et al., 2019; Delegido et al., 2011; Clevers and Gitelson, 2013; Brown et al., 2021; Kganyago et al., 2020). The repeated temporal sampling is an important feature of this data set: as more sophisticated techniques are developed to fully use time series of data (Lewis et al., 2012), repeated measurements covering the entire development of the crop should be preferred over the traditional approach of data collected over a single or few dates. The multi-parameter nature of the data (LAI and chlorophyll concentration) is important in showing the dynamics of both and supporting efforts to define a rich set of Essential Agricultural Variables (Whitcraft et al., 2019).

The data show a large variation in yield within a relatively small area and a considerable yield variation within fields. This local variability cannot be attributed to coarse-scale weather patterns or broad soil classes, but rather to local soil variation, farming practices, etc. There are important implications for the use of crop growth models in these systems and the need to consider the sources of the underlying yield variability, in addition to usual climate and soil drivers (Beveridge et al., 2018).

The large yield variability also suggests that to truly understand food production, the real spatial distribution of yields needs to be measured. This will require the use of EO-based methods and an important data collection effort similar to the one presented here to understand their limitations. We show that there is a relationship between the in situ measured maximum LAI value and yield at the plot level, but the relationship deteriorates for the EO-derived maximum LAI, due to the additional uncertainty and bias of the EO-derived LAI estimate. Developing robust methods to infer LAI from EO data (in particular, high revisit frequency sensors) is critical to be able to monitor crop development.

Providing cropland and crop-type maps for smallholder systems is challenging. We use our data to test a cropland mask and find that it underestimates cropland. We also use our data to demonstrate a local maize/non-maize mask. In the light of the limitations shown in both of these efforts, the collection of more extensive multi-site, multi-season crop location ground data is critical, as is the exploration of very dense time series: in many tropical sites with dry/rainy season dynamics, most of the crop growing happens during the wet season, resulting in similar temporal dynamics and low opportunity of observation.

The data set has some limitations. First, it covers a single growing season over a small area. Similar efforts for other years and areas would greatly strengthen any study that makes use of these data. Second, field measurements were done preferentially in fields that were sown late in the season, and broader sampling of earlier and late-sown crops would be beneficial for an area where sowing occurs over a long period in the rainy season. Third, while the data set allows us to consider yield as a function of LAI and/or chlorophyll, it does not allow us to understand what factors had a role in crop development, e.g. soil, fertiliser, management, etc. Finally, yield measurements within a field were not georeferenced, and this makes it hard to use EO data to understand the important within-field variability.

In the light of this, we suggest that campaigns looking at agricultural applications should consider repeated measurements over the growing season, as well as multiple seasons or multiple geographical regions. In addition, measurements of crop yield should be made to coincide with biophysical parameters, and detailed management information should be recorded, as this would allow understanding of the variability in yield estimates both within and between fields.

Collecting the data outlined above is expensive and challenging in smallholder agricultural landscapes, and in many cases, it will require a close conversation and collaboration with the farmer. Providing useful satellite-derived data products for e.g. agricultural extension workers, who work closely with the farmers, might encourage the collection and sharing of more data like that shown in this paper.
Measure a quadrant size of 6 m× 6 m diagonally in protocol three different locations in the field. A1 Leaf area index measurements 2. Count the number of plants in each quadrant. 1. Identify four locations in each of the fields to take the 3. Count the number of cobs harvested from each quad- LAI measurements. Ensure that the selected locations rant. capture the variability in plant stand in the field. 4. Determine the number of cobs per square metre. 2. Identify the rows in which measurements will be made and mark the locations with pegs for subsequent mea- 5. Remove the ears, leaving the husk intact on the plant. surements. 6. Weigh the total cobs harvested from each quadrant sep- 3. At each location, take one above-canopy reading and arately. then four below-canopy readings diagonally as illus- trated in the picture below. 7. Select 10 cobs and weigh and shell and weigh the grains. 4. Ensure that the direction of the view cap is consistent for all readings per location. 8. Take 500 g of the grain sub-samples from each quadrant for moisture content determination. 5. Follow the steps in using the LI-COR 2200C in measur- ing LAI to capture the data. 9. Weigh all empty cobs and then 0.5 kg taken from each The following are important. sample for oven drying. – The 4A measurements are necessary to deal with scat- 10. Label the samples from each quadrant clearly with the tering corrections (generating K records) if the sun is quadrant number. Put all three samples from the farm out. Otherwise one above (A) reading and several be- into one poly bag and label the bag with the name of the low (B) readings are enough. farmer, community and district. – Increasing number of below canopy (B) readings im- 11. Submit the grain sample for moisture content determi- proves spatial average. nation within 24 h. – In a canopy that is 1 m high, the optical sensor should 12. 
With a known moisture content, estimate the grain yield be at least 3 m from the edge in any direction it can see. per square metre. – Be sure to take all B readings at the same height and in the same direction as the A reading. 13. Cut plants in the harvested area just above the ground. 14. Select 10 representative plants (stover) into leaf blade, A2 Leaf chlorophyll concentration measurement husk, leaf sheath and stem (including tassel). protocol 15. Weigh each component and log weight (undried). Chop 1. Select and tag five plants in each field for continuous components separately and take a sub-sample of 0.5 kg measurements. of each of the undried components. Place in sampling 2. Tag the fifth and sixth leaves of each of the five plants. bags and label. 3. On each leaf, take three measurements at 1/4, 1/2 and 16. Oven-dry each weighed undried sub-sample to a con- 3/4 of the distance from the leaf base to its tip. stant weight. 4. Take the readings on both leaves (six measurements per 17. Find a ratio of the dried to undried sub-samples of each plant). component and multiply them by their respective total 5. Press “Average” to generate the average SPAD reading undried weight to obtain the dry weight of each compo- for the plant. nent. The SPAD readings were taken from plants within the 18. Add all total dried weight of all components same four sections where the LAI readings were taken and (leaf blade+ leaf sheath+ stem (including tassel and from another section of the field (making five sections in all). husk)+ cob) to obtain total biomass. https://doi.org/10.5194/essd-14-5387-2022 Earth Syst. Sci. Data, 14, 5387–5410, 2022 5406 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana Author contributions. JLGD, PEL, FY, DSM and KA designed series data to estimate vegetation phenology, Remote Sens. 
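The arithmetic implied by steps 6, 7, 12, 17 and 18 of the biomass and yield protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing actually used for the data set: the function names are hypothetical, the grain fraction of the 10-cob sample is assumed to be representative of all cobs in the quadrant, moisture content is assumed to be a wet-basis mass fraction, and yield is expressed as dry grain (0 % moisture), since the protocol does not state a reference moisture.

```python
"""Sketch of the yield and biomass arithmetic in Appendix B (assumptions flagged)."""

QUADRANT_AREA_M2 = 6.0 * 6.0  # 6 m x 6 m quadrant (step 1)
M2_PER_HA = 10_000.0


def grain_yield_kg_per_ha(total_cob_kg, sample_cob_kg, sample_grain_kg,
                          moisture_fraction, area_m2=QUADRANT_AREA_M2):
    """Dry grain yield for one quadrant, in kg per hectare.

    Assumptions (not stated in the protocol): the grain fraction of the
    10-cob sample (step 7) applies to all cobs weighed in step 6, and
    moisture_fraction is a wet-basis mass fraction in [0, 1).
    """
    if sample_cob_kg <= 0 or area_m2 <= 0:
        raise ValueError("sample cob weight and area must be positive")
    if not 0.0 <= moisture_fraction < 1.0:
        raise ValueError("moisture_fraction must be in [0, 1)")
    shelling_fraction = sample_grain_kg / sample_cob_kg
    fresh_grain_kg = total_cob_kg * shelling_fraction
    dry_grain_kg = fresh_grain_kg * (1.0 - moisture_fraction)
    return dry_grain_kg / area_m2 * M2_PER_HA


def component_dry_weight_kg(total_fresh_kg, subsample_fresh_kg, subsample_dry_kg):
    """Step 17: scale the total undried weight by the dried/undried ratio."""
    if subsample_fresh_kg <= 0:
        raise ValueError("sub-sample fresh weight must be positive")
    return total_fresh_kg * (subsample_dry_kg / subsample_fresh_kg)


def total_biomass_kg(components):
    """Step 18: sum the dry weights of all components.

    `components` maps a component name to a tuple
    (total_fresh_kg, subsample_fresh_kg, subsample_dry_kg).
    """
    return sum(component_dry_weight_kg(*w) for w in components.values())


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not from the data set.
    print(grain_yield_kg_per_ha(20.0, 2.0, 1.6, 0.25))
    print(total_biomass_kg({
        "leaf_blade": (4.0, 0.5, 0.2),
        "stem": (10.0, 0.5, 0.25),
    }))
```

Averaging the three quadrant values gives a per-field estimate, and their spread gives a measure of within-field variability. If yield is to be reported at a standard storage moisture rather than as dry grain, the dry figure would need to be rescaled accordingly; that choice is left open here.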
En- the study and wrote the manuscript in close collaboration with the viron., 123, 400–417, https://doi.org/10.1016/j.rse.2012.04.001, other authors. JH and XL contributed Sect. 2.7. In situ samples were 2012. collected by MA, CED, SAN, RA and KKYA, and PL and IKB Azzari, G., Jain, M., and Lobell, D. B.: Towards fine resolution supervised the collected samples. HM and QW helped in organising global maps of crop yields: Testing multiple methods and satel- the campaign. All the co-authors contributed to data evaluation and lites in three countries, Remote Sens. Environ., 202, 129–141, interpretation and editing of the manuscript. https://doi.org/10.1016/j.rse.2017.04.014, 2017. Baez-Gonzalez, A. D., Kiniry, J. R., Maas, S. J., Tiscareno, M. L., Macias C., J., Mendoza, J. L., Richardson, C. W., Salinas G., J., Competing interests. The contact author has declared that none and Manjarrez, J. R.: Large-Area Maize Yield Forecasting Using of the authors has any competing interests. Leaf Area Index Based Yield Model, Agron. J., 97, 418–425, https://doi.org/10.2134/agronj2005.0418, 2005. Baret, F. and Guyot, G.: Potentials and limits of vegetation indices Disclaimer. Publisher’s note: Copernicus Publications remains for LAI and APAR assessment, Remote Sens. Environ., 35, 161– neutral with regard to jurisdictional claims in published maps and 173, https://doi.org/10.1016/0034-4257(91)90009-u, 1991. institutional affiliations. Baruth, B., Royer, A., Klisch, A., and Genovese, G.: The use of remote sensing within the MARS crop yield monitoring system of the European Commission, Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci, 37, 935–940, 2008. Acknowledgements. The authors were supported by the Newton Beck, P. S., Atzberger, C., Høgda, K. A., Johansen, B., Prize (Newton Prize 2019 Chair’s Award, administered by BEIS), and Skidmore, A. 
K.: Improved monitoring of vegeta- the United Kingdom’s Natural Environment Research Council tion dynamics at very high latitudes: A new method us- (NERC) National Centre for Earth Observation (NCEO) NC ODA ing MODIS NDVI, Remote Sens. Environ., 100, 321–334, Full project (NE/R000115/1) and the STFC under project AMAZ- https://doi.org/10.1016/j.rse.2005.10.021, 2006. ING – Advancing MAiZe INformation for Ghana (ST/V001388/1). Becker-Reshef, I., Vermote, E., Lindeman, M., and Justice, The 2020 campaign was funded by NERC NCEO under the NC C.: A generalized regression-based model for forecast- ODA Full project (NE/R000115/1). The Newton Prize funded the ing winter wheat yields in Kansas and Ukraine using purchase of equipment for this work as well as the personnel costs MODIS data, Remote Sens. Environ., 114, 1312–1323, for the 2021 campaigns. We thank the Planet Education and Re- https://doi.org/10.1016/j.rse.2010.01.010, 2010. search Program for providing access to the Planet data used in this Becker-Reshef, I., Franch, B., Barker, B., Murphy, E., Santamaria- study. Artigas, A., Humber, M., Skakun, S., and Vermote, E.: Prior Season Crop Type Masks for Winter Wheat Yield Forecasting: A US Case Study, Remote Sensing, 10, 1659, Financial support. This research has been supported by the Na- https://doi.org/10.3390/rs10101659, 2018. tional Centre for Earth Observation (NC ODA Full project, grant Becker-Reshef, I., Justice, C., Barker, B., Humber, M., Rem- no. NE/R000115/1), the Science and Technology Facilities Council bold, F., Bonifacio, R., Zappacosta, M., Budde, M., Maga- (AMAZING – Advancing MAiZe INformation for Ghana, grant no. dzire, T., Shitote, C., Pound, J., Constantino, A., Nakalembe, ST/V001388/1), and the Newton Fund (Newton Prize 2019 Chair’s C., Mwangi, K., Sobue, S., Newby, T., Whitcraft, A., Jarvis, Award). 
I., and Verdin, J.: Strengthening agricultural decisions in coun- tries at risk of food insecurity: The GEOGLAM Crop Mon- itor for Early Warning, Remote Sens. Environ., 237, 111553, Review statement. This paper was edited by Francesco N. https://doi.org/10.1016/j.rse.2019.111553, 2020. Tubiello and reviewed by two anonymous referees. Benami, E., Jin, Z., Carter, M. R., Ghosh, A., Hijmans, R. J., Hobbs, A., Kenduiywo, B., and Lobell, D. B.: Uniting remote sens- ing, crop modelling and economics for agricultural risk man- References agement, Nature Reviews Earth & Environment, 2, 140–159, https://doi.org/10.1038/s43017-020-00122-y, 2021. Abubakar, G. A., Wang, K., Shahtahamssebi, A., Xue, X., Belete, Beveridge, L., Whitfield, S., and Challinor, A.: Crop modelling: M., Gudo, A. J. A., Mohamed Shuka, K. A., and Gan, M.: Map- Towards locally relevant and climate-informed adaptation, Cli- ping Maize Fields by Using Multi-Temporal Sentinel-1A and matic Change, 147, 475–489, https://doi.org/10.1007/s10584- Sentinel-2A Images in Makarfi, Northern Nigeria, Africa, Sus- 018-2160-z, 2018. tainability, 12, 2539, https://doi.org/10.3390/su12062539, 2020. Brown, L. A., Fernandes, R., Djamai, N., Meier, C., Go- Antonaci, L., Demeke, M., and Vezzani, A.: The challenges bron, N., Morris, H., Canisius, F., Bai, G., Lerebourg, of managing agricultural price and production risks in C., Lanconelli, C., Clerici, M., and Dash, J.: Valida- sub-Saharan Africa, Tech. Rep. ESA Working Paper 14- tion of baseline and modified Sentinel-2 Level 2 Proto- 09, Agricultural Development Economics Division Food type Processor leaf area index retrievals over the United and Agriculture Organization of the United Nations, States, ISPRS J. Photogramm. Remote Sens., 175, 71–87, https://doi.org/10.22004/ag.econ.288979, 2014. https://doi.org/10.1016/j.isprsjprs.2021.02.020, 2021. Atkinson, P. 
M., Jeganathan, C., Dash, J., and Atzberger, C.: Inter- comparison of four models for smoothing satellite sensor time- Earth Syst. Sci. Data, 14, 5387–5410, 2022 https://doi.org/10.5194/essd-14-5387-2022 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana 5407 Brown, M. E.: Remote sensing technology and land use analysis in Estes, L. D., Ye, S., Song, L., Luo, B., Eastman, J. R., Meng, food security assessment, Journal of Land Use Science, 11, 623– Z., Zhang, Q., McRitchie, D., Debats, S. R., Muhando, 641, https://doi.org/10.1080/1747423x.2016.1195455, 2016. J., Amukoa, A. H., Kaloo, B. W., Makuru, J., Mbatia, Burton, C., Yuan, F., Ee-Faye, C., Halabisky, M., Ongo, D., Mar, B. K., Muasa, I. M., Mucha, J., Mugami, A. M., Mugami, F., Addabor, V., Mamane, B., and Adimou, S.: Co-Production J. M., Muinde, F. W., Mwawaza, F. M., Ochieng, J., Oduol, of a 10-m Cropland Extent Map for Continental Africa using C. J., Oduor, P., Wanjiku, T., Wanyoike, J. G., Avery, Sentinel-2, Cloud Computing, and the Open-Data-Cube, AGU R. B., and Caylor, K. K.: High Resolution, Annual Maps of 2021 Fall Meeting, New Orleans, LA, 13–17 December 2021, Field Boundaries for Smallholder-Dominated Croplands at Na- 10, https://doi.org/10.1002/essoar.10510081.1, 2022. tional Scales, Frontiers in Artificial Intelligence, 4, 744863, Cairns, J. E., Hellin, J., Sonder, K., Araus, J. L., MacRobert, J. F., https://doi.org/10.3389/frai.2021.744863, 2022. Thierfelder, C., and Prasanna, B. M.: Adapting maize production Falconnier, G. N., Corbeels, M., Boote, K. J., Affholder, F., Adam, to climate change in sub-Saharan Africa, Food Secur., 5, 345– M., MacCarthy, D. S., Ruane, A. C., Nendel, C., Whitbread, 360, https://doi.org/10.1007/s12571-013-0256-x, 2013. A. M., Justes, E., Ahuja, L. R., Akinseye, F. M., Alou, I. N., Carletto, C., Jolliffe, D., and Banerjee, R.: The Emperor Amouzou, K. A., Anapalli, S. S., Baron, C., Basso, B., Baudron, has no data! 
Agricultural statistics in sub-Saharan Africa, F., Bertuzzi, P., Challinor, A. J., Chen, Y., Deryng, D., Elsayed, https://www.mortenjerven.com/wp-content/uploads/2013/04/ M. L., Faye, B., Gaiser, T., Galdos, M., Gayler, S., Gerardeaux, Panel-3-Carletto.pdf (last access: 22 November 2022), 2013. E., Giner, M., Grant, B., Hoogenboom, G., Ibrahim, E. S., Ka- Carletto, C., Jolliffe, D., and Banerjee, R.: From mali, B., Kersebaum, K. C., Kim, S.-H., Laan, M., Leroux, L., Tragedy to Renaissance: Improving Agricultural Data Lizaso, J. I., Maestrini, B., Meier, E. A., Mequanint, F., Ndoli, for Better Policies, J. Dev. Stud., 51, 133–148, A., Porter, C. H., Priesack, E., Ripoche, D., Sida, T. S., Singh, https://doi.org/10.1080/00220388.2014.968140, 2015. U., Smith, W. N., Srivastava, A., Sinha, S., Tao, F., Thorburn, Carlson, T. N. and Ripley, D. A.: On the relation between P. J., Timlin, D., Traore, B., Twine, T., and Webber, H.: Mod- NDVI, fractional vegetation cover, and leaf area index, Remote elling climate change impacts on maize yields under low nitro- Sens. Environ., 62, 241–252, https://doi.org/10.1016/s0034- gen input conditions in sub-Saharan Africa, Global Change Biol., 4257(97)00104-1, 1997. 26, 5942–5964, https://doi.org/10.1111/gcb.15261, 2020. Chemura, A., Schauberger, B., and Gornott, C.: Impacts Fang, H., Baret, F., Plummer, S., and Schaepman-Strub, G.: An of climate change on agro-climatic suitability of ma- Overview of Global Leaf Area Index (LAI): Methods, Prod- jor food crops in Ghana, PLoS One, 15, e0229881, ucts, Validation, and Applications, Rev. Geophys., 57, 739–799, https://doi.org/10.1371/journal.pone.0229881, 2020. https://doi.org/10.1029/2018rg000608, 2019. Clevers, J. and Gitelson, A.: Remote estimation of crop and Ferwerda, J. G. and Skidmore, A. K.: Can nutrient status of grass chlorophyll and nitrogen content using red-edge bands four woody plant species be predicted using field spectrom- on Sentinel-2 and -3, Int. J. Appl. 
Earth Obs., 23, 344–351, etry?, ISPRS J. Photogramm. Remote Sens., 62, 406–414, https://doi.org/10.1016/j.jag.2012.10.008, 2013. https://doi.org/10.1016/j.isprsjprs.2007.07.004, 2007. Croft, H., Arabian, J., Chen, J. M., Shang, J., and Liu, J.: Mapping Franch, B., Vermote, E., Becker-Reshef, I., Claverie, M., Huang, J., within-field leaf chlorophyll content in agricultural crops for ni- Zhang, J., Justice, C., and Sobrino, J.: Improving the timeliness trogen management using Landsat-8 imagery, Precis. Agric., 21, of winter wheat production forecast in the United States of Amer- 856–880, https://doi.org/10.1007/s11119-019-09698-y, 2019. ica, Ukraine and China using MODIS data and NCAR Growing Defourny, P., Jarvis, I., and Blaes, X.: JECAM Guidelines Degree Day information, Remote Sens. Environ., 161, 131–148, for cropland and crop type definition and field data col- https://doi.org/10.1016/j.rse.2015.02.014, 2015. lection, JECAM, http://jecam.org/wp-content/uploads/2018/10/ Freduah, B., MacCarthy, D., Adam, M., Ly, M., Ruane, A., JECAM_Guidelines_for_Field_Data_Collection_v1_0.pdf (last Timpong-Jones, E., Traore, P., Boote, K., Porter, C., and Adiku, access: 22 November 2022), 2014. S.: Sensitivity of Maize Yield in Smallholder Systems to Climate Deines, J. M., Patel, R., Liang, S.-Z., Dado, W., and Lobell, Scenarios in Semi-Arid Regions of West Africa: Accounting for D. B.: A million kernels of truth: Insights into scalable satellite Variability in Farm Management Practices, Agronomy, 9, 639, maize yield mapping and yield gap analysis from an extensive https://doi.org/10.3390/agronomy9100639, 2019. ground dataset in the US Corn Belt, Remote Sens. Environ., 253, Garcia, D.: Robust smoothing of gridded data in one and higher 112174, https://doi.org/10.1016/j.rse.2020.112174, 2021. dimensions with missing values, Comput. Stat. 
Data An., 54, Delegido, J., Verrelst, J., Alonso, L., and Moreno, J.: Evaluation of 1167–1178, https://doi.org/10.1016/j.csda.2009.09.020, 2010. Sentinel-2 Red-Edge Bands for Empirical Estimation of Green Giller, K. E., Delaune, T., Silva, J. a. V., van Wijk, M., Ham- LAI and Chlorophyll Content, Sensors-Basel, 11, 7063–7081, mond, J., Descheemaeker, K., van de Ven, G., Schut, A. G. T., https://doi.org/10.3390/s110707063, 2011. Taulya, G., Chikowo, R., and Andersson, J. A.: Small farms and Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, development in sub-Saharan Africa: Farming for food, for in- V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P., Marti- come or for lack of better options?, Food Secur., 13, 1431–1454, mort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., and https://doi.org/10.1007/s12571-021-01209-0, 2021. Bargellini, P.: Sentinel-2: Esa’s Optical High-Resolution Mission Gitelson, A. A., Viña, A., Verma, S. B., Rundquist, D. C., Arke- for GMES Operational Services, Remote Sens. Environ., 120, bauer, T. J., Keydan, G., Leavitt, B., Ciganda, V., Burba, G. G., 25–36, https://doi.org/10.1016/j.rse.2011.11.026, 2012. and Suyker, A. E.: Relationship between gross primary produc- Eilers, P. H. C.: A Perfect Smoother, Anal. Chem., 75, 3631–3636, tion and chlorophyll content in crops: Implications for the synop- https://doi.org/10.1021/ac034173t, 2003. https://doi.org/10.5194/essd-14-5387-2022 Earth Syst. Sci. Data, 14, 5387–5410, 2022 5408 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana tic monitoring of vegetation productivity, J. Geophys. Res., 111, situ datasets for agricultural land use mapping and monitor- D08S11, https://doi.org/10.1029/2005jd006017, 2006. ing in tropical countries, Earth Syst. Sci. Data, 13, 5951–5967, Gomez-Dans, J. L., Lewis, P., Yin, F., Asare, K., Lamptey, P., https://doi.org/10.5194/essd-13-5951-2021, 2021. Aidoo, K., MacCarthy, D., Ma, H., Wu, Q., Addi, M., Aboagye- Jönsson, P. 
and Eklundh, L.: Seasonality extraction by function fit- Ntow, S., Doe, C. E., Alhassan, R., Kankam-Boadu, I., Huang, ting to time-series of satellite sensor data, IEEE T. Geosci. Re- J., and Li, X.: Location, biophysical and agronomic param- mote, 40, 1824–1832, 2002. eters for croplands in Northern Ghana, Zenodo [data set], Kang, Y. and Özdoğan, M.: Field-level crop yield map- https://doi.org/10.5281/zenodo.6632083, 2022. ping with Landsat using a hierarchical data assimila- Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., tion approach, Remote Sens. Environ., 228, 144–163, and Moore, R.: Google Earth Engine: Planetary-scale geospa- https://doi.org/10.1016/j.rse.2019.04.005, 2019. tial analysis for everyone, Remote Sens. Environ., 202, 18–27, Kang, Y., Özdoğan, M., Zipper, S., Román, M., Walker, J., Hong, https://doi.org/10.1016/j.rse.2017.06.031, 2017. S., Marshall, M., Magliulo, V., Moreno, J., Alonso, L., Miyata, Hay, R. and Gilbert, R.: Variation in the harvest index of tropical A., Kimball, B., and Loheide, S.: How Universal Is the Rela- maize: Evaluation of recent evidence from Mexico and Malawi, tionship between Remotely Sensed Vegetation Indices and Crop Ann. Appl. Biol., 138, 103–109, https://doi.org/10.1111/j.1744- Leaf Area Index? A Global Assessment, Remote Sensing, 8, 597, 7348.2001.tb00090.x, 2001. https://doi.org/10.3390/rs8070597, 2016. Heiberger, R. M. and Becker, R. A.: Design of an S Func- Karlson, M., Ostwald, M., Bayala, J., Bazié, H. R., Ouedraogo, tion for Robust Regression Using Iteratively Reweighted A. S., Soro, B., Sanou, J., and Reese, H.: The Potential of Least Squares, J. Comput. Graph. Stat., 1, 181–196, Sentinel-2 for Crop Production Estimation in a Smallholder https://doi.org/10.2307/1390715, 1992. Agroforestry Landscape, Burkina Faso, Front. Environ. Sci., 8, Hird, J. N. and McDermid, G. J.: Noise reduction of 85, https://doi.org/10.3389/fenvs.2020.00085, 2020. 
NDVI time series: An empirical comparison of se- Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J. C., lected techniques, Remote Sens. Environ., 113, 248–258, Mathis, M., and Brumby, S. P.: Global land use/land https://doi.org/10.1016/j.rse.2008.09.003, 2009. cover with Sentinel 2 and deep learning, in: 2021 IEEE Houborg, R. and McCabe, M.: High-Resolution NDVI from International Geoscience and Remote Sensing Symposium Planet’s Constellation of Earth Observing Nano-Satellites: A IGARSS, Brussels, Belgium, 4704–4707, 11–16 July 2021, New Data Source for Precision Agriculture, Remote Sensing, 8, https://doi.org/10.1109/igarss47720.2021.9553499, 2021. 768, https://doi.org/10.3390/rs8090768, 2016. Kganyago, M., Mhangara, P., Alexandridis, T., Laneve, G., Huang, J., Gómez-Dans, J. L., Huang, H., Ma, H., Wu, Ovakoglou, G., and Mashiyi, N.: Validation of sentinel-2 leaf Q., Lewis, P. E., Liang, S., Chen, Z., Xue, J.-H., Wu, area index (LAI) product derived from SNAP toolbox and Y., Zhao, F., Wang, J., and Xie, X.: Assimilation of its comparison with global LAI products in an African semi- remote sensing into crop growth models: Current status arid agricultural landscape, Remote Sens. Lett., 11, 883–892, and perspectives, Agr. Forest Meteorol., 276–277, 107609, https://doi.org/10.1080/2150704x.2020.1767823, 2020. https://doi.org/10.1016/j.agrformet.2019.06.008, 2019. Kouadio, L., Duveiller, G., Djaby, B., El Jarroudi, M., Defourny, P., Ibrahim, E. S., Rufin, P., Nill, L., Kamali, B., Nendel, C., and and Tychon, B.: Estimating regional wheat yield from the shape Hostert, P.: Mapping Crop Types and Cropping Systems in of decreasing curves of green area index temporal profiles re- Nigeria with Sentinel-2 Imagery, Remote Sensing, 13, 3523, trieved from MODIS data, Int. J. Appl. Earth Obs., 18, 111–118, https://doi.org/10.3390/rs13173523, 2021. https://doi.org/10.1016/j.jag.2012.01.009, 2012. Jain, M., Srivastava, A. K., Balwinder-Singh, Joon, R. 
K., Lambert, M.-J., Traoré, P. C. S., Blaes, X., Baret, P., and Defourny, McDonald, A., Royal, K., Lisaius, M. C., and Lobell, P.: Estimating smallholder crops production at village level from D. B.: Mapping Smallholder Wheat Yields and Sowing Sentinel-2 time series in Mali’s cotton belt, Remote Sens. En- Dates Using Micro-Satellite Data, Remote Sensing, 8, 860, viron., 216, 647–657, https://doi.org/10.1016/j.rse.2018.06.036, https://doi.org/10.3390/rs8100860, 2016. 2018. Jin, Z., Azzari, G., and Lobell, D. B.: Improving the accuracy of le Maire, G., François, C., and Dufrêne, E.: Towards universal broad satellite-based high-resolution yield estimation: A test of mul- leaf chlorophyll indices using PROSPECT simulated database tiple scalable approaches, Agr. Forest Meteorol., 247, 207–220, and hyperspectral reflectance measurements, Remote Sens. Env- https://doi.org/10.1016/j.agrformet.2017.08.001, 2017. iron., 89, 1–28, https://doi.org/10.1016/j.rse.2003.09.004, 2004. Jin, Z., Azzari, G., You, C., Di Tommaso, S., Aston, S., Burke, M., le Maire, G., François, C., Soudani, K., Berveiller, D., Pontailler, and Lobell, D. B.: Smallholder maize area and yield mapping J.-Y., Bréda, N., Genet, H., Davi, H., and Dufrêne, E.: Calibra- at national scales with Google Earth Engine, Remote Sens. En- tion and validation of hyperspectral indices for the estimation of viron., 228, 115–128, https://doi.org/10.1016/j.rse.2019.04.016, broadleaved forest leaf chlorophyll content, leaf mass per area, 2019. leaf area index and leaf canopy biomass, Remote Sens. Envi- Jolivot, A., Lebourgeois, V., Leroux, L., Ameline, M., Andria- ron., 112, 3846–3864, https://doi.org/10.1016/j.rse.2008.06.005, manga, V., Bellón, B., Castets, M., Crespin-Boucaud, A., De- 2008. 
fourny, P., Diaz, S., Dieye, M., Dupuy, S., Ferraz, R., Gaetano, Lewis, P., Gómez-Dans, J., Kaminski, T., Settle, J., Quaife, T., Go- R., Gely, M., Jahel, C., Kabore, B., Lelong, C., le Maire, G., bron, N., Styles, J., and Berger, M.: An Earth Observation Land Lo Seen, D., Muthoni, M., Ndao, B., Newby, T., de Oliveira Data Assimilation System (EO-LDAS), Remote Sens. Environ., Santos, C. L. M., Rasoamalala, E., Simoes, M., Thiaw, I., 120, 219–235, https://doi.org/10.1016/j.rse.2011.12.027, 2012. Timmermans, A., Tran, A., and Bégué, A.: Harmonized in Earth Syst. Sci. Data, 14, 5387–5410, 2022 https://doi.org/10.5194/essd-14-5387-2022 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana 5409 Loew, A., Bell, W., Brocca, L., Bulgin, C. E., Burdanowitz, J., Planet: Planet imagery product specifications, Planet Team, Calbet, X., Donner, R. V., Ghent, D., Gruber, A., Kaminski, San Francisco, CA, USA, https://assets.planet.com/docs/Planet_ T., Kinzel, J., Klepp, C., Lambert, J.-C., Schaepman-Strub, Combined_Imagery_Product_Specs_letter_screen.pdf (last ac- G., Schröder, M., and Verhoelst, T.: Validation practices for cess: 22 November 2022), 2018. satellite-based Earth observation data across communities, Rev. Pritchard, R., Alexandridis, T., Amponsah, M., Ben Kha- Geophys., 55, 779–817, https://doi.org/10.1002/2017rg000562, tra, N., Brockington, D., Chiconela, T., Ortuño Castillo, 2017. J., Garba, I., Gómez-Giménez, M., Haile, M., Kagoyire, Louis, J., Debaecker, V., Pflug, B., Main-Knorn, M., Bieniarz, C., Kganyago, M., Kleine, D., Korme, T., Manni, A. 
A., J., Mueller-Wilm, U., Cadau, E., and Gascon, F.: Sentinel-2 Mashiyi, N., Massninga, J., Mensah, F., Mugabowindekwe, Sen2Cor: L2a processor for users, in: Proceedings of the Liv- M., Meta, V., Noort, M., Pérez Ramirez, P., Suárez Bel- ing Planet Symposium, Proceedings of the conference held 9– trán, J., and Zoungrana, E.: Developing capacity for impact- 13 May 2016 in Prague, Czech Republic, edited by: Ouwe- ful use of Earth Observation data: Lessons from the Afri- hand, L., ESA-SP Volume 740, ISBN 978-92-9221-305-3, 91, CultuReS project, Environmental Development, 42, 100695, https://elib.dlr.de/107381/1/LPS2016_sm10_3louis.pdf (last ac- https://doi.org/10.1016/j.envdev.2021.100695, 2022. cess: 22 November 2022), 2016. Ragasa, C., Chapoto, A., and Kolavalli, S.: Maize productivity in MacCarthy, D. S., Akponikpe, P. B. I., Narh, S., and Tegbe, Ghana, GSSP Policy Note 5, International Food Policy Research R.: Modeling the effect of seasonal climate variability on Institute (IFPRI), Washington, D. C., http://ebrary.ifpri.org/cdm/ the efficiency of mineral fertilization on maize in the coastal ref/collection/p15738coll2/id/128263 (last access: 22 Novem- savannah of Ghana, Nutr. Cycl. Agroecosys., 102, 45–64, ber 2022), 2014. https://doi.org/10.1007/s10705-015-9701-x, 2015. Rustowicz, R., Cheong, R., Wang, L., Ermon, S., Burke, MacCarthy, D. S., Adiku, S. G. K., Freduah, B. S., Gbefo, M., and Lobell, D.: Semantic Segmentation of Crop F., and Kamara, A. Y.: Using CERES-Maize and ENSO Type in Africa: A Novel Dataset and Analysis of Deep as Decision Support Tools to Evaluate Climate-Sensitive Learning Methods, in: Proceedings of the IEEE Con- Farm Management Practices for Maize Production in the ference on Computer Vision and Pattern Recognition Northern Regions of Ghana, Front. Plant Sci., 8, 31, Workshops, 75–82, Long Beach (CA), 16–20 June 2019, https://doi.org/10.3389/fpls.2017.00031, 2017. http://openaccess.thecvf.com/content_CVPRW_2019/papers/ Mkhabela, M. 
S., Bullock, P., Raj, S., Wang, S., and Yang, cv4gc/Rustowicz_Semantic_Segmentation_of_Crop_Type_ Y.: Crop yield forecasting on the Canadian Prairies using in_Africa_A_Novel_Dataset_CVPRW_2019_paper.pdf (last MODIS NDVI data, Agric. For. Meteorol., 151, 385–393, access: 22 November 2022), 2019. https://doi.org/10.1016/j.agrformet.2010.11.012, 2011. Sánchez, P. A.: Tripling crop yields in tropical Africa, Nat. Geosci., Moser, S. B., Feil, B., Jampatong, S., and Stamp, P.: Ef- 3, 299–300, https://doi.org/10.1038/ngeo853, 2010. fects of pre-anthesis drought, nitrogen fertilizer rate, and Smith, M.-L., Ollinger, S. V., Martin, M. E., Aber, J. D., variety on grain yield, yield components, and harvest in- Hallett, R. A., and Goodale, C. L.: Direct Estimation dex of tropical maize, Agr. Water Manage., 81, 41–58, Of Aboveground Forest Productivity Through Hyper- https://doi.org/10.1016/j.agwat.2005.04.005, 2006. spectral Remote Sensing Of Canopy Nitrogen, Ecol. Nakalembe, C., Becker-Reshef, I., Bonifacio, R., Hu, G., Hum- Appl., 12, 1286–1302, https://doi.org/10.1890/1051- ber, M. L., Justice, C. J., Keniston, J., Mwangi, K., Rem- 0761(2002)012[1286:deoafp]2.0.co;2, 2002. bold, F., Shukla, S., Urbano, F., Whitcraft, A. K., Li, Srivastava, A. K., Mboh, C. M., Gaiser, T., Webber, H., and Y., Zappacosta, M., Jarvis, I., and Sanchez, A.: A re- Ewert, F.: Effect of sowing date distributions on simula- view of satellite-based global agricultural monitoring sys- tion of maize yields at regional scale – A case study tems available for Africa, Global Food Security, 29, 100543, in Central Ghana, West Africa, Agr. Syst., 147, 10–23, https://doi.org/10.1016/j.gfs.2021.100543, 2021. https://doi.org/10.1016/j.agsy.2016.05.012, 2016. 
Nations, U.: Ensuring food and nutrition security, in: World Statistics, Research and Information Directorate (SRID): Agri- economic and social survey 2013, United Nations, 85–119, culture in Ghana – Facts and figures (2010), Ministry of Food https://doi.org/10.18356/0e3c4bbb-en, 2013. and Agriculture (MoFA), http://gis4agricgh.net/POLICIES/ Nguy-Robertson, A., Gitelson, A., Peng, Y., Viña, A., Arke- AGRICULTURE-IN-GHANA-FF-2010.pdf (last access: bauer, T., and Rundquist, D.: Green Leaf Area Index Esti- 22 November 2022), 2011. mation in Maize and Soybean: Combining Vegetation Indices Sultan, B. and Gaetani, M.: Agriculture in West Africa in the to Achieve Maximal Sensitivity, Agron. J., 104, 1336–1347, Twenty-First Century: Climate Change and Impacts Scenar- https://doi.org/10.2134/agronj2012.0065, 2012. ios, and Potential for Adaptation, Front. Plant Sci., 7, 1262, Nyantakyi-Frimpong, H. and Bezner-Kerr, R.: The relative im- https://doi.org/10.3389/fpls.2016.01262, 2016. portance of climate change in the context of multiple stres- Turner, D. P., Cohen, W. B., Kennedy, R. E., Fassnacht, K. S., sors in semi-arid Ghana, Glob. Environ. Change, 32, 40–56, and Briggs, J. M.: Relationships between Leaf Area Index https://doi.org/10.1016/j.gloenvcha.2015.03.003, 2015. and Landsat TM Spectral Vegetation Indices across Three Petersen, L.: Real-Time Prediction of Crop Yields From Temperate Zone Sites, Remote Sens. Environ., 70, 52–68, MODIS Relative Vegetation Health: A Continent- https://doi.org/10.1016/s0034-4257(99)00057-7, 1999. Wide Analysis of Africa, Remote Sensing, 10, 1726, Unganai, L. S. and Kogan, F. N.: Drought Monitoring and Corn https://doi.org/10.3390/rs10111726, 2018. Yield Estimation in Southern Africa from AVHRR Data, Remote https://doi.org/10.5194/essd-14-5387-2022 Earth Syst. Sci. Data, 14, 5387–5410, 2022 5410 J. L. Gómez-Dans et al.: Biophysical parameters for maize in northern Ghana Sens. 
Environ., 63, 219–232, https://doi.org/10.1016/s0034- Whitcraft, A. K., Becker-Reshef, I., Justice, C. O., Gifford, L., Kav- 4257(97)00132-6, 1998. vada, A., and Jarvis, I.: No pixel left behind: Toward integrating van Loon, M. P., Adjei-Nsiah, S., Descheemaeker, K., Akotsen- Earth Observations for agriculture into the United Nations Sus- Mensah, C., van Dijk, M., Morley, T., van Ittersum, tainable Development Goals framework, Remote Sens. Environ., M. K., and Reidsma, P.: Can yield variability be ex- 235, 111470, https://doi.org/10.1016/j.rse.2019.111470, 2019. plained? Integrated assessment of maize yield gaps across Xiong, J., Thenkabail, P., Tilton, J., Gumma, M., Teluguntla, P., smallholders in Ghana, Field Crop. Res., 236, 132–144, Oliphant, A., Congalton, R., Yadav, K., and Gorelick, N.: Nomi- https://doi.org/10.1016/j.fcr.2019.03.022, 2019. nal 30-m Cropland Extent Map of Continental Africa by Integrat- Verrelst, J., Malenovský, Z., Van der Tol, C., Camps-Valls, G., ing Pixel-Based and Object-Based Algorithms Using Sentinel-2 Gastellu-Etchegorry, J.-P., Lewis, P., North, P., and Moreno, and Landsat-8 Data on Google Earth Engine, Remote Sensing, 9, J.: Quantifying Vegetation Biophysical Variables from Imaging 1065, https://doi.org/10.3390/rs9101065, 2017. Spectroscopy Data: A Review on Retrieval Methods, Surv. Geo- Yang, Y., Luo, J., Huang, Q., Wu, W., and Sun, Y.: Weighted phys., 40, 589–629, https://doi.org/10.1007/s10712-018-9478-y, Double-Logistic Function Fitting Method for Reconstructing the cited By :131, 2018. High-Quality Sentinel-2 NDVI Time Series Data Set, Remote Wang, S., Guan, K., Wang, Z., Ainsworth, E. A., Zheng, T., Sensing, 11, 2342, https://doi.org/10.3390/rs11202342, 2019. Townsend, P. A., Li, K., Moller, C., Wu, G., and Jiang, C.: Zhang, X., Friedl, M. A., Schaaf, C. B., Strahler, A. H., Hodges, Unique contributions of chlorophyll and nitrogen to predict crop J. C., Gao, F., Reed, B. 
C., and Huete, A.: Monitoring vegetation photosynthetic capacity from leaf spectroscopy, J. Exp. Bot., 72, phenology using MODIS, Remote Sens. Environ., 84, 471–475, 341–354, https://doi.org/10.1093/jxb/eraa432, 2020. https://doi.org/10.1016/s0034-4257(02)00135-9, 2003. Earth Syst. Sci. Data, 14, 5387–5410, 2022 https://doi.org/10.5194/essd-14-5387-2022