“Without data you’re just another person with an opinion.” A friend of mine eagerly quoted Deming while I was complaining about how painful data scarcity is in development economics. The quote became my favorite for a few days.
The lack of data is a nightmare, and it is particularly scary for researchers in development economics. Developing countries often lack the resources or incentives to construct and maintain reliable databases. For example, data are often not available at a subnational level; data are not collected during wartime or in recent post-conflict periods; or, worst of all, the data have never existed (and probably never will…). Moreover, development topics themselves can be too sensitive or too “irrational”, which makes finding a good data source tricky. How could you survey a local Indonesian government about its corruption through illegal logging?
Luckily, many economists have broken the spell of data scarcity: some have creatively designed and collected field data, while others have found ingenious statistical methods to uncover forgotten troves of data. Among these sources, satellite data is a new star. Coupled with advances in computer science, the technological evolution of satellite and remote sensing instruments has created a burst of novel spatial datasets covering diverse topics beyond the scope of traditional empirical data: Uppsala armed conflict locations, historical Chinese boundaries and tribal areas, and South African drainage networks, to name a few. Moreover, since 2014, American companies have for the first time been allowed to sell images with a resolution finer than 0.5 meters to non-governmental buyers. This has given researchers access to finer variation across a wider range of topics.
However, satellite data has its own pitfalls. Like any other type of data, it can be subject to measurement error and privacy concerns. Further, spatial dependence (nearby observations tending to resemble one another) can plague the whole empirical analysis if it is not properly accounted for. And lastly, in my experience, extracting and managing spatial data is unfortunately no piece of cake.
Despite these pitfalls, satellite data has become an increasingly common tool for answering unlikely questions in development economics. To name a few, Burgess et al. (2012) use satellite data on deforestation to show that an increase in the number of political jurisdictions increases deforestation in Indonesia. Harari and La Ferrara (2015) exploit weather data on 1-degree grid cells to pin down the spatially contagious effects of droughts on armed conflicts in Africa. Marx, Stoker and Suri (2015) use satellite data on sunlight reflected from roofs to show that ethnicity matters for the housing market in Kenyan slums: co-ethnicity between residents and the tribal chief reduces prices and increases services in the slums. And lastly, by combining weather data with georeferenced data from the Demographic and Health Surveys, Kudamatsu, Persson, and Stromberg (2016) estimate the effects of weather variations on infant mortality.
If satellite data sparks your interest, Donaldson and Storeygard (2016) provide a great systematic review of recent applications of satellite data in economics. For now, read on for some hands-on experience with satellite data in development economics projects.
Long Hong – Master’s graduate, Bocconi:
- In which development economics project have you used spatial (satellite) data?
I used spatial data for one chapter of my thesis. The research question is: at a disaggregated level, how do ethnic segregation and fractionalization affect local conflict? In particular, using “Geo-referencing ethnic groups” (GREG) and high-resolution population data from Gridded Population of the World v.4, I calculated ethnic segregation and fractionalization at the level of 1 x 1 degree cells. Also, the conflict data from the Armed Conflict Location and Event dataset (ACLED) contain detailed geographic information, which allows me to locate each conflict event.
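To make the cell-level calculation concrete, here is a minimal Python sketch. It assumes the common Herfindahl-based fractionalization index (one minus the sum of squared group population shares), which may differ from the exact measure used in the thesis, and it does not attempt the segregation index. The function names and the record layout (latitude, longitude, group, population) are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def cell_id(lat, lon, size=1.0):
    """Return the south-west corner of the size x size degree cell containing a point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def fractionalization_by_cell(records, size=1.0):
    """Compute 1 - sum(share^2) of group population shares for each grid cell.

    records: iterable of (lat, lon, group, population) tuples (assumed layout).
    """
    pop = defaultdict(lambda: defaultdict(float))
    for lat, lon, group, people in records:
        pop[cell_id(lat, lon, size)][group] += people

    result = {}
    for cell, groups in pop.items():
        total = sum(groups.values())
        if total > 0:  # skip empty cells to avoid dividing by zero
            result[cell] = 1.0 - sum((p / total) ** 2 for p in groups.values())
    return result


def count_events_by_cell(events, size=1.0):
    """Count georeferenced events (lat, lon) per grid cell."""
    counts = defaultdict(int)
    for lat, lon in events:
        counts[cell_id(lat, lon, size)] += 1
    return dict(counts)
```

Using `math.floor` rather than truncation keeps points with negative coordinates in the correct cell, which matters for any study area that crosses the equator or the prime meridian.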
- Why did you decide to use spatial data instead of “traditional” economic datasets?
For my project, using spatial data allows me to analyze ethnicity and conflict at a subnational level instead of the traditional country level. And in general, traditional sources offer no subnational data for many variables, such as population and GDP.
- Some advantages and disadvantages in using spatial data for your project?
Advantage: It helps me understand how the spatial distribution of ethnic groups, which is measured as ethnic segregation, affects conflict. Also, it gives a very nice visual representation of the data and helps me gain a better sense of the data simply by looking at the maps.
Disadvantage: First, the software, ArcGIS, is not very user-friendly, although it is not hard to pick up. Second, since spatial datasets are usually very large, they can take a long time to process.
- Any favourite Stata commands that help specifically with spatial data?
You can use -shp2dta- and two other packages to import georeferenced data into Stata. For more information, please read http://www.stata.com/support/faqs/graphics/spmap-and-maps
- Do you have any recommendations for spatial data sources?
Yes, there are many sources: DIVA-GIS, UCDP-GED, the NASA website, PRIO-GRID v2, Gridded Population of the World, etc.
Lara Engelfriet – Consultant, OECD:
- In which development economics project have you used spatial data?
For my thesis, I used crowd-sourced datasets to analyze the effect of urban form (city size, urban density, land-use mix, polycentricity and spatial clustering) on the cost of commuting, expressed in distance and time, in large Chinese cities. Studies of European and U.S. cities have demonstrated that travel behavior is influenced by urban form. Based on these findings, policies steering the shape of cities have been proposed to reduce urban transport emissions and limit congestion. Such policies can also be relevant for the rapidly growing and motorizing Chinese cities. Yet empirical evidence on the relationship between urban form and car usage is scarce for the specific Chinese context.
- Advantages and disadvantages of using spatial data for your project?
The advantage was that we could include more indicators of urban form than just aggregate measures such as population density or city size. Those indicators are essential for defining the form of cities, because the internal distribution of resources in a city can be an important determinant of, say, commuting and accessibility. For example, the cities of Los Angeles and New York have the same average population density. However, they are very different in terms of the internal distribution of resources: New York has much higher-density clusters of people/jobs/businesses than L.A. This, in turn, largely determines the length of commuting trips and CO2 emissions.
The disadvantage was that spatial data are scarce for Chinese cities. Therefore, we had to make use of crowd-sourced datasets, which may be subject to certain biases.
- Do you have any recommendations for spatial data sources?
For studies on urbanization, built-up area and population data can be very useful. Both are currently publicly available with worldwide coverage and at high resolution (as detailed as cells of 12 meters). By combining these two types of data, it is possible to infer for each cell whether it is urban built-up area or not and how many people live there.
Furthermore, crowd-sourced datasets can be helpful in regions where data sources are scarce (in my research on Chinese cities I used crowd-sourced data from the Beijing City Lab).
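As a rough illustration of combining a built-up layer with a population layer, the Python sketch below classifies each cell as urban when its built-up share reaches a threshold and sums the population living in urban cells. The grids as nested lists, the 0.5 threshold and the function name are all assumptions made for this example; real layers would first need to be aligned to the same grid and resolution.

```python
def urban_population(built_up_share, population, threshold=0.5):
    """Classify cells as urban and total the population living in them.

    built_up_share: grid (list of rows) with the built-up fraction of each cell.
    population:     grid of the same shape with the population of each cell.
    threshold:      assumed cut-off above which a cell counts as urban.
    Returns (urban_mask, urban_total).
    """
    if len(built_up_share) != len(population):
        raise ValueError("grids must have the same number of rows")

    urban_mask = []
    urban_total = 0.0
    for share_row, pop_row in zip(built_up_share, population):
        if len(share_row) != len(pop_row):
            raise ValueError("grids must have the same number of columns")
        mask_row = [share >= threshold for share in share_row]
        urban_mask.append(mask_row)
        # Add up population only for cells flagged as urban.
        urban_total += sum(p for p, is_urban in zip(pop_row, mask_row) if is_urban)
    return urban_mask, urban_total
```

The threshold is a modelling choice, so it is worth checking how sensitive any result is to it.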
- Any additional suggestions or comments?
Use software that is specially developed to analyze spatial data! ArcGIS is a very good and easy-to-use program, and it makes very nice maps as well. However, licenses are very expensive. QGIS, an open-source alternative, offers almost the same functionality, though it is a bit more difficult to use. Both QGIS and ArcGIS have built-in Python consoles, which can be very useful if you want to automate your tasks.
Robert Grundke – Economist, OECD:
- For which project have you used spatial data?
The title of the paper is “Coerced Labor in a Global Supply Chain: How Higher Commodity Prices (Don’t) Transmit to the Poor”. In this project, I investigate whether the land privatization reform has affected the pass-through of cotton world market prices to rural labor markets.
- Any specific benefits from satellite data for your project?
A more robust identification. The empirical strategy is to identify which municipalities can and cannot grow cotton. The household survey contains pre-shock information on whether municipalities grow cotton or not. But this status might be endogenous to other municipality-level variables like labor supply and wages, which we use as dependent variables. So using georeferenced data on the suitability of land for cotton production from the FAO Global Agro-Ecological Zones (GAEZ) database was an important step to exogenously identify cotton and non-cotton municipalities (the treatment and control groups).
- Any shortcomings in using spatial data for your project?
A potential disadvantage is that matching geographical data to communities in the household survey can be time-consuming (e.g. wrong coordinates for communities in the survey, imprecise information in the geographic data).
- Any additional comments?
Always check the coordinates of communities and the precision of the GIS data! In my case, the database was highly erroneous with respect to community coordinates. More than 50% of the communities had wrong coordinates, and I had to retrieve the correct ones from old Russian maps and information on Tajik websites.
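One simple way to act on this advice is to compare the survey coordinates against an independent reference and flag large discrepancies. The Python sketch below is only an illustration, not the procedure used in the paper: the reference coordinates, the 10 km tolerance and the function names are assumptions, and distances are computed with the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_suspect_coordinates(survey, reference, tolerance_km=10.0):
    """Return names of communities whose survey coordinates look wrong.

    survey, reference: dicts mapping community name -> (lat, lon).
    A community is flagged if it has no reference entry or lies more than
    tolerance_km (an assumed cut-off) from its reference location.
    """
    suspect = []
    for name, (lat, lon) in survey.items():
        if name not in reference:
            suspect.append(name)
            continue
        ref_lat, ref_lon = reference[name]
        if haversine_km(lat, lon, ref_lat, ref_lon) > tolerance_km:
            suspect.append(name)
    return sorted(suspect)
```

Flagged communities still need manual checking against maps or other sources; the code only narrows down where to look.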
To conclude, satellite data can mitigate some of the data limitations in development economics, but it is not a cure-all. The other side of the coin is that without an opinion, you’re just another person with data. A great interest in development is key after all.