The Potential of Machine Learning for Development Economics Research Data

By Armin Satzger

This short article aims to provide a few pointers for fellow students and applied researchers working on development topics, e.g. in agriculture, environment, health or infrastructure, who are interested in learning more about how machine learning (ML) can help construct valuable datasets for economics research projects. Please note that the linked articles offer a far deeper dive into the topics mentioned than is possible in this brief, and therefore necessarily superficial, introduction to the subject.

How can machine learning contribute to economics?

Slowly but surely, ML methods and theory are being adopted in mainstream economics research in fields as diverse as financial economics (Gu et al., 2019), real estate (Glaeser et al., 2018), demand estimation (Bajari et al., 2015), human capital selection (Chalfin et al., 2016) and mechanism design (Dütting et al., 2019). Rather than focusing on ML methods, I would like to describe some of the new data sources that have now become accessible to economists, concentrating on those of potential relevance to (aspiring) development economics researchers. For those wishing to learn more about the similarities and differences between traditional econometric and machine learning methods, I suggest Athey & Imbens (2019) and Mullainathan & Spiess (2017), articles intended to supplement graduate-level introductory courses on ML methods for an audience already somewhat familiar with econometric theory.

What are the kinds of new data that are becoming accessible?

In general, a major advantage of new machine learning methods is that they make a much broader range of data accessible and usable. This applies in particular to areas where ML solves classification problems, such as assigning texts or images to categories.

Natural-language processing (NLP) can make large corpora of text more accessible to researchers by identifying certain themes, categories and events in them. Sources of text may include financial statements, administrative data, party manifestos or news sites. Last year, the Association for Computational Linguistics also started hosting an annual workshop series on economics and natural language processing (ECONLP); the proceedings of the first and second workshops are available online.
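To make the idea of sorting texts into themes a little more concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. It is only one simple classification technique among many; the labelled "news snippets", the labels `agriculture` and `health`, and all function names are hypothetical illustrations, not data or code from any of the work cited here. Real projects would use an established library and far larger labelled corpora.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(docs):
    """Fit a multinomial naive Bayes model on (text, label) pairs.

    Returns class log-priors, per-class word counts, per-class token
    totals and the vocabulary, which is everything prediction needs.
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    totals = {label: sum(c.values()) for label, c in word_counts.items()}
    log_priors = {label: math.log(n / len(docs)) for label, n in label_counts.items()}
    return log_priors, word_counts, totals, vocab

def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    log_priors, word_counts, totals, vocab = model
    scores = {}
    for label, prior in log_priors.items():
        score = prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen word/label pairs.
            score += math.log((word_counts[label][word] + 1) / (totals[label] + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labelled snippets standing in for, e.g., news articles.
training_docs = [
    ("drought reduces maize harvest and crop yields", "agriculture"),
    ("farmers receive fertilizer subsidies before planting season", "agriculture"),
    ("malaria cases rise as clinics run short of vaccines", "health"),
    ("new hospital expands maternal health services", "health"),
]

model = train_naive_bayes(training_docs)
print(predict(model, "rainfall shortage hurts maize yields"))  # agriculture
print(predict(model, "clinics report more malaria patients"))  # health
```

The same pattern (tokenize, count words per category, score new texts) scales to thousands of documents; in practice, the labelled training set is usually the expensive part.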

Computer vision, on the other hand, can help to classify large numbers of images correctly in a variety of areas. Glaeser et al. (2018), for example, use Google Street View images and computer vision techniques to investigate how the appearance of houses and their neighbourhoods affects real estate prices.

Satellite imagery is a particularly promising new data source. Satellite data can be used not only to construct night-time light intensity proxies for economic activity, as development economists have done before (see, e.g., Michalopoulos & Papaioannou, 2013), but also to estimate crop yields, air pollution and land cover change, amongst other things. A more extensive review of the possible uses of satellite data in economics can be found in Donaldson & Storeygard (2016). To the benefit of economists, remotely sensed data on a wide variety of topics is often freely available online in a readily usable form, as is the case with high-quality LANDSAT satellite data. A great example of the use of such data in a political economy / environmental economics context is Burgess et al. (2012), who study the impact of redistricting reforms in Indonesia on deforestation, using a dataset of 250m x 250m raster cells derived from MODIS sensor data.
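As a rough illustration of how a night-time light proxy can be built from a raster, the sketch below averages pixel values within each region, a simple form of "zonal statistics". The 4x4 grids, the region codes and the function name `zonal_mean` are hypothetical; real applications would read georeferenced rasters (e.g. with a GIS library) and align them with administrative boundaries before aggregating.

```python
from collections import defaultdict

def zonal_mean(values, zones):
    """Average raster values within each zone.

    `values` and `zones` are equally sized 2-D grids (lists of rows): each
    cell of `values` holds a pixel measurement (e.g. night-light radiance)
    and the matching cell of `zones` holds the ID of the region the pixel
    falls in. Cells whose zone is None (outside any region) are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}

# Hypothetical 4x4 night-light raster (arbitrary radiance units).
night_lights = [
    [0.0, 1.0, 8.0, 9.0],
    [0.0, 2.0, 7.0, 10.0],
    [1.0, 1.0, 3.0, 4.0],
    [0.0, 0.0, 2.0, 2.0],
]
# Hypothetical region ID for each pixel (None = outside the study area).
regions = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "C", "C"],
    [None, "C", "C", "C"],
]

result = zonal_mean(night_lights, regions)
print(result)  # A: 0.75, B: 8.5, C: 13/7 (about 1.86)
```

If the raster instead held 0/1 indicators of forest loss per cell, the same zonal mean would give the share of deforested cells per district, which is conceptually similar to the kind of district-level outcome studied with MODIS data in work such as Burgess et al. (2012).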

Why does this hold any relevance for the development field?

In development in particular, many non-experimental studies still rely heavily on household- or individual-level surveys, such as the Demographic and Health Surveys (DHS), which are funded by USAID and conducted in a largely standardized manner in many developing countries at relatively regular multi-year intervals. ML techniques may allow researchers to expand the range of accessible data, either by constructing novel datasets themselves or by relying on prior work by other researchers, such as Jean et al. (2016), who estimate poverty proxies at a very granular level by combining satellite imagery with ML algorithms. Since survey data of this kind are collected only at multi-year intervals, developing countries provide a particularly attractive setting for the new kinds of data described above, compared with more developed economies.

Last but not least, these advances in the development area also offer interesting opportunities for collaboration with researchers from other disciplines, in particular computer scientists working on ML. De-Arteaga et al. (2018) discuss a number of research areas where they see potential for making ML techniques more useful in overcoming the challenges typically associated with developing-country data, including making ML algorithms more robust to small and/or messy datasets; introducing decision support systems to, e.g., combat corruption and support the provision of health services; and improving transfer learning in natural-language processing for low-resource languages to reduce obstacles to the flow of information and knowledge.

This concludes this short piece, which I hope sheds some light on the new uses of data that machine learning enables in development economics. My motivation for writing it stems from my personal interest in the subject, and I would, of course, be delighted to discuss the topic further with anyone from the Bocconi community who shares this interest.

Armin Satzger is a student in the M.Sc. Economics and Social Sciences programme at Bocconi University. The topic of the blog post was also discussed at a recent session of the regular development coffee meetings between LEAP-affiliated Bocconi faculty members and Bocconi students interested in development.