Anthill Inside 2019

A conference on AI and Deep Learning

PySpark for GeoSpatial Data


Prakhar Srivastava


GeoSpatial data is a key external data source, but it is often too large to process efficiently. This is where PySpark comes in: it reduces computation time and makes the whole code more than 5 times faster. This workshop aims to solve the problem of calculating the land covered with greenery in a region (Delhi) using satellite images and Python.


Part - 1
The session will start with an introduction to the external data sources, viz. satellite images (Landsat-8, Sentinel-2) and the shapefile corpus, followed by the basics of GeoSpatial data.

Libraries used: Shapely, Geopandas, Rasterio

Part - 2
The session will then move on to an essential use case of Geospatial data: finding the green cover of a given region. This use case focuses on Delhi and how the vegetation of the region has changed over a span of 5 years. This requires an introduction to the concept of green cover and how it can be calculated using satellite imagery. I’ll also show how to parallelize this process using Python libraries.

Libraries used: Numba, Numpy, Multiprocessing, Multithreading
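
As a rough illustration of the green cover calculation described in Part 2, the sketch below uses NDVI (Normalized Difference Vegetation Index), a common vegetation measure computed from the red and near-infrared bands. The abstract does not say which index the workshop uses, so NDVI, the function names, and the 0.3 threshold are all assumptions made for this example, and the input is a tiny synthetic array rather than real Landsat-8 or Sentinel-2 data.

```python
import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); pixels with a zero denominator get 0."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    denom = nir + red
    out = np.zeros_like(denom)
    # Divide only where the denominator is non-zero; other pixels stay at 0.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def green_cover_fraction(red, nir, threshold=0.3):
    """Share of pixels whose NDVI exceeds the threshold (0.3 is an assumed cut-off)."""
    return float((ndvi(red, nir) > threshold).mean())


# Synthetic 2x2 reflectance values, for illustration only.
# With real data these would be the red and near-infrared bands
# (bands 4 and 5 for Landsat-8, B4 and B8 for Sentinel-2).
red = np.array([[0.10, 0.30], [0.20, 0.00]])
nir = np.array([[0.50, 0.30], [0.25, 0.00]])
fraction = green_cover_fraction(red, nir)
```

Because each pixel is computed independently, the image can be split into tiles and the per-tile work handed to separate processes or threads, which is the kind of parallelism this part of the session covers.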

Part - 3
This part of the session takes the code written so far and modifies it so that PySpark can run the existing Python code much faster.
It also explains the idea behind Spark, why it works so well, and how to hack your way to getting GeoSpatial data working with PySpark.

Libraries used: PySpark
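
One way to picture how existing Python code can be handed to PySpark is sketched below: a plain per-tile function is kept unchanged and Spark only distributes the calls and sums the results. This is a minimal sketch under stated assumptions, not the workshop's actual code. The names `count_green` and `green_cover_with_spark`, the NDVI-based rule, the 0.3 threshold, and the local `local[*]` master are all hypothetical choices for illustration.

```python
import numpy as np


def count_green(tile, threshold=0.3):
    """Return (green_pixels, total_pixels) for one (red, nir) tile.

    Uses NDVI with an assumed threshold; this is ordinary NumPy code
    that knows nothing about Spark.
    """
    red, nir = tile
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    denom = nir + red
    ndvi = np.zeros_like(denom)
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return int((ndvi > threshold).sum()), int(ndvi.size)


def green_cover_with_spark(tiles, app_name="green-cover-sketch"):
    """Distribute count_green over a non-empty list of tiles and return the green fraction."""
    # Imported lazily so the per-tile function can be used and tested without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName(app_name).getOrCreate()
    try:
        green, total = (
            spark.sparkContext.parallelize(tiles)
            .map(count_green)  # runs the unchanged Python function on each tile
            .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))  # sum the counts
        )
    finally:
        spark.stop()
    return green / total
```

Calling `green_cover_with_spark(tiles)` with a list of `(red, nir)` array pairs would return the overall green fraction, provided PySpark and a Java runtime are installed. Returning counts rather than per-tile fractions keeps the final result correct when tiles differ in size.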

Data Sources: NASA (Landsat-8), ESA (Sentinel-2)


The workshop will be self-contained, with all the code and dependencies ready to execute.

Speaker bio

Greetings to everyone,

I’m Prakhar Srivastava, researcher and open source lover. I have worked on Deep learning models for almost 4 years now and mentor the course on Coursera. I’ve done research with India’s leading research college, IIIT Delhi. I’ve worked as a student developer in GSoC’18 under OpenAstronomy and as a team leader at the Stanford Scholar initiative. I’ve hosted complete lecture sessions on Deep learning in my college under IEEE. I’m currently working as a Machine Learning Engineer at Atlan, a leading data democratization startup where we work with humans of data around the world to help them do their lives’ best work.