PySpark for GeoSpatial Data

Nov 2019

23 Sat 08:30 AM – 05:30 PM IST

Taj M G Road, Bangalore, Bangalore

Submitted Aug 20, 2019

Section: Tutorials Technical level: Beginner Session type: Tutorial

GeoSpatial data is a key external data source, but it is often too large to process in reasonable time. This is where PySpark comes in: it reduces the computation time and makes the whole code more than 5 times faster. This workshop aims to solve the problem of calculating the land covered with greenery in a region (Delhi) using satellite images and Python.

Outline

Part - 1
The session will start with an introduction to the external data sources, viz. satellite images (Landsat-8, Sentinel-2), the shapefiles corpus, and the basics of GeoSpatial data.

Libraries used: Shapely, Geopandas, Rasterio

Part - 2
The session will then proceed to a very essential use case of GeoSpatial data: finding the green cover of a given region. This particular use case will focus on Delhi and how the vegetation of the region has changed over a span of 5 years. This requires an introduction to the concept of green cover and how it can be calculated using satellite imagery. I’ll also show how to parallelize this process using Python libraries.

Libraries used: Numba, NumPy, Multiprocessing, Multithreading
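
As a rough illustration of the green cover calculation, here is a minimal sketch. It assumes that green cover is estimated with NDVI, (NIR - Red) / (NIR + Red), which is a common vegetation index; the proposal itself does not name the index. The 0.3 threshold, the function names, and the tiny synthetic arrays are all made up for illustration, and real bands would be read from Landsat-8 or Sentinel-2 rasters instead. The scene is split into row blocks that are processed in threads, as one simple way to parallelize with plain Python.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

NDVI_THRESHOLD = 0.3  # assumed cut-off for "green" pixels; tune per sensor and season


def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red), with 0 where the denominator is 0."""
    red = red.astype(np.float64)  # cast first so unsigned bands cannot wrap around
    nir = nir.astype(np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def count_green(red, nir, threshold=NDVI_THRESHOLD):
    """Return (green_pixels, total_pixels) for one tile."""
    mask = ndvi(red, nir) > threshold
    return int(mask.sum()), int(mask.size)


def green_fraction_threaded(red, nir, n_tiles=4, threshold=NDVI_THRESHOLD):
    """Split the scene into row blocks and count green pixels per block in threads."""
    red_tiles = np.array_split(red, n_tiles, axis=0)
    nir_tiles = np.array_split(nir, n_tiles, axis=0)
    with ThreadPoolExecutor(max_workers=n_tiles) as pool:
        counts = list(
            pool.map(lambda r, n: count_green(r, n, threshold), red_tiles, nir_tiles)
        )
    green = sum(g for g, _ in counts)
    total = sum(t for _, t in counts)
    return green / total


# Synthetic 4x4 "scene": the top half is vegetated (high NIR, low red),
# the bottom half is built-up (similar NIR and red reflectance).
red = np.array([[50] * 4] * 2 + [[120] * 4] * 2, dtype=np.uint16)
nir = np.array([[200] * 4] * 2 + [[130] * 4] * 2, dtype=np.uint16)

fraction = green_fraction_threaded(red, nir, n_tiles=2)
print(f"Green cover: {fraction:.0%}")
```

Running the same function on scenes from two different years and comparing the two fractions would give a first estimate of how the vegetation has changed.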

Part - 3
This part of the session will take the code written so far and modify it so that PySpark can run the existing Python code at a much faster speed.
It will also explain the idea behind Spark, why it works so well, and how to hack your way to getting GeoSpatial data working with PySpark.

Libraries used: PySpark
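
One possible way to hand such a computation to PySpark is sketched below. This is not the workshop's code: it assumes the scene fits in driver memory as two NumPy arrays, splits it into row-block tiles, and uses the RDD API in local mode. The helper names (`tile_counts`, `add_counts`, `make_tiles`, `green_fraction_spark`), the NDVI threshold, and the tile count are hypothetical.

```python
import numpy as np

NDVI_THRESHOLD = 0.3  # assumed cut-off for "green" pixels


def tile_counts(tile, threshold=NDVI_THRESHOLD):
    """Return (green_pixels, total_pixels) for one (red, nir) tile pair."""
    red, nir = (band.astype(np.float64) for band in tile)
    denom = nir + red
    ndvi = np.zeros_like(denom)
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return int((ndvi > threshold).sum()), int(ndvi.size)


def add_counts(a, b):
    """Combine two (green, total) pairs; associative, so it is safe for reduce."""
    return a[0] + b[0], a[1] + b[1]


def make_tiles(red, nir, n_tiles):
    """Pair up row blocks of the red and NIR bands."""
    return list(
        zip(np.array_split(red, n_tiles, axis=0), np.array_split(nir, n_tiles, axis=0))
    )


def green_fraction_spark(red, nir, n_tiles=8, master="local[*]"):
    """Distribute the tiles as an RDD and reduce the per-tile counts."""
    # Imported here so the helpers above stay usable without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master(master).appName("green-cover-sketch").getOrCreate()
    )
    try:
        tiles = make_tiles(red, nir, n_tiles)
        rdd = spark.sparkContext.parallelize(tiles, numSlices=n_tiles)
        green, total = rdd.map(tile_counts).reduce(add_counts)
    finally:
        spark.stop()
    return green / total
```

Calling `green_fraction_spark(red, nir)` with two equally shaped band arrays returns the fraction of pixels above the threshold. Because each tile is reduced to a pair of integers before anything is sent back, only small values travel between the workers and the driver.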

Data Sources: NASA (Landsat-8), ESA (Sentinel-2)

Requirements

The workshop will be self-contained, with all the code and dependencies ready to execute.

Speaker bio

Greetings to everyone,

I’m Prakhar Srivastava, a researcher and open source lover. I have worked on deep learning models for almost 4 years now and mentor the deeplearning.ai course on Coursera. I’ve done research with India’s leading research college, IIIT Delhi (http://midas.iiitd.edu.in/team/prakhar.html). I’ve worked as a student developer in GSoC’18 under OpenAstronomy and as a team leader at the Stanford Scholar initiative. I’ve hosted complete lecture sessions on deep learning in my college under IEEE. I’m currently working as a Machine Learning Engineer at Atlan (www.atlan.com), a leading data democratization startup where we work with humans of data around the world to help them do their lives’ best work.

Anthill Inside 2019