Gabbar: Machine learning to guard OpenStreetMap

This submission has been added to the schedule

Submitted Apr 30, 2017

Section: Full talk for data engineering track
Technical level: Intermediate

OpenStreetMap is the largest free and open map of the world! An average of 2 million features are touched by volunteers around the world every single day. Amazing, isn’t it? The global scale and the local diversity bring a host of challenges for maintaining high-quality data on OpenStreetMap.

The Mapbox data team works closely with communities of mappers to validate and protect OpenStreetMap data. In this talk, I will do a deep dive into the diversity of mapping on OpenStreetMap, the intricate and challenging data quality problems, what we learned from building open tools to aid mappers, and how Gabbar, a machine-learning-based infrastructure, can guard OpenStreetMap from invalid and suspicious edits.

In this talk, I hope to share how open and collaborative projects like OpenStreetMap and Wikipedia are benefiting from open and collaborative machine learning, and the opportunities for us as volunteers to build cool and important technology in the open and use the power of AI for a better world for all of us. The intended audience is people interested in or practicing machine learning to solve data problems, as well as people interested in or contributing to the tech for open projects like OpenStreetMap and Wikipedia.

Outline

[Figure: Edits in a few minutes on OpenStreetMap]

1. OpenStreetMap (OSM)

OSM is the largest free and open map of the world, the Wikipedia of maps.
On a typical day, 2 million features are created, half a million are modified, and a quarter of a million are deleted.

2. Validation

The OSM community, the heartbeat of OpenStreetMap.
Interesting problems and inherent challenges.

3. Tools

OpenStreetMap changeset analyzer: https://osmcha.mapbox.com/
Rule based validation with: https://github.com/mapbox/osm-compare
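
As a rough illustration of what a rule-based check can look like, the sketch below compares the old and new versions of a feature and flags one kind of suspicious change. It is written in Python for illustration only; the function name, the feature dictionaries, and the rule itself are assumptions and not osm-compare's actual API.

```python
def flag_name_removed(old_feature, new_feature):
    """Hypothetical rule: flag an edit that drops the `name` tag of a feature."""
    # A missing version (feature created or deleted) is treated as having no tags.
    old_tags = (old_feature or {}).get("tags", {})
    new_tags = (new_feature or {}).get("tags", {})
    if "name" in old_tags and "name" not in new_tags:
        return {"rule": "name_removed", "old_name": old_tags["name"]}
    return None  # nothing suspicious under this rule


# Made-up sample data: the same feature before and after an edit.
old = {"id": 1, "tags": {"highway": "primary", "name": "MG Road"}}
new = {"id": 1, "tags": {"highway": "primary"}}
result = flag_name_removed(old, new)
```

Each rule returns either a small description of what it found or nothing, so many independent rules can be run over the same edit and their findings collected for a reviewer.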

4. Gabbar

Guarding OSM from invalid or suspicious edits.
Machine learning based infrastructure collaboratively built in the open.
Development workflow with Python data science tools.
Learnings, current model performance and impact.
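
A common way to report the performance of a classifier that flags edits is precision and recall on the flagged class. The sketch below computes both from labelled predictions in plain Python; the labels and sample data are made up for illustration and say nothing about Gabbar's actual model or results.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` (flagged) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when nothing is predicted or labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = problematic edit, 0 = good edit.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Precision answers how many flagged edits were truly problematic, while recall answers how many problematic edits were caught; which one matters more depends on how much reviewer time is available.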

5. Future

Using AI to help make OSM the best map of the world!
Using open collaborative machine learning for open collaborative projects.

Speaker bio

Hey, I am Bhargav Kowshik, a Software Engineer at Mapbox, Bengaluru. I build tools to scale data operations at Mapbox. I am passionate about people and communities, open data and technology, creativity and side projects. Previously, as the first engineer at Nextdrop, I helped build a platform to track water availability and consumption. You can contact me at:

Links

An open database of inconsistent edits observed on OSM: http://www.openstreetmap.org/user/manoharuss/diary/40118
Preparing accurate history and caching changesets: https://www.openstreetmap.org/user/geohacker/diary/40846
Common errors and unexplained edits observed: https://www.openstreetmap.org/user/nammala/diary/40338
Gabbar development and workflow: https://github.com/mapbox/gabbar/

Slides

https://bkowshik.github.io/fifth-elephant-2017/

The Fifth Elephant 2017
