The Fifth Elephant 2016

India's most renowned data science conference

The Fifth Elephant is India’s most renowned data science conference. It is a space for discussing some of the most cutting-edge developments in machine learning, data science and the technology that powers data collection and analysis.

Machine Learning, Distributed and Parallel Computing, and High-performance Computing continue to be the themes for this year’s edition of The Fifth Elephant.

We are now accepting submissions for our next edition, which will take place in Bangalore on 28-29 July 2016.

#Tracks

We are looking for application-level and tool-centric talks and tutorials on the following topics:

  1. Deep Learning
  2. Text Mining
  3. Computer Vision
  4. Social Network Analysis
  5. Large-scale Machine Learning (ML)
  6. Internet of Things (IoT)
  7. Computational Biology
  8. ML in healthcare
  9. ML in education
  10. ML in energy and ecology
  11. ML in agriculture
  12. Analytics for emerging markets
  13. ML in e-governance
  14. ML in smart cities
  15. ML in defense

The deadline for submitting proposals is 30 April 2016.

Format

This year’s edition spans two days of hands-on workshops and conference sessions. We are inviting proposals for:

  • Full-length 40-minute talks.
  • Crisp 15-minute talks.
  • Sponsored sessions of 15 minutes (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on workshop sessions of 3 or 6 hours.

Selection process

Proposals will be filtered and shortlisted by an Editorial Panel. We urge you to add links to videos / slide decks when submitting proposals. This will help us understand your past speaking experience. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals.

We expect you to submit an outline of your proposed talk, as a mind map, a text document or draft slides, within two weeks of submitting your proposal.

We will notify you about the status of your proposal within three weeks of submission.

Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you prepare well for the conference.

There is only one speaker per session. Entry is free for selected speakers. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover travel for anyone exceptional. HasGeek will provide a grant to cover part of your travel and accommodation in Bangalore. Grants are limited and made available to speakers delivering full sessions (40 minutes or longer).

Commitment to open source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Key dates and deadlines

  • Revised paper submission deadline: 17 June 2016
  • Confirmed talks announcement (in batches): 13 June 2016
  • Schedule announcement: 30 June 2016
  • Conference dates: 28-29 July 2016

##Venue
The Fifth Elephant will be held at the NIMHANS Convention Centre, Dairy Circle, Bangalore.

##Contact
For more information about speaking proposals, tickets and sponsorships, contact info@hasgeek.com or call +91-7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Aruna S

@arunasank

Reducing the world with JavaScript

Submitted Apr 30, 2016

The Earth is a staggering dataset.

OpenStreetMap is the largest living open map of the world with a collection of over 1B mapped roads and ~2B mapped buildings. Processing this massive dataset can lead to a lot of interesting analyses about the world, but can also be really slow - enter the open source TileReduce module.

We use TileReduce for a lot of our geo-data processing requirements at Mapbox. TileReduce breaks down the processing of a massive amount of geo-data using the MapReduce concept on vector tiles. For example, a computer with 4 cores can process over 16B tiles, and filter ~90k features and 2B tiles based on pre-defined conditions in ~10 minutes.

TileReduce runs on vector tilesets, which means that it can work with any data format (GeoJSON, OSM files, shapefiles, and CSV or XML files containing geodata) that can be converted into a vector tileset.

This talk will cover the JavaScript principles behind TileReduce, how parallelising works in TileReduce and the skeleton of a simple program written with TileReduce. It will also showcase some ways in which you can use TileReduce in your projects.

Outline

A single image of the world containing street-level data would be far too large to hold in memory or to download at once, which leads us to map tiles. Map tiles are usually 256px by 256px in size and are placed next to each other to create the illusion of a single image.

Tiles can be either rasters or vectors. While raster tiles contain pixel information and require processing on the server side before being rendered, vector tiles hold tile data either in a human-readable GeoJSON format or as protobufs, making them easier and faster to process and render on the client side. Vector tiles also contain useful geodata that can be parsed.
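To make the idea of a human-readable vector tile more concrete, here is a minimal sketch of what decoded tile data might look like as a GeoJSON FeatureCollection, and how its features can be read in plain JavaScript. The coordinates, names and tag values are invented for illustration, and the helper functions `countByGeometryType` and `featuresWithTag` are hypothetical; only the `type` / `features` / `geometry` / `properties` layout comes from the GeoJSON format itself.

```javascript
// Hypothetical contents of one small vector tile, expressed as GeoJSON.
// These features are invented sample data, not real OpenStreetMap records.
const tileData = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      geometry: { type: 'LineString', coordinates: [[77.59, 12.97], [77.6, 12.98]] },
      properties: { highway: 'residential', name: 'Sample Road' }
    },
    {
      type: 'Feature',
      geometry: {
        type: 'Polygon',
        coordinates: [[[77.59, 12.97], [77.591, 12.97], [77.591, 12.971], [77.59, 12.971], [77.59, 12.97]]]
      },
      properties: { building: 'yes' }
    },
    {
      type: 'Feature',
      geometry: { type: 'Point', coordinates: [77.595, 12.975] },
      properties: { amenity: 'cafe' }
    }
  ]
};

// Count how many features of each geometry type the tile holds.
function countByGeometryType(collection) {
  const counts = {};
  for (const feature of collection.features) {
    const kind = feature.geometry.type;
    counts[kind] = (counts[kind] || 0) + 1;
  }
  return counts;
}

// Select the features that carry a given property key (for example "building").
function featuresWithTag(collection, key) {
  return collection.features.filter((f) => Boolean(f.properties) && key in f.properties);
}

console.log(countByGeometryType(tileData));
console.log(featuresWithTag(tileData, 'building').length);
```

Because the tile is ordinary structured data rather than pixels, questions such as "how many buildings are in this tile?" become simple filters over the feature list.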

The greater the number of tiles that a map is composed of, the greater the detail that map can show. In order to manage millions of tile images/data, web maps use a simple coordinate system. Each tile has a z coordinate describing its zoom level and x and y coordinates describing its position within a square grid for that zoom level: z/x/y.
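As a rough illustration of the z/x/y addressing, the sketch below converts a longitude/latitude pair into tile coordinates. It assumes the Web Mercator "XYZ" tiling convention commonly used by web maps, with x counted from the west edge and y counted from the north edge; that convention, and the function names `lonLatToTile` and `tilePath`, are assumptions for this example rather than something the outline specifies.

```javascript
// Convert a longitude/latitude pair (in degrees) to tile coordinates at zoom z,
// assuming the common Web Mercator XYZ tiling scheme (an assumption, see above).
function lonLatToTile(lon, lat, z) {
  const n = Math.pow(2, z); // number of tiles along each side of the grid at zoom z
  const latRad = (lat * Math.PI) / 180;
  const xRaw = Math.floor(((lon + 180) / 360) * n);
  const yRaw = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so that edge values (for example lon = 180) stay inside the grid.
  const clamp = (v) => Math.min(Math.max(v, 0), n - 1);
  return { z: z, x: clamp(xRaw), y: clamp(yRaw) };
}

// Format a tile as the z/x/y string used to address it.
function tilePath(tile) {
  return tile.z + '/' + tile.x + '/' + tile.y;
}

console.log(tilePath(lonLatToTile(0, 0, 1))); // prints "1/1/1"
console.log(tilePath(lonLatToTile(77.5946, 12.9716, 12))); // a tile covering part of Bangalore
```

The same point falls into a different tile at every zoom level, which is why a tile is only identified by all three numbers together.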

Zoom levels are related to each other by powers of four:

  • z0 contains 1 tile.
  • z1 contains 4 tiles.
  • z2 contains 16 tiles.
  • ...
  • zn contains 2^n * 2^n = 4^n tiles.

As you can see, the number of tiles increases exponentially with the zoom level, which leads to an exponential increase in bandwidth and memory requirements, not to mention greater difficulty in parsing and analysing so much data.
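The growth described above is easy to check with a few lines of JavaScript. This is only a sketch of the arithmetic: the function names are made up for this example, and `childrenOf` assumes the usual quadtree layout in which each tile splits into four tiles at the next zoom level.

```javascript
// Number of tiles in the grid at a single zoom level: 2^z * 2^z = 4^z.
function tilesAtZoom(z) {
  return Math.pow(4, z);
}

// Total number of tiles across all zoom levels from 0 up to and including z.
function tilesUpToZoom(z) {
  let total = 0;
  for (let level = 0; level <= z; level++) {
    total += tilesAtZoom(level);
  }
  return total;
}

// The four tiles at zoom z + 1 that cover the same area as the given tile
// (assuming the usual quadtree subdivision).
function childrenOf(tile) {
  const z = tile.z + 1;
  const x = tile.x * 2;
  const y = tile.y * 2;
  return [
    { z: z, x: x, y: y },
    { z: z, x: x + 1, y: y },
    { z: z, x: x, y: y + 1 },
    { z: z, x: x + 1, y: y + 1 }
  ];
}

for (const z of [0, 1, 2, 12, 15]) {
  console.log('z' + z + ': ' + tilesAtZoom(z) + ' tiles, ' + tilesUpToZoom(z) + ' in total up to this level');
}
console.log(childrenOf({ z: 1, x: 1, y: 0 }));
```

At z12 the grid already holds 16,777,216 tiles, which gives a sense of why processing them one at a time in a single thread quickly becomes impractical.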

TileReduce was written to process, or mine geodata from, these millions of tiles asynchronously using the MapReduce concept on vector tiles, making it one of the fastest ways to parse tile data for the whole world. Compared to the complex PostgreSQL queries you would have to write to do the same operations, TileReduce is also an extremely easy library to use.
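The MapReduce idea applied to tiles can be sketched with nothing more than `Array.prototype.map` and `Array.prototype.reduce`. This is not TileReduce's API: the tile ids and features below are invented, the names `mapTile` and `reduceCounts` are hypothetical, and the map step here runs sequentially in a single process, whereas TileReduce is described above as processing tiles asynchronously.

```javascript
// Invented sample input: each entry stands in for the features of one vector tile.
const tiles = [
  {
    id: '12/2930/1899',
    features: [{ properties: { building: 'yes' } }, { properties: { highway: 'primary' } }]
  },
  {
    id: '12/2931/1899',
    features: [{ properties: { building: 'yes' } }, { properties: { building: 'yes' } }]
  },
  { id: '12/2930/1900', features: [] }
];

// Map step: looks at one tile in isolation and returns a small result for it.
// Because it needs no information from other tiles, it could run in parallel.
function mapTile(tile) {
  return tile.features.filter((f) => 'building' in f.properties).length;
}

// Reduce step: folds the per-tile results into a single answer.
function reduceCounts(total, count) {
  return total + count;
}

const perTile = tiles.map(mapTile);
const totalBuildings = perTile.reduce(reduceCounts, 0);

console.log(perTile); // [ 1, 2, 0 ]
console.log(totalBuildings); // 3
```

The key design point is that the map function never reaches outside its own tile, so the work can be split across as many workers as are available and only the small per-tile results need to be combined.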

Outline

  • Web Maps
  • Tiles
    • Vector and Raster
    • Tile formats
    • What does a simple, human readable vector tile contain?
  • JavaScript Map Reduce
  • Asynchronous processing
  • Vector Tiles and Map Reduce
  • TileReduce
    • Background and history
    • Vector tiles - tools to convert certain data types to vector tiles.
    • Program skeleton
    • Examples
    • Limitations
  • Q&A

Speaker bio

I am a developer at Mapbox. I’ve built several programs that use TileReduce to run analyses on OpenStreetMap data, and have been part of sessions at Mapbox on the JS principles that TileReduce was built on, so I understand the theory well. I also think it is one of the most beautiful tools for processing geographic data and hope to share my fascination with it with the audience at The Fifth Elephant. Previously, I wrote 2D games using the Cocos2D engine, submitted small activities to the GCompris project and wrote application documentation for the GNOME Foundation. I also enjoy collecting recordings of Indian classical music that are in the public domain.
