The Fifth Elephant 2016

India's most renowned data science conference

Shubham Sharma

@gabber12

Apache Drill - Optimising Time to Market

Submitted Apr 30, 2016

The volume of data is more than doubling every year. Semi-structured data is growing much faster than structured data, and data flows in from many sources with different data types, so much of our time is spent defining schemas and transformations. Often the schemas are not known upfront, because datasets evolve in highly dynamic ways, and current systems cannot query such evolving datasets effectively.

Enter Apache Drill!

Apache Drill enables self-service data exploration on big data with a schema-free SQL query engine.

Drill is an open-source, low-latency query engine for big data that delivers secure, interactive SQL analytics at petabyte scale. Because it discovers schemas on the fly, Drill is a pioneer in self-service exploration of data stored in multiple formats, whether in files or in NoSQL databases. Drill is fully ANSI SQL compliant and integrates seamlessly with visualisation tools.
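To make "schema on the fly" concrete, here is a minimal Python sketch that sends SQL straight at a raw JSON file through Drill's REST API (`POST /query.json` with a `{"queryType": "SQL", "query": ...}` body). It assumes a Drill node running locally with its web server on the default port 8047 and the default `dfs` storage plugin; the file `/data/events.json` and its `user_id` and `event_type` fields are hypothetical. Note that no table definition or schema file appears anywhere.

```python
import json
import urllib.request

# Assumption: Drill's web server is reachable on localhost at the default port 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap a SQL string in the JSON body that Drill's REST API expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL) -> list:
    """Send the query to a running Drill node and return the result rows (a list of dicts)."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as resp:
        result = json.load(resp)
    return result.get("rows", [])


# No CREATE TABLE, no schema registration: the query points directly at a raw
# JSON file through the dfs storage plugin. Path and field names are hypothetical.
SQL = (
    "SELECT t.user_id, t.event_type "
    "FROM dfs.`/data/events.json` t "
    "WHERE t.event_type = 'login' LIMIT 10"
)
```

With a Drill node running, `run_drill_query(SQL)` returns the matching rows; if new fields later appear in the JSON file, the same query keeps working and new columns can simply be selected.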

It supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is modelled on Google's Dremel, the system that Google also offers as an infrastructure service called Google BigQuery. One explicitly stated design goal is for Drill to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds. Drill is an Apache top-level project.

Anyone who queries, or wants to query, large and highly dynamic datasets stored in multiple formats will find something exciting in this talk.

Outline

  • What is Apache Drill?
  • Drill vs Spark SQL (an oft-asked but misguided question)
  • Why should you use it?
    • Schema on the fly: a lower barrier to entry
    • A single abstraction: ANSI SQL (not "SQL-like")
    • Speed with flexibility
  • How does it connect to your current systems?
  • A brief demo on getting started

Requirements

If you want to follow along, bring your laptop, preferably with Apache Drill installed.

Speaker bio

I have been doing data engineering and visualisation for the past two years. I spend most of my time analysing and making sense of ever-evolving massive datasets.
I currently work at Finomena, a Bangalore-based fintech startup.

