Jul 2018
26 Thu 07:45 AM – 06:15 PM IST
27 Fri 07:45 AM – 05:35 PM IST
devjyoti
Data-driven performance management of Big Data infrastructure is very different from performance management of standard applications such as web servers. A single cluster receives many simultaneous, independent applications, and each of these applications can comprise up to hundreds of thousands of tasks of varying complexity. If these jobs are not tuned properly, it is easy either to inflate costs with an underutilized cluster or to starve jobs of resources and miss SLAs.
This talk is targeted at engineers who administer Big Data clusters and would like to improve the efficiency and utilization of their clusters using a data-driven methodology.
Say you have been storing the job characteristics of the SQL queries that run on your cluster,
and you also know the layout of the data that forms the input to these queries.
With these two datasets, stored over a period of time, we will try to answer the following questions:
Other parameters, such as cluster configuration and cluster resource allocation, also affect a job's performance, but we will limit the scope of this talk to job statistics and data layout. We will also discuss only SQL workloads, which make up the majority of jobs running on Hive, Spark or Presto clusters.
To serve these needs, we built Tenali, Qubole's SQL parser and analyzer, which we intend to open source shortly. Tenali is a collection of scoping rules and heuristics that, given a set of queries and their corresponding job characteristics, generates insights to improve job efficiency.
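Tenali's rules are not described in this abstract, so the following is only a hypothetical sketch of the general idea: heuristic rules that combine per-query job statistics with the layout of the input tables and emit human-readable insights. Every class, field, rule name and threshold here (`JobStats`, `TableLayout`, `analyze`, the 32 MiB and 1 GiB cut-offs) is an assumption made for illustration and is not Tenali's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MIB = 2**20
GIB = 2**30

# Hypothetical thresholds, chosen only for illustration.
SMALL_FILE_BYTES = 32 * MIB
LARGE_TABLE_BYTES = 1 * GIB


@dataclass
class JobStats:
    """Hypothetical per-query job characteristics collected over time."""
    query_id: str
    tables: List[str]  # input tables read by the SQL query
    num_tasks: int     # tasks the engine launched for this query


@dataclass
class TableLayout:
    """Hypothetical description of how an input table is laid out on storage."""
    name: str
    num_files: int
    total_bytes: int
    partition_columns: List[str] = field(default_factory=list)


# A rule inspects one job plus the layout of its input tables and returns zero or more insights.
Rule = Callable[[JobStats, Dict[str, TableLayout]], List[str]]


def small_files_rule(job: JobStats, layouts: Dict[str, TableLayout]) -> List[str]:
    """Flag input tables with a small average file size (many tiny files tend to mean many tiny tasks)."""
    insights = []
    for name in job.tables:
        layout = layouts.get(name)
        if layout is None or layout.num_files == 0:
            continue
        avg = layout.total_bytes / layout.num_files
        if avg < SMALL_FILE_BYTES:
            insights.append(
                f"{job.query_id}: table {name} averages {avg / MIB:.1f} MiB per file "
                f"({job.num_tasks} tasks launched); consider compacting"
            )
    return insights


def unpartitioned_large_table_rule(job: JobStats, layouts: Dict[str, TableLayout]) -> List[str]:
    """Flag large input tables without partition columns, since query filters cannot prune them."""
    insights = []
    for name in job.tables:
        layout = layouts.get(name)
        if layout and not layout.partition_columns and layout.total_bytes > LARGE_TABLE_BYTES:
            insights.append(
                f"{job.query_id}: table {name} ({layout.total_bytes / GIB:.1f} GiB) "
                f"has no partition columns; consider partitioning it"
            )
    return insights


RULES: List[Rule] = [small_files_rule, unpartitioned_large_table_rule]


def analyze(jobs: List[JobStats], layouts: Dict[str, TableLayout], rules: List[Rule] = RULES) -> List[str]:
    """Apply every rule to every job and collect the resulting insights in order."""
    return [insight for job in jobs for rule in rules for insight in rule(job, layouts)]


if __name__ == "__main__":
    layouts = {
        "events": TableLayout("events", num_files=1000, total_bytes=1000 * 4 * MIB, partition_columns=["dt"]),
        "users": TableLayout("users", num_files=10, total_bytes=10 * 256 * MIB),
    }
    jobs = [JobStats("q1", ["events"], num_tasks=1000), JobStats("q2", ["users"], num_tasks=10)]
    for insight in analyze(jobs, layouts):
        print(insight)
```

Keeping each rule a plain function over the two datasets makes new heuristics easy to add and test in isolation; a real analyzer would also need to parse the SQL to learn which tables and partitions each query actually touches.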
Understanding of data tools such as Hadoop, Hive and Spark
Familiarity with ML nomenclature such as classification, clustering and nearest neighbour
Devjyoti works at Qubole as a Data Engineer and helps the company gain insight into the performance of its data processing tools.