The Fifth Elephant 2014

A conference on big data and analytics

Mukesh Gangadhar


BDAS, the Berkeley Data Analytics Stack

Submitted Apr 15, 2014

This talk introduces the features of BDAS, the next-generation open source data analytics stack developed by the UC Berkeley AMPLab.


BDAS is made up of multiple components and is compatible with the Hadoop stack:

  • Spark, a high-speed cluster computing system that can perform computations in memory
  • Mesos, a cluster manager that provides efficient resource isolation and sharing across distributed applications
  • Tachyon, a fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks
  • MLbase, a platform for implementing and consuming machine learning techniques at scale
  • Shark, a port of Apache Hive onto Spark that is compatible with existing Hive warehouses and queries
  • Spark Streaming, an extension of Spark for building scalable, fault-tolerant streaming applications
  • GraphX, an extension of Spark for working with graph-structured data


Participants should have a basic understanding of Big Data concepts and Hadoop.

Speaker bio

Has worked on software for more than 15 years, with a focus on improving the performance of applications and algorithms. Interests include Big Data, parallelism, and algorithm optimization.


