The Fifth Elephant 2014

A conference on big data and analytics

BDAS, the Berkeley Data Analytics Stack

Submitted by Mukesh Gangadhar (@mukgbv) on Tuesday, 15 April 2014

Technical level

Beginner

Section

Crisp talk

Status

Submitted

Objective

This talk is an introduction to the features of BDAS, the next-generation, open-source data analysis stack developed by the UC Berkeley AMPLab.

Description

BDAS is made up of multiple components and is compatible with the Hadoop stack:

- Spark, a high-speed cluster computing system that can perform computations in memory
- Mesos, a cluster manager that provides efficient resource isolation and sharing across distributed applications
- Tachyon, a fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks
- MLBase, a platform for implementing and consuming machine learning techniques at scale
- Shark, a port of Apache Hive onto Spark that is compatible with existing Hive warehouses and queries
- Spark Streaming, which extends Spark to build scalable, fault-tolerant streaming applications
- GraphX, which extends Spark with the ability to process graph-structured data

Requirements

Participants should have a basic understanding of Big Data concepts and Hadoop.

Speaker bio

Has worked on software for more than 15 years, with a focus on improving the performance of applications and optimizing algorithms. Interests include Big Data, parallelism, and algorithm optimization.