BDAS, the Berkeley Data Analytics Stack
Mukesh Gangadhar
@mukgbv
This talk introduces the features of BDAS, the next-generation, open source data analytics stack developed by the UC Berkeley AMPLab.
Outline
BDAS is made up of multiple components and is compatible with the Hadoop stack:
- Spark, a high-speed cluster computing system that can perform computations in memory
- Mesos, a cluster manager that provides efficient resource isolation and sharing across distributed applications
- Tachyon, a fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks
- MLBase, a platform for implementing and consuming machine learning techniques at scale
- Shark, a port of Apache Hive onto Spark that is compatible with existing Hive warehouses and queries
- Spark Streaming, an extension of Spark for building scalable, fault-tolerant streaming applications
- GraphX, an extension of Spark for working with graph-structured data
Requirements
Participants should have a basic understanding of Big Data concepts and Hadoop.
Speaker bio
Mukesh has worked on software for more than 15 years, with a focus on improving the performance of applications and algorithms. His interests include Big Data, parallelism, and algorithm optimization.