The Fifth Elephant is India’s most renowned data science conference. It is a space for discussing some of the most cutting-edge developments in machine learning, data science and the technologies that power data collection and analysis.
Machine Learning, Distributed and Parallel Computing, and High-performance Computing continue to be the themes for this year’s edition of The Fifth Elephant.
We are now accepting submissions for our next edition, which will take place in Bangalore on 28-29 July 2016.
We are looking for application-level and tool-centric talks and tutorials on the following topics:
- Deep Learning
- Text Mining
- Computer Vision
- Social Network Analysis
- Large-scale Machine Learning (ML)
- Internet of Things (IoT)
- Computational Biology
- ML in healthcare
- ML in education
- ML in energy and ecology
- ML in agriculture
- Analytics for emerging markets
- ML in e-governance
- ML in smart cities
- ML in defense
The deadline for submitting proposals is 30 April 2016.
This year’s edition spans two days of hands-on workshops and conference. We are inviting proposals for:
- Full-length 40-minute talks.
- Crisp 15-minute talks.
- Sponsored sessions of 15 minutes (limited slots available; subject to editorial scrutiny and approval).
- Hands-on workshop sessions of 3 or 6 hours.
Proposals will be filtered and shortlisted by an Editorial Panel. We urge you to add links to videos / slide decks when submitting proposals. This will help us understand your past speaking experience. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals.
We expect you to submit an outline of your proposed talk, as a mind map, a text document or draft slides, within two weeks of submitting your proposal.
We will notify you about the status of your proposal within three weeks of submission.
Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you prepare well for the conference.
There is only one speaker per session. Entry is free for selected speakers. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover costs for anyone exceptional. HasGeek will provide a grant to cover part of your travel and accommodation in Bangalore. Grants are limited and are made available to speakers delivering full sessions (40 minutes or longer).
Commitment to open source
HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.
Key dates and deadlines
- Revised paper submission deadline: 17 June 2016
- Confirmed talks announcement (in batches): 13 June 2016
- Schedule announcement: 30 June 2016
- Conference dates: 28-29 July 2016
The Fifth Elephant will be held at the NIMHANS Convention Centre, Dairy Circle, Bangalore.
For more information about speaking proposals, tickets and sponsorships, contact firstname.lastname@example.org or call +91-7676332020.
Distributed Computing Abstractions for Big Data Science
Data science has made significant advances in the last few years, with a renewed focus on getting it to work at scale. This talk outlines the distributed computing abstractions required to realize data science at scale. The Resilient Distributed Dataset (RDD) abstraction provided by Spark is becoming the de facto approach for big data science. However, Apache Flink and, more recently, Concord have emerged as interesting alternatives to Spark and provide streaming dataflow abstractions: while Spark achieves real-time analytics by mini-batching, Flink treats event streaming as a first-class abstraction and provides exactly-once guarantees. TensorFlow also provides a dataflow abstraction for deep learning networks, and has recently released a distributed version that works by using gRPC or by integrating with cluster management systems such as Kubernetes.
Graph processing abstractions are useful for realizing complex algorithms on large, real-life natural power-law graphs such as the Twitter or LinkedIn graphs. GraphLab and Titan are the prominent graph processing systems. GraphLab provides an efficient partitioning mechanism to split a large graph across a cluster of nodes and run algorithms at scale. Note that common machine learning algorithms such as clustering and classification, as well as deep learning, can be realized on top of graph processing abstractions. The Titan graph database integrates well with several NoSQL stores as data sources, including Cassandra and HBase, and with processing engines for machine learning, including Spark, Giraph and Hadoop. We also outline our experience of implementing machine learning and deep learning algorithms over many of these abstractions.
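To make the idea of splitting a large power-law graph across a cluster more concrete, here is a minimal sketch in plain Python. It assumes a vertex-cut style of partitioning, where each edge is placed on exactly one partition and high-degree vertices end up replicated on several partitions. The function names (`assign_edges`, `replication_factor`) and the toy placement rule are hypothetical and chosen only for illustration; this is not GraphLab's API, and whether it matches the mechanism covered in the talk is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def assign_edges(edges: Iterable[Edge], num_parts: int) -> Dict[int, List[Edge]]:
    """Place every edge on exactly one partition (a vertex-cut).

    Vertices are not owned by a single partition; a vertex is replicated
    on every partition that holds one of its edges.
    """
    parts: Dict[int, List[Edge]] = {p: [] for p in range(num_parts)}
    for src, dst in edges:
        # Toy deterministic placement rule (an assumption for this sketch);
        # production systems use smarter heuristics to limit replication.
        parts[(src + dst) % num_parts].append((src, dst))
    return parts


def replication_factor(parts: Dict[int, List[Edge]]) -> float:
    """Average number of partitions on which each vertex appears."""
    replicas: Dict[int, Set[int]] = defaultdict(set)
    for part_id, part_edges in parts.items():
        for src, dst in part_edges:
            replicas[src].add(part_id)
            replicas[dst].add(part_id)
    if not replicas:
        return 0.0  # empty graph: nothing is replicated
    return sum(len(p) for p in replicas.values()) / len(replicas)


# A small star graph: vertex 0 is the high-degree "hub".
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
partitions = assign_edges(star, num_parts=2)
```

In this toy star graph only the hub is split across both partitions, while each low-degree vertex lives on a single partition. A lower replication factor generally means less cross-node synchronisation when an algorithm runs over the partitioned graph.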
The key audience takeaways include:
Implementation details of machine learning algorithms over several distributed computing frameworks such as Spark, GraphLab, Flink and TensorFlow.
A state-of-the-art review of big data science: from distributed TensorFlow to Dato to Flink, the audience gets a feel for cutting-edge technology in the field.
A discussion of the pros and cons of similar frameworks and when to use each. For instance, the trade-offs between Apache Spark and Flink: use Flink if you need low-latency, event-specific processing, and Spark Streaming if you need high-throughput processing that does not require complex event processing (CEP). Similarly, the trade-offs between GraphLab and Titan and when to use one over the other.
- Introduction to Apache Spark, Flink. ML/Deep Learning on top of Spark/Flink with code.
- Introduction to TensorFlow - distributed deep learning.
- Introduction to GraphLab/Titan - ML/deep learning on top of GraphLab/Titan with code.
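The latency trade-off between mini-batching and event-at-a-time streaming mentioned in the abstract and takeaways can be illustrated with a small, framework-free sketch. It is a simplified model, not Spark or Flink code: processing time is assumed to be zero, and the names `Event`, `per_event_latencies`, `micro_batch_latencies` and `micro_batch_sums` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    arrival_ms: int  # time at which the event reaches the system
    value: int


def per_event_latencies(events: List[Event]) -> List[int]:
    """Event-at-a-time model: each event is handled as soon as it arrives.

    With processing time assumed to be zero, the added latency is zero.
    """
    return [0 for _ in events]


def micro_batch_latencies(events: List[Event], interval_ms: int) -> List[int]:
    """Micro-batch model: an event waits until its batch window closes."""
    latencies = []
    for event in events:
        # The window holding this event closes at the next interval boundary.
        window_end = (event.arrival_ms // interval_ms + 1) * interval_ms
        latencies.append(window_end - event.arrival_ms)
    return latencies


def micro_batch_sums(events: List[Event], interval_ms: int) -> Dict[int, int]:
    """Aggregate values per batch window, keyed by window index."""
    sums: Dict[int, int] = defaultdict(int)
    for event in events:
        sums[event.arrival_ms // interval_ms] += event.value
    return dict(sums)


events = [Event(0, 5), Event(250, 7), Event(999, 1), Event(1000, 3)]
batch_latency = micro_batch_latencies(events, interval_ms=1000)
window_totals = micro_batch_sums(events, interval_ms=1000)
```

In this model an event can wait up to a full batch interval before its result is visible, which is the cost paid for processing events in bulk. Shrinking the interval lowers latency but reduces the amount of work amortised per batch.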
Dr. Vijay Srinivas Agneeswaran has a Bachelor’s degree in Computer Science & Engineering from SVCE, Madras University (1998), an MS (By Research) from IIT Madras (2001) and a PhD from IIT Madras (2008), and held a post-doctoral research fellowship in the LSIR Labs, Swiss Federal Institute of Technology, Lausanne (EPFL). He has joined SapientNitro as Director of Technology in the data sciences team. He has spent the last ten years creating intellectual property and building products in the big data area at Oracle, Cognizant and Impetus. He has built PMML support into Spark/Storm and realized several machine learning algorithms, such as LDA and Random Forests, over Spark. He led a team that designed and implemented a big data governance product for role-based, fine-grained access control inside Hadoop YARN. He and his team have also built the first distributed deep learning framework on Spark. He has been a professional member of the ACM and the IEEE (Senior) for more than 10 years. He has four full US patents and has published in leading journals and conferences, including IEEE Transactions. His research interests include distributed systems, data sciences, big data and other emerging technologies. He has been an invited speaker at several national and international conferences, such as O’Reilly’s Strata big data conference series. He lives in Bangalore with his wife, son and daughter and enjoys researching the history and philosophy of Egypt, Babylonia, Greece and India.
- ACM Distinguished Speaker
- Big Data Analytics Beyond Hadoop book
- O’Reilly Strata Conference presentation
- Strata conference video
- Video of big data beyond Hadoop Webinar
- LinkedIn Profile
- US Patents
- Keynote speaker at The Fifth Elephant conference 2014