In 2013, commodity hardware and computing capacity for storing and processing large and small volumes of data are easily available on demand. The bigger issues pertain to questions of how to scale data processing, handle data diversity, manage infrastructure costs, decide which technologies work best for different contexts and problems, and build products from the insights and intelligence that the data is presenting to you.
The Fifth Elephant 2013 is a three-day workshop and conference on big data, storage and analytics, with product demos and hacker corners.
Event format, themes and submission guidelines
The Fifth Elephant 2013 invites proposals on use cases and real-life examples. Tell us what specific problem you faced, which technology/tools worked for your use case and why, how you have developed business intelligence on the data you are collecting, and which analytics tools and techniques you employ. Our preference is for showcasing original work with clear take-aways for the audience. Please emphasize these in your proposal.
The conference will have two parallel tracks on 12th and 13th July:
- Storage: OLTP, messaging and notifications, databases and big data, NoSQL
- Analytics: Metrics and tools, cloud computing, mathematical modelling and statistical analysis, visualization
This year we are adding a preliminary day of workshops, on 11th July, to give attendees more in-depth, hands-on training in open source frameworks and tools (Pig, Hadoop, Hive, etc.), commercial solutions (sponsored), programming languages such as R, and visualization techniques and tricks, among others.
Product demos and sponsored sessions
We have a demo track for startups and companies who want to showcase their product to customers at The Fifth Elephant 2013 and get feedback. Slots are also open for 4-6 sponsored sessions for companies who want to talk about their technologies and reach out to developers, CTOs, CIOs and product managers at The Fifth Elephant. For more information on demo and sponsored session proposals, write to firstname.lastname@example.org.
Commitment to open source
HasGeek believes in open source as the foundation of the internet. Our aim is to strengthen these foundations for future generations. If your talk describes a codebase for developers to work with, we require that it is available under a license that does not impose itself on subsequent work. This is typically a permissive open source license (almost anything that is listed at opensource.org/licenses and is not GPL or AGPL), but restrictive and commercial licenses are also considered depending on how they affect the developer’s relationship with the user.
If you’d like to showcase commercial work that makes money for you, please consider supporting the event with a sponsorship.
Proposal selection process
Voting is open to attendees who have purchased event tickets. If there is a proposal you find notable, please vote for it and leave a comment to initiate discussions. Your vote will be reflected immediately, but will be counted towards selections only if you purchase a ticket. Proposals will also be evaluated by a program committee consisting of:
- Gopal Vijayraghavan, Hortonworks
- Govind Kanshi, Microsoft
- Joydeep Sen Sharma, Qubole
- Srinivasan Seshadri (Sesh), Boltell
Emphasis will be placed on original work and talks which present new insights to the audience.
The program committee will interview proposers who have received the most votes from attendees and the committee. Proposers must submit presentation drafts as part of the selection process to ensure the talk is in line with the original proposal and to help the program committee build a coherent line-up for the event.
There is only one speaker per session. Attendance is free for selected speakers. HasGeek will cover your travel to and accommodation in Bangalore from anywhere in the world. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover anyone exceptional. If you are able to raise support for your trip, we will count that towards an event sponsorship.
If your proposal is not accepted, you can buy a ticket at the same rate as was available on the day you proposed. We’ll send you a code.
Discounted tickets are available from http://fifthelephant.doattend.com/
The program committee will announce the first round of selected proposals by the end of April and a second round by the end of May, and will finalize the schedule by 20th June. The funnel will close on 5th June. The event is on 11th-13th July 2013.
Customizing One Database for Your Multiple Data Structures
Attendees will gain hands-on experience in managing multiple types of structured and unstructured data. They will customize data structures to represent their data accurately, as it exists and is queried in its natural form, so that the model matches the data in the wild (an impedance match). The data structures will then go through a series of customizations to optimize for performance, scalability, data size, ease of use, etc.
Historically, distributed databases have imposed restrictions (e.g. a structured data model, range-based partitioning) on data that requires high locality. In addition, databases have dictated that data be stored in either row-oriented or column-oriented storage, which causes an explosion in I/O operations as the size of a row or column increases, triggering adverse side effects in subsystems such as replication.
This workshop will explore the architectural decisions that we made in developing Aerospike to address these issues while being a reliable, fast, distributed, and linearly scalable database cluster.
We will introduce the LinkedRecord API, which gives end-users the ability to define linked data structures (with guaranteed data locality) within the database. Linked data structures can be used to represent a logical data structure that is physically stored as many records, which in practice keeps the I/O explosion described above in check.
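To make the idea concrete, here is a minimal sketch of a logical list stored as a chain of small physical records. It is an illustration only, not the LinkedRecord API: the in-memory dict standing in for the database, the names `Record`, `LinkedList` and `CHUNK_SIZE`, and the use of "items rewritten" as a rough proxy for I/O are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

CHUNK_SIZE = 4  # hypothetical cap on items held by one physical record


@dataclass
class Record:
    """One physical record: a small chunk of items plus a link to the next chunk."""
    items: List[int] = field(default_factory=list)
    next_key: Optional[str] = None


class LinkedList:
    """A logical list stored as a chain of small records in a key-value store."""

    def __init__(self, store: Dict[str, Record], head_key: str) -> None:
        self.store = store
        self.head_key = head_key
        self.tail_key = head_key
        self.chunks = 1
        self.items_written = 0  # items rewritten across all writes (I/O proxy)
        self.store[head_key] = Record()

    def _write(self, key: str, record: Record) -> None:
        # Writing a record rewrites every item it holds.
        self.store[key] = record
        self.items_written += len(record.items)

    def append(self, value: int) -> None:
        tail = self.store[self.tail_key]
        if len(tail.items) == CHUNK_SIZE:
            # Tail is full: link it to a fresh record instead of growing it.
            new_key = f"{self.head_key}:{self.chunks}"
            self.chunks += 1
            tail.next_key = new_key
            self._write(self.tail_key, tail)
            tail = Record()
            self.tail_key = new_key
        tail.items.append(value)
        self._write(self.tail_key, tail)

    def __iter__(self) -> Iterator[int]:
        # Follow the links to read the logical list back in order.
        key: Optional[str] = self.head_key
        while key is not None:
            record = self.store[key]
            yield from record.items
            key = record.next_key


def single_record_cost(n: int) -> int:
    """Items rewritten if all n appends go to one ever-growing record."""
    return n * (n + 1) // 2
```

Each append touches at most two small records, so the cost of a write stays bounded by `CHUNK_SIZE` rather than growing with the size of the whole structure, which is the effect the paragraph above describes.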
We will also cover Aerospike’s user-defined functions, the underpinnings of the LinkedRecord API, for use in both single-record and multi-record operations. Each of these features has moved Aerospike from a key-value store to a rich, robust data platform, allowing users to build databases for their applications, rather than applications for their database.
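The general idea of a user-defined function, running application logic next to the data rather than in the client, can be sketched as follows. This is a conceptual illustration only and does not reproduce Aerospike's actual UDF interface: `TinyStore`, its methods, and `add_to_counter` are hypothetical names, and the store is a plain in-memory dict.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class TinyStore:
    """Hypothetical in-memory store that runs registered functions on its records."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}
        self._udfs: Dict[str, Callable[..., Any]] = {}

    def put(self, key: str, record: Record) -> None:
        self._records[key] = dict(record)

    def get(self, key: str) -> Record:
        return dict(self._records[key])

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._udfs[name] = fn

    def apply(self, key: str, name: str, *args: Any) -> Any:
        """Single-record operation: run a registered function on one record in place."""
        record = self._records.setdefault(key, {})
        return self._udfs[name](record, *args)

    def apply_many(self, keys: List[str], name: str, *args: Any) -> List[Any]:
        """Multi-record operation: run the same function on several records."""
        return [self.apply(key, name, *args) for key in keys]


def add_to_counter(record: Record, field_name: str, delta: int) -> int:
    """Example function: read-modify-write a counter without a client round trip."""
    record[field_name] = record.get(field_name, 0) + delta
    return record[field_name]
```

A caller would register the function once with `store.register("incr", add_to_counter)` and then invoke it by name, for example `store.apply("user:1", "incr", "visits", 1)`, so only the function name and arguments cross the client boundary.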
The audience of this advanced-level workshop will:
• Gain an understanding of the Aerospike data platform.
• Learn how easy it is to install and run an Aerospike cluster.
• Develop a simple application creating their own linked data structures and user-defined functions.
Any reasonably modern laptop will do to set up an Aerospike client and server. Operating system must be CentOS 6.x or Linux 12.x; virtual machines are OK.
Russell Sullivan, Aerospike’s principal architect and performance guru, is leading the integration of AlchemyDB capabilities into the Aerospike real-time NoSQL database and further enhancing its performance. An expert in high-performance and distributed systems, he was the founder and developer of AlchemyDB, an ultra high velocity open source database and the first database to combine SQL, NoSQL key-value store, document store, and GraphDB. Russell was also a key contributor to the performance enhancements of the Redis open source key-value store upon which AlchemyDB was built. Prior to founding AlchemyDB, Russell was a principal engineer of the Makara platform-as-a-service (PaaS), which was acquired by Red Hat. He served as vice president of engineering and operations with BE2.com, one of the largest and fastest growing dating sites in the world with 100 million members. He also served as a software architect at Lycos Europe, senior software engineer at 24/7 Real Media, and software programmer at the National Institutes of Health (NIH).
- High Scalability Blog on database speed: http://highscalability.com/blog/2012/9/10/russ-10-ingredient-recipe-for-making-1-million-tps-on-5k-har.html
- Russ Sullivan’s Aerospike blog on database speed: http://www.aerospike.com/blog/all-about-speed