The Fifth Elephant 2012

Finding the elephant in the data.

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

All about data science and machine learning

Kaushik Paranjape

Sharding MySQL

Submitted Jun 28, 2012

Learn how a MySQL database can be scaled to meet your requirements.


Startups with a web-based product typically start with a database that fits on a single instance. As the product and the client base grow, they start facing scaling issues: query times increase, insert times increase, and the data outgrows the instance. The tech team starts looking for scaling solutions such as NoSQL databases, Hadoop-based solutions and so on. We at Sokrati went through this phase and finally decided to build our own sharding solution. Sharding is a simple concept: the data on a single MySQL instance (one shard) is kept within limits, and once the shard is full, or close to full, you bring up another instance and new data is populated on that shard.
This sharding solution solves multiple problems:
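The fill-then-add-a-shard idea described above can be sketched in Python. This is a minimal illustration, not Sokrati's actual system: it assumes data is placed per client, that a shard's limit is expressed as a hypothetical maximum number of clients, and that the class names, host names and the `capacity_per_shard` parameter are invented for the example. Bringing up a real MySQL instance is replaced by a placeholder record.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One MySQL instance; host and capacity are hypothetical values."""
    name: str
    host: str
    capacity: int  # assumed limit: max number of clients this shard should hold
    clients: set = field(default_factory=set)

    def is_full(self) -> bool:
        return len(self.clients) >= self.capacity


class ShardAllocator:
    """Fill-then-add allocation: keep placing new clients on the newest shard
    until it reaches its limit, then bring up another shard."""

    def __init__(self, capacity_per_shard: int):
        self.capacity = capacity_per_shard
        self.shards: list[Shard] = []
        self.client_to_shard: dict[str, Shard] = {}

    def _provision_shard(self) -> Shard:
        # A real deployment would bring up a new MySQL instance here;
        # this sketch only records a placeholder host name.
        idx = len(self.shards) + 1
        shard = Shard(name=f"shard{idx}", host=f"mysql-shard{idx}.internal",
                      capacity=self.capacity)
        self.shards.append(shard)
        return shard

    def assign(self, client_id: str) -> Shard:
        # A client that has already been placed stays on its shard.
        if client_id in self.client_to_shard:
            return self.client_to_shard[client_id]
        current = self.shards[-1] if self.shards else None
        if current is None or current.is_full():
            current = self._provision_shard()
        current.clients.add(client_id)
        self.client_to_shard[client_id] = current
        return current


# Usage: with room for two clients per shard, the third client
# triggers a second shard.
allocator = ShardAllocator(capacity_per_shard=2)
for client in ["acme", "globex", "initech"]:
    print(client, "->", allocator.assign(client).name)
# acme -> shard1
# globex -> shard1
# initech -> shard2
```

Recording the client-to-shard mapping is what keeps existing clients on their original shard after a new one comes up; only new data lands on the fresh instance.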

  • Data is distributed across multiple shards, making the queries faster.

  • Fancy DBs such as NoSQL stores are not required, so the apps do not all need to be re-engineered.

  • There is a single point of contact (the sharding DB) for accessing any database, so slave credentials can be returned depending on the type of access (read-only accesses can go to a slave).

  • Multiple slaves can be added to achieve load balancing.

  • If a client has data-secrecy concerns and wants its data hosted separately, this is easily achieved by giving that client a separate shard.
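The routing benefits listed above (the sharding DB as a single point of contact, read-only traffic sent to slaves, several slaves for load balancing, and a dedicated shard for a client with data-secrecy needs) can be sketched as a small lookup layer. This is a hypothetical illustration: the in-memory dictionaries stand in for the sharding DB's tables, round-robin is an assumed load-balancing policy, and `ShardDirectory`, `Endpoint` and the host and user names are made up for the example.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    user: str  # placeholder credentials, not a real scheme


class ShardDirectory:
    """Stand-in for the central sharding DB: maps each client to a shard and
    returns master or slave connection details depending on the access type."""

    def __init__(self):
        self.client_shard: dict[str, str] = {}
        self.masters: dict[str, Endpoint] = {}
        self._slave_cycles: dict[str, itertools.cycle] = {}

    def add_shard(self, shard: str, master: Endpoint, slaves: list[Endpoint]):
        self.masters[shard] = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slave_cycles[shard] = itertools.cycle(slaves or [master])

    def place_client(self, client_id: str, shard: str):
        if shard not in self.masters:
            raise KeyError(f"unknown shard {shard!r}")
        self.client_shard[client_id] = shard

    def endpoint_for(self, client_id: str, read_only: bool) -> Endpoint:
        shard = self.client_shard[client_id]
        if read_only:
            return next(self._slave_cycles[shard])  # reads go to a slave
        return self.masters[shard]  # writes go to the master


# Usage: a shared shard with two slaves, plus a dedicated shard for one client.
d = ShardDirectory()
d.add_shard("shard1", Endpoint("m1", "rw"), [Endpoint("s1a", "ro"), Endpoint("s1b", "ro")])
d.add_shard("dedicated-acme", Endpoint("m-acme", "rw"), [Endpoint("s-acme", "ro")])
d.place_client("globex", "shard1")
d.place_client("acme", "dedicated-acme")  # client with data-secrecy needs
print(d.endpoint_for("globex", read_only=False).host)  # m1
print(d.endpoint_for("globex", read_only=True).host)   # s1a
print(d.endpoint_for("globex", read_only=True).host)   # s1b
print(d.endpoint_for("acme", read_only=True).host)     # s-acme
```

Because every application asks the directory for an endpoint instead of hard-coding a host, adding a slave or moving a client to its own shard only changes the directory's data, not the apps.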

Speaker bio

Kaushik Paranjape is a Co-Founding Geek at Sokrati. Sokrati is the leading Search Marketing platform in India.

