The Fifth Elephant 2012

Finding the elephant in the data.

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from fifthelephant.doattend.com.

The proposal funnel below will enable you to submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Ashok Banerjee

@ashokbanerjee

Exponential Growth Models and Impact on Sales Forecast, Data Volume, Query Latency, Capacity Planning and Search Latency

Submitted Jun 14, 2012

We often talk loosely about exponential growth. In this talk we will delve into the mathematical models that describe when a domain or market will undergo exponential growth. We often mistakenly believe that one company executes better than another, when in fact the two markets follow fundamentally different mathematical growth models.

Exponential growth and exponential decay appear in many domains, not just business. These mathematical models are remarkably fertile: they describe the growth of bacteria in your mouth every night, the growth of populations, the spread of infections, the distribution of allergens, dust or mosquitoes, radioactive decay, revolutions in the Middle East, the decay of interest in topics on Twitter, the decay of your sorrows and infatuations, and many more domains.
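As a rough illustration of the kind of model meant here, the sketch below uses the standard textbook form N(t) = N0 * exp(r * t), where a positive rate r gives growth and a negative rate gives decay. The function names and the sample numbers are assumptions made for this example, not material from the talk.

```python
import math


def exponential(n0: float, rate: float, t: float) -> float:
    """Value at time t of a quantity that starts at n0 and changes
    at a constant relative rate (rate > 0: growth, rate < 0: decay)."""
    return n0 * math.exp(rate * t)


def doubling_time(rate: float) -> float:
    """Time for a growing quantity to double; independent of its size."""
    if rate <= 0:
        raise ValueError("doubling time needs a positive growth rate")
    return math.log(2) / rate


def half_life(rate: float) -> float:
    """Time for a decaying quantity to halve; independent of its size."""
    if rate >= 0:
        raise ValueError("half-life needs a negative rate")
    return math.log(2) / -rate


if __name__ == "__main__":
    # Hypothetical numbers: 100 users growing 10% per day (continuous rate).
    print(doubling_time(0.10))             # days until the user base doubles
    print(exponential(100.0, 0.10, 30.0))  # users after 30 days
```

The same two functions cover a user base spreading by word of mouth and a radioactive sample decaying; only the sign of the rate differs.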

This lecture will help attendees connect the spaces and domains of their current businesses, and of their lives, to a fundamental mathematical model.

This understanding informs everything: scaling of databases, scaling of message systems (the two have very different challenges), demand forecasting, inventory planning, operations planning (for base, trend, seasonality and spike) and even staffing. Organizations undergoing these changes often cannot comprehend the challenges barreling at them, but this structure enables deeper thinking.
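One way to picture the base, trend, seasonality and spike decomposition is as a sum of four terms. The sketch below is only an assumed additive form with a linear trend, a sinusoidal weekly cycle and an exponentially decaying spike; every parameter name and default value is hypothetical and chosen for illustration.

```python
import math
from typing import Optional


def forecast_demand(
    day: float,
    base: float = 1000.0,            # steady demand with no other effect
    trend_per_day: float = 5.0,      # linear growth in demand per day
    season_amplitude: float = 200.0,  # size of the periodic swing
    season_period: float = 7.0,      # length of one seasonal cycle in days
    spike_day: Optional[float] = None,  # day a one-off spike starts
    spike_size: float = 0.0,         # extra demand at the start of the spike
    spike_decay: float = 0.5,        # rate at which the spike dies away
) -> float:
    """Additive demand model: base + trend + seasonality + spike."""
    trend = trend_per_day * day
    seasonality = season_amplitude * math.sin(2 * math.pi * day / season_period)
    spike = 0.0
    if spike_day is not None and day >= spike_day:
        # The spike decays exponentially after it starts.
        spike = spike_size * math.exp(-spike_decay * (day - spike_day))
    return base + trend + seasonality + spike


if __name__ == "__main__":
    for d in range(0, 15):
        print(d, round(forecast_demand(d, spike_day=10, spike_size=500.0), 1))
```

Keeping the four terms separate means each can be estimated and planned for on its own: capacity for the base and trend, scheduling for seasonality, and headroom for spikes.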

Time permitting, we will close by discussing when exponential growth really ends and how the “epidemic” stabilizes.
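A common textbook way to describe growth that starts out exponential and then stabilizes is the logistic curve, which levels off at a ceiling K (for example, the size of the reachable market). The sketch below assumes that model purely for illustration; the talk may use a different epidemic model, and the function names and numbers are hypothetical.

```python
import math


def logistic(t: float, n0: float, rate: float, capacity: float) -> float:
    """Logistic growth: close to n0 * exp(rate * t) while small,
    then flattening out as it approaches capacity."""
    ratio = (capacity - n0) / n0
    return capacity / (1.0 + ratio * math.exp(-rate * t))


def inflection_time(n0: float, rate: float, capacity: float) -> float:
    """Time at which growth is fastest; the curve is at half of capacity."""
    return math.log((capacity - n0) / n0) / rate


if __name__ == "__main__":
    # Hypothetical market: 100 early users, ceiling of one million.
    n0, rate, capacity = 100.0, 0.5, 1_000_000.0
    for t in (0, 5, 10, 15, 20, 25, 30):
        unbounded = n0 * math.exp(rate * t)  # pure exponential, no ceiling
        print(t, round(unbounded), round(logistic(t, n0, rate, capacity)))
```

Printing the two columns side by side shows the curves agreeing early on and diverging once a sizeable share of the ceiling has been reached.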

This talk will not work in just 30 minutes. The minimum we can target is 40-45 minutes, with an opportunity to ask some questions.

I would strongly advise not being late: we will spend a lot of time on the first slide and build on a few basic concepts from it. Leave early if you need to, but don't come late :)

Outline

- Exponential Growth markets
---- Analyze mathematically the market spaces
- Word of mouth: Facebook, Foursquare, Twitter, Flipkart
- Advertising driven growth and mathematical model

---- Model Fertility in other domains
- Radioactive decay
- Revolution, Love, Infatuation
- Mosquito/Allergens distribution with height

---- Basic Demand Forecasting
- Base demand
- Trends on demand
- Seasonality on demand
- Peak Demand Modelling (Exponential + Poisson)

---- Impact to OLTP
- Web Scaling
- Scaling Message Systems: traditional databases -> custom solutions
- Caching
- Large DB scaling (compression, indexing, archiving, sharding and federated query)

---- Impact to OLAP
- Hadoop and Scaling operations
- Recommendation systems and the impact from time series

---- When does exponential growth end?
- Epidemic models applied here
- How to prevent exponential growth of a competitor (vaccine models in disease spread)

---- Q&A

Requirements

I will be working from the basics, so attendees need not prepare anything special. But missing the first slide will significantly diminish understanding of the entire flow.

To repeat: please leave early if you need to, but do avoid coming late, or you may miss the thread of the entire talk.

Speaker bio

Ashok Banerjee is VP of Data Platform and Supply Chain Engineering at Flipkart and has 22 patents approved to date, and counting. Prior to Flipkart, Ashok worked at Twitter in San Francisco and Google in Mountain View.

Experience Summary (reverse chronological)

Ashok today leads the technology team for Data Platform and the largest online Supply Chain infrastructure in India (Flipkart).

  • At Google he led a large-scale data warehouse infrastructure which converts SQL (approximately) into execution on a platform built on MapReduce, GFS and columnar compressed data, using block-oriented computing. This was at the scale of many billion rows added per day (cannot disclose how many billions).
  • At Google Ashok led the payment processing infrastructure which processes payments for AdWords, AdSense, Checkout and Google Apps.
  • At BEA he worked on WebLogic Server and led infrastructure teams on EJB Container, Web Container, Classloading, Application Deployment within a Server etc.
  • At Oracle Ashok led the Oracle Application Server Clustering infrastructure and also worked on EJB container and RMI-IIOP protocols.

Ashok takes an interest in large data systems (databases and alternative databases such as NoSQL, message systems), parallel computing, distributed systems, fault-tolerant computing, recommendation systems, supply chain, mathematical models and investments.

Outside work, Ashok enjoys sailing, windsurfing, horse riding, German Shepherd dogs and soccer.
