Make a submission

Accepting submissions till 15 Jun 2019, 01:00 PM

NIMHANS Convention Centre, Bengaluru

The eighth edition of The Fifth Elephant will be held on 25 and 26 July at the NIMHANS Convention Centre in Bangalore, where a thousand data scientists, ML engineers, data engineers and analysts will gather to discuss:

  1. Model management, including data cleaning, instrumentation and productionizing data science.
  2. Bad data and case studies of failure in building data products.
  3. Identifying and handling fraud + data security at scale.
  4. Applications of data science in agriculture, media and marketing, supply chain, geo-location, SaaS and e-commerce.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Highlights:

1. Meet Peter Wang, co-founder of Anaconda Inc, and learn about why data privacy is the first step towards robust data management; the journey of building Anaconda; and Anaconda in enterprise.
2. Talk to the Fulfillment and Supply Group (FSG) team from Flipkart, and learn about their work with platform engineering where ground truths are the source of data.
3. Attend tutorials on Deep Learning with RedisAI and on TransmogrifAI, Salesforce’s open source AutoML library.
4. Discuss interesting problems to solve with data science in agriculture, a SaaS perspective on multi-tenancy in Machine Learning (with the Freshworks team), and bias in intent classification and recommendations.
5. Meet data science, data engineering and product teams from sponsoring companies to understand how they are handling data and leveraging intelligence from data to solve interesting problems.

Why should you attend?

  1. Network with peers and practitioners from the data ecosystem
  2. Share approaches to solving expensive problems such as cleanliness of training data, model management and versioning data
  3. Demo your ideas in the demo session
  4. Join Birds of a Feather (BOF) sessions to have productive discussions on focussed topics, or start your own BOF session.

Full schedule published here: https://hasgeek.com/fifthelephant/2019/schedule

Contact details:

For more information about The Fifth Elephant, sponsorships, or anything else, call +91-7676332020 or email info@hasgeek.com

Sponsors:

Sponsorship Deck.
Email sales@hasgeek.com for bulk ticket purchases and for sponsoring the 2019 edition of JSFoo:VueDay.

JSFoo:VueDay 2019 sponsors:

Platinum Sponsor

Anatta

Community Sponsors

Salesforce Ericsson freshworks
databricks

Exhibition Sponsors

Sapient Atlassian GO-JEK
Bayer

Bronze Sponsor

Sumologic Walmart Labs Atlan
Simpl Great Learning

Community Sponsors

Elastic Anaconda Aruba Networks

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Kumar Puspesh

@puspesh

10 steps to build-your-own data pipeline - for day 1 of your startup

Submitted Jan 14, 2019

We are a gaming company making mass-market social games. In a consumer market where user experience is key, we had to rely heavily on data from day 1 of every game or product launch. That is why we built our data infrastructure in parallel with our games and products, and had it ready for production use from the very beginning. We relied heavily on ready-to-use systems but, being a startup, also had to be cost sensitive. Setting up a full data lake and a heavy-duty HDFS cluster was ruled out because of the cost and maintenance overhead. Instead, we set up a lightweight data collection pipeline that feeds central queues, from which data is ingested in real time into our warehouse of choice, Redshift (chosen for its ease of use). Scaling such a system also adds cost as the product grows, so we had to design data retention and data querying capabilities such that we neither pay hefty bills nor are limited in querying real-time data from our users.
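
As a rough illustration of the "lightweight collection pipeline to central queues" idea, here is a minimal Python sketch. It is not the pipeline described in the talk: the `Event` fields, the `BatchingCollector` class, and the in-memory `Queue` standing in for the central queue are all assumptions made up for this example. It only shows events with a generic shape being buffered and pushed to a queue one batch at a time.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from queue import Queue
from typing import Any, Dict, List


@dataclass
class Event:
    """Hypothetical generic event: a few common fields plus free-form attributes."""
    name: str
    user_id: str
    ts: float = field(default_factory=time.time)
    attrs: Dict[str, Any] = field(default_factory=dict)


class BatchingCollector:
    """Buffers events in memory and pushes them to a queue one batch at a time."""

    def __init__(self, sink: Queue, batch_size: int = 3) -> None:
        self.sink = sink
        self.batch_size = batch_size
        self._buffer: List[Event] = []

    def track(self, event: Event) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # One newline-delimited JSON message per batch, not one per event.
        payload = "\n".join(json.dumps(asdict(e)) for e in self._buffer)
        self.sink.put(payload)
        self._buffer.clear()


central_queue: Queue = Queue()  # in-memory stand-in for a real central queue
collector = BatchingCollector(central_queue, batch_size=3)
for i in range(7):
    collector.track(Event(name="level_complete", user_id=f"u{i}", attrs={"level": i}))
collector.flush()  # push whatever is left in the buffer
```

A separate consumer would read these batches from the queue and load them into the warehouse; that part is left out here because it depends on the queue and warehouse actually in use.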

Outline

  1. Be clear about Requirements and Constraints
    • Having a scalable system for data ingestion
    • Data design (Specific or Generic)
    • Querying interface - why stick to SQL?
  2. Take time to Design Data
    • Walking through an example of a generic table design
  3. Sort out Data production part first
    • Identify all possible data producers (and understand their requirements). In our case:
    • Android/iOS app
      • Cannot keep sending each event over network
      • Cannot lose data even if app crashes or is killed
      • Keep out of context from the application itself
    • Microservice(s)
      • Cannot keep sending each event over network
      • Keep data collection agnostic of microservice itself
  4. Design v1.0 of Data pipeline
    • How and why we chose “anti-pattern”
  5. Choose/Design Data warehouse
    • Data design in Redshift
    • Compression ON for certain columns
    • Tuning for scale
    • Taking care of Querying patterns of Product Managers and Data scientists
  6. Open up: Enable many Data Interfaces
    • On demand Data loading and querying: OnDemand Table(s)
    • Flexibility for complicated analysis: Adhoc redshift cluster(s)
  7. Understand, Tune & Repeat
  8. Optimize for Usage
    • Added more columns at generic level e.g.
    • More examples
  9. Optimize for Cost & Ops
    • Retention policies of data
      • Not all events are of same importance
      • But all events should be accessible if required
  10. Upgrade to v2.0 of Data pipeline
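
The producer-side constraints in step 3 (do not send each event over the network, do not lose data if the app crashes or is killed) can be sketched as follows. This is a hedged Python illustration, not the speaker's implementation: `DurableEventBuffer`, its file format, and the `fake_send` uploader are hypothetical names invented for this example, and a real mobile client would be written in the platform's own language.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


class DurableEventBuffer:
    """Appends events to a local file and ships them in batches."""

    def __init__(self, path: str, send: Callable[[List[Dict]], bool],
                 batch_size: int = 50) -> None:
        self.path = path
        self.send = send  # hypothetical uploader; returns True on success
        self.batch_size = batch_size
        # Count events left on disk by a previous run (for example, after a crash).
        self._count = len(self._pending())

    def _pending(self) -> List[Dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def track(self, event: Dict) -> None:
        # Persist first: once fsync returns, the event is on disk even if
        # the process is killed before the next upload.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._count += 1
        if self._count >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        events = self._pending()
        if not events:
            return True
        if not self.send(events):
            return False  # keep the file and retry on a later flush
        os.remove(self.path)
        self._count = 0
        return True


sent: List[List[Dict]] = []


def fake_send(batch: List[Dict]) -> bool:
    """Stand-in for a network upload; records the batch and reports success."""
    sent.append(batch)
    return True


path = os.path.join(tempfile.mkdtemp(), "events.log")
buf = DurableEventBuffer(path, fake_send, batch_size=2)
buf.track({"name": "app_open"})

# Simulate a crash and restart: a new buffer picks up the unsent event.
buf = DurableEventBuffer(path, fake_send, batch_size=2)
buf.track({"name": "level_start"})
```

In this sketch the event written before the simulated restart is uploaded together with the next one as a single batch. It does not handle a line that was only partly written when the process died, nor duplicate delivery when an upload succeeds but the file is not removed; a production client would need to cover both.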

Speaker bio

I am Kumar Puspesh, CTO and Co-Founder of Moonfrog, India’s top mobile gaming company. We had to design a large-scale data infrastructure from day 1 of our company to cater to our product needs. A cost-sensitive yet scalable approach helped us achieve large scale as a gaming company in India in a short amount of time. It also taught us a lot of ingenious ways of building large-scale infrastructure customized for the business and its users (rather than buying a generic paid solution and then changing our usage and requirements to fit it).

Slides

https://docs.google.com/presentation/d/1qYkGQLzK8UO-f59TFgkj1GROAmFLkotoQWMZNYXJbFQ/edit#slide=id.g4a9ee349ba_2_75
