The Fifth Elephant is rated as India’s best conference on big data, data science and the application of data to real-life use cases.
It is a conference for practitioners, by practitioners. The Fifth Elephant completed its seventh edition in Bangalore, on 26 and 27 July 2018. The Bangalore edition caters to data and ML engineers, architects, technologists, data scientists, product managers, researchers and business decision-makers.
Kumar Puspesh, CTO and co-founder at Moonfrog
Data pipeline on day one of your startup: cost and scale sensitive!
Rough outline:
- Business Requirements
- Use case
- Having a scalable system for data ingestion
- Data design - Specific or Generic?
- Querying interface - why stick to SQL?
- Query interface users - skills, requirements and expectations
- Data ingestion
- High throughput stats service
- Thin client: Badger
- High throughput ingestion backend
- Hot loading to Redshift
- Data Warehousing
- Data design in Redshift and data lake
- Tuning for scale
- Taking care of querying patterns of Product Managers and Data Scientists
- S3 as Data Lake
- On demand data loading and querying: OnDemand Table(s)
- Gotchas
- Flexibility for complicated analysis: Adhoc Redshift cluster(s)
- Gotchas
- Scaling up
- Typical bottlenecks and solutions we tried
- Learnings
Jayesh Sidhwani, Data Infrastructure Team Lead at Hotstar
Patterns for building a scalable Data Platform
- Ingestion Patterns
- Unified Ingestion Proxy
- Schema Definitions
- In-flight enrichments
- Highly Available
- Storage Patterns
- Decouple storage and compute
- Query Lineage & Optimization
- Noisy Neighbour
- Consumption Patterns
- Single GUI and a programmatic interface. All the magic underneath
- Parity between streaming and stationary data
Venkata Pingali, CEO and co-founder at Scribble Data
Reducing cost of production AI: a feature engineering case study
- Feature Engineering Overview
- Typical Feature Engineering Cycle
- Trends
- Detailed Cost Drivers
- Examples: Reconciliation & auditing, change management
- Indicative Quantitative Improvement
- Detailed discussion of each driver
Piyush Srivastava, Director of Engineering for Delivery Team at Swiggy
Role of data in solving capacity and efficiency problems in real-time logistics
- Introduction and Context
- The Capacity Problem - what is it and why is it important?
- The Efficiency Problem - what, why and the necessary trade-offs
- Data and its Nature
- Challenges with Accurate Data Capture
- Challenges with High Variance
- Real-time vs. historical data
- Representing Capacity
- Aggregated capacity (Zone-level)
- Point-in-time capacity (Order-level)
- Journey and Results: Solving for Capacity
- Efficiency Levers
- Predictions and accounting for errors
- Trade-offs
- Optimal Assignment
- Batching
- Aggregate Analysis vs. Specific Analysis
- Pitfalls of Aggregate Analysis
- Conclusions
Kaushik Bhatt, Vice President at Wells Fargo
Data governance: lessons on data usage and data controls from finance domain
The session outline will cover:
- a systematic approach to identifying enterprise data assets, who owns them and who can access them
- data protection approach
- data catalogue, data profiling and data quality