BEGIN:VCALENDAR VERSION:2.0 PRODID:-//HasGeek//NONSGML Funnel//EN DESCRIPTION:The seventh edition of India's best data conference NAME:The Fifth Elephant 2018 REFRESH-INTERVAL;VALUE=DURATION:PT12H SUMMARY:The Fifth Elephant 2018 TIMEZONE-ID:Asia/Kolkata X-PUBLISHED-TTL:PT12H X-WR-CALDESC:The seventh edition of India's best data conference X-WR-CALNAME:The Fifth Elephant 2018 X-WR-TIMEZONE:Asia/Kolkata BEGIN:VEVENT SUMMARY:Check-in and breakfast DTSTART;VALUE=DATE-TIME:20180726T021500Z DTEND;VALUE=DATE-TIME:20180726T033000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/J1MoygcZ8ueQKndiC2NLKY@hasgeek.com CREATED;VALUE=DATE-TIME:20180521T011739Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T120637Z ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Check-in and breakfast in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Introduction to The Fifth Elephant DTSTART;VALUE=DATE-TIME:20180726T033000Z DTEND;VALUE=DATE-TIME:20180726T034000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/hu831q9zUTzekCHGvbXK2@hasgeek.com CREATED;VALUE=DATE-TIME:20180522T024817Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T120620Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Introduction to The Fifth Elephant in Auditorium 1 in 5 minute s TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:So you think you know about linear regression ... DTSTART;VALUE=DATE-TIME:20180726T034000Z DTEND;VALUE=DATE-TIME:20180726T043000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/4LAQEmP8w5iZMZGrMmg9au@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T111832Z DESCRIPTION:1. Introduce the idea of a Bayesian posterior\, and illustrate the general idea with PyMC.\n2. 
Show how to set up linear regression in P yMC\, and generate a sample of answers that illustrate the model uncertain ty\, how uncertainty varies with sample size\, etc.\n2a) Illustrate in a p ractical example\, namely predicting Cricket or Baseball scores (batter vs bowler).\n3. Show how ordinary least squares corresponds to maximizing li kelihood\, and why various assumption violations make maximization impossi ble. Show how bayesian perspective still works\, just gives different pict ure.\n4. Many violations of the rules of linear regression stem from unrea sonable solutions to the problem. I'll show that if we incorporate reasona ble assumptions into our model (Bayesian priors)\, then we get reasonable results out. \n4a) If you do naive OLS on batter vs bowler data set\, you get crazy results for batters who've only played in 1 or 2 games. But you can fix this by making reasonable assumptions and putting them into the ma th.\n5. Show how different Bayesian priors correspond to many regression t ricks\, e.g. ridge regression\, l1 regularization\, etc. No magic here - j ust express assumptions as math\n6. If the data violates our assumptions\, just change our assumptions. Bayesian regression still works. \n6a) Error s in sports data are not normally distributed. But we can fix that! \n\nEn d goal of this talk: if you have highly correlated input data\, non-normal errors\, domain knowledge exceeding input data\, or other common problems \, you shouldn't get stuck. You might need to custom hack some tools LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/so-you-think-you-know- about-linear-regression-4LAQEmP8w5iZMZGrMmg9au BEGIN:VALARM ACTION:display DESCRIPTION:So you think you know about linear regression ... 
in Auditoriu m 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:A study in classification DTSTART;VALUE=DATE-TIME:20180726T043000Z DTEND;VALUE=DATE-TIME:20180726T050000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/AK4n1R1VwEJP9s9cHCHuUZ@hasgeek.com CATEGORIES:Crisp talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T112253Z DESCRIPTION:### Introduction [3-4 mins]\n\nAn introduction to the ML probl em at hand (an import/export related classification task). Examples will b e presented to highlight the complexity of tasks involved. This section wi ll also be used to explain the real-world implications of the system that we aim to develop. The use-case introduced in this section\, will be conti nuously referred to throughout the talk.\n\n---\n\n### Starting steps [5 m ins]\n\nThis section will describe the ideal first steps to start with. Ap proaches to analyze the dataset will be presented. Expected outcomes will be discussed\, together with the need to develop baseline guarantees.\n\nT opics Covered\n\n- dataset considerations\n- problem solving by pattern ma tching\n- analyzing existing workflows (aka the system you are looking to make redundant)\n- calibrating expectations\n\n---\n\n### Advanced conside rations [5 mins]\n\nIn more complicated scenarios\, additional (business-d riven) objectives need to be considered before making decisions. This sect ion will talk about how involving other project stakeholders can drastical ly affect your own internal roadmap towards a successful ML product.\n\nTo pics covered\n\n- business context considerations\n- other stakeholder inv olvement\n\n---\n\n### Deployment and continuous learning [5 mins]\n\nGive n the knowledge learned in the earlier sections\, we can now focus on what makes a ML deployment successful. The advantages to having a "human-in-th e-loop" workflow will also be presented here. 
By introducing additional ch eckpoints at multiple stages and continuous monitoring\, effective quantit ative assessments can be carried out.\n\nTopics covered\n\n- deployment sc enarios\n- human-in-the-loop augmentation\n- effective monitoring outcomes \n\n---\n\n### Conclusion [3-4 mins]\n\nThis section will serve as a recap of the entire talk. The approach followed through the earlier sections wi ll be summarized and hopefully presented as a generalizable approach for o thers. LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/a-study-in-classificat ion-AK4n1R1VwEJP9s9cHCHuUZ BEGIN:VALARM ACTION:display DESCRIPTION:A study in classification in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Sponsored workshop: Machine Learning with Amazon SageMaker DTSTART;VALUE=DATE-TIME:20180726T043000Z DTEND;VALUE=DATE-TIME:20180726T060000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/XCmuti2xdt5bbVvqq5qqg5@hasgeek.com CREATED;VALUE=DATE-TIME:20180521T011836Z LAST-MODIFIED;VALUE=DATE-TIME:20180724T015245Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Sponsored workshop: Machine Learning with Amazon SageMaker in Auditorium 3 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Improving product discovery via relevance and ranking optimization DTSTART;VALUE=DATE-TIME:20180726T045000Z DTEND;VALUE=DATE-TIME:20180726T053000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/KmSbPkNe2Q6jJMdcSHyEP5@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T112209Z DESCRIPTION:The talk will cover following topics :\na) User shopping journ ey at Flipkart and importance of product discovery\nb) Types of product 
re commendations : similar products\, cross selling etc.\nc) Architecture of Recommender System : Relevance and Ranking modules\nd) Using product textu al and visual attributes for computing product similarity \ne) Using crowd sourced activity data to compute the set of relevant products\nf) Formulat ion of ranking as a machine learning problem towards optimising conversio n rates\ng) Our learnings from various iterations over feature-sets and ML models\n LAST-MODIFIED;VALUE=DATE-TIME:20180720T122305Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/improving-product-disc overy-via-relevance-and-ranking-optimization-KmSbPkNe2Q6jJMdcSHyEP5 BEGIN:VALARM ACTION:display DESCRIPTION:Improving product discovery via relevance and ranking optimiza tion in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Morning beverage break DTSTART;VALUE=DATE-TIME:20180726T050000Z DTEND;VALUE=DATE-TIME:20180726T053000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/XUfF5Snf1NwSiRfQCP9wkv@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T141110Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T141120Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Morning beverage break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Keynote: The power of intuition in data science\, and why it will always have a role. DTSTART;VALUE=DATE-TIME:20180726T053000Z DTEND;VALUE=DATE-TIME:20180726T061500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/EL7Wu53JjeWvAKxPw5fi7X@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180704T073426Z DESCRIPTION:- Context setting: why does a subject driven by data need the notion of intuition? 
Isn’t intuition akin to black magic?\n- What is int uition: perspectives and definitions\n- Where does intuition come from? Wh at is the science behind it?\n- Why is intuition needed even in data scien ce\, where we have abundant data\n- What does it take to develop intuition as a data scientist? Our 7 tips\n- Is intuition the Data scientist’s de fence against replacement by machine? LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/the-power-of-intuition -in-data-science-and-why-it-will-always-have-a-role-EL7Wu53JjeWvAKxPw5fi7X BEGIN:VALARM ACTION:display DESCRIPTION:Keynote: The power of intuition in data science\, and why it w ill always have a role. in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Morning beverage break DTSTART;VALUE=DATE-TIME:20180726T053000Z DTEND;VALUE=DATE-TIME:20180726T061500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/5QgSEsHWC6fD3y8EwggtiT@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T143902Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T122334Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Morning beverage break in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Morning beverage break DTSTART;VALUE=DATE-TIME:20180726T060000Z DTEND;VALUE=DATE-TIME:20180726T063000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/XneHSR4HHBTUKBvzCD1iKX@hasgeek.com CREATED;VALUE=DATE-TIME:20180611T110741Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T141143Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Morning beverage break in Auditorium 3 in 5 minutes TRIGGER:-PT5M 
END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Using structural estimation methods from economics to model user b ehaviour in bike-sharing systems DTSTART;VALUE=DATE-TIME:20180726T061500Z DTEND;VALUE=DATE-TIME:20180726T070500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/TNWFcLkAmxiHSEZhnqfXd4@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T112905Z DESCRIPTION:* Introduction of bike-share context\n* Challenge of estimatin g user preference from station level data\n* Model formulation and estimat ion\n * Computational Challenge - Solution\n* Illustration of prescript ive power of method to solve system design problems LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/using-structural-estim ation-methods-from-economics-to-model-user-behaviour-in-bike-sharing-syste ms-TNWFcLkAmxiHSEZhnqfXd4 BEGIN:VALARM ACTION:display DESCRIPTION:Using structural estimation methods from economics to model us er behaviour in bike-sharing systems in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Needle in a haystack: entity search on text and graph DTSTART;VALUE=DATE-TIME:20180726T061500Z DTEND;VALUE=DATE-TIME:20180726T065500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/RUJtd3Fv6qUBDXehoVT3W6@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T112714Z DESCRIPTION:Draft slides: https://docs.google.com/presentation/d/1hT1LfaX- jK0HpC8I41CXkwd4qONjRJYt4cDeReapn5k/edit?usp=sharing LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/needle-in-a-haystack-e ntity-search-on-text-and-graph-RUJtd3Fv6qUBDXehoVT3W6 BEGIN:VALARM ACTION:display 
DESCRIPTION:Needle in a haystack: entity search on text and graph in Audit orium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Sponsored workshop: Machine Learning with Amazon SageMaker - cont inued DTSTART;VALUE=DATE-TIME:20180726T063000Z DTEND;VALUE=DATE-TIME:20180726T080000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/CV7dvm3raLsHivh6M87sCr@hasgeek.com CREATED;VALUE=DATE-TIME:20180521T011914Z LAST-MODIFIED;VALUE=DATE-TIME:20180724T015235Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Sponsored workshop: Machine Learning with Amazon SageMaker - continued in Auditorium 3 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Building analytics application with streaming expressions in Apach e Solr DTSTART;VALUE=DATE-TIME:20180726T065500Z DTEND;VALUE=DATE-TIME:20180726T073500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/B3NpN2RNhHWMch4EKAz2bu@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T113135Z DESCRIPTION:- Challenges building analytics applications with real-time da ta \n- Introduction to Streaming Expressions and Overview\n- Sources\, Dec orators and Evaluators\n- Short solutions from simple to complex use-cases optimised \n- Statistical Programming with use-case\n- Conclusion LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/building-analytics-app lication-with-streaming-expressions-in-apache-solr-B3NpN2RNhHWMch4EKAz2bu BEGIN:VALARM ACTION:display DESCRIPTION:Building analytics application with streaming expressions in A pache Solr in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Our experiments with food recommendations @Swiggy 
DTSTART;VALUE=DATE-TIME:20180726T070500Z DTEND;VALUE=DATE-TIME:20180726T073500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/U6qGQkPS6oLEZ1jPQ6K3j5@hasgeek.com CATEGORIES:Crisp talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T112959Z DESCRIPTION:We want to talk all about Art & Science of Food Discovery @Swi ggy. How we use advanced Machine Learning/AI on terabytes of data ( implic it/Explicit Feedback ) everyday\, to bring you recommendations that powers Restaurant Feeds\, Filter Widgets\, Personalized Collections. \n \nWe wil l also be talking about our Journey\, Learning and Challenges of building Food Recommendation System. \n \n \n\n* Overview\n* Recommendation @Swigg y \n* Evolution of Recommendation Systems\n* CF & Content Based Methods \n * Learning to rank\n* Understanding Food Catalog\n* Meals Recommendations. \n* Page generation.\n LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/our-experiments-with-f ood-recommendations-swiggy-U6qGQkPS6oLEZ1jPQ6K3j5 BEGIN:VALARM ACTION:display DESCRIPTION:Our experiments with food recommendations @Swiggy in Auditori um 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180726T073500Z DTEND;VALUE=DATE-TIME:20180726T083500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/BjNjtvTND7CUFyiQPfcnvu@hasgeek.com CREATED;VALUE=DATE-TIME:20180720T122418Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T122426Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180726T073500Z DTEND;VALUE=DATE-TIME:20180726T083500Z 
DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/9XCi7KXWeb8nsiB9zFf1VW@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T141901Z LAST-MODIFIED;VALUE=DATE-TIME:20180704T073457Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180726T080000Z DTEND;VALUE=DATE-TIME:20180726T090000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/PDc7cwaUheJ8215cqL9om2@hasgeek.com CREATED;VALUE=DATE-TIME:20180609T055249Z LAST-MODIFIED;VALUE=DATE-TIME:20180611T115003Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in Auditorium 3 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Women in data science DTSTART;VALUE=DATE-TIME:20180726T083500Z DTEND;VALUE=DATE-TIME:20180726T092000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/NduEH5KJ7xNUYQs98HdyWE@hasgeek.com CREATED;VALUE=DATE-TIME:20180722T072153Z DESCRIPTION:TBA LAST-MODIFIED;VALUE=DATE-TIME:20180722T075349Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Women in data science in BOF a rea in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Design for data DTSTART;VALUE=DATE-TIME:20180726T083500Z DTEND;VALUE=DATE-TIME:20180726T091500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/K4zMFbqMdCjUpuBSaFummx@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180720T121015Z DESCRIPTION:* Introduction: the framework of Workflow\, Data\, Algorithm f or AI/ML projects.\n* What is data? 
A _representation_ of a part of the wo rld that we care about.\n* The Data Generating Process\n* * The data colle ction process (the technology and operations by which data reaches a datab ase)\n* * The statistical model\n* * The probabilistic model\n* Data Quali ty as a function of data use - availability and visibility\n* * Knowing th e past readily - before predicting the future\n* The Complexity of Taking Action on the World - Learning from Machine Learning\n* * Tracking and sto ring models\, predictions\, and results \n* Conclusion and Takeaways LAST-MODIFIED;VALUE=DATE-TIME:20200619T062515Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/design-for-data-K4zMFb qMdCjUpuBSaFummx BEGIN:VALARM ACTION:display DESCRIPTION:Design for data in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Serviceability under high demand DTSTART;VALUE=DATE-TIME:20180726T083500Z DTEND;VALUE=DATE-TIME:20180726T091500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/Aa6EDVKiLD5RwZKLWVvYY8@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T113359Z DESCRIPTION:I will first describe the nature and scope of the problem and why it requires a multi-pronged effort to deal with it. I will show how i deas from time-series\, operations research\, machine learning and simulat ions come together for solving these problems. 
The topics covered will inc lude:\n\n(1)Forecasting Demand\n(2)Characterizing stress of delivery syste m\n(3)Real-time paring of demand\n(4)Delivery leg predictions\n(5)Order Qu eue Dynamics - Inflow and Outflow\n(6)Batching of orders\n(7)Supply side p arameter control\n LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/serviceability-under-h igh-demand-Aa6EDVKiLD5RwZKLWVvYY8 BEGIN:VALARM ACTION:display DESCRIPTION:Serviceability under high demand in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Solr users' BOF DTSTART;VALUE=DATE-TIME:20180726T091500Z DTEND;VALUE=DATE-TIME:20180726T102000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/AwxAwxFcSFK4pJaLrYA3qw@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144106Z DESCRIPTION:Facilitators and participants will share experiences and insigh ts in the process of Solr usage (and non-usage)\, advantages and drawbacks of Solr\, alternatives to Solr\, and how Solr has been thought of and use d given the problems that users and potential adopters want to solve + giv en the specificity of domains (which aggravates the pros and cons of Solr usage). 
LAST-MODIFIED;VALUE=DATE-TIME:20180725T061614Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Solr users' BOF in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Compromising a $6B big data project through poor data quality: the Aadhaar case study DTSTART;VALUE=DATE-TIME:20180726T091500Z DTEND;VALUE=DATE-TIME:20180726T095500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/NM6Z2UuULHULR8xTmR6u6v@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T113238Z DESCRIPTION:1. The Aadhaar enrollment software and how it works. \n2. The Data quality checks in the software for maximizing enrollment success. \n3 . Additional meta data created by the software for successful fraud detect ion in the back end.\n4. Case Study 1 - Data pollution using exceptions - The ILF&S fraud case and how the humble postman detected it but not Big da ta analytics. \n5. Case Study 2 - The Accelerating data quality errors - H ow UIDAI missed the tea leaves. \n6. Case Study 3 - The UP Aadhaar hack ca se - Why the first version of the software only had biometric overrides. \ n7. Case Study 4 - The Punjab hack case - Why it only had fraud detection overrides (such as GPS)\n8. Case Study 5 - The Bengal hack case - Why it h ad overrides for biometric data quality overrides. \n9. Case Study 6 - The missing Identity documents\n10. Cost benefit analysis from a fraudster's point of view\, fighting against a Big data analytics engine. \n\nEnd goal of this talk is to make attendees recognize that\n* Scaling data acquisit ion systems deployed on a country-wide basis creates novel challenges that can fully compromise data quality. \n* Offline data acquisition systems (E ventually consistent) need full tamper proofing for analytics to be effect ive. 
\n* Sentient opponents facing a machine driven intelligence\, will fo cus on corrupting its data inputs and be successful. \n\n\n\n\n\n\n\n LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/compromising-a-6b-big- data-project-through-poor-data-quality-the-aadhaar-case-study-NM6Z2UuULHUL R8xTmR6u6v BEGIN:VALARM ACTION:display DESCRIPTION:Compromising a $6B big data project through poor data quality: the Aadhaar case study in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Weaponizing data for politics DTSTART;VALUE=DATE-TIME:20180726T095500Z DTEND;VALUE=DATE-TIME:20180726T103500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/3vQuhCmXRRR6Qy5dLUPwAU@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T113203Z DESCRIPTION:The aim of the presentation is to give people an idea of how p olitical parties can use data to shape a narrative during an election. It' ll start with what types of data can be used\, how this data is converted into strategy\, how the strategy is executed on the ground. The talk would also cover how fake news is spread using insights from data and will end with raising the ethical issues surrounding the use of data in politics. 
LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/weaponizing-data-for-p olitics-3vQuhCmXRRR6Qy5dLUPwAU BEGIN:VALARM ACTION:display DESCRIPTION:Weaponizing data for politics in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Evening beverage break DTSTART;VALUE=DATE-TIME:20180726T102000Z DTEND;VALUE=DATE-TIME:20180726T105000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/PSByWtJsRPafvB2HNbgCjb@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144209Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T122457Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Evening beverage break in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Evening beverage break DTSTART;VALUE=DATE-TIME:20180726T103500Z DTEND;VALUE=DATE-TIME:20180726T110000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/Azf86kf36KwskRgAtBtBPH@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T142255Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T122459Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Evening beverage break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Flash talks – by audience DTSTART;VALUE=DATE-TIME:20180726T105000Z DTEND;VALUE=DATE-TIME:20180726T111000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/6FPLKwTjRgQK5VuKsDh9SF@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144231Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T144237Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display 
DESCRIPTION:Flash talks – by audience in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Data Science in Production BOF DTSTART;VALUE=DATE-TIME:20180726T110000Z DTEND;VALUE=DATE-TIME:20180726T120000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/CLa9DgBfRwawu2F3u67ToN@hasgeek.com CREATED;VALUE=DATE-TIME:20180722T052125Z DESCRIPTION:The DataOps BOF intends to address:\n\n1. Practical\, day-to-d ay concerns. \n2. Choice of platforms\, cloud providers (or otherwise)\, a nd tools -- and the trade-offs that you have to make in the process. \n3. Team organization and workflows.\n4. Challenges and bottlenecks. \n5. Succe ss stories. LAST-MODIFIED;VALUE=DATE-TIME:20180726T051054Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Data Science in Production BOF in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Qubole Sparklens: understanding the scalability limits of Spark ap plications DTSTART;VALUE=DATE-TIME:20180726T110000Z DTEND;VALUE=DATE-TIME:20180726T114000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/EbGAB1vvePmZ2p6w15Yz5k@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180720T120455Z DESCRIPTION:1) Single threaded applications\n2) Multi-threaded application s\n3) Distributed applications using spark \n4) When the application "does nothing" and why?\n5) Driver\, Parallelism & Skew \n6) Critical Path of sp ark application\n7) Defining Ideal Spark application \n8) Introduction to Sparklens \n9) Understanding Sparklens report\n10) Where to fish for furth er improvements LAST-MODIFIED;VALUE=DATE-TIME:20200619T062515Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com 
URL:https://hasgeek.com/fifthelephant/2018/schedule/qubole-sparklens-under standing-the-scalability-limits-of-spark-applications-EbGAB1vvePmZ2p6w15Yz 5k BEGIN:VALARM ACTION:display DESCRIPTION:Qubole Sparklens: understanding the scalability limits of Spar k applications in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Math for data science DTSTART;VALUE=DATE-TIME:20180726T111000Z DTEND;VALUE=DATE-TIME:20180726T121000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/UuLB8ar73WwaHoiZ8XdEbT@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144250Z DESCRIPTION:This BOF will address: \n\n1. Where does one start -- is it ma th\, or is it the problem that you are trying to solve? \n2. Why data sci ence? Therefore\, what is it that someone with and without specialized tra ining can and cannot do? \n3. From the above two questions\, where does ma th stand and therefore\, based on the facilitators' personal experiences\, how can participants create their own learning journeys?\n LAST-MODIFIED;VALUE=DATE-TIME:20180722T075819Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Math for data science in Audit orium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Discussing the agenda for The Fifth Elephant 2019: an open discuss ion DTSTART;VALUE=DATE-TIME:20180726T120000Z DTEND;VALUE=DATE-TIME:20180726T124500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/Q8FbW2apYpXgvAgTxJQSV6@hasgeek.com CREATED;VALUE=DATE-TIME:20180723T113453Z DESCRIPTION:The Fifth Elephant has evolved quite a bit over the last few y ears. A great deal of its evolution is owing to inputs from the community\ , past speakers and practitioners who have been part of the community. \n\ nThis year\, we want to open up the agenda discussion at the conference it self. 
\n\n1. What are the emerging needs of the ecosystem? How do we addre ss these needs in this year\, leading up to The Fifth Elephant 2019?\n2. W hat are individuals' needs versus organizations' needs?\n3. How do we unde rstand and articulate people and organizational issues\, given that The Fi fth Elephant has been a conference about cutting edge data engineering tec hnology and data science practice?\n4. Any other issues we'd like to discu ss that will help us frame the agenda better. \n\nThis is an open session. All participants are invited to be part of this session. LAST-MODIFIED;VALUE=DATE-TIME:20180723T113500Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Discussing the agenda for The Fifth Elephant 2019: an open dis cussion in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Check-in and breakfast DTSTART;VALUE=DATE-TIME:20180727T021500Z DTEND;VALUE=DATE-TIME:20180727T033000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/6jqHjC1TYLV6372cVMLha7@hasgeek.com CREATED;VALUE=DATE-TIME:20180521T011805Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T142437Z ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Check-in and breakfast in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Operating data pipeline using Airflow @ Slack DTSTART;VALUE=DATE-TIME:20180727T033000Z DTEND;VALUE=DATE-TIME:20180727T041000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/EvkUuk7d7qg3cQNrQKLtKi@hasgeek.com CATEGORIES:Full talk,Advanced CREATED;VALUE=DATE-TIME:20180703T113826Z DESCRIPTION:- Intro to slack and the data engineering team\n- problem stat ement and the customer complaints.\n- Overview of Airflow infrastructure a nd deployment workflow\n- Scale Airflow Local Executor.\n- Data pipeline o perations.\n- Alerting and monitoring data pipeline. 
LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/operating-data-pipelin e-using-airflow-slack-EvkUuk7d7qg3cQNrQKLtKi BEGIN:VALARM ACTION:display DESCRIPTION:Operating data pipeline using Airflow @ Slack in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Scalability truths and serverless architectures: why it is harder with stateful\, data-driven systems DTSTART;VALUE=DATE-TIME:20180727T041000Z DTEND;VALUE=DATE-TIME:20180727T045000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/SEgJxKQYTVmpc7EEqkVbVA@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114029Z DESCRIPTION:* Defining scalability - as applied to stateless and stateful systems\n* Stateless service - case of state pushed to a stateful layer\n* Database/Data store for stateful systems. Choices of such stores - Relati onal\, Append-only etc\n* Distributing stateful compute\, things to take c are of\n* Introduction to serverless architecture\, what to expect. 
Servic es available\n* Building your own stateful serverless compute engine - the Flux example\n* Data engineering for stateful systems - scaling from sing le node to multi-node cluster on the network LAST-MODIFIED;VALUE=DATE-TIME:20180720T121623Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/scalability-truths-and -serverless-architectures-why-it-is-harder-with-stateful-data-driven-syste ms-SEgJxKQYTVmpc7EEqkVbVA BEGIN:VALARM ACTION:display DESCRIPTION:Scalability truths and serverless architectures: why it is har der with stateful\, data-driven systems in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Segmenting 500 million users using Airflow + Hive DTSTART;VALUE=DATE-TIME:20180727T041000Z DTEND;VALUE=DATE-TIME:20180727T043000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/D5d9rZe3mh1uT436JRWezy@hasgeek.com CATEGORIES:Crisp talk,Intermediate CREATED;VALUE=DATE-TIME:20180720T121644Z DESCRIPTION:- Problem Statement - Segmenting 500 million Users using data from 20+ different sources.\n- Generating the customer data \n- Join the c ustomer data from multiple sources\n- Data sanitization and reliability ch ecks\n- Publishing the data for easy use LAST-MODIFIED;VALUE=DATE-TIME:20180720T121653Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/segmenting-500-million -users-using-airflow-hive-D5d9rZe3mh1uT436JRWezy BEGIN:VALARM ACTION:display DESCRIPTION:Segmenting 500 million users using Airflow + Hive in Auditoriu m 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Improve data quality using Apache Airflow and check operator DTSTART;VALUE=DATE-TIME:20180727T043000Z DTEND;VALUE=DATE-TIME:20180727T045000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z 
UID:session/62sr5KcqqEG8nVCKiTDbtA@hasgeek.com CATEGORIES:Crisp talk,Intermediate CREATED;VALUE=DATE-TIME:20180720T121706Z DESCRIPTION:1. Data quality issues we faced with data ingestion/transforma tion. \n2. Approach we have adopted using Apache airflow check operators.\ n3. Enhancements we had to make to Check operators.\n4. Integration of Apa che Airflow Check operators with our ETLs. \n5. Challenges faced in devel oping the alerting framework. \n6. Lesson learnt and best practices in usi ng Apache Airflow for data quality checks.\n7. Limitations and Future work .\n LAST-MODIFIED;VALUE=DATE-TIME:20180720T121711Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/improve-data-quality-u sing-apache-airflow-and-check-operator-62sr5KcqqEG8nVCKiTDbtA BEGIN:VALARM ACTION:display DESCRIPTION:Improve data quality using Apache Airflow and check operator i n Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Tutorial: Deep learning based hybrid recommendation systems in Ten sorFlow DTSTART;VALUE=DATE-TIME:20180727T043000Z DTEND;VALUE=DATE-TIME:20180727T060000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/JzjfUDRoeTeUttQK16p4Ts@hasgeek.com CATEGORIES:Workshop,Intermediate CREATED;VALUE=DATE-TIME:20180703T113906Z DESCRIPTION:Slides are also uploaded at the Strata website. 
We would need to cut down and extract a small subset of slides from here:\nhttps://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63818 LAST-MODIFIED;VALUE=DATE-TIME:20180724T015326Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/deep-learning-based-hybrid-recommendation-systems-in-tensorflow-JzjfUDRoeTeUttQK16p4Ts BEGIN:VALARM ACTION:display DESCRIPTION:Tutorial: Deep learning based hybrid recommendation systems in TensorFlow in Auditorium 3 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:User Response Prediction at Scale DTSTART;VALUE=DATE-TIME:20180727T045000Z DTEND;VALUE=DATE-TIME:20180727T053000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/3TXdKtrx2sEy75HaD3ebwF@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180629T144331Z DESCRIPTION:1. User Response Prediction-\n a) Problem motivation.\n2. Data and Domain Nuances (I)-\n a) The purchase funnel. \n b) Desktop v/s Mobile. \n c) Click v/s Conversion. \n d) Bid != Scores.\n3. Building Offline Models - \n a) Problem Formulation - often ignored but extremely important. \n b) Data collection and feature engineering at scale. \n c) Building scalable model pipelines in Spark-Scala.\n4. Deploying Models Online - \n a) Scores to Bids - Calibration\, Scaling\, Inventory\, Context. \n b) Challenges - Realtime\, Spark streaming\, A/B testing. \n c) Wins - A/B test against a third-party advertiser.\n6. Data and Domain Nuances (II) - \n a) Problem of multiple user touchpoints. \n b) Robust Factorization Machines.\n7. Key Learnings -\n a) The only thing more important than Data is - Nothing. \n b) Plan big. Start small. Iterate. \n c) A/B tests - Last mile. \n d) Innovate.\n\n(Robust Factorization Machines - This is our research work accepted at the WWW'18 to be presented in April 2018.)
LAST-MODIFIED;VALUE=DATE-TIME:20180720T121714Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/user-response-predicti on-at-scale-3TXdKtrx2sEy75HaD3ebwF BEGIN:VALARM ACTION:display DESCRIPTION:User Response Prediction at Scale in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Building a next generation speech and NLU engine: in pursuit of mu lti-modal experience for Bixby DTSTART;VALUE=DATE-TIME:20180727T045000Z DTEND;VALUE=DATE-TIME:20180727T053000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/Bni63NmbmVFpB5rfizWMHN@hasgeek.com CATEGORIES:Crisp talk,Intermediate CREATED;VALUE=DATE-TIME:20180720T121853Z DESCRIPTION:Bixby is an intelligent\, personalized voice interface for you r phone. It lets you seamless switch between voice & type/touch\, and supp orts more than 75 domains (eg. Camera\, Gallery\, Messages\, WhatsApp\, Yo utube\, Uber etc.). It was launched in July 2017 for English and is now av ailable on more than 200 countries with about 8 million registered users. \n\nMy talk focuses on challenges in deep learning for Bixby Automatic Spe ech Recognition & Natural Language understanding\, ranging from CNN vs. RN Ns\, Word vs. Character based models\, Domain Classification challenges gi ven the massive contextual input space\, Grammar complexity\, Multi-modal and Multi-accent handling. We go into details of hierarchical classificati on\, session based classification\, intent rejection logic… Also about t he tradeoffs between RNNs and CNNs\, Optimal filter sizes for CNNs\, Handl ing variations of data and conflicts between data. Also go into use of Tra nsfer learning and Bilingual models for Bixby for Hindi\n\nWhen you look a t processing steps of voice engine\, it typically is like this. User speak s an utterance\, for example “text to mom”. 
Then NLU engine tries to u nderstand what domain the user is talking about\, what command the user wa nts to execute\, and extract the required parameters for execution in slot tagger. \n\nIn a minimalistic view\, Bixby accepts voice signals with its Automatic Speech Recognition engine\, and then give transcribed text to i ts Natural Language Processing engine. Then NLU engine extracts the inform ation required for execution\, and send it to devices or CP services. \n\n Bixby Automatic Speech Recognition (ASR) was earlier optimized for US Engl ish accent only. In our testing\, we found that it did not perform as well as expected. The root cause was that there are many people of Indian\, Ko rean\, Chinese and Spanish origin residing in US and the ASR did not work so well for them. So we trained ASR models optimized for Indian English\, Korean English\, Chinese English and Spanish English using transfer learni ng to save training time as well computing resources. Then\, we had to fin d a way to load the model that would best for the individual's voice. We i ncorporated an accent determination step at the Bixby onboarding time and the user is asked to speak five sentences. Word recognition accuracy is me asured for all these models and we select the model using ASR performance as well as other cues such as Keyboard\, Contact information. The accent s election once determined will be used as default.\n\nOne big difference of Bixby is that we tried to build a multi-modal system which supports both touch and voice interface\, so that a user can execute the same function w ith touch or voice. This 1st version of Bixby we call it Bixby 1.0\n\nUsua lly voice assistants classify user utterances into commands not caring muc h about the screen status. In Bixby 1.0\, we try to understand user utter ances based on their screen context too. 
So “find James” in contact ap plication should give you the contact information of James\, And “find James” in Gallery application should give you the images tagged as James . \n\nTo support that kind of multi-modality\, we modeled application scre ens as contexts of dialog management system. So we should have added cont ext awareness to traditional NLU to build a multi-modal NLU engine. The pr oblem was there were thousands of different screens that should be modeled as different contexts. Moreover\, we needed another kind of challenge in context awareness which is coming from supporting many device types. The s et of commands vary from device to device because they have delta function s according to models and locales. So we needed to consider the changes of command set as well.\n\nNow let’s look at the first challenge\, which i s the challenge of massive contextual input space. The input to NLU engine is now not only the utterances for 6\,000 commands\, but also the context of where the user started talking. So like I presented in the previous sl ide\, “find james” in gallery application should work differently to “find james” in contact application. If we model it in a dumbest way\, we can maintain a command classifier per each context. This will be best in performance\, but the developing cost is prohibitive. It means training and maintaining 2\,000 classifiers. We have a hierarchical classifier in place – meta domain (for some domains)\, domain\, intent.. As we have a session based architecture. Once we are inside the session\, we go to inte nt classification directly (bypassing domain classification). In case the intent classification rejects\, it takes the output of the domain classifi er. \n\nRNN Domain Classification was designed as word-based model. The mo del converges fast. But it had issues of unknown words. And it was perform ing poorly for variations of the client state from where the utterance was generated. 
Due to this reason Domain Classification was moved to characte r based CNN model\, where data is more and build time is also increased. \n\nWord based model has known problem of unknown words. Whereas character based model does not have any unknowns. But character based model is not good at making a difference between different words\, having similar spell ing. For example “search for s8 plus”\, goes to calculator domain due to presence of similar character sequence “8 plus” in calculator domai n.\n\nFor such a huge input space\, there were extreme variations of data. That includes lots of unknown words\, during training phase. The unknowns were issues for accuracy in lots of domains. That led us to experiment on the possibilities on CNN with Respect to RNN\n\nThere were issues of misc lassifications for the word inflections (when the word boundary goes beyon d the representation). The CNN was candidate of research to counter the i nflection problem\, which we faced in the RNN. In RNN\, the state was not getting learnt… Sentence is represented in vector space but it was too h uge for the word based RNN to handle. Also\, unknown words were not being handled with the word based RNN. So we went into CNN. This for both domain and intent classification. For the tagger\, we continued with RNN.\n\n\nW hen the migration was done to CNN\, then there was a question on the optim al filter size for the CNN design.We conducted various experimentation on different combinations of values of N in N-Gram for CNN Filters. Typically shorter values of N was used for sub-word level features. And in the same time\, larger values of N is used for understanding the language structur es. Various experiments were conducted to determine the best filter sizes to achieve the commercial quality accuracy. We have multiple filters with various sizes (2x2\, 4x4\, 6x6 etc.). 
We have another layer of CNN which gives the final output with a probabilistic score.\n\nFor such a huge input space\, there were extreme variations of data. At the same time\, there exist similarities between the data. So we needed some tools to help resolve such data conflicts. We used techniques such as tf-idf\, cosine similarity and policy conflict concept words to deal with this problem.\n\nAs discussed earlier\, we built the DNN classifier to take the context as input as well as utterances. Now we are good as we have just one classifier for every context. But we still need to train this neural network with utterances in different contexts. For example\, an utterance A should be mapped to command 1 when it is under context alpha or beta\, while utterance B needs to be mapped to command 1 at context alpha\, and command 2 at context beta. If you want to maintain the training set like this\, it will serve your purpose but training time and maintenance cost will still be prohibitive. So we needed a nice sampling algorithm to pick up the necessary training data. How well the sampling works ultimately determines the fluency of context understanding. Samsung is recognized for making various device models\, throughout the year. When we have multi-modality\, the various device models will have their differences in UX. That’s a challenge for Bixby: to handle a wide variety of output spaces. The architecture here shows the handling of a variable output space.\n\nWe have evaluated our Bixby 1.0 architecture for its adaptability to other languages. We have taken Hindi as our language for experimentation.\nIn India\, the spoken Hindi is not strict Hindi. It’s a mix of other languages as well. Mostly it uses English in it. We have used Bilingual Modeling to solve this issue. We have also experimented with a neural machine translation system to translate the input data from English to Hindi. This worked. We also experimented with transliteration.
This also worked but debugging/management was not so go od in both these.\n\n\n LAST-MODIFIED;VALUE=DATE-TIME:20200619T062515Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/building-a-next-genera tion-speech-and-nlu-engine-in-pursuit-of-multi-modal-experience-for-bixby- Bni63NmbmVFpB5rfizWMHN BEGIN:VALARM ACTION:display DESCRIPTION:Building a next generation speech and NLU engine: in pursuit o f multi-modal experience for Bixby in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Morning beverage break DTSTART;VALUE=DATE-TIME:20180727T053000Z DTEND;VALUE=DATE-TIME:20180727T060000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/6y2Tp4Vvo3GsnhnECANbPQ@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T143015Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T143025Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Morning beverage break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Morning beverage break DTSTART;VALUE=DATE-TIME:20180727T053000Z DTEND;VALUE=DATE-TIME:20180727T060000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/P2ASjJ5vis6a7Ydit4sNA7@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144509Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T121750Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Morning beverage break in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Airflow users' BOF DTSTART;VALUE=DATE-TIME:20180727T060000Z DTEND;VALUE=DATE-TIME:20180727T064500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/EX4rRN9x7MhdLCQcKyr8VY@hasgeek.com 
CREATED;VALUE=DATE-TIME:20180722T023708Z DESCRIPTION:The discussion will focus on: \n\n1. How facilitators and participants *stumbled* on Airflow and what your one key learning has been. \n2. Advantages and drawbacks of Airflow - where Airflow fits and does not fit in the context of the problem you are solving\, and the specificity of your domain.\n LAST-MODIFIED;VALUE=DATE-TIME:20180722T080119Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Airflow users' BOF in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Sponsored talk: Market propensity modelling using XStream: unified self-service analytics ETL and ML platform DTSTART;VALUE=DATE-TIME:20180727T060000Z DTEND;VALUE=DATE-TIME:20180727T064000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/F9sD76rZm2Tfv6ZqGQjarP@hasgeek.com CATEGORIES:Sponsored talk,Advanced CREATED;VALUE=DATE-TIME:20180703T114117Z DESCRIPTION:Introducing XStream\nFeatures of the Product\nMachine Learning Use Case (Realtime Market Propensity Modeling) using XStream LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/market-propensity-modelling-using-xstream-unified-self-service-analytics-etl-and-ml-platform-F9sD76rZm2Tfv6ZqGQjarP BEGIN:VALARM ACTION:display DESCRIPTION:Sponsored talk: Market propensity modelling using XStream: unified self-service analytics ETL and ML platform in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Michelangelo: Uber's machine learning platform DTSTART;VALUE=DATE-TIME:20180727T060000Z DTEND;VALUE=DATE-TIME:20180727T064000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/RRSqnuLqRGvTWkM9vcwrmT@hasgeek.com CATEGORIES:Full
talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114606Z DESCRIPTION:Slides are a work in progress LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/michelangelo-ubers-mac hine-learning-platform-RRSqnuLqRGvTWkM9vcwrmT BEGIN:VALARM ACTION:display DESCRIPTION:Michelangelo: Uber's machine learning platform in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Morning beverage break DTSTART;VALUE=DATE-TIME:20180727T060000Z DTEND;VALUE=DATE-TIME:20180727T063000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/YQ8Ya6MKHnX6hcUNjWHJGx@hasgeek.com CREATED;VALUE=DATE-TIME:20180522T024940Z LAST-MODIFIED;VALUE=DATE-TIME:20180618T070319Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Morning beverage break in Auditorium 3 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Tutorial: Deep learning based hybrid recommendation systems in Ten sorFlow – continued DTSTART;VALUE=DATE-TIME:20180727T063000Z DTEND;VALUE=DATE-TIME:20180727T080000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/CWHUqhcjzZWybaqbnJtv3A@hasgeek.com CREATED;VALUE=DATE-TIME:20180609T054637Z LAST-MODIFIED;VALUE=DATE-TIME:20180609T102119Z LOCATION:Auditorium 3 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Tutorial: Deep learning based hybrid recommendation systems in TensorFlow – continued in Auditorium 3 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Data science for business: adopting analytics without paralysis DTSTART;VALUE=DATE-TIME:20180727T064000Z DTEND;VALUE=DATE-TIME:20180727T072000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z 
UID:session/35k3fXr68KHo45DcE5QSzh@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T114545Z DESCRIPTION:A bunch of factors have led companies to become data rich as compared to companies from the past. \n\nBut having data alone is not good enough. Through case studies we will explore how companies can work to get their management to think more analytically & how they can create a culture where data scientists can thrive. \n\nAnd how you can teach data scientists to socialize their learnings so that once the data science capability has been developed for one application\, other applications throughout the business become obvious. Storytelling with Data is becoming much more common today because of both vast amounts of data being available in the public space & also the emergence of a newer breed of younger\, more “social” professionals who consume such data with far more ease! AI & machine learning are also changing the context within which you can tell data stories. In this talk we will look at examples of how data insights can lead to embedding analytics into the fabric of the company.\n\nAnd what must companies do to get a wider appreciation of data science\, so it blends into the decision-making fabric. Even company furniture finds its way into the balance sheet\, but “customer data” has no representation in the financial reports. We will explore how companies can build the data asset into a competitive advantage & what role Technology has in this journey.
F inally\, how does all this integrate with Marketing technology to make a d ifference to Customer experience.\n LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/data-science-for-busin ess-adopting-analytics-without-paralysis-35k3fXr68KHo45DcE5QSzh BEGIN:VALARM ACTION:display DESCRIPTION:Data science for business: adopting analytics without paralysi s in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:The battle for privacy: right to be forgotten in India DTSTART;VALUE=DATE-TIME:20180727T064000Z DTEND;VALUE=DATE-TIME:20180727T072000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/XcTUKqt5UidFMKYZZVZwrf@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T114215Z DESCRIPTION:This so-called right to be forgotten has not been expressly re cognized in international human rights instruments\, nor in national const itutions. Its scope remains murky\, meaning different things in different contexts and jurisdictions. While most commonly seen as a part of data pro tection\, its spirit draws more on laws regarding defamation and honor. By extending speech removal practices into data protection and privacy laws\ , the right places strong privacy protections and free expression in direc t\, and unnecessary\, conflict.\n\nThis right expands the power of private intermediaries\, making them the arbitrator of relevance and legitimacy o f online information including\, if information being available has public interest. It introduces obligations for a specific class of intermediary/ ies whose decision to delink results or erase content will become the de-f acto rules for defining the contours of online speech and expression. 
\n\n In some cases\, de-linking may not be possible for legal or technical reas ons for example when services are required to retain data for auditing pur poses. In the absence of rules and criteria on the basis of which intermed iaries may deny requests\, companies may struggle to interpret the law\, h owever defining categories of legal speech is problematic. The right to be forgotten creates an opaque\, unaccountable censorship regime that curbs journalism and free speech. There are clear incentives for them to remove or erase information in order to avoid penalties or litigation. \n\nThe id ea that\, it is the individual who should retain ultimate control over inf ormation\, ignores the broader right of the public to share and receive ma terial that is legitimately in the public domain. The act of seeking searc h engines to de-index links also affects the "forgetting" of other individ uals—those who are involved in the same event and yet do not want to be forgotten. It also impacts those who may be involved in the future or inte rested in similar events. \n\nUnder the GDPR's requirements for respondin g to right to erasure requests\, an online service provider must inform ot her processors of the request\, and must inform the data subject when it e rases information or takes action based on request. Sharing more precise o r granular information about delisting standards in difficult cases might risk disclosing personal information about the data subject\, bringing bot h legal penalties and public opprobrium to the company. It is difficult\, and may be impossible\, to maintain appropriate levels of public oversight and political control\, when intermediaries are required to hide from sig ht the content of information that they de-link or “forget”. 
Transpare ncy and censorship online are at odds\, especially when censorship is inte nded to make more obscure publicly available data.\n\nThe Right to Be Forg otten challenges several other basic principles of an open society\, inclu ding due process\, the role of private actors in public policy\, press fre edom\, transparency\, the duty of society to preserve debate for its citiz ens\, protection of the integrity of archives and history for its descenda nts. LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/the-battle-for-privacy -right-to-be-forgotten-in-india-XcTUKqt5UidFMKYZZVZwrf BEGIN:VALARM ACTION:display DESCRIPTION:The battle for privacy: right to be forgotten in India in Audi torium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Spark users' BOF DTSTART;VALUE=DATE-TIME:20180727T064500Z DTEND;VALUE=DATE-TIME:20180727T074500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/K5Gebxh35Rx57dQweyfWCN@hasgeek.com CREATED;VALUE=DATE-TIME:20180722T041548Z DESCRIPTION:The discussion will focus on:\n\n1. How facilitators and parti cipants have and have not been using Spark\, and what your one key learnin g has been. \n2. Advantages and drawbacks of Spark - where Spark fits and does not fit in the context of the problem you are solving\, and the speci ficity of your domain. 
LAST-MODIFIED;VALUE=DATE-TIME:20180722T080301Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Spark users' BOF in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180727T072000Z DTEND;VALUE=DATE-TIME:20180727T082000Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/B8gELpgxm5kxjqnAzZ4Yy7@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T143401Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T143411Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:The right to privacy versus the people's right to know: challenges and the way forward DTSTART;VALUE=DATE-TIME:20180727T072000Z DTEND;VALUE=DATE-TIME:20180727T075500Z DTSTAMP;VALUE=DATE-TIME:20200930T220506Z UID:session/J9AEpzJg7YuqDq6NxME76K@hasgeek.com CATEGORIES:Full talk,Beginner CREATED;VALUE=DATE-TIME:20180703T114146Z DESCRIPTION:* Legal and regulatory landscape governing the right to privacy.\n* The public availability of public data-sets and people's sensitivity to it\n* Impact of the public availability of the public data-sets\n* Current ad hoc solutions to balance the right to privacy with the people's right to know\n* Legal and regulatory changes required for a more effective balancing of the two rights.
LAST-MODIFIED;VALUE=DATE-TIME:20180720T121804Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/the-right-to-privacy-v ersus-the-peoples-right-to-know-challenges-and-the-way-forward-J9AEpzJg7Yu qDq6NxME76K BEGIN:VALARM ACTION:display DESCRIPTION:The right to privacy versus the people's right to know: challe nges and the way forward in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180727T074500Z DTEND;VALUE=DATE-TIME:20180727T082000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/KmKH1LvaJSFdNSYVnEBmtq@hasgeek.com CREATED;VALUE=DATE-TIME:20180722T051235Z LAST-MODIFIED;VALUE=DATE-TIME:20180722T051245Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180727T075500Z DTEND;VALUE=DATE-TIME:20180727T085500Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/Mx8XBcttWqRLxKRCJbL2qM@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144747Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T121812Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Lunch break DTSTART;VALUE=DATE-TIME:20180727T080000Z DTEND;VALUE=DATE-TIME:20180727T090000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/WnT4TUyhhH9S86GtPyv9yc@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T143704Z LAST-MODIFIED;VALUE=DATE-TIME:20180629T145129Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth 
Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Lunch break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Inculcating data-driven thinking a nd systems in your organization DTSTART;VALUE=DATE-TIME:20180727T082000Z DTEND;VALUE=DATE-TIME:20180727T092000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/hLZzDv5E9Ln3gjzfpQoer@hasgeek.com CREATED;VALUE=DATE-TIME:20180722T051154Z DESCRIPTION:Some of the specific issues that will be discussed here are: \ n\n1. What is the current state of thinking around data in your organizati on?\n2. Are data scientists thinking about data or around techniques?\n3. What kind of education do you offer to your existing and new team members around data-driven thinking?\n4. How do you empower super users of data (t he business side of your organization) and those who are at novice levels? \n5. Way forward\, from here. LAST-MODIFIED;VALUE=DATE-TIME:20180726T053703Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Inculcating data-driven thinki ng and systems in your organization in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Atlas: GO-JEK’s real-time geospatial visualization platform DTSTART;VALUE=DATE-TIME:20180727T082000Z DTEND;VALUE=DATE-TIME:20180727T090000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/4obnncJRaZBWjWudRYtuBg@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114506Z DESCRIPTION:1. Brief about Speaker and GoJek\n2. State of data at GoJek\n3 . Challenges in making realtime decisions with data\n4. Atlas Introduction \n5. Data pipeline architecture\n6. Atlas Architecture\n7. Atlas metric st reaming\n8. Atlas dimension mapping\n9. Atlas data experience \n10. 
Road A head LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/atlas-go-jeks-real-tim e-geospatial-visualization-platform-4obnncJRaZBWjWudRYtuBg BEGIN:VALARM ACTION:display DESCRIPTION:Atlas: GO-JEK’s real-time geospatial visualization platform in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Elastic search users' BOF DTSTART;VALUE=DATE-TIME:20180727T082000Z DTEND;VALUE=DATE-TIME:20180727T092000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/9CKptHgsTQT5rh51BEP82M@hasgeek.com CREATED;VALUE=DATE-TIME:20180727T012359Z LAST-MODIFIED;VALUE=DATE-TIME:20180727T012534Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Elastic search users' BOF in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Data privacy and questions to thin k about. DTSTART;VALUE=DATE-TIME:20180727T085500Z DTEND;VALUE=DATE-TIME:20180727T095500Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/4GV2wcXJ96ZXqD1T4cJcLT@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144814Z DESCRIPTION:The session will cover the following:\n\n1. Is data protection only about privacy or should we also think about other reasons to have da ta protection laws in addition to and beyond privacy?\na. Is data protecti on just a sub-set of privacy and therefore the same concerns should apply? \nb. Can data protection laws actually help build trust between businesses and consumers? Or between citizen and State? or between citizen and citiz en?\n2. Given that there are going to be some areas where privacy and the needs of open data will conflict (e.g. rtbf v right to know)\, can we find principles which will help us resolve the conflicts?\na. 
Should right to privacy always win over needs of open data or vice versa?\nb. If not\, is there a set of principles which can help us resolve these conflicts?\nc. O r are there really no principles and we just decide in each case\, whateve r works. \n3. If informational privacy is an aspect of privacy\, is your d ata your data?\na. Do you *own* information about yourself - can you stop others from using it and control how it's used?\nb. If others are generati ng data about you\, do *they* own it? \nc. Is there a definitive answer to the property vs rights debate on data?\n LAST-MODIFIED;VALUE=DATE-TIME:20180722T080725Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Data privacy and questions to think about. in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Incremental transform of transactional data models to analytical d ata models in near real time DTSTART;VALUE=DATE-TIME:20180727T090000Z DTEND;VALUE=DATE-TIME:20180727T094000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/8yinhavrfK8uTTU5NNbxjD@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114445Z DESCRIPTION:1. Business and technical need\n a. 100% Completeness\n b. 5 minutes to 1 hour latencies\n c. Business Agility\n2. Evaluation and results of existing solutions\n a. Existing stream processing implementa tions\n b. Existing incremental processing implementations\n3. Our appro ach to solving the problem\n a. Incremental Transforms at scale for lower latencies\n b. Metadata\n c. Processing\n d. Learnings\n4. Results\n a. 
Live use-cases and Impact LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/incremental-transform- of-transactional-data-models-to-analytical-data-models-in-near-real-time-8 yinhavrfK8uTTU5NNbxjD BEGIN:VALARM ACTION:display DESCRIPTION:Incremental transform of transactional data models to analytic al data models in near real time in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Scaling write-heavy OLTP systems with strong data guarantees: lear ning from Flipkart’s user facing order capture systems DTSTART;VALUE=DATE-TIME:20180727T094000Z DTEND;VALUE=DATE-TIME:20180727T102000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/HAyztvjqZj7EcFoSFg1NXq@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114354Z DESCRIPTION:Challenges faced with existing order capture systems at Scale\ na) Context and landscape of the user-facing order capture systems\nb) Sca ling problems and gaps in the existing technologies\n\nConsolidation of ch aracteristics \na) Key-value store favouring strong consistency and data g uarantees\nb) Basic secondary index support\nc) Transactional change-propa gation\n\nOur choice: HBase \na) Good parts of HBase for us\nb) Downsides of HBase: maintenance of multiple components\, lack-of transactional chang e-propagation \nc) Overview of HBase\n\nSolving for single multi-tenant cl uster\na) Logical components of HBase \nb) Custom HBase LoadBalancer with tenant & region-server group awareness \nc) Using Hadoop’s favoured node API to bring in isolation at hadoop level replica placements\nd) Handling Region Splits and Merges\n\nSolving for Transactional change-capture \na) Using ReplicationEndpoint handlers \nb) Solve for no data loss\, rsgroup specific balancing\n\nHow this helped us \na) Helped reached our scale 
nee ds\nb) Improved cluster manageability \nc) Improved efficiency and reliabi lity \n\nFuture work and the way forward\na) Uniform data + replica distri bution\nb) Memstore flush optimization\nc) Compaction optimization LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/scaling-write-heavy-ol tp-systems-with-strong-data-guarantees-learning-from-flipkarts-user-facing -order-capture-systems-HAyztvjqZj7EcFoSFg1NXq BEGIN:VALARM ACTION:display DESCRIPTION:Scaling write-heavy OLTP systems with strong data guarantees: learning from Flipkart’s user facing order capture systems in Auditoriu m 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:BOF session: Automating inequality(?) AI and Indian governance DTSTART;VALUE=DATE-TIME:20180727T095000Z DTEND;VALUE=DATE-TIME:20180727T105000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/WbzwMwjdmgm3RrdzpU7cZU@hasgeek.com CREATED;VALUE=DATE-TIME:20180727T074143Z LAST-MODIFIED;VALUE=DATE-TIME:20180727T074229Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:BOF session: Automating inequality(?) AI and Indian governance in BOF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Data science for ad tech DTSTART;VALUE=DATE-TIME:20180727T095000Z DTEND;VALUE=DATE-TIME:20180727T105000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/EJnrTL3drthzUT6YDUhKRh@hasgeek.com CREATED;VALUE=DATE-TIME:20180722T071300Z DESCRIPTION:The scope of this discussion will include:\n\n1. What are the interesting problems to solve in ad tech with data science? \n2. What are the typical and atypical problems to solve in ad tech\, and how and where data science comes into the picture. \n3. 
Challenges with respect to data quality and data collection\, and where the constraints are\, on a day-to- day basis. \n LAST-MODIFIED;VALUE=DATE-TIME:20180722T081000Z LOCATION:BOF area - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Data science for ad tech in B OF area in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Deep portfolio: using neural networks for portfolio construction DTSTART;VALUE=DATE-TIME:20180727T095500Z DTEND;VALUE=DATE-TIME:20180727T103500Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/Nsr1ENFM5JnJo4bxnbZDt3@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114258Z DESCRIPTION:There are many hidden factors that relate the universe of fina ncial products which cannot be unearthed using APT. However\, using Deep N eural Networks\, we can hunt for the latent linkages between the financial products and use that for building your portfolio\nThe process can be div ided into the following 3 tasks\n1) Auto Encoding\nThis step will be used for creating a condensed map of the entire universe of financial products\ n\n2) Calibrating\nThis allows us to choose a particular target ( benchmar k or a manual set of returns ) that we have in mind that we would like to create a portfolio for\n\n3) Validation and Verification\nThis allows us t o choose the appropriate condensed map of the universe of produces which w ill give the best calibration\n\nThere is a paper that talks about the pot ential applications of Deep\n\nAs part of the talk\, I will be choosing th e following examples\n1) Benchmark the NIFTY Index\n2) Benchmark a list of user defined returns\n\nDuring the course of the talk\, I will be using t he following technologies and data\n1) Python : Tensorflow\, numpy\n2) J upyter : For coding/visualization\n3) Datasets : Open Financial data fro m Quandl/Kaggle etc\n 
LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/deep-portfolio-using-n eural-networks-for-portfolio-construction-Nsr1ENFM5JnJo4bxnbZDt3 BEGIN:VALARM ACTION:display DESCRIPTION:Deep portfolio: using neural networks for portfolio constructi on in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Evening beverage break DTSTART;VALUE=DATE-TIME:20180727T102000Z DTEND;VALUE=DATE-TIME:20180727T105000Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/PoFxyoPSfcB8tD8qwe3Rbu@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T143121Z LAST-MODIFIED;VALUE=DATE-TIME:20180703T114417Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Evening beverage break in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Evening beverage break DTSTART;VALUE=DATE-TIME:20180727T103500Z DTEND;VALUE=DATE-TIME:20180727T110500Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/5e2ECs7isbm2wrLsf9Dr66@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144942Z LAST-MODIFIED;VALUE=DATE-TIME:20180720T121825Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Evening beverage break in Auditorium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Seeing through the eyes of a self-driving car: visualizing autonom ous vehicle data on the web DTSTART;VALUE=DATE-TIME:20180727T105000Z DTEND;VALUE=DATE-TIME:20180727T113500Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/9odBTr4sJehnNXt9VkTGMo@hasgeek.com CATEGORIES:Full talk,Intermediate CREATED;VALUE=DATE-TIME:20180703T114318Z DESCRIPTION:- Data visualization at Uber: 
the many visualization tools\, a nd why they are crucial to the business\n- Introduction to ATG\n- Overview of the autonomous vehicle data: what is in there\, and why it's hard to v isualize\n- Designing a visual language for the decision making process of a self-driving car\n- Why invest in the web?\n- Uber's open-source visual ization frameworks power beautiful\, performant data applications in the w eb\n- Video of the AV web platform\n- Use case study: using the AV web pla tform to triage issues\n- Use case study: using the AV web platform to deb ug software LAST-MODIFIED;VALUE=DATE-TIME:20200619T062516Z LOCATION:Auditorium 1 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com URL:https://hasgeek.com/fifthelephant/2018/schedule/seeing-through-the-eye s-of-a-self-driving-car-visualizing-autonomous-vehicle-data-on-the-web-9od BTr4sJehnNXt9VkTGMo BEGIN:VALARM ACTION:display DESCRIPTION:Seeing through the eyes of a self-driving car: visualizing aut onomous vehicle data on the web in Auditorium 1 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT BEGIN:VEVENT SUMMARY:Birds Of Feather (BOF) session: Data engineering BOF DTSTART;VALUE=DATE-TIME:20180727T110500Z DTEND;VALUE=DATE-TIME:20180727T120500Z DTSTAMP;VALUE=DATE-TIME:20200930T220507Z UID:session/P3senLCph1NuNrCXfxwvUx@hasgeek.com CREATED;VALUE=DATE-TIME:20180629T144957Z DESCRIPTION:This session covers system@scale and people@scale challenges w ith respect to data engineering. \n\n1. How do we democratize the data and ease the data access?\n2. How data engineering work with the other teams in the org?\n3. How can we scale data infrastructure across the org?\n4. H ow do we provide data literacy and empower decision makers to makes sense of the data?\n5. How can we protect the sensitive PII data\, at the same t ime ease the data access?\n6. 
How can we encourage ethical data usage?\n\n The challenges above are a mix of organizational alignment and pragmatic system design. LAST-MODIFIED;VALUE=DATE-TIME:20180726T053723Z LOCATION:Auditorium 2 - NIMHANS Convention Centre\nBengaluru\, IN ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com BEGIN:VALARM ACTION:display DESCRIPTION:Birds Of Feather (BOF) session: Data engineering BOF in Audit orium 2 in 5 minutes TRIGGER:-PT5M END:VALARM END:VEVENT END:VCALENDAR