Submissions for MLOps November edition
On ML workflows, tools, automation and running ML in production
Accepting submissions till 01 Nov 2021, 11:59 PM
Not accepting submissions
We are accepting experiential talks and written content on the following topics:
Content can be submitted in the form of:
All content will be peer-reviewed by practitioners from industry.
Productionize engineered data with feature store in Kubeflow Orchestration
In this session we will walk through building a feature store pipeline in Kubeflow with a serving endpoint, alongside Spark jobs with batch and real-time Kafka ingestion for offline and online extraction on Google Cloud Platform. We will also track metrics for the feature store in Grafana and Prometheus.
|
ROI of building internal MlOps vs adopting open-source vs buying managed options
“To build or to buy?” That is the question this session will explore. I will compare and contrast end-to-end managed MLOps offerings like H2O.ai and SageMaker vs. building your own platform from established components vs. mixing and matching components from managed, open-source, and self-built sources. As a part of this exercise, I will also cover the current state of the ecosystem in…
|
CapFlow: A scalable ML framework for training and serving CRM machine learning model operations
Preface: At Capillary Technologies, we have around 30 different models serving use cases such as recommendations, personalization, insight, decision-making, and several other retail CRM predictions. We have hundreds of customers and terabytes of data that we consume to train and serve these models through online and offline inference. To scale to such a level and cater to continuous traini…
|
Empowering Data Scientists at Farfetch to GoFar with PaaS
Machine Learning is a strategic goal at Farfetch, and enabling Data Scientists to be more productive is a key objective in achieving it. Cloud providers like AWS, Google, and Azure offer many services and products with various capabilities to create value in the ML and data space. But often the problem with using these services, or building one such product from scratch, is that the key stakeholders -…
|
Cap-auto-feature: Scalable feature store for training CRM machine learning model operations
Introduction: Feature engineering is the heart of modeling, especially for tabular datasets. While modelling, it’s often a good idea to add historical data on top of the contextual data; this makes the data richer and more robust for all kinds of machine learning problems. Centralized feature engineering makes this possible. A good feature impacts the results of the model significantly. It can help…
|
Handling Bias while building ML systems
At Eightfold.ai our mission has been to help enable the right career for everyone using the power of AI. We employ deep learning algorithms that leverage information from career data of 1 billion+ profiles - these algorithms in turn help organizations find the most relevant talent and individuals identify the best career options for themselves.
|
Monitoring Data Quality at Scale
Level: Beginner
Timing: 15 min
Abstract: Data drift and data cascades are real problems that can wreak havoc with any business insights. When operating with data at scale and dealing with external systems, any change in data can have a cascading impact through all the data pipelines, which is difficult to trace and costly to correct. Data quality frameworks like Deequ / G…
|
ML Infrastructure for Feed Recommendations at ShareChat
Overview: In this talk, we will describe ShareChat’s feed recommendation infrastructure in detail. The talk will delve into various ML-infrastructure-related aspects, such as model training, serving, the design and development of the feature store, and feature-computation pipelines. It will also share insights and learnings that we have obtained via building these large-scale, low …
|
Production Grade DataOps Framework for Building Intelligence Over User-Content Interaction Data
|
ML Fairness 2.0 - Intersectional Group Fairness
Topic: As more companies adopt AI, more people question the impact AI creates on society, especially on algorithmic fairness. However, most metrics that measure the fairness of AI algorithms today don’t capture the critical nuance of intersectionality. Instead, they hold a binary view of fairness, e.g., protected vs. unprotected groups. In this talk, we’ll discuss the latest research on intersect…
|
Opening the NLP Blackbox - Analysis, Evaluation and Testing of NLP Models
Rapid progress in NLP research has seen a swift translation to real-world commercial deployment. While a number of success stories of NLP applications have emerged, failures in translating scientific progress in NLP to real-world software have also been considerable. Evaluation of NLP models is often limited to held-out test set accuracy on a handful of datasets, and analysis of NLP models is oft…
|
Using feature stores to build a fraud model
Feature stores enable companies to make the difficult leap from research to production machine learning. At their best, feature stores allow you to define new features, automate the data pipelines to process feature values, and serve data for training and online inference. You can quickly and reliably serve features to your production models so your customers aren’t waiting for predictions.
|
Brands Dilemma: Personalization at the cost of privacy
We are in an era where we are so well connected virtually that each of us leaves behind a huge digital footprint: for example, what we buy from a marketplace, our app purchases, our entertainment preferences, and much more. These footprints are patterns of our behavior which could be private and public. Brands are investing heavily in this data to understand and cater to the…
|
Designing an Autonomous Workbench for Data Science on AWS
In the wake of the COVID-19 pandemic and the consequent remote work setup, we - the Engineering team at Episource - were keen on developing a hosted, self-service platform which would allow our Data Science counterparts to access the compute and data they needed for their experiments on the fly.
|
Data and Model versioning
https://docs.google.com/presentation/d/1qLRYcE00wnD83FgWxlLCoS_eFzaIL4yMGDOIhwJ1zNs/edit?usp=sharing
|
Building Human-in-the-loop pipeline in MLOps
The objective of this talk is to shed light on how productionized models can be improved iteratively by adopting a human-in-the-loop pipeline (active learning strategy and human annotation) in an MLOps lifecycle.
|
Taking ML models to production
In spite of massive ongoing improvements in machine learning and deep learning, a majority of data science teams still struggle to solve the last-mile problem - taking models to production. Due to the ad hoc nature of training iterations and the lack of a standard process, tracking experiments and deploying models is anything but smooth. Especially in the last decade, Machine Lea…
|
ML Governance from the Bottom-Up: Deriving Data Access Policy from Code through Ethical Monkey-Patching
Practical implementations of Data Governance tend to enforce access control at the datastore level - think ACLs for S3, Snowflake or HDFS. But top-down enforcement of an infrastructure policy can be painful for the engineers working day-to-day with the data, especially in an ETL or Feature Engineering context. For example, critical data needed for extracting features can become obscured or even a…
|
Reducing technical debt for ML platforms
Deploying machine learning models at scale is a time-consuming process that involves many stages of simulations and stress testing. Continuous testing is needed to ensure that the engineers’ ML models are performing as anticipated in production - especially monitoring for data and model drift. What if the data scientists want to put their latest model enhancements to the test in a simulated near-producti…
|
Teamwork on a Machine Learning project that scales
A Machine Learning project is composed of a variety of distinct artifacts. As a project evolves and grows in complexity, this becomes a significant challenge in our workflow, with multiple aspects, such as:
|
Search Driven Analytics: Enabled through a Conversational Bot
To deliver insights at the speed of thought, without requiring users to go through dashboards, apply filters, or ask analysts, we at Mahindra & Mahindra have developed Genie for Analytics, a voice-enabled conversational analytics assistant. This engine is first-of-its-kind, cutting-edge work at Mahindra. It integrates natural language processing, query processing and natural lang…
|
MLOps patterns to address Machine Learning Models deterioration in production
In this talk, I will present the typical life cycle of ML models
|