Privacy Attacks in Machine Learning Systems - Discover, Detect and Defend
My name is Upendra Singh, and I work at Twilio as an Architect. In this talk I would like to shed some light on a new kind of attack that machine learning systems face today: privacy attacks. During the talk we will explain and demonstrate how to discover, detect, and defend against privacy-related vulnerabilities in our machine learning models. I will also explain why solid model governance is critical for managing the risks associated with these kinds of vulnerabilities. One of the main objectives of model governance is to manage the risks associated with machine learning models. How safe and secure are machine learning models? Here we are not talking about model security from an exposed-API point of view, but from a privacy point of view. Let’s try to understand what exactly we mean by that.
Fueled by large amounts of available data and hardware advances, machine learning has experienced tremendous growth in academic research and real-world applications. At the same time, its implications for security, privacy, and fairness are receiving increasing attention. In terms of privacy, our personal data are harvested by almost every online service and used to train the models that power machine learning applications. However, it is not well understood if and how these models reveal information about the data used to train them. If a model is trained on sensitive data such as locations, health records, or identity information, then an attack that allows an adversary to extract this information from the model is highly undesirable. At the same time, if private data has been used without its owners’ consent, the same type of attack could be used to detect that unauthorized use and thus work in favor of the user’s privacy.
Apart from the increasing interest in the attacks themselves, there is growing interest in uncovering what causes privacy leaks and under which conditions a model is susceptible to different types of privacy-related attacks. Models leak information for multiple reasons. Some are structural and have to do with the way models are constructed, while others are due to factors such as poor generalization or memorization of sensitive data samples. Training for adversarial robustness can also affect the degree of information leakage.
Data protection regulations, such as the GDPR, and AI governance frameworks require that personal data be protected when used in AI systems, and that users have control over, and awareness of, how their data is being used. For projects that apply machine learning to personal data, Article 35 of the GDPR mandates a Data Protection Impact Assessment (DPIA). Proper mechanisms therefore need to be in place to quantitatively evaluate and verify the privacy of individuals at every step of the data processing pipeline in AI systems.
In this talk we will focus on:
- What are the different types of attacks on machine learning systems?
Attacks against integrity, e.g., evasion attacks and backdoor poisoning attacks that cause misclassification of specific samples.
Attacks against a system’s availability, such as poisoning attacks that try to maximize the misclassification error.
Attacks against privacy and confidentiality, i.e., attacks that try to infer information about user data and models. (In this talk we will focus on, demonstrate, and discuss these types of attacks.)
- How to do threat modeling for machine learning models from a privacy point of view in any ML project? Here we will define and explain the terminology we will use for the rest of the discussion.
- What are the different types of attacks on machine learning models that impact privacy? We will demonstrate such an attack in action using a demo.
The attacks are categorized into the following groups:
Membership inference attack: This type of attack tries to determine whether an input sample was part of the training set.
Reconstruction attack: This type of attack tries to recreate one or more training samples and/or their respective training labels.
Property Inference attack: This type of attack tries to extract dataset properties which were not explicitly encoded as features or were not correlated to the learning task.
Model extraction attack: This is a type of black box attack where the attacker tries to extract information and potentially fully reconstruct a model.
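To make the first category concrete, a confidence-threshold membership inference attack can be sketched as follows. This is a minimal illustration using scikit-learn; the synthetic dataset, model choice, and threshold are hypothetical, not the exact demo from the talk:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Build a synthetic dataset and split it into "members" (the training set)
# and "non-members" (held-out samples the model never saw)
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, y_train = X[:200], y[:200]
X_out = X[200:]

# A model that overfits tends to be over-confident on its own training data
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def guess_member(model, X, threshold=0.9):
    # Flag a sample as "member" when the model's top confidence is very high
    return model.predict_proba(X).max(axis=1) > threshold

# Balanced attack accuracy: members correctly flagged, non-members correctly not flagged
acc = (guess_member(model, X_train).mean() + (1 - guess_member(model, X_out).mean())) / 2
print(f"membership inference accuracy: {acc:.2f}")
```

An accuracy above the 0.5 random-guessing baseline indicates that the model's confidence alone leaks membership information.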
- Which causes, in the design and architecture of machine learning models, make these different types of attacks possible?
- How are these attacks implemented, and how do the implementations differ under different learning settings (centralized, distributed)? We will give a hands-on demo of the attacks and the techniques used. It is critical to understand how these attacks are implemented in order to prevent them. Just as a network security expert has to think like a hacker and understand the craft of hacking to design better network security, we need a similar mindset when designing machine learning models to avoid privacy vulnerabilities.
- How to detect whether your existing machine learning models are susceptible to such attacks, and how to quantify that susceptibility for the DPIA (Data Protection Impact Assessment)? We will provide a hands-on demo of the detection and quantification techniques.
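One simple way to quantify susceptibility for a DPIA report is the membership inference advantage: the attack's true-positive rate minus its false-positive rate. The sketch below uses hypothetical numbers purely for illustration:

```python
def membership_advantage(member_guesses, non_member_guesses):
    # True-positive rate: fraction of actual members the attack flags
    tpr = sum(member_guesses) / len(member_guesses)
    # False-positive rate: fraction of non-members the attack wrongly flags
    fpr = sum(non_member_guesses) / len(non_member_guesses)
    # 0 means no measurable leakage; 1 means the attack separates them perfectly
    return tpr - fpr

# Example: the attack flags 8 of 10 members and 3 of 10 non-members
print(membership_advantage([1] * 8 + [0] * 2, [1] * 3 + [0] * 7))  # 0.5
```

Reporting this single number per model makes it easy to track privacy risk across the model inventory over time.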
- How to defend against different kinds of attacks by applying state-of-the-art techniques? For example:
Differential Privacy Techniques
Prediction vector tampering
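As a rough sketch of these two defenses (function names and parameters are illustrative, not the exact implementations demonstrated in the talk): the Laplace mechanism from differential privacy adds calibrated noise to a released numeric value, while prediction vector tampering coarsens and truncates the probability vector a model returns, starving attackers of fine-grained confidence signal:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    # Classic epsilon-differentially-private release of a numeric query:
    # add Laplace noise with scale = sensitivity / epsilon
    return true_value + rng.laplace(scale=sensitivity / epsilon)

def tamper_predictions(probs, k=1, decimals=1):
    # Release only the top-k classes, with coarsely rounded confidences
    probs = np.asarray(probs, dtype=float)
    out = np.zeros_like(probs)
    top = np.argsort(probs)[::-1][:k]
    out[top] = np.round(probs[top], decimals)
    return out

rng = np.random.default_rng(0)
print(laplace_mechanism(100.0, sensitivity=1.0, epsilon=0.5, rng=rng))
print(tamper_predictions([0.71, 0.28, 0.01], k=1))  # only the top class survives, rounded to 0.7
```

Both defenses trade some utility (accuracy of the released value, richness of the prediction API) for privacy, and the talk will discuss how to tune that trade-off.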
I believe in explaining by doing. The whole talk will be sprinkled with hands-on demonstrations wherever possible to explain the concepts better.