The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem


Machine Learning Security - The Data Scientist's Guide to Hardening ML Models

Submitted by Arjun Bahuguna (@arjunbahuguna) on Tuesday, 30 April 2019

Preview video

Session type: Tutorial



With the growing number of attacks on machine learning models (adversarial images, membership inference, model inversion, information reconstruction, data poisoning, etc.), it has become imperative for companies to understand the attack surface of their ML services and published results.

In this 1.5-hour tutorial, the speakers will share insights from their 3 years of research in privacy-preserving data mining and show how companies such as Google and Microsoft are responding to threats against their machine learning models and user data privacy. The session will be interactive and include live demos.


  • Learn about attacks happening on ML models today
  • Learn how to code defenses against them using existing libraries
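To make the first objective concrete, here is a minimal sketch of one common attack, the Fast Gradient Sign Method (FGSM), run against a toy logistic-regression model in plain Python. The weights, input, and `eps` budget are illustrative assumptions, and the helper names (`predict`, `fgsm_logistic`) are hypothetical; libraries such as CleverHans provide maintained FGSM implementations for real neural networks.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    # Returns +1, -1, or 0, like numpy.sign
    return (v > 0) - (v < 0)


def predict(x: list[float], w: list[float], b: float) -> float:
    """Probability of class 1 under a logistic-regression model (toy victim)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm_logistic(x: list[float], y: int, w: list[float], b: float, eps: float) -> list[float]:
    """FGSM: move each feature by eps in the sign of the loss gradient.

    For binary cross-entropy with p = sigmoid(w.x + b), the gradient of the
    loss with respect to the input is dL/dx_i = (p - y) * w_i.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    # Each feature changes by at most eps (an L-infinity budget)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0   # illustrative "trained" weights (assumption)
    x, y = [0.3, 0.1], 1      # clean input, correctly classified as class 1
    x_adv = fgsm_logistic(x, y, w, b, eps=0.2)
    print("clean:", round(predict(x, w, b), 3))
    print("adversarial:", round(predict(x_adv, w, b), 3))
```

With these toy values the clean input scores about 0.622 and the perturbed input about 0.475, so a change of at most 0.2 per feature flips the predicted class. Against deep networks the gradient comes from automatic differentiation rather than a closed form, but the update rule is the same.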


  • Introduction
  • Real-World Attacks on Machine Learning Systems
    • Membership Inference at AWS, GCP, Azure
    • Data Linkage Attacks at Netflix
    • Dataset Poisoning at Microsoft
    • Attacks on Amazon Alexa
    • Adversarial Image Attacks at
    • Attacks using Google’s Prediction API
    • Others
  • Implications for Business Compliance
    • Penalties under International Data Regulation
    • Penalties under Indian Data Regulation
    • Ethical Issues in Data Acquisition
  • Why do these Attacks occur?
  • How do these Attacks affect ML pipelines?
  • How can you customize your defenses to business needs (trade-offs and tips)?
  • Insights from 3 years of privacy-preserving machine learning at Next Tech Lab
  • Defenses (used in production)
    • Homomorphic Encryption at Microsoft
    • Multi-party Computation at VISA Research
    • Federated Learning at Google
    • Differential Privacy at Google
    • Blockchain-based solutions at OpenMined
    • Others
  • Defenses (upcoming theoretical research)
    • Zero-knowledge Proofs
    • Garbled Circuits
    • Machine Learning on Secure Enclaves
    • Quantum Defenses
    • Others
  • Learn to Implement
    • Adversarial Image Attacks
    • Implement a secure multi-party computation (MPC) pipeline using PyTorch
    • Differential Privacy using TensorFlow
    • Implement SPDZ for TensorFlow
  • Learn to Use Existing Implementations & Frameworks
    • TF Encrypted
    • CleverHans (TensorFlow)
    • PySyft (PyTorch)
    • PySEAL (Python wrapper for Microsoft SEAL)
    • Google’s RAPPOR
    • Others
  • Conclusion
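The outline's MPC and SPDZ items both rest on additive secret sharing: a private value is split into random shares held by different parties, and additions can be computed on the shares without revealing the inputs. The sketch below illustrates only that building block, assuming three parties and an illustrative prime modulus; full SPDZ also attaches MAC tags to detect cheating parties and uses preprocessed Beaver triples for multiplication, neither of which is shown. The function names are hypothetical and are not the API of PySyft or TF Encrypted.

```python
import secrets

# Illustrative field modulus (a Mersenne prime); an assumption, not a SPDZ parameter.
PRIME = 2**61 - 1


def share(secret: int, n_parties: int, p: int = PRIME) -> list[int]:
    """Split `secret` into n additive shares whose sum mod p equals the secret.

    Any n-1 shares are uniformly random and reveal nothing about the secret.
    """
    shares = [secrets.randbelow(p) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares


def reconstruct(shares: list[int], p: int = PRIME) -> int:
    """Combine all shares to recover the secret."""
    return sum(shares) % p


def add_shared(a: list[int], b: list[int], p: int = PRIME) -> list[int]:
    """Secure addition: each party adds its own two shares locally,
    with no communication between parties."""
    return [(ai + bi) % p for ai, bi in zip(a, b)]


def scale_shared(a: list[int], c: int, p: int = PRIME) -> list[int]:
    """Multiply a shared value by a public constant, also computed locally."""
    return [(ai * c) % p for ai in a]


if __name__ == "__main__":
    # Hypothetical scenario: two data owners secret-share private values
    # among three compute parties, which jointly compute 2 * (a + b).
    a_sh, b_sh = share(42, 3), share(100, 3)
    result_sh = scale_shared(add_shared(a_sh, b_sh), 2)
    print(reconstruct(result_sh))  # 284
```

Only the final reconstruction combines shares; each compute party sees nothing but uniformly random field elements, which is why linear operations such as the sums in a model's forward pass are cheap in MPC while multiplications between two secret values need extra protocol machinery.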


Laptops with TensorFlow and PyTorch pre-installed

Speaker bio

Arjun Bahuguna is an applied cryptography researcher at Next Tech Lab. His interests include statistical learning theory, privacy-enhancing technologies, and distributed systems. His research has earned two university gold medals.

Sourav Sharan is a computer vision engineer at Next Tech Lab with 2 years of experience, focusing on deep learning approaches. His interests include facial recognition systems, adversarial image attacks, numerical optimization, and chess.





  • Abhishek Balaji (@booleanbalaji) Reviewer 3 months ago

    Hi Arjun,

    Thank you for submitting a proposal. We need to see more detailed slides and a preview video to evaluate your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement should be grounded in this context but general enough to apply broadly.
    • What were the tools/frameworks available in the market to solve this problem? How did you evaluate these, and what metrics did you use for the evaluation? Why did you pick the option that you did?
    • Explain what the situation was before the solution you picked/built and how it changed after you implemented it. Show before-and-after comparisons and metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    Update your slides so they answer all the above questions. Further, we need a 2-minute self-recorded preview video in which you talk to the camera about what you intend to cover in your talk, who the audience is, and what the one key takeaway for the audience is.

    We need your updated slides and preview video by Jun 8, 2019 to evaluate your proposal. If we do not receive an update, we will move your proposal to a future event for evaluation.

    • Arjun Bahuguna (@arjunbahuguna) Proposer 3 months ago

      Hi Abhishek,
      The preview video and slides have been updated as requested. Looking forward to your feedback.

      • Abhishek Balaji (@booleanbalaji) Reviewer 3 months ago

        Thanks. I see there are two presenters listed now. As per our policy, we allow only one speaker on stage per talk. Please decide among yourselves who will present this talk and update the bio accordingly.

      • Abhishek Balaji (@booleanbalaji) Reviewer 3 months ago

        Arjun, thanks for updating the slides. Here are the comments:

        1) The problem is still not apparent. The presentation is largely hand-waving about how ML can go wrong.

        2) The solutions talked about are available already and there’s nothing novel in the options presented. What puts you in a unique position to present this? Have you implemented/worked on these in production?
