Fighting Fraudsters in Email Communication at Twilio using Machine Learning

Session times (Jul 2021, IST):

23 Fri 12:00 PM – 06:15 PM
24 Sat 12:00 PM – 05:10 PM
27 Tue 02:00 PM – 05:10 PM

Accepting submissions till 14 Jul 2021, 11:00 PM


Machine Learning (ML) is at the helm of products. As products evolve with time, so does the need for ML to evolve. In the 2010s, we saw DevOps culture come to the forefront for engineering teams. The 2020s will be all about MLOps.

MLOps stands for Machine Learning Operations. MLOps focuses mainly on the workflows, thought processes and tools used in creating ML models and evolving them over time. ML workflows differ across organizations because the problem space, the maturity of teams and experience with ML tools vary widely.

MLOps relies on DataOps. DataOps is about data operations. It helps define data and SLOs for data (how it is stored, managed and mutated over time), thereby providing the foundations for sound ML. The success or failure of ML models depends heavily on DataOps, where data is well managed and brought into the system in a well-thought-out manner. ML and data processes have to evolve to provide insight into why certain models no longer behave as they did before.

Productionizing ML models is a challenge, but so is running experiments at scale. MLOps not only caters to scaling ML models in production, but also provides guidelines and thought processes to support rapid prototyping and research for ML teams.

MLOps Conference 2021 edition

The 2021 edition is curated by Nischal HP, Director of Data at Scoutbee.

The conference covers the following themes:

Machine Learning Operations
Machine Learning in Production
Privacy and Security in Machine Learning
Tooling and frameworks in Machine Learning
Economies of Machine Learning

Speakers from Doordash, Twilio, Scribble Data, Microsoft Research Labs India, Freshworks, Aampe, Myntra, Farfetch and other organizations will share their experiences and insights on the above topics.

Schedule: https://hasgeek.com/fifthelephant/mlops-conference/schedule

Who should participate in MLOps conference?

Data/MLOps engineers who want to learn about state-of-the-art tools and techniques.
Data scientists who want a deeper understanding of model deployment/governance.
Architects who are building ML workflows that scale.
Tech founders who are building products that require ML or building developer productivity products for ML.
Product managers who are seeking to learn about the process of building ML products.
Directors, VPs and senior tech leadership who are building ML teams.

Contact information: Join The Fifth Elephant Telegram group at https://t.me/fifthel or follow @fifthel on Twitter. For inquiries, contact The Fifth Elephant at fifthelephant.editorial@hasgeek.com or call 7676332020.

Hosted by

The Fifth Elephant

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.


This submission has been added to the schedule

Fighting Fraudsters in Email Communication at Twilio using Machine Learning

Submitted Jun 2, 2021

My name is Sachin Nagargoje. I work at Twilio as a Staff Data Scientist. In this talk I would like to shed some light on the kinds of attacks we currently face at Twilio and how we tackle them with a range of innovative approaches and Machine Learning techniques. I want to showcase the challenges we face and what we do to catch such unwanted communication.

At Twilio, we serve 2T email addresses and send 90B emails per month for 80,000+ paying customers, reaching 56% of the world’s email addresses every year. We also serve 3B+ phone numbers in 100+ countries for 172K+ paying customers, who together send trillions of SMS messages.

According to the Litmus Benchmark Report (2019), organizations get an ROI of $42 for every $1 spent on email. According to Tessian reports (2019), 75% of organizations have faced some sort of phishing attack, and 96% of attacks arrive via email. The FBI’s Internet Crime Report shows that in 2020, BEC (business email compromise) scammers made over $1.8 billion.

As these numbers suggest, one of the major challenges we face today is the misuse of the Twilio platform for sending phishing, fraud and spam messages.

There are various attacks we face on a daily basis:

Phishing
ATO (Account Takeover) - Fraudsters hack an account by stealing credentials or attacking the account.
Fake Accounts - Fraudsters create an account with false information and a stolen credit card.
Trial Accounts - Fraudsters create a dummy account and use the trial money to send phishing messages (small scale).
Toll Fraud - Fraudsters make calls to premium phone numbers run by small, shady carriers and later share the revenue with them.
Spam (Good/Bad)
Vishing (voice phishing)

Below are the challenges with fraudulent communication:

Twilio’s reputation is at stake, since the messages are sent from Twilio accounts.
It is hard to detect fraudsters in flight, although we can sample a few messages post-flight after stripping PII.
Fraudsters generally use otherwise good accounts, so those accounts may carry legitimate traffic as well. So: ban or no ban?
Obtaining labelled training data for ML modelling.
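
The post-flight sampling step mentioned above can be sketched roughly as follows. This is only an illustration, not Twilio's pipeline: the regular expressions, the placeholder tokens and the function names `strip_pii` and `sample_post_flight` are all assumptions, and real PII detection needs much broader coverage than two patterns.

```python
import random
import re

# Hypothetical patterns for illustration; they catch only simple cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def sample_post_flight(messages, k, seed=0):
    """Pick up to k already-sent messages at random and redact each one."""
    rng = random.Random(seed)
    chosen = rng.sample(messages, min(k, len(messages)))
    return [strip_pii(m) for m in chosen]
```

For example, `strip_pii("Contact bob@example.com or +1 415-555-0100 now")` redacts both the address and the number before the text is passed on for review or labelling.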

There are various hints we use as features for the ML algorithms:

Rate of emails per day differing from the account's average
Engagement rate with emails at the receiver’s end
- Spam
- Soft Bounce
  - Mailbox full, recipient server was down, message was too large, etc.
- Hard Bounce
  - Email ID does not exist, email ID is invalid, etc.
- Open/Click
Downstream message content (handled in compliance with GDPR)
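
As a rough illustration of how the volume and engagement hints above could be turned into numeric features, here is a minimal sketch. The `DailyStats` record, its field names and the choice of a simple ratio against the historical average are assumptions made for this example; the proposal does not specify how the features are actually computed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class DailyStats:
    """Hypothetical per-account counters for a single day."""
    sent: int
    spam_reports: int
    soft_bounces: int
    hard_bounces: int
    opens: int
    clicks: int


def rate(count: int, total: int) -> float:
    """Fraction of sent emails, guarding against division by zero."""
    return count / total if total else 0.0


def account_features(history: Sequence[int], today: DailyStats) -> Dict[str, float]:
    """history holds the number of emails sent on each previous day."""
    avg = mean(history) if history else 0.0
    return {
        # How far today's volume departs from the account's own average.
        "volume_ratio": today.sent / avg if avg else 0.0,
        "spam_rate": rate(today.spam_reports, today.sent),
        "soft_bounce_rate": rate(today.soft_bounces, today.sent),
        "hard_bounce_rate": rate(today.hard_bounces, today.sent),
        "open_rate": rate(today.opens, today.sent),
        "click_rate": rate(today.clicks, today.sent),
    }
```

An account that usually sends about 200 emails a day and suddenly sends 1,000 with a high hard-bounce rate would show a `volume_ratio` of 5.0 together with an elevated `hard_bounce_rate`, which is the kind of combined signal a downstream model can learn from.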

Phish Detection Strategy:

Labelled Data Collection
- Hand-curated / programmatic
Data Preprocessing
- Stemming, stopword removal, etc.
AI Modelling
- Deep Learning (Bi-Directional LSTM)
- BERT Language Models
Model Deployment on Downstream Data
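
The preprocessing step in the strategy above might look something like the sketch below. It is deliberately dependency-free: the stopword list is a tiny illustrative sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer (such as a Porter stemmer), not the method used at Twilio. The resulting tokens are what would then be fed to a model such as the Bi-Directional LSTM named above.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "your", "is", "and", "of", "in", "for", "we", "you"}

# Crude suffix stripping as a stand-in for a proper stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list:
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]
```

Note that transformer models such as BERT ship with their own subword tokenizers, so stemming and stopword removal of this kind are typically relevant to the classical and LSTM-style models rather than to BERT.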

Below are the actions we take on suspicious accounts:

Email/SMS/Call cap
Manual Review by Fraud Ops team
Hard Ban
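
One way to picture how these actions could be tiered is a simple policy that maps a model's risk score to the least drastic sufficient action. This is a hypothetical sketch: the thresholds, the `has_good_traffic` flag and the `choose_action` function are assumptions for illustration, reflecting the "ban or no ban" dilemma for accounts that also carry legitimate traffic.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    CAP = "email/SMS/call cap"
    MANUAL_REVIEW = "manual review by Fraud Ops"
    HARD_BAN = "hard ban"


def choose_action(risk_score: float, has_good_traffic: bool) -> Action:
    """Map a risk score in [0, 1] to an action; thresholds are hypothetical."""
    # Ban outright only when the score is very high and the account
    # carries no legitimate traffic that a ban would disrupt.
    if risk_score >= 0.95 and not has_good_traffic:
        return Action.HARD_BAN
    if risk_score >= 0.8:
        return Action.MANUAL_REVIEW
    if risk_score >= 0.5:
        return Action.CAP
    return Action.NONE
```

Under this sketch, a high-risk account with good traffic is routed to manual review rather than banned, while a moderately suspicious one is only rate-capped.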

Key Takeaways:

How Twilio helps customers communicate at scale.
Various attacks we face at Twilio.
Hints/signals we observe to identify attackers.
ML solutions we use to prevent fraudulent communication.
Illustration of a Deep Learning model for identifying phishing attacks.
Actions we take on fraudsters.
Learnings from our journey so far.
Future steps to stop fraudsters using AI.
Demo of the Sift tool - one of the tools we use to detect fraudsters’ entry at Twilio (optional).

Presentation: https://docs.google.com/presentation/d/1HYS1e9jx36krPxOhYkrny4XLM0jF2PmUh-_KSPogQRQ/edit?usp=sharing
