Fighting Fraudsters in Email Communication at Twilio using Machine Learning
My name is Sachin Nagargoje, and I work at Twilio as a Staff Data Scientist. In this talk I would like to shed light on the kinds of attacks we currently face at Twilio and how we tackle them with a mix of innovative approaches and machine learning techniques. I will cover the challenges we face and what we do to catch such unwanted communication.
At Twilio, we serve 2T email addresses and send 90B emails per month for 80,000+ paying customers, reaching 56% of the world’s email addresses every year. We also serve 3B+ phone numbers in 100+ countries for 172K+ paying customers, who together send trillions of SMS messages.
According to the Litmus Benchmark Report (2019), organizations see an ROI of $42 for every $1 spent on email. According to Tessian reports (2019), 75% of organizations have faced some sort of phishing attack, and 96% of attacks arrive via email. The FBI’s Internet Crime Report shows that BEC (business email compromise) scammers made over $1.8 billion in 2020.
As these numbers suggest, one of the major challenges we face today is misuse of the Twilio platform to send phishing, fraud, and spam messages.
- ATO (Account Takeover) - Fraudsters break into an account by stealing its credentials or attacking it directly.
- Fake Accounts - Fraudsters create an account with false information and a stolen credit card.
- Trial Accounts - Fraudsters create a dummy account and use the trial credit to send phishing messages (small scale).
- Toll Fraud - Fraudsters place calls to premium-rate phone numbers run by small, shady carriers and later share the revenue with them.
- Spam (Good/Bad)
- Twilio’s reputation is at stake, since the messages are sent from Twilio accounts.
- It is hard to detect fraudsters in flight, although we can sample a few messages post-flight after stripping PII.
- Fraudsters often use otherwise good accounts for such fraud, so those accounts may carry legitimate traffic as well. Ban or no ban?
- Labelled Training data for ML modelling
- Rate of emails per day different from average
- Engagement rate with emails at receiver’s end
- Soft Bounce
  - Mailbox full, recipient server down, message too large, etc.
- Hard Bounce
  - Email address does not exist, email address is invalid, etc.
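
The volume and bounce signals above can be sketched in a few lines. This is only an illustration, not the detection logic used at Twilio: the function names, the z-score approach, and both thresholds are assumptions made for the example.

```python
from statistics import mean, stdev


def rate_zscore(history, today):
    """Standard deviations between today's send volume and the account's own average."""
    if len(history) < 2:
        return 0.0  # not enough history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma


def bounce_rates(sent, soft, hard):
    """Return (soft_bounce_rate, hard_bounce_rate) for one day of traffic."""
    if sent == 0:
        return 0.0, 0.0
    return soft / sent, hard / sent


def is_suspicious(history, today, sent, soft, hard,
                  z_threshold=3.0, hard_bounce_threshold=0.05):
    """Flag an account on a volume spike or a high hard-bounce rate.

    Both thresholds are hypothetical placeholders, not tuned values.
    """
    z = rate_zscore(history, today)
    _, hard_rate = bounce_rates(sent, soft, hard)
    return abs(z) > z_threshold or hard_rate > hard_bounce_threshold


history = [100, 110, 90, 105, 95]  # made-up daily send counts for one account
print(is_suspicious(history, today=1000, sent=1000, soft=20, hard=300))
print(is_suspicious(history, today=102, sent=102, soft=1, hard=0))
```

Hard bounces get their own threshold here because a list full of nonexistent addresses is a stronger hint of a purchased or scraped list than a few full mailboxes. In practice such signals would more likely be fed to a model as features than used as hard rules.
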
- Down-streaming content (meeting GDPR regulations)
- Labelled Data Collection
  - Hand-curated / programmatic
- Data Preprocessing
  - Stemming, stop-word removal, etc.
- AI Modelling
  - Deep Learning (Bi-Directional LSTM)
  - BERT Language Models
- Model Deployment on Downstream Data
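
To make the preprocessing step concrete, here is a toy sketch of stop-word removal and stemming using only the Python standard library. The stop-word list, the suffix list, and the function names are all assumptions for illustration; a real pipeline would more likely rely on an established NLP library, and the models themselves (Bi-Directional LSTM, BERT) are not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word list for the example.
STOPWORDS = {"a", "an", "the", "to", "your", "is", "and", "of", "in", "for", "please"}

# Naive suffix stripping; a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "ly", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("Please verify your account: the password is expiring!"))
# → ['verify', 'account', 'password', 'expir']
```

The resulting token list is the kind of input that would then be turned into features or sequences for the classifier.
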
- Email/SMS/Call cap
- Manual Review by Fraud Ops team
- Hard Ban
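
One way to picture how these actions could relate to a model's output is a tiered mapping from a risk score to an action. This is a sketch under stated assumptions: the proposal does not say that the actions are chosen by score, and the `Action` enum, the `choose_action` function, and all three thresholds are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CAP = "cap"                      # limit email/SMS/call volume
    MANUAL_REVIEW = "manual_review"  # hand over to the Fraud Ops team
    HARD_BAN = "hard_ban"


def choose_action(score, cap_at=0.5, review_at=0.7, ban_at=0.95):
    """Map a fraud risk score in [0, 1] to an escalating action.

    The thresholds are illustrative placeholders, not production values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= ban_at:
        return Action.HARD_BAN
    if score >= review_at:
        return Action.MANUAL_REVIEW
    if score >= cap_at:
        return Action.CAP
    return Action.ALLOW


for s in (0.1, 0.6, 0.8, 0.99):
    print(s, choose_action(s).value)
```

Keeping a manual review tier between capping and banning reflects the "ban or no ban" dilemma: a compromised but otherwise good account may deserve a human look before it is shut down.
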
- How Twilio helps customers communicate at scale.
- Various attacks we face at Twilio.
- Hints/Signals we observe to identify attackers.
- ML Solutions we use to avoid fraud communication.
- Illustration of a deep learning model for identifying phishing attacks.
- Actions we take on fraudsters.
- Learnings from our Journey so far.
- Future steps to stop fraudsters by using AI.
- Demo of Sift, one of the tools we use to detect fraudsters entering Twilio (optional).