Rootconf Pune edition

On security, network engineering and distributed systems



Arjun BM


MUDPIPE - Malicious URL Detection for Phishing Identification and Prevention

Submitted May 24, 2019

Social engineering is one of the most dangerous threats facing individuals and modern organizations. Phishing is a well-known, computer-based social engineering technique. Attackers use disguised emails as a weapon to target large companies. Numerous fake websites have been developed to mimic trusted websites, with the aim of stealing financial assets from users and organizations.
With the huge number of phishing emails received every day, companies are not able to detect all of them. That is why new techniques and safeguards are needed to defend against phishing. In the layered-security model, this is the next level of security control: it deals with the emails that still manage to evade the spam-filtering gateway, and it blocks the undesired action when a user clicks on a malicious link.
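The click-time control described above can be sketched as a check that runs when a user follows a link: extract the hostname and refuse to proceed if it, or any parent domain of it, is on a blocklist. This is a minimal Python sketch under stated assumptions: the blocklist contents, the `.example`/`.test` domains and the `on_click` policy are hypothetical, and a real deployment would load indicators from a threat-intelligence feed and would likely combine this lookup with a model score.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real deployment would load indicators from a threat-intel feed.
BLOCKED_DOMAINS = frozenset({"paypa1-login.example", "secure-update.test"})


def hostname_of(url: str) -> str:
    """Lowercase hostname of a URL, without scheme, credentials, port or trailing dot."""
    return (urlsplit(url).hostname or "").rstrip(".")


def is_blocked(url: str, blocklist: frozenset = BLOCKED_DOMAINS) -> bool:
    """True if the URL's host, or any parent domain of it, is on the blocklist."""
    labels = hostname_of(url).split(".")
    # For "a.b.evil.test" check "a.b.evil.test", "b.evil.test", "evil.test" (never the bare TLD).
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels) - 1))


def on_click(url: str) -> str:
    """Hypothetical click-time policy: refuse to redirect to blocklisted hosts."""
    return "block" if is_blocked(url) else "allow"


if __name__ == "__main__":
    for link in ("https://mail.paypa1-login.example/signin", "https://www.python.org/"):
        print(link, "->", on_click(link))
```

Matching whole domain suffixes rather than substrings means a subdomain of a blocked domain is caught, while a look-alike such as `notpaypa1-login.example` is not blocked by accident; that gap is exactly where a learned classifier has to take over.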
Machine learning (ML) is a popular tool for data analysis and has recently shown promising results in combating phishing. This talk will explore what goes on behind the scenes of phishing detection and walk through the steps required to build a machine learning-based solution to detect phishing attempts, using cutting-edge Python machine learning libraries.
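As an illustration of the first step in such a pipeline, each URL can be turned into numeric features that a classifier can learn from. The sketch below uses only the Python standard library; the particular features (URL length, an IP address used as the host, dots and hyphens in the host, an "@" symbol, HTTPS, and a small hypothetical keyword list) are common lexical signals chosen for illustration, not necessarily the ones used in the talk.

```python
import ipaddress
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical keyword list; a real system would curate or learn these terms.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account", "bank")


def _is_ip(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address instead of a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> Dict[str, float]:
    """Map a URL to a small dictionary of lexical features (all as floats)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    lowered = url.lower()
    return {
        "url_length": float(len(url)),
        "host_is_ip": float(_is_ip(host)),
        "num_dots_in_host": float(host.count(".")),
        "has_at_symbol": float("@" in url),
        "has_hyphen_in_host": float("-" in host),
        "uses_https": float(parts.scheme == "https"),
        "num_suspicious_words": float(sum(w in lowered for w in SUSPICIOUS_WORDS)),
    }


if __name__ == "__main__":
    print(url_features("http://192.168.10.5/secure-login/verify"))
```

Because Python dictionaries keep insertion order, `list(url_features(u).values())` yields a fixed-order vector suitable as one row of a training matrix.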


  1. Introduction to Phishing & Social Engineering
  2. Threat actors and vectors in phishing exploitation attacks
  3. Determinants at play while evaluating website genuineness
  4. How to build your own Machine Learning Model for phishing detection
  5. Demo of an existing model and model evaluation
  6. Factors to be considered while deploying the model in production
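For the model-building and evaluation steps in the outline, a minimal sketch is a logistic regression classifier trained with batch gradient descent, followed by precision and recall on its predictions. This is a swapped-in, dependency-free illustration rather than the speaker's model: a real project would more likely use a library such as scikit-learn, and the three feature columns (IP-address host, "@" in the URL, count of suspicious keywords) and the toy vectors in the demo are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on mean log-loss.

    Features should be on comparable scales; lr may need tuning per dataset.
    """
    m, n = len(X), len(X[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the score is (predicted probability - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (phishing) if the predicted probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (phishing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy vectors: [host_is_ip, has_at_symbol, num_suspicious_words]
    X = [[1, 1, 3], [1, 0, 2], [0, 1, 2], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logreg(X, y, lr=0.5, epochs=5000)
    preds = [predict(w, b, x) for x in X]
    print("weights:", w, "bias:", b)
    print("precision, recall on training data:", precision_recall(y, preds))
```

Precision reflects how often a "phishing" verdict wrongly blocks legitimate mail or links, while recall reflects how many phishing attempts slip through. Scoring on the training data, as the demo does, says little about real-world performance; a held-out test set is needed for that.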

Speaker bio

Arjun is a security professional with diverse experience in architecting, designing, implementing & supporting IT Security & Vulnerability Management solutions in Enterprise & Cloud environments. He is an information security enthusiast with experience in areas such as Application Security, Security Architecture, DevSecOps, Cloud Security & Machine Learning. Arjun currently works as a Security Architect, ensuring the end-to-end design, implementation and governance of security measures for an e-commerce platform, aimed at brand protection and improving customer confidence. He is developing products that aid phishing detection in the enterprise and ensure that defenses are in place to counter this threat.



  • Zainab Bawa

    @zainabbawa Editor & Promoter

    Hello Arjun,

    Here are two sets of feedback I received from a review we carried out:

    1. How do you crawl the internet to identify phishing domains? Do you use Apache Nutch or Spiderfoot? How do we validate the accuracy of spidering?
    2. Have you validated results with or some other method? Will your analysis work with an available phishing DB rather than with ML? Why is ML really needed to solve the problem you are describing?
    3. Which ML models did you try before deciding the one you are presenting? What were the drawbacks of other approaches? This will be interesting unless you used a small dataset.
    4. While the topic is useful, Google, Microsoft and other public email providers have done significant work in this area. The phishing detection these providers do is decent (especially given their scale) because they have massive datasets. Are you trying to solve this problem from scratch, i.e. using conventional supervised learning / classifiers? If so, it may end up being just another lab project that is difficult to apply in real life due to a lack of training data.
    5. Does it make more sense to build a solution using proven and reliable technologies/APIs from Google or other providers (if any) in a way that can be integrated with a self-hosted mail solution, or with any other use case that requires classification of email or text content?