Rootconf Pune edition

On security, network engineering and distributed systems


MUDPIPE - Malicious URL Detection for Phishing Identification and Prevention

Submitted by Arjun BM (@arjunbm) on Friday, 24 May 2019

Section: Crisp talk
Technical level: Intermediate
Session type: Demo
Status: Rejected


Social engineering is one of the most dangerous threats facing individuals and modern organizations. Phishing is a well-known, computer-based social engineering technique. Attackers use disguised emails as a weapon to target large companies. Numerous fake websites have been developed to mimic trusted websites, with the aim of stealing financial assets from users and organizations.
With the huge number of phishing emails received every day, companies are not able to detect all of them. That is why new techniques and safeguards are needed to defend against phishing. In the layered-security model, this is the next level of security control: it deals with emails that manage to evade the spam filtering gateway, and it blocks the undesired action when a user clicks on a malicious link.
Machine learning (ML) is a popular tool for data analysis and has recently shown promising results in combating phishing. This talk will explore what happens behind the scenes of phishing detection and walk through the steps required to build a machine learning-based solution to detect phishing attempts, using cutting-edge Python machine learning libraries.


  1. Introduction to Phishing & Social Engineering
  2. Threat actors and vectors in phishing exploitation attacks
  3. Determinants at play while evaluating website genuineness
  4. How to build your own Machine Learning Model for phishing detection
  5. Demo of an existing model and model evaluation
  6. Factors to be considered while deploying the model in production
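
As an illustration of items 3 and 4, the sketch below shows one way lexical signals could be pulled out of a URL before being handed to a classifier. It is not the model from the talk: the feature set, the `SUSPICIOUS_WORDS` list and the function names `host_is_ip` and `url_features` are all assumptions made for this example, and it relies only on the Python standard library.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list, chosen for illustration only.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract a few simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is None when no netloc is present
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip(host),
        "uses_https": parsed.scheme == "https",
        "num_digits": sum(ch.isdigit() for ch in url),
        "suspicious_words": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    for sample in ("http://192.168.0.1/login", "https://www.example.com/"):
        print(sample, url_features(sample))
```

Feature dictionaries like these could then be vectorized and passed to any supervised classifier. Which features carry real signal would have to be established on labelled data rather than assumed.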

Speaker bio

Arjun is a security professional with diverse experience in architecting, designing, implementing & supporting IT Security & Vulnerability Management solutions in Enterprise & Cloud environments. He is an information security enthusiast with experience in areas like Application Security, Security Architecture, DevSecOps, Cloud Security & Machine Learning. Arjun is currently working as a Security Architect, ensuring end-to-end design, implementation and governance of security measures for an e-commerce platform, aimed at brand protection and improving customer confidence. He is also developing products that aid in phishing detection for the enterprise and ensure that defenses are in place to counter this threat.





  •   saurabh hirani (@saurabh-hirani) 10 months ago

    Hi Arjun - can you add some first cut slides and a preview video explaining the flow? Please highlight what value this talk will provide to the audience. I see that the latter half of your outline gives a demo + insights on considerations while deploying it in production. Highlighting that would really help, because if it is information that the audience can’t easily google, it will make it worth their while to attend the talk.

  •   Arjun BM (@arjunbm) Proposer 10 months ago

    Hi Saurabh. Thank you for your feedback. I have updated the proposal with the first cut slides. Please review. Thank you.

  •   Arjun BM (@arjunbm) Proposer 10 months ago

    I have also updated the YouTube link for the preview video. Please let me know if you need anything else. Thank you.

  •   Zainab Bawa (@zainabbawa) Reviewer 9 months ago

    Thanks for the update, Arjun. This proposal is definitely interesting, and is at the intersection of ML and DevSecOps. The intersectionality makes this an important point of view to build on and share with the Rootconf community.

    Given the paucity of time, we are happy to consider your proposal for Rootconf Pune 2019 (to be scheduled in September).

    Meanwhile, we will have reviewers from the community give you feedback on your slides and preview video. The slides are far too cluttered right now. It might serve everyone if you were to revisit them and break each slide down into one specific idea.

    •   Arjun BM (@arjunbm) Proposer 9 months ago

      Hi Zainab. Thank you for taking the time and effort to review this proposal. I would be privileged to present this at Rootconf Pune 2019. I would be happy to work with the reviewers and get their feedback to improve the slides. Please keep me updated. Thank you.

  •   Zainab Bawa (@zainabbawa) Reviewer 8 months ago

    Hello Arjun,

    Here are two sets of feedback I received from a review we carried out:

    1. How do you crawl the internet to identify phishing domains? Do you use Apache Nutch or Spiderfoot? How do we validate the accuracy of spidering?
    2. Have you validated results with or some other method? Will your analysis work with an available phishing DB rather than with ML? Why is ML really needed to solve the problem you are describing?
    3. Which ML models did you try before deciding on the one you are presenting? What were the drawbacks of other approaches? This will be interesting unless you used a small dataset.
    4. While the topic is useful, Google, Microsoft and other public email providers have done significant work in this area. The phishing detection that these email providers do is decent (especially given their scale) because they have massive datasets. Therefore, are you trying to solve this problem from scratch, i.e. using conventional supervised learning / classifiers? If so, it will end up being just another lab project that will be difficult to apply in real life due to lack of training data.
    5. Does it make more sense to build a solution using proven and reliable technologies/APIs from Google or other providers (if any) in a way that can be integrated with a self-hosted mail solution or any other use-case that requires classification of email or text content?
