The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem


MUDPIPE - Malicious URL Detection for Phishing Identification and Prevention

Submitted by Arjun BM (@arjunbm) on Thursday, 13 June 2019



Session type: Short talk of 20 mins


Abstract

Social engineering is one of the most dangerous threats facing individuals and modern organizations. Phishing is a well-known, computer-based social engineering technique: attackers use disguised emails as a weapon to target large companies, and numerous fake websites have been developed to mimic trusted ones, with the aim of stealing financial assets from users and organizations.
Given the huge number of phishing emails received every day, companies are not able to detect all of them, which is why new techniques and safeguards are needed to defend against phishing. In the layered-security model, this is the next level of security control: it deals with the emails that manage to evade the spam-filtering gateway, and it blocks the undesired action when a user clicks on a malicious link.
Machine learning (ML) is a popular tool for data analysis and has recently shown promising results in combating phishing. This talk will go behind the scenes of phishing detection and walk through the steps required to build a machine learning-based solution to detect phishing attempts, using cutting-edge Python machine learning libraries.

Outline

  1. Introduction to Phishing & Social Engineering
  2. Threat actors and vectors in phishing exploitation attacks
  3. Determinants at play while evaluating website genuineness
  4. How to build your own Machine Learning Model for phishing detection
  5. Demo of an existing model and model evaluation
  6. Factors to be considered while deploying the model in production

Speaker bio

Arjun is a security professional with diverse experience in architecting, designing, implementing & supporting IT Security & Vulnerability Management solutions in Enterprise & Cloud environments. He is an information security enthusiast with experience in areas like Application Security, Security Architecture, DevSecOps, Cloud Security & Machine Learning. Arjun is currently working as a Security Architect, ensuring end-to-end design, implementation and governance of security measures for an e-commerce platform, aimed at brand protection and improving customer confidence. He is also developing products that aid in phishing detection for the enterprise and ensure that defenses are in place to counter this threat.

Links

Slides

https://www.slideshare.net/ArjunBM3/rootconfphishingv2

Preview video

https://youtu.be/Aev9c6hf4lo

Comments

  • Abhishek Balaji (@booleanbalaji) Reviewer a month ago

    Hi Arjun,

    Thank you for submitting a proposal. We need to see more detailed slides to evaluate your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/frameworks available in the market to solve this problem? How did you evaluate these, and what metrics did you use for the evaluation? Why did you pick the option that you did?
    • Explain how the situation was before the solution you picked/built and how it changed after implementing the solution you picked and built? Show before-after scenario comparisons & metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    We need your updated slides by Jun 27, 2019 to evaluate your proposal. If we do not receive an update, we will move your proposal to be evaluated for a future event.

  • Huzaifa Sidhpurwala (@huzaifas) 19 days ago

    Hi,

    I like the proposal and the slides. I am just wondering if there is a demo or a proof of concept developed using ML/Python. Audiences love demos, and with a topic as current as this one, a working proof of concept will be highly appreciated.

  • Arjun BM (@arjunbm) Proposer 19 days ago

    Hi Huzaifa - Yes. A demo/POC is available and ready to be presented at the conference. Do you have specific recommendations on how I can include this in the PPT?

    • Zainab Bawa (@zainabbawa) Reviewer 9 days ago

      Record a video of the demo and embed it in the slides so that it plays from the slides. This is the safest and most secure way to have a seamless presentation!

  • Prajal Kulkarni (@prajal) 18 days ago

    I like your proposal. It would be interesting to see this approach for solving phishing problems.
    I have a request though. I would like to understand more about what you do after data filtering, and what the crux of your model is that establishes that a site is a phishing site. Have you modelled it from scratch or are you leveraging any open-source libraries? And lastly, is there any existing framework which solves this problem?

    • Arjun BM (@arjunbm) Proposer 17 days ago

      Hi Prajal,
      Since this is supervised machine learning, the first step is to create a labeled dataset, i.e. each row in the dataset represents the attributes of a website and is labeled as either “Phishing” or “Legitimate”. There are about 30 parameters/attributes which provide indicators of phishing sites. Feature/data engineering is used to map each website to these 30 parameter values (details in the PPT). Using this dataset (with a sizeable number of records), we create a baseline model to help identify similar patterns in future/unknown/test data.
      This has been modeled using standard Python ML libraries like numpy, pandas and sklearn, and standard Python libraries like logging, urlparse, json and dns.resolver.
      As far as I am aware, there are no existing frameworks which solve this problem. Most organizations use commercial software (like Symantec Bluecoat) to detect phishing sites.
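
      To make the feature-engineering step above concrete, here is a minimal sketch of how a URL could be mapped to one row of numeric attributes. It is an illustration only, not the model from the talk: the function names `extract_features` and `to_row` and the six features below are assumptions chosen for the example, not the roughly 30 parameters described in the slides. It uses only the Python 3 standard library (`urllib.parse` in place of the Python 2 `urlparse` module).

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical feature names, fixed in one place so every row has the same column order.
FEATURE_NAMES = [
    "url_length",
    "has_ip_host",
    "has_at_symbol",
    "host_dots",
    "uses_https",
    "has_hyphen_in_host",
]


def extract_features(url: str) -> dict:
    """Map a URL to a small dict of numeric lexical indicators (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited phishing indicator.
    try:
        ipaddress.ip_address(host)
        has_ip_host = 1
    except ValueError:
        has_ip_host = 0

    return {
        "url_length": len(url),
        "has_ip_host": has_ip_host,
        "has_at_symbol": int("@" in url),
        "host_dots": host.count("."),
        "uses_https": int(parsed.scheme == "https"),
        "has_hyphen_in_host": int("-" in host),
    }


def to_row(features: dict) -> list:
    """Flatten a feature dict into a list ordered by FEATURE_NAMES."""
    return [features[name] for name in FEATURE_NAMES]
```

      Rows produced this way, each paired with a “Phishing” or “Legitimate” label, would form the labeled dataset that a classifier (for example one from sklearn) is trained on. Lexical features alone are a simplification; attributes that need DNS lookups or page content are left out of this sketch.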

      • Prajal Kulkarni (@prajal) 7 days ago

        Got it. But how do you plan to get this dataset in the first place? How are you feeding this massive list of sites to your engine? Is this framework expected to solve a point problem (one site at a time), or do you plan to make it a scalable solution?

  • Arjun BM (@arjunbm) Proposer 15 days ago

    Proposal has been updated with new slide deck - https://www.slideshare.net/ArjunBM3/rootconfphishingv2
    Kindly review. Thank you
