50p 2018

India's first independent conference on payments and the payment ecosystem

From Data to Dollars – Using Advanced Data Technologies to Minimize Fraud Loss.

Submitted by Starakjeet Nayak (@starakjeet) on Thursday, 7 December 2017

Section: Full talk
Technical level: Intermediate
Status: Rejected


In a world where digital payments are becoming increasingly commonplace, it is important to keep fraud loss in check. Fraud pressure on popular payment companies is rising, and if left unchecked it can put a company out of business. One way to keep tabs on fraudulent activity is to leverage the vast amount of data that flows in with every transaction or user activity on the platform. Processing this data with the right data technologies can help determine whether a given transaction is fraudulent. In this talk, I will discuss various ways to build an ecosystem of popular large-scale data processing technologies, such as Hadoop and Teradata, to mitigate fraud loss.


The flow of the talk is as follows:
1. Introduction - What is Fraud Loss and Fraud Pressure?
2. Fraud Models – Mathematical armor for fraud prevention (illustrated with an example of fraud model training)
3. Building an Ecosystem of Data Technologies
   3.1 Ways to pre-process data to create unique data artifacts that can be used in near-real time.
   3.2 How to link popular data technologies through a service to create an ecosystem that enables easy data access.
   3.3 Journey of a data variable across all stages: raw data collection, data processing, model score computation and, finally, the decision.
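As a rough illustration of item 3.3, the Python sketch below traces one data variable through the four stages. Everything in it is a hypothetical stand-in, not PayPal's pipeline or any real fraud model: the variable (an account's transaction count over the last 24 hours), the weights, the decision threshold, and the plain dict that plays the role of a near-real-time lookup store such as Aerospike.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

# Stage 1: raw data collection. Hypothetical transaction events.
RAW_EVENTS = (
    [{"account": "A1", "amount": 20.0, "ts": datetime(2017, 12, 7, 9, 0)}]
    + [{"account": "A2", "amount": 50.0, "ts": datetime(2017, 12, 7, 10, m)}
       for m in range(0, 60, 10)]  # six events for A2 within one hour
    + [{"account": "A3", "amount": 75.0, "ts": datetime(2017, 12, 5, 8, 0)}]
)

# Stage 2: data processing. Precompute a per-account variable in bulk, the way
# a batch job might, and publish it to a low-latency store. A plain dict
# stands in for that key-value store here (assumption).
def build_variable_store(events, as_of, window=timedelta(hours=24)):
    counts = defaultdict(int)
    for e in events:
        if as_of - window <= e["ts"] <= as_of:
            counts[e["account"]] += 1
    return dict(counts)

# Stage 3: model score computation. A toy logistic model with made-up weights;
# the precomputed variable is looked up at scoring time, not recomputed.
WEIGHTS = {"bias": -4.0, "txn_count_24h": 0.8, "amount": 0.002}

def score(txn, store):
    z = (WEIGHTS["bias"]
         + WEIGHTS["txn_count_24h"] * store.get(txn["account"], 0)
         + WEIGHTS["amount"] * txn["amount"])
    return 1.0 / (1.0 + math.exp(-z))  # probability-like score in (0, 1)

# Stage 4: decision. The threshold is hypothetical.
def decide(p, threshold=0.5):
    return "DECLINE" if p >= threshold else "APPROVE"

now = datetime(2017, 12, 7, 12, 0)
store = build_variable_store(RAW_EVENTS, as_of=now)
p_a2 = score({"account": "A2", "amount": 500.0}, store)
p_a1 = score({"account": "A1", "amount": 20.0}, store)
print(store, round(p_a2, 3), decide(p_a2), round(p_a1, 3), decide(p_a1))
```

The point of the split between stages 2 and 3 is that the expensive aggregation runs ahead of time over the full data set, so the scoring path only needs a fast key lookup while the transaction is in flight.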

Speaker bio

Starakjeet Nayak is a Product Manager at PayPal, where he has worked in the Risk Platform team for nearly four years. He has designed and implemented many risk services at PayPal and has used Hadoop extensively to analyze data and identify transactional fraud. Before PayPal, Starakjeet worked at Oracle. He graduated in Computer Science from Birla Institute of Technology and Science, Pilani.
Outside work, he enjoys outdoor activities such as hiking, cycling and running. He also enjoys public speaking and has anchored major PayPal events such as Recharge 2017.


  • Zainab Bawa (@zainabbawa) Crew 2 years ago

    Thanks for this interesting proposal, Starakjeet. I’d like to see either diagrams or draft slides to understand the model and architecture you have for fraud detection. I’d also like to understand why such a system is built with Hadoop rather than something else. Looking forward to your responses.

    • Starakjeet Nayak (@starakjeet) Proposer 2 years ago

      Hi Zainab. Thanks for your interest in this. The purpose of this talk is to illustrate how we can use data for fraud prevention. Fraud models are statistical artifacts designed using various techniques such as machine learning, deep learning and linear regression. These models depend on data variables being available in real time or near real time in order to compute a score. The objective is not to talk about how models are designed (there is a well-established science for this, but it is not the aim of my talk), but rather I want to focus on how we can build an ecosystem which consists of various data technologies to prepare the necessary data variables for these models. Secondly, Hadoop is just one of the technologies one can use. It works best here because we need to process a lot of data. I will also talk about other data technologies in the ecosystem, such as Aerospike and Teradata.

      Hope this answers your question.

      • Zainab Bawa (@zainabbawa) Crew 2 years ago

        “but rather I want to focus on how we can build an ecosystem which consists of various data technologies to prepare the necessary data variables for these models.” Please elaborate, specifically on what the audience will learn from this.
