Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Accountable Behavioural Change Detection (VEDAR) using Machine Learning

Submitted by Srinivasa Rao Aravilli (@aravilli) on Tuesday, 16 April 2019



Session type: Short talk of 20 mins

Abstract

With the exponential increase in the availability of telemetry, streaming and real-time data, understanding contextual behaviour changes is a vital capability for delivering an unrivalled customer experience and building high-performance, high-availability systems. Real-time behaviour change detection has use cases in a number of domains such as social networks, network traffic monitoring and ad exchange metrics. In streaming data, a behaviour change is an implausible observation that does not fit the distribution of the rest of the data. A timely and precise revelation of such behaviour changes can give us substantial information about the system in critical situations, which can be a driving factor for vital decisions.

Detecting behaviour changes in a streaming fashion is a difficult task, as the system needs to process high-speed real-time data and continuously learn from it while detecting anomalies in a single pass over the data. In this talk, we introduce a novel algorithm called Accountable Behaviour Change Detection (VEDAR), which can detect and elucidate behaviour changes in real time and operates in a fashion similar to human perception. We have benchmarked our algorithm by comparing its performance on open-source anomaly detection datasets against industry-standard algorithms such as Numenta HTM and Twitter ADVec (S-H-ESD). Our algorithm outperforms the above-mentioned algorithms in behaviour change detection efficacy.
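
To make the single-pass constraint concrete, here is a minimal generic sketch of a streaming detector in plain Python. This is not the VEDAR implementation: it is a conventional exponentially weighted z-score check, and the class name and parameters (alpha, threshold, warmup) are illustrative assumptions. VEDAR, as described above, additionally handles seasonality, trend and the explanation of detected changes.

```python
import math


class StreamingZScoreDetector:
    """Flag points that deviate sharply from exponentially weighted running
    statistics. O(1) time and memory per point, i.e. a single pass."""

    def __init__(self, alpha=0.05, threshold=4.0, warmup=10):
        self.alpha = alpha          # smoothing factor for running mean/variance
        self.threshold = threshold  # z-score above which a point is flagged
        self.warmup = warmup        # points to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        """Process one data point; return (is_anomaly, z_score)."""
        self.count += 1
        if self.count == 1:         # first observation initialises the state
            self.mean = x
            return False, 0.0
        std = math.sqrt(self.var) or 1e-9
        z = abs(x - self.mean) / std
        is_anomaly = self.count > self.warmup and z > self.threshold
        # update the running statistics in constant time (EWMA mean and variance)
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return is_anomaly, z


# usage: feed points one at a time, as they arrive from the stream
detector = StreamingZScoreDetector(warmup=5)
for value in [10, 11, 10, 12, 11, 10, 12, 11, 55, 10, 11]:
    flagged, score = detector.update(value)
    if flagged:
        print(f"behaviour change at value={value} (z={score:.1f})")
```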

Outline

This talk mainly covers the VEDAR algorithm in detail and benchmark comparisons with other streaming anomaly detection algorithms. More details are in the paper: https://arxiv.org/abs/1902.06663

Speaker bio

Aravilli Srinivasa Rao works as a Sr. Engineering Manager in the Cisco CTO group, leading innovation and incubation of ML and AI projects. As a speaker, he has presented at the following conferences/workshops:
1) Presented Cisco's ML/AI applications at the PDPC/CIPL workshop in Singapore, and as a panelist shared experiences and thoughts on accountable and responsible AI.
2) Presented at the IoT and AI Summit organized by CII in India about IoT and ML applications and related platforms in the IoT space.
3) Presented on "Streaming Anomaly Detection" at Cisco's Data Science Summit in Prague.

He has a patent on software recommendations using reinforcement learning.

Links

Slides

https://arxiv.org/abs/1902.06663

Preview video

https://youtu.be/gEcBaRjwxXw

Comments

  • Anwesha Sarkar (@anweshaalt) 2 months ago

    Thank you for your submission. Submit your preview video and slides by 23rd March (latest). This helps us provide a fair evaluation of the proposal and close the review process.

  • Srinivasa Rao Aravilli (@aravilli) Proposer 2 months ago

    Thanks. Slides can be accessed @ https://www.slideshare.net/secret/6f8RsdF8n7fmeZ

    • Zainab Bawa (@zainabbawa) Reviewer a month ago

      This is a link to a paper. We are evaluating this. Meanwhile, help us understand:

      1. Who is the target audience for this talk?
      2. What are the takeaways from this proposed talk for the audience?
      • Zainab Bawa (@zainabbawa) Reviewer a month ago

        We’ll also need a two-minute elevator pitch video, explaining what this talk is about and why participants at The Fifth Elephant should be interested in listening to this. Submit this by or before 10 May.

        • Srinivasa Rao Aravilli (@aravilli) Proposer a month ago

          Uploaded a 3-minute preview video as requested.

      • Zainab Bawa (@zainabbawa) Reviewer a month ago

        More feedback points that have come up in the review:

        1. Share details about the algorithm, use cases of where it applies and where it doesn’t.
        2. Show comparisons to other anomaly detection methods such as LSTM, CNN, and graph methods, if you have tried these.
  • Srinivasa Rao Aravilli (@aravilli) Proposer a month ago

    The algorithm details are already described in depth in the paper. The benchmark results are compared against two known industry baselines, i.e. Twitter ADVec and Numenta HTM. We will open source the code soon. We don't plan to compare with LSTM because it deviates from the core algorithm we have implemented; there are various flavours of LSTM (with an input gate, without an input gate, using only the forget gate, etc.) and many hyperparameters. As mentioned in the paper, deep learning models have to be retrained frequently in order to stay updated with new data, and they require a huge amount of data for training. The key takeaways from this talk/paper are: 1) how to detect anomalies in streaming data, 2) accounting for seasonality and trend, 3) explainability of the detected anomalies, 4) context awareness, and 5) a benchmark comparison with two other algorithms.
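
As a rough illustration of takeaways 2 and 3 above (seasonality awareness and explainability), the hedged sketch below is not the published VEDAR code: it differences each point against the value one season earlier, scores the residual in a single pass, and attaches a human-readable reason to every flagged point. The function name, season length, threshold and warm-up size are illustrative assumptions.

```python
from collections import deque
import statistics


def detect_with_season(stream, season_length, threshold=4.0, min_history=10):
    """Yield (index, value, explanation) for points whose seasonally adjusted
    residual is far outside the residuals seen so far. Single pass, O(season)
    memory for the raw values."""
    last_season = deque(maxlen=season_length)  # raw values of the last season
    residuals = []  # residuals seen so far (unbounded here, for simplicity)
    for i, x in enumerate(stream):
        if len(last_season) == season_length:
            expected = last_season[0]          # value exactly one season earlier
            residual = x - expected
            if len(residuals) >= min_history:
                mu = statistics.mean(residuals)
                sigma = statistics.pstdev(residuals) or 1.0  # guard: zero spread
                z = abs(residual - mu) / sigma
                if z > threshold:
                    # the explanation is what makes the detection accountable
                    yield i, x, (f"expected ~{expected} based on last season, "
                                 f"observed {x} (z={z:.1f})")
            residuals.append(residual)
        last_season.append(x)


# usage: a weekly pattern (period 7) with one injected spike on a quiet day
weekly = [5, 5, 5, 5, 5, 20, 20] * 4
weekly[24] = 60
for idx, _value, why in detect_with_season(weekly, season_length=7):
    print(f"index {idx}: {why}")
```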

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    Here are some other comments that have come up in the review:

    1. The comparison with other algorithms relies on just a handful of false positive and false negative counts (fewer than 5 in most cases). From this table, no conclusion can be drawn about whether the technique is in general better than others. How do you substantiate your claims?
    2. Since the proposed technique involves multiple other algorithms, each of which is itself computation-heavy, it's essential to compare the run time. Can VEDAR detect anomalies in sublinear time?
    3. Following from the above, what are the limitations of VEDAR? Which use cases will benefit from using VEDAR and which will not?
    4. Can you share real-life experiences where VEDAR has been used? What were the outcomes in such cases?
    • Srinivasa Rao Aravilli (@aravilli) Proposer a month ago

      Hi, please find the responses to your questions below.
      1) We have run the algorithm on multiple datasets; since it is not possible to show our algorithm's output on every open-source dataset, we have listed a few.
      2) Yes, VEDAR can be deployed in a streaming fashion with no overhead. It takes 2-3 microseconds to process each data point.
      3) There is one module which takes 2-3 minutes to run, but this module runs on a separate thread and does not affect the run time of VEDAR.
      4) VEDAR is designed to apply to any kind of streaming data and does not require any use-case-specific tuning. We have tested VEDAR on telemetry data from Cisco devices (routers) and found very satisfactory results. VEDAR is deployed in our prod and non-prod environments (data from 200+ application services deployed on 100+ servers), where it has substantially reduced the support team's monitoring effort and reduced false alarms and alarm storms.
