Anthill Inside 2019

A conference on AI and Deep Learning


Accountable Behavioural Change Detection (VEDAR) using Machine Learning

Submitted by Srinivasa Rao Aravilli (@aravilli) on Tuesday, 16 April 2019


Session type: Short talk of 20 mins

Abstract

With the exponential increase in the availability of telemetry, streaming, and real-time data, understanding contextual behavior changes is vital for delivering an unrivalled customer experience and building high-performance, high-availability systems. Real-time behavior change detection has use cases in a number of domains, such as social networks, network traffic monitoring, and ad exchange metrics. In streaming data, a behavior change is an implausible observation that does not fit the distribution of the rest of the data. Timely and precise revelation of such behavior changes can give us substantial information about the system in critical situations, which can be a driving factor for vital decisions. Detecting behavior changes in streaming fashion is a difficult task, as the system needs to process high-speed real-time data and continuously learn from it while detecting anomalies in a single pass. In this talk, we introduce a novel algorithm called Accountable Behavior Change Detection (VEDAR), which can detect and elucidate behavior changes in real time and operates in a fashion similar to human perception. We have benchmarked our algorithm against industry-standard algorithms such as Numenta HTM and Twitter AdVec (S-H-ESD) on open-source anomaly detection datasets. Our algorithm outperforms the above-mentioned algorithms in behavior change detection and efficacy.
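The single-pass, continuous-learning constraint described above can be illustrated with a minimal sketch. This is not the VEDAR algorithm (which is described in the linked paper), just a textbook online z-score detector using Welford's update, processing each point in O(1) time and memory:

```python
import math

# Minimal single-pass anomaly sketch (NOT the VEDAR algorithm):
# maintain a running mean/variance with Welford's update and flag
# points more than `k` standard deviations from the running mean.
class StreamingZScoreDetector:
    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        """Process one point; return True if it looks anomalous."""
        is_anomaly = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.k * std:
                is_anomaly = True
        # Welford's online update: O(1) time and memory per point.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamingZScoreDetector(k=3.0)
flags = [detector.update(x) for x in [10, 11, 10, 12, 11, 10, 95, 11]]
# the spike at 95 is flagged; the steady values are not
```

A real detector such as the one the talk proposes must additionally handle seasonality, trend, and concept drift, which this sketch does not.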

Outline

This talk mainly covers the VEDAR algorithm in detail and a benchmark comparison with other streaming anomaly detection algorithms. More details in the paper: https://arxiv.org/abs/1902.06663

Speaker bio

Aravilli Srinivasa Rao works as a Sr. Engineering Manager in Cisco's CTO group, leading innovation and incubation of ML and AI projects. He has presented at the following conferences/workshops:
1) Presented Cisco's ML/AI applications at the PDPC/CIPL workshop in Singapore, and as a panelist shared experiences and thoughts on accountable and responsible AI. 2) Presented at the IoT and AI Summit organized by CII in India about IoT and ML applications and related platforms in the IoT space. 3) Presented "Streaming Anomaly Detection" at Cisco's Data Science Summit in Prague.

He has a patent on software recommendations using reinforcement learning.

Links

Slides

https://arxiv.org/abs/1902.06663

Preview video

https://youtu.be/gEcBaRjwxXw

Comments

  • Anwesha Sarkar (@anweshaalt) 6 months ago

    Thank you for your submission. Submit your preview video and slides by 23rd March (latest). This helps us provide a fair evaluation of the proposal and close the review process.

  • Srinivasa Rao Aravilli (@aravilli) Proposer 6 months ago

    Thanks. Slides can be accessed @ https://www.slideshare.net/secret/6f8RsdF8n7fmeZ

    • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

      This is a link to a paper. We are evaluating this. Meanwhile, help us understand:

      1. Who is the target audience for this talk?
      2. What are the takeaways from this proposed talk for the audience?
      • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

        We’ll also need a two-minute elevator pitch video, explaining what this talk is about and why participants at The Fifth Elephant should be interested in listening to this. Submit this by or before 10 May.

        • Srinivasa Rao Aravilli (@aravilli) Proposer 5 months ago

          Uploaded a 3-minute preview video as requested.

      • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

        More feedback points that have come up in the review:

        1. Share details about the algorithm, use cases of where it applies and where it doesn’t.
        2. Show comparisons to other anomaly detection methods such as LSTM, CNN, and graph methods, if you have tried these.
  • Srinivasa Rao Aravilli (@aravilli) Proposer 5 months ago

    The algorithm details are already mentioned in the paper in detail. The benchmark results are compared with two known industry benchmarks, i.e. Twitter AdVec and Numenta HTM. We will open-source the code soon. We don't plan to compare with LSTM because it deviates from the core algorithm we have implemented; there are various flavours of LSTM (with an input gate, without an input gate, using only a forget gate, etc.), each with its own hyperparameters. As mentioned in the paper, deep learning models have to be retrained frequently in order to stay updated with new data, and they require a huge amount of data for training. The key takeaways from this talk/paper are: 1) how to detect anomalies in streaming data, 2) considering seasonality and trend as well, 3) explainability of the detected anomalies, 4) context awareness, and 5) a benchmark comparison with two other algorithms.
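Takeaway 2 (accounting for seasonality in streaming data) can be sketched in a few lines. This is a hypothetical illustration, not VEDAR's actual mechanism: compare each point to the value at the same phase one period ago, keeping only one period of history in memory.

```python
from collections import deque

# Hypothetical sketch of seasonality-aware streaming detection
# (NOT VEDAR's actual mechanism): flag a point if it deviates from
# the same phase one period ago, using bounded memory of one period.
class SeasonalResidualDetector:
    def __init__(self, period, threshold):
        self.period = period
        self.threshold = threshold
        self.window = deque(maxlen=period)  # last `period` points only

    def update(self, x):
        """Process one point; True if it deviates from last season."""
        anomalous = False
        if len(self.window) == self.period:
            expected = self.window[0]  # same phase, one period ago
            anomalous = abs(x - expected) > self.threshold
        self.window.append(x)
        return anomalous

det = SeasonalResidualDetector(period=4, threshold=5)
flags = [det.update(x) for x in [1, 5, 9, 5, 1, 5, 9, 5, 1, 5, 30, 5]]
# 30 breaks the repeating 1-5-9-5 seasonal pattern and is flagged
```

A fixed threshold and no trend handling keep the sketch short; a production detector would estimate both adaptively, as the paper discusses.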

  • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

    Here are some other comments that have come up in the review:

    1. The comparison with other algorithms relies on just a handful of false positive and negative counts (less than 5 in most cases). From this table no conclusion can be drawn about whether the technique is in general better than others. How do you substantiate your claims?
    2. Since the proposed technique involves multiple other algorithms each of which are themselves computation heavy it’s essential to compare the run time. Can VEDAR detect anomalies in sublinear time?
    3. Following from the above, what are the limitations of VEDAR? Which use cases will benefit from using VEDAR and which will not?
    4. Can you share real-life experiences where VEDAR has been used? What were the outcomes in such cases?
    • Srinivasa Rao Aravilli (@aravilli) Proposer 5 months ago

      Hi, please find the responses to your questions.
      1) We have run the algorithm on multiple datasets; since it is not possible to show our algorithm's output on every open-source dataset, we have listed a few.
      2) Yes, VEDAR can be deployed in streaming fashion with no overhead. It takes 2-3 microseconds to process each data point.
      3) There is one module which takes 2-3 minutes to run, but this module executes periodically on a separate thread, so it does not affect the run time of VEDAR.
      4) VEDAR is designed to apply to any kind of streaming data. It doesn't require any use-case-specific tuning. We have tested VEDAR on telemetry data from Cisco devices (routers) and found very satisfactory results. VEDAR is deployed in our prod and non-prod environments (data from 200+ application services deployed on 100+ servers) and has reduced a substantial amount of support team involvement in monitoring, as well as false alarms and alarm storms.

  • Zainab Bawa (@zainabbawa) Reviewer 4 days ago

    Hello Srinivasa,

    Bringing your proposal back into consideration.

    We have the final set of comments based on the proposal and the responses so far:

    1. The examples and the applications are more like time series analysis than streaming analytics. Most examples are where you have data points coming in at close to 5 min intervals, which can hardly be called streaming data. The fundamental difference is that in streaming analytics the assumption is that one can't store all the data, so the algorithms need to work with sublinear storage, and outlier detection has to happen in linear or sublinear time. In the paper, the authors mention that their intent is to work with non-stationary and dynamically changing data streams. The techniques that the authors use for such data, e.g. kernel density estimation and DBSCAN, have nonlinear computational complexity: DBSCAN has quadratic complexity, while some versions of kernel density estimation have exponential complexity. Since true streaming data isn't used (the acquisition rate is very low), the technique is able to process data using quadratic or exponential time algorithms. But these should not be mistaken for streaming algorithms.
    2. An algorithm needs to be accompanied by a thorough analysis of both the sample complexity and computational complexity. The authors give no such analysis.
    3. It appears that the authors have developed a technique which seems to work well for time series data. However it is an amalgamation of very well-known algorithms and it is best termed a technique for a practical use case rather than as a fundamental algorithm.

    This will be an interesting session. The authors need to:

    1. Modify the abstract to mention this is a technique for analyzing time series data (instead of streaming data).
    2. Refer to the work as a technique rather than an algorithm. Modify the deck accordingly. This is to prevent setting the wrong expectations with the audience.
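The sublinear-storage constraint the reviewer describes has a classic textbook illustration (unrelated to VEDAR's internals): reservoir sampling, which maintains a fixed-size uniform sample of an unbounded stream in O(k) memory and O(1) time per point.

```python
import random

# Classic streaming-algorithm illustration (not part of VEDAR):
# reservoir sampling keeps a uniform random sample of size k from a
# stream of unknown length, in O(k) memory and O(1) time per point.
def reservoir_sample(stream, k, seed=0):
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)  # fill the reservoir first
        else:
            # Replace an existing element with probability k / (i + 1),
            # which keeps the sample uniform over all points seen so far.
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = x
    return sample

picked = reservoir_sample(range(10_000), 5)
```

This is the storage regime (constant, independent of stream length) against which the reviewer contrasts DBSCAN's and KDE's superlinear costs.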
  • Srinivasa Rao Aravilli (@aravilli) Proposer 4 days ago

    Agreed. It is not one single algorithm; it is a framework consisting of different layers, where the algorithms in each layer can be configured based on the use case, or the defaults for each layer can be used directly. The main challenge in behavioural change detection systems is to find robust detection that minimizes false positives and elucidates the behaviour changes in real time, similar to human perception. The paper illustrates a novel framework called Accountable Behaviour Change Detection (VEDAR), which illustrates behaviour changes in real time using time series data. The VEDAR framework consists of 6 layers. Each layer has a unique objective to enrich the time series data for detecting behaviour changes. To achieve each layer's objective, best-in-class algorithms are chosen and fine-tuned to work on time series data. The functionality of all these layers is explained in greater depth in section 3 of the paper.
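The layered, configurable design described above could be sketched roughly as follows. The layer names and the per-layer steps here are purely illustrative placeholders, not VEDAR's actual six layers, which are defined in section 3 of the paper:

```python
# Hypothetical sketch of a layered, configurable detection pipeline.
# Layer names and default steps are illustrative, not VEDAR's layers.
class Pipeline:
    def __init__(self, layers):
        self.layers = layers  # ordered list of (name, callable) pairs

    def process(self, point):
        # Each layer enriches or transforms the point for the next one.
        for name, step in self.layers:
            point = step(point)
        return point

def denoise(x):
    return round(x, 1)  # placeholder smoothing step

def score(x):
    # Placeholder scoring step with a fixed illustrative threshold.
    return {"value": x, "anomaly": abs(x) > 10}

# Layers can be swapped per use case, or the defaults used directly.
pipeline = Pipeline([("denoise", denoise), ("score", score)])
result = pipeline.process(12.34)
```

The point of the structure is the one made in the comment: each layer's algorithm is a pluggable choice, so the whole is a framework rather than a single algorithm.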
