Rootconf 2017

On service reliability

A quick how-to on capacity planning for an application deployed in AWS, and on how to use the results to configure AWS autoscaling policies

Submitted by Laxmi Nagarajan (@laxmi777) on Tuesday, 14 February 2017

Technical level

Beginner

Section

Crisp talk of 15 mins duration

Status

Confirmed & Scheduled

Abstract

Understanding the capacity limits of an application is critical to ensuring that SLAs are consistently met.
This how-to talk aims to break the process of capacity planning down into three steps that use standard, simple tools. It also covers how the findings from capacity planning can be channelled into setting up AWS autoscaling policies.

Outline

Capacity planning involves three main steps:
a) Coming up with the load pattern for a single host: While it is useful to benchmark key APIs individually and catch regressions in these KPIs from release to release, capacity predictions are more accurate when they are based on production traffic patterns. New Relic dashboards provide a clear, real-time window into the most-used APIs, and this data, combined with Splunk filters, gives the peak incoming request count for each API. Dividing these peak counts by the total number of AWS instances gives the production load per instance, which can then be simulated in the performance load scripts.
b) Preparing the load-testing scripts and running the tests in the Perf environment: JMeter is the tool of choice for creating and executing load-testing scripts. For the predictions to be reliable, the tests must run in a (scaled-down) performance environment whose servers match the size of the production boxes, and the tests must be run from the same subnet. Care must be taken to ensure that dependent downstream environments are also performance environments. Any caching optimisations must be identified and called out. Load tests should start at the current load and be scaled up incrementally to 5X/10X of the current load.
c) Analysing/extrapolating the results to determine the capacity and autoscaling policies: The KPIs for analysis are client- and server-side response times, TP90, CPU and memory consumption, and Apdex scores. This KPI data can be used to identify the load at which the application SLAs are still met, and extrapolated to determine the loads that can be processed optimally in production. In addition, if peak traffic analysis shows a recurring, predictable spike in usage during a particular time window, AWS autoscaling policies can be configured to provision AWS instances on demand, optimising operating costs.
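For step a), the arithmetic is simple enough to sketch. The Python snippet below is a minimal illustration, assuming peak request counts per API have already been exported (for example from a Splunk search) for a one-hour peak window, and that the load balancer spreads traffic evenly across instances. The API names, counts, instance count and ramp multipliers are all hypothetical. It derives the per-instance request rate for each API and the target rates for the incremental ramp in step b), converted to samples per minute because JMeter's Constant Throughput Timer takes its target in that unit.

```python
# Step a) sketch: derive a per-instance load profile from production peak counts.
# All names and numbers below are hypothetical, for illustration only.

PEAK_WINDOW_SECONDS = 3600  # assumed one-hour peak window
peak_counts = {             # assumed peak request counts per API (e.g. from Splunk)
    "GET /accounts": 720_000,
    "POST /invoices": 180_000,
    "GET /reports": 36_000,
}
INSTANCE_COUNT = 10         # assumed number of production instances


def per_instance_rps(counts, window_s, instances):
    """Average requests/sec per instance for each API, assuming even load balancing."""
    return {api: n / window_s / instances for api, n in counts.items()}


def ramp_plan(base_rps, multipliers=(1, 2, 3, 5, 10)):
    """Target per-API rates for each step of an incremental load test."""
    return {m: {api: rps * m for api, rps in base_rps.items()} for m in multipliers}


base = per_instance_rps(peak_counts, PEAK_WINDOW_SECONDS, INSTANCE_COUNT)
for mult, targets in ramp_plan(base).items():
    # Convert to samples per minute, the unit JMeter's Constant Throughput Timer uses.
    per_minute = {api: rps * 60 for api, rps in targets.items()}
    print(f"{mult}X:", per_minute)
```

With these made-up numbers, the 1X step works out to 20, 5 and 1 requests/sec per instance (1200, 300 and 60 samples per minute), and each later step simply multiplies those rates.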
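For step c), TP90 and Apdex can be computed directly from the raw response times of a test step (for example, from a JMeter results file). The sketch below is illustrative only: it uses the nearest-rank definition of the 90th percentile (tools differ slightly in how they interpolate) and the standard Apdex formula, in which samples at or below the threshold T are satisfied, samples above T and up to 4T are tolerating, and slower samples are frustrated. The sample response times and the 500 ms threshold are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile; pct=90 gives TP90."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def apdex(samples, threshold_ms):
    """Apdex = (satisfied + tolerating / 2) / total.

    Satisfied: t <= T; tolerating: T < t <= 4T; frustrated: t > 4T.
    """
    if not samples:
        raise ValueError("no samples")
    satisfied = sum(1 for t in samples if t <= threshold_ms)
    tolerating = sum(1 for t in samples if threshold_ms < t <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(samples)


# Hypothetical response times (ms) from one step of a load test.
response_ms = [120, 180, 210, 250, 300, 320, 400, 650, 900, 2500]
print("TP90 (ms):", percentile(response_ms, 90))
print("Apdex (T=500 ms):", apdex(response_ms, 500))
```

Running the same two functions over every ramp step gives a simple table of TP90 and Apdex against load, which is the input the capacity analysis needs.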
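A simple version of the capacity analysis in step c) can be sketched in the same way. Assuming stepped load-test results per instance (request rate, TP90 and average CPU), an SLA of 500 ms on TP90 and a CPU ceiling of 80% (all hypothetical here), the snippet below picks the highest tested load that met both limits, sizes the fleet for a normal peak and for a recurring spike, and builds the parameters for a scheduled scale-up. The dictionary keys follow boto3's `put_scheduled_update_group_action` for AWS Auto Scaling, but the group name, action name, schedule and sizing rule are made up, and the call itself is left out so the sketch runs without AWS credentials.

```python
import math

# Hypothetical stepped load-test results for a single instance:
# (requests/sec per instance, TP90 in ms, average CPU %). Not real data.
results = [
    (20, 180, 22),
    (40, 210, 38),
    (60, 260, 55),
    (100, 420, 78),
    (200, 1900, 97),
]
SLA_TP90_MS = 500   # assumed response-time SLA
CPU_LIMIT_PCT = 80  # assumed CPU ceiling


def max_sustainable_rps(steps, sla_ms, cpu_limit):
    """Highest tested per-instance load that met both the SLA and the CPU limit."""
    ok = [rps for rps, tp90, cpu in steps if tp90 <= sla_ms and cpu <= cpu_limit]
    if not ok:
        raise ValueError("no tested load met the SLA")
    return max(ok)


def instances_needed(peak_rps, per_instance_rps, target_utilisation=0.8):
    """Instances needed so that each runs at most at target_utilisation of capacity."""
    return math.ceil(peak_rps / (per_instance_rps * target_utilisation))


capacity = max_sustainable_rps(results, SLA_TP90_MS, CPU_LIMIT_PCT)
baseline = instances_needed(200, capacity)  # hypothetical normal peak: 200 req/s
spike = instances_needed(1200, capacity)    # hypothetical recurring spike: 1200 req/s

# Parameters shaped like boto3's autoscaling.put_scheduled_update_group_action;
# every value here is hypothetical.
scale_up_action = {
    "AutoScalingGroupName": "my-app-asg",
    "ScheduledActionName": "month-end-scale-up",
    "Recurrence": "0 8 28 * *",  # cron expression, evaluated in UTC unless TimeZone is set
    "MinSize": spike,
    "MaxSize": spike * 2,        # hypothetical ceiling leaving room for dynamic scaling
    "DesiredCapacity": spike,
}
print(capacity, baseline, spike)
print(scale_up_action)
```

Targeting 80% utilisation rather than the measured limit leaves headroom for traffic the tests did not model; a matching scheduled action sized with `baseline` would bring the group back down after the spike window.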

Speaker bio

Laxmi Nagarajan is a Staff Software Engineer in Quality at Intuit, Inc. She has helped drive Quality upstream in the development cycle for SaaS applications built at Adobe, PayPal and startups in the Bay Area, and more recently at Intuit IDC.

Slides

https://drive.google.com/open?id=0B6D4MkV1TbB-aGg5bmhHdWpYN00

Comments

    Zainab Bawa (@zainabbawa) Reviewer a year ago

    Thanks for this proposal, Laxmi. To complete the review, we need draft slides and a link to a self-recorded video explaining what this talk is about and why the audience should attend it. Please share this information by Wednesday, 22 Feb at the latest.

    Laxmi Nagarajan (@laxmi777) Proposer a year ago

    Thank you for the follow-up, Zainab. I will upload draft slides and a link to a video by 22nd Feb at the latest.

      Zainab Bawa (@zainabbawa) Reviewer a year ago

      Looking forward.

    saurabh hirani (@saurabh-hirani) a year ago

    Can you please open up the slides for general public access? None of the Rootconf proposal slides should be closed, as open slides help the audience understand your thought process.

      Laxmi Nagarajan (@laxmi777) Proposer a year ago

      The slides are set so that anyone with the link can view them. Could you please retry? Thank you.

        saurabh hirani (@saurabh-hirani) a year ago

        Thanks. I am able to access it.
