A quick how-to on capacity planning for an application deployed in AWS and how to use this information for configuring AWS autoscaling policies
Understanding the capacity limits of an application is critical to ensuring that SLAs are consistently met.
This how-to talk breaks the capacity planning process down into three steps that use standard, simple tools. It also touches on how the learnings from capacity planning can feed into the setup of AWS autoscaling policies.
Capacity planning involves the three main steps below:
a) Coming up with the load pattern for a single host: While it is useful to benchmark key APIs individually and track regressions in these KPIs release over release, capacity predictions are more accurate when they are based on production traffic patterns. Dashboards in New Relic provide a clear, real-time window into the most-used APIs, and this data, coupled with Splunk filters, provides the peak incoming request count for each API. Based on the total AWS instance count, the production load per AWS instance can be derived and then simulated in the performance load scripts.
b) Preparing the load testing scripts and running the tests in the Perf environment: JMeter is the tool of choice for creating and executing load testing scripts. For the predictions to be reliable, the tests must run in a (scaled-down) performance environment whose server size matches that of the production boxes, and the tests must run from the same subnet. Care must be taken to ensure that dependent downstream environments are also performance environments. Any caching optimisations must be identified and called out. Load tests should start at the current load and be scaled up incrementally to as much as 5X or 10X of the current load.
c) Analysing and extrapolating the results to determine the capacity and autoscaling policies: The KPIs for analysis are the client-side and server-side response times, TP90, CPU and memory consumption, and Apdex scores. This KPI data can be used to identify the load at which application SLAs are met, and can be extrapolated to determine the loads that can be optimally processed in Production. Also, if peak traffic analysis shows a recurring, predictable spike in usage during a time window, autoscaling policies can be configured in AWS to provision AWS instances on demand, so as to optimise operating costs.
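As a rough illustration of step a), the per-instance load and the traffic mix could be derived along these lines. This is a minimal sketch: the API paths, peak counts and instance count are hypothetical, and it assumes the load balancer spreads traffic evenly across instances.

```python
def per_instance_load(peak_requests_per_min, instance_count):
    """Split fleet-wide peak request rates evenly across instances.

    Assumes traffic is distributed evenly by the load balancer.
    """
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return {api: count / instance_count
            for api, count in peak_requests_per_min.items()}


def traffic_mix(peak_requests_per_min):
    """Return each API's share of total peak traffic (fractions sum to 1)."""
    total = sum(peak_requests_per_min.values())
    return {api: count / total for api, count in peak_requests_per_min.items()}


# Hypothetical peak requests per minute per API, as might be read
# off monitoring dashboards and log queries.
peaks = {"/login": 1200, "/invoices": 3000, "/reports": 600}

load = per_instance_load(peaks, instance_count=12)
mix = traffic_mix(peaks)

for api in peaks:
    print(f"{api}: {load[api]:.0f} req/min per instance, {mix[api]:.1%} of traffic")
```

The per-instance rates give the baseline (1X) load for the test scripts, and the mix indicates how requests might be weighted across APIs within a script.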
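For step b), the incremental ramp could be planned up front. The sketch below is an assumption-laden example rather than a prescribed method: the multipliers, baseline rate, response time and think time are all hypothetical, and the thread estimate uses Little's Law (concurrency = throughput x time per request), which is only a starting point because response times usually rise under load.

```python
import math


def load_step_plan(baseline_rps, avg_response_s, think_time_s,
                   multipliers=(1, 2, 3, 5, 10)):
    """Build (multiplier, target req/s, estimated threads) for each step.

    Threads are estimated with Little's Law: each simulated user issues
    one request every (response time + think time) seconds.
    """
    plan = []
    for m in multipliers:
        target_rps = baseline_rps * m
        threads = math.ceil(target_rps * (avg_response_s + think_time_s))
        plan.append((m, target_rps, threads))
    return plan


# Hypothetical baseline: 50 req/s per instance, 250 ms average response
# time, 750 ms think time between requests.
plan = load_step_plan(baseline_rps=50, avg_response_s=0.25, think_time_s=0.75)

for multiplier, rps, threads in plan:
    print(f"{multiplier}X: target {rps} req/s, about {threads} threads")
```

Each row could map to one test run, with the thread count adjusted after observing the actual response times at the previous step.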
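For step c), one way to turn the test results into a capacity figure is sketched below. The sample data, SLA thresholds and the 25% headroom are hypothetical. TP90 is computed with the nearest-rank method, and Apdex uses the standard formula (satisfied + tolerating / 2) / total, where tolerating means a response time between T and 4T.

```python
import math


def tp90(samples_ms):
    """90th percentile of response times (nearest-rank method)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(len(ordered) * 9 / 10)
    return ordered[rank - 1]


def apdex(samples_ms, threshold_ms):
    """Apdex score for a target threshold T given in milliseconds."""
    satisfied = sum(1 for s in samples_ms if s <= threshold_ms)
    tolerating = sum(1 for s in samples_ms
                     if threshold_ms < s <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(samples_ms)


def max_load_within_sla(results, tp90_sla_ms, apdex_t_ms, min_apdex):
    """Highest tested load that meets the SLA, stopping at the first breach.

    results maps a load level (req/s per instance) to response times in ms.
    """
    best = None
    for load in sorted(results):
        samples = results[load]
        if tp90(samples) <= tp90_sla_ms and apdex(samples, apdex_t_ms) >= min_apdex:
            best = load
        else:
            break
    return best


def instances_needed(peak_rps, capacity_per_instance_rps, headroom=0.25):
    """Instances required for a peak load, with a safety margin (assumed)."""
    return math.ceil(peak_rps * (1 + headroom) / capacity_per_instance_rps)


# Hypothetical response-time samples (ms) at three load levels.
results = {
    100: [100] * 9 + [300],
    200: [150] * 8 + [400, 500],
    300: [300] * 5 + [900] * 5,
}

capacity = max_load_within_sla(results, tp90_sla_ms=450,
                               apdex_t_ms=200, min_apdex=0.85)
fleet = instances_needed(peak_rps=1500, capacity_per_instance_rps=capacity)
print(f"Per-instance capacity: {capacity} req/s, instances at peak: {fleet}")
```

For a recurring, predictable spike, the resulting instance count could serve as the desired capacity of a scheduled scaling action on the Auto Scaling group (for example via boto3's `put_scheduled_update_group_action`), with a lower count outside the peak window.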
Laxmi Nagarajan is a Staff Software Engineer in Quality at Intuit, Inc. She has helped drive Quality upstream in the development cycle for SaaS applications built at Adobe, PayPal and startups in the Bay Area, and more recently at Intuit, IDC.