Rootconf 2017

On service reliability

Spotswap: running production APIs on spot instances

Submitted by Aruna Sankaranarayanan (@aruna28) on Sunday, 9 April 2017

Technical level

Intermediate

Section

Full talk of 40 mins duration

Status

Confirmed & Scheduled

Abstract

The Mapbox maps API runs on AWS’s cloud infrastructure across 6 origin regions globally, responding to a peak of ~1.5M requests/minute on the edge.

Since 2015, we’ve cut the cost of running our maps services on AWS EC2 while continuing to increase the scale of our web traffic. This talk traces our cost optimisation journey: why we decided to run production services on spot instances, and how our architecture evolved as a result.

Outline

The AWS EC2 spot market allows users to bid for unused EC2 capacity, which is sold at a lower price than on-demand capacity. On the spot market, you could pay as little as 10% of list price for an EC2 instance. The market price fluctuates constantly, driven by demand for particular instance types, among other factors. The one caveat to using spot instances is that if the market price rises above your bid price, you get only 2 minutes’ notice before the instance shuts down.
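As an illustration of the 2-minute notice, here is a minimal Python sketch of how an instance might work out how long it has left. It assumes the notice is read from the EC2 instance metadata path for spot termination time; the function name and the idea of passing in the HTTP status and body are hypothetical, chosen so the parsing can be shown without a network call. This is not taken from Spotswap itself.

```python
from datetime import datetime, timezone
from typing import Optional

# EC2 instance metadata path for the spot termination notice.
# Fetching it is left to the caller; only the parsing is sketched here.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def seconds_until_termination(status: int, body: str, now: datetime) -> Optional[float]:
    """Return the seconds left before shutdown, or None if there is no notice.

    Assumption: the endpoint answers with a non-200 status (normally 404)
    until the instance is marked for termination, and then returns a UTC
    timestamp such as 2017-04-09T10:02:00Z.
    """
    if status != 200:
        return None
    try:
        deadline = datetime.strptime(body.strip(), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        # A body that is not a timestamp is treated as "no notice".
        return None
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()
```

A process on the instance could poll the URL every few seconds and start draining connections as soon as this function returns a number instead of None.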

In order to run production instances on the spot market, we set up two AutoScaling groups, one running spot instances and another running fallback on-demand instances. When instances in the spot group showed signs of failing healthchecks, we scaled up the on-demand group, and scaled it down again once the spot group had stabilised. While the spot AutoScaling group provided the cost efficiency, the backup on-demand group kept our services robust and scalable.
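The fallback between the two groups can be sketched as a small decision function. This is only an illustration under stated assumptions, not Spotswap's actual logic: it assumes each unhealthy or missing spot instance is covered one-for-one by an on-demand instance, and the function and parameter names are hypothetical.

```python
def desired_on_demand_capacity(spot_healthy: int, spot_desired: int, on_demand_max: int) -> int:
    """Return how many on-demand instances should run to cover the spot shortfall.

    spot_healthy:  spot instances currently passing healthchecks
    spot_desired:  spot instances the service needs
    on_demand_max: upper bound on the fallback group's size
    """
    # Shortfall is zero when the spot group is at or above its desired size.
    shortfall = max(spot_desired - spot_healthy, 0)
    # Never ask for more than the fallback group is allowed to hold.
    return min(shortfall, on_demand_max)
```

With a healthy spot group the function returns 0, which scales the on-demand group back down; as spot instances fail healthchecks the returned capacity grows until it reaches the cap.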

When we migrated our stacks to ECS (AWS’s hosted Docker solution) last summer, we replaced the spot AutoScaling group from our earlier architecture with a spot fleet. Now we benefit not only from the cost savings of the spot market, but also from the additional diversity of a spot fleet, in which we are able to choose multiple EC2 instance types.
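To make the idea of instance-type diversity concrete, the following Python sketch builds a spot fleet request with one launch specification per instance type. The helper name and every value passed to it are hypothetical placeholders, and the "diversified" allocation strategy is an assumption made for illustration; the proposal does not say which settings Mapbox uses.

```python
from typing import Dict, List


def build_spot_fleet_config(
    fleet_role_arn: str,
    image_id: str,
    instance_types: List[str],
    max_price: str,
    target_capacity: int,
) -> Dict[str, object]:
    """Build a spot fleet request that spreads capacity over several instance types."""
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        # Assumption: spread instances across pools instead of picking the cheapest one.
        "AllocationStrategy": "diversified",
        # One launch specification per instance type, all sharing the same image and bid.
        "LaunchSpecifications": [
            {"ImageId": image_id, "InstanceType": itype, "SpotPrice": max_price}
            for itype in instance_types
        ],
    }
```

A dictionary of this shape is what boto3's `request_spot_fleet` call on the EC2 client accepts as `SpotFleetRequestConfig`. Listing several instance types means a price spike in one pool affects only part of the fleet.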

Detailed breakup

  • Reliable production services on spot instances
  • The need for spotswap
  • Spotswap 0.0.1
    • Bottlenecks
    • Ensuring stability of the architecture
    • Spotswap 0.0.2
  • Open sourcing spotswap

Speaker bio

I am a developer on the Platform team at Mapbox. Over the last year, I have worked on scaling our internal infrastructure and helped migrate our services to AWS ECS. This talk describes an elegant architectural concept that we devised at Mapbox to run our services in a cost-efficient manner. In other lives, I have written 2D games using the Cocos2D engine, submitted small activities to the GCompris project and contributed to the GNOME Foundation. I also enjoy collecting recordings of Indian classical music that are in the public domain.

Slides

https://www.dropbox.com/s/y8wv5oo8kggxj40/may-2017-rootconf-PDF.pdf?dl=0
