Optimizing costs of cloud infrastructures

Practical case studies from enterprises and startups

Shivam Beeyani

@bombdwarf

Data transfer cost optimization.

Submitted Jan 31, 2022

Need for Data Transfer Cost Visibility

Data transfer is vital to how an organization functions, and it becomes even more important when the organization is spread over multiple regions. Data transfer costs make up a large share of overall cloud costs for most large-scale AWS architectures.

Solutions Architects and DevOps engineers find it time-consuming and hard to reduce data transfer costs while optimizing cloud spending on AWS. The reason is inadequate and fragmented visibility into the source and destination of the resources or services that exchange traffic within or outside the cloud environment. Without that visibility we cannot pinpoint the root cause of the activities that drive high data transfer costs, and overall cloud costs keep surging. The same visibility would also help Security Engineers identify resources that use public IPs and transfer data publicly, so that they can find data leaks and fix security vulnerabilities.

Controlling and streamlining data transfer costs requires changes to the architecture. To make those changes, we need both context around data transfer pricing and granular visibility into the traffic exchanged between services or resources. Both are time-consuming and hard to build and maintain.

We set out to build a system that gives DevOps engineers pre-built, near real-time, granular visibility into data transfer costs, so that they do not have to spend time and effort building it themselves.
It should also help engineers reduce data transfer costs by providing insights and suggested actions, so that they don't have to keep reviewing data transfer pricing documentation, which in most cases is not very exciting work to do regularly.

Engineers usually start on this problem with cloud-native cost-control tools. At best, these tools report the infrastructure API operations that cause data transfer costs, but they say nothing about the traffic between the resources or services that invoke those operations.

From CUR to VPC flow logs

The AWS Cost and Usage Report (CUR) has many columns that help you understand your data transfer cost and see which resources contribute to it. One of them is the Operation column of each line item, which labels generic data transfer charges such as InterZone-In or VPCPeering-In. Filtering by Operation lets you find resources that might be generating unnecessary data transfer costs. What the CUR lacks is granular visibility into which resources and services are communicating with each other.
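As a rough illustration of filtering by Operation, the sketch below sums cost per resource from a CUR export in CSV form. The column names follow the usual CUR naming (`lineItem/Operation`, `lineItem/ResourceId`, `lineItem/UnblendedCost`), but the set of Operation values and the sample rows are assumptions made for this example, so check them against your own report.

```python
import csv
import io
from collections import defaultdict

# Operation values treated as data transfer here (an assumption for this
# sketch; verify the exact values that appear in your own report).
TRANSFER_OPERATIONS = frozenset(
    {"InterZone-In", "InterZone-Out", "VPCPeering-In", "VPCPeering-Out"}
)


def transfer_cost_by_resource(cur_file, operations=TRANSFER_OPERATIONS):
    """Sum the unblended cost per resource for the given Operation values."""
    totals = defaultdict(float)
    for row in csv.DictReader(cur_file):
        if row["lineItem/Operation"] in operations:
            cost = float(row["lineItem/UnblendedCost"] or 0)
            totals[row["lineItem/ResourceId"]] += cost
    return dict(totals)


# Made-up sample rows, not real billing data.
SAMPLE_CUR = """\
lineItem/ResourceId,lineItem/Operation,lineItem/UnblendedCost
i-0aaa,InterZone-In,0.50
i-0aaa,InterZone-Out,0.25
i-0bbb,VPCPeering-In,0.125
i-0ccc,RunInstances,3.00
"""

costs = transfer_cost_by_resource(io.StringIO(SAMPLE_CUR))
```

This tells you which resources carry transfer charges, but, as noted above, not which peers they were talking to.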

To obtain this visibility we can use VPC flow logs, which record the traffic flowing within an environment, between applications and services. These raw network logs give details such as the source and destination of each flow, the ports the resources communicate on, the total bytes and packets, the attached network interface, and more. On its own, though, this data does not tell us much.

Below is an example of VPC flow log records:

[image: example VPC flow log records]
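In the default (version 2) format, each flow log record is a single space-separated line with 14 fields. The following is a minimal parsing sketch for that default format only; custom formats would need a different field list. The `FlowRecord` class and `parse_flow_record` function are names invented for this example, and the sample lines use placeholder account and interface IDs.

```python
from dataclasses import dataclass
from typing import Optional

# Default format: version account-id interface-id srcaddr dstaddr srcport
# dstport protocol packets bytes start end action log-status
FIELD_COUNT = 14


@dataclass(frozen=True)
class FlowRecord:
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    num_bytes: int
    start: int
    end: int
    action: str


def parse_flow_record(line: str) -> Optional[FlowRecord]:
    """Parse one default-format record; return None for NODATA/SKIPDATA."""
    parts = line.split()
    if len(parts) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(parts)}")
    (_version, account_id, interface_id, srcaddr, dstaddr, srcport, dstport,
     protocol, packets, num_bytes, start, end, action, log_status) = parts
    if log_status != "OK":
        # Records without traffic data carry "-" in the numeric fields.
        return None
    return FlowRecord(
        account_id, interface_id, srcaddr, dstaddr,
        int(srcport), int(dstport), int(protocol),
        int(packets), int(num_bytes), int(start), int(end), action,
    )


SAMPLE = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
NODATA = ("2 123456789010 eni-1235b8ca123456789 - - - - - - - "
          "1431280876 1431280934 - NODATA")

record = parse_flow_record(SAMPLE)
```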

VPC flow logs can be enabled for a VPC, a subnet, or a network interface, including network interfaces that AWS services create on your behalf. They act as a central source of truth for analysing the data transfers within an environment. The volume of flow logs generated is very large, because they cover all network traffic that happens within the environment.

We use a 3-step ETL (Extract, Transform, Load) process to analyze data transfer costs.

Prerequisites

We at Opslyft have an internal tool that polls AWS APIs and other services to collect the mapping of resources and services to their network interfaces, along with a list of IP mappings for all AWS global services. A separate API supplies the pricing data for data transfers.

Step 1. Extract

VPC flow logs are enabled on all VPCs across all regions and child accounts. The flow logs are delivered to a central S3 bucket in either hourly or daily partitions. Alongside the flow logs, this step also involves our internal tool, which polls AWS APIs and other services to collect the mapping of IP addresses to resources. The mapping is tracked as a function of time (i.e. which resource was using a given IP address at a given time).
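One way to picture a time-aware IP mapping is a per-IP timeline of observations that can be searched by timestamp. The sketch below is not the internal tool described above; the `IpResourceTimeline` class and its methods are hypothetical, and it assumes each observed owner holds the IP until the next observation for that IP.

```python
import bisect
from collections import defaultdict


class IpResourceTimeline:
    """Hypothetical lookup: which resource held an IP at a given time."""

    def __init__(self):
        self._starts = defaultdict(list)  # ip -> sorted observation times
        self._owners = defaultdict(list)  # ip -> resource ids, same order

    def record(self, ip, resource_id, seen_at):
        """Store that resource_id was seen using ip at time seen_at."""
        idx = bisect.bisect_right(self._starts[ip], seen_at)
        self._starts[ip].insert(idx, seen_at)
        self._owners[ip].insert(idx, resource_id)

    def resource_at(self, ip, timestamp):
        """Return the latest owner observed at or before timestamp."""
        starts = self._starts.get(ip)
        if not starts:
            return None
        idx = bisect.bisect_right(starts, timestamp) - 1
        return self._owners[ip][idx] if idx >= 0 else None


timeline = IpResourceTimeline()
# Made-up observations: the same private IP reused by two instances.
timeline.record("10.0.1.5", "i-0bbb", 2000)
timeline.record("10.0.1.5", "i-0aaa", 1000)
```

Looking up by the flow record's start time rather than "now" matters because private IPs are reused as instances come and go.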

Step 2. Transform

A Lambda function is triggered on an hourly basis once the flow logs start being delivered to the central S3 bucket. The function reads the flow logs and enriches them with the resource metadata from the IP-to-resource mapping. This step also has to account for the many conditions that make analysing flow logs challenging. For example:

Assume that instance A is communicating with instance B. The flow logs are generated by the network interfaces attached to these instances, so each network interface logs 2 flow records, one outbound and one inbound, resulting in 4 flow log records per data transfer. The complexity increases when a NAT gateway sits between the two, resulting in 12 records for one transaction.
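To make the double counting concrete, here is one possible rule for the simple two-instance case: keep only the records logged by the interface that owns the source address, so each direction of the exchange is counted once. This is an illustrative sketch, not necessarily the rule used in the pipeline above, and it does not handle the NAT gateway case, where address translation changes the addresses seen on each interface. The `eni_ips` mapping and all IDs are made up.

```python
def keep_sender_side(records, eni_ips):
    """Keep records whose source address belongs to the logging interface.

    records: dicts with "interface_id", "srcaddr", "dstaddr", "bytes".
    eni_ips: hypothetical mapping of interface id -> set of its IPs.
    """
    return [
        r for r in records
        if r["srcaddr"] in eni_ips.get(r["interface_id"], set())
    ]


# Instance A (eni-a, 10.0.1.5) sends a request to B (eni-b, 10.0.2.7)
# and receives a response. Both interfaces log both directions.
ENI_IPS = {"eni-a": {"10.0.1.5"}, "eni-b": {"10.0.2.7"}}
RECORDS = [
    {"interface_id": "eni-a", "srcaddr": "10.0.1.5", "dstaddr": "10.0.2.7", "bytes": 500},
    {"interface_id": "eni-a", "srcaddr": "10.0.2.7", "dstaddr": "10.0.1.5", "bytes": 9000},
    {"interface_id": "eni-b", "srcaddr": "10.0.1.5", "dstaddr": "10.0.2.7", "bytes": 500},
    {"interface_id": "eni-b", "srcaddr": "10.0.2.7", "dstaddr": "10.0.1.5", "bytes": 9000},
]

deduplicated = keep_sender_side(RECORDS, ENI_IPS)
total_bytes = sum(r["bytes"] for r in deduplicated)
```

Summing the raw records would report 19000 bytes for this exchange; after filtering, the 4 records collapse to 2 and the total matches what was actually sent.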

image

Step 3. Load

The enriched flow logs are then stored in S3. They are further aggregated for use by the dashboard, grouped by account, service, region, AZ and traffic flow direction. This output is stored in an RDS database, where a separate Lambda function populates the aggregations with the transfer cost incurred, using the AWS pricing API. This in turn powers our dashboards, whose different views help you understand and plan the next steps on your path to reducing data transfer costs.
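The aggregation and costing described above can be sketched roughly as follows. The record keys, the direction labels and the per-GB rates are all placeholders chosen for this example. Real rates would come from the pricing data, and the sketch assumes 1 GB = 1024^3 bytes.

```python
from collections import defaultdict

BYTES_PER_GB = 1024 ** 3  # assumption for this sketch
GROUP_KEYS = ("account", "service", "region", "az", "direction")


def aggregate_bytes(records):
    """Sum bytes per (account, service, region, az, direction) group."""
    totals = defaultdict(int)
    for record in records:
        totals[tuple(record[key] for key in GROUP_KEYS)] += record["bytes"]
    return dict(totals)


def attach_cost(totals, rate_per_gb):
    """Turn grouped byte totals into rows with an estimated cost."""
    rows = []
    for group, num_bytes in totals.items():
        row = dict(zip(GROUP_KEYS, group))
        row["bytes"] = num_bytes
        # Unknown directions are priced at 0 rather than guessed.
        rate = rate_per_gb.get(row["direction"], 0.0)
        row["cost_usd"] = num_bytes / BYTES_PER_GB * rate
        rows.append(row)
    return rows


# Placeholder rates in USD per GB, not actual AWS prices.
RATES = {"inter-az": 0.01, "internet-out": 0.09}

ENRICHED = [
    {"account": "111111111111", "service": "ec2", "region": "us-east-1",
     "az": "us-east-1a", "direction": "inter-az", "bytes": BYTES_PER_GB},
    {"account": "111111111111", "service": "ec2", "region": "us-east-1",
     "az": "us-east-1a", "direction": "inter-az", "bytes": BYTES_PER_GB},
    {"account": "111111111111", "service": "rds", "region": "us-east-1",
     "az": "us-east-1b", "direction": "internet-out", "bytes": BYTES_PER_GB // 2},
]

rows = attach_cost(aggregate_bytes(ENRICHED), RATES)
```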

Analyzing the data

From the dashboards below, we get both a holistic and a granular view of the data transfer operations in our AWS VPCs. We also have an aggregation browser for manually browsing, filtering and grouping the flow logs to better understand the data flow within our VPCs.

The above aggregations give us an overview of the data flow and the costs we have incurred. They give clear visibility into where our transfer costs come from, and can be drilled down further to find the source and destination regions or services of these transfers (as shown below).

The drilled-down view helps us figure out the changes we need to make to reduce our data transfer costs. For example, here we can see that a sizeable amount of data is being transferred from EC2 instances to the us-east-1 region through a NAT gateway. With this information we can decide whether to move these instances into a subnet that routes through an internet gateway. That would remove the NAT gateway's data processing charge, since an internet gateway itself has no hourly or data processing fee; standard data transfer out charges still apply.
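A back-of-the-envelope estimate helps decide whether such a move is worth the effort. The sketch below adds up the two NAT gateway charges that would go away: the per-GB data processing fee and the hourly running fee. The function name and the rates passed in are illustrative assumptions; look up current prices for your region before relying on the number.

```python
def nat_gateway_monthly_cost(gb_processed, hours, per_gb_rate, hourly_rate):
    """Estimate NAT gateway charges: data processing plus running hours."""
    return gb_processed * per_gb_rate + hours * hourly_rate


# Illustrative inputs: 1000 GB processed in a 730-hour month, with
# placeholder rates of 0.045 USD per GB and 0.045 USD per hour.
estimate = nat_gateway_monthly_cost(
    gb_processed=1000, hours=730, per_gb_rate=0.045, hourly_rate=0.045
)
```

With the drilled-down data supplying `gb_processed` per gateway, the same function can rank NAT gateways by how much their removal would save.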

Additionally, the aggregation browser helps us aggregate the data further with more filters and time ranges, from which we can plan actionable items to reduce our transfer costs.

Key Takeaways to help you reduce your data transfer costs

  1. This visibility has given us meaningful insight into which NAT gateways can be replaced with an internet gateway, reducing both data processing costs and the NAT gateway running cost.

  2. Use Amazon CloudFront to serve static content. Caching at edge locations reduces transfer costs and also gives you higher availability.

  3. Keep infrastructure components like RDS and Redshift in the same region to save on inter-region transfer costs. Transfer costs vary by source and destination region.

  4. This visibility allows us to monitor traffic between resources in different regions or multiple accounts, and to recommend VPC Peering or VPC Sharing to further optimize the data transfer cost.

  5. Avoid using public IP addresses for internal data transfers within the same Availability Zone. Intra-Availability Zone data transfers are free, provided you use private IP addresses.
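Several of these takeaways depend on knowing what kind of path a transfer takes. As a minimal sketch, the classifier below labels a transfer between two enriched endpoints by region, Availability Zone and IP type. The `Endpoint` class, the category names and the sample addresses are made up for this example, and the private/public check relies on Python's `ipaddress` module rather than on any AWS metadata.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    region: str
    az: str
    ip: str


def classify_transfer(src: Endpoint, dst: Endpoint) -> str:
    """Label a transfer by the boundary it crosses and the IP type used."""
    if src.region != dst.region:
        return "inter-region"
    if src.az != dst.az:
        return "inter-az"
    uses_public_ip = any(
        not ipaddress.ip_address(endpoint.ip).is_private
        for endpoint in (src, dst)
    )
    return "intra-az-public-ip" if uses_public_ip else "intra-az-private-ip"


a = Endpoint("us-east-1", "us-east-1a", "10.0.1.5")
b = Endpoint("us-east-1", "us-east-1a", "10.0.1.9")
b_public = Endpoint("us-east-1", "us-east-1a", "8.8.8.8")
c = Endpoint("us-east-1", "us-east-1b", "10.0.2.7")
d = Endpoint("eu-west-1", "eu-west-1a", "10.1.0.4")
```

Transfers labelled `intra-az-public-ip` are the ones takeaway 5 targets: switching them to private addresses moves them into the free category.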

Have a look at the slide deck here: https://docs.google.com/presentation/d/1dphLNKK6qBjKjv3K2AJzhhkA2kIAaduxerrPi8tSwOI/edit?usp=sharing
