Optimizing Storage Costs with Automatic Tiering

Author: Parminder Singh

Storage costs can quickly add up as data volumes grow. Automatic tiering helps control these costs by moving data between storage tiers based on its access patterns and business requirements. In multi-cloud environments, tiering matters even more because it lets you use the best storage options across different cloud providers. In this article, I'll discuss building a solution around automatic tiering using MinIO as the storage backend.

Photo by Ries Bosch on Unsplash

Terminology

Before diving into the details, let's define some key terms.

RTO (Recovery Time Objective): The maximum acceptable time to restore data after an outage or failure. For example, data needs to be restored within 4 hours.

RPO (Recovery Point Objective): The maximum acceptable amount of data loss in case of a disaster. For example, we can afford to lose up to 4 hours of data.

SDS (Software-Defined Storage): Using abstraction and software to manage storage resources, decoupled from the underlying hardware.

Object Storage: A storage architecture that manages data as objects, each with its own unique identifier and metadata.

Tiering: The process of moving data between different storage tiers based on predefined criteria like access frequency, age, or cost.

Tiered Storage

Cloud providers offer various storage classes (tiers) with different performance characteristics and costs. E.g., Amazon S3 has Standard, Glacier Instant Retrieval, Glacier Flexible Retrieval, and more. Azure Blob Storage has Hot, Cool, and Archive tiers. Costs for these tiers vary based on factors like data access frequency, retrieval times, and durability.

Each cloud provider also offers automatic tiering for its own storage services. However, a custom solution becomes necessary in the following scenarios:

  1. Multi-Cloud Environments: Some of the challenges include diverse APIs, different storage classes, data mobility (egress costs, transfer speeds), etc. A custom solution can provide a unified interface abstracting underlying complexities.
  2. Custom Policies: You may have specific data management requirements that aren't covered by cloud provider tiering policies. E.g., compliance regulations, rules based on data sensitivity, etc.
  3. Custom Workflows: Integrating tiering with existing workflows and applications may require a more flexible solution.
  4. Managed Service Providers: MSPs managing storage for multiple clients need a centralized tiering solution.
  5. Deduplication, Compression & Encryption: Custom tiering solutions can incorporate additional data management features.

Example Use Case

Let's take a sample scenario where you're backing up a medical provider's testing data (reports, X-rays, MRIs, etc.). The data has the following requirements:

  • Frequent access (multiple times per day) for the first 7 days for operational recovery and testing.
  • Infrequent access (1-2 times per month) for the next 3 months for compliance and auditing.
  • Rare access (once or twice a year) for long-term archival.
  • RPO of 24 hours (daily backups).
  • RTO of 4 hours for operational recovery.
  • RTO of 24 hours for compliance and auditing.

Custom Tiering Solution

Here's how we can design a custom tiering solution to optimize storage costs:

  • Tier 1 (Hot): Store the last 7 days of backups on high-performance storage (e.g., SSD-backed or S3 Standard) to meet the 4-hour RTO for operational recovery.
  • Tier 2 (Warm): Transition backups older than 7 days to a lower-cost, readily accessible tier (e.g., S3 Glacier Instant Retrieval) for compliance and auditing needs.
  • Tier 3 (Cold): Move backups older than 3 months to the cheapest archival tier (e.g., S3 Glacier Flexible Retrieval or Deep Archive) for long-term retention.
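
The three tiers above reduce to a simple age-based rule. The following is a minimal sketch, assuming "3 months" is approximated as 90 days; the function name `tier_for_age` and the tier labels are illustrative and not part of any library.

```python
from datetime import date

HOT_DAYS = 7    # from the use case: frequent access for the first 7 days
WARM_DAYS = 90  # assumption: "3 months" approximated as 90 days


def tier_for_age(backup_date: date, today: date) -> str:
    """Return the tier label ("hot", "warm" or "cold") for a backup."""
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    if age_days <= HOT_DAYS:
        return "hot"   # Tier 1: last 7 days of backups
    if age_days <= WARM_DAYS:
        return "warm"  # Tier 2: older than 7 days, up to about 3 months
    return "cold"      # Tier 3: older than about 3 months


if __name__ == "__main__":
    print(tier_for_age(date(2024, 1, 1), date(2024, 1, 5)))
```

A real policy would likely also consider access frequency and compliance tags, not age alone.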

Back of the Envelope Calculation

The following table assumes 10 GB of data with a weekly access pattern. The custom solution assumes tiering data across S3 Standard, Glacier Instant Retrieval, and Glacier Flexible Retrieval based on access frequency and age.

| Storage Tier | Description | Storage Cost (per month) | Retrieval Cost (per week) | Total Monthly Cost |
|---|---|---|---|---|
| S3 Standard | Frequent access | $0.23 | $0.00 | $0.23 |
| S3 Glacier Instant Retrieval | Infrequent access | $0.004 | $0.00 | $0.004 |
| S3 Glacier Flexible Retrieval | Rare access | $0.00099 | $0.025 per GB retrieved (estimated) | $0.00099 + $0.25 = $0.25099 |
| Custom Solution (MinIO) | Tiered across S3 Standard, Glacier Instant Retrieval, and Glacier Flexible Retrieval | $0.01 (estimated) | $0.005 (estimated) | $0.015 (estimated) |

Note that this is a simplified example; actual costs will vary with data volume, access patterns, and cloud provider pricing. Based on the table's estimates ($0.015 versus $0.23 per month), the custom solution saves around 93% compared to storing all data in S3 Standard.
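
The savings figure follows directly from the table's monthly totals. Here is a quick check that uses only the estimates quoted above; the function name `savings_percent` is illustrative.

```python
# Monthly totals from the table (10 GB of data, the article's estimates).
S3_STANDARD_MONTHLY = 0.23
CUSTOM_TIERED_MONTHLY = 0.015


def savings_percent(baseline: float, optimized: float) -> float:
    """Percentage saved by paying `optimized` instead of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100


savings = savings_percent(S3_STANDARD_MONTHLY, CUSTOM_TIERED_MONTHLY)
print(f"Estimated savings: {savings:.1f}%")
```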

Solution Design

Here's a logical overview of how an automated tiering solution can be implemented:

  1. Data Ingestion with Metadata: When uploading/ingesting data, tag it with relevant metadata, such as its RTO or access frequency requirements.
  2. Storage Backend Integration: Configure the system to interact with different storage backends (AWS S3, Azure Blob Storage, Google Cloud Storage, etc.).
  3. Tiering Policy: Define rules that determine when and how data should be moved between tiers. This might involve considering factors like data age, access patterns, and RTO.
  4. Automated Transition: Implement a mechanism to periodically evaluate data and automatically transition it to the appropriate tier based on the defined rules.
  5. Configure Underlying Engine: Translate business constructs and requirements into the configuration of the underlying storage engine. E.g., how do you convert RPO and RTO into storage classes and tiers?
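
One way to approach the last step is to pick the cheapest storage class whose worst-case retrieval time still fits within the RTO. The following is a minimal sketch under stated assumptions: the retrieval times and the cheapest-first ordering are illustrative figures rather than authoritative provider numbers, and the names `StorageClass`, `class_for_rto`, and `restore_hours` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    worst_case_retrieval_hours: float  # assumed figure, check provider docs


# Ordered cheapest-first. Retrieval times are illustrative assumptions.
CLASSES_CHEAPEST_FIRST = [
    StorageClass("DEEP_ARCHIVE", 12.0),
    StorageClass("GLACIER_FLEXIBLE", 5.0),
    StorageClass("GLACIER_IR", 0.0),
    StorageClass("STANDARD", 0.0),
]


def class_for_rto(rto_hours: float, restore_hours: float = 0.0) -> str:
    """Pick the cheapest class whose retrieval time fits the RTO.

    restore_hours is the time reserved for transferring and restoring
    the data once it has been retrieved from the storage class.
    """
    budget = rto_hours - restore_hours
    if budget < 0:
        raise ValueError("restore_hours exceeds the RTO")
    for storage_class in CLASSES_CHEAPEST_FIRST:
        if storage_class.worst_case_retrieval_hours <= budget:
            return storage_class.name
    raise RuntimeError("no storage class satisfies the RTO")


if __name__ == "__main__":
    for rto in (4, 6, 24):
        print(rto, class_for_rto(rto))
```

RTO is only one input. Access frequency also matters because retrieval fees add up, which is why data that is read often may belong in a warmer tier than its RTO alone would suggest.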

MinIO

MinIO is an open-source, high-performance, S3-compatible object storage server. It's designed for cloud-native applications and provides a simple and scalable way to manage data. MinIO is just one of the many SDS options available for building a custom tiering solution; others include Ceph and OpenIO.

MinIO is easy to use, and you can set up a local MinIO server for testing by following the Quickstart Guide. Its tiering feature, called Object Transition, allows you to define lifecycle rules for automatically moving objects between different storage classes. MinIO runs a background process that automatically transitions objects based on the defined lifecycle rules.
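
As a rough illustration, a transition rule can be written in the S3 lifecycle configuration shape that S3-compatible clients send. The sketch below only builds and prints that structure, so it runs without a server. The rule ID, the bucket name `backups`, and the tier name `WARM_TIER` are hypothetical, and the tier would have to be created in MinIO first.

```python
import json


def transition_rule(rule_id: str, days: int, tier_name: str, prefix: str = "") -> dict:
    """Build one lifecycle rule that transitions objects after `days` days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # empty prefix matches every object
        "Transitions": [{"Days": days, "StorageClass": tier_name}],
    }


# Hypothetical rule: move objects to the remote tier WARM_TIER after 7 days.
lifecycle_config = {
    "Rules": [transition_rule("move-to-warm-after-7-days", 7, "WARM_TIER")]
}
print(json.dumps(lifecycle_config, indent=2))

# Applying the rule is left out on purpose. With boto3 pointed at a MinIO
# endpoint, the call would be along the lines of (an assumption, untested here):
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="backups", LifecycleConfiguration=lifecycle_config)
```

The same rule can also be created through the MinIO Console or CLI; the dictionary form is just convenient when rules are generated from business policies.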

All operations can be performed via APIs, CLI or the MinIO Console.

Some screenshots of the MinIO Console are shown below.

Screenshot: Bucket Listing
Screenshot: Tier Creation
Screenshot: Tier Listing (Custom Tiers)
Screenshot: Lifecycle Rules

Takeaway

Beyond cost optimization, automatic tiering simplifies data management, improves performance, and increases scalability. By leveraging solutions like MinIO, you can implement custom tiering strategies that span multiple cloud providers and handle complex data management requirements. If you need help designing and implementing a tiering solution for your organization, feel free to reach out.