What Is S3? How S3 Storage Works, Pros & Cons, Use Cases

Amazon S3 is the foundation of object storage for many cloud and hybrid environments. Our new article explains its architecture, storage classes, and how they apply to backup, analytics, and long-term retention.

The S3 protocol has become the de facto standard for object storage. While it originated with Amazon, “S3” now refers to two distinct things: the universal API specification used across the industry, and the specific service provided by AWS.

This distinction is important. Whether you are managing data on-premises or in the public cloud, S3 is likely how you will interact with it. This article covers the basics and architecture of the S3 protocol, how Amazon implements it, and the scenarios and use cases where S3 storage fits best.

The Difference: S3 Protocol vs. Amazon S3

To understand the ecosystem, you must separate the interface from the infrastructure.

The S3 Protocol is a REST-based API. It defines a small set of core operations (primarily PUT, GET, DELETE, and LIST) to manage data over HTTP. Because it is stateless and simple, it has been adopted by almost every storage vendor. If a platform claims “object storage support,” it usually means it speaks S3.
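As a rough illustration of how small that surface is, the sketch below maps the four core operations to an HTTP method and a path-style URL path. The function name is hypothetical, and the `list-type=2` query parameter follows Amazon's ListObjectsV2 convention; other implementations may accept different listing parameters.

```python
from urllib.parse import quote, urlencode

# Object-level operations map one-to-one onto HTTP methods.
OBJECT_METHODS = {"PUT": "PUT", "GET": "GET", "DELETE": "DELETE"}

def s3_request_line(operation, bucket, key="", prefix=""):
    """Return (HTTP method, path-style URL path) for a core S3 operation."""
    if operation in OBJECT_METHODS:
        # quote() leaves "/" unescaped, so the key stays readable in the path.
        return OBJECT_METHODS[operation], f"/{bucket}/{quote(key)}"
    if operation == "LIST":
        # Listing is a GET on the bucket itself, filtered by query parameters.
        query = urlencode({"list-type": "2", "prefix": prefix})
        return "GET", f"/{bucket}?{query}"
    raise ValueError(f"unsupported operation: {operation}")

print(s3_request_line("PUT", "backups", "servers/db.iso"))
print(s3_request_line("LIST", "backups", prefix="servers/"))
```

A real request also carries authentication headers and, for PUT, the payload; those are omitted here to keep the mapping visible.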

Amazon S3 (Simple Storage Service) is AWS’s proprietary implementation of the S3 protocol. It is a fully managed service that handles the physical complexity of storage (replication, hardware maintenance, and scaling) while exposing the standard S3 API to the user.

How does S3 storage work?

S3 is a key-value store. Unlike a file system, which manages data in a hierarchy of directories and sub-directories, S3 uses a flat structure.

Buckets: The top-level logical container. In a flat namespace, the bucket is the primary boundary for access policies, quotas, and replication rules.
Objects: The data itself (payload) combined with metadata.
Keys: The unique identifier for the object.

There are no folders in S3

S3 has no directories. If you upload a file to backups/servers/db.iso, the system does not create a folder named backups containing a folder named servers. The entire string backups/servers/db.iso is the Object Key. The “folders” visible in storage browsers are merely UI abstractions based on the / prefix.
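The "folder" view can be reproduced with nothing more than string operations on a flat set of keys. The sketch below is a simplified in-memory emulation of a prefix-and-delimiter listing; `list_keys` and the sample keys are hypothetical, and a real service adds pagination and other details.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Emulate an S3-style listing over a flat set of keys.

    Returns (objects, common_prefixes): the keys directly under the prefix,
    and the "folder" names a storage browser would display.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

bucket = {
    "backups/servers/db.iso",
    "backups/servers/web.iso",
    "backups/readme.txt",
    "logs/app.log",
}
print(list_keys(bucket))                      # top level: only "folders"
print(list_keys(bucket, prefix="backups/"))   # one object, one "folder"
```

Nothing named `backups` or `servers` is ever stored; the grouping is computed at listing time from the key strings.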

This flat architecture is a large part of why object storage scales to petabytes. The system never needs to traverse a file tree or lock a directory; it locates an object directly from its key, typically through a hash or index of the key string.

How do applications interact with S3?

Because S3 is API-driven, each operation is self-contained. A client request must include everything needed to complete that action: authentication, object name, and full payload when writing.

There is no persistent session and no open file handle. Every GET or PUT is an independent transaction.

This has several architectural consequences:

Per-operation latency matters less than aggregate throughput.
Parallelism is the normal way to achieve performance.
Small, frequent updates are inefficient compared to batching data into larger objects.
Retry logic is simple because each operation is atomic: a write either stores the whole object or nothing.

Modern backup software, for example, writes large sequential objects instead of many small files specifically to align with this behavior.
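The parallelism and retry points above can be sketched with a thread pool. `FlakyStore` is a hypothetical in-memory stand-in for an S3 endpoint that deliberately fails the first PUT of every key; the worker count and attempt limit are arbitrary illustrative values.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FlakyStore:
    """Hypothetical stand-in for an S3 endpoint; fails each key's first PUT."""
    def __init__(self):
        self.objects = {}
        self._seen = set()
        self._lock = threading.Lock()

    def put(self, key, body):
        with self._lock:
            first_attempt = key not in self._seen
            self._seen.add(key)
        if first_attempt:
            raise ConnectionError(f"simulated network drop on {key}")
        with self._lock:
            self.objects[key] = body

def put_with_retry(store, key, body, attempts=3):
    """Each PUT carries the full payload, so a retry is the same call again."""
    for attempt in range(1, attempts + 1):
        try:
            store.put(key, body)
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise

def upload_all(store, items, workers=8):
    """Issue independent PUTs concurrently; no request depends on another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {k: pool.submit(put_with_retry, store, k, v) for k, v in items.items()}
        return {k: f.result() for k, f in futures.items()}

store = FlakyStore()
items = {f"backups/chunk-{i:04d}": bytes([i]) * 1024 for i in range(16)}
attempts_used = upload_all(store, items)
print(len(store.objects), set(attempts_used.values()))
```

Because there is no session to re-establish and no partial state to clean up, the retry path needs no special handling beyond repeating the request.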

Why the S3 model scales differently from file or block storage

File and block storage are built around mutation. Data is opened, modified in place, locked, extended, truncated, and rewritten. That requires tight synchronization between clients and the storage system.

S3 avoids this entirely by treating objects as immutable units.

When an application writes data to S3, it creates a new object rather than modifying an existing one. If the content changes, the client writes a replacement object. The storage system does not need to coordinate partial updates or maintain write ordering across clients.

This design dramatically simplifies scaling because:

There is no need for distributed locking.
Metadata operations are predictable and isolated.
Data can be placed anywhere without maintaining structural relationships.
Failures do not interrupt in-progress modifications because objects are never edited.

The tradeoff is that data cannot be edited in place, which is why S3 is an excellent fit for backups, analytics datasets, logs, and media repositories – workloads where data is written once and read many times.
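A minimal in-memory model makes the whole-object replacement behavior concrete. `ObjectStore` and `append_line` are hypothetical names used only for illustration; the point is that the store offers no partial-update call, so an "append" becomes a client-side read, modify, and full rewrite.

```python
class ObjectStore:
    """Hypothetical in-memory model: objects are replaced whole, never edited."""
    def __init__(self):
        self._objects = {}

    def put(self, key, body: bytes):
        # A PUT swaps the entire value stored under the key.
        self._objects[key] = bytes(body)

    def get(self, key) -> bytes:
        return self._objects[key]

def append_line(store, key, line: bytes):
    """An 'append' is a client-side read-modify-write that creates a new object."""
    current = store.get(key)
    store.put(key, current + line)

store = ObjectStore()
store.put("logs/app.log", b"started\n")
before = store.get("logs/app.log")
append_line(store, "logs/app.log", b"stopped\n")
after = store.get("logs/app.log")
print(before, after)
```

Every such "append" re-sends the full object, which is why small, frequent updates are a poor match for this model.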

Amazon S3 storage classes

Amazon’s native S3 implementation offers multiple storage classes designed for different access patterns and cost profiles. The idea is not simply cheap storage – it is matching cost to how data is actually used. A storage class is assigned to each object when it is written, and lifecycle policies defined on the bucket can move data between tiers automatically as it ages.

Storage Class	Access	Latency	Availability	Cost Profile
S3 Standard	Frequent	Milliseconds	High	Highest storage cost
S3 Standard-IA	Infrequent	Milliseconds	High	Lower than Standard, retrieval fees
S3 One Zone-IA	Infrequent	Milliseconds	Lower	Lower than Standard-IA
S3 Glacier	Rare	Minutes to hours	Archival	Very low
Glacier Deep Archive	Very rare	Hours	Archival	Lowest

S3 Standard is designed for frequently accessed, latency-sensitive workloads: active application data, websites, analytics, content delivery. Data is stored redundantly across multiple systems within a region.

S3 Standard-Infrequent Access (Standard-IA) offers the same durability and millisecond access as Standard, but at a lower per-GB storage price. The trade-off is a per-GB retrieval fee and a 128 KB minimum object size charge. There is also a 30-day minimum storage duration – if you delete an object before 30 days, you still pay for the full 30 days. This makes Standard-IA a sweet spot for backup copies and data accessed a few times per month, but a bad choice for data you might delete or overwrite quickly.
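The two minimums are easy to misjudge, so here is a small worked calculation. It models only the storage charge, treats a month as 30 days for simplicity, and takes the price as a parameter because actual rates vary by region and over time; the price used in the example is a placeholder, not an AWS rate.

```python
MIN_BILLABLE_BYTES = 128 * 1024   # 128 KB minimum object size charge
MIN_BILLABLE_DAYS = 30            # 30-day minimum storage duration

def standard_ia_billable(size_bytes, days_stored):
    """Return (billable_bytes, billable_days) after applying both minimums."""
    return max(size_bytes, MIN_BILLABLE_BYTES), max(days_stored, MIN_BILLABLE_DAYS)

def storage_charge(size_bytes, days_stored, price_per_gb_month):
    """Storage-only charge; price_per_gb_month is a placeholder, not a real rate."""
    billable_bytes, billable_days = standard_ia_billable(size_bytes, days_stored)
    gigabytes = billable_bytes / 1024**3
    return gigabytes * (billable_days / 30) * price_per_gb_month

# A 16 KB object deleted after 5 days is billed as 128 KB for 30 days.
print(standard_ia_billable(16 * 1024, 5))
# A 1 GiB object kept for 60 days is billed for what it actually used.
print(storage_charge(1024**3, 60, price_per_gb_month=1.0))
```

Retrieval fees and request charges come on top of this and are not modeled here.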

S3 Glacier and S3 Glacier Deep Archive are designed for long-term retention and compliance. They offer the lowest storage costs in Amazon S3, but retrieval is not instant. Glacier retrievals range from minutes (Expedited) to hours (Bulk), and Deep Archive retrievals take up to 12 hours at the default retrieval tier. The minimum storage duration is 90 days for Glacier and 180 days for Deep Archive. These tiers are ideal for data you must keep but rarely touch – audit logs, legal holds, regulatory archives.

“S3-compatible” common storage classes and tiering

While Amazon popularized terms like “Standard,” “Infrequent Access,” and “Glacier,” the concept of Storage Classes is now part of the S3 vernacular. S3-compatible systems use these classes to map data to different underlying media types based on performance needs.

Hot / Standard: Data lands here by default. It assumes immediate access needs and is typically backed by NVMe or SSDs in on-prem systems, or high-availability clusters in the cloud.
Warm / Infrequent Access: For data accessed monthly (e.g., secondary backups). The storage is cheaper, but retrieval often carries a cost or slight latency penalty.
Cold / Archive: For data that must be kept for compliance but rarely read (e.g., “Glacier” tiers). In on-premises object storage, this might map to high-density HDDs or even tape libraries.

Lifecycle Policies automate data movement. A standard S3 lifecycle rule can transition an object from Hot to Cold storage after 30 days without breaking the application’s ability to reference the object.
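A lifecycle rule is ultimately a small declarative document. The sketch below builds one as a Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The rule ID, prefix, and helper name are hypothetical, and `GLACIER` is Amazon's storage class name; an S3-compatible system may expose different class names.

```python
import json

def transition_rule(rule_id, prefix, days, storage_class):
    """Build one lifecycle rule that moves matching objects after `days`."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},   # applies only to keys with this prefix
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }

# Hypothetical rule: move everything under backups/ to cold storage after 30 days.
lifecycle = {"Rules": [transition_rule("archive-backups", "backups/", 30, "GLACIER")]}
print(json.dumps(lifecycle, indent=2))
```

The object key does not change when the rule fires, which is why applications can keep referencing the same object after the transition.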

Getting started with Amazon S3

There are two main ways to work with Amazon S3, depending on scale and automation needs.

The AWS Management Console provides a web interface for creating buckets, selecting regions, and configuring settings like permissions, encryption, and storage classes. It is the fastest way to get oriented and works fine for small-scale uploads and experimentation.

For anything beyond manual testing, the AWS CLI and SDKs are the standard path. The CLI lets you script bucket creation, upload/download operations, lifecycle policies, and cross-region replication in a repeatable way. In production, S3 is almost always API-driven – integrated with backup software, CI/CD pipelines, disaster recovery platforms, and large-scale data transfer workflows.
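As a minimal sketch of the SDK path, the function below writes one object through any client that exposes a boto3-style `put_object` call. To keep the example runnable without credentials or network access, it is exercised here against a hypothetical `RecordingClient`; in real use the client would be `boto3.client("s3")`. The bucket and key names are placeholders.

```python
def upload_backup(client, bucket, key, body, storage_class="STANDARD_IA"):
    """Write one object through a client exposing a boto3-style put_object()."""
    return client.put_object(
        Bucket=bucket, Key=key, Body=body, StorageClass=storage_class
    )

class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting a service."""
    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}

client = RecordingClient()   # in real use: boto3.client("s3")
response = upload_backup(client, "example-bucket", "backups/servers/db.iso", b"payload")
print(response["ResponseMetadata"]["HTTPStatusCode"], client.calls[0]["StorageClass"])
```

Passing the client in as a parameter also makes it straightforward to point the same code at an S3-compatible endpoint instead of AWS.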

Critical Protocol Features

Beyond raw storage, three capabilities make the S3 protocol essential for modern operations:

1. Object Locking (Immutability). This is the standard for ransomware protection. S3 Object Lock allows you to set a “Retain Until” date on an object. Once set (in compliance mode, in Amazon’s implementation), the object cannot be deleted or overwritten by anyone, including the administrator, until that date passes. This creates a WORM (Write Once, Read Many) model that is natively understood by backup software like Veeam.
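The retain-until rule can be modeled in a few lines. `LockedStore` below is a hypothetical in-memory model of the semantics described above, not a real S3 client; in Amazon S3 the lock applies to individual object versions in a versioned bucket, which this sketch simplifies to one value per key.

```python
from datetime import datetime, timedelta, timezone

class LockedStore:
    """Hypothetical in-memory model of retain-until semantics."""
    def __init__(self):
        self._objects = {}   # key -> (body, retain_until or None)

    def put(self, key, body, retain_until=None, now=None):
        self._check_unlocked(key, now)
        self._objects[key] = (body, retain_until)

    def delete(self, key, now=None):
        self._check_unlocked(key, now)
        self._objects.pop(key, None)

    def has(self, key):
        return key in self._objects

    def _check_unlocked(self, key, now):
        now = now or datetime.now(timezone.utc)
        entry = self._objects.get(key)
        if entry and entry[1] and now < entry[1]:
            raise PermissionError(f"{key} is locked until {entry[1].isoformat()}")

t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
store = LockedStore()
store.put("backups/servers/db.iso", b"data", retain_until=t0 + timedelta(days=30), now=t0)

try:
    store.delete("backups/servers/db.iso", now=t0 + timedelta(days=10))
    outcome = "deleted"
except PermissionError:
    outcome = "blocked"

print(outcome, store.has("backups/servers/db.iso"))
store.delete("backups/servers/db.iso", now=t0 + timedelta(days=31))  # lock expired
```

The check lives in the store rather than the client, which is the property that matters for ransomware protection: compromised credentials cannot skip it.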

2. Multipart Upload. Uploading a 10 TB disk image in one stream is risky, since a single network drop could ruin the transfer. The S3 protocol handles this by breaking large files into chunks (parts) that can upload in parallel. If a part fails, the client retries only that specific chunk.
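The sketch below shows the two halves of that idea: planning the parts, then retrying only the part that failed. The 5 MiB minimum part size and 10,000-part ceiling are Amazon S3's limits and may differ on other implementations. `send_part` is a hypothetical caller-supplied function, and parts are sent sequentially here for clarity, whereas real clients usually send them in parallel.

```python
MIN_PART_SIZE = 5 * 1024**2   # Amazon S3 minimum for every part except the last
MAX_PARTS = 10_000            # Amazon S3 limit on parts per upload

def plan_parts(total_size, part_size):
    """Return [(part_number, offset, length), ...] covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts, offset = [], 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts

def upload_parts(parts, send_part, attempts=3):
    """Send each part, retrying only the part that failed."""
    for number, offset, length in parts:
        for attempt in range(1, attempts + 1):
            try:
                send_part(number, offset, length)
                break
            except ConnectionError:
                if attempt == attempts:
                    raise

MiB = 1024**2
parts = plan_parts(20 * MiB, 8 * MiB)
sent, failed_once = [], set()

def flaky_send(number, offset, length):
    """Hypothetical sender that drops part 2 on its first attempt."""
    sent.append(number)
    if number == 2 and number not in failed_once:
        failed_once.add(number)
        raise ConnectionError("simulated network drop")

upload_parts(parts, flaky_send)
print(parts)
print(sent)
```

Only part 2 is sent twice; parts 1 and 3 are unaffected by the failure.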

3. Event Notifications. S3 buckets are not passive. They can be configured to generate events whenever an object is created or deleted. In a cloud context, this might trigger a serverless function. In an on-prem context, it might trigger a webhook or a Kafka message. This removes the need for servers to constantly poll the storage (“Is the file there yet?”), enabling real-time automation pipelines.
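On the receiving side, a handler mostly unpacks a JSON payload. The sketch below assumes the record layout Amazon S3 uses for its notifications (a `Records` list with `eventName` and nested `s3.bucket.name` and `s3.object.key` fields, with the key URL-encoded); an on-prem system may deliver a differently shaped message. The function name and sample event are hypothetical.

```python
from urllib.parse import unquote_plus

def created_objects(event):
    """Yield (bucket, key) for each object-created record in an S3-style event."""
    for record in event.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            s3 = record["s3"]
            # Keys arrive URL-encoded in the notification payload.
            yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])

sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "backups/servers/db+2024.iso"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "backups/old.iso"},
            },
        },
    ]
}
print(list(created_objects(sample_event)))
```

The same function could sit inside a serverless handler or a webhook endpoint; only the transport that delivers `event` changes.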

Common S3 Use Cases

Disaster recovery. S3 is commonly used as a target for off-site copies of critical data. Its durability and on-demand scalability mean you do not need to overprovision local storage for DR. Data can be accessed through the standard S3 API during recovery, regardless of scale. Standard-IA is a popular choice here – lower storage cost, and the retrieval fees are acceptable when you are recovering from an actual disaster.

Backup and archival. Backup workloads map naturally to S3 because objects can be retained for long periods without modification. S3-compatible storage is frequently used for secondary backups, compliance retention, and long-term archives. For data you must keep for months or years but access rarely, Glacier and Glacier Deep Archive offer the lowest cost. Just make sure your retrieval time requirements match the tier you choose – Glacier Deep Archive is not the right pick if you need data back in minutes.

Data lakes and analytics. S3 is widely used as the foundation for big data and ML workflows. Storing large volumes of unstructured data and accessing it in parallel through the S3 API works well with analytics engines like Athena, Spark, and Presto. S3 Standard is the usual choice here since analytics workloads tend to read data frequently.

Static content and media hosting. S3 can store and serve static websites, media assets, and downloadable files directly over HTTP. Combined with CloudFront (AWS CDN), this offloads bandwidth and infrastructure management from your application servers.

Software distribution. Organizations use S3 to distribute packages, updates, and application binaries. The global accessibility of the S3 API and support for large objects make it well-suited for delivering software to distributed systems. The Standard storage class is the usual choice here because downloads need millisecond access with no retrieval fees.

What Can StarWind and DataCore Offer?

StarWind and DataCore offer S3 storage solutions for on-premises and hybrid environments.

StarWind Virtual Tape Library (VTL) modernizes traditional backup processes by combining familiar tape-based workflows with S3 storage. It uses S3-compatible object storage, such as Amazon S3, as the target for the disaster recovery copy, improving durability and supporting immutability and ransomware protection.

DataCore Swarm is a scale-out, S3-compatible object storage platform designed for large datasets and long-term retention. By exposing the S3 API, Swarm lets on-premises object storage integrate with the same applications and backup platforms that work with Amazon S3.

Summary

S3 storage has become widely adopted because it replaces infrastructure-specific behavior with a consistent, application-friendly model. By interacting through a simple API and treating data as immutable objects, systems can scale, automate, and protect information without depending on the characteristics of any particular hardware platform.
