Storage technology explained: What is S3 and what is it good for?

Wire Tech

1 week ago

We look at S3, AWS’s object storage protocol, which originated in its cloud services and has since become near enough a standard, spreading to third-party on-premise deployments

While on-premise object storage is a minority interest, relatively speaking, object storage in the cloud is huge. It is its natural home, and AWS’s S3 is the big beast that roams there.

While it’s hard to get a definitive figure, the most recent estimates from AWS (Amazon Web Services) put the number of objects stored in S3 at close to half a quadrillion – it reached a trillion in 2012 – and volume-wise that’s many exabytes of data.

So, here we give an overview of S3: what it is, how it works, what classes of storage it provides, the use cases it is good for, and the on-prem options spawned by the ascension of S3 to de facto standard status.

What is S3 storage?

S3 stands for Simple Storage Service, which is part of the AWS public cloud. It is object storage and arose as the most basic storage building block of AWS’s cloud services.

It has also become a de facto standard, with S3-based object storage products available from vendors that target customers’ on-site deployments. It is available in a wide variety of service level-based offerings from AWS and other cloud suppliers, as well as from storage array and storage software makers.

S3 storage: What’s under the hood?

Any type of data can be stored in S3, including documents, video and images, although it may not suit some application use cases, such as databases.

Objects are stored with a unique identifier. This is what distinguishes object storage from traditional file and block storage. There’s no file system hierarchy. Under the covers, object storage data can be in any location, with its unique ID pointing to it.

S3 data also has metadata, some of which is system-generated and comprises object management-related variables such as datestamps, service levels, size, content type, encryption, versioning, zone, and upload information. Meanwhile, customers and users can set metadata for storage and data management purposes that might include data classification-relevant details and user activity.
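
To make the flat, ID-addressed model concrete, here is a minimal in-memory sketch in Python. It is not how S3 is implemented; the class and field names (ToyObjectStore, StoredObject, system_metadata, user_metadata) are hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredObject:
    data: bytes
    system_metadata: dict  # generated by the store: size, type, timestamp
    user_metadata: dict = field(default_factory=dict)  # supplied by the caller


class ToyObjectStore:
    """Hypothetical store: one flat dict keyed by object ID, no directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, content_type="application/octet-stream",
            user_metadata=None):
        # The store, not the caller, fills in the system metadata.
        self._objects[key] = StoredObject(
            data=data,
            system_metadata={
                "size": len(data),
                "content_type": content_type,
                "last_modified": datetime.now(timezone.utc),
            },
            user_metadata=dict(user_metadata or {}),
        )

    def get(self, key):
        # Lookup is by the full key only; slashes in the key carry no meaning.
        return self._objects[key]


store = ToyObjectStore()
store.put("reports/2024/q1.pdf", b"%PDF-", "application/pdf",
          {"classification": "internal"})
obj = store.get("reports/2024/q1.pdf")
print(obj.system_metadata["size"], obj.user_metadata)
```

The key looks like a path, but the store treats it as an opaque string, which is the main difference from a file system.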

A single upload through the S3 Console maxes out at 160GB, and a single PUT request via the API at 5GB. But objects can be as big as 5TB when uploaded in a multi-part structure – up to 10,000 parts – via the command line or API.
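
The arithmetic behind the multi-part limits can be sketched as follows. The 5TB ceiling and the 10,000-part cap come from the text; reading them as binary units, the 5MiB minimum part size and the 8MiB preferred part size are assumptions made for illustration, and plan_multipart is a hypothetical helper, not an AWS function.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_PARTS = 10_000          # from the text
MAX_OBJECT_SIZE = 5 * TIB   # from the text, read as binary units (assumption)
MIN_PART_SIZE = 5 * MIB     # assumed lower bound for all parts except the last


def plan_multipart(object_size, preferred_part_size=8 * MIB):
    """Return (part_size, part_count) that keeps an upload within MAX_PARTS."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size out of range")
    # Smallest part size that still fits the object into MAX_PARTS parts
    # (integer ceiling division, no floating point involved).
    smallest_allowed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(preferred_part_size, MIN_PART_SIZE, smallest_allowed)
    part_count = (object_size + part_size - 1) // part_size
    return part_size, part_count


small = plan_multipart(100 * MIB)   # preferred part size is enough here
largest = plan_multipart(5 * TIB)   # part size must grow to respect the cap
print(small, largest)
```

The point of the sketch is that the part cap, not the object size limit, dictates how large each part has to be for very big objects.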

What is the structure of S3 object storage?

S3 objects are stored in buckets. These are a fundamental of S3 storage and their creation is specific to Amazon regions, which may bring with them particular cost, availability and regulatory characteristics.

Customers create buckets and control access to buckets, create lifecycle rules for objects in buckets, track costs, manage replication, track access requests, use object locking and receive alerts, among other things.

Management of buckets and the objects within them comes via the S3 Console (if you’re using it in AWS). Here, you can use the Console GUI to upload, download, search for and manage objects.

There are also folders in S3, but they are more like a label to group objects and not a fundamental of the way it works, as buckets are. A folder is in effect a shared prefix in the names of the objects it groups, and it does not exist as a separate entity in the S3 API, for example.
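
A folder view of this kind can be derived purely from object names. The sketch below is an illustration only: list_folder is a hypothetical helper, and the use of "/" as the separator is an assumption about how names are written, not something the store itself enforces.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Split flat object names into 'files' and 'subfolders' under a prefix."""
    files, subfolders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so show it as a subfolder.
            subfolders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return files, sorted(subfolders)


keys = ["logs/2024/jan.txt", "logs/2024/feb.txt", "logs/readme.txt", "photo.jpg"]
top_level = list_folder(keys)
inside_logs = list_folder(keys, prefix="logs/")
print(top_level, inside_logs)
```

Nothing is stored for the "folders" themselves; they appear and disappear with the objects whose names share the prefix.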

What commands does S3 use?

S3 storage is based around core HTTP methods or verbs that include GET, PUT and DELETE, and is accessed via the AWS browser GUI, the command line, and via API. Customers can use these commands to create, list, change and delete buckets; control access to buckets and objects and receive notifications about access; and upload, download, copy and move objects and sync them with local directories.

Commands can go via the command line for one-off work and be built into scheduled scripts etc, or go via API into application code, with the full range available for authorisation, bucket creation, gets, puts, copy, list, metadata access, uploads and downloads.
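
As a rough illustration of how object operations map onto HTTP verbs, the sketch below builds the method and URL for a request without sending it. The operation labels and the build_request helper are hypothetical, the bucket name and region are placeholders, and a real request would also need signed authentication headers, which are omitted here.

```python
from urllib.parse import quote

# Illustrative operation labels mapped to the HTTP verbs named in the text.
VERBS = {"upload": "PUT", "download": "GET", "delete": "DELETE"}


def build_request(operation, bucket, key, region="us-east-1"):
    """Return (method, url) for an object operation; nothing is sent."""
    method = VERBS[operation]
    # Bucket name in the hostname, object key (percent-encoded) in the path.
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
    return method, url


method, url = build_request("upload", "example-bucket", "reports/q1 2024.pdf")
print(method, url)
```

The same key with a different verb downloads or deletes the object, which is why tooling built on plain HTTP can talk to the service.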

What classes of storage exist in S3?

AWS S3 storage classes range from those it intends for use with frequently accessed objects right through to those aimed at archive use cases.

At the frequently accessed end of the spectrum, these include S3 Standard and S3 Express One Zone, which gives claimed millisecond access on a single availability zone.

S3 Standard-IA and S3 One Zone-IA are the infrequently accessed versions of standard S3. They charge a retrieval fee but still offer millisecond access, and target backup workloads and data that may be older but can be accessed relatively rapidly when needed.

AWS’s storage classes for rarely accessed objects are Glacier Instant Retrieval (millisecond access), Glacier Flexible Retrieval (minutes), and Glacier Deep Archive (12 to 48 hours).

In addition, there is S3 Intelligent-Tiering, in which, for a charge, data access is tracked by AWS and moved to the cheapest tier according to usage patterns.
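
The classes described above can be summarised as a small lookup. This is a sketch built only from the access patterns and retrieval times given in the text: treating "millisecond" as under one second and "minutes" as up to an hour are assumptions, and the candidates helper is hypothetical. Cost, which is the other half of any real choice, is left out.

```python
from dataclasses import dataclass

HOUR = 3600


@dataclass(frozen=True)
class StorageClass:
    name: str
    intended_access: str       # "frequent", "infrequent" or "rare"
    worst_case_wait_s: float   # rough upper bound derived from the text


CLASSES = [
    StorageClass("S3 Standard", "frequent", 1),
    StorageClass("S3 Express One Zone", "frequent", 1),
    StorageClass("S3 Standard-IA", "infrequent", 1),
    StorageClass("S3 One Zone-IA", "infrequent", 1),
    StorageClass("Glacier Instant Retrieval", "rare", 1),
    StorageClass("Glacier Flexible Retrieval", "rare", 1 * HOUR),
    StorageClass("Glacier Deep Archive", "rare", 48 * HOUR),
]


def candidates(intended_access, max_wait_s):
    """Names of classes matching an access pattern and a tolerable wait."""
    return [
        c.name
        for c in CLASSES
        if c.intended_access == intended_access
        and c.worst_case_wait_s <= max_wait_s
    ]


print(candidates("rare", max_wait_s=60))
print(candidates("rare", max_wait_s=72 * HOUR))
```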

What use cases is S3 suited to?

By nature, S3 storage – and object storage in general – is not suited to every type of use case.

Object storage can handle almost any kind of data, is very scalable, can come with rich metadata, and is cost effective. But it is not generally very quick to access – compared to block storage for databases, for example – and lacks the kind of consistency that comes with that kind of high-performance transactional storage.

All that makes S3 suited to bulk storage use cases and for unstructured data, such as backups, content distribution, as a disaster recovery repository, and for AI and analytics datasets in data lakes, for example.

What on-premise or private S3 options exist?

AWS offers its own on-premise S3 storage via Outposts, which allows data to be held on-site and near applications or to meet data location requirements. But S3 is fundamentally storage of data objects accessed via HTTP verbs and REST API, so it’s quite possible for any supplier to offer access in a compatible fashion.
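
Compatibility of this kind comes down to accepting the same requests at a different address. The sketch below is an assumption-laden illustration: object_url is a hypothetical helper, the on-premise endpoint is an invented placeholder, and it uses the path-style layout (endpoint, then bucket, then key) that S3-compatible products commonly accept.

```python
from urllib.parse import quote, urlsplit


def object_url(endpoint, bucket, key):
    """Path-style address: <endpoint>/<bucket>/<key>."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


AWS_ENDPOINT = "https://s3.eu-west-1.amazonaws.com"
ONPREM_ENDPOINT = "https://objects.example.internal:9000"  # hypothetical

aws_url = object_url(AWS_ENDPOINT, "backups", "db/monday.tar")
onprem_url = object_url(ONPREM_ENDPOINT, "backups", "db/monday.tar")

# Only the host differs; the bucket/key path an application uses is the same.
print(urlsplit(aws_url).path == urlsplit(onprem_url).path)
```

In practice this is why many applications can be pointed at an on-prem product by changing an endpoint setting rather than rewriting storage code.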

There are numerous other providers of S3-compatible on-prem storage, including Cloudian (HyperStore storage software), Dell (ECS), MinIO, NetApp (in its Ontap and StorageGrid products), Pure Storage, QNAP, Red Hat, Scality (also offered via HPE), and StoneFly.