AWS S3 Append To File | 5 Things To Know

We strive to provide you with authoritative, trustworthy, and expert advice. To that end, the staff at clouddropout.com performs extensive research, editing, and fact checking for every post on this website. If you feel that this article can be improved, please feel free to reach us at staff@clouddropout.com

Does S3 Support Appending Files?

The process of appending essentially involves adding data to an existing file. It’s a typical process often applied in databases when large amounts of new data are being added to an existing table.

However, in Amazon Web Services, the S3 storage service doesn’t behave like a typical database repository.

While it is a very powerful cloud storage service that can absorb an incredible amount of data, a standard S3 bucket does not allow appending to an existing object at all. Seriously.

AWS S3 Append to File Doesn’t Happen

Most databases, whether physical or online, have an appending feature.

This is a bread-and-butter database function. Amazon’s S3 takes a different approach, however. Instead of appending, a user has to replace the object already stored with a brand new object, i.e. an entire replacement of the old file.

That implies a couple of things.

First, the user keeps a full copy of the file or object locally, or in another location, from which it is transferred to the S3 bucket.

Second, the user has the ability to update and append the file locally, which of course assumes the necessary software and hardware to support that activity.

Third, the user regularly re-uploads the file to keep the S3 object current, which means maintaining two copies of the data set instead of a single primary copy in S3.
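The three implications above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: it assumes `client` behaves like `boto3.client("s3")`, and the function name, bucket, key, and path are hypothetical.

```python
from pathlib import Path


def append_locally_and_upload(client, bucket: str, key: str,
                              local_path: Path, new_data: bytes) -> int:
    """Append to the local master copy, then replace the S3 object with it.

    `client` is assumed to behave like boto3.client("s3"); only
    put_object is used. Returns the size in bytes of the uploaded object.
    """
    # Step 1: a true append, which the local filesystem supports.
    with open(local_path, "ab") as fh:
        fh.write(new_data)

    # Step 2: there is no append call here, so the whole file is sent
    # again and replaces whatever was stored under the same key.
    body = local_path.read_bytes()
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

Note that every call uploads the full file, so the cost of each "append" grows with the size of the file rather than the size of the new data.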

From a data management perspective, this lack of appending ability in the S3 function can be downright frustrating.

AWS S3 Append to Existing File Workarounds

There are workarounds for the missing append operation, but generally they still come down to the same thing: replacing the old S3 object in AWS with a new one.

And it’s not a clean substitute per se. If the workaround doesn’t take into account the need to replace the entire existing file with a newly modified one that also needs to be complete, then data is going to get lost in the switch.

A typical automation approach would involve having a working copy locally that is being modified by a user. Once the changes reach a set threshold, the automation would go into the AWS S3 bucket and save the new object, wiping out the old one.

You can see right away where there is a huge risk for old data to be lost if the new, replacing file is incomplete or corrupted for some reason. And file corruption happens more frequently than people think.
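One way to reduce that risk is to check the replacement before it overwrites anything. The sketch below is an assumption about how such a guard might look, not a standard recipe: `safe_replace` is a hypothetical helper, and `client` is assumed to behave like `boto3.client("s3")`. It refuses to upload unless the new content still begins with the old content, and it passes `ContentMD5` so that S3 can reject a body damaged in transit.

```python
import base64
import hashlib


def safe_replace(client, bucket: str, key: str, new_body: bytes) -> bool:
    """Replace an object only if new_body still begins with the old content."""
    old_body = client.get_object(Bucket=bucket, Key=key)["Body"].read()

    # An "appended" file must contain everything the old one did.
    if not new_body.startswith(old_body):
        return False

    # Base64-encoded MD5 of the body, as the ContentMD5 parameter expects.
    digest = base64.b64encode(hashlib.md5(new_body).digest()).decode("ascii")
    client.put_object(Bucket=bucket, Key=key, Body=new_body, ContentMD5=digest)
    return True
```

This check downloads the old object first, so it suits small files. It also does not protect against two writers replacing the same object at the same time.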

AWS Lambda Append to S3 File Options

Alternatively, one could use multi-step automation to download the existing S3 object in AWS and use that as a control version.

Then the user would append to it locally and save it as a new S3 object, either replacing the existing S3 object or creating a brand new one with a different key. Both are viable options, but creating a second, new S3 object preserves the first one on file in AWS in case something goes wrong.

It would be a bit like how a Linux distribution handles updates, keeping a stable version available while a new one is put through its paces in production, so there is always an earlier stable version to roll back to.
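A minimal sketch of the second option, written in the shape of a Lambda handler. Several things here are assumptions made for illustration: `client` is assumed to behave like `boto3.client("s3")`, the `data` and `tag` event fields are invented, and the "key plus tag" naming scheme is hypothetical.

```python
def append_as_new_object(client, bucket: str, key: str,
                         new_data: bytes, version_tag: str) -> str:
    """Download an object, append to it in memory, save it under a new key."""
    # Download the current object to use as the control version.
    current = client.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Hypothetical naming scheme: original key plus a version tag.
    new_key = f"{key}.{version_tag}"
    client.put_object(Bucket=bucket, Key=new_key, Body=current + new_data)
    return new_key


def make_handler(client, bucket: str, key: str):
    """Build a Lambda-style handler; the event fields are assumptions."""
    def handler(event, context):
        data = event["data"].encode("utf-8")
        new_key = append_as_new_object(client, bucket, key, data, event["tag"])
        return {"new_key": new_key}
    return handler
```

In a real function one might write `handler = make_handler(boto3.client("s3"), "my-bucket", "log.txt")` at module level. The original object is never touched, so it remains available if the new one turns out to be bad.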

The Kinesis Firehose approach can look like an appending workaround for the AWS S3 limitation, but it is not. Instead, new data coming in goes into a Firehose delivery stream, and that stream delivers it to an S3 destination.

It’s not overwriting or extending the existing S3 object; it’s creating a new object entirely.
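For illustration, sending one record into a delivery stream might look like the sketch below. It assumes `firehose` behaves like `boto3.client("firehose")`; the function name, stream name, and event contents are hypothetical.

```python
import json


def send_event(firehose, stream_name: str, event: dict) -> bytes:
    """Send one event to a Firehose delivery stream as a line of JSON."""
    # A trailing newline keeps records separable after Firehose
    # batches several of them into a single S3 object.
    payload = (json.dumps(event) + "\n").encode("utf-8")
    firehose.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": payload},
    )
    return payload
```

The caller only ever adds records. Firehose buffers them and writes each batch to the bucket as its own object, so reading "the file" later means reading many objects in order.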

Simplicity in Amazon’s Favor

It is interesting that a powerhouse like Amazon would not allow for something as standard as file appending to occur in its cloud platform, which is practically designed for online database administration tied to websites.

However, it may very well be that Amazon didn’t want any risks associated with constant modification of existing files. It’s a much cleaner process to force people to do their modifications offline and then upload the clean version to the S3 bucket instead. Less processing, less risk, and less responsibility.