The emerging risk of unstructured data in the public cloud

About the author

From solutions architecture to security, Gabe Gumbs brings deep technical experience to his position as Chief Innovation Officer at Spirion. He leads the Spirion team through strategic product development to create technologies that push data security forward in an increasingly complex digital world.

Every company can be the victim of a data breach, even cybersecurity companies. When Imperva was breached last year, the cause turned out to be a cloud misconfiguration. It’s a type of breach that is becoming more common, and it is putting unstructured data at greater risk.

There’s a lot of unstructured data in public cloud environments. AWS is one of the most popular clouds, and we find a lot of open S3 buckets in AWS environments. Those openings aren’t new. S3 buckets are private by default, so an exposed bucket means someone changed its configuration, whether through error, negligence, or a deliberate decision to give someone else access. The problem, of course, is that a bucket left open offers an easy entry point for hackers.
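One way a bucket ends up open is an access control list (ACL) that grants permissions to one of the public groups S3 defines. The sketch below is a minimal illustration of spotting such grants, assuming the ACL has already been fetched as a dictionary shaped like the response to S3's GetBucketAcl call. The function name public_grants and the sample ACL are hypothetical, and the check covers ACLs only; bucket policies and account-level public access settings would need separate checks.

```python
# Group URIs that S3 uses for "everyone" and "any authenticated AWS user".
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return (group URI, permission) pairs for grants made to a public group.

    `acl` is assumed to be a dict with a "Grants" list, as in a GetBucketAcl
    response. This helper is illustrative, not part of any AWS library.
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            found.append((grantee["URI"], grant.get("Permission")))
    return found


# Hypothetical ACL: the owner has full control, and everyone can read.
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

if __name__ == "__main__":
    for uri, permission in public_grants(sample_acl):
        print(f"Public grant: {permission} to {uri}")
```

An empty result does not prove a bucket is private; it only means the ACL itself contains no public group grants.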

But just how bad is the problem of exposed S3 buckets? We examined 279,000 buckets and found that 217,000 of them, roughly 78 percent, were unsecured. They were wide open for the world, or at least the bad guys, to look at.

Filled with PII

Then we began to break down the data inside these unsecured buckets. We found a lot of PII. In fact, 48 percent of the unsecured buckets had PII inside. On average, each unsecured bucket also held 121,000 objects.
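To put those figures in perspective, the short calculation below works out what they imply in absolute terms. It uses only the numbers quoted in this article; the variable names are illustrative, and because the 48 percent and 121,000 figures are presumably rounded, the results should be read as rough estimates rather than reported findings.

```python
# Figures quoted in the article.
buckets_examined = 279_000
buckets_unsecured = 217_000
pii_share = 0.48                  # share of unsecured buckets containing PII
avg_objects_per_bucket = 121_000  # average objects per unsecured bucket

# Derived estimates (not reported in the article itself).
unsecured_share = buckets_unsecured / buckets_examined
buckets_with_pii = round(buckets_unsecured * pii_share)
total_exposed_objects = buckets_unsecured * avg_objects_per_bucket

if __name__ == "__main__":
    print(f"Unsecured share of buckets: {unsecured_share:.1%}")
    print(f"Estimated unsecured buckets with PII: {buckets_with_pii:,}")
    print(f"Estimated objects in unsecured buckets: {total_exposed_objects:,}")
```

Under these assumptions, the sample works out to a little over 100,000 exposed buckets containing PII and on the order of 26 billion exposed objects.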

Most of the file types in these buckets skewed toward web development. That’s not surprising, as S3 buckets are often used as backup repositories for web applications. There was also a healthy mix of productivity files, such as documents, spreadsheets, and text files.

So this investigation surfaced two critical issues. First, there is a lot of misconfiguration happening, perhaps a lot more than we realized. Second, and more importantly, most organizations don’t know what data is in these buckets. These two issues are why we’re seeing an emerging risk of unstructured data in the public cloud.

You’re one misconfiguration away from a breach

If you are using a public cloud environment, you are just one misconfiguration away from a data breach. The sad part is that this is an easy breach to avoid. The global pandemic and the rush to remote work have made misconfiguration even more likely. To make working from home easier, companies are moving more work into the cloud. But Jim Reavis, co-founder and CEO of the Cloud Security Alliance, pointed out that these companies are failing at basic security hygiene and are careless about how they move data around and use the cloud environment. This leads to misconfiguration.

Misconfiguration is bad enough. But if you don’t know what data is in those buckets, or what data is in your cloud, you have no idea what can be lost or what is at risk. Not having insight into this data leads to loss, whether of the data itself or of money.

Unstructured data takes users down a rabbit hole. A single file folder could contain hundreds of documents and more file folders with more documents, and so on. Do you know what is in all of those files? Of course not, which adds to the risk. If you can’t identify the most sensitive data, you can’t protect it properly. If you can’t protect it properly, you can’t be in compliance with data privacy regulations. And if you can’t be in compliance, you could end up paying a lot of money in fines.
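The nested-folder problem can be made concrete with a small sketch. It walks a directory tree recursively and flags files whose text matches a pattern that looks like PII. This is only an illustration of the idea: it scans a local folder as a stand-in for a bucket, the function name scan_tree is hypothetical, and the two regular expressions (a US Social Security number shape and an email address shape) are simplistic assumptions that will miss a great deal and raise false positives. It is not how any particular product discovers sensitive data.

```python
import os
import re

# Deliberately simple, illustrative patterns; real PII detection needs far more.
PII_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_tree(root):
    """Walk `root` recursively and map each flagged file path to the sorted
    names of the patterns found in it."""
    findings = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                # Treat every file as text and skip undecodable bytes.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it in this sketch
            hits = sorted(
                label for label, pattern in PII_PATTERNS.items() if pattern.search(text)
            )
            if hits:
                findings[path] = hits
    return findings


if __name__ == "__main__":
    for path, hits in scan_tree(".").items():
        print(f"{path}: {', '.join(hits)}")
```

Even a crude scan like this shows why depth matters: a sensitive file several folders down is invisible until something actually opens and reads it.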

Theoretically, you can prevent an easy data breach by adding policies that will cut down on misconfigurations. But errors will happen. Negligence will happen. Misconfigurations will happen. So it is more important to know the data inside these S3 buckets so security and privacy can be properly addressed.