The Evolution of Data Classification
Data classification is not a new concept. In a way, it has existed for millennia. When the point is keeping sensitive data out of the hands of the wrong people, there is historically no better example than the Byzantine Empire around 678 AD. To defend Constantinople from attack by sea, the Byzantines created a type of fire that could burn on water. The formula for this “Greek fire” was kept a state secret from all enemies. Their “top secret” classification of this intellectual property worked so well that the formula died with the Byzantine Empire.
Can your sensitive data protection program stand up to that standard? If not, you’re not alone. According to The Value of Data study, which surveyed 1,500 IT decision-makers and data managers across 15 countries, on average over half (52%) of all data within organizations remains unclassified or untagged. This means that businesses have limited or no visibility over vast volumes of potentially business-critical and private data.
The study further reported that three in five (61%) organizations admit they have classified less than half of their public cloud data. And over two-thirds (67%) have classified less than half of the data that sits on mobile devices.
It’s even worse for small businesses. A GetApp study reported that only 16% of small businesses have a data classification policy that provides different levels of access based on data sensitivity.
All of this unclassified data is putting organizations at risk because there is no way to ensure that it is safeguarded. The result is a lot of potentially sensitive data that is sometimes under-protected and at other times over-protected. The goal should always be to make data exactly as secure as it needs to be. Data that is under-protected is at risk of breach and noncompliance. Data that is over-protected can hinder day-to-day business operations and become more difficult to search, share among applications and databases, and use for its intended purposes.
Stricter Regulations Require Data Classification
Data classification has been used in the modern world for several decades, but in fairly rudimentary forms. Today it’s generating a new wave of interest from the business world. The reason is likely the increased attention being paid to data classification by today’s more progressive compliance regulations: the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA).
These much-stricter rules are intensifying the need for stronger privacy and security measures — particularly the adoption of data classification processes. They are making new demands on organizations, including requiring them to respond to consumers’ requests for their personal data to “be forgotten.”
Right now, if organizations handle any personally identifiable information (PII) belonging to European Union or California residents, that data must be classified. What’s more, it’s highly likely that new or updated national and state regulations will also require data classification as a vital additional layer of data privacy and security.
Traditional Data Classification
Classifying data according to its risk levels enables organizations to quickly scan and tag it to ensure that sensitive or risky information is properly managed and protected. IT teams and data managers can ensure it’s handled and protected appropriately by, for example, applying proper data access controls, keeping permissions up-to-date, and implementing the right backup and remediation capabilities.
There are many schemas that organizations can use for classifying data, but most are variations on four general categories: public, internal, confidential, and restricted. A simple example of a four-level data classification schema includes:
- Public — Information that is freely available and accessible to the public without any restrictions or adverse consequences, such as marketing materials, contact information, customer service contracts, and price lists.
- Internal — Data with low-security requirements, but not meant for public disclosure, such as client communications, sales playbooks, and organizational charts. Unauthorized disclosure of such information can lead to short-term embarrassment and loss of competitive advantage.
- Confidential — Sensitive data that, if compromised, could negatively impact operations, including harming the company, its customers, partners, or employees. Examples include vendor contracts, employee reviews and salaries, and customer information.
- Restricted — Highly sensitive corporate data that, if compromised, could put the organization at financial, legal, regulatory, and reputational risk. Examples include customers’ PII, PHI, and credit card information.
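As a rough illustration, the four levels above can be modeled as an ordered type, with a few pattern rules deciding which tag a piece of text receives. This is a minimal sketch, not a production classifier: the names `Sensitivity`, `RULES`, and `classify` are hypothetical, the regular expressions are deliberately simplistic stand-ins for real detectors, and the choice to default unmatched text to Internal is an assumption made here for caution.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four levels from the list above, ordered least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical detection rules: each pattern maps to the level it implies.
# These regexes are illustrative only and would miss many real-world cases.
RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Sensitivity.RESTRICTED),   # SSN-like number
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), Sensitivity.RESTRICTED),  # card-like digit run
    (re.compile(r"\b(salary|employee review|vendor contract)\b", re.I),
     Sensitivity.CONFIDENTIAL),
    (re.compile(r"\b(org chart|sales playbook)\b", re.I), Sensitivity.INTERNAL),
]


def classify(text: str, default: Sensitivity = Sensitivity.INTERNAL) -> Sensitivity:
    """Return the highest level implied by any matching rule, else the default."""
    matches = [level for pattern, level in RULES if pattern.search(text)]
    return max(matches, default=default)
```

Because the levels are ordered, a check such as `classify(doc) >= Sensitivity.CONFIDENTIAL` can be used to decide whether stricter access controls or backup policies should apply to a document.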
Redefining Data Classification Today
While data classification has been with us for a few decades, it has changed considerably in the past few years, becoming more sophisticated and delivering much greater value. What used to be a simple process of sorting data into a few buckets to streamline data management is now a far more capable process that meets organizations’ intensifying data privacy and security demands.
Alongside traditional approaches to data classification, there is a new way to look at it, one updated to address today’s more sophisticated data privacy regulations and business needs. This new methodology adds a layer to data classification that accounts for three critical variables: data processing, purpose, and privacy. Simply stated, these sub-categories are:
- Data Processing (aka, consent) — New and evolving data privacy regulations, in particular GDPR and CCPA, require an individual’s consent for how organizations use their private data.
- Purpose (aka, access) — GDPR requires organizations that process European Union citizens’ personal data to clarify the purposes for which they are collecting data. As a result, companies now have to manage their data according to what purpose or purposes it serves within their organizations.
- Privacy (aka, compliance) — Both GDPR and CCPA are laser-focused on data privacy. Complying with these stricter regulations requires more advanced data classification schemas.
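One way to picture this extra layer is as a small set of metadata attached to each record alongside its sensitivity level. The sketch below is an assumption-laden illustration only: `PrivacyTag` and `may_process` are hypothetical names, and the rule that processing needs both consent and a matching purpose follows the simplified description above rather than the full text of either regulation (GDPR, for instance, recognizes lawful bases other than consent).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PrivacyTag:
    """Hypothetical privacy layer stored next to a record's sensitivity level."""
    consent_given: bool = False                         # data processing (consent)
    purposes: Set[str] = field(default_factory=set)     # purpose (access)
    regulations: Set[str] = field(default_factory=set)  # privacy (compliance)


def may_process(tag: PrivacyTag, purpose: str) -> bool:
    """Allow processing only with consent and for a purpose the data was collected for."""
    return tag.consent_given and purpose in tag.purposes


# Example: a customer record collected for billing, subject to GDPR.
customer_tag = PrivacyTag(consent_given=True, purposes={"billing"}, regulations={"GDPR"})
```

With this tag, `may_process(customer_tag, "billing")` is allowed, while a request to reuse the same data for `"marketing"` is refused because that purpose was never recorded.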