The Analyst View: Take control of your unruly data with privacy-preserving classification

About the author

Jen Holtvluwer, Chief Marketing Officer at Spirion, has more than 20 years of senior marketing and business development experience, with a record of creating compelling stories that impact audiences and ensure a customer-focused approach to business opportunities.

Thanks to a growing number of data privacy regulations, such as the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA), data classification is receiving renewed interest from organizations around the world. A key driver is that if organizations deal with personal information or sensitive personal information from European Union or California residents, the data must be classified.

What used to be a simple process of sorting data into a few buckets to streamline data management is evolving into a much more sophisticated process to meet organizations’ intensifying data privacy and security demands. Gartner recently released insightful research on “Using Classification to Improve Unstructured Data Security” (Mike Wonham, August 11, 2020).

The report states the benefits of data classification within the broader context of security and compliance, identifies both the key criteria and limitations of today’s classification tools, and as always, provides expert analysis and guidance. In today’s post, we’ll discuss what we consider to be the key takeaways from the 35-page report.

The role of classification in the data life cycle

Spirion has long advocated that data discovery and data classification go hand in hand: one cannot exist without the other. Data discovery is the process of collecting data from databases and silos and consolidating it into a single source that can be easily and instantly accessed.

Once you have located all of your data, applying labels, tags, and visual markers helps both humans and computers determine its sensitivity and treat it consistently. Such data classifications ensure that only appropriate parties can access sensitive data as it moves through the organization. They also support information sharing with other data protection controls.

The Gartner report stresses, “Understanding the sensitive data estate is a prerequisite for effective security and data privacy compliance. DLP is not sufficient as privacy is more than a data loss problem. Security and risk management technical professionals should use data classification to support these requirements.” They further add, “Data classification capabilities are therefore not only useful; they are often necessary to achieve compliance and make data-centric security controls effective.”

The business cases for classification

Why classify? Admittedly, data classification by itself may not be all that useful. But as part of a holistic data life cycle, it becomes a key enabler and necessary component of effective data governance and compliance programs.

Gartner explains, “Classification is used to provide insight and either deliver or support control activities. In doing this, it supports a variety of business drivers for data-focused security. The most commonly seen of these drivers that are grounded in security within Gartner inquiry is privacy, which forms the bulk of data security risk for many clients.”

Some of the most common business cases for data classification include:

  • A responsibility to uphold privacy and regulatory compliance
  • Confidentiality for organizations subject to regulations about the control of data (such as contractual obligations)
  • Data retention
  • Technical control support for tools, such as DLP and data access governance, which benefit from pre-classification and labeling of data

Classification policy: The prerequisite first step

There is a critical need to classify data according to organization policy. While there are many schemas that organizations can use for classifying data, for security, there are two key categories:

  1. Confidential — Sensitive data that, if compromised, could negatively impact operations, including harming the company, its customers, partners, or employees. Examples include vendor contracts, employee reviews and salaries, and customer information.
  2. Restricted — Highly sensitive corporate data that, if compromised, could put the organization at financial, legal, regulatory, and reputation risk. Examples include customers’ PII, PHI, and credit card information.

Gartner emphasizes: “Even if you do nothing else, put an information classification policy in place. If it demands that users treat and mark data in a certain way, then you have provided the foundation for user education, technical control and compliance.” They add, “confidentiality labels will form part of an information classification policy, which should be accompanied by a data handling policy, which provide the logical basis for security standards and control requirements. Without such a policy, classification programs are likely to fail as there is no common understanding, classification labels can proliferate, and misclassification will be more frequent.”
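
One way to make such a policy usable by both people and tools is to write the labels and their handling rules down as data. The sketch below is a minimal illustration in Python, not something taken from the Gartner report. The two labels come from the categories above; the handling fields (`encrypt_at_rest`, `external_sharing`), their values, and the `stricter` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Labels from the policy above, ordered from less to more sensitive."""
    CONFIDENTIAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class HandlingRule:
    """Hypothetical handling requirements attached to a label."""
    encrypt_at_rest: bool
    external_sharing: bool


# Assumed example values; a real data handling policy would define these.
HANDLING_POLICY = {
    Sensitivity.CONFIDENTIAL: HandlingRule(encrypt_at_rest=True, external_sharing=True),
    Sensitivity.RESTRICTED: HandlingRule(encrypt_at_rest=True, external_sharing=False),
}


def stricter(a: Sensitivity, b: Sensitivity) -> Sensitivity:
    """When two labels apply to the same data, keep the more restrictive one."""
    return max(a, b)


if __name__ == "__main__":
    label = stricter(Sensitivity.CONFIDENTIAL, Sensitivity.RESTRICTED)
    print(label.name, HANDLING_POLICY[label])
```

Keeping the label set small and ordered is one way to limit the label proliferation that the quote warns about.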

Classification solution landscape

Although few tools are dedicated solely to data classification, classification capabilities commonly span multiple product categories, including Data Access Governance (DAG), Data Loss Prevention (DLP), User-Driven Classification (UDC), and SaaS point solutions.

Let’s look at use cases and tools to consider:

Use case                                                          | Tools commonly used
Immediate control action only, no recording of the classification | DLP
Insight into data within SaaS and on-premises environments        | DAG, SaaS, File Analysis
Insight into data within user endpoints                           | UDC, DLP
Tagging based on user input                                       | UDC, SaaS
Automatic tagging                                                 | UDC, SaaS, DAG

Essential classification capabilities

Because data classification is an enabler for other aspects of the data life cycle, whether triggering other data security controls or imparting strategic insights, it must offer a broad range of capabilities.

Gartner suggests, “In order for classification to work, the following conditions must be met:

  • The data should meet some criteria that enable a decision to be made about what classification applies.
  • The presence of one of the following two capabilities:
    - An automated system that can analyze the data and apply rules to make that decision
    - An interface for users to create, verify or override a classification.
  • Discovery in a variety of data storage environments is a key capability for automated systems.
  • The provision of a recording of that classification that allows other systems and processes to leverage that decision.
  • The inclusion of a log, dashboard or other method to allow data and security administrators to understand the data estate for a variety of reasons.”
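
To see how these conditions fit together, consider the rough Python sketch below. It assumes a deliberately simple automated rule (a keyword lookup), lets a user-supplied label override the automated decision, and appends every decision to a log that other processes could read. The `Classifier` and `ClassificationRecord` names, the keyword rules, and the "unclassified" fallback are all hypothetical; real tools use far richer analysis.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassificationRecord:
    """Recorded decision that other systems and processes can read."""
    path: str
    label: str
    source: str  # "automated" or "user"


@dataclass
class Classifier:
    # Hypothetical keyword rules: keyword -> label, checked in insertion order.
    rules: dict
    log: list = field(default_factory=list)

    def classify(self, path: str, text: str,
                 user_label: Optional[str] = None) -> ClassificationRecord:
        if user_label is not None:
            # A user-supplied label overrides the automated decision.
            record = ClassificationRecord(path, user_label, "user")
        else:
            label = "unclassified"
            lowered = text.lower()
            for keyword, rule_label in self.rules.items():
                if keyword in lowered:
                    label = rule_label
                    break
            record = ClassificationRecord(path, label, "automated")
        # Every decision is logged so administrators can review the data estate.
        self.log.append(record)
        return record


if __name__ == "__main__":
    classifier = Classifier(rules={"salary": "Confidential", "credit card": "Restricted"})
    print(classifier.classify("reviews.txt", "Employee salary review"))
    print(classifier.classify("notes.txt", "Meeting notes", user_label="Restricted"))
```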

They also recommend the following criteria when evaluating classification tools:

  • Storage locations: With more than 20 different types of storage where sensitive data can hide, Gartner highlights the differences among classification tools when it comes to discovering key file types. While file stores are well covered, “tools for mobile endpoints such as tablets and mobile phones are lacking.” However, the report says that “some vendors such as Bitglass and Spirion are actively combining capabilities to help deliver in this space.”
  • Recording classification: Gartner states, “there is no point classifying a document if you’re not going to do something as a result, and in order to do that you need to record the outcome (unless you’re taking an immediate action, such as within DLP). There are two tagging methods available, tagging the document or recording the outcome in a metadata repository.”
  • Repositories: Gartner points out, “tools that use this outcome usually are automated and provide a wealth of metadata about not only the data, but the context in which it was found.”
  • Dashboards and reports: Gartner states, “all classification tools provide dashboards and reporting capabilities, but their depth varies considerably depending on the tool itself.” Some of the key elements that you will want visibility into include access rights, data ownership, file usage recording, and duplication reporting.
  • Tagging (labeling): Gartner provides solid guidance when it comes to tagging. Proper tagging makes it easier to find data when responding to a consumer’s “right to know” or “right to be forgotten” request, both of which are common among data privacy regulations. As a best practice, Gartner suggests that you “identify where the tag is to be used, and provide only enough information to allow that control to work correctly.”
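
As an illustration of the second recording method, the Python sketch below keeps classification outcomes in a metadata repository rather than in the file itself. It is an assumption-laden toy: the repository is an in-memory dictionary, entries are keyed by a SHA-256 hash of the file content, and the tag carries only two fields (`sensitivity` and `owner`) in the spirit of providing only enough information for a control to work. None of these names come from the report.

```python
import hashlib
from typing import Optional

# In-memory stand-in for a metadata repository: content hash -> tag.
repository: dict = {}


def content_key(data: bytes) -> str:
    """Identify a file by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


def record_tag(data: bytes, sensitivity: str, owner: str) -> str:
    """Store a minimal tag for this content and return its key."""
    key = content_key(data)
    repository[key] = {"sensitivity": sensitivity, "owner": owner}
    return key


def lookup_tag(data: bytes) -> Optional[dict]:
    """Return the recorded tag, or None if the content was never classified."""
    return repository.get(content_key(data))


if __name__ == "__main__":
    record_tag(b"customer list 2020", "Restricted", "sales")
    print(lookup_tag(b"customer list 2020"))
    print(lookup_tag(b"never scanned"))
```

Because the key depends only on content, byte-identical copies of a file resolve to the same tag, and any edit to the content produces a new key with no tag until the file is classified again.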

Classification technology strengths

Today’s classification tools, which encompass Data Access Governance (DAG), Data Loss Prevention (DLP), User-Driven Classification (UDC), and SaaS point solutions, generally perform well when it comes to the following capabilities:

  • Coverage of file types
  • Metadata reporting
  • Workflow and privacy—specifically, Gartner emphasizes, “As privacy requirements expand globally, vendors are introducing privacy workflow and data subject access request (DSAR) support. These are common themes in most privacy regulations.”

Classification technology weaknesses

These same tools also carry some inherent limitations:

  • Data tagging limitations—Gartner states that “tagging has a significant limitation in that it is not possible to tag all data objects. Some file types have room or even formal support for tagging in headers or document properties. However, the vast majority of file types have no such capability or are so limited as to be effectively useless for tagging or other form of labeling.” They also offer the guidance: “If the requirement is to track data across multiple dimensions (for example, internal policy, privacy, health and department), then use file analysis tools for visibility. Use classification tagging tools for policy-dependent control.”
  • Classification change—They caution, “classification tools support changing classifications. But automated tools are not good at automatic reclassification based on external conditions rather than changes in content.” They advise, “focus on data that might leave the company rather than moving internally, except where internal sharing would be absolutely unacceptable. Keep your most restrictive classification label reserved for exceptional rather than common use cases.”
  • Encrypted data—encryption often interferes with the detection of data. Gartner recommends as a best practice: “The safest classification approach is to treat individual encrypted files that cannot be accessed by the classification tool as sensitive and use controls to prevent their movement or access.” They further add, “the best approach for any unreadable file is to use the discovery of these files to support a review of the business process and sensitive data handling.”
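
The "treat unreadable files as sensitive" advice can be expressed as a fail-closed default. The Python sketch below is only an illustration: it uses "bytes that are not valid UTF-8" as a stand-in for "file the tool cannot read", which is a simplifying assumption, and the `UnreadableFile`, `read_text`, and `classify_file` names are hypothetical. The second return value flags the file for the business-process review that the quote recommends.

```python
from typing import Callable, Tuple


class UnreadableFile(Exception):
    """Raised when the classification tool cannot read a file's content."""


def read_text(raw: bytes) -> str:
    """Stand-in reader: treats bytes that are not valid UTF-8 as unreadable."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise UnreadableFile("content could not be read") from exc


def classify_file(raw: bytes, classify_text: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (label, needs_review); unreadable files default to Restricted."""
    try:
        text = read_text(raw)
    except UnreadableFile:
        # Fail closed: assume the most sensitive label and flag for review.
        return ("Restricted", True)
    return (classify_text(text), False)


if __name__ == "__main__":
    print(classify_file(b"quarterly plan", lambda text: "Confidential"))
    print(classify_file(b"\xff\xfe\x9c\x00", lambda text: "Confidential"))
```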

Data classification trends

Several technology trends are shaping data classification. Among them are automated classification tools, the rise of machine learning and artificial intelligence, and privacy workflows.

  • Automated tools: Gartner suggests, “automated classification tools must be configurable so that their output is reliable for a given client problem. Except in the simplest of use cases, 100% accuracy isn’t possible, the best tools will enable enough precision based on the data in the document.” They point out, “automated tools get best results with well-known standard data types, such as driving license numbers, proper names, and social security numbers. If your intellectual property is consistently well-formatted (such as with an account number or project coding system), then automated systems will succeed there.”
  • Machine learning: Gartner acknowledges, “the ideal automated classification solution would use powerful ML and artificial intelligence capabilities to determine the sensitivity or other categorization of data. Machine learning in data classification is improving, but has some way to go.”
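
To make the point about well-known standard data types concrete, here is a small Python sketch of pattern-based detection. It is an illustration under stated assumptions, not any vendor's implementation: the two regular expressions only check format (a US social security number written as NNN-NN-NNNN, and a run of 13 to 19 digits for a payment card), and the card candidates are additionally filtered with the Luhn checksum to improve precision. Format-only matching like the SSN pattern will still produce false positives.

```python
import re

# Format-only patterns (assumptions for illustration).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check a card number candidate with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return candidate SSNs and checksum-valid card numbers found in text."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "card": [m for m in CARD_PATTERN.findall(text) if luhn_valid(m)],
    }


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111, order 4111 1111 1111 1112"
    print(find_sensitive(sample))
```

In the sample, the order number has the same shape as a card number but fails the checksum, so it is not reported. That is the kind of extra validation that trades a little complexity for better precision.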

Gartner provides the following recommendations for technical professionals responsible for data security:

  • “Ensure that a data classification policy is in place as it is the root of data security governance. It provides clarity and authority, supports control standards and underpins user awareness efforts.
  • Align your security classification with any broader data governance programs. It is easy to confuse users and create technical complexity and possibly conflict. Focus on high-risk and high-value data, especially regulated data, to support such alignment.
  • Use automated data classification to provide users with a baseline and you with insight. User classification is less expensive, but harder to introduce. Use both together for best results.
  • Aim for “good enough” solutions, recognizing that the automated technology has limits in precision. Phase your implementation carefully to avoid diminishing returns.
  • Use at least two labels. Sensitivity and either “owner” or “project/department.” Identify where the tag is to be used and provide only enough information to allow that control to work correctly.”

Redefining automated data classification

Automating data classification overcomes many of these limitations by making the process reliable, accurate, and continuous (that is, persistent). A sophisticated platform, such as Spirion Data Privacy Manager, can spot personal information by looking for data patterns, such as names, dates of birth, addresses, phone numbers, financial information, health information, and social security numbers. Importantly, automated systems can also reclassify data as needed, for example after updates and changes within the business or changes in compliance regulations.
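
What "continuous" might look like in practice can be sketched with a simple rescan trigger. The Python example below is a hypothetical illustration and does not describe how Spirion Data Privacy Manager or any other product works: it assumes each file's last scan is stored as a content hash plus the version number of the policy in force at the time, and it asks for reclassification whenever either one has changed.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanState:
    """What was known about a file the last time it was classified."""
    content_hash: str
    policy_version: int


def fingerprint(data: bytes) -> str:
    """SHA-256 hash used to detect content changes."""
    return hashlib.sha256(data).hexdigest()


def needs_reclassification(previous: Optional[ScanState], data: bytes,
                           policy_version: int) -> bool:
    """True if the file is new, its content changed, or the policy changed."""
    if previous is None:
        return True
    return (previous.content_hash != fingerprint(data)
            or previous.policy_version != policy_version)


if __name__ == "__main__":
    state = ScanState(fingerprint(b"contract v1"), policy_version=1)
    print(needs_reclassification(state, b"contract v1", 1))  # unchanged file, same policy
    print(needs_reclassification(state, b"contract v2", 1))  # content edited
    print(needs_reclassification(state, b"contract v1", 2))  # policy updated
```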

Every organization should assess the data classification automation options available in the marketplace and determine which solution can provide the capabilities it needs to take its data privacy protection to the next level. Ideally, organizations should choose a platform that is purpose-built to deliver the key functions of a robust data privacy program. The right automation system can streamline data classification by automatically analyzing and categorizing data based on predetermined parameters, continually and in real time.

Want to dive deeper?

Data classification should be a top priority in any data infrastructure. Regulations like CCPA and GDPR now require it. In this white paper, learn how data classification has moved from a nice-to-have to a necessity in data privacy management and why you should expect data protection software to have automated data classification capabilities.
