October 6, 2020
About the author
Jen Holtvluwer, Chief Marketing Officer at Spirion, has more than 20 years of senior marketing and business development experience with success creating compelling stories that impact audiences and ensure a customer-focused approach to business opportunities.
Thanks to increasing data privacy regulations, such as the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA), data classification is receiving renewed interest from organizations around the world. A key driver is that if organizations deal with personal information or sensitive personal information from European Union or California citizens, the data must be classified.
What used to be a simple process of sorting data into a few buckets to streamline data management is evolving into a much more sophisticated process to meet organizations’ intensifying data privacy and security demands. Gartner recently released insightful research on “Using Classification to Improve Unstructured Data Security” (Mike Wonham, August 11, 2020).
The report states the benefits of data classification within the broader context of security and compliance, identifies both the key criteria and limitations of today’s classification tools, and as always, provides expert analysis and guidance. In today’s post, we’ll discuss what we consider to be the key takeaways from the 35-page report.
The role of classification in the data life cycle
Spirion has long advocated that data discovery and data classification go hand in hand: one cannot exist without the other. Data discovery is the process of collecting data from databases and silos and consolidating it into a single source that can be easily and instantly accessed.
Once you have located all of your data, applying classification labels, tags, and visual markers helps both humans and computers determine its sensitivity and treat it consistently. Such data classifications ensure that only appropriate parties can access sensitive data as it moves through the organization. They also support information sharing with other data protection controls.
The Gartner report stresses, “Understanding the sensitive data estate is a prerequisite for effective security and data privacy compliance. DLP is not sufficient as privacy is more than a data loss problem. Security and risk management technical professionals should use data classification to support these requirements.” They further add, “Data classification capabilities are therefore not only useful; they are often necessary to achieve compliance and make data-centric security controls effective.”
The business cases for classification
Why classify? Admittedly, data classification by itself may not be all that useful. But as part of a holistic data life cycle, it becomes a key enabler and necessary component for effective data governance and compliance programs.
Gartner explains, “Classification is used to provide insight and either deliver or support control activities. In doing this, it supports a variety of business drivers for data-focused security. The most commonly seen of these drivers that are grounded in security within Gartner inquiry is privacy, which forms the bulk of data security risk for many clients.”
Some of the most common business cases for data classification include:
Classification Policy: The prerequisite first step
There is a critical need to classify data according to organization policy. While there are many schemas that organizations can use for classifying data, for security, there are two key categories:
- Confidential — Sensitive data that, if compromised, could negatively impact operations, including harming the company, its customers, partners, or employees. Examples include vendor contracts, employee reviews and salaries, and customer information.
- Restricted — Highly sensitive corporate data that, if compromised, could put the organization at financial, legal, regulatory, and reputation risk. Examples include customers’ PII, PHI, and credit card information.
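As a rough illustration, the two categories above could be encoded so that software applies them consistently. This is a minimal sketch, not any vendor's implementation: the data type names in `POLICY` are hypothetical keys loosely based on the examples above, and the rule that the most sensitive label wins is an assumption.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The two security categories above; a higher value is more sensitive."""
    CONFIDENTIAL = 1
    RESTRICTED = 2


# Hypothetical mapping from data types to labels, based on the examples above.
POLICY = {
    "vendor_contract": Sensitivity.CONFIDENTIAL,
    "employee_review": Sensitivity.CONFIDENTIAL,
    "salary": Sensitivity.CONFIDENTIAL,
    "customer_pii": Sensitivity.RESTRICTED,
    "phi": Sensitivity.RESTRICTED,
    "credit_card": Sensitivity.RESTRICTED,
}


def classify(data_types):
    """Return the most sensitive label among the data types found, or None.

    Assumption: when a document holds several data types, the most
    sensitive one determines the label for the whole document.
    """
    labels = [POLICY[t] for t in data_types if t in POLICY]
    return max(labels) if labels else None
```

For example, a document containing both salary data and credit card numbers would be labeled `Sensitivity.RESTRICTED`, while a document with no recognized data types gets no label.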
Gartner emphasizes: “Even if you do nothing else, put an information classification policy in place. If it demands that users treat and mark data in a certain way, then you have provided the foundation for user education, technical control and compliance.” They add, “confidentiality labels will form part of an information classification policy, which should be accompanied by a data handling policy, which provide the logical basis for security standards and control requirements. Without such a policy, classification programs are likely to fail as there is no common understanding, classification labels can proliferate, and misclassification will be more frequent.”
Classification solution landscape
Few tools are dedicated solely to data classification. Instead, classification commonly spans multiple product categories, including Data Access Governance (DAG), Data Loss Prevention (DLP), User-Driven Classification (UDC), and SaaS point solutions.
Let’s look at use cases and tools to consider:
| Use case | Tools commonly used |
|---|---|
| Immediate control action only, no recording of the classification | DLP |
| Insight into data within SaaS and on-premises environments | DAG, SaaS, File Analysis |
| Insight into data within user endpoints | UDC, DLP |
| Tagging based on user input | UDC, SaaS |
| Automatic tagging | UDC, SaaS, DAG |
Essential classification capabilities
Because data classification is an enabler for other aspects of the data lifecycle—whether triggering other data security controls or imparting strategic insights—it must meet a broad range of capabilities.
Gartner suggests, “In order for classification to work, the following conditions must be met:
They also recommend the following criteria when evaluating classification tools:
Classification technology strengths
Today’s classification tools, which encompass Data Access Governance (DAG), Data Loss Prevention (DLP), User-Driven Classification (UDC), and SaaS point solutions, generally perform well when it comes to the following capabilities:
- Coverage of file types
- Metadata reporting
- Workflow and privacy—specifically, Gartner emphasizes, “As privacy requirements expand globally, vendors are introducing privacy workflow and data subject access request (DSAR) support. These are common themes in most privacy regulations.”
Classification technology weaknesses
These same tools also carry some inherent limitations:
- Data tagging limitations—Gartner states that “tagging has a significant limitation in that it is not possible to tag all data objects. Some file types have room or even formal support for tagging in headers or document properties. However, the vast majority of file types have no such capability or are so limited as to be effectively useless for tagging or other form of labeling.” They also offer the guidance: “If the requirement is to track data across multiple dimensions (for example, internal policy, privacy, health and department), then use file analysis tools for visibility. Use classification tagging tools for policy-dependent control.”
- Classification change—They caution, “classification tools support changing classifications. But automated tools are not good at automatic reclassification based on external conditions rather than changes in content.” They advise, “focus on data that might leave the company rather than moving internally, except where internal sharing would be absolutely unacceptable. Keep your most restrictive classification label reserved for exceptional rather than common use cases.”
- Encrypted data—encryption often interferes with the detection of data. Gartner recommends as a best practice: “The safest classification approach is to treat individual encrypted files that cannot be accessed by the classification tool as sensitive and use controls to prevent their movement or access.” They further add, “the best approach for any unreadable file is to use the discovery of these files to support a review of the business process and sensitive data handling.”
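The guidance on encrypted data amounts to a "fail closed" rule: if the tool cannot read a file, treat it as sensitive and flag it for review. The sketch below illustrates that idea under simplifying assumptions. Unreadable content is modeled as bytes that do not decode as UTF-8, and `looks_sensitive` is a placeholder detector invented for this example.

```python
from typing import NamedTuple


class Result(NamedTuple):
    label: str
    needs_review: bool


def looks_sensitive(text: str) -> bool:
    """Placeholder detector (assumption): flags one illustrative keyword."""
    return "account number" in text.lower()


def classify_bytes(raw: bytes) -> Result:
    """Classify file content, failing closed when it cannot be read."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Unreadable (for example, encrypted) content: treat it as sensitive
        # and flag it so the business process behind it can be reviewed.
        return Result("sensitive", needs_review=True)
    label = "sensitive" if looks_sensitive(text) else "unclassified"
    return Result(label, needs_review=False)
```

The `needs_review` flag reflects the second half of the recommendation: discovering unreadable files is a prompt to examine how sensitive data is being handled, not only a reason to block the file.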
Data classification trends
Several technology trends are shaping data classification, among them automated classification tools, the rise of machine learning and artificial intelligence, and privacy workflows.
Gartner provides the following recommendations for technical professionals responsible for data security:
- “Ensure that a data classification policy is in place as it is the root of data security governance. It provides clarity and authority, supports control standards and underpins user awareness efforts.
- Align your security classification with any broader data governance programs. It is easy to confuse users and create technical complexity and possibly conflict. Focus on high-risk and high-value data, especially regulated data, to support such alignment.
- Use automated data classification to provide users with a baseline and you with insight. User classification is less expensive, but harder to introduce. Use both together for best results.
- Aim for “good enough” solutions, recognizing that the automated technology has limits in precision. Phase your implementation carefully to avoid diminishing returns.
- Use at least two labels. Sensitivity and either “owner” or “project/department.” Identify where the tag is to be used and provide only enough information to allow that control to work correctly.”
Redefining automated data classification
Automating data classification overcomes many of these limitations by making the process reliable, accurate, and continuous (that is, persistent). A sophisticated platform, such as Spirion Data Privacy Manager, can spot personal information by looking for data patterns, such as names, dates of birth, addresses, phone numbers, financial information, health information, and Social Security numbers. Importantly, automated systems can also reclassify data as needed, for example in response to changes within the business or in compliance regulations.
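To make pattern-based detection concrete, here is a minimal sketch using regular expressions. It is not how Spirion or any other product works. The three patterns are simplified assumptions that match only common US formats, and production tools typically add validation and context checks to reduce false positives.

```python
import re

# Simplified, illustrative patterns (assumptions, US formats only).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "date": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def find_personal_data(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

Running `find_personal_data` on a string such as `"SSN 123-45-6789, call 555-867-5309"` reports one `ssn` match and one `phone` match. Text with no matches returns an empty dict.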
Every organization should assess the data classification automation options available in the marketplace and determine which solution provides the capabilities it needs to strengthen its data privacy protection. Ideally, organizations should choose a platform that is purpose-built to deliver the key functions of a robust data privacy program. The right automation system can streamline data classification by analyzing and categorizing data automatically, continually, and in real time, based on predetermined parameters.