Data classification demystified via automation

About the author

With a 15-year career that has spanned telecommunications, geospatial analysis, machine vision and more, Jason Hodgert focuses on making technologies logical, accessible, and understandable to mass audiences. As Product Marketing Manager at Spirion, Jason finds the areas where product functionality meets market requirements to help make privacy solutions available to all.

In “Marge vs. the Monorail,” one of my favorite episodes of The Simpsons, Lyle Lanley, voiced by the late Phil Hartman, says: “You know, a town with money’s a little like the mule with the spinning wheel. No one knows how he got it and danged if he knows how to use it.”

I think about this quote in connection with data privacy way more often than I’d ever expect. You see, all one has to do is replace “town” with “enterprise” and “money” with “sensitive data” and it’s a story that repeats itself far too often in data security.

Clients we work with choose Spirion because they understand that large numbers of unknowns are living in their networks. Personally Identifiable Information (PII), Protected Health Information (PHI), Payment Card Information (PCI), and Intellectual Property (IP) are just a few of the ticking time bombs that can be scattered throughout an organization if not identified and handled appropriately.

Organizations clearly understand the identification part of that equation; it is, in fact, what brings many customers to Spirion in the first place. Where many have trouble is with “what’s next?” What does proper handling look like, and how is it implemented? For many, the logical next step is data classification. When data is classified, organizations can more easily develop frameworks that simplify security procedures and deliver compliance with corporate policies.
Data classification, however, is something that has historically been easier said than done. Even with advances in automation streamlining the process, many still see classification as an unnecessary burden on workflows and a hindrance to productivity.

The need for data classification

Data classification is a simple concept: a system by which a level of confidentiality and defined handling instructions are applied to each piece of information under one’s domain.

Businesses create data classification schemes to define the levels of confidentiality required for each piece of information maintained by the organization. A simple data classification scheme might include classifications such as:

  • Public – data available to anyone
  • Internal – data that could be problematic if it were to be seen by persons outside the organization
  • Confidential – often defined by department. For example, company personnel files often remain confidential to the HR department
  • Restricted – the highest level of sensitivity, offering the least mobility to data. Intellectual property and various forms of PII live here.
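A scheme like this is easy to express in code. The sketch below models the four sample levels as an ordered enum with handling instructions attached; the rule names and values are illustrative assumptions for this article, not any product’s actual configuration.

```python
from enum import IntEnum

class Classification(IntEnum):
    """Ordered levels from the sample scheme: higher value = more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical handling instructions per level (illustration only)
HANDLING = {
    Classification.PUBLIC:       {"encrypt_at_rest": False, "share_externally": True},
    Classification.INTERNAL:     {"encrypt_at_rest": False, "share_externally": False},
    Classification.CONFIDENTIAL: {"encrypt_at_rest": True,  "share_externally": False},
    Classification.RESTRICTED:   {"encrypt_at_rest": True,  "share_externally": False},
}

def handling_for(level: Classification) -> dict:
    """Look up the handling instructions attached to a classification level."""
    return HANDLING[level]
```

Because the levels are ordered, a policy engine can use simple comparisons (for example, `level >= Classification.CONFIDENTIAL`) to decide when stricter controls kick in.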

Without a classification scheme, organizations treat all information the same. This increases the probability that sensitive data will not have adequate security controls. Conversely, it also means that low-sensitivity data will have more security controls than necessary, leading to reduced productivity and workforce frustration.

Fighting remediation hesitation with intelligent rights management

Classification, in and of itself, provides very little in the way of true protection, but it does provide a great indicator of how to best protect files depending on their classification. An increasingly popular and intelligent way to do so is Information Rights Management.

In situations where organizations don’t want to delete, quarantine, or encrypt data with broad strokes, information rights management, as delivered by companies like Seclore, can provide the security required without stripping the valuable business utility out of files containing sensitive data.

Rights management and encryption are not the same thing

While it’s true that both Encryption and Rights Management are focused on the same use case – restricting data access to only the specific individuals that require it – the two technologies differ in several ways.

  • Granularity: With encryption, once data are decrypted by genuine, authorized users, the security is removed and information has the potential to be leaked. Users with access to the encryption key can edit the content, print copies, and can even copy all the content to other files and locations. In other words, encryption cannot secure data from unauthorized usage by authorized users the way a full-featured IRM solution can.
  • Flexibility: Encryption is a binary technology: authorized users get either full access or no access. With Seclore rights management, levels of rights can be applied to restrict recipients from editing the data, printing it, or copying anything from it to an insecure location.
  • Auditing: File encryption cannot be used to track or audit activities performed on your data. With rights management it is possible to track who is accessing it, what they are doing with it, when they are doing it, and so on.
  • Access Expiry: With encryption, it’s not possible to expire access to the information remotely. Once an encrypted file is sent to someone, it stays theirs, forever. Rights management provides the ability to either distribute with a predefined expiry date or expire the access remotely from a central console.
  • Adjustability: Once an encrypted file is sent out, it is not possible to make changes to the encryption policy applied to it. With rights management it is possible to revoke or adjust access for certain users even after the file has been shared.
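The contrast in the list above can be made concrete with a small model. The sketch below is a hypothetical rights-management check, not Seclore’s actual API: each recipient gets a per-action grant that can carry an expiry and be revoked after sharing, whereas plain encryption reduces to a single yes/no on holding the key.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

@dataclass
class Grant:
    """Per-recipient rights on a protected file (hypothetical model)."""
    can_view: bool = True
    can_edit: bool = False
    can_print: bool = False
    expires_at: Optional[datetime] = None  # None = no scheduled expiry

@dataclass
class ProtectedFile:
    name: str
    grants: Dict[str, Grant] = field(default_factory=dict)

    def allowed(self, user: str, action: str) -> bool:
        """Granular check: right user, right action, access not expired."""
        grant = self.grants.get(user)
        if grant is None:
            return False
        if grant.expires_at and datetime.now(timezone.utc) >= grant.expires_at:
            return False  # access expired -- unlike a decrypted copy
        return getattr(grant, f"can_{action}", False)

    def revoke(self, user: str) -> None:
        """Adjust policy after sharing -- impossible with plain encryption."""
        self.grants.pop(user, None)
```

In this toy model, a recipient can be allowed to view but not print, their access can lapse on a date, and the owner can pull access back entirely; an encryption-only scheme has no lever for any of those once the ciphertext and key have left the building.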

Traditional classification methods are slow and unreliable

For years, the best way to obtain a fully classified dataset was to have human operators, often the creators of the data, manually assign a classification to each piece of data as it passed through their daily work. Even today, there is a belief that performing these steps manually provides much better contextual analysis of documents, but as the number of files grows, classification levels gain complexity, and rules increase, the weaknesses in manual processes start to show; not the least of which is the impediment they represent to daily operations.

Six gotchas of manual data classification

  1. Manual Classification is inaccurate: Proponents of manual classification techniques will say that only the data owner/creator has the subject knowledge necessary for accurate classification, but it’s more likely that any individual classifying a dataset has a set of core competencies that are far from what’s required to accurately identify the sensitivity of the data they’re working with. Keeping every employee aware of corporate classification policies requires an extensive and consistent training regimen that is difficult to maintain.
  2. Manual Classification is subjective: One person’s internal might be another’s confidential. For example, a product manager or executive leader with many cross-functional contacts may see value in sharing the exact same document that an engineer would classify at a much higher level to protect IP that may or may not even be present.
  3. Manual Classification is inconsistent: Variations in classification do not necessarily require diversity in end users. Even a single individual might classify similar content inconsistently depending on any number of factors that have nothing to do with its sensitivity. Personal workloads, deadlines, and even the time of day can lead to inconsistency across an organization. Commonly, a user will err on the side of caution and “over classify” files in an attempt to keep the data safe, but end up adding unnecessary controls that create headaches and roadblocks to productivity in other areas of the organization. The next day, that same user may apply a lower classification to sensitive data to facilitate sharing with an outside contractor.
  4. Manual Classification doesn’t solve for the right problem: Manual classifications often unnecessarily account for the intended audience of the file(s) in question. When applying a classification label to files manually, users tend to think of the intended audience for said file instead of focusing on the sensitivity of the data within. Individual departments will want to restrict their files departmentally out of an overabundance of caution. This creates islands of information and barriers to productivity.
  5. Manual Classification is tedious: Manual classification technologies work often by launching distracting pop-up dialogues when files are saved, updated, or moved. At best, this represents a disruption that slows down productivity. At worst, users become so accustomed to the disruption, they instinctively click the buttons that get them back to work the quickest regardless of the actual sensitivity of the data.
  6. Manual Classification ignores data at rest and cannot account for data evolution: When files are classified at the time of creation, it can create a “set it and forget it” mentality. As policies evolve and data ages, the accuracy of the original classification comes into question. Finally, while proponents of manual classification will argue for enlisting the data owner in applying it, manual processes do not translate or scale well to classifying the petabytes of data already living in databases, file shares, and cloud repositories.

Follow the data

Ideally, any classification exercise should be objective, consistent, thorough, and above all, accurate. When organizations start with accurate data discovery, the rest of the classification journey becomes far more streamlined. Instead of relying on internal politics and departmental objectives, classification schemes can be set based on the observed sensitivity of the existing data. Automated application of labels and machine-readable metadata becomes objective instead of subjective, and the whole process is invisible to the end user. When paired with the powerful rights management provided by Seclore, the end result is data protected against exfiltration with controls that maintain its business value by facilitating the safe flow of data within and outside the enterprise.
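As a toy illustration of discovery-driven labeling, the sketch below scans text for a few sensitive-data patterns and maps what it finds onto a label from the sample scheme earlier in the article. The regexes and the mapping are deliberately naive assumptions for demonstration; real discovery engines such as Spirion’s rely on far richer detection (checksums, context, validation) than a handful of patterns.

```python
import re

# Deliberately simplistic patterns -- illustration only, not production detection
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> str:
    """Assign a label based on the observed sensitivity of the content."""
    found = {name for name, pattern in PATTERNS.items() if pattern.search(text)}
    if {"ssn", "credit_card"} & found:
        return "Restricted"   # regulated identifiers -> highest sensitivity
    if "email" in found:
        return "Internal"     # contact data -> keep inside the organization
    return "Public"
```

The point is the shape of the workflow, not the patterns: the label follows from what the scan actually finds in the file, so two files with identical content always get the same classification, regardless of who created them or when.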

It’s this foundation of seamless, reliable accuracy that allows Spirion customers to confidently move forward with the no-longer-daunting prospect of data classification. By deploying an end-to-end automated discovery, classification, and remediation solution, organizations are no longer analogous to the aforementioned mule with the spinning wheel. They have a complete understanding of the sensitive data they have and know exactly what to do with it.
