The role of classification in DLP strategies

SC Magazine released its review of classification solutions and weighed in on their importance in DLP strategies. The full review can be found here.

At the risk of stating the obvious, the core mission of information security is protecting sensitive data. Sensitive data means different things to different people, so that mission depends on accurately classifying data into what is sensitive and what does not pose a risk of harm. Hackers make headlines when they compromise valuable data, be that financial information, personal data, state secrets, or simply embarrassing emails whose publication draws unwanted and damaging public attention.

Headlines are not made, nor are fines levied, when a successful data theft gives hackers access to, say, an organization’s office supply purchasing data or the monthly condiment consumption in the company cafeteria. Such hacks, if they occur, are certainly unnerving to the IT security department because they reveal exploitable openings, but they don’t make the evening news, nor do they result in financially meaningful damages.

So, if the name of the game is protecting sensitive (aka valuable) data from unwanted access or distribution, the first step needs to be locating that data. IT departments can’t protect what they can’t find, and in order to protect the valuable data they do find, they also need to know how to classify it into data sensitivity categories on which they can then focus their protection strategies.

For this reason, any protection strategy needs to include not only data discovery, that is, finding where the sensitive data is, but also a relevant and precise data classification method, so that sensitive data can eventually be monitored and protected. Different classification approaches can apply. Common are military-derived approaches, where data is separated into categories or classification labels such as public, private, confidential, or secret.

However, other classification methods are possible that address the needs of commercial enterprises, such as classifying data by how much post-breach damage its loss could do, that is, by business impact. For example, losing financial information or healthcare data that falls under compliance regulations can be as costly as losing corporate secrets.

Regulation- or compliance-based classifications are also possible, e.g. classifying data that falls under PCI, HIPAA, SOX, or other data privacy regulations. Custom classification schemes are possible as well. The above-mentioned corporate secrets are one example; another is a law firm, where certain clients’ data is considered sensitive and needs to be found, classified as such, and safeguarded.
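As a rough illustration of how a regulation-based label and a custom label might be assigned to a piece of text, consider the minimal sketch below. It is not how any particular product works: the regular expression, the label names "PCI" and "CUSTOM", and the client keyword list are all assumptions made for the example. Candidate card numbers are confirmed with the standard Luhn checksum to cut down on false matches.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

# Hypothetical custom scheme: client names a law firm treats as sensitive.
CLIENT_KEYWORDS = {"acme corp"}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set[str]:
    """Return the set of (assumed) classification labels that apply to text."""
    labels = set()
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        labels.add("PCI")
    lowered = text.lower()
    if any(keyword in lowered for keyword in CLIENT_KEYWORDS):
        labels.add("CUSTOM")
    return labels
```

Calling classify("Card: 4111 1111 1111 1111 on file") would tag the text as "PCI", since that widely used test number passes the Luhn check, while a random 16-digit order number would most likely be left untagged.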

Thus, classification methods are a function of what type of data needs to be protected, as well as what types and magnitudes of consequences are supposed to be prevented, all while allowing the data to be properly used for legitimate business purposes. Data then needs to be classified accordingly, and the more accurate the search and classification algorithms of a classification solution, the more types of sensitive data can be identified and classified accurately.

However, achieving compliance is not the same as achieving data security, and thus additional considerations come into play when thinking about data classification: When and how do you classify sensitive data? How often do you need to classify your data? Who will perform the task of classifying the data?

Though often thrown into the same bucket, monitoring and classifying data as it crosses the perimeter of a network (“data in motion” or DIM) is quite different from classifying data that is stored (“data at rest” or DAR). Data-in-motion-based classification strategies, such as those implemented as part of DLP solutions, are often seen as difficult to implement, and they can be ineffective if perpetrators have gained insider credentials.

A decision also needs to be made whether DAR-based classification technologies should complement or even replace DIM-based approaches to ensure that all sensitive data, whether structured or unstructured, is discovered and accurately classified. Recent, well-publicized breaches have highlighted this gap in security strategies that focus predominantly on exfiltration and/or infiltration. If the bad guys manage to penetrate a network perimeter, or sensitive data makes it out, which has increasingly been shown to happen, especially when attackers are credentialed, data-at-rest discovery and classification technologies are a needed additional line of defense and privacy protection to ensure sensitive data does not get into the wrong hands.

When data discovery and classification take place in an automated fashion and in near real-time, the window of opportunity between the creation of new sensitive data and its discovery and classification closes. Manual classification approaches, besides being subjective and prone to human error, therefore cannot alone be the solution for sensitive data classification; in some cases, the time lag between the creation of sensitive data and its manual discovery and classification can be months.

For automated discovery and classification solutions to be effective as complementary or even alternative security strategies, they have to be highly accurate, with low or zero rates of both false negatives and false positives. That accuracy depends on the ability to cover all data locations and all data types and formats, through highly precise search and classification algorithms.
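To make the two error rates concrete, here is a small sketch of how they could be computed for a scan whose results have been checked against a hand-labeled sample. The function name and its inputs are assumptions for illustration; the definitions themselves are the standard ones (false positive rate = false positives / all truly non-sensitive items, false negative rate = false negatives / all truly sensitive items).

```python
def error_rates(
    flagged: set[str], sensitive: set[str], all_items: set[str]
) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for one scan.

    flagged   -- items the scanner marked as sensitive
    sensitive -- items that are truly sensitive (hand-labeled ground truth)
    all_items -- every item that was scanned
    """
    false_positives = len(flagged - sensitive)   # flagged but harmless
    false_negatives = len(sensitive - flagged)   # sensitive but missed
    negatives = len(all_items - sensitive)
    positives = len(sensitive)
    fpr = false_positives / negatives if negatives else 0.0
    fnr = false_negatives / positives if positives else 0.0
    return fpr, fnr
```

A false negative is sensitive data the scan never found, so it stays unprotected; a false positive wastes protection effort on harmless data. Both rates need to be low for the classification to be trusted.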

Once implemented, such data-at-rest discovery and classification solutions have two principal benefits. First, precisely locating and tagging sensitive data minimizes post-breach losses, because the bad guys can’t steal what they can’t find. Second, they allow organizations to target their security spend more efficiently: encryption technologies, for example, can be focused on the locations where sensitive data actually resides, and egress controls can be tuned to monitor the types of sensitive data found to be the prevalent sources of risk for an organization.
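One simple way to picture the targeting of security spend is to rank locations by how many sensitive findings a scan produced there. The sketch below assumes a hypothetical list of (location, label) pairs as scan output; the location names and labels are made up for the example.

```python
from collections import Counter


def rank_locations(findings: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank locations by number of sensitive findings, highest first.

    findings -- (location, label) pairs, one per sensitive item found
    """
    counts = Counter(location for location, _label in findings)
    return counts.most_common()


# Hypothetical scan output.
findings = [
    ("fileserver-01", "PCI"),
    ("fileserver-01", "HIPAA"),
    ("laptop-17", "PCI"),
]
ranking = rank_locations(findings)
```

The locations at the top of such a ranking would be the natural first candidates for encryption or tighter access controls.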

So, sensitive data classification, when done accurately and in real time, will reduce both the residual, non-zero risk of post-breach damages and the security investment required to protect against sensitive data loss.