Automating Data Discovery and Data Classification for Enhanced Privacy
Data discovery goes hand in hand with data classification when it comes to complying with privacy regulations. Once you find sensitive data, you must tag it properly to ensure appropriate access and use throughout the organization. Proper tagging and classification means being able to find that data when responding to a consumer’s “right to know” or “right to be forgotten” request, both of which are common among data privacy regulations. Yet companies hold so much data that discovering or classifying it manually is an impossible task. Instead, companies should look to automation tools to discover and classify their sensitive data.
Together, data discovery and classification secure data by providing the critical first step in a comprehensive data privacy and security program. In fact, data discovery and classification are the first two steps of our data privacy management framework, which breaks down data privacy management into five steps: Discover, Classify, Understand, Control, and Comply. By using this framework, businesses can take a strategic approach to data management which not only protects consumer privacy but increases business benefits, too.
According to IDC’s The State of Data Discovery and Cataloging report, the ability to locate, understand, access, and trust data is a key enabler of business in the era of digital transformation. “No one driver stands out. Data discovery is important for business. Period.”
Organizations surveyed said that data discovery supports these five business drivers:
- 82% Operations and efficiency
- 81% Policy compliance
- 80% Risk reduction
- 78% Regulatory compliance
- 77% Increasing revenue
The report also found that 30% to 50% of organizations are not where they want to be when it comes to data discovery. As a result, data professionals say they are wasting on average 30% of their time because they cannot find, protect, or prepare data. Data discovery is critical to any data security operation—you cannot protect what you do not know you have.
What Makes Data Discovery and Classification So Challenging? The Data Sprawl Problem
According to two studies, the volume and variety of data are exploding across today’s business world.
IDC released a report on the ever-growing data-sphere. It predicts that businesses will generate 175 zettabytes of data by 2025 at a compound annual growth rate of 61%. A zettabyte is a trillion gigabytes, so 175 zettabytes is 175 trillion gigabytes, which gives a sense of the scope of the challenge.
This unprecedented data sprawl means that data is stored in every nook and cranny of enterprises’ IT infrastructures. As data volume and variety continue to grow, resources become less effective because data and information assets are harder to find. No one knows which files contain personally identifiable information (PII), information about particular projects, intellectual property (IP), or other valuable or regulated content. As a result, organizations struggle to protect information adequately, comply with legal mandates, weed out duplicate and redundant data, and empower employees to find the content they need to do their jobs.
Based on its prediction, the report urges CEOs to act now to ensure their data strategy is focused on storing data in small sets (categories) according to their business impact — rather than trying to analyze and use anything and everything. In other words, the more data organizations have to manage, the more data classification becomes a critical necessity.
However, companies must first know what data they have and where it is stored before being able to classify and protect it.
The Dark Data Problem and How It Affects Data Discovery
Among the mountains of data already stored and arriving daily in today’s businesses is a secret hiding in plain sight: dark data. This is unknown and unused data, and it comprises more than half the data collected by companies, creating a considerable issue, said a Splunk report, The State of Dark Data.
According to the report, 55% of all data collected by companies is dark data. Within this category lie two subcategories: data that companies know has been captured but do not know how to use, and data that they are not even sure they have. Further, 85% of companies say they are not using dark data because they do not have the tools to find it, capture it, and analyze it.
Dark data can include important customer information, such as details of a transaction, that is missing location or other important metadata because that information sits somewhere else or was never captured in a usable format. The implications are vast. When companies do not know where all their sensitive data is stored, they cannot be confident they are complying with consumer data protection measures. At the same time, data that is misused or improperly protected leaves businesses vulnerable to legal action or theft by hackers.
Due to the large amount of data that companies collect and create on a given day, manually discovering data throughout their enterprise is an insurmountable task. Instead, companies must look to automating their data discovery efforts.
Leveraging Data Discovery Automation
Today’s organizations do not have to be in the dark about their data. Intelligent data discovery automation can tell them exactly what data they have and where it resides.
Automated data discovery and inventory tools work by scanning endpoints or corporate network assets to identify resources that could contain sensitive information, such as hosts, database columns and rows, web applications, storage networks, and file shares. Sophisticated systems can find data located in all file types, including .doc, .xls, .pdf, .txt, .ppt, .zip, .csv, and .xml, among many others.
Those same tools should be able to search unstructured data, cloud repositories, endpoints, and on-premises servers to find every location where data can be stored. If they cannot, companies risk sensitive data going undiscovered and unprotected.
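As a rough illustration of the scanning step described above, the sketch below walks a single directory tree and lists files whose extensions match the types named earlier. It is a minimal example under stated assumptions, not how any particular product works: the function name build_inventory, the extension set, and the shape of the inventory records are all hypothetical, and a real tool would also cover databases, cloud repositories, and endpoints.

```python
from pathlib import Path

# Hypothetical extension list, taken from the file types named in the text.
CANDIDATE_EXTENSIONS = {".doc", ".xls", ".pdf", ".txt", ".ppt", ".zip", ".csv", ".xml"}


def build_inventory(root):
    """Walk `root` recursively and list files whose type may hold sensitive data."""
    inventory = []
    for path in sorted(Path(root).rglob("*")):
        suffix = path.suffix.lower()  # treat .CSV and .csv the same
        if path.is_file() and suffix in CANDIDATE_EXTENSIONS:
            inventory.append(
                {"path": str(path), "type": suffix, "bytes": path.stat().st_size}
            )
    return inventory
```

The resulting list is only a starting inventory of where to look. Deciding whether a listed file actually holds sensitive data is the job of the classification step.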
Automating Classification: A New Schema for New Privacy Rules
Many companies face significant challenges when it comes to data classification. A Gartner report on overcoming the pitfalls stated: “Most data classification implementations continue to be unexpectedly complex and fail to produce practical results. CISOs and information security leaders should simplify schemes, leverage tools, and allow for implementation flexibility to make classification valuable for the entire organization.”
As with data discovery, companies face hurdles when classifying the sensitive data they are able to find. Data is inextricably linked to other pieces of data (such as a person’s name tied to an address and a purchase order), and companies need to process data for different purposes (such as marketing and shipping) in different capacities. Companies must also honor a consumer’s request to opt out of certain data processing while still retaining that same data for legal reasons or other legitimate purposes. With large data sets, classifying data manually for any of these scenarios would be next to impossible for almost any organization.
The solution is an intelligent automated data classification platform that replaces a company’s manual processes or an ineffective automated process. The right automation system can streamline data classification by automatically analyzing and categorizing data against predetermined parameters, continually and in real time.
Organizations often struggle with their data classification programs because they approach them mostly as manual processes. But classifying data manually is simply too labor-intensive, time-consuming, and error-prone to be a practical solution for all but the smallest companies. In particular, manual data classification suffers from the following issues:
- Inaccuracy — Busy employees often fail to classify data at all or simply pick the first tag in the list to expedite the process.
- Inconsistency — Different people classify similar documents in different ways.
- Inflexibility — As companies’ sensitive-data requirements and regulations change, no one has the time or inclination to update the tags on terabytes of existing data.
- Failure — As users realize that data is not classified correctly, they will quit trusting the process and the whole project fails.
Automating data classification overcomes these limitations by making the process reliable, accurate, and continuous (that is, persistent). A sophisticated platform can, for example, spot personally identifiable information (PII) by looking for data patterns, such as names, dates of birth, addresses, phone numbers, financial information, health information, and social security numbers. Importantly, automated systems can also reclassify data as needed, for example after changes within the business or in compliance regulations.
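To make the idea of pattern-based PII detection concrete, here is a minimal sketch using regular expressions. Everything in it is an assumption made for illustration: the three patterns cover only simple US-style formats, the labels "restricted" and "general" are invented, and commercial platforms rely on far richer detection than this.

```python
import re

# Illustrative patterns only (simple US-style formats); not a complete PII detector.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "date_of_birth": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each PII type to the matches found in `text`."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


def classify(text):
    """Tag text 'restricted' if any PII pattern matches, otherwise 'general'."""
    return "restricted" if find_pii(text) else "general"


sample = "SSN 123-45-6789, call 555-867-5309, born 04/12/1985"
print(find_pii(sample))
print(classify("Quarterly roadmap notes"))  # prints "general"
```

Because the rules live in one dictionary, changing a pattern and rerunning classify over existing content is one simple way to picture the reclassification that manual tagging struggles with.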
Leveraging Data Classification Automation
Some companies have taken a completely automated approach to data classification. Some have taken a completely manual approach. Others have chosen a hybrid approach. However, even small organizations that have not considered full automation often come to see it as a godsend once they realize how much of their manual work can be automated, eliminating the staff hours and human errors inherent in manual processes.
Today’s automated data classification applications vary in terms of usage, access, and enforcement capabilities. But common features include:
- Pull-down menus of available data classification selections for user or file types
- Content-aware capabilities to suggest a classification for review and change or confirmation by the user
- Automatic selection of the appropriate classification levels based on content analysis engines
- Classification lifecycle policy enforcement, such as preventing a user action unless the file is classified, or preventing an unauthorized classification change
- Some limited integrated data loss prevention (DLP) functionality, often around specific use cases, such as email
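The policy enforcement feature in the list above can be sketched in a few lines. This is a hypothetical illustration rather than any vendor’s implementation: the labels in LEVELS, the data_steward role, and the function names are all assumptions. It shows the two rules the list mentions, blocking a user action on an unclassified file and blocking an unauthorized classification change.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels and role names, invented for this illustration.
LEVELS = ("public", "internal", "confidential")
ROLES_ALLOWED_TO_RECLASSIFY = {"data_steward"}


class PolicyViolation(Exception):
    """Raised when an action breaks the classification policy."""


@dataclass
class Document:
    name: str
    classification: Optional[str] = None  # None means not yet classified


def share(doc):
    """Block sharing until the document carries a classification."""
    if doc.classification is None:
        raise PolicyViolation(f"{doc.name} must be classified before sharing")
    return f"shared {doc.name} as {doc.classification}"


def set_classification(doc, level, role):
    """Anyone may apply the first label; changing it needs an authorized role."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if doc.classification is not None and role not in ROLES_ALLOWED_TO_RECLASSIFY:
        raise PolicyViolation(f"role {role!r} may not change an existing classification")
    doc.classification = level
```

In this sketch an employee can label a new file once, but only a data steward can later change that label, and nothing can be shared until a label exists.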
Every organization should assess the data classification automation options available in the marketplace and determine which solution can provide them with the capabilities they need to take their data privacy protection to the next level. Ideally, organizations should choose a platform that is purpose-built to deliver the key functions of deploying robust data privacy programs.
Even better, organizations need to ensure that their platform can operationalize data classification in ways that support their ideal classification schema, including the addition of data processing, purpose, and privacy.
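One way to picture a schema that carries processing purpose and privacy choices alongside sensitivity is sketched below. The field names and the two helper functions are assumptions made for illustration and do not describe any specific platform. The example mirrors the earlier scenario of a consumer who opts out of one kind of processing while the company must still retain the data for legal reasons.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationTag:
    """Hypothetical tag combining sensitivity with purpose and privacy fields."""
    sensitivity: str                                         # e.g. "pii"
    allowed_purposes: set = field(default_factory=set)       # e.g. {"shipping"}
    opted_out_purposes: set = field(default_factory=set)     # consumer opt-outs
    legal_hold: bool = False                                  # must be retained


def may_process(tag, purpose):
    """Processing is allowed only for a permitted purpose with no opt-out."""
    return purpose in tag.allowed_purposes and purpose not in tag.opted_out_purposes


def may_delete(tag):
    """Data under a legal hold is retained even if the consumer opted out."""
    return not tag.legal_hold


tag = ClassificationTag(
    sensitivity="pii",
    allowed_purposes={"marketing", "shipping"},
    opted_out_purposes={"marketing"},
    legal_hold=True,
)
print(may_process(tag, "marketing"))  # prints False
print(may_process(tag, "shipping"))   # prints True
print(may_delete(tag))                # prints False
```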
Automation Supports Cyber Resilience
A recent Ponemon study on cyber resilience made a strong case for the power of automation within organizations’ security programs. Yet many organizations still have not made the leap to automation. The study found that only 23% of organizations use automation extensively, versus 77% of respondents who use automation only moderately, insignificantly, or not at all.
The report cited six benefits from automating security, including these two related to data privacy and cybersecurity:
- High automation organizations recognize the value of the privacy function in achieving cyber resilience. Moreover, high automation organizations are more likely than the overall sample to recognize the importance of aligning the privacy and cybersecurity roles in their organizations (71% vs. 62%). Most recognize that the privacy role is becoming increasingly important, especially due to the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
- High automation organizations are more likely to say their organizations have the right number of security solutions and technologies. This can be accomplished by aligning in-house expertise to tools so that investments are leveraged properly.
Why Automate Data Discovery and Classification? Data-Privacy Readiness
Data Discovery and Classification are the first two steps of Spirion’s Data Privacy Management Framework. These form the bedrock of any data privacy program because you must first know where your sensitive data resides and what levels of protection it needs. Without those two fundamental steps, your company will not be able to get a full picture of its data for privacy compliance.
In the United States and internationally, the data privacy regulation landscape is constantly changing. Compliance with one set of laws does not guarantee compliance with another. Instead, companies must take a proactive approach in which they understand the entirety of their data, so they can comply with any privacy regulation, current or future.
That understanding starts with the first two steps of the Data Privacy Management Framework: Data Discovery and Data Classification. Only by knowing where your sensitive data resides and what level of protection it needs can you begin to make sense of any privacy compliance requirement. But when your organization handles mountains of data, discovering and classifying it manually is impossible. Instead, look to automated data discovery and classification tools for peace of mind and a better understanding of your company’s data.