What Unstructured Data is Sensitive?


“Threat actors are having more success with breaching and exfiltrating sensitive unstructured data targets.”


Your organization’s sensitive unstructured data is a rapidly growing threat surface that cybercriminals and other threat actors increasingly target. Although more attacks are directed at structured databases, cybercriminals are having greater success stealing sensitive unstructured data.

This is because unstructured data poses a distinct set of security and privacy-compliance challenges, many of which are not addressed by today’s investments in network, device, and application security, cybersecurity frameworks, or traditional vulnerability management strategies.

Unlike structured data, which resides inside well-protected IT perimeters, much sensitive content exists in unstructured formats such as office documents, CAD/CAE files, and images, and it is distributed and published via file sharing, social media, and email. You generate it when HR collects personal employee information, when your sales teams add customer contact information to your customer relationship management (CRM) system, when your engineering and security teams collaborate with third parties on intellectual property (IP), and so on.

Common examples of sensitive unstructured content include:

  • New product plans
  • Product designs
  • Customer information
  • Supplier information/third-party contracts
  • Competitor research
  • Customer surveys
  • Software code
  • Job applications and employee contracts
  • Internal process and procedure manuals
  • Data analytics reports (e.g., Google Analytics, Tableau, and Salesforce reports)

Regulations that may govern this content include:

  • California Consumer Privacy Act (CCPA)
  • General Data Protection Regulation (GDPR)
  • Health Insurance Portability and Accountability Act (HIPAA)
  • Gramm–Leach–Bliley Act (GLBA)
  • Personal Information Protection and Electronic Documents Act (PIPEDA)
  • New York State Department of Financial Services (NYDFS) Cybersecurity Regulation
  • Payment Card Industry Data Security Standard (PCI DSS)

“A dangerous gap has emerged …”

Sharing and storing sensitive information in free-form documents that live outside carefully monitored or secured databases is now a widespread practice. This creates a gap that presents countless opportunities for unauthorized disclosure through inadvertent handling by employees, actions of malicious insiders, and cyberattacks.

Businesses are mobilizing to combat these threats. The first step is to ensure your organization understands the character, significance, and challenges of sensitive unstructured data. Focus on the following topics to build organizational insight into why the gap exists and what can be done now to close it.

  • Who cares about it?
  • What are the types?
  • How are sensitivity levels determined?
  • What are the next steps?

Who cares about sensitive unstructured data?

Unauthorized access or loss of sensitive data hurts your competitive advantage, damages your brand, and can incur significant regulatory penalties.

A breach also drives customers away: a share of customers will stop spending for several months after a breach, and some will never return to your brand.

Beyond lost customers and revenue, several other groups care about sensitive unstructured data:

  • A breach of partner information exposes the business to legal damages and seriously harms the relationship and reputation of both parties.
  • Regulators are responding to growing threats and expanding individual rights. Over 80 countries now have published privacy laws. Penalties for non-compliance are rising and being enforced more strictly, and your data may be subject to overlapping and often conflicting requirements.
  • Corporate Governance, Risk and Compliance (GRC) committees define sensitivity levels and handling policies for sensitive information. Those policies must be updated to reflect new threats and trends so they can guide the systems and procedures that safeguard this content.
  • Security and IT professionals have focused heavily on network perimeter tools, and gap analyses reveal shortfalls in safeguarding unstructured data. To close them, these teams are turning to data-centric approaches and tools that protect the data itself rather than its location.
  • Employees create and share office documents, PDFs, and CAD/CAE files daily, both internally and externally, and should protect that content according to its sensitivity level (e.g., confidential, internal, public).

What are the types of sensitive unstructured data?

Sensitive data is any information that must be safeguarded from unauthorized disclosure. It falls into two broad categories: regulated and unregulated. Regulated data must, by law, be handled as sensitive. Unregulated data includes both business-sensitive and publicly known information, and it is up to the business to decide which of this content it deems sensitive.

Regulated data arises from:

  • Privacy regulations: information that personally identifies an individual and associates that individual with financial, healthcare, and other data.
  • Industry regulations: industry-specific sensitive data, such as technical data about a weapon system governed by the International Traffic in Arms Regulations (ITAR), or information about critical electric infrastructure subject to North American Electric Reliability Corporation (NERC) standards.


Protected Health Information (PHI), Personally Identifiable Information (PII), and payment card data covered by the Payment Card Industry Data Security Standard (PCI DSS) continue to form the traditional definition of individual privacy. By gaining access to this valuable data, cybercriminals can steal identities or compromise bank accounts to turn a quick profit.

Modern privacy regulations, such as GDPR and CCPA, have broadened the definition of regulated information to include individuals’ interactions in the digital space, placing significant new obligations on companies.

Unregulated data of a sensitive nature is determined by the business. It is data the business doesn’t want exposed and can be strategic, competitive, financial or operational in nature. Examples include:

  • IP: Patents, trademarks, formulas, R&D programs, source code
  • Strategic: Pending financial releases, on-going M&A transactions, internal risk deliberations
  • Operations: Inventory levels, pricing policies, customer lists

Today’s cybercriminals are opportunistic: they look for companies that are involved in a current event or that have an obvious vulnerability they can exploit for the most value. Examples include stealing data about important drugs or vaccines under development, or exposing damaging information from an ongoing legal proceeding.

Interestingly, breaches of unregulated sensitive content often remain a hidden secret. Unlike regulated data, this content is not subject to mandatory disclosure, so organizations often choose to stay silent to avoid the reputational damage of publicizing a breach.

How are sensitivity levels determined?

Regulated data is always sensitive. Most unregulated data is not, since it includes publicly known information.

Your corporate GRC team or chartered committee determines what data is sensitive. They consider all internal and external mandates, the nature of the data, how it is being used, the likelihood of a breach, and its overall impact on your organization (financial and reputational).

Helpfully, policies have become standardized across industries, with “templates and toolkits” that leave little to chance and that you can implement with reasonable effort.

Best practices recommend three classification levels (e.g., confidential, internal, public), or four at most. Experience shows that with more levels the distinctions become too fine for employees to assess, which results in subjective and inconsistent labeling.
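To make the three-level scheme concrete, here is a minimal Python sketch of how an automated classifier might assign labels, assuming the confidential/internal/public levels above. The detectors (a U.S. Social Security number pattern and a Luhn-checked payment card pattern) and the `BUSINESS_SENSITIVE_TERMS` keyword list are hypothetical stand-ins for whatever your GRC policy actually defines; real classification tools use far richer detection. The sketch encodes the rule that regulated data is always sensitive, and it assumes, as a design choice, that content not explicitly marked public defaults to internal.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    # Assumed three-level scheme; a higher value means more restrictive handling.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical detectors for regulated data (illustrative only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

# Hypothetical business-defined terms marking unregulated but sensitive content.
BUSINESS_SENSITIVE_TERMS = ("merger", "acquisition", "pricing policy", "customer list")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_regulated_data(text: str) -> bool:
    """Flag text that appears to contain an SSN or a Luhn-valid card number."""
    if SSN_PATTERN.search(text):
        return True
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


def classify(text: str, is_public: bool = False) -> Sensitivity:
    """Assign one of the three assumed sensitivity levels to a piece of content."""
    if contains_regulated_data(text):
        return Sensitivity.CONFIDENTIAL  # regulated data is always sensitive
    if any(term in text.lower() for term in BUSINESS_SENSITIVE_TERMS):
        return Sensitivity.CONFIDENTIAL  # business-defined sensitive content
    if is_public:
        return Sensitivity.PUBLIC  # only content explicitly marked public
    return Sensitivity.INTERNAL  # assumed default for everything else


if __name__ == "__main__":
    print(classify("Employee SSN: 123-45-6789").name)
    print(classify("Draft acquisition memo").name)
    print(classify("Press release text", is_public=True).name)
```

Keeping only three ordered levels means that when several rules fire, the most restrictive label can be chosen with a simple comparison, which mirrors the guidance to avoid fine-grained distinctions.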

To move from templated policies to meaningful execution, it’s critical that the GRC team help security and IT professionals in your organization prioritize sensitive unstructured data tasks by directing attention to factors such as:

  • Not all data leaks are equal: The business impact varies depending on the sensitivity of the data and the extent of exposure. Determine what sensitive data, if lost, would hurt your company’s finances and reputation the most.
  • Identify how your sensitive data is shared and stored: What data is at highest risk of being stolen? Not all threats are external. Insider threats are responsible for some of the costliest breaches.
  • Employees: Verizon’s 2020 Data Breach Investigations Report states “employees mistakes account for roughly the same number of breaches as external parties who are actively attacking you.” Education, automation, and centralized controls are critical.

What are the next steps?

The dynamics surrounding sensitive unstructured data can be daunting, but focusing on a few key steps provides a meaningful path forward:


Consider current trends and update best practices. Most organizations have some form of GRC policies, but their focus has been on the security and handling of structured data. Locate all potential sources of unstructured data, regardless of sensitivity. This helps operationalize the process and keeps your project on task.


Look for gaps in your security infrastructure, and take advantage of data-centric approaches, processes, and tools that safeguard the data itself rather than the places it resides (servers, laptops, mobile devices).


Employees need one thing: to get their work done. They will benefit most from automated sensitive data classification that minimizes the impact on their workflows, and they will be more receptive and committed to the effort if the policies are clearly communicated to them.
