Streamline and Operationalize Security and Privacy Initiatives
Sensitive Unstructured Data

Leading organizations are discovering how a protect first, file-centric approach fortifies data security and enhances data visibility to comply with privacy regulations like GDPR and CCPA. Now, learn how this approach simplifies implementation and operations to fast track your security and privacy initiatives.


Today’s Data Loss Prevention (DLP) and data security analytics solutions are challenging to deploy and manage. These solutions repetitively apply complicated rules and analytics at each location where data travels to identify misuse.

Common shortfalls include:

  • Rule-sets and analytics only monitor but don’t protect the data itself
  • Responding to alerts, including false-positives, overwhelms security staff
  • Inappropriately applied rules block user workflows
  • Implementation is required at each email, network, endpoint and cloud location

A protect first approach takes a more direct path to safeguarding files that contain sensitive data. At its core is a file-centric technology. A file with sensitive data is discovered, classified and secured the moment it’s created. This one-time detect-and-secure method:

  • Encrypts and binds identity and access to the file itself for strong protection
  • Eliminates continuous monitoring and alert administration
  • Uses transparent and seamless protection that doesn’t disrupt workflows
  • Protects the file independently of server, storage or device

By working at the file level, this approach creates a sequence of efficiencies that simplify and streamline data discovery, classification, protection, audit and policy management. When deployed as an integrated platform, the approach delivers a high degree of automation with centralized controls.

Here’s how organizations use the protect first approach to keep their security and privacy projects on a fast track.


Discover and Learn

Make informed policy decisions by keeping initial discovery general. Start by looking for file extensions like docx, xlsx, jpg and dwg to gain insights ahead of more complex security and privacy scans.


Prioritize Inventory

Focus on your active data first: files that were accessed in the past year. Find out what the data is, where it is going, who is accessing it and how it is being used. This is likely your most valuable and vulnerable inventory.



Focus on security needs first. Employ a single fundamental - if it’s sensitive, secure it. Find and protect the most prevalent and common forms of sensitive content first for quick wins.


Protect, Not Alert

Secure the data itself. Eliminate repetitive content and analytic scans at every server, cloud service, application or device. With no ensuing alerts to burden them, security administrators can concentrate on more pressing security matters.


Platform Solution

Automate and centralize processes. Apply policies and controls across your entire unstructured data inventory with minimal operational overhead or disruption to business workflows.


Now let’s take a more detailed look at how each of these activities can keep your project moving.

Discover to Learn

Keep initial discovery simple to gain a first-pass understanding of your data inventory and where security gaps exist

Searching for common file extensions will provide valuable insights into the kind of sensitive information you have and where it is located. The discovery tool searches file shares, desktops, laptops, other endpoints and mapped drives. This snapshot will give you the location of all files, volume of file types you have, who the file owner is, which department it belongs to, and the last date it was accessed.
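A first-pass discovery scan of this kind can be sketched in a few lines. This is a minimal illustration rather than how any particular discovery tool works: the function names `discover` and `summarize` and the record fields are assumptions made for the example. It covers only location, file type, size and last-accessed date, since owner and department lookups depend on your platform and directory service.

```python
import os
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Extensions named in the text as a starting point; extend as needed.
TARGET_EXTENSIONS = {".docx", ".xlsx", ".jpg", ".dwg"}


def discover(root):
    """Walk `root` and return one record per file with a target extension."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            extension = path.suffix.lower()
            if extension not in TARGET_EXTENSIONS:
                continue
            info = path.stat()
            records.append({
                "path": str(path),
                "extension": extension,
                "size_bytes": info.st_size,
                # Last-accessed time as reported by the file system.
                "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
            })
    return records


def summarize(records):
    """Count files per extension for a first-pass view of the inventory."""
    return Counter(record["extension"] for record in records)


if __name__ == "__main__":
    # Hypothetical demo tree; point `discover` at a real share instead.
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "finance").mkdir()
        (Path(root) / "finance" / "budget.xlsx").write_text("demo")
        (Path(root) / "readme.txt").write_text("demo")
        print(summarize(discover(root)))
```

Keeping the first pass this simple means the scan finishes quickly and the results are easy to explain to stakeholders before any content inspection begins.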

Use basic insights to formulate priorities

By focusing on the primary goal, safeguarding sensitive unstructured data, you might quickly find that files owned by Human Resources (HR), Research and Development (R&D) or Finance have spread outside their designated file storage locations. If these sensitive files are on employees' laptops, on removable drives or are shared with third parties, the data is at high risk of exposure and should be assessed as an early priority target.


Too often projects lose momentum as governance, legal, compliance, IT and security work across multiple departments to gather requirements and develop policies. Overcome inertia and engage with your data inventory to help drive informed policies.

Divide and Conquer

Focus on data that your organization currently generates, accesses and shares. Set older, dated inventory on a separate remediation path. Finally, assess all data for its value, especially “dark” and redundant, obsolete or trivial (ROT) data.

As a general rule, data less than one year old often represents less than 25% of corporate data

Current and active data is typically what matters to your business today and likely the most valuable to threat actors. Target this subset of data first and use the experience to fine tune your policies. Move this current data onto downstream classification and protection processes first to get sensitive data protected and under control as quickly as possible.
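Splitting an inventory into current and dated data can be as simple as comparing each file's last-accessed date with a cutoff. The sketch below assumes inventory records are dictionaries with a `last_accessed` timestamp; that shape and the name `split_by_activity` are illustrative assumptions. Note that some file systems are configured not to update access times, so treat the result as a first approximation.

```python
from datetime import datetime, timedelta, timezone


def split_by_activity(records, now=None, active_window_days=365):
    """Split records into (active, dated) using the last-accessed timestamp."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=active_window_days)
    active = [r for r in records if r["last_accessed"] >= cutoff]
    dated = [r for r in records if r["last_accessed"] < cutoff]
    return active, dated


if __name__ == "__main__":
    # Hypothetical inventory records for illustration only.
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    inventory = [
        {"path": "/shares/hr/offer.docx", "last_accessed": now - timedelta(days=30)},
        {"path": "/shares/rd/old_spec.dwg", "last_accessed": now - timedelta(days=900)},
    ]
    active, dated = split_by_activity(inventory, now=now)
    print(len(active), "active;", len(dated), "dated")
```

The active list moves on to classification and protection first, while the dated list follows its own remediation path.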

Set dated inventory on a separate path

Consider various remediation paths for older inventory to limit its risk exposure while prioritizing current data. Data discovered in unauthorized locations should immediately be moved to approved file-shares. For departments known to deal with a large amount of sensitive data (e.g., legal, audit, HR), use bulk folder-based or in-place encryption methods to protect the data inventory in the interim.

They can’t steal what you don’t have

As much as 52% of data stored by organizations is “dark” data, the value of which is undetermined, and 33% is redundant, obsolete or trivial (ROT). Data discovery will surface all files for review, and the deep visibility enabled by a file-based method will identify duplicates and file derivatives (i.e., files that have been renamed or saved in a different format). Eliminate ROT immediately and engage data owners to assess dark data.
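One simple way to surface redundant copies is to group files by a hash of their contents. The sketch below is an assumption-laden illustration, not the file-based method described above: content hashing finds byte-identical copies, including renamed ones, but it will not relate a file to a derivative saved in a different format. The function names are hypothetical.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(str(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical demo files for illustration only.
    with tempfile.TemporaryDirectory() as root:
        original = Path(root) / "contract.docx"
        renamed = Path(root) / "contract_copy.docx"
        other = Path(root) / "memo.docx"
        original.write_bytes(b"terms and conditions")
        renamed.write_bytes(b"terms and conditions")
        other.write_bytes(b"lunch menu")
        print(find_duplicates([original, renamed, other]))
```

Each group returned is a candidate for ROT cleanup once a data owner confirms which copy to keep.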


Simply discovering the basic facts about your unstructured data will help you decide on the best path to address and prioritize your data security and privacy requirements.

Classification

Don’t get side-tracked by diverse data stakeholder interests

There’s a wide range of information governance purposes that impact your data, and security and privacy are just one part. They include:

  • Data categorization (e.g., identifying a sales contract vs. a memo)
  • Data attributes (e.g., managing big data warehouses)

Stakeholders will understand the negative impact to their brand and punitive privacy penalties arising from a data breach and why a protect first approach makes sense. Ensure your tool selection can support all stakeholder needs and commit to revisit their requirements after enacting your initial data security and privacy safeguards.

Straightforward security measures make classification simpler

Classification cues downstream tools to invoke controls. Security that relies on multiple factors to cue controls, like DLP and data analytics, adds complexity to classification.

Keep classification simple. If it’s sensitive, secure it. This enables you to eliminate complexity and streamline classification efforts.

Quick classification win

Our experience has shown that the majority of sensitive content can be found by searching for the most common sensitive data types with basic, proven filters (the 80/20 rule), such as:

  • Identification number (like SSN), driver’s license, passport information
  • Bank account numbers, credit card formats
  • Health care codes and terminology
  • Patent and trademark numbers

Organizations too often start by scanning volumes of unstructured data using multiple and complex filters to meet a full range of governance requirements. Instead, use proven filters first to find the majority of your sensitive content, keep false-positives to a minimum and then layer on more specific searches.
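Two of the proven filters above, U.S. Social Security numbers and credit card formats, can be sketched as follows. This is an illustrative example under stated assumptions: the patterns, the label names and the `classify` function are hypothetical, and real tools use far richer detection. The Luhn checksum is a standard way to discard digit runs that are not plausible card numbers, which helps keep false-positives down.

```python
import re

# SSN in the common ddd-dd-dddd form, excluding ranges that are never issued.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(candidate):
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def classify(text):
    """Return the set of sensitive data labels found in `text`."""
    labels = set()
    if SSN_PATTERN.search(text):
        labels.add("ssn")
    if any(luhn_valid(match.group()) for match in CARD_PATTERN.finditer(text)):
        labels.add("credit_card")
    return labels


if __name__ == "__main__":
    print(classify("Employee SSN 123-45-6789 is on file"))
    print(classify("Card: 4111 1111 1111 1111"))
    print(classify("Meeting moved to 10am"))
```

A file that receives any label is treated as sensitive and handed to protection. More specific searches can be layered on later without changing that basic decision.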


Don’t let the classification process turn into an academic exercise. Keep a protect first priority to get your data safeguarded as fast as possible.

Protect, Not Alert

Today’s DLP and employee monitoring solutions don’t secure the data itself. They monitor data (who has access, where the files are stored, etc.) and alert on misuse, but the sensitive files themselves remain unprotected.

Monitor alert approach overwhelms staff

Security and IT professionals must actively administer and respond to thousands of alerts to implement today’s solutions. Complex rules and analytics generate a high percentage of false-positives and the tools often lack the context to prioritize incidents for administrator actions. Already burdened, security and IT are falling behind and will continue to be less effective as data volumes grow.

A better approach is to automatically secure sensitive data with strong protection from the start and for its lifecycle. There won’t be repetitive content and analytics scans or ensuing alerts. Your data is truly protected from a breach and valuable resources are available for more productive purposes.

Protect the file, not the locations

Traditional solutions implement rules and analytics at each location where data may reside or travel. It’s become increasingly challenging and complicated to scale these solutions with today’s cloud environments, mobile workforces and explosion of endpoint devices.

Rather than struggle to control every network, server, cloud service or endpoint device that interacts with your data, protect the data itself. Eliminate multiple implementations and costly administration.

Platform Approach

A protect first, file-centric approach creates a sequence of efficiencies that simplify and streamline document security through data discovery, classification, protection, audit and policy management. It uniquely enables a purpose-built, automated data-centric platform that enforces centralized policies across your entire data inventory.

Eliminate complexity and inconsistency

You quickly lose control of sensitive files governed by a patchwork of policies spread across networks, cloud services and devices.

Centralize policy management. Manage security, access control and privacy settings in one platform, and enforce changes that take effect immediately across your entire sensitive data inventory.


It’s essential that privacy and security measures don’t disrupt end user workflows. They must be transparent and seamless, with controls applied consistently, in real time and across the entire enterprise. Automate these processes with a file-centric platform:

  • Discovery: don’t rely on users to determine data sensitivity. Use continuous scanning to find files with sensitive information the moment they are created.
  • Data classification: categorize and tag files with automated tools to apply a consistent set of policies.
  • Protect: use classification cues to instantaneously encrypt and apply access and rights controls.

Eliminate tool sprawl and achieve lowest Total Cost of Ownership

Deploying point solutions to close each emerging security and privacy gap, at each location data travels, is inefficient and adds operational complexity.

Consolidate multiple security and privacy tools with a platform that’s location-agnostic, efficient to administer and layers seamlessly with current infrastructure.


The best path to operationalize data security and privacy is to employ highly automated processes and centralized controls that place the burden on the technology and not the end user.

Data security and privacy is everyone’s responsibility and is essential in today’s digital organization. It’s key to your brand, reputation and essential to building and keeping customer confidence.

One of the biggest challenges of working with multiple stakeholders and departments is the time it takes to resolve data security and privacy issues. With everyone having an agenda or priority, your initiatives can languish or stall.

Use a protect first, file-centric approach to streamline and operationalize your sensitive data initiatives with:


  • Discover to Learn
  • Divide and Conquer
  • Classification
  • Protect, Not Alert
  • Platform Approach



Protect-First Approach To Data-Centric Security
Sensitive Unstructured Data

Three predominant data-centric security methods


There are three predominant methods in the market today to prevent loss and unauthorized access to sensitive unstructured data. Each is different and the best way to compare and contrast the methods is to understand what a vendor’s solution looks to defend and the primary data-centric tools used.

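Data Flow-Centric

  • Defends: data at ingress/egress points
  • Primary tools: Data Loss Prevention

Location-Centric

  • Defends: folders, file shares, disk, cloud files
  • Primary tools: Identity & Access Management, Behavior Analytics

File-Centric

  • Defends: the files themselves
  • Primary tools: Persistent Encryption, Identity & Access Management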

Today, with increasing threats and the consequential impacts of a data breach, more organizations are adopting a file-centric method as the foundation of their data-centric architectures.  It’s the only method that truly denies unauthorized access to your sensitive data no matter how it flows or the location it resides.  This protect-first foundation recognizes that if data isn’t properly protected – your entire house crumbles. 

A file-centric method works as a frontline defense and can be deployed in combination with other methods to achieve a fortified, cohesive data-centric security architecture. Understanding the key distinctions between the methods helps you navigate vendor engagements and build a protect-first architecture that best fits your needs.

Data Flow-Centric

These solutions defend sensitive data at corporate infrastructure ingress and egress points and use data loss prevention (DLP) tools to stop data leakage. Ingress and egress points include servers, networks, end-points and cloud services.

Today, the majority of businesses have deployed DLP as point solutions – known as Integrated DLP (e.g., network DLP, email-server DLP, or end-point DLP) while few have scaled to a full enterprise DLP deployment (e.g., a full solution suite across all points).

Data flow-centric characteristics:

  • Prevents data from leaking by intervening in the use or movement of data.
  • Uses content matching that actively looks for regular expressions, defined strings, keywords, patterns or data dictionaries.
  • Can add tools such as fingerprinting (indexing) and image recognition.

DLP solutions set up rules that specify conditions, actions and exceptions. The tools filter messages and files based on their content and prompt corrective measures. They can simply alert a user that an action may be risky or completely block the action. Examples include alerting when sharing sensitive data through email and restricting the copying of sensitive files onto a USB drive.
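The condition, action and exception structure of a DLP rule can be sketched as below. This is a simplified illustration, not any vendor's rule engine: the `Rule` fields, the channel names, the `evaluate` function and the sample rules are all assumptions made for the example. It mirrors the two examples above, alerting on sensitive email and blocking copies to a USB drive.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    pattern: re.Pattern      # condition: content must match
    channel: str             # condition: where the data is moving
    action: str              # "alert" or "block"
    exempt_users: set = field(default_factory=set)  # exception


# Hypothetical rule set for illustration only.
RULES = [
    Rule("email-keyword", re.compile(r"confidential", re.IGNORECASE), "email", "alert"),
    Rule("usb-ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "usb", "block", {"backup-svc"}),
]


def evaluate(rules, *, user, channel, content):
    """Return the strictest action triggered by any rule, or 'allow'."""
    severity = {"allow": 0, "alert": 1, "block": 2}
    decision = "allow"
    for rule in rules:
        if rule.channel != channel or user in rule.exempt_users:
            continue
        if rule.pattern.search(content) and severity[rule.action] > severity[decision]:
            decision = rule.action
    return decision


if __name__ == "__main__":
    print(evaluate(RULES, user="ann", channel="email", content="Confidential plan"))
    print(evaluate(RULES, user="ann", channel="usb", content="SSN 123-45-6789"))
    print(evaluate(RULES, user="backup-svc", channel="usb", content="SSN 123-45-6789"))
```

Even in this toy form, every new channel, pattern and exception adds a rule that has to be tuned and maintained, which is where the operational burden comes from.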

Many organizations have implemented email DLP since this is the most obvious ingress/egress point prone to unauthorized exchanges of sensitive data. While there are measured improvements, security and IT administrators still have challenges when implementing and operating DLP solutions, such as:

  • Rules are complex and create thousands of initial false alerts.
  • Concerns over disrupting user workflows cause administrators to loosen controls and implement few blocking mechanisms.
  • Alerts burden administrators and backlogs might take weeks or months to address.

Too often businesses have inappropriate expectations for DLP. It works, but many underestimate the complexities and resources needed to build, tune and manage policies to fit their environment. You should anticipate iterative refinement of rules and alert resolution.


Data flow-centric solutions are good at reducing risk but not a strong, protect-first approach. They don’t defend the data itself, but only how it flows in your organization. Any leakage exposes the data to unauthorized disclosure.

Location-Centric

These solutions defend sensitive data storage locations. They look for gaps and inconsistencies in identity and access management (IAM) and apply user behavior analytics (UBA) to reduce the risk of unauthorized disclosure of sensitive data. Locations include folders, file-shares, disks, and cloud services.

Location-centric characteristics:

  • Defends folders, file-shares and disks from unauthorized access and suspicious usage.
  • Analyzes IAM settings and policies to find discrepancies and obsolete controls.
  • Uses UBA to monitor and detect anomalous events.

Unlike DLP solutions that query and assess content repetitively, location-centric solutions pre-process, classify and tag sensitive data. These tags flag where sensitive content is located within your IT data architecture. The solutions then use:

  • IAM tools: Find excessive, outdated or inconsistent user permissions and missing passwords, evaluate access controls and authorization processes, and search any Active Directory structures to discover discrepancies.
  • UBA tools: Monitor privileged and end-user access to detect anomalous behaviors (unusual mailbox activity, a large number of failed attempts to access a folder, or excessive downloads of files to a portable storage device).
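One of the UBA signals above, a large number of failed attempts to access a folder, can be illustrated with a simple count against a fixed threshold. This is a deliberately minimal sketch: the event fields and the `flag_failed_access` name are assumptions, and production UBA tools typically compare activity against per-user baselines rather than a single fixed number.

```python
from collections import Counter


def flag_failed_access(events, threshold=5):
    """Return users with more than `threshold` denied attempts on one folder."""
    failures = Counter(
        (event["user"], event["folder"])
        for event in events
        if event["outcome"] == "denied"
    )
    return sorted({user for (user, _folder), count in failures.items() if count > threshold})


if __name__ == "__main__":
    # Hypothetical access log entries for illustration only.
    log = (
        [{"user": "bob", "folder": "/hr/payroll", "outcome": "denied"}] * 6
        + [{"user": "ann", "folder": "/hr/payroll", "outcome": "denied"}] * 2
        + [{"user": "ann", "folder": "/sales", "outcome": "allowed"}] * 10
    )
    print(flag_failed_access(log))
```

Each flagged user becomes an alert that an administrator still has to review, which is the workload concern raised for this method.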

Location-centric solutions are easier to implement than rules-based data flow-centric solutions because the tools are non-intrusive and rely on system logs and UBA. Location-centric solutions place priority on data visibility and are superior to many approaches when it comes to privacy compliance, audit and reporting requirements.

However, drawbacks with location-centric solutions include:

  • IAM and UBA tools are location-specific. Once a file is removed from the location and downloaded to laptops or endpoints, you lose visibility of the data.
  • Folder management becomes a challenge at scale as a single terabyte can spread to over 50,000 folders. Keeping access lists current and monitoring user activity across millions of folders is burdensome.
  • Like data flow-centric solutions, the alerts place significant demands on administrators’ workloads and their ability to respond in a timely manner.

While obfuscation tools are not native to these solutions, some do use data encryption while the data resides and is used within a particular location. However, when files are downloaded to endpoints, stored in personal cloud accounts or shared outside the location, protection, visibility and control are lost.


Location-centric solutions use a “least privilege” approach as the foundation for their data protection method, not a “protect-first” approach. Critical gaps arise when data is moved from its original location; without persistent encryption, your sensitive unstructured data is exposed to a breach.

File-Centric

In contrast to the other methods, persistent encryption and IAM are tied to and travel with the file. This is independent of networks, servers, locations and devices.

File-centric characteristics:

  • Protects office documents, CAD/CAE files, PDFs, plain text and other digital media file types.
  • Encryption is persistent, centrally managed and enforced at the file level.
  • IAM is assigned and enforced at the file level.

The method uses data classification tags to:

  • Encrypt the file contents: If exfiltrated, the sensitive data is obfuscated and is of no value to threat actors.
  • Restrict file access to only authorized users: Users can be an individual, departments, business unit or defined by role or title.

File-centric solutions were historically used for very specific use cases but today are experiencing a market resurgence. Modern solutions take advantage of the latest in software tools like RESTful APIs and open operating system standards to work transparently across the enterprise. Centralized policies ensure access and protection are consistently applied across all networks, file-shares, devices, end-points and cloud services.

And when it comes to denying access to sensitive content, the file-centric method is by far the best "protect-first" approach. Here's how leading analysts are advising clients:

  • Despite extensive DLP coverage there are “gaps in data flows where data can leak” and “the better answer is a strategy focused on securing the data itself.”
  • Encryption is entering a “Golden Age.” Due to the growing concerns of data theft, privacy and government surveillance, security pros are increasingly using all forms of encryption throughout their digital businesses.
  • “Identity” is the new perimeter in a world of distributed Software as a Service (SaaS) and other cloud-based services. Centralized administration and control of access to data must be maintained by the business, not service providers.

Look for file-centric solutions that automate discovery, classification and encryption in a single instantaneous step without user intervention. This improves productivity and consistency in application of policies.


File-centric solutions use a “protect-first” approach as the foundation of their data protection method. Persistent access control and encryption remain with the file throughout its life-cycle. Many privacy regulations exempt loss of encrypted files from breach reporting or, alternatively, impose significantly reduced penalties.


Organizations struggle to distinguish between data-centric solutions from different vendors as they search for the best way to safeguard their sensitive unstructured data. Data-centric security encompasses a wide range of processes and tools, many with overlapping functions and focused on different end goals. Adding to this confusion has been a flurry of gap-filling point solutions (e.g., CASB, end-point protection) launched to address today’s cloud and mobility adoption.

And despite significant investments in traditional data flow and location-centric methods, data breaches today are at all-time highs.

Adopt a protect-first, file-centric method for your data security architecture. Establish this strong frontline defense to deny any unauthorized access to sensitive unstructured data, no matter how it is used, with whom it is shared, or where it is located. Then, use this foundation to integrate other data-centric methods and tools to architect a data security infrastructure that meets your organization’s governance, risk and compliance mandates.   

Fasoo products span the life-cycle of sensitive unstructured data to discover, classify, protect, monitor, control, track and expire access to content wherever it travels or resides. Our unified solution enables users to securely collaborate internally and externally with sensitive information while consistently meeting corporate governance and regulatory requirements. Our file-centric approach, using encryption with a unique identifier, allows organizations to have more visibility and control over unstructured data without interrupting workflows. We’ve engaged in this journey with over 1,500 enterprises to field data-centric solutions that proactively protect corporate brand and competitive position and meet increasing regulatory demands.


Data Visibility for Privacy and Security
Sensitive Unstructured Data

Organizations need better visibility into the use and movement of their sensitive data to meet privacy regulations and safeguard content.


The best approach is a self-reporting file method, one that automatically traces, gathers and records all document interactions without reliance on disparate network, application, and device logs.

The same technology that enables self-reporting files is the foundation of a powerful data security approach – a file-centric method.  Bridge both privacy and security gaps with a file-centric method that delivers deep data visibility and a strong front line defense for your sensitive data.

Traditional security and network tools create a patchwork approach to data visibility that is inadequate, impractical, and unsustainable.

You need visibility to know where your data is, who is using it, and how it changes throughout its lifecycle. Discovery and classification tools are a good start to find data and tag it for downstream controls. However, to maintain control, you need deep visibility to track data as it travels, is accessed and transforms into other file types throughout its lifecycle. 

Cybersecurity and privacy teams are challenged to keep track of sensitive files. A file will be accessed by multiple systems, applications and devices as users share it internally and with external parties. With over 40 different security and IT operations tools used in a typical business, organizations struggle as they work to accumulate, correlate, and report file interactions.

This challenge grows because data visibility is often obscured when documents travel within the organization or are shared externally, and when they change through duplication or revision. Without proper data visibility, you can miss the moment sensitive information is shared, moved to a different location, changed or deleted.

You must also have visibility into sensitive file interactions for data breach investigations and to comply with privacy regulations. Details must be readily available to support incident response teams, and privacy regulations like GDPR and CCPA compel businesses to report on all data they hold regarding an individual within a specified period or be subject to a fine.


Faced with millions of files and countless interactions across global networks with thousands of end points, organizations need a new way to track data use and movement.

Visibility gaps widen as three trends stress legacy infrastructure

IT, security and privacy professionals are working to address widening visibility gaps and overcome the risk posed by:

  • Exponential growth of unstructured data that includes strategic, operational and intellectual property
  • COVID-driven remote workforces suddenly operating outside the corporate perimeter
  • Privacy regulations increasingly focused on an individual’s rights to control their data used by businesses

Data proliferation is staggering, and unstructured data is rapidly growing, estimated to be 80% of a business’s data inventory.  Unstructured data is routinely undermanaged and is hard to control and track as users take sensitive files from controlled repositories, store them on laptops, endpoints, and cloud services and share them in collaboration applications both internally and with external parties.

COVID-19 rapidly expanded the remote workforce and dissolved corporate perimeters.  Sensitive data now resides on more unmanaged and shared devices. It travels on insecure networks and is used in unauthorized or non-compliant apps.  All this is obscured from corporate oversight.      

Privacy regulations have vaulted individual rights to the forefront.  Right to be informed; right to be forgotten; and data residency all impose new demands on data visibility, tracing, control and reporting.


Regulatory agencies and corporate Governance, Risk and Compliance (GRC) teams increasingly focus on the visibility gap of sensitive unstructured data and the actions of security, compliance and IT professionals to close these gaps.

Self-reporting files use an embedded ID technology to trace and record all interactions

Legacy security and privacy data architectures lack the deep data visibility and persistent tracking needed to meet today’s requirements.

Data loss prevention (DLP) and identity and access management (IAM) solutions designed for perimeter security lose track of data migrated to the cloud and when downloaded by remote workers.  Privacy and legal e-discovery applications may have file mapping features, but they are siloed, don’t track all interactions, and the multiple datasets are disconnected and incomplete.     

A unique ID that’s embedded and travels with the file enables persistent tracing and self-reporting of interactions throughout the file’s lifecycle. This method:

  • Eliminates working with patch-work logs from multiple systems
  • Provides a single source of truth for audit and regulatory purposes
  • Enables efficient and timely incident and privacy response

An organization’s existing data-centric tools perform better with an embedded ID approach.  Discovery scans lack the intelligence to relate file derivatives that are copied or duplicated.

With an embedded ID, derivatives of an original file, whether duplicated or renamed, inherit the parent ID tag and all its security and governance policies.

An embedded ID reduces tool sprawl by negating the need for tracking tools fielded with each security, privacy and legal e-discovery application.  All applications benefit from a single source of truth for file tracing and interactions.
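The inheritance and single-audit-trail ideas can be modeled in a few lines. This is a conceptual sketch only: how a real product embeds the ID inside a file is not covered here, and the `TrackedFile` and `Registry` names, the in-memory event list and the policy labels are all hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedFile:
    name: str
    file_id: str          # stands in for the ID embedded in the file
    policies: frozenset   # security and governance policies bound to the ID


class Registry:
    """Single audit trail keyed by file ID (in memory for illustration)."""

    def __init__(self):
        self.events = []

    def _log(self, file_id, action, detail):
        self.events.append({"file_id": file_id, "action": action, "detail": detail})

    def create(self, name, policies):
        tracked = TrackedFile(name, str(uuid.uuid4()), frozenset(policies))
        self._log(tracked.file_id, "create", name)
        return tracked

    def derive(self, parent, new_name):
        """A copy, rename or format change inherits the parent's ID and policies."""
        child = TrackedFile(new_name, parent.file_id, parent.policies)
        self._log(child.file_id, "derive", new_name)
        return child

    def record(self, tracked, action, user):
        self._log(tracked.file_id, action, user)

    def history(self, file_id):
        return [event for event in self.events if event["file_id"] == file_id]


if __name__ == "__main__":
    registry = Registry()
    original = registry.create("offer.docx", {"encrypt", "hr-only"})
    derivative = registry.derive(original, "offer-final.pdf")
    registry.record(derivative, "open", "ann")
    for event in registry.history(original.file_id):
        print(event["action"], event["detail"])
```

Because the derivative carries the same ID, one query on that ID returns the full history of the original and every copy, with no correlation across separate system logs.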


Using an embedded ID for deeper visibility, tracking and reporting at the file level is the best way to achieve sustainable and auditable processes and better safeguard sensitive data.

Deep Visibility with Embedded File ID

File Derivatives

Data changes throughout its lifecycle: As the original file is copied and renamed or saved in a different format.

Discovery scans find sensitive unstructured data but lack: The means in subsequent scans to relate derivatives to a previously scanned file.

Missing derivative traceability compromises: Privacy compliance and increases the organization's threat surface as redundant sensitive data is unnecessarily retained across multiple locations.

With an embedded ID: Derivative files inherit the same file ID as the original, making visibility, security classifications and handling controls consistent across your IT infrastructure.

Individual Data Rights

Tracing of individual information: Requires persistent visibility and reporting in order to comply with modern day privacy regulations.

Responding to a Data Subject Access Request ("DSAR") requires: Organizations to find all customer information and report in a specific period of time (e.g., 30 days).

Any file associated with an individual: Must be accounted for throughout its lifecycle.

An embedded ID: Eliminates the time-consuming task of file forensics. It provides a single source of truth that offers current deep data visibility, letting organizations meet today’s demanding individual information rights regulations.

Control at 3rd Parties

Businesses lose data visibility: When they share files outside the corporate network with supply-chain vendors and external legal and financial professionals.

Regulators make you responsible to ensure data is appropriately safeguarded: Breaches of your data while in custody of a third party require you to report the breach.

Secure and compliant sharing means: You extend the same visibility and controls that exist within your managed networks to any third parties.

An embedded ID provides the same activity tracking as if the files were internal: Enabling additional controls to set a file expiration date and revoke access at any time to third party locations. This feature is a key compliance component to the individual regulatory "rights to be informed and forgotten".
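The expiration and revocation controls described above come down to an access check that is evaluated each time a file is opened. The sketch below assumes a centrally held policy per file; the `FilePolicy` fields and the `can_open` function are hypothetical names for illustration, not a product API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FilePolicy:
    allowed_users: set
    expires_at: datetime
    revoked: bool = False


def can_open(policy, user, now=None):
    """Allow access only if the policy is unrevoked, unexpired and lists the user."""
    now = now or datetime.now(timezone.utc)
    if policy.revoked:
        return False
    if now >= policy.expires_at:
        return False
    return user in policy.allowed_users


if __name__ == "__main__":
    # Hypothetical policy for a file shared with an outside law firm.
    policy = FilePolicy(
        allowed_users={"counsel@lawfirm.example"},
        expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
    )
    print(can_open(policy, "counsel@lawfirm.example"))
    policy.revoked = True  # access is withdrawn centrally
    print(can_open(policy, "counsel@lawfirm.example"))
```

Because the check runs against the central policy rather than a copy held by the third party, revoking access or letting the expiration date pass takes effect wherever the file happens to be.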

User Behavior Monitoring

Who is accessing your data, how it is being used, and where it is being moved: Are critical inputs for monitoring solutions focusing on detecting data misuse and policy violations.

Data transfers to removable drives and large uploads to cloud services outside of your organization: May be an early warning sign of malicious insider threat intent.

User behavior (UB) analytics are most effective when: Data visibility tools provide a full perspective of user activities across all applications and storage locations.

An embedded ID: Provides the highest granularity of data activity to drive UB analytics leading to earlier detection of insider threats. These data insights cue security methods, such as restricting the copy of data to removable drives.
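To make the early-warning idea concrete, the sketch below scans file activity events and flags users with repeated copies to removable drives or large cloud uploads. The event fields, action names, and both thresholds are assumptions chosen for illustration; production UB analytics would use richer baselines than fixed limits.

```python
def flag_risky_users(events, removable_limit=3, upload_limit_bytes=500_000_000):
    """Return {user: [reasons]} for users whose activity crosses a threshold."""
    removable_copies = {}  # user -> number of copies to removable drives
    uploaded_bytes = {}    # user -> total bytes uploaded to outside cloud services

    for event in events:
        user = event["user"]
        if event["action"] == "copy_to_removable":
            removable_copies[user] = removable_copies.get(user, 0) + 1
        elif event["action"] == "cloud_upload":
            uploaded_bytes[user] = uploaded_bytes.get(user, 0) + event["bytes"]

    flags = {}
    for user, count in removable_copies.items():
        if count >= removable_limit:
            flags.setdefault(user, []).append("removable_drive")
    for user, total in uploaded_bytes.items():
        if total >= upload_limit_bytes:
            flags.setdefault(user, []).append("large_cloud_upload")
    return flags


events = [
    {"user": "dana", "action": "copy_to_removable", "bytes": 1_000},
    {"user": "dana", "action": "copy_to_removable", "bytes": 2_000},
    {"user": "dana", "action": "copy_to_removable", "bytes": 3_000},
    {"user": "eli", "action": "cloud_upload", "bytes": 600_000_000},
    {"user": "fay", "action": "copy_to_removable", "bytes": 5_000},
]
print(flag_risky_users(events))
```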

Deep visibility and a protect-first approach to data security

It’s been observed that “you can have security without privacy, but you can’t have privacy without security.” Both are tightly related, and today, it’s not an either-or choice.

A file-centric method with embedded ID is a strong choice for data visibility. The same method enables a protect-first security approach that protects the data itself with encryption and access controls and eliminates redundant and overlapping tools implemented across multiple networks and endpoints.

Bridge both worlds and close privacy and security gaps with a file-centric method that delivers deep data visibility and a strong front-line defense for your sensitive data. 




What Unstructured Data is Sensitive?
Sensitive Unstructured Data

“Threat actors are having more success with breaching and exfiltrating sensitive unstructured data targets.”


Your organization’s sensitive unstructured data is a rapidly growing threat surface increasingly targeted by cybercriminals and threat actors. While more attacks are directed at structured databases, cybercriminals are having greater success in stealing sensitive unstructured data.

It’s because this type of data poses a unique series of security and privacy regulation challenges, many of which are not addressed by today’s investments in network, device and application security, cybersecurity frameworks or traditional vulnerability management strategies.

Unlike structured data that resides in well-protected IT perimeters, sensitive content in unstructured formats such as office documents, CAD/CAE files, or images is distributed and published via file sharing, social media and email. You generate it when HR collects personal employee information, your sales teams add customer contact information into your customer relationship management (CRM) system, your engineering/security teams collaborate with third-party intellectual property (IP), and so on.


  • New product plans
  • Product designs
  • Customer information
  • Supplier information/third-party contracts
  • Competitor research
  • Customer surveys
  • Software code
  • Job applications, employee contracts
  • Internal processes and procedure manuals
  • Data Analytics: Google Analytics, Tableau and Salesforce reports


  • California Consumer Privacy Act (CCPA)
  • General Data Protection Regulation (GDPR)
  • Health Insurance Portability and Accountability Act (HIPAA)
  • Gramm–Leach–Bliley Act (GLBA)
  • Personal Information Protection and Electronic Documents Act (PIPEDA)
  • New York State Department of Financial Services Cybersecurity Regulation (23 NYCRR 500)
  • Payment Card Industry Data Security Standard (PCI DSS)

“A dangerous gap has emerged …”

Sharing and storing sensitive information in free-form documents that live outside carefully monitored or secured databases is now a widespread practice. This creates a gap that presents countless opportunities for unauthorized disclosure through inadvertent handling by employees, actions of malicious insiders, and cyberattacks.

Businesses are mobilizing to combat these threats. The first step is to ensure your organization understands the character, significance and challenges surrounding sensitive unstructured data. Focus on these topics to drive better organizational insights into why and what can be done now to close the gap.

  • Who cares about it?
  • What are the types?
  • How are sensitivity levels determined?
  • What are the next steps?

Who cares about sensitive unstructured data?

Unauthorized access or loss of sensitive data hurts your competitive advantage, damages your brand, and can incur significant regulatory penalties.




[Infographic: the percentage of customers who will stop spending for several months after a breach, and the percentage who will never return to your brand]



Beyond customers and the potential loss of revenue, others have a stake as well:

  • Breach of partner information exposes the business to legal damages and seriously impacts the relationship and reputation of both parties.
  • Regulators are responding to increased threats and individual rights. Over 80 countries now have published privacy laws. Non-compliance penalties are increasing and more strictly enforced. Your data may be subject to overlapping and often conflicting requirements.
  • Corporate Governance, Risk and Compliance (GRC) committees define the sensitivity levels and handling policies for information. Policies must reflect new threats and trends so they can guide the systems and procedures that safeguard this content.
  • Security and IT professionals have spent considerable time on network perimeter tools, yet gap analysis shows shortfalls in safeguarding unstructured data. To fix this, they are turning to data-centric approaches and tools that protect the data itself rather than its location.
  • Employees create and share unstructured files (office documents, PDFs, CAD/CAE) internally and externally every day, and should protect content in line with its sensitivity level (e.g., confidential, internal, public).

What are the types of sensitive unstructured data?

Sensitive data is any information that must be safeguarded from unauthorized disclosure. The broadest categories are regulated and unregulated. The former must be handled as sensitive because laws require it. Unregulated data includes both business-sensitive and publicly known information. It’s up to the business to determine what content it deems sensitive.

Regulated data arises from:


Privacy Regulations: Information that personally identifies an individual and associates that individual with financial, healthcare, and other data.


Industry Regulations: Industry-sensitive data. Examples include weapon systems governed by the International Traffic in Arms Regulations (ITAR) and critical infrastructure governed by North American Electric Reliability Corporation (NERC) standards.


Protected Health Information (PHI), Personally Identifiable Information (PII), and payment card data covered by the Payment Card Industry Data Security Standard (PCI DSS) continue to be the traditional categories of private individual data. By gaining access to this valuable data, cybercriminals can steal identities and/or compromise bank accounts to earn a quick profit.

Modern day privacy regulations, such as GDPR and CCPA, have broadened the definition of what information is subject to regulations to include individual interactions in the digital space, putting companies under significant new obligations.

Unregulated data of a sensitive nature is determined by the business. It is data the business doesn’t want exposed and can be strategic, competitive, financial or operational in nature. Examples include:

  • IP: Patents, trademarks, formulas, R&D programs, source code
  • Strategic: Pending financial releases, on-going M&A transactions, internal risk deliberations
  • Operations: Inventory levels, pricing policies, customer lists

Today’s cybercriminals are opportunistic. They look for companies that are involved in a current event or that have an obvious vulnerability they can exploit for the most value. Examples include stealing data about important drugs or vaccines in development, or exposing damaging information from an ongoing legal proceeding.

Interestingly, breaches of unregulated sensitive content often stay hidden. They are not subject to disclosure like breaches of regulated data, so organizations often choose to avoid the reputational damage associated with publicizing them.

How are sensitivity levels determined?

Regulated data is always sensitive. Most unregulated data is not, since it includes publicly known information.

Your corporate GRC team or chartered committee determines what data is sensitive. They consider all internal and external mandates, the nature of the data, how it is being used, the likelihood of a breach, and its overall impact on your organization (financial and reputational).

Helpfully, policies have become standardized across industries, with “templates and toolkits” that leave little to chance and that you can implement with reasonable effort.

Best practices recommend three classification levels (e.g., confidential, internal, public), four at most. With more levels, the distinctions become too fine for employees to assess, which results in subjective and inconsistent application.
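A three-level scheme is small enough to express directly in code, which is one reason automated classification stays predictable. The sketch below is illustrative only: the `Sensitivity` enum, the SSN-like pattern, and the marker phrases are hypothetical rules, not a recommended detection policy.

```python
import re
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical rules: a US Social Security number shape marks a file
# confidential; a handling phrase marks it internal.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
INTERNAL_MARKERS = ("internal use only", "do not distribute")


def classify(text):
    """Return the highest sensitivity level whose rule matches the text."""
    if SSN_LIKE.search(text):
        return Sensitivity.CONFIDENTIAL
    lowered = text.lower()
    if any(marker in lowered for marker in INTERNAL_MARKERS):
        return Sensitivity.INTERNAL
    return Sensitivity.PUBLIC


print(classify("Employee SSN: 123-45-6789"))
print(classify("Internal use only: Q3 roadmap"))
print(classify("Press release for general distribution"))
```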

To move from templated policies to meaningful execution, it’s critical that the GRC team help security and IT professionals in your organization prioritize sensitive unstructured data tasks by directing attention to such factors as:

  • Not all data leaks are equal: The business impact varies depending on the sensitivity of the data and the extent of exposure. Determine what sensitive data, if lost, would hurt your company’s finances and reputation the most.
  • Identify how your sensitive data is shared and stored: What data is at highest risk of being stolen? Not all threats are external. Insider threats are responsible for some of the costliest breaches.
  • Employees: Verizon’s 2020 Data Breach Investigations Report states “employees mistakes account for roughly the same number of breaches as external parties who are actively attacking you.” Education, automation, and centralized controls are critical.

The dynamics surrounding sensitive unstructured data can be daunting. Focusing on a few key steps provides a meaningful path forward:


Consider current trends and update best practices. Most organizations have some form of GRC policies, but the focus has been on structured data security and handling. Locate all potential sources of unstructured data, independent of sensitivity. This helps operationalize the process and keeps your project on task.


Look for gaps in the security infrastructure, taking advantage of data-centric approaches, processes, and tools that safeguard the data itself rather than the place where it resides (servers, laptops, mobile devices).


Employees need one thing – to get their work done. They will benefit most from automated sensitive data classification that minimizes impact to their workflows. They will be more receptive and committed to the effort if the policies are clearly communicated and outlined for them.



Six Vulnerable Points In Your Data Security Architecture and How You Can Protect Them
Sensitive Unstructured Data

Do you know where you are most vulnerable? Now is the time to check these key trends:



  • Hybrid and Multi-Cloud
  • Privacy
  • Insider Threat
  • Security Gaps
  • Remote Workforce
  • Third-Party Collaboration

1. Hybrid and Multi-Cloud Environment

According to the Flexera 2020 State of the Cloud Report, organizations use an average of 2.2 public and private cloud providers. This exposes your data to the following risks:


Identity and Access Management (IAM): You may have heard the phrase, “identity is the new perimeter”. This “new perimeter” is the intersection of users, devices, and cloud services. Due to the COVID-19 pandemic and increasing regulations, many companies across the globe have had to reconsider how much access their employees have to their systems, applications, and data.


Security: Educate your Governance, Risk and Compliance (GRC), IT security, and Human Resources (HR) teams on the latest risks and make sure they have the data-centric tools they need to combat them. Ultimately, a breach will significantly impact your organization’s reputation and finances.


Data Residency: Cloud environments are boundless and can be located anywhere in the world. Legal and regulatory requirements are imposed on data by the country or region where it resides. Review where your sensitive unstructured data is stored (on or off premises) and make updates accordingly.


A data-centric approach identifies files and secures them in a centralized management system to provide consistency across all channels. Discovery tools help locate your data and classify it with specific tags that control where in the cloud it may be stored.
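One way to picture tag-driven residency control is a lookup from classification tag to permitted cloud regions, applied to an inventory produced by discovery. Everything named below (the tags, the region labels, and the inventory layout) is a hypothetical example rather than any vendor's schema.

```python
# Hypothetical mapping from classification tag to regions where storage is allowed.
ALLOWED_REGIONS = {
    "eu-personal-data": {"eu-west", "eu-central"},
    "us-health-data": {"us-east", "us-west"},
}


def placement_allowed(tag, region):
    """Untagged or unrestricted data may live anywhere; tagged data may not."""
    allowed = ALLOWED_REGIONS.get(tag)
    return allowed is None or region in allowed


def misplaced_files(inventory):
    """Return paths whose current region violates the tag's residency rule."""
    return [
        path
        for path, tag, region in inventory
        if not placement_allowed(tag, region)
    ]


inventory = [
    ("/crm/export.xlsx", "eu-personal-data", "us-east"),
    ("/crm/archive.xlsx", "eu-personal-data", "eu-west"),
    ("/marketing/brochure.pdf", "public", "us-east"),
]
print(misplaced_files(inventory))
```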


2. Privacy

Today’s privacy regulations demand greater visibility and control over an individual’s data.

Regulation types include:

  • Responding to the Rights of Individuals: Regulations such as General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) give individuals greater rights to their personal data. Data subject and consent rights must be associated with all information collected on an individual.
  • Access and Revoke: Every file access (system and user) must be traced for data collected. Individuals can elect how and when their data is used. The “right to be forgotten” requires total removal of all data and most transactions. Your organization’s staff must respond promptly to any individual privacy and audit requests. Breach notification timelines are tightened (GDPR, for example, sets a 72-hour window for notifying the supervisory authority).
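Because the notification clock starts when a breach is detected, it helps to compute the deadline explicitly rather than estimate it. The helper names below are hypothetical, and the 72-hour default only reflects the window mentioned above; confirm the applicable period for each regulation with counsel.

```python
from datetime import datetime, timedelta, timezone


def notification_deadline(detected_at, window_hours=72):
    """Latest moment a notification can be sent within the window."""
    return detected_at + timedelta(hours=window_hours)


def hours_remaining(detected_at, now, window_hours=72):
    """Hours left before the deadline, never below zero."""
    remaining = notification_deadline(detected_at, window_hours) - now
    return max(remaining.total_seconds() / 3600, 0.0)


detected = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(detected))
print(hours_remaining(detected, datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc)))
```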


Deep visibility tools accumulate access information during the entire lifecycle of the sensitive unstructured data. You should avoid traditional tools that provide limited visibility and require forensic action to correlate and search across multiple log files.


3. Insider Threat

While external threats from hackers and cybercriminals make the headlines, trusted insiders can pose a greater threat to your sensitive unstructured data. A traditional security infrastructure focuses on external threats using firewalls, anti-malware, intrusion detection, and other security solutions. These solutions may not prevent an employee, contractor or third-party vendor with access to that data from sharing it with unauthorized users.

There are three types of insider threats that require your attention:


Accidental: An employee or contractor may accidentally share a document with the wrong person exposing sensitive data. Once out of the person’s control, the information could go anywhere, violating privacy regulations and compromising your competitive position.


Negligence: An IT or security administrator forgets to apply a security patch or update a firewall rule, exposing your sensitive unstructured data to theft. This is most likely an oversight, since many IT and security groups are overworked and understaffed. Another example is a user who deliberately circumvents security policies.


Malicious: Employees, contractors or partners who want to harm your organization or make money selling valuable information to competitors. This type of insider threat is difficult to stop because many have a legitimate need to access sensitive unstructured data.


Encrypt files and apply rights management to decrease the likelihood of unauthorized users accessing your sensitive unstructured data. If hackers and cybercriminals exfiltrate protected sensitive data, it will be useless to them. The same goes for employees or contractors who want to take sensitive data.


4. Security Gaps

Despite significant investments in security infrastructure and the deployment of data loss prevention capabilities, breaches are at all-time highs. Threat actors have greater success exfiltrating information on endpoints and servers where sensitive unstructured data is common.

What you need to acknowledge and have teams address:

  • Beyond prevention: Data Loss Prevention (DLP) blocks and prevents sensitive data activities but doesn’t protect the data itself. Data breaches continue. Organizations and regulators are recommending the increased use of encryption to address the challenge.
  • Not a breach: Many regulations take into account whether the exposed data was encrypted when determining if an incident counts as a breach. Fines can be significantly reduced depending on the status.
  • Ransomware: While companies may still be subject to disruption, often the most significant risk is sensitive data being exposed to the public or provided to others for financial gain. Protecting data with encryption greatly reduces this risk. Encryption is called for in modern-day regulations such as GDPR, CCPA, and New York State Department of Financial Services (23 NYCRR 500).


Enhance existing DLP investments by encrypting files with sensitive data. Use centralized encryption key management to maintain protection and control wherever the file travels.


5. Remote Workforce

This is a significant trend that’s been recently accelerated by COVID-19. Security and privacy implemented in corporate offices can’t be replicated at each home. Review your current policies to see if they address:


Home office/Virtual Workspaces: Work is more likely to happen on unmanaged and shared devices, over insecure networks, and in unauthorized or non-compliant apps.


Increased downloads: Slow network traffic and the convenience of working on and sharing local files both increase the volume of sensitive unstructured data on endpoints.


Insider threat: Without safety precautions, unintentional errors that disclose sensitive content increase. Malicious intent from at-risk employees with access to home-based, non-sanctioned portable drives and printers is particularly concerning.


Use strong data-in-use tools like rights management capabilities that restrict printing and storing content on removable media.
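Rights management of this kind can be thought of as a set of permission flags attached to a file for each class of user. The sketch below assumes a hypothetical `Rights` flag set and two example profiles; real products define their own permission models.

```python
from enum import Flag, auto


class Rights(Flag):
    VIEW = auto()
    EDIT = auto()
    PRINT = auto()
    COPY_TO_REMOVABLE = auto()


# Hypothetical profiles: remote workers can view and edit, but cannot print
# or store content on removable media.
REMOTE_WORKER = Rights.VIEW | Rights.EDIT
OFFICE_WORKER = Rights.VIEW | Rights.EDIT | Rights.PRINT


def is_allowed(granted, requested):
    """True only when every requested right is part of the granted set."""
    return (granted & requested) == requested


print(is_allowed(REMOTE_WORKER, Rights.EDIT))
print(is_allowed(REMOTE_WORKER, Rights.PRINT))
print(is_allowed(REMOTE_WORKER, Rights.COPY_TO_REMOVABLE))
```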


6. Secure Third-Party Collaboration

Customer information shared with others remains your responsibility, regardless of who leaks the data. The challenges here are:


Loss of control: Once outside your organization, highly sensitive information can be shared either unknowingly or for improper business advantage that hurts your competitiveness.


Screen sharing: Zoom, Skype, WebEx, Google Chat and Google Meet, Microsoft Teams, Free Conference Call, and similar applications expose sensitive information to screen capture by others.


End of project: Sensitive information often remains with third parties long after the project or relationship ends, often unprotected.


Deploy agentless browser collaboration with file tracking and protection. Screen blocking of sensitive information during collaboration sessions helps prevent the loss of sensitive data. Revoke third-party access to sensitive files once it is no longer needed.


Proactive organizations stay ahead of these vulnerabilities by acting early to evaluate the impact of safeguarding their sensitive unstructured data.

Recommended best practices include:



  • Update GRC policies to reflect new guidance
  • Perform security gap analysis of current infrastructure
  • Implement employee awareness training as new risk and threat vectors emerge

Educate and empower your organization to stay one step ahead of hackers, cybercriminals, threat actors, and those with malicious intent.




Trade Secrets and Insider Threats – Levandowskis Are Everywhere
Data Security, Insider Threat, Sensitive Unstructured Data

Insider threat has been an issue for many years, and the consequences of these events have a strong and long-term impact on your business.

If competitive advantage isn’t enough reason to protect sensitive data, how about the legal costs?

The risk posed by insiders is again in the spotlight: Anthony Levandowski, a founding engineer at Google’s autonomous vehicle project (known as Waymo since it was spun off in 2016), has been convicted and sentenced to 18 months in prison. After three long years of legal proceedings in which Levandowski was charged with stealing trade secrets by downloading 9.7 GB of confidential files, he was also ordered to pay over $178 million to Google.

Justice Served for Trade Secret Laws, But Levandowski’s Actions Have Significant Collateral Damage

After leaving Google, Levandowski founded Otto, another autonomous vehicle technology company, which Uber acquired shortly thereafter. A year-long legal battle ensued, with Waymo claiming damages of $1.9 billion. A guilty verdict against Uber could have delayed its own self-driving initiatives for years.