Data Discovery: A Detailed Guide to the What, Why, and How

Modern business runs on data. Even companies that produce and sell physical products create, store, and use data. They need it to find customers, maintain relationships, sell products, and monitor costs and profits. Therefore, data is valuable. It's worth protecting, especially when you consider how often we hear about bad actors stealing it. 

But you can't protect something you don't know you have. You need a complete picture of what data your business is producing, storing, and using. You can't manage data if you don't know where employees keep it, and who has access to it. As a result, you need a data discovery program that will show you what's going on in your enterprise. 

Let's talk about data discovery. We'll start out with a definition, review why it's important, and then outline how you can get started. 

What is Data Discovery?

Simply put, data discovery is locating data, collecting it, and then highlighting sensitive or regulated information. 

Regulated data is information that is governed by laws or regulations. This includes personally identifiable information (PII), protected health information (PHI), and other data that's covered by laws and standards like the GDPR, HIPAA, and PCI DSS. 

Sensitive data is information that regulations don't cover but that is still critical to your company. This includes confidential or proprietary data that your company doesn't want to share with the public or competitors, such as sales records, legal information, and source code. 

With data discovery, your security teams identify this information and act to protect it. 

Why Is Data Discovery Important?

Two relatively recent trends make data discovery critical: data proliferation and the cloud. 

Data multiplies. Every business activity generates new information each day. Much of this data is valuable since you can mine it for business intelligence. But the data is often sensitive or regulated, too. Discovery ensures that you know where your data is so that you can manage it safely. 

Some data isn't worth keeping, but that doesn't mean it doesn't contain sensitive or regulated information. So, you need to identify and manage it, too, even if "managing" it simply means destroying it as quickly as possible or ensuring that you don't collect it anymore. 

An old meme says the cloud is just someone else's computer. There's some truth to that joke: if your data is in the cloud, it's stored on systems that don't belong to you. But the data still does, and if something goes wrong, you'll still be responsible. 

This doesn't just pertain to cloud hosting, because the cloud includes SaaS offerings like Dropbox, Google Suite, and Office365. Should you allow your company's data on these systems? If so, you still need to know where it is. If not, you need to find out if it's there and get it out ASAP. 

Of course, cloud hosting and cloud applications are still a factor, too. If you're using a log aggregation service, it probably contains sensitive information. It's up to you to make sure that the data isn't made available to the wrong people and that it's kept for as long as reporting requires. 

Data discovery is the only way you can identify your company’s data and ensure that the right controls are in effect. 

How Can I Do Data Discovery?

The good news is that there is a simple way to perform data discovery, and a wealth of tools that can help you. Simple doesn't mean "easy," though. Data discovery is a lot of work, and it's a process that's never done. Even after you discover all the data your company uses today, there will be more tomorrow. 

Let's go over five essential steps you can take to find your data and keep it safe. 

Collect

The first step is to gather all your organization's data. Sensitive, non-sensitive, regulated: it doesn't matter. Gather it all.

Depending on the current state of your data management, this step may be the most difficult. It requires interacting with every department and examining the data they produce and consume. 

It's also with this first step that a data discovery platform can help. The platform integrates with your systems and helps with locating, identifying, and collecting your data. 

The result of this step is a consolidated view of your enterprise data. 
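To make the idea of a consolidated view concrete, here's a minimal sketch that walks a directory tree and records one inventory entry per file. It assumes your data lives on a file system you can read. The `InventoryEntry` type and `build_inventory` function are hypothetical names used for illustration, and a real discovery platform would also cover databases, SaaS applications, and cloud storage.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    path: str        # where the file lives
    size_bytes: int  # how large it is
    sha256: str      # content hash, useful for spotting duplicate copies


def build_inventory(root: str) -> list:
    """Walk a directory tree and return one InventoryEntry per file, sorted by path."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Hash in chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            entries.append(
                InventoryEntry(path, os.path.getsize(path), digest.hexdigest())
            )
    return sorted(entries, key=lambda entry: entry.path)
```

Two entries with the same hash are copies of the same content, which is a quick way to spot data that has spread to places it may not belong.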

Analyze and Curate

Now that you've gathered the data, it's time to analyze it. 

Each data set fits in one or more categories, based on its contents. How you curate it depends on your business. You need to organize the data in the way that best suits how you need to protect and report on it. 

One of the most important factors is the regulations that apply to you. While many jurisdictions have rules regarding PII, how they define it and how you report on it differs. So, you'll need to process personal data differently depending on where it comes from, where you store it, and which jurisdictions you're accountable to. 

Next comes data that's sensitive, but not regulated. What information is critical to your company? One obvious answer is your source code. It may contain trade secrets or serve as a road map for hackers. You may also consider transaction history sensitive, even once you cleanse it of PII. Or you may need to keep those records in order to conform to contractual or legal requirements. 

Finally, there's the data that's neither regulated nor sensitive. This information may or may not be useful for business intelligence, so it needs to be classified based on where it belongs and how long you will keep it. 

This is another area where the right discovery platform can help. It can give you a consolidated view of your data and may have tools to automate data classification. 
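As a rough illustration of automated classification, the sketch below sorts a piece of text into the three groups described above: regulated, sensitive, or neither. The patterns and keywords are assumptions chosen for the example. They will produce false positives and miss plenty of real data, so treat them as a starting point rather than a working detector.

```python
import re

# Illustrative patterns only; real detectors need validation and far more coverage.
REGULATED_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Hypothetical markers for data that is sensitive but not regulated.
SENSITIVE_KEYWORDS = ("confidential", "proprietary", "trade secret")


def classify(text: str) -> dict:
    """Label text as regulated, sensitive, or unclassified, and report what matched."""
    regulated = sorted(
        name for name, pattern in REGULATED_PATTERNS.items() if pattern.search(text)
    )
    lowered = text.lower()
    sensitive = [keyword for keyword in SENSITIVE_KEYWORDS if keyword in lowered]
    if regulated:
        category = "regulated"
    elif sensitive:
        category = "sensitive"
    else:
        category = "unclassified"
    return {
        "category": category,
        "regulated_matches": regulated,
        "sensitive_matches": sensitive,
    }
```

Regulated matches take priority here because they usually carry the strictest handling requirements. Your own ordering may differ depending on which rules apply to you.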

Organize and Sanitize

Next, convert the results of your curation into action. 

If you've discovered sensitive or regulated data in places it doesn't belong, remove it and get it back under control. This may require security controls that limit access to cloud storage or email, which we'll cover in the next step. 

If you are keeping data longer than you need to, purge it. This may reduce storage costs, eliminate unnecessary work, and shrink the attack surface available to hackers. 
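Here's one way a purge decision might look in code. This is a sketch under stated assumptions: the `Record` type, the category labels, and the retention periods in `RETENTION_DAYS` are all hypothetical, and the real values should come from your legal and compliance requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str     # hypothetical label assigned during curation
    created_on: date


# Hypothetical retention periods in days, keyed by category.
RETENTION_DAYS = {"regulated": 365, "sensitive": 730, "unclassified": 90}


def split_by_retention(records, today):
    """Return (keep, purge) lists based on each record's category and age."""
    keep, purge = [], []
    for record in records:
        limit = timedelta(days=RETENTION_DAYS[record.category])
        if today - record.created_on > limit:
            purge.append(record)
        else:
            keep.append(record)
    return keep, purge
```

The function only decides what to purge. Keeping the actual deletion as a separate step makes it easier to review the list before anything is destroyed.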

Protect

Finally, put policies and procedures in place to protect the data you've identified. 

We already mentioned one measure above: some companies limit or eliminate access to cloud applications. 

There are more steps for you to consider: 

  • Secure physical storage for computer systems, backup media, and paper files.
  • VPN and firewalls for work-from-home and client connections.
  • Encryption at rest for sensitive and regulated data.
  • High-availability systems for all client data.
  • Limiting access to PII to only the employees who need to see it.

The protection measures that make sense for your company vary with the data you collect and with how you use it. 
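The last item in the list, limiting access to PII, can be sketched as a simple role check. The field names, role names, and `view_for_role` function below are assumptions made up for the example. In practice you would enforce this in your database or identity system rather than in application code alone.

```python
# Hypothetical field and role names for illustration.
PII_FIELDS = {"name", "email", "ssn"}
ROLES_WITH_PII_ACCESS = {"support", "compliance"}


def view_for_role(record: dict, role: str) -> dict:
    """Return a copy of the record, redacting PII unless the role needs to see it."""
    if role in ROLES_WITH_PII_ACCESS:
        return dict(record)
    return {
        field: "[REDACTED]" if field in PII_FIELDS else value
        for field, value in record.items()
    }
```

For example, `view_for_role(customer, "marketing")` would return the customer record with the name, email, and SSN fields replaced by a placeholder, while a compliance user would see the full record.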

Repeat

Data discovery is an ongoing process. A discovery platform will provide you with ongoing monitoring and reports as your company creates new information. But you'll still need to schedule regular audits and reviews. 
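One way to support those regular audits is to compare snapshots of your data inventory over time. The sketch below assumes each audit produces a mapping from a file path to a content hash. The `diff_inventories` name and the snapshot format are hypothetical.

```python
def diff_inventories(previous: dict, current: dict) -> dict:
    """Compare two {path: content_hash} snapshots and report what changed."""
    previous_paths, current_paths = set(previous), set(current)
    return {
        # Data that appeared since the last audit and still needs classifying.
        "added": sorted(current_paths - previous_paths),
        # Data that was purged or moved since the last audit.
        "removed": sorted(previous_paths - current_paths),
        # Data whose contents changed and may need to be reclassified.
        "changed": sorted(
            path
            for path in previous_paths & current_paths
            if previous[path] != current[path]
        ),
    }
```

Anything in the "added" list is new information that hasn't been through the earlier steps yet, so it's a natural place to start the next review.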

Data Discovery

In this post, we covered data discovery and how it will help you secure your data. We started out by defining data discovery and how, in the context of data security, it's a critical business process. Then we went over five decisive steps you need to take today to find your data and classify, organize, and protect it. We also saw how a data discovery platform can help you by finding data, collecting it, and providing you with tools to curate it. 

This post was written by Eric Goebelbecker. Eric has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective!).
