Bearer | Your data map is missing APIs and dependencies

In order to ensure compliance with the growing list of personal privacy regulations—like GDPR, CCPA, and PDPA—your company needs to know how it handles the personal information of your customers, users, and even visitors.

A data flow map helps bring visibility into how data moves throughout your application, but it should also include any instances where data leaves your application. This is where APIs come in. Do you know which APIs are using personally identifiable information (PII) from your application? How about logging and bug tracking tools? Are they capturing user data and storing it without your knowledge? Does your privacy policy make users aware that these third parties will be processing their data? These are all questions that a thorough and automatically updated data flow map can answer.

The data map tells a story

Data maps represent how data flows through your organization. They organize the journey of each piece of information from when it is collected until it is deleted or placed in long-term storage.

A data map is the story of a piece of customer information: how you first learned it, what you did with it, and when you finally deleted it.

The format a data map takes varies from company to company. Some use spreadsheets and label each piece of PII, while others use dedicated flowcharts and data mapping tools that can provide a more visual "map" of how information flows through your organization. Regardless of how the data flow map is created, all data maps answer the following questions:

Where is the data collected?

This could be a form filled in by a user or an authorized data transfer from a third-party service or API, such as receiving a name and email address as part of an OAuth exchange.

What personal data is collected?

Any piece of information that could fall into the category of "personal information" should be tracked. To learn more about what data falls under GDPR, see our article The Essentials of Personally Identifiable Information, which covers the various types of personal information (PI) and personally identifiable information (PII).

Where is the data stored, and in what format?

If the data is stored on-site or in privately controlled data centers, this should be easier to identify. If it is stored with a cloud provider, which is more common these days, each piece of data should be tagged with the correct location. Keep in mind that some regional legislation has very specific rules for how data can be transmitted across borders. The map should also indicate whether the data is encrypted.

Where does the data go?

User information is used throughout your products. Some data may be limited to a single feature, while other data, like email addresses and names, may touch many parts of your product. A data flow map should highlight where personal data moves, both internally within your application and externally to any third-party integrations or data processors.

Who is responsible for each data transfer?

If a piece of personal data is identified on the data flow map as transferring between storage locations or between services, who is responsible for this action? It can be the individual or team that oversees the feature or service handling the transfer, but either way an "owner" needs to be established for each transfer on the data map. This ensures that the whole team plays an important role in the protection of personal data. It is good practice to tag every data point with each part of your organization that interacts with it.

How is the data used?

This question is bigger than your data map. Privacy regulation increasingly requires that you only use data for the purpose for which it was originally collected. This means obtaining consent from the individual when you collect the data or change the purpose. This is known as lawful collection. Defining how the data is used as part of your data map keeps you honest, and it makes it clear to everyone in the organization that the data isn't free to use however an individual team pleases. This also means that any new use will result in an update to the map, along with updates to the answers to the other questions in this section. Now is a good opportunity to ensure that your usage matches the privacy policy that customers and users agree to.

How long is the data retained?

Earlier I mentioned that a data map shows the journey of personal data through your organization. That journey ends either when the data is deleted or when it is placed in some kind of long-term storage. Many teams have rules for how long inactive accounts are kept, but this is a good time to also establish how long to keep data that is no longer needed. In fact, some regulation even requires that data be removed once it has served its initial purpose. Including the retention timeline for each piece of personal data in your data map helps ensure that the timeline is actually being followed. This question also presents you with an opportunity to investigate how long your external data processors retain information about your customers. Even if you only have access to data for a limited amount of time, they may be storing it longer.
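The questions above translate naturally into one record per piece of personal data. Below is a minimal sketch in Python, assuming a hypothetical `DataMapEntry` structure. The field names, the `deletion_due` helper, and the example values are illustrative only; they are not a standard schema or the format of any particular data mapping tool.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DataMapEntry:
    """One piece of personal data and the answers to the questions above."""
    data_type: str        # What personal data is collected?
    collected_from: str   # Where is the data collected?
    storage_location: str # Where is the data stored...
    storage_format: str   # ...and in what format?
    encrypted: bool       # Is it encrypted?
    purposes: list[str]   # How is the data used?
    retention_days: int   # How long is the data retained?
    owners: list[str] = field(default_factory=list)                 # Who is responsible?
    internal_destinations: list[str] = field(default_factory=list)  # Where does the data go?
    external_processors: list[str] = field(default_factory=list)    # Third parties receiving it


def deletion_due(entry: DataMapEntry, collected_on: date) -> date:
    """Date by which the data should be deleted or moved to long-term storage."""
    return collected_on + timedelta(days=entry.retention_days)


# Hypothetical example entry.
email = DataMapEntry(
    data_type="email address",
    collected_from="signup form",
    storage_location="eu-west (cloud provider)",
    storage_format="database column",
    encrypted=True,
    purposes=["account login", "billing receipts"],
    retention_days=365,
    owners=["accounts team"],
    internal_destinations=["billing service"],
    external_processors=["mailer.example"],
)

print(deletion_due(email, date(2023, 1, 1)))
```

Whether this lives in a spreadsheet, a flowchart tool, or code matters less than having every question answered for every entry.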

Data maps and GDPR compliance

The General Data Protection Regulation (GDPR) is the most famous piece of regulation where a data map is a valuable artifact. GDPR does not explicitly require a data flow map of the personal information your applications collect, but having one helps fulfill a handful of the law's requirements.

One requirement is the data protection impact assessment (DPIA) described in article 35. This assessment is required when processing is likely to result in a high risk to individuals, which is often the case when a new project introduces new kinds of data processing. It explains why data is being collected, how it is being used, and what measures are taken to reduce risk when handling the data. A data flow map covers most of this ground, as it catalogs every instance of data collection and explains how and why the data is used.

Data mapping for GDPR also helps you comply with article 30, which requires organizations to keep a written record of their data processing activities. It also supports article 6 by documenting that each collection has a lawful basis. This is particularly useful if your company is audited, as you'll have a complete picture of how data is collected and used to present to the auditors.

One area of particular interest is article 28, which governs the use of third-party data processors and requires a contract with each one. Meeting it means keeping an accurate list of those processors. Thorough data flow mapping ensures that all third-party processors, like APIs, logging services, and data storage centers, are accounted for and tracked.
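An inventory of third-party processors can be derived from the data map itself rather than maintained by hand. The following is a minimal sketch, assuming each map entry is a plain dictionary with hypothetical `data_type` and `external_processors` keys. The processor names are made up.

```python
from collections import defaultdict

# Hypothetical data map entries; real maps carry many more fields.
data_map = [
    {"data_type": "email address", "external_processors": ["mailer.example", "logs.example"]},
    {"data_type": "ip address", "external_processors": ["logs.example"]},
    {"data_type": "display name", "external_processors": []},
]


def processors_by_data(entries):
    """Invert the map: each processor -> sorted list of personal data it receives."""
    received = defaultdict(set)
    for entry in entries:
        for processor in entry["external_processors"]:
            received[processor].add(entry["data_type"])
    return {name: sorted(types) for name, types in sorted(received.items())}


for processor, types in processors_by_data(data_map).items():
    print(f"{processor}: {', '.join(types)}")
```

Viewed this way, a logging service that quietly receives email addresses shows up next to the processors you already expected.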

Perhaps the most intangible benefit toward GDPR compliance relates to article 25, which calls for data protection by design and by default. A data flow map shows an organization's ongoing commitment to data privacy.

How to begin mapping data points and creating a data map

Creating a data map is mostly a manual process. It is recommended that the data protection officer (DPO) within your company start by building and distributing a survey to all parts of the company. In it, they should request information about all the pieces of data collected in order to create a data inventory. Next, the DPO should meet with key stakeholders and leadership from each team to ensure the accuracy of the surveys and discuss how each team uses personal information.

The DPO also needs to gather any contracts, SLAs, and agreements the organization has with third-party vendors. This includes everything from data centers to recruiting tools, and especially APIs. Along with these business contracts, they should seek out all privacy and data sharing agreements that their vendors provide, and ask any vendor without a clear privacy policy to supply one.

From here, the data protection officer should have a better picture of how personal information, and data as a whole, travels through the organization. At this stage they can organize it using an existing data mapping tool or input all the data into the software of their choice.

While this process can be very manual, there are tools that monitor your usage of external services and APIs to track how you are sending PII to them. They won't create the full data map, but they can intelligently and, more importantly, automatically enhance it.
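To illustrate the idea behind such monitoring, here is a minimal sketch that inspects an outgoing payload for values that look like personal data. It is not how any specific product works. The `PII_KEYS` list, the simple email pattern, and the `find_pii` helper are all assumptions made for this example, and real detection is considerably more involved.

```python
import re

# Hypothetical key names treated as personal data.
PII_KEYS = {"email", "name", "phone", "address", "ip"}
# Deliberately simple pattern; it only approximates an email address.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def find_pii(payload, path=""):
    """Return sorted paths in a JSON-like payload that look like personal data."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if key.lower() in PII_KEYS:
                hits.append(child)  # flagged by key name
            else:
                hits.extend(find_pii(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_pii(item, f"{path}[{index}]"))
    elif isinstance(payload, str) and EMAIL_RE.fullmatch(payload):
        hits.append(path)  # flagged by value shape
    return sorted(hits)


# Example: an event about to be sent to a bug tracker.
outgoing = {
    "event": "checkout_failed",
    "user": {"id": 42, "email": "ada@example.com"},
    "tags": ["retry", "ada@example.com"],
}
print(find_pii(outgoing))
```

Running a check like this at the point where requests leave your application is what lets the map stay current without another round of surveys.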

Stay on top of changes

While building the initial data map is important for GDPR compliance and takes the bulk of the up-front work, keeping the data map accurate can prove challenging. For this, implementing a data governance strategy is important. In the past I've explored how implementing a governance strategy can help track APIs within an application, and even prevent shadow APIs. The same approach can be applied to protecting personal information. By adding a system of processes and checks to the workflow of all teams, you can prevent new data collection methods or unexpected data-leaking APIs from making their way into the codebase of your products. We've found that some of the most common areas where companies lose track of PII are third-party APIs, integrations, and development tools like bug trackers and user analytics.
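One such check could run as part of code review or continuous integration. The sketch below compares the hosts your code calls against the hosts already documented in the data map. The host names are hypothetical, and how the list of URLs is gathered from the codebase is assumed to happen elsewhere.

```python
from urllib.parse import urlparse

# Hosts already documented in the data map (hypothetical values).
DOCUMENTED_HOSTS = {"api.payments.example", "logs.example"}


def undocumented_hosts(urls, documented=DOCUMENTED_HOSTS):
    """Return hosts called by the code that the data map does not cover."""
    found = {urlparse(url).hostname for url in urls}
    found.discard(None)  # skip strings that are not absolute URLs
    return sorted(found - documented)


# URLs found in the codebase; collecting them is out of scope for this sketch.
urls_in_code = [
    "https://api.payments.example/v1/charges",
    "https://logs.example/ingest",
    "https://analytics.example/track",
]

missing = undocumented_hosts(urls_in_code)
if missing:
    print("Update the data map before merging:", ", ".join(missing))
```

A check this small won't tell you what data a new integration receives, but it does force the conversation before the integration ships.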

Protecting the personal data of your customers is about more than regulatory compliance. It is also becoming a major business advantage as customers grow more aware of the privacy risks their data faces every day.
