Working with data requires care and precision, yet it often remains an under-scrutinized aspect of DevSecOps, largely because it means keeping track of many moving parts. You need to know exactly when data events occur, what parties are involved, and how they send and store data. In any process with more than minimal complexity, this is a huge web of events. Data flow mapping is the key to detangling that web.
Put simply, data flow mapping is a high-level look at the architecture of a system. It allows you to step back and see moving parts clearly. This can provide insight and transparency into the security of the system’s processes as a whole. In this post, we'll look at why this is so important, and how you can map your own data flows.
Why Data Flow Mapping Is Important
Let's look at an industry where data flow mapping can come in handy: healthcare. Data protection rules in healthcare are often extremely strict and diligently enforced. Let’s look at the (somewhat simplified) data flow events for a clinical trial:
1. A company initiates the clinical trial with a request, providing information on the trial treatment.
2. A separate management entity will then run the trial.
- The management entity first shares the clinical trial protocol with an organization capable of administering treatment to patients.
- This organization will coordinate with the management entity in the handling of this protocol throughout the trial via an intermediate processing unit. This unit receives anonymized data from the treatment organization. It then shares this data with the management entity as statistical data.
- The management entity collects patient information that is relevant to the trial and shares it with the treatment administration organization.
- In the event of serious or adverse events, the management entity and treatment organization communicate these to the healthcare system. The healthcare system then manages emergency treatment through the treatment administration organization.
3. The treatment administration organization works with patients and records their responses to the trial treatment. It then passes the results back to the management entity (via the intermediate processing unit, which turns them into statistical data) and the healthcare system.
4. Once the trial is complete, the management entity makes evaluations and hands these back to the company that requested the trial.
Every step must be carefully analyzed and recorded to ensure that patient information, company data, and other sensitive data are stored properly and change hands in accordance with the clinical trial protocol, data protection laws such as HIPAA, and security and privacy frameworks such as those published by NIST.
The Who, How, and Why of Data Flow Mapping
To find security risks in transmission and storage at each of these steps, we need to know the following information:
- Who has access to what data, and where is it stored?
- Who needs access to what data, and where is it stored?
- Who is sending the data, and where?
- How are they sending the data?
- How are they storing the data?
- Why do they need the data (a.k.a. the uses for the data)?
This simplified view of an overall controlled process is already a headache to keep track of. When working with large development projects, it can get even more complicated, as large projects often include many teams working separately, and third-party sources of data events, such as APIs and third-party analytics software.
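One way to keep the who, how, and why questions from getting lost is to record each data event in a fixed structure. The following is a minimal sketch, not a prescribed format: the `DataFlow` class, its field names, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

UNKNOWN = "unknown"  # placeholder for a question nobody has answered yet


@dataclass(frozen=True)
class DataFlow:
    """One data event: who sends what to whom, how, and why."""
    sender: str
    receiver: str
    data: str
    transport: str = UNKNOWN  # how the data is sent
    storage: str = UNKNOWN    # how and where the receiver stores it
    purpose: str = UNKNOWN    # why the receiver needs it


def open_questions(flow):
    """Return the names of the fields that are still unanswered."""
    return [name for name, value in asdict(flow).items() if value == UNKNOWN]


# Hypothetical entries loosely based on the simplified clinical trial.
flows = [
    DataFlow("company", "management entity", "trial treatment information",
             purpose="run the trial"),
    DataFlow("treatment organization", "processing unit", "anonymized data",
             transport="sftp", storage="encrypted database",
             purpose="statistical analysis"),
]

for flow in flows:
    print(flow.sender, "->", flow.receiver, "open:", open_questions(flow))
```

Any flow with open questions is a place where the map is still incomplete.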
Data Flow Mapping and Compliance
The inherent complexity of data sharing leads to an often overwhelming number of data events that need to be followed in order to track potential security or handling issues. Data events must also be documented up to the standards of data protection laws such as the GDPR, making this an even larger undertaking. Depending on the country and sector, we may also need to know:
- when data crosses borders from country to country,
- the type and nature of data,
- compliance and contracting specifics for third-party services, and
- recording and documentation policies under local protection laws.
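The first item on this list, border crossings, is easy to check mechanically once each entity has a recorded location. This is a rough sketch under stated assumptions: the entity names, the country assignments, and the `cross_border` helper are all hypothetical.

```python
# Hypothetical entity locations (ISO 3166-1 alpha-2 codes), invented for illustration.
LOCATION = {
    "company": "US",
    "management entity": "DE",
    "treatment organization": "DE",
    "analytics service": "US",
}

# (sender, receiver, type of data) for each transfer under review.
TRANSFERS = [
    ("company", "management entity", "trial treatment information"),
    ("management entity", "treatment organization", "patient information"),
    ("treatment organization", "analytics service", "anonymized data"),
]


def cross_border(transfers, location):
    """Return (data, origin, destination) for transfers between different countries."""
    crossings = []
    for sender, receiver, data in transfers:
        origin, destination = location[sender], location[receiver]
        if origin != destination:
            crossings.append((data, origin, destination))
    return crossings


for data, origin, destination in cross_border(TRANSFERS, LOCATION):
    print(f"{data}: {origin} -> {destination}")
```

Which laws apply to each crossing still has to be decided by a person. The code only points out where to look.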
At this point, we can start to see why we want an easy-to-comprehend, high-level view of our data handling structure. For example, that wordy list of entities and data events for a clinical trial, when in data flow form, looks more like this:
While the written list of data events isn’t the most readable thing in the world, even in simplified form, a data map such as this makes it easier to see who’s handling and storing what data. Furthermore, it can also serve as an aid to reports needed for compliance!
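A data map can also be kept as plain data and rendered by a diagramming tool. The sketch below writes the clinical trial events from the list earlier in this post as sender/receiver pairs and emits Graphviz DOT text. The short entity names and edge labels are our own paraphrase of that list, and `to_dot` and `received_by` are hypothetical helpers.

```python
# (sender, receiver, data) triples paraphrased from the clinical trial steps.
EDGES = [
    ("company", "management entity", "trial request"),
    ("management entity", "treatment organization", "trial protocol"),
    ("management entity", "treatment organization", "patient information"),
    ("treatment organization", "processing unit", "anonymized data"),
    ("processing unit", "management entity", "statistical data"),
    ("management entity", "healthcare system", "adverse events"),
    ("treatment organization", "healthcare system", "adverse events"),
    ("treatment organization", "healthcare system", "results"),
    ("healthcare system", "treatment organization", "emergency treatment"),
    ("patients", "treatment organization", "treatment responses"),
    ("management entity", "company", "evaluations"),
]


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    lines = ["digraph data_flow {"]
    for sender, receiver, data in edges:
        lines.append(f'  "{sender}" -> "{receiver}" [label="{data}"];')
    lines.append("}")
    return "\n".join(lines)


def received_by(edges):
    """Map each entity to the set of data types it receives."""
    received = {}
    for _, receiver, data in edges:
        received.setdefault(receiver, set()).add(data)
    return received


print(to_dot(EDGES))
```

Keeping the edges as data means the same list can feed both the diagram and simple queries such as `received_by`.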
How Data Flow Mapping Works
So, now that you understand why data flow mapping is useful, let's find out how it works.
There are two main types of data flow mapping: manual and automatic. While some elements of data flow mapping are inevitably going to be manual, automation can help make your data flow map more accurate and more secure.
Once you or your data protection officer (DPO) decide on the level of automation, data flow mapping is just about following the data.
Step 1: Identify Entities with Access to Data
Identifying entities with access to data is a vital step, regardless of whether you use automation. Even when using data discovery tools, you need to know where to apply them. When collecting a list of entities to map, consider the following:
- Internal or external transfers: Where is the data handled? If it's being handled internally, what departments, people, or roles will be working with it? If it's being handled externally, how is the data being shared? What measures are being taken to coordinate compliance questions?
- Local or cloud-based servers: Is the data being stored locally or in the cloud? If it's being stored on cloud-based servers, what data protection laws apply?
- Third-party access: Be sure to list any third parties with access to the data. These may include APIs, dependencies, external services, etc. Consider ways in which third parties may have hidden or unexpected data sharing.
A good practice is to always keep track of the end goal of data sharing. For each person or entity with access to data, ask yourself:
- What data do they need for their purposes?
- What are their purposes for the data?
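These two questions can be turned into a simple comparison between what an entity can access and what its purpose requires. The inventory below is invented for illustration, and `excess_access` is a hypothetical helper, so treat this as a sketch rather than a finished audit.

```python
# Hypothetical inventory: what each entity can access vs. what its purpose requires.
INVENTORY = {
    "treatment organization": {
        "purpose": "administer the trial treatment",
        "has": {"patient names", "patient contact details", "treatment responses"},
        "needs": {"patient names", "treatment responses"},
    },
    "processing unit": {
        "purpose": "produce statistical data",
        "has": {"anonymized responses"},
        "needs": {"anonymized responses"},
    },
}


def excess_access(inventory):
    """Return, per entity, the data it can access but does not need."""
    report = {}
    for entity, record in inventory.items():
        extra = record["has"] - record["needs"]
        if extra:
            report[entity] = sorted(extra)
    return report


print(excess_access(INVENTORY))
```

Anything in the report is a candidate for tightening access, or for documenting a purpose that was missed.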
Step 2: Identify Personal Data
You or your DPO should now have a coherent view of who to scrutinize. Knowing this, you can assemble a team or employ data discovery software to find instances of personal data. Personal data encompasses any data that can be used to identify a person. This includes names, emails, locations, and biometric data.
After finding these instances of data, categorize and label them with the specific data types they may contain. Pay specific attention to the following:
- Special cases and regulations. Personal data from specific sectors, such as healthcare and finance, is more rigorously protected.
- Specific handling requirements. Protection laws often vary from region to region. Be sure to find the rules for compliance in the source region, and in all other regions where the data may travel.
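To give a feel for what discovery and labeling involve, here is a deliberately small sketch that scans free-text records for two kinds of personal data. The patterns, the label names, and the sample records are assumptions for illustration only. Real data discovery software uses far more robust detection than two regular expressions.

```python
import re

# Illustrative patterns only; these will miss many real-world formats.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def label_personal_data(records):
    """Map each record ID to the sorted labels of personal data found in its text."""
    labels = {}
    for record_id, text in records.items():
        found = sorted(
            name for name, pattern in PATTERNS.items() if pattern.search(text)
        )
        if found:
            labels[record_id] = found
    return labels


# Made-up sample records.
sample = {
    "note-1": "Contact the participant at jane.doe@example.com",
    "note-2": "Dose increased to 20 mg on day 3",
    "note-3": "SSN on file: 123-45-6789, email j@example.org",
}

print(label_personal_data(sample))
```

Records without a label are not necessarily free of personal data. They only passed these particular checks.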
Step 3: Scan Data Handlers for Security Concerns
Once sensitive data is located and categorized, you must scan and identify security concerns for each data handling step. Much of this, again, can be done with automated tools such as risk detection engines and data discovery and classification tools. However, always scrutinize the end result with an understanding of the process at hand. Automation helps find data or security risks that may be overlooked, but treat it as a tool instead of a one-size-fits-all solution.
From steps 1 and 2, you should already have a comprehensive overview of who has access to what data. With this knowledge, you can return to the questions from step 1: Who needs the data, and for what purposes? What security issues may arise from sharing the data with them? The data discovery stage may also surface additional insights.
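A scan of this kind boils down to running rules over each recorded flow. The sketch below applies two example rules. The `Flow` fields, the set of transports treated as encrypted, and the sample flows are all assumptions, and real risk detection engines cover much more ground.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    sender: str
    receiver: str
    data_label: str  # category assigned during data discovery, e.g. "personal"
    transport: str   # e.g. "https", "sftp", "email"
    purpose: str     # empty string if no purpose was recorded


# Assumption: which transports count as encrypted is a policy decision.
ENCRYPTED_TRANSPORTS = {"https", "sftp"}


def find_concerns(flows):
    """Return (sender, receiver, concern) for each rule a flow violates."""
    concerns = []
    for flow in flows:
        if flow.data_label == "personal" and flow.transport not in ENCRYPTED_TRANSPORTS:
            concerns.append((flow.sender, flow.receiver,
                             "personal data sent over unencrypted transport"))
        if not flow.purpose:
            concerns.append((flow.sender, flow.receiver,
                             "no recorded purpose for access"))
    return concerns


# Hypothetical flows for illustration.
flows = [
    Flow("management entity", "treatment organization", "personal", "email",
         "identify trial participants"),
    Flow("treatment organization", "processing unit", "anonymized", "sftp",
         "statistical analysis"),
    Flow("treatment organization", "analytics service", "personal", "https", ""),
]

for concern in find_concerns(flows):
    print(concern)
```

Each flagged item still needs a human decision about whether it is a real risk in context.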
Step 4: Document and Report Data
Organize the information from each step into your data flow map. Most data flow mapping tools have plenty of options for collecting information about the location, categorization, and life cycle of data. This allows you to have a continuous, detailed view of where data travels. Make sure to format the map so that it's legible and useful for both internal use and compliance purposes.
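If you keep the map as structured records, producing a report can be as simple as exporting them. Here is a minimal sketch using Python's standard `csv` module. The column names and the sample row are hypothetical, and your compliance requirements may call for different fields.

```python
import csv
import io

# Hypothetical report columns; adjust to what your regulator or auditor expects.
FIELDS = ["sender", "receiver", "data", "category", "storage", "purpose"]


def to_csv(rows):
    """Serialize data flow records (dicts keyed by FIELDS) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample row.
rows = [
    {
        "sender": "management entity",
        "receiver": "company",
        "data": "evaluations",
        "category": "company data",
        "storage": "internal file server",
        "purpose": "report trial outcome",
    },
]

print(to_csv(rows))
```

The same records can then be reused for diagrams or checks, so the report and the map stay in sync.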
Data Flow Mapping Makes Your Life Easier
At the end of the day, the purpose of data flow mapping is to be able to see what's happening with your data. Collecting and cataloging data in a clear visual manner allows you to understand where to look for issues and catch them. This makes a world of difference when thinking about security and compliance, and it’s well worth the effort.
This post was written by Vivienne Roberts. Vivienne is a versatile medical physicist turned developer involved in all manner of projects, from web development and database management for small businesses to machine learning in agricultural automation. Their focus is on providing software for companies and community organizations in need of solid infrastructure.