
APIDays: Data Privacy in the age of cloud-native applications

APIDays is a world series of conferences about—you guessed it—APIs. It made a lot of sense for us to attend it in past years, since we started Bearer as an API monitoring platform. As we pivoted to a data security product a year ago, we wondered if we still had something to contribute.

That was until we learned that APIDays would host the Privacy Engineer Conference. Since we had the chance to learn from Data Protection Officers (DPOs) and privacy engineers over the past months, we thought we could share their insights with the crowd. If you didn't follow the event, here is the core of our talk.

The talk in its entirety is now available here.

Key concepts

Before diving into the topic, it's important to understand three concepts:

Data privacy refers to the right of an individual to have control over their personal data and the technical procedures to ensure these rights. It is the area of expertise of DPOs and privacy teams, whose objective is to comply with privacy laws such as GDPR.

Data security refers to the protection of data from any unauthorized access or malicious attacks. It is the area of expertise of security teams, whose objective is to prevent data breaches. Data security is one of the key principles of any modern data protection law (for instance “Integrity and confidentiality” for GDPR).

Privacy engineering refers to the methodologies, tools, and techniques used by engineers to integrate privacy principles into the product building process. It is the technical side of the privacy profession, which aims at filling the gap between the law and code. It is still an emerging role that is often taken on by the security team.

Data privacy is paramount for tech companies

Few companies can, or should, escape privacy laws if their software products process personal data. We have the GDPR in Europe, the CCPA in California, PIPEDA in Canada, PIPL in China, and the LGPD in Brazil, and according to Gartner, 65% of the world's population will have its personal data covered by modern privacy regulations by 2023. As a tech company, it's hard to imagine that none of your customers live in those countries, or that you won't hold some of their data.

By the way, privacy laws now actually have teeth: GDPR fines go up to €20M or 4% of annual worldwide revenue, whichever is higher. Technology companies, and not only FANG (Facebook, Amazon, Netflix, Google), are starting to feel the urgency: N26 was fined €50,000 in 2019, Delivery Hero €200,000 in 2019, Deliveroo €2.5M in 2021, and TikTok €750,000 in 2021.

Besides the strict legal requirement, customers are simply asking for more transparency. Some companies understand this and turn it into a competitive advantage. Apple is the obvious example, but many smaller tech startups play this angle too. For example, Plausible, a privacy-focused analytics company that we use at Bearer, markets itself to privacy-minded companies and individuals.

Privacy teams can’t keep up with engineering

Diagram of how mapping, documentation, and identification play a key role in privacy.

Engineering organizations are increasingly complex. They hire more developers than ever, architectures are moving to microservices, and teams use more and more third-party services thanks to the API-first economy. Essentially, infrastructure is becoming extremely fragmented, creating unprecedented complexity for privacy teams.

Picture this: a DPO joins a company subject to GDPR, with 350 engineers, thousands of internal applications, and dozens of integrations with third-party APIs. They need to map personal data flows to build their Record of Processing Activities (ROPA), answer Data Subject Access Requests (DSARs), and identify privacy risks. They spend 80 hours interviewing developers to do so. Months later, their work is only 20% complete and already obsolete. This isn't hyperbole. We've heard this story from DPOs multiple times over.

You just can't keep up with the pace of engineering changes manually. You should adopt a DevOps mindset to map data flows and identify privacy risks proactively. Here are the best practices we observed 👇

Engineering to the rescue 

1. Identify your engineering assets

First, bring clarity to your software architecture (applications, databases, third-party services). Startups often rely on manually updated diagrams and spreadsheets to build their inventory of engineering assets. That works to begin with, but it doesn't scale.

You can rely on code annotation practices, for example by requiring developers to add structured comments to their code, to ensure new engineering assets are systematically detected and added to your inventory. Some organizations use JIRA Insight for this. The challenging part is ensuring developers follow your guidelines consistently, and dealing with the millions of lines of code already written.
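To make the annotation idea concrete, here is a minimal sketch. The `@privacy-asset` comment convention and the asset types are hypothetical, invented for illustration; any structured tag your team agrees on would work the same way.

```python
import re

# Hypothetical convention: developers tag code that introduces a new
# engineering asset with a comment like "# @privacy-asset: <type> <name>".
ANNOTATION = re.compile(r"@privacy-asset:\s*(?P<type>\w+)\s+(?P<name>[\w.-]+)")

def find_assets(source: str):
    """Return (type, name) pairs for every annotated asset in a source file."""
    return [(m.group("type"), m.group("name")) for m in ANNOTATION.finditer(source)]

code = """
# @privacy-asset: database users_db
# @privacy-asset: third_party stripe.com
def charge_customer(user):
    ...
"""
print(find_assets(code))  # → [('database', 'users_db'), ('third_party', 'stripe.com')]
```

Running a scanner like this in CI keeps the inventory in sync with every merge, instead of relying on developers to also update a spreadsheet.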

2. Identify personal data flows

Second, map personal data flows between your engineering assets. You can rely on manual surveys here. We have seen privacy engineering teams build powerful GitHub or JIRA workflows that collect data flow information from engineering teams whenever meaningful code changes happen, and populate their inventory with it.
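The core of such a workflow is deciding which code changes are "meaningful". A rough sketch of that triggering logic, with made-up path patterns (in practice this would run inside a GitHub Action or JIRA automation against the pull request's diff):

```python
from fnmatch import fnmatch

# Hypothetical watchlist: paths whose changes typically alter personal data flows.
WATCHED_PATTERNS = [
    "*/models/*",        # ORM models may add personal data fields
    "*/migrations/*",    # schema changes
    "*/integrations/*",  # new or modified third-party data flows
]

def needs_data_flow_review(changed_files):
    """Return the changed files that should trigger a data flow survey ticket."""
    return [f for f in changed_files
            if any(fnmatch(f, p) for p in WATCHED_PATTERNS)]

diff = ["app/models/user.py", "README.md", "app/integrations/stripe.py"]
print(needs_data_flow_review(diff))  # → ['app/models/user.py', 'app/integrations/stripe.py']
```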

The ideal approach is obviously to automate data flow mapping, though it's highly challenging. Existing data cataloging tools only cover your production databases, so they won't help you understand your data flows with third-party services. They also require access to your most critical assets: your data and your code. As for proxies and in-app agents, they are heavy to deploy and maintain over time, and they create new security risks, as we've seen with the SolarWinds and Log4j exploits.

3. Audit privacy and security controls

Third, document security and privacy information to identify risks against your specific data policies. We're talking about data subjects, data processing activities, lawful bases, processing locations, data retention, security measures, and Data Processing Agreements (DPAs). Still with us? This information can hardly be collected automatically, so you'll need to collaborate with your engineering team to get it.

Only then can you complete your ROPA, answer DSARs, and identify risks such as:

  • Collecting more personal data than described in your privacy policy.
  • Sharing more personal data with a third-party integration than described in your Data Processing Agreement.
  • Transferring personal data out of Europe.
  • Storing data longer than described in your data retention policy.
  • Missing end-to-end encryption on health data you process. 
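Checks like these can be encoded directly against your inventory. A simplified sketch, where the policy rules and the inventory record's field names are illustrative, not a real schema:

```python
# Illustrative data policy; thresholds and field names are made up for the example.
POLICY = {
    "allowed_regions": {"eu-west-1", "eu-central-1"},   # keep data in Europe
    "max_retention_days": 365,                          # retention policy
    "encryption_required_for": {"health"},              # sensitive categories
}

def find_gaps(record):
    """Compare one inventory record against the policy and list violations."""
    gaps = []
    if record["region"] not in POLICY["allowed_regions"]:
        gaps.append(f"data stored outside Europe: {record['region']}")
    if record["retention_days"] > POLICY["max_retention_days"]:
        gaps.append(f"retention exceeds policy: {record['retention_days']} days")
    if record["category"] in POLICY["encryption_required_for"] and not record["encrypted"]:
        gaps.append("sensitive data stored without encryption")
    return gaps

record = {"region": "us-east-1", "retention_days": 730,
          "category": "health", "encrypted": False}
for gap in find_gaps(record):
    print("RISK:", gap)
```

Run over the whole inventory on every change, this turns the quarterly manual review into a continuous check.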

4. Monitor privacy and security risks continuously

Most privacy teams review their inventory every quarter or year. They manually compare what they do (their inventory) with what they are supposed to do (their data policies) to identify gaps. It is time-consuming and issues can go unnoticed for months before being mitigated.

Privacy engineers aim to automate all these steps, and we at Bearer aim to empower them.

The Bearer way

Empowering Privacy Engineers

We help security and privacy engineering teams at product-centric companies mitigate data security and privacy risks. 

Bearer scans your code repositories to automatically catalog your engineering assets, map your personal data flows, and identify risk based on your data policies. 

Why code scanning? We figured it was the smoothest way: no code changes required from your engineers, no connection to your databases, no access to your data. Is it the simplest approach to build? Clearly not, but that's why we have an amazing engineering team that continues to grow!
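To give a feel for the idea (and only the idea: this is a toy, nothing like a production scanning engine), a scanner can flag identifiers that look like personal data and imports of known data recipients. The identifier list and third-party watchlist below are invented for the example:

```python
import re

# Toy heuristics: identifiers that often denote personal data, and a
# hypothetical watchlist of third-party SDKs that receive data.
PII_PATTERN = re.compile(r"\b(email|phone|ssn|date_of_birth|ip_address)\b")
THIRD_PARTIES = {"stripe", "sendgrid", "mixpanel"}

def scan(source: str):
    """Collect likely personal-data identifiers and third-party imports."""
    findings = {"pii_identifiers": set(), "third_parties": set()}
    for line in source.splitlines():
        findings["pii_identifiers"] |= set(PII_PATTERN.findall(line))
        m = re.match(r"\s*import\s+(\w+)", line)
        if m and m.group(1) in THIRD_PARTIES:
            findings["third_parties"].add(m.group(1))
    return findings

sample = "import stripe\nuser = {'email': e, 'ssn': n}\n"
print(scan(sample))
```

A real engine needs type inference, data flow analysis, and far better classification, which is exactly why the problem keeps an engineering team busy.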


Bring the speed of DevOps to data security

Bearer helps companies processing sensitive data identify and mitigate data security risks across the software development lifecycle.