It is clear a new technology is taking hold when it becomes impossible to avoid hearing about it. That's the case with generative AI. Large language models (LLMs) like OpenAI's GPT-4 and the more approachable ChatGPT are making waves the world over.
Generative AI is exciting, and it's causing a real fear of missing out for tech companies as they try to match competitors. We're not just talking about consumer fascination or perceived "magic." There is a very real race to throw "Powered by AI" into every new feature announcement.
Every development team has at least thought about integrating with these tools. DevSecOps teams need to be engaged to support the inevitable functionality built on generative AI.
How teams use generative AI
OpenAI is the category leader in this space. What we're seeing are two main entry points for users: the superficial "paste things into ChatGPT" approach, and the use of OpenAI's developer APIs.
Giving ChatGPT sensitive data
The first approach may be unexpected for some organizations, as it has long been assumed that employees won't casually enter sensitive information into third-party services. Doing so generally violates company policy. Unfortunately, that's exactly what's happening. From doctors entering patient details to middle managers building presentations, more and more people are bragging about feeding data into ChatGPT. Adoption is moving so fast that we're starting to see companies ban its usage.
It is important to understand that OpenAI uses user-submitted data to improve its models. Users can opt out, but the vast majority won't. Data entered into ChatGPT is not private. To be fair to OpenAI, this is standard practice for most tech companies; the difference here is the ease and speed of adoption.
Processing sensitive data with OpenAI APIs
The second entry point is the one we at Bearer care more about: developers using GPT-N and LLM-style APIs. By default, OpenAI won't use data sent to its API to train its models. This is great, but teams are still sending sensitive data to a third party.
The main ways developers make use of the API are:
- Using the API to let users of their applications interact with the models, whether through chat, autocomplete, or any of the other common use cases (see the sketch after this list).
- Fine-tuning the model with internal datasets to make better use of GPT's capabilities.
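As a concrete illustration of the first use case, here is a minimal sketch of forwarding a user's chat message to OpenAI's chat completions endpoint. The function name, request shape, and model choice are assumptions for the example, not a recommended implementation.

```typescript
// Minimal sketch: forward a user's chat message to OpenAI's chat
// completions API. Assumes OPENAI_API_KEY is set in the environment
// and that `userMessage` comes from your own request handling.
async function askOpenAI(userMessage: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4", // assumed model; use whatever your account supports
      messages: [{ role: "user", content: userMessage }],
    }),
  });

  if (!response.ok) {
    throw new Error(`OpenAI request failed: ${response.status}`);
  }

  const data = await response.json();
  // The first choice's message content is the model's reply.
  return data.choices[0].message.content;
}
```

Note that whatever the user types in `userMessage` leaves your infrastructure unfiltered, which is exactly the risk discussed below.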
To be clear, both of these approaches bring risks that must be acknowledged and managed. The first relies on users knowing better than to enter personal or sensitive information, and depending on the application domain, there's a good chance the whole point is to enter sensitive details. The second approach is one that development teams have more control over: they can anonymize and scrub datasets of anything sensitive before using them to fine-tune a model, as sketched below. To be successful, this requires explicit policies as well as a culture of privacy and security within the organization.
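As an illustration of that scrubbing step, here is a minimal sketch that drops direct identifiers and masks email addresses before a record reaches a fine-tuning dataset. The record shape, field names, and regex are assumptions for the example; real PII detection needs far more than a single pattern.

```typescript
// Minimal sketch: strip obviously sensitive fields and mask email
// addresses before a record is written to a fine-tuning dataset.
// The SupportTicket shape and the regex are illustrative assumptions.
interface SupportTicket {
  customerName: string;
  customerEmail: string;
  body: string;
}

const EMAIL_PATTERN = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function scrubTicket(ticket: SupportTicket): { body: string } {
  // Drop direct identifiers entirely and mask emails embedded in free text.
  const scrubbedBody = ticket.body.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]");
  return { body: scrubbedBody };
}

// Example: only the scrubbed body ever reaches the fine-tuning file.
const record = scrubTicket({
  customerName: "Ada Lovelace",
  customerEmail: "ada@example.com",
  body: "Please reset the account for ada@example.com",
});
console.log(JSON.stringify(record)); // {"body":"Please reset the account for [REDACTED_EMAIL]"}
```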
Dependencies are dependencies
If we step back for a moment, we can judge these tools like any other third-party dependency. Mature organizations already have policies in place to assess APIs and services before adopting them, and LLMs are no different. They have vulnerabilities, have already leaked sensitive data, and have caused many in the EU to sound the privacy alarm. OpenAI's commercial Data Processing Agreement (DPA) isn't available publicly, but they do note that it is non-negotiable. As with any other service, it's up to application developers to protect customer data.
The solution is to look past the perceived magic of this shiny new tool and treat it like any other dependency in the stack, with the same scrutiny and security assessments. That's why we've already begun to add rules to Bearer CLI, our open-source static analysis product, that explicitly check for OpenAI usage. This rule, combined with our resource recipe, and future rules like it allow Bearer to alert teams when their code sends sensitive data types to LLMs. We believe it's vital for DevSecOps teams to assess how their organizations use generative AI and ensure it meets the required standards. You can try Bearer CLI and our OpenAI ruleset now to find potential security risks and privacy violations in your code.
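To make the kind of finding concrete, the snippet below shows the sort of data flow such a rule is designed to catch: a sensitive value (a customer's email address) interpolated into a prompt sent to OpenAI. It assumes the official openai Node library; the User type and function are hypothetical.

```typescript
// Illustrative example of the kind of data flow a static analysis rule
// can flag: a sensitive data type (an email address) is interpolated
// into a prompt sent to a third-party LLM API.
import OpenAI from "openai";

interface User {
  name: string;
  email: string;
}

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function summarizeAccount(user: User): Promise<string | null> {
  const completion = await client.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "user",
        // The email address flowing into the prompt is what a
        // "sensitive data sent to OpenAI" rule is designed to catch.
        content: `Summarize recent activity for ${user.name} (${user.email}).`,
      },
    ],
  });
  return completion.choices[0].message.content;
}
```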
Building better workflows
As new platforms emerge, generative AI becomes yet another thing for security teams to keep tabs on. That's why we built our static code analysis tool specifically for developers. Bearer provides context-aware prioritization so teams know which alerts are the most critical before the code goes live. Ensure your business stays ahead of the curve with our state-of-the-art software. Subscribe to receive similar updates from Bearer here and join our waitlist to get early access to Bearer Cloud.