Bearer | We benchmarked top SAST products, and this is what we learned

When we started to build Bearer, we wanted to understand how to validate the quality of our findings and be able to benchmark it. Code security scanning solutions are notorious for reporting a lot of false positives and other deficiencies, and even though we believed we could do much better, we needed a way to prove it.

In Java, there is an OWASP project, BenchmarkJava, which makes it easy to compare the output of two software security solutions. Unfortunately, there is no similar benchmark for other coding languages.

We’ve shared previously how we are building and improving our own solution using Open Source projects. During our conversations with enterprise customers and users, we understood that they deal with a similar challenge when comparing one solution with another, resulting in incomplete decisions and a lot of frustration later on. With organisations consistently dealing with tight product release timelines in a competitive market, our hope is to help security teams meet developers where they are, and for them to use this benchmark as a decision enabler.

As we like to build in public as much as possible, keeping in theme with our Open Source engine, we thought it was time to look into how Bearer CLI compares today with other “free” and available SAST solutions on the market.

For this benchmark, we are focusing on a few key features, such as language support, quality of the findings, speed of the scanner, extensibility and relevance of the ruleset, and User Experience (UX) for both security and developers. We will go through them all in detail below, giving you a blueprint and data points for how to compare two or more SAST solutions.

Note: Like any benchmark, ours has some inherent biases. We have documented our decisions as thoroughly as possible so that you have all the context you need to understand them. More importantly, we have included the dataset used to generate the numbers at the end of the article, so that you can explore it yourself and come to your own conclusions.

Landscape

We decided to benchmark Bearer against solutions that are often described as 'modern': a mix of commercial solutions with a free and/or Open Source offering.

We have selected Semgrep, Snyk Code, Sonar, Brakeman (as part of Synopsys) and of course, Bearer CLI.

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
Free offering limitation	N/A	N/A	100 scans per month	N/A
Open Source license	Elastic	LGPL 2.1	N/A	Unclear
Built in	Go	OCaml + Python	Unknown	Ruby
Launched in	2023	2017 (originates from Facebook Pfff OSS project)	2020 (originates from Deepcode acquisition)	2011

We will review these solutions under 5 sections in this benchmark:

Language support
Quality of findings
Speed
User Experience
New risks coverage

If you want to see all the data points we collected, head directly to 'The Complete Benchmark' at the end of this post.

As you go through this benchmark, you are welcome to find out more about Bearer CLI, or try it directly via GitHub to compare it with your SAST scanner and create your own benchmark. If you are interested in learning how you can manage application security at scale, supercharged with sensitive data context, you can request a demo for Bearer Cloud.

Language support

Language support is probably the #1 factor when deciding to use a SAST product. In today’s world, teams tend to use multiple languages and stacks to build their product, resulting in the need for a solution with broad language support or the usage of multiple solutions.

Unfortunately, language support is not as simple as “does it support X?” The level of granularity required to offer good support makes it difficult to do it well across many languages. Behind the language support there are factors such as framework support, relevance of the ruleset, quality of the rules themselves and their maintenance.

Solutions that support many languages tend to offer a very disparate quality of support, leading customers to either combine multiple options or accept low quality on some of their stacks.

We collected data on the three categories below for our benchmark:

Language coverage: Which language does the solution cover?
Ruleset: How many rules are available for each supported language? Even if it’s not necessarily a good indication of the depth of the support (details follow), it’s a good point of comparison of language coverage within the same solution.
% of rules that triggered in the benchmark: Providing a lot of rules is great, but do they actually matter? That’s a very different question. One way to assess the quality of a ruleset is to look at how likely its rules are to trigger.
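To make the relationship between ruleset size and trigger rate concrete, here is a minimal Python sketch that multiplies the two figures reported in the table for this section to estimate how many rules actually fired. The published percentages are rounded, so the counts are approximations. The names `RULESETS` and `estimated_triggered` are ours, for illustration only.

```python
# Ruleset size and share of rules that triggered, copied from the
# "Language support" table of this benchmark (percentages are rounded).
RULESETS = {
    ("JS/TS", "Bearer CLI"): (65, 0.54),
    ("JS/TS", "Semgrep"): (208, 0.18),
    ("JS/TS", "Snyk Code"): (51, 1.00),
    ("Ruby", "Bearer CLI"): (59, 0.58),
    ("Ruby", "Semgrep"): (43, 0.65),
    ("Ruby", "Snyk Code"): (26, 1.00),
    ("Ruby", "Brakeman"): (84, 0.65),
}


def estimated_triggered(ruleset_size: int, trigger_rate: float) -> int:
    """Approximate number of rules that fired at least once."""
    return round(ruleset_size * trigger_rate)


for (language, tool), (size, rate) in RULESETS.items():
    fired = estimated_triggered(size, rate)
    print(f"{language:<6} {tool:<12} ~{fired} of {size} rules triggered")
```

Under these assumptions, a ruleset three times larger does not mean three times as many rules fire: the 208 Semgrep JS/TS rules and the 65 Bearer CLI JS/TS rules both come out at roughly 35 to 37 triggered rules.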

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
Language coverage	JS/TS, Ruby, Java	JS/TS, Ruby, Java, Go, C#, Kotlin, PHP, Python, Scala	JS/TS, Ruby, Java, Go, PHP, Python, C#	Ruby
JS/TS
Ruleset	65	208	51	N/A
% of rules that triggered in the benchmark	54%	18%	100%	N/A
Ruby
Ruleset	59	43	26	84
% of rules that triggered in the benchmark	58%	65%	100%	65%

TL;DR We can clearly see that the size of the ruleset itself is not that important. A large ruleset in which only a few rules trigger may indicate a lack of relevance and maintenance. On the other hand, a smaller ruleset may mean that the rules are built to be 'catch-all' rather than 'surgical'.

Quality of findings

Beyond language support, what matters most is the quality of the findings, which is the most difficult part to assess. To run this benchmark, we used 15 top Open Source projects for each language (Ruby and JS/TS) and manually reviewed and classified every finding. Traditional SAST tools are notorious for a high percentage of false positives, so this will be our focus here.

Ultimately, we collected data on the three categories below for our benchmark:

Total # of findings: How many findings did the solution surface?
Total # of false positives: How many of those findings are actually not relevant?
% Precision: What is the overall precision?

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
JS/TS
Total # of findings	508	216	600	N/A
Total # of false positives	50	27	234	N/A
% Precision	90%	87%	61%	N/A
Ruby
Total # of findings	333	1345	327	471
Total # of false positives	39	765	221	310
% Precision	88%	43%	32%	34%
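The precision figures in the table follow directly from the two counts above them: precision is the share of findings that are not false positives. The sketch below recomputes it from the published numbers; the `RESULTS` and `precision` names are ours and not part of any tool.

```python
# (total findings, false positives) copied from the table above.
RESULTS = {
    ("JS/TS", "Bearer CLI"): (508, 50),
    ("JS/TS", "Semgrep"): (216, 27),
    ("JS/TS", "Snyk Code"): (600, 234),
    ("Ruby", "Bearer CLI"): (333, 39),
    ("Ruby", "Semgrep"): (1345, 765),
    ("Ruby", "Snyk Code"): (327, 221),
    ("Ruby", "Brakeman"): (471, 310),
}


def precision(total_findings: int, false_positives: int) -> float:
    """Share of reported findings that are true positives."""
    if total_findings <= 0:
        raise ValueError("precision is undefined without findings")
    return (total_findings - false_positives) / total_findings


for (language, tool), (total, fps) in RESULTS.items():
    print(f"{language:<6} {tool:<12} {precision(total, fps):.1%}")
```

The recomputed values match the table to within one percentage point; Semgrep on JS/TS, for example, works out to exactly 87.5%, which the table reports as 87%.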

TL;DR Precision is the data point we all have in mind first, but it’s important to put it in context with the number of findings. It is possible to achieve 100% precision with only 10 findings, which highlights the need for a comprehensive evaluation.

Furthermore, this analysis does not cover false negatives, that is, the real issues a tool fails to report. The share of real issues that a tool does catch is often known as 'recall'.
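To illustrate why precision alone can mislead, here is a small sketch with purely hypothetical numbers (they are not part of the benchmark data): a tool that reports 10 findings, all of them correct, while missing 40 real issues.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that are real issues."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real issues that the tool actually reported."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical scenario, for illustration only.
tp, fp, fn = 10, 0, 40

print(f"precision: {precision(tp, fp):.0%}")
print(f"recall:    {recall(tp, fn):.0%}")
```

In this made-up scenario, the tool is perfectly precise yet catches only one real issue in five. Measuring recall requires knowing every real issue in a codebase, which is why it is much harder to benchmark on real-world projects.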

Speed

Speed is a single data point, but it is important enough to deserve its own category. As reported in the GitHub DevEx Survey 2023, 25% of developers’ time is spent waiting for code reviews, so scan speed is an important consideration for DevSecOps programs, mainly for two reasons:

How long does the team need to wait to review findings?
CI/CD runtime costs money. Every second a scan hangs is money lost.

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
JS/TS
AVG execution time	82 seconds	33 seconds	270 seconds	N/A
Ruby
AVG execution time	131 seconds	72 seconds	418 seconds	79 seconds

TL;DR In general, solutions that operate "locally" tend to be quite fast, enabling seamless integration into a CI/CD workflow. However, it is noteworthy that Snyk, which runs in the cloud, is significantly slower than the others.

It is crucial to consider both speed and precision together, as speed alone holds no value. Therefore, we strongly recommend keeping these two data points in mind simultaneously.
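One way to look at both data points at once is a simple Pareto comparison: a tool is "dominated" if another tool is at least as precise and at least as fast, and strictly better on one of the two. This is a technique we bring in here for illustration, not part of the benchmark methodology. The sketch uses the precision and average execution time figures from the tables in this post; the type and function names are ours.

```python
from typing import NamedTuple


class Result(NamedTuple):
    tool: str
    precision: float  # percent, from the "Quality of findings" table
    seconds: float    # average execution time, from the "Speed" table


JS_TS = [
    Result("Bearer CLI", 90, 82),
    Result("Semgrep", 87, 33),
    Result("Snyk Code", 61, 270),
]

RUBY = [
    Result("Bearer CLI", 88, 131),
    Result("Semgrep", 43, 72),
    Result("Snyk Code", 32, 418),
    Result("Brakeman", 34, 79),
]


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and better on at least one."""
    no_worse = a.precision >= b.precision and a.seconds <= b.seconds
    better = a.precision > b.precision or a.seconds < b.seconds
    return no_worse and better


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


for name, results in (("JS/TS", JS_TS), ("Ruby", RUBY)):
    front = ", ".join(r.tool for r in pareto_front(results))
    print(f"{name}: {front}")
```

With these figures, Bearer CLI and Semgrep are the two non-dominated tools for both languages: one leads on precision, the other on speed. Which trade-off is right depends on how your team weighs review time against CI/CD runtime.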

User Experience

When providing a solution for security engineers and developers, UX is key and has some specific language associated with it. Ultimately, we are talking about a tool for engineers.

These are important questions we all need to consider: How difficult is it to set up? How do you run it? Does it require sending source code to the cloud? Do you have access to a wide range of output formats? What arguments are available to control the tool precisely? How well does it integrate into your workflow?

We gathered data on the following categories for our benchmark:

Setup type & avg time: Engineers have very little time and therefore little patience. The speed and ease of the setup is usually a good indicator of a good developer tool.
Execution type: Is the scanning done offline (on your machine or infra) or in a remote cloud? Essentially, do you need to trust your SAST provider with your code?
CLI options: Is the solution fully controllable from the CLI, from choosing which rules to execute and limiting certain severity levels, up to ignoring findings?
Output format: The more formats it supports, the better the tool will integrate into your workflow. This is especially true for SAST tools, which are run both manually and automatically and need to integrate with other tools easily.
Open rules: Is the underlying code/pattern of the rules provided? This matters if you want to understand why a finding was triggered or not, and ultimately to have confidence in the rules themselves.
Custom rules support: Providing an excellent bundle of rules as part of language coverage support is key, but there are always custom use cases where specific rules might be required. Being able to build your own rules is the best way to make sure all your use cases will be covered in the long term, and it helps reduce vendor lock-in.
Source Code Management (SCM) integration level: Modern security products need the best possible CI/CD integration. Does it integrate out of the box with GitHub and GitLab? With your CI/CD pipeline? Can it annotate PRs?
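As an example of why a common output format matters, all the tools in this benchmark can emit SARIF, so the same short script can summarise the findings of any of them. The sketch below counts results per rule and severity level using only the standard SARIF 2.1.0 fields `runs`, `results`, `ruleId` and `level`. The sample log, tool name and rule IDs are hypothetical, written by hand for illustration.

```python
from collections import Counter


def summarize_sarif(sarif_log: dict) -> Counter:
    """Count results per (ruleId, level) across all runs of a SARIF log."""
    counts = Counter()
    for run in sarif_log.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<no ruleId>")
            # This sketch treats a missing level as "warning".
            level = result.get("level", "warning")
            counts[(rule_id, level)] += 1
    return counts


# Hand-written, hypothetical SARIF log. In practice you would load a
# scanner's report with json.load() instead.
SAMPLE_LOG = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "example-scanner"}},
            "results": [
                {"ruleId": "example_sql_injection", "level": "error",
                 "message": {"text": "Unsanitized input in SQL query"}},
                {"ruleId": "example_sql_injection", "level": "error",
                 "message": {"text": "Unsanitized input in SQL query"}},
                {"ruleId": "example_weak_hash",
                 "message": {"text": "Weak hashing algorithm"}},
            ],
        }
    ],
}

for (rule_id, level), count in sorted(summarize_sarif(SAMPLE_LOG).items()):
    print(f"{level:<8} {rule_id:<24} {count}")
```

Because the script only relies on the shared format, swapping one scanner for another, or running several side by side, does not require changing it.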

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
Setup type and avg time	CLI install (< 1 min.)	CLI install (< 1 min.)	Requires online signup + CLI install	Ruby package install (< 1 min.)
Execution type	Locally	Locally	On the cloud	Locally
CLI options	Complete	Complete	Partial (Missing filtering per rule and ability to ignore findings)	Complete
Output format	JSON, SARIF, HTML	JSON, SARIF, XML	JSON, SARIF, HTML	JSON, SARIF, CSV, HTML
Open rules	Yes	Yes	No	Yes
Custom rule support	Yes	Yes	Beta	Yes
SCM integration level	GitHub and GitLab Security integration, pull request annotation	GitHub and GitLab Security integration	GitHub and GitLab Security integration, pull request annotation	Unclear

TL;DR Since we’ve benchmarked 'modern' solutions, we can clearly see that almost all of them live up to this expectation when it comes to User Experience. It’s important to mention, though, that the best experience comes from the solutions that are open and free, in comparison with the fully closed source products.

New risks coverage

SAST is evolving, as risks and security team roles are. We believe that sensitive data exfiltration risks combined with third-party services risks should by default be part of any SAST product.

By adding a sensitive data context layer to our SAST solution, Bearer CLI is able to detect risks such as “leakage of PHI to a logger” or “leakage of PII to OpenAI”, as well as provide an automated privacy report that allows security and privacy engineering teams to kick-start their compliance journey.

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
Third-party component detection	Yes	No	No	No
Data exfiltration rules	Yes	No	No	No
Threat modeling	Yes	No	No	No
Privacy report	Yes	No	No	No

TL;DR The increasing importance of safeguarding sensitive data and preserving privacy has made them among the most influential emerging risks for your organization. As the nature of risks continues to evolve, it is crucial for the value offered by your SAST solution to align with these changes.

The Complete Benchmark

Here is a summary of all the data points in the benchmark:

	Bearer CLI	Semgrep	Snyk Code	Brakeman (Synopsys)
Free offering limitation	N/A	N/A	100 scans per month	N/A
Open Source license	Elastic	LGPL 2.1	N/A	Unclear
Built in	Go	OCaml + Python	Unknown	Ruby
Launched in	2023	2017 (originates from Facebook Pfff OSS project)	2020 (originates from Deepcode acquisition)	2011
Language support
Language coverage	JS/TS, Ruby, Java	JS/TS, Ruby, Java, Go, C#, Kotlin, PHP, Python, Scala	JS/TS, Ruby, Java, Go, PHP, Python, C#	Ruby
JS/TS
Ruleset	65	208	51	N/A
% of rules that triggered in the benchmark	54%	18%	100%	N/A
Ruby
Ruleset	59	43	26	84
% of rules that triggered in the benchmark	58%	65%	100%	65%
Quality of findings
JS/TS
Total # of findings	508	216	600	N/A
Total # of false positives	50	27	234	N/A
% Precision	90%	87%	61%	N/A
Ruby
Total # of findings	333	1345	327	471
Total # of false positives	39	765	221	310
% Precision	88%	43%	32%	34%
Speed
JS/TS
AVG execution time	82 seconds	33 seconds	270 seconds	N/A
Ruby
AVG execution time	131 seconds	72 seconds	418 seconds	79 seconds
UX
Setup type and avg time	CLI install (< 1 min.)	CLI install (< 1 min.)	Requires online signup + CLI install	Ruby package install (< 1 min.)
Execution type	Locally	Locally	On the cloud	Locally
CLI options	Complete	Complete	Partial (Missing filtering per rule and ability to ignore findings)	Complete
Output format	JSON, SARIF, HTML	JSON, SARIF, XML	JSON, SARIF, HTML	JSON, SARIF, CSV, HTML
Open rules	Yes	Yes	No	Yes
Custom rule support	Yes	Yes	Beta	Yes
SCM integration level	GitHub and GitLab Security integration, pull request annotation	GitHub and GitLab Security integration	GitHub and GitLab Security integration, pull request annotation	Unclear
New risks coverage
Third-party component detection	Yes	No	No	No
Data exfiltration rules	Yes	No	No	No
Threat modeling	Yes	No	No	No
Privacy report	Yes	No	No	No

Our goal is to update this benchmark from time to time and ideally expand it to other solutions and languages. We are excited to hear your feedback and comments on this, so please don't hesitate to reach out to us on Twitter @tryBearer or join us on Discord!

Please find here the entire data set used to create this benchmark.
