Container Security Tools Comparison for Vulnerability Scans

Recently, we took on a new challenge: compare 5 popular container security tools, including our solution. We wanted to see how the products stack up against each other. How did they do? Read on to find out!

TL;DR:

Containers have been making waves in IT and dev circles since 2013, when Docker’s container technology was launched. They have revolutionized deployment, adding both speed and stability, and have become critical for most IT operations, so securing them is a priority for all of us. How well do various tools do that? The table below shows how many vulnerabilities each tool found at each step of our test:

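| Tool | Step 1 (Squid) | Step 2 (Patched) | Step 3 (Add App) | Result |
| --- | --- | --- | --- | --- |
| MergeBase | 3 | 1 | 2 | ALL |
| Aqua | 0 | 0 | 0 | NONE |
| Snyk | 2 | 0 | 0 | 2 |
| Docker Hub | 1 | 0 | 0 | 1 |
| Quay | 0 | 0 | 0 | NONE |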

But before we get into the details, it’s worthwhile to quickly revisit the importance of application container security in the modern-day development landscape.

The Importance of Container Security

Containers enable developers to run applications quickly and reliably when moved from one computing environment to another. But despite their many advantages – including increased application isolation – containers also amplify security risks. Increasing adoption in production environments makes them attractive to malicious actors. Since traditional network security solutions cannot always protect against lateral attacks, a lot of effort goes into developing application container security solutions.

Container security refers to the tools (e.g. Docker container security solutions) and policies implemented to protect container integrity and reliability, mitigate risk, and minimize vulnerabilities.

Container Security Tools Compared

Many security tools are available to protect containers from attacks. Usually, they check images against the Common Vulnerabilities and Exposures (CVE) entries catalogued in the National Vulnerability Database (NVD), or against the benchmarks published by the Center for Internet Security (CIS).

Most containerized applications and their underlying infrastructure are widely distributed and highly dynamic. In this scenario, manual vulnerability scanning can be time-consuming and resource-intensive. To reduce operational overhead, many tools offer automation. Some focus on specific aspects of the cloud-native ecosystem, e.g. runtime security.

For our analysis, we picked 5 popular automated container scanners:

  • Aqua
  • Snyk
  • Docker Hub
  • Quay
  • MergeBase

Methodology

Before starting our analysis, we set up three images. The images form a logical progression: we start with a vulnerable version of Squid, then patch it, and then add a vulnerable proprietary library, which stands in for the applications you might produce and deploy in Docker images.

  1. Seed the image with a vulnerable version of Squid, a caching and forwarding HTTP web proxy.
  2. Patch Squid to the latest safe version.
  3. Download a proprietary JAR file that is vulnerable. Your own applications would typically fall into this category, and most container scanning tools find them challenging to analyze.

We expected that each tool would find vulnerabilities in all these steps.

However, this is not quite what happened!

Before we reveal the results of our tool comparison, the next section covers the Dockerfiles used to build the images. Also, for readers planning to replicate our experiment, bear in mind that vulnerability scanning is sensitive to the date of the scan. We completed this experiment in early April 2021. New vulnerabilities may have been found and published since then, and the security scanning tools themselves may also have changed.

Procedure to Build Images with Dockerfiles

If you need the build scripts, please ask us. We believe in transparency and are happy to provide them.

Results of Container Scanner Tool Comparison & Analysis

1. Aqua

For teams wondering how to secure Docker containers, Aqua claims to provide “enterprise-grade security for Docker environments” from development to production. Its tool scans images for vulnerabilities, malware, configuration issues, and more, for continuous image assurance. Its vulnerability database is aggregated from multiple, constantly updated data streams to increase detection accuracy and provide better protection.

Aqua container scanning did not find any vulnerabilities

Despite these claims, the tool didn’t quite make the cut in our test. In fact, Aqua found no vulnerabilities at all, raising doubts about its effectiveness.

2. Snyk

Snyk helps teams automatically find, prioritize and fix vulnerabilities in containers throughout the container lifecycle. It can detect vulnerable dependencies during coding, prevent new vulnerabilities from passing through the build process, and test the production environment for newly-disclosed vulnerabilities.

Snyk says that it has fixed over 5 million container vulnerabilities. In our tests, however, Snyk found only two vulnerabilities, both in Step 1:

  • CVE-2020-25097 (Squid)
  • CVE-2021-30139 (apk-tools)

Snyk did not find any vulnerabilities in Steps 2 and 3.

3. Docker Hub

Docker Hub container scanning found only one vulnerability

When a Docker image is pushed to Docker Hub, Docker Hub automatically scans it for vulnerabilities. Teams can review the security state of images and fix identified issues for more secure deployments. The vulnerability report lists vulnerabilities sorted by severity. It also displays:

  • The package containing the vulnerability
  • The version in which it was introduced
  • Whether the vulnerability is fixed in a later version

In our analysis, we found that this Docker container security scanner is not effective at finding all vulnerabilities. During testing, it found only one vulnerability, in Step 1.

4. Quay

Quay container scanning did not find any vulnerabilities

Quay automatically scans containers to provide a real-time view of known vulnerabilities. The scan report displays vulnerabilities by severity level: Low, Medium and High. It also specifies whether patches are available.

But in our vulnerability test, Quay found no vulnerabilities. For all 3 steps, the report displayed a “passed” status for the security scan.

And now, we come to the final tool in our analysis: our own MergeBase tool.

5. MergeBase

MergeBase container scanning finds all vulnerabilities

In our analysis, only MergeBase found all vulnerabilities, including those the other tools missed:
  • CVE-2021-28116 (Squid) for which no patch is available
  • CVE-2016-5725 in the application, a directory traversal vulnerability in JCraft JSch before 0.1.54 on Windows, when the mode is ChannelSftp (source: CVE Mitre)
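
The “before 0.1.54” wording in the second finding is a version-range condition. As a minimal sketch of how such a condition can be evaluated (this is our own illustration in Python, not a description of how MergeBase or any other scanner is implemented; the function names and the sample version list are hypothetical), dotted versions are compared as tuples of integers rather than as strings:

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '0.1.54' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_affected(installed: str, fixed_in: str) -> bool:
    """Return True when the installed version is older than the fixed version."""
    return parse_version(installed) < parse_version(fixed_in)


# JSch releases before 0.1.54 are affected by CVE-2016-5725.
JSCH_FIXED_IN = "0.1.54"

# Sample version strings chosen only to exercise the comparison.
for installed in ("0.1.9", "0.1.53", "0.1.54", "0.1.55"):
    status = "affected" if is_affected(installed, JSCH_FIXED_IN) else "not affected"
    print(f"jsch {installed}: {status}")
```

Comparing the raw strings would rank 0.1.9 above 0.1.54, which is why the sketch parses the numbers first. Real-world version schemes (suffixes such as -beta, for example) need more care than this.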

In summary, MergeBase found:

  • 3 vulnerabilities in Step 1
  • 1 vulnerability in Step 2
  • 2 vulnerabilities in Step 3
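
For readers who want to work with the numbers, the per-step counts from the table at the top of this post can be tallied in a few lines of Python. This is only a convenience sketch; the variable and function names are ours.

```python
# Vulnerabilities reported per step, copied from the comparison table in this post.
STEPS = ("Step 1 (Squid)", "Step 2 (Patched)", "Step 3 (Add App)")
FINDINGS = {
    "MergeBase": (3, 1, 2),
    "Aqua": (0, 0, 0),
    "Snyk": (2, 0, 0),
    "Docker Hub": (1, 0, 0),
    "Quay": (0, 0, 0),
}


def total_found(tool: str) -> int:
    """Total number of vulnerabilities a tool reported across all steps."""
    return sum(FINDINGS[tool])


def steps_with_findings(tool: str) -> list[str]:
    """Names of the steps in which a tool reported at least one vulnerability."""
    return [step for step, count in zip(STEPS, FINDINGS[tool]) if count > 0]


# Print the tools from most to fewest findings.
for tool in sorted(FINDINGS, key=total_found, reverse=True):
    steps = ", ".join(steps_with_findings(tool)) or "no steps"
    print(f"{tool}: {total_found(tool)} found ({steps})")
```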

Comparison Conclusion

In containerized environments, the deployment pipeline is often standardized across different dev teams. Container scanning can help teams find vulnerabilities and take proactive action to fix security gaps. Securing containers and building security into their CI/CD pipeline can help reduce the size of the attack surface.

However, different container scanning solutions yield inconsistent results on the same environment. Worse, many solutions fall short of their claims to help strengthen end-to-end container security.

In our analysis of 5 application container security tools, we found that our tool MergeBase was the only one that could find all vulnerabilities in our testing environment. Thus, compared to other tools, MergeBase provides complete DevSecOps coverage and reliable container security.

Want to know more about MergeBase? Take a look here!

Enhance software supply chain security says White House

Summary

Software supply chain security is a common theme in recent attacks such as those on Colonial Pipeline and SolarWinds. In response, President Biden released an executive order to improve the nation’s cyber security. The order is a testament to the importance of the digital ecosystem to our society, economy and way of life. Cyber security is critical in protecting it, and the President’s clear message is that we need to do more.

The government wants to improve its own cyber security practices and those of related agencies. It also proposes removing barriers to information sharing and enhancing software supply chain security. Software supply chain security is the only specific attack vector that the order highlights.

These attacks cause the most damage, and at the same time businesses and governments are ill-prepared for them. So what is a supply chain attack and how can it be mitigated?

What is a Software Supply Chain Attack?

A supply chain attack leverages the access of an external partner or provider to gain unauthorized entry to a system or network. It takes advantage of the inherent trust a target has in its suppliers, using it to infiltrate and launch the cyber-attack. Its use of stealth and its indirect ‘trusted’ approach make a supply chain attack an effective weapon in any threat actor’s arsenal.

Every organization and individual relies on third-party software in some way or another. When we install software or hardware, or use code from a trusted source, it is natural to assume that it hides no malicious intent. In addition to this inherent trust, a supply chain attack also uses the human element to bypass any perimeter security. As administrative users and software developers install software and hardware or reuse third-party code, they do so on the internal network, bypassing any security controls that prevent external threats.

Supply chain attacks that target technology infrastructure come in many variations. Threat actors can infiltrate a software provider and embed malicious code that infects end users when they install or access the product. Another effective supply chain attack technique is infecting software code repositories that software developers leverage to create systems. Finally, threat actors can also infiltrate and infect the embedded software that operates the hardware on networking equipment, servers, and end-user devices.

What is a software supply chain?

Modern software development processes rely on code reuse to build systems rapidly and cost-effectively. By leveraging existing code, developers can quickly assemble a system from the components it needs instead of coding the entire solution from a blank canvas. Typically, programmers either reuse internally developed software code or leverage third-party libraries and frameworks. These components and their dependencies form part of the software supply chain. In other words, a software supply chain is the set of elements that go into or affect the code from development to production.
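
To make “components and their dependencies” concrete, here is a minimal Python sketch that walks a dependency graph and collects everything an application pulls in, directly or indirectly. The graph and every component name in it are hypothetical, invented purely for illustration.

```python
from collections import deque

# Hypothetical dependency graph: each component maps to its direct dependencies.
DIRECT_DEPENDENCIES = {
    "my-app": ["web-framework", "sftp-client"],
    "web-framework": ["http-parser", "logging-lib"],
    "sftp-client": ["crypto-lib"],
    "http-parser": [],
    "logging-lib": [],
    "crypto-lib": [],
}


def supply_chain(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every component reachable from root, excluding root itself."""
    seen = {root}  # starting with root keeps cycles from looping forever
    queue = deque(graph.get(root, []))
    while queue:
        component = queue.popleft()
        if component in seen:
            continue
        seen.add(component)
        queue.extend(graph.get(component, []))
    return seen - {root}


print(sorted(supply_chain("my-app", DIRECT_DEPENDENCIES)))
```

The application declares only two dependencies, yet five components end up in its supply chain. A vulnerability in any of them, including the indirect ones, is a vulnerability in the application.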

Almost every software application or service we use today leverages a software supply chain. For example, Netflix and Uber use Node.js, an open-source, server-side JavaScript platform that is well suited for scalable applications. Another example is WordPress, a content management system used to run nearly 40% of the world’s websites. The list goes on, but it suffices to say that organizations leverage software supply chains everywhere. Besides leveraging the frameworks and platforms mentioned, software developers also use code libraries to build their solutions. Services like GitHub and Stack Overflow are valuable resources where developers can find libraries, code snippets, and advice to help them create solutions.

However, a software supply chain does not only pertain to software development. It can also refer to instances where organizations install and run third-party applications in their technology environment. For example, every organization leverages third-party software for email. It would be both inefficient and expensive to develop an in-house solution for this utility. The same goes for system monitoring, file sharing, security, and other commodities in a technology environment. All these third-party applications, together with the external code an organization uses in its custom-developed applications, form part of that organization’s software supply chain.

The anatomy of a supply chain attack

A supply chain attack infects the third-party technologies organizations use. It then leverages this unauthorized access to infiltrate and attack their primary targets. Typically, supply chain attacks start when threat actors exploit a vulnerability to gain access to a supplier’s systems. Once they have gained entry, they embed malicious code into the supplier’s software or hardware with a particular payload. The threat actor then waits until the target organization or user runs the supplier’s infected software or installs its infected hardware. As this infiltration technique circumvents any perimeter security, its indirect attack methodology is highly effective. It is also successful in gaining access to secure environments as these attacks typically target less secure elements in the supply chain.

Supply chain attacks are not a new type of threat, but recent cases have raised their prominence in the public domain. Looking at past instances, the Target data breach, in which malware infected the retailer’s point-of-sale (POS) systems, occurred in 2013. In that instance, the attackers compromised the organization’s third-party refrigeration and HVAC vendor to infect Target’s POS environment with malware that stole credit card details. Another significant example is the famous Stuxnet malware that nation-states used to sabotage Iran’s nuclear centrifuges in 2010. In this example, the attackers used stolen digital certificates from Realtek Semiconductor to make their malware look legitimate to system administrators and evade anti-virus software.

More recently, in the SolarWinds supply chain attack, threat actors deployed malware during a routine update that emanated from SolarWinds’ servers. Every organization that ran the update was subsequently compromised, including technology companies and secure government agencies. As a result of this attack, the United States sanctioned Russia, believing that the Kremlin played a role in this mass infiltration. Other recent supply chain attacks include the narrowly averted PHP backdoor and the Codecov incident.

These supply chain attack examples show that this technique has successfully infected many organizations around the world. What is of particular concern is that these supply chain attacks also succeeded in highly secure environments. The examples also show the extensive ramifications of a successful attack. One undetected infection can affect thousands of users and organizations.

Software supply chain security, the open-source risk

Many organizations use open-source software in some way, shape, or form. With open-source code present in 90% of modern applications, this vital element in the software development ecosystem is vulnerable to a supply chain attack. The recent PHP case mentioned earlier is a prime example. Modern software applications reuse open-source libraries, frameworks, and code snippets. Threat actors target these components as they are typically less secure.

The 2020 Sonatype State of the Software Supply Chain Report stated next-generation attacks increased by 430% in the preceding 12 months. Unlike commercial software, open-source relies on the community to ensure its security. However, it is up to the organizations that use the software to conduct regular analysis, security audits, and penetration tests.

Technology supply chain risk

The technology supply chain includes hardware and software. Although the focus of this article has been on software supply chain attacks, organizations cannot ignore the hardware risk. Numerous examples of mobile devices arriving with embedded malware and compromised networking equipment used to breach secure networks highlight this threat.

These instances illustrate that business and technology leaders need to consider their entire technology ecosystem when assessing their supply chain risk. As threat actors have shown they can infiltrate hardware vendors, global software corporations, and open-source code repositories, organizations need a comprehensive security strategy. A supply chain attack could come from multiple vectors, and enterprises need to ensure they cover all their bases.

Software supply chain security 

Mitigating the risk of a supply chain attack requires a defense-in-depth approach. Organizations need to conduct thorough security assessments and implement multiple measures to minimize this risk. For example, many regulatory security frameworks such as PCI DSS and the NIST Cybersecurity Framework mention supply chain risk. These compliance standards state that organizations should routinely assess third parties to ensure they comply with any contractual obligations.

As part of the contractual obligations organizations enforce on their suppliers, security testing must form a vital part of any technology deliverable. By placing the responsibility on the vendor to ensure their product is safe, organizations can enforce terms should the vendor fail to meet their obligations. In addition to requiring vendors to test, organizations should also implement internal security testing and monitoring. This layered defensive approach can help them identify any security issues the vendors may have missed.

However, when leveraging open-source software, enforcing contractual terms is not an option. Licensing compounds the problem: many open-source technologies come with set licensing terms that restrict how a company can use the software, so organizations must weigh license and infringement risk while also protecting themselves from a supply chain attack. In these instances, a Software Composition Analysis (SCA) tool like MergeBase helps mitigate this open-source risk. As the onus is on the organization to test and validate the open-source components used in an application, an SCA tool like MergeBase adds the required defensive layer.
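
The license side of that check can be pictured as comparing each component’s license with an organization’s policy. The Python sketch below is a simplified illustration only, not a description of how MergeBase or any other SCA tool works: the policy, the inventory and the component names are hypothetical, and the license strings are SPDX identifiers.

```python
# Hypothetical policy: SPDX license identifiers this organization accepts.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}

# Hypothetical component inventory (names are made up for illustration).
INVENTORY = [
    {"name": "http-parser", "license": "MIT"},
    {"name": "crypto-lib", "license": "Apache-2.0"},
    {"name": "report-engine", "license": "GPL-3.0-only"},
    {"name": "legacy-utils", "license": None},  # license could not be determined
]


def license_violations(inventory: list[dict], allowed: set[str]) -> list[str]:
    """Return the names of components whose license is missing or not allowed."""
    return [item["name"] for item in inventory if item["license"] not in allowed]


for name in license_violations(INVENTORY, ALLOWED_LICENSES):
    print(f"review needed: {name}")
```

Treating an unknown license as a violation is a deliberate choice in this sketch: a component whose terms cannot be determined is flagged for review rather than silently accepted.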

According to a Gartner report, the information security of a supply chain must focus on data and IT infrastructure, products, and operations. Enterprises today implement a myriad of defensive technologies and processes. Firewalls, intrusion detection and prevention solutions, segmented networking, and vulnerability scanners are just some of the solutions that protect their IT landscape. However, these solutions typically safeguard against external threats. Organizations need to ensure these platforms are also configured to scan for and detect any internal anomalies that may indicate a successful supply chain attack.

Enterprises can also consider air-gapping systems to mitigate the supply chain risk. Applications and networks not connected to the Internet have a much lower risk of compromise. However, in many cases, this approach is not feasible. Air gaps are also not foolproof: the Stuxnet example mentioned earlier was a successful attack against an air-gapped system.

Securing the software supply chain

Supply chain attacks target less secure elements in complex systems. As modern technology solutions rely on reusable components, threat actors target these elements to circumvent security controls. This type of cyber-attack is not new. However, recent discoveries have highlighted the risk organizations face. Traditionally, organizations have tailored their security solutions to mitigate external threats. With the increase in supply chain attacks, it is clear that threat actors target the supply chain to circumvent these controls.

The supply chain attack risk covers every element of an organization’s IT landscape. Attackers have succeeded in infiltrating the supply chain of hardware and software elements. With software development components being a key area of risk, organizations need to implement controls that ensure the security of their application ecosystem.

The MergeBase platform specializes in software supply chain security. It mitigates the risk of a software supply chain attack from development to production. It highlights risks and empowers developers to remedy any security issues during the early stages of the software development lifecycle. MergeBase also assesses software components for vulnerabilities during the build process, helping organizations avoid releasing insecure code to production. However, once an application is in production, it may not remain secure. Researchers and threat actors discover software vulnerabilities all the time. MergeBase mitigates this risk with its monitoring and alerting capabilities, which keep organizations informed of new vulnerabilities that affect code already in production.

Open Source Risk: Plugging the hole

"The leak was worse than first thought," notes a passerby of a boy stuck head-first in the wall of a digital dyke. Plug your open source holes with an SCA tool.

Origin 

Software development based on sharing and collaboratively improving source code goes back to the very origins of software.

In the late 1990s, the term “open-source” was coined and received mainstream recognition in publications such as Forbes. The Netscape browser’s source code was made open source and that got a lot of attention.

The original open-source projects were “revolutions” against the “unfair” profits that closed-source software companies were reaping. Microsoft, Oracle, SAP and others, it was argued, were extracting monopoly-like “rents” for software, which the top developers of the time did not believe was world class.

Open Source Growth

Open source software was originally created by developers for developers. It was embraced slowly by more and more projects, organisations and companies, and it now forms the foundation for the Internet and most of our digital assets. The code base of a typical modern application consists of 80 to 90% open source software. Even in something as proprietary as Apple’s iPhone, the operating system consists largely of open source software.

Currently, there are close to 1 million open source projects globally and this number increases by 79% a year.

Open source victorious as last ones standing capitulate

Apple and Google embraced open source more than 20 years ago. The champions of proprietary software, IBM and Microsoft, resisted much longer. 

  “Once open source gets good enough,
competing with it would be insane.”

2006, Larry Ellison, the chairman of Oracle, in conversation with the Financial Times

Ellison was right on the mark. It looks like we reached that point a few years ago. IBM and Microsoft were the last ones standing against open source, but in the end they capitulated. IBM acquired Red Hat in 2019 for $34B and Microsoft acquired GitHub for $7.5B in 2018.

Open source use a surprise to many executives

Organizations whose leadership lacks a strong engineering or technical background often do not yet fully realize how important open source is and how dependent they are on it in their digital supply chain. We regularly encounter executives who are very surprised when we analyze their applications and identify many open source libraries. Awareness is the first step in managing open source risk and rewards.

Open Source Risks: Is it really free?

Open source is bringing huge rewards to business. However, with reward comes open source risk. The two main risks are legal risk related to licenses and cyber risk related to vulnerabilities.

Open source is free but can come with strings attached that do not match your organization’s business model. Open source software is released under different licensing models. There are over 300 licensing models in use. Most open source software comes with friendly licenses such as the Apache and BSD licenses. Other licenses are not so friendly, such as the GNU GPL and GNU Affero GPL. Using software under these licenses, even in a minor way, could force an organisation to open source its entire software, with devastating impact on the IP value of the organisation.

Open source software, like all software, can contain vulnerabilities. Open source software, in general, is high quality software and not intrinsically more vulnerable. However, because of its wide usage, it is a very attractive target for cyber adversaries and so, over time, vulnerabilities are uncovered. At the moment, there are more than 150,000 known vulnerabilities. A lot of these vulnerabilities can be exploited to breach organisations and are considered to be the cause of approximately 25% of data breaches. 

One example of a major breach is the Equifax breach, which exposed 145 million client records and cost the organisation more than $1.3B to remediate. The company also lost $5B in stock market value overnight and later received a $700M fine from the US government.

The best defence: SCA / OSS

The best defense against open source risk is to use a Software Composition Analysis (SCA) tool, sometimes also called an Open Source Security (OSS) scanner. These tools quickly analyse your applications or containers and provide insight into license and cyber risk. MergeBase goes a step further and provides solutions to quickly and easily reduce your cyber risk.

A Critical Look at Cyber Investment

What is the top defensive technology area to invest in right now?

Cyber defense is a global whack-a-mole game with hundreds of billions of dollars being invested in offensive and defensive capabilities. After you invest in one area, another area of risk tends to pop up. What is the top defensive technology area to invest in right now?

Cyber is multifaceted

Cyber defense requires a multifaceted approach. Fragmentation is a natural consequence of the back and forth between cyber attackers and defenders: If we have an effective defence against a particular type of attack, adversaries will try another area, angle, or approach. Over time this means we need many technologies to secure our organisation. Like it or not, cyber defence is a global whack-a-mole game. It is an arms race, with governments and corporations investing hundreds of billions of dollars continuously in building out offensive and defensive capabilities.

We all know that we need a multifaceted approach. This involves people, process and tools. We need to make sure that everyone in the organization is motivated and has the skills and resources to fight cybercrime. Beyond understanding why and how, technology is critically important as cyberspace is tech heavy.

What area do we need to invest in?

Unless you feel at ease with your cyber protection, the question is: What is the key technology area to invest in right now? This question is very difficult for most cyber professionals, as most organizations underfund and under-resource their cyber operations.

We posed this question to cyber professionals by posting a poll to LinkedIn. To eliminate bias, we conducted the poll twice (second poll), reaching out to two distinct networks of cyber professionals. Feel free to repost the poll and let us know what your results are.

The poll asked what areas to focus on: MFA, perimeter security, known vulnerabilities or education. The results, which were consistent between the two polls, were: known vulnerabilities at 49%, MFA at 29%, and perimeter security and education each at approximately 10%.

Known vulnerabilities routinely exploited

The results of the poll make a lot of sense. Of course, all these areas are important and really need more investment. However, the NSA and CISA continue to warn that cyber adversaries routinely exploit known vulnerabilities.

If we look at major breaches, we see plenty of evidence supporting these warnings. Sophisticated attackers use a combination of hacking techniques, as we have seen recently with SolarWinds. Exploiting known application vulnerabilities is a big part of their arsenal and allows adversaries to move laterally and subsequently elevate privileges.

In reality we find that very few organizations are able to execute fully on a vulnerability strategy.

Why can we not eliminate known vulnerabilities?

Why are we not able to routinely eliminate our known application vulnerabilities? The answer is that it is a daunting task, given the amount of software that most organisations operate combined with the level of technical debt that most of these applications suffer from. Some cyber experts call for continuous upgrading of all components. That would eliminate these problems. However, continuous upgrading is difficult for organisations that have a lot of applications. For instance, a typical North American bank has 600 software applications. Large banks tend to have many more. A lot of these applications are older and do not have active development. Therefore, routinely upgrading may not be practical.

Software Composition Analysis (SCA) vs. Java Über Jars

Introduction

Über jars are a type of reusable Java library that applications sometimes (knowingly or not) incorporate into their systems. Über jars are particularly challenging for software composition analysis (SCA) tools to understand because their structure and organization are complex. In this blog post I explain what über jars are and why they exist, and I provide a mini-benchmark to see how current SCA tools deal with this type of Java library.

TLDR

Circa February 2021, both MergeBase and Sonatype use deep binary analysis to measure software composition within über jars.

  • MergeBase is the only SCA tool I observed with comprehensive support for über jars.
  • Sonatype also provides decent support but misses a few obvious cases. (Strangely, their legal analysis completely ignores their über jar results.)
  • OWASP Dependency-Check and JFrog Xray are not bad, but their scanning is based solely on metadata files found inside über jars.
  • Snyk, WhiteSource, and GitHub Dependabot currently have no ability to understand über jars at all.

I did not benchmark Black Duck (I don’t have access to that tool).

Background

Über jars are the Java equivalent of taking everything in your fridge and throwing it all into your largest pot, giving it a good stir. From an SCA (Software Composition Analysis) perspective they are a bit of a nightmare.

Recall: normally you point your SCA tool at a Java jar file (a reusable Java library) and your SCA tool responds by telling you the jar file’s name, version, and known-vulnerabilities.

But what if that single Java jar file is actually an agglomeration of dozens of Java jar files? What if some maniac cracked open all your jar files and poured all their contents into a single mega jar? That’s exactly what an über jar is and your SCA tool is going to need to reverse-engineer the contents accurately before it can say anything.

Why would anyone create an über jar in the first place?

Java programs are awkward to invoke. You have to tell Java the exact locations of all the jar files your program is using. Über jars are a way around that problem.

For example, a typical Java program is started like this:

java -classpath lib1.jar:lib2.jar:main.jar name.of.MainEntry

With an über jar it’s less typing, since “lib1.jar” and “lib2.jar” have been blended directly into a single “main-uber.jar” file:

java -classpath main-uber.jar name.of.MainEntry

In this way über jars make Java programs easier to distribute and easier to start. That’s the main reason why they exist.

Recall that Jar files are actually just zip files. You can rename them to “.zip” and then double-click on them if you ever want to see what’s inside them. Über jars are what you get if you unzipped all of your Jar files and combined all the contents into a single zip file instead.
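To make the “pour everything into one pot” idea concrete, here is a minimal Python sketch that builds two tiny jars in memory and merges them into a single über jar. The helper names (make_jar, merge_jars, list_entries) and the “first jar wins” rule for colliding paths are my own assumptions for illustration; real build plugins handle collisions in more sophisticated ways.

```python
import io
import zipfile


def make_jar(entries):
    """Build an in-memory jar (just a zip archive) from {path: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, data in entries.items():
            zf.writestr(path, data)
    return buf.getvalue()


def merge_jars(jars):
    """Pour the contents of several jars into one uber jar.

    Assumption for this sketch: the first jar to provide a path wins,
    and later duplicates (e.g. META-INF/MANIFEST.MF) are skipped.
    """
    buf = io.BytesIO()
    seen = set()
    with zipfile.ZipFile(buf, "w") as out:
        for jar_bytes in jars:
            with zipfile.ZipFile(io.BytesIO(jar_bytes)) as zf:
                for name in zf.namelist():
                    if name in seen:
                        continue
                    seen.add(name)
                    out.writestr(name, zf.read(name))
    return buf.getvalue()


def list_entries(jar_bytes):
    """Return the sorted list of paths stored inside a jar."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as zf:
        return sorted(zf.namelist())


# Hypothetical library contents, purely for illustration.
lib1 = make_jar({"org/lib1/A.class": b"a", "META-INF/MANIFEST.MF": b"lib1"})
lib2 = make_jar({"org/lib2/B.class": b"b", "META-INF/MANIFEST.MF": b"lib2"})
uber = merge_jars([lib1, lib2])
print(list_entries(uber))
```

Notice that nothing in the merged archive records where “lib1” ended and “lib2” began. That lost boundary is exactly what an SCA tool has to reconstruct.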

Über jars are a challenge for SCA

Most SCA tools are geared towards providing a single succinct answer for each library they scan.

SCA Scan Results

Identified Library:
Apache Commons-Collections 3.2.1.

Vulnerabilities:
CVE-2017-15708, CVE-2015-7501, and CVE-2015-6420

With über jars the answer is more complicated. “Well, actually… this library is a combination of many libraries.”

At MergeBase we analyze every jar file against our master database for this possibility. For example, consider “apacheds-all-1.5.5.jar”, a large über jar containing over 500,000 lines of code coming from dozens of libraries. When we compare this jar file against all known versions of “slf4j-api” here are the results:

Match Ratio    Known Library Version
81.0%          slf4j-api@1.5.11
90.5%          slf4j-api@1.5.8
100.0%         slf4j-api@1.5.6
90.5%          slf4j-api@1.5.5
90.5%          slf4j-api@1.5.4

These results show that version 1.5.6 of slf4j-api is contained inside the apacheds-all-1.5.5 über jar file.
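One simple way to picture a match ratio like the ones above is as the percentage of a known library version’s class files that appear byte-for-byte inside the über jar. The Python sketch below does that with SHA-256 hashes. To be clear, this is only an assumed, simplified model for illustration (the function names and sample files are hypothetical); it is not a description of MergeBase’s actual algorithm.

```python
import hashlib


def fingerprint(files):
    """Map {path: bytes} to a set of (path, sha256) pairs for .class files."""
    return {
        (path, hashlib.sha256(data).hexdigest())
        for path, data in files.items()
        if path.endswith(".class")
    }


def match_ratio(uber_files, library_files):
    """Percentage of the library's class files found unchanged in the uber jar."""
    lib = fingerprint(library_files)
    if not lib:
        return 0.0
    found = lib & fingerprint(uber_files)
    return round(100.0 * len(found) / len(lib), 1)


# Hypothetical contents: the uber jar embeds version 2 of a small library.
uber = {
    "org/example/A.class": b"A-v2",
    "org/example/B.class": b"B-unchanged",
    "org/other/C.class": b"C",
}
version_1 = {"org/example/A.class": b"A-v1", "org/example/B.class": b"B-unchanged"}
version_2 = {"org/example/A.class": b"A-v2", "org/example/B.class": b"B-unchanged"}

print(match_ratio(uber, version_1))  # 50.0
print(match_ratio(uber, version_2))  # 100.0
```

Neighbouring versions of a library share most of their class files, which is why several versions score highly and only one reaches 100%.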

In the “slf4j-api” case there is also another hint inside the über jar. If I grep the jar’s contents for “slf4j-api” I see these two entries:

    META-INF/maven/org.slf4j/slf4j-api/pom.xml
    META-INF/maven/org.slf4j/slf4j-api/pom.properties

Opening the latter, I see this:

    #Generated by Maven
    #Fri Nov 21 14:48:07 CET 2008
    version=1.5.6
    groupId=org.slf4j
    artifactId=slf4j-api

This gives me further confidence that my binary analysis is correct: version 1.5.6 aligns with my MergeBase result. Some SCA scanners only consider this metadata when examining über jars, but philosophically I don’t agree with that approach, since metadata is not always present, as in the bouncy-castle example below. Metadata is also vulnerable to transcription mistakes and tampering.
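For comparison, a metadata-only check can be sketched in a few lines of Python: walk the jar, find every pom.properties under META-INF/maven/, and read the coordinates. The function name maven_metadata and the in-memory demo jar are assumptions for illustration; the property keys are the ones shown in the file above.

```python
import io
import zipfile


def maven_metadata(jar_bytes):
    """Return [(groupId, artifactId, version)] for every pom.properties in a jar."""
    found = []
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as zf:
        for name in zf.namelist():
            if not (name.startswith("META-INF/maven/") and name.endswith("/pom.properties")):
                continue
            props = {}
            for line in zf.read(name).decode("utf-8", "replace").splitlines():
                line = line.strip()
                # Skip blanks, comments such as "#Generated by Maven", and junk.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip()
            found.append(
                (props.get("groupId", ""), props.get("artifactId", ""), props.get("version", ""))
            )
    return found


# Demo jar holding only the metadata file quoted in this post.
POM_PROPERTIES = (
    "#Generated by Maven\n"
    "#Fri Nov 21 14:48:07 CET 2008\n"
    "version=1.5.6\n"
    "groupId=org.slf4j\n"
    "artifactId=slf4j-api\n"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/maven/org.slf4j/slf4j-api/pom.properties", POM_PROPERTIES)
demo_jar = buf.getvalue()

print(maven_metadata(demo_jar))
```

A scanner built this way returns an empty list for any embedded library that shipped without these files, which is the weakness discussed here.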

You might be curious why this metadata is even present in the first place.

My own theory: it was probably present in the original “slf4j-api” jar. Über jars don’t just combine the software files – they combine all the files! And so if a metadata file is present in the original “slf4j-api” file, it will be dutifully copied into the über jar. I can download the original and see for myself. Sure enough, running “unzip -l slf4j-api-1.5.6.jar” shows both of those metadata files were in the original.

Moving on to an example without metadata, here are the results when we compare our über jar against “bcprov-jdk15”:

Match Ratio    Known Library Version
84.7%          bcprov-jdk15@1.44
91.3%          bcprov-jdk15@1.43
100.0%         bcprov-jdk15@1.40
82.0%          bcprov-jdk15@1.38
48.6%          bcprov-jdk15@1.32

There is no metadata available to warn consumers that the highly vulnerable version 1.40 of bcprov-jdk15 was copied into apacheds-all-1.5.5.jar. Unfortunately bcprov-jdk15@1.40 contains over 15 known-vulnerabilities. Scanners that rely on metadata (such as JFrog Xray and OWASP Dependency-Check) will miss this. And of course scanners that lack über jar handling (such as WhiteSource and Snyk) will also miss this.

Using our high-confidence matches we then query our known-vulnerability database for any corresponding vulnerabilities. Our technique is based on binary analysis – no metadata is involved at all, since metadata can be inaccurate. Using this technique we are able to identify dozens of sub-components encapsulated by the apacheds-all-1.5.5 über jar. Here’s a partial listing based on MergeBase’s analysis:

  1. 100.0% – antlr/antlr@2.7.7
  2. 100.0% – commons-io/commons-io@1.4
  3. 100.0% – commons-lang/commons-lang@2.4
  4. 100.0% – org.apache.directory.server/apacheds-core-jndi@1.5.5
  5. 100.0% – org.apache.directory.shared/shared-ldap@0.9.15
  6. 100.0% – org.apache.mina/mina-core@2.0.0-M6
  7. 100.0% – org.bouncycastle/bcprov-jdk15@1.40
  8. 100.0% – org.slf4j/slf4j-api@1.5.6

(Etc… 25 more sub-components identified!)

Quick Competitive Check

We were curious to see if competing SCA tools are able to handle über jars. What follows is a quick benchmark against a half-dozen popular SCA tools.

Methodology

For each SCA tool (MergeBase, OWASP Dependency-Check, Snyk, WhiteSource, Sonatype, etc…):

  1. Git clone: https://github.com/mergebase/vuln-example-apacheds-all
  2. Run “mvn install”.
  3. Apply each SCA tool against the built “vuln-example-apacheds-all”.
  4. Observe and compare the scan results.

Mini-Benchmark Results

As of February 2021, the apacheds-all-1.5.5 über jar contains two vulnerable sub-components. One of these (bcprov-jdk15@1.40) can only be identified using binary approaches, since it had no metadata in the first place. The other (commons-collections@3.2.1) can be identified either via binary approaches or via metadata scanning.

We group the benchmark results into 3 categories:

1. Scanners that do not support über jars at all.

Snyk and WhiteSource appear to have no idea that “apacheds-all@1.5.5” is made by combining many jar files together.

Similarly, GitHub’s Dependabot also has no idea about this.

2. Scanners that support a metadata-based understanding of über jars.

OWASP Dependency-Check and JFrog Xray both detect the “commons-collections@3.2.1” metadata inside the über jar.

3. Scanners that support deep understanding of über jars.

Sonatype fails to identify any known-vulnerabilities with respect to commons-collections@3.2.1, and yet it does correctly identify that apacheds-all@1.5.5 contains bcprov-jdk15@1.40! This is a lopsided result: Sonatype clearly has a deep understanding here (otherwise it would be impossible to identify bcprov-jdk15), and yet somehow Sonatype is failing to spot the easy one. We also noted that Sonatype reported the license as Apache 2.0, when bcprov-jdk15 uses the MIT license.

MergeBase identifies all vulnerabilities correctly in this case. 🙂

Conclusion

Über jars are a special type of Java software component made by combining several jars into a single jar. Aside from MergeBase, most SCA scanners currently provide sub-par or even zero support for this component type.

Special Thanks

Special thanks to Dr. Ken Warkentyne, our principal engineer, who built MergeBase’s über jar scanning capability.

Scanning .NET and Nuget projects for known vulnerabilities

We recently (August 2020) completed version 1 of our .NET scanner. Its goal is to find libraries with known vulnerabilities in any .NET or Nuget project.

For this blog post we thought we’d take our scanner out for a spin and see how it compares against the competition.

TL;DR:

Here are the results of scanning .NET and Nuget projects for known vulnerabilities:

  • MergeBase – 18 vulnerabilities, 0 false positives.
  • Snyk – 7 vulns and 5 false, or 4 vulns and 0 false (depends on scanner setup).
  • WhiteSource – 12 vulns, 0 false.
  • OWASP Dependency Check – 12 vulns, 17 false.
  • Dotnet Retire – 2 vulns, 0 false.
  • Sonatype – 0 vulns, 0 false.
  • Dependabot – 0 vulns, 0 false.

Methodology:

We chose the .NET Orleans project to scan for .NET and Nuget vulnerabilities. It’s active, complex, and it builds successfully (August 6th, 2020, master = 2e10856f7b7ed9443c). We also liked how this project contained a mix of Nuget styles (e.g., older “packages.config” style as well as the newer “<PackageReference/>” style).

We type “dotnet build” before scanning. This way .NET scanners can use the generated “obj/project.assets.json” files to supplement their scan data if they want to, and “dotnet build” is such a critical step for building any .NET project that we think it’s safe for an SCA tool to assume this command has completed successfully.

As for comparing results, we count CVE’s. If the scan outputs 1 or 300 or 9,000,000 hits against CVE-2018-8292, we count that as a single CVE. We then do a quick “desk check” to categorize the result as either a true hit, a false positive, or an ambiguous result (where it’s hard to say one way or the other). The “desk check” is very much based on my own decades of experience as a software engineer – I encourage others to rerun these scans and see if they agree or disagree.
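The counting rule is easy to express in code. Here is a small Python sketch that collapses any number of scanner hits into a set of unique CVE identifiers and then tallies the manual desk-check verdicts. The function names, the sample report lines, and the verdict labels are hypothetical; real scanner output differs from tool to tool.

```python
import re
from collections import Counter

# CVE identifiers look like CVE-<year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def unique_cves(scan_lines):
    """Collapse any number of hits into a sorted list of distinct CVE IDs."""
    ids = set()
    for line in scan_lines:
        ids.update(CVE_PATTERN.findall(line))
    return sorted(ids)


def tally(verdicts):
    """Count desk-check verdicts given as {cve_id: 'true' | 'false' | 'ambiguous'}."""
    return Counter(verdicts.values())


# Hypothetical report lines: three hits, but only two distinct CVEs.
report = [
    "src/ProjectA/ProjectA.csproj: example.package 1.0.0 CVE-2018-8292",
    "src/ProjectB/ProjectB.csproj: example.package 1.0.0 CVE-2018-8292",
    "src/ProjectC/ProjectC.csproj: other.package 2.0.0 CVE-2018-8416",
]
print(unique_cves(report))

# Hypothetical verdicts, purely to show the tally.
print(tally({"CVE-2018-8292": "true", "CVE-2018-8416": "true"}))
```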

Because this is a .NET scan for Nuget projects, we ignore any results the scanners find from other file types lying on the file system (e.g., “VotingWeb/wwwroot/lib/jquery/jquery.min.js”). We do, however, count results found from Nuget references to other-language artifacts (e.g., “GPSTracker.Web/packages.config” contains the Nuget reference <package id="bootstrap" version="3.0.0" targetFramework="net45" />, so we count it).
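For readers unfamiliar with the two Nuget reference styles, here is a rough Python sketch that pulls (package, version) pairs out of each. The function names and the “Example.Package” entry are hypothetical, and the sketch only handles versions written as attributes on un-namespaced elements; real project files have more variations than this.

```python
import xml.etree.ElementTree as ET


def packages_config_refs(xml_text):
    """Older style: <package id="..." version="..." /> inside packages.config."""
    root = ET.fromstring(xml_text)
    return [(p.get("id"), p.get("version")) for p in root.iter("package")]


def package_reference_refs(xml_text):
    """Newer style: <PackageReference Include="..." Version="..." /> in a project file."""
    root = ET.fromstring(xml_text)
    return [(p.get("Include"), p.get("Version")) for p in root.iter("PackageReference")]


# The bootstrap entry is the one quoted in this post.
PACKAGES_CONFIG = """
<packages>
  <package id="bootstrap" version="3.0.0" targetFramework="net45" />
</packages>
"""

# Hypothetical project file, for illustration only.
CSPROJ = """
<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <PackageReference Include="Example.Package" Version="1.2.3" />
  </ItemGroup>
</Project>
"""

print(packages_config_refs(PACKAGES_CONFIG))
print(package_reference_refs(CSPROJ))
```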

Here is the exact sequence of steps:

  1. git clone https://github.com/dotnet/orleans.git
  2. dotnet build
  3. Deploy The Scanners!
  4. Validate the results.

A note about ambiguous results:

We classify some results as ambiguous. This means there’s definitely some smoke, so we can’t immediately rule it out as a false positive after examining the metadata, but on the other hand, there’s enough uncertainty to make us uncomfortable considering it a true hit.

Example:

The vulnerability references “bootstrap” in the scan report but the CVE description talks about “bootstrap-sass”. Maybe? Or in another case the CVE description starts out with the word (in all caps) “DISPUTED”.

Results of scanning .NET, Nuget project for vulnerabilities:

I’ll save the best for first! Here’s what MergeBase finds:

1. MergeBase

18 vulnerabilities found (and two ambiguous hits).

Drop the scanner into the Orleans subdirectory. Type “java -jar mergebase.jar .” and the results are pretty straightforward: 2 critical CVE’s, 5 high ones, and 11 mediums. A quick spot-check of the metadata looked good (no false positives and two ambiguous results).

2. Github Dependabot

Zero vulnerabilities found.

Dependabot is not doing much here, despite being a Microsoft product (albeit a recently acquired one).

3. Dotnet Retire

Two vulnerabilities found: CVE-2018-8292 and CVE-2018-8416. MergeBase also found these two among the 18 vulnerabilities it identified.

4. Sonatype

Zero .NET vulnerabilities found!

Sonatype does detect a small handful of JavaScript vulnerabilities (since Orleans contains things like “VotingWeb/wwwroot/lib/jquery/jquery.min.js”), but nothing for .NET. To be fair, their scanner instructions did say “you must copy all .NET packages you depend on into the zip file you are scanning beforehand.” I typed “dotnet build” and zipped the result (660MB). As far as I’m concerned, I was doing them a favour by even zipping up orleans post-build in the first place – no other scanner required that.

Note-to-self: Probably MergeBase should also scan those JavaScript packages! (Our current logic looks for NPM and Yarn lock files, but maybe it’s time to roll up our sleeves and consider scanning raw *.js and *.min.js files, too.)

5. Snyk

“It’s Complicated!” The problem with Snyk is that there are two different ways to invoke the Snyk scanner, and each way returns wildly different results.

Snyk Approach #1 – Github Integration:

15 vulnerabilities found. 5 of those are false positives (all because Microsoft.NETCore.App was flagged as a dependency, but it’s not). 3 are ambiguous. 7 are true hits.

A few NPM and Docker vulnerabilities were also found, but seeing as this bakeoff is only about .NET we ignored those.

Snyk Approach #2 – Command Line Invocation:

7 vulnerabilities found. 3 are ambiguous, leaving 4 true hits, including 1 true hit that Snyk approach #1 above did not find (CVE-2020-1469).

No NPM or Docker vulnerabilities found via this approach.

6. WhiteSource Bolt

12 true CVE vulns.

3 ambigs.

7. OWASP Dependency Check

12 true CVE vulns.

3 true NON-CVE vulns.

2 ambigs.

17 false positives.

Unfortunately OWASP Dependency Check is currently unable to handle .NET’s property substitution (e.g., when a *.csproj file references “Directory.Build.props”), a common convention for developers maintaining these files. This causes some frustrating false positives, such as reporting that “Google.Protobuf:$(GoogleProtobufVersion)” is vulnerable to CVE-2015-5237.
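To show what property substitution involves, here is a deliberately simplified Python sketch: collect the properties defined in a props file, then replace each “$(Name)” placeholder in a version string. The function names and the version number in the sample props file are made up for illustration, and real MSBuild evaluation (conditions, imports, defaults) is far richer than this.

```python
import re
import xml.etree.ElementTree as ET

# Matches MSBuild-style placeholders such as $(GoogleProtobufVersion).
PROPERTY_REF = re.compile(r"\$\((\w+)\)")


def load_properties(props_xml):
    """Collect the children of every <PropertyGroup> as {name: value}."""
    root = ET.fromstring(props_xml)
    props = {}
    for group in root.iter("PropertyGroup"):
        for child in group:
            props[child.tag] = (child.text or "").strip()
    return props


def resolve(value, props):
    """Replace each $(Name) with its value; unknown names are left untouched."""
    return PROPERTY_REF.sub(lambda m: props.get(m.group(1), m.group(0)), value)


# Hypothetical Directory.Build.props content; the version is made up.
DIRECTORY_BUILD_PROPS = """
<Project>
  <PropertyGroup>
    <GoogleProtobufVersion>3.12.3</GoogleProtobufVersion>
  </PropertyGroup>
</Project>
"""

props = load_properties(DIRECTORY_BUILD_PROPS)
print(resolve("$(GoogleProtobufVersion)", props))
print(resolve("$(SomethingUndefined)", props))
```

A scanner that skips this step ends up treating the literal placeholder text as the version, which is how reports like the one above arise.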

OWASP Dependency Check also considers version 0.61.0 of the .NET MySqlConnector package to be vulnerable to 14 CVE’s – these are certainly all false positives. This is probably happening because Dependency Check considers version “0.61.0” to come before releases from MySQL’s popular version 5.x series against which many CVE’s have been filed over the years. However, version “0.61.0” of this package is less than 10 months old, making it impossible that it’s vulnerable to these ancient CVE’s.

Scanning .NET and Nuget Conclusion

Our own offering looks compelling in the .NET space. We were one of the top performers in this mini-benchmark, with Snyk, WhiteSource, and the open source OWASP Dependency Check tool also providing reasonable results. We were surprised to see Sonatype and Dependabot perform so poorly here. Software developers currently using the popular open source “Dotnet Retire” tool for this problem should definitely consider other options.

As always with benchmarks such as these, and with security tools in general, your mileage can vary a lot based on the tools you’re using and your own particular context. I think a lot of companies become complacent with their existing tools. As with the smoke detectors in your house, it’s a good idea to benchmark your existing tools periodically, just to ensure that they are still working properly!

Last piece of advice: Have .NET software? Give MergeBase a closer look!

Introducing CodeGreen for Bitbucket

Recommended pre-reading:
  
Intro to SCA – Software Composition Analysis (mergebase.com)

Atlassian Marketplace Link:
MergeBase CodeGreen (marketplace.atlassian.com)

Introduction

One of the main challenges with known-vulnerabilities is how they mess with standard software lifecycles. A lot of traditional quality engineering relies on the old saying, “if it’s not broken, don’t touch it.” Known-vulnerability announcements for popular open source libraries completely go against that, since vulnerabilities are discovered and announced more or less at random. A good known-vulnerability SCA solution needs to deal with three very different cadences through which known-vulnerabilities will manifest themselves in your software.

If you’re serious about reducing open-source known-vulnerabilities within your software assets, CodeGreen is a tool for getting real results company wide. CodeGreen puts known-vulnerability software composition analysis (SCA) scans directly in front of software engineer eyeballs. A lot of application security work is done by following checklists and invoking security tools and uploading artifacts to cloud URLs during coding and reviewing tasks. CodeGreen short circuits all that by inserting itself directly into your company’s software engineering workflow (as a Bitbucket plugin). From there CodeGreen can inject a range of interventions customized to your corporate application security policy, from low-friction informational reports all the way to outright blocking. These interventions help you quickly get all of your software engineering teams onto the same page.

By attaching directly to the enterprise source-control system (as a Bitbucket plugin), CodeGreen is able to improve application security posture across the board for an entire organization. Your application security will improve within hours after your local Bitbucket administrator installs the CodeGreen plugin through Atlassian’s marketplace.

Vulnerabilities Arrive On Different Cadences

  1. New vulnerability announcements. Your application is not broken, in fact it’s working great! Clients love it. Management is happy. But a known vulnerability has been discovered and published that could be exploited by criminals and bring your brand down. You have to fix it! You must upgrade the insecure library to a safer version.
  2. Accidental vulnerability import (“developer-as-vector”). Under this scenario one of your developers unwittingly introduces a bad library version (that contains known-vulnerabilities) into one of your systems. Just because a “known-vulnerability” is known to the cyber security world at large does not mean it’s known to your own development staff!
  3. That terrifying first scan. This scenario is essentially a combination of the above two scenarios, albeit after several years of unmonitored vulnerability accumulation. The experience of running a first vulnerability scan can be so overwhelming and demoralising for staff that good SCA tools must account for this and provide strategies to manage the first scan.

CodeGreen is a unique tool in the SCA space in that it provides mitigations, reports, and controls designed specifically for these 3 cadences. The rest of this blog post goes into those capabilities in-depth.

For New-Vulnerability-Announcements: Add A Little Friction (Cadence #1)

Developers need to be aware of how newly discovered vulnerabilities affect their systems, but finding time to address these is always a balancing act based on risk, urgency, and other priorities. This is where CodeGreen can apply a little friction.

For Developer-As-Vector: Slam On The Brakes! (Cadence #2)

For cadence #2 (developer-as-vector), once awareness is in place, vulnerabilities should never come into software via this vector. The vast majority of software vulnerabilities are announced alongside a patched (fixed) release of the library. This means developers should never introduce vulnerable libraries into a software project unless such is absolutely unavoidable. This is where CodeGreen can slam on the brakes.
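Conceptually, catching the developer-as-vector case comes down to a set difference: which known vulnerabilities are present after a change that were not present before it? The Python sketch below illustrates that idea only. The function names are hypothetical and this is not CodeGreen’s actual implementation; the CVE identifiers are ones mentioned elsewhere on this blog, used purely as sample data.

```python
def net_new(before, after):
    """Vulnerabilities present after the change that were not present before it."""
    return sorted(set(after) - set(before))


def should_block(before, after):
    """Block the change only when it introduces a vulnerability that is new to the repo."""
    return bool(net_new(before, after))


# The repository already carries one known vulnerability.
existing = {"CVE-2018-8292"}

# A change that adds a library version affected by a second CVE.
after_bad_change = {"CVE-2018-8292", "CVE-2015-7501"}

# A change that leaves the set of vulnerabilities as it was.
after_neutral_change = {"CVE-2018-8292"}

print(net_new(existing, after_bad_change))
print(should_block(existing, after_bad_change))
print(should_block(existing, after_neutral_change))
```

The useful property is that pre-existing vulnerabilities do not trigger the block, so teams with a backlog can still ship while the door stays closed to new problems.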

Managing That Terrifying First Scan (Cadence #3)

A lot of security tools are sold and marketed based on a simplified model of their operation – the tool is presented like a flashlight. Turn on the light, see into the darkness. But under the hood the tool might offer dials and controls and subtleties to help users make its operation more successful. CodeGreen is no exception here!

Under ideal operation CodeGreen would be configured to apply maximum friction to encourage developers to eliminate all vulnerabilities, but that’s not tractable for most organizations, at least not at first.

To help make CodeGreen more practical we allow repository administrators to adjust the CVSS thresholds at which the various CodeGreen mitigations become active.

We recommend setting these to more permissive values during your initial rollout, and tightening them to more restrictive values as your teams’ application-security maturity improves.

For example, in the beginning you might want to enable only the CodeGreen double-push friction and set it to a CVSS 9.0 threshold and disable everything else. Make it an explicit near-term goal to clear out all 9.0 vulnerabilities and above.

(But always enable “block-net-new-vulnerabilities” because that’s the dreaded cadence #2!)

Once you’ve achieved that, lower the “double-push” control’s CVSS threshold to 8.0, so it catches more vulnerabilities.

Meanwhile, enable the “requires dual-approval” control (a much higher friction compared to double-push) and set that one to 9.0.

The end result here is interesting: any newly announced vulnerabilities will suddenly dramatically slow down development teams. The developer has a choice: find someone to approve their work, leaving the vulnerability in place, or just patch the brand new 9.8 vulnerability and avoid the dual-approval.
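The staged rollout described above can be modelled as a small decision function: given the configured thresholds and the worst CVSS score in play, which intervention applies? The Python sketch below is a hypothetical model for illustration only. The Policy fields, the function name, and the precedence order are my assumptions, not CodeGreen’s real configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    # A threshold of None means the control is disabled.
    double_push_threshold: Optional[float] = 9.0
    dual_approval_threshold: Optional[float] = None
    block_net_new: bool = True


def intervention(policy, worst_cvss, introduces_new_vulnerability):
    """Pick the strongest intervention that applies (assumed precedence order)."""
    if policy.block_net_new and introduces_new_vulnerability:
        return "block"
    if policy.dual_approval_threshold is not None and worst_cvss >= policy.dual_approval_threshold:
        return "dual-approval"
    if policy.double_push_threshold is not None and worst_cvss >= policy.double_push_threshold:
        return "double-push"
    return "allow"


# Initial rollout: only double-push at 9.0, plus blocking of net-new vulnerabilities.
initial = Policy()

# Later stage: double-push tightened to 8.0, dual-approval enabled at 9.0.
tightened = Policy(double_push_threshold=8.0, dual_approval_threshold=9.0)

print(intervention(initial, 9.8, False))    # double-push
print(intervention(initial, 8.5, False))    # allow
print(intervention(tightened, 8.5, False))  # double-push
print(intervention(tightened, 9.8, False))  # dual-approval
print(intervention(tightened, 5.0, True))   # block
```

Under the tightened policy a freshly announced 9.8 lands in the dual-approval tier, which is the slowdown described above.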

Which would you choose?

It’s a lot like thoroughly cleaning a house methodically from top to bottom: once a given room is clean, you can lock its door to prevent any additional mess from occurring in the already cleaned room. Similarly here you can clean out all the 9.0’s and above, and then “lock the door” on them by turning on the dual-approval control.

Conclusion

CodeGreen improves application security posture across the board for your entire organization by embedding open-source known-vulnerability scans directly into your centralized git source control. Your application security will improve within hours after your local Bitbucket administrator installs it!

MergeBase successful seed raise

VANCOUVER, BC – Will cybercrime cause $1 trillion in damage to an already-vulnerable economy by 2021? Not if MergeBase Software Inc. has anything to do with it. The company has just announced it raised $500,000 in funding for its best-in-class cybersecurity product, helping it ramp up sales and distribution. The funding round officially closed on March 19.

“I’m impressed that during this unprecedented crisis, business leaders and investors are able to ‘keep calm and carry on’, continuing to invest in leading-edge technology to solve critical problems like cybercrime,” says MergeBase CEO Oscar van der Meer. 

The current COVID-19 crisis and social distancing measures will only accelerate the move to a fully digital economy. In this new environment, cybersecurity for digital assets and IT will be even more mission-critical to business and governments.

“So many technology-powered companies are built on open-source code and third-party apps — which is a quicker, easier and cheaper way of building software,” he explains. “But those savings come with a cost, exposing organisations using these applications to data breaches.

“Integration with external apps already causes up to 24 percent of all cybersecurity breaches — and that’s only going to grow,” he says. “MergeBase is a best-in-class solution to boost the immune system of enterprises around the world.” MergeBase’s app-security solution detects more vulnerabilities than any other tool on the market.

Enterprises are already boosting purchases of app security solutions, from 5 percent in 2019 to 60 percent by 2024, according to a Gartner report. “It’s why we’re expecting to see big growth in our business.” 

MergeBase’s solutions are aimed at large enterprises. Their customers include a government agency that processes trillions of dollars of payments every year. 

The company’s co-founders bring a combined 50 years of experience in the financial industry, and a wealth of knowledge about identifying and dealing with vulnerabilities in technology; for instance, van der Meer was a senior executive at Central1, the central financial facility and trade association for the B.C. and Ontario credit union systems. 

Investors in the current funding round include Lisa Shields, the Western Universities Technology Innovation Fund (WUTIF), and Maninder Dhaliwal.

Intro to SCA software composition analysis

Background

Why do we need SCA software? Well, that is a long story:

Commercial and industrial software is now primarily constructed from components. Open source components, to be exact. Open source software licenses dramatically decrease business frictions that arise from incorporating and integrating software developed by external entities. No more contract negotiation or in-house legal review!

Add to this the fact that many software use cases are more or less identical across systems: HTTP connectivity, encryption, spell checking, transaction management, database object mapping, unit testing, etc. The end result is predictable: in less time than it takes to read the 2-clause BSD open source software license, your developers are copying externally developed software libraries into your proprietary systems. Because: why not? The license allows it, and developers achieve their objectives with fewer bugs and time to spare.

Software developers can now easily obtain pre-fabricated high-quality software libraries to help implement significant portions of their software. Your colleagues only need to write a small amount of glue code to wire these libraries into the larger system. Software, like automobiles, is now made mostly from parts.

But unlike cars, the supply chain in the software world is complete mayhem and chaos. Consider this common clause found in the majority of open source software licenses:

Unless required by applicable law or agreed to in writing, Licensor provides the Work on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND

Risk management would flag this as a major risk, and procurement policies would not allow such clauses to stand in a legal review. But the advantages of open source software outweigh the risk of running without warranties. You get what you pay for. But there are costs. People often say there is no such thing as a free lunch – the same is true of open source software.

The Upgrade Problem

Some people, when confronted with a problem, think “I know, I’ll use an open source software library.” Now they have two problems.

When I build my systems, I choose specific versions of open source libraries to incorporate into these systems. These versions quickly become stale as open source authors continuously update and evolve their libraries, issuing new releases periodically. Worse, you’ll start to find vulnerabilities in them.

At first glance, the problem looks pretty simple and straightforward. Somewhere in my build script there will be a few lines like this:

<dependency>
   <artifactId>apache-struts2</artifactId>
   <version>2.5.22</version>
</dependency>

If I’m using JavaScript / NPM, then it would look a little different, but essentially the same:

"dependencies": {
   "apache-struts2": "^2.5.22"
}

To avoid any vulnerability to the infamous cyber security bug that took down Equifax in 2017, I just have to change the “<version>” line in the text above to this:

 <version>2.5.23</version>

Then I click “save” and “build” in my coding editor and voila! My software is now safe. In the NPM example things are even better thanks to that “^” character. The “^” symbol allows the build to pick up 2.5.23 (or any later release with the same major version) automatically the next time dependencies are resolved.
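To make the “^” behaviour concrete, here is a minimal Python sketch of the caret rule used by npm-style semantic version ranges. It is a simplification that assumes plain `major.minor.patch` versions and ignores pre-release tags; the function names are my own, not part of any package manager.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split a plain 'major.minor.patch' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_allows(base: str, candidate: str) -> bool:
    """True if `candidate` falls inside the caret range '^base'.

    The caret permits changes that leave the left-most non-zero
    component of `base` untouched.
    """
    low = parse(base)
    major, minor, patch = low
    if major > 0:
        high = (major + 1, 0, 0)
    elif minor > 0:
        high = (0, minor + 1, 0)
    else:
        high = (0, 0, patch + 1)
    return low <= parse(candidate) < high
```

So `caret_allows("2.5.22", "2.5.23")` holds, while a jump to `3.0.0` falls outside the range and needs a manual edit.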

However, despite how simple the example above appears, in actual practice this problem is a complete f’ing nightmare. For several reasons:

  1. How do I even find out that the libraries I’m using have updates available?
  2. My system currently operates correctly (to the best of my knowledge). Could a library update break my system (regression risk)?
  3. Sometimes libraries change their own calling protocols and requirements in subtle or even not-so-subtle ways. How much work will I need to do updating my glue code to integrate a particular library’s newest version into my software system?
  4. Related to item #3, do the library authors themselves have any recommendations regarding long term plans? For example, the authors of the popular “Apache HttpClient 3.x” library decided they hated maintaining it and rewrote the library completely from scratch. They actively encouraged consumers of their library to switch to their new rewrite (“Apache HttpComponents 4.x”), and stopped all maintenance of the older library, but unfortunately switching to this newer version required significant effort for consumers.
  5. Does the current version of the library I’m using have any critical security flaws in it? Normal bugs prevent or perturb normal usage patterns, but I’ve already established that the library operates correctly within my system, and so I’m not too concerned about normal bug fixes. Security bugs are a whole different animal, since they often allow malicious users to cause the library to misbehave in ways that can degrade or even breach and exploit the larger running system.
  6. Are any of the critical security flaws widely known to the public at large? E.g., are they referenced by specific CVE (Common Vulnerabilities and Exposures) advisories within the U.S. Government’s NVD (National Vulnerability Database)? Upgrading library versions that are associated with CVE records should be considered a high priority, since cyber security breaches via these vectors are often perceived as engineering negligence by the public.
  7. Can we confirm exploitability based on our current configuration? If we can prove our specific setup is non-exploitable, that can buy us time to postpone the upgrade for now. But sometimes even establishing non-exploitability requires more work than simply upgrading the library.
  8. Bear in mind we must tackle this problem repeatedly for every library currently incorporated into our larger software system. Most minimally useful commercial systems will bring in at least 30 libraries; I figure the average is around 80 libraries; and I’ve personally seen systems that contain more than 300 distinct libraries.
  9. Some practitioners recommend upgrading libraries when new library versions contain useful features that you would like to incorporate into your system, especially if such new features would allow you to delete some of your own code. I am on the fence on this matter, since in my opinion the maxim “if it ain’t broke, don’t fix it” outweighs this. However, should a library update happen to obviate code you are using in a different library, allowing you to completely remove one of the library dependencies from your system, I do recommend taking that upgrade. Good luck ever noticing such obviations, however.

The list above enumerates the tensions and problems we face when upgrading software components.

How to manage open source software vulnerabilities and license risk?

So what are people doing about it? Firsthand, “in the field,” I’ve seen three different approaches applied to this upgrade problem.

  • PURE MANUAL BEST EFFORTS. Under this approach the engineering team tries their best to keep library versions up to date when possible, and they try to keep an eye on any associated CVE records in the NVD through Google searches and peripheral awareness. END RESULT: typically these systems are severely stale and rife with vulnerabilities.
  • AUTOMATED ALWAYS UPGRADE EVERYTHING ALWAYS. These systems are less affected by CVEs or other known vulnerabilities, since known-vulnerability announcements tend to correspond to version updates, and systems under this regime take in updates immediately. This approach does not deal well with incompatible library upgrades, which usually end up in a “Pure Manual Best Efforts” pile. END RESULT: these systems tend to have fewer known vulnerabilities, but they can be vulnerable to broken builds and regression bugs. They are also vulnerable to supply-chain attacks such as the event-stream NPM attack that occurred in late 2018.
  • TOOL ASSISTED SOFTWARE COMPOSITION ANALYSIS. Engineering teams can use SCA (Software Composition Analysis) tools to tackle the upgrade problem. Despite their name, SCA tools should really be called recall notifiers, since that is their primary function: to determine all public recalls associated with any of the software component versions referenced in a given system. These tools operate similarly to the computer at your car dealership when the dealer types in your VIN and determines if your car has any outstanding recalls for any of its constituent parts. SCA tools immediately surface all library versions within your system that correspond to item 6 of my list above, helping software engineers prioritise their upgrading efforts to focus on the most urgent library updates.

SCA tools sometimes include additional features such as copyright license analysis and staleness checks. MergeBase’s own SCA toolchain focuses exclusively on the recall problem.
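The “recall notifier” idea boils down to a lookup of each (library, version) pair against a feed of advisories. The Python sketch below illustrates only that core idea and is not how MergeBase or any other SCA product is implemented. The `ADVISORIES` table, the library names, and the advisory ID are made-up placeholders, not real CVE records.

```python
from typing import Dict, List, Tuple

# Hypothetical advisory feed: (library, version) -> advisory IDs.
# Every name and ID here is a placeholder, not a real record.
ADVISORIES: Dict[Tuple[str, str], List[str]] = {
    ("example-http-lib", "1.0.0"): ["EXAMPLE-ADVISORY-1"],
}


def find_recalls(dependencies: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each affected library to the advisories filed against its version."""
    recalls: Dict[str, List[str]] = {}
    for name, version in dependencies.items():
        advisories = ADVISORIES.get((name, version), [])
        if advisories:
            recalls[name] = advisories
    return recalls


# A system's dependency list, as it might be read from a build file.
system = {"example-http-lib": "1.0.0", "example-json-lib": "2.1.0"}
print(find_recalls(system))
```

Real tools need far more than an exact-match table (version ranges, transitive dependencies, renamed or embedded libraries), which is where most of the engineering effort goes.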

In summary, the “Upgrade Problem” is a fundamental tension inherent to any software development practice that builds on reusable software components. The problem is not easy to resolve, but ultimately some libraries MUST be upgraded. Personally I recommend tying the library upgrade decision to two factors: first, consider the library version’s current cyber security risk profile, and second, consider if the library’s own development team is relatively active and responsive.

In a nutshell, leave the library version alone (do not update it) if the following two factors hold (“if it ain’t broke”):

  1. The library is actively maintained.
  2. There are currently no public known-vulnerability security advisories tied to the version my system is using.

Otherwise, upgrade the library! In particular, if factor #1 no longer holds, migrate as soon as possible to an actively maintained competing library. Dead open source libraries like httpclient-3.x and apache-axis are notorious for accumulating CVEs, and emergency migrations away from such defunct libraries become high-effort and high-risk – a terrible combination.
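As a rough sketch, the two-factor rule above fits in one small Python function. The function name, its parameters, and the returned labels are my own shorthand; in practice the advisory count would come from an SCA tool and the maintenance status from your own judgement.

```python
def upgrade_decision(actively_maintained: bool, known_advisories: int) -> str:
    """Apply the two-factor rule to a single library version."""
    if not actively_maintained:
        # Factor #1 fails: move to a maintained competitor before an emergency forces it.
        return "migrate"
    if known_advisories > 0:
        # Factor #2 fails: the version in use has public advisories against it.
        return "upgrade"
    # Both factors hold: "if it ain't broke, don't fix it".
    return "leave alone"
```

For example, `upgrade_decision(True, 0)` returns `"leave alone"`, while an unmaintained library returns `"migrate"` even when it has no advisories yet.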

Risk management for your open source software

An SCA tool (such as MergeBase Detect) is critical for determining if a library should be upgraded. In my own experience the “upgrade problem” is simply not tractable for manual best-effort approaches, and always-upgrade is too much work with too little benefit.

There’s one major caveat though. If you’ve been using the “PURE MANUAL BEST EFFORTS” approach for a long time, you need to both don a safety mask and buckle your seat belt before first running an SCA tool against your system. The initial report is going to be intimidating and overwhelming.
