
MergeBase announces successful seed raise

VANCOUVER, BC – Will cybercrime cause $1 trillion in damage to an already-vulnerable economy by 2021? Not if MergeBase Software Inc. has anything to do with it. The company has just announced it raised $500,000 in funding for its best-in-class cybersecurity product, helping it ramp up sales and distribution. The funding round officially closed on March 19.

“I’m impressed that during this unprecedented crisis, business leaders and investors are able to ‘keep calm and carry on’, continuing to invest in leading-edge technology to solve critical problems like cybercrime,” says MergeBase CEO Oscar van der Meer. 

The current COVID-19 crisis and social distancing measures will only accelerate the move to a fully digital economy. In this new environment, cybersecurity for digital assets and IT will be even more mission-critical to business and governments.

“So many technology-powered companies are built on open-source code and third-party apps — which is a quicker, easier and cheaper way of building software,” he explains. “But those savings come with a cost, exposing organisations using these applications to data breaches.

“Integration with external apps already accounts for up to 24 percent of all cybersecurity breaches, and that’s only going to grow,” he says. “MergeBase is a best-in-class solution to boost the immune system of enterprises around the world.” MergeBase’s app-security solution detects more vulnerabilities than any other tool on the market.

Enterprises are rapidly adopting application security: according to a Gartner report, the share of enterprises using app security solutions will grow from 5 percent in 2019 to 60 percent by 2024. “It’s why we’re expecting to see big growth in our business.”

MergeBase’s solutions are aimed at large enterprises. Its customers include a government agency that processes trillions of dollars of payments every year.

The company’s co-founders bring a combined 50 years of experience in the financial industry, and a wealth of knowledge about identifying and dealing with vulnerabilities in technology. Van der Meer, for instance, was a senior executive at Central 1, the central financial facility and trade association for the B.C. and Ontario credit union systems.

Investors in the current funding round include Lisa Shields, the Western Universities Technology Innovation Fund (WUTIF), and Maninder Dhaliwal.

Intro to SCA (Software Composition Analysis)

Background

Why do we need SCA software? Well, that is a long story:

Commercial and industrial software is now primarily constructed from components. Open source components, to be exact. Open source software licenses dramatically decrease business frictions that arise from incorporating and integrating software developed by external entities. No more contract negotiation or in-house legal review!

Add to this the fact that many software use cases are more or less identical across systems: http connectivity, encryption, spell checking, transaction management, database object mapping, unit testing, etc. The end result is predictable: in less time than it takes to read the 2-clause BSD open source software license, your developers are copying externally developed software libraries into your proprietary systems. Because: why not? The license allows it, and developers achieve their objectives with fewer bugs and time to spare.

Software developers can now easily obtain pre-fabricated high-quality software libraries to help implement significant portions of their software. Your colleagues only need to write a small amount of glue code to wire these libraries into the larger system. Software, like automobiles, is now made mostly from parts.

But unlike cars, the supply chain in the software world is complete mayhem and chaos. Consider this common clause, found in the majority of open source software licenses:

Unless required by applicable law or agreed to in writing, Licensor provides the Work on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND

Risk management would flag this as a major risk, and procurement policies would not allow such a clause to stand in a legal review. But the advantages of open source software outweigh the risk of running without warranties. Still, there is no such thing as a free lunch, and the same is true of open source software: the savings come with costs of their own.

The Upgrade Problem

Some people, when confronted with a problem, think “I know, I’ll use an open source software library.” Now they have two problems.

When I build my systems, I choose specific versions of open source libraries to incorporate into them. These versions quickly become stale as open source authors continuously update and evolve their libraries, issuing new releases periodically. Worse, vulnerabilities will start to surface in those stale versions.

At first glance, the problem looks pretty simple and straightforward. Somewhere in my build script my software will contain a line like this:

<dependency>
  <groupId>org.apache.struts</groupId>
  <artifactId>struts2-core</artifactId>
  <version>2.5.22</version>
</dependency>

If I’m using JavaScript / NPM, then it would look a little different, but essentially the same:

"dependencies": {

   "apache-struts2": "^2.5.22"

 }

To avoid any vulnerability to the infamous cyber security bug that took down Equifax in 2017 (CVE-2017-5638, in Apache Struts 2), I just have to change the “<version>” line in the text above to this:

 <version>2.5.23</version>

Then I click “save” and “build” in my coding editor and voila! My software is now safe. In the NPM example things are even better, thanks to that “^” character: it declares a compatible-version range, so the next install picks up 2.5.23 automatically.
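For reference, here is a quick sketch of those range semantics (standard npm/semver behaviour; both commands are stock npm):

# Semver ranges in package.json:
#   "^2.5.22"  allows >=2.5.22 <3.0.0  (e.g. 2.5.23, 2.6.0; not 3.0.0)
#   "~2.5.22"  allows >=2.5.22 <2.6.0  (patch updates only)
npm update      # re-resolves dependencies within their declared ranges
npm install     # a fresh install also picks the newest in-range versions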

However, despite how simple the example above appears, in actual practice this problem is a complete f’ing nightmare. For several reasons:

  1. How do I even find out that the libraries I’m using have updates available? (Tooling can help here; see the command sketch after this list.)
  2. My system currently operates correctly (to the best of my knowledge). Could a library update break my system (regression risk)?
  3. Sometimes libraries change their own calling protocols and requirements in subtle or even not-so-subtle ways. How much work will I need to do updating my glue code to integrate a particular library’s newest version into my software system?
  4. Related to item #3, do the library authors themselves have any recommendations regarding long term plans? For example, the authors of the popular “Apache HttpClient 3.x” library decided they hated maintaining it and rewrote the library completely from scratch. They actively encouraged consumers of their library to switch to their new rewrite (“Apache HttpComponents 4.x”), and stopped all maintenance of the older library, but unfortunately switching to this newer version required significant effort for consumers.
  5. Does the current version of the library I’m using have any critical security flaws in it? Normal bugs prevent or perturb normal usage patterns, but I’ve already established that the library operates correctly within my system, and so I’m not too concerned about normal bug fixes. Security bugs are a whole different animal, since they often allow malicious users to cause the library to misbehave in ways that can degrade or even breach and exploit the larger running system.
  6. Are any of the critical security flaws widely known to the public at large? E.g., are they referenced by specific CVE (Common Vulnerabilities and Exposures) advisories within the U.S. Government’s NVD (National Vulnerability Database)? Upgrading library versions that are associated with CVE records should be considered a high priority, since cyber security breaches via these vectors are often perceived as engineering negligence by the public.
  7. Can we confirm exploitability based on our current configuration? If we can prove our specific setup is non-exploitable, that can buy us time to postpone the upgrade for now. But sometimes even establishing non-exploitability requires more work than simply upgrading the library.
  8. Bear in mind we must tackle this problem repeatedly for every library currently incorporated into our larger software system. Most minimally useful commercial systems will bring in at least 30 libraries; I figure the average is around 80 libraries; and I’ve personally seen systems that contain more than 300 distinct libraries.
  9. Some practitioners recommend upgrading libraries when new library versions contain useful features that you would like to incorporate into your system, especially if such new features would allow you to delete some of your own code. I am on the fence on this matter, since in my opinion the maxim “if it ain’t broke, don’t fix it” outweighs this. However, should a library update happen to obviate code you are using in a different library, allowing you to completely remove one of the library dependencies from your system, I do recommend taking that upgrade. Good luck ever noticing such obviations, however.
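Items 1 and 6 at least have decent tooling support. A minimal sketch, assuming a Maven or NPM project (all three commands are standard):

# Maven: list dependencies with newer releases (versions-maven-plugin)
mvn versions:display-dependency-updates

# NPM: list dependencies with newer releases
npm outdated

# NPM: check the dependency tree against public vulnerability advisories
npm audit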

The list above enumerates the tensions and problems we face when upgrading software components.

How to manage open source software vulnerabilities and license risk?

So what are people doing about it? First-hand, “in the field”, I’ve seen three different approaches applied to this upgrade problem.

  • PURE MANUAL BEST EFFORTS. Under this approach the engineering team tries their best to keep library versions up to date when possible, and they try to keep an eye on any associated CVE records in the NVD database through Google searches and peripheral awareness. END RESULT: typically these systems are severely stale and rife with vulnerabilities.
  • AUTOMATED ALWAYS UPGRADE EVERYTHING ALWAYS. These systems are less affected by CVEs or other known vulnerabilities, since known-vulnerability announcements tend to correspond to version updates, and systems under this regime take in updates immediately. This approach does not deal well with incompatible library upgrades, and projects that hit one usually end up back in the “Pure Manual Best Efforts” pile. END RESULT: these systems tend to have fewer known vulnerabilities, but they can be vulnerable to broken builds and regression bugs. They are also vulnerable to supply-chain attacks such as the event-stream NPM attack that occurred in late 2018.
  • TOOL ASSISTED SOFTWARE COMPOSITION ANALYSIS. Engineering teams can use SCA (Software Composition Analysis) tools to tackle the upgrade problem. Despite their name, SCA tools should really be called recall notifiers, since that is their primary function: to determine all public recalls associated with any of the software component versions referenced in a given system. These tools operate much like the computer at your car dealership when the dealer types in your VIN and determines whether your car has any outstanding recalls for any of its constituent parts. SCA tools immediately surface all library versions within your system that correspond to item 6 of my list above, helping software engineers prioritise their upgrading efforts to focus on the most urgent library updates.

SCA tools sometimes include additional features such as copyright license analysis and staleness checks. MergeBase’s own SCA toolchain focuses exclusively on the recall problem.
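I won’t reproduce MergeBase Detect’s own invocation here, but as a generic illustration of what running an SCA scan looks like from the command line, here is the open-source OWASP Dependency-Check CLI (a sketch; adjust the project name and paths to your own system):

# Scan an application's libraries for known-vulnerability recalls (CVEs)
# and write a report:
dependency-check.sh --project MyApp --scan ./lib --out ./reports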

In summary, the “Upgrade Problem” is a fundamental tension inherent to any software development practice that builds on reusable software components. The problem is not easy to resolve, but ultimately some libraries MUST be upgraded. Personally I recommend tying the library upgrade decision to two factors: first, consider the library version’s current cyber security risk profile, and second, consider whether the library’s own development team is relatively active and responsive.

In a nutshell, leave the library version alone (do not update it) if the following two factors hold (“if it ain’t broke”):

  1. The library is actively maintained.
  2. There are currently no public known-vulnerability security advisories tied to the version my system is using.

Otherwise, upgrade the library! In particular, if factor #1 no longer holds, migrate as soon as possible to an actively maintained competing library. Dead open source libraries like httpclient-3.x and apache-axis are notorious for accumulating CVEs, and emergency migrations off such defunct libraries become high-effort and high-risk, a terrible combination.
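Expressed as runnable pseudocode (a sketch only; the two checks are stubs you would back with real signals such as commit activity and NVD lookups):

#!/usr/bin/env bash
# Stub checks: wire these up to real data sources.
actively_maintained() { return 0; }   # e.g. commits within the last year?
has_known_cves()      { return 1; }   # e.g. CVE records for this exact version?

if has_known_cves; then
  echo "upgrade now"
elif ! actively_maintained; then
  echo "migrate soon to an actively maintained alternative"
else
  echo "leave it alone (if it ain't broke)"
fi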

Risk management for your open source software

An SCA tool (such as MergeBase Detect) is critical for determining if a library should be upgraded. In my own experience the “upgrade problem” is simply not tractable for manual best-effort approaches, and always-upgrade is too much work with too little benefit.

There’s one major caveat though. If you’ve been using the “PURE MANUAL BEST EFFORTS” approach for a long time, you need to both don a safety mask and buckle your seat belt before first running an SCA tool against your system. The initial report is going to be intimidating and overwhelming.

DOING GIT WRONG

Too much fun with “git pull --rebase”

Note: this article refers to “git pull -r” and “git pull --rebase” interchangeably. They are the same command, except the merge-preserving variation can only be specified via the long form: git pull --rebase=preserve

Introduction

I’ve long known that “git pull --rebase” reconciles the local branch correctly against upstream amends, rebases, and reorderings. The official “git rebase” documentation attests to this:

‘Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..<upstream> are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped).’

Thanks to the git patch-id command it’s easy to imagine how this mechanism might work. Take two commits, look at their patch-ids, and if they’re the same, drop the local one.
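You can inspect this yourself: git patch-id reads a patch on stdin and prints “patch-id commit-id”. A quick sketch (the commit SHAs and patch-id below are hypothetical):

# Same textual change under two different SHAs (e.g. before/after a rebase):
git show 1a2b3c4 | git patch-id
# 94f41a9c... 1a2b3c4...
git show 9d8e7f6 | git patch-id
# 94f41a9c... 9d8e7f6...
# Identical patch-ids, so "git pull -r" drops the local duplicate.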

But what about squashes and other force-pushes where git patch-id won’t help? What does “git pull -r” do in those cases? I created a series of synthetic force-pushes to find out. I tried squashes, merge-squashes, dropped commits, merge-base adjustments, and all sorts of other force-push craziness.

I was unable to confuse “git pull --rebase”, no matter how hard I tried. It’s bulletproof, as far as I can tell.

Investigating “git pull --rebase”

The context here is not a master branch that’s advancing. The context is a feature branch that two people are working in parallel, where either person might force-push at any time. Something like this:

The starting state: Evangeline and Gabriel are working together on branch ‘feature’. Note: ‘evangeline/feature’ is actually Evangeline’s local ‘feature’ branch, and ‘gabriel/feature’ is Gabriel’s local ‘feature’ branch. You can recreate it using the script below.

git init
echo 'a' > a; git add .; git commit -m 'a'
echo 'b' > b; git add .; git commit -m 'b'
echo 'c' > c; git add .; git commit -m 'c'
git checkout -b feature HEAD~1
echo 'd' > d; git add .; git commit -m 'd'
echo 'e' > e; git add .; git commit -m 'e'
echo 'f' > f; git add .; git commit -m 'f'
git checkout -b gabriel/feature
echo 'gf' > gf; git add .; git commit -m 'gf' --author='Gabriel <gabriel@mergebase.com>'
git checkout -b evangeline/feature HEAD~1
echo 'ef' > ef; git add .; git commit -m 'ef' --author='Evangeline <evangeline@mergebase.com>'
git push --mirror [url-to-an-empty-git-repo]

The Experiment

In each scenario Evangeline rewrites the history of origin/feature with a force-push of some kind, usually incorporating her own ‘ef’ commit into her push. Meanwhile, Gabriel has already made his own ‘gf’ commit to his local feature branch. For each scenario we want to see if Gabriel can use “git pull --rebase” to correctly reconcile his own work (his ‘gf’ commit) against Evangeline’s most recent push.

Preconditions

  1. We assume Gabriel has correctly set up remote tracking for his local feature branch. This is a reasonable assumption, since git sets this up by default when a user first types “git checkout feature”.
  2. We only tested Git v2.14.1 and Git v1.7.2 for this experiment. Perhaps “git pull –rebase” behaves differently in other versions.
  3. Important: we only use “git pull --rebase” (or -r). Some people claim “git fetch; git rebase origin/master” is equivalent to “git pull -r”, but it isn’t; see the sketch after this list.
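The difference comes down to fork-point detection. A sketch of the distinction, using this experiment’s ‘feature’ branch (my understanding of the mechanism, not a quote from the official docs):

# NOT equivalent:
git pull --rebase           # consults the reflog of the remote-tracking
                            # branch to compute a fork point, so it knows
                            # which commits are truly Gabriel's own
git fetch
git rebase origin/feature   # plain rebase: may replay commits that
                            # upstream squashed or dropped

# After a fetch, the pull-like behaviour can be approximated with:
git rebase --fork-point origin/feature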

Force-Push Scenarios

For each scenario we are on Gabriel’s local branch feature. The graph on the left shows the state of origin/feature (thanks to Evangeline’s force-push) and the state of Gabriel’s local feature, including how it relates to Evangeline’s force-push. The graph on the right shows the result of Gabriel typing “git pull -r”.

A scenario is deemed successful if “git pull -r” results in Gabriel’s ‘gf’ commit sitting on top of origin/feature. Since Gabriel does not push back in these scenarios, his ‘gf’ commit remains confined to his local feature branch.

  1. origin/feature rebased (against origin/master)
    This is the canonical example of why we prefer “git pull -r”. The rebase notices that older commits ‘d’, ‘e’, and ‘f’ on Gabriel’s feature branch are patch-identical to the rebased ones on origin/feature, and thus it only replays the final ‘gf’ commit. (A command-line reproduction of this scenario follows the list.)



    Result: Success!

  2. origin/feature squash-merged (with origin/master)
    This is the rebase + squash combo meal. Evangeline takes all work on feature, squashes it down to a single commit, and rebases it on top of origin/master. She probably did this via “git merge --squash”. I did not expect “git pull -r” to be able to handle this, but I was wrong.




    Result: Success!

  3. origin/feature squashed in-place
    This is the classic squash. Evangeline types “git rebase --interactive origin/master”. In the interactive screen she marks the first commit as “pick” and every other commit as “squash” or “fixup”. This squashes feature down to a single commit, but leaves the merge-base alone (commit ‘b’ in this case). I also did not expect “git pull -r” to handle this one, but I was wrong here, too.



    Result: Success!

  4. origin/feature dropped a commit
    For some reason Evangeline decided she wanted to drop commit ‘e’ from origin/feature. She ran “git rebase --interactive origin/master” and marked every commit as “pick”, except commit ‘e’, which she marked with “drop”. I expected “git pull -r” to erroneously bring commit ‘e’ back. I was wrong. Running “git rebase” instead of “git pull -r” did bring commit ‘e’ back, and so there is obviously some deeper intelligence inside “git pull -r” enabling the correct behaviour here.



    Result: Success!

  5. origin/feature lost their mind
    I have no idea what Evangeline was trying to do here. If you look closely, you’ll see she reversed her branch (‘ef’ is now the oldest commit), she squashed the middle two commits, and she adjusted the merge-base so that origin/feature emerges from commit ‘a’ on the mainline instead of commit ‘b’. This is one serious force-push! I had no idea what to expect here. I certainly did not expect “git pull -r” to nail it, but it did.



    Result: Success!

  6. origin/feature went back to how things were (undoes the rewrite)
    Evangeline, either through her reflog or her photographic memory, happened to remember that origin/feature previously pointed to commit ‘325a76a’. Here she force-pushes origin/feature back to ‘325a76a’ to undo her push from scenario 5. The command to do that is useful to know: “git push --force origin 325a76a:refs/heads/feature”. Staring in awe at how “git pull -r” did the right thing for scenario 5, all I could do was continue to stare when it did the same here. (Note: Gabriel’s start-state here is scenario 5, not the original start-state.)



    Result: Success!
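For the curious, here is roughly how scenario 1 plays out at the command line (a sketch, assuming Evangeline and Gabriel each work in their own clone of the shared repo):

# Evangeline: rebase 'feature' onto the latest master, then force-push.
git checkout feature
git rebase master
git push --force origin feature

# Gabriel: his local 'feature' still carries his own 'gf' commit.
git checkout feature
git pull --rebase
git log --oneline    # 'gf' now sits on top of the rebased origin/feature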

Conclusion: Time To Revise The Golden Rule

Supposedly, the golden rule of git is to never force-push public branches.

Of course I would never force-push against ‘master’ and ‘release/*’.  As a git admin, that’s always the first config I set for a new repo:  disallow all rewrites for ‘master’ and ‘release/*’.
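On a self-hosted bare repo, the blunt version of that config looks like this (a sketch; per-branch rules covering only ‘master’ and ‘release/*’ need your git server’s branch-permission settings or an update hook):

# Deny history rewrites and branch deletion, repo-wide:
git config receive.denyNonFastForwards true
git config receive.denyDeletes true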

But all public branches?  I find force-pushing feature branches incredibly useful.

Industry has arrived at a compromise: defer the rewrite to the final merge. Bitbucket, GitLab, and GitHub now offer “rebase” and “squash” flavours of PR merge. But it’s a silly compromise, because the golden rule itself is silly. Instead of building complex merge machinery to dance around the golden rule, I think we’d be better served by reworking the rule itself. Three reasons:

  1. Force-pushes are useful! Public amends, squashes, and rebases help us make better PRs for code review.
  2. What is the actual point of the golden rule?  Are we trying to prevent lost work on the mainlines (e.g., ‘master’ and ‘release/*’)?  If that’s the point, then we’re much better off setting appropriate branch permissions on our central git server for those branches.
  3. Is the point to prevent the spaghetti graphs caused by default “git pull” behaviour? In that case a better golden rule would be never use default “git pull” and always use “git pull --rebase”, since it avoids spaghetti graphs, while allowing history rewrites.

I propose a new golden git rule (in haiku form):

We never force-push master
or release. But always,
for all branches: git pull -r

Alternatively, you can make “git pull -r” the default behaviour:

git config --global pull.rebase true

Git graphs in this article were generated using the Bit-Booster – Rebase Squash Amend plugin for Bitbucket Server.
