With so many security and developer teams running postmortems on the Log4j vulnerability fiasco that erupted in mid-December 2021, barely two weeks before Christmas, the main question is: how do we avoid this kind of pain in the future? The answer, unfortunately, is … it’s complicated.
According to new data from (ISC)2, the world’s largest nonprofit association of certified cybersecurity professionals, nearly half (48%) of cybersecurity teams gave up holiday time and weekends to help with remediation, and 52% of teams spent “weeks or more” remediating Log4j. Not exactly how already-stretched teams wanted to spend the holidays.
On the upside, the pain of that experience has triggered a major rethink of software supply-chain security among developers and security teams.
Fixing vulnerabilities without breaking legacy code
One of the most troublesome aspects of Log4j was not the vulnerability itself, but how deeply embedded the technology is in legacy code. After all, Java is one of the world’s most popular platforms, and Log4j is an incredibly popular Java logging library whose initial release dates all the way back to 2001. So Log4j touches not only a ton of production systems but also a lot of legacy code.
“Nobody wants to touch legacy code,” said Sergei Egorov, CEO of AtomicJar, the new startup behind the open source integration testing framework, Testcontainers. “You don’t just need to fix a security vulnerability, you need to know that you fixed the vulnerability without breaking your system. When you have a vulnerability with a super popular logging library like Log4j, you are talking about dependencies on legacy code that typically lacks any testing, and where sometimes the developers who wrote the code and understand how it works don’t even work at the company anymore.”
According to Egorov, Log4j is often a transitive dependency: it is pulled in by other libraries, and those libraries are what need to be updated. Unlike platforms that ship statically linked binaries, Java links code at runtime, so there’s no way to be 100% confident that an application will behave correctly until you actually run it and exercise the linkage between the application and its dependencies.
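Because Log4j usually arrives transitively, the first practical question is which direct dependency drags it in, since that is the library you have to upgrade or override. The following minimal Python 3.9+ sketch walks a dependency graph breadth-first and reports every path from an application to `log4j-core`. It assumes the graph has already been exported from the build tool into a plain dictionary; the artifact names (`com.example:shop-app`, `payments-client`) and the `paths_to` function are hypothetical, not part of any real tool.

```python
from collections import deque

# Hypothetical resolved dependency graph format: each key is an artifact
# ("group:artifact:version") and each value lists its direct dependencies.
DependencyGraph = dict[str, list[str]]


def paths_to(graph: DependencyGraph, root: str, target_prefix: str) -> list[list[str]]:
    """Return every path from `root` to an artifact whose coordinates
    start with `target_prefix`, searching breadth-first and skipping cycles."""
    found: list[list[str]] = []
    queue: deque[list[str]] = deque([[root]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != root and node.startswith(target_prefix):
            found.append(path)
            continue  # stop at the target; its own dependencies don't matter here
        for child in graph.get(node, []):
            if child not in path:  # avoid looping forever on cyclic graphs
                queue.append(path + [child])
    return found


if __name__ == "__main__":
    # Hypothetical example: the app never declares Log4j directly.
    graph: DependencyGraph = {
        "com.example:shop-app:1.0": [
            "com.example:payments-client:3.2",
            "org.slf4j:slf4j-api:1.7.32",
        ],
        "com.example:payments-client:3.2": [
            "org.apache.logging.log4j:log4j-core:2.14.1",
        ],
    }
    for path in paths_to(graph, "com.example:shop-app:1.0",
                         "org.apache.logging.log4j:log4j-core:"):
        print(" -> ".join(path))
```

The second element of each path is the direct dependency to upgrade. Enumerating every path is fine for a sketch, but the number of paths can grow quickly on large graphs.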
Egorov said Log4j has accelerated interest in the already popular Testcontainers platform as a way to test these interactions ahead of production deployment. He sees developers who were stung by Log4j now creating integration tests between their systems and external dependencies, so that the next time a security vulnerability arrives, they can verify that their fixes won’t take down production systems when deployed. Testcontainers is increasingly paired with Snyk: Snyk opens automated pull requests with security fixes, and integration tests give developers the confidence to merge them without breaking anything.
Which is worse … the vulnerability or disrupting users?
The arrival of the Log4j vulnerability at the peak of the holiday season forced a perverse choice on developer teams: deploy fixes now and risk taking down systems during the busiest e-commerce stretch of the year, or punt the fixes to a less commercially risky window.
It’s a decision that is impossible to make if you don’t have the right context.
“Log4j caused many engineering teams to panic because they had no way of predicting how fixing it would affect their users,” said Marcin Kurc, CEO at software reliability startup Nobl9, whose customers include large e-retailers. “There is a cost-benefit analysis that needs to take place on any security remediation. You have to determine the right interval to deploy the fix, which requires a complete understanding of the risk in terms of how many users it could affect, and the acceptable level of unreliability you can accept.”
Kurc’s company champions service level objectives (SLOs), a practice that grew out of Google’s site reliability engineering (SRE) discipline and that Nobl9 has commercialized into a platform.
SLOs allow developers to model uptime and success rates across software interactions and to define thresholds for user outcomes: for example, the percentage of shopping cart checkouts that must complete correctly. Companies that model SLOs, Kurc says, can have a real conversation about the risk of patching versus not patching.
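To make that conversation concrete, here is a minimal Python sketch of a request-based SLO for checkouts, assuming a 99.9% success target. The `CheckoutSLO` class and its methods are hypothetical, not Nobl9’s or Google’s API, and the number of extra failures a patch rollout might cause is an estimate you would have to supply yourself. For simplicity, the error budget is computed over the events seen so far rather than over the full SLO window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckoutSLO:
    """Hypothetical request-based SLO: at least `target` of checkouts must succeed."""
    target: float       # e.g. 0.999 means 99.9% of checkouts must succeed
    total_events: int   # checkouts attempted in the window so far
    bad_events: int     # checkouts that failed in the same window

    def budget(self) -> float:
        """Total failed checkouts the SLO tolerates for the events seen so far."""
        return (1.0 - self.target) * self.total_events

    def remaining(self) -> float:
        """Failures still allowed before the SLO is breached (negative = breached)."""
        return self.budget() - self.bad_events

    def can_absorb(self, expected_extra_failures: float) -> bool:
        """True if a change expected to cause this many extra failures
        would still leave the SLO intact."""
        return expected_extra_failures <= self.remaining()


if __name__ == "__main__":
    slo = CheckoutSLO(target=0.999, total_events=1_000_000, bad_events=400)
    print(f"budget={slo.budget():.0f} remaining={slo.remaining():.0f}")
    print("patch now if it risks 250 failures?", slo.can_absorb(250))
    print("patch now if it risks 800 failures?", slo.can_absorb(800))
```

With 1,000,000 checkouts and 400 failures, the budget is 1,000 failed checkouts and 600 remain, so a rollout expected to cause 250 extra failures fits, while one expected to cause 800 does not.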
Such solutions, however, come after the fact, once the open source software has already been written. What can be done to make that software more secure in the first place?
A broader problem: who owns security for open source?
No one is going to stop using open source. It would be absurd to build a logging solution from scratch, rather than reaching for tools like Log4j. Developers are writing less logic and integrating more frameworks, libraries and APIs these days, and that is not going to change.
As Google’s Filippo Valsorda wrote in a viral post, many of these open source maintainers “fall in one of two categories: volunteers or big company employees. Sometimes both. Neither model is healthy.”
Log4j illuminated the fact that so much of the modern software supply chain is propped up by open source projects with a handful of maintainers, or even a single maintainer, who often created the technology as a side project. (And let’s be clear: recent data suggests that the vast majority of open source projects are written by fewer than 10 people.)
“Modern applications are built from many components, many of which are not developed in-house but are rather assembled using ‘off the shelf’ solutions,” according to John France, CISO at (ISC)2. “A large part of qualifying and identifying issues is knowing what components and libraries are used by your software and keeping a software bill of materials (SBOM).”
But according to one anonymous security practitioner in (ISC)2’s Log4j remediation poll, “Developers in general have been very lax about tracking what they use in their software. When an event like this requires us to identify whether some library or component is used by our code, that lack of traceability becomes a major pain point. It turns a simple exercise of checking inventories and SBOMs into a complex scanning process, with many opportunities for false positives and false negatives. If we ever needed a wake-up call, we got a big one with Log4j.”
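When an SBOM does exist, that “simple exercise of checking inventories” really can be a few lines of code. The following minimal Python 3.9+ sketch reads a CycloneDX JSON SBOM and flags `log4j-core` 2.x components older than 2.17.1, the release that fixed CVE-2021-44832, the last of the December 2021 Log4j CVEs. It uses only the `components`, `group`, `name`, `version` and `purl` fields and recurses into nested components (for example, a shaded JAR). The version threshold is a simplifying assumption: backport releases for older Java versions, such as 2.12.4 and 2.3.2, are not modeled and would be flagged too. The sample SBOM and the function names are hypothetical.

```python
import json

# Simplifying assumption: flag log4j-core 2.x releases older than 2.17.1.
# Backport lines for older Java (2.12.4, 2.3.2) are not modeled and get flagged.
FIXED_VERSION = (2, 17, 1)


def parse_version(version: str) -> tuple[int, ...]:
    """Convert '2.14.1' to (2, 14, 1); suffixes such as '-beta9' are dropped."""
    parts: list[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def iter_components(components: list[dict]):
    """Yield every component, including ones nested inside other components."""
    for comp in components:
        yield comp
        yield from iter_components(comp.get("components", []))


def find_vulnerable_log4j(sbom: dict) -> list[str]:
    """Return a reference (purl if present) for each log4j-core 2.x
    component below FIXED_VERSION in a CycloneDX JSON SBOM."""
    hits: list[str] = []
    for comp in iter_components(sbom.get("components", [])):
        if comp.get("group") != "org.apache.logging.log4j" or comp.get("name") != "log4j-core":
            continue
        version = parse_version(comp.get("version", ""))
        if version and version[0] == 2 and version < FIXED_VERSION:
            hits.append(comp.get("purl") or f"log4j-core@{comp.get('version')}")
    return hits


# Hypothetical SBOM: log4j-core arrives inside a third-party component.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {
      "type": "library",
      "group": "com.example",
      "name": "payments-client",
      "version": "3.2",
      "components": [
        {
          "type": "library",
          "group": "org.apache.logging.log4j",
          "name": "log4j-core",
          "version": "2.14.1",
          "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"
        }
      ]
    },
    {
      "type": "library",
      "group": "org.apache.logging.log4j",
      "name": "log4j-api",
      "version": "2.17.1"
    }
  ]
}
"""

if __name__ == "__main__":
    for ref in find_vulnerable_log4j(json.loads(SAMPLE_SBOM)):
        print("needs upgrade:", ref)
```

Without an SBOM, the same question has to be answered by scanning binaries and file systems, which is where the false positives and false negatives the practitioner describes creep in.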
Google and other heavyweights are throwing muscle behind open source security, and time will tell whether deeper investment and vendor collaboration can reduce the frequency and pain of incidents like Log4j. In the meantime, developers are devising strategies to avoid terrible security surprises next holiday season.
Disclosure: I work for MongoDB, but these views are mine alone.