Why the Apache Lucene and Solr “divorce” is better for developers and users

Commentary: A decade ago Apache Lucene and Apache Solr merged to improve both projects. The projects recently split for the same reason, which is a really good thing for users of search services.

Written By

Matt Asay

Jul 8, 2020

It’s very possible that you rely on Apache Lucene and Apache Solr every day, whether you’re looking for jobs on LinkedIn, trying to find that “bird-carries-shark” video on Twitter, or looking up random facts on Wikipedia. It’s also very possible that you have no clue how Lucene/Solr work, or how they’re developed. As such, you can be forgiven for not noticing that a few weeks back the Lucene/Solr community voted to split, spinning Solr out from under Lucene and reversing the merger of the two a decade earlier (which you also likely missed).

And yet the designation of Solr as a top-level Apache Software Foundation project matters, and not just for the developers who contribute to one or the other (or both). While disentangling the two projects (build infrastructure, source code, etc.) will take time, users will benefit. Here’s how.

Making life easier for the kingmakers

Most people reading this have little familiarity with Lucene, Solr, or Elasticsearch (a distributed search application that relies on Lucene), yet we use them every day. Lucene is a full-text search engine library, whereas Solr is a full-text search engine web application built on Lucene. One way to think about Lucene and Solr is as a car and its engine: the engine is Lucene; the car is Solr. A wide array of companies (Ford, Salesforce, etc.) use Solr to provide search on their websites without having to build their own application around the Lucene library. Others want to fiddle with Lucene’s dials and knobs directly and don’t rely on Solr.

Regardless, the two projects have been tightly bound since 2010, when the Lucene and Solr project management committees (PMCs) voted to merge them because “there was a lot of code duplication and interaction between Solr and Lucene back then,” as Dawid Weiss explained. Over time, keeping the two together has become a burden. Solr depends on Lucene, but Lucene doesn’t depend on Solr, and tying Lucene to Solr has, among other things, made it harder to evolve Lucene’s code at the pace many of its developers would like.

The two projects have continued to attract healthy, largely independent development communities, with new feature work happening in one or the other, not both. This divergence isn’t complete, of course. As Mike Sokolov noted, “A substantial number of people commit to both, over time, although most people do not. Also, relatively few commits span both projects. Some do though, and it’s certainly worth considering what the workflow for such changes would be like in the split world.” Still, keeping the two joined at the hip, which once made sense as a way to retire some technical debt, no longer does.

None of this would matter to the average LinkedIn user, except that the separation promises to improve developer productivity for both Lucene and Solr. If developers are the new kingmakers, as analyst firm RedMonk is wont to say, then making developers as productive as possible matters a great deal. So how does this split promise to help developers?

First, the split will make development for the respective projects more nimble. According to Weiss:

Precommit/ test times. These are crazy high. If we split into two projects we can pretty much cut all of Lucene testing out of Solr (and likewise), making development a bit more fun again.

Build system itself and source release packaging. The current combined codebase is a *beast* to maintain. Working with gradle on both projects at once made me realise how little the two have in common. The code layout, the dependencies, even the workflow of people working on these projects… The build (both ant and gradle) is full of Solr and Lucene-specific exceptions and hooks that could be more elegantly solved if moved to each project independently.

Second, separating the two lets each project’s developers focus on making Lucene (or Solr) as great as possible; for example, a developer who makes API changes to Lucene will no longer need to make corresponding changes to Solr at the same time. This, in turn, allows both projects to release according to feature readiness rather than waiting on each other. Because Lucene’s feature development tends to move quickly, that should mean faster releases and faster improvements to the search services users depend on.

This split won’t make the front page of The New York Times, unfortunately. Even so, people searching for articles on the Times’ site will benefit, because the Times relies on Lucene-powered Elasticsearch for that search functionality.

Disclosure: I work at AWS, but this article reflects my views, not those of my employer.

Matt Asay

Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.