Commentary: Enterprise convenience, more than any license, seems to govern database popularity.
There's never been a better (or worse) time to be looking for a database. "Better" because you're spoiled for choice, with DB-Engines listing 371 different databases. "Worse" because, well, you're spoiled for choice. It's hard to know which database model to use (document? relational? graph? other?), which databases others are using, and which databases they love to use (actually, according to Stack Overflow's 2020 survey, that would be Redis, PostgreSQL, MongoDB...).
That choice isn't made any easier by the fact that no clear signals emerge from the DB-Engines data. Well, except perhaps for the sheer pragmatism of enterprises when it comes to their data. As RedMonk analyst Steve O'Grady pointed out in a Twitter conversation, while it used to be critical for data infrastructure to be open source to succeed, that "rule" no longer seems to apply.
A database popularity contest
But first, what are the most popular databases, based on DB-Engines' multi-faceted ranking (Figure A)?
That's today, but when DB-Engines first started tracking database popularity in 2012, the numbers were very different, even though many of the same databases graced the top 10 (Figure B).
Keep in mind that DB-Engines doesn't track absolute database popularity--it tracks relative popularity. Here's how they describe the methodology:
We calculate the popularity value of a system by standardizing and averaging of the individual parameters. These mathematical transformations are made in a way so that the distance of the individual systems is preserved. That means, when system A has twice as large a value in the DB-Engines Ranking as system B, then it is twice as popular when averaged over the individual evaluation criteria. In order to eliminate effects caused by changing quantities of the data sources themselves, the popularity score is always a relative value, which should be interpreted in comparison with other systems only.
If I'm understanding their approach correctly, this means, for example, that Oracle was 22X more popular than a document database like MongoDB in 2012 but is now just 2.6X more popular (with "popular" measured by jobs, professional qualifications mentioned on LinkedIn, online discussions in places like Stack Overflow, etc.). That is incredible growth for MongoDB in less than a decade, particularly given how long Oracle dominated the database market, and how slow enterprises tend to be to swap out their databases.
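To make the "relative, not absolute" reading concrete, here is a minimal sketch of the ratio arithmetic. The scores are hypothetical placeholders chosen only to reproduce the 22X and 2.6X figures quoted above. They are not actual DB-Engines values, and the helper name `relative_popularity` is invented for illustration.

```python
def relative_popularity(score_a: float, score_b: float) -> float:
    """Return how many times more popular system A is than system B.

    Only the ratio of two scores is meaningful here; per the DB-Engines
    methodology, a score should be read in comparison with other systems.
    """
    if score_b <= 0:
        raise ValueError("score_b must be positive")
    return score_a / score_b


# Hypothetical scores (NOT real DB-Engines data), picked so the ratios
# match the 22X and 2.6X figures mentioned in the text.
scores_2012 = {"Oracle": 1100.0, "MongoDB": 50.0}
scores_now = {"Oracle": 1300.0, "MongoDB": 500.0}

ratio_2012 = relative_popularity(scores_2012["Oracle"], scores_2012["MongoDB"])
ratio_now = relative_popularity(scores_now["Oracle"], scores_now["MongoDB"])

print(f"2012: Oracle is {ratio_2012:.1f}X MongoDB")
print(f"Now:  Oracle is {ratio_now:.1f}X MongoDB")
```

Note that in this made-up example both scores rise, yet the gap narrows. That is why the ranking is best read as a comparison between systems rather than as a count of users.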
But then, this is just part of a larger trend away from proprietary databases toward open and/or shared source databases (Figure C).
As an open source guy, I like that trend. Yet there are some clear counterexamples in the up/down movement on the popularity rankings.
The first few movers confirm the open source hypothesis, with Redis moving up to #7 from #8. That doesn't seem like much, but for database systems in the top-10 rankings, even one spot is a big deal. By that same token, Elasticsearch bumping down one spot is also significant (though, of course, it remains very popular). The same holds true of Apache Cassandra, which dropped out of the top 10, replaced by Microsoft Access. Microsoft Access has been falling down the popularity chart for nearly a decade, yet it remains an easy option for the Excel crowd that needs to manage more data than is easily contained within Excel.
That's the top 10. What were the big movers in the top 25?
The biggest move came from Microsoft Azure SQL Database, a proprietary cloud database that jumped eight places to #15 (up from #23 in 2020). Neo4j, an open source graph database, also made a big move up the chart, climbing four places to #18 (up from #22). And, finally, proprietary SAP Adaptive Server moved from #22 to #17. Seeing a pattern in what's hot and what's not? I don't.
Though significant, none of these shifts is as pronounced as the ranking gyrations outside DB-Engines' top 25. Some databases between #25 and #100 fell sharply, but the databases on the rise are perhaps more interesting:
Snowflake rocketed from #101 in 2020 to #26 today, the single biggest jump of any database/data platform this past year (and the largest I've seen since I started following the DB-Engines rankings in 2013).
Prometheus went from #72 to #61.
CockroachDB jumped from #74 to #58.
TimescaleDB climbed from #110 to #93.
Apache Druid soared from #119 to #100.
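For a sense of scale, the rank changes listed above can be tallied in a few lines. This is a sketch that uses only the before and after positions reported in this article. The helper name `places_gained` is invented for illustration.

```python
# (2020 rank, current rank) pairs, taken from the list above.
ranks = {
    "Snowflake": (101, 26),
    "Prometheus": (72, 61),
    "CockroachDB": (74, 58),
    "TimescaleDB": (110, 93),
    "Apache Druid": (119, 100),
}


def places_gained(before: int, after: int) -> int:
    """Positive when a database moved up (toward #1)."""
    return before - after


# Sort the movers from largest to smallest climb.
movers = sorted(
    ((name, places_gained(before, after)) for name, (before, after) in ranks.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, gained in movers:
    print(f"{name}: up {gained} places")
```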
Of these, Prometheus and Druid are open source, TimescaleDB and CockroachDB are shared source/source available, and Snowflake is proprietary. What conclusions can we draw from these, as well as other rankings (up, down, static)? First, there doesn't seem to be a magic recipe for database popularity. With my open source hat on, I'd love to say that "open source is eating the database world" (oh, wait–I have), but some of the biggest upward shifts come from databases with licenses that aren't strictly open source, or that are offered as a service without the ability for a developer to download them at all.
In fact, that might be the one big lesson from the popularity rankings: Whatever enterprises may say about their preferences (most will tout open source, said O'Grady), their purchasing decisions are governed by convenience and productivity. Whatever helps them move faster and deliver more is what they'll choose. For example, part of the reason for Redis' rise is, of course, the fact that it's a fantastic database. But just as important is the fact that a variety of vendors compete to make Redis easy for enterprises to use. Take away the managed services, and the popularity of Redis, PostgreSQL, and other open source (and proprietary) databases would plummet.
Which raises another question for another post: What are customers signaling when they choose to use a fully managed open source database? Does the license still matter? O'Grady seems to suggest "no" but, again, that's another post.
Disclosure: I work for AWS, but the views expressed herein are mine.