It took much longer than expected, but the latest version of the queen of NoSQL database management systems, Cassandra 4.0, is finally here.
Well, that took a while. Well over a year ago, the Apache Software Foundation announced the beta of Cassandra 4.0. Developers were eager to get their hands on this, the most popular of the NoSQL databases. Alas, it took longer than many had hoped. Then, at the 11th hour, a nasty bug was found, which delayed Cassandra's release for a few more days. But at long last, Apache Cassandra 4.0 is here and ready to tear into your petabytes of data.
If you haven't met this open-source NoSQL database, it's high time you did. Like all NoSQL databases, Cassandra's designed to analyze huge (I opened by saying petabytes, remember?) amounts of semi-structured data. The name of Cassandra's game is storing massive amounts of incoming data, at over a million writes per second, and being able to quickly access this data in a scalable and reliable manner.
Because of that, Cassandra is used as the database of record for some of the world's most critical applications by companies such as Apple, DataStax, Netflix and Yelp. Because it stores vital data for fields ranging from finance to healthcare and everything in between, its data must have the highest guarantees of correctness and quality. So the Cassandra Project Management Committee decided: "The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a state where major users would run it in production when it is cut."
To make that happen, the Cassandra crew custom-built new data correctness tools. These covered:
- Property-based/fuzz testing
- Replay testing
- Upgrade/diff testing
- Performance testing
- Fault injection
- Unit/test coverage expansion
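To give a flavor of the first item on that list, here is a minimal sketch of property-based testing in Python. It is not the Cassandra project's harness: the toy `merge_by_timestamp` function, the `random_replica` generator and the three properties checked are all assumptions made up for illustration. The idea is simply that, instead of hand-picking test cases, you generate lots of random inputs and assert rules that must always hold.

```python
import random


def merge_by_timestamp(a, b):
    """Toy last-write-wins merge of two {key: (timestamp, value)} maps (hypothetical)."""
    merged = dict(a)
    for key, cell in b.items():
        # Keep the cell with the larger (timestamp, value) tuple.
        if key not in merged or cell > merged[key]:
            merged[key] = cell
    return merged


def random_replica(rng, keys=5, max_ts=20):
    """Generate a small random replica state for the merge to chew on."""
    return {
        f"k{rng.randrange(keys)}": (rng.randrange(max_ts), rng.randrange(100))
        for _ in range(rng.randrange(keys + 1))
    }


def check_merge_properties(trials=1000, seed=0):
    """Assert that the merge is commutative, associative and idempotent."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        a, b, c = (random_replica(rng) for _ in range(3))
        assert merge_by_timestamp(a, b) == merge_by_timestamp(b, a)
        assert merge_by_timestamp(merge_by_timestamp(a, b), c) == \
            merge_by_timestamp(a, merge_by_timestamp(b, c))
        assert merge_by_timestamp(a, a) == a
    return trials


if __name__ == "__main__":
    print("trials passed:", check_merge_properties())
```

The fixed seed is a deliberate choice: when a random input does break a property, you want to be able to replay exactly that input.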
This wasn't easy, and it took more time than expected. The developers also ran into numerous hiccups along the way. But now they believe the code is fully baked and ready to be served. Indeed, Cassandra 4.0 is already being used at many major businesses. This was, after all, the idea in the first place.
Looking ahead, Cassandra won't be taking so long with its next release. It may have taken six years to go from Cassandra 3.0 to 4.0, and the 4.0 beta took more than 13 months, but the plan is for Cassandra to move to a six-month release cycle. There will be six months between dot releases, and 12 months between major releases.
Cassandra 4.0 brings many improvements to the table. These start with support for Java 11, the long-term support release, in addition to Java 8. However, Java Development Kit 11 is only supported as an experimental feature, so you should not use it for production.
Cassandra also finally includes Audit Logging. With this, you can set configurable limits to heap memory and disk space to prevent out-of-memory errors. All database activity is logged per node as file-based records to a specified local filesystem directory.
In a related development, Cassandra now supports live full query logging. Once again you can set configurable limits to heap memory and disk space to prevent out-of-memory errors. Besides being helpful for live traffic capture and traffic replay, you can also use it for debugging query traffic and migration.
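Why would a cap on heap memory prevent out-of-memory errors? If queries arrive faster than log records can be written to disk, an unbounded in-memory buffer just keeps growing. A minimal Python sketch of the general idea follows. It is an illustration under stated assumptions, not Cassandra's code: the `BoundedLogQueue` class and its method names are hypothetical, and this toy drops records once the byte budget is used up, where a real system might instead make the caller wait.

```python
from collections import deque


class BoundedLogQueue:
    """Toy in-memory log buffer capped by total bytes (hypothetical, not Cassandra's code)."""

    def __init__(self, max_queue_weight):
        self.max_queue_weight = max_queue_weight  # byte budget for queued records
        self.weight = 0    # bytes currently queued
        self.dropped = 0   # records rejected because the budget was full
        self._records = deque()

    def offer(self, record: bytes) -> bool:
        """Queue a record if it fits in the budget; otherwise drop it."""
        if self.weight + len(record) > self.max_queue_weight:
            self.dropped += 1
            return False
        self._records.append(record)
        self.weight += len(record)
        return True

    def drain(self):
        """Hand all queued records to the writer and free the budget."""
        records = list(self._records)
        self._records.clear()
        self.weight = 0
        return records


if __name__ == "__main__":
    queue = BoundedLogQueue(max_queue_weight=16)
    for statement in (b"SELECT 1", b"SELECT 2", b"SELECT 3"):
        queue.offer(statement)
    print("queued bytes:", queue.weight, "dropped:", queue.dropped)
```

A disk-space limit works along the same lines, except that the budget counts bytes in log files on disk rather than bytes waiting in memory.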
Cassandra's new Zero Copy streaming enables up to five times faster data streaming between nodes in a cluster. For users in the real world, that means a mean time to recovery that can be up to five times shorter when there are problems. This, in turn, will reduce your total cost of ownership because you'll need fewer cloud, server and network resources.
Finally, Cassandra's programmers promise that it will be the most stable version of the program ever. I'm inclined to believe them. They took a lot of time and trouble to improve not just Cassandra's performance but its stability as well. We'll soon see if my faith in them has been justified. With the kinds of loads Cassandra deals with every day for every one of its customers, there's no place to hide problems.