Time series databases are on the rise, with TimescaleDB of particular interest to developers.
Just a few years ago, time series databases were somewhat niche in nature. Sure, if you were running a trading application within a financial services firm, you were devoted to your kdb+ (proprietary) database, but for most everyone else a general-purpose relational or NoSQL database was de rigueur. No more. The reason? The world increasingly demands that enterprises be able to query, analyze, and report on streaming data in real-time, not batch mode.
Over the last two years, time series databases like TimescaleDB and InfluxDB have exploded in popularity, according to DB-Engines data, with AWS also jumping into the market with its Amazon Timestream database in late 2018. It's now an open question whether all databases will begin to look like time series databases and whether, in this way, "niche" becomes mainstream, with databases like TimescaleDB, InfluxDB, and Amazon Timestream becoming the MySQLs and PostgreSQLs of the future.
Getting somewhere fast
Over the last two years, no database type has grown faster in popularity than time series databases, according to DB-Engines data.
While that data tracks relative growth in popularity (relational databases like MySQL and document databases like MongoDB, for example, are already well established), it's still indicative of something important happening in the industry. Time series databases help us make sense of changes in the world over time. More thoughtfully, as Timescale CEO Ajay Kulkarni put it:
[T]ime-series datasets track changes to the overall system as INSERTs, not UPDATEs.
This practice of recording each and every change to the system as a new, different row is what makes time-series data so powerful. It allows us to measure change: analyze how something changed in the past, monitor how something is changing in the present, predict how it may change in the future.
[So] here's how I like to define time-series data: data that collectively represents how a system/process/behavior changes over time.
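To make the INSERT-versus-UPDATE distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than a real time series database. The table names (current_state, readings), the sensor ID, and the sample values are all hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; both tables are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_state (sensor TEXT PRIMARY KEY, temperature REAL)")
conn.execute("CREATE TABLE readings (ts INTEGER, sensor TEXT, temperature REAL)")

# Made-up samples: (timestamp, sensor, temperature)
samples = [(1, "s1", 20.0), (2, "s1", 21.5), (3, "s1", 19.0)]

for ts, sensor, temp in samples:
    # UPDATE-style modeling: each new value overwrites the previous one.
    conn.execute("INSERT OR REPLACE INTO current_state VALUES (?, ?)", (sensor, temp))
    # Time-series modeling: each change is recorded as a new row.
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (ts, sensor, temp))

state_rows = conn.execute("SELECT COUNT(*) FROM current_state").fetchone()[0]
history_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

# Only the append-only table can answer "how much did it change over time?"
change = conn.execute(
    "SELECT MAX(temperature) - MIN(temperature) FROM readings WHERE sensor = 's1'"
).fetchone()[0]

print(state_rows, history_rows, change)
```

The overwritten table ends up holding only the latest value, while the append-only table keeps every reading, which is what makes questions about past change answerable at all.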
This sounds suspiciously like what all databases are supposed to do, yet these old-school databases lack the ability to efficiently store and give access to high volumes of data. Relational databases and NoSQL databases can be used for time series data, but developers will arguably get better performance from purpose-built time series databases than from applying a one-size-fits-all database to specific workloads. As AWS' Shawn Bice once explained to me, developers want the right tools for the right job, even if that means using multiple tools to get a multi-faceted job done.
But what if you could have the comfort of a known database and the performance of a purpose-built time series database?
That's what the Timescale team is doing with TimescaleDB, explained company founders Ajay Kulkarni and Michael Freedman in an interview this week. Similar to how MongoDB started out as a PaaS but eventually settled on the database portion of its PaaS, Timescale started as an effort to deliver an IoT platform. The company tried to use InfluxDB, MongoDB, and other existing database systems, but ultimately opted to build its own.
More precisely, TimescaleDB is not a from-scratch database but an extension, or overlay, of the popular PostgreSQL database. Why does this matter? First, they explained, it gives them a rock-solid foundation upon which to build. More than this, however, it also gives companies the comfort of the ecosystem of PostgreSQL tooling, as Freedman told The Next Platform's Timothy Prickett Morgan:
We don't muck around with how the data is stored on disk, and therefore we inherit all of the reliability of PostgreSQL. We also enforce the same PostgreSQL interface, so all of the tooling for this database works with TimescaleDB. The part in the middle is that we have figured out how to scale PostgreSQL for time series data, and we are 20X faster at inserts than PostgreSQL. And we are 10X faster than Cassandra, and unlike Cassandra, we also support full SQL.
The result is all your PostgreSQL goodness, but with added performance for time series data (e.g., fast ingest). A developer gets to leverage her SQL experience and query in SQL natively. And because the Timescale team has built on top of PostgreSQL as an overlay (or extension, if you will), its development track runs independently from the main PostgreSQL database. It's the best of both worlds for customers and for the company.
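For a sense of what "an extension of PostgreSQL" might look like in practice, the sketch below builds (but does not run) the kind of SQL a developer could send to a PostgreSQL server with TimescaleDB installed. The conditions table and its columns are hypothetical; create_hypertable and time_bucket are functions provided by the TimescaleDB extension, and the statements are assembled as plain strings here so the example needs no database connection.

```python
from typing import List

TABLE = "conditions"  # hypothetical table name, for illustration only


def setup_statements(table: str = TABLE) -> List[str]:
    """SQL to enable the extension and turn a regular table into a hypertable."""
    return [
        # Enable TimescaleDB inside an existing PostgreSQL database.
        "CREATE EXTENSION IF NOT EXISTS timescaledb;",
        # An ordinary PostgreSQL table; the columns are assumptions.
        f"CREATE TABLE {table} ("
        "time TIMESTAMPTZ NOT NULL, "
        "device_id TEXT NOT NULL, "
        "temperature DOUBLE PRECISION);",
        # Ask TimescaleDB to partition the table by its time column.
        f"SELECT create_hypertable('{table}', 'time');",
    ]


def rollup_query(table: str = TABLE, bucket: str = "5 minutes") -> str:
    """Plain SQL that averages readings per device in fixed time buckets."""
    return (
        f"SELECT time_bucket('{bucket}', time) AS bucket, device_id, "
        f"avg(temperature) AS avg_temp "
        f"FROM {table} GROUP BY bucket, device_id ORDER BY bucket;"
    )


statements = setup_statements()
query = rollup_query()

for sql in statements + [query]:
    print(sql)
```

Everything after the setup step is ordinary SQL against an ordinary-looking table, which is the point of the overlay approach: existing PostgreSQL drivers and tools should be able to issue these statements unchanged.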
It's an interesting approach to an increasingly interesting type of database. As the world continues its march toward real-time, time series databases will continue to grow in popularity. The real question is whether there are natural boundaries to their utility. According to Kulkarni, the answer is an emphatic "No": "All data is time-series data."