Data Management

It's time to consider alternatives to RDBMS

Justin James says applications are far too complex to force into the inherent limits of the relational database management systems (RDBMS) model. He encourages developers to explore their options.

 

I have been amazed at the explosive growth that we are seeing on top of relational databases in terms of the layers that developers add to them. For at least the last 10 years, it has been the norm for applications to use direct database calls, possibly parameterized, to access the database. Advanced developers might store the application's SQL statements externally from the source code, perhaps in a text file or even a configuration table in the database, to give some separation between the two. Then we saw the rise of the data layers; then the use of Web services to loosely couple the data layers; and now many developers add an object-role modeling (ORM) system or similar abstraction on top of the database just to make the data layer.

For at least the last 20 years, developers have been trying to figure out how to make the relational database work better for our needs. When do we start trying something entirely different?

I understand the history associated with relational database management systems (RDBMS); these systems are fast, ACID-compliant, standardized (for the most part), and well understood. A developer can code an application against a vendor's RDBMS and switch to another one without major heartache in most cases. Indeed, there are piles of systems out there that abstract away the vendor differences to the point where the developer is not aware of them.

Another consideration is the RDBMS alternatives (such as object-oriented database management system or OODBMS) have had a lot of issues in the past. The alternatives were slow, completely proprietary, vendor specific, lacked ACID-compliance, and were a total mystery to most developers. This is not a formula for success. Folks who experimented with OODBMSs tended to get burned. The closest that OODBMSs got to mainstream was Smalltalk, which effectively was an OODBMS, amongst other things.

My frustration level with the RDBMS model has been steadily increasing; RDBMS simply does not represent the business model of most applications very well. The industry likes to call this "impedance mismatch," but I call it a pain in my neck.

Why am I so annoyed with RDBMS?

The applications that I work on that have data storage requirements (I'm tasked with programming and supporting these apps) have lately had very irregular data needs. Here's a good example: Take a look at the database for any application that allows users to create their own types of content (not just their own content). (SharePoint and Team Foundation Server are good instances of this, and they illustrate the pattern well.) At the database level, they are architected as best as they can be, given their requirements. All the same, you see a real mess in the storage systems; there are piles of columns whose purpose to the application is not determined by the database structure itself but rather a cross reference of each row to a "content type" table somewhere. This means that a developer looking to extend, maintain, or otherwise use the database is forced to recreate significant amounts of logic. And woe to the system administrator who is trying to solve a problem with the application.

Developer tool creators started to really notice these issues in the last few years, and they have released a flurry of products and components designed to provide some relief. Most of these ideas build on the existing RDBMS infrastructure, which makes sense. Customers have an existing investment in those systems; the systems are understood, tested, and known entities; and the systems are already in place.

In addition, there is the bizarrely illogical (yet amusingly "human") tendency for people to chase "sunk costs" in an effort to recover them (also known as "throwing good money after bad"). In this case, programmers would rather add layers of complexity to their applications (often at a significant cost in time and sometimes tools), which makes maintenance and debugging more difficult, to compensate for the "impedance mismatch" between their applications and their databases. Systems in the ADO.NET Entities Framework Hibernate and so on are good examples of what I mean. I've looked at the ADO.NET Entities Framework, and I have no plan of touching it unless my application's database diagram can cover my kitchen table; it's just too complex and heavy to bother with unless my application is already complex and heavy. Some systems I've seen (such as OutSystems' Agile Platform) hide the complexity very well, but it is still there.

Meanwhile, the OODBMS market has made a lot of strides. At the suggestion of a friend, I checked out db4O. While I haven't used it yet, my friend (whose opinion I value and trust) said some really nice things about it. I plan to take a look at db40 in the near future. Many OODBMSs are now ACID-compliant. Looking through the documentation and examples of a few of them, not only are they less work to interface with than an RDBMS, they are less effort than dealing with the ORM tools and similar systems. And while the raw data access speeds (particularly reads) are probably slower than an RDBMS, once you account for the various data access layers in a typical application plus the cost of converting database data to language-native objects, the OODBMS will be faster. (Again, I haven't had the chance to use an OODBMS yet.)

I plan to provide more coverage about this topic in the future, though I suggest that you start to investigate the market. Most developers I know recognize that we've been bandaging this issue for a long time, but they do not realize that there are options. There are choices, and I think it's time that we started exercising them. Our applications are far too complex to keep trying to force them into the inherent limits of the RDBMS model.

J.Ja

Disclosure of Justin's industry affiliations: Justin James has a working arrangement with Microsoft to write an article for MSDN Magazine. He also has a contract with Spiceworks to write product buying guides.

---------------------------------------------------------------------------------------

Get weekly development tips in your inbox Keep your developer skills sharp by signing up for TechRepublic's free Web Developer newsletter, delivered each Tuesday. Automatically subscribe today!

About

Justin James is the Lead Architect for Conigent.

31 comments
JulesLt
JulesLt

Worth taking a look at The Third Manifesto - Date and Darwen's revisit of the idea of relational databases. Having seen Darwen (one of the inventors of SQL at IBM) present on this, his argument is that a lot of the impedance/mismatch between 'relational' and 'OO' is down to implementation, not the relational model. For instance, the relational model doesn't really say anything about tables. It's only relatively recently that RDBMS have allowed updates through Views - or projections - yet imagine if there was no difference between sub-classing and views. But he reserves his biggest criticism for SQL - even though he had a hand in developing it, his point is that it was the first attempt at a relational language - and compromised by what was possible at the time (mid-70s). Contrast this with something like Smalltalk, where the language was years ahead of the hardware and runtime - the focus with SQL and early RDBMS was on creating something usable, not theoretical. On the other hand, many critics of RDBMS seem to propose ideas that smell very much like the heirachical database systems they replaced (and which I recall being a far bigger pain to work with). I do think there is something interesting in the Smalltalk idea of just persisting an image of the current machine state, rather than having a separate 'data store'. There are a couple of systems out there revisiting this idea - Etoile is one - looking at replacing files and databases with OS level objects, automatically persisted by the language runtime. I do have a concern that people are turning against RDBMS as unthinkingly as they turned to them, or have embraced OO programming as 'correct' - particularly developers who've never had to deal with genuinely complex data sets (quick look - there are dead on 1000 tables in the app I'm working on right now). And I have seen for myself the problems caused when an OO developer decided to persist objects as individual XML files onto the filesystem to 'save the overhead of going to the database' - only to find that file access speed when you have tens of thousands of files in an externally mounted directory starts to get very slow. That's not saying direct file storage, or XML are bad, just that decision making should be informed, rather than opinion based. Oh - and one more thing - I think that the whole JDBC/ODBC approach to RDBMS integration hasn't helped - the SQL-J or LINQ approaches seem far better to me - trap the errors at compile time, not runtime, rather than treating everything as strings. Even if it is just syntactic sugar / pre-compiled back into ODBC calls.

verelse
verelse

...who point and click their way to big ole honkin apps that use too much space, too much bandwidth, are unscalable, closed, proprietary. These apps are impossible to maintain and get tossed every few years as each new iteration of the latest programming fad appears. Meanwhile, RDBMS and the programmers who *code* against them continue their patterns of success, scalability, maintainability, and interoperable openness. SharePoint is not for programming, it is for pointing and clicking. No offense, but if you do not want to code, get a new career.

TexasJetter
TexasJetter

I looked at db4O, it looks interesting. I have a project that would benefit from customizable objects, but I wonder about the long term availability of these "off the wall" storage systems. To put it in perspective, I have an app that was started in VB3 using MS Access. While the app is now distributed using VB6, it still uses Access for data storage. So while the app has been tweaked with features over the years, the base storage has stayed the same (for better or worse). This means I have been able to use basically the same storage system for over 8 years. Prior to me the app was written in Quicksilver/Pascal/C++ (different modules were in different languages). Of course this was a DOS world back then. The data for most modules was in DB2 (I think it was 2). If I really had to I can import from the DB2 structure into Access, so data going back 10+ years is still accessible. Now if I go with custom data stored in a proprietary engine to access it, what would happen if they go belly up and no longer write their engine for a new OS? If I completely build my app around a custom storage system I'm hosed. I don't know, what am I missing?

WhatTheThunk
WhatTheThunk

"Advanced developers might store the application?s SQL statements externally from the source code, perhaps in a text file or even a configuration table in the database." Ever heard of stored procedures? If not, you must be using MySQL v4x or lower. Store procedures is the only way to go.

wallace.wood
wallace.wood

I have seen InterTystems Cache ODBMS and it is impressive. Their Ensemble product is a tool for tyng things togethor from disparate sources. There is also a group, the ODBMS.org that has some very interesting stuff.

don.gulledge
don.gulledge

I've been studying SharePoint to some extent and find that it has some very new and capable ways of dealing with mixed data types. However, the one thing is lacks is the relational agility even when it's built on top of SQL. You can't create two list, and relate the one list to the other unless you use the lookup field type that isn't really a solid relation, but just a fancy way to make a popdown populate. I think if MS would have implemented relationships between list so you could have a Master-Detail or Parent-child-child situation it would be a screamer product. But, since they've left it out, the end developer has to build it in C# or C++ and if you do that, you might as well be using Java. Even with Infopath it avoids providing it either. I think that your asertion that we move past relational data system into something else is a false premise since everything in the universe is relational at some level. We can move past it, but only if it's included as a basic capability and we can still use it as needed.

Tony Hopkinson
Tony Hopkinson

There just aren't that many applications I've worked on that don't need relational functionality. The ideal is for a completely self contained object (ie something we'd just stream straight into a blob) in a relational database. Great hoorah it works... Now for the real world ObjectA and ObjectB not necessarily the same type link to ObjectC. When A or B or opened the user wants to see the detail, and amending either one wants that to appear in all places. Give it a new name if you want, make a suspicious bump under the rug if you want, it's a relation. Get to it could be an ObjectC1 or C2 a list of either, or a D,E,F. All youve done is shifted a polymorphic conract by interface design into a 3rd party hideously complex service. Doesn't look too bad under the rug, but the lump will get bigger over time and trip you up on a regular basis.

Vladas Saulis
Vladas Saulis

As years go by, considering computing power will grow exponentaly, the best DBMS soon be an old plain text files/filesystems with simple APIs. Especially when 99% of databases now are write-once/read-mostly/no-deletes-at-all.

Sterling chip Camden
Sterling chip Camden

The Synergy DBMS has a SQL interface, but it can also be accessed directly via an ISAM approach. That scales very well, and also allows for data models that don't find the usual relational model. It's not OO, though.

Justin James
Justin James

Have you used or looked at the OODBMSs out there? What are your thoughts on them? J.Ja

JulesLt
JulesLt

I've spent my entire working life (since 1992) working against Oracle databases, and I've seen various front-end languages rise and fall - so I should be on your side. On the other hand, I can see the point of both OO and an object datastore - for certain types of application, there seems little point converting from the in-memory representation of objects into a structured format. In fact, with today's memory size, and vastly improved virtual memory management, there is far less requirement to put data into off-line storage - in many cases, the RDBMS itself will be using an internal OO representation of the relational data, and cacheing as much as possible for speed - we're a long way from navigating around disk blocks via a B* trees (which is what I did in 1989).

Tony Hopkinson
Tony Hopkinson

Wrong... Very, very wrong. Have you seen a property bag solution in an RDBMS. I have, made my hair stand on end. Anyway what's this success you are talking about. Programming in business land is treading water at best, turd polishing for the most part though.

Tony Hopkinson
Tony Hopkinson

I remember looking at an objectstore approach in 2000. My current employers looked at in 2002/3. It wasn't right then, and for waht we do OODBMS doesn't offere enough advantages to throw away the investment since then. I'd be the first to agree that the property bag is very convoluted in an RDBMS. The document one trivial but basically gives you nothing with out writing a shed load of code, to do things like find me all Fred's documents. Relational databases weren't invented for a laugh. They were the best way to get the desired functionality at a reasonable cost with data integrity. When you read you don't need locking with an OODBMS, you start to wonder.... If two things can access the object at the same time, of course you need locking, unless your object is merely an unmanaged completely isolated unstructured bucket.

Justin James
Justin James

One thing that SQL databases have going for them, is that they are supported by so many things. Even if your current system no longer exists down the road, you can be sure that something else can read the data. OODBMS' are much less likely to work like that. At best, they might support a small amount of cross language/system support, like DB4O does. J.Ja

Justin James
Justin James

If you carefully look at what I wrote, it serves an *entirely* different purpose than a stored procedure. In fact, one might say that it serves the *opposite* functionality. SQL statements stored in configuration allow the program logic to be changed "on the fly" in a manner much more easily changed than a stored procedure. Stored procedures are indeed very useful (particularly regarding performance and security) but they do not address the underlying difficulties with programming against an RDBMS. J.Ja

WhatTheThunk
WhatTheThunk

"However, the one thing is lacks is the relational agility even when it's built on top of SQL. " Actually that isn't true, at all. With SP datasource not only can you join two or more SP lists, you can join an SP list with just about any other datastore such as an XML file or even another table in another SQL database. You can't do this with the standard SQLDataSource. SP Designer provides a very nice wizard for doing this. SP rocks especially if you know how to utilize it's Web Services.

Justin James
Justin James

... but where did you get the idea that OODBMS's are not aware of relations? If anything, their understanding of relations is *much* more explicit and clear than an RDBMS! Instead of having rows linked by ID numbers (which do not need to be used, I may add... no reason why you can't manipulate a row indpendent of the parent), you deal with fields/properties/members of objects, which maintains the parent/child relationships. J.Ja

verelse
verelse

...someone will still have to code so all the pointy clicky types can be employed. Sort of like the OOD/Procedural debate: take a look at the actual machine code: it is ALL procedural. OOD is nice for readability, but someone has to write real code for it to work. Same for the endless desire to "wrap" that seems to be the hallmark of the new superbloat that passes for programming these days.

Sterling chip Camden
Sterling chip Camden

The lump under the rug that gets bigger over time and trips you up on a regular basis. Applies to so many things!

verelse
verelse

File-based RDBMS is inevitable and scalable, but requires programming, not pointing and clicking and property-setting.

illanonline
illanonline

I have considered this on an off over the years - its been around for at least 8 or 10. The interesting thing is that it claims not only to be a true OODBMS, but also to give *improved* performance over an RDBMS, AND it offers a native SQL query interface to the underlying object representations. Its always sounded almost too good to be true, and has many real world large installations to its name. Should have taken over the world by now... maybe an idea ahead of its time?

blackfalconsoftware
blackfalconsoftware

Most of the complexity today in IT is mostly a result of management seemingly demanding the impossible out of its development staff and for little worth. If you study the realities of current IT there is a tremendous amount of waste in effort and little gained by the results from many projects that do eventually go in, especially the rather large ones, which often time are over budget, over blown, and over propagandized.. That being said, applied to databases you would be surprised to find what a study on the subject produced several years ago. In a study to determine the efficacy of SQL based RDBMSs, it was found that maybe only 10 percent of the time were the actual benefits of SQL databases actually being used. The other 90% of such usage was relegated to operations more applicable to ISAM databases, which at the time, and as a result, were seeing a resurgence. SQL was actually designed for the end-user in order so that they may develop ad-hoc queries. It was never really intended to be a development platform but moved in that direction quite quickly once the ease of use was seen to be far superior to that of the then current hierarchical implementations. Today, we have a rather newcomer on the block in terms of the OODBMS, most of which have actually been around for a little while. OODBMS satisfies the need when dealing directly with object structured data is a requirement. However, how often is that a necessity? Most likely you will find such requirements in the military and scientific communities, where such databases could easily shine. In fact, a study in this subject showed the same results when applied to OO programming many years ago. The concepts have not changed since... In terms of business, there is really very little that new innovative technically based technologies will provide... except a lot more expense in terms of acclimating to them. Most of what is newly integrated has already been accomplished successfully with current and existing technologies. ASP and ASP.NET are classic examples. Yes, ASP.NET is much easier to develop with and more enjoyable to do so as a result. However, much of what we do with ASP.NET today we were doing with ASP yesterday. There are some areas where new technologies definitely improve the day-to-day operations in a business such as the recent surge in collaborative technologies. However, overall, business has gained little in terms of actual technical innovation. Moving to an OODBMS simply for the sake of seeing RDBMSs as "old technology" will not increase the benefits as a result. You will still do the same things with it as you did before; insert, update, and delete while rarely requiring comparisons between objects such as would be required in military systems. Business simply doesn't have such requirements...

paul.spaulding
paul.spaulding

If you are developing a web application, check out http://www.persvr.org. This is an open source project with great potential. I've been playing with it for a couple weeks and have been impressed with how easy it is to work with, especially when coupled with Dojo (but it will work with any REST library/client). For the Java folks out there, it can be used strictly server side if you wish. From the website: By implementing standards-based communication protocols, Persevere provides you with a feature-rich set of open source interoperable client and server frameworks. The Persevere server features a secure RESTful JSON interface for data interaction and storage of dynamic data, JSONQuery/JSONPath querying, Comet-based real-time data notification through Rest Channels and Bayeux support, class-based based object-oriented business logic with data integrity and validation through JSON Schema, and still supports existing SQL tables. Persevere supports a JSON-RPC interface for interaction with its server-side JavaScript environment. The Persevere Server is an object storage engine and application server (running on Java/Rhino) that provides persistent data storage of dynamic JSON data in an interactive server side JavaScript environment with the following key features: * Create, read, update, and delete access to persistent data through a standard JSON HTTP/REST web interface * Dynamic object persistence - expando objects, arrays, and JavaScript functions can be stored, for extensive JavaScript persistence support * Remote execution of JavaScript methods on the server through JSON-RPC for a consistent client/server language platform * Flexible and fast indexed query capability through JSONQuery/JSONPath * Comet-based data monitoring capabilities through HTTP Channels with Bayeux transport plugin/negotiation support * Data-centric capability-based object level security with user management, Persevere is designed to be accessed securely through Ajax with public-facing sites * Comprehensive referencing capabilities using JSON referencing, including circular, multiple, lazy, non-lazy, cross-data source, and cross-site referencing for a wide variety of object structures * Data integrity and validation through JSON Schema * Class-based data hierarchy - typed objects can have methods, inheritance, class-based querying * Pluggable data source architectures - SQL tables, XML files, remote web services can be used as data stores * Full access and control over Java libraries via JavaScript * Robust, solid cross-site request forgery protection

Jeunito
Jeunito

...in the form of Google's Big Table for my thesis in the University. I don't know for sure if that's an OODBMS but it seems to me like it is. Objects are known as entities. Someone told me that CouchDB was something similar to Google's Big Table but apparently it's Document Oriented. Don't know what the difference is. Maybe you can try it out too? My adviser says its gaining popularity because of Ruby.

The 'G-Man.'
The 'G-Man.'

It worked but was a bit over complex even for quick tables. I think a long term investement in time was needed. It was only a 12 week module at Uni.

Tony Hopkinson
Tony Hopkinson

Ad hoc reporting and analysis is one that gets thown in my path. Whether it's excel, access, crystal reports etc. There needs to be some standard interface anf probably sql based on these implementations. Personally I've never been mad keen on some MBA type adhc querying my live database, but a lot of people do work this way in SMEs

Tony Hopkinson
Tony Hopkinson

If it's a good one and you deal with it at that level, all is good. You could say it's all quantum mechanical events, which may be particular vibrations in a 2-brane through n dimensions at something we call a point, but is really not one in a Euclidean way.... Not a good model for describing a Customer's properties and the operations you can carry out on it though is it?

paul.spaulding
paul.spaulding

When I last looked at Intersystems Cache (and actually talked to them), their approach was "we want to partner with our customers...we will look at your business and come up with pricing for your scenario". I think most companies that are unable to be upfront about their pricing will have a tough time taking over the world even if they have a good product.

Justin James
Justin James

For applications which can be reported against, OODMBS' aren't the greatest choice, unless they either feature a SQL interface as well, or can automagically dump to a data warehouse which can be queried. J.Ja

Sterling chip Camden
Sterling chip Camden

Exactly -- if OO describes your problem domain well, then use it. It's a tool, not a panacea. But as verelse notes, it has become SOP - which is not always a good thing.

Editor's Picks