
Scalability vs. performance


Building a scalable application? So am I. One of the mental conflicts that I keep having is that all of the layers that we add to an application (particularly a Web application) to add scalability… well, they hurt the performance of the application!

An increasingly common pattern is to have the client talk to a presentation layer, which talks to a business logic layer, which talks to a data access layer, which talks to the database. This makes sense. And for the sake of interoperability and reusability of each layer for future growth, we end up doing it through XML at the very least, or SOAP, or something similar (anyone remember CORBA?). And it all makes sense. And let's add some more layers to this. After all, if the app ever gets enough users, we are going to need some sort of state sharing within each of those layers, so let's add a clustering technology at each layer. After all, with all of these concurrent requests flying around, we need to be able to maintain state, locks at the session and application level, and so on between servers. Let's face it: if we are thinking about a system big enough that we have to worry about adding business logic servers without having to add presentation servers (they will scale at different rates), then it is an application big enough and important enough to need redundancy, failover, load balancing, and more. Which means clustering with shared "stuff."

So now we have this monstrosity of an architecture, and tracing an error involves two or three XML transactions, three (at a minimum) logical servers (they may all be on the same hardware), and a big dose of oversight software.

What a mess. It is so easy to say, "Hey, let's just have the application logic and presentation logic and data access logic all in the same layer, and if the box goes down, oh well, sessions will be lost." But we live in a world where customers get sold contracts with "five nines" and SLAs and performance metrics and the works. So our applications need to be fast, scalable, and robust. And "scalable" and "robust" run counter to "fast."

The disaster of Java frameworks is a great example. Build an app on a popular Java application server, and start piling on the thousand and one different frameworks out there: Spring, Struts, Faces, and all of those other Java words. Now, do something dumb like cause a NullPointerException on purpose, and examine the stack trace. That null value went through 80+ layers before it got tossed as an exception. I am not picking on where the exception occurred. What I am pointing out is just how much overhead went into handling that variable. Do the same thing in a little Perl or PHP or Ruby script, and the variable goes through only a few functions before it hits the runtime exception level.
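The depth difference is easy to demonstrate in miniature. Here is a toy Python sketch (Python standing in for the scripting side; the 80 pass-through wrappers are an arbitrary stand-in for framework indirection) that counts how many stack frames the traceback contains with and without the extra layers:

```python
import sys
import traceback

def fail():
    # the "null value" finally getting tossed as an exception
    raise ValueError("bad value")

def add_layers(fn, n):
    # wrap fn in n do-nothing pass-through wrappers,
    # mimicking layers of framework indirection
    for _ in range(n):
        fn = (lambda inner: (lambda: inner()))(fn)
    return fn

def depth_at_failure(fn):
    # how many stack frames appear in the traceback when the exception surfaces
    try:
        fn()
    except ValueError:
        return len(traceback.extract_tb(sys.exc_info()[2]))

shallow = depth_at_failure(fail)               # just a couple of frames
deep = depth_at_failure(add_layers(fail, 80))  # 80 frames more
print(shallow, deep)
```

Each wrapper contributes exactly one frame of overhead to the traceback, which is the point: the work done per layer is trivial, but it all has to be done on every call.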

This kind of thing is why .Net and Java will never be as fast as native code: The native code writer will rarely have more than a few layers between the OS and the application. These massive frameworks are so abstracted in the "kitchen sink" effort that they have 50-150 copies of the data sitting on the stack before it ever gets to where it is going.

This has been a bit of a digression, but it illustrates my point.

You see the exact same thing in an n-tiered Web application built for scalability, robustness, and interoperability plus future expansion. Let's follow an imaginary client request to insert a new record into the database:

1. The load balanced HTTP server receives the request and parses it just enough to decide which application server cluster it needs to go to.

2. The application cluster receives it, and then starts to process it at a presentation layer level. It "sees" the request to insert a record, and creates a SOAP request to the business logic layer to handle it.

3. The business logic cluster gets the request, and needs to check to make sure that there are no other requests trying to insert a duplicate row (maybe a double click on the "Submit" button?). So it polls itself and all of the other cluster members with some shared memory system to see if they are doing the same thing. If it does not see that happening, it sticks a note into the shared memory for "heads up, I’m doing XYZ right now, so please do not do it." The business logic layer creates a SOAP request to the data access layer to insert the row.

4. The data access layer gets the request, opens it up, figures out which stored procedure on the database layer to use, and calls the stored procedure over a TCP/IP pipe.

5. The database gets the request, runs the stored procedure, and reports success to the caller.

6. The data access layer receives the status, and reports success via a response to the SOAP request.

7. The business logic layer receives a success notification, reports to the shared memory that it is done, maybe does a few more things, and returns a success message to presentation.

8. The presentation layer renders the “A-OK!” page and returns it to the client, back through the load balancer.
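The eight steps above can be sketched in miniature. This is an illustrative Python sketch, not production code: each tier is a plain function, JSON stands in for the SOAP envelopes, and a set stands in for the cluster's shared memory. The point is how many times the payload gets serialized and unwrapped on the way to a single insert:

```python
import json

in_flight = set()  # stand-in for the cluster's shared "I'm doing XYZ" memory

def database(proc, args):
    # step 5: run the "stored procedure" and report success
    return {"status": "ok", "proc": proc}

def data_access(envelope):
    # step 4: unwrap the request, pick a stored procedure, call the database
    req = json.loads(envelope)
    return json.dumps(database("insert_record", req["record"]))

def business_logic(envelope):
    # step 3: duplicate-suppression via shared state, then call data access
    req = json.loads(envelope)
    key = ("insert", req["record"]["id"])
    if key in in_flight:
        return json.dumps({"status": "duplicate"})
    in_flight.add(key)
    try:
        return data_access(json.dumps({"record": req["record"]}))
    finally:
        in_flight.discard(key)  # step 7: tell shared memory we are done

def presentation(record):
    # steps 2 and 8: wrap the client request, unwrap the reply
    reply = json.loads(business_logic(json.dumps({"record": record})))
    return "A-OK!" if reply["status"] == "ok" else "Try again"

print(presentation({"id": 42, "name": "widget"}))  # prints "A-OK!"
```

Even in this toy version, one record insert costs four serialize/parse round trips and a shared-state check before a single byte reaches the "database."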

About

Justin James is the Lead Architect for Conigent.

19 comments
Wayne M.

As a general concept, it is better to parallelize things than to serialize them. There are good reasons to break this rule, but I still feel it has value as a concept. For reliability, it is better to have multiple processors executing copies of the full program than to have multiple processors executing specific pieces of the program. In the first case reliability improves as processors are added. In the second case reliability decreases. Likewise, serialization decreases performance by adding handoffs between processes and processors. Keeping as much of an operation within the bounds of a single process can improve performance. Within program organization, having a few top level calls with a long call chain (highly serialized path) is often more difficult to maintain than having a wider range of top level calls, each with a short call chain. Hint: push control logic as high as possible, algorithms as low as possible. The conflicting forces for creating serialized tiers must also be considered. One strong reason is third party software, tools, and environments. If there are portions that are already developed, it would make sense to trade some performance and reliability hits for the development time saved, rather than developing completely from scratch. Security is another rationale. Tiers can provide impediments to unauthorized access or attacks.
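Wayne's reliability point can be made concrete with back-of-the-envelope math (a sketch; the 0.99 per-node availability figure is invented for illustration):

```python
def replicated(p, n):
    # n full copies of the program: the system works if ANY copy works,
    # so reliability rises as processors are added
    return 1 - (1 - p) ** n

def pipelined(p, n):
    # n serialized stages: the system works only if EVERY stage works,
    # so reliability falls as stages are added
    return p ** n

p = 0.99  # assumed availability of a single node
print(replicated(p, 3))  # ~0.999999
print(pipelined(p, 3))   # ~0.9703
```

Three replicas of a 99%-available program are available six nines of the time; three 99%-available stages in series drop below 97.1%.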

apotheon

I think, actually, that what's needed is nothing more than a simple return to writing software in smaller chunks -- chunks that each do only one thing, and do it exceedingly well. Employ the Unix tradition of simplified software development, and leave all this ridiculous overcomplex tomfoolery where it belongs (in the lawyer's office). In fact, I just wrote about that subject at some length, in a piece I called [url=http://sob.apotheon.org/?p=245][b]OOP and the death of modularity[/b][/url]. (Too bad most of what I have to say that's blogworthy isn't well-suited to the Geekend, where I've been invited to blog at TR. Heh.) edit: punctuation

RexWorld

The complexity you describe, the kind that comes from trying to build a scalable architecture, does have some side benefits. The one I see all the time is maintainability. That may seem counter-intuitive given the complexity of n-tier, that it would improve maintainability. But I've worked on several projects where we were able to upgrade key components of the system without users noticing at all. Because we could keep the application up and running while slowly pushing out the new upgrades. For example, replacing the ad engine that powers a Web site. If the core page generation system on that site had been a single monolithic JSP that pulled in everything as a single database call (content, ads, etc.), we'd have had to bring down the entire site just to upgrade the ad portion. Instead the ad calls were a separate component, invoked via a SOAP-like mechanism as you describe, running on separate hardware. We could gracefully replace the ad engine in each ad server and slip them into the production ring without users or advertisers ever noticing. Now I'll admit, if that ability to keep the system running while upgrading portions is not vital to your application, then these n-tiers may indeed be overkill. There's lots of apps where it's okay to endure a few hours outage during an upgrade, in which case I'd go for a simpler architecture.
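RexWorld's rolling-upgrade scenario boils down to programming against an interface rather than an implementation. Here is a hypothetical Python sketch (the ad-engine classes are invented for illustration) of swapping a component behind a stable facade while the system keeps serving:

```python
class AdEngineV1:
    def serve(self, slot):
        return f"v1 ad for {slot}"

class AdEngineV2:
    def serve(self, slot):
        return f"v2 ad for {slot}"

class AdService:
    # pages call this facade; the engine behind it can be swapped live
    def __init__(self, engine):
        self._engine = engine

    def swap(self, engine):
        # rolling upgrade: in production this would replace one ad server
        # at a time while the rest of the ring keeps answering
        self._engine = engine

    def serve(self, slot):
        return self._engine.serve(slot)

svc = AdService(AdEngineV1())
before = svc.serve("banner")  # "v1 ad for banner"
svc.swap(AdEngineV2())        # site never stops answering
after = svc.serve("banner")   # "v2 ad for banner"
print(before, after)
```

The monolithic-JSP alternative has no seam at which to make this swap, which is exactly the trade-off the comment describes.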

Justin James

Do you like n-tiered, no tiers, or something in between? If you do not use tiers, how do you provide for scalability, redundancy, and robustness? J.Ja

Justin James

Wayne - All great points! I wish I had mentioned these items in my original post. :) J.Ja

Justin James

I read that a day or two ago, when you posted it. I agree with much of what you say. OO, despite the promises, almost invariably leads to tighter coupling, not looser coupling. Too many classes get made where they really just act as a "roll up" of a dozen other classes, sometimes in an array or collection, and at the end of the day, "feel" like a relational database in memory. Not exactly "loosely coupled" nor "efficient"! I do like how *Nix handles the coupling concept in many ways; for example, I can swap sendmail with the MTA of my choice, and none of the systems above or below the MTA in the stack notice or care, as long as the new MTA reacts identically to sendmail. Just an example. Unfortunately, all too often what I see is that it takes a piece of software becoming the de facto standard (like sendmail was), being lousy enough to be worth writing alternatives to (like sendmail was), but easy enough on the interface end to replicate what functionality is exposed (like sendmail is). That's why we really do not see alternatives to tar (it works too well to bother replicating) or OpenOffice (too complicated to replicate), to give two easy examples. I am not sure that open source is an inherent necessity of this design pattern, as long as the component piece has well defined "edges" via a precisely documented API. Indeed, poorly documented, complex pieces of open source code are much harder to swap out than a well documented, relatively simple piece of closed source code. Sure, having the source available makes it easy to examine the logic of the piece you want to replace. As an example, think of database access.
Because the methods of communicating with a particular database are well documented and standardized (via ODBC, JDBC, DBI, etc.), it is very easy for an application writer to work with a backend database without knowing or caring what is on the backend (provided that no vendor specific extensions are used, and that the connection string is stored outside of the code), regardless of whether the backend database is open source or not. Indeed, I would argue that open standards/specifications/APIs are *more* important than open source, for the simple fact that if you have those (and they are accurate) you do not need to see the code to know what to do to replace the component. But back to the original point: OOP is an utter mess, in my mind, as far as how most people use it. Your argument that OO has enabled programmers to make even more gigantic messes, because it makes it easier to manage giant piles of code, is spot on. It's like speeding 10 over the limit when you owned a Ford Taurus, then speeding 30 over the limit when you buy a Corvette. Just because you can be better at doing something wrong does not suddenly make it right. J.Ja
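The database-access point is worth sketching. The following uses Python's DB-API 2.0, with sqlite3 merely standing in for the backend; because every DB-API driver exposes the same connect/cursor/execute/commit calls, the function never needs to know which engine sits behind the connection string it is handed:

```python
import sqlite3

def insert_customer(connect, dsn, name):
    # `connect` and `dsn` come from outside the code, so the backend
    # can be swapped without touching this function
    conn = connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT)")
        cur.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        conn.commit()
        cur.execute("SELECT count(*) FROM customers")
        return cur.fetchone()[0]
    finally:
        conn.close()

# sqlite3 here; any DB-API driver's connect() would do
print(insert_customer(sqlite3.connect, ":memory:", "Acme"))  # prints 1
```

One caveat in the spirit of Justin's "no vendor specific extensions" proviso: the parameter placeholder style (`?` vs. `%s`) varies by driver (the DB-API `paramstyle` attribute), so truly backend-agnostic code has to account for that as well.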

Tony Hopkinson

your first responder :D I've been looking at hosting IronPython under C# .Net recently, more for reducing maintenance, (data model is fairly static, but the processing on the values is more dynamic). It does have the side benefit of shifting the coupling from code to data though, which is pretty much how chaining could be described. I'm looking at the costs in terms of processing a lot of simple tasks on simple classes vs one path through a very complex process on a small set of generic classes. It might turn out to be much of a muchness, but design wise in a dynamic environment which is driven by external legislative criteria there are a lot of advantages. The trick is not to go barking mad and script absolutely everything or conversely next to nothing. Finding the balance is why they pay me the small to medium bucks. :D I'm familiar with the latter, had to work on an application where all the functions were effectively in the Customer class. I do mean all as well, use that class and you had to reference 95 other units and absolutely every operation required you start with MyCustomer. MyCustomer.Locations.Offices.Contacts.TelephoneNumbers[j].FullInternationalNumber(); sort of thing. Bloody awful design.

Justin James

N-Tier architecture can indeed significantly improve maintenance. It also makes testing a lot easier! J.Ja

Meesha

Although I agree with the statements overall, I must say that having the architecture defined does make a difference in both simple and complex situations. Always looking to the future: laying down the right foundation now, even if it appears to be "overkill," may be what sets your stuff apart when scalability, portability, usability, security, etc. become requirements. By that I mean I've implemented a basic 3-tier architecture - Presentation, Business Logic and Database - as the foundation for all apps/software, regardless of whether they are developed in-house or not. This ensures that I can update the GUI without affecting the application, or do the upgrade at that level on the fly. When business logic (process) changes, the effect is less invasive, as is a change in the db. All in all, laying good groundwork now provides a stable foundation that allows for quick turnaround and response to new requirements, technologies, etc.

debuggist

I've seen far too many attempts to create an ultra-uber app that can scale unrealistically. Just build what you need now, in a way that can be modified easily to handle what's coming 6-12 months ahead. The business environment just changes too much to try more than that.

Saurondor

Justin, I don't think the issue lies so much in the architecture as it does in the interconnection of components and the individual testing of these. Work to have clear and simple interfaces between your layers and hammer them to death with unit tests. I recall my first experiences with Java and all the MVC, XML, remote calls, etc. I kept coming up with huge stack dumps which meant nothing. Something in Hibernate could crack, and all JSF said was that something broke in a page. Was it the JSP? The validator? The backing bean? Or the ORM? By dividing, you reduce the size of the trace and make things a lot easier to fix and maintain. The ASP-PHP model is nice, but trying to handle higher levels of complexity with it is like thinking that if you can handle a 1024x768 image on your PC, you can handle a 1024000x768000 one too. You will have to break it down into smaller images sooner or later. The same thing happens with your original small web app. As it grows, you'll have to break it apart or you'll end up with an "all inclusive" script. The n-tiered approach, if kept to simple interfaces, is much easier to maintain, benchmark, and upgrade.
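The "hammer the interfaces with unit tests" advice might look like this in Python (a toy sketch; the `insert_record` contract is invented to mirror the duplicate-row example from the post):

```python
import unittest

def insert_record(store, record):
    # the entire contract between business logic and data access:
    # a dict goes in, an "ok"/"duplicate" status comes out
    if record["id"] in store:
        return "duplicate"
    store[record["id"]] = record
    return "ok"

class InterfaceTest(unittest.TestCase):
    def test_insert_then_duplicate(self):
        store = {}
        self.assertEqual(insert_record(store, {"id": 1}), "ok")
        self.assertEqual(insert_record(store, {"id": 1}), "duplicate")

    def test_many_inserts(self):
        # hammer it: every distinct id must insert cleanly
        store = {}
        for i in range(1000):
            self.assertEqual(insert_record(store, {"id": i}), "ok")

if __name__ == "__main__":
    unittest.main(argv=["interface-test"], exit=False, verbosity=0)
```

Because the interface is small and explicit, a failing test points at one layer instead of producing the cross-tier stack dump the comment complains about.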

Gastón Nusimovich

So, it seems like we could evenly blame both the paradigm (OOP) and the way devs use the paradigm as the main culprits for this mess of an architecture.

apotheon

See? I [b]knew[/b] I had readers!

Justin James

One thing I keep seeing is people trying to shoehorn what amounts to an eval() statement into compiled languages (typically Java, C#, and VB.Net) based on configuration or metadata. It is why I am really attracted to .Net; write the core components in C# or VB.Net, and expose their functionality to IronPython or something similar (really waiting for Ruby.Net!). It gives you the best of both worlds. Or you can flip it around, and have that compiled code call an interpreted piece of code for that rapidly changing business logic. In more and more apps, the "configuration" has gone far beyond a simple key/value pair, and is practically a language of its own. In that kind of scenario, compiled code is too rigid (what are you going to do, recompile every time the user changes something that affects core business logic?) and the frameworks people layer on top of compiled code to get it to allow run time changes to business logic are ridiculous (like config files or config tables filled with SQL commands that get read at run time... ugh!). J.Ja
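The "configuration that is practically a language" idea can be sketched in a few lines of Python (standing in for IronPython hosted by compiled code; the pricing rule here is invented for illustration):

```python
# A business rule kept as text -- imagine it living in a config table,
# reloaded whenever it changes, with no application rebuild.
rule_source = "price * qty * (0.5 if qty >= 10 else 1.0)"

def make_rule(source):
    code = compile(source, "<rule>", "eval")
    # empty __builtins__ keeps the rule from reaching into the runtime;
    # a real host would sandbox far more carefully than this
    return lambda **names: eval(code, {"__builtins__": {}}, names)

line_total = make_rule(rule_source)
print(line_total(price=5.0, qty=10))  # 25.0 -- bulk discount applied
print(line_total(price=5.0, qty=2))   # 10.0
```

The compiled host stays rigid and fast; only the small, volatile rule lives in interpreted text, which is the "best of both worlds" split the comment describes.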

Justin James

I do not disagree with a single one of your statements; you describe scalability quite well! But all of the overhead needed to achieve scalability hurts performance significantly. OO abstraction is a similar thing. When a variable needs to go through 53 layers just to report its value, that is a huge performance hit. J.Ja

Tony Hopkinson

My plan is to pass all data in and out on a class, so the script only changes the content of the class. Then I can do a test run with example cases, expected answers, and a diff. There's always a chance of a spelling mistake, but the scripts can be tested outside of the software. Building scripts on the fly is much more problematic; paths through the code will proliferate exponentially if you go mad with that.
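Tony's pass-everything-on-a-class plan might look like this in Python (a hypothetical sketch; the `Payload` fields and the 20% tax rule are invented):

```python
from dataclasses import dataclass, asdict

@dataclass
class Payload:
    # all data crosses the script boundary on this class;
    # scripts read and write its fields and nothing else
    gross: float = 0.0
    tax: float = 0.0
    net: float = 0.0

SCRIPT = """
p.tax = round(p.gross * 0.2, 2)
p.net = p.gross - p.tax
"""

def run_script(script, payload):
    # the script sees only the payload object, named "p"
    exec(script, {}, {"p": payload})
    return payload

# testable outside the host software: run known cases, diff against answers
case = run_script(SCRIPT, Payload(gross=100.0))
expected = {"gross": 100.0, "tax": 20.0, "net": 80.0}
print(asdict(case) == expected)  # True
```

Because the coupling is entirely through the data class, the script can be exercised against a table of expected answers with no host application present at all.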

Justin James

Mark - You are absolutely right about that... at the very least, you need a debug build of the code so you can step through the configuration, or a good development environment that mirrors production to a T. J.Ja

Mark Miller

I explored this several years ago with a co-worker. He was using a home-grown framework, built by our employer, to build an app. It was a UI framework, and its appearance was configurable using an XML file. He asked the question: why couldn't we script the framework so that when business requirements change we don't have to recompile the app? Stuff like that is possible, of course, but then you get into a different problem. Once you start putting decisional logic into script code, it's possible to introduce bugs into the system. If you're using a scripting language that has some support behind it you might be okay. The big thing for me with that approach is debugging support in the language. Can you do a step-wise trace through the code, monitor variable values, etc.? If not, I'd be extremely wary of taking that approach. You'll be asking for trouble. If you can, however, then go for it.

Tony Hopkinson

using DynamicMethod calls and lightweight codegen, etc. It worked, but IronPython is a much better solution from just about every point of view you can think of. I haven't hammered it as yet, but where performance is a concern I won't be doing either. I really like the IronPython implementation under C#. Get the architecture right, which basically means keeping it as simple as possible, and you can take almost all of the dynamic business logic out of the code. When I passed a class instance in to the engine and it just accessed it through reflection in the script, without me doing a thing, I was able to cut down on my viagra intake. :p
