
Performance often still matters


Do developers care about performance anymore? I seriously doubt it, at least for the vast majority (over 90%) of developers. A few weeks ago, I had an interesting talk on the forums here about performance. I compared some simple Perl code to equivalent C# code for text processing speed. The numbers were fairly inconclusive (Perl was about 25% faster), most likely because I was hitting disk limitations more than anything else. With some tweaking, I can fine-tune the test to time only the processing and pipe in data cached in memory, taking disk speed out of the equation for a raw performance measurement. What struck me most about this whole thing was that it has been quite some time since I talked to another developer about the performance of code. Sure, when talking about coding, a smart developer can play the "Ah, but that is so slow!" trump card. But rarely do we ever talk about performance plain and simple.

This weekend, I became extremely curious about some aspects of the ASP.Net Application object and how it works. Doing some research into it, I ended up doing a lot of reading about the caching options that ASP.Net offers. I knew that there was this cache object and such, but I never really looked into it, assuming that it would be fairly Mickey Mouse, like much of the built-in ASP.Net objects. Boy, was I wrong! Despite the huge performance gains that this system can deliver, it truly stunned me that I have never once heard of anyone using it, showing how to use it, or even mentioning it. It even supports the ability to have SQL Server notify the cache that a data source has changed and that the cache should be invalidated; it also provides a callback mechanism so that you can trigger a re-caching if desired. Pretty nice stuff!
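For readers who have not run into it, here is a minimal sketch of what that looks like. The database entry name, table, and cache key are hypothetical, and it assumes SQL cache dependencies have already been enabled for the table (for example via aspnet_regsql) and registered in web.config; it is an illustration, not code from my test.

```csharp
// Illustrative sketch only: "MyDatabase", "Products", and the cache key are
// hypothetical, and SQL cache dependencies are assumed to be enabled for the
// database and table and registered in web.config.
using System.Data;
using System.Web;
using System.Web.Caching;

public static class ProductCache
{
    public static DataTable GetProducts(HttpContext context)
    {
        DataTable products = context.Cache["Products"] as DataTable;
        if (products == null)
        {
            products = LoadProductsFromDatabase(); // an ordinary ADO.NET query

            // SQL Server notifies the cache when the table changes, which
            // invalidates this entry; the callback fires when that happens.
            SqlCacheDependency dependency = new SqlCacheDependency("MyDatabase", "Products");
            context.Cache.Insert(
                "Products",
                products,
                dependency,
                Cache.NoAbsoluteExpiration,
                Cache.NoSlidingExpiration,
                CacheItemPriority.Normal,
                OnProductsRemoved);
        }
        return products;
    }

    private static void OnProductsRemoved(string key, object value, CacheItemRemovedReason reason)
    {
        // Optionally trigger a re-cache here instead of waiting for the next request to miss.
    }

    private static DataTable LoadProductsFromDatabase()
    {
        // ... fill and return a DataTable from the database ...
        return new DataTable("Products");
    }
}
```

The callback parameter is what lets you repopulate the cache as soon as SQL Server reports a change, rather than waiting for the next request to take the miss.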

This raises the question: Why does no one seem to care?

OK, let's get honest for a moment. I am talking about ASP.Net here, which is not known for being a monstrously fast system. .Net is fairly close to native code, probably a 10% - 20% hit (depending upon what you are doing) compared to C++ for most programs. On modern hardware, and for most applications, that hardly seems like much. And, yes, the interpreted languages that used to get knocked for being "slow" five or ten years ago (Perl, Python, and PHP) are fairly swift compared to today's managed code systems (.Net and J2EE). But in a Web server environment, or any kind of server environment, a 10% - 20% reduction in speed equates to a 10% - 20% increase in hardware needs (actually a bit more, because each server loses some resources to the OS, and more still to the overhead of clustering if it is used), which means a 10% - 20% increase in power bills, and roughly a 15% - 25% increase in IT support time (particularly for deployments). For a large application (particularly for a SaaS/ASP provider with a huge application), this can be millions of dollars. But most companies today choose that cost rather than developer time. After all, developing in any modern framework is still faster than writing C++ code.

Even so, sloppy coding or not taking advantage of built-in speed boosters like caching is just throwing money out the window. Performance is a lot like multithreading -- there are no hard-and-fast rules, but there are a lot of firm-and-speedy guidelines. There are a lot of tradeoffs and compromises. And unless programmers know what they are doing, they stand a very real risk of making things worse.

Sticking with the caching example, a bad developer can cache too much, increasing the overhead to the point where non-cached performance suffers. Or, a bad developer can improperly cache, delivering out-of-date results where real-time results are needed. And a bad developer can cache data that constantly needs refreshing or should not be cached, which uselessly increases overhead. And so on.

It is not about whether to cache or not to cache, though. Here's what I'm really wondering: Why do so many developers not seem to care that they are ignorant of performance-increasing techniques? And, more bizarrely, why don't the bean counters care? The bean counters are the bane of every developer. They're the ones who tell us that we can't have eye-saving task lighting and must make do with cheap overhead fluorescents. They're the ones who sit us in $25 chairs with $5 keyboards, $7 mice, and $100 monitors. Yet they apply zero pressure whatsoever to reduce the real operating costs of their server room by 10%, 15%, or more. I find this attitude bizarre. Granted, many performance refactorings can take nearly as long as writing the code the first time, only to eke out a marginal improvement. And yes, developer time is expensive. And, of course, an hour spent refactoring working code is an hour not spent generating billable hours.

The tradeoff between time and value is known as ROI. How much time you can afford to put into performance enhancements is largely dependent upon the size of the project. But there is zero excuse for coding practices that are known to slow code down, like not calculating loop ending conditions in advance, forcing a recalculation on each iteration. A good developer will write code that runs fast by itself, and outside items like external caching libraries, multithreading, and so on can be added later. But a poor developer writes sloppy, slow code to begin with, and then chirps up with the "throw hardware at it" excuse or blames the language/sys admins/network engineers/framework developers/the latest OS patch/whatever. So, please, try to be a good developer. Learn some healthy habits that help your code run faster and reduce the need for tweaking, and get to know what your options are for boosting performance. In the tough world of development, a developer who has this knowledge and approach to programming stands out from the pack.
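To make the loop-bound point concrete, here is a trivial, hypothetical sketch. Modern compilers often handle the trivial cases themselves, so this matters most when the bound is genuinely expensive to compute:

```csharp
using System.Collections.Generic;

static class LoopBoundExample
{
    // Stand-in for an expensive way of determining the loop's upper bound
    // (a method call, a property walk, a query, etc.).
    static int ExpensiveCount(List<string> lines) => lines.Count;

    static int SumLengths(List<string> lines)
    {
        // Sloppy: the bound is recalculated on every iteration.
        // for (int i = 0; i < ExpensiveCount(lines); i++) { ... }

        int count = ExpensiveCount(lines); // hoisted: evaluated exactly once
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            total += lines[i].Length;
        }
        return total;
    }
}
```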

J.Ja

About

Justin James is the Lead Architect for Conigent.

60 comments
c45207

"A good developer will write code that runs fast by itself, and outside items like... multithreading... can be added later." It has not been my experence the multithreading can be added later without redesigning a large chunk of the affected module, so, no, multithreading cannot be added on later.

Logos-Systems

Always Complaining! I agree that performance always matters. Sloppy coding and DESIGNS should never be allowed. But I don't see the sloppy programming and designs that you have found, along with how you fixed them. That way the rest of the software development community learns from you. But all I see here in this article is complaining that nobody has educated you.

bluemoonsailor

What I want to hear about is what you discovered about the caching options in the .Net Application object! How about a column on what you found out? Steve G.

pauljakubik

Performance optimization gets a bad name because there are plenty of developers who do stupid things in the name of performance. Tools can help. For Java, IntelliJ IDEA has inspections you can turn on for performance problems. As you type your code, IntelliJ will put warnings on performance issues. IntelliJ then gives you the option to automatically change your code to a form that doesn't create an unnecessary object, or takes advantage of more efficient standard library APIs. If more tools could do this, every developer could make little performance improvements to all their code with no loss of readability or horrid transformations of their code.

BOUND4DOOM

Well, I think you are right. Some people will argue that development time is more important and sometimes refactoring takes a long time. However, if you build with performance in mind from the beginning, then this really doesn't become an issue. I was really hoping this article would show me something I didn't know, but caching is old news to me and I do not build a website without it. I have also done presentations on caching at .Net user groups. It is a very powerful tool; however, it does have some gotchas. Another simple performance booster is StringBuilder for any time you are building up strings and doing a lot of concatenations. Heck, using Regular Expressions and so on as well. There are a ton of things you can do to improve performance. Oh yeah, and caching data on the server: if you cache a DataTable or a DataSet, you do know you can set up indexes and keys on those internal DataTable structures, don't you? Even primary and foreign key relationships inside your cached DataSet. We all know that setting up indexing correctly in a database speeds things up. Well, same thing on a DataTable or DataSet. Performance does matter; people who do not program with performance and security in mind right from the beginning are not really what I would call programmers, I would call them tinkerers.
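A minimal sketch of the StringBuilder point above (the CSV scenario is made up for illustration):

```csharp
using System.Text;

static class StringBuildingExample
{
    static string BuildCsvRow(string[] fields)
    {
        // Naive version: every += allocates a brand-new string and copies
        // everything built so far.
        // string row = "";
        // foreach (string f in fields) { row += f + ","; }

        StringBuilder sb = new StringBuilder();
        foreach (string field in fields)
        {
            sb.Append(field).Append(',');
        }
        return sb.ToString();
    }
}
```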

john

There are two types of performance: 1) speed of development, and 2) speed of execution. Of course, you don't want to write junk just to ship the product, but you must ship, and spending a lot of extra time to squeeze out a 10% performance gain (that won't be noticed/appreciated) is not a good investment. That said, my current project is very performance sensitive. Even a 5% gain in performance is worth spending a few days implementing. It runs on desktops for a targeted user base. So, we just need to keep in mind what is important in each project, and let's not waste effort where it is not needed or wanted.

Mark Miller

Re: the "bean counters" don't demand efficiency in the server room My guess is you've already put your finger on this: "bean counters" can relate more to hardware than to software. Notice that they pay attention to the cost of the monitor, keyboard, mouse, etc. My guess is they also pay attention to the per unit costs of the server room, buying cheap boxes, though ones that are performant enough to run the tasks. My guess is they were convinced a while back that managed code was the way to go for web apps., so they're sticking with that, and they're just figuring that the hardware cost goes with the territory. As for efficient vs. inefficient software, my guess is they don't have a clue what the difference is. As I've often said before with customers, who do have their performance requirements, they don't care how an app. is engineered, just so long as it works. It's not just the "bean counters". I had a boss once years ago, who was (and is) a professional developer (still know the guy), who seemed to insist that we write as much as possible in-house, even though at the pay rate we were getting, buying a pre-packaged solution that did basically the same thing would've been quite a bit cheaper. I suppose one of his criteria was performance. We were typically able to produce our own stuff that ran faster and took up less disk space and memory than the commercial equivalent, because we were able to target it at exactly what we wanted, rather than the "kitchen sink" approach of a commercial solution. The problem was the stuff we wrote had to be debugged more. So, he cared about performance, just not the cost to get it.

jhuybers

Hell yes, we care about performance, especially developing mainly for the Pocket PC platform.

Justin James

How much effort do you put into making your code fast when you program? What kinds of effort do you put into it? J.Ja

Justin James

I agree that multithreading a major logic function "later" can indeed be a brutal amount of work. Choosing to multithread an I/O function that takes a while, after the core module has been written, is a no-brainer; you just pop off a separate thread as soon as you have everything you need for it, and rejoin the main thread when you finally need the data. Choosing to parallelize a complex algorithm is an entirely different matter. When it comes to multithreading, there are sometimes a few pieces of low-hanging fruit - the gimmes. I am in agreement with you that in many cases, the decision to multithread needs to be made in the planning or early coding stages, or it will need to wait until the next version. Those cases are often hidden, unfortunately... you do not discover that a routine is a good candidate for multithreading until after the fact, when you are profiling to find the bottleneck. But sometimes they are apparent even in the planning stages. Hope that helps to clarify my statement, which admittedly looked dumb on the surface! J.Ja
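A minimal sketch of the "pop off a thread for slow I/O, rejoin when you need the data" pattern described above; the file name and the in-between work are hypothetical, and it uses the Task API rather than the raw threads of the .Net 2.0 era:

```csharp
using System.IO;
using System.Threading.Tasks;

static class BackgroundIoExample
{
    static void Process()
    {
        // Kick off the slow I/O as soon as its inputs are known...
        Task<string> readTask = Task.Run(() => File.ReadAllText("large-input.txt"));

        DoWorkThatDoesNotNeedTheFile();

        // ...and "rejoin" only when the result is actually required.
        string contents = readTask.Result;
        UseFileContents(contents);
    }

    static void DoWorkThatDoesNotNeedTheFile() { /* ... */ }
    static void UseFileContents(string contents) { /* ... */ }
}
```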

Logos-Systems

Performance always matters. It is not only the sloppy coding, but also the design, or lack thereof, that is at fault. Management is also at fault for not providing Software Quality Assurance, or a Testing Department, to enforce good designs and coding practices. But all I see in your article is complaints that nobody has taken the time to train you! I don't see examples of poor coding practices you have found and how you fixed them. You complain that languages like PHP execute faster than managed code, but I don't see the PHP and managed code examples that you compared or what performance metrics you acquired. If you are not just another one of the complainers you are ranting about, then provide examples. If you are a complainer, I would point out that it is your responsibility to educate yourself, not the responsibility of others. As far as nobody writing about the Cache object, I remember a number of webcasts, articles, and MSDN Events that occurred just before and right after ASP.Net 2.0 was available. I just did a quick search on www.msdn.com and found over 50 articles that are still available. I also did a similar search for "Performance Best Practices" at the same site and got the following results: "Results 1-50 of approximately 23642 for: Performance Best Practices". So it would appear to me you have made very little effort to educate yourself, or to provide proof of your complaint!

Justin James

Steve - I just might do that for next week. As another poster pointed out, there was a ton of information on it, and it was publicized... when .Net 2.0 was released. One thing I have noticed in the industry is that all of the exposure to really strong functionality happens in the few months leading up to, and a month or three after, the release of something. If you were not paying attention at that time, you missed it and have to go dig it up, if you know where to look. In my case, with the caching stuff, I happened to be knee-deep in FoxPro and VBA when it was released, and really missed most of the good .Net 2.0 information at that time. Generics are another one that it took me a while to find out about. For me, with the .Net world, much of my ability to keep up is based on how much time I have to read MSDN Magazine, which is a particularly helpful resource. J.Ja

Justin James

A good tool, while not finding every little trick, is indeed pretty helpful. Of course, profile the app before and after, just to make sure that the tool is sane within your particular codebase. J.Ja

Justin James

To be frank, if you are pulling so many results that indexing a pulled dataset will speed things up more than the indexing takes to perform, you may want to start asking yourself the following questions:
* Why am I pulling so many rows? Can I truly use this many rows at one time? Would I be better served by paging through the data instead, or pulling a more restrictive query?
* Do I really want to cache 10,000 records on my app server?
* Does this really fit into my architecture?
On the very rare occasions that I had a result set come out to enough records where I might want to consider a cache or an index on the result set, it usually was a requirement from the customer that did not make sense. Where it does make sense is pulling a result set that you may wish to run subqueries against, which is a fairly infrequent scenario. And even then, the damage of caching a big result set (particularly in an environment where cached objects get shared between servers) has to be carefully weighed against the advantages of not going back to the DB. For configuration and such? Great idea. But for the results of a user's queries? Not so sure. J.Ja

metalpro2005

Reporting, performance, and security often are not in 'right from the beginning'. Tight deadlines and focus on 'visible requirements' are! Let's picture the following scenario:
- You work at a commercial custom-solution IT shop and are pitching for, let's say, a web application.
- In your proposal you take reporting, security, and performance into account.
- A competitor who puts in his bid does not, and therefore is a lot cheaper.
- Because your client can see the quality in your bid [if you are lucky; often clients do not know or care and just want a 'working solution'], he asks you to modify your proposal without these requirements ("we do not need these, we can manage without them and fix it when we need it at a later date").
How much time will you deduct from your offer to still get the job (I know, it depends on the scale of the project, etc.)? Or will you refuse to cut? (Convincing the client is not an option.) And if you have cut the proposal, you will need a lot of restraint from the programmer not to implement the 'better code'... My opinion: application/system design should never be affected by the lack of these 'soft' requirements, and implementing them afterwards should never be a problem. But it is a hell of a message to bring across to good programmers NOT to implement!

Justin James

John - I agree with your point on that 10% gain... sometimes you hit a wall, and that 10% gain can take almost as long to achieve as writing the code in the first place. That being said, the vast majority of the code I have read in my time, written by many coders (including myself, embarrassingly enough), was half as fast as it should be, due to laziness, sloppiness, or ignorance. That's the stuff I mean. It does not take a genius to calculate the upper bound of a loop in advance, to avoid recalculation on each iteration, but guess what? Most developers don't do it, and they think they are being clever and efficient because they are saving one integer's worth of memory. Stuff like that. The low hanging fruit is low risk and easy to pluck, but most folks out there are not doing it. J.Ja

Justin James

Speed of development, cost, and quality... sounds like that manager preferred quality! Managed code for Web apps makes sense, believe it or not. The cost of letting some shake-n-bake programmer deal with pointers and bounds checking on a server is much more than the cost of managed code. For a desktop app, it shows up as the occasional seg fault. On a server, it can take down a whole box. Unwholesome. As slow as managed code can be, the sad fact is that most developers out there are unable to write code at a high enough quality to be using anything else in that environment. That being said, a bad developer can make native code slow or managed code even slower. That's where the focus needs to be: showing that a good code review + refactoring pays for itself with low risk. Gotta speak manager-speak to convince managers and all of that. J.Ja

Justin James

While most of us are writing for big servers or powerful desktop machines, there are still plenty of folks writing code for devices where battery life is more important than speed, and performance really makes a difference there! J.Ja

SoftwareMaven

In your discussion, you talked about how much cheaper coding for performance is. Certainly, code that is outrageously slow is quite expensive. However, my experience has taught me that one of the most expensive things developers can do is optimize too early and too often. Optimizing code almost always means making code more difficult to read and more difficult to maintain. It often means data storage becomes less efficient. It also usually means an increase in the number of defects in a piece of code, for those reasons. As a product manager trying to get a project out the door, quality is of topmost importance to me. Optimizing "for fun" or for some arbitrary sense of obligation is antithetical to that goal. On the flip side, there are certainly reasons to do performance optimizations, but those should be driven by hard requirements: "This function needs to take less than 'n' seconds" or "The system needs to scale to X users in this hardware configuration." At coding time, those requirements guide design decisions to assist with possible optimization later. After coding is complete, you perform analysis of the system to find out where the bottlenecks are, and you optimize only to the degree you have to. tj

bqui001

We have a very big server to handle a very big overnight process. The rest of the day it is practically idle. The applications I write are used during the day. User experience is far more important. Premature optimisation is the root of all evil.

MadestroITSolutions

When I first joined the company I work for and was presented with our websites' code, I almost resigned on the spot, lol.... I can't tell you how many times I have suggested refactoring in our applications only to have my recommendations thrown out the window, just like you mentioned. Just so you get a feeling of what I am talking about, the person who designed our corporate website was an Access report maker [believe it or not] who was "promoted" to developer. The code is very sloppy and all over the place. There is no use of stored procedures and there are SQL statements embedded everywhere in the site. No standards whatsoever as far as coding (or anything, really), multiple redundant calls to the database, no caching on an almost static line of products, etc., etc. I actually made a suggestion to REDESIGN the entire thing as doom is imminent with the current design, but they refuse to spend any time for "academic purposes". At least that is what they call my efforts for improvement. I know, I know, what the hell am I doing here?... oh well, it is close to my house, I am tired of commuting to N.Y., and I have had enough of the corporate bureaucracy!

Tell It Like I See It

I have a number of minor "performance tricks" that I tend to use all the time. These are generally small things, like what was outlined elsewhere. Things like pre-calculating your loop repetitions, etc. For most good programmers, I think these are second nature. However, I also tend to balance sheer runtime speed against flexibility and maintainability. I generally look for a "happy medium" between these. For some batch processing programs I do, I have a fair number of "settings" stored externally (either database, text file, ini file, etc.). I load the settings into memory once at the beginning of the program; they won't change for the run of the program. This approach adds some flexibility without really affecting the performance of processing a few million records. In this case the benefit is that the programmer doesn't have to recompile anything. So, while I value performance, I tend to see it as one part of a larger picture. I've had times where the only way to do something is some specific way that utterly kills performance. I was not happy about having to do it and said so, but I did it. Imagine two data servers in two different network domains. From where this program would run, I could see both servers. I had to pull a set of records from server A and for each record go pull another set of records from server B. No -- the servers were not allowed to talk to each other due to network security considerations. So even though they were MS SQL Servers, you couldn't have one use an external reference to the other and use a join in SQL.

Mike Page

I'm an old school programmer, having worried about how to fit my real-time 6502 assembly code into an Apple //e with 64 KB RAM. So, I'm conditioned to think about performance in terms of resource usage and speed of execution. It shocks me how young programmers have the mindset that anything they write will run fast enough without any consideration for performance. For example, one guy had an algorithm that took 3 hours to run. The algorithm needs to run about 50,000 times to process one day's data, which translated to 150,000 hours - not practical. I looked at the code and made several suggestions. Now the same algorithm takes 8 seconds, and 8 seconds x 50,000 = 111 hours. My suggestions were quite basic: pull calculations out of loops, use lookup tables where possible, etc. The truth is that performance does matter. You often don't need to wring out every last microsecond, but we should all make efforts to avoid wasteful coding.
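A hypothetical illustration of the lookup-table suggestion above (not the commenter's actual code): when the same expensive calculation is evaluated over a small input domain millions of times, precompute the answers once.

```csharp
using System;

static class LookupTableExample
{
    // Computed once, up front, instead of inside the hot loop.
    static readonly double[] SineTable = BuildSineTable();

    static double[] BuildSineTable()
    {
        double[] table = new double[360];
        for (int degrees = 0; degrees < 360; degrees++)
        {
            table[degrees] = Math.Sin(degrees * Math.PI / 180.0);
        }
        return table;
    }

    // Inner-loop callers index the table instead of recomputing the sine.
    public static double FastSin(int degrees) => SineTable[((degrees % 360) + 360) % 360];
}
```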

Tony Hopkinson

Design basics and a sense of how things work stand me in good stead for a reasonable level of performance. On those rare occasions where performance is wholly unacceptable, a simple change, de-normalisation, concatenating queries, etc. generally gets me back up to speed :D. It is not often I have to get under the hood (assembler, for instance) anymore. Most investigations usually reveal a bottleneck, usually enforced by a design constraint. In my experience, most of the time this is down to, say, reporting out of a live, high-volume transactional database or some such.

SoftwareMaven

I think the "rejoin the main thread" is a little bit of an over simplification. The original thread was expecting the results to be returned directly; now, it has to continue processing something different while waiting. Those kinds of changes are very error prone. My experience has taught me that you either plan for the threading up front (e.g. "I know I/O or queries or whatever" can take a long time) or you utilize a different mechanism, such as an event-driven mechanism (such a model is used by Java's Swing). Trying to retrofit threading into a non-threaded application is often very painful. I'm a very pragmatic developer, but this is one of those cases that an ounce of prevention is worth a pound of cure. tj

Justin James

I think you might want to give my article a re-read. My "complaint" was not that "no one taught me this!" Not at all! I am self taught in just about everything computer-wise. What I took issue with is that performance (and architecture too) related information exists, but not where most programmers would find it. Most programmers simply do not pay much attention to the MSDN Web site, MSDN Magazine, or even look through the parts of the documentation that touch on the same topics. Instead, they buy "Teach Yourself XYZ in 21 Days", which won't mention performance, or look at some sample code, which is rarely very good code, or start looking up function names in the reference guide. And indeed, most programmers simply do not care enough to actually go and look for that data. In my case, as soon as I decided to get more information, it only took me a few minutes to find exactly what I was looking for, as well as learn a lot about the cache system, which I had never really needed in ASP.Net before. Your point about the webcasts and such is a good one. One thing I *have* noted is that Microsoft (in particular, but everyone does it) publicizes these things really well right before, at, and a little while after the launch of something new, but then the hoopla fades away. I happened to be rather AWOL on my reading when .Net 2.0 (and 1.X, bad timing on my part for both of those!) launched, so it took me a while to catch up on things like generics, and obviously caching. But at least I found them. Most developers I have met would not bother looking into it, or would not even get curious about the subject. Indeed, my biggest issue with a lot of languages is that this level of detail is missing from the documentation! .Net does a decent job with it, J2EE is miserable, and PHP is rather hit or miss. For my numbers on managed vs. interpreted code, follow the link to the forum I mention in the article; I show (somewhat inconclusively, due to disk speed possibly causing some interference) Perl beating C# in basic text processing. For a small, short-duration task, an interpreted language should beat managed code every time, simply because the interpreter is lighter than the CLR or JVM, and both .Net and Java applications rely upon a ton of libraries that need to get loaded into memory. Where the managed code apps shine is when they get to reside in memory for a while and the code is complicated enough to make the interpreter sweat a bit. J.Ja

Mark Miller

For web apps, I think managed code is preferable to native code. The stuff you have to deal with as a web developer is complex as it is. It doesn't need pointers to make it more complicated. You asked why management doesn't pay more attention to the efficiency of the software. That's mainly what I was addressing. I wonder what the best way is to go about improving this. It could just be as simple as writing up a "coding standards" document that talks about efficiency practices, and distributing that to the software crew. I know it can take a while to write one of those up. I did that once years ago. So I'm not saying it's a small task. I think the best thing management could do would be to be more picky about which programmers they hire.

Justin James

As I have said elsewhere in these comments, I agree that early "optimization" tends to be a mess. But there are smart coding habits and common sense patterns which a lot of programmers just fail to know about entirely. These are the "sweet spot" of performance. Get things like looping fast, get things like passing by reference and passing by value right (and know when to do each!), and so on, and performance ends up far less of a problem than it could be. For example, when I see a developer create a 200 KB dataset and then start passing it by value as a parameter all over the place, I know that there is a serious performance problem looming, particularly in OO code where you have no clue if one parameter will work its way through 10 objects and be replicated 10 times on the call stack. But most developers do not think this way, and the Web app that runs great on their desktop or in the test environment suddenly collapses under load, either (hopefully) in the load testing phase (where the need to boost performance just causes a missed deadline) or worse, in Production (where the performance problems require gobs of hardware to deal with). J.Ja
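For what it is worth, a contrived sketch of the by-value/by-reference distinction in C# terms (note that for reference types such as a DataSet, "passing by value" copies the reference rather than the data; the full copy cost shows up with value types):

```csharp
using System;

struct Point3D
{
    public double X, Y, Z;
}

static class ParameterPassingExample
{
    static void MoveByValue(Point3D p) { p.X += 1; }          // mutates a copy
    static void MoveByReference(ref Point3D p) { p.X += 1; }  // mutates the caller's data

    static void Main()
    {
        Point3D point = new Point3D();
        MoveByValue(point);         // point.X is still 0: only the copy changed
        MoveByReference(ref point); // point.X is now 1
        Console.WriteLine(point.X);
    }
}
```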

Mike Page

I think that poor performance can be mitigated by making intelligent choices at coding time. You don't have to unroll loops or code in assembler. Just don't recalculate the same value in a loop, choose a sort algorithm appropriate for the number of items, or pick a data structure appropriate for the application's access needs. This doesn't have to be hard or time consuming. It can be done at no cost, and can even save time when done well.
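A hypothetical example of matching the data structure to the access pattern: membership tests against a List scan the whole list, while a HashSet does a hash lookup.

```csharp
using System.Collections.Generic;

static class DataStructureChoiceExample
{
    static int CountKnownWords(IEnumerable<string> words, IEnumerable<string> dictionary)
    {
        // List<string> known = new List<string>(dictionary); // Contains() is O(n) per call
        HashSet<string> known = new HashSet<string>(dictionary); // roughly O(1) per lookup

        int count = 0;
        foreach (string word in words)
        {
            if (known.Contains(word)) count++;
        }
        return count;
    }
}
```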

Justin James

"premature optimisation is the root of all evil." I have always thought that far too many people used that sentence as a cop out. There is a difference (a huge one) between an "optimization" and "good coding practice". Many developers code as if they get paid based on the WPM they code at, not the results of what they type. They sit down and bang away as if their life depended upon it, but won't take the time to think about what they are doing. The optimization quote is pretty specific, in terms of meaning, "don't sit there trying to acheive the most perfectly performing code until the application at least works right." Yolu are right, user experience is extremely important... don't you think that performance plays a large role in that? Study after study shows that after 10 seconds, a Web user is ready to click over to another page. I would wager that thanks to the slow demise of dialup, users are even more impatient. Not too mention that if your service or application is sold with a performance SLA, a 10% speed boot = a 10% reduction in cost to meet SLA. Now, to work my way into a bonus plan where meeting SLA triggers a bonus. ;) J.Ja

Justin James

... that I am talking about! I have seen applications that will hit the DB on every page view to find out where the logo belongs. Indeed, PHP and CGI programs are especially bad about stuff like this, because they lack the concept of persistence and shared storage at the application level. At best, you load this type of thing at the beginning of the session and cache it there. J.Ja

Tony Hopkinson

At least you knew it was wrong, amazing how many people don't. Usually some 'expert' who finds VB easier to write than SQL.

Justin James

I agree that getting farther away from the days of constrained memory and small hard drives has indeed led to a lot more slop in coding. It would be nice if CS courses made students spend a semester working on an old DOS machine, or a creaking mainframe, just to give them an appreciation of the value of a clock cycle or a meg of RAM. J.Ja

Justin James

Tony - Great point on that! When will developers start pushing back on the business folks demanding "real time reports" (or worse, "real time, ad hoc reports") and accept the simple concept of reporting against a de-normalized, read-only DB that resides on separate drives and gets replicated from Production once a day? Do they really think those foolish "gas gauges and speedometers" reports are worth jamming up the database that paying customers are using? J.Ja

Justin James

A lot of it depends on the circumstances, to be honest. It is not even worth it unless the I/O takes a long time, and there is a lot of work to be done until the I/O results are needed. But that being said, in that circumstance it most definitely pays off, and the time needed to make it asynchronous is typically well spent. For simple operations like making a database connection or dumping a large file to disk, we are talking about 10 - 30 minutes worth of effort. That being said, I agree 100% with you that "an ounce of prevention is worth a pound of cure." I just do not see many developers putting in the effort for either the ounce of prevention *or* the pound of cure, which is really at the heart of my post here. It feels like there is an apathy in the mainstream development community, and I find it rather ironic, since using multithreading, caching, and so on is easier now than it ever was before. J.Ja

Justin James

You raise some good points here, in terms of the hiring process. I have been deeply involved with that myself lately, and I am gaining a reputation as a tough interviewer because I do cut into the fundamentals fairly deeply. I have been truly shocked and amazed at the number of folks with Masters in CS who cannot tell me the difference between passing by reference and passing by value... how could I let someone like that write code more intensive than gluing libraries together? Another item which I am going to add to my bag is, "explain the difference between 'equality' and 'equivalency.'" Again, the number of even "senior" developers who were taught (or self taught) in the last 5 - 10 years who can answer that question (or even come close) is surprisingly low! Along the lines of what you mentioned in point #3, I think that not only should these things be tracked, but they should play a significant role in performance reviews and bonuses. A developer who writes code that frequently misses the mark in terms of quality, security, speed, etc. should be warned to "shape up or ship out," assigned a mentor, and given more feedback on their code, and should not be rewarded as well as a coder who consistently produces high quality code. Regarding the code speed: I think that you are right. I will tell you what, I really am jammed up the next week or two, too jammed up to write a proper test harness that takes the disk speed out of the loop to perform the timings. But I will make the time for it soon, and write a blog about it. If you watch this space, expect to see a blog about it soon, with links to the code. I also tend to agree that 1,000,000 iterations of managed code tend to run much faster per iteration than the same code run 100 times. There is just something magical (particularly in .Net) about keeping code in memory and running multiple iterations to speed it up. We shall see in a few weeks! J.Ja
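As a contrived C# illustration of the equality-versus-equivalency question (not necessarily how it gets phrased in an interview): two distinct objects can be equivalent in content without being the same instance.

```csharp
using System;

static class EqualityExample
{
    static void Main()
    {
        string a = new string(new[] { 'h', 'i' });
        string b = new string(new[] { 'h', 'i' });

        Console.WriteLine(ReferenceEquals(a, b)); // False: two different objects
        Console.WriteLine(a.Equals(b));           // True: equivalent contents
    }
}
```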

Logos-Systems

Justin, I did re-read your article and your response, and I agree that you described what you did and the potential reasons why the numbers are skewed. But I still stand by my title: without concrete examples and numbers for everyone to examine, it is more of a complaint than a question. That would be like the medical community claiming that a given medication was dangerous to the patient and only broadly describing the process they used to make this pronouncement. This would never be tolerated in the medical community, or in other scientific or engineering communities. If we expect others to believe our statements as IT professionals, then we need to adhere to the same type of standards. We also need to call into question any pronouncement that does not adhere to a reasonable set of professional standards. Therefore I would ask you to also post your code for both PHP and .NET managed code. I'm willing to bet that the community can optimize the managed code to show that it can run faster than PHP. I'm sure that for a single iteration of a simple algorithm PHP may be able to outperform managed code; but if I ran the algorithm, say, 1,000,000 times, managed code would still outperform any interpreted code. So please post both sets of code, and the numbers you got for them. I will agree that most developers are self-taught and believe they are qualified to develop software because they have read a book like "Learn XYZ in 21 Days". But that problem speaks to a couple of other problems.
1. The interview process. We tell them that this is not important, because we never ask questions about how they would optimize a piece of code, or which lines of code are causing a given code fragment to run at less than optimum. If we don't select experienced software developers based on these kinds of questions, then why do we expect them to produce optimum code once they are hired? Also, we are willing to hire the self-taught over those who have gone through a demonstrated set of courses, such as colleges, recognized boot camps, and training facilities that produce quality graduates. But if you want more than people who have read a couple of books and talk the talk without knowing how to do the work, then you have to start at the interview process to weed them out.
2. For those who have been hired, either as junior programmers or people who got through the interview process and shouldn't have: the lead and senior software engineers need to mentor them and train them in how to produce optimum designs and code, and they need to know that if they fail to learn and apply the training, they will not be working at the job they have very long.
3. Management must be willing to establish Software Quality Assurance/Test Groups whose job it is to find these failures in the design/code and report them not only to development but also to management.
As far as most developers not using the MSDN site or magazine, that is no excuse! Even if you just type in "ASP.NET Caching" you will get "Results 1 - 10 of about 1,820,000 for ASP.NET Caching" on Google; I get Results 1 - 10 of about 1,030,000 on Yahoo.com; and if you go to MSN.com you get "ASP.NET Caching Page 1 of 193,156 results". You would also have the same type of results if you searched these sites for "Performance Best Practices". Given that these are the three top Web search sites, I would have to say that it is because they are lazy, incompetent, ignorant, and clearly acting in an unethical and unprofessional manner!

bqui001

Good thread behaviour and short database transactions will have more beneficial impact than unrolled loops and efficient sort routines. Then again, unrolling loops is usually a complete waste of time and will just create messier code, as almost any commercial grade compiler will do things like that anyway. If my process is in the middle of a job that is the heaviest usage of the machine, then I will spend a long time thinking about how it will impact the system. If I have very high performance requirements, I will think very hard about how to meet them. However, if the decision is between a web page refresh being 300ms faster or the system being easier to support, I will advocate the second every time.

Tell It Like I See It

To be fair, I never really figured out what the deal was for this refusal. I suspect that one data server was in a web DMZ and another was on the "Inside Network". The security guys were probably worried about opening up a route for hackers to use or something like that. So, maybe they had some half-logical reason. But I still am not sure why the Inside server couldn't pull data from the DMZ (if that was indeed the situation). Either way, this lack of communication between them forced me to use code I hated writing. Even thinking about it now still gets my blood pressure up.

Tony Hopkinson

but they are SQL, foolishly assuming that another person has created the table, done the constraints and popped in some indexes. You could teach the basics of what a developer must know about databases in a week.

Tell It Like I See It

Don't worry, I didn't take any offense. I was just trying to point out a couple of things. 1 - that SQL exists for working with data and it should be used to its maximum when doing so 2 - in my opinion, it helps to match the tool you use to the goal you are trying to accomplish You touch on an issue that I have with some training stuff. Much of what I find for general development (or general database usage) is woefully basic. It would be good to teach someone in high school, but not so helpful for expert developers. I'm not sure what can be done about it other than to stick with classes focused on specific technologies (or products) you are trying to learn and hope for the best. If you can see the course syllabus, that helps, of course.

Tony Hopkinson

but one at the cookie-cutter, "once wrote a macro" solution developers I end up sweeping up after. Functionally, VB (.Net), certainly with Orcas, is much of a muchness compared to C#; some of the twiddles in it to make development more 'approachable' leave a bad taste in my mouth. Modern 'compiled' languages other than VB tend to be far stricter compiler-wise, potentially giving rise to better code, if you know what you are doing. VB's principal appeal is being able to bash something together that looks OK without any real formal study. I was at TechEd (Barcelona) last year; the 4th (highest) level course on writing DB applications was principally about not dragging all your data into the client and then processing a bit of it. I couldn't understand sending anyone there if they didn't know that already, but there you go.

Tell It Like I See It

Actually, I use VB quite a bit. But I also recognize that if I can let the server do some of the work for me, it will likely be faster. To me, I use whatever I can use to make it work well. "Well" being a rather nebulous term at times (like in the situation I described above). SQL doesn't handle presentations too well, nor does it handle web pages all that well. So, I use VB or VB.NET (depending on if I work on legacy traditional ASP or new pages). But when it comes to getting data, I do as much as I can with SQL rather than VB. In my opinion, that's why SQL exists. Yes, I've done some work in C#, but I haven't learned it sufficiently to be truly comfortable writing it yet. When necessary, I'll even use VBA. It all just depends on what I'm trying to accomplish and which set of tools better meshes with the big-picture goal.

Justin James

I agree that a lot of this stuff can be taught with managed code... but a lot of it cannot be. A data structures class, for example, is rather lacking when taught in a GC'ed language without pointers. I also agree that testing and security courses are needed as well. This is one reason why I favor breaking out education into a "trade school" track for people who want to be "real world programmers", and a "theory" track for people who want to do hardcore "computer science." Someone who wants to be a programmer would benefit highly from learning things like requirements gathering, testing/security, and so on, without being inundated with a lot of the theory-intensive stuff that happens in a CS program. On the flip side, someone wanting to learn the intensive internal theory of the science (calculating the speed of sorting algorithms is my favorite example) would be on the theory track. It makes little sense to spend the time teaching a mainstream developer how to build a tree or graph, or how to write a sorting algorithm; they just need to trust the library written by the theory folks to use the best algorithm or structure for their needs. I think this would benefit the industry overall rather significantly. J.Ja

Logos-Systems

I agree that CS programs need to teach more in Data Structures, Algorithm Design, and Design Optimization. But I can teach these classes using either .NET or J2EE. While they are learning in these courses, students would be required to use the tools that are used in the real world to do memory profiling to find memory leaks, or other performance counters that would allow you to optimize an algorithm and the data structure that it uses. While we are talking about courses, another course that is needed is Testing and Security. I'm not talking about the simple black box testing that most developers do to prove that their code is correct, but rather a combination of black box testing, white box testing, and performance and stress testing designed to break applications. First they would have to find out how to break the application, and then to complete the problem they must recommend the correction that is needed to resolve it. The other course is Security: how to design and build "Security In Depth" into an application. But the problem in the IT industry is that most practitioners are self-taught and have little if any college course work.

Tell It Like I See It

I've had requests come in to "start tracking this kind of data". One of my favorite questions in such an instance is "what does tracking this data give us?" or "why do we need to track it?" I get a response along the lines of "so we can have a report to track how we are doing." At that point I ask for a sample layout of the report. As you might suspect I generally get answers like "we don't know what we want yet" or "why do we need to define it now?" My reply at this point is a straight-faced, "well, if you don't know what you want on the report, how can I know what fields I need to store the data you want reported?" Fortunately for me, this usually drives home my point. I'm a developer who has done a lot of report development (and some documentation). It's not my favorite thing, but it fits better with my skills than, oh, say help desk or data entry. However, I insist on an understanding. If you want me to build a report, we'll talk about my schedule, etc. But if you want me to run the report daily (or any other schedule), we won't even talk. As I see it, I'm a developer, not a clerk. Often the reports I develop are for other departments and in my view the department the report is built for should shoulder assignment of a resource to run the report. Otherwise, I see it as a form of poaching. Additionally, there are very few developers in this company. The company can't afford to have these few developers spending their time running reports when there are so many other people who can hit a button on a form (or something similar) to run the reports and mail them. Especially when there are so many tools the company also wants built.

Tell It Like I See It

Actually, I don't work for an IT department any more. Technically I work for a documentation department. But my supervisor is one who understands about cooperating with other departments. Plus, the new project management software would (hopefully) help our department get things done better/faster/easier/etc. when you look at the big picture. To be honest, the IT department pretty much bowed out of the entire project management upgrade project -- at least as much as they could. They still had to handle some issues with the project management software under Citrix (for remote users) and such. I was basically the architect for the Reporting DB. I got elected because I was the one who'd done the most work with the MS Access based client reports. I was also the person most skilled with running apps at night to perform updates, etc. It also helped that I used to work for what passes as the IT department here, but was transferred to the Documentation department because that's where they put the web oversight. (Digressing a bit, but the IT manager felt neither he nor his department should be responsible for web development or development in general.) So far, we haven't had a need to store any "business quarters". Under current operating philosophy, we'd add a field to the appropriate table in the Reporting DB and it would be calculated by the processes that load up the data as a part of the load. That's an additional bonus we found already - we can basically translate things, if needed, during the load process and also calculate some fields. For example, one of the items loaded is call ticket data and one of the calculated fields there is ticket age (from the time the ticket was opened to the current date). Now that some people have seen it and begun doing some minor work with it, they love it. To them it is easy to use and much more convenient than the live system databases. So far the worst complaint I had was along the lines of wanting a data dictionary. I put together a little VBA code in Access to read the SQL table structures and put that information into more SQL tables. I then went in and added some title/description type fields. Then I put a web page on the intranet to display the information out of the data dictionary tables. Oh, yes, I also have a report in the Access database that I export to an RTF file and post in a "knowledgebase" section of our network. So, take your pick on how you want your data dictionary :)

Justin James

... that you are onto something. We need a big concert. Steve Ballmer could do an encore of his smash hit, "Developers, developers, developers!" and maybe even dance a bit. James Gosling could show off his covers of classic early 90's West Coast gangsta' rap. Maybe a guest appearance by Martin Fowler, doing his spoken word refactorings of famous 19th century poetry. But seriously... You are dead-on right, reporting is something that often gets conveniently overlooked. For one thing, most of the developers I know despise it; the only task they will do reporting before is documentation. For another thing, there is always the assumption that reporting is "simple". The sales guys love to show the customers all of the pretty gas gauges and pie charts that they will get "in real time, showing you minute by minute how your company is running!" But then no one really defines the reports up front ("yeah, we need a yearly sales chart, and, uh, some other reports too, with blue pie charts!") and a week before the project is due, someone asks about the reports, and then chaos ensues. Instead, reporting needs to be treated almost as a separate but related project entirely, with its own timeline, goals, and requirements gathering. Ah well. :) J.Ja

Justin James

That's a good point about global companies. For a lot of folks, there really is no "overnight", so frequent, small syncs are the way to go. J.Ja

Justin James

Sometimes that's really the best you can do, deliver something dumb to an insistent customer, and let them be miserable with it. J.Ja

Justin James

... for one of my former employers? I swear I've heard that exact requirement specified at least a few times... ;) J.Ja

Justin James

... you work with a pretty good group of people! It is rare that an IT group is able to rationally talk with the business group and have their side of things considered and accepted. The solution that you present is just about ideal, but it is rare that you see it fully and/or properly implemented. One thing too, on the note of reporting, is just how different the needs of an application writer and a report writer are at the database level. Sure, the report writer likes a de-normalized DB for speed and ease of use. But one thing I have seen a lot is that the report writer needs (or prefers) for the dates to be stored as a reference to a corporate calendar table, so calculating quarterly numbers and such is easy. Meanwhile, the app writer really has no use for that the vast majority of the time, and will most likely not even think of storing a date as anything other than a date field. J.Ja

paulo

We use log shipping from our live server to a failover server; because our data isn't super critical we ship every 15 minutes. This gives us a read-only database that we can query for almost real-time data; it is at most about 20 minutes behind. We went for this approach as ours is a global application, so there is no such thing as "overnight"; there is no time when the servers are not being hit by people somewhere in the world. Also, "yesterday" is a different set of hours to people in different time zones. We are currently working on a plan to data mine out of the read-only database into a denormalised database for reporting and a few other long-running queries.

MadestroITSolutions

I have an interface that "fits your description", lol.... They just wouldn't listen, so I shut up and built them the damn thing. It is slow as hell and chokes every time they go beyond two months of data! But hey, they "need" real time data....

MadestroITSolutions

I guess they just don't understand the horsepower it takes to produce the reports they want and the impact it has on the system overall. I go through this every time.

Locrian_Lyric

"Hi! We need to search all the data since the beginning of time, have the information in a crosstab format and have this in real time. Oh, and we want to be able to change parameters on the fly."

Tell It Like I See It

We are in the process of putting together a separate Reporting Database to pull data from a number of our systems. This combines data from various systems (in some cases merging data from multiple systems) and optimizes it for reports -- meaning highly de-normalized tables. This started as a result of implementing a new system for project management (or manglement, depending on your views) that stored data in a highly normalized way. It was so normalized that even the customization programmers working on this upgrade project had trouble finding certain values. When talk turned to reporting from that system, all the customization programmers glanced at each other with barely controlled laughter. They said that even they have trouble finding any of the values in that database and the average employee would have no chance whatsoever of finding any useful data. Despite the fact that this could sound a bit arrogant, it wasn't. It was simply a truthful statement. The structure in this product was so normalized that reporting from it would be pathetic. If you wanted to get 5 pieces of data for one project out of that system, you'd have to have something on the order of 8 or so nested select statements in SQL code. Imagine what it would be in any kind of reporting tool! Pretty much every value you have for the project was a separate record in some table (depending on the value, it could be in one of about 50-75 tables). As an example, getting a person's name required getting about 6 separate records spread over something like 3 or 4 different tables. A total nightmare for reporting. So, we suggested a new database that gets updated nightly, say 1:00 am or so. The data is de-normalized in order to make reporting easier for the average Joe/Jane. When some initial responses came in that it wasn't really "live" data then, we pressed on why it needed to be live data. Those who raised the issue couldn't give us a solid answer that we were not able to shoot down. We even pointed out that in some cases, reporting from the live database caused us problems (see below). Beyond that, we pointed out that our definition of certain reports to clients was effectively "what happened yesterday," not what is happening now. Running against the live database gives you current data, which is NOT what we promised to give. In fact, the nightly update better matches what we promised. To help sell the matter, we even pointed out that there would be fewer record locks in the various "live" systems because fewer reports are pulled from their data. This would mean improved performance in the live systems. BTW, we also pointed out that the record locking issue has, at times, effectively shut down the live system when someone runs a long report. We also mentioned that security for the live databases increases because fewer people would need to see the raw data or mess with it at all. Instead, they would go to the Reporting DB, which could be read-only. Even if someone hacked past the read-only and was able to change some data, that change would be overwritten the next time the update is done. We are in the final stages of rolling out the Reporting DB. The main things we have left to work out are the fact that the project management software is loading test data that they said they were going to delete, and a question about what values they are putting in some fields. Funny, though, the Reporting DB combines data from the project system with data from other systems as well. The other systems are all loading data perfectly as far as we can tell.

metalpro2005

Maybe start some rock concerts on each continent to spread awareness? :-) I de-scope reporting issues on all projects I do for the initial release. This way I can convince the client/users that the system will run smoothly before reporting is in. If I am not defensive on this issue, a lot of energy is wasted in discussions when the system is in production and the client focus is on 'bad code' instead of 'silly reporting requirements and real-time enterprise' discussions. In my experience, the reasons why developers implement reporting on the production hardware/environment are:
1. It can be done and is challenging to build (Bob the Builder syndrome).
2. In the design phase there was not enough analysis done on the reporting requirements, and the project budget 'dried up', so this analysis is never done and there is no budget to offload the queries to a different DB server, but the reports need to be implemented ASAP.
And for speed optimizing in general, we all know the saying: 'premature optimization is the root of all evil' (source: Code Complete). Having said this, bad coding solutions are still bad coding solutions.

Justin James

... much of the world are those "complete pratts". My favorite is when they want to have a Production box slowed down through real-time reporting to pull... you guessed it... performance numbers! J.Ja

Tony Hopkinson

inaccuracy is permitted. A quick cumulative set of figures running off very simple triggers, for instance for total throughput etc., should more than do the job for an overview of, say, how production is going through a shift or some such. 'Proper' reporting, ad hoc analysis, etc. should be done offline though, on a database designed for it. It doesn't have to include up-to-the-minute data. After all, that's dynamic and may change anyway, so any tablets-of-stone type figures are bollocks anyway. Only a complete pratt would slow down dynamic data collection to accurately report on what was not being collected.