How much effort do you put into making your code fast when you program? What kinds of effort do you put into it?
J.Ja
Design basics and a sense of how things work stand me in good stead for a reasonable level of performance. On those rare occasions where performance is wholly unacceptable, a simple change (de-normalisation, concatenating queries, etc.) generally gets me back up to speed.
Not often I have to get under the hood (assembler, for instance) anymore. Most investigations reveal a bottleneck, usually one enforced by a design constraint.
In my experience, most of the time this is down to, say, reporting out of a live, high-volume transactional database or some such.
Tony -
Great point on that! When will developers start pushing back on the business folks demanding "real time reports" (or worse, "real time, ad hoc reports") and accept the simple concept of reporting against a de-normalized, read-only DB that resides on separate drives and gets replicated from Production once a day? Do they really think those foolish "gas gauges and speedometers" reports are worth jamming up the database that paying customers are using?
J.Ja
inaccuracy is permitted.
A quick cumulative set of figures running off very simple triggers, for instance, for total throughput etc. should more than do the job for an overview of, say, how production is going through a shift or some such.
'Proper' reporting, ad hoc analysis etc. should be done offline though, on a database designed for it. It doesn't have to include up-to-the-minute data. After all, that's dynamic and may change anyway, so any tablets-of-stone type figures are bollocks anyway.
Only a complete pratt would slow down dynamic data collection to accurately report on what was not being collected.
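A minimal sketch of the running-totals idea, assuming SQLite through Python's sqlite3 module. The table, column, and trigger names are hypothetical, and a real production database would have its own trigger syntax. The point is only that the overview reads a tiny summary table rather than scanning the transaction data.

```python
import sqlite3

# Hypothetical schema: all names here are illustrative, not from any real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production_events (
    id    INTEGER PRIMARY KEY,
    shift TEXT    NOT NULL,
    units INTEGER NOT NULL
);
CREATE TABLE shift_totals (
    shift       TEXT PRIMARY KEY,
    total_units INTEGER NOT NULL DEFAULT 0,
    event_count INTEGER NOT NULL DEFAULT 0
);
-- Very simple trigger: bump the running totals on every insert.
CREATE TRIGGER trg_shift_totals AFTER INSERT ON production_events
BEGIN
    -- Create the summary row the first time a shift is seen.
    INSERT INTO shift_totals (shift)
    SELECT NEW.shift
     WHERE NOT EXISTS (SELECT 1 FROM shift_totals WHERE shift = NEW.shift);
    UPDATE shift_totals
       SET total_units = total_units + NEW.units,
           event_count = event_count + 1
     WHERE shift = NEW.shift;
END;
""")

conn.executemany(
    "INSERT INTO production_events (shift, units) VALUES (?, ?)",
    [("day", 10), ("day", 5), ("night", 7)],
)


def shift_overview(connection):
    """Read the small summary table instead of scanning the event table."""
    cursor = connection.execute(
        "SELECT shift, total_units, event_count FROM shift_totals ORDER BY shift"
    )
    return cursor.fetchall()


overview = shift_overview(conn)
```

The trade-off is a little extra work on each insert in exchange for an overview query whose cost does not grow with the volume of collected data.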
... much of the world are those "complete pratts". My favorite is when they want to have a Production box slowed down through real time reporting to pull... you guessed it... performance numbers!
J.Ja
Maybe start some rock concerts on each continent to spread awareness? 
I de-scope reporting issues on all projects I do for the initial release. This way I can convince the client/users that the system will run smoothly before reporting is in. If I am not defensive on this issue, a lot of energy is wasted in discussions when the system is in production and the client focus is on 'bad code' instead of 'silly reporting requirements and realtime enterprise' discussions.
In my experience, developers implement reporting on the production hardware/environment because:
1. it can be done and is challenging to build (Bob the builder syndrome)
2. in the design phase there was not enough analysis done on the reporting requirements and the project budget 'dried up', so this analysis is never done, and there is no budget to offload the queries to a different DB server, but the reports need to be implemented ASAP.
And for speed optimizing in general, we all know the saying: 'premature optimization is the root of all evil' (Donald Knuth, via Code Complete)
Having said this, bad coding solutions still are bad coding solutions.
... that you are onto something. We need a big concert. Steve Ballmer could do an encore of his smash hit, "Developers, developers, developers!" and maybe even dance a bit. James Gosling could show off his covers of classic early 90's West Coast gangsta' rap. Maybe a guest appearance by Martin Fowler, doing his spoken word refactorings of famous 19th century poetry.
But seriously...
You are dead on right, reporting is something that often gets conveniently overlooked. For one thing, most of the developers I know despise it; the only task they will do reporting before is documentation. For another thing, there is always the assumption that reporting is "simple". The sales guys love to show the customers all of the pretty gas gauges and pie charts that they will get "in real time, showing you minute by minute how your company is running!" But then no one really defines the reports up front ("yeah, we need a yearly sales chart, and, uh, some other reports too, with blue pie charts!") and a week before the project is due, someone asks about the reports, and then chaos ensues.
Instead, reporting work needs to be treated almost as a separate but related project entirely, with its own timeline, goals, and requirements gathering.
Ah well.
J.Ja
I've had requests come in to "start tracking this kind of data".
One of my favorite questions in such an instance is "what does tracking this data give us?" or "why do we need to track it?"
I get a response along the lines of "so we can have a report to track how we are doing."
At that point I ask for a sample layout of the report. As you might suspect I generally get answers like "we don't know what we want yet" or "why do we need to define it now?"
My reply at this point is a straight-faced, "well, if you don't know what you want on the report, how can I know what fields I need to store the data you want reported?"
Fortunately for me, this usually drives home my point.
I'm a developer who has done a lot of report development (and some documentation). It's not my favorite thing, but it fits better with my skills than, oh, say help desk or data entry.
However, I insist on an understanding. If you want me to build a report, we'll talk about my schedule, etc. But if you want me to run the report daily (or any other schedule), we won't even talk.
As I see it, I'm a developer, not a clerk. Often the reports I develop are for other departments and in my view the department the report is built for should shoulder assignment of a resource to run the report. Otherwise, I see it as a form of poaching.
Additionally, there are very few developers in this company. The company can't afford to have these few developers spending their time running reports when there are so many other people who can hit a button on a form (or something similar) to run the reports and mail them. Especially when there are so many tools the company also wants built.
We are in the process of putting together a separate Reporting Database to pull data from a number of our systems. This combines data from various systems (in some cases merging data from multiple systems) and optimizes it for reports -- meaning highly de-normalized tables.
This started as a result of implementing a new system for project management (or manglement depending on your views) that stored data in a highly normalized way. It was so normalized that even the customization programmers working on this upgrade project had trouble finding certain values.
When talk turned to reporting from that system, all the customization programmers glanced at each other with barely controlled laughter. They said that even they have trouble finding any of the values in that database and the average employee would have no chance whatsoever of finding any useful data.
Despite the fact that this could sound a bit arrogant, it wasn't. It was simply a truthful statement. The structure in this product was so normalized that reporting from it would be pathetic.
If you wanted to get 5 pieces of data for one project out of that system, you'd have to have something on the order of 8 or so nested select statements in SQL code. Imagine what it would be in any kind of reporting tool! Pretty much every value you have for the project was a separate record in some table (depending on the value, it could be in one of about 50-75 tables). As an example, getting a person's name required about 6 separate records spread over something like 3 or 4 different tables. Total nightmare for reporting.
So, we suggested a new database that gets updated nightly, say 1:00 am or so. The data is de-normalized in order to make reporting easier for the average Joe/Jane.
When some initial responses came in that it wasn't really "live" data then, we pressed on why it needs to be live data. Those who raised the issue couldn't give us a solid answer that we were not able to shoot down.
We even pointed out that in some cases, reporting from the live database caused us problems (see below). Beyond that, we pointed out that our definition of certain reports to clients was effectively "what happened yesterday", not what is happening now. Running against the live database gives you current data, which is NOT what we promised to give. In fact, the nightly update better matches what we promised.
To help sell the matter, we even pointed out that there would be fewer record locks in the various "live" systems because fewer reports are pulled from their data. This would mean improved performance in the live systems. BTW, we also pointed out that the record locking issue has, at times, effectively shut down the live system when someone runs a long report.
We also mentioned that security for the live databases increases because fewer people would need to see the raw data or mess with it at all. Instead, they would go to the Reporting DB, which could be read-only. Even if someone hacked past the read-only and was able to change some data, that change would be overwritten the next time the update is done.
We are in the final stages of rolling out the Reporting DB. The main thing we have left to work out is the fact that the project management software is loading test data that they said they were going to delete. There is also a question about what values they are putting in some fields.
Funny, though, the Reporting DB combines data from the project system with data from other systems as well. The other systems are all loading data perfectly as far as we can tell.
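The nightly load described above could be sketched roughly as follows. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the live system and the Reporting DB, and the schema (project, attribute, project_value, project_report) is hypothetical. The load flattens the one-value-per-record layout into one wide row per project and rebuilds the reporting table from scratch each run.

```python
import sqlite3


def make_live_db():
    """Stand-in for the highly normalized live system (hypothetical schema)."""
    live = sqlite3.connect(":memory:")
    live.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project_value (project_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO project VALUES (1, 'P-001'), (2, 'P-002');
    INSERT INTO attribute VALUES (1, 'manager'), (2, 'status');
    INSERT INTO project_value VALUES
        (1, 1, 'Alice'), (1, 2, 'open'),
        (2, 1, 'Bob'),   (2, 2, 'closed');
    """)
    return live


# Pivot one-value-per-record rows into one wide row per project.
FLATTEN_SQL = """
SELECT p.code,
       MAX(CASE WHEN a.name = 'manager' THEN v.value END) AS manager,
       MAX(CASE WHEN a.name = 'status'  THEN v.value END) AS status
  FROM project p
  JOIN project_value v ON v.project_id = p.id
  JOIN attribute a     ON a.id = v.attribute_id
 GROUP BY p.code
 ORDER BY p.code
"""


def nightly_load(live, reporting):
    """Rebuild the de-normalized reporting table from the live data."""
    rows = live.execute(FLATTEN_SQL).fetchall()
    with reporting:  # commit on success, roll back on error
        reporting.execute("DROP TABLE IF EXISTS project_report")
        reporting.execute(
            "CREATE TABLE project_report "
            "(code TEXT PRIMARY KEY, manager TEXT, status TEXT)"
        )
        reporting.executemany(
            "INSERT INTO project_report VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

Because each run replaces the reporting table wholesale, any stray edit made on the reporting side disappears at the next load, which is the overwrite behavior mentioned above.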
... you work with a pretty good group of people! It is rare that an IT group is able to rationally talk with the business group and have their side of things considered and accepted. The solution that you present is just about ideal, but it is rare that you see it fully and/or properly implemented.
One thing too, on the note of reporting, is just how different the needs of an application writer and a report writer are at the database level. Sure, the reporter likes a de-normalized DB for speed and ease of use. But one thing I have seen a lot is that the report writer needs (or prefers) the dates to be stored as a reference to a corporate calendar table, so calculating quarterly numbers and such is easy. Meanwhile, the app writer really has no use for that the vast majority of the time, and will most likely not even think of storing a date as anything other than a date field.
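A small sketch of the calendar-table idea, assuming SQLite via Python's sqlite3 and hypothetical table names (calendar, sales). It uses plain calendar quarters; a corporate fiscal calendar would simply store different year and quarter values in the same table, which is the reason to keep a table at all.

```python
import sqlite3
from datetime import date, timedelta


def build_calendar(conn, start, end):
    """Create one calendar row per day with its year and quarter."""
    conn.execute(
        "CREATE TABLE calendar ("
        "day TEXT PRIMARY KEY, year INTEGER NOT NULL, quarter INTEGER NOT NULL)"
    )
    rows = []
    day = start
    while day <= end:
        # Assumption: calendar quarters (Jan-Mar = 1, Apr-Jun = 2, ...).
        rows.append((day.isoformat(), day.year, (day.month - 1) // 3 + 1))
        day += timedelta(days=1)
    conn.executemany("INSERT INTO calendar VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
build_calendar(conn, date(2024, 1, 1), date(2024, 12, 31))

# Reporting-side fact table: the date is a reference into the calendar.
conn.execute(
    "CREATE TABLE sales (sale_day TEXT REFERENCES calendar(day), amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-15", 100), ("2024-03-31", 50), ("2024-04-01", 25)],
)

# Quarterly totals become a plain join and GROUP BY, with no date arithmetic.
quarterly = conn.execute(
    """
    SELECT c.year, c.quarter, SUM(s.amount)
      FROM sales s
      JOIN calendar c ON c.day = s.sale_day
     GROUP BY c.year, c.quarter
     ORDER BY c.year, c.quarter
    """
).fetchall()
```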
J.Ja
Actually, I don't work for an IT department any more. Technically I work for a documentation department. But my supervisor is one who understands about cooperating with other departments. Plus, the new project management software would (hopefully) help our department get things done better/faster/easier/etc. when you look at the big picture.
To be honest, the IT department pretty much bowed out of the entire project management upgrade project -- at least as much as they could. They still had to handle some issues with the project management software under Citrix (for remote users) and such.
I was basically the architect for the Reporting DB. I got elected because I was the one who'd done the most work with the MS Access based client reports. I was also the person most skilled with running apps at night to perform updates, etc.
It also helped that I used to work for what passes as the IT department here, but was transferred to the Documentation department because that's where they put the web oversight. (Digressing a bit but the IT manager felt neither he nor his department should be responsible for web development or development in general.)
So far, we haven't had a need to store any "business quarters". Under current operating philosophy, we'd add a field to the appropriate table in the Reporting DB and it would be calculated by the processes that load up the data as a part of the load.
That's an additional bonus we found already - we can basically translate things, if needed, during the load process and also calculate some fields. For example, one of the items loaded is call ticket data and one of the calculated fields there is ticket age (from the time the ticket was opened to the current date).
Now that some people have seen it and begun doing some minor work with it, they love it. To them it is easy to use and much more convenient than the live system databases.
So far the worst complaint I had was along the lines of wanting a data dictionary. I put together a little VBA code in Access to read the SQL table structures and put that information into more SQL tables. I then went in and added some title/description type fields. Then I put a web page on the intranet to display the information out of the data dictionary tables. Oh, yes, I also have a report in the Access database that I export to an RTF file and post in a "knowledgebase" section of our network.
So, take your pick on how you want your data dictionary
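For anyone wanting to try the data dictionary idea, here is a rough sketch. It is not the VBA/Access code described above. It swaps in Python with SQLite, where the table structures come from sqlite_master and PRAGMA table_info, and the data_dictionary table and its columns are hypothetical names. The description column is left blank to be filled in by hand afterwards.

```python
import sqlite3


def build_data_dictionary(conn):
    """Record every user table's columns in a data_dictionary table."""
    conn.execute("DROP TABLE IF EXISTS data_dictionary")
    conn.execute(
        """
        CREATE TABLE data_dictionary (
            table_name  TEXT,
            position    INTEGER,
            column_name TEXT,
            column_type TEXT,
            description TEXT DEFAULT ''  -- filled in manually later
        )
        """
    )
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' "
            "AND name <> 'data_dictionary' "
            "ORDER BY name"
        )
    ]
    for table in tables:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for cid, col_name, col_type, *_ in columns:
            conn.execute(
                "INSERT INTO data_dictionary "
                "(table_name, position, column_name, column_type) "
                "VALUES (?, ?, ?, ?)",
                (table, cid, col_name, col_type),
            )
    conn.commit()


# Tiny demonstration against a made-up reporting table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE tickets (id INTEGER, opened TEXT)")
build_data_dictionary(demo)
dictionary_rows = demo.execute(
    "SELECT table_name, column_name, column_type "
    "FROM data_dictionary ORDER BY table_name, position"
).fetchall()
```

Once the rows sit in an ordinary table, a web page or a printed report is just another query against it.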
"Hi! We need to search all the data since the beginning of time, have the information in a crosstab format and have this in real time. Oh, and we want to be able to change parameters on the fly."
I have an interface that "fits your description", lol....
They just wouldn't listen, so I shut up and built them the damn thing. It is slow as hell and chokes every time they go beyond two months of data!
But hey, they "need" real time data....
Sometimes that's really the best you can do, deliver something dumb to an insistent customer, and let them be miserable with it.
J.Ja
... for one of my former employers? I swear I've heard that exact requirement specified at least a few times... 
J.Ja
I guess they just don't understand the workhorse power it takes to produce the reports they want and the impact it has on the system overall.
I go through this every time.
We use log shipping from our live server to a failover server. Because our data isn't super critical, we ship every 15 minutes. This gives us a read-only database that we can query for almost real time data; it is at most about 20 mins behind.
We went for this approach as ours is a global application, so there is no such thing as "overnight", there is no time when the servers are not being hit by people somewhere in the world. Also "yesterday" is a different set of hours to people in different time zones.
We are currently working on a plan to data mine out of the readonly database into a denormalised database for reporting and a few other long running queries.
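To make the "yesterday is a different set of hours" point concrete, here is a small sketch. It assumes fixed UTC offsets for simplicity, so it ignores daylight saving rules, and the function name is made up for illustration. It returns the UTC window that a report for "yesterday" would have to cover for a viewer at a given offset.

```python
from datetime import datetime, timedelta, timezone


def yesterday_utc_window(now_utc, utc_offset_hours):
    """Return (start, end) UTC instants bounding the viewer's local 'yesterday'.

    Assumption: the viewer sits at a fixed UTC offset (no DST handling).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(tz)
    # Midnight at the start of the viewer's current local day.
    local_midnight = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = local_midnight - timedelta(days=1)
    return start.astimezone(timezone.utc), local_midnight.astimezone(timezone.utc)


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
east_window = yesterday_utc_window(now, 9)    # viewer at UTC+9
west_window = yesterday_utc_window(now, -5)   # viewer at UTC-5
```

At the same instant, the two viewers need windows that start 14 hours apart, so a single nightly cutoff cannot represent "yesterday" for both.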
That's a good point about global companies. For a lot of folks, there really is no "overnight", so frequent, small syncs are the way to go.
J.Ja
I'm an old school programmer, having worried about how to fit my real-time 6502 assembly code into an Apple //e with 64 KB RAM. So, I'm conditioned to think about performance in terms of resource usage and speed of execution. It shocks me how young programmers have the mindset that anything they write will run fast enough without any consideration for performance. For example, one guy had an algorithm that took 3 hours to run. The algorithm needs to run about 50,000 times to process one day's data, which translated to 150,000 hours - not practical. I looked at the code and made several suggestions. Now the same algorithm takes 8 seconds X 50,000 = 111 hours. My suggestions were quite basic: pull calculations out of loops, use lookup tables where possible, etc. The truth is that performance does matter. You often don't need to wring out every last microsecond, but we should all make efforts to avoid wasteful coding.
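A toy illustration of those two suggestions, pulling a loop-invariant calculation out of the loop and using a lookup table. This is not the algorithm from the story, which isn't shown; the functions and the 0..255 input range are assumptions made up for the sketch. The last lines just redo the arithmetic quoted above.

```python
import math


def transform_naive(samples, gain_db):
    """Recomputes the same values on every pass through the loop."""
    result = []
    for s in samples:
        gain = 10 ** (gain_db / 20)         # loop-invariant, yet recomputed
        result.append(gain * math.sqrt(s))  # sqrt repeated for repeated inputs
    return result


def transform_tuned(samples, gain_db):
    """Same output: invariant hoisted, per-value work moved to a table."""
    gain = 10 ** (gain_db / 20)             # computed once, outside the loop
    # Assumption: samples are integers in 0..255, so 256 entries cover them.
    table = [gain * math.sqrt(v) for v in range(256)]
    return [table[s] for s in samples]


# The arithmetic from the post: 50,000 runs per day of data.
runs_per_day = 50_000
hours_before = 3 * runs_per_day             # 3 hours per run
hours_after = 8 * runs_per_day / 3600       # 8 seconds per run, in hours
```

The lookup table only pays off when the number of samples is large compared with the table size; for a handful of values the naive loop is just as good.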
I agree that getting farther away from the days of constrained memory and small hard drives has indeed led to a lot more slop in coding. It would be nice if CS courses made students spend a semester working on an old DOS machine, or a creaking mainframe, just to give them an appreciation of the value of a clock cycle or a meg of RAM.
J.Ja
I agree that CS programs need to teach more in Data Structures, Algorithm Design, and Design Optimization. But I can teach these classes using either .NET or J2SEE. While they are learning to do these courses, they would be required to use tools that are used in the real world to do memory profiling to find memory leaks, or other performance counters that would allow you to optimize an algorithm and the data structure that it uses.
While we are talking about courses, another course that is needed is Testing and Security. I'm not talking about the simple Black Box Testing that most developers do to prove that their code is correct, but rather a combination of Black Box, White Box, Performance and Stress Testing designed to break applications. First they would have to find out how to break the application, and then, to complete the problem, they must recommend the correction that is needed to resolve it. The other course is Security: how to design and build "Security In Depth" into an application.
But the problem in the IT Industry is that most practitioners are self-taught and have little if any college course work.
I agree that a lot of this stuff can be taught with managed code... but a lot of it cannot be taught with managed code. A data structures class, for example, is rather lacking when taught in a GC'ed language without pointers. I also agree that testing and security courses are needed as well. This is one reason why I favor breaking out education into a "trade school" track for people who want to be "real world programmers", and a "theory" track for people who want to do hardcore "computer science." Someone who wants to be a programmer would benefit highly from learning things like requirement gathering, testing/security, and so on, without being inundated with a lot of the theory intensive stuff that happens in a CS program. On the flip side, someone wanting to learn the intensive internal theory of the science (calculating the speed of sorting algorithms is my favorite example) would be on the theory track. It makes little sense to spend the time teaching a mainstream developer how to build a tree or graph, or how to write a sorting algorithm; they just need to trust the library written by the theory folks to use the best algorithm or structure for their needs. I think this will benefit the industry overall rather significantly.
J.Ja
I have a number of minor "performance tricks" that I tend to use all the time. These are generally small things, like what was outlined elsewhere. Things like pre-calculating your loop repetitions, etc. For most good programmers, I think these are second nature.
However, I also tend to balance sheer runtime speed against flexibility and maintainability. I generally look for a "happy medium" of these.
For some batch processing programs I do, I have a fair number of "settings" stored externally (either database, text file, ini file, etc.). I load the settings into memory once at the beginning of the program; they won't change for the run of the program.
This approach adds some flexibility without really affecting the performance of processing a few million records. In this case the benefit is that the programmer doesn't have to recompile anything.
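A minimal sketch of that load-once pattern, assuming an ini-style settings source read with Python's configparser. The section and key names are invented for the example, and the settings text is inlined here only so the sketch is self-contained; in practice it would come from a file or a database.

```python
import configparser

# Stand-in for an external ini file (hypothetical section and keys).
SETTINGS_TEXT = """
[batch]
tax_rate = 0.08
min_amount = 10
"""


def load_settings(text):
    """Parse the settings once, at program start, into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["batch"]
    return {
        "tax_rate": section.getfloat("tax_rate"),
        "min_amount": section.getint("min_amount"),
    }


def process(records, settings):
    """Process every record using the in-memory settings only."""
    tax_rate = settings["tax_rate"]      # looked up once, not per record
    min_amount = settings["min_amount"]
    return [
        round(amount * (1 + tax_rate), 2)
        for amount in records
        if amount >= min_amount
    ]


settings = load_settings(SETTINGS_TEXT)
processed = process([5, 10, 100], settings)
```

Changing the tax rate or the threshold then means editing the settings source, not the program, while the per-record loop never touches the file or database again.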
So, while I value performance, I tend to see it as one part of a larger picture. I've had times where the only way to do something is some specific way that utterly kills performance. I was not happy about having to do it and said so, but I did it.
Imagine two data servers in two different network domains. From where this program would run, I could see both servers. I had to pull a set of records from server A and for each record go pull another set of records from server B. No -- the servers were not allowed to talk to each other due to network security considerations. So even though they were MS SQL Servers, you couldn't have one use an external reference to the other and use a join in SQL.
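The shape of that situation can be sketched as below, with two separate in-memory SQLite connections standing in for the two servers that cannot see each other. Everything here (customers, orders, the function names) is hypothetical. The first function is the one-query-per-record pattern described above. The second is one possible alternative, batching keys into IN lists from the client side, which may or may not have been workable under the real constraints.

```python
import sqlite3
from collections import defaultdict


def make_servers():
    """Two unrelated databases; only the client can reach both."""
    server_a = sqlite3.connect(":memory:")
    server_b = sqlite3.connect(":memory:")
    server_a.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    """)
    server_b.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
        INSERT INTO orders VALUES (10, 1, 100), (11, 1, 250), (12, 2, 75);
    """)
    return server_a, server_b


def per_record(server_a, server_b):
    """One round trip to server B for every row pulled from server A."""
    result = {}
    query_a = "SELECT id, name FROM customers ORDER BY id"
    for cust_id, name in server_a.execute(query_a).fetchall():
        rows = server_b.execute(
            "SELECT total FROM orders WHERE customer_id = ? ORDER BY id",
            (cust_id,),
        ).fetchall()
        result[name] = [total for (total,) in rows]
    return result


def batched(server_a, server_b, chunk_size=500):
    """Fewer round trips: send keys to server B in chunks, join in memory."""
    customers = server_a.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    ids = [cust_id for cust_id, _ in customers]
    totals = defaultdict(list)
    for i in range(0, len(ids), chunk_size):
        chunk = ids[i:i + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        query = (
            "SELECT customer_id, total FROM orders "
            f"WHERE customer_id IN ({placeholders}) ORDER BY id"
        )
        for cust_id, total in server_b.execute(query, chunk):
            totals[cust_id].append(total)
    return {name: totals[cust_id] for cust_id, name in customers}
```

Both functions return the same mapping; the difference is only in how many times the client has to cross the network to server B.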
At least you knew it was wrong, amazing how many people don't.
Usually some 'expert' who finds VB easier to write than SQL.
Actually, I use VB quite a bit. But I also recognize that if I can let the server do some of the work for me, it will likely be faster.
To me, I use whatever I can use to make it work well. "Well" being a rather nebulous term at times (like in the situation I described above).
SQL doesn't handle presentations too well, nor does it handle web pages all that well. So, I use VB or VB.NET (depending on if I work on legacy traditional ASP or new pages). But when it comes to getting data, I do as much as I can with SQL rather than VB. In my opinion, that's why SQL exists.
Yes, I've done some work in C#, but I haven't learned it sufficiently to be truly comfortable writing it yet.
When necessary, I'll even use VBA. It all just depends on what I'm trying to accomplish and which set of tools better meshes with the big-picture goal.
but one at the cookie cutter, once wrote a macro solution developers I end up sweeping up after.
Functionally, VB (.NET), certainly with Orcas, is much of a muchness compared to C#; some of the twiddles in it to make development more 'approachable' leave a bad taste in my mouth.
Modern 'compiled' languages other than VB tend to be far stricter compiler-wise, potentially giving rise to better code, if you know what you are doing. VB's principal appeal is being able to bash something together that looks OK without any real formal study.
I was at TechEd (Barcelona) last year; the 4th (highest) level course on writing DB applications was principally about not dragging all your data into the client and then processing a bit of it.
I couldn't understand sending anyone there, if they didn't know that already, but there you go.
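For anyone who has not seen the difference spelled out, here is a rough sketch using Python's built-in `sqlite3` module. The `orders` table and both function names are invented for the example; the point is only where the filtering and summing happen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 10.0), ("acme", 15.0), ("globex", 7.5)],
)

def total_client_side(conn, customer):
    """Drag every row to the client, then use only a few of them."""
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    return sum(amount for name, amount in rows if name == customer)

def total_server_side(conn, customer):
    """Let the database filter and aggregate; one row comes back."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]
```

Both return the same number on three rows. On a large table the first version moves every row across the wire to get it.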
Don't worry, I didn't take any offense.
I was just trying to point out a couple of things.
1 - that SQL exists for working with data and it should be used to its maximum when doing so
2 - in my opinion, it helps to match the tool you use to the goal you are trying to accomplish
You touch on an issue that I have with some training stuff. Much of what I find for general development (or general database usage) is woefully basic. It would be good to teach someone in high school, but not so helpful for expert developers.
I'm not sure what can be done about it other than to stick with classes focused on specific technologies (or products) you are trying to learn and hope for the best. If you can see the course syllabus, that helps, of course.
but they are SQL, foolishly assuming that another person has created the table, done the constraints and popped in some indexes.
You could teach the basics of what a developer must know about databases in a week.
... that I am talking about! I have seen applications that will hit the DB on every page view to find out where the logo belongs. Indeed, PHP and CGI programs are especially bad about stuff like this, because they lack the concept of persistence and shared storage at the application level. At best, you load this type of thing at the beginning of the session and cache it there.
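A small sketch of the alternative, assuming a long-running process that can hold state between page views. `query_layout_setting` is a stand-in for the real database call and `db_hits` only exists to count how often it runs; both are hypothetical.

```python
from functools import lru_cache

db_hits = 0  # counts simulated database round trips

def query_layout_setting(name):
    """Stand-in for a real database lookup (hypothetical)."""
    global db_hits
    db_hits += 1
    return {"logo_position": "top-left"}[name]

@lru_cache(maxsize=None)
def layout_setting(name):
    """First call goes to the 'database'; later calls are served from memory."""
    return query_layout_setting(name)

# Simulate 1000 page views that all need the same setting.
for _ in range(1000):
    position = layout_setting("logo_position")
```

If the setting can change while the process is running, the cache needs an expiry or an explicit `layout_setting.cache_clear()`.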
J.Ja
To be fair, I never really figured out what the deal was for this refusal. I suspect that one data server was in a web DMZ and another was on the "Inside Network". The security guys were probably worried about opening up a route for hackers to use or something like that.
So, maybe they had some half-logical reason. But I still am not sure why the Inside server couldn't pull data from the DMZ (if that was indeed the situation).
Either way, this lack of communication between them forced me to use code I hated writing. Even thinking about it now still gets my blood pressure up.
When I first joined the company I work for and was presented with our websites' code, I almost resigned on the spot, lol....
I can't tell you how many times I have suggested refactoring in our applications only to have my recommendations thrown out the window, just like you mentioned.
Just so you get a feeling of what I am talking about, the person who designed our corporate website was an Access Report maker [believe it or not] who was "promoted" to developer. The code is very sloppy and all over the place. There is no use of stored procedures and there are SQL statements embedded everywhere in the site. No standards whatsoever as far as coding (or anything really), multiple redundant calls to the database, no caching on an almost static line of products, etc, etc.
I actually made a suggestion to REDESIGN the entire thing as doom is imminent with the current design, but they refuse to spend any time for "academic purposes". At least that is what they call my efforts for improvement.
I know, I know, what the hell am I doing here?... oh well, it is close to my house, I am tired of commuting to N.Y. and I had enough of the corporate bureaucracy!
We have a very big server to handle
a very big overnight process.
the rest of the day it is practically
idle.
The applications I write are used during
the day.
User experience is far more important.
premature optimisation is the root of all
evil.
"premature optimisation is the root of all
evil."
I have always thought that far too many people used that sentence as a cop out. There is a difference (a huge one) between an "optimization" and "good coding practice". Many developers code as if they get paid based on the WPM they code at, not the results of what they type. They sit down and bang away as if their life depended upon it, but won't take the time to think about what they are doing.
The optimization quote is pretty specific, in terms of meaning, "don't sit there trying to achieve the most perfectly performing code until the application at least works right."
You are right, user experience is extremely important... don't you think that performance plays a large role in that? Study after study shows that after 10 seconds, a Web user is ready to click over to another page. I would wager that thanks to the slow demise of dialup, users are even more impatient. Not to mention that if your service or application is sold with a performance SLA, a 10% speed boost = a 10% reduction in cost to meet SLA.
Now, to work my way into a bonus plan where meeting SLA triggers a bonus.
J.Ja
Good thread behaviour and short
database transactions will have more
beneficial impact than unrolled loops
and efficient sort routines.
Then again unrolling loops is usually a complete waste of time and will just
create messier code, as almost any
commercial grade compiler will do
things like that anyway.
if my process is in the middle of a job
that is the heaviest usage of the machine
then i will spend a long time thinking about how it will impact the system.
if i have very high performance requirements
I will think very hard about how to meet
them.
However if the decision is between a web
page refresh being 300ms faster or supporting the systems being easier.
I will advocate the second every time.
In your discussion, you talked about how much cheaper coding for performance is. Certainly, code that is outrageously slow is quite expensive. However, my experience has taught me that one of the most expensive things developers can do is optimize too early and too often.
Optimizing code almost always means making code more difficult to read and more difficult to maintain. It often means data storage becomes less efficient. It also usually means an increase in the number of defects in a piece of code for those reasons.
As a product manager trying to get a project out the door, quality is of topmost importance to me. Optimizing "for fun" or for some arbitrary sense of obligation is antithetical to that goal.
On the flip side, there are certainly reasons to do performance optimizations, but those should be driven by hard requirements. "This function needs to take less than 'n' seconds" or "The system needs to scale to X users in this hardware configuration."
At coding time, those requirements guide design decisions to assist with possible optimization later. After coding is complete, you perform analysis of the system to find out where the bottlenecks are, and you optimize only to the degree you have to.
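As a rough illustration of the "analyze first, then optimize" step, here is a sketch using Python's standard `cProfile` and `pstats` modules. The functions `slow_part`, `fast_part` and `run` are invented placeholders for real application code.

```python
import cProfile
import io
import pstats

def slow_part(n):
    """Deliberately heavy work, standing in for a real bottleneck."""
    return sum(i * i for i in range(n))

def fast_part(n):
    """Cheap work that is not worth optimizing."""
    return n

def run():
    return slow_part(200_000) + fast_part(10)

# Profile one run of the finished code.
profiler = cProfile.Profile()
profiler.enable()
result = run()
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

Printing `report` shows where the time goes, so the effort lands on `slow_part` and stops once the stated requirement is met.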
tj
I think that poor performance can be mitigated by making intelligent choices at coding time. You don't have to unroll loops or code in assembler. Just don't recalculate the same value in a loop, choose a sort algorithm appropriate for the number of items, or pick a data structure appropriate for the application's access needs. This doesn't have to be hard or time consuming. It can be done at no cost, and can even save time when done well.
As I have said elsewhere in these comments, I agree that early "optimization" tends to be a mess. But there are smart coding habits and common sense patterns which a lot of programmers just fail to know about entirely. These are the "sweet spot" of performance. Get things like looping fast, get things like passing by reference and passing by value right (and know when to do each!), and so on, and I find that performance is less slow than it could be. For example, when I see a developer create a 200 KB dataset and then start passing it by value as a parameter all over the place, I know that there is a serious performance problem looming, particularly in OO code where you have no clue if one parameter will work through 10 objects and be replicated 10 times on the call stack. But most developers do not think this way, and the Web app that runs great on their desktop or in the test environment suddenly collapses under load, either (hopefully) in the load testing phase (where the need to boost performance just causes a missed deadline) or worse, in Production (where the performance problems require gobs of hardware to deal with).
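To make "pick a data structure appropriate for the access needs" concrete, here is a small Python sketch. The function names are made up; the only difference between the two is how membership is checked.

```python
def find_known_slow(candidates, known_list):
    """Each 'in' test scans the list, so the work grows with both inputs."""
    return [c for c in candidates if c in known_list]

def find_known_fast(candidates, known_list):
    """Build a set once; each membership test is then a hash lookup."""
    known = set(known_list)
    return [c for c in candidates if c in known]

known_ids = list(range(0, 1000, 2))   # even numbers below 1000
wanted = [3, 4, 998, 999]
```

Same result, same readability, no extra coding time. The gap only shows once the lists get large, which is usually in Production.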
J.Ja
Hell yes we care about performance, especially developing mainly for the Pocket PC platform.
While most of us are writing for big servers or powerful desktop machines, there are still plenty of folks writing code for devices where battery life is more important than speed, and performance really makes a difference there!
J.Ja
Re: the "bean counters" don't demand efficiency in the server room
My guess is you've already put your finger on this: "bean counters" can relate more to hardware than to software. Notice that they pay attention to the cost of the monitor, keyboard, mouse, etc. My guess is they also pay attention to the per unit costs of the server room, buying cheap boxes, though ones that are performant enough to run the tasks. My guess is they were convinced a while back that managed code was the way to go for web apps., so they're sticking with that, and they're just figuring that the hardware cost goes with the territory. As for efficient vs. inefficient software, my guess is they don't have a clue what the difference is.
As I've often said before with customers, who do have their performance requirements, they don't care how an app. is engineered, just so long as it works.
It's not just the "bean counters". I had a boss once years ago, who was (and is) a professional developer (still know the guy), who seemed to insist that we write as much as possible in-house, even though at the pay rate we were getting, buying a pre-packaged solution that did basically the same thing would've been quite a bit cheaper. I suppose one of his criteria was performance. We were typically able to produce our own stuff that ran faster and took up less disk space and memory than the commercial equivalent, because we were able to target it at exactly what we wanted, rather than the "kitchen sink" approach of a commercial solution. The problem was the stuff we wrote had to be debugged more. So, he cared about performance, just not the cost to get it.
Speed of development, cost, and quality, sounds like that manager preferred quality!
Managed code for Web apps makes sense, believe it or not. The cost of letting some shake-n-bake programmer deal with pointers and bounds checking on a server is much more than the cost of managed code. For a desktop app, it shows up as the occasional seg fault. On a server, it can take down a whole box. Unwholesome. As slow as managed code can be, the sad fact is, most developers out there are unable to write code at a high enough quality to be using anything else in that environment.
That being said, a bad developer can make native code slow or managed code even slower. That's where the focus needs to be, showing that a good code review + refactoring pays for itself with low risk. Gotta speak manager speak to convince managers and all of that.
J.Ja
For web apps. I think managed code is preferable to native code. The stuff you have to deal with as a web developer is complex as it is. It doesn't need pointers to make it more complicated.
You asked why management doesn't pay more attention to the efficiency of the software. That's mainly what I was addressing.
I wonder what the best way is to go about improving this. It could just be as simple as writing up a "coding standards" document that talks about efficiency practices, and distribute that to the software crew. I know it can take a while to write one of those up. I did that once years ago. So I'm not saying it's a small task. I think the best management could do would be to be more picky about which programmers they hire.
There are two types of performance:
1) Speed of development
2) Speed of execution
Of course, you don't want to write junk just to ship the product but you must ship and spending a lot of extra time to squeeze a 10% performance gain (that won't be noticed/appreciated) is not a good investment.
That said, my current project is very performance sensitive. Even a 5% gain in performance is worth spending a few days implementing. It runs on desktops for a targeted user base.
So, we just need to keep in mind what is important in each project and let's not waste efforts where they are not needed or wanted.
John -
I agree with your point on that 10% gain... sometimes you hit a wall, and that 10% gain can take almost as long to achieve as writing the code in the first place.
That being said, the vast majority of the code I have read in my time written by many coders (including myself, embarrassingly enough) was half as fast as it should be, due to laziness, sloppiness, or ignorance. That's the stuff I mean. It does not take a genius to calculate the upper bound of a loop in advance, to avoid recalculation on each iteration, but guess what? Most developers don't do it, and they think they are being clever and efficient because they are saving 1 integer's worth of memory. Stuff like that. The low hanging fruit is low risk and easy to pluck, but most folks out there are not doing it.
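Here is the loop bound example as a quick Python sketch. `upper_bound` is a hypothetical stand-in for whatever calculation produces the limit, and the `calls` counter is only there to show how often it runs.

```python
calls = {"count": 0}  # how many times the bound was calculated

def upper_bound(items):
    """Stand-in for a bound that is costly to work out (hypothetical)."""
    calls["count"] += 1
    return len(items) - 1

def sum_recomputed(items):
    """Recalculates the bound on every pass through the loop condition."""
    total, i = 0, 0
    while i <= upper_bound(items):
        total += items[i]
        i += 1
    return total

def sum_hoisted(items):
    """Calculates the bound once, at the cost of one extra local variable."""
    total, i = 0, 0
    limit = upper_bound(items)
    while i <= limit:
        total += items[i]
        i += 1
    return total
```

For a five item list the first version works out the bound six times and the second does it once. This is only safe when the loop body does not change the bound.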
J.Ja
Well I think you are right, some people will argue that development time is more important and sometimes refactoring takes a long time. However, if you build with performance in mind from the beginning then this really doesn't become an issue. I was really hoping this article would show me something I didn't know, but caching is old news to me and I do not build a website without it. I have also done presentations on caching at .NET user groups. It is a very powerful tool; however, it does have some gotchas. Another simple performance booster is StringBuilder for any time you are building up strings and doing a lot of concatenations. Heck, using Regular Expressions and so on as well. There are a ton of things you can do to improve performance.
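StringBuilder is a .NET class; the nearest Python habit is to collect the pieces in a list and join them once at the end. A sketch with invented function names, not a benchmark:

```python
def build_report_concat(lines):
    """Repeated concatenation: each += may copy the string built so far."""
    out = ""
    for line in lines:
        out += line + "\n"
    return out

def build_report_join(lines):
    """Collect the pieces and join once, the rough analog of StringBuilder."""
    parts = []
    for line in lines:
        parts.append(line)
        parts.append("\n")
    return "".join(parts)

sample = ["header", "row 1", "row 2"]
```

Both produce identical output. How much the second saves depends on the runtime and the number of pieces, so it is worth measuring before claiming a win.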
Oh yeah, and caching data on the server, if you cache a DataTable or a DataSet: you do know you can set up indexes and keys on those internal DataTable structures, don't you? Even primary and foreign key relationships inside your cached DataSet. We all know that setting up indexing correctly in a database speeds things up. Well, same thing on a DataTable or DataSet.
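The same idea outside .NET, sketched in Python with plain dictionaries standing in for DataTable keys and indexes. The product rows and field names are made up for the example.

```python
from collections import defaultdict

# Hypothetical cached rows, as they might come back from one query.
cached_products = [
    {"id": 1, "name": "widget", "category": "tools"},
    {"id": 2, "name": "gadget", "category": "tools"},
    {"id": 3, "name": "doohickey", "category": "toys"},
]

# "Primary key" index: one row per id.
products_by_id = {row["id"]: row for row in cached_products}

# "Non-unique" index: many rows per category.
products_by_category = defaultdict(list)
for row in cached_products:
    products_by_category[row["category"]].append(row)

def names_in_category(category):
    """Look rows up through the index instead of scanning the whole cache."""
    return [row["name"] for row in products_by_category.get(category, [])]
```

Building the indexes costs one pass over the cached rows, so it only pays off when the cache is read many times between refreshes.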
Performance does matter. People who do not program with performance and security in mind right from the beginning are not really what I would call programmers; I would call them tinkerers.
reporting, performance and security often are not in 'right from the beginning' .
Tight deadlines and focus on 'visible requirements' are !
Let's picture the following scenario:
- you work at a commercial custom solution IT shop and are pitching for, let's say, a 'web application'
- in your proposal you take reporting, security and performance into account
- a competitor who puts in his bid does not, and is therefore a lot cheaper
- because your client can see the quality in your bid [if you are lucky; often clients do not know or care and just want a 'working solution'], he asks you to modify your proposal without these requirements
(we do not need these , we can manage without them and fix it when we need it on a later date)
How much time will you (I know depends on scale of project etc.) deduct from your offer to still get the job? Or will you refuse to cut? (convincing the client is not an option)
And if you have cut the proposal, you will need a lot of restraint from the programmer not to implement the 'better code'.....
My opinion:
- application/system design should never be affected by the lack of these 'soft' requirements and implementation afterwards should never be a problem. But it is a hell of a message to bring across to good programmers 'NOT to implement'!
To be frankly honest, if you are pulling so many results that indexing a pulled dataset will speed things up more than the indexing takes to perform, you may want to start asking yourself the following questions:
* Why am I pulling so many rows? Can I truly use this many rows at one time? Would I be better served by paging through the data instead, or pulling a more restrictive query?
* Do I really want to cache 10,000 records on my app server?
* Does this really fit into my architecture?
On the very rare occasions that I had a result set come out to enough records where I might want to consider a cache or an index on the result set, it usually was a requirement from the customer that did not make sense. Where it does make sense is pulling a result set that you may wish to run subqueries against, which is a fairly infrequent scenario. And even then, the damage of caching a big result set (particularly in an environment where cached objects get shared between servers) has to be carefully weighed against the advantages of not going back to the DB.
For configuration and such? Great idea. But for the results of a user's queries? Not so sure.
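To make the paging question above concrete, here is a minimal sketch of fetching one page at a time instead of pulling (and caching) the whole result set. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the `customers` table, its columns, and the helper names `make_demo_db`, `fetch_page` and `iter_pages` are all made up for this example, and a real app would use whatever its own DB and driver offer.

```python
import sqlite3


def make_demo_db(n_rows=10_000):
    """Build a throwaway in-memory table (hypothetical schema, demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (name) VALUES (?)",
        ((f"customer-{i}",) for i in range(n_rows)),
    )
    return conn


def fetch_page(conn, last_id=0, page_size=50):
    """Return up to page_size rows whose id is greater than last_id."""
    cur = conn.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    )
    return cur.fetchall()


def iter_pages(conn, page_size=50):
    """Yield pages one at a time; only one page is held in memory."""
    last_id = 0
    while True:
        page = fetch_page(conn, last_id, page_size)
        if not page:
            break
        yield page
        # Remember the last key seen so the next query starts after it.
        last_id = page[-1][0]


if __name__ == "__main__":
    conn = make_demo_db()
    first_page = next(iter_pages(conn, page_size=50))
    print(len(first_page), "rows in the first page")
```

The app server only ever holds `page_size` rows per request, and the primary key does the work of finding the next page, so nothing has to be indexed or cached after the fact.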
J.Ja
Performance optimization gets a bad name because there are plenty of developers who do stupid things in the name of performance.
Tools can help. For Java, IntelliJ IDEA has inspections you can turn on for performance problems. As you type your code, IntelliJ will put warnings on performance issues. IntelliJ then gives you the option to automatically change your code to a form that doesn't create an unnecessary object, or takes advantage of more efficient standard library APIs.
If more tools could do this, every developer could make little performance improvements to all their code with no loss of readability or horrid transformations of their code.
A good tool, while not finding every little trick, is indeed pretty helpful. Of course, profile the app before and after, just to make sure that the tool is sane within your particular codebase.
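As a rough sketch of what "profile before and after" can look like, here is a tiny timing harness. It is written in Python with the standard timeit module rather than Java, purely for illustration, and the two string-building functions are hypothetical stand-ins for "the code as written" and "the code after the tool's suggested rewrite".

```python
import timeit


def build_concat(n):
    """Original form: build a string by repeated concatenation."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def build_join(n):
    """Suggested rewrite: build the same string with a single join."""
    return "".join(str(i) for i in range(n))


def compare(n=10_000, repeat=5, number=20):
    """Time both versions and return (before, after) in seconds."""
    # First confirm the rewrite did not change behaviour.
    assert build_concat(n) == build_join(n)
    # Take the best of several runs to reduce noise from the machine.
    before = min(timeit.repeat(lambda: build_concat(n), repeat=repeat, number=number))
    after = min(timeit.repeat(lambda: build_join(n), repeat=repeat, number=number))
    return before, after


if __name__ == "__main__":
    before, after = compare()
    print(f"before: {before:.4f}s  after: {after:.4f}s")
```

The rewrite may or may not come out ahead on a given runtime and data size, which is the whole reason to measure instead of trusting the suggestion blindly.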
J.Ja
What I want to hear about is what you discovered about the caching options in the .Net Application object! How about a column on what you found out?
Steve G.