Discussion on:

18
Comments

Join the conversation!

Follow via:
RSS
Email Alert
0 Votes
+ -
Select COUNT(*) DOES NOT do full table scan if table has a primary key. It counts the Index. Sometimes indexes may be 50MB for a 500MB table and Select COUNT(*) can be very fast! DB optimizition needs knowledge of programmer to get best out of the CBO - cost based optimizer. Backend JOINs and filters are often better than frontend logic as you minimize DB traffic. In all RDBMS good use of stored procs and temporary tables that move logic inside the DB server may improve performance. Extractall the "Horse Power" from the RDBMS engine.
http://www.sql-server-performance.com

Went on a 2005 course recently, there's some good stuff here. Trainer passed it on.
A couple of points the reason to try to avoid cursors on sql server is it's optimised for set based operations. Cursors being for logical operations. There will still be operations where overall performance of an operation is better with cursors than selecting loads of data and then processing on a client. If you can morph them into a set based operation, perhaps using temporary tables or even triggers, that will work better than using cursors.


This DBA who suggested that the developer should do his optimisation by himself, sack him.

He's not doing his job. When I've got my DBA hat on, I don't trust an application devloper to get database anything right. Even if he did, he 's looking at an isolated operation on an unloaded and usually low volume system. All that is, is a syntax check.
Using "SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid 2" can and does produce different values.

I never heard of this and when I tried it out, the first two tables I used it on the values of count and your query produced different results.
if exists (select * from departments where dept='HQ')

Some SQL developers use count(*) which is a more expensive call than checking for existence.

Exists can also be used in a where clause:

select deparment_cd from departments dept
where exists(select * from employees emp
where emp.deparment_cd = dept.deparment_cd)

instead of using count(*)

select deparment_cd from departments dept
where 0 (select count(*) from employees emp
where emp.deparment_cd = dept.deparment_cd)

Note: Using aliases makes the code more readable.
0 Votes
+ -
Arrrgh !!!
Tony Hopkinson 27th Sep 2006
Using count for existance?

Bet they use the recordcount property on the dataset as well.

If (MydataSet.RecordCount) > 0 Then
...

Training Requirement !!!!
"Indexes are your friend" can get you in trouble in a high-transaction application. Avoid using too many indexes or even clustered indexes on tables where you expect to insert lots of transactions. However, reporting application will greatly benefit from adding more indexes (when used).
0 Votes
+ -
Too right
Tony Hopkinson 27th Sep 2006
If you are doing high volume intensive, you want in and out and quick.

Never yet found a need to drop any unique constraint, but extra indexes, triggers anything that will slow it down particularly if you've got the potential for locks should be done away with.
Some good info, but sloppy writing. For example, "While the asterisk (*) makes it easy to return all column values returned by a query, you should only return the necessary rows" -- should say only return the necessary COLUMNS, not rows. It is true in most RDBMSs that specifying columns by name makes queries faster, and returning only the columns needed reduces the size of the dataset returned.
The author is correct about overuse of UNIQUE or DISTINCT keywords. I've been guilty of using DISTINCT in multi table join queries as a general rule just to prevent returning duplicates or to prevent Cartesian products, but later had to go back over my SQL code and remove those where unnecessary so that the queries ran faster (and the DBMS engine was loaded down less).
As for WHERE vs HAVING, there may be times when it is useful to limit the records before being grouped by using a WHERE clause, but adding a HAVING clause for specific purposes. One might have to experiment back and forth.
I'm not sure if this is true in SQL Server, since I work mostly with Oracle these days, but I'f learned that the order of the conditions in a WHERE clause can affect optimization. As optimization engines get smarter, this becomes less so, but I've developed the practice to put the most limiting conditions first (and especially to leave conditions that include things like functions to the last). I'd be interested to hear peoples' opinions and experience on that.
0 Votes
+ -
engine will optimise the statement based on statistics after it's built some up it should smooth out.
On of the big initial failures of the db upgrade to 2005 is it made an arse out of the stats and 2005's performance was shocking.

I order mine automatically as well, after all I know the data better than the server as well.
indexes, then by most limiting.

membership (Ins) functions , then subqueries if I'm real unlucky

Can't rember this guy saying anything about sorts either, they can kill you.
0 Votes
+ -
Where Clause
gsquared 28th Sep 2006
Order of constraints in the Where Clause in SQL Server 2000/2005 has no effect on the performance of the query.

The optimization engine looks for indexes it can use, finds the tightest one, and starts from there. Doesn't matter if it's your first clause or your last.

Subqueries are resolved before outer queries, unless they are correlated subqueries (reference a value in the outer query in their own where statement), in which case they are run once per row in the outer query after the outer query is run.

So you can get improved performance in your where statement if at least one of the clauses includes a well-indexed subquery or an index on the primary table being queried, but the sequence doesn't matter.
0 Votes
+ -
Although guidelines are nice, one should always base performance tuning on measurements rather than rules. Off the top of my head, I have seen examples counter to at least two of the listed suggestions.

1) Select(*) may be more network efficient. If there are a lot of columns specified, the select query that is sent to the database can be quite long. Yes, the return response will contain all of the column names plus all of the data, but it may still optimize the time to get the first row of data displayed. Measured and verified this one with EtherPeek for queries over a 56 KBaud line.

2) Straight SQL may be just as fast as stored procedures. I can only speak to Oracle here, but Oracle caches the compiled version of queries, so you only pay the compile penalty once, and the cache is quite big. Unless there is a measurable performance benefit otherwise, I prefer to keep the SQL with the main code. It saves on coordination and configuration efforts if the code versions and the database versions are not tightly coupled. I've seen too much time wasted rewriting queries to procedures only to find no benefit and also seen too muchtime wasted with developers sitting on their hands waiting for the DBA to get to their individual updates.

In general, never undertake additional effort for a performance enhancement without a valid measure. Make sure you have a DBA or someone highly familiar with the database tools before attempting any performance tuning.
0 Votes
+ -
Whether to use the server or the workstation to crunch data is a more complex question than you might think. The article says let the server do it, which is often true, but not always.

For example: You have a server cluster that is running at 80% CPU use on average, 90% page file use, and it's hitting the hard drive for over 50% of all data calls. You have a workstation that's running at 3% CPU use and 50% RAM use. By all means, have the workstation do the crunching! Pull the raw data from the server, then parse, modify, tabulate, spindle, mutilate and fold locally.

On the other hand, if the workstations most people are using are the rock-bottom minimum that procurement can find, while the servers are top of the line, then the workload should be on the servers.

On the other hand, I never, ever leave data integrity up to the workstation. Referential integrity, ACIDity of transactions, etc., I keep in the database. If you don't enforce those there, you might as well not enforce them at all.

So it's a ballancing act, based on hardware availability and average load.

Also, network constraints might make a difference in this. Just as with select *, there are options and considerations to take into account.

If, for example, you are selecting from a wide table (lots of columns) that has primarily numeric data types, and you only want one row, but you need all/most of the columns, your overall performance from select top 1 * ... will probably be better than selecting each column by name. Why? Because the select is a smaller network packet, also it allows the server to grab the data in native sequence instead of forcing it to grab in the sequence you specify, and the return network packet will be about the same size regardless of which method you use. So, the work on the network will be less overall, and that might make a difference in some cases.

On the other hand, the author is right in the vast majority of cases on the subject of letting the server do the work.

(Though select count(*) from table doesn't require a full table scan. It just scans the primary key index in most cases, and if it doesn't have one of those, it scans the smallest index it has. If no indexes, it still doesn't scan all columns in the table even then. But it will be faster with a primary key.)
As always swings and roundabouts, but if you were talking 3 minutes with an hourglass cursor in a gui vs pulling the data down and four minutes with a progress bar, the latter might be a better choice anyway in terms of usability. At least they know they've got time to make a brew.

Shaving minutes off a query once you've done what SQL optimisation you can could cost a lot of money.
0 Votes
+ -
Client vs Server
gsquared 29th Sep 2006
Again, true in most cases.

But what if we're talking about something that will lock up a workstation for 3 minutes or something that will lock up the whole server for 2 minutes? In a multi-user environment, sorry, the guy who needs the data is going to sit for 3 minutes so I can avoid making everyone on the network sit for 2.

A real example? Heavy data analysis. If loading a large recordset into Excel and then having the user pivot it and parse it is an option, I'll often go for that over having my OLTP server tied up every time the user wants to move a column to a row in an analysis cube.

Ideally, of course, analysis should be on a seperate server with a real OLAP database and a lot of warehoused, precalculated data. Then the OLTP server won't be bogged down when management needs to do heavy crunching.

In a small/medium business, such a practice is often not financially feasible. A solution is dump the data to the client in Excel, and let them use the pivot table abilites of that program to analyze the data. Sure, that individual workstation is going to get a lot of hourglass time. Better that the one or two managers doing the analysis get that, than that the whole business has to wait every time they want to change the relationships of what they're looking at.

That's the kind of situation I'm talking about when I say sometimes it's better to let the client do the crunching.

Again, in an ideal situation, you have a separate OLAP server with good warehousing practices, on a separate RAID array, separate CPU(s), separate RAM, etc. Even better yet if it can be on a separate Web/Intranet server for whatever front end you're using. No performance hit on the OLTP server, so no slow-downs for order entry, CRM, etc., while having fast analysis cube service. But, as mentioned, that's not always possible.

Outside of exception situations outlined above, yes, I have most/all of the work done on the server, because it's faster there and takes advantage of the database engine's ability to handle sets of data more efficiently than the front end (OOP) can.

Cursor type action, on the other hand, can often be done better by the client than the server. Send a dataset to the client and have it step through and do one-by-one manipulations. SQL server isn't good at that. It can do it, but often not as efficiently as VB can.
Seeing as most reports are standard, you can even set it up so the result is pushed back up onto an archived reports table on the server. Saves someone else working it out for the next request.

Sort of a poor mans distributed processing manouvre.

Worked very well that. Gave me a very good 'requirements document' for the reporting system that was successfully argued for later on.

Even better the front end stayed the same , same stored proc, same parameters different server/database connection. Aside from optimising the distribution from production to reporting, most of the work was already done.

SQL server is optimised for set based operations, it sucks at cursors, aside from admin stuff I try to stay well away from them.

VB ? Yuck.
LOL
A Simple yet very useful article.
Thanks a ton.
Good Article.

Turn Off Record Counts in Stored procedures, if not required. This option is not required in many instances.

SET NOCOUNT ON
Keyboard Shortcuts:
Prev
Next
Toggle
Join the conversation
Formatting +
BB Codes - Note: HTML is not supported in forums
  • [b] Bold [/b]
  • [i] Italic [/i]
  • [u] Underline [/u]
  • [s] Strikethrough [/s]
  • [q] "Quote" [/q]
  • [ol][*] 1. Ordered List [/ol]
  • [ul][*] · Unordered List [/ul]
  • [pre] Preformat [/pre]
  • [quote] "Blockquote" [/quote]

Join the TechRepublic Community and join the conversation! Signing-up is free and quick, Do it now, we want to hear your opinion.