I have all my lyrics in a database.
It's a huge database witch is working slow, because of the large amount of rows >150.000 in db.
So that's why i made static pages for my site, even if i take more space from HDD.
But now i want to implement a search engine based on my lyrics database. My question is how to make a php search engine witch will run fast enough for my site
Many thanks
Message was edited by: beth.blakely@...
Discussion on:
View:
Show:
I had a similar problem. I started with a database with a few thousand rows, and it is now currently at over 20million. I want a full text search on that table. I am using MySQL, and installed a 3rd party index engine called Sphinx. It is free, open source. It queries all 20million rows in less then 1/10th of a second. Let me know if I can help you in any way
I believe you could use FullText indexing and search feature offered by most of the RDBMS.
Wow, another great article Tim. I'm probably at the perfect point in my IT career to be reading this because it all "sank in" and now it's brought out some questions:
1) An exclusion operator (for example: ) might cause the query to not use an index...is that assuming the column has multiple values. What if a column contains a constraint of 'Y' or 'N' and I do a search WHERE col = Y and then another search WHERE col N. I'd assume my speed results would be the same, but does the presence of the "not equal to" operator make it take longer?
2) Using functions in the WHERE clause means they execute for every row? Wow, that changes everything. I just assumed that during a query, the WHERE clause would execute any functions once and hold the result until the end of the query. The example you used with dates and date functions is a good one because nearly every query I write, if I need to check against a date, I use a self-created date conversion function to ensure that if it's a datetime field, I'm stripping the time. I shall rethink this in the future (and probably restructure some existing functions and procedures).
With all this talk about speed, I'd really like a summary of the optimizer functions in Query Analyzer. You may already have plans for this, but could you briefly discuss the "Show Execution Plan", "Show Server Trace" and "Show Client Stats" options in one of your future posts. I"ve looked at them once or twice but just don't have the time to dig into their true meanings. Even just a brief summary of how to compare two query results would be great!
1) An exclusion operator (for example: ) might cause the query to not use an index...is that assuming the column has multiple values. What if a column contains a constraint of 'Y' or 'N' and I do a search WHERE col = Y and then another search WHERE col N. I'd assume my speed results would be the same, but does the presence of the "not equal to" operator make it take longer?
2) Using functions in the WHERE clause means they execute for every row? Wow, that changes everything. I just assumed that during a query, the WHERE clause would execute any functions once and hold the result until the end of the query. The example you used with dates and date functions is a good one because nearly every query I write, if I need to check against a date, I use a self-created date conversion function to ensure that if it's a datetime field, I'm stripping the time. I shall rethink this in the future (and probably restructure some existing functions and procedures).
With all this talk about speed, I'd really like a summary of the optimizer functions in Query Analyzer. You may already have plans for this, but could you briefly discuss the "Show Execution Plan", "Show Server Trace" and "Show Client Stats" options in one of your future posts. I"ve looked at them once or twice but just don't have the time to dig into their true meanings. Even just a brief summary of how to compare two query results would be great!
Thanks, Im glad the article helped. Here are some answers to your questions:
1. Most likely, an index isn't really going to help you on a column that isn't highly selective (such as only having values of Y or N). You will likely incur a table scan either way. On other fields that are highly selective, any time you use a construct that excludes data, the index will likely not be used. The reason is that it will have to search every row to make sure the value in that row isn't the value. For instance, it is much easier to search for all values from 1 to 100 that are evenly divisible by 10. It is much harder to find the ones that are not evenly divisible by 10.
2. All the documentation from MS, and all of the experiences I have had show that functions in the where clause will execute for every row. The reason is that you can pass table values into the functions if necessary. If you have one constant value assigned by a function, store it in a variable and use that variable in the WHERE clause.
In my next article I plan on focusing on more advanced query tuning. That article will include STATISTICS IO. I haven't decided yet, if I will include execution plans in that article. I may save that for a future article.
1. Most likely, an index isn't really going to help you on a column that isn't highly selective (such as only having values of Y or N). You will likely incur a table scan either way. On other fields that are highly selective, any time you use a construct that excludes data, the index will likely not be used. The reason is that it will have to search every row to make sure the value in that row isn't the value. For instance, it is much easier to search for all values from 1 to 100 that are evenly divisible by 10. It is much harder to find the ones that are not evenly divisible by 10.
2. All the documentation from MS, and all of the experiences I have had show that functions in the where clause will execute for every row. The reason is that you can pass table values into the functions if necessary. If you have one constant value assigned by a function, store it in a variable and use that variable in the WHERE clause.
In my next article I plan on focusing on more advanced query tuning. That article will include STATISTICS IO. I haven't decided yet, if I will include execution plans in that article. I may save that for a future article.
That datetime fields are stored as a single number, if I remember right. The "index" your article refers to. SQL server just displays in date/time format. It doesn't store it that way. I've been doing the >'startdate' AND 10
AND tblA.idx = tblB.idx
BTW, nice to see a Louisvillian here...live in etown and work in Jeff myself (when not telecommuting).
AND tblA.idx = tblB.idx
BTW, nice to see a Louisvillian here...live in etown and work in Jeff myself (when not telecommuting).
It does store the date and time together for a DATETIME data type...the first 4 bytes are the number of days before or after the base date (Jan 1, 1900), and the 2nd 4 bytes store the time of day as milliseconds after midnight.
As for your question regarding the two join statements: they are exactly the same. SQL Server is smart enough to convert them to be the same thing. In actuality, I believe that the JOIN statement is more or less syntactic sugar...it works the exact same as using a WHERE clause.
Where do you work in Jeffersonville? Drop me an email.
As for your question regarding the two join statements: they are exactly the same. SQL Server is smart enough to convert them to be the same thing. In actuality, I believe that the JOIN statement is more or less syntactic sugar...it works the exact same as using a WHERE clause.
Where do you work in Jeffersonville? Drop me an email.
try doing an outer join without using JOIN syntax
;o)
JOIN syntax is more than sugar -- it can help you think properly, and is certainly easier to understand when you have to pick up and modify someone else's query
;o)
JOIN syntax is more than sugar -- it can help you think properly, and is certainly easier to understand when you have to pick up and modify someone else's query
Sure, below is how you do outer joins without using the JOIN syntax. Although, I am going to be honest, I don't know how to do a full outer join in this style, but it can probably be done. When I said "syntactic sugar", I meant that it does make it easier to program joins with them...which I imagine it is one of the reasons it became an ANSI standard. I only use JOIN operators...but in the end, they ar e the same as using a WHERE clause. When you said "JOIN syntax is more than sugar -- it can help you think properly, and is certainly easier to understand when you have to pick up and modify someone else's query "....that is exactly what syntactic sugar is....it makes life easier. 
(LEFT JOIN)
SELECT Fld1, Fld2 FROM Table1, Table2
WHERE Table1.Fld1 *= Table2.Fld1
(RIGHT JOIN)
(LEFT JOIN)
SELECT Fld1, Fld2 FROM Table1, Table2
WHERE Table1.Fld1 =* Table2.Fld1
(LEFT JOIN)
SELECT Fld1, Fld2 FROM Table1, Table2
WHERE Table1.Fld1 *= Table2.Fld1
(RIGHT JOIN)
(LEFT JOIN)
SELECT Fld1, Fld2 FROM Table1, Table2
WHERE Table1.Fld1 =* Table2.Fld1
Thanks for pointing out what is not so obvious. I didn't realize that the IN clause causes SQL to not use the index... that is very interesting.
We are a small web design company and most of the databases we design and host are on the small side. However, we do have a few that are getting quite large and this is good to know.
I know we have used the IN clause quite a bit. What are your thoughts on this... (for large databases)
Step 1. Create a temp table out of the list of Id's that make up the IN clause
Step 2. Use an inner join on the temp table to bring back the results.
Thanks,
Michael Riley
Webfodder, Inc.
We are a small web design company and most of the databases we design and host are on the small side. However, we do have a few that are getting quite large and this is good to know.
I know we have used the IN clause quite a bit. What are your thoughts on this... (for large databases)
Step 1. Create a temp table out of the list of Id's that make up the IN clause
Step 2. Use an inner join on the temp table to bring back the results.
Thanks,
Michael Riley
Webfodder, Inc.
When trying to solve an query, its better to test more than one approach and compare performance measures (like comparing the results of execution plans of all the alternatives).
If you compile you script into a stored procedure, recompile the SP every now and then, to allow for changes in the dynamic execution context of the database.
In relation to alternatives to the IN clause, you can try both the temp table approach and a subquery approach, and then compare performance metrics of the approaches.
If you compile you script into a stored procedure, recompile the SP every now and then, to allow for changes in the dynamic execution context of the database.
In relation to alternatives to the IN clause, you can try both the temp table approach and a subquery approach, and then compare performance metrics of the approaches.
Its difficult to say without looking at the code. Its always good to test different methods for the same query to see what turns out to be faster, etc. A temp table might be a good place to start for your IN() statements. It may/may not make if faster...it might be worth investigating.
He did a great job of telling you what to do and what not to do (if it can be avoided) for the best query optimization. I really liked the simple examples.
Thanks!
Thanks!
Quesiton on using the Select * method.
Say you have a stored proc so it is sitting on the server. I have tested this multiple times and multiple ways and not seen any performance difference.
What is your experience with something like this.
IF EXISTS(Select * From YourTable Where Something = Something)
Begin
DoSomething
End
Now I have tried just returning a signle column and as long as I am using the Where clause properly this doesn't seem to matter. Now like you though I never use Select * anywhere else.
Say you have a stored proc so it is sitting on the server. I have tested this multiple times and multiple ways and not seen any performance difference.
What is your experience with something like this.
IF EXISTS(Select * From YourTable Where Something = Something)
Begin
DoSomething
End
Now I have tried just returning a signle column and as long as I am using the Where clause properly this doesn't seem to matter. Now like you though I never use Select * anywhere else.
It probably doesn't matter there because it is inside an EXISTS() clause..which means that it will return true for the first row it returns...ie. It doesn't really return any columns. Your statement is the same as IF EXISTS(Select 1 From YourTable Where Something = Something). I never use a SELECT * to return data, or INSERT data.
i used to write IF EXISTS(Select * From TableName where condition = criteria) to know wether i should insert or to update a row in a table, but then i just put the update into a stored procedure like this:
update TableName
set Field1 = @Field1
where ID = @ID
IF @@ROWCOUNT = 0
INSERT INTO TableName(Field1)
VALUES (@Field1)
update TableName
set Field1 = @Field1
where ID = @ID
IF @@ROWCOUNT = 0
INSERT INTO TableName(Field1)
VALUES (@Field1)
I kind of figured that was what it was doing, or at least hoped, however I could never find any docs on what it was really doing.
But since I could never get a perf increase or decrease I could only assume, and this thread seemed like the perfect place and oportunity to ask that question.
I would imagine depending on the database there is an option to override the behavior but I have never ran into that as well. A lot of programming languages are set up this way.
For Example in C#
say int he example below Both Method returned true
if(DoMethod1() || DoMethod2())
{}
In that example Method 1 would fire and eveluate to true so method 2 would never fire.
if(DoMethod1() | DoMethod2())
{}
Would actually still force both methods to fire then evaluate to true.
I just never knew if SQL did that or not.
But since I could never get a perf increase or decrease I could only assume, and this thread seemed like the perfect place and oportunity to ask that question.
I would imagine depending on the database there is an option to override the behavior but I have never ran into that as well. A lot of programming languages are set up this way.
For Example in C#
say int he example below Both Method returned true
if(DoMethod1() || DoMethod2())
{}
In that example Method 1 would fire and eveluate to true so method 2 would never fire.
if(DoMethod1() | DoMethod2())
{}
Would actually still force both methods to fire then evaluate to true.
I just never knew if SQL did that or not.
on SELECT *. I always want to know exactly what I am bringing back without having to go to the datatable each time.
The only time I use SELECT * is when I am debugging a query and the temp table has numerous columns. I don't want to have to type out all of the columns in the select list for something that I am going to eventually remove from the proc or script.
It was interesting knowing that indexes are not used when there is an IN or exclusion operator. I will have to re-check some of the procs to see if they can be tweaked by changing WHERE clause.
The only time I use SELECT * is when I am debugging a query and the temp table has numerous columns. I don't want to have to type out all of the columns in the select list for something that I am going to eventually remove from the proc or script.
It was interesting knowing that indexes are not used when there is an IN or exclusion operator. I will have to re-check some of the procs to see if they can be tweaked by changing WHERE clause.
Great article. Thanks for explaining the various ways in which an index can be ignored in a query.
I just want to clarify something that looked to me like a typo. You wrote:
"In my experience with database administration and development, I have seen very few times that have warranted using a SELECT * statement. The reason for this is twofold. It doesn't make sense to bring back rows that you are not going to be using."
Did you mean to say "It doesn't make sense to bring back COLUMNS that you are not going to be using."?
Thanks again.
I just want to clarify something that looked to me like a typo. You wrote:
"In my experience with database administration and development, I have seen very few times that have warranted using a SELECT * statement. The reason for this is twofold. It doesn't make sense to bring back rows that you are not going to be using."
Did you mean to say "It doesn't make sense to bring back COLUMNS that you are not going to be using."?
Thanks again.
In the article I am looking at, it reads "The reason for this is twofold. It doesn?t make sense to bring back rows that you are not going to be using. "
The syntax of a SELECT statement is
SELECT column_name(s)
FROM table_name
so if you say SELECT * you are bringing back all the COLUMNS, not all the ROWS.
SELECT column_name(s)
FROM table_name
so if you say SELECT * you are bringing back all the COLUMNS, not all the ROWS.
Misread it the first time...it is supposed to say COLUMNS, not rows. I'll have it corrected.
Thanks, Tim.
I hate to say this but in my years of searching for more detailed information about indexing, this is one of the few best nuggets I've found on the subject.
I've done lots (and lots and lots....) of comparison tests on my own and have reviewed lots of execution plans and have slowly come to find what does and does not work for me but your article spelled out some specifics that would have been nice to know a few years ago.
I've looked at many books trying to find this kind of information on indexes and usually they have some tripe like, "duh, you should used indexes...." No kidding? Thanks for the advice.... Of course, maybe my problem is that I've looked at the wrong books.
But seriously, this little article does actually help me.
Thanks.
Doug
I hate to say this but in my years of searching for more detailed information about indexing, this is one of the few best nuggets I've found on the subject.
I've done lots (and lots and lots....) of comparison tests on my own and have reviewed lots of execution plans and have slowly come to find what does and does not work for me but your article spelled out some specifics that would have been nice to know a few years ago.
I've looked at many books trying to find this kind of information on indexes and usually they have some tripe like, "duh, you should used indexes...." No kidding? Thanks for the advice.... Of course, maybe my problem is that I've looked at the wrong books.
But seriously, this little article does actually help me.
Thanks.
Doug
Yup, 'Use indexes', duh.
And now we know it doesn't always work and why! Thanks for sharing!
Cheers!
And now we know it doesn't always work and why! Thanks for sharing!
Cheers!
I often have to query a large database remotely, and the Date is a crucial part.
However, the dates are expressed in Uk format, so the queries with day numbers less than 13 can be ambiguous
e.g. is
WHERE
SaleDate >= '11/8/2005' AND
SaleDate ???12/8/2005'
Looking at 11th August, or half of November and December?
I have often looked around for guidance on what formats can be used for date literals in SQL, but no luck so far.
Can anyone guide me to a comprehensive list, or suggest a 'universal' format (like "2005/08/11")?
However, the dates are expressed in Uk format, so the queries with day numbers less than 13 can be ambiguous
e.g. is
WHERE
SaleDate >= '11/8/2005' AND
SaleDate ???12/8/2005'
Looking at 11th August, or half of November and December?
I have often looked around for guidance on what formats can be used for date literals in SQL, but no luck so far.
Can anyone guide me to a comprehensive list, or suggest a 'universal' format (like "2005/08/11")?
Neal,
To the best of my knowledge, SQL Server interprets the date based on the Language setting. So in your example, the dates are August 11 to August 12 2005. You could issue the Set DateFormat command as part of your query to force it to be any supported format. For example,
Set DateFormat MDY
Select x from y where SaleDate = '11/8/2005' now considers this date to be november 8 2005.
From SQL Server 2005 BOL:
"The advantage in using the ISO 8601 format is that it is an international standard. Also, datetime values that are specified by using this format are unambiguous. Also, this format is not affected by the SET DATEFORMAT or SET LANGUAGE settings."
You can find this by looking for Using Date and Time Data.
To the best of my knowledge, SQL Server interprets the date based on the Language setting. So in your example, the dates are August 11 to August 12 2005. You could issue the Set DateFormat command as part of your query to force it to be any supported format. For example,
Set DateFormat MDY
Select x from y where SaleDate = '11/8/2005' now considers this date to be november 8 2005.
From SQL Server 2005 BOL:
"The advantage in using the ISO 8601 format is that it is an international standard. Also, datetime values that are specified by using this format are unambiguous. Also, this format is not affected by the SET DATEFORMAT or SET LANGUAGE settings."
You can find this by looking for Using Date and Time Data.
Great tip. Thanks!
I'm relatively new to SQL Server (a year or so), although I've worked for several years
on Oracle databases, so this whole discussion thread has yielded several gold nuggets.
One thing that I've found to cause a lot of misunderstanding for people querying the database
in our company is the fact that if only a date is specified, the time defaults to 12:00 A.M. (Midnight).
Of course, meaning that if I just use '2007-09-14' I won't get any data except records with a timestamp of
2007-09-14 00:00:00.000 and not from midnight on, which is what they actually wanted. Most don't know this
and I often get a complaint that data isn't correct when the problem was the exclusion of the time in
the date format. To get the full day's data they would need to use something like:
between '2007-09-14 00:00:00' and '2007-09-14 23:59:59'
Is there any way around that? We have hundreds of people using various databases and I have no idea who
they all are, so educating the populace is not a real option.
I'm relatively new to SQL Server (a year or so), although I've worked for several years
on Oracle databases, so this whole discussion thread has yielded several gold nuggets.
One thing that I've found to cause a lot of misunderstanding for people querying the database
in our company is the fact that if only a date is specified, the time defaults to 12:00 A.M. (Midnight).
Of course, meaning that if I just use '2007-09-14' I won't get any data except records with a timestamp of
2007-09-14 00:00:00.000 and not from midnight on, which is what they actually wanted. Most don't know this
and I often get a complaint that data isn't correct when the problem was the exclusion of the time in
the date format. To get the full day's data they would need to use something like:
between '2007-09-14 00:00:00' and '2007-09-14 23:59:59'
Is there any way around that? We have hundreds of people using various databases and I have no idea who
they all are, so educating the populace is not a real option.
You can use ISO format. Or I should say you should use ISO format because it is the safest format. It is not effected by SET DATE FORMAT or SET LANGUAGE settings
e.g. >= '20070914' AND = convert(datetime, '9/14/2007', 112) AND convert(datetime, '9/15/2007', 112)
e.g. >= '20070914' AND = convert(datetime, '9/14/2007', 112) AND convert(datetime, '9/15/2007', 112)
Thanks for the tip about using ISO format. We are an international company and the the SET LANGUAGE and SET DATE FORMAT is a consideration.
Thanks, Tim. I'll try to make sure everybody gets the word.
May God bless you in your work.
May God bless you in your work.
YYYY-MM-DD with literal dates in SQL Server queries.
Saves a lot of heartache.
Oposite problem to yourself, usually caused by some twit in the UK with their date time set to US.
I've got tripped up a few times with auto conversions using the local environment, now I try to avoid it as all costs using explicit conversions where ever necessary.
Saves a lot of heartache.
Oposite problem to yourself, usually caused by some twit in the UK with their date time set to US.
I've got tripped up a few times with auto conversions using the local environment, now I try to avoid it as all costs using explicit conversions where ever necessary.
Tim, I like the article and gives me some ideas. I have a question about the the last example. Would using between be as efficient or more than using the greater than and less than operators ?
Example : SaleDate between '8/15/2005' AND
???8/16/2005'
Example : SaleDate between '8/15/2005' AND
???8/16/2005'
Now I can use 'between' without wondering whether I'm unecessarily having a negative impact.
In your article, you wrote:
SELECT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
Did you mean:
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
If that was a typo, please also correct the following typo as well. This is a very practical piece of article and I hope this to be 100% perfect. Thank you for your effort in writing this up.
SELECT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
Did you mean:
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
If that was a typo, please also correct the following typo as well. This is a very practical piece of article and I hope this to be 100% perfect. Thank you for your effort in writing this up.
In your article, you wrote:
SELECT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
Did you mean:
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
If that was only a typo, please also correct the following one as well:
This statement is exaactly the same as ...
Other than these typos, your article is very useful and practical for programmers, even for old-timers like me who have over 20 years of experience in xBASE programming since the era of dBASE II on Apple CP/M.
SELECT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
Did you mean:
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable.
If that was only a typo, please also correct the following one as well:
This statement is exaactly the same as ...
Other than these typos, your article is very useful and practical for programmers, even for old-timers like me who have over 20 years of experience in xBASE programming since the era of dBASE II on Apple CP/M.
Retrieve the name of the employee whose firstname ends with ???E???.Don???t use the ???%??? wild character.
if you want to contact me use this email
tesfayenator@gmail.com
if you want to contact me use this email
tesfayenator@gmail.com
- Keyboard Shortcuts:
- Prev
- Next
- Toggle

































