General discussion
August 24, 2006 at 12:29 pm #2259169
Please help us beta test TechRepublic’s new search
Locked · by peter spande · about 17 years, 7 months ago
We realize our current site search leaves a bit to be desired. I’d like to invite members to take our new search engine for a test drive. You can find the beta version here:
http://search.techrepublic.com.com/
The search allows members to view all TechRepublic content in one result listing, sort by content type (articles, blogs, white papers, discussions, etc.), and sort by relevance, date, and popularity.
We are prioritizing member and editorial tags in our search results. If there are no matching tags, we use full-text search for the primary results. For example, you can do a search on the term “Microsoft Windows Vista”:
http://search.techrepublic.com.com/index.php?q=Microsoft+Windows+Vista&c=1&s=0&m=20&o=0&i=0&t=0
You can also view the items with the most page views from members:
http://search.techrepublic.com.com/index.php?q=Microsoft%20Windows%20Vista&c=1&s=0&m=20&o=2&i=0&t=0
You can also filter by secondary terms. Perhaps security?
We look forward to your thoughts, comments, etc.
Thanks in advance!
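For anyone who wants to script test searches, here is a minimal Python sketch that rebuilds the example URLs above. The parameter names are copied from those URLs; the helper name `build_search_url` is made up for illustration, and the reading of `o=2` as "order by page views" is only inferred from the two examples, not from any documentation.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.techrepublic.com.com/index.php"

def build_search_url(query, order=0):
    """Build a beta-search URL (hypothetical helper, not part of the site)."""
    # Only q and o differ between the two example URLs; the other values are
    # copied unchanged because their meaning is not explained in the post.
    params = {"q": query, "c": 1, "s": 0, "m": 20, "o": order, "i": 0, "t": 0}
    return BASE_URL + "?" + urlencode(params)

relevance_url = build_search_url("Microsoft Windows Vista")
popular_url = build_search_url("Microsoft Windows Vista", order=2)
print(relevance_url)
print(popular_url)
```

`urlencode` writes spaces as `+`, the same form the first example URL uses.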
All Comments
-
August 24, 2006 at 5:18 pm #3283790
Better but
by zlitocook · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
A bit slow, and it should have a misspelled-word function: if the search term is close to a known word, it could ask whether that is what you meant. But it works better than the old search.
What search engine are you using?
-
August 25, 2006 at 8:22 am #3209339
Working on Speed
by jfpsf · about 17 years, 7 months ago
In reply to Better but
We are working on speeding it up. You should see improvements in a day or two. I will look into adding common misspellings as synonyms.
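A rough Python sketch of the two ideas in this exchange: a table of common misspellings treated as synonyms, plus a "did you mean" fallback for near misses. The vocabulary, the misspelling table, and the 0.8 similarity cutoff are all made-up assumptions for illustration, not what the TechRepublic search actually does.

```python
import difflib

# Hypothetical vocabulary and misspelling table, for illustration only.
KNOWN_TERMS = ["windows", "security", "emoticons", "firewall"]
MISSPELLINGS = {"secruity": "security", "windoes": "windows"}

def suggest(term):
    """Return a suggested replacement for term, or None if none is needed."""
    term = term.lower()
    if term in KNOWN_TERMS:
        return None  # spelled as indexed, nothing to suggest
    if term in MISSPELLINGS:
        return MISSPELLINGS[term]  # known misspelling handled as a synonym
    # Otherwise ask difflib for the closest known term above the cutoff.
    close = difflib.get_close_matches(term, KNOWN_TERMS, n=1, cutoff=0.8)
    return close[0] if close else None

print(suggest("secruity"))
print(suggest("emoticon"))
```

The synonym table handles the frequent cases cheaply; the similarity fallback covers typos nobody anticipated, at the cost of an occasional odd suggestion.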
-
-
August 24, 2006 at 7:51 pm #3283753
It appears to work very well
by nicknielsen · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
I have yet to not find an article or discussion with the search tags. Now all we have to work on is the tagging. For example, I searched TR for years for the page defining the emoticons, but could never find it. Why?
Because the page is tagged “software” only. Go figure.
-
August 25, 2006 at 12:02 pm #3209248
It’s a matter of time
by Mark W. Kaelin · about 17 years, 7 months ago
In reply to It appears to work very well
Unfortunately, the emoticon definitions article was published before there were tags. But I can take care of that right now.
I’ll tag it with emoticon
(If I can find it) ;\
-
August 25, 2006 at 9:39 pm #3209159
Link
by nicknielsen · about 17 years, 7 months ago
In reply to It’s a matter of time
http://articles.techrepublic.com.com/5100-22_11-5269527.html?tag=search
Once I found it, I saved it! 🙂
-
-
-
August 25, 2006 at 8:15 am #3283643
I like it
by tig2 · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
I tried a number of references and oblique references based on the tagging habits I have seen.
The difficulty will lie in getting peers to remember that their tags should be relevant to content. Good luck with that: there are a number of folks who didn’t find the existing search function helpful and so did not tag posts relevant to content.
And of course, Jaqui- “mandatory tagging sucks”.
Spell suggestion or spell check would be very helpful- especially to chronic mis-spellers who shall remain nameless…
-
August 25, 2006 at 8:20 am #3283639
Not just user tags
by jfpsf · about 17 years, 7 months ago
In reply to I like it
The search doesn’t rely only on tagging by TR users. We have built software that tags all the content by extracting all the statistically significant phrases from the text. That is what is providing the majority of the tags.
-
August 25, 2006 at 11:28 am #3209266
Then I missed something
by tig2 · about 17 years, 7 months ago
In reply to Not just user tags
And will go back and re-try. I searched on “cancer” because I was reasonably certain of discussions on this topic. It didn’t surprise me when those discussions were yielded, nor was I surprised to see white papers that used the term in their titles. I did expect to see a blog but also am aware that blogs are being phased out and you may not accommodate them.
I wasn’t surprised when the “Join the Pink Ribbon Brigade” didn’t show as I didn’t tag it for cancer, but if the software should have discerned that content, then I am curious as to what I should have expected.
I shall have to continue to play with this…
-
August 25, 2006 at 2:53 pm #3209205
Limits of Software
by jfpsf · about 17 years, 7 months ago
In reply to Then I missed something
The software will only tag the content if the phrase appears often enough.
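To make the "often enough" rule concrete, here is a deliberately simplified Python sketch. It counts single words and applies a raw frequency threshold; the real tagger is described above as extracting statistically significant phrases, so the stopword list, the `min_count` value, and the single-word simplification are all assumptions.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def auto_tags(text, min_count=3):
    """Return words that occur at least min_count times, as candidate tags."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n >= min_count)

sample = "Pink ribbon event. The ribbon drive. Wear a ribbon for cancer."
print(auto_tags(sample))
```

In the sample, a word mentioned once (such as "cancer") never reaches the threshold, which is the behavior tig2 ran into with the Pink Ribbon thread.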
-
August 26, 2006 at 8:26 am #3209139
That makes sense then
by tig2 · about 17 years, 7 months ago
In reply to Limits of Software
I believe that the actual phrase occurred once or perhaps twice in the originating post and comparatively few times thereafter.
Now the results make sense. I appreciate you clarifying that for me!
-
August 26, 2006 at 9:33 am #3209134
Try Pink Ribbon Now
by jfpsf · about 17 years, 7 months ago
In reply to That makes sense then
The main problem appears to be that the software decided “ribbon” was a tag, but not “pink ribbon.” Generally, we try and strip out adjectives, and for the most part that is a good rule. Otherwise we don’t identify enough tags.
We had a failover search feature for when there were no tag results that I had taken out for performance reasons. I am happier with the performance now, so I have put it back. So, try pink ribbon now.
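The failover described here can be sketched in a few lines of Python: try the tag index first, and only scan the full text when no tag matches. The in-memory dictionaries and the function name `search` are hypothetical stand-ins; the actual implementation is not shown in this thread.

```python
def search(query, tag_index, documents):
    """Tag lookup first; fall back to a full-text scan when no tag matches."""
    q = query.lower()
    hits = tag_index.get(q, [])
    if hits:
        return list(hits)
    # Failover: plain substring match over every document body.
    return [doc_id for doc_id, text in documents.items() if q in text.lower()]

# Hypothetical data: "ribbon" was extracted as a tag, "pink ribbon" was not.
tag_index = {"ribbon": [1]}
documents = {1: "Join the Pink Ribbon Brigade", 2: "Vista security tips"}

print(search("ribbon", tag_index, documents))
print(search("pink ribbon", tag_index, documents))
```

The substring scan is the slow path, which fits the remark above that the failover had been taken out for performance reasons.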
-
August 28, 2006 at 5:04 am #3208985
Nice!
by tig2 · about 17 years, 7 months ago
In reply to Try Pink Ribbon Now
The response was quite good and found what I thought it should as well as some things I didn’t know existed.
I think that the speed is fine but that’s just me. Nice work!
-
-
-
August 25, 2006 at 11:55 am #3209252
Thank goodness it’s finally launching
by rexworld · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
I was working on this tag search like six months ago, before I left TR. I’m glad you guys finally are launching it. Though clearly somebody has improved, if not completely replaced, my crappy code because this one runs really quickly 🙂
I’m heavily biased of course but I love this search. It’s not just incrementally better than the existing search, it’s orders of magnitude better I think.
My only input is that having the number of results in the filter drop-down is a bit confusing, because it almost makes it seem as though when I type in a new search term it’s going to only search within those results in the drop-down. Instead it’s doing a whole new search across the entire selected category.
-
August 25, 2006 at 12:09 pm #3209245
No full-text failover?
by rexworld · about 17 years, 7 months ago
In reply to Thank goodness it’s finally launching
Okay one other comment–I think the lack of a full-text failover is bad. Because one thing that folks use a lot is searching for a person’s name. For example if I want to search for threads from or about “Sonja”, I can’t do that with this new tag search page.
You need to integrate full-text with tag-search, can’t have just tag-search.
-
August 25, 2006 at 2:51 pm #3209206
full-text failover
by jfpsf · about 17 years, 7 months ago
In reply to No full-text failover?
Full-text backup was removed while I sped it up. It’s back now, Rex. And your code was pretty good.
-
-
-
August 25, 2006 at 7:42 pm #3209172
OK I waited a bit till you got it faster
by hal 9000 · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
Well for Discussions it works fine even if it did find that [b]Horrible Green Festering Monster[/b] that is taking over your servers called [b]The Evolution Lie.[/b]
It seems quite OK for the discussions but seems to miss some in the White Papers, though to be fair the one that I was looking for is quite old and may no longer exist, as I found several that had been removed from your Servers.
Overall it’s a great improvement on the old search engine, and I can see that with the further work that has been promised it will be a massive improvement to TR.
Great Job Guys!
Col
-
August 26, 2006 at 9:43 am #3209132
White Paper Examples
by jfpsf · about 17 years, 7 months ago
In reply to OK I waited a bit till you got it faster
Can you give me some examples of white papers you couldn’t find, because the search index includes the entire white paper database. Maybe, I can figure out the problem.
-
August 26, 2006 at 3:00 pm #3209113
The main one that I couldn’t find was
by hal 9000 · about 17 years, 7 months ago
In reply to White Paper Examples
A great TR publication on Consulting Agreements. It was basically a template for Consultants to use to set up their Service Agreements with Companies. I used to point various companies to it as a way to look at how a Consultant should be approached but for some reason I lost the link.
While there are other good Consultant Agreements listed, this TR one was definitely the best, but as I looked through the results of the search I noticed that several were no longer in the DBase, so maybe it’s been removed to make way for something newer.
Col
-
August 26, 2006 at 4:06 pm #3209106
Is this it?
by jfpsf · about 17 years, 7 months ago
In reply to The main one that I couldn’t find was
-
August 27, 2006 at 6:26 am #3209069
Sorry no that’s not it
by hal 9000 · about 17 years, 7 months ago
In reply to Is this it?
The one that I was looking for was a PDF only file.
JD I’ll dig through my stored White Papers and try to find it again and post you a link when I manage to find it.
Col
-
August 28, 2006 at 4:27 am #3208990
JP It’s not currently on this system
by hal 9000 · about 17 years, 7 months ago
In reply to Sorry no that’s not it
So I’ll have to dig through some of the generational Backups to see if I still have a copy. That might take a bit of time to do.
However the one that I was looking for was more of a Generalisation of what should be in a Consultants agreement not a [b]Fill In The Blanks[/b] type thing.
Col
-
August 28, 2006 at 8:20 am #3205729
We are looking too
by jfpsf · about 17 years, 7 months ago
In reply to JP It’s not currently on this system
We have the TR editors looking for this now.
-
September 11, 2006 at 12:25 pm #3226971
JP if this is of any help
by hal 9000 · about 17 years, 6 months ago
In reply to JP It’s not currently on this system
I haven’t found a copy of the thing but I haven’t had time to work through all the PDF files either yet.
But I first saw this in the 2003-2004 time frame and think that it was a Service Agreement or Independent/Outside Consultants Agreement. Of course if they had names and not numbers it would be easier to find. 🙂
Col
-
-
-
August 25, 2006 at 10:02 pm #3209156
What I would appreciate, would be to have a search
by deadly ernest · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
able to filter on Thread, Post Title, and Poster, with a wildcard if one of them is vacant. Also being able to set a time frame for the search would be nice, but not as important as the other.
Often I will remember the thread title subject, or the post title subject, or part of one of them, usually I’ll remember who posted it.
A perfect example is my current problem. I’m trying to find a post I made some months back where I costed the then-current values of a Vista-ready machine against an iMac. I have used the search to view numerous threads but still can’t find this particular post in any of the threads.
If I could run a search along the lines of:
Thread Title = Vista or Cost
Post Titles = Vista or Cost
Poster = Deadly Ernest
Then many of the threads and posts would not appear. It would not matter if the results page only listed the threads, as opening that Thread page would list all the posts within it and a scan of them should trigger my memory; in the end I could just open all my posts in that thread. But it would eliminate all the threads on Vista that I did NOT participate in.
In the above search algorithm, a blank should always = Anything wildcard
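The fielded search requested above could look roughly like this in Python. The `Post` record, its field names, and the sample data are hypothetical; the one rule taken from the post is that a blank field matches anything.

```python
from dataclasses import dataclass

@dataclass
class Post:
    thread_title: str
    post_title: str
    poster: str

def field_matches(value, wanted_terms):
    """True if any wanted term occurs in value; a blank filter matches anything."""
    if not wanted_terms:
        return True  # blank field acts as the wildcard
    lowered = value.lower()
    return any(term.lower() in lowered for term in wanted_terms)

def filter_posts(posts, thread_terms=None, post_terms=None, poster=None):
    """Keep posts that satisfy every non-blank field filter."""
    poster_terms = [poster] if poster else None
    return [p for p in posts
            if field_matches(p.thread_title, thread_terms)
            and field_matches(p.post_title, post_terms)
            and field_matches(p.poster, poster_terms)]

posts = [
    Post("Vista hardware requirements", "Cost of a Vista machine", "Deadly Ernest"),
    Post("Vista release date", "Any news?", "someone else"),
    Post("Linux on the desktop", "My setup", "Deadly Ernest"),
]
mine = filter_posts(posts, thread_terms=["Vista", "Cost"], poster="Deadly Ernest")
print([p.post_title for p in mine])
```

Terms within one field are ORed, as in "Vista or Cost", while the fields themselves are ANDed, so threads on Vista from other posters drop out.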
-
August 26, 2006 at 10:50 pm #3209081
Search for emoticons
by ontheropes · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
and see if you can find this. 😐
http://articles.techrepublic.com.com/5100-22_11-5269527.html?tag=search
-
August 27, 2006 at 7:00 am #3209065
NB, we already tried
by nicknielsen · about 17 years, 7 months ago
In reply to Search for emoticons
and you still can’t find it.
-
August 27, 2006 at 7:51 am #3209058
Please remind me next time
by ontheropes · about 17 years, 7 months ago
In reply to NB, we already tried
to actually [b]read[/b] the entire discussion before posting a reply.
That’s the first time I’ve made a mistake… :p
-
August 27, 2006 at 2:36 pm #3209044
In this thread…
by nicknielsen · about 17 years, 7 months ago
In reply to Please remind me next time
:p
-
-
August 27, 2006 at 8:18 am #3209056
I will add it to the index
by jfpsf · about 17 years, 7 months ago
In reply to Search for emoticons
I will make sure that article gets tagged emoticons in the index.
-
-
August 27, 2006 at 10:27 pm #3209017
This does not work in the slightest.
by justin james · about 17 years, 7 months ago
In reply to Please help us beta test TechRepublic’s new search
“AII” does not find the blog comment I wrote containing that phrase. “Justin James” only finds one item, the T1000 review I wrote. “Justin +James” does not bring up anything I wrote at all. “J.Ja” does not bring up any results, despite it appearing in everything I ever wrote for this site except for articles. “Justin AND James” does not return results. “”Justin James”” brings up results, but none for me. The results it brings up all simply have the word “James”. “Usability” does not bring up a single one of my blog posts or discussion items or blog comments, despite it being something that I am constantly writing about.
Relevancy should more heavily weight the words in the query as the query words nearly always decrease in importance from right to left in the searcher’s mind; instead, all words in the query are weighted equally.
As another commenter already pointed out, you also need to be using a soundex algorithm and/or a spell checker to be catching misspellings, typos, and mistakes.
This search does not work right at all, where I sit. It actually pulls up fewer items than the current search system.
Tag search is worthless, for the following reasons:
* Tags are not standardized. I may use “software development” as a tag, the search may look for “programming.”
* No automatic thesaurus for tagging, with a proper weighted graph of meaning. “Software development” would be just as important as “programming” if the user searches for “programming”, but a more obscure synonym for “software development” would be less important. Without the thesaurus, and standardized tags, the system lacks all use if it prioritizes tags over all else.
* Limited tags. TechRepublic limits the length of the tag line, so I cannot even manually type in all of the applicable synonyms into the tags.
* Emphasis on filtering full text search by the frequency of occurrence is bad. If I write an article about “Timothy Walters” and refer to him throughout the article as “Mr. Walters” with the exception of the initial paragraph saying “Timothy Walters,” your new search system will not find my article about “Timothy Walters” because that name only appears once, despite my whole article being about him.
As such, the system MUST be using full text search, not as a failover, but it must be using full text search with tagging to add additional weight, provided the conditions I specify above are met (and that is just the tip of the iceberg!).
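One way to read "full text search with tagging to add additional weight" is sketched below in Python. The scoring formula, the `tag_boost` value, and the function name are illustrative assumptions; this is not the vector-based approach from the article recommended in this post, just the smallest version of the idea.

```python
def score(query, text, tags, tag_boost=2.0):
    """Score a document by full-text term counts, plus a bonus for matching tags."""
    terms = query.lower().split()
    words = text.lower().split()
    tag_set = {tag.lower() for tag in tags}
    text_score = sum(words.count(term) for term in terms)  # every document is scored on its text
    tag_score = sum(1 for term in terms if term in tag_set)  # tags only add weight
    return text_score + tag_boost * tag_score

untagged = score("walters", "Timothy Walters wrote this", [])
tagged = score("programming", "notes on programming", ["programming"])
print(untagged, tagged)
```

Because the text score is always computed, an untagged item that mentions the term once still surfaces; tags only push items higher.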
“Folksonomy” as an idea is sexy, hip, cool, and has the aura of “democracy.” In reality, it is nearly useless in terms of helping users. There is plenty of research out there to support that statement.
I heartily recommend the following article about vector-based search systems before attempting to write any search engine: http://www.perl.com/pub/a/2003/02/19/engine.html I also highly recommend a THOROUGH reading of Jakob Nielsen’s research into search habits (www.alertbox.com). He has put in countless hours getting detailed statistical models of how users actually use search. Search Engine Watch is another good resource as well.
Attempting to shoehorn quality search within the confines of standard SQL is nearly impossible. I do not recommend it in the slightest. I can almost imagine how the search SQL is written, too, because I wrote something along these lines a few months ago, but at least the data was not actual content, and therefore tricky to search, it was mostly numerical and address searching. Trying to use SQL for something like this is hopeless. You probably want to be using a functional programming language such as Lisp, Scheme, O’Caml, etc. to implement a search, or one that has a lot of functional programming elements in it like Perl or Ruby. Trying to write search in an object oriented language like Java is like trying to move a pile of sand with a pair of tweezers, when there is a shovel two feet away that you are not allowed to touch.
Also, the results returned are hard to read. Much better than giving the first few lines of an item is giving the word or phrase itself, surrounded by context, which lets the searcher more easily know if the content is really what they need. Sometimes we only need one sentence, or only remember one phrase, the phrase that we are looking for.
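A small Python sketch of that kind of keyword-in-context snippet follows. The function name, the default `width`, and the fallback to the opening of the item are assumptions made for illustration.

```python
def snippet(text, phrase, width=30):
    """Return the matched phrase with up to width characters of context per side."""
    pos = text.lower().find(phrase.lower())
    if pos == -1:
        return text[:2 * width]  # no match: fall back to the opening of the item
    start = max(0, pos - width)
    end = min(len(text), pos + len(phrase) + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix

body = "x" * 50 + " pink ribbon " + "y" * 50
print(snippet(body, "pink ribbon", width=10))
```

A production version would likely trim to word boundaries and highlight the match, but the windowing logic stays the same.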
A number of years ago, I wrote a basic search system (for an ecommerce package I was sort of working on) that could at least parse out Boolean operators (both symbolic and using words). It is in Perl, but should be translatable fairly easily into another language. I would be happy to send it to you guys to use.
J.Ja
-
August 28, 2006 at 5:43 am #3205823
addressing your points
by peter spande · about 17 years, 7 months ago
In reply to This does not work in the slightest.
You’re right to say that user tagging alone will not work. We’ve done two things to add to this –
1. We have machine and editor tagging added to the search tables
2. We have full-text search as a backup.
We’re also including related tags in this so that someone searching for “Windows” can standardize their search on “Microsoft Windows.”
As for the author elements, this is one of the reasons we are not switching over to this search yet but asking members to take it for a test drive. Stay tuned, and thanks for pushing us to make this better.
-
August 29, 2006 at 6:21 pm #3284827
some more points
by tp_cnet · about 17 years, 7 months ago
In reply to This does not work in the slightest.
J.Ja, thanks for your detailed reply.
To reiterate and elaborate a bit upon what JP_CNET and Peter have noted, we’ve built a hybrid search system that combines tags and a full-text search engine in an effort to leverage the best of both worlds.
The tags used by the system are a combination of member-created tags, editor-created tags and software-generated tags. These tags not only help with keyword-oriented searches, but also let us show you similar or closely related tags to help you guide your search. The current notion of related-ness is based on content alone, but in future we hope to use additional methods to expand and refine the list of related tags.
We will consider the use of soundex and/or spelling correction algorithms to handle typos. We also plan to index author names so you’ll be able to search by name.
As a point of fact, our content analysis engine, which does the software tagging and is based on a vector space model, does attempt to map “Walters” to “Timothy Walters” etc, when it discerns (based on specific text patterns) that the latter is likely the name of a person. This is not very foolproof however since we need to err on the side of caution to avoid false positives.
We do read Search Engine Watch when we have time 🙂
-
September 5, 2006 at 7:25 am #3199010
Excellent!
by justin james · about 17 years, 6 months ago
In reply to some more points
I am really happy that you guys are revamping/improving the search, and taking these suggestions seriously. It is really frustrating to know that something actually exists on the site and to not be able to find it. I look forward to seeing the finished product!
J.Ja
-
September 5, 2006 at 10:51 am #3198894
Boolean Full Text Searches
by jfpsf · about 17 years, 6 months ago
In reply to Excellent!
Justin,
I just rolled out boolean full text search to test. You can now enter “ibm service” to search for an exact phrase, or +ibm -microsoft to search for documents that contain ibm, but not microsoft, etc.
This bypasses the tag search and goes right to full-text.
The full-text index is not ideal yet, but it is heading in the right direction.
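For readers unfamiliar with this query syntax, here is a minimal Python sketch of how such a query might be parsed and applied. It assumes straight double quotes around phrases and is not the code behind the TechRepublic search; the function names and the returned structure are hypothetical.

```python
import re

def parse_query(query):
    """Split a query into quoted phrases, +required, -excluded and plain terms."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)  # drop the quoted parts before splitting
    required, excluded, optional = [], [], []
    for token in rest.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return {"phrases": phrases, "required": required,
            "excluded": excluded, "optional": optional}

def matches_query(text, parsed):
    """True if text has every phrase and required word and no excluded word."""
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    return (all(p in lowered for p in parsed["phrases"])
            and all(t in words for t in parsed["required"])
            and not any(t in words for t in parsed["excluded"]))

parsed = parse_query('+ibm -microsoft "ibm service"')
print(parsed)
print(matches_query("IBM service plans for 2006", parsed))
```

Plain terms are collected but not enforced here; a fuller version would use them for ranking rather than filtering.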
-
September 11, 2006 at 10:15 am #3200261
Hmm.
by justin james · about 17 years, 6 months ago
In reply to Boolean Full Text Searches
It definitely seems to work, to an extent. It still has a few problems:
1. English versions of the Boolean do not work (for example, “Justin AND James”).
2. It does not consider the username as part of the text, and since it is not a tag either, a search for “Justin James” will only find items where that text is in the content itself or the tags, but not anything that I have actually written.
3. Quoted queries do not function, for example, “”Justin James”” does not work.
4. It only seems to be doing full text search on articles, not blogs or discussions.
Hope this helps, and thanks for working so hard to deliver better search!
J.Ja
-
September 13, 2006 at 7:57 am #3228179
Response to your points
by jfpsf · about 17 years, 6 months ago
In reply to Hmm.
Sorry to take so long to respond, but I have been out of the office.
1. No, that will do a dual tag search. Only the symbolic Boolean operators (+ and -) work.
2. We are adding user name as a tag to content. Just taking a little time.
3 and 4. It works, but not all of the content is there to be searched yet.
-
-
-
September 6, 2006 at 12:44 pm #3201477
New Features
by jfpsf · about 17 years, 6 months ago
In reply to Please help us beta test TechRepublic’s new search
We have added several new features to the search.
1. Corrections of misspellings and indistinct search terms
2. Boolean full text support. For example, “Network Administrator” to search for that exact phrase. +IBM -DB2 to search for all documents containing IBM, but not containing DB2.
Please try them out and give us your thoughts.
-
September 11, 2006 at 10:17 am #3200260
One more bug…
by justin james · about 17 years, 6 months ago
In reply to New Features
It does not work properly at all when a period is part of a search token. For example, “J.Ja” returns a zillion articles about Java.
J.Ja
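One plausible cause (an assumption, nothing in this thread confirms it) is that the tokenizer drops or splits on the period, so the query degrades into something that matches "Java". A Python sketch of a tokenizer that keeps periods inside a token while still discarding sentence-ending ones:

```python
import re

# Letters and digits, optionally joined by single internal periods ("j.ja").
TOKEN = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")

def tokenize(text):
    """Lowercase tokens; internal periods are kept, trailing ones are dropped."""
    return [t.lower() for t in TOKEN.findall(text)]

print(tokenize("J.Ja"))
print(tokenize("I use Java. Daily."))
```

With this pattern "J.Ja" stays one token and no longer collides with "java", as long as the index is built with the same tokenizer.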
-
September 11, 2006 at 2:51 pm #3226915
-
-
-
-