Create a custom search engine in minutes with the Google Mini

Google has bottled up some of its search magic into an easy-to-use search appliance called the Google Mini. Aimed at small- to medium-sized businesses, it is a capable and flexible tool for building custom search engines.

The billions of documents on the Web are worthless if you can't find the one document you really need. That's the power behind what Google.com does, and the company has bottled up some of that power into an easy-to-use search appliance called the Google Mini. The Mini is aimed at small- to medium-sized businesses that need a search engine without the hassles, and we found it to be capable and flexible. We just wish the upgrade path were a little bit easier on the pocketbook.

A 30-minute install

The Google Mini hardware is simple enough: a bright blue 1U-height rack-mount box with the big "Google" logo painted on top. Connect a network cable (included) and the power cord, then wait for the happy tone that tells you it has finished booting up. There's no separate power switch--once you've plugged it into a wall outlet it begins the boot-up process.

At this point you have to do the network configuration. That means giving the Mini its IP address for your network, telling it which email server to use when it sends out status alerts, and so on.

To do that you have to plug a laptop into the Mini. You could carry the Mini over to a desktop machine and plug it in there, but the Mini will probably be set up in a server room or a closet somewhere, so a laptop is the easiest way.

There are two RJ-45 ports in the back of the Mini--a network port that connects it to your regular network and a configuration port designed specifically for the initial configuration tasks. Google provides a special crossover cable that lets you plug your laptop into that configuration port, and they've thoughtfully put a tag on that cable telling you what the config IP address is.

You simply fire up a browser on the laptop and point it at the Mini's config IP address. At this point your Mini needs to be told how to use the network port--what IP address it should be, what DNS machines to query, what SMTP server to use, and so on. The Mini's administration console is essentially a series of Web forms that you fill out with this information.

The Mini can obtain some of the needed configuration (e.g., DNS servers and the gateway address) from your network's DHCP server. However, you will need to give the Mini a static IP address (provided by your network admin). And some settings, like the SMTP server, typically aren't handed out by DHCP to begin with.

It all sounds a bit complicated, but it really isn't. There are test functions built into the admin console to help you check whether you've entered valid addresses and names. We may have been a bit lucky, but it took about 30 minutes from the time we first plugged in our Mini until we had the initial configuration finished.
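If you want to sanity-check the values your network admin gave you before typing them into the console, a few lines of Python will do it. This is only a rough sketch of the kind of check involved, not the Mini's own test functions; the function name and the sample addresses are made up for illustration.

```python
import ipaddress

def check_static_config(ip, netmask, gateway):
    """Return a list of problems found in a proposed static network setup."""
    try:
        # strict=False accepts a host address and derives its network from the netmask
        network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    except ValueError as exc:
        return [f"bad address or netmask: {exc}"]
    try:
        gw = ipaddress.ip_address(gateway)
    except ValueError as exc:
        return [f"bad gateway: {exc}"]

    problems = []
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if ipaddress.ip_address(ip) == gw:
        problems.append("appliance and gateway share the same address")
    return problems

# Hypothetical values for illustration only
print(check_static_config("192.168.1.50", "255.255.255.0", "192.168.1.1"))
```

An empty list means the address, netmask, and gateway are at least consistent with each other; it says nothing about whether the address is actually free on your network.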

Our one complaint was that the configuration port is in the back. In theory you'll probably never need to change the network config but we would have liked to see a configuration port in the front just in case. Depending on your rack space it could be a little complicated to get a laptop re-connected to that rear configuration port.

Once you've got the network configured you can unplug the laptop and perform the actual search configuration from any browser on your network. The simplest configuration is to just specify a name for your collection, give it an initial URL where it should begin the crawl, and then accept all the other default parameters. If you go that route you could have it crawling within a few minutes.

However, you'll probably want to use at least a few of the other search options. The Mini can index a maximum of 100,000 documents, so the most likely parameters you'll want to control are which URL patterns to crawl and which to ignore. That way you don't waste any of those 100,000 documents crawling URLs you don't need. For example, if your site provides printer-friendly versions of all Web pages you'll probably want the Mini to ignore those.

You do all this through the Mini's browser-based console. The UI is a bit bare but it's also pretty self-explanatory. One helpful item for those of us who don't live and breathe regular expressions--there's a test button that lets you see whether a specific URL will fit the patterns that you've input.
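To get a feel for what that kind of test does, here is a minimal sketch of the crawl-or-ignore decision in Python. It uses Python's regular expressions rather than the Mini's own pattern syntax, which may differ, and the site name and patterns are hypothetical.

```python
import re

# Hypothetical patterns for illustration; the Mini's pattern syntax may differ.
FOLLOW = [r"^http://www\.example\.com/"]
IGNORE = [r"/print/", r"[?&]printable=1"]

def should_crawl(url):
    """Ignore patterns win; otherwise the URL must match a follow pattern."""
    if any(re.search(pattern, url) for pattern in IGNORE):
        return False
    return any(re.search(pattern, url) for pattern in FOLLOW)

for url in (
    "http://www.example.com/products/soap.html",
    "http://www.example.com/print/soap.html",
    "http://www.example.com/soap.html?printable=1",
):
    print(url, should_crawl(url))
```

Checking the ignore list first means a printer-friendly page is skipped even though it lives under a section you otherwise want crawled, which is how you keep such pages from eating into the document limit.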

Once the Mini has finished the crawl, it builds an index from all the documents and then copies that index out to be served from a query interface. The query interface looks, of course, an awful lot like Google.com. If the Mini is running as part of your intranet you can stick with the basic customizations, like using your company logo instead of Google's at the top of the page. If you'll be using the Mini to serve up a public search page you'll want to use the built-in XSLT editor to really customize the pages so that they look like the rest of your site.

Options galore

One of the options we appreciated in the Mini is that you can easily change the user agent the crawler reports when it reads in Web pages. That's essential if your site traffic gets audited and you need to be able to exclude the Mini's crawler from any "real" page hits.
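On the auditing side, the exclusion might look something like the sketch below. It assumes your Web server writes logs in the common "combined" format, where the user agent is the last quoted field on each line, and it uses a made-up user agent string that you would replace with whatever you set in the Mini.

```python
import re

CRAWLER_TOKEN = "acme-search-crawler"  # hypothetical user agent set in the Mini

# In the "combined" log format the user agent is the last quoted field
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def real_hits(log_lines):
    """Drop log lines whose user agent contains the crawler token."""
    kept = []
    for line in log_lines:
        match = UA_PATTERN.search(line)
        agent = match.group(1) if match else ""
        if CRAWLER_TOKEN not in agent.lower():
            kept.append(line)
    return kept

sample = [
    '10.0.0.5 - - [01/Jan/2006:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '10.0.0.9 - - [01/Jan/2006:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "acme-search-crawler"',
]
print(len(real_hits(sample)))
```

Picking a distinctive user agent makes this kind of filter far less likely to throw away hits from real browsers.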

We also like the Synonyms feature. Suppose you have a lot of Macintosh content: you can specify the word "Macintosh" as a synonym for "Apple". When a user searches for "Apple" they'll see a hint at the top of the search result page telling them they can also search for "Macintosh". Click on Macintosh and the Mini will carry out the search.

One thing we found a bit annoying was that Synonyms are strictly one-way. If you need a pair of words to be synonyms in both directions you have to explicitly set up both. Google says this is for maximum flexibility, and that's probably true but it's still a little irritating for those of us who expect synonyms to work automatically in both directions.
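The one-way behavior is easy to picture as a plain lookup table. The sketch below is our own illustration in Python, not the Mini's synonym file format; it shows why "Macintosh" suggests nothing until the reverse entry exists, and how a short script could generate both directions for you.

```python
def suggestions(term, synonyms):
    """Look up alternative search terms for a query word (one direction only)."""
    return synonyms.get(term.lower(), [])

def make_two_way(synonyms):
    """Return a copy of the table with every pair registered in both directions."""
    two_way = {term: list(alts) for term, alts in synonyms.items()}
    for term, alts in synonyms.items():
        for alt in alts:
            reverse = two_way.setdefault(alt, [])
            if term not in reverse:
                reverse.append(term)
    return two_way

one_way = {"apple": ["macintosh"]}
print(suggestions("Apple", one_way))        # hint offered for "Apple"
print(suggestions("Macintosh", one_way))    # nothing in the other direction
print(suggestions("Macintosh", make_two_way(one_way)))
```

Generating the reverse entries from a single list is one way to take the sting out of having to set up both directions by hand.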

Another search option we liked was KeyMatch--for a specific search word or phrase you can specify a URL and text to display at the top of the result page. It ends up looking and working a bit like the keyword ads on Google.com. If you're having a special sale on soap, you might set up a KeyMatch so that whenever somebody searches for "soap" they see a short note at the top of the page and a link that takes them to the sale page.
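Conceptually a KeyMatch is just a table that maps a trigger word to a promoted note and link. Here is a rough Python sketch of that idea; the matching rule (the keyword appearing as a whole word anywhere in the query), the note text, and the URL are all assumptions made for illustration, and the Mini's actual matching options may differ.

```python
# Hypothetical KeyMatch table: trigger word -> (note text, link)
KEYMATCHES = {
    "soap": ("Soap sale: this week only", "http://www.example.com/sale/soap"),
}

def keymatch_for(query):
    """Return the promoted (text, url) pair for a query, or None if nothing matches."""
    words = query.lower().split()
    for keyword, promo in KEYMATCHES.items():
        if keyword in words:
            return promo
    return None

print(keymatch_for("lavender soap"))
print(keymatch_for("shampoo"))
```

A search for "lavender soap" gets the sale note ahead of the regular results, while "shampoo" gets the normal result page with nothing promoted.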

Pricey from here

We definitely recommend the Google Mini if you need an easy-to-use way of indexing and searching a relatively small collection of Web pages. It's almost trivial to set up and get started, much easier than having to set up your own separate hardware and search software.

However you might want to think about the upgrade path. If the Mini's 100,000-document limit ever becomes an issue, you could buy a second Mini. But that means having to find a way to configure them so they crawl separate sections of your site, and it means users would have to query both interfaces to search your site.

Your other option is going for the big Google Search Appliance, which can index anything from 500,000 documents on up. But it starts at a $30,000 price point, a substantial jump from the Mini's more reasonable $2,995 price.
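To put rough numbers on that jump, the sketch below works out how many Minis it would take to cover a given collection and what they would cost, using only the prices and limits quoted in this review. It ignores the real cost of running several separate search interfaces, so treat it as back-of-the-envelope arithmetic.

```python
import math

MINI_PRICE = 2995          # dollars, per the review
MINI_LIMIT = 100_000       # documents per Mini
APPLIANCE_PRICE = 30_000   # starting price of the Google Search Appliance

def minis_needed(documents):
    """Number of Minis required to hold the collection."""
    return math.ceil(documents / MINI_LIMIT)

def mini_cost(documents):
    """Total hardware cost of covering the collection with Minis alone."""
    return minis_needed(documents) * MINI_PRICE

for docs in (100_000, 250_000, 1_000_000):
    print(docs, minis_needed(docs), mini_cost(docs), mini_cost(docs) < APPLIANCE_PRICE)
```

On hardware price alone, Minis stay cheaper than the appliance's starting price up to about a million documents, which is exactly why the need to split the crawl and query each box separately is the real cost of that route.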