A few years back, Google developed a local search program for workstations called “Google Desktop.” It was designed to let you search your local documents in much the same way that Google Search lets you comb the web. It offered integration with other Google products such as Gmail and Talk, and thanks to its caching capabilities it let you work with your email offline.
I never really got on the Google Desktop train, since I felt it caused performance problems that resulted in user complaints to my company’s help desk. At the time my users relied on Google products only for their personal endeavors, so we discouraged installing the Desktop application in order to maintain business operations.
There were also security concerns around the program. Exploitable vulnerabilities were discovered, which Google addressed in an updated version, but privacy issues plagued Desktop as well. Of particular note, the default options in one version would store files on Google servers where other users could access them, putting confidential data at risk. Granted, this required those individuals to know the Google credentials involved, but history has shown us numerous examples of ID and password theft by the bad guys.
Google Desktop was finally retired in 2011 with Google stating “in the last few years, there’s been a huge shift from local to cloud-based storage and computing, as well as the integration of search and gadget functionality into most modern operating systems. People now have instant access to their data, whether online or offline. As this was the goal of Google Desktop, the product will be discontinued.”
The goal of being able to search local resources for relevant content is a good one, and a clear business need exists for it, as evidenced by search programs such as Coveo, Attivio and dtSearch. Google’s methodology for providing enterprise search capability shifted from a desktop concept to a server concept.
Enter the Google Search Appliance.
(image provided by googsolution.biz)
What is a Search Appliance?
The search appliance runs as a server inside or outside your network. It indexes critical content by “crawling” your company resources (or public web servers) for data so users can locate it using the same search methods offered on the Google Search site. This may include websites, file shares and databases. You can configure the type and location of your data to ensure only appropriate content is shown to users. Reporting capabilities can show you how your employees are using the appliance so you can get the best results from it.
(image provided by developers.google.com)
As shown above, it’s easy to specify the URLs and file shares which the search appliance should crawl, as well as to filter out unwanted locations or file types.
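To make the include/exclude idea concrete, here is a minimal Python sketch of how crawl filtering of this kind can work. It is not the appliance's own code or pattern syntax: the pattern lists, the example hosts and the `should_crawl` helper are all made up for illustration, and simple shell-style wildcards stand in for whatever the admin console really accepts.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only; the appliance's real
# pattern syntax may differ from these shell-style wildcards.
FOLLOW_PATTERNS = [
    "http://intranet.example.com/*",
    "smb://fileserver.example.com/shared/*",
]
DO_NOT_CRAWL_PATTERNS = [
    "*.exe",
    "*.tmp",
    "http://intranet.example.com/private/*",
]


def should_crawl(url: str) -> bool:
    """Return True if the URL matches a follow pattern and no exclusion."""
    followed = any(fnmatchcase(url, p) for p in FOLLOW_PATTERNS)
    excluded = any(fnmatchcase(url, p) for p in DO_NOT_CRAWL_PATTERNS)
    return followed and not excluded


if __name__ == "__main__":
    for candidate in (
        "http://intranet.example.com/docs/handbook.pdf",
        "http://intranet.example.com/private/salaries.xls",
        "smb://fileserver.example.com/shared/setup.exe",
    ):
        print(candidate, should_crawl(candidate))
```

The point of checking exclusions separately is that an unwanted location or file type is skipped even when it sits underneath a location you otherwise want indexed.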
What kinds of content can it index?
The full list of file formats which the search appliance can index is lengthy and diverse, numbering several dozen available extensions, and it covers all the popular, widely used file formats.
The search appliance can even index ancient document types such as those created in Lotus 1-2-3 or WordStar for DOS!
Attributes such as who created the document, when it was created and where it resides can help streamline search results. It’s also possible to add keywords to content so it is more easily located; “move_project” can be applied to data related to a cross-town company move, for instance. This can be performed by users as well as admins.
Content is not crawled if the server or site has a “robots.txt” file at its root location that disallows it; the file tells the search appliance not to index the material at that resource. This is the same mechanism that keeps Google Search from indexing such material on the public internet.
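If you want to see how a crawler reads such a file, Python's standard library ships a parser for it. The sketch below is only an illustration of the general mechanism, not of the appliance's internals. The file contents and host name are invented, and the crawler name “gsa-crawler” is my assumption about the user agent the appliance announces, so check your own configuration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt contents for illustration.
ROBOTS_TXT = """\
User-agent: gsa-crawler
Disallow: /confidential/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Ask the parsed rules whether this crawler may fetch this URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://intranet.example.com"
    print(may_crawl("gsa-crawler", base + "/wiki/home"))
    print(may_crawl("gsa-crawler", base + "/confidential/plan.doc"))
    print(may_crawl("otherbot", base + "/wiki/home"))
```

In this made-up file the named crawler is kept out of one directory only, while every other crawler is kept out of the whole site.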
In terms of capacity, Google states they currently provide two versions of the appliance:
- The “G100”, which can index up to 20 million documents.
- The “G500”, which can index as many as 100 million documents.
You can add devices as your needs expand, so starting out with the G100 and later adding another G100 (or upgrading to the G500) is a possibility.
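For rough sizing, the arithmetic is simple enough to sketch in a few lines of Python. The capacities are the figures quoted above. The assumption that capacity adds up linearly when you add devices is mine, so treat the result as a first estimate and not as a quote from Google.

```python
# Document limits per model, as quoted above.
CAPACITY = {"G100": 20_000_000, "G500": 100_000_000}


def units_needed(model: str, document_count: int) -> int:
    """Estimate how many appliances of one model cover a document count.

    Assumes capacity scales linearly with the number of devices.
    """
    capacity = CAPACITY[model]
    # Ceiling division without floats; at least one unit is always needed.
    return max(1, -(-document_count // capacity))


if __name__ == "__main__":
    for docs in (5_000_000, 35_000_000, 120_000_000):
        print(docs, units_needed("G100", docs), units_needed("G500", docs))
```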
How do users access it?
Since the search appliance is a server like any other, users connect to it over the network via their web browser of choice. The example below shows a search conducted via the search appliance URL of “gsa.yourdomain.com.”
(image provided by code.google.com)
In this case the search was conducted for the term “Google Earth.” Note how the results returned include company resources, Salesforce links and public web servers.
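Because the appliance is reached over plain HTTP, the same search can in principle be requested by a script instead of a browser. The Python sketch below only builds the request URL. The “/search” path and the parameter names and defaults (“site”, “client”, “output”) are my assumptions about a typical setup and should be checked against the appliance's search protocol documentation. The host name is the example one used above.

```python
from urllib.parse import urlencode


def build_search_url(
    host: str,
    query: str,
    collection: str = "default_collection",  # assumed collection name
    frontend: str = "default_frontend",      # assumed front end name
) -> str:
    """Build a search request URL for the given appliance host and query."""
    params = {
        "q": query,
        "site": collection,
        "client": frontend,
        "output": "xml_no_dtd",  # assumed value requesting XML results
    }
    return f"http://{host}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("gsa.yourdomain.com", "Google Earth"))
```

Using `urlencode` takes care of escaping spaces and punctuation in the search term, which is easy to get wrong when gluing strings together by hand.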
Is it secure?
One person’s definition of “secure” may be vastly different from someone else’s, so this is a hard question to answer. However, Google has built several security mechanisms into the search appliance. In the first place, unlike the Google Desktop version which caused such a commotion, the default option when configuring crawler access is NOT to make the information public. That option must be specifically selected:
(image provided by www.google.com)
Furthermore, you must configure authentication on the appliance for the locations that require it, as shown above. You can set authentication requirements so users see only the material to which they already have access. For instance, if you index a file share with rigid security permissions, those permissions still apply when users search for data through the appliance.
In other words, the search appliance will not permit unwanted access to private material, but will work with existing permissions/restrictions.
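The general idea of honoring existing permissions at search time can be sketched in Python as follows. This is a conceptual illustration only, not how the appliance is implemented: the `IndexedDocument` type, the group names and the `visible_results` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    """Hypothetical index entry carrying the source's access rules."""
    url: str
    public: bool = False  # mirrors the "not public by default" behavior
    allowed_groups: frozenset = frozenset()


def visible_results(results, user_groups):
    """Keep only the results the searching user is already allowed to see."""
    groups = set(user_groups)
    return [
        doc for doc in results
        if doc.public or groups & doc.allowed_groups
    ]


if __name__ == "__main__":
    hits = [
        IndexedDocument("smb://files/hr/salaries.xls",
                        allowed_groups=frozenset({"hr"})),
        IndexedDocument("http://intranet/lunch-menu.html", public=True),
    ]
    for doc in visible_results(hits, {"engineering"}):
        print(doc.url)
```

A user in the hypothetical “engineering” group sees the public page but not the HR spreadsheet, even though both are in the index.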
As I understand the documentation, the search appliance only stores a local index to data; it does not actually copy the data locally. So, if someone breaches your data center and steals the box they won’t get your confidential information (although of course in this scenario they could simply steal your source server(s) or hard drives, so make sure those data centers are well-guarded, folks!)
How much does it cost?
This is often another tough question to answer. The pricing arrangement for the search appliance is based on a per-document structure. A document provided by Google titled “Return on Information: Improving your ROI with Google Enterprise Search” examines the productivity gains offered by the search appliance and estimates that a company with 50,000 employees might pay $45K per year for the product.
A reply to this question on quora.com last summer suggested that “Price depends on number of documents, starting from 500K documents it costs you about $1000 per month up front for 3 years.” A similar reply on stackoverflow.com indicated the search appliance “starts at around $20k for 500k documents/URLs.” Both of these comments are unsubstantiated, of course, but will hopefully provide a ballpark figure for smaller organizations.
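To compare the two unofficial figures on a like-for-like basis, a little arithmetic helps. The Python sketch below uses only the numbers quoted above. It assumes the $20k figure covers the same 500K documents for the whole term, which the stackoverflow.com reply does not actually say, so the comparison is indicative at best.

```python
def total_cost(monthly_rate: int, years: int) -> int:
    """Total paid over the term at a flat monthly rate."""
    return monthly_rate * 12 * years


def cost_per_thousand_docs(total: float, document_count: int) -> float:
    """Spread a total cost across documents, in dollars per 1,000 documents."""
    return total / (document_count / 1000)


# Figures quoted above: $1000 per month for 3 years, 500K documents.
quora_total = total_cost(1000, 3)
quora_rate = cost_per_thousand_docs(quora_total, 500_000)

# Figure quoted above: around $20k for 500K documents (term not stated).
stackoverflow_rate = cost_per_thousand_docs(20_000, 500_000)

if __name__ == "__main__":
    print(quora_total, quora_rate, stackoverflow_rate)
```

On these numbers the first estimate works out to $36,000 over three years, or $72 per thousand documents, against $40 per thousand for the second.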
Google asks interested buyers to contact its Sales department at 855-720-6978, or to visit the Google Search Appliance page and click the “Get in Touch” button to fill out an online form requesting information.
Where can I find out more?
Google’s Search Appliance product page contains some basic details, but for the true nitty-gritty check out their “Google Search Appliance Overview” page, which provides links to documentation, tools, support resources and more.