So what does it actually do?

The Google Mini, stripped of the fancy wording and vague feature descriptions, is a search bot. It looks for what it's configured to search for, keeps track of what it finds, and keeps looking for more until it reaches its page limit. At that point it stops adding new pages but keeps its existing index up to date. After setting it up and unleashing it on your unsuspecting web and file servers, you will find your Mini slowly gathering results and building up its index.
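To make the page-limit behaviour concrete, here is a minimal sketch in Python. It is not the Mini's actual crawler: the function names, the `fetch_links` callable and the breadth-first order are all assumptions made for illustration. It only shows the idea of "stop adding at the limit, keep refreshing what you have".

```python
from collections import deque


def crawl(start_urls, fetch_links, page_limit):
    """Breadth-first crawl that stops adding pages once page_limit is reached.

    fetch_links is a hypothetical callable (url -> list of linked urls);
    it stands in for whatever the real appliance does to fetch a page.
    """
    index = {}                  # url -> number of times the page was fetched
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in index:
            continue            # already indexed, nothing new to add
        if len(index) >= page_limit:
            break               # limit reached: no new pages are added
        index[url] = 1
        queue.extend(fetch_links(url))
    return index


def refresh(index, fetch_links):
    """Re-fetch pages that are already indexed, without adding new ones."""
    for url in index:
        fetch_links(url)
        index[url] += 1
```

With a limit of three pages, a five-page site ends up with exactly three indexed pages, and later calls to `refresh` update those three without growing the index.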

Once the Mini is online, a user visiting its IP address finds the familiar Google search page. As indexing progresses, the Mini begins to return the results one would expect. It searches the designated websites and automatically indexes the files on the designated file servers, as well as the contents of those files. The crawler handles common formats such as .pdf, .doc and .xls (full list available here).

One possible area of concern (which we certainly had) is the Mini's ability to search content that normally can't be accessed without authentication. The Mini includes some basic authentication methods for both websites and file shares. We configured our Mini to crawl our Samba file server by providing it with an existing account with read rights. Secure websites definitely pose a bigger challenge, but the Mini is equipped for most basic authentication processes: we had no problem configuring it to use HTTP-based and HTTPS-based login procedures. More advanced authentication methods require the more expensive Google Search Appliance (GSA).

A full list of all possible authentication methods - and a comparison to what the GSA can do - can be found here.
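For readers unfamiliar with what a simple HTTP login looks like from a crawler's side, the sketch below builds a request carrying HTTP Basic credentials. That the login in question is HTTP Basic authentication is our assumption here, and the helper names, URL and account are hypothetical. The Mini itself is configured through its admin interface, not through code like this.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


def build_request(url, username, password):
    """Build (but do not send) a request carrying Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical read-only crawler account on a hypothetical intranet host.
request = build_request("https://intranet.example.com/secure/", "crawler", "secret")
```

Basic credentials are only base64-encoded, not encrypted, which is one reason to prefer the HTTPS variant for anything sensitive.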

Setting the Mini up properly might require some snooping into the help documentation (which, ironically, isn't searchable). Note that checking the "Make Public" checkbox makes secure search results public. The end result is that everyone can see the corresponding URLs in their search results page, but they still need to authenticate if they choose to open a file that requires it.

Leaving the "Make Public" checkbox unchecked requires people to authenticate before viewing search results from a secured web server. However, the Mini doesn't yet support this type of restriction for file servers, meaning that these files are publicly visible to anyone using the search system. The Mini's administrator should therefore take care not to index confidential files on file servers.
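The visibility rules described above can be summarized as a small decision function. This is only our reading of the behaviour for secured content, written as a Python sketch; the function and parameter names are hypothetical and nothing here reflects the Mini's internals.

```python
def result_is_visible(source_type, make_public, user_authenticated):
    """Decide whether a hit from secured content shows up in the results page.

    source_type: "website" or "fileserver" (labels chosen for this sketch).
    make_public: state of the "Make Public" checkbox for that content.
    user_authenticated: whether the searching user has logged in.
    """
    if source_type == "fileserver":
        # No per-result restriction exists for file shares, so every
        # indexed file is visible to anyone who can search.
        return True
    if make_public:
        # The URL is shown to everyone; opening it may still need a login.
        return True
    # Secured website, not public: results only appear after authentication.
    return user_authenticated
```

The first branch is the one to worry about: whatever the checkbox says, an indexed file share is visible, so the only safe control is not indexing confidential shares at all.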


Adding in some credentials for our secure content.

Larger organizations may appreciate the ability to use different "collections" for different users or situations (see our previous Mini review for more information on collections). The Mini's administrator can create several collections (knowledge-base articles and news messages, for example) and have these searched separately. Different front-ends can be added and customized to fit the website in question, making this separation transparent to the user; AnandTech's own search function is an example of this.
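As a rough illustration of what searching collections separately means, the Python sketch below restricts a query to one named subset of the index. Defining a collection by URL prefix is an assumption made for this example, and the collection names, URLs and `search` function are all hypothetical.

```python
# Hypothetical collections, each defined by a list of URL prefixes.
COLLECTIONS = {
    "kb": ["http://intranet.example.com/kb/"],
    "news": ["http://intranet.example.com/news/"],
}


def search(index, query, collection):
    """Return URLs in the given collection whose text contains the query.

    index maps url -> page text; matching is a plain case-insensitive
    substring test, far simpler than real relevance ranking.
    """
    prefixes = tuple(COLLECTIONS[collection])
    needle = query.lower()
    return [
        url
        for url, text in index.items()
        if url.startswith(prefixes) and needle in text.lower()
    ]
```

A front-end for the knowledge base would always pass `"kb"`, so its users never see news items even though both live in the same index.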

Once properly configured, the Google Mini is essentially a basic Google bot. For small intranet-based user groups, this might be all that's ever needed. However, integrating the Google Mini with existing websites supporting a large user base (such as AnandTech.com) calls for some extra functionality to make the addition more seamless, and to take advantage of the full capabilities of the system.


19 Comments

  • GhandiInstinct - Friday, December 21, 2007

    lol
  • legoman666 - Friday, December 21, 2007

    I would have expected this product to be a few years old with hardware like that. A Prescott? Seriously? And no RAID?
  • razor2025 - Friday, December 21, 2007

    It's a search engine appliance. The product's main focus is its software algorithm, not how "fast" the hardware itself is. Why would it need RAID? Any sane network/system administrator will have this box backed up at regular intervals to the backup array / server. RAID != backup, and this product doesn't need the file system performance either.
  • legoman666 - Friday, December 21, 2007

    I didn't comment about the Prescott and the lack of RAID based on a performance concern. The Prescott is hot and inefficient, so why not get something that uses less power (i.e., a C2D) even if it doesn't need the added processing power of a newer chip? That way, they could market it as an efficient device or green or whatever.

    As for the RAID, I am not talking about RAID0 (technically that's not even RAID), I was leaning more towards RAID1 or RAID5. They mentioned in the review that it took 36 hours to crawl to the 50000 document capacity. I'm sure most people wouldn't want their search function down for 36 hours while the engine reindexes because it wasn't backed up. Not only that, but you'd probably have to send it back to Google for repairs with only a single drive. With 2 in RAID1, if one dies, a replacement could easily be swapped in.
  • razor2025 - Friday, December 21, 2007

    Maybe it's an option you can request from Google. As for your take on RAID, you're still treating it as backup. It would be much simpler if they had a second backup Google Mini instead. Look, they're charging you for the license per document, not for how many Minis you have hooked up. Also, it's in a 1U form factor. I highly doubt they can manage to squeeze in another drive to satisfy your "RAID!" obsession.
  • Justin Case - Friday, December 21, 2007

    Backups take time to restore from. RAID1 means no downtime. It *is* a backup, and one that's available instantly.

    It doesn't replace regular, preferably _remote_ backups, but it's a pretty basic feature of any system designed to have zero downtime.
  • reginald - Wednesday, January 2, 2008

    RAID and backup are two entirely different things. No RAID in the world can protect you against the same things as backups can (handling errors, programs incorrectly overwriting data, etc). And backups can never replace RAID to achieve continuous availability.

    Thinking you need no backups because you have RAID is like thinking you need no seatbelt because you've got insurance. They simply aren't the same.
  • rudder - Friday, December 21, 2007

    Prescott performance aside... as the article mentioned this is a 24/7 device... why use such a toaster of a cpu when Core2Duos would not add a whole lot to the bottom line?
  • Calvin256 - Tuesday, January 1, 2008

    If you're looking at the prices as a consumer, that may be the case, but you need to remember that Google/Gigabyte is not you or me. When purchasing in bulk, those processors can be VASTLY cheaper than we could ever hope to pay, even when they're in the bargain bin at shadyetailer.com. Things made for consumers can easily be marked up 200-2000%, while things made for OEMs might have a 50-100% margin.