Scroogle: Adding privacy to Google Search

Google Search is an amazing tool. Even so, to many, it has a dark side. Scroogle may be able to help.

Over the years, I've witnessed--from a safe distance--highly-charged debates about search behemoths like Google. The topic most often discussed is whether or not they retain too much Personally Identifiable Information (PII) for too long. Valuable lessons surfaced from those frank discussions, many important enough for me to write about.

Another place where I have gleaned similar information has been in the comment sections of the articles I just mentioned. One example is my introduction to Scroogle.

My first impression was: What an odd name. I didn't think much more of it. Then a colleague gave his middle-finger explanation of the term. "Oh," was all naive me could say, "You really think so?"

Scroogle, what is it?

Now I had to find out about Scroogle. First thing that caught my eye:

"Every day Scroogle crumbles 350,000 cookies and blocks a million ads."

Next thing I noticed, Scroogle does not:

  • Pass cookies on.
  • Keep search-term records.
  • Retain access logs for more than 48 hours.

The website calls Scroogle a scraper. Being from Minnesota, I have this image of a scraper and it is not Scroogle.

Actually, after some study, referring to it as a scraper does make sense. The pertinent search results are "scraped" from Google's response to the search query. Only that information gets back to the client's web browser, with no cookies or additional requests.
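For the curious, here is a minimal Python sketch of what "scraping" means in this sense. It is not Scroogle's code: the sample page, the `result` class name, and the `ResultScraper` helper are assumptions made up for illustration. The idea is simply to keep the result links and titles and ignore everything else on the page.

```python
from html.parser import HTMLParser


class ResultScraper(HTMLParser):
    """Collect (url, title) pairs from <a class="result"> tags (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._current_href = None  # set while we are inside a result link
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("class") == "result":
            self._current_href = attr_map.get("href")
            self._title_parts = []

    def handle_data(self, data):
        # Only text inside a result link is kept; scripts and ads are ignored.
        if self._current_href is not None:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._title_parts).strip()
            self.results.append((self._current_href, title))
            self._current_href = None


def scrape_results(html):
    """Return the list of (url, title) pairs found in the page."""
    parser = ResultScraper()
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up results page: two results, one ad, one tracking script.
SAMPLE_PAGE = """
<html><body>
<script>track_user()</script>
<a class="result" href="http://example.com/ice-scrapers">Ice scrapers</a>
<a class="ad" href="http://ads.example.net/click">Buy now</a>
<a class="result" href="http://example.org/winter">Winter driving tips</a>
</body></html>
"""

if __name__ == "__main__":
    for url, title in scrape_results(SAMPLE_PAGE):
        print(title, "->", url)
```

A real results page is far messier and changes often, so a production scraper would need much more care than this.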

The following slide depicts the steps involved (courtesy of Scroogle):

Behind the scenes

The process is simple. You enter your search request in the web browser, like normal. It is sent to Scroogle via an SSL connection -- more on that later. Scroogle replaces all your identifying information with that of Scroogle. The search request is forwarded to Google. Google records the IP address and search information issued by Scroogle.
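A rough sketch of this kind of proxy-side cleanup might look like the following. The header lists and function names are assumptions chosen for illustration, not a description of what Scroogle actually strips. The first function drops fields that could identify the searcher before a request is forwarded; the second drops cookies from a reply before it is passed back.

```python
# Request headers that can identify the original searcher.
# This list is an illustrative assumption, not Scroogle's actual list.
IDENTIFYING_REQUEST_HEADERS = {"cookie", "referer", "user-agent", "x-forwarded-for"}

# Reply headers that would plant state in the searcher's browser.
STATEFUL_RESPONSE_HEADERS = {"set-cookie"}


def sanitize_request(headers):
    """Return a copy of the request headers without identifying fields."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_REQUEST_HEADERS
    }


def sanitize_response(headers):
    """Return a copy of the reply headers without cookie-setting fields."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in STATEFUL_RESPONSE_HEADERS
    }


if __name__ == "__main__":
    outgoing = {"Accept": "text/html", "Cookie": "id=123", "User-Agent": "MyBrowser"}
    incoming = {"Content-Type": "text/html", "Set-Cookie": "PREF=abc"}
    print(sanitize_request(outgoing))
    print(sanitize_response(incoming))
```

The searcher's IP address never appears in these headers at all. It is hidden simply because the proxy, not the browser, opens the connection to the search engine.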

Google then replies with a cookie and the search results. Scroogle sanitizes the data, sending only the search results back to you. Below are the search results for ice scraper using Google:

Next are the results using Scroogle:

Scroogle, the plugin

The website calls Scroogle a browser plugin. Simple enough to implement, but I'd like to expand on the minimal help offered by the website:

  • Firefox: This link is to the Firefox add-on. All that is required is to click on the Add-on button.
  • Internet Explorer: Microsoft set up Internet Explorer to ask for the desired search engine. Details are at this link. All that is required is to enter http://www.scroogle.org/cgi-bin/nbbw.cgi?Gw=TEST where it asks.
  • Opera: Click on the following: Tools/Preferences/Search/Add. Pick a new keyword "example" and use http://www.scroogle.org/cgi-bin/nbbw.cgi?Gw=%s as the address.
  • Chrome: Click on Wrench/Options/Default Search Manage/Add. Then paste https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi?Gw=%s where a URL is requested.
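In the Opera and Chrome entries above, %s is a placeholder that the browser swaps for your search terms. The sketch below mimics that substitution in Python. The `build_search_url` helper is a made-up name, and form-style encoding (spaces become plus signs) is an assumption, since browsers differ in exactly how they encode the terms.

```python
from urllib.parse import quote_plus

# Template taken from the Chrome entry above.
SCROOGLE_SSL_TEMPLATE = "https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi?Gw=%s"


def build_search_url(template, terms):
    """Replace the %s placeholder with URL-encoded search terms."""
    return template.replace("%s", quote_plus(terms))


if __name__ == "__main__":
    print(build_search_url(SCROOGLE_SSL_TEMPLATE, "ice scraper"))
```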

If you prefer not to alter the current configuration of your web browser, or are using a computer other than your own, Scroogle has a webpage similar to Google, where you can enter search terms.

Back to SSL

The Scroogle website points out why the creators decided to use SSL connections:

"For Scroogle, SSL is used to hide your search terms from anyone who might be monitoring traffic between your browser and Scroogle's servers. This encryption happens when you send your search terms to Scroogle, and it also happens when Scroogle sends the results of your search back to you."

The SSL webpage points out another advantage that I was not aware of:

"When the Scroogle results come back from an SSL search, and you click on any of the links shown on that secure page, there is another advantage. SSL does not allow the browser to record the address where that secure page came from and attaches it to any outgoing non-SSL links on that page. Normally all browsers do this and it's called the "referrer" address.

Using SSL blanks out this referrer, so that any non-SSL site you click on from a Scroogle SSL page won't know that you arrived at their site from Scroogle. The referrer will be blank, and your log entry at that site will look like any of the hundreds of bots that crawl the web all day and night with similar blank referrers."

I did not know that until now.
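The behavior in that quote can be modeled in a few lines. This is a simplified sketch of the long-standing browser convention of omitting the referrer when moving from a secure page to a non-secure one. Real browsers and site-level referrer policies vary, and `sends_referrer` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def sends_referrer(from_url, to_url):
    """Return False when following the link would leak a secure page's
    address to a non-secure site (HTTPS -> HTTP), True otherwise."""
    from_scheme = urlsplit(from_url).scheme.lower()
    to_scheme = urlsplit(to_url).scheme.lower()
    is_downgrade = from_scheme == "https" and to_scheme == "http"
    return not is_downgrade


if __name__ == "__main__":
    secure_results = "https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi"
    print(sends_referrer(secure_results, "http://example.com/"))   # secure -> plain
    print(sends_referrer(secure_results, "https://example.com/"))  # secure -> secure
```

Note the second case: under this model, a link from the secure results page to another SSL site may still carry a referrer.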

That said, do not let the use of SSL connections lure you into a false sense of security. SSL may or may not be in play after you click on one of the returned search links. It depends on whether the web server advertised in the link is using SSL or not.

Both use SSL

Google also has the option to use SSL, and it makes the same claim: encryption prevents third parties from intercepting transmissions between the user's computer and the Google Search web servers.

My immediate thought: It would be cool if the Scroogle servers used SSL for their connection to Google Search. I shot off an email to Scroogle, and Daniel Brandt, Founder and President of Scroogle, offered this:

"No, the connection between my servers and Google does not use SSL.

There are two reasons for this:

  • The search terms for that hop are carried by the IP address of my server, and the only way they can be associated with the searcher's IP address would be if someone hacked into my dedicated servers and read my logs. And they'd have to be quick about it, because I don't keep any logs longer than 48 hours. I'm the only one with access to my servers.

  • I do not use DNS to do a lookup of www.google.com. Instead, I randomly select one of their static IP addresses for www.google.com (they have thousands). As you may know, https initiation requires a handshake that certifies that the domain name belongs to the IP address. Since I'm not using "www.google.com" at all, I cannot initiate an https session with Google."

That makes sense to me. Thank you for clearing that up, Daniel.
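To picture the second point, here is a small sketch of choosing an upstream address at random instead of doing a DNS lookup. Everything in it is an assumption for illustration: the addresses come from the reserved documentation range rather than Google's real pool, and the `/search?q=` path and the `build_upstream_url` helper are stand-ins, not Daniel's actual code.

```python
import random
from urllib.parse import urlencode

# Placeholder addresses from the reserved documentation range (192.0.2.0/24).
# These are NOT real Google addresses.
ADDRESS_POOL = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]


def build_upstream_url(terms, pool=ADDRESS_POOL, rng=random):
    """Pick one address at random and build a plain-HTTP search URL for it."""
    ip_address = rng.choice(pool)
    query = urlencode({"q": terms})
    # The host part is a bare IP address, so no domain name is looked up.
    return f"http://{ip_address}/search?{query}"


if __name__ == "__main__":
    print(build_upstream_url("ice scraper"))
```

Because the request goes to a bare address over plain HTTP, no DNS lookup and no certificate check are involved on that hop.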

Quality of SSL connection

I just happen to be researching a new Comodo website, SSL Analyzer. It is a free web-based scanning tool that checks the security of a web server providing SSL connections.

Included in the summary is information about the certificate and digital signature. Also included is a list of security protocols and encryption suites supported by the web server.

SSL Analyzer uses the following designations to highlight problems:

  • Red: Problem that needs immediate attention.
  • Amber: Potential issue that needs evaluation.

With so much emphasis being placed on SSL connections, I thought, why not test them? Here are the results for Scroogle and the results for Google Search. You can see that both have issues. I am not sure I would consider them show-stoppers, but it is something to think about.
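If you want a quick look of your own, Python's standard ssl module can report what a single connection negotiates. This is only a sketch, and it is far less thorough than a scanner like SSL Analyzer: it shows the one protocol and cipher suite chosen for one handshake, not everything the server supports. The function names are made up for this example.

```python
import socket
import ssl


def format_report(host, protocol, cipher):
    """Turn the negotiated protocol and cipher tuple into one readable line."""
    name, _cipher_protocol, secret_bits = cipher
    return f"{host}: {protocol}, {name} ({secret_bits}-bit)"


def check_server(host, port=443, timeout=5.0):
    """Connect to the server and report the negotiated protocol and cipher."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_socket:
        with context.wrap_socket(raw_socket, server_hostname=host) as tls_socket:
            return format_report(host, tls_socket.version(), tls_socket.cipher())


if __name__ == "__main__":
    # Needs network access; substitute any HTTPS host you want to inspect.
    print(check_server("www.google.com"))
```

A connection that fails certificate or hostname verification raises an exception here instead of returning a report.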

Bottom line

Now comes the hard part. After all is said and done, it ends up being a matter of trust. If using Google Search is important, but you are not sure about trusting Google, you may want to think about Scroogle.

About

Information is my field...Writing is my passion...Coupling the two is my mission.

27 comments
KahunaNui

Don't forget Yippy (formerly Clusty), which many (including me) believe provides much better search results than Google. No SSL, but they do care about privacy.

cdedbdbunny

Although I have concerns with the length of time Google retains personal information and the possibility that this data may end up in the hands of the government, there is a rational reason for the cookies and search term data being farmed from users. Google stays in business from Adwords and other advertising. They know that if that advertising is more relevant to someone's search or the page on which they land, more people are likely to click on the ad. For example, someone somewhere may find my site in a search. That search was customized by Google from the location of the IP address so that local websites will show up first. This way when they do a search on "oak furniture," local results come up first. If the search is specific enough, my site will come up first no matter where the location. However, on the top or sides of the page of Google search results, ads that are local will more than likely appear. Once on my site, Google Adsense will send ads to the user that either are local for that person or are related to previous pages this person has visited that also have Google cookies. Last month I visited a site to buy a streaming video player for myself and now suddenly I see ads everywhere advertising this very same streaming player. The imperfection in this is that I already bought this and don't need to see any further ads. However, for others, it may be useful, especially if they didn't buy it and are still considering a purchase. Google does all of this purely as a business decision to increase revenue through advertising. They don't have malicious intent with the data. Even the wireless point data they obtained from their Google Street View cars that roam the street was recorded so that when people search for free wireless locations, Google can tell them where they found it. However, I think this is where I draw the line. 
I don't think people who have unprotected wireless at their homes necessarily want people all over the world to know they have it. That's like Google placing a sign in the front lawn of a house that says, "This home is not protected by any home security." There is also a concern that our government, as it continues to encroach on our freedoms, may subpoena Google for information we would prefer to remain private.

nwhittier

But why not just cut over to a more privacy-sensitive search engine like

pgit

I set every browser I touch to use scroogle scraper as the home page. Nobody has ever complained, in fact the opposite. A few folks have read the random data that appears on the scraper web page, and have learned a thing or two about the internet and privacy. Thank you scroogle! They have had a few outages because google had removed a certain URL that scroogle is scripted to work with. According to their releases at the time, they would not be able to use the content on the main google site, it changes too frequently and would require too much re-scripting to keep it working. So scroogle might notice an abundance of hits from my neck of the woods. Off the top of my head there must be 200 people with the scraper set as their home page. BTW after any reinstall, when I reestablish the browser and go to scroogle scraper to set as home, I have learned to double check that I have typed ".org" at the end of the URL... it's especially important if the owner/user is looking over my shoulder. :p

Daniel Brandt

Scroogle does 350,000 searches a day, and by all mainstream media accounts I've seen in the last year, Google does about 1 billion a day. In other words, Scroogle is 0.035 percent of Google's total.

jth4944

First, excellent article MK, thanks. I'll give Scroogle a try. I found flhtc's related comment interesting also, and reminiscent of the Enterprise vs. the Borg battles whereby the strength of the Borg ship is undermined when Enterprise changes the frequency of its phasers. The Borg counters by changing the frequency of its defenses which in turn is countered by the Enterprise changing its frequencies again, and so on, and so on. Just replace the ever-changing frequencies in the story with the algorithms designed by Scroogle & Google to undermine each other. And, I leave it up to you to determine which fictional character represents Google and Scroogle. :-)

flhtc

But, now that the cat's out of the bag, it won't be as easy. Lately google has been changing the output from their searches. Just enough to mess with scroogle. Now, they (google) have been limiting the number of requests per IP address per some time frame. During periods of high usage this can make scroogle unusable for a period of time. When it does happen, you'll get a message from scroogle saying "Try back in 10 minutes". In the voice of Eeyore, Thanks for noticing, Mike! LOL BTW... Write on Mr. Kassner, I DO enjoy your "other" articles.

bboyd

Oh no, my Scroogle will get slower because of all this advertising! Thanks for feeding the flock, MK. I've always felt that using it added one more feature: search neutrality. The searches aren't tuned for me, so they give somewhat different results. This does raise one question: are results modified because it's Scroogle, or are they unmodified? When using Scroogle, is the search result modified based on what the mass of users search for and click, or are the results of the mass ignored by Google? I have a feeling that I'm being inarticulate.

kmeuskens

Very interesting article. Unfortunately I can't get to Scroogle because my corporate proxy has it blocked under the category 'Porn'!?! Anyway, I prefer not to rely on just one search engine and I enjoy excellent results from multi-search websites Dogpile and Yippy. I'm not sure if Google is able to collect data from that type of search...

Craig_B

OK, I don't trust Google, so I start using Scroogle. How do I know I can trust Scroogle or anyone else out there?

Michael Kassner

I certainly get it and there is truth to it as well. Google gets billions of search requests, Scroogle is like 5 per cent of that, I have heard. If Scroogle gets too popular, Google may take notice.

Michael Kassner

I was going to mention your concern, but oopsed. Thanks for reading my other stuff too.

Michael Kassner

It's possible that ecommerce-oriented search terms are affected by the fact that my servers are located in Florida and Arizona. The IP address of my servers would reveal these locations to Google. However, for non-commercial terms, I've been checking certain items daily for years on Scroogle, and they seem quite stable. They are affected mainly by PageRank, and also by "recency" -- how recent an important new backlink to the URL shown by Google may be. Since there are never any cookies sent or recorded by Scroogle, I assume that there is no "personalization" involved on Google's part. It must be based only on location, which is based on my server's IP address and/or the user's language selection. I just use English -- if you select a different language, Google may assume you are in that country even though my server says you are in either Florida or Arizona. The random IPs I use from Google are from all over the world. This does not appear to have any effect on non-commercial rankings, because there is no movement in the rankings I watch when I repeat the search. A repeat of the same search on Scroogle would most likely mean a switch to a different Google data center, which could even be in a different country.

Michael Kassner

You are making sense, and I did not think about that aspect. If Behavioral Targeting is employed, the results will be skewed. I will ask the developer about this. Thanks.

blair.howze

Make sure you are not using .com behind scroogle.

Michael Kassner

Particularly DogPile, they send cookies to your browser and retain information about you. They do not say for how long either. Google mentions they retain it for 9 months I believe. Scroogle points out it retains logs for 48 hours.

Michael Kassner

But Scroogle's developer states publicly that he does not keep data for more than 48 hours. I don't believe that Google can say the same.

Michael Kassner

I have read the privacy policy and noticed two things. They add a cookie; they say it is benign, but they add one. They also maintain log files like Scroogle, but unlike Scroogle, they do not say for how long. I have not heard much about StartPage, do you have any more thoughts?

pgit

I go the home page route mainly to get eye-time on the messages scroogle puts out. I've had a lot of great discussions with clients who'd read something on one of those that got them thinking about something. All good. So you could say my method is politically driven. Ease of use never factored into it... I suppose it should. :p

kmeuskens

I was using .com. Embarrassed now. Thanks!

Spitfire_Sysop

I personally block most cookies anyways. Here is what I found about the cookie they use: "Startpage's 'preferences' cookie is a non-identifiable cookie and safe to accept, but cookies in general can pose a threat to your privacy." It's for all of the things on the settings page. Without this, you have to go to the settings page every visit -OR- you can get a custom URL with the settings flags in them and use that as your home page. All of these things make you unique but it depends on who is looking. There is also an HTTPS option to secure your connection up to the ixquick servers, if you are concerned with the man in the middle. "Startpage does NOT record your IP address!" More info: https://www.startpage.com/eng/what-makes-startpage-special.html Sounds like their logging is limited and it is deleted after 48 hours: https://www.european-privacy-seal.eu/press-room/press-releases/20080714-europrise-press-release-en.html

Michael Kassner

It also could promote the users to use the web site with other computers. Good point, Pgit

Michael Kassner

For clearing that up. I wonder what they mean when they say that other "non-personal" information is deleted after 14 days.
