Can Google survive its blind faith in the algorithm?

It's been a tough year for Google search. It's had a difficult time targeting content farms and has accidentally removed good content.

Google's search engine is a triumph of technology. There's no denying that.

It was the capstone that completed the initial structure of the Internet. But the Internet is now in the midst of a dramatic remodel, and it's unclear whether Google search will get the refresh it needs to make it more appealing than ever or whether it will be one of the things that gets painted over.

Photo credit: iStockPhoto/craetive

Google entered 2011 with two major problems that threatened the company's immediate relevance and its long-term future:

1.) The search results on Google.com were becoming increasingly ineffective because they were littered with "web spam" and articles from "content farms" (sites creating faux content to serve as many ads as possible).

2.) Social media has been replacing traditional web search for many different kinds of information gathering and Google didn't have a legitimate play in social.

The company went a long way toward addressing the second issue in July with the launch of Google+. After several high-profile social flameouts -- such as Google Wave and Google Buzz -- they've pretty much nailed it with Google+.

To be clear, we still don't know whether Google+ will be able to win over the masses. But it has become wildly popular among tech and media professionals, and it is already causing Facebook to react and make changes to buffer itself against people abandoning it for Google+. To dig deeper on this topic, read my article "Why Google+ is about to change the web as we know it."

As huge as social media is, the even bigger challenge for Google has been the declining potency of its search engine. In recent years, Google searches have become a lot less useful and a lot more frustrating. It has become more difficult to find stuff that you know is out there -- even stuff that you've searched for (and found) previously. Another example is pages that have been posted to the web more recently. They get overpowered in the Google algorithm by older pages that have had time to accumulate more incoming links.
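To make the incoming-links point concrete, here is a minimal sketch of link-based ranking. It uses the simplified, textbook form of PageRank, not Google's actual production algorithm, which weighs many more signals. The page names and the tiny link graph are made up for illustration: an older guide that three blogs already link to, and a newer guide that only one of them has discovered.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank via power iteration.

    `links` maps each page to the list of pages it links to.
    This is an illustrative sketch, not Google's real ranking system.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: the older guide has had time to collect links.
links = {
    "blog_a": ["old_guide"],
    "blog_b": ["old_guide"],
    "blog_c": ["old_guide", "new_guide"],
    "old_guide": [],
    "new_guide": [],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph the older guide ends up with the higher score simply because more pages point to it. Nothing in the calculation looks at which guide is more current or more useful, which is the weakness described above.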

The big problem is SEO -- search engine optimization. A whole cottage industry has arisen around helping sites optimize their pages to get ranked as highly as possible in Google. As a result, the sites that land at the top of Google search results have become more about which sites are best optimized than which ones have the best and most relevant content.

Even worse, whole companies have emerged whose entire purpose is to create low-quality content that is highly optimized for Google and loaded up with ads to turn a quick buck. These "content farms" have become big business. One of them, Demand Media -- which shuns the "content farm" label -- is now a public company and brags about having a close partnership with Google.

I'll let you judge for yourself whether Demand Media is a content farm. Below are four articles from its flagship site, eHow. Are these helpful or useful? Would a site that aims to serve readers and not just serve ads publish these?

(We'll talk more in a moment about whether Google considers eHow a content farm.)

Recognizing the growing risks that this stuff poses to Google's relationship with users, and ultimately its business model, the company has moved aggressively in 2011 to fix the situation. It started with a contradictory blog post in January in which Google defended the quality of its search engine as "better than it has ever been in terms of relevance" while also throwing down the gauntlet on web spam (sites that "cheat their way into higher positions in search results") and content farms ("sites with shallow or low-quality content").

Then, it dropped the real bombs -- a series of major updates to its search algorithm. These have been dubbed the "Panda" or "Farmer" or "Panda Farmer" updates (don't laugh). The first one (Panda 1.0) came in February, and it obliterated search traffic to a bunch of sites. Oddly, eHow (the site most notorious for the "content farm" label) escaped unscathed.

Google eventually unleashed Panda 2.0 in April, Panda 2.1 in May, Panda 2.2 in June, and Panda 2.5 in September. According to SEO analyst Sistrix, these Panda updates eventually crushed eHow, which relied on Google search to drive most of its traffic. Despite reports of eHow's traffic dip earlier this year, Demand Media denied that it had been hurt by the Panda updates. Then, earlier this month, the company admitted eHow's traffic problem, although it tried to brush it off as "an internal technical issue." The public hasn't been fooled, as Demand Media's stock has fallen precipitously.

So, Google apparently bagged its big game in the Panda hunt. The problem is that it took months and a lot of algorithmic trial and error to do it, and there was plenty of collateral damage in the process. It's as if Google looked at its backyard, spotted a bunch of dandelions, and instead of taking hand trimmers and going out and clipping them, decided to build a highly advanced chainsaw to deal with them. The chainsaw eventually got rid of the dandelions, but it also whacked some chunks out of the hedges, put some gashes into the ground, and took out part of the back fence.

TechRepublic hasn't been immune to the collateral damage. This site has taken some bullets in the crossfire between Google and eHow. TechRepublic has a long history of publishing in-depth tips, tutorials, and best practices that have a long shelf life and that Google has always loved because they get lots of links from around the web. TechRepublic's content is the exact opposite of both web spam (we've never been great at SEO) and a content farm (we focus on fewer articles and higher-quality content), and yet the Panda updates have cut in half the number of users that Google sends to TechRepublic.

I point this out not as sour grapes or to whine about Google picking on us. TechRepublic will be just fine. We have a large base of loyal users who regularly come to our site -- especially subscribers to our popular email newsletters -- and Google may eventually figure out how to tell the difference between a content farm tip like the ones on eHow and the in-depth tutorials you get on TechRepublic.

Still, what this all comes down to is Google's faith in the algorithm. Google says that it doesn't single out sites to include or reject in Google search results. It simply builds an algorithm that systematically finds the most relevant stuff and ignores (or removes) the least relevant stuff. Google argues that this creates a fairer and more objective system, and that introducing human filtering into the system would make it biased and subjective. While that may be true, the big question is whether human intervention would make Google search more effective, and ultimately more accurate.

The problem with the algorithm (and artificial intelligence in general) is that it has no common sense or wisdom -- at least not yet. Meanwhile, the systems that Google search is increasingly competing with for information discovery -- social search and mobile apps -- use the collective wisdom of the community or targeted experts to deliver better information more quickly than Google search, in many cases.

Despite the early success of the Google+ social experiment, the Panda updates during 2011 show that Google still believes in the algorithm above all things. The company thinks that throwing more math, PhDs, and servers at any problem is the right answer. As we've seen, that approach has started to fail Google in 2011. It has had a difficult time targeting content farms and it has ended up accidentally removing a bunch of useful content in the process. The big question now is whether Google can learn from this experience and change, or if it will eventually fade into becoming a fallback mechanism that people use when they can't find the information they need from social search (asking their Twitter or Facebook friends) or a mobile app.

About

Jason Hiner is the Global Editor in Chief of TechRepublic and Global Long Form Editor of ZDNet. He is an award-winning journalist who writes about the people, products, and ideas that are revolutionizing the ways we live and work in the 21st century.

4 comments
afedwin

This is an interesting article. Could it really be possible Google's automated search engine faces a real threat from (as you put it) "the collective wisdom of the community"? I agree with you that this is a genuine threat. However, the silver lining is that this could lead to Google making more 'intelligent' algorithms or a combination of human-machine filtered results. Either way, it's an excellent opportunity for improvement. Good for internet users. Bad for content farms. Oh, and sorry about your search traffic getting slashed.

admin

I would agree completely with Hiner. Google, in its eagerness to clean its results, has thrown the baby out with the bathwater. It has ended up pushing traditional content-rich sites to the bottom and instead thrown up many one-page sites and MFA sites to the top. Panda appears to go after sites with a large number of pages -- apparently targeting portals. If you have sites which offer hundreds of products, you would be hurt, as you can't write enough content on any single product. Just writing a brief product description on, say, Power Transistor 2N3055 in your electronic catalog will put you at the bottom of the pile. You can't judge the quality of a page based on the number of words. Frankly, you can't create an algorithm to evaluate literary work. How do you objectively evaluate the works of Shakespeare? Do something, Google, before you lose relevance.

yaskil

Google finds its algorithms successful. If someone (SEO) affects search results, can anyone still say that algorithms are successful? PageRank is becoming PageGap to Google and its search results.

JCitizen

to filter those results that they would ordinarily check with a live human. This would take a whopping work load of Google employees. Or maybe they should just buy Watson? C=