You have arrived at the web home of Noah Brier. This is mostly an archive of over a decade of blogging and other writing. You can read more about me or get in touch. If you want more recent writing of mine, most of that is at my BrXnd marketing x AI newsletter and Why Is This Interesting?, a daily email for the intellectually omnivorous.

February, 2011

Google's Opinions

Google's algorithm changes and the impact on search results and content farms.

I don't think it's far-fetched to say that most people think of Google as a public utility. It is, for much of America, the site all of the internet flows through. They trust it to deliver relevant and accurate results to the queries they plug into its search box. Google, for its part, constantly refines the design and algorithm to deliver better results for users.

Of course, Google also makes a lot of money off those same folks by presenting them with contextual ads on the right (and top) of the page. It's those ads that have helped to make Google a very rich company. And while the company frequently claims its search and ad teams have no interaction, there is some evidence that, at least on some occasions, tradeoffs are made between revenue and user experience. Take this brief from the case Rosetta Stone filed against Google for allowing competitors to use its trademark in ads. Although Google has done its best to keep documents out of the public eye, the following quote from an unredacted brief speaks squarely to this issue:

In connections with its 2004 policy change allowing the purchase of trademarks as keywords, Google conducted in-house experiments to assess the user confusion that would result if trademarks actually appeared in the sponsored link text. JA(41)-4362-4363 These experiments concluded that the use of trademarks anywhere in the text of the sponsored link resulted in a "high" degree of consumer confusion. JA(41)-4365-4368 ("For a user, it seems to make little difference whether s/he sees a ™ [trademark] in the ad title or ad body - the likelihood of confusion remains high."); JA(41)-4370-4373 ("87.5% of users were confused at least once during Experiment 2, and 76% of the users were confused at least once during Experiment 4."). Indeed, Google's evidence of user confusion was overwhelming: "Overall very high rate of trademark confusion (30-40% on average per user)[;] . . . 94% of users were confused at least once during the study." JA(41)-4375-4377

I don't include this to blame Google for its decision, as the small change was estimated to "result in at least $100 million, and potentially more than a billion dollars, in additional annual revenue." Rather, I include it to highlight the danger of being one of the most trusted brands in America. But ultimately I bring this all up to get to Google's big announcement of an algorithm change that affects nearly 12 percent of searches (that's a giant number).

The change, which Google announced in an official blog post, targets "sites which are low-value add for users, copy content from other websites or sites that are just not very useful." This has been bubbling for some time now: Google has gotten lots of bad press over the past few months, and Demand Media's recent IPO has made the company, and by extension "content farms," the subject of quite a bit of conversation.

Now most people assume this week's algorithm change was specifically targeting sites like Demand (though the company claims "we haven't seen a material net impact on our Content & Media business"). The assumption is certainly not unfounded, as just a few weeks earlier Google wrote:

As "pure webspam" has decreased over time, attention has shifted instead to "content farms," which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we're not perfect, and combined with users' skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.

Later in that entry they specifically call out their "content guidelines," something I didn't know existed (and which they didn't link to in the body of the post). They explained that "Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google." Curious, I dug in on those guidelines. There are a total of ten, though only two are relevant to content (many of the others are about using proper HTML):

- Create a useful, information-rich site, and write pages that clearly and accurately describe your content.
- Think about the words users would type to find your pages, and make sure that your site actually includes those words within it.

What Demand Media (and many others these days) do is take things lots of people are searching for and create content around them. You've probably been on one of their sites and not even realized it. Although I'm not very fond of the content, it's hard for me to argue that they break either of those rules (in fact the second rule is pretty much their business model). Which gets to the heart of my problem: It's feeling more and more like Google is making arbitrary decisions about the value of different content.

When Google first started, things were simple. PageRank was a sea change in the way search worked: rather than just matching keywords, it ranked pages based on how many incoming links each one had and the authority of the sites linking to it. As Arrington wrote yesterday, "That's why Google was so great in 1999, when there was less incentive to game search results, and less expertise by the people doing it." Clearly the game is a lot harder on both sides as Google continues to adapt its algorithm and the spammers find more and more clever methods.
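To make the "incoming links plus authority" idea concrete, here is a minimal sketch of the textbook PageRank iteration. It is a simplification for illustration, not Google's production ranking; the function name, the `toy_web` graph, the damping value of 0.85, and the iteration count are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # every page gets a small baseline share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank on, split across its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # a page with no outgoing links spreads its rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "a" and "b" link to "c", and "c" links to "d".
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
ranks = pagerank(toy_web)
```

In this toy graph, "c" outranks "a" and "b" because it has two incoming links, and "d" outranks them too even though it has only one, because that one link comes from a page that itself carries authority.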

But let's move away from spam for a second and back to content farms. Google is, at its heart, an algorithm. An algorithm is nothing more than a set of rules. You can write an algorithm to detect spam, say, by scanning the page to see if it makes any sense. If there are no complete sentences but a ton of outgoing links, the code can conclude this page is probably spam. That's the essential piece of writing an algorithm: you need to figure out your objective and then be able to parse it into a set of objective rules to judge it. After you write those rules, you feed it some sample content and see whether it's performing the way you want it to.
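The rule described above can be sketched in a few lines. This is only an illustration: the definition of a "complete sentence" (starts with a capital letter, ends in punctuation, has at least four words) and the link threshold are assumptions of mine, and a real spam filter would be far more involved.

```python
import re


def count_complete_sentences(text, min_words=4):
    """Crude check: a 'sentence' ends in . ! or ?, starts capitalized, has enough words."""
    candidates = re.findall(r"[^.!?]+[.!?]", text)
    count = 0
    for candidate in candidates:
        words = candidate.split()
        if len(words) >= min_words and words[0][0].isupper():
            count += 1
    return count


def looks_like_spam(text, outgoing_links, max_links=10):
    """The rule from the text: no complete sentences but a ton of outgoing links."""
    return count_complete_sentences(text) == 0 and outgoing_links > max_links


# Feed it some sample content and see whether it behaves the way we want.
spam_sample = "cheap pills buy now click here best deals"
real_sample = "Google announced a change to its ranking algorithm this week. It affects many searches."

spam_verdict = looks_like_spam(spam_sample, outgoing_links=40)
real_verdict = looks_like_spam(real_sample, outgoing_links=40)
```

Here `spam_verdict` comes out `True` and `real_verdict` comes out `False`. Every judgment call lives in the thresholds: the code only checks whether the page meets them.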

Which is what bothers me so much about all this talk about quality. I can't figure out how Google could possibly be using an algorithm to judge quality. I get incoming links as a judge, and it's quite possible that they're using that to help judge, but of the three criteria they gave to define a low-quality site ("low-value add for users, copy content from other websites or sites that are just not very useful"), I can only understand how you could objectively judge one of them (copy content). Of course, algorithms are not really objective (which Google surprisingly admitted a few weeks ago), but the opinions you feed them (a comment is spam if it includes the word viagra 16 times) are turned into objective rules (the comment either has the word 16 times or it doesn't). Value and usefulness are completely personal. Not only that, they constantly shift around, as a piece of valuable information today may not be valuable tomorrow.
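Both of the checkable cases in that paragraph are easy to sketch. The first function is the viagra rule exactly as stated; the second is one common way to measure copied content, by comparing overlapping three-word runs ("shingles") between two texts. The function names and the shingle size are assumptions for illustration, and nothing here is known to be what Google does.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out its words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_spam_comment(comment, word="viagra", threshold=16):
    """The opinion (16 mentions means spam) becomes a rule the code can check."""
    return tokenize(comment).count(word) >= threshold


def shingles(text, size=3):
    """Every run of `size` consecutive words in the text."""
    words = tokenize(text)
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copy_overlap(a, b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A page that scores close to 1.0 against an earlier page is very likely a copy. There is no comparable function to write for "useful" or "valuable," which is the whole problem.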

As usual, this change in algorithm is given very little explanation. The Google algorithm has always been a black box for lots of reasons (some good, some bad). Hiding behind the opacity was fine when we trusted Google's motives, but it's hard, for me at least, to still feel like they're watching out for me. The bottom line is a company like Demand is a serious threat to Google's wellbeing.

February 26, 2011
Noah Brier | Thanks for reading. | Don't fake the funk on a nasty dunk.