This is a guest post by Lior Levin.
Google’s top spam fighter, Matt Cutts, recently posted a lengthy blog entry about the state of Google’s battle with spammers.
In the post, Cutts said that, while spam in Google’s index was less than half what it was five years ago, it has increased in recent months and, for certain queries, has become more noticeable.
According to Cutts, this change isn’t just because there are more spammers than ever trying to cheat their way to a higher search engine ranking, but also because Google’s recent Caffeine update has the search engine indexing more content than ever before, including spam.
However, he did outline Google’s plan to fight this uptick and prevent it from becoming a more serious problem. Specifically, he described three changes that he hopes will improve both the quality of Google’s results and the search experience for everyone.
The first change, which Google recently launched, targets spam directly: a “redesigned document-level classifier” that does a better job of detecting spammy content on a single page. This includes things such as repeated words in the text as well as certain kinds of comment spam.
The second improvement, which is currently being evaluated, is an algorithmic change targeted at sites that copy content from other pages, usually without permission. These sites often copy large blocks of content and add very little original material, yet they sometimes rank very well, even ahead of the sites that produced the content originally.
However, it’s the third and final change that has created the most discussion. According to Cutts, Google is exploring ways to further reduce the ranking of so-called “content farms”, or sites that produce large amounts of content cheaply, usually through contract labor, with results that are often of questionable quality. This includes companies such as Demand Media, which operates eHow and Livestrong, among other sites.
According to Cutts, Google made two changes in 2010 to reduce the impact of these content farms but understands that people are asking for even stronger action to be taken. The sites, however, have been controversial because, even though they often have lower-quality content, they are not considered to be traditional web spam.
Given that Demand Media just had an IPO that valued it at $1.5 billion, it is easy to see how much value the company has been able to build through its content “farming”. What remains to be seen is whether Google will be able to reduce its prevalence in the results, especially for the long-tail searches it targets.
What’s clear, though, is that, even after a decade in the industry, Google is still wrestling with content quality issues in its index and is struggling to keep spammers, scrapers and content farmers at bay.
Even though it has definitely made progress over that time, there is clearly still a great deal of work to be done, and Google is setting out to do it.
All we can do is sit back, watch what happens and hope that legitimate, high-quality sites are not inadvertently caught up in the mix.