Penguin & Panda who? - Is Google still really easy to game?
So far I have resisted writing anything about "Penguin 2.0", the latest iteration of Google's flagship anti-spam algorithm, which rolled out on May 22nd. Amidst the sea of SEOs telling everyone "it happened" and/or parroting the usual information about what Penguin is and how you can avoid it or escape from it, I have struggled to see much point.
Fortunately Google came to my rescue with a real peach of a SERP (search engine results page), showcasing an example of Penguin, Panda, and Google in general really not working at all, providing me the perfect pretext to talk about it.
I conducted an advanced forensic SEO investigation to uncover what you're about to see, but the sad fact is that it was really quite easy and took all of 10 minutes - probably not far off the time it took to build the sites in question and set up the necessary link building tools before sitting back and watching the money roll in.
At the time of writing, figure 1 (above) shows the SERP for the keyword "superdry sale". Superdry is one of the hottest fashion keywords out there right now, with various high street stores that sell the label competing against the brand itself.
The top ranking page, drivethedeal.co.uk/shop/superdry.asp, is the one I'm interested in here (see figure 2, below). It came from nowhere to position 1 a couple of weeks ago and has been enjoying top ranks ever since.
It's not a spectacular piece of web design, the grammar used in the site navigation is somewhat suspect (including such gems as "men jean" and "men outwear") and let's just say that the prices for some of the clothes are particularly attractive.
But wait: although this is the site we end up on after clicking the first Google result, it isn't the ranking site we're interested in - the URL of this site is www.superdryoutlet2013.org. The top ranking page is a doorway page with some rudimentary cloaking in place to tell search engines and users apart and treat them differently. Figure 3 (below) shows the delightful page that is actually ranking - a blatantly manipulative, text-heavy monstrosity that has been "spun" by software designed to create unique copy that can fool search engines into thinking it is good, well structured and readable content... exactly one of the things that the Panda algorithm is supposed to tackle! This page has a simple script that identifies when a visitor has come from the Google SERP and redirects them to the superdryoutlet2013.org website. Because search engine crawlers never arrive via the Google SERP, this is an incredibly crude but effective means of cloaking: search engines index and rank the text-heavy page while users are sent to something more suitable for driving sales. You can view the cloaked doorway page by simply copying and pasting its URL into your browser.
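If you want to check a page for this kind of cloaking yourself, the idea can be sketched in a few lines of Python. To be clear, this is not the spammer's script (I haven't seen its source) and I'm assuming the redirect keys off the referrer. The `fetch` callable, both stand-in fetchers and the example.com-style URLs are all hypothetical, made up purely for illustration. A real fetcher would wrap an HTTP client that sets the Referer header, and a JavaScript redirect would need a real browser to trigger it.

```python
from typing import Callable, Optional

# A fetcher takes (url, referrer) and returns the final URL the visitor lands on.
# Hypothetical interface: in practice it would wrap an HTTP client or a browser.
Fetcher = Callable[[str, Optional[str]], str]

GOOGLE_REFERRER = "https://www.google.co.uk/"


def looks_referrer_cloaked(url: str, fetch: Fetcher) -> bool:
    """Return True if arriving 'from Google' lands somewhere different."""
    direct_landing = fetch(url, None)             # like pasting the URL into a browser
    search_landing = fetch(url, GOOGLE_REFERRER)  # like clicking the search result
    return direct_landing != search_landing


def fake_doorway_fetch(url: str, referrer: Optional[str]) -> str:
    """Hypothetical stand-in that mimics the behaviour described above."""
    if referrer and "google." in referrer:
        return "http://shop.example/"  # visitors from the SERP are redirected
    return url                         # everyone else sees the doorway page


def honest_fetch(url: str, referrer: Optional[str]) -> str:
    """Hypothetical stand-in for a page that treats all visitors alike."""
    return url


print(looks_referrer_cloaked("http://doorway.example/page.asp", fake_doorway_fetch))
print(looks_referrer_cloaked("http://doorway.example/page.asp", honest_fetch))
```

The comparison is deliberately crude: any difference in landing URL between the two visits is flagged, which is about the level of sophistication the cloaking itself deserves.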
It's possible that drivethedeal.co.uk has been the target of a hack by the spammer behind this ranking. While the site itself isn't necessarily that genuine (it has no contact details, for one), it is obviously about a completely different topic (cheap cars), with the Superdry page being totally out of place. It's not at all uncommon for this kind of discreet hack to be used by black hat SEOs, because placing hyper-optimised content on a domain with history means rankings can be gained much more quickly.
Figure 4 (below) shows new links accrued to the doorway page over the last few months, as reported by majesticseo.com. Majestic has discovered in excess of 130,000 Feature Article links in a two-week period starting just after the launch of Penguin 2.
Figure 5 (above) shows links accrued to the entire domain (drivethedeal.co.uk) during the same time. The profile is exactly the same, meaning that the only new links to the site during this time were the 130,000+ links to the doorway page.
In fact, forgetting the last few months and looking at all time, 96% of the links to the site point to the doorway page!
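That 96% figure is simply the share of all inbound links that point at a single URL. Here is a rough Python sketch of the calculation, assuming you have exported a list of link target URLs from a backlink tool. The function name and the sample data are my own inventions for illustration, not Majestic's export format or real figures.

```python
from collections import Counter
from typing import Iterable, Tuple


def top_target_share(link_targets: Iterable[str]) -> Tuple[str, float]:
    """Return the most linked-to URL and its share of all inbound links."""
    counts = Counter(link_targets)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no links supplied")
    url, hits = counts.most_common(1)[0]
    return url, hits / total


# Made-up sample: 96 links to one page, 4 spread across the rest of the site.
sample = ["/shop/doorway.asp"] * 96 + ["/", "/about", "/cars", "/"]
url, share = top_target_share(sample)
print(f"{url} receives {share:.0%} of inbound links")
```

On a natural site you would expect the homepage to take the largest share, with the rest spread across many pages.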
I find it difficult to imagine a more unnatural spike in a link profile than this, and I find it astounding that Google's second-generation anti-spam algorithm (i.e. Penguin 2) has been completely fooled by it, especially when you consider the sorts of links being gained: forum spam on foreign forums that are potentially more vulnerable to software that automatically creates profiles and posts spun content with many links, and blog comment spam of a similar nature (see figures 6 & 7).
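To put "unnatural spike" into rough numbers, a very simple check is to compare each week's new links against the site's typical week. The Python sketch below does that. It is only my illustration, not a claim about how Penguin works: the tenfold threshold, the function name and the weekly counts are all assumptions I have invented.

```python
from statistics import median
from typing import List, Sequence


def spike_weeks(weekly_new_links: Sequence[int], factor: float = 10.0) -> List[int]:
    """Return indexes of weeks whose new-link count exceeds factor x the median."""
    baseline = max(median(weekly_new_links), 1)  # avoid a zero baseline
    return [i for i, n in enumerate(weekly_new_links) if n > factor * baseline]


# Invented weekly counts: a quiet site, then two weeks of 65,000 new links each.
weekly = [3, 5, 2, 4, 6, 3, 65000, 65000, 40]
print(spike_weeks(weekly))
```

With these invented numbers the two 65,000-link weeks are flagged and nothing else is.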
The spammer behind the Superdry sale ranking hasn't stopped at generating 130,000 links straight to the doorway page. There is also a significant amount of cross linking happening between the forum posts and blog comments in a tactic known as a "link wheel", so called because it builds a network of linked sites around the primary target of the link activity. Rather than linking universally between all pages (which is easier for search engines to spot), the network will typically link consecutively from one page to the next (hence the wheel analogy) in order to compound the link equity that is ultimately extracted from it (see figure 8).
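The wheel shape is easy to picture as a small link graph. The Python sketch below assumes the simplest form of the tactic, where each page in the wheel links to the target and to exactly one other wheel page, and looks for a closed ring. The page names, the `find_wheel` function and the data are hypothetical, and real link wheels are messier than this.

```python
from typing import Dict, List, Set


def find_wheel(links: Dict[str, Set[str]], target: str) -> List[str]:
    """Return pages forming a ring in which every page also links to target.

    `links` maps each page to the set of pages it links to. The result is
    empty if no such ring is found.
    """
    # Candidate spokes: pages (other than the target) that link to the target.
    spokes = {page for page, outs in links.items() if target in outs and page != target}
    for start in sorted(spokes):
        ring, current, seen = [start], start, {start}
        while True:
            # A wheel page links to exactly one other spoke (the next one along).
            nxt = (links[current] & spokes) - {current}
            if len(nxt) != 1:
                break
            current = next(iter(nxt))
            if current == start:
                return ring  # came back round to the start: a closed wheel
            if current in seen:
                break        # looped without returning to the start
            seen.add(current)
            ring.append(current)
    return []


# Hypothetical three-page wheel around a doorway page.
wheel = {
    "forum-post-a": {"forum-post-b", "doorway"},
    "forum-post-b": {"blog-comment-c", "doorway"},
    "blog-comment-c": {"forum-post-a", "doorway"},
    "doorway": set(),
}
print(find_wheel(wheel, "doorway"))
```

If every page linked to every other page, each spoke would have more than one neighbour and the ring test would fail, which is exactly the distinction between a wheel and the "universal" linking described above.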
What can we conclude from all this?
Investigating this (and again I use the term "investigating" in the loosest possible sense of the word) I have arrived at a list of spam tactics that reads like the agenda for a "Black Hat 101" training session. Cloaking, doorway pages, content spinning, hacking, forum and comment spam - these things have been around for over a decade, and aside from the mentions of Penguin 2 and link wheels (the one concession to a slightly more modern tactic) this article could have been published in 2002 and would have made perfect sense to anyone reading it then.
I expect Google would argue that this is just one example of "getting it wrong" among millions, and read literally they'd be right. But the point of algorithms is that they apply universally and I think we have a right to expect that they aren't so easily duped. Since Penguin 2 rolled out I'd say the SERPs have become anecdotally worse, not better, with examples like this page, which I also saw ranking in 2nd for the "superdry" brand term (since dropped to 7th or so), some ranks for "online casino" supported by clearly paid and hidden links, and most amusing of all a page 1 ranking for "payday loans" with the URL paydayloansfrommrcutts.-blog.co.uk, just days after Google announced their new focus on spammy queries like "pay day loans" (Matt Cutts, for those who don't know, heads up Google's web spam team).
I am not going to come out and say that Penguin and Panda are doing a poor job, because there's no way of knowing how much worse things might be without those algorithms in place. But I do expect Google to take quicker manual action on this sort of thing. I know the "superdry sale" and "superdry" SERPs have been reported to Google a number of times, in some detail, with no discernible effect. By contrast, the Matt Cutts-related "pay day loans" result lasted just one day before being removed. To me this shows a level of cynicism that is unacceptable from a company that demands such high standards from brands when it comes to toeing the Google line.
A few weeks ago we had a potential customer not choose Greenlight to do their SEO on the basis that they were looking for a company that would just go and buy them a lot of links. As bizarre as that sounded initially, I wonder if you can blame people for arriving at that conclusion when worse things still seem to fly under Google's radar?