Saturday, November 24, 2007

Before I can show you anything, we must be clear on what nofollow is, and what it is not. In short, nofollow is a link relationship which one can add to any hyperlink on a Web site. When added, it looks much like this:

<a href="http://www.example.com/" rel="nofollow">Example link</a>

The rel="nofollow" attribute tells search engines not to count this link as an inbound link to the target web site. It does not prevent the search engine from following the link, as the name seems to imply. The search engine is still free to follow the link and index the content on the target web site, but for the purposes of counting inbound links (and PageRank) the link should not be factored into that calculation.

The intended effect of this is that any link containing rel="nofollow" will allow both users and search engines to reach the site, but the existence of the link will not increase the ranking of the site in participating search engines. Aside from Google, so far MSN and Yahoo! are participating. Others may be as well.
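
To make the distinction concrete, here is a minimal sketch in Python of a link counter that still sees every link but only gives credit for links without rel="nofollow". It is an illustration only: the names LinkCollector and count_inbound_credit are made up for this example, the one-credit-per-link tally is a stand-in for whatever a real search engine computes, and nothing here reflects how Google or anyone else actually implements ranking.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (href, is_nofollow) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a whitespace-separated list, e.g. rel="external nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def count_inbound_credit(pages):
    """Tally one credit per followed link, keyed by target host (hypothetical scoring)."""
    credit = Counter()
    for html in pages:
        collector = LinkCollector()
        collector.feed(html)
        for href, is_nofollow in collector.links:
            host = urlparse(href).netloc
            # The link is still seen and could be crawled; it just earns no credit.
            if host and not is_nofollow:
                credit[host] += 1
    return credit
```

Fed a page containing one plain link and one nofollow link, the counter records both links but credits only the plain one.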
Comment spam

Comment spam is the problem which led to the introduction of nofollow. Many types of software, including blogs, wikis, forums, and some CMS software, allow the general public to post comments, and in some cases, to post top-level articles (what these are and how they appear depends on the software). In the beginning, this was good; it led to a lot of interactivity between a site and its readers. However, soon spammers discovered that most of these systems allowed them to post links to their own sites, and thus began posting spam.

In case you’ve never seen any, here is an example of comment spam:

In the ordinary, everyday understandings of the words involved, to say that someone survived death is to contradict yourself; while to assert that all of us live forever is to assert a manifest falsehood, the flat contrary of a universally known truth: namely, the truth that all human beings are mortal. For when, after some disaster, the ‘dead’ and the ‘survivors’ have both been listed, what logical space remains for a third category? by buy Viagra

Comment by phentermine — 3/24/2005 @ 6:03 pm

The two links go to various spamvertised sites, and I’ve omitted them here. Sorry if you were actually looking for Viagra or phentermine.

There are two main reasons spammers target blogs, wikis, forums and CMS. In no particular order, the first is to create additional inbound links to their sites, in order to raise the sites’ rankings in search engines. The second is to create additional inbound links to their sites, in order to entice users to purchase their products.
Nofollow: the final solution to comment spam

If you’re a blogger (or a blog reader), you’re painfully familiar with people who try to raise their own websites’ search engine rankings by submitting linked blog comments like “Visit my discount pharmaceuticals site.” This is called comment spam, we don’t like it either, and we’ve been testing a new tag that blocks it. From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results. This isn’t a negative vote for the site where the comment was posted; it’s just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists. — Google

Google’s promise was: by tagging spammers’ links with nofollow, their sites would decrease in rank in their search engine. It was quite surprising how quickly virtually everyone in the blogging community signed on. I recall a few people raised concerns as to whether it would actually cause spammers to stop, and I was one of them, but that didn’t seem to stop anyone. MovableType, WordPress, Blogger, Flickr, you name it, everybody was adding nofollow. Even Slashdot. Who are they trying to stop, the Gay Nigger Association of America? (Yes, that last link has a rel="nofollow" on it.)

MSN quickly signed on to the nofollow initiative, and Yahoo joined as well. People all over the Internet started rejoicing: the comment spam problem had been solved! Or had it?
Why nofollow doesn’t stop spam

If you’ve been running a blog, you are quite well aware that nofollow has done little or nothing to stop comment spam. In some cases spammers hit blogs hard enough to cause denial-of-service conditions, even when the blogs use rel="nofollow"!

I am so sick of the damn spammers. Spammers are teh sux0r. Spammers are a festering boil on the ass of the Internet. I wouldn’t let a spammer kiss my butt with a pair of wax lips from ten feet away. If I ever see a spammer bleeding in a ditch, I will not be a Good Samaritan, I will kick him in the head, cover him up with dirt, and leave him there to rot. — Dougal Campbell

Why is this? Where did nofollow fail? It prevents spammers from getting PageRank, doesn’t it?

Yes, nofollow prevents spammers from getting PageRank. But they want traffic on their web sites. How they get it is irrelevant, except as a means to an end: bringing in users and taking their money. Indeed, shortly after Google launched nofollow, The Register published an interview with a link spammer. It goes into great detail as to how link spammers operate, and it is required reading if you want to understand why nofollow has failed, how to actually stop link spam, or both. As is my usual style, I’ll post a few choice cuts. Quotes are from the link spammer, “Sam,” interviewed in the article.

“You could be aiming at 20,000 or 100,000 blogs. Any sensible spammer will be looking to spam not for quality [of site] but quantity of links.”

This is Rule #1 for a link spammer. Post the link in as many places as you can, to bring in as many people as possible.

When a new blog format appears, it can take less than ten minutes to work out how to comment spam it. Write a couple of hundred lines of terminal script, and the spam can begin. But you can’t just set your PC to start doing that. It’ll get spotted by your ISP, and shut down; or the IP address of your machine will be blocked forever by the targeted blogs. So Sam, like other link spammers, uses the thousands of ‘open proxies’ on the net. These are machines which, by accident (read: clueless sysadmins) or design (read: clueless managers) are set up so that anyone, anywhere, can access another website through them. . . . Sam’s code gets hundreds of open proxies to obediently spam blogs and other sites with the messages he wants posted.

Most link spammers, manual or automated, use open proxies to disguise the source of their spam. It’s rare to receive link spam that did not originate from an open proxy. Yes, I’ve been tracking this.

When Sam spams tons of blogs and sites with links to his sites - which are affiliates of bigger PPC sites - people see the links and, seeking some porn, pills or casino action, click through to his site, and from there to the parent site, which pays Sam for each person landing there. The PPC sites can see revenues of £100,000 to £200,000 per month, says Sam. He gets a slice of that - and he wants it to stay that way.

Aha! We get to the heart of the matter. Our link spammer cares about click-throughs. Nofollow is completely irrelevant to click-throughs. Remember, it affects search engines, not users. Now, you may never follow a spammed link and buy something from one of these sites, but there are many users reading your blog (and others’ blogs) who will indeed patronize these sites if they happen to run across one from comment spam.

Will the initiative by Google, Yahoo and MSN, to honour “don’t follow” links defeat Sam and his ilk? “I don’t think it’ll have much effect in the short, medium or long term. The search engines caused the problem . . . and they’re doing this to placate the community. It won’t work because most blogs and [forums] are set up with the best intentions, but when people find hard graft has to go into it they’re left to rot. To use this, they’ll all have to be updated. The majority won’t be. And there’ll just be trackback spamming.”

Straight from the spammer’s mouth. He doesn’t care about nofollow. It isn’t stopping him. Not only do people click through anyway, many sites he spams don’t have nofollow implemented, so it doesn’t affect his search engine rankings too much.
What nofollow really stops

From the beginning, people questioned the use of rel="nofollow": whether it would be effective at stopping comment spam, and what other effects it might have.

I’m deeply mystified by the hallelujahs bursting forth about Google’s rel="nofollow" method of preventing comment spam. . . .

If rel="nofollow" works, if it’s applied universally, it will actually have the reverse effect. It actually gets less effective the more it is implemented. Why? Because the comment spamming sites are in competition with each other, and not with any legitimate businesses. They’re not so much trying to get the best pagerank for their term, as trying to get a better one than their rivals. That’s a key distinction. If the playing field is levelled by rel="nofollow", then everyone involved will be forced to try all the harder to get their links out there. The blogosphere will be hit all the harder because of the need to maximise the gains. As there’s no more effort in hitting 6 million blogs as there is in hitting 1 million, this really won’t bother the spammers one bit. All it does is shift the problem from the high pagerank blogs we here might have, with rel="nofollow", custom sanitize settings, and mt-blacklist in full effect, all the way over to the less technically adept. And that is one enormous customer service problem heading towards Blogger, 6A and the rest.

. . . forcing comment spammers to cast a wider net will cause them to target the long tail of people who have no idea what to do but come screaming at tech support, or slagging blogs off to their friends.

That would be a disaster. — Ben Hammersley

Hammersley didn’t even mention the effect nofollow would have on legitimate bloggers who use comments and trackbacks to interlink their blogs. However, The Register did:

It’s effectively declaring PageRank™ dead for weblogs, in an attempt to stem the problem [of comment spam].

“If such a tag were used widespread against comments and trackbacks, then wouldn’t this end up kneecapping blogs, by killing their intricate networks of interlinks?”

Indeed, this has already begun to happen. If your blog software inserts nofollow, then in order for you to give another blog Google juice, you have to go out of your way to link to them without nofollow, such as in your blogroll. It is no longer enough that your reader left an insightful comment or a trackback to his blog with more information. Now, as far as Google juice is concerned, it is as if all of your readers were never there and you had received no comments or trackbacks at all.

That’s what Google wanted all along.

For years Google has been plagued by blog noise, the phenomenon where blogs’ articles, comments and trackbacks will show as highly ranked for search results. While they have tried many approaches to dealing with this problem, none have worked out very well — or at all.

But is blog noise a problem at all? Sven-S. Porst doesn’t think so, and neither do others. The counter-argument is that blogs often are the search results people need.

One of the keys to being found on Google is that the webmaster has to want the page to be found. And most of what one would normally consider “primary source material” doesn’t want to be found. . . .

With most of the good reading material unavailable for free, what’s left? — Calico Cat

What’s left is the poor vilified blog and the thousands upon thousands of bloggers who work hard every day to bring useful content to the Web that wouldn’t otherwise be there.
Stopping nofollow

As we’ve seen, rel="nofollow" is Google’s way of having bloggers effectively delist themselves from search engines under the guise of protecting them from comment spam. If you want your site to have more Google juice, and who doesn’t, people have to link to you without rel="nofollow". It’s that simple. Nofollow hurts the entire blogosphere, and if carried to its extreme, will result in most blogs being relegated to obscurity as they drop out of the top 100 search engine results.

When you’re ready to get rid of rel="nofollow", first urge your blogging software developers to drop rel="nofollow" from their software. Then (if applicable) install a plugin or extension which removes rel="nofollow" from your blog, or remove the plugin or extension which added it. If you’re on a hosted blogging site such as Blogger, LiveJournal or MSN Spaces, your only immediate recourse will likely be to switch to another platform.

For Movable Type, simply disable and remove the nofollow plugin. For WordPress, install the NoNofollow plugin and set the number of days to 0 (zero). Update: Mark Jaquith has posted his Screw Nofollow plugin for WordPress which is smaller and less complicated.
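
For platforms without a ready-made plugin, the core of the job is small. The following is a rough sketch in Python of what such a filter does to comment HTML before it is displayed. It is not the code of NoNofollow or any other plugin mentioned here (those are written for their own platforms), the function name remove_nofollow is made up for this example, and the regular expressions assume reasonably well-formed, quoted attributes.

```python
import re

# Matches the opening tag of a hyperlink, e.g. <a href="..." rel="nofollow">
_A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
# Matches a quoted rel attribute inside that tag, including its leading whitespace
_REL_ATTR = re.compile(r"""\s+rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def _strip_from_tag(match):
    """Remove the nofollow token from the rel attribute of one <a> tag."""
    tag = match.group(0)

    def fix_rel(rel_match):
        quote, value = rel_match.group(1), rel_match.group(2)
        kept = [token for token in value.split() if token.lower() != "nofollow"]
        if not kept:
            # nofollow was the only value, so drop the whole attribute
            return ""
        return f" rel={quote}{' '.join(kept)}{quote}"

    return _REL_ATTR.sub(fix_rel, tag)


def remove_nofollow(html):
    """Return html with nofollow removed from every hyperlink's rel attribute."""
    return _A_TAG.sub(_strip_from_tag, html)
```

Other rel values, such as external, are left in place; only the nofollow token is removed.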

And if you want to actually see how prevalent rel="nofollow" is, you can use Firefox, and add this bit of CSS to your chrome/userContent.css file:

a[rel~=nofollow] { color: red !important; background: black !important; text-decoration: blink !important; }

Unfortunately, I don’t know of any way to cause Internet Explorer to show rel="nofollow" links. If you do, please leave a comment below.
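
A note on the selector used above: the ~= operator matches when nofollow is one of the whitespace-separated words in the rel attribute, so rel="external nofollow" is highlighted too. Here is a small sketch of that matching rule in Python; the function name rel_matches is made up for this example, and the comparison is done exactly as written, whereas a browser may treat letter case differently.

```python
def rel_matches(rel_value, token="nofollow"):
    """Mimic the CSS selector [rel~=token] for a single attribute value.

    True when token is one of the whitespace-separated words in rel_value.
    A missing attribute (None) never matches.
    """
    if rel_value is None:
        return False
    return token in rel_value.split()
```

So rel_matches("external nofollow") is true, while rel_matches("nofollowing") is false because the whole word has to match.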
And finally, stopping comment spam

No article on rel="nofollow" would be complete without mention of how to stop comment spam. As I’m a WordPress user, much of my research in this area has been focused on stopping WordPress comment and trackback spam. However, I do have one thing of interest to Movable Type users.

Bad Behavior Blackhole is a DNS-based blackhole list which lists sources of comment spam and open proxy servers. Bad Behavior Blackhole intends to have the most complete list of open proxies available anywhere as well as automated removal for any legitimate user who happens to get stuck with a dynamic IP address a spammer once used. A WordPress plugin is available which looks up addresses in Bad Behavior Blackhole, and it is trivially easy to convert MT-DSBL to use Bad Behavior Blackhole; just replace “” with “”. I recommend for best coverage that you use both, though, and that’s probably not a trivial hack, unfortunately.
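
For readers who have not used a DNS-based blackhole list before, the lookup itself follows a common convention: reverse the octets of the IP address, append the list's zone, and ask DNS for an address record. An answer means the address is listed; no such name means it is not. Here is a minimal sketch in Python. The zone dnsbl.example.org is a placeholder and not the zone of Bad Behavior Blackhole or any real list, the function names are made up for this example, and only IPv4 is handled.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything but valid IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the blackhole list at zone has an entry for ip.

    Any resolution failure, including a timeout, is treated as "not listed",
    so a DNS outage never blocks legitimate commenters.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# Example with a placeholder zone (hypothetical, not a real list):
# dnsbl_query_name("192.0.2.10", "dnsbl.example.org")
# gives "10.2.0.192.dnsbl.example.org"
```

Checking two lists, as recommended above, would then be a matter of calling is_listed once per zone and blocking if either call returns True.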

For WordPress, Bad Behavior (which is different from Bad Behavior Blackhole) analyzes incoming requests and determines if they are spambots; if they are found to be automated spambots, they are blocked before they can ever read your site to find the comment form! I’m not aware of any similar solution for Movable Type, unfortunately. In addition, WP-SpamAssassin uses SpamAssassin to analyze comments; this has been ported to Movable Type.

See also Movable Type anti-spam plugins and WordPress anti-spam plugins.
