Sunday, June 25, 2006

What does not work with SEO.

In my random walks through the SEO world, I have been trying to find out what works in the SEO universe. You will find a million people giving a million different pieces of advice. Though a few people are right on the mark, most of the advice is either pure BS or very, very outdated.
So instead of adding to that garbage and telling you what works in SEO, let me tell you what does not.
If there is a technique which everyone is using, run. Run fast and away from it. It is going to be abused by shady SEO guys. And then you can be sure that the SEs will penalise it.
Article submissions and directory submissions are the things to do right now. But with everyone using them, I wonder how long it is going to stay that way!

Wednesday, May 03, 2006

How Google helps spammers and destroys your internet experience.

Not so long ago, if anyone asked you to name the one thing which made the web better, chances are you would have named Google. And so would I. Not so long ago.
Are you a webmaster? Quick, name one term which just spoils your day. Was it MFA? MFA: Made For AdSense sites. Automated sites which just copy content and add no value.
Google's lax enforcement of the AdSense TOS means that spammers can show AdSense ads on crappy sites and get away with it. It means that you are forced to see pages with a single sentence and 3 ad units. It means search engine spammers can get away with anything while visitors lose time, publishers lose money and valid AdSense ads get a bad rep.
But that is not even the bad part.
Google is actively, OK, almost actively, promoting these black hat techniques.
What does Joe BlackHatter need to create an MFA site? Software. Software to spew out a bazillion automated sites. Now Google has declared a jihad against automated/scraped content. So you would think that they would not touch it with a 10 foot barge pole. OK, let's just ask Google: http://www.google.com/search?q=AUTOMATIC+CONTENT+GENERATOR
The result:
Yes sir. About a million sponsored listings for software to create spam. How broke is Google that it has to promote such search engine fodder?
One would expect Google, as a leading search engine, to take a proactive role in weeding them out.



Just for fun, here are some more searches where Google lists spammy software. (Only sponsored results are shown.)



http://www.google.com/search?q=Adsense
http://www.google.com/search?q=CLOAKING

So the next time you see Matt Cutts crying about black hat SEOs being the scum of the earth, just drop him a line.


Thursday, April 27, 2006

How spammers are beating CAPTCHA.

(OK, this is not exactly SEO, but I know you would be interested in this.)
Just in case you do not know, CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

CAPTCHAs are the pictures containing words you have to type in before you can post a comment on blogs, write something on Digg or make a free mail account.
Now spammers need a lot of free email accounts. They want to comment spam your blog. For this they need to beat the CAPTCHA.
Spammers are beating CAPTCHAs in two ways. Unless the image is very blurred or grainy, image processing software can be used to read the words in it. The guy at http://www.mperfect.net/aiCaptcha/ gives an example of how a CAPTCHA can be beaten using software. But there is an even better way: social engineering.
What is the internet most used for? I do not have the statistics, but I am willing to bet that PORN is right there at the top. And what is even better than porn? Free porn, obviously.
When Mr. BigSpammer needs to break a million CAPTCHAs, he ties up with BigFreePornSite.com. His software fetches the CAPTCHA images and sends them to BigFreePornSite.com. When Joe TeenHighOnSex visits BigFreePornSite.com, he is asked to type in the text in a CAPTCHA image, and his answer is sent back to Mr. BigSpammer's servers. Lo, the CAPTCHA is broken. Now Mr. BigSpammer can comment spam, Digg spam, Yahoo spam.

Monday, April 24, 2006

Add Links for Del.icio.us, Digg, and More to Blogger Posts

Social bookmarking sites can be a very effective way to get visitors to your sites. If your readers like what you say, why not give them a chance to bookmark you at del.icio.us and other similar sites?
To add a quick link on your blog to all of the popular traffic-boosting sites, simply add the code below to your template. I generally add it just below the content part, but you can put it anywhere.

(How to edit a Blogger template.)

Del.icio.us Link:
http://del.icio.us/post?url=<$BlogItemPermalinkURL$>&title=<$BlogItemTitle$>

Digg Link:
http://digg.com/submit?phase=2&url="<$BlogItemPermalinkURL$>"

Technorati Cosmos Link:
http://technorati.com/cosmos/search.html?url=<$BlogItemPermalinkURL$>

Furl Link:
http://furl.net/storeIt.jsp?t=<$BlogItemTitle$>&u=<$BlogItemPermalinkURL$>

reddit Link:
http://reddit.com/submit?url=<$BlogItemPermalinkURL$>&title=<$BlogItemTitle$>
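
If you ever need to build the same links outside a Blogger template (say, from a script), the permalink and title should be URL-encoded before they go into the query string. Here is a rough Python sketch of that. The function name `bookmark_links` and the example blog address are made up for illustration; the submit addresses and parameter names are simply the ones from the links above, and I am not claiming anything about how Blogger itself encodes the tags.

```python
from urllib.parse import urlencode


def bookmark_links(permalink, title):
    """Return {service name: submit URL} for one post.

    urlencode() percent-encodes the values, so characters such as
    '&' or spaces in a title cannot break the query string.
    """
    return {
        "del.icio.us": "http://del.icio.us/post?"
        + urlencode({"url": permalink, "title": title}),
        "digg": "http://digg.com/submit?"
        + urlencode({"phase": 2, "url": permalink}),
        "technorati": "http://technorati.com/cosmos/search.html?"
        + urlencode({"url": permalink}),
        "furl": "http://furl.net/storeIt.jsp?"
        + urlencode({"t": title, "u": permalink}),
        "reddit": "http://reddit.com/submit?"
        + urlencode({"url": permalink, "title": title}),
    }


if __name__ == "__main__":
    # Hypothetical post, only for demonstration.
    links = bookmark_links(
        "http://example.blogspot.com/2006/04/my-post.html",
        "Hello & welcome",
    )
    for service, url in links.items():
        print(service, url)
```

The point of going through `urlencode` rather than gluing strings together is that a title containing `&` would otherwise be read as the start of a new parameter.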

Do automatic content generators work?

If you are in a hurry and cannot wait to read the rest of the article: no, they do not.

What are automatic content generators?

Automatic content generators are software which claims to create content, in the form of articles, automatically. Writing articles without human intervention might seem an amazing capability, but the software merely rewrites existing articles to create new ones.

There are three main ways in which these content generators work.

  1. Scraping. The software gets different parts of the articles from different places and joins them all together to create a new article.
  2. Thesaurus substitution. Synonyms are substituted in the original article to create the new article.
  3. Markov chains. A statistical model of the existing article is built, and the new article is generated from that model.

Of all these methods, Markov chains hold the most promise, as they are the hardest to detect.

So what are Markov chains?

Apart from being lots of bullshit in computer science, they are a tool to create pseudo-random text from a statistical model of another text. Since the model is based on non-random text, most of the time the output will follow the rules of English grammar. Given a large non-random text to build the statistical model from, it will generate text which can sometimes pass the scrutiny of humans.

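A Markov chain takes into account which words follow a given set of words in the source text. The new text is created from this data: starting from some set of words, the next word is picked from the words that followed that set in the source, and the process repeats.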
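
To make the idea concrete, here is a minimal word-level sketch in Python. It is not the software from my experiment; the names `build_model` and `generate`, the default of two words per state, and the sample sentence are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each run of `order` consecutive words to the words that follow it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model


def generate(model, length=50, seed=None):
    """Walk the model: pick a start state, then keep choosing a follower."""
    if not model:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(model.keys()))
    output = list(state)
    while len(output) < length:
        followers = model.get(state)
        if not followers:
            break  # dead end: this state only occurred at the end of the source
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


if __name__ == "__main__":
    # Tiny made-up source text; a real run needs a much larger one.
    source = (
        "the cat sat on the mat and the cat ran to the mat "
        "and the dog sat on the rug"
    )
    print(generate(build_model(source), length=15, seed=1))
```

Because followers are stored with repeats, a word that follows a state more often in the source is picked more often in the output, which is all the "statistical model" amounts to here.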

My experiments with Markov chain.

Most black hat SEO techniques leave some footprint which the SEs use to identify an article as automatically generated. This leaves commercial automatic content generators vulnerable. I wanted to check whether the SEs are able to identify Markov chain content. For this purpose I wrote my own software. I tried to remove other signs which might flag the content as automatically generated. In particular, the size of the files was changed, trailing sentences which ended abruptly were removed, and paragraph breaks were introduced.
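
Two of those clean-up steps are easy to sketch in Python. This is only an illustration of the idea, not the code I actually used: the function names and the fixed number of sentences per paragraph are assumptions.

```python
import re


def drop_trailing_fragment(text):
    """Cut off everything after the last '.', '!' or '?'.

    Returns an empty string if the text contains no complete sentence.
    """
    last = max(text.rfind(ch) for ch in ".!?")
    return text[: last + 1] if last != -1 else ""


def add_paragraph_breaks(text, sentences_per_paragraph=4):
    """Group sentences into paragraphs separated by a blank line."""
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]+", text)]
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    raw = "One two. Three four! Five six? Seven eight. Nine ten and then"
    cleaned = drop_trailing_fragment(raw)
    print(add_paragraph_breaks(cleaned, sentences_per_paragraph=2))
```

A fixed paragraph size is itself a footprint, so anything serious would want to vary it; it is kept constant here only to keep the sketch short.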

A site was created with this content and hosted on Tripod. It was given a link from a PR 3 page. I checked the position of the web pages in the SEs from time to time. After a period of four months, no references to the automatically created web pages were found.

So the final words.

Since the web pages were not included in the SEs' indexes, the value of creating such web pages is very limited. Some commercial software claims to create articles automatically. I have tried only one of them, so I cannot make claims about their effectiveness. But basically all of them use the same algorithms, so the results should hold for the others as well.

References.

  1. URLs of the created web pages: listed at http://seo-experiments.blogspot.com/.
  2. Source and binaries of the software used to create the web pages: http://www.fileshack.us/files/1058/MarkovSeo.zip

Saturday, April 01, 2006

"You must be feeling a bit like alice now?"
Morpheus to Neo, The Matrix.
I surely am an Alice wandering the SEO wonderland. And from what I gather, no one knows anything in SEO. OK, let me rephrase that: no one knows most of the things in SEO. The SEO wonderland I have been wandering consists of the forums of HighRankings, DigitalPoint and WMW.
Are reciprocal links dead? Almost, but *mutual* links are in!
Are directories the next big thing? Umm, erm, if they are niche, or if they are DMOZ.
OK, these were the easy ones.
If I name a page link.html, do the SEs ignore it?
Do SEs respect the nofollow tag?
For once, guys, give me an honest-to-god, clear answer.
I have a real grudge against Google. Why do they not make the PR data publicly available, say via their API? There are ways to get PR data, via sites such as www.prchecker.info/check_page_rank.php. By not making PR data publicly available, Google is only hurting everyone.
If I really want to know the PR of a site, I do have tricks with which I can get it. But, well, it's against the TOS. So they hurt the webmaster community. But then it does not help them in any way. When people get PR without an API, methinks the server load on Google will be higher. Not that Google would be concerned about it or anything.
So why not
1. Make the PR publicly available, say via its API?
2. If you do not want to do so, why not do away with PR data altogether? Use it only internally. Never show it to an outsider. Or is PR just a way to force you to use the Google toolbar?
The SEO world as seen by a completely brain dead.