<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-25189681</id><updated>2012-02-03T00:55:34.612-08:00</updated><category term='stats'/><category term='python yahoo sitexplorer tutorial'/><category term='parasite hosting'/><category term='Comment spam'/><category term='scrape'/><category term='python'/><category term='Toolbars'/><category term='Captcha'/><category term='reddit'/><category term='blackhat'/><category term='Search engines'/><title type='text'>Alice in the SEO wonderland.</title><subtitle type='html'>Journey through this crazy SEO wonderland. Meet Queen of Hearts, King Google, mad Hatter and the PR. And if you are lucky, you might just see some talking rabbits and a crazy guy called Matt Cutts.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>14</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-25189681.post-8511043827634827471</id><published>2011-10-13T04:32:00.000-07:00</published><updated>2011-10-13T04:33:09.873-07:00</updated><title type='text'>Agiliq - We build amazing apps for android | Android App Development</title><content type='html'>You can now find me blogging at Agiliq.com, where I blog about &lt;a href="http://agiliq.com/blog/"&gt;Django Web Development&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-8511043827634827471?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://agiliq.com' title='Agiliq - We build amazing apps for android | Android App Development'/><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/8511043827634827471/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=8511043827634827471' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/8511043827634827471'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/8511043827634827471'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2011/10/agiliq-we-build-amazing-apps-for.html' title='Agiliq - We build amazing apps for android | Android App Development'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-5618321503273314777</id><published>2007-09-02T05:54:00.000-07:00</published><updated>2007-09-02T06:47:11.292-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reddit'/><category scheme='http://www.blogger.com/atom/ns#' term='parasite hosting'/><category scheme='http://www.blogger.com/atom/ns#' term='blackhat'/><title type='text'>Parasite hosting - Or why social networking sites need to review user generated content.</title><content type='html'>&lt;a href="#listing"&gt;Jump directly to the list of highly ranked spammy user pages within reddit&lt;/a&gt;&lt;br /&gt;From &lt;a href="http://en.wikipedia.org/wiki/Parasite_Hosting"&gt;Wikipedia&lt;/a&gt;&lt;span style="font-style: italic;"&gt;: Parasite hosting is the process of hosting a site on some one else's server without their consent, generally for the purpose of search engine benefit.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One of the most competitive keywords, which everyone wants to rank for, is &lt;a href="http://www.google.com/search?hl=en&amp;client=opera&amp;rls=en&amp;hs=ca9&amp;q=buy+viagra+online&amp;btnG=Search"&gt;buy viagra online&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;On that and &lt;a href="http://www.google.com/search?q=order+viagra+online&amp;amp;revid=1325216918&amp;sa=X&amp;amp;oi=revisions_inline&amp;resnum=0&amp;amp;ct=broad-revision&amp;amp;cd=1"&gt;many other similar&lt;/a&gt; searches, you will notice that user pages on social bookmarking sites like reddit get very prominent rankings.&lt;br /&gt;These sites have a very high &lt;a href="http://en.wikipedia.org/wiki/TrustRank"&gt;TrustRank&lt;/a&gt; and amazing authority scores. 
So even off-topic pages created on these domains rank high very easily.&lt;br /&gt;I was playing with the &lt;a href="http://seodummy.blogspot.com/2007/09/getting-started-on-seo-programming.html"&gt;sitexplorer python library&lt;/a&gt; I wrote. The user pages on the reddit domain outrank even the subreddit pages.&lt;br /&gt;Using a little Python, I found that of the top 400 pages on the reddit.com domain, 102 are user pages of the form &lt;a href="http://reddit.com/user/shabda/"&gt;reddit.com/user/{username}/&lt;/a&gt;. All of these are pages promoting prescription pills like Viagra or Cialis.&lt;br /&gt;(&lt;a href="#pycode"&gt;The python program&lt;/a&gt; &lt;a href="http://seodummy.blogspot.com/2007/09/getting-started-on-seo-programming.html"&gt;[1]&lt;/a&gt;&lt;br /&gt;&lt;a href="#listing"&gt;List of the user pages in the top 400 pages on reddit&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;How can social bookmarking sites combat this?&lt;br /&gt;1. Use a &lt;a href="http://robotstxt.org/"&gt;robots.txt&lt;/a&gt;. For reddit.com this can be as simple as using&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;User-Agent: *&lt;br /&gt;Disallow: /user&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;2. Have kill words which are not allowed in user names. 
This allows the pages to be indexed without facing parasite hosting.&lt;br /&gt;&lt;br /&gt;&lt;a name="pycode"&gt;&lt;/a&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import re&lt;br /&gt;# get_page_data comes from the sitexplorer python library linked above.&lt;br /&gt;# Fetch the top 400 pages on reddit.com, 100 results at a time.&lt;br /&gt;all_pages = []&lt;br /&gt;for i in range(4):&lt;br /&gt;    start = i*100 + 1&lt;br /&gt;    results = get_page_data('YahooDemo', u'http://reddit.com', start=start, results=100)&lt;br /&gt;    pages = [el['Url'] for el in results]&lt;br /&gt;    all_pages.extend(pages)&lt;br /&gt;&lt;br /&gt;# Keep only the user pages and pull out the user name.&lt;br /&gt;pat = '/user/([a-zA-Z0-9_]*)'&lt;br /&gt;rep = re.compile(pat)&lt;br /&gt;users = [rep.search(el) for el in all_pages]&lt;br /&gt;users_ = [el.groups()[0] for el in users if el is not None]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;a name="listing"&gt;&lt;/a&gt;&lt;br /&gt;List of the users on the reddit.com site. (These links are nofollowed.)&lt;br /&gt;[u'&lt;a href=http://reddit.com/user/Buy_viagra_online_/ rel="nofollow"&gt;Buy_viagra_online_&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_MEDS/ rel="nofollow"&gt;BUY_VIAGRA_MEDS&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CHEAP_VIAGRA_PRICE/ rel="nofollow"&gt;CHEAP_VIAGRA_PRICE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/ORDER_VIAGRA_NOW/ rel="nofollow"&gt;ORDER_VIAGRA_NOW&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/tylerton/ rel="nofollow"&gt;tylerton&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/VIAGRA_ONLINE_CHEAP/ rel="nofollow"&gt;VIAGRA_ONLINE_CHEAP&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Buy_viagra_online/ rel="nofollow"&gt;Buy_viagra_online&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/DISCOUNT_VIAGRA_NOW/ rel="nofollow"&gt;DISCOUNT_VIAGRA_NOW&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/order_viagra_cheap/ rel="nofollow"&gt;order_viagra_cheap&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/order_viagra_online/ rel="nofollow"&gt;order_viagra_online&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/order_cialis_cheap/ rel="nofollow"&gt;order_cialis_cheap&lt;/a&gt;', u'&lt;a 
href=http://reddit.com/user/Viagra_/ rel="nofollow"&gt;Viagra_&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagraagain/ rel="nofollow"&gt;viagraagain&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialisagain/ rel="nofollow"&gt;cialisagain&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_ONLINEE/ rel="nofollow"&gt;BUY_VIAGRA_ONLINEE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/tramadolagain/ rel="nofollow"&gt;tramadolagain&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phentermineagain/ rel="nofollow"&gt;phentermineagain&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/levitraagain/ rel="nofollow"&gt;levitraagain&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_ONLINE3/ rel="nofollow"&gt;BUY_VIAGRA_ONLINE3&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_FLAGYL_ONLINE/ rel="nofollow"&gt;BUY_FLAGYL_ONLINE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CIALIS_LOWEST_PRICES/ rel="nofollow"&gt;CIALIS_LOWEST_PRICES&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_HOODIA_ONLINE/ rel="nofollow"&gt;BUY_HOODIA_ONLINE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/ORDER_VIAGRA_TODAY/ rel="nofollow"&gt;ORDER_VIAGRA_TODAY&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Ephedra_Pills/ rel="nofollow"&gt;Ephedra_Pills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Viagra_online/ rel="nofollow"&gt;Viagra_online&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_ONLINE2/ rel="nofollow"&gt;BUY_VIAGRA_ONLINE2&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/dans_movies/ rel="nofollow"&gt;dans_movies&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/panda_movies/ rel="nofollow"&gt;panda_movies&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialispills/ rel="nofollow"&gt;cialispills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA1/ rel="nofollow"&gt;BUY_VIAGRA1&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Viagrapills_Online/ rel="nofollow"&gt;Viagrapills_Online&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phenterminepills/ 
rel="nofollow"&gt;phenterminepills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/tramadolpills/ rel="nofollow"&gt;tramadolpills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/free_porn_movies/ rel="nofollow"&gt;free_porn_movies&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phenterminepharm/ rel="nofollow"&gt;phenterminepharm&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Soma_Carisoprodol/ rel="nofollow"&gt;Soma_Carisoprodol&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialispharm/ rel="nofollow"&gt;cialispharm&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagraonline/ rel="nofollow"&gt;viagraonline&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Cheapest_Fioricet/ rel="nofollow"&gt;Cheapest_Fioricet&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Meridia_Diet_Pills/ rel="nofollow"&gt;Meridia_Diet_Pills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagrapills/ rel="nofollow"&gt;viagrapills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/valiumpills/ rel="nofollow"&gt;valiumpills&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/levitrapharm/ rel="nofollow"&gt;levitrapharm&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialis_buy/ rel="nofollow"&gt;cialis_buy&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Buy_Percocet/ rel="nofollow"&gt;Buy_Percocet&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagrapharm/ rel="nofollow"&gt;viagrapharm&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_MD/ rel="nofollow"&gt;BUY_VIAGRA_MD&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Tramadol_Hcl/ rel="nofollow"&gt;Tramadol_Hcl&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Generic_Propecia/ rel="nofollow"&gt;Generic_Propecia&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/xanaxpill/ rel="nofollow"&gt;xanaxpill&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_LEVITRA_ONLINE1/ rel="nofollow"&gt;BUY_LEVITRA_ONLINE1&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/tramadolpill/ rel="nofollow"&gt;tramadolpill&lt;/a&gt;', u'&lt;a 
href=http://reddit.com/user/cialis_cheap/ rel="nofollow"&gt;cialis_cheap&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/tramadolpharm/ rel="nofollow"&gt;tramadolpharm&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CIALIS_BEST_PRICES/ rel="nofollow"&gt;CIALIS_BEST_PRICES&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/ordertramadol/ rel="nofollow"&gt;ordertramadol&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phenterminepill/ rel="nofollow"&gt;phenterminepill&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/order_levitra_med/ rel="nofollow"&gt;order_levitra_med&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/orderxanax/ rel="nofollow"&gt;orderxanax&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialis_online_drug/ rel="nofollow"&gt;cialis_online_drug&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/orderphentermine/ rel="nofollow"&gt;orderphentermine&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/generic_cialis_pill/ rel="nofollow"&gt;generic_cialis_pill&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/ordercialis/ rel="nofollow"&gt;ordercialis&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CIALIS_ONLINE/ rel="nofollow"&gt;CIALIS_ONLINE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_ONLINE1/ rel="nofollow"&gt;BUY_VIAGRA_ONLINE1&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/online_buy_cialis/ rel="nofollow"&gt;online_buy_cialis&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CHEAP_VIAGRA_PRICES/ rel="nofollow"&gt;CHEAP_VIAGRA_PRICES&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/generic_levitra_pill/ rel="nofollow"&gt;generic_levitra_pill&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/orderviagra/ rel="nofollow"&gt;orderviagra&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/LEVITRA_SALE/ rel="nofollow"&gt;LEVITRA_SALE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CHEAP_VIAGRA_ONLINE/ rel="nofollow"&gt;CHEAP_VIAGRA_ONLINE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Order_Cialis_Online0/ rel="nofollow"&gt;Order_Cialis_Online0&lt;/a&gt;', 
u'&lt;a href=http://reddit.com/user/BUY_DISCOUNT_VIAGRA/ rel="nofollow"&gt;BUY_DISCOUNT_VIAGRA&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/order_cialis_online/ rel="nofollow"&gt;order_cialis_online&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/BUY_VIAGRA_TODAY/ rel="nofollow"&gt;BUY_VIAGRA_TODAY&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/VIAGRA_LOWEST_PRICE/ rel="nofollow"&gt;VIAGRA_LOWEST_PRICE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/CHEAP_VIAGRA_PILL/ rel="nofollow"&gt;CHEAP_VIAGRA_PILL&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagracialis/ rel="nofollow"&gt;viagracialis&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Suboxone/ rel="nofollow"&gt;Suboxone&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/VIAGRA_BEST_PRICES/ rel="nofollow"&gt;VIAGRA_BEST_PRICES&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/FDA_CIALIS_ONLINE/ rel="nofollow"&gt;FDA_CIALIS_ONLINE&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/DISCOUNT_VIAGRA_A/ rel="nofollow"&gt;DISCOUNT_VIAGRA_A&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialistop/ rel="nofollow"&gt;cialistop&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/Buy_viagra_/ rel="nofollow"&gt;Buy_viagra_&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phenermine/ rel="nofollow"&gt;phenermine&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/FDA_LEVITRA/ rel="nofollow"&gt;FDA_LEVITRA&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phenterminetop/ rel="nofollow"&gt;phenterminetop&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/insura/ rel="nofollow"&gt;insura&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagrapharmacy/ rel="nofollow"&gt;viagrapharmacy&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/cialispharmacy/ rel="nofollow"&gt;cialispharmacy&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/viagratop/ rel="nofollow"&gt;viagratop&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/levitrapharmacy/ rel="nofollow"&gt;levitrapharmacy&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/phenterminepharmacy/ 
rel="nofollow"&gt;phenterminepharmacy&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/autoverzekering/ rel="nofollow"&gt;autoverzekering&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/discount_viagra/ rel="nofollow"&gt;discount_viagra&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/goba/ rel="nofollow"&gt;goba&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/levitratop/ rel="nofollow"&gt;levitratop&lt;/a&gt;', u'&lt;a href=http://reddit.com/user/generic_cialis/ rel="nofollow"&gt;generic_cialis&lt;/a&gt;']&lt;br /&gt;&gt;&gt;&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-5618321503273314777?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/5618321503273314777/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=5618321503273314777' title='40 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/5618321503273314777'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/5618321503273314777'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2007/09/parasite-hosting-or-why-social.html' title='Parasite hosting - Or why social networking sites need to review user generated content.'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>40</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-4466472395139812623</id><published>2007-09-02T03:56:00.000-07:00</published><updated>2007-09-02T05:07:07.724-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='python yahoo sitexplorer tutorial'/><title type='text'>Getting started on SEO programming (using Python)</title><content type='html'>&lt;a href="http://code.google.com/p/pynswers/wiki/PySitexplorer"&gt;The python code is here.&lt;/a&gt;&lt;br /&gt;&lt;a href="#install"&gt;Usage instructions are here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You own a website and want to keep track of its placement in search engines. You want to know who is linking to you and how many of your pages are indexed in the search engines. You want to tell the search engines when you update your sitemaps or your website.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://developer.yahoo.com/search/siteexplorer/"&gt;Site Explorer API&lt;/a&gt; from Yahoo makes this extremely convenient. And with Google discontinuing the &lt;a href="http://code.google.com/apis/soapsearch/"&gt;Soap Search API&lt;/a&gt;, this is the only feasible choice.&lt;br /&gt;The Site Explorer API is a &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;REST&lt;/a&gt; service. You construct a URL and make a request from your browser, from your command line, or from anywhere else. You then parse the server's response to get the data in the format of your choice.&lt;br /&gt;We will write a thin Python wrapper over this REST service so that we can construct our queries in Python.&lt;br /&gt;&lt;br /&gt;(To follow these examples, you need &lt;a href="http://code.google.com/p/pynswers/wiki/PySitexplorer"&gt;this python code&lt;/a&gt; and the &lt;a href="http://cheeseshop.python.org/pypi/simplejson"&gt;simplejson&lt;/a&gt; library.)&lt;br /&gt;&lt;br /&gt;Some simple examples.&lt;br /&gt;1. 
We want to get the top 1000 sites which link to &lt;a href="http://reddit.com/"&gt;reddit&lt;/a&gt;.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# Fetch 10 batches of 100 results each, starting at result 1.&lt;br /&gt;all_urls = []&lt;br /&gt;for i in range(10):&lt;br /&gt;  start = i*100 + 1&lt;br /&gt;  results = get_inlink_data('YahooDemo', u'http://reddit.com', start=start, results=100)&lt;br /&gt;  urls = [el['Url'] for el in results]&lt;br /&gt;  all_urls.extend(urls)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;2. We want the 400 highest rated pages on reddit.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# Same pattern: 4 batches of 100 results each.&lt;br /&gt;all_pages = []&lt;br /&gt;for i in range(4):&lt;br /&gt;  start = i*100 + 1&lt;br /&gt;  results = get_page_data('YahooDemo', u'http://reddit.com', start=start, results=100)&lt;br /&gt;  pages = [el['Url'] for el in results]&lt;br /&gt;  all_pages.extend(pages)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;3. Google.com has updated its sitemap. We want to let Yahoo know about it.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;do_ping(u'http://www.google.com/sitemap.xml')&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;4. I have updated SeoDummy. Let's tell Yahoo about that.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;do_update_notification('YahooDemo', 'http://www.seodummy.blogspot.com/')&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;5. You can use these methods in conjunction to get some advanced functionality. For example, you can use get_inlink_data and get_page_data together to get a breakdown of who links to each of your subpages.&lt;br /&gt;For examples of some &lt;a href="http://tools.seobook.com/"&gt;cool SEO tools, you can go here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a name="install"&gt;&lt;/a&gt;&lt;br /&gt;You need &lt;a href="http://cheeseshop.python.org/pypi/simplejson"&gt;simplejson&lt;/a&gt; to use this library. We get the response from Yahoo as JSON, and simplejson is needed to parse it.&lt;br /&gt;There are four methods corresponding to the four Yahoo API calls. 
The arguments for each method are exactly the same as the arguments required by the REST API, except &lt;pre&gt;output&lt;/pre&gt; and &lt;pre&gt;callback&lt;/pre&gt;, which are never used.&lt;br /&gt;&lt;br /&gt;get_inlink_data(&lt;a href="http://developer.yahoo.com/search/siteexplorer/V1/inlinkData.html"&gt;inLinkData&lt;/a&gt;)&lt;br /&gt;get_page_data(&lt;a href="http://developer.yahoo.com/search/siteexplorer/V1/pageData.html"&gt;pageData&lt;/a&gt;)&lt;br /&gt;do_ping(&lt;a href="http://developer.yahoo.com/search/siteexplorer/V1/ping.html"&gt;ping&lt;/a&gt;)&lt;br /&gt;do_update_notification(&lt;a href="http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html"&gt;update_notification&lt;/a&gt;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-4466472395139812623?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/4466472395139812623/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=4466472395139812623' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/4466472395139812623'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/4466472395139812623'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2007/09/getting-started-on-seo-programming.html' title='Getting started on SEO programming (using Python)'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-7449369442394839073</id><published>2007-08-23T05:24:00.000-07:00</published><updated>2007-08-23T09:50:13.604-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reddit'/><category scheme='http://www.blogger.com/atom/ns#' term='stats'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='scrape'/><title type='text'>Python fun with reddit URLs</title><content type='html'>These days I spend a lot of time on reddit. So I got an itch to find out which sites are the most popular on reddit, what people comment on, and how many points a submitted URL gets on average. So I wrote a quick &lt;a href="http://paste.lisp.org/display/46603"&gt;python program&lt;/a&gt; (&lt;a href="#pyth"&gt;Description&lt;/a&gt;) to scrape reddit, and here is what I found.&lt;br /&gt;&lt;br /&gt;Scraping the 1000 highest rated submissions at &lt;a href="http://reddit.com/top?offset=0"&gt;reddit.com/top&lt;/a&gt;:&lt;br /&gt;1. The sites with the most entries are www.nytimes.com, reddit.com, www.flickr.com, www.youtube.com and www.washingtonpost.com. Xkcd.com beats en.wikipedia.org, with 11 entries to wikipedia's 10.&lt;br /&gt;2. http://reddit.com/goto?id=1328g got the most points ever, 1937.  (Hint, hint)&lt;br /&gt;3. The average score of the top 1000 submissions is 682.851 points.&lt;br /&gt;4. The longest title has 83 words and says&lt;br /&gt;&lt;span style="font-style: italic;"&gt; Barak Obama in 2002: "I know that even a successful war against Iraq will require a US occupation of undetermined length, at undetermined cost, with undetermined consequences. 
I know that an invasion of Iraq without a clear rationale and without strong international support will only fan the flames of the Middle East, and encourage the worst, rather than best, impulses of the Arab world, and strengthen the recruitment arm of al-Qaeda.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt; I am not opposed to all wars. I’m opposed to dumb wars."&lt;br /&gt;&lt;/span&gt;5. There are 637 unique sites.&lt;br /&gt;6. Average title length is 11.774 words.&lt;br /&gt;7. 516 sites have only one submission.&lt;br /&gt;8. The most common &lt;span style="font-style: italic;"&gt;uncommon word &lt;/span&gt;in the title is&lt;span style="font-style: italic;"&gt; &lt;/span&gt;[pic] (54 repetitions).&lt;br /&gt;&lt;a href="#red_tp"&gt;raw output from python program&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Scraping the top 1000 sites on &lt;a href="http://reddit.com/?offset=0"&gt;reddit.com&lt;/a&gt;:&lt;br /&gt;1. Sites with the most submissions are news.yahoo.com, news.bbc.co.uk, www.youtube.com, www.nytimes.com and www.washingtonpost.com.&lt;br /&gt;2. Average points are 55.138.&lt;br /&gt;3. Longest title is 51 words.&lt;br /&gt;4. Total unique sites are 566.&lt;br /&gt;5. Average title length is 10.308 words.&lt;br /&gt;6. 365 sites have only one submission.&lt;br /&gt;7. The most common &lt;span style="font-style: italic;"&gt;uncommon word &lt;/span&gt;in the title is&lt;span style="font-style: italic;"&gt; &lt;/span&gt;iraq (37 repetitions).&lt;br /&gt;&lt;a href="#red"&gt;raw output from python program&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Scraping the all-time top submissions on &lt;a href="http://programming.reddit.com/top?offset=0"&gt;programming.reddit.com/top&lt;/a&gt;:&lt;br /&gt;1. Sites with the most submissions are www.codinghorror.com, www.joelonsoftware.com, groups.google.com, xkcd.com and thedailywtf.com.&lt;br /&gt;2. Average points are 221.961.&lt;br /&gt;3. Longest title is 47 words.&lt;br /&gt;4. Average title length is 8.385 words.&lt;br /&gt;5. 
559 sites have only one submission.&lt;br /&gt;6. Total unique sites are 675.&lt;br /&gt;7. The most common &lt;span style="font-style: italic;"&gt;uncommon word &lt;/span&gt;in the title is&lt;span style="font-style: italic;"&gt; &lt;/span&gt;programming (obviously) (58 repetitions).&lt;br /&gt;8. Lisp is the most common language name in the title, followed by python.&lt;br /&gt;9. Maximum points are 1609 by http://upload.wikimedia.org/wikipedia/commons/1/17/Metric_system.png&lt;br /&gt;&lt;a href="#prog_red_tp"&gt;raw output from python program&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a name="pyth"&gt;&lt;/a&gt;&lt;br /&gt;The python program can be found at &lt;a href="http://paste.lisp.org/display/46603"&gt;paste.lisp.org&lt;/a&gt;. It needs &lt;a href="http://www.crummy.com/software/BeautifulSoup/"&gt;BeautifulSoup&lt;/a&gt; to work. It can work on any subreddit if you modify the base_url in the script. Running this script puts a heavy load on the reddit servers, so please do not abuse it. 
If you need the output file of these, just mail me, and I would be willing to send them to you.&lt;br /&gt;&lt;a name="red_tp"&gt;&lt;br /&gt;&lt;/a&gt;****fun with reddit urls(base_url = http://reddit.com/top?offset=)****&lt;br /&gt;total sites are 1000&lt;br /&gt;total unique sites 637&lt;br /&gt;top 20 sites are [(u'reddit.com', 24), (u'www.nytimes.com', 24), (u'www.flickr.com', 19), (u'www.youtube.com', 15), (u'www.washingtonpost.com', 14), (u'news.bbc.co.uk', 12), (u'news.yahoo.com', 12), (u'xkcd.com', 11), (u'en.wikipedia.org', 10), (u'www.guardian.co.uk', 10), (u'www.craigslist.org', 9), (u'consumerist.com', 7), (u'www.google.com', 7), (u'www.msnbc.msn.com', 7), (u'www.snopes.com', 7), (u'money.cnn.com', 6), (u'www.crooksandliars.com', 6), (u'www.dailymail.co.uk', 6), (u'community.livejournal.com', 5), (u'pressesc.com', 5)]&lt;br /&gt;Sites with only one entry 516&lt;br /&gt;maximum points are 1937 by http://reddit.com/info/1328g/comments&lt;br /&gt;average points are 682.851&lt;br /&gt;average title length 11.774&lt;br /&gt;largest title has length 83 and is           Barak Obama in 2002: &amp;quot;I know that even a successful war against Iraq will require a US occupation of undetermined length, at undetermined cost, with undetermined consequences. I know that an invasion of Iraq without a clear rationale and without strong international support will only fan the flames of the Middle East, and encourage the worst, rather than best, impulses of the Arab world, and strengthen the recruitment arm of al-Qaeda.&lt;br /&gt; I am not opposed to all wars. 
I’m opposed to dumb wars.&amp;quot;&lt;br /&gt;50 most common words are [(u'the', 354), (u'to', 291), (u'of', 243), (u'a', 223), (u'in', 144), (u'and', 133), (u'The', 111), (u'you', 105), (u'for', 104), (u'is', 95), (u'on', 82), (u'-', 71), (u'I', 56), (u'that', 52), (u'with', 51), (u'from', 49), (u'it', 49), (u'A', 47), (u'are', 46), (u'at', 45), (u'this', 39), (u'What', 38), (u'by', 37), (u'not', 37), (u'an', 36), (u'How', 35), (u'You', 33), (u'about', 33), (u'as', 33), (u'your', 33), (u'This', 29), (u'his', 29), (u'[pic]', 27), (u'Bush', 26), (u'be', 26), (u'have', 26), (u'like', 26), (u'up', 26), (u'if', 25), (u'no', 25), (u'Why', 24), (u'can', 24), (u'do', 21), (u'they', 21), (u'what', 21), (u'US', 20), (u'get', 20), (u'or', 20), (u'we', 20), (u'Google', 19)]&lt;br /&gt;50 most common words, ignoring case are [('the', 467), ('to', 307), ('a', 270), ('of', 252), ('in', 162), ('and', 143), ('you', 138), ('for', 117), ('is', 108), ('on', 92), ('-', 71), ('this', 71), ('that', 64), ('it', 61), ('what', 59), ('i', 58), ('from', 56), ('with', 56), ('[pic]', 54), ('are', 53), ('not', 50), ('at', 48), ('your', 48), ('an', 47), ('how', 47), ('if', 41), ('by', 40), ('about', 39), ('as', 36), ('can', 34), ('why', 34), ('no', 33), ('we', 33), ('have', 32), ('do', 31), ('his', 31), ('they', 31), ('(pic)', 30), ('like', 29), ('up', 28), ('bush', 27), ('one', 27), ('be', 26), ('who', 25), ('all', 23), ('it&amp;#39;s', 23), ('so', 23), ('was', 23), ('when', 23), ('but', 22)]&lt;br /&gt;&lt;a name="red"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;****fun with reddit urls(http://reddit.com/?offset=)****&lt;br /&gt;total sites are 1000&lt;br /&gt;total unique sites 566&lt;br /&gt;top 20 sites are [(u'news.yahoo.com', 22), (u'news.bbc.co.uk', 21), (u'www.youtube.com', 18), (u'www.nytimes.com', 16), (u'www.washingtonpost.com', 12), (u'www.wired.com', 11), (u'www.cnn.com', 9), (u'thinkprogress.org', 8), (u'www.guardian.co.uk', 8), (u'www.salon.com', 7), (u'blog.wired.com', 6), 
(u'www.chinapost.com.tw', 6), (u'www.dailymail.co.uk', 6), (u'www.myfoxdfw.com', 6), (u'www.opednews.com', 6), (u'www.reuters.com', 6), (u'www.telegraph.co.uk', 6), (u'www.timesonline.co.uk', 6), (u'apnews.myway.com', 5), (u'en.wikipedia.org', 5)]&lt;br /&gt;Sites with only one entry 365&lt;br /&gt;maximum points are 895&lt;br /&gt;average points are 55.138&lt;br /&gt;average title length 10.308&lt;br /&gt;largest title has length 51 and is           [Quote] A tyrant must put on the appearance of uncommon devotion to religion. Subjects are less apprehensive of illegal treatment from a ruler whom they consider god-fearing and pious. On the other hand, they do less easily move against him, believing that he has the gods on his side - Aristotle&lt;br /&gt;50 most common words are [(u'the', 306), (u'of', 232), (u'to', 221), (u'a', 159), (u'in', 157), (u'and', 125), (u'The', 105), (u'for', 94), (u'-', 90), (u'on', 79), (u'is', 68), (u'with', 48), (u'by', 41), (u'that', 39), (u'A', 37), (u'Iraq', 37), (u'from', 34), (u'Bush', 33), (u'are', 31), (u'New', 30), (u'at', 29), (u'as', 28), (u'have', 26), (u'you', 26), (u'How', 25), (u'your', 25), (u'Of', 24), (u'US', 24), (u'about', 23), (u'In', 22), (u'not', 22), (u'For', 21), (u'I', 20), (u'To', 19), (u'be', 19), (u'this', 19), (u'Vietnam', 18), (u'an', 18), (u'they', 18), (u'American', 17), (u'no', 17), (u'U.S.', 16), (u'was', 16), (u'their', 15), (u'will', 15), (u'Is', 14), (u'What', 14), (u'Why', 14), (u'You', 14), (u'has', 14)]&lt;br /&gt;50 most common words, ignoring case are [('the', 414), ('of', 257), ('to', 241), ('a', 196), ('in', 179), ('and', 132), ('for', 117), ('on', 94), ('-', 90), ('is', 83), ('with', 62), ('that', 46), ('by', 44), ('are', 40), ('new', 40), ('you', 40), ('not', 39), ('iraq', 37), ('your', 37), ('from', 35), ('at', 34), ('bush', 33), ('how', 33), ('as', 30), ('us', 30), ('about', 28), ('an', 28), ('have', 28), ('be', 25), ('do', 25), ('they', 25), ('no', 24), ('this', 24), ('war', 23), 
('will', 23), ('it', 21), ('i', 20), ('my', 20), ('out', 20), ('what', 20), ('police', 19), ('has', 18), ('vietnam', 18), ('we', 18), ('why', 18), ('american', 17), ('if', 17), ('says', 17), ('their', 17), ('was', 17)]&lt;br /&gt;&lt;a name="prog_red_tp"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;****fun with reddit urls(base_url = http://programming.reddit.com/top?offset=)****&lt;br /&gt;total sites are 1000&lt;br /&gt;total unique sites 675&lt;br /&gt;top 20 sites are [(u'www.codinghorror.com', 31), (u'www.joelonsoftware.com', 22), (u'groups.google.com', 17), (u'xkcd.com', 16), (u'thedailywtf.com', 12), (u'programming.reddit.com', 10), (u'worsethanfailure.com', 10), (u'paulgraham.com', 9), (u'blogs.msdn.com', 8), (u'blogs.sun.com', 8), (u'www.defmacro.org', 8), (u'arstechnica.com', 7), (u'en.wikipedia.org', 7), (u'kerneltrap.org', 7), (u'steve-yegge.blogspot.com', 7), (u'weblog.raganwald.com', 7), (u'codist.biit.com', 6), (u'scienceblogs.com', 6), (u'www.paulgraham.com', 6), (u'diveintomark.org', 5)]&lt;br /&gt;Sites with only one entry 559&lt;br /&gt;maximum points are 1609 by http://upload.wikimedia.org/wikipedia/commons/1/17/Metric_system.png&lt;br /&gt;average points are 221.961&lt;br /&gt;average title length 8.385&lt;br /&gt;largest title has length 47 and is           &amp;quot;The &amp;quot;you don&amp;#39;t own your computer&amp;quot; paradigm is not merely wrong. 
It is violently, disastrously wrong, and the consequences of this error are likely to be felt for generations to come, unless steps are taken to prevent it.&amp;quot;   On the need for a Hippocratic Oath for programmers.&lt;br /&gt;50 most common words are [(u'the', 186), (u'to', 168), (u'of', 159), (u'a', 148), (u'The', 137), (u'in', 103), (u'and', 89), (u'-', 79), (u'for', 77), (u'on', 71), (u'is', 66), (u'Why', 54), (u'you', 54), (u'I', 46), (u'How', 45), (u'A', 38), (u'Programming', 38), (u'with', 36), (u'your', 33), (u'Google', 32), (u'What', 30), (u'by', 30), (u'Lisp', 29), (u'about', 26), (u'from', 26), (u'Software', 25), (u'it', 25), (u'not', 25), (u'an', 24), (u'are', 24), (u'code', 22), (u'that', 22), (u'Python', 21), (u'do', 21), (u'Linux', 20), (u'be', 20), (u'programming', 20), (u'software', 20), (u'Web', 18), (u'To', 17), (u'at', 17), (u'this', 17), (u'Is', 16), (u'all', 16), (u'as', 16), (u'how', 16), (u'why', 15), (u'--', 14), (u'Microsoft', 14), (u'Ruby', 14)]&lt;br /&gt;50 most common words, ignoring case are [('the', 323), ('a', 186), ('to', 186), ('of', 164), ('in', 110), ('and', 98), ('for', 83), ('is', 82), ('on', 80), ('-', 79), ('you', 71), ('why', 69), ('how', 61), ('programming', 58), ('i', 46), ('software', 45), ('your', 41), ('with', 40), ('what', 39), ('not', 35), ('code', 34), ('it', 34), ('lisp', 34), ('an', 33), ('about', 32), ('by', 32), ('google', 32), ('are', 30), ('from', 30), ('do', 29), ('web', 29), ('all', 25), ('be', 25), ('computer', 25), ('my', 25), ('this', 25), ('that', 24), ('one', 22), ('language', 21), ('linux', 21), ('python', 21), ('can', 20), ('at', 19), ('new', 19), ('things', 18), ('when', 18), ('as', 17), ('it&amp;#39;s', 17), ('like', 17), ('programmers', 17)]&lt;br /&gt;&lt;br /&gt;www.codinghorror.com, www.joelonsoftware.com, groups.google.com, xkcd.com, thedailywtf.com&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/25189681-7449369442394839073?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/7449369442394839073/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=7449369442394839073' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/7449369442394839073'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/7449369442394839073'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2007/08/python-fun-with-reddit-urls.html' title='Python fun with reddit URLs'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-2423650574121792769</id><published>2007-01-19T09:10:00.000-08:00</published><updated>2007-01-19T10:20:13.994-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='Toolbars'/><title type='text'>The war for your search bar</title><content type='html'>(Welcome &lt;a href="http://reddit.com/info/z6ne/comments"&gt;reddit&lt;/a&gt; users)&lt;br /&gt;If you are anything like me, you probably have &lt;a href="http://toolbar.google.com/"&gt;Google toolbar&lt;/a&gt; installed in your primary browser. It does many things, but foremost, it allows you to search Google without going to google.com.&lt;br /&gt;But there is a search bar built right within your browser. 
It sits right next to the address bar.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_ka0zcs_C8q8/RbEAzUmlFGI/AAAAAAAAAAk/w8Uivl9544g/s1600-h/2.png"&gt;&lt;img style="cursor: pointer;" src="http://bp1.blogger.com/_ka0zcs_C8q8/RbEAzUmlFGI/AAAAAAAAAAk/w8Uivl9544g/s400/2.png" alt="" id="BLOGGER_PHOTO_ID_5021795941198664802" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Now you might think that no one would care about such a puny, teeny-weeny search bar. And sir, could you be more wrong?&lt;br /&gt;&lt;br /&gt;It all started when I wanted to install &lt;a href="http://www.picasa.com/"&gt;Picasa&lt;/a&gt;. This is what I got in the last step of the installation.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_ka0zcs_C8q8/RbEBgEmlFHI/AAAAAAAAAAs/mAkV2PEQrms/s1600-h/3.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp0.blogger.com/_ka0zcs_C8q8/RbEBgEmlFHI/AAAAAAAAAAs/mAkV2PEQrms/s400/3.png" alt="" id="BLOGGER_PHOTO_ID_5021796709997810802" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Now Picasa is image management software. Why should it try to reset my search preferences? Oh, and by the way, the default option is to switch the default search engine, not to retain your preferences.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_ka0zcs_C8q8/RbEChkmlFJI/AAAAAAAAAA8/392Gnm8m5dA/s1600-h/3.5.PNG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp2.blogger.com/_ka0zcs_C8q8/RbEChkmlFJI/AAAAAAAAAA8/392Gnm8m5dA/s400/3.5.PNG" alt="" id="BLOGGER_PHOTO_ID_5021797835279242386" border="0" /&gt;&lt;/a&gt;Bad, bad Google. Stealing my search bar! 
Yahoo would not do anything like that. Let's install &lt;a href="http://toolbar.yahoo.com/"&gt;Yahoo search bar&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_ka0zcs_C8q8/RbEDP0mlFKI/AAAAAAAAABE/lQdAY80Y0G0/s1600-h/4.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp3.blogger.com/_ka0zcs_C8q8/RbEDP0mlFKI/AAAAAAAAABE/lQdAY80Y0G0/s400/4.png" alt="" id="BLOGGER_PHOTO_ID_5021798629848192162" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Aw! Not so fast, Yahoo baby. Cap'n Google won't let you change the default option.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_ka0zcs_C8q8/RbEDyEmlFLI/AAAAAAAAABM/c2iV0S5vtm4/s1600-h/5.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_ka0zcs_C8q8/RbEDyEmlFLI/AAAAAAAAABM/c2iV0S5vtm4/s400/5.png" alt="" id="BLOGGER_PHOTO_ID_5021799218258711730" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Well then, let's try &lt;a href="http://toolbar.msn.com/"&gt;MSN toolbar&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_ka0zcs_C8q8/RbEEikmlFMI/AAAAAAAAABU/-FYfg496wLg/s1600-h/6.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp2.blogger.com/_ka0zcs_C8q8/RbEEikmlFMI/AAAAAAAAABU/-FYfg496wLg/s400/6.png" alt="" id="BLOGGER_PHOTO_ID_5021800051482367170" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So does 
opening Gmail change search preferences too? Looks like it does not. Thank god for small mercies.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_ka0zcs_C8q8/RbEFHkmlFNI/AAAAAAAAABc/rCtC_z377Ao/s1600-h/8.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp2.blogger.com/_ka0zcs_C8q8/RbEFHkmlFNI/AAAAAAAAABc/rCtC_z377Ao/s400/8.png" alt="" id="BLOGGER_PHOTO_ID_5021800687137526994" border="0" /&gt;&lt;/a&gt;Let's see what happens if I manually change the &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_ka0zcs_C8q8/RbEFxkmlFOI/AAAAAAAAABk/IHsIgQWmJAI/s1600-h/9.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp2.blogger.com/_ka0zcs_C8q8/RbEFxkmlFOI/AAAAAAAAABk/IHsIgQWmJAI/s400/9.png" alt="" id="BLOGGER_PHOTO_ID_5021801408692032738" border="0" /&gt;&lt;/a&gt;search settings.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Do they do this with Firefox too?&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_ka0zcs_C8q8/RbEGRUmlFPI/AAAAAAAAABs/OGez28VVKCM/s1600-h/10.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp1.blogger.com/_ka0zcs_C8q8/RbEGRUmlFPI/AAAAAAAAABs/OGez28VVKCM/s400/10.png" alt="" id="BLOGGER_PHOTO_ID_5021801954152879346" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Looks like they do.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;When you install a toolbar, aren't you already reserving a part of your screen real estate for that search engine? And then shouldn't the toolbar offer to leave your search bar alone, instead of trying to capture it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-2423650574121792769?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/2423650574121792769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=2423650574121792769' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/2423650574121792769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/2423650574121792769'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2007/01/war-for-your-search-bar.html' title='The war for your search bar'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_ka0zcs_C8q8/RbEAzUmlFGI/AAAAAAAAAAk/w8Uivl9544g/s72-c/2.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-4847693588347490180</id><published>2007-01-18T07:51:00.000-08:00</published><updated>2007-01-18T08:36:17.201-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Captcha'/><category scheme='http://www.blogger.com/atom/ns#' term='Comment 
spam'/><title type='text'>ACAPTCHA - Almost Completely Automated Public Turing test to tell Computers and Humans Apart</title><content type='html'>(Welcome &lt;a href="http://reddit.com/info/z05j/comments"&gt;reddit&lt;/a&gt; users)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Captcha"&gt;Captchas&lt;/a&gt; generally (&lt;a href="http://seodummy.blogspot.com/2006/04/how-spammers-are-beating-captcha.html"&gt;but not always&lt;/a&gt;) solve the problem of comment and other spam. But this comes at a price. Users with low vision and other disabilities find solving captchas hard. And blind users can't solve them unless you provide an alternative audio captcha. &lt;a href="http://sethgodin.typepad.com/seths_blog/2006/12/three_follow_up.html"&gt;Why, even Seth hates it!&lt;/a&gt;&lt;br /&gt;&lt;a href="http://damienkatz.net/2007/01/negative_captch.html"&gt;Negative captcha&lt;/a&gt;, where you hide form fields via CSS so that users can't see them and hence don't fill them in, while bots will, is an interesting possibility. But let me introduce ACAPTCHA - "&lt;b&gt;Almost C&lt;/b&gt;ompletely &lt;b&gt;A&lt;/b&gt;utomated &lt;b&gt;P&lt;/b&gt;ublic &lt;a href="http://en.wikipedia.org/wiki/Turing_test" title="Turing test"&gt;&lt;b&gt;T&lt;/b&gt;uring test&lt;/a&gt; to tell &lt;b&gt;C&lt;/b&gt;omputers and &lt;b&gt;H&lt;/b&gt;umans &lt;b&gt;A&lt;/b&gt;part" to you. This is what you do.&lt;br /&gt;&lt;br /&gt;1. There are some questions which are very easy for humans to answer but very difficult for bots to understand. Take "What color is a blue towel?" or "Is a green towel red?". Any (well, most) human can answer that question in a snap, but probably no bot can.&lt;br /&gt;2. Create a centralized AND rapidly changing repository of such questions. Maybe allow users to submit new questions and answers there. Maybe peer review questions before accepting them. Whatever you do, get a large and fast-changing repository.&lt;br /&gt;3. 
Create a plugin/architecture where you get a random question from the repository (a la &lt;a href="http://en.wikipedia.org/wiki/Akismet"&gt;Akismet&lt;/a&gt;, which is a distributed anti-spam engine) and ask users to solve it.&lt;br /&gt;There are already some sites which try to do something similar. They ask something like "What is 2 + 2". The problem is that this is probably very easy to break. As soon as it becomes mainstream, you can be sure that the bots will break through and abuse it. To beat completely automated systems, you need to bring in human intelligence.&lt;br /&gt;&lt;br /&gt;Updates -&lt;br /&gt;Foo asked: ". The repo would have to include the *answers* and be as easily downloadable, right? Right. So Mr. Spammer wins again."&lt;br /&gt;And I say: Well no, the idea is that the central repository has, say, a million questions and answers. And whenever any site wants to check using an ACaptcha, it asks for a question-answer pair (using an API). Now no one except the repository has all the questions, and each time the spammers get a new question. 
This is why you need the repository to get new questions quickly, so that spammers cannot build up a bank of questions over time and learn their answers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-4847693588347490180?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/4847693588347490180/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=4847693588347490180' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/4847693588347490180'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/4847693588347490180'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2007/01/acaptcha-almost-completely-automated.html' title='ACAPTCHA - Almost Completely Automated Public Turing test to tell Computers and Humans Apart'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-115122474645133123</id><published>2006-06-25T01:33:00.000-07:00</published><updated>2006-06-25T01:39:07.443-07:00</updated><title type='text'>What does not work with SEO.</title><content type='html'>With my random walks in the SEO world, I have been trying to find what works in the SEO universe. Now you will find a million people giving a million different pieces of advice. 
Though there are a few people who are right on the mark, most of the advice is either pure BS or very, very outdated.&lt;br /&gt;So instead of adding to that garbage and telling you what works in SEO, let me tell you what does not.&lt;br /&gt;If there is a technique which everyone is using, run. Run fast and far away from it. It is going to be abused by shady SEO guys. And then you can be sure that the SEs will penalise it.&lt;br /&gt;Article submission and directory submission are the things to do right now. But with everyone using them, I wonder how long it is going to stay that way!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-115122474645133123?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/115122474645133123/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=115122474645133123' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/115122474645133123'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/115122474645133123'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/06/what-does-not-work-with-seo.html' title='What does not work with SEO.'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114669574898051996</id><published>2006-05-03T13:22:00.000-07:00</published><updated>2006-05-03T21:47:39.873-07:00</updated><title type='text'>How google helps 
spammers and destroys your internet experience.</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/996/865/1600/0.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/996/865/320/0.gif" alt="" border="0" /&gt;&lt;/a&gt;Not so long ago, if anyone asked you to name &lt;span style="font-weight: bold;"&gt;the&lt;/span&gt; one thing which made the web better, chances are you would have named Google. And so would I. &lt;span style="font-weight: bold;"&gt;Not so long ago.&lt;/span&gt;&lt;br /&gt;Are you a webmaster? Quick, name one term which just spoils your day. Was it MFA? MFA: Made-for-AdSense sites. Automated sites which just copy content and add no value.&lt;br /&gt;Google's lax enforcement of the AdSense TOS means that spammers can show AdSense ads on crappy sites and get away with it. It means that you are forced to see pages with a single sentence and 3 ad units. It means search engine spammers can get away with anything while visitors lose time, publishers lose money and valid AdSense ads get a bad rep.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;But that is not even the bad part.&lt;/span&gt;&lt;br /&gt;Google is actively, OK, almost actively, promoting these black hat techniques.&lt;br /&gt;What does Joe BlackHatter need to create an MFA site? Software. Software to spew out a bazillion automated sites. Now Google has declared a jihad against automated/scraped content. So you would think that they would not touch it with a 10-foot barge pole. 
OK, let's just ask Google: &lt;a href="http://www.google.com/search?q=AUTOMATIC+CONTENT+GENERATOR"&gt;http://www.google.com/search?q=AUTOMATIC+CONTENT+GENERATOR&lt;/a&gt;.&lt;br /&gt;The result:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/996/865/1600/1.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; cursor: pointer; float: left; padding-right: 10px;" src="http://photos1.blogger.com/blogger/996/865/320/1.0.png" alt="" border="0" /&gt;&lt;/a&gt;   Yes sir. About a million sponsored listings for software to create spam. How broke is Google that they have to promote such search engine fodder?&lt;br /&gt;As a leading search engine, one would expect Google to take a proactive role in weeding them out.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Just for fun, here are some more search results where Google lists spammy software. (Only sponsored results are shown.)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/996/865/1600/2.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: left; cursor: pointer;" src="http://photos1.blogger.com/blogger/996/865/320/2.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/996/865/1600/3.gif"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.google.com/search?q=Adsense"&gt;http://www.google.com/search?q=Adsense&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.google.com/search?q=CLOAKING"&gt;http://www.google.com/search?q=CLOAKING&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So next time you see Matt Cutts crying about black hat SEOs being the scum of the earth, just drop him a line.&lt;br /&gt;&lt;br /&gt;(If you liked this story, why not &lt;a href="http://digg.com/links/How_Google_helps_spammers_and_destroys_your_web_experience."&gt;&lt;span 
style="font-weight: bold;"&gt;digg it&lt;/span&gt;&lt;/a&gt;?)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114669574898051996?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114669574898051996/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114669574898051996' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114669574898051996'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114669574898051996'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/05/how-google-helps-spammers-and-destroys.html' title='How google helps spammers and destroys your internet experience.'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114620275401725946</id><published>2006-04-27T22:20:00.000-07:00</published><updated>2006-04-27T22:39:14.050-07:00</updated><title type='text'>How spammers are beating CAPTCHA.</title><content type='html'>&lt;span style="font-style: italic;"&gt;(OK, this is not exactly SEO, but I know you will be interested in this.)&lt;/span&gt;&lt;br /&gt;Just in case you do not know, CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/996/865/1600/captcha.jpg"&gt;&lt;img style="margin: 
0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/996/865/320/captcha.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Captchas are the pictures containing words you have to type in before you can post a comment on blogs, write something on digg or create a free mail account.&lt;br /&gt;Now spammers need a lot of free email accounts. They want to comment spam your blog. For this they need to beat the captcha.&lt;br /&gt;Spammers are beating captchas in two ways. Unless the image is very blurred or grainy, image processing software can be used to extract the words in it. The guy at &lt;a href="http://www.mperfect.net/aiCaptcha/"&gt;http://www.mperfect.net/aiCaptcha/&lt;/a&gt;&lt;br /&gt;gives an example of how a captcha can be beaten using software. But there is an even better way. Social engineering.&lt;br /&gt;What is the internet most used for? I do not have the statistics, but I am willing to bet that PORN is right there at the top. And what is even better than porn? Free porn, obviously. &lt;br /&gt;When Mr. BigSpammer needs to break a million captchas, he ties up with BigFreePornSite.com. His software fetches the captcha images and sends them to BigFreePornSite.com. When Joe TeenHighOnSex visits BigFreePornSite.com, he is asked to type in the text in a captcha image, and his answer is sent to Mr. BigSpammer's servers. Lo, the captcha is broken. Now Mr. 
BigSpammer can comment spam, digg spam, Yahoo spam.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114620275401725946?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.mperfect.net/aiCaptcha/' title='How spammers are beating CAPTCHA.'/><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114620275401725946/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114620275401725946' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114620275401725946'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114620275401725946'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/04/how-spammers-are-beating-captcha.html' title='How spammers are beating CAPTCHA.'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114594850071737666</id><published>2006-04-24T23:53:00.000-07:00</published><updated>2006-04-25T04:55:10.720-07:00</updated><title type='text'>Add Links for Del.icio.us, Digg, and More to Blogger Posts</title><content type='html'>Social bookmarking sites can be a very effective way to get visitors to your sites. If your readers like what you say, why not give them a chance to bookmark you at del.icio.us and other similar sites?&lt;br /&gt;To add a quick link on your blog to all of the popular traffic-boosting sites, simply add the code below to your template. 
I generally add it just below the content part, but you can put it anywhere.&lt;br /&gt;&lt;br /&gt;(&lt;a href="http://blogger-templates.blogspot.com/2004/04/change-blogger-template.html"&gt;How to edit a Blogger template&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Del.icio.us Link:&lt;/span&gt;&lt;br /&gt;http://del.icio.us/post?url=&lt; $BlogItemPermalinkURL$&gt; &amp;title=&lt; $BlogItemTitle$&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Digg Link:&lt;/span&gt;&lt;br /&gt;http://digg.com/submit?phase=2&amp;url="&lt;$BlogItemPermalinkURL$&gt;"&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Technorati Cosmos Link:&lt;/span&gt;&lt;br /&gt;http://technorati.com/cosmos/search.html?url=&lt; $BlogItemPermalinkURL$&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Furl Link:&lt;/span&gt;&lt;br /&gt;http://furl.net/storeIt.jsp?t=&lt; $BlogItemTitle$&gt; &amp;u=&lt; $BlogItemPermalinkURL$&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;reddit Link:&lt;/span&gt;&lt;br /&gt;http://reddit.com/submit?url=&lt; $BlogItemPermalinkURL$&gt; &amp;amp;title=&lt; $BlogItemTitle$&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114594850071737666?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114594850071737666/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114594850071737666' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114594850071737666'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114594850071737666'/><link rel='alternate' type='text/html' 
href='http://seodummy.blogspot.com/2006/04/add-links-for-delicious-digg-and-more.html' title='Add Links for Del.icio.us, Digg, and More to Blogger Posts'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114594070019253434</id><published>2006-04-24T21:42:00.000-07:00</published><updated>2006-04-24T21:51:40.210-07:00</updated><title type='text'>Do automatic content generators work?</title><content type='html'>&lt;i&gt;If you are in a hurry and cannot read the rest of the article: no, they do not.&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;    &lt;p class="MsoNormal"&gt;&lt;b&gt;What are automatic content generators?&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;Automatic content generators are programs that claim to create content, in the form of articles, automatically. Writing articles without human intervention might seem an amazing capability, but the software only &lt;i&gt;rewrites &lt;/i&gt;existing articles to create new ones.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;There are three main ways in which these content generators work.&lt;/p&gt;  &lt;ol style="margin-top: 0in;" start="1" type="a"&gt;&lt;li class="MsoNormal" style=""&gt;Scraping. The software takes parts of articles from different places and joins them together to create a new article.&lt;/li&gt;&lt;li class="MsoNormal" style=""&gt;Thesaurus substitution. 
Synonyms are substituted into the original article to create the new article.&lt;/li&gt;&lt;li class="MsoNormal" style=""&gt;Markov chains. A statistical model of the existing article is built, and the new article is generated from that model.&lt;/li&gt;&lt;/ol&gt;  &lt;p class="MsoNormal"&gt;Of these methods, Markov chains hold the most promise, as they are the hardest to detect.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;b&gt;So what are Markov chains?&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;Apart from being a lot of bullshit in computer science, they are a tool for creating pseudo-random text from a statistical model of another text. Since the model is based on non-random text, the output will follow the rules of English grammar most of the time. Given a large non-random text to build the statistical model from, it can generate text that sometimes passes human scrutiny.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;A Markov chain records which words follow a given sequence of words. The new text is created from this data.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;b&gt;My experiments with Markov chains.&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;Most black hat SEO techniques leave some footprint which the SEs use to identify an article as automatically generated. This leaves commercial automatic content generators vulnerable. I wanted to check whether the SEs are able to identify Markov chain content. For this purpose I wrote my own software. I tried to remove other signs which might flag the content as automatically generated. In particular, the size of the files was changed. I removed trailing sentences that ended abruptly. 
Paragraph breaks were introduced.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;A site was created with such content and hosted on Tripod. It was given a link from a PR 3 page. I checked the position of the web pages in the SEs from time to time. After a period of four months, no references to the automatically created web pages were found.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;b&gt;So the final words.&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;Since the web pages were not included in the SEs' indexes, the value of creating such web pages is very limited. There is some commercial software which claims to create articles automatically. I have tried only one such program, so I cannot make claims about the effectiveness of the rest. But basically they all use the same algorithms, so the results should hold for the others as well.&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;References.&lt;/p&gt;  &lt;ol style="margin-top: 0in;" start="1" type="a"&gt;&lt;li class="MsoNormal" style=""&gt;URLs of the created web pages. List at &lt;a href="http://seo-experiments.blogspot.com/"&gt;http://seo-experiments.blogspot.com/&lt;/a&gt;.&lt;/li&gt;&lt;li class="MsoNormal" style=""&gt;Source and binaries of the software used to create the web pages. 
&lt;a href="http://www.fileshack.us/files/1058/MarkovSeo.zip"&gt;http://www.fileshack.us/files/1058/MarkovSeo.zip&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114594070019253434?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114594070019253434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114594070019253434' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114594070019253434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114594070019253434'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/04/do-automatic-content-generators-work.html' title='Do automatic content generators work?'/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114391881157120702</id><published>2006-04-01T11:03:00.000-08:00</published><updated>2006-04-01T11:14:05.493-08:00</updated><title type='text'></title><content type='html'>"You must be feeling a bit like Alice now?"&lt;br /&gt;Morpheus to Neo, The Matrix.&lt;br /&gt;I surely am an Alice wandering the SEO wonderland. And from what I gather, no one knows anything in SEO. OK, let me rephrase that: no one knows most of the things in SEO. 
The SEO wonderland I have been wandering consists of the forums of &lt;a href="http://www.highrankings.com"&gt;highrankings&lt;/a&gt;, &lt;a href="http://www.digitalpoint.com/forums/"&gt;digitalpoint&lt;/a&gt; and &lt;a href="http://www.webmasterworld.com"&gt;WMW&lt;/a&gt;.&lt;br /&gt;Are reciprocal links dead? Almost, but *mutual* links are in!&lt;br /&gt;Are directories the next big thing? Umm, erm, if they are niche, or they are DMOZ.&lt;br /&gt;OK, these were the easy ones.&lt;br /&gt;If I name a page link.html, do SEs ignore it?&lt;br /&gt;Do SEs respect the nofollow tag?&lt;br /&gt;For once, guys, give me an honest-to-god, clear answer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114391881157120702?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114391881157120702/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114391881157120702' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114391881157120702'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114391881157120702'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/04/you-must-be-feeling-bit-like-alice-now.html' title=''/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114391768544948946</id><published>2006-04-01T10:24:00.000-08:00</published><updated>2006-04-01T10:54:45.460-08:00</updated><title type='text'></title><content type='html'>I have a real grudge against Google. Why do they not make the PR data publicly available, say via their API? There are ways to get PR data, via sites such as www.prchecker.info/check_page_rank.php. By not making PR data publicly available, Google is only hurting everyone.&lt;br /&gt;If I really want to know the PR of a site, I do have tricks with which I can get it. But, well, it's against the TOS. So they hurt the webmaster community. But then it does not help them in any way either. When people get PR without an API, methinks the server load on Google will be higher. Not that Google would be concerned about it or anything.&lt;br /&gt;So why not&lt;br /&gt;1. Make the PR publicly available via, say, its API?&lt;br /&gt;2. If you do not want to do so, why not do away with PR data altogether? Use it only internally. Never show it to an outside guy. 
Or is PR just a way to force you to use the Google toolbar?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114391768544948946?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114391768544948946/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114391768544948946' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114391768544948946'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114391768544948946'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/04/i-have-real-grudge-with-google.html' title=''/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25189681.post-114390943012670893</id><published>2006-04-01T08:36:00.000-08:00</published><updated>2006-04-01T08:37:10.133-08:00</updated><title type='text'></title><content type='html'>The SEO world as seen by the completely brain-dead.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25189681-114390943012670893?l=seodummy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seodummy.blogspot.com/feeds/114390943012670893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25189681&amp;postID=114390943012670893' title='2 
Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114390943012670893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25189681/posts/default/114390943012670893'/><link rel='alternate' type='text/html' href='http://seodummy.blogspot.com/2006/04/seo-world-as-seen-by-completely-brain.html' title=''/><author><name>shabda</name><uri>http://www.blogger.com/profile/07961528262493927188</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry></feed>
