Sunday, September 02, 2007

Parasite hosting - Or why social networking sites need to review user generated content.

To see the list of highly ranked spammy user pages within reddit directly, skip to the end of this post.
From Wikipedia: Parasite hosting is the process of hosting a site on someone else's server without their consent, generally for the purpose of search engine benefit.

One of the most competitive keywords, which everyone wants to rank for, is "buy viagra online".

On that and many other similar searches, you will notice that user pages on social bookmarking sites like reddit get very prominent rankings.
These sites have a very high TrustRank and amazing authority scores, so even off-topic pages created on these domains rank high very easily.
I was playing with the sitexplorer Python library I wrote. The user pages on the reddit domain outrank even the subreddit pages.
Using a little Python, I found that of the top 400 pages on the reddit.com domain, 102 are user pages of the form reddit.com/user/{username}/. Nearly all of these are pages promoting prescription pills like Viagra or Cialis.
(The Python program [1] and the list of the user pages in the top 400 pages on reddit are at the end of this post.)

How can the social bookmarking sites combat this?
1. Use a robots.txt. For reddit.com this can be as simple as:

User-Agent: *
Disallow: /user

2. Have kill words which are not allowed in user names. This allows the pages to be indexed without facing parasite hosting.
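
The two countermeasures above can be sketched in a few lines of Python. This is only a minimal illustration, not how reddit does it: the kill-word list and the helper names `is_crawlable` and `has_kill_word` are made up for this example, and a real filter would need a longer, maintained word list. The robots.txt check uses the standard library's `urllib.robotparser` to confirm that the rules from item 1 keep crawlers away from user pages while leaving other pages alone.

```python
import re
from urllib.robotparser import RobotFileParser

# The robots.txt rules suggested in item 1.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /user
"""

# Hypothetical kill-word list, for illustration only.
KILL_WORDS = ("viagra", "cialis", "levitra", "tramadol", "phentermine")


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_kill_word(username, kill_words=KILL_WORDS):
    """Return True if the user name contains a kill word.

    Case and separators are ignored, so 'BUY_VIAGRA_MEDS' and
    'v_i_a_g_r_a' are both caught.
    """
    normalized = re.sub(r"[^a-z0-9]", "", username.lower())
    return any(word in normalized for word in kill_words)


if __name__ == "__main__":
    print(is_crawlable("http://reddit.com/user/Buy_viagra_online/"))
    print(is_crawlable("http://reddit.com/r/programming/"))
    print(has_kill_word("BUY_VIAGRA_MEDS"))
    print(has_kill_word("tylerton"))
```

A check like `has_kill_word` would run at sign-up time, rejecting the user name before the page ever exists. Plain substring matching will also reject some innocent names, so a real site would probably pair it with manual review.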



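[1] The Python program:

import re

# get_page_data comes from the sitexplorer library mentioned above;
# its import is not shown in this listing.
all_pages = []
for i in range(4):
    # Fetch results 1-100, 101-200, 201-300 and 301-400.
    start = i * 100 + 1
    results = get_page_data('YahooDemo', u'http://reddit.com', start=start, results=100)
    pages = [el['Url'] for el in results]
    all_pages.extend(pages)

# Pull the user name out of URLs of the form /user/{username}/.
pat = r'/user/([a-zA-Z0-9_]*)'
rep = re.compile(pat)
users = [rep.search(el) for el in all_pages]
users_ = [el.groups()[0] for el in users if el is not None]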
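
The extraction step of the program can be tried on its own, without the search API call. The sketch below runs the same regular expression over a small hard-coded list of URLs; the sample URLs and the helper name `extract_users` are made up for this illustration, and only the pattern is taken from the program.

```python
import re

# Hypothetical sample of result URLs; the real list came from get_page_data.
sample_pages = [
    "http://reddit.com/user/Buy_viagra_online_/",
    "http://reddit.com/r/programming/",
    "http://reddit.com/user/tylerton/",
    "http://reddit.com/",
]

# Same pattern as in the program: capture the name after /user/.
user_pattern = re.compile(r"/user/([a-zA-Z0-9_]*)")


def extract_users(pages):
    """Return the user names found in URLs of the form /user/{username}/."""
    matches = (user_pattern.search(url) for url in pages)
    return [m.group(1) for m in matches if m is not None]


users = extract_users(sample_pages)
print(users)
print("%d of %d pages are user pages" % (len(users), len(sample_pages)))
```

Dividing the number of matches by the number of pages fetched gives the share of user pages among the top results, which is how a figure like "102 of 400" can be computed.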



List of the users on the reddit.com site. (These links are nofollowed).
[u'Buy_viagra_online_', u'BUY_VIAGRA_MEDS', u'CHEAP_VIAGRA_PRICE', u'ORDER_VIAGRA_NOW', u'tylerton', u'VIAGRA_ONLINE_CHEAP', u'Buy_viagra_online', u'DISCOUNT_VIAGRA_NOW', u'order_viagra_cheap', u'order_viagra_online', u'order_cialis_cheap', u'Viagra_', u'viagraagain', u'cialisagain', u'BUY_VIAGRA_ONLINEE', u'tramadolagain', u'phentermineagain', u'levitraagain', u'BUY_VIAGRA_ONLINE3', u'BUY_FLAGYL_ONLINE', u'CIALIS_LOWEST_PRICES', u'BUY_HOODIA_ONLINE', u'ORDER_VIAGRA_TODAY', u'Ephedra_Pills', u'Viagra_online', u'BUY_VIAGRA_ONLINE2', u'dans_movies', u'panda_movies', u'cialispills', u'BUY_VIAGRA1', u'Viagrapills_Online', u'phenterminepills', u'tramadolpills', u'free_porn_movies', u'phenterminepharm', u'Soma_Carisoprodol', u'cialispharm', u'viagraonline', u'Cheapest_Fioricet', u'Meridia_Diet_Pills', u'viagrapills', u'valiumpills', u'levitrapharm', u'cialis_buy', u'Buy_Percocet', u'viagrapharm', u'BUY_VIAGRA_MD', u'Tramadol_Hcl', u'Generic_Propecia', u'xanaxpill', u'BUY_LEVITRA_ONLINE1', u'tramadolpill', u'cialis_cheap', u'tramadolpharm', u'CIALIS_BEST_PRICES', u'ordertramadol', u'phenterminepill', u'order_levitra_med', u'orderxanax', u'cialis_online_drug', u'orderphentermine', u'generic_cialis_pill', u'ordercialis', u'CIALIS_ONLINE', u'BUY_VIAGRA_ONLINE1', u'online_buy_cialis', u'CHEAP_VIAGRA_PRICES', u'generic_levitra_pill', u'orderviagra', u'LEVITRA_SALE', u'CHEAP_VIAGRA_ONLINE', u'Order_Cialis_Online0', u'BUY_DISCOUNT_VIAGRA', u'order_cialis_online', u'BUY_VIAGRA_TODAY', u'VIAGRA_LOWEST_PRICE', u'CHEAP_VIAGRA_PILL', u'viagracialis', u'Suboxone', u'VIAGRA_BEST_PRICES', u'FDA_CIALIS_ONLINE', u'DISCOUNT_VIAGRA_A', u'cialistop', u'Buy_viagra_', u'phenermine', u'FDA_LEVITRA', u'phenterminetop', u'insura', u'viagrapharmacy', u'cialispharmacy', u'viagratop', u'levitrapharmacy', u'phenterminepharmacy', u'autoverzekering', u'discount_viagra', u'goba', u'levitratop', u'generic_cialis']
