Sunday, September 02, 2007

Getting started on SEO programming (using Python)

The Python code is here.
Usage instructions are here.

You own a website and want to keep track of its placement in search engines. You want to know who is linking to you and how many of your pages are indexed. You also want to tell the search engines when you update your sitemap or your website.

The Site Explorer API from Yahoo makes this extremely convenient. And with Google discontinuing the SOAP Search API, this is the only feasible choice.
The Site Explorer API is a REST service. You construct a URL and make a request from your browser, from the command line, or from anywhere else. You then parse the server's response to get the data in the format of your choice.
We will write a thin Python wrapper over this REST service so that we can construct our queries in Python.
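To make the "construct a URL, make a request, parse the response" idea concrete, here is a minimal sketch of what such a wrapper might look like. It is not the code linked above. The base URL, the `inlinkData` method name in the URL, and the `ResultSet`/`Result` response layout are assumptions about the service, and `build_url`, `parse_results` and `get_inlink_data_sketch` are names made up for this illustration. It uses the standard `json` module in place of simplejson.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the Site Explorer REST service.
BASE_URL = "http://search.yahooapis.com/SiteExplorerService/V1/"


def build_url(method, **params):
    """Build the request URL for one API method, always asking for JSON."""
    params["output"] = "json"
    # Sort the parameters so the generated URL is deterministic.
    return BASE_URL + method + "?" + urlencode(sorted(params.items()))


def parse_results(body):
    """Pull the list of result dicts out of a JSON response body."""
    data = json.loads(body)
    # Assumption: the response looks like {"ResultSet": {"Result": [...]}}.
    return data.get("ResultSet", {}).get("Result", [])


def get_inlink_data_sketch(appid, query, **kwargs):
    """Fetch pages linking to `query` (this performs a network request)."""
    url = build_url("inlinkData", appid=appid, query=query, **kwargs)
    with urlopen(url) as response:
        return parse_results(response.read().decode("utf-8"))
```

Keeping the URL building and the parsing in separate functions means both can be checked without touching the network.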

(To follow these examples, you need this Python code and the simplejson library.)

Some simple examples.
1. We want to get the top 1000 sites which link to reddit

all_urls = []
for i in range(10):
    start = i*100 + 1
    results = get_inlink_data('YahooDemo', u'http://reddit.com', start=start, results=100)
    urls = [el['Url'] for el in results]
    all_urls.extend(urls)


2. We want the 400 highest rated pages on reddit.

all_pages = []
for i in range(4):
    start = i*100 + 1
    results = get_page_data('YahooDemo', u'http://reddit.com', start=start, results=100)
    pages = [el['Url'] for el in results]
    all_pages.extend(pages)
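Examples 1 and 2 repeat the same paging loop, so it could be factored into a small helper. The following is only a sketch: `fetch_all` and its `fetch_page` callback are hypothetical names, not part of the library, and the early exit assumes the service returns a short page when it runs out of results.

```python
def fetch_all(fetch_page, total, page_size=100):
    """Collect up to `total` results by calling fetch_page(start, results)."""
    items = []
    # `start` is 1-based, matching the i*100 + 1 offsets used in the examples.
    for start in range(1, total + 1, page_size):
        count = min(page_size, total - start + 1)
        page = fetch_page(start, count)
        items.extend(page)
        if len(page) < count:
            break  # assumed: a short page means there is nothing more to fetch
    return items


# Hypothetical usage with the wrapper from this post:
# results = fetch_all(
#     lambda start, results: get_inlink_data(
#         'YahooDemo', u'http://reddit.com', start=start, results=results),
#     1000)
# all_urls = [el['Url'] for el in results]
```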


3. Google.com has updated its sitemap. We want to let Yahoo know about it.

do_ping(u'http://www.google.com/sitemap.xml')


4. I have updated SeoDummy. Let's tell Yahoo about that.

do_update_notification('YahooDemo', 'http://www.seodummy.blogspot.com/')


5. You can use these methods in conjunction to get some advanced functionality. For example, you can use get_inlink_data and get_page_data together to get a breakdown of who links to each of your subpages.
For examples of some cool SEO tools, you can go here.
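The per-subpage breakdown from example 5 might be sketched roughly as follows. `inlinks_by_page` is a hypothetical helper, not part of the library. It takes the two wrapper functions as arguments so it can be tried with stand-ins, and it assumes that get_inlink_data accepts the URL of a single page as its query and that every result has a 'Url' key, as in the examples above.

```python
def inlinks_by_page(get_pages, get_inlinks, appid, site, results=100):
    """Map each indexed page of `site` to the URLs that link to it."""
    breakdown = {}
    for page in get_pages(appid, site, results=results):
        page_url = page['Url']
        # Assumption: the inlink call also works for an individual page URL.
        inlinks = get_inlinks(appid, page_url, results=results)
        breakdown[page_url] = [el['Url'] for el in inlinks]
    return breakdown


# Hypothetical usage with the wrapper from this post:
# breakdown = inlinks_by_page(get_page_data, get_inlink_data,
#                             'YahooDemo', u'http://reddit.com')
```

Note that this makes one inlink request per page, so a large site uses up many API calls.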


You need simplejson to use this library. The response from Yahoo comes back as JSON, and simplejson is needed to parse it.
There are four methods, corresponding to the four Yahoo API calls. The arguments for each method are exactly the same as the arguments for the REST API, except output and callback, which are never used.

get_inlink_data(inLinkData)
get_page_data(pageData)
do_ping(ping)
do_update_notification(update_notification)
