Web Bookmarks (网络收藏夹): Python script for batch-querying website PR (PageRank)

Monday, April 9, 2012

Sent to you by satan via Google Reader:

Posted 12-4-9 via averiany涂鸦馆, by averainy

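I came across this Python script for batch-querying the PR of websites on the 老王python blog. Webmasters should find it handy, so I'm posting it here to share. It reads one URL per line from info.txt, looks each one up on pr.chinaz.com, and writes "url,pr" lines to pr.txt, recording -1 when a lookup fails.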

Source code

    # -*- coding: utf-8 -*-
    # Python 2 script (uses httplib, urllib.urlencode and the print statement).
    import re, urllib, httplib, time


    def get_url(url):
        '''Return the host part of an http(s) URL.'''
        host_re = re.compile(r'^https?://(.*?)($|/)', re.IGNORECASE)
        # group(1) is the host for both http and https, with or without a trailing slash.
        return host_re.search(url).group(1)


    def get_pr(url):
        '''Look up the PR value of a URL via pr.chinaz.com.'''
        params = urllib.urlencode({'PRAddress': url})
        headers = {
            "Content-type": "application/x-www-form-urlencoded",
            "Accept": "text/plain",
            "User-agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
            "Referer": "http://pr.chinaz.com/?PRAddress=www.baidu.com",
        }
        conn = httplib.HTTPConnection("pr.chinaz.com")
        conn.request("GET", "", params, headers)
        response = conn.getresponse()
        data = response.read()
        datautf8 = data.decode('utf-8')
        # Skip 'enkey' plus one separator character and take the next 32 characters;
        # they are passed as the enkey parameter of the AJAX request below.
        posin = datautf8.find('enkey')
        keyinfo = datautf8[posin+6:posin+38]

        opener = urllib.FancyURLopener()
        opener.addheaders = [
            ('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')
        ]

        hosturl = "http://pr.chinaz.com/ajaxsync.aspx?at=pr&enkey=%s&url=%s" % (keyinfo, url)
        info = opener.open(hosturl).read()
        cinfo = info.decode('utf-8').encode('gbk')
        # The first digit in the response is taken as the PR value.
        num_re = re.compile(r'[0-9]')
        pr_num = num_re.search(cinfo).group(0)
        print pr_num
        return pr_num


    f = file('pr.txt', 'w')

    # info.txt holds one URL per line; each result is written to pr.txt as "url,pr".
    for m in file('info.txt', 'r'):
        murl = m.strip()
    #    checkurl = get_url(murl)
        try:
            prnum = get_pr(murl)
        except Exception, e:
            # A failed lookup is recorded as -1, without pausing.
            prnum = -1
            content = "%s,%s\n" % (murl, prnum)
            f.write(content)
            continue
        else:
            content = "%s,%s\n" % (murl, prnum)
            f.write(content)
            # Pause between successful lookups.
            time.sleep(5)

    f.close()
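For anyone reading this on Python 3, here is a rough sketch of the two parts of the script that do not touch the network: pulling the host out of a URL, and the batch loop that records -1 when a lookup fails. It is not the original author's code. The names `extract_host` and `batch_query` are made up for illustration, the lookup is passed in as a function (the script above calls `get_pr` directly), and blank input lines are skipped, which the original does not do.

```python
import time
from urllib.parse import urlsplit


def extract_host(url):
    """Return the host of a URL that has a scheme, or None if there is none.

    Hypothetical stand-in for get_url() in the script above; it relies on
    urllib.parse instead of a hand-written regular expression.
    """
    return urlsplit(url).hostname


def batch_query(lines, lookup, delay=0.0):
    """Run lookup(url) for every non-empty line and return "url,pr" rows.

    lookup is any callable that returns a PR value or raises on failure.
    As in the script above, a failure is recorded as -1 and the pause
    only happens after a successful lookup.
    """
    rows = []
    for line in lines:
        url = line.strip()
        if not url:
            # Assumption: blank lines are skipped (the original does not skip them).
            continue
        try:
            pr = lookup(url)
        except Exception:
            pr = -1
        else:
            time.sleep(delay)
        rows.append("%s,%s" % (url, pr))
    return rows


if __name__ == "__main__":
    def fake_lookup(url):
        # Simulated lookup so the sketch runs without any network access.
        if "bad" in url:
            raise IOError("simulated failure")
        return 5

    for row in batch_query(["http://www.baidu.com\n", "http://bad.example\n"], fake_lookup):
        print(row)
```

Passing the lookup in as a parameter keeps the loop testable offline. To write a file like pr.txt, join the returned rows with newlines.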

Source: 老王python

Link: http://www.cnpythoner.com/post/190.html
