Today we'll spend some time looking at three different ways to make Python submit a web form. In this case, we will run a web search on duckduckgo.com for the term "python" and save the result as an HTML file. We will use Python's included urllib modules and two third-party packages: requests and mechanize. We have three small scripts to cover, so let's get cracking!
Submitting a web form with urllib
We will start with urllib and urllib2 since they are included in Python's standard library. We'll also import the webbrowser module to open the search results for viewing. Here's the code:
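import urllib
import urllib2
import webbrowser

url = "http://duckduckgo.com/html"
# Encode the form field as "q=Python"
data = urllib.urlencode({'q': 'Python'})
# Passing data makes urlopen send a POST request
results = urllib2.urlopen(url, data)
with open("results.html", "w") as f:
    f.write(results.read())

# Open the saved results in the default browser
webbrowser.open("results.html")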
The first thing you have to do when you want to submit a web form is figure out what the form's fields are called and what URL you will be posting to. If you go to duckduckgo's website and view the source, you'll notice that the form's action points to a relative link, "/html". So our url is "http://duckduckgo.com/html". The input field is named "q", so to pass duckduckgo a search term, we have to pass it to the "q" field. This is where the urllib.urlencode line comes in. It encodes our search term as form data ("q=Python"). Because we pass that data to urllib2.urlopen, the request is sent as a POST rather than a GET. The results are read and written to disk. Finally, we open our saved results using the webbrowser module. Now let's find out how this process differs when using the requests package.
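Before moving on, the form-discovery steps described above can be sketched with the standard library alone. This is a minimal sketch, assuming Python 3, where urllib and urllib2 were reorganized into urllib.parse and urllib.request. The SAMPLE_HTML string is a made-up stand-in for the page source, not DuckDuckGo's actual markup, and FormFinder and build_search_request are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Made-up stand-in for the page source; the real markup may differ.
SAMPLE_HTML = """
<form name="x" action="/html" method="post">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
"""


class FormFinder(HTMLParser):
    """Collect the first form's action and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
            self._in_form = True
        elif tag == "input" and self._in_form and "name" in attrs:
            self.fields.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_search_request(page_url, html, term):
    """Build (but do not send) a POST request for the first form in html."""
    finder = FormFinder()
    finder.feed(html)
    # Resolve the relative action ("/html") against the page URL.
    target = urljoin(page_url, finder.action)
    # Use the first named field; POST data must be bytes in Python 3.
    data = urlencode({finder.fields[0]: term}).encode("ascii")
    return Request(target, data=data)


req = build_search_request("http://duckduckgo.com/", SAMPLE_HTML, "Python")
print(req.full_url)      # http://duckduckgo.com/html
print(req.data)          # b'q=Python'
print(req.get_method())  # POST
```

Passing req to urllib.request.urlopen would actually submit the form. That call is left out here so the sketch runs without network access.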
Submitting a web form with requests
The requests package does form submissions a little bit more elegantly. Let's take a look:
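import requests

url = "http://duckduckgo.com/html"
payload = {'q': 'python'}
# Send the dictionary as form-encoded POST data
r = requests.post(url, data=payload)
# r.content is raw bytes, so write the file in binary mode
with open("requests_results.html", "wb") as f:
    f.write(r.content)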
With requests, you just need to create a dictionary with the field name as the key and the search term as the value. Then you use requests.post to do the search. Finally you take the resulting Response object, "r", and save its content property to disk. We skipped the webbrowser part in this example (and the next) for brevity. Now we should be ready to see how mechanize does its thing.
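To see what requests does with the dictionary, you can prepare the request without sending it. This is a minimal sketch, assuming the requests package is installed. It uses requests.Request and its prepare() method, and nothing goes over the network.

```python
import requests

url = "http://duckduckgo.com/html"
payload = {"q": "python"}

# Build the POST request but do not send it.
prepared = requests.Request("POST", url, data=payload).prepare()

print(prepared.method)                   # POST
print(prepared.body)                     # q=python
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
```

Sending it would be a matter of calling requests.Session().send(prepared), which is roughly what requests.post does for you in one step.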
Submitting a web form with mechanize
The mechanize module has lots of fun features for browsing the internet with Python. Sadly, it doesn't support JavaScript. Anyway, let's get on with the show!
import mechanize

url = "http://duckduckgo.com/html"
br = mechanize.Browser()
br.set_handle_robots(False)  # ignore robots.txt
br.open(url)
br.select_form(name="x")  # select the form named "x"
br["q"] = "python"  # fill in the "q" field
res = br.submit()
content = res.read()
with open("mechanize_results.html", "wb") as f:
    f.write(content)
As you can see, mechanize is a little more verbose than the other two methods were. We also need to tell it to ignore the robots.txt directive or it will fail. Of course, if you want to be a good netizen, then you shouldn't ignore it. Anyway, to start off, you need a Browser object. Then you open the url, select the form (in this case, "x") and set the "q" field to the search term using dictionary-style access on the browser object. Note that each method supplies the form data a little differently. Next you submit the query and read the result. Finally you save the result to disk and you're done!
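On the robots.txt point: a script that wants to respect the file can check it before fetching anything. Here is a minimal sketch, assuming Python 3's urllib.robotparser from the standard library. The rules in SAMPLE_ROBOTS are invented for illustration and are not DuckDuckGo's actual robots.txt, and "my-script" and the "/about" path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration; not the site's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /html
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no download happens here.
parser.parse(SAMPLE_ROBOTS.splitlines())

blocked = parser.can_fetch("my-script", "http://duckduckgo.com/html")
allowed = parser.can_fetch("my-script", "http://duckduckgo.com/about")
print(blocked)  # False
print(allowed)  # True
```

Against a live site you would call parser.set_url() with the robots.txt address and then parser.read(), which downloads the file, instead of parse().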
Wrapping Up
Of the three, requests was probably the simplest, with urllib being a close second. Mechanize is made for doing a lot more than the other two, though. It's made for screen scraping and website testing, so it's no surprise it's a little more verbose. You can also do form submission with selenium, but you can read about that in this blog's archives. I hope you found this article interesting and perhaps inspiring. See you next time!