I am trying to write a Python program that reads a whole website.
Searching Google, I only found explanations of how to read a single webpage with urllib2.
Here is my code:
import urllib2

# Build an opener that sends a browser-like User-agent header
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
infile = opener.open('http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B2%D0%BE%D0%B4_%D0%B8%D0%BC%D0%B5%D0%BD%D0%B8_%D0%9C%D0%B0%D0%BB%D1%8B%D1%88%D0%B5%D0%B2%D0%B0')
page = infile.read()  # raw HTML of this single page
Now, if I want to read the whole of Wikipedia, for example, how should I proceed?
I mean not only http://en.wikipedia.org itself, but every webpage whose address starts with http://en.wikipedia.org/blablabla....
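To make the goal concrete, here is a rough sketch of the kind of crawl described above: start from one page, collect its links, and keep following only those whose address starts with a given prefix. It assumes Python 3, where urllib2 became urllib.request. The names LinkParser, fetch, crawl, fetch_page and max_pages are made up for illustration, and this is only one possible approach, not a definitive implementation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import build_opener


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page, same idea as the urllib2 snippet above."""
    opener = build_opener()
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    with opener.open(url) as infile:
        # Assumption: pages are UTF-8; undecodable bytes are replaced.
        return infile.read().decode("utf-8", errors="replace")


def crawl(start_url, prefix, fetch_page=fetch, max_pages=50):
    """Breadth-first walk over pages whose URL starts with prefix."""
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])  # URLs waiting to be downloaded
    pages = {}                  # url -> HTML text
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part
            link = urldefrag(urljoin(url, href))[0]
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A call such as crawl('http://en.wikipedia.org/', 'http://en.wikipedia.org/') would then return a dictionary mapping each visited address to its HTML. The max_pages cap is a safety limit so the loop stops on a very large site; fetch_page can be swapped for another function, for instance to add a delay between requests.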
Thanks a lot to all for your attention!