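Could someone take a look at the code below? It raises an error at the very end, and I can't figure out why:
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x2f62af8 select default pending=0 ref=0>)
Is there anything else wrong with this code? Which parts could be optimized? Any advice is appreciated.
I'm a beginner, so the code is probably pretty crude; please set me straight.
Thanks. (And to the folks at jiandan: please go easy on me... I'm only doing this to learn.)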
# -*- coding:utf-8 -*-
import gevent
import gevent.queue
import requests
from lxml import etree

url_queue = gevent.queue.JoinableQueue(100)
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
}


def spider():
    while True:
        url = url_queue.get()
        if url is None:
            url_queue.task_done()
            break
        try:
            html = requests.get(url, headers=headers, timeout=1).content
            selector = etree.HTML(html)
            title = selector.xpath('//a[@href="%s"]/text()' % url)[0]
            print(title)
        except Exception as e:
            print(e)


if __name__ == '__main__':
    urls = [
        'http://jandan.net/2016/09/22/migrants-choice.html',
        'http://jandan.net/2016/09/22/farting-really-good.html',
        'http://jandan.net/2016/09/22/special-cleaner.html',
        'http://jandan.net/2016/09/22/hand-mobile-phone.html',
        'http://jandan.net/2016/09/22/beer-you-order.html',
        'http://jandan.net/2016/09/22/pigeons-can-read.html',
        'http://jandan.net/2016/09/22/snake-inter-species.html',
        'http://jandan.net/2016/09/21/north-koreas-internet-2.html',
        'http://jandan.net/2016/09/21/mona-lisa-overrated.html',
        'http://jandan.net/2016/09/21/antikythera-ancient-skeleton.html',
        'http://jandan.net/2016/09/21/mentality-fish.html',
        'http://jandan.net/2016/09/21/things-smuggled-space.html',
        'http://jandan.net/2016/09/21/water-bear.html',
        'http://jandan.net/2016/09/21/oldest-fishing-hooks.html',
        'http://jandan.net/2016/09/21/b-21-raider.html',
        'http://jandan.net/2016/09/21/paper-cuts-hurt.html',
        'http://jandan.net/2016/09/21/cat-ecological-disaster.html',
        'http://jandan.net/2016/09/21/pluto-owns-heart.html',
        'http://jandan.net/2016/09/21/a-teenage-girl.html',
        'http://jandan.net/2016/09/21/light-drive-men.html',
        'http://jandan.net/2016/09/21/ai-analyses-mammograms.html',
        'http://jandan.net/2016/09/21/burnt-cheese.html',
        'http://jandan.net/2016/09/21/black-hole-spaghetti.html',
        'http://jandan.net/2016/09/21/womens-pubic-hair.html'
    ]
    for url in urls:
        url_queue.put(url)
    threads = []
    for i in range(1, 3):
        threads.append(gevent.spawn(spider))
    gevent.joinall(threads)
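
For reference, the likely cause of the LoopExit: the script never puts a None into the queue, so once the URLs run out both greenlets block in url_queue.get() while the main greenlet waits in joinall, and the hub has nothing left to run. The following is only a minimal sketch of one possible fix, not a definitive version. It puts one None sentinel per worker and calls task_done() for every item. The names worker, crawl, fetch, and num_workers are hypothetical and not from the original script; fetch stands in for the requests/lxml logic.

```python
import gevent
from gevent.queue import JoinableQueue


def worker(url_queue, fetch, results):
    """Consume URLs until a None sentinel arrives (hypothetical helper)."""
    while True:
        url = url_queue.get()
        try:
            if url is None:
                break  # sentinel: no more work for this worker
            results.append(fetch(url))
        except Exception as exc:
            print("failed: %s (%s)" % (url, exc))
        finally:
            # Runs for URLs and for the sentinel alike,
            # so the queue's unfinished-task count stays balanced.
            url_queue.task_done()


def crawl(urls, fetch, num_workers=2):
    """Run fetch over urls with num_workers greenlets (assumed interface)."""
    url_queue = JoinableQueue()
    results = []
    workers = [
        gevent.spawn(worker, url_queue, fetch, results)
        for _ in range(num_workers)
    ]
    for url in urls:
        url_queue.put(url)
    # One sentinel per worker so every greenlet leaves its loop.
    for _ in range(num_workers):
        url_queue.put(None)
    gevent.joinall(workers)
    return results
```

Separately, requests uses ordinary blocking sockets, so the greenlets would probably only overlap their downloads if `from gevent import monkey; monkey.patch_all()` runs at the very top of the script, before requests is imported.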