""" An extension to retry failed requests that are potentially caused by temporary problems such as a connection timeout or HTTP 500 error.
You can change the behaviour of this middleware by modifing the scraping settings: RETRY_TIMES - how many times to retry a failed page RETRY_HTTP_CODES - which HTTP response codes to retry
Failed pages are collected on the scraping process and rescheduled at the end, once the spider has finished crawling all regular (non failed) pages. Once there is no more failed pages to retry this middleware sends a signal (retry_complete), so other extensions could connect to that signal.
About HTTP errors to consider:
- You may want to remove 400 from RETRY_HTTP_CODES, if you stick to the HTTP protocol. It's included by default because it's a common code used to indicate server overload, which would be something we want to retry """
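The two settings named in the docstring can be tuned per project. A sketch of what a settings.py override might look like (the values here are illustrative examples, not Scrapy's defaults):

```python
# Illustrative settings.py overrides for the retry behaviour described
# above; values are examples only.
RETRY_TIMES = 2  # retry each failed page at most twice
# 400 removed, per the docstring's advice for strict HTTP semantics:
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]
```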
from twisted.internet.error import TimeoutError, DNSLookupError, \
    ConnectionRefusedError, ConnectionDone, ConnectError, \
    ConnectionLost, TCPTimedOutError

# IOError is raised by the HttpCompression middleware when trying to
# decompress an empty response
EXCEPTIONS_TO_RETRY = (TimeoutError, DNSLookupError,
                       ConnectionRefusedError, ConnectionDone, ConnectError,
                       ConnectionLost, TCPTimedOutError, IOError)
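The exception tuple above is used with a plain isinstance check: an exception belonging to any of these classes marks the request as retryable, while anything else (e.g. a parsing bug) propagates. A stdlib-only sketch of that decision — `RETRYABLE_EXCEPTIONS` and `is_retryable` are illustrative names, and the reduced tuple omits the Twisted network errors:

```python
# Reduced, stdlib-only stand-in for the middleware's exception tuple;
# the real one also contains Twisted connection/timeout errors.
RETRYABLE_EXCEPTIONS = (IOError, TimeoutError)

def is_retryable(exc):
    # Mirrors the middleware's check: retry only known transient errors.
    return isinstance(exc, RETRYABLE_EXCEPTIONS)

transient = is_retryable(IOError("empty response"))   # True: worth retrying
permanent = is_retryable(ValueError("bad parse"))     # False: a real bug
```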
        # Disable the middleware entirely unless retrying is enabled
        if not settings.getbool('RETRY_ENABLED'):
            raise NotConfigured
        if response.status in self.retry_http_codes \
                and 'dont_retry' not in request.meta:
            # log the retry attempt (message arguments elided in this excerpt)
            log.msg(..., spider=spider, level=log.DEBUG)
        else:
            # log that we gave up retrying (message arguments elided)
            log.msg(..., spider=spider, level=log.DEBUG)
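The `'dont_retry'` check in process_response gives callers a per-request opt-out. A minimal sketch of that two-part guard, using a bare dict as a stand-in for `request.meta` so it runs without Scrapy (`should_retry` is an illustrative name, not the middleware's API):

```python
# Stand-in for the retry decision: a response is rescheduled only if its
# status is in the configured retry codes AND the request did not opt out
# via the 'dont_retry' meta key.
retry_http_codes = {500, 502, 503, 504, 408}

def should_retry(status, meta):
    # Same two-part condition as in process_response above.
    return status in retry_http_codes and 'dont_retry' not in meta

retried = should_retry(500, {})                      # True
opted_out = should_retry(500, {'dont_retry': True})  # False
```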