Coverage for scrapy/utils/url : 100%
""" This module contains general purpose URL functions not found in the standard library.
Some of the functions that used to be imported from this module have been moved to the w3lib.url module. Always import those from there instead. """
"""Return True if the url belongs to any of the given domains"""
else:
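The domain check described in the docstring above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names `belongs_to_any_domain` and `belongs_to_spider` are hypothetical, the sketch targets Python 3's `urllib.parse`, and treating subdomains as matches is an assumption. Only the `getattr(spider, 'allowed_domains', [])` lookup is taken from the listing.

```python
from urllib.parse import urlparse


def belongs_to_any_domain(url, domains):
    """Hypothetical helper: True if the URL's host equals, or is a
    subdomain of, any of the given domains (assumed matching rule)."""
    host = urlparse(url).hostname  # lowercased, port stripped, None if absent
    if not host:
        return False
    for domain in domains:
        domain = domain.lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def belongs_to_spider(url, spider):
    """Hypothetical helper: check the URL against the spider's
    allowed_domains attribute, defaulting to an empty list."""
    domains = list(getattr(spider, "allowed_domains", []))
    return belongs_to_any_domain(url, domains)
```

Requiring the leading dot in the suffix test keeps a host such as `notexample.com` from matching `example.com`.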
"""Return True if the url belongs to the given spider""" getattr(spider, 'allowed_domains', []))
encoding=None):
"""Canonicalize the given url by applying the following procedures:

- sort query arguments, first by key, then by value
- percent-encode paths and query arguments. Non-ASCII characters are
  percent-encoded using UTF-8 (RFC-3986)
- normalize all spaces (in query arguments) to '+' (plus symbol)
- normalize percent encodings case (%2f -> %2F)
- remove query arguments with blank values (unless keep_blank_values is True)
- remove fragments (unless keep_fragments is True)

The url passed can be a str or unicode, while the url returned is always a str.

For examples, see the tests in scrapy.tests.test_utils_url
"""
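The procedures listed in the docstring above can be approximated with Python 3's standard library. The sketch below is an assumption-laden illustration rather than the module's implementation: `canonicalize_sketch` is a hypothetical name, the parameter defaults are chosen for the example, and the `encoding` argument is omitted because `urllib.parse` already encodes non-ASCII characters as UTF-8.

```python
import re
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Matches one percent escape such as %2f or %C3.
_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def canonicalize_sketch(url, keep_blank_values=True, keep_fragments=False):
    """Hypothetical sketch of the canonicalization steps."""
    parts = urlsplit(url)

    # Decode the query into (key, value) pairs, optionally dropping blank
    # values, then sort by key and by value.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=keep_blank_values))

    # urlencode re-encodes with quote_plus: spaces become '+', and percent
    # escapes come out in upper case.
    query = urlencode(pairs)

    # Percent-encode unsafe path characters (UTF-8 for non-ASCII), keep
    # existing escapes, then normalize their case.
    path = quote(parts.path, safe="/%")
    path = _ESCAPE.sub(lambda m: m.group(0).upper(), path)

    fragment = parts.fragment if keep_fragments else ""
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
```

Existing escapes in the path are left encoded and only upper-cased, so an escaped slash (`%2f`) is not turned into a path separator.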
"""Return urlparsed url from the given argument (which could be an already parsed url) """ urlparse.urlparse(unicode_to_str(url, encoding))
""" Return the crawleable url according to: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
TODO: add support for urls with query arguments

>>> escape_ajax("www.example.com/ajax.html#!key=value")
'www.example.com/ajax.html?_escaped_fragment_=key=value'
"""
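The doctest above shows the transformation: everything after the `#!` marker moves into an `_escaped_fragment_` query argument. A minimal sketch of that behavior follows. The function name `escape_ajax_sketch` is hypothetical, and, like the TODO notes, it does not handle URLs that already carry query arguments.

```python
def escape_ajax_sketch(url):
    """Hypothetical sketch: rewrite 'page#!state' as
    'page?_escaped_fragment_=state'; return other URLs unchanged."""
    base, marker, fragment = url.partition("#!")
    if not marker:
        # No AJAX-crawlable fragment present.
        return url
    return base + "?_escaped_fragment_=" + fragment
```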