@pkmishra
Last active November 8, 2020 05:26

Revisions

  1. pkmishra revised this gist Mar 19, 2013. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions settings.py
    @@ -6,8 +6,8 @@
         'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.55.3 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10'
     ]
     DOWNLOADER_MIDDLEWARES = {
    -    'winestores.middlewares.RandomUserAgentMiddleware': 400,
    -    'winestores.middlewares.ProxyMiddleware': 410,
    +    'myproject.middlewares.RandomUserAgentMiddleware': 400,
    +    'myproject.middlewares.ProxyMiddleware': 410,
         'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
         # Disable compression middleware, so the actual HTML pages are cached
     }
  2. pkmishra revised this gist Mar 19, 2013. 1 changed file with 13 additions and 0 deletions.
    13 changes: 13 additions & 0 deletions settings.py
    @@ -0,0 +1,13 @@
    +# More comprehensive list can be found at
    +# http://techpatterns.com/forums/about304.html
    +USER_AGENT_LIST = [
    +    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7',
    +    'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0) Gecko/16.0 Firefox/16.0',
    +    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.55.3 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10'
    +]
    +DOWNLOADER_MIDDLEWARES = {
    +    'winestores.middlewares.RandomUserAgentMiddleware': 400,
    +    'winestores.middlewares.ProxyMiddleware': 410,
    +    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    +    # Disable compression middleware, so the actual HTML pages are cached
    +}
  3. pkmishra revised this gist Mar 19, 2013. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion middlewares.py
    @@ -2,7 +2,6 @@
     import random
     from scrapy.conf import settings
     class RandomUserAgentMiddleware(object):
    -
         def process_request(self, request, spider):
             ua = random.choice(settings.get('USER_AGENT_LIST'))
             if ua:
  4. pkmishra created this gist Mar 19, 2013.
    13 changes: 13 additions & 0 deletions middlewares.py
    @@ -0,0 +1,13 @@
    +import os
    +import random
    +from scrapy.conf import settings
    +class RandomUserAgentMiddleware(object):
    +
    +    def process_request(self, request, spider):
    +        ua = random.choice(settings.get('USER_AGENT_LIST'))
    +        if ua:
    +            request.headers.setdefault('User-Agent', ua)
    +
    +class ProxyMiddleware(object):
    +    def process_request(self, request, spider):
    +        request.meta['proxy'] = settings.get('HTTP_PROXY')
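
Note on newer Scrapy releases: the middlewares in this gist read settings through the module-level `scrapy.conf.settings`, which later Scrapy versions dropped in favor of passing settings in through a `from_crawler` classmethod. The following is a minimal sketch of the same two middlewares in that style, not the gist author's code. `USER_AGENT_LIST` and `HTTP_PROXY` are the custom setting names used in the gist. `FakeSettings` is a hypothetical stand-in, included only so the sketch can be exercised without Scrapy installed. The sketch also skips the header or proxy when the corresponding setting is empty, which the original does not do.

```python
import random
from types import SimpleNamespace


class RandomUserAgentMiddleware:
    """Set a randomly chosen User-Agent on each request that lacks one."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents or [])

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler() when building the middleware;
        # crawler.settings takes the place of scrapy.conf.settings.
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        # Unlike the original, do nothing when the list is empty or unset
        # (random.choice would raise on an empty sequence).
        if self.user_agents:
            request.headers.setdefault('User-Agent', random.choice(self.user_agents))
        return None  # let the request continue through the chain


class ProxyMiddleware:
    """Route every request through the proxy named by HTTP_PROXY, if any."""

    def __init__(self, proxy):
        self.proxy = proxy

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.get('HTTP_PROXY'))

    def process_request(self, request, spider):
        # Only set the proxy when one is configured (assumption: an unset
        # HTTP_PROXY should mean "no proxy").
        if self.proxy:
            request.meta['proxy'] = self.proxy
        return None


class FakeSettings(dict):
    """Hypothetical stand-in for Scrapy's settings object (illustration only)."""

    def getlist(self, name, default=None):
        return list(self.get(name, default or []))


def make_fake_crawler(**settings):
    """Hypothetical helper: an object exposing only a .settings attribute."""
    return SimpleNamespace(settings=FakeSettings(**settings))
```

In a real project the two classes would still be registered in `DOWNLOADER_MIDDLEWARES` as in settings.py above. In newer Scrapy releases the built-in middleware being disabled lives at `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware` rather than under `scrapy.contrib`.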