python urllib2 and https proxy

Wednesday, August 28, 2013 » debian, proxy, python, urllib2

Preamble

Since urllib2 openers are global, I recently ran into some unexpected behavior I would like to explore.

urllib2, https, and proxies

urllib2 does not support https proxies natively. There is a nice ActiveState recipe for handling them, though.

There is one bug I ran into: the TimeoutSocket fake socket doesn't have a sendall method, so you get an AttributeError when using https.

I am saving this module as sslhandler.py

Making a normal non-proxy post

This is a post with generic data as the payload


import json
import urllib2

def post(url, trail, **kwargs):
    # posting keys with None values is not received well; better to strip them
    data = json.dumps(dict((k, v) for k, v in kwargs.iteritems() if v is not None))
    req = urllib2.Request(url + trail, data, {'Content-Type': 'application/json'})
    f = urllib2.urlopen(req)
    response = f.read()
    f.close()
    return json.loads(response)

if __name__ == '__main__':
    print post('http://localhost', '/info', data='value')

This is sufficient for making a post to a local REST API.

I am saving this file as basic.py.

Making a request through the proxy using our https opener


import urllib2
from sslhandler import ConnectHTTPHandler, ConnectHTTPSHandler

def request(url, trail):
    opener = urllib2.build_opener(ConnectHTTPHandler, ConnectHTTPSHandler)
    urllib2.install_opener(opener)
    req = urllib2.Request(url=url + trail)
    proxy = 'ourproxy.com:8000'
    req.set_proxy(proxy, 'https')
    return urllib2.urlopen(req).read()

if __name__ == '__main__':
    print request('https://remotehost', '/info')
This uses our urllib2 opener to connect through a proxy using the CONNECT method.

I am saving this file as proxied.py.

Now let's say there is a job that calls both

We have three files: sslhandler.py, basic.py, and proxied.py. We will make two scripts that call both of them and see what happens.

First script…

from basic import post
from proxied import request
print post('http://localhost', '/info', data='value')
print request('https://remotehost', '/info')

example output:


#echoed
{'data': 'value'}
#string response
hello!
Second script…

from basic import post
from proxied import request
print request('https://remotehost', '/info')
print post('http://localhost', '/info', data='value')

example output:


hello!
Traceback (most recent call last):
  File "sslhandler.py", line 93, in 
    f = urllib2.urlopen(req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", \
    line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", \
    line 394, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", \
    line 412, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", \
    line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", \
    line 1207, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "sslhandler.py", line 84, in do_open
    return urllib2.HTTPSHandler.do_open(self, ProxyHTTPSConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", \
    line 1174, in do_open
    raise URLError(err)
urllib2.URLError: 

Our local call failed and the remote one succeeded when we reversed them.

Why does urllib2 hate me?

The basic idea: when the plain post runs first, everything is fine, but if we make the proxy request first, both calls end up trying to use the proxy.

This is a product of the urllib2 opener being global. Even though basic.py and proxied.py are two separate modules, Python caches imports, so both are talking to the same urllib2 module instance, and install_opener mutates its state.
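That caching is easy to see with any module:

```python
import sys
import json            # first import executes the module
import json as again   # second import just returns the cached copy

# Both names point at the same module object, cached in sys.modules,
# so mutating its state in one place is visible everywhere.
assert again is json
assert sys.modules['json'] is json
```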

So once I install my proxy opener, I can't call urllib2 without going through it.
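A sketch of that global behavior, ported to Python 3 where this machinery lives in urllib.request, and using a made-up fake:// scheme handler in place of a real proxy handler so it runs without a network:

```python
import io
import urllib.request

class FakeHandler(urllib.request.BaseHandler):
    # A handler for a made-up fake:// scheme; the opener discovers
    # it via the <scheme>_open method-naming convention.
    def fake_open(self, req):
        return io.BytesIO(b'handled by FakeHandler')

opener = urllib.request.build_opener(FakeHandler)
urllib.request.install_opener(opener)  # global: every urlopen now uses it

# Any caller anywhere in the process now goes through our opener.
body = urllib.request.urlopen('fake://whatever').read()
```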

Solution 1

We can build our opener and use it without installing it.

We need to change proxied.py


import urllib2
from sslhandler import ConnectHTTPHandler, ConnectHTTPSHandler

def request(url, trail):
    # build the opener but do NOT install it globally
    opener = urllib2.build_opener(ConnectHTTPHandler, ConnectHTTPSHandler)
    req = urllib2.Request(url=url + trail)
    proxy = 'ourproxy.com:8000'
    req.set_proxy(proxy, 'https')
    return opener.open(req).read()

if __name__ == '__main__':
    print request('https://remotehost', '/info')
Now we are not changing the global urllib2 opener state, so all is well.

from basic import post
from proxied import request
print request('https://remotehost', '/info')
print post('http://localhost', '/info', data='value')

Works:


#string response
hello!
#echoed
{'data': 'value'}
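The same non-global pattern in Python 3 terms (again with a hypothetical fake:// handler standing in for the proxy handler): call opener.open directly and the module-level urlopen is left untouched.

```python
import io
import urllib.request
import urllib.error

class FakeHandler(urllib.request.BaseHandler):
    def fake_open(self, req):
        return io.BytesIO(b'via private opener')

# Build the opener but never install it: only callers holding this
# object route through FakeHandler.
opener = urllib.request.build_opener(FakeHandler)
body = opener.open('fake://host').read()

# The module-level urlopen is untouched and still rejects the scheme.
untouched = False
try:
    urllib.request.urlopen('fake://host')
except urllib.error.URLError:
    untouched = True
```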

Solution 2

We can reset the state of urllib2 by reloading the module. This is not the ideal
solution, and it is certainly heavy-handed, but it is demonstrative.


from sslhandler import ConnectHTTPHandler, ConnectHTTPSHandler

def request(url, trail):
    import urllib2
    opener = urllib2.build_opener(ConnectHTTPHandler, ConnectHTTPSHandler)
    urllib2.install_opener(opener)
    req = urllib2.Request(url=url + trail)
    proxy = 'ourproxy.com:8000'
    req.set_proxy(proxy, 'https')
    # reload resets urllib2's module-level state, including any installed opener
    urllib2 = reload(urllib2)
    return urllib2.urlopen(req).read()

if __name__ == '__main__':
    print request('https://remotehost', '/info')

Works:


#string response
hello!
#echoed
{'data': 'value'}
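The same heavy-handed trick in Python 3 (importlib.reload, plus the hypothetical fake:// handler from before): reloading urllib.request drops the installed opener, so later callers get a clean module.

```python
import importlib
import io
import urllib.request
import urllib.error

class FakeHandler(urllib.request.BaseHandler):
    def fake_open(self, req):
        return io.BytesIO(b'proxied')

urllib.request.install_opener(urllib.request.build_opener(FakeHandler))
first = urllib.request.urlopen('fake://host').read()  # goes through our opener

# Reloading re-executes the module, wiping the installed opener.
importlib.reload(urllib.request)

reset = False
try:
    urllib.request.urlopen('fake://host')  # back to the default opener
except urllib.error.URLError:
    reset = True
```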

Solution 3

Use the requests module.

Not a full example but you get the idea.


import requests

https_proxy = "10.10.1.11:1080"

proxies = {
    "https": https_proxy,
}

r = requests.get('https://remotehost/info', proxies=proxies)

Conclusion

I like the requests module.

Reference

SSL proxy
urllib2 docs