Hi,
I need to check whether a page is being redirected or not without actually downloading the content. I just need the final URL. What's the best way of doing this in Python? Thanks!
From stackoverflow
-
When you open the URL with urllib2, and you're redirected, you get a status 30x for redirection. Check the info to see the location to which you're redirected. You don't need to read the page to read the info() that's part of the response.

Paul Tomblin : Does urllib2 give you a way to issue a HEAD command? That's usually the way to get just the information you need without the network overhead of transmitting the page contents.
S.Lott : You don't have to read the page. Your response includes a socket which you can simply close.
Adam Rosenfield : Yes, but you're still incurring network traffic. The point of HEAD is to not incur the network traffic.
-
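A minimal sketch of the urllib2 approach from the first answer, written for Python 3 on the assumption that a current interpreter is in use (there, urllib2 became urllib.request). The default opener follows redirects on its own, so the final address is read from geturl() on the response, and the response is closed without read() as S.Lott suggests. The function name final_url and its opener parameter are hypothetical, added only so the logic can be exercised without a network. This still sends a GET, so the server may start transmitting the body before the socket is closed.

```python
import urllib.request


def final_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the URL reached after the opener has followed any redirects."""
    response = opener(url, timeout=timeout)
    try:
        # geturl() reports the URL of the response actually retrieved,
        # which differs from `url` when a redirect was followed.
        return response.geturl()
    finally:
        # Close without calling read(), so the body is never consumed here.
        response.close()
```

Calling final_url('http://www.example.com/') and comparing the result with the original URL tells you whether a redirect happened.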
If you specifically want to avoid downloading the content, you'll need to use the HEAD request method. I believe the urllib and urllib2 libraries do not support HEAD requests, so you'll have to use the lower-level httplib library:

    import httplib

    h = httplib.HTTPConnection('www.example.com')
    h.request('HEAD', '/')
    response = h.getresponse()

    # Check for a 30x status code
    if 300 <= response.status < 400:
        # It's a redirect
        location = response.getheader('Location')

bgoncalves : Great. I was trying to force urllib/urllib2 to do this without much luck, and httplib's documentation isn't the best. Thanks!
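Since the question asks for the final URL, the HEAD check above can be repeated until a non-redirect response comes back. The following is a rough sketch of that idea rather than a definitive implementation. It assumes Python 3, where httplib is named http.client. The names head and resolve_redirects, and the fetch and max_hops parameters, are hypothetical. It also assumes every server in the chain answers HEAD the same way it would answer GET, which is not guaranteed.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def resolve_redirects(url, fetch=head, max_hops=10):
    """Follow 30x responses one HEAD at a time and return the final URL."""
    for _ in range(max_hops):
        status, location = fetch(url)
        if not (300 <= status < 400 and location):
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

For example, resolve_redirects('http://www.example.com/') returns the input URL unchanged when the first response is not a redirect. The max_hops limit is there so a redirect loop raises an error instead of running forever.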