Python urllib2 encoding

Published: 2024-09-21 14:10:29

❶ Adding headers with Python urllib2

import urllib2

def openUrl(url):
    url = 'http://' + url
    req = urllib2.Request(url)
    # Present the request as coming from a desktop Chrome browser
    req.add_header('User-agent', 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1707.0 Safari/537.36')
    response = urllib2.urlopen(req)
    the_page = response.read()
    print the_page
    print response.geturl()
    print response.info()
    print response.headers

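Give this a try. I usually use urllib2 to fetch web pages, and I have used this code myself. To imitate a browser, it is generally enough to set User-agent in the headers. Other headers are only needed when the site you are visiting has specific requirements (for example, 'Accept-Encoding' can only be 'gzip' or 'deflate', so it cannot be set the same way for every site).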
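For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only an illustration, not the answerer's code: the helper name build_request and the host www.example.com are assumptions made for the sketch, and nothing is sent over the network until urlopen is called.

```python
import urllib.request

# The browser string used in the answer above
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/33.0.1707.0 Safari/537.36')


def build_request(host):
    """Return a Request for http://<host> with a browser-like User-Agent.

    build_request is a hypothetical helper; it only builds the request.
    """
    req = urllib.request.Request('http://' + host)
    req.add_header('User-Agent', USER_AGENT)
    return req


req = build_request('www.example.com')  # example host, an assumption
# Request stores header names capitalized, so the key is 'User-agent'
print(req.get_header('User-agent'))
# Sending it would be: urllib.request.urlopen(req).read()
```

Checking the header on the Request object before sending is a cheap way to confirm that the disguise is actually in place.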

❷ Differences between Python's httplib, urllib and urllib2, and how to use them

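Overall, urllib2 is an enhancement of urllib, but urllib has some functions that urllib2 lacks. With urllib2 you can pass a Request object to urllib2.urlopen to modify the request headers. If you visit a site and want to change the User Agent (to disguise your program as a browser), you have to use urllib2. urllib supports setting ...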

❸ urllib2.HTTPError: HTTP Error 403: Forbidden when scraping page source with Python urllib2

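The headers of an HTTP request carry information such as the browser, the language in use, the requested host, and cookies.

Two of the most important are the User-Agent and the Cookie. User-Agent identifies the browser; if a request has no User-Agent, the site may decide it does not come from a person using a browser and treat it as a malicious attack.

For sites that require login, the request usually needs a Cookie to authenticate the user and gain permission to open certain pages.

With the Network tab of Firefox's developer tools, you can easily obtain the User-Agent and other header information.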

import urllib2

# url and postData are assumed to be defined elsewhere
headers = {"User-Agent": "Mozilla/5.0 Firefox/35.0",
           "Cookie": "BDUSS=AAAAAAAAAAAAAAAAAAAAAAAA"}
request = urllib2.Request(url, postData, headers=headers)
response = urllib2.urlopen(request)
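When a server still answers with an error status, it helps to see the status code rather than a traceback. Below is a rough Python 3 sketch using urllib.request and urllib.error. The function name fetch and its opener parameter are assumptions introduced for the illustration; opener exists only so the sending step can be swapped out.

```python
import urllib.error
import urllib.request


def fetch(url, headers, post_data=None, opener=urllib.request.urlopen):
    """Send a request with the given headers.

    Returns (status_code, body_bytes). HTTP errors such as 403 are
    reported through the status code instead of being raised.
    fetch and opener are hypothetical names used for this sketch.
    """
    request = urllib.request.Request(url, data=post_data, headers=headers)
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 403 Forbidden)
        return err.code, err.read()
    return response.getcode(), response.read()
```

A call might look like `status, body = fetch(url, {"User-Agent": "Mozilla/5.0 Firefox/35.0"})`; a status of 403 then suggests the headers still do not satisfy the site.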

❹ How to solve URL encoding problems in Python

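Recently I was scraping dynamic data generated by JavaScript, which meant imitating the JS requests to get the data I needed, and I ran into problems encoding and decoding URLs. Here is a summary of what I ran into; summarizing helps make what I learned clearer. Python provides convenient interfaces for URL encoding and decoding.

When the query part of a URL contains special characters (ones that are not URL reserved characters), they must be encoded. When the URL contains Chinese characters, extra handling is needed to encode them correctly. Everything below targets that case, though it also applies to URLs made only of ASCII characters.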

(1) URL encoding:

import urllib

# If the site's encoding is GBK, decode from GBK to unicode first,
# then encode from unicode to UTF-8.
url = 'wd=哈哈'
url = url.decode('gbk', 'replace')
print urllib.quote(url.encode('utf-8', 'replace'))

Result: ...3a%2f%2ftest.com%2fs%3fwd%3d%e5%93%88%e5%93%88

(2) URL decoding:

import urllib

encoded_url = 'test.com%2fs%3fwd%3d%e5%93%88%e5%93%88'
# The reverse: unquote, decode from UTF-8, then encode back to GBK
print urllib.unquote(encoded_url).decode('utf-8', 'replace').encode('gbk', 'replace')

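The arguments and results of these function calls are UTF-8 encoded, so when encoding a URL you need to convert the parameter string from its original encoding to UTF-8,

and when decoding a URL you need to convert the decoded result from UTF-8 back to the original encoding.

Depending on whether the site uses GBK or UTF-8, apply the matching encoding and do the URL conversion accordingly. In GBK, one Chinese character becomes %xx%xx, two groups; in UTF-8, one Chinese character becomes %xx%xx%xx, three groups.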

>>> import sys, urllib
>>> s = '杭州'
>>> urllib.quote(s.decode(sys.stdin.encoding).encode('gbk'))
'%BA%BC%D6%DD'
>>> urllib.quote(s.decode(sys.stdin.encoding).encode('utf8'))
'%E6%9D%AD%E5%B7%9E'
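In Python 3 the decode/encode step is not needed, because strings are already unicode and urllib.parse.quote takes an encoding argument. A minimal sketch of the same GBK versus UTF-8 comparison, assuming Python 3:

```python
from urllib.parse import quote

city = '杭州'
gbk_quoted = quote(city, encoding='gbk')     # two bytes per character
utf8_quoted = quote(city, encoding='utf-8')  # three bytes per character
print(gbk_quoted)   # %BA%BC%D6%DD
print(utf8_quoted)  # %E6%9D%AD%E5%B7%9E
```

The encoding passed to quote has to match what the target site expects, which is the same rule as in the Python 2 session.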


a = "墨西哥女孩被拐4年接客4萬次生的孩子成為人質-搜狐新聞"
print urllib.quote(urllib.quote(a))

After two rounds of encoding, the result looks like this: %25E5%25A2%25A8%25E8%25A5%25BF%25E5%2593%25A5%25E5%25A5%25B3%25E5%25AD%25A9%25E8%25A2%25AB%25E6%258B%25904%25E5%25B9...

Likewise, it takes two rounds of decoding to get the Chinese text back.
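The double encoding round trip can be sketched in Python 3 as follows. The sample text is just the first three characters of the headline, chosen to keep the output short.

```python
from urllib.parse import quote, unquote

text = '墨西哥'
once = quote(text)    # %E5%A2%A8%E8%A5%BF%E5%93%A5
twice = quote(once)   # every '%' from the first pass becomes '%25'
print(twice)          # %25E5%25A2%25A8%25E8%25A5%25BF%25E5%2593%25A5

# Undo both passes, in reverse order
restored = unquote(unquote(twice))
print(restored == text)  # True
```

A single unquote on the doubly encoded value only gets back to the singly encoded form, which is why a %25 prefix in a URL is a sign that one more decoding pass is needed.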

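Recently I wrote a small Python crawler to download some files automatically, but the URLs contain Chinese, and the Chinese appears to be GBK-encoded before being converted to URL form. For example, if I have the unicode string "历史上那些牛人们.pdf", converting it to a URL gives
t="%20%E5%8E%86%E5%8F%B2%E4%B8%8A%E9%82%A3%E4%BA%9B%E7%89%9B%E4%BA%BA%E4%BB%AC.pdf",
but the site gives s="%C0%FA%CA%B7%C9%CF%C4%C7%D0%A9%C5%A3%C8%CB%C3%C7.PDF"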

>>> print urllib.unquote("%C0%FA%CA%B7%C9%CF%C4%C7%D0%A9%C5%A3%C8%CB%C3%C7.PDF").decode('gbk').encode('utf-8')
历史上那些牛人们.PDF
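For comparison, a Python 3 sketch of the same GBK decoding. It relies on the encoding parameter of urllib.parse.unquote, and also shows the reverse direction, which is what a crawler would need in order to build the URL form this site uses.

```python
from urllib.parse import quote, unquote

s = "%C0%FA%CA%B7%C9%CF%C4%C7%D0%A9%C5%A3%C8%CB%C3%C7.PDF"

# Interpret the percent-escaped bytes as GBK rather than the default UTF-8
name = unquote(s, encoding='gbk')
print(name)

# Going the other way reproduces the site's form of the URL
print(quote(name, encoding='gbk') == s)
```

If the wrong encoding is passed to unquote, undecodable bytes are replaced by default instead of raising, so a result full of replacement characters usually means the encoding guess was wrong.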

❺ Usage of Python urllib2

By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. To control the proxy explicitly in your program, regardless of the environment variable, you can do the following:
import urllib2
enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'IP:8080'})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)
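One detail to note: urllib2.install_opener() sets urllib2's global opener. That is convenient for later calls but rules out finer-grained control, such as using two different proxy settings in one program. A better practice is to leave the global setting alone and call the opener's open method directly in place of the global urlopen.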
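The per-opener approach might look like this in Python 3 (urllib.request). The proxy address 127.0.0.1:8080 and the example URL are placeholders assumed for the sketch; no request is sent by this code.

```python
import urllib.request

# Hypothetical proxy address; replace with a real one
proxy_handler = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:8080'})
# An empty dict means "no proxy", ignoring http_proxy in the environment
no_proxy_handler = urllib.request.ProxyHandler({})

proxied_opener = urllib.request.build_opener(proxy_handler)
direct_opener = urllib.request.build_opener(no_proxy_handler)

# Each opener is used on its own; install_opener() is never called,
# so the global urlopen() keeps its default behaviour:
#   proxied_opener.open('http://www.example.com')
#   direct_opener.open('http://www.example.com')
```

Keeping the openers as separate objects is what allows two different proxy settings to coexist in one program.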

❻ Differences between Python's httplib, urllib and urllib2, and how to use them

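How they differ
urllib and urllib2
urllib and urllib2 are both modules for making URL requests, but urllib2 can accept an instance of the Request class to set the headers of a URL request, whereas urllib accepts only a URL.
This means that with urllib you cannot disguise your User Agent string, among other things.
urllib provides the urlencode method for generating GET query strings, and urllib2 does not. This is why urllib is often used together with urllib2.
At present, most HTTP requests are made through urllib2.

httplib
httplib implements the client side of the HTTP and HTTPS protocols. It is generally not used directly; Python's higher-level wrapper modules (urllib, urllib2) use its HTTP implementation.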

Basic urllib usage
urllib.urlopen(url[, data[, proxies]]) :

For detailed usage, see
Learning urllib

Basic urllib2 usage
The simplest form
import urllib2
response=urllib2.urlopen('http://www.douban.com')
html=response.read()

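What actually happens:
1. urllib2.Request() builds a request; the req it returns is a fully constructed request.
2. urllib2.urlopen() sends the request req and returns a file-like object, response, that holds everything the server sent back.
3. response.read() returns the HTML in the response, and response.info() returns additional information such as the response headers.
For example: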
#!/usr/bin/env python
import urllib2
req = urllib2.Request("http://www.douban.com")
response = urllib2.urlopen(req)
html = response.read()
print html

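Sometimes your program is correct but the server still refuses your request. Why? The problem lies in the header information of the request. Some servers are fussy and do not like being touched by programs. In that case you need to disguise your program as a browser when sending the request. How the request identifies itself is carried in the headers.
A common case: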

import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # goes into the request headers
values = {'name' : 'who','password':'123456'}
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

values is the POST data
GET method
Take a search engine as an example:
It serves queries through http://www..com/s?wd=XXX, so we need to urlencode the dictionary {'wd': 'xxx'}

#coding:utf-8
import urllib
import urllib2
url = 'http://www..com/s'
values = {'wd':'D_in'}
data = urllib.urlencode(values)
print data
url2 = url+'?'+data
response = urllib2.urlopen(url2)
the_page = response.read()
print the_page
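The query-string step on its own can be sketched in Python 3 with urllib.parse.urlencode. The helper name build_get_url is an assumption for illustration, and www.example.com stands in for the search host.

```python
from urllib.parse import urlencode


def build_get_url(base_url, params):
    """Append params to base_url as a URL-encoded query string."""
    return base_url + '?' + urlencode(params)


# www.example.com is a placeholder host
print(build_get_url('http://www.example.com/s', {'wd': 'D_in'}))
# http://www.example.com/s?wd=D_in
print(build_get_url('http://www.example.com/s', {'wd': '哈哈'}))
# http://www.example.com/s?wd=%E5%93%88%E5%93%88
```

urlencode encodes non-ASCII values as UTF-8 by default; for a GBK site it accepts an encoding argument, for example urlencode(params, encoding='gbk').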

POST method

import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # goes into the request headers
values = {'name' : 'who','password':'123456'}  # POST data
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)  # URL-encode the POST data
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
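A Python 3 version of the same POST request has one extra step, because the request body must be bytes. The sketch below only builds the request and inspects it; the URL and form values are the ones from the example, and nothing is sent.

```python
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'who', 'password': '123456'}
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

# urlencode returns str; Python 3 requires the request body as bytes
data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data, headers)

print(req.get_method())  # POST, because data is not None
print(req.data)          # b'name=who&password=123456'
# Sending it would be: urllib.request.urlopen(req).read()
```

Whether a request is sent as GET or POST is decided by the presence of data, which is why the GET example passes its parameters in the URL instead.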

Using urllib2 with cookies

#coding:utf-8
import urllib2, urllib
import cookielib

url = r'http://www.renren.com/ajaxLogin'

# Placeholder credentials; replace with real values
email = 'user@example.com'
password = 'secret'

# Create cj, a container for cookies
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Encode the data to be POSTed
data = urllib.urlencode({"email": email, "password": password})
r = opener.open(url, data)
print cj
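In Python 3, cookielib is named http.cookiejar, and the opener is built the same way. A minimal sketch that sets up the cookie-aware opener without contacting any server:

```python
import http.cookiejar
import urllib.request

# Container that collects cookies across requests made with this opener
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# After a call such as opener.open(url, data), cookies set by the server
# are stored in cj and sent back automatically on later requests.
for cookie in cj:
    print(cookie.name, cookie.value)
```

The jar is empty until a response sets a cookie, so the loop prints nothing here; after a successful login it would list the session cookies.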

Basic httplib usage
A simple example

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import httplib
import urllib

def sendhttp():
    data = urllib.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPConnection('bugs.python.org')
    conn.request('POST', '/', data, headers)
    httpres = conn.getresponse()
    print httpres.status
    print httpres.reason
    print httpres.read()

if __name__ == '__main__':
    sendhttp()

For detailed usage, see
The httplib module
In Python 3.x, the urllib and urllib2 libraries were merged into the urllib package.
First, the module imports change from
import urllib
import urllib2
to
import urllib.request

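Next, the urllib2 functions are now called as follows:
urllib2.urlopen() becomes urllib.request.urlopen()
urllib2.Request() becomes urllib.request.Request()

urllib2.URLError becomes urllib.error.URLError
And when you want to send a POST request with data, the encoding call in Python 2 is
urllib.urlencode(data)

while in Python 3 it becomes (with the result then encoded to bytes before it is passed as data)
urllib.parse.urlencode(data)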
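One common way to keep a script working on both versions is to try the Python 3 names first and fall back to the Python 2 modules. This is a sketch of that pattern, not something the original answer prescribes:

```python
try:
    # Python 3 locations
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode
    from urllib.error import URLError
except ImportError:
    # Python 2 locations
    from urllib2 import Request, urlopen, URLError
    from urllib import urlencode

# The imported names can now be used without a module prefix
print(urlencode({'a': 1}))  # a=1
```

The rest of the script then refers to Request, urlopen, urlencode and URLError directly, so only the import block knows which Python version is running.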

Example script:
In Python 2

import urllib
import urllib2
import json
from config import settings

def url_request(self, action, url, **extra_data):
    abs_url = "http://%s:%s/%s" % (settings.configs['Server'],
                                   settings.configs["ServerPort"],
                                   url)
    if action in ('get', 'GET'):
        print(abs_url, extra_data)
        try:
            req = urllib2.Request(abs_url)
            req_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            # print "-->server response:",callback
            return callback

        except urllib2.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):
        # print(abs_url,extra_data['params'])
        try:
            data_encode = urllib.urlencode(extra_data['params'])
            req = urllib2.Request(url=abs_url, data=data_encode)
            res_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = res_data.read()
            callback = json.loads(callback)
            print("\033[31;1m[%s]:[%s]\033[0m response:\n%s" % (action, abs_url, callback))
            return callback
        except Exception as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

In Python 3.x

import urllib.error
import urllib.parse
import urllib.request
import json
from config import settings

def url_request(self, action, url, **extra_data):
    # 'Server' matches the key defined in the settings shown below
    abs_url = 'http://%s:%s/%s/' % (settings.configs['Server'], settings.configs['ServerPort'], url)
    if action in ('get', 'Get'):  # GET request
        print(action, extra_data)
        try:
            req = urllib.request.Request(abs_url)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            return callback
        except urllib.error.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):  # POST data to the server
        try:
            # urlencode returns str; Request data must be bytes in Python 3
            data_encode = urllib.parse.urlencode(extra_data['params']).encode('utf-8')
            req = urllib.request.Request(url=abs_url, data=data_encode)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            callback = json.loads(callback.decode())
            return callback
        except urllib.error.URLError as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

The settings configuration is as follows:

configs = {
    'HostID': 2,
    "Server": "localhost",
    "ServerPort": 8000,
    "urls": {
        'get_configs': ['api/client/config', 'get'],  # fetch all the services to be monitored
        'service_report': ['api/client/service/report/', 'post'],
    },
    'RequestTimeout': 30,
    'ConfigUpdateInterval': 300,  # 5 mins as default
}
