Python urllib module

Published: 2023-11-19 12:37:52

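⑴ Where to download the Python urllib2 module

urllib2 ships with Python 2 as part of the standard library, so there is nothing to download.

In Python 3.x, urllib2 was split into urllib.request and urllib.error.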

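⑵ Why does the urllib package in my Python 3.6 download have no urlopen method?

The urllib module was reorganized in Python 3.x, so calls written as urllib.urlopen must become urllib.request.urlopen.
For example, write: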
import urllib.request
web = urllib.request.urlopen('http://www..com')
f = web.read()
print(f)

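⑶ How to use urllib.parse in Python 2.7

Python 3.3.0 has now been released. Compared with Python 3.0, 3.2 did not change much, but most tutorials online are still based on 2.x, which makes things hard for beginners who want to start with Python 3. As one such beginner, I recently wanted to learn how to use the urllib module. The simplest example I found online fetches the HTML of the Google home page and prints it to the console: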

import urllib
print urllib.urlopen('http://www.google.com').read()

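First, anyone who has used Python 3.0 knows that print has become a function and needs parentheses. That is not the main problem, though. The console reports an error saying the urllib module has no urlopen attribute. Strange; could the online tutorials be wrong? I then tried help(urllib) and found that it lists no functions at all, only PACKAGE CONTENTS with five names.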

import urllib
help(urllib)

In 3.0, urllib2, urlparse, and robotparser were merged into urllib, which was restructured as a package containing five submodules; these are the five names that help() shows.

For future reference, the names in each submodule are listed below:

urllib.error: ContentTooShortError; HTTPError; URLError

urllib.parse: parse_qs; parse_qsl; quote; quote_from_bytes; quote_plus; unquote; unquote_plus; unquote_to_bytes; urldefrag; urlencode; urljoin; urlparse; urlsplit; urlunparse; urlunsplit

urllib.request: AbstractBasicAuthHandler; AbstractDigestAuthHandler; BaseHandler; CacheFTPHandler; FTPHandler; FancyURLopener; FileHandler; HTTPBasicAuthHandler; HTTPCookieProcessor; HTTPDefaultErrorHandler; HTTPDigestAuthHandler; HTTPErrorProcessor; HTTPHandler; HTTPPasswordMgr; HTTPRedirectHandler; HTTPSHandler; OpenerDirector; ProxyBasicAuthHandler; ProxyDigestAuthHandler; ProxyHandler; Request; URLopener; UnknownHandler; build_opener; getproxies; install_opener; pathname2url; url2pathname; urlcleanup; urlopen; urlretrieve

urllib.response: addbase; addclosehook; addinfo; addinfourl

urllib.robotparser: RobotFileParser
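
As a quick check of the five-submodule layout described above, the following sketch lists the submodules of the urllib package and looks up one name from each. It assumes a standard CPython 3 installation; the `samples` dictionary and variable names are illustrative only.

```python
import importlib
import pkgutil
import urllib

# Names of the submodules that live inside the urllib package.
submodules = sorted(info.name for info in pkgutil.iter_modules(urllib.__path__))
print(submodules)

# One representative name per submodule, taken from the lists above.
samples = {
    "error": "URLError",
    "parse": "urlencode",
    "request": "urlopen",
    "response": "addinfourl",
    "robotparser": "RobotFileParser",
}
for sub, name in samples.items():
    module = importlib.import_module("urllib." + sub)
    # Report whether the name really is defined in that submodule.
    print("urllib.%s.%s:" % (sub, name), hasattr(module, name))
```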

---------------------------------------------------------------------------------------------------------

Example of opening an HTML document under 2.x:

import urllib
webURL = "http://www.python.org"
localURL = "index.html"
# Open a remote page by URL
u = urllib.urlopen(webURL)
buffer = u.read()
print u.info()
print "Read from %s: %d bytes." % (u.geturl(), len(buffer))
# Open a local page by URL
u = urllib.urlopen(localURL)
buffer = u.read()
print u.info()
print "Read from %s: %d bytes." % (u.geturl(), len(buffer))

Output:

Date: Fri, 26 Jun 2009 10:22:11 GMT
Server: Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.3 Python/2.5.2
Last-Modified: Thu, 25 Jun 2009 09:44:54 GMT
ETag: "105800d-46e7-46d29136f7180"
Accept-Ranges: bytes
Content-Length: 18151
Connection: close
Content-Type: text/html
Read from http://www.python.org: 18151 bytes.
Content-Type: text/html
Content-Length: 865
Last-modified: Fri, 26 Jun 2009 10:16:10 GMT
Read from index.html: 865 bytes.

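To open an HTML document with urlopen(url [, data]) from the urllib module, you must supply the document's URL, including the file name. In Python 2, urlopen can open not only files on a remote web server but also a local file, and it returns a file-like object from which the HTML can be read.

Once the document is open, read([nbytes]), readline(), and readlines() work as they do on a regular file. To read the entire document, call read(), which returns the content as a string.

After opening an address, geturl() returns the real URL of the page retrieved. This is useful because urlopen (or the opener object in use) may follow a redirect, so the URL of the page retrieved can differ from the URL requested.

Another commonly used function on the file-like object returned by urlopen is info(), which returns metadata about the URL, such as content length and content type. The example above exercises each of these functions.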
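
For readers on Python 3, the same read(), geturl(), and info() calls can be sketched with urllib.request. This is only an illustration: the helper name `fetch` and the temporary index.html file are assumptions made so the example runs without network access, and in Python 3 a local file has to be given as a file:// URL.

```python
import pathlib
import tempfile
import urllib.request


def fetch(url):
    """Open url and return (final_url, headers, body_bytes)."""
    with urllib.request.urlopen(url) as response:
        body = response.read()  # bytes in Python 3, not str
        return response.geturl(), response.info(), body


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical local page standing in for index.html.
    page = pathlib.Path(tmp_dir) / "index.html"
    page.write_bytes(b"<html><body>hello</body></html>")
    # Convert the local path to a file:// URL before opening it.
    final_url, headers, body = fetch(page.as_uri())

print(headers)
print("Read from %s: %d bytes." % (final_url, len(body)))
```

The same `fetch` helper accepts an http:// or https:// URL; in that case geturl() reports the address reached after any redirects.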

--------------------------------------------------------------------------------------------------------------------------

Example of using urlparse under 2.x:

import urlparse
URLscheme = "http"
URLlocation = "www.python.org"
URLpath = "lib/module-urlparse.html"
modList = ("urllib", "urllib2",
           "httplib", "cgilib")
# Parse a URL into its components
print "Result of parsing the address-bar URL of a Google search for python"
parsedTuple = urlparse.urlparse(
    "http://www.google.com/search?hl=en&q=python&btnG=Google+Search")
print parsedTuple
# Reassemble the components into a URL
print "Unparsing the URL of a Python documentation page"
unparsedURL = urlparse.urlunparse(
    (URLscheme, URLlocation, URLpath, '', '', ''))
print " " + unparsedURL
# Join the path with new file names to form new URLs
print "Adding more Python documentation page URLs by joining"
for mod in modList:
    newURL = urlparse.urljoin(unparsedURL,
                              "module-%s.html" % (mod))
    print " " + newURL
# Join a subpath onto the path to form a new URL
print "Generating a Python documentation page URL by joining a subpath"
newURL = urlparse.urljoin(unparsedURL,
                          "module-urllib2/request-objects.html")
print " " + newURL

Output:

Result of parsing the address-bar URL of a Google search for python
('http', 'www.google.com', '/search', '',
'hl=en&q=python&btnG=Google+Search', '')
Unparsing the URL of a Python documentation page
http://www.python.org/lib/module-urlparse.html
Adding more Python documentation page URLs by joining
http://www.python.org/lib/module-urllib.html
http://www.python.org/lib/module-urllib2.html
http://www.python.org/lib/module-httplib.html
http://www.python.org/lib/module-cgilib.html
Generating a Python documentation page URL by joining a subpath
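
In Python 3 the same functions live in urllib.parse. The sketch below repeats the parse, unparse, and join steps with the URLs used in the 2.x example; the variable names are illustrative.

```python
from urllib.parse import urljoin, urlparse, urlunparse

# Parse a URL into its six components.
parsed = urlparse("http://www.google.com/search?hl=en&q=python&btnG=Google+Search")
print(parsed.scheme, parsed.netloc, parsed.path, parsed.query)

# Reassemble components into a URL.
doc_url = urlunparse(("http", "www.python.org", "lib/module-urlparse.html", "", "", ""))
print(doc_url)

# Join sibling file names onto the directory part of doc_url.
for mod in ("urllib", "urllib2", "httplib", "cgilib"):
    print(urljoin(doc_url, "module-%s.html" % mod))
```

Because urljoin replaces only the last path segment, each joined URL stays under /lib/.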

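⑷ Making HTTPS requests with urllib in Python 3

I am just starting to learn the basics of web scraping in Python. I use Python 3.6.4 and am following a Python crawler tutorial for beginners.

Python 3.6 no longer has a urllib2 library, so there is no need to worry about the differences between urllib and urllib2 or when to use each.

See the official document HOWTO Fetch Internet Resources Using The urllib Package. For HTTP(S) requests, GET and POST are the two methods most commonly used, so I wrote the two small demos below. The URLs were picked arbitrarily; adapt them to your own case, and refer to the comments for the basic idea.

POST request:

GET request:

Note:
To skip certificate checks, create an unverified context with ssl and pass it to urlopen through the context parameter:
urllib.request.urlopen(full_url, context=context)
Since Python 2.7.9 (and 3.4.3), HTTPS certificates are verified by default, so opening an HTTPS link with urlopen can fail with the following error when the server certificate cannot be validated:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)
In other words, when urlopen opens an HTTPS link, the SSL certificate is verified first. To bypass that check, create an unverified context:
context = ssl._create_unverified_context()
or disable certificate verification globally after importing ssl:
ssl._create_default_https_context = ssl._create_unverified_context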
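
A minimal sketch of GET and POST requests over HTTPS in Python 3 might look like the following. The example.com URLs, the parameter names, and the helpers `build_get`, `build_post`, and `send` are all assumptions for illustration; `send` is defined but not called here, since it needs network access.

```python
import ssl
import urllib.parse
import urllib.request


def build_get(base_url, params):
    """Return a GET request with params encoded into the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(base_url + "?" + query)


def build_post(url, fields):
    """Return a POST request; the body must be bytes in Python 3."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data)


def send(request, verify=True):
    """Send the request, optionally skipping certificate verification."""
    if verify:
        context = ssl.create_default_context()
    else:
        # Private helper named in the text above; it disables verification.
        context = ssl._create_unverified_context()
    with urllib.request.urlopen(request, context=context, timeout=10) as resp:
        return resp.read()


# Hypothetical endpoints, used only to show how the requests are built.
get_req = build_get("https://example.com/search", {"q": "python"})
post_req = build_post("https://example.com/login", {"name": "who"})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Passing data to Request is what switches the method from GET to POST; keeping `verify=True` leaves the default certificate checks in place.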

⑸ How to POST array parameters with Python's urllib

1. If Python's setuptools is installed on the machine, poster can be installed with the following command:
easy_install poster

# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

# Register the streaming HTTP handlers with urllib2
register_openers()

# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the parameter name, usually set by the name attribute of the HTML <input> tag
# headers contains the required Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})

# Create the request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually perform the request and fetch the response
print urllib2.urlopen(request).read()

That is all it takes to upload the file.
2. The register_openers() call is equivalent to the following:
from poster.encode import multipart_encode
from poster.streaminghttp import StreamingHTTPHandler, StreamingHTTPRedirectHandler, StreamingHTTPSHandler
import urllib2

handlers = [StreamingHTTPHandler, StreamingHTTPRedirectHandler, StreamingHTTPSHandler]
opener = urllib2.build_opener(*handlers)
urllib2.install_opener(opener)

3. poster can also carry cookies, for example:
# upload_url is assumed to be defined by the caller
opener = poster.streaminghttp.register_openers()
opener.add_handler(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
params = {'file': open("test.txt", "rb"), 'name': 'upload test'}
datagen, headers = poster.encode.multipart_encode(params)
request = urllib2.Request(upload_url, datagen, headers)
result = urllib2.urlopen(request)
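
For the array-parameter question itself, when no file upload is involved, one option in Python 3 is urlencode with doseq=True, which repeats the key once per list element. This is a sketch with made-up field names (`ids`, `name`); whether the server expects repeated keys or some other convention such as `ids[]` depends on the server.

```python
from urllib.parse import parse_qs, urlencode

# A list value stands for an "array" parameter.
fields = {"ids": [1, 2, 3], "name": "upload test"}

# doseq=True emits one key=value pair per list element.
query = urlencode(fields, doseq=True)
print(query)

# POST bodies must be bytes before they are handed to urlopen or Request.
body = query.encode("utf-8")

# parse_qs shows how a server-side parser would group the repeated key.
print(parse_qs(query))
```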

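⑹ How to integrate urllib2 in Python 2.7

The most annoying thing about Python is its versions and configuration. Installing third-party packages in particular often produces baffling errors that I cannot make sense of.

So I end up switching back and forth between versions.

Today I was studying Python web scraping. Python 2.7 uses urllib and urllib2, while Python 3's urllib combines those two parts. For some reason I could not install Python 3's urllib on my computer, which was frustrating; the installation failed with an error. (urllib is part of the Python 3 standard library, so it does not need to be installed separately.)

Between Python 2.7 and Python 3, most of the changes are in where modules live.

The differences between urllib and urllib2 in Python 2.7:

urllib2 can accept an instance of the Request class to set the headers of a URL request, while urllib accepts only a URL. This means that with urllib alone you cannot disguise your User-Agent string (that is, pretend to be a browser).
urllib provides the urlencode method for generating GET query strings, and urllib2 does not. That is why urllib and urllib2 are often used together.
The main advantage of urllib2 is that urllib2.urlopen accepts a Request object as its argument, which gives control over the headers of the HTTP request.
However, urllib.urlretrieve and the quote/unquote family such as urllib.quote were not added to urllib2, so urllib is still needed at times.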
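
In Python 3 both halves of that division of labor sit in one package: urlencode is in urllib.parse and Request is in urllib.request. The sketch below builds a request with a custom User-Agent without sending it; the URL, the query, the User-Agent string, and the helper name `build_request` are placeholders.

```python
import urllib.parse
import urllib.request


def build_request(url, params, user_agent):
    """Combine an encoded query string with a custom User-Agent header."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        url + "?" + query,
        headers={"User-Agent": user_agent},
    )


req = build_request(
    "http://example.com/search",          # placeholder URL
    {"q": "python urllib"},               # placeholder query
    "Mozilla/5.0 (compatible; demo)",     # placeholder User-Agent
)
print(req.full_url)
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```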

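⑺ Differences between Python's httplib, urllib, and urllib2, and how to use them

Overview
First, the differences.
urllib and urllib2
Both urllib and urllib2 are modules for making URL requests, but urllib2 can accept an instance of the Request class to set the headers of a URL request, while urllib accepts only a URL.
This means that with urllib you cannot disguise your User-Agent string.
urllib provides the urlencode method for generating GET query strings, and urllib2 does not. That is why urllib and urllib2 are often used together.
Most HTTP requests in Python 2 code are made through urllib2.

httplib
httplib implements the client side of HTTP and HTTPS. It is rarely used directly; the higher-level modules (urllib, urllib2) build on its HTTP implementation.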

Basic urllib usage
urllib.urlopen(url[, data[, proxies]]):

import urllib
google = urllib.urlopen('')
print 'http header:\n', google.info()
print 'http status:', google.getcode()
print 'url:', google.geturl()
for line in google:  # just like working with a local file
    print line,
google.close()

For detailed usage, see
urllib study notes

Basic urllib2 usage
The simplest form:
import urllib2
response = urllib2.urlopen('')
html = response.read()

The actual steps:
1. urllib2.Request() builds a request; the returned req is a fully constructed request.
2. urllib2.urlopen() sends the request req and returns a file-like object, response, that holds everything the server sent back.
3. response.read() returns the HTML in the response, and response.info() returns additional information such as the headers.
For example:

#!/usr/bin/env python
import urllib2
req = urllib2.Request("")
response = urllib2.urlopen(req)
html = response.read()
print html

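Sometimes the program is correct but the server refuses the request. Why? The problem lies in the request headers. Some servers are picky and do not like being touched by programs. In that case the program has to pose as a browser when it sends the request; how the client identifies itself is carried in the headers.
A common case:

import urllib
import urllib2
url = ''
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # written into the request headers
values = {'name' : 'who','password':'123456'}
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

values is the POST data.
GET method
For example, in a search request the query travels in the URL, so the dictionary {'wd': 'xxx'} has to be urlencoded first: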

#coding:utf-8
import urllib
import urllib2
url = ''
values = {'wd':'D_in'}
data = urllib.urlencode(values)
print data
url2 = url+'?'+data
response = urllib2.urlopen(url2)
the_page = response.read()
print the_page

POST method

import urllib
import urllib2
url = ''
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # written into the request headers
values = {'name' : 'who','password':'123456'}  # POST data
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)  # URL-encode the POST data
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

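Using urllib2 with cookies

#coding:utf-8
import urllib2,urllib
import cookielib

url = r''
# Placeholder credentials (hypothetical values; the original left them undefined,
# and "pass" is a reserved word that cannot be used as a variable name)
email = 'user@example.com'
password = 'secret'

# Create a cookie container named cj
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Encode the data to be POSTed
data = urllib.urlencode({"email":email,"password":password})
r = opener.open(url,data)
print cj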
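
The Python 3 counterpart of this cookie setup uses http.cookiejar in place of cookielib. The sketch below only builds the opener and the form body; the credentials, the `login` helper, and any URL passed to it are hypothetical, and `login` is not called here because it needs a real server.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Container that collects cookies set by the server.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)

# Placeholder credentials, encoded to bytes for the POST body.
form = {"email": "user@example.com", "password": "secret"}
data = urllib.parse.urlencode(form).encode("utf-8")


def login(url):
    """POST the form to url and return the cookies received."""
    with opener.open(url, data, timeout=10) as response:
        response.read()
    return list(cookie_jar)


print(data)
```

Later requests made through the same `opener` send the stored cookies back automatically.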

Basic httplib usage
A simple example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import httplib
import urllib

def sendhttp():
    data = urllib.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPConnection('bugs.python.org')
    conn.request('POST', '/', data, headers)
    httpres = conn.getresponse()
    print httpres.status
    print httpres.reason
    print httpres.read()

if __name__ == '__main__':
    sendhttp()

For detailed usage, see
the httplib module
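
In Python 3, httplib was renamed http.client. A rough equivalent of the example above is sketched here; the helper names `build_form` and `send_http` are illustrative, and `send_http` is defined but not called because it needs network access.

```python
import http.client
import urllib.parse


def build_form(fields):
    """Return the URL-encoded body and the matching headers."""
    body = urllib.parse.urlencode(fields)
    headers = {
        "Content-type": "application/x-www-form-urlencoded",
        "Accept": "text/plain",
    }
    return body, headers


def send_http(host, path, fields):
    """POST the form to host/path and return (status, reason, body)."""
    body, headers = build_form(fields)
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("POST", path, body, headers)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()


body, headers = build_form({"@number": 12524, "@type": "issue", "@action": "show"})
print(body)
```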
In Python 3.x the urllib and urllib2 libraries were merged into the urllib package. Specifically:
First, the imports change from
import urllib
import urllib2
to
import urllib.request

Then the urllib2 calls change as follows:
urllib2.urlopen() becomes urllib.request.urlopen()
urllib2.Request() becomes urllib.request.Request()

urllib2.URLError becomes urllib.error.URLError
And for a POST request that carries data,
in Python 2 you write
urllib.urlencode(data)

while in Python 3 it becomes
urllib.parse.urlencode(data)
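
Code that has to run on both versions can wrap the renames listed above in a try/except import. This is one common pattern, sketched here rather than taken from the original answer:

```python
try:
    # Python 3 locations
    from urllib.request import Request, urlopen
    from urllib.error import URLError
    from urllib.parse import urlencode
except ImportError:
    # Python 2 locations
    from urllib2 import Request, urlopen, URLError
    from urllib import urlencode

# The imported names behave the same way under either version.
print(urlencode({"wd": "D_in"}))
```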

Script example:
In Python 2:

import urllib
import urllib2
import json
from config import settings

def url_request(self, action, url, **extra_data):
    abs_url = "http://%s:%s/%s" % (settings.configs['Server'],
                                   settings.configs["ServerPort"],
                                   url)
    if action in ('get', 'GET'):
        print(abs_url, extra_data)
        try:
            req = urllib2.Request(abs_url)
            req_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            # print "-->server response:",callback
            return callback

        except urllib2.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):
        # print(abs_url,extra_data['params'])
        try:
            data_encode = urllib.urlencode(extra_data['params'])
            req = urllib2.Request(url=abs_url, data=data_encode)
            res_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = res_data.read()
            callback = json.loads(callback)
            print("\033[31;1m[%s]:[%s]\033[0m response:\n%s" % (action, abs_url, callback))
            return callback
        except Exception as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

In Python 3.x:

import json
import urllib.error
import urllib.parse
import urllib.request
from config import settings

def url_request(self, action, url, **extra_data):
    # 'Server' matches the key defined in the settings shown below
    abs_url = 'http://%s:%s/%s/' % (settings.configs['Server'], settings.configs['ServerPort'], url)
    if action in ('get', 'GET'):  # GET request
        print(action, extra_data)
        try:
            req = urllib.request.Request(abs_url)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            return callback
        except urllib.error.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):  # POST data to the server
        try:
            # POST data must be bytes in Python 3, so encode the query string
            data_encode = urllib.parse.urlencode(extra_data['params']).encode('utf-8')
            req = urllib.request.Request(url=abs_url, data=data_encode)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            callback = json.loads(callback.decode())
            return callback
        except urllib.error.URLError as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

The settings configuration is as follows:

configs = {
    'HostID': 2,
    "Server": "localhost",
    "ServerPort": 8000,
    "urls": {
        'get_configs': ['api/client/config', 'get'],  # acquire all the services that will be monitored
        'service_report': ['api/client/service/report/', 'post'],
    },
    'RequestTimeout': 30,
    'ConfigUpdateInterval': 300,  # 5 mins as default
}
