Python urllib module

Published: 2023-11-19 12:37:52

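⑴ Where can I download Python's urllib2 module?

urllib2 ships with Python 2 as part of the standard library, so it does not need to be downloaded.

In Python 3.x, urllib2 was merged into urllib: its functions moved to urllib.request and its exceptions to urllib.error.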
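Code that must run under both Python 2 and Python 3 often hides this rename behind a fallback import. Below is a minimal sketch that uses only the standard library. The choice of imported names is just an example:

```python
# Prefer the Python 3 locations; fall back to urllib2 on Python 2.
try:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.error import URLError
except ImportError:
    from urllib2 import Request, URLError, urlopen  # Python 2

print(urlopen.__module__)  # urllib.request on Python 3
```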

⑵ Why does the urllib package in my Python 3.6 have no urlopen method?

The urllib module changed in Python 3.x. Code that used urllib.urlopen must now use urllib.request.urlopen.
For example:
import urllib.request
web = urllib.request.urlopen('http://www..com')
f = web.read()
print(f)
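In Python 3, read() returns bytes, so print(f) shows a b'...' literal rather than text. The sketch below decodes the body. It assumes the charset comes from the Content-Type header, with UTF-8 as a fallback; real pages may declare their encoding elsewhere, for example in a <meta> tag. A temporary local file opened through a file:// URL stands in for the page, so the snippet runs offline:

```python
import pathlib
import tempfile
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Open url with urlopen and return the body decoded to str."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()  # bytes in Python 3
        # Use the charset from the Content-Type header if one was sent.
        charset = resp.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# Demo with a local file (hypothetical content) so no network is needed.
with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "index.html"
    page.write_text("<p>你好, urllib</p>", encoding="utf-8")
    text = fetch_text(page.as_uri())
    print(text)
```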

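⑶ How is urllib.parse used in Python 2.7?

The latest Python 3.3.0 has been released. Compared with Python 3.0, the changes in 3.2 are not large. However, most tutorials online are based on version 2.x, which makes things hard for beginners who want to start with Python 3.0. As a beginner myself, I recently wanted to learn how to use Python's urllib module. The simplest example I found online fetches the HTML of the Google home page and prints it to the console: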

import urllib
print urllib.urlopen('http://www.google.com').read()

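First, anyone who has used Python 3.0 knows that print has become a function and needs parentheses. That is not the main problem, though. The real problem is that the console reports an error saying the urllib module has no urlopen method. Strange: could the online tutorials be wrong? I then tried help(urllib) and found no methods at all. It only listed the package contents, which contained 5 names.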

import urllib
help(urllib)

In 3.0, urllib2, urlparse, and robotparser were merged into urllib. The urllib module was restructured into a package with 5 submodules, which are the five names shown by help().
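The five names can also be listed programmatically. Below is a small sketch using pkgutil. The list reflects the CPython 3 standard library layout:

```python
import pkgutil
import urllib

# urllib is a package; list the submodules it contains.
names = sorted(m.name for m in pkgutil.iter_modules(urllib.__path__))
print(names)  # ['error', 'parse', 'request', 'response', 'robotparser']
```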

For future reference, the functions and classes in each submodule are listed below:

urllib.error: ContentTooShortError; HTTPError; URLError

urllib.parse: parse_qs; parse_qsl; quote; quote_from_bytes; quote_plus; unquote; unquote_plus; unquote_to_bytes; urldefrag; urlencode; urljoin; urlparse; urlsplit; urlunparse; urlunsplit

urllib.request: AbstractBasicAuthHandler; AbstractDigestAuthHandler; BaseHandler; CacheFTPHandler; FTPHandler; FancyURLopener; FileHandler; HTTPBasicAuthHandler; HTTPCookieProcessor; HTTPDefaultErrorHandler; HTTPDigestAuthHandler; HTTPErrorProcessor; HTTPHandler; HTTPPasswordMgr; HTTPRedirectHandler; HTTPSHandler; OpenerDirector; ProxyBasicAuthHandler; ProxyDigestAuthHandler; ProxyHandler; Request; URLopener; UnknownHandler; build_opener; getproxies; install_opener; pathname2url; url2pathname; urlcleanup; urlopen; urlretrieve

urllib.response: addbase; addclosehook; addinfo; addinfourl

urllib.robotparser: RobotFileParser
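The short Python 3 sketch below exercises several of the urllib.parse functions listed above. The query values and URLs are made-up examples:

```python
from urllib.parse import (parse_qs, parse_qsl, quote, quote_plus,
                          unquote, urldefrag, urlencode)

# Build a query string from a dict, then parse it back.
query = urlencode({"q": "python urllib", "page": 2})
as_dict = parse_qs(query)    # every value becomes a list
as_pairs = parse_qsl(query)  # list of (key, value) tuples, order kept
print(query)     # q=python+urllib&page=2
print(as_dict)   # {'q': ['python urllib'], 'page': ['2']}
print(as_pairs)  # [('q', 'python urllib'), ('page', '2')]

# Percent-encoding: quote() leaves '/' alone by default, quote_plus() turns spaces into '+'.
path = quote("/docs/url with space/")
form_value = quote_plus("a b&c")
text = unquote("%E4%BD%A0%E5%A5%BD")
print(path, form_value, text)  # /docs/url%20with%20space/ a+b%26c 你好

# Split the #fragment off a URL.
base, fragment = urldefrag("https://docs.python.org/3/library/urllib.parse.html#url-parsing")
print(base, fragment)
```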

---------------------------------------------------------------------------------------------------------

Example of opening an HTML document under 2.X:

import urllib
webURL = "http://www.python.org"
localURL = "index.html"
# Open a remote page by URL
u = urllib.urlopen(webURL)
buffer = u.read()
print u.info()
print "Read %d bytes of data from %s." % (len(buffer), u.geturl())
# Open a local page by URL
u = urllib.urlopen(localURL)
buffer = u.read()
print u.info()
print "Read %d bytes of data from %s." % (len(buffer), u.geturl())

The output is as follows:

Date: Fri, 26 Jun 2009 10:22:11 GMT
Server: Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.3 Python/2.5.2
Last-Modified: Thu, 25 Jun 2009 09:44:54 GMT
ETag: "105800d-46e7-46d29136f7180"
Accept-Ranges: bytes
Content-Length: 18151
Connection: close
Content-Type: text/html
Read 18151 bytes of data from http://www.python.org.
Content-Type: text/html
Content-Length: 865
Last-modified: Fri, 26 Jun 2009 10:16:10 GMT
Read 865 bytes of data from index.html.

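To open an HTML document with the urlopen(url[, data]) function of the urllib module, you must provide the document's URL, including the file name. urlopen can open files on a remote web server and also local files. It returns a file-like object from which we can read the data of the HTML document.

Once the HTML document is open, we can read it with read([nbytes]), readline(), and readlines(), just like a regular file. To read the whole HTML document, use read(), which returns the file contents as a string (as bytes in Python 3).

After opening an address, you can call geturl() to get the real URL of the page that was fetched. This is useful because urlopen (or the opener object used) may follow a redirect, so the URL of the fetched page may differ from the URL that was requested.

Another commonly used function is info() on the file-like object returned by urlopen. It returns metadata about the URL, such as the content length and content type. The example above illustrates these functions in more detail.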
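The same calls work in Python 3, where urlopen lives in urllib.request and read()/readline() return bytes. In the minimal sketch below, a temporary local file with hypothetical content is opened through a file:// URL in place of the remote page, so the snippet runs offline. For file URLs, urllib builds the metadata returned by info() itself, from the file's size, type, and modification time:

```python
import pathlib
import tempfile
import urllib.request


def describe(url):
    """Return the first line, real URL and selected metadata of a resource."""
    with urllib.request.urlopen(url) as resp:
        first_line = resp.readline()  # one line, as bytes
        rest = resp.read()            # everything after that line
        meta = resp.info()            # header-like metadata (email.message.Message)
        return {
            "first_line": first_line,
            "size": len(first_line) + len(rest),
            "real_url": resp.geturl(),  # may differ from url after a redirect
            "content_type": meta.get("Content-Type"),
            "content_length": meta.get("Content-Length"),
        }


# Demo against a local file so it runs offline (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "index.html"
    page.write_bytes(b"<html>\n<body>hello</body>\n</html>\n")
    info = describe(page.as_uri())
    print(info)
```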

--------------------------------------------------------------------------------------------------------------------------

Example of using urlparse under 2.X:

import urlparse
URLscheme = "http"
URLlocation = "www.python.org"
URLpath = "lib/module-urlparse.html"
modList = ("urllib", "urllib2",
           "httplib", "cgilib")
# Parse a URL into its components
print "Parsing result of the address-bar URL when searching Google for python"
parsedTuple = urlparse.urlparse(
    "http://www.google.com/search?hl=en&q=python&btnG=Google+Search")
print parsedTuple
# Unparse components back into a URL
print "Unparsing the URL of the Python documentation page"
unparsedURL = urlparse.urlunparse(
    (URLscheme, URLlocation, URLpath, '', '', ''))
print " " + unparsedURL
# Combine the path with a new file name to form a new URL
print "Adding more Python documentation page URLs by joining"
for mod in modList:
    newURL = urlparse.urljoin(unparsedURL,
                              "module-%s.html" % (mod))
    print " " + newURL
# Form a new URL by adding a sub-path to the path
print "Generating a Python documentation page URL by joining a sub-path"
newURL = urlparse.urljoin(unparsedURL,
                          "module-urllib2/request-objects.html")
print " " + newURL

The output is as follows:

Parsing result of the address-bar URL when searching Google for python
('http', 'www.google.com', '/search', '',
'hl=en&q=python&btnG=Google+Search', '')
Unparsing the URL of the Python documentation page
http://www.python.org/lib/module-urlparse.html
Adding more Python documentation page URLs by joining
http://www.python.org/lib/module-urllib.html
http://www.python.org/lib/module-urllib2.html
http://www.python.org/lib/module-httplib.html
http://www.python.org/lib/module-cgilib.html
Generating a Python documentation page URL by joining a sub-path
http://www.python.org/lib/module-urllib2/request-objects.html
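In Python 3 the urlparse module became urllib.parse, with the same function names. Below is a minimal port of the example above. The docs.python.org URLs are only illustrative:

```python
from urllib.parse import urljoin, urlparse, urlunparse

# Split a URL into (scheme, netloc, path, params, query, fragment).
parsed = urlparse("http://www.google.com/search?hl=en&q=python&btnG=Google+Search")
print(parsed)

# Reassemble a URL from its six components.
base = urlunparse(("https", "docs.python.org", "/3/library/urllib.parse.html", "", "", ""))
print(base)  # https://docs.python.org/3/library/urllib.parse.html

# Resolve file names relative to the directory of the base URL.
siblings = [urljoin(base, "%s.html" % mod) for mod in ("urllib.request", "urllib.error")]
print(siblings)

# '..' climbs one directory level, as in a file system path.
up = urljoin(base, "../tutorial/index.html")
print(up)  # https://docs.python.org/3/tutorial/index.html
```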

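⑷ Making HTTPS requests with urllib in Python 3

I have just started learning the basics of web crawling with Python. I use Python 3.6.4 and follow the tutorial "Python Web Crawler Beginner's Tutorial".

Python 3.6 no longer has the urllib2 library, so I don't need to worry about how urllib and urllib2 differ or when to use each.

I followed the official document HOWTO Fetch Internet Resources Using The urllib Package. GET and POST are the two most commonly used methods for HTTP(S) requests, so I wrote the following two small demos. The URLs were picked arbitrarily; adapt them to your own scenario, and see the comments for the basic idea.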

POST request:
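Below is a minimal POST sketch for Python 3. It assumes a placeholder endpoint (https://httpbin.org/post, which echoes requests back as JSON) and form-encoded fields. Building the request and sending it are separate functions, so the request can be inspected without network access. The sending step keeps certificate verification on:

```python
import json
import ssl
import urllib.parse
import urllib.request

POST_URL = "https://httpbin.org/post"  # placeholder endpoint, replace with your own


def build_post(url, fields, headers=None):
    """Build a POST Request with form-encoded (application/x-www-form-urlencoded) data."""
    # Python 3 requires the body to be bytes, so encode the query string.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers=headers or {}, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def send(req, timeout=10):
    """Send the request over HTTPS with the default (verifying) SSL context."""
    context = ssl.create_default_context()
    with urllib.request.urlopen(req, timeout=timeout, context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_post(POST_URL, {"name": "who", "lang": "python"})
# print(send(req))  # requires network access
```

Calling send(req) performs the actual HTTPS request once a network connection is available.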

GET request:
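Below is a matching GET sketch, again assuming the placeholder endpoint https://httpbin.org/get. The parameters are URL-encoded into the query string, and the User-Agent value is an arbitrary example:

```python
import ssl
import urllib.parse
import urllib.request

GET_URL = "https://httpbin.org/get"  # placeholder endpoint, replace with your own


def build_get(url, params, user_agent="Mozilla/5.0"):
    """Append URL-encoded params to url and return a GET Request."""
    query = urllib.parse.urlencode(params)
    # Use '&' if the URL already has a query string, otherwise start one with '?'.
    separator = "&" if urllib.parse.urlparse(url).query else "?"
    return urllib.request.Request(url + separator + query,
                                  headers={"User-Agent": user_agent})


def fetch(req, timeout=10):
    """Send the request with the default (verifying) SSL context; return the body as text."""
    with urllib.request.urlopen(req, timeout=timeout,
                                context=ssl.create_default_context()) as resp:
        return resp.read().decode("utf-8")


req = build_get(GET_URL, {"q": "urllib", "page": 1})
# print(fetch(req))  # requires network access
```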

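Note:
To use an unverified SSL context, create it with the ssl module and pass it to urlopen through the context parameter:
urllib.request.urlopen(full_url, context=context)
Since Python 2.7.9 (and 3.4.3), urlopen verifies HTTPS certificates by default. Opening an https link whose certificate cannot be verified therefore raises the following error:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)
So, when opening such an https link with urlopen, you can skip certificate verification by creating an unverified context:
context = ssl._create_unverified_context()
Or, after importing ssl, disable certificate verification globally:
ssl._create_default_https_context = ssl._create_unverified_context
Both approaches switch off certificate checking and leave the connection open to man-in-the-middle attacks, so they are best kept to testing. Installing the correct CA certificates is the safer fix.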
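The difference between the two kinds of context can be checked directly, as in the small sketch below. ssl._create_unverified_context is a private helper (note the leading underscore), so treat it as a testing convenience rather than a stable API:

```python
import ssl

# Default context: verifies the server certificate and host name (recommended).
verified = ssl.create_default_context()
print(verified.verify_mode == ssl.CERT_REQUIRED, verified.check_hostname)  # True True

# Unverified context: skips both checks; for testing only.
unverified = ssl._create_unverified_context()
print(unverified.verify_mode == ssl.CERT_NONE, unverified.check_hostname)  # True False

# Either context is passed to urlopen the same way, for example:
# urllib.request.urlopen(full_url, context=unverified)
```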

⑸ How can Python's urllib POST array parameters?

1. If Python's setuptools is installed on the machine, poster can be installed with the following command:
easy_install poster

# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

# Register the streaming HTTP handlers with urllib2
register_openers()

# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the parameter name, usually set by the name attribute of the HTML <input> tag
# headers contains the required Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})

# Create the request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually perform the request and get the response
print urllib2.urlopen(request).read()

That's all it takes; the file is uploaded.
2. register_openers() is equivalent to the following:
import urllib2
from poster.encode import multipart_encode
from poster.streaminghttp import StreamingHTTPHandler, StreamingHTTPRedirectHandler, StreamingHTTPSHandler
handlers = [StreamingHTTPHandler, StreamingHTTPRedirectHandler, StreamingHTTPSHandler]
opener = urllib2.build_opener(*handlers)
urllib2.install_opener(opener)

3. poster can also carry cookies, for example:
import cookielib
import urllib2
import poster.encode
import poster.streaminghttp

opener = poster.streaminghttp.register_openers()
opener.add_handler(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
params = {'file': open("test.txt", "rb"), 'name': 'upload test'}
datagen, headers = poster.encode.multipart_encode(params)
request = urllib2.Request(upload_url, datagen, headers)  # upload_url: the target URL, defined elsewhere
result = urllib2.urlopen(request)
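The poster examples cover multipart file uploads. For the question as asked, POSTing an array of values, the Python 3 standard library is enough on its own: urlencode(..., doseq=True) repeats the key once per element. Below is a minimal sketch. The field names and the http://localhost:5000/submit URL are hypothetical, and whether the server expects ids or ids[] depends on the server framework:

```python
import urllib.parse
import urllib.request

# doseq=True expands list values into repeated key=value pairs.
fields = {"ids": [1, 2, 3], "name": "upload test"}
body = urllib.parse.urlencode(fields, doseq=True)
print(body)  # ids=1&ids=2&ids=3&name=upload+test

# Some servers (PHP, for example) expect the key with a [] suffix instead.
php_body = urllib.parse.urlencode({"ids[]": [1, 2, 3]}, doseq=True)
print(php_body)  # ids%5B%5D=1&ids%5B%5D=2&ids%5B%5D=3

# Attach the encoded body (as bytes) to a POST request; sending it needs a server.
req = urllib.request.Request("http://localhost:5000/submit", data=body.encode("ascii"))
print(req.get_method())  # POST
```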

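⑹ How do I integrate urllib2 in Python 2.7?

The most annoying thing about Python is its versions and configuration. Installing third-party packages in particular often fails with baffling errors that I don't understand.

So I end up switching back and forth all the time.

Today I was learning web crawling with Python. Python 2.7 uses urllib and urllib2, and Python 3's urllib combines those two parts of 2.7. But for some reason my computer would not install Python 3's urllib, which was frustrating; it failed with an error. (urllib is part of Python 3's standard library, so there is nothing to install.)

Between Python 2.7 and Python 3, most of the changes are in where modules are located.

In Python 2.7, urllib and urllib2 differ as follows:

urllib2 can accept an instance of the Request class to set the headers of a URL request, while urllib accepts only a URL. This means you cannot disguise your User-Agent string (pretend to be a browser) with the urllib module.
urllib provides the urlencode method for building GET query strings, while urllib2 does not. This is why urllib is often used together with urllib2.
The main advantage of urllib2 is that urllib2.urlopen can accept a Request object as an argument, which lets you control the headers of the HTTP request.
However, urllib.urlretrieve and the family of quote and unquote functions such as urllib.quote were not added to urllib2, so urllib is sometimes still needed as a helper.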
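In Python 3 these helpers are no longer split across two modules. The short sketch below shows where the Python 2 names ended up. The example.com URL and User-Agent are placeholders:

```python
import urllib.parse
import urllib.request

# urllib.quote / urllib.unquote (Python 2) -> urllib.parse.quote / unquote
quoted = urllib.parse.quote("a b/c")          # '/' is safe by default
unquoted = urllib.parse.unquote("a%20b%2Fc")
print(quoted, unquoted)  # a%20b/c a b/c

# urllib.urlretrieve (Python 2) -> urllib.request.urlretrieve (legacy interface)
print(callable(urllib.request.urlretrieve))  # True

# urllib2.Request with custom headers (Python 2) -> urllib.request.Request
req = urllib.request.Request("http://www.example.com/",
                             headers={"User-Agent": "Mozilla/5.0"})
# Request stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0
```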

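⑺ Differences between Python's httplib, urllib, and urllib2, and how to use them

Overview
First, let's look at how they differ.
urllib and urllib2
urllib and urllib2 are both modules for making URL requests. However, urllib2 can accept an instance of the Request class to set the headers of a URL request, while urllib accepts only a URL.
This means you cannot spoof your User-Agent string and similar headers with urllib.
urllib provides the urlencode method for building GET query strings, while urllib2 does not. This is why urllib is often used together with urllib2.
Most HTTP requests are currently made through urllib2.

httplib
httplib implements the client side of the HTTP and HTTPS protocols. It is generally not used directly; the higher-level modules (urllib, urllib2) use its HTTP implementation.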

Basic urllib usage
urllib.urlopen(url[, data[, proxies]]):

import urllib
google = urllib.urlopen('')
print 'http header:\n', google.info()
print 'http status:', google.getcode()
print 'url:', google.geturl()
for line in google:  # just like working with a local file
    print line,
google.close()

For detailed usage, see
Learning urllib

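Basic urllib2 usage
The simplest form:
import urllib2
response = urllib2.urlopen('')
html = response.read()

The actual steps:
1. urllib2.Request() builds the request information; the returned req is a fully constructed request.
2. urllib2.urlopen() sends the request req that was just built and returns a file-like object, response, containing all the returned information.
3. response.read() reads the HTML in the response, and response.info() reads some additional information.
For example: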

#!/usr/bin/env python
import urllib2
req = urllib2.Request("")
response = urllib2.urlopen(req)
html = response.read()
print html

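Sometimes the program is correct, yet the server still refuses access. Why? The problem is the request header. Some servers are picky and do not want to be accessed by programs. In that case you need to disguise your program as a browser when sending the request, and that information is carried in the header (for example, the User-Agent).
A common case: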

import urllib
import urllib2
url = ''
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # write user_agent into the header
values = {'name': 'who', 'password': '123456'}
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

values is the POST data.
GET method
For example, for a web search, the dictionary {'wd': 'xxx'} has to be urlencoded and appended to the URL:

#coding:utf-8
import urllib
import urllib2
url = ''
values = {'wd':'D_in'}
data = urllib.urlencode(values)
print data
url2 = url+'?'+data
response = urllib2.urlopen(url2)
the_page = response.read()
print the_page
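Below is a sketch of the same GET request in Python 3. The original base URL is blank, so a hypothetical https://www.example.com/s stands in. urlencode also percent-encodes non-ASCII values as UTF-8:

```python
import urllib.parse
import urllib.request

url = "https://www.example.com/s"  # hypothetical search endpoint
values = {"wd": "D_in"}
data = urllib.parse.urlencode(values)
print(data)  # wd=D_in

req = urllib.request.Request(url + "?" + data)
print(req.full_url)      # https://www.example.com/s?wd=D_in
print(req.get_method())  # GET, since no body is attached
# the_page = urllib.request.urlopen(req).read().decode("utf-8")  # requires network access

# Non-ASCII search terms are encoded as UTF-8 percent escapes.
cn = urllib.parse.urlencode({"wd": "教程"})
print(cn)  # wd=%E6%95%99%E7%A8%8B
```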

POST method

import urllib
import urllib2
url = ''
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # write user_agent into the header
values = {'name': 'who', 'password': '123456'}  # POST data
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)  # URL-encode the POST data
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
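Below is the POST example ported to Python 3 as a runnable sketch. The original URL is blank, so a throwaway local server (127.0.0.1, random port, hypothetical /login path) stands in for the real site. It echoes back the User-Agent and body it received:

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in server: replies with the User-Agent it saw and the raw POST body."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        reply = self.headers.get("User-Agent", "").encode("utf-8") + b"\n" + body
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass


# Port 0 lets the OS pick a free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/login" % server.server_port  # hypothetical path

user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"  # goes into the header
values = {"name": "who", "password": "123456"}                 # POST data
headers = {"User-Agent": user_agent}
data = urllib.parse.urlencode(values).encode("utf-8")          # body must be bytes
req = urllib.request.Request(url, data, headers)

# An empty ProxyHandler ignores proxy environment variables for this local demo;
# against a real site, urllib.request.urlopen(req) is used the same way.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(req, timeout=5) as response:
    the_page = response.read().decode("utf-8")

server.shutdown()
server.server_close()
print(the_page)
```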

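Using urllib2 with cookies

#coding:utf-8
import urllib2, urllib
import cookielib

url = r''
email = ''     # fill in your own login e-mail
password = ''  # fill in your own password ('pass' is a Python keyword and cannot be a variable name)

# Create a cookie container, cj
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Encode the data to be POSTed
data = urllib.urlencode({"email": email, "password": password})
r = opener.open(url, data)
print cj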
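The cookie example in Python 3, where cookielib became http.cookiejar, can be sketched as follows. The sketch rests on two assumptions: a tiny local server plays the login page and sets a hypothetical session cookie, and the e-mail and password are placeholders:

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoginHandler(BaseHTTPRequestHandler):
    """Local stand-in for a login page: answers every POST with a session cookie."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/login" % server.server_port

cj = http.cookiejar.CookieJar()  # cookie container
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),         # ignore proxy env vars for the local demo
    urllib.request.HTTPCookieProcessor(cj),  # store cookies from responses in cj
)
data = urllib.parse.urlencode({"email": "user@example.com", "password": "secret"}).encode()
with opener.open(url, data, timeout=5) as r:
    r.read()
server.shutdown()
server.server_close()

cookies = {c.name: c.value for c in cj}
print(cookies)  # {'session': 'abc123'}
```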

Basic httplib usage
A simple example

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import httplib
import urllib

def sendhttp():
    data = urllib.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPConnection('bugs.python.org')
    conn.request('POST', '/', data, headers)
    httpres = conn.getresponse()
    print httpres.status
    print httpres.reason
    print httpres.read()

if __name__ == '__main__':
    sendhttp()
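In Python 3, httplib was renamed http.client. The sketch below runs the same sendhttp() against a local stand-in server instead of bugs.python.org, whose live response is not reproduced here. The server simply echoes the decoded form fields:

```python
import http.client
import threading
import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer


class FormHandler(BaseHTTPRequestHandler):
    """Local stand-in server: echoes the parsed form fields back as plain text."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        fields = urllib.parse.parse_qs(body.decode("utf-8"))
        reply = ";".join("%s=%s" % (k, v[0]) for k, v in sorted(fields.items())).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass


def sendhttp(host, port):
    """Python 3 version of sendhttp(): same form data, sent with http.client."""
    data = urllib.parse.urlencode({"@number": 12524, "@type": "issue", "@action": "show"})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", "/", data, headers)  # a str body is encoded as ISO-8859-1
        res = conn.getresponse()
        return res.status, res.reason, res.read().decode("utf-8")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), FormHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, reason, text = sendhttp("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
print(status, reason, text)  # 200 OK @action=show;@number=12524;@type=issue
```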

For detailed usage, see
httplib module
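In Python 3.x, the urllib and urllib2 libraries were merged into a single urllib package.
First, the imports change from
import urllib
import urllib2
to
import urllib.request
(plus urllib.parse and urllib.error where they are needed).

Then the urllib2 functions are used as follows:
urllib2.urlopen() becomes urllib.request.urlopen()
urllib2.Request() becomes urllib.request.Request()

urllib2.URLError becomes urllib.error.URLError
And when you want to send a POST request with data via urllib,
in Python 2 you use
urllib.urlencode(data)

while in Python 3 this becomes
urllib.parse.urlencode(data)
and the resulting str must be encoded to bytes (for example with .encode('utf-8')) before it is passed as the request data.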
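The sketch below puts these renames together in Python 3. The localhost:8000 URL is a placeholder, and nothing is sent:

```python
import urllib.error
import urllib.parse
import urllib.request

data = {"name": "who", "password": "123456"}

# Python 2: urllib.urlencode(data)  ->  Python 3: urllib.parse.urlencode(data)
encoded = urllib.parse.urlencode(data)  # still a str
print(encoded)  # name=who&password=123456

# Python 2: urllib2.Request(url, data)  ->  Python 3: urllib.request.Request(url, bytes)
req = urllib.request.Request("http://localhost:8000/api/",  # placeholder URL
                             data=encoded.encode("utf-8"))
print(req.get_method())  # POST, because a body is attached

# Python 2: urllib2.URLError / HTTPError  ->  Python 3: urllib.error.URLError / HTTPError
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
```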

Script example:
In Python 2:

import urllib
import urllib2
import json
from config import settings


def url_request(self, action, url, **extra_data):
    abs_url = "http://%s:%s/%s" % (settings.configs['Server'],
                                   settings.configs["ServerPort"],
                                   url)
    if action in ('get', 'GET'):
        print(abs_url, extra_data)
        try:
            req = urllib2.Request(abs_url)
            req_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            # print "-->server response:",callback
            return callback

        except urllib2.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):
        # print(abs_url,extra_data['params'])
        try:
            data_encode = urllib.urlencode(extra_data['params'])
            req = urllib2.Request(url=abs_url, data=data_encode)
            res_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = res_data.read()
            callback = json.loads(callback)
            print("\033[31;1m[%s]:[%s]\033[0m response:\n%s" % (action, abs_url, callback))
            return callback
        except Exception as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

In Python 3.x:

import json
import urllib.error
import urllib.parse
import urllib.request
from config import settings


def url_request(self, action, url, **extra_data):
    abs_url = 'http://%s:%s/%s' % (settings.configs['Server'], settings.configs['ServerPort'], url)
    if action in ('get', 'GET'):  # GET request
        print(action, extra_data)
        try:
            req = urllib.request.Request(abs_url)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            return callback
        except urllib.error.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):  # POST data to the server
        try:
            # POST data must be bytes in Python 3
            data_encode = urllib.parse.urlencode(extra_data['params']).encode('utf-8')
            req = urllib.request.Request(url=abs_url, data=data_encode)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            callback = json.loads(callback.decode())
            return callback
        except urllib.error.URLError as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

The settings configuration is as follows:

configs = {
    'HostID': 2,
    "Server": "localhost",
    "ServerPort": 8000,
    "urls": {
        'get_configs': ['api/client/config', 'get'],  # acquire all the services to be monitored
        'service_report': ['api/client/service/report/', 'post'],
    },
    'RequestTimeout': 30,
    'ConfigUpdateInterval': 300,  # 5 mins as default
}
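Below is a self-contained Python 3 variant of url_request, offered as a sketch with three changes. First, the relevant settings are inlined instead of imported from config; this is an assumption made so the snippet runs on its own. Second, request building is split out so it can be checked without a server. Third, self is dropped because the enclosing class is not shown. The "HostID" form field in the usage line is hypothetical:

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Inlined copy of the relevant settings (normally imported from config.settings).
configs = {"Server": "localhost", "ServerPort": 8000, "RequestTimeout": 30}


def build_request(action, url, params=None):
    """Return a urllib.request.Request for a GET or POST to the configured server."""
    abs_url = "http://%s:%s/%s" % (configs["Server"], configs["ServerPort"], url)
    action = action.upper()
    if action == "GET":
        return urllib.request.Request(abs_url)
    if action == "POST":
        body = urllib.parse.urlencode(params or {}).encode("utf-8")  # bytes required
        return urllib.request.Request(abs_url, data=body)
    raise ValueError("unsupported action: %r" % action)


def url_request(action, url, **extra_data):
    """Send the request; JSON-decode POST replies like the original script does."""
    req = build_request(action, url, extra_data.get("params"))
    try:
        with urllib.request.urlopen(req, timeout=configs["RequestTimeout"]) as resp:
            callback = resp.read()
    except urllib.error.URLError as e:
        raise SystemExit("\033[31;1m%s\033[0m" % e)
    return json.loads(callback.decode("utf-8")) if req.get_method() == "POST" else callback


get_req = build_request("get", "api/client/config")
post_req = build_request("POST", "api/client/service/report/", {"HostID": 2})
print(get_req.full_url, post_req.get_method(), post_req.data)
```

Calling url_request("get", "api/client/config") requires the server from the settings to be running on localhost:8000.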
