urllibpython27

Published: 2022-07-24 10:52:22

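⑴ A small question about the except urllib2.URLError, e: statement in Python 2.7

A bare except handles every exception that is raised.
except urllib2.URLError handles only urllib2.URLError and its subclasses (such as urllib2.HTTPError).
except urllib2.URLError, e: is valid Python 2 syntax; the comma binds the caught exception to e. Dropping the comma (except urllib2.URLError e:) is a syntax error. The preferred spelling, which works in Python 2.6+ and is the only one accepted by Python 3, is
except urllib2.URLError as e:
e is the caught exception instance, of type urllib2.URLError.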
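As a minimal sketch of the "as e" form, here is the Python 3 equivalent (urllib.error.URLError instead of urllib2.URLError). The helper name try_open and the made-up "foo" scheme are assumptions chosen so that the error is raised locally without touching the network.

```python
import urllib.error
import urllib.request


def try_open(url):
    """Return the error reason if urlopen raises URLError, else None."""
    try:
        urllib.request.urlopen(url).close()
    except urllib.error.URLError as e:  # "as e" binds the exception instance
        return str(e.reason)
    return None


# "foo" is not a registered scheme, so urlopen fails before any connection.
reason = try_open("foo://example.invalid/")
print(reason)
```

Because HTTPError is a subclass of URLError, the same except clause also catches HTTP status errors.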

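⑵ How to use urllib.parse in Python 2.7

Python 3.3.0 has now been released. Compared with Python 3.0, the changes in 3.2 are not large, but most tutorials online are still based on 2.x, which makes things hard for beginners who want to start with Python 3. As one such beginner, I recently wanted to learn how to use the urllib module. The simplest example I found online fetches the HTML of the Google home page and prints it to the console: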

import urllib
print urllib.urlopen('http://www.google.com').read()

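First, anyone who has used Python 3 knows that print has become a function and needs parentheses, but that is not the main problem. The real problem is that the console reports that the urllib module has no urlopen attribute. Strange; could the online tutorials be wrong? Trying help(urllib) shows no functions at all, only a PACKAGE CONTENTS section listing five names.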

import urllib
help(urllib)

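Python 3.0 merged urllib2, urlparse, and robotparser into urllib and reorganized urllib into a package with five submodules, which are the five names shown by help().

For future reference, the names provided by each submodule are listed below: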

urllib.error: ContentTooShortError; HTTPError; URLError

urllib.parse: parse_qs; parse_qsl; quote; quote_from_bytes; quote_plus; unquote; unquote_plus; unquote_to_bytes; urldefrag; urlencode; urljoin; urlparse; urlsplit; urlunparse; urlunsplit

urllib.request: AbstractBasicAuthHandler; AbstractDigestAuthHandler; BaseHandler; CacheFTPHandler; FTPHandler; FancyURLopener; FileHandler; HTTPBasicAuthHandler; HTTPCookieProcessor; HTTPDefaultErrorHandler; HTTPDigestAuthHandler; HTTPErrorProcessor; HTTPHandler; HTTPPasswordMgr; HTTPRedirectHandler; HTTPSHandler; OpenerDirector; ProxyBasicAuthHandler; ProxyDigestAuthHandler; ProxyHandler; Request; URLopener; UnknownHandler; build_opener; getproxies; install_opener; pathname2url; url2pathname; urlcleanup; urlopen; urlretrieve

urllib.response: addbase; addclosehook; addinfo; addinfourl

urllib.robotparser: RobotFileParser
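A short sketch of how a few of the urllib.parse names listed above fit together in Python 3. The URL www.example.com and the query values are placeholders, not taken from the original post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build a query string from a dict, attach it to a placeholder URL,
# then split the URL again and decode the query back into a dict.
query = urlencode({"q": "python urllib", "hl": "en"})
url = "http://www.example.com/search?" + query
parts = urlsplit(url)
params = parse_qs(parts.query)

print(query)
print(parts.netloc, parts.path)
print(params)
```

Note that parse_qs returns a list for every key, since a query string may repeat a parameter.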

---------------------------------------------------------------------------------------------------------

An example of opening an HTML document under 2.x:

import urllib
webURL = "http://www.python.org"
localURL = "index.html"
# Open a remote page by URL
u = urllib.urlopen(webURL)
buffer = u.read()
print u.info()
print "Read %d bytes of data from %s." % (len(buffer), u.geturl())
# Open a local page by URL
u = urllib.urlopen(localURL)
buffer = u.read()
print u.info()
print "Read %d bytes of data from %s." % (len(buffer), u.geturl())

The output is as follows:

Date: Fri, 26 Jun 2009 10:22:11 GMT
Server: Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_wsgi/2.3 Python/2.5.2
Last-Modified: Thu, 25 Jun 2009 09:44:54 GMT
ETag: "105800d-46e7-46d29136f7180"
Accept-Ranges: bytes
Content-Length: 18151
Connection: close
Content-Type: text/html
Read 18151 bytes of data from http://www.python.org.
Content-Type: text/html
Content-Length: 865
Last-modified: Fri, 26 Jun 2009 10:16:10 GMT
Read 865 bytes of data from index.html.

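To open an HTML document with the urlopen(url [,data]) function of the urllib module, you must supply the document's URL, including the file name. urlopen can open not only files on a remote web server but also local files. It returns a file-like object through which the data of the HTML document can be read.

Once the document is open, you can read from it as you would from a regular file, using read([nbytes]), readline(), and readlines(). To read the entire HTML document, call read(), which returns the content as a string.

After opening an address, you can call geturl() to get the real URL of the page that was retrieved. This is useful because urlopen (or the opener object in use) may have followed a redirect, so the URL of the retrieved page may differ from the URL requested.

Another commonly used method of the file-like object returned by urlopen is info(), which returns metadata about the URL, such as the content length and content type. The example above exercises these functions.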
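For readers on Python 3, the same read(), geturl(), and info() calls can be tried without a network connection. This is a sketch under stated assumptions: it writes a temporary index.html with made-up content and opens it through a file:// URL, since Python 3's urllib.request.urlopen needs a full URL rather than a bare file name.

```python
import pathlib
import tempfile
import urllib.request

content = b"<html><body>hello</body></html>"  # made-up page content

with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "index.html"
    page.write_bytes(content)

    # Path.as_uri() builds a file:// URL for the temporary page.
    with urllib.request.urlopen(page.as_uri()) as u:
        body = u.read()                      # whole document, as bytes
        final_url = u.geturl()               # URL actually retrieved
        length = u.info()["Content-length"]  # metadata, like HTTP headers

print("Read %d bytes of data from %s." % (len(body), final_url))
```

In Python 3, read() returns bytes rather than a string, so the result usually needs a decode() before it is treated as text.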

--------------------------------------------------------------------------------------------------------------------------

An example of using urlparse under 2.x:

import urlparse
URLscheme = "http"
URLlocation = "www.python.org"
URLpath = "lib/module-urlparse.html"
modList = ("urllib", "urllib2",
           "httplib", "cgilib")
# Parse an address into its components
print "Result of parsing the address-bar URL of a Google search for python"
parsedTuple = urlparse.urlparse(
    "http://www.google.com/search?"
    "hl=en&q=python&btnG=Google+Search")
print parsedTuple
# Reassemble the components into a URL
print "Unparsing the URL of a Python documentation page"
unparsedURL = urlparse.urlunparse(
    (URLscheme, URLlocation, URLpath, '', '', ''))
print " " + unparsedURL
# Combine the path with a new file name to form a new URL
print "Adding more Python documentation page URLs by joining"
for mod in modList:
    newURL = urlparse.urljoin(unparsedURL,
                              "module-%s.html" % (mod))
    print " " + newURL
# Form a new URL by adding a subpath to the path
print "Generating a Python documentation page URL by joining a subpath"
newURL = urlparse.urljoin(unparsedURL,
                          "module-urllib2/request-objects.html")
print " " + newURL

The output is as follows:

Result of parsing the address-bar URL of a Google search for python
('http', 'www.google.com', '/search', '',
'hl=en&q=python&btnG=Google+Search', '')
Unparsing the URL of a Python documentation page
 http://www.python.org/lib/module-urlparse.html
Adding more Python documentation page URLs by joining
 http://www.python.org/lib/module-urllib.html
 http://www.python.org/lib/module-urllib2.html
 http://www.python.org/lib/module-httplib.html
 http://www.python.org/lib/module-cgilib.html
Generating a Python documentation page URL by joining a subpath
 http://www.python.org/lib/module-urllib2/request-objects.html
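In Python 3 the same three functions live in urllib.parse. The following is a condensed sketch of the example above, reusing its URLs; only the variable names parsed, base, and joined are new.

```python
from urllib.parse import urljoin, urlparse, urlunparse

# Split a URL into its six components.
parsed = urlparse("http://www.google.com/search?hl=en&q=python&btnG=Google+Search")
print(parsed.netloc, parsed.path, parsed.query)

# Reassemble components into a URL; a leading "/" is added to the path.
base = urlunparse(("http", "www.python.org", "lib/module-urlparse.html", "", "", ""))
print(base)

# A relative name replaces the last path segment of the base URL.
joined = [urljoin(base, "module-%s.html" % mod) for mod in ("urllib", "httplib")]
print(joined)
```

Python 3's urlparse returns a named tuple, so fields can be read by name (parsed.netloc) as well as by index.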

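⑶ Python reports urllib.error.URLError: <urlopen error unknown url type: src="https>; how do I fix it on Windows?

A Django site used django_cas to connect to an SSO (single sign-on) system. After configuration, logging in raised the exception "urlopen error unknown url type: https". Tracing it to the source showed that the urllib module of this Python build did not support the https protocol.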

>>> import urllib
>>> urllib.urlopen('htom')
<addinfourl at 269231456 whose fp = <socket._fileobject object at 0xff98250>>
>>> urllib.urlopen('hm')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python27/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/local/python27/lib/python2.7/urllib.py", line 204, in open
    return self.open_unknown(fullurl, data)
  File "/usr/local/python27/lib/python2.7/urllib.py", line 216, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: 'https'

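The built-in urllib module lacks https support because no SSL library such as OpenSSL (with its development headers) was installed before Python was compiled, so Python was built without SSL support.

I was on CentOS, so I installed openssl-devel:
sudo yum install openssl-devel

Then recompile Python:
./configure (optional, since it was already configured; reuse the previous configuration, and preferably build and install with the same options so that dependent libraries do not have to be rebuilt)
make
make install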

>>> import urllib
>>> urllib.urlopen('htt.com')
The same error no longer appears.

Some sources also say that after installing openssl-devel and before recompiling Python, you need to edit the Setup.dist file in the Modules directory, changing
# Socket module helper for SSL support; you must comment out the other
# socket line above, and possibly edit the SSL variable:
#SSL=/usr/local/ssl
#_ssl _ssl.c \
#	-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
#	-L$(SSL)/lib -lssl -lcrypto
to
# Socket module helper for SSL support; you must comment out the other
# socket line above, and possibly edit the SSL variable:
SSL=/usr/local/ssl
_ssl _ssl.c \
	-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
	-L$(SSL)/lib -lssl -lcrypto

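In my own testing, however, editing this file did not seem to be necessary; the build picked up the SSL library automatically.

One more thing to watch out for: after recompiling and installing Python, running python by its executable name (which may be a symlink) may still start the old interpreter, because that name does not point to the new executable. Run the new Python through its own executable name or through a link that points to it.

Recompiling and installing Python may also require rebuilding a series of modules that depend on it, such as django, MySQLdb, pycrypto, python-ldap, django-auth-ldap, django_cas, and pymongo. Pay particular attention to this.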
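A quick way to confirm both points (which interpreter is actually running, and whether it was built with SSL) is to ask the interpreter itself. This is a minimal sketch; the helper name ssl_status is made up for illustration.

```python
import sys


def ssl_status():
    """Return (interpreter path, OpenSSL version string or None)."""
    try:
        import ssl  # raises ImportError if Python was built without SSL
    except ImportError:
        return sys.executable, None
    return sys.executable, ssl.OPENSSL_VERSION


executable, openssl_version = ssl_status()
print(executable)
print(openssl_version or "this interpreter has no ssl module")
```

If the printed path is not the freshly installed interpreter, the python name on the PATH still points at the old build.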

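⑷ urllib2.urlopen reports an error under Python 2.7 on Linux; how do I fix it?

The Ubuntu console uses UTF-8 by default, and the page Google returns here is probably Big5-encoded. Try decoding it with the following code:
url = "URL"
content = urllib2.urlopen(url).read()
print content.decode('big5').encode('utf8')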
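The decode and re-encode step can be tried offline. In this Python 3 sketch the byte string raw is a stand-in for the result of urlopen(url).read(); the sample text is an assumption, not data from the original question.

```python
# Stand-in for the bytes a Big5-encoded page would return.
raw = "中文網頁".encode("big5")

text = raw.decode("big5")          # bytes -> str, using the page's encoding
utf8_bytes = text.encode("utf-8")  # str -> bytes, for a UTF-8 console or file

print(text)
print(len(raw), len(utf8_bytes))
```

If the guessed encoding is wrong, decode() raises UnicodeDecodeError; passing errors="replace" trades the exception for replacement characters.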

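⑸ How do I set up urllib2 with Python 2.7?

The most annoying thing about Python is its versions and configuration. Installing third-party packages in particular often produces baffling errors that are hard to understand.

So the only option is to keep switching back and forth between versions.

Today I was learning to write a Python crawler. Python 2.7 uses urllib and urllib2, and Python 3's urllib combines the two. For some reason I could not install the Python 3 urllib on my machine, which is frustrating; the attempt fails with an error.

Between Python 2.7 and Python 3, the biggest differences are in where modules live.

The differences between urllib and urllib2 in Python 2.7 are as follows:

urllib2.urlopen accepts a Request instance, which lets you set the headers of a URL request; urllib.urlopen accepts only a URL. This means you cannot use urllib to disguise your User-Agent string (that is, to pose as a browser).
urllib provides the urlencode function for building GET query strings, and urllib2 does not. That is why urllib is often used together with urllib2.
The main advantage of urllib2 is that urllib2.urlopen accepts a Request object as its argument, which gives control over the headers of the HTTP request.
However, urllib.urlretrieve and the quote and unquote family, such as urllib.quote, were not added to urllib2, so urllib is still needed at times.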
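The division of labor described above (urlencode builds the query string, Request carries the headers) looks like this in Python 3, where both halves live under the urllib package. The URL and the User-Agent value are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# urlencode builds the GET query string from a dict.
query = urlencode({"q": "python"})

# Request carries the custom headers; nothing is sent until urlopen(req).
req = Request(
    "http://www.example.com/search?" + query,
    headers={"User-Agent": "Mozilla/5.0 (compatible; ExampleBot/1.0)"},
)

print(req.full_url)
print(req.get_header("User-agent"))  # Request stores header names capitalized
print(req.get_method())
```

Passing req to urllib.request.urlopen would then send the request with the custom User-Agent.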

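⑹ Differences between Python's httplib, urllib, and urllib2, and how to use them

Overview
First, let's look at how they differ.
urllib and urllib2
urllib and urllib2 are both modules for making URL requests, but urllib2 can take a Request instance to set the headers of a URL request, while urllib accepts only a URL.
This means that with urllib you cannot disguise your User-Agent string.
urllib provides the urlencode function for building GET query strings, and urllib2 does not. That is why urllib is often used together with urllib2.
Most HTTP requests today are made through urllib2.

httplib
httplib implements the client side of the HTTP and HTTPS protocols. It is usually not used directly; the higher-level modules (urllib, urllib2) build on its HTTP implementation.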

Simple usage of urllib
urllib.urlopen(url[, data[, proxies]]):

import urllib
google = urllib.urlopen('')
print 'http header:\n', google.info()
print 'http status:', google.getcode()
print 'url:', google.geturl()
for line in google:  # just like working with a local file
    print line,
google.close()

For detailed usage, see
urllib study notes

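Simple usage of urllib2
The simplest form:
import urllib2
response = urllib2.urlopen('')
html = response.read()

The actual steps:
1. urllib2.Request() constructs a request; the returned req is a fully built request object.
2. urllib2.urlopen() sends the request req and returns a file-like object, response, that contains everything the server returned.
3. response.read() reads the HTML from the response, and response.info() returns additional information such as the headers.
For example:

#!/usr/bin/env python
import urllib2
req = urllib2.Request("")
response = urllib2.urlopen(req)
html = response.read()
print html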

Sometimes your program is correct but the server still refuses the request. Why? The problem lies in the request headers. Some servers are picky and do not like being accessed by programs. In that case you need to make your program pose as a browser; the client's identity is carried in the headers.
A common case:

import urllib
import urllib2
url = ''
# Put user_agent into the request headers
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'who', 'password': '123456'}
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

values is the POST data.
GET method
For example, when the target URL carries its search term in a wd query parameter, the dictionary {'wd': 'xxx'} has to be passed through urlencode:

# coding: utf-8
import urllib
import urllib2
url = ''
values = {'wd': 'D_in'}
data = urllib.urlencode(values)
print data
url2 = url + '?' + data
response = urllib2.urlopen(url2)
the_page = response.read()
print the_page

POST method

import urllib
import urllib2
url = ''
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  # goes into the request headers
values = {'name': 'who', 'password': '123456'}  # POST data
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)  # URL-encode the POST data
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

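Using urllib2 with cookies

# coding: utf-8
import urllib2, urllib
import cookielib

url = r''
# Placeholder credentials (hypothetical values); replace with real ones
email = 'user@example.com'
password = 'secret'

# Create cj, a container for cookies
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Encode the data to be POSTed
data = urllib.urlencode({"email": email, "password": password})
r = opener.open(url, data)
print cj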

Simple usage of httplib
A simple example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import httplib
import urllib

def sendhttp():
    data = urllib.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPConnection('bugs.python.org')
    conn.request('POST', '/', data, headers)
    httpres = conn.getresponse()
    print httpres.status
    print httpres.reason
    print httpres.read()

if __name__ == '__main__':
    sendhttp()

For detailed usage, see
the httplib module
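In Python 3.x, urllib and urllib2 were merged into the urllib package.
First, the imports change from
import urllib
import urllib2
to
import urllib.request
import urllib.parse
import urllib.error

Then the urllib2 names are used as follows:
urllib2.urlopen() becomes urllib.request.urlopen()
urllib2.Request() becomes urllib.request.Request()

urllib2.URLError becomes urllib.error.URLError
When you want to send a POST request with data, the encoding step in Python 2 is
urllib.urlencode(data)

while in Python 3 it becomes
urllib.parse.urlencode(data)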
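One detail is easy to miss when porting: in Python 3, urlencode returns a str, but the data argument of Request must be bytes. The sketch below shows the extra encode step; the login URL is a placeholder and the request is only built, not sent.

```python
import urllib.parse
import urllib.request

values = {"name": "who", "password": "123456"}

# urlencode gives a str; encode it to bytes before using it as POST data.
data = urllib.parse.urlencode(values).encode("ascii")
req = urllib.request.Request("http://www.example.com/login", data=data)

print(req.data)
print(req.get_method())  # supplying data switches the method to POST
```

Passing the unencoded str to urlopen instead raises a TypeError when the request is sent.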

Example script:
In Python 2:

import urllib
import urllib2
import json
from config import settings

def url_request(self, action, url, **extra_data):
    abs_url = "http://%s:%s/%s" % (settings.configs['Server'],
                                   settings.configs["ServerPort"],
                                   url)
    if action in ('get', 'GET'):
        print(abs_url, extra_data)
        try:
            req = urllib2.Request(abs_url)
            req_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            # print "-->server response:",callback
            return callback

        except urllib2.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):
        # print(abs_url,extra_data['params'])
        try:
            data_encode = urllib.urlencode(extra_data['params'])
            req = urllib2.Request(url=abs_url, data=data_encode)
            res_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = res_data.read()
            callback = json.loads(callback)
            print("\033[31;1m[%s]:[%s]\033[0m response:\n%s" % (action, abs_url, callback))
            return callback
        except Exception as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

In Python 3.x:

import urllib.error
import urllib.parse
import urllib.request
import json
from config import settings

def url_request(self, action, url, **extra_data):
    abs_url = 'http://%s:%s/%s/' % (settings.configs['Server'], settings.configs['ServerPort'], url)
    if action in ('get', 'GET'):  # GET request
        print(action, extra_data)
        try:
            req = urllib.request.Request(abs_url)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            return callback
        except urllib.error.URLError as e:
            exit("\033[31;1m%s\033[0m" % e)
    elif action in ('post', 'POST'):  # POST data to the server
        try:
            # Request data must be bytes in Python 3
            data_encode = urllib.parse.urlencode(extra_data['params']).encode('utf-8')
            req = urllib.request.Request(url=abs_url, data=data_encode)
            req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
            callback = req_data.read()
            callback = json.loads(callback.decode())
            return callback
        except urllib.error.URLError as e:
            print('---exec', e)
            exit("\033[31;1m%s\033[0m" % e)

The settings configuration is as follows:

configs = {
    'HostID': 2,
    "Server": "localhost",
    "ServerPort": 8000,
    "urls": {
        'get_configs': ['api/client/config', 'get'],  # acquire all the services that will be monitored
        'service_report': ['api/client/service/report/', 'post'],
    },
    'RequestTimeout': 30,
    'ConfigUpdateInterval': 300,  # 5 mins as default
}

⑺ Installing urllib with pip fails on Python 2.7

urllib is part of the standard library, so it does not need to be installed.

import urllib

res = urllib.urlopen('www')
print res.getcode()
for line in res:
    print line
res.close()

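⑻ What does the quote function do in Python, and how is it used?

quote is generally used to escape special characters in a URL, such as characters outside the ASCII set.

Location: in Python 2.7 the function lives in the urllib module; in Python 3 it moved one level down, into urllib.parse.

Purpose: it replaces special characters in a string with %xx escapes, where xx is the hexadecimal value of each byte of the character (encoded as UTF-8 by default in Python 3). ASCII letters, digits, and the characters '_.-' are never replaced. The second parameter, safe (default '/'), names additional characters that should not be replaced.

Example:

>>> urllib.quote('/test')
'/test'
>>> urllib.quote('/test', safe='')
'%2Ftest'  # 2F is the hex value of '/'

Other notes: the first parameter is the string to convert; in Python 3 it can be str or bytes.

The third parameter is the encoding; it exists only in Python 3.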
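The three parameters can be seen side by side in Python 3, where the function is urllib.parse.quote. This is an illustrative sketch; the sample strings are not from the original answer.

```python
from urllib.parse import quote, unquote

print(quote("/test"))           # '/' is in the default safe set, so it is kept
print(quote("/test", safe=""))  # with an empty safe set, '/' is escaped too

encoded = quote("中文")          # non-ASCII text is escaped byte by byte (UTF-8)
print(encoded)

# The third parameter selects a different encoding for the escaped bytes.
gbk_encoded = quote("中文", encoding="gbk")
print(unquote(gbk_encoded, encoding="gbk"))
```

unquote must be given the same encoding that quote used; otherwise the bytes decode to the wrong characters.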

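⑼ urllib cannot be used in Python 2.7 (see screenshot)

The name of your own file conflicts with the standard library. A script named urllib.py in the working directory shadows the standard urllib module, so rename it.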
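One way to check for this kind of shadowing is to ask Python where it would load the module from. The following Python 3 sketch uses a made-up helper name, module_origin; if the printed path points into your project directory instead of the standard library, a local file is shadowing the module.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


origin = module_origin("urllib")
print(origin)
```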

⑽ How to use urllib2 in Python

By default, urllib2 uses the http_proxy environment variable to set the HTTP proxy. To control the proxy explicitly in the program, independent of the environment, do the following:
import urllib2
enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'IP:8080'})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)
One detail to note: urllib2.install_opener() sets the global opener of urllib2. That is convenient for later calls, but it rules out finer-grained control, such as using two different proxy settings in the same program. A better practice is to leave the global setting alone and call the opener's open method directly instead of the global urlopen.
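The recommended practice (separate openers, no install_opener) can be sketched in Python 3, where the same names live in urllib.request. The proxy address 127.0.0.1:8080 and the helper proxies_of are assumptions for illustration; the code only builds and inspects the openers and sends nothing.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
proxy_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"}))

# An empty mapping disables proxies, ignoring http_proxy in the environment.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def proxies_of(opener):
    """Return the proxy mapping an opener will use ({} if it has none)."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.ProxyHandler):
            return handler.proxies
    return {}


print(proxies_of(proxy_opener))
print(proxies_of(direct_opener))
```

Each opener is then used on its own, for example proxy_opener.open(url) for traffic that should go through the proxy and direct_opener.open(url) for the rest, while the global urlopen stays untouched.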
