python34urllib

发布时间: 2023-08-14 13:27:52

Ⅰ python3.4 urllib.error.URLError: <urlopen error unknown url type:

你的http请求的链接有问题
链接是不是以协议的名称(http:/ /)开头

Ⅱ python3中使用urllib进行https请求

刚入门python学习网络爬虫基础，我使用的python版本是python3.6.4，学习的教程参考 Python爬虫入门教程

python3.6的版本已经没有urllib2这个库了，所以我也不需要纠结urllib和urllib2的区别和应用场景

参考这篇官方文档 HOWTO Fetch Internet Resources Using The urllib Package 。关于http(s)请求一般就get和post两种方式较为常用，所以写了以下两个小demo，url链接随便找的，具体场景具体变化，可参考注释中的基本思路

POST请求：

GET请求：

注意，
使用ssl创建未经验证的上下文，在urlopen中需传入上下文参数
urllib.request.urlopen(full_url, context=context)
这是Python 升级到 2.7.9 之后引入的一个新特性，所以在使用urlopen打开https链接会遇到如下报错：
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)
所以，当使用urllib.urlopen打开一个 https 链接时，需要先验证一次 SSL 证书
context = ssl._create_unverified_context()
或者或者导入ssl时关闭证书验证
ssl._create_default_https_context =ssl._create_unverified_context

Ⅲ python的httplib，urllib和urllib2的区别及用

宗述
首先来看一下他们的区别
urllib和urllib2
urllib 和urllib2都是接受URL请求的相关模块，但是urllib2可以接受一个Request类的实例来设置URL请求的headers，urllib仅可以接受URL。
这意味着，你不可以伪装你的User Agent字符串等。
urllib提供urlencode方法用来GET查询字符串的产生，而urllib2没有。这是为何urllib常和urllib2一起使用的原因。
目前的大部分http请求都是通过urllib2来访问的

httplib
httplib实现了HTTP和HTTPS的客户端协议，一般不直接使用，在更高层的封装模块中（urllib,urllib2）使用了它的http实现。

urllib简单用法
urllib.urlopen(url[, data[, proxies]]) :
[python] view plain

google = urllib.urlopen('')
print 'http header:/n', google.info()
print 'http status:', google.getcode()
print 'url:', google.geturl()
for line in google: # 就像在操作本地文件
print line,
google.close()

详细使用方法见
urllib学习

urllib2简单用法
最简单的形式
import urllib2
response=urllib2.urlopen(')
html=response.read()

实际步骤：
1、urllib2.Request()的功能是构造一个请求信息，返回的req就是一个构造好的请求
2、urllib2.urlopen()的功能是发送刚刚构造好的请求req，并返回一个文件类的对象response，包括了所有的返回信息。
3、通过response.read()可以读取到response里面的html，通过response.info()可以读到一些额外的信息。
如下：

#!/usr/bin/env python
import urllib2
req = urllib2.Request("")
response = urllib2.urlopen(req)
html = response.read()
print html

有时你会碰到，程序也对，但是服务器拒绝你的访问。这是为什么呢?问题出在请求中的头信息(header)。有的服务端有洁癖，不喜欢程序来触摸它。这个时候你需要将你的程序伪装成浏览器来发出请求。请求的方式就包含在header中。
常见的情形：

import urllib
import urllib2
url = '
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'# 将user_agent写入头信息
values = {'name' : 'who','password':'123456'}
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

values是post数据
GET方法
例如网络：
，这样我们需要将{‘wd’:’xxx’}这个字典进行urlencode

#coding:utf-8
import urllib
import urllib2
url = ''
values = {'wd':'D_in'}
data = urllib.urlencode(values)
print data
url2 = url+'?'+data
response = urllib2.urlopen(url2)
the_page = response.read()
print the_page

POST方法

import urllib
import urllib2
url = ''
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' //将user_agent写入头信息
values = {'name' : 'who','password':'123456'} //post数据
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values) //对post数据进行url编码
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

urllib2带cookie的使用

#coding:utf-8
import urllib2,urllib
import cookielib

url = r''

#创建一个cj的cookie的容器
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
#将要POST出去的数据进行编码
data = urllib.urlencode({"email":email,"password":pass})
r = opener.open(url,data)
print cj

httplib简单用法
简单示例

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import httplib
import urllib

def sendhttp():
data = urllib.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})
headers = {"Content-type": "application/x-www-form-urlencoded",
"Accept": "text/plain"}
conn = httplib.HTTPConnection('bugs.python.org')
conn.request('POST', '/', data, headers)
httpres = conn.getresponse()
print httpres.status
print httpres.reason
print httpres.read()

if __name__ == '__main__':
sendhttp()

具体用法见
httplib模块
python 3.x中urllib库和urilib2库合并成了urllib库。其中、
首先你导入模块由
import urllib
import urllib2
变成了
import urllib.request

然后是urllib2中的方法使用变成了如下
urllib2.urlopen()变成了urllib.request.urlopen()
urllib2.Request()变成了urllib.request.Request()

urllib2.URLError 变成了urllib.error.URLError
而当你想使用urllib 带数据的post请求时，
在python2中
urllib.urlencode(data)

而在python3中就变成了
urllib.parse.urlencode(data)

脚本使用举例：
python 2中

import urllib
import urllib2
import json
from config import settings
def url_request(self, action, url, **extra_data): abs_url = "http://%s:%s/%s" % (settings.configs['Server'],
settings.configs["ServerPort"],
url)
if action in ('get', 'GET'):
print(abs_url, extra_data)
try:
req = urllib2.Request(abs_url)
req_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
callback = req_data.read()
# print "-->server response:",callback
return callback

except urllib2.URLError as e:
exit("\033[31;1m%s\033[0m" % e)
elif action in ('post', 'POST'):
# print(abs_url,extra_data['params'])
try:
data_encode = urllib.urlencode(extra_data['params'])
req = urllib2.Request(url=abs_url, data=data_encode)
res_data = urllib2.urlopen(req, timeout=settings.configs['RequestTimeout'])
callback = res_data.read()
callback = json.loads(callback)
print("\033[31;1m[%s]:[%s]\033[0m response:\n%s" % (action, abs_url, callback))
return callback
except Exception as e:
print('---exec', e)
exit("\033[31;1m%s\033[0m" % e)

python3.x中

import urllib.request
import json
from config import settings

def url_request(self, action, url, **extra_data):
abs_url = 'http://%s:%s/%s/' % (settings.configs['ServerIp'], settings.configs['ServerPort'], url)
if action in ('get', 'Get'): # get请求
print(action, extra_data)try:
req = urllib.request.Request(abs_url)
req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
callback = req_data.read()
return callback
except urllib.error.URLError as e:
exit("\033[31;1m%s\033[0m" % e)
elif action in ('post', 'POST'): # post数据到服务器端
try:
data_encode = urllib.parse.urlencode(extra_data['params'])
req = urllib.request.Request(url=abs_url, data=data_encode)
req_data = urllib.request.urlopen(req, timeout=settings.configs['RequestTimeout'])
callback = req_data.read()
callback = json.loads(callback.decode())
return callback
except urllib.request.URLError as e:
print('---exec', e)
exit("\033[31;1m%s\033[0m" % e)

settings配置如下：

configs = {
'HostID': 2,
"Server": "localhost",
"ServerPort": 8000,
"urls": {

'get_configs': ['api/client/config', 'get'], #acquire all the services will be monitored
'service_report': ['api/client/service/report/', 'post'],

},
'RequestTimeout': 30,
'ConfigUpdateInterval': 300, # 5 mins as default

}

Ⅳ python3.4没有urllib2怎么办

python 3.x中urllib库和urilib2库合并成了urllib库。

其中urllib2.urlopen()变成了urllib.request.urlopen()

urllib2.Request()变成了urllib.request.Request()

Ⅳ python爬虫之urllib_get

from urllib import request
import ssl

url = ' http://www..com/'

"""
url, 请求的目标url地址
data=None,默认情况为咐胡None,表示发起的是一个get请求,不为None,则发起的是一个post请求
timeout=,设置请求的超时时间
cafile=None, 设置证书
capath=None, 设置证书路径
cadefault=False, 是否要使用默认证书（默认为False）
context=None:是一个ssl值,表示忽略ssl认证
"""

content = ssl._create_unverified_context()
response = request.urlopen(url,timeout=10,content=content)

code = response.status
print(code)

b_html = response.read()
print(type(b_html),len(b_html))

res_headers = response.getheaders()
print(res_headers)

cookie_data = response.getheader('Set-Cookie')
print(cookie_data)

reason = response.reason
print(reason)

str_html = b_html.decode('utf-8')
print(type(str_html))

with open('b_.page.html','w') as file:
# file.write(b_html)
file.write(str_html)

"""
url:发起请求的url地址
data=None, 默认情况为None,表示发起的是一个get请求,不为None,则发起的是一个post请求
headers={},设置请求头（headers对应的数据类型是一个字典）
origin_req_host=None, (指定发起请求的域)
unverifiable=False,忽略SSL认证
method=None：指定发起请求的方式
"""
req_header = {
'User-Agent'衡茄拦:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
req = request.Request(url,headers=req_header)

response = request.urlopen(req)
response.status
response.read()
response.getheaders()
response.getheader('Server')
response.reason

python2中:对于字符纳枯串和bytes类型的数据没有明显的区分

python3中:对于字符串和bytes类型的数据有明显的区分
将bytes类型的数据转换为字符串使用decode('编码类型')
将字符串转换为bytes类型的数据使用encode('编码类型')
bytearray和bytes类型的数据是有区别的：前者是可变的,后者是不可变的

Ⅵ 如何使用 python3+urllib 发送一个 application/json 的请求

head = {
'Accept': '*/*',
'Host': '',
'Connection': 'keep-alive',
'Content-Length': '245',
'Origin': '',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome'
'/51.0.2704.103 Safari/537.36',
'Content-Type': 'application/json',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Referer': '',
'Accept-Language': 'zh-CN,zh;q=0.8',
'Accept-Encoding': 'gzip, deflate'
}

cookie = cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
opener.addheader(head)
# 其他为了获取 cookie 的各种请求。
.....
.....
.....
parsed_request_data = json.mps(self.hotel_search_request_data).encode()
response = openner.open(self.hotel_search_url, parsed_request_data)

阅读全文

热点内容

我的世界服务器记分板排版发布：2025-07-03 11:39:22 浏览：568

安卓前期用什么处理器发布：2025-07-03 11:37:54 浏览：868

如何更换安卓手机内存发布：2025-07-03 11:18:52 浏览：57

魔兽清理缓存发布：2025-07-03 10:46:38 浏览：521

神州防火墙web怎么配置代码发布：2025-07-03 10:37:54 浏览：328

安卓看小说哪个软件免费又最好发布：2025-07-03 10:25:30 浏览：435

linuxprofile 发布：2025-07-03 10:25:29 浏览：719

存储蓝盘发布：2025-07-03 09:55:10 浏览：887

java必学发布：2025-07-03 09:21:57 浏览：447

go在线编译发布：2025-07-03 09:14:51 浏览：19

python34urllib

与python34urllib相关的资讯