Connecting Python to Hive

Published: 2022-09-18 08:29:20

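Ⅰ Hive reports an error when calling a UDF written in Python

I ran into this problem too; the cause is in the Python script. Whatever field delimiter the Hive table uses, the fields that reach the Python script are always separated by '\t', so try splitting on '\t' instead.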
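As a minimal sketch of that advice, the function below splits a row the way a script called through Hive's TRANSFORM receives it. The name `parse_row` is made up for illustration, and mapping the literal string `\N` to `None` assumes Hive's default way of passing NULL values to the script.

```python
def parse_row(line):
    """Split one line received from Hive TRANSFORM into a list of fields."""
    # Rows reach the script tab-separated, regardless of the table's own
    # field delimiter.
    fields = line.rstrip("\n").split("\t")
    # Assumption: NULLs arrive as the literal string \N; map them to None.
    return [None if field == r"\N" else field for field in fields]


print(parse_row("42\thello\t\\N\n"))
```

Splitting on the table's storage delimiter (for example '\001') would leave the whole row in a single field, which is a likely source of the reported error.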

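Ⅱ Problem exporting to Excel after processing Hive data in Python

Your raw data contains null values, and that is what causes the error. Fill in the missing values before writing or reading, or check whether each value is empty before operating on it.
Alternatively, wrap the operation in try/except so that bad rows are skipped explicitly.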
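Both suggestions can be sketched roughly as follows. The helper names `fill_missing` and `export_rows` are hypothetical, and `write_row` stands in for whatever call the script uses to write a row to Excel; the original question does not show that code.

```python
def fill_missing(rows, default=""):
    """Return a copy of rows with every None replaced by a default value."""
    return [tuple(default if value is None else value for value in row)
            for row in rows]


def export_rows(rows, write_row):
    """Call write_row for each row; collect rows that fail instead of aborting."""
    skipped = []
    for row in rows:
        try:
            write_row(row)
        except (TypeError, ValueError) as exc:
            skipped.append((row, exc))
    return skipped


rows = [("1", "alice"), ("2", None)]  # second row has a missing value
lines = []
skipped = export_rows(fill_missing(rows), lambda row: lines.append("\t".join(row)))
print(lines, skipped)
```

Filling the values first keeps every row in the output, while the try/except route drops the failing rows, so which one fits depends on whether the empty cells matter downstream.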

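Ⅲ How can Python read Hive data incrementally, using the previous run's result as the baseline and printing only the newly added rows?

1. Read the data from a text file (ending in .txt) or a log file (ending in .log); list0 and list1 hold the first and second columns of the file, respectively.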
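The answer above stops after its first step. One possible way to get the behaviour the question asks for is to keep the previous result in a baseline file and compare each new result against it. The sketch below assumes each row has already been turned into a single string without newlines; `print_new_rows` and the baseline file name are illustrative, not taken from the original.

```python
import os
import tempfile


def print_new_rows(current_rows, baseline_path):
    """Print and return the rows that are absent from the baseline file."""
    seen = set()
    if os.path.exists(baseline_path):
        with open(baseline_path, encoding="utf-8") as f:
            seen = {line.rstrip("\n") for line in f}
    added = []
    for row in current_rows:
        if row not in seen:
            seen.add(row)
            added.append(row)
            print(row)
    # Append the new rows so the next run treats them as already seen.
    with open(baseline_path, "a", encoding="utf-8") as f:
        for row in added:
            f.write(row + "\n")
    return added


# Two simulated runs against the same baseline file.
with tempfile.TemporaryDirectory() as tmp:
    baseline = os.path.join(tmp, "baseline.txt")
    print_new_rows(["1\ta", "2\tb"], baseline)          # first run: both rows are new
    print_new_rows(["1\ta", "2\tb", "3\tc"], baseline)  # second run: only the third row
```

In a real script `current_rows` would be built from the rows returned by the Hive query.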

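Ⅳ Connecting Python to a MySQL database: what do the cursor(), execute(), and fetchall() methods do?

What does cursor() do? It returns a cursor object for performing operations.
What does execute() do? It executes SQL; the argument in parentheses is the SQL statement.
What does fetchall() do? It returns all the rows matched by the query.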
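The three calls can be seen together in the short sketch below. It uses the standard-library `sqlite3` module in place of a MySQL driver so that it runs without a server; that substitution is an assumption made for illustration, relying on both following the Python DB-API. The table `t` is made up.

```python
import sqlite3

# In-memory database standing in for a MySQL connection.
conn = sqlite3.connect(":memory:")

cur = conn.cursor()                      # cursor(): get a cursor to work with
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.execute("INSERT INTO t VALUES (1, 'a')")
cur.execute("INSERT INTO t VALUES (2, 'b')")
cur.execute("SELECT id, name FROM t ORDER BY id")  # execute(): run one SQL statement
rows = cur.fetchall()                    # fetchall(): all rows of the last query
print(rows)  # [(1, 'a'), (2, 'b')]

conn.close()
```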

Ⅴ Connecting Python to Hive: how to install thrifthive

Starting HiveServer2

Launching HiveServer2

Starting HiveServer2 is very simple:

$ $HIVE_HOME/bin/hiveserver2

or

$ $HIVE_HOME/bin/hive --service hiveserver2

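By default, HiveServer2's Thrift service listens on port 10000 and its Web UI on port 10002. You can open http://localhost:10002 to view the HiveServer2 Web UI, which shows some basic information about Hive. If the Web UI cannot be opened, HiveServer2 did not start successfully.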
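Besides opening the Web UI in a browser, a quick way to see whether anything is listening on the two default ports is a plain TCP connection attempt. This is only a sketch: `port_is_open` is a hypothetical helper, and an open port shows that some process accepts connections there, not that it is HiveServer2.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports named in the text above.
for name, port in (("Thrift", 10000), ("Web UI", 10002)):
    state = "open" if port_is_open("localhost", port) else "closed"
    print(name, port, state)
```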

Testing the client connection with beeline

Once HiveServer2 is running, we can connect to it with beeline, the client tool that ships with Hive.

$ $HIVE_HOME/bin/beeline

beeline> !connect jdbc:hive2://localhost:10000

After a successful login the following prompt appears, and you can start writing HQL statements.

0: jdbc:hive2://localhost:10000>

Error: User: xxx is not allowed to impersonate anonymous

When connecting to HiveServer2 from beeline with !connect, you may see the following error message:

Caused by: org.apache.hadoop.ipc.RemoteException:
User: xxx is not allowed to impersonate anonymous

Here xxx is my operating system user name. The fix is to add proxy-user settings for xxx to Hadoop's core-site.xml file:

<property>
  <name>hadoop.proxyuser.xxx.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.xxx.hosts</name>
  <value>*</value>
</property>

After restarting HDFS, beeline can connect to HiveServer2 successfully.

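Common configuration

For HiveServer2 configuration, see the official document "Setting Up HiveServer2".

Some commonly used hive-site.xml settings:

hive.server2.thrift.port: the TCP port to listen on. Defaults to 10000.

hive.server2.thrift.bind.host: the host the TCP interface binds to.

hive.server2.authentication: the authentication mode. Defaults to NONE (which uses plain SASL), meaning no authentication check is performed. Other options are NOSASL, KERBEROS, LDAP, PAM, and CUSTOM.

hive.server2.enable.doAs: whether queries are executed as the impersonated connecting user. Defaults to true.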
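To see which of these values a given hive-site.xml actually sets, the file can be read with the standard library. The sketch below is an illustration only: `read_hive_settings` is a hypothetical helper, the sample XML is made up, and the fallback values are the defaults quoted in the list above.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the list above (bind.host has no stated default).
DEFAULTS = {
    "hive.server2.thrift.port": "10000",
    "hive.server2.authentication": "NONE",
    "hive.server2.enable.doAs": "true",
}


def read_hive_settings(xml_text, defaults=DEFAULTS):
    """Return the listed settings, falling back to defaults when unset."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in defaults:
            found[name] = (prop.findtext("value") or "").strip()
    return {name: found.get(name, default) for name, default in defaults.items()}


# Made-up sample that overrides only the port.
sample = """<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>
</configuration>"""

print(read_hive_settings(sample))
```

For a real file, pass the file's contents read from disk as `xml_text`.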

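Connecting to HiveServer2 from a Python client

There are three Python clients for connecting to HiveServer2: pyhs2, pyhive, and impyla. The example on the official site uses pyhs2, but the pyhs2 project has announced that it is no longer supported and recommends impyla or pyhive instead. We use impyla here.

Installing impyla

impyla's required dependencies are:

six
bitarray
thriftpy (thrift on Python 2.x)

To support Hive, two more packages are needed:

sasl
thrift_sasl

The source code for impyla and its dependencies can be downloaded from PyPI.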

impyla example

The following example connects to HiveServer2 with impyla:

from impala.dbapi import connect

conn = connect(host='127.0.0.1', port=10000, database='default', auth_mechanism='PLAIN')

cur = conn.cursor()

cur.execute('SHOW DATABASES')
print(cur.fetchall())

cur.execute('SHOW TABLES')
print(cur.fetchall())

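Ⅵ How to connect Jupyter to Hive

The statement from impala.dbapi import connect reports that the module cannot be found.
When installing the packages, refer to the article "Pitfalls of connecting to Hive with impyla on Python 3 on Win7", and pay particular attention to the installation order and the matching versions of the packages.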
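When the import fails inside Jupyter, it can help to check which of the relevant modules the running kernel can actually see. The sketch below is one way to do that; the module list is an assumption based on the usual impyla dependencies (the package impyla is imported as `impala`), and `missing_modules` is a made-up helper name.

```python
import importlib.util

# Import names assumed for impyla and its usual dependencies.
REQUIRED_MODULES = ["impala", "six", "bitarray", "thrift", "sasl", "thrift_sasl"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_modules(REQUIRED_MODULES))
```

Run in a notebook cell, this reports on the kernel's own environment, which may differ from the one where the packages were installed.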

Ⅶ How to connect to a Hive database from Python on Windows

#!/usr/bin/python2.7
# hive --service hiveserver >/dev/null 2>/dev/null &
# /opt/cloudera/parcels/CDH/lib/hive/lib/py
import sys

# Talk to hiveserver from Python
sys.path.append('C:/hadoop_jar/py')
from hive_service import ThriftHive
# from hive_service.  (import truncated in the source)
from thrift.transport import TSocket
from thrift import Thrift
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

if __name__ == '__main__':
    try:
        socket = TSocket.TSocket('10.70.50.111', 10000)
        transport = TTransport.TBufferedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        sql = 'select * from test'
        transport.open()
        client.execute(sql)
        with open('C:/Users/DWJ/Desktop/python2hive.txt', 'w') as out_file:
            # Fetch each row exactly once and stop at the first empty result.
            while True:
                row = client.fetchOne()
                if not row:
                    break
                out_file.write(row + '\n')  # one row per line
        transport.close()
    except Thrift.TException as tx:
        print('%s' % (tx.message))

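Here, the packages in C:/hadoop_jar/py come from the py directory bundled with the Hive installation, for example /opt/cloudera/parcels/CDH/lib/hive/lib/py; just add that directory to Python's path.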

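Ⅷ Does connecting to Hive from Python require the sasl library?

Clients connect to Hive through HiveServer2. HiveServer2 is a rewrite of HiveServer, which could not handle concurrent requests from multiple clients. HiveServer2 is currently implemented on Thrift RPC and is designed to give better support to open API clients such as JDBC and ODBC. HiveServer2 was introduced in Hive 0.11.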

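Connecting to HiveServer2 from a Python client

Installing impyla

impyla's required dependencies are:

six
bitarray
thriftpy (thrift on Python 2.x)

To support Hive, two more packages are needed:

sasl
thrift_sasl

The source code for impyla and its dependencies can be downloaded from PyPI.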

Ⅸ How to call a Python function from Hive

ADD FILE /home/taobao/dw_hive/hivelets/smoking/ext/tsa/hivesql/bjx_topic_t1/splitsysin.py.bak;
create table if not exists splittest_t1
(
topic_id string,
topic_title string,
topic_desc string,
biz_date string,
gmt_create string
) PARTITIONED BY(pt string)
row format delimited fields terminated by '\001'
lines terminated by '\n'
STORED AS textfile;

select TRANSFORM(topic_id,topic_title,topic_desc,biz_date,gmt_create)
USING 'splitsysin.py'
as topic_id,topic_title,topic_desc,biz_date,gmt_create
from r_bjx_dim_topic_t1;
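The contents of splitsysin.py are not shown above. As a hypothetical sketch of what a script used with TRANSFORM generally looks like, the code below reads tab-separated rows from standard input and writes the same five columns back to standard output. The whitespace trimming on topic_title is a placeholder for real processing, and `transform` and `main` are illustrative names.

```python
import io
import sys

COLUMNS = ("topic_id", "topic_title", "topic_desc", "biz_date", "gmt_create")


def transform(stdin, stdout):
    """Read tab-separated rows, process them, and write tab-separated rows."""
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip rows that do not have the expected five columns
        row = dict(zip(COLUMNS, fields))
        # Placeholder processing step: trim whitespace around the title.
        row["topic_title"] = row["topic_title"].strip()
        stdout.write("\t".join(row[name] for name in COLUMNS) + "\n")


def main():
    # Entry point when Hive runs the script: rows in on stdin, out on stdout.
    transform(sys.stdin, sys.stdout)


# Local demonstration with in-memory streams instead of Hive.
demo_out = io.StringIO()
transform(io.StringIO("7\t  Hive tips \tdesc\t2022-09-18\t2022-09-18 08:29:20\n"), demo_out)
print(repr(demo_out.getvalue()))
```

When deployed as the script named in USING, the file would call main() at the bottom rather than run the in-memory demonstration.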
