Connecting Python to Hive

Published: 2022-09-18 08:29:20

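Ⅰ Error when Hive calls a UDF written in Python

I ran into this problem too, and the cause was the Python script. Whatever field delimiter the Hive table uses, the rows that reach the Python script are delimited by '\t', so split on '\t' in the script and try again.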
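A minimal sketch of why the delimiter matters, assuming a made-up sample row of the kind Hive would send to the script. Splitting on a guessed table delimiter leaves the row as one field, while splitting on the tab recovers the columns.

```python
# Sample row, invented for illustration: three columns separated by tabs.
line = "1001\tfirst topic\t2022-09-18\n"

wrong = line.rstrip("\n").split(",")   # table delimiter guessed as a comma
right = line.rstrip("\n").split("\t")  # delimiter actually used for the script input

print(len(wrong), len(right))
```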

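Ⅱ Problem exporting to Excel after processing Hive data in Python

Your source data contains null values, and that is what causes the error. Fill in the missing values before writing or reading, or check whether each value is empty before operating on it.
Otherwise, wrap the operation in try/except and skip the bad rows explicitly.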
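Both suggestions can be sketched in plain Python. The helper names and sample rows below are hypothetical; the first function fills missing values before the rows are written out, and the second skips values that cannot be processed.

```python
def fill_missing(rows, default=""):
    """Return a copy of rows with every None replaced by default."""
    return [[default if value is None else value for value in row] for row in rows]


def total_amounts(rows, column):
    """Sum a numeric column, skipping values that cannot be converted."""
    total = 0.0
    for row in rows:
        try:
            total += float(row[column])
        except (TypeError, ValueError):
            continue  # null or malformed value: skip it instead of failing
    return total


rows = [["a", 1.5], ["b", None], ["c", "2"]]
print(fill_missing(rows))
print(total_amounts(rows, 1))
```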

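Ⅲ How to read Hive data incrementally in Python, using the previous run's result as the baseline and printing only the newly added rows

1. Read the data from a text file (ending in .txt) or a log file (ending in .log); list0 and list1 hold the first and second columns of the file, respectively.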
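The answer stops after its first step, so the following is only one possible sketch of the idea. It assumes the query result has been exported to a whitespace-delimited text file and that the previous export is kept as the baseline; the function names and file layout are assumptions, not part of the original answer.

```python
import os


def read_columns(path):
    """Read a whitespace-delimited file into list0 (column 1) and list1 (column 2)."""
    list0, list1 = [], []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2:
                list0.append(parts[0])
                list1.append(parts[1])
    return list0, list1


def new_rows(current_path, baseline_path):
    """Return the rows of current_path that do not appear in baseline_path."""
    seen = set()
    if os.path.exists(baseline_path):
        base0, base1 = read_columns(baseline_path)
        seen = set(zip(base0, base1))
    cur0, cur1 = read_columns(current_path)
    return [row for row in zip(cur0, cur1) if row not in seen]
```

After printing the new rows, a script would overwrite the baseline file with the current export so that the next run compares against it.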

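Ⅳ Connecting Python to a MySQL database: what the cursor(), execute() and fetchall() methods do

What does cursor() do? It returns a cursor object used to operate on the database.
What does execute() do? It runs SQL; the argument in the parentheses is the SQL statement.
What does fetchall() do? It returns all rows returned by the query.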
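The three calls can be tried without a MySQL server. The sketch below uses the standard library's sqlite3 module as a stand-in, since it follows the same Python DB-API; the table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, stand-in for MySQL
cur = conn.cursor()                 # obtain a cursor

# execute() runs one SQL statement; executemany() repeats it per parameter tuple
cur.execute("CREATE TABLE test (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO test VALUES (?, ?)", [(1, "a"), (2, "b")])

cur.execute("SELECT id, name FROM test ORDER BY id")
rows = cur.fetchall()               # all rows of the last query, as tuples
print(rows)

conn.close()
```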

Ⅴ Connecting Python to Hive: how to install thrifthive

Starting HiveServer2

Starting HiveServer2 is straightforward:

$ $HIVE_HOME/bin/hiveserver2

or

$ $HIVE_HOME/bin/hive --service hiveserver2

By default, HiveServer2's Thrift port is 10000 and its web UI port is 10002. Open http://localhost:10002 to view the HiveServer2 web UI, which shows some basic information about Hive. If the web UI cannot be opened, HiveServer2 has not started successfully.
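A quick way to check the two default ports from Python is to attempt a TCP connection. This is a minimal sketch; is_port_open is a hypothetical helper, and an open port only shows that something is listening, not that HiveServer2 is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in (("Thrift", 10000), ("Web UI", 10002)):
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"{name} port {port}: {state}")
```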

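Testing the client connection with beeline

Once HiveServer2 is running, you can connect to it with beeline, the client tool that ships with Hive.

$ $HIVE_HOME/bin/beeline

beeline > !connect jdbc:hive2://localhost:10000

On a successful login the following prompt appears, and you can start writing HQL statements.

0: jdbc:hive2://localhost:10000>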

Error: User: xxx is not allowed to impersonate anonymous

When connecting to HiveServer2 from beeline with !connect, the following error may appear:

Caused by: org.apache.hadoop.ipc.RemoteException:
User: xxx is not allowed to impersonate anonymous

Here xxx is my operating system user name. The fix is to add proxy-user settings for xxx to Hadoop's core-site.xml:

<property>
  <name>hadoop.proxyuser.xxx.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.xxx.hosts</name>
  <value>*</value>
</property>

After restarting HDFS, beeline can connect to HiveServer2 successfully.

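Common configuration

For HiveServer2 configuration, see the official document "Setting Up HiveServer2".

Some commonly used hive-site.xml settings:

hive.server2.thrift.port: the TCP port to listen on. The default is 10000.

hive.server2.thrift.bind.host: the host that the TCP interface binds to.

hive.server2.authentication: the authentication mode. The default is NONE (uses plain SASL), meaning no authentication check is performed. The other options are NOSASL, KERBEROS, LDAP, PAM and CUSTOM.

hive.server2.enable.doAs: whether queries are executed as the connecting user (impersonation). The default is true.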
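To see which of these settings a given hive-site.xml actually overrides, the file can be parsed with the standard library. The sketch below works on an inline sample rather than a real file; the sample contents and the read_properties helper are assumptions for illustration, and the fallback values are the defaults listed above.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the usual Hadoop configuration layout.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Map each property <name> to its <value>."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


settings = read_properties(SAMPLE)
port = int(settings.get("hive.server2.thrift.port", "10000"))
do_as = settings.get("hive.server2.enable.doAs", "true")  # not set in the sample
print(port, settings["hive.server2.authentication"], do_as)
```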

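Connecting to HiveServer2 from a Python client

There are three Python clients for connecting to HiveServer2: pyhs2, pyhive and impyla. The official example uses pyhs2, but the pyhs2 project has announced that it is no longer supported and recommends impyla or pyhive instead. Here we use impyla.

Installing impyla

impyla's required dependencies are:

six
bitarray
thriftpy (thrift on Python 2.x)

Hive support needs two more packages:

sasl
thrift_sasl

The source for impyla and its dependencies can be downloaded from PyPI.

impyla example

Here is an example of connecting to HiveServer2 with impyla: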

from impala.dbapi import connect

conn = connect(host='127.0.0.1', port=10000, database='default', auth_mechanism='PLAIN')

cur = conn.cursor()

cur.execute('SHOW DATABASES')
print(cur.fetchall())

cur.execute('SHOW TABLES')
print(cur.fetchall())

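Ⅵ How to connect Jupyter to Hive

"from impala.dbapi import connect" reports that the module cannot be found.
When installing the packages, refer to the article "Pitfalls of connecting to Hive with impyla on Python 3 under Win7", and pay particular attention to the installation order and the matching versions.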
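Before reinstalling anything, it can help to check from inside the notebook which modules the running interpreter cannot find. This is a small sketch; missing_modules is a hypothetical helper, and the import names in REQUIRED are assumed from impyla's dependency list.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from impyla's dependency list.
REQUIRED = ["impala", "thrift_sasl", "sasl", "six", "bitarray"]

print("interpreter:", sys.executable)
print("missing:", missing_modules(REQUIRED))
```

If the interpreter path printed here differs from the one the packages were installed into, the notebook kernel is running a different Python environment.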

Ⅶ How to connect to a Hive database from Python on Windows

#!/usr/bin/python2.7
# hive --service hiveserver >/dev/null 2>/dev/null &
# /opt/cloudera/parcels/CDH/lib/hive/lib/py
import sys

# Talk to hiveserver from Python
sys.path.append('C:/hadoop_jar/py')
from hive_service import ThriftHive
# from hive_service.
from thrift.transport import TSocket
from thrift import Thrift
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

if __name__ == '__main__':
    try:
        socket = TSocket.TSocket('10.70.50.111', 10000)
        transport = TTransport.TBufferedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        sql = 'select * from test'
        transport.open()
        client.execute(sql)
        with open('C:/Users/DWJ/Desktop/python2hive.txt', 'w') as out_file:
            # Fetch each row exactly once; calling fetchOne() in both the
            # loop condition and the body would drop every other row.
            row = client.fetchOne()
            while row:
                out_file.write(row)
                row = client.fetchOne()
        transport.close()
    except Thrift.TException as tx:
        print('%s' % (tx.message))

The packages in C:/hadoop_jar/py come from the py directory bundled with the Hive installation, for example /opt/cloudera/parcels/CDH/lib/hive/lib/py; add that directory to Python's path.

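Ⅷ Does connecting Python to Hive require the sasl library?

Clients connect to Hive through HiveServer2. HiveServer2 is a rewrite of HiveServer, which could not handle concurrent requests from multiple clients. HiveServer2 is currently implemented on top of Thrift RPC and is designed to better support open API clients such as JDBC and ODBC. It was introduced in Hive 0.11.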


Connecting to HiveServer2 from a Python client

Installing impyla

impyla's required dependencies are:

six
bitarray
thriftpy (thrift on Python 2.x)

Hive support needs two more packages:

sasl
thrift_sasl

The source for impyla and its dependencies can be downloaded from PyPI.

Ⅸ How to call a Python function in Hive

ADD FILE /home/taobao/dw_hive/hivelets/smoking/ext/tsa/hivesql/bjx_topic_t1/splitsysin.py.bak;
create table if not exists splittest_t1
(
topic_id string,
topic_title string,
topic_desc string,
biz_date string,
gmt_create string
) PARTITIONED BY(pt string)
row format delimited fields terminated by '\001'
lines terminated by '\n'
STORED AS textfile;

select TRANSFORM(topic_id,topic_title,topic_desc,biz_date,gmt_create)
USING 'splitsysin.py'
as topic_id,topic_title,topic_desc,biz_date,gmt_create
from r_bjx_dim_topic_t1;
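The contents of splitsysin.py are not shown above. As a rough sketch of the shape such a TRANSFORM script usually takes, the code below reads tab-delimited rows from an input stream and writes tab-delimited rows to an output stream. The whitespace-trimming step is a placeholder assumption, not the original script's logic, and the sample row is made up.

```python
import io


def transform(lines, out):
    """Read tab-delimited rows, process each field, write tab-delimited rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Placeholder processing step (assumption): trim surrounding whitespace.
        cleaned = [field.strip() for field in fields]
        out.write("\t".join(cleaned) + "\n")


# Demonstration on an in-memory stream with five columns, as in the query above.
sample = io.StringIO("1\t title \tdesc\t2022-09-18\t2022-09-18 08:00:00\n")
result = io.StringIO()
transform(sample, result)
print(result.getvalue(), end="")
```

In a script run by Hive, the call would be transform(sys.stdin, sys.stdout) after importing sys, since Hive passes the selected columns on standard input and reads the result columns from standard output.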
