當前位置:首頁 » 編程語言 » python貝葉斯分類

python貝葉斯分類

發布時間: 2024-01-21 16:33:59

1. python分類演算法有哪些

常見的分類演算法有:

  • K近鄰演算法

  • 決策樹

  • 樸素貝葉斯

  • SVM

  • Logistic Regression

2. python scikit-learn 有什麼演算法

1,前言

很久不發文章,主要是Copy別人的總感覺有些不爽,所以整理些干貨,希望相互學習吧。不啰嗦,進入主題吧,本文主要時說的為樸素貝葉斯分類演算法。與邏輯回歸,決策樹一樣,是較為廣泛使用的有監督分類演算法,簡單且易於理解(號稱十大數據挖掘演算法中最簡單的演算法)。但其在處理文本分類,郵件分類,拼寫糾錯,中文分詞,統計機器翻譯等自然語言處理范疇較為廣泛使用,或許主要得益於基於概率理論,本文主要為小編從理論理解到實踐的過程記錄。

2,公式推斷

一些貝葉斯定理預習知識:我們知道當事件A和事件B獨立時,P(AB)=P(A)(B),但如果事件不獨立,則P(AB)=P(A)P(B|A)。為兩件事件同時發生時的一般公式,即無論事件A和B是否獨立。當然也可以寫成P(AB)=P(B)P(A|B),表示若要兩件事同事發生,則需要事件B發生後,事件A也要發生。

由上可知,P(A)P(B|A)= P(B)P(A|B)

推出P(B|A)=

其中P(B)為先驗概率,P(B|A)為B的後驗概率,P(A|B)為A的後驗概率(在這里也為似然值),P(A)為A的先驗概率(在這也為歸一化常量)。

由上推導可知,其實樸素貝葉斯法就是在貝葉斯定理基礎上,加上特徵條件獨立假設,對特定輸入的X(樣本,包含N個特徵),求出後驗概率最大值時的類標簽Y(如是否為垃圾郵件),理解起來比邏輯回歸要簡單多,有木有,這也是本演算法優點之一,當然運行起來由於得益於特徵獨立假設,運行速度也更快。

8. Python代碼

# -*-coding: utf-8 -*-

importtime

fromsklearn import metrics

fromsklearn.naive_bayes import GaussianNB

fromsklearn.naive_bayes import MultinomialNB

fromsklearn.naive_bayes import BernoulliNB

fromsklearn.neighbors import KNeighborsClassifier

fromsklearn.linear_model import LogisticRegression

fromsklearn.ensemble import RandomForestClassifier

fromsklearn import tree

fromsklearn.ensemble import GradientBoostingClassifier

fromsklearn.svm import SVC

importnumpy as np

importurllib

# urlwith dataset

url ="-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

#download the file

raw_data= urllib.request.urlopen(url)

#load the CSV file as a numpy matrix

dataset= np.loadtxt(raw_data, delimiter=",")

#separate the data from the target attributes

X =dataset[:,0:7]

#X=preprocessing.MinMaxScaler().fit_transform(x)

#print(X)

y =dataset[:,8]

print(" 調用scikit的樸素貝葉斯演算法包GaussianNB ")

model= GaussianNB()

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的樸素貝葉斯演算法包MultinomialNB ")

model= MultinomialNB(alpha=1)

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的樸素貝葉斯演算法包BernoulliNB ")

model= BernoulliNB(alpha=1,binarize=0.0)

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的KNeighborsClassifier ")

model= KNeighborsClassifier()

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的LogisticRegression(penalty='l2')")

model= LogisticRegression(penalty='l2')

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的RandomForestClassifier(n_estimators=8) ")

model= RandomForestClassifier(n_estimators=8)

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的tree.DecisionTreeClassifier()")

model= tree.DecisionTreeClassifier()

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的GradientBoostingClassifier(n_estimators=200) ")

model= GradientBoostingClassifier(n_estimators=200)

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

print(" 調用scikit的SVC(kernel='rbf', probability=True) ")

model= SVC(kernel='rbf', probability=True)

start_time= time.time()

model.fit(X,y)

print('training took %fs!' % (time.time() - start_time))

print(model)

expected= y

predicted= model.predict(X)

print(metrics.classification_report(expected,predicted))

print(metrics.confusion_matrix(expected,predicted))

"""

# 預處理代碼集錦

importpandas as pd

df=pd.DataFrame(dataset)

print(df.head(3))

print(df.describe())##描述性分析

print(df.corr())##各特徵相關性分析

##計算每行每列數據的缺失值個數

defnum_missing(x):

return sum(x.isnull())

print("Missing values per column:")

print(df.apply(num_missing, axis=0)) #axis=0代表函數應用於每一列

print(" Missing values per row:")

print(df.apply(num_missing, axis=1).head()) #axis=1代表函數應用於每一行"""

熱點內容
黑漫的伺服器ip 發布:2025-01-23 03:16:40 瀏覽:650
tplink無internet訪問 發布:2025-01-23 03:15:18 瀏覽:566
原神用安卓手機玩為什麼畫質那麼低 發布:2025-01-23 03:09:31 瀏覽:847
空調壓縮機是外機嗎 發布:2025-01-23 03:09:31 瀏覽:950
大學資料庫學 發布:2025-01-23 02:54:30 瀏覽:588
部隊營區監控系統錄像存儲多少天 發布:2025-01-23 02:49:26 瀏覽:523
oraclelinux用戶名和密碼 發布:2025-01-23 02:43:06 瀏覽:404
安卓手機主頁滑動屏幕怎麼設置 發布:2025-01-23 02:41:15 瀏覽:225
小臉解壓 發布:2025-01-23 02:24:17 瀏覽:368
網易電腦版我的世界布吉島伺服器 發布:2025-01-23 02:20:17 瀏覽:985