Writing data with Python

Published: 2024-06-20 04:58:21

1. How to write a set of data to a file with Python

I. Appending data to a file

For example: append 123 to the end of the file 1.txt

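def init():
    # 'r+' opens an existing file for reading and writing without truncating it
    with open('1.txt', 'r+') as text:
        text.read()        # reading to the end moves the file position to EOF
        text.write('123')  # so the write lands after the existing content
    # no explicit close() is needed: the with block closes the file

init()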

II. Overwriting a file with data

Overwrite 1.txt with 123; everything that was in 1.txt before is lost.

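def init():
    # 'w' truncates the file on open, so the old content is discarded
    # ('r+' would only overwrite the first three characters and keep the rest)
    with open('1.txt', 'w') as text:
        text.write('123')

init()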
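To make the difference between the modes concrete, here is a minimal sketch that writes '123' into a file containing 'abcdef' under three modes. The helper name demo_modes and the temporary file are assumptions made for illustration, not part of the answer above.

```python
import os
import tempfile

def demo_modes():
    """Return the file contents after writing '123' in three different modes."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "1.txt")
        for mode in ("a", "w", "r+"):
            # Start each run from the same known content.
            with open(path, "w") as f:
                f.write("abcdef")
            with open(path, mode) as f:
                f.write("123")
            with open(path) as f:
                results[mode] = f.read()
    return results

print(demo_modes())
# {'a': 'abcdef123', 'w': '123', 'r+': '123def'}
```

Mode 'a' always appends, 'w' discards the old content, and 'r+' writes over the beginning of the file unless the position is moved to the end first.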

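2. How to write data into a document in Python, starting from the third-to-last line

Read each line of the text into a list, then insert directly with the list's insert() method.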

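# Read every line of the file into a list
file = open("C:/a.txt", "r")
li = []
while 1:
    line = file.readline()
    if line:
        li.append(line)
    else:
        break

file.close()
lines = len(li)
'''
Insert the desired string starting at the third-to-last line; separate the
text of each line with a newline character. If there is a lot to write, the
data can be read in from an external file.
'''
li.insert(lines-3, "1st row you want to write\n2nd row you want to write\n")

# Write the modified list back to the same file
file = open("C:/a.txt", "w")
for line in li:
    file.write(line)
file.close()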
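The same idea can be wrapped in a reusable function. This is a sketch under assumptions: the function name insert_before_last_lines, its offset parameter, and the temporary demo file are all hypothetical, and short files are handled by clamping the position to 0 (a choice the original answer does not make).

```python
import os
import tempfile

def insert_before_last_lines(path, new_lines, offset=3):
    """Insert new_lines so they start at the offset-th line from the end."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    # Clamp at 0 so short files get the new text at the top instead of
    # wrapping around to a negative index.
    position = max(len(lines) - offset, 0)
    lines[position:position] = [line + "\n" for line in new_lines]
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("one\ntwo\nthree\nfour\nfive\n")
    insert_before_last_lines(path, ["NEW 1", "NEW 2"])
    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result)
```

With five lines in the file, the new text lands just before "three", which is the third line from the end.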

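3. How to write data to a CSV file with Python

If your data is a list, you can iterate over it and write each item to the file, adding the necessary delimiters to form a CSV file.
If the data is a dictionary, consider using a newline or another special character to separate the dictionary entries (each with its key and value), and a different delimiter, one not already in use, to separate each key from its value.
The result is a CSV file (a file whose values are separated by a delimiter).
To add data to an existing CSV file:
1. Open the CSV file in a read/write append mode.
2. Move to the end of the file.
3. Add the data at the end, using the same delimiter format the CSV file already uses.
4. Close the file.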
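The steps above can be sketched with the standard library's csv module, which handles the delimiters and opens at the end of the file in 'a' mode. The helper names append_rows and append_dicts and the sample data are assumptions for illustration only.

```python
import csv
import os
import tempfile

def append_rows(path, rows):
    """Append list-style rows to a CSV file, creating it if needed."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

def append_dicts(path, fieldnames, records):
    """Append dict-style records, writing each value under its key's column."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writerows(records)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    append_rows(path, [["name", "price"], ["apple", 3]])
    append_dicts(path, ["name", "price"], [{"name": "pear", "price": 5}])
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)
```

Opening in 'a' mode covers steps 1 and 2 at once, and the with block takes care of step 4.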

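4. How to work with data in Python

pandas is the library of choice for the rest of this book. It meets the following needs:

Data structures with automatic or explicit alignment of data along an axis. This prevents many common errors caused by misaligned data and by data from different sources that are indexed differently.
Integrated time series functionality.
Data structures that handle both time series and non-time-series data.
Arithmetic operations and reductions (such as summing along an axis) that can be carried out according to the metadata (axis labels).
Flexible handling of missing data.
Merges and other relational operations found in common databases (SQL-based ones, for example).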

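1) Introduction to pandas data structures

There are two data structures: Series and DataFrame. A Series is an object similar to a one-dimensional NumPy array; it consists of a set of data (of any NumPy data type) and an associated set of data labels (the index). The index and values attributes give the index and the values respectively. If no index is specified, an integer index from 0 to N-1 is created automatically.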
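A short sketch of the Series behaviour just described; the numbers and labels are made up for illustration.

```python
import pandas as pd

# Without an explicit index, pandas assigns integer labels 0..N-1.
s_default = pd.Series([4, 7, -5, 3])
print(list(s_default.index))   # [0, 1, 2, 3]
print(s_default.values)        # the underlying NumPy array

# With an explicit index, each value is addressed by its label.
s_labeled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(s_labeled["a"])          # -5
```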

# -*- encoding: utf-8 -*-
from pandas import DataFrame

# Let's look at the cummin function.
# Note: cummin gives the minimum seen so far (a running minimum),
# not the minimum of a cumulative sum.
frame = DataFrame([[1,2,3,4],[5,6,7,8],[-10,11,12,-13]], index = list('abc'), columns = ['one','two','three','four'])
print(frame.cummin())
print(frame)

   one  two  three  four
a    1    2      3     4
b    1    2      3     4
c  -10    2      3   -13

   one  two  three  four
a    1    2      3     4
b    5    6      7     8
c  -10   11     12   -13

Correlation and covariance

Some summary

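5. How to write a data analysis tool with Python

Data import
Importing a local or web-hosted CSV file;
Data transformation;
Descriptive statistics;
Hypothesis testing
One-sample t-test;
Visualization;
Creating custom functions.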

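Data import

This is a crucial step: before any analysis we first need to import the data. Data usually comes in CSV format, and when it does not, it can at least be converted to CSV. In Python, we do it as follows: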

Python

import pandas as pd

# Reading data locally

df = pd.read_csv('/Users/al-ahmadgaidasaad/Documents/d.csv')

# Reading data from web

data_url = "t/Analysis-with-Programming/master/2014/Python/Numerical-Descriptions-of-the-Data/data.csv"

df = pd.read_csv(data_url)

To read a local CSV file we need the appropriate module from pandas, the data analysis library. Its read_csv function can read both local and web data.

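Data transformation

Now that the data is in the workspace, the next step is transformation. Statisticians and scientists usually use this step to remove data the analysis does not need. Let's look at the data first: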

Python

# Head of the data
print(df.head())

# OUTPUT
    Abra  Apayao  Benguet  Ifugao  Kalinga
0   1243    2934      148    3300    10553
1   4158    9235     4287    8063    35257
2   1787    1922     1955    1074     4544
3  17152   14501     3536   19607    31687
4   1266    2385     2530    3315     8520

# Tail of the data
print(df.tail())

# OUTPUT
     Abra  Apayao  Benguet  Ifugao  Kalinga
74   2505   20878     3519   19737    16513
75  60303   40065     7062   19422    61808
76   6311    6756     3561   15910    23349
77  13345   38902     2583   11096    68663
78   2623   18264     3745   16787    16900

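For R programmers, the above is equivalent to printing the first 6 rows with print(head(df)) and the last 6 rows with print(tail(df)). Note that Python prints 5 rows by default while R prints 6. So the R code head(df, n = 10) becomes df.head(n = 10) in Python, and the same holds for the tail of the data.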
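As a quick check of the n argument, the sketch below reads a small CSV from memory; the 20-row table and the name df_demo are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the file read earlier.
csv_text = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(20))
df_demo = pd.read_csv(io.StringIO(csv_text))

print(len(df_demo.head()))       # default: 5 rows
print(len(df_demo.head(n=10)))   # 10 rows, like head(df, n = 10) in R
print(df_demo.tail(n=3))         # the last 3 rows
```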

In R, the column and row names are extracted with colnames and rownames respectively. In Python, we use the columns and index attributes instead, as follows:

Python

# Extracting column names

print(df.columns)

# OUTPUT

Index([u'Abra', u'Apayao', u'Benguet', u'Ifugao', u'Kalinga'], dtype='object')

# Extracting row names or the index

print(df.index)

# OUTPUT

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78], dtype='int64')

To transpose the data, use the T attribute:

Python

# Transpose data

print(df.T)

# OUTPUT
             0      1     2      3     4     5     6      7     8      9
Abra      1243   4158  1787  17152  1266  5576   927  21540  1039   5424
Apayao    2934   9235  1922  14501  2385  7452  1099  17038  1382  10588
Benguet    148   4287  1955   3536  2530   771  2796   2463  2592   1064
Ifugao    3300

         ...     69     70     71     72     73     74     75     76     77
Abra     ...  12763   2470  59094   6209  13316   2505  60303   6311  13345
Apayao   ...  37625  19532  35126   6335  38613  20878  40065   6756  38902
Benguet  ...   2354   4045   5987   3530   2585   3519   7062   3561   2583
Ifugao   ...   9838  17125  18940  15560   7746  19737  19422  15910  11096
Kalinga  ...

            78
Abra      2623
Apayao   18264
Benguet   3745
Ifugao   16787
Kalinga  16900

Other transformations such as sorting can be done with the sort attribute (sort_values in current pandas). Now let's extract a specific column. In Python, we do it using either the iloc or ix attribute; ix is more robust and thus I prefer it. (ix was deprecated in pandas 0.20 and removed in pandas 1.0; on current versions, iloc selects by position and loc selects by label.) Assuming we want the first 5 rows of the first column of the data, we have:

Python

# Written as df.ix[:, 0] on older pandas; iloc selects the same column by position
print(df.iloc[:, 0].head())

# OUTPUT
0     1243
1     4158
2     1787
3    17152
4     1266
Name: Abra, dtype: int64

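Incidentally, Python indexing starts at 0 rather than 1. To take the first 3 columns of rows 11 through 21 (index labels 10 to 20), we have:

Python

# Written as df.ix[10:20, 0:3] on older pandas; loc slices by label, so label 20 is included
print(df.loc[10:20, df.columns[0:3]])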

# OUTPUT
     Abra  Apayao  Benguet
10    981    1311     2560
11  27366   15093     3039
12   1100    1701     2382
13   7212   11001     1088
14   1048    1427     2847
15  25679   15661     2942
16   1055    2191     2119
17   5437    6461     1734
18   1029    1183     2302
19  23710   12222     2598
20   1091    2343     2654

The command above is equivalent to df.loc[10:20, ['Abra', 'Apayao', 'Benguet']].

To drop columns from the data, here column 1 (Apayao) and column 2 (Benguet), we use the drop method, as follows:

Python

print(df.drop(df.columns[[1, 2]], axis = 1).head())

# OUTPUT
    Abra  Ifugao  Kalinga
0   1243    3300    10553
1   4158    8063    35257
2   1787    1074     4544
3  17152   19607    31687
4   1266    3315     8520

The axis argument tells the function whether to drop columns or rows. If axis equals 0, rows are dropped.
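The two axis values can be compared side by side on a toy frame. The frame df_demo and its column names are placeholders, not the rice data used above.

```python
import pandas as pd

# Hypothetical 3x3 frame; the column names are placeholders.
df_demo = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["x", "y", "z"],
)

# axis=1 drops columns: remove the columns at positions 1 and 2.
cols_dropped = df_demo.drop(df_demo.columns[[1, 2]], axis=1)
print(list(cols_dropped.columns))   # ['x']

# axis=0 drops rows: remove the rows labeled 0 and 2.
rows_dropped = df_demo.drop([0, 2], axis=0)
print(list(rows_dropped.index))     # [1]
```

drop returns a new DataFrame in both cases, so df_demo itself is left unchanged.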

Descriptive statistics

The next step is to describe the statistical properties of the data with the describe method:

Python

print(df.describe())

# OUTPUT
               Abra        Apayao      Benguet        Ifugao       Kalinga
count     79.000000     79.000000    79.000000     79.000000     79.000000
mean   12874.379747  16860.645570  3237.392405  12414.620253  30446.417722
std    16746.466945  15448.153794  1588.536429   5034.282019  22245.707692
min      927.000000    401.000000   148.000000   1074.000000   2346.000000
25%     1524.000000   3435.500000  2328.000000   8205.000000   8601.500000
50%     5790.000000  10588.000000  3202.000000  13044.000000  24494.000000
75%    13330.500000  33289.000000  3918.500000  16099.500000  52510.500000
max    60303.000000  54625.000000  8813.000000  21031.000000  68663.000000

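Hypothesis testing

Python has a good package for statistical inference: stats in scipy. Its ttest_1samp function implements the one-sample t-test. So if we want to test the mean rice yield in the Abra column against the null hypothesis that the population mean yield is 15000, we have: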

Python

from scipy import stats as ss

# Perform one sample t-test using 15000 as the true mean

# Written as df.ix[:, 'Abra'] on older pandas; loc selects the same column by label
print(ss.ttest_1samp(a = df.loc[:, 'Abra'], popmean = 15000))

# OUTPUT

(-1.1281738488299586, 0.26270472069109496)

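It returns a tuple of the following values:

t : float or array
the t-statistic
prob : float or array
the two-tailed p-value

From the output above, the p-value is 0.263, far greater than α = 0.05, so there is not sufficient evidence to conclude that the mean rice yield differs from 15000. Applying this test to all the variables, again assuming a mean of 15000, we have: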

Python

print(ss.ttest_1samp(a = df, popmean = 15000))

# OUTPUT
(array([ -1.12817385, 1.07053437, -65.81425599, -4.564575, 6.17156198]),
 array([2.62704721e-01, 2.87680340e-01, 4.15643528e-70,
        1.83764399e-05, 2.82461897e-08]))

The first array holds the t-statistics and the second the corresponding p-values.
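For readers who want to see where the t-statistic comes from, the standard one-sample formula is t = (x̄ − μ0) / (s / √n), with s the sample standard deviation. The sketch below computes it with the standard library only; the function name one_sample_t and the sample values are hypothetical and not taken from the rice data, and the p-value step is left out.

```python
import math
import statistics

def one_sample_t(sample, popmean):
    """Return the one-sample t-statistic for `sample` against `popmean`."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1)
    return (xbar - popmean) / (s / math.sqrt(n))

# Hypothetical sample, not taken from the dataset above.
sample = [12, 15, 14, 10, 13, 16, 11, 13]
print(one_sample_t(sample, popmean=12))
```

Here the sample mean is 13 and the sample standard deviation is 2, so the statistic works out to √8 / 2 = √2.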

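Visualization

Python has many visualization modules, the most popular being the matplotlib library. In passing, bokeh and seaborn are also options. In a previous post I demonstrated matplotlib's box-and-whisker plot functionality.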

Repeat 100 times; then
compute the percentage of confidence intervals that contain the true mean.

In Python, the program is as follows:

Python

import numpy as np
import scipy.stats as ss

def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # One row per repetition: sample mean, lower bound, upper bound, hit flag
    m = np.zeros((rep, 4))

    for i in range(rep):
        norm = np.random.normal(loc = mu, scale = sigma, size = n)
        xbar = np.mean(norm)
        low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))

        # Record whether this interval contains the true mean
        if (mu > low) & (mu < up):
            rem = 1
        else:
            rem = 0

        m[i, :] = [xbar, low, up, rem]

    inside = np.sum(m[:, 3])
    per = inside / rep
    # Parentheses keep the two-line message a single expression
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")

    return {"Matrix": m, "Decision": desc}

The code above is easy to read, but the loop makes it slow. Below is an improved version, thanks to the Python experts; see the 15 comments on my previous post.

Python

import numpy as np
import scipy.stats as ss

def case2(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Half-width of the interval, computed once for all repetitions
    scaled_crit = ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
    # Draw all samples at once: one row per repetition
    norm = np.random.normal(loc = mu, scale = sigma, size = (rep, n))

    xbar = norm.mean(1)
    low = xbar - scaled_crit
    up = xbar + scaled_crit

    rem = (mu > low) & (mu < up)
    m = np.c_[xbar, low, up, rem]

    inside = np.sum(m[:, 3])
    per = inside / rep
    # Parentheses keep the two-line message a single expression
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")

    return {"Matrix": m, "Decision": desc}

Update

For those interested in an IPython notebook version of this article, please click here. The article was converted to an IPython notebook by Nuttens Claude.

6. Writing the data in a list to a CSV file with Python

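result = [(u'appleiOS', u'appleiOS', u'$400'),
          (u'likenew', u'5', u'$149'),
          (u'appleiOS', u'appleiOS', u'$900'),
          (u'excellent', u'6Plus', u'$550'),
          (u'likenew', u'appleiOS', u'$279'),
          (u'likenew', u'4', u'$59')]
# Open in binary mode and encode each line explicitly as UTF-8
with open('data.csv', 'wb') as f:
    for item in result:
        # Join the fields with commas and end the row with a newline
        line = ','.join(item) + '\n'
        f.write(line.encode('utf-8'))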
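Joining fields by hand breaks as soon as a value itself contains a comma. As a hedged alternative, the standard library's csv.writer quotes such values automatically. The third row below is a made-up value added to show the quoting, and the in-memory buffer stands in for a real file.

```python
import csv
import io

result = [
    ("appleiOS", "appleiOS", "$400"),
    ("likenew", "5", "$149"),
    ("excellent, boxed", "6Plus", "$550"),  # hypothetical value containing a comma
]

# Write to an in-memory buffer here; with a real file, open it with
# open("data.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(result)

print(buffer.getvalue())
```

Reading the text back with csv.reader returns the same three fields per row, including the one with the embedded comma.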

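7. How to write data back after reading and cleaning an Excel file with Python

Import the xlrd library.
xlrd is the library for reading data from Excel files. To install it, unpack the downloaded archive, cd to the unpacked directory on the command line (cmd), and run python setup.py install. Next install xlwt, a library developers use to generate spreadsheet files compatible with Microsoft Excel versions 95 to 2003: again switch to the directory where the download was unpacked and run python setup.py install; barring surprises, the installation succeeds.
openpyxl is a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.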
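A minimal sketch of the read, clean, write cycle using openpyxl, assuming it is installed. The cleaning rule (strip whitespace, drop empty rows), the helper name clean_rows, the file names, and the sample rows are all assumptions made for illustration.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook

def clean_rows(rows):
    """Strip whitespace from strings and drop rows that are entirely empty."""
    cleaned = []
    for row in rows:
        values = [v.strip() if isinstance(v, str) else v for v in row]
        if any(v not in (None, "") for v in values):
            cleaned.append(values)
    return cleaned

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.xlsx")
    dst = os.path.join(tmp, "clean.xlsx")

    # Build a small hypothetical input workbook to stand in for real data.
    wb = Workbook()
    ws = wb.active
    for row in [[" name ", "price"], ["apple ", 3], [None, None], ["pear", 5]]:
        ws.append(row)
    wb.save(src)

    # Read, clean, and write the result to a new workbook.
    raw = list(load_workbook(src).active.iter_rows(values_only=True))
    out = Workbook()
    for row in clean_rows(raw):
        out.active.append(row)
    out.save(dst)

    written = [list(r) for r in load_workbook(dst).active.iter_rows(values_only=True)]

print(written)
```

Writing to a new file rather than over the source keeps the raw data available if the cleaning rule turns out to be wrong.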
