Python Data Processing (Dataset Format Conversion)

2023-05-16

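Abstract

This post summarizes some frequently used data-processing techniques, mainly operations on NumPy arrays and on Python lists and dictionaries.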

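I. Saving N-dimensional arrays (ndarray) to a local file

This section is based on a referenced original article and is restated here so that it is not lost and can be reviewed often.

1. Requirement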

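In real programs, computed results (of type ndarray) often need to be saved locally for later analysis. numpy.savetxt can save 1-D or 2-D data to a text file, but it cannot save data with three or more dimensions, such as the image features extracted from an image library.
In that case, numpy.savez can be used to save data with three or more dimensions.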
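The limitation described above can be checked directly. The following is a minimal sketch, assuming a small 3-D array as a stand-in for real image features; the file names and the temporary directory are illustrative only.

```python
import os
import tempfile

import numpy as np

# A 3-D array standing in for extracted features (values are arbitrary).
features = np.arange(8).reshape(2, 2, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "features.txt")
    npz_path = os.path.join(tmp_dir, "features.npz")

    # savetxt only accepts 1-D or 2-D input and raises ValueError otherwise.
    try:
        np.savetxt(txt_path, features)
        savetxt_failed = False
    except ValueError:
        savetxt_failed = True

    # savez stores the array together with its full shape.
    np.savez(npz_path, features=features)
    with np.load(npz_path) as data:
        restored = data["features"]

print(savetxt_failed)   # True
print(restored.shape)   # (2, 2, 2)
```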

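2. Interface

Use numpy.savez to save the data.

It can save any number of N-dimensional arrays, in one of two ways:

With *args, e.g. np.savez('data', d1, d2, d3): the arrays d1, d2, d3 are stored under the names arr_0, arr_1, arr_2, which are used to access the data later.
With **kwds, e.g. np.savez('data', d1=d1, d2=d2, d3=d3): the array names are given explicitly and are used to access the data later.

The saved local file is in npz format.
To load the data, use np.load(file).
The loaded object is an NpzFile, which behaves like a dictionary. NpzFile.files lists the arrays stored in the file, and each array can then be accessed with dictionary-style indexing; see the examples for details.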

Signature: np.savez(file, *args, **kwds)
Docstring:
Save several arrays into a single file in uncompressed ``.npz`` format.

If arguments are passed in with no keywords, the corresponding variable
names, in the ``.npz`` file, are 'arr_0', 'arr_1', etc. If keyword
arguments are given, the corresponding variable names, in the ``.npz``
file will match the keyword names.

Parameters
----------
file : str or file
    Either the file name (string) or an open file (file-like object)
    where the data will be saved. If file is a string or a Path, the
    ``.npz`` extension will be appended to the file name if it is not
    already there.
args : Arguments, optional
    Arrays to save to the file. Since it is not possible for Python to
    know the names of the arrays outside `savez`, the arrays will be saved
    with names "arr_0", "arr_1", and so on. These arguments can be any
    expression.
kwds : Keyword arguments, optional
    Arrays to save to the file. Arrays will be saved in the file with the
    keyword names.

Example 1: saving data with *args

In [48]: d1 = np.random.randint(0,100,(2,2,2))

In [49]: d1
Out[49]: 
array([[[12, 72],
        [60, 41]],

       [[ 2,  6],
        [62, 53]]])

In [50]: d2, d3 = d1*10, d1*100

In [51]: np.savez('data',d1,d2,d3)

In [52]: data = np.load('data.npz')

In [53]: data.files
Out[53]: ['arr_0', 'arr_1', 'arr_2']

In [54]: data['arr_0']
Out[54]: 
array([[[12, 72],
        [60, 41]],

       [[ 2,  6],
        [62, 53]]])

In [55]: data['arr_1']
Out[55]: 
array([[[120, 720],
        [600, 410]],

       [[ 20,  60],
        [620, 530]]])

In [56]: data['arr_2']
Out[56]: 
array([[[1200, 7200],
        [6000, 4100]],

       [[ 200,  600],
        [6200, 5300]]])

Example 2: saving data with **kwds

In [57]: np.savez('data2',d1=d1,d2=d2,d3=d3) # equivalent to
# d = {'d1':d1,'d2':d2,'d3':d3}
# np.savez('data2',**d)

In [58]: data2 = np.load('data2.npz')

In [59]: data2.files
Out[59]: ['d1', 'd2', 'd3']

In [62]: data2['d1']
Out[62]: 
array([[[12, 72],
        [60, 41]],

       [[ 2,  6],
        [62, 53]]])

In [63]: data2['d2']
Out[63]: 
array([[[120, 720],
        [600, 410]],

       [[ 20,  60],
        [620, 530]]])

In [64]: data2['d3']
Out[64]: 
array([[[1200, 7200],
        [6000, 4100]],

       [[ 200,  600],
        [6200, 5300]]])
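The two examples above can also be written as a script. This is a minimal sketch, assuming a temporary directory and a plain dict named loaded (both illustrative); it copies every array out of the NpzFile before the file is closed.

```python
import os
import tempfile

import numpy as np

d1 = np.arange(8).reshape(2, 2, 2)
arrays = {"d1": d1, "d2": d1 * 10, "d3": d1 * 100}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data2.npz")
    np.savez(path, **arrays)  # keyword names become the array names

    # NpzFile reads arrays on access, so copy them out before it closes.
    with np.load(path) as data:
        loaded = {name: data[name] for name in data.files}

print(sorted(loaded))  # ['d1', 'd2', 'd3']
```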

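II. Checking a variable's type

Reference article: "Checking a variable's type in Python"

1. Requirement

Sometimes elements need to be added to or removed from an array, but the array type in use does not support this, so the data is first converted to a list or an ndarray. If it is converted to a list, it must be converted back to an ndarray before any NumPy computation, so a step that checks the data type is needed.

2. Interface

Common Python data types include: integer (int), float (float), string (str), list (list), tuple (tuple), dictionary (dict), and set (set).

The type is usually checked as follows: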

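1. isinstance(arg1, arg2)

Description: checks whether a variable (arg1) is of a known type (arg2); similar to type().

arg1: the variable.

arg2: a class (direct or indirect), a basic type, or a tuple made up of these.

Return value: True if the object is an instance of arg2 (classinfo) or of one of its subclasses, otherwise False. Unlike an exact comparison with type(), isinstance takes inheritance into account.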
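The difference between isinstance() and an exact type() comparison is easiest to see with bool, which is a subclass of int. The variable names below are chosen for illustration only.

```python
value = True

# bool is a subclass of int, so isinstance reports True here...
is_int_instance = isinstance(value, int)
# ...while an exact type comparison does not.
is_exact_int = type(value) is int

# A tuple of types checks against several candidates at once.
number_ok = isinstance(3.5, (int, float))
string_ok = isinstance("3.5", (int, float))

print(is_int_instance, is_exact_int)  # True False
print(number_ok, string_ok)           # True False
```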

Usage

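# Determine the type name of a variable
def typeof(variate):
    type_name = None  # do not shadow the built-in type()
    # Note: bool is a subclass of int, so True/False are reported as "int"
    if isinstance(variate, int):
        type_name = "int"
    elif isinstance(variate, str):
        type_name = "str"
    elif isinstance(variate, float):
        type_name = "float"
    elif isinstance(variate, list):
        type_name = "list"
    elif isinstance(variate, tuple):
        type_name = "tuple"
    elif isinstance(variate, dict):
        type_name = "dict"
    elif isinstance(variate, set):
        type_name = "set"
    return type_name

# Return a readable description of the variable's type
def getType(variate):
    arr = {"int": "integer", "float": "float", "str": "string", "list": "list", "tuple": "tuple", "dict": "dictionary", "set": "set"}
    vartype = typeof(variate)
    if vartype not in arr:
        return "unknown type"
    return arr[vartype]

# Check whether the variable is an integer
money = 120
print("{0} is of type {1}".format(money, getType(money)))
# Check whether the variable is a string
money = "120"
print("{0} is of type {1}".format(money, getType(money)))
# Check whether the variable is a float
money = 12.3
print("{0} is of type {1}".format(money, getType(money)))
# Check whether the variable is a list
students = ['studentA']
print("{0} is of type {1}".format(students, getType(students)))
# Check whether the variable is a tuple
students = ('studentA', 'studentB')
print("{0} is of type {1}".format(students, getType(students)))
# Check whether the variable is a dictionary
dictory = {"key1": "value1", "key2": "value2"}
print("{0} is of type {1}".format(dictory, getType(dictory)))
# Check whether the variable is a set
apple = {"apple1", "apple2"}
print("{0} is of type {1}".format(apple, getType(apple)))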
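For the requirement stated at the start of this section (edit the data as a list, then compute with NumPy), the type check might look like the sketch below. The helper name to_ndarray is hypothetical and not part of NumPy.

```python
import numpy as np


def to_ndarray(values):
    """Return values as an ndarray, converting only when needed (hypothetical helper)."""
    if isinstance(values, np.ndarray):
        return values
    if isinstance(values, (list, tuple)):
        return np.array(values)
    raise TypeError(f"unsupported type: {type(values).__name__}")


samples = [1, 2, 3]
samples.append(4)          # list operations while the data is still a list
arr = to_ndarray(samples)  # convert once, right before the NumPy computation

print(type(arr).__name__, arr.sum())  # ndarray 10
```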

III. Concatenating strings when printing

Reference article: "String concatenation with print in Python"

1. Define the variables

name = "蓝天"
robot_name = "小麦"

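2. Output

Method 1:

print('Hello, ' + name + ', I am ' + robot_name)

Method 2:

Use {} as placeholders for the variables, then pass the values in through format().

print('Hello, {}, I am {}'.format(name, robot_name))

Method 3:

Prefix the string with f, then write {name} directly inside the string to interpolate the value.

print(f'Hello, {name}, I am {robot_name}')

Method 4:

print accepts multiple arguments and prints them all, separated by a space by default.

print('Hello,', name, ', I am', robot_name)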
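Because method 4 inserts a space between arguments, its output differs slightly from the other three methods. A minimal sketch of how the sep parameter of print controls this, writing into io.StringIO buffers so the results can be compared:

```python
import io

name = "蓝天"
robot_name = "小麦"

# Default behaviour: arguments are joined with a single space.
default_out = io.StringIO()
print('Hello,', name, file=default_out)

# sep='' removes the automatic spaces, matching the f-string result.
no_sep_out = io.StringIO()
print('Hello, ', name, ', I am ', robot_name, sep='', file=no_sep_out)

print(repr(default_out.getvalue()))
print(no_sep_out.getvalue() == f'Hello, {name}, I am {robot_name}\n')  # True
```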

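IV. The Python enumerate() function

Reference article: "Python enumerate() function"
enumerate() wraps an iterable object (such as a list, tuple, or string) into an indexed sequence that yields each item together with its index. It is typically used in for loops.
Available since Python 2.3; the start parameter was added in 2.6.

1. Syntax

enumerate(iterable, start=0)

2. Parameters

iterable – a sequence, an iterator, or any other object that supports iteration.
start – the starting value of the index.

3. Return value

Returns an enumerate object.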

4. Examples

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))       # index starts at 1
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

A plain for loop

>>> i = 0
>>> seq = ['one', 'two', 'three']
>>> for element in seq:
...     print(i, seq[i])
...     i += 1
... 
0 one
1 two
2 three

A for loop using enumerate

>>> seq = ['one', 'two', 'three']
>>> for i, element in enumerate(seq):
...     print(i, element)
... 
0 one
1 two
2 three
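As a script rather than an interactive session, the same idea can be sketched as follows. The variable names labels and squares are illustrative; the second part shows that enumerate accepts any iterable, not only lists.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Number the items from 1 instead of 0.
labels = []
for number, season in enumerate(seasons, start=1):
    labels.append(f"{number}. {season}")
print(labels)  # ['1. Spring', '2. Summer', '3. Fall', '4. Winter']

# enumerate works on any iterable, for example a generator expression.
squares = list(enumerate(x * x for x in range(3)))
print(squares)  # [(0, 0), (1, 1), (2, 4)]
```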

V. Converting between list and array in Python

Reference article: "Converting between list and array in Python"

1. `list` to `array`

np.array(a)

2. `array` to `list`

a.tolist()

a.tolist()
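A short round trip makes the two conversions concrete. This sketch assumes a nested list a; it also shows that tolist() converts all the way down to Python scalars, whereas list() only unpacks the first axis.

```python
import numpy as np

a = [[1, 2, 3], [4, 5, 6]]

arr = np.array(a)    # list -> ndarray
back = arr.tolist()  # ndarray -> nested list of Python ints

print(arr.shape)                    # (2, 3)
print(back == a)                    # True
print(type(back[0][0]).__name__)    # int

# list(arr) only splits the first axis; each row is still an ndarray.
rows = list(arr)
print(type(rows[0]).__name__)       # ndarray
```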

VI. Creating a multi-dimensional array of zeros in Python

Reference article: "Pitfalls when creating an all-zero 2-D list in Python"

1. The correct way

dp = [[0] * (3) for _ in range(3)]
print(dp)
dp[0][0]=4
print(dp)

[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[4, 0, 0], [0, 0, 0], [0, 0, 0]]

2. The wrong way

dp = [[0,0,0]]*3
print(dp)
dp[0][0]=4
print(dp)

[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[4, 0, 0], [4, 0, 0], [4, 0, 0]]

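An expression such as matrix = [array] * 3 (here, dp = [[0,0,0]]*3) only creates three references to the same array object, so once array is modified, all three lists in matrix change with it.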
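The shared-reference behaviour can be confirmed with the is operator. The sketch below also shows np.zeros as an alternative when NumPy is already in use; the names shared, independent, and grid are illustrative.

```python
import numpy as np

shared = [[0, 0, 0]] * 3                   # three references to one inner list
independent = [[0] * 3 for _ in range(3)]  # three separate inner lists

print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False

# With NumPy, a zero-filled 2-D array has no aliasing between rows.
grid = np.zeros((3, 3), dtype=int)
grid[0, 0] = 4
print(grid.tolist())  # [[4, 0, 0], [0, 0, 0], [0, 0, 0]]
```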

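VII. Deleting list elements in Python

Reference article: "Three ways to delete list elements in Python"

1. Method 1: the `del` statement

The del statement can delete a list element at any position, provided its index is known.
For example, to delete the element 'a' from the list: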

lis = [1, 2, 3, 'a', 'b']
print(lis)

del lis[3]
print(lis)

[1, 2, 3, 'a', 'b']
[1, 2, 3, 'b']

2. Method 2: the `pop()` method

The pop() method removes one element from the list (the last one by default) and returns its value.

lis = [1, 2, 3, 'a', 'b']
print(lis)

a = lis.pop()

print(a)
print(lis)

[1, 2, 3, 'a', 'b']
b
[1, 2, 3, 'a']

In fact, pop() can remove an element at any position in the list: just pass the index of the element to remove in the parentheses.

lis = [1, 2, 3, 'a', 'b']
print(lis)

a = lis.pop(1)

print(a)
print(lis)

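[1, 2, 3, 'a', 'b']
2
[1, 3, 'a', 'b']

As shown, pop() removed the element at index 1 from the list.

As with del, the popped element is no longer in the list.

3. Method 3: deleting an element by value

Sometimes the position of an element is unknown but its value is known; in that case the remove() method can be used.
For example, to remove the element 3 from the list: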

lis = [1, 2, 3, 'a', 'b']
print(lis)

lis.remove(3)
print(lis)

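[1, 2, 3, 'a', 'b']
[1, 2, 'a', 'b']

Note:
remove() deletes only the first occurrence of the specified value.
For example, suppose the list contains two elements with the value 3: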

lis = [1, 2, 3, 'a', 'b', 3]
print(lis)

lis.remove(3)
print(lis)

[1, 2, 3, 'a', 'b', 3]
[1, 2, 'a', 'b', 3]

As shown, only the first occurrence of 3 was removed.
A loop can be used to remove every 3:

lis = [1, 2, 3, 'a', 'b', 3]
print(lis)

while 3 in lis:
    lis.remove(3)
print(lis)

[1, 2, 3, 'a', 'b', 3]
[1, 2, 'a', 'b']
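The while loop above rescans the list on every pass. As an alternative sketch (not taken from the referenced article), a list comprehension removes all occurrences in one pass; slice assignment keeps the same list object if other code holds a reference to it.

```python
lis = [1, 2, 3, 'a', 'b', 3]

# Build a new list without any element equal to 3.
filtered = [item for item in lis if item != 3]
print(filtered)  # [1, 2, 'a', 'b']

# Slice assignment replaces the contents while keeping the same list object.
alias = lis
lis[:] = [item for item in lis if item != 3]
print(alias)     # [1, 2, 'a', 'b']
```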

VIII. Printing a timestamp

import time
# Format as 2016-03-20 11:45:39
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
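The same format string also works with the datetime module. The sketch below formats a fixed instant (the Unix epoch in UTC) so the result is predictable, then the current local time; the variable names are illustrative.

```python
import time
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# A fixed instant gives a reproducible result: the Unix epoch in UTC.
epoch_text = time.strftime(fmt, time.gmtime(0))
print(epoch_text)  # 1970-01-01 00:00:00

# datetime objects accept the same format codes.
now_text = datetime.now().strftime(fmt)
print(now_text)
```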
