Python3------Learning NumPy (Part 1)

2023-10-29

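Learning NumPy

1------Introduction to NumPy

NumPy (Numerical Python) is an open-source scientific computing library for Python, designed for fast processing of arrays of any dimension.

NumPy supports the common array and matrix operations. For the same numerical task, NumPy code is far more concise than plain Python.

NumPy handles multidimensional arrays with the ndarray object, a fast and flexible container for large data sets.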

1.1------ndarray

NumPy provides an N-dimensional array type, ndarray, which describes a collection of "items" of the same type.

import numpy as np

score = np.array([[80, 89, 86, 67, 79],
                  [78, 97, 89, 67, 81],
                  [90, 94, 78, 67, 74],
                  [91, 91, 90, 67, 69],
                  [76, 87, 75, 67, 86],
                  [70, 79, 84, 67, 84],
                  [94, 92, 93, 67, 64],
                  [86, 85, 83, 67, 80]])

This passes a built-in Python list (here a nested list) to np.array, which returns the following:

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

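A Python list can store a one-dimensional array, and nested lists can represent multidimensional arrays. So why do we need NumPy's ndarray at all?

The answer is efficiency, as the next section shows.

1.2------ndarray vs. native Python list: computation speed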

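import numpy as np
import random
import time

# Start with an empty Python list
a = []
for i in range(100000000):
    # Append 100,000,000 random elements to a
    a.append(random.random())
t1 = time.time()
# Sum the elements of list a (named total so the built-in sum is not shadowed)
total = sum(a)
t2 = time.time()
# Time at which the summation started
print(t1)
# Time at which the summation finished
print(t2)
# Total time spent
print(t2 - t1)

# Convert the list to an ndarray and time the same summation
b = np.array(a)
t3 = time.time()
total2 = np.sum(b)
t4 = time.time()
print(t4 - t3)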

The output is:

1543031869.482664
1543031870.6981153
1.2154512405395508
0.22154474258422852

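So in this run, summing the ndarray took about 0.22 s against about 1.22 s for the list, roughly 5.5 times faster than the built-in Python object.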
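The comparison above can be repeated at a smaller scale. The following is only a sketch: the size n is an assumption chosen so that it runs quickly and fits in memory, and time.perf_counter is used instead of time.time because it is intended for measuring short intervals. Actual timings depend on the machine.

```python
import time

import numpy as np

n = 1_000_000  # assumed size, much smaller than the 100,000,000 used above
values = np.random.rand(n)   # ndarray of n random floats
as_list = values.tolist()    # the same numbers as a plain Python list

# Time the built-in sum over the list
start = time.perf_counter()
list_total = sum(as_list)
list_seconds = time.perf_counter() - start

# Time the ndarray sum over the same data
start = time.perf_counter()
array_total = values.sum()
array_seconds = time.perf_counter() - start

print(f"list sum:    {list_seconds:.6f} s")
print(f"ndarray sum: {array_seconds:.6f} s")
# Both approaches should agree up to floating-point rounding
print(np.isclose(list_total, array_total))
```

Building both containers before the timers start keeps the construction cost out of the measurement, so only the summation itself is compared.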

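The defining feature of machine learning is massive numerical computation. Without a fast solution for it, Python would probably not be as successful in machine learning as it is today.

Question: why is ndarray so fast?

1.2.1------Advantages of ndarray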

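Let's go straight to the diagram.

As the diagram shows, an ndarray stores its data in one contiguous block of memory, which makes bulk operations on its elements faster.

This works because every element of an ndarray has the same type, whereas the elements of a Python list can be of any type. An ndarray can therefore lay its elements out contiguously, while a native Python list stores references and has to follow an address to reach each element. The trade-off is that ndarray is less general-purpose than the native list, but in scientific computing it removes the need for many explicit loops, and the resulting code is far simpler than the list-based equivalent.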
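The layout described above can be inspected from Python. This is a small sketch; the array contents and variable names are made up for illustration.

```python
import numpy as np

# A 2 x 3 array of 8-byte integers
arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# The data sits in one contiguous, row-major block
print(arr.flags['C_CONTIGUOUS'])  # True
# Bytes to step to the next row and to the next column
print(arr.strides)                # (24, 8)

# A Python list may mix element types, so it can only hold references
mixed = [1, 2.5, "three"]
print([type(x).__name__ for x in mixed])  # ['int', 'float', 'str']

# One vectorized expression replaces an explicit nested loop
doubled = arr * 2
print(doubled.tolist())  # [[0, 2, 4], [6, 8, 10]]
```

The strides follow directly from the fixed element size: each int64 takes 8 bytes, and a row of three of them takes 24.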

Next, let's look at ndarray in detail.

2------ndarray

2.1------ndarray attributes

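Attribute	Description
ndarray.shape	Tuple of array dimensions
ndarray.ndim	Number of dimensions
ndarray.size	Number of elements in the array
ndarray.itemsize	Size of one array element, in bytes
ndarray.dtype	Type of the array elements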
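As a quick illustration of the attributes in the table, the sketch below reads each of them from a small example array (the array itself is arbitrary).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)

print(a.shape)     # (2, 3)
print(a.ndim)      # 2
print(a.size)      # 6
print(a.itemsize)  # 4
print(a.dtype)     # float32
```

Note that size is the product of the entries in shape, and itemsize follows from dtype: a float32 takes 4 bytes.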

2.2------ndarray shape

a = np.array([[1,2,3],[4,5,6]])
b = np.array([1,2,3,4])
c = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
print(a.shape)
print(b.shape)
print(c.shape)

The result is:

(2, 3)
(4,)
(2, 2, 3)

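Here is a detailed way to work out the shape of a three-dimensional array.

For example: [[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]

First remove the outermost brackets, which leaves [[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]. This consists of two parts: [[1,2,3],[4,5,6]] and [[7,8,9],[10,11,12]].

Call these two parts building blocks.

So the first dimension is 2.

Analyze a building block the same way (one is enough; take the first). Remove its outermost brackets, which leaves [1,2,3],[4,5,6]. This again consists of two building blocks, [1,2,3] and [4,5,6].

So the second dimension is 2.

Now analyze one of those building blocks: remove the brackets, which leaves 1, 2, 3.

So the third dimension is 3, and the final shape is (2, 2, 3).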
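The bracket-peeling procedure above can be written as a short function. This is only a sketch: nested_shape is a hypothetical helper, not part of NumPy, and it assumes a regular, non-empty nested list in which every building block at a given level has the same length.

```python
import numpy as np

def nested_shape(data):
    """Return the shape of a regular nested list by peeling off brackets."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))  # number of building blocks at this level
        data = data[0]           # analyzing the first block is enough
    return tuple(shape)

nested = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

print(nested_shape(nested))    # (2, 2, 3)
print(np.array(nested).shape)  # (2, 2, 3)
```

Both lines print the same tuple, which matches the manual analysis.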

2.3------ndarray data types

In code:

a = np.array([[1,2,3],[4,5,6]])
print(a.dtype)

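The result is as follows (the default integer type is platform-dependent; on most Linux and macOS systems it is int64):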

int32

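The specific ndarray data types are listed in the table below:

Name	Description	Shorthand
np.bool_	Boolean (True or False), stored in one byte	'?'
np.int8	Integer, one byte, -128 to 127	'i1'
np.int16	Integer, -32768 to 32767	'i2'
np.int32	Integer, -2 ** 31 to 2 ** 31 - 1	'i4'
np.int64	Integer, -2 ** 63 to 2 ** 63 - 1	'i8'
np.uint8	Unsigned integer, 0 to 255	'u1'
np.uint16	Unsigned integer, 0 to 65535	'u2'
np.uint32	Unsigned integer, 0 to 2 ** 32 - 1	'u4'
np.uint64	Unsigned integer, 0 to 2 ** 64 - 1	'u8'
np.float16	Half-precision float: 16 bits (1 sign bit, 5 exponent bits, 10 mantissa bits)	'f2'
np.float32	Single-precision float: 32 bits (1 sign bit, 8 exponent bits, 23 mantissa bits)	'f4'
np.float64	Double-precision float: 64 bits (1 sign bit, 11 exponent bits, 52 mantissa bits)	'f8'
np.complex64	Complex number; real and imaginary parts are each a 32-bit float	'c8'
np.complex128	Complex number; real and imaginary parts are each a 64-bit float	'c16'
np.object_	Python object	'O'
np.bytes_	Byte string (the alias np.string_ was removed in NumPy 2.0)	'S'
np.str_	Unicode string (the alias np.unicode_ was removed in NumPy 2.0)	'U'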

The dtype can be specified when the array is created:

a = np.array([[1, 2, 3],[4, 5, 6]], dtype=np.float32)
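A few more ways of working with dtypes are sketched below: passing a shorthand string, converting an existing array with astype, and asking NumPy for the range of an integer type with np.iinfo. The variable names are only for illustration.

```python
import numpy as np

# dtype given as a NumPy type
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(a.dtype)  # float32

# dtype given as a shorthand string ('i2' means a 2-byte integer)
b = np.array([1, 2, 3], dtype='i2')
print(b.dtype)  # int16

# astype returns a converted copy; a itself is unchanged
c = a.astype(np.int64)
print(c.dtype)  # int64

# np.iinfo reports the representable range of an integer type
info = np.iinfo(np.int32)
print(info.min, info.max)  # -2147483648 2147483647
```

The range printed by np.iinfo is exactly -2 ** 31 to 2 ** 31 - 1.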

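2.4------Basic operations

2.4.1------Creating arrays

empty(shape[, dtype, order])
empty_like(a[, dtype, order, subok])

The names say it all: when the parameter is shape, pass an integer or a sequence of integers giving the size of each dimension; when the parameter is a, pass an existing array or matrix. The same applies to the functions below.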

eye(N[, M, k, dtype, order])
identity(n[, dtype])
ones(shape[, dtype, order])
ones_like(a[, dtype, order, subok])
zeros(shape[, dtype, order])
zeros_like(a[, dtype, order, subok])
full(shape, fill_value[, dtype, order])
full_like(a, fill_value[, dtype, order, subok])

The empty family, followed by eye() and identity():

a = np.empty([3,4])
print(a)
b = [[1,2,3],[4,5,6]]
b = np.empty_like(b)
print(b)
#eye()
c = np.eye(5)
print(c)

d = np.identity(5)
print(d)

The output is:

[[8.82769181e+025 7.36662981e+228 7.54894003e+252 2.95479883e+137]
 [1.42800637e+248 2.64686750e+180 1.09936856e+248 6.99481925e+228]
 [7.54894003e+252 7.67109635e+170 2.64686750e+180 5.63234836e-322]]
[[0 0 0]
 [0 0 0]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

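Explanation:

empty() allocates an array of the given shape without initializing it. The result is not a zero matrix: the elements are whatever happened to be in that memory, as the arbitrary values in the output above show.

empty_like() allocates an uninitialized array with the same shape and dtype as the array passed in. The elements happen to be 0 in this run, but that is not guaranteed.

eye() and identity() take an integer n and return the n x n identity matrix. (eye() can also produce a non-square matrix via M and an offset diagonal via k.)

The remaining ***() and ***_like() pairs work the same way.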
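The remaining creation functions from the list above can be tried out as follows. This is a minimal sketch; the shapes and fill values are arbitrary choices.

```python
import numpy as np

z = np.zeros((2, 3))                 # 2 x 3 array of 0.0
o = np.ones((2, 3), dtype=np.int64)  # 2 x 3 array of integer 1
f = np.full((2, 2), 7)               # 2 x 2 array filled with 7

# The *_like variants copy shape and dtype from an existing array
template = np.array([[1, 2, 3], [4, 5, 6]])
z_like = np.zeros_like(template)
f_like = np.full_like(template, 9)

# eye with M and k: 2 x 3 matrix with ones on the first upper diagonal
e = np.eye(2, 3, k=1)

print(z.tolist())       # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(o.tolist())       # [[1, 1, 1], [1, 1, 1]]
print(f.tolist())       # [[7, 7], [7, 7]]
print(z_like.tolist())  # [[0, 0, 0], [0, 0, 0]]
print(f_like.tolist())  # [[9, 9, 9], [9, 9, 9]]
print(e.tolist())       # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Unlike empty(), these functions always initialize every element, so their output is the same on every run.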

2.4.2------Creating arrays from existing arrays (matrices)

array(object[, dtype, copy, order, subok, ndmin])
asarray(a[, dtype, order])

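a = np.array([[1,2,3],[4,5,6]])
# Create a new array from the existing one (the data is copied)
a1 = np.array(a)
# Works like a reference: no new array is actually created
a2 = np.asarray(a)
print(a1)
print(a2)

The output is: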

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]

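Explanation: the difference between array() and asarray()

What they share: both array and asarray convert array-like data into an ndarray object.

Difference: when the argument is an ordinary Python sequence such as a list, both functions allocate new memory to hold a copy of the data.

When the argument is already an ndarray, array by default creates a new ndarray that is a copy of the argument, whereas asarray does not create a new one: it returns the argument itself, so both names share the same memory (provided no dtype conversion is requested).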
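The difference becomes visible once the original array is modified. The sketch below uses np.shares_memory and the is operator to check whether a copy was made; the value 100 is arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
a1 = np.array(a)    # independent copy
a2 = np.asarray(a)  # the same object as a

print(a1 is a)                  # False
print(a2 is a)                  # True
print(np.shares_memory(a, a1))  # False
print(np.shares_memory(a, a2))  # True

# Modify the original: only a2 sees the change
a[0, 0] = 100
print(a1[0, 0])  # 1
print(a2[0, 0])  # 100
```

This is why asarray is a common choice when a function only needs to read its input: it avoids an unnecessary copy when the caller already passes an ndarray.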

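2.4.3------Creating arrays over a fixed range

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Parameters:

start Start value of the sequence
stop End value of the sequence; included in the sequence if endpoint is True
num Number of evenly spaced samples to generate, 50 by default
endpoint Whether stop is included in the sequence, True by default
retstep If True, return the samples together with the step between consecutive values
dtype Data type of the output ndarray

In code: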

a = np.linspace(0,100,50)
print(a)

[  0.           2.04081633   4.08163265   6.12244898   8.16326531
  10.20408163  12.24489796  14.28571429  16.32653061  18.36734694
  20.40816327  22.44897959  24.48979592  26.53061224  28.57142857
  30.6122449   32.65306122  34.69387755  36.73469388  38.7755102
  40.81632653  42.85714286  44.89795918  46.93877551  48.97959184
  51.02040816  53.06122449  55.10204082  57.14285714  59.18367347
  61.2244898   63.26530612  65.30612245  67.34693878  69.3877551
  71.42857143  73.46938776  75.51020408  77.55102041  79.59183673
  81.63265306  83.67346939  85.71428571  87.75510204  89.79591837
  91.83673469  93.87755102  95.91836735  97.95918367 100.        ]
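In the output above, 50 samples that include both ends span 49 intervals, so the step is 100 / 49, about 2.0408. The sketch below shows the effect of retstep and endpoint on a smaller example; the choice of 5 samples is arbitrary.

```python
import numpy as np

# retstep=True returns a (samples, step) tuple
samples, step = np.linspace(0, 100, 5, retstep=True)
print(samples.tolist())  # [0.0, 25.0, 50.0, 75.0, 100.0]
print(float(step))       # 25.0

# endpoint=False leaves stop out, which changes the step
open_end = np.linspace(0, 100, 5, endpoint=False)
print(open_end.tolist())  # [0.0, 20.0, 40.0, 60.0, 80.0]
```

With endpoint=True the step is (stop - start) / (num - 1); with endpoint=False it is (stop - start) / num.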

--------------------------------------------------------------

numpy.arange(start, stop, step, dtype)
Parameters: values start at start and advance by step up to, but not including, stop.

In code:

a = np.arange(0,100,2)
print(a)

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94
 96 98]
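Note that the output stops at 98: stop itself is never included. A few variations, sketched with arbitrary arguments:

```python
import numpy as np

evens = np.arange(0, 100, 2)
print(evens[-1])   # 98
print(evens.size)  # 50

# With a single argument, start defaults to 0 and step to 1
print(np.arange(5).tolist())  # [0, 1, 2, 3, 4]

# A negative step counts down; stop is still excluded
print(np.arange(10, 0, -3).tolist())  # [10, 7, 4, 1]
```

When a specific number of evenly spaced values is needed, especially with a non-integer step, linspace is usually the more predictable choice, because the length of an arange result with a floating-point step can be affected by rounding.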

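2.4.4------Creating arrays of random numbers

The np.random module

np.random.rand(d0, d1, ..., dn)------returns uniformly distributed samples from [0.0, 1.0); the arguments give the size of each dimension of the resulting array.

np.random.randn(d0, d1, ..., dn)------returns one or more samples from the standard normal distribution.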

a = np.random.rand(3,2)
print(a)
b = np.random.randn(4,2)
print(b)

The output is (the values differ on every run):

[[0.47018098 0.53773488]
 [0.44468209 0.14701938]
 [0.44349829 0.90800236]]
--------------------------
[[-1.58174979 -0.8541224 ]
 [ 0.3468391  -2.68307933]
 [-0.5132048   1.19656033]
 [ 0.96606693  0.16333828]]
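To make such results repeatable, the generator can be seeded. The sketch below shows this with the functions used above and, as an alternative, with the newer Generator interface obtained from np.random.default_rng. The seed value 0 is an arbitrary assumption.

```python
import numpy as np

# Legacy interface: seed the global generator, then draw as before
np.random.seed(0)  # arbitrary seed, chosen only for reproducibility
a = np.random.rand(3, 2)   # uniform on [0.0, 1.0)
b = np.random.randn(4, 2)  # standard normal

# Generator interface: the shape is passed as a single tuple
rng = np.random.default_rng(0)
u = rng.random((3, 2))           # uniform on [0.0, 1.0)
n = rng.standard_normal((4, 2))  # standard normal

print(a.shape, b.shape)  # (3, 2) (4, 2)
print(u.shape, n.shape)  # (3, 2) (4, 2)
```

Re-running the script with the same seed produces the same arrays, which is helpful when debugging or comparing results.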

------------------------------------------------------------------------------------------------------------------------------------------------

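Where there is a will there is a way: the hundred-and-two passes of Qin fell to Chu in the end.

Heaven does not fail those who persevere: three thousand armored soldiers of Yue were enough to swallow Wu.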
