Python: pandas and NumPy (Part 1): Creating DataFrames and Simple Operations on Them

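import numpy as np
import pandas as pd

sm = np.arange(0, 20, 2)
smp = pd.Series(sm)
# Note: np.arange() reads positional arguments as (stop), (start, stop) or (start, stop, step).
# You cannot pass only stop and step: np.arange(20, 2) is read as start=20, stop=2
# and returns an empty array.
print(sm)
print(pd.Series(sm))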
'''
[ 0  2  4  6  8 10 12 14 16 18]
The printed Series looks like this:
0     0
1     2
2     4
3     6
4     8
5    10
6    12
7    14
8    16
9    18
dtype: int32
'''
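
The note about np.arange() arguments can be checked with a small sketch. The variable names here are illustrative and not part of the original code:

```python
import numpy as np

# One positional argument: treated as stop, with start defaulting to 0
only_stop = np.arange(5)

# Two positional arguments: treated as (start, stop), NOT (stop, step)
start_stop = np.arange(20, 2)   # start=20 is already past stop=2, so this is empty

# Three positional arguments: (start, stop, step)
full = np.arange(0, 20, 2)

# Passing step by keyword makes the intent explicit
with_keyword = np.arange(0, 20, step=2)

print(only_stop)
print(start_stop)
print(full)
print(with_keyword)
```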

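You will notice an extra column of numbers (0 to 9) on the left. It looks familiar, like an index, and that is exactly what it is: the default index of the Series. If you don't like how it looks, you can replace it (as shown below). If you object that this table has no column header, that can't be helped here: the Series constructor has no columns parameter. Can one be added? Yes, by putting the Series into a DataFrame.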

smp = pd.Series(sm, index=[7, 5, 3000, 0, 2, 4, 1, 8, 73, 88])
print(smp)
'''
7        0
5        2
3000     4
0        6
2        8
4       10
1       12
8       14
73      16
88      18
dtype: int32
'''
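
Although the Series constructor has no columns parameter, a Series can carry an optional name, and that name becomes the column label once the Series is turned into a DataFrame. A minimal sketch, where the name "even" is chosen purely for illustration:

```python
import numpy as np
import pandas as pd

sm = np.arange(0, 20, 2)

# The optional name is stored on the Series itself
named = pd.Series(sm, name="even")

# to_frame() converts the Series into a one-column DataFrame,
# using the Series name as the column label
frame = named.to_frame()

print(frame.columns.tolist())
print(frame.shape)
```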

That covers most of what there is to Series. If you use PyCharm you will notice one more parameter, dtype=. It is an old friend by now: it sets the data type.
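
As a quick illustration of the dtype= parameter, the sketch below builds the same data as floats, and also converts an existing Series with astype(). The variable names are illustrative:

```python
import numpy as np
import pandas as pd

sm = np.arange(0, 20, 2)

# Set the data type when the Series is created
as_float = pd.Series(sm, dtype="float64")

# Or convert an existing Series afterwards
as_str = pd.Series(sm).astype(str)

print(as_float.dtype)
print(as_str.iloc[0])
```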

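2. Creating a DataFrame:

The biggest difference between a DataFrame and a Series is that a DataFrame has multiple rows and multiple columns, while a Series has multiple rows and a single column.

Creating a DataFrame from a dictionary: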

chart = {
    "ID": ["1", "2", "3", "4", "5"],
    "name": ["abi", "baxi", "cine", "deker", "ebby"],
    "gender": [True, False, True, False, False],
    "age": [19, 10, 29, 19, 29],
    "score": [99, 40, 89, 70, 30]
}
finalChart = pd.DataFrame(chart)

Creating a DataFrame from a list of lists:

chart1 = pd.DataFrame([[1, "abi", True, 19, 99], [2, "baxi", False, 10, 40]],
                        columns=["ID", "name", "gender", "age", "score"])

Creating a DataFrame from Series:

dd = {'one': pd.Series([11, 22, 33, 99], index=[1, 2, 3, 4]),
      'two': pd.Series([55, 66, 7], index=[1, 2, 3], dtype=int)}
dc = pd.DataFrame(dd)

'''
   one   two  
1   11  55.0    
2   22  66.0   
3   33   7.0    
4   99   NaN    
'''

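Note: a Series has no column label, so we use the dictionary keys to give each one a label. Positions where the indexes do not line up are filled with NaN. This is also why column 'two' prints as floats even though it was created with dtype=int: NaN is a floating-point value, so the column is converted to float64.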
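
The dtype change caused by alignment can be confirmed directly. The sketch below rebuilds the same DataFrame and inspects the dtypes. The last step is an assumption about what a reader might want, not something the original does: pandas' nullable "Int64" dtype is one way to keep whole numbers alongside a missing value.

```python
import pandas as pd

one = pd.Series([11, 22, 33, 99], index=[1, 2, 3, 4])
two = pd.Series([55, 66, 7], index=[1, 2, 3], dtype=int)
dc = pd.DataFrame({"one": one, "two": two})

print(two.dtype)               # integer dtype before alignment
print(dc["two"].dtype)         # float64 after alignment, because of the NaN
print(dc["two"].isna().sum())  # number of missing values in the column

# Assumption: if integers must be preserved, the nullable Int64 dtype is one option
two_nullable = dc["two"].astype("Int64")
print(two_nullable.dtype)
```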

Creating a DataFrame with NumPy functions:

am1 = pd.DataFrame(np.random.rand(1, 3), columns=['a', 'b', 'c'])
am2 = pd.DataFrame(np.random.rand(3, 3), columns=['a', 'b', 'c'])
print(am1, am2)
'''
          a         b         c
0  0.561229  0.789217  0.438709       a         b         c # the second DataFrame starts on the next line
0  0.415782  0.719267  0.210404
1  0.363728  0.759833  0.190822
2  0.332160  0.196373  0.290467
'''

Besides the functions in the np.random module, you can also use np.arange():

wb = pd.DataFrame(np.arange(20).reshape(5, 4), columns=['a', 'b', 'c', 'd'])
'''
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
'''
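
Because np.random.rand() produces different numbers on every run, the printed values above will not match yours. If reproducible values are wanted, one option is a seeded generator. This is a sketch using NumPy's default_rng, with the seed 42 chosen arbitrarily:

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run
rng = np.random.default_rng(seed=42)
am = pd.DataFrame(rng.random((3, 3)), columns=["a", "b", "c"])

# reshape() turns the flat range 0..19 into 5 rows of 4 columns
wb = pd.DataFrame(np.arange(20).reshape(5, 4), columns=["a", "b", "c", "d"])

print(am.shape, wb.shape)
```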

3. Adding another DataFrame to a DataFrame with the concat() function:

chart = {
    "ID": ["1", "2", "3", "4", "5"],
    "name": ["abi", "baxi", "cine", "deker", "ebby"],
    "gender": [True, False, True, False, False],
    "age": [19, 10, 29, 19, 29],
    "score": [99, 40, 89, 70, 30]
}
finalChart = pd.DataFrame(chart)

bChat = {
    "height": [12, 13, 13, 14, 13]
}
bChat1 = pd.DataFrame(bChat)

# The code above creates two DataFrames
# axis=1 joins them side by side (column-wise)
finalChart = pd.concat([finalChart, bChat1], axis=1)
print(finalChart)
'''
  ID   name  gender  age  score  height
0  1    abi    True   19     99      12
1  2   baxi   False   10     40      13
2  3   cine    True   29     89      13
3  4  deker   False   19     70      14
4  5   ebby   False   29     30      13
'''

finalChart = pd.concat([bChat1, finalChart], axis=1)
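# Same call with the two DataFrames swapped. The output below assumes finalChart
# is still the original one, without the height column added in the previous step.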
print(finalChart)
'''
   height ID   name  gender  age  score
0      12  1    abi    True   19     99
1      13  2   baxi   False   10     40
2      13  3   cine    True   29     89
3      14  4  deker   False   19     70
4      13  5   ebby   False   29     30
'''

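pandas also used to offer append(). From pandas 1.4 on it printed a warning that append() would be removed and recommended concat() instead, and it was removed in pandas 2.0. Note that DataFrame.append() added rows, so it corresponds to concat() along axis=0 rather than the axis=1 used above. Either way, do not confuse it with NumPy's append() function!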
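
For completeness, here is a sketch of the row-wise case that append() used to cover, written with concat(). The two small DataFrames are made up for the illustration:

```python
import pandas as pd

first = pd.DataFrame({"ID": ["1", "2"], "name": ["abi", "baxi"]})
extra = pd.DataFrame({"ID": ["3"], "name": ["cine"]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers the index from 0
rows = pd.concat([first, extra], ignore_index=True)

print(rows)
```

Without ignore_index=True the original index labels are kept, so the result would contain the label 0 twice.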

4. Arithmetic between DataFrame columns:

Addition, subtraction, multiplication and division (two examples: addition and floor division):

dd = {'one': pd.Series([11, 22, 33, 99], index=[1, 2, 3, 4]),
      'two': pd.Series([55, 66, 7], index=[1, 2, 3], dtype=int)}
dc = pd.DataFrame(dd)

# A new column is a single column of values, so a Series is the natural type here
dc['three'] = pd.Series([23, 4, 566, 88], index=[1, 2, 3, 4])
'''
   one   two  three
1   11  55.0     23
2   22  66.0      4
3   33   7.0    566
4   99   NaN     88
'''

# Addition
dc['four'] = dc['three'] + dc['two']
print(dc)
'''
   one   two  three   four
1   11  55.0     23   78.0
2   22  66.0      4   70.0
3   33   7.0    566  573.0
4   99   NaN     88    NaN
'''

# Floor division (integer quotient); this overwrites column 'four'
dc['four'] = dc['three'] // dc['two']
print(dc)
'''
   one   two  three  four
1   11  55.0     23   0.0
2   22  66.0      4   0.0
3   33   7.0    566  80.0
4   99   NaN     88   NaN
'''
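
In both results, row 4 is NaN because any arithmetic with NaN yields NaN. If a missing value should instead be treated as 0, the add() method with fill_value is one option. A minimal sketch on the same numbers:

```python
import pandas as pd

dc = pd.DataFrame({
    "two": pd.Series([55, 66, 7], index=[1, 2, 3]),
    "three": pd.Series([23, 4, 566, 88], index=[1, 2, 3, 4]),
})

# The + operator propagates the NaN in row 4
plain = dc["three"] + dc["two"]

# add() with fill_value=0 treats the missing value as 0 before adding
filled = dc["three"].add(dc["two"], fill_value=0)

print(plain)
print(filled)
```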

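5. Two ways to delete a column: del and pop():

del:

del works like deleting a key-value pair from a dictionary. Strictly speaking it is a Python statement rather than a function. It can delete any column.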

# In the output below, 'four' holds the addition result (dc['three'] + dc['two'])
del dc['three']
print(dc)
'''
   one   two   four
1   11  55.0   78.0
2   22  66.0   70.0
3   33   7.0  573.0
4   99   NaN    NaN
'''

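pop():

This can also delete any column.

# No assignment is needed: pop() modifies the DataFrame in place and returns the removed column.
# The argument passed to pop() is the column name.
dc.pop('two')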
print(dc)
'''
   one   four
1   11   78.0
2   22   70.0
3   33  573.0
4   99    NaN
'''

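That reminds me: pop() on a stack (a list) removes the last element by default when called without an argument. What happens if no argument is passed here?

It raises an error (a TypeError, because the column name is a required argument).

Advance a little every day, and no effort is wasted.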
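
Both behaviours of pop() can be checked in a few lines. The sketch below uses a small made-up DataFrame. The drop() call at the end is an addition for comparison, not something the original covers: it returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

dc = pd.DataFrame({"one": [11, 22, 33], "two": [55, 66, 7], "four": [78, 70, 573]})

# pop() removes the column in place and returns it as a Series
removed = dc.pop("two")

# Calling pop() without a column name raises TypeError
error_name = None
try:
    dc.pop()
except TypeError as exc:
    error_name = type(exc).__name__

# drop() returns a new DataFrame; dc itself still has the 'four' column
smaller = dc.drop(columns=["four"])

print(removed.tolist())
print(error_name)
print(dc.columns.tolist(), smaller.columns.tolist())
```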
