Datawhale Data Analysis, Chapter 1, Section 1: Data Loading and Initial Observation

2023-05-16

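Review: The main goal of this course is to use real data and hands-on practice to learn the data analysis workflow and to become familiar with basic data analysis operations in Python. With that goal in mind, we now formally begin the hands-on part: completing the Titanic task on Kaggle and working through the full data analysis workflow.
Two resources are recommended:
the textbook "Python for Data Analysis", and baidu.com & google.com (make good use of search engines).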

1 Chapter 1: Data Loading and Initial Observation

1.1 Loading the Data

Dataset download: https://www.kaggle.com/c/titanic/overview

1.1.1 Task 1: Import numpy and pandas

import numpy as np
import pandas as pd

[Hint] If the import fails, learn how to install the numpy and pandas libraries in your Python environment.
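
One way to check whether both libraries are available before importing them is sketched below. The helper name is_installed is made up for this illustration, and the pip command in the message is only the most common way to install a missing package; your environment (conda, for example) may use a different one.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


for name in ("numpy", "pandas"):
    if is_installed(name):
        print(name, "-> installed")
    else:
        # Assumption: pip is the package manager in use
        print(name, "-> missing, try: pip install " + name)
```

If a package is reported as missing, install it from a terminal and then restart the notebook kernel before running the import cell again.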

1.1.2 Task 2: Load the data

(1) Load the data using a relative path
(2) Load the data using an absolute path

import os 
os.getcwd()
'/Users/liubaoyun/Desktop/Datawhale 数据分析/Datawhale数据分析/第一单元项目集合'
abs_path = os.path.abspath('train.csv')
abs_path
'/Users/liubaoyun/Desktop/Datawhale 数据分析/Datawhale数据分析/第一单元项目集合/train.csv'
abs_train = pd.read_csv(abs_path)
rel_train = pd.read_csv('train.csv')
train = abs_train

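[Hint] If loading with a relative path raises an error, use os.getcwd() to check the current working directory.
[Think] Now that you know how to load data, try out the difference between pd.read_csv() and pd.read_table(). What would you need to do to make them produce the same result? Look into the difference between '.tsv' and '.csv' files: how would you load each of them?
[Summary] Loading the data is the first step of all the work. We will come across different data formats (e.g. .csv, .tsv, .xlsx), but the method and the approach for loading them are the same. In future work and projects, when you hit a problem you have not seen before, look things up, use Google, understand the business logic, and be clear about what the input and the output are.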
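
A minimal sketch of the read_csv / read_table question follows. It assumes a tiny inline sample (three Titanic-style columns, invented for the illustration) instead of the real train.csv so that it runs anywhere. The two functions differ mainly in their default separator: a comma for read_csv and a tab for read_table.

```python
import io

import pandas as pd

# Hypothetical miniature data set: the same rows as CSV and as TSV text
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"
tsv_text = csv_text.replace(",", "\t")

from_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default, so comma-separated text
# ends up in one single column
from_table_default = pd.read_table(io.StringIO(csv_text))

# Passing sep="," makes read_table behave like read_csv
from_table_comma = pd.read_table(io.StringIO(csv_text), sep=",")

# A .tsv file can be loaded with read_csv by passing sep="\t"
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(from_table_default.shape)
print(from_csv.equals(from_table_comma))
print(from_csv.equals(from_tsv))
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.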

1.1.3 Task 3: Read the data chunk by chunk, 1000 rows per chunk

# Before trying this on a large file, adjust pandas' display settings
pd.options.display.max_rows=10
train
     PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0              1         0       3  Braund, Mr. Owen Harris                            male    22.0      1      0  A/5 21171          7.2500  NaN    S
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833  C85    C
2              3         1       3  Heikkinen, Miss. Laina                             female  26.0      0      0  STON/O2. 3101282   7.9250  NaN    S
3              4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000  C123   S
4              5         0       3  Allen, Mr. William Henry                           male    35.0      0      0  373450             8.0500  NaN    S
..
886          887         0       2  Montvila, Rev. Juozas                              male    27.0      0      0  211536            13.0000  NaN    S
887          888         1       1  Graham, Miss. Margaret Edith                       female  19.0      0      0  112053            30.0000  B42    S
888          889         0       3  Johnston, Miss. Catherine Helen "Carrie"           female   NaN      1      2  W./C. 6607        23.4500  NaN    S
889          890         1       1  Behr, Mr. Karl Howell                              male    26.0      0      0  111369            30.0000  C148   C
890          891         0       3  Dooley, Mr. Patrick                                male    32.0      0      0  370376             7.7500  NaN    Q

891 rows × 12 columns

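# The task statement asks for 1000-row chunks; chunksize=100 is used here,
# presumably so that the 891-row file actually splits into several chunks
chunker = pd.read_csv('train.csv',chunksize=100)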
chunker
<pandas.io.parsers.TextFileReader at 0x7fa713369250>
for piece in chunker:
    print('chunk_train')
    print('\n')
    print(piece)
chunk_train


    PassengerId  Survived  Pclass  \
0             1         0       3   
1             2         1       1   
2             3         1       3   
3             4         1       1   
4             5         0       3   
..          ...       ...     ...   
95           96         0       3   
96           97         0       1   
97           98         1       1   
98           99         1       2   
99          100         0       2   

                                                 Name     Sex   Age  SibSp  \
0                             Braund, Mr. Owen Harris    male  22.0      1   
1   Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                              Heikkinen, Miss. Laina  female  26.0      0   
3        Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                            Allen, Mr. William Henry    male  35.0      0   
..                                                ...     ...   ...    ...   
95                        Shorney, Mr. Charles Joseph    male   NaN      0   
96                          Goldschmidt, Mr. George B    male  71.0      0   
97                    Greenfield, Mr. William Bertram    male  23.0      0   
98               Doling, Mrs. John T (Ada Julia Bone)  female  34.0      0   
99                                  Kantor, Mr. Sinai    male  34.0      1   

    Parch            Ticket     Fare    Cabin Embarked  
0       0         A/5 21171   7.2500      NaN        S  
1       0          PC 17599  71.2833      C85        C  
2       0  STON/O2. 3101282   7.9250      NaN        S  
3       0            113803  53.1000     C123        S  
4       0            373450   8.0500      NaN        S  
..    ...               ...      ...      ...      ...  
95      0            374910   8.0500      NaN        S  
96      0          PC 17754  34.6542       A5        C  
97      1          PC 17759  63.3583  D10 D12        C  
98      1            231919  23.0000      NaN        S  
99      0            244367  26.0000      NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass                                    Name  \
100          101         0       3                 Petranec, Miss. Matilda   
101          102         0       3        Petroff, Mr. Pastcho ("Pentcho")   
102          103         0       1               White, Mr. Richard Frasar   
103          104         0       3              Johansson, Mr. Gustaf Joel   
104          105         0       3          Gustafsson, Mr. Anders Vilhelm   
..           ...       ...     ...                                     ...   
195          196         1       1                    Lurette, Miss. Elise   
196          197         0       3                     Mernagh, Mr. Robert   
197          198         0       3        Olsen, Mr. Karl Siegwart Andreas   
198          199         1       3        Madigan, Miss. Margaret "Maggie"   
199          200         0       2  Yrois, Miss. Henriette ("Mrs Harbeck")   

        Sex   Age  SibSp  Parch    Ticket      Fare Cabin Embarked  
100  female  28.0      0      0    349245    7.8958   NaN        S  
101    male   NaN      0      0    349215    7.8958   NaN        S  
102    male  21.0      0      1     35281   77.2875   D26        S  
103    male  33.0      0      0      7540    8.6542   NaN        S  
104    male  37.0      2      0   3101276    7.9250   NaN        S  
..      ...   ...    ...    ...       ...       ...   ...      ...  
195  female  58.0      0      0  PC 17569  146.5208   B80        C  
196    male   NaN      0      0    368703    7.7500   NaN        Q  
197    male  42.0      0      1      4579    8.4042   NaN        S  
198  female   NaN      0      0    370370    7.7500   NaN        Q  
199  female  24.0      0      0    248747   13.0000   NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
200          201         0       3   
201          202         0       3   
202          203         0       3   
203          204         0       3   
204          205         1       3   
..           ...       ...     ...   
295          296         0       1   
296          297         0       3   
297          298         0       1   
298          299         1       1   
299          300         1       1   

                                                Name     Sex   Age  SibSp  \
200                   Vande Walle, Mr. Nestor Cyriel    male  28.0      0   
201                              Sage, Mr. Frederick    male   NaN      8   
202                       Johanson, Mr. Jakob Alfred    male  34.0      0   
203                             Youseff, Mr. Gerious    male  45.5      0   
204                         Cohen, Mr. Gurshon "Gus"    male  18.0      0   
..                                               ...     ...   ...    ...   
295                                Lewy, Mr. Ervin G    male   NaN      0   
296                               Hanna, Mr. Mansour    male  23.5      0   
297                     Allison, Miss. Helen Loraine  female   2.0      1   
298                            Saalfeld, Mr. Adolphe    male   NaN      0   
299  Baxter, Mrs. James (Helene DeLaudeniere Chaput)  female  50.0      0   

     Parch    Ticket      Fare    Cabin Embarked  
200      0    345770    9.5000      NaN        S  
201      2  CA. 2343   69.5500      NaN        S  
202      0   3101264    6.4958      NaN        S  
203      0      2628    7.2250      NaN        C  
204      0  A/5 3540    8.0500      NaN        S  
..     ...       ...       ...      ...      ...  
295      0  PC 17612   27.7208      NaN        C  
296      0      2693    7.2292      NaN        C  
297      2    113781  151.5500  C22 C26        S  
298      0     19988   30.5000     C106        S  
299      1  PC 17558  247.5208  B58 B60        C  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass                                      Name  \
300          301         1       3  Kelly, Miss. Anna Katherine "Annie Kate"   
301          302         1       3                        McCoy, Mr. Bernard   
302          303         0       3           Johnson, Mr. William Cahoone Jr   
303          304         1       2                       Keane, Miss. Nora A   
304          305         0       3         Williams, Mr. Howard Hugh "Harry"   
..           ...       ...     ...                                       ...   
395          396         0       3                       Johansson, Mr. Erik   
396          397         0       3                       Olsson, Miss. Elina   
397          398         0       2                   McKane, Mr. Peter David   
398          399         0       2                          Pain, Dr. Alfred   
399          400         1       2          Trout, Mrs. William H (Jessie L)   

        Sex   Age  SibSp  Parch    Ticket     Fare Cabin Embarked  
300  female   NaN      0      0      9234   7.7500   NaN        Q  
301    male   NaN      2      0    367226  23.2500   NaN        Q  
302    male  19.0      0      0      LINE   0.0000   NaN        S  
303  female   NaN      0      0    226593  12.3500  E101        Q  
304    male   NaN      0      0  A/5 2466   8.0500   NaN        S  
..      ...   ...    ...    ...       ...      ...   ...      ...  
395    male  22.0      0      0    350052   7.7958   NaN        S  
396  female  31.0      0      0    350407   7.8542   NaN        S  
397    male  46.0      0      0     28403  26.0000   NaN        S  
398    male  23.0      0      0    244278  10.5000   NaN        S  
399  female  28.0      0      0    240929  12.6500   NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
400          401         1       3   
401          402         0       3   
402          403         0       3   
403          404         0       3   
404          405         0       3   
..           ...       ...     ...   
495          496         0       3   
496          497         1       1   
497          498         0       3   
498          499         0       1   
499          500         0       3   

                                                Name     Sex   Age  SibSp  \
400                               Niskanen, Mr. Juha    male  39.0      0   
401                                  Adams, Mr. John    male  26.0      0   
402                         Jussila, Miss. Mari Aina  female  21.0      1   
403                   Hakkarainen, Mr. Pekka Pietari    male  28.0      1   
404                          Oreskovic, Miss. Marija  female  20.0      0   
..                                               ...     ...   ...    ...   
495                            Yousseff, Mr. Gerious    male   NaN      0   
496                   Eustis, Miss. Elizabeth Mussey  female  54.0      1   
497                  Shellard, Mr. Frederick William    male   NaN      0   
498  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25.0      1   
499                               Svensson, Mr. Olof    male  24.0      0   

     Parch             Ticket      Fare    Cabin Embarked  
400      0  STON/O 2. 3101289    7.9250      NaN        S  
401      0             341826    8.0500      NaN        S  
402      0               4137    9.8250      NaN        S  
403      0   STON/O2. 3101279   15.8500      NaN        S  
404      0             315096    8.6625      NaN        S  
..     ...                ...       ...      ...      ...  
495      0               2627   14.4583      NaN        C  
496      0              36947   78.2667      D20        C  
497      0          C.A. 6212   15.1000      NaN        S  
498      2             113781  151.5500  C22 C26        S  
499      0             350035    7.7958      NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
500          501         0       3   
501          502         0       3   
502          503         0       3   
503          504         0       3   
504          505         1       1   
..           ...       ...     ...   
595          596         0       3   
596          597         1       2   
597          598         0       3   
598          599         0       3   
599          600         1       1   

                                             Name     Sex   Age  SibSp  Parch  \
500                              Calic, Mr. Petar    male  17.0      0      0   
501                           Canavan, Miss. Mary  female  21.0      0      0   
502                O'Sullivan, Miss. Bridget Mary  female   NaN      0      0   
503                Laitinen, Miss. Kristina Sofia  female  37.0      0      0   
504                         Maioni, Miss. Roberta  female  16.0      0      0   
..                                            ...     ...   ...    ...    ...   
595                   Van Impe, Mr. Jean Baptiste    male  36.0      1      1   
596                    Leitch, Miss. Jessie Wills  female   NaN      0      0   
597                           Johnson, Mr. Alfred    male  49.0      0      0   
598                             Boulos, Mr. Hanna    male   NaN      0      0   
599  Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")    male  49.0      1      0   

       Ticket     Fare Cabin Embarked  
500    315086   8.6625   NaN        S  
501    364846   7.7500   NaN        Q  
502    330909   7.6292   NaN        Q  
503      4135   9.5875   NaN        S  
504    110152  86.5000   B79        S  
..        ...      ...   ...      ...  
595    345773  24.1500   NaN        S  
596    248727  33.0000   NaN        S  
597      LINE   0.0000   NaN        S  
598      2664   7.2250   NaN        C  
599  PC 17485  56.9292   A20        C  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
600          601         1       2   
601          602         0       3   
602          603         0       1   
603          604         0       3   
604          605         1       1   
..           ...       ...     ...   
695          696         0       2   
696          697         0       3   
697          698         1       3   
698          699         0       1   
699          700         0       3   

                                                  Name     Sex   Age  SibSp  \
600  Jacobsohn, Mrs. Sidney Samuel (Amy Frances Chr...  female  24.0      2   
601                               Slabenoff, Mr. Petco    male   NaN      0   
602                          Harrington, Mr. Charles H    male   NaN      0   
603                          Torber, Mr. Ernst William    male  44.0      0   
604                    Homer, Mr. Harry ("Mr E Haven")    male  35.0      0   
..                                                 ...     ...   ...    ...   
695                         Chapman, Mr. Charles Henry    male  52.0      0   
696                                   Kelly, Mr. James    male  44.0      0   
697                   Mullens, Miss. Katherine "Katie"  female   NaN      0   
698                           Thayer, Mr. John Borland    male  49.0      1   
699           Humblen, Mr. Adolf Mathias Nicolai Olsen    male  42.0      0   

     Parch  Ticket      Fare  Cabin Embarked  
600      1  243847   27.0000    NaN        S  
601      0  349214    7.8958    NaN        S  
602      0  113796   42.4000    NaN        S  
603      0  364511    8.0500    NaN        S  
604      0  111426   26.5500    NaN        C  
..     ...     ...       ...    ...      ...  
695      0  248731   13.5000    NaN        S  
696      0  363592    8.0500    NaN        S  
697      0   35852    7.7333    NaN        Q  
698      1   17421  110.8833    C68        C  
699      0  348121    7.6500  F G63        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
700          701         1       1   
701          702         1       1   
702          703         0       3   
703          704         0       3   
704          705         0       3   
..           ...       ...     ...   
795          796         0       2   
796          797         1       1   
797          798         1       3   
798          799         0       3   
799          800         0       3   

                                                  Name     Sex   Age  SibSp  \
700  Astor, Mrs. John Jacob (Madeleine Talmadge Force)  female  18.0      1   
701                   Silverthorne, Mr. Spencer Victor    male  35.0      0   
702                              Barbara, Miss. Saiide  female  18.0      0   
703                              Gallagher, Mr. Martin    male  25.0      0   
704                            Hansen, Mr. Henrik Juul    male  26.0      1   
..                                                 ...     ...   ...    ...   
795                                 Otter, Mr. Richard    male  39.0      0   
796                        Leader, Dr. Alice (Farnham)  female  49.0      0   
797                                   Osman, Mrs. Mara  female  31.0      0   
798                       Ibrahim Shawah, Mr. Yousseff    male  30.0      0   
799  Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go...  female  30.0      1   

     Parch    Ticket      Fare    Cabin Embarked  
700      0  PC 17757  227.5250  C62 C64        C  
701      0  PC 17475   26.2875      E24        S  
702      1      2691   14.4542      NaN        C  
703      0     36864    7.7417      NaN        Q  
704      0    350025    7.8542      NaN        S  
..     ...       ...       ...      ...      ...  
795      0     28213   13.0000      NaN        S  
796      0     17465   25.9292      D17        S  
797      0    349244    8.6833      NaN        S  
798      0      2685    7.2292      NaN        C  
799      1    345773   24.1500      NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
800          801         0       2   
801          802         1       2   
802          803         1       1   
803          804         1       3   
804          805         1       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                            Name     Sex    Age  SibSp  Parch  \
800                         Ponesell, Mr. Martin    male  34.00      0      0   
801  Collyer, Mrs. Harvey (Charlotte Annie Tate)  female  31.00      1      1   
802          Carter, Master. William Thornton II    male  11.00      1      2   
803              Thomas, Master. Assad Alexander    male   0.42      0      1   
804                      Hedman, Mr. Oskar Arvid    male  27.00      0      0   
..                                           ...     ...    ...    ...    ...   
886                        Montvila, Rev. Juozas    male  27.00      0      0   
887                 Graham, Miss. Margaret Edith  female  19.00      0      0   
888     Johnston, Miss. Catherine Helen "Carrie"  female    NaN      1      2   
889                        Behr, Mr. Karl Howell    male  26.00      0      0   
890                          Dooley, Mr. Patrick    male  32.00      0      0   

         Ticket      Fare    Cabin Embarked  
800      250647   13.0000      NaN        S  
801  C.A. 31921   26.2500      NaN        S  
802      113760  120.0000  B96 B98        S  
803        2625    8.5167      NaN        C  
804      347089    6.9750      NaN        S  
..          ...       ...      ...      ...  
886      211536   13.0000      NaN        S  
887      112053   30.0000      B42        S  
888  W./C. 6607   23.4500      NaN        S  
889      111369   30.0000     C148        C  
890      370376    7.7500      NaN        Q  

[91 rows x 12 columns]

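[Think] What is chunked reading? Why read the data chunk by chunk?

[Hint] What type is chunker (the chunked reader)? What does each piece look like when you print it in a for loop?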
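
To make the purpose of chunked reading more concrete, here is a small sketch under stated assumptions: the 25-row inline CSV is a hypothetical stand-in for a file too large to fit in memory, and the running total is just one example of work that can be done one chunk at a time. Each piece yielded by the reader is an ordinary DataFrame.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file: 25 rows, Survived alternates 1/0
rows = ["PassengerId,Survived"] + [f"{i},{i % 2}" for i in range(1, 26)]
csv_text = "\n".join(rows)

# With chunksize set, read_csv returns an iterator instead of a DataFrame
chunker = pd.read_csv(io.StringIO(csv_text), chunksize=10)
print(type(chunker).__name__)

chunk_sizes = []
survived_total = 0
for piece in chunker:
    # Only one chunk of rows is held in memory at a time
    chunk_sizes.append(len(piece))
    survived_total += int(piece["Survived"].sum())

print(chunk_sizes)
print(survived_total)
```

The last chunk is simply shorter than the others when the row count is not a multiple of chunksize, which matches the 91-row final chunk in the output above.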

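1.1.4 Task 4: Change the column headers to Chinese and set the index to the passenger ID [for English-language material, translating it gives a more intuitive feel for the data]

PassengerId => 乘客ID (passenger ID)
Survived => 是否幸存 (survived or not)
Pclass => 乘客等级(1/2/3等舱位) (passenger class: 1st/2nd/3rd)
Name => 乘客姓名 (passenger name)
Sex => 性别 (sex)
Age => 年龄 (age)
SibSp => 堂兄弟/妹个数 (literally "number of cousins"; note that in the Kaggle data dictionary SibSp counts siblings and spouses aboard)
Parch => 父母与小孩个数 (number of parents and children aboard)
Ticket => 船票信息 (ticket information)
Fare => 票价 (fare)
Cabin => 客舱 (cabin)
Embarked => 登船港口 (port of embarkation)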

train.head()
     PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0              1         0       3  Braund, Mr. Owen Harris                            male    22.0      1      0  A/5 21171          7.2500  NaN    S
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833  C85    C
2              3         1       3  Heikkinen, Miss. Laina                             female  26.0      0      0  STON/O2. 3101282   7.9250  NaN    S
3              4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000  C123   S
4              5         0       3  Allen, Mr. William Henry                           male    35.0      0      0  373450             8.0500  NaN    S
train.columns=['乘客ID','是否幸存','乘客等级(1/2/3等舱位)',
               '乘客姓名','性别','年龄','堂兄弟/妹个数',
               '父母与小孩个数','船票信息','票价','客舱','登船港口'
    ]
train
     乘客ID  是否幸存  乘客等级(1/2/3等舱位)  乘客姓名                                           性别    年龄  堂兄弟/妹个数  父母与小孩个数  船票信息             票价  客舱   登船港口
0         1         0                      3  Braund, Mr. Owen Harris                            male    22.0              1               0  A/5 21171          7.2500  NaN    S
1         2         1                      1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0              1               0  PC 17599          71.2833  C85    C
2         3         1                      3  Heikkinen, Miss. Laina                             female  26.0              0               0  STON/O2. 3101282   7.9250  NaN    S
3         4         1                      1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0              1               0  113803            53.1000  C123   S
4         5         0                      3  Allen, Mr. William Henry                           male    35.0              0               0  373450             8.0500  NaN    S
..
886     887         0                      2  Montvila, Rev. Juozas                              male    27.0              0               0  211536            13.0000  NaN    S
887     888         1                      1  Graham, Miss. Margaret Edith                       female  19.0              0               0  112053            30.0000  B42    S
888     889         0                      3  Johnston, Miss. Catherine Helen "Carrie"           female   NaN              1               2  W./C. 6607        23.4500  NaN    S
889     890         1                      1  Behr, Mr. Karl Howell                              male    26.0              0               0  111369            30.0000  C148   C
890     891         0                      3  Dooley, Mr. Patrick                                male    32.0              0               0  370376             7.7500  NaN    Q

891 rows × 12 columns

[Think] One way to change the headers to Chinese is to replace the English column names with Chinese ones, as above. Are there other ways?
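
Two possible alternatives are sketched here, again on a small inline sample rather than the real file (the three-column data and the variable names are assumptions for the illustration). Both also set the passenger ID as the index, which the task title asks for but which assigning to train.columns alone does not do.

```python
import io

import pandas as pd

# Hypothetical three-column excerpt of the Titanic data
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"
mapping = {
    "PassengerId": "乘客ID",
    "Survived": "是否幸存",
    "Pclass": "乘客等级(1/2/3等舱位)",
}

# Option 1: load first, then rename via a dict and move the ID into the index
renamed = (
    pd.read_csv(io.StringIO(csv_text))
    .rename(columns=mapping)
    .set_index("乘客ID")
)

# Option 2: supply the Chinese names while loading; header=0 tells pandas
# to discard the original English header row
direct = pd.read_csv(
    io.StringIO(csv_text),
    names=list(mapping.values()),
    header=0,
    index_col="乘客ID",
)

print(renamed)
print(direct)
```

A dict-based rename has the advantage that it does not depend on the column order, whereas assigning a full list to columns silently mislabels the data if the order is wrong.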

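1.2 Initial Observation

After importing the data, you will probably want an overview of its overall structure and a few sample rows: for example, the size of the data, how many columns there are, the format of each column, and whether it contains nulls.

1.2.1 Task 1: View basic information about the data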

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   乘客ID            891 non-null    int64  
 1   是否幸存            891 non-null    int64  
 2   乘客等级(1/2/3等舱位)  891 non-null    int64  
 3   乘客姓名            891 non-null    object 
 4   性别              891 non-null    object 
 5   年龄              714 non-null    float64
 6   堂兄弟/妹个数         891 non-null    int64  
 7   父母与小孩个数         891 non-null    int64  
 8   船票信息            891 non-null    object 
 9   票价              891 non-null    float64
 10  客舱              204 non-null    object 
 11  登船港口            889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

[Hint] Several functions can do this; try summarizing them.
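
As a starting point for that summary, the sketch below lists a few commonly used inspection attributes and methods besides info(). The three-row sample frame is invented for the illustration.

```python
import io

import pandas as pd

# Hypothetical sample with one missing Age value
csv_text = "Name,Age,Fare\nA,22.0,7.25\nB,,71.2833\nC,26.0,7.925\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column
print(df.describe())        # count, mean, std, min, quartiles, max for numeric columns

summary = df.describe()
```

Note that describe() counts only non-null values, so its count row is another quick way to spot columns with missing data.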

1.2.2 Task 2: Look at the first 10 rows and the last 15 rows of the table

# write your code here
train[:10]
     乘客ID  是否幸存  乘客等级(1/2/3等舱位)  乘客姓名                                           性别    年龄  堂兄弟/妹个数  父母与小孩个数  船票信息             票价  客舱   登船港口
0         1         0                      3  Braund, Mr. Owen Harris                            male    22.0              1               0  A/5 21171          7.2500  NaN    S
1         2         1                      1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0              1               0  PC 17599          71.2833  C85    C
2         3         1                      3  Heikkinen, Miss. Laina                             female  26.0              0               0  STON/O2. 3101282   7.9250  NaN    S
3         4         1                      1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0              1               0  113803            53.1000  C123   S
4         5         0                      3  Allen, Mr. William Henry                           male    35.0              0               0  373450             8.0500  NaN    S
5         6         0                      3  Moran, Mr. James                                   male     NaN              0               0  330877             8.4583  NaN    Q
6         7         0                      1  McCarthy, Mr. Timothy J                            male    54.0              0               0  17463             51.8625  E46    S
7         8         0                      3  Palsson, Master. Gosta Leonard                     male     2.0              3               1  349909            21.0750  NaN    S
8         9         1                      3  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  female  27.0              0               2  347742            11.1333  NaN    S
9        10         1                      2  Nasser, Mrs. Nicholas (Adele Achem)                female  14.0              1               0  237736            30.0708  NaN    C
train[-15:]
     乘客ID  是否幸存  乘客等级(1/2/3等舱位)  乘客姓名                                           性别    年龄  堂兄弟/妹个数  父母与小孩个数  船票信息             票价  客舱   登船港口
876     877         0                      3  Gustafsson, Mr. Alfred Ossian                      male    20.0              0               0  7534               9.8458  NaN    S
877     878         0                      3  Petroff, Mr. Nedelio                               male    19.0              0               0  349212             7.8958  NaN    S
878     879         0                      3  Laleff, Mr. Kristo                                 male     NaN              0               0  349217             7.8958  NaN    S
879     880         1                      1  Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)      female  56.0              0               1  11767             83.1583  C50    C
880     881         1                      2  Shelley, Mrs. William (Imanita Parrish Hall)       female  25.0              0               1  230433            26.0000  NaN    S
..
886     887         0                      2  Montvila, Rev. Juozas                              male    27.0              0               0  211536            13.0000  NaN    S
887     888         1                      1  Graham, Miss. Margaret Edith                       female  19.0              0               0  112053            30.0000  B42    S
888     889         0                      3  Johnston, Miss. Catherine Helen "Carrie"           female   NaN              1               2  W./C. 6607        23.4500  NaN    S
889     890         1                      1  Behr, Mr. Karl Howell                              male    26.0              0               0  111369            30.0000  C148   C
890     891         0                      3  Dooley, Mr. Patrick                                male    32.0              0               0  370376             7.7500  NaN    Q

15 rows × 12 columns
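
The slices train[:10] and train[-15:] can also be written with head() and tail(). The sketch below uses a hypothetical 30-row frame to show the equivalence, and it temporarily raises the display limit: with pd.options.display.max_rows set to 10 earlier, a 15-row result is shown truncated, as in the output above.

```python
import pandas as pd

# Hypothetical 30-row frame standing in for train
df = pd.DataFrame({"PassengerId": range(1, 31)})

first_ten = df.head(10)     # same rows as df[:10]
last_fifteen = df.tail(15)  # same rows as df[-15:]

# option_context changes the setting only inside the with-block,
# so all 15 rows are printed here without altering the global option
with pd.option_context("display.max_rows", 20):
    print(last_fifteen)
```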

1.2.3 Task 3: Check whether the data is null; return True where it is null and False elsewhere

train.isnull()
     乘客ID  是否幸存  乘客等级(1/2/3等舱位)  乘客姓名   性别   年龄  堂兄弟/妹个数  父母与小孩个数  船票信息   票价   客舱  登船港口
0     False     False                  False     False  False  False          False           False     False  False   True     False
1     False     False                  False     False  False  False          False           False     False  False  False     False
2     False     False                  False     False  False  False          False           False     False  False   True     False
3     False     False                  False     False  False  False          False           False     False  False  False     False
4     False     False                  False     False  False  False          False           False     False  False   True     False
..
886   False     False                  False     False  False  False          False           False     False  False   True     False
887   False     False                  False     False  False  False          False           False     False  False  False     False
888   False     False                  False     False  False   True          False           False     False  False   True     False
889   False     False                  False     False  False  False          False           False     False  False  False     False
890   False     False                  False     False  False  False          False           False     False  False   True     False

891 rows × 12 columns

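[Summary] The operations above are all ways of observing the data itself in data analysis.

[Think] From what other angles can a dataset be observed? Look for answers; this will help a great deal with the analysis that follows.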
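
Two further angles that are often useful are sketched below on an invented three-row sample: counting missing values per column (a compact summary of the True/False table from isnull()) and counting distinct values per column.

```python
import io

import pandas as pd

# Hypothetical sample: Age is missing once, Cabin twice
csv_text = "Name,Age,Cabin\nA,22.0,\nB,,C85\nC,26.0,\n"
df = pd.read_csv(io.StringIO(csv_text))

# True counts as 1 when summed, so this gives the number of nulls per column
missing_per_column = df.isnull().sum()

# Number of distinct non-null values in each column
unique_per_column = df.nunique()

print(missing_per_column)
print(unique_per_column)
```

On the real data, the same null count would reproduce what info() already hinted at: the age, cabin, and port-of-embarkation columns have fewer than 891 non-null entries.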

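1.3 Saving the Data

1.3.1 Task 1: Save the data you loaded and modified as a new file, train_chinese.csv, in the working directory

# write your code here
# Note: on some operating systems the saved file may show garbled characters. You can pass encoding='GBK' or encoding='utf-8'.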
train.to_csv('train_chinese.csv',encoding = 'utf8')
c= pd.read_csv('train_chinese.csv')
c.head()
     Unnamed: 0  乘客ID  是否幸存  乘客等级(1/2/3等舱位)  乘客姓名                                           性别    年龄  堂兄弟/妹个数  父母与小孩个数  船票信息             票价  客舱   登船港口
0             0       1         0                      3  Braund, Mr. Owen Harris                            male    22.0              1               0  A/5 21171          7.2500  NaN    S
1             1       2         1                      1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0              1               0  PC 17599          71.2833  C85    C
2             2       3         1                      3  Heikkinen, Miss. Laina                             female  26.0              0               0  STON/O2. 3101282   7.9250  NaN    S
3             3       4         1                      1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0              1               0  113803            53.1000  C123   S
4             4       5         0                      3  Allen, Mr. William Henry                           male    35.0              0               0  373450             8.0500  NaN    S
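
The extra "Unnamed: 0" column in the reloaded table appears because to_csv() writes the row index as the first column by default. A minimal sketch of two ways to avoid it follows, using a hypothetical two-row frame and writing to an in-memory string instead of a file.

```python
import io

import pandas as pd

# Hypothetical two-row frame with Chinese headers
df = pd.DataFrame({"乘客ID": [1, 2], "是否幸存": [0, 1]})

# Default behaviour: the index is written as an unnamed first column
with_index = df.to_csv()
reloaded_default = pd.read_csv(io.StringIO(with_index))

# Fix 1: do not write the index at all
without_index = df.to_csv(index=False)
reloaded_clean = pd.read_csv(io.StringIO(without_index))

# Fix 2: keep the file as it is and read the first column back as the index
reloaded_index_col = pd.read_csv(io.StringIO(with_index), index_col=0)

print(reloaded_default.columns.tolist())
print(reloaded_clean.columns.tolist())
print(reloaded_index_col.columns.tolist())
```

When writing to a real file, the same index=False argument can be combined with encoding. If the file is meant to be opened in Excel, encoding='utf-8-sig' is often suggested so that Chinese headers display correctly; treat that as something to verify on your own system.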

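[Summary] This covers loading the data and getting started. Next we will work with operations on the data itself, focusing on how numpy and pandas are used in work and project settings.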
