Shell scripting: basic usage of awk

2023-05-16

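awk, sed, and grep are often called the "three musketeers" of text processing.

(1) Basic usage of awk

a) By default, awk splits each line on whitespace: one or more spaces, one or more tabs (\t), or any mix of the two. Leading and trailing whitespace is ignored.

For example, take the file testfile: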

[root@172-0-10-222 shell-test]# cat testfile 
ll 201907001 80 97 70
kk 201907002 90 97 90
hh 201907003 60 67 60
jj 201907004 59 57 58
aa 201907005 23 34 12

Use awk to print the first column of the file:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print $1}'
ll
kk
hh
jj
aa

b) Suppose the fields are separated by runs of several spaces and tabs, as in this file:

[root@172-0-10-222 shell-test]# cat testfile 
ll 	201907001 80 97 70
kk 		201907002 90 97 90
hh 201907003 60 67 60
jj 201907004 59 57 58
aa 201907005 23 34 12

awk's default splitting gives the same result as before:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print $1}'
ll
kk
hh
jj
aa
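To check the default splitting without creating a file, the input can be produced with printf. This is a small sketch with made-up lines that mix leading spaces, tabs, and repeated spaces; the variable name first_cols is only for illustration.

```shell
# Sample lines with leading spaces, tabs, and repeated spaces.
# awk's default splitting ignores leading whitespace and treats any
# run of spaces/tabs as one separator.
first_cols=$(printf '  ll \t201907001 80\nkk\t\t201907002   90\n' | awk '{print $1}')
echo "$first_cols"
```

Both lines yield their first word, regardless of how much whitespace surrounds it.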

c) awk can also split text on several different characters. Given this file:

[root@172-0-10-222 shell-test]# cat testfile 
ll:  201907001 80 97 70
kk:      201907002 90 97 90
hh 201907003 60 67 60
jj 201907004 59 57 58
aa 201907005 23 34 12

Split on one or more spaces, one or more colons, or any mix of colons and spaces, by passing a regular expression to -F:

[root@172-0-10-222 shell-test]# cat testfile | awk -F '[: ]+' '{print $1}'
ll
kk
hh
jj
aa
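One difference from the default splitting is worth seeing in a small sketch: with a regular-expression separator, separator characters at the start of a line are not ignored, so they produce an empty first field. The sample line below is made up, and fields is just an illustrative variable name.

```shell
# The line starts with a space, which matches the separator [: ]+,
# so $1 is empty and the first word becomes $2.
fields=$(printf ' ll:  201907001 80\n' |
  awk -F '[: ]+' '{print NF; print "[" $1 "]"; print $2}')
echo "$fields"
```

The output shows four fields, an empty $1 (printed as []), and ll in $2.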

Example: get the local IP address

The IP information looks like this:

[root@172-0-10-222 shell-test]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:87:37:a9 brd ff:ff:ff:ff:ff:ff
    inet 172.0.10.222/16 brd 172.0.255.255 scope global noprefixroute ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::2dea:8641:fee:950f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

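Use awk to extract the IP. Because the separator is the regular expression [ /]+, the leading spaces of the inet line produce an empty $1, so "inet" is $2 and the address is $3: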

[root@172-0-10-222 shell-test]# ip a | grep 'scope global' | awk -F '[ /]+' '{print $3}'
172.0.10.222
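The grep step can probably be folded into awk itself. The following is only a sketch of one alternative, run against a single inet line copied from the output above rather than against a live ip a; it keeps the default whitespace splitting and uses awk's split() function to cut off the /16 prefix length.

```shell
# One line copied from the `ip a` output above; in practice, pipe `ip a` into awk.
sample='    inet 172.0.10.222/16 brd 172.0.255.255 scope global noprefixroute ens32'
# With default splitting, $2 is "172.0.10.222/16"; split() stores the pieces
# around "/" in the array parts.
ip_addr=$(printf '%s\n' "$sample" |
  awk '/scope global/ {split($2, parts, "/"); print parts[1]}')
echo "$ip_addr"
```

Because the pattern /scope global/ does the filtering, no separate grep is needed in this variant.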

(2) The built-in variable NF

NF (number of fields) is the number of fields (columns) the current line has after splitting.

Test file:

[root@172-0-10-222 shell-test]# cat testfile 
 ll 201907001 80 97 70
 kk 201907002 90 97 90
 hh 201907003 67 60
 jj 201907004 59 57 58
 aa 201907005    12

Use NF to print the number of fields in each line:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print NF}'
5
5
4
5
3

Print the last column of each line:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print $NF}'
70
90
60
58
12

Print the second-to-last column of each line:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print $(NF-1)}'
97
97
67
57
201907005
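As the last line of that output shows, $(NF-1) returns a different column when a line has missing fields. A quick sketch of using NF to spot such lines, on three lines taken from the test file (short_lines is an illustrative variable name):

```shell
# Report every line that does not have exactly 5 fields.
short_lines=$(printf ' ll 201907001 80 97 70\n hh 201907003 67 60\n aa 201907005    12\n' |
  awk 'NF != 5 {print NR ": " NF " fields"}')
echo "$short_lines"
```

Here the second and third input lines are reported, with 4 and 3 fields respectively.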

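(3) The built-in variables NR and FNR

NR (number of records) is the line number counted across all input. FNR is the line number within the current file, so it restarts at 1 for each input file.

Test file: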

[root@172-0-10-222 shell-test]# cat testfile 
 ll 201907001 80 97 70
 kk 201907002 90 97 90
 hh 201907003 67 60
 jj 201907004 59 57 58
 aa 201907005    12

Use NR to print the line numbers:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print NR}'
1
2
3
4
5

Use NR and FNR to print, respectively, the overall line numbers and the per-file line numbers of several files.

Test files:

[root@172-0-10-222 shell-test]# cat testfile
 ll 201907001 80 97 70
 kk 201907002 90 97 90
 hh 201907003 67 60
 jj 201907004 59 57 58
 aa 201907005    12
[root@172-0-10-222 shell-test]# cat testfile2 
 ll 201907001 80 97 70
 hh 201907003 67 60
 aa 201907005    12

NR prints the overall line numbers across the files:

[root@172-0-10-222 shell-test]# awk '{print NR}' testfile testfile2
1
2
3
4
5
6
7
8

FNR prints the line numbers of each file independently:

[root@172-0-10-222 shell-test]# awk '{print FNR}' testfile testfile2
1
2
3
4
5
1
2
3
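Printing NR and FNR side by side makes the difference easier to see. This sketch uses two throwaway files with made-up content, created in a temporary directory so it does not touch testfile.

```shell
# Two temporary files with made-up content.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/one"
printf 'x\ny\n' > "$tmpdir/two"
# NR keeps counting across files; FNR restarts at 1 for each file.
numbering=$(awk '{print NR, FNR, $0}' "$tmpdir/one" "$tmpdir/two")
echo "$numbering"
rm -r "$tmpdir"
```

The first column runs from 1 to 5, while the second column goes 1, 2, 3 and then starts again at 1 for the second file.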

In everyday use, FNR is needed less often than NR.

A common use of NR is to print specific lines.

For example, print the third line of testfile:

[root@172-0-10-222 shell-test]# awk 'NR==3{print}' testfile
 hh 201907003 67 60

Print lines 2 through 4 of testfile:

[root@172-0-10-222 shell-test]# awk 'NR>=2&&NR<=4{print}' testfile
 kk 201907002 90 97 90
 hh 201907003 67 60
 jj 201907004 59 57 58

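Note: an awk program is written as pattern{action}. The action inside the single quotes runs only for lines that match the pattern. For example, 'NR>=2&&NR<=4{print}' prints the lines numbered 2 through 4. The same form can be used to print any columns of the lines that satisfy a condition.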
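As a small sketch of the pattern{action} form selecting columns as well as lines, the following prints only the first and third fields of lines 2 through 4. The input is a shortened copy of the test data and the variable name selected is only for illustration.

```shell
# Print only the name and the first score for lines 2 through 4.
selected=$(printf 'll 201907001 80\nkk 201907002 90\nhh 201907003 67\njj 201907004 59\naa 201907005 23\n' |
  awk 'NR >= 2 && NR <= 4 {print $1, $3}')
echo "$selected"
```

The comma in print $1, $3 joins the two fields with the output separator, which is a single space by default.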

(4) Using BEGIN and END in awk

BEGIN and END run actions before awk reads any input and after it has finished processing every line, respectively.

Test file:

[root@172-0-10-222 shell-test]# cat testfile
 ll 201907001 80 97 70
 kk 201907002 90 97 90
 hh 201907003 67 60
 jj 201907004 59 57 58
 aa 201907005    12

Print lines 2-4 of the file, with ========begin======== before them and ========end======== after them:

[root@172-0-10-222 shell-test]# awk 'BEGIN{print "========begin========"}NR>=2&&NR<=4{print $0}END{print "========end========"}' testfile
========begin========
 kk 201907002 90 97 90
 hh 201907003 67 60
 jj 201907004 59 57 58
========end========

BEGIN is commonly used to assign variables.
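A minimal sketch of variable assignment in BEGIN might look like the following. FS and OFS are awk's built-in input and output field separators; the threshold variable pass and the colon-separated sample data are made up for this example.

```shell
# BEGIN sets the input separator, the output separator, and a pass mark
# before any line is read.
passed=$(printf 'll:80\nkk:90\njj:59\n' |
  awk 'BEGIN {FS = ":"; OFS = " -> "; pass = 60} $2 >= pass {print $1, $2}')
echo "$passed"
```

Setting FS in BEGIN has the same effect as passing -F ':' on the command line, and keeps all the settings inside the program text.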

(5) Arithmetic in awk

Test file:

[root@172-0-10-222 shell-test]# cat testfile
 ll 201907001 80 97 70
 kk 201907002 90 97 90
 hh 201907003 67 60 77
 jj 201907004 59 57 58
 aa 201907005 89 62 63

Append the sum of the last three columns to the end of each line:

[root@172-0-10-222 shell-test]# cat testfile | awk '{print $0,$3+$4+$5}'
 ll 201907001 80 97 70 247
 kk 201907002 90 97 90 277
 hh 201907003 67 60 77 204
 jj 201907004 59 57 58 174
 aa 201907005 89 62 63 214

Then also append the average of the last three columns to each line:

[root@172-0-10-222 shell-test]# cat testfile | awk '{sum=$3+$4+$5;print $0,sum,sum/3}'
 ll 201907001 80 97 70 247 82.3333
 kk 201907002 90 97 90 277 92.3333
 hh 201907003 67 60 77 204 68
 jj 201907004 59 57 58 174 58
 aa 201907005 89 62 63 214 71.3333

Sum each of the last three columns over all lines (one total per column):

[root@172-0-10-222 shell-test]# cat testfile | awk '{a+=$3;b+=$4;c+=$5}END{print a,b,c}'
385 373 358
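Building on the column totals, per-column averages can be computed in END by dividing by NR, which still holds the number of lines read at that point. A minimal sketch on the first two lines of the test file, with the result formatted to one decimal place:

```shell
# Accumulate columns 3-5, then divide by the line count in END.
# The NR > 0 check avoids a division by zero on empty input.
averages=$(printf 'll 201907001 80 97 70\nkk 201907002 90 97 90\n' |
  awk '{a += $3; b += $4; c += $5}
       END {if (NR > 0) printf "%.1f %.1f %.1f\n", a / NR, b / NR, c / NR}')
echo "$averages"
```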

Example: total the CPU and memory percentages used by all processes under /usr/bin/.

The processes under /usr/bin/:

[root@172-0-10-222 shell-test]# ps -aux | grep /usr/bin/
dbus       643  0.0  0.1  66432  2588 ?        Ssl  Nov06   0:03 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       646  0.0  0.3  99656  6112 ?        Ss   Nov06   0:00 /usr/bin/VGAuthService -s
root       647  0.1  0.3 298660  6172 ?        Ssl  Nov06  17:05 /usr/bin/vmtoolsd
root      1013  0.0  0.9 573852 17004 ?        Ssl  Nov06   2:04 /usr/bin/python -Es /usr/sbin/tuned -l -P
root     10245  0.0  0.0 112704   972 pts/1    S+   14:08   0:00 grep --color=auto /usr/bin/

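Columns 3 and 4 are the CPU and memory percentages, so summing each of these two columns over all lines gives the totals. Note that the grep process itself also appears in the list; here it contributes 0.0 to both sums.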

[root@172-0-10-222 shell-test]# ps -aux | grep /usr/bin/ | awk '{cpu+=$3;mem+=$4}END{print cpu,mem}'
0.1 1.6
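Since grep /usr/bin/ also matches its own process, one possible refinement is to let awk test the command column directly. The sketch below assumes the 11-column ps aux layout shown above, where field 11 is the command, and runs on three lines copied from that listing instead of live ps output.

```shell
# Three lines copied from the ps listing above.
sample='root       646  0.0  0.3  99656  6112 ?        Ss   Nov06   0:00 /usr/bin/VGAuthService -s
root       647  0.1  0.3 298660  6172 ?        Ssl  Nov06  17:05 /usr/bin/vmtoolsd
root     10245  0.0  0.0 112704   972 pts/1    S+   14:08   0:00 grep --color=auto /usr/bin/'
# Only lines whose command starts with /usr/bin/ are summed,
# which skips the grep process.
totals=$(printf '%s\n' "$sample" |
  awk '$11 ~ /^\/usr\/bin\// {cpu += $3; mem += $4} END {printf "%.1f %.1f\n", cpu, mem}')
echo "$totals"
```

Using printf with %.1f keeps the sums to one decimal place, matching the precision of the input columns.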

(6) Using if in awk

An if test works much like the pattern{action} form described earlier.

Test directory:

[root@172-0-10-222 shell-test]# ll
total 100
-rw-r--r--. 1 root root    0 Oct 21 19:20 4]]
-rw-r--r--. 1 root root 6048 Nov 13 16:19 dataMigration_part1.sh
-rwxr-xr-x. 1 root root 1059 Nov 13 15:10 dataMigration_part2.sh
-rwxr-xr-x. 1 root root  104 Oct 21 19:19 expr_test.sh
-rwxr-xr-x. 1 root root  161 Oct 16 20:22 input_test.sh
drwxr-xr-x. 2 root root    6 Nov  6 14:02 ipconf
-rw-r--r--. 1 root root  160 Nov  6 11:22 ip.txt
-rwxr-xr-x. 1 root root   94 Oct 18 17:36 let_test2.sh
-rwxr-xr-x. 1 root root  229 Oct 18 17:26 let_test.sh
-rwxr-xr-x. 1 root root 1107 Nov 12 11:31 modifyDir.sh
-rw-r--r--. 1 root root  738 Nov 13 09:00 modifyNfsConfiguration.sh
-rw-r--r--. 1 root root 2798 Nov 13 11:59 modifyShareConfiguration.sh
drwxr-xr-x. 2 root root  197 Nov 11 14:15 myfolder
-rw-r--r--. 1 root root   17 Nov 12 20:24 test
-rwxr-xr-x. 1 root root  243 Nov  6 11:05 test_break_continue.sh
-rwxr-xr-x. 1 root root  234 Oct 30 11:23 test_case.sh
-rw-r--r--. 1 root root   76 Oct 16 20:22 test.cnf
-rw-r--r--. 1 root root  115 Nov 14 13:44 testfile
-rw-r--r--. 1 root root   63 Nov 14 10:49 testfile2
-rwxr-xr-x. 1 root root  241 Oct 31 19:34 test_for.sh
-rwxr-xr-x. 1 root root  395 Oct 30 11:17 test_if.sh
-rw-r--r--. 1 root root   84 Nov 13 14:23 test.sh
-rwxr-xr-x. 1 root root   78 Oct 31 19:57 test_while_case.sh
-rwxr-xr-x. 1 root root  367 Nov  6 13:59 test_whilereadline2.sh
-rwxr-xr-x. 1 root root  313 Nov  6 11:38 test_whilereadline.sh
-rwxr-xr-x. 1 root root  217 Oct 31 19:53 test_while.sh
-rwxr-xr-x. 1 root root  254 Oct 31 20:19 test_xunhuan_case.sh

List all regular files in the directory that are larger than 1 KB (1024 bytes):

[root@172-0-10-222 shell-test]# ll | awk '/^-/{if($5>1024){print $0}}'
-rw-r--r--. 1 root root 6048 Nov 13 16:19 dataMigration_part1.sh
-rwxr-xr-x. 1 root root 1059 Nov 13 15:10 dataMigration_part2.sh
-rwxr-xr-x. 1 root root 1107 Nov 12 11:31 modifyDir.sh
-rw-r--r--. 1 root root 2798 Nov 13 11:59 modifyShareConfiguration.sh
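Where if differs from a plain pattern is that it can take an else branch. A small sketch on three lines copied from the listing above, labelling each regular file as large or small (the labels and the variable names are made up for this example):

```shell
# Field 5 is the size in bytes and field 9 is the file name.
listing='-rw-r--r--. 1 root root 6048 Nov 13 16:19 dataMigration_part1.sh
drwxr-xr-x. 2 root root    6 Nov  6 14:02 ipconf
-rw-r--r--. 1 root root  160 Nov  6 11:22 ip.txt'
# /^-/ keeps regular files only; the directory line is skipped.
labels=$(printf '%s\n' "$listing" |
  awk '/^-/ {if ($5 > 1024) {print $9, "large"} else {print $9, "small"}}')
echo "$labels"
```

Note that $9 only holds the whole name when the file name contains no spaces.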

(7) for loops in awk

Test file:

[root@172-0-10-222 shell-test]# cat testfile
 ll 201907001 80 97 70
 kk 201907002 90 97 90
 hh 201907003 67 60 77
 jj 201907004 59 57 58
 aa 201907005 89 62 63

Loop over the fields produced by the default separator (whitespace) and print each one on its own line:

[root@172-0-10-222 shell-test]# cat testfile | awk '{for(i=1;i<=NF;i++){print $i}}'
ll
201907001
80
97
70
kk
201907002
90
97
90
hh
201907003
67
60
77
jj
201907004
59
57
58
aa
201907005
89
62
63

To print the same fields back in the layout of the original file, one line per record, you can do the following:

[root@172-0-10-222 shell-test]# cat testfile | awk '{for(i=1;i<=NF;i++){printf $i" "}print xxoo}'
ll 201907001 80 97 70 
kk 201907002 90 97 90 
hh 201907003 67 60 77 
jj 201907004 59 57 58 
aa 201907005 89 62 63

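Here print xxoo prints an uninitialized variable, which is empty, so only a newline is written and the current output line ends. print "" does the same; print " " also works but writes a space before the newline. Note that printf $i" " uses the field itself as the format string, so a field containing % would be misinterpreted; printf "%s ", $i avoids that.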
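A variant of that loop, sketched here with made-up input, passes each field through a "%s" format and chooses the separator per field, so there is no trailing space and a literal % in the data is printed unchanged.

```shell
# Rebuild each line with single spaces between fields.
# The second %s is a space between fields and a newline after the last one.
rebuilt=$(printf ' ll   201907001 80%%\n kk 201907002   90\n' |
  awk '{for (i = 1; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")}')
echo "$rebuilt"
```

The shell's own printf needs %% to emit a single % into the sample input; awk then receives the field 80% and prints it as-is.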

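(8) Using regular expressions in awk

Regular expressions in awk are mostly used as match conditions that decide which lines get printed, so they usually appear in an if condition or in the pattern position described earlier.

Test file: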

[root@172-0-10-222 shell-test]# cat testfile
ll 201907001 80 97 70
kk 201907002 90 97 90
hh 201908003 67 60 77
jj 201908004 59 57 58
aa 201909005 67 82 63

a) Print every line from the second line onward

[root@172-0-10-222 shell-test]# cat testfile | awk 'NR>=2{print $0}'
kk 201907002 90 97 90
hh 201908003 67 60 77
jj 201908004 59 57 58
aa 201909005 67 82 63

Or, equivalently: cat testfile | awk '{if(NR>=2){print $0}}'

b) Use a regular expression to keep, from line 2 onward, only the lines that contain 8

[root@172-0-10-222 shell-test]# cat testfile | awk '/8/&&NR>=2{print $0}'
hh 201908003 67 60 77
jj 201908004 59 57 58
aa 201909005 67 82 63

Or, equivalently: cat testfile | awk '/8/{if(NR>=2){print $0}}'

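c) From the result above, keep only the lines whose second column contains 8 (once $2~/8/ is required, the whole-line test /8/ is redundant but harmless)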

[root@172-0-10-222 shell-test]# cat testfile | awk '$2~/8/&&/8/&&NR>=2{print $0}'
hh 201908003 67 60 77
jj 201908004 59 57 58

This can also be written as: cat testfile | awk '$2~/8/&&/8/{if(NR>=2){print $0}}'
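Field matches can be made stricter with anchors. As a sketch, the following keeps only the lines whose second field starts with 201908, using shortened copies of the test lines (august is an illustrative variable name):

```shell
# ^ anchors the match to the start of the second field.
august=$(printf 'll 201907001 80\nhh 201908003 67\njj 201908004 59\naa 201909005 67\n' |
  awk '$2 ~ /^201908/ {print $1}')
echo "$august"
```

Replacing ~ with !~ would invert the test and print the lines whose second field does not match.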
