Automated Node Deployment with Ansible (Part 1)
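Suppose we have the following network:
- An HTTP server that is reachable only from the intranet; intranet nodes must be able to visit the site at http://app.example.com/
- A DNS server that resolves app.example.com to the intranet HTTP server's IP address
- An Ansible control node
How can we make all three nodes use the intranet DNS server so that each of them can reach the intranet site at http://app.example.com/? We could log in to every node and configure it by hand, but that becomes less and less practical as the number of nodes grows. A more sustainable approach is to push configuration to nodes in bulk through automation, and the community edition of Ansible provides exactly that.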
Lab environment overview
This lab runs on Alibaba Cloud ECS instances. The intranet subnet is 192.168.1.0/24.
Role | IP address | Operating system |
---|---|---|
HTTP server | 192.168.1.200 | Alibaba Cloud Linux 3.2104 LTS |
DNS server | 192.168.1.201 | Alibaba Cloud Linux 3.2104 LTS |
Ansible control node | 192.168.1.202 | Alibaba Cloud Linux 3.2104 LTS |
Setting up the lab environment
HTTP server
Install the Apache HTTP server:
sudo dnf install httpd
Write the following to /var/www/html/index.html:
<html>
<head>
<title>欢迎来到 app.example.com</title>
</head>
<body>
<h1>欢迎来到 app.example.com</h1>
</body>
</html>
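Enable and start Apache:
sudo systemctl enable --now httpd
Confirm that the HTTP server is installed and running:
systemctl status httpd
On success, the status should read active (running).
Run one more test by fetching the page locally:
wget -qO - http://localhost/
Output: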
<html>
<head>
<title>欢迎来到 app.example.com</title>
</head>
<body>
<h1>欢迎来到 app.example.com</h1>
</body>
</html>
DNS server
First, check that this node can reach the intranet site by IP address:
wget -qO - http://192.168.1.200/
Output:
<html>
<head>
<title>欢迎来到 app.example.com</title>
</head>
<body>
<h1>欢迎来到 app.example.com</h1>
</body>
</html>
Install BIND:
sudo dnf install bind bind-utils
Then overwrite /etc/named.conf with the following:
//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS
// server as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
acl intranet { 192.168.1.0/24; };
options {
    listen-on port 53 { 127.0.0.1; intranet; };
    listen-on-v6 port 53 { ::1; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    secroots-file "/var/named/data/named.secroots";
    recursing-file "/var/named/data/named.recursing";
    allow-query { localhost; intranet; };
    /*
     - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
     - If you are building a RECURSIVE (caching) DNS server, you need to enable
       recursion.
     - If your recursive DNS server has a public IP address, you MUST enable access
       control to limit queries to your legitimate users. Failing to do so will
       cause your server to become part of large scale DNS amplification
       attacks. Implementing BCP38 within your network would greatly
       reduce such attack surface
    */
    recursion yes;
    allow-recursion { localhost; intranet; };
    dnssec-enable yes;
    dnssec-validation yes;
    managed-keys-directory "/var/named/dynamic";
    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
    /* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
    include "/etc/crypto-policies/back-ends/bind.config";
};
logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };
};
zone "example.com" {
    type master;
    file "/etc/named/zones/db.example.com";
};
#zone "." IN {
#    type hint;
#    file "named.ca";
#};
zone "." {
    type forward;
    forward only;
    forwarders { 8.8.8.8; 8.8.4.4; };
};
include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
I won't walk through this configuration file here; for details, see my earlier article on DNS, or RTFM 😉
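To get a feel for which clients the intranet ACL above admits, here is a minimal Python sketch. It is only an illustration, not how BIND evaluates ACLs internally: it models just the 192.168.1.0/24 entry and ignores the localhost entry. The function name in_intranet is made up for this example.

```python
import ipaddress

# The subnet named by `acl intranet` in the named.conf above.
INTRANET = ipaddress.ip_network("192.168.1.0/24")


def in_intranet(client_ip: str) -> bool:
    """Return True if client_ip falls inside the intranet subnet."""
    return ipaddress.ip_address(client_ip) in INTRANET


# The control node is inside the subnet; the other two addresses are not.
for ip in ("192.168.1.202", "192.168.2.10", "8.8.8.8"):
    print(ip, "allowed" if in_intranet(ip) else "refused")
```

A query from 192.168.1.202 (our control node) matches the ACL, while a client in a neighbouring subnet or on the public internet does not.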
Create the /etc/named/zones/ directory:
sudo mkdir -p /etc/named/zones/
Write the following to /etc/named/zones/db.example.com:
$TTL 604800
@       IN      SOA     ns1.example.com. admin.example.com. (
                        1           ; serial
                        604800      ; refresh
                        86400       ; retry
                        2419200     ; expire
                        604800 )    ; negative-cache TTL
        IN      NS      ns1
ns1     IN      A       192.168.1.201
app     IN      A       192.168.1.200
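The zone file relies on two naming rules: a name without a trailing dot is relative and gets the zone origin appended, and @ stands for the origin itself. The Python sketch below illustrates those rules under the assumption that the origin is example.com. (the zone name from named.conf, since the file has no $ORIGIN line). The helper fqdn is hypothetical and does not parse whole records.

```python
ORIGIN = "example.com."  # assumed: taken from the zone name in named.conf


def fqdn(name: str, origin: str = ORIGIN) -> str:
    """Expand a zone-file owner name into a fully qualified domain name."""
    if name == "@":
        return origin               # "@" is shorthand for the origin
    if name.endswith("."):
        return name                 # trailing dot: already absolute
    return f"{name}.{origin}"       # relative name: append the origin


print(fqdn("app"))   # app.example.com.
print(fqdn("ns1"))   # ns1.example.com.
print(fqdn("@"))     # example.com.
```

This is why the records for ns1 and app answer queries for ns1.example.com and app.example.com, and why forgetting the trailing dot on an absolute name is a classic zone-file mistake.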
Make sure both files are valid:
sudo named-checkconf
sudo named-checkzone example.com /etc/named/zones/db.example.com
Expected output (named-checkconf prints nothing on success):
zone example.com/IN: loaded serial 1
OK
Enable and start BIND:
sudo systemctl enable --now named
Check that BIND is running; the status should read active (running):
systemctl status named
Test resolution of both public and intranet names:
host csdn.net localhost
host app.example.com localhost
host ns1.example.com localhost
Output:
...
csdn.net has address 39.106.226.142
...
app.example.com has address 192.168.1.200
...
ns1.example.com has address 192.168.1.201
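Unfortunately, because the node has not yet been configured to use the DNS server at 192.168.1.201 by default, the intranet site still cannot be reached directly at http://app.example.com/: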
wget -O - http://app.example.com/
--2023-01-07 11:33:16-- http://app.example.com/
Resolving app.example.com (app.example.com)... failed: Name or service not known.
wget: unable to resolve host address ‘app.example.com’
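The failure above comes from the system resolver, not from the web server. As a small illustration, the Python sketch below asks the same system resolver for a name and returns None when the lookup fails, which is what wget runs into here. The helper name resolve is made up for this example.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> Optional[str]:
    """Return an IPv4 address from the system resolver, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # Covers socket.gaierror, e.g. "Name or service not known".
        return None


# An IPv4 literal is returned unchanged, without any DNS lookup.
print(resolve("127.0.0.1"))
```

Calling resolve("app.example.com") on a node that does not yet use 192.168.1.201 would be expected to return None, mirroring the wget error.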
Ansible control node
First, confirm that this node can reach the intranet site:
wget -qO - http://192.168.1.200/
<html>
<head>
<title>欢迎来到 app.example.com</title>
</head>
<body>
<h1>欢迎来到 app.example.com</h1>
</body>
</html>
Install the DNS testing tools:
sudo dnf install bind-utils
Then check that the intranet DNS server is usable from this node:
host csdn.net 192.168.1.201
host app.example.com 192.168.1.201
host ns1.example.com 192.168.1.201
...
csdn.net has address 39.106.226.142
...
app.example.com has address 192.168.1.200
...
ns1.example.com has address 192.168.1.201
Installing the Ansible control tools
Reference: Installing Ansible
Before installing Ansible, confirm that the pip package manager is available:
python3 -m pip -V
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)
Git is also required:
sudo dnf install git
Then install Ansible and the related control-node tools with pip:
python3 -m pip install --user ansible
python3 -m pip install --user ansible-lint
Confirm that Ansible was installed successfully:
ansible --version
ansible [core 2.11.12]
config file = None
configured module search path = ['/home/ecs-user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/ecs-user/.local/lib/python3.6/site-packages/ansible
ansible collection location = /home/ecs-user/.ansible/collections:/usr/share/ansible/collections
executable location = /home/ecs-user/.local/bin/ansible
python version = 3.6.8 (default, Oct 18 2022, 18:55:55) [GCC 10.2.1 20200825 (Alibaba 10.2.1-3 2.32)]
jinja version = 2.10.1
libyaml = True
Configuring Ansible
Reference: Ansible Configuration Settings
First, create a .ansible.cfg configuration file in $HOME and write the relevant settings to it:
cat > $HOME/.ansible.cfg << EOF
[defaults]
inventory = $HOME/.config/ansible/hosts
EOF
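Creating the Ansible inventory
Reference: How to build your inventory
An Ansible inventory defines the nodes that configuration is pushed to in bulk. The default inventory path is /etc/ansible/hosts, but editing that file requires root privileges. To avoid unnecessary use of root, the $HOME/.ansible.cfg file created above already points the inventory path at $HOME/.config/ansible/hosts: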
grep inventory $HOME/.ansible.cfg
inventory = /home/ecs-user/.config/ansible/hosts
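Note that the heredoc used to write the file has an unquoted EOF delimiter, so the shell expands $HOME before the file is written; that is why grep shows an absolute path. As a rough illustration of reading the setting back, the Python sketch below parses the same INI text with the standard configparser module. The helper name inventory_path is hypothetical, and Ansible's real configuration loading (environment variables, several candidate file locations) is more involved than this.

```python
import configparser
from typing import Optional


def inventory_path(cfg_text: str) -> Optional[str]:
    """Return the [defaults] inventory value from ansible.cfg-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback=None also covers a missing [defaults] section.
    return parser.get("defaults", "inventory", fallback=None)


sample = "[defaults]\ninventory = /home/ecs-user/.config/ansible/hosts\n"
print(inventory_path(sample))
```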
First make sure the $HOME/.config/ansible/ directory exists:
mkdir -p $HOME/.config/ansible/
Then write the inventory:
cat > $HOME/.config/ansible/hosts << EOF
[servers]
192.168.1.200
192.168.1.201
192.168.1.202
EOF
Take another look at the inventory contents:
cat $HOME/.config/ansible/hosts
[servers]
192.168.1.200
192.168.1.201
192.168.1.202
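servers is a group name that defines a set of node IP addresses. The nodes in the servers group are the servers we want to push configuration to in bulk; we call them "managed nodes". Here the HTTP server, the DNS server, and the Ansible control node are all listed as managed nodes in the servers group.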
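To make the group-to-hosts relationship concrete, here is a minimal Python sketch that reads this simple INI layout into a dictionary. It is not Ansible's parser: it assumes only [group] headers and one host per line, and it ignores features such as :children, :vars, host ranges, and inline host variables. The function name parse_inventory is made up for this example.

```python
from collections import defaultdict


def parse_inventory(text: str) -> dict:
    """Map each group name to its list of hosts (simplified INI inventory)."""
    groups = defaultdict(list)
    current = "ungrouped"  # hosts listed before any [group] header
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups[current]  # register the group even if it stays empty
            continue
        groups[current].append(line.split()[0])  # first field is the host
    return dict(groups)


hosts_file = "[servers]\n192.168.1.200\n192.168.1.201\n192.168.1.202\n"
print(parse_inventory(hosts_file))
```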
Use ansible-inventory to print the inventory as Ansible understands it and confirm that $HOME/.config/ansible/hosts is correct:
ansible-inventory --list --yaml
all:
  children:
    servers:
      hosts:
        192.168.1.200: {}
        192.168.1.201: {}
        192.168.1.202: {}
    ungrouped: {}
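Preparing to write a playbook
A playbook is a YAML document that the Ansible control node uses to configure managed nodes in bulk. Before writing one, make sure the control node can log in to the root account on every managed node over SSH using key-based authentication, without a password.
First, generate an SSH key pair on the control node:
ssh-keygen
Then print the SSH public key:
cat $HOME/.ssh/id_rsa.pub
Copy the output of the command above, for example: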
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDddfF/Mk+TcJeDy9pcszDrMeP4Ux8Quo/DcS+gaN+YqByHEiR7i1eG46YQwadwPmgDlRUSTwAjDQprHnFyAaSlYulI8ptKPwXL6ptWXDpnmcgyJX5IfbKHPkMNQrZSI0x9Jcl1QpXIXQd+bRxPPovf/sT2WEnbuNUHQjBsQmDaHxfOsbsRw8tybldXIfzPdMoIao+XUBOzG3u9scZlU2hZZY10kTtAWCsLcU4L0dcXAlUKb5zKCy91Gj8u7vHoYX29aZvJ1/ehbJaYjgm5j6AD1IbtmbB+bdtR0a8sAY+1BZD71y/iD7lLEgAEE4vFg7MJpTsCdn1/1SSvINaiiDfNwYzCi3Pla4bPg0wlpjwyqFkpxpMX9xFlPzg0tORSB5MLgQhon6eng05ciZeq3lOoQW3q03WIGDGDfwgHUZCX5/vv0ZHMyWmhNOtWCSeQeXGMYtFt0q5eD1S4d0ed/igoNaRJmP+vSPIhYDDmNNk2YZ5rM1GlpY9olpOMwYZCQLM= ecs-user@ansible-lab-control
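Then append the generated SSH public key to the end of /root/.ssh/authorized_keys on every managed node (the HTTP server, the DNS server, and the Ansible control node).
If this worked, you should be able to SSH into the root account on every node without being asked for a password: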
ssh root@192.168.1.200
ssh root@192.168.1.201
ssh root@192.168.1.202
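If key login fails, one thing worth ruling out is a public key line that was mangled while being copied between terminals. The Python sketch below is a rough sanity check, not a full validator: it confirms that the second field decodes as base64 and that the key type embedded at the start of the decoded blob matches the type named in the first field. A truncated paste often breaks the base64 padding, though not always. The helper name key_type_matches is made up for this example.

```python
import base64
import struct


def key_type_matches(pubkey_line: str) -> bool:
    """Check that the declared key type matches the one encoded in the blob."""
    fields = pubkey_line.split()
    if len(fields) < 2:
        return False
    declared, blob_b64 = fields[0], fields[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except ValueError:  # invalid characters or broken padding
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a 4-byte big-endian length, then the type string.
    (length,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + length] == declared.encode()
```

For example, key_type_matches(open("/root/.ssh/authorized_keys").readlines()[-1]) could be run on a managed node to check the line that was just appended.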
Our first playbook
Reference: Ansible playbooks
Create a $HOME/playbooks/ directory to hold our playbooks:
mkdir -p $HOME/playbooks/
Next, create a new file, $HOME/playbooks/specify-dns.yaml, with the following content:
---
- name: Specify DNS settings for all server nodes
  hosts: servers
  remote_user: root
  tasks:
    - name: Ensure NetworkManager is started
      ansible.builtin.service:
        name: NetworkManager
        state: started
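This playbook logs in to every node in the servers group (hosts: servers) as root (remote_user: root) and ensures that NetworkManager is running. On Alibaba Cloud Linux, a node whose NetworkManager is not running would be unreachable in the first place, so the check may look redundant. It still has some value: it confirms that every node gets its network through NetworkManager and rules out nodes that use a different networking stack, such as Ubuntu with netplan.io or Windows with its own tooling.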
Use ansible-lint to validate the playbook's basic YAML syntax (it also flags common Ansible style problems):
ansible-lint $HOME/playbooks/specify-dns.yaml
WARNING: PATH altered to include /usr/bin
[DEPRECATION WARNING]: Ansible will require Python 3.8 or newer on the
controller starting with Ansible 2.12. Current version: 3.6.8 (default, Oct 18
2022, 18:55:55) [GCC 10.2.1 20200825 (Alibaba 10.2.1-3 2.32)]. This feature
will be removed from ansible-core in version 2.12. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.
As long as no errors are reported, we are fine; the warnings can be ignored for now.
Run the playbook with ansible-playbook:
ansible-playbook $HOME/playbooks/specify-dns.yaml
[DEPRECATION WARNING]: Ansible will require Python 3.8 or newer on the controller starting with Ansible 2.12.
Current version: 3.6.8 (default, Oct 18 2022, 18:55:55) [GCC 10.2.1 20200825 (Alibaba 10.2.1-3 2.32)]. This
feature will be removed from ansible-core in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
PLAY [Specify DNS settings for all server nodes] *************************************************************
TASK [Gathering Facts] ***************************************************************************************
[WARNING]: Platform linux on host 192.168.1.200 is using the discovered Python interpreter at
/usr/bin/python, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.11/reference_appendices/interpreter_discovery.html for more
information.
ok: [192.168.1.200]
[WARNING]: Platform linux on host 192.168.1.201 is using the discovered Python interpreter at
/usr/bin/python, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.11/reference_appendices/interpreter_discovery.html for more
information.
ok: [192.168.1.201]
[WARNING]: Platform linux on host 192.168.1.202 is using the discovered Python interpreter at
/usr/bin/python, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.11/reference_appendices/interpreter_discovery.html for more
information.
ok: [192.168.1.202]
TASK [Ensure NetworkManager is started] **********************************************************************
ok: [192.168.1.200]
ok: [192.168.1.201]
ok: [192.168.1.202]
PLAY RECAP ***************************************************************************************************
192.168.1.200 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
192.168.1.201 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
192.168.1.202 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Success 😄 This shows that our basic Ansible setup works. What remains is to use Ansible to set every node's DNS server to 192.168.1.201 in bulk.
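With three nodes the PLAY RECAP is easy to read by eye, but with more nodes it can help to check it mechanically. The Python sketch below pulls the per-host counters out of recap lines like the ones above. It is only an illustration that assumes this exact text layout; the names parse_recap and all_hosts_healthy are made up, and the exit status of ansible-playbook is usually the more robust signal.

```python
import re

# Matches lines such as "192.168.1.200 : ok=2 changed=0 unreachable=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(output: str) -> dict:
    """Map each host to its recap counters, e.g. {"ok": 2, "failed": 0}."""
    results = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (item.split("=") for item in match.group("counters").split())
            results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


def all_hosts_healthy(results: dict) -> bool:
    """True if at least one host was parsed and none failed or was unreachable."""
    return bool(results) and all(
        counters.get("failed", 0) == 0 and counters.get("unreachable", 0) == 0
        for counters in results.values()
    )
```

Feeding it the recap from the run above yields ok=2 and changed=0 for each of the three hosts, so all_hosts_healthy returns True.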
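Using the playbook to change the DNS configuration
Reference: community.general.nmcli module
Much of what an Ansible playbook can do is provided by modules, which are grouped into collections. The service module we used to make sure NetworkManager is running comes from the built-in ansible.builtin collection, while the nmcli module, which changes DNS settings through nmcli, comes from the community.general collection.
Update the $HOME/playbooks/specify-dns.yaml playbook from earlier so that it reads as follows: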
---
- name: Specify DNS settings for all server nodes
  hosts: servers
  remote_user: root
  tasks:
    - name: Ensure NetworkManager is started
      ansible.builtin.service:
        name: NetworkManager
        state: started
    - name: Ensure DNS server is set to 192.168.1.201
      community.general.nmcli:
        ifname: eth0
        conn_name: System eth0
        type: ethernet
        state: present
        dns4_ignore_auto: true
        dns4:
          - 192.168.1.201
    - name: Ensure NetworkManager is restarted
      ansible.builtin.service:
        name: NetworkManager
        state: restarted
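Two tasks were added:
- Ensure that the DNS server on the relevant network interface is set to 192.168.1.201 (dns4: [ 192.168.1.201 ]) and disable automatically supplied DNS servers so they do not affect lookup results (dns4_ignore_auto: true). The values of ifname, conn_name, and type were taken from the output of sudo nmcli con show
- Ensure that NetworkManager is restarted after the DNS setting is applied; otherwise the new DNS configuration does not take effect immediately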
Validate the YAML syntax with ansible-lint again:
ansible-lint $HOME/playbooks/specify-dns.yaml
Then run the updated playbook once more:
ansible-playbook $HOME/playbooks/specify-dns.yaml
[DEPRECATION WARNING]: Ansible will require Python 3.8 or newer on the controller starting with Ansible 2.12.
Current version: 3.6.8 (default, Oct 18 2022, 18:55:55) [GCC 10.2.1 20200825 (Alibaba 10.2.1-3 2.32)]. This
feature will be removed from ansible-core in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
PLAY [Specify DNS settings for all server nodes] *************************************************************
TASK [Gathering Facts] ***************************************************************************************
[WARNING]: Platform linux on host 192.168.1.200 is using the discovered Python interpreter at
/usr/bin/python, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.11/reference_appendices/interpreter_discovery.html for more
information.
ok: [192.168.1.200]
[WARNING]: Platform linux on host 192.168.1.202 is using the discovered Python interpreter at
/usr/bin/python, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.11/reference_appendices/interpreter_discovery.html for more
information.
ok: [192.168.1.202]
[WARNING]: Platform linux on host 192.168.1.201 is using the discovered Python interpreter at
/usr/bin/python, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.11/reference_appendices/interpreter_discovery.html for more
information.
ok: [192.168.1.201]
TASK [Ensure NetworkManager is started] **********************************************************************
ok: [192.168.1.202]
ok: [192.168.1.200]
ok: [192.168.1.201]
TASK [Ensure DNS server is set to 192.168.1.201] *************************************************************
ok: [192.168.1.200]
ok: [192.168.1.201]
ok: [192.168.1.202]
TASK [Ensure NetworkManager is restarted] ********************************************************************
changed: [192.168.1.201]
changed: [192.168.1.200]
changed: [192.168.1.202]
PLAY RECAP ***************************************************************************************************
192.168.1.200 : ok=4 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
192.168.1.201 : ok=4 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
192.168.1.202 : ok=4 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Finally, log in to each node in turn and visit http://app.example.com/; if everything worked, the request no longer fails:
wget -qO - http://app.example.com/
<html>
<head>
<title>欢迎来到 app.example.com</title>
</head>
<body>
<h1>欢迎来到 app.example.com</h1>
</body>
</html>
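Besides fetching the page, it can be reassuring to look at which resolver the node now uses. The Python sketch below extracts the nameserver entries from resolv.conf-style text. It assumes that NetworkManager writes /etc/resolv.conf on these nodes, which is a common default on RHEL-like systems but is an assumption here; the function name nameservers is made up for this example.

```python
def nameservers(resolv_conf_text: str) -> list:
    """Return the addresses listed on 'nameserver' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical file content, for illustration only.
sample = "# Generated by NetworkManager\nnameserver 192.168.1.201\n"
print(nameservers(sample))
```

On a node, passing the contents of /etc/resolv.conf to nameservers should, under that assumption, return a list containing only 192.168.1.201 after the playbook has run.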
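Conclusion
This lab shows that Ansible can synchronize configuration across multiple nodes quickly and effectively through automation. In addition, the vast majority of Ansible modules are idempotent, which means a node whose configuration is already in sync is not broken by a redundant re-run. For synchronizing configuration in bulk across Linux nodes in a traditional Linux environment, Ansible is therefore a very solid choice.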
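Idempotency can be pictured as comparing the current state with the desired state and acting only when they differ. The Python sketch below is a toy model of that idea, not Ansible's implementation; the function ensure_dns and the starting address 10.0.0.2 are hypothetical.

```python
def ensure_dns(current: list, desired: list):
    """Return (new_state, changed), the way an idempotent task reports."""
    if current == desired:
        return current, False    # reported as "ok": nothing to do
    return list(desired), True   # reported as "changed": state corrected


# First run: the node still uses some other resolver (hypothetical value).
state, changed = ensure_dns(["10.0.0.2"], ["192.168.1.201"])
print(state, changed)

# Second run: the state already matches, so nothing is changed.
state, changed = ensure_dns(state, ["192.168.1.201"])
print(state, changed)
```

This matches the second playbook run above, where the DNS task reported ok on every node. The restart task is different in kind: state: restarted asks for an action rather than a state, so it reports changed on each run.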
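Real production environments, however, rarely consist of Linux nodes alone. Even when production is mostly Linux, some clients will inevitably run Windows or macOS. Is Ansible useless for these non-Linux endpoints, Windows in particular? Not at all! Stay tuned for my next article on managing Windows endpoints with Ansible 😉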