Table of Contents
- Chapter 1 - Introduction to adversarial robustness
- Chapter 2 - Linear models
- Chapter 3 - Adversarial examples, solving the inner maximization
- 1. Untargeted attacks
- 2. Targeted attacks (based on the improved PGD, i.e. the (normalized) steepest descent method)
- 3. Combinatorial optimization for the inner maximization
- Chapter 4 - Adversarial training, solving the outer minimization
- 1. Goal
- 2. Candidate approaches
- 3. Implementation
- 4. Code
Adversarial Robustness - Theory and Practice
Chapter 1 - Introduction to adversarial robustness
I ran the introduction code from Adversarial Robustness - Theory and Practice, loading a ResNet50, and saw that after noise was injected, the pig image was misclassified by the model as an airliner.
Chapter 2 - Linear models
(1) Load the MNIST dataset.
(2) Train normally; the test error is only 0.04%.
(3) Set up an adversarial attack with perturbation budget epsilon = 0.2.
(4) Run the attack; the test error jumps from 0.04% to around 85%.
(5) Then perform robust training, whose core is model(X.view(X.shape[0], -1))[:,0] - epsilon*(2*y.float()-1)*model.weight.norm(1).
(6) After robust training, no adversarial attack pushes the test error above 2.5%, while the clean test error is around 0.3% (higher than the earlier 0.04%). These numbers are after 20 epochs of robust training; in my tests, adding more epochs does not improve them.
In other words, robust training improves resistance to adversarial attacks, but slightly raises the clean test error.
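The robust objective in step (5) has a closed form for a linear model. Here is a minimal sketch of one robust training step (assumptions: an nn.Linear(784, 1) binary classifier with 0/1 labels as in the tutorial; the batch is a random placeholder):

```python
import torch
import torch.nn as nn

# Assumed setup (placeholders): a linear binary classifier and a random batch.
model = nn.Linear(784, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-1)
epsilon = 0.2

X = torch.rand(8, 1, 28, 28)       # placeholder images
y = torch.randint(0, 2, (8,))      # placeholder 0/1 labels

# For a linear model, the worst-case l_inf perturbation of size epsilon
# shifts the logit by exactly epsilon * ||w||_1, signed by the label.
yp = model(X.view(X.shape[0], -1))[:, 0] - epsilon * (2 * y.float() - 1) * model.weight.norm(1)
loss = nn.BCEWithLogitsLoss()(yp, y.float())

opt.zero_grad()
loss.backward()
opt.step()
```

Training on this worst-case logit is what keeps the adversarial test error below 2.5% here, at the cost of a slightly higher clean error.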
Chapter 3 - Adversarial examples, solving the inner maximization
1. Untargeted attacks
The main methods are FGSM and PGD. PGD updates iteratively, taking more steps than FGSM. But when the gradients are very small, vanilla PGD also performs poorly, which motivates the (normalized) steepest descent method. Compared with vanilla PGD, its update is delta.data = (delta + alpha*delta.grad.detach().sign()).clamp(-epsilon, epsilon). This improved PGD is still limited by the possibility of local optima of the objective; they cannot be avoided entirely, but random restarts mitigate the problem.
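The two attacks can be sketched as follows (a sketch in the tutorial's style; the model and batch below are random placeholders just so the code runs):

```python
import torch
import torch.nn as nn

def fgsm(model, X, y, epsilon):
    """One-step attack: move epsilon in the sign of the input gradient."""
    delta = torch.zeros_like(X, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(X + delta), y)
    loss.backward()
    return epsilon * delta.grad.detach().sign()

def pgd_linf(model, X, y, epsilon, alpha, num_iter):
    """Normalized steepest descent under the l_inf norm: repeatedly step by
    alpha in the gradient's sign and clamp back into the epsilon-ball."""
    delta = torch.zeros_like(X, requires_grad=True)
    for _ in range(num_iter):
        loss = nn.CrossEntropyLoss()(model(X + delta), y)
        loss.backward()
        delta.data = (delta + alpha * delta.grad.detach().sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()

# Placeholder model and batch.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
X, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
d_fgsm = fgsm(model, X, y, epsilon=0.1)
d_pgd = pgd_linf(model, X, y, epsilon=0.1, alpha=0.01, num_iter=5)
```

Taking the sign of the gradient is what makes the step size uniform even where gradients are tiny, which is exactly the weakness of vanilla PGD noted above.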
2. Targeted attacks (based on the improved PGD, i.e. the (normalized) steepest descent method)
Maximize the loss for the true label while minimizing the loss for the target label; this amounts to solving the inner optimization problem. Several loss designs follow.
(1) loss = (yp[:,y_targ] - yp.gather(1,y[:,None])[:,0]).sum()
Drawback: it only fools the classifier for the non-zero digits. The reason is that this loss is the class logit for the zero minus the class logit for the true class, so it pays no attention to the other classes. We can therefore change the loss to the following.
(2) loss = 2*yp[:,y_targ].sum() - yp.sum()
Drawback: it does not reach a 100% success rate.
(3) Placeholder; I don't fully understand this one yet.
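Loss (2) plugs into the steepest-descent loop roughly like this (a sketch; the model and inputs are random placeholders and y_targ is an arbitrary target class):

```python
import torch
import torch.nn as nn

def pgd_linf_targ(model, X, y_targ, epsilon, alpha, num_iter):
    """Targeted variant using loss (2): push the target logit up while
    pushing the sum of all logits down, suppressing every other class."""
    delta = torch.zeros_like(X, requires_grad=True)
    for _ in range(num_iter):
        yp = model(X + delta)
        # 2 * target logit - sum of all logits; we ascend on this quantity.
        loss = 2 * yp[:, y_targ].sum() - yp.sum()
        loss.backward()
        delta.data = (delta + alpha * delta.grad.detach().sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()

# Placeholder model and batch.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
X = torch.rand(4, 1, 28, 28)
delta = pgd_linf_targ(model, X, y_targ=2, epsilon=0.1, alpha=0.01, num_iter=5)
```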
3. Combinatorial optimization for the inner maximization
There are methods that bound the relevant intervals, but under slight perturbations the interval bounds fluctuate a lot, so they are impractical. The method ultimately used is a mixed-integer linear programming strategy. The code mainly uses cvxpy to build a large set of constraints.
This optimization material does not require a close reading; studying it in depth could take years. Knowing roughly what it does is enough.
Chapter 4 - Adversarial training, solving the outer minimization
1. Goal
The goal of the robust optimization formulation, therefore, is to ensure that the model cannot be attacked even if the adversary has full knowledge of the model.
In other words, no matter what attack an adversary uses, we want to have a model that performs well.
2. Candidate approaches
2.1 Local gradient-based search (providing a lower bound on the objective)
2.2 Exact combinatorial optimization (exactly solving the objective) (impractical)
2.3 Convex relaxations (providing a provable upper bound on the objective)
After analysis, approach 2.2 is impractical, so the two feasible approaches are:
2.1 Using lower bounds, and examples constructed via local search methods, to train an (empirically) adversarially robust classifier.
2.3 Using convex upper bounds, to train a provably robust classifier.
3. Implementation
The basic idea is to simply create and then incorporate adversarial examples into the training process.
The question then arises: which adversarial examples should we train on?
4. Code
4.1 Load the MNIST dataset
4.2 Initialize model_cnn
4.3 Define the fgsm and pgd functions
4.4 Define the standard training function and the adversarial training/evaluation function
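These two helpers can be sketched as below (hedged: the signatures match the calls in the training loops that follow; the toy loader and the no-op "attack" at the end are placeholders to make the sketch self-contained):

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def epoch(loader, model, opt=None):
    """One pass over the data: trains if opt is given, otherwise evaluates."""
    total_err, total_loss = 0.0, 0.0
    for X, y in loader:
        yp = model(X)
        loss = nn.CrossEntropyLoss()(yp, y)
        if opt:
            opt.zero_grad()
            loss.backward()
            opt.step()
        total_err += (yp.max(dim=1)[1] != y).sum().item()
        total_loss += loss.item() * X.shape[0]
    return total_err / len(loader.dataset), total_loss / len(loader.dataset)

def epoch_adversarial(loader, model, attack, opt=None, **kwargs):
    """Same pass, but each batch is first perturbed by `attack` (e.g. pgd_linf)."""
    total_err, total_loss = 0.0, 0.0
    for X, y in loader:
        delta = attack(model, X, y, **kwargs)
        yp = model(X + delta)
        loss = nn.CrossEntropyLoss()(yp, y)
        if opt:
            opt.zero_grad()
            loss.backward()
            opt.step()
        total_err += (yp.max(dim=1)[1] != y).sum().item()
        total_loss += loss.item() * X.shape[0]
    return total_err / len(loader.dataset), total_loss / len(loader.dataset)

# Placeholder loader and model; the lambda "attack" returns a zero perturbation.
loader = DataLoader(TensorDataset(torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,))),
                    batch_size=4)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
err, loss = epoch(loader, model)
adv_err, adv_loss = epoch_adversarial(loader, model, lambda m, X, y: torch.zeros_like(X))
```

Whether opt is passed decides whether an epoch trains or merely evaluates; the loops below use both modes.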
4.5 Train the CNN (standard training, with adversarial evaluation)
opt = optim.SGD(model_cnn.parameters(), lr=1e-1)
for t in range(10):
    train_err, train_loss = epoch(train_loader, model_cnn, opt)
    test_err, test_loss = epoch(test_loader, model_cnn)
    adv_err, adv_loss = epoch_adversarial(test_loader, model_cnn, pgd_linf)
    if t == 4:
        for param_group in opt.param_groups:
            param_group["lr"] = 1e-2
    print(*("{:.6f}".format(i) for i in (train_err, test_err, adv_err)), sep="\t")
torch.save(model_cnn.state_dict(), "model_cnn.pt")
So as we saw before, the clean error is quite low, but the adversarial error is quite high (and actually goes up as we train the model more). Let’s now do the same thing, but with adversarial training.
4.6 Time for the fun part: adversarial training
opt = optim.SGD(model_cnn_robust.parameters(), lr=1e-1)
for t in range(10):
    train_err, train_loss = epoch_adversarial(train_loader, model_cnn_robust, pgd_linf, opt)
    test_err, test_loss = epoch(test_loader, model_cnn_robust)
    adv_err, adv_loss = epoch_adversarial(test_loader, model_cnn_robust, pgd_linf)
    if t == 4:
        for param_group in opt.param_groups:
            param_group["lr"] = 1e-2
    print(*("{:.6f}".format(i) for i in (train_err, test_err, adv_err)), sep="\t")
torch.save(model_cnn_robust.state_dict(), "model_cnn_robust.pt")
pretty good!
4.7 Compare the two CNNs
model_cnn_robust = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                                 Flatten(),
                                 nn.Linear(7*7*64, 100), nn.ReLU(),
                                 nn.Linear(100, 10)).to(device)

model_cnn = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                          nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                          Flatten(),
                          nn.Linear(7*7*64, 100), nn.ReLU(),
                          nn.Linear(100, 10)).to(device)
Huh, the architectures are exactly identical... the difference lies entirely in how the two models were trained.
4.8 Adversarial attack vs. adversarial training?
The objective applied to a loss function is minimization.
In chapter 1, putting a minus sign in front of the loss switches between maximizing and minimizing it, and multiple loss terms can be trained jointly.
In chapter 3, attacking alone, without backpropagating to update the weights, maximizes the loss (which effectively amounts to changing the input).
In chapter 4, minimizing the loss on perturbed inputs (adversarial examples produced by FGSM or PGD) amounts to adversarial training.
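The sign flip is easiest to see in a one-parameter toy example (a sketch, not tied to the tutorial's code):

```python
import torch

# Same gradient, two uses: an attack *ascends* the loss w.r.t. the input,
# while training *descends* the loss w.r.t. the weights.
w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([1.0], requires_grad=True)
loss = (w * x - 3.0) ** 2      # loss = 1.0 at (w, x) = (2, 1)
loss.backward()

x_attack = x + 0.1 * x.grad.sign()   # gradient ascent on the input: loss rises to 1.44
w_train = w - 0.1 * w.grad           # gradient descent on the weight: loss falls to 0.64
```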
4.9 Wrapping up nicely
After adversarial training, the model holds up impressively!
4.10 Convex optimization
4.10.1 Strategy one
Note: identical to bound_propagation in chapter 3.
def bound_propagation(model, initial_bound):
    l, u = initial_bound
    bounds = []
    for layer in model:
        if isinstance(layer, Flatten):
            l_ = Flatten()(l)
            u_ = Flatten()(u)
        elif isinstance(layer, nn.Linear):
            l_ = (layer.weight.clamp(min=0) @ l.t() + layer.weight.clamp(max=0) @ u.t()
                  + layer.bias[:,None]).t()
            u_ = (layer.weight.clamp(min=0) @ u.t() + layer.weight.clamp(max=0) @ l.t()
                  + layer.bias[:,None]).t()
        elif isinstance(layer, nn.Conv2d):
            l_ = (nn.functional.conv2d(l, layer.weight.clamp(min=0), bias=None,
                                       stride=layer.stride, padding=layer.padding,
                                       dilation=layer.dilation, groups=layer.groups) +
                  nn.functional.conv2d(u, layer.weight.clamp(max=0), bias=None,
                                       stride=layer.stride, padding=layer.padding,
                                       dilation=layer.dilation, groups=layer.groups) +
                  layer.bias[None,:,None,None])
            u_ = (nn.functional.conv2d(u, layer.weight.clamp(min=0), bias=None,
                                       stride=layer.stride, padding=layer.padding,
                                       dilation=layer.dilation, groups=layer.groups) +
                  nn.functional.conv2d(l, layer.weight.clamp(max=0), bias=None,
                                       stride=layer.stride, padding=layer.padding,
                                       dilation=layer.dilation, groups=layer.groups) +
                  layer.bias[None,:,None,None])
        elif isinstance(layer, nn.ReLU):
            l_ = l.clamp(min=0)
            u_ = u.clamp(min=0)
        bounds.append((l_, u_))
        l, u = l_, u_
    return bounds
Note: unlike the chapter 3 version of interval_based_bound, this one takes an extra idx argument.
def interval_based_bound(model, c, bounds, idx):
    cW = c.t() @ model[-1].weight
    cb = c.t() @ model[-1].bias
    l, u = bounds[-2]
    return (cW.clamp(min=0) @ l[idx].t() + cW.clamp(max=0) @ u[idx].t() + cb[:,None]).t()
Note: newly added.
def robust_bound_error(model, X, y, epsilon):
    initial_bound = (X - epsilon, X + epsilon)
    bounds = bound_propagation(model, initial_bound)  # propagate interval bounds through the network
    err = 0
    for y0 in range(10):
        C = -torch.eye(10).to(device)
        C[y0,:] += 1
        err += (interval_based_bound(model, C, bounds, y==y0).min(dim=1)[0] < 0).sum().item()
    return err
Note: newly added.
def epoch_robust_bound(loader, model, epsilon):
    total_err = 0
    C = [-torch.eye(10).to(device) for _ in range(10)]
    for y0 in range(10):
        C[y0][y0,:] += 1
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        initial_bound = (X - epsilon, X + epsilon)
        bounds = bound_propagation(model, initial_bound)
        for y0 in range(10):
            lower_bound = interval_based_bound(model, C[y0], bounds, y==y0)
            total_err += (lower_bound.min(dim=1)[0] < 0).sum().item()
    return total_err / len(loader.dataset)
That doesn't seem particularly useful, and indeed it is a property of virtually all relaxation-based verification approaches that they are vacuous when evaluated on a network trained without knowledge of these bounds.
4.10.2 Strategy two: training using provable criteria
If we train a network specifically to minimize a loss based upon this upper bound, we get a network where the bounds are meaningful. This is a somewhat subtle but important point which is worth repeating.
def epoch_robust_bound(loader, model, epsilon, opt=None):
    total_err = 0
    total_loss = 0
    C = [-torch.eye(10).to(device) for _ in range(10)]
    for y0 in range(10):
        C[y0][y0,:] += 1
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        initial_bound = (X - epsilon, X + epsilon)
        bounds = bound_propagation(model, initial_bound)
        loss = 0
        for y0 in range(10):
            if sum(y==y0) > 0:
                lower_bound = interval_based_bound(model, C[y0], bounds, y==y0)
                loss += nn.CrossEntropyLoss(reduction='sum')(-lower_bound, y[y==y0]) / X.shape[0]
                total_err += (lower_bound.min(dim=1)[0] < 0).sum().item()
        total_loss += loss.item() * X.shape[0]
        if opt:
            opt.zero_grad()
            loss.backward()
            opt.step()
    return total_err / len(loader.dataset), total_loss / len(loader.dataset)
Finally, let's train our model using this robust loss bound. Note that training provably robust models is a bit of a tricky business. If we start out immediately by trying to train our robust bound with the full ε=0.1, the model will collapse to just predicting equal probability for all digits, and will never recover. Instead, to reliably train such models we need to schedule ε during the training process, starting with a small ε and gradually raising it to the desired level. The schedule we use below was picked rather arbitrarily, and we could do much better with a bit of tweaking, but it serves our basic purpose.
torch.manual_seed(0)
model_cnn_robust_2 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1, stride=2), nn.ReLU(),
                                   nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                                   Flatten(),
                                   nn.Linear(7*7*64, 100), nn.ReLU(),
                                   nn.Linear(100, 10)).to(device)
opt = optim.SGD(model_cnn_robust_2.parameters(), lr=1e-1)
eps_schedule = [0.0, 0.0001, 0.001, 0.01, 0.01, 0.05, 0.05, 0.05, 0.05, 0.05] + 15*[0.1]
print("Train Eps", "Train Loss*", "Test Err", "Test Robust Err", sep="\t")
for t in range(len(eps_schedule)):
    train_err, train_loss = epoch_robust_bound(train_loader, model_cnn_robust_2, eps_schedule[t], opt)
    test_err, test_loss = epoch(test_loader, model_cnn_robust_2)
    adv_err, adv_loss = epoch_robust_bound(test_loader, model_cnn_robust_2, 0.1)
    print(*("{:.6f}".format(i) for i in (eps_schedule[t], train_loss, test_err, adv_err)), sep="\t")
torch.save(model_cnn_robust_2.state_dict(), "model_cnn_robust_2.pt")
Train Eps Train Loss* Test Err Test Robust Err
0.000000 0.829700 0.033800 1.000000
0.000100 0.126095 0.022200 1.000000
0.001000 0.119049 0.021500 1.000000
0.010000 0.227829 0.019100 1.000000
0.010000 0.129322 0.022900 1.000000
0.050000 1.716497 0.162200 0.828500
0.050000 0.744732 0.092100 0.625100
0.050000 0.486411 0.073800 0.309600
0.050000 0.393822 0.068100 0.197800
0.050000 0.345183 0.057100 0.169200
0.100000 0.493925 0.068400 0.129900
0.100000 0.444281 0.067200 0.122300
0.100000 0.419961 0.063300 0.117400
0.100000 0.406877 0.061300 0.114700
0.100000 0.401603 0.061500 0.116400
0.100000 0.387260 0.059600 0.111100
0.100000 0.383182 0.059400 0.108500
0.100000 0.375468 0.057900 0.107200
0.100000 0.369453 0.056800 0.107000
0.100000 0.365821 0.061300 0.116300
0.100000 0.359339 0.053600 0.104200
0.100000 0.358043 0.053000 0.097500
0.100000 0.354643 0.055700 0.101500
0.100000 0.352465 0.053500 0.096800
0.100000 0.348765 0.051500 0.096700
print("PGD, 40 iter: ", epoch_adversarial(test_loader, model_cnn_robust_2, pgd_linf, num_iter=40)[0])
PGD, 40 iter: 0.0779
So somewhere right in the middle. Note also that training these provably robust models is a challenging task, and a bit of tweaking (even still using interval bounds) can perform quite a bit better. For now, though, this is sufficient to make our point that we can obtain non-trivial provable bounds for trained networks.
Notes: there is still a long way to go...