Adversarial Robustness - Theory and Practice

Chapter 1 - Introduction to adversarial robustness

I ran the introduction code from Adversarial Robustness - Theory and Practice, loaded resnet50, and saw that after noise was injected, the pig image was misclassified as an airliner.

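Chapter 2 - Linear models

(1) Load the MNIST dataset.

(2) Train the model in the standard way; the test error (test_err) is only 0.04%.

(3) Set up the adversarial attack with perturbation budget epsilon = 0.2.

(4) Run the adversarial attack: the test error jumps from the earlier 0.04% to about 85%.

(5) Then run robust training. The core of it is model(X.view(X.shape[0], -1))[:,0] - epsilon*(2*y.float()-1)*model.weight.norm(1)

(6) After robust training, no adversarial attack can push the test error above 2.5%. The test error without any attack is now about 0.3% (higher than the earlier 0.04%). This is the result after 20 epochs of robust training; I checked, and training for more epochs does not improve it.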
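The expression in step (5) can be checked numerically. The sketch below is a minimal illustration, assuming a binary linear classifier with labels in {0, 1} and random stand-in data instead of MNIST (the tensor names and sizes are my own choices, not the tutorial's). It compares the closed-form worst-case logit with the logit obtained by actually applying the worst-case perturbation delta = -epsilon * (2y - 1) * sign(w).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(784, 1)                 # binary linear classifier (assumed setup)
X = torch.rand(8, 1, 28, 28)              # random stand-in for a batch of images
y = torch.randint(0, 2, (8,))             # labels in {0, 1}
epsilon = 0.2

with torch.no_grad():
    flat = X.view(X.shape[0], -1)
    sign_y = 2 * y.float() - 1            # maps {0, 1} to {-1, +1}
    # closed form: the worst case shifts the logit by epsilon * ||w||_1
    robust_logit = model(flat)[:, 0] - epsilon * sign_y * model.weight.norm(1)
    # explicit worst-case perturbation inside the l_inf ball of radius epsilon
    delta = -epsilon * sign_y[:, None] * model.weight.sign()
    attacked_logit = model(flat + delta)[:, 0]

# robust training minimizes the usual loss evaluated at the worst-case logit
robust_loss = nn.BCEWithLogitsLoss()(robust_logit, y.float())
print(torch.allclose(robust_logit, attacked_logit, atol=1e-4))
```

Because the inner maximization has this exact solution for a linear model, robust training here needs no iterative attack at all.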

Chapter 3 - Adversarial examples, solving the inner maximization


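The main methods are FGSM and PGD. FGSM takes a single step, while PGD updates iteratively over many steps. When the gradient is very small, however, plain PGD also performs poorly, which motivates the (normalized) steepest descent method. Compared with plain PGD, its update is delta.data = (delta + alpha*delta.grad.detach().sign()).clamp(-epsilon,epsilon). This improved PGD is still limited by the possibility of local optima within the objective. Local optima cannot be avoided entirely, but random restarts mitigate the problem.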
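A minimal sketch of the sign-step update with random restarts follows. It is written in the spirit of the tutorial's pgd_linf, but the restart bookkeeping, the default values, and the tiny linear model with random data are my own assumptions for illustration.

```python
import torch
import torch.nn as nn

def pgd_linf(model, X, y, epsilon=0.1, alpha=0.02, num_iter=20, restarts=3):
    """Sign-gradient PGD in the l_inf ball, keeping the best delta per example."""
    loss_fn = nn.CrossEntropyLoss(reduction="none")
    best_delta = torch.zeros_like(X)
    with torch.no_grad():
        best_loss = loss_fn(model(X), y)          # per-example loss with delta = 0
    for r in range(restarts):
        # first run starts at zero, later runs at a random point in the ball
        delta = torch.zeros_like(X) if r == 0 else (2 * torch.rand_like(X) - 1) * epsilon
        delta.requires_grad_(True)
        for _ in range(num_iter):
            loss = loss_fn(model(X + delta), y).sum()
            loss.backward()
            # step along the sign of the gradient, then project back into the ball
            delta.data = (delta + alpha * delta.grad.detach().sign()).clamp(-epsilon, epsilon)
            delta.grad.zero_()
        with torch.no_grad():
            final_loss = loss_fn(model(X + delta), y)
            improved = final_loss > best_loss
            best_delta[improved] = delta.detach()[improved]
            best_loss = torch.max(best_loss, final_loss)
    return best_delta

torch.manual_seed(0)
model = nn.Linear(20, 3)                          # hypothetical stand-in model
X, y = torch.rand(16, 20), torch.randint(0, 3, (16,))
epsilon = 0.1
delta = pgd_linf(model, X, y, epsilon=epsilon)
with torch.no_grad():
    clean_loss = nn.CrossEntropyLoss()(model(X), y).item()
    adv_loss = nn.CrossEntropyLoss()(model(X + delta), y).item()
print(clean_loss, adv_loss)
```

Taking the sign of the gradient makes the step size independent of the gradient magnitude, which is what helps when gradients are tiny. Keeping the best delta across restarts means a bad starting point can only be ignored, never make the attack weaker.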

2. Targeted attacks (based on the improved PGD, i.e. the (normalized) steepest descent method)


(1)loss = (yp[:,y_targ] - yp.gather(1,y[:,None])[:,0]).sum()

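Drawback: with zero as the target class, the attack fools the classifier on the non-zero digits but does not reliably turn them into zeros. The reason is that the loss we maximize is the class logit for the zero minus the class logit for the true class. It says nothing about the other classes, so a third class can end up with an even higher logit than the target. We can therefore change the loss function to the one below, which raises the target logit relative to all the other logits.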

(2)loss = 2*yp[:,y_targ].sum() - yp.sum()
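The difference between the two losses shows up in their gradients with respect to the logits. The small example below uses made-up logits for a single example with three classes (the numbers are purely illustrative) and evaluates both expressions.

```python
import torch

yp = torch.tensor([[1.0, 2.0, 3.0]], requires_grad=True)  # made-up logits, 3 classes
y = torch.tensor([2])      # true class
y_targ = 0                 # target class

# (1) target logit minus true-class logit
loss1 = (yp[:, y_targ] - yp.gather(1, y[:, None])[:, 0]).sum()
# (2) target logit minus the sum of all other logits
loss2 = 2 * yp[:, y_targ].sum() - yp.sum()

grad1, = torch.autograd.grad(loss1, yp)
grad2, = torch.autograd.grad(loss2, yp)
print(loss1.item(), grad1.tolist())
print(loss2.item(), grad2.tolist())
```

For loss (1) the gradient on class 1 is zero, so nothing stops that logit from growing during the attack. For loss (2) every non-target class gets a gradient of -1, so the attack pushes all of them down while raising the target.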


(3) Placeholder; I do not fully understand this one yet.




Chapter 4 - Adversarial training, solving the outer minimization

1. Goal

The goal of the robust optimization formulation, therefore, is to ensure that the model cannot be attacked even if the adversary has full knowledge of the model.
In other words, no matter what attack an adversary uses, we want to have a model that performs well.

2. Available approaches

2.1 Local gradient-based search (provides a lower bound on the objective)
2.2 Exact combinatorial optimization (solves the objective exactly; not practical)
2.3 Convex relaxations (provide a provable upper bound on the objective)


Two of these are used for training:

2.1 Using lower bounds, and examples constructed via local search methods, to train an (empirically) adversarially robust classifier.
2.3 Using convex upper bounds, to train a provably robust classifier.

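3. Implementation

The basic idea is to simply create and then incorporate adversarial examples into the training process.
The question then arises as to which adversarial examples we should train on.

4. Code

4.1 Load the MNIST dataset
4.2 Initialize model_cnn
4.3 Define the fgsm and pgd functions
4.4 Define the standard training function and the adversarial attack function
4.5 Train the CNN (standard training, evaluated on clean and adversarial test data)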

    opt = optim.SGD(model_cnn.parameters(), lr=1e-1)
    for t in range(10):
        train_err, train_loss = epoch(train_loader, model_cnn, opt)
        test_err, test_loss = epoch(test_loader, model_cnn)
        adv_err, adv_loss = epoch_adversarial(test_loader, model_cnn, pgd_linf)
        if t == 4:
            # drop the learning rate after the fifth epoch
            for param_group in opt.param_groups:
                param_group["lr"] = 1e-2
        print(*("{:.6f}".format(i) for i in (train_err, test_err, adv_err)), sep="\t")

So as we saw before, the clean error is quite low, but the adversarial error is quite high (and actually goes up as we train the model more). Let’s now do the same thing, but with adversarial training.

4.6 Now for the fun part: adversarial training

    opt = optim.SGD(model_cnn_robust.parameters(), lr=1e-1)
    for t in range(10):
        # train on PGD adversarial examples instead of clean inputs
        train_err, train_loss = epoch_adversarial(train_loader, model_cnn_robust, pgd_linf, opt)
        test_err, test_loss = epoch(test_loader, model_cnn_robust)
        adv_err, adv_loss = epoch_adversarial(test_loader, model_cnn_robust, pgd_linf)
        if t == 4:
            for param_group in opt.param_groups:
                param_group["lr"] = 1e-2
        print(*("{:.6f}".format(i) for i in (train_err, test_err, adv_err)), sep="\t")

pretty good!

4.7 Compare the two CNNs (same architecture; only the training procedure differs)

    class Flatten(nn.Module):
        # reshape (N, C, H, W) to (N, C*H*W) so the conv output can feed nn.Linear
        def forward(self, x):
            return x.view(x.shape[0], -1)

    model_cnn_robust = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                                     nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                                     Flatten(),
                                     nn.Linear(7*7*64, 100), nn.ReLU(),
                                     nn.Linear(100, 10)).to(device)
    model_cnn = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                              nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                              Flatten(),
                              nn.Linear(7*7*64, 100), nn.ReLU(),
                              nn.Linear(100, 10)).to(device)


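4.8 Adversarial attack? Adversarial training?
In chapter 1, putting a sign in front of the loss controls whether the loss is maximized or minimized. Loss terms can also be combined and optimized jointly.
In chapter 3, we only attack: there is no backpropagation step that updates the weights. Instead the loss is maximized, which amounts to changing the input.
In chapter 4, minimizing the loss on perturbed inputs (adversarial examples produced by fgsm or pgd) amounts to adversarial training.

4.9 Wrapping up

4.10 Convex optimization
4.10.1 Strategy 1
Note: bound_propagation is the same as in chapter 3.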
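The distinction between attacking and adversarial training can be sketched in a few lines. The helper adversarial_step below is a hypothetical name of mine (the tutorial wraps the same idea in epoch_adversarial), and the tiny linear model and random data are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def fgsm(model, X, y, epsilon=0.1):
    """Attack only: one signed-gradient step on the input, weights untouched."""
    delta = torch.zeros_like(X, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(X + delta), y)
    loss.backward()
    return epsilon * delta.grad.detach().sign()

def adversarial_step(model, X, y, attack, opt=None):
    """Evaluate on X + delta; if an optimizer is given, also update the weights."""
    delta = attack(model, X, y)                 # inner maximization, over the input
    loss = nn.CrossEntropyLoss()(model(X + delta), y)
    if opt is not None:                         # outer minimization, over the weights
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

torch.manual_seed(0)
model = nn.Linear(20, 3)                        # hypothetical stand-in model
X, y = torch.rand(16, 20), torch.randint(0, 3, (16,))
opt = optim.SGD(model.parameters(), lr=1e-1)

before = model.weight.detach().clone()
adversarial_step(model, X, y, fgsm)             # attack only
unchanged = torch.equal(before, model.weight.detach())
adversarial_step(model, X, y, fgsm, opt)        # adversarial training step
changed = not torch.equal(before, model.weight.detach())
print(unchanged, changed)
```

The attack changes only delta, so the weights stay fixed. Passing an optimizer turns the same forward pass into a training step on the perturbed input.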

    def bound_propagation(model, initial_bound):
        # propagate elementwise lower/upper bounds (l, u) through each layer
        l, u = initial_bound
        bounds = []
        for layer in model:
            if isinstance(layer, Flatten):
                l_ = Flatten()(l)
                u_ = Flatten()(u)
            elif isinstance(layer, nn.Linear):
                # positive weights pair with the same-side bound, negative weights with the opposite one
                l_ = (layer.weight.clamp(min=0) @ l.t() + layer.weight.clamp(max=0) @ u.t()
                      + layer.bias[:,None]).t()
                u_ = (layer.weight.clamp(min=0) @ u.t() + layer.weight.clamp(max=0) @ l.t()
                      + layer.bias[:,None]).t()
            elif isinstance(layer, nn.Conv2d):
                # same rule as the linear case; the bias is broadcast over batch and spatial dims
                l_ = (nn.functional.conv2d(l, layer.weight.clamp(min=0), bias=None,
                                           stride=layer.stride, padding=layer.padding,
                                           dilation=layer.dilation, groups=layer.groups) +
                      nn.functional.conv2d(u, layer.weight.clamp(max=0), bias=None,
                                           stride=layer.stride, padding=layer.padding,
                                           dilation=layer.dilation, groups=layer.groups) +
                      layer.bias[None,:,None,None])
                u_ = (nn.functional.conv2d(u, layer.weight.clamp(min=0), bias=None,
                                           stride=layer.stride, padding=layer.padding,
                                           dilation=layer.dilation, groups=layer.groups) +
                      nn.functional.conv2d(l, layer.weight.clamp(max=0), bias=None,
                                           stride=layer.stride, padding=layer.padding,
                                           dilation=layer.dilation, groups=layer.groups) +
                      layer.bias[None,:,None,None])
            elif isinstance(layer, nn.ReLU):
                # ReLU is monotone, so it maps bounds to bounds
                l_ = l.clamp(min=0)
                u_ = u.clamp(min=0)
            bounds.append((l_, u_))
            l,u = l_, u_
        return bounds
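To see why splitting the weights into positive and negative parts gives valid bounds, here is a minimal sketch for a single linear layer. The helper name linear_interval_bounds and the random layer are my own assumptions. It computes the interval bounds and then checks that randomly sampled perturbations always land inside them.

```python
import torch
import torch.nn as nn

def linear_interval_bounds(layer, l, u):
    """Elementwise bounds on layer(x) for any x with l <= x <= u."""
    W_pos = layer.weight.clamp(min=0)   # positive weights: lower input gives lower output
    W_neg = layer.weight.clamp(max=0)   # negative weights: upper input gives lower output
    lower = l @ W_pos.t() + u @ W_neg.t() + layer.bias
    upper = u @ W_pos.t() + l @ W_neg.t() + layer.bias
    return lower, upper

torch.manual_seed(0)
layer = nn.Linear(5, 3)                 # hypothetical stand-in layer
x = torch.rand(4, 5)
epsilon = 0.1

with torch.no_grad():
    lower, upper = linear_interval_bounds(layer, x - epsilon, x + epsilon)
    # 100 random perturbations per example, all inside the l_inf ball
    noise = (2 * torch.rand(100, 4, 5) - 1) * epsilon
    out = layer(x + noise)              # shape (100, 4, 3)
    inside = bool(((out >= lower - 1e-6) & (out <= upper + 1e-6)).all())
print(inside)
```

The bounds hold for every point in the box, not just the sampled ones. That is what makes them usable for verification, at the cost of being loose once several layers are stacked.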

Note: unlike the chapter 3 version, the function below takes an extra idx argument.

    def interval_based_bound(model, c, bounds, idx):
        # requires last layer to be linear
        cW = c.t() @ model[-1].weight
        cb = c.t() @ model[-1].bias
        l,u = bounds[-2]
        return (cW.clamp(min=0) @ l[idx].t() + cW.clamp(max=0) @ u[idx].t() + cb[:,None]).t()    


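    def robust_bound_error(model, X, y, epsilon):
        initial_bound = (X - epsilon, X + epsilon)
        # bounds must be computed before they are used below
        bounds = bound_propagation(model, initial_bound)
        err = 0
        for y0 in range(10):
            # row i of C.t() @ logits is logit[y0] - logit[i]
            C = -torch.eye(10).to(device)
            C[y0,:] += 1
            err += (interval_based_bound(model, C, bounds, y==y0).min(dim=1)[0] < 0).sum().item()
        return err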


    def epoch_robust_bound(loader, model, epsilon):
        total_err = 0
        # one specification matrix per true class y0
        C = [-torch.eye(10).to(device) for _ in range(10)]
        for y0 in range(10):
            C[y0][y0,:] += 1
        for X,y in loader:
            X,y = X.to(device), y.to(device)
            initial_bound = (X - epsilon, X + epsilon)
            bounds = bound_propagation(model, initial_bound)
            for y0 in range(10):
                lower_bound = interval_based_bound(model, C[y0], bounds, y==y0)
                # an example counts as an error if any margin bound is negative
                total_err += (lower_bound.min(dim=1)[0] < 0).sum().item()
        return total_err / len(loader.dataset)

That doesn’t seem particularly useful, and indeed it is a property of virtually all relaxation-based verification approaches that they are vacuous when evaluated on a network trained without knowledge of these bounds.

4.10.2 Strategy 2: Training using provable criteria
If we train a network specifically to minimize a loss based upon this upper bound, we get a network where the bounds are meaningful. This is a somewhat subtle but important point which is worth repeating.

    def epoch_robust_bound(loader, model, epsilon, opt=None):
        total_err = 0
        total_loss = 0
        C = [-torch.eye(10).to(device) for _ in range(10)]
        for y0 in range(10):
            C[y0][y0,:] += 1
        for X,y in loader:
            X,y = X.to(device), y.to(device)
            initial_bound = (X - epsilon, X + epsilon)
            bounds = bound_propagation(model, initial_bound)
            loss = 0
            for y0 in range(10):
                if sum(y==y0) > 0:
                    lower_bound = interval_based_bound(model, C[y0], bounds, y==y0)
                    # cross entropy on the negated margin bounds upper-bounds the worst-case loss
                    loss += nn.CrossEntropyLoss(reduction='sum')(-lower_bound, y[y==y0]) / X.shape[0]
                    total_err += (lower_bound.min(dim=1)[0] < 0).sum().item()
            total_loss += loss.item() * X.shape[0]
            if opt:
                # training mode: take a gradient step on the robust loss
                opt.zero_grad()
                loss.backward()
                opt.step()
        return total_err / len(loader.dataset), total_loss / len(loader.dataset)
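Why does cross entropy on the negated lower bound work as a training loss? The sketch below illustrates it on a single linear layer with random data, both of which are assumptions of mine, and margin_lower_bound is a hypothetical helper rather than the tutorial's interval_based_bound. The helper bounds the margins z_y - z_j from below over the input box, and the resulting robust loss is then compared with the loss at randomly sampled perturbed inputs.

```python
import torch
import torch.nn as nn

def margin_lower_bound(layer, l, u, y):
    """Lower bound on z_y - z_j for every class j, over all x with l <= x <= u."""
    W, b = layer.weight, layer.bias
    cW = W[y][:, None, :] - W[None, :, :]        # (N, classes, features)
    cb = b[y][:, None] - b[None, :]              # (N, classes)
    return ((cW.clamp(min=0) * l[:, None, :]).sum(-1)
            + (cW.clamp(max=0) * u[:, None, :]).sum(-1) + cb)

torch.manual_seed(0)
layer = nn.Linear(6, 4)                          # hypothetical stand-in for the network
x = torch.rand(3, 6)
y = torch.tensor([0, 2, 3])
epsilon = 0.05
loss_fn = nn.CrossEntropyLoss(reduction="none")

with torch.no_grad():
    lb = margin_lower_bound(layer, x - epsilon, x + epsilon, y)
    robust_loss = loss_fn(-lb, y)                # per-example upper bound on the loss
    # loss at 50 random points inside the ball, for comparison
    noise = (2 * torch.rand(50, 3, 6) - 1) * epsilon
    logits = layer(x + noise).reshape(-1, 4)
    sampled = loss_fn(logits, y.repeat(50)).reshape(50, 3)
    sampled_worst = sampled.max(dim=0)[0]
print(robust_loss.tolist(), sampled_worst.tolist())
```

Since z_j - z_y can never exceed -lb_j inside the box, the log-sum-exp of the negated bounds dominates the true cross entropy at every perturbed input. Minimizing it therefore pushes down a guaranteed ceiling on the adversarial loss.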

Finally, let’s train our model using this robust loss bound. Note that training provably robust models is a bit of a tricky business. If we start out immediately by trying to train our robust bound with the full ϵ=0.1, the model will collapse to just predicting equal probability for all digits, and will never recover. Instead, to reliably train such models we need to schedule ϵ during the training process, starting with a small ϵ and gradually raising it to the desired level. The schedule we use below was picked rather randomly, and we can do much better with a bit of tweaking, but it serves our basic purpose.

    model_cnn_robust_2 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1, stride=2), nn.ReLU(),
                                       nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                                       Flatten(),
                                       nn.Linear(7*7*64, 100), nn.ReLU(),
                                       nn.Linear(100, 10)).to(device)
    opt = optim.SGD(model_cnn_robust_2.parameters(), lr=1e-1)
    # ramp epsilon up gradually, then hold it at the target value 0.1
    eps_schedule = [0.0, 0.0001, 0.001, 0.01, 0.01, 0.05, 0.05, 0.05, 0.05, 0.05] + 15*[0.1]
    print("Train Eps", "Train Loss*", "Test Err", "Test Robust Err", sep="\t")
    for t in range(len(eps_schedule)):
        train_err, train_loss = epoch_robust_bound(train_loader, model_cnn_robust_2, eps_schedule[t], opt)
        test_err, test_loss = epoch(test_loader, model_cnn_robust_2)
        adv_err, adv_loss = epoch_robust_bound(test_loader, model_cnn_robust_2, 0.1)
        #if t == 4:
        #    for param_group in opt.param_groups:
        #        param_group["lr"] = 1e-2
        print(*("{:.6f}".format(i) for i in (eps_schedule[t], train_loss, test_err, adv_err)), sep="\t")

Train Eps Train Loss* Test Err Test Robust Err
0.000000 0.829700 0.033800 1.000000
0.000100 0.126095 0.022200 1.000000
0.001000 0.119049 0.021500 1.000000
0.010000 0.227829 0.019100 1.000000
0.010000 0.129322 0.022900 1.000000
0.050000 1.716497 0.162200 0.828500
0.050000 0.744732 0.092100 0.625100
0.050000 0.486411 0.073800 0.309600
0.050000 0.393822 0.068100 0.197800
0.050000 0.345183 0.057100 0.169200
0.100000 0.493925 0.068400 0.129900
0.100000 0.444281 0.067200 0.122300
0.100000 0.419961 0.063300 0.117400
0.100000 0.406877 0.061300 0.114700
0.100000 0.401603 0.061500 0.116400
0.100000 0.387260 0.059600 0.111100
0.100000 0.383182 0.059400 0.108500
0.100000 0.375468 0.057900 0.107200
0.100000 0.369453 0.056800 0.107000
0.100000 0.365821 0.061300 0.116300
0.100000 0.359339 0.053600 0.104200
0.100000 0.358043 0.053000 0.097500
0.100000 0.354643 0.055700 0.101500
0.100000 0.352465 0.053500 0.096800
0.100000 0.348765 0.051500 0.096700
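The hand-picked eps_schedule list could also be generated programmatically. The sketch below shows one possible alternative, a linear ramp followed by a hold. The function name linear_eps_schedule and its parameters are hypothetical, and this is not the schedule that produced the numbers above.

```python
def linear_eps_schedule(target_eps, warmup_epochs, total_epochs):
    """Ramp epsilon linearly from 0 toward target_eps, then hold it there."""
    ramp = [target_eps * t / warmup_epochs for t in range(warmup_epochs)]
    hold = [target_eps] * (total_epochs - warmup_epochs)
    return ramp + hold

# same length and target as the hand-picked schedule: 10 warm-up epochs, 15 at 0.1
eps_schedule = linear_eps_schedule(0.1, warmup_epochs=10, total_epochs=25)
print(len(eps_schedule), eps_schedule[0], eps_schedule[-1])
```

A ramp like this keeps the early epochs close to standard training, which is the property the text says is needed to avoid the collapse to uniform predictions. Whether it trains better than the hand-picked list would have to be checked experimentally.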

    print("PGD, 40 iter: ", epoch_adversarial(test_loader, model_cnn_robust_2, pgd_linf, num_iter=40)[0])

PGD, 40 iter: 0.0779

So somewhere right in the middle. Note also that training these provably robust models is a challenging task, and a bit of tweaking (even still using interval bounds) can perform quite a bit better. For now, though, this is sufficient to make our point that we can obtain non-trivial provable bounds for trained networks.



