An Overlook of Cyberattacks Against Neural Networks

2023-05-16

Artificial neural networks can be thought of as computational models of biological neural networks. Is it possible to perceive an object as something it is not? If our perception of the outside world is an interpretation built from sensory information, can we alter that interpretation so that something is perceived differently from what it really is? If so, how? And how can we keep neural networks from misclassifying inputs under malicious attacks, such as adversarial poisoning and evasion attacks on white-box and black-box models?

Some context

The use of artificial neural networks (ANNs) can be traced back to 1943 and the McCulloch-Pitts neuron. This computational model of a neuron used thresholding logic to make decisions: the output either met a threshold or it didn't, and decisions were made from the outputs of many such neurons connected together. This model is also known as a perceptron, the building block of deep neural networks (DNNs are artificial neural networks with more layers than the usual three: an input layer, a hidden layer, and an output layer).
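
As a minimal sketch of that thresholding logic, here is a single perceptron-style unit in Python; the AND-gate weights are purely illustrative and are not taken from the original paper.

```python
import numpy as np

def perceptron(x, w, b):
    """Threshold unit: output 1 if the weighted sum reaches the threshold, else 0."""
    return int(np.dot(w, x) + b >= 0)

# Illustrative weights that make the unit behave like a 2-input AND gate:
# it only "fires" when both inputs are 1.
w, b = np.array([1.0, 1.0]), -1.5
print([perceptron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```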

We have come a long way since this model was introduced: by the mid-2000s DNNs were performing many tasks, and Deep Learning (DL) has taken these neural networks to higher accuracies than previously used classification methods. The advancement of DL has also motivated adversaries to manipulate DNNs and force misclassification of inputs. Today we use neural networks in areas such as speech recognition, face/object recognition, fraud detection, security applications, etc.

If neural networks with high accuracy demands, used in important applications such as self-driving cars, were attacked maliciously to force misclassifications, the matter could turn into a life-or-death scenario: a physical attack might keep the system from classifying a stop sign properly or from identifying a red light as a red light. In this post, I will delve into possible security attacks against neural networks, such as adversarial attacks, how they affect the network, and how we could avoid such attacks to build more robust systems.

Possible attacks against neural networks

Adversarial Attacks

Adversarial machine learning is an ML technique employed in an attempt to fool models through malicious inputs. The main idea is the introduction of strategic noise.

White-Box vs. Black-Box Attacks

A white-box attack takes place where the attacker has access to the architecture of the network. Knowing the structure of the network allows the attacker to reason about individual neurons and select the most damaging attack to perform.

To perform a white-box adversarial attack on a binary classification model trained on a subset of the iris dataset, with two inputs x_1 and x_2 and two classes, class 0 and class 1, the strategy would be to create adversarial examples that fool the model based on its inputs.

To have the red points classified as class 1 we would need to move them across the decision boundary. The change that moves a point from class 0 to class 1 is called a perturbation. With full knowledge of how the model works, the attacker can determine how the loss changes as the inputs to the function change, and use that to move a point across the decision boundary. How does this work?

If the dot product z of a certain input with the weights were negative, say z ≤ -1, then with a sigmoid activation function we would expect the output certainty that this input belongs to class 1 to be fairly small, certainty < 27%, since:

σ(z) = 1 / (1 + e^(-z)), and σ(-1) ≈ 0.27

Hence, we know that if we increase the dot product then the model becomes more confident that an input belongs to class 1. An adversarial example, adx, could be constructed as adx = x + 0.5w: adding half of the weight vector to the input raises the dot product by 0.5·(w·w), which is never negative, so we can obtain a value z > 0 and the model is now more confident that the input belongs to class 1.
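
Here is a minimal sketch of that white-box manipulation, assuming a simple logistic model; the weights, bias, and input below are made up for illustration and are not the post's actual iris model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logistic model on two iris-like features: P(class 1 | x) = sigmoid(w.x + b)
w = np.array([1.2, -2.0])
b = -0.5
x = np.array([0.4, 0.9])            # original input, classified as class 0

z = w @ x + b
adx = x + 0.5 * w                   # perturbation in the direction of the weight vector
z_adv = w @ adx + b                 # raises z by 0.5 * (w.w), which is always >= 0

print(f"original    z = {z:+.2f}, P(class 1) = {sigmoid(z):.2f}")
print(f"adversarial z = {z_adv:+.2f}, P(class 1) = {sigmoid(z_adv):.2f}")
```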

However, the attacker may not always know the structure of the model.

A black-box attack can be defined as one where the attacker has access only to the network's inputs and outputs, but not to any internal parameters. Hence, an attacker could send inputs and observe outputs such as the predicted labels or classes to design an adversarial attack.

The strategy carried out in a black-box adversarial attack, devised by N. Papernot, I. Goodfellow, and colleagues in Practical Black-Box Attacks against Machine Learning, is to train a substitute model on a random sample of data. Adversarial examples are then created from that dataset using gradient-based attacks. The objective of a gradient-based attack, described in Explaining and Harnessing Adversarial Examples by I. Goodfellow, is to move a point over a model's decision boundary as explained above. Here, each adversarial example is a step in the direction of the substitute model's gradient, used to determine whether the black-box model classifies the new data point the same way as the substitute model. In this way the substitute model gets a more precise picture of where the black-box model's decision boundary lies, and after a few iterations it shares almost exactly the same decision boundaries as the black-box model.
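
As a sketch of such a gradient-based step, here is the Fast Gradient Sign Method (FGSM) applied to a logistic-regression stand-in for the substitute model; the weights and the data point are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method on a logistic-regression substitute model.

    For the cross-entropy loss L of a logistic model, dL/dx = (sigmoid(w.x + b) - y) * w,
    so stepping by eps * sign(dL/dx) moves x in the direction that increases the loss most.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical substitute model and a correctly classified point of class 1
w, b = np.array([2.0, -1.0]), 0.1
x, y = np.array([1.0, 0.5]), 1.0

x_adv = fgsm(x, y, w, b, eps=0.9)
print("clean prediction      :", sigmoid(w @ x + b))      # above 0.5: class 1
print("adversarial prediction:", sigmoid(w @ x_adv + b))  # pushed below 0.5: class 0
```

Taking only the sign of the gradient keeps the perturbation small and evenly spread across features, which is part of what makes the change hard for a human to notice.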

The substitute model doesn't even need to be the same type of ML model as the black box. In fact, a simple Multi-Layer Perceptron is enough to learn decision boundaries close enough to those of a complex Convolutional Neural Network. Ultimately, with a small sample of data and a few iterations of data augmentation and labeling, a black-box model can be successfully attacked.

The adversarial examples instantiated by altered and perturbed inputs force a classifier to misclassify the resulting adversarial inputs, while a human observer is still able to classify them correctly. For example, an autonomous vehicle under attack may fail to identify a stop sign that a human observer recognizes without any trouble.

Practical Black-Box Attacks against Machine Learning

Types of Adversarial Attacks

Poisoning Attack

A type of adversarial attack in which the attacker provides some malicious input that causes the decision boundary between two classes to change. For example, going back to the binary classification model in Figure 2, the data used to train the model lets it establish a decision boundary between the red and blue inputs. If a malicious input were to change the position of that decision boundary, the model would start misclassifying some inputs from then on.
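
A toy sketch of a poisoning attack, assuming a two-cluster dataset and a scikit-learn logistic regression; the data, the probe point, and the number of poisoned points are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical 2-D training set: class 0 clustered around (0, 0), class 1 around (2, 2).
X_clean = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
y_clean = np.array([0] * 50 + [1] * 50)

# Poisoning: inject a handful of mislabeled points deep inside class 1's region,
# which drags the learned decision boundary towards the class-1 cluster.
X_poison = np.vstack([X_clean, rng.normal(2, 0.2, (10, 2))])
y_poison = np.concatenate([y_clean, np.zeros(10, dtype=int)])

clean_model = LogisticRegression().fit(X_clean, y_clean)
poisoned_model = LogisticRegression().fit(X_poison, y_poison)

probe = np.array([[1.4, 1.4]])  # a point near the original boundary
# P(class 1) for the probe should drop once the poison is included in training.
print("clean    P(class 1):", clean_model.predict_proba(probe)[0, 1])
print("poisoned P(class 1):", poisoned_model.predict_proba(probe)[0, 1])
```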

Evasion Attacks

This is also a type of adversarial attack, in which the attacker causes the model to misclassify a sample at prediction time. More concretely, suppose an ML model classifies whether a bank transaction is fraudulent based on certain parameters, whose weights determine the classification. If the attacker is dealing with a white-box system, they can find out which parameter pushes the decision towards "not fraud" and craft the transaction to exaggerate that parameter.
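
A minimal sketch of that evasion idea on a hypothetical white-box fraud model; the features, weights, and numbers are invented, and a real fraud model would be far more complex.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical white-box fraud model: P(fraud | x) = sigmoid(w.x + b)
# features: [amount, num_transfers_last_hour, account_age_years]
w = np.array([0.8, 1.1, -0.9])   # negative weight: older accounts look less fraudulent
b = -1.0

x = np.array([2.0, 1.5, 0.2])    # a fraudulent transaction, currently flagged
print("before evasion:", sigmoid(w @ x + b))

# Evasion: exaggerate the feature with the most negative weight (here, account age),
# e.g. by routing the transaction through an old, compromised account.
i = np.argmin(w)
x_evade = x.copy()
x_evade[i] += 3.0
print("after evasion :", sigmoid(w @ x_evade + b))   # drops below the 0.5 threshold
```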

E.g. the MIT turtle/rifle misclassification incident: in 2017 a paper was released showing misclassifications of various objects by Google's InceptionV3 image classifier as a result of small perturbations to those objects, an adversarial data modification attack. A 3D-printed turtle was classified as a rifle from every angle it was shown to the camera.

MIT adversarial attack research

Adversarial Capabilities

The term adversarial capabilities refers to the amount of information available to an adversary about the system. For illustration, consider the case of an automated vehicle system with the attack surface being testing time. An internal adversary is one who has access to the model architecture and can use it to distinguish between different images and traffic signs, whereas a weaker adversary is one who has access only to the dump of images fed to the model during testing time. Though both adversaries work on the same attack surface, the former is assumed to have much more information and is thus strictly “stronger”. Below we explore the range of adversarial capabilities in machine learning systems as they relate to the testing and training phases.

Methods of attack based on adversarial capabilities:

Label modification — The adversary can modify solely the labels in supervised learning datasets.

Data injection — The adversary has no access to the training data or to the learning algorithm, but has the ability to augment the training set with new data.

Data modification — The adversary does not have access to the learning algorithm but has full access to the training data.

Logic corruption — The adversary has the ability to meddle with the learning algorithm.

Applications to Supervised, Unsupervised, and Reinforcement ML models

Supervised Models

Supervised ML models are task-driven and are used mainly for classification and regression purposes. For a supervised learning model, the input data is labeled. Alongside the input, the corresponding output is known by the supervisor. These models can be attacked by label modification, data injection, and data modification. Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.

Classification can simply be explained as mapping inputs to output classes. As explained above, misclassification is possible via poisoning and evasion attacks in areas such as fraud detection, where many parameters come into play, and in other binary or non-binary classification models. The methods of attack are discussed above.

Unsupervised Models

Unsupervised learning models are more data-driven than supervised models. The input data carries no labels and there is no supervisor and no feedback. Common methods of producing output in this setting are clustering and association.

In simple terms, clustering is grouping data according to similarity, whereas association is discovering rules that describe the data, also known as finding patterns. These are all methods well known in 'data mining.' Examples of Unsupervised Learning: Apriori algorithm, K-means.

Unsupervised models can be attacked mainly using data modification and injection based on adversarial capabilities due to their data-driven nature.

Reinforcement Learning

Reinforcement Learning is a branch of AI, often referred to as true Machine Learning. This type of learning allows machines to automatically determine ideal behaviors in specific contexts, using a reward system. The goal is to take actions according to observations gathered from the interaction with the environment to maximize rewards. The possible method of attack would be to reinforce the agent incorrectly. Example of Reinforcement Learning: Markov Decision Process.

A Markov Decision Process (MDP), shown in the figure below, is a mathematical framework for modeling decision making. The agent observes the input state of the graph. A decision-making function allows the agent to take some action (the orange circles). The output produced by that action is the process by which the environment reinforces the agent, and the orange arrows indicate the rewards and punishments that drive the algorithm. A graph like this helps visualize how an attack may be carried out against a reinforcement learning agent.
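
To illustrate what "reinforcing the agent incorrectly" can look like, here is a toy Q-learning agent on a hypothetical two-state MDP in which the attacker flips the reward signal; the environment is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny, hypothetical MDP: two states and two actions. Action 1 yields reward +1,
# action 0 yields 0, and every step simply flips the state.
def step(state, action):
    reward = 1.0 if action == 1 else 0.0
    return (state + 1) % 2, reward

def q_learning(poison=False, episodes=2000, alpha=0.1, gamma=0.9, eps=0.1):
    Q = np.zeros((2, 2))
    state = 0
    for _ in range(episodes):
        action = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        if poison:
            reward = -reward   # the attacker corrupts the reinforcement signal
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
    return Q.argmax(axis=1)    # greedy action in each state

print("clean policy   :", q_learning(poison=False))  # should learn to pick action 1
print("poisoned policy:", q_learning(poison=True))   # should learn to avoid action 1
```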

What can be done to avoid such attacks?

Use cloud ML models

Cloud-based models mean an intruder can’t play with the model locally. Of course, technically, an attacker could still try to brute-force the cloud ML model. But a black box attack like this takes a lot of time and would be easily detected.

Adversarial Training

Actively generate adversarial examples, assign them their correct labels, and add them to the training set. You can then train a new network on this updated training set, which helps make the network more robust to adversarial examples.
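
A rough sketch of that loop on a toy problem, reusing the FGSM-style perturbation from earlier against a scikit-learn logistic regression; the dataset and epsilon are invented, and on such a simple linear model the robustness gain is modest.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical 2-D dataset: class 0 around (0, 0), class 1 around (2, 2).
X = np.vstack([rng.normal(0, 0.4, (100, 2)), rng.normal(2, 0.4, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

model = LogisticRegression().fit(X, y)

# Craft FGSM-style adversarial examples against the current model,
# keep their *correct* labels, and retrain on the augmented set.
eps = 0.5
p = model.predict_proba(X)[:, 1]
grad_X = (p - y)[:, None] * model.coef_          # d(cross-entropy)/dX for logistic regression
X_adv = X + eps * np.sign(grad_X)

X_aug, y_aug = np.vstack([X, X_adv]), np.concatenate([y, y])
robust_model = LogisticRegression().fit(X_aug, y_aug)

# On this toy linear problem the gap is small; with deep networks the same loop
# is what makes the retrained model noticeably harder to fool.
print("plain model accuracy on adversarial points :", model.score(X_adv, y))
print("robust model accuracy on adversarial points:", robust_model.score(X_adv, y))
```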

Smooth decision boundaries

Smooth the decision boundaries between classes to make it harder to manipulate the network's classification using strategic noise injection.
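
The post does not name a specific technique here, but one common way to encourage smoother, less over-confident boundaries is label smoothing; the sketch below only shows how the training targets would be softened.

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Label smoothing: replace one-hot targets with softened targets.

    Training against softened targets discourages over-confident predictions,
    which tends to produce smoother decision boundaries that are harder to
    cross with tiny, strategic perturbations.
    """
    one_hot = np.eye(num_classes)[y]
    return one_hot * (1.0 - eps) + eps / num_classes

y = np.array([0, 2, 1])
print(smooth_labels(y, num_classes=3))
# targets become ~0.933 for the true class and ~0.033 for the others
```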

Penetration Testing

Hire an attacker to assess the magnitude of the damage that could be done to your model. This paints the big picture of how much damage an actual cyber attack could do.

Translated from: https://medium.com/analytics-vidhya/an-overlook-of-cyberattacks-against-neural-networks-e221b7cff3fd
