How Do Traditional Disciplines Combine with Deep Learning?

2023-05-16

In my earlier post [Programmers Reading Papers] on LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015), the paper noted that deep learning has broad prospects across many industries. Recently I saw a post on Bi Dao's WeChat public account, "What does a rookie programmer spend his days tinkering with?", and I was a little puzzled: isn't Bi Dao in chemical engineering? How did he end up programming, and with content related to deep learning at that? Curiosity drove me to look into what chemical engineering students at Tsinghua are actually researching.

1. Foreword

At the bottom of Bi Dao's public-account post, several papers were listed:

1、Zheng, Shaodong, and Jinsong Zhao. “A Self-Adaptive Temporal-Spatial Self-Training Algorithm for Semi-Supervised Fault Diagnosis of Industrial Processes.” IEEE Transactions on Industrial Informatics (2021).

2、Wu, Deyang, and Jinsong Zhao. “Process topology convolutional network model for chemical process fault diagnosis.” Process Safety and Environmental Protection 150 (2021): 93-109.

3、Xiang, Shuaiyu, Yiming Bai, and Jinsong Zhao. “Medium-term Prediction of Key Chemical Process Parameter Trend with Small Data.” Chemical Engineering Science (2021): 117361.

It turns out their advisor is also a big name in the field; here is his profile: https://www.chemeng.tsinghua.edu.cn/info/1094/2385.htm

I also happened to come across Bi Dao's own paper. He really is impressive: not long after returning to school, he already got a paper published.


As you can tell from the titles, these papers essentially all use deep learning models to detect and identify faults in chemical processes. Producing a paper like this generally requires a solid understanding of the chemical process, a large accumulation of process data, and, on top of that, a good command of deep learning.

2. Paper Content

Since I am not very familiar with chemical processes myself, this post only gives a brief reading, mainly to understand how deep learning is combined with chemical process engineering.


Let's start with Bi Dao's paper: A novel orthogonal self-attentive variational autoencoder method for interpretable chemical process fault detection and identification

Abstract

Industrial processes are becoming increasingly large and complex, thus introducing potential safety risks and requiring an effective approach to maintain safe production. Intelligent process monitoring is critical to prevent losses and avoid casualties in modern industry. As the digitalization of process industry deepens, data-driven methods offer an exciting avenue to address the demands for monitoring complex systems. Nevertheless, many of these methods still suffer from low accuracy and slow response. Besides, most black-box models based on deep learning can only predict the existence of faults, but cannot provide further interpretable analysis, which greatly confines their usage in decision-critical scenarios. In this paper, we propose a novel orthogonal self-attentive variational autoencoder (OSAVA) model for process monitoring, consisting of two components, orthogonal attention (OA) and variational self-attentive autoencoder (VSAE). Specifically, OA is utilized to extract the correlations between different variables and the temporal dependency among different timesteps; VSAE is trained to detect faults through a reconstruction-based method, which employs self-attention mechanisms to comprehensively consider information from all timesteps and enhance detection performance. By jointly leveraging these two models, the OSAVA model can effectively perform fault detection and identification tasks simultaneously and deliver interpretable results. Finally, extensive evaluation on the Tennessee Eastman process (TEP) demonstrates that the proposed OSAVA-based fault detection and identification method shows promising fault detection rate as well as low detection delay and can provide interpretable identification of the abnormal variables, compared with representative statistical methods and state-of-the-art deep learning methods.


Introduction

Safe production is a continuing concern within modern industry. With the development of automation and digitization, industrial processes can be efficiently controlled by systems like distributed control systems (DCS) and advanced process control (APC) (Shu et al., 2016). However, despite advances in control systems that have made production more intelligent, real-world processes are often rather complicated and inevitably in a fault state, leading to shutdowns, economic losses, injuries, or even catastrophic accidents in severe cases (Venkatasubramanian et al., 2003). Therefore, it is imperative to achieve higher levels of safety, efficiency, quality, and profitability by compensating for the effects of faults occurring in the processes (Qin, 2012, Ge et al., 2013, Weese et al., 2016).

Advances in process safety management have highlighted the importance of intelligent process monitoring (Madakyaru et al., 2017, Fazai et al., 2019). Overall, process monitoring is associated with four procedures: fault detection, fault identification, fault diagnosis, and process recovery (Chiang et al., 2001). Fault detection refers to determining whether a fault has occurred. Fault identification refers to identifying the variables most relevant to the fault. Fault diagnosis requires further specification of the type and cause of the fault. Customarily, fault detection and diagnosis are collectively referred to as FDD. Khan et al. described the development of process safety in terms of risk management (RA) and pointed out that the integration of FDD and RA had effectively improved the level of process safety (Khan et al., 2015). Arunthavanathan et al. elucidated the dialectical interrelation between FDD, RA, and abnormal situation management (ASM), further expanding the connotation of these concepts from safety perspectives (Arunthavanathan et al., 2021). ASM is a centralized and integrated process that implies instant detection of abnormal conditions, timely diagnosis of the root causes, and decision support to operators for the elimination of the faults (Hu et al., 2015, Dai et al., 2016). It has become a consensus in academia and industry that process monitoring including FDD is one of the most critical issues of ASM (Shu et al., 2016). Therefore, building an efficient, robust, and application-worthy process monitoring framework is of supreme importance for process safety.

In recent years, many researchers have enriched the scope of process monitoring from the standpoint of safety and risk management. For example, BahooToroody et al. proposed a process monitoring based signal filtering method for the safety assessment of the natural gas distribution process (BahooToroody et al., 2019). Yin et al. deployed a supervised data mining method for the risk assessment and smart alert of gas kick during the industrial drilling process (Yin et al., 2021). Amin et al. proposed a novel risk assessment method that integrated multivariate process monitoring and a logical dynamic failure prediction model (Amin et al., 2020). In summary, effective and reliable process monitoring can reduce the risk of accidents, keep operators informed of the status of the process, and thereby enhance process safety.

On the other hand, process monitoring methods can be divided into the following categories in line with the way of modeling: the first principle methods, also known as white-box models, and data-driven methods, also known as black-box models (Lam et al., 2017). First principle methods mainly use knowledge, mechanisms, and mathematical equations to describe the process quantitatively. On the contrary, data-driven methods analyze regular patterns of processes only from collected data without prescribing the meaning of faulty states. With the widespread use of big data and the strengthening of computing power, data-driven methods have gradually attracted attention from both academia and industry (Shardt, 2015).

Data-driven process monitoring methods can be roughly grouped into two lines of development: multivariate statistical methods and deep learning methods. Traditional statistical methods mainly include principal component analysis (PCA) (Wise et al., 1990), partial least squares (PLS) (Kresta et al., 1991), independent component analysis (ICA) (Comon, 1994), and Fisher discriminant analysis (FDA) (Chiang et al., 2000), etc. There have been insightful discussions on the merits of these methods (Joe Qin, 2003). In order to cope with the nonlinear and dynamic nature of actual industrial processes, kernel-based and dynamic-based methods are derived (Kaspar and Harmon Ray, 1993, Ku et al., 1995, Rosipal and Trejo, 2001, Cho et al., 2005). Unfortunately, these traditional methods are not well equipped to deal with complex problems and still suffer from low accuracy rates.
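
To make the statistical route concrete for a programmer's eye, below is a minimal sketch of how PCA-based monitoring typically flags a fault with the Hotelling's T² and SPE statistics. The data, dimensions, and thresholds are made up for illustration; this is the generic textbook recipe, not code from any of the cited papers.

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA on normal operating data only (rows = samples, columns = process variables).
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(1000, 20))            # stand-in for normal training data
pca = PCA(n_components=5).fit(X_normal)

def t2_spe(X, pca):
    """Hotelling's T^2 in the latent space and SPE (squared prediction error)."""
    scores = pca.transform(X)
    t2 = np.sum(scores ** 2 / pca.explained_variance_, axis=1)
    recon = pca.inverse_transform(scores)
    spe = np.sum((X - recon) ** 2, axis=1)
    return t2, spe

# Control limits are usually estimated from normal data, e.g. a high percentile.
t2_train, spe_train = t2_spe(X_normal, pca)
t2_lim, spe_lim = np.percentile(t2_train, 99), np.percentile(spe_train, 99)

X_test = rng.normal(loc=2.0, size=(10, 20))       # stand-in for possibly faulty data
t2, spe = t2_spe(X_test, pca)
fault = (t2 > t2_lim) | (spe > spe_lim)           # flag a sample if either statistic exceeds its limit
print(fault)
```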

With the rapid rise of deep learning, methods based on deep neural networks (DNN) for process monitoring have drawn tremendous attention. Due to their powerful ability to learn the intrinsic regularities and representation hierarchies of data, deep-learning-based methods have become a mainstream area in the field of process monitoring today, with advanced models emerging in recent years. For example, deep belief networks (DBN) and convolutional neural networks (CNN) are used to perform chemical fault diagnosis tasks and have achieved excellent results (Lv et al., 2016, Zhang and Zhao, 2017, Wu and Zhao, 2018). However, these works consider fault diagnosis as a supervised classification task, and thus require the labels of all kinds of faults in advance. Nevertheless, the occurrence of a fault is a low probability event in reality, and even if a faulty event occurs, it is often uncharted. Such a situation makes it impractical to acquire well-labeled fault data and reduces the practical value of deploying supervised learning methods in industrial process monitoring.

By contrast, unsupervised process monitoring methods usually extract features from normal data only, which may have broader application prospects in practice. Among them, autoencoder (AE) and its derivatives are representative neural networks (Hinton, 2006). They have been successfully applied in many scenarios, including image feature extraction (Le, 2013), machine translation (Cheng et al., 2016), anomaly detection (Zhou and Paffenroth, 2017, Al-Qatf et al., 2018, Roy et al., 2018), and fault diagnosis (Zheng and Zhao, 2020). In principle, AE learns to reconstruct the normal data through dimension reduction and regards the data that cannot be reconstructed as anomalies (Längkvist et al., 2014).
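
To make the reconstruction-based idea concrete, here is a minimal autoencoder sketch in PyTorch: train only on normal samples, then treat samples whose reconstruction error exceeds a control limit as faults. The layer sizes and the 52-variable width (roughly the Tennessee Eastman dimensionality) are my own assumptions, not the model from any of the cited papers.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, n_vars=52, latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_vars, 32), nn.ReLU(), nn.Linear(32, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, n_vars))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_normal = torch.randn(256, 52)                    # stand-in for normal (fault-free) samples

for _ in range(200):                               # train to reconstruct normal data only
    opt.zero_grad()
    loss = ((model(x_normal) - x_normal) ** 2).mean()
    loss.backward()
    opt.step()

# At monitoring time, a sample whose reconstruction error exceeds the control
# limit (set from normal data) is treated as a potential fault.
with torch.no_grad():
    err = ((model(x_normal) - x_normal) ** 2).mean(dim=1)
    limit = torch.quantile(err, 0.99)
```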

Despite its wide application in many domains, one disadvantage of AE is that it is solely trained to reduce the reconstruction loss through encoding and decoding, but it does not regularize the latent space, which is prone to overfitting and does not give meaningful representations (Fu et al., 2019). Variational autoencoder (VAE) has a similar structure to AE but works in a perspective of probability and estimates a posterior distribution in latent space (Kingma and Welling, 2014). By introducing such explicit regularization, VAE can be trained to obtain a latent space with good properties such as continuity and completeness, which allows for interpolation and interpretation (Kingma and Welling, 2019). So far, VAE has been extensively applied to process monitoring (Lee et al., 2019, Wang et al., 2019; Zhang et al., 2019b) as well as many other fields (Walker et al., 2016; Zhang et al., 2019a; Pol et al., 2020).
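
The difference from a plain AE boils down to the explicit KL regularization term in the VAE objective. Below is a minimal sketch of the standard VAE loss and reparameterization trick, following Kingma and Welling (2014) in general form rather than the specific model in this paper:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Draw z ~ N(mu, sigma^2) in a differentiable way (the reparameterization trick)."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction error plus KL divergence to a unit Gaussian prior.
    The KL term is the latent-space regularization that a plain AE lacks."""
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```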

However, AE and VAE are primarily static deep networks and do not consider the dynamic behavior of data (Längkvist et al., 2014). Since industrial process data is often in a form of complex multivariate time series, it is necessary to capture the temporal correlations and characteristics. To better process such time sequences, the model should be able to comprehensively consider the current time step as well as its relation to time steps from the past. Recurrent neural networks (RNN) use several internal states to memorize variable length sequences of inputs to model the temporal dynamic behavior. Benefiting from this, variational recurrent autoencoder (VRAE) is proposed and applied in many areas in the past couple of years. For example, Park et al. used an LSTM-VAE structure to perform multimodal anomaly detection tasks in a Robot-Assisted Feeding scene (Park et al., 2017). Lin et al. proposed a hybrid model of VAE and LSTM as an unsupervised approach for anomaly detection in time series (Lin et al., 2020). Cheng expounded a novel fault detection method based on VAE and GRU, which achieved both higher detection accuracy and lower detection delay than conventional methods on the Tennessee Eastman process (Cheng et al., 2019).
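
As a concrete picture of how an LSTM-VAE handles the time dimension, here is a sketch of just the encoder half: an LSTM summarizes a window of process data and two linear heads produce the latent mean and log-variance. Layer sizes and shapes are assumptions, in the spirit of the LSTM-VAE works cited above rather than their actual code.

```python
import torch
import torch.nn as nn

class LSTMVAEEncoder(nn.Module):
    """Summarize a window of process data into the mean/log-variance of a latent Gaussian,
    so temporal dependencies are captured before the variational bottleneck."""
    def __init__(self, n_vars=52, hidden=64, latent=8):
        super().__init__()
        self.lstm = nn.LSTM(n_vars, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)

    def forward(self, x):                          # x: (batch, timesteps, variables)
        _, (h, _) = self.lstm(x)                   # final hidden state as the sequence summary
        h = h[-1]
        return self.to_mu(h), self.to_logvar(h)

enc = LSTMVAEEncoder()
mu, logvar = enc(torch.randn(4, 20, 52))           # 4 windows, 20 time steps, 52 variables
```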

In such an RNN-based encoder-decoder structure, the encoder needs to gather all input sequences into one integrated feature. Unfortunately, there are certain concerns associated with the ability of modeling long input sequences of RNN-based methods (Cho et al., 2014). Especially in industrial processes, downstream data often have a strong dependence on long-term upstream data. To address this challenge, an attention-based encoder-decoder structure is adopted that allows the model to give different weights across all time steps of the sequence and automatically attend to the more important parts (Bahdanau et al., 2016). For example, Aliabadi et al. used an attention-based RNN model for multistep prediction of chemical process status which showed superior performance over conventional machine learning techniques (Aliabadi et al., 2020). Mu et al. introduced a temporal attention mechanism to augment LSTM and focus on local temporal information, resulting in a high-quality fault classification rate on the Tennessee Eastman process (Mu et al., 2021). Up till now, the attention mechanism is usually used in combination with RNNs or CNNs in most cases. However, recent work has proved that outstanding performance can be achieved on many tasks by using the attention mechanism only (Vaswani et al., 2017) and the application of a pure attention mechanism in industrial process monitoring has received scant attention in research literature.
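
The "different weights across all time steps" idea is easiest to see in a bare scaled dot-product self-attention block, the building block from Vaswani et al. (2017). The shapes and names below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    """x: (batch, timesteps, features) -> attended values and the attention weight matrix."""
    d_k = x.size(-1)
    q, k, v = x, x, x                               # plain self-attention: queries = keys = values
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, timesteps, timesteps)
    weights = F.softmax(scores, dim=-1)             # how strongly each step attends to every other step
    return weights @ v, weights

window = torch.randn(4, 20, 52)                     # 4 windows of 20 time steps, 52 process variables
out, attn = self_attention(window)
# attn[i, t] is a distribution over the 20 time steps that step t of window i attends to,
# so distant (long-term) steps can receive large weights without RNN-style forgetting.
```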

From the review above, it is observed that the methods adopted in the field of process monitoring have been evolving in recent years along with the enrichment of state-of-the-art deep learning models. In recent years, a variety of novel and powerful deep networks have imparted greater representation capability to deep learning, thus overcoming the drawbacks of traditional methods. These performance advances also provide a solid backbone for process safety improvements (Dogan and Birant, 2021). Moreover, process safety relies on a global and proactive awareness of the process, which may not be achieved by one single approach or model. As the optimization space for the performance of single-task-oriented methods is gradually decreasing, it is also expected to broaden the comprehensiveness of process monitoring methods through multiple techniques. Several recent studies have focused on integrating single-task-oriented methods to obtain hybrid methods for fault detection, identification, diagnosis, and their contribution to process safety (Ge, 2017, Xiao et al., 2021, Amin et al., 2021). For example, Amin et al. utilized a hybrid method based on PCA and a Bayesian network to detect and diagnose the faults at one time (Amin et al., 2018). Deng et al. used a serial PCA to perform fault detection and identification on nonlinear processes, which surpassed the performance of the KPCA method (Deng et al., 2018). The combination of attention mechanisms and deep models is also an effective hybrid method to perform multiple tasks in process monitoring. For example, Li proposed a nonlinear process monitoring method based on 1D convolution and self-attention mechanism to adaptively extract the features of both global and local inter-variable structures, which is validated on the Tennessee Eastman process for fault detection and fault identification (Li et al., 2021). The convergence of attention mechanisms and deep networks allows for better extraction of interrelationships between data, thus making the data-driven methods more rigorous and reliable for process safety. However, this exciting function of the attention mechanisms is still under-explored in the field of process monitoring.

Despite the satisfactory performance of deep learning methods in fault detection, the high accuracy comes at the expense of high abstraction. It is alarming that the complex network structure and the massive number of parameters may make the model an incomprehensible black box, thus hindering human understanding of how deep neural networks make judgments upon the occurrence of faults. Recently, how to improve the interpretability of AI has become a lively topic of discussion (Chakraborty et al., 2017, Zhang and Zhu, 2018). In the field of process monitoring and fault detection, we should not stop at assigning neural networks to give binary judgments about the existence of a fault but rather empower the model to help humans comprehend why certain judgments or predictions have been made (Kim et al., 2016). So far, many paths to this vision have been explored. For example, Bao et al. proposed a sparse dimensionality reduction method called SGLPP, which extracts sparse transformation vectors to reveal meaningful correlations between variables and further constructs variable contribution plots to produce interpretable fault diagnosis results (Bao et al., 2016). Wu and Zhao developed a PTCN method by incorporating process topology knowledge into graph convolutional neural networks, conducting a more rational and understandable feature extraction than other data-driven fault diagnosis models (Wu and Zhao, 2021). In summary, the predominant paths to improve the interpretability of a model include using intrinsically interpretable models, such as decision trees or Bayesian networks; providing summary statistics for each feature, such as variable contribution plots; and revealing the practical implications of the internal parameters of the model, such as the weight values of the attention mechanism (Ribeiro et al., 2016).
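
As an example of the "summary statistics for each feature" route, a contribution plot simply attributes the monitored statistic back to individual variables. A minimal sketch for SPE contributions under a PCA model like the one in the earlier snippet; this is a generic illustration, not taken from the cited works:

```python
import numpy as np

def spe_contributions(x, pca):
    """Per-variable contribution to the squared prediction error of one sample x,
    given a fitted sklearn PCA model such as the one in the earlier snippet."""
    recon = pca.inverse_transform(pca.transform(x.reshape(1, -1)))[0]
    contrib = (x - recon) ** 2                      # one value per process variable
    return contrib / contrib.sum()                  # normalize so the contributions sum to 1

# Plotting spe_contributions(x_faulty, pca) as a bar chart gives a contribution plot:
# the variables with the largest bars are the first candidates to inspect.
```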

In recent studies, the attention mechanism has cut a conspicuous figure in conferring interpretability to deep neural networks in many fields (Mott et al., 2019). For example, in the multivariate process monitoring problem, there are two most critical points to consider: the causal relationship between variables and the temporal dependency along the time sequence dimension. By examining the weights of the attention mechanism, we can understand which part of data the model is attending to, which is beneficial to explain the internal parameters of a deep model. Gangopadhyay et al. proposed a spatiotemporal attention module to enhance understanding of the contributions of different features to time series prediction outputs. The learned attention weights were also validated from a domain knowledge perspective (Gangopadhyay et al., 2020). Wang et al. designed a multi-attention 1D convolutional neural network, which can fully consider characteristics of rolling bearing faults to enhance fault-related features and ignore irrelevant features (Wang et al., 2020). It can be perceived that leveraging the attention mechanism between variables encourages better interpretability of the model and better evaluation of contributions of multiple variables. This could help isolate abnormal variables and perform fault identification tasks after a fault is detected, which in turn dramatically facilitates troubleshooting efficiency for field operators and improves system safety. However, to the best of our knowledge, this great superiority of the attention mechanism has not yet received sufficient attention in chemical processes.

Motivated by the above observations, in this paper we propose a novel attention-based model — orthogonal self-attentive variational autoencoder (OSAVA for brevity) for industrial process monitoring. The OSAVA model consists of two parts: orthogonal attention (OA) and variational self-attentive autoencoder (VSAE). The OA model includes two independent branches: a spatial self-attention layer and a temporal self-attention layer. The former is used to extract the causal relationships between multiple process variables, while the latter focuses on the temporal dependency along the time dimension. The VSAE component utilizes a self-attention mechanism to aggregate information across all time steps and reconstructs the output of OA. Combining these two procedures, it is possible to perform fault detection and identification tasks simultaneously, rendering an improvement on the interpretability for process monitoring as well. We compare the OSAVA-based fault detection and identification method with representative statistical methods and DNN methods using the famous Tennessee Eastman Process. The results show that the proposed method can outperform existing methods by large margins and also highlights abnormal variables by assigning large attention weights for better interpretability. For industrial processes, it can take a long time for operators to determine the location and cause of an alarm after it has occurred. Our proposed method can detect the presence of a fault at an early stage and quickly isolate the fault variables, which will be effective in practical applications to enhance process safety and contribute to the management of abnormal situations.
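
The paper itself does not come with code here, so purely as a reading aid I sketched what "two orthogonal attention branches" might look like: one softmax over the variable axis (spatial) and one over the time axis (temporal). This is my own guess at the structure under assumed shapes, not the authors' OSAVA implementation:

```python
import torch
import torch.nn.functional as F

def orthogonal_attention(x):
    """x: (batch, timesteps, variables).

    Illustrative guess at the two OA branches described in the paper: one attention
    map over the variable axis (spatial) and one over the time axis (temporal).
    NOT the authors' implementation, just a reading aid."""
    # Spatial branch: pairwise relations between variables, computed across the window.
    xv = x.transpose(1, 2)                                          # (batch, variables, timesteps)
    spatial_w = F.softmax(xv @ xv.transpose(-2, -1) / xv.size(-1) ** 0.5, dim=-1)

    # Temporal branch: pairwise dependencies between time steps, computed across variables.
    temporal_w = F.softmax(x @ x.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)

    # Re-weight the window with both maps; in an interpretable model the rows of
    # spatial_w / temporal_w with large weights point at suspect variables / time steps.
    return spatial_w @ xv, temporal_w @ x, spatial_w, temporal_w

spatial_out, temporal_out, spatial_w, temporal_w = orthogonal_attention(torch.randn(8, 20, 52))
```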

The rest of the paper is organized as follows. In Section 2, the fundamental theories of orthogonal attention and variational self-attentive autoencoder are introduced in detail. Section 3 shows case study experiments and concrete analysis. Section 4 summarizes this paper.


As you can see, the paper uses the attention mechanism that has become so popular recently; see the earlier write-up on the Transformer (Attention Is All You Need).
