论文阅读笔记之——《Toward Convolutional Blind Denoising of Real Photographs》及基于pytorch的CBDNet的复现

2023-05-16

本文是CBDNet（convolutional blind denoising network）的阅读笔记。本博文分为两块，一块是阅读笔记，一块是本人对CBDNet的实验记录

论文链接：https://arxiv.org/pdf/1807.04686.pdf

论文的代码：https://github.com/GuoShi28/CBDNet

一些去噪用的数据集https://blog.csdn.net/zbwgycm/article/details/82144242

理论部分

Background

作者开篇写道“Despite their success in Gaussian denoising, deep convolutional neural networks (CNNs) are still very limited on real noisy photographs, and may even perform worse than the representative traditional methods such as BM3D and K-SVD.”尽管深度卷积网络在高斯去噪取得了重大的成就，但是对于真实的噪声，它的Generalization ability却是非常的差，甚至不如一些传统的方法。大多数基于卷积网络的去噪算法都是基于additive white Gaussian noise (AWGN)，仅仅是去除高斯噪声.但是现实中，camera的噪声会来自于暗电流噪声、热噪声、粒散噪声等等（可以看博文《IPS流程（camera成像原理的介绍）》）因此真实噪声跟人为建模的高斯噪声是差很远的，而如果仅仅让网络去学习这个高斯加噪的图片，网络的泛化能力就会比较差。

在盲去噪中都存在着noise estimation and non-blind denoising.To the best of our knowledge, little work has been given to develop CNN-based model for the blind denoising of real photographs.

Due to that the real noise distribution is much different from Gaussian, DnCNN trained based on AWGN generally does not work well on noise removal for most real images.

通过incorporating network architecture, noise modeling, and asymmetric learning（非对称学习）提出CBDNet网络.CBDNet
is comprised of a noise estimation subnetwork and a denoising subnetwork, and is trained using a more realistic noise model by considering both signal-dependent noise and in-camera processing pipeline（信号相关噪声与ISP流程的噪声）.

CBDNet包含了两个子网络：噪声估计子网络和非盲去噪子网络

the asymmetric learning is presented on the noise estimation subnetwork to suppress more on under-estimation of noise level.（这是受启发于BM3D）

To make the learned model applicable to real photographs, both synthetic images （合成的图像）based on realistic noise model and real noisy photographs with nearly noise-free images are incorporated to train our CBDNet.

incorporate both synthetic images based on signal dependent noise model and real noisy photographs with nearly noise-free images during the training of our CBDNet， which can improve the denoising performance and generalization ability of the learned model.

本文的亮点：

提出了一个更加真实的噪声模型，其考虑了信号依赖噪声和ISP流程（可参考博文《IPS流程（camera成像原理的介绍）》）对噪声的影响，展示了图像噪声模型在真实噪声图像中起着关键作用。从某种程度上而言，本人觉得，这可以理解为作者用了一个更加接近真实场景的退化模型
提出了CBDNet模型，其包括了一个噪声估计子网络（noise level map）和一个非盲去噪子网络，可以实现图像的盲去噪（即未知噪声水平）——通过噪声估计网络和非盲去噪网络实现图像的盲去噪的过程。
提出了非对称学习（asymmetric learning）的损失函数，并允许用户交互式调整去噪结果（interactively rectify the denoising result by tuning the estimated noise level map.通过调整估计的噪声水平图来交互地纠正去噪结果。），增强了去噪结果的鲁棒性。
将合成噪声图像与真实噪声图像一起用于网络的训练，提升网络的去噪效果和泛化能力。

the introduction of noise estimation subnetwork can bring more benets to the denoising of real photographs. We note that non-blind denoisers (e.g., BM3D and FFDNet) are sensitive to under-estimation error of noise level but performs robust to over-estimation error.作者在这里的一个重要的发现是，对于噪声过估计，会有利于网络去噪。就是说你估计的noise level map太小，那可能会导致效果比较差，但是你估计的noise level map估计大了，却会使得performance更好，这就是一个asymmetric sensitivity（非对称性的灵敏度）even the real noise is much dierent from AWGN, non-blind denoisers can still achieve satisfying non-blind denoising result by simply increasing the noise level adopted in the algorithm.（就说每次都过高的去估计noise level map可以使得结果更加好？？？）

如下图所示。both non-blind BM3D and blind DnCNN fail to denoise the two real noisy photographs from DND. In contrast, our CBDNet achieves very pleasing denoising results and can retain most structure and details while well suppressing the sophisticated noise in images.

Real Image Noise Modeling（真实的噪声图片模型synthetic noisy images或称为Realistic Noise Model）

真实图片的噪声is much more sophisticated than AWGN

作者在论文里面提到“signi cantly depends on noise model, while more realistic noise model generally bene ts denoising performance.”

The noise model plays a critical role in guaranteeing the denoising performance of CNN-based denoiser on real photographs.

更真实的加噪

把camera的ISP考虑在内：

对于噪声RAW图像，采用公式1来训练CBDNet/模型。

对于噪声未压缩图像，采用公式2来训练CBDNet模型。

对于噪声压缩图像，采用公式3来训练CBDNet模型

Network Architecture

Noise Estimation Network

adopts a plain five-layer fully convolutional network without pooling and batch normalization operations. In each convolution layer, the number of feature channels is set as 32, and the filter size is 3x3. The ReLU nonlinearity is deployed after each convolution (Conv) layer.

an asymmetric loss——imposing more penalty on under-estimation error of noise level（对噪声水平的低估误差施加更多的惩罚。）

With the asymmetric learning, our CBDNet can produce better results on real photographs in the blind denoising setting.

非对称学习（Asymmetric Learning）

作者观察到非盲去噪方法（如BM3D、FFDNet等）对噪声估计的误差具有非对称敏感性（the asymmetric sensitivity of non-blind denoisers）。如下图所示，分别用BM3D和FFDNet使用不同的输入噪声标准差去噪（标准差依次设为5、10、15、25、35、50），其中绿色框代表输入噪声的标准差与真实噪声标准差一致。可以观察到，当输入噪声的标准差与真实噪声的标准差一致时，去噪效果最好。当输入噪声标准差低于真实值时，去噪结果包含可察觉的噪声；而当输入噪声标准差高于真实值时，去噪结果仍能保持较好的结果，虽然也平滑了部分低对比度的纹理。因此，非盲去噪方法对低估误差比较敏感，而对高估的误差比较鲁棒。正是因为这个特性，BM3D可以通过设置相对较高的输入噪声标准差得到满意的真实图像去噪效果。

the asymmetric sensitivity of non-blind denoisers。为了将这种 asymmetric sensitivity 应用于盲超分。（下图中，第一个loss对应噪声估计网络，第二个loss是为了约束平滑性，第三个loss是去噪的loss）

The non-blind denosing subnetwork

关于网络的总结见下图

关于论文数据（Incorporating Synthetic and Real Noisy Images in Training）

Despite the progress in noise modeling, there remains a gap between the real noise and synthetic noise. It is noted that deep models tend to over-fit the training data, which may make the CBDNet learned on solely synthetic images have limited generalization ability to real photographs. Fortunately，approaches have been developed to obtain nearly noise-free image of the real noisy photographs, and several small scale datasets RENOIR（RENOIR也是人工合成的把？所以这也看来，应该是没有作者自己做的真实的数据集，或者说，所谓的作者做的数据是synthetic images）

For the synthetic images, the ground-truth clean image and noise level map are available。

For the real noisy images, the noise is real, but only the nearly noise-free image is available and the noise level map is unavailable.

对于合成噪声图像，作为ground-truth的干净图像和噪声水平图是可用的，但噪声模型可能与真实噪声不太相符；而对于真实噪声图像，噪声是真实的，但仅仅可以获得接近无噪声的图像作为ground truth，而噪声水平图是未知的。另外，一般真实噪声图像的ground truth比较难以获取，而合成噪声图像可以比较方便的大规模合成。因此，在训练CBDNet的过程中，结合这两种类型的图像，提高网络的泛化能力。

对于合成噪声图像。文章使用上述的噪声模型合成噪声图像。其使用了BSD500的400张图像，Waterloo的1600张图像，MIT-Adobe FiveK数据库的1600张图像作为训练数据。
对于真实图像。使用RENOIR数据库的120张图像（http://ani.stat.fsu.edu/~abarbu/Renoir.html）。

为了提高网络的泛化能力，本文交替使用一批合成图像和一批真实图像进行训练。

目前去噪数据集的建立主要分为以下三种方式：
1. 从现有图像数据库获取高质量图像，然后做图像处理（如线性变化、亮度调整）并根据噪声模型添加人工合成噪声，生成噪声图像；

这种方法比较简单省时，高质量图像可以直接从网上获取，但由于噪声是人工合成的，其与真实噪声图像有一定差异，使得在该数据集上训练的网络在真实噪声图像上的去噪效果受限；
2. 针对同一场景，拍摄低ISO图像作为ground truth，高ISO图像作为噪声图像，并调整曝光时间等相机参数使得两张图像亮度一致；

这种方法只使用单张低ISO图像作为ground truth，难免会残留噪声，且与噪声图像可能存在亮度差异和不对齐的问题；
3. 对同一场景连续拍摄多张图像，然后做图像处理（如图像配准、异常图像剔除等），然后加权平均合成ground truth；

这种方法需要拍摄大量图像，工作量比较大，且需要对图像进行严格对准，但一般得到的ground truth质量比较高；

为了提高去噪网络的鲁棒性和泛化能力，常常需要将输入噪声图像的噪声水平估计也作为网络输入。而真实噪声图像的噪声水平估计往往存在一定误差，从这一方面考虑，合成噪声图像由于噪声模型已知，所以其噪声水平估计是准确的，有利于网络的在不同噪声水平上的泛化。CBDNet就考虑将真实噪声图像和合成噪声图像一起作为训练集，交替对网络进行训练以提升网络的性能。

论文的实验的测试集基于three real noisy image datasets来实现的DND、NC12、Nam（可是这三个数据集不是人工生成的数据么。。。算真实的数据？所以个人感觉，这里所说的真实图片，其实也就是更加接近真实情景的图片）所谓的“adopt a signal-dependent image noise, and also take in-camera image processing pipeline into account to generate synthetic training images.”是不是就是用了这里的数据集呢？还是说作者自己处理的，并没有公布出来呢？但作者又说our CBDNet is trained by synthetic and real noisy images，所以这至此还是一个疑问。。。。。。。

在评价指标（quantitative metrics）上采用了PSNR和SSIM

#############################################################################################################

代码实操部分

打开才发现代码是matlab版本的，接下来把它重塑为pytorch版本。。。。。。

参考

https://blog.csdn.net/zbwgycm/article/details/82052003

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)