PR-RL: Portrait Relighting via Deep Reinforcement Learning

2023-11-19

Title: PR-RL: Portrait Relighting via Deep Reinforcement Learning

Citation: Zhang, Xiaoyan, et al. "PR-RL: Portrait Relighting via Deep Reinforcement Learning." IEEE Transactions on Multimedia (2021).

Reading time: about 10 minutes


1 Article

1.1 Abstract and Introduction

The paper proposes a portrait relighting method (PR-RL) based on deep reinforcement learning using Deep Deterministic Policy Gradient (DDPG).

The PR-RL model formulates portrait relighting as sequentially predicting local light-editing strokes; each stroke dodges or burns (lightens or darkens) regions of the image, mimicking how a painter draws strokes.

Action: a stroke in a continuous space.

Actions are guided by the reward, which leads the agent to learn to relight a portrait image.

To optimize PR-RL, the paper makes the reward location-dependent and uses a coarse-to-fine strategy to select the corresponding actions.

PR-RL is more locally effective, scale-invariant and interpretable than existing methods.

The paper applies the proposed method to tasks of portrait relighting based on both SH-lighting and reference images.


Physically-based methods:
1.Solve an inverse rendering problem
2.Estimate face geometry, reflectance and light
3.Relight the face according to a desired target

Problems of physically-based methods:
1.It is hard to obtain a perfect inverse rendering parametric model from a single RGB image
2.Inaccurate estimation of face intrinsic components can cause strong artifacts in the relighting results
(artifact: an unnatural, recognizable trace of artificial processing in a synthesized image)

Difficulties: human faces have complex geometry because of varied poses and expressions.

Positives: human faces have regular geometry; everyone has a similar and symmetrical shape. -----> deep learning models can learn a relighting strategy from pre-rendered or captured images.

Deep learning methods use CNNs with an encoder-decoder framework as relighting generators.
1.Positive: the result can be generated in only one step, so the encoder-decoder framework is simple and efficient.
2.Negatives:
  1.The results have local errors and artifacts.
  2.The portrait relighting strategy is not interpretable.
  3.CNN-based generators can only produce images at limited resolutions.

PR-RL decomposes the process of relighting: the agent edits the local light sequentially.

Action: draw strokes on the image.

RL: predict an action and provide a reward for every new action, then use the actions and rewards to observe, gain experience, and select the next action to achieve the desired effect.

Strokes are parameterized to control their position, shape and lightness. Every action defines the parameters of a stroke; these parameters are scale-invariant and can be applied to images at any resolution.

The RL agent takes a source image and either a Spherical Harmonics (SH) lighting vector or a reference image as inputs.

The agent is modeled with Deep Deterministic Policy Gradient (DDPG), which defines actions in a continuous space, because the light-editing strokes require a continuous, high-dimensional action space.

An action controls the position, shape and lightness of a stroke by defining its parameters.

The paper adopts dodge and burn to manipulate the exposure of a selected area of the image, which makes the sequential local light editing interpretable.

Dodge increases the exposure of areas that should be brighter.

Burn decreases the exposure of areas that should be darker.

Dodge and burn bring the edited image closer to an image captured under the desired real lighting condition.

The reward is used to assess the selected stroke position and guide the agent to select strokes from coarse to fine throughout the editing process.


Contribution:

1.A portrait relighting model that realizes a sequential local light-editing process by selecting strokes and applying dodge and burn to the image with a coarse-to-fine strategy.

2.Rewards guide the agent to learn and relight: a location-related reward is generated and coarse-to-fine action selection is performed.

3.The method is locally effective, scale-invariant and interpretable; it can efficiently relight high-resolution portrait images through interpretable steps.

4.The method supports relighting based on both SH-lighting and reference images.

1.2 Conclusion

1.The paper proposes a locally effective, scale-invariant and interpretable portrait relighting method by modelling portrait relighting as a sequential local light-editing process.

2.The agent relights by generating coarse-to-fine actions, like an artist.

3.The method can be applied to different applications.

4.The PR-RL method outperforms the existing state-of-the-art methods.

1.3 Related research

1.3.1 Portrait Relighting
Portrait relighting approaches: physically-based methods (physical simulation), end-to-end data-driven deep learning networks, and reference-image light style transfer.

The principle of physically-based relighting:
1.Obtain the geometry and intrinsic components
2.Render a new picture with a novel lighting direction

These methods solve an inverse rendering problem to estimate geometry and intrinsic components from a single image (many of them use deep learning algorithms for this estimation).


CNN-based decomposition: relighting must account for light occlusion. Kanamori et al. use CNNs to decompose the portrait into an albedo map, an illumination pattern and a light transport map.

Problem: these methods do not consider specular reflection and shadows, so they may fail under extreme lighting conditions.

Wang et al. propose a framework that models multiple reflectance channels, including facial albedo, geometry, specular reflection and shadows.

Given the geometry and intrinsic components, the image can be rendered under a new light using a physical rendering model.


End-to-end data driven

Some methods use end-to-end data-driven networks to directly map a portrait image to a relighted image. These methods are trained on images captured under different lighting conditions.

Nestmeyer et al. and Sun et al. use a Light Stage to capture portraits under different lighting directions by rotating the light.

Zhou et al. use an offline physically-based relighting method to generate high quality relighting images.

Nestmeyer et al. train two U-Nets to generate diffuse and non-diffuse relighting results.

Han et al. use a generative adversarial network (GAN) to generate relighting results and encode the lighting conditions in their dataset as one-hot lighting labels.

End-to-end networks extract features from the whole image and reconstruct it, which can introduce local errors and artifacts.


Reference-image light style transfer

Chen et al. decompose the lightness layers of the reference and source images into large-scale and detail layers, then replace the large-scale layer of the source image with that of the reference image to obtain the relighting result.

Shih et al. decompose the source and reference images into multiscale Laplacian stacks and modify the local energy of the source subbands to match the local energy of the reference subbands.

Both methods require the source and reference images to be aligned and angularly similar. They can transfer the lighting style and a rough distribution of lighting directions.

However, these methods fail to transfer the shadows caused by face geometry.

Shu et al. solved this problem by adding a normal map and position constraint in their transfer algorithm.

Zhu et al. construct an optimal transport plan between histograms of feature vectors (containing appearance features, positions and normals) to design a relighting generator.

1.3.2 Reinforcement Learning:
Reinforcement learning has been applied to game AI, recommendation systems, and image processing tasks such as image captioning and image enhancement.

1.4 System (Portrait Relighting Method)

Portrait relighting is modeled as a Markov Decision Process (MDP) that edits the light locally and sequentially.

The agent is used for portrait relighting based on an SH-lighting vector or a reference image.

Input: a source portrait image $I$ and a desired light $L$.

Output: the relighted portrait image.

At each step $t \in [1, T]$, the agent makes a decision according to its policy $\pi(I_{t-1}, L)$ to select an action $a_t$ ($I_{t-1}$ is the relighted portrait at step $t-1$).

According to the parameters in the selected action $a_t$, an image-editing stroke is generated.

The portrait image is edited and updated from $I_{t-1}$ to $I_t = I_{t-1} \circ a_t$ ($\circ$ is the light rendering operation based on the action).

The final relighting result $I_{out}$ is generated by accumulating $T$ steps (each step operates on the result of the previous step).

The relighting operation:

$a_t = \pi(I_{t-1}, L)$

$I_t = I_{t-1} \circ a_t$

$I_{out} = I \circ a_1 \circ a_2 \circ \dots \circ a_T$
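A minimal sketch of this sequential editing loop, where `policy` and `apply_stroke` are hypothetical stand-ins for the actor network $\pi$ and the light rendering operation $\circ$:

```python
# Sketch of the sequential relighting process: I_out = I ∘ a_1 ∘ ... ∘ a_T.
# `policy` and `apply_stroke` are hypothetical stand-ins for the actor
# network pi(I_{t-1}, L) and the light rendering operation.
def relight(image, target_light, policy, apply_stroke, num_steps):
    relit = image
    for _ in range(num_steps):
        action = policy(relit, target_light)   # a_t = pi(I_{t-1}, L)
        relit = apply_stroke(relit, action)    # I_t = I_{t-1} ∘ a_t
    return relit                               # I_out after T steps
```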

The state consists of: the relighted image $I_{t-1}$, the target lighting condition $L$, and the current step $t$.

The actor network predicts actions based on the state.

The action is a vector containing the parameters of the stroke; it is transformed into a soft stroke mask by a pre-trained soft stroke renderer network.

The soft stroke mask defines the location of the stroke and the level of adjustment, with values in the range (0, 1).

Constrained by the stroke mask, the image is edited using dodge and burn.

The next state $s_{t+1}$ is obtained by updating the image.

The reward is computed to help the critic network to learn the Q value for the chosen action.

(The reward considers the variation between the images in the current and next states and the coarse-to-fine stroke sequence.)

The actor network updates its prediction of the selected action based on the output of the critic network.

1.4.1 Agent Model

The agent is designed with the DDPG algorithm; it has two neural networks, an actor and a critic, combining value-based and policy-based methods.

The actor network is policy-based: given the input state, it outputs the action.

At step $t$, the policy of the actor is $a_t = \pi(I_{t-1}, L)$.

Agent (DDPG), combining value-based and policy-based methods: the actor network (policy-based) maps the input state to an action $a_t$; the critic network (value-based) uses the states and the reward feedback to estimate the Q value.

The critic uses $Q(s, a) = R(s, a) + \gamma \max Q(s', a')$

$\gamma$ is a discount factor.

$s$ and $a$ are the state and action at the current step; $s'$ and $a'$ are the state and action at the next step.

$R(s, a)$ is the reward at the current step.

By considering the current reward and the maximum next Q value from the memory, the Q value is updated until all Q values converge.

The action $a'$ in $Q(s', a')$ is chosen by the actor network.

The loss function of the critic is the squared error between the target action value and the predicted action value.

$L_{critic} = (R(s, a) + \gamma Q(s', a') - Q(s, a))^2$

The actor is updated to predict actions with high Q values as estimated by the critic.

The loss function of the actor is: $L_{actor} = -Q(s, a)$
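A minimal PyTorch-style sketch of these two losses, assuming `actor` and `critic` modules and a sampled transition `(s, a, r, s_next)`; target networks and the replay buffer are omitted for brevity:

```python
import torch

def ddpg_losses(actor, critic, s, a, r, s_next, gamma=0.95**5):
    # Critic target: R(s, a) + gamma * Q(s', a'), where a' is chosen by the actor.
    with torch.no_grad():
        a_next = actor(s_next)
        q_target = r + gamma * critic(s_next, a_next)
    q_pred = critic(s, a)
    critic_loss = ((q_target - q_pred) ** 2).mean()   # L_critic

    # Actor loss: maximize Q(s, actor(s)), i.e. minimize its negative.
    actor_loss = -critic(s, actor(s)).mean()          # L_actor
    return critic_loss, actor_loss
```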

At step $t$, the environment state $s_t$ is a combination of the relighted image $I_{t-1}$, the input light $L$, and the step index: $s_t = (I_{t-1}, L, t)$. Including the step $t$ makes the agent aware of where it is in the whole decision process.

1.4.2 Action

The action specifies how to adjust the light and where the editing is needed.

The action $a$ is a set of parameters that control the position, shape, and lightness of a stroke.

For local light editing on human faces, the paper uses smooth curves to represent the shape of strokes.

Simple curves with a small number of control points are used to fit the local geometry of a human face (the agent can learn simple curves more effectively than complex ones).


A quadratic Bézier curve is used, controlled by three points with coordinates $(x_0, y_0), (x_1, y_1), (x_2, y_2)$.

The stroke thickness is controlled by two parameters $t_0$ and $t_1$: $t_0$ is the thickness at the start point and $t_1$ is the thickness at the end point. The lightness of the stroke is controlled by $e_0$ at the start point and $e_1$ at the end point ($e_0$ and $e_1$ range from 0 to 1).

The action space is: $action = \{x_0, y_0, x_1, y_1, x_2, y_2, t_0, t_1, e_0, e_1\}$
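A small sketch of how such a 10-dimensional continuous action could be unpacked into named stroke parameters (the field order and normalized ranges here are assumptions for illustration):

```python
# Unpack a 10-dimensional action in [0, 1]^10 into named stroke parameters.
# The field order is assumed for illustration only.
def decode_action(action):
    x0, y0, x1, y1, x2, y2, t0, t1, e0, e1 = action
    return {
        "control_points": [(x0, y0), (x1, y1), (x2, y2)],  # quadratic Bezier curve
        "thickness": (t0, t1),   # thickness at the start / end of the stroke
        "lightness": (e0, e1),   # lightness at the start / end, in (0, 1)
    }
```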

1.4.3 Image Editing

To edit the image based on the chosen action, a stroke renderer first generates a stroke mask; the image is then edited using dodge and burn operations; after the image is edited, the state is updated.
  1. Stroke renderer:

    The stroke renderer is a neural network that converts the parameters of an action into a stroke on the canvas.

    Hard strokes can make noticeable borders and unnatural transitions.

    Soft strokes can make the editing natural.

    The paper trains a renderer to generate soft strokes.

    The soft transition of the stroke from center to boundary is generated by gradually reducing the lightness from the center to the boundary, as shown in Figure 3.


To train the stroke renderer, a dataset containing both the soft strokes and their corresponding parameters is generated.

As shown in Figure 4, given a set of stroke parameters $\{x_0, y_0, x_1, y_1, x_2, y_2, t_0, t_1, e_0, e_1\}$,

the Bézier curve is first generated from the control points $(x_0, y_0), (x_1, y_1), (x_2, y_2)$.

Then a number of points are sampled on this curve, and circles are drawn with each sampled point as the center. $t_0$ and $t_1$ are the circle radii at the head and tail of the curve.

For the other circles, the radii are generated by interpolating between $t_0$ and $t_1$.

Each circle's lightness decreases from the center to the edge.

The lightness at the center is $e$ and the lightness at the edge is 0.

The lightness $e$ in the middle part of the curve is generated by interpolating from $e_0$ to $e_1$.

When enough points are sampled on the curve, the stroke becomes smooth.
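A rough numpy sketch of this data-generation procedure (the canvas size, sample count, and radius/lightness scaling are assumptions, not the paper's exact settings):

```python
import numpy as np

def render_soft_stroke(params, size=256, samples=100):
    # Rasterize a soft stroke from {x0,y0,x1,y1,x2,y2,t0,t1,e0,e1}, all in [0, 1].
    x0, y0, x1, y1, x2, y2, t0, t1, e0, e1 = params
    canvas = np.zeros((size, size), dtype=np.float32)
    ys, xs = np.mgrid[0:size, 0:size]
    for u in np.linspace(0.0, 1.0, samples):
        # Point on the quadratic Bezier curve defined by the three control points.
        px = ((1 - u) ** 2 * x0 + 2 * u * (1 - u) * x1 + u ** 2 * x2) * size
        py = ((1 - u) ** 2 * y0 + 2 * u * (1 - u) * y1 + u ** 2 * y2) * size
        radius = ((1 - u) * t0 + u * t1) * size / 2   # interpolated thickness
        light = (1 - u) * e0 + u * e1                 # interpolated lightness
        dist = np.sqrt((xs - px) ** 2 + (ys - py) ** 2)
        # Lightness falls off from `light` at the circle center to 0 at its edge.
        circle = np.clip(1.0 - dist / max(radius, 1e-6), 0.0, 1.0) * light
        canvas = np.maximum(canvas, circle)
    return canvas
```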


  2. Dodge and burn

    To make the sequential local light editing interpretable, the paper adopts dodge and burn, which edit the lightness distribution curve of the image.

    Different dodge and burn curves influence pixels at different levels.

    Figure 5 shows the dodge curve on the left and the burn curve on the right.

    The dodge curve increases the lightness values, and the burn curve decreases the lightness values.

    With dodge and burn, pixels can be divided into three classes: highlight, mid-tone and shadow.

    The paper adopts the following two curves

    $I_{dodge} = (1 + S/3) \cdot M \cdot I + (1 - M) \cdot I$

    $I_{burn} = (1 - S/3) \cdot M \cdot I + (1 - M) \cdot I$

    $S$: the stroke mask, as shown in (b) Soft stroke.


$M$ is a binary mask that selects the area to be edited; it is obtained by binarizing the stroke mask $S$ (nonzero values are quantized to 1).
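A short numpy sketch of these two operations, following the formulas above (treating $M$ as the binarized nonzero region of $S$ is an assumption):

```python
import numpy as np

def dodge(image, stroke):
    # I_dodge = (1 + S/3) * M * I + (1 - M) * I
    mask = (stroke > 0).astype(image.dtype)   # assumed binarization of the soft stroke
    return (1 + stroke / 3) * mask * image + (1 - mask) * image

def burn(image, stroke):
    # I_burn = (1 - S/3) * M * I + (1 - M) * I
    mask = (stroke > 0).astype(image.dtype)
    return (1 - stroke / 3) * mask * image + (1 - mask) * image
```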

At the training stage, an action bundle is used to predict $k$ actions at each step. This reduces the computation cost and allows the agent to learn the association between distant states and actions.

In the experiments, the agent is set to predict 6 actions at each step, generating 3 dodge strokes and 3 burn strokes.

1.4.4 Reward

The reward evaluates the predicted action of the agent (the reward is the quality feedback that helps the agent find the optimal action).

(The goal of the relighting task is to make the edited image at the next state $s_{t+1}$ closer to the target image than the result at state $s_t$. If the result at $s_{t+1}$ is closer to the target than the result at $s_t$, the agent gets a positive reward; otherwise it gets a negative reward.)

$r(s_t, a_t) = D(I_{t-1}, g) - D(I_t, g)$

$s_t$: the state at step $t$.

$a_t$: the chosen action at step $t$.

$g$: the target image.

$D(\cdot)$: a metric that measures the distance between the current relighting result and the target image $g$.


The paper adopts a PatchGAN reward, a content L2 reward, a shading reward, and a stroke reward.

**1)** PatchGAN reward: PatchGAN is a type of discriminator that measures the distribution distance between the generated data and the target data.

PatchGAN maps an $n\times n$ image to $n^2$ overlapping patches and averages the responses of all patches to produce the final output (the output ranges from 0 to 1, and higher is better).

$r_{gan}(s_t, a_t) = -(G_d(I_{t-1}, g) - G_d(I_t, g))$


**2)** Content L2 reward: makes the results more precise in shadow and highlight areas, and makes the transition between the shadow and highlight areas more natural inside the image.

The L2 distance between the relighting result and the target image at the pixel level is used as the metric in the reward: $r_{l2}(s_t, a_t) = ||I_{t-1} - g||_2 - ||I_t - g||_2$


**3)** Shading reward: correct shadows and highlights on a face make the relighting results look realistic and natural.

The shading reward guides the agent to learn quickly and precisely.

The shading variation $S$ is extracted from the relighted image $I_t$ and the original image $I$:


$S(I_t, I) = \frac{I_t}{I + \sigma}$

$\sigma = 10^{-6}$ (to avoid a zero denominator).

The distance $D$ in the shading reward is the L2 distance between the shading variation of the relighting result and the shading variation of the target image.

The shading reward is $r_s = ||S(I_{t-1}, I) - S(g, I)||_2 - ||S(I_t, I) - S(g, I)||_2$


**4)** Stroke reward:

A stroke reward is applied to the entire stroke sequence to guide the agent in selecting strokes in a coarse-to-fine manner.

Stroke size is represented by stroke length, stroke thickness, and stroke weight.

The stroke length $S_l$ is calculated as the $L_2$ distance between the three control points.

Stroke thickness: $S_t = t_0 + t_1$

Stroke weight: $S_w = e_0 + e_1$

Stroke reward: $r_{stroke} = -[T(S_l) + T(S_t) + T(S_w)] \cdot t \cdot \alpha$

$T(S_i) = \begin{cases} 0, & S_i < threshold \\ S_i, & S_i \ge threshold \end{cases}$

$i \in \{l, t, w\}$

$\alpha$ is a scale factor that controls the influence of the step $t$ on the stroke selection.

**5)** Final reward: $r = r_{gan} + r_{l2} + r_s + r_{stroke}$
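A compact numpy sketch of the distance-improvement form of these rewards ($r = D(I_{t-1}, g) - D(I_t, g)$) for the L2 and shading terms; the PatchGAN and stroke terms are omitted here because they depend on the trained discriminator and the stroke parameters:

```python
import numpy as np

def l2_distance(a, b):
    return np.linalg.norm(a - b)

def l2_reward(prev_img, cur_img, target):
    # r_l2 = ||I_{t-1} - g||_2 - ||I_t - g||_2: positive if the edit moved closer to g.
    return l2_distance(prev_img, target) - l2_distance(cur_img, target)

def shading_reward(prev_img, cur_img, source, target, sigma=1e-6):
    # Shading variation S(I_t, I) = I_t / (I + sigma); the reward compares shading distances.
    shade = lambda img: img / (source + sigma)
    return (l2_distance(shade(prev_img), shade(target))
            - l2_distance(shade(cur_img), shade(target)))
```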

1.4.5 Application on Light Transfer

Light transfer allows users to transfer the lighting condition from a reference image to an input photograph.

In this case, the agent takes a source portrait and a reference portrait image as the inputs.

The source, reference and target images are needed to train the agent.

Due to the lack of target relighting images, the paper only calculates the PatchGAN and stroke rewards.


$I_s$: the source image.

$I_r$: the reference image.

$I_r^0$: the original image.

$W$: a 2D face warp function based on landmarks.

$\sigma = 10^{-6}$: a constant to avoid a zero denominator.

The illumination is extracted as $\frac{I_r^0}{I_r + \sigma}$.

$I_t = W\left(\frac{I_r^0}{I_r + \sigma}\right) \cdot I_s$

1.Extract the light from $I_r^0$ and $I_r$.

2.Warp $\frac{I_r^0}{I_r + \sigma}$ to match the facial shape of the source image based on the face landmarks.

3.Multiply the warped light with $I_s$ to generate a coarse relighting result $I_t$.
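A rough numpy sketch of this coarse light transfer, where `warp_to_source` is a hypothetical stand-in for the landmark-based 2D face warp $W$:

```python
import numpy as np

def coarse_light_transfer(source, reference, reference_original,
                          warp_to_source, sigma=1e-6):
    # I_t = W(I_r^0 / (I_r + sigma)) * I_s, applied element-wise.
    illumination = reference_original / (reference + sigma)  # extract the light ratio
    warped = warp_to_source(illumination)                    # align to the source face
    return warped * source                                   # coarse relighting result
```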

1.4.6 Network Architectures

The input source image is the L channel in Lab color space, converted from a $256 \times 256$ RGB image.

The SH-lighting is a $9 \times 1$ vector; each dimension is copied and expanded into a $256 \times 256$ plane to match the input image.

The actor and critic networks use ResNet-18 to extract features and a fully-connected layer to predict the actions and Q values.

The stroke renderer network uses 5 fully-connected layers and 8 $3 \times 3$ convolution layers to map the stroke parameters into a $256 \times 256$ stroke mask.
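A PyTorch sketch of how the state could be assembled and fed to a ResNet-18 actor backbone, assuming the L channel plus nine expanded SH planes form a 10-channel input and the head outputs one 10-parameter stroke per action in a 6-action bundle (the 60-dimensional head and the sigmoid are assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_state(l_channel, sh_vector):
    # l_channel: (1, 256, 256) L channel; sh_vector: (9,) SH-lighting vector.
    # Each SH dimension is expanded to a 256x256 plane and stacked with the L channel.
    sh_planes = sh_vector.view(9, 1, 1).expand(9, 256, 256)
    return torch.cat([l_channel, sh_planes], dim=0)          # (10, 256, 256)

class Actor(nn.Module):
    def __init__(self, in_channels=10, action_dim=60):       # 6 strokes x 10 params (assumed)
        super().__init__()
        backbone = resnet18()
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                   stride=2, padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, action_dim)
        self.net = backbone

    def forward(self, state):                                 # state: (B, 10, 256, 256)
        return torch.sigmoid(self.net(state))                 # stroke parameters in (0, 1)
```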

1.5 Experiment and Result

Experiment environment:

PyTorch, NVIDIA Tesla P100.

A dataset constructed from CelebAMask-HQ is used for training.

The Adam optimizer is used to train the networks.

The initial learning rate of the actor is $10^{-3}$.

The initial learning rate of the critic is $3 \times 10^{-4}$.

The discount factor $\gamma$ is $0.95^5$.

The capacity of the experience replay buffer is 800.

The model is trained for 31,250 iterations with batch size 32.

The actor network adopts batch normalization.

The critic network adopts weight normalization.

Each image is trained 10 times; the image resolution is $256 \times 256$.

1.5.1 Dataset Preparation

Multi-PIE and Extended-YaleB provide relighted portraits.

To obtain high-quality relighted portraits, high-quality relighting images are synthesized in the wild to train the model.

The dataset is synthesized based on a single directional light. Because a directional light source is a very general representation, the method trained on this dataset can also fit complex environment maps.

Assuming human faces have Lambertian reflectance, the rendering of a relighted image $I$ can be simplified as $I = R \circ S(N, L)$

$R$: the reflectance.

$S$: the shading computed from the normal map $N$ and the light $L$.

Assumption: the reflectance $R$ of a subject is the same under different lighting conditions.

The ratio of shadings is used to render a new image $\hat{I}$ from the normal $N$ and a new light $\hat{L}$:

$\hat{I} = R \circ S(N, \hat{L}) = \frac{R \circ S(N, \hat{L})}{R \circ S(N, L)} \cdot (R \circ S(N, L)) = \frac{S(N, \hat{L})}{S(N, L)} \cdot I$
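A numpy sketch of this ratio-image synthesis under a single directional light, assuming Lambertian shading $S(N, L) = \max(N \cdot L, 0)$ (the clamping and epsilon are added here to keep the ratio well-behaved):

```python
import numpy as np

def lambertian_shading(normals, light_dir, eps=1e-6):
    # normals: (H, W, 3) unit normal map; light_dir: (3,) directional light.
    light_dir = light_dir / (np.linalg.norm(light_dir) + eps)
    return np.clip(normals @ light_dir, 0.0, None)            # S(N, L) = max(N.L, 0)

def relight_by_shading_ratio(image, normals, old_light, new_light, eps=1e-6):
    # I_hat = S(N, L_hat) / S(N, L) * I, applied per pixel and per channel.
    ratio = (lambertian_shading(normals, new_light)
             / (lambertian_shading(normals, old_light) + eps))
    return image * ratio[..., None]
```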

1.5.2 Performance Comparison

1.Metrics

Using five metrics to measure the relighting performance.

MSE, scale-invariant MSE (Si-MSE), DSSIM, LPIPS and PSNR.

2.Relighting based on SH-lighting

Four state-of-the-art methods are used for comparison: DPR, SfSNet, STHP, and MTP.


PR-RL modifies only the light and leaves the other information untouched, so the pixel and structural errors of its results are small.

3.Relighting based on reference images


DPR and SfSNet extract the lighting condition from the reference image $I_r$ as the light input.

STHP, MTP and PR-RL accept a source image $I$ and a reference image $I_r$ as inputs and transfer the light from $I_r$ to $I$.


PR-RL outperforms the DPR, SfSNet, STHP and MTP methods in terms of light transfer from reference images.

4)Comparative performance evaluation on FFHQ dataset:

FFHQ is a high-quality human face dataset containing $1024 \times 1024$ images with considerable variation in age, ethnicity and image background.


Table 3 reports relighting based on SH-lighting. PR-RL is better than DPR, SfSNet and MTP, but worse than STHP, because STHP is a light transfer method.


Table 4 reports relighting based on reference images. PR-RL is better than DPR, SfSNet, STHP and MTP.

1.5.3 Ablation Studies
The ablation studies examine the effect on PR-RL of network depth, parameter settings, rewards, and stroke curves.

1)network depth

The actor and critic networks use ResNet backbones. ResNet-18 performs better than ResNet-34 and ResNet-50.


2)parameter setting

Episode length: how many strokes should be used to edit the image.

More strokes bring more detailed changes, but can lead to overexposure or wrong stroke positions.

Fewer strokes make the result more controllable, but may fail to generate detailed lighting effects.

An episode length of $6 \times 5$ gives the best performance.

Dodge and burn strategy: four different strategies are set:

dodge-burn: dodge for the first three strokes and burn for the second three strokes.

burn-dodge: burn for the first three strokes and dodge for the second three.

cross: dodge and burn are applied alternately.

self-choice: dodge or burn chosen by the agent

The dodge-burn strategy is better than burn-dodge, cross, and self-choice.

3)rewards

w/o stroke reward: without the stroke reward, the agent has difficulty controlling the stroke size for detailed editing.

w/o shading reward: the shading reward helps the agent pay more attention to the facial areas that are sensitive to light changes.

w/o PatchGAN reward: the PatchGAN reward enables the result to capture more detailed shadow and highlight changes.

w/o L2 reward: training with the L2 reward gives the result a more natural transition from light to shadow.


4)Stroke curves

The cubic Bézier curve has 4 control points and can express more complex curve shapes.

The cubic B-spline can express various complex curve shapes.
