The PR-RL model formulates portrait relighting as sequentially predicting local light-editing strokes, using the strokes to dodge and burn (lighten and darken) the image, mimicking how a painter draws strokes.
Action: a stroke in a continuous space.
The reward guides the agent to learn to relight a portrait image.
To optimize PR-RL, the paper makes the reward location-dependent and uses a coarse-to-fine strategy to select the corresponding actions.
PR-RL is more locally effective, scale-invariant and interpretable than existing methods.
The paper applies the proposed method to tasks of portrait relighting based on both SH-lighting and reference images.
Physically-based methods:
Difficulties: the human face has complex geometry because of varying poses and expressions.
Positives: the human face has regular geometry; everyone has a similar and symmetrical shape. -----> we can use deep learning models to learn a relighting strategy from pre-rendered or captured images.
Figure notes:
Deep learning methods use CNNs with an encoder-decoder architecture as relighting generators. The relighting result is generated in a single step, so the encoder-decoder framework is simple and sufficient, but the results contain local errors and artifacts, the relighting strategy is not interpretable, and CNN-based generators can only produce images at limited resolutions.
Figure notes:
Action: draw strokes on the image
RL: predict actions and provide a reward for each action, then use the accumulated actions and rewards as experience to select the next action and achieve the desired effect.
Strokes are parameterized to control the stroke position, shape and lightness.
Each action defines the parameters of a stroke; these parameters are scale-invariant and can be applied to images of any resolution.
The RL agent takes a source image and a Spherical Harmonics (SH) lighting vector or a reference image as inputs.
The agent is modeled with Deep Deterministic Policy Gradient (DDPG), which defines actions in a continuous space, because light-editing strokes need a continuous, high-dimensional action space.
The action controls the position, shape and lightness of a stroke via its parameters.
The paper adopts dodge and burn to manipulate the exposure of a selected area of the image; this makes the sequential local light editing interpretable.
Dodge increases the exposure of areas that should be brighter in the image.
Burn decreases the exposure of areas that should be darker in the image.
Dodge and burn bring the edited image closer to an image captured under the desired real lighting condition.
The reward is used to assess the selected stroke position and guide the agent to select strokes from coarse to fine throughout the editing process.
Contributions:
1. Build a portrait relighting model that realizes a sequential local light-editing process by selecting strokes and applying dodge and burn to the image in a coarse-to-fine strategy.
2. Use a reward to guide the agent to learn and relight: generate a location-related reward and perform coarse-to-fine action selection.
3. The method is locally effective, scale-invariant and interpretable; it can efficiently relight portrait images at high resolutions with interpretable steps.
4. The method supports both SH-lighting and reference images.
1.2 Conclusion
1. Propose a locally effective, scale-invariant and interpretable portrait relighting method by modeling portrait relighting as a sequential local light-editing process.
2. The agent relights by generating coarse-to-fine actions, like an artist.
3. The method can be applied to different applications.
4. PR-RL outperforms existing state-of-the-art methods.
1.3 Related research
1.3.1 Portrait Relighting
Solve inverse rendering to estimate geometry and intrinsic components from a single image (many methods use deep learning algorithms to estimate the geometry and intrinsic components).
CNNs
When relighting, light occlusion must be handled.
Kanamori et al. use CNNs to decompose the image into intrinsic components: an albedo map, an illumination map, and a light transport map.
Problem: these methods do not consider speculars and shadows, so they may fail under extreme lighting conditions.
Wang et al. propose a framework that models multiple reflectance channels, including the facial albedo, geometry, speculars and shadows.
Given the geometry and intrinsic components, the image can be re-rendered to match a new light using a physical rendering model.
End-to-end data-driven methods
Some methods use end-to-end data-driven networks to directly map a portrait image to a new image. These methods are trained on images captured under different lighting conditions.
Nestmeyer et al. and Sun et al. use a Light Stage to capture portraits under different lighting directions by rotating the light.
Zhou et al. use an offline physically-based relighting method to generate high-quality relighting images.
Nestmeyer et al. train two U-Nets to generate diffuse and non-diffuse relighting results.
Han et al. use a generative adversarial network (GAN) to generate relighting results and encode the lighting conditions in their dataset as one-hot lighting labels.
End-to-end networks extract features from the whole image and reconstruct it; however, this process can cause local errors and artifacts.
Reference-image light style transfer
Chen et al. decompose the lightness layers of the reference and source images into large-scale and detail layers, then replace the large-scale layer of the source image with that of the reference image to obtain the relighting result.
Shih et al. decompose the source and reference images into multiscale Laplacian stacks and modify the local energy of the source subbands to match the local energy of the reference subbands.
Both methods require the source and reference images to be aligned and angularly similar. They can transfer the lighting style and a rough distribution of its lighting directions.
However, these methods fail to transfer the shadows caused by face geometry.
Shu et al. solve this problem by adding a normal map and a position constraint to their transfer algorithm.
Zhu et al. construct an optimal transport plan between histograms of feature vectors (containing features, positions and normals) to design a relighting generator.
1.3.2 Reinforcement Learning:
1.4 System (Portrait Relighting Method)
Portrait relighting is modeled as a Markov Decision Process that edits the light locally and sequentially.
The agent performs portrait relighting based on an SH-lighting vector or a reference image.
At each step $t\in [1,T]$, the agent makes a decision according to its policy $\pi(I_{t-1},L)$ to select an action $a_t$ ($I_{t-1}$ is the relighted portrait at step $t-1$).
According to the parameters in the selected action $a_t$, an image-editing stroke is generated.
The portrait image is edited and updated from $I_{t-1}$ to $I_t = I_{t-1}\circ a_t$ ($\circ$ is the light rendering operation based on the action).
The final relighting result $I_{out}$ is generated by accumulating $T$ steps (each step operates on the result of the previous step).
The relighting operations:
$a_t = \pi(I_{t-1},L)$
$I_t = I_{t-1}\circ a_t$
$I_{out} = I\circ a_1\circ a_2\circ \dots \circ a_T$
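As a minimal sketch, the whole process is just a loop; `policy` and `apply_action` below are hypothetical placeholders for the actor and the $\circ$ rendering operation:

```python
import numpy as np

def relight(I, L, policy, apply_action, T=10):
    """Sequential local light editing: I_out = I ∘ a_1 ∘ a_2 ∘ ... ∘ a_T.

    I:            source portrait, e.g. an (H, W) lightness array
    L:            target lighting condition (SH vector or reference features)
    policy:       placeholder for the actor, a_t = pi(I_{t-1}, L)
    apply_action: placeholder for the ∘ operation (stroke + dodge/burn)
    """
    I_t = np.array(I, dtype=float, copy=True)
    for t in range(1, T + 1):
        a_t = policy(I_t, L)          # a_t = pi(I_{t-1}, L)
        I_t = apply_action(I_t, a_t)  # I_t = I_{t-1} ∘ a_t
    return I_t                        # I_out accumulated over T steps
```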
The state: the relighted image $I_{t-1}$, the target lighting condition $L$, and the current step $t$.
The actor network predicts actions based on the state.
The action is a vector containing the parameters of the stroke; the action is transformed into a soft stroke mask by a pre-trained soft render network.
The soft stroke mask defines the location of the stroke and the level of adjustment, defined within the range (0,1).
Under the constraint of the stroke mask, the image is edited using dodge and burn.
The next state $s_{t+1}$ is obtained by updating the image.
The reward is computed to help the critic network learn the Q value of the chosen action (it considers the variation between the images in the current and next states and the coarse-to-fine progression of the stroke sequence).
The actor network updates its action prediction based on the output of the critic network.
1.4.1 Agent Model
The agent model is built on the DDPG algorithm: it has two neural networks, an actor and a critic, combining value-based and policy-based methods.
The actor network is policy-based; given the input state, it outputs the action.
At step $t$, the policy of the actor is $a_t = \pi(I_{t-1},L)$.
The critic uses $Q(s, a) = R(s,a) + \gamma \max Q(s',a')$, where $\gamma$ is a discount factor.
$s$ is the state at the current step, $a$ is the action at the current step, $s'$ is the state at the next step, $a'$ is the action at the next step, and $R(s,a)$ is the current-step reward.
By considering the current reward and the max Q value of the next state from the replay memory, the Q values are updated until they converge.
$Q(s',a')$ is evaluated at the action $a'$ chosen by the actor network.
The loss function of the critic is the squared error between the target action value and the predicted action value.
$L_{critic}=(R(s,a)+\gamma Q(s',a')-Q(s,a))^2$
The actor is updated to predict actions with high Q values as estimated by the critic.
The loss function of the actor is:
$L_{actor}=-Q(s,a)$
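A condensed PyTorch sketch of these two updates, assuming generic `actor(s)` and `critic(s, a)` modules (the paper's networks additionally condition on the image state and target light):

```python
import torch

def ddpg_update(actor, critic, batch, actor_opt, critic_opt, gamma=0.95):
    """One DDPG step with L_critic = (R + gamma*Q(s',a') - Q(s,a))^2
    and L_actor = -Q(s, a). `batch` holds tensors (s, a, r, s_next)."""
    s, a, r, s_next = batch

    # Critic: regress Q(s, a) onto the bootstrapped target.
    with torch.no_grad():
        a_next = actor(s_next)                     # a' chosen by the actor
        target = r + gamma * critic(s_next, a_next)
    critic_loss = ((critic(s, a) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: maximize Q of its own actions, i.e. minimize -Q(s, pi(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```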
At step $t$, the environment state $s_t$ combines the relighted image $I_{t-1}$, the input light $L$, and the step index: $s_t = (I_{t-1},L,t)$.
Including the step $t$ makes the agent aware of where it is in the whole decision process.
1.4.2 Action
The action specifies how to adjust the light and where editing is needed.
The action $a$ is a set of parameters that controls the position, shape, and lightness of a stroke.
For local light editing on human faces, the paper uses smooth curves to represent the shape of strokes.
Simple curves with a small number of control points are used to fit the local geometry of a human face (the agent can learn simple curves more effectively than complex ones).
A quadratic Bézier curve is used, controlled by three points with coordinates $(x_0,y_0),(x_1,y_1),(x_2,y_2)$.
The stroke thickness is controlled by two parameters $t_0$ and $t_1$ ($t_0$ is the thickness at the start point, $t_1$ is the thickness at the end point). The lightness of the stroke is controlled by $e_0$ at the start point and $e_1$ at the end point ($e_0$ and $e_1$ range from 0 to 1).
The action space is:
$action=\left\{x_0,y_0,x_1,y_1,x_2,y_2,t_0,t_1,e_0,e_1\right\}$
1.4.3 Image Editing
The image is edited based on the chosen action: first, a stroke renderer generates a stroke mask; then the image is edited using dodge and burn operations; after the image is edited, the state is updated.
Stroke renderer:
The stroke renderer is a neural network that converts the parameters in an action into a stroke on the canvas.
Hard strokes create noticeable borders and unnatural transitions.
Soft strokes make the editing natural.
The paper trains a renderer to generate soft strokes.
The soft change of the stroke from center to boundary is generated by gradually reducing the lightness from center to boundary, as shown in Figure 3.
To train the stroke renderer, a dataset containing soft strokes and their corresponding parameters is generated.
As shown in Figure 4, given a set of stroke parameters $\left\{x_0,y_0,x_1,y_1,x_2,y_2,t_0,t_1,e_0,e_1\right\}$,
the Bézier curve is first generated from the control points $(x_0,y_0),(x_1,y_1),(x_2,y_2)$.
Then a number of points are sampled on this curve and circles are drawn along it, with each sampled point as a center.
$t_0$ and $t_1$ are the radii at the head and tail of the curve; for the other circles, the radius is interpolated between $t_0$ and $t_1$.
Each circle's lightness decreases from the center to the edge: the lightness at the center is $e$ and at the edge is 0. The lightness $e$ along the middle of the curve is interpolated from $e_0$ to $e_1$.
When enough points are sampled on the curve, the stroke becomes smooth.
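A NumPy sketch of this data-generation procedure, under the assumption that all ten parameters live in $[0,1]$ normalized image coordinates (the paper then trains a neural renderer on such parameter/stroke pairs):

```python
import numpy as np

def soft_stroke(params, size=256, n=100):
    """Rasterize {x0,y0,x1,y1,x2,y2,t0,t1,e0,e1} into a soft stroke mask:
    circles along a quadratic Bezier curve, radius interpolated t0->t1,
    lightness interpolated e0->e1 and falling off from a circle's center
    (value e) to its edge (value 0)."""
    x0, y0, x1, y1, x2, y2, t0, t1, e0, e1 = params
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)   # normalized pixel grid
    canvas = np.zeros((size, size))
    for u in np.linspace(0.0, 1.0, n):
        # Quadratic Bezier point B(u) from the three control points.
        bx = (1-u)**2 * x0 + 2*(1-u)*u * x1 + u**2 * x2
        by = (1-u)**2 * y0 + 2*(1-u)*u * y1 + u**2 * y2
        radius = (1-u) * t0 + u * t1          # thickness interpolation
        e = (1-u) * e0 + u * e1               # lightness interpolation
        dist = np.sqrt((xs - bx)**2 + (ys - by)**2)
        circle = e * np.clip(1.0 - dist / max(radius, 1e-6), 0.0, 1.0)
        canvas = np.maximum(canvas, circle)   # keep the brightest coverage
    return canvas
```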
Dodge and burn
To make the sequential local light editing interpretable, the paper adopts dodge and burn, which edit the lightness distribution curve of the image.
Different curves influence pixels at different levels.
In Figure 5, the dodge curve is on the left and the burn curve is on the right.
The dodge curve increases the lightness values; the burn curve decreases the lightness values.
With dodge and burn, pixels fall into three classes: highlight, mid-tone and shadow.
The paper adopts the following two curves:
$I_{dodge} = (1+S/3)\cdot M\cdot I + (1-M)\cdot I$
$I_{burn}=(1-S/3)\cdot M \cdot I +(1-M)\cdot I$
$S$: a stroke mask, as shown in Figure 3(b) (soft stroke).
$M$: a binary mask that selects the area to be edited; it is calculated by quantizing values of $S$ above the threshold to $1$.
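A direct NumPy transcription of the two curves (the binarization threshold for $M$ is exposed as a parameter here, since only the quantization rule above is given; the final clip to $[0,1]$ is also an assumption):

```python
import numpy as np

def dodge_burn(I, S, mode="dodge", thresh=0.0):
    """Apply the dodge/burn curves under a soft stroke mask S:

        I_dodge = (1 + S/3) * M * I + (1 - M) * I
        I_burn  = (1 - S/3) * M * I + (1 - M) * I

    M is the binary mask selecting the edited area, obtained by
    thresholding S."""
    M = (S > thresh).astype(I.dtype)
    curve = 1.0 + S / 3.0 if mode == "dodge" else 1.0 - S / 3.0
    out = curve * M * I + (1.0 - M) * I
    return np.clip(out, 0.0, 1.0)  # keep lightness in [0, 1] (an assumption)
```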
At the training stage, an action bundle is used to predict $k$ actions at each step. This reduces the computational cost and allows the agent to learn associations between distant states and actions.
In the paper's experiments, the agent predicts 6 actions per step, generating 3 dodge and 3 burn strokes.
1.4.4 Reward
The reward evaluates the predicted action of the agent (the reward is the quality feedback that lets the agent find the optimal action).
(The goal of the relighting task is to make the edited image at state $s_{t+1}$ closer to the target image than the relighting result at state $s_t$. If the result at $s_{t+1}$ is closer to the target than the result at $s_t$, the agent gets a positive reward; otherwise it gets a negative reward.)
$r(s_t,a_t)=D(I_{t-1},g)-D(I_t,g)$
$s_t$: the state at step $t$.
$a_t$: the chosen action at step $t$.
$g$: the target image.
$D(\cdot)$: a metric evaluating the distance between the current relighting result and the target image $g$.
The paper adopts a PatchGAN reward, a content L2 reward, a shading reward, and a stroke reward.
**1)** PatchGAN reward: PatchGAN is a type of discriminator that measures the distribution distance between the generated data and the target data.
PatchGAN maps an $n\times n$ image to $n^2$ overlapped patches and averages all patch responses to produce the final output (the output is in the range 0-1; higher is better).
$r_{gan}(s_t,a_t) = -(G_d(I_{t-1},g)-G_d(I_t,g))$
**2)** Content L2 reward: makes the results more precise in shadow and highlight areas, and the transitions between shadow and highlight areas more natural.
The pixel-level L2 distance between the relighting result and the target image is used as the metric in this reward:
$r_{l2}(s_t,a_t) = ||I_{t-1}-g||_2-||I_t-g||_2$
**3)** Shading reward: correct shadows and highlights on a face make relighting results look realistic and natural.
A shading reward guides the agent to learn quickly and precisely.
The shading variation $S$ is extracted from the relighted image $I_t$ and the original image $I$:
$S(I_t,I)= \frac{I_t}{I+\sigma}$
The distance $D$ of the shading reward is the L2 distance between the shading variation of the relighting result and the shading variation of the target image.
The shading reward is:
$r_s = ||S(I_{t-1}, I) - S(g, I)||_2 - ||S(I_t, I) - S(g, I)||_2$
**4)** Stroke reward: applied to the entire sequence of strokes to guide the generation of coarse-to-fine stroke sequences, i.e., it guides the agent to select strokes in a coarse-to-fine manner.
Stroke size is represented by stroke length, stroke thickness, and stroke weight.
The stroke length $S_l$ is calculated as the $L_2$ distance between the three control points.
Stroke thickness: $S_t = t_0+t_1$.
Stroke weight: $S_w = e_0 + e_1$.
Stroke reward:
$r_{stroke}=-[T(S_l)+T(S_t)+T(S_w)]\cdot t\cdot \alpha$
$T(S_i) =\begin{cases}0 & S_i< threshold\\S_i & S_i\ge threshold \end{cases}$, where $i \in \left\{l,t,w \right\}$.
$\alpha$ is a scale controlling the influence of $t$ on the stroke selection.
**5)** Final reward:
$r = r_{gan}+r_{l2}+r_s+r_{stroke}$
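A sketch combining the four terms into one step reward; `G_d` is a placeholder for the PatchGAN distance, and `alpha`/`threshold` are illustrative values, not from the paper:

```python
import numpy as np

def step_reward(I_prev, I_t, I_src, g, G_d, stroke_sizes, t,
                alpha=0.01, threshold=0.5, sigma=1e-6):
    """r = r_gan + r_l2 + r_s + r_stroke for one editing step.
    stroke_sizes is (S_l, S_t, S_w) for the predicted stroke."""
    def l2(x):
        return np.sqrt((x ** 2).sum())

    # Each term rewards getting closer to the target g than at the last step.
    r_gan = -(G_d(I_prev, g) - G_d(I_t, g))
    r_l2 = l2(I_prev - g) - l2(I_t - g)

    shading = lambda x: x / (I_src + sigma)       # S(x, I)
    r_s = l2(shading(I_prev) - shading(g)) - l2(shading(I_t) - shading(g))

    # Penalize large strokes more as t grows -> coarse-to-fine selection.
    T_ = lambda s: s if s >= threshold else 0.0
    r_stroke = -sum(T_(s) for s in stroke_sizes) * t * alpha

    return r_gan + r_l2 + r_s + r_stroke
```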
1.4.5 Application on Light Transfer
Light transfer allows users to transfer the lighting condition from a reference image to an input photograph.
The agent then takes a source portrait and a reference portrait image as inputs.
Training the agent requires source, reference and target images.
Due to the lack of target relighting images, only the PatchGAN and stroke rewards are calculated.
$I_s$: the source image.
$I_r$: the reference image.
$I_r^0$: the original image.
$W$: a 2D face warp function based on landmarks.
$\sigma=10^{-6}$: a constant to avoid a zero denominator.
The illumination is extracted as $\frac{I_r^0}{I_r+\sigma}$, giving
$I_t =W(\frac{I_r^0}{I_r+\sigma})\cdot I_s$
1. Extract the light from $I_r^0$ and $I_r$.
2. Warp $\frac{I_r^0}{I_r+\sigma}$ to match the facial shape of the source image based on the face landmarks.
3. Take the elementwise product of the warped light with $I_s$ to generate a coarse relighting result $I_t$.
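A sketch of this three-step coarse transfer; `warp` is a placeholder for the landmark-based 2D face warp $W$:

```python
import numpy as np

def coarse_light_transfer(I_s, I_r, I_r0, warp, sigma=1e-6):
    """I_t = W(I_r^0 / (I_r + sigma)) * I_s.

    I_s:  source portrait lightness
    I_r:  reference portrait; I_r0: its original image
    warp: placeholder for the landmark-based 2D face warp W(.)
    """
    light = I_r0 / (I_r + sigma)   # step 1: extract the light ratio
    light = warp(light)            # step 2: warp to the source's face shape
    return light * I_s             # step 3: elementwise product with I_s
```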
1.4.6 Network Architectures
The input source image is the L channel of the Lab color space, converted from a $256\times 256$ RGB image.
The SH-lighting is a $9\times 1$ vector; each dimension is copied and expanded to $256\times 256$ to fit the input image.
The actor and critic networks use ResNet-18 to extract features and a fully-connected layer to predict actions and Q values.
The stroke renderer network uses 5 fully-connected layers and 8 $3\times 3$ convolution layers to map the stroke parameters into a $256\times 256$ stroke mask.
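A PyTorch sketch of an actor along these lines; the way the L channel, the expanded SH maps, and a step channel are stacked into the input is an assumption, as is the single-linear-layer head (the 6 strokes × 10 parameters output follows the action bundle above):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class Actor(nn.Module):
    """ResNet-18 feature extractor + fully-connected head predicting an
    action bundle of n_strokes x n_params stroke parameters."""
    def __init__(self, n_strokes=6, n_params=10):
        super().__init__()
        backbone = resnet18(weights=None)
        # 1 L channel + 9 expanded SH channels + 1 step channel (assumed).
        backbone.conv1 = nn.Conv2d(1 + 9 + 1, 64, 7, 2, 3, bias=False)
        backbone.fc = nn.Identity()            # keep the 512-d features
        self.backbone = backbone
        self.head = nn.Linear(512, n_strokes * n_params)

    def forward(self, I, sh, step):
        # Expand the 9x1 SH vector and the scalar step to image-sized maps.
        b, _, h, w = I.shape
        sh_maps = sh.view(b, 9, 1, 1).expand(b, 9, h, w)
        t_map = step.view(b, 1, 1, 1).expand(b, 1, h, w)
        x = torch.cat([I, sh_maps, t_map], dim=1)
        return torch.sigmoid(self.head(self.backbone(x)))  # params in (0,1)
```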
1.5 Experiment and Result
Experiment environment: PyTorch, Nvidia Tesla P100.
A dataset constructed from CelebAMask-HQ is used for training.
The networks are trained with the Adam optimizer.
The initial learning rate of the actor is $10^{-3}$; the initial learning rate of the critic is $3\times 10^{-4}$.
The discount factor $\gamma$ is $0.95^5$.
The capacity of the experience replay buffer is 800.
The model is trained for 31250 iterations with batch size 32.
The actor network adopts batch normalization; the critic network adopts weight normalization.
Each image is trained on 10 times; the image resolution is $256\times 256$.
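In PyTorch, these hyperparameters map onto the optimizer setup roughly as follows (placeholder networks; only the learning rates and $\gamma$ come from the text):

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the actor and critic described above.
actor = nn.Linear(512, 60)
critic = nn.Linear(512 + 60, 1)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)    # actor lr 10^-3
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)  # critic lr 3x10^-4
gamma = 0.95 ** 5                                            # discount factor
```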
1.5.1 Dataset Preparation
Multi-PIE, Extended-YaleB: to provide relighting portraits.
To get high-quality relighting portraits, high-quality relighting images in the wild are synthesized to train the model.
The dataset is synthesized with a single directional light; because a directional light source is a very general representation, this dataset lets the method adapt to complex environment maps.
Assuming human faces have Lambertian reflectance, the rendering of a relighted image $I$ can be simplified as
$I = R\circ S(N,L)$
$R$: the reflectance.
$S$: a shading computed from the normal map $N$ and light $L$.
Assumption: the reflectance $R$ is the same for a subject under different lighting conditions.
The ratio of shadings is used to render a new image $\hat{I}$ from the normal $N$ and a new light $\hat{L}$:
$\hat{I} = R\circ S(N,\hat{L})=\frac{R\circ S(N,\hat{L})}{R\circ S(N,L)} \cdot (R \circ S(N,L))=\frac{S(N,\hat{L})}{S(N,L)} \cdot I$
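A sketch of this ratio-image synthesis; the shading is computed here as the clamped dot product of normals with a directional light, matching the single-directional-light setup above, and `eps` is a numerical guard, not from the paper:

```python
import numpy as np

def ratio_relight(I, N, L, L_new, eps=1e-6):
    """Render I_hat = S(N, L_new) / S(N, L) * I under the Lambertian
    assumption, so the reflectance R cancels out.

    I:        image (H, W)
    N:        normal map (H, W, 3), unit normals
    L, L_new: directional light vectors (3,)
    """
    def shading(light):
        # Lambertian shading: clamped dot product of normals and light.
        return np.clip(N @ light, 0.0, None)

    return shading(L_new) / (shading(L) + eps) * I
```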
1.5.2 Performance Comparison
1. Metrics
Five metrics measure the relighting performance: MSE, scale-invariant MSE (Si-MSE), DSSIM, LPIPS and PSNR.
2. Relighting based on SH-lighting
PR-RL is compared against four state-of-the-art methods: DPR, SfSNet, STHP, and MTP.
PR-RL modifies only the light and leaves the other information untouched, so the pixel and structural errors of its results are small.
3. Relighting based on reference images
DPR and SfSNet can extract the lighting condition from the reference image $I_r$ as the light input.
STHP, MTP and PR-RL can take a source image $I$ and a reference image $I_r$ as inputs and transfer the light from $I_r$ to $I$.
PR-RL outperforms DPR, SfSNet, STHP and MTP in terms of light transfer from reference images.
4. Comparative performance evaluation on the FFHQ dataset
FFHQ is a high-quality human face dataset containing $1024\times 1024$ images with considerable variation in age, ethnicity and image background.
Table 3 covers relighting based on SH-lighting: PR-RL is better than DPR, SfSNet and MTP, but worse than STHP, because STHP is a light transfer method.
Table 4 covers relighting based on reference images: PR-RL is better than DPR, SfSNet, STHP and MTP.
1.5.3 Ablation Studies
1)network depth
The actor and critic use ResNet backbones.
ResNet-18 performs better than ResNet-34 and ResNet-50.
2)parameter setting
Episode length --> how many strokes should be used to edit the image.
More strokes bring more detailed changes, but can lead to overexposure or wrong stroke positions.
Fewer strokes make the result more controllable, but can fail to generate detailed lighting effects.
An episode length of $6\times 5$ gives the best performance.
Dodge and burn strategy -->
set four different strategies:
dodge-burn: dodge for the first three strokes and burn for the last three.
burn-dodge: burn for the first three strokes and dodge for the last three.
cross: dodge and burn interleaved.
self-choice: dodge or burn chosen by the agent.
dodge-burn performs better than burn-dodge, cross, and self-choice.
3)rewards
w/o stroke reward: without the stroke reward, the agent struggles to control the stroke size for detail editing.
w/o shading reward: the shading reward helps the agent pay more attention to the facial areas most susceptible to light changes.
w/o PatchGAN reward: the PatchGAN reward enables the results to show more detailed shadow and highlight changes.
w/o L2 reward: training with the L2 reward gives the results a more natural transition from light to shadow.
4)Stroke curves
A cubic Bézier curve has 4 control points and can express more complex curve shapes.
The cubic B-spline can express various complex curve shapes.