也许我的问题会显得很愚蠢。
我正在研究 Q-learning 算法。为了更好地理解它,我正在尝试重新制作 Tenzorflow 代码这个结冰的湖 https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0将示例写入 Keras 代码中。
My code:
import gym
import numpy as np
import random
from keras.layers import Dense
from keras.models import Sequential
from keras import backend as K
import matplotlib.pyplot as plt
%matplotlib inline
env = gym.make('FrozenLake-v0')
model = Sequential()
model.add(Dense(16, activation='relu', kernel_initializer='uniform', input_shape=(16,)))
model.add(Dense(4, activation='softmax', kernel_initializer='uniform'))
def custom_loss(yTrue, yPred):
return K.sum(K.square(yTrue - yPred))
model.compile(loss=custom_loss, optimizer='sgd')
# Set learning parameters
y = .99
e = 0.1
#create lists to contain total rewards and steps per episode
jList = []
rList = []
num_episodes = 2000
for i in range(num_episodes):
current_state = env.reset()
rAll = 0
d = False
j = 0
while j < 99:
j+=1
current_state_Q_values = model.predict(np.identity(16)[current_state:current_state+1], batch_size=1)
action = np.reshape(np.argmax(current_state_Q_values), (1,))
if np.random.rand(1) < e:
action[0] = env.action_space.sample() #random action
new_state, reward, d, _ = env.step(action[0])
rAll += reward
jList.append(j)
rList.append(rAll)
new_Qs = model.predict(np.identity(16)[new_state:new_state+1], batch_size=1)
max_newQ = np.max(new_Qs)
targetQ = current_state_Q_values
targetQ[0,action[0]] = reward + y*max_newQ
model.fit(np.identity(16)[current_state:current_state+1], targetQ, verbose=0, batch_size=1)
current_state = new_state
if d == True:
#Reduce chance of random action as we train the model.
e = 1./((i/50) + 10)
break
print("Percent of succesful episodes: " + str(sum(rList)/num_episodes) + "%")
当我运行它时,它效果不佳:成功剧集百分比:0.052%
plt.plot(rList)
The 原始张量流代码 https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0好多了:成功剧集百分比:0.352%
plt.plot(rList)
我做错了什么?