BipedalWalker-v3🤔
- Preface
- Robot walking control
- show me code, no bb
- Results
- Closing remarks
More code:
Gitee homepage: https://gitee.com/GZHzzz
Blog homepage:
CSDN: https://blog.csdn.net/gzhzzaa
Preface
- As a beginner, I am writing this reinforcement-learning basics column to share my own learning journey, and I hope we can all exchange ideas and improve together! 😁 On my Gitee I have collected classic reinforcement learning papers and built typical agent models based on PyTorch. Let's read, discuss, and learn from each other! 😊
Robot walking control
- A closed-form solution, also called an analytical solution, is an exact formula: plug in any value of the independent variables and it directly yields the dependent variable, i.e. the solution to the problem, and others can reuse the same formula for their own problems. (The code below contains only the test process.)
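As a minimal illustration of the idea (not part of the walker code), the roots of a quadratic equation a·x² + b·x + c = 0 have a well-known closed-form expression; no iterative search or learning is needed:

```python
import math

def solve_quadratic(a, b, c):
    """Closed-form (analytical) roots of a*x^2 + b*x + c = 0, assuming real roots."""
    disc = b * b - 4 * a * c          # discriminant
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

# x^2 - 3x + 2 = 0 has roots 2 and 1
print(solve_quadratic(1, -3, 2))  # → (2.0, 1.0)
```

The agent below is "closed-form" in the same spirit: its action is computed by a fixed formula rather than by a trained model.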
show me code, no bb
import sys
import logging

import numpy as np
np.random.seed(0)
import gym

logging.basicConfig(level=logging.DEBUG,
        format='%(asctime)s [%(levelname)s] %(message)s',
        stream=sys.stdout, datefmt='%H:%M:%S')

env = gym.make('BipedalWalker-v3')
env.seed(0)
for key in vars(env):
    logging.info('%s: %s', key, vars(env)[key])
for key in vars(env.spec):
    logging.info('%s: %s', key, vars(env.spec)[key])


class ClosedFormAgent:
    """Fixed linear policy: action = observation @ weights + bias (no learning)."""
    def __init__(self, env):
        self.weights = np.array([
            [ 0.9, -0.7, 0.0, -1.4],
            [ 4.3, -1.6, -4.4, -2.0],
            [ 2.4, -4.2, -1.3, -0.1],
            [-3.1, -5.0, -2.0, -3.3],
            [-0.8, 1.4, 1.7, 0.2],
            [-0.7, 0.2, -0.2, 0.1],
            [-0.6, -1.5, -0.6, 0.3],
            [-0.5, -0.3, 0.2, 0.1],
            [ 0.0, -0.1, -0.1, 0.1],
            [ 0.4, 0.8, -1.6, -0.5],
            [-0.4, 0.5, -0.3, -0.4],
            [ 0.3, 2.0, 0.9, -1.6],
            [ 0.0, -0.2, 0.1, -0.3],
            [ 0.1, 0.2, -0.5, -0.3],
            [ 0.7, 0.3, 5.1, -2.4],
            [-0.4, -2.3, 0.3, -4.0],
            [ 0.1, -0.8, 0.3, 2.5],
            [ 0.4, -0.9, -1.8, 0.3],
            [-3.9, -3.5, 2.8, 0.8],
            [ 0.4, -2.8, 0.4, 1.4],
            [-2.2, -2.1, -2.2, -3.2],
            [-2.7, -2.6, 0.3, 0.6],
            [ 2.0, 2.8, 0.0, -0.9],
            [-2.2, 0.6, 4.7, -4.6],
        ])
        self.bias = np.array([3.2, 6.1, -4.0, 7.6])

    def reset(self, mode=None):
        pass

    def step(self, observation, _reward, _done):
        # map the 24-dim observation to a 4-dim joint action
        action = np.matmul(observation, self.weights) + self.bias
        return action

    def close(self):
        pass


agent = ClosedFormAgent(env)


def play_episode(env, agent, max_episode_steps=None, mode=None, render=False):
    observation, reward, done = env.reset(), 0., False
    agent.reset(mode=mode)
    episode_reward, elapsed_steps = 0., 0
    while True:
        action = agent.step(observation, reward, done)
        if render:
            env.render()
        if done:
            break
        observation, reward, done, _ = env.step(action)
        episode_reward += reward
        elapsed_steps += 1
        if max_episode_steps and elapsed_steps >= max_episode_steps:
            break
    agent.close()
    return episode_reward, elapsed_steps


logging.info('==== test ====')
episode_rewards = []
for episode in range(100):
    episode_reward, elapsed_steps = play_episode(env, agent, render=True)
    episode_rewards.append(episode_reward)
    logging.debug('test episode %d: reward = %.2f, steps = %d',
            episode, episode_reward, elapsed_steps)
logging.info('average episode reward = %.2f ± %.2f',
        np.mean(episode_rewards), np.std(episode_rewards))
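The agent above is nothing more than a fixed linear map: the 24-dimensional BipedalWalker-v3 observation is multiplied by a 24×4 weight matrix and shifted by a 4-dimensional bias to produce the 4 joint torques. A standalone shape check (using random stand-in weights, not the tuned values above, and no gym dependency) sketches the mapping:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((24, 4))   # stand-in for the hand-tuned matrix
bias = rng.standard_normal(4)            # stand-in for the hand-tuned bias

observation = rng.standard_normal(24)    # BipedalWalker-v3 observations are 24-dim
action = observation @ weights + bias    # same computation as ClosedFormAgent.step
print(action.shape)  # → (4,)
```

Because the policy has only 24×4 + 4 = 100 numbers, it can be written down directly, which is exactly why the post calls it a closed-form agent.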
Results
Closing remarks
Ten years spent sharpening one sword; may we keep encouraging each other!
PyTorch-based classic models: typical agent models built with PyTorch
Classic RL papers: classic reinforcement learning papers
while True:
Go life
Thanks for the likes and the discussion! (❁´◡`❁)