Works best in Chrome.
A Pen by Joshua Hibbert on CodePen.
| """ Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """ | |
| import numpy as np | |
| import gym.spaces | |
| # hyperparameters | |
| H = 200 # number of hidden layer neurons | |
| batch_size = 10 # every how many episodes to do a param update? | |
| learning_rate = 1e-4 | |
| gamma = 0.99 # discount factor for reward | |
| decay_rate = 0.99 # decay factor for RMSProp leaky sum of grad^2 |
Works best in Chrome.
A Pen by Joshua Hibbert on CodePen.