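Resource Overview
A reinforcement learning inverted pendulum (cart-pole) program. It is described as a MATLAB program, although the snippet shown below is C source. It uses the AHC (adaptive heuristic critic) algorithm and has a simple, easy-to-follow structure, which makes it good material for beginners.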
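For readers new to AHC, the C sketch below shows one way a single learning step can be organized: a critic keeps a prediction per state box, and its prediction error (rhat) adjusts both the critic weights and the action weights through eligibility traces. The constants reuse the names and values from the listing below. The Learner struct, the function names, and the reward convention (-1 on failure, 0 otherwise) are assumptions made for illustration, not code taken from the file.

```c
#include <assert.h>

#define N_BOXES 162
#define ALPHA   1000.0  /* learning rate for action weights w */
#define BETA    0.5     /* learning rate for critic weights v */
#define GAMMA   0.95    /* discount factor for the critic */
#define LAMBDAw 0.9     /* decay rate for the w eligibility trace */
#define LAMBDAv 0.8     /* decay rate for the v eligibility trace */

/* Hypothetical container for the four weight/trace vectors. */
typedef struct {
    double w[N_BOXES];     /* action weights */
    double v[N_BOXES];     /* critic weights (prediction per box) */
    double e[N_BOXES];     /* action weight eligibilities */
    double xbar[N_BOXES];  /* critic weight eligibilities */
} Learner;

/* Record that `box` was visited and action y was taken
   (assumed encoding: 1 = push right, 0 = push left). */
void ahc_mark_visit(Learner *L, int box, int y)
{
    assert(box >= 0 && box < N_BOXES);
    L->e[box]    += (1.0 - LAMBDAw) * (y - 0.5);
    L->xbar[box] += (1.0 - LAMBDAv);
}

/* One learning step. old_box is the box before the move; new_box is the
   box after it, or -1 if the pole fell or the cart left the track.
   Returns the internal reinforcement rhat. */
double ahc_update(Learner *L, int old_box, int new_box)
{
    assert(old_box >= 0 && old_box < N_BOXES && new_box < N_BOXES);

    int failed  = (new_box < 0);
    double r    = failed ? -1.0 : 0.0;            /* assumed reward signal */
    double p    = failed ? 0.0 : L->v[new_box];   /* prediction for new state */
    double rhat = r + GAMMA * p - L->v[old_box];  /* temporal-difference error */

    for (int i = 0; i < N_BOXES; i++) {
        L->w[i] += ALPHA * rhat * L->e[i];
        L->v[i] += BETA  * rhat * L->xbar[i];
        if (failed) {              /* a new trial starts: clear the traces */
            L->e[i] = 0.0;
            L->xbar[i] = 0.0;
        } else {                   /* otherwise let the traces decay */
            L->e[i]    *= LAMBDAw;
            L->xbar[i] *= LAMBDAv;
        }
    }
    return rhat;
}
```

In a simulation loop, ahc_mark_visit would be called right after an action is chosen and ahc_update right after the dynamics produce the next state. A failure then lowers the critic's prediction for recently visited boxes and shifts their action weights away from the actions that were taken.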
Code Snippet and File Information
/*----------------------------------------------------------------------
   This file contains a simulation of the cart and pole dynamic system and
 a procedure for learning to balance the pole.  Both are described in
 Barto, Sutton, and Anderson, "Neuronlike Adaptive Elements That Can Solve
 Difficult Learning Control Problems," IEEE Trans. Syst., Man, Cybern.,
 Vol. SMC-13, pp. 834--846, Sept.--Oct. 1983, and in Sutton, "Temporal
 Aspects of Credit Assignment in Reinforcement Learning", PhD
 Dissertation, Department of Computer and Information Science, University
 of Massachusetts, Amherst, 1984.  The following routines are included:

       main:              controls simulation iterations and implements
                          the learning system.

       cart_and_pole:     the cart and pole dynamics; given action and
                          current state, estimates next state

       get_box:           The cart-pole's state space is divided into 162
                          boxes.  get_box returns the index of the box into
                          which the current state falls.

 These routines were written by Rich Sutton and Chuck Anderson.  Claude Sammut
 translated parts from Fortran to C.  Please address correspondence to
 sutton@gte.com or anderson@cs.colostate.edu
---------------------------------------
Changes:
  1/93: A bug was found and fixed in the state -> box mapping which resulted
        in array addressing outside the range of the array.  It's amazing this
        program worked at all before this bug was fixed.  -RSS
----------------------------------------------------------------------*/
#include <math.h>    /* exp */
#include <stdio.h>   /* printf, scanf */
#include <stdlib.h>  /* rand, RAND_MAX */

#define min(x, y)               (((x) <= (y)) ? (x) : (y))
#define max(x, y)               (((x) >= (y)) ? (x) : (y))
/* Logistic function of s, with s clamped to [-50, 50] to keep exp() in range. */
#define prob_push_right(s)      (1.0 / (1.0 + exp(-max(-50.0, min(s, 50.0)))))
/* Uniform sample in [0, 1]; RAND_MAX replaces (1 << 31) - 1, which overflows a 32-bit int. */
#define random                  ((float) rand() / (float) RAND_MAX)

#define N_BOXES         162         /* Number of disjoint boxes of state space. */
#define ALPHA           1000        /* Learning rate for action weights, w. */
#define BETA            0.5         /* Learning rate for critic weights, v. */
#define GAMMA           0.95        /* Discount factor for critic. */
#define LAMBDAw         0.9         /* Decay rate for w eligibility trace. */
#define LAMBDAv         0.8         /* Decay rate for v eligibility trace. */

#define MAX_FAILURES    100         /* Termination criterion. */
#define MAX_STEPS       100000

typedef float vector[N_BOXES];

int main(void)
{
  float x,                      /* cart position, meters */
        x_dot,                  /* cart velocity */
        theta,                  /* pole angle, radians */
        theta_dot;              /* pole angular velocity */
  vector w,                     /* vector of action weights */
         v,                     /* vector of critic weights */
         e,                     /* vector of action weight eligibilities */
         xbar;                  /* vector of critic weight eligibilities */
  float p, oldp, rhat, r;
  int box, i, y, steps = 0, failures = 0, failed;

  printf("Seed? ");
  scanf("
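
The header comment says the state space is divided into 162 boxes and that get_box returns the index of the box for the current state, but the snippet ends before that routine appears. As a rough sketch of how such a mapping can work, the C code below bins each of the four state variables and combines the bin numbers into a single index. The names bin_of and box_index and every numeric threshold are assumptions chosen for illustration (3 x 3 x 6 x 3 = 162 is one factorization that gives the stated count); none of them is taken from the listing.

```c
#include <assert.h>

#define N_BOXES 162  /* 3 * 3 * 6 * 3 bins in this sketch */

/* Bin index of value in 0..n_edges: the number of edges it is >= to.
   The edges array must be sorted in ascending order. */
int bin_of(double value, const double *edges, int n_edges)
{
    int bin = 0;
    while (bin < n_edges && value >= edges[bin])
        bin++;
    return bin;
}

/* Hypothetical stand-in for get_box. Returns an index in 0..161, or -1
   when the state lies outside the region treated as "still balanced".
   All thresholds are illustrative assumptions (meters, radians). */
int box_index(double x, double x_dot, double theta, double theta_dot)
{
    static const double x_edges[]         = { -0.8, 0.8 };
    static const double x_dot_edges[]     = { -0.5, 0.5 };
    static const double theta_edges[]     = { -0.105, -0.0175, 0.0,
                                               0.0175, 0.105 };
    static const double theta_dot_edges[] = { -0.87, 0.87 };
    const double x_limit = 2.4;       /* assumed track half-length */
    const double theta_limit = 0.209; /* assumed failure angle, about 12 degrees */

    if (x < -x_limit || x > x_limit ||
        theta < -theta_limit || theta > theta_limit)
        return -1;

    /* Mixed-radix encoding: each variable's bin is one "digit". */
    int box = bin_of(x, x_edges, 2);                    /* 0..2      */
    box += 3  * bin_of(x_dot, x_dot_edges, 2);          /* + 0,3,6   */
    box += 9  * bin_of(theta, theta_edges, 5);          /* + 0..45   */
    box += 54 * bin_of(theta_dot, theta_dot_edges, 2);  /* + 0,54,108 */

    assert(box >= 0 && box < N_BOXES);
    return box;
}
```

The final range check is there because an index outside 0..161 would address memory outside the weight vectors, which is the kind of error the 1/93 change note describes.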