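Resource overview
The Mountain Car example program from Sutton's textbook, a basic simulation experiment program in reinforcement learning.
Code snippet and file information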
/*
This is an example program for reinforcement learning with linear
function approximation. The code follows the pseudo-code for linear
gradient-descent Sarsa(lambda) given in Figure 8.8 of the book
"Reinforcement Learning: An Introduction" by Sutton and Barto.
One difference is that we use the implementation trick mentioned on
page 189 to only keep track of the traces that are larger
than "min-trace".
Before running the program you need to obtain the tile-coding
software available at http://envy.cs.umass.edu/~rich/tiles.C and tiles.h
(see http://envy.cs.umass.edu/~rich/tiles.html for documentation).
The code below is in three main parts: 1) Mountain Car code 2) General
RL code and 3) top-level code and misc.
Written by Rich Sutton 12/19/00
*/
#include "tiles.h"
#include "stdio.h"
#include "stdlib.h"
#include <math.h> // needed for cos() in MCarStep
// The names of the other angle-bracket headers are missing from this listing.
////////// Part 1: Mountain Car code //////////////
// Global variables:
float mcar_position, mcar_velocity; // position and velocity of the car
#define mcar_min_position -1.2
#define mcar_max_position 0.6
#define mcar_max_velocity 0.07 // the negative of this is the minimum velocity
#define mcar_goal_position 0.5
#define POS_WIDTH (1.7 / 8) // the tile width for position
#define VEL_WIDTH (0.14 / 8) // the tile width for velocity
// Function prototypes
void MCarInit(); // initialize car state
void MCarStep(int a); // update car state for given action
bool MCarAtGoal (); // is car at goal?
void MCarInit()
// Initialize state of Car
{ mcar_position = -0.5;
mcar_velocity = 0.0;}
void MCarStep(int a)
// Take action a (0 = reverse, 1 = coast, 2 = forward), update state of car
{ mcar_velocity += (a-1)*0.001 + cos(3*mcar_position)*(-0.0025);
if (mcar_velocity > mcar_max_velocity) mcar_velocity = mcar_max_velocity;
if (mcar_velocity < -mcar_max_velocity) mcar_velocity = -mcar_max_velocity;
mcar_position += mcar_velocity;
if (mcar_position > mcar_max_position) mcar_position = mcar_max_position;
if (mcar_position < mcar_min_position) mcar_position = mcar_min_position;
// <= rather than ==: the float position never compares equal to the double constant -1.2
if (mcar_position <= mcar_min_position && mcar_velocity < 0) mcar_velocity = 0;}
bool MCarAtGoal ()
// Is Car within goal region?
{ return mcar_position >= mcar_goal_position;}
////////// Part 2: Semi-General RL code //////////////
#define MEMORY_SIZE 10000 // number of parameters to theta, memory size
#define NUM_ACTIONS 3 // number of actions
#define NUM_TILINGS 10
// Global RL variables:
float Q[NUM_ACTIONS]; // action values
float theta[MEMORY_SIZE]; // modifiable parameter vector, aka memory, weights
float e[MEMORY_SIZE]; // eligibility traces
int F[NUM_ACTIONS][NUM_TILINGS]; // indices of the active tiles (features), one set per action
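The listing stops at the declarations above. As a rough sketch of how such globals are usually combined in linear Sarsa(lambda), the block below computes each Q[a] as the sum of theta over the active tile indices in F[a], and applies the min-trace idea from the header comment by zeroing traces that decay below a threshold. The function names (HypotheticalLoadF, LoadQ, DecayTraces), the stand-in feature indices, and the dense loop over e are assumptions made for illustration. The original program gets its indices from tiles.C and, per its header comment, keeps track only of the traces larger than min-trace instead of scanning the whole array.

```cpp
#include <cassert>

const int MEMORY_SIZE = 10000; // number of parameters in theta
const int NUM_ACTIONS = 3;
const int NUM_TILINGS = 10;

float Q[NUM_ACTIONS];            // action values
float theta[MEMORY_SIZE];        // weights
float e[MEMORY_SIZE];            // eligibility traces (assumed non-negative)
int F[NUM_ACTIONS][NUM_TILINGS]; // active feature indices, one set per action

// Hypothetical stand-in for the tile coder: fills F with arbitrary
// in-range indices so the sketch runs without tiles.C.
void HypotheticalLoadF(int seed) {
    for (int a = 0; a < NUM_ACTIONS; a++)
        for (int j = 0; j < NUM_TILINGS; j++)
            F[a][j] = (seed * 31 + a * NUM_TILINGS + j) % MEMORY_SIZE;
}

// Linear function approximation with binary features:
// Q[a] is the sum of the weights of the features active for action a.
void LoadQ() {
    for (int a = 0; a < NUM_ACTIONS; a++) {
        Q[a] = 0.0f;
        for (int j = 0; j < NUM_TILINGS; j++) Q[a] += theta[F[a][j]];
    }
}

// Multiply every trace by decay (gamma * lambda in Sarsa(lambda)) and
// zero any trace that falls below min_trace. Returns the number of
// traces that are still nonzero.
int DecayTraces(float decay, float min_trace) {
    assert(decay >= 0.0f && decay <= 1.0f);
    int nonzero = 0;
    for (int i = 0; i < MEMORY_SIZE; i++) {
        if (e[i] == 0.0f) continue;
        e[i] *= decay;
        if (e[i] < min_trace) e[i] = 0.0f;
        else nonzero++;
    }
    return nonzero;
}
```

A caller would fill F for the current state, call LoadQ() to pick an action, and call DecayTraces() once per time step before adding the traces for the newly active features.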
Attribute        Size  Date        Time   Name
---------  ----------  ----------  -----  ----
File            13541  2015-08-24  10:28  mountaincar1.cpp
File             4306  2015-08-24  10:29  tiles.C
File              339  2015-08-20  09:42  tiles.h
---------  ----------  ----------  -----  ----
Total           18186                     3 files
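Of the three files, only Part 1 of mountaincar1.cpp can be tried without tiles.C and tiles.h. The sketch below repackages the Mountain Car dynamics from the listing into a struct and runs one episode under a hand-written policy that always pushes in the direction of the current velocity. The MCar struct, MomentumPolicy, and RunEpisode are hypothetical names introduced here, not part of the original file, and the policy is a fixed heuristic, not the learned Sarsa(lambda) controller.

```cpp
#include <cassert>
#include <cmath>

const float kMinPosition = -1.2f;
const float kMaxPosition = 0.6f;
const float kMaxVelocity = 0.07f; // the minimum velocity is the negative of this
const float kGoalPosition = 0.5f;

struct MCar {
    float position;
    float velocity;
};

// Start state used by the listing: near the valley bottom, at rest.
MCar MCarInit() { return MCar{-0.5f, 0.0f}; }

// Same update rule as MCarStep in the listing.
// a = 0: reverse thrust, a = 1: coast, a = 2: forward thrust.
void MCarStep(MCar &car, int a) {
    assert(a >= 0 && a <= 2);
    car.velocity += (a - 1) * 0.001f + std::cos(3 * car.position) * (-0.0025f);
    if (car.velocity > kMaxVelocity) car.velocity = kMaxVelocity;
    if (car.velocity < -kMaxVelocity) car.velocity = -kMaxVelocity;
    car.position += car.velocity;
    if (car.position > kMaxPosition) car.position = kMaxPosition;
    if (car.position < kMinPosition) car.position = kMinPosition;
    // The left wall is inelastic: hitting it stops the car.
    if (car.position <= kMinPosition && car.velocity < 0) car.velocity = 0;
}

bool MCarAtGoal(const MCar &car) { return car.position >= kGoalPosition; }

// Hypothetical heuristic policy: thrust in the direction of motion,
// which adds energy on every swing.
int MomentumPolicy(const MCar &car) { return car.velocity >= 0 ? 2 : 0; }

// Runs one episode; returns the number of steps needed to reach the
// goal, or -1 if the goal was not reached within max_steps.
int RunEpisode(int max_steps) {
    MCar car = MCarInit();
    for (int step = 1; step <= max_steps; step++) {
        MCarStep(car, MomentumPolicy(car));
        if (MCarAtGoal(car)) return step;
    }
    return -1;
}
```

Calling RunEpisode(10000) gives a quick check that the dynamics behave as expected before the tile-coding and learning code is added.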