Resource Description
Q-learning, very helpful. Introduces the basic use of Q-learning.
Code Snippet and File Information
%% Q-learning with epsilon-greedy exploration Algorithm for Deterministic Cleaning Robot V1
% Matlab code : Reza Ahmadzadeh
% email: reza.ahmadzadeh@iit.it
% March-2014
%% The deterministic cleaning-robot MDP
% a cleaning robot has to collect a used can and also has to recharge its
% batteries. The state describes the position of the robot and the action
% describes the direction of motion. The robot can move to the left or to
% the right. The first (1) and the final (6) states are the terminal
% states. The goal is to find an optimal policy that maximizes the return
% from any initial state. Here the Q-learning algorithm with epsilon-greedy
% exploration (in reinforcement learning) is used.
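% For concreteness (an assumption following the book's cleaning-robot
% example; the reward function itself is not shown in this truncated
% listing): reaching the charger at the left end earns reward 1, reaching
% the can at the right end earns reward 5, and every other transition
% earns reward 0.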
% Algorithm 2-3 from:
% @book{busoniu2010reinforcement,
%   title={Reinforcement learning and dynamic programming using function approximators},
%   author={Busoniu, Lucian and Babuska, Robert and De Schutter, Bart and Ernst, Damien},
%   year={2010},
%   publisher={CRC Press}
% }
% note: the code is written 1-indexed (as is usual in MATLAB) instead of 0-indexed
%
% V1 the initial evaluation of the algorithm
%
%% this is the main function including the initialization and the algorithm
% the inputs are: initial Q matrix, set of actions, set of states,
% discounting factor, learning rate, exploration probability,
% number of iterations, and the initial state.
function qlearning
% learning parameters
gamma = 0.5; % discount factor % TODO : we need learning rate schedule
alpha = 0.5; % learning rate % TODO : we need exploration rate schedule
epsilon = 0.9; % exploration probability (1-epsilon = exploit / epsilon = explore)
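% one possible pair of schedules for the two TODOs above (a sketch, not part
% of V1): inside the loop the rates could decay with the iteration count k, e.g.
%   alpha   = 1/k;          % Robbins-Monro style step-size decay
%   epsilon = 0.9*0.99^k;   % geometrically shrinking exploration probability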
% states
state = [0 1 2 3 4 5];
% actions
action = [-1 1];
% initial Q matrix
Q = zeros(length(state), length(action));
K = 1000; % maximum number of the iterations
state_idx = 3; % the initial state to begin from
%% the main loop of the algorithm
for k = 1:K
    disp(['iteration: ' num2str(k)]);
    r = rand; % get 1 uniform random number in [0,1]
    % categorical draw: x = 1 with probability 1-epsilon (exploit),
    % x = 2 with probability epsilon (explore)
    x = sum(r >= cumsum([0, 1-epsilon, epsilon]));
    % ... (the listing is truncated here in the original posting; a sketch of
    % how the loop body typically continues follows below)
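The listing breaks off inside the main loop. As a rough guide only, here is a minimal sketch of how an epsilon-greedy Q-learning loop for this MDP typically continues; the reward values (1 for the charger, 5 for the can), the restart rule, and the names action_idx, next_state_idx, and rew are assumptions for illustration, not recovered from the original file:

    if x == 1
        % exploit: take the greedy action for the current state
        [~, action_idx] = max(Q(state_idx,:));
    else
        % explore: take an action uniformly at random
        action_idx = randi(length(action));
    end
    % deterministic transition: move left (-1) or right (+1), clipped to the grid
    next_state_idx = min(max(state_idx + action(action_idx), 1), length(state));
    % assumed rewards: 1 at the charger (index 1), 5 at the can (index 6), else 0
    if next_state_idx == 1
        rew = 1;
    elseif next_state_idx == length(state)
        rew = 5;
    else
        rew = 0;
    end
    % standard Q-learning update:
    % Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
    Q(state_idx,action_idx) = Q(state_idx,action_idx) + ...
        alpha*(rew + gamma*max(Q(next_state_idx,:)) - Q(state_idx,action_idx));
    % restart from a random interior state after hitting a terminal state
    if next_state_idx == 1 || next_state_idx == length(state)
        state_idx = randi([2, length(state)-1]);
    else
        state_idx = next_state_idx;
    end
end

After the loop, the greedy policy can be read off the learned table with [~, best_action] = max(Q, [], 2).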