Resource Description
Q-learning, very helpful. Introduces the basic use of Q-learning.
Code Snippet and File Information
%% Q-learning with epsilon-greedy exploration Algorithm for Deterministic Cleaning Robot V1
% Matlab code : Reza Ahmadzadeh
% email: reza.ahmadzadeh@iit.it
% March-2014
%% The deterministic cleaning-robot MDP
% a cleaning robot has to collect a used can and also has to recharge its
% batteries. The state describes the position of the robot and the action
% describes the direction of motion. The robot can move to the left or to
% the right. The first (1) and the final (6) states are the terminal
% states. The goal is to find an optimal policy that maximizes the return
% from any initial state. Here the Q-learning algorithm with epsilon-greedy
% exploration (in reinforcement learning) is used.
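% For concreteness (an assumption following the book's cleaning-robot
% example; the reward function itself is not shown in this truncated
% listing): reaching the charger at the left end earns reward 1, reaching
% the can at the right end earns reward 5, and every other transition
% earns reward 0.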
% Algorithm 2-3 from:
% @book{busoniu2010reinforcement,
%   title={Reinforcement learning and dynamic programming using function approximators},
%   author={Busoniu, Lucian and Babuska, Robert and De Schutter, Bart and Ernst, Damien},
%   year={2010},
%   publisher={CRC Press}
% }
% note: the code is written 1-indexed (as is usual in MATLAB) instead of 0-indexed
%
% V1 the initial evaluation of the algorithm
%
%% this is the main function including the initialization and the algorithm
% the inputs are: initial Q matrix, set of actions, set of states,
% discounting factor, learning rate, exploration probability,
% number of iterations, and the initial state.
function qlearning
% learning parameters
gamma = 0.5; % discount factor % TODO : we need learning rate schedule
alpha = 0.5; % learning rate % TODO : we need exploration rate schedule
epsilon = 0.9; % exploration probability (1-epsilon = exploit / epsilon = explore)
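% one possible pair of schedules for the two TODOs above (a sketch, not part
% of V1): inside the loop the rates could decay with the iteration count k, e.g.
%   alpha   = 1/k;          % Robbins-Monro style step-size decay
%   epsilon = 0.9*0.99^k;   % geometrically shrinking exploration probability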
% states
state = [0 1 2 3 4 5];
% actions
action = [-1 1];
% initial Q matrix
Q = zeros(length(state), length(action));
K = 1000; % maximum number of the iterations
state_idx = 3; % the initial state to begin from
%% the main loop of the algorithm
for k = 1:K
    disp(['iteration: ' num2str(k)]);
    r = rand; % get 1 uniform random number in [0,1]
    % categorical draw: x = 1 with probability 1-epsilon (exploit),
    % x = 2 with probability epsilon (explore)
    x = sum(r >= cumsum([0, 1-epsilon, epsilon]));
    % ... (the listing is truncated here in the original posting; a sketch of
    % how the loop body typically continues follows below)
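The listing breaks off inside the main loop. As a rough guide only, here is a minimal sketch of how an epsilon-greedy Q-learning loop for this MDP typically continues; the reward values (1 for the charger, 5 for the can), the restart rule, and the names action_idx, next_state_idx, and rew are assumptions for illustration, not recovered from the original file:

    if x == 1
        % exploit: take the greedy action for the current state
        [~, action_idx] = max(Q(state_idx,:));
    else
        % explore: take an action uniformly at random
        action_idx = randi(length(action));
    end
    % deterministic transition: move left (-1) or right (+1), clipped to the grid
    next_state_idx = min(max(state_idx + action(action_idx), 1), length(state));
    % assumed rewards: 1 at the charger (index 1), 5 at the can (index 6), else 0
    if next_state_idx == 1
        rew = 1;
    elseif next_state_idx == length(state)
        rew = 5;
    else
        rew = 0;
    end
    % standard Q-learning update:
    % Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
    Q(state_idx,action_idx) = Q(state_idx,action_idx) + ...
        alpha*(rew + gamma*max(Q(next_state_idx,:)) - Q(state_idx,action_idx));
    % restart from a random interior state after hitting a terminal state
    if next_state_idx == 1 || next_state_idx == length(state)
        state_idx = randi([2, length(state)-1]);
    else
        state_idx = next_state_idx;
    end
end

After the loop, the greedy policy can be read off the learned table with [~, best_action] = max(Q, [], 2).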