Reinforcement Learning: The Q-learning Algorithm

Size: 3 KB

File type: .py

Published: 2021-06-15
Language: Python
Tags: reinforcement learning, model-free, Q-learning

Resource overview

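Q-learning is a model-free reinforcement learning method. This resource applies Q-learning to a simple search task, and is meant to help beginners understand both reinforcement learning in general and Q-learning in particular.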
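To make "model-free" concrete: the agent never needs the environment's transition rules, only the observed (state, action, reward, next state) tuples, which it folds into a table of action values. The following is a minimal sketch of a single tabular update, assuming a NumPy array for the table and integer action codes; the helper name `q_update` is hypothetical, and the learning rate and discount factor mirror the values used in the script.

```python
import numpy as np

ALPHA = 0.1   # learning rate (same value as the script's ALPHA)
GAMMA = 0.9   # discount factor (same value as the script's GAMMA)


def q_update(q, state, action, reward, next_state=None):
    """Apply one tabular Q-learning update in place and return the new value.

    next_state=None marks a terminal transition, which has no future value.
    `q_update` is a hypothetical helper, not part of the original script.
    """
    future = 0.0 if next_state is None else q[next_state].max()
    target = reward + GAMMA * future            # bootstrapped estimate of the return
    q[state, action] += ALPHA * (target - q[state, action])
    return q[state, action]


q = np.zeros((6, 2))   # 6 states, 2 actions (assumed coding: 0 = left, 1 = right)

# Stepping onto the treasure from state 4 yields reward 1 and ends the episode.
first = q_update(q, state=4, action=1, reward=1)

# One cell further left there is no reward, but the value learned above leaks back.
second = q_update(q, state=3, action=1, reward=0, next_state=4)
```

Here `first` comes out as 0.1 and `second` as 0.009, up to floating-point rounding: the reward at the goal propagates backwards one state per visit, scaled by the discount factor and the learning rate.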

Code snippet and file information

"""
A simple example of Reinforcement Learning using the table-lookup Q-learning method.
An agent "o" starts on the left of a 1-dimensional world; the treasure is at the rightmost location.
Run this program to see how the agent improves its strategy for finding the treasure.
View more on my tutorial page: https://morvanzhou.github.io/tutorials/
"""
import numpy as np
import pandas as pd
import time

np.random.seed(2)  # reproducible


N_STATES = 6   # the length of the 1 dimensional world
ACTIONS = ['left', 'right']     # available actions
EPSILON = 0.9   # epsilon-greedy policy: probability of taking the greedy action
ALPHA = 0.1     # learning rate
GAMMA = 0.9    # discount factor
MAX_EPISODES = 20   # maximum episodes
FRESH_TIME = 0.3    # refresh time for one move


def build_q_table(n_states, actions):  # build the Q-table
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),     # q_table initial values
        columns=actions,    # column labels are the action names
    )
    # print(table)    # show table
    return table


def choose_action(state, q_table):
    # Epsilon-greedy action selection for the given state
    state_actions = q_table.iloc[state, :]
    if (np.random.uniform() > EPSILON) or (state_actions == 0).all():  # explore: act randomly, or this state has no learned values yet
        action_name = np.random.choice(ACTIONS)
    else:   # exploit: act greedily
        action_name = state_actions.idxmax()  # idxmax returns the action label; argmax would return a position
    return action_name


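def get_env_feedback(S, A):
    # This is how the agent interacts with the environment
    if A == 'right':    # move right
        if S == N_STATES - 2:   # next cell is the treasure: terminate
            S_ = 'terminal'
            R = 1
        else:
            S_ = S + 1
            R = 0
    else:   # move left
        R = -1
        # The source listing ends at the line above. The remainder is an assumed
        # completion: step one cell left, staying in place at the left wall.
        S_ = max(S - 1, 0)
    return S_, R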
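
The listing stops before the part of the script that drives learning. As a rough illustration of how the pieces could fit together, here is a self-contained sketch of a training loop for the same 1-dimensional task. It is not the original author's code: it assumes a NumPy array instead of the pandas table, integer action codes, a `None` terminal marker, and a move-left transition that stops at the left wall. The names `step` and `train` are hypothetical.

```python
import numpy as np

N_STATES = 6        # cells in the 1-D world; the treasure is the rightmost cell
LEFT, RIGHT = 0, 1  # hypothetical integer action codes (the script uses strings)
EPSILON = 0.9       # probability of acting greedily
ALPHA = 0.1         # learning rate
GAMMA = 0.9         # discount factor
TERMINAL = None     # hypothetical terminal marker (the script uses 'terminal')


def step(state, action):
    """Return (next_state, reward) for one move in the 1-D world."""
    if action == RIGHT:
        if state == N_STATES - 2:
            return TERMINAL, 1          # stepped onto the treasure
        return state + 1, 0
    # Assumption: moving left costs -1 (as in the listing) and stops at the wall.
    return max(state - 1, 0), -1


def train(episodes=20, seed=2):
    """Run tabular Q-learning and return the table and steps taken per episode."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))
    steps_per_episode = []
    for _ in range(episodes):
        state, steps = 0, 0
        while state is not TERMINAL:
            row = q[state]
            if rng.random() > EPSILON or not row.any():
                action = int(rng.integers(2))   # explore, or no values learned yet
            else:
                action = int(row.argmax())      # exploit the best known action
            next_state, reward = step(state, action)
            future = 0.0 if next_state is TERMINAL else q[next_state].max()
            q[state, action] += ALPHA * (reward + GAMMA * future - q[state, action])
            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


q_table, steps_per_episode = train()
```

Printing `steps_per_episode` after a run is a simple way to watch the agent improve: the shortest possible episode takes 5 moves, and later episodes should tend toward that number as the values for moving right grow.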
