24/12/1 算法笔记<强化学习> 创建Maze交互
我们今天制作一个栅格的游戏。
我们直接上代码教学。
1.载入库和查找相应的函数版本
import numpy as np
import time
import sysif sys.version_info.major == 2:import Tkinter as tk
else:import tkinter as tk
2.设置长宽和单元格大小
UNIT= 40
MAZE_H= 4
MAZE_W =4
3.初始化Maze环境类
class Maze(tk.Tk, object):def __init__(self):super(Maze, self).__init__()self.action_space = ['u', 'd', 'l', 'r']self.n_actions = len(self.action_space)self.title('maze')self.geometry('{0}x{1}'.format(MAZE_H * UNIT, MAZE_H * UNIT))self._build_maze()
里面初始化动作(上下左右),动作数量,窗口标题和大小,构建迷宫。
4.构建迷宫画布,绘制垂直线和水平线
def _build_maze(self):self.canvas = tk.Canvas(self, bg='white', height=MAZE_H * UNIT, width=MAZE_W * UNIT)for c in range(0, MAZE_W * UNIT, UNIT):x0, y0, x1, y1 = c, 0, c, MAZE_H * UNITself.canvas.create_line(x0, y0, x1, y1)for r in range(0, MAZE_H * UNIT, UNIT):x0, y0, x1, y1 = 0, r, MAZE_W * UNIT, rself.canvas.create_line(x0, y0, x1, y1)
5.设定两个陷阱,一个目标和玩家
origin = np.array([20, 20])hell1_center = origin + np.array([UNIT * 2, UNIT])self.hell = self.canvas.create_rectangle(hell1_center[0] - 15, hell1_center[1] - 15,hell1_center[0] + 15, hell1_center[1] + 15,fill='black')hell2_center = origin + np.array([UNIT, UNIT*2]) self.hell2 = self.canvas.create_rectangle(hell2_center[0] - 15, hell2_center[1] - 15,hell2_center[0] + 15, hell2_center[1] + 15,fill='black'
)oval_center = origin + UNIT * 2self.oval = self.canvas.create_oval(oval_center[0] - 15, oval_center[1] - 15,oval_center[0] + 15, oval_center[1] + 15,fill='yellow')self.rect = self.canvas.create_rectangle(origin[0] - 15, origin[1] - 15,origin[0] + 15, origin[1] + 15,fill='red')self.canvas.pack()
最后是打包画布,将画布添加到 Tkinter 窗口中,并允许它显示。
6.重置游戏环境到初始状态的函数。
def reset(self):self.update()time.sleep(0.5)self.canvas.delete(self.rect) #删除旧的玩家图形origin = np.array([20, 20]) #设置玩家的初始位置self.rect = self.canvas.create_rectangle( #重新创建玩家图形origin[0] - 15, origin[1] - 15,origin[0] + 15, origin[1] + 15,fill='red')
7.设定处理玩家在迷宫中的一步移动,并根据结果更新游戏状态的函数。
def step(self, action):s = self.canvas.coords(self.rect) #获取当前位置base_action = np.array([0, 0]) #初始化动作向量if action == 0: #根据动作更新位置if s[1] > UNIT:base_action[1] -= UNITelif action == 1:if s[1] < (MAZE_H - 1) * UNIT:base_action[1] += UNITelif action == 2:if s[0] < (MAZE_W - 1) * UNIT:base_action[0] += UNITelif action == 3:if s[0] > UNIT:base_action[0] -= UNITself.canvas.move(self.rect, base_action[0], base_action[1]) #移动玩家s_ = self.canvas.coords(self.rect) #获取新位置if s_ == self.canvas.coords(self.oval): #检查是否到达终点或陷阱reward = 1done = Trues_ = 'terminal'elif s_ == self.canvas.coords(self.hell) or s_ == self.canvas.coords(self.hell2):reward = -1done = Trues_ = 'terminal'else:reward = 0done = Falsereturn s_, reward, done
8.停顿更新函数
def render(self):time.sleep(0.1)self.update()
9.更新函数
def update():for t in range(10):s = env.reset()while True:env.render()a = 1 # 这里应该是根据策略选择动作s, r, done = env.step(a)if done:break
10.主函数
if __name__ == '__main__':env = Maze()env.after(100, update)env.mainloop()
然后运行就能获得一个简单的自动玩栅格游戏的智能体,这次我们是简单给一些基本设定,以后将加入强化学习的知识强化它。