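This site collects the notes I took while studying reinforcement learning. I hope the mind-map feature that comes with Quartz will help me organize this material better.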
---
title: A Quick Start
---
flowchart TD
direction TB
RL_Algorithms["RL Algorithms"]
Model_Free["Model-Free RL"]
Model_Based["Model-Based RL"]
Policy_Optimization["Policy Optimization"]
Q_Learning["Q-Learning"]
Learn_Model["Learn the Model"]
Given_Model["Given the Model"]
subgraph s1[" "]
direction TB
REINFORCE
A2C
A3C
TRPO["TRPO"]
PPO["PPO"]
REINFORCE ~~~ A2C
A2C ~~~ A3C
A3C ~~~ TRPO
TRPO ~~~ PPO
end
subgraph s2[" "]
direction TB
DDPG["DDPG"]
TD3["TD3"]
SAC["SAC"]
DDPG ~~~ TD3
TD3 ~~~ SAC
end
subgraph s3[" "]
direction TB
DQN["DQN"]
C51["C51"]
QR_DQN["QR-DQN"]
HER["HER"]
DQN ~~~ C51
C51 ~~~ QR_DQN
QR_DQN ~~~ HER
end
subgraph s4[" "]
direction TB
World_Models["World Models"]
I2A["I2A"]
MBMF["MBMF"]
MBVE["MBVE"]
World_Models ~~~ I2A
I2A ~~~ MBMF
MBMF ~~~ MBVE
end
AlphaZero["AlphaZero"]
RL_Algorithms --> Model_Free
RL_Algorithms --> Model_Based
Model_Free --> Policy_Optimization
Model_Free --> Q_Learning
Policy_Optimization --> s1
Policy_Optimization --> s2
Q_Learning --> s2
Q_Learning --> s3
Model_Based --> Learn_Model
Model_Based --> Given_Model
Learn_Model --> s4
Given_Model --> AlphaZero
click Model_Free href "tags/model-free"
click Model_Based href "tags/model-based"
click Policy_Optimization href "tags/policy-iteration"
click Q_Learning href "tags/value-iteration"
click DQN href "posts/dqn"
click REINFORCE href "posts/reinforce"
click A2C href "posts/a2c"
click TRPO href "posts/trpo"
click PPO href "posts/ppo"
click SAC href "posts/sac"