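This site collects the notes I took while studying reinforcement learning. I hope to use the mind-map feature that ships with Quartz to organize what I have learned about reinforcement learning more clearly.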

---
title: A Quick Start
---

flowchart TD
    direction TB

    RL_Algorithms["RL Algorithms"]
    Model_Free["Model-Free RL"]
    Model_Based["Model-Based RL"]
    Policy_Optimization["Policy Optimization"]
    Q_Learning["Q-Learning"]
    Learn_Model["Learn the Model"]
    Given_Model["Given the Model"]

    subgraph s1[" "]
        direction TB
        REINFORCE
        A2C
        A3C
        TRPO["TRPO"]
        PPO["PPO"]
        REINFORCE ~~~ A2C
        A2C ~~~ A3C
        A3C ~~~ TRPO
        TRPO ~~~ PPO
    end

    subgraph s2[" "]
        direction TB
        DDPG["DDPG"]
        TD3["TD3"]
        SAC["SAC"]
        DDPG ~~~ TD3
        TD3 ~~~ SAC
    end

    subgraph s3[" "]
        direction TB
        DQN["DQN"]
        C51["C51"]
        QR_DQN["QR-DQN"]
        HER["HER"]
        DQN ~~~ C51
        C51 ~~~ QR_DQN
        QR_DQN ~~~ HER
    end

    subgraph s4[" "]
        direction TB
        World_Models["World Models"]
        I2A["I2A"]
        MBMF["MBMF"]
        MBVE["MBVE"]
        World_Models ~~~ I2A
        I2A ~~~ MBMF
        MBMF ~~~ MBVE
    end

    AlphaZero["AlphaZero"]

    RL_Algorithms --> Model_Free
    RL_Algorithms --> Model_Based
    Model_Free --> Policy_Optimization
    Model_Free --> Q_Learning
    Policy_Optimization --> s1
    Policy_Optimization --> s2
    Q_Learning --> s2
    Q_Learning --> s3
    Model_Based --> Learn_Model
    Model_Based --> Given_Model
    Learn_Model --> s4
    Given_Model --> AlphaZero

    click Model_Free href "tags/model-free"
    click Model_Based href "tags/model-based"
    click Policy_Optimization href "tags/policy-iteration"
    click Q_Learning href "tags/value-iteration"
    click DQN href "posts/dqn"
    click REINFORCE href "posts/reinforce"
    click A2C href "posts/a2c"
    click TRPO href "posts/trpo"
    click PPO href "posts/ppo"
    click SAC href "posts/sac"