This project explores the Maze Rider problem, a challenge that illustrates the trade-off between exploration and exploitation in Reinforcement Learning (RL). The implementation covers both tabular RL methods and deep RL techniques to solve mazes of varying complexity efficiently.
- State Space Reduction: Compact representation focusing on the key elements (agent and goal positions); a minimal encoding sketch follows this list.
- Reward Shaping: Methods to improve convergence by penalizing wall hits and repetitive actions (see the shaping sketch below).
- Sampling Techniques: ε-greedy exploration and Thompson Sampling, both sketched after this list.
- Tabular RL Algorithms: Q-Learning, SARSA, Double Q-Learning, and Dyna-Q.
- Deep RL Methods: DQN, REINFORCE, and A2C with both MLP and convolutional neural networks.
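A minimal sketch of the compact state representation, assuming a grid maze where positions are `(row, col)` tuples; the function name and layout are illustrative, not the project's exact API:

```python
# Illustrative sketch: encode (agent, goal) positions into one integer
# index so the whole state fits a flat Q-table. Assumes an
# n_rows x n_cols grid maze; names are hypothetical.

def encode_state(agent_pos, goal_pos, n_rows, n_cols):
    """Map ((row, col), (row, col)) to a single index in [0, n_cells**2)."""
    n_cells = n_rows * n_cols
    agent_idx = agent_pos[0] * n_cols + agent_pos[1]
    goal_idx = goal_pos[0] * n_cols + goal_pos[1]
    return agent_idx * n_cells + goal_idx
```

When the goal is fixed, the goal term can be dropped, shrinking the table from `n_cells**2` to `n_cells` states.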
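A hedged sketch of the shaping idea: penalties for wall hits and for revisiting cells are layered on top of a sparse goal reward. The constants and signal names here are assumptions, not the values used in the project:

```python
# Hypothetical reward-shaping sketch; the weights are placeholders.

def shaped_reward(reached_goal, hit_wall, revisit_count):
    reward = -0.01                  # small per-step cost favours short paths
    if reached_goal:
        reward += 1.0               # sparse success reward
    if hit_wall:
        reward -= 0.5               # penalize bumping into walls
    reward -= 0.05 * revisit_count  # penalize oscillating over visited cells
    return reward
```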
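ε-greedy action selection over a tabular Q-function is standard; this generic sketch assumes `Q` is a `(n_states, n_actions)` NumPy array:

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng=None):
    """With probability epsilon take a random action, else the greedy one."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore
    return int(np.argmax(Q[state]))           # exploit
```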
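Thompson Sampling can be adapted to a tabular maze in several ways; the sketch below uses a common Beta-Bernoulli posterior per (state, action) pair, which assumes a binary success signal per step. The project's variant may differ:

```python
import numpy as np

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over (state, action) pairs."""

    def __init__(self, n_states, n_actions, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng()
        self.alpha = np.ones((n_states, n_actions))  # pseudo-count of successes
        self.beta = np.ones((n_states, n_actions))   # pseudo-count of failures

    def act(self, state):
        # Sample one plausible success rate per action, act greedily on it.
        samples = self.rng.beta(self.alpha[state], self.beta[state])
        return int(np.argmax(samples))

    def update(self, state, action, success):
        if success:
            self.alpha[state, action] += 1.0
        else:
            self.beta[state, action] += 1.0
```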
The algorithms and exploration strategies below are implemented; minimal sketches of the tabular update rules and deep RL losses follow the list.

- Q-Learning
- SARSA
- Double Q-Learning
- Dyna-Q
- Deep Q-Networks (DQN)
- REINFORCE
- Advantage Actor-Critic (A2C)
- ε-greedy Exploration
- Thompson Sampling
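The core tabular update for Q-Learning, with the SARSA variant noted in a comment; `alpha` (learning rate) and `gamma` (discount) are assumed hyperparameter names:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One off-policy Q-Learning update on a (n_states, n_actions) table."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# SARSA is the on-policy counterpart: its target uses the action a_next
# actually taken in s_next instead of the max:
#   target = r + gamma * Q[s_next, a_next]
```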
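Double Q-Learning keeps two tables and decouples action selection from evaluation, which reduces the overestimation bias of the single-table max; a generic sketch:

```python
import numpy as np

def double_q_update(Q1, Q2, s, a, r, s_next, done,
                    alpha=0.1, gamma=0.99, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Randomly pick which table to update; the other evaluates.
    Q_sel, Q_eval = (Q1, Q2) if rng.random() < 0.5 else (Q2, Q1)
    if done:
        target = r
    else:
        a_star = int(np.argmax(Q_sel[s_next]))       # select with one table
        target = r + gamma * Q_eval[s_next, a_star]  # evaluate with the other
    Q_sel[s, a] += alpha * (target - Q_sel[s, a])
```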
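Dyna-Q augments each real step with simulated planning steps drawn from a learned model; the deterministic `dict` model below is an assumption that fits grid mazes:

```python
import numpy as np

def dyna_q_step(Q, model, s, a, r, s_next,
                alpha=0.1, gamma=0.99, n_planning=10, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # 1) Direct RL update from the real transition.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    # 2) Record the transition in the (deterministic) model.
    model[(s, a)] = (r, s_next)
    # 3) Planning: replay n_planning remembered transitions at random.
    keys = list(model)
    for _ in range(n_planning):
        ps, pa = keys[rng.integers(len(keys))]
        pr, ps_next = model[(ps, pa)]
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])
```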
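For DQN, the defining piece is the temporal-difference loss against a frozen target network; a PyTorch sketch with illustrative names, assuming the batch comes from a replay buffer:

```python
import torch
import torch.nn.functional as F

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch  # tensors; `done` is a 0/1 float mask
    # Q-values of the actions actually taken.
    q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the frozen target network.
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q, target)
```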
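REINFORCE reduces to a Monte-Carlo policy-gradient loss over one episode; this sketch assumes `log_probs` were collected as `dist.log_prob(action)` tensors during the rollout:

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    # Discounted returns G_t, computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.tensor(returns)
    # Normalizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Negative sign: optimizers minimize, policy gradients maximize return.
    return -(torch.stack(log_probs) * returns).sum()
```

A2C builds on the same idea but replaces the raw returns with a learned advantage estimate from a critic network.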
For a detailed analysis of the algorithms and their performance, refer to the report.