Deep Reinforcement Learning in Lunar Lander Environment

This was one of the most fun projects I have worked on. The task was to train a Reinforcement Learning agent to land a spacecraft in the Lunar Lander environment. To do this, the agent has to learn to control the spacecraft's three boosters based on its position, velocity, and orientation. I chose a Deep Reinforcement Learning approach to make sure the agent generalizes over the state space of 8 inputs, 6 of which are continuous values.
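For reference, the environment's state and action spaces can be inspected directly. A minimal sketch using the gym API (not code from the project itself):

```python
import gym

# Create the Lunar Lander environment (discrete-action variant).
env = gym.make("LunarLander-v2")

# 8-dimensional observation: x/y position, x/y velocity, angle,
# angular velocity, and two booleans for left/right leg ground contact.
print(env.observation_space)

# 4 discrete actions: do nothing, fire left, main, or right engine.
print(env.action_space)
```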

Deep Reinforcement Learning combines Reinforcement Learning with Deep Neural Networks.
Unlike supervised learning algorithms, RL algorithms like Q-learning or SARSA do not learn from annotated data but directly from experience. They use the TD error, the difference between two consecutive value predictions for a state, to learn more efficiently. The neural network serves as a function approximator for the value of a state-action pair. Such a function approximator generalizes better over large state spaces than table-based methods.
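To make the TD error concrete, here is a sketch of the one-step Q-learning error with a neural network approximator. This is illustrative rather than the project's actual code; `q_net` stands in for any PyTorch module mapping a state tensor to one value per action:

```python
import torch

def td_error(q_net, state, action, reward, next_state, done, gamma=0.99):
    """One-step Q-learning TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    q_sa = q_net(state)[action]  # current estimate Q(s, a)
    with torch.no_grad():
        # Bootstrap target from the best action value in the next state;
        # (1 - done) zeroes the bootstrap term at episode end.
        target = reward + gamma * (1 - done) * q_net(next_state).max()
    return target - q_sa
```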

In this project, I trained a Double Q-Network agent using a replay buffer and experience replay.
I also investigated the effects of several hyperparameters and tuned them for better performance.
The video above shows the agent's performance at multiple stages of training.
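Since the project code itself can't be shared (see below), here is a hedged sketch of the two core ideas. A replay buffer stores transitions so minibatches can be sampled independently of the order in which they were experienced, and the double-Q target decouples action selection (online network) from action evaluation (target network):

```python
import random
from collections import deque

import torch

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and samples random minibatches."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the end

    def push(self, transition):
        self.buffer.append(transition)  # (state, action, reward, next_state, done)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def double_q_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double-Q target: the online net picks the action, the target net evaluates it."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax()
        return reward + gamma * (1 - done) * target_net(next_state)[best_action]
```

Evaluating the online network's chosen action with the separate target network counteracts the overestimation bias of plain Q-learning, which is the main motivation for the double-Q variant.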

For this project I coded up the Q-learning agent, the experience generator and replay buffer, and the grid search for hyperparameter tuning from scratch. To implement the online network and the target network, I used the PyTorch library for both inference and training. The Lunar Lander game environment is provided by OpenAI's gym library.
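As a rough picture of how these pieces fit together, a greedy inference loop over a trained online network might look like the following. The layer sizes here are placeholders (the real ones were among the tuned hyperparameters), and the loop uses the classic gym API, where `step` returns a 4-tuple:

```python
import gym
import torch
import torch.nn as nn

# Hypothetical Q-network: 8 state inputs -> 4 action values.
q_net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

env = gym.make("LunarLander-v2")
state = env.reset()
done = False
while not done:
    with torch.no_grad():
        # Greedy action: index of the largest predicted Q-value.
        action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
    state, reward, done, info = env.step(action)  # classic gym 4-tuple API
```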

The code I wrote for this project was partially used for my Reinforcement Learning class at Georgia Tech. Unfortunately, I am not allowed to publish it under the OMSCS code of conduct.
