Abstract
Reinforcement learning offers intuitive and broadly applicable approaches to solving various problems. Its central premise is to learn to complete tasks by interacting with the environment and observing which actions perform better with respect to a reward signal. Methods from reinforcement learning have long been applied in aerospace and have more recently seen renewed interest in space applications. Spacecraft control problems can benefit from intelligent techniques when faced with significant uncertainties, as is common in space environments. Solving these control problems using reinforcement learning remains a challenge, partly due to long training times and the sensitivity of performance to hyperparameters, which require careful tuning. In this work we address both issues for a sample spacecraft control problem. To reduce training times compared to other approaches, we simplify the problem by discretising the action space and use a data-efficient algorithm to train the agent. Furthermore, we employ an automated approach to hyperparameter selection that optimises a specified performance metric. Our approach is tested on a 3-DOF powered descent problem with uncertainties in the initial conditions. We run experiments with two different problem formulations: a 'shaped' state representation that guides the agent, and a 'raw' state representation with unprocessed values of position, velocity and mass. The results show that an agent can learn a near-optimal policy efficiently when the action space and state space are defined appropriately. Using the raw state representation led to 'reward-hacking' and poor performance, which highlights the importance of the problem and state-space formulation in successfully training reinforcement learning agents. In addition, we show that the optimal hyperparameters can vary significantly based on the choice of loss function. Using two sets of hyperparameters optimised for different loss functions, we demonstrate that in both cases the agent can find near-optimal policies with performance comparable to previously applied methods.
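The core ideas in the abstract (discretised actions, a binned state so tabular Q-learning applies, a shaped terminal reward, and automated hyperparameter selection against a performance metric) can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: it uses a 1-DOF toy descent with made-up constants, and a hypothetical random search in place of whatever tuner the authors actually used. All names, bin sizes, and dynamics parameters here are assumptions for illustration only.

```python
import numpy as np

# Toy 1-DOF vertical descent (illustrative constants, not the paper's 3-DOF model).
G = -1.62                                        # lunar-like gravity, m/s^2
DT = 0.1                                         # integration step, s
THRUST_LEVELS = np.array([0.0, 1.0, 2.0, 3.0])   # discretised actions, m/s^2

def step(alt, vel, a_idx):
    """Advance the toy dynamics one step; returns (alt, vel, reward, done)."""
    vel += (G + THRUST_LEVELS[a_idx]) * DT
    alt += vel * DT
    done = alt <= 0.0
    # 'Shaped' reward: penalise touchdown speed, small per-step cost otherwise.
    reward = -abs(vel) if done else -0.01
    return max(alt, 0.0), vel, reward, done

def discretise(alt, vel, n_alt=20, n_vel=20):
    """Bin the continuous state so a tabular Q-function applies."""
    ai = min(int(alt / 100.0 * n_alt), n_alt - 1)
    vi = min(max(int((vel + 20.0) / 40.0 * n_vel), 0), n_vel - 1)
    return ai, vi

def train(episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((20, 20, len(THRUST_LEVELS)))
    for _ in range(episodes):
        # Uncertain initial conditions, echoing the paper's problem setup.
        alt, vel = rng.uniform(80.0, 100.0), rng.uniform(-12.0, -8.0)
        for _ in range(1000):                    # cap episode length
            s = discretise(alt, vel)
            a = rng.integers(len(THRUST_LEVELS)) if rng.random() < eps \
                else int(np.argmax(Q[s]))
            alt, vel, r, done = step(alt, vel, a)
            target = r if done else r + gamma * np.max(Q[discretise(alt, vel)])
            Q[s][a] += alpha * (target - Q[s][a])  # standard Q-learning update
            if done:
                break
    return Q

def evaluate(Q, episodes=20, seed=2):
    """Performance metric: mean accumulated reward of the greedy policy."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(episodes):
        alt, vel = rng.uniform(80.0, 100.0), rng.uniform(-12.0, -8.0)
        for _ in range(1000):
            alt, vel, r, done = step(alt, vel, int(np.argmax(Q[discretise(alt, vel)])))
            total += r
            if done:
                break
    return total / episodes

def random_search(trials=5, seed=1):
    """Hypothetical automated hyperparameter selection: random search over
    (alpha, eps), scored by the evaluation metric above."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(trials):
        alpha, eps = rng.uniform(0.01, 0.5), rng.uniform(0.01, 0.3)
        score = evaluate(train(episodes=100, alpha=alpha, eps=eps))
        if score > best_score:
            best, best_score = (alpha, eps), score
    return best, best_score
```

The same skeleton would carry over to the 3-DOF problem by enlarging the state bins and action set; the choice of what the reward and state expose to the agent is exactly the 'shaped' versus 'raw' distinction the abstract discusses.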
Original language | English |
---|---|
Pages (from-to) | 223-255 |
Number of pages | 33 |
Journal | Optimization and Engineering |
Volume | 24 |
Issue number | 1 |
Early online date | 4 Oct 2021 |
DOIs | |
Publication status | E-pub ahead of print - 4 Oct 2021 |
Keywords
- efficiency
- reinforcement learning
- spacecraft
- descent
- Q-learning
Fingerprint
Dive into the research topics of 'Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning'. Together they form a unique fingerprint.
Activities
- Learning to Control a Spacecraft. Wilson, C. (Speaker). 22 Nov 2019. Activity: Talk or presentation › Oral presentation.
- 5th European Optimisation in Space Engineering Workshop. Wilson, C. (Participant). 21 Nov 2019 → 22 Nov 2019. Activity: Participating in or organising an event › Participation in conference.