Safe Reinforcement Learning
Institute for Dynamic Systems and Control, ETHZ
This work was done under the supervision of Dr. Melanie Zeilinger at the Institute for Dynamic Systems and Control at ETHZ. The motivation for the project comes from the growing complexity of systems and the increasing availability of data, which have given rise to data-enabled controller design methods. While there are numerous approaches to this problem, controller design for safety-critical systems remains a challenge.
With this in mind, this project aims to design safe learning-based controllers for constrained systems using Thompson sampling and scenario-based optimization. This goal is achieved in three parts. First, a Bayesian optimization method is illustrated using a simple Thompson sampling algorithm to efficiently balance exploration and exploitation when learning to optimize actions. This method is then extended to learning-based control using model-based reinforcement learning: a constrained open-loop finite-horizon control problem is solved to compute approximately optimal input sequences, while reinforcement learning optimizes closed-loop performance. Finally, using the scenario framework, an uncertain convex program is solved so that randomized uncertain constraints are satisfied with high probability.
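The exploration/exploitation trade-off that Thompson sampling resolves can be illustrated on a toy Bernoulli bandit. This is only a minimal sketch, not the project's actual controller; the arm probabilities, horizon, and Beta priors below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed Bernoulli bandit; the true success
# probabilities below are unknown to the learner.
true_p = np.array([0.3, 0.5, 0.7])
n_arms = len(true_p)

# Beta(1, 1) prior on each arm's success probability.
successes = np.ones(n_arms)
failures = np.ones(n_arms)

for t in range(2000):
    # Draw one plausible success probability per arm from the
    # posterior, then act greedily on that sample: arms with wide
    # posteriors still get sampled high occasionally, so they are
    # explored, while well-understood good arms are exploited.
    theta = rng.beta(successes, failures)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

pulls = successes + failures - 2.0
print(pulls)
```

Because actions are chosen by sampling from the posterior rather than by maximizing a point estimate, the pull counts concentrate on the best arm as its posterior sharpens, without any explicit exploration schedule.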
I recently had the opportunity to present this work at the Machine Learning Summer School (MLSS) 2020 in Tübingen.