Safe Policy Search in Higher Dimensions

For my master's thesis, I collaborated with Prof. Dr. Andreas Krause in the Learning and Adaptive Systems Group at ETHZ.

Parameters for robotics algorithms need to be tuned to maximize performance on the real system. Bayesian Optimization (BO) has been used to automate this process. However, for safety-critical systems, evaluating unsafe parameters during the optimization process should be avoided. Recently, a safe BO algorithm, SAFEOPT, was proposed; it employs Gaussian Processes to evaluate only those parameters that satisfy the safety constraints with high probability.
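The idea of evaluating only parameters that are safe with high probability can be illustrated with a small sketch. This is a simplified illustration, not the SAFEOPT implementation: it assumes a one-dimensional parameter, a zero-mean Gaussian Process with an RBF kernel, and a rule that treats a candidate as safe when the lower confidence bound `mean - beta * std` stays above a safety threshold. The function names, the kernel hyperparameters, and the value of `beta` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def safe_set(x_train, y_train, candidates, threshold, beta=2.0):
    """Candidates whose lower confidence bound meets the safety threshold."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    return candidates[mean - beta * std >= threshold]

# One known-safe initial parameter (assumed) with a measured safety value of 1.0.
x_train = np.array([0.5])
y_train = np.array([1.0])
candidates = np.linspace(0.0, 1.0, 101)

safe = safe_set(x_train, y_train, candidates, threshold=0.0)
print(f"{len(safe)} of {len(candidates)} candidates are considered safe")
```

With a single safe observation, only candidates close to it are classified as safe; far from the data the posterior uncertainty is large, so the lower bound falls below the threshold and those parameters are never evaluated.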

Even so, BO is known to scale poorly to higher dimensions (d > 20). To overcome this limitation, I developed the SAFEOPT-HD algorithm, which identifies relevant domain regions that efficiently trade off performance and safety and restricts the BO search to this preprocessed domain. Using cheap (and potentially inaccurate) simulation models, offline computations identify domain subspaces that are likely to yield optimal policies, which significantly reduces the domain size. Combined with SAFEOPT, this yields a safe BO algorithm applicable to problems with large input dimensions. To alleviate the issues caused by the sparsity of the non-uniform preprocessed domain, I implemented a method that systematically generates new controller parameters with desirable properties. The efficacy of SAFEOPT-HD is illustrated by optimizing a 48-dimensional control policy for full position control of a quadrotor while guaranteeing safety.
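A rough sketch of what an offline domain reduction step could look like is given below. It is not the SAFEOPT-HD procedure itself, only a simplified stand-in: it samples parameters uniformly, scores them with a cheap simulator, discards those below a safety margin, and keeps the best performers as the reduced domain. The simulator `toy_simulator`, the function `preprocess_domain`, the sampling range, and all numeric settings are hypothetical.

```python
import numpy as np

def toy_simulator(params):
    """Hypothetical cheap model returning (performance, safety) per row."""
    performance = -np.sum((params - 0.3) ** 2, axis=1)
    safety = 1.0 - np.max(np.abs(params), axis=1)
    return performance, safety

def preprocess_domain(simulator, dim, n_samples, keep, safety_margin, rng):
    """Keep the best-performing simulated parameters that clear the margin."""
    samples = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    performance, safety = simulator(samples)
    feasible_mask = safety >= safety_margin
    feasible = samples[feasible_mask]
    feasible_perf = performance[feasible_mask]
    # Sort feasible samples by simulated performance, best first.
    order = np.argsort(feasible_perf)[::-1][:keep]
    return feasible[order]

rng = np.random.default_rng(0)
reduced = preprocess_domain(
    toy_simulator, dim=48, n_samples=5000, keep=50, safety_margin=0.05, rng=rng
)
print(reduced.shape)
```

The resulting finite set of candidates could then serve as the search domain for a safe BO routine, in place of the full 48-dimensional box.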

The results from this work were presented at the 16th Women in Machine Learning Workshop @ NeurIPS 2021 poster session.