Data-driven approximate Q-learning stabilization with optimality error bound analysis

  • Authors: Yongxiang Li, Chengzan Yang, Zhongsheng Hou, Yuanjing Feng, Chenkun Yin
  • Published in: Automatica: A Journal of IFAC, the International Federation of Automatic Control, ISSN 0005-1098, Vol. 103, 2019, pp. 435-442
  • Language: English
  • Full text not available
  • Abstract
    • Approximate Q-learning (AQL), a typical reinforcement learning method, has attracted extensive attention in recent years because of its outstanding ability to solve nonlinear optimal control problems when no knowledge or model of the plant is available. However, because of function approximation errors, AQL algorithms can only deliver a near-optimal solution, so a quantitative analysis of the optimality error bound is of real significance. In this paper, off-line value-iteration AQL is used to solve the model-free optimal stabilization control problem, and a new framework for analyzing the optimality error bound is proposed. First, to make the analysis of the optimality error bound convenient and clear, the Q-learning operator is well defined based on an estimate of the domain of attraction (DOA) of the closed loop. Second, a quantitative bound on the estimation error of the optimal Q-function is obtained by choosing Gaussian process regression as the function estimator. Finally, a quantitative bound on the optimality error, i.e., the error between the optimal cost and the actual cost of the AQL closed loop, is given. As the main result shows, the optimality error bound is determined by the approximation error bound of the function estimator (due to the finite number of data points) and by the difference between the two Q-functions obtained in the last two iterations (due to the finite number of iterations); a schematic of this decomposition and a minimal algorithm sketch follow below.
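Only as an illustration of the decomposition stated in the abstract's last sentence, the bound can be pictured in the following schematic shape; the constants, the norm, and the symbols below are placeholders of ours, not the quantities proved in the paper.

```latex
% Illustrative shape only: C_1, C_2, \varepsilon_{GP}, and the sup norm
% are placeholders, not the constants established in the paper.
\[
  J_{\hat{\pi}}(x_0) - J^{*}(x_0)
  \;\le\;
  \underbrace{C_1 \, \varepsilon_{\mathrm{GP}}}_{\text{finite data}}
  \;+\;
  \underbrace{C_2 \, \bigl\| \hat{Q}_{N} - \hat{Q}_{N-1} \bigr\|_{\infty}}_{\text{finite iterations}}
\]
```

Here \(\varepsilon_{\mathrm{GP}}\) stands for the Gaussian-process approximation error bound and \(\hat{Q}_{N}\), \(\hat{Q}_{N-1}\) for the Q-functions from the last two value-iteration steps.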

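A minimal, runnable sketch of the off-line value-iteration AQL loop with Gaussian process regression as the function estimator follows. Everything concrete in it is an assumption for illustration: the toy plant, the stage cost, the discount factor, the action grid, and the use of scikit-learn's GaussianProcessRegressor are stand-ins, and the sketch omits the paper's DOA estimate and its undiscounted stabilization setting.

```python
# Hypothetical sketch of off-line value-iteration approximate Q-learning
# (AQL) with Gaussian process regression as the function estimator.
# The plant, cost, and discount factor are illustrative stand-ins; the
# paper's undiscounted stabilization setting and DOA estimate are omitted.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def plant(x, u):
    # Unknown dynamics: the learner only ever sees sampled transitions.
    return 0.9 * x + 0.5 * u + 0.1 * np.sin(x)

def stage_cost(x, u):
    return x ** 2 + 0.1 * u ** 2

# Off-line data set of transitions (x, u, x') collected in advance.
N = 200
X = rng.uniform(-2.0, 2.0, N)
U = rng.uniform(-1.0, 1.0, N)
Xn = plant(X, U)

gamma = 0.95                          # discount (a simplifying assumption)
actions = np.linspace(-1.0, 1.0, 21)  # finite grid for the min over u
inputs = np.column_stack([X, U])

def fit_gp(targets):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(inputs, targets)
    return gp

targets = stage_cost(X, U)            # Q_0(x,u) = one-step cost
gp = fit_gp(targets)

for k in range(30):
    # Value-iteration target: Q_{k+1}(x,u) = c(x,u) + gamma * min_u' Q_k(x',u')
    min_next = np.array([
        gp.predict(np.column_stack([np.full_like(actions, xn), actions])).min()
        for xn in Xn
    ])
    new_targets = stage_cost(X, U) + gamma * min_next
    # Gap between successive iterates: the "finite iterations" term
    # in the optimality error bound is driven by this quantity.
    gap = np.abs(new_targets - targets).max()
    targets = new_targets
    gp = fit_gp(targets)
    if gap < 1e-3:
        break

def greedy_policy(x):
    # Control law induced by the final Q-estimate.
    q = gp.predict(np.column_stack([np.full_like(actions, x), actions]))
    return actions[np.argmin(q)]

print(f"stopped after {k + 1} iterations, gap = {gap:.4f}")
print(f"greedy input at x = 1.0: u = {greedy_policy(1.0):.3f}")
```

Refitting the GP on the Bellman targets each sweep mirrors value iteration over a fixed off-line data set; the gap variable tracks the difference between successive iterates, which is the quantity the abstract ties to the finite-iteration part of the bound.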
