ROBUST ADAPTIVE CONTROL USING REINFORCEMENT LEARNING FOR NONLINEAR SYSTEM WITH INPUT CONSTRAINTS

This paper proposes a novel approach to design a controller in discrete time for the class of uncertain nonlinear systems in the presence of magnitude constrains of control signal which are treated as the saturation nonlinearity. A associative law between reinforcement learning algorithm based on adaptive NRBF neural networks and the theory of robust control H is set up in a novel control structure, in which the proposed controller allows learning and control on-line to compensate multiple uncertain nonlinearities as well as minimizing both the H tracking performance index function and the unknown nonlinear dynamic approximation errors. The novel theorem of robust stabilization of the closed-loop system is declared and proved. Simulation results verify the theoretical analysis. ∞


INTRODUCTION
Direct adaptive controllers for a class of nonaffine and affine uncertain nonlinear discretetime systems with input constraints using reinforcement learning neural networks are proposed in [1][2].The performance index functions of the long term tracking error are predicted and minimized by reinforcement learning algorithms.As results, some of nonlinear components such as unknown dynamic functions, the control inputs constrained saturation and unknown but bounded disturbance are compensated.In addition, the tracking error and the functional approximation error of neural networks are uniform ultimate bounded (UUB) using Lyapunov approach.
In the theory of robust control, available knowledge of system is to exploit absolutely such as nominal models or the upper bound of uncertain parameters to design robust stable controllers.However the robust controllers trend to become "hard" controller because they contain constant parameters.On the other hand, reinforcement learning (RL) methods can be learn online to find better control laws without the available knowledge.However, RL methods deal with processes of try and error, therefore at the intermediate stage of learning and control the RL systems may go through periods of unstable behavior.
Recently, to solve the above problem, some methods of robust RL have been proposed as (1) a RL algorithm using neural networks (NN) combines with the concept of sliding mode control [4].This method makes the system be oscillated by the chattering phenomenon, although the learning system is robust.(2) A tool of robust control theory, Integral Quadratic Constraints (IQCs), is used in robust reinforcement learning [5][6].By replacing the nonlinear and time-varying components of the NNs with IQCs, NN's weights are analyzed and constrained in stable dynamic ranges.As results, NNs generate control signals which make the system be robust stable during learning and control online.(3) Another method is designed based on theory of control for the system whose modeling errors can be pre-interpreted as unknown but bounded disturbance [7].The main purpose of this method is that an online function known as Hamilton-Jacobi-Isacc (HJI) is approximated to drive the worst disturbance and the optimal control simultaneously.
This paper contributes some novel points of view as follows Combining a reinforcement learning algorithm based on neural networks and the theory of control to propose a novel robust adaptive control structure diagram for a class of the nonlinear discrete time system with input constrains.

H
The new robust adaptive reinforcement learning controller is analyzed and designed.The new robust stable theorem is shown and proved.The remainder of this paper is arranged as follows.Section 2 describes properties of the function approximator using NN as adaptive normalized RBF.A description of the uncertain nonlinear discrete time system with input constrains is presented in section 3. Small gain theorem in robust control theory is reviewed in section 4. In section 5, a novel control structure diagram is shown and a novel theorem of robust stabilization of the closed-loop system is declared and proved, subsequently.The results of simulation in section 6 verify the effects of the proposed controller and conclusions are drawn in section 7.

APPROXIMATION PROPERTY OF ADAPTIVE NORMALIZED RBF -ANRBF
Choosing suitable function approximators in RL is essential for speeding up learning and control.ANRBF with ability to adapt centers and widths of basic functions give approximation performance better than other neural networks [8].
A continuous function within a compact subset is approximated by ANRBF as Where W is a target weight matrix of the hidden layer to the output; ε is vector of functional approximation error.The actual ANRBF output is defined as is a weight matrix updated online at instant k; is number of hidden-layer nodes, denotes the vector of center and the value of width of respectively and is number of input-layer nodes.
i n Remark: with limited , the following inequality is always satisfied Trang 6

UNCERTAIN NONLINEAR DISCRETE TIME SYSTEM DESCRIPTION
Consider the following uncertain nonlinear discrete time system Where , is the vector of state at instant k ; is the unknown nonlinear dynamics of the system; is the control input constrained saturation; and is the unknown but bounded disturbance.
Given a reference trajectory and its past values, the vector of tracking error is defined as Define the filtered tracking error as is the next value of ; are the past values for ; is an identity matrix; with is constant diagonal positive definite matrix chosen so that its eigenvalues are within the unit circle.Consequently, if then will go to zero.Combining (4) ( 5) and ( 6) we get The control purpose is to make the tracking error of the system (7) achieve the robust performance index.

CONTROL FOR DISCRETE TIME SYSTEMS
∞ H ∞ H Robust control deals with a system shown in Fig. 1, where G is the controlled plant, K is the controller, is the control input, is the output of plant supposed measurement available to the controller.The controller K is designed to stabilize the closed loop system based on model G .However there is difference between the model and actual plant dynamics, the feedback loop could be unstable.The effect of modeling error can be seen as an unknown disturbance According to the Small Gain Theorem, the system in Fig. 1 will be stable if the condition as follows is satisfied.
, ρ is a specified attenuation level; η is the positive constant depending on initial conditions; is number of steps to the final state.N

Basic control law
At early stages of learning online, the control loop using NN whose weights are selected random from [ will be unstable.Therefore using a basic control law to make system be stable is necessary [1][2].This control law provides the supervised signals which allow the reinforcement learning system turning NN's weights online rather than offline training.To find it, the auxiliary control input is defined as Where is the function approximation of and is a diagonal matrix.The actual control input constrained saturation is defined Where is the upper bound for .The closed loop system can be written as And and are defined as follows to reject the effect of Combining (11) ( 13) and ( 14) we get Trang 8 Where is the basic control law. ) k ( Le u

Robust Adaptive Reinforcement Learning -RARL
Fig. 2 represents a RARL system based on the special structure known as actor-critic [2][3].
Here, the actor and critic are based on ANRBFs.
Remark: in Eq. ( 8) is replaced by and ε is the total of the functional approximation error of both actor and critic.

Value function
The performance index at instant is proposed as And the value function at instant becomes k is a discount factor which makes converge when .The optimal value function satisfies the Optimal Bellman Equal as Solution of Eq. ( 19) could not be found by analytic or the bellman meshed diagram because the model is not available.Hereafter is approximated based on the actor-critic system, in which the output of critic is used to approximate , and the output of actor generates the control signal to approximate .Weights of actor are updated by the signal from critic.

Critic
The critic is used to approximate the value function to

And
; ; is the weight matrix, is the vector of actor functions, c n is the number of hidden-layer nodes, is the input to the critic.The law for updating weights is proposed as Where ℜ ∈ c α is the positive constant representing learning rate.

Actor
The function in Eq. ( 4) is approximate to Where , , a n , are the weight matrix, the vector of actor functions, number of hidden-layer nodes and the input to the actor respectively.The law updating weights is proposed as ) Where ℜ ∈ a α is the positive constant representing learning rate.

Robust stability
Theorem: given the bounded reference trajectory and its past value, defined the auxiliary control input in Eq. ( 9), the is the maximum singular value of the gain matrix in Eq. ( 15) satisfies as And the function of performance index in Eq. ( 17), actor-critic structure base on ANRBF, the laws of updating weight for critic as Eq. ( 22) and actor as Eq. ( 24) then during learning and control online, the tracking error of the closed loop system will be achieving the robust stability.
∞ H Proof: See the Appendix.

SIMULATION
Nonlinear system for simulation to verify proposed controller is given by Eq. ( 26) respectively.The control objective is to design RARL so that tracks desired trajectory with considering saturated gain phenomenon of the control input. is given as

Trang 13
The sampling interval is taken as s .T 05 0 = and the white Gaussian noise with a standard deviation of 0.005 is added to the system.The time duration of simulation is taken 300s.The unknown disturbance is chosen as The gain of the basis control input is chosen as First, to show the effect of proposed controller, the RARL controller is removed out of the closed loop.The uncertainty parameters are selected as 2 1 1 . In Fig. 3 it can be seen that the tracking error given by the basic controller are bounded but the performance is very poor.Now we add the RARL controller to the closed loop.The robust tracking performance is presented in Fig. 4 and Fig. 5.Because of the activation of disturbance, at the second of 100 the tracking error is overshoot but it quickly goes to zero asymptotically.The Fig. 6 presents the control input in which its gain is constrained in range of The robust tracking performance with in Fig. 8. From Fig. 4, 7 and 8 we can see that they are robust for all the parameters.

CONCLUSION
This paper proposes the method which combined reinforcement learning based on neural network ANRBF and the robust theory to design a robust adaptive reinforcement learning controller for a class of the uncertain nonlinear discrete time system with input constrains which are treated as the saturation nonlinearity.The proposed controller not only compensates some uncertainty nonlinear components but also gives the robust tracking performance.

H
An Adaptive controller with robust tracking performance using the recurrent CMAC neural network to get rid of chattering phenomenon for a class of multivariable uncertain nonlinear system is proposed in [9].Develop a RARL controller using CMAC is next research.In addition applying RARL to control for real plants is considered next.