Multi-agent output formation tracking control method and system

Document No.: 85349 | Publication date: 2021-10-08

Reading note: This technology, "Multi-agent output formation tracking control method and system", was designed by Dong Xiwang (董希旺), Shi Yu (石宇), Yu Jianglong (于江龙), Hua Yongzhao (化永朝), Li Qingdong (李清东), Ren Zhang (任章) and Lü Jinhu (吕金虎) on 2021-07-15. Abstract: The invention relates to a multi-agent output formation tracking control method and system. The method first designs a distributed formation trajectory generator based on local communication information, which generates the desired formation trajectories of the heterogeneous agents in real time. It then applies the principle of reinforcement learning, using input and output data of the system model, to obtain a stabilizing optimal feedback controller through online iterative optimization. Finally, based on the online learning result, it designs an output formation tracking control feedforward compensation controller, thereby achieving formation tracking control. The invention greatly saves communication resources and reduces the communication burden; it requires no model information of the follower agents, adapts better to the environment, and achieves high tracking control accuracy.

1. A multi-agent output formation tracking control method, characterized by comprising the following steps:

establishing a communication network topology model of the multi-agent system;

establishing a leader agent model;

designing a desired formation configuration for the follower agent;

designing a distributed formation trajectory generator from the communication network topology model, the leader agent model, and the desired formation configuration;

generating, with the distributed formation trajectory generator, a reference trajectory for the follower agent;

causing, with a formation trajectory tracking controller, the output trajectory of the follower agent to track the reference trajectory.

2. The method as claimed in claim 1, wherein establishing the communication network topology model of the multi-agent system specifically comprises:

establishing a first communication topology relation among the follower agents;

establishing a second communication topology relation between the follower agents and the leader agent;

and establishing the communication network topology model of the multi-agent system according to the first communication topology relation and the second communication topology relation.

3. The multi-agent output formation tracking control method of claim 1, wherein the leader agent model is represented as:

$\dot{\xi}_0(t) = S\xi_0(t)$

$y_0(t) = R\xi_0(t)$

wherein $\dot{\xi}_0(t)$ represents the time derivative of the leader agent state; $\xi_0(t)$ represents the leader agent state; $S$ represents the leader agent coefficient matrix; $R$ represents the leader agent output matrix; and $y_0(t)$ represents the leader agent output.

4. The multi-agent output formation tracking control method of claim 1, wherein the specific expression of the desired formation configuration of the follower agents is:

[…]

wherein […] represents the desired formation configuration of the i-th follower agent; […] represents the state offset of the i-th follower agent relative to the leader agent; and $R$ represents the leader agent output matrix.

5. The multi-agent output formation tracking control method according to claim 1, wherein the specific expression of the distributed formation trajectory generator is as follows:

[…]

wherein $\dot{\xi}_i(t)$ represents the time derivative of the state of the i-th distributed formation trajectory generator; $S$ represents the leader agent coefficient matrix; $\xi_i(t)$ represents the state of the i-th distributed formation trajectory generator; $F$ represents a constant gain matrix; […] represents the reference trajectory of the i-th follower agent; $j$ indexes the j-th follower agent; $N_i$ represents the neighbor set of the i-th follower agent; […] represents the formation vector of the i-th follower agent; $g_i$ represents the communication weight from the leader agent to the i-th follower agent; $w_{ij}$ represents the communication weight from the j-th agent to the i-th agent; $\xi_0(t)$ represents the leader agent state; $v_i(t)$ represents the trajectory generation compensation term; and $R$ represents the leader agent output matrix.

6. The multi-agent output formation tracking control method according to claim 1, wherein the causing of the follower agent's output trajectory to track the reference trajectory by the formation trajectory tracking controller specifically comprises:

designing an optimal feedback controller by using a reinforcement learning algorithm;

designing an output formation tracking control feedforward compensation controller according to the optimal feedback control gain of the optimal feedback controller;

obtaining the formation trajectory tracking controller according to the optimal feedback controller and the output formation tracking control feedforward compensation controller;

causing an output trajectory of the follower agent to track the reference trajectory with the formation trajectory tracking controller.

7. The multi-agent output formation tracking control method according to claim 6, wherein the designing of the optimal feedback controller by using the reinforcement learning algorithm specifically comprises:

designing an index function of a reinforcement learning algorithm;

performing a parametric fit of the index function according to the state of the distributed formation trajectory generator and the state of the follower agent, to obtain a fitted index function;

initializing the fitted index function, an iteration controller and an iteration count;

collecting, at a preset time interval, all agent states and the control inputs of the follower agent model in the multi-agent system after a data excitation controller has been applied to the system, to obtain collected data;

when the number of unused data items in the collected data reaches a preset value, iteratively updating the iteration controller;

and when the difference between the current iteration controller and the iteration controller obtained in the previous iterative update is smaller than the error tolerance threshold, ending the iterative updating.

8. The multi-agent output formation tracking control method according to claim 6, wherein the specific expression of the output formation tracking control feedforward compensation controller is as follows:

[…]

wherein $z_{ic}(t)$ represents the control quantity of the output formation tracking control feedforward compensation controller; […] represents the average value of the input matrix of the i-th follower agent model; […] and […] represent the optimal feedback control gains of the optimal feedback controller; $S$ represents the leader agent coefficient matrix; […] represents the formation vector of the i-th follower agent; and […] is the time derivative of the formation vector of the i-th follower agent.

9. The multi-agent output formation tracking control method according to claim 6, wherein the specific expression of the formation trajectory tracking controller is as follows:

[…]

wherein […] represents the control quantity of the formation trajectory tracking controller; […] and […] represent the optimal feedback control gains of the optimal feedback controller; $z_{ic}(t)$ represents the control quantity of the output formation tracking control feedforward compensation controller; $R_i$ represents a designable variable of the index function, used to constrain the magnitude of the control gain; $\xi_i(t)$ represents the state of the i-th distributed formation trajectory generator; $x_i(t)$ represents the state of the i-th follower agent; and $B_i$ represents the input matrix of the i-th follower agent.

10. A multi-agent output formation tracking control system, comprising:

the network model establishing module is used for establishing a communication network topology model of the multi-agent system;

the leader model establishing module is used for establishing a leader agent model;

a formation configuration design module for designing a desired formation configuration for a follower agent;

a trajectory generator design module to design a distributed formation trajectory generator according to the communication network topology model, the leader agent model, and the desired formation configuration;

a reference trajectory generation module to generate a reference trajectory for the follower agent using the distributed formation trajectory generator;

a tracking module for causing the output trajectory of the follower agent to track the reference trajectory using a formation trajectory tracking controller.

Technical Field

The invention relates to the technical field of control theory and unmanned system equipment, in particular to a multi-agent output formation tracking control method and system.

Background

Formation control of multi-agent systems is currently a research hotspot in control theory and unmanned system equipment. It is widely applied in engineering, in civil and military scenarios such as cooperative surveying and mapping, reconnaissance and cargo transportation by unmanned aerial vehicles, unmanned vehicles and robots, and coordinated cluster attack by weapon systems. When the cluster is large, traditional centralized control schemes based on leader-following, behavior planning and the like consume substantial communication resources and impose a heavy communication burden.

Therefore, a method and a system for controlling multi-agent output formation tracking are needed to save communication resources and reduce communication burden.

Disclosure of Invention

The invention aims to provide a multi-agent output formation tracking control method and a multi-agent output formation tracking control system, so as to save communication resources and reduce communication burden.

In order to achieve the purpose, the invention provides the following scheme:

a multi-agent output formation tracking control method, comprising:

establishing a communication network topology model of the multi-agent system;

establishing a leader agent model;

designing a desired formation configuration for the follower agent;

designing a distributed formation trajectory generator from the communication network topology model, the leader agent model, and the desired formation configuration;

generating, with the distributed formation trajectory generator, a reference trajectory for the follower agent;

causing, with a formation trajectory tracking controller, the output trajectory of the follower agent to track the reference trajectory.

Optionally, establishing the communication network topology model of the multi-agent system specifically includes:

establishing a first communication topology relation among the follower agents;

establishing a second communication topology relation between the follower agents and the leader agent;

and establishing the communication network topology model of the multi-agent system according to the first communication topology relation and the second communication topology relation.

Optionally, the leader agent model is represented as:

$\dot{\xi}_0(t) = S\xi_0(t)$

$y_0(t) = R\xi_0(t)$

wherein $\dot{\xi}_0(t)$ represents the time derivative of the leader agent state; $\xi_0(t)$ represents the leader agent state; $S$ represents the leader agent coefficient matrix; $R$ represents the leader agent output matrix; and $y_0(t)$ represents the leader agent output.

Optionally, the specific expression of the desired formation configuration of the follower agents is:

[…]

wherein […] represents the desired formation configuration of the i-th follower agent; […] represents the state offset of the i-th follower agent relative to the leader agent; and $R$ represents the leader agent output matrix.

Optionally, the specific expression of the distributed formation trajectory generator is as follows:

[…]

wherein $\dot{\xi}_i(t)$ represents the time derivative of the state of the i-th distributed formation trajectory generator; $S$ represents the leader agent coefficient matrix; $\xi_i(t)$ represents the state of the i-th distributed formation trajectory generator; $F$ represents a constant gain matrix; […] represents the reference trajectory of the i-th follower agent; $j$ indexes the j-th follower agent; $N_i$ represents the neighbor set of the i-th follower agent; […] represents the formation vector of the i-th follower agent; $g_i$ represents the communication weight from the leader agent to the i-th follower agent; $w_{ij}$ represents the communication weight from the j-th agent to the i-th agent; $\xi_0(t)$ represents the leader agent state; $v_i(t)$ represents the trajectory generation compensation term; and $R$ represents the leader agent output matrix.

Optionally, the causing, by the formation trajectory tracking controller, the output trajectory of the follower agent to track the reference trajectory specifically includes:

designing an optimal feedback controller by using a reinforcement learning algorithm;

designing an output formation tracking control feedforward compensation controller according to the optimal feedback control gain of the optimal feedback controller;

obtaining the formation trajectory tracking controller according to the optimal feedback controller and the output formation tracking control feedforward compensation controller;

causing, with the formation trajectory tracking controller, the output trajectory of the follower agent to track the reference trajectory.

Optionally, the designing an optimal feedback controller by using a reinforcement learning algorithm specifically includes:

designing an index function of a reinforcement learning algorithm;

performing a parametric fit of the index function according to the state of the distributed formation trajectory generator and the state of the follower agent, to obtain a fitted index function;

initializing the fitted index function, an iteration controller and an iteration count;

collecting, at a preset time interval, all agent states and the control inputs of the follower agent model in the multi-agent system after a data excitation controller has been applied to the system, to obtain collected data;

when the number of unused data items in the collected data reaches a preset value, iteratively updating the iteration controller;

and when the difference between the current iteration controller and the iteration controller obtained in the previous iterative update is smaller than the error tolerance threshold, ending the iterative updating.

Optionally, the specific expression of the output formation tracking control feedforward compensation controller is as follows:

[…]

wherein $z_{ic}(t)$ represents the control quantity of the output formation tracking control feedforward compensation controller; […] represents the average value of the input matrix of the i-th follower agent model; […] and […] represent the optimal feedback control gains of the optimal feedback controller; $S$ represents the leader agent coefficient matrix; […] represents the formation vector of the i-th follower agent; and […] is the time derivative of the formation vector of the i-th follower agent.

Optionally, the specific expression of the formation trajectory tracking controller is as follows:

[…]

wherein […] represents the control quantity of the formation trajectory tracking controller; […] and […] represent the optimal feedback control gains of the optimal feedback controller; $z_{ic}(t)$ represents the control quantity of the output formation tracking control feedforward compensation controller; $R_i$ represents a designable variable of the index function, used to constrain the magnitude of the control gain; $\xi_i(t)$ represents the state of the i-th distributed formation trajectory generator; $x_i(t)$ represents the state of the i-th follower agent; and $B_i$ represents the input matrix of the i-th follower agent.

A multi-agent output formation tracking control system, comprising:

the network model establishing module is used for establishing a communication network topology model of the multi-agent system;

the leader model establishing module is used for establishing a leader agent model;

a formation configuration design module for designing a desired formation configuration for a follower agent;

a trajectory generator design module to design a distributed formation trajectory generator according to the communication network topology model, the leader agent model, and the desired formation configuration;

a reference trajectory generation module to generate a reference trajectory for the follower agent using the distributed formation trajectory generator;

a tracking module for causing the output trajectory of the follower agent to track the reference trajectory using a formation trajectory tracking controller.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

the invention provides a multi-agent output formation tracking control method and a multi-agent output formation tracking control system.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

Fig. 1 is a flowchart of a multi-agent output formation tracking control method provided in embodiment 1 of the present invention;

fig. 2 is a flowchart of a method for enabling an output trajectory of a follower agent to track a reference trajectory by using a formation trajectory tracking controller in a multi-agent output formation tracking control method according to embodiment 1 of the present invention;

fig. 3 is a structural diagram of a multi-agent output formation tracking control system according to embodiment 2 of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention aims to provide a multi-agent output formation tracking control method and a multi-agent output formation tracking control system, so as to save communication resources and reduce communication burden.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

Embodiment 1:

Formation control of multi-agent systems is currently a research hotspot in control theory and unmanned system equipment. When the cluster is large, traditional centralized control schemes based on leader-following, behavior planning and the like consume substantial communication resources and impose a heavy communication burden.

Meanwhile, most existing formation control research addresses the formation control problem of homogeneous agents. Therefore, to accommodate the intelligent decision-making, networked organization and diversified configurations of future unmanned systems, a cooperative formation control method that is more intelligent and more general needs to be established.

Referring to fig. 1, the present invention provides a multi-agent output formation tracking control method, including:

S1: establishing a communication network topology model of the multi-agent system;

Because most existing formation control research addresses only homogeneous agents, the multi-agent system of this embodiment is a set of heterogeneous agents comprising 1 leader and N followers.

Therefore, communication topologies need to be established for the leader agent and the follower agent, respectively, namely:

(1) establishing a first communication topology relation among the follower agents;

The communication topology among follower agents can be modeled mathematically as a graph $G = (S, E, W)$, where $S = \{s_1, s_2, \dots, s_N\}$ is the set of all follower agents and $s_i$ denotes the i-th follower; the index $i$ takes positive integer values between 1 and N.

The communication interaction from the i-th follower agent to the j-th follower agent is denoted $e_{ij} = (s_i, s_j)$, where the index $j$ takes positive integer values between 1 and N and $j \neq i$.

Let the set $E = \{e_{ij} = (s_i, s_j) : s_i, s_j \in S\}$ be the edge set of the system. If the channel $e_{ij}$ exists, follower agent j is called a neighbor of follower agent i, and $N_i = \{s_j \in S : (s_i, s_j) \in E\}$ is defined as the neighbor set of agent i. The constant $w_{ij}$ is the communication weight from follower agent j to follower agent i: $w_{ij} = 1$ if and only if follower agent j is a neighbor of follower agent i, and $w_{ij} = 0$ otherwise. The adjacency matrix of graph $G$ is thus the $N \times N$ matrix whose element in row i and column j is $w_{ij}$, that is, $W = [w_{ij}]$.

The in-degree matrix describing each node is defined as $D = \mathrm{diag}\{\sum_{j=1}^{N} w_{ij}\}$, and the Laplacian matrix describing the first communication topology relation among the follower agents is defined as $L = D - W$.

(2) establishing a second communication topology relation between the follower agents and the leader agent;

The directed connections between the leader agent and the follower agents can be represented by a diagonal matrix, so the second communication topology relation is $L_c = \mathrm{diag}\{g_i\}$, where $g_i$ is the communication weight from the leader agent to the i-th follower agent: $g_i = 1$ if there is a communication connection from the leader agent to follower agent i, and $g_i = 0$ otherwise.

(3) establishing the communication network topology model of the multi-agent system according to the first communication topology relation and the second communication topology relation.

From the Laplacian matrix $L$ and the diagonal matrix $L_c$, a positive definite diagonal matrix $J$ can be found such that $J(L+L_c)+(L+L_c)^T J$ is a strictly positive definite matrix, which is defined as the communication network topology model. The established communication network topology model must satisfy the following condition: among the follower agents there exists a node from which every other node can be reached along a directed path, and the leader agent is connected to that node. The communication topology then meets the design requirements of the subsequent distributed formation trajectory generator.
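The topology model above can be sketched numerically. The following is a minimal illustration under stated assumptions: the three-follower chain topology is hypothetical, and the construction $J = \mathrm{diag}\{p_i/q_i\}$ with $(L+L_c)q = \mathbf{1}$ and $(L+L_c)^T p = \mathbf{1}$ is one known way to obtain such a matrix for this kind of topology; the source does not say how $J$ is found.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def is_positive_definite(A):
    """Cholesky test for a symmetric matrix: succeeds iff A is positive definite."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                C[i][j] = s ** 0.5
            else:
                C[i][j] = s / C[j][j]
    return True

# Hypothetical topology: leader -> follower 1 -> follower 2 -> follower 3.
N = 3
W = [[0, 0, 0],      # w_ij = 1 iff follower j is a neighbor of follower i
     [1, 0, 0],
     [0, 1, 0]]
g = [1, 0, 0]        # g_i = 1 iff the leader communicates to follower i

D = [sum(row) for row in W]                       # in-degrees
L = [[(D[i] if i == j else 0) - W[i][j] for j in range(N)] for i in range(N)]
H = [[L[i][j] + (g[i] if i == j else 0) for j in range(N)] for i in range(N)]  # L + L_c

# Assumed construction of J (not from the source): J = diag(p_i / q_i).
q = solve(H, [1.0] * N)
p = solve([[H[j][i] for j in range(N)] for i in range(N)], [1.0] * N)
J = [p[i] / q[i] for i in range(N)]

# Q = J (L + L_c) + (L + L_c)^T J, which should be strictly positive definite.
Q = [[J[i] * H[i][j] + H[j][i] * J[j] for j in range(N)] for i in range(N)]
topology_ok = is_positive_definite(Q)
```

If the leader were disconnected (all $g_i = 0$), $L + L_c$ would be singular and the linear solves would fail, which mirrors the connectivity condition stated above.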

S2: establishing a leader agent model;

the leader agent model is represented as:

$\dot{\xi}_0(t) = S\xi_0(t)$

$y_0(t) = R\xi_0(t)$

wherein $\dot{\xi}_0(t)$ represents the time derivative of the leader agent state; the $p \times 1$ vector $\xi_0(t)$ represents the leader agent state; the $p \times p$ matrix $S$ represents the leader agent coefficient matrix and is used to design different motion modes of the leader; the $q \times p$ matrix $R$ represents the leader agent output matrix; and the $q \times 1$ vector $y_0(t)$ represents the leader agent output;
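To illustrate how the coefficient matrix $S$ sets the leader's motion mode, the sketch below integrates $\dot{\xi}_0 = S\xi_0$, $y_0 = R\xi_0$ with a fourth-order Runge-Kutta step. The particular $S$ (a harmonic oscillator, giving periodic motion), $R$, initial state and step size are assumptions chosen for illustration; they are not values from the source.

```python
import math

S = [[0.0, 1.0], [-1.0, 0.0]]   # assumed coefficient matrix: harmonic oscillator
R = [[1.0, 0.0]]                # assumed output matrix: first state component

def matvec(A, v):
    """Matrix-vector product for small dense lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rk4_step(xi, dt):
    """One Runge-Kutta step of xi' = S xi."""
    k1 = matvec(S, xi)
    k2 = matvec(S, [x + dt / 2 * k for x, k in zip(xi, k1)])
    k3 = matvec(S, [x + dt / 2 * k for x, k in zip(xi, k2)])
    k4 = matvec(S, [x + dt * k for x, k in zip(xi, k3)])
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(xi, k1, k2, k3, k4)]

def simulate_leader(xi0, t_end, dt=0.01):
    """Return the leader state and output at time t_end."""
    xi = list(xi0)
    for _ in range(round(t_end / dt)):
        xi = rk4_step(xi, dt)
    return xi, matvec(R, xi)

xi_final, y_final = simulate_leader([1.0, 0.0], t_end=1.0)
```

With this choice the exact solution is $\xi_0(t) = [\cos t, -\sin t]^T$, so the output traces a sinusoid; a different $S$ (for example a nilpotent matrix) would give straight-line or polynomial motion instead.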

S3: designing a desired formation configuration for the follower agent;

A vector […] is used to describe the desired formation configuration of each follower agent relative to the leader agent, wherein the piecewise continuously differentiable function […] represents the state offset of the i-th follower agent relative to the leader agent. The specific expression of the desired formation configuration of the follower agents is:

[…]

wherein […] represents the desired formation configuration of the i-th follower agent; […] represents the state offset of the i-th follower agent relative to the leader agent; and $R$ represents the leader agent output matrix.

S4: designing a distributed formation trajectory generator from the communication network topology model, the leader agent model, and the desired formation configuration;

the specific expression of the distributed formation trajectory generator is as follows:

[…]

wherein $\dot{\xi}_i(t)$ represents the time derivative of the state of the i-th distributed formation trajectory generator; $S$ represents the leader agent coefficient matrix; $\xi_i(t)$ represents the state of the i-th distributed formation trajectory generator; $F$ represents a constant gain matrix that adjusts the stability and response characteristics of the distributed formation trajectory generator, with $F = -\mu M^{-1}$, where $M$ is a solution of the linear matrix inequality $S^T M + M S - (1-\varepsilon) I_p + \alpha M < 0$. The positive constant $\varepsilon$ lies between 0 and 1 and the positive constant $\alpha$ may be any positive real number; with suitable choices the matrix inequality is solvable, and both can serve as tunable parameters that give the distributed formation trajectory generator different responses. The positive constant $\mu$ satisfies $\mu \ge \lambda_{\max}(J)/\lambda_{\min}(J(L+L_c)+(L+L_c)^T J)$, and its specific value can be chosen by the designer as a performance tuning parameter of the distributed formation trajectory generator. In addition, $I_p$ represents the identity matrix of dimension p, and $\lambda_{\max}(J)$ and $\lambda_{\min}(J(L+L_c)+(L+L_c)^T J)$ respectively represent the maximum eigenvalue of $J$ and the minimum eigenvalue of $J(L+L_c)+(L+L_c)^T J$; […] represents the reference trajectory of the i-th follower agent; $j$ indexes the j-th follower agent; $N_i$ represents the neighbor set of the i-th follower agent; […] represents the formation vector of the i-th follower agent; $g_i$ represents the communication weight from the leader agent to the i-th follower agent, with $g_i = 1$ if there is a communication connection from the leader to agent i and $g_i = 0$ otherwise; $w_{ij}$ represents the communication weight from the j-th agent to the i-th agent, with $w_{ij} = 1$ if and only if agent j is a neighbor of agent i and $w_{ij} = 0$ otherwise; $\xi_0(t)$ represents the leader agent state; $R$ represents the leader agent output matrix; and $v_i(t)$ represents the trajectory generation compensation term, which satisfies the following formula:

[…]

wherein […] is the time derivative of the formation vector […] of the i-th follower agent.
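As a worked illustration of the linear matrix inequality that defines $M$ (an assumed special case, not taken from the source): suppose the leader coefficient matrix is skew-symmetric, $S^T = -S$, as for a leader moving on a circle, and try $M = m I_p$ with a scalar $m > 0$. The inequality then reduces to a scalar condition on $m$:

```latex
\begin{aligned}
S^{T}M + MS - (1-\varepsilon)I_p + \alpha M
  &= m\,(S^{T} + S) - (1-\varepsilon)I_p + \alpha m I_p \\
  &= \bigl(\alpha m - (1-\varepsilon)\bigr) I_p \;<\; 0
  \quad\Longleftrightarrow\quad m < \frac{1-\varepsilon}{\alpha}.
\end{aligned}
```

For instance, $\varepsilon = 0.5$ and $\alpha = 1$ permit $m = 0.25$, which gives $F = -\mu M^{-1} = -4\mu I_p$. Under this assumption, a smaller $m$ or a larger $\mu$ yields a larger feedback gain in the generator.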

It should be noted that other embodiments that enable designing a distributed formation trail generator based on a communication network topology model, a leader agent model, and a desired formation configuration are also within the scope of the present invention.
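A minimal simulation sketch of one consensus-type generator consistent with the symbol definitions in S4 is given below. Everything specific in it is an assumption made for illustration and is not the patent's own expression: the generator form $\dot{\xi}_i = S\xi_i + F\bigl[\sum_{j \in N_i} w_{ij}\bigl((\xi_i - h_i) - (\xi_j - h_j)\bigr) + g_i(\xi_i - h_i - \xi_0)\bigr] + v_i$, the symbol $h_i$ for the formation vector, the compensation term $v_i = \dot{h}_i - S h_i$, the chain topology, and the values of $S$, $R$, $F$ and $h_i$.

```python
S = [[0.0, 1.0], [-1.0, 0.0]]           # assumed leader coefficient matrix
R = [[1.0, 0.0]]                        # assumed leader output matrix
c = 24.0                                # assumed gain: F = -c*I (mu = 6, M = 0.25*I)
W = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]   # hypothetical chain: 1 -> 2 -> 3
g = [1, 0, 0]                           # leader talks to follower 1 only
h = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # assumed constant formation vectors

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rhs(state):
    """state[0] is xi_0; state[i + 1] is xi_i of follower i."""
    xi0, xis = state[0], state[1:]
    out = [matvec(S, xi0)]
    for i, xi in enumerate(xis):
        ei = [xi[k] - h[i][k] for k in range(2)]
        cons = [g[i] * (ei[k] - xi0[k]) for k in range(2)]
        for j, xj in enumerate(xis):
            for k in range(2):
                cons[k] += W[i][j] * (ei[k] - (xj[k] - h[j][k]))
        s_xi, s_h = matvec(S, xi), matvec(S, h[i])
        # v_i = dh_i/dt - S h_i = -S h_i for constant h_i (assumed compensation term)
        out.append([s_xi[k] - c * cons[k] - s_h[k] for k in range(2)])
    return out

def add(state, k, s):
    return [[x + s * d for x, d in zip(v, dv)] for v, dv in zip(state, k)]

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(add(state, k1, dt / 2))
    k3 = rhs(add(state, k2, dt / 2))
    k4 = rhs(add(state, k3, dt))
    incr = [[(a + 2 * b + 2 * m + d) / 6 for a, b, m, d in zip(*rows)]
            for rows in zip(k1, k2, k3, k4)]
    return add(state, incr, dt)

state = [[1.0, 0.0], [3.0, -2.0], [-1.0, 4.0], [0.5, 0.5]]
for _ in range(400):                    # 2 s with dt = 0.005
    state = rk4_step(state, 0.005)

xi0 = state[0]
formation_error = max(abs(state[i + 1][k] - h[i][k] - xi0[k])
                      for i in range(3) for k in range(2))
reference_outputs = [matvec(R, xi) for xi in state[1:]]   # R * xi_i per follower
```

Each generator uses only its neighbors' states and, for follower 1, the leader state, which is the distributed property the method relies on. Under the stated assumptions each $\xi_i - h_i$ converges to $\xi_0$, so $R\xi_i$ can serve as a reference trajectory that already carries the formation offset.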

S5: generating, with the distributed formation trajectory generator, a reference trajectory for the follower agent, wherein the reference trajectory includes the desired formation configuration information of the follower agent;

S6: causing, with a formation trajectory tracking controller, the output trajectory of the follower agent to track the reference trajectory.

As an optional embodiment, causing the output trajectory of the follower agent to track the reference trajectory with the formation trajectory tracking controller specifically includes, as shown in Fig. 2:

S61: designing an optimal feedback controller by using a reinforcement learning algorithm:

Controller design in the prior art requires complete dynamic model information for all agents. For the output regulation problem of heterogeneous systems, complete model information is needed to solve the output regulation equations in advance. Given the model uncertainty and modeling complexity that are widespread in real systems, the applicability of such methods is limited. In contrast, this embodiment designs the formation trajectory tracking controller based on a reinforcement learning algorithm, specifically as follows:

(1) designing an index function of the reinforcement learning algorithm

[…]

wherein the index factor […] is designed as a quadratic form of the tracking error and the follower control input; $\gamma_i$ is a positive constant satisfying $\gamma_i > 0$ and […], which can serve as a design parameter, different values yielding different controller response performance; $Q_i$ and $R_i$ are chosen as symmetric positive definite matrices, the designable variables of the index function, which respectively guide and constrain the error convergence level and the control gain of the reinforcement learning controller.

(2) performing a parametric fit of the index function according to the state of the distributed formation trajectory generator and the state of the follower agent, to obtain a fitted index function;

The state $\xi_i(t)$ of the distributed formation trajectory generator and the state $x_i(t)$ of the follower agent are stacked into an augmented vector $\theta_i(t) = [\xi_i(t)^T, x_i(t)^T]^T$, with which the index function is fitted parametrically. The fitted index function is represented as:

[…]

wherein $P_i$ is a real matrix of dimension $(p+n_i) \times (p+n_i)$, $S_i$ is a real vector of dimension $(p+n_i) \times 1$, and $T_i$ is a real number; these are the parameters of the index function. Partitioning the rows and columns of $P_i$ according to the dimensions $p$ and $n_i$ gives […]

(3) initializing the fitted index function, an iteration controller and an iteration count;

According to the structure given by formula (6), the initial parameter matrix is selected as […]. Following the partitioning of equation (7), the initial iteration controller is represented as […]. The gain matrices of the estimated feedback controller are denoted $K_{i1}$ and $K_{i2}$ and are initialized to […], respectively. The iteration count is initialized as $k = 0$.

(4) collecting, at a preset time interval, all agent states and the control inputs of the follower agent model in the multi-agent system after a data excitation controller has been applied to the system, to obtain collected data;

An arbitrary stabilizing data excitation controller […] is selected and applied to the multi-agent system, where $K_{i1}^0$ and $K_{i2}^0$ are any gain matrices that stabilize the follower agent model, of dimensions $m_i \times p$ and $m_i \times n_i$ respectively. The $n_i$-dimensional vector $e$ is chosen as a weighted sum of Gaussian white noise signals of random frequencies with appropriate dimensions, and serves as exploration noise that improves the stability of the reinforcement learning algorithm. For the system with the excitation applied, starting from time 0, all agent states and the control inputs of the follower agent model are collected every interval $\delta t$ and saved as one group of data.

The follower agent model is represented as:

$\dot{x}_i(t) = A_i x_i(t) + B_i u_i(t)$

$y_i(t) = C_i x_i(t)$

wherein $x_i(t)$ is a vector of dimension $m_i \times 1$ representing the state of the follower agent; $\dot{x}_i(t)$ is a vector of dimension $m_i \times 1$ representing the time derivative of the state of the follower agent; $u_i(t)$ is a vector of dimension $n_i \times 1$ representing the control input; $y_i(t)$ is a vector of dimension $p \times 1$ representing the controlled output; $x_i(t)$, $u_i(t)$ and $y_i(t)$ serve as the data source for the subsequent reinforcement learning. $A_i$, $B_i$ and $C_i$ are real matrices of dimensions $m_i \times m_i$, $m_i \times n_i$ and $p \times m_i$, representing the system matrix, input matrix and output matrix of the follower agent, respectively.

It should be noted that, in the heterogeneous agent control setting of the present invention, the system models of the leader agent and the follower agents, as well as those of different follower agents, may differ in both dimensions and parameters.

In addition, the model information $A_i$, $B_i$, $C_i$ of the follower agents is given here only to describe the class of systems to which the algorithm applies; the actual algorithm design and implementation do not require their specific values.
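The excitation and sampling in step (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the double-integrator follower, the gain `K0`, the simplified control law `u = -K0 x + e`, the sinusoid weights and frequency range, and the helper names `make_exploration_noise` and `collect_excited_data` are all hypothetical choices made for the example. The simulator stands in for the physical follower; a learner would only see the recorded samples.

```python
import math
import random

def make_exploration_noise(n_terms=10, seed=0):
    """Return e(t): a weighted sum of sinusoids with random frequencies.

    Assumption: sinusoids stand in for the exploration signal; the weights
    and the frequency range are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.5, 50.0) for _ in range(n_terms)]
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_terms)]
    return lambda t: sum(w * math.sin(f * t) for w, f in zip(weights, freqs))

def collect_excited_data(t_end=2.0, dt=1e-3, sample_every=0.05):
    """Simulate a double integrator under u = -K0 x + e and log (t, x, u)."""
    K0 = (1.0, 2.0)            # assumed stabilizing gain: u = -x1 - 2*x2 + e
    e = make_exploration_noise()
    x1, x2 = 1.0, 0.0          # initial position and velocity
    steps_per_sample = int(round(sample_every / dt))
    n_steps = int(round(t_end / dt))
    samples = []
    for n in range(n_steps + 1):
        t = n * dt
        u = -(K0[0] * x1 + K0[1] * x2) + e(t)
        if n % steps_per_sample == 0:
            # one record every sampling interval, starting at t = 0
            samples.append((t, (x1, x2), u))
        # forward-Euler step of x1' = x2, x2' = u
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return samples

samples = collect_excited_data()
print(len(samples), samples[0], samples[-1])
```

Each record holds the time stamp, the follower state and the applied input, which is the kind of data set the later iterations reuse without re-running the experiment.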

(5) When the number of unused data in the acquired data reaches a preset value, carrying out iterative updating on the iterative controller;

When the number of unused data in the collected data reaches a selected value $\kappa$, one round of iterative updating is executed.

In the $k$-th iteration, the equation shown as formula (9) is solved with the iterative controller taken as [formula omitted]. The parameters [formula omitted] are solved by the least squares method, and each group of solution data is recorded.

Here [formula omitted] denotes the gain matrices of the estimated feedback controller obtained in the $k$-th round.

Parameter update: [formula omitted] is partitioned according to formula (7) and written in block-matrix form, which gives the updated iterative controller as follows:

[formula omitted]

Set $k = k + 1$; the updated controller is treated as a known quantity in the next iteration.

(6) When the difference value between the current iteration controller and the iteration controller obtained in the previous iteration updating is smaller than the error allowable threshold value, the iteration updating is finished;

The error tolerance is chosen as a positive constant $\varepsilon$. For the same data set $\theta_i(t)$, formula (10) is applied and the data are saved; the iteration terminates when [formula omitted]. The resulting [formula omitted] is taken as the optimal feedback control gain after convergence.
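Steps (5) and (6) can be illustrated with a scalar analogue. Formulas (9) and (10) are not available in this text, so the sketch below substitutes a well-known off-policy integral policy-iteration regression for a single system $\dot{x} = a x + b u$ with cost $\int (q x^2 + r u^2)\,dt$; it is an assumption-laden stand-in, not the patent's equations. The function names `collect` and `learn_gain`, the numeric values, the sinusoidal exploration signal and the sign convention `u = -k x` are all hypothetical. The learner never reads `a` or `b`; it only uses the recorded window data, reuses the same data in every iteration, and stops when two successive gains differ by less than `eps`.

```python
import math

def collect(a=1.0, b=1.0, k0=3.0, dt=1e-3, window=0.1, n_windows=30):
    """Simulate x' = a*x + b*u under u = -k0*x + e(t); return per-window data.

    a and b are used only by this simulator, never by the learner below.
    """
    x, t, rows = 1.0, 0.0, []
    steps = int(round(window / dt))
    for _ in range(n_windows):
        x_start, ixx, ixu = x, 0.0, 0.0
        for _ in range(steps):
            e = 0.5 * (math.sin(7.0 * t) + math.sin(1.3 * t) + math.sin(19.0 * t))
            u = -k0 * x + e
            x_next = x + dt * (a * x + b * u)   # forward-Euler step
            x_mid = 0.5 * (x + x_next)          # midpoint weight keeps the sums
            ixx += x_mid * x * dt               # consistent with the Euler step
            ixu += x_mid * u * dt
            x, t = x_next, t + dt
        # (change of x^2 over the window, integral of x*x, integral of x*u)
        rows.append((x * x - x_start * x_start, ixx, ixu))
    return rows

def learn_gain(rows, k0=3.0, q=1.0, r=1.0, eps=1e-8, max_iter=50):
    """Least-squares policy iteration on one fixed data set (off-policy)."""
    k, p = k0, 0.0
    for it in range(max_iter):
        # Regression per window:
        #   dxx * P - 2r * (ixu + k*ixx) * K_next = -(q + r*k^2) * ixx
        s11 = s12 = s22 = t1 = t2 = 0.0
        for dxx, ixx, ixu in rows:
            f1, f2 = dxx, -2.0 * r * (ixu + k * ixx)
            y = -(q + r * k * k) * ixx
            s11 += f1 * f1
            s12 += f1 * f2
            s22 += f2 * f2
            t1 += f1 * y
            t2 += f2 * y
        det = s11 * s22 - s12 * s12             # 2x2 normal equations
        p = (t1 * s22 - t2 * s12) / det
        k_next = (s11 * t2 - s12 * t1) / det
        if abs(k_next - k) < eps:               # stopping rule of step (6)
            return k_next, p, it + 1
        k = k_next
    return k, p, max_iter

rows = collect()
k_learned, p_learned, n_iter = learn_gain(rows)
print(k_learned, p_learned, n_iter)
```

For $a = b = q = r = 1$ the scalar Riccati equation $2aP - b^2P^2/r + q = 0$ has the positive solution $P = 1 + \sqrt{2}$, and the optimal gain is $K = bP/r$, which gives a reference value against which the learned gain can be compared.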

S62: designing an output formation tracking control feedforward compensation controller according to the optimal feedback control gain of the optimal feedback controller;

Based on the optimal feedback control gain obtained in step S61, [formula omitted] is then calculated, where [formula omitted] denotes an estimate of the input matrix $B_i$. A matrix is selected that satisfies [formula omitted] and for which [formula omitted] is a full-rank matrix, and the output formation state is verified and adjusted so that it meets the following feasibility condition:

designing an output formation tracking control feedforward compensation controller, wherein the specific expression is as follows:

where $z_{ic}(t)$ represents the control quantity of the output formation tracking control feedforward compensation controller; [symbol omitted] represents the average value of the input matrix of the $i$-th follower agent model; [symbol omitted] and [symbol omitted] represent the optimal feedback control gains of the optimal feedback controller; $S$ represents the leader agent coefficient matrix; [symbol omitted] represents the formation vector of the $i$-th follower agent; and [symbol omitted] is the time derivative of the formation vector of the $i$-th follower agent.

S63: obtaining the formation trajectory tracking controller according to the optimal feedback controller and the output formation tracking control feedforward compensation controller, wherein the specific expression of the formation trajectory tracking controller is as follows:

where [symbol omitted] represents the control quantity of the formation trajectory tracking controller; [symbol omitted] and [symbol omitted] represent the optimal feedback control gains of the optimal feedback controller; $z_{ic}(t)$ represents the control quantity of the output formation tracking control feedforward compensation controller; $R_i$ is a designable parameter of the index function that controls the magnitude of the gain; $\xi_i(t)$ represents the state of the $i$-th distributed formation trajectory generator; $x_i(t)$ represents the state of the $i$-th follower agent; and $B_i$ is the input matrix of the $i$-th follower agent.

S64: tracking, with a formation trajectory tracking controller, the output trajectory of the follower agent to the reference trajectory.

The formation trajectory tracking controller is used to compute [formula omitted], which achieves asymptotically stable formation tracking control of the heterogeneous agents.

The invention provides a multi-agent output formation tracking control method based on the reinforcement learning principle for heterogeneous multi-agent systems with general linear dynamics. First, a distributed formation trajectory generator is designed from local communication information and generates the expected formation trajectories of the heterogeneous agents in real time; second, using the principle of reinforcement learning and the input and output data of the system model, a stabilizing optimal feedback controller is obtained through online optimization iteration; finally, an output formation tracking control feedforward compensation controller is designed from the online learning result to achieve formation tracking control. The invention has the following advantages:

1. The algorithm designs the formation trajectory generator with a distributed method, and follower agents communicate using only neighbor information, which greatly saves communication resources and reduces the communication burden.

2. The controller design is based on a reinforcement learning algorithm. The controller is learned iteratively from the input and output data of the system model alone, without using any model information of the follower agents, so it adapts better to the environment. Data collection and learning are designed under an off-policy scheme, which strengthens the stability of the learning process and integrates control theory with artificial intelligence techniques.

3. The optimal feedback controller is designed online by reinforcement learning and is applicable to heterogeneous leaders and followers, which improves the adaptability and autonomy of the algorithm.

4. The control algorithm combines feedback control on the leader agent state and the follower agent state with feedforward control on the formation information, achieving asymptotically stable formation tracking with zero steady-state error and therefore high precision.

5. An intelligent autonomous control method is provided for multi-agent systems with linear or linearized dynamic models, which are common in engineering practice.

Example 2:

Referring to Fig. 3, the present invention provides a multi-agent output formation tracking control system, comprising:

the network model building module M1 is used for building a communication network topology model of the multi-agent system;

a leader model establishing module M2, for establishing a leader agent model;

a formation configuration design module M3 for designing a desired formation configuration for a follower agent;

a trajectory generator design module M4 for designing a distributed formation trajectory generator from the communication network topology model, the leader agent model, and the desired formation configuration;

a reference trajectory generation module M5 for generating a reference trajectory for the follower agent using the distributed formation trajectory generator;

a tracking module M6 for tracking the output trajectory of the follower agent to the reference trajectory using a formation trajectory tracking controller.
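The way modules M1 to M6 fit together can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the chain topology, the harmonic-oscillator leader, the constant position offsets, the consensus-type generator law with coupling gain `c`, and the simple proportional tracker with gain `k` that stands in for the learned formation trajectory tracking controller. None of these is taken from the patent's formulas.

```python
def build_topology():
    """M1: assumed chain graph, leader (agent 0) -> 1 -> 2 -> 3."""
    return {1: [0], 2: [1], 3: [2]}

def leader_rhs(x):
    """M2: assumed leader model x0' = S x0 with S = [[0, 1], [-1, 0]]."""
    return (x[1], -x[0])

def formation_offsets():
    """M3: assumed constant position offset of each follower from the leader."""
    return {1: 1.0, 2: -1.0, 3: 2.0}

def generator_rhs(i, xi, nbrs, c):
    """M4: assumed generator law xi_i' = S xi_i + c * sum_j (xi_j - xi_i)."""
    s = leader_rhs(xi[i])
    dp = sum(xi[j][0] - xi[i][0] for j in nbrs[i])
    dv = sum(xi[j][1] - xi[i][1] for j in nbrs[i])
    return (s[0] + c * dp, s[1] + c * dv)

def reference(i, xi, h):
    """M5: reference position = generator position + formation offset."""
    return xi[i][0] + h[i]

def tracking_input(y_i, ref_pos, ref_vel, k):
    """M6 stand-in: proportional feedback plus velocity feedforward."""
    return k * (ref_pos - y_i) + ref_vel

def simulate(t_end=10.0, dt=1e-3, c=5.0, k=5.0):
    nbrs, h = build_topology(), formation_offsets()
    xi = {0: (1.0, 0.0)}                      # xi[0] holds the leader state
    xi.update({i: (0.0, 0.0) for i in nbrs})  # generator states of followers
    y = {i: 0.0 for i in nbrs}                # follower outputs (positions)
    for _ in range(int(round(t_end / dt))):
        d0 = leader_rhs(xi[0])
        new_xi = {0: (xi[0][0] + dt * d0[0], xi[0][1] + dt * d0[1])}
        for i in nbrs:
            d = generator_rhs(i, xi, nbrs, c)
            u = tracking_input(y[i], reference(i, xi, h), xi[i][1], k)
            y[i] += dt * u                    # assumed follower model y' = u
            new_xi[i] = (xi[i][0] + dt * d[0], xi[i][1] + dt * d[1])
        xi = new_xi                           # all agents update simultaneously
    return xi, y, h

xi, y, h = simulate()
for i in sorted(y):
    print(i, y[i] - (xi[0][0] + h[i]))        # residual formation error
```

Each follower reads only the generator state of its listed neighbor, so follower 3 never accesses the leader directly, yet its output still settles at the leader position plus its own offset.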

Each embodiment in this specification emphasizes its differences from the other embodiments; for the parts that the embodiments share, they may be referred to one another. Because the system disclosed in this embodiment corresponds to the method disclosed above, its description is kept brief, and the relevant details can be found in the description of the method.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
