Distributed optimization control method of multi-agent system and storage medium

Document No.: 1904799 · Published: 2021-11-30

Reading note: this technology, 一种多智能体系统分布式优化控制方法及存储介质 (Distributed optimization control method of multi-agent system and storage medium), was designed and created by 姜晓伟, 曹爽, 张斌 and 李刚 on 2021-08-25. Its main content is as follows: the invention provides a distributed optimization control method for a multi-agent system and a storage medium. The method uses a distributed alternating direction method of multipliers (ADMM) optimization framework, which offers excellent iteration speed together with strong robustness and generality. A communication threshold function is designed which stipulates that an agent may communicate only when its local information update is sufficiently effective; this ensures efficient communication and greatly reduces the communication resources consumed by system optimization, and the setting of its iteration count also affects the system's ability to reduce communication loss. An adaptive penalty function is designed which specifies each agent's penalty term at every iteration according to the agent's state value; this guarantees the system's convergence speed and keeps every iteration sufficiently effective, and the setting of its maximum iteration count k_max2 affects the convergence speed of the system.

1. A multi-agent system distributed optimization control method, comprising:

initializing a local state quantity of the multi-agent system;

the initialized agents in the multi-agent system update local variable information in an iterative manner, and the iterative update rule is as follows:

wherein k denotes the number of iterations; x_i^k is the state value x_i at which the function f_i(x_i), together with the penalty and dual terms, attains its minimum, and is the local primal data-variable information of agent i at the k-th iteration; i and j are two adjacent agents; the set of neighbor agents of i is denoted N_i, and the number of neighbors of i is d_i = |N_i|; c_{ij}^{k-1} denotes the penalty-term coefficient set by agent i for neighbor j at the (k-1)-th iteration; c_{ii}^{k-1} denotes the penalty-term coefficient the agent sets for itself; and λ_i^{k-1} is the dual variable at the (k-1)-th iteration, through whose iterative updating the state value x_i is obtained;

In the iteration process, according to the error of the original variable between different agents, updating the penalty term coefficient by using a first formula, wherein the first formula is as follows:

wherein k_max2 is the maximum number of penalty-term updates; the penalty-function variable is initialized with c_0, the initially set penalty value; c_{ii}^k is the penalty coefficient agent i sets for itself; c_{ij}^k is the penalty coefficient agent i sets for neighbor j; e_i^k is a function measuring the gap between the local state value of agent i at the k-th iteration and the maximum and minimum state values over all agents in the system; e_j^k is the error-function value of agent j at the k-th iteration; and x_j^k is the local primal data-variable information of agent j at the k-th iteration;

in the iteration process, according to the original variable state value, updating the threshold function by using a second formula, wherein the second formula is as follows:

wherein the communication variable of agent i at the (k-1)-th iteration records the local primal-variable value after the last communication; τ_k is a manually set sequence that decreases with the number of updates; and k_max1 is the maximum number of updates of the communication threshold function.

According to the threshold function, each agent determines whether it may communicate; if so, the agents exchange information and each updates its stored communication variable to its new local primal value; if an agent may not communicate, it keeps the communication variable stored from the previous iteration.

Iteratively updating the dual variable using a third formula, the third formula being:

wherein λ_i^k is the dual variable at the k-th iteration; c_{ii}^k is the penalty term the agent sets for itself at the k-th iteration; and c_{ij}^k is the penalty term set for neighbor j at the k-th iteration;

and while each agent iteratively updates its local variable information, synchronously judging whether the update error of the multi-agent system satisfies a preset condition: if so, the iterative process ends; otherwise, the multi-agent system is updated again according to the iterative update rule until the preset condition is satisfied, whereupon the iteration ends.

2. A multi-agent system distributed optimization control method as claimed in claim 1, wherein said update error is expressed as a fourth formula:

wherein X^k is the state quantity in matrix form at the k-th iteration; X^0 is the initial state-value matrix of the multi-agent system; X^{k-1} is the state-value matrix of the system at the (k-1)-th update; C(k) is the update-error value; and ‖·‖_F denotes the Frobenius norm of a matrix.

3. The multi-agent system distributed optimization control method of claim 1, wherein the multi-agent system distributed optimization control method further comprises:

setting an error function for measuring state errors of the kth iteration agent i and other agents in the system, wherein the error function is as follows:

wherein the error-function value e_i^k measures the state error between agent i at the k-th iteration and the other agents in the system, and is computed from the maximum and the minimum of the 2-norms of the state-function values of all agents in the multi-agent system at the k-th iteration.

4. The multi-agent system distributed optimization control method of claim 1, wherein the preset condition is that the value of the update error of the multi-agent system is less than 10⁻⁸.

5. The multi-agent system distributed optimization control method of claim 1, wherein said updating the threshold function with the second formula based on the raw variable state values comprises:

when the number of updates of the threshold function exceeds k_max1, the threshold function is set to 0; the threshold function then vanishes, and the agents communicate with one another in every iteration.

6. A multi-agent system distributed optimization control method as claimed in claim 1, wherein the maximum number of threshold-function updates k_max1 is the maximum number of iterations for which the update-error value satisfies C(k) > 1 × 10⁻⁷.

7. The multi-agent system distributed optimization control method of claim 1, wherein when the number of penalty-term updates exceeds the maximum number of penalty-term updates k_max2, the agent's penalty term is reset to its initial state.

8. The multi-agent system distributed optimization control method of claim 1, wherein the maximum number of penalty-term updates k_max2 is set to the number of updates for which the update-error value satisfies C(k) > 5 × 10⁻³.

9. A storage medium, characterized in that the storage medium is a computer-readable storage medium storing a program which, when executed, implements the multi-agent system distributed optimization control method according to any one of claims 1-8.

Technical Field

The invention relates to the technical field of multi-agent system control, and in particular to a distributed optimization control method of a multi-agent system and a storage medium.

Background

In the prior art, the distributed alternating direction method of multipliers (ADMM) improves the performance of multi-agent network systems by integrating several classical optimization ideas, such as gradient descent and the primal-dual algorithm, with the problems encountered in modern statistical learning, providing a well-established distributed computing framework. Starting from the primal-dual operator and adopting the augmented Lagrangian method built around the primal-dual algorithm, it can solve constrained convex optimization problems of a specific form. The framework solves many optimization problems with a good convergence rate, but it does not address the communication problem and does not consider limited communication; as a result, the system incurs heavy communication loss while reaching consensus, which restricts the further development of networked systems.
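The classical (centralised) ADMM framework described above can be illustrated with a minimal sketch not taken from the patent: scaled-form ADMM on the toy problem min 0.5(x − a)² + 0.5(z − b)² subject to x = z, whose optimum is x = z = (a + b)/2. The values a, b and the penalty ρ here are illustrative.

```python
# Minimal scaled-form ADMM on: min 0.5*(x-a)^2 + 0.5*(z-b)^2  s.t. x = z.
# The optimum is x = z = (a+b)/2; with a = 4, b = 0 that is 2.0.
a, b, rho = 4.0, 0.0, 1.0
x = z = u = 0.0  # u is the scaled dual variable
for _ in range(200):
    x = (a + rho * (z - u)) / (1 + rho)  # x-update: argmin of augmented Lagrangian in x
    z = (b + rho * (x + u)) / (1 + rho)  # z-update: argmin in z
    u = u + x - z                        # dual ascent on the constraint residual
```

With a = 4 and b = 0, the iterates converge linearly to the consensus value 2.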

An existing optimization algorithm for decentralized network systems builds on the distributed ADMM framework and further considers the communication problem: to reduce the number of communications, it designs a communication-check method that screens transmissions, avoiding many unnecessary communications and thereby reducing the network system's communication loss. However, the check method increases the computational load of the network system; moreover, many ineffective communications occur in the later stage of convergence, or the number of communications becomes too small, which slows the convergence of the network system, delays the attainment of consensus, and degrades the performance of the multi-agent network system.

Disclosure of Invention

The invention addresses the problem that the heavy communication loss of a multi-agent network system degrades the overall performance of the network system.

According to one aspect of the present invention, the present invention provides a distributed optimization control method for a multi-agent system, comprising:

initializing a local state quantity of the multi-agent system;

the initialized agents in the multi-agent system update local variable information in an iterative manner, and the iterative update rule is as follows:

wherein k denotes the number of iterations; x_i^k is the state value x_i at which the function f_i(x_i), together with the penalty and dual terms, attains its minimum, and is the local primal data-variable information of agent i at the k-th iteration; i and j are two adjacent agents; the set of neighbor agents of i is denoted N_i, and the number of neighbors of i is d_i = |N_i|; c_{ij}^{k-1} denotes the penalty-term coefficient set by agent i for neighbor j at the (k-1)-th iteration; c_{ii}^{k-1} denotes the penalty-term coefficient the agent sets for itself; and λ_i^{k-1} is the dual variable at the (k-1)-th iteration, through whose iterative updating the state value x_i is obtained;

in the iteration process, according to the error of the original variable between different agents, updating the penalty term coefficient by using a first formula, wherein the first formula is as follows:

wherein k_max2 is the maximum number of penalty-term updates; the penalty-function variable is initialized with c_0, the initially set penalty value; c_{ii}^k is the penalty coefficient agent i sets for itself; c_{ij}^k is the penalty coefficient agent i sets for neighbor j; e_i^k is a function measuring the gap between the local state value of agent i at the k-th iteration and the maximum and minimum state values over all agents in the system; e_j^k is the error-function value of agent j at the k-th iteration; and x_j^k is the local primal data-variable information of agent j at the k-th iteration;

in the iteration process, according to the original variable state value, updating the threshold function by using a second formula, wherein the second formula is as follows:

wherein the communication variable of agent i at the (k-1)-th iteration records the local primal-variable value after the last communication; τ_k is a manually set sequence that decreases with the number of updates; and k_max1 is the maximum number of updates of the communication threshold function.

According to the threshold function, each agent determines whether it may communicate; if so, the agents exchange information and each updates its stored communication variable to its new local primal value; if an agent may not communicate, it keeps the communication variable stored from the previous iteration.

Iteratively updating the dual variable using a third formula, the third formula being:

wherein λ_i^k is the dual variable at the k-th iteration; c_{ii}^k is the penalty term the agent sets for itself at the k-th iteration; and c_{ij}^k is the penalty term set for neighbor j at the k-th iteration;

and while each agent iteratively updates its local variable information, synchronously judging whether the update error of the multi-agent system satisfies a preset condition: if so, the iterative process ends; otherwise, the multi-agent system is updated again according to the iterative update rule until the preset condition is satisfied, whereupon the iteration ends.

Further, the update error is expressed by a fourth formula:

wherein X^k is the state quantity in matrix form at the k-th iteration; X^0 is the initial state-value matrix of the multi-agent system; X^{k-1} is the state-value matrix of the system at the (k-1)-th update; C(k) is the update-error value; ‖·‖_F denotes the Frobenius norm of a matrix; and ‖·‖² denotes the square of the norm.

Further, the distributed optimization control method of the multi-agent system further comprises the following steps:

setting an error function for measuring state errors of the kth iteration agent i and other agents in the system, wherein the error function is as follows:

wherein the error-function value e_i^k measures the state error between agent i at the k-th iteration and the other agents in the system, and is computed from the maximum and the minimum of the 2-norms of the state-function values of all agents in the multi-agent system at the k-th iteration.

Further, the preset condition is that the value of the system update error is less than 10⁻⁸.

Further, the updating the threshold function according to the original variable state value by using the second formula includes:

when the number of updates of the threshold function exceeds k_max1, the threshold function is set to 0; the threshold function then vanishes, and all agents communicate with their neighbors during each iteration.

Further, the maximum number of threshold-function updates k_max1 is the maximum number of iterations for which the update-error value satisfies C(k) > 1 × 10⁻⁷.

Further, when the number of penalty-term updates exceeds the maximum number of penalty-term updates k_max2, the agent's penalty term is reset to its initial state.

Further, the maximum number of penalty-term updates k_max2 is set to the number of updates for which the update-error value satisfies C(k) > 5 × 10⁻³.

According to another aspect of the present invention, a storage medium is also disclosed; the storage medium is a computer-readable storage medium storing a program which, when executed, implements the multi-agent system distributed optimization control method according to any one of claims 1 to 8.

The invention uses a distributed alternating direction method of multipliers (ADMM) optimization framework; the algorithm has an excellent iteration speed together with strong robustness and generality. The design of the communication threshold function stipulates that an agent may communicate only when its local information update is sufficiently effective, which ensures efficient communication and greatly reduces the communication resources consumed by system optimization; the setting of its iteration count also affects the system's ability to reduce communication loss. The design of the adaptive penalty function specifies each agent's penalty term at every iteration according to the agent's state value, which guarantees the system's convergence speed and keeps every iteration sufficiently effective; the setting of its maximum iteration count k_max2 affects the convergence speed of the system. In short, through this algorithm the multi-agent system can reach optimizing consensus, meaning that all agents in the system reach the optimal state value, where "optimal" means that the sum of the multi-agent system's function values at that state value is minimized.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a schematic diagram illustrating a node updating procedure of a multi-agent network system according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a fully-connected multi-agent system in the embodiment of the invention.

Fig. 3 is a schematic diagram comparing the convergence speed and communication loss of the present algorithm and the conventional alternating direction method of multipliers according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

In the first embodiment, fig. 2 shows a schematic structural diagram of the fully-connected multi-agent system of this embodiment. The number of agents in the multi-agent system (hereinafter, the system) is denoted n. Each independent agent stores its own state information; the state value of the i-th agent is defined as x_i, with state-value function f_i(x_i). The local state function f_i(x) is required to be a convex function possessing an optimal solution, which guarantees that the algorithm ultimately obtains an optimal value and that this optimal value is unique.

The system communicates through an undirected communication network, defined as G = {υ, ε}, where υ is a non-empty node set and each node represents an agent; ε represents the edge set: if two nodes i, j can communicate, then (i, j) ∈ ε and node j is a neighbor of i. The set of neighbors of node i is denoted N_i, and the number of neighbors of i is d_i = |N_i|. Each agent stores its data variables locally: agent i keeps its local primal data variable x_i, its dual variable λ_i, a communication variable recording the local primal value after the last communication, and communication variables recording the neighbors' primal values after the last communication.
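The neighbor sets N_i and degrees d_i just defined can be built from an edge list as follows; the 4-node ring topology is an assumed toy example, not taken from the patent.

```python
# Build neighbor sets N_i and degrees d_i = |N_i| for an undirected graph
# G = {V, E}. The 4-node ring below is an illustrative example topology.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
N = {i: set() for i in range(n)}
for i, j in edges:          # undirected: each edge adds both directions
    N[i].add(j)
    N[j].add(i)
d = {i: len(N[i]) for i in range(n)}
```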

For convenience of representation, the matrix forms of the system variables are defined as X = [x_1, …, x_n]^T, X̂ = [x̂_1, …, x̂_n]^T, and Λ = [λ_1, …, λ_n]^T, where the superscript T denotes transposition. X is the system primal state-value matrix, containing the state-value information of all agents, with x_n the state-value information of the n-th agent; X̂ is the system communication state-value matrix, containing the communication-variable state values of all agents, with x̂_n the communication-variable state value of the n-th agent; and Λ is the system dual state-value matrix, containing the dual-variable state values of all agents, with λ_n the dual-variable state value of the n-th agent. Since the system has n agents and each of the variables x_n, x̂_n, λ_n described above is a p-dimensional vector, X, X̂, and Λ are all matrices with n rows and p columns.

As shown in fig. 1, a schematic diagram of the node-updating step of the multi-agent network system, the information-processing steps of agent i specifically include:

First, in order to achieve consensus of the multi-agent system, the system needs to initialize its state quantities to 0: the primal variables, dual variables, the agents' own communication variables, and the stored neighbor communication variables are all initialized to 0, and the penalty-function variables are initialized to their starting value; that is, all locally stored state-value information, including that about neighbors, is initialized.
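A sketch of this zero initialization; the sizes (5 agents, 4-dimensional states), the initial penalty value, and the variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Assumed sizes: n agents, p-dimensional state vectors.
n, p = 5, 4
X = np.zeros((n, p))             # primal variables x_i, initialised to 0
Lam = np.zeros((n, p))           # dual variables lambda_i, initialised to 0
X_hat = np.zeros((n, p))         # each agent's own communication variable
X_hat_nb = np.zeros((n, n, p))   # stored neighbor communication variables
C_pen = np.full((n, n), 1.0)     # penalty coefficients c_ij, set to c0 = 1.0 (assumed)
```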

After the agents' information is initialized, the optimal state value is sought by iterative updating. First, agent i updates its local primal data-variable information x_i according to the following rule:

wherein k denotes the number of iterations; x_i^k is the state value x_i at which the function f_i(x_i), together with the penalty and dual terms, attains its minimum, and is the local primal data-variable information of agent i at the k-th iteration; i and j are two adjacent agents; the set of neighbor agents of i is denoted N_i, and the number of neighbors of i is d_i = |N_i|; c_{ij}^{k-1} denotes the penalty-term coefficient set by agent i for neighbor j at the (k-1)-th iteration; c_{ii}^{k-1} denotes the penalty-term coefficient the agent sets for itself at the (k-1)-th iteration; λ_i^{k-1} is the dual variable at the (k-1)-th iteration; and ⟨x, y⟩ denotes the inner product of the vectors x and y. In this step, iteration yields a state value x_i that makes the sum of all agents' state-function values in the system smaller.
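The primal step just described can be sketched for a quadratic local cost. The exact update formula is an image not reproduced in the source, so the code below assumes the standard consensus-ADMM form, in which each agent minimizes its own cost plus a dual inner product and neighbor-averaging penalty terms; function and parameter names are illustrative.

```python
import numpy as np

def primal_update(A, b, lam, c_row, x_hat_i, x_hat_nb, neighbors):
    """Assumed-form primal step for a quadratic local cost
    f_i(x) = 0.5*||A x - b||^2:
        x_i = argmin_x f_i(x) + <lam, x>
              + sum_j (c_ij/2) * ||x - (x_hat_i + x_hat_j)/2||^2
    For a quadratic f_i this reduces to a single linear solve."""
    p = A.shape[1]
    H = A.T @ A + sum(c_row[j] for j in neighbors) * np.eye(p)
    g = A.T @ b - lam
    for j in neighbors:
        g = g + c_row[j] * 0.5 * (x_hat_i + x_hat_nb[j])
    return np.linalg.solve(H, g)
```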

Step two: in order to find a more suitable penalty-term coefficient, it is updated according to the primal-variable error between different agents. The update formula for c_{ij}^k is:

wherein e_i^k is an expression measuring the gap between the local function value and the system's maximum and minimum values; k_max2 is the maximum number of penalty-term updates; the penalty-function variable is initialized with c_0, the initially set penalty value; c_{ii}^k is the penalty coefficient agent i sets for itself; and c_{ij}^{k+1} is the penalty-term coefficient agent i sets for neighbor j at the (k+1)-th iteration;

e_i^k is expressed as:

where its maximum and minimum terms are specified as follows:

wherein the first is the maximum of the 2-norms of the state-function values of all agents in the multi-agent system at the k-th iteration, and the second is the minimum of those 2-norms. For convenience of expression, e_i^k is set to measure the state error between agent i and the other agents in the system at the k-th iteration; e_i^k is the error-function value of agent i at the k-th iteration, e_j^k is that of agent j, and x_j^k is the local primal data-variable information of agent j at the k-th iteration.
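One plausible reading of this error function (its formula image is not reproduced in the source, so the exact form below is an assumption) is the normalised position of agent i's state-function 2-norm between the system-wide minimum and maximum:

```python
import numpy as np

def error_function(fvals, i):
    """Assumed form of e_i^k: where agent i's state-function 2-norm sits
    between the system-wide minimum and maximum of those 2-norms."""
    norms = np.linalg.norm(fvals, axis=1)  # 2-norm of each agent's state-function value
    v_max, v_min = norms.max(), norms.min()
    if v_max == v_min:                     # all agents agree: no state error
        return 0.0
    return (norms[i] - v_min) / (v_max - v_min)
```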

k_max2 is the maximum number of penalty-term updates: when the iteration count exceeds k_max2, the agent's penalty term is reset to its initial state. k_max2 is the number of iterations for which C(k) satisfies a certain condition; designing a different k_max2 for each application yields a better convergence speed, and k_max2 is generally set to the maximum number of iterations for which C(k) > 5 × 10⁻³.

That is, before the system error value C(k) reaches the specified condition, the most suitable penalty term is updated adaptively according to the agents' state values; once the condition is reached, the penalty term is reset to its initial state, which prevents it from hindering the system's attainment of the consensus state. Varying k_max2 according to the application conditions reduces the communication loss more effectively.

Step three: in order to have a suitable threshold function in each iteration, the threshold function is updated based on the primal-variable state values. Its update formula is:

wherein the communication variable of agent i at the (k-1)-th iteration records the local primal-variable value after the last communication; τ_k is a manually set sequence that decreases with the number of updates; and k_max1 is the maximum number of updates of the communication threshold function. Because the primal update shrinks as the iteration proceeds, and in order to measure the effectiveness of each update, τ_k decreases as the iteration count grows. This gives the function a communication threshold: communication is not performed when the communication condition is unmet, so no communication occurs when the system's primal update is small, which reduces the number of communications.

Since a failure to communicate affects the attainment of system consensus, when the number of iterations exceeds k_max1 the threshold function is set to 0; the threshold function then vanishes, the agents communicate in every iteration, and the system reaches consensus more quickly.

In some embodiments, k_max1 is set to the maximum number of iterations for which C(k) > 1 × 10⁻⁷; varying k_max1 according to the application conditions changes the communication-loss reduction achieved.

In summary, this step determines whether each agent may communicate and prepares the data for the subsequent dual-variable update. According to the threshold function, each agent decides whether to communicate with its neighbors and transmit information. When the communication condition is met, the agent communicates with its neighbors, exchanges information, and updates its stored communication variable to the new local primal value; otherwise, the agent does not communicate and keeps the previously stored communication variable. If agent i receives a communication from neighbor j, it saves and updates the stored value for j; otherwise it keeps the previously stored value for j.
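The communication decision of this step can be sketched as an event-triggered rule. The exact threshold formula is not reproduced in the source, so the geometrically decreasing τ_k and the parameter names below are assumptions.

```python
import numpy as np

def should_communicate(x_new, x_hat_prev, k, k_max1, tau0=1.0, rho=0.9):
    """Assumed event-triggered rule: agent i communicates when its local
    update exceeds a decreasing threshold tau_k = tau0 * rho**k; after
    k_max1 updates the threshold is set to 0, so every later iteration
    communicates. tau0 and rho are illustrative parameters."""
    tau_k = 0.0 if k > k_max1 else tau0 * rho ** k
    return float(np.linalg.norm(x_new - x_hat_prev)) > tau_k
```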

Step four: finally, the dual variable λ_i, the λ operator of the alternating direction method of multipliers, is updated; by means of the dual-domain state quantity, this prepares more suitable primal variables for the next iteration. The update formula is as follows:

wherein λ_i^k is the dual variable at the k-th iteration.
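A sketch of this dual step, assuming the usual consensus-ADMM multiplier update applied to the communicated values; the exact third formula is an image not reproduced in the source, and the names are illustrative.

```python
import numpy as np

def dual_update(lam, x_hat_i, x_hat_nb, c_row, neighbors):
    """Assumed dual-ascent step:
    lambda_i <- lambda_i + sum_j c_ij * (x_hat_i - x_hat_j) / 2,
    i.e. the multiplier moves along the disagreement with each neighbor."""
    step = np.zeros_like(lam)
    for j in neighbors:
        step = step + c_row[j] * 0.5 * (x_hat_i - x_hat_nb[j])
    return lam + step
```

At consensus (all communicated values equal) the step vanishes and the dual variable stops moving.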

Step five: judge whether the update error meets the requirement. The update error is expressed mathematically as:

where X^k is the state quantity in matrix form at the k-th iteration; X^0 is the initial state-value matrix of the multi-agent system; X^{k-1} is the state-value matrix of the system at the (k-1)-th update; C(k) is the update-error value; and ‖·‖_F denotes the Frobenius norm of a matrix.

In some embodiments, the requirement is set to C(k) < 10⁻⁸ and adjusted according to the application conditions. If C(k) meets the requirement, the iterative process ends; otherwise, return to step two and continue iterating until the requirement is met, whereupon the iteration ends.

In general, the error function C(k) is defined to measure how far the post-iteration state value X^k has moved from the previous iteration's state value X^{k-1}. When C(k) < 10⁻⁸, the iteration ends and the optimal value is obtained; at that moment the multi-agent system has reached consensus on the optimal result.
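A sketch of the stopping metric; the fourth formula also involves X^0, likely as a normaliser, but since its image is not reproduced in the source, the sketch below keeps only the squared change term, which is an assumption.

```python
import numpy as np

def update_error(X_k, X_prev):
    """Assumed form of the stopping metric C(k): the squared Frobenius
    norm of the latest state change between consecutive iterations."""
    return np.linalg.norm(X_k - X_prev, ord='fro') ** 2
```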

Compared with the prior art, the method of the invention gives the multi-agent system a higher convergence rate and fast convergence to consensus, enabling the system to reach optimizing consensus quickly. In addition, the system's communication loss is low: the multi-agent system can complete the consensus-optimization process without excessive ineffective communication.

The effect of the method is illustrated below with reference to experimental data:

consider a multi-agent system with 50 agents, with the equation of state for the agents beingWherein Namely, it isIs a four-dimensional vector, A(i)Is a four-dimensional matrix, b(i)Is a four-dimensional vector, A(i)And b(i)Element in (1) is agent i interval [0,10 ]]Evenly distributed local data.

After running the algorithm to complete the consensus-optimization iterative process, the convergence speed and communication loss of the algorithm are compared with those of the conventional alternating direction method of multipliers.

As shown in panels a and b of fig. 3, the abscissa of both graphs is the iteration count k; the ordinate of graph a is the precision C(k), and the ordinate of graph b is the communication loss, defined as the total number of communications in the system once the multi-agent system has reached consensus. The simulation experiment compares the system's communication loss and the number of iterations required to reach a given convergence precision. All parameters are tuned to their optimum, and the initialization variables of the proposed algorithm use the same penalty term as the conventional alternating direction method of multipliers.

The simulation result in graph a shows that, although the communication threshold function influences convergence, the acceleration provided by the adaptive penalty-term function compensates for it, so the algorithm of the invention does not slow the system's convergence. Meanwhile, graph b shows that under the same precision requirement the system's communication loss is markedly reduced: the communication loss of the proposed algorithm is about 1/2 that of the conventional alternating direction method of multipliers.

In conclusion, the invention balances communication loss against convergence speed well: it can greatly reduce the communication loss without affecting the convergence speed. It is practical and well suited to multi-agent systems with heavy communication loss and a definite requirement on convergence speed.

The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention should be included in the present invention.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
