An intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion

Document No.: 1964395  Publication date: 2021-12-14  Views: 22  Language: Chinese

Reading note: This technique, "An intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion" (基于深度自动编码和特征融合的智能体蜂拥行为控制方法), was created by 左源, 朱效洲, 姚雯, and 常强 on 2021-08-19. Its main content is as follows: The invention discloses an intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion, comprising: determining all neighborhood agents within an agent's perception range; converting the multi-source heterogeneous state information of the agent and of each neighborhood agent into numerical state features with autoencoders; concatenating all numerical state features along the feature dimension and fusing the concatenated features with a first preset deep neural network to obtain comprehensive state information features for the agent and for each neighborhood agent; weighting and merging the comprehensive state information features of all neighborhood agents to obtain the agent's fused neighborhood feature; and concatenating the agent's comprehensive state information feature with its fused neighborhood feature and mapping the result through a second preset deep neural network to obtain the agent's control output. The invention can drive an agent cluster to produce flocking behavior that satisfies the requirements of group heading consistency and stability.

1. An intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion, used for controlling the flocking motion of an agent cluster and comprising the following steps:

determining all neighborhood agents within the agent's perception range;

converting the multi-source heterogeneous state information of the agent and of each neighborhood agent into numerical state features by using parallel deep-learning autoencoders;

concatenating (dimension-cascading) all numerical state features of the agent and of each neighborhood agent respectively, and fusing the concatenated numerical state features by using a first preset deep neural network to obtain the comprehensive state information feature of the agent and the comprehensive state information feature of each neighborhood agent;

weighting and merging the comprehensive state information features of all neighborhood agents to obtain the fused neighborhood feature corresponding to the agent; and

concatenating the comprehensive state information feature of the agent with the fused neighborhood feature corresponding to the agent, and, based on the concatenated feature, mapping through a second preset deep neural network to obtain the control output of the agent.

2. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 1, wherein the multi-source heterogeneous state information of an agent comprises the agent's position, velocity, acceleration, identification code, and remaining energy.

3. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 2, wherein each deep-learning autoencoder comprises an encoder and a decoder and is expressed in the form:

φ_i: X_i → H_fi

ρ_i: H_fi → X̂_i

wherein φ_i represents the encoder function for the agent's i-th piece of state information X_i, H_fi represents the numerical state feature generated from the i-th piece of state information X_i after passing through the encoder, ρ_i represents the decoder function for the agent's i-th piece of state information X_i, and X̂_i represents the decoder output corresponding to the agent's i-th piece of state information X_i.

4. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 3, wherein the encoder employs a four-layer fully connected neural network with the ReLU function as its nonlinear activation function; and

the decoder employs a four-layer fully connected neural network in which the first three layers use the ReLU function as the nonlinear activation function and the fourth layer produces its output by linear superposition.

5. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 4, wherein the first preset deep neural network employs a three-layer fully connected neural network with the ReLU function as its nonlinear activation function.

6. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 5, wherein the neighborhood of the i-th agent A_i in the agent cluster is set as

N_i = { A_j | d_ij ≤ R_i, j ≠ i }

the comprehensive state information feature of the i-th agent A_i is:

H_i = Ψ(concat(H^i_f1, …, H^i_fn); W, b)

and the comprehensive state information feature of a neighborhood agent A_j corresponding to the i-th agent A_i is:

H_j = Ψ(concat(H^j_f1, …, H^j_fn); W, b)

wherein A_j represents the j-th agent in the agent cluster, d_ij represents the distance between agent A_i and agent A_j, R_i represents the perception radius of agent A_i, H_i represents the comprehensive state information feature of the i-th agent A_i, Ψ represents the first preset deep neural network used to fuse features, concat represents concatenation of features along the feature dimension, H^i_fk represents the numerical state feature corresponding to the k-th piece of state information X^i_k of the i-th agent A_i, W and b represent the learnable parameters of the network Ψ, H_j represents the comprehensive state information feature of the j-th agent A_j, and H^j_fk represents the numerical state feature corresponding to the k-th piece of state information X^j_k of the j-th agent A_j.

7. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 6, wherein, when the comprehensive state information features of all neighborhood agents are weighted and merged, the weight coefficient corresponding to each neighborhood agent is calculated by the following formula:

wherein α_ij represents the weight coefficient corresponding to neighborhood agent A_j of agent A_i.

8. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 7, wherein the fused neighborhood feature corresponding to the agent is calculated by the following formula:

H_Ni = Σ_{A_j ∈ N_i} α_ij H_j

wherein H_Ni represents the fused neighborhood feature corresponding to agent A_i.

9. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to claim 8, wherein the control output of the i-th agent A_i is calculated by the following formulas:

H_ui = concat(H_i, H_Ni)

u_i = f_ctrl(H_ui; W_out, b_out)

wherein H_ui represents the fused information obtained by concatenating the comprehensive state information feature of the agent with its corresponding fused neighborhood feature, u_i represents the control output of the i-th agent A_i, f_ctrl represents the second preset deep neural network, and W_out and b_out represent the learnable parameters of the network f_ctrl; and

the second preset deep neural network employs a four-layer fully connected neural network in which the first three layers use the ReLU function as the nonlinear activation function and the fourth layer produces its output by linear superposition.

10. The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to any one of claims 1 to 9, wherein the control output of the agent is the agent's velocity vector.

Technical Field

The invention relates to the technical field of agent-cluster motion control, and in particular to an intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion.

Background

An agent cluster is a swarm-robot system inspired by the living habits of social animals in nature. It has no centralized control structure: through local interaction among agents and with the external environment, each agent decides on actions within its own capability, so that specific macroscopic group behaviors emerge and specific tasks are carried out. Research on flocking control methods for agent clusters therefore has important value for further improving the effectiveness of unmanned cluster systems and realizing the potential of unmanned systems.

Early research on agent-cluster motion control focused on the design, superposition, and parameter tuning of simple rules and, guided by hand-designed rules, concentrated on issues such as consistency, stability, and convergence. Although the corresponding control methods can be verified in simulation and on small clusters, the uncertainty introduced by rule superposition and the imprecision of manual design make the emergence of macroscopic behavior hard to control. With the continuing development of data-driven and learning algorithms such as artificial intelligence and deep learning, and with improvements in hardware performance, swarm intelligence algorithms and deep-learning techniques have been adopted to control agent-cluster motion. For example, Chinese patent document CN106970615A, entitled "A real-time online path planning method for deep reinforcement learning", discloses an agent path planning method that uses reinforcement learning and emphasizes the role of a learning algorithm in real-time, adaptive, and flexible scenarios; however, it targets an individual agent without considering the group state, loses the ability of macroscopic emergence, and cannot achieve a flocking effect. As another example, patent document CN108921298A, entitled "Reinforcement learning multi-agent communication and decision method", discloses an agent control method aimed mainly at the interactive fusion of the information features of multiple agents; it adaptively extracts task-relevant information by exploiting the generalization mechanism and capability of deep learning without considering specific scenarios, thereby improving back-end decision intelligence.
Such methods have the advantage that a clustering-style fusion scheme can effectively and adaptively aggregate a variable number of features, and they absorb the strength of representation learning in converting physical quantities into numerical ones. However, the clustering approach depends on the choice of clustering parameters, and state-feature extraction based on manual design, without pre-trained features, is unstable; moreover, the output of reinforcement-learning-based methods is a discrete action, and a design without an explicit action cannot effectively and directly control the macroscopic emergence of the cluster.

Therefore, how to effectively guide agents, using only locally perceived information, to generate actions that satisfy the conditions of group consistency and stability, and thereby to evolve group flocking behavior, has become a technical problem to be solved by those skilled in the art.

Disclosure of Invention

To solve some or all of the technical problems in the prior art, the invention provides an intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion.

The technical scheme of the invention is as follows:

the method is used for controlling the bee-holding movement of the intelligent agent cluster and comprises the following steps:

determining all neighborhood agents within the agent's perception range;

converting the multisource heterogeneous state information of the intelligent agent and each neighborhood intelligent agent into numerical state characteristics by utilizing a parallel deep learning automatic coding machine;

respectively carrying out dimension cascade on all numerical state features of the intelligent agent and each neighborhood intelligent agent, and fusing the numerical state features after the dimension cascade by utilizing a first preset deep neural network to obtain the comprehensive state information features of the intelligent agents and the comprehensive state information features of each neighborhood intelligent agent;

weighting and combining the comprehensive state information characteristics of all neighborhood agents to obtain fusion neighborhood characteristics corresponding to the agents;

and carrying out dimension cascade on the comprehensive state information characteristics of the intelligent agent and the fusion neighborhood characteristics corresponding to the intelligent agent, and mapping by utilizing a second preset depth neural network to obtain the output control quantity of the intelligent agent based on the characteristics after the dimension cascade.

In some possible implementations, the multi-source heterogeneous state information of an agent includes the agent's position, velocity, acceleration, identification code, and remaining energy.

In some possible implementations, the deep-learning autoencoder includes an encoder and a decoder and is expressed in the form:

φ_i: X_i → H_fi

ρ_i: H_fi → X̂_i

wherein φ_i represents the encoder function for the agent's i-th piece of state information X_i, H_fi represents the numerical state feature generated from the i-th piece of state information X_i after passing through the encoder, ρ_i represents the decoder function for the agent's i-th piece of state information X_i, and X̂_i represents the decoder output corresponding to the agent's i-th piece of state information X_i.

In some possible implementations, the encoder employs a four-layer fully connected neural network with the ReLU function as its nonlinear activation function; and

the decoder employs a four-layer fully connected neural network in which the first three layers use the ReLU function as the nonlinear activation function and the fourth layer produces its output by linear superposition.

In some possible implementations, the first preset deep neural network employs a three-layer fully connected neural network with the ReLU function as its nonlinear activation function.

In some possible implementations, the neighborhood of the i-th agent A_i in the agent cluster is set as

N_i = { A_j | d_ij ≤ R_i, j ≠ i }

The comprehensive state information feature of the i-th agent A_i is:

H_i = Ψ(concat(H^i_f1, …, H^i_fn); W, b)

The comprehensive state information feature of a neighborhood agent A_j corresponding to the i-th agent A_i is:

H_j = Ψ(concat(H^j_f1, …, H^j_fn); W, b)

wherein A_j represents the j-th agent in the agent cluster, d_ij represents the distance between agent A_i and agent A_j, R_i represents the perception radius of agent A_i, H_i represents the comprehensive state information feature of the i-th agent A_i, Ψ represents the first preset deep neural network used to fuse features, concat represents concatenation of features along the feature dimension, H^i_fk represents the numerical state feature corresponding to the k-th piece of state information X^i_k of the i-th agent A_i, W and b represent the learnable parameters of the network Ψ, H_j represents the comprehensive state information feature of the j-th agent A_j, and H^j_fk represents the numerical state feature corresponding to the k-th piece of state information X^j_k of the j-th agent A_j.

In some possible implementations, when the comprehensive state information features of all neighborhood agents are weighted and merged, the weight coefficient corresponding to each neighborhood agent is calculated by the following formula:

wherein α_ij represents the weight coefficient corresponding to neighborhood agent A_j of agent A_i.

In some possible implementations, the fused neighborhood feature corresponding to the agent is calculated by the following formula:

H_Ni = Σ_{A_j ∈ N_i} α_ij H_j

wherein H_Ni represents the fused neighborhood feature corresponding to agent A_i.

In some possible implementations, the control output of the i-th agent A_i is calculated by the following formulas:

H_ui = concat(H_i, H_Ni)

u_i = f_ctrl(H_ui; W_out, b_out)

wherein H_ui represents the fused information obtained by concatenating the comprehensive state information feature of the agent with its corresponding fused neighborhood feature, u_i represents the control output of the i-th agent A_i, f_ctrl represents the second preset deep neural network, and W_out and b_out represent the learnable parameters of the network f_ctrl; and

the second preset deep neural network employs a four-layer fully connected neural network in which the first three layers use the ReLU function as the nonlinear activation function and the fourth layer produces its output by linear superposition.

In some possible implementations, the control output of the agent is the agent's velocity vector.

The technical solution of the invention has the following main advantages:

The intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion uses the self-extraction capability of an autoencoding mechanism to map continuous and discrete state information of varying dimensionality, span, and units into a dimensionless feature-vector space as learnable numerical features. Each state feature of an agent is then implicitly fused, through concatenation and a deep neural network, into a comprehensive state information feature. For cluster flocking behavior, the limited perception range of each agent is fully taken into account and the agent's neighborhood feature information is interactively fused. On the basis of the fused neighborhood feature, combined with the agent's own state information feature, a deep neural network maps out the agent's control output, and the agent's motion is controlled accordingly, so that the agent cluster can produce flocking behavior that satisfies the requirements of group heading consistency and group-system stability.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of an intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to an embodiment of the present invention;

Fig. 2 is a diagram illustrating the relationship between an agent and its neighborhood according to an embodiment of the present invention;

Fig. 3 shows the structural framework and the flow of processing an agent's state information according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions, and advantages of the present invention more apparent, the technical solutions of the present invention are described clearly and completely below with reference to specific embodiments and the accompanying drawings. It should be understood that the described embodiments are merely some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The technical scheme provided by the embodiment of the invention is described in detail below with reference to the accompanying drawings.

Referring to Fig. 1, an embodiment of the present invention provides an intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion. The method is used to control the flocking motion of an agent cluster and includes the following steps:

S1, determining all neighborhood agents within the agent's perception range;

S2, converting the multi-source heterogeneous state information of the agent and of each neighborhood agent into numerical state features by using parallel deep-learning autoencoders;

S3, concatenating all numerical state features of the agent and of each neighborhood agent respectively, and fusing the concatenated numerical state features with the first preset deep neural network to obtain the comprehensive state information feature of the agent and the comprehensive state information feature of each neighborhood agent;

S4, weighting and merging the comprehensive state information features of all neighborhood agents to obtain the fused neighborhood feature corresponding to the agent;

S5, concatenating the comprehensive state information feature of the agent with the fused neighborhood feature corresponding to the agent, and, based on the concatenated feature, mapping through the second preset deep neural network to obtain the control output of the agent.

The method provided by the embodiment of the invention maps continuous and discrete state information of varying dimensionality, span, and units into a dimensionless feature-vector space as learnable numerical features by means of the self-extraction capability of an autoencoding mechanism; it then implicitly fuses each state feature of an agent, through concatenation and a deep neural network, into a comprehensive state information feature. For cluster flocking behavior it fully accounts for the agent's limited perception range and interactively fuses the agent's neighborhood feature information; on the basis of the fused neighborhood feature, combined with the agent's own state information feature, a deep neural network maps out the agent's control output, and the agent's motion is controlled accordingly. The agent cluster can thereby produce flocking behavior that satisfies the requirements of group heading consistency and group-system stability.

The following describes in detail each step and the principle of the intelligent-agent flocking behavior control method based on deep autoencoding and feature fusion according to an embodiment of the present invention.

Step S1: determine all neighborhood agents within the agent's perception range.

In an embodiment of the invention, when controlling the flocking behavior of an agent cluster, each individual agent is taken in turn as the object of consideration: all neighborhood agents within each agent's perception range are determined, the agent's control output for the next time step is obtained on the basis of the determined neighborhood agents, and the agent is controlled to move according to that output.

Referring to Fig. 2, take solving for the control output of the i-th agent A_i in the cluster as an example. Agent A_i has perception radius R_i, and the j-th agent A_j in the cluster lies within A_i's perception range. The neighborhood of the i-th agent A_i in the cluster may then be written as

N_i = { A_j | d_ij ≤ R_i, j ≠ i }

wherein d_ij represents the distance between agent A_i and agent A_j, i.e., the distance from agent A_i to the center point of agent A_j, and can be calculated by the following formula:

d_ij = ||S_i − S_j||_2

wherein S_i represents the spatial position of agent A_i, S_j represents the spatial position of agent A_j, and ||·||_2 denotes the 2-norm operator.
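As an illustrative sketch (not part of the patent text), the neighborhood set N_i and the 2-norm distance above can be computed as follows; the function name `neighborhood` and the array layout are assumptions made for illustration only.

```python
import numpy as np

def neighborhood(positions, i, R_i):
    """Return the indices j of all agents within agent i's perception radius R_i.

    positions: (N, d) array of agent positions S_1 .. S_N.
    The distance d_ij = ||S_i - S_j||_2, as in the formula above.
    """
    d = np.linalg.norm(positions - positions[i], axis=1)  # 2-norm distances
    return [j for j in range(len(positions)) if j != i and d[j] <= R_i]
```

For example, with three agents on a line at x = 0, 1, and 5, agent 0 with perception radius 2 has only agent 1 in its neighborhood.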

Step S2: convert the multi-source heterogeneous state information of the agent and of each neighborhood agent into numerical state features by using parallel deep-learning autoencoders.

In an embodiment of the invention, the multi-source heterogeneous state information of an agent includes the agent's position, velocity, acceleration, identification code, and remaining energy. The parallel deep-learning autoencoders turn this multi-source heterogeneous state information into state features that are uniform in dimension, continuously valued, and learnable, i.e., numerical state features.

Referring to Fig. 3, in an embodiment of the present invention the deep-learning autoencoder includes an encoder and a decoder, and for any agent it may be expressed in the form:

φ_i: X_i → H_fi

ρ_i: H_fi → X̂_i

wherein φ_i represents the encoder function for the agent's i-th piece of state information X_i, H_fi represents the implicit feature, i.e., the numerical state feature, generated from X_i after passing through the encoder, ρ_i represents the decoder function for X_i, and X̂_i represents the decoder output corresponding to X_i.

The implicit features that the encoders generate for all pieces of state information have the same dimensionality, which makes the subsequent feature fusion convenient; the feature dimensionality can be preset according to the actual situation, for example to m dimensions, i.e., H_fi ∈ R^m.

Optionally, the encoder may employ a four-layer fully connected neural network with the ReLU function as its nonlinear activation function, and the encoders share parameters and network structure across the agent cluster, i.e., the encoder structure and parameters for the i-th piece of state information are the same for different agents. The decoder may likewise employ a four-layer fully connected neural network, in which the first three layers use the ReLU function as the nonlinear activation function and the fourth layer produces its output by linear superposition; the decoders also share parameters and network structure across the agent cluster, i.e., the decoder structure and parameters for the i-th piece of state information are the same for different agents.
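The encoder/decoder structure described above can be sketched as a forward pass with plain numpy; the layer widths, random initialization, and the helper `mlp_forward` are illustrative assumptions, not part of the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, layers, final_linear):
    """Forward pass through a stack of fully connected layers.

    layers: list of (W, b) pairs; ReLU is applied after every layer except,
    when final_linear is True, the last one (the decoder's linear fourth layer).
    """
    for k, (W, b) in enumerate(layers):
        x = W @ x + b
        if not (final_linear and k == len(layers) - 1):
            x = relu(x)
    return x

rng = np.random.default_rng(0)
dims_enc = [3, 16, 16, 16, 8]   # 4-layer encoder: state -> m = 8-dim feature
dims_dec = [8, 16, 16, 16, 3]   # 4-layer decoder, fourth layer linear
enc = [(rng.standard_normal((o, i)) * 0.1, np.zeros(o))
       for i, o in zip(dims_enc, dims_enc[1:])]
dec = [(rng.standard_normal((o, i)) * 0.1, np.zeros(o))
       for i, o in zip(dims_dec, dims_dec[1:])]

x = np.array([1.0, -2.0, 0.5])                   # one piece of state information X_i
h = mlp_forward(x, enc, final_linear=False)      # numerical state feature H_fi
x_hat = mlp_forward(h, dec, final_linear=True)   # reconstruction X_i-hat
```

Because every encoder layer ends in ReLU, the feature H_fi is elementwise nonnegative, while the decoder's linear output layer can reproduce signed continuous state values.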

In an embodiment of the invention, the parameters of the encoder φ_i and the decoder ρ_i are optimized during a self-supervised pre-training phase by minimizing a loss function.

Specifically, the learning optimization objective for the agent's i-th piece of state information X_i may be:

min ||X_i − ρ_i(φ_i(X_i))||

where ||·|| denotes a vector-space metric.

In an embodiment of the present invention, different objective functions may be used for different kinds of feature data; for example, the mean absolute error or the mean squared error may be used for continuous data, and cross-entropy may be used for discrete data.
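The per-type objectives above can be sketched as follows; the function name `recon_loss` and the `kind` labels are illustrative assumptions.

```python
import numpy as np

def recon_loss(x, x_hat, kind):
    """Reconstruction objective per state type, as the text suggests:
    MAE or MSE for continuous data, cross-entropy for discrete (one-hot) data."""
    if kind == "mae":
        return float(np.mean(np.abs(x - x_hat)))
    if kind == "mse":
        return float(np.mean((x - x_hat) ** 2))
    if kind == "xent":  # x is a one-hot target, x_hat the raw scores (logits)
        p = np.exp(x_hat - x_hat.max())
        p /= p.sum()
        return float(-np.sum(x * np.log(p + 1e-12)))
    raise ValueError(kind)
```

A continuous quantity such as position would use "mae" or "mse", while a discrete identification code, encoded one-hot, would use "xent".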

Step S3: concatenate all numerical state features of the agent and of each neighborhood agent respectively, and fuse the concatenated numerical state features with the first preset deep neural network to obtain the comprehensive state information feature of the agent and the comprehensive state information feature of each neighborhood agent.

In an embodiment of the present invention, the first preset deep neural network may employ a three-layer fully connected neural network with the ReLU function as its nonlinear activation function.

Specifically, taking the solution for the i-th agent A_i in the cluster as an example, the comprehensive state information feature of the i-th agent A_i may be expressed as:

H_i = Ψ(concat(H^i_f1, …, H^i_fn); W, b)

and the comprehensive state information feature of a neighborhood agent A_j corresponding to the i-th agent A_i may be expressed as:

H_j = Ψ(concat(H^j_f1, …, H^j_fn); W, b)

wherein H_i represents the comprehensive state information feature of the i-th agent A_i, Ψ represents the first preset deep neural network used to fuse features, concat represents concatenation of features along the feature dimension, H^i_fk represents the numerical state feature corresponding to the k-th piece of state information X^i_k of the i-th agent A_i, W and b represent the learnable parameters of the network Ψ, H_j represents the comprehensive state information feature of the j-th agent A_j, and H^j_fk represents the numerical state feature corresponding to the k-th piece of state information X^j_k of the j-th agent A_j.

Here Ψ, W, and b are shared across the agent cluster, i.e., the structure and parameters of the first preset deep neural network are identical for different agents.
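The fusion step can be sketched as concatenation followed by a shared three-layer ReLU network; the layer widths, random initialization, and the helper `fuse_features` are illustrative assumptions standing in for the shared network Ψ.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fuse_features(h_list, layers):
    """Comprehensive state feature: concatenate all numerical state features
    along the feature dimension, then pass them through the shared fusion
    network (here a three-layer ReLU MLP sketch)."""
    z = np.concatenate(h_list)          # dimension cascade (concat)
    for W, b in layers:
        z = relu(W @ z + b)
    return z

rng = np.random.default_rng(1)
m, n_states, out = 8, 5, 8              # 5 pieces of state info, m-dim features
dims = [n_states * m, 32, 16, out]
psi = [(rng.standard_normal((o, i)) * 0.1, np.zeros(o))
       for i, o in zip(dims, dims[1:])]
H_i = fuse_features([rng.standard_normal(m) for _ in range(n_states)], psi)
```

Because `psi` is shared, the same weights are applied when fusing the features of the agent itself and of every neighborhood agent.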

Step S4: weight and merge the comprehensive state information features of all neighborhood agents to obtain the fused neighborhood feature corresponding to the agent.

In an embodiment of the present invention, when the comprehensive state information features of all neighborhood agents are weighted and merged, the weight coefficient corresponding to each neighborhood agent may be calculated from the relative distance between the neighborhood agent and the agent under consideration, together with that agent's perception range.

Specifically, taking the solution for the i-th agent A_i in the cluster as an example, the weight coefficient corresponding to neighborhood agent A_j of agent A_i may be calculated by the following formula:

wherein α_ij represents the weight coefficient corresponding to neighborhood agent A_j of agent A_i, R_i represents the perception radius of agent A_i, and d_ij represents the distance between agent A_i and agent A_j.

Further, the fused neighborhood feature corresponding to agent A_i may be calculated by the following formula:

H_Ni = Σ_{A_j ∈ N_i} α_ij H_j

wherein H_Ni represents the fused neighborhood feature corresponding to agent A_i.
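The weighted merge can be sketched as below. The source does not reproduce the exact weight formula; as an illustrative assumption only, each neighbor is weighted here by its closeness (R_i − d_ij)/R_i, normalized to sum to one, which is consistent with the text's statement that the weight depends on the relative distance and the perception radius.

```python
import numpy as np

def fused_neighborhood_feature(H_nbrs, d, R_i):
    """Weighted merge of neighborhood features H_j into H_Ni.

    Assumed weight (NOT from the source): w_j proportional to (R_i - d_ij) / R_i,
    normalized so the weights sum to 1."""
    w = np.maximum(R_i - np.asarray(d), 0.0) / R_i
    w = w / w.sum()
    return sum(wj * Hj for wj, Hj in zip(w, H_nbrs))
```

With two neighbors at equal distance, this reduces to a simple average of their comprehensive state features.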

Step S5: concatenate the comprehensive state information feature of the agent with the fused neighborhood feature corresponding to the agent, and, based on the concatenated feature, map through the second preset deep neural network to obtain the control output of the agent.

In order to control the motion of the agent under the constraints of its motion law, both the current state of the agent and the influence of its neighborhood agent set must be considered simultaneously. Therefore, in an embodiment of the present invention, when the output control quantity of the agent is obtained, the comprehensive state information feature of the agent is first concatenated with its corresponding fusion neighborhood feature, and the concatenated feature is then mapped by the second preset deep neural network to the output control quantity of the agent at the next time step.

Specifically, taking the output control quantity of the i-th agent A_i in the agent cluster as an example, the dimension concatenation of the comprehensive state information feature of the agent and its fusion neighborhood feature is performed by the following formula:

E_i = concat(F_i, H_i)

where E_i represents the fusion information obtained by concatenating the comprehensive state information feature F_i of the agent with its corresponding fusion neighborhood feature H_i, and concat represents the concatenation of features along the feature dimension.

Further, in an embodiment of the present invention, the second preset deep neural network may adopt a four-layer fully connected neural network, in which the first three layers use the ReLU function as the nonlinear activation function and the fourth layer produces its output by linear superposition. The second preset deep neural network shares its parameters and structure across the agent cluster, i.e., the same network structure and parameters are used when solving the output control quantities of different agents.
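The described four-layer fully connected control network can be sketched as follows; the layer widths and the 2-D velocity output are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

# Assumed widths: 16-dim concatenated feature -> 32 -> 32 -> 32 -> 2-D output.
sizes = [16, 32, 32, 32, 2]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def f_ctrl(e_i):
    """Four-layer fully connected network: ReLU on the first three layers,
    linear superposition (no activation) on the fourth, output layer."""
    x = e_i
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W_out, b_out = params[-1]
    return x @ W_out + b_out   # output control quantity u_i

u_i = f_ctrl(rng.standard_normal(16))
print(u_i.shape)  # (2,)
```

As with Ψ, the parameter list is shared, so every agent's control quantity is produced by the same network.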

Specifically, taking the output control quantity of the i-th agent A_i in the agent cluster as an example, the output control quantity of the i-th agent A_i may be expressed as:

u_i = f_ctrl(concat(F_i, H_i); W_out, b_out)

where u_i represents the output control quantity of the i-th agent A_i, f_ctrl represents the second preset deep neural network, F_i and H_i represent the comprehensive state information feature and fusion neighborhood feature of agent A_i respectively, and W_out and b_out represent the learnable parameters of the network f_ctrl.

Further, in an embodiment of the present invention, to facilitate motion control of the agent cluster, a velocity vector may be used as the control quantity. In this case, the output control quantity obtained by the above process is the velocity vector of the agent, and the motion of the agent is controlled according to the obtained velocity vector.

When the output control quantity of the agent is its velocity vector, taking the time interval Δt as a single control period and the control of the i-th agent A_i in the agent cluster as an example, the motion of agent A_i within one control period can be expressed as:

x_i(t + Δt) = x_i(t) + V_i · Δt

where x_i(t + Δt) represents the position vector of agent A_i at time t + Δt, x_i(t) represents the position vector of agent A_i at time t, and V_i represents the velocity vector of agent A_i, with V_i = u_i.
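Taking the velocity vector as the control quantity, the single-period position update above can be sketched directly:

```python
import numpy as np

def step(x_t, u_i, dt):
    """Advance agent A_i over one control period: x(t+dt) = x(t) + V_i*dt,
    where the velocity V_i equals the network's output control quantity u_i."""
    return x_t + u_i * dt

x = np.array([0.0, 0.0])
v = np.array([1.0, -2.0])   # output control quantity = velocity vector
x_next = step(x, v, 0.1)
print(x_next)  # [ 0.1 -0.2]
```

Each agent repeats this update every Δt, with a fresh control quantity computed from its current neighborhood.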

In an embodiment of the present invention, for the parameter optimization training of the deep learning automatic coding machine and the deep neural networks, the motion data of the classical Reynolds flocking model may be used as the training set (X_train, V_train). X_train is used for the self-supervised pre-training of the deep learning automatic coding machine with the learning optimization objective function given above; V_train is used for the training of the overall control model other than the self-supervised pre-training, including the training of the deep neural network parameters and the fine-tuning of the pre-trained encoder parameters in the control output part. The corresponding training objective function may be:

CtrlLoss = ||V − V_train||

where V represents the output control quantity of the agent, which in an embodiment of the present invention is the velocity vector.
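The control training objective above can be sketched as a norm of the velocity error against the Reynolds-model reference data; the choice of the Euclidean (Frobenius) norm below is an assumption, since the patent does not specify which norm is used:

```python
import numpy as np

def ctrl_loss(V, V_train):
    """CtrlLoss = ||V - V_train||: distance between the network's output
    velocity vectors and the Reynolds flocking model's reference velocities."""
    return np.linalg.norm(V - V_train)

# Two agents, 2-D velocities: predicted vs. reference from the training set.
V = np.array([[1.0, 0.0], [0.0, 1.0]])
V_ref = np.array([[1.0, 0.0], [0.0, 0.0]])
print(ctrl_loss(V, V_ref))  # 1.0
```

Minimizing this loss drives the learned controller to reproduce the flocking behavior of the Reynolds model on the training trajectories.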

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. In addition, "front", "rear", "left", "right", "upper" and "lower" in this document are referred to the placement states shown in the drawings.

Finally, it should be noted that: the above examples are only for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
