Augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning

Document No.: 35566 | Published: 2021-09-24

Note: This technique, "Augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning", was created by Chen Jing, Zhang Junrui and Zhou Junyan on 2021-05-25. Summary: The invention provides a method for realizing a multi-agent confrontation simulation environment under augmented reality. A deep reinforcement learning network combined with curriculum learning predicts the behavior of each agent and makes decisions, and the trained reinforcement learning agent models are then migrated into the augmented reality environment. This solves the poor human-computer interaction experience caused by a single, fixed cooperation strategy of the virtual multi-agents in an augmented reality confrontation simulation environment, and makes the cooperative confrontation strategies between the real user and the virtual multi-agents flexible and varied.

1. An augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning is characterized by comprising the following steps:

step 1: in an off-line stage, modeling is carried out on a real scene, and a dense three-dimensional point cloud map is constructed and triangulated;

step 2: building a virtual simulated confrontation environment modeled on the real scene and training the multiple agents in it, comprising the following steps:

(1) imitating the real scene, building a virtual simulated confrontation scene and placing a plurality of agents in it; the agents are divided into two mutually opposing teams, both of which can move freely in the scene, and the task objective of each team is to cooperate to destroy the other team, thereby forming a simulated confrontation environment;

(2) setting a policy model for each of the two opposing sides, the agents within the same team sharing one set of policy model parameters;

(3) completing the state input, reward setting and action output of the agents by using the ML-Agents component built into the three-dimensional rendering engine Unity3D;

(4) the agents training the policy model in a repeated loop according to the continuously collected state input, reward information and action output;

step 3: after training is finished, importing the trained agent policy models in the online stage, importing the real-scene model constructed in step 1 into the three-dimensional rendering engine, and adding a rigid-body component to it; then rendering the equipment at the corresponding positions in the real scene, realizing the rendering and drawing of the subsequent augmented reality simulated confrontation environment;

step 4: acquiring the user's six-degree-of-freedom global pose in real time and transmitting it to the virtual camera in the three-dimensional rendering engine;

step 5: importing the pictures captured by the real camera and rendering them in the real scene;

step 6: hiding the real-scene model constructed in step 1 while retaining the rigid-body component for collision detection, finally forming an augmented reality simulated confrontation environment in which the user interacts with the agents in the real scene and cooperates with them to complete the confrontation task.

2. The augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning as claimed in claim 1, wherein in step 2 the training process adopts curriculum learning, dividing the scene complexity into three levels (simple, medium and difficult), and the policy model is trained with the three complexity levels in sequence.

3. The augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning as claimed in claim 1, wherein in step 3, the equipment is rendered at the corresponding position in the real scene by using the following conversion formula among the world coordinate system, the camera coordinate system, the image coordinate system and the pixel coordinate system:

in the above formula, (ε, η, δ) is the three-dimensional position of the agent in the world coordinate system; T_CW represents the transformation from the world coordinate system to the camera coordinate system; the camera model determines the conversion between the camera coordinate system and the image coordinate system; for the perspective projection model, θ is the vertical field-of-view angle of the camera, n is the distance from the camera center to the near clipping plane, f is the distance from the camera center to the far clipping plane, Aspect is the aspect ratio of the projected image, K is the intrinsic parameter matrix of the camera, and Z_c and γ are depth-dependent.

4. The augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning as claimed in claim 1, characterized in that a real-time tracking and positioning algorithm is adopted to obtain the global pose of six degrees of freedom of the user in real time.

5. The augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning as claimed in claim 1, wherein a Python-side process receives the state input and reward information of the agents and trains the model.

6. The augmented reality multi-agent cooperative confrontation realization method based on reinforcement learning as claimed in claim 1, wherein in step 1, a real scene is modeled by a three-dimensional laser scanner.

7. The augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning as claimed in claim 1, wherein the state input includes: the attributes and directions of surrounding agents and the agent's own state information, represented in vector form; and the image information and position information of friendly agents.

Technical Field

The invention belongs to the technical field of information, and particularly relates to an augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning.

Background

In recent years, with continuous breakthroughs in Artificial Intelligence (AI) technologies and the maturing of related algorithms, AI agents have gradually entered many fields and shown good results in intelligent robotics, unmanned vehicles, virtual reality, augmented reality and the like. In an augmented reality simulated confrontation environment, a good virtual-real interaction experience has become an important aspect requiring optimization, and the intelligence of the virtual targets is one of the keys to improving it. In current augmented reality simulated confrontation environments, the cooperative confrontation interaction between virtual multi-agents and real users is usually executed through preset behavior rules for the virtual targets, such as the commonly used state machines and behavior trees.

To improve the intelligence of the multi-agents in the augmented reality simulated confrontation environment, a deep reinforcement learning algorithm can be used to train the multi-agents in the environment, so that they autonomously learn intelligent cooperation strategies, complete virtual-real interaction with the user in the augmented reality simulation environment, and realize intelligent cooperative confrontation behavior. Deep reinforcement learning is an emerging technology in the field of artificial intelligence: it takes reinforcement learning as its cornerstone and uses the feature-extraction capability of deep learning to make up for several shortcomings of reinforcement learning, forming a complementary, end-to-end framework for autonomously learning strategies from perception to decision. Compared with multi-agent behavior realized by traditional methods, it provides a better virtual-real confrontation interaction experience and achieves a better cooperation effect.

Meanwhile, to avoid the excessive cost of deep reinforcement learning's continuous trial and error in a real environment, training is generally completed in a purpose-built virtual simulation environment and the result is then migrated to the real environment. Unity3D, a widely used professional game engine, can be used to build the augmented reality simulation environment. The virtual multi-agents are trained with the reinforcement learning algorithm, and the resulting multi-agent cooperation strategy model is migrated into the augmented reality environment, improving the intelligence of the multi-agents in the augmented reality simulated confrontation environment and the interaction experience. The method can be used to construct intelligent confrontation simulation environments such as military simulation training and augmented reality games.

Disclosure of Invention

In view of this, the present invention provides an augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning, which can solve the problems of fixed multi-agent behavior strategies, poor cooperative intelligence and poor virtual-real interaction experience in an augmented reality simulated confrontation environment.

An augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning comprises the following steps:

step 1: in an off-line stage, modeling is carried out on a real scene, and a dense three-dimensional point cloud map is constructed and triangulated;

step 2: building a virtual simulated confrontation environment modeled on the real scene and training the multiple agents in it, comprising the following steps:

(1) imitating the real scene, building a virtual simulated confrontation scene and placing a plurality of agents in it; the agents are divided into two mutually opposing teams, both of which can move freely in the scene, and the task objective of each team is to cooperate to destroy the other team, thereby forming a simulated confrontation environment;

(2) setting a policy model for each of the two opposing sides, the agents within the same team sharing one set of policy model parameters;

(3) completing the state input, reward setting and action output of the agents by using the ML-Agents component built into the three-dimensional rendering engine Unity3D;

(4) the agents training the policy model in a repeated loop according to the continuously collected state input, reward information and action output;

step 3: after training is finished, importing the trained agent policy models in the online stage, importing the real-scene model constructed in step 1 into the three-dimensional rendering engine, and adding a rigid-body component to it; then rendering the equipment at the corresponding positions in the real scene, realizing the rendering and drawing of the subsequent augmented reality simulated confrontation environment;

step 4: acquiring the user's six-degree-of-freedom global pose in real time and transmitting it to the virtual camera in the three-dimensional rendering engine;

step 5: importing the pictures captured by the real camera and rendering them in the real scene;

step 6: hiding the real-scene model constructed in step 1 while retaining the rigid-body component for collision detection, finally forming an augmented reality simulated confrontation environment in which the user interacts with the agents in the real scene and cooperates with them to complete the confrontation task.

Furthermore, in step 2 the training process adopts curriculum learning, dividing the scene complexity into three levels (simple, medium and difficult), and the policy model is trained with the three complexity levels in sequence.

Preferably, in step 3, the equipment is rendered at a corresponding position in the real scene by using the following conversion formula among the world coordinate system, the camera coordinate system, the image coordinate system and the pixel coordinate system:

in the above formula, (ε, η, δ) is the three-dimensional position of the agent in the world coordinate system; T_CW represents the transformation from the world coordinate system to the camera coordinate system; the camera model determines the conversion between the camera coordinate system and the image coordinate system; for the perspective projection model, θ is the vertical field-of-view angle of the camera, n is the distance from the camera center to the near clipping plane, f is the distance from the camera center to the far clipping plane, Aspect is the aspect ratio of the projected image, K is the intrinsic parameter matrix of the camera, and Z_c and γ are depth-dependent.

Preferably, a real-time tracking and positioning algorithm is adopted to obtain the six-degree-of-freedom global pose of the user in real time.

Preferably, a Python-side process receives the state input and reward information of the agents and trains the model.

Preferably, in step 1, a three-dimensional laser scanner is used to model a real scene.

Preferably, the state input includes: the attributes and directions of surrounding agents and the agent's own state information, represented in vector form; and the image information and position information of friendly agents.

The invention has the following beneficial effects:

The method provides a multi-agent confrontation simulation environment under augmented reality: a deep reinforcement learning network combined with curriculum learning predicts the behavior of each agent and makes decisions, and the trained reinforcement learning agent models are then migrated into the augmented reality environment. This solves the poor human-computer interaction experience caused by a single, fixed virtual multi-agent cooperation strategy in augmented reality confrontation simulation environments, and makes the cooperative confrontation strategies between the real user and the virtual multi-agents flexible and varied.

Drawings

FIG. 1 is the real-scene map model obtained by three-dimensional laser scanning and triangulation according to the present invention;

FIG. 2 is the constructed virtual simulation environment;

FIG. 3 is a flow chart of the method of the present invention;

FIG. 4 is a training flow diagram;

FIG. 5 is a schematic diagram of a training process;

FIG. 6 is a diagram of the effect of training;

fig. 7 is a diagram of real scene effects.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

An augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning comprises the following basic implementation processes:

step 1: in an off-line stage, a three-dimensional laser scanner is used for modeling a real scene, and a dense three-dimensional point cloud map is constructed and triangulated.

Step 2: the constructed model is imported into a three-dimensional rendering engine, and a rigid-body component is added to realize collision detection for subsequent reinforcement learning training.

Step 3: because the real-scene model is too large, a virtual simulated confrontation environment imitating the real scene is built to train the multi-agents and improve training speed, as follows:

(1) Imitating the real scene, a virtual simulated confrontation scene is built from model prefabs; the agents are divided into red and blue teams, both of which can move freely in the scene, and the task objective of each team is to cooperate to destroy the opposing tanks, forming a simulated confrontation environment.

(2) Two policy models are set up; the two teams each use their own policy model, and the agents within a team share one set of model parameters. The training process collects observation-state input, combines it with reward information for the different tasks, and learns, through the deep reinforcement learning network, action outputs that are more effective and earn higher rewards. Scene management controls round start, round progress and round termination.

(3) The reinforcement learning module uses the ML-Agents component (a reinforcement learning plug-in) of the three-dimensional rendering engine Unity3D to complete state input, reward setting and action output. Each agent's perception of the environment is divided into vector input and image input. The vector state input is obtained by ray detection of the attributes and directions of surrounding targets, plus the agent's own state information; the attributes of surrounding targets are distinguished by different labels. The image input is the masked image containing only the friendly agents in the scene, so that the state input includes both the observed image state and the friendly positions, realizing state-input communication among the multiple agents.

(4) Each agent obtains its own state input by perceiving the environment in the scene; the Python side trains the model from the received agent states combined with the set reward information and returns the action values to the agents; the actions are executed in the Unity3D scene, new state input is collected, and the loop repeats, continuously improving the reward obtained after each decision.
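The perceive-decide-act loop described in (4) can be sketched as follows. The `ToyEnv` environment and the trivial policy are stand-ins invented for illustration; in the actual system the states come from the Unity3D scene through ML-Agents and the policy is the deep reinforcement learning network:

```python
class ToyEnv:
    """Illustrative stand-in for the Unity3D scene served through ML-Agents:
    it emits a state vector, a reward, and a round-termination flag."""
    def reset(self):
        self.t = 0
        return [0.0] * 4                        # state input (vector observation)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else -0.1   # reward information for the task
        done = self.t >= 10                     # round termination
        return [float(self.t)] * 4, reward, done

def run_episode(env, policy):
    """One cyclic pass of the loop: state -> action output -> reward -> new state.
    A real trainer would also update the policy from the collected rewards."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)                  # action value returned to the agent
        state, reward, done = env.step(action)
        total += reward                         # feedback driving the policy update
    return total
```

For example, a policy that always outputs action 1 collects a return of 10.0 over the ten-step round.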

(5) The training process adopts curriculum learning, dividing the scene complexity into three levels: simple, medium and difficult. In the early stage of training, a simple scene is used for preliminary training of agent behavior; after the decision reward converges, the scene complexity is raised so that the agents learn further on the basis of the existing policy model, finally yielding the reinforcement learning model.

Step 4: after training is finished, the trained agent models are imported in the online stage, and the virtual intelligent tanks are rendered at the corresponding positions in the real scene using conversion formula (1) among the world, camera, image and pixel coordinate systems, realizing the rendering and drawing of the subsequent augmented reality simulated confrontation environment.

In formula (1), (ε, η, δ) is the three-dimensional position of the agent in the world coordinate system. T_CW represents the transformation from the world coordinate system to the camera coordinate system and is updated in real time by the tracking and positioning algorithm. The camera model determines the transformation between the camera coordinate system and the image coordinate system. For the perspective projection model, θ is the vertical field-of-view angle of the camera, n is the distance from the camera center to the near clipping plane, f is the distance from the camera center to the far clipping plane, Aspect is the aspect ratio of the projected image, and K is the intrinsic parameter matrix of the camera, determined by the camera itself. Z_c and γ are depth-dependent, and 1/Z_c is a scale factor.
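Formula (1) itself does not survive in this text version. A consistent reconstruction from the symbol definitions above, assuming the standard pinhole camera and a Unity-style perspective projection (the exact matrix layout in the patent figure may differ), is:

```latex
% world -> camera: T_CW applied to the agent position (eps, eta, delta)
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
  = T_{CW}\begin{bmatrix} \varepsilon \\ \eta \\ \delta \\ 1 \end{bmatrix},
\qquad
% camera -> image: perspective projection with vertical FOV theta,
% near/far planes n, f and aspect ratio Aspect; its third row yields
% the depth-related term gamma
P(\theta, n, f, Aspect) =
\begin{bmatrix}
  \tfrac{1}{Aspect\,\tan(\theta/2)} & 0 & 0 & 0 \\
  0 & \tfrac{1}{\tan(\theta/2)} & 0 & 0 \\
  0 & 0 & -\tfrac{f+n}{f-n} & -\tfrac{2fn}{f-n} \\
  0 & 0 & -1 & 0
\end{bmatrix},
\qquad
% image -> pixel: intrinsics K, scaled by the factor 1/Z_c
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = \frac{1}{Z_c}\, K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
```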

Step 5: a real-time tracking and positioning algorithm acquires the user's six-degree-of-freedom global pose in real time and transmits it to the virtual camera in the three-dimensional rendering engine.

Step 6: the pictures captured by the real camera are imported and rendered in the scene.

Step 7: the off-line three-dimensional map model constructed in step 1 is hidden, retaining only collision detection, finally forming an augmented reality simulated confrontation environment in which the user interacts with the virtual agents in the real scene. The virtual multi-agents make intelligent decisions with the reinforcement learning model, so the user can move and attack in the real scene while cooperating with the agents to complete the confrontation task.

Example:

an augmented reality multi-agent cooperative confrontation implementation method based on reinforcement learning comprises the following specific steps:

Step 1: off-line map construction. A FARO laser scanner is used to construct a dense three-dimensional point cloud map model of the real scene off-line. The experimental scene is scanned from multiple sites, and the three-dimensional map data of each site is converted, through a coordinate-system transformation, into the coordinate system whose origin is the first scanning site. After scanning, the RGBD panorama of 3D points and color information provided by the FARO scan result is converted into a triangular-mesh map by a greedy projection triangulation algorithm; the final off-line map model is shown in fig. 1.
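The per-site alignment into the first site's coordinate system is a rigid-transform composition. A minimal numpy sketch, where the 4x4 pose matrices are hypothetical placeholders for the scanner-reported site poses:

```python
import numpy as np

def align_to_first_site(points_site, T_world_site, T_world_first):
    """Express one site's scan points in the coordinate system whose origin
    is the first scanning site: p_first = inv(T_world_first) @ T_world_site @ p.
    points_site is an (N, 3) array; the T matrices are 4x4 homogeneous poses."""
    T = np.linalg.inv(T_world_first) @ T_world_site
    homo = np.c_[points_site, np.ones(len(points_site))]  # to homogeneous coords
    return (homo @ T.T)[:, :3]
```

For instance, with the first site at the world origin and a second site translated by (1, 0, 0), that site's local origin maps to (1, 0, 0) in the first site's frame.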

Step 2: and importing the constructed real scene map model into a Unity3D three-dimensional rendering engine, and adding rigid bodies to the landform, the building, the object and the like existing in the real scene by using a collision detection mode based on the bounding box in Unity3D so as to realize collision interaction between the virtual and real objects.

Step 3: because the real-scene model is too large, a virtual simulated confrontation environment imitating the real scene is built to train the multi-agents and improve training speed, as follows:

(1) The virtual simulated confrontation environment contains the ground, buildings, obstacles, trees and the intelligent tanks of the red and blue teams, as shown in fig. 2. Each team has 3 agents; both teams can move freely in the scene, and the task objective of each team is to cooperate to destroy the opposing tanks, forming a simulated confrontation environment. At the start of a round, the initial life value of every tank on both teams is set to 100. A tank's life value decreases when it is hit or collides. The damage caused by a shell is physically simulated according to the real situation and depends on the distance between the shell's impact point and the target, so damage values vary. When a tank's life value falls to 0 or below, it is destroyed and disappears from the scene. When one team's tanks are completely destroyed, the other team wins the confrontation; the round then restarts immediately, with the scene and all tank states reset.
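The round rules above can be summarized in a small sketch; the linear damage falloff is an invented stand-in for the physical shell simulation described in the text:

```python
class Tank:
    def __init__(self):
        self.hp = 100.0          # initial life value at round start

    def take_shell(self, dist):
        """Apply shell damage that falls off with the distance between the
        impact point and the tank (illustrative linear falloff).
        Returns True when the tank is destroyed (life value <= 0)."""
        damage = max(0.0, 30.0 - 10.0 * dist)
        self.hp -= damage
        return self.hp <= 0.0

def round_over(red, blue):
    """The round ends when one team's tanks are completely destroyed."""
    return all(t.hp <= 0 for t in red) or all(t.hp <= 0 for t in blue)
```

A direct hit (distance 0) removes 30 life points here, so four direct hits destroy a fresh tank.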

(2) The reinforcement learning part is built with the ML-Agents component of the Unity3D rendering engine. The red and blue teams are trained as two models simultaneously in the scene, and each team shares one set of model parameters, which encourages the two teams to develop more cooperation and confrontation strategies.

(3) To obtain more effective observation information, the state input is divided into a vector part and an image part. The vector input is realized by ray detection emitted from the tank: rays of length 100 are cast from beneath the tank at the angles {0, 30, 50, 70, 80, 85, 90, 95, 100, 110, 130, 150, 180, 220, 270, 320}, and a ray perceives an object through the label it hits; the perceivable objects are enemy tanks, friendly tanks, obstacles and the ground. If an object is detected at a given angle, the corresponding position in the state-input vector is set to 1 and the distance relative to the ray length 100 is output. The vector input also contains the tank's own state information, including its life value and velocity. The image input serves as an important way of sharing friendly information: a mask added to the camera output retains and displays only the friendly-tank layer, effectively conveying the current friendly tank positions and survival states; this image is then fed into the training network as state input.
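A minimal sketch of how the ray observations might be packed into the state vector, assuming one one-hot tag slot plus a hit flag and a normalized distance per angle (the exact layout of ML-Agents ray sensors differs):

```python
ANGLES = [0, 30, 50, 70, 80, 85, 90, 95, 100, 110, 130, 150, 180, 220, 270, 320]
TAGS = ["enemy", "friend", "obstacle", "ground"]   # labels set on scene objects
RAY_LENGTH = 100.0

def encode_rays(hits):
    """Build the vector observation from ray hits. `hits` maps an angle to a
    (tag, distance) pair for rays that hit something; per angle the slot is
    [one-hot tag..., hit flag, distance / ray length]."""
    obs = []
    for a in ANGLES:
        slot = [0.0] * (len(TAGS) + 2)
        if a in hits:
            tag, dist = hits[a]
            slot[TAGS.index(tag)] = 1.0        # attribute set to 1 when detected
            slot[-2] = 1.0                     # hit flag
            slot[-1] = dist / RAY_LENGTH       # distance relative to length 100
        obs.extend(slot)
    return obs
```

With 16 angles and 6 values per angle, the ray part of the state vector has 96 entries.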

(4) The action-space output comprises movement, rotation and whether to attack, designed as in the following table:

Action output   Move         Rotate        Attack
-1              Backward     Turn left     -
0               Stationary   No turn       No attack
1               Forward      Turn right    Attack
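The three discrete action branches in the table can be decoded as in the following sketch (the command names are illustrative):

```python
MOVE = {-1: "backward", 0: "stationary", 1: "forward"}
ROTATE = {-1: "turn left", 0: "no turn", 1: "turn right"}
ATTACK = {0: "hold fire", 1: "fire"}

def decode_action(move, rotate, attack):
    """Translate the three discrete branches (move, rotate, attack) emitted
    by the policy network into tank commands."""
    return MOVE[move], ROTATE[rotate], ATTACK[attack]
```

For example, the branch values (1, -1, 1) mean: drive forward, turn left, and fire in the same step.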

(5) The PPO (proximal policy optimization) algorithm is used for training. The state information from the tank's observations is the input; combined with the reward settings, the agent autonomously learns to find the optimal strategy for completing the task and obtaining a higher reward. The algorithm network returns the action output to the tank; after executing it, the tank collects new state information, and the loop repeats until the final policy model is obtained. The reward settings are as follows:

Intelligent tank event        Reward
Destroying an enemy tank      +5
Hitting a friendly tank       -1.5
Missed shot                   -0.01
Each time step                -0.00005
Being destroyed               -0.5
Moving forward                +0.0005
Colliding with an obstacle    -0.05
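The reward settings can be collected into a table-driven sketch; the event keys are illustrative names for the rows above:

```python
REWARDS = {
    "destroy_enemy": +5.0,
    "hit_friendly":  -1.5,
    "missed_shot":   -0.01,
    "per_step":      -0.00005,   # small time penalty applied every step
    "destroyed":     -0.5,
    "move_forward":  +0.0005,
    "hit_obstacle":  -0.05,
}

def episode_reward(events, steps):
    """Sum the rewards of one round's events plus the per-step time cost."""
    return sum(REWARDS[e] for e in events) + steps * REWARDS["per_step"]
```

For instance, destroying one enemy and wasting one shot over a 100-step round yields 5 - 0.01 - 0.005 = 4.985.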

The training parameter settings are shown in the following table:

(6) The training process adopts curriculum learning, dividing the scene complexity into three levels: simple, medium and difficult. The scenes are set as in the following table:

Scene complexity   Number of obstacles   Scene size
Simple             2                     2*2
Medium             10                    10*10
Difficult          50                    50*50

The first 1/3 of the total rounds use the simple scene for preliminary training of agent behavior. After the decision reward converges, the scene complexity is raised and the three complexities appear randomly, so the agents learn further on the basis of the existing policy model. When the reward converges again, training switches to the difficult scene only and continues until the total number of steps is completed, yielding the final cooperative confrontation strategy model. This curriculum learning scheme effectively improves the convergence speed and final training effect of the multi-agent models.
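The three-stage schedule can be sketched as a small lesson selector; the convergence counter is an assumed simplification of the reward-convergence test described above:

```python
import random

LESSONS = [
    {"name": "simple",    "obstacles": 2,  "size": (2, 2)},
    {"name": "medium",    "obstacles": 10, "size": (10, 10)},
    {"name": "difficult", "obstacles": 50, "size": (50, 50)},
]

def select_lesson(progress, convergences, rng=random):
    """Pick the scene for the next round: the simple scene during the first
    1/3 of rounds, a randomly chosen complexity after the first reward
    convergence, and only the difficult scene after the second convergence."""
    if convergences >= 2:
        return LESSONS[2]
    if progress < 1 / 3 and convergences == 0:
        return LESSONS[0]
    return rng.choice(LESSONS)
```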

Step 4: the trained cooperative confrontation strategy model is imported and the virtual intelligent tanks are rendered into the real scene, completing the augmented reality simulated confrontation environment. First, a virtual intelligent tank target in the world coordinate system must be converted into the pixel coordinate system. The conversion uses the following formula:

Here (ε, η, δ) is the three-dimensional position of the agent in the world coordinate system. T_CW represents the transformation from the world coordinate system to the camera coordinate system and is updated in real time by the tracking and positioning algorithm. The camera model determines the transformation between the camera coordinate system and the image coordinate system. In Unity3D the camera model is a perspective projection model, for which θ is the vertical field-of-view angle of the camera, n is the distance from the camera center to the near clipping plane, f is the distance from the camera center to the far clipping plane, Aspect is the aspect ratio of the projected image, and K is the intrinsic parameter matrix of the camera; Z_c and γ are depth-dependent. Through this coordinate transformation the virtual intelligent tank can be rendered at the corresponding position in the real scene.
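The formula itself is not reproduced in this text version; the following numpy sketch implements the transformation chain under standard pinhole and Unity-style projection assumptions (handedness and pixel conventions may differ from the patent figure), with illustrative intrinsics:

```python
import numpy as np

def perspective_matrix(theta_deg, aspect, n, f):
    """Unity-style perspective projection built from the vertical FOV theta,
    aspect ratio, and near/far clipping distances; the third row produces
    the depth-related term gamma."""
    t = np.tan(np.radians(theta_deg) / 2.0)
    return np.array([
        [1.0 / (aspect * t), 0.0, 0.0, 0.0],
        [0.0, 1.0 / t, 0.0, 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def world_to_pixel(p_world, T_CW, K):
    """Map an agent position (eps, eta, delta) to pixel coordinates (u, v):
    world -> camera via T_CW, then camera -> pixel via the intrinsics K,
    scaled by 1/Z_c (the depth-dependent scale factor)."""
    p_cam = T_CW @ np.append(p_world, 1.0)   # camera-frame coordinates
    Z_c = p_cam[2]
    uvw = K @ p_cam[:3] / Z_c                # 1/Z_c is the scale factor
    return uvw[:2]
```

With illustrative intrinsics (focal length 500, principal point (320, 240)) and the identity pose, a point 2 units in front of the camera projects to the image center (320, 240).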

Step 5: a multi-sensor-fusion real-time tracking and positioning algorithm acquires the user's six-degree-of-freedom global pose and transmits it to the virtual camera in Unity3D.
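Driving the virtual camera from the tracked pose amounts to inverting the camera-to-world pose to obtain T_CW. A numpy sketch, with the rotation-plus-translation pose representation assumed:

```python
import numpy as np

def pose_to_view(R_wc, t_wc):
    """Convert the tracked 6-DoF global pose (camera-to-world rotation R_wc
    and translation t_wc) into the world-to-camera matrix T_CW that the
    virtual camera uses: T_CW = inv([R_wc | t_wc])."""
    T_wc = np.eye(4)
    T_wc[:3, :3] = R_wc
    T_wc[:3, 3] = t_wc
    return np.linalg.inv(T_wc)
```

For example, a camera sitting at (0, 0, 5) with identity orientation yields a view matrix that translates world points by (0, 0, -5).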

Step 6: the picture captured by the real camera is imported and rendered in the scene, serving as the user's operating object for virtual-real interaction with the virtual tanks and the scene.

Step 7: the rendering-channel order of the real-scene model, the real-time video stream and the virtual tank objects in Unity3D is changed, and the rendering order is finally adjusted according to their depth relationship, so that the three-dimensional map model generated in the off-line stage of step 1 is hidden, finally forming an augmented reality simulated confrontation environment in which the virtual tanks and the real environment are fused. In the real scene the user can then operate a real tank in cooperation with the virtual intelligent tanks to complete the confrontation task; meanwhile the virtual intelligent tanks have the intelligence to make better decisions as the environment changes, and cooperation strategies exist among the multi-agent tanks, making the cooperative confrontation strategies between the real user and the virtual multi-agents more flexible and varied and improving the interaction experience and the simulated confrontation effect.

Therefore, the augmented reality multi-agent cooperative confrontation environment based on reinforcement learning is realized.

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
