DQN-based intelligent confrontation behavior realization method under augmented reality condition


Note: This technique, "DQN-based intelligent agent confrontation behavior realization method under augmented reality conditions", was designed and created by 陈靖, 周俊研 and 缪远东 on 2021-01-28 (published 2021-05-14). Its main content is as follows: The invention provides a DQN-based method for realizing agent confrontation behavior under augmented reality conditions. The behavior of each intelligent agent is predicted using the strong feature extraction capability and reinforcement learning decision capability of a DQN deep reinforcement learning network, and the agents, which obtain their actions at each moment from the trained DQN network, are then transferred into the augmented reality environment. The generated virtual agents therefore have higher intelligence, which solves the problem that the behavior of virtual confrontation agents in augmented reality games is stiff and makes the confrontation behavior of the virtual agents more flexible.

1. A DQN-based method for realizing agent confrontation behavior under augmented reality conditions, characterized in that the intelligent agents are divided into at least two camps, and each intelligent agent uses a trained DQN network to obtain its action at each moment so as to realize confrontation; the method for training the DQN network comprises the following steps:

S1: scanning a real scene with a laser scanner to acquire a three-dimensional point cloud map corresponding to the real scene, triangulating the three-dimensional point cloud map, and importing the resulting mesh into a three-dimensional scene rendering engine to obtain a real scene model;

S2: distributing the intelligent agents in the real scene model, wherein each intelligent agent acquires information about its own surrounding environment by means of ray detection;

S3: forming, for each intelligent agent, a state vector from the surrounding environment information, life value and current position corresponding to the current moment, and inputting the state vectors into the DQN network, which generates a corresponding Q value for each possible action of each intelligent agent at the next moment according to the state vector;

S4: determining the action of each intelligent agent at the next moment from the Q values corresponding to its possible actions at the next moment, based on an ε-greedy strategy;

S5: after each intelligent agent executes the action determined in step S4, acquiring the reward value and life value corresponding to the current moment of each intelligent agent, and adding the reward value corresponding to the current moment of each intelligent agent to the total reward score of its camp; meanwhile, updating the DQN network by a gradient descent method, judging whether the life value of each intelligent agent is greater than zero, and proceeding to step S6 for the intelligent agents whose life values are greater than zero;

S6: after the intelligent agents whose life values are greater than zero acquire their surrounding environment information, executing steps S3-S5 again with the updated DQN network, until every intelligent agent in one of the camps has a life value no greater than zero;

S7: judging whether the total reward score corresponding to each camp has stabilized within a set range; if not, proceeding to step S8; if so, taking the current DQN network as the finally trained DQN network;

S8: restoring the life values of the intelligent agents of each camp, and executing steps S2-S7 again until the total reward score corresponding to each camp stabilizes within the set range.

2. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein determining, in step S4, the action of each intelligent agent at the next moment from the Q values of its possible actions at the next moment based on the ε-greedy strategy specifically comprises:

generating a random number random and judging whether it is smaller than a set value ε; if so, each intelligent agent takes the action corresponding to its maximum Q value as the action at the next moment; if not, each intelligent agent randomly selects one action from all possible actions as the action at the next moment.

3. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein, if the number of mesh patches of the three-dimensional dense map obtained by triangulating the three-dimensional point cloud map in step S1 exceeds a set value, a simplified virtual confrontation simulation model is built in Unity3D and used as the real scene model.

4. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein acquiring, in step S2, the information about its own surrounding environment by ray detection specifically comprises: taking each intelligent agent in turn as the current intelligent agent and executing the following steps:

selecting at least three angular directions within a 360° panoramic range, and using ray detection to sense, in each angular direction, whether there is an obstacle, an own-side intelligent agent, an opposite-side intelligent agent, or ground in front of the current intelligent agent, wherein different types of obstacles, own-side intelligent agents, opposite-side intelligent agents and the ground carry different labels;

and splicing the environment information {label, distance} of the current intelligent agent in each angular direction into the surrounding environment information, wherein the distance in the environment information represents the distance between the current intelligent agent and the obstacle, own-side intelligent agent, opposite-side intelligent agent or ground represented by the label in the environment information.

5. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 4, wherein the angular directions selected within the 360° panoramic range are {0, 30, 50, 70, 80, 85, 90, 95, 100, 110, 130, 150, 180, 220, 270, 320}, respectively.

6. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein the action of an intelligent agent comprises three sub-states, namely a movement state, a rotation state and an attack state, wherein the movement types include moving backward, moving forward and remaining stationary, the rotation types include rotating left, rotating right and keeping the direction unchanged, and the attack types include attacking and not attacking, so that the number of actions each intelligent agent may take at the next moment is 3 × 3 × 2 = 18.

7. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein the reward value corresponding to each intelligent agent at the current moment in step S5 is obtained as follows:

if the intelligent agent hits an opposite-side intelligent agent, the reward value is 1; if the intelligent agent hits an own-side intelligent agent, the reward value is -1; if the life value of the intelligent agent is not greater than zero, the reward value is -0.5; if the intelligent agent moves forward, the reward value is 0.01; if the intelligent agent hits an obstacle, the reward value is -0.02.

8. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein the initial life value of each intelligent agent is 100, and the life value corresponding to the current moment of each intelligent agent in step S5 is obtained as follows:

if the intelligent agent is attacked by another intelligent agent, its life value is reduced by 30.

9. The DQN-based method for realizing agent confrontation behavior under augmented reality conditions according to claim 1, wherein a multi-sensor fusion tracking and positioning algorithm is adopted to obtain the six-degree-of-freedom global pose of the human player in real time, and the pose is transmitted to a virtual camera in the three-dimensional scene rendering engine; meanwhile, the real scene video captured by a real camera, together with the intelligent agents that obtain their actions at each moment from the trained DQN network, is imported into the rendering interface of the virtual camera in the three-dimensional scene rendering engine, thereby obtaining a confrontation simulation environment in which the intelligent agents and the real environment are fused.

Technical Field

The invention belongs to the technical field of information, and particularly relates to an intelligent confrontation behavior realization method based on DQN under an augmented reality condition.

Background

With the progress of the times, internet technology in China has continued to advance, which is particularly reflected in the development of augmented reality technology. Interaction between users and virtual objects (Non-Player Characters, NPCs) is one of the core experiences of augmented reality applications, so the demand for NPC intelligence and behavioral diversity has become more urgent. NPC behavior in current augmented reality applications is mainly designed with state machines and behavior trees: the various behaviors of the user and the NPC are designed in advance, and the NPC then runs according to fixed rules, which is far from real intelligent decision-making, evaluation and learning. In other words, most AI (Artificial Intelligence) in current augmented reality applications is manually scripted rather than genuinely intelligent. To make such AI closer to a real person, many applications introduce motion capture and expression capture, which makes NPC behavior look more natural, but the learning, reasoning and problem-solving abilities of the NPC are not improved at the algorithm level; only the presentation is optimized. Moreover, because existing NPC behaviors are usually generated by state machines and behavior trees, the resulting AI is too stiff: its behavior is easily predicted by human players, who quickly become bored, and it cannot match the skill level of human players well.

Disclosure of Invention

In order to solve the problem that the behavior of virtual confrontation agents in augmented reality games is not flexible enough, the invention provides a DQN-based method for realizing agent confrontation behavior under augmented reality conditions, which makes the confrontation behavior of virtual agents more flexible.

Disclosed is a DQN-based method for realizing agent confrontation behavior under augmented reality conditions, wherein the intelligent agents are divided into at least two camps, and each intelligent agent uses a trained DQN network to obtain its action at each moment so as to realize confrontation; the method for training the DQN network comprises the following steps:

S1: scanning a real scene with a laser scanner to acquire a three-dimensional point cloud map corresponding to the real scene, triangulating the three-dimensional point cloud map, and importing the resulting mesh into a three-dimensional scene rendering engine to obtain a real scene model;

S2: distributing the intelligent agents in the real scene model, wherein each intelligent agent acquires information about its own surrounding environment by means of ray detection;

S3: forming, for each intelligent agent, a state vector from the surrounding environment information, life value and current position corresponding to the current moment, and inputting the state vectors into the DQN network, which generates a corresponding Q value for each possible action of each intelligent agent at the next moment according to the state vector;

S4: determining the action of each intelligent agent at the next moment from the Q values corresponding to its possible actions at the next moment, based on an ε-greedy strategy;

S5: after each intelligent agent executes the action determined in step S4, acquiring the reward value and life value corresponding to the current moment of each intelligent agent, and adding the reward value corresponding to the current moment of each intelligent agent to the total reward score of its camp; meanwhile, updating the DQN network by a gradient descent method, judging whether the life value of each intelligent agent is greater than zero, and proceeding to step S6 for the intelligent agents whose life values are greater than zero;

S6: after the intelligent agents whose life values are greater than zero acquire their surrounding environment information, executing steps S3-S5 again with the updated DQN network, until every intelligent agent in one of the camps has a life value no greater than zero;

S7: judging whether the total reward score corresponding to each camp has stabilized within a set range; if not, proceeding to step S8; if so, taking the current DQN network as the finally trained DQN network;

S8: restoring the life values of the intelligent agents of each camp, and executing steps S2-S7 again until the total reward score corresponding to each camp stabilizes within the set range.

Further, determining, in step S4, the action of each intelligent agent at the next moment from the Q values of its possible actions at the next moment based on the ε-greedy strategy specifically comprises:

generating a random number random and judging whether it is smaller than a set value ε; if so, each intelligent agent takes the action corresponding to its maximum Q value as the action at the next moment; if not, each intelligent agent randomly selects one action from all possible actions as the action at the next moment.

Further, if the number of mesh patches of the three-dimensional dense map obtained by triangulating the three-dimensional point cloud map in step S1 exceeds a set value, a simplified virtual confrontation simulation model is built in Unity3D and used as the real scene model.

Further, acquiring, in step S2, the information about its own surrounding environment by ray detection specifically comprises: taking each intelligent agent in turn as the current intelligent agent and executing the following steps:

selecting at least three angular directions within a 360° panoramic range, and using ray detection to sense, in each angular direction, whether there is an obstacle, an own-side intelligent agent, an opposite-side intelligent agent, or ground in front of the current intelligent agent, wherein different types of obstacles, own-side intelligent agents, opposite-side intelligent agents and the ground carry different labels;

and splicing the environment information {label, distance} of the current intelligent agent in each angular direction into the surrounding environment information, wherein the distance in the environment information represents the distance between the current intelligent agent and the obstacle, own-side intelligent agent, opposite-side intelligent agent or ground represented by the label in the environment information.

Further, the angular directions selected within the 360 ° panoramic range are {0,30,50,70,80,85,90,95,100,110,130,150,180,220,270,320}, respectively.

Further, the action of an intelligent agent comprises three sub-states, namely a movement state, a rotation state and an attack state, wherein the movement types include moving backward, moving forward and remaining stationary, the rotation types include rotating left, rotating right and keeping the direction unchanged, and the attack types include attacking and not attacking, so that the number of actions each intelligent agent may take at the next moment is 3 × 3 × 2 = 18.

Further, in step S5, the reward value corresponding to the current moment of each intelligent agent is obtained as follows:

if the intelligent agent hits an opposite-side intelligent agent, the reward value is 1; if the intelligent agent hits an own-side intelligent agent, the reward value is -1; if the life value of the intelligent agent is not greater than zero, the reward value is -0.5; if the intelligent agent moves forward, the reward value is 0.01; if the intelligent agent hits an obstacle, the reward value is -0.02.

Further, the initial life value of each agent is 100, and the method for acquiring the life value corresponding to the current time of each agent in step S5 includes:

if the intelligent agent is attacked by another intelligent agent, its life value is reduced by 30.

Further, a multi-sensor fusion tracking and positioning algorithm is adopted to obtain the six-degree-of-freedom global pose of the human player in real time, and the pose is transmitted to a virtual camera in the three-dimensional scene rendering engine; meanwhile, the real scene video captured by a real camera, together with the intelligent agents that obtain their actions at each moment from the trained DQN network, is imported into the rendering interface of the virtual camera in the three-dimensional scene rendering engine, thereby obtaining a confrontation simulation environment in which the intelligent agents and the real environment are fused.

Advantageous effects:

The invention provides a DQN-based method for realizing agent confrontation behavior under augmented reality conditions. The behavior of each intelligent agent is predicted using the strong feature extraction capability and reinforcement learning decision capability of the DQN deep reinforcement learning network, and the intelligent agents, which obtain their actions at each moment from the trained DQN network, are then transferred into the augmented reality environment. The generated virtual agents therefore have higher intelligence, which solves the problem that the behavior of virtual confrontation agents in augmented reality games is stiff and makes the confrontation behavior of the virtual agents more flexible.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention;

FIG. 2 is a flow chart of the intelligent tank training of the present invention;

FIG. 3 is a layout diagram of a virtual simulation environment of the present invention;

FIG. 4 is a diagram of the intelligent tank training process of the present invention;

FIG. 5 is a three-dimensional map of a real scene of the present invention;

FIG. 6 is a diagram illustrating the effect of the fusion of real and virtual components;

FIG. 7 is another effect diagram of the virtual-real fusion according to the present invention.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

In order to improve the intelligence of NPCs in augmented reality applications, the deep reinforcement learning algorithm DQN can be adopted to train NPCs to behave intelligently. Deep reinforcement learning is one of the emerging technologies in the field of artificial intelligence; it combines the strong feature extraction capability of deep learning with the decision-making capability of reinforcement learning, realizes an end-to-end framework from perceptual input to decision output, has strong learning capability and is widely applied. The AI generated by state machines and behavior trees is too stiff, so NPC behavior can easily be predicted by human players, who quickly become bored. Because deep reinforcement learning requires continuous trial and error and the training cost in a real environment is too high, training is generally performed in a virtual simulation environment and the result is then transferred to the real environment. Unity3D is a widely used professional game engine that can generate virtual objects for augmented reality; therefore, a simulation environment is built with Unity3D, the virtual agents are trained with the DQN algorithm, and the trained agents are then migrated to the augmented reality environment, so that the generated virtual agents have higher intelligence.

Specifically, the DQN-based method for realizing agent confrontation behavior under augmented reality conditions comprises the following steps:

A multi-sensor fusion tracking and positioning algorithm is adopted to obtain the six-degree-of-freedom global pose of the human player in real time, and the pose is transmitted to the virtual camera in the three-dimensional scene rendering engine. Meanwhile, the real scene video captured by the real camera, together with the agents that obtain their actions at each moment from the trained DQN network, is imported into the rendering interface of the virtual camera in the three-dimensional scene rendering engine; the rendering order of the real-scene three-dimensional dense map model and the virtual agents is adjusted according to their depth relation, the texture information of the three-dimensional dense map model generated in the offline stage is hidden in the virtual scene while its collision model is retained, and a confrontation simulation environment in which the agents and the real environment are fused is thereby obtained to realize confrontation, as shown in FIGS. 6 and 7. The intelligent agents are divided into at least two camps, and each agent uses the trained DQN network to obtain its action at each moment.

Further, as shown in fig. 1, the method for training the DQN network includes the following steps:

S1: scanning a real scene with a laser scanner to acquire a three-dimensional point cloud map corresponding to the real scene, triangulating the three-dimensional point cloud map, and importing the resulting mesh into a three-dimensional scene rendering engine to obtain a real scene model, as shown in FIG. 5.

It should be noted that, in the offline stage, a three-dimensional dense map model of a scene is constructed offline through a FARO laser scanner. When an experimental scene is scanned, the laser scanner performs multi-site scanning, and converts the map three-dimensional data of each site into a coordinate system with the site scanned for the first time as an origin, namely a three-dimensional map model coordinate system. After scanning is finished, the FARO laser scanner provides an RGBD panorama of a scene with about 40M 3D points and color information, and generates a triangulated dense point cloud map from the panorama using a greedy projection triangulation algorithm.
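As a rough illustration of this offline reconstruction step, the sketch below turns a scanned point cloud into a triangle mesh that could then be imported into the rendering engine. The patent applies a greedy projection triangulation algorithm to the FARO scan; the Open3D library used here does not expose that exact method, so ball pivoting is used purely as a stand-in, and the file names and radii are placeholder assumptions.

```python
# Sketch of turning a scanned point cloud into a triangle mesh.
# Ball pivoting is only a stand-in for the patent's greedy projection
# triangulation; "scene_scan.ply" and the radii are placeholders.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene_scan.ply")            # dense laser scan
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

radii = [0.05, 0.1, 0.2]                                    # ball radii in metres (assumed)
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
    pcd, o3d.utility.DoubleVector(radii))
o3d.io.write_triangle_mesh("scene_mesh.obj", mesh)          # imported into the engine
```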

In addition, if the number of patches of the real-scene three-dimensional dense map constructed by the laser scanner is too large, a simplified virtual confrontation simulation model can be built in Unity3D and used as the real scene model. For example, collision bodies such as buildings and trees in the real scene can be built with cubes and given an obstacle tag, the ground can be built with a plane and given a ground tag, and the agents are then trained in this simplified virtual simulation environment, as shown in FIG. 3. In other words, the virtual confrontation simulation model used for training is only a simplified expression of the real environment, whereas the three-dimensional dense map constructed by the laser scanner can fully express the real environment and thus supports virtual-real occlusion and collision detection. Therefore, after training is completed, the trained agent model and the three-dimensional dense map are imported into Unity3D in the online stage, and the agent model is rendered at the corresponding position of the real scene, i.e. the three-dimensional dense map, as shown in FIG. 4.

S2: distributing each intelligent agent in a real scene model, and respectively acquiring the information of the surrounding environment of each intelligent agent by adopting a ray detection method.

For example, as shown in FIG. 2, obstacles including buildings and trees are arranged in the virtual confrontation simulation environment built in Unity3D, and most of the area is flat. Tanks are used as the agents and are divided into a red camp and a blue camp distributed on the two sides of the scene, with 3 tanks per camp; the tanks can move freely in the scene and attack other tanks. Each tank selects at least three angular directions within a 360° panoramic range, for example {0, 30, 50, 70, 80, 85, 90, 95, 100, 110, 130, 150, 180, 220, 270, 320}, and the current tank uses rays of length 120 to sense, in each angular direction, whether there is an obstacle, an own-side tank, an opposite-side tank, or ground ahead, where different obstacle types, own-side tanks, opposite-side tanks and the ground carry different labels; the environment information {label, distance} in each angular direction of the current tank is spliced into the surrounding environment information, where the distance represents the distance between the current tank and the obstacle, own-side tank, opposite-side tank or ground represented by the label.

Further, the environment information obtained by ray detection at a given angle is represented as {tag, distance}: if the corresponding object is detected at the current angle, the value of the corresponding tag changes from 0 to 1 and the distance between the object and the agent is output.
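A minimal sketch of how one ray's detection result could be encoded following the {tag, distance} description above. The tag names, the normalisation and the vector layout are illustrative assumptions, not the patent's exact data format.

```python
# Encode one ray's {tag, distance} result and splice all 16 rays together.
import numpy as np

TAGS = ["obstacle", "friendly", "enemy", "ground"]
ANGLES = [0, 30, 50, 70, 80, 85, 90, 95, 100, 110, 130, 150, 180, 220, 270, 320]
RAY_LENGTH = 120.0

def encode_ray(hit_tag, hit_distance):
    """One-hot tag flags (0 -> 1 when that object type is hit) plus the distance."""
    v = np.zeros(len(TAGS) + 1, dtype=np.float32)
    if hit_tag in TAGS:
        v[TAGS.index(hit_tag)] = 1.0
        v[-1] = hit_distance / RAY_LENGTH        # normalised distance (assumption)
    return v

def encode_surroundings(ray_hits):
    """ray_hits: list of (tag, distance) pairs, one per angular direction."""
    return np.concatenate([encode_ray(t, d) for t, d in ray_hits])

# 16 directions x 5 values per ray = an 80-dimensional surroundings vector
```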

S3: forming, for each agent, a state vector from the surrounding environment information, life value and current position corresponding to the current moment, and inputting the state vectors into the DQN network, so that the DQN network generates a corresponding Q value for each possible action of each agent at the next moment according to the state vector.
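The patent does not fix the network architecture, so the following is only a minimal sketch of a Q-network with 18 outputs (3 movement × 3 rotation × 2 attack choices), written in PyTorch. The 84-dimensional state (80 ray values plus a life value and a 3-D position) and the layer sizes are assumptions.

```python
# Minimal Q-network sketch: state vector in, one Q value per discrete action out.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 84, 18

class QNetwork(nn.Module):
    def __init__(self, state_dim=STATE_DIM, n_actions=N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions))           # one Q value per discrete action

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
q_values = q_net(torch.zeros(1, STATE_DIM))      # shape (1, 18)
```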

It should be noted that the movement of the agent is achieved by applying force to the agent, and the change of direction is achieved by left-hand rotation and right-hand rotation, and the value design is specifically shown in table 1.

TABLE 1

Value    Move         Rotate                  Attack
-1       Backward     Rotate left 5°          (none)
0        Stationary   Direction unchanged     No attack
1        Forward      Rotate right 5°         Attack

That is, the action of an agent comprises three sub-states, namely a movement state, a rotation state and an attack state, where the movement types include moving backward, moving forward and remaining stationary, the rotation types include rotating left, rotating right and keeping the direction unchanged, and the attack types include attacking and not attacking, so that the number of actions each agent may take at the next moment is 3 × 3 × 2 = 18; for example, one possible action at the next moment is to move backward, rotate left by 5° and launch an attack.
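For illustration, one way to map a discrete action index back to its (move, rotate, attack) sub-states from Table 1 is sketched below; the index ordering is an assumption.

```python
# Decode one of the 18 discrete action indices into Table 1 values.
MOVE   = [-1, 0, 1]       # backward, stationary, forward
ROTATE = [-1, 0, 1]       # rotate left 5 deg, unchanged, rotate right 5 deg
ATTACK = [0, 1]           # no attack, attack

def decode_action(index):
    """index in [0, 17] -> (move, rotate, attack) values from Table 1."""
    move   = MOVE[index // 6]          # 6 = 3 rotate x 2 attack combinations
    rotate = ROTATE[(index % 6) // 2]
    attack = ATTACK[index % 2]
    return move, rotate, attack

# e.g. "move backward, rotate left 5 deg, attack" is index 1 in this ordering:
assert decode_action(1) == (-1, -1, 1)
```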

S4: determining the action of each agent at the next moment from the Q values corresponding to its possible actions at the next moment based on the ε-greedy strategy, which specifically comprises:

generating a random number random and judging whether it is smaller than a set value ε, where ε is a number between 0 and 1; if so, each agent takes the action corresponding to its maximum Q value as the action at the next moment; if not, each agent randomly selects one action from all possible actions as the action at the next moment.
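A minimal sketch of this action selection rule follows. Note that, exactly as stated above, a random number below ε selects the greedy (maximum-Q) action and a value above ε selects a random action, so ε here plays the role of the exploitation probability.

```python
# Action selection as described in the text above.
import random
import torch

def select_action(q_values, epsilon):
    """q_values: 1-D tensor of 18 Q values for one agent."""
    if random.random() < epsilon:
        return int(torch.argmax(q_values).item())   # greedy action
    return random.randrange(q_values.shape[0])      # random exploratory action
```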

S5: after each agent executes the action determined in step S4, acquiring the reward value and life value corresponding to the current moment of each agent, and adding the reward value corresponding to the current moment of each agent to the total reward score of its camp; meanwhile, updating the DQN network by a gradient descent method, judging whether the life value of each agent is greater than zero, and proceeding to step S6 for agents whose life values are greater than zero.
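A minimal sketch of the gradient-descent update of the Q-network is given below. The replay buffer, target network and hyper-parameters are standard DQN ingredients assumed here for illustration; the patent only states that gradient descent is used. QNetwork and q_net refer to the network sketch above.

```python
# One gradient-descent step on the Q-network from a sampled mini-batch.
import random
from collections import deque
import torch
import torch.nn.functional as F

GAMMA, BATCH = 0.99, 64
replay = deque(maxlen=100_000)                     # (s, a, r, s_next, done) tuples
target_net = QNetwork()                            # QNetwork / q_net from the sketch above
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def dqn_update():
    if len(replay) < BATCH:
        return
    s, a, r, s2, done = map(torch.tensor, zip(*random.sample(replay, BATCH)))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # Bellman target from the target network
        target = r.float() + GAMMA * target_net(s2.float()).max(1).values * (1 - done.float())
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```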

Further, at the beginning of a round, all agents have a life value of 100. When an agent is attacked by another agent, its life value is reduced by 30; when an agent's life value drops to 0 or below, it dies and disappears from the scene. When all agents of either camp have disappeared from the scene, the round is restarted.

Meanwhile, the reward setting of each agent at each moment is shown in table 2:

TABLE 2

Tank event                   Reward
Hit an opposite-side tank    +1
Hit an own-side tank         -1
Death                        -0.5
Move forward                 +0.01
Hit an obstacle              -0.02

That is, if the agent hits an opposite-side agent, the reward value is 1; if the agent hits an own-side agent, the reward value is -1; if the life value of the agent is not greater than zero, the reward value is -0.5; if the agent moves forward, the reward value is 0.01; if the agent hits an obstacle, the reward value is -0.02.
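A minimal sketch of this per-step reward, assuming the simulation reports the corresponding event flags for each agent at each step:

```python
# Per-step reward following Table 2; the boolean flags are assumed inputs.
def step_reward(hit_enemy, hit_friendly, died, moved_forward, hit_obstacle):
    reward = 0.0
    if hit_enemy:     reward += 1.0
    if hit_friendly:  reward -= 1.0
    if died:          reward -= 0.5
    if moved_forward: reward += 0.01
    if hit_obstacle:  reward -= 0.02
    return reward
```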

S6: after the agents whose life values are greater than zero acquire their surrounding environment information, steps S3-S5 are executed again with the updated DQN network, until every agent in one of the camps has a life value no greater than zero.

S7: judging whether the total reward score corresponding to each camp has stabilized within a set range; if not, proceeding to step S8; if so, taking the current DQN network as the finally trained DQN network.

S8: restoring the life values of the agents of each camp, and executing steps S2-S7 again until the total reward score corresponding to each camp stabilizes within the set range.

It should be noted that, the DQN algorithm is written in Python language and runs in VSCode environment, and the simulation environment is built under Unity3D, and Unity3D and Python are required to perform data communication, so as to ensure normal running of the program.

For example, when a round starts, the tanks are initialized at the corresponding positions in the Unity3D scene; each tank then senses its surrounding environment through ray detection, and the state information (surrounding environment information, life value and current position) and reward information are sent to the Python side. The action of the tank is obtained through the DQN network, with action selection implemented by the ε-greedy strategy: a random number random is generated, and if random is smaller than ε the action with the maximum Q value of the DQN network is selected, otherwise an action is selected at random. The Python side then sends the action information to the Unity3D side to control the tank's movement, rotation and shooting in the scene, so that the current state of the tank changes. This process is repeated; the Unity3D side restarts the round after it ends, and training finishes once the round reward value is stable.
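The patent does not specify how Unity3D and Python exchange data, so the following is only a hypothetical sketch of the Python-side loop, assuming newline-delimited JSON messages over a local TCP socket (Unity's ML-Agents toolkit would be another common choice). q_net and select_action refer to the earlier sketches; the address, port and message fields are placeholders.

```python
# Hypothetical Python-side loop: receive state/reward from Unity3D, send back an action.
import json
import socket
import torch

sock = socket.create_connection(("127.0.0.1", 9000))        # assumed address/port
stream = sock.makefile("rw")                                 # newline-delimited JSON
epsilon = 0.9

while True:
    msg = json.loads(stream.readline())                      # {"state": [...], "reward": ..., "done": ...}
    if msg["done"]:
        break                                                 # Unity3D restarts the round itself
    state = torch.tensor(msg["state"], dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action = select_action(q_net(state).squeeze(0), epsilon)
    stream.write(json.dumps({"action": action}) + "\n")
    stream.flush()
```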

In summary, the basic implementation processes of software and hardware of the method for realizing the confrontation of the intelligent object based on the DQN under the augmented reality condition provided by the present invention are summarized as follows:

step 1: in the off-line stage, a dense three-dimensional point cloud map is firstly constructed, and three corners of the dense three-dimensional point cloud map are meshed to form the three-dimensional dense map. And secondly, importing the three-dimensional dense map into a three-dimensional scene rendering engine.

Step 2: the DQN network is trained using the real scene model in the three-dimensional rendering engine. The scene logic needs to include round start, round end, and the interaction between the agents and the environment during a round, and the reward information is designed. By adding different labels to targets with different attributes in the environment, the agents can effectively identify target information in their surroundings. A DQN algorithm framework is built in Python, and the three-dimensional rendering engine communicates with Python: the engine sends the current agent states, rewards and round-finished flag to the Python side; after receiving this information, the Python side obtains each agent's action from the DQN network using the ε-greedy strategy and sends it back to the engine; the engine controls the agents in the scene according to the received actions and then sends the new agent states and reward information to the Python side. This process is repeated until the round reward value is stable, and training then ends.

Step 3: after training is finished, the trained agent model and the three-dimensional dense map are imported in the online stage, and the agent model is rendered at the corresponding position of the real scene. Meanwhile, in order to render the virtual agent in the real scene, the virtual agent's position in the world coordinate system needs to be converted into the pixel coordinate system.
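A minimal sketch of this world-to-pixel conversion under a pinhole camera model is given below, using the tracked six-degree-of-freedom pose (rotation R and translation t) and assumed placeholder intrinsics K.

```python
# Project a virtual agent's world-coordinate position into pixel coordinates.
import numpy as np

K = np.array([[800.0, 0.0, 640.0],     # fx, 0, cx  (placeholder intrinsics)
              [0.0, 800.0, 360.0],     # 0, fy, cy
              [0.0,   0.0,   1.0]])

def world_to_pixel(X_world, R, t):
    """X_world: 3-D point; R (3x3), t (3,): world-to-camera pose from tracking."""
    X_cam = R @ X_world + t             # world -> camera coordinates
    uvw = K @ X_cam                     # camera -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]             # (u, v) pixel position
```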

Step 4: a real-time tracking and positioning algorithm is adopted to obtain the six-degree-of-freedom global pose of the human player in real time and transmit it to the virtual camera in the three-dimensional rendering engine. The video captured by the real camera is then imported into the rendering interface of the three-dimensional rendering engine, and the three-dimensional map model generated in the offline stage is hidden in the virtual scene, thereby forming a confrontation simulation environment in which the virtual agents and the real environment are fused.

Therefore, compared with the prior art, the method can solve the problem that the behavior of the virtual confrontation intelligent body in the augmented reality game is stiff, and has the effect of enabling the confrontation behavior of the virtual intelligent body to be more flexible.

The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
