Method for determining virtual object behaviors and hosting virtual object behaviors

Document No.: 1347385    Publication date: 2020-07-24

Reading note: This technology, "Method for determining virtual object behaviors and hosting virtual object behaviors," was created by 黄超, 周大军, 张力柯, and 荆彦青 on 2020-03-27. Abstract: Disclosed are methods, devices, apparatuses, and computer-readable storage media for determining virtual object behavior and hosting virtual object behavior. A method of determining virtual object behavior comprises: acquiring, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located; determining, based on the scene features, a probability distribution of the virtual object behavior in a predetermined behavior set; and determining the virtual object behavior based on the probability distribution. The method computes the behavior distribution of virtual object artificial intelligence with a lightweight deep residual network, thereby addressing technical problems in the design of virtual object artificial intelligence such as excessively long training time, excessive design difficulty, and the inability to handle multi-valued problems.

1. A method of determining behavior of a virtual object, comprising:

acquiring, based on a scene image of the virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located;

determining, based on the scene features, a probability distribution of the virtual object behavior in a predetermined behavior set; and

determining the virtual object behavior based on the probability distribution.

2. The method of determining virtual object behavior of claim 1, wherein the scene image comprises a reference image area, and wherein acquiring the scene features characterizing the scene in which the virtual object is located based on the scene image of the virtual object further comprises:

cropping the reference image area from the scene image, wherein the reference image area shows information knowable to the virtual object in a game;

and acquiring, based on the reference image area and using the residual network, the scene features characterizing the scene in which the virtual object is located.

3. The method of determining virtual object behavior of claim 1, wherein determining the probability distribution of the virtual object behavior in the predetermined behavior set further comprises: determining the probability distribution of the virtual object behavior in the predetermined behavior set by using a behavior prediction network;

wherein the behavior prediction network comprises a movement direction prediction network, a view angle prediction network, and a view angle magnitude prediction network, and the virtual object behavior comprises at least one of the following:

a movement direction of the virtual object;

a view angle of the virtual object at the next moment;

a view angle magnitude of the virtual object at the next moment.

4. The method of determining virtual object behavior of claim 3, wherein utilizing the behavior prediction network to determine the probability distribution of the virtual object behavior and determining the virtual object behavior based on the probability distribution further comprises:

inputting the scene features to the movement direction prediction network to obtain a first probability distribution, wherein the first probability distribution indicates a probability distribution of the virtual object moving in a plurality of movement directions, and determining the movement direction of the virtual object based on the first probability distribution;

inputting the scene features to the view angle prediction network to obtain a second probability distribution, wherein the second probability distribution indicates a probability distribution of the view angle value of the virtual object at the next moment over a view angle value interval, and determining the view angle value of the virtual object at the next moment based on the second probability distribution;

inputting the scene features to the view angle magnitude prediction network to obtain a third probability distribution, wherein the third probability distribution indicates a probability distribution of the view angle magnitude value of the virtual object at the next moment over a view angle magnitude value interval, and determining the view angle magnitude value of the virtual object at the next moment based on the third probability distribution.

5. The method of determining virtual object behavior of claim 1, wherein the residual network comprises:

at least one first residual module, the spatial dimension of the input features of which is twice the spatial dimension of its output features, and the channel dimension of the input features of which is half the channel dimension of its output features;

at least one second residual module, the spatial dimension and channel dimension of the input features of the second residual module being the same as the spatial dimension and channel dimension of its output features.

6. The method of determining virtual object behavior of claim 5, wherein:

the first residual module comprises:

a first number of first convolutional layers, the step size of the first convolutional layers being a first step size, and the size of the convolution kernels of the first convolutional layers being a first size;

a second number of second convolutional layers, the step size of the second convolutional layers being a second step size, and the size of the convolution kernels of the second convolutional layers being a second size;

a second number of third convolutional layers, the step size of the third convolutional layers being the second step size, and the size of the convolution kernels of the third convolutional layers being the first size;

the second residual module comprises:

a first number of third convolutional layers, the step size of the third convolutional layers being the second step size, and the size of the convolution kernels of the third convolutional layers being the first size;

a second number of second convolutional layers, the step size of the second convolutional layers being the second step size, and the size of the convolution kernels of the second convolutional layers being the second size.

7. The method of determining virtual object behavior of claim 3, wherein the virtual object is a virtual object in a game and the scene image of the virtual object is a game interface, the method further comprising:

recording a video of the game interface in which the virtual object is manipulated,

acquiring a plurality of sample data from the video, wherein each item of sample data comprises a game interface sample, a movement direction sample executed by the virtual object for the game interface sample, a view angle sample at the next moment, and a view angle magnitude sample at the next moment;

training the residual network and the behavior prediction network based on the plurality of sample data.

8. The method of determining virtual object behavior of claim 7, wherein the training the residual network and the behavior prediction network comprises:

based on the game interface samples in the plurality of sample data and the movement direction samples corresponding to the game interface samples, training parameters of the residual network and the movement direction prediction network by optimizing a categorical cross-entropy loss between the movement direction samples and the movement directions predicted by the movement direction prediction network;

based on the game interface samples in the plurality of sample data and the view angle samples at the next moment corresponding to the game interface samples, training parameters of the residual network and the view angle prediction network by optimizing a posterior probability loss between the view angle samples and the probability distribution of view angles predicted by the view angle prediction network; and

based on the game interface samples in the plurality of sample data and the view angle magnitude samples at the next moment corresponding to the game interface samples, training parameters of the residual network and the view angle magnitude prediction network by optimizing a posterior probability loss between the view angle magnitude samples and the probability distribution of view angle magnitudes predicted by the view angle magnitude prediction network.

9. The method of determining virtual object behavior of claim 4, wherein the virtual object is a virtual object manipulated in a game and the scene image of the virtual object is a battle game interface, and wherein, when an attacking object of the virtual object appears in the battle game interface, an extremum of the second probability distribution is close to the view angle facing the attacking object.

10. A method of hosting virtual object behavior in a battle game, comprising:

determining, based on a game interface of a virtual object, a probability distribution of virtual object behavior in a predetermined behavior set, wherein the virtual object behavior comprises a movement direction of the virtual object, a view angle of the virtual object at the next moment, and a view angle magnitude of the virtual object at the next moment;

hosting the virtual object behavior based on the probability distribution;

wherein, when an attacking object of the virtual object appears on the game interface, an extremum of the probability distribution of the view angle of the virtual object at the next moment is close to the view angle facing the attacking object.

11. The method of hosting virtual object behavior in a battle game of claim 10, wherein the game interface of the virtual object includes a reference image area showing the orientation and distance of attacking objects or obstacles within the field of view of the virtual object in the battle game.

12. An apparatus for determining behavior of a virtual object, comprising:

a scene feature acquisition module configured to acquire, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located;

a probability distribution determination module configured to determine, based on the scene features, a probability distribution of virtual object behavior in a predetermined behavior set; and

a virtual object behavior determination module configured to determine the virtual object behavior based on the probability distribution.

13. The apparatus for determining virtual object behavior of claim 12, wherein determining the probability distribution of virtual object behavior in the predetermined behavior set further comprises: determining the probability distribution of virtual object behavior in the predetermined behavior set by using a behavior prediction network;

wherein the behavior prediction network comprises a movement direction prediction network, a view angle prediction network, and a view angle magnitude prediction network, and the virtual object behavior comprises a movement direction of the virtual object, a view angle of the virtual object at the next moment, and a view angle magnitude of the virtual object at the next moment.

14. A device for determining virtual object behavior, comprising:

a processor;

a memory storing computer instructions that, when executed by the processor, implement the method of any one of claims 1-11.

15. A computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, implement the method of any one of claims 1-11.

Technical Field

The present disclosure relates to the field of artificial intelligence, and more particularly, to a method, apparatus, device, and computer-readable storage medium for determining behavior of a virtual object. The present disclosure also relates to a method of hosting virtual object behaviors in a battle game.

Background

With the development of network technology, human-computer interaction applications such as computer games can provide virtual scenes for users, and users can control virtual objects to perform operations in those virtual scenes for entertainment. In scenarios such as game guidance, game testing, character hosting, and Non-Player Character (NPC) control, a computer must also automatically determine the operation a given virtual object should execute and then control that operation. For example, in game hosting, a terminal analyzes the game scene in which a game character is located in place of the player and automatically controls the character's operations. In these scenarios, the computer may determine the virtual object's operations through designed virtual object artificial intelligence. Existing designs of virtual object artificial intelligence typically suffer from technical problems such as excessively long training time, excessive design difficulty, and the inability to handle multi-valued problems.

Disclosure of Invention

Embodiments of the present disclosure provide a method, device, electronic device, and computer-readable storage medium for determining behavior of a virtual object. Embodiments of the present disclosure also provide a method of hosting virtual object behaviors in a battle game.

An embodiment of the present disclosure provides a method of determining virtual object behavior, comprising: acquiring, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located; determining, based on the scene features, a probability distribution of the virtual object behavior in a predetermined behavior set; and determining the virtual object behavior based on the probability distribution.

An embodiment of the present disclosure provides a method of hosting virtual object behavior in a battle game, comprising: determining, based on a game interface of a virtual object, a probability distribution of virtual object behavior in a predetermined behavior set, wherein the virtual object behavior comprises a movement direction of the virtual object, a view angle of the virtual object at the next moment, and a view angle magnitude of the virtual object at the next moment; and hosting the virtual object behavior based on the probability distribution; wherein, when an attacking object appears on the game interface, an extremum of the probability distribution of the view angle of the virtual object at the next moment is close to the view angle facing the attacking object.

An embodiment of the present disclosure provides an apparatus for determining virtual object behavior, comprising: a scene feature acquisition module configured to acquire, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located; a probability distribution determination module configured to determine, based on the scene features, a probability distribution of virtual object behavior in a predetermined behavior set; and a virtual object behavior determination module configured to determine the virtual object behavior based on the probability distribution.

An embodiment of the present disclosure provides a device for determining virtual object behavior, comprising: a processor; and a memory storing computer instructions that, when executed by the processor, implement the above method.

Embodiments of the present disclosure provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the above-described method.

The present disclosure proposes a method, device, electronic device, and computer-readable storage medium for determining virtual object behavior, as well as a method of hosting virtual object behavior in a battle game. Embodiments of the present disclosure compute the behavior distribution of virtual object artificial intelligence with a lightweight deep residual network, solving technical problems in the design of virtual object artificial intelligence such as excessively long training time, excessive design difficulty, and the inability to handle multi-valued problems.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly introduced below. The drawings in the following description are merely exemplary embodiments of the disclosure.

Fig. 1 is an example schematic diagram illustrating a scene image of a virtual object according to an embodiment of the present disclosure.

Fig. 2A is a flow diagram illustrating a method of determining virtual object behavior in accordance with an embodiment of the present disclosure.

Fig. 2B is a schematic diagram illustrating a method of determining virtual object behavior according to an embodiment of the present disclosure.

Fig. 2C illustrates a block diagram of an apparatus for determining behavior of a virtual object according to an embodiment of the present disclosure.

Fig. 3A is a schematic diagram illustrating a residual network and a behavior prediction network according to an embodiment of the disclosure.

Fig. 3B is a schematic diagram illustrating a first residual module and a second residual module, according to an embodiment of the present disclosure.

Fig. 3C is a schematic diagram illustrating a first residual module and a second residual module, according to an embodiment of the present disclosure.

Fig. 4A is a flow diagram illustrating training a residual network and a behavior prediction network according to an embodiment of the disclosure.

Fig. 4B is a flow diagram illustrating one example of training a residual network and a behavior prediction network in accordance with an embodiment of the present disclosure.

FIG. 5 illustrates a flow diagram of a method of hosting virtual object behavior in a battle game in accordance with an embodiment of the present disclosure.

Fig. 6 is a block diagram illustrating an apparatus for determining behavior of a virtual object according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions, and advantages of the present disclosure more apparent, exemplary embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. The described embodiments are merely a subset of the embodiments of the present disclosure, not all of them, and the present disclosure is not limited to the example embodiments described herein.

In the present specification and the drawings, steps and elements having substantially the same or similar characteristics are denoted by the same or similar reference numerals, and repeated description of the steps and elements will be omitted. Meanwhile, in the description of the present disclosure, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance or order.

For the purpose of describing the present disclosure, concepts related to the present disclosure are introduced below.

Game artificial intelligence is a branch of Artificial Intelligence (AI). Artificial intelligence is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Game artificial intelligence attempts to understand the essence of a human player's operations in a game and to produce a new intelligent game machine that can react in a manner similar to human intelligence. By integrating the design principles and implementation methods of various intelligent machines, game artificial intelligence carries out perception, reasoning, and decision-making in games.

Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and how to reorganize existing knowledge structures to continuously improve performance.

Game artificial intelligence generally needs to analyze game scenes when making decisions. A game scene is usually presented to the real player as a two-dimensional or three-dimensional picture. Game artificial intelligence simulates a real player viewing such a picture and makes decisions based on it. For this, game artificial intelligence employs Computer Vision (CV) technology. Computer vision is a science that studies how to make machines "see": it uses cameras and computers in place of human eyes to identify, track, and measure targets, and performs further image processing so that the result is an image better suited to human observation or to transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, among others.

Alternatively, each network described below may be an artificial intelligence network, in particular a neural network. Such neural networks are typically implemented as acyclic graphs, with neurons arranged in different layers. A neural network model typically comprises an input layer and an output layer separated by at least one hidden layer; the hidden layers transform the input received by the input layer into a representation useful for generating output at the output layer. Nodes are connected via edges to nodes in adjacent layers, and no edges exist between nodes within the same layer. Data received at the nodes of the input layer propagates to the nodes of the output layer via hidden layers, activation layers, pooling layers, convolutional layers, and the like. The input and output of the neural network model may take various forms, which the present disclosure does not limit.
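As a minimal illustration of this layered structure, the following generic sketch (written in PyTorch; it is not a network from the disclosure, and the layer sizes are arbitrary) shows an input layer, one hidden layer, and an output layer connected only between adjacent layers:

```python
import torch
import torch.nn as nn

# Input layer -> one hidden layer -> output layer; data flows forward
# along edges between adjacent layers only, forming an acyclic graph.
model = nn.Sequential(
    nn.Linear(100, 64),   # input layer -> hidden layer
    nn.ReLU(),            # activation transforms the hidden representation
    nn.Linear(64, 8),     # hidden layer -> output layer
)

output = model(torch.randn(1, 100))   # shape (1, 8)
```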

The scheme provided by the embodiment of the disclosure relates to technologies such as artificial intelligence, computer vision technology and machine learning, and is specifically described by the following embodiments.

Fig. 1 is an example schematic diagram illustrating a scene image 100 of a virtual object according to an embodiment of the present disclosure.

The virtual objects of the present disclosure may be individual game characters in a computer game that can be manipulated by virtual object artificial intelligence or by a real player. Alternatively, the computer game is a role-playing competitive game, such as a battle game: either a man-machine battle game or a multiplayer battle game. A man-machine battle game is one in which the game character of a user account competes in the same scene against a simulated game character set by the game. A multiplayer battle game is one in which a plurality of user accounts compete in the same scene; for example, it may be a MOBA (Multiplayer Online Battle Arena) game. In addition, the computer game may be a client game or a web game, an online game requiring network support, or an offline game requiring none.

The method provided by the embodiments of the disclosure can be applied to scenarios in computer games such as game guidance, character hosting, NPC control, and game testing. In these scenarios, the electronic device must automatically make decisions about and control the operations of certain game characters, so that those characters perform reasonable operations in various game scenes as game players would. The electronic device may be a terminal or a server.

Taking a game guidance scenario as an example, to help a novice player quickly become familiar with a game, a terminal or server may, while the novice plays, analyze the game scene in which the novice's game character is located, predict the operation that character should perform next, and present the predicted operation to guide the novice player.

Taking a game hosting scenario as an example, when the terminal is offline or the player is busy, the player can hand the game character over to hosting, so that the terminal or server controls the game character in the player's place.

Taking a game test scenario as an example, a simulated game character can be set up in a game as the opponent of a player's game character and controlled by a terminal or server, or by virtual object artificial intelligence running on a terminal or server. This takes the place of a human tester: the game is played by the virtual object artificial intelligence to collect test data and carry out game performance testing.

In the above application scenarios, the game character controlled by virtual object artificial intelligence may be a player's game character, a simulated game character set by the game, or an NPC such as a soldier or a monster. A scene image is an image showing an application scene (in particular, a game scene). The scene image that the virtual object can "see" can include various objects within the virtual object's field of view, including but not limited to enemy characters, friendly teammates, obstacles, and bonus items. The virtual object artificial intelligence determines its next operation by analyzing the information within this field of view. As shown in fig. 1, characters such as enemy object A, enemy object B, and a friendly teammate appear within the field of view of the virtual object artificial intelligence playing the player character. This information is presented to the virtual object artificial intelligence through the scene image. By analyzing it, the virtual object artificial intelligence determines its behavior: when to fire, at what angle, how the view angle changes, how the character moves, and so on. In some games, a reference image (e.g., a minimap) may be displayed on the scene image, abstractly showing the positions of enemy objects, obstacles, friendly teammates, the player character, and the like.

Currently, virtual object artificial intelligence is trained mainly with algorithms based on the Deep Q-Network (DQN) and/or imitation learning algorithms based on minimum mean squared error. A DQN-based algorithm is a deep reinforcement learning algorithm that requires manually designed reward/penalty functions. The virtual object artificial intelligence obtains a sample set of states, actions, and rewards/penalties through constant interaction with the environment; the parameters of its computational model are then determined by maximizing the expected reward and/or minimizing the expected penalty of the game. Training virtual object artificial intelligence with DQN usually takes a great deal of time, and it is difficult for the algorithm designer to craft suitable reward/penalty functions, so suitable virtual object artificial intelligence is hard to obtain. Imitation learning based on minimum mean squared error takes images as the input of its artificial intelligence network model and then compares and/or fits the output virtual object behavior to recorded behavior operated by real players. Because it fits behavior by training model parameters with a minimum mean squared error loss, this scheme cannot handle multi-valued problems well. That is, when many decisions are reasonable for the virtual object in a virtual scene (for example, the virtual object could shoot either enemy object A or enemy object B in the scene of fig. 1), an algorithm minimizing mean squared error can only regress to the average of the behaviors, so the virtual object often fails to select any one behavior and thus cannot attack a target correctly.
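To illustrate this averaging effect with a hypothetical calculation (the angles below are illustrative, not from the disclosure): the mean squared error loss is minimized by the mean of the demonstrated behaviors,

$$\arg\min_{c}\,\mathbb{E}\left[(y-c)^{2}\right]=\mathbb{E}[y],$$

so if half of the demonstrations turn to a view angle of $90^{\circ}$ (facing enemy object A) and half to $270^{\circ}$ (facing enemy object B), the fitted output is

$$c^{*}=\tfrac{1}{2}\cdot 90^{\circ}+\tfrac{1}{2}\cdot 270^{\circ}=180^{\circ},$$

a view angle facing neither enemy. A probability distribution with two modes, by contrast, preserves both reasonable decisions.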

The present disclosure proposes a method, device, electronic device, and computer-readable storage medium for determining virtual object behavior, as well as a method of hosting virtual object behavior in a battle game. According to embodiments of the present disclosure, the probability distribution of virtual object behavior is computed with a lightweight deep residual network, solving technical problems in the design of virtual object artificial intelligence such as excessively long training time, excessive design difficulty, and the inability to handle multi-valued problems.

Fig. 2A is a flow diagram illustrating a method 200 of determining virtual object behavior in accordance with an embodiment of the disclosure. Fig. 2B is a schematic diagram illustrating a method 200 of determining virtual object behavior in accordance with an embodiment of the present disclosure. Fig. 2C illustrates a block diagram of an apparatus 2000 for determining behavior of a virtual object according to an embodiment of the present disclosure.

The method for controlling the operation of a virtual object can be applied to human-computer interaction scenarios such as computer games and live streaming. Such scenarios provide users with virtual scenes and virtual objects, and the method provided by the present application can automatically control the operation of virtual objects within those virtual scenes.

The method 200 of determining virtual object behavior according to embodiments of the present disclosure may be applied to any electronic device. The electronic device may be any of various hardware devices, such as a Personal Digital Assistant (PDA), an audio/video device, a mobile phone, an MP3 player, a personal computer, a laptop computer, or a server. For example, the electronic device may be the device 2000 of fig. 2C for determining virtual object behavior. In the following, the present disclosure is illustrated with the device 2000 as an example; those skilled in the art should understand that the present disclosure is not limited thereto.

Referring to fig. 2C, the device 2000 may include a processor 2001 and a memory 2002. The processor 2001 and memory 2002 may be connected by a bus 2003.

The processor 2001 may perform various actions and processes according to programs stored in the memory 2002. In particular, the processor 2001 may be an integrated circuit chip with signal processing capabilities. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or another programmable logic device, discrete gate or transistor logic, or discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor, or the processor may be any conventional processor of, for example, the X86 architecture or the ARM architecture.

The memory 2002 stores computer instructions that, when executed by the processor, implement the method 200. The memory 2002 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.

First, in operation 201, the device 2000 may acquire, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located.

Referring to fig. 2B, the scene image may be, for example, the scene image 100 shown in fig. 1. The scene image includes various information about the virtual object, such as the positions of enemy objects, the positions of friendly teammates, and the status of the player character. The residual network can extract this information from the scene image and express it as scene features. That is, the scene features can represent the scene in which the virtual object is located, so that the virtual object artificial intelligence can determine the state of the virtual object in the virtual scene from the scene features. For example, the virtual object artificial intelligence may determine whether the virtual object is in a safe state or under attack, the degree of damage taken when under attack, and so on, to facilitate subsequent simulation of the virtual object's perception of damage.

Optionally, the scene features may be a multidimensional floating-point vector, such as a 128-dimensional floating-point vector, which fuses various kinds of scene information, such as scenes related to enemy defense buildings and scenes related to damage inflicted on the virtual object by enemy objects. Each element of the vector is a floating point number, and the vector has multiple dimensions, so the scene features can represent scene information numerically for subsequent analysis and computation. For example, the scene features may describe by numerical values whether the virtual object is beneath an enemy defense building or being attacked by one, whether the virtual object is within range of the weapons or skills of an enemy virtual object, the distance to the nearest enemy attack (e.g., a bullet or skill), and so forth. The scene features may also fuse information about the virtual object itself, such as the virtual object's type, weapon type, level, or combat power. For example, in a computer game, virtual object types may include player characters and non-player characters, and the combat power of a virtual object may include at least one of the character's health, mana, attack power, level, equipment, and number of kills. Of course, the scene information may also include other information that can affect the operation of the virtual object. The present disclosure does not limit the specific manner in which scene features are represented or the information they may fuse.

Alternatively, as shown in fig. 2B, a reference image region, for example a minimap region, may be included in the scene image. Operation 201 then further comprises: cropping the reference image region from the scene image, wherein the reference image region shows information knowable to the virtual object in the game. Such knowable information includes, for example, the deployment of both sides, map information, enemy object positions, and friendly teammate positions. The reference image region is not limited to the disc form shown in fig. 2B, as long as the information is presented in a predefined form within the scene image. For example, in the reference image region, the position of an enemy object may be represented by a red rectangular dot and the position of a friendly teammate by a blue circular dot; the present disclosure does not limit how the information is represented in the reference image region.

Since the reference image region represents most of the information available to the virtual object in the scene image in an abstract manner, the device 2000 may obtain the scene features characterizing the scene in which the virtual object is located using the residual network based on the reference image region alone, thereby reducing the number of inputs to the residual network and making it more lightweight and efficient.
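As a minimal sketch of this cropping step (the region coordinates below are hypothetical placeholders; an actual game would supply its own minimap position):

```python
import numpy as np

def crop_reference_region(scene_image: np.ndarray) -> np.ndarray:
    """Cut the reference image region (e.g., a minimap) out of a scene image.

    scene_image: H x W x 3 array of RGB pixels captured from the game interface.
    The (top, left, size) values below are assumed, not from the disclosure.
    """
    top, left, size = 20, 20, 160                      # assumed minimap location
    return scene_image[top:top + size, left:left + size, :]

# Usage: feed only the cropped region to the residual network,
# reducing the input size compared to the full scene image.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # stand-in for a captured frame
minimap = crop_reference_region(frame)                 # shape (160, 160, 3)
```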

The residual network can prevent gradient vanishing/explosion in the neural network, thereby further increasing the convergence speed of the residual network used for virtual object artificial intelligence. For example, the residual network shown in fig. 2B includes two convolutional layers; the convolution kernels of both layers have a size of 3 × 3 and a stride of 1, and each layer contains C1 convolution kernels. The input feature of the residual network is a convolutional feature x1 with spatial dimension H × W and channel dimension C1. Passing x1 through the two convolutional layers yields a new convolutional feature x2, whose spatial dimension is also H × W and whose channel dimension is also C1. Adding x1 and x2 gives the final output feature x3, whose spatial dimension is likewise H × W and whose channel dimension is likewise C1. Because the output feature x3 fuses features x1 and x2, the skip connection keeps the errors introduced by the cascaded operation of each convolutional layer from propagating onward, which would otherwise lead to abnormal virtual object behavior. The input feature x1 may be the reference image region/scene image above, and the output feature x3 may be the scene feature described above.
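A minimal sketch of the residual connection just described, assuming PyTorch and the 3 × 3 / stride-1 configuration of fig. 2B (the ReLU between the two convolutions is an assumption added for illustration; the disclosure does not specify activations):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 stride-1 convolutions plus a skip connection: x3 = x1 + x2."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x1: torch.Tensor) -> torch.Tensor:
        x2 = self.conv2(self.relu(self.conv1(x1)))  # same H x W and channel count
        return x1 + x2                              # fuse input and output features

# The spatial and channel dimensions are preserved, so the addition is valid:
x1 = torch.randn(1, 64, 32, 32)
x3 = BasicResidualBlock(64)(x1)   # shape (1, 64, 32, 32)
```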

Compared with a traditional DQN-based neural network model, the residual network has a simple structure, few parameters, and a fast convergence speed, and it does not require designing reward/penalty functions, making virtual object artificial intelligence easier to train and apply.

Next, in operation 202, the device 2000 determines, based on the scene features, a probability distribution of the virtual object behavior in a predetermined behavior set.

As shown in fig. 2B, the probability distribution of the virtual object behavior in the predetermined behavior set may be discrete or continuous; the present disclosure does not limit this. The probability distribution indicates the probability that the virtual object behavior takes each predetermined behavior in the predetermined behavior set. For example, suppose the virtual object behavior indicates whether the virtual object will shoot its gun, so the predetermined behavior set contains the two behaviors "shoot" and "not shoot". The device 2000 computes from the scene features a probability of 0.7 for "shoot" and 0.3 for "not shoot". Facing the scene shown in fig. 2B, the virtual object then performs a shooting operation with probability 0.7. The virtual object artificial intelligence outputs a random number according to this probability distribution: if 1 represents shooting and 0 represents not shooting, then over many encounters with the same scene the virtual object outputs 1 about 70% of the time and 0 about 30% of the time. Its behavior is therefore neither rigid nor easy to predict, which makes the game more interesting.
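A small sketch of sampling a behavior from such a distribution rather than always taking the most probable action (the 0.7/0.3 values mirror the example above):

```python
import numpy as np

rng = np.random.default_rng()

def sample_shoot(p_shoot: float = 0.7) -> int:
    """Return 1 (shoot) with probability p_shoot, else 0 (do not shoot)."""
    return int(rng.random() < p_shoot)

# Over many visits to the same scene, roughly 70% of the samples are 1:
samples = [sample_shoot() for _ in range(10_000)]
print(sum(samples) / len(samples))   # close to 0.7
```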

Alternatively, the device 2000 may use a behavior prediction network to determine the probability distribution of the virtual object behavior in the predetermined behavior set. The combination of the behavior prediction network and the residual network may be referred to as a mixture density network. In battle games, especially gunfight games, a real player mainly manipulates the movement direction of the virtual object, adjusts the angle by which the virtual object's view angle changes, and adjusts the magnitude of that change. The virtual object behavior may therefore comprise at least part of the movement direction of the virtual object, the view angle of the virtual object at the next moment, and the view angle magnitude of the virtual object at the next moment. Those skilled in the art will appreciate that virtual object behavior may vary from game to game; in a board game, for example, it may include the number of points played. Where the virtual object behavior includes at least part of the movement direction, the view angle at the next moment, and the view angle magnitude at the next moment, the behavior prediction network may include at least part of a movement direction prediction network, a view angle prediction network, and a view angle magnitude prediction network.

Optionally, the predetermined behavior set for the movement direction of the virtual object includes directions such as up, down, left, right, upper right, lower right, upper left, and lower left. As shown in fig. 2B, a first probability distribution, indicating the probability of the virtual object moving in each of a plurality of movement directions, is obtained by inputting the scene features to the movement direction prediction network. The first probability distribution may be discrete and may have several equal maxima. For example, if the probabilities of moving left and moving right are both 0.4 and the probabilities of the other directions are below 0.4, the first probability distribution indicates that moving left and moving right are equally optimal, simulating either choice a real player might make.

Optionally, the view angle value of the virtual object corresponds to the direction the virtual object faces. The predetermined behavior set for the view angle value is the interval of view angles to which the virtual object can rotate; for example, the interval may run from 0 to 1, corresponding to 0 to 360 degrees clockwise. The device 2000 inputs the scene features to the view angle prediction network to obtain a second probability distribution, which indicates the probability distribution of the virtual object's view angle value at the next moment over the view angle value interval. The second probability distribution may be discrete, or continuous as shown in fig. 2B. As shown by the solid line in fig. 2B, the second probability distribution may have several extrema, each representing an optimal solution (optimal strategy) for the virtual object's view angle in the current virtual scene.

For another example, the scene image of the virtual object may be a battle game interface; when an attacking object appears on the battle game interface, an extremum of the second probability distribution is close to the view angle facing that attacking object. In the battle game scene shown in fig. 1, for instance, the view angles facing enemy object A and enemy object B become the two extrema of the second probability distribution.

Optionally, the predetermined behavior set for the view angle magnitude of the virtual object is the interval of magnitude values over which the virtual object's view angle can change; for example, the interval may run from 0 to 1, corresponding to the minimum and maximum magnitudes. The device 2000 inputs the scene features to the view angle magnitude prediction network to obtain a third probability distribution, which indicates the probability distribution of the virtual object's view angle magnitude value at the next moment over the magnitude value interval. The third probability distribution may be discrete, or continuous as shown in fig. 2B. As shown by the dashed line in fig. 2B, the third probability distribution may have several extrema, each representing an optimal solution (optimal strategy) for the virtual object's view angle magnitude in the current virtual scene.

Finally, in operation 203, the device 2000 determines the virtual object behavior based on the probability distribution.

Optionally, the device 2000 determines the movement direction of the virtual object based on the first probability distribution. The virtual object randomly samples its movement action according to the first probability distribution rather than always performing the most probable action. In some cases, always performing the most probable action can cause the virtual object to run into an obstacle and then stand still in front of it. Randomly sampling the movement behavior according to the first probability distribution keeps the virtual object from getting stuck in the game scene: by moving randomly according to the first probability distribution, it can, with a certain probability, extricate itself the way a real game player would.

Optionally, the device 2000 determines the view angle value of the virtual object at the next moment based on the second probability distribution. If an enemy appears on the right, the view angle needs to move right until the enemy appears at the center of the image, making it easier for the virtual object to attack. Fig. 2B shows two extrema, corresponding to the view angle facing enemy object A and the view angle facing enemy object B. View angle values are output randomly according to the second probability distribution, and the outputs lie near an extremum with high probability, so the virtual object turns its line of sight toward an attacking object. Similarly, the device 2000 may determine the view angle magnitude value of the virtual object at the next moment based on the third probability distribution.

Alternatively, both the second and third probability distributions may be Gaussian mixture distributions. A Gaussian mixture is a linear combination of multiple Gaussian distributions and can fit a wide variety of probability distributions well; it therefore fits the view-angle-changing behavior of the virtual object better.

Because the virtual object determines its behavior from a probability distribution, it can execute any of several reasonable game strategies when facing the same game scene. Compared with a model trained with a minimum mean squared error loss, this better resolves the multi-valued problems the virtual object faces.

The method 200 computes the probability distribution of virtual object behavior with a lightweight deep residual network, solving technical problems in the design of virtual object artificial intelligence such as excessively long training time, excessive design difficulty, and the inability to handle multi-valued problems.

Fig. 3A is a schematic diagram illustrating a residual network and a behavior prediction network according to an embodiment of the disclosure. Fig. 3B is a schematic diagram illustrating a first residual module and a second residual module, according to an embodiment of the present disclosure. Fig. 3C is a schematic diagram illustrating a first residual module and a second residual module, according to an embodiment of the present disclosure.

Referring to fig. 3A, the behavior prediction network includes a movement direction prediction network 302, a view angle prediction network 303, and a view angle magnitude prediction network 304. The residual network 301 comprises at least one first residual module 3011 and at least one second residual module 3012.

The residual network 301 is used to output the scene features. Assume that the scene feature is a one-dimensional vector comprising N floating point numbers, with N greater than 1; preferably, N may equal 200.

The movement direction prediction network 302 completes subtask 1: outputting the probability distribution (the first probability distribution) of the virtual object moving in a plurality of movement directions. Optionally, the movement direction prediction network 302 comprises one fully connected layer whose input is the scene feature and whose output is the probability distribution of the virtual object moving in each direction. Taking 8 movement directions as an example, the output is an array of 8 floating point numbers serving as the first probability distribution, each floating point number representing the probability that the virtual object moves in one particular direction. The device 2000 may determine the movement direction of the virtual object based on the first probability distribution.
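A minimal sketch of such a movement direction head, assuming PyTorch, the 200-dimensional scene feature mentioned above, and 8 directions (the class name and the softmax normalization are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MoveDirectionHead(nn.Module):
    """One fully connected layer mapping the scene feature to 8 direction probabilities."""

    def __init__(self, feature_dim: int = 200, num_directions: int = 8):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_directions)

    def forward(self, scene_feature: torch.Tensor) -> torch.Tensor:
        # Softmax yields 8 non-negative floats summing to 1 (the first probability distribution).
        return torch.softmax(self.fc(scene_feature), dim=-1)

head = MoveDirectionHead()
probs = head(torch.randn(1, 200))         # array of 8 probabilities
direction = torch.multinomial(probs, 1)   # sample a direction instead of taking the arg max
```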

The view angle prediction network 303 completes subtask 2: outputting the probability distribution (the second probability distribution) of the virtual object's view angle value at the next moment over the view angle value interval. Optionally, the view angle prediction network 303 comprises three fully connected layers: one computing the means μ, one computing the variances σ, and one computing the weights ω. For example, the input of each fully connected layer is the scene feature and its output is an array of 32 floating point numbers; in fig. 3A, the output of each fully connected layer is represented simply by the number of items in its output array. Through the view angle prediction network 303, the device 2000 obtains three arrays {μ_k}, {σ_k}, {ω_k}, with 1 ≤ k ≤ 32. For each k, the mean μ_k and variance σ_k define a Gaussian distribution:

$$\mathcal{N}(x;\mu_k,\sigma_k)=\frac{1}{\sigma_k\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma_k^2}\right)$$

Thus the arrays {μ_k} and {σ_k} define 32 Gaussian distributions, and combining these 32 Gaussians with their corresponding weights {ω_k} yields the second probability distribution, a Gaussian mixture of the form:

$$p(x)=\sum_{k=1}^{K}\omega_k\,\mathcal{N}(x;\mu_k,\sigma_k)$$

where K is the number of Gaussian distributions in the mixture, ω_k is the weight of the k-th Gaussian with 0 ≤ ω_k ≤ 1 and $\sum_{k=1}^{K}\omega_k=1$, μ_k is the mean of the k-th Gaussian, and σ_k is its standard deviation. The Gaussian distribution is the most common distribution in nature and usually has a single extremum; linearly combining multiple Gaussians into a mixture yields multiple extrema (each representing an optimal strategy), which better fits the probability distribution of view angles.

Those skilled in the art will appreciate that 32 is only an example; the view angle prediction network 303 may output a greater or smaller number of means μ, variances σ, and weights ω.

The view angle magnitude prediction network 304 completes subtask 3: outputting the probability distribution (the third probability distribution) of the virtual object's view angle magnitude value at the next moment over the magnitude value interval. Optionally, the view angle magnitude prediction network 304 likewise comprises three fully connected layers: one computing the means μ, one computing the variances σ, and one computing the weights ω. The third probability distribution is constructed from these values in the same way as the second probability distribution, so the description is not repeated.
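A sketch of one such mixture-density head, assuming PyTorch, K = 32 components, and a 200-dimensional scene feature. The softmax and softplus activations are common ways to enforce the weight and positivity constraints, assumed here rather than mandated by the disclosure:

```python
import torch
import torch.nn as nn

class MixtureDensityHead(nn.Module):
    """Three fully connected layers producing the means, std devs, and weights of a GMM."""

    def __init__(self, feature_dim: int = 200, k: int = 32):
        super().__init__()
        self.mu = nn.Linear(feature_dim, k)      # means of the K Gaussians
        self.sigma = nn.Linear(feature_dim, k)   # pre-activation standard deviations
        self.omega = nn.Linear(feature_dim, k)   # pre-activation mixture weights

    def forward(self, scene_feature: torch.Tensor):
        mu = self.mu(scene_feature)
        sigma = nn.functional.softplus(self.sigma(scene_feature))  # sigma_k > 0
        omega = torch.softmax(self.omega(scene_feature), dim=-1)   # in [0, 1], sum to 1
        return mu, sigma, omega

def sample_value(mu, sigma, omega):
    """Sample a value: pick a component by weight, then sample from its Gaussian."""
    k = torch.multinomial(omega, 1)
    return torch.normal(mu.gather(-1, k), sigma.gather(-1, k))

head = MixtureDensityHead()
angle = sample_value(*head(torch.randn(1, 200)))  # value in the view angle interval
```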

It will be understood by those skilled in the art that the first, second, and third probability distributions may be defined in other ways; the discrete probability distribution and the Gaussian mixture are merely examples, and the present disclosure is not limited thereto.

The structure of the first and second residual modules 3011 and 3012 may be as shown in fig. 3B.

The spatial dimension of the input features of the first residual module 3011 is twice the spatial dimension of its output features, and the channel dimension of its input features is half the channel dimension of its output features. The first residual module can serve as a reusable module within the residual network, invoked repeatedly in its design, which makes designing the residual network simpler.

Optionally, the first residual module 3011 includes: a first number of first convolutional layers, whose stride is a first step size and whose convolution kernels have a first size; a second number of second convolutional layers, whose stride is a second step size and whose convolution kernels have a second size; and a second number of third convolutional layers, whose stride is the second step size and whose convolution kernels have the first size. This design lets the residual network extract information in a more lightweight and efficient way: convolution kernels with different strides and sizes fuse information from receptive fields of different sizes at different sampling rates, improving the efficiency of the residual network.

It will be understood by those skilled in the art that the specific values of the first number, the second number, the first step size, the second step size, the first dimension and the second dimension can be set according to practical situations, and the disclosure does not set any limit to the specific values of these parameters.

For ease of understanding, the present disclosure takes the first number as 2, the second number as 1, the first step size as 2, the second step size as 1, the first size as 1 × 1, and the second size as 3 × 3.

A person skilled in the art can set the number of convolution kernels included in each convolution layer in the first residual module 3011 and the second residual module 3012 according to a difference between a scene image and a game scene. The present disclosure does not limit the number of convolution kernels in a convolution layer.

Referring to fig. 3B, the first residual module 3011 may include convolutional layer A, convolutional layer B, convolutional layer C, and convolutional layer D. Convolutional layers A and B are first convolutional layers, convolutional layer C is a second convolutional layer, and convolutional layer D is a third convolutional layer. Note that even though convolutional layers A and B both belong to the first convolutional layers, the numbers of convolution kernels they contain may be the same or different. The following explanation is merely for convenience in describing embodiments of the present disclosure; the present disclosure does not limit the number of convolution kernels in convolutional layers A-D.

Assume that the spatial dimension of the input features of the first residual module is H × W and the channel dimension is M.

The step size of convolutional layer A and convolutional layer B is 2, and their kernel size is 1 × 1. Since their step size is 2, the width and height (i.e., the spatial dimensions) of both layers' outputs are halved.

Assume that the number of convolution kernels of convolutional layer A is 2M (the number of convolution kernels equals the number of output channels). Therefore, the input spatial dimension of convolutional layer A is H × W, its input channel dimension is M, its output spatial dimension is (0.5H) × (0.5W), and its output channel dimension is 2M.

Assuming that the number of convolution kernels of convolutional layer B is M, the input spatial dimension of convolutional layer B is H × W, the input channel dimension is M, the output spatial dimension is (0.5H) × (0.5W), and the output channel dimension is M.

Since the input features of convolutional layer C are the output features of convolutional layer B, the input spatial dimension of convolutional layer C is (0.5H) × (0.5W), its input channel dimension is M, its output spatial dimension is (0.5H) × (0.5W), and its output channel dimension is M.

Assume that the step size of convolutional layer D is 1, its kernel size is 1 × 1, and its number of convolution kernels is 2M. Since the input features of convolutional layer D are the output features of convolutional layer C, the input spatial dimension of convolutional layer D is (0.5H) × (0.5W), its input channel dimension is M, its output spatial dimension is (0.5H) × (0.5W), and its output channel dimension is 2M.

The output features of the first residual module are obtained by adding the output features of convolutional layer A and convolutional layer D; the output spatial dimension of the first residual module is therefore (0.5H) × (0.5W) and its output channel dimension is 2M.
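Under these assumptions, the first residual module can be sketched in PyTorch as follows; the ReLU activations and the padding of convolutional layer C are assumptions not stated in the disclosure, and `channels` corresponds to M above.

```python
import torch
import torch.nn as nn

class FirstResidualModule(nn.Module):
    """Downsampling residual module matching convolutional layers A-D:
    halves the spatial dimensions and doubles the channel dimension."""

    def __init__(self, channels):  # `channels` corresponds to M above
        super().__init__()
        m = channels
        self.conv_a = nn.Conv2d(m, 2 * m, kernel_size=1, stride=2)             # layer A (shortcut)
        self.conv_b = nn.Conv2d(m, m, kernel_size=1, stride=2)                 # layer B
        self.conv_c = nn.Conv2d(m, m, kernel_size=3, stride=1, padding=1)      # layer C
        self.conv_d = nn.Conv2d(m, 2 * m, kernel_size=1, stride=1)             # layer D
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                  # x: (batch, M, H, W)
        shortcut = self.conv_a(x)          # (batch, 2M, H/2, W/2)
        out = self.relu(self.conv_b(x))    # (batch, M, H/2, W/2)
        out = self.relu(self.conv_c(out))  # (batch, M, H/2, W/2)
        out = self.conv_d(out)             # (batch, 2M, H/2, W/2)
        return self.relu(out + shortcut)   # sum of layer A and layer D outputs
```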

The spatial and channel dimensions of the input features of the second residual module 3012 are the same as those of its output features. Like the first residual module, the second residual module can serve as a reusable building block that is invoked repeatedly in the design of the residual network, which simplifies that design.

The second residual module 3012 includes: a first number of third convolutional layers, whose step size is the second step size and whose convolution kernels have the first size; and a second number of second convolutional layers, whose step size is the second step size and whose convolution kernels have the second size. This design of the second residual module likewise enables the residual network to extract information in a more lightweight and efficient way: convolution kernels of different step sizes and sizes fuse information from receptive fields of different sizes at different sampling rates, improving the efficiency of the residual network.

Referring to fig. 3B, the second residual module 3012 includes convolutional layer E, convolutional layer F, and convolutional layer G. Convolutional layers E and G are third convolutional layers, and convolutional layer F is a second convolutional layer.

Note that even though convolutional layer D, convolutional layer E, and convolutional layer G all belong to the third convolutional layers, the numbers of convolution kernels they contain may be the same or different. Likewise, even though convolutional layer C and convolutional layer F both belong to the second convolutional layers, the numbers of convolution kernels they contain may be the same or different. This disclosure does not limit the number of convolution kernels in convolutional layers E-G.

Assume that the spatial dimension of the input features of the second residual module 3012 is H × W and the channel dimension is M.

Assume that the number of convolution kernels of convolutional layer E is 0.5M, the step size is 1, and the size of the convolution kernels is 1 × 1, thus, the input spatial dimension of convolutional layer E is H × W, the input channel dimension is M, the output spatial dimension is H × W, and the output channel dimension is 0.5M.

Assume that the step size of convolutional layer F is 1, its kernel size is 3 × 3, and its number of convolution kernels is 0.5M, so the input and output features of convolutional layer F have the same spatial and channel dimensions. Since the input features of convolutional layer F are the output features of convolutional layer E, the input spatial dimension of convolutional layer F is H × W, its input channel dimension is 0.5M, its output spatial dimension is H × W, and its output channel dimension is 0.5M.

Assuming that the step size of convolutional layer G is 1, the size of convolutional kernel is 1 × 1, and the number of convolutional kernels is M, since the input characteristic of convolutional layer G is the output characteristic of convolutional layer F, the input space dimension of convolutional layer G is H × W, the input channel dimension is 0.5M, the output space dimension is H × W, and the output channel dimension is M.
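A corresponding PyTorch sketch of the second residual module follows; the identity shortcut addition and the ReLU activations are assumptions consistent with standard residual blocks, since the disclosure states only the layer dimensions.

```python
import torch
import torch.nn as nn

class SecondResidualModule(nn.Module):
    """Identity residual module matching convolutional layers E-G:
    input and output have the same spatial and channel dimensions."""

    def __init__(self, channels):  # `channels` corresponds to M above
        super().__init__()
        m = channels
        self.conv_e = nn.Conv2d(m, m // 2, kernel_size=1, stride=1)                  # layer E
        self.conv_f = nn.Conv2d(m // 2, m // 2, kernel_size=3, stride=1, padding=1)  # layer F
        self.conv_g = nn.Conv2d(m // 2, m, kernel_size=1, stride=1)                  # layer G
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                  # x: (batch, M, H, W)
        out = self.relu(self.conv_e(x))    # (batch, 0.5M, H, W)
        out = self.relu(self.conv_f(out))  # (batch, 0.5M, H, W)
        out = self.conv_g(out)             # (batch, M, H, W)
        return self.relu(out + x)          # identity shortcut (assumed)
```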

The first and second residual modules 3011 and 3012 may be cascaded with other layers in the residual network in various orders. Fig. 3C illustrates an example cascading approach.

In addition to the first and second residual modules 3011 and 3012, the residual network includes a convolutional layer, a global average pooling layer, and a fully connected layer. Assume that the scene image, or the reference image region within it, is 1024 × 1024 pixels with three RGB input channels.

The scene image or the reference image area in the scene image is input to a convolutional layer, which has 8 convolutional kernels, each convolutional kernel has a size of 7 × 7, the step size is 2, and the spatial dimension of the output feature of the convolutional layer is 512 × 512 and the channel dimension is 8.

Then, the output feature passes through a first residual module, and the number of convolution kernels of the convolution layer a, the convolution layer B, the convolution layer C and the convolution layer D of the first residual module is [16,8,8,16], so that an output vector with a spatial dimension of 256 × 256 and a channel dimension of 16 is obtained.

Then, the output feature passes through two second residual modules; the numbers of convolution kernels of convolutional layer E, convolutional layer F, and convolutional layer G of each second residual module are [8, 8, 16], so an output vector with a spatial dimension of 256 × 256 and a channel dimension of 16 is obtained.

Then, the output vectors of the two second residual modules sequentially pass through 1 first residual module, 3 second residual modules, 1 first residual module, 5 second residual modules, 1 first residual module, 2 second residual modules, 1 first residual module and 2 second residual modules to obtain an output vector with a channel dimension of 256 and a space dimension of 16 × 16.

The output vector of the last second residual module (channel dimension 256, spatial dimension 16 × 16) is input to a global average pooling layer and a fully connected layer, producing a one-dimensional vector of 200 floating-point numbers. This one-dimensional vector is also called the scene vector.
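Putting the pieces together, the full residual network of this example can be sketched as follows, reusing the FirstResidualModule and SecondResidualModule classes sketched above; the stem padding and the ReLU after the stem are assumptions.

```python
import torch
import torch.nn as nn

class SceneResidualNetwork(nn.Module):
    """Stem convolution (8 kernels, 7x7, stride 2), the stated cascade of
    first/second residual modules, global average pooling, and a fully
    connected layer producing the 200-float scene vector."""

    def __init__(self, scene_dim=200):
        super().__init__()
        self.stem = nn.Conv2d(3, 8, kernel_size=7, stride=2, padding=3)  # 1024 -> 512
        stages, channels = [], 8
        # Each first module is followed by the stated number of second modules.
        for n_second in (2, 3, 5, 2, 2):
            stages.append(FirstResidualModule(channels))  # halves H, W; doubles C
            channels *= 2
            stages += [SecondResidualModule(channels) for _ in range(n_second)]
        self.stages = nn.Sequential(*stages)              # 512 -> 16, channels 8 -> 256
        self.pool = nn.AdaptiveAvgPool2d(1)               # global average pooling
        self.fc = nn.Linear(channels, scene_dim)          # 256 -> 200

    def forward(self, image):             # image: (batch, 3, 1024, 1024)
        x = torch.relu(self.stem(image))  # (batch, 8, 512, 512)
        x = self.stages(x)                # (batch, 256, 16, 16)
        x = self.pool(x).flatten(1)       # (batch, 256)
        return self.fc(x)                 # (batch, 200) scene vector
```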

Fig. 4A is a flow diagram illustrating training a residual network and a behavior prediction network according to an embodiment of the disclosure. Fig. 4B is a flow diagram illustrating one example of training a residual network and a behavior prediction network in accordance with an embodiment of the present disclosure.

In operation 401, the device 2000 records a video in which the virtual object is manipulated. For example, the device 2000 may collect about half an hour of samples of a gunfight game by manually recording gameplay at a sampling frequency of 10 frames per second.

In operation 402, the device 2000 acquires a plurality of sample data from the video, each sample datum including a game interface sample and, for that game interface sample, a movement direction sample, a next-time view angle sample, and a next-time view amplitude sample of the virtual object. For example, as shown in fig. 4B, the device 2000 may record gunfight game samples and then extract the virtual object behavior from them to obtain movement direction samples, next-time view angle samples, and next-time view amplitude samples. A movement direction sample records the movement of the virtual object in one of 8 directions (at 45-degree intervals: up, upper right, right, lower right, down, lower left, left, and upper left; a helper for this encoding is sketched below). The next-time view angle sample includes the view angle value of each frame while the player operates the virtual object, that is, the view angle value of the game character in each frame of the game video. The next-time view amplitude sample includes the view angle amplitude of each frame while the player operates the virtual object, that is, the change in the view angle value of the game character between frames. The device 2000 saves the game video and the corresponding virtual object behaviors. Alternatively, as shown in fig. 4B, the device 2000 may extract the reference image area (i.e., the small map area) of the game interface as the game interface sample. Optionally, after the sample data is obtained, 80% of it is used to train the residual network and the behavior prediction network (the combination of the two is also referred to as a mixed density network), and the remaining sample data is used to test the mixed density network.
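As a minimal sketch of the direction encoding assumed here (the disclosure does not fix the exact mapping), a recorded per-frame movement vector can be binned into the 8 classes as follows; the coordinate convention (y up, 0 degrees = up, measured clockwise) is a hypothetical choice.

```python
import math

def direction_to_class(dx, dy):
    """Map a movement vector to one of 8 classes at 45-degree intervals:
    0=up, 1=upper right, 2=right, 3=lower right, 4=down, 5=lower left,
    6=left, 7=upper left."""
    angle = math.degrees(math.atan2(dx, dy)) % 360  # 0 deg = up, clockwise
    return int(((angle + 22.5) % 360) // 45)        # center each 45-degree bin
```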

In operation 403, the residual network and the behavior prediction network are trained based on the plurality of sample data. The structure of the residual network may be similar to that shown in figs. 3A to 3B. Those skilled in the art will appreciate that, as shown in fig. 4B, the designer of the virtual object artificial intelligence can also design other lightweight network structures as the residual network for different games, and then train a mixed density network comprising that residual network and the behavior prediction network.

Operation 403 further comprises training the parameters of the residual network and the moving direction prediction network, based on the game interface samples in the plurality of sample data and the moving direction samples corresponding to those game interface samples, by optimizing the categorical cross-entropy loss between the moving direction samples and the moving directions predicted by the moving direction prediction network.

For example, the categorical cross-entropy loss between the movement direction samples and the movement directions predicted by the moving direction prediction network may be defined as:

$$L_{CE} = -\frac{1}{m}\sum_{j=1}^{m}\sum_{i=1}^{C} y_{ji}\,\log p_{ji}$$

where m is the total number of samples, C is the number of categories (e.g., C = 8, representing the game character's movement in 8 directions), and y_{ji} is the label of the i-th class for the j-th sample: y_{ji} is 1 if the category of the j-th sample is i, and 0 otherwise. p_{ji} represents the predicted probability that the j-th sample belongs to the i-th class. Taking the categorical cross-entropy loss as the objective function, the parameters of the residual network and the moving direction prediction network are trained by optimizing this loss until the loss function converges, so that the two networks learn the movement strategy of the gunfight game.
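In code, PyTorch's built-in cross-entropy covers this loss directly; the sketch below shows one training step, where `model`, the optimizer, and the batch tensors are illustrative stand-ins rather than names from the disclosure.

```python
import torch.nn.functional as F

def train_direction_step(model, optimizer, interface_batch, direction_labels):
    """One optimization step for the movement direction branch.
    F.cross_entropy computes the categorical loss above from raw
    logits and integer class labels in 0..7."""
    logits = model(interface_batch)                   # (batch, 8) movement logits
    loss = F.cross_entropy(logits, direction_labels)  # mean over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```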

Operation 403 further comprises training the parameters of the residual network and the view angle prediction network, based on the game interface samples in the plurality of sample data and the next-time view angle samples corresponding to those game interface samples, by optimizing the posterior probability loss between the view angle samples and the probability distribution of view angles predicted by the view angle prediction network.

For example, the posterior probability loss between the view angle samples and the probability distribution of view angles predicted by the view angle prediction network can be defined as:

$$L = -\sum_{n=1}^{N} \log\left(\sum_{k=1}^{K} \omega_k\,\mathcal{N}\left(x_n \mid \mu_k, \sigma_k^2\right)\right)$$

where x_n is the view angle value of the n-th sample, N is the total number of samples, and ω_k, μ_k, and σ_k are the weight, mean, and standard deviation of the k-th of the K Gaussian components output by the network. Taking the posterior probability loss as the objective function, the parameters of the residual network and the view angle prediction network are trained by optimizing this loss until the loss function converges, so that the two networks learn the game strategy related to the view angle direction of the gunfight game. Through the Gaussian mixture scheme, the device 2000 can better solve the multi-value problem related to the viewing angle direction in the game (multiple reasonable strategies exist for the same scene).

Operation 403 further comprises training the parameters of the residual network and the view amplitude prediction network, based on the game interface samples in the plurality of sample data and the next-time view amplitude samples corresponding to those game interface samples, by optimizing the posterior probability loss between the view amplitude samples and the probability distribution of view amplitudes predicted by the view amplitude prediction network. This posterior probability loss takes the same form as the loss above, with x_n now the view amplitude value of the n-th sample. Similarly, taking the posterior probability loss as the objective function, the parameters of the residual network and the view amplitude prediction network are trained by optimizing this loss until the loss function converges, so that the two networks learn the game strategy related to the view angle amplitude in the gunfight game; the Gaussian mixture scheme likewise solves the multi-value problem related to the view angle amplitude (multiple reasonable strategies exist for the same scene).
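Both posterior probability losses can be computed with the same routine; the following is a minimal PyTorch sketch of the mixture negative log-likelihood under the parameterization above, with a small epsilon added as a numerical-stability assumption.

```python
import math
import torch

def mixture_nll(x, mu, sigma, omega, eps=1e-8):
    """Negative log-likelihood of samples x (shape (N,)) under the
    predicted Gaussian mixture with parameters mu, sigma, omega
    (each of shape (N, K)); averaged over the N samples."""
    x = x.unsqueeze(-1)                                 # (N, 1)
    comp = torch.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi))               # per-component densities
    likelihood = (omega * comp).sum(dim=-1)             # mixture density per sample
    return -torch.log(likelihood + eps).mean()          # average negative log-likelihood
```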

In experiments of embodiments of the present disclosure, the device 2000 was able to complete training of the mixed density network by updating its parameters over 20 iterations (each iteration passes through all training samples). With game samples from 10 plays, each play lasting about 3 minutes, the recording time is about 30 minutes; on a GPU, training the mixed density network takes about half an hour, so the virtual object artificial intelligence is obtained in roughly one hour.

Because the network imitates how the player controls the virtual object, the mixed density network can train the gunfight game artificial intelligence from a small number of recorded samples, greatly improving training efficiency. Meanwhile, the lightweight residual model can extract more discriminative abstract features, so the game artificial intelligence achieves better results in the gunfight game. Finally, the mixed density network outputs the parameters of the discrete probability distribution and the Gaussian mixture distribution, and the view angle and view amplitude are sampled according to these distributions, thereby better solving the multi-value problem in games.

FIG. 5 illustrates a flow diagram of a method 500 of hosting virtual object behavior in a battle game in accordance with an embodiment of the present disclosure.

A game hosting scenario means that, when the terminal is offline or the player is busy, the player can hand over (host) his or her game character, so that the terminal or the server controls the game character on the player's behalf.

In operation 501, the device 2000 may determine a probability distribution of virtual object behaviors in a predetermined behavior set based on a game interface of the virtual object. The game interface for the virtual object may be the scene image 100 in fig. 1. A reference image area, such as a small map area, may be included in the scene image.

The probability distribution of the virtual object behavior in the predetermined behavior set may be a discrete probability distribution or a continuous probability distribution, which is not limited by the present disclosure. The probability distribution indicates the probability that each behavior in the predetermined set occurs. For example, assuming that the virtual object behavior indicates whether the virtual object shoots, the predetermined behavior set includes the two behaviors "shoot" and "do not shoot". The device 2000 calculates from the scene features a probability of 0.7 for "shoot" and 0.3 for "do not shoot". The virtual object then performs the shooting operation with probability 0.7 when facing the scene shown in fig. 2B. The virtual object artificial intelligence outputs a random number using this probability distribution. Assuming the random number represents shooting with 1 and not shooting with 0, when the virtual object faces the same scene many times, the random number 1 is output about 70% of the time. Therefore, the behavior pattern of the virtual object is neither rigid nor easy to predict, which increases the interest of the game.
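A two-line sketch of this random sampling, with the 0.7/0.3 probabilities mirroring the example above:

```python
import numpy as np

rng = np.random.default_rng()
# 1 = shoot, 0 = do not shoot; the action is drawn from the predicted
# distribution rather than always taking the most probable behavior.
action = rng.choice([1, 0], p=[0.7, 0.3])
```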

Optionally, as shown in fig. 2B, the probability distribution of the virtual object behavior in the predetermined behavior set includes a first probability distribution, a second probability distribution, and a third probability distribution. The first probability distribution indicates a probability distribution of movement of a virtual object in a plurality of directions of movement. The second probability distribution indicates a probability distribution of view angle values of the virtual object at a next time instant over a range of view angle values. The third probability distribution indicates a probability distribution of view angle amplitude values of the virtual object at a next time in a view angle amplitude value interval.

In operation 502, the device 2000 may host the virtual object behavior based on the probability distribution. Wherein, in a case where an attack object appears on the game interface, an extreme value of a probability distribution of a perspective angle of the virtual object at a next time is close to a perspective angle facing the attack object.

For example, if an enemy appears on the right side, the view angle needs to move to the right until the enemy appears in the center of the image, so that the virtual object can attack the enemy. Two extreme values of the second probability distribution are shown in fig. 2B, corresponding respectively to the view angle facing enemy character A and the view angle facing enemy character B. A view angle value is randomly output according to the second probability distribution; the output value has a high probability of lying close to one of the extreme values, so that the virtual object turns its line of sight toward the attack object.
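Sampling from the second probability distribution amounts to drawing from the Gaussian mixture: pick a component by its weight, then sample around its mean. A minimal sketch, assuming the per-frame mixture parameters are 1-D arrays of length K:

```python
import numpy as np

def sample_view_angle(mu, sigma, omega, rng=None):
    """Draw a view angle value from the predicted Gaussian mixture;
    outputs land near the mixture's extreme values (e.g., the angles
    facing enemy A or enemy B) with high probability."""
    rng = rng or np.random.default_rng()
    k = rng.choice(len(omega), p=omega)  # pick a component by its weight
    return rng.normal(mu[k], sigma[k])   # sample around that component's mean
```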

Because the virtual object determines its behavior according to a probability distribution, it can execute any of several reasonable game strategies when facing the same game scene, thereby better solving the multi-value problem faced by the virtual object.

Fig. 6 is a structural diagram illustrating an apparatus 2000 for determining behavior of a virtual object according to an embodiment of the present disclosure.

The device 2000 may include a contextual feature acquisition module 601, a probability distribution determination module 602, and a virtual object behavior determination module 603.

The scene feature obtaining module 601 may be configured to obtain, based on a scene image of a virtual object, a scene feature representing a scene in which the virtual object is located by using a residual network.

Optionally, a reference image region, such as a small map region, may be included in the scene image. The scene feature obtaining module 601 may further: intercept the reference image area from the scene image, where the reference image area shows the known information of the virtual object in the game. Examples of such known information are the deployment layouts of both sides, map areas that have already been explored, friendly object locations, teammate locations, and so on. Since the reference image region represents, in an abstract manner, the information known to the virtual object in the scene image, the device 2000 may obtain the scene features characterizing the scene of the virtual object using the residual network based on the reference image region alone, thereby reducing the amount of input to the residual network and making it more lightweight and efficient. A minimal cropping helper is sketched below.
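The box coordinates in the sketch are placeholders, since the minimap's position depends on the particular game UI.

```python
import numpy as np

def crop_reference_region(scene_image, box=(0, 0, 256, 256)):
    """Intercept the reference image area (e.g., the small map) from an
    HxWx3 scene image before feeding it to the residual network."""
    top, left, height, width = box
    return scene_image[top:top + height, left:left + width, :]
```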

The probability distribution determination module 602 may be configured to determine a probability distribution of the behavior of the virtual object based on the contextual characteristics.

Alternatively, the device 2000 may utilize a behavior prediction network to determine the probability distribution of virtual object behaviors in the predetermined behavior set. The combination of the behavior prediction network and the residual network may be referred to as a mixed density network. The behavior prediction network comprises a moving direction prediction network, a view angle prediction network, and a view amplitude prediction network, and the virtual object behavior comprises at least one of: the moving direction of the virtual object, the view angle of the virtual object at the next time, and the view amplitude of the virtual object at the next time. Optionally, the predetermined behavior set for the moving direction of the virtual object includes moving up, upper right, right, lower right, down, lower left, left, and upper left. The predetermined behavior set for the view angle value of the virtual object is the interval of view angle values to which the virtual object can rotate. The predetermined behavior set for the view angle amplitude value is the interval of view angle amplitude values over which the virtual object's view can change.

The virtual object behavior determination module 603 may be configured to determine the virtual object behavior based on the probability distribution.

Optionally, the device 2000 determines the moving direction of the virtual object based on the first probability distribution. The virtual object randomly samples its movement behavior according to the first probability distribution rather than always performing the most probable action. In some cases, always performing the most probable action could cause the virtual object to run into an obstacle and then remain stopped in front of it. By randomly sampling the movement behavior from the first probability distribution, the virtual object avoids getting stuck in the game scene: it moves randomly according to the first probability distribution and therefore escapes such situations with a certain probability.

Optionally, the device 2000 determines the view angle value of the virtual object at the next time based on the second probability distribution. If an enemy appears on the right side, the view angle needs to move to the right until the enemy appears in the center of the image, so that the virtual object can attack the enemy. Two extreme values are shown in fig. 2B, corresponding respectively to the view angle facing enemy character A and the view angle facing enemy character B. A view angle value is randomly output according to the second probability distribution; the output value has a high probability of lying close to one of the extreme values, so that the virtual object turns its line of sight toward the attack object. Similarly, the device 2000 may determine the view angle amplitude value of the virtual object at the next time based on the third probability distribution.

Embodiments of the present disclosure provide a computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, implement the method 200 and the method 500.

According to the method 200 and the method 500, the distribution of virtual object behaviors is calculated through a lightweight residual network, which solves the technical problems in the design of virtual object artificial intelligence of overly long training time, excessive design difficulty, and inability to handle multi-value problems.

It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In general, the various example embodiments of this disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of embodiments of the disclosure have been illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

The exemplary embodiments of the invention, as set forth in detail above, are intended to be illustrative, not limiting. It will be appreciated by those skilled in the art that changes may be made in these embodiments or in their features without departing from the principles and spirit of the invention, and that such changes are intended to be within the scope of the invention.
