Intelligent agent interaction method and device, computer equipment and storage medium

Document No.: 413371    Publication date: 2021-12-21

Note: This technique, "Intelligent agent interaction method and device, computer equipment and storage medium" (一种智能体互动方法、装置、计算机设备及存储介质), was designed and created by 邱福浩, 韩国安, 练振杰, 王伟轩, and 王亮 on 2021-07-22. Its main content is as follows: The application provides an agent interaction method, apparatus, computer device, and storage medium, applicable to the field of cloud computing or artificial intelligence and intended to solve the problem of low interaction accuracy of agents. The method comprises: loading a target agent in response to an interaction request instruction triggered by a virtual account; in response to a control operation triggered by the virtual account on a first target virtual controlled element associated with the virtual account in a target virtual interaction scene, acquiring a target interaction scene image corresponding to the control operation; extracting target interaction state features from the target interaction scene image; determining a target scheduling operation and a target interaction operation for the target agent based on the target interaction state features; and, in response to the target scheduling operation and the target interaction operation, controlling a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

1. An agent interaction method, comprising:

in response to an interaction request instruction triggered by a virtual account, loading a target agent;

in response to a control operation triggered by the virtual account on a first target virtual controlled element associated with the virtual account in a target virtual interaction scene, acquiring a target interaction scene image corresponding to the control operation;

extracting target interaction state features from the target interaction scene image;

determining a target scheduling operation and a target interaction operation for the target agent based on the target interaction state features;

and in response to the target scheduling operation and the target interaction operation, controlling a second target virtual controlled element associated with the target agent in the target virtual interaction scene.
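
To make the claimed flow concrete, the following is a minimal, self-contained Python sketch of the claim-1 control loop. Every name in it (load_target_agent, capture_scene_image, and so on) is a hypothetical stand-in rather than an API from this disclosure, and the toy policy simply hashes the features into two operation indices.

```python
# A minimal sketch of the claim-1 loop; all names are hypothetical stand-ins.
import numpy as np

def load_target_agent():
    # Stand-in for loading a trained target agent.
    return {"policy": lambda feat: (int(feat.sum()) % 4, int(feat.mean()) % 8)}

def capture_scene_image():
    # Stand-in: the target interaction scene image after the control operation.
    return np.zeros((84, 84, 3), dtype=np.uint8)

def extract_state_features(image):
    # Stand-in for extracting the target interaction state features.
    return image.astype(np.float32).reshape(-1)[:128]

def apply_operations(schedule_op, interact_op):
    # Stand-in: drive the second target virtual controlled element.
    print(f"scheduling op={schedule_op}, interaction op={interact_op}")

def on_control_operation():
    agent = load_target_agent()                 # respond to the interaction request
    image = capture_scene_image()               # image for the control operation
    features = extract_state_features(image)    # target interaction state features
    schedule_op, interact_op = agent["policy"](features)
    apply_operations(schedule_op, interact_op)  # control the second element

if __name__ == "__main__":
    on_control_operation()
```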

2. The method of claim 1, wherein the target agent is trained in the following manner:

performing multiple rounds of iterative training on an agent to be trained, based on the interaction between the agent to be trained and a preset reference agent in a sample virtual interaction scene, until a preset training target is met, and then outputting the agent to be trained as the target agent, wherein each round of iterative training performs the following operations:

predicting, based on a first sample interaction state feature corresponding to a first sample interaction scene image of the sample virtual interaction scene, a sample scheduling operation to be performed by the agent to be trained on a sample virtual controlled element associated with the agent to be trained in the sample virtual interaction scene, and predicting a sample interaction operation to be performed by the agent to be trained on the sample virtual controlled element after the sample scheduling operation is executed;

and adjusting model parameters of the agent to be trained based on a second sample interaction state feature corresponding to a second sample interaction scene image generated after the sample scheduling operation is executed, and on a third sample interaction state feature corresponding to a third sample interaction scene image generated after the sample interaction operation is executed.
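
One round of the claim-2 training loop might look like the sketch below. The toy environment, the modulo "predictions", and the scalar parameter update are illustrative assumptions; the patent does not prescribe a model family.

```python
# A toy, self-contained sketch of one claim-2 training round.
import numpy as np

rng = np.random.default_rng(1)

def make_toy_env():
    state = {"img": rng.integers(0, 255, (8, 8), dtype=np.uint8)}
    return {
        "observe": lambda: state["img"],
        "step": lambda op: state.update(
            img=rng.integers(0, 255, (8, 8), dtype=np.uint8)),
    }

def extract_features(image):
    return image.astype(np.float32).ravel() / 255.0  # sample state feature

def train_one_round(params, env):
    s1 = extract_features(env["observe"]())  # first sample state feature
    sched_op = int(s1.sum()) % 4              # predicted sample scheduling op
    env["step"](("schedule", sched_op))
    s2 = extract_features(env["observe"]())  # after the scheduling operation
    inter_op = int(s2.sum()) % 8              # predicted sample interaction op
    env["step"](("interact", inter_op))
    s3 = extract_features(env["observe"]())  # after the interaction operation
    # Adjust "model parameters" from s2 and s3 (placeholder update rule).
    return params + 0.01 * (s2.mean() + s3.mean() - params)

env, params = make_toy_env(), 0.0
for _ in range(3):
    params = train_one_round(params, env)
print(f"params after 3 rounds: {params:.4f}")
```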

3. The method of claim 2, wherein adjusting the model parameters of the agent to be trained based on the second sample interaction state feature corresponding to the second sample interaction scene image generated after the sample scheduling operation is performed and the third sample interaction state feature corresponding to the third sample interaction scene image generated after the sample interaction operation is performed comprises:

determining scheduling reward data for the sample scheduling operation according to a preset scheduling reward policy, based on the second sample interaction state feature, wherein the scheduling reward data characterizes the degree of completion of the sample scheduling operation and the degree of influence of the sample scheduling operation on the sample interaction result;

determining interaction reward data for the sample interaction operation according to a preset interaction reward policy, based on the third sample interaction state feature, wherein the interaction reward data characterizes the degree of influence of the sample interaction operation on the sample interaction result;

and respectively determining error values between the scheduling reward data and preset target reward data and between the interaction reward data and the preset target reward data, and adjusting the model parameters of the agent to be trained based on the obtained error values.
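
A hedged sketch of claim 3 follows: turn the two rewards into error values against a preset target and nudge the model parameters. The particular reward policies and the additive update are illustrative assumptions, not the patent's exact rule.

```python
# Illustrative claim-3 update: rewards -> errors vs. target -> parameter nudge.
import numpy as np

def scheduling_reward(s2):
    # Preset scheduling reward policy: completion degree + influence on result.
    return float(np.clip(s2.mean(), 0.0, 1.0))

def interaction_reward(s3):
    # Preset interaction reward policy: influence on the sample result.
    return float(np.clip(s3.mean(), 0.0, 1.0))

def update_params(params, s2, s3, target=1.0, lr=0.05):
    err_sched = target - scheduling_reward(s2)  # error vs. preset target reward
    err_inter = target - interaction_reward(s3)
    return params + lr * (err_sched + err_inter)

s2 = np.random.default_rng(2).random(16)
s3 = np.random.default_rng(3).random(16)
print(update_params(0.0, s2, s3))
```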

4. The method according to claim 2, wherein performing multiple rounds of iterative training on the agent to be trained, based on the interaction between the agent to be trained and the preset reference agent in the sample virtual interaction scene, until the preset training target is met, and outputting the agent to be trained as the target agent, comprises:

randomly sampling a reference agent from a preset reference agent set according to the selection probability of each reference agent in the set;

performing multiple rounds of iterative training on the agent to be trained based on its interaction with the sampled reference agent in the sample virtual interaction scene;

if the agent to be trained does not meet the training target when the sample interaction result between it and the sampled reference agent is obtained, re-sampling a reference agent from the set and continuing the multiple rounds of iterative training of the agent to be trained;

and if the agent to be trained meets the training target, outputting the agent to be trained as a target agent.
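
The self-play loop of claim 4 can be sketched as follows. The pool contents, the probability values, and the win-rate training target are illustrative assumptions.

```python
# Illustrative claim-4 loop: sample opponents by selection probability until
# the (assumed) training target is met.
import random

reference_pool = [
    {"name": "ref-0", "prob": 0.2},
    {"name": "ref-1", "prob": 0.3},
    {"name": "ref-2", "prob": 0.5},
]

def sample_reference(pool):
    names = [a["name"] for a in pool]
    probs = [a["prob"] for a in pool]
    return random.choices(names, weights=probs, k=1)[0]

def meets_training_target(win_rate, threshold=0.95):
    return win_rate >= threshold

win_rate, rounds = 0.0, 0
while not meets_training_target(win_rate):
    opponent = sample_reference(reference_pool)  # (re-)sample a reference agent
    # ... multiple rounds of iterative training against `opponent` ...
    rounds += 1
    win_rate = min(1.0, win_rate + 0.1)          # toy stand-in for evaluation
print(f"trained for {rounds} sampling rounds")
```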

5. The method of claim 4, further comprising, before outputting the agent to be trained as a target agent:

counting the number of rounds of iterative training performed on the agent to be trained;

if the counted number of rounds reaches a preset number, outputting the agent to be trained as a reference agent and adding it to the reference agent set;

and resetting the count, continuing the iterative training of the agent to be trained, and updating the reference agent set based on the newly counted number of rounds.
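
Claim 5's pool-update rule amounts to periodically freezing a snapshot of the learner into the opponent pool. In the sketch below, the snapshot interval and the deepcopy snapshot are illustrative assumptions.

```python
# Illustrative claim-5 rule: every `snapshot_every` rounds, freeze a copy of
# the agent into the reference pool and reset the counter.
import copy

snapshot_every = 100
reference_pool = []
agent = {"params": 0.0}
counter = 0

for step in range(350):
    agent["params"] += 0.01          # one round of iterative training (toy)
    counter += 1
    if counter >= snapshot_every:    # counted rounds reach the preset number
        reference_pool.append(copy.deepcopy(agent))
        counter = 0                  # reset the count and keep training

print(f"pool size: {len(reference_pool)}")  # -> 3
```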

6. The method of claim 2, wherein, before predicting the sample scheduling operation performed by the agent to be trained on the sample virtual controlled element associated with the agent to be trained in the sample virtual interaction scene and predicting the sample interaction operation performed on the sample virtual controlled element after the sample scheduling operation is executed, the method further comprises:

performing region identification on the first sample interaction scene image to obtain a first interaction result region, a first global view region, and a first local view region;

performing image feature extraction on the first interaction result region, the first global view region, and the first local view region, respectively, to obtain a corresponding first feature vector, first global view feature matrix, and first local view feature matrix, wherein the first feature vector characterizes interaction information related to the sample interaction result; the first global view feature matrix characterizes position information of the sample virtual controlled element, position information of a reference virtual controlled element associated with the reference agent, and position information of scene elements included in the sample virtual interaction scene; and the first local view feature matrix characterizes position information of any sample virtual controlled element, reference virtual controlled element, and scene element contained in the first local view region;

and taking the first feature vector, the first global view feature matrix, and the first local view feature matrix as the first sample interaction state feature corresponding to the first sample interaction scene image.
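
A hedged sketch of claim 6's region split and feature extraction follows. The crop coordinates for the result, global-view (minimap-like), and local-view regions are illustrative assumptions; in practice they would come from the UI layout of the interaction scene, and the mean-pooling "features" stand in for a learned extractor.

```python
# Illustrative claim-6 pipeline: regions -> feature vector + two matrices.
import numpy as np

def split_regions(frame):
    result_region = frame[0:20, 0:120]   # e.g. score/result banner (assumed)
    global_region = frame[0:64, -64:]    # e.g. minimap in a corner (assumed)
    local_region = frame[64:, :]         # main (local) view (assumed)
    return result_region, global_region, local_region

def extract_state(frame):
    res, glob, loc = split_regions(frame)
    feat_vec = res.mean(axis=(0, 1))     # first feature vector
    glob_mat = glob.mean(axis=2)         # first global view feature matrix
    loc_mat = loc.mean(axis=2)           # first local view feature matrix
    return feat_vec, glob_mat, loc_mat

frame = np.zeros((128, 128, 3), dtype=np.float32)
v, g, l = extract_state(frame)
print(v.shape, g.shape, l.shape)  # (3,) (64, 64) (64, 128)
```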

7. The method according to claim 6, wherein predicting, based on the first sample interaction state feature corresponding to the first sample interaction scene image, the sample scheduling operation performed by the agent to be trained on the sample virtual controlled element, and predicting the sample interaction operation performed after the sample scheduling operation is executed, comprises:

predicting the sample scheduling operation performed by the agent to be trained on the sample virtual controlled element based on the first feature vector and the first global view feature matrix;

predicting, based on the sample scheduling operation, the first global view feature matrix, and the first local view feature matrix, a predicted feature vector, a predicted global view feature matrix, and a predicted local view feature matrix corresponding to a predicted interaction scene image generated after the sample scheduling operation is executed;

and predicting the sample interaction operation performed by the agent to be trained on the sample virtual controlled element based on the predicted feature vector and the predicted local view feature matrix.
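
The two-stage structure of claim 7 can be sketched as below: scheduling is read from global features, a simple stand-in "dynamics" step predicts the post-scheduling features, and the interaction operation is read from the predicted local features. The linear shift used as dynamics is an assumption; the claim only requires that predicted features feed the interaction prediction.

```python
# Illustrative claim-7 two-stage prediction with a toy dynamics stand-in.
import numpy as np

def predict_schedule(feat_vec, glob_mat):
    return int(feat_vec.sum() + glob_mat.sum()) % 4      # sample scheduling op

def predict_next_features(sched_op, glob_mat, loc_mat):
    shift = 0.1 * sched_op                               # toy "dynamics"
    pred_vec = np.array([glob_mat.mean() + shift])       # predicted feature vector
    return pred_vec, glob_mat + shift, loc_mat + shift   # predicted matrices

def predict_interaction(pred_vec, pred_loc_mat):
    return int(pred_vec.sum() + pred_loc_mat.sum()) % 8  # sample interaction op

glob, loc, vec = np.ones((8, 8)), np.ones((8, 16)), np.ones(3)
op = predict_schedule(vec, glob)
p_vec, p_glob, p_loc = predict_next_features(op, glob, loc)
print(op, predict_interaction(p_vec, p_loc))
```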

8. The method of claim 7, wherein predicting, based on the first feature vector and the first global view feature matrix, the sample scheduling operation performed by the agent to be trained on the sample virtual controlled element comprises:

dividing the first global view region into a plurality of sub-regions;

predicting a target sub-region for the sample virtual controlled element based on the first feature vector and the first global view feature matrix;

and obtaining the sample scheduling operation based on the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to the sample virtual controlled element.
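
Claim 8's grid scheduling can be sketched as follows. The 4x4 grid and the argmax cell choice are illustrative assumptions; a trained model would score the cells from the features.

```python
# Illustrative claim-8 grid scheduling: current cell -> predicted target cell.
import numpy as np

GRID = 4  # assumed 4 x 4 division of the global view region

def cell_of(pos, size):
    return (int(pos[0] / size * GRID), int(pos[1] / size * GRID))

def predict_target_cell(feat_vec, glob_mat):
    # Toy per-cell score; a real model would regress this from the features.
    scores = glob_mat[:GRID, :GRID] + feat_vec.mean()
    return np.unravel_index(np.argmax(scores), scores.shape)

size = 64.0
current = cell_of((10.0, 50.0), size)
target = predict_target_cell(np.ones(3), np.arange(16.0).reshape(4, 4))
schedule_op = {"from": current, "to": tuple(int(i) for i in target)}
print(schedule_op)  # e.g. {'from': (0, 3), 'to': (3, 3)}
```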

9. The method of claim 8, wherein after obtaining the sample scheduling operation based on the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to the sample virtual controlled element, the method further comprises:

based on the sample scheduling operation, controlling the sample virtual controlled element to move from its current sub-region toward the corresponding target sub-region, and starting a timer;

determining, based on a preset moving speed of the sample virtual controlled element, a reference duration for the sample virtual controlled element to move from the current sub-region to the corresponding target sub-region;

if the timed duration reaches the reference duration, acquiring the second sample interaction scene image;

and extracting the second sample interaction state feature corresponding to the second sample interaction scene image, obtaining a second feature vector, a second global view feature matrix, and a second local view feature matrix.
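
The timing rule of claim 9 — derive a reference duration from the preset moving speed and capture the second image when the timer reaches it — can be sketched briefly. The Euclidean distance between cells is an assumption; the claim does not fix a distance metric.

```python
# Illustrative claim-9 timing: reference duration = distance / preset speed.
import math
import time

def reference_duration(src, dst, speed):
    return math.dist(src, dst) / speed   # straight-line distance (assumed)

src, dst, speed = (0.0, 3.0), (3.0, 3.0), 30.0
ref = reference_duration(src, dst, speed)
start = time.monotonic()
while time.monotonic() - start < ref:
    time.sleep(0.01)                     # element is moving toward the target
second_image = "capture the second sample scene image here"
print(f"captured after {time.monotonic() - start:.2f}s (ref {ref:.2f}s)")
```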

10. The method of claim 9, further comprising, after extracting a second sample interaction state feature corresponding to the second sample interaction scene image:

determining a first scheduling sub-reward for the sample scheduling operation based on whether the sub-region where the sample virtual controlled element is currently located matches the target sub-region indicated for it by the sample scheduling operation;

determining a second scheduling sub-reward for the sample scheduling operation based on the change between the first feature vector and the second feature vector corresponding to the sample virtual controlled element;

and determining the scheduling reward data for the sample scheduling operation as a weighted sum of the first scheduling sub-reward and the second scheduling sub-reward.
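
Written out as a formula (the symbols, indicator form, and function f below are illustrative notation, not taken from the original text), claim 10's scheduling reward is:

```latex
r_{\text{sched}} = w_1\, r_1 + w_2\, r_2, \qquad
r_1 = \mathbb{1}\left[\text{current sub-region} = \text{target sub-region}\right], \qquad
r_2 = f\!\left(\mathbf{v}_2 - \mathbf{v}_1\right)
```

where \(\mathbf{v}_1\) and \(\mathbf{v}_2\) are the first and second feature vectors of the sample virtual controlled element, \(f\) maps their change to a scalar, and \(w_1\), \(w_2\) are preset weights.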

11. The method according to any one of claims 2 to 10, wherein the agent to be trained comprises a quantitative information extraction module configured to extract the sample interaction state feature corresponding to each sample interaction scene image;

and the agent to be trained further comprises a training module configured to obtain each item of scheduling reward data and interaction reward data based on the sample interaction state features, and to adjust the model parameters of the agent to be trained based on the obtained scheduling reward data and interaction reward data.

12. The method of claim 11, wherein respectively determining the error values between the scheduling reward data and the interaction reward data and the preset target reward data, and adjusting the model parameters of the agent to be trained based on the obtained error values, comprises:

if the training module comprises a scheduling model and an interaction model, and the target reward data comprises scheduling target reward data and interaction target reward data, determining a scheduling error value between the scheduling reward data and the scheduling target reward data, and adjusting model parameters of the scheduling model based on the obtained scheduling error value;

and determining an interaction error value between the interaction reward data and the interaction target reward data, and adjusting model parameters of the interaction model based on the obtained interaction error value.
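
The split update of claim 12 — a scheduling model and an interaction model, each adjusted only by the error against its own target reward — can be sketched as below. The scalar "models" and linear error updates are assumptions standing in for real networks and losses.

```python
# Illustrative claim-12 dual-model update with separate error values.
class TrainingModule:
    def __init__(self):
        self.sched_param = 0.0    # scheduling model (toy scalar)
        self.inter_param = 0.0    # interaction model (toy scalar)

    def update(self, r_sched, r_inter, t_sched, t_inter, lr=0.1):
        sched_err = t_sched - r_sched          # scheduling error value
        inter_err = t_inter - r_inter          # interaction error value
        self.sched_param += lr * sched_err     # adjust scheduling model only
        self.inter_param += lr * inter_err     # adjust interaction model only
        return sched_err, inter_err

tm = TrainingModule()
print(tm.update(r_sched=0.4, r_inter=0.7, t_sched=1.0, t_inter=1.0))
```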

13. An agent interaction device, comprising:

a loading module, configured to load a target agent in response to an interaction request instruction triggered by a virtual account;

a processing module, configured to, in response to a control operation triggered by the virtual account on a first target virtual controlled element associated with the virtual account in a target virtual interaction scene, acquire a target interaction scene image corresponding to the control operation;

the processing module is further configured to: extract target interaction state features from the target interaction scene image;

the processing module is further configured to: determine a target scheduling operation and a target interaction operation for the target agent based on the target interaction state features;

the processing module is further configured to: in response to the target scheduling operation and the target interaction operation, control a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

14. A computer device, comprising:

a memory for storing program instructions;

a processor, configured to call the program instructions stored in the memory and to perform the method of any one of claims 1 to 12 in accordance with the obtained program instructions.

15. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 12.

Technical Field

The present application relates to the field of computer technologies, and in particular, to an intelligent agent interaction method and apparatus, a computer device, and a storage medium.

Background

With the continuous development of science and technology, more and more devices can provide a virtual interaction scene for multiple virtual accounts, and can also provide a single virtual account with an agent to interact with in such a scene. Taking a game as an example, a game account may interact with an agent when it is not matched with other game accounts; as another example, a game account may improve its interaction capability by interacting with an agent.

Generally, after the virtual account performs a control operation on a virtual controlled element associated with it, the agent can only determine a corresponding feedback operation for a virtual controlled element associated with the agent. However, a control operation by the virtual account does not affect only a single element of the virtual interaction scene, and its effects are diverse; traditional agent interaction methods do not take the real interaction process in the virtual interaction scene into account, so the agent cannot interact flexibly with the virtual account in the virtual interaction scene.

Therefore, in the prior art, the interaction accuracy of the agent is low.

Disclosure of Invention

The embodiment of the application provides an intelligent agent interaction method and device, computer equipment and a storage medium, and is used for solving the problem of low interaction accuracy of an intelligent agent.

In a first aspect, a method for intelligent agent interaction is provided, including:

in response to an interaction request instruction triggered by a virtual account, loading a target agent;

in response to a control operation triggered by the virtual account on a first target virtual controlled element associated with the virtual account in a target virtual interaction scene, acquiring a target interaction scene image corresponding to the control operation;

extracting target interaction state features from the target interaction scene image;

determining a target scheduling operation and a target interaction operation for the target agent based on the target interaction state features;

and in response to the target scheduling operation and the target interaction operation, controlling a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

In a second aspect, an intelligent agent interaction device is provided, which includes:

a loading module, configured to load a target agent in response to an interaction request instruction triggered by a virtual account;

a processing module, configured to, in response to a control operation triggered by the virtual account on a first target virtual controlled element associated with the virtual account in a target virtual interaction scene, acquire a target interaction scene image corresponding to the control operation;

the processing module is further configured to: extract target interaction state features from the target interaction scene image;

the processing module is further configured to: determine a target scheduling operation and a target interaction operation for the target agent based on the target interaction state features;

the processing module is further configured to: in response to the target scheduling operation and the target interaction operation, control a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

Optionally, the target agent is trained in the following manner:

the processing module is further configured to: perform multiple rounds of iterative training on an agent to be trained, based on the interaction between the agent to be trained and a preset reference agent in a sample virtual interaction scene, until a preset training target is met, and then output the agent to be trained as the target agent, wherein in one round of iterative training the processing module is specifically configured to:

predict, based on a first sample interaction state feature corresponding to a first sample interaction scene image of the sample virtual interaction scene, a sample scheduling operation to be performed by the agent to be trained on a sample virtual controlled element associated with the agent to be trained in the sample virtual interaction scene, and predict a sample interaction operation to be performed by the agent to be trained on the sample virtual controlled element after the sample scheduling operation is executed;

and adjust model parameters of the agent to be trained based on a second sample interaction state feature corresponding to a second sample interaction scene image generated after the sample scheduling operation is executed, and on a third sample interaction state feature corresponding to a third sample interaction scene image generated after the sample interaction operation is executed.

Optionally, the processing module is specifically configured to:

determine scheduling reward data for the sample scheduling operation according to a preset scheduling reward policy, based on the second sample interaction state feature, wherein the scheduling reward data characterizes the degree of completion of the sample scheduling operation and the degree of influence of the sample scheduling operation on the sample interaction result;

determine interaction reward data for the sample interaction operation according to a preset interaction reward policy, based on the third sample interaction state feature, wherein the interaction reward data characterizes the degree of influence of the sample interaction operation on the sample interaction result;

and respectively determine error values between the scheduling reward data and preset target reward data and between the interaction reward data and the preset target reward data, and adjust the model parameters of the agent to be trained based on the obtained error values.

Optionally, the processing module is further configured to:

after the model parameters of the agent to be trained are adjusted based on the obtained error values, determine an evaluation value of the agent to be trained according to a preset scoring policy, based on the scheduling reward data and interaction reward data obtained over the multiple rounds of iterative training, wherein the evaluation value characterizes the degree of training of the agent to be trained;

and if the evaluation value converges, output the agent to be trained as the target agent.

Optionally, the processing module is specifically configured to:

randomly sample a reference agent from a preset reference agent set according to the selection probability of each reference agent in the set;

perform multiple rounds of iterative training on the agent to be trained based on its interaction with the sampled reference agent in the sample virtual interaction scene;

if the agent to be trained does not meet the training target when the sample interaction result between it and the sampled reference agent is obtained, re-sample a reference agent from the set and continue the multiple rounds of iterative training of the agent to be trained;

and if the agent to be trained meets the training target, output the agent to be trained as the target agent.

Optionally, the processing module is further configured to:

count the number of rounds of iterative training performed on the agent to be trained before the agent to be trained is output as the target agent;

if the counted number of rounds reaches a preset number, output the agent to be trained as a reference agent and add it to the reference agent set;

and reset the count, continue the iterative training of the agent to be trained, and update the reference agent set based on the newly counted number of rounds.

Optionally, the processing module is further configured to:

before predicting the sample scheduling operation performed by the agent to be trained on the sample virtual controlled element associated with the agent to be trained in the sample virtual interaction scene and predicting the sample interaction operation performed on the sample virtual controlled element after the sample scheduling operation is executed, perform region identification on the first sample interaction scene image to obtain a first interaction result region, a first global view region, and a first local view region;

perform image feature extraction on the first interaction result region, the first global view region, and the first local view region, respectively, to obtain a corresponding first feature vector, first global view feature matrix, and first local view feature matrix, wherein the first feature vector characterizes interaction information related to the sample interaction result; the first global view feature matrix characterizes position information of the sample virtual controlled element, position information of a reference virtual controlled element associated with the reference agent, and position information of scene elements included in the sample virtual interaction scene; and the first local view feature matrix characterizes position information of any sample virtual controlled element, reference virtual controlled element, and scene element contained in the first local view region;

and take the first feature vector, the first global view feature matrix, and the first local view feature matrix as the first sample interaction state feature corresponding to the first sample interaction scene image.

Optionally, the processing module is specifically configured to:

predict the sample scheduling operation performed by the agent to be trained on the sample virtual controlled element based on the first feature vector and the first global view feature matrix;

predict, based on the sample scheduling operation, the first global view feature matrix, and the first local view feature matrix, a predicted feature vector, a predicted global view feature matrix, and a predicted local view feature matrix corresponding to a predicted interaction scene image generated after the sample scheduling operation is executed;

and predict the sample interaction operation performed by the agent to be trained on the sample virtual controlled element based on the predicted feature vector and the predicted local view feature matrix.

Optionally, the processing module is specifically configured to:

divide the first global view region into a plurality of sub-regions;

predict a target sub-region for the sample virtual controlled element based on the first feature vector and the first global view feature matrix;

and obtain the sample scheduling operation based on the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to the sample virtual controlled element.

Optionally, the processing module is further configured to:

after the sample scheduling operation is obtained based on the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to the sample virtual controlled element, control the sample virtual controlled element to move to the corresponding target sub-region based on the sample scheduling operation, and acquire the second sample interaction scene image generated thereby;

and extract the second sample interaction state feature corresponding to the second sample interaction scene image, obtaining a second feature vector, a second global view feature matrix, and a second local view feature matrix.

Optionally, the processing module is further configured to:

after the second sample interaction state feature corresponding to the second sample interaction scene image is extracted, determine a first scheduling sub-reward for the sample scheduling operation based on whether the sub-region where the sample virtual controlled element is currently located matches the target sub-region indicated for it by the sample scheduling operation;

determine a second scheduling sub-reward for the sample scheduling operation based on the change between the first feature vector and the second feature vector corresponding to the sample virtual controlled element;

and determine the scheduling reward data for the sample scheduling operation as a weighted sum of the first scheduling sub-reward and the second scheduling sub-reward.

Optionally, the agent to be trained includes a quantitative information extraction module configured to extract the sample interaction state feature corresponding to each sample interaction scene image;

and the agent to be trained further includes a training module configured to obtain each item of scheduling reward data and interaction reward data based on the sample interaction state features, and to adjust the model parameters of the agent to be trained based on the obtained scheduling reward data and interaction reward data.

Optionally, the processing module is specifically configured to:

if the training module comprises a scheduling model and an interaction model, and the target reward data comprises scheduling target reward data and interaction target reward data, determine a scheduling error value between the scheduling reward data and the scheduling target reward data, and adjust model parameters of the scheduling model based on the obtained scheduling error value;

and determine an interaction error value between the interaction reward data and the interaction target reward data, and adjust model parameters of the interaction model based on the obtained interaction error value.

In a third aspect, a computer device is provided, comprising:

a memory for storing program instructions;

a processor, configured to call the program instructions stored in the memory and to perform the method of the first aspect in accordance with the obtained program instructions.

In a fourth aspect, there is provided a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of the first aspect.

In the embodiments of this application, in response to the control operation triggered by the virtual account on the first target virtual controlled element, the target scheduling operation and the target interaction operation of the target agent can be determined, so that the second target virtual controlled element is controlled in response to both. The target scheduling operation obtained in response to the control operation can direct the second target virtual controlled element to perform a scheduling action; at the macroscopic level, this reflects the interaction strategy the target agent derives from the control operation, rather than a simple action-for-action response, and so improves the interaction accuracy of the target agent.

Meanwhile, when there are multiple second target virtual controlled elements, no single element is controlled in isolation to respond to the virtual account's control operation; instead, the target scheduling operation is derived from the overall situation of all second target virtual controlled elements, so that one or more of them are controlled based on the target scheduling operation, further improving the interaction accuracy of the target agent.

The target interaction operation obtained in response to the control operation can direct the second target virtual controlled element to perform an interaction action; at the microscopic level, this reflects the interaction capability of the target agent, which can only be exploited fully on the basis of a correct interaction strategy, again improving interaction accuracy. Combining the macroscopic and microscopic perspectives improves the macroscopic decision-making capability of the target agent without reducing its interaction capability, so that the target agent can give the virtual account accurate and varied feedback. In the embodiments of this application, the interaction between the target agent and the virtual account can therefore accurately simulate real interaction between virtual accounts, improving the interaction accuracy of the target agent.

Drawings

FIG. 1a is a first schematic diagram of the principle of an agent interaction method provided in the related art;

FIG. 1b is a second schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 2 shows an application scenario of the agent interaction method according to an embodiment of the present application;

FIG. 3a is a third schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 3b is a first flowchart of an agent interaction method according to an embodiment of the present application;

FIG. 4a is a fourth schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 4b is a fifth schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 4c is a sixth schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 4d is a seventh schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 5a is an eighth schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 5b is a ninth schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 6 is a tenth schematic diagram of the principle of an agent interaction method according to an embodiment of the present application;

FIG. 7 is a second flowchart of an agent interaction method according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an agent interaction device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an agent interaction device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.

(1) Multiplayer online tactical sports game (MOBA):

In a match, players are generally divided into two teams that compete against each other on a dispersed game map; each player controls a selected character through an interface and generally does not need to operate organizational units such as building clusters, resources, or trained units within the game.

(2) Agent:

An agent is a computing entity that resides in an environment, can act continuously and autonomously, and exhibits characteristics such as residence, reactivity, sociality, and initiative. The agent is an important concept in the field of artificial intelligence: any entity that can think independently and interact with its environment can be abstracted as an agent.

(3) Reinforcement Learning (RL):

Reinforcement learning describes and solves the problem of an agent learning a policy through interaction with an environment so as to maximize its return or achieve a specific goal. If a behavior of the agent earns a positive reward from the environment, the agent's tendency to produce that behavior later is strengthened. The agent's goal is to find, in each discrete state, the optimal policy that maximizes the expected sum of rewards.
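
The textbook objective behind this definition, supplied here for reference rather than taken from the original text, is the expected discounted return:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[G_t\right]
```

where \(r_t\) is the reward at step \(t\) and \(0 \le \gamma < 1\) is a discount factor that trades off immediate against future rewards.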

Embodiments of the present application relate to cloud technology and Artificial Intelligence (AI). The design is based on cloud computing and cloud storage within cloud technology, and on Computer Vision (CV) and Machine Learning (ML) within artificial intelligence.

Cloud technology is a hosting technology that unifies resources such as hardware, software, and networks within a wide area network or local area network to realize the computation, storage, processing, and sharing of data. It is the general term for the network, information, integration, management-platform, and application technologies applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: background services of technical network systems, such as video websites, image websites, and web portals, require large amounts of computing and storage resources. As the internet industry develops, each item may come to carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data need strong system backing, which can only be realized through cloud computing.

Cloud computing is a computing model that distributes computing tasks over a resource pool of large numbers of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.

As a basic capability provider of cloud computing, one establishes a cloud computing resource pool (an Infrastructure as a Service, or IaaS, platform for short), in which multiple types of virtual resources are deployed for external clients to use selectively.

According to the logical division of functions, a Platform as a Service (PaaS) layer can be deployed on the IaaS layer, and a Software as a Service (SaaS) layer on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container; SaaS covers various kinds of business software, such as web portals and bulk SMS senders. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.

Cloud storage is a new concept extended and developed from a cloud computing concept, and a distributed cloud storage system (hereinafter referred to as a storage system) refers to a storage system which integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work through functions of cluster application, a grid technology, a distributed storage file system and the like, and provides data storage and service access functions to the outside.

At present, the storage method of a storage system is as follows: logical volumes are created, and each logical volume is allocated physical storage space, which may consist of the disks of one or several storage devices. A client stores data on a logical volume, that is, on a file system; the file system divides the data into many parts, each part being an object that contains not only the data itself but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records each object's storage location, so that when the client requests access to the data, the file system can let the client access it according to the recorded storage locations.

The storage system allocates physical storage space to a logical volume as follows: the physical storage space is divided in advance into stripes according to estimates of the capacity of the objects to be stored (these estimates often leave a large margin over the actual object sizes) and the Redundant Array of Independent Disks (RAID) configuration; one logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.

Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields at both the hardware level and the software level. Its infrastructure generally includes sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing technologies, operation/interaction systems, mechatronics, and the like. Its software side mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer vision is a science that studies how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and performing further image processing so that the result is better suited to human observation or to transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.

Machine learning is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.

The application field of the intelligent agent interaction method provided by the embodiment of the application is briefly introduced below.

With the continuous development of science and technology, more and more devices can provide virtual interaction scenes for multiple virtual accounts, and can also provide a single virtual account with an agent for virtual interaction in such a scene. Taking a game as an example, a game account may interact virtually with an agent when it is not matched with other game accounts; as another example, a game account may improve its interaction capability or interaction level by interacting with an agent.

For example, a virtual account may simulate driving at a client by operating a steering wheel or pedals while the agent simulates the other vehicles on the road; the virtual account can then interact virtually with the agent, for example by turning on an indicator to signal other vehicles, steering to stay parallel to the road, or alerting other vehicles with the horn, thereby achieving the purpose of driving practice.

As another example, a virtual account can control a virtual character through a client to find an interview site in a designated area, prepare for the interview at the site, and answer, by voice, the questions of a virtual interviewer simulated by an agent, thereby improving interview skills.

Generally, after the virtual account performs a control operation on a virtual controlled element associated with it, the agent can only determine a corresponding feedback operation for a virtual controlled element associated with the agent. Taking a game as an example, if the virtual account controls a hero associated with it to perform an attack, the agent can only control the attacked virtual controlled element associated with the agent to perform an evasive action.

However, a control operation by the virtual account does not affect only a single element of the virtual interaction scene, and its effects are diverse. Taking a game as an example, when one hero selected by the agent is attacked, another hero could be dispatched to heal it, and the hero under attack could counter-attack directly, using its skills to defeat the opposing hero, instead of evading.

Further, the agent is obtained by training on the virtual interaction scene. The traditional approach trains the agent on the attribute information of the virtual interaction scene; referring to FIG. 1a, the agent is trained on the control rules of the virtual controlled elements, the positions of scene elements, and the interaction durations contained in that attribute information, so that the trained agent can predict the virtual interaction to trigger when facing a virtual account. Taking a game as an example, an agent trained on the skill attributes of heroes, the positions of obstacles in the game map, and the effect duration of each hero skill can, when facing an attack from a virtual account, predict the evasive action to trigger, and so on.

However, since the attribute information of the virtual interaction scene is preset, the interaction capability of the trained agent tends to be simplistic. Faced with an action performed by one virtual controlled element associated with the virtual account, the agent can only make the virtual controlled element associated with the agent that is interacting with that element perform a corresponding action. Taking a game as an example, when the agent faces an attack from any hero selected by the virtual account, it usually makes its own hero trigger an evasive action; yet when one of the agent's heroes is attacked, another hero could be dispatched to heal it, and the attacked hero could counter-attack directly, using its skills to defeat the opposing hero, instead of evading.

As another example, when a hero selected by the virtual account enters the attack range of a hero selected by the agent, the agent usually makes its hero attack immediately; yet the agent's hero could instead hide in a bush to evade, and attack the virtual account's hero only once other allied heroes arrive nearby, thereby realizing tactical coordination between teammates.

In real interaction, strategy matters, and the interaction process is rich and varied. Traditional agent interaction and training methods do not take the real interaction process in the virtual interaction scene into account, so the trained agent can neither interact from the perspective of macroscopic decision-making nor interact flexibly with the virtual account in the virtual interaction scene. When a virtual account interacts with such an agent, the real experience of interacting with other virtual accounts cannot be accurately reproduced. Therefore, in the prior art, the interaction accuracy of the agent is low.

To solve the problem of low agent interaction accuracy, this application provides an agent interaction method. Referring to FIG. 1b, the method loads the target agent in response to an interaction request instruction triggered by the virtual account; in response to a control operation triggered by the virtual account on a first target virtual controlled element associated with the virtual account in the target virtual interaction scene, it acquires a target interaction scene image corresponding to the control operation; it extracts target interaction state features from the target interaction scene image; it determines a target scheduling operation and a target interaction operation for the target agent based on the target interaction state features; and, in response to the target scheduling operation and the target interaction operation, it controls a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

It should be noted that the target scheduling operation and the target interaction operation are merely two views of the operations the target agent can perform, not fixed specific operations. The target scheduling operation directs a second target virtual controlled element to perform a target scheduling action, and the target interaction operation directs it to perform a target interaction action. When the target agent is triggered to execute them, it may execute only the target interaction operation if there is nothing to schedule, only the target scheduling operation if there is nothing to interact with, or no operation at all if there is neither. When there are multiple target virtual controlled elements, the target scheduling operation may include a scheduling action for each of them, and the target interaction operation may include an interaction action for each of them. The target agent may control all target virtual controlled elements simultaneously, or only one or some of them; this can be set according to the actual scene and is not detailed here.

In the embodiment of the application, when the control operation triggered by the virtual account for the first target virtual controlled element is responded, the target scheduling operation and the target interaction operation corresponding to the target agent can be determined, so that the second target virtual controlled element can be controlled in response to the target scheduling operation and the target interaction operation. The target scheduling operation obtained by responding to the control operation can control the second target virtual controlled element to execute the scheduling action, the interaction strategy obtained by the target intelligent agent responding to the control operation is reflected from a macroscopic view, the interaction strategy is not simply responded to the control operation in the interaction action, and the interaction accuracy of the target intelligent agent is improved.

Meanwhile, when there are multiple second target virtual controlled elements, the response to the control operation of the virtual account is not obtained by controlling a single second target virtual controlled element in isolation; instead, the target scheduling operation is determined from the overall situation of all second target virtual controlled elements, so that one or more of them are controlled based on the target scheduling operation, further improving the interaction accuracy of the target agent.

The target interaction operation obtained in response to the control operation controls the second target virtual controlled element to execute an interaction action, reflecting the interaction capability of the target agent from a microscopic perspective; this capability can be exerted to the fullest on the basis of a correct interaction strategy, improving the interaction accuracy of the target agent. By combining the macroscopic and microscopic perspectives, the macroscopic decision-making capability of the target agent is improved without reducing its interaction capability, so that the target agent can feed back the virtual account accurately and diversely. In the embodiment of the application, the interaction between the target agent and the virtual account can accurately simulate the real interaction between virtual accounts, improving the interaction accuracy of the target agent.

An application scenario of the intelligent agent interaction method provided by the present application is explained below.

Please refer to fig. 2, which is an application scenario of the intelligent agent interaction method according to the embodiment of the present application. The application scenario includes a client 101, an agent interaction end 102, and an agent training end 103. The client 101 and the agent interaction terminal 102 can communicate with each other, the agent interaction terminal 102 and the agent training terminal 103 can communicate with each other, and the communication mode can be that wired communication technology is adopted for communication, for example, communication is carried out through connecting a network cable or a serial port cable; the communication may also be performed by using a wireless communication technology, for example, communication may be performed by using technologies such as bluetooth or wireless fidelity (WIFI), and the like, which is not limited specifically.

The client 101 generally refers to a device that can log in to a virtual account, for example, a terminal device, a third-party application that can be accessed by the terminal device, a web page that can be accessed by the terminal device, or the like. For example, the terminal device includes, but is not limited to, a mobile phone, a computer, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. The agent interaction terminal 102 generally refers to a device, such as a terminal device or a server, capable of performing virtual interaction with a virtual account or an agent. For example, the server includes a cloud server, a local server, or an associated third party server, and the like. Agent training terminal 103 generally refers to a device, such as a terminal device or a server, that can train an agent. The client 101, the agent interaction terminal 102 and the agent training terminal 103 can all adopt cloud computing to reduce the occupation of local computing resources; cloud storage can also be adopted to reduce the occupation of local storage resources.

As an embodiment, the client 101 and the agent interaction terminal 102 may be the same device, the agent interaction terminal 102 and the agent training terminal 103 may be the same device, the client 101 and the agent training terminal 103 may be the same device, and the client 101, the agent interaction terminal 102 and the agent training terminal 103 may be the same device, which is not limited specifically. In the embodiment of the present application, a client 101, an agent interaction end 102, and an agent training end 103 are respectively different devices for example.

The following describes an intelligent agent interaction method provided in the embodiment of the present application in detail based on fig. 2.

Before the client 101 performs virtual interaction with the target agent of the agent interaction side 102, the agent interaction side 102 may obtain the target agent. After the agent training terminal 103 carries out iterative training on an agent to be trained, a target agent is obtained, the agent training terminal 103 sends the target agent to the agent interaction terminal 102, and the agent interaction terminal 102 receives the target agent sent by the agent training terminal 103.

The process of training the agent to be trained by the agent training terminal 103 will be described first.

Referring to fig. 3a, based on the interaction process of the agent to be trained and a preset reference agent in a sample virtual interaction scene, multiple rounds of iterative training are performed on the agent to be trained until a preset training target is met, and the agent to be trained is then output as the target agent. Because the virtual interaction scene is simulated by two agents, the characteristics of a real interaction process are integrated into the training of the target agent. Training the agent on realistic interaction in a simulated virtual interaction scene enables the trained target agent to interact flexibly with a virtual account in a real virtual interaction scene, improving the interaction accuracy of the trained agent.

As an embodiment, before the multiple rounds of iterative training based on the interaction process of the agent to be trained and the preset reference agent in the sample virtual interaction scene, the preset reference agent may be obtained. There are various methods for obtaining the reference agent; for example, another device sends the reference agent to the agent training terminal 103, or the agent training terminal 103 obtains the reference agent from a preset reference agent set, and so on. The reference agent set may include reference agents whose interaction capability is comparable to that of the agent to be trained, so that an interaction method yielding the expected sample interaction result can be learned from multiple evenly-matched interaction processes, improving the interaction accuracy of the trained agent. The reference agent set may also include reference agents whose interaction capability is higher than that of the agent to be trained, so that stronger interaction methods can be learned from them. The reference agent set may further include reference agents whose interaction capability is lower than that of the agent to be trained, so that their interaction methods serve as negative examples from which a stronger interaction method is learned. The reference agent set may also contain several of these types at once; reference agents may be selected at random, or each type may be extracted according to a preset probability, and the like, which is not specifically limited.

In the following, a process of obtaining a reference agent by the agent training terminal 103 based on a preset reference agent set and training an agent to be trained is described as an example.

A reference agent is randomly extracted from the reference agent set based on the selection probability corresponding to each reference agent in the preset set. The selection probability corresponding to each reference agent may be determined based on how often that reference agent has been extracted: the more often it has been extracted, the lower its selection probability, and the less often, the higher. The selection probability may also be determined based on the acquisition time of each reference agent: the shorter the duration between the acquisition time and the current time, the higher the selection probability, and the longer the duration, the lower. The way the selection probability is determined is not specifically limited.
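
A minimal sketch of the frequency-based variant of this sampling is given below; the pool structure and the inverse-frequency weighting are illustrative assumptions, not the disclosed implementation:

```python
import random

def sample_reference_agent(pool):
    """Pick a reference agent: the more often an agent has already been
    extracted, the lower its selection probability (inverse-frequency weighting)."""
    weights = [1.0 / (1 + agent["times_extracted"]) for agent in pool]
    chosen = random.choices(pool, weights=weights, k=1)[0]
    chosen["times_extracted"] += 1
    return chosen

# Usage: rarely-extracted snapshots are favoured.
pool = [{"name": f"snapshot_{i}", "times_extracted": i} for i in range(3)]
print(sample_reference_agent(pool)["name"])
```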

After a reference agent is extracted, multiple rounds of iterative training are performed on the agent to be trained based on the interaction process of the agent to be trained and the extracted reference agent in the sample virtual interaction scene. If, when the sample interaction result of the agent to be trained against the extracted reference agent is obtained, the agent to be trained does not yet meet the training target, a reference agent can be extracted again from the set and the multi-round iterative training continued. After the multiple rounds of iterative training, if the agent to be trained meets the training target, it is output as the target agent.

As an embodiment, there are various methods for obtaining the reference agent set, for example, the other device sends a preset reference agent set to the agent training terminal 103, and for example, the reference agent set is obtained in the process of training the agent to be trained, and the like, which is not limited specifically.

The following is an example description of a process for obtaining a set of reference agents during training of agents to be trained.

Before training starts, the preset reference agent set contains only the agent to be trained. During the multiple rounds of iterative training based on the interaction process of the agent to be trained and the reference agent in the sample virtual interaction scene, the training count is accumulated after every round of iterative training, counting how many rounds of iterative training the agent to be trained has undergone.

If the counted number of training rounds has not reached a preset specified number, the iterative training continues. If it has reached the specified number, the current agent to be trained is added to the reference agent set as a reference agent. After a reference agent is added to the set, the training count can be cleared, the iterative training of the agent to be trained continued, and a new training count started. In this way, before the trained target agent is obtained, the reference agents contained in the reference agent set are continuously updated: new reference agents keep being added, and reference agents whose sample interaction results fail to reach a preset index can be removed from the set.

Taking a game scene as an example, if a reference agent in the reference agent set loses to the agent to be trained many times in a row, its combat capability is too weak to contribute to training, so it can be removed from the reference agent set to ensure that the reference agents in the set maintain a comparable interaction level, improving the accuracy of training based on them. Throughout the training of the agent to be trained, the virtual interaction data is produced by agents interacting with each other, without requiring any virtual account to participate, which reduces the difficulty of obtaining virtual interaction data. A sketch of this pool maintenance follows.
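
The pool maintenance described above might look like the following sketch; the snapshot interval, the minimum win rate, and the pool entry layout are illustrative assumptions:

```python
def update_reference_pool(pool, agent_snapshot, train_count,
                          snapshot_interval=1000, min_win_rate=0.05, min_games=20):
    """Every `snapshot_interval` rounds, freeze the current agent into the
    pool; drop reference agents whose win rate against the trainee is too
    low to contribute to training."""
    if train_count % snapshot_interval == 0:
        pool.append({"params": agent_snapshot, "wins": 0, "games": 0})
    return [a for a in pool
            if a["games"] < min_games or a["wins"] / a["games"] >= min_win_rate]

# Usage: snapshots are added at rounds 1000, 2000 and 3000.
pool = []
for step in range(1, 3001):
    pool = update_reference_pool(pool, {"step": step}, step)
print(len(pool))  # -> 3
```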

In the following, a round of iterative training performed on an agent to be trained is taken as an example for introduction, please refer to fig. 3b, which is a schematic flow chart of an agent interaction method provided in an embodiment of the present application.

S301, based on the first sample interaction state feature corresponding to the first sample interaction scene image in the sample virtual interaction scene, predicting a sample scheduling operation executed by the to-be-trained intelligent body aiming at a sample virtual controlled element associated with the to-be-trained intelligent body in the sample virtual interaction scene, and predicting a sample interaction operation executed by the to-be-trained intelligent body aiming at the sample virtual controlled element after executing the sample scheduling operation.

Before prediction is performed based on the first sample interaction state feature, a first sample interaction state feature corresponding to a first sample interaction scene image in a sample virtual interaction scene may be obtained, and there are various methods for obtaining the first sample interaction state feature, for example, receiving the first sample interaction state feature corresponding to the first sample interaction scene image sent by other devices, for example, calculating the first sample interaction state feature in real time when obtaining the first sample interaction scene image, and the like.

When the agent to be trained performs virtual interaction with the reference agent, each frame of the sample virtual interaction scene may be used as a sample interaction scene image, or some frames of the sample virtual interaction scene may be used as each sample interaction scene image, and the like, which is not particularly limited.

Aiming at the first sample interactive scene image, the intelligent agent to be trained can perform region identification processing on the first sample interactive scene image to obtain a first interactive result region, a first global view angle region and a first local view angle region. The first sample interactive scene image may be a first frame image when the to-be-trained agent and the reference agent enter the sample virtual interactive scene, or may be any frame image in the sample virtual interactive scene, and is not particularly limited. The first interaction result area is used for representing the current interaction situation of the intelligent agent to be trained and the reference intelligent agent, the first global view angle area is used for representing the position information of the sample virtual controlled element associated with the intelligent agent to be trained and the reference virtual controlled element associated with the reference intelligent agent in the virtual interaction scene, and the first local view angle area is used for representing the position information of the sample virtual controlled element and the reference virtual controlled element contained in a certain view angle in the sample virtual interaction scene.

Taking a game scene as an example, please refer to fig. 4a, which is a schematic diagram of a possible interface of a first sample interactive scene image. Please refer to fig. 4b, which shows the first interaction result area in the first sample interactive scene image; the first interaction result area may include the survival information of each hero associated with the agent to be trained, the number of opposing heroes killed by the agent to be trained and by the reference agent, the number of times their own heroes have been killed, and the duration of the battle. Referring to fig. 4c, the first global view angle area in the first sample interactive scene image may include each hero associated with the agent to be trained and each hero associated with the reference agent, together with their position information in the sample virtual interactive scene, without specific limitation. Referring to fig. 4d, the first local view angle region in the first sample interactive scene image may include, under the view angle of a certain hero, the heroes associated with the agent to be trained and with the reference agent that fall within that view angle, their position information in the sample virtual interactive scene, and the skill information available to the hero corresponding to that view angle, without specific limitation.

After the first interaction result area, the first global view angle area and the first local view angle area are obtained, image feature extraction processing may be performed on each of them to obtain a corresponding first feature vector, first global view angle feature matrix and first local view angle feature matrix. The first feature vector, the first global view angle feature matrix and the first local view angle feature matrix are taken as the first sample interaction state feature corresponding to the first sample interactive scene image.

The first feature vector is used for representing interaction information related to a sample interaction result, the first global perspective feature matrix is used for representing position information of a sample virtual controlled element, position information of a reference virtual controlled element associated with a reference agent and position information of a scene element contained in a sample virtual interaction scene, and the first local perspective feature matrix is used for representing position information of the sample virtual controlled element contained in a first local perspective area, position information of the reference virtual controlled element contained in the first local perspective area and position information of the scene element contained in the first local perspective area.
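
A toy version of this three-region feature extraction is sketched below; the region boundaries, image size and mean-pooling are stand-ins for the real region identification and feature extraction networks:

```python
import numpy as np

def extract_state_features(scene_image):
    """Split a frame into the interaction-result, global-view and local-view
    regions (boundaries assumed) and pool each into its feature form."""
    result_region = scene_image[:32, :32]      # interaction-result HUD
    global_region = scene_image[:128, -128:]   # mini-map style global view
    local_region = scene_image[128:, :]        # local view around a hero

    feature_vector = result_region.mean(axis=(0, 1))   # result-related scalars
    global_matrix = global_region.mean(axis=2)         # positional map, H x W
    local_matrix = local_region.mean(axis=2)
    return feature_vector, global_matrix, local_matrix

# Usage with a dummy 256x256 RGB frame:
vec, g, l = extract_state_features(np.random.rand(256, 256, 3))
print(vec.shape, g.shape, l.shape)  # (3,) (128, 128) (128, 256)
```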

As an embodiment, the agent to be trained may include a quantitative information extraction module, where the quantitative information extraction module is configured to extract sample interaction state features corresponding to respective sample interaction scene images generated based on virtual interaction in preset respective sample virtual interaction scenes.

After the first sample interaction state feature corresponding to the first sample interactive scene image is obtained, the sample scheduling operation executed by the agent to be trained for the sample virtual controlled element associated with it in the sample virtual interactive scene can be predicted based on that feature, together with the sample interaction operation executed for the sample virtual controlled element after the sample scheduling operation. The two prediction processes are described below in turn.

Predicting the sample scheduling operation:

Based on the first feature vector and the first global view angle feature matrix, the sample scheduling operation performed by the agent to be trained on the sample virtual controlled elements can be predicted. From the first feature vector, the current interaction situation can be obtained, and from it the gap between the current interaction situation and the expected sample interaction result. From the first global view angle feature matrix, the position information of the sample virtual controlled elements associated with the agent to be trained and of the reference virtual controlled elements associated with the reference agent can be obtained from a macroscopic perspective. When predicting the sample scheduling operation, positions favorable to the sample virtual controlled elements can therefore be analyzed in the direction of narrowing that gap, yielding the sample scheduling operation. This prediction mirrors the way a virtual account operates virtual controlled elements in a real interaction process, improving the interaction accuracy of the trained agent. The sample scheduling operation may include a scheduling direction; referring to fig. 5a, a specified number of direction angles may be divided around the current position of the sample virtual controlled element. The sample scheduling operation may further include a scheduling distance, so that the sample virtual controlled element can be controlled to move the corresponding scheduling distance in the scheduling direction, from its current position to the target position.

The agent to be trained may also divide the first global view angle area into a plurality of sub-regions, predict the target sub-region corresponding to the sample virtual controlled element based on the first feature vector and the first global view angle feature matrix, and obtain the sample scheduling operation from the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to it. Taking a game scene as an example, please refer to fig. 5b, which shows a first global view angle area divided into a plurality of sub-regions; a sketch of such a sub-region scheduling head follows.
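
The sub-region scheduling head might be sketched as below; the grid size and the linear scoring layer are placeholders for the trained policy network:

```python
import numpy as np

def predict_target_subregion(feature_vector, global_matrix, grid=6, seed=0):
    """Score every cell of a grid x grid division of the first global view
    area and return the (row, col) of the predicted target sub-region."""
    rng = np.random.default_rng(seed)
    state = np.concatenate([feature_vector, global_matrix.ravel()])
    w = rng.standard_normal((grid * grid, state.size)) * 0.01  # stand-in weights
    cell = int((w @ state).argmax())
    return divmod(cell, grid)

# Usage: the scheduling action is then "move element X to sub-region (r, c)".
print(predict_target_subregion(np.zeros(3), np.zeros((16, 16))))
```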

After the sample scheduling operation is obtained, the sample virtual controlled elements are controlled to move to the corresponding target sub-regions based on it, and a second sample interactive scene image generated by the agent to be trained is obtained. The second sample interaction state feature corresponding to the second sample interactive scene image is extracted to obtain a second feature vector, a second global view angle feature matrix and a second local view angle feature matrix.

There are various ways to control the sample virtual controlled element to move to the corresponding target sub-region based on the sample scheduling operation and obtain the second sample interactive scene image. For example, the sample virtual controlled element is controlled to move from its current sub-region to the corresponding target sub-region, and timing starts. A reference duration for the sample virtual controlled element to move from the current sub-region to the corresponding target sub-region is determined based on its preset moving speed. When the timed duration reaches the reference duration, the second sample interactive scene image generated by the agent to be trained is obtained.

For another example, the sample virtual controlled element is controlled to move from the current sub-region to the corresponding target sub-region, the movement distance is recorded, and if the recorded movement distance reaches the reference distance between the current sub-region of the sample virtual controlled element and the target sub-region, a second sample interactive scene image generated by the intelligent agent to be trained is obtained.
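
For the duration-based termination just described, the reference duration follows directly from the preset moving speed; a minimal sketch:

```python
import math

def reference_duration(current_pos, target_pos, speed):
    """Time a sample virtual controlled element needs to move from its
    current sub-region to the target sub-region at a preset speed."""
    return math.dist(current_pos, target_pos) / speed

# Usage: a 5-unit move at 2.5 units/s takes 2 s; once the timer reaches this
# value, the second sample interactive scene image is captured.
print(reference_duration((0.0, 0.0), (3.0, 4.0), speed=2.5))  # -> 2.0
```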

Predicting the sample interaction operation:

Based on the sample scheduling operation, the first global view angle feature matrix and the first local view angle feature matrix, a predicted feature vector, a predicted global view angle feature matrix and a predicted local view angle feature matrix corresponding to the predicted interactive scene image generated by the agent to be trained after executing the sample scheduling operation are predicted. The predicted interactive scene image is the scene image expected when the sample virtual controlled element reaches the target sub-region after the sample scheduling operation has been executed.

After the predicted feature vector, the predicted global view angle feature matrix and the predicted local view angle feature matrix corresponding to the predicted interactive scene image are obtained, the sample interaction operation executed by the agent to be trained for the sample virtual controlled elements is predicted based on the predicted feature vector and the predicted local view angle feature matrix. The predicted feature vector can characterize the possible interaction situation after the sample scheduling operation is executed, and thus the gap between that situation and the expected sample interaction result. The predicted global view angle feature matrix can represent the position information of the sample virtual controlled element and of the reference virtual controlled element after the sample virtual controlled element reaches the target sub-region, and hence whether the relative position between them is favorable. The predicted local view angle feature matrix can represent the position information of the sample virtual controlled elements and of the reference virtual controlled elements within the predicted local view angle area of each sample virtual controlled element, again indicating whether their relative positions are favorable. The next sample interaction operation can therefore be predicted based on the predicted situation after the sample scheduling operation is executed.
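
Mirroring the scheduling head, the microscopic interaction head can be sketched as follows; the discrete action set and the linear scorer are assumptions for illustration:

```python
import numpy as np

ACTIONS = ["move", "normal_attack", "skill_1", "skill_2", "idle"]  # assumed set

def predict_interaction_action(pred_vector, pred_local_matrix, seed=1):
    """Score a small action set from the predicted post-scheduling state
    (predicted feature vector + predicted local view feature matrix)."""
    rng = np.random.default_rng(seed)
    state = np.concatenate([pred_vector, pred_local_matrix.ravel()])
    w = rng.standard_normal((len(ACTIONS), state.size)) * 0.01  # stand-in weights
    return ACTIONS[int((w @ state).argmax())]

print(predict_interaction_action(np.zeros(3), np.zeros((8, 8))))
```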

After the predicted sample scheduling operation and the sample interaction operation are obtained, model parameters of the intelligent body to be trained are adjusted based on a second sample interaction state feature corresponding to a second sample interaction scene image generated after the sample scheduling operation is executed and a third sample interaction state feature corresponding to a third sample interaction scene image generated after the sample interaction operation is executed. There are various methods for adjusting the model parameters of the agent to be trained based on the second sample interaction state feature and the third sample interaction state feature, for example, adjusting the model parameters of the agent to be trained based on an error value between the second sample interaction state feature and the third sample interaction state feature and a preset interaction state feature. For another example, the model parameters of the agent to be trained are adjusted based on the scheduling excitation data and the interaction excitation data corresponding to the second sample interaction state feature and the third sample interaction state feature respectively. Steps S302 to S304 are described by taking as an example a method of adjusting model parameters of an agent to be trained based on scheduling excitation data and interaction excitation data corresponding to the interaction state feature of the second sample and the interaction state feature of the third sample, respectively.

S302, based on the interaction state characteristics of a second sample corresponding to the interaction scene image of the second sample generated by the intelligent agent to be trained after the sample scheduling operation is executed, scheduling excitation data of the sample scheduling operation is determined according to a preset scheduling excitation strategy.

And determining a first scheduling sub-excitation of the sample scheduling operation based on whether the current position of the sample virtual controlled element is matched with the target position corresponding to the sample virtual controlled element indicated by the sample scheduling operation or not, or based on whether the sub-region where the sample virtual controlled element is currently located is matched with the target sub-region corresponding to the sample virtual controlled element indicated by the sample scheduling operation.

If they match, the first scheduling sub-excitation is a positive excitation; if not, it is a negative excitation. For example, based on the distance difference between the current position of the sample virtual controlled element and the target position indicated by the sample scheduling operation: if the distance difference is smaller than a first specified distance threshold, the current position is determined to match the target position, and a preset excitation value is given. If the distance difference is larger than the first specified distance threshold and smaller than a second specified distance threshold, the current position is determined to only partially match the target position, and a value corresponding to a specified percentage of the preset excitation value is given. If the distance difference is larger than a third specified distance threshold, the current position is determined not to match the target position, and a preset negative excitation value is given.

A second scheduling sub-excitation of the sample scheduling operation is determined based on the variation value between the first feature vector and the second feature vector corresponding to the sample virtual controlled element. Taking a game scene as an example, the second scheduling sub-excitation may be determined from the change in hero experience value, gold coins, blood volume, number of kills, number of times killed, and the blood volume of the main building.

After obtaining the first scheduling sub-excitation and the second scheduling sub-excitation, scheduling excitation data of the sample scheduling operation may be determined based on a weighted sum of the first scheduling sub-excitation and the second scheduling sub-excitation, please refer to equation (1). The weight may be a preset value, or a value learned in the process of training the agent to be trained, and is not limited specifically.

R_t = w_d * R_d + w_e * R_e    (1)

where R_t is the weighted sum of the first scheduling sub-excitation R_d and the second scheduling sub-excitation R_e, w_d is the weight of the first scheduling sub-excitation R_d, and w_e is the weight of the second scheduling sub-excitation R_e.

After the weighted sum of the first scheduling sub-excitation and the second scheduling sub-excitation is obtained, the cumulative sum of all such weighted sums over the process from the current moment to the end of the interaction between the agent to be trained and the reference agent can be determined based on the Bellman equation; please refer to equations (2) and (3).

G_t = Σ_{k=0}^{∞} λ^k · R_{t+k+1}    (2)

V(S_t) = E[ R_{t+1} + λ · V(S_{t+1}) | S_t = s ]    (3)

where the scheduling excitation data V(S_t) is the expected value of the cumulative sum in equation (2), i.e. of all the weighted sums of the first and second scheduling sub-excitations obtained over the whole process from the current moment to the end of the interaction between the agent to be trained and the reference agent; λ (appearing as λ^k in equation (2)) is the attenuation coefficient, and s represents the state of the virtual scene at the moment corresponding to the first sample interactive scene image.
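
Putting equations (1) and (2) together, the scheduling excitation might be computed as in the sketch below; the thresholds, weights and feature handling are illustrative values, not the disclosed ones:

```python
import numpy as np

def first_sub_excitation(dist, t1=1.0, t2=3.0, full=1.0, partial=0.5, neg=-1.0):
    """Completion-based sub-excitation R_d from the distance to the target."""
    if dist < t1:
        return full            # matched the target position
    if dist < t2:
        return full * partial  # partial match: a percentage of the preset value
    return neg                 # mismatch: negative excitation

def second_sub_excitation(prev_vec, curr_vec):
    """Change-based sub-excitation R_e from the feature-vector variation
    (experience, gold, blood volume, kills, ...)."""
    return float(np.sum(np.asarray(curr_vec) - np.asarray(prev_vec)))

def scheduling_excitation(dist, prev_vec, curr_vec, w_d=0.5, w_e=0.5):
    """Equation (1): R_t = w_d * R_d + w_e * R_e."""
    return w_d * first_sub_excitation(dist) + w_e * second_sub_excitation(prev_vec, curr_vec)

def discounted_return(rewards, lam=0.95):
    """Equation (2): G_t = sum_k lam^k * R_{t+k+1}."""
    return sum(lam ** k * r for k, r in enumerate(rewards))

print(scheduling_excitation(0.5, [0, 0], [1, 2]))    # 0.5*1.0 + 0.5*3.0 = 2.0
print(round(discounted_return([1.0, 1.0, 1.0]), 4))  # 2.8525
```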

The scheduling excitation data thus includes both data representing the degree to which the sample scheduling operation influences the sample interaction result and data representing the degree of completion of the sample scheduling operation; that is, it combines dense excitation data with sparse excitation data. This prevents the agent from settling on a locally optimal solution when predicting virtual interaction operations, which would reduce interaction accuracy, and improves the decision-making capability of the trained target agent.

And S303, determining interactive excitation data of the sample interactive operation according to a preset interactive excitation strategy based on the interactive state characteristics of a third sample corresponding to the interactive scene image of the third sample generated by the intelligent body to be trained after the sample interactive operation is executed.

After the sample interaction operation is executed, a third sample interaction scene image can be obtained, and a third sample interaction state feature corresponding to the third sample interaction scene image can be extracted by adopting a quantitative information extraction module to obtain a third feature vector, a third global view angle feature matrix and a third local view angle feature matrix.

The interactive excitation of the sample interactive operation is determined based on the variation value between the second feature vector and the third feature vector corresponding to the sample virtual controlled element, and the interactive excitation data of the sample interactive operation is determined based on this interactive excitation. Taking a game scene as an example, the interactive excitation may be determined from the change in hero experience value, gold coins, blood volume, number of kills, number of times killed, and the blood volume of the main building. Likewise, the expected value of the cumulative sum of the interactive excitations obtainable over the process from the current moment to the end of the interaction between the agent to be trained and the reference agent can be determined based on the Bellman equation.

As an embodiment, the agent to be trained may further include a training module, where the training module is configured to obtain each piece of scheduling incentive data and each piece of interaction incentive data based on each sample interaction state feature, and adjust a model parameter of the agent to be trained based on each piece of obtained scheduling incentive data and each piece of interaction incentive data.

S304, determining error values between the scheduling excitation data and the interactive excitation data and the preset target excitation data respectively, and adjusting model parameters of the agent to be trained based on the obtained error values.

If the training module comprises a scheduling model and an interactive model, model parameters of the scheduling model and model parameters of the interactive model can be respectively adjusted based on the scheduling incentive data and the interactive incentive data, and the scheduling model and the interactive model are respectively subjected to reinforcement learning. For example, error values between the scheduled excitation data and the interactive excitation data, and the preset target excitation data are respectively determined. After obtaining the error values, the model parameters of the scheduling model and the model parameters of the interactive model may be adjusted according to the obtained error values.

For another example, if the target excitation data includes scheduling target excitation data and interactive target excitation data, a scheduling error value between the scheduling excitation data and the scheduling target excitation data may be determined, and model parameters of the scheduling model may be adjusted based on the obtained scheduling error value. Meanwhile, an interaction error value between the interaction excitation data and the interaction target excitation data can be determined, and model parameters of the interaction model are adjusted based on the obtained interaction error value.
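
A compact sketch of this hierarchical adjustment is shown below: each model takes one gradient step that shrinks the error between its excitation data and its target excitation data. The linear value models are stand-ins for the scheduling and interaction networks:

```python
import numpy as np

class LinearValueModel:
    """Tiny stand-in for the scheduling / interaction model."""
    def __init__(self, dim, lr=0.01, seed=0):
        self.w = np.random.default_rng(seed).standard_normal(dim) * 0.01
        self.lr = lr

    def update(self, state, target):
        """One step minimizing (V(s) - target)^2; returns the error value."""
        state = np.asarray(state, dtype=float)
        error = float(self.w @ state) - target
        self.w -= self.lr * error * state
        return error

sched_model = LinearValueModel(dim=4)          # macroscopic head
inter_model = LinearValueModel(dim=4, seed=1)  # microscopic head
state = np.ones(4)
print(sched_model.update(state, target=2.0),   # scheduling error value
      inter_model.update(state, target=1.0))   # interaction error value
```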

Through this hierarchical reinforcement learning from the macroscopic and microscopic perspectives, a complex prediction problem is decomposed into two simpler prediction problems. Training of the agent's macroscopic decision-making capability is added without lowering the training standard for its interaction capability, so that the trained target agent can accurately reproduce the effect of real interaction when performing virtual interaction with a virtual account, improving the interaction accuracy of the target agent.

As an embodiment, in the process of training the agent to be trained, the trained agent may be evaluated after each round of iterative training: an evaluation value of the agent to be trained is determined based on the scheduling excitation data and interactive excitation data obtained in previous rounds together with those obtained in the current round, the evaluation value representing the degree of training of the agent. For example, the training level of the agent to be trained may be evaluated through an ELO evaluation mechanism, and so on.

In the process of training the agent to be trained, the trained agent may be evaluated after each round of iterative training; the specific evaluation timing is not limited. When the evaluation values obtained over multiple evaluations tend to converge, the agent to be trained is output as the target agent.

In the process of training the agent to be trained, the iteration times can be counted, and if the iteration times reach the preset maximum times, the agent to be trained is output as the target agent.
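
The ELO-based evaluation and the two stopping conditions (convergence of the evaluation value, or an iteration cap) could be sketched as follows; the K-factor, window and tolerance are illustrative:

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Standard ELO update; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    return rating_a + k * (score_a - expected_a)

def should_stop(ratings, iteration, window=10, tol=5.0, max_iters=100_000):
    """Stop when the last `window` evaluation values stay within `tol` ELO
    points (convergence), or the preset maximum number of iterations is reached."""
    if iteration >= max_iters:
        return True
    recent = ratings[-window:]
    return len(recent) == window and max(recent) - min(recent) < tol

print(elo_update(1500, 1500, 1.0))  # evenly matched winner gains 16 points
```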

As an embodiment, in the training process, the agents can be conveniently and quickly scaled out to multiple machines in parallel through multi-container Docker images according to the available machine capacity. Thus, referring to fig. 6, the agent to be trained can be trained against different reference agents on multiple machines, and multiple agents to be trained can be trained on multiple machines, which greatly improves the generation efficiency of AI battle data.

After obtaining the target agent, the agent training terminal 103 may send the target agent to the agent interaction terminal 102, so that the agent interaction terminal 102 uses the target agent after receiving the target agent sent by the agent training terminal 103.

The process of interacting with a target agent is described below.

Please refer to fig. 7, which is a schematic flow chart illustrating the interaction process using the target agent.

And S701, responding to an interaction request instruction triggered by the virtual account, and loading the target agent.

The virtual account number can trigger an interaction request instruction through the client, and a target agent is loaded in response to the interaction request instruction triggered by the virtual account number. Therefore, the virtual account can perform virtual interaction with the target agent in the target virtual interaction scene.

S702, responding to a control operation triggered by a first target virtual controlled element associated with a virtual account in a target virtual interaction scene by the virtual account, and acquiring a target interaction scene image corresponding to the control operation.

The virtual account can trigger a control operation for a first target virtual controlled element associated with the virtual account in a target virtual interaction scene through the client, and a target interaction scene image corresponding to the control operation is acquired in response to the control operation triggered by the virtual account for the first target virtual controlled element associated with the virtual account in the target virtual interaction scene.

And S703, extracting target interaction state features from the target interaction scene image.

After the target interaction scene image corresponding to the control operation is obtained, the target interaction state feature of the target interaction scene image may be extracted, and the process of extracting the target interaction state feature may refer to the process of extracting the sample interaction state feature of the sample interaction scene image described above, which is not described herein again.

S704, determining target scheduling operation and target interaction operation corresponding to the target agent based on the target interaction state characteristics.

After the target interaction state features are obtained, the target scheduling operation and the target interaction operation corresponding to the target agent can be determined based on the target interaction state features, and the process of determining the target scheduling operation and the target interaction operation can refer to the process of determining the sample scheduling operation and the sample interaction operation described above, and is not described herein again.

S705, responding to the target scheduling operation and the target interaction operation, and controlling a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

And after the target scheduling operation and the target interaction operation are determined, the target agent executes the target scheduling operation and the target interaction operation, responds to the target scheduling operation and the target interaction operation, and controls a second target virtual controlled element associated with the target agent in a target virtual interaction scene to realize the interaction between the target agent and the virtual account.

The target agent and the virtual account can interact multiple times until an interaction end instruction is received, upon which the interaction between them ends; the individual interactions are not described repeatedly in this application. When the interaction end instruction is received, a target interaction result of the virtual account and the target agent in the target virtual interaction scene is generated based on that instruction. The interaction end instruction may be triggered by the virtual account through the client, may be triggered by the target agent, or may be generated automatically when the virtual interaction process ends, which is not specifically limited. A sketch of the full loop follows.
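
An end-to-end sketch of the deployed loop (steps S702-S705) with stubbed client and agent objects; every class and method name here is an assumption about the surrounding system:

```python
import threading

class DummyScene:
    def __init__(self): self.frames = 0
    def capture_image(self): self.frames += 1; return [0.0] * 4
    def apply(self, elements, sched_op, inter_op): pass
    def final_result(self): return f"ended after {self.frames} frames"

class DummyAgent:
    controlled_elements = ["hero_1", "hero_2"]
    def extract_features(self, image): return image                # S703
    def decide(self, features): return "move_region_3", "skill_1"  # S704

def interact_with_account(agent, scene, end_signal, max_steps=100):
    for _ in range(max_steps):
        if end_signal.is_set():                      # interaction end instruction
            break
        image = scene.capture_image()                # S702
        features = agent.extract_features(image)     # S703
        sched_op, inter_op = agent.decide(features)  # S704
        scene.apply(agent.controlled_elements, sched_op, inter_op)  # S705
    return scene.final_result()                      # target interaction result

end = threading.Event()
end.set()  # pretend the end instruction already arrived
print(interact_with_account(DummyAgent(), DummyScene(), end))
```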

The target agent is obtained by training based on the sample virtual interaction data, and the training process may specifically refer to the above description of the process of training the agent to be trained.

In the following, a game scene is taken as an example to describe the intelligent agent interaction method provided by the embodiment of the present application.

Reference agents are extracted in turn from the reference agent set to perform reinforcement learning training on the agent to be trained. During the first round of training, the agent to be trained may battle against itself; as the reference agent set expands, other reference agents can be extracted from it for subsequent training. The agent to be trained is output as the target agent once its evaluation value tends to converge or the number of training rounds reaches the upper limit.

In a single training session, the agent to be trained may control a plurality of sample virtual controlled elements, namely sample heroes, and the reference agent may control a plurality of reference virtual controlled elements, namely reference heroes. The virtual interactive scene also contains a plurality of environment elements, namely the shelters and NPCs in the battle game. After the agent to be trained and the reference agent enter the battle game, a corresponding first sample interactive scene image can be obtained under the view angle of each sample hero.

Based on the first sample interaction state characteristics corresponding to the first sample interaction scene images, sample scheduling operations executed by the to-be-trained intelligent agent for each sample hero in the battle game can be predicted, and the sample scheduling operations can include scheduling actions corresponding to each sample hero. And the sample interaction operation executed by the intelligent body to be trained aiming at the sample hero after executing the sample scheduling operation can be predicted, wherein the sample interaction operation can comprise a skill action corresponding to each sample hero.

After the sample scheduling operation is performed for each sample hero, a second sample interactive scene image is obtained. Based on the second sample interaction state characteristics corresponding to the second sample interaction scene images, scheduling excitation data of the sample scheduling operation can be determined. After the sample interaction operation is performed for each sample hero, a third sample interaction scene image is obtained. Based on the third sample interaction state features corresponding to the third sample interaction scene images, interaction excitation data of sample interaction operation can be determined.

The agent to be trained can be subjected to reinforcement learning training based on the obtained scheduling incentive data and interactive incentive data, for example, hierarchical reinforcement learning training is performed based on the scheduling incentive data and the interactive incentive data, respectively.

After the target agent is obtained through multiple rounds of training, the virtual account can click an intelligent-battle button in the battle game to load the target agent. The virtual account can then control a first target virtual controlled element, namely a first target hero, to move toward the position of the target agent, launch skill attacks against a second target virtual controlled element controllable by the target agent, namely a second target hero or NPC, and dodge the skill attacks launched by the second target hero or NPC against the first target hero.

If the virtual account kills all the second target heroes or NPCs, or the battle game time runs out, an interaction end instruction may be generated. After the interaction end instruction is obtained, a target interaction result may be generated; it may indicate whether the virtual account or the target agent won, how many second target heroes and NPCs each first target hero of the virtual account killed, how many first target heroes and NPCs each second target hero of the target agent killed, and the like, which is not specifically limited.

Based on the same inventive concept, the embodiment of the present application provides an agent interaction apparatus, which is equivalent to the target agent discussed above and can implement the corresponding function of the agent interaction method. Referring to fig. 8, the apparatus includes a loading module 801 and a processing module 802, wherein:

the loading module 801: configured to load the target agent in response to an interaction request instruction triggered by the virtual account;

the processing module 802: configured to acquire, in response to a control operation triggered by the virtual account for a first target virtual controlled element associated with the virtual account in a target virtual interaction scene, a target interaction scene image corresponding to the control operation;

the processing module 802 is further configured to: extracting target interaction state features from the target interaction scene image;

the processing module 802 is further configured to: determining target scheduling operation and target interaction operation corresponding to the target agent based on the target interaction state characteristics;

the processing module 802 is further configured to: and responding to the target scheduling operation and the target interaction operation, and controlling a second target virtual controlled element associated with the target agent in the target virtual interaction scene.

In one possible embodiment, the target agent is trained in the following manner:

the processing module 802 is further configured to: perform multiple rounds of iterative training on the agent to be trained based on the interaction process of the agent to be trained and a preset reference agent in the sample virtual interaction scene, until a preset training target is met, and output the agent to be trained as the target agent, wherein in one round of iterative training, the processing module 802 is specifically configured to:

predicting a sample scheduling operation executed by the intelligent body to be trained aiming at a sample virtual controlled element associated with the intelligent body to be trained in the sample virtual interactive scene based on a first sample interactive state characteristic corresponding to a first sample interactive scene image in the sample virtual interactive scene, and predicting a sample interactive operation executed by the intelligent body to be trained aiming at the sample virtual controlled element after the intelligent body to be trained executes the sample scheduling operation;

and adjusting the model parameters of the intelligent agent to be trained based on the second sample interaction state characteristics corresponding to the second sample interaction scene images generated after the sample scheduling operation is executed and the third sample interaction state characteristics corresponding to the third sample interaction scene images generated after the sample interaction operation is executed.

In a possible embodiment, the processing module 802 is specifically configured to:

determining scheduling excitation data of the sample scheduling operation according to a preset scheduling excitation strategy based on the second sample interaction state characteristic, wherein the scheduling excitation data is used for representing the completion degree of the sample scheduling operation and the influence degree of the sample scheduling operation on the sample interaction result;

determining interactive excitation data of sample interactive operation according to a preset interactive excitation strategy based on the interactive state characteristics of the third sample, wherein the interactive excitation data is used for representing the influence degree of the sample interactive operation on the sample interactive result;

and respectively determining error values between the scheduling excitation data and the interactive excitation data and the preset target excitation data, and adjusting the model parameters of the intelligent agent to be trained based on the obtained error values.

In one possible embodiment, the processing module 802 is further configured to:

after adjusting the model parameters of the intelligent agent to be trained based on the obtained error values, determining an evaluation value of the intelligent agent to be trained based on each scheduling excitation data and each interactive excitation data obtained by multi-round iterative training according to a preset scoring strategy, wherein the evaluation value is used for representing the training degree of the intelligent agent to be trained;

and if the evaluation value is converged, outputting the agent to be trained as a target agent.

In a possible embodiment, the processing module 802 is specifically configured to:

randomly extracting reference agents from all reference agents based on the selection probability corresponding to each reference agent in a preset reference agent set;

performing multiple rounds of iterative training on the intelligent agent to be trained based on the interactive process of the intelligent agent to be trained and the extracted reference intelligent agent in the sample virtual interactive scene;

if the to-be-trained intelligent agent does not meet the training target when the sample interaction result of the to-be-trained intelligent agent and the extracted reference intelligent agent is obtained, re-extracting the reference intelligent agent from each reference intelligent agent, and continuously performing multi-round iterative training on the to-be-trained intelligent agent;

and if the intelligent agent to be trained meets the training target, outputting the intelligent agent to be trained as the target intelligent agent.

In one possible embodiment, the processing module 802 is further configured to:

counting the training times of iterative training of the agent to be trained before outputting the agent to be trained as a target agent;

if the counted training times reach the preset specified times, outputting the intelligent agent to be trained as a reference intelligent agent, and adding the reference intelligent agent to the reference intelligent agent set;

and resetting the training times, continuing to carry out iterative training on the agents to be trained, and updating the reference agent set based on the newly counted training times.

In one possible embodiment, the processing module 802 is further configured to:

before predicting, based on the first sample interaction state feature corresponding to the first sample interactive scene image in the sample virtual interactive scene, the sample scheduling operation executed by the agent to be trained for the sample virtual controlled element associated with it, and before predicting the sample interaction operation executed for the sample virtual controlled element after the sample scheduling operation, performing region identification processing on the first sample interactive scene image to obtain a first interaction result region, a first global view angle region and a first local view angle region;

respectively performing image feature extraction processing on the first interaction result region, the first global view region and the first local view region to obtain a corresponding first feature vector, first global view feature matrix and first local view feature matrix, wherein the first feature vector represents interaction information related to the sample interaction result; the first global view feature matrix represents the position information of the sample virtual controlled element, the position information of the reference virtual controlled element associated with the reference agent, and the position information of the scene elements contained in the sample virtual interactive scene; and the first local view feature matrix represents the position information of the sample virtual controlled element, of the reference virtual controlled element and of the scene elements contained in the first local view region;

and taking the first feature vector, the first global view feature matrix and the first local view feature matrix as the first sample interaction state features corresponding to the first sample interactive scene image.
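As a rough illustration of the region identification and feature extraction, the sketch below uses fixed crop boundaries and mean pooling as stand-ins for the learned recognition and extraction networks; all boundaries and shapes are assumptions:

```python
import numpy as np

def extract_first_sample_features(scene_image):
    # scene_image: H x W x 3 array; the crop boundaries below are arbitrary.
    result_region = scene_image[:40, :, :]        # result-related information
    global_region = scene_image[40:200, :160, :]  # whole-scene (mini-map) view
    local_region = scene_image[200:, :, :]        # area around the element

    first_feature_vector = result_region.mean(axis=(0, 1))  # shape (3,)
    first_global_matrix = global_region.mean(axis=2)        # positions, full map
    first_local_matrix = local_region.mean(axis=2)          # positions, nearby
    return first_feature_vector, first_global_matrix, first_local_matrix
```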

In a possible embodiment, the processing module 802 is specifically configured to:

predicting the sample scheduling operation executed by the agent to be trained for the sample virtual controlled element based on the first feature vector and the first global view feature matrix;

predicting, based on the sample scheduling operation, the first global view feature matrix and the first local view feature matrix, a predicted feature vector, a predicted global view feature matrix and a predicted local view feature matrix corresponding to the predicted interactive scene image generated after the sample scheduling operation is executed;

and predicting the sample interactive operation performed by the agent to be trained for the sample virtual controlled element based on the predicted feature vector and the predicted local view feature matrix.
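The two-stage prediction of this embodiment can be outlined as follows; `schedule_head`, `dynamics` and `interact_head` are hypothetical components of the agent, not names used by the application:

```python
def predict_operations(agent, first_vec, first_global, first_local):
    # Stage 1: the scheduling operation is predicted from the result-related
    # feature vector and the global view feature matrix.
    scheduling_op = agent.schedule_head(first_vec, first_global)

    # Roll the features forward to the predicted interactive scene image,
    # i.e. the state expected after the scheduling operation is executed.
    pred_vec, pred_global, pred_local = agent.dynamics(
        scheduling_op, first_global, first_local)

    # Stage 2: the interactive operation is predicted from the predicted
    # feature vector and the predicted local view feature matrix.
    interactive_op = agent.interact_head(pred_vec, pred_local)
    return scheduling_op, interactive_op
```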

In a possible embodiment, the processing module 802 is specifically configured to:

dividing the first global view region into a plurality of sub-regions;

predicting a target sub-region corresponding to the sample virtual controlled element based on the first feature vector and the first global view feature matrix;

and obtaining the sample scheduling operation based on the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to the sample virtual controlled element.
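A sketch of the sub-region bookkeeping, assuming an 8 x 8 grid over the first global view region (the grid size and the dictionary form of the operation are illustrative):

```python
def to_sub_region(position, map_size, grid=(8, 8)):
    # Index of the sub-region of the global view region containing `position`.
    row = min(int(position[0] / map_size[0] * grid[0]), grid[0] - 1)
    col = min(int(position[1] / map_size[1] * grid[1]), grid[1] - 1)
    return row, col

def make_scheduling_operation(current_position, target_sub_region, map_size):
    # The scheduling operation moves the element from the sub-region where
    # it is currently located to the predicted target sub-region.
    current_sub_region = to_sub_region(current_position, map_size)
    return {"from": current_sub_region, "to": target_sub_region}
```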

In one possible embodiment, the processing module 802 is further configured to:

after the sample scheduling operation is obtained based on the sub-region where the sample virtual controlled element is currently located and the target sub-region corresponding to the sample virtual controlled element, controlling the sample virtual controlled element to move to the corresponding target sub-region based on the sample scheduling operation, and obtaining the second sample interactive scene image generated after the movement;

and extracting the second sample interaction state features corresponding to the second sample interactive scene image to obtain a second feature vector, a second global view feature matrix and a second local view feature matrix.

In one possible embodiment, the processing module 802 is further configured to:

after extracting the second sample interaction state features corresponding to the second sample interactive scene image, determining a first scheduling sub-excitation of the sample scheduling operation based on whether the sub-region where the sample virtual controlled element is currently located matches the target sub-region indicated by the sample scheduling operation for the sample virtual controlled element;

determining a second scheduling sub-excitation of the sample scheduling operation based on the variation value between the first feature vector and the second feature vector corresponding to the sample virtual controlled element;

and determining the scheduling excitation data of the sample scheduling operation based on a weighted sum of the first scheduling sub-excitation and the second scheduling sub-excitation.
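The weighted combination of the two sub-excitations may look as follows; the weights and the norm-based change measure are assumptions:

```python
import numpy as np

def scheduling_excitation_data(current_sub_region, target_sub_region,
                               first_vec, second_vec,
                               w_match=0.7, w_change=0.3):
    # First scheduling sub-excitation: 1 when the element reached the
    # target sub-region indicated by the scheduling operation, else 0.
    first_sub = 1.0 if current_sub_region == target_sub_region else 0.0

    # Second scheduling sub-excitation: magnitude of the variation between
    # the first and second feature vectors of the controlled element.
    second_sub = float(np.linalg.norm(np.asarray(second_vec)
                                      - np.asarray(first_vec)))

    # Weighted sum of the two sub-excitations.
    return w_match * first_sub + w_change * second_sub
```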

In one possible embodiment, the agent to be trained comprises a quantitative information extraction module, wherein the quantitative information extraction module is configured to extract the sample interaction state features corresponding to each sample interactive scene image;

and the agent to be trained further comprises a training module, wherein the training module is configured to obtain the scheduling excitation data and the interactive excitation data based on the sample interaction state features, and to adjust the model parameters of the agent to be trained based on the obtained scheduling excitation data and interactive excitation data.
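The two-module structure can be outlined by the following skeleton; the constructor arguments are hypothetical callables, not the claimed network architectures:

```python
class AgentUnderTraining:
    def __init__(self, quantitative_extractor, training_module):
        self.extract = quantitative_extractor  # image -> state features
        self.trainer = training_module         # features -> excitations, updates

    def training_step(self, scene_image):
        # One round: extract features, score the operations, adjust weights.
        features = self.extract(scene_image)
        sched_exc, inter_exc = self.trainer.excitations(features)
        self.trainer.adjust_parameters(sched_exc, inter_exc)
```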

In a possible embodiment, the processing module 802 is specifically configured to:

if the training module comprises a scheduling model and an interactive model, and the target excitation data comprises scheduling target excitation data and interactive target excitation data, determining a scheduling error value between the scheduling excitation data and the scheduling target excitation data, and adjusting model parameters of the scheduling model based on the obtained scheduling error value;

and determining an interaction error value between the interaction excitation data and the interaction target excitation data, and adjusting model parameters of the interaction model based on the obtained interaction error value.
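When the training module is split in this way, each sub-model is scored against its own target excitation data and adjusted independently, roughly as sketched below (`update` is a hypothetical hook for the parameter adjustment):

```python
def adjust_split_training_module(scheduling_model, interactive_model,
                                 sched_excitation, sched_target,
                                 inter_excitation, inter_target):
    # Separate error value and parameter update for each sub-model;
    # squared error is an assumed choice of error measure.
    scheduling_error = (sched_excitation - sched_target) ** 2
    interactive_error = (inter_excitation - inter_target) ** 2
    scheduling_model.update(scheduling_error)
    interactive_model.update(interactive_error)
    return scheduling_error, interactive_error
```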

Based on the same inventive concept, an embodiment of the present application provides a computer device; the computer device 900 is described below.

Referring to fig. 9, the above-mentioned agent interaction apparatus may run on a computer device 900, on which a current version and historical versions of a data storage program, as well as application software corresponding to the data storage program, may be installed. The computer device 900 includes a display unit 940, a processor 980 and a memory 920, where the display unit 940 includes a display panel 941 for displaying an interface for user interaction and the like.

In one possible embodiment, the display panel 941 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.

The processor 980 is configured to read a computer program and then execute the method defined by the computer program; for example, the processor 980 reads a data storage program or a file so as to run the data storage program on the computer device 900 and display a corresponding interface on the display unit 940. The processor 980 may include one or more general-purpose processors and may further include one or more DSPs (Digital Signal Processors) for performing relevant operations to implement the technical solutions provided by the embodiments of the present application.

The memory 920 typically includes internal memory and external memory; the internal memory may be a Random Access Memory (RAM), a Read-Only Memory (ROM) or a cache (CACHE), and the external memory may be a hard disk, an optical disc, a USB flash drive, a floppy disk or a tape drive. The memory 920 is used for storing a computer program, including the application program corresponding to each client, and other data, which may include data generated after the operating system or the application programs are run, including system data (for example, configuration parameters of the operating system) and user data. In the embodiment of the present application, program instructions are stored in the memory 920, and the processor 980 executes the program instructions stored in the memory 920 to implement any of the agent interaction methods discussed in the preceding figures.

The display unit 940 is used to receive input digital information, character information or touch/non-touch gesture operations, and to generate signal inputs related to user settings and function control of the computer device 900. Specifically, in the embodiment of the present application, the display unit 940 may include a display panel 941. The display panel 941, for example a touch screen, can collect touch operations by a user (for example, operations performed by the user on or near the display panel 941 with a finger, a stylus or any other suitable object or attachment) and drive a corresponding connection device according to a preset program.

In one possible embodiment, the display panel 941 may include two parts: a touch detection device and a touch controller. The touch detection device detects the direction of the player's touch, detects the signal produced by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 980, and can receive and execute commands sent by the processor 980.

The display panel 941 may be implemented in various types, such as resistive, capacitive, infrared and surface acoustic wave types. In addition to the display unit 940, the computer device 900 may also include an input unit 930, which may include a graphical input device 931 and other input devices 932, where the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.

In addition to the above, the computer device 900 may also include a power supply 990 for powering the other modules, an audio circuit 960, a near field communication module 970 and an RF circuit 910. The computer device 900 may also include one or more sensors 950, such as acceleration sensors, light sensors and pressure sensors. The audio circuit 960 includes a speaker 961 and a microphone 962; the computer device 900 may collect the user's voice through the microphone 962 and perform corresponding operations.

In one embodiment, the number of processors 980 may be one or more, and the processor 980 and the memory 920 may be coupled or relatively independent.

As an example, the processor 980 in fig. 9 may be used to implement the functionality of the loading module 801 and the processing module 802 in fig. 8.

As an example, the processor 980 in fig. 9 may be configured to implement the corresponding functions of the server 102 discussed above.

Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disc.

Alternatively, the integrated unit of the present application may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as an independent product. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied, in essence or in the part contributing to the prior art, in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media capable of storing program code.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
