Game play decision-making method and device, electronic device, and storage medium

Document No.: 427725  Publication date: 2021-12-24

Reading note: the technology 博弈对局决策方法和装置、电子设备及存储介质 (Game play decision-making method and device, electronic device, and storage medium) was created by 洪伟峻, 林俊洪, 曾广俊 and 林悦 on 2021-09-10. Abstract: The invention discloses a game play decision-making method and device, an electronic device, and a storage medium. The method comprises: acquiring first state information of a game match at a first moment, wherein the first state information comprises a plurality of kinds of game information of the game match at the first moment and the first decision stage in which the game match is currently located; processing the first state information with the current decision model to obtain a first decision result, wherein the decision result comprises an action to be executed, or a second decision stage to be entered; and executing the first decision result. The invention solves the technical problem that making game play decisions through a large number of finely hand-written rules requires considerable manpower and makes the decision method complex and costly.

1. A game play decision-making method, comprising the following steps:

acquiring first state information of a game match at a first moment, wherein the first state information comprises: a plurality of kinds of game information of the game match at the first moment, and a first decision stage in which the game match is currently located;

processing the first state information by using a current decision model to obtain a first decision result, wherein the first decision result comprises: an action to be executed, or a second decision stage to be entered;

and executing the first decision result.

2. The method of claim 1, wherein, in a case that the first decision result comprises the second decision stage to be entered, executing the first decision result comprises:

acquiring second state information of the game match at a second moment, wherein the second state information comprises: a plurality of kinds of game information of the game match at the second moment, and the second decision stage;

processing the second state information by using the current decision model to obtain a second decision result;

and executing the second decision result.

3. The method of claim 1, wherein, in a case that the first decision result comprises the action to be executed, executing the first decision result comprises:

storing the action to be executed;

determining whether the first decision stage is executed completely based on the stored number of the actions to be executed;

and executing the stored action to be executed under the condition that the first decision phase is determined to be executed completely.

4. The method of claim 3, wherein, in a case that it is determined that the first decision stage is not executed completely, the method further comprises:

acquiring third state information of the game at a third moment, wherein the third state information comprises: a plurality of game information of the game in the third moment and the first decision stage;

processing the third state information by using the current decision model to obtain a third decision result;

and executing the third decision result.

5. The method of claim 3, wherein determining whether the first decision stage is executed completely based on the stored number of the actions to be executed comprises:

determining a preset number corresponding to the first decision stage, wherein the preset number is used for representing the number of actions to be executed allowed to be executed in the first decision stage;

determining whether the stored number of the actions to be executed is the same as the preset number;

if the stored number of the actions to be executed is the same as the preset number, determining that the execution of the first decision stage is finished;

and if the stored number of the actions to be executed is different from the preset number, determining that the first decision stage is not executed completely.

6. The method of claim 3, wherein after performing the stored action to be performed, the method further comprises:

determining a third decision stage following the first decision stage;

acquiring fourth state information of the game at a fourth moment, wherein the fourth state information comprises: a plurality of game information of the game in the fourth moment, and a third decision stage;

processing the fourth state information by using the current decision model to obtain a fourth decision result;

executing the fourth decision result.

7. The method of claim 6, wherein prior to determining a third decision stage subsequent to the first decision stage, the method further comprises:

determining whether the first decision stage is a preset decision stage;

determining the third decision stage after the first decision stage if the first decision stage is the preset decision stage.

8. The method according to any one of claims 1 to 7, further comprising:

acquiring historical information of a plurality of game matches, wherein the historical information comprises: historical state information and historical decision results;

constructing training data based on the historical information of the plurality of game matches;

and training the current decision model by using the training data based on a proximal policy optimization algorithm.

9. The method of claim 8, wherein the historical state information comprises: initial action information, executed action information, and score information, and wherein constructing the training data based on the historical information of the plurality of game matches comprises:

obtaining a difference value between the score information of a first game match and the score information of a second game match to obtain historical reward information of the first game match, wherein the first game match is adjacent to the second game match, and the first game match is earlier than the second game match;

and obtaining the training data based on the historical state information, the historical decision results and the historical reward information of the plurality of game matches.

10. The method of claim 9, wherein obtaining the training data based on the historical state information, the historical decision results, and the historical reward information of the plurality of game matches comprises:

obtaining the product of the historical reward information of the plurality of game matches and a preset value to obtain target reward information of the plurality of game matches;

and obtaining the training data based on the historical state information, the historical decision result and the target reward information of the game pairs.

11. The method of claim 8, wherein obtaining historical information for the plurality of game plays comprises one of:

obtaining historical game information in an interactive environment to obtain historical information of the game matches, wherein the interactive environment is used for interacting with a target object;

and acquiring the historical information of the plurality of game matches obtained by processing with a historical decision model.

12. The method of claim 8, further comprising:

receiving historical information of the plurality of game matches sent by a plurality of processing cores;

dividing the historical information of the game matches into a plurality of historical information sets, wherein the historical information sets correspond to a plurality of training processes one by one;

and training the current decision model by using corresponding training data based on a proximal policy optimization algorithm through the plurality of training processes.

13. The method of claim 12, wherein training, by the plurality of training processes based on a proximal policy optimization algorithm, the current decision model with corresponding training data comprises:

determining gradients corresponding to the plurality of training processes;

synchronizing gradients corresponding to the plurality of training processes;

and updating the parameters of the current decision model through the training processes according to the average value of the gradients corresponding to the training processes.

14. The method according to any one of claims 1 to 7, further comprising:

outputting a plurality of decision models, wherein training time of different decision models is different;

and responding to a selection instruction for selecting the plurality of decision models, and determining the current decision model corresponding to the selection instruction.

15. A game play decision device, comprising:

the obtaining module is configured to obtain first state information of a game match at a first moment, where the first state information includes: a plurality of kinds of game information of the game match at the first moment, and a first decision stage in which the game match is currently located;

a processing module, configured to process the first state information by using a current decision model to obtain a first decision result, where the first decision result includes: an action to be executed, or a second decision stage to be entered;

and the execution module is used for executing the first decision result.

16. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls a device on which the computer-readable storage medium resides to perform the game play decision method of any one of claims 1 to 14.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a program executable by the at least one processor, the program when executed by the at least one processor performing the game play decision method of any one of claims 1 to 14.

Technical Field

The invention relates to the field of computers, in particular to a game-play decision-making method and device, electronic equipment and a storage medium.

Background

In card games, AI (Artificial Intelligence) often needs to be designed to enhance the playability of the game itself and the player's game experience. In general, the more interesting card game playing methods involve many kinds of cards, different cards may form various combinations, and the game itself is relatively complicated, which brings challenges to the AI design of card games. Traditional rule-based AI and other robot rules that need to be implemented also become more complex as the game becomes more complex, requiring a significant amount of manpower to develop detailed rules in order to reach the level of high-end players.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a game play decision-making method and device, an electronic device and a storage medium, which at least solve the technical problem of the high complexity and cost of decision methods that rely on a large amount of manpower to compile fine-grained rules for game play decisions.

According to an aspect of an embodiment of the present invention, a game play decision method is provided, including: acquiring first state information of a game match at a first moment, wherein the first state information comprises: a plurality of kinds of game information of the game match at the first moment, and a first decision stage in which the game match is currently located; processing the first state information by using the current decision model to obtain a first decision result, wherein the decision result comprises: an action to be executed, or a second decision stage to be entered; and executing the first decision result.

Optionally, the first decision result includes: in the case of a second decision phase to be entered, executing the first decision result comprises: acquiring second state information of the game at a second moment, wherein the second state information comprises: various game information of the game in the second moment and a second decision stage; processing the second state information by using the current decision model to obtain a second decision result; and executing the second decision result.

Optionally, the first decision result includes: in the case of an action to be performed, executing the first decision result includes: storing the action to be executed; determining whether the first decision stage is executed completely or not based on the stored number of the actions to be executed; and executing the stored action to be executed under the condition that the execution of the first decision phase is determined to be finished.

Optionally, in a case that it is determined that the first decision stage is not executed completely, the method further includes: acquiring third state information of the game at a third moment, wherein the third state information comprises: various game information of the game in the third moment and a first decision stage; processing the third state information by using the current decision model to obtain a third decision result; and executing the third decision result.

Optionally, determining whether the first decision stage is performed completely based on the stored number of actions to be performed comprises: determining a preset number corresponding to the first decision stage, wherein the preset number is used for representing the number of actions to be executed allowed to be executed in the first decision stage; determining whether the stored number of the actions to be executed is the same as a preset number; if the stored number of the actions to be executed is the same as the preset number, the first decision stage is determined to be executed completely; and if the stored number of the actions to be executed is different from the preset number, determining that the first decision stage is not executed completely.

Optionally, after executing the stored action to be executed, the method further comprises: determining a third decision stage following the first decision stage; acquiring fourth state information of the game at a fourth moment, wherein the fourth state information comprises: various game information of the game in the fourth moment and a third decision stage; processing the fourth state information by using the current decision model to obtain a fourth decision result; and executing the fourth decision result.

Optionally, before determining a third decision stage after the first decision stage, the method further comprises: determining whether the first decision stage is a preset decision stage; if the first decision stage is a preset decision stage, a third decision stage following the first decision stage is determined.

Optionally, the method further comprises: acquiring historical information of a plurality of game matches, wherein the historical information comprises: historical state information and historical decision results; constructing training data based on the historical information of the plurality of game matches; and training the current decision model by using the training data based on a proximal policy optimization algorithm.

Optionally, the historical state information includes: initial action information, executed action information and score information, and constructing the training data based on the historical information of the plurality of game matches includes: obtaining a difference value between the score information of a first game match and the score information of a second game match to obtain historical reward information of the first game match, wherein the first game match is adjacent to the second game match, and the first game match is earlier than the second game match; and obtaining the training data based on the historical state information, the historical decision results and the historical reward information of the plurality of game matches.

Optionally, obtaining training data based on historical state information, historical decision results, and historical award information of the plurality of game pairs includes: obtaining the product of historical reward information of a plurality of game matches and a preset value to obtain target reward information of the plurality of game matches; and obtaining training data based on the historical state information, the historical decision result and the target reward information of the game pairs.

Optionally, the obtaining of the history information of the plurality of game pairs includes one of: obtaining historical game information in an interactive environment to obtain historical information of a plurality of game games, wherein the interactive environment is used for interacting with a target object; and acquiring historical information of the plurality of game matches obtained by processing the historical decision model.

Optionally, the method further comprises: receiving historical information of a plurality of game matches sent by a plurality of processing cores; dividing the historical information of the plurality of game matches into a plurality of historical information sets, wherein the plurality of historical information sets correspond one-to-one to a plurality of training processes; and training the current decision model by using corresponding training data through the plurality of training processes based on a proximal policy optimization algorithm.

Optionally, training, by the plurality of training processes based on a proximal policy optimization algorithm, the current decision model using corresponding training data includes: determining gradients corresponding to the plurality of training processes; synchronizing the gradients corresponding to the plurality of training processes; and updating the parameters of the current decision model through the plurality of training processes according to the average value of the gradients corresponding to the plurality of training processes.

Optionally, the trained decision model is sent to a plurality of processing cores through a model pool process.

Optionally, the method further comprises: outputting a plurality of decision models, wherein training time of different decision models is different; and responding to a selection instruction for selecting the plurality of decision models, and determining a current decision model corresponding to the selection instruction.

According to another aspect of the embodiments of the present invention, there is also provided a game play decision device, including: an obtaining module, used for obtaining first state information of a game match at a first moment, wherein the first state information comprises: a plurality of kinds of game information of the game match at the first moment, and a first decision stage in which the game match is currently located; a processing module, configured to process the first state information by using a current decision model to obtain a first decision result, where the first decision result includes: an action to be executed, or a second decision stage to be entered; and an execution module, used for executing the first decision result.

Optionally, the execution module includes: the obtaining unit is used for obtaining second state information of the game in a second moment, wherein the second state information comprises: various game information of the game in the second moment and a second decision stage; the processing unit is used for processing the second state information by using the current decision model to obtain a second decision result; and the execution unit is used for executing the second decision result.

Optionally, the execution module includes: the storage unit is used for storing the action to be executed; the first determination unit is used for determining whether the first decision stage is executed completely or not based on the stored number of the actions to be executed; and the execution unit is used for executing the stored action to be executed under the condition that the execution of the first decision stage is determined to be finished.

Optionally, the obtaining module is further configured to obtain third state information of the game match at a third time when it is determined that the first decision stage is not executed completely, where the third state information includes: various game information of the game in the third moment and a first decision stage; the processing unit is further used for processing the third state information by using the current decision model to obtain a third decision result; the execution unit is further configured to execute the third decision result.

Optionally, the first determining unit is further configured to: determining a preset number corresponding to the first decision stage, wherein the preset number is used for representing the number of actions to be executed allowed to be executed in the first decision stage; determining whether the stored number of the actions to be executed is the same as a preset number; if the stored number of the actions to be executed is the same as the preset number, the first decision stage is determined to be executed completely; and if the stored number of the actions to be executed is different from the preset number, determining that the first decision stage is not executed completely.

Optionally, the apparatus further comprises: the first determination module is used for determining a third decision stage after the first decision stage after executing the stored action to be executed; the obtaining module is further configured to obtain fourth state information of the game match at a fourth time, where the fourth state information includes: various game information of the game in the fourth moment and a third decision stage; the processing module is further used for processing the fourth state information by using the current decision model to obtain a fourth decision result; the execution module is further configured to execute the fourth decision result.

Optionally, the apparatus further comprises: the second determining module is used for determining whether the first decision stage is a preset decision stage or not before determining a third decision stage after the first decision stage; the first determining module is further configured to determine a third decision stage subsequent to the first decision stage if the first decision stage is a preset decision stage.

Optionally, the apparatus further comprises: the obtaining module is further used for obtaining historical information of a plurality of game matches, wherein the historical information comprises: historical state information and historical decision results; a building module, used for constructing training data based on the historical information of the plurality of game matches; and a training module, used for training the current decision model by using the training data based on a proximal policy optimization algorithm.

Optionally, the historical state information includes: initial action information, executed action information and score information, wherein the construction module comprises: the second determining unit is used for obtaining the difference value of the score information of the first game pair and the score information of the second game pair to obtain the historical reward information of the first game pair, wherein the first game pair is adjacent to the second game pair, and the first game pair is earlier than the second game pair; and the construction unit is used for obtaining training data based on the historical state information, the historical decision result and the historical reward information of the game pairs.

Optionally, the construction unit is further configured to: obtaining the product of historical reward information of a plurality of game matches and a preset value to obtain target reward information of the plurality of game matches; and obtaining training data based on the historical state information, the historical decision result and the target reward information of the game pairs.

Optionally, the obtaining module includes one of: the game system comprises a first acquisition unit, a second acquisition unit and a game execution unit, wherein the first acquisition unit is used for acquiring historical game information in an interactive environment and acquiring historical information of a plurality of game matches, and the interactive environment is used for interacting with a target object; and the second acquisition unit is used for acquiring the historical information of the plurality of game matches, which is obtained by utilizing the historical decision model for processing.

Optionally, the apparatus further comprises: a receiving module, used for receiving historical information of a plurality of game matches sent by a plurality of processing cores; a scheduling module, used for evenly dividing the historical information of the plurality of game matches into a plurality of historical information sets, wherein the historical information sets correspond one-to-one to a plurality of training processes; and a training module, used for training the current decision model by using corresponding training data through the plurality of training processes based on a proximal policy optimization algorithm.

Optionally, the training module comprises: the third determining unit is used for determining gradients corresponding to a plurality of training processes; the synchronization unit is used for synchronizing the gradients corresponding to the training processes; and the updating unit is used for updating the parameters of the current decision model through the training processes according to the average values of the gradients corresponding to the training processes.

Optionally, the apparatus comprises: and the sending module is used for sending the trained decision model to the plurality of processing cores through the model pool process.

Optionally, the apparatus further comprises: the output module is used for outputting a plurality of decision models, wherein the training time of different decision models is different; and the response module is used for responding to a selection instruction for selecting the plurality of decision models and determining the current decision model corresponding to the selection instruction.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the game-play decision method in any one of the above embodiments.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores a program executable by the at least one processor, and the program, when executed by the at least one processor, performs the game play decision method in any of the above embodiments.

In the embodiment of the invention, the first state information of the game match is acquired at the first moment, the first state information is processed with the current decision model to obtain the first decision result, and the decision of the first decision stage is completed by executing the first decision result. Compared with the related art, the decision process of the game match is divided into a plurality of decision stages and the decisions are completed with the current decision model, so complicated rules do not need to be written for the playing methods of different games. This reduces the complexity and cost of the decision method and improves the flexibility of the decision space, thereby solving the technical problem of the high complexity and cost of decision methods that rely on a large amount of manpower to compile fine-grained rules for game play decisions.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a game play decision method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an alternative deep reinforcement learning training framework according to an embodiment of the invention;

fig. 3 is a schematic diagram of a game match decision device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, technical terms and technical terms in the embodiments of the present invention are explained as follows:

Self-play: a training method in which the opponent that the model faces in its interaction with the environment is a historical version of the model itself, i.e., a model obtained from earlier training.

PPO: Proximal Policy Optimization, a widely used deep reinforcement learning algorithm.

Finite state machine: a mathematical model representing a finite number of states, together with the transitions and actions among those states.

Traditional AI is mainly rule-based AI, for example behavior-tree AI, where each node in the tree represents a subset of game states. The basic principle is as follows: when it is necessary to decide what action the AI should currently take, the tree is searched from top to bottom according to certain conditions, finally determining the action to be taken (a leaf node). As another example, finite-state-machine AI can switch between different AI behaviors according to given game state transitions, with the decision on the current behavior depending on the past state.

However, such AI requires rules to be set manually according to the states and conditions of the game. If a stronger, more player-like AI needs to be built, or the game itself becomes more complex, more manpower is required to enumerate the state conditions that may arise and to prescribe corresponding decisions. Faced with a truly complex and changeable environment, such AI generally struggles to cover all situations; in particular, it does not behave like a player, and its weaknesses are easily discovered and exploited by players.

According to an embodiment of the present invention, a game play decision method is provided, it should be noted that the steps shown in the flow chart of the figure may be executed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flow chart, in some cases the steps shown or described may be executed in an order different from that shown.

In the embodiment of the invention, the game match may be a turn-based card game. In particular, unlike traditional card game playing methods such as Landlord (Dou Dizhu), the number of times cards are played in each round is no longer fixed, and the number of cards played each time is no longer fixed either.

For example, in the card game there may be two kinds of cards: public cards and function cards. A public card is a card used by both parties in a game which either party may obtain. A game uses 52 public cards, divided into 4 suits corresponding to the colors green, red, yellow-white and blue-purple. Function cards are cards collected by players; a function card triggers a special effect when it is activated in a game, and some active function cards can increase the number of cards played in a round.

After a game starts, both parties draw 10 cards from the deck as their hands, and there is a public card selection area that displays 10 cards dealt from the deck. After each round starts, the normal pairing stage is entered: the party currently acting needs to select a hand card and pair it with a card of the same suit in the public card selection area to play; the two cards are then put into the card library to calculate scores. When both parties have exhausted their hands, the winner is finally determined according to the scores.

If the hand of the currently acting party cannot be paired with any card in the public card selection area, a card-discarding stage may be entered before the normal pairing stage: one hand card is discarded, and two cards are then drawn from the deck, one serving as that party's own hand card and the other being placed in the public card selection area. The card-discarding stage is executed at most twice per round; if, after it has been executed twice, the hand still cannot be paired with any card in the public card selection area, the round ends after one hand card is discarded. In addition, if a player holds active function cards with functions such as forced pairing or hand replacement, then after each round starts, the stage of starting an active function card is entered first; the player may choose to play an active function card, and after the function of the active function card has finished executing, the normal pairing stage can be entered.

As can be seen from the above description of the playing method, in this card game, when a player holds active function cards or enters the card-discarding stage, cards are played more than once per round, and the number of cards played differs between stages; for example, one card is played in the card-discarding stage, while more than one card may be played in the stage of starting an active function card and in the normal pairing stage.

Therefore, in the above application scenario, the AI needs continuous decision-making capability and a highly flexible decision space. In the embodiment of the invention, in order to make the AI decisions more generalizable, the AI decision is combined with a simple finite state machine: the decision process of each round is divided into a plurality of decision stages, each decision stage only selects a fixed number of hand cards or enters a certain decision stage, and the decision stage the AI is currently in is recorded by the finite state machine, so that the size of the decision space is on the order of the number of cards in the card library. Taking the above application scenario as an example, the decision process can be divided into: an initialization stage, a stage of starting an active function card, a first card-discarding stage, a second card-discarding stage and a normal pairing stage.
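As a concrete illustration, the decision stages and the finite state machine that records the current stage could be sketched as follows in Python; the class and member names are illustrative and not part of the original disclosure.

```python
from enum import IntEnum

class DecisionStage(IntEnum):
    """Decision stages of one round in the example card game (names are illustrative)."""
    INIT = 0          # initialization stage: choose what to do next
    ACTIVE_CARD = 1   # stage of starting an active function card
    DISCARD_1 = 2     # first card-discarding stage
    DISCARD_2 = 3     # second card-discarding stage
    PAIRING = 4       # normal pairing stage

class StageStateMachine:
    """Minimal finite state machine: records the decision stage the AI is currently in."""
    def __init__(self):
        self.stage = DecisionStage.INIT

    def enter(self, stage: DecisionStage) -> None:
        self.stage = stage
```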

Fig. 1 is a flowchart of a game-play decision method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:

step S102, first state information of the game at a first moment is obtained, wherein the first state information comprises: the game comprises a plurality of game information of the game pair at a first moment and a first decision stage of the game pair at present.

The first time in the above steps may be a time at which an AI decision is required in the game match, the game match in the embodiment of the present invention may be divided into a plurality of decision stages, and the first decision stage may be any one of the decision stages currently entered by the game match.

The game information in the above steps may be various kinds of game information in the game match, and may include, for example: initial action information, executed action information and score information. Taking the above application scenario as an example, the game information may include the AI's own hand information, the card information of the public card selection area, and the current score information of both parties, among others.

In an alternative embodiment, a plurality of kinds of game information in the game match can be acquired at the first moment, each kind of game information is mapped into a multidimensional vector, and the multidimensional vectors are then concatenated with the first decision stage recorded in the finite state machine to obtain the first state information. Taking the above application scenario as an example, each card group may be mapped onto a 52-dimensional card-group vector, in which the dimension corresponding to a card present in the group is set to 1 and all other dimensions are set to 0; for example, at the start of a game, the card-group vector representing the AI's own hand information has 10 dimensions equal to 1 and the remaining 42 dimensions equal to 0. The finally obtained first state information may be a 384-dimensional vector.
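A minimal sketch of this state encoding follows. Which card groups are included, the extra scalar features, and the one-hot encoding of the decision stage are assumptions made here for illustration; the original only fixes the 52-dimensional card-group vectors and the 384-dimensional overall state.

```python
import numpy as np

NUM_CARDS = 52
NUM_STAGES = 5  # number of decision stages recorded by the finite state machine (assumed)

def encode_card_group(card_ids):
    """Map a card group to a 52-dimensional 0/1 vector (1 = card present in the group)."""
    vec = np.zeros(NUM_CARDS, dtype=np.float32)
    vec[list(card_ids)] = 1.0
    return vec

def encode_state(card_groups, stage_id, extra_features=()):
    """Concatenate the card-group vectors, any extra scalar features (e.g. scores),
    and a one-hot encoding of the current decision stage into one flat state vector."""
    stage_vec = np.zeros(NUM_STAGES, dtype=np.float32)
    stage_vec[stage_id] = 1.0
    parts = [encode_card_group(g) for g in card_groups]
    parts.append(np.asarray(extra_features, dtype=np.float32))
    parts.append(stage_vec)
    return np.concatenate(parts)
```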

Step S104, processing the first state information by using the current decision model to obtain a first decision result, wherein the decision result comprises: an action to be performed, or a second decision phase to be entered.

The current decision model in the above steps may be a model obtained by training with a deep reinforcement learning algorithm. Deep reinforcement learning is an algorithm that learns through interaction; briefly, the model is optimized in the direction of increasing reward according to the interaction results. In the embodiment of the present invention, the model structure of the current decision model may adopt a fully connected neural network, and the current decision model may be trained with PPO. Taking the above application scenario as an example, the current decision model may have a 384-dimensional input, two hidden layers with 256 nodes in each layer, and a 58-dimensional output corresponding to the 52 cards and the decisions to enter other decision stages.
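A sketch of such a network in PyTorch follows. The 384-dimensional input, the two 256-node hidden layers and the 58-dimensional output are taken from the scenario above; the added value head is an assumption, included because PPO training also requires a state-value estimate.

```python
import torch
import torch.nn as nn

class DecisionModel(nn.Module):
    """Fully connected decision model: 384-d state in, 58-d decision logits out."""
    def __init__(self, state_dim=384, hidden_dim=256, action_dim=58):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden_dim, action_dim)  # 52 cards + stage decisions
        self.value_head = nn.Linear(hidden_dim, 1)            # state value, used by PPO

    def forward(self, state):
        h = self.backbone(state)
        return self.policy_head(h), self.value_head(h)
```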

Taking the above application scenario as an example: when the first decision stage is the initialization stage, the second decision stage to be entered next needs to be determined through an AI decision, where the second decision stage may be the stage of starting an active function card, a card-discarding stage or the normal pairing stage; in the case that the first decision stage is the stage of starting an active function card or a card-discarding stage, the card to be played is determined through an AI decision; in the case that the first decision stage is the normal pairing stage, the cards to be paired need to be determined through an AI decision. Therefore, in the embodiment of the present invention, the result of an AI decision may be a card to be played, or a decision stage other than the initialization stage.

The action to be executed in the above steps may refer to a card to be played in the first decision stage, and the second decision stage may be a decision stage entered after the first decision stage determined by the finite state machine.

In an optional embodiment, the obtained first state information may be input to the current decision model, and an output of the current decision model is obtained, so as to obtain a first decision result.

Step S106, executing a first decision result.

In an optional embodiment, after the first decision result is obtained by the current decision model: if the first decision result is an action to be executed, the action to be executed may be executed, that is, the decision result is restored by a decoding function to a specific card-playing action, so as to play the card selected by the AI decision; if the first decision result is to enter a second decision stage, the AI decision process of the second decision stage is started, and the decision process of the second decision stage is similar to that of the first decision stage.
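The following sketch shows one inference step and the decoding of its result. The layout of the 58 outputs (indices 0-51 for cards, the remaining indices for "enter a decision stage" choices) and the masking of currently illegal actions are assumptions made for illustration.

```python
import torch

def decide(model, state_vec, legal_action_mask):
    """One inference step; legal_action_mask is a length-58 boolean array of allowed actions."""
    with torch.no_grad():
        logits, _ = model(torch.as_tensor(state_vec).unsqueeze(0))
        mask = torch.as_tensor(legal_action_mask).unsqueeze(0)
        logits = logits.masked_fill(~mask, float("-inf"))      # forbid illegal actions
        action = torch.distributions.Categorical(logits=logits).sample().item()
    return action

def decode(action):
    """Restore the model output to a concrete decision (the decoding function in the text)."""
    if action < 52:
        return ("play_card", action)        # a specific card to play
    return ("enter_stage", action - 52)     # a decision stage to enter
```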

Through the above steps, the first state information of the game match is acquired at the first moment, the first state information is processed with the current decision model to obtain the first decision result, and the decision of the first decision stage is completed by executing the first decision result. Compared with the related art, the decision process of the game match is divided into a plurality of decision stages and the decisions are completed with the current decision model, so complicated rules do not need to be written for the playing methods of different games. This reduces the complexity and cost of the decision method and improves the flexibility of the decision space, thereby solving the technical problem of the high complexity and cost of decision methods that rely on a large amount of manpower to compile fine-grained rules for game play decisions.

In the above embodiment of the present invention, the first decision result includes: in the case of a second decision phase to be entered, executing the first decision result comprises: acquiring second state information of the game at a second moment, wherein the second state information comprises: various game information of the game in the second moment and a second decision stage; processing the second state information by using the current decision model to obtain a second decision result; and executing the second decision result.

The second time in the above steps may be a time of entering the second decision stage, and the second decision result, similar to the first decision result, may also include: to perform an action or enter another decision phase.

It should be noted that the decision process in the second decision stage is similar to the decision process in the first decision stage (i.e., the process from step S102 to step S106), and is not repeated herein.

In the above embodiment of the present invention, the first decision result includes: in the case of an action to be performed, executing the first decision result includes: storing the action to be executed; determining whether the first decision stage is executed completely or not based on the stored number of the actions to be executed; and executing the stored action to be executed under the condition that the execution of the first decision phase is determined to be finished.

It should be noted that, because each action to be executed output by the current decision model is the playing of one hand card, while the number of hand cards that can be played differs between decision stages, the action to be executed can be stored each time the current decision model decides on one, and the number of stored actions to be executed is then checked. If the number of stored actions to be executed has reached the maximum number for the first decision stage, it is determined that the first decision stage has been completed, and all stored actions to be executed are then executed; if the number of stored actions to be executed has not reached the maximum number for the first decision stage, it is determined that the first decision stage has not been completed, and the current decision model continues to be used to make decisions.
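A minimal sketch of this buffering logic follows; the stage names and preset numbers are illustrative (taken from the example scenario described later), not part of the claims.

```python
class ActionBuffer:
    """Caches the actions decided in the current decision stage and reports when the
    stage's preset number of actions has been reached."""
    PRESET_NUMBER = {"active_card_A": 1, "active_card_B": 2, "pairing": 2}  # illustrative

    def __init__(self, stage_name):
        self.stage_name = stage_name
        self.pending = []

    def add(self, action):
        self.pending.append(action)

    def stage_finished(self) -> bool:
        # The stage is completed once the number of stored actions equals the preset number.
        return len(self.pending) == self.PRESET_NUMBER[self.stage_name]
```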

In the above embodiment of the present invention, when it is determined that the first decision stage is not executed completely, the method further includes: acquiring third state information of the game at a third moment, wherein the third state information comprises: various game information of the game in the third moment and a first decision stage; processing the third state information by using the current decision model to obtain a third decision result; and executing the third decision result.

The third time in the above steps may be a time when the first decision stage is determined not to be completed.

In an optional embodiment, after it is determined that the first decision stage has not been completed, the current decision model needs to continue making decisions: the various kinds of game information in the game match are acquired again, the third state information is generated in combination with the first decision stage, and the third state information is then processed with the current decision model. That is, within the first decision stage, the decision flow using the current decision model is the same each time, and is not described again here.

In the above embodiment of the present invention, determining whether the first decision stage is executed completely based on the stored number of the actions to be executed includes: determining a preset number corresponding to the first decision stage, wherein the preset number is used for representing the number of actions to be executed allowed to be executed in the first decision stage; determining whether the stored number of the actions to be executed is the same as a preset number; if the stored number of the actions to be executed is the same as the preset number, the first decision stage is determined to be executed completely; and if the stored number of the actions to be executed is different from the preset number, determining that the first decision stage is not executed completely.

In an optional embodiment, the number of cards allowed to be played in different decision stages is different, so that different preset numbers can be set for different decision stages in advance, actions to be executed can be stored after decision is made by using the current decision model each time, and whether execution of the first decision stage is finished is determined by judging whether the number of all stored actions to be executed reaches the preset number.

In the above embodiment of the present invention, after executing the stored action to be executed, the method further includes: determining a third decision stage following the first decision stage; acquiring fourth state information of the game at a fourth moment, wherein the fourth state information comprises: various game information of the game in the fourth moment and a third decision stage; processing the fourth state information by using the current decision model to obtain a fourth decision result; and executing the fourth decision result.

Taking the above application scenario as an example for explanation, for the phase of starting the active function card and the card discarding phase, after the decision phase is completed, the normal pairing phase may be continuously executed. In the embodiment of the present invention, after the first decision stage is executed, that is, after the stored action to be executed is executed, a third decision stage executed after the first decision stage needs to be determined according to the state of the finite state machine, and a decision flow of the third decision stage is similar to a decision flow of the first decision stage (i.e., the flows shown in step S102 to step S106), which is not described herein again.

In the above embodiment of the present invention, before determining the third decision stage after the first decision stage, the method further includes: determining whether the first decision stage is a preset decision stage; if the first decision stage is a preset decision stage, a third decision stage following the first decision stage is determined.

Taking the above application scenario as an example: for the normal pairing stage, after the normal pairing stage is completed, the round is ended, and the next round may be started. Therefore, the preset decision stage in the above steps may be any stage other than the normal pairing stage, that is, a stage after whose completion the round does not end and the AI decision needs to be continued.

In an optional embodiment, after the first decision stage has been executed, it may be determined whether the first decision stage is a preset decision stage. If the first decision stage is the preset decision stage, it indicates that the round has not ended and the AI decision needs to be continued, so a third decision stage performed after the first decision stage may be determined based on the state of the finite state machine; if the first decision stage is not the preset decision stage, the round is ended, and the decision stages of the next round are entered.

The above embodiments of the present invention will be described in detail below with reference to the above application scenarios. Assuming that the maximum number of cards allowed to be played per decision phase is 2, and the player is equipped with two active function cards, one hand may be played using active function card A and two hands may be played using active function card B.

On the basis, the game can be divided into the following four decision stages:

an initialization stage, in which whether to start an active function card is selected;

a stage of starting active function card A, whose corresponding preset number is 1;

a stage of starting active function card B, whose corresponding preset number is 2;

and a normal pairing stage, whose corresponding preset number is 2; any number of active-function-card stages may precede this stage, and the round ends after this stage.

The whole round processing flow is as follows:

step 1: setting the current decision stage as the initialization stage.

Step 2: performing inference once with the current decision model according to the current card game state, and outputting a decision result, wherein the decision result may be a choice to enter a certain decision stage other than the initialization stage, or a card to be played.

Step 3: judging the current decision stage. If it is the initialization stage, go to step 4; if it is the stage of starting active function card A, go to step 5; if it is the stage of starting active function card B, go to step 6; if it is the normal pairing stage, go to step 7.

Step 4: the current decision stage is the initialization stage, and the decision result output by the current decision model is a choice to enter a certain decision stage other than the initialization stage; the current decision stage is set according to the decision result, and step 3 is then executed again.

Step 5: the current decision stage is the stage of starting active function card A, and the decision result output by the current decision model is the selection of a certain card; active function card A is started, the selected card is played in the game, and step 1 is then executed again.

Step 6: the current decision stage is the stage of starting active function card B, and the decision result output by the current decision model is the selection of a certain card. If the first card to be played has not been selected yet, the currently selected card is recorded as the first of the two cards to be played, and step 2 is then executed; if the first card to be played has already been selected, the currently selected card is taken as the second card, the two cards are played in the game using active function card B, and step 1 is then executed again.

Step 7: the current decision stage is the normal pairing stage, and the decision result output by the current decision model is the selection of a certain card. If the current hand contains no card that can be paired with the public card selection area, the selected card is played as a discard in the game, and step 2 is then executed again. If a pairable card exists in the current hand: if the first card to be played has not been selected yet, the currently selected card is recorded as the first of the two cards to be played, and step 2 is then executed; if the first card to be played has already been selected, the currently selected card is taken as the second card and played together with the first card in the game, and the round then ends.

Through the above steps, combined with a simple finite state machine, the complexity of the decision space is greatly reduced and the efficiency of training the current decision model is greatly improved, at the cost of introducing only a small number of rules. A sketch of this round flow is given below.
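The following Python sketch loosely follows steps 1 to 7 above, reusing the stage enum, state encoding and inference helpers sketched earlier. The `game` object and all of its methods (round_over, card_groups, legal_action_mask, stage_for_decision, discard, preset_number, play, round_score) are hypothetical stand-ins for the real game environment, not part of the original disclosure.

```python
def play_round(model, game, fsm):
    """Decision flow of one round, in sketch form (simplified relative to steps 1-7)."""
    fsm.enter(DecisionStage.INIT)                                      # step 1
    pending = []                                # cards decided but not yet played in this stage
    while not game.round_over():
        state = encode_state(game.card_groups(), fsm.stage)            # step 2: build state, infer
        kind, value = decode(decide(model, state, game.legal_action_mask()))
        # kind is "enter_stage" in the initialization stage and "play_card" otherwise
        if fsm.stage == DecisionStage.INIT:                            # steps 3-4: pick a stage
            fsm.enter(game.stage_for_decision(value))
        elif fsm.stage in (DecisionStage.DISCARD_1, DecisionStage.DISCARD_2):
            game.discard(value)                                        # one card per discard stage
            fsm.enter(DecisionStage.INIT)
        else:                                                          # steps 5-7: active card / pairing
            pending.append(value)
            if len(pending) == game.preset_number(fsm.stage):          # stage's preset number reached
                game.play(pending, stage=fsm.stage)
                pending.clear()
                fsm.enter(DecisionStage.INIT)
    return game.round_score()
```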

In the above embodiment of the present invention, the method further includes: acquiring historical information of a plurality of game matches, wherein the historical information includes: historical state information and historical decision results; constructing training data based on the historical information of the plurality of game matches; and training the current decision model by using the training data based on a proximal policy optimization algorithm.

The plurality of game matches in the above steps may be a plurality of rounds in one game in which AI decisions are made, or a plurality of rounds across several games in which AI decisions are made, and the historical information may be the original game information generated by interaction between the AI and the game environment or between the AI and the training environment. Since the AI training environment is different from the game environment that directly faces the player, the acquired original game information needs to be converted into a form easily understood by the AI, and the training data for training the current decision model needs to be generated.

It should be noted that, after each model training is completed, the decision result obtained by using the newly trained model to perform AI decision may be used as the training data for the next model training.

Through the above steps, the current decision model is trained with a proximal policy optimization algorithm, so that the AI can explore and learn through a large number of matches and optimize its decision results, achieving a high-strength AI.
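For illustration, a minimal clipped-surrogate PPO update is sketched below in PyTorch; the loss coefficients and the clipping range are assumptions, and the advantage and return values are assumed to have been computed from the reward information described later.

```python
import torch

def ppo_loss(model, states, actions, old_log_probs, advantages, returns, clip_eps=0.2):
    """Clipped PPO surrogate loss for the decision model sketched above."""
    logits, values = model(states)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)            # new policy / old policy
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()            # clipped surrogate objective
    value_loss = (values.squeeze(-1) - returns).pow(2).mean()
    entropy = dist.entropy().mean()
    return policy_loss + 0.5 * value_loss - 0.01 * entropy   # coefficients are assumptions
```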

In the above embodiment of the present invention, the historical state information includes: initial action information, executed action information and score information, and constructing the training data based on the historical information of the plurality of game matches includes: obtaining the difference between the score information of a first game match and the score information of a second game match to obtain the historical reward information of the first game match, wherein the first game match is adjacent to the second game match and the first game match is earlier than the second game match; and obtaining the training data based on the historical state information, the historical decision results and the historical reward information of the plurality of game matches.

The initial action information in the above steps may be the AI's card information at the beginning of a game match, the executed action information may be the actions the AI has executed by the end of the game match, and the score information may be the AI's accumulated score at the end of the game match. Still taking the above application scenario as an example, the initial action information may be the AI's hand information at the beginning of each round together with the card information of the common deck area, the executed action information may be the cards the AI has played by the end of each round, and the score information may be the AI's accumulated score at the end of each round.

Still taking the above application scenario as an example, in order to guide the AI to optimize towards a larger cumulative reward, in the embodiment of the present invention the reward information may be extracted from the original game information before and after a decision. In an alternative embodiment, the change in the AI's cumulative score advantage between two rounds may be used as the reward signal: for example, if the AI leads the opponent by 6 points at the end of the previous round and trails the opponent by 2 points at the end of the next round, the reward corresponding to the AI's decision in the previous round is minus 8. That is, the difference in score information between two adjacent rounds in which AI decisions are made may be used as the reward signal.
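
The following is a minimal sketch of this reward construction, assuming a hypothetical list of per-round score margins (AI score minus opponent score) and including the optional scaling constant described below; the function name and interface are illustrative assumptions.

def round_rewards(score_margins, scale=1.0):
    # The reward for the decision made in round t is the change in the AI's
    # score margin from round t to round t + 1, optionally scaled.
    return [(score_margins[t + 1] - score_margins[t]) * scale
            for t in range(len(score_margins) - 1)]

# Example from the text: leading by 6 in one round and trailing by 2 in the
# next gives a reward of (-2) - 6 = -8 for the earlier round's decision.
print(round_rewards([6, -2]))  # [-8.0]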

In the above embodiment of the present invention, obtaining the training data based on the historical state information, the historical decision results and the historical reward information of the plurality of game matches includes: multiplying the historical reward information of the plurality of game matches by a preset value to obtain target reward information of the plurality of game matches; and obtaining the training data based on the historical state information, the historical decision results and the target reward information of the plurality of game matches.

The preset value in the above step may be a preset scaling constant.

In an alternative embodiment, during the training of the current decision model, the historical reward information is multiplied by the preset value for scaling before being used for training.

In the above embodiment of the present invention, acquiring the historical information of the plurality of game matches includes one of the following: acquiring historical game information in an interactive environment to obtain the historical information of the plurality of game matches, wherein the interactive environment is used for interacting with a target object; and acquiring the historical information of the plurality of game matches obtained by processing with a historical decision model.

In an alternative embodiment, the actual game information may be obtained from the game environment in which the AI plays against players, yielding historical information of a plurality of rounds in which AI decisions have been made. In another alternative embodiment, the AI may generate the historical information of the plurality of game matches through self-play; in order to make the opponents in self-play stronger and stronger, a historical decision model may be periodically selected as the opponent of the current decision model.

In the above embodiment of the present invention, the method further includes: receiving historical information of a plurality of game matches sent by a plurality of processing cores; dividing the historical information of the plurality of game matches into a plurality of historical information sets, wherein the plurality of historical information sets correspond one to one to a plurality of training processes; and training the current decision model with the corresponding training data through the plurality of training processes based on the proximal policy optimization algorithm.

In an alternative embodiment, in order to satisfy the conditions required by large-scale sampling and training, the embodiment of the present invention may construct a deep reinforcement learning training framework as shown in fig. 2, where the framework is deployed on a cluster of 70 CPU (Central Processing Unit) machines and 1 GPU (Graphics Processing Unit) machine.

Each CPU machine comprises 45 cores, and each core corresponds to a working process (the agent in the figure) in which the AI plays against itself and interacts with the environment to generate historical information of a plurality of game matches for training.

The GPU machine comprises 8 NVIDIA 2080Ti graphics cards, each corresponding to one training process. Each training process receives historical information of a plurality of game matches from the CPU machines in a load-balanced manner, that is, the historical information of the game matches generated by all the CPU machines is divided into 8 historical information sets. Optionally, each training process may receive the historical information sent by 70 × 45 / 8 = 393.75 working processes on average (some training processes receive data from 393 working processes and others from 394), and the received historical information is concatenated into a batch and then used for training with the PPO algorithm.
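
The following is a minimal sketch of one way such a load-balanced assignment of working processes to training processes could be computed; the round-robin scheme and function name are illustrative assumptions and are not taken from the original disclosure.

def assign_workers(num_workers: int, num_trainers: int) -> list[list[int]]:
    # Distribute worker indices across trainers as evenly as possible.
    assignment = [[] for _ in range(num_trainers)]
    for worker in range(num_workers):
        assignment[worker % num_trainers].append(worker)
    return assignment

groups = assign_workers(70 * 45, 8)   # 3150 working processes across 8 trainers
print([len(g) for g in groups])       # [394, 394, 394, 394, 394, 394, 393, 393]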

It should be noted that, in the embodiment of the present invention, each training process may obtain from the working processes the data trajectories generated by the interaction between each environment and the model, and the trajectory length may be 128, that is, 128 consecutive steps of interaction with the same environment.
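
The following is a minimal sketch of cutting the interaction stream with one environment into fixed-length trajectories of 128 steps; the env.step interface, the returned tuple and the model.decide call are illustrative assumptions and are not taken from the original disclosure.

TRAJECTORY_LEN = 128

def collect_trajectory(env, model):
    # Collect 128 consecutive interaction steps with the same environment.
    trajectory = []
    state = env.reset()
    for _ in range(TRAJECTORY_LEN):
        decision = model.decide(state)
        next_state, reward, done = env.step(decision)
        trajectory.append((state, decision, reward, done))
        state = env.reset() if done else next_state
    return trajectory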

In the above embodiment of the present invention, training the current decision model with the corresponding training data through the plurality of training processes based on the proximal policy optimization algorithm includes: determining the gradients corresponding to the plurality of training processes; synchronizing the gradients corresponding to the plurality of training processes; and updating the parameters of the current decision model through the plurality of training processes according to the average of the gradients corresponding to the plurality of training processes.

In an alternative embodiment, the different training processes keep their initial parameters consistent at initialization; the training processes are then wrapped with PyTorch's DistributedDataParallel, which is responsible for notifying the other training processes after each training process has computed its own gradient, after which the training processes update the model parameters according to the averaged gradient.
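
The following is a minimal sketch of such a setup using PyTorch's DistributedDataParallel, which broadcasts the parameters of rank 0 at construction and averages gradients across processes after every backward pass; the backend, optimizer and learning rate shown are illustrative assumptions.

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def make_trainer(model: torch.nn.Module, rank: int, world_size: int):
    # Wrap the decision model so that gradients are all-reduced (averaged)
    # across the training processes after every backward pass.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    ddp_model = DDP(model.to(rank), device_ids=[rank])  # broadcasts rank-0 parameters
    optimizer = torch.optim.Adam(ddp_model.parameters(), lr=3e-4)
    return ddp_model, optimizer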

Through the above steps, gradients are synchronized during model training, which ensures that the model parameters remain consistent across the training processes after each update.

In the above embodiment of the present invention, the trained decision model is sent to the plurality of processing cores through the model pool process.

In an alternative embodiment, as shown in FIG. 2, there is a model pool process in addition to the training processes in the GPU machine. The model pool process receives the updated decision model after each training pass of the training processes and sends the decision model to each working process.

In the above embodiment of the present invention, the method further includes: outputting a plurality of decision models, wherein training time of different decision models is different; and responding to a selection instruction for selecting the plurality of decision models, and determining a current decision model corresponding to the selection instruction.

In an alternative embodiment, as shown in FIG. 2, the model pool process may output a plurality of decision models to the working processes of the plurality of processing cores. The trained models can be graded into difficulty levels according to their training time: the longer the training time, the stronger the model and the higher its difficulty level, so that the models suit players of different levels. During actual play, a player can select a decision model of a given difficulty level as required; the decision model selected by the player becomes the current decision model, and during the game the AI uses that decision model to make its decisions.

Still taking the above application scenario as an example, the AI finally provided for the card game can offer 5 difficulty levels.

Through the above steps, the AI difficulty can be flexibly adjusted by outputting a plurality of decision models, which helps improve the game experience of players of different levels.

According to the embodiment of the present invention, a game-play decision-making device is further provided, which can execute the game-play decision-making method in the above embodiments; the specific implementation and application scenarios are the same as those in the above embodiments and are not described again here. Optionally, the device may be deployed in a computer terminal cluster, where the computer terminal cluster includes: at least one central processor terminal and at least one graphics processor terminal, wherein the central processor terminal is used for making decisions in the game match, and the graphics processor terminal is used for training the decision model.

In an embodiment of the present invention, the device may be the deep reinforcement learning training framework shown in fig. 2, and the computer terminal cluster may include 70 CPU machines and 1 GPU machine. Each CPU machine comprises 45 cores, and each core corresponds to a working process in which the AI plays against itself and interacts with the environment to generate historical information of a plurality of game matches for training. The GPU machine comprises 8 NVIDIA 2080Ti graphics cards, each corresponding to one training process; each training process receives historical information of a plurality of game matches from the CPU machines in a load-balanced manner, concatenates the received historical information into a batch, and then trains with the PPO algorithm.

Fig. 3 is a schematic diagram of a game-play decision device according to an embodiment of the present invention; as shown in fig. 3, the device includes:

the obtaining module 32 is configured to obtain first state information of the game match at a first time, where the first state information includes: the game comprises a plurality of game information of the game pair at a first moment and a first decision stage of the game pair at present.

A processing module 34, configured to process the first state information by using the current decision model to obtain a first decision result, where the first decision result includes: an action to be performed, or a second decision phase to be entered.

And an executing module 36, configured to execute the first decision result.

In the above embodiments of the present invention, the execution module includes: the obtaining unit is used for obtaining second state information of the game in a second moment, wherein the second state information comprises: various game information of the game in the second moment and a second decision stage; the processing unit is used for processing the second state information by using the current decision model to obtain a second decision result; and the execution unit is used for executing the second decision result.

In the above embodiments of the present invention, the execution module includes: the storage unit is used for storing the action to be executed; the first determination unit is used for determining whether the first decision stage is executed completely or not based on the stored number of the actions to be executed; and the execution unit is used for executing the stored action to be executed under the condition that the execution of the first decision stage is determined to be finished.

In the above embodiment of the present invention, the obtaining module is further configured to obtain third state information of the game match at a third time when it is determined that the first decision stage is not executed completely, where the third state information includes: various game information of the game in the third moment and a first decision stage; the processing unit is further used for processing the third state information by using the current decision model to obtain a third decision result; the execution unit is further configured to execute the third decision result.

In the above embodiment of the present invention, the first determining unit is further configured to: determining a preset number corresponding to the first decision stage, wherein the preset number is used for representing the number of actions to be executed allowed to be executed in the first decision stage; determining whether the stored number of the actions to be executed is the same as a preset number; if the stored number of the actions to be executed is the same as the preset number, the first decision stage is determined to be executed completely; and if the stored number of the actions to be executed is different from the preset number, determining that the first decision stage is not executed completely.
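
The following is a minimal sketch of the completion check described above, assuming a hypothetical mapping from decision stages to the number of actions each stage allows; the stage names and counts are illustrative and are not taken from the original disclosure.

# Hypothetical number of actions allowed per decision stage (illustrative only).
ACTIONS_PER_STAGE = {"active_card_b": 2, "normal_pairing": 2, "discard": 1}

def stage_finished(stage: str, stored_actions: list) -> bool:
    # The stage is finished once the number of stored pending actions equals
    # the preset number of actions allowed for that stage.
    return len(stored_actions) == ACTIONS_PER_STAGE[stage]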

In the above embodiment of the present invention, the apparatus further includes: the first determination module is used for determining a third decision stage after the first decision stage after executing the stored action to be executed; the obtaining module is further configured to obtain fourth state information of the game match at a fourth time, where the fourth state information includes: various game information of the game in the fourth moment and a third decision stage; the processing module is further used for processing the fourth state information by using the current decision model to obtain a fourth decision result; the execution module is further configured to execute the fourth decision result.

In the above embodiment of the present invention, the apparatus further includes: the second determining module is used for determining whether the first decision stage is a preset decision stage or not before determining a third decision stage after the first decision stage; the first determining module is further configured to determine a third decision stage subsequent to the first decision stage if the first decision stage is a preset decision stage.

In the above embodiment of the present invention, the apparatus further includes: the obtaining module is further used for obtaining historical information of a plurality of game matches, wherein the historical information includes: historical state information and historical decision results; the building module is used for building training data based on the historical information of the plurality of game matches; and the training module is used for training the current decision model with the training data based on the proximal policy optimization algorithm.

In the above embodiment of the present invention, the historical status information includes: initial action information, executed action information and score information, wherein the construction module comprises: the second determining unit is used for obtaining the difference value of the score information of the first game pair and the score information of the second game pair to obtain the historical reward information of the first game pair, wherein the first game pair is adjacent to the second game pair, and the first game pair is earlier than the second game pair; and the construction unit is used for obtaining training data based on the historical state information, the historical decision result and the historical reward information of the game pairs.

In the above embodiment of the present invention, the construction unit is further configured to: obtaining the product of historical reward information of a plurality of game matches and a preset value to obtain target reward information of the plurality of game matches; and obtaining training data based on the historical state information, the historical decision result and the target reward information of the game pairs.

In the above embodiments of the present invention, the obtaining module includes one of the following: the game system comprises a first acquisition unit, a second acquisition unit and a game execution unit, wherein the first acquisition unit is used for acquiring historical game information in an interactive environment and acquiring historical information of a plurality of game matches, and the interactive environment is used for interacting with a target object; and the second acquisition unit is used for acquiring the historical information of the plurality of game matches, which is obtained by utilizing the historical decision model for processing.

In the above embodiment of the present invention, the apparatus further includes: the receiving module is used for receiving historical information of a plurality of game matches sent by a plurality of processing cores; the scheduling module is used for evenly dividing the historical information of the plurality of game matches into a plurality of historical information sets, wherein the historical information sets correspond one to one to the training processes; and the training module is used for training the current decision model with the corresponding training data through the plurality of training processes based on the proximal policy optimization algorithm.

In the above embodiment of the present invention, the training module includes: the third determining unit is used for determining gradients corresponding to a plurality of training processes; the synchronization unit is used for synchronizing the gradients corresponding to the training processes; and the updating unit is used for updating the parameters of the current decision model through the training processes according to the average values of the gradients corresponding to the training processes.

In the above embodiment of the present invention, the apparatus includes: and the sending module is used for sending the trained decision model to the plurality of processing cores through the model pool process.

In the above embodiment of the present invention, the apparatus further includes: the output module is used for outputting a plurality of decision models, wherein the training time of different decision models is different; and the response module is used for responding to a selection instruction for selecting the plurality of decision models and determining the current decision model corresponding to the selection instruction.

According to an embodiment of the present invention, there is further provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the device on which the computer-readable storage medium is located is controlled to execute the game-play decision-making method in any one of the above embodiments.

According to an embodiment of the present invention, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores a program executable by the at least one processor, and the program, when executed by the at least one processor, performs the game-play decision-making method in any of the above embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk, and other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.
