Interactive animation generation model training method, interactive animation generation method and system

Document No.: 170479    Publication date: 2021-10-29

Reading note: This technology, "Interactive animation generation model training method, interactive animation generation method and system", was designed and created by 渠思源 on 2021-08-02. Its main content is as follows: The embodiments of this specification disclose an interactive animation generation model training method, an interactive animation generation method, and corresponding systems. After a user inputs one or more of the initial bone state of a character, the spatial distribution of an object, and the initial position, initial posture, target position, and target posture of the object, the interactive animation generation model may generate bone motion parameters of the character based on the user input; the bone motion parameters may indicate the positions and postures of the character's bones at at least two time points during the interaction of the character with the object. An interaction animation of the character with the object may then be generated based on the character's bone motion parameters.

1. An interactive animation generation model training method comprises the following steps:

obtaining a plurality of sample input data, wherein the sample input data comprise initial bone state parameters of a character, spatial distribution parameters of a rigid object and motion trajectory parameters of the rigid object, the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position and a target posture of the rigid object;

obtaining a plurality of sample label data respectively corresponding to the plurality of sample input data, the sample label data including bone motion parameters of the character, the bone motion parameters indicating the positions and postures of the one or more bones at at least two time points during the interaction of the character with the rigid object;

and training an initial model by using the plurality of sample input data and the plurality of sample label data to obtain the interactive animation generation model.

2. The interactive animation generation model training method of claim 1, wherein the one or more bones comprise bones from the shoulder to the fingers.

3. The interactive animation generation model training method of claim 1, wherein the spatial distribution parameters of the rigid object are obtained based on a model of the rigid object that has been subjected to low-face-count polygonization processing.

4. The interactive animation generation model training method according to claim 1 or 3, wherein the spatial distribution of the model of the rigid object does not exceed a cubic space of a preset size, the cubic space of the preset size is divided into a plurality of sub-cubic spaces, and the spatial distribution parameters of the rigid object indicate, for each sub-cubic space, the proportion of that sub-cubic space occupied by the part of the model of the rigid object located within it.

5. The interactive animation generation model training method of claim 1, wherein the initial bone state parameters comprise three-dimensional coordinates indicating the positions of the one or more bones and rotation quaternions indicating the poses of the one or more bones;

the motion trajectory parameters of the rigid object comprise three-dimensional coordinates indicating the initial position and the target position of the rigid object, and rotation quaternions indicating the initial posture and the target posture of the rigid object;

wherein each rotation quaternion comprises elements representing a three-dimensional vector of a rotation axis and a rotation angle around the rotation axis.

6. The interactive animation generation model training method of claim 1, wherein the initial model comprises a neural network.

7. The interactive animation generation model training method as claimed in claim 6, wherein the interactive animation generation model is obtained based on a stochastic gradient descent algorithm.

8. The interactive animation generation model training method of claim 6 or 7, wherein the neural network comprises a first encoder, a second encoder, and a motion parameter decoder; the first encoder is used for performing feature extraction on the initial bone state parameters of the character to obtain a first feature vector; the second encoder is used for performing feature extraction on the spatial distribution parameters of the rigid object to obtain a second feature vector; the motion parameter decoder is used for taking the output of the current step as a part of the input of the next step and predicting the output of each step according to the input of that step; the condition parameters input into the motion parameter decoder comprise a combined feature vector formed by concatenating the first feature vector and the second feature vector, and the input of each step of the motion parameter decoder at least comprises the motion trajectory parameters of the rigid object; the bone motion parameters are derived based on the outputs of the steps of the motion parameter decoder.

9. The interactive animation generation model training method of claim 8, wherein the first encoder comprises a multi-layer perceptron.

10. The interactive animation generation model training method of claim 8, wherein the second encoder comprises a three-dimensional convolutional neural network.

11. The interactive animation generation model training method of claim 8, wherein the motion parameter decoder is a Transformer structure.

12. An interactive animation generation model training system comprises a sample input data acquisition module, a sample label data acquisition module and a training module;

the sample input data acquisition module is used for acquiring a plurality of sample input data, wherein the sample input data comprise initial bone state parameters of a character, spatial distribution parameters of a rigid object and motion trajectory parameters of the rigid object, the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position and a target posture of the rigid object;

the sample label data acquisition module is used for acquiring a plurality of sample label data respectively corresponding to the plurality of sample input data, wherein the sample label data comprise bone motion parameters of the character, and the bone motion parameters indicate the positions and postures of the one or more bones at at least two time points during the interaction of the character with the rigid object;

the training module is used for training an initial model by using the plurality of sample input data and the plurality of sample label data to obtain the interactive animation generation model.

13. An interactive animation generation model training apparatus comprising a processor and a storage device, the storage device being configured to store instructions which, when executed by the processor, implement the method according to any one of claims 1 to 11.

14. An interactive animation generation method, comprising:

acquiring initial bone state parameters of a target character, spatial distribution parameters of a target rigid object and motion trajectory parameters of the target rigid object, wherein the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the target character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position and a target posture of the target rigid object;

inputting the initial bone state parameters of the target character, the spatial distribution parameters of the target rigid object and the motion trajectory parameters of the target rigid object into an interactive animation generation model to obtain bone motion parameters of the target character output by the interactive animation generation model, wherein the bone motion parameters indicate the positions and postures of the one or more bones at at least two time points during the interaction of the target character with the target rigid object;

generating an interaction animation of the target character with the target rigid object based on the skeletal motion parameters of the target character.

15. The interactive animation generation method of claim 14, wherein the generating the interactive animation of the target character with the target rigid object based on the skeletal motion parameters of the target character comprises generating the interactive animation of the target character with the target rigid object based on the skeletal motion parameters of the target character and the motion trajectory parameters of the target rigid object.

16. The interactive animation generation method according to claim 14, wherein the interactive animation generation model is obtained by the interactive animation generation model training method according to any one of claims 1 to 11.

17. An interactive animation generation system comprises an input parameter acquisition module, an output parameter acquisition module and an interactive animation generation module;

the input parameter acquisition module is used for acquiring initial bone state parameters of a target character, spatial distribution parameters of a target rigid object and motion trajectory parameters of the target rigid object, wherein the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the target character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position and a target posture of the target rigid object;

the output parameter acquisition module is used for inputting the initial bone state parameters of the target character, the spatial distribution parameters of the target rigid object and the motion trajectory parameters of the target rigid object into an interactive animation generation model to obtain the bone motion parameters of the target character output by the interactive animation generation model, wherein the bone motion parameters indicate the positions and postures of the one or more bones at at least two time points during the interaction of the target character with the target rigid object;

the interactive animation generation module is used for generating the interactive animation of the target character with the target rigid object based on the bone motion parameters of the target character.

18. An interactive animation generation apparatus comprising a processor and a storage device, wherein the storage device is used for storing instructions, and when the processor executes the instructions, the interactive animation generation method according to any one of claims 14 to 16 is realized.

Technical Field

The present disclosure relates to animation production, and more particularly, to a method and system for interactive animation generation model training and interactive animation generation.

Background

Animation is used increasingly widely; for example, animation technology is often required in the production of works such as animated films, movies, games, and advertisements.

It is therefore desirable to provide an efficient method of generating interactive animation.

Disclosure of Invention

One embodiment of the present specification provides an interactive animation generation model training method. The method may comprise the following steps: obtaining a plurality of sample input data, wherein the sample input data comprise initial bone state parameters of a character, spatial distribution parameters of a rigid object and motion trajectory parameters of the rigid object, the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position and a target posture of the rigid object; obtaining a plurality of sample label data respectively corresponding to the plurality of sample input data, the sample label data including bone motion parameters of the character, the bone motion parameters indicating the positions and postures of the one or more bones at at least two time points during the interaction of the character with the rigid object; and training an initial model by using the plurality of sample input data and the plurality of sample label data to obtain the interactive animation generation model.

One embodiment of the present specification provides an interactive animation generation method. The method may comprise the following steps: acquiring initial bone state parameters of a target character, spatial distribution parameters of a target rigid object and motion trajectory parameters of the target rigid object, wherein the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the target character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position and a target posture of the target rigid object; inputting the initial bone state parameters of the target character, the spatial distribution parameters of the target rigid object and the motion trajectory parameters of the target rigid object into an interactive animation generation model to obtain bone motion parameters of the target character output by the interactive animation generation model, wherein the bone motion parameters indicate the positions and postures of the one or more bones at at least two time points during the interaction of the target character with the target rigid object; and generating an interaction animation of the target character with the target rigid object based on the bone motion parameters of the target character.

One embodiment of the present specification provides an interactive animation generation model training system. The system may include a sample input data acquisition module, a sample label data acquisition module, and a training module. The sample input data acquisition module may be configured to acquire a plurality of sample input data, where the sample input data include initial bone state parameters of a character, spatial distribution parameters of a rigid object, and motion trajectory parameters of the rigid object; the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position, and a target posture of the rigid object. The sample label data acquisition module may be configured to acquire a plurality of sample label data respectively corresponding to the plurality of sample input data, the sample label data including bone motion parameters of the character, the bone motion parameters indicating the positions and postures of the one or more bones at at least two points in time during the interaction of the character with the rigid object. The training module may be configured to train an initial model using the plurality of sample input data and the plurality of sample label data to obtain the interactive animation generation model.

One embodiment of the present specification provides an interactive animation generation system. The system may include an input parameter acquisition module, an output parameter acquisition module, and an interactive animation generation module. The input parameter acquisition module may be configured to acquire initial bone state parameters of a target character, spatial distribution parameters of a target rigid object, and motion trajectory parameters of the target rigid object, where the initial bone state parameters indicate an initial position and an initial posture of one or more bones of the target character, and the motion trajectory parameters indicate at least an initial position, an initial posture, a target position, and a target posture of the target rigid object. The output parameter acquisition module may be configured to input the initial bone state parameters of the target character, the spatial distribution parameters of the target rigid object, and the motion trajectory parameters of the target rigid object into an interactive animation generation model, so as to obtain the bone motion parameters of the target character output by the interactive animation generation model, where the bone motion parameters indicate the positions and postures of the one or more bones at at least two time points during the interaction of the target character with the target rigid object. The interaction animation generation module may be configured to generate an interaction animation of the target character with the target rigid object based on the bone motion parameters of the target character.

One embodiment of the present specification provides an interactive animation generation model training device. The apparatus comprises a processor and a storage device, wherein the storage device is used for storing instructions, and when the processor executes the instructions, the interactive animation generation model training method according to any embodiment of the specification is realized.

One embodiment of the present specification provides an interactive animation generation apparatus. The apparatus comprises a processor and a storage device, wherein the storage device is used for storing instructions, and when the processor executes the instructions, the interactive animation generation method according to any embodiment of this specification is implemented.

Drawings

The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:

FIG. 1 is an exemplary flow diagram of an interactive animation generation model training method according to some embodiments described herein;

FIG. 2 is an exemplary flow diagram of an interactive animation generation method, shown in accordance with some embodiments of the present description;

FIG. 3 is an exemplary structural diagram of a neural network for generating an interactive animation, shown in accordance with some embodiments of the present description;

FIG. 4 is an exemplary block diagram of an interactive animation generation model training system shown in accordance with some embodiments of the present description;

FIG. 5 is an exemplary block diagram of an interactive animation generation system shown in accordance with some embodiments of the present description.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.

It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.

As used in this specification, the terms "a", "an" and/or "the" do not refer exclusively to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.

Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to these processes, or one or more steps may be removed from them.

The movement (e.g., the actions) of a character in a three-dimensional animation is typically accomplished using three-dimensional skeletal animation techniques. Skeletal animation binds each vertex of the surface of the character's three-dimensional model to one or more bones, and the coordinates of the bones jointly determine the coordinates of the vertices of the model surface. The calculation of object motion is generally divided into two cases, rigid objects and non-rigid objects; this specification only discusses interactive animation for rigid objects. It should be understood that a rigid object refers to an object that does not deform (or whose deformation is negligible). The state of a rigid object may be determined by its position and attitude (also referred to as "rotation"). For brevity, the term "object" appearing hereinafter may specifically refer to a rigid object.

There are a large number of character interactions with objects in three-dimensional animations. Several methods of interactive animation of characters and objects are provided below.

In some embodiments, in animation software, the animator may adjust the character to a desired state by adjusting the position parameters and rotation parameters of the bones frame by frame. Correspondingly, the position parameters and rotation parameters of the object can also be adjusted frame by frame so that the whole interactive animation conforms to the laws of physics. Such a method (denoted as method 1) requires a significant amount of time and effort from a professional animator to ensure the quality of the animation.

In some embodiments, with the help of motion capture technology, an actor wears a professional suit and gloves, the position parameters and rotation parameters of the actor's bones are captured and calculated by professional equipment, the parameters are then imported into the model through animation software, and an animator checks the animation effect of the model and corrects it when needed. Such a method (denoted as method 2), while improving animation efficiency over method 1, introduces the cost of hiring actors and of using motion capture sites and equipment. In addition, in practical applications the captured data often contains errors and can hardly be used without manual correction; that is, the cost of manual correction often also has to be taken into account.

In some embodiments, using an IK (Inverse Kinematics) algorithm, target points may be preset on the object model, each target point corresponding to a designated touch point on the hand; the hand is then driven by the touch points to move to the target points, and the bone positions and rotation values of the hand and the arm are calculated inversely by IK. This method (denoted as method 3) is common in high-precision 3D game production and requires setting target points and hand touch points in advance for each model. Because it relies purely on spatial geometric calculation, the IK algorithm easily causes mesh penetration (clipping) between the hands and the object; in addition, the calculated limb movement generally follows the shortest path, so motions that do not conform to natural human movement habits easily occur.

The embodiments of this specification further provide a method for automatically generating the motion parameters required for the interaction of a three-dimensional character with an object by using a machine learning model. After a user inputs one or more of the initial bone state of the character, the spatial distribution of the object, and the initial position, initial posture, target position, and target posture of the object, the interactive animation generation model may generate bone motion parameters of the character based on the user input; the bone motion parameters may indicate the positions and postures of the bones at at least two time points during the interaction of the character with the object. In turn, an animation of the character's interaction with the object may be generated based on the character's bone motion parameters.

The following is a description of two stages of model training and model prediction.

FIG. 1 is an exemplary flow diagram of an interactive animation generation model training method, shown in some embodiments herein. The process 100 may be performed by one or more processors. The process 100 corresponds to a model training phase in which sample input data and sample label data may be collectively referred to as training data. As shown in fig. 1, the process 100 may include:

At step 110, a plurality of sample input data are obtained. In some embodiments, step 110 may be implemented by the sample input data acquisition module 410.

The sample input data may include initial bone state parameters of the character, spatial distribution parameters of the object, and motion trajectory parameters of the object. The initial bone state parameters may indicate an initial position and an initial pose of one or more bones of the character, and the motion trajectory parameters may indicate at least an initial position, an initial pose, a target position, and a target pose of the object.

In some embodiments, the one or more bones may include the bones from the shoulder to the fingers. In practical applications, only the shoulder-to-hand motion of the character during interaction with the object may be of interest, i.e., the motion of other body parts during the interaction may be ignored. In this way, the data collection cost and the amount of computation in the model training and prediction phases can be greatly reduced.

It should be noted that the bones referred to in this specification may be derived from real biological bone structures (e.g., human bones), or may be customized (e.g., modified from, or entirely independent of, real biological bone structures).

In some embodiments, in order to reduce the amount of data computation, the model of an object (the representation of the object in a computing device) may be subjected to low-face-count polygonization processing, and accordingly the spatial distribution parameters of the object are obtained based on the model after this processing. Low-face-count polygonization processing refers to representing the object with a polygon mesh having a smaller number of faces, which means that some details of the object's concavities and convexities may be ignored. It will be appreciated that this processing can be performed according to actual requirements. For example, the degree of face reduction may be adjusted according to the required fineness of the interactive animation. As another example, the processing may depend on the specific type of interactive animation; merely by way of example, when the interactive animation is opening a door by hand, only details strongly associated with the door-opening animation, such as the door handle, may be retained during processing, so as to reduce the amount of computation.

In some embodiments, the spatial distribution of the model of the object may be limited to a cubic space of a preset size, i.e., it does not exceed a cubic space of a preset size (volume). The cubic space of the preset size may be divided into a plurality of sub-cubic spaces, and accordingly the spatial distribution parameters of the object (hereinafter also referred to as the model parameters of the object) may indicate the proportion of each sub-cubic space occupied by the part of the object's model located within it. By way of example only, a cubic space with a side length of 1 m may be divided into 100³ sub-cubic spaces, each with a volume of 1 cm³. Accordingly, the model parameters of the object may form a three-dimensional tensor of size 100 × 100 × 100 (with 100 × 100 × 100 points/components), where each point/component may take a value between 0 and 1.
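
The following is a minimal sketch of how such an occupancy grid might be computed. It assumes (these assumptions are not part of the original text) that the object's low-poly model has already been densely and uniformly point-sampled and translated into the target cube, and that `samples_per_full_cell` approximates the number of samples a fully occupied sub-cube would receive.

```python
import numpy as np

def occupancy_grid(points, side=1.0, resolution=100, samples_per_full_cell=8):
    """Approximate the spatial distribution parameters described above.

    `points` is an (M, 3) array of points sampled inside the object's
    low-poly model, already translated into the [0, side]^3 cube.
    Returns a (resolution, resolution, resolution) tensor whose entries lie
    in [0, 1] and approximate the fraction of each sub-cube occupied by
    the model.
    """
    cell = side / resolution
    idx = np.clip((points / cell).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)   # count samples per sub-cube
    return np.clip(grid / samples_per_full_cell, 0.0, 1.0)    # normalize to an occupancy ratio
```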

At step 120, a plurality of sample label data respectively corresponding to the plurality of sample input data are obtained. In some embodiments, step 120 may be implemented by the sample label data acquisition module 420.

The sample label data may include bone motion parameters of the character, which may indicate the positions and poses of the one or more bones at at least two points in time during the character's interaction with the object. It is understood that the bone motion parameters serve as the supervisory signal for model training.

In some embodiments, one or more of the parameters mentioned in this specification (e.g., the initial bone state parameters, the bone motion parameters, the motion trajectory parameters of the object) may represent a position (in three-dimensional space) by three-dimensional coordinates and a rotation (i.e., a pose) by a rotation quaternion. A rotation quaternion comprises the elements of a three-dimensional vector representation of the rotation axis (3 numbers) and the rotation angle around that axis (1 number). As such, if the position and rotation of an entity (e.g., a certain bone of a character, or an object) are regarded as its state, that state can be represented by 7 numerical values. In particular, the initial bone state parameters of the character may be or include an array of length N × (3 + 4), where N is the number of the one or more bones. The initial state and the target state of the object (which together may constitute the motion trajectory parameters of the object) may each consist of 7 values. The sample label data or the model output may be or include a frame sequence of size T × N × (3 + 4), where N again denotes the number of the one or more bones and T denotes the number of frames. It is understood that one frame represents a still picture (image) of an animation (video) at a single point in time, i.e., T equals the number of the at least two time points.
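
To make these shapes concrete, the sketch below lays out the corresponding arrays. The bone count N and frame count T are hypothetical values chosen only for illustration; they are not fixed by the text, and the quaternion ordering shown is just one common convention.

```python
import numpy as np

def axis_angle_to_quaternion(axis, angle):
    """Standard axis-angle -> unit quaternion conversion, (x, y, z, w) order."""
    axis = np.asarray(axis, dtype=np.float32)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate([axis * np.sin(angle / 2.0), [np.cos(angle / 2.0)]])

N, T = 23, 60  # hypothetical numbers of bones and frames

# Initial bone state: per bone, a 3D position plus a 4-element rotation quaternion.
initial_bone_state = np.zeros((N, 3 + 4), dtype=np.float32)

# Object motion trajectory parameters: 7 values for the initial state, 7 for the target state.
object_trajectory = np.zeros((2, 7), dtype=np.float32)

# Sample label data / model output: one (N, 7) pose per frame, for T frames.
bone_motion = np.zeros((T, N, 3 + 4), dtype=np.float32)
```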

Model parameters of the object may be obtained with reference to physical objects of various sizes and/or shapes. The diversity/number of object models can be adjusted according to task requirements; for example, the number of object models may be more than 100.

The initial bone state parameters and bone motion parameters of the character can be obtained by having animators produce the corresponding animations and then exporting the relevant parameters. The diversity/complexity of the interactive animations (actions) reflected by the initial bone state parameters and bone motion parameters can also be adjusted according to task requirements. For example, to meet task requirements, at least one or more basic interactions (animations) may be required for each object model, such as one or more of picking up, lifting, grabbing, holding, dropping, throwing, and the like.

It should be appreciated that, with respect to the acquisition of the initial bone state parameters and/or bone motion parameters of a character, reference may also be made to the interactive animation methods described above. In some embodiments, the advantages of one or more of those methods may be combined to obtain high-quality training data. Here, high quality means that the interactive animation reflected by the bone motion parameters is natural and reasonable (e.g., conforms to the laws of physics).

At step 130, an initial model is trained using the plurality of sample input data and the plurality of sample label data to obtain the interactive animation generation model. In some embodiments, step 130 may be implemented by the training module 430.

The goal of training may include, but is not limited to, minimizing the error, i.e., reducing the gap between the model output and the sample label data.

In some embodiments, the initial model employed for training may include one or more of a linear Regression model, a Logistic Regression (LR) model, a decision tree, a neural network, and the like.

In some embodiments, when the initial model used for training includes a neural network, the initial model may be trained based on a stochastic gradient descent algorithm, resulting in an interactive animation generation model.
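
By way of illustration only, the following is a minimal sketch of such a training loop, assuming a PyTorch implementation; the mean-squared-error loss and the hyperparameters are assumptions, as the text does not specify the loss function.

```python
import torch

def train(model, dataloader, epochs=10, lr=1e-4):
    """Minimal stochastic gradient descent loop (loss and hyperparameters assumed)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for bone_state, object_grid, trajectory, label_motion in dataloader:
            pred_motion = model(bone_state, object_grid, trajectory)
            # Reduce the gap between the model output and the sample label data.
            loss = torch.nn.functional.mse_loss(pred_motion, label_motion)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```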

With regard to the specific structure of the neural network for generating the interactive animation, reference may be made to fig. 3 and its associated description.

FIG. 2 is an exemplary flow diagram of an interactive animation generation method, shown in accordance with some embodiments of the present description. The process 200 may be performed by one or more processors. The process 200 corresponds to the model prediction phase, wherein the interactive animation generation model may be the prediction model obtained by training an initial model according to the process 100, and the target character and the target object are the two parties involved in the interactive animation to be predicted. As shown in fig. 2, the process 200 may include:

At step 210, initial bone state parameters of the target character, spatial distribution parameters of the target object, and motion trajectory parameters of the target object are obtained. In some embodiments, step 210 may be implemented by the input parameter acquisition module 510.

The initial bone state parameters indicate an initial position and an initial pose of one or more bones of the target character, and the motion trajectory parameters indicate at least an initial position, an initial pose, a target position, and a target pose of the target object.

More details about the initial bone state parameters of the target character, the spatial distribution parameters of the target object, and the motion trajectory parameters of the target object can be found in the related description of step 110, and are not repeated here.

At step 220, the initial bone state parameters of the target character, the spatial distribution parameters of the target object, and the motion trajectory parameters of the target object are input into the interactive animation generation model to obtain the bone motion parameters of the target character output by the interactive animation generation model. In some embodiments, step 220 may be implemented by the output parameter acquisition module 520.

The bone motion parameters indicate the positions and poses of the one or more bones at at least two points in time during the interaction of the target character with the target object.

Further details regarding the bone motion parameters of the target character can be found in the related description of step 120 and are not repeated here.

At step 230, an interaction animation of the target character with the target object is generated based on the bone motion parameters of the target character. In some embodiments, step 230 may be implemented by the interactive animation generation module 530.

It will be appreciated that the skeletal motion parameters of the (target) character may reflect the actions in the course of the (target) character interacting with the (target) object.

In some embodiments, an interaction animation of the target character with the target object may be generated based on the bone motion parameters of the target character alone, in which case the target object may not be included in the interaction animation. A motion animation of the target object (i.e., one not including the character) may also be generated based on the motion trajectory parameters of the target object, and the interaction animation not including the target object and the motion animation of the target object may then be fused to obtain an interaction animation that includes both the target character and the target object.

In some embodiments, an interactive animation of the target character and the target object can be generated based on the skeletal motion parameters of the target character and the motion trajectory parameters of the target object, and the animation can include both the target character and the target object.

In some embodiments, the interaction animation of the target character with the target object may be generated by rendering through an engine (which may be integrated in animation software or software related to animation), or the like, based on the skeletal motion parameters of the target character, or based on the skeletal motion parameters of the target character and the motion trajectory parameters of the target object.
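
By way of illustration only, the sketch below shows one possible way of turning the predicted bone motion parameters into keyframes, assuming the animation software is Blender and using its Python API (bpy); the armature and bone names, the one-frame-per-time-point mapping, and the quaternion element order are assumptions, and the text does not prescribe any particular engine.

```python
import bpy

def apply_bone_motion(armature_name, bone_names, bone_motion):
    """bone_motion: array-like of shape (T, N, 7), i.e. per frame and per bone
    a 3D position followed by a 4-element rotation quaternion. Blender expects
    quaternions in (w, x, y, z) order, so reordering may be needed beforehand."""
    pose_bones = bpy.data.objects[armature_name].pose.bones
    for frame, poses in enumerate(bone_motion):
        for name, pose in zip(bone_names, poses):
            pb = pose_bones[name]
            pb.rotation_mode = 'QUATERNION'
            pb.location = pose[:3]
            pb.rotation_quaternion = pose[3:]
            pb.keyframe_insert(data_path="location", frame=frame)
            pb.keyframe_insert(data_path="rotation_quaternion", frame=frame)
```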

It is worth mentioning that the interactive animation referred to in this specification may focus only on the interaction after the character comes into contact with the object; that is, the actions before the character interacts with the object, such as preparatory actions of moving toward the object to be interacted with, bending down, standing up, squatting down, and the like, may be ignored.

It should be noted that the above description of the flow is for illustration and description only and does not limit the scope of the application of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are intended to be within the scope of the present description.

FIG. 3 is a schematic diagram of an exemplary architecture of a neural network for generating interactive animations according to some embodiments of the present description.

As shown in fig. 3, the neural network 300 may include a first encoder 310, a second encoder 320, and a motion parameter decoder 330.

The first encoder 310 may be configured to perform feature extraction on the initial bone state parameters of the character to obtain a first feature vector. The second encoder 320 may be configured to perform feature extraction on the spatial distribution parameters of the object (referred to as model parameters in fig. 3) to obtain a second feature vector.

The motion parameter decoder 330 may be used to take the output of the current step as part of the input of the next step and to predict the output of each step from the input of that step. Specifically, as shown in fig. 3, the motion parameter decoder 330 may predict the output out1 of the first step from the input start (also referred to as the start flag) of the first step, add the output out1 of the first step to the input of the second step, predict the output out2 of the second step from the input of the second step, add the output out2 of the second step to the input of the third step, and so on until the output end (also referred to as the end flag) of the last step is obtained. The condition parameters input to the motion parameter decoder 330 may include a combined feature vector formed by concatenating the first feature vector and the second feature vector. As indicated by the dashed box in fig. 3, the bone motion parameters of the character can be obtained based on the outputs of the steps of the motion parameter decoder. In addition, the motion trajectory parameters of the object (indicating its initial state and target state, referred to as start-stop state parameters in fig. 3) may be added to the input of each step of the motion parameter decoder 330, so that the interactive animation (motion) reflected by the bone motion parameters output by the motion parameter decoder 330 conforms to/matches the start and end states of the object, thereby yielding a reasonable and natural interactive animation.
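
The step-by-step decoding described above can be sketched as a simple loop. This is only an illustration; the `decoder_step` callable, the start token, and the step limit are hypothetical names and choices not specified in the text.

```python
import torch

def decode_motion(decoder_step, condition, trajectory, start_token, max_steps=120):
    """Autoregressive decoding sketch. `decoder_step` is a hypothetical callable
    that, given the combined condition vector, the object's start/target
    trajectory parameters, and all outputs produced so far, returns the next
    step's output and a flag marking the end of the sequence."""
    outputs = [start_token]                        # input of the first step: the start flag
    for _ in range(max_steps):
        out, is_end = decoder_step(condition, trajectory, torch.stack(outputs))
        if is_end:                                 # the end flag terminates decoding
            break
        outputs.append(out)                        # current output becomes part of the next input
    if len(outputs) == 1:                          # nothing was predicted before the end flag
        return torch.empty(0)
    return torch.stack(outputs[1:])                # per-step bone motion parameters
```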

In some embodiments, the first encoder 310 may include a multi-layer perceptron for extracting bone features. In some embodiments, the second encoder 320 may include a three-dimensional convolutional neural network (3D CNN) for extracting object model features. In some embodiments, the motion parameter decoder 330 may include a Transformer model for generating the bone motion parameters. Merely by way of example, the Transformer model may have the same or a similar structure as the Transformer model in the paper "Attention Is All You Need" published by the Google machine translation team at the NIPS (Neural Information Processing Systems) conference in 2017.
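
For illustration, the following is a minimal PyTorch sketch of such an architecture, written for training with teacher forcing; all layer sizes, the single-token conditioning scheme, and the bone count are assumptions chosen for the example rather than details given in the text.

```python
import torch
import torch.nn as nn

class InteractionAnimationNet(nn.Module):
    """Sketch of FIG. 3: an MLP bone encoder, a 3D CNN object encoder, and a
    Transformer decoder that predicts per-step bone motion parameters."""

    def __init__(self, n_bones=23, d_model=256):
        super().__init__()
        self.bone_encoder = nn.Sequential(                      # first encoder (MLP)
            nn.Linear(n_bones * 7, 512), nn.ReLU(), nn.Linear(512, d_model))
        self.object_encoder = nn.Sequential(                    # second encoder (3D CNN)
            nn.Conv3d(1, 8, kernel_size=4, stride=4), nn.ReLU(),    # 100^3 -> 25^3
            nn.Conv3d(8, 16, kernel_size=5, stride=5), nn.ReLU(),   # 25^3 -> 5^3
            nn.Flatten(), nn.Linear(16 * 5 * 5 * 5, d_model))
        self.cond_proj = nn.Linear(2 * d_model, d_model)         # combined feature vector
        self.step_proj = nn.Linear(n_bones * 7 + 14, d_model)    # previous output + trajectory
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out_proj = nn.Linear(d_model, n_bones * 7)

    def forward(self, bone_state, object_grid, trajectory, prev_outputs):
        # bone_state: (B, n_bones*7); object_grid: (B, 1, 100, 100, 100)
        # trajectory: (B, 14); prev_outputs: (B, S, n_bones*7), beginning with a start token
        cond = torch.cat([self.bone_encoder(bone_state),
                          self.object_encoder(object_grid)], dim=-1)
        memory = self.cond_proj(cond).unsqueeze(1)               # condition as one memory token
        steps = torch.cat([prev_outputs,
                           trajectory.unsqueeze(1).expand(-1, prev_outputs.size(1), -1)], dim=-1)
        mask = nn.Transformer.generate_square_subsequent_mask(prev_outputs.size(1))
        hidden = self.decoder(self.step_proj(steps), memory, tgt_mask=mask)
        return self.out_proj(hidden)                             # one predicted pose per step
```

During training, such a sketch would receive the ground-truth previous poses as `prev_outputs` (teacher forcing), while at inference time each predicted step would be fed back in, as in the decoding loop sketched above.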

FIG. 4 is an exemplary block diagram of an interactive animation generation model training system shown in accordance with some embodiments of the present description. As shown in FIG. 4, the system 400 may include a sample input data acquisition module 410, a sample label data acquisition module 420, and a training module 430.

The sample input data acquisition module 410 may be used to acquire a plurality of sample input data. The sample input data include initial bone state parameters of the character, spatial distribution parameters of the object, and motion trajectory parameters of the object; the initial bone state parameters indicate an initial position and an initial pose of one or more bones of the character, and the motion trajectory parameters indicate at least an initial position, an initial pose, a target position, and a target pose of the object.

The sample label data acquisition module 420 may be configured to acquire a plurality of sample label data corresponding to the plurality of sample input data, respectively. The sample label data include bone motion parameters of the character, which indicate the positions and poses of the one or more bones at at least two points in time during the interaction of the character with the object.

The training module 430 may be configured to train an initial model using the plurality of sample input data and the plurality of sample label data, resulting in an interactive animation generation model.

For more details of the system 400 and its modules, reference may be made to fig. 1 and its associated description.

FIG. 5 is an exemplary block diagram of an interactive animation generation system shown in accordance with some embodiments of the present description. As shown in FIG. 5, system 500 may include an input parameter acquisition module 510, an output parameter acquisition module 520, and an interactive animation generation module 530.

The input parameter acquisition module 510 may be configured to acquire initial bone state parameters of a target character, spatial distribution parameters of a target object, and motion trajectory parameters of the target object. The initial bone state parameters indicate an initial position and an initial pose of one or more bones of the target character, and the motion trajectory parameters indicate at least an initial position, an initial pose, a target position, and a target pose of the target object.

The output parameter acquisition module 520 may be configured to input the initial bone state parameters of the target character, the spatial distribution parameters of the target object, and the motion trajectory parameters of the target object into an interactive animation generation model, so as to obtain the bone motion parameters of the target character output by the interactive animation generation model. The bone motion parameters indicate the positions and poses of the one or more bones at at least two points in time during the interaction of the target character with the target object.

The interaction animation generation module 530 may be used to generate an interaction animation of the target character with the target object based on the skeletal motion parameters of the target character.

For more details on the system 500 and its modules, reference may be made to fig. 2 and its associated description.

It should be understood that the systems shown in figs. 4 and 5 and their modules may be implemented in various ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).

It should be noted that the above description of the system and its modules is for convenience only and should not limit the present disclosure to the illustrated embodiments. It will be appreciated by those skilled in the art that, having understood the principles of the system, the modules may be combined in various ways or connected as sub-systems to other modules without departing from those principles. For example, in some embodiments, the sample input data acquisition module 410 and the sample label data acquisition module 420 may be separate modules or may be combined into one module. Such variations are within the scope of the present disclosure.

The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) by utilizing the interactive animation generation model obtained by training, a user inputs the initial skeleton state parameters of the target character, the space distribution parameters of the target object and the motion trail parameters of the target object, and the system can automatically generate the interactive animation of the target character and the target object, so that the production efficiency of the interactive animation can be greatly improved; (2) by adopting high-quality training data, the prediction effect/quality of the model can be ensured; (3) motion trajectory parameters indicating the start-stop state of the object can be added to the input of each step of the motion parameter decoder 330, so that the interactive animation (motion) reflected by the skeletal motion parameters output by the neural network can conform to/match the start-stop state of the object, thereby generating reasonable and natural interactive animation. It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.

Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the embodiments herein. Various modifications, improvements and adaptations to the embodiments described herein may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the embodiments of the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.

Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.

Moreover, those skilled in the art will appreciate that aspects of the embodiments of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of embodiments of the present description may be carried out entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the embodiments of the present specification may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.

The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.

Computer program code required for operation of various portions of the embodiments of the present description may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages, and the like. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).

In addition, unless explicitly stated in the claims, the order of processing elements and sequences, use of numbers and letters, or use of other names in the embodiments of the present specification are not intended to limit the order of the processes and methods in the embodiments of the present specification. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more embodiments of the invention. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.

For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents thereof are hereby incorporated by reference into this specification, except for any application history document that is inconsistent with or conflicts with the contents of this specification, and except for any document (whether currently or later appended to this specification) that limits the broadest scope of the claims of this specification. It is to be understood that, if the descriptions, definitions, and/or uses of terms in the materials accompanying this specification are inconsistent or in conflict with those in this specification, the descriptions, definitions, and/or uses of terms in this specification shall prevail.

Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are possible within the scope of the embodiments of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.
