Video acquisition method, electronic equipment and computer readable storage medium

Document No.: 1315141  Publication date: 2020-07-10

Abstract: This technology, "Video acquisition method, electronic equipment and computer readable storage medium", was designed and created by 赵琦, 颜忠伟, 王科 and 张健 on 2020-03-27. The embodiment of the invention provides a video acquisition method, electronic equipment and a computer readable storage medium, relates to the technical field of video processing, and aims to solve the problem of poor video synthesis effect in the prior art. The method comprises the following steps: acquiring a source video including a source object; acquiring a first image of a target object; acquiring a target model of the target object based on the first image; acquiring key actions of the source object in the source video; adjusting the target model according to the key action to obtain a target action model; and obtaining a target video based on the target action model. Because the target model of the target object is adjusted based on the key action of the source object in the source video, the action presented by the target action model matches the key action, which increases the authenticity of the action of the target object in the target video and improves the synthesis effect of the target video.

1. A video acquisition method applied to an electronic device, characterized by comprising the following steps:

acquiring a source video including a source object;

acquiring a first image of a target object;

acquiring a target model of the target object based on the first image;

acquiring key actions of the source object in the source video;

adjusting the target model according to the key action to obtain a target action model;

and obtaining a target video based on the target action model.

2. The method of claim 1, wherein the adjusting the target model to obtain a target action model according to the key action comprises:

obtaining an action model according to the key action;

and adjusting the target model according to the action model to obtain the target action model.

3. The method of claim 2, wherein obtaining an action model according to the key action comprises:

obtaining M action submodels according to M key sub actions of the key action, wherein M is a positive integer;

the adjusting the target model according to the action model to obtain the target action model includes:

adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model;

the obtaining of the target video based on the target action model comprises:

and obtaining a target video based on the M target action submodels.

4. The method of claim 3, wherein the adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model comprises:

for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel;

and adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

5. The method of claim 3, wherein the adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model comprises:

adjusting the target model according to the M action submodels to obtain M intermediate action submodels;

for each intermediate action submodel of the M intermediate action submodels, acquiring a target vertex of the intermediate action submodel;

acquiring a first vertex corresponding to the target vertex, wherein the first vertex is a vertex of a first action sub-model, and the first action sub-model is an action sub-model corresponding to the intermediate action sub-model;

acquiring a second vertex corresponding to the target vertex from a pre-acquired action template model corresponding to the first action sub-model;

and adjusting the position of the target vertex according to the positions of the first vertex and the second vertex so as to obtain a target action submodel corresponding to the intermediate action submodel.

6. The method of claim 1, wherein obtaining a target model of the target object based on the first image comprises:

acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image is matched with the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model is matched with the appearance of the target object in the second image.

7. The method of claim 3, wherein obtaining the target video based on the M target action submodels comprises:

obtaining M target frames based on the M target action submodels;

and obtaining a target video according to the M target frames.

8. The method of claim 7, wherein obtaining the target video from the M target frames comprises:

according to the corresponding relation between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, sequencing the M target frames to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined according to the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.

9. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the video acquisition method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the video acquisition method according to any one of claims 1 to 8.

Technical Field

The present invention relates to the field of video processing technologies, and in particular, to a video acquisition method, an electronic device, and a computer-readable storage medium.

Background

With the popularity of short videos, various video software is available on the market to meet the needs of users. For example, if a user wants to replace another person in a dance video with himself or herself, a common practice is to replace the face images of the other person in the dance video with face images of the user by image processing techniques. However, this processing approach yields a poor video synthesis effect.

Disclosure of Invention

The embodiment of the invention provides a video acquisition method, electronic equipment and a computer readable storage medium, which aim to solve the problem of poor video synthesis effect.

To solve the above technical problem, the embodiment of the present invention is implemented as follows:

in a first aspect, an embodiment of the present invention provides a video acquisition method, including:

acquiring a source video including a source object;

acquiring a first image of a target object;

acquiring a target model of the target object based on the first image;

acquiring key actions of the source object in the source video;

adjusting the target model according to the key action to obtain a target action model;

and obtaining a target video based on the target action model.

In a second aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the video acquisition method according to the first aspect.

In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the video acquiring method according to the first aspect.

In the embodiment of the invention, a source video comprising a source object is obtained; acquiring a first image of a target object; acquiring a target model of the target object based on the first image; acquiring key actions of the source object in the source video; adjusting the target model according to the key action to obtain a target action model; and obtaining a target video based on the target action model. Therefore, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model is matched with the key action of the source object, the effect that the target object imitates the action of the source object is improved, the authenticity of the action of the target object in the target video is enhanced, and the composite effect of the target video is improved.

Drawings

Fig. 1 is a flowchart of a video acquisition method according to an embodiment of the present invention;

Fig. 2 is a second flowchart of a video acquisition method according to an embodiment of the present invention;

Fig. 3 is a third flowchart of a video acquisition method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a generative model provided by an embodiment of the invention;

FIG. 5 is a diagram of a first intermediate action sub-model shown as a mesh according to an embodiment of the present invention;

FIG. 6 is a block diagram of an electronic device provided by an embodiment of the invention;

Fig. 7 is a block diagram of an electronic device according to another embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to facilitate understanding of the embodiments of the present invention, a video color ring and a color ring are first described.

Referring to fig. 1, fig. 1 is a flowchart of a video acquisition method according to an embodiment of the present invention, and as shown in fig. 1, the embodiment provides a video acquisition method applied to an electronic device, including the following steps:

Step 101, obtaining a source video including a source object.

The source object may be a human or an animal. The source video may be a dance video, a motion video, or other video that includes a motion of the source object. The source video may be a video captured according to a preset scenario, which includes a preset action.

Step 102, a first image of a target object is acquired.

The target object may be a human or an animal, and the first image of the target object preferably is a frontal whole-body image of the target object, the first image comprising a face of the target object.

Step 103, acquiring a target model of the target object based on the first image.

The target model may be a three-dimensional model constructed based on the first image.

Step 104, acquiring key actions of the source object in the source video.

The key action can be determined according to a selection operation of the user. If the source video was shot according to a preset scenario, the key action can instead be determined according to the arrangement of the scenario, that is, according to the preset actions. For example, if the preset actions include action A, action B, and action C, then one or more of action A, action B, and action C may be selected as the key action.

Step 105, adjusting the target model according to the key action to obtain a target action model.

The target model is adjusted according to the key action so that the resulting target action model matches the key action; that is, the action presented by the target action model is highly similar to the key action, which achieves the aim of having the target action model of the target object imitate the key action.

The key action may include one or more key sub-actions. If the key action includes a plurality of key sub-actions, the target model is adjusted according to each key sub-action to obtain a target action sub-model corresponding to that key sub-action. In this case the target action model comprises the target action sub-models, each of which can also be a three-dimensional model.

Step 106, obtaining a target video based on the target action model.

In this step, after the target action model is obtained, key frames may be determined based on the target action model, and the target video may then be determined according to the key frames. That is, the object performing the action in the target video is the target object, and the performed action is the action of the source object in the source video, so that the target object imitates the action of the source object.

For example, suppose the source object is Zhang San, the target object is Li Si, and the source video is a dance video. In this embodiment, the target model is built according to an image of Li Si, so the target model is recognizable as Li Si; for example, the face and body shape of the target model are similar to those of Li Si. According to the key action of the dance video, the limb actions of the target model of Li Si are adjusted so that the limb actions of the target action model match the key action. Key frames are then determined based on the target action model, and the target video is determined according to the key frames.

In an embodiment of the present invention, the electronic device may be a mobile phone, a tablet personal computer (Tablet Personal Computer), a laptop computer (Laptop Computer), a personal digital assistant (PDA), a mobile Internet device (MID), a wearable device (Wearable Device), or the like.

The video acquisition method of the embodiment of the invention acquires a source video comprising a source object; acquiring a first image of a target object; acquiring a target model of the target object based on the first image; acquiring key actions of the source object in the source video; adjusting the target model according to the key action to obtain a target action model; and obtaining a target video based on the target action model. Therefore, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model is matched with the key action, the effect of the target object simulating the action of the source object is improved, the authenticity of the action of the target object in the target video is increased, and the composite effect of the target video is improved.
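The flow of steps 101 to 106 can be sketched as a small pipeline. This is only an illustrative sketch: `Model3D`, `acquire_target_video`, and the four callables passed in are hypothetical names that do not appear in the source, and the stub lambdas stand in for the real model construction, key-action extraction, model adjustment, and rendering, none of which the source specifies in detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model3D:
    """Hypothetical stand-in for a three-dimensional model."""
    owner: str
    pose: str = "neutral"


def acquire_target_video(
    source_video: List[str],
    first_image: str,
    build_model: Callable[[str], Model3D],
    extract_key_actions: Callable[[List[str]], List[str]],
    adjust: Callable[[Model3D, str], Model3D],
    render: Callable[[Model3D], str],
) -> List[str]:
    """Chain the steps: build model, find key actions, adjust, render."""
    target_model = build_model(first_image)            # step 103
    key_actions = extract_key_actions(source_video)    # step 104
    # Step 105: one adjusted (target action) model per key action.
    action_models = [adjust(target_model, a) for a in key_actions]
    # Step 106: one rendered frame per target action model.
    return [render(m) for m in action_models]


# Usage with stub implementations (all assumptions, for illustration only).
video = acquire_target_video(
    source_video=["wave", "idle", "spin"],
    first_image="li_si.png",
    build_model=lambda img: Model3D(owner=img),
    extract_key_actions=lambda frames: [f for f in frames if f != "idle"],
    adjust=lambda m, a: Model3D(owner=m.owner, pose=a),
    render=lambda m: f"{m.owner}:{m.pose}",
)
```

Passing the stages in as callables keeps the sketch independent of any particular modelling or rendering library.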

Referring to fig. 2, fig. 2 is a second flowchart of a video acquisition method according to an embodiment of the present invention, and as shown in fig. 2, the embodiment provides a video acquisition method applied to an electronic device, including the following steps:

Step 201, obtaining a source video including a source object.

The source object may be a human or an animal. The source video may be a dance video, a motion video, or other video that includes a motion of the source object. The source video may be a video captured according to a preset scenario, which includes a preset action.

Step 202, a first image of the target object is acquired.

The target object may be a human or an animal, and the first image of the target object preferably is a frontal whole-body image of the target object, the first image comprising a face of the target object.

Step 203, acquiring a target model of the target object based on the first image.

The target model may be a three-dimensional model constructed based on the first image.

Step 204, acquiring the key action of the source object in the source video.

The key action can be determined according to a selection operation of the user. If the source video was shot according to a preset scenario, the key action can instead be determined according to the arrangement of the scenario, that is, according to the preset actions. For example, if the preset actions include action A, action B, and action C, then one or more of action A, action B, and action C may be selected as the key action.

Step 205, obtaining an action model according to the key action.

An action model corresponding to the key action is constructed according to the key action, and the action model can be a three-dimensional model. The key action may include one or more key sub-actions. If the key action includes a plurality of key sub-actions, a corresponding action sub-model can be obtained according to each key sub-action; in this case, the action model includes a plurality of action sub-models. Each action sub-model may also be a three-dimensional model.

Step 206, adjusting the target model according to the action model to obtain the target action model.

Specifically, the target model is adjusted according to the action model, so that the target action model is matched with the action model, that is, the action presented by the target action model has higher similarity with the action model, and the purpose that the target action model of the target object simulates the key action is achieved.

Step 205-step 206 are one implementation of step 105.

Step 207, obtaining a target video based on the target action model.

According to the video acquisition method, the action model is established based on the key action of the source object in the source video, the target model of the target object is adjusted based on the action model, the action presented by the target action model can be matched with the key action, and the effect that the target object imitates the action of the source object is improved.

Referring to fig. 3, fig. 3 is a third flowchart of a video acquisition method according to an embodiment of the present invention, and as shown in fig. 3, the embodiment provides a video acquisition method applied to an electronic device, including the following steps:

Step 301, obtaining a source video including a source object.

The source object may be a human or an animal. The source video may be a dance video, a motion video, or other video that includes a motion of the source object. The source video may be a video captured according to a preset scenario, which includes a preset action.

Step 302, a first image of a target object is acquired.

The target object may be a human or an animal, and the first image of the target object preferably is a frontal whole-body image of the target object, the first image comprising a face of the target object.

Step 303, obtaining a target model of the target object based on the first image.

The target model may be a three-dimensional model constructed based on the first image.

Further, step 303, obtaining a target model of the target object based on the first image, includes:

acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image is matched with the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model is matched with the appearance of the target object in the second image.

In this embodiment, the generative model is used to generate a second image of the target object from the first image, the appearance of the target object in the second image matching the appearance of the target object in the first image. The appearance of the target object may be its face, clothing, or coat (in the case where the target object is an animal), or the like. A deep-learning-based transfer algorithm may be employed to transfer the appearance of the target object onto the intermediate target model. The generative model adopts a generative adversarial network, which consists of a generator and a discriminator. The generator captures the distribution of the sample data, simulates the distribution of samples in the target domain from input random noise, and generates fake samples that attempt to "fool" the discriminator.

The generator of the generative model in this embodiment generates a second image according to the appearance of the target object in the first image, such that the appearance of the target object in the second image matches the appearance of the target object in the first image. Fig. 4 is a training diagram of the generative model. During training, noise is input to the generator; the noise makes the network stochastic so that it produces a distribution that can be sampled, and random noise following a Gaussian distribution is generally used. The generator outputs generated data, which is input to the discriminator together with real data obtained from real samples, and the discriminator outputs a discrimination result. After training is completed, the generator in the generative model can generate a second image in which the appearance of the target object matches the appearance of the target object in the first image.
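The source does not state the training loss. For orientation only, the standard generative adversarial network objective for a generator G and a discriminator D, with Gaussian noise z as described above, is commonly written as follows. Treating this as the objective used here is an assumption, and the symbols G, D, x, z and p_data are not taken from the source.

```latex
\min_{G}\,\max_{D}\;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim \mathcal{N}(0, I)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator is trained to increase this value by telling real data x from generated data G(z), while the generator is trained to decrease it by producing samples the discriminator accepts as real.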

The intermediate target model is adjusted according to the second image to obtain the target model, whose appearance matches the appearance of the target object in the second image, so that the appearance of the target model is visually consistent with the appearance of the target object in the first image.

The adjustment of the intermediate target model according to the second image may be understood as applying texture mapping to the intermediate target model according to the appearance in the second image, so that the intermediate target model has an appearance that is visually consistent with the second image.

Step 304, acquiring the key action of the source object in the source video.

The key action can be determined according to a selection operation of the user. If the source video was shot according to a preset scenario, the key action can instead be determined according to the arrangement of the scenario, that is, according to the preset actions. For example, if the preset actions include action A, action B, and action C, then one or more of action A, action B, and action C may be selected as the key action.

Step 305, obtaining M action sub-models according to the M key sub-actions of the key action, wherein M is a positive integer.

The key actions comprise M key sub-actions, and one action sub-model can be obtained according to each key sub-action.

Step 306, adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model.

Adjusting the target model according to one action submodel yields one target action submodel, so each action submodel corresponds to one target action submodel.

Further, step 306, adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model, includes:

for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel;

and adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

Specifically, the target action sub-model is obtained by adjusting the target model according to the action sub-model. For each of the M action sub-models, the action sub-model may be disassembled in three-dimensional space, for example by using a human body segmentation algorithm, to obtain a plurality of key points, each of which has three-dimensional coordinates. The points in the target model that correspond to the key points are then adjusted based on the key points to obtain a target action sub-model. Each action sub-model corresponds to one target action sub-model.

Further, step 306, adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model, includes:
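The key-point adjustment described above can be illustrated with a deliberately simple sketch. It assumes that both the target model and the action sub-model expose named points with three-dimensional coordinates, and that "adjusting" means moving each corresponding point of the target model to the key point's position. The function name, the dictionary layout, and the joint names are hypothetical, and a practical system would also account for differences in body proportions.

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def adjust_to_key_points(
    target_model: Dict[str, Point3D],
    key_points: Dict[str, Point3D],
) -> Dict[str, Point3D]:
    """Return a copy of target_model in which every point that has a
    corresponding key point takes that key point's 3D position."""
    adjusted = dict(target_model)
    for name, position in key_points.items():
        if name in adjusted:          # only points with a counterpart move
            adjusted[name] = position
    return adjusted


# Hypothetical data: the target model in a neutral pose, and key points
# obtained by disassembling one action sub-model.
neutral = {"left_wrist": (0.0, 1.0, 0.0), "head": (0.0, 1.7, 0.0)}
raised_hand = {"left_wrist": (0.2, 1.9, 0.1)}
target_action_submodel = adjust_to_key_points(neutral, raised_hand)
```

Returning a copy leaves the original target model untouched, so the same target model can be adjusted once per action sub-model.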

adjusting the target model according to the M action submodels to obtain M intermediate action submodels;

for each intermediate action submodel of the M intermediate action submodels, acquiring a target vertex of the intermediate action submodel;

acquiring a first vertex corresponding to the target vertex, wherein the first vertex is a vertex of a first action sub-model, and the first action sub-model is an action sub-model corresponding to the intermediate action sub-model;

acquiring a second vertex corresponding to the target vertex from a pre-acquired action template model corresponding to the first action sub-model;

and adjusting the position of the target vertex according to the positions of the first vertex and the second vertex so as to obtain a target action submodel corresponding to the intermediate action submodel.

Adjusting the target model according to the M action submodels to obtain M intermediate action submodels may specifically be: for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel; and adjusting the target model according to the plurality of key points to obtain an intermediate action submodel corresponding to the action submodel. The related description above applies and is not repeated here.

To further improve the adjustment accuracy, each intermediate action submodel is adjusted further.

For each intermediate action submodel, a target vertex of the intermediate action submodel is determined, and then a first vertex corresponding to the target vertex is acquired. The first vertex is a vertex of the first action submodel, and the intermediate action submodel corresponds to the first action submodel; that is, the intermediate action submodel is obtained by adjusting the target model based on the first action submodel.

The action template model is obtained in advance and can be regarded as a standard action model. The action template model set may include a plurality of action template models, each action template model corresponding to one of the M action submodels. The action template model corresponding to the first action sub-model is determined from the action template model set, and a second vertex corresponding to the target vertex is acquired from that action template model.

The position of the target vertex is then adjusted according to the positions of the first vertex and the second vertex.

If the vertex of the intermediate action submodel is V, the corresponding vertex of the action template model is V_1, and the corresponding vertex of the first action submodel is V_2, the computational expression of V is as follows:

The weight value in the expression ranges from 0 to 1.

Fig. 5 shows the intermediate action submodel as a mesh. The intermediate action submodel is fine-tuned using a mesh deformation algorithm; that is, the intermediate action submodel is adjusted using the above expression according to the action template model in the action template model set and the first action submodel.
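The expression for V does not appear in this text, so its exact form is unknown. One plausible reading, given a single weight between 0 and 1 and two reference vertices, is a linear interpolation between the template vertex and the first action submodel vertex. The sketch below shows that reading only; the function name `blend_vertex`, the choice of which vertex receives the weight, and the interpolation form itself are all assumptions.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def blend_vertex(template_vertex: Point3D,
                 action_vertex: Point3D,
                 weight: float) -> Point3D:
    """Assumed form: V = weight * V_1 + (1 - weight) * V_2, where V_1 is
    the action template vertex and V_2 the first action submodel vertex."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in the range 0 to 1")
    return tuple(
        weight * t + (1.0 - weight) * a
        for t, a in zip(template_vertex, action_vertex)
    )


# Halfway between the two reference vertices.
midpoint = blend_vertex((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), 0.5)
```

Under this assumed form, a weight of 1 reproduces the template vertex and a weight of 0 reproduces the action submodel vertex.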

Taking the above formula as a reference, when a plurality of intermediate action submodels need to be adjusted, a multi-target fusion algorithm is used, as follows:

The weight value in the algorithm ranges from 0 to 1; b denotes the vertex coordinates of the action base reference model of the key action, that is, the vertex coordinates of the action template model in the action template model set, with b = (x_b, y_b, z_b); T_i denotes the vertex coordinates of the i-th action submodel, where i ranges from 1 to n and n is the total number of action submodels. T_1 = (x_1, y_1, z_1) denotes the vertex coordinates of the first action submodel, T_2 = (x_2, y_2, z_2) those of the second action submodel, and so on, up to T_n = (x_n, y_n, z_n) for the n-th action submodel.
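The multi-target fusion expression is likewise absent from this text. A form that fits the symbols defined above (a base vertex b, target vertices T_1 to T_n, and weights between 0 and 1) is the blend-shape style sum V = b + Σ w_i (T_i − b). The sketch below implements that form purely as an assumption; the function name `fuse_vertices` and the use of one weight per action submodel are not taken from the source.

```python
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def fuse_vertices(base: Point3D,
                  targets: Sequence[Point3D],
                  weights: Sequence[float]) -> Point3D:
    """Assumed form: V = b + sum_i w_i * (T_i - b), each w_i in [0, 1]."""
    if len(targets) != len(weights):
        raise ValueError("one weight is required per target vertex")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in the range 0 to 1")
    fused = [float(c) for c in base]
    for target, w in zip(targets, weights):
        for axis in range(3):
            # Add the weighted offset of this target from the base vertex.
            fused[axis] += w * (target[axis] - base[axis])
    return (fused[0], fused[1], fused[2])


# Two action submodel vertices fused onto a base vertex at the origin.
fused_vertex = fuse_vertices(
    base=(0.0, 0.0, 0.0),
    targets=[(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    weights=[0.5, 0.25],
)
```

With all weights set to 0 the result is the base vertex b, which matches the description of b as the base reference model.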

Step 305 and step 306 are one implementation of step 205 and step 206, respectively.

Step 307, obtaining a target video based on the M target action submodels.

Step 307 is one implementation of step 207.

Further, the step may specifically be: obtaining M target frames based on the M target action submodels; and obtaining a target video according to the M target frames.

One target frame is determined from each target action submodel. Each target frame displays the action corresponding to one target action submodel, and the actions of the target frames are linked in sequence so that the actions in the target video are coherent.

Further, the obtaining a target video according to the M target frames includes:

according to the corresponding relation between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, sequencing the M target frames to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined according to the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.

The sequence of the M key sub-actions can be determined according to the sequence of each key sub-action in the source video, and as the target action sub-models and the key sub-actions have corresponding relations, the sequence, namely the first sequence, of each target action sub-model can be determined based on the sequence of each key sub-action.

Because each target frame is determined according to a target action submodel, the target action submodels correspond to the target frames, and the order of the target frames can therefore be determined based on the order of the target action submodels. To improve the display effect of the target video, inter-frame interpolation is performed between adjacent target frames in the target frame sequence to obtain the target video. The object performing the action in the target video is the target object, and the performed action is the action of the source object in the source video, so the target object imitates the action of the source object. For example, if the key action is a dance action, a target video in which the target object imitates the dance of the source object can be obtained.
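The ordering and interpolation steps can be sketched as follows. The source does not say which inter-frame interpolation method is used, so this sketch substitutes a plain linear cross-fade between adjacent frames as the simplest possible stand-in. The function names, the representation of a frame as a flat list of pixel values, and the `steps_between` parameter are all hypothetical.

```python
from typing import Dict, List, Sequence

Frame = List[float]  # assumed representation: a flat list of pixel values


def order_frames(key_sub_action_order: Sequence[str],
                 frame_by_sub_action: Dict[str, Frame]) -> List[Frame]:
    """Arrange target frames in the order of their key sub-actions."""
    return [frame_by_sub_action[name] for name in key_sub_action_order]


def interpolate_frames(frames: Sequence[Frame],
                       steps_between: int) -> List[Frame]:
    """Insert `steps_between` linearly blended frames between each pair
    of adjacent target frames (a cross-fade, used here as a stand-in)."""
    if not frames:
        return []
    video: List[Frame] = []
    for current, following in zip(frames, frames[1:]):
        video.append(list(current))
        for k in range(1, steps_between + 1):
            t = k / (steps_between + 1)
            video.append([(1 - t) * a + t * b
                          for a, b in zip(current, following)])
    video.append(list(frames[-1]))
    return video


# Two one-pixel target frames, keyed by hypothetical sub-action names.
ordered = order_frames(["step_a", "step_b"],
                       {"step_b": [2.0], "step_a": [0.0]})
target_video = interpolate_frames(ordered, steps_between=1)
```

A production system would more likely interpolate the poses of the three-dimensional sub-models or use motion-compensated interpolation, since blending pixels directly tends to produce ghosting.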

Referring to fig. 6, fig. 6 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device 600 includes:

a first obtaining module 601, configured to obtain a source video including a source object;

a second obtaining module 602, configured to obtain a first image of a target object;

a third obtaining module 603, configured to obtain a target model of the target object based on the first image;

a fourth obtaining module 604, configured to obtain a key action of the source object in the source video;

a fifth obtaining module 605, configured to adjust the target model according to the key action, so as to obtain a target action model;

a sixth obtaining module 606, configured to obtain a target video based on the target action model.
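The data flow between the six modules can be sketched as follows. This is only an illustrative wiring under the assumption that each module can be represented by a callable; the class name `VideoAcquisitionPipeline` and its field names are hypothetical, and the sketch says nothing about how each stage is actually implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VideoAcquisitionPipeline:
    """Hypothetical wiring of modules 601 to 606; each stage is an injected callable."""

    get_source_video: Callable[[], Any]              # module 601
    get_first_image: Callable[[], Any]               # module 602
    build_target_model: Callable[[Any], Any]         # module 603
    extract_key_action: Callable[[Any], Any]         # module 604
    adjust_target_model: Callable[[Any, Any], Any]   # module 605
    render_target_video: Callable[[Any], Any]        # module 606

    def run(self) -> Any:
        source_video = self.get_source_video()
        first_image = self.get_first_image()
        # The target model depends only on the first image of the target object.
        target_model = self.build_target_model(first_image)
        # The key action depends only on the source video.
        key_action = self.extract_key_action(source_video)
        target_action_model = self.adjust_target_model(target_model, key_action)
        return self.render_target_video(target_action_model)
```

The sketch makes one property of the module list explicit: the target model and the key action are derived independently and only meet in the fifth module.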

Further, the fifth obtaining module 605 includes:

the first obtaining submodule is used for obtaining an action model according to the key action;

and the second obtaining submodule is used for adjusting the target model according to the action model to obtain the target action model.

Further, the first obtaining sub-module is configured to obtain M action sub-models according to M key sub-actions of the key action, where M is a positive integer;

the second obtaining submodule is used for adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model;

and the sixth obtaining module 606 is used for obtaining the target video based on the M target action submodels.

Further, the second obtaining sub-module includes:

the disassembly unit is used for performing three-dimensional space disassembly on the action submodel for each action submodel of the M action submodels to obtain a plurality of key points of the action submodel;

and the first adjusting unit is used for adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

Further, the second obtaining sub-module includes:

the second adjusting unit is used for adjusting the target model according to the M action submodels to obtain M intermediate action submodels;

a first obtaining unit configured to obtain a target vertex of the intermediate action sub-model for each of the M intermediate action sub-models;

a second obtaining unit, configured to obtain a first vertex corresponding to the target vertex, where the first vertex is a vertex of a first action sub-model, and the first action sub-model is an action sub-model corresponding to the intermediate action sub-model;

a third obtaining unit, configured to obtain a second vertex corresponding to the target vertex from a pre-obtained action template model corresponding to the first action sub-model;

and the third adjusting unit is used for adjusting the position of the target vertex according to the positions of the first vertex and the second vertex so as to obtain a target action sub-model corresponding to the intermediate action sub-model.
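The vertex adjustment performed by the third adjusting unit can be sketched as below. This passage only states that the target vertex position is adjusted according to the positions of the first vertex and the second vertex. The concrete rule used here, shifting the target vertex by the weighted offset between the first vertex (from the action sub-model) and the second vertex (from the action template model), is an assumption for illustration, as are the function name `adjust_target_vertex` and the `weight` parameter.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def adjust_target_vertex(
    target: Vec3, first: Vec3, second: Vec3, weight: float = 1.0
) -> Vec3:
    """Shift `target` by the weighted offset (first - second).

    Assumed rule: the offset of the action sub-model vertex from the
    template vertex is transferred onto the intermediate model vertex.
    """
    x, y, z = (t + weight * (f - s) for t, f, s in zip(target, first, second))
    return (x, y, z)
```

Applying this function to every target vertex of an intermediate action sub-model would yield the corresponding target action sub-model under the assumed rule. A `weight` of 0 leaves the intermediate model unchanged.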

Further, the third obtaining module 603 is configured to:

acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image is matched with the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model is matched with the appearance of the target object in the second image.

Further, the sixth obtaining module 606 includes:

a fourth obtaining unit, configured to obtain M target frames based on the M target action submodels;

and a fifth obtaining unit, configured to obtain the target video according to the M target frames.

Further, the fifth obtaining unit is configured to:

sequencing the M target frames according to the corresponding relation between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined by the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.

The electronic device 600 can implement each process implemented by the electronic device in the method embodiments of fig. 1 to fig. 3, which is not described here again to avoid repetition.

The electronic device 600 of the embodiment of the present invention obtains a source video including a source object; acquires a first image of a target object; acquires a target model of the target object based on the first image; acquires key actions of the source object in the source video; adjusts the target model according to the key action to obtain a target action model; and obtains a target video based on the target action model. Therefore, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model matches the key action, and the synthesis effect of the target object imitating the action of the source object is improved.

Fig. 7 is a schematic diagram of a hardware structure of an electronic device for implementing various embodiments of the present invention, and as shown in fig. 7, the electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, a power supply 711, and the like. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 7 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, a pedometer, and the like.

The processor 710 is configured to obtain a source video including a source object;

acquiring a first image of a target object;

acquiring a target model of the target object based on the first image;

acquiring key actions of the source object in the source video;

adjusting the target model according to the key action to obtain a target action model;

and obtaining a target video based on the target action model.

Further, the processor 710 is further configured to:

obtaining an action model according to the key action;

and adjusting the target model according to the action model to obtain the target action model.

Further, the processor 710 is further configured to:

obtaining M action submodels according to M key sub actions of the key action, wherein M is a positive integer;

the adjusting the target model according to the action model to obtain the target action model includes:

adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model;

the obtaining of the target video based on the target action model comprises:

and obtaining a target video based on the M target action submodels.

Further, the processor 710 is further configured to:

for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel;

and adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

Further, the processor 710 is further configured to:

adjusting the target model according to the M action submodels to obtain M intermediate action submodels;

for each intermediate action submodel of the M intermediate action submodels, acquiring a target vertex of the intermediate action submodel;

acquiring a first vertex corresponding to the target vertex, wherein the first vertex is a vertex of a first action sub-model, and the first action sub-model is an action sub-model corresponding to the intermediate action sub-model;

acquiring a second vertex corresponding to the target vertex from a pre-acquired action template model corresponding to the first action sub-model;

and adjusting the position of the target vertex according to the positions of the first vertex and the second vertex so as to obtain a target action submodel corresponding to the intermediate action submodel.

Further, the processor 710 is further configured to:

acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image is matched with the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model is matched with the appearance of the target object in the second image.

Further, the processor 710 is further configured to:

obtaining M target frames based on the M target action submodels;

and obtaining a target video according to the M target frames.

Further, the processor 710 is further configured to:

sequencing the M target frames according to the corresponding relation between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined by the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.

The electronic device 700 is capable of implementing the processes implemented by the electronic device in the foregoing embodiments, and in order to avoid repetition, the details are not described here.

The electronic device 700 of the embodiment of the present invention acquires a source video including a source object; acquiring a first image of a target object; acquiring a target model of the target object based on the first image; acquiring key actions of the source object in the source video; adjusting the target model according to the key action to obtain a target action model; and obtaining a target video based on the target action model. Therefore, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model is matched with the key action, and the synthetic effect of the target object simulating the action of the source object is improved.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 701 may be used for receiving and sending signals during message transmission and reception or during a call. Specifically, it receives downlink data from a base station and passes the data to the processor 710 for processing, and it transmits uplink data to the base station. In general, the radio frequency unit 701 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 701 may also communicate with a network and other devices through a wireless communication system.

The electronic device provides wireless broadband internet access to the user via the network module 702, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.

The audio output unit 703 may convert audio data received by the radio frequency unit 701 or the network module 702 or stored in the memory 709 into an audio signal and output it as sound. The audio output unit 703 may also provide audio output related to a specific function performed by the electronic device 700 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 703 includes a speaker, a buzzer, a receiver, and the like.

The input unit 704 is used to receive audio or video signals. The input unit 704 may include a Graphics Processing Unit (GPU) 7041 and a microphone 7042. The graphics processor 7041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 706. The image frames processed by the graphics processor 7041 may be stored in the memory 709 (or another storage medium) or transmitted via the radio frequency unit 701 or the network module 702. The microphone 7042 may receive sound and process it into audio data. In a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 701.

The electronic device 700 also includes at least one sensor 705, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 7061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 7061 and/or a backlight when the electronic device 700 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the electronic device (such as horizontal and vertical screen switching, related games, and magnetometer posture calibration) and for vibration-identification functions (such as a pedometer or tap detection). The sensor 705 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described in detail herein.

The display unit 706 may include a display panel 7061, and the display panel 7061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 707 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 707 includes a touch panel 7071 and other input devices 7072. The touch panel 7071, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 7071 (e.g., operations performed with a finger, a stylus, or any other suitable object or attachment). The touch panel 7071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 710, and receives and executes commands from the processor 710. In addition, the touch panel 7071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described herein again.

Further, the touch panel 7071 may be overlaid on the display panel 7061, and when the touch panel 7071 detects a touch operation on or near the touch panel 7071, the touch operation is transmitted to the processor 710 to determine the type of the touch event, and then the processor 710 provides a corresponding visual output on the display panel 7061 according to the type of the touch event. Although the touch panel 7071 and the display panel 7061 are shown in fig. 7 as two separate components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 7071 and the display panel 7061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.

The interface unit 708 is an interface for connecting an external device to the electronic apparatus 700. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 708 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 700 or may be used to transmit data between the electronic apparatus 700 and the external device.

The memory 709 may be used to store software programs as well as various data. The memory 709 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data or a phonebook) created through use of the electronic device, and the like. Further, the memory 709 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 710 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 709 and calling data stored in the memory 709, thereby monitoring the whole electronic device. Processor 710 may include one or more processing units; preferably, the processor 710 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 710.

The electronic device 700 may also include a power supply 711 (e.g., a battery) for providing power to the various components, and preferably, the power supply 711 may be logically coupled to the processor 710 via a power management system, such that functions of managing charging, discharging, and power consumption may be performed via the power management system.

In addition, the electronic device 700 includes some functional modules that are not shown, and are not described in detail herein.

Preferably, an embodiment of the present invention further provides an electronic device, which includes a processor 710, a memory 709, and a computer program stored in the memory 709 and capable of running on the processor 710, where the computer program is executed by the processor 710 to implement each process of the above-mentioned video obtaining method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.

An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the video acquisition method embodiment shown in fig. 1 or fig. 2, and can achieve the same technical effect, and is not described herein again to avoid repetition. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
