Video acquisition method, electronic equipment and computer readable storage medium

Document No.: 1315141; publication date: 2020-07-10

Reading note: This technology, "Video acquisition method, electronic equipment and computer readable storage medium", was designed by 赵琦, 颜忠伟, 王科 and 张健 on 2020-03-27. Main content: The embodiment of the invention provides a video acquisition method, electronic equipment and a computer readable storage medium, relates to the technical field of video processing, and aims to solve the problem of poor video synthesis effect in the prior art. The method comprises the following steps: acquiring a source video including a source object; acquiring a first image of a target object; acquiring a target model of the target object based on the first image; acquiring key actions of the source object in the source video; adjusting the target model according to the key action to obtain a target action model; and obtaining a target video based on the target action model. Therefore, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model matches the key action, the authenticity of the action of the target object in the target video is increased, and the synthesis effect of the target video is improved.

1. A video acquisition method, applied to an electronic device, characterized by comprising the following steps:

acquiring a source video including a source object;

acquiring a first image of a target object;

acquiring a target model of the target object based on the first image;

acquiring key actions of the source object in the source video;

adjusting the target model according to the key action to obtain a target action model;

and obtaining a target video based on the target action model.

2. The method of claim 1, wherein the adjusting the target model to obtain a target action model according to the key action comprises:

obtaining an action model according to the key action;

and adjusting the target model according to the action model to obtain the target action model.

3. The method of claim 2, wherein the obtaining an action model according to the key action comprises:

obtaining M action submodels according to M key sub actions of the key action, wherein M is a positive integer;

the adjusting the target model according to the action model to obtain the target action model includes:

adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model;

the obtaining of the target video based on the target action model comprises:

and obtaining a target video based on the M target action submodels.

4. The method of claim 3, wherein the adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model comprises:

for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel;

and adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

5. The method of claim 3, wherein the adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model comprises:

adjusting the target model according to the M action submodels to obtain M intermediate action submodels;

for each intermediate action submodel of the M intermediate action submodels, acquiring a target vertex of the intermediate action submodel;

acquiring a first vertex corresponding to the target vertex, wherein the first vertex is a vertex of a first action sub-model, and the first action sub-model is an action sub-model corresponding to the intermediate action sub-model;

acquiring a second vertex corresponding to the target vertex from a pre-acquired action template model corresponding to the first action sub-model;

and adjusting the position of the target vertex according to the positions of the first vertex and the second vertex so as to obtain a target action submodel corresponding to the intermediate action submodel.

6. The method of claim 1, wherein obtaining a target model of the target object based on the first image comprises:

acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image is matched with the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model is matched with the appearance of the target object in the second image.

7. The method of claim 3, wherein obtaining the target video based on the M target action submodels comprises:

obtaining M target frames based on the M target action submodels;

and obtaining a target video according to the M target frames.

8. The method of claim 7, wherein obtaining the target video from the M target frames comprises:

sequencing the M target frames according to the correspondence between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined according to the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.

9. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the video acquisition method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the video acquisition method according to any one of claims 1 to 8.

Technical Field

The present invention relates to the field of video processing technologies, and in particular, to a video acquisition method, an electronic device, and a computer-readable storage medium.

Background

With the popularity of short videos, various video software products are available on the market to meet the needs of users. For example, if a user wants to turn another person's dance video into a dance video of the user, a common practice is to replace the face of the other person in the dance video with the face of the user by means of image processing techniques. However, this processing approach yields a poor video synthesis effect.

Disclosure of Invention

The embodiment of the invention provides a video acquisition method, electronic equipment and a computer readable storage medium, which aim to solve the problem of poor video synthesis effect.

To solve the above technical problem, the embodiment of the present invention is implemented as follows:

in a first aspect, an embodiment of the present invention provides a video acquisition method, including:

acquiring a source video including a source object;

acquiring a first image of a target object;

acquiring a target model of the target object based on the first image;

acquiring key actions of the source object in the source video;

adjusting the target model according to the key action to obtain a target action model;

and obtaining a target video based on the target action model.

In a second aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video acquisition method according to the first aspect.

In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the video acquiring method according to the first aspect.

In the embodiment of the invention, a source video including a source object is acquired; a first image of a target object is acquired; a target model of the target object is acquired based on the first image; key actions of the source object in the source video are acquired; the target model is adjusted according to the key action to obtain a target action model; and a target video is obtained based on the target action model. In this way, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model matches the key action of the source object. This improves how well the target object imitates the action of the source object, makes the action of the target object in the target video more realistic, and improves the synthesis effect of the target video.

Drawings

Fig. 1 is a flowchart of a video acquisition method according to an embodiment of the present invention;

Fig. 2 is a second flowchart of a video acquisition method according to an embodiment of the present invention;

Fig. 3 is a third flowchart of a video acquisition method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a generative model provided by an embodiment of the invention;

Fig. 5 is a diagram of a first intermediate action sub-model in a grid according to an embodiment of the present invention;

Fig. 6 is a block diagram of an electronic device provided by an embodiment of the invention;

Fig. 7 is a block diagram of an electronic device according to another embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to facilitate understanding of the embodiments of the present invention, a video color ring and a color ring are first described.

Referring to fig. 1, fig. 1 is a flowchart of a video acquisition method according to an embodiment of the present invention, and as shown in fig. 1, the embodiment provides a video acquisition method applied to an electronic device, including the following steps:

Step 101, obtaining a source video including a source object.

The source object may be a human or an animal. The source video may be a dance video, a motion video, or other video that includes a motion of the source object. The source video may be a video captured according to a preset scenario, which includes a preset action.

Step 102, a first image of a target object is acquired.

The target object may be a human or an animal, and the first image of the target object preferably is a frontal whole-body image of the target object, the first image comprising a face of the target object.

Step 103, acquiring a target model of the target object based on the first image.

The target model may be a three-dimensional model constructed based on the first image.

Step 104, acquiring key actions of the source object in the source video.

When the key action is obtained, the key action can be determined according to the selection operation of the user; if the source video is a video obtained by shooting according to a preset scenario, the key action can be determined according to the arrangement of the scenario, namely, the key action is determined according to the preset action. For example, if the preset actions include action a, action B, and action C, then one or more of action a, action B, and action C may be selected as the key action.
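The selection described above can be sketched as follows. This is only an illustration: the function name `select_key_actions`, the use of strings as action identifiers, and the fallback of treating every preset action as a key action when the user selects nothing are assumptions, not part of the described method.

```python
from typing import List, Optional, Sequence


def select_key_actions(preset_actions: Sequence[str],
                       user_selection: Optional[Sequence[str]] = None) -> List[str]:
    """Return the key actions, kept in the order of the preset scenario."""
    if user_selection is None:
        # Assumption: with no user choice, every preset action is a key action.
        return list(preset_actions)
    chosen = set(user_selection)
    unknown = chosen - set(preset_actions)
    if unknown:
        raise ValueError(f"not in preset actions: {sorted(unknown)}")
    # Keep scenario order, regardless of the order in which the user picked.
    return [action for action in preset_actions if action in chosen]


key_actions = select_key_actions(["A", "B", "C"], user_selection=["C", "A"])
```

Keeping the scenario order matters later, because the order of the key sub-actions determines the order of the target frames.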

Step 105, adjusting the target model according to the key action to obtain a target action model.

The target model is adjusted according to the key action so that the resulting target action model matches the key action; that is, the action presented by the target action model has a high similarity to the key action, so that the target action model of the target object imitates the key action.

The key action may include one or more key sub-actions. If the key action includes a plurality of key sub-actions, the target model is adjusted according to each key sub-action to obtain a target action submodel corresponding to that key sub-action. In this case the target action model includes a plurality of target action submodels, each of which may also be a three-dimensional model.

Step 106, obtaining a target video based on the target action model.

In this step, after the target action model is obtained, key frames may be determined based on the target action model, and the target video may then be determined from the key frames. In the target video, the object performing the action is the target object and the action performed is that of the source object in the source video, so the target object imitates the action of the source object.

For example, suppose the source object is Zhang San, the target object is Li Si, and the source video is a dance video. In this embodiment, the target model is built from an image of Li Si, and the target model is recognizable as Li Si; for example, its face and body shape are similar to those of Li Si. According to the key action of the dance video, the limb pose of Li Si's target model is adjusted so that the limb action of the target action model matches the key action. Key frames are then determined based on the target action model, and the target video is determined from the key frames.

In an embodiment of the present invention, the electronic device may be a mobile phone, a tablet personal computer (Tablet Personal Computer), a laptop computer (Laptop Computer), a personal digital assistant (PDA), a mobile Internet device (MID), a wearable device (Wearable Device), or the like.

The video acquisition method of the embodiment of the invention acquires a source video comprising a source object; acquiring a first image of a target object; acquiring a target model of the target object based on the first image; acquiring key actions of the source object in the source video; adjusting the target model according to the key action to obtain a target action model; and obtaining a target video based on the target action model. Therefore, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model is matched with the key action, the effect of the target object simulating the action of the source object is improved, the authenticity of the action of the target object in the target video is increased, and the composite effect of the target video is improved.
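The data flow of steps 101 to 106 can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the class `VideoAcquisitionPipeline`, its four injected stage functions, and the toy stand-ins below are hypothetical names used only to show how the outputs of one step feed the next. They do not implement model construction, action extraction, or rendering.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class VideoAcquisitionPipeline:
    # Each stage is injected by the caller; all names are hypothetical.
    build_target_model: Callable[[Any], Any]         # step 103
    extract_key_action: Callable[[Any], List[Any]]   # step 104
    adjust_model: Callable[[Any, Any], Any]          # step 105
    render_video: Callable[[List[Any]], Any]         # step 106

    def run(self, source_video: Any, first_image: Any) -> Any:
        target_model = self.build_target_model(first_image)
        key_sub_actions = self.extract_key_action(source_video)
        # One target action submodel per key sub-action.
        target_action_submodels = [
            self.adjust_model(target_model, sub_action)
            for sub_action in key_sub_actions
        ]
        return self.render_video(target_action_submodels)


# Toy stand-ins that only demonstrate the data flow.
pipeline = VideoAcquisitionPipeline(
    build_target_model=lambda image: {"appearance": image},
    extract_key_action=lambda video: list(video),
    adjust_model=lambda model, action: {**model, "pose": action},
    render_video=lambda submodels: [m["pose"] for m in submodels],
)
target_video = pipeline.run(source_video=["raise_arm", "turn"],
                            first_image="li_si.png")
```

Injecting the stages keeps the sketch independent of any particular modelling or rendering library.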

Referring to fig. 2, fig. 2 is a second flowchart of a video acquisition method according to an embodiment of the present invention, and as shown in fig. 2, the embodiment provides a video acquisition method applied to an electronic device, including the following steps:

Step 201, obtaining a source video including a source object.

Step 202, a first image of the target object is acquired.

Step 203, acquiring a target model of the target object based on the first image.

The target model may be a three-dimensional model constructed based on the first image.

Step 204, acquiring the key action of the source object in the source video.

Step 205, obtaining an action model according to the key action.

An action model corresponding to the key action is constructed according to the key action; the action model may be a three-dimensional model. The key action may include one or more key sub-actions. If the key action includes a plurality of key sub-actions, a corresponding action submodel can be obtained for each key sub-action, in which case the action model includes a plurality of action submodels. Each action submodel may also be a three-dimensional model.

Step 206, adjusting the target model according to the action model to obtain the target action model.

Specifically, the target model is adjusted according to the action model, so that the target action model is matched with the action model, that is, the action presented by the target action model has higher similarity with the action model, and the purpose that the target action model of the target object simulates the key action is achieved.

Step 205-step 206 are one implementation of step 105.

Step 207, obtaining a target video based on the target action model.

According to the video acquisition method, the action model is established based on the key action of the source object in the source video, the target model of the target object is adjusted based on the action model, the action presented by the target action model can be matched with the key action, and the effect that the target object imitates the action of the source object is improved.

Referring to fig. 3, fig. 3 is a third flowchart of a video acquisition method according to an embodiment of the present invention, and as shown in fig. 3, the embodiment provides a video acquisition method applied to an electronic device, including the following steps:

Step 301, obtaining a source video including a source object.

Step 302, a first image of a target object is acquired.

Step 303, obtaining a target model of the target object based on the first image.

The target model may be a three-dimensional model constructed based on the first image.

Further, step 303, obtaining a target model of the target object based on the first image, includes:

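acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image matches the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model matches the appearance of the target object in the second image.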

In this embodiment, the generative model is used to generate a second image of the target object from the first image, the appearance of the target object in the second image matching the appearance of the target object in the first image. The appearance of the target object may be the face, the clothing, the coat (where the target object is an animal), or the like. A deep-learning-based transfer algorithm may be employed to transfer the appearance of the target object onto the intermediate target model. The generative model adopts a generative adversarial network, which consists of a generator and a discriminator. The generator captures the distribution of the sample data, simulates the distribution of samples in the target domain from input random noise, and generates fake samples intended to fool the discriminator.

The generator of the generative model in this embodiment serves to generate a second image according to the appearance of the target object in the first image, such that the appearance of the target object in the second image matches the appearance of the target object in the first image. Fig. 4 is a training diagram of the generative model. During training, noise is input to the generator; the noise makes the network stochastic so that it produces a distribution that can be sampled, and random noise following a Gaussian distribution is generally used. The generator produces generated data, which is input to the discriminator together with real data obtained from real samples, and the discriminator outputs a discrimination result. After training is completed, the generator of the generative model can generate a second image in which the appearance of the target object matches the appearance of the target object in the first image.
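The training signal of such a generator and discriminator pair can be illustrated with a short sketch. The text does not state which loss functions are used, so the binary cross-entropy discriminator loss and the non-saturating generator loss below are assumptions taken from the standard formulation of generative adversarial networks; the function names are hypothetical, and the networks themselves are omitted.

```python
import math
import random
from typing import List

EPS = 1e-12  # avoids log(0)


def sample_noise(dim: int, rng: random.Random) -> List[float]:
    """Draw a Gaussian noise vector to be fed to the generator."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Cross-entropy with real samples labelled 1 and generated samples labelled 0.

    d_real and d_fake hold the discriminator's probabilities that each
    sample is real.
    """
    real_term = sum(-math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: List[float]) -> float:
    """Low when the discriminator scores generated samples as real."""
    return sum(-math.log(p + EPS) for p in d_fake) / len(d_fake)


rng = random.Random(0)
z = sample_noise(8, rng)
```

A discriminator that separates real from generated data well has a low `discriminator_loss`, while the generator lowers `generator_loss` by producing samples the discriminator accepts as real.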

The intermediate target model is adjusted according to the second image to obtain the target model. The appearance of the target model matches the appearance of the target object in the second image, so that the appearance of the target model is visually consistent with the appearance of the target object in the first image.

The adjustment of the intermediate target model according to the second image may be understood as mapping the intermediate target model according to the appearance of the second image, so that the intermediate target model has an appearance visual effect consistent with the second image.

Step 304, acquiring the key action of the source object in the source video.

Step 305, obtaining M action submodels according to the M key sub-actions of the key action, wherein M is a positive integer.

The key action comprises M key sub-actions, and one action submodel can be obtained from each key sub-action.

Step 306, adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model.

Adjusting the target model according to one action submodel yields one target action submodel, so each action submodel corresponds to one target action submodel.

Further, step 306, adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model, includes:

for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel;

and adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

Specifically, when the target model is adjusted according to the action submodels to obtain the target action submodels, each of the M action submodels may be disassembled in three-dimensional space, for example by using a human body segmentation algorithm, to obtain a plurality of key points, each of which has three-dimensional coordinates. The points in the target model that correspond to the key points are then adjusted based on the key points to obtain a target action submodel. Each action submodel corresponds to one target action submodel.
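The key-point adjustment can be sketched as follows. This is a deliberately simplified illustration: it assumes that key points and target-model points are matched by a shared name, and that adjusting a point means moving it to the key point's coordinates. Neither assumption comes from the text, and the function name `adjust_by_key_points` is hypothetical.

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def adjust_by_key_points(target_model: Dict[str, Point3D],
                         key_points: Dict[str, Point3D]) -> Dict[str, Point3D]:
    """Return a copy of target_model with matching points moved to the key points."""
    adjusted = dict(target_model)
    for name, position in key_points.items():
        if name in adjusted:  # only points with a counterpart are moved
            adjusted[name] = position
    return adjusted


target_model = {"left_wrist": (0.0, 1.0, 0.0), "head": (0.0, 1.7, 0.0)}
key_points = {"left_wrist": (0.3, 1.5, 0.1)}
target_action_submodel = adjust_by_key_points(target_model, key_points)
```

The function returns a new dictionary, so the same target model can be reused for each of the M action submodels.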

Further, step 306, adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model, includes:

adjusting the target model according to the M action submodels to obtain M intermediate action submodels;

for each intermediate action submodel of the M intermediate action submodels, acquiring a target vertex of the intermediate action submodel;

acquiring a first vertex corresponding to the target vertex, wherein the first vertex is a vertex of a first action submodel, and the first action submodel is the action submodel corresponding to the intermediate action submodel;

acquiring a second vertex corresponding to the target vertex from a pre-acquired action template model corresponding to the first action submodel;

and adjusting the position of the target vertex according to the positions of the first vertex and the second vertex, so as to obtain a target action submodel corresponding to the intermediate action submodel.

Adjusting the target model according to the M action submodels to obtain M intermediate action submodels may specifically be: for each action submodel of the M action submodels, performing three-dimensional space disassembly on the action submodel to obtain a plurality of key points of the action submodel; and adjusting the target model according to the plurality of key points to obtain an intermediate action submodel corresponding to the action submodel. See the related description above; details are not repeated here.

To further improve the adjustment accuracy, each intermediate action submodel is then adjusted further.

And for each intermediate action submodel, determining a target vertex of the intermediate action submodel, and then acquiring a first vertex corresponding to the target vertex, wherein the first vertex is the vertex of the first action submodel, and the intermediate action submodel corresponds to the first action submodel, namely the intermediate action submodel is obtained by adjusting the target model based on the first action submodel.

The action template model is obtained in advance and can be regarded as a standard action model. The action template model set may include a plurality of action template models, each action template model corresponding to one of the M action submodels. The action template model corresponding to the first action submodel is determined from the action template model set, and a second vertex corresponding to the target vertex is acquired from that action template model.

The position of the target vertex is then adjusted according to the positions of the first vertex and the second vertex.

If the vertex of the intermediate action submodel is V, the corresponding vertex of the action template model is V₁, and the corresponding vertex of the first action submodel is V₂, the computational expression of V is as follows:

The expression includes a weight value, whose range is 0 to 1.
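The expression itself is not reproduced in this text. As one plausible reading, and purely as an assumption, the target vertex could be a weighted linear blend of the template vertex V₁ and the first action submodel vertex V₂. The sketch below illustrates that assumed form; the function name `blend_vertex` and the choice of which vertex receives the weight are hypothetical.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def blend_vertex(v1: Point3D, v2: Point3D, weight: float) -> Point3D:
    """Assumed form: V = (1 - weight) * V1 + weight * V2, with weight in [0, 1].

    v1 is the vertex of the action template model and v2 the vertex of the
    first action submodel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    x, y, z = ((1.0 - weight) * a + weight * b for a, b in zip(v1, v2))
    return (x, y, z)


v = blend_vertex((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), 0.5)
```

Under this assumption, a weight of 0 keeps the template vertex and a weight of 1 takes the vertex of the first action submodel.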

Fig. 5 shows the intermediate action submodel as a mesh. The intermediate action submodel is fine-tuned by using a mesh deformation algorithm; that is, it is adjusted with the above expression according to the first action submodel and the corresponding action template model in the action template model set.

Taking the above formula as a reference, when a plurality of intermediate action submodels need to be adjusted, a multi-target fusion algorithm is used. The algorithm is as follows:

In the algorithm, the weight value ranges from 0 to 1; b represents the vertex coordinates of the action base reference model of the key action, that is, the vertex coordinates of the action template model in the action template model set, with b = (x_b, y_b, z_b); T_i represents the vertex coordinates of the i-th action submodel, where i ranges from 1 to n and n is the total number of action submodels; T₁ = (x₁, y₁, z₁) represents the vertex coordinates of the first action submodel, T₂ = (x₂, y₂, z₂) represents the vertex coordinates of the second action submodel, and so on, and T_n = (x_n, y_n, z_n) represents the vertex coordinates of the n-th action submodel.
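The fusion formula is likewise not reproduced in this text. A common way to combine a base vertex b with several target vertices T_i is the blend-shape form V = b + Σ w_i (T_i − b). The sketch below uses that form purely as an assumption; the function name `fuse_vertices` and the per-submodel weights `weights` are hypothetical.

```python
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def fuse_vertices(b: Point3D, targets: Sequence[Point3D],
                  weights: Sequence[float]) -> Point3D:
    """Assumed blend-shape form: V = b + sum_i w_i * (T_i - b)."""
    if len(targets) != len(weights):
        raise ValueError("one weight is needed per action submodel")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must lie between 0 and 1")
    fused = [float(c) for c in b]
    for t, w in zip(targets, weights):
        for axis in range(3):
            # Add the weighted offset of this target from the base vertex.
            fused[axis] += w * (t[axis] - b[axis])
    return (fused[0], fused[1], fused[2])


fused = fuse_vertices((0.0, 0.0, 0.0),
                      [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
                      [0.5, 0.25])
```

With a single target this reduces to a linear blend between b and T₁, which is consistent with the single-model case taken as the reference above.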

Step 305 and step 306 are one implementation of step 205 and step 206, respectively.

Step 307, obtaining a target video based on the M target action submodels.

Step 307 is one implementation of step 207.

Further, the step may specifically be: obtaining M target frames based on the M target action submodels; and obtaining a target video according to the M target frames.

One target frame is determined from each target action submodel, and each target frame displays the action corresponding to one target action submodel. Connecting the actions of the target frames in series makes the action in the target video coherent.

Further, the obtaining a target video according to the M target frames includes:

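sequencing the M target frames according to the correspondence between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined according to the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.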

The sequence of the M key sub-actions can be determined according to the sequence of each key sub-action in the source video, and as the target action sub-models and the key sub-actions have corresponding relations, the sequence, namely the first sequence, of each target action sub-model can be determined based on the sequence of each key sub-action.
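The ordering described above can be sketched as a sort on the time at which each key sub-action occurs in the source video. Representing that time as a timestamp, keying everything by a sub-action identifier, and the function name `order_target_frames` are illustrative assumptions.

```python
from typing import Any, Dict, List


def order_target_frames(sub_action_times: Dict[str, float],
                        frames_by_sub_action: Dict[str, Any]) -> List[Any]:
    """Order target frames by when their key sub-action occurs in the source video.

    Each key sub-action corresponds to one target action submodel and hence to
    one target frame, so sorting the sub-actions also sorts the frames.
    """
    ordered_ids = sorted(sub_action_times, key=sub_action_times.get)
    return [frames_by_sub_action[sub_id] for sub_id in ordered_ids]


times = {"turn": 4.0, "raise_arm": 1.5, "bow": 9.0}
frames = {"turn": "frame_turn", "raise_arm": "frame_raise_arm", "bow": "frame_bow"}
target_frame_sequence = order_target_frames(times, frames)
```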

Because each target frame is determined from a target action submodel, the target action submodels correspond to the target frames, so the order of the target frames can be determined from the order of the target action submodels. To improve the display effect of the target video, interframe interpolation is performed between adjacent target frames in the target frame sequence to obtain the target video. In the target video, the object performing the action is the target object and the action performed is that of the source object in the source video, so the target object imitates the action of the source object. For example, if the key action is a dance action, a target video in which the target object imitates the dance of the source object can be obtained.
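Interframe interpolation between adjacent target frames can be illustrated with a simple linear blend. The text does not specify the interpolation method, so linear interpolation, the representation of a frame as a flat list of numeric values, and the names `interpolate_frames` and `steps_between` are assumptions made for this sketch.

```python
from typing import List, Sequence


def interpolate_frames(frames: Sequence[Sequence[float]],
                       steps_between: int) -> List[List[float]]:
    """Insert steps_between linearly blended frames between each adjacent pair."""
    if steps_between < 0:
        raise ValueError("steps_between must be non-negative")
    if not frames:
        return []
    video: List[List[float]] = []
    for current, following in zip(frames, frames[1:]):
        video.append(list(current))
        for k in range(1, steps_between + 1):
            t = k / (steps_between + 1)  # fraction of the way to the next frame
            video.append([(1.0 - t) * a + t * b
                          for a, b in zip(current, following)])
    video.append(list(frames[-1]))
    return video


video = interpolate_frames([[0.0, 0.0], [2.0, 4.0]], steps_between=1)
```

With M target frames and k in-between frames per pair, the result contains M + (M − 1)·k frames.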

Referring to Fig. 6, Fig. 6 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 6, the electronic device 600 includes:

a first obtaining module 601, configured to obtain a source video including a source object;

a second obtaining module 602, configured to obtain a first image of a target object;

a third obtaining module 603, configured to obtain a target model of the target object based on the first image;

a fourth obtaining module 604, configured to obtain a key action of the source object in the source video;

a fifth obtaining module 605, configured to adjust the target model according to the key action, so as to obtain a target action model;

a sixth obtaining module 606, configured to obtain a target video based on the target action model.

Further, the fifth obtaining module 605 includes:

the first obtaining submodule is used for obtaining an action model according to the key action;

and the second obtaining submodule is used for adjusting the target model according to the action model to obtain the target action model.

Further, the first obtaining sub-module is configured to obtain M action sub-models according to M key sub-actions of the key action, where M is a positive integer;

the second obtaining submodule is used for adjusting the target model according to the M action submodels to obtain M target action submodels of the target action model;

and the sixth acquisition module is used for acquiring the target video based on the M target action submodels.

Further, the second obtaining sub-module includes:

the disassembly unit is used for performing three-dimensional space disassembly on the action submodel for each action submodel of the M action submodels to obtain a plurality of key points of the action submodel;

and the first adjusting unit is used for adjusting the target model according to the plurality of key points to obtain a target action sub-model corresponding to the action sub-model.

Further, the second obtaining sub-module includes:

the second adjusting unit is used for adjusting the target model according to the M action submodels to obtain M middle action submodels;

a first obtaining unit configured to obtain a target vertex of the intermediate action sub-model for each of the M intermediate action sub-models;

a second obtaining unit, configured to obtain a first vertex corresponding to the target vertex, where the first vertex is a vertex of a first action sub-model, and the first action sub-model is an action sub-model corresponding to the intermediate action sub-model;

a third obtaining unit, configured to obtain a second vertex corresponding to the target vertex from a pre-obtained action template model corresponding to the first action sub-model;

and the third adjusting unit is used for adjusting the position of the target vertex according to the positions of the first vertex and the second vertex so as to obtain a target action sub-model corresponding to the intermediate action sub-model.

Further, the third obtaining module 603 is configured to:

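acquiring an intermediate target model of the target object according to the first image;

obtaining a second image of the target object by using a generative model according to the first image, wherein the appearance of the target object in the second image matches the appearance of the target object in the first image;

and adjusting the intermediate target model according to the second image to obtain the target model, wherein the appearance of the target model matches the appearance of the target object in the second image.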

Further, the sixth obtaining module 606 includes:

a fourth obtaining unit, configured to obtain M target frames based on the M target action submodels;

and the fifth acquisition unit is used for acquiring the target video according to the M target frames.

Further, the fifth obtaining unit is configured to:

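sequencing the M target frames according to the correspondence between the M target action submodels and the M target frames and according to a first sequence of the M target action submodels, to obtain a sequenced target frame sequence, wherein the first sequence of the M target action submodels is determined according to the sequence of the M key sub-actions;

and performing interframe interpolation based on the target frame sequence to obtain the target video.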

The electronic device 600 can implement each process implemented by the electronic device in the method embodiments of Fig. 1 to Fig. 3; to avoid repetition, details are not described here again.

The electronic device 600 of the embodiment of the present invention obtains a source video including a source object; acquires a first image of a target object; constructs a target model of the target object based on the first image; acquires key actions of the source object in the source video; adjusts the target model according to the key action to obtain a target action model; and obtains a target video based on the target action model. In this way, the target model of the target object is adjusted based on the key action of the source object in the source video, so that the action presented by the target action model matches the key action, and the synthesis effect of the target object imitating the action of the source object is improved.

Fig. 7 is a schematic diagram of a hardware structure of an electronic device for implementing various embodiments of the present invention, and as shown in fig. 7, the electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, a power supply 711, and the like. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 7 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, a pedometer, and the like.

The processor 710 is configured to obtain a source video including a source object;

acquiring a first image of a target object;

acquiring a target model of the target object based on the first image;

acquiring key actions of the source object in the source video;

adjusting the target model according to the key action to obtain a target action model;

and obtaining a target video based on the target action model.