Method for synthesizing computer video

Document No. 1326795 · Published 2020-07-14

Abstract

The invention, designed by Yang Chaohui (杨超辉, created 2020-04-22), discloses a method for synthesizing computer video. The method comprises receiving a video synthesis request from a user, the request including video material, audio material, and material requirements determined according to the request; generating, according to the determination result, time-domain distribution information for the video synthesis; synthesizing each media material with intelligent combination software; and, after synthesis, denoising the synthesized video. The denoising begins by extracting three consecutive frames of the input video: the previous frame, the current frame, and the next frame. During the synthesis of video, pictures, and audio, the corresponding special effects can be added at the same time, which shortens the time required for video synthesis and improves the user experience; the method removes noise components well, reduces blocking artifacts, preserves high-frequency detail, and improves subjective visual quality.

1. A method for synthesizing computer video, comprising: receiving a video synthesis request from a user, the request including video material, audio material, and material requirements determined according to the request; generating, according to the determination result, time-domain distribution information for the video synthesis; synthesizing each media material with intelligent combination software; and, after synthesis, denoising the synthesized video, the denoising comprising the following steps:

Step (1) extracting three consecutive frames of the input video: the previous frame, the current frame, and the next frame;

Step (2) performing simple edge detection on the current frame image, then performing block noise variance estimation, and setting a motion-detection threshold T according to the noise variance;

Step (3) performing motion estimation, after down-sampling, between the previous frame and the current frame and between the next frame and the current frame, respectively, and obtaining a forward matching block and a backward matching block according to the minimum SAD (sum of absolute differences) criterion;

Step (4) performing motion detection on the pre-sampling blocks according to the forward and backward matching blocks: if the MAD values between the matching blocks and the current block are both less than the threshold T, the region is treated as rigid motion and temporal filtering is applied; if the MAD value between a matching block and the current block is greater than the threshold T, the block is considered non-rigid motion and adaptive spatial filtering is applied according to the edge detection result;

Step (5) performing a weighted average of the filtering results of step (4), and outputting the result as the previous frame of the next filtering pass so that it participates in the recursive filtering.

2. The method according to claim 1, wherein the material requirements comprise the media materials required for the video synthesis, the presentation time period of each media material in the expected synthesis result, the rendering special effect corresponding to each media material, and the rendering time period corresponding to each rendering special effect, all determined according to the video synthesis request.

3. The method according to claim 1, wherein the time-domain distribution information comprises: the time-domain distribution of each media material in the expected synthesis result, and the time-domain distribution of the rendering special effect corresponding to each media material in the expected synthesis result.

4. The method according to claim 1, wherein in step (2) a Sobel operator is used to perform edge detection and the coordinate values of the edge points are recorded; the image is then divided into non-overlapping 16 × 16 sub-blocks, and any sub-block Bmn containing N consecutive edge points is excluded, noise estimation being performed on the remaining sub-blocks. The noise estimation is based on intra-block neighborhood correlation and is calculated as follows: for each pixel in a block, compute the average of the absolute differences between that pixel and all of its neighboring pixels; the sum of these averages gives ψ, the intra-block neighborhood correlation. The block with the smallest ψ among all blocks is selected, its mean and variance are taken as the mean and variance of the noise, and the threshold T is set to this block variance.

5. A method of computer video synthesis as claimed in claim 1, wherein in step (3) the SAD criterion is calculated as SAD(i, j) = Σ(m, n) |f(m, n, k) − f(m + i, n + j, k − 1)|, where (m, n) denotes the coordinate position of a pixel in the image, k denotes the frame number of the image in the video, (i, j) is the displacement vector between pixels (m, n, k) and (m + i, n + j, k − 1), and f(m, n, k) and f(m + i, n + j, k − 1) are the gray values of pixels (m, n, k) and (m + i, n + j, k − 1) in the current frame fk and the reference frame fk−1, respectively; if SAD(i0, j0) reaches its minimum at a certain displacement vector (i0, j0), that vector is the best block-matching motion vector.

6. A method for computer video composition according to claim 1, wherein in step (4) the MAD value is calculated as MAD(i, j) = (1 / (M × N)) Σ(m, n) |f(m, n, k) − f(m + i, n + j, k − 1)|, where M × N is the block size, (m, n) denotes the coordinate position of a pixel in the image, k denotes the frame number of the image in the video, (i, j) is the displacement vector between pixels (m, n, k) and (m + i, n + j, k − 1), and f(m, n, k) and f(m + i, n + j, k − 1) are the gray values of pixels (m, n, k) and (m + i, n + j, k − 1) in the current frame fk and the reference frame fk−1, respectively.

7. A method of computer video synthesis according to claim 1, wherein in step (4) the temporal filtering is calculated as POUT1 = w × P(t − 1) + (1 − w) × P(t) and POUT2 = w × P(t) + (1 − w) × P(t + 1), where P(t − 1) and P(t + 1) denote the previous and next frames after up-sampling recovery, P(t) denotes the current frame after up-sampling recovery, POUT1 is the result of temporal weighted filtering of the previous frame and the current frame, POUT2 is the result of temporal weighted filtering of the next frame and the current frame, and w is a weight coefficient.

8. A method for computer video synthesis according to claim 1, wherein in step (4) the adaptive spatial filtering weight is calculated as w(i, j) = wd(i, j) × wr(i, j), where wd(i, j) is the spatial proximity factor and wr(i, j) is the luminance proximity factor.

9. A method for computer video synthesis according to claim 8, wherein wd(i, j) = exp(−((i − ic)² + (j − jc)²) / (2σd²)) and wr(i, j) = exp(−(f(i, j) − f(ic, jc))² / (2σr²)), where (ic, jc) is the position of the pixel being filtered, σd and σr control the distance difference and the luminance difference between pixels, respectively, and σd is an adaptive filter coefficient whose value is twice the variance of the noise estimate.

10. The method of claim 1, wherein in step (5) the weighted average is calculated as P0 = (POUT1 || POUT3) × 0.6 + (POUT2 || POUT4) × 0.4, where POUT1 and POUT3 respectively denote the temporal-filtering and spatial-filtering results for the current frame and the previous frame, POUT2 and POUT4 respectively denote the temporal-filtering and spatial-filtering results for the current frame and the next frame, || denotes "or" (whichever branch was selected in step (4)), 0.6 and 0.4 are the weighting coefficients, and P0 is the final output result.

Technical Field

The invention relates to the field of video synthesis, and in particular to a computer video synthesis method.

Background

In this information age, people often use mobile terminals such as mobile phones and tablet computers to shoot videos and pictures or to record audio, so as to capture moments of their work and life. People can also install software with media-material editing functions on a computer, combine the shot videos and pictures and the recorded audio into a dynamic video with sound, and add various special effects to the synthesized video.

However, with prior-art media editing software, synthesizing video, pictures, and audio and adding special effects must be done as separate passes: either the video, pictures, and audio are synthesized first and the corresponding special effects are then added to the synthesized video, or the special effects are first added to each material individually and the materials with special effects are then synthesized. The editing process is therefore cumbersome and time-consuming, and the user experience is poor.

Moreover, because of factors such as the internal structure of the camera device and the external environment, noise is inevitably introduced during video acquisition, storage, and transmission. Noise not only seriously degrades the subjective quality of the video image but also introduces extra high-frequency components, wasting bits on useless information. The presence of noise also hampers image enhancement, object recognition, and similar tasks. Noise reduction is therefore one of the most critical and common processes in video image processing systems. Typical transform-domain noise reduction algorithms, such as Fourier-transform and wavelet-transform filtering, remove noise by analyzing and screening the coefficients of the transformed signal while retaining the useful signal; they preserve image edges and details well, but selecting a suitable wavelet basis is difficult, which limits their application.

Disclosure of Invention

The invention aims to remedy the defects of the prior art by providing a computer video synthesis method.

To achieve this purpose, the invention adopts the following technical scheme:

A method for synthesizing computer video comprises: receiving a video synthesis request from a user, the request including video material, audio material, and material requirements determined according to the request; generating, according to the determination result, time-domain distribution information for the video synthesis; synthesizing each media material with intelligent combination software; and, after synthesis, denoising the synthesized video, the denoising comprising the following steps:

Step (1) extracting three consecutive frames of the input video: the previous frame, the current frame, and the next frame;

Step (2) performing simple edge detection on the current frame image, then performing block noise variance estimation, and setting a motion-detection threshold T according to the noise variance;

Step (3) performing motion estimation, after down-sampling, between the previous frame and the current frame and between the next frame and the current frame, respectively, and obtaining a forward matching block and a backward matching block according to the minimum SAD (sum of absolute differences) criterion;

Step (4) performing motion detection on the pre-sampling blocks according to the forward and backward matching blocks: if the MAD values between the matching blocks and the current block are both less than the threshold T, the region is treated as rigid motion and temporal filtering is applied; if the MAD value between a matching block and the current block is greater than the threshold T, the block is considered non-rigid motion and adaptive spatial filtering is applied according to the edge detection result;

Step (5) performing a weighted average of the filtering results of step (4), and outputting the result as the previous frame of the next filtering pass so that it participates in the recursive filtering. A minimal end-to-end sketch of this loop follows.
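The sketch below is a drastic simplification, assuming Python with numpy: it replaces the block-matching machinery of steps (2)–(4) with a per-pixel motion test and a fixed threshold, keeping only the recursive three-frame structure of the loop.

```python
import numpy as np

def denoise_video(frames, T=25.0):
    """Minimal sketch of the recursive three-frame denoising loop.

    frames : list of 2-D numpy arrays (grayscale frames)
    T      : motion-detection threshold; step (2) would estimate this from
             the block noise variance, it is fixed here for brevity.
    """
    out = [frames[0].astype(np.float64)]       # first frame passes through
    for t in range(1, len(frames) - 1):
        prev = out[-1]                         # recursive input: last filtered frame
        cur = frames[t].astype(np.float64)
        nxt = frames[t + 1].astype(np.float64)
        # Simplified step (4): per-pixel motion test instead of block matching;
        # average temporally where static, pass the pixel through where moving.
        static = (np.abs(cur - prev) < T) & (np.abs(cur - nxt) < T)
        out.append(np.where(static, (prev + cur + nxt) / 3.0, cur))
    out.append(frames[-1].astype(np.float64))  # last frame passes through
    return out

# Usage: denoised = denoise_video([np.random.rand(64, 64) * 255 for _ in range(5)])
```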

Preferably, the material requirements comprise the media materials required for the video synthesis, the presentation time period of each media material in the expected synthesis result, the rendering special effect corresponding to each media material, and the rendering time period corresponding to each rendering special effect, all determined according to the video synthesis request.

Preferably, the time-domain distribution information records: the time-domain distribution of each media material in the expected synthesis result, and the time-domain distribution of the rendering special effect corresponding to each media material in the expected synthesis result.

Preferably, in step (2), a Sobel operator is used to perform edge detection and the coordinate values of the edge points are recorded; the image is then divided into non-overlapping sub-blocks of 16 × 16 pixels, and any sub-block Bmn containing N consecutive edge points is excluded, noise estimation being performed on the remaining sub-blocks. The noise estimation is based on intra-block neighborhood correlation and is calculated as follows: for each pixel in a block, compute the average of the absolute differences between that pixel and all of its neighboring pixels; the sum of these averages gives ψ, the intra-block neighborhood correlation. The block with the smallest ψ among all blocks is selected, its mean and variance are taken as the mean and variance of the noise, and the threshold T is set to this block variance.
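A numpy/scipy sketch of this block-based noise estimator follows. The Sobel kernels and the 16 × 16 block size come from the text; `edge_thresh` and the edge-count cutoff `max_edges` (standing in for the "N consecutive edge points" test) are assumed parameters, and 4-neighbor differences approximate "all adjacent pixels".

```python
import numpy as np
from scipy.ndimage import convolve

def estimate_noise(img, block=16, edge_thresh=100.0, max_edges=8):
    """Estimate noise mean/variance from the flattest 16x16 block (sketch).

    img : 2-D float array. The motion threshold T of step (2) is the
    returned variance.
    """
    # Sobel edge detection on the current frame.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    grad = np.hypot(convolve(img, kx), convolve(img, kx.T))
    edges = grad > edge_thresh                       # assumed edge threshold

    best_psi, best_blk = np.inf, None
    h, w = img.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if edges[y:y+block, x:x+block].sum() >= max_edges:
                continue                             # exclude edge-rich blocks
            b = img[y:y+block, x:x+block]
            # Intra-block neighborhood correlation psi: sum over interior
            # pixels of the mean absolute difference to the 4-neighbors.
            d = (np.abs(b[1:-1, 1:-1] - b[:-2, 1:-1]) + np.abs(b[1:-1, 1:-1] - b[2:, 1:-1])
               + np.abs(b[1:-1, 1:-1] - b[1:-1, :-2]) + np.abs(b[1:-1, 1:-1] - b[1:-1, 2:]))
            psi = (d / 4.0).sum()
            if psi < best_psi:
                best_psi, best_blk = psi, b
    if best_blk is None:                             # all blocks were edge-rich
        best_blk = img[:block, :block]
    return best_blk.mean(), best_blk.var()           # noise mean and variance
```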

Preferably, in step (3), the SAD criterion is calculated as SAD(i, j) = Σ(m, n) |f(m, n, k) − f(m + i, n + j, k − 1)|, where (m, n) denotes the coordinate position of a pixel in the image, k denotes the frame number of the image in the video, (i, j) is the displacement vector between pixels (m, n, k) and (m + i, n + j, k − 1), and f(m, n, k) and f(m + i, n + j, k − 1) are the gray values of pixels (m, n, k) and (m + i, n + j, k − 1) in the current frame fk and the reference frame fk−1, respectively; if SAD(i0, j0) reaches its minimum at a certain displacement vector (i0, j0), that vector is the best block-matching motion vector.
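A brute-force full-search sketch of this minimum-SAD matching (numpy; the block size is taken from the current block, and the search radius is an assumed parameter):

```python
import numpy as np

def best_match(cur_blk, ref, y, x, search=7):
    """Full-search block matching by the minimum-SAD criterion (sketch).

    cur_blk : block from the current frame f_k, top-left corner at (y, x)
    ref     : reference frame f_{k-1}; search : radius of the search window
    Returns the displacement (i0, j0) minimizing SAD, and the matched block.
    """
    n = cur_blk.shape[0]
    best_sad, best_vec = np.inf, (0, 0)
    for i in range(-search, search + 1):
        for j in range(-search, search + 1):
            yy, xx = y + i, x + j
            if yy < 0 or xx < 0 or yy + n > ref.shape[0] or xx + n > ref.shape[1]:
                continue                    # candidate falls outside the frame
            sad = np.abs(cur_blk - ref[yy:yy+n, xx:xx+n]).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (i, j)
    i0, j0 = best_vec
    return (i0, j0), ref[y+i0:y+i0+n, x+j0:x+j0+n]
```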

Preferably, in step (4), the MAD value is calculated as MAD(i, j) = (1 / (M × N)) Σ(m, n) |f(m, n, k) − f(m + i, n + j, k − 1)|, where M × N is the block size, (m, n) denotes the coordinate position of a pixel in the image, k denotes the frame number of the image in the video, (i, j) is the displacement vector between pixels (m, n, k) and (m + i, n + j, k − 1), and f(m, n, k) and f(m + i, n + j, k − 1) are the gray values of pixels (m, n, k) and (m + i, n + j, k − 1) in the current frame fk and the reference frame fk−1, respectively.
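Since the MAD is simply the SAD normalized by the block area, a sketch is a one-liner:

```python
import numpy as np

def mad(cur_blk, ref_blk):
    """Mean absolute difference between the current block and a matched block."""
    return np.abs(cur_blk - ref_blk).mean()

# Step (4) decision sketch: rigid = mad(cur, fwd_blk) < T and mad(cur, bwd_blk) < T
```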

Preferably, in step (4), the temporal filtering is calculated as POUT1 = w × P(t − 1) + (1 − w) × P(t) and POUT2 = w × P(t) + (1 − w) × P(t + 1), where P(t − 1) and P(t + 1) denote the previous and next frames after up-sampling recovery, P(t) denotes the current frame after up-sampling recovery, POUT1 is the result of temporal weighted filtering of the previous frame and the current frame, POUT2 is the result of temporal weighted filtering of the next frame and the current frame, and w is a weight coefficient.
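A direct transcription of these two equations (numpy; the default w = 0.5 is an assumption, since the text leaves the weight unspecified):

```python
import numpy as np

def temporal_filter(p_prev, p_cur, p_next, w=0.5):
    """Two-sided temporal weighted filtering (sketch).

    p_prev, p_next : matched blocks from the previous/next frame after
                     up-sampling recovery; p_cur : current block.
    """
    pout1 = w * p_prev + (1.0 - w) * p_cur   # previous frame vs current frame
    pout2 = w * p_cur + (1.0 - w) * p_next   # current frame vs next frame
    return pout1, pout2
```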

Preferably, in step (4), the adaptive spatial filtering weight is calculated as w(i, j) = wd(i, j) × wr(i, j), where wd(i, j) is the spatial proximity factor and wr(i, j) is the luminance proximity factor.

Preferably, wd(i, j) = exp(−((i − ic)² + (j − jc)²) / (2σd²)) and wr(i, j) = exp(−(f(i, j) − f(ic, jc))² / (2σr²)), where (ic, jc) is the position of the pixel being filtered, σd and σr control the distance difference and the luminance difference between pixels, respectively, and σd is an adaptive filter coefficient whose value is twice the variance of the noise estimate.
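A numpy sketch of these weights. The Gaussian forms are the standard bilateral-filter expressions, assumed here to match the formulas the text refers to; both sigmas are passed in as parameters, with the text setting σd adaptively to twice the estimated noise variance.

```python
import numpy as np

def bilateral_weights(patch, sigma_d, sigma_r):
    """Adaptive spatial (bilateral) filter weights w = wd * wr (sketch).

    patch   : odd-sized neighborhood centered on the pixel being filtered
    sigma_d : spatial sigma (per the text, twice the noise-estimate variance)
    sigma_r : luminance sigma
    """
    n = patch.shape[0]
    c = n // 2
    ii, jj = np.mgrid[0:n, 0:n]
    # wd: spatial proximity factor (Gaussian in pixel distance).
    wd = np.exp(-((ii - c) ** 2 + (jj - c) ** 2) / (2.0 * sigma_d ** 2))
    # wr: luminance proximity factor (Gaussian in gray-level difference).
    wr = np.exp(-((patch - patch[c, c]) ** 2) / (2.0 * sigma_r ** 2))
    w = wd * wr
    return w / w.sum()                      # normalized weights

# Usage: filtered_pixel = (bilateral_weights(patch, 4.0, 12.0) * patch).sum()
```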

Preferably, in step (5), the weighted average is calculated as P0 = (POUT1 || POUT3) × 0.6 + (POUT2 || POUT4) × 0.4, where POUT1 and POUT3 respectively denote the temporal-filtering and spatial-filtering results for the current frame and the previous frame, POUT2 and POUT4 respectively denote the temporal-filtering and spatial-filtering results for the current frame and the next frame, || denotes "or" (whichever branch was selected in step (4)), 0.6 and 0.4 are the weighting coefficients, and P0 is the final output result.
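A sketch of the fusion rule with the "||" selection made explicit via a `rigid` flag. Reading "||" as "take the temporal result for blocks that passed the rigid-motion test of step (4), the spatial result otherwise" is an interpretation, not stated verbatim in the text.

```python
def fuse(pout1, pout2, pout3, pout4, rigid):
    """Final weighted average P0 (sketch).

    pout1/pout2 : temporal-filtering results (previous/next frame side)
    pout3/pout4 : spatial-filtering results (previous/next frame side)
    rigid       : True if the block passed the rigid-motion test, selecting
                  the temporal branch (the '||' in the formula).
    """
    a = pout1 if rigid else pout3
    b = pout2 if rigid else pout4
    return 0.6 * a + 0.4 * b
```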

The invention has the following beneficial effects:

1. Compared with the prior art, the invention can add the corresponding special effects during the synthesis of video, pictures, and audio, which shortens the time required for video synthesis and improves the user experience.

2. The invention estimates the noise intensity well even when the noise is small, so the filter coefficients can be set more accurately; at the same time, the improved bilateral filter adopted here has stronger filtering capability than the classical filter. The algorithm estimates noise more accurately and distinguishes rigid-motion blocks from non-rigid-motion blocks better, so it introduces no motion smear, and its PSNR is on average 0.64 dB higher than that of the reference algorithm. The invention first performs edge detection and estimates the noise intensity, then obtains the motion information and local structure of each pixel through techniques such as down-sampling and motion estimation, and adaptively selects different filtering strategies according to this information. The invention removes noise components well, reduces blocking artifacts, preserves the details of high-frequency regions, and improves subjective visual perception.

Drawings

Fig. 1 is a flowchart of a method for synthesizing a computer video according to the present invention.

Fig. 2 is a schematic diagram of the material requirements of a method for computer video composition according to the present invention.

Fig. 3 is a flowchart of the operation of the noise reduction part of the proposed method for computer video composition.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention.

Referring to Figs. 1-3, a method for synthesizing a computer video comprises: receiving a video synthesis request from a user, the request including video material, audio material, and material requirements determined according to the request; generating, according to the determination result, time-domain distribution information for the video synthesis; synthesizing each media material with intelligent combination software; and, after synthesis, denoising the synthesized video, the denoising comprising the following steps:

Step (1) extracting three consecutive frames of the input video: the previous frame, the current frame, and the next frame;

Step (2) performing simple edge detection on the current frame image, then performing block noise variance estimation, and setting a motion-detection threshold T according to the noise variance;

Step (3) performing motion estimation, after down-sampling, between the previous frame and the current frame and between the next frame and the current frame, respectively, and obtaining a forward matching block and a backward matching block according to the minimum SAD (sum of absolute differences) criterion;

Step (4) performing motion detection on the pre-sampling blocks according to the forward and backward matching blocks: if the MAD values between the matching blocks and the current block are both less than the threshold T, the region is treated as rigid motion and temporal filtering is applied; if the MAD value between a matching block and the current block is greater than the threshold T, the block is considered non-rigid motion and adaptive spatial filtering is applied according to the edge detection result;

Step (5) performing a weighted average of the filtering results of step (4), and outputting the result as the previous frame of the next filtering pass so that it participates in the recursive filtering.

The material requirements comprise the media materials required for the video synthesis, the presentation time period of each media material in the expected synthesis result, the rendering special effect corresponding to each media material, and the rendering time period corresponding to each rendering special effect, all determined according to the video synthesis request. The time-domain distribution information records the time-domain distribution of each media material in the expected synthesis result and the time-domain distribution of the corresponding rendering special effects.

In step (2), a Sobel operator is used to perform edge detection and the coordinate values of the edge points are recorded; the image is then divided into non-overlapping sub-blocks of 16 × 16 pixels, and any sub-block Bmn containing N consecutive edge points is excluded, noise estimation being performed on the remaining sub-blocks. The noise estimation is based on intra-block neighborhood correlation: for each pixel in a block, compute the average of the absolute differences between that pixel and all of its neighboring pixels; the sum of these averages gives ψ, the intra-block neighborhood correlation. The block with the smallest ψ among all blocks is selected, its mean and variance are taken as the mean and variance of the noise, and the threshold T is set to this block variance.

In step (3), the SAD criterion is calculated as SAD(i, j) = Σ(m, n) |f(m, n, k) − f(m + i, n + j, k − 1)|, where (m, n) denotes the coordinate position of a pixel in the image, k denotes the frame number of the image in the video, (i, j) is the displacement vector between pixels (m, n, k) and (m + i, n + j, k − 1), and f(m, n, k) and f(m + i, n + j, k − 1) are the gray values of pixels (m, n, k) and (m + i, n + j, k − 1) in the current frame fk and the reference frame fk−1, respectively; if SAD(i0, j0) reaches its minimum at a certain displacement vector (i0, j0), that vector is the best block-matching motion vector.

In step (4), the MAD value is calculated as MAD(i, j) = (1 / (M × N)) Σ(m, n) |f(m, n, k) − f(m + i, n + j, k − 1)|, with M × N the block size and the remaining symbols defined as above. The temporal filtering is calculated as POUT1 = w × P(t − 1) + (1 − w) × P(t) and POUT2 = w × P(t) + (1 − w) × P(t + 1), where P(t − 1) and P(t + 1) denote the previous and next frames after up-sampling recovery, P(t) denotes the current frame after up-sampling recovery, POUT1 is the result of temporal weighted filtering of the previous frame and the current frame, POUT2 is the result of temporal weighted filtering of the next frame and the current frame, and w is a weight coefficient. The adaptive spatial filtering weight is calculated as w(i, j) = wd(i, j) × wr(i, j), where wd(i, j) is the spatial proximity factor, wr(i, j) is the luminance proximity factor, wd(i, j) = exp(−((i − ic)² + (j − jc)²) / (2σd²)), and wr(i, j) = exp(−(f(i, j) − f(ic, jc))² / (2σr²)), with (ic, jc) the position of the pixel being filtered; σd and σr control the distance difference and the luminance difference between pixels, respectively, and σd is an adaptive filter coefficient whose value is twice the variance of the noise estimate.

In step (5), the weighted average is calculated as P0 = (POUT1 || POUT3) × 0.6 + (POUT2 || POUT4) × 0.4, where POUT1 and POUT3 respectively denote the temporal-filtering and spatial-filtering results for the current frame and the previous frame, POUT2 and POUT4 respectively denote the temporal-filtering and spatial-filtering results for the current frame and the next frame, || denotes "or" (whichever branch was selected in step (4)), 0.6 and 0.4 are the weighting coefficients, and P0 is the final output result.

The above description covers only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention, according to the technical solutions and inventive concept thereof, shall fall within the protection scope of the present invention.
