Method for intelligently editing video

Document No.: 1548226  Publication date: 2020-01-17

Description: This technology, "A Method for Intelligently Editing Video," was designed and created by Chen Lingyun and Wu Weiping on 2019-09-29. Abstract: The invention relates to the technical field of video editing, and in particular to a method for intelligently editing video. The method comprises the following steps: the user presets an aspect ratio for the output video through an interface or by other means; the video material to be edited is imported; the imported material is decoded, and one image is extracted per frame as the analysis object; the video is segmented into shots; the frame of each segmented shot is cropped; and the cropped results of all shots are composited in chronological order to produce the final video. In this method, the most important content in the video frame is found through algorithms such as face detection and object detection, the crop region can be adjusted dynamically according to the input aspect ratio, and the most beautiful cropping scheme is found by aesthetic scoring, so that footage can be distributed at low cost to multiple platforms with different aspect-ratio requirements.

1. A method for intelligently editing video, comprising the following steps:

S1, a user presets an aspect ratio for the output video through an interface or by other means;

S2, the video material to be edited is imported;

S3, the imported video material is decoded, and one image is extracted per frame as the analysis object;

S4, the video is segmented into shots;

S5, the frame of each individual shot produced in the previous step is cropped;

and S6, the cropped results of all shots are composited in chronological order to obtain the final video.

2. The method for intelligently editing video according to claim 1, wherein in S4 the video is segmented into shots as follows:

①, given a video stream, a histogram is computed for each frame;

②, the difference from the previous frame's histogram is calculated; if the difference between two adjacent frames is significantly large, the frame can be judged to be a shot boundary frame;

③, the video is cut at the computed boundary frames; the difference between the histograms can be calculated by the following formula:

$d(H_{t-1}, H_t) = \sum_{i=1}^{N} \left| H_{t-1}(i) - H_{t}(i) \right|$

where $H_t$ is the $N$-bin histogram of frame $t$.

3. The method for intelligently editing video according to claim 1, wherein in S5 each segmented shot is cropped as follows:

①, the faces appearing in each frame are detected by a face detection algorithm to obtain the coordinates and area of each face in the frame;

②, the close-up objects appearing in each frame are detected by an object detection algorithm to obtain the coordinates and area of each object in the frame;

③, the frame is cropped according to the preset video aspect ratio;

④, each candidate crop is given an aesthetic score, and the most beautiful cropping scheme is selected.

4. The method for intelligently editing video according to claim 3, wherein the face detection algorithm is the MTCNN algorithm, comprising the following steps:

Step one: face detection. The face/non-face classification task is trained with a cross-entropy loss function:

$L_i^{det} = -\left( y_i^{det} \log p_i + \left( 1 - y_i^{det} \right) \log \left( 1 - p_i \right) \right)$

where $p_i$ is the predicted probability that sample $x_i$ is a face and $y_i^{det} \in \{0, 1\}$ is the ground-truth label;

Step two: facial landmark localization. This is a regression problem whose target is the sum of squared losses between the predicted landmarks and the calibrated data:

$L_i^{landmark} = \left\lVert \hat{y}_i^{landmark} - y_i^{landmark} \right\rVert_2^2$

where $\hat{y}_i^{landmark}$ is the predicted landmark vector and $y_i^{landmark}$ the calibrated ground truth;

Step three: bounding-box regression. When the IoU is smaller than a set threshold, a series of fine adjustments is applied to the predicted window to bring it closer to the ground-truth value;

Step four: the training objective. An indicator value is introduced to mark whether a sample needs to contribute to a given loss; the objective function is:

$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}$

where $\beta_i^{j} \in \{0, 1\}$ indicates whether sample $i$ contributes to loss $j$ and $\alpha_j$ weights the three tasks.

5. The method for intelligently editing video according to claim 3, wherein the object detection algorithm is the YOLOv3 detection method, comprising the following steps:

Step one: preparing the command statement.

Create a new .cmd file directly in the same directory as darknet.exe, write the command to be executed into the file, and save it; double-clicking the .cmd file then executes the command. The training command statement is: darknet.exe detector train data/obj-leaf.data yolov3-leaf.cfg darknet53.conv.74;

Step two: starting training:

① Create a new command file named yolov3_leaf_train.cmd; right-click to edit it, paste in the training command statement, and save;

② Double-click "yolov3_leaf_train.cmd" to run and debug the training.

6. The method for intelligently editing video according to claim 3, wherein the aesthetic score for each candidate crop is obtained as follows:

Step one: the picture of each cropping scheme is analyzed with an image-aesthetics algorithm, and the aesthetics of the video frame is scored;

Step two: the cropping schemes are sorted in descending order of aesthetic score, and the most beautiful cropping scheme is identified;

Step three: the shot is cropped according to that scheme to obtain the cropped shot.

7. The method for intelligently editing video according to claim 6, wherein the image-aesthetics algorithm uses the NIMA model, comprising the following steps:

①, a pre-trained image-classification network is used as the baseline; suitable off-the-shelf networks include MobileNet, VGG16 and Inception;

②, the last layer of the baseline is replaced with a randomly initialized fully connected (FC) classifier, and the network is pre-trained on the task;

③, the pre-training output is normalized with softmax;

④, the network is trained with an Earth Mover's Distance (EMD) loss, whose expression is as follows:

$EMD(p, \hat{p}) = \left( \frac{1}{N} \sum_{k=1}^{N} \left| CDF_p(k) - CDF_{\hat{p}}(k) \right|^{r} \right)^{1/r}$

wherein

$CDF_p(k) = \sum_{i=1}^{k} p_{s_i}$

is the cumulative distribution of the score probabilities, $p$ and $\hat{p}$ are the ground-truth and predicted distributions over the $N$ score buckets, and $r$ is typically 2;

⑤, with this model, every input picture receives a score (0-10 points) evaluating its quality; the higher the score, the better the picture.

8. The method for intelligently editing video according to claim 6, wherein the cropping schemes are sorted in descending order of aesthetic score using a kernel alignment function; given kernel functions $K$ and $K'$, the formula is as follows:

Given a finite sample set $S = (x_1, x_2, \ldots, x_m)$, the alignment between the kernel matrices $K \in \mathbb{R}^{m \times m}$ and $K' \in \mathbb{R}^{m \times m}$ is defined as:

$\hat{A}(K, K') = \dfrac{\langle K, K' \rangle_F}{\lVert K \rVert_F \, \lVert K' \rVert_F}$

where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product and $\lVert \cdot \rVert_F$ is the Frobenius norm;

Given a training set, different kernel functions are applied to different features to obtain the feature kernel matrices $\{K_1, K_2, \ldots, K_p\}$; using the labels of the training-set samples, a target kernel matrix $K_Y$ is obtained. The higher the alignment value between a feature kernel matrix and the target kernel matrix, the stronger their correlation. The goal of kernel fusion is a fused kernel matrix

$K_\mu = \sum_{t=1}^{p} \mu_t K_t, \qquad \mu_t \ge 0,$

whose alignment with the target kernel matrix is maximized:

$\max_{\mu} \ \hat{A}(K_\mu, K_Y).$

Defining the vector $a = \left( \langle K_1, K_Y \rangle_F, \langle K_2, K_Y \rangle_F, \ldots, \langle K_p, K_Y \rangle_F \right)^{T}$ and the matrix $M \in \mathbb{R}^{p \times p}$ with $M_{i,j} = \langle K_i, K_j \rangle_F$, the optimization problem is:

$\mu^{*} = \arg\max_{\mu \ge 0} \ \dfrac{\mu^{T} a}{\sqrt{\mu^{T} M \mu}}.$

Technical Field

The invention relates to the technical field of video editing, and in particular to a method for intelligently editing video.

Background

Due to the popularization of smartphones, about 70% of smartphone users hold their phones vertically most of the time, so more and more users watch video on a vertical screen. Within many apps (e.g., Douyin/TikTok and similar short-video apps), the video offered is also predominantly portrait (9:16 or 3:4). However, most finished video on the market is still primarily landscape (16:9 or 4:3), and its viewing experience on a phone is poor. The present method was therefore invented with the aim of quickly and cleanly cropping vertical video out of 16:9 or 4:3 landscape video, and more generally of cropping video to any specified aspect ratio.

Disclosure of Invention

It is an object of the present invention to provide a method for intelligently editing video that solves at least some of the above-mentioned deficiencies of the prior art.

To achieve the above object, the present invention provides a method for intelligently editing video, comprising the following steps:

S1, a user presets an aspect ratio for the output video through an interface or by other means;

S2, the video material to be edited is imported;

S3, the imported video material is decoded, and one image is extracted per frame as the analysis object;

S4, the video is segmented into shots;

S5, the frame of each individual shot produced in the previous step is cropped;

and S6, the cropped results of all shots are composited in chronological order to obtain the final video.

Preferably, in S4, the video is segmented into shots as follows:

①, given a video stream, a histogram is computed for each frame;

②, the difference from the previous frame's histogram is calculated; if the difference between two adjacent frames is significantly large, the frame can be judged to be a shot boundary frame;

③, the video is cut at the computed boundary frames; the difference between the histograms can be calculated by the following formula:

$d(H_{t-1}, H_t) = \sum_{i=1}^{N} \left| H_{t-1}(i) - H_{t}(i) \right|$

where $H_t$ is the $N$-bin histogram of frame $t$.

Preferably, in S5, each segmented shot is cropped as follows:

①, the faces appearing in each frame are detected by a face detection algorithm to obtain the coordinates and area of each face in the frame;

②, the close-up objects appearing in each frame are detected by an object detection algorithm to obtain the coordinates and area of each object in the frame;

③, the frame is cropped according to the preset video aspect ratio;

④, each candidate crop is given an aesthetic score, and the most beautiful cropping scheme is selected.

Preferably, the face detection algorithm is the MTCNN algorithm, comprising the following steps:

Step one: face detection. The face/non-face classification task is trained with a cross-entropy loss function:

$L_i^{det} = -\left( y_i^{det} \log p_i + \left( 1 - y_i^{det} \right) \log \left( 1 - p_i \right) \right)$

where $p_i$ is the predicted probability that sample $x_i$ is a face and $y_i^{det} \in \{0, 1\}$ is the ground-truth label;

Step two: facial landmark localization. This is a regression problem whose target is the sum of squared losses between the predicted landmarks and the calibrated data:

$L_i^{landmark} = \left\lVert \hat{y}_i^{landmark} - y_i^{landmark} \right\rVert_2^2$

where $\hat{y}_i^{landmark}$ is the predicted landmark vector and $y_i^{landmark}$ the calibrated ground truth;

Step three: bounding-box regression. When the IoU is smaller than a set threshold, a series of fine adjustments is applied to the predicted window to bring it closer to the ground-truth value;

Step four: the training objective. An indicator value is introduced to mark whether a sample needs to contribute to a given loss; the objective function is:

$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}$

where $\beta_i^{j} \in \{0, 1\}$ indicates whether sample $i$ contributes to loss $j$ and $\alpha_j$ weights the three tasks.

Preferably, the object detection algorithm is the YOLOv3 detection method, comprising the following steps:

Step one: preparing the command statement.

Create a new .cmd file directly in the same directory as darknet.exe, write the command to be executed into the file, and save it; double-clicking the .cmd file then executes the command. The training command statement is: darknet.exe detector train data/obj-leaf.data yolov3-leaf.cfg darknet53.conv.74;

Step two: starting training:

① Create a new command file named yolov3_leaf_train.cmd; right-click to edit it, paste in the training command statement, and save;

② Double-click "yolov3_leaf_train.cmd" to run and debug the training.
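For illustration, detections from the trained network can be read out with OpenCV's DNN module. The sketch below is a minimal inference example under assumptions of my own: the weight file name follows the naming of the training command above ("yolov3-leaf.cfg"/"yolov3-leaf.weights"), and the 0.5 objectness threshold and 416x416 input size are conventional YOLOv3 defaults, not values prescribed by the patent.

import cv2

# Load the Darknet-format config and weights produced by the training step.
net = cv2.dnn.readNetFromDarknet("yolov3-leaf.cfg", "yolov3-leaf.weights")
img = cv2.imread("frame.jpg")
h, w = img.shape[:2]

# YOLOv3 expects a normalized, square RGB blob.
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

for out in outputs:
    for det in out:
        # det = [cx, cy, bw, bh, objectness, class scores...], relative units.
        if det[4] > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            print((int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)))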

Preferably, the aesthetic score for each candidate crop is obtained as follows:

Step one: the picture of each cropping scheme is analyzed with an image-aesthetics algorithm, and the aesthetics of the video frame is scored;

Step two: the cropping schemes are sorted in descending order of aesthetic score, and the most beautiful cropping scheme is identified;

Step three: the shot is cropped according to that scheme to obtain the cropped shot.

Preferably, the image-aesthetics algorithm uses the NIMA model, comprising the following steps:

①, a pre-trained image-classification network is used as the baseline; suitable off-the-shelf networks include MobileNet, VGG16 and Inception;

②, the last layer of the baseline is replaced with a randomly initialized fully connected (FC) classifier, and the network is pre-trained on the task;

③, the pre-training output is normalized with softmax;

④, the network is trained with an Earth Mover's Distance (EMD) loss, whose expression is as follows:

$EMD(p, \hat{p}) = \left( \frac{1}{N} \sum_{k=1}^{N} \left| CDF_p(k) - CDF_{\hat{p}}(k) \right|^{r} \right)^{1/r}$

wherein

$CDF_p(k) = \sum_{i=1}^{k} p_{s_i}$

is the cumulative distribution of the score probabilities;

That is, the model predicts the cumulative probabilities of the scores rather than predicting the probability of each score independently;

⑤, with this model, every input picture receives a score (0-10 points) evaluating its quality; the higher the score, the better the picture.
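As a concrete reference, the EMD loss and the final quality score can be written in a few lines of NumPy. This sketch is derived directly from the formula above (with r = 2 and ten score buckets); the function names are mine, not the patent's.

import numpy as np

def emd_loss(p, p_hat, r=2):
    # p and p_hat are ground-truth and predicted score distributions,
    # each of shape (10,) and summing to 1.
    diff = np.cumsum(p) - np.cumsum(p_hat)  # CDF_p(k) - CDF_phat(k)
    return float(np.mean(np.abs(diff) ** r) ** (1.0 / r))

def mean_score(p_hat):
    # Collapse a distribution over scores 1..10 into the single
    # quality score reported in step five.
    return float(np.sum(np.arange(1, 11) * p_hat))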

Preferably, the cropping schemes are sorted in descending order of aesthetic score using a kernel alignment function; given kernel functions $K$ and $K'$, the formula is as follows:

Given a finite sample set $S = (x_1, x_2, \ldots, x_m)$, the alignment between the kernel matrices $K \in \mathbb{R}^{m \times m}$ and $K' \in \mathbb{R}^{m \times m}$ is defined as:

$\hat{A}(K, K') = \dfrac{\langle K, K' \rangle_F}{\lVert K \rVert_F \, \lVert K' \rVert_F}$

where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product and $\lVert \cdot \rVert_F$ is the Frobenius norm;

Given a training set, different kernel functions are applied to different features to obtain the feature kernel matrices $\{K_1, K_2, \ldots, K_p\}$; using the labels of the training-set samples, a target kernel matrix $K_Y$ is obtained. The higher the alignment value between a feature kernel matrix and the target kernel matrix, the stronger their correlation. The goal of kernel fusion is a fused kernel matrix

$K_\mu = \sum_{t=1}^{p} \mu_t K_t, \qquad \mu_t \ge 0,$

whose alignment with the target kernel matrix is maximized, namely:

$\max_{\mu} \ \hat{A}(K_\mu, K_Y);$

Defining the vector $a = \left( \langle K_1, K_Y \rangle_F, \langle K_2, K_Y \rangle_F, \ldots, \langle K_p, K_Y \rangle_F \right)^{T}$ and the matrix $M \in \mathbb{R}^{p \times p}$ with $M_{i,j} = \langle K_i, K_j \rangle_F$, the optimization problem is:

$\mu^{*} = \arg\max_{\mu \ge 0} \ \dfrac{\mu^{T} a}{\sqrt{\mu^{T} M \mu}}.$
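The quantities in this optimization problem are straightforward to compute. The NumPy sketch below evaluates the alignment value and builds the pair (a, M) from a list of kernel matrices; it follows the formulas above, and the function names are illustrative rather than the patent's implementation.

import numpy as np

def alignment(K1, K2):
    # Frobenius alignment <K1, K2>_F / (||K1||_F * ||K2||_F).
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

def alignment_problem(Ks, KY):
    # Vector a and matrix M of: maximize mu^T a / sqrt(mu^T M mu), mu >= 0.
    a = np.array([np.sum(Ki * KY) for Ki in Ks])  # a_i = <K_i, K_Y>_F
    M = np.array([[np.sum(Ki * Kj) for Kj in Ks] for Ki in Ks])
    return a, M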

Compared with the prior art, the invention has the following beneficial effects:

1. In scenarios that require a change of aspect ratio, this method greatly improves the quality of automatic machine cropping: the most important content in the video frame is found through algorithms such as face detection and object detection, the crop region can be adjusted dynamically according to the input aspect ratio, and the most beautiful cropping scheme is found by aesthetic scoring.

2. The machine can largely replace repetitive human work, improving the overall efficiency of video editing.

3. Because the crop region is adjusted dynamically to the input aspect ratio and the most beautiful cropping scheme is selected by aesthetic scoring, the footage can be distributed at low cost to multiple platforms with different aspect-ratio requirements.

Drawings

FIG. 1 is the overall process flow diagram of the present invention;

FIG. 2 is a flow chart of the steps for cropping the frame of a single video shot according to the present invention;

FIG. 3 is a flow chart of the aesthetic scoring of cropped frames according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1-3, the present invention provides a technical solution:

The invention provides a method for intelligently editing video, comprising the following steps:

S1, a user presets an aspect ratio for the output video through an interface or by other means;

S2, the video material to be edited is imported;

S3, the imported video material is decoded, and one image is extracted per frame as the analysis object;

S4, the video is segmented into shots;

S5, the frame of each individual shot produced in the previous step is cropped;

and S6, the cropped results of all shots are composited in chronological order to obtain the final video.

Further, in S4, the video is segmented into shots as follows:

①, given a video stream, a histogram is computed for each frame;

②, the difference from the previous frame's histogram is calculated; if the difference between two adjacent frames is significantly large, the frame can be judged to be a shot boundary frame;

③, the video is cut at the computed boundary frames; the difference between the histograms can be calculated by the following formula:

$d(H_{t-1}, H_t) = \sum_{i=1}^{N} \left| H_{t-1}(i) - H_{t}(i) \right|$

where $H_t$ is the $N$-bin histogram of frame $t$.
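A minimal working sketch of this segmentation step is given below, assuming OpenCV (cv2) for decoding and histogram computation. The 64-bin grayscale histogram, the L1 normalization, and the 0.5 threshold are illustrative choices of mine, not values fixed by the patent.

import cv2
import numpy as np

def detect_shot_boundaries(video_path, threshold=0.5):
    # Returns the frame indices judged to begin a new shot.
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1).flatten()
        if prev_hist is not None:
            # L1 difference between adjacent-frame histograms; a large
            # value marks a shot boundary frame.
            if float(np.sum(np.abs(hist - prev_hist))) > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries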

Specifically, in S5, each segmented shot is cropped as follows:

①, the faces appearing in each frame are detected by a face detection algorithm to obtain the coordinates and area of each face in the frame;

②, the close-up objects appearing in each frame are detected by an object detection algorithm to obtain the coordinates and area of each object in the frame;

③, the frame is cropped according to the preset video aspect ratio;

④, each candidate crop is given an aesthetic score, and the most beautiful cropping scheme is selected (see the sketch after this list).
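As an illustration of steps ③ and ④, the sketch below enumerates horizontally sliding crop windows at the preset aspect ratio and keeps those containing the detected regions; every name here is hypothetical, and aesthetic_score stands in for the NIMA-based scorer described earlier.

def candidate_crops(frame_w, frame_h, target_ratio, regions, steps=16):
    # Yields (x, y, w, h) windows with w / h == target_ratio; regions is a
    # list of (x, y, w, h) boxes from the face and object detectors.
    crop_h = min(frame_h, int(frame_w / target_ratio))
    crop_w = min(frame_w, int(crop_h * target_ratio))
    y = (frame_h - crop_h) // 2
    for i in range(steps + 1):
        x = (frame_w - crop_w) * i // steps
        # Prefer windows that fully contain every detected region.
        if all(x <= rx and rx + rw <= x + crop_w and
               y <= ry and ry + rh <= y + crop_h
               for rx, ry, rw, rh in regions):
            yield (x, y, crop_w, crop_h)

def best_crop(frame, crops, aesthetic_score):
    # Pick the candidate whose cropped picture scores highest.
    return max(crops, key=lambda c: aesthetic_score(frame, c))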

Shot segmentation can also be performed with various other schemes; for example, using the I-frame identification built into H.264 itself achieves a similar effect.

Further, the face detection algorithm is the MTCNN algorithm, comprising the following steps:

Step one: face detection. The face/non-face classification task is trained with a cross-entropy loss function:

$L_i^{det} = -\left( y_i^{det} \log p_i + \left( 1 - y_i^{det} \right) \log \left( 1 - p_i \right) \right)$

where $p_i$ is the predicted probability that sample $x_i$ is a face and $y_i^{det} \in \{0, 1\}$ is the ground-truth label;

Step two: facial landmark localization. This is a regression problem whose target is the sum of squared losses between the predicted landmarks and the calibrated data:

$L_i^{landmark} = \left\lVert \hat{y}_i^{landmark} - y_i^{landmark} \right\rVert_2^2$

Step three: bounding-box regression. When the IoU is smaller than a set threshold, a series of fine adjustments is applied to the predicted window to bring it closer to the ground-truth value; in practical application, the mapping from the predicted window to the ground-truth window can be understood as a linear regression on the loss function;

Step four: the training objective. An indicator value is introduced to mark whether a sample needs to contribute to a given loss; the objective function is:

$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}$

where $\beta_i^{j} \in \{0, 1\}$ indicates whether sample $i$ contributes to loss $j$ and $\alpha_j$ weights the three tasks.
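For reference, face boxes of the kind used in step ① of the cropping stage can be obtained with the open-source mtcnn Python package (pip install mtcnn), which implements this same three-stage detector. This is a usage sketch under that assumption, not the patent's own code, and the file name is a placeholder.

import cv2
from mtcnn import MTCNN

detector = MTCNN()
# The detector expects RGB input, while OpenCV loads BGR.
frame = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
for face in detector.detect_faces(frame):
    x, y, w, h = face["box"]  # coordinates of the face in the frame
    print(face["confidence"], (x, y, w, h), w * h)  # confidence, box, area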
