Video slicing method and device, electronic equipment and storage medium

Document No.: 1957072 Publication date: 2021-12-10

Summary: This technology, a video slicing method and apparatus, an electronic device, and a storage medium, was created by 成超, 蔡媛, 樊鸿飞, 汪贤, and 鲁方波 on 2020-06-10. The embodiments of the present application provide a video slicing method and apparatus, an electronic device, and a storage medium, and relate to the technical field of video processing. The method comprises: starting from the first video frame of a video to be sliced, calculating the image similarity between every two adjacent video frames; and, if the image similarity between two adjacent video frames is smaller than a preset threshold, slicing the video between those two video frames. The present application can improve the accuracy of video slicing.

1. A method of video slicing, the method comprising:

calculating the image similarity between every two adjacent video frames from the first video frame of the video to be sliced;

and if the image similarity between two adjacent video frames is smaller than a preset threshold, performing video slicing between the two adjacent video frames whose image similarity is smaller than the preset threshold.

2. The method of claim 1, wherein the calculating the image similarity between each two adjacent video frames comprises:

calculating the image similarity between every two adjacent video frames by a specified similarity algorithm; or,

for every two adjacent video frames, calculating the image similarity between the two adjacent video frames with each of a plurality of similarity algorithms, and performing weighted summation on the image similarities obtained with the plurality of similarity algorithms to obtain the finally required image similarity between the two adjacent video frames.

3. The method according to claim 2, wherein said calculating the image similarity between the two adjacent video frames by using a plurality of similarity algorithms respectively comprises:

calculating a normalized correlation coefficient between the two adjacent video frames;

calculating the similarity of the histograms between the two adjacent video frames;

the step of performing weighted summation on the image similarity calculated by adopting a plurality of similarity algorithms to obtain the finally required image similarity between the two adjacent video frames comprises the following steps:

and performing weighted summation on the normalized correlation coefficient and the histogram similarity based on a first weighting coefficient and a second weighting coefficient, and taking the result of weighted summation as the finally required image similarity between the two adjacent video frames, wherein the first weighting coefficient is the weight of the normalized correlation coefficient, and the second weighting coefficient is the weight of the histogram similarity.

4. The method according to any of claims 1-3, wherein prior to said calculating image similarity between each two adjacent video frames, the method further comprises:

compressing the resolution of each video frame included in the video to be sliced into a preset resolution.

5. The method of claim 3, wherein the first weighting factor is 0.7 and the second weighting factor is 0.3.

6. A video slicing apparatus, the apparatus comprising:

the calculating module is used for calculating the image similarity between every two adjacent video frames from the first video frame of the video to be sliced;

and the slicing module is used for performing video slicing between two adjacent video frames if the image similarity between the two adjacent video frames is smaller than a preset threshold.

7. The apparatus of claim 6, wherein the computing module is specifically configured to:

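calculating the image similarity between every two adjacent video frames by a specified similarity algorithm; or,

for every two adjacent video frames, calculating the image similarity between the two adjacent video frames with each of a plurality of similarity algorithms, and performing weighted summation on the image similarities obtained with the plurality of similarity algorithms to obtain the finally required image similarity between the two adjacent video frames.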

8. The apparatus of claim 7, wherein the computing module is specifically configured to:

calculating a normalized correlation coefficient between the two adjacent video frames;

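calculating the histogram similarity between the two adjacent video frames; and

performing weighted summation on the normalized correlation coefficient and the histogram similarity based on a first weighting coefficient and a second weighting coefficient, and taking the result of the weighted summation as the finally required image similarity between the two adjacent video frames, wherein the first weighting coefficient is the weight of the normalized correlation coefficient, and the second weighting coefficient is the weight of the histogram similarity.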

9. The apparatus according to any one of claims 6-8, further comprising:

and the compression module is used for compressing the resolution of each video frame included in the video to be sliced into a preset resolution.

10. The apparatus of claim 8, wherein the first weighting factor is 0.7 and the second weighting factor is 0.3.

11. An electronic device comprising a processor and a memory;

a memory for storing a computer program;

a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.

12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.

Technical Field

The present application relates to the field of video processing technologies, and in particular, to a video slicing method and apparatus, an electronic device, and a storage medium.

Background

Video slicing refers to cutting a video into a plurality of video segments in units of shots, and each video segment can be referred to as a shot. A shot is a sequence of video frames captured by a camera in one continuous take, so the frames have a continuous spatio-temporal relationship.

In the related art, video slicing is generally performed with the Fast Forward Mpeg (FFmpeg) tool. For example, fig. 1 shows a video frame sequence of a live soccer video that consists of 4 consecutive shots. Shot one is a video frame sequence that looks down on the whole penalty area from a high vantage point while a player takes a shot at goal; shot two is a close-up of the player; shot three is a replay of the shot at goal; shot four is a close-up of an opposing player. The scissors in fig. 1 mark the positions at which the FFmpeg tool slices the live soccer video.

However, when the FFmpeg tool slices a video, one shot may be split into two or more video slices, or several shots may be merged into one video slice, so the slicing accuracy is low.

Disclosure of Invention

An object of the embodiments of the present application is to provide a video slicing method, an apparatus, an electronic device, and a storage medium, so as to improve accuracy of video slicing. The specific technical scheme is as follows:

in a first aspect, an embodiment of the present application provides a video slicing method, where the method includes:

calculating the image similarity between every two adjacent video frames from the first video frame of the video to be sliced;

and if the image similarity between two adjacent video frames is smaller than a preset threshold, performing video slicing between the two adjacent video frames whose image similarity is smaller than the preset threshold.

In one possible implementation, the calculating the image similarity between every two adjacent video frames includes:

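calculating the image similarity between every two adjacent video frames by a specified similarity algorithm; or,

for every two adjacent video frames, calculating the image similarity between the two adjacent video frames with each of a plurality of similarity algorithms, and performing weighted summation on the image similarities obtained with the plurality of similarity algorithms to obtain the finally required image similarity between the two adjacent video frames.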

In a possible implementation manner, the calculating the image similarity between the two adjacent video frames by using a plurality of similarity algorithms respectively includes:

calculating a normalized correlation coefficient between the two adjacent video frames;

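calculating the histogram similarity between the two adjacent video frames; and

performing weighted summation on the normalized correlation coefficient and the histogram similarity based on a first weighting coefficient and a second weighting coefficient, and taking the result of the weighted summation as the finally required image similarity between the two adjacent video frames, wherein the first weighting coefficient is the weight of the normalized correlation coefficient, and the second weighting coefficient is the weight of the histogram similarity.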

In one possible implementation, before the calculating the image similarity between each two adjacent video frames, the method further includes:

compressing the resolution of each video frame included in the video to be sliced into a preset resolution.

In one possible implementation, the first weighting factor is 0.7 and the second weighting factor is 0.3.

In a second aspect, an embodiment of the present application provides a video slicing apparatus, including:

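the calculating module is used for calculating the image similarity between every two adjacent video frames from the first video frame of the video to be sliced;

and the slicing module is used for performing video slicing between two adjacent video frames if the image similarity between the two adjacent video frames is smaller than a preset threshold.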

In a possible implementation manner, the calculation module is specifically configured to:

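calculating the image similarity between every two adjacent video frames by a specified similarity algorithm; or,

for every two adjacent video frames, calculating the image similarity between the two adjacent video frames with each of a plurality of similarity algorithms, and performing weighted summation on the image similarities obtained with the plurality of similarity algorithms to obtain the finally required image similarity between the two adjacent video frames.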

In a possible implementation manner, the calculation module is specifically configured to:

calculating a normalized correlation coefficient between the two adjacent video frames;

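calculating the histogram similarity between the two adjacent video frames; and

performing weighted summation on the normalized correlation coefficient and the histogram similarity based on a first weighting coefficient and a second weighting coefficient, and taking the result of the weighted summation as the finally required image similarity between the two adjacent video frames, wherein the first weighting coefficient is the weight of the normalized correlation coefficient, and the second weighting coefficient is the weight of the histogram similarity.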

In one possible implementation, the apparatus further includes:

and the compression module is used for compressing the resolution of each video frame included in the video to be sliced into a preset resolution.

In one possible implementation, the first weighting factor is 0.7 and the second weighting factor is 0.3.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;

a memory for storing a computer program;

a processor for implementing the method steps of the first aspect when executing the program stored in the memory.

In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of the first aspect.

In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.

With the video slicing method and apparatus, the electronic device, and the storage medium provided by the embodiments of the present application, the image similarity between every two adjacent video frames is calculated starting from the first video frame of the video to be sliced, and if the image similarity between two adjacent video frames is smaller than the preset threshold, the video is sliced between those two frames. Because the image similarity between video frames of the same shot is high and the image similarity between video frames of different shots is low, the image similarity distinguishes frames that belong to the same shot from frames that belong to different shots. Slicing the video according to the image similarity between adjacent video frames therefore improves the accuracy of video slicing compared with the prior art.

Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is an exemplary diagram of the position of a video slice provided in the background art;

Fig. 2 is a flowchart of a video slicing method according to an embodiment of the present application;

Fig. 3 is a flowchart of another video slicing method provided in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a video slicing apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of another video slicing apparatus according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

An embodiment of the present application provides a video slicing method applied to an electronic device. The electronic device may be a terminal with video processing capability, such as a mobile phone or a computer. As shown in fig. 2, the method includes:

S201, starting from a first video frame of a video to be sliced, calculating the image similarity between every two adjacent video frames.

For example, the image similarity between the first video frame and the second video frame, and the image similarity between the second video frame and the third video frame may be calculated until the image similarity between the penultimate video frame and the last video frame is calculated.

S202, if the image similarity between two adjacent video frames is smaller than a preset threshold value, performing video slicing between the two adjacent video frames whose image similarity is smaller than the preset threshold value.

It can be understood that if the image similarity is smaller than the preset threshold, it indicates that the difference between the picture contents included in two adjacent video frames is large, and therefore the two adjacent video frames do not belong to the same shot, and further, video slicing can be performed between the two adjacent video frames.

On the contrary, if the image similarity is greater than or equal to the preset threshold, it indicates that the picture contents included in the two adjacent video frames are relatively similar and belong to the same shot, and further, video slicing is not performed between the two adjacent video frames.

The preset threshold may be an empirical value.
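Steps S201 and S202 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `find_cut_points` and `split_into_shots` are hypothetical, and the toy similarity function, which treats each frame as a single brightness value, is an assumption made only to keep the example self-contained.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def find_cut_points(
    frames: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    threshold: float,
) -> List[int]:
    """Return every index i for which a cut falls between frames[i] and frames[i + 1]."""
    cuts = []
    # S201: walk the adjacent pairs (0, 1), (1, 2), ..., (n - 2, n - 1).
    for i in range(len(frames) - 1):
        # S202: a similarity below the threshold marks a shot boundary.
        if similarity(frames[i], frames[i + 1]) < threshold:
            cuts.append(i)
    return cuts


def split_into_shots(frames: Sequence[Frame], cuts: List[int]) -> List[List[Frame]]:
    """Split the frame sequence into shots at the given cut indices."""
    shots, start = [], 0
    for i in cuts:
        shots.append(list(frames[start:i + 1]))
        start = i + 1
    shots.append(list(frames[start:]))
    return shots


# Toy stand-in (assumption): each "frame" is one brightness value in 0..255,
# and similarity is 1 minus the normalized absolute difference.
def toy_similarity(a: float, b: float) -> float:
    return 1.0 - abs(a - b) / 255.0


frames = [10, 12, 11, 200, 205, 203]
cuts = find_cut_points(frames, toy_similarity, threshold=0.75)
shots = split_into_shots(frames, cuts)
```

Here the only pair whose similarity falls below the threshold is the third and fourth frame, so the sequence is split into two shots of three frames each.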

With the video slicing method provided by the embodiment of the present application, the image similarity between every two adjacent video frames is calculated starting from the first video frame of the video to be sliced, and if the image similarity between two adjacent video frames is smaller than the preset threshold, the video is sliced between those two frames. Because the image similarity between video frames of the same shot is high and the image similarity between video frames of different shots is low, the image similarity distinguishes frames that belong to the same shot from frames that belong to different shots. Slicing the video according to the image similarity between two adjacent video frames therefore improves the accuracy of video slicing compared with the prior art.

In S201 above, the image similarity between every two adjacent video frames may be calculated in either of the following two modes:

Mode one: the image similarity between every two adjacent video frames is calculated by a specified similarity algorithm.

The specified similarity algorithm may be any one of a normalized correlation coefficient algorithm, a histogram similarity algorithm, a Structural Similarity (SSIM) algorithm, and a perceptual hash (pHash) algorithm. Of course, the embodiment of the present application is not limited thereto, and the specified similarity algorithm may also be another algorithm for calculating the image similarity.

In the embodiment of the present application, a normalized correlation coefficient between two adjacent video frames may be calculated, and the calculated normalized correlation coefficient is used as an image similarity between the two adjacent video frames. Because the normalized correlation coefficient can better reflect the similarity of the contents included in two adjacent video frames, the normalized correlation coefficient is taken as the image similarity in the embodiment of the application, so that the accuracy of the video slice is higher.
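One common form of the normalized correlation coefficient is the zero-mean normalized cross-correlation of the two frames' pixel values. The sketch below assumes that form and assumes grayscale frames given as equally sized 2-D lists; the source does not specify either. The zero-mean coefficient lies in [-1, 1], while the description states a range of 0 to 1, so negative values are clamped to 0 here. That clamping, the handling of flat frames, and the function name are assumptions.

```python
import math
from typing import Sequence

GrayFrame = Sequence[Sequence[float]]


def normalized_correlation(frame_a: GrayFrame, frame_b: GrayFrame) -> float:
    """Zero-mean normalized correlation of two equally sized grayscale frames."""
    a = [p for row in frame_a for p in row]
    b = [p for row in frame_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    if denominator == 0.0:
        # At least one frame is flat, so the coefficient is undefined.
        # Assumption: identical flat frames count as fully similar.
        return 1.0 if a == b else 0.0
    # Assumption: clamp anti-correlated frames to 0 to stay in the 0..1 range.
    return max(0.0, numerator / denominator)


frame = [[0, 50], [100, 150]]
brighter = [[20, 70], [120, 170]]   # same content, uniformly brighter
inverted = [[150, 100], [50, 0]]    # reversed intensity pattern

same_content = normalized_correlation(frame, brighter)
opposite_content = normalized_correlation(frame, inverted)
```

Because the means are subtracted, a uniform brightness offset does not change the result, while the reversed pattern is clamped to 0.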

Mode two: for every two adjacent video frames, the image similarity between the two adjacent video frames is calculated with each of a plurality of similarity algorithms, and the image similarities obtained with the plurality of similarity algorithms are weighted and summed to obtain the finally required image similarity between the two adjacent video frames.

Wherein the plurality of similarity algorithms may include: a normalized correlation coefficient algorithm, a histogram similarity algorithm, an SSIM algorithm, and a pHash algorithm. Other algorithms for calculating image similarity may also be included.
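The weighted summation over several similarity algorithms can be sketched generically. The function `weighted_similarity` and the two toy algorithms below are hypothetical stand-ins, not the algorithms named above, and the requirement that the weights sum to 1 is an assumption that keeps the result in the same range as the individual scores.

```python
from typing import Callable, Sequence, Tuple

GrayFrame = Sequence[Sequence[float]]
SimilarityFn = Callable[[GrayFrame, GrayFrame], float]


def weighted_similarity(
    frame_a: GrayFrame,
    frame_b: GrayFrame,
    weighted_algorithms: Sequence[Tuple[SimilarityFn, float]],
) -> float:
    """Weighted sum of the scores produced by several similarity algorithms."""
    total_weight = sum(weight for _, weight in weighted_algorithms)
    if abs(total_weight - 1.0) > 1e-9:
        # Assumption: weights are normalized so the result stays in 0..1.
        raise ValueError("weights must sum to 1")
    return sum(weight * fn(frame_a, frame_b) for fn, weight in weighted_algorithms)


# Two toy algorithms (assumptions made for illustration only).
def mean_brightness_similarity(a: GrayFrame, b: GrayFrame) -> float:
    def mean(frame: GrayFrame) -> float:
        pixels = [p for row in frame for p in row]
        return sum(pixels) / len(pixels)
    return 1.0 - abs(mean(a) - mean(b)) / 255.0


def matching_pixel_fraction(a: GrayFrame, b: GrayFrame) -> float:
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(1 for x, y in pairs if x == y) / len(pairs)


frame_a = [[0, 0], [255, 255]]
frame_b = [[0, 0], [255, 0]]
score = weighted_similarity(
    frame_a,
    frame_b,
    [(mean_brightness_similarity, 0.7), (matching_pixel_fraction, 0.3)],
)
```

Both toy algorithms score this pair at 0.75, so the weighted sum is also 0.75 regardless of how the weights are split.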

In an implementation manner of the embodiment of the present application, taking a plurality of similarity algorithms including a normalized correlation coefficient algorithm and a histogram similarity algorithm as an example, as shown in fig. 3, a method for calculating an image similarity between every two adjacent video frames specifically includes the following steps:

S301, calculating a normalized correlation coefficient between two adjacent video frames.

S302, calculating the histogram similarity between two adjacent video frames.

In the embodiment of the present application, the histogram similarity between two adjacent video frames may be calculated using the Bhattacharyya distance.
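A histogram similarity in the range 0 to 1 can be sketched with the Bhattacharyya coefficient, the quantity from which the Bhattacharyya distance is derived (the classic distance is the negative logarithm of the coefficient). Using the coefficient itself as the similarity score, the 16-bin grayscale histogram, and the function names are all assumptions; the source does not say how the distance is mapped to a similarity.

```python
import math
from typing import List, Sequence

GrayFrame = Sequence[Sequence[float]]


def gray_histogram(frame: GrayFrame, bins: int = 16) -> List[float]:
    """Normalized histogram of pixel values in 0..255 (entries sum to 1)."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(int(pixel) * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame must contain at least one pixel")
    return [count / total for count in counts]


def bhattacharyya_coefficient(hist_a: Sequence[float], hist_b: Sequence[float]) -> float:
    """Overlap of two normalized histograms: 1 for identical, 0 for disjoint."""
    return sum(math.sqrt(p * q) for p, q in zip(hist_a, hist_b))


def histogram_similarity(frame_a: GrayFrame, frame_b: GrayFrame, bins: int = 16) -> float:
    """Assumption: use the Bhattacharyya coefficient directly as the similarity."""
    coefficient = bhattacharyya_coefficient(
        gray_histogram(frame_a, bins), gray_histogram(frame_b, bins)
    )
    return min(1.0, coefficient)  # guard against rounding slightly above 1


frame = [[0, 0], [255, 255]]
shuffled = [[255, 0], [0, 255]]     # same pixel values in different positions
dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]

same_distribution = histogram_similarity(frame, shuffled)
disjoint_distribution = histogram_similarity(dark, bright)
```

The shuffled frame scores as fully similar because a histogram ignores pixel positions, which is one reason a histogram measure is usually paired with a content-sensitive measure.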

S303, performing weighted summation on the normalized correlation coefficient and the histogram similarity based on the first weighting coefficient and the second weighting coefficient, and taking the result of the weighted summation as the finally required image similarity between two adjacent video frames.

The first weighting coefficient is the weight of the normalized correlation coefficient, and the second weighting coefficient is the weight of the histogram similarity.

Optionally, the first weighting coefficient and the second weighting coefficient may be set to empirical values, such as 0.7 for the first weighting coefficient and 0.3 for the second weighting coefficient. Of course, the values of the first weighting coefficient and the second weighting coefficient are not limited thereto.

The image similarity between two adjacent video frames can be specifically calculated by the following formula:

$$\mathrm{Score}(I_t, I_{t+1}) = \alpha \cdot \mathrm{CCOEFF}(I_t, I_{t+1}) + (1 - \alpha) \cdot \mathrm{HIST}(I_t, I_{t+1})$$

where $I_t$ is the video frame at time $t$ in the video to be sliced, and $I_{t+1}$ is the video frame at time $t+1$ in the video to be sliced. $\mathrm{Score}(I_t, I_{t+1})$ is the image similarity between the video frame at time $t$ and the video frame at time $t+1$. $\alpha$ is the first weighting coefficient, and $1 - \alpha$ is the second weighting coefficient. $\mathrm{CCOEFF}(I_t, I_{t+1})$ is the normalized correlation coefficient between the video frame at time $t$ and the video frame at time $t+1$, and $\mathrm{HIST}(I_t, I_{t+1})$ is the histogram similarity between the video frame at time $t$ and the video frame at time $t+1$.

The values of $\mathrm{CCOEFF}(I_t, I_{t+1})$ and $\mathrm{HIST}(I_t, I_{t+1})$ both lie in the range 0 to 1, so the calculated $\mathrm{Score}(I_t, I_{t+1})$ also lies in the range 0 to 1.

The closer the calculated $\mathrm{Score}(I_t, I_{t+1})$ is to 1, the more similar the two video frames are.

Further, in S202 above, the preset threshold may be, for example, 0.75. That is, if the image similarity between the video frame at time t and the video frame at time t+1 is less than 0.75, the video to be sliced is sliced between these two frames; if the image similarity is greater than or equal to 0.75, no slice is made between them.
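The weighted score and the threshold decision can be checked with a small worked example. The input scores below are made-up values chosen for illustration, and the function names are hypothetical; the weights 0.7 and 0.3 and the threshold 0.75 are the example values given in the description.

```python
def combined_score(ccoeff: float, hist: float, alpha: float = 0.7) -> float:
    """Score = alpha * CCOEFF + (1 - alpha) * HIST."""
    return alpha * ccoeff + (1.0 - alpha) * hist


def should_cut(score: float, threshold: float = 0.75) -> bool:
    """Slice only when the similarity is strictly below the threshold."""
    return score < threshold


# Hypothetical scores for two frames of the same shot:
# 0.7 * 0.95 + 0.3 * 0.90 is about 0.935, so no cut is made.
same_shot = combined_score(ccoeff=0.95, hist=0.90)

# Hypothetical scores across a shot change:
# 0.7 * 0.20 + 0.3 * 0.60 is about 0.32, so a cut is made.
new_shot = combined_score(ccoeff=0.20, hist=0.60)

decisions = [should_cut(same_shot), should_cut(new_shot)]
```

Because the normalized correlation coefficient carries the larger weight, a pair with a low correlation is cut even when the two frames have a moderately similar intensity distribution.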

With this embodiment, the image similarity is determined from both the normalized correlation coefficient and the histogram similarity of two adjacent video frames. The normalized correlation coefficient mainly represents the similarity between the contents of the video frames, while the histogram similarity reduces the influence of lighting on the similarity of two adjacent video frames; for example, it avoids an inaccurate image similarity caused by a large brightness difference between the two frames. Determining the image similarity from both measures therefore makes the image similarity, and hence the video slicing result, more accurate.

In another embodiment of the present application, before calculating the image similarity between every two adjacent video frames starting from the first video frame of the video to be sliced in S201, the method further includes:

and compressing the resolution of each video frame included in the video to be sliced to a preset resolution.

Alternatively, the preset resolution may be a resolution of 64 × 64.

Because a low-resolution video frame, that is, an image thumbnail, can still represent the main information in the video frame, the image similarity between video frames can be calculated from the low-resolution frames. The embodiment of the present application can therefore compress the resolution of the video frames before executing the method flow of fig. 2 and perform the subsequent processing on the compressed frames, which reduces the amount of calculation.
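The resolution compression can be sketched as a box-average downscale. The source does not name a resampling method, so averaging each block of source pixels is an assumption, as are the function name and the grayscale 2-D list representation; a real implementation would more likely call an image library's resize routine.

```python
from typing import List, Sequence

GrayFrame = Sequence[Sequence[float]]


def downscale(frame: GrayFrame, out_width: int = 64, out_height: int = 64) -> List[List[float]]:
    """Shrink a grayscale frame by averaging the source pixels behind each output pixel."""
    in_height, in_width = len(frame), len(frame[0])
    if in_height < out_height or in_width < out_width:
        raise ValueError("this sketch only shrinks frames")
    result = []
    for out_y in range(out_height):
        # Rows of the source frame covered by this output row.
        y0 = out_y * in_height // out_height
        y1 = (out_y + 1) * in_height // out_height
        row = []
        for out_x in range(out_width):
            # Columns of the source frame covered by this output column.
            x0 = out_x * in_width // out_width
            x1 = (out_x + 1) * in_width // out_width
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# A 128 x 128 horizontal gradient: pixel value equals its column index.
frame = [[float(x) for x in range(128)] for _ in range(128)]
thumbnail = downscale(frame)
```

For this 128 × 128 input, each thumbnail pixel averages a 2 × 2 block, so every later similarity computation touches a quarter as many pixels.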

Based on the same technical concept, an embodiment of the present application further provides a video slicing apparatus, as shown in fig. 4, the apparatus includes:

a calculating module 401, configured to calculate an image similarity between every two adjacent video frames from a first video frame of a video to be sliced;

a slicing module 402, configured to perform video slicing between two adjacent video frames if the image similarity between the two adjacent video frames is smaller than a preset threshold.

Optionally, the calculating module 401 is specifically configured to:

calculating the image similarity between every two adjacent video frames by a specified similarity algorithm; or, aiming at each two adjacent video frames, respectively adopting a plurality of similarity algorithms to calculate the image similarity between the two adjacent video frames; and carrying out weighted summation on the image similarity obtained by adopting a plurality of similarity algorithms to obtain the finally required image similarity between two adjacent video frames.

Optionally, the calculating module 401 is specifically configured to:

calculating a normalized correlation coefficient between two adjacent video frames;

calculating the similarity of histograms between two adjacent video frames;

and performing weighted summation on the normalized correlation coefficient and the histogram similarity based on a first weighting coefficient and a second weighting coefficient, and taking the result of the weighted summation as the finally required image similarity between two adjacent video frames, wherein the first weighting coefficient is the weight of the normalized correlation coefficient, and the second weighting coefficient is the weight of the histogram similarity.

Optionally, as shown in fig. 5, the apparatus further includes:

the compressing module 501 is configured to compress the resolution of each video frame included in the video to be sliced to a preset resolution.

Wherein the first weighting coefficient is 0.7 and the second weighting coefficient is 0.3.
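The module structure of figs. 4 and 5 can be sketched as three cooperating objects. The class names, method names, and the toy similarity function are assumptions made for illustration; only the division into a calculating module, a slicing module, and an optional compression module comes from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

GrayFrame = Sequence[Sequence[float]]


@dataclass
class CalculatingModule:
    """Counterpart of module 401: scores every pair of adjacent frames."""
    similarity: Callable[[GrayFrame, GrayFrame], float]

    def scores(self, frames: Sequence[GrayFrame]) -> List[float]:
        return [self.similarity(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


@dataclass
class SlicingModule:
    """Counterpart of module 402: cuts where the score is below the threshold."""
    threshold: float = 0.75

    def cut_points(self, scores: Sequence[float]) -> List[int]:
        return [i for i, score in enumerate(scores) if score < self.threshold]


@dataclass
class VideoSlicingApparatus:
    calculating: CalculatingModule
    slicing: SlicingModule
    # Counterpart of module 501; optional, as in fig. 5.
    compression: Optional[Callable[[GrayFrame], GrayFrame]] = None

    def slice(self, frames: Sequence[GrayFrame]) -> List[int]:
        if self.compression is not None:
            frames = [self.compression(frame) for frame in frames]
        return self.slicing.cut_points(self.calculating.scores(frames))


# Toy similarity (assumption): compare mean brightness only.
def mean_brightness_similarity(a: GrayFrame, b: GrayFrame) -> float:
    def mean(frame: GrayFrame) -> float:
        pixels = [p for row in frame for p in row]
        return sum(pixels) / len(pixels)
    return 1.0 - abs(mean(a) - mean(b)) / 255.0


apparatus = VideoSlicingApparatus(
    calculating=CalculatingModule(mean_brightness_similarity),
    slicing=SlicingModule(threshold=0.75),
)
cuts = apparatus.slice([[[10]], [[12]], [[240]]])
```

Passing the similarity function in as a parameter keeps the calculating module independent of whether mode one or mode two is used.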

With the video slicing apparatus provided by the embodiment of the present application, the image similarity between every two adjacent video frames is calculated starting from the first video frame of the video to be sliced, and if the image similarity between two adjacent video frames is smaller than the preset threshold, the video is sliced between those two frames. Because the image similarity between video frames of the same shot is high and the image similarity between video frames of different shots is low, the image similarity distinguishes frames that belong to the same shot from frames that belong to different shots. Slicing the video according to the image similarity between adjacent video frames therefore improves the accuracy of video slicing compared with the prior art.

An electronic device is also provided in the embodiments of the present application, as shown in fig. 6, and includes a processor 601 and a memory 603.

A memory 603 for storing a computer program;

the processor 601 is configured to implement the method steps in the above method embodiments when executing the program stored in the memory 603.

Optionally, the electronic device further comprises a communication interface 602 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 complete communication with each other through the communication bus 604.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Based on the same technical concept, embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the video slicing method steps described above.

Based on the same technical concept, embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, causes the computer to perform the above-mentioned video slicing method steps.

The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)).

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.
