Video processing method and device, electronic equipment and computer storage medium

Document No.: 142709 | Publication date: 2021-10-22

Note: This technology, "视频处理方法、装置、电子设备及计算机存储介质" (Video processing method and device, electronic equipment and computer storage medium), was created by 夏朱荣, 耿致远, 张士伟 and 唐铭谦 on 2020-04-21. Abstract: The embodiment of the invention provides a video processing method and device, electronic equipment and a computer storage medium. The video processing method comprises the following steps: responding to a video splitting request, and obtaining at least one video segment in a video to be split indicated by the request; analyzing the video segments according to the analysis dimensions indicated by configuration information, and obtaining structural analysis results corresponding to the video segments; and aggregating the video segments at least according to the structural analysis results to obtain the split video. By the embodiment of the invention, a video can be split automatically.

1. A video processing method, comprising:

responding to a video splitting request, and obtaining at least one video segment in a video to be split indicated by the request;

analyzing the video segments according to the analysis dimensions indicated by configuration information, and obtaining structural analysis results corresponding to the video segments;

and aggregating the video segments at least according to the structural analysis results to obtain the split video.

2. The method of claim 1, wherein the obtaining, in response to a video splitting request, at least one video segment in the video to be split indicated by the request comprises:

and splitting the video to be split according to a preset splitting mode to obtain at least one video segment corresponding to the preset splitting mode.

3. The method of claim 2, wherein the preset splitting mode comprises at least one of: splitting by shot, splitting by person, splitting according to a background of image frames in the video to be split, and splitting according to a brightness threshold of image frames in the video to be split.

4. The method of claim 1, wherein the analyzing the video segments according to the analysis dimension indicated by the configuration information and obtaining the structural analysis result corresponding to the video segments comprises:

determining a target analysis dimension according to an analysis dimension included in preset first configuration information or an analysis dimension included in second configuration information input by a user, wherein the target analysis dimension includes at least one of the following: a scene analysis dimension, a person analysis dimension, and an event analysis dimension;

and analyzing the video segments according to the target analysis dimension, and obtaining structural analysis results corresponding to the video segments.

5. The method of claim 4, wherein the analyzing the video segments according to the target analysis dimension and obtaining the structural analysis result corresponding to the video segments comprises:

and if the target analysis dimension comprises a scene analysis dimension, performing scene analysis on the video segment and obtaining a scene structural analysis result, wherein the scene structural analysis result comprises a scene corresponding to the video segment and scene time information of the scene.

6. The method of claim 5, wherein the second configuration information further comprises information, input by the user, of a linking image, the information of the linking image being used for indicating a linking image frame for joining different scenes in the video to be split;

the performing scene analysis on the video segments and obtaining the scene structural analysis result comprises:

analyzing the video segments according to the linking image by using a scene analysis algorithm, and obtaining the scene structural analysis result.

7. The method of claim 4, wherein the analyzing the video segments according to the target analysis dimension and obtaining the structural analysis result corresponding to the video segments comprises:

and if the target analysis dimension comprises a person analysis dimension, performing person analysis on the video segment, and obtaining a person structural analysis result, wherein the person structural analysis result comprises the persons corresponding to the faces contained in the video segment and the appearance time information of the persons.

8. The method of claim 7, wherein the second configuration information further comprises person category information input by the user, wherein the person category information is used for indicating an identity category to which a person contained in the video to be split belongs;

the performing person analysis on the video segments and obtaining the person structural analysis result comprises:

analyzing the video segments by using a person analysis algorithm corresponding to the person category information, and obtaining the person structural analysis result.

9. The method of claim 4, wherein the analyzing the video segments according to the target analysis dimension and obtaining the structural analysis result corresponding to the video segments comprises:

and if the target analysis dimension comprises an event analysis dimension, performing event analysis on the video segment, and obtaining an event structural analysis result, wherein the event structural analysis result comprises the events contained in the video segment and the occurrence time information of the events.

10. The method of claim 1, wherein the aggregating video segments to obtain the split video according to at least the structural analysis result comprises:

determining a target aggregation condition according to the configuration information, wherein the target aggregation condition comprises at least one of the following: a scene aggregation condition, a person aggregation condition, and an event aggregation condition;

and aggregating the video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video.

11. The method of claim 10, wherein the aggregating video segments to obtain the split video according to the target aggregation condition and the corresponding structural analysis result comprises:

if the target aggregation condition comprises a scene aggregation condition, acquiring scene structural analysis results corresponding to a plurality of video segments;

acquiring a segment set corresponding to the split video according to the plurality of scene structural analysis results;

wherein the segment set comprises a first set and/or a second set, the first set comprises at least two video segments satisfying a similarity condition, and the similarity condition comprises: the at least two video segments are continuous in time sequence, and the similarity between the scene structural analysis results corresponding to any two adjacent video segments meets a scene aggregation threshold; and the second set comprises one video segment which does not meet the similarity condition;

and aggregating the video segments in the segment set to obtain the split video.

12. The method of claim 10, wherein the aggregating video segments to obtain the split video according to the target aggregation condition and the corresponding structural analysis result comprises:

if the target aggregation condition comprises a person aggregation condition, acquiring person structural analysis results corresponding to the plurality of video segments;

and for each person included in the person structural analysis results, aggregating the video segments whose person structural analysis results contain that person, to obtain the split video.

13. The method of claim 10, wherein the aggregating video segments to obtain the split video according to the target aggregation condition and the corresponding structural analysis result comprises:

if the target aggregation condition comprises an event aggregation condition, acquiring event structural analysis results corresponding to the plurality of video segments;

and for the events included in the event structural analysis results, aggregating the video segments corresponding to event structural analysis results whose similarity is greater than or equal to an event aggregation threshold, to obtain the split video.

14. The method of claim 1, wherein the method further comprises:

performing audio analysis on the video to be split, and obtaining time information of dialog in the video to be split;

and adjusting the starting time and/or ending time corresponding to the split video by using the time information of the dialog.

15. The method of claim 1, wherein the method further comprises:

and filtering the split videos according to a split duration configured by the user and the duration of each split video, so as to filter out split videos whose duration is less than the configured split duration.

16. The method of claim 1, wherein the method further comprises:

acquiring a boundary adjustment request for indicating adjustment of the split video boundary, wherein the boundary adjustment request comprises information of a target split video, and a target start time and/or a target end time;

and adjusting the starting time and/or the ending time of the split video indicated by the information of the target split video to the corresponding target starting time and/or target ending time.

17. The method of claim 1, wherein the method further comprises:

acquiring an addition request for indicating addition of a split video, wherein the addition request comprises time information of the video to be added;

and obtaining the video to be added requested by the addition request according to the time information of the video to be added.

18. A video processing method, comprising:

receiving information of a video to be split input by a user through an interactive interface, and generating a video splitting request according to the information of the video to be split, wherein the video splitting request is used for indicating to obtain a split video corresponding to the video to be split;

acquiring a split video obtained in response to the video splitting request, and displaying at least part of the split video in a preview interface, wherein the split video is obtained by the method of any one of claims 1 to 17.

19. The method of claim 18, wherein the generating a video splitting request according to the information of the video to be split comprises:

acquiring second configuration information added by a user through the interactive interface, wherein the second configuration information at least comprises an analysis dimension configured by the user;

and generating the video splitting request according to the information of the video to be split and the second configuration information.

20. A video processing apparatus comprising:

the video splitting module is used for responding to a video splitting request and obtaining at least one video segment in the video to be split indicated by the request;

the analysis module is used for analyzing the video segments according to the analysis dimensions indicated by configuration information and obtaining the structural analysis results corresponding to the video segments;

and the aggregation module is used for aggregating the video segments at least according to the structural analysis result so as to obtain the split video.

21. A video processing apparatus comprising:

the request module is used for receiving information of a video to be split input by a user through an interactive interface and generating a video splitting request according to the information of the video to be split, wherein the video splitting request is used for indicating to obtain a split video corresponding to the video to be split;

a display module, configured to obtain a split video that responds to the video splitting request, and display at least a portion of the split video in a preview interface, where the split video is obtained by the apparatus according to claim 20.

22. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;

the memory is configured to store at least one executable instruction, which causes the processor to perform an operation corresponding to the video processing method according to any one of claims 1 to 17, or to perform an operation corresponding to the video processing method according to claim 18 or 19.

23. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the video processing method according to any one of claims 1 to 17, or the video processing method according to claim 18 or 19.

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a video processing method and device, electronic equipment and a computer storage medium.

Background

With the accelerating pace of life, fragmented time increases and the demand to make use of it grows ever stronger. Taking video as an example, a user may want to obtain the required information from a video in a shorter time, without having to watch the complete video. In the prior art, the user must either manually speed up playback so that the video can be viewed in a shorter time, or manually adjust the playback progress during playback to skip some content.

Both approaches require manual adjustment by the user, the user still has to watch the whole video, and required information may be missed if the user is unfamiliar with the video's content.

Disclosure of Invention

Embodiments of the present invention provide a video processing scheme to solve some or all of the above problems.

According to a first aspect of the embodiments of the present invention, there is provided a video processing method, including: responding to a video splitting request, and obtaining at least one video segment in a video to be split indicated by the request; analyzing the video segments according to the analysis dimensions indicated by configuration information, and obtaining structural analysis results corresponding to the video segments; and aggregating the video segments at least according to the structural analysis results to obtain the split video.

According to a second aspect of the embodiments of the present invention, there is provided a video processing method, including: receiving information of a video to be split input by a user through an interactive interface, and generating a video splitting request according to the information of the video to be split, where the video splitting request is used to instruct obtaining a split video corresponding to the video to be split; and acquiring a split video obtained in response to the video splitting request, and displaying at least part of the split video in a preview interface, where the split video is obtained by the method of the first aspect.

According to a third aspect of the embodiments of the present invention, there is provided a video processing apparatus, including: a video splitting module, configured to respond to a video splitting request and obtain at least one video segment in the video to be split indicated by the request; an analysis module, configured to analyze the video segments according to the analysis dimensions indicated by configuration information and obtain the structural analysis results corresponding to the video segments; and an aggregation module, configured to aggregate the video segments at least according to the structural analysis results to obtain the split video.

According to a fourth aspect of the embodiments of the present invention, there is provided a video processing apparatus, including a request module, configured to receive information of a video to be split input by a user through an interactive interface, and generate a video splitting request according to the information of the video to be split, where the video splitting request is used to instruct to obtain a split video corresponding to the video to be split; and the display module is used for acquiring the split video responding to the video splitting request and displaying at least part of the split video in a preview interface, wherein the split video is obtained by the device in the third aspect.

According to a fifth aspect of embodiments of the present invention, there is provided an electronic apparatus, including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the video processing method according to the first aspect or the second aspect.

According to a sixth aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a video processing method as described in the first or second aspect.

According to the video processing scheme provided by the embodiments of the present invention, the video to be split is segmented into corresponding video segments, and each video segment is analyzed according to the analysis dimensions indicated by the configuration information to obtain normalized structural analysis results, so that the video segments can be aggregated according to the structural analysis results to obtain the required split videos. The video to be split is thus split automatically, different analysis dimension requirements can be met, and adaptability and video splitting efficiency are improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings based on them.

Fig. 1a is a flowchart illustrating the steps of a video processing method according to a first embodiment of the present invention;

Fig. 1b is a schematic diagram of a usage scenario according to the first embodiment of the present invention;

Fig. 2 is a flowchart illustrating the steps of a video processing method according to a second embodiment of the present invention;

Fig. 3 is a flowchart illustrating the steps of a video processing method according to a third embodiment of the present invention;

Fig. 4 is a flowchart illustrating the steps of a video processing method according to a fourth embodiment of the present invention;

Fig. 5a is a flowchart illustrating the steps of a video processing method according to a fifth embodiment of the present invention;

Fig. 5b is a schematic diagram of a usage scenario according to the fifth embodiment of the present invention;

Fig. 5c is a schematic diagram of another usage scenario according to the fifth embodiment of the present invention;

Fig. 6 is a block diagram of a video processing apparatus according to a sixth embodiment of the present invention;

Fig. 7 is a block diagram of a video processing apparatus according to a seventh embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device according to an eighth embodiment of the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.

The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.

Example One

Referring to fig. 1a, a flow chart of steps of a video processing method according to a first embodiment of the present invention is shown.

The video processing method of the embodiment comprises the following steps:

step S102: responding to a video splitting request, and obtaining at least one video segment in the video to be split indicated by the request.

The video splitting request is used for indicating that the video to be split is to be split into one or more split videos (for example, video splitting bars). The video to be split may be any appropriate video, whether long or short.

A person skilled in the art may obtain the video segments in the video to be split in any suitable manner. For example, in a specific implementation, step S102 may be implemented as: splitting the video to be split according to a preset splitting mode to obtain at least one video segment corresponding to the preset splitting mode. In this way, the video can be split more flexibly, meeting different requirements.

The preset splitting mode may be configured as required. For example, the preset splitting mode includes at least one of the following: splitting by shot, splitting by person, splitting according to the background of image frames in the video to be split, and splitting according to a brightness threshold of image frames in the video to be split.

A suitable process is adopted for each preset splitting mode. For example, when splitting by person, the persons contained in the image frames of the video to be split are identified, and if two adjacent image frames contain different persons, a split is made there to obtain video segments corresponding to the different persons. Similarly, splitting may be performed according to the background, a brightness threshold, etc. of the image frames.

For another example, when splitting the video to be split into video segments corresponding to shots, any existing shot splitting algorithm may be used, such as Shot Boundary Detection (SBD). SBD can detect whether a shot transition exists in the video to be split, so that the video is split according to where shot transitions occur; a minimal sketch follows.
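As an illustration only (not part of the original disclosure), a minimal shot-boundary detector can be sketched by comparing color histograms of consecutive frames; the OpenCV-based implementation and the 0.5 threshold below are illustrative assumptions, not the algorithm mandated by this embodiment.

```python
import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    """Minimal SBD sketch: flag a shot cut when the color-histogram
    correlation between consecutive frames drops below `threshold`.
    Both the histogram layout and the threshold are assumptions."""
    cap = cv2.VideoCapture(video_path)
    boundaries = []  # frame indices where a new shot starts
    prev_hist = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < threshold:  # abrupt change => shot boundary
                boundaries.append(index)
        prev_hist = hist
        index += 1
    cap.release()
    return boundaries
```

Consecutive boundaries then delimit the video segments corresponding to shots, each representable by its start and end times.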

In the field of video processing, a shot refers to a group of frames between two cut points in video editing and is the basic unit of the whole video. Accordingly, in the embodiment of the present invention, a video segment corresponding to a shot may be the video segment of a single shot, or a combination of the video segments of multiple shots. In this embodiment, the case where a video segment corresponds to a single shot is described as an example.

Step S104: analyzing the video segments according to the analysis dimensions indicated by the configuration information, and obtaining structural analysis results corresponding to the video segments.

The analysis dimension is used for indicating an angle of analysis required to be performed on the video segment, and the analysis dimension includes at least one of: a scene analysis dimension, a people analysis dimension, and an event analysis dimension.

The analysis dimensions indicated by the configuration information can be determined according to preset first configuration information, or according to second configuration information input by a user, so that the user can configure the analysis dimensions as needed to control the corresponding analysis of the video segments, thereby meeting the user's splitting requirements.

The analysis of the video segments may be a structured analysis, which may refer to a process of obtaining normalizable structural information from the video segments.

The structured analysis corresponds to an analysis dimension, that is, the structured analysis comprises at least one of: scene analysis, people analysis, and event analysis.

Taking person analysis in the structured analysis as an example: the process of deriving, from the binary data of an original video segment, information such as the start and end time points at which a person appears and the person corresponding to a face, and forming normalized structural information from this information, can be called structured analysis, and the normalized structural information output by this process can be called a structural analysis result.

By performing scene analysis on the video segments, at least the scenes contained in each video segment and the scene time information corresponding to each scene can be determined. For example, video segment 1 contains an indoor scene with scene time information 1:00-2:20, video segment 2 contains an outdoor scene with scene time information 2:25-3:03, and so on.

Scene analysis may be performed by a person skilled in the art in any suitable manner, and this embodiment is not limited thereto; for example, scene recognition may be performed with a Resnet-50 neural network model trained on the Places365 dataset.

By performing person analysis on the video segments, at least the persons contained in each video segment and the appearance time information of the persons can be determined. For example, video segment 3 contains person A, person B, and person C; the appearance time information of person A is 3:10-3:15, that of person B is 3:07-3:12, and that of person C is 3:16-3:22.

A person skilled in the art can perform person analysis in any suitable manner, and this embodiment is not limited thereto. For example, a face detection algorithm detects the faces in at least some image frames of a video segment, the ArcFace face feature algorithm extracts feature information of the detected faces, and the cosine distance between the face feature information and preset person feature information is computed to determine the person corresponding to each face.
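As a hedged sketch of this matching step (the embedding source and the 0.35 distance threshold are illustrative assumptions, not values from this disclosure), the cosine-distance comparison against preset person features might look like this:

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two face embeddings."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

def identify_person(face_embedding, gallery, threshold=0.35):
    """Match a face embedding (e.g. produced by ArcFace) against preset
    person feature information. `gallery` maps person name -> embedding.
    Returns the best match, or None if nothing is close enough.
    The threshold value is an illustrative assumption."""
    best_name, best_dist = None, threshold
    for name, ref in gallery.items():
        dist = cosine_distance(face_embedding, ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

Combining each match with the timestamps of the frames in which the face was detected then yields the person's appearance time information.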

By performing event analysis on the video segments, at least the events contained in each video segment and the occurrence time information of the events can be determined. For example, video segment 4 contains event A with occurrence time information 4:00-4:50, video segment 5 also contains event A with occurrence time information 4:55-5:30, and video segment 6 contains event B with occurrence time information 5:31-6:10.

Event analysis may be performed in any suitable manner by a person skilled in the art, and this embodiment is not limited thereto; for example, event analysis may be performed with an I3D neural network model.

Because structural analysis results are obtained by analyzing the video segments, split videos can subsequently be obtained quickly using the structural analysis result corresponding to each type of aggregation requirement. This achieves the goal of quickly splitting the video to be split along one or more dimensions and meets the user's requirements.

Step S106: aggregating the video segments at least according to the structural analysis results to obtain the split video.

In a specific implementation, the video segments can be aggregated according to the structural analysis results corresponding to different analysis dimensions, so as to obtain the split videos.

For example, video segments that contain the same or similar scenes and are temporally continuous are aggregated according to the scene structural analysis results corresponding to the scene analysis dimension, forming a split video based on scene splitting (also referred to as a video splitting bar).

Specifically, if it is determined from the scene structural analysis results corresponding to the video segments that video segments 7-9 all contain indoor scene A, video segment 10 contains outdoor scene B, and video segments 11 and 12 both contain indoor scene A, then video segments 7-9 are aggregated into one split video, video segment 10 alone forms a split video, and video segments 11 and 12 are aggregated into another split video.

It should be noted that, in the present embodiment, the aggregated split videos are represented by a start time and an end time (for convenience of description, such a representation is referred to as a video splitting bar). For example, the split video formed by aggregating video segments 7-9 can be represented by the start time of video segment 7 (e.g., 10:33) and the end time of video segment 9 (e.g., 12:01). Representing a video splitting bar this way means the video to be split does not need to be actually cut: if the user later needs to adjust the content of a split video, this is very convenient, frequent re-cutting of the video to be split is avoided, and the data processing load and the occupation of hardware resources (such as storage space and computing resources like the CPU) are reduced.
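A minimal sketch of this representation (the type and field names below are hypothetical, not from the disclosure): a split video is just a pair of time references into the source, so adjusting a bar only rewrites timestamps instead of re-cutting video data.

```python
from dataclasses import dataclass

@dataclass
class SplitBar:
    """A split video represented purely by time references; no frames
    of the source video are copied or re-encoded."""
    source_video: str  # e.g. an ID or URL of the video to be split
    start: float       # start time in seconds
    end: float         # end time in seconds

# segments 7-9 aggregated into one bar: 10:33 (633 s) to 12:01 (721 s)
bar = SplitBar(source_video="video_to_split", start=633.0, end=721.0)

# a later boundary adjustment is a timestamp update, not a re-cut
bar.end = 735.0
```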

Of course, in other embodiments, the split video may also be represented in other ways, for example by setting marks on the video to be split, or by directly cutting the split video out of the video to be split; this embodiment does not limit this.

Similarly, aggregation may be performed according to the person structural analysis results corresponding to the person analysis dimension to obtain videos split by person, and so on; details are not repeated here.

The following describes an implementation of the video processing method in conjunction with a specific usage scenario:

as shown in fig. 1b, a user may add a video to be split in an interactive interface of a terminal device, where the video to be split may be a video uploaded by the user or a video selected by the user from a video list. The terminal device generates a video splitting request according to information (such as a name and an ID of the video or a corresponding URL address) of the video to be split.

Optionally, the user may perform other configurations in the interactive interface as desired, such as selecting analysis dimensions, configuring aggregation conditions, configuring filters, and so forth. If the user performs other configurations, the corresponding second configuration information can be obtained according to the configurations performed by the user, and a video splitting request is generated according to the information of the video to be split and the second configuration information. Subsequently, according to the video splitting request, the video processing method can be executed on the video to be split, and the corresponding split video is obtained.

In this usage scenario, the case where the user's second configuration information indicates that the scene analysis dimension and the person analysis dimension are selected is described as an example. The video processing method may be executed by a server (including a physical server and/or the cloud). Of course, in other usage scenarios the video processing method may be executed by the terminal device; this usage scenario does not limit this.

At least one video segment in the video to be split is obtained according to the video splitting request. For example, the video to be split is split according to a preset splitting mode, and the video segments corresponding to that splitting mode are obtained. In this usage scenario, the preset splitting mode may be splitting by shot to obtain the video segment corresponding to each shot, where a video segment may be represented by its start and end times; for example, video segment 14, corresponding to shot A, spans 5:04-5:30.

The analysis dimensions, including the scene analysis dimension and the person analysis dimension, are determined according to the user's second configuration information. Based on this, scene analysis is performed on each video segment to obtain its scene structural analysis result, which includes the scene and scene time information; and person analysis is performed on the video segments to obtain their person structural analysis results, which include the persons in each video segment, the appearance time information of the persons, and the like.

According to the scene structural analysis results, on the one hand, video segments that contain the same scene and are continuous in time sequence are aggregated into a split video. If the similarity of the scenes contained in video segments 1-3 meets the scene aggregation threshold, video segments 1-3 are aggregated, and the start time and end time of the resulting split video (also called video splitting bar A) are obtained so that the bar is represented by those times. Likewise, if the similarity of the scenes contained in video segments 5-9 meets the scene aggregation threshold, video segments 5-9 are aggregated into another split video (also called video splitting bar C).

On the other hand, an isolated video segment forms an independent split video. If the similarity between the scene of video segment 4 and the scene of video segment 3 is smaller than the scene aggregation threshold, and the similarity between the scene of video segment 4 and the scene of video segment 5 is also smaller than the scene aggregation threshold, video segment 4 by itself is taken as a split video (also called video splitting bar B).

It should be noted that the above two aspects are not necessarily both present. For example, in other usage scenarios, only video segments that contain the same scene and are temporally continuous may be aggregated into split videos, or only isolated video segments may be formed into independent split videos, based on the scene structural analysis results.

According to the person structural analysis results, video segments containing the same person are aggregated into a split video. For example, if video segments 1-3, 5, and 8 contain person A, these video segments are aggregated into a split video (also called video splitting bar D). Likewise, if video segments 2, 4, and 6-9 contain person B, those video segments are aggregated into a split video (also called video splitting bar E).

After the split video is obtained, the split video can be sent to the terminal device, so that the split video is displayed to a user through the terminal device, and the user can browse the split video or adjust the split video according to the needs of the user.

If the video splitting request does not include the second configuration information, the analysis dimension may be determined according to the first configuration information, and the process of determining the analysis dimension according to the first configuration information and splitting the video to be split is similar to the process of splitting according to the second configuration information, so that details are not repeated.

Through this embodiment, the video to be split is segmented to obtain corresponding video segments, and each video segment is analyzed according to the analysis dimensions indicated by the configuration information to obtain normalizable structural analysis results, so that the video segments can be aggregated according to the structural analysis results to obtain the required split videos. The video to be split is thus split automatically, different analysis dimension requirements can be met, and adaptability and video splitting efficiency are improved.

The video processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.

Example Two

Referring to fig. 2, a flow chart of steps of a video processing method according to a second embodiment of the invention is shown.

The video processing method of the present embodiment includes the foregoing steps S102 to S106.

Step S104 includes the following substeps:

step S1041: and determining a target analysis dimension according to the analysis dimension included by the preset first configuration information or the analysis dimension included by the second configuration information input by the user.

The first configuration information includes a default-configured analysis dimension and may be used when the video splitting request does not include second configuration information.

The second configuration information includes the analysis dimension selected by the user. Optionally, the second configuration information may further include information of a linking image uploaded by the user, person category information, video content category information, and the like. The information of the linking image is used for indicating a linking image frame (such as a transition image) used for joining different scenes in the video to be split. The person category information is used for indicating the identity category (such as star) to which a person contained in the video to be split belongs. The video content category information is used for indicating the content category of the video to be split, such as a film or television episode, or news.

The user can perform configuration through the interactive interface of the terminal device, thereby generating the second configuration information, as sketched below. This improves interactivity in the video splitting process: the user can select analysis dimensions according to his or her own needs, and the aggregation conditions used when aggregating the video segments can then be determined according to those analysis dimensions.
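As an illustration only, the second configuration information could be modeled as a simple record; all field names below are hypothetical, chosen to mirror the optional items listed above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SecondConfiguration:
    """User-supplied configuration (field names are hypothetical)."""
    analysis_dimensions: list[str]          # e.g. ["scene", "person"]
    linking_image: Optional[bytes] = None   # transition frame, if uploaded
    person_category: Optional[str] = None   # e.g. "star"
    content_category: Optional[str] = None  # e.g. "episode" or "news"

# e.g. a user who selected scene and person analysis in the interface
config = SecondConfiguration(analysis_dimensions=["scene", "person"],
                             person_category="star")
```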

If the video splitting request carries second configuration information, the target analysis dimension (serving as the analysis dimension indicated by the configuration information) can be determined according to the second configuration information. Conversely, if the video splitting request does not carry second configuration information, the target analysis dimension may be determined according to the preset first configuration information.

The target analysis dimension may include at least one of the following: a scene analysis dimension, a person analysis dimension, and an event analysis dimension. In this embodiment, the case where all three are included is described as an example.

Step S1042: analyzing the video segments according to the target analysis dimension, and obtaining structural analysis results corresponding to the video segments.

Step S1042 may include at least one of processes A1 to C1 described below, which perform different analyses according to what the analysis dimension indicates.

Procedure A1: if the target analysis dimension includes a scene analysis dimension, performing scene analysis on the video segment and obtaining a scene structural analysis result, where the scene structural analysis result includes the scene corresponding to the video segment and the scene time information of the scene.

Scene analysis of the video segments may be performed in any suitable manner by a person skilled in the art. For example, each video segment is input into a Resnet-50 neural network model for scene analysis, and the output scene structural analysis result corresponding to the input video segment is obtained. The scene structural analysis result includes the scene of the corresponding video segment and the scene time information of that scene.

If video segment 1 is input into the Resnet-50 neural network model, a scene structural analysis result A corresponding to video segment 1 is obtained, indicating that the scene corresponding to video segment 1 is indoor scene A and that the scene time information is 1:00-2:20.

Inputting video segment 2 into the Resnet-50 neural network model yields a scene structural analysis result B corresponding to video segment 2, indicating that the scene corresponding to video segment 2 is outdoor scene B and that the scene time information is 2:25-3:03.
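As an illustration of this per-frame classification step (the checkpoint path, label list, and frame-sampling policy are all assumptions for the sketch, not part of the disclosure), a Resnet-50 classifier over Places365 scene categories could be used roughly as follows:

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50
from PIL import Image

# assumed: a Resnet-50 checkpoint fine-tuned on Places365 and the list
# of its 365 scene labels are available locally (illustrative paths)
model = resnet50(num_classes=365)
model.load_state_dict(torch.load("resnet50_places365.pth", map_location="cpu"))
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_scene(frame: Image.Image, labels: list[str]) -> str:
    """Return the most likely scene label for one video frame."""
    x = preprocess(frame).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    return labels[int(probs.argmax())]
```

A segment's scene can then be taken as, e.g., the majority label over frames sampled from the segment, with the segment's start and end times giving the scene time information.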

Optionally, in process A1, in order to improve the accuracy of the scene analysis and make the scene time information more precise, if the second configuration information includes information of a linking image, the performing scene analysis on the video segment and obtaining the scene structural analysis result may be implemented as: analyzing the video segments according to the linking image by using a scene analysis algorithm, and obtaining the scene structural analysis result.

The information of the linking image indicates the linking image frame used for joining different scenes in the video to be split. Since linking images are usually used to join different scenes, if the user uploads a linking image, it can be taken into account when a scene analysis algorithm (such as one based on the Resnet-50 neural network model) analyzes the video segments, making the scene time information contained in the scene structural analysis result more accurate, and in turn making the split video obtained from the structural analysis result more accurate.

Procedure B1: if the target analysis dimension includes a person analysis dimension, performing person analysis on the video segment and obtaining a person structural analysis result, where the person structural analysis result includes the persons corresponding to the faces contained in the video segment and the appearance time information of the persons.

A person skilled in the art may perform person analysis on each video segment in any suitable manner. For example, the RetinaFace face detection algorithm detects the faces contained in at least some image frames of each video segment and their positions in those frames; the ArcFace face feature algorithm extracts feature information of the detected faces; the person corresponding to each face is then determined from the face feature information and preset person feature information, and the appearance time information of the person is further determined, so as to generate the person structural analysis result.

Optionally, in process B1, to improve the accuracy of the person analysis, if the second configuration information includes person category information, the performing person analysis on the video segment and obtaining the person structural analysis result includes: analyzing the video segments by using a person analysis algorithm corresponding to the person category information, and obtaining the person structural analysis result.

To improve the accuracy of person analysis, face images of different identity categories can be used in advance to train the face detection algorithm, the face feature algorithm, and so on, yielding different face detection and face feature algorithms; these may be collectively referred to as person analysis algorithms.

In this way, if the person category information indicates that the identity category of the person is star, person analysis is performed on the video segment using the person analysis algorithm corresponding to stars (for example, a face detection algorithm and a face feature algorithm trained on images containing stars' faces), and the person structural analysis result is obtained. The person structural analysis result includes at least the persons in the video segment and the appearance time information of the persons.

Procedure C1: if the target analysis dimension includes an event analysis dimension, performing event analysis on the video segment and obtaining an event structural analysis result, where the event structural analysis result includes the events contained in the video segment and the occurrence time information of the events.

Event analysis may be performed on the video segments in any suitable manner by one skilled in the art. For example, event analysis is performed by using an I3D neural network model, and the events contained in each video segment and the occurrence time information of the events are obtained.

Through this embodiment, the video to be split is segmented to obtain corresponding video segments, and each video segment is analyzed according to the analysis dimensions indicated by the configuration information to obtain normalizable structural analysis results, so that the video segments can be aggregated according to the structural analysis results to obtain the required split videos. The video to be split is thus split automatically, different analysis dimension requirements can be met, and adaptability and video splitting efficiency are improved.

In addition, the method allows the user to perform personalized configuration: with user configuration (i.e., when second configuration information exists), the user's precise splitting requirements can be met to the greatest extent. The method also adapts to the case where the user performs no configuration (i.e., no second configuration information exists); in that case, analysis can be performed along multiple dimensions and the corresponding structural analysis results obtained, so that video splitting is performed according to the structural analysis results and splitting requirements in different dimensions can still be met.

The video processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.

Example Three

Referring to fig. 3, a flow chart of steps of a video processing method according to a third embodiment of the present invention is shown.

The video processing method of the present embodiment includes the foregoing steps S102 to S106. Step S104 may adopt an implementation manner in any of the foregoing embodiments.

In the present embodiment, step S106 includes sub-step S1061 and sub-step S1062.

Substep S1061: determining a target aggregation condition according to the configuration information.

The target aggregation condition includes at least one of the following: a scene aggregation condition, a person aggregation condition, and an event aggregation condition.

The configuration information may be the first configuration information or the second configuration information. For example, if the user selects the target aggregation condition in the second configuration information, the target aggregation condition is determined directly according to the user's selection. Alternatively, if the second configuration information does not include a target aggregation condition selected by the user but the user has selected an analysis dimension, the corresponding target aggregation condition may be determined from that analysis dimension; for example, if the analysis dimension includes the scene analysis dimension, the target aggregation condition correspondingly includes the scene aggregation condition.
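A minimal sketch of this decision logic (the dictionary keys are assumptions for illustration, not field names from the disclosure):

```python
def determine_target_aggregation_conditions(second_config, first_config):
    """Sketch of substep S1061: prefer the user's explicitly selected
    aggregation conditions; otherwise derive them from the selected
    analysis dimensions; otherwise fall back to the preset defaults."""
    if second_config.get("aggregation_conditions"):
        return second_config["aggregation_conditions"]
    if second_config.get("analysis_dimensions"):
        return [d + "_aggregation" for d in second_config["analysis_dimensions"]]
    return first_config["aggregation_conditions"]

# e.g. a user who selected only the scene analysis dimension
conditions = determine_target_aggregation_conditions(
    {"analysis_dimensions": ["scene"]},
    {"aggregation_conditions": ["scene_aggregation", "person_aggregation"]},
)  # -> ["scene_aggregation"]
```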

Substep S1062: aggregating the video segments according to the target aggregation condition and the corresponding structural analysis results to obtain the split video.

Substep S1062 may include at least one of processes A2 to C2, corresponding to the different target aggregation conditions.

Procedure A2: when the target aggregation condition includes a scene aggregation condition, procedure A2 may be implemented by substeps I to III.

Substep I: if the target aggregation condition includes a scene aggregation condition, acquiring the scene structural analysis results corresponding to the plurality of video segments.

Because the video segments are analyzed along multiple dimensions, when the video segments are aggregated, aggregation can be performed according to the structural analysis results corresponding to the target aggregation condition. For example, when aggregating based on scene, aggregation is performed based on the scene structural analysis results.

The scene structural analysis result comprises scenes of the corresponding video clips and scene time information. In the present embodiment, the scene time information includes a start time and an end time of the scene.

Of course, in other embodiments, the scene structural analysis result may include other information as long as it can indicate the scene included in the video segment.

Substep II: acquiring a segment set corresponding to the split video according to the plurality of scene structural analysis results.

The segment set includes a first set and/or a second set. The first set includes at least two video segments satisfying a similarity condition, where the similarity condition includes: the at least two video segments are continuous in time sequence, and the similarity between the scene structural analysis results corresponding to any two adjacent video segments meets a scene aggregation threshold. The second set includes one video segment that does not meet the similarity condition.

In a specific implementation, for a certain video segment, the segment set to which the video segment belongs may be determined by determining whether a similarity condition is satisfied between the certain video segment and an adjacent preceding video segment and an adjacent succeeding video segment.

For example, sub-step II may be implemented as:

determining, according to the time sequence relationship among the video segments, whether the subsequent video segment adjacent to the current video segment satisfies the similarity condition.

Taking video segment 1 as the current video segment as an example, if the similarity between the scene structural analysis result of video segment 1 and the scene structural analysis result of video segment 2 meets the scene aggregation threshold, then video segments 1 and 2 meet the similarity condition.

The scene aggregation threshold may be determined as needed, and this embodiment does not limit it; for example, it may be 70%, 80%, 90%, or 100%. In this embodiment, a scene aggregation threshold of 100% is described as an example. This threshold ensures the accuracy of aggregation, guaranteeing that only the same scene is aggregated.

The similarity may be calculated by any suitable means by a person skilled in the art, such as by calculating the Euclidean distance between scene features.

In one case, if the subsequent video segment meets the similarity condition, it is added to the set containing the current video segment (denoted segment set A), the subsequent video segment is taken as the new current video segment, and whether the video segment following the new current video segment satisfies the similarity condition is determined according to the time sequence relationship among the video segments.

For example, if video segment 2 satisfies the similarity condition, video segment 2 is added to segment set A, which then includes video segments 1 and 2. Video segment 2 is taken as the new current video segment, and it is determined whether video segment 3 satisfies the similarity condition. If it does, video segment 3 is added to segment set A, which then includes video segments 1 to 3.

In another case, if the subsequent video segment does not satisfy the similarity condition, a new set is created for the subsequent video segment, the subsequent video segment is used as a new current video segment, and whether the subsequent video segment adjacent to the current video segment satisfies the similarity condition is determined according to the time sequence relationship between the video segments.

For example, when the video segment 4 does not satisfy the similarity condition while the video segment 3 is the current video segment, a new set (referred to as segment set B) is created for the video segment 4, and the video segment 4 is used as the new current video segment to determine whether the video segment 5 satisfies the similarity condition. If the video segment 5 is not satisfied, a new set (denoted as segment set C) is created for the video segment 5, and the video segment 5 is used as a new current video segment, and the above steps are repeated until the last video segment is determined.

The acquired segment set A can be regarded as a first set because it includes video segments 1 to 3, which satisfy the similarity condition. Segment set B can be regarded as a second set since it includes only video segment 4.

It should be noted that, in this embodiment, substep II is described by checking the video segments one by one from earlier to later in the time sequence. In other embodiments, substep II may be implemented by checking them one by one from later to earlier, or in other ways. A sketch of this grouping pass follows.
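As a hedged sketch of this grouping pass (the function names and the similarity callback are illustrative assumptions), a single pass over the time-ordered segments could look like this:

```python
def aggregate_by_scene(segments, scene_results, similar, threshold=1.0):
    """Group time-ordered segment ids into segment sets.
    `similar(a, b)` returns the similarity between two scene structural
    analysis results. Adjacent segments whose similarity meets
    `threshold` land in the same set (a "first set"); a segment matching
    neither neighbor ends up alone (a "second set"). Names and the
    default threshold (100%) are illustrative assumptions."""
    if not segments:
        return []
    sets = [[segments[0]]]
    for prev, cur in zip(segments, segments[1:]):
        if similar(scene_results[prev], scene_results[cur]) >= threshold:
            sets[-1].append(cur)   # satisfies the similarity condition
        else:
            sets.append([cur])     # create a new set for this segment
    return sets

def to_splitting_bar(segment_set, times):
    """Represent the aggregated bar by the start time of its first
    segment and the end time of its last segment (substep III)."""
    return times[segment_set[0]][0], times[segment_set[-1]][1]
```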

Substep III: aggregating the video segments in the segment set to obtain the split video.

After the segment set corresponding to each split video is acquired (i.e., each video splitting bar; for simplicity the following description refers to video splitting bars, though a person skilled in the art will appreciate that a split video may take other forms, which this embodiment does not limit), the video segments in each segment set are aggregated to obtain the corresponding video splitting bar.

In one feasible approach, segment set A includes video segments 1 to 3; the start time in the scene time information of the scene structural analysis result of video segment 1 is used as the start time of the video splitting bar, and the end time in the scene time information of the scene structural analysis result of video segment 3 is used as its end time, thereby representing the aggregated video splitting bar.

Of course, in other embodiments, the video splitting bar may be represented in other ways, or may be cut directly out of the video to be split.

In this embodiment, in addition to scene-based aggregation, aggregation may also be performed based on person; in that case, substep S1062 includes process B2.

Procedure B2: in the case of person-based aggregation, process B2 can be implemented by sub-step IV to sub-step V.

Substep IV: and if the target aggregation condition comprises a character aggregation condition, acquiring character structural analysis results corresponding to the plurality of video segments.

The person structural analysis result comprises the persons appearing in the corresponding video segment and the appearance time information of those persons.

The person structural analysis result may further include other information as required, such as the position, in the image frame, of the face corresponding to the person, which is not limited in this embodiment.

Substep V: for each person included in the person structural analysis results, aggregating the video segments whose person structural analysis results contain the current person, so as to obtain the split video.

For example, if it is determined, according to the person structural analysis results corresponding to video segments 1-3, that video segments 1-3 all include person A, video segments 1-3 are aggregated into a video split bar corresponding to person A. The aggregated video split bar may also be represented by the start times and end times of the included video segments.

For example, the video split bar for person A is expressed by the start and end times of video segments 1-3, such as {"1:00-2:20", "2:30-2:50", "2:55-3:33"}.

It should be noted that the person may be a role in the video to be split, or may be the actor playing the role. For example, if a certain actor plays two roles in a certain video to be split, all video segments including that actor may be aggregated into one video split bar, or the segments of each role may be aggregated into a separate video split bar, which is not limited in this embodiment.
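Under the same caveat, a minimal sketch of sub-steps IV and V follows, assuming each person structural analysis result is reduced to a set of person identifiers per segment; the time-range strings mirror the example above.

from collections import defaultdict

def aggregate_by_person(person_results):
    """person_results: list of (segment_id, time_range, set_of_person_ids).
    Returns one video split bar (a list of time ranges) per person."""
    bars = defaultdict(list)
    for seg_id, time_range, persons in person_results:
        for person in persons:
            bars[person].append(time_range)
    return dict(bars)

# Example mirroring the text: segments 1-3 all contain person A, so the
# split bar for A is represented by their start/end times.
bars = aggregate_by_person([
    (1, "1:00-2:20", {"A"}),
    (2, "2:30-2:50", {"A"}),
    (3, "2:55-3:33", {"A", "B"}),
])
# bars["A"] == ["1:00-2:20", "2:30-2:50", "2:55-3:33"]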

Similar in implementation principle to aggregation based on persons, aggregation can also be performed based on events. In this case, step S106 includes a process C2.

Procedure C2: in the case of event-based aggregation, process C2 can be implemented by sub-steps VI to VII.

Substep VI: and if the target aggregation condition comprises an event aggregation condition, acquiring the event structured analysis result corresponding to the plurality of video clips.

The event structured analysis result comprises events in the corresponding video clips and occurrence time information of the events.

Of course, in other embodiments, other information may be included according to needs, and this embodiment is not limited to this.

Substep VII: for each event included in the event structured analysis results, aggregating the video segments whose event structured analysis results have a similarity, with respect to the current event, greater than or equal to an event aggregation threshold, so as to obtain the split video.

The event aggregation threshold may be determined as needed, which is not limited in this embodiment. For example, it may be 80%, 90%, 100%, etc.

The similarity of the events can be calculated in any suitable manner by those skilled in the art, and the embodiment is not limited thereto.

For example, if it is determined, according to the event structured analysis results corresponding to video clips 1-4, that video clips 1-4 all include event A, video clips 1-4 are aggregated to obtain a video split bar corresponding to event A. The video split bar can be represented by the start times and end times of the included video segments, or in any other suitable manner, which is not limited by this embodiment.
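A sketch of sub-steps VI and VII follows. Since this embodiment leaves the event similarity metric open, plain label equality is used here as a placeholder, which corresponds to a 100% aggregation threshold.

def aggregate_by_event(event_results, threshold=1.0):
    """event_results: list of (time_range, event_label) pairs.
    Groups segments whose events are at least `threshold`-similar."""
    def event_similarity(e1, e2):
        return 1.0 if e1 == e2 else 0.0   # placeholder similarity metric

    bars = {}
    for time_range, event in event_results:
        matched = next((known for known in bars
                        if event_similarity(known, event) >= threshold), None)
        bars.setdefault(matched if matched is not None else event, []).append(time_range)
    return bars

bars = aggregate_by_event([
    ("0:10-0:40", "press conference"),
    ("0:45-1:20", "press conference"),
    ("1:30-2:00", "street interview"),
])
# bars["press conference"] == ["0:10-0:40", "0:45-1:20"]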

Through this embodiment, the video to be split is segmented to obtain corresponding video segments, each video segment is analyzed according to the analysis dimension indicated by the configuration information to obtain a standardized structural analysis result, and the video segments can then be aggregated according to the structural analysis results to obtain the required split videos. In this way, the video to be split is split automatically, different analysis dimension requirements can be satisfied, and both adaptability and video splitting efficiency are improved.

In addition, the method of this embodiment can aggregate video clips based on the corresponding shot aggregation conditions, using the scene structural analysis results, event structural analysis results and person structural analysis results, thereby splitting the video to be split in different dimensions.

The video processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.

Example four

Referring to fig. 4, a flow chart of steps of a video processing method according to a fourth embodiment of the present invention is shown.

The video processing method of the present embodiment includes the foregoing steps S102 to S106. Wherein, both step S104 and step S106 may adopt the implementation manner in any of the foregoing embodiments.

In this embodiment, the method further includes the steps of:

step S108: and carrying out audio analysis on the video to be split, and obtaining time information of character conversation in the video to be split.

In one implementation, a speech-to-text algorithm (an automatic speech recognition, i.e. ASR, algorithm) may be employed to identify the character dialogs in the video to be split, as well as the time information of each dialog. Of course, in other implementations, other suitable algorithms may be employed to obtain the time information of the character dialogs.

Step S110: and adjusting the starting time and/or the ending time corresponding to the split video by using the time information of the character conversation.

In order to improve the precision of the split videos and avoid a character dialog being cut off at the boundary of a split video, the time information of the character dialogs is used to adjust the split videos. The adjustment process is explained below taking a video split bar as an example:

for example, if it is determined, according to the start time of video split bar A and the time information of the character dialogs, that the starting boundary of video split bar A falls in the middle of a certain character dialog, the start time of video split bar A is adjusted to the start time or the end time of that dialog.

For another example, if it is determined, based on the end time of video split bar B and the time information of the character dialogs, that the terminating boundary of video split bar B falls in the middle of a certain character dialog, the end time of video split bar B is adjusted to the start time or the end time of that dialog.
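A minimal sketch of the boundary adjustment of steps S108 and S110 follows, assuming the character dialogs have already been reduced (e.g., by an ASR pass) to (start, end) intervals in seconds:

def snap_boundary(t, dialogs, snap_to="start"):
    """If time t lies strictly inside a dialog interval, move it to the
    dialog's start or end time; otherwise return it unchanged."""
    for d_start, d_end in dialogs:
        if d_start < t < d_end:   # the boundary would interrupt this dialog
            return d_start if snap_to == "start" else d_end
    return t

dialogs = [(95.0, 104.5), (130.2, 141.0)]
bar_start, bar_end = 100.0, 135.0
bar_start = snap_boundary(bar_start, dialogs, snap_to="start")  # -> 95.0
bar_end = snap_boundary(bar_end, dialogs, snap_to="end")        # -> 141.0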

Optionally, in order to further improve the quality of the split video, in this embodiment, the method further includes step S112.

It should be noted that step S112 may be executed before, after, or in parallel with the above steps, which is not limited in this embodiment.

Step S112: and filtering the split video according to the splitting time length configured by the user and the time length of the split video so as to screen out the split video with the time length smaller than the configured splitting time length.

The user may configure the splitting duration through the interactive interface, such as 10 seconds, 30 seconds, 1 minute, 5 minutes, and so on. The splitting duration indicates the shortest duration allowed for a split video. If the duration of a split video is less than the splitting duration, the split video is deleted; split videos that are too short are thus screened out, which improves the quality of the split videos and reduces the number of meaningless split videos.
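Step S112 then reduces to a simple duration filter, sketched below with all times in seconds:

def filter_by_duration(split_videos, min_duration):
    """split_videos: list of (start, end) pairs; keep only those whose
    duration is at least the user-configured splitting duration."""
    return [(s, e) for s, e in split_videos if e - s >= min_duration]

kept = filter_by_duration([(0, 8), (10, 70), (75, 80)], min_duration=10)
# -> [(10, 70)]: the 8-second and 5-second split videos are screened out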

Optionally, in order to improve the interactivity and better meet the requirements of the user, the method further includes step S114 to step S116.

It should be noted that step S114 may be executed before, after, or in parallel with the above steps, which is not limited in this embodiment.

Step S114: and acquiring a boundary adjustment request for indicating adjustment of the split video boundary, wherein the boundary adjustment request comprises information of the target split video, and target starting time and/or target ending time.

The boundary adjustment request may be a request generated by performing a boundary adjustment operation on the displayed split video through an interactive interface by a user. For example, if the user adjusts the starting boundary of a split video (e.g., video split bar F) displayed through the interactive interface, a corresponding boundary adjustment request is generated.

The boundary adjustment request includes information of the target split video operated on by the user, such as an ID, a name, or another identifier of the video. In addition, the request includes the target start time and/or the target end time of the target split video. For example, if the original start time of the target split video is 1:40, the adjusted target start time may be 1:35, and so on.

Step S116: and adjusting the starting time and/or the ending time of the split video indicated by the information of the target split video to the corresponding target starting time and/or target ending time.

The start time of the target split video is adjusted to the corresponding target start time, and/or its end time is adjusted to the target end time, so that the adjusted boundaries of the split video meet the user's requirements. This improves interactivity during video splitting and gives the user more freedom.

In addition, when the target split video is adjusted, other split videos adjacent to it in time sequence may also be adjusted as needed, or may be left unadjusted, which is not limited in this embodiment.

During the adjustment, if the end time of the preceding split video falls after the target start time, the end time of the preceding split video may be adjusted to the target start time. Similarly, if the start time of the following split video falls before the target end time, the start time of the following split video may be adjusted to the target end time.
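A sketch of steps S114 to S116, including the optional reconciliation of neighboring split videos described above, follows; the list-of-pairs layout is an assumption for illustration.

def adjust_boundary(bars, target_idx, new_start=None, new_end=None):
    """bars: chronologically ordered list of [start, end] pairs (seconds).
    Overwrites the target split video's boundaries and clamps its neighbors
    so they no longer overlap the adjusted boundaries."""
    bar = bars[target_idx]
    if new_start is not None:
        bar[0] = new_start
        prev = bars[target_idx - 1] if target_idx > 0 else None
        if prev is not None and prev[1] > new_start:
            prev[1] = new_start    # preceding split video ended after the new start
    if new_end is not None:
        bar[1] = new_end
        nxt = bars[target_idx + 1] if target_idx + 1 < len(bars) else None
        if nxt is not None and nxt[0] < new_end:
            nxt[0] = new_end       # following split video started before the new end
    return bars

bars = [[0.0, 100.0], [100.0, 200.0], [200.0, 300.0]]
adjust_boundary(bars, target_idx=1, new_start=95.0)
# -> [[0.0, 95.0], [95.0, 200.0], [200.0, 300.0]]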

Optionally, in order to improve the interactivity and meet the personalized requirements of the user, the method further includes step S118 to step S120.

It should be noted that step S118 may be executed before, after, or in parallel with the above steps.

Step S118: acquiring an add request for instructing addition of a split video.

The user can add a new video on the basis of the obtained split videos through the interactive interface. The add request may be generated based on an add operation by the user, and includes the time information of the video to be added.

For example, the time information of the video to be added may be the time information of one or more video clips, which can be represented as {"1:00-1:35", "2:04-2:35"}.

Step S120: and obtaining the video to be added requested by the adding request according to the time information of the video to be added.

When a new video is added, the corresponding portions of video can be cut out of the video to be split according to the time information of the video to be added and spliced into the requested video to be added.
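A sketch of steps S118 and S120 follows, assuming the add request carries time ranges as "m:ss-m:ss" strings as in the example above; the parsed intervals would then be cut from the video to be split and spliced together by whatever video toolchain is in use.

def parse_range(r):
    """'1:00-1:35' -> (60.0, 95.0), i.e. start and end in seconds."""
    def to_seconds(ts):
        secs = 0.0
        for part in ts.split(":"):
            secs = secs * 60 + float(part)
        return secs
    start, end = r.split("-")
    return to_seconds(start.strip()), to_seconds(end.strip())

add_request = {"ranges": ["1:00-1:35", "2:04-2:35"]}
clips = [parse_range(r) for r in add_request["ranges"]]
# -> [(60.0, 95.0), (124.0, 155.0)]; these intervals are then cut out of
# the video to be split and spliced into the requested new video.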

Alternatively, in other embodiments, the requested video to be added may be obtained in other manners, which is not limited by this embodiment.

Through this embodiment, the video to be split is segmented into corresponding video segments, each segment is analyzed according to the analysis dimension indicated by the configuration information to obtain standardized structural analysis results, and the segments are aggregated according to those results to obtain the required split videos, so that the video to be split is split automatically while different analysis dimension requirements are satisfied, improving adaptability and video splitting efficiency.

In addition, with the method of this embodiment, a user can adjust the obtained split videos as needed, including but not limited to adjusting the boundary of a split video, adding a new video (i.e., a new video split bar), and deleting an existing split video, so that interactive video splitting is realized and the interactive capability is improved.

The video processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.

EXAMPLE five

Referring to fig. 5a, a flow chart of steps of a video processing method according to a fifth embodiment of the present invention is shown.

The video processing method of the present embodiment explains the video processing method provided by the embodiment of the present invention from the perspective of a terminal device side, and includes the following steps:

step S502: the method comprises the steps of receiving information of a video to be split input by a user through an interactive interface, and generating a video splitting request according to the information of the video to be split.

The video processing method of this embodiment can be executed by a terminal device, or by a server configured with a display screen. The display screen can be separate from the server side, with data transmission realized through a network. The display screen may be used only for displaying pictures, receiving user operations, outputting sounds, and communicating with the server.

In the present embodiment, description is made taking an example of execution by a terminal device.

When the interactive interface is as shown in interface 1 in fig. 5b, the user can input the information of the video to be split through the interactive interface. The information of the video to be split may include the video to be split, and may also include the name, ID, URL, etc. of the video to be split.

According to the information of the video to be split, a video splitting request for indicating to obtain the split video corresponding to the video to be split can be generated. The terminal device can send the video splitting request to the server.

Optionally, to further improve interactivity, step S502 may be implemented as: acquiring second configuration information added by the user through the interactive interface; and generating the video splitting request according to the information of the video to be split and the second configuration information.

When the interactive interface is as shown in interface 2 in fig. 5b, the user may add the second configuration information through the interactive interface.

The second configuration information comprises at least the analysis dimensions configured by the user, such as a scene analysis dimension, an event analysis dimension, and a person analysis dimension.

In addition, the second configuration information may further include the type of the video to be split, information of the joining image, person identity category information, and the like. The type of the video to be split can indicate whether the video to be split is a film or television episode or a news video. The joining image may be an image (e.g., a transition image) included in the video to be split for joining different scenes. The person identity category information may indicate the identity type of the persons in the video to be split, such as celebrity.

In the case that the second configuration information is provided, the video splitting request is generated according to the information of the video to be split and the second configuration information. In this way, when the video to be split is split, the splitting can be performed as indicated by the second configuration information, thereby satisfying the personalized requirements of the user.
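By way of illustration only, the video splitting request of step S502 might be assembled as follows; the field names are assumptions, since this embodiment specifies the information carried by the request but not its format.

# Hypothetical request payload combining the video information with the
# user-supplied second configuration information; all field names assumed.
split_request = {
    "video": {"id": "ep_001", "url": "https://example.com/video.mp4"},
    "second_config": {
        "analysis_dimensions": ["scene", "person", "event"],
        "video_type": "tv_series",             # e.g. episode vs. news video
        "joining_images": ["transition.png"],  # transition frames between scenes
        "person_identity_category": "star",
    },
}
# The terminal device would then send this request to the server side.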

Step S504: and acquiring the split video responding to the video splitting request, and displaying at least part of the split video in a preview interface.

The split video can be obtained by the method of the foregoing embodiment. The obtained split video is displayed through the interactive interface, so that the interactivity is improved, and a user can adjust the split video according to the requirement of the user.

The following describes an implementation process of the video processing method with reference to a specific usage scenario:

as shown in fig. 5b, when the interactive interface is shown as interface 1 in fig. 5b, the user may add the information of the video to be split. After the user clicks Confirm, the interactive interface may be displayed as interface 2 in fig. 5b. In this interface, the user configures the second configuration information, which includes the analysis dimensions selected by the user, such as a scene analysis dimension, an event analysis dimension, a character analysis dimension, and so on. In addition, joining images can be uploaded. A video splitting request is then generated according to the information of the video to be split and the second configuration information.

According to the video splitting request, the video to be split is segmented using a shot splitting technique (e.g., shot boundary detection, SBD), so that the video to be split is divided into fine-grained video segments corresponding to individual shots.

And carrying out structured analysis on each video clip. For example, if the user selects a scene analysis dimension, a character analysis dimension, and an event analysis dimension, the scene analysis, the character analysis, and the event analysis are performed on the video segment with shot granularity to obtain a corresponding structured analysis result.

For example, a scene structural analysis result, a person structural analysis result, and an event structural analysis result are each obtained for video clip 1. Each structured analysis result includes accurate boundary time information, such as scene time information, appearance time information of a person, or occurrence time information of an event. Split videos of different dimensions can therefore subsequently be generated conveniently using the different structural analysis results, improving convenience.

In addition, audio analysis can be performed on the video to be split, for example by an audio-to-text (ASR) technique, to convert the character dialogs therein into text with precise time boundaries. The time boundaries indicate the time information of the character dialogs.

And carrying out video aggregation according to the shot aggregation condition selected by the user and the corresponding structural analysis result.

For example, if the shot aggregation condition includes a scene aggregation condition, video segments that include the same scene and are continuous in time sequence are aggregated. For instance, video segments of a bedroom scene are aggregated, and the aggregated split video (which may also be referred to as a video split bar) may be represented by the scene time information of the included video segments. Likewise, video clips that contain railway station scenes and are continuous in time sequence can be aggregated, or clips that include kitchen scenes and are continuous in time sequence, and so on. When aggregating by scene, aggregation may be performed not only according to environment scenes but also according to action scenes, such as aggregating video clips including fighting scenes or aggregating video clips including kissing scenes.

It should be noted that time-sequence continuity is to be understood as meaning that there is no third video segment between the two video segments; it is not required that the end time of one video segment coincide with the start time of the other.

For another example, if the shot aggregation condition includes a person aggregation condition, video segments in which the same face appears are aggregated, for example, all video segments in which a certain actor appears are aggregated.

For another example, if the shot aggregation condition includes an event aggregation condition, video clips in which the same event occurs are aggregated, for example, video clips in which a certain news event occurs are aggregated.

In addition, in the aggregation process, the boundaries of the aggregated split videos are adjusted by using the time information of the character conversations, so that the phenomenon that the character conversations are interrupted is avoided.

After the split videos are obtained, they can be filtered according to the filtering condition selected by the user. For example, the split videos are filtered according to the splitting duration selected by the user, and split videos that are too short are screened out.

The obtained split videos can be sent to the terminal device, which displays them. If the user is not satisfied with the split videos, the user may adjust them, for example by adding a split video (also referred to as a video split bar) with arbitrary time boundaries, deleting some split videos, or readjusting the time boundaries of existing split videos. In addition, the user can download the selected split videos through the download option.

In this usage scenario, the user's need to perform secondary processing on the video to be split can be satisfied through the interactive interface: an original complete video to be split is split into a plurality of split videos according to certain logic or specific requirements, and the user can adjust the start time and end time of the obtained split videos as well as add or delete split videos.

In addition, the method can better meet user requirements when the user performs configuration, and can also split the video in multiple dimensions without user participation, so as to satisfy the need for secondary processing of the video to be split.

In another usage scenario, the video processing method is implemented as follows:

as shown in fig. 5c, when the interactive interface is shown as interface 1 in fig. 5c, the user may add a video to be split. After the user clicks Confirm, a video splitting request generated based on the video to be split can be sent to the server side.

The server splits the video to be split according to the video splitting request and a preset splitting mode to obtain split video fragments. For example, the video to be split is split according to the image frame background, and video segments corresponding to different image frame backgrounds are obtained.

Structured analysis is performed on each obtained video clip according to the analysis dimension indicated by the first configuration information. For example, the first configuration information may indicate analysis of the video clips in the scene analysis dimension only, or in the scene analysis dimension, the person analysis dimension, the event analysis dimension, and so on.

Taking as an example structured analysis performed on the video clips in the scene analysis dimension, the character analysis dimension and the event analysis dimension, corresponding scene structured analysis results, character structured analysis results and event structured analysis results are obtained.

The structural analysis results can be returned to the terminal device and displayed in its interactive interface for the user to view. For example, interface 2 in fig. 5c shows a schematic diagram of the terminal device displaying the structural analysis results through the interactive interface, where st indicates the start time of the video clip corresponding to a scene, and et indicates its end time. Besides the structural analysis results, other related information can be displayed as needed, which is not limited in this usage scenario.

The user may select the target aggregation condition in interface 2 (for example, the user selects a scene to aggregate by clicking a check box in the interface, which indicates that the user wants to aggregate according to scene and only aggregate the video segments corresponding to the selected scene; or the user directly clicks a scene analysis result, which indicates that the user wants to aggregate according to scene and aggregate the video segments corresponding to all scenes). In addition, processing of the split video can optionally be requested, such as adding a filter, a special effect, a subtitle, a sticker, and the like. An aggregation request is generated according to the user's input and sent to the server.

And the server determines a target aggregation condition according to the aggregation request, and aggregates the video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video.

For example, if the target aggregation condition includes a scene aggregation condition, obtaining a corresponding scene structural analysis result, and aggregating video segments that include the same scene and are continuous in time sequence according to the scene structural analysis result to obtain a split video.

The aggregation process when the target aggregation condition includes the person aggregation condition and the event aggregation condition is similar, and therefore, the description thereof is omitted.

If the aggregation request indicates that a filter, a special effect, or the like should be added to the split video, the filter, special effect, etc. are added to the split video, and the processed video is sent to the terminal device, which displays it.

In this usage scenario, besides video splitting, assisted video generation may also be performed; for example, a split video is combined with other videos selected by the user to generate a new video. Alternatively, a video segment containing character A may be determined according to the person structural analysis results and edited together with a user-uploaded video segment containing another character B, so that the edited result forms a new video of a conversation between character A and character B, and so on. According to this embodiment, the video to be split is segmented to obtain corresponding video segments, different analyses are performed on the video segments according to the analysis dimensions indicated by the configuration information to obtain structural analysis results, and the video segments can then be aggregated according to the corresponding structural analysis results under different shot aggregation conditions to obtain the required split videos, so that the video to be split is split automatically and different shot aggregation conditions can be satisfied.

The video processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.

EXAMPLE six

Referring to fig. 6, a block diagram of a video processing apparatus according to a sixth embodiment of the present invention is shown.

The video processing apparatus of the present embodiment includes: a splitting module 602, configured to, in response to a video splitting request, obtain at least one video segment in a to-be-split video indicated by the request; the analysis module 604 is configured to analyze the video clip according to the analysis dimension indicated by the configuration information, and obtain a structural analysis result corresponding to the video clip; and an aggregation module 606, configured to aggregate the video segments according to at least the structural analysis result to obtain a split video.

Optionally, the splitting module 602 is configured to, when responding to a video splitting request and obtaining at least one video segment in the video to be split indicated by the request, split the video to be split according to a preset splitting manner, and obtain at least one video segment corresponding to the preset splitting manner.

Optionally, the preset splitting manner includes at least one of the following: splitting according to a lens, splitting according to a person, splitting according to a background of an image frame in a video to be split, and splitting according to a light and shade threshold of the image frame in the video to be split.

Optionally, the analysis module 604 is configured to determine a target analysis dimension according to an analysis dimension included in preset first configuration information or an analysis dimension included in second configuration information input by a user, where the target analysis dimension includes at least one of: a scene analysis dimension, a character analysis dimension, and an event analysis dimension; and analyzing the video clips according to the target analysis dimensionality, and obtaining structural analysis results corresponding to the video clips.

Optionally, the analysis module 604 is configured to, when analyzing the video segment according to the target analysis dimension and obtaining a structured analysis result corresponding to the video segment, perform scene analysis on the video segment and obtain a scene structured analysis result if the target analysis dimension includes a scene analysis dimension, where the scene structured analysis result includes a scene corresponding to the video segment and scene time information of the scene.

Optionally, the second configuration information further includes information of a join image input by a user, where the information of the join image is used to indicate a join image frame used for joining different scenes in the video to be split;

the analysis module 604 is configured to, when performing scene analysis on the video clips and obtaining scene structural analysis results, analyze the video clips according to the joining images by using a scene analysis algorithm.

Optionally, the analysis module 604 is configured to, when analyzing the video segment according to the target analysis dimension and obtaining a structural analysis result corresponding to the video segment, if the target analysis dimension includes a character analysis dimension, perform character analysis on the video segment and obtain a character structural analysis result, where the character structural analysis result includes characters corresponding to faces included in the video segment and occurrence time information of the characters.

Optionally, the second configuration information further includes person category information input by a user, where the person category information is used to indicate an identity category to which a person included in the video to be split belongs; the analysis module 604 is configured to analyze the video segment by using a human analysis algorithm corresponding to the human category information and obtain a human structuralization analysis result when performing human analysis on the video segment and obtaining the human structuralization analysis result.

Optionally, the analysis module 604 is configured to, when the video segment is analyzed according to the target analysis dimension and a structured analysis result corresponding to the video segment is obtained, perform event analysis on the video segment and obtain an event structured analysis result if the target analysis dimension includes an event analysis dimension, where the event structured analysis result includes an event included in the video segment and occurrence time information of the event.

Optionally, the aggregating module 606 is configured to determine a target aggregating condition according to the configuration information, where the target aggregating condition includes at least one of: scene aggregation conditions, character aggregation conditions, and event aggregation conditions; and aggregating the video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video.

Optionally, the aggregating module 606 is configured to, when aggregating the video segments according to the target aggregating condition and the corresponding structured analysis result to obtain the split video, if the target aggregating condition includes a scene aggregating condition, obtain scene structured analysis results corresponding to the multiple video segments; acquiring a segment set corresponding to the split video according to the plurality of scene structural analysis results; wherein the segment set comprises a first set and/or a second set, the first set comprises at least two video segments satisfying a similarity condition, and the similarity condition comprises: the at least two video clips are continuous in time sequence, the similarity between the scene structural analysis results corresponding to any two adjacent video clips meets a scene aggregation threshold, and the second set comprises one video clip which does not meet the similarity condition; and aggregating the video segments in the segment set to obtain the split video.

Optionally, the aggregating module 606 is configured to, when aggregating the video segments according to the target aggregating condition and the corresponding structured analysis result to obtain the split video, if the target aggregating condition includes a human aggregating condition, obtain human structured analysis results corresponding to the multiple video segments; and aggregating video segments corresponding to the human structural analysis results containing the current human to obtain the split video aiming at the human included in the human structural analysis results.

Optionally, the aggregating module 606 is configured to, when aggregating video segments according to the target aggregating condition and the corresponding structured analysis result to obtain the split video, if the target aggregating condition includes an event aggregating condition, obtain the event structured analysis result corresponding to the plurality of video segments; and aggregating video segments corresponding to the event structuring analysis results with the similarity greater than or equal to an event aggregation threshold value with respect to the events in the event structuring analysis results to obtain the split video.

Optionally, the apparatus further comprises:

The audio analysis module 608 is configured to perform audio analysis on the video to be split, and obtain time information of a character conversation in the video to be split;

the first boundary adjusting module 610 is configured to adjust a start time and/or an end time corresponding to the split video by using the time information of the character dialog.

Optionally, the apparatus further comprises:

the screening module 612 is configured to filter the split video according to the splitting duration configured by the user and the duration of the split video, so as to screen the split video with a duration less than the configured splitting duration.

Optionally, the apparatus further comprises:

a first request obtaining module 614, configured to obtain a boundary adjustment request for indicating to adjust the split video boundary, where the boundary adjustment request includes information of a target split video, and a target start time and/or a target end time;

a second boundary adjusting module 616, configured to adjust a start time and/or an end time of the split video indicated by the information of the target split video to the corresponding target start time and/or target end time.

Optionally, the apparatus further comprises:

a second request obtaining module 618, configured to obtain an add request for indicating addition of a split video, where the add request includes time information of the video to be added;

and an adding module 620, configured to obtain the video to be added requested by the add request according to the time information of the video to be added.

The video processing apparatus of this embodiment is configured to implement the corresponding video processing method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the video processing apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.

EXAMPLE seven

Referring to fig. 7, a block diagram of a video processing apparatus according to a seventh embodiment of the present invention is shown.

The video processing apparatus of the present embodiment includes:

a request module 702, configured to receive information of a video to be split input by a user through an interactive interface, and generate a video splitting request according to the information of the video to be split, where the video splitting request is used to instruct obtaining of a split video corresponding to the video to be split;

A display module 704, configured to obtain a split video that responds to the video splitting request, and display at least part of the split video in a preview interface, where the split video is obtained by the apparatus in the sixth embodiment.

Optionally, the requesting module 702 is configured to obtain second configuration information added by the user through the interactive interface, where the second configuration information at least includes an analysis dimension configured by the user; and generating the video splitting request according to the information of the video to be split and the second configuration information.

The video processing apparatus of this embodiment is configured to implement the corresponding video processing method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the video processing apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.

Example eight

Referring to fig. 8, a schematic structural diagram of an electronic device according to an eighth embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.

As shown in fig. 8, the electronic device may include: a processor (processor)802, a Communications Interface 804, a memory 806, and a communication bus 808.

Wherein:

the processor 802, communication interface 804, and memory 806 communicate with one another via a communication bus 808.

A communication interface 804 for communicating with other electronic devices, such as a terminal device or a server.

The processor 802 is configured to execute the program 810, and may specifically perform relevant steps in the above-described video processing method embodiment.

In particular, the program 810 may include program code comprising computer operating instructions.

The processor 802 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device comprises one or more processors, which can be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 806 stores a program 810. The memory 806 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.

The program 810 may be specifically configured to cause the processor 802 to perform the following operations: responding to a video splitting request, and obtaining at least one video segment in a video to be split indicated by the request; analyzing the video clips according to the analysis dimensionality indicated by the configuration information, and obtaining structural analysis results corresponding to the video clips; and aggregating the video segments at least according to the structural analysis result to obtain the split video.

In an optional implementation manner, the program 810 is further configured to, when the processor 802 obtains at least one video segment in the to-be-split video indicated by the request in response to the video splitting request, split the to-be-split video according to a preset splitting manner, and obtain at least one video segment corresponding to the preset splitting manner.

In an optional embodiment, the preset splitting manner includes at least one of the following: splitting according to a lens, splitting according to a person, splitting according to a background of an image frame in a video to be split, and splitting according to a light and shade threshold of the image frame in the video to be split.

In an optional implementation manner, the program 810 is further configured to, when the analyzing dimension indicated by the configuration information is used to analyze the video segment and obtain a structural analysis result corresponding to the video segment, determine a target analyzing dimension according to an analyzing dimension included in preset first configuration information or an analyzing dimension included in second configuration information input by a user, where the target analyzing dimension includes at least one of: a scene analysis dimension, a character analysis dimension, and an event analysis dimension; and analyzing the video clips according to the target analysis dimensionality, and obtaining structural analysis results corresponding to the video clips.

In an optional implementation, the program 810 is further configured to, when the video segment is analyzed according to the target analysis dimension and a structural analysis result corresponding to the video segment is obtained, if the target analysis dimension includes a scene analysis dimension, perform scene analysis on the video segment and obtain a scene structural analysis result, where the scene structural analysis result includes a scene corresponding to the video segment and scene time information of the scene.

In an optional embodiment, the second configuration information further includes information of a join image input by a user, where the information of the join image is used to indicate a join image frame used for joining different scenes in the video to be split; the program 810 is further configured to enable the processor 802 to analyze the video segment according to the connected image and obtain a scene structural analysis result by using a scene analysis algorithm when performing the scene analysis on the video segment and obtaining the scene structural analysis result.

In an optional implementation, the program 810 is further configured to, when the video segment is analyzed according to the target analysis dimension and a structural analysis result corresponding to the video segment is obtained, perform a human character analysis on the video segment and obtain a human structural analysis result if the target analysis dimension includes a human character analysis dimension, where the human structural analysis result includes a human character corresponding to a human face included in the video segment and occurrence time information of the human character.

In an optional embodiment, the second configuration information further includes person category information input by a user, where the person category information is used to indicate an identity category to which a person included in the video to be split belongs; program 810 is further configured to cause processor 802 to, when performing a persona analysis on the video segment and obtaining a persona analysis result, analyze the video segment using a persona analysis algorithm corresponding to the persona category information and obtain the persona analysis result.

In an optional implementation, the program 810 is further configured to, when the video segment is analyzed according to the target analysis dimension and a structured analysis result corresponding to the video segment is obtained, if the target analysis dimension includes an event analysis dimension, perform event analysis on the video segment and obtain an event structured analysis result, where the event structured analysis result includes an event included in the video segment and occurrence time information of the event.

In an alternative embodiment, the program 810 is further configured to enable the processor 802, when aggregating the video segments to obtain the split video according to at least the structural analysis result, to determine a target aggregation condition according to the configuration information, where the target aggregation condition includes at least one of: scene aggregation conditions, character aggregation conditions, and event aggregation conditions; and aggregating the video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video.

In an optional implementation manner, the program 810 is further configured to enable the processor 802 to, when aggregating video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video, obtain scene structural analysis results corresponding to a plurality of video segments if the target aggregation condition includes a scene aggregation condition; acquiring a segment set corresponding to the split video according to the plurality of scene structural analysis results; wherein the segment set comprises a first set and/or a second set, the first set comprises at least two video segments satisfying a similarity condition, and the similarity condition comprises: the at least two video clips are continuous in time sequence, the similarity between the scene structural analysis results corresponding to any two adjacent video clips meets a scene aggregation threshold, and the second set comprises one video clip which does not meet the similarity condition; and aggregating the video segments in the segment set to obtain the split video.

In an optional implementation manner, the program 810 is further configured to, when the aggregating is performed on the video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video, if the target aggregation condition includes a human aggregation condition, obtain human structural analysis results corresponding to a plurality of video segments; and aggregating video segments corresponding to the human structural analysis results containing the current human to obtain the split video aiming at the human included in the human structural analysis results.

In an optional implementation, the program 810 is further configured to, when the aggregating is performed on the video segments according to the target aggregation condition and the corresponding structural analysis result to obtain the split video, if the target aggregation condition includes an event aggregation condition, obtain event structural analysis results corresponding to a plurality of the video segments by the processor 802; and aggregating video segments corresponding to the event structuring analysis results with the similarity greater than or equal to an event aggregation threshold value with respect to the events in the event structuring analysis results to obtain the split video.

In an alternative embodiment, the program 810 is further configured to enable the processor 802 to perform audio analysis on the video to be split, and obtain time information of a human conversation in the video to be split; and adjusting the starting time and/or the ending time corresponding to the split video by using the time information of the character conversation.

In an alternative embodiment, the program 810 is further configured to cause the processor 802 to filter the split video according to a splitting duration configured by a user and a duration of the split video to screen out the split video having a duration less than the configured splitting duration.

In an optional implementation, the program 810 is further configured to cause the processor 802 to obtain a boundary adjustment request for indicating to adjust the boundary of the split video, where the boundary adjustment request includes information of the target split video, and a target start time and/or a target end time; and adjusting the starting time and/or the ending time of the split video indicated by the information of the target split video to the corresponding target starting time and/or target ending time.

In an alternative embodiment, the program 810 is further configured to enable the processor 802 to obtain an add request for instructing addition of a split video, where the add request includes time information of the video to be added; and obtain the video to be added requested by the add request according to the time information of the video to be added.

Alternatively,

the program 810 may be specifically configured to cause the processor 802 to perform the following operations: receiving information of a video to be split input by a user through an interactive interface, and generating a video splitting request according to the information of the video to be split, wherein the video splitting request is used for indicating to obtain a split video corresponding to the video to be split; acquiring a split video responding to the video splitting request, and displaying at least part of the split video in a preview interface, wherein the split video is obtained by the method of any one of claims 1 to 17.

In an optional implementation manner, the program 810 is further configured to, when the processor 802 generates a video splitting request according to a video to be split that is added by a user through an interactive interface, obtain second configuration information that is added by the user through the interactive interface, where the second configuration information at least includes an analysis dimension configured by the user; and generating the video splitting request according to the information of the video to be split and the second configuration information.

For specific implementation of each step in the program 810, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing embodiments of the video processing method, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.

The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed, as software stored on a recording medium, by a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the video processing method described herein. Further, when a general-purpose computer accesses code for implementing the video processing method shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the video processing method shown herein.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.
