Intelligent device, video playing method and video processing method

Document No.: 73226    Publication date: 2021-10-01

Note: this technology, "Intelligent device, video playing method and video processing method", was designed and created by 高桢, 孙菁, 高语函, 曲磊, 李正义, 赵启东 and 谢飞学 on 2020-04-27. Its main content is as follows. The invention discloses a smart device, a video playing method and a video processing method. A video to be processed can be acquired by a processor, and reference key frames of the video to be processed are determined. The video to be processed can then be divided into a plurality of continuous video subsequences according to the similarity between the video frames in the video to be processed and the reference key frames, so that the content of the video to be processed is segmented automatically. The first video frame in each video subsequence is taken as the marker frame of that video subsequence, completing the automatic segmentation. Thus, when the display plays the video processed by the processor, automatic jumps to the previous or next video subsequence can be made via the marker frames of the video subsequences.

1. A smart device, comprising:

a housing;

a display arranged on the housing;

a memory;

a processor configured to:

acquiring a video to be processed;

determining a plurality of reference key frames in the video to be processed; wherein the plurality of reference key frames are arranged sequentially;

dividing the video to be processed into a plurality of continuous video subsequences according to the similarity relation between the video frame in the video to be processed and the reference key frame; wherein one of said video sub-sequences corresponds to one of said reference key frames;

determining a first video frame in each of the video sub-sequences as a marker frame of each of the video sub-sequences;

and storing, in the memory, the video to be processed in which the marker frame of each video subsequence has been determined.

2. The smart device of claim 1, wherein the processor is further configured to:

extracting a feature vector of each video frame in the video to be processed;

determining a plurality of candidate key frames in the video to be processed according to the feature vectors of all video frames in the video to be processed; wherein the plurality of candidate keyframes are arranged in a sequence;

determining the initial similarity between every two adjacent candidate key frames according to the feature vectors of the candidate key frames;

for each initial similarity, judging whether the initial similarity is larger than an initial similarity threshold value;

if so, determining the earlier of the two candidate key frames corresponding to the initial similarity as a reference key frame, and discarding the other candidate key frame;

and if not, determining the two candidate key frames corresponding to the initial similarity as reference key frames.

3. The smart device of claim 1 or 2, wherein the processor is further configured to:

for each video frame between the sequentially arranged nth reference key frame and (n+1)th reference key frame, determining a first similarity between the video frame and the nth reference key frame and a second similarity between the video frame and the (n+1)th reference key frame;

for each video frame between the sequentially arranged nth reference key frame and (n+1)th reference key frame, determining that the video frame has the same attribute as the nth reference key frame when its first similarity is greater than its second similarity, and determining that the video frame has the same attribute as the (n+1)th reference key frame when its first similarity is less than its second similarity;

and dividing the video to be processed into a plurality of continuous video subsequences according to the rule that video frames having the same attribute as a reference key frame are grouped with that reference key frame into one video subsequence.

4. The smart device of claim 1 or 2, wherein the processor is further configured to:

determining an intermediate key frame in a jth video subsequence of the plurality of video subsequences, and updating the reference key frame of the jth video subsequence according to the intermediate key frame;

and when the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating, taking the first video frame in the jth video subsequence as the marker frame of the jth video subsequence.

5. The smart device of claim 4, wherein the processor is further configured to:

judging whether the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating;

if so, taking the first video frame in the jth video subsequence as the marker frame of the jth video subsequence;

if not, for the video frames between the updated reference key frame in the jth video subsequence and the updated reference key frame in an adjacent video subsequence, determining a third similarity between each such video frame and the updated reference key frame in the jth video subsequence, and a fourth similarity between each such video frame and the updated reference key frame in the adjacent video subsequence;

for a video frame between an updated reference key frame in the jth video subsequence and an updated reference key frame in an adjacent video subsequence, determining that the video frame belongs to the jth video subsequence when a third similarity corresponding to the video frame is greater than a fourth similarity, and determining that the video frame belongs to a video subsequence adjacent to the jth video subsequence when the third similarity corresponding to the video frame is less than the fourth similarity;

and determining the intermediate key frame in the jth video subsequence again, and updating the updated reference key frame in the jth video subsequence again according to the intermediate key frame, until the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating.

6. The smart device of claim 1 or 2, wherein the processor is further configured to:

when the display plays the video to be processed that has been processed by the processor, receiving a first input instruction, controlling the video played by the display to switch from the video frame of the current video subsequence to the marker frame of the previous video subsequence adjacent to the current video subsequence, and playing the switched-to previous video subsequence on the display from its marker frame;

and when the display plays the video to be processed that has been processed by the processor, receiving a second input instruction, controlling the video played on the display to switch from the video frame of the current video subsequence to the marker frame of the next video subsequence adjacent to the current video subsequence, and playing the switched-to next video subsequence on the display from its marker frame.

7. A smart device, comprising:

a housing;

a display arranged on the housing;

a processor configured to:

receiving a video to be played;

receiving a first input instruction, controlling a video played by the display to be switched from a current video subsequence to a previous video subsequence adjacent to the current video subsequence, and playing the switched previous video subsequence on the display;

and/or receiving a second input instruction, controlling the video played on the display to be switched from the current video subsequence to a next video subsequence adjacent to the current video subsequence, and playing the switched next video subsequence on the display;

wherein the video to be played is divided into a plurality of continuous video subsequences, and adjacent video subsequences comprise video frames of different operation steps or operation states.

8. The smart device of claim 7, wherein the processor is further configured to:

receiving a third input instruction, and controlling the video played on the display to be paused at the current video frame;

receiving a fourth input instruction, and controlling the paused video on the display to start playing from the current video frame;

receiving a fifth input instruction, and controlling the video played on the display to advance by a first preset number of video frames;

and receiving a sixth input instruction, and controlling the video played on the display to rewind by a second preset number of video frames.

9. A video processing method, comprising:

acquiring a video to be processed;

determining a plurality of reference key frames in the video to be processed; wherein the plurality of reference key frames are arranged sequentially;

dividing the video to be processed into a plurality of continuous video subsequences according to the similarity relation between the video frame in the video to be processed and the reference key frame; wherein one of said video sub-sequences corresponds to one of said reference key frames;

determining a first video frame in each of the video sub-sequences as a marker frame of each of the video sub-sequences;

and storing, in the memory, the video to be processed in which the marker frame of each video subsequence has been determined.

10. A video playback method, comprising:

receiving a video to be played;

receiving a first input instruction, controlling a video played by the display to be switched from a current video subsequence to a previous video subsequence adjacent to the current video subsequence, and playing the switched previous video subsequence on the display;

and/or receiving a second input instruction, controlling the video played on the display to be switched from the current video subsequence to a next video subsequence adjacent to the current video subsequence, and playing the switched next video subsequence on the display;

wherein the video to be played is divided into a plurality of continuous video subsequences, and adjacent video subsequences comprise video frames of different operation steps or operation states.

Technical Field

The invention relates to the technical field of smart devices, and in particular to a smart device, a video playing method and a video processing method.

Background

With the continuous improvement of living standards, smart devices have become indispensable household appliances in people's lives, and as everyday needs grow, users place increasingly high demands on the intelligence of smart device products.

Disclosure of Invention

The embodiments of the invention provide a smart device, a video playing method and a video processing method, which are used for realizing automatic segmentation of videos.

The smart device provided by the embodiments of the present application includes: a housing; a memory; and a processor and a display arranged on the housing.

The processor is configured to: acquire a video to be processed; determine a plurality of reference key frames in the video to be processed; divide the video to be processed into a plurality of continuous video subsequences according to the similarity between the video frames in the video to be processed and the reference key frames; determine a first video frame in each of the video subsequences as the marker frame of that video subsequence; and store, in the memory, the video to be processed in which the marker frame of each video subsequence has been determined. The plurality of reference key frames are arranged sequentially, and one video subsequence corresponds to one reference key frame.

In some embodiments of the present application, a video to be processed may be obtained by the processor, and reference key frames of the video to be processed are determined. The video to be processed can then be divided into a plurality of continuous video subsequences according to the similarity between the video frames in the video to be processed and the reference key frames, so that the content of the video to be processed is segmented automatically. The first video frame in each video subsequence is taken as the marker frame of that video subsequence, completing the automatic segmentation. Thus, when the display plays the video processed by the processor, automatic jumps to the previous or next video subsequence can be made via the marker frames of the video subsequences.

In some embodiments of the present application, among the extracted candidate key frames, two adjacent candidate key frames may be highly similar, so an initial similarity can be determined for every two adjacent candidate key frames. If the initial similarity is greater than the initial similarity threshold, the two candidate key frames are highly similar; the candidate key frame with the smaller sequence number can then be selected as a reference key frame and the other discarded. Otherwise, both candidate key frames are retained and both serve as reference key frames.

In some embodiments of the present application, the reference key frame is the image in each video subsequence that best represents the video content the subsequence will play. The attribute of each video frame in the video is judged against the obtained reference key frames, so that video frames with the same attribute can be grouped into the same video subsequence. Because video frames with the same attribute are divided, together with their reference key frame, into one video subsequence, the video frames in a subsequence form one shot or scene. Dividing the video to be processed into a plurality of continuous video subsequences therefore segments it automatically and yields the content segmentation time points.

In some embodiments of the present application, when the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating, the video frames in the jth video subsequence can be taken to form one shot or scene, so the content of the video to be processed can be segmented automatically.

The smart device provided by the embodiments of the present application includes: a housing; a display arranged on the housing; and a processor.

The processor is configured to: receive a video to be played; receive a first input instruction, control the video played by the display to switch from the current video subsequence to the previous video subsequence adjacent to it, and play the switched-to previous video subsequence on the display; and/or receive a second input instruction, control the video played on the display to switch from the current video subsequence to the next video subsequence adjacent to it, and play the switched-to next video subsequence on the display. The video to be played is divided into a plurality of continuous video subsequences, and adjacent video subsequences comprise video frames of different operation steps or operation states.

In some embodiments of the present application, a video to be played may be obtained by the processor, and reference key frames of the video to be played are determined. The video to be played can thus be divided into a plurality of continuous video subsequences according to the similarity between the video frames in the video to be played and the reference key frames, so that the content of the video to be played is segmented automatically. The first video frame in each video subsequence is taken as the marker frame of that video subsequence, completing the automatic segmentation. Thus, when the display plays the video processed by the processor, receiving the first input instruction makes the video played on the display jump automatically to the marker frame of the previous video subsequence and resume playing from there, while receiving the second input instruction makes it jump automatically to the marker frame of the next video subsequence and resume playing from there.

The video processing method provided by the embodiments of the present application includes the following steps: acquiring a video to be processed; determining a plurality of reference key frames in the video to be processed; dividing the video to be processed into a plurality of continuous video subsequences according to the similarity between the video frames in the video to be processed and the reference key frames; determining a first video frame in each of the video subsequences as the marker frame of that video subsequence; and storing, in the memory, the video to be processed in which the marker frame of each video subsequence has been determined. The plurality of reference key frames are arranged sequentially, and one video subsequence corresponds to one reference key frame.

The video playing method provided by the embodiment of the application comprises the following steps: receiving a video to be played; receiving a first input instruction, controlling a video played by the display to be switched from a current video subsequence to a previous video subsequence adjacent to the current video subsequence, and playing the switched previous video subsequence on the display; and/or receiving a second input instruction, controlling the video played on the display to be switched from the current video subsequence to a next video subsequence adjacent to the current video subsequence, and playing the switched next video subsequence on the display; the video to be played is divided into a plurality of continuous video subsequences, and the adjacent video subsequences comprise video frames in different operation steps or operation states.

Drawings

Fig. 1a is a schematic structural diagram of a refrigerator according to some embodiments of the present application;

FIG. 1b is a schematic view of a portion of a refrigerator according to some embodiments of the present disclosure;

fig. 2 is a schematic structural view of a range hood provided in some embodiments of the present application;

FIG. 3 is a block diagram of a configuration of some smart devices provided by some embodiments of the present application;

FIG. 4 is a block diagram of an architectural configuration in a processor provided by some embodiments of the present application;

fig. 5 is a flowchart of a video playing method according to some embodiments of the present application;

fig. 6 is a flowchart of a video processing method according to further embodiments of the present application;

fig. 7 is a flow chart of a video processing method according to further embodiments of the present application;

fig. 8 is a flow chart of a video processing method according to further embodiments of the present application;

FIG. 9 is a schematic diagram of a portion of a video frame provided by some embodiments of the present application;

FIG. 10a is a diagram of a first reference video frame according to some embodiments of the present application;

FIG. 10b is a schematic diagram of a video frame f_i provided by some embodiments of the present application;

FIG. 10c is a diagram of a second reference video frame provided by some embodiments of the present application;

fig. 11 is a flowchart of a video playing method according to some embodiments of the present application.

Reference numerals:

0100-storage room, 0100A-freezing room, 0100B-refrigerating room, 0200-door body, 0200A-freezing room door body, 0200B-refrigerating room door body, 0101-storage drawer, 0102-first shelf, 0103-second shelf; 10-an image acquisition unit; 110-controller, 120-memory, 130-communicator, 140-user input interface, 150-user output interface, 160-power supply, 170-image acquisition interface, 180-display; 111-random access memory, 112-read only memory, 113-processor; 131-an infrared signal interface, 132-a radio frequency signal interface, 133-a WIFI module, 134-a Bluetooth module and 135-a wired Ethernet module; 141-microphone, 142-touchpad, 143-tactile sensor, 144-key, 151-LED interface, 152-vibration interface, 153-sound output interface.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. And the embodiments and features of the embodiments may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.

Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. The use of "first," "second," and similar terms in the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

It should be noted that the sizes and shapes of the figures in the drawings are not to be considered true scale, but are merely intended to schematically illustrate the present invention. And the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout.

Common equipment in a modern kitchen includes stoves (e.g., gas stoves, microwave ovens or ovens), cooking stations (e.g., cabinets, sinks), refrigerators, dishwashers, range hoods, and the like. As living demands increase, people place ever higher requirements on the intelligence of kitchen equipment. As shown in fig. 1a, the smart device may include a refrigerator. As shown in fig. 2, the smart device may also include a range hood. The following description takes a refrigerator as an example. Of course, in practical applications, which specific product the smart device is implemented as may be designed and determined according to practical application requirements, and is not limited herein.

Fig. 1a is a schematic perspective view of some refrigerators in some embodiments of the present application. Referring to fig. 1a, the refrigerator 1 of the present embodiment may include a cabinet of an approximately rectangular parallelepiped shape. The cabinet may include a storage chamber 0100 and a door body 0200 movably connected to the cabinet. In practical applications, the appearance of the refrigerator 1 may be defined by the storage chamber 0100, which delimits the storage space, and the door body 0200 provided on the storage chamber 0100. In some examples, the storage chamber 0100 is a box with an opening, formed by a storage chamber liner, a storage chamber housing, and a foaming layer between them. The door body 0200 is used to cover the opening of the storage chamber 0100. The storage chamber 0100 is vertically partitioned into a freezing chamber 0100A located below and a refrigerating chamber 0100B located above, and the freezing chamber 0100A and the refrigerating chamber 0100B may each have an independent storage space.

In some examples, the door bodies 0200 may include a freezing chamber door body 0200A and a refrigerating chamber door body 0200B. Also, freezing chamber 0100A is defined at a lower side of storage chamber 0100 and an opening of freezing chamber 0100A can be selectively covered by freezing chamber door body 0200A.

In some examples, the refrigerating compartment 0100B is defined at an upper side of the storage compartment 0100, i.e., the refrigerating compartment 0100B is disposed above the freezing compartment 0100A. Further, the opening of refrigerating room 0100B is selectively covered with refrigerating room door 0200B. In practical applications, the refrigerating chamber door body 0200B is pivotally mounted on the refrigerating chamber 0100B, so that the opening of the refrigerating chamber 0100B can be selectively opened or closed by the refrigerating chamber door body 0200B.

In some examples, the storage room 0100 of the refrigerator in the embodiment of the present invention may include a storage drawer 0101, and a first shelf 0102 and a second shelf 0103 above the storage drawer 0101. The storage drawer 0101, the first shelf 0102 and the second shelf 0103 can each be used to hold food materials (e.g., fruits, vegetables, etc.).

As demands on quality of life have risen, food teaching videos have entered users' lives, making their diets richer and more diverse. For example, when a user cooks a dish, the user can play and watch a food teaching video on the display of the refrigerator, "watching while operating" with the video as a reference, which provides more convenient cooking guidance. In practical applications, a user often needs playback operations such as "pause", "play", "fast forward", "rewind", "next step" and "previous step" while watching such a video. Conventionally, the segmentation time points of the content in these videos are annotated manually in advance. However, for today's massive video resources, the segmentation time points cannot all be labeled manually in advance, and doing so consumes manpower and material resources. This makes users' demand for more intelligent refrigerator products ever higher.

The refrigerator provided by some embodiments of the application can be applied to scenarios where a user plays videos: for example, the user decides on a dish to make and selects a recipe for it; the recipe corresponds to a food teaching video, which the display on the refrigerator can play.

In some embodiments of the present invention, as shown in fig. 1b, the refrigerator may further include an image capturing unit 10 configured to capture multiple frames of detection images of food materials while the user puts food into or takes food out of the storage chamber. For example, detection images of the food material that the user puts into the storage chamber may be captured by the image capturing unit 10 each time, so that the type of the food material put in can be identified from the detection images. Likewise, multiple frames of detection images of the food material the user takes out of the storage chamber may be captured, so that the type of the food material taken out can be identified. Of course, the user can also register food materials put into or taken out of the refrigerator through voice interaction. The image capturing unit 10 may also capture a face image of the person using the refrigerator for face recognition.

In some examples, the image acquisition unit may be a color camera, a depth camera, or a combination of both. The color camera may be a general color camera or a wide-angle color camera. The depth camera may be a binocular camera, a structured light camera, or a camera based on time of flight (TOF).

In some embodiments of the present invention, the viewing angle of the image capturing unit can cover the whole refrigerating chamber and/or the whole freezing chamber, so that detection images of the food material can be captured while the user handles it. In some examples, in response to the door body being opened, the image acquisition unit captures multiple frames of detection images containing the food material to be identified. For example, when the refrigerating chamber door body 0200B is opened, images are captured while the user takes food materials, so that multiple frames of detection images containing the food material to be identified are collected.

In some examples, as shown in fig. 1B, the image capturing unit 10 may be installed at the top end inside a storage compartment (e.g., a refrigerating compartment 0100B) of a refrigerator. Alternatively, the image capturing unit 10 may be installed at the top end (e.g., near the top of the refrigerating chamber door 0200B) outside the storage chamber (e.g., the refrigerating chamber 0100B) of the refrigerator.

In some embodiments of the invention, some block configurations of smart devices are illustrated in FIG. 3. As shown in fig. 3, the smart device may further include a controller 110, a memory 120, a communicator 130, a user input interface 140, a user output interface 150, a power supply 160, an image capture interface 170, and a display 180.

The controller 110 includes a Random Access Memory (RAM) 111, a Read Only Memory (ROM) 112, a processor 113, a communication interface, and a communication bus. The controller 110 is used to control the operation of the above devices, as well as the communication cooperation between the internal components, and the external and internal data processing functions.

Illustratively, when an interaction of a user pressing the key 144 or an interaction of a touch on the touch pad 142 is detected, the controller 110 may control the processor 113 to generate a signal corresponding to the detected interaction and transmit the signal to the display 180 so that the display 180 may display the corresponding content or screen.

In some examples, the processor 113 may be configured to receive the multiple frames of detection images acquired by the image acquisition unit and determine a feature vector of each frame of detection image; determine, according to predetermined class feature vectors corresponding to a plurality of different food material types and the feature vector of each frame of detection image, a target confidence probability that the food material to be identified corresponds to each food material type; and determine the food material type corresponding to the maximum target confidence probability as the type of the food material to be identified. Further, the processor 113 may send a control instruction to the display 180 according to the determined type of the food material to be recognized, so that the display 180 may display the determined type or an image of the food material to be recognized.

In some embodiments of the present invention, the processor 113 may be configured to identify the type of food material being accessed when it is taken out of or put into the storage room of the smart device, and may perform dynamic identification across multiple frames of images to improve the accuracy of identifying the food material type. In addition, the food materials taken out of or put into the storeroom by the user can be managed: for example, the types, quantities and shelf lives of the food materials the user puts into the storeroom can be recorded and managed, forming the user's food storage record, and the types and quantities of the food materials taken out each time can be recorded and managed, forming the user's food retrieval record.

In some examples, processor 113 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a combination of a CPU and a GPU. The processor may further include a hardware chip. The hardware chip may be an Application-Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), Generic Array Logic (GAL), or any combination thereof.

The memory 120 stores various operating programs, data and applications for driving and controlling the smart device under the control of the controller 110, and may store various control signal commands input by the user. In some examples, the memory is coupled to the processor via a bus or other means and stores at least one instruction, at least one program, code set or instruction set, which is loaded and executed by the processor.

The communicator 130 may be a component for communicating with an external device or an external server according to various communication protocol types. For example, the smart device may transmit content data to an external device connected via the communicator 130, or browse and download content data from such a device. The communicator 130 may include network or near-field communication protocol modules, such as an infrared signal interface 131, a radio frequency signal interface 132, a WIFI module 133, a Bluetooth communication protocol module 134 and a wired Ethernet communication protocol module 135, so that the communicator 130 can exchange control signals and data signals with an external device or external server under the control of the controller 110. For example, with the infrared signal interface, an infrared control signal sent by the user is converted according to the infrared control protocol and then output to the controller 110; with the radio frequency signal interface, a command input by the user in radio-frequency control signal mode is converted and then output to the controller 110. Similarly, the WIFI module 133, the Bluetooth communication protocol module 134 and the wired Ethernet communication protocol module 135 receive control signals from the external device for controlling the smart device, process them, and output them to the controller 110.

The user input interface 140 may include at least one of a microphone 141, a touch pad 142, a sensor 143, a key 144, and the like, so that a user can input instructions for controlling the smart device through voice, touch, gesture, press, and the like. For example, the user may input the video play instruction and the first through sixth input instructions to the processor 113 of the smart device by at least one of voice, touch, gesture and press.

The user output interface 150 works with the user input interface 140: user instructions received by the user input interface 140 are output to the controller 110, which controls the processor 113 to execute the corresponding program steps; after executing them, the processor 113 may control the display 180 to display a corresponding screen or output corresponding content through the user output interface. Here, the user output interface 150 may include an LED interface 151, a vibration interface 152 generating vibration, a sound output interface 153 outputting sound, and the like. For example, the display may receive output signals such as audio, video or data from the user output interface 150 and render them as images on the display, as audio at the sound output interface 153, or as vibrations at the vibration interface 152.

The image acquisition interface 170 provides the signal connection between the image acquisition unit 10 and the smart device. For example, a detection image acquired by the image acquisition unit 10 may be transmitted to the processor 113 in the controller 110 through the image acquisition interface 170.

The display 180 is configured to receive the image signal input by the processor 113 and display video content, images and a menu control interface. The displayed video content may come from the video processed by the processor 113 or from video input via the communicator 130 or the user input interface 140. The display 180 may also display a user manipulation interface (UI) for controlling the smart device. The display 180 may further include a display component for presenting pictures and a driving component for driving image display; alternatively, if the display 180 is a projection display, it may include a projection device and a projection screen.

In some examples, as shown in fig. 1b, the display 180 can be mounted on the refrigerating chamber door body 0200B. Alternatively, the display may be installed at other positions on the housing, which is not limited herein.

The power supply 160 provides operating power support for the components of the smart device under the control of the controller 110, and may take the form of a battery and associated control circuitry.

A block diagram of the architectural configuration of the operating system in memory 120 is illustrated in fig. 4. The operating system architecture comprises an application layer, a middleware layer and a kernel layer from top to bottom.

Application layer: the system's built-in applications and non-system-level applications both belong to the application layer, which is responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a settings application, a post application, a media center application, and the like. These applications may be implemented as Web applications executed on a WebKit engine, and in particular may be developed and executed based on HTML5, Cascading Style Sheets (CSS) and JavaScript.

Here, HTML (HyperText Markup Language) is a standard markup language for creating web pages. Web pages are described by markup tags, which denote text, graphics, animation, sound, tables, links and so on; a browser reads an HTML document, interprets the content of the tags, and displays it in the form of a web page.

CSS (Cascading Style Sheets) is a computer language used to express the style of HTML documents and may be used to define style structures such as fonts, colors and positions. CSS styles can be stored directly in the HTML page or in a separate style file, allowing the styles in the web page to be controlled.

JavaScript is a language for web page programming that can be inserted into an HTML page and interpreted and executed by the browser. The interaction logic of a Web application is implemented in JavaScript, which can also wrap a JavaScript extension interface via the browser to communicate with the kernel layer.

The middleware layer may provide some standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as Multimedia and Hypermedia information coding Experts Group (MHEG) middleware related to data broadcasting, DLNA middleware related to communication with external devices, middleware providing the browser environment in which each application program of the display device runs, and the like.

The kernel layer provides core system services, such as: file management, memory management, process management, network management, system security authority management and the like. The kernel layer may be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.

The kernel layer also provides communication between system software and hardware, and provides device driver services for various hardware: for example, a display driver for the display, a camera driver for the camera, a key driver for the remote control, a WiFi driver for the WIFI module, an audio driver for the audio output interface, a power management driver for the Power Management (PM) module, and so on.

In some embodiments, the user may also input a user command on a Graphical User Interface (GUI) displayed on the display 180, and the controller 110 may receive the user-input command. Among these, "user interfaces" are media interfaces for interaction and information exchange between an application or operating system and a user, which enable the conversion between an internal form of information and a form acceptable to the user. A common presentation form of a user interface is a Graphical User Interface (GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in the display of the electronic device, where the control may include a visual interface element such as an icon, control, menu, tab, text box, dialog box, status bar, channel bar, Widget, etc.

As shown in fig. 5, a processor provided in some embodiments of the present application may be configured to perform the following program steps:

S510, acquiring a video to be processed.

S520, determining a plurality of reference key frames in the video to be processed; wherein the plurality of reference key frames are arranged in sequence.

S530, dividing the video to be processed into a plurality of continuous video subsequences according to the similarity relation between the video frame in the video to be processed and the reference key frame; wherein one video sub-sequence corresponds to one reference key frame.

S540, determining the first video frame in each video subsequence as the marker frame of that video subsequence.

S550, storing, in the memory, the video to be processed in which the marker frame of each video subsequence has been determined.

According to the smart device provided by some embodiments of the application, the video to be processed can be obtained by the processor, and the reference key frames of the video to be processed are determined. The video to be processed can then be divided into a plurality of continuous video subsequences according to the similarity between the video frames in the video to be processed and the reference key frames, so that the content of the video to be processed is segmented automatically. The first video frame in each video subsequence is taken as the marker frame of that video subsequence, completing the automatic segmentation. Thus, when the display plays the video processed by the processor, automatic jumps to the previous or next video subsequence can be made via the marker frames of the video subsequences.

In some examples, a piece of video may be divided into a plurality of scenes, and each scene in turn includes one or more shots. A shot is composed of a number of successive video frame images and represents temporally and spatially continuous action in a scene. A video can thus be viewed as a sequence of consecutive still images, where each still image is a video frame, the smallest unit that makes up the video. A reference key frame is a video frame that can describe the main content of a shot.

In real life, users watch a wide variety of videos: for example, food teaching videos, entertainment videos, handicraft videos and so on. In some examples, the video processed by the processor 113 in the present application may accordingly include food teaching videos, entertainment videos, handicraft videos, and the like. Of course, the video processed by the processor in this application may be determined according to the actual implementation and is not limited herein.

In some examples, the user may input an instruction to the processor 113 to retrieve videos, so that the processor 113 can search for multiple videos through the WIFI module 133. The user does not view every video, but selects some videos of interest for viewing; the video in the embodiments of the application can be such a video of interest selected by the user. After the user selects the videos of interest, the processor performs the processing of steps S510 to S540 on them, so as to cache or download the videos of interest into the local memory 120. When the user controls the processor 113 to play these videos, the processor can fetch the processed videos directly from the memory 120, so that the played video can jump automatically to the previous or next video subsequence through the marker frames of the video subsequences.

In some examples, as shown in fig. 6, the processor 113 may be further configured to perform the following program steps:

S521, extracting the feature vector of each video frame in the video to be processed.

In some examples, the processor 113 may extract image features of each video frame in the video to be processed using a machine learning or deep learning method from the related art, and represent the image features of each video frame as a T × 1 feature vector, where T is the dimension of the feature vector.

In some examples, the color features may include Lab color histograms, HSV color histograms, Luv color histograms, color moments, color coherence vectors, color correlograms, and the like. Structural features may include SIFT (Scale-Invariant Feature Transform) descriptors, Histograms of Oriented Gradients (HOG), Haar features, wavelet descriptors, and the like. Texture features may include LBP (Local Binary Patterns), gray-level co-occurrence matrices, etc.

In some examples, the image features may include at least one of texture features, structural features and color features. For example, a multi-feature fusion method can be adopted: an HSV color histogram, an HOG histogram and an LBP histogram are extracted from each video frame and expressed as 32-bin, 64-bin and 32-bin histogram vectors, respectively. These three vectors are then concatenated into a 128-bin histogram vector, i.e., the feature vector corresponding to each video frame image is a 128 × 1 vector. Of course, in practical applications, the specific choice of image features may be determined according to the requirements of the application, and is not limited herein. A minimal sketch of this fusion appears below.
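A minimal sketch of such a 32 + 64 + 32 = 128-bin fusion in Python, assuming OpenCV and NumPy are available; the helper names, the simplified single-channel HOG, the basic 8-neighbor LBP and the normalization are illustrative assumptions rather than the patent's prescribed implementation:

```python
import cv2
import numpy as np

def hsv_histogram(frame, bins=32):
    """32-bin hue histogram in HSV space, L1-normalized."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180]).ravel()
    return hist / (hist.sum() + 1e-8)

def hog_histogram(frame, bins=64):
    """64-bin histogram of gradient orientations, weighted by magnitude."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)  # ang in radians, [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def lbp_histogram(frame, bins=32):
    """Histogram of a basic 8-neighbor LBP code, folded into 32 bins."""
    g = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    c = g[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for k, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << k
    hist, _ = np.histogram(code, bins=bins, range=(0, 256))
    return hist / (hist.sum() + 1e-8)

def frame_feature(frame):
    """Concatenate the three histograms into one 128-dimensional vector
    (the 128 x 1 feature vector described in the text)."""
    return np.concatenate([hsv_histogram(frame),
                           hog_histogram(frame),
                           lbp_histogram(frame)])
```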

S522, determining a plurality of candidate key frames in the video to be processed according to the feature vectors of all video frames in the video to be processed; wherein the plurality of candidate key frames are arranged in sequence.

In some examples, a plurality of candidate key frames may be extracted from video frames of a video to be processed using a deep learning based key frame extraction method. Illustratively, the deep learning based key frame extraction method may utilize a convolutional neural network for adaptive extraction of key frames.

In some examples, a plurality of candidate keyframes may also be extracted from the video frames of the video to be processed using a machine learning-based keyframe extraction method. Illustratively, the key frame extraction method based on machine learning may include a sampling-based method, a shot boundary method, a color feature method, a motion analysis method, a clustering method, and the like.
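For illustration, a hedged sketch that combines the simplest of the options above (sampling plus a color-feature change test); the function name and the `step` and `jump` thresholds are assumed values, and `frame_feature` is the per-frame vector from the earlier sketch:

```python
import numpy as np

def candidate_key_frames(features, step=15, jump=0.5):
    """Keep frame 0; thereafter keep a frame every `step` frames, or
    sooner if its feature distance to the last kept frame exceeds `jump`
    (a crude stand-in for a scene change)."""
    keep = [0]
    for i in range(1, len(features)):
        moved = np.linalg.norm(features[i] - features[keep[-1]]) > jump
        if i - keep[-1] >= step or moved:
            keep.append(i)
    return keep  # candidate key frame indices, in playback order

# features = [frame_feature(f) for f in decoded_frames]
# candidates = candidate_key_frames(features)
```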

S523, determining the initial similarity between every two adjacent candidate key frames according to the feature vectors of the candidate key frames.

S524, judging, for each initial similarity, whether the initial similarity is greater than an initial similarity threshold. If yes, go to step S525; if not, go to step S526.

Illustratively, when the initial similarity is greater than the initial similarity threshold, the two candidate key frames corresponding to that initial similarity are highly similar. In practical applications, the initial similarity threshold may be determined according to practical application requirements, and is not limited herein.

S525, determining the earlier of the two candidate key frames corresponding to the initial similarity as a reference key frame, and discarding the other candidate key frame.

S526, determining both candidate key frames corresponding to the initial similarity as reference key frames.

In some examples, among the extracted candidate key frames, two adjacent candidate key frames may be highly similar, so an initial similarity can be determined for every two adjacent candidate key frames. If the initial similarity is greater than the initial similarity threshold, the two candidate key frames are highly similar; the candidate key frame with the smaller sequence number is then selected as a reference key frame and the other is discarded. Otherwise, both candidate key frames are retained and both serve as reference key frames, as sketched below.
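A sketch of steps S523-S526 under the assumption that cosine similarity is the chosen measure (the text lists several alternatives later) and that 0.9 is the initial similarity threshold, which the text leaves application-dependent:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_reference_key_frames(candidates, features, threshold=0.9):
    """For each adjacent candidate pair, keep the earlier candidate and
    discard the later one when their initial similarity exceeds the
    threshold; otherwise keep both (steps S525/S526)."""
    refs = [candidates[0]]
    for idx in candidates[1:]:
        if cosine_similarity(features[refs[-1]], features[idx]) > threshold:
            continue  # highly similar: drop the later candidate
        refs.append(idx)
    return refs  # reference key frame indices
```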

In some examples, as shown in fig. 7, the processor 113 may be further configured to perform the following program steps:

S531, for each video frame between the sequentially arranged nth reference key frame and (n+1)th reference key frame, determining a first similarity between the video frame and the nth reference key frame and a second similarity between the video frame and the (n+1)th reference key frame.

S532, for each video frame between the sequentially arranged nth reference key frame and (n+1)th reference key frame, determining that the video frame has the same attribute as the nth reference key frame when its first similarity is greater than its second similarity, and that it has the same attribute as the (n+1)th reference key frame when its first similarity is less than its second similarity.

S533, dividing the video to be processed into a plurality of continuous video subsequences according to the rule that video frames having the same attribute as a reference key frame are grouped with that reference key frame into one video subsequence. For example, the video to be processed F_0 is divided into a plurality of consecutive video subsequences F_1, F_2, F_3, ..., F_j, ..., F_J, where j and J are integers and 1 ≤ j ≤ J, and each video subsequence F_j corresponds to one reference key frame.

In some examples, the reference key frame is the frame in each video subsequence that best characterizes the video content the subsequence will play. The attribute of each video frame in the video is judged against the obtained reference key frames, so that video frames with the same attribute can be grouped into the same video subsequence. Because video frames with the same attribute are divided, together with their reference key frame, into one video subsequence, the video frames in one video subsequence can form a shot or scene, or one operation step or operation state. Dividing the video to be processed into a plurality of continuous video subsequences therefore segments it automatically and yields the content segmentation time points, with adjacent video subsequences containing video frames of different operation steps or operation states.

In some examples, for each video frame between the sequentially arranged nth and (n+1)th reference key frames, the first similarity between the video frame and the nth reference key frame may be determined by calculating, from the feature vectors determined above, a first distance between the feature vector of the video frame and that of the nth reference key frame; likewise, the second similarity is determined by calculating a second distance between the feature vector of the video frame and that of the (n+1)th reference key frame. Illustratively, the first and second distances may include, but are not limited to, Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance, cosine similarity, Hamming distance, correlation-coefficient distance, KL divergence, and the like. For example, if the first distance between the feature vector of a video frame and that of the nth reference key frame is smaller than the second distance between the feature vector of that video frame and that of the (n+1)th reference key frame, the video frame is more similar to the nth reference key frame, and the attribute of the video frame may be taken to be the same as that of the nth reference key frame. A sketch of this assignment follows.
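A sketch of steps S531-S533, reusing `cosine_similarity` from the earlier sketch. Since subsequences must stay contiguous, one reading of the rule is applied here: the boundary sits at the first frame that is more similar to the (n+1)th reference key frame; this reading is an assumption, not the patent's only possible interpretation:

```python
def split_into_subsequences(features, refs):
    """Return one (start, end) frame span per reference key frame."""
    boundaries = [0]
    for n in range(len(refs) - 1):
        cut = refs[n + 1]  # fallback: boundary at the next reference itself
        for i in range(refs[n] + 1, refs[n + 1]):
            s1 = cosine_similarity(features[i], features[refs[n]])      # first similarity
            s2 = cosine_similarity(features[i], features[refs[n + 1]])  # second similarity
            if s2 > s1:  # frame sides with the (n+1)-th reference
                cut = i
                break
        boundaries.append(cut)
    boundaries.append(len(features))
    return list(zip(boundaries[:-1], boundaries[1:]))
```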

In some examples, the processor 113 may be further configured to perform the following program steps:

determining an intermediate key frame in the jth video subsequence of the plurality of video subsequences, and updating the reference key frame of the jth video subsequence according to the intermediate key frame;

and when the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating, taking the first video frame in the jth video subsequence as the marker frame of the jth video subsequence.

In some examples, as shown in fig. 8, the processor 113 may be further configured to perform the following program steps:

S5331, for the jth video subsequence of the plurality of video subsequences, determining an intermediate key frame in the jth video subsequence, and updating the reference key frame of the jth video subsequence according to the intermediate key frame.

Illustratively, for the jth video subsequence F_j, the feature vectors of all of its video frames are clustered to obtain a cluster center vector C_j. For each video frame in F_j, a third distance between its feature vector and the cluster center vector C_j is calculated, and the video frame in F_j with the smallest third distance is taken as the intermediate key frame. The reference key frame of F_j is then updated according to this intermediate key frame, yielding the new reference key frame corresponding to F_j.

The method for clustering the feature vectors of the video frames in the jth video subsequence F_j may be, but is not limited to, k-means clustering, hierarchical clustering, FCM clustering, Gaussian mixture clustering, and the like. A single-cluster sketch follows.
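A minimal sketch of step S5331 under the simplest assumption, a single k-means cluster, whose center C_j reduces to the mean feature vector; any of the clustering methods listed above could be substituted:

```python
import numpy as np

def intermediate_key_frame(features, start, end):
    """For the subsequence spanning frames [start, end): compute the
    cluster center (here, the mean vector), then return the member frame
    whose feature vector has the smallest (third) distance to it."""
    center = np.mean([features[i] for i in range(start, end)], axis=0)
    dists = [np.linalg.norm(features[i] - center) for i in range(start, end)]
    return start + int(np.argmin(dists))
```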

S5332, judging whether the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating. If yes, go to step S5333; if not, steps S5334-S5336 are executed.

S5333, taking the first video frame in the jth video subsequence as the marker frame of the jth video subsequence.

S5334, determining a third similarity between each video frame and the updated reference key frame in the j video subsequence and determining a fourth similarity between each video frame and the updated reference key frame in the adjacent video subsequence for the video frames between the updated reference key frame in the j video subsequence and the updated reference key frame in the adjacent video subsequence;

s5335, aiming at a video frame between the updated reference key frame in the jth video subsequence and the updated reference key frame in the adjacent video subsequence, when the third similarity corresponding to the video frame is greater than the fourth similarity, determining that the video frame belongs to the jth video subsequence, and when the third similarity corresponding to the video frame is less than the fourth similarity, determining that the video frame belongs to the video subsequence adjacent to the jth video subsequence;

s5336, determining the middle key frame in the j video subsequence again, and updating the updated reference key frame in the j video subsequence again according to the middle key frame until the updated reference key frame in the j video subsequence is the same as the reference key frame before updating.

In some embodiments of the present application, when the updated reference key frame in the jth video subsequence is the same as the reference key frame before updating, the video frames in the jth video subsequence can be considered to form one shot or scene, so that the content of the video to be processed is automatically segmented.
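The following sketch illustrates one possible reading of the loop S5331–S5336. For brevity it re-assigns every frame to the globally nearest updated key frame rather than only the frames between adjacent key frames; all names are illustrative.

```python
import numpy as np

def refine(vecs: np.ndarray, labels: np.ndarray, max_iter: int = 50):
    """vecs: (n_frames, d) feature vectors; labels[i]: subsequence of frame i.
    Iterates until the reference key frames stop changing (S5332/S5336)."""
    for _ in range(max_iter):
        # S5331: re-pick each subsequence's key frame from its cluster center.
        keys = {}
        for j in np.unique(labels):
            idx = np.where(labels == j)[0]
            center = vecs[idx].mean(axis=0)
            keys[j] = idx[np.argmin(np.linalg.norm(vecs[idx] - center, axis=1))]
        # S5334/S5335: re-assign frames to the nearest updated key frame.
        key_ids = sorted(keys)
        key_vecs = np.stack([vecs[keys[j]] for j in key_ids])
        dists = np.linalg.norm(vecs[:, None, :] - key_vecs[None, :, :], axis=2)
        new_labels = np.array(key_ids)[np.argmin(dists, axis=1)]
        # S5332: stop when the assignment (hence the key frames) is stable.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, keys
```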

In some examples, the processor 113 may be further configured to perform the following program steps:

when the display plays the video to be processed after processing by the processor, receiving a first input instruction, controlling the video played by the display to switch from the video frame of the current video subsequence to the mark frame of the previous video subsequence adjacent to the current video subsequence, and playing the switched-to previous video subsequence on the display from its mark frame;

and when the display plays the video to be processed after processing by the processor, receiving a second input instruction, controlling the video played on the display to switch from the video frame of the current video subsequence to the mark frame of the next video subsequence adjacent to the current video subsequence, and playing the switched-to next video subsequence on the display from its mark frame.

In some examples, the processor 113 may be further configured to perform the following program steps:

receiving a third input instruction, and controlling the video played on the display to be paused at the current video frame;

receiving a fourth input instruction, and controlling the paused video on the display to start playing from the current video frame;

receiving a fifth input instruction, and controlling the video played on the display to advance by a first preset number of video frames;

and receiving a sixth input instruction, and controlling the video played on the display to move back by a second preset number of video frames.

In some embodiments of the present application, when the display plays the video processed by the processor, the automatic skip of the previous or next video sub-sequence can be performed through the mark frame of the video sub-sequence.

The present invention will be described in detail below with reference to figs. 9 to 10c by way of specific examples. It should be noted that these examples are intended to better explain the present invention, not to limit it. A gourmet teaching video is taken as an example of the video to be processed.

The working process of the intelligent device provided by some embodiments of the present application may include the following steps:

(1) The processor 113 acquires a gourmet teaching video F0 from the network through the WIFI module.

(2) The processor 113 may perform image feature extraction on each video frame in the gourmet teaching video F0 using an HSV color histogram, a HOG histogram, and LBP, respectively. After image feature extraction with the HSV color histogram, the extracted image feature of a video frame can be expressed as a 32-bin histogram vector. After image feature extraction with the HOG histogram, the extracted image feature can be expressed as a 64-bin histogram vector. After image feature extraction with LBP, the video frame can be expressed as a 32-bin histogram vector. The three feature vectors corresponding to the same video frame are concatenated into a 128-bin histogram vector, i.e., a 128 × 1 feature vector is obtained for each video frame.
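A sketch of one way to build such a 128 × 1 feature vector with OpenCV and NumPy. The exact binning and channels are not specified here, so a 32-bin hue histogram stands in for the HSV color histogram, a magnitude-weighted 64-bin gradient-orientation histogram for the HOG histogram, and a 32-bin histogram of basic 8-neighbour LBP codes for LBP; these are assumptions for illustration.

```python
import cv2
import numpy as np

def frame_feature(frame_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # 32-bin HSV color histogram (hue channel as a stand-in).
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h_hist = cv2.calcHist([hsv], [0], None, [32], [0, 180]).ravel()

    # 64-bin gradient-orientation histogram (HOG-style, magnitude-weighted).
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    g_hist, _ = np.histogram(ang, bins=64, range=(0, 2 * np.pi), weights=mag)

    # 32-bin histogram of basic 8-neighbour LBP codes (0..255).
    c = gray[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    l_hist, _ = np.histogram(code, bins=32, range=(0, 256))

    vec = np.concatenate([h_hist, g_hist, l_hist]).astype(np.float32)
    # L2-normalise so distances are comparable (an illustrative choice).
    return vec / (np.linalg.norm(vec) + 1e-8)   # 128 x 1 feature vector
```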

(3) Based on the feature vectors of the individual video frames, the processor 113 may employ a deep-learning-based key frame extraction method to extract, from the gourmet teaching video F0, K candidate key frames Fs1, Fs2, Fs3, ..., Fsk, ..., FsK arranged in the playing order of the video. Here k and K are integers, and 1 ≤ k ≤ K.

(4) Determine the initial similarity between every two adjacent candidate key frames according to the feature vector of each candidate key frame. For example, the initial similarity between two adjacent candidate key frames may be determined by calculating the initial distance between their corresponding feature vectors. The calculated initial distance may include, but is not limited to, a Euclidean distance, a Manhattan distance, a Chebyshev distance, a Minkowski distance, a Mahalanobis distance, a cosine similarity distance, a Hamming distance, a correlation coefficient distance, a KL divergence distance, and the like.

For example, determine the initial similarity CD1 between the first candidate key frame Fs1 and the second candidate key frame Fs2 in the order, and the initial similarity CD2 between the second candidate key frame Fs2 and the third candidate key frame Fs3.

(5) For each initial similarity, judge whether it is larger than the initial similarity threshold. If yes, execute step (6); if not, execute step (7).

(6) For example, if the initial similarity CD2 is greater than the initial similarity threshold, the second candidate key frame Fs2 is determined as a reference key frame, and the third candidate key frame Fs3 is discarded.

(7) For example, if the initial similarity CD1 is not greater than the initial similarity threshold, both the first candidate key frame Fs1 and the second candidate key frame Fs2 are determined as reference key frames.

Through steps (5) to (7), N reference key frames sequentially arranged along the playing progress of the video can be determined: Fz1, Fz2, Fz3, ..., Fzn, ..., FzN. Here n and N are integers, and 1 ≤ n ≤ N.
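A minimal sketch of the filtering rule in steps (5) to (7), under the assumption that each candidate is compared with the most recently kept one (one common reading of the pairwise rule) and that similarity is a decreasing function of Euclidean distance.

```python
import numpy as np

def filter_candidates(cand_vecs, sim_threshold: float):
    """cand_vecs: candidate key-frame feature vectors in playing order.
    Returns the indices of the surviving reference key frames Fz_n."""
    def sim(a, b):
        return 1.0 / (1.0 + np.linalg.norm(a - b))   # initial similarity

    keep = [0]                         # the first candidate always survives
    for k in range(1, len(cand_vecs)):
        if sim(cand_vecs[keep[-1]], cand_vecs[k]) > sim_threshold:
            continue                   # too similar: discard the later frame
        keep.append(k)                 # distinct enough: new reference key frame
    return keep
```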

(8) Taking a video frame fi between the 1st reference key frame Fz1 and the 2nd reference key frame Fz2 in the order as an example, determine a first distance between the feature vector corresponding to the video frame fi and the feature vector corresponding to the 1st reference key frame Fz1, and determine a first similarity between the video frame fi and the 1st reference key frame Fz1 according to the first distance.

Likewise, determine a second distance between the feature vector corresponding to the video frame fi and the feature vector corresponding to the 2nd reference key frame Fz2, and determine a second similarity between the video frame fi and the 2nd reference key frame Fz2 according to the second distance.

The first similarity and the second similarity corresponding to the other video frames are determined by analogy, and are not described herein again.

(9) When the first similarity corresponding to the video frame fi is greater than the second similarity, it can be determined that the attributes of the video frame fi and the 1st reference key frame Fz1 are the same.

When the first similarity corresponding to the video frame fi is smaller than the second similarity, it can be determined that the attributes of the video frame fi and the 2nd reference key frame Fz2 are the same.

The attributes of the remaining video frames relative to the reference key frames are determined by analogy, and are not described herein again.

(10) According to the rule that each video frame is grouped with the reference key frame whose attribute it shares, the gourmet teaching video F0 is divided into a plurality of consecutive video subsequences: F1, F2, F3, ..., Fj, ..., FJ. One video sub-sequence Fj corresponds to one reference key frame Fzn, i.e., n = j.
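A sketch of the division in steps (8) to (10), assuming `vecs` holds the per-frame feature vectors and `key_idx` the frame indices of the reference key frames; frames before the first key frame, if any, simply stay in the first subsequence.

```python
import numpy as np

def partition(vecs, key_idx):
    """vecs: (n_frames, d) feature vectors; key_idx: sorted frame indices
    of the reference key frames Fz_1..Fz_N. Returns per-frame subsequence
    labels and the flag frame (first frame) of each subsequence."""
    labels = np.zeros(len(vecs), dtype=int)
    for n in range(len(key_idx)):
        start = key_idx[n]
        end = key_idx[n + 1] if n + 1 < len(key_idx) else len(vecs)
        for i in range(start, end):
            d_n = np.linalg.norm(vecs[i] - vecs[key_idx[n]])       # first distance
            d_n1 = (np.linalg.norm(vecs[i] - vecs[key_idx[n + 1]])
                    if n + 1 < len(key_idx) else np.inf)           # second distance
            labels[i] = n if d_n <= d_n1 else n + 1
    # Flag frame = index of the first frame assigned to each subsequence.
    flag_frames = [int(np.argmax(labels == j)) for j in range(len(key_idx))]
    return labels, flag_frames
```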

(11) Taking the 2nd video sub-sequence F2 as an example, the feature vectors corresponding to the video frames in F2 are clustered to obtain a cluster center vector C2. For each video frame in the 2nd video sub-sequence F2, a third distance between its feature vector and the cluster center vector C2 is calculated, and the video frame in F2 with the smallest third distance is selected as the intermediate key frame. The reference key frame of the 2nd video sub-sequence F2 is then updated according to the intermediate key frame, so that the updated reference key frame corresponding to F2 is obtained.

The intermediate key frames of the remaining video sub-sequences are determined by analogy, and are not described herein again.

(12) Determine whether the updated reference key frame in the 2nd video sub-sequence F2 is the same as the reference key frame before updating. If yes, execute step (13); if not, execute steps (14) to (16).

(13) The first video frame in the 2nd video sub-sequence F2 is taken as the mark frame of the 2nd video sub-sequence F2.

(14) Taking a video frame fq between the updated reference key frame in the 2nd video sub-sequence F2 and the updated reference key frame in the 1st video sub-sequence F1 as an example, determine a third distance between the feature vector corresponding to the video frame fq and the feature vector corresponding to the updated reference key frame in F2, and determine a third similarity between the video frame fq and the updated reference key frame in the 2nd video sub-sequence F2 according to the third distance.

Likewise, determine a fourth distance between the feature vector corresponding to the video frame fq and the feature vector corresponding to the updated reference key frame in the 1st video sub-sequence F1, and determine a fourth similarity between the video frame fq and the updated reference key frame in the 1st video sub-sequence F1 according to the fourth distance.

Illustratively, the calculated third and fourth distances may include, but are not limited to, Euclidean distances, Manhattan distances, Chebyshev distances, Minkowski distances, Mahalanobis distances, cosine similarity distances, Hamming distances, correlation coefficient distances, KL divergence distances, and the like.

The third similarity and the fourth similarity corresponding to the other video frames are determined by analogy, and are not described herein again.

(15) When the third similarity corresponding to the video frame fq is greater than the fourth similarity, it is determined that the video frame fq belongs to the 2nd video sub-sequence F2.

When the third similarity corresponding to the video frame fq is smaller than the fourth similarity, it is determined that the video frame fq belongs to the 1st video sub-sequence F1.

Through steps (12) to (15), when the updated reference key frame in a video subsequence differs from the reference key frame before updating, the video frames are re-assigned and a new video subsequence can be determined. The re-assignment of the other video frames proceeds by analogy, and is not described herein again.

(16) Repeat steps (11) to (15) until the updated reference key frame in each video subsequence is the same as the reference key frame before updating. In this way, the content of the gourmet teaching video F0 is automatically segmented.

(17) The gourmet teaching video processed in steps (1) to (16) is stored in the memory 120.

(18) The user may input a video play instruction to the processor 113 through voice interaction or a touch pad. Alternatively, the video playing instruction may be input to the processor 113 by recognizing a gesture of the user through the image capturing unit. The processor 113 retrieves the stored gourmet teaching video from the memory 120 and controls the display 180 to play the gourmet teaching video based on the video play instruction.

(19) When the display plays the gourmet teaching video processed by the processor, the processor 113, on receiving a first input instruction (e.g., a previous playing instruction), may control the video played on the display to switch from the video frame of the current video sub-sequence to the mark frame of the previous video sub-sequence adjacent to the current video sub-sequence, and play the switched-to previous video sub-sequence on the display from its mark frame. Therefore, the gourmet teaching video can automatically jump to the mark frame of the previous video clip and start playing there, for example, the video clip of the previous operation step or operation state.

(20) When the display plays the gourmet teaching video processed by the processor, the processor 113, on receiving a second input instruction (e.g., a next playing instruction), controls the video played on the display to switch from the video frame of the current video sub-sequence to the mark frame of the next video sub-sequence adjacent to the current video sub-sequence, and plays the switched-to next video sub-sequence on the display from its mark frame. Therefore, the gourmet teaching video can automatically jump to the mark frame of the next video clip and start playing there, for example, the video clip of the next operation step or operation state.

(21) The processor 113, on receiving a third input instruction, can control the video played on the display to pause at the current video frame. Therefore, pause control can be performed on the gourmet teaching video played on the display.

The processor 113, on receiving a fourth input instruction, can control the paused video on the display to resume playing from the current video frame. Therefore, the paused gourmet teaching video on the display can be resumed.

The processor 113, on receiving a fifth input instruction, can control the video played on the display to advance by a first preset number of video frames. Therefore, fast-forward control can be performed on the gourmet teaching video played on the display. It should be noted that the first preset number may be set according to actual application requirements, and is not limited herein.

The processor 113, on receiving a sixth input instruction, can control the video played on the display to move back by a second preset number of video frames. Therefore, rewind control can be performed on the gourmet teaching video played on the display. It should be noted that the first preset number may be the same as or different from the second preset number, and the second preset number may be set according to actual application requirements, which is not limited herein.
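A minimal sketch of the playback-side state handling in steps (19) to (21). It only tracks the current frame position; decoding and rendering are outside its scope, and all names and the default step sizes are illustrative.

```python
class Player:
    def __init__(self, flag_frames, total_frames, step_fwd=30, step_back=30):
        self.flag_frames = flag_frames   # first-frame index per subsequence
                                         # (flag_frames[0] is assumed to be 0)
        self.total = total_frames
        self.pos = 0                     # current video frame
        self.paused = False
        self.step_fwd = step_fwd         # first preset number
        self.step_back = step_back       # second preset number

    def _current_sub(self):
        # Index of the subsequence the current frame belongs to.
        return max(j for j, f in enumerate(self.flag_frames) if f <= self.pos)

    def prev_sub(self):                  # first input instruction
        self.pos = self.flag_frames[max(self._current_sub() - 1, 0)]

    def next_sub(self):                  # second input instruction
        j = min(self._current_sub() + 1, len(self.flag_frames) - 1)
        self.pos = self.flag_frames[j]

    def pause(self):                     # third input instruction
        self.paused = True

    def resume(self):                    # fourth input instruction
        self.paused = False

    def fast_forward(self):              # fifth input instruction
        self.pos = min(self.pos + self.step_fwd, self.total - 1)

    def rewind(self):                    # sixth input instruction
        self.pos = max(self.pos - self.step_back, 0)
```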

Based on the same inventive concept, some embodiments of the present application further provide a processor; the processor 113 may also be configured to execute the following program steps:

receiving a video to be played;

receiving a first input instruction, controlling a video played by a display to be switched from a current video subsequence to a previous video subsequence adjacent to the current video subsequence, and playing the switched previous video subsequence on the display;

and/or receiving a second input instruction, controlling the video played on the display to be switched from the current video subsequence to a next video subsequence adjacent to the current video subsequence, and playing the switched next video subsequence on the display;

Here, the video to be played is divided into a plurality of continuous video subsequences, and adjacent video subsequences contain video frames of different operation steps or operation states. Each video subsequence is determined according to the similarity between its video frames and the corresponding reference key frame, and the reference key frames are determined from the video to be played.

In some examples, as shown in fig. 11, the processor 113 may also be configured to perform the following program steps:

s1110, receiving a video to be played;

s1120, determining a plurality of reference key frames in the video to be played; wherein the plurality of reference key frames are arranged in sequence;

s1130, dividing the video to be played into a plurality of continuous video subsequences according to the similarity relation between the video frame in the video to be played and the reference key frame; wherein one video sub-sequence corresponds to one reference key frame;

s1140, after determining the first video frame in each video subsequence as the mark frame of each video subsequence, controlling a display to play a video to be played;

s1150, receiving a first input instruction, controlling the video played by the display to be switched from the current video subsequence to the previous video subsequence adjacent to the current video subsequence, and playing the switched previous video subsequence on the display;

s1160, receiving a second input instruction, controlling the video played on the display to be switched from the current video subsequence to a next video subsequence adjacent to the current video subsequence, and playing the switched next video subsequence on the display.

Steps S1120 to S1140 are processes executed in the processor. Steps S1150 to S1160 are a process of the user interacting with the smart device.

According to the intelligent device provided by some embodiments of the present application, the video to be played can be obtained through the processor, and the reference key frames of the video to be played are determined. The video to be played can then be divided into a plurality of continuous video subsequences according to the similarity relation between the video frames in the video to be played and the reference key frames, so that the content of the video to be played is automatically segmented. The first video frame in each video subsequence is taken as the mark frame of that video subsequence to complete the automatic segmentation. Therefore, when the display plays the video processed by the processor, receiving the first input instruction makes the video played by the display automatically jump to the mark frame of the previous video subsequence and start playing from there; receiving the second input instruction makes the video automatically jump to the mark frame of the next video subsequence and start playing from there.

In some examples, the processing procedure of the video to be played by the processor 113 is substantially the same as the processing procedure of the video to be processed, and is not described herein again.

Based on the same inventive concept, some embodiments of the present application further provide a video processing method, as shown in fig. 5, which may include the following program steps:

and S510, acquiring a video to be processed.

S520, determining a plurality of reference key frames in the video to be processed; wherein the plurality of reference key frames are arranged in sequence.

S530, dividing the video to be processed into a plurality of continuous video subsequences according to the similarity relation between the video frame in the video to be processed and the reference key frame; wherein one video sub-sequence corresponds to one reference key frame.

And S540, determining the first video frame in each video subsequence as a mark frame of each video subsequence.

And S550, storing the video to be processed after the mark frame of each video subsequence is determined in a memory.
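Tying the above together, a hypothetical end-to-end sketch of S510 to S550, reusing the helper functions sketched earlier (frame_feature, filter_candidates, partition); uniform sampling stands in for the deep-learning candidate key frame extractor named in step (3), and the similarity threshold is an arbitrary placeholder.

```python
import cv2
import numpy as np

def process_video(path: str, sim_threshold: float = 0.5, n_cand: int = 20):
    # S510: acquire the video to be processed and compute per-frame features.
    cap = cv2.VideoCapture(path)
    vecs = []
    ok, frame = cap.read()
    while ok:
        vecs.append(frame_feature(frame))
        ok, frame = cap.read()
    cap.release()
    vecs = np.stack(vecs)

    # S520: candidate key frames (uniform sampling as a placeholder),
    # filtered into reference key frames by the initial similarity rule.
    cand = list(np.linspace(0, len(vecs) - 1, n_cand, dtype=int))
    keys = [cand[i] for i in filter_candidates(vecs[cand], sim_threshold)]

    # S530: divide into consecutive subsequences; S540: flag frames.
    labels, flag_frames = partition(vecs, keys)

    # S550 would store the segmented video together with its flag frames.
    return labels, flag_frames
```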

It should be noted that, for the working process and principle of the video processing method provided in some embodiments of the present application, reference may be made to the working process and principle of the processor, which is not described herein again.

Based on the same inventive concept, some embodiments of the present application further provide a video playing method, which may include the following program steps:

receiving a video to be played;

receiving a first input instruction, controlling a video played by a display to be switched from a current video subsequence to a previous video subsequence adjacent to the current video subsequence, and playing the switched previous video subsequence on the display;

and/or receiving a second input instruction, controlling the video played on the display to be switched from the current video subsequence to a next video subsequence adjacent to the current video subsequence, and playing the switched next video subsequence on the display;

Here, the video to be played is divided into a plurality of continuous video subsequences, and adjacent video subsequences contain video frames of different operation steps or operation states. Each video subsequence is determined according to the similarity between its video frames and the corresponding reference key frame, and the reference key frames are determined from the video to be played.

It should be noted that, the working process and principle of the video playing method provided in some embodiments of the present application may refer to the working process and principle of the processor, which are not described herein again.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
