Delay calibration method, delay calibration device, computer equipment and storage medium

Document No.: 38411. Publication date: 2021-09-24.

Reading note: This technology, "Delay calibration method, delay calibration device, computer equipment and storage medium" (延迟校准方法、装置、计算机设备和存储介质), was designed by 门泽华 on 2021-05-18. Abstract: The application relates to a delay calibration method, a delay calibration device, computer equipment and a storage medium. The method comprises the following steps: acquiring a video group, wherein the video group at least comprises one video; updating a delay value between the inertial sensor and the vision system, acquiring an anti-shake performance score corresponding to the video group based on the updated delay value, repeating the updating of the delay value and the acquiring of the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquiring the delay value corresponding to the anti-shake performance score that meets the preset condition. Because the method does not require the IMU and the vision system to each estimate a set of motions and then estimate the delay by minimizing the error between the two sets of motions as a cost value, it avoids the errors introduced by the two motion estimates themselves and thereby improves the accuracy of the delay calibration.

1. A method of delay calibration, the method comprising:

acquiring a video group, wherein the video group at least comprises one video;

updating a delay value between an inertial sensor and a vision system, acquiring an anti-shake performance score corresponding to the video group based on the updated delay value, repeating the updating process of the delay value and the process of acquiring the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquiring the delay value corresponding to the anti-shake performance score meeting the preset condition;

wherein the inertial sensor and the vision system are coupled to the same shooting device, each video in the video group is acquired by the vision system and has undergone anti-shake processing performed by the vision system and the inertial sensor based on the delay value between the vision system and the inertial sensor, and the anti-shake performance score is used for evaluating the anti-shake effect of the videos after the anti-shake processing is performed.

2. The method of claim 1, wherein the acquiring the video group comprises:

acquiring a plurality of videos, wherein the videos are shot on the premise that the shooting equipment shakes;

screening the plurality of videos according to the acquired attitude data of the shooting equipment in the shooting time period corresponding to each video in the plurality of videos, and forming the video group by the videos obtained after screening; wherein the attitude data of the photographing apparatus is acquired based on the inertial sensor.

3. The method according to claim 2, wherein the screening the plurality of videos according to the acquired attitude data of the shooting equipment in the shooting time period corresponding to each of the plurality of videos comprises:

converting the attitude data of the shooting equipment acquired in the shooting time period corresponding to each video into a frequency domain space to obtain an amplitude-frequency characteristic curve set corresponding to each video;

acquiring a frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video;

and screening the plurality of videos according to the frequency domain score corresponding to each video.

4. The method according to claim 3, wherein the obtaining the frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video comprises:

for an amplitude-frequency characteristic curve set corresponding to any video, acquiring a frequency domain score corresponding to each amplitude-frequency characteristic curve according to the frequency and amplitude corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set;

and acquiring the frequency domain score corresponding to the video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve.

5. The method according to claim 4, wherein the obtaining a frequency domain score corresponding to each amplitude-frequency characteristic curve according to a frequency and an amplitude corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set comprises:

obtaining the product of the frequency corresponding to each amplitude-frequency characteristic curve and the amplitude, and taking the product as the frequency domain score corresponding to each amplitude-frequency characteristic curve; or,

and obtaining the score of the frequency corresponding to each amplitude-frequency characteristic curve, obtaining the product of the score corresponding to each amplitude-frequency characteristic curve and the amplitude, and taking the product as the frequency domain score corresponding to each amplitude-frequency characteristic curve.

6. The method according to claim 4, wherein the obtaining the frequency domain score corresponding to any video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve comprises:

and carrying out weighted summation on the frequency domain scores corresponding to all the amplitude-frequency characteristic curves in the amplitude-frequency characteristic curve set, and taking the obtained sum value as the frequency domain score corresponding to any video.

7. The method of claim 3, wherein the screening the plurality of videos according to the frequency domain score corresponding to each video comprises:

and sorting the frequency domain scores corresponding to the plurality of videos from large to small, selecting a preset number of top-ranked videos, and taking the selected videos as the videos obtained after screening.
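As an illustration only, and not the claimed implementation, the screening recited in claims 3 to 7 might be sketched as follows. Several points are assumptions that the text does not state: each attitude axis yields one amplitude-frequency characteristic curve, the "frequency and amplitude corresponding to" a curve are taken to be its dominant non-DC peak, the weights default to 1, and a naive DFT stands in for whatever transform is actually used. The names `dominant_peak`, `video_frequency_score` and `screen_videos` are hypothetical.

```python
import cmath
import math


def dominant_peak(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) of the strongest non-DC bin of a naive DFT.

    Assumption: the dominant peak represents the curve's frequency and amplitude.
    """
    n = len(samples)
    best = (0.0, 0.0)
    # Bins 1 .. below Nyquist; the DC bin and the Nyquist bin are skipped.
    for k in range(1, (n - 1) // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        amplitude = 2 * abs(coeff) / n
        if amplitude > best[1]:
            best = (k * sample_rate_hz / n, amplitude)
    return best


def video_frequency_score(axes, sample_rate_hz, weights=None):
    """Weighted sum over curves of (frequency x amplitude), as in claims 5 and 6."""
    weights = weights or [1.0] * len(axes)  # hypothetical default weights
    total = 0.0
    for weight, axis_samples in zip(weights, axes):
        frequency, amplitude = dominant_peak(axis_samples, sample_rate_hz)
        total += weight * frequency * amplitude
    return total


def screen_videos(attitude_by_video, sample_rate_hz, keep):
    """Sort videos by frequency domain score, largest first, and keep the top ones."""
    scores = {name: video_frequency_score(axes, sample_rate_hz)
              for name, axes in attitude_by_video.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]
```

Under these assumptions, a video shot with faster and larger shake receives a higher score and is more likely to be kept for calibration.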

8. A delay calibration apparatus, characterized in that the apparatus comprises:

an acquisition module, configured to acquire a video group, wherein the video group at least comprises one video;

an updating module, configured to update a delay value between an inertial sensor and a vision system, acquire an anti-shake performance score corresponding to the video group based on the updated delay value, repeat the updating of the delay value and the acquiring of the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquire the delay value corresponding to the anti-shake performance score meeting the preset condition;

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a delay calibration method and apparatus, a computer device, and a storage medium.

Background

At present, electronic anti-shake is usually realized by calculating the attitude of a camera from a shake signal detected by an IMU (Inertial Measurement Unit) and then compensating the image captured by a vision system according to the calculated attitude. However, the vision system rarely captures an image at exactly the moment when the IMU detects the shake. For example, when the vision system captures a certain frame, the shake detected by the IMU may actually correspond to the previous frame, yet the system may treat the two as simultaneous. In practical applications, therefore, the delay between the IMU and the vision system needs to be calibrated; that is, taking either the clock of the IMU or the clock of the vision system as the reference, the time offset of the other clock needs to be determined.

In the related art, the IMU and the vision system each estimate a set of motions, and a nonlinear optimization algorithm then estimates the delay by minimizing the error between the two sets of motions, which serves as the cost value. Because both motion estimates contain errors, the delay estimated in this way has low precision and cannot meet high-precision requirements. In addition, if the motion contains periodically repeated components, this approach may also produce an incorrect estimate.

Disclosure of Invention

In view of the foregoing, there is a need to provide a delay calibration method, apparatus, computer device and storage medium capable of accurately calibrating the delay between an IMU and a vision system.

A method of delay calibration, the method comprising:

acquiring a video group, wherein the video group at least comprises one video;

updating a delay value between an inertial sensor and a vision system, acquiring an anti-shake performance score corresponding to a video group based on the updated delay value, repeating the updating process of the delay value and the process of acquiring the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquiring the delay value corresponding to the anti-shake performance score meeting the preset condition;

In one embodiment, obtaining the video group comprises:

acquiring a plurality of videos, wherein the videos are shot on the premise that the shooting equipment shakes;

screening the plurality of videos according to the acquired attitude data of the shooting equipment in the shooting time period corresponding to each video in the plurality of videos, and forming a video group by the videos obtained after screening; wherein the attitude data of the photographing apparatus is acquired based on the inertial sensor.

In one embodiment, screening a plurality of videos according to the acquired attitude data of the shooting device in the shooting time period corresponding to each of the plurality of videos includes:

acquiring a frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video;

and screening the plurality of videos according to the frequency domain score corresponding to each video.

In one embodiment, obtaining a frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video includes:

and acquiring the frequency domain score corresponding to the video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve.

In one embodiment, obtaining a frequency domain score corresponding to each amplitude-frequency characteristic curve according to a frequency and an amplitude corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set includes:

In one embodiment, obtaining the frequency domain score corresponding to any video according to the frequency domain score corresponding to each frequency characteristic curve includes:

In one embodiment, screening a plurality of videos according to the frequency domain score corresponding to each video includes:

A delay calibration apparatus, the apparatus comprising:

an acquisition module, configured to acquire a video group, wherein the video group at least comprises one video;

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring a video group, wherein the video group at least comprises one video;

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

acquiring a video group, wherein the video group at least comprises one video;

According to the delay calibration method, the delay calibration device, the computer equipment and the storage medium, a video group is acquired, the delay value between the inertial sensor and the vision system is updated, the anti-shake performance score corresponding to the video group is acquired based on the updated delay value, the updating of the delay value and the acquiring of the anti-shake performance score are repeated until the acquired anti-shake performance score meets the preset condition, and the delay value corresponding to the anti-shake performance score meeting the preset condition is acquired. Because the method does not require the IMU and the vision system to each estimate a set of motions and then estimate the delay by minimizing the error between the two sets of motions as a cost value, it avoids the errors introduced by the two motion estimates themselves and thereby improves the accuracy of the delay calibration.

Drawings

FIG. 1 is a flow diagram illustrating a method for delay calibration in one embodiment;

FIG. 2 is a flow chart illustrating a method for delay calibration in another embodiment;

FIG. 3 is a block diagram of a delay calibration apparatus according to an embodiment;

FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

It will be understood that the terms "first," "second," and the like, as used herein, may describe various technical terms, but those technical terms are not limited by these ordinal terms unless otherwise specified. The ordinal terms are only used to distinguish one technical term from another. For example, the third preset threshold and the fourth preset threshold may be the same or different without departing from the scope of the present application.

At present, the photographing and video recording capabilities of mobile terminals keep improving and have gradually replaced traditional compact cameras, and more and more mobile terminals combine multiple cameras to cover ultra-wide-angle, telephoto and portrait scenes, so as to bring a better imaging experience. One topic that cannot be bypassed is anti-shake. Anti-shake is not only applied to video; in photographing, an excellent anti-shake effect also allows a longer safe shutter time and increases the proportion of usable shots, so it is a target pursued by many mobile terminal manufacturers.

Based on these requirements, electronic anti-shake came into being. EIS (Electronic Image Stabilization) mainly uses the slight shake detected by a sensor in the shooting device during image capture: after an image is captured, the image is compensated at its edges according to the signal corresponding to that shake, so as to overcome the image blur caused by the shake of the shooting device. In the related art, the sensor mainly used is the IMU. Correspondingly, when electronic anti-shake is realized, the camera attitude is calculated mainly from the shake signals detected by the IMU, and the image captured by the vision system is then compensated according to the calculated camera attitude.

However, the vision system rarely captures an image at exactly the moment when the IMU detects the shake. For example, when the vision system captures a certain frame, the shake detected by the IMU may actually correspond to the previous frame, yet the system may treat the two as simultaneous. In practical applications, therefore, the delay between the IMU and the vision system needs to be calibrated; that is, taking either the clock of the IMU or the clock of the vision system as the reference, the time offset of the other clock needs to be determined. In the related art, the IMU and the vision system each estimate a set of motions, and a nonlinear optimization algorithm then estimates the delay by minimizing the error between the two sets of motions, which serves as the cost value. Because both motion estimates contain errors, the delay estimated in this way has low precision and cannot meet high-precision requirements. In addition, if the motion contains periodically repeated components, this approach may also produce an incorrect estimate.
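To make the role of the delay value concrete, the following minimal sketch looks up the IMU reading that corresponds to a frame once a delay is applied. It is an illustration under stated assumptions, not the method of the application: the sign convention (IMU time = frame time + delay), the use of linear interpolation, the clamping at the ends, and the name `imu_value_at` are all hypothetical.

```python
from bisect import bisect_right


def imu_value_at(frame_time, delay, imu_times, imu_values):
    """Return the IMU reading aligned with a frame timestamp.

    Assumption: the IMU clock is offset so that the matching IMU time is
    frame_time + delay. imu_times must be sorted in increasing order.
    """
    t = frame_time + delay
    # Clamp to the recorded range instead of extrapolating.
    if t <= imu_times[0]:
        return imu_values[0]
    if t >= imu_times[-1]:
        return imu_values[-1]
    i = bisect_right(imu_times, t)  # index of the first sample strictly after t
    t0, t1 = imu_times[i - 1], imu_times[i]
    w = (t - t0) / (t1 - t0)
    # Linear interpolation between the two neighbouring IMU samples.
    return imu_values[i - 1] + w * (imu_values[i] - imu_values[i - 1])
```

With a wrong delay, the reading returned for a frame belongs to a different instant, which is why a miscalibrated delay degrades the compensation.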

In view of the above problems in the related art, embodiments of the present invention provide a delay calibration method, which can be applied to a terminal, where the terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, and the like. It can be understood that the delay calibration method may also be applied to a server, in which case the execution subject is the server. Alternatively, according to actual needs and feasibility, the delay calibration method may be applied to both a terminal and a server, that is, some steps of the delay calibration method may be executed by the terminal and the other steps by the server, which is not specifically limited in this embodiment of the present invention. For example, step 101 in the method flow corresponding to fig. 1 may be executed by the terminal, and the terminal transmits the video group to the server, so that step 102 is executed by the server; after acquiring the delay value between the IMU and the vision system, the server may transmit the delay value to the terminal. It should be noted that "a plurality" and similar quantities mentioned in the embodiments of the present application each refer to "at least two".

Before describing the embodiments of the present application, a main application scenario of the present application will be described. The delay calibration method is mainly applied to calibrating the delay value between the IMU and the vision system, so that the IMU and the vision system can subsequently realize electronic anti-shake based on that delay value. In conjunction with the above embodiments, in one embodiment, referring to fig. 1, a delay calibration method is provided. The method is described by taking as an example its application to a terminal, with the terminal as the execution subject, and comprises the following steps:

101. acquiring a video group, wherein the video group at least comprises one video;

102. updating a delay value between the inertial sensor and the vision system, acquiring an anti-shake performance score corresponding to the video group based on the updated delay value, repeating the updating process of the delay value and the process of acquiring the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquiring the delay value corresponding to the anti-shake performance score meeting the preset condition.

The inertial sensor and the vision system are coupled to the same shooting device, each video in the video group is acquired by the vision system and has undergone anti-shake processing performed by the vision system and the inertial sensor based on the delay value between them, and the anti-shake performance score is used for evaluating the anti-shake effect of the video after the anti-shake processing. The inertial sensor and the vision system need to be coupled to the same shooting device because the embodiments of the present invention calibrate the delay value between the inertial sensor and the vision system mainly according to the imaging quality of the vision system: the inertial sensor needs to capture the shake of the shooting device, and the vision system needs to capture images while the shooting device is shaking so that the imaging quality can then be determined.

In step 101, the video group may include only one video or may include a plurality of videos, which is not specifically limited in this embodiment of the present invention. In step 102, the anti-shake performance score corresponding to the video group is obtained based on the anti-shake performance score of each video in the video group. The embodiment of the present invention does not specifically limit the manner of obtaining the anti-shake performance score corresponding to the video group, and includes but is not limited to: adding the anti-shake performance scores of all videos in the video group, and taking the sum obtained by adding as the anti-shake performance score corresponding to the video group; or adding the anti-shake performance scores of all videos in the video group, averaging the sum obtained by adding, and taking the average value as the anti-shake performance score corresponding to the video group.
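The two aggregation options named above (sum, or sum followed by averaging) can be sketched as follows. The function name `group_score` and the `mode` parameter are hypothetical and serve only to illustrate the description.

```python
def group_score(video_scores, mode="mean"):
    """Combine per-video anti-shake scores into one score for the video group.

    mode="sum" adds the scores; mode="mean" adds them and divides by the count.
    """
    if not video_scores:
        raise ValueError("the video group must contain at least one video")
    if mode not in ("sum", "mean"):
        raise ValueError("mode must be 'sum' or 'mean'")
    total = sum(video_scores)
    return total if mode == "sum" else total / len(video_scores)
```

For a group of fixed size the two options rank candidate delay values identically; averaging merely keeps the score on the same scale as a single video's score.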

In addition, in step 102, the delay value may have an initial value, such as 0. The first update of the delay value may refer to updating this initial value. Of course, in actual implementation, when the anti-shake performance score corresponding to the video group is obtained for the first time, the delay value need not be updated first; that is, the first score may be obtained directly from the initial value of the delay value rather than from an updated delay value.

The delay value may be updated in the direction of increasing the delay value or in the direction of decreasing the delay value, which is not specifically limited in the embodiment of the present invention. For example, when updating in the increasing direction, the delay value may be 0.2 seconds before the update and 0.3 seconds after the update. When updating in the decreasing direction, the delay value may be 0.3 seconds before the update and 0.2 seconds after the update.
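The iterative procedure of step 102 might be sketched as the loop below. This is a minimal sketch under assumptions the text does not make: the update strategy (a fixed step that reverses and halves whenever the score drops) is only one possible choice, a higher score is assumed to mean better anti-shake, and the score is assumed to have a single peak near the true delay. `score_for_delay` (stabilize the video group with a given delay and score it) and `meets_condition` (the preset condition, evaluated on the score history) are hypothetical callbacks.

```python
def calibrate_delay(score_for_delay, meets_condition,
                    initial_delay=0.0, step=0.004, max_updates=200):
    """Update the delay until the preset condition on the scores is met.

    Returns the delay value that produced the most recent score.
    """
    delay = initial_delay
    # The first score is obtained from the initial value, without an update.
    scores = [score_for_delay(delay)]
    if meets_condition(scores):
        return delay
    direction = 1.0  # start by increasing the delay value
    for _ in range(max_updates):
        delay += direction * step
        scores.append(score_for_delay(delay))
        if meets_condition(scores):
            return delay
        if scores[-1] < scores[-2]:
            # The score got worse: search the other way with a smaller step.
            direction = -direction
            step /= 2.0
    raise RuntimeError("preset condition not met within max_updates")
```

As a usage example with a synthetic score that peaks at a known delay, `calibrate_delay(lambda d: -abs(d - 0.03), lambda s: s[-1] > -0.0005)` returns a value within 0.5 ms of 0.03.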

In step 102, the preset condition may be set as required. The anti-shake performance score that meets the preset condition in step 102 is the score obtained after the most recent update of the delay value, that is, the most recently obtained score. Based on this, the preset condition may be that the difference between the most recently obtained anti-shake performance score and the previously obtained anti-shake performance score is smaller than a first preset threshold; in this case, the delay value corresponding to the anti-shake performance score meeting the preset condition may be the delay value corresponding to the most recently obtained score. Alternatively, the preset condition may be that the most recently obtained anti-shake performance score is greater than a second preset threshold; in this case, the delay value corresponding to the anti-shake performance score meeting the preset condition may likewise be the delay value corresponding to the most recently obtained score.

Further alternatively, consider that as the updated delay value gradually approaches the true delay value, the anti-shake performance score may keep increasing while the size of each increase gradually shrinks. Based on this principle, the preset condition may be that the anti-shake performance scores obtained n consecutive times are all greater than a third preset threshold, and that the difference between every two adjacent scores among those n consecutive scores is less than a fourth preset threshold, where n is a positive integer not less than 2. In this case, the delay value corresponding to the anti-shake performance score meeting the preset condition may be the delay value corresponding to the most recently obtained score. Of course, the preset condition in actual implementation may also take other forms, which is not specifically limited in the embodiment of the present invention. It should be noted that the first to fourth preset thresholds may be obtained from actual measurement or experience, which is likewise not specifically limited. Additionally, the IMU may include an accelerometer and a gyroscope, which is not specifically limited in the embodiment of the present invention.
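The three example preset conditions can be written as predicates over the history of obtained scores, as in the sketch below. The function names are hypothetical, the thresholds are parameters to be chosen by measurement or experience, and taking the absolute value of each difference is an assumption.

```python
def small_change(scores, first_threshold):
    """Difference between the two most recent scores is below the first threshold."""
    return len(scores) >= 2 and abs(scores[-1] - scores[-2]) < first_threshold


def above_threshold(scores, second_threshold):
    """The most recent score exceeds the second threshold."""
    return bool(scores) and scores[-1] > second_threshold


def plateau(scores, n, third_threshold, fourth_threshold):
    """The last n scores all exceed the third threshold and differ pairwise
    (adjacent scores) by less than the fourth threshold; n must be at least 2."""
    if n < 2 or len(scores) < n:
        return False
    recent = scores[-n:]
    all_high = all(s > third_threshold for s in recent)
    all_close = all(abs(b - a) < fourth_threshold
                    for a, b in zip(recent, recent[1:]))
    return all_high and all_close
```

The third predicate is the strictest of the three: it guards against stopping on a single high score or on two scores that happen to be close while still low.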

In the method provided by the embodiment of the invention, a video group is acquired, the delay value between the inertial sensor and the vision system is updated, the anti-shake performance score corresponding to the video group is acquired based on the updated delay value, the updating of the delay value and the acquiring of the anti-shake performance score are repeated until the acquired anti-shake performance score meets the preset condition, and the delay value corresponding to the anti-shake performance score meeting the preset condition is acquired. Because the method does not require the IMU and the vision system to each estimate a set of motions and then estimate the delay by minimizing the error between the two sets of motions as a cost value, it avoids the errors introduced by the two motion estimates themselves and thereby improves the accuracy of the delay calibration.

With reference to the content of the foregoing embodiments, in an embodiment, for any video in a video group, the embodiment of the present invention does not specifically limit the manner of obtaining the anti-shake performance score of the video, which includes but is not limited to: acquiring the anti-shake performance score of the video according to the image frame parameters corresponding to the video.

The image frame parameters may include a degree of difference and/or a degree of similarity between image frames, and may be calculated from image parameters of the image frames in the video. The image parameters may include brightness and/or contrast, which is not particularly limited in this embodiment of the present invention. Taking brightness as the image parameter, the image frame parameters may include a similarity and/or a difference in brightness between image frames. Taking contrast as the image parameter, the image frame parameters may include a similarity and/or a difference in contrast between image frames. Where the image parameters include both brightness and contrast, the image frame parameters may include a similarity and/or a difference in brightness together with a similarity and/or a difference in contrast. The degree of difference can be obtained by subtraction, and the degree of similarity can be obtained with a similarity algorithm. For example, the difference in brightness between two image frames may be obtained by subtracting the brightness of one frame from that of the other. The similarity in brightness between two image frames can be obtained with a similarity algorithm; for example, the similarity between the brightness feature vectors of the two image frames can be calculated and used as the similarity in brightness between the two frames.
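A minimal sketch of the brightness case follows. The text only says that the difference is obtained by subtraction and the similarity by a similarity algorithm applied to brightness feature vectors; the concrete choices here are assumptions: frames are 2D lists of 8-bit luma values, the difference is the absolute difference of mean brightness, the feature vector is a brightness histogram, and the similarity algorithm is cosine similarity. All function names are hypothetical.

```python
import math


def mean_brightness(frame):
    """Average luma of a frame given as rows of 0..255 values."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def brightness_difference(frame_a, frame_b):
    """Degree of difference: absolute difference of the mean brightness."""
    return abs(mean_brightness(frame_a) - mean_brightness(frame_b))


def brightness_histogram(frame, bins=8):
    """Brightness feature vector (assumed here to be a histogram of luma values)."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            hist[min(int(p) * bins // 256, bins - 1)] += 1
    return hist


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def brightness_similarity(frame_a, frame_b, bins=8):
    """Degree of similarity between the brightness feature vectors of two frames."""
    return cosine_similarity(brightness_histogram(frame_a, bins),
                             brightness_histogram(frame_b, bins))
```

The contrast case would follow the same pattern with a contrast measure in place of the luma values.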

As can be seen from the above process, the image frame parameters are mainly used to represent the difference and/or similarity between image frames in the video. Which image frames are compared may be set according to requirements, and this is not particularly limited by the embodiment of the present invention. For example, the image frame parameters may consist only of the difference and/or similarity between the start frame and an intermediate frame of the video, may consist only of the difference and/or similarity between the intermediate frame and the end frame, or may consist of both the difference and/or similarity between the start frame and the intermediate frame and the difference and/or similarity between the intermediate frame and the end frame.
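The three example choices of frame pairs can be enumerated as in the short sketch below. Taking the intermediate frame to be the one at the middle index is an assumption, and the function name and mode labels are hypothetical.

```python
def frame_pairs(frame_count, mode="both"):
    """Return (index_a, index_b) pairs of frames to compare within one video."""
    if frame_count < 3:
        raise ValueError("need at least a start, an intermediate and an end frame")
    first, middle, last = 0, frame_count // 2, frame_count - 1
    pairs = {
        "start_mid": [(first, middle)],
        "mid_end": [(middle, last)],
        "both": [(first, middle), (middle, last)],
    }
    return pairs[mode]
```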

It should be noted that a video is composed of image frames, and when the video is captured by a shooting device in a moving state, some image parameters change abnormally between image frames in the video because of the shake. Taken together, these changes in the image parameters show up in the visual effect as poor shooting quality, for example shake blur in the video, and the anti-shake processing eliminates these changes as far as possible so as to improve the shooting quality. In terms of data processing, the changes in these image parameters are reflected in the results calculated from the image parameters of the image frames, that is, in the image frame parameters. Therefore, the image frame parameters, as an external quantification of the visual effect of the video after the anti-shake processing, can indicate whether the anti-shake performance of the processed video is good or bad, so that the anti-shake performance of the video can be evaluated by using the image frame parameters.

In addition, with reference to the contents in the above examples, the embodiment of the present invention does not specifically limit the manner in which the terminal obtains the anti-shake performance score of the video according to the image frame parameters corresponding to the video. Depending on what the image frame parameters contain, the manner of obtaining the anti-shake performance score falls into the following cases:

(1) The image frame parameters include a degree of difference between image frames.

As can be seen from the above example, when the anti-shake performance score of the video is obtained according to the image frame parameters corresponding to the video, which image frames the degree of difference is computed between may be set according to requirements. Whichever frames are chosen, each degree of difference is computed for a group of two image frames in the video and is the degree of difference between the two frames in that group. Thus, the image frame parameters may include several degrees of difference, each determined by one group of two image frames in the video, where "several" may mean one or more. Correspondingly, when the anti-shake performance score of the video is obtained according to the image frame parameters, if the image frame parameters include one degree of difference, that degree of difference can be used directly as the anti-shake performance score of the video. If the image frame parameters include a plurality of degrees of difference, they can be averaged, and the average value is used as the anti-shake performance score of the video.

(2) The image frame parameters include a similarity between image frames.

Similar to case (1), as can be seen from the above example, when the anti-shake performance score of the video is obtained according to the image frame parameters corresponding to the video, the image frames between which the similarity is obtained may be set as required. Whichever image frames are chosen, each similarity is the similarity between the two image frames of a group formed by two image frames in the video. Thus, the image frame parameters may include several similarities, each determined by a group of two image frames in the video, where "several" may mean one or more. Correspondingly, if the image frame parameters include one similarity, that similarity can be used directly as the anti-shake performance score of the video. If the image frame parameters include a plurality of similarities, they can be averaged, and the average value is used as the anti-shake performance score of the video.

(3) The image frame parameters include similarity and difference between image frames.

Similar to cases (1) and (2), whichever image frames are chosen, each similarity or degree of difference is that between the two image frames of a group formed by two image frames in the video. Thus, the image frame parameters may include several similarities and several degrees of difference, each determined by a group of two image frames in the video, where "several" may mean one or more. Correspondingly, when the anti-shake performance score of the video is obtained according to the image frame parameters corresponding to the video, the plurality of degrees of difference can be averaged to obtain a difference average, and the plurality of similarities can be averaged to obtain a similarity average. The difference average and the similarity average are then weighted and summed, and the weighted summation result is used as the anti-shake performance score of the video. If "several" is in fact one, that single similarity or single degree of difference can be used directly in the weighted summation without averaging.

For example, in connection with the above example, taking the example that the image frame parameters include the difference degree between the starting frame and the ending frame in the video, the difference degree can be directly used as the anti-shake performance score. Taking the image frame parameters including the difference between the starting frame and the intermediate frame in the video and the difference between the intermediate frame and the ending frame as an example, the two differences may be averaged, and the average may be used as the anti-shake performance score. Taking the image frame parameters including the difference between the initial frame and the intermediate frame in the video and the similarity between the initial frame and the intermediate frame in the video as an example, the weights of the difference and the similarity can be set according to the importance degree of the difference and the similarity in the aspect of enabling the video to present a better shooting effect, so that the difference and the similarity are weighted and summed, and the weighted and summed result is used as an anti-shake performance score.
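The three cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `anti_shake_score` and the default weights `w_diff` and `w_sim` are hypothetical, since the text leaves the weights to be set according to their importance.

```python
from statistics import mean


def anti_shake_score(differences=(), similarities=(), w_diff=0.5, w_sim=0.5):
    """Combine per-group difference and similarity values into one score.

    differences / similarities: one value per group of two image frames.
    w_diff / w_sim: hypothetical weights, to be chosen by the implementer.
    """
    if not differences and not similarities:
        raise ValueError("at least one difference or similarity is required")
    if differences and not similarities:
        # Case (1): a single value is used directly, several are averaged.
        return mean(differences)
    if similarities and not differences:
        # Case (2): same handling for similarities.
        return mean(similarities)
    # Case (3): weighted sum of the difference average and similarity average.
    return w_diff * mean(differences) + w_sim * mean(similarities)


# Start-to-middle and middle-to-end differences, averaged into one score.
print(anti_shake_score(differences=[0.2, 0.4]))
```

Because the mean of a single value is that value, the "several means one" situation needs no special branch.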

According to the method provided by the embodiment of the invention, the anti-shake performance score of the video is obtained according to the image frame parameters corresponding to the video by obtaining the video formed through anti-shake processing. The anti-shake performance score is a relatively objective evaluation basis obtained based on the image frame parameters corresponding to the video, so that the anti-shake performance score is more accurate as an evaluation result compared with a human visual system. In addition, the anti-shake performance score is directly obtained according to the image frame parameters corresponding to the video to evaluate the anti-shake effect, and the anti-shake effect is not required to be evaluated by visual perception in a long time, so that the time consumption is short, and the evaluation efficiency is high.

In combination with the above embodiments, in one embodiment, the image frame parameters include image similarity; accordingly, the embodiment of the present invention does not specifically limit the manner of obtaining the anti-shake performance score of the video according to the image frame parameters corresponding to the video, and includes but is not limited to: for each group of two adjacent frames of images with preset intervals in the video, acquiring the image similarity between the previous frame of image and the next frame of image in each group of two adjacent frames of images with preset intervals, and taking the image similarity as the image similarity corresponding to each group of two adjacent frames of images with preset intervals; and acquiring the anti-shake performance score of the video according to the image similarity corresponding to each group of two adjacent frames of images at preset intervals in the video.

In the above process, the preset interval may be represented by m, which denotes an interval of m frames. For example, m may be 1 or 2, but it cannot exceed the total number of frames minus 1. If m is too large, too few image similarities are obtained, and the subsequent anti-shake performance score is not accurate enough. For these reasons and for convenience of explanation, the following processes are explained in the embodiment of the present invention with the preset interval being 1.

Take as an example a video that contains m frames of images (here m denotes the total number of frames, not the preset interval), namely the 1st frame, the 2nd frame, ..., and the m-th frame. In the above process, each group of two frames of images at the preset interval in the video refers to two frames at a preset interval of 1: the 1st frame and the 2nd frame form a group, the 2nd frame and the 3rd frame form a group, the 3rd frame and the 4th frame form a group, and so on, up to the (m-1)-th frame and the m-th frame, so that m-1 groups can be formed. The image similarity corresponding to the two frames of images in each group may be calculated in the manner described above with reference to the definition of the image similarity.

After the image similarity corresponding to each group of two adjacent frames of images at the preset interval in the video is obtained, the anti-shake performance score of the video can be further obtained according to the image similarity corresponding to each group of two adjacent frames of images at the preset interval. The embodiment of the present invention does not specifically limit the manner of obtaining the anti-shake performance score of the video according to the image similarity corresponding to two adjacent frames of images at a preset interval in the video, and includes but is not limited to: and acquiring a summation result of image similarity corresponding to two adjacent frames of images at a preset interval in the video, and taking the summation result as an anti-shake performance score of the video. Or, further, averaging the summation result based on the total number of groups formed by every two adjacent frames of images at preset intervals in the video, and taking the average value as the anti-shake performance score of the video.
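As a minimal sketch of the grouping and aggregation just described, the following Python pairs each frame with the frame a preset interval later and then sums or averages the per-group similarities. The names `frame_pairs` and `video_score` and the `similarity` callable are hypothetical; pairing frame i with frame i + interval is an assumption about how intervals other than 1 are handled.

```python
def frame_pairs(num_frames, interval=1):
    """Return 0-based index pairs (i, i + interval), one per group."""
    return [(i, i + interval) for i in range(num_frames - interval)]


def video_score(frames, similarity, interval=1, average=True):
    """Sum (or average) the per-group image similarities of a video.

    similarity: hypothetical callable taking (earlier_frame, later_frame).
    """
    pairs = frame_pairs(len(frames), interval)
    if not pairs:
        raise ValueError("interval must be smaller than the number of frames")
    total = sum(similarity(frames[a], frames[b]) for a, b in pairs)
    # Averaging divides by the total number of groups.
    return total / len(pairs) if average else total


# Five frames at interval 1 give four groups.
print(frame_pairs(5))
```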

Further alternatively, if the obtained image similarity is more than one, the anti-shake performance score of the video may be further obtained based on a plurality of image similarities. For example, in conjunction with the description in the above example, the image similarity is calculated based on image parameters between two adjacent frames of images in the video, and the image parameters may include brightness and/or contrast. Taking the image parameters including brightness and contrast as an example, correspondingly, the image similarity may include two terms, one term is obtained based on the image parameters for brightness and is recorded as brightness similarity, and the other term is obtained based on the image parameters for contrast and is recorded as contrast similarity.

Based on the above description, the obtaining of the anti-shake performance score of the video according to the image similarity corresponding to each group of two adjacent frames of images at the preset interval in the video may further include: and acquiring a summation result of the similarity of each image corresponding to each group of two adjacent frames of images at a preset interval in the video, summing the summation results corresponding to the similarity of each image, and taking the final summation result as the anti-shake performance score of the video. Of course, besides this method, for the case of multiple image similarities, a weighted summation of the multiple image similarities may be adopted to obtain the anti-shake performance score of the video. For example, taking the image similarity including a brightness similarity result obtained based on the image parameter as brightness and a contrast similarity result obtained based on the image parameter as contrast as examples, the image similarity may be weighted and summed based on the image similarity of each group of two adjacent frames of images at a preset interval in the video and the weight corresponding to the image similarity of each group, and the weighted and summed result is used as the anti-shake performance score of the video.

According to the method provided by the embodiment of the invention, because the shooting jitter is continuous, on the premise of anti-jitter processing, the improvement effect after anti-jitter processing can be reflected in the comparison between two adjacent frames of images at each group of adjacent preset intervals in the video, and the image similarity corresponding to the two adjacent frames of images at each group of adjacent preset intervals can reflect the actual improvement effect, so that the anti-jitter performance score obtained based on the image similarity corresponding to the two adjacent frames of images at each group of adjacent preset intervals can be used as a relatively objective evaluation basis, and the evaluation result is more accurate.

With reference to the content of the foregoing embodiment, in an embodiment, the preset interval is 1, and for any group of two frames of images at the preset interval in the video, the two frames are denoted as the q-th frame image and the (q-1)-th frame image. Accordingly, the embodiment of the present invention does not specifically limit the manner of obtaining the image similarity between the previous frame image and the next frame image in each group of two frames of images at the preset interval, which includes, but is not limited to, the following two manners:

the first way to obtain image similarity is: acquiring the image similarity between a first sub-region in the q-th frame image and a second sub-region in the (q-1)-th frame image, and taking it as the image similarity between the q-th frame image and the (q-1)-th frame image, wherein the first sub-region and the second sub-region are obtained by the same division manner and are located at the same position in their respective images; or,

the second way to obtain image similarity is: acquiring the image similarity between a third subregion and a fourth subregion in each subregion group, and acquiring the image similarity between the q frame image and the q-1 frame image according to the image similarities corresponding to the plurality of subregion groups; each subregion group is composed of a third subregion in a q frame image and a fourth subregion in a q-1 frame image, the third subregion in the q frame image and the fourth subregion in the q-1 frame image are obtained according to the same dividing mode, and the third subregion and the fourth subregion in each subregion group are located at the same position in the respective images.

In the first manner, taking an example that the q-th frame image and the q-1-th frame image are divided into 4 parts of 2 × 2 in the same division manner, the first sub-region is the upper left corner of the 4 divided parts of the q-th frame image, and the second sub-region is the upper left corner of the 4 divided parts of the q-1-th frame image, the image similarity between the first sub-region and the second sub-region can be respectively obtained in the manner of calculating the image similarity in the above example. For example, the average brightness value of all pixels in the first sub-region may be obtained first, then the average brightness value of all pixels in the second sub-region may be obtained, and the difference between the average brightness value corresponding to the first sub-region and the average brightness value corresponding to the second sub-region may be used as the image similarity between the first sub-region and the second sub-region.

Of course, among the 4 parts formed according to the above division manner, the upper right part of the q-th frame image may instead be used as the first sub-region and the upper right part of the (q-1)-th frame image as the second sub-region; similarly, the lower left part of the q-th frame image may be used as the first sub-region and the lower left part of the (q-1)-th frame image as the second sub-region, so as to obtain the image similarity between the first sub-region and the second sub-region. This is not specifically limited in the embodiment of the present invention.

In the second mode, the q-th frame image and the q-1-th frame image are divided into 4 parts of 2 × 2 in the same division manner. Accordingly, 4 third sub-regions are included in the q-th frame image, and 4 fourth sub-regions are included in the q-1-th frame image, and thus 4 sub-region groups can be formed.

Specifically, a third sub-region of the q-th frame image located at the upper left corner and a fourth sub-region of the q-1 th frame image located at the upper left corner may form a first sub-region group, a third sub-region of the q-th frame image located at the upper right corner and a fourth sub-region of the q-1 th frame image located at the upper right corner may form a second sub-region group, a third sub-region of the q-th frame image located at the lower left corner and a fourth sub-region of the q-1 th frame image located at the lower left corner may form a third sub-region group, and a third sub-region of the q-th frame image located at the lower right corner and a fourth sub-region of the q-1 th frame image located at the lower right corner may form a fourth sub-region group.

With reference to the above example, the image similarity corresponding to each of the four sub-region groups may be obtained using the same image similarity calculation manner. The image similarity between the q-th frame image and the (q-1)-th frame image can then be obtained according to the image similarities corresponding to the plurality of sub-region groups. The embodiment of the present invention does not specifically limit the manner of doing so, which includes but is not limited to: adding the image similarities corresponding to the sub-region groups to obtain a summation result, and taking the summation result as the image similarity between the q-th frame image and the (q-1)-th frame image; or dividing the summation result by the number of sub-region groups to obtain an average value, and taking the average value as the image similarity between the q-th frame image and the (q-1)-th frame image. It should be noted that the above example takes the preset interval as 1; when the preset interval is a value other than 1, the process in the above example may also be referred to, and details are not described here.
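The two ways of using sub-regions can be illustrated with a small pure-Python sketch. Frames are assumed to be 2-D lists of luminance values, and the helper names (`split_grid`, `region_measure`, `frame_measure`) are hypothetical. The per-region measure follows the example in the text (difference between the mean brightness of the two sub-regions); taking the absolute value of that difference is an assumption.

```python
from statistics import mean


def split_grid(frame, rows=2, cols=2):
    """Split a 2-D luminance grid into rows x cols sub-regions (row-major)."""
    h, w = len(frame), len(frame[0])
    regions = []
    for r in range(rows):
        for c in range(cols):
            block = [frame[y][x]
                     for y in range(r * h // rows, (r + 1) * h // rows)
                     for x in range(c * w // cols, (c + 1) * w // cols)]
            regions.append(block)
    return regions


def region_measure(region_a, region_b):
    """Difference between the mean brightness of two sub-regions."""
    return abs(mean(region_a) - mean(region_b))  # abs() is an assumption


def frame_measure(frame_q, frame_prev, region_index=None, average=True):
    """First way: one sub-region pair. Second way: all pairs, summed or averaged."""
    pairs = list(zip(split_grid(frame_q), split_grid(frame_prev)))
    if region_index is not None:
        a, b = pairs[region_index]
        return region_measure(a, b)
    values = [region_measure(a, b) for a, b in pairs]
    return mean(values) if average else sum(values)


frame_q = [[10, 10, 20, 20], [10, 10, 20, 20],
           [30, 30, 40, 40], [30, 30, 40, 40]]
frame_prev = [[12, 12, 20, 20], [12, 12, 20, 20],
              [30, 30, 44, 44], [30, 30, 44, 44]]
print(frame_measure(frame_q, frame_prev, region_index=0))  # upper-left pair only
print(frame_measure(frame_q, frame_prev))                  # average over 4 groups
```

Both frames are split by the same function, which guarantees that the paired sub-regions come from the same division manner and the same position.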

According to the method provided by the embodiment of the invention, shooting jitter is continuous, so the improvement brought by anti-shake processing is reflected in the comparison between the two frames of images in each group at the preset interval in the video, and the image similarity corresponding to each group reflects the actual improvement. Therefore, for each group of two frames of images at the preset interval, after the two images are divided in the same division manner, the image similarity corresponding to the two images is obtained either from a sub-region at the same position in both images or from all the sub-regions taken together as a global measure. This image similarity can serve as a relatively objective evaluation basis, so the evaluation result is more accurate.

With reference to the content of the foregoing embodiment, in an embodiment, the embodiment of the present invention does not specifically limit a manner of obtaining the anti-shake performance score of the video according to the image similarity corresponding to each group of two adjacent frames of images at a preset interval in the video, and the method includes, but is not limited to: acquiring similarity scores corresponding to two frames of images of each group of adjacent preset intervals in the video according to the similarity of each image corresponding to the two frames of images of each group of adjacent preset intervals in the video and the weight corresponding to the similarity of each image; and acquiring the anti-shake performance score of the video according to the similarity score corresponding to each group of two adjacent frames of images at preset intervals in the video.

The embodiment of the present invention also does not specifically limit the method for obtaining the similarity score corresponding to two frames of images in each group of adjacent preset intervals in the video according to the similarity of each image corresponding to two frames of images in each group of adjacent preset intervals in the video and the weight corresponding to the similarity of each image, and includes, but is not limited to, the following two methods:

the first way to obtain the similarity score is: acquiring a weighted summation result based on each image similarity corresponding to each group of two frames of images at the preset interval in the video and the weight corresponding to each image similarity, and taking the weighted summation result as the similarity score corresponding to that group.

The second way to obtain the similarity score is: taking each image similarity corresponding to each group of two frames of images at the preset interval in the video as the base of a power and the weight corresponding to that image similarity as the exponent, so as to obtain the power result of each image similarity corresponding to the group, and then obtaining the similarity score corresponding to the group according to those power results.

The embodiment of the present invention does not specifically limit the manner of obtaining the similarity score corresponding to each group of two frames of images at adjacent preset intervals in the video according to the power result of the similarity of each group of images at adjacent preset intervals in the video, and includes, but is not limited to: summing the power results of the similarity of each image corresponding to each group of two adjacent frames of images at a preset interval in the video, and taking the summed result as the similarity score corresponding to each group of two adjacent frames of images at the preset interval; or multiplying the power result of the similarity of each image corresponding to each group of two frames of images at adjacent preset intervals in the video, and taking the product result as the similarity score corresponding to each group of two frames of images at adjacent preset intervals.

For example, taking three image similarities as an example, the first image similarity corresponding to the (t-1)-th group of two frames of images at the preset interval in the video is denoted as L_t, the second image similarity corresponding to the (t-1)-th group is denoted as C_t, and the third image similarity corresponding to the (t-1)-th group is denoted as S_t. The weight corresponding to the first image similarity is denoted as a, the weight corresponding to the second image similarity is denoted as b, and the weight corresponding to the third image similarity is denoted as c.

For the first way of obtaining the similarity score described above, it can be calculated with reference to the following formula (1):

$$P_t = a L_t + b C_t + c S_t \tag{1}$$

For the second way of obtaining the similarity score, if the similarity score corresponding to each group of two frames of images at the preset interval in the video is obtained by multiplying the power results, reference may be made to the following formula (2):

$$P_t = L_t^{a} \cdot C_t^{b} \cdot S_t^{c} \tag{2}$$

In the above formulas (1) and (2), P_t represents the similarity score corresponding to the (t-1)-th group of two frames of images at the preset interval. In formula (2), L_t^a represents the power result of the first image similarity corresponding to the (t-1)-th group, C_t^b represents the power result of the second image similarity corresponding to the (t-1)-th group, and S_t^c represents the power result of the third image similarity corresponding to the (t-1)-th group.
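The two ways of obtaining the similarity score, the weighted sum of formula (1) and the product of the power results (each similarity raised to its weight), can be compared in a short Python sketch. The function names and the sample values are hypothetical, and the power-sum variant mentioned in the text is included for completeness.

```python
import math


def score_weighted_sum(sims, weights):
    """First way: a*L_t + b*C_t + c*S_t."""
    return sum(w * s for s, w in zip(sims, weights))


def score_power_product(sims, weights):
    """Second way, product variant: L_t**a * C_t**b * S_t**c."""
    return math.prod(s ** w for s, w in zip(sims, weights))


def score_power_sum(sims, weights):
    """Second way, summation variant: L_t**a + C_t**b + S_t**c."""
    return sum(s ** w for s, w in zip(sims, weights))


sims = (0.9, 0.8, 0.7)   # hypothetical L_t, C_t, S_t for one group
weights = (1, 1, 1)      # hypothetical a, b, c
print(score_weighted_sum(sims, weights))
print(score_power_product(sims, weights))
```

In the product form, a weight of 0 removes a similarity from the score entirely, because any positive similarity raised to the power 0 is 1.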

It should be noted that, in the two above-mentioned manners of obtaining the similarity score, the weight corresponding to the similarity of each item of image may be set according to actual requirements. For example, if there are two image similarities, one is the image similarity calculated based on brightness, the other is the image similarity calculated based on contrast, and the ambient brightness in the video is dark, the error caused by the dark ambient brightness should be minimized for the two image similarities. Thus, the weight corresponding to the image similarity calculated based on the brightness can be appropriately reduced, and the weight corresponding to the image similarity calculated based on the contrast can be appropriately increased.

After the similarity score corresponding to each group of two adjacent frames of images at the preset interval in the video is obtained, the anti-shake performance score of the video can be obtained according to the similarity score corresponding to each group of two adjacent frames of images at the preset interval in the video. The embodiment of the present invention does not specifically limit the manner of obtaining the anti-shake performance score of the video according to the similarity score corresponding to each group of two adjacent frames of images at the preset interval in the video, and includes but is not limited to: and acquiring an accumulation result of the similarity scores, wherein the accumulation result is obtained by accumulating the similarity scores corresponding to two adjacent frames of images at preset intervals in the video.

According to the method provided by the embodiment of the invention, the similarity score between the two frames of images at the preset interval can be obtained based on each image similarity corresponding to the two frames, so the result is more accurate than a similarity score obtained from a single image similarity. In addition, the weight of each image similarity can be set according to actual requirements, so that the similarity score emphasizes the more important image similarities and the error introduced by the image similarities with low weights is reduced. Since the anti-shake performance score is determined by the similarity scores and the weights, the subsequently obtained anti-shake performance score is more accurate.

In combination with the content of the foregoing embodiments, in an embodiment, the image similarity includes at least one of the following three similarities, which are brightness similarity, contrast similarity, and structure similarity, respectively.

With reference to the content and the definition of the similarity in the above embodiments and specific examples, and taking a preset interval of 1 as an example, the process of calculating the three similarities is now described. The brightness similarity corresponding to the (t-1)-th group of two frames of images at the preset interval in the video is denoted as L_t, the contrast similarity corresponding to the (t-1)-th group is denoted as C_t, and the structural similarity corresponding to the (t-1)-th group is denoted as S_t.

The brightness similarity corresponding to the (t-1)-th group of two frames of images at the preset interval, that is, the brightness similarity between the t-th frame image and the (t-1)-th frame image in that group, can be calculated with reference to formula (3).

In the above formula (3), μ_t represents the mean luminance of the t-th frame image, and μ_{t-1} represents the mean luminance of the (t-1)-th frame image. μ_t can be calculated with reference to formula (4).

In the above formula (4), N represents the total number of pixels in the t-th frame image, i indexes the i-th pixel in the t-th frame image, and t_i represents the luminance value of the i-th pixel.

The contrast similarity corresponding to the (t-1)-th group of two frames of images at the preset interval, that is, the contrast similarity between the t-th frame image and the (t-1)-th frame image in that group, can be calculated with reference to formula (5).

In the above formula (5), δ_t represents the standard deviation of the luminance of the t-th frame image, that is, the contrast of the t-th frame image, and δ_{t-1} represents the contrast of the (t-1)-th frame image. δ_t can be calculated with reference to formula (6).

In the above formula (6), the definition of each parameter is the same as in the formulas above.

The structural similarity corresponding to the (t-1)-th group of two frames of images at the preset interval, that is, the structural similarity between the t-th frame image and the (t-1)-th frame image in that group, can be calculated with reference to formula (7).

In the above formula (7), δ_{t,t-1} represents the luminance covariance between the t-th frame image and the (t-1)-th frame image. δ_{t,t-1} can be calculated with reference to formula (8).

In the above formula (8), (t-1)_i represents the luminance value of the i-th pixel in the (t-1)-th frame image, and μ_{t-1} represents the mean luminance of the (t-1)-th frame image.
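Formulas (3) to (8) are not reproduced in this text. As a hedged reconstruction only, and assuming they follow the standard structural-similarity (SSIM) components without stabilizing constants and with sample (N-1) normalization, neither of which the text confirms, they might read as follows:

```latex
\begin{align}
L_t &= \frac{2\,\mu_t\,\mu_{t-1}}{\mu_t^{2} + \mu_{t-1}^{2}} \tag{3}\\
\mu_t &= \frac{1}{N}\sum_{i=1}^{N} t_i \tag{4}\\
C_t &= \frac{2\,\delta_t\,\delta_{t-1}}{\delta_t^{2} + \delta_{t-1}^{2}} \tag{5}\\
\delta_t &= \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(t_i - \mu_t\right)^{2}} \tag{6}\\
S_t &= \frac{\delta_{t,t-1}}{\delta_t\,\delta_{t-1}} \tag{7}\\
\delta_{t,t-1} &= \frac{1}{N-1}\sum_{i=1}^{N}\left(t_i - \mu_t\right)\left((t-1)_i - \mu_{t-1}\right) \tag{8}
\end{align}
```

The standard SSIM definition adds a small constant to each numerator and denominator to avoid division by zero; whether the original formulas include such constants cannot be determined from the surrounding definitions.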

According to the method provided by the embodiment of the invention, the similarity score between the two adjacent frames of images at the preset interval can be obtained based on the brightness similarity, the contrast similarity and the structure similarity corresponding to the two adjacent frames of images at the preset interval, so that compared with the method for obtaining the similarity score based on the similarity of a single image, the obtained result is more accurate, and the anti-shake performance score is determined by the similarity score, so that the subsequently obtained anti-shake performance score is more accurate.
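A minimal Python sketch of the three similarities follows, assuming the standard SSIM-style ratios without stabilizing constants and sample (N-1) normalization. These forms are assumptions, since the formulas themselves are not reproduced in the text. Frames are flat lists of luminance values, and all function names are hypothetical.

```python
from statistics import mean, stdev


def covariance(a, b):
    """Sample luminance covariance between two equally sized frames."""
    mu_a, mu_b = mean(a), mean(b)
    return sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / (len(a) - 1)


def luminance_similarity(cur, prev):
    """Assumed form: 2*mu_t*mu_{t-1} / (mu_t^2 + mu_{t-1}^2)."""
    mu_t, mu_p = mean(cur), mean(prev)
    return 2 * mu_t * mu_p / (mu_t ** 2 + mu_p ** 2)


def contrast_similarity(cur, prev):
    """Assumed form: 2*d_t*d_{t-1} / (d_t^2 + d_{t-1}^2), d = luminance std."""
    d_t, d_p = stdev(cur), stdev(prev)
    return 2 * d_t * d_p / (d_t ** 2 + d_p ** 2)


def structure_similarity(cur, prev):
    """Assumed form: covariance / (d_t * d_{t-1})."""
    return covariance(cur, prev) / (stdev(cur) * stdev(prev))


# Note: a perfectly uniform frame has zero standard deviation and would
# divide by zero here; the standard SSIM constants exist to avoid that.
cur = [10, 20, 30, 40]
prev = [20, 40, 60, 80]   # same structure, twice as bright
print(luminance_similarity(cur, prev))
print(contrast_similarity(cur, prev))
print(structure_similarity(cur, prev))
```

In this toy pair the second frame is a scaled copy of the first, so the structure term stays at its maximum while the luminance and contrast terms drop.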

In conjunction with the above embodiments, in one embodiment, the video is a single channel video or a multi-channel video. The single-channel video is a gray level video, and the multi-channel video is a color video. If the video is a gray scale video, the anti-shake performance score of the gray scale video can be obtained directly according to the method provided in the above embodiment. If the video is a color video, the method provided in the above embodiment may be adopted to first obtain the similarity of each image corresponding to each group of two frames of images at adjacent preset intervals in the video of each channel, and for the similarity of a certain type of images, then add the similarity of the same type of images corresponding to each group of two frames of images at adjacent preset intervals in the video of each channel, and use the result of the addition as the similarity of the same type of images corresponding to each group of two frames of images at adjacent preset intervals in the video. Through the process, the similarity of each image corresponding to each group of two adjacent frames of images at preset intervals in the video can be obtained, and the anti-shake performance score of the video can be obtained by adopting the mode provided by the embodiment.
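The per-channel handling for color video can be sketched as follows. The dictionary layout and the name `combine_channels` are hypothetical; the only step taken from the text is that similarities of the same type are added across channels.

```python
def combine_channels(per_channel):
    """Add similarities of the same type across all channels of one group.

    per_channel: one dict per channel, mapping similarity type -> value.
    """
    names = per_channel[0].keys()
    return {name: sum(channel[name] for channel in per_channel)
            for name in names}


# Hypothetical values for one group of two frames in a 3-channel video.
channels = [
    {"brightness": 1.0, "contrast": 0.5},
    {"brightness": 0.5, "contrast": 0.5},
    {"brightness": 0.5, "contrast": 1.0},
]
print(combine_channels(channels))
```

A grayscale video is simply the one-channel case, for which the function returns the input values unchanged.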

The method provided by the embodiment of the invention is applicable to both single-channel video and multi-channel video, and is therefore applicable to a wider range of scenarios.

In combination with the above embodiments, in one embodiment, referring to fig. 2, there is provided a delay calibration method, including the steps of:

201. acquiring a plurality of videos, wherein the videos are shot on the premise that the shooting equipment shakes;

202. screening the plurality of videos according to the acquired attitude data of the shooting equipment in the shooting time period corresponding to each video in the plurality of videos, and forming a video group by the videos obtained after screening; wherein the attitude data of the photographing apparatus is acquired based on the inertial sensor;

203. updating a delay value between the inertial sensor and the vision system, acquiring an anti-shake performance score corresponding to the video group based on the updated delay value, repeating the updating process of the delay value and the process of acquiring the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquiring the delay value corresponding to the anti-shake performance score meeting the preset condition.
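The update-and-score loop of step 203 can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: `score_fn` (which would run anti-shake processing on the video group with a given delay and return its score), the list of candidate delays, and the `is_satisfied` predicate standing in for the preset condition are all hypothetical.

```python
def calibrate_delay(videos, score_fn, candidates, is_satisfied):
    """Update the delay value until the group score meets the preset condition.

    score_fn(videos, delay): hypothetical callable returning the anti-shake
        performance score of the video group for that delay.
    is_satisfied(score): hypothetical predicate for the preset condition.
    """
    for delay in candidates:             # each iteration updates the delay
        score = score_fn(videos, delay)  # score obtained with that delay
        if is_satisfied(score):
            return delay, score
    raise RuntimeError("no candidate delay met the preset condition")


# Toy stand-in: the score peaks (at 0) when the delay equals 20 ms.
def toy_score(videos, delay_ms):
    return -abs(delay_ms - 20)


print(calibrate_delay(["clip"], toy_score, range(0, 50, 5), lambda s: s >= 0))
```

A fixed candidate grid is only one possible update rule; any optimizer that proposes the next delay from earlier scores would fit the same loop.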

For the related explanation of the content in step 203, reference may be made to the foregoing embodiments, which is not repeated here. In the above step 201, "the video is shot on the premise that the shooting device shakes" means that the shooting device is subject to shaking in its shooting environment. For example, the video may be shot during high-frequency motion, such as handheld shooting while the user is running or shooting while riding a mountain bike. Since the shooting device shakes continuously with the movement of the user during such motion, a video shot in this way can be considered as being shot on the premise that the shooting device shakes. It should be noted that, when a plurality of videos are actually acquired, n videos need not be shot separately; instead, one video may be shot first, and n video segments may then be intercepted from it based on a sliding window.

The embodiment of the invention needs videos with "jitter" precisely because these videos serve as the objects evaluated by the anti-shake performance score. The more severe the jitter, the better suited the video is as an evaluation object. This is why the above step 201 specifies that "the video is shot on the premise that the shooting device shakes". Of course, in actual implementation, shaking usually exists whenever the shooting device is handheld, so shooting in a deliberately shaky environment is not strictly required. Shooting in an ordinary environment is also possible, although videos with severe jitter are then harder to obtain as evaluation objects.

The length of the sliding window may be set as required, which is not specifically limited in the embodiment of the present invention. Likewise, the sliding step of each slide may be set as required, and the steps of different slides may be the same or different, which is also not specifically limited. For example, for a video with 4800 frames in total, the length of the sliding window may be fixed at 100 frames and the sliding step fixed at 10 frames. The 1st to 100th frames are first intercepted as the 1st video; after the window slides once, 10 frames are skipped and the 111th to 210th frames are intercepted as the 2nd video; and so on, until the required number of videos has been intercepted.
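The sliding-window interception can be sketched as follows. The function name and parameters are hypothetical, and the sketch assumes that the sliding step is the number of frames skipped between two consecutive windows, which is one reading of the example; an overlapping-window reading would advance the window start by the step alone.

```python
def sliding_window_clips(total_frames, window, gap, count):
    """Return up to `count` clips as (first, last) frame numbers, 1-based and inclusive.

    Assumption: `gap` frames are skipped between the end of one window
    and the start of the next.
    """
    clips = []
    first = 1
    while len(clips) < count and first + window - 1 <= total_frames:
        clips.append((first, first + window - 1))
        first += window + gap
    return clips


# 4800-frame video, 100-frame window, 10 frames skipped between windows
clips = sliding_window_clips(total_frames=4800, window=100, gap=10, count=3)
print(clips)
```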

In the above step 202, the attitude data of the shooting device describes the attitude of the shooting device and may be expressed in different ways, such as attitude angles or quaternions, which is not specifically limited in the embodiment of the present invention. In addition, for a given video, the frequency at which the attitude data is acquired during the shooting time period corresponding to the video may be the same as or different from the frame rate at which the video is shot, which is also not specifically limited. For example, for a 1-minute video shot from 17:10 to 17:11 on April 7, 2021 at 24 frames per second, the attitude data of the shooting device may be acquired at the moment each image frame is captured, that is, 24 times per second, so that 24 × 60 = 1440 pieces of attitude data are acquired within that minute.

Taking attitude data expressed as attitude angles as an example, the embodiment of the present invention does not specifically limit the manner of acquiring the attitude data of the shooting device, which includes but is not limited to: estimating the attitude of the shooting device from the IMU through a preset algorithm to obtain the attitude data of the shooting device. The preset algorithm may be an AKF (Adaptive Kalman Filter) algorithm, a UKF (Unscented Kalman Filter) algorithm, a complementary filtering algorithm, or another filtering algorithm, which is not specifically limited in the embodiment of the present invention.

It should be noted that the delay value affects the anti-shake performance score because the score is computed from image frame parameters of the image frames after anti-shake processing, and the anti-shake processing is carried out by the vision system and the inertial sensor on the basis of the delay value between them. The IMU and the vision system each have their own clock. When one clock is taken as the standard and the delay value is added in order to index the corresponding data in the other clock, the more accurate the delay value, the more accurate the indexing result. It should also be noted that, with one clock taken as the standard, the delay value of the other clock relative to it may be positive or negative. For example, with the clock corresponding to the vision system as the standard, the clock corresponding to the IMU may be slow or fast, so the delay value may be positive or negative.

For example, suppose the real delay value between the IMU and the vision system is 0.01 second. In the clock corresponding to the IMU, the attitude data of the shooting device is acquired by the IMU at 10 moments, namely at 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09 and 0.10 second. In the clock corresponding to the vision system, the image frames are shot by the vision system at 10 moments, namely at 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09 and 0.10 second.

Assume that the estimated delay value between the IMU and the vision system is 0.03 second and that, with the clock corresponding to the vision system as the standard, the clock corresponding to the IMU is slower by 0.03 second, i.e., the delay value of the clock corresponding to the IMU relative to the clock corresponding to the vision system is -0.03 second. According to this delay value, the image frame shot by the vision system at 0.04 second corresponds to the attitude data of the shooting device acquired by the IMU at 0.01 second, so the attitude data acquired by the IMU at 0.01 second is used when electronic anti-shake processing is performed on the image frame shot by the vision system at 0.04 second. However, the real delay value is 0.01 second, which means the image frame shot by the vision system at 0.04 second should actually correspond to the attitude data acquired by the IMU at 0.03 second, and the attitude data acquired by the IMU at 0.03 second should be used when electronic anti-shake processing is subsequently performed on that image frame. The larger the difference between the estimated delay value and the real delay value, the further the indexed attitude data is from the correct attitude data, and the larger the error in the subsequent electronic anti-shake processing.
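The indexing in this example can be sketched as a nearest-timestamp lookup. The function name, the string labels standing in for attitude data, and the nearest-sample rule are assumptions for illustration; the sign convention follows the example above, where the vision system clock is the standard and the IMU timestamp is the frame timestamp plus the (signed) delay value.

```python
def attitude_for_frame(frame_time, delay, imu_times, imu_attitudes):
    """Return the IMU attitude sample whose timestamp is closest to frame_time + delay."""
    target = frame_time + delay
    idx = min(range(len(imu_times)), key=lambda i: abs(imu_times[i] - target))
    return imu_attitudes[idx]


# IMU samples at 0.01 s ... 0.10 s; "a1" ... "a10" are placeholder attitude records
imu_times = [round(0.01 * k, 2) for k in range(1, 11)]
imu_attitudes = ["a%d" % k for k in range(1, 11)]

# Estimated delay value of -0.03 s: the frame at 0.04 s is paired with the 0.01 s sample
print(attitude_for_frame(0.04, -0.03, imu_times, imu_attitudes))
# Real delay of 0.01 s (delay value -0.01 s): the frame should use the 0.03 s sample
print(attitude_for_frame(0.04, -0.01, imu_times, imu_attitudes))
```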

Through the above process, the attitude data of the shooting device acquired in the shooting time period corresponding to each video is obtained, and the plurality of videos can then be screened. The screening may consist of calculating the variance of the attitude data acquired in the shooting time period corresponding to each video, sorting the videos by variance from large to small, and selecting a preset number of the top-ranked videos. Since a larger variance indicates less stable data, videos with more intense jitter are thereby selected as the videos obtained after screening.
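A minimal sketch of the variance-based screening is given below. It assumes that the attitude data of each video has been reduced to a list of scalar samples (for example one attitude angle per sample); the function name and the dictionary layout are hypothetical.

```python
from statistics import pvariance


def screen_by_variance(attitude_by_video, preset_number):
    """Rank videos by the variance of their attitude samples (largest first)
    and keep the first `preset_number` of them."""
    ranked = sorted(
        attitude_by_video,
        key=lambda name: pvariance(attitude_by_video[name]),
        reverse=True,
    )
    return ranked[:preset_number]


# Toy attitude-angle samples (degrees) for three clips
samples = {
    "calm": [0.0, 0.1, 0.0, 0.1],
    "run": [0.0, 5.0, -5.0, 4.0],
    "walk": [0.0, 1.0, -1.0, 1.0],
}
print(screen_by_variance(samples, preset_number=2))
```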

According to the method provided by the embodiment of the invention, the plurality of videos are screened according to the attitude data of the shooting device acquired in the shooting time period corresponding to each of the plurality of videos. Because the videos are screened before the anti-shake performance score is calculated, videos with more intense jitter can be selected as the videos obtained after screening. The more intense the jitter, the higher the demand on the anti-shake processing, the better the anti-shake performance score reflects the real effect of the anti-shake processing, and the higher the demand on the accuracy of the delay value. Therefore, by using the screened videos as the basis for testing the anti-shake processing effect and repeatedly executing the updating process of the delay value and the process of obtaining the anti-shake performance score, the finally obtained delay value can be more accurate.
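The update-and-score loop of step 203 can be sketched as follows. Several points are assumptions made only for illustration: the delay value is updated by walking through a list of candidate values, the score of the video group is the sum of per-video scores, a higher score is better, and the preset condition is either reaching a hypothetical `target_score` or having the highest score among the candidates. The callable `score_fn` stands in for the stabilize-then-score procedure of the earlier embodiments.

```python
def calibrate_delay(video_group, candidate_delays, score_fn, target_score=None):
    """Try each candidate delay, score the video group, and keep the best delay.

    score_fn(video, delay) is a placeholder for: run anti-shake processing on
    `video` using `delay`, then compute its anti-shake performance score.
    """
    best_delay, best_score = None, float("-inf")
    for delay in candidate_delays:          # "updating the delay value"
        score = sum(score_fn(video, delay) for video in video_group)
        if score > best_score:
            best_delay, best_score = delay, score
        if target_score is not None and score >= target_score:
            break                           # preset condition met early
    return best_delay, best_score


# Toy score that peaks when the delay equals -0.01 s (purely illustrative)
def toy_score(video, delay):
    return -abs(delay - (-0.01))


delay, score = calibrate_delay(
    ["clip_a", "clip_b"], [-0.03, -0.02, -0.01, 0.0, 0.01], toy_score
)
print(delay, score)
```

A grid of candidates is only one possible update rule; any search strategy that proposes a new delay value from the scores seen so far fits the same loop.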

It should be understood that although the steps in the flowcharts of fig. 1 and 2 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 and 2 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With reference to the content of the foregoing embodiments, in an embodiment, the embodiment of the present invention does not specifically limit the manner of screening the plurality of videos according to the attitude data of the shooting device acquired in the shooting time period corresponding to each of the plurality of videos, which includes but is not limited to: converting the attitude data of the shooting device acquired in the shooting time period corresponding to each video into a frequency domain space to obtain an amplitude-frequency characteristic curve set corresponding to each video; acquiring a frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video; and screening the plurality of videos according to the frequency domain score corresponding to each video.

The attitude data of the shooting device acquired in the shooting time period corresponding to each video may be a sequence of axis angles, that is, a sequence of discrete values. In a coordinate system with time as the abscissa and the magnitude of the axis angle as the ordinate, these discrete values, connected in order, form a time domain curve. Through the fast Fourier transform, this curve can be decomposed into a plurality of sinusoidal curves, i.e., a plurality of amplitude-frequency characteristic curves, which together form an amplitude-frequency characteristic curve set. Each amplitude-frequency characteristic curve in the set can be represented as a point in a coordinate system with frequency as the abscissa and amplitude as the ordinate.
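The conversion into a set of (frequency, amplitude) points can be sketched as follows. To stay self-contained, the sketch evaluates the discrete Fourier transform directly rather than using a fast Fourier transform routine; the result is the same, only slower. The function name, the uniform sampling rate, and the choice to skip the zero-frequency (constant offset) term are assumptions.

```python
import cmath
import math


def amplitude_frequency_set(samples, sample_rate):
    """Return [(frequency_hz, amplitude), ...] for a uniformly sampled angle sequence.

    Direct DFT, one-sided spectrum, zero-frequency term omitted.
    """
    n = len(samples)
    curves = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        # Non-Nyquist bins appear twice in the two-sided spectrum, hence the factor 2
        scale = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        curves.append((k * sample_rate / n, scale * abs(coeff) / n))
    return curves


# One second of a 3 Hz oscillation with amplitude 2, sampled 24 times per second
angles = [2.0 * math.sin(2 * math.pi * 3 * t / 24) for t in range(24)]
curves = amplitude_frequency_set(angles, sample_rate=24.0)
peak = max(curves, key=lambda c: c[1])
print(peak)
```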

The frequency domain score corresponding to each video can be used to indicate the intensity of shaking when the video was shot. For the amplitude-frequency characteristic curve set corresponding to a given video, the frequency domain score may be obtained by determining the maximum frequency and the maximum amplitude among the frequencies and amplitudes corresponding to the amplitude-frequency characteristic curves in the set, and taking the product of the maximum frequency and the maximum amplitude as the frequency domain score corresponding to the video. Alternatively, the average frequency and the average amplitude may be determined from the frequencies and amplitudes corresponding to the amplitude-frequency characteristic curves in the set, and the product of the two averages taken as the frequency domain score corresponding to the video. The frequency domain score obtained in this way can indicate the shaking intensity because the amplitude itself indicates the shaking intensity during shooting: one factor of the product is a value associated with the amplitude and the other factor is a value associated with the frequency, so the frequency domain score obtained by multiplying the two factors likewise indicates the shaking intensity during shooting. After the frequency domain score corresponding to each video is obtained, the plurality of videos can be screened according to these scores; specifically, videos whose frequency domain scores are greater than a preset threshold may be selected.
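Both variants of the frequency domain score, together with the threshold screening, can be sketched as follows. An amplitude-frequency characteristic curve set is represented here as a list of (frequency, amplitude) pairs; that representation and all function names are assumptions for illustration.

```python
def score_max_product(curves):
    """Product of the maximum frequency and the maximum amplitude in the set."""
    return max(f for f, _ in curves) * max(a for _, a in curves)


def score_mean_product(curves):
    """Product of the average frequency and the average amplitude in the set."""
    n = len(curves)
    return (sum(f for f, _ in curves) / n) * (sum(a for _, a in curves) / n)


def screen_by_threshold(scores, threshold):
    """Keep the videos whose frequency domain score exceeds the preset threshold."""
    return [name for name, s in scores.items() if s > threshold]


curves = [(1.0, 0.5), (2.0, 0.2), (4.0, 0.1)]
print(score_max_product(curves), score_mean_product(curves))
print(screen_by_threshold({"clip_a": 2.0, "clip_b": 0.4}, threshold=1.0))
```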

According to the method provided by the embodiment of the invention, the attitude data of the shooting equipment acquired in the shooting time period corresponding to each video is converted into the frequency domain space based on the fast Fourier transform to obtain the amplitude-frequency characteristic curve set corresponding to each video, the frequency domain score corresponding to each video is acquired according to the amplitude-frequency characteristic curve set corresponding to each video, and a plurality of videos are screened according to the frequency domain score corresponding to each video. Before the anti-shake performance score corresponding to the video is calculated, the video can be screened, the video with more intense shake is selected as the video obtained after screening, the requirement for anti-shake processing is higher when the shake is more intense, the anti-shake performance score can reflect the real effect of the anti-shake processing, and the requirement for the value taking accuracy of the delay value is higher, so that the video is screened based on the frequency domain score of the video, and the updating process of the delay value and the process of obtaining the anti-shake performance score are continuously executed on the basis of taking the screened video as the basis for testing the anti-shake processing effect, so that the finally obtained delay value is more accurate.

With reference to the content of the foregoing embodiments, in an embodiment, the embodiment of the present invention does not specifically limit the manner of obtaining the frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video, which includes but is not limited to: for the amplitude-frequency characteristic curve set corresponding to any video, acquiring a frequency domain score corresponding to each amplitude-frequency characteristic curve according to the frequency and amplitude corresponding to each amplitude-frequency characteristic curve in the set; and acquiring the frequency domain score corresponding to the video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve.

For a certain amplitude-frequency characteristic curve, the frequency and amplitude corresponding to the amplitude-frequency characteristic curve can be subjected to weighted summation, so that the weighted summation result is used as the frequency domain score corresponding to the amplitude-frequency characteristic curve. For a certain video, after obtaining the frequency domain score corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set corresponding to the video, the maximum value and the minimum value can be selected from the frequency domain scores corresponding to all amplitude-frequency characteristic curves, and the average value of the maximum value and the minimum value is used as the frequency domain score corresponding to the amplitude-frequency characteristic curve set, namely, the frequency domain score corresponding to the video.
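The two steps just described can be sketched as follows. The weights `w_freq` and `w_amp` are hypothetical parameters (the text does not give their values), and the curve set is again represented as (frequency, amplitude) pairs.

```python
def curve_score_weighted(frequency, amplitude, w_freq=0.5, w_amp=0.5):
    """Weighted sum of the frequency and amplitude of one curve."""
    return w_freq * frequency + w_amp * amplitude


def video_score_mid_range(curves, w_freq=0.5, w_amp=0.5):
    """Average of the largest and smallest per-curve scores in the set."""
    scores = [curve_score_weighted(f, a, w_freq, w_amp) for f, a in curves]
    return (max(scores) + min(scores)) / 2


# Per-curve scores are 2.0, 1.5 and 3.0, so the video score is (3.0 + 1.5) / 2
print(video_score_mid_range([(1.0, 3.0), (2.0, 1.0), (4.0, 2.0)]))
```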

According to the method provided by the embodiment of the invention, for a given video, the frequency domain score corresponding to each amplitude-frequency characteristic curve is obtained according to the frequency and the amplitude corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set corresponding to the video, and the frequency domain score corresponding to the video is obtained according to the frequency domain score corresponding to each amplitude-frequency characteristic curve. Because the videos are screened before the anti-shake performance score is calculated, videos with more intense jitter can be selected as the videos obtained after screening. The more intense the jitter, the higher the demand on the anti-shake processing, the better the anti-shake performance score reflects the real effect of the anti-shake processing, and the higher the demand on the accuracy of the delay value. Therefore, the frequency domain score corresponding to the video is obtained from the frequency domain scores corresponding to the amplitude-frequency characteristic curves in the set corresponding to the video, the videos are screened based on their frequency domain scores, and the screened videos are used as the basis for testing the anti-shake processing effect while the updating process of the delay value and the process of obtaining the anti-shake performance score are repeatedly executed, so that the finally obtained delay value is more accurate.

With reference to the content of the foregoing embodiments, in an embodiment, the embodiment of the present invention does not specifically limit the manner of obtaining the frequency domain score corresponding to each amplitude-frequency characteristic curve according to the frequency and amplitude corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set, which includes but is not limited to: obtaining the product of the frequency and the amplitude corresponding to each amplitude-frequency characteristic curve, and taking the product as the frequency domain score corresponding to that curve; or, obtaining a score of the frequency corresponding to each amplitude-frequency characteristic curve, obtaining the product of that score and the amplitude corresponding to the curve, and taking the product as the frequency domain score corresponding to that curve.

In the above process, the embodiment of the present invention does not specifically limit the manner of obtaining the score of the frequency corresponding to each amplitude-frequency characteristic curve, which includes but is not limited to: determining, according to the frequency corresponding to each amplitude-frequency characteristic curve, the number of occurrences of that curve within a preset time period, and taking that number as the score corresponding to the curve. The preset time period may be 1 second, which is not specifically limited in the embodiment of the present invention.

In addition, in the second manner above, the frequency corresponding to each amplitude-frequency characteristic curve is converted into a score because the frequencies corresponding to different curves differ. Converting them into scores under the same standard ensures the consistency of the data, so that the frequency domain scores subsequently calculated are all based on the same calculation standard.
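The two ways of scoring a single curve can be sketched as follows. The second function reads the "score of the frequency" as the number of cycles the curve completes within the preset time period, which is an interpretation of the text rather than something it states outright; the function names and the `period_s` parameter are hypothetical.

```python
def curve_score_product(frequency_hz, amplitude):
    """First manner: frequency multiplied by amplitude."""
    return frequency_hz * amplitude


def curve_score_count_product(frequency_hz, amplitude, period_s=1.0):
    """Second manner (as interpreted here): cycles per preset period times amplitude."""
    cycles_in_period = frequency_hz * period_s
    return cycles_in_period * amplitude


print(curve_score_product(3.0, 2.0))
print(curve_score_count_product(3.0, 2.0, period_s=0.5))
```

With a preset time period of 1 second the two manners give the same number; they differ once another period is chosen as the common standard.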

According to the method provided by the embodiment of the invention, before the anti-shake performance score corresponding to the video is calculated, the video can be screened, the video with more intense shake is selected as the video obtained after screening, the requirement for anti-shake processing is higher as the shake is more intense, the anti-shake performance score can reflect the real effect of the anti-shake processing, and the requirement for the value taking accuracy of the delay value is higher, so that the video is screened based on the frequency domain score of the video, and the updating process of the delay value and the process of obtaining the anti-shake performance score are continuously executed based on the screened video as the basis for testing the anti-shake processing effect, and the finally obtained delay value is more accurate.

With reference to the content of the foregoing embodiments, in an embodiment, the embodiment of the present invention does not specifically limit the manner of obtaining the frequency domain score corresponding to any video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve, which includes but is not limited to: performing weighted summation on the frequency domain scores corresponding to all amplitude-frequency characteristic curves in the amplitude-frequency characteristic curve set, and taking the obtained sum as the frequency domain score corresponding to the video.
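The weighted summation over the curve set can be sketched as follows. The weights are not specified in the text, so the sketch defaults to equal weights of 1; the function name and the default are assumptions.

```python
def video_score_weighted_sum(curve_scores, weights=None):
    """Weighted sum of the per-curve frequency domain scores of one video."""
    if weights is None:
        weights = [1.0] * len(curve_scores)   # assumed default: equal weights
    if len(weights) != len(curve_scores):
        raise ValueError("one weight is required per curve score")
    return sum(w * s for w, s in zip(weights, curve_scores))


print(video_score_weighted_sum([6.0, 1.0, 0.5]))
print(video_score_weighted_sum([6.0, 1.0, 0.5], weights=[0.5, 1.0, 2.0]))
```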

According to the method provided by the embodiment of the invention, for a certain video, the frequency domain scores corresponding to all amplitude-frequency characteristic curves in the amplitude-frequency characteristic curve set corresponding to the video are subjected to weighted summation, and the obtained summation value is used as the frequency domain score corresponding to the video. Before the anti-shake performance score corresponding to the video is calculated, the video can be screened, the video with more intense shake is selected as the video obtained after screening, the requirement for anti-shake processing is higher when the shake is more intense, the anti-shake performance score can reflect the real effect of the anti-shake processing, and the requirement for the value taking accuracy of the delay value is higher, so that the video is screened based on the frequency domain score of the video, and the updating process of the delay value and the process of obtaining the anti-shake performance score are continuously executed on the basis of taking the screened video as the basis for testing the anti-shake processing effect, so that the finally obtained delay value is more accurate.

With reference to the content of the foregoing embodiments, in an embodiment, the embodiment of the present invention does not specifically limit the manner of screening the plurality of videos according to the frequency domain score corresponding to each video, which includes but is not limited to: sorting the frequency domain scores corresponding to the plurality of videos from large to small, selecting a preset number of top-ranked videos, and taking them as the videos obtained after screening.

According to the content of the foregoing embodiments, the larger the frequency domain score of a video, the more intense the jitter when the video was shot. Therefore, in order to select the videos with more intense jitter during shooting, the frequency domain scores can be sorted from large to small, and a preset number of top-ranked videos in the sorting result can be selected.
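The sort-and-select step can be sketched in a few lines. The function name and the dictionary of per-video scores are hypothetical.

```python
def select_top_videos(scores, preset_number):
    """Sort videos by frequency domain score (largest first) and keep the first few."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:preset_number]]


scores = {"clip_a": 7.5, "clip_b": 2.1, "clip_c": 9.0, "clip_d": 0.3}
print(select_top_videos(scores, preset_number=2))
```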

According to the method provided by the embodiment of the invention, the frequency domain scores corresponding to the plurality of videos are sorted from large to small, and a preset number of top-ranked videos are selected as the videos obtained after screening. Because the videos are screened before the anti-shake performance score is calculated, videos with more intense jitter can be selected. The more intense the jitter, the higher the demand on the anti-shake processing, the better the anti-shake performance score reflects the real effect of the anti-shake processing, and the higher the demand on the accuracy of the delay value. Therefore, the videos are screened based on their frequency domain scores, and the screened videos are used as the basis for testing the anti-shake processing effect while the updating process of the delay value and the process of obtaining the anti-shake performance score are repeatedly executed, so that the finally obtained delay value is more accurate.

It should be noted that the technical solutions described above may be implemented as independent embodiments in actual implementation processes, or may be combined with each other and implemented as combined embodiments. In addition, when the contents of the embodiments of the present invention are described above, the different embodiments are described according to the corresponding sequence only based on the idea of convenient description, for example, the sequence of the data flow is adopted, and the execution sequence between the different embodiments is not limited. Accordingly, in the actual implementation process, if it is necessary to implement multiple embodiments provided by the present invention, the execution sequence provided in the embodiments of the present invention is not necessarily required, but the execution sequence between different embodiments may be arranged according to requirements.

In conjunction with the above embodiments, in one embodiment, as shown in fig. 3, there is provided a delay calibration apparatus including: an obtaining module 301 and an updating module 302, wherein:

an obtaining module 301, configured to obtain a video group, where the video group includes at least one video;

an updating module 302, configured to update a delay value between the inertial sensor and the vision system, acquire an anti-shake performance score corresponding to the video group based on the updated delay value, repeat the updating process of the delay value and the process of acquiring the anti-shake performance score until the acquired anti-shake performance score meets a preset condition, and acquire the delay value corresponding to the anti-shake performance score meeting the preset condition.

In one embodiment, the obtaining module 301 includes:

the acquisition submodule is used for acquiring a plurality of videos, and the videos are shot on the premise that the shooting equipment shakes;

the screening submodule is used for screening the plurality of videos according to the acquired attitude data of the shooting equipment in the shooting time period corresponding to each video in the plurality of videos, and the videos obtained after screening form a video group; wherein the attitude data of the photographing apparatus is acquired based on the inertial sensor.

In one embodiment, a screening submodule, comprising:

the conversion unit is used for converting the acquired attitude data of the shooting equipment in the shooting time period corresponding to each video into a frequency domain space so as to obtain an amplitude-frequency characteristic curve set corresponding to each video;

the acquisition unit is used for acquiring the frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video;

and the screening unit is used for screening the plurality of videos according to the frequency domain score corresponding to each video.

In one embodiment, the obtaining unit includes:

the first obtaining subunit is configured to, for an amplitude-frequency characteristic curve set corresponding to any video, obtain, according to a frequency and an amplitude corresponding to each amplitude-frequency characteristic curve in the amplitude-frequency characteristic curve set, a frequency domain score corresponding to each amplitude-frequency characteristic curve;

and the second obtaining subunit is configured to obtain the frequency domain score corresponding to the video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve.

In an embodiment, the first obtaining subunit is configured to obtain the product of the frequency and the amplitude corresponding to each amplitude-frequency characteristic curve, and use the product as the frequency domain score corresponding to that curve; or, to obtain a score of the frequency corresponding to each amplitude-frequency characteristic curve, obtain the product of that score and the amplitude corresponding to the curve, and use the product as the frequency domain score corresponding to that curve.

In an embodiment, the second obtaining subunit is configured to perform weighted summation on frequency domain scores corresponding to all amplitude-frequency characteristic curves in the amplitude-frequency characteristic curve set, and use the obtained sum as the frequency domain score corresponding to the video.

In an embodiment, the screening unit is configured to sort the frequency domain scores corresponding to the plurality of videos from large to small, select a preset number of top-ranked videos, and use them as the videos obtained after screening.

For the specific definition of the delay calibration apparatus, reference may be made to the above definition of the delay calibration method, which is not repeated here. The modules in the delay calibration apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication may be realized through Wi-Fi, an operator network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement a delay calibration method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the shell of the computer device, or an external keyboard, touch pad or mouse.

Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which that solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

acquiring a video group, wherein the video group at least comprises one video;

In one embodiment, the processor, when executing the computer program, further performs the steps of:

acquiring a plurality of videos, wherein the videos are shot while the shooting equipment is shaking;

screening the plurality of videos according to the acquired attitude data of the shooting equipment in the shooting time period corresponding to each video in the plurality of videos, and forming a video group by the videos obtained after screening; wherein the attitude data of the photographing apparatus is acquired based on the inertial sensor.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

acquiring a frequency domain score corresponding to each video according to the amplitude-frequency characteristic curve set corresponding to each video;

and screening the plurality of videos according to the frequency domain score corresponding to each video.

In one embodiment, the processor, when executing the computer program, further performs the steps of: for the amplitude-frequency characteristic curve set corresponding to any video, acquiring a frequency domain score corresponding to each amplitude-frequency characteristic curve according to the frequency and amplitude corresponding to each amplitude-frequency characteristic curve in the set; and acquiring the frequency domain score corresponding to the video according to the frequency domain score corresponding to each amplitude-frequency characteristic curve.

In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining the product of the frequency and the amplitude corresponding to each amplitude-frequency characteristic curve, and taking the product as the frequency domain score corresponding to that amplitude-frequency characteristic curve; or, obtaining a score for the frequency corresponding to each amplitude-frequency characteristic curve, obtaining the product of that score and the amplitude corresponding to the same curve, and taking the product as the frequency domain score corresponding to that amplitude-frequency characteristic curve.
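The two alternatives for the per-curve frequency domain score can be sketched as follows. This is only an illustrative sketch: it assumes that each amplitude-frequency characteristic curve is summarized by a single frequency and a single amplitude, and the names `Curve`, `curve_score` and `frequency_score` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Curve:
    """Hypothetical summary of one amplitude-frequency characteristic curve."""
    frequency_hz: float  # assumed: one representative frequency per curve
    amplitude: float     # assumed: one representative amplitude per curve


def curve_score(
    curve: Curve,
    frequency_score: Optional[Callable[[float], float]] = None,
) -> float:
    """Return the frequency domain score of a single curve.

    Without `frequency_score`, the score is frequency * amplitude.
    With `frequency_score`, the frequency is first mapped to a score,
    and that score is multiplied by the amplitude.
    """
    if frequency_score is None:
        return curve.frequency_hz * curve.amplitude
    return frequency_score(curve.frequency_hz) * curve.amplitude
```

How a frequency is mapped to a score in the second alternative is not specified here, so the mapping is left as a caller-supplied function.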

In one embodiment, the processor, when executing the computer program, further performs the steps of: performing a weighted summation of the frequency domain scores corresponding to all the amplitude-frequency characteristic curves in the amplitude-frequency characteristic curve set, and taking the resulting sum as the frequency domain score corresponding to the video.
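A minimal sketch of this weighted summation is given below. The function name `video_score` and the default of equal weights are assumptions made for illustration; the application does not state how the weights are chosen.

```python
from typing import Optional, Sequence


def video_score(
    curve_scores: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of the per-curve frequency domain scores of one video."""
    if weights is None:
        # Assumption for illustration: equal weights when none are given.
        weights = [1.0] * len(curve_scores)
    if len(weights) != len(curve_scores):
        raise ValueError("weights and curve_scores must have the same length")
    return sum(w * s for w, s in zip(weights, curve_scores))
```

For example, one curve per gyroscope axis could be weighted differently if shake about one axis matters more for the anti-shake evaluation; this use is likewise an assumption rather than something stated in the text.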

In one embodiment, the processor, when executing the computer program, further performs the steps of: sorting the frequency domain scores corresponding to the plurality of videos from largest to smallest, selecting a preset number of videos in that order, and taking the selected videos as the videos obtained after screening.
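The screening step can be sketched as a simple top-N selection. The function name `select_videos`, the use of string identifiers for videos, and the choice to take videos from the top of the descending order are assumptions made for illustration.

```python
from typing import Dict, List


def select_videos(scores: Dict[str, float], preset_count: int) -> List[str]:
    """Return the identifiers of the `preset_count` highest-scoring videos.

    `scores` maps a (hypothetical) video identifier to its frequency
    domain score; videos are ranked from largest to smallest score.
    """
    if preset_count < 0:
        raise ValueError("preset_count must be non-negative")
    ranked = sorted(scores, key=lambda video_id: scores[video_id], reverse=True)
    return ranked[:preset_count]
```

The selected identifiers then form the video group used in the subsequent delay calibration.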

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

acquiring a video group, wherein the video group at least comprises one video;

In one embodiment, the computer program when executed by the processor further performs the steps of: