Robust zero-watermark method for copyright protection of two-dimensional video frames and depth maps in three-dimensional video


Reading note: this technique, "Robust zero-watermark method for copyright protection of two-dimensional video frames and depth maps in three-dimensional video", was designed and created by Liu Xiyao, Zhang Yayun, Lou Jieting, Sun Yuying, Wang Lei, Liao Shenghui, Zhao Rongchang, and Zou Beiji on 2019-12-11. Its main content is as follows: the invention discloses a DIBR-based robust zero-watermarking method for protecting the copyright of two-dimensional video frames and depth maps in three-dimensional video. In this method, features are first extracted from the TIRIs of the two-dimensional video frames and the depth maps to generate master shares. Then, slave shares representing the relationship between the copyright information and the master shares are generated and stored for copyright identification. In addition, this scheme is the first to extract features from the two-dimensional video frames and the depth images separately according to their different copyright protection requirements, and the first to fuse two feature extraction methods for each of the two-dimensional video frames and the depth maps; this fusion ensures robustness against geometric attacks, signal attacks, and DIBR attacks simultaneously. The zero-watermark scheme causes no distortion to the synthesized three-dimensional video, shows sufficient robustness against various video attacks, and can protect the copyright of the two-dimensional video frames and depth images of a three-dimensional video simultaneously and independently.

1. A robust zero-watermarking method for copyright protection of two-dimensional video frames and depth maps in three-dimensional video, characterized by comprising a watermark sharing stage and a watermark recovery stage;

the watermark sharing stage comprises the following steps:

A10, respectively sampling N frames of video frames from the two-dimensional video frame sequence and the depth map sequence, wherein the value of N is preset;

A20, preprocessing the video frames: setting the size of the sampled video frames to a fixed value, and performing Gaussian low-pass filtering on the adjusted two-dimensional video frames;

A30, respectively calculating the TIRIs of the preprocessed two-dimensional video frames and the preprocessed depth map video frames;

A40, respectively generating the feature vectors of the two-dimensional video frames and the depth map video frames;

A50, rearranging each generated feature vector into a two-dimensional matrix to generate a master share, and performing a bitwise XOR operation between the master share and a binary watermark containing copyright information to generate a slave share;

A60, storing the generated slave shares in an authentication database;

the method comprises the following steps of A30-A40, wherein TIRI of the preprocessed two-dimensional video frame and TIRI of the preprocessed depth map video frame are respectively calculated, and feature vectors of the two-dimensional video frame and the feature vectors of the depth map video frame are respectively generated, and the method comprises the following steps:

calculating TIRI1 of the two-dimensional video frames, denoted TIRI_2d1; based on TIRI_2d1, generating feature vector one, F_2d1, by a combined dual-tree wavelet-quaternion method;

calculating TIRI2 of the two-dimensional video frames, denoted TIRI_2d2; based on TIRI_2d2, generating feature vector two, F_2d2, by the normalized TIRI-based deviation method;

calculating TIRI1 of the depth map video frames, denoted TIRI_depth1; based on TIRI_depth1, generating feature vector one, F_depth1, by two-dimensional DCT transform and extraction of low-frequency coefficients;

calculating TIRI2 of the depth map video frames, denoted TIRI_depth2; based on TIRI_depth2, generating feature vector two, F_depth2, by the normalized TIRI-based deviation method;

The watermark recovery stage comprises two-dimensional video watermark recovery and/or depth video watermark recovery, and specifically comprises the following steps:

B10, sampling and preprocessing the suspicious two-dimensional video frames and/or depth maps shared on the network according to steps A10-A50, respectively extracting features from the preprocessed suspicious two-dimensional video frames and/or depth maps, respectively generating two feature vectors for each, and generating the corresponding master shares;

B20, respectively performing bitwise XOR operations between the generated master shares and the corresponding slave shares stored in the authentication database, respectively obtaining two recovered watermarks;

B30, respectively calculating the error rates between the two watermarks recovered from the suspicious two-dimensional video frames and/or the two watermarks recovered from the suspicious depth maps and the corresponding original watermarks, and fusing the two error rates corresponding to the suspicious two-dimensional video frames and/or depth maps by taking the minimum value to obtain their respective final error rates, so as to identify the authenticity and copyright of the queried two-dimensional video frames and/or depth maps;

the feature extraction is respectively performed on the preprocessed suspicious two-dimensional video frame and/or the preprocessed depth map, and the generation of the two feature vectors respectively specifically comprises the following steps:

for suspicious two-dimensional video frames, generating feature vector F of the preprocessed two-dimensional video frames according to steps A30-A402d1' and F2d2’;

For the suspicious depth map, generating a feature vector F from the preprocessed depth map according to the steps A30-A40depth1' and Fdepth2’。

2. The zero-watermarking method according to claim 1, wherein generating feature vector F_2d1 from TIRI1 of the two-dimensional video frames by the combined dual-tree wavelet-quaternion method specifically comprises the following steps:

A411, dividing each TIRI_2d1 into non-overlapping sub-blocks B_s of size m × m, where s is the serial number of each sub-block;

A412, performing 3-level dual-tree wavelet transforms on the R, G and B components of sub-block B_s respectively, selecting the coefficients of the 3rd-level dual-tree wavelet transform domain to enhance robustness against low-pass filtering, noise and JPEG compression, and dividing the six transformed sub-domains H_{3,dr}, dr = 1, 2, ..., 6, into three sub-domain pairs, namely (H_{3,1}, H_{3,6}), (H_{3,2}, H_{3,5}) and (H_{3,3}, H_{3,4}); selecting the two pairs (H_{3,1}, H_{3,6}) and (H_{3,2}, H_{3,5}), combining the paired sub-domains H_{3,1}, H_{3,6} and H_{3,2}, H_{3,5} and calculating their amplitudes to obtain

M_{3,16} = sqrt(H_{3,1}^2 + H_{3,6}^2)

M_{3,25} = sqrt(H_{3,2}^2 + H_{3,5}^2)

A413, expressing the dual-tree wavelet transform amplitude matrix of each sub-block B_s as a quaternion matrix; the dual-tree wavelet transform amplitude matrix of a sub-block B_s is represented as

M_s = M_R · i + M_G · j + M_B · k

where i, j, k are imaginary units and M_R, M_G, M_B are the amplitude matrices of the R, G and B components;

A414, calculating the modulus matrix of M_s,

|M_s| = sqrt(M_R^2 + M_G^2 + M_B^2)

denoted M̄_s, and dividing M̄_s into 4 sub-arrays according to the original positions of the sub-domains H_{3,1}, H_{3,6}, H_{3,2} and H_{3,5};

performing DCT (discrete cosine transform) on one row of each of the 4 sub-arrays at a time, extracting the DC (direct current) coefficients respectively, forming the 4 obtained DC coefficients into a one-dimensional vector, performing DCT again, and taking the DC coefficient; each sub-block B_s finally yields 5 DC coefficients, and concatenating the 5 DC coefficients of all sub-blocks B_s gives a one-dimensional vector H_t;

A415, calculating the characteristic values: first computing the binary characteristic value of each TIRI_2d1 by median binarization according to the following formula,

F_t(l) = 1 if H_t(l) > t_med, F_t(l) = 0 if H_t(l) ≤ t_med, l = 1, ..., L

where t_med is the median of H_t and L denotes the length of the one-dimensional vector H_t; the final binary feature vector F_2d1 is obtained by concatenating the characteristic values of all TIRI_2d1, i.e. F_2d1 = (F_1, F_2, ..., F_t, ...).

3. The method according to claim 1, wherein calculating TIRI2 of the two-dimensional video frames, denoted TIRI_2d2, and generating feature vector two, F_2d2, from TIRI_2d2 by the normalized TIRI-based deviation method specifically comprises the following steps:

A421, generating the TIRI-based deviation D_2d(i,j,k) by computing the maximum absolute difference between each pixel of the preprocessed two-dimensional video frames F_norm2d and its 8 spatial neighborhood pixels in TIRI_2d2, as follows:

D_2d(i,j,k) = max(|TIRI_2d2(i±1, j±1) − F_norm2d(i,j,k)|)

where 2 ≤ i ≤ H−1, 2 ≤ j ≤ W−1, and 1 ≤ k ≤ L;

A422, normalizing the TIRI-based deviation, denoted N_2d(i,j,k), as follows:

N_2d(i,j,k) = arctan(D_2d(i,j,k) / TIRI_2d2(i,j));

A423, dividing the normalized TIRI-based deviation into a central circle and X−1 concentric rings, with the radius of the central circle and the width of each ring set to r;

for each pixel (i,j,k) in the k-th frame, first calculating its distance Dist(i,j,k) to the frame center point (i_o, j_o, k) as follows:

Dist(i,j,k) = sqrt((i − i_o)^2 + (j − j_o)^2)

then calculating the partition n of pixel (i,j,k) from Dist(i,j,k) as follows:

n = ⌈Dist(i,j,k) / r⌉;

A424, using the pixel values in the TIRI as weights for the normalized TIRI-based deviation, and computing the weighted centroid of the normalized TIRI-based deviation in each partition,

v(n,k) = Σ_{(i,j)∈P_n} TIRI_2d2(i,j) · N_2d(i,j,k) / Σ_{(i,j)∈P_n} TIRI_2d2(i,j)

where P_n denotes the set of pixels in partition n; the intermediate feature is formed as f = [v(1,1) … v(N,1) … v(1,K) … v(N,K)], where K is the number of frames composed of the normalized TIRI-based deviation;

A425, normalizing the intermediate feature by its mean and standard deviation to generate the feature fn, in the following manner:

μ = (1/(N·K)) Σ_{i=1}^{N·K} f(i)

σ = sqrt( (1/(N·K)) Σ_{i=1}^{N·K} (f(i) − μ)^2 )

fn(i) = (f(i) − μ) / σ

finally, binarizing fn according to its median value t by the following formula to obtain the final feature F_2d2:

F_2d2(i) = 1 if fn(i) > t, F_2d2(i) = 0 if fn(i) ≤ t.

4. The method according to claim 1, wherein generating feature vector one, F_depth1, from TIRI1 of the depth map video frames by two-dimensional DCT transform and extraction of low-frequency coefficients specifically comprises the following steps:

A431, performing a two-dimensional discrete cosine transform on each frame of TIRI_depth1 to obtain DCT_depth;

A432, selecting the low-frequency coefficients Coeff_depth of DCT_depth as follows:

Coeff_depth(i−1, j−1) = DCT_depth(i,j)

where 2 ≤ i ≤ 9 and 2 ≤ j ≤ 9;

A433, calculating the characteristic values: first computing the binary characteristic value f1 of each TIRI_depth1 by median binarization according to the following formula:

f1(i,j) = 1 if Coeff_depth(i,j) > t;

f1(i,j) = 0 if Coeff_depth(i,j) ≤ t;

where 1 ≤ i ≤ 8, 1 ≤ j ≤ 8, and t is the median value of Coeff_depth; concatenating the f1 of all TIRI_depth1 gives the final extracted feature vector F_depth1.

5. The method according to claim 1, wherein TIRI2 of the depth map video frames is calculated and denoted TIRI_depth2, and feature vector two, F_depth2, of the depth map video frames is generated from TIRI_depth2 according to the normalized TIRI-based deviation method of steps A421-A425.

6. The method according to claim 1, wherein TIRI1 and TIRI2 of the two-dimensional video frames and TIRI1 and TIRI2 of the depth map video frames in step A30 are calculated according to the following formulas:

TIRI_i(x,y) = Σ_{k=1}^{L} W_k · F_k(x,y)

W_k = a^k

when calculating the TIRIs of the two-dimensional video frames, F_k denotes the k-th two-dimensional video frame, W_k denotes the weight of the k-th two-dimensional video frame, L denotes the number of sampled frames combined into each TIRI when downsampling the two-dimensional video frame sequence, and 0 ≤ a ≤ 1; when calculating TIRI1 of the depth map video frames, F_k denotes the k-th depth map, W_k denotes the weight of the k-th depth map, L denotes the number of sampled frames combined into each TIRI when downsampling the depth map sequence, and 0 ≤ a ≤ 1.

7. The method according to claim 1, wherein the bit error rate BER is calculated as follows:

BER = (1 / (m_w × m_w)) Σ_{i=1}^{m_w} Σ_{j=1}^{m_w} W'(i,j) ⊕ W(i,j)

where W'(i,j) and W(i,j) denote the pixels of the recovered watermark and the original watermark respectively, ⊕ denotes the exclusive OR operation, and m_w × m_w is the size of the watermark.

Technical Field

The invention relates to the technical field of digital watermarking, and in particular to a robust zero-watermarking method for copyright protection of two-dimensional video frames and depth maps in three-dimensional video.

Background

With the development of internet technology and the innovation of new media, digital media technology has made breakthrough progress and changed the dominant position of traditional media in information dissemination. At the same time, as digital media become more abundant, a series of problems concerning them have also appeared. Among these, how to effectively protect the copyright of digital media and prevent it from being illegally copied or used has become an important issue. Digital Rights Management (DRM) is currently the main means of protecting the copyright of digital media disseminated over networks, and digital watermarking is increasingly regarded as an important DRM technology. Watermarking is also increasingly combined with copyright protection in various digital media application scenarios, such as video, audio, photographs, medical images, and 3D video.

With the increasing popularity of three-dimensional video, the risk of copyright infringement is also increasing. Therefore, copyright protection of 3D video has become a crucial issue, see M. Asikuzzaman, M. R. Pickering, An overview of digital video watermarking, IEEE Trans. Circuits Syst. Video Technol. 28 (2018) 2131-. 3D video can be stored in two main formats. One is the side-by-side format, which contains left and right views captured simultaneously by two identical cameras shooting the same scene at different positions and angles. The other format uses Depth-Image-Based Rendering (DIBR) technology. The DIBR-based format comprises two-dimensional video frames and their depth maps, and the frames are warped by DIBR to obtain the corresponding three-dimensional video. Compared with the side-by-side format, the storage and transmission bandwidth cost of the DIBR-based format is lower, because the depth map contains only gray pixels and smooth regions, which can be compressed efficiently, see S. C. Pei, Y. Y. Wang, A new 3D unseen visible watermarking and its applications to multimedia, in: Proc. IEEE Global Conf. Consumer Electronics (GCCE), Japan, 2014, pp. 140-143. In addition, producers can convert existing 2D video into 3D video using DIBR technology. These advantages are the reason why many 3D videos are stored and transmitted in DIBR-based formats. Therefore, DIBR-based three-dimensional video is the focus of our research.

Protecting DIBR three-dimensional video is more complex than protecting traditional two-dimensional video. The original two-dimensional video frames and the depth maps of the three-dimensional video may be converted into three-dimensional synthesized frames. Thus, the watermark should be obtainable from any of the original two-dimensional frames, the synthesized frames, or the depth maps. Moreover, since the frames synthesized by DIBR are shifted horizontally with respect to the original two-dimensional frames, the protection scheme for three-dimensional video should be DIBR-invariant. In addition, for three-dimensional video synthesized from existing two-dimensional video, the producers of the two-dimensional video frames and the depth maps may be different; in this case, the copyrights of the 2D video frames and the depth maps should be protected independently.

Watermarking is a general solution to the Digital Rights Management (DRM) problem, but existing DIBR-based three-dimensional video watermarking schemes, which can be mainly divided into two-dimensional video frame watermarking, depth image watermarking and zero-watermarking schemes, all have room for improvement. 1) Watermarking schemes based on two-dimensional video frames embed the watermark only in the two-dimensional video frames and cause irreversible distortion to the video content. Furthermore, they ignore the situation in which the producers of the two-dimensional video frames and the depth maps may be different, and cannot independently protect the copyright of the depth maps. 2) Watermarking schemes based on depth maps embed watermarks only into the depth maps and are not robust against severe signal attacks and geometric attacks. In addition, they cannot independently protect the copyright of the two-dimensional video frames. 3) Zero-watermarking schemes generate a secondary share representing the mapping relationship between the video features and the watermark, and the watermark does not need to be embedded directly, so no distortion is caused to the three-dimensional video; however, it is often difficult for them to be robust to geometric attacks, signal attacks, and DIBR at the same time.

Disclosure of Invention

Considering that conventional Depth-Image-Based Rendering (DIBR) three-dimensional video zero-watermark schemes cannot protect the copyright of the depth maps and the two-dimensional video frames simultaneously and independently, and cannot be robust to signal attacks, geometric attacks and DIBR at the same time, a robust zero-watermark scheme that protects the copyright of the two-dimensional video frames and the depth maps separately in three-dimensional video is proposed for the first time. First, features are extracted from the TIRIs (temporal information representative images) of the two-dimensional video frames and the depth maps, and master shares are generated. Then slave shares representing the relationship between the copyright information and the master shares are generated and stored for copyright identification. In addition, this scheme adopts different protection schemes for the two-dimensional video frames and the depth images for the first time according to their different copyright protection requirements, and for the first time uses two different feature extraction methods for each of the two-dimensional video frames and the depth images and fuses the resulting features, which ensures robustness against geometric attacks, signal attacks and DIBR attacks, avoids distortion of the synthesized three-dimensional video, and enables copyright protection of the two-dimensional video frames and depth images of the three-dimensional video simultaneously and independently.

In order to achieve the technical purpose, the invention provides the following technical scheme:

a robustness zero-watermark method facing two-dimensional video frame and depth map right protection in three-dimensional video comprises a watermark sharing stage and a watermark recovery stage;

the watermark sharing stage comprises the following steps:

A10, respectively sampling N frames of video frames from the two-dimensional video frame sequence and the depth map sequence, wherein the value of N is preset;

A20, preprocessing the video frames: setting the size of the sampled video frames to a fixed value, and performing Gaussian low-pass filtering on the adjusted two-dimensional video frames;

A30, respectively calculating the TIRIs of the preprocessed two-dimensional video frames and the preprocessed depth map video frames;

A40, respectively generating the feature vectors of the two-dimensional video frames and the depth map video frames;

A50, rearranging each generated feature vector into a two-dimensional matrix to generate a master share, and performing a bitwise XOR operation between the master share and a binary watermark containing copyright information to generate a slave share;

A60, storing the generated slave shares in an authentication database;

in steps A30-A40, respectively calculating the TIRIs of the preprocessed two-dimensional video frames and depth map video frames and respectively generating the feature vectors of the two-dimensional video frames and the depth map video frames comprises the following steps:

calculating TIRI1 of the two-dimensional video frames, denoted TIRI_2d1; based on TIRI_2d1, generating feature vector one, F_2d1, by a combined dual-tree wavelet-quaternion method;

calculating TIRI2 of the two-dimensional video frames, denoted TIRI_2d2; based on TIRI_2d2, generating feature vector two, F_2d2, by the normalized TIRI-based deviation method;

calculating TIRI1 of the depth map video frames, denoted TIRI_depth1; based on TIRI_depth1, generating feature vector one, F_depth1, by two-dimensional DCT transform and extraction of low-frequency coefficients;

calculating TIRI2 of the depth map video frames, denoted TIRI_depth2; based on TIRI_depth2, generating feature vector two, F_depth2, by the normalized TIRI-based deviation method;

The watermark recovery stage comprises two-dimensional video watermark recovery and/or depth video watermark recovery, and specifically comprises the following steps:

B10, sampling and preprocessing the suspicious two-dimensional video frames and/or depth maps shared on the network according to steps A10-A20;

B20, respectively extracting features from the preprocessed suspicious two-dimensional video frames and/or depth maps, respectively generating two feature vectors for each, and generating the corresponding master shares;

B30, respectively performing bitwise XOR operations between the generated master shares and the corresponding slave shares stored in the authentication database, respectively obtaining two recovered watermarks;

B40, calculating the error rates by comparing the two watermarks recovered from the suspicious two-dimensional video frames and/or the two watermarks recovered from the suspicious depth maps with the corresponding original watermarks, and fusing the two error rates corresponding to the suspicious two-dimensional video frames and/or depth maps by taking the minimum value to obtain their respective final error rates, so as to identify the authenticity and copyright of the queried two-dimensional video frames and/or depth maps;

the feature extraction is respectively performed on the preprocessed suspicious two-dimensional video frame and/or the preprocessed depth map, and the generation of the two feature vectors respectively specifically comprises the following steps:

for suspicious two-dimensional video frames, generating feature vector F of the preprocessed two-dimensional video frames according to steps A30-A402d1' and F2d2’。

For the suspicious depth map, generating a feature vector F from the preprocessed depth map according to the steps A30-A40depth1' and Fdepth2’。

In this method, features are first extracted from the TIRIs (temporal information representative images) of the two-dimensional video frames and the depth maps to generate master shares. Then slave shares indicating the relationship between the copyright information and the master shares are generated and stored for copyright identification. In addition, the method extracts features from the two-dimensional video frames and the depth images separately according to their different copyright protection requirements for the first time, and for the first time uses two feature extraction methods for each of the two-dimensional video frames and the depth images and fuses the resulting features; this fusion ensures robustness against geometric attacks, signal attacks and DIBR attacks simultaneously. The zero-watermark scheme causes no distortion to the synthesized three-dimensional video, shows sufficient robustness against various video attacks, and can protect the copyright of the two-dimensional video frames and depth images of the three-dimensional video simultaneously and independently. When performing copyright authentication of DIBR three-dimensional video, we apply a flexible authentication mechanism to fully meet the requirements of DRM (digital rights management) for the first time. On the one hand, when the copyright information of the two-dimensional video differs from that of the depth video, copyright identification is carried out separately for the two-dimensional video and the depth video. On the other hand, when the copyright information of the two-dimensional video is the same as that of the depth video, only the two-dimensional video is used for copyright identification, and its result is taken as the final copyright identification result of the three-dimensional video, because the two-dimensional video contains more texture information and is therefore more distinguishable; moreover, under the feature fusion scheme for the two-dimensional video frames, the two-dimensional video is sufficiently robust to various attacks such as DIBR, translation, rotation, additive noise, and filtering.

Further, generating the feature vector F_2d1 from TIRI1 of the two-dimensional video frames by the combined dual-tree wavelet-quaternion method specifically comprises the following steps:

A411, dividing each TIRI_2d1 into non-overlapping sub-blocks B_s of size m × m, where s is the serial number of each sub-block;

A412, performing 3-level dual-tree wavelet transforms on the R, G and B components of sub-block B_s respectively, and selecting the coefficients of the 3rd-level dual-tree wavelet transform domain to enhance robustness against low-pass filtering, noise and JPEG compression; the six transformed sub-domains H_{3,dr} (dr = 1, 2, ..., 6) are divided into three sub-domain pairs, namely (H_{3,1}, H_{3,6}), (H_{3,2}, H_{3,5}) and (H_{3,3}, H_{3,4}), which contain mainly horizontal, diagonal and vertical edges respectively; because vertical edges are more easily distorted during DIBR, only (H_{3,1}, H_{3,6}) and (H_{3,2}, H_{3,5}) are selected; the paired sub-domains H_{3,1}, H_{3,6} and H_{3,2}, H_{3,5} are combined in the manner of equation (3) and their amplitudes are calculated, obtaining

M_{3,16} = sqrt(H_{3,1}^2 + H_{3,6}^2)    (3)

M_{3,25} = sqrt(H_{3,2}^2 + H_{3,5}^2)

A413, expressing the dual-tree wavelet transform amplitude matrix of sub-block B_s as a quaternion matrix. A quaternion q is a hypercomplex number comprising a scalar part s(q) = a and a vector part v(q), represented as follows:

q = s(q) + v(q) = a + bi + cj + dk

where a, b, c, d are real numbers and i, j, k are imaginary units; a quaternion is called a pure quaternion when its scalar part is equal to zero, and quaternion arithmetic satisfies the following rules:

i^2 = j^2 = k^2 = ijk = −1

ij = −ji = k, ki = −ik = j, jk = −kj = i

|q| = sqrt(a^2 + b^2 + c^2 + d^2)

the dual-tree wavelet transform amplitude matrix of a sub-block B_s can thus be composed of a set of pure quaternions:

M_s = M_R · i + M_G · j + M_B · k

where M_R, M_G, M_B are the amplitude matrices of the R, G and B components;

A414, calculating the modulus matrix of M_s,

|M_s| = sqrt(M_R^2 + M_G^2 + M_B^2)

denoted M̄_s; according to the original positions of the sub-domains H_{3,1}, H_{3,6}, H_{3,2} and H_{3,5}, dividing M̄_s into 4 sub-arrays M̄_{3,1}, M̄_{3,6}, M̄_{3,2} and M̄_{3,5};

performing DCT (discrete cosine transform) on one row of each of the 4 sub-arrays at a time, extracting the DC (direct current) coefficients respectively, forming the 4 obtained DC coefficients into a one-dimensional vector, performing DCT again, and taking the DC coefficient; each sub-block B_s finally yields 5 DC coefficients, and concatenating the 5 DC coefficients of all sub-blocks B_s gives a one-dimensional vector H_t;

A415, calculating the characteristic values: first computing the binary characteristic value of each TIRI_2d1 by median binarization according to the following formula,

F_t(l) = 1 if H_t(l) > t_med, F_t(l) = 0 if H_t(l) ≤ t_med, l = 1, ..., L

where t_med is the median of H_t and L denotes the length of the one-dimensional vector H_t; the final binary feature vector F_2d1 is obtained by concatenating the characteristic values of all TIRI_2d1, i.e. F_2d1 = (F_1, F_2, ..., F_t, ...).

In the above feature extraction method for the two-dimensional video frames, the low- and medium-frequency coefficients of the 3rd-level dual-tree wavelet transform domain are selected, which enhances robustness against low-pass filtering, noise and JPEG compression; and because only the two sub-domain pairs (H_{3,1}, H_{3,6}) and (H_{3,2}, H_{3,5}), which contain more horizontal and diagonal edges and fewer vertical edges, are selected, the influence of the DIBR operation on the vertical edges of the two-dimensional video frames is reduced and robustness against DIBR attacks is improved.

Further, generating feature vector two, F_2d2, from TIRI_2d2 of the two-dimensional video frames by the normalized TIRI-based deviation method specifically comprises the following steps:

A421, generating the TIRI-based deviation D_2d(i,j,k) by computing the maximum absolute difference between each pixel of F_norm2d and its 8 spatial neighborhood pixels in TIRI_2d2, as follows:

D_2d(i,j,k) = max(|TIRI_2d2(i±1, j±1) − F_norm2d(i,j,k)|)

where 2 ≤ i ≤ H−1, 2 ≤ j ≤ W−1, and 1 ≤ k ≤ L;

A422, normalizing the TIRI-based deviation, denoted N_2d(i,j,k), as follows:

N_2d(i,j,k) = arctan(D_2d(i,j,k) / TIRI_2d2(i,j));

A423, dividing the normalized TIRI-based deviation into a central circle and X−1 concentric rings, with the radius of the central circle and the width of each ring set to r;

for each pixel (i,j,k) in the k-th frame, first calculating its distance Dist(i,j,k) to the frame center point (i_o, j_o, k) as follows:

Dist(i,j,k) = sqrt((i − i_o)^2 + (j − j_o)^2)

then calculating the partition n of pixel (i,j,k) from Dist(i,j,k) as follows:

n = ⌈Dist(i,j,k) / r⌉

When the video is rotated or flipped, pixels partitioned in this manner still belong to their original circular or annular partition, which guarantees the robustness of the features to rotation and flipping attacks. Furthermore, the region outside the largest ring is not used in our study, for the following two reasons. On the one hand, the primary visual content of a video frame is usually concentrated in its central region, and in general the importance of a pixel increases as its distance from the frame center decreases; therefore, discarding the features generated by the region outside the largest ring does not lose much important visual information. On the other hand, since these regions are the most common locations for logo insertion and edge cropping attacks, discarding them enhances the robustness of the features to such attacks.

A424, using the pixel values in the TIRI as weights for the normalized TIRI-based deviation, and computing the weighted centroid of the normalized TIRI-based deviation in each partition,

v(n,k) = Σ_{(i,j)∈P_n} TIRI_2d2(i,j) · N_2d(i,j,k) / Σ_{(i,j)∈P_n} TIRI_2d2(i,j)

where P_n denotes the set of pixels in partition n; then generating the intermediate feature of the two-dimensional video frame, denoted f_2d, according to f = [v(1,1) … v(N,1) … v(1,K) … v(N,K)], where K is the number of frames composed of the normalized TIRI-based deviation;

A425, normalizing the intermediate feature by its mean and standard deviation to generate the feature fn, in the following manner:

μ = (1/(N·K)) Σ_{i=1}^{N·K} f(i)

σ = sqrt( (1/(N·K)) Σ_{i=1}^{N·K} (f(i) − μ)^2 )

fn(i) = (f(i) − μ) / σ

finally, binarizing fn according to its median value t by the following formula to obtain the final feature F_2d2:

F_2d2(i) = 1 if fn(i) > t, F_2d2(i) = 0 if fn(i) ≤ t.

Further, generating feature vector one, F_depth1, from TIRI1 of the depth map video frames by two-dimensional DCT transform and extraction of low-frequency coefficients specifically comprises the following steps:

A431, performing a two-dimensional discrete cosine transform on each frame of TIRI_depth1 to obtain DCT_depth;

A432, selecting the low-frequency coefficients Coeff_depth of DCT_depth as follows:

Coeff_depth(i−1, j−1) = DCT_depth(i,j)

where 2 ≤ i ≤ 9 and 2 ≤ j ≤ 9; the DC coefficient of DCT_depth is excluded to improve the distinguishability of the feature values;

A433, binarizing Coeff_depth according to its median value to ensure maximum distinguishability of the binary features, yielding f1:

f1(i,j) = 1 if Coeff_depth(i,j) > t;

f1(i,j) = 0 if Coeff_depth(i,j) ≤ t;

where 1 ≤ i ≤ 8, 1 ≤ j ≤ 8, and t is the median value of Coeff_depth; concatenating the f1 of all TIRI_depth1 gives the final extracted feature vector F_depth1.

This feature extraction method for the depth maps extracts the low-frequency coefficients of the DCT domain, which enhances robustness against additive noise attacks and low-pass filtering attacks, and the DC value is removed to enhance the distinguishability of the features.

Further, feature vector two, F_depth2, of the depth map video frames is generated from TIRI_depth2 according to the normalized TIRI-based deviation method of steps A421-A425.

Further, TIRI1 and TIRI2 of the two-dimensional video frames and TIRI1 and TIRI2 of the depth map video frames in step A30 are calculated according to the following formulas:

TIRI_i(x,y) = Σ_{k=1}^{L} W_k · F_k(x,y)

W_k = a^k

when calculating the TIRIs of the two-dimensional video frames, F_k denotes the k-th two-dimensional video frame, W_k denotes the weight of the k-th two-dimensional video frame, L denotes the number of sampled frames combined into each TIRI when downsampling the two-dimensional video frame sequence, i denotes the serial number of the TIRI of the two-dimensional video frames, and 0 ≤ a ≤ 1; when calculating TIRI1 of the depth map video frames, F_k denotes the k-th depth map, W_k denotes the weight of the k-th depth map, L denotes the number of sampled frames combined into each TIRI when downsampling the depth map sequence, i denotes the serial number of the TIRI of the depth maps, and 0 ≤ a ≤ 1. As a approaches 0, the generated TIRI contains more spatial information and less temporal information, resulting in a more discriminative representative image. Conversely, as a approaches 1, the generated TIRI becomes a blurred image containing mean temporal information, resulting in a more robust representative image.

Further, the bit error rate BER is calculated as follows:

BER = (1 / (m_w × m_w)) Σ_{i=1}^{m_w} Σ_{j=1}^{m_w} W'(i,j) ⊕ W(i,j)

where W'(i,j) and W(i,j) denote the pixels of the recovered watermark and the original watermark respectively, ⊕ denotes the XOR operation, and m_w × m_w is the size of the watermark.

Advantageous effects

In order to protect the two-dimensional video frames and the depth maps simultaneously and independently, this embodiment adopts different feature extraction methods for the two-dimensional video frames and the depth maps. For the two-dimensional video frames, a combined dual-tree wavelet-quaternion method is adopted, taking the horizontal and diagonal sub-domains of the low-frequency transform domain of the dual-tree wavelet, which contain fewer vertical edges, so that the extracted feature vector one, F_2d1, resists signal attacks and DIBR attacks; feature vector two, F_2d2, obtained by the normalized TIRI-based deviation method, resists geometric attacks such as rotation and shearing. The slave shares O_2d1 and O_2d2 of the two-dimensional video frames are then generated and stored in the slave-share database. In the watermark recovery stage for the two-dimensional video frames, the master shares M_2d1' and M_2d2' of the queried two-dimensional video frames are computed and XORed with the stored slave shares O_2d1 and O_2d2 respectively to obtain the recovered watermarks W_2d1' and W_2d2', which are compared with the original watermark image, and the final copyright identification result is determined by minimum-value fusion. For the depth maps, feature vector one, F_depth1, generated by two-dimensional DCT transform of the depth maps and extraction of low-frequency coefficients, resists signal attacks; feature vector two, F_depth2, obtained by the normalized TIRI-based deviation method, resists geometric attacks such as rotation and shearing. The slave shares O_depth1 and O_depth2 of the depth maps are then generated. In the depth map watermark recovery stage, the master shares M_depth1' and M_depth2' of the queried depth maps of the three-dimensional video are computed and XORed with the stored slave shares O_depth1 and O_depth2 respectively to obtain the recovered watermarks W_depth1' and W_depth2', which are compared with the original watermark image to obtain the BERs, and the final copyright identification result is determined by the minimum-value fusion method.

Drawings

Fig. 1 is a general flowchart of a zero-watermarking method in an embodiment of the invention;

FIG. 2 is a flow chart of two methods of two-dimensional video frame feature extraction according to an embodiment of the present invention;

FIG. 3 is a flow chart of a method for extracting two feature vectors of a depth map according to an embodiment of the present invention;

FIG. 4 is an original two-dimensional video frame, a depth map and a binary watermark image in an embodiment of the present invention;

FIG. 5 is a flow chart of copyright identification in an embodiment of the present invention;

fig. 6 is a binarized watermark image recovered from a suspicious two-dimensional video according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.

As shown in fig. 1 to fig. 6, the present invention provides a robust zero-watermark method for copyright protection of two-dimensional video frames and depth maps in three-dimensional video, which comprises a watermark sharing stage and a watermark recovery stage;

the watermark sharing stage comprises a two-dimensional video frame part and a depth map part;

the two-dimensional video frame portion comprises the steps of:

a10: n frames of two-dimensional video frames are sampled from the two-dimensional video frame sequence to ensure that the lengths of the feature vectors of all input images are equal, wherein the value of N in the embodiment is 100, and equal-interval sampling is adopted during sampling.

A20: and preprocessing the two-dimensional video frame. The size of the two-dimensional video frame is set to be P multiplied by P pixels, and the robustness of resisting the scaling attack is enhanced by the operation; smoothing the adjusted two-dimensional video frame through Gaussian low-pass filtering to enhance the robustness of the video frame against noise attack; recording the processed two-dimensional video frame as Fnorm2dThe value of P in this embodiment is 320. A10 and a20 correspond to the process of preprocessing a two-dimensional video frame in fig. 2 (a).

A31: solving for F according to equations (1) and (2)norm2dAnd using the TIRI (temporal information representative images) of (1)2dAnd (4) showing.

Figure BDA0002312059410000091

Wk=ak(2)

Wherein, FkRepresenting a k-th frame of a two-dimensional video frame, WkAnd (3) representing the weight of the kth frame two-dimensional video frame, wherein L represents the number of sampling frames of the two-dimensional video frame sequence for down sampling, and a is more than or equal to 0 and less than or equal to 1. In this embodiment, L has a value of 20, i.e., 100/20 ═ 5 frames of TIRI are generated2dAnd a is set to 1. A31 corresponds to the process of averaging in fig. 2 (a).
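As a concrete illustration of equations (1) and (2), the following is a minimal NumPy sketch of TIRI generation; dividing by the weight sum is our assumption, made so that the a = 1 case reduces to plain frame averaging, matching the "averaging" description above (the binarized features are insensitive to this global scale).

# Minimal sketch of TIRI generation per equations (1)-(2), assuming NumPy.
import numpy as np

def compute_tiris(frames, group_len=20, a=1.0):
    """frames: (N, P, P[, C]) array; returns one TIRI per group_len frames."""
    w = a ** np.arange(1, group_len + 1, dtype=np.float64)     # W_k = a^k
    w = w.reshape((-1,) + (1,) * (frames.ndim - 1))
    tiris = []
    for g in range(len(frames) // group_len):
        group = frames[g * group_len:(g + 1) * group_len].astype(np.float64)
        tiris.append((w * group).sum(axis=0) / w.sum())        # equation (1), normalized
    return np.stack(tiris)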

Step A41: generating feature vector F_2d1 from TIRI1 of the two-dimensional video frames by the combined dual-tree wavelet-quaternion method.

As shown in fig. 2(a), step A41 specifically includes the following steps:

Step A411: dividing each TIRI_2d1 into non-overlapping sub-blocks B_s of size m × m, where s is the serial number of each sub-block. After testing, the value of m is set to 40 in this embodiment.

Step A412: performing 3-level dual-tree wavelet transforms on the R, G and B components of sub-block B_s respectively, and selecting the coefficients of the 3rd-level dual-tree wavelet transform domain to enhance robustness against low-pass filtering, noise and JPEG compression. The six transformed sub-domains H_{3,dr} (dr = 1, 2, ..., 6) are divided into three sub-domain pairs, namely (H_{3,1}, H_{3,6}), (H_{3,2}, H_{3,5}) and (H_{3,3}, H_{3,4}), which contain mainly horizontal, diagonal and vertical edges respectively. In this example we use only the two pairs (H_{3,1}, H_{3,6}) and (H_{3,2}, H_{3,5}), because vertical edges are more easily distorted during DIBR. The paired sub-domains H_{3,1}, H_{3,6} and H_{3,2}, H_{3,5} are combined according to equation (3) and their amplitudes are calculated, obtaining

M_{3,16} = sqrt(H_{3,1}^2 + H_{3,6}^2)    (3)

M_{3,25} = sqrt(H_{3,2}^2 + H_{3,5}^2)

Step A413: a quaternion representation of a dual-tree wavelet transform magnitude matrix of a color image. The quaternion q is a hypercomplex number comprising a scalar section s (q) a and a vector section v (q), and is expressed as follows:

q=s(q)+v(q)=a+bi+cj+dk

wherein a, b, c, d are real numbers, i, j, k are imaginary numbers, the quaternion is called a pure quaternion when the vector part of the quaternion is equal to zero, and the calculation of the quaternion satisfies the following rule:

i2=j2=k2=ijk=-1

ij=-ji=k,ki=-ik=j,jk=-kj=i

for quaternion, the dual-tree wavelet transform magnitude matrix for a color image may consist of a set of pure quaternions:

Figure BDA0002312059410000104

step A414: computing

Figure BDA0002312059410000105

And is recorded as

Figure BDA0002312059410000106

According to the original H3,1,H3,6,H3,2And H3,5The corresponding positions of the sub-domains willIs divided into 4 sub-arrays in the following way

Figure BDA0002312059410000108

Figure BDA0002312059410000109

Figure BDA00023120594100001010

In the present embodiment, it is preferred that,

Figure BDA00023120594100001011

the size of each subarray is 25 × 25. And performing DCT (discrete cosine transformation) on one row of the 4 sub-arrays each time, extracting DC coefficients with higher robustness respectively, forming the obtained 4 DC coefficients into a one-dimensional vector, performing DCT again, and taking the DC coefficients. Sub-block BsFinally, 5 DC coefficients are obtained, and all the sub-blocks B are connectedsThe 5 DC coefficients of (a) result in a one-dimensional vector of length (320/40) × (320/40) × 5 ═ 320, denoted as Hs

Step A415: calculating the characteristic value by calculating TIRI according to the following formula2d1Is determined by the characteristic value of (a),

Figure BDA0002312059410000111

wherein L represents a one-dimensional vector HjLength of (2), final binary feature vector F2d1From all TIRIs2d1Is obtained by concatenating characteristic values of, i.e. F2d1=(Ft(1),Ft(2)…Ft(s)…). In this embodiment, L is 320, and the feature vector F2d1Has a dimension of 320 × 5 to 1600 bits.

In the above feature extraction method for the two-dimensional video frames, the low- and medium-frequency coefficients of the 3rd-level dual-tree wavelet transform domain are selected, which enhances robustness against low-pass filtering, noise and JPEG compression; and because only the two sub-domain pairs (H_{3,1}, H_{3,6}) and (H_{3,2}, H_{3,5}), which contain more horizontal and diagonal edges and fewer vertical edges, are selected, the influence of the DIBR operation on the vertical edges of the two-dimensional video frames is reduced and robustness against DIBR attacks is improved.
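A loose sketch of steps A411-A415 for one TIRI is given below. It assumes the third-party dtcwt Python package plus SciPy and NumPy; because dtcwt exposes the two wavelet trees as complex coefficients per orientation, the four selected sub-arrays are approximated here by the channel-combined magnitudes of four assumed orientation indices (0, 5, 1, 4), so this illustrates the structure of the computation rather than the exact patented transform.

# Loose sketch of steps A411-A415 for one TIRI, assuming the `dtcwt`
# package, SciPy and NumPy. Mapping the patent's sub-domain pairs
# (H_{3,1},H_{3,6}) and (H_{3,2},H_{3,5}) to dtcwt orientation indices
# (0, 5, 1, 4) is an assumption made for illustration.
import numpy as np
import dtcwt
from scipy.fftpack import dct

def f2d1_feature(tiri_rgb, block=40):
    """tiri_rgb: (P, P, 3) array; returns the binary feature of one TIRI."""
    transform = dtcwt.Transform2d()
    p = tiri_rgb.shape[0]
    dc_feats = []
    for bi in range(0, p, block):
        for bj in range(0, p, block):
            # level-3 complex highpasses per colour channel: (block/8, block/8, 6)
            hp = [transform.forward(
                      tiri_rgb[bi:bi + block, bj:bj + block, c].astype(float),
                      nlevels=3).highpasses[2] for c in range(3)]
            # quaternion modulus across R, G, B for four selected sub-domains
            subs = [np.sqrt(sum(np.abs(h[:, :, d]) ** 2 for h in hp))
                    for d in (0, 5, 1, 4)]
            for r in range(subs[0].shape[0]):
                # DC of each sub-array row, then DC of those 4 DCs
                row_dcs = np.array([dct(s[r], norm='ortho')[0] for s in subs])
                dc_feats.append(dct(row_dcs, norm='ortho')[0])
    dc_feats = np.asarray(dc_feats)            # 5 DC coefficients per sub-block
    return (dc_feats > np.median(dc_feats)).astype(np.uint8)  # median binarization

With P = 320 and block = 40 this yields 8 × 8 sub-blocks × 5 coefficients = 320 bits per TIRI; concatenating the 5 TIRIs gives the 1600-bit F_2d1 of this embodiment.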

Step A32: solving for F according to equations (1) and (2)norm2dTIRI2, and TIRI2d2Is shown in the specification, wherein FkRepresenting a k-th frame of a two-dimensional video frame, WkAnd the weight of the kth frame two-dimensional video frame is represented, L represents the number of sampling frames of the two-dimensional video frame sequence downsampling, i represents the serial number of TIRI of the depth map, and a is more than or equal to 0 and less than or equal to 1. In this embodiment, one frame is taken every 5 frames in an interval sampling manner, that is, L has a value of 100/5-20. And a is set to 1 in the same manner as step a 31. B32 corresponds to the averaging process in fig. 2 (B).

Step A42: TIRI generation using normalized TIRI-based bias2d2Feature vector of (2) two F2d2

As shown in fig. 2(b), step a42 specifically includes the following steps:

step A421: preprocessing frames F by computationnorm2dThe maximum absolute difference between the pixel in (a) and its 8 spatial neighborhood pixels in the TIRIs, the TIRI-based deviation D is generated as follows according to equation (13)2d(i,j,k);

D2d(i,j,k)=max(|TIRI2d2(i±1,j±1)-Fnorm2d(i,j,k)|) (13)

Wherein i is more than or equal to 2 and less than or equal to 319, j is more than or equal to 2 and less than or equal to 319, and k is more than or equal to 1 and less than or equal to 100.

Step A422: normalize TIRI-based bias as N in (14)2d(i,j,k);

N2d(i,j,k)=arctan(D2d(i,j,k)/TIRI2d2(i,j)) (14)

Step A423: dividing the normalized deviation based on TIRI into a central circle and an X-1 concentric circle, wherein the radius of the central circle and the width of the concentric circle are set as r;

for each pixel (i, j, k) in the k-th frame, it is first computed to the frame center point (i, j, k)o,joK), the distance Dist (i, j, k) is as shown in equation (15).

Figure BDA0002312059410000112

The partition n of the pixel (x, y, k) is then calculated from Dist (i, j, k), as shown in (16).

Figure BDA0002312059410000121

In the present embodiment, the size of a frame composed of normalized TIRI-based offsets is 320 × 320, where X is 16 and r is 10.

Step A424: the pixel values in the TIRI are used as weights for the normalized TIRI-based bias. The centroid of the normalized TIRI-based bias in each partition is calculated using (17), as shown at (18), to generate an intermediate feature of the two-dimensional video frame, denoted as f2d

Figure BDA0002312059410000122

f=[v(1,1)...v(n,1)...v(1,K)...v(N,K)](18)

Where K is the number of frames consisting of normalized TIRI-based bias. In this embodiment, K equals 100, f2dDimension K X equals 1600 bits.

Step A425: the intermediate features are normalized by mean and standard deviation to generate features fn, as shown in (19).

Figure BDA0002312059410000123

Figure BDA0002312059410000124

Figure BDA0002312059410000125

F is binarized according to their median values as shown in the following formula (20) to obtain the final feature F2d2

Figure BDA0002312059410000126
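A minimal NumPy sketch of steps A421-A425 follows, with the embodiment's values X = 16 and r = 10. Border handling via np.roll and the use of arctan2 in place of arctan(D/TIRI) are simplifying assumptions; pixels whose ring index exceeds X (the frame corners) are discarded, as described above.

# Minimal sketch of steps A421-A425 (normalized TIRI-based deviation
# feature), assuming NumPy. X = 16 rings of width r = 10 and P = 320
# follow this embodiment; edge handling via np.roll is an approximation.
import numpy as np

def f2d2_feature(frames, tiris, group_len=5, X=16, r=10):
    """frames: (K, P, P) array; tiris: (K // group_len, P, P) array."""
    K, P, _ = frames.shape
    ii, jj = np.mgrid[0:P, 0:P]
    ring = np.ceil(np.hypot(ii - P / 2, jj - P / 2) / r).astype(int)  # eq. (15)-(16)
    feats = []
    for k in range(K):
        t = tiris[k // group_len]
        d = np.zeros_like(t)
        for di in (-1, 0, 1):                      # A421: max |8-neighbourhood diff|
            for dj in (-1, 0, 1):
                if di or dj:
                    shifted = np.roll(np.roll(t, di, 0), dj, 1)
                    d = np.maximum(d, np.abs(shifted - frames[k]))
        n = np.arctan2(d, t)                       # A422: normalized deviation
        for x in range(1, X + 1):                  # A423-A424: weighted centroids
            m = ring == x                          # rings beyond X are discarded
            w = t[m]
            feats.append((w * n[m]).sum() / max(w.sum(), 1e-12))
    feats = np.asarray(feats)
    fn = (feats - feats.mean()) / feats.std()      # A425: mean/std normalization
    return (fn > np.median(fn)).astype(np.uint8)   # median binarization -> F_2d2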

Step A50: generating a master share and a slave share, and applying a vector F2d1And F2d2Rearranging to a40 x 40 two-dimensional matrix, i.e. generating a master share M2d1And M2d2To M2d1And M2d2Respectively associated with a40 x 40 sized binary watermark W containing copyright information2dPerforming XOR operation according to bit to generate slave sharing O2d1And O2d2

Step A60: will share O from2d1And O2d2Stored in an authentication database for use in copyright authentication.
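The share generation of steps A50-A60 reduces to a reshape and a bitwise XOR; a minimal sketch, assuming NumPy, is given below (the function name make_shares is ours).

# Minimal sketch of steps A50-A60 (master/slave share generation),
# assuming NumPy; the 40 x 40 share size follows this embodiment.
import numpy as np

def make_shares(feature_bits, watermark, size=40):
    """feature_bits: flat 0/1 vector of size*size bits;
    watermark: (size, size) binary array with the copyright information."""
    master = feature_bits.reshape(size, size).astype(np.uint8)
    slave = np.bitwise_xor(master, watermark.astype(np.uint8))
    return master, slave    # the slave share goes to the authentication database

Because XOR is its own inverse, an unchanged master share recovered from a queried video reproduces the watermark exactly: W' = M' ⊕ O.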

Depth map section:

Step A10: sampling a fixed number N of depth map frames from the depth map sequence, where the value of N is 100 in this embodiment and equal-interval sampling is adopted.

Step A20: preprocessing the depth maps, setting the size of the depth maps to a fixed value of P × P pixels, and denoting the processed video frames as F_normdepth; the value of P in this embodiment is 320. A10 and A20 correspond to the preprocessing processes in fig. 3(a) and (b).

Step A33: computing TIRI1 of F_normdepth according to equations (1) and (2), denoted TIRI_depth1, where F_k denotes the k-th depth map, W_k denotes the weight of the k-th depth map, L denotes the number of sampled frames combined into each TIRI when downsampling the depth map sequence, i denotes the serial number of the TIRI of the depth maps, and 0 ≤ a ≤ 1. In this embodiment the value of L is 4, so 100/4 = 25 frames of TIRI_depth1 are generated, and a is set to 1 as in step A31. A33 corresponds to the averaging process in fig. 3(a).

Step A43: generating feature vector one, F_depth1, of TIRI_depth1 by two-dimensional DCT transform and extraction of low-frequency coefficients.

As shown in fig. 3(a), step A43 specifically includes the following steps:

Step A431: performing a two-dimensional discrete cosine transform (2D-DCT) on each frame of TIRI_depth1 to obtain DCT_depth.

Step A432: selecting the low-frequency coefficients Coeff_depth of DCT_depth as follows:

Coeff_depth(i−1, j−1) = DCT_depth(i,j)    (10)

where 2 ≤ i ≤ 9 and 2 ≤ j ≤ 9.

Step A433: binarizing Coeff_depth according to its median value to ensure maximum distinguishability of the binary features, yielding f1:

f1(i,j) = 1 if Coeff_depth(i,j) > t, f1(i,j) = 0 if Coeff_depth(i,j) ≤ t

where 1 ≤ i ≤ 8, 1 ≤ j ≤ 8, and t is the median value of Coeff_depth. Concatenating the f1 of the 25 TIRI_depth1 gives the final extracted feature vector F_depth1. In this embodiment, the dimension of the feature vector is 25 × 64 = 1600 bits.

This feature extraction method for the depth maps extracts the low-frequency coefficients of the DCT domain, which enhances robustness against additive noise attacks and low-pass filtering attacks, and the DC value is removed to enhance the distinguishability of the features.
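Steps A431-A433 can be sketched compactly with SciPy's 2-D DCT; the helper below is an illustration under the embodiment's parameters (8 × 8 low-frequency block with the DC coefficient excluded), and the function name fdepth1_feature is ours.

# Minimal sketch of steps A431-A433 (depth map DCT feature), assuming
# SciPy and NumPy; the 8 x 8 low-frequency block excluding DC follows
# this embodiment (equation (10)).
import numpy as np
from scipy.fft import dctn

def fdepth1_feature(tiris_depth):
    """tiris_depth: (T, P, P) array of depth map TIRIs; returns T*64 bits."""
    bits = []
    for tiri in tiris_depth:
        coeff = dctn(tiri, norm='ortho')[1:9, 1:9]   # rows/cols 2..9: DC excluded
        bits.append((coeff > np.median(coeff)).astype(np.uint8).ravel())
    return np.concatenate(bits)                      # 25 TIRIs x 64 = 1600 bits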

Step A34: solving for F according to equations (1) and (2)normdepthTIRI2, and TIRIdepth2Is shown in the specification, wherein FkRepresenting the kth frame depth map, WkRepresenting weight of the kth frame depth map, L representing a depth map sequenceThe number of sampling frames for down-sampling, i represents the serial number of TIRI of the depth map, and a is greater than or equal to 0 and less than or equal to 1. In the present embodiment, one frame is taken every 5 frames in an interval sampling manner, that is, L has a value of 100/5-20. And a is set to 1 in the same manner as step a 31. A34 corresponds to the averaging process in fig. 3 (b).

Step A44: TIRI generation using normalized TIRI-based biasdepth2Feature vector of (2) two Fdepth2. Construction FnormdepthThe TIRI2 of (1) corresponds to the averaging process in FIG. 3 (b).

As shown in FIG. 3(b), the step A44 is the same as the step A421-A425.

Step A50: generating a master share and a slave share, and applying a vector Fdepth1And Fdepth2Rearranging to a40 x 40 two-dimensional matrix, i.e. generating a master share Mdepth1And Mdepth2To Mdepth1And Mdepth2Respectively associated with a40 x 40 sized binary watermark W containing copyright informationdepthPerforming XOR operation according to bit to generate slave sharing Odepth1And Odepth2

Step A60: will share O fromdepth1And Odepth2Stored in an authentication database for use in copyright authentication.

In fig. 4, (a) is an original two-dimensional video frame, (b) is a depth map, and (c) is a binarized watermark image.

The watermark recovery stage comprises a two-dimensional video part and/or a depth video part;

as shown in fig. 5 (a), the two-dimensional video part includes the steps of:

step B11: processing suspicious two-dimensional video frames shared on the network according to the part of the two-dimensional video frames in the steps A10-A50 to generate a binary feature vector F based on TIRI of the two-dimensional video frames2d1' and F2d2', then generates the corresponding master share M2d1' and M2d2’。

Step B21: master sharing M of suspicious depth maps2d1' and M2d2' sharing O with slaves in an authentication database, respectively2d1And O2d2Bitwise XOR (XO)R) operation respectively resulting in a recovered watermark W2d1' and W2d2’。

Step B31: by watermarking W to be recovered according to equation (23)2d1' and W2d2' with original watermark W2dComparing to calculate the BER to obtain the BER2d1And BER2d2To identify the authenticity and copyright of the queried two-dimensional video frame.

Figure BDA0002312059410000141

Where W' (i, j) and W (i, j) denote the pixels of the recovered watermark and the original watermark, respectively, ⊕ denotes an exclusive OR (XOR) operation, m denotesw×mwIs the size of the watermark.

For BER2d1And BER2d2The final BER is obtained by the method of minimum value in the formula (24) and characteristic fusion2d

BER2d=min(BER2d_r,BER2d_nr) (24)

Through experimental tests on 200 three-dimensional videos, we take 0.243 as BER2dWhen the threshold value is determined as BER2dA value of less than 0.243 we consider the queried image to be an illegally copied or tampered image. When BER2dA value of greater than or equal to 0.243 we consider the queried image to be not an illegally copied or tampered image. The queried two-dimensional video frame of the embodiment is an image with additive noise attack on the original image, and the restored watermark is shown in fig. 6(a), and the BER thereof2dThe experimental result is 0.057, and the copyright identification result judges that the inquired two-dimensional video frame is an illegally copied or tampered image.
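The recovery-and-decision logic of steps B21-B31 and the minimum-value fusion of equations (23)-(24) can be summarized in a few lines; the sketch below assumes NumPy, the function name identify is ours, and the 0.243 threshold follows this embodiment.

# Minimal sketch of steps B21-B31 with minimum-value fusion, assuming
# NumPy; the decision threshold (0.243 for 2D frames) follows this
# embodiment.
import numpy as np

def identify(masters, slaves, watermark, threshold=0.243):
    """masters: two (40, 40) master shares of the suspicious video;
    slaves: the matching stored slave shares; watermark: original W."""
    bers = []
    for m, o in zip(masters, slaves):
        w_rec = np.bitwise_xor(m, o)                          # recovered watermark
        bers.append(np.bitwise_xor(w_rec, watermark).mean())  # equation (23)
    final_ber = min(bers)                                     # equation (24)
    return final_ber, bool(final_ber < threshold)             # True: copy detected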

As shown in fig. 5 (b), the depth video part includes the steps of:

step B12: processing the suspicious depth map shared on the network according to the part aiming at the depth map in the steps A10-A50 to generate a binary feature vector F of TIRI based on the depth mapdepth1' and Fdepth2', then generates the corresponding master share Mdepth1' and Mdepth2’。

Step B22: master sharing M of suspicious depth mapsdepth1' and Mdepth2' sharing O with slaves in an authentication database, respectivelydepth1And Odepth2Respectively obtaining restored watermarks W by carrying out exclusive OR (XOR) operation according to bitsdepth1' and Wdepth2’。

Step B32: watermark W to be recovered by equation (23)depth1' and Wdepth2' with original watermark WdepthComparing to calculate the BER to obtain the BERdepth1And BERdepth2To identify the authenticity and copyright of the queried two-dimensional video frame.

For BERdepth1And BERdepth2And (3) fusing the characteristics by the characteristic method taking the minimum value as shown in (24) to obtain the final BERdepth

Through experimental tests on 200 three-dimensional videos, we take 0.19 as BERdepthWhen the threshold value is determined as BERdepthA value of less than 0.19 we consider the queried image to be an illegally copied or tampered image. When BERdepthA value of 0.19 or more we consider the image under query not to be an illegally copied or tampered image. The queried depth map of the embodiment is an image obtained by adding additive noise attack to the original depth map, and the restored watermark is shown in fig. 6(b), and the BER of the restored watermark isdepthThe experimental result is 0.038, and the copyright identification result judges that the queried depth map is an illegally copied or tampered image.

When authenticating the copyright of DIBR three-dimensional video, a flexible authentication mechanism is applied for the first time to fully meet the requirements of DRM (digital rights management). On the one hand, when the copyright information of the two-dimensional video differs from that of the depth video, the two-dimensional video and the depth video undergo separate copyright identification processes. On the other hand, when the copyright information of the two-dimensional video is the same as that of the depth video, only the two-dimensional video is used for copyright identification, and its result is taken as the final copyright identification result of the three-dimensional video, because the two-dimensional video contains more texture information and is therefore more distinguishable; moreover, under the fusion scheme for the two-dimensional video frames, the two-dimensional video frames are sufficiently robust to various attacks such as DIBR, translation, rotation, additive noise, and filtering.

In order to protect the two-dimensional video frames and the depth maps simultaneously and independently, this embodiment adopts different feature extraction methods for the two-dimensional video frames and the depth maps. For the two-dimensional video frames, a combined dual-tree wavelet-quaternion method is adopted, taking the horizontal and diagonal sub-domains of the low-frequency transform domain of the dual-tree wavelet, which contain fewer vertical edges, so that the extracted feature vector one, F_2d1, resists signal attacks and DIBR attacks; feature vector two, F_2d2, obtained by the normalized TIRI-based deviation method, resists geometric attacks such as rotation and shearing. The slave shares O_2d1 and O_2d2 of the two-dimensional video frames are then generated and stored in the slave-share database. In the watermark recovery stage for the two-dimensional video frames, the master shares M_2d1' and M_2d2' of the queried two-dimensional video frames are computed and XORed with the stored slave shares O_2d1 and O_2d2 respectively to obtain the recovered watermarks W_2d1' and W_2d2', which are compared with the original watermark image, and the final copyright identification result is determined by minimum-value fusion. For the depth maps, feature vector one, F_depth1, generated by two-dimensional DCT transform of the depth maps and extraction of low-frequency coefficients, resists signal attacks; feature vector two, F_depth2, obtained by the normalized TIRI-based deviation method, resists geometric attacks such as rotation and shearing. The slave shares O_depth1 and O_depth2 of the depth maps are then generated. In the depth map watermark recovery stage, the master shares M_depth1' and M_depth2' of the queried depth maps of the three-dimensional video are computed and XORed with the stored slave shares O_depth1 and O_depth2 respectively to obtain the recovered watermarks W_depth1' and W_depth2', which are compared with the original watermark image to obtain the BERs, and the final copyright identification result is determined by the minimum-value fusion method; this decision fusion ensures that the depth maps are robust to both signal attacks and geometric attacks.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
