CNN-based self-adaptive audio steganography method and secret information extraction method

Document No.: 1230224 · Publication date: 2020-09-08

Reading note: This technology, "CNN-based self-adaptive audio steganography method and secret information extraction method", was designed and created by 王让定 (Wang Rangding), 王杰 (Wang Jie), 严迪群 (Yan Diqun), and 董理 (Dong Li) on 2020-04-17. Main content: A CNN-based adaptive audio steganography method and a secret information extraction method are disclosed. Audio signals in a training set and secret information are dimension-superposed and input into a coding network to obtain a sampling-point modification vector; the modification vector is added to the audio signal to obtain steganographic audio, and the steganographic audio is input into a decoding network to obtain the decoded secret information. The parameters of the decoding network and the coding network are then updated according to their respective loss functions, and all training samples are used in turn in the same way until the trained coding network and decoding network are obtained. After training, the secret information to be hidden and the audio signal are processed as above and input into the trained coding network to obtain the steganographic audio; inputting the steganographic audio into the trained decoding network yields the extracted secret information. The method produces only weak perturbations, ensuring the quality of the audio after steganography.

1. A CNN-based adaptive audio steganography method and secret information extraction method, characterized in that the method comprises the following steps:

step 1, constructing a training set and a testing set: cutting the audio signals in the audio sample library into audio segments with consistent lengths, wherein each audio segment is provided with m sampling values; selecting part of audio segments from the audio sample library to construct a training set, and constructing the rest audio segments into a test set; m is a positive integer;

step 2, randomly selecting n audio segments from the training set to form a training sample, and preprocessing the training sample to obtain an m × n audio signal x; n is a positive integer; the preprocessing comprises the following specific steps:

step 2-1, performing normalization processing on m sampling values of each audio segment in the training sample to enable the m sampling values in each audio segment to be normalized to the range of [ -1,1 ];

step 2-2, processing the m normalized sampling values in each audio segment with a high-order filter to obtain high-order-filtered sampling values, and assembling them into the m × n audio signal x;

step 3, randomly generating n secret information bit streams with the length of m, and constructing the secret information bit streams into secret information y of m x n;

step 4, performing dimension superposition on the audio signal x of m × n in the step 2 and the secret information y of m × n in the step 3 to obtain a matrix of m × 2 n;

step 5, taking the matrix of m × 2n as input data, and inputting the input data into an initialized coding network to obtain a sampling point modification vector E of m × n; wherein, the coding network is a convolutional neural network;

step 6, obtaining steganographic audio s according to the m x n sampling point modification vector E obtained in the step 5 and the m x n audio signal x in the step 2; the calculation formula is as follows:

s=x+E*α;

wherein α is a preset coefficient;

step 7, inputting the steganographic audio s into an initialized decoding network to obtain the secret information y' of m × n after decoding; wherein, the decoding network is a convolutional neural network;

step 8, calculating the loss function L_D of the decoding network and updating the parameters in the decoding network according to L_D; the loss function L_D of the decoding network is calculated as:

L_D = −(1/(m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} [ y_ij·log(y'_ij) + (1−y_ij)·log(1−y'_ij) ]

wherein y_ij is the value in the ith row and jth column of the secret information y, and y'_ij is the value in the ith row and jth column of the decoded secret information y';

step 9, calculating the loss function L_E of the coding network and updating the parameters in the coding network according to L_E; the loss function L_E of the coding network is calculated as:

L_E = β · Σ_{i=1}^{m} Σ_{j=1}^{n} |E_ij| / ( |x_j ⊛ f|_i · |x_ij| + σ ) + γ·L_D

wherein the double sum is the adaptive loss L_e; β and γ are empirical coefficients controlling the weights of the adaptive loss L_e and the decoding loss L_D respectively; x_j is the m × 1 matrix formed by all rows of the jth column of the audio signal x; ⊛ denotes convolution; f is a high-order filter of arbitrary order; |x_j ⊛ f|_i denotes taking the absolute value of each entry of the m × 1 matrix obtained from the convolution and selecting the entry in the ith row; x_ij is the value in the ith row and jth column of the audio signal x; σ is a preset constant; E_ij is the value in the ith row and jth column of the sampling-point modification vector E;

step 10, constructing further training samples from the training set and training the updated coding network and decoding network with each training sample in turn, using the same method as steps 2 to 9, until all training samples in the training set have been used to train the coding network and the decoding network for the preset number of training rounds, obtaining the trained coding network and the trained decoding network;

step 11, converting the secret information to be sent into a binary bit stream, constructing the binary bit stream into secret information y 'of m x k, randomly selecting k audio segments in a test set, and constructing the k audio segments into a test sample x' of m x k; finally, after the test sample x ' and the secret information y ' are processed according to the steps 2-4, the test sample x ' and the secret information y ' are input into a trained coding network, and steganographic audio s ' is obtained according to the calculation in the step 6, wherein k is a positive integer;

and step 12, inputting the steganographic audio s 'obtained in the step 11 into a decoding network after training is completed, namely obtaining the secret information extracted from the steganographic audio s'.

2. The CNN-based adaptive audio steganography method and secret information extraction method of claim 1, wherein: the coding network in the step 5 comprises N convolutional block layers, 1 convolutional layer and a first activation function which are sequentially connected, each convolutional block layer comprises a convolutional layer, a normalization layer and a second activation function which are sequentially connected, and N is a positive integer.

3. The CNN-based adaptive audio steganography method and secret information extraction method of claim 2, wherein: N = 7.

4. The CNN-based adaptive audio steganography method and secret information extraction method of claim 2, wherein: the first activation function adopts a TanH activation function; the second activation function is a ReLU activation function.

5. The CNN-based adaptive audio steganography method and secret information extraction method of claim 1, wherein: the decoding network in step 7 includes M convolutional block layers and 1 convolutional layer which are connected in sequence, each convolutional block layer includes a convolutional layer and a third activation function which are connected in sequence, and M is a positive integer.

6. The CNN-based adaptive audio steganography method and secret information extraction method of claim 5, wherein: M = 7.

7. The CNN-based adaptive audio steganography method and secret information extraction method of claim 5, wherein: the third activation function adopts a ReLU activation function.

Technical Field

The invention relates to the field of audio encryption, in particular to a CNN-based self-adaptive audio steganography method and a secret information extraction method.

Background

Digital steganography, widely used in the field of information security, is a technique for invisibly hiding secret information in a digital carrier (e.g., text, voice, image) and then transmitting it to a recipient over an open channel. Steganography focuses on invisibility; its advantage is that hidden secret information does not easily attract the attention of third parties. Traditional steganography hides secret information in perceptually redundant parts of the digital carrier. More advanced adaptive steganography achieves highly concealed steganography by designing a distortion cost function and pairing it with convolutional coding.

The latest trend in steganography is to incorporate deep learning techniques to achieve automatic steganography. Hayes, in "Generating steganographic images via adversarial training", Advances in Neural Information Processing Systems, pp. 1954–1963, 2017, first proposed constructing an end-to-end steganographic framework under the GAN paradigm, with an encoding network, a decoding network, and a steganalysis network as components. The encoding network takes the carrier image and the secret information as input and outputs the steganographic image. Upon receiving the steganographic image, the decoding network extracts the secret information from it. The steganalysis network, acting as the adversary, improves the concealment of the steganography. Zhu, in "HiDDeN: Hiding data with deep networks", In Proceedings of the European Conference on Computer Vision, pp. 657–672, 2018, greatly improved the robustness of steganography by adding a noise simulation layer, enabling resistance to attacks such as JPEG compression and noise pollution. Zhang, in "SteganoGAN: High capacity image steganography with GANs", arXiv preprint arXiv:1901.03892, 2019, further modified the neural network structure, adding a DenseNet module to achieve a larger steganographic capacity than the former two methods.

The prior art is applied to the field of digital images. Considering the two-dimensional nature of secret information and images, Hayes and Zhu use fully connected layers in their encoding networks; however, because audio has a one-dimensional time-series structure, fully connected layers would require a large amount of memory, so these methods are not suitable for audio carriers. Secondly, to improve the concealment of steganography, all three works add a steganalysis network that is trained adversarially against the encoding network; but the steganalysis networks used in these frameworks are weak and cannot effectively guide the training of the encoding network. As a result, the steganographic audio generated by these methods exhibits strong perturbations, cannot guarantee steganographic audio quality, and is easily perceived by the human ear.

Disclosure of Invention

The technical problem to be solved by the invention, in view of the current state of the prior art, is to provide a CNN-based adaptive audio steganography method and a secret information extraction method that reduce the perturbation produced by steganography while still guaranteeing secret information extraction, ensure the quality of the audio after steganography, and improve its imperceptibility.

The technical scheme adopted by the invention to solve the above technical problem is as follows: a CNN-based adaptive audio steganography method and secret information extraction method, characterized in that the method comprises the following steps:

step 1, constructing a training set and a testing set: cutting the audio signals in the audio sample library into audio segments with consistent lengths, wherein each audio segment is provided with m sampling values; selecting part of audio segments from the audio sample library to construct a training set, and constructing the rest audio segments into a test set; m is a positive integer;

step 2, randomly selecting n audio segments from the training set to form a training sample, and preprocessing the training sample to obtain an m × n audio signal x; n is a positive integer; the preprocessing comprises the following specific steps:

step 2-1, performing normalization processing on m sampling values of each audio segment in the training sample to enable the m sampling values in each audio segment to be normalized to the range of [ -1,1 ];

step 2-2, processing the m normalized sampling values in each audio segment with a high-order filter to obtain high-order-filtered sampling values, and assembling them into the m × n audio signal x;

step 3, randomly generating n secret information bit streams with the length of m, and constructing the secret information bit streams into secret information y of m x n;

step 4, performing dimension superposition on the audio signal x of m × n in the step 2 and the secret information y of m × n in the step 3 to obtain a matrix of m × 2 n;

step 5, taking the matrix of m × 2n as input data, and inputting the input data into an initialized coding network to obtain a sampling point modification vector E of m × n; wherein, the coding network is a convolutional neural network;

step 6, obtaining steganographic audio s according to the m x n sampling point modification vector E obtained in the step 5 and the m x n audio signal x in the step 2; the calculation formula is as follows:

s=x+E*α;

wherein α is a preset coefficient;

step 7, inputting the steganographic audio s into an initialized decoding network to obtain the secret information y' of m × n after decoding; wherein, the decoding network is a convolutional neural network;

step 8, calculating the loss function L_D of the decoding network and updating the parameters in the decoding network according to L_D; the loss function L_D of the decoding network is calculated as:

L_D = −(1/(m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} [ y_ij·log(y'_ij) + (1−y_ij)·log(1−y'_ij) ]

wherein y_ij is the value in the ith row and jth column of the secret information y, and y'_ij is the value in the ith row and jth column of the decoded secret information y';

step 9, calculating the loss function L_E of the coding network and updating the parameters in the coding network according to L_E; the loss function L_E of the coding network is calculated as:

L_E = β · Σ_{i=1}^{m} Σ_{j=1}^{n} |E_ij| / ( |x_j ⊛ f|_i · |x_ij| + σ ) + γ·L_D

wherein the double sum is the adaptive loss L_e; β and γ are empirical coefficients controlling the weights of the adaptive loss L_e and the decoding loss L_D respectively; x_j is the m × 1 matrix formed by all rows of the jth column of the audio signal x; ⊛ denotes convolution; f is a high-order filter of arbitrary order; |x_j ⊛ f|_i denotes taking the absolute value of each entry of the m × 1 matrix obtained from the convolution and selecting the entry in the ith row; x_ij is the value in the ith row and jth column of the audio signal x; σ is a preset constant; E_ij is the value in the ith row and jth column of the sampling-point modification vector E;

step 10, constructing further training samples from the training set and training the updated coding network and decoding network with each training sample in turn, using the same method as steps 2 to 9, until all training samples in the training set have been used to train the coding network and the decoding network for the preset number of training rounds, obtaining the trained coding network and the trained decoding network;

step 11, converting the secret information to be sent into a binary bit stream, constructing the binary bit stream into secret information y 'of m x k, randomly selecting k audio segments in a test set, and constructing the k audio segments into a test sample x' of m x k; finally, after the test sample x ' and the secret information y ' are processed according to the steps 2-4, the test sample x ' and the secret information y ' are input into a trained coding network, and steganographic audio s ' is obtained according to the calculation in the step 6, wherein k is a positive integer;

and step 12, inputting the steganographic audio s 'obtained in the step 11 into a decoding network after training is completed, namely obtaining the secret information extracted from the steganographic audio s'.

As an improvement, the coding network in step 5 includes N convolutional block layers, 1 convolutional layer, and a first activation function, which are connected in sequence, where each convolutional block layer includes a convolutional layer, a normalization layer, and a second activation function, which are connected in sequence, and N is a positive integer.

Preferably, N is 7.

In the scheme, the first activation function adopts a TanH activation function; the second activation function is a ReLU activation function.

Further, the decoding network in step 7 includes M convolutional block layers and 1 convolutional layer which are sequentially connected, where each convolutional block layer includes a convolutional layer and a third activation function which are sequentially connected, and M is a positive integer.

Preferably, M is 7.

In this scheme, the third activation function is a ReLU activation function.

Compared with the prior art, the invention has the following advantages: by optimizing an adaptive loss function during the training of the coding network, the secret information can be embedded in regions of the audio that are difficult to perceive, which ensures the quality of the audio after steganography and improves the imperceptibility of the steganographic audio; in addition, since a convolutional neural network is used for both encoding and decoding, the method can be used directly for audio steganography and secret information extraction once the network framework is trained, which improves the efficiency of audio steganography and secret information extraction and makes the method more convenient to use.

Drawings

Fig. 1 is a network framework diagram of an audio steganography method and a secret information extraction method in an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings and embodiments.

As shown in fig. 1, an adaptive audio steganography method and a secret information extraction method based on CNN includes the following steps:

step 1, constructing a training set and a test set: cutting the audio signals in the audio sample library into audio segments of equal length, each containing m sampling values; selecting part of the audio segments to construct the training set and using the remaining segments as the test set; m is a positive integer; the number of audio segments in the training set is much larger than in the test set, and the ratio between the two can follow the proportions commonly used when constructing training and test sets in existing deep learning practice;

step 2, randomly selecting n audio segments from the training set to form a training sample, and preprocessing the training sample to obtain an m × n audio signal x; n is a positive integer; the preprocessing comprises the following specific steps:

step 2-1, performing normalization processing on m sampling values of each audio segment in the training sample to enable the m sampling values in each audio segment to be normalized to the range of [ -1,1 ];

step 2-2, processing the m normalized sampling values in each audio segment with a high-order filter to obtain high-order-filtered sampling values, and assembling them into the m × n audio signal x;
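The preprocessing in steps 2-1 and 2-2 can be sketched in numpy as below; this is a minimal illustration in which the per-segment peak normalization and the specific third-order filter taps are assumptions, since the embodiment fixes neither the normalization scheme nor the filter coefficients.

```python
import numpy as np

def preprocess(segments):
    """Normalize each audio segment to [-1, 1], then high-order filter it.

    segments: array of shape (n, m) -- n audio segments, m samples each.
    Returns the m-by-n matrix x used in step 2 of the method.
    """
    # Step 2-1: peak normalization per segment (assumed normalization scheme).
    peak = np.abs(segments).max(axis=1, keepdims=True)
    normalized = segments / np.maximum(peak, 1e-12)

    # Step 2-2: high-order filtering; a third-order high-pass residual
    # filter is assumed here -- the text allows a filter of any order.
    f = np.array([-1.0, 3.0, -3.0, 1.0])
    filtered = np.stack([np.convolve(seg, f, mode="same") for seg in normalized])

    return filtered.T  # shape (m, n)
```

The transpose at the end puts sampling values along rows and segments along columns, matching the m × n layout of the audio signal x.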

step 3, randomly generating n secret information bit streams with the length of m, and constructing the secret information bit streams into secret information y of m x n;

step 4, performing dimension superposition on the audio signal x of m × n in the step 2 and the secret information y of m × n in the step 3 to obtain a matrix of m × 2n, where in this embodiment, the matrix of m × 2n may be [ x, y ], or [ y, x ];
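The dimension superposition of step 4 is plain column-wise concatenation; a minimal sketch covering both orderings mentioned in the embodiment:

```python
import numpy as np

def superpose(x, y, order="xy"):
    """Step 4: stack the m-by-n audio matrix x and the m-by-n secret
    matrix y along the column dimension into an m-by-2n encoder input.
    The embodiment allows either [x, y] or [y, x]."""
    assert x.shape == y.shape
    return np.hstack([x, y]) if order == "xy" else np.hstack([y, x])
```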

step 5, taking the matrix of m × 2n as input data, and inputting the input data into an initialized coding network to obtain a sampling point modification vector E of m × n; wherein, the coding network is a convolutional neural network;

the convolutional neural network corresponding to the coding network comprises N convolutional block layers, 1 convolutional layer and a first activation function which are sequentially connected, each convolutional block layer comprises a convolutional layer, a normalization layer and a second activation function which are sequentially connected, and N is a positive integer; wherein the convolutional layer is a convolutional layer in the existing convolutional neural network; in this embodiment, N determined through an experiment is 7, and the first activation function is a TanH activation function; the second activation function adopts a ReLU activation function;

step 6, obtaining steganographic audio s according to the m x n sampling point modification vector E obtained in the step 5 and the m x n audio signal x in the step 2; the calculation formula is as follows:

s=x+E*α;

wherein α is a preset coefficient; in the present embodiment, α = 200;
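The embedding of step 6 is a single element-wise operation; a sketch with the embodiment's α = 200 as the default:

```python
import numpy as np

def embed(x, E, alpha=200.0):
    """Step 6: form steganographic audio s = x + E * alpha.

    x: (m, n) carrier matrix from step 2; E: (m, n) sampling-point
    modification vector output by the coding network; alpha: preset
    scaling coefficient (200 in the described embodiment).
    """
    return x + E * alpha
```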

step 7, inputting the steganographic audio s into an initialized decoding network to obtain the secret information y' of m × n after decoding; wherein, the decoding network is a convolutional neural network;

the convolutional neural network corresponding to the decoding network comprises M convolutional block layers and 1 convolutional layer which are sequentially connected, each convolutional block layer comprises a convolutional layer and a third activation function which are sequentially connected, M is a positive integer, and the convolutional layers are convolutional layers in the conventional convolutional neural network; in the embodiment, the determined M is 7 through experiments, and the third activation function adopts a ReLU activation function;

step 8, calculating the loss function L_D of the decoding network and updating the parameters in the decoding network according to L_D;

for the decoding network, the only task is to extract the original secret information from the received steganographic audio; in this embodiment, a binary cross-entropy function is used as the loss function to measure the distortion when extracting the secret information, and the loss function L_D of the decoding network is calculated as:

L_D = −(1/(m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} [ y_ij·log(y'_ij) + (1−y_ij)·log(1−y'_ij) ]

wherein y_ij is the value in the ith row and jth column of the secret information y, and y'_ij is the value in the ith row and jth column of the decoded secret information y';
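A minimal numpy version of this binary cross-entropy loss; the 1/(m·n) averaging and the eps clipping that keeps log() finite are implementation assumptions:

```python
import numpy as np

def decoder_loss(y, y_pred, eps=1e-12):
    """Step 8: binary cross-entropy between the m-by-n secret bits y
    and the decoded values y'. Clipping to [eps, 1-eps] (an assumed
    detail) avoids log(0) when predictions saturate."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
```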

step 9, calculating the loss function L_E of the coding network and updating the parameters in the coding network according to L_E;

for the coding network, in this embodiment a DRF distortion cost function is introduced into the loss function to guide the training of the coding network; the DRF distortion cost is defined so that complex audio regions and regions where sampling points have large amplitude are allowed to carry larger modifications, and this part is computed as the adaptive loss L_e. In addition, the coding network must also take the extraction accuracy of the decoding network into account: if the coding network ignored the decoding stage, the generated sampling-point modification vector E would tend to become vanishingly small, and the decoding network could not extract the secret information accurately. Therefore, besides the adaptive loss L_e, the coding network also takes the decoding loss L_D as one of its optimization objectives;

accordingly, the loss function L_E of the coding network is calculated as:

L_E = β · Σ_{i=1}^{m} Σ_{j=1}^{n} |E_ij| / ( |x_j ⊛ f|_i · |x_ij| + σ ) + γ·L_D

wherein the double sum is the adaptive loss L_e; β and γ are empirical coefficients controlling the weights of the adaptive loss L_e and the decoding loss L_D respectively; x_j is the m × 1 matrix formed by all rows of the jth column of the audio signal x; ⊛ denotes convolution; f is a high-order filter of arbitrary order, corresponding to a q × 1 matrix where q is the filter order; |x_j ⊛ f|_i denotes taking the absolute value of each entry of the m × 1 matrix obtained from the convolution and selecting the entry in the ith row; x_ij is the value in the ith row and jth column of the audio signal x; σ is a preset constant; E_ij is the value in the ith row and jth column of the sampling-point modification vector E; in this embodiment, which focuses on imperceptibility, β = 1 and γ = 0.00001; different values can be selected according to different requirements;

step 10, constructing further training samples from the training set and training the updated coding network and decoding network with each training sample in turn, using the same method as steps 2 to 9, until all training samples in the training set have been used to train the coding network and the decoding network for the preset number of training rounds, obtaining the trained coding network and the trained decoding network;

step 11, converting the secret information to be sent into a binary bit stream, constructing the binary bit stream into secret information y 'of m x k, randomly selecting k audio segments in a test set, and constructing the k audio segments into a test sample x' of m x k; finally, after the test sample x ' and the secret information y ' are processed according to the steps 2-4, the test sample x ' and the secret information y ' are input into a trained coding network, and steganographic audio s ' is obtained according to the calculation in the step 6, wherein k is a positive integer; in this embodiment, during testing, k may be 1, and certainly, may also be a positive integer greater than 1;

and step 12, inputting the steganographic audio s 'obtained in the step 11 into a decoding network after training is completed, namely obtaining the secret information extracted from the steganographic audio s'.

In use, the steganographer first converts the secret information to be sent into binary bits, processes an audio sample together with the secret information using the coding network to construct the steganographic audio, and sends it to the receiver over an open channel. The receiver, who holds the corresponding decoding network, can extract the secret information from the received steganographic audio. Therefore, once the network framework has been trained with deep learning techniques, the method can be used directly for audio steganography and secret information extraction, improving the efficiency of information hiding. Meanwhile, by optimizing the adaptive loss function toward lower perturbation, it generates steganographic audio of higher quality than existing deep-learning-based methods; and by generating an audio sampling-point modification vector, it greatly reduces the perturbation produced by steganography and improves hiding performance.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the technical principles of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.
