Audio steganalysis method based on convolutional neural network and domain adversarial learning

Document No.: 1058569    Publication date: 2020-10-13

Reading note: this technique, "Audio steganalysis method based on a convolutional neural network and domain adversarial learning" (基于卷积神经网络和领域对抗学习的音频隐写分析方法), was designed and created by 王让定, 林昱臻, 严迪群, and 董理 on 2020-05-15. Its main content is as follows: the invention relates to an audio steganalysis method based on a convolutional neural network and domain adversarial learning, whose network framework comprises a feature extraction sub-network, a steganalysis sub-network, and a carrier source discrimination sub-network, where θ_f, θ_y, and θ_d respectively denote the network parameters of each sub-network. The method can effectively alleviate the performance degradation of audio steganalysis models caused by the carrier source mismatch problem, and provides a feasible approach for applying audio steganalysis technology in complex internet big-data forensics scenarios.

1. An audio steganalysis method based on a convolutional neural network and domain adversarial learning, characterized in that: the network framework corresponding to the method comprises a feature extraction sub-network, a steganalysis sub-network, and a carrier source discrimination sub-network, wherein θ_f, θ_y, θ_d respectively represent the network parameters of the respective sub-networks; the method comprises:

S1, inputting source domain data D_s, target domain data D_t, an adversarial training factor λ, and a learning rate η;

S2, outputting a steganalysis feature vector F through the feature extraction sub-network;

S3, passing the steganalysis feature vector F through the steganalysis sub-network to obtain the binary steganographic prediction probability, computing the cross-entropy loss l_y between this probability and the original steganographic label y, and updating the network parameter θ_y accordingly through back-propagation of the error and gradient descent, wherein y ∈ {0, 1}: the value 0 represents an original carrier and the value 1 represents a steganographic carrier;

S4, passing the steganalysis feature vector F through the carrier source discrimination sub-network to obtain the carrier source prediction probability, computing the cross-entropy loss l_d between this probability and the domain label d, and updating the network parameter θ_d accordingly by back-propagating the error, wherein d ∈ {0, 1}: the value 0 represents the source domain and the value 1 represents the target domain.

2. The audio steganalysis method based on a convolutional neural network and domain adversarial learning according to claim 1, characterized in that:

the feature extraction sub-network in S2 comprises an audio preprocessing layer followed by 4 concatenated convolution groups, namely a 1st convolution group, a 2nd convolution group, a 3rd convolution group, and a 4th convolution group.

3. The audio steganalysis method based on a convolutional neural network and domain adversarial learning according to claim 2, characterized in that:

the audio preprocessing layer consists of 4 convolution kernels of size 1×5, D1-D4, whose initial weights are respectively:

D1=[1,-1,0,0,0], D2=[1,-2,1,0,0], D3=[1,-3,3,-1,0], D4=[1,-4,6,-4,1];

the 1 st convolution group includes a 1 × 1 first convolution layer, a 1 × 5 second convolution layer, and a 1 × 1 third convolution layer;

the 2nd convolution group, the 3rd convolution group, and the 4th convolution group each comprise a 1×5 convolution layer, a 1×1 convolution layer, and a mean pooling layer, wherein the mean pooling layer of the 4th convolution group is a global mean pooling layer;

the steganalysis feature vector is a 256-dimensional vector.

4. The audio steganalysis method based on a convolutional neural network and domain adversarial learning according to claim 2, characterized in that:

the audio preprocessing layer adopts a differential filtering design.

5. The audio steganalysis method based on a convolutional neural network and domain adversarial learning according to claim 1, characterized in that:

the steganalysis sub-network comprises fully-connected layers and a steganographic label prediction layer, wherein the fully-connected part is a cascade of two layers containing 128 and 64 neurons respectively.

6. The audio steganalysis method based on a convolutional neural network and domain adversarial learning according to claim 1, characterized in that:

the carrier source discrimination sub-network comprises a gradient reversal layer, a domain discrimination layer, and a domain label prediction layer, wherein the gradient reversal layer acts as an identity mapping on its input in the forward propagation stage and reverses the error gradient in the error back-propagation stage, expressed respectively as

Forward: F(x) = x

Backward: ∂F/∂x = −λI

wherein F(x) represents the equivalent function of the gradient reversal layer, I is the identity matrix, and λ is the adversarial training factor.

7. The audio steganalysis method based on a convolutional neural network and domain adversarial learning according to claim 1, characterized in that: the network parameter θ_y updated in S3 and the network parameter θ_d updated in S4 are optimized through the following formula,

E(θ_f, θ_y, θ_d) = (1/n) Σ_{i=1..n} l_y^(i) − λ [ (1/n) Σ_{i=1..n} l_d^(i) + (1/m) Σ_{j=n+1..n+m} l_d^(j) ]

(θ_f*, θ_y*) = arg min_{θ_f, θ_y} E(θ_f, θ_y, θ_d*),    θ_d* = arg max_{θ_d} E(θ_f*, θ_y*, θ_d)

wherein θ_f*, θ_y*, θ_d* respectively represent the network parameters determined by each sub-network, n is the number of source domain training samples, and m is the number of target domain training samples.

Technical Field

The invention relates to the technical field of audio steganography, and in particular to an audio steganalysis method based on a convolutional neural network and domain adversarial learning.

Background

At present, audio steganalysis models based on deep learning achieve high detection performance under laboratory conditions. However, in real network big-data forensics environments, audio carrier data is diverse and heterogeneous; if a steganalysis model trained under laboratory conditions is used directly for detection, its accuracy drops substantially.

The Carrier Source Mismatch (CSM) problem in audio steganalysis is caused by differences between the sources of the training-set and test-set audio data (e.g., recording equipment, speaker gender, language, etc.). CSM is essentially a Domain Adaptation problem in transfer learning, which can be defined as follows: given a labeled source domain D_s and an unlabeled target domain D_t, assume that their feature spaces, class spaces, and conditional probability distributions are the same, but that the marginal distributions of the two domains differ; the goal of domain-adaptive learning is then to use the labeled data of D_s to learn a classifier f: x_t → y_t that predicts the labels of the target domain D_t such that the prediction-error risk is minimized.

However, no solution has yet been specially designed for the CSM problem in audio steganalysis.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide an audio steganalysis method based on a convolutional neural network and domain adversarial learning, which can effectively alleviate the performance degradation of audio steganalysis models caused by the CSM phenomenon, and improve the feasibility of applying audio steganalysis technology in complex internet big-data forensics scenarios.

In order to achieve the above purpose, the technical scheme of the invention is as follows: an audio steganalysis method based on a convolutional neural network and domain adversarial learning, wherein the network framework corresponding to the method comprises a feature extraction sub-network, a steganalysis sub-network, and a carrier source discrimination sub-network, and θ_f, θ_y, θ_d respectively represent the network parameters of the respective sub-networks; the method comprises,

S1, inputting source domain data D_s, target domain data D_t, an adversarial training factor λ, and a learning rate η;

s2, outputting a steganalysis feature vector F through a feature extraction sub-network;

s3, outputting the steganalysis feature vector F through the steganalysis subnetwork to obtain the binary steganalysis prediction probability

Figure BDA0002494661430000025

Computing binary steganographic prediction probabilitiesCross entropy loss l from the original steganographic label yyAnd accordingly updating the network parameter theta through a back propagation error and gradient descent algorithmyWherein y ∈ {0,1}, when y takes the value 0, it represents the original carrier and is takenThe value 1 represents a steganographic carrier;

s4, outputting the steganalysis feature vector F through the carrier source discrimination sub-network to obtain the carrier source prediction probability valueCalculating the predicted probability value of the vector sourceCross entropy loss l from the original steganographic label ddAnd accordingly updating the network parameter theta by back-propagating the errordWhere d ∈ {0,1}, when d takes a value of 0, it represents the source domain and when d takes a value of 1, it represents the target domain.

Further, the feature extraction sub-network in S2 includes an audio preprocessing layer and 4 concatenated convolution groups after the audio preprocessing layer, that is, a 1 st convolution group, a 2 nd convolution group, a 3 rd convolution group, and a 4 th convolution group.

Further, the audio preprocessing layer is composed of 4 convolution kernels of size 1×5, D1-D4, with initial weights respectively:

D1=[1,-1,0,0,0], D2=[1,-2,1,0,0], D3=[1,-3,3,-1,0], D4=[1,-4,6,-4,1];

the 1 st convolution group includes a 1 × 1 first convolution layer, a 1 × 5 second convolution layer, and a 1 × 1 third convolution layer;

the 2nd convolution group, the 3rd convolution group, and the 4th convolution group each comprise a 1×5 convolution layer, a 1×1 convolution layer, and a mean pooling layer, wherein the mean pooling layer of the 4th convolution group is a global mean pooling layer;

the steganalysis feature vector is a 256-dimensional vector.

Furthermore, the audio preprocessing layer adopts a differential filtering design.

Further, the steganalysis sub-network comprises fully-connected layers and a steganographic label prediction layer, wherein the fully-connected part is a cascade of two layers containing 128 and 64 neurons respectively.

Further, the carrier source discrimination sub-network comprises a gradient reversal layer, a domain discrimination layer, and a domain label prediction layer, wherein the gradient reversal layer acts as an identity mapping on its input in the forward propagation stage and reverses the error gradient in the error back-propagation stage, expressed respectively as

Forward: F(x) = x

Backward: ∂F/∂x = −λI

wherein F(x) represents the equivalent function of the gradient reversal layer, I is the identity matrix, and λ is the adversarial training factor.

Further, the network parameter θ_y updated in S3 and the network parameter θ_d updated in S4 are optimized through the following formula,

E(θ_f, θ_y, θ_d) = (1/n) Σ_{i=1..n} l_y^(i) − λ [ (1/n) Σ_{i=1..n} l_d^(i) + (1/m) Σ_{j=n+1..n+m} l_d^(j) ]

(θ_f*, θ_y*) = arg min_{θ_f, θ_y} E(θ_f, θ_y, θ_d*),    θ_d* = arg max_{θ_d} E(θ_f*, θ_y*, θ_d)

wherein θ_f*, θ_y*, θ_d* respectively represent the network parameters determined by each sub-network, n is the number of source domain training samples, and m is the number of target domain training samples.

Compared with the prior art, the invention has the advantages that:

the convolutional neural network and the field confrontation learning are combined and applied to the audio general steganalysis model, the steganalysis characteristics with independent domains can be obtained, the problem of performance reduction of the audio steganalysis model caused by the problem of carrier source mismatch can be effectively solved, and a feasible thought is provided for the application of the audio steganalysis technology in a complex internet big data evidence obtaining scene.

Detailed Description

The following detailed description of embodiments of the invention is merely exemplary in nature and is intended to be illustrative of the invention and not to be construed as limiting the invention.

The invention provides an audio steganalysis method based on a convolutional neural network and domain adversarial learning. The network framework corresponding to the method comprises a feature extraction sub-network, a steganalysis sub-network, and a carrier source discrimination sub-network, wherein θ_f, θ_y, θ_d respectively represent the network parameters of the respective sub-networks; the method comprises,

S1, inputting source domain data D_s, target domain data D_t, an adversarial training factor λ, and a learning rate η;

s2, outputting a steganalysis feature vector F through a feature extraction sub-network;

s3, outputting the steganalysis feature vector F through the steganalysis subnetwork to obtain the binary steganalysis prediction probabilityComputing binary steganographic prediction probabilitiesCross entropy loss l from the original steganographic label yyAnd accordingly updating the network parameter theta through a back propagation error and gradient descent algorithmyWherein y ∈ {0,1}, when y takes the value 0, it represents the original carrier, and when the value 1, it represents the steganographic carrier;

s4, analyzing the feature vector by steganalysisF, obtaining the prediction probability value of the carrier source through the output of the carrier source discrimination subnetworkCalculating the predicted probability value of the vector source

Figure BDA0002494661430000044

Cross entropy loss l from the original steganographic label ddAnd accordingly updating the network parameter theta by back-propagating the errordWhere d ∈ {0,1}, when d takes a value of 0, it represents the source domain and when d takes a value of 1, it represents the target domain.

In order to alleviate the degradation of steganalysis performance caused by the CSM (carrier source mismatch) problem, the output feature vector F must first be discriminative for steganalysis (i.e., yield a correct steganalysis result when fed into the steganalysis sub-network), and must also possess a certain degree of domain independence (i.e., the feature-space distributions of audio carrier data from different sources remain consistent). By continuously learning the distribution difference between original audio samples and steganographic audio samples, the feature extraction network improves the ability of the learned feature F to correctly detect steganographic audio. At the same time, in the back-propagation stage, the gradient reversal layer reverses the error gradient from the carrier source discrimination sub-network before it is used to update the network parameters θ_f of the feature extraction sub-network, so as to reduce the correlation between the extracted feature F and the source of the audio carrier data.

For the network architecture, the detailed architecture parameters of the various sub-network modules are shown in the following table. The notation in the table is read as follows: "64×(1×5), ReLU" denotes a convolutional layer with 64 output channels and 1×5 convolution kernels whose output is activated with ReLU; "FC-256" denotes a fully-connected layer with 256 neurons.

[Table: detailed architecture parameters of each sub-network module]

Feature extraction sub-network: its role is to adaptively extract steganalysis features from the input audio data. In a CNN steganalysis model, a well-designed preprocessing layer can improve the steganalysis performance of the network. Therefore, the feature extraction sub-network begins with an audio preprocessing layer based on a differential filtering design, which is composed of 4 convolution kernels of size 1×5, D1-D4, with initial weights respectively:

D1=[1,-1,0,0,0]

D2=[1,-2,1,0,0]

D3=[1,-3,3,-1,0]

D4=[1,-4,6,-4,1]

the audio pre-processing layer is followed by 4 concatenated convolutional group modules. The convolution layer in the 1 st convolution module does not undergo nonlinear activation processing and eliminates pooling operation, and the aim is to more effectively capture weak information brought by steganography. The 2 nd to 4 th convolution modules all comprise a 1x5 convolution layer, a 1x1 convolution layer and a mean Pooling layer, wherein the last mean Pooling layer of the 4 th convolution module is replaced by a Global mean Pooling (Global Average Pooling) layer for fusing Global features.


Finally, a 256-dimensional steganalysis feature vector F is output.
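The feature extraction sub-network described above can be sketched in PyTorch as follows. This is a minimal illustration rather than the patent's exact implementation: the intermediate channel widths (32/64/128/256) and the pooling kernel/stride are assumed, since the original architecture table is not reproduced here; only the fixed D1-D4 preprocessing kernels, the activation-free and pooling-free 1st group, and the global-average-pooled 256-dimensional output F follow the text.

```python
import torch
import torch.nn as nn

class AudioPreprocess(nn.Module):
    """Differential-filtering layer: 4 convolution kernels of size 1x5
    initialized to the difference filters D1-D4 given in the text."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=(1, 5), padding=(0, 2), bias=False)
        D = torch.tensor([[1., -1., 0., 0., 0.],
                          [1., -2., 1., 0., 0.],
                          [1., -3., 3., -1., 0.],
                          [1., -4., 6., -4., 1.]])
        with torch.no_grad():
            self.conv.weight.copy_(D.view(4, 1, 1, 5))

    def forward(self, x):
        return self.conv(x)

def conv_group(cin, cout, global_pool=False):
    """Groups 2-4: a 1x5 conv, a 1x1 conv, then mean pooling
    (global average pooling for the 4th group)."""
    pool = nn.AdaptiveAvgPool2d(1) if global_pool else nn.AvgPool2d((1, 3), stride=(1, 2))
    return nn.Sequential(
        nn.Conv2d(cin, cout, (1, 5), padding=(0, 2)), nn.ReLU(),
        nn.Conv2d(cout, cout, (1, 1)), nn.ReLU(),
        pool)

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.pre = AudioPreprocess()
        # Group 1: 1x1, 1x5, 1x1 convolutions with no activation and no
        # pooling, to preserve the weak perturbations left by steganography.
        self.g1 = nn.Sequential(
            nn.Conv2d(4, 32, (1, 1)),
            nn.Conv2d(32, 32, (1, 5), padding=(0, 2)),
            nn.Conv2d(32, 32, (1, 1)))
        self.g2 = conv_group(32, 64)
        self.g3 = conv_group(64, 128)
        self.g4 = conv_group(128, 256, global_pool=True)

    def forward(self, x):                    # x: (batch, 1, 1, num_samples)
        x = self.g4(self.g3(self.g2(self.g1(self.pre(x)))))
        return x.flatten(1)                  # 256-dimensional feature F
```

Because the last stage is a global average pooling over 256 channels, the output dimension is fixed at 256 regardless of the input audio length.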

Steganographic classification sub-network: it directly follows the feature output layer, and its structure is two cascaded fully-connected layers (containing 128 and 64 neurons, respectively).

Carrier source discrimination sub-network: its structure is similar to that of the steganographic classification sub-network, with a main body also composed of fully-connected layers. The difference is that the output feature F of the feature extraction sub-network and the domain discrimination layers of the carrier source discrimination sub-network are connected through a Gradient Reversal Layer (GRL).
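The gradient reversal layer (identity mapping in the forward pass, gradient multiplied by −λ in the backward pass) can be implemented with a custom autograd function; the following is a standard PyTorch sketch of this mechanism:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient Reversal Layer: identity in the forward pass,
    gradient multiplied by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                  # Forward: F(x) = x

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: dF/dx = -lambda * I (no gradient w.r.t. lam itself)
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```

During back-propagation, any gradient flowing from the domain discrimination layers into the feature extractor is multiplied by −λ, which pushes θ_f to make the feature F less domain-discriminative while the domain head itself is trained normally.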

Regarding the formulas Forward: F(x) = x and Backward: ∂F/∂x = −λI, the smaller λ is, the less importance is attached to the domain label, and the more domain information the extracted feature vector F is allowed to contain. When λ = 0, the effect of the domain label is not considered at all, i.e., no transfer is performed, and the classifier then depends most strongly on the source domain data. It is therefore important to set a reasonable λ: the greater the difference between the two domains, the larger λ may suitably be.
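The patent does not prescribe a concrete value or schedule for the adversarial factor λ; a common choice in the domain-adversarial training literature (the DANN schedule of Ganin & Lempitsky, named here as a swapped-in technique, not the patent's own) is to anneal λ from 0 toward 1 as training progresses:

```python
import math

def lambda_schedule(p, gamma=10.0):
    """Anneal the adversarial factor lambda from 0 to 1 as the training
    progress p goes from 0 to 1 (schedule from the DANN paper; the patent
    itself only states that lambda should be larger when the two domains
    differ more)."""
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0
```

Starting at λ = 0 lets the steganalysis branch stabilize on source-domain data before the adversarial domain objective is gradually switched on.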

In the training process, the source domain audio data D_s carries complete steganographic label information, while the target domain audio data D_t carries none. The training process of the whole network can be divided into two parts: 1) the supervised steganalysis network formed by cascading the feature extraction sub-network with the steganalysis sub-network; and 2) the carrier source discrimination process formed by cascading the feature extraction sub-network with the carrier source discrimination sub-network. The training purpose of the whole network is as follows: training the steganalysis branch improves the discriminability of the feature F in the steganographic space; training the carrier source discrimination branch improves the ability to distinguish audio data of different sources and to extract domain information; meanwhile, the gradient reversal layer is used to eliminate the domain-related information in the feature F extracted by the feature extraction sub-network. The training objective of the overall network is equivalent to solving the following optimization problem:

E(θ_f, θ_y, θ_d) = (1/n) Σ_{i=1..n} l_y^(i) − λ [ (1/n) Σ_{i=1..n} l_d^(i) + (1/m) Σ_{j=n+1..n+m} l_d^(j) ]

(θ_f*, θ_y*) = arg min_{θ_f, θ_y} E(θ_f, θ_y, θ_d*),    θ_d* = arg max_{θ_d} E(θ_f*, θ_y*, θ_d)

wherein θ_f*, θ_y*, θ_d* respectively represent the network parameters determined by each sub-network, n is the number of source domain training samples, and m is the number of target domain training samples.

To achieve the above objective, the training process of the whole network can be represented by the following procedure table.

[Algorithm table: training procedure of the whole network]
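One iteration of the training process above can be sketched in PyTorch as follows. This is a hypothetical illustration of the optimization problem, not the patent's exact algorithm: the three sub-networks are reduced to small stand-in modules (only the 128/64-neuron steganalysis head follows the text; the real feature extractor is the CNN described earlier), and the hyperparameters (input size 100, lr = 0.01, λ = 0.1) are assumed.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity forward, gradient times -lambda backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -ctx.lam * g, None

# Stand-ins for the three sub-networks (parameters theta_f, theta_y, theta_d).
feat = nn.Sequential(nn.Linear(100, 256), nn.ReLU())            # -> feature F
stego_head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),      # 128/64 FC cascade
                           nn.Linear(128, 64), nn.ReLU(),
                           nn.Linear(64, 2))
domain_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                            nn.Linear(64, 2))

opt = torch.optim.SGD([*feat.parameters(), *stego_head.parameters(),
                       *domain_head.parameters()], lr=0.01)     # learning rate eta
loss_fn = nn.CrossEntropyLoss()
lam = 0.1                                                       # adversarial factor

def train_step(xs, ys, xt):
    """One iteration: supervised stego loss l_y on labeled source data, plus
    domain loss l_d on source (d=0) and target (d=1) data routed through the
    gradient reversal layer."""
    opt.zero_grad()
    Fs, Ft = feat(xs), feat(xt)
    l_y = loss_fn(stego_head(Fs), ys)
    d_labels = torch.cat([torch.zeros(len(xs), dtype=torch.long),
                          torch.ones(len(xt), dtype=torch.long)])
    l_d = loss_fn(domain_head(GradReverse.apply(torch.cat([Fs, Ft]), lam)),
                  d_labels)
    (l_y + l_d).backward()   # the GRL flips the domain gradient w.r.t. theta_f
    opt.step()
    return l_y.item(), l_d.item()
```

A single optimizer step suffices because the gradient reversal layer already realizes the min-max structure: θ_y and θ_d descend their own losses, while θ_f receives the negated domain gradient.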

By adopting the method, the performance degradation of the audio steganalysis model caused by the carrier source mismatch problem is effectively alleviated, providing a feasible approach for applying audio steganalysis technology in complex internet big-data forensics scenarios.

While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
