Multi-band infrared image fusion method based on Cascade-GAN


Note: this invention, "Multi-band infrared image fusion method based on Cascade-GAN", was designed and created by 彭玉怀, 王文茜, 郭钰, 王晨路 and 吴菁晶 on 2021-09-09. Its main content is as follows:

The invention relates to a multi-band infrared image fusion method based on Cascade-GAN. First, the collected multi-band infrared images of the same scene, covering the short-wave, medium-wave and long-wave bands, are concatenated along the channel dimension, and the resulting three-channel images serve as the training data set of the cascaded generative adversarial network model Cascade-GAN. A trained denoising generative adversarial network DnGAN generates denoised images, improving the image signal-to-noise ratio. The denoised images are then fed into the fusion generative adversarial network FuGAN, where the adversarial game between its generator G_Fu and its discriminator D_Fu performs high-quality image fusion. By establishing a total loss function, the cascade network uses the FuGAN output to guide the training of DnGAN, so that raising the output quality of FuGAN also improves the fusion result delivered through DnGAN, thereby yielding a high-quality fused image. Compared with traditional fusion methods, this deep-learning-based fusion method is more robust, achieves a better fusion result, and substantially improves fusion accuracy.

1. A multiband infrared image fusion method based on Cascade-GAN is characterized by comprising the following steps:

S1, data set preparation: respectively acquiring short-wave, medium-wave and long-wave infrared images of the same scene with an infrared sensor and adding noise to them, the original images and the noise-added images being stored in an Image folder and a Noise folder, respectively; concatenating the long-wave, medium-wave and short-wave images corresponding to the original infrared images and to the noise-added images of the same scene along the channel dimension, and inputting them into the Cascade-GAN as the training data set for unsupervised learning;

S2, DnGAN network design: inputting the original images and the noise-added images into DnGAN, which comprises two parts, a generator G_Dn and a discriminator D_Dn, that play an adversarial game: the generator G_Dn continuously generates denoised images ever closer to the original image, while the discriminator D_Dn determines the difference between the original image and the generated denoised image; the final aim is to establish a denoising generative network that produces denoised images indistinguishable from the original noise-free image;

S3, FuGAN network design: inputting the denoised images into the fusion generative adversarial network FuGAN, which takes a generative adversarial network as its base network and outputs high-quality fused images through the adversarial game between the generator G_Fu and the discriminator D_Fu; the generator G_Fu extracts image features through an encoder and reconstructs and outputs a single-channel fused image through a decoder; the discriminator D_Fu discriminates the output image against the single-channel infrared source images of the three bands and outputs the corresponding discrimination probability vector as feedback, driving the generator G_Fu to learn and fuse the data distribution of the input images until the discriminator D_Fu can no longer judge the authenticity of the images output by the generator G_Fu;

S4, training strategy: first initializing the fusion generative adversarial network FuGAN with a network trained in a noise-free environment, then training the cascade of the two networks end-to-end, during which the weights of FuGAN are also adjusted; the weights in the denoising generative adversarial network DnGAN are likewise updated through error back-propagation from the subsequent network.

2. The Cascade-GAN-based multiband infrared image fusion method according to claim 1, wherein: in the step S2, the generator G_Dn mainly comprises an encoder and a decoder, with a paired down-sampling/up-sampling operation introduced; the encoder extracts the features of the image, a down-sampling operation produces feature maps at a different scale, feature extraction is performed once more at that scale, and finally the decoder fuses the features of the two scales and reconstructs the denoised image; the down-sampling and up-sampling operation pair scales the feature maps and changes the receptive field of the convolution kernels, so that more context information is exploited and the denoising effect is improved.

3. The Cascade-GAN-based multiband infrared image fusion method according to claim 2, wherein: the encoder consists of 4 CNN layers having 128, 32, 128 kernels of sizes 3×3, 1×1, 3×3 and 1×1, respectively, from top to bottom; to alleviate gradient vanishing, compensate for feature loss and reuse previously computed features, DenseNet is introduced, establishing short direct connections between each layer and all other layers in a feed-forward manner; the decoder is likewise a 4-layer CNN similar in structure to the encoder, except that the numbers of kernels of the four convolutional layers are 256, 64 and 256, respectively; the stride of all convolutional layers is set to 1; to avoid exploding/vanishing gradients and to accelerate training, batch normalization is applied; a ReLU activation function is adopted to speed up convergence and avoid sparse gradients; down-sampling uses max pooling with stride 2; up-sampling is implemented by deconvolution with 4×4 kernels, with the aim of expanding the feature map to the same spatial size as the previous scale.

4. The Cascade-GAN-based multiband infrared image fusion method according to claim 2, wherein: the discriminator D_Dn is essentially a binary classifier; convolutional layers one to three use a 3×3 convolution kernel and a ReLU activation function to extract feature maps from the input image and then classify them; the stride of all convolutional layers is set to 2; the last layer uses the tanh activation function to produce a scalar indicating the probability that the input data comes from the original image rather than being a fake image generated by G_Dn.

5. The Cascade-GAN-based multiband infrared image fusion method according to claim 1, wherein: in the step S3, the generator G_Fu mainly comprises an encoder and a decoder; the encoder consists of 5 convolutional layers, with an attention mechanism introduced after the first and fourth convolutional layers so that information more critical to the current task is attended to, attention paid to other information is reduced, and the efficiency of the whole network is improved; DenseNet is introduced, establishing short direct connections between each layer and all other layers in a feed-forward manner, so as to alleviate gradient vanishing, compensate for feature loss and reuse previously computed features; the discriminator consists of 4 convolutional layers and one linear layer; the four convolutional layers use a 3×3 convolution kernel, a leaky ReLU activation function and batch normalization; in all convolutional layers the stride is set to 2; the last linear layer judges the input based on the features extracted by the first four convolutional layers and outputs a probability vector.

6. The Cascade-GAN-based multiband infrared image fusion method according to claim 5, wherein: the attention mechanism comprises a channel attention module CAM and a spatial attention module SAM connected in sequence; the intermediate feature map is first input into the CAM, and the channel-refined feature map then serves as the input of the SAM; to gather rich information in each channel, the CAM squeezes the spatial information of the input feature map using max-pooling, overlapping-pooling and avg-pooling, respectively; applying overlapping-pooling improves prediction accuracy and slows down overfitting; the squeeze operation yields three channel vectors; these are sent through a shared fully connected layer and a hidden layer, combined by element-wise summation and activated by a sigmoid function, yielding the channel attention vector; multiplying it with the input feature map makes the network pay more attention to the channel regions of interest.

7. The Cascade-GAN-based multiband infrared image fusion method according to claim 1, wherein: the step S4 specifically includes:

Loss functions are set separately to guide the optimization of the generators and discriminators of the two networks:

In a noise-free environment, the loss function guiding the training of the FuGAN generator consists of the adversarial loss L_adv between G_Fu and D_Fu, the perceptual loss L_perc controlling the loss of high-frequency features, and the SSIM loss L_SSIM controlling the loss of low-frequency features:

L_Gf = L_adv + λ_1·L_perc + λ_2·L_SSIM

where λ_1 and λ_2 are ratios that are gradually adjusted during training;

The adversarial loss is defined as follows:

L_adv = (1/N) Σ_{n=1}^{N} [ (D_Fu(I_fused^n)[0] - e)^2 + (D_Fu(I_fused^n)[1] - e)^2 + (D_Fu(I_fused^n)[2] - e)^2 ]

where N is the number of fused images and e is the probability label for judging the fused image; since the discriminator D_Fu is a multi-classifier that outputs a 1×3 probability vector, D_Fu(·)[0] denotes the first element of the vector, i.e. the probability that the fused image is a short-wave infrared image, while D_Fu(·)[1] and D_Fu(·)[2] denote the second and third elements, i.e. the probabilities that the fused image is a medium-wave or a long-wave infrared image; since the generator G_Fu expects the discriminator D_Fu to be unable to distinguish the fused image from real data, e is set to 1;

Perceptual loss: the high-level features of the source images are compared with the same-level features of the fused image generated by the network under training; layers 2, 4, 6 and 8 of the existing VGG-16 network model are selected as the feature extraction sub-network; the infrared images of the three bands are concatenated along the channel dimension to obtain a three-channel image F, which is input as the reference image, and the single-channel fused image is likewise concatenated three times and input as the fusion result I:

L_perc = Σ_j (1/(C_j·H_j·W_j)) ||φ_j(F) - φ_j(I)||_2^2

where j denotes the j-th layer of the VGG-16 network; C_j is the number of feature maps of the j-th layer, each of size H_j×W_j; φ_j(F) and φ_j(I) denote the output feature maps obtained at the j-th layer of the VGG-16 network, and the final loss is computed with the L2 norm; the constraint of the perceptual loss term drives the generator G_Fu to produce a fused image with good visual quality;

SSIM loss constrains the correlation loss, luminance distortion and contrast distortion of the fused image, and is defined as follows:

L_SSIM = 1 - (ω_1·SSIM(I_fused, I_SWIR) + ω_2·SSIM(I_fused, I_MWIR) + ω_3·SSIM(I_fused, I_LWIR))

where ω denotes a weight and ω_1 + ω_2 + ω_3 = 1;

A least-squares generative adversarial network is used, which takes the least-squares loss function as the discriminator loss; the loss function L_Df of the FuGAN discriminator D_Fu consists of four parts, the decision losses on the three infrared source images and on the fused image, denoted L_SWIR, L_MWIR, L_LWIR and L_fused:

L_Df = L_SWIR + L_MWIR + L_LWIR + L_fused

considering the 1 × 3 vectors of the discriminator output, we have PSWIR=DFu(x)[0]、PMWIR=DFu(x)[1]、PLWIR=DFu(x)[2](ii) a When the input is a short wave infrared image, P is expectedSWIRClose to 1, PMWIRAnd PLWIRIs close to 0; the corresponding loss is defined as:

where N is the number of pixels in the image, a1、a2And a3Is a probability label, a1Is set to 1, a2And a3Set to 0, i.e. when a short-wave infrared image is input, discriminator DFuJudging that the probability of the short-wave infrared image is higher, and judging that the probability of the short-wave infrared image is lower, and judging that the short-wave infrared image is a medium-wave infrared image and a long-wave infrared image is lower;

Similarly, the loss terms for medium-wave and long-wave infrared images are defined as:

L_MWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - b_1)^2 + (P_MWIR - b_2)^2 + (P_LWIR - b_3)^2 ]
L_LWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - c_1)^2 + (P_MWIR - c_2)^2 + (P_LWIR - c_3)^2 ]

where b_2 is set to 1 and b_1, b_3 are set to 0; c_3 is set to 1 and c_1, c_2 are set to 0;

Finally, when the input image is a fused image, the loss function is defined as:

L_fused = (1/N) Σ_{n=1}^{N} [ (P_SWIR - d)^2 + (P_MWIR - d)^2 + (P_LWIR - d)^2 ]

where d is the probability label with which the discriminator D_Fu judges the fused image, set to 0; the three probability labels are likewise balanced, i.e. from the perspective of the discriminator D_Fu the fused image is, to the same degree, a pseudo short-wave, pseudo medium-wave and pseudo long-wave infrared image;

During training of the cascade network, the loss for the DnGAN generator G_Dn and the FuGAN generator G_Fu is jointly composed of the reconstruction loss L_rec, the perceptual loss L_perc and the loss terms L_Gf that guide the FuGAN generator G_Fu in a noise-free environment:

L_G = L_rec + L_perc + L_Gf

The reconstruction loss L_rec is the mean squared error between the output of the denoising network and the noise-free image, defined as:

L_rec = (1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} (G_Dn(x)_{i,j} - x̄_{i,j})^2

where x is the input noisy image, x̄ is the noise-free image, G_Dn(x) is the denoised image generated by DnGAN, i and j denote the row and column of a pixel, and H×W is the size of the image;

The perceptual loss L_perc is defined as the corresponding FuGAN loss term;

the loss terms of the FuGAN generator form the loss function L_Gf of G_Fu;

The loss function of the DnGAN discriminator D_Dn also adopts the least-squares loss; for the denoised image the loss term is:

L_Dn,fake = (1/N) Σ_{n=1}^{N} (D_Dn(G_Dn(x)) - a_2)^2

where N is the number of pixels in the image and a_1, a_2 are probability labels; a_1 is set to 1 and a_2 to 0, i.e. the discriminator D_Dn judges the noise-free image to be real with high probability and the denoised image with low probability;

similarly, the loss term for a noise-free image is defined as:

L_Dn,real = (1/N) Σ_{n=1}^{N} (D_Dn(x̄) - a_1)^2

Technical Field

The invention relates to the technical field of image fusion, in particular to a multi-band infrared image fusion method based on Cascade-GAN.

Background

An infrared sensor collects external infrared radiation and generates an infrared image from radiation differences, distinguishing a target from the background, and works day and night, so it is widely applied in fields such as target recognition, detection and visual perception. However, despite its many advantages for detection, the infrared sensor has limitations: as application tasks grow more complex, application environments expand, and infrared stealth and jamming technologies advance, infrared thermal imaging systems in many scenarios suffer from poor target detection and recognition capability, high false-alarm rates in automatic early-warning systems, and insufficient dynamic range. Improving infrared image quality is therefore an important way to improve system performance.

Image fusion exploits the inherent strong differences and complementarity of a target across images of different infrared bands to obtain more effective information about the target, fusing them into a more robust and information-rich image and thereby effectively improving system performance. At present, deep learning is an important means of solving high-level vision tasks, but most deep learning techniques for image fusion have been developed around the fusion of visible-light and infrared images, while the fusion of multi-band infrared images still relies on traditional techniques such as multi-scale transformation and sparse representation. Multi-scale transformation decomposes a source image into different scales for feature extraction, fuses the features of all scales with a suitable fusion strategy, and then reconstructs the fused image with the inverse operator. Methods based on sparse low-rank representation learning must learn an over-complete dictionary from a large number of high-quality natural images, sparsely encode each image patch, fuse the sparse representation coefficients according to a given fusion rule, and finally reconstruct the fused image using the learned over-complete dictionary and the fused coefficients. Both traditional approaches require manually selecting transformations and formulating fusion rules, and the process is very complicated.

Existing traditional fusion methods apply the same transformation or representation to different source images, which is unsuitable for infrared multi-band image fusion: because infrared images of different bands differ in wavelength, the same scene can appear in different forms. In addition, existing methods mostly design fusion rules by hand; these rules grow ever more complex, and the implementation difficulty and computational cost keep rising.

Disclosure of Invention

In view of the above problems, the present invention aims to provide a Cascade-GAN-based multi-band infrared image fusion method that exploits the inherent strong differences and complementarity of a target across different infrared bands to obtain more effective information about the target. The method implements a unified deep learning framework with a generative adversarial network (GAN) as the base network, jointly performing infrared multi-band image denoising and image fusion. A total loss function is established: semantic information from the image fusion process guides image denoising, and image denoising in turn improves the quality of the output fused image, so that the final image combines a high signal-to-noise ratio with high information entropy and supports subsequent high-level vision tasks such as target recognition and target detection.

The technical scheme adopted by the invention is as follows:

the invention provides a multiband infrared image fusion method based on Cascade-GAN, which comprises the following steps:

S1, data set preparation: respectively acquiring short-wave, medium-wave and long-wave infrared images of the same scene with an infrared sensor and adding noise to them, the original images and the noise-added images being stored in an Image folder and a Noise folder, respectively; concatenating the long-wave, medium-wave and short-wave images corresponding to the original infrared images and to the noise-added images of the same scene along the channel dimension, and inputting them into the Cascade-GAN as the training data set for unsupervised learning;

S2, DnGAN network design: the original images and the noise-added images are input into DnGAN, which comprises two parts, a generator G_Dn and a discriminator D_Dn, that play an adversarial game: the generator G_Dn continuously generates denoised images ever closer to the original image, while the discriminator D_Dn determines the difference between the original image and the generated denoised image; the final aim is to establish a denoising generative network that produces denoised images indistinguishable from the original noise-free image;

S3, FuGAN network design: the denoised images are input into the fusion generative adversarial network FuGAN, which takes a generative adversarial network as its base network and outputs high-quality fused images through the adversarial game between the generator G_Fu and the discriminator D_Fu; the generator G_Fu extracts image features through an encoder and reconstructs and outputs a single-channel fused image through a decoder; the discriminator D_Fu discriminates the output image against the single-channel infrared source images of the three bands and outputs the corresponding discrimination probability vector as feedback, driving the generator G_Fu to learn and fuse the data distribution of the input images until the discriminator D_Fu can no longer judge the authenticity of the images output by the generator G_Fu;

S4, training strategy: first initializing the fusion generative adversarial network FuGAN with a network trained in a noise-free environment, then training the cascade of the two networks end-to-end, during which the weights of FuGAN are also adjusted; the weights in the denoising generative adversarial network DnGAN are likewise updated through error back-propagation from the subsequent network.

Further, in the step S2, the generator G_Dn mainly comprises an encoder and a decoder, with a paired down-sampling/up-sampling operation introduced; the encoder extracts the features of the image, a down-sampling operation produces feature maps at a different scale, feature extraction is performed once more at that scale, and finally the decoder fuses the features of the two scales and reconstructs the denoised image; the down-sampling and up-sampling operation pair scales the feature maps and changes the receptive field of the convolution kernels, so that more context information is exploited and the denoising effect is improved.

Further, the encoder consists of 4 CNN layers having 128, 32, 128 kernels of sizes 3×3, 1×1, 3×3 and 1×1, respectively, from top to bottom; to alleviate gradient vanishing, compensate for feature loss and reuse previously computed features, DenseNet is introduced, establishing short direct connections between each layer and all other layers in a feed-forward manner; the decoder is likewise a 4-layer CNN similar in structure to the encoder, except that the numbers of kernels of the four convolutional layers are 256, 64 and 256, respectively; the stride of all convolutional layers is set to 1; to avoid exploding/vanishing gradients and to accelerate training, batch normalization is applied; a ReLU activation function is adopted to speed up convergence and avoid sparse gradients; down-sampling uses max pooling with stride 2; up-sampling is implemented by deconvolution with 4×4 kernels, expanding the feature map to the same spatial size as the previous scale.

Further, the discriminator D_Dn is essentially a binary classifier; convolutional layers one to three use a 3×3 convolution kernel and a ReLU activation function to extract feature maps from the input image and then classify them; the stride of all convolutional layers is set to 2; the last layer uses the tanh activation function to produce a scalar indicating the probability that the input data comes from the original image rather than being a fake image generated by G_Dn.

Further, in the step S3, the generator G_Fu mainly comprises an encoder and a decoder; the encoder consists of 5 convolutional layers, with an attention mechanism introduced after the first and fourth convolutional layers so that information more critical to the current task is attended to, attention paid to other information is reduced, and the efficiency of the whole network is improved; DenseNet is introduced, establishing short direct connections between each layer and all other layers in a feed-forward manner to alleviate gradient vanishing, compensate for feature loss and reuse previously computed features; the discriminator consists of 4 convolutional layers and one linear layer; the four convolutional layers use a 3×3 convolution kernel, a leaky ReLU activation function and batch normalization; in all convolutional layers the stride is set to 2; the last linear layer judges the input based on the features extracted by the first four convolutional layers and outputs a probability vector.

Furthermore, the attention mechanism comprises a channel attention module CAM and a spatial attention module SAM connected in sequence: the intermediate feature map is first input into the CAM, and the channel-refined feature map then serves as the input of the SAM; to gather rich information in each channel, the CAM squeezes the spatial information of the input feature map using max-pooling, overlapping-pooling and avg-pooling, respectively; applying overlapping-pooling improves prediction accuracy and slows down overfitting; the squeeze operation yields three channel vectors; these are sent through a shared fully connected layer and a hidden layer, combined by element-wise summation and activated by a sigmoid function, yielding the channel attention vector; multiplying it with the input feature map makes the network pay more attention to the channel regions of interest.

Further, the step S4 specifically includes:

Loss functions are set separately to guide the optimization of the generators and discriminators of the two networks:

In a noise-free environment, the loss function guiding the training of the FuGAN generator consists of the adversarial loss L_adv between G_Fu and D_Fu, the perceptual loss L_perc controlling the loss of high-frequency features, and the SSIM loss L_SSIM controlling the loss of low-frequency features:

L_Gf = L_adv + λ_1·L_perc + λ_2·L_SSIM

where λ_1 and λ_2 are ratios that are gradually adjusted during training;

The adversarial loss is defined as follows:

L_adv = (1/N) Σ_{n=1}^{N} [ (D_Fu(I_fused^n)[0] - e)^2 + (D_Fu(I_fused^n)[1] - e)^2 + (D_Fu(I_fused^n)[2] - e)^2 ]

where N is the number of fused images and e is the probability label for judging the fused image; since the discriminator D_Fu is a multi-classifier that outputs a 1×3 probability vector, D_Fu(·)[0] denotes the first element of the vector, i.e. the probability that the fused image is a short-wave infrared image, while D_Fu(·)[1] and D_Fu(·)[2] denote the second and third elements, i.e. the probabilities that the fused image is a medium-wave or a long-wave infrared image; since the generator G_Fu expects the discriminator D_Fu to be unable to distinguish the fused image from real data, e is set to 1;

Perceptual loss: the high-level features of the source images are compared with the same-level features of the fused image generated by the network under training; layers 2, 4, 6 and 8 of the existing VGG-16 network model are selected as the feature extraction sub-network; the infrared images of the three bands are concatenated along the channel dimension to obtain a three-channel image F, which is input as the reference image, and the single-channel fused image is likewise concatenated three times and input as the fusion result I:

L_perc = Σ_j (1/(C_j·H_j·W_j)) ||φ_j(F) - φ_j(I)||_2^2

where j denotes the j-th layer of the VGG-16 network; C_j is the number of feature maps of the j-th layer, each of size H_j×W_j; φ_j(F) and φ_j(I) denote the output feature maps obtained at the j-th layer of the VGG-16 network, and the final loss is computed with the L2 norm; the constraint of the perceptual loss term drives the generator G_Fu to produce a fused image with good visual quality;

SSIM loss constrains the correlation loss, luminance distortion and contrast distortion of the fused image, and is defined as follows:

L_SSIM = 1 - (ω_1·SSIM(I_fused, I_SWIR) + ω_2·SSIM(I_fused, I_MWIR) + ω_3·SSIM(I_fused, I_LWIR))

where ω denotes a weight and ω_1 + ω_2 + ω_3 = 1;

A least-squares generative adversarial network is used, which takes the least-squares loss function as the discriminator loss; the loss function L_Df of the FuGAN discriminator D_Fu consists of four parts, the decision losses on the three infrared source images and on the fused image, denoted L_SWIR, L_MWIR, L_LWIR and L_fused:

L_Df = L_SWIR + L_MWIR + L_LWIR + L_fused

Considering the 1×3 vector output by the discriminator, let P_SWIR = D_Fu(x)[0], P_MWIR = D_Fu(x)[1] and P_LWIR = D_Fu(x)[2]; when the input is a short-wave infrared image, P_SWIR is expected to be close to 1 while P_MWIR and P_LWIR are close to 0; the corresponding loss is defined as:

L_SWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - a_1)^2 + (P_MWIR - a_2)^2 + (P_LWIR - a_3)^2 ]

where N is the number of pixels in the image and a_1, a_2, a_3 are probability labels; a_1 is set to 1 and a_2, a_3 are set to 0, i.e. when a short-wave infrared image is input, the discriminator D_Fu judges it to be a short-wave infrared image with high probability and a medium-wave or long-wave infrared image with low probability;

Similarly, the loss terms for medium-wave and long-wave infrared images are defined as:

L_MWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - b_1)^2 + (P_MWIR - b_2)^2 + (P_LWIR - b_3)^2 ]
L_LWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - c_1)^2 + (P_MWIR - c_2)^2 + (P_LWIR - c_3)^2 ]

where b_2 is set to 1 and b_1, b_3 are set to 0; c_3 is set to 1 and c_1, c_2 are set to 0;

Finally, when the input image is a fused image, the loss function is defined as:

L_fused = (1/N) Σ_{n=1}^{N} [ (P_SWIR - d)^2 + (P_MWIR - d)^2 + (P_LWIR - d)^2 ]

where d is the probability label with which the discriminator D_Fu judges the fused image, set to 0; the three probability labels are likewise balanced, i.e. from the perspective of the discriminator D_Fu the fused image is, to the same degree, a pseudo short-wave, pseudo medium-wave and pseudo long-wave infrared image;

During training of the cascade network, the loss for the DnGAN generator G_Dn and the FuGAN generator G_Fu is jointly composed of the reconstruction loss L_rec, the perceptual loss L_perc and the loss terms L_Gf that guide the FuGAN generator G_Fu in a noise-free environment:

L_G = L_rec + L_perc + L_Gf

The reconstruction loss L_rec is the mean squared error between the output of the denoising network and the noise-free image, defined as:

L_rec = (1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} (G_Dn(x)_{i,j} - x̄_{i,j})^2

where x is the input noisy image, x̄ is the noise-free image, G_Dn(x) is the denoised image generated by DnGAN, i and j denote the row and column of a pixel, and H×W is the size of the image;

The perceptual loss L_perc is defined as the corresponding FuGAN loss term;

the loss terms of the FuGAN generator form the loss function L_Gf of G_Fu;

The loss function of the DnGAN discriminator D_Dn also adopts the least-squares loss; for the denoised image the loss term is:

L_Dn,fake = (1/N) Σ_{n=1}^{N} (D_Dn(G_Dn(x)) - a_2)^2

where N is the number of pixels in the image and a_1, a_2 are probability labels; a_1 is set to 1 and a_2 to 0, i.e. the discriminator D_Dn judges the noise-free image to be real with high probability and the denoised image with low probability;

similarly, the loss term for a noise-free image is defined as:

L_Dn,real = (1/N) Σ_{n=1}^{N} (D_Dn(x̄) - a_1)^2

compared with the prior art, the invention has the following beneficial effects:

In the image denoising network module, a GAN-based method is applied to image denoising. A down-sampling/up-sampling operation pair is introduced in the feature extraction process, so that information at different scales is obtained by changing the receptive field of the convolution kernels, improving feature extraction accuracy and yielding a higher-quality denoised image. The design of the total loss function combines the mean squared error between images used in traditional methods with the perceptual loss in the feature domain and the loss function of the fusion network, jointly guiding the training of the denoising network; this exploits more semantic information and improves the denoising effect in all respects. In general, applying deep learning to image denoising avoids the smoothing artifacts and loss of image detail caused by traditional denoising methods, yielding a higher-quality denoised image.

Channel attention and spatial attention modules are introduced into the image fusion module, so that information more critical to the current task is attended to, attention paid to other information is reduced, and irrelevant information is filtered out, improving the efficiency of the whole network. Dense blocks are introduced, which alleviates gradient vanishing during network training, strengthens feature propagation, encourages feature reuse and greatly reduces the number of parameters. The perceptual loss is built on VGG-16 to reduce the loss of high-frequency features, and SSIM is used to preserve low-level characteristics of the fused image such as luminance, contrast and structure, improving the spatial structural correlation between images. These key designs of the image fusion module make the output of the whole cascade network richer in information and produce better fused images.

The whole cascade network is an end-to-end model: a fused image is generated automatically from the input source images without manually designing fusion rules. Compared with traditional fusion methods, this deep-learning-based fusion method is more robust, achieves a good fusion result, and substantially improves fusion accuracy.

Drawings

FIG. 1 is a schematic diagram of a main model of a Cascade-GAN-based multiband infrared image fusion method provided by the invention;

FIG. 2 is a schematic diagram of the encoder structure of the generator G_Dn;

FIG. 3 is a schematic diagram of the decoder structure of the generator G_Dn;

FIG. 4 is a schematic diagram of the structure of the discriminator D_Dn;

FIG. 5 is a schematic diagram of the structure of the generator G_Fu;

FIG. 6 is a schematic view of an attention module;

FIG. 7 is a schematic view of a channel attention model;

FIG. 8 is a schematic view of a spatial attention model;

FIG. 9 is a schematic diagram of the structure of the discriminator D_Fu;

FIG. 10 is a schematic diagram of the fusion network losses;

FIG. 11 is a schematic diagram of the total loss.

Detailed Description

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

As shown in fig. 1, the multiband infrared image fusion method based on Cascade-GAN provided by the invention specifically includes the following steps:

S1, data set preparation: short-wave, medium-wave and long-wave infrared images of the same scene are acquired with an infrared sensor and noise is added to them; the original images and the noise-added images are stored in an Image folder and a Noise folder, respectively. The long-wave, medium-wave and short-wave images corresponding to the original infrared images and to the noise-added images of the same scene are concatenated along the channel dimension and input into the Cascade-GAN as the training data set for unsupervised learning.
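As an illustration of this step, a minimal sketch of the band concatenation and noise addition is given below; the folder handling and the zero-mean Gaussian noise model are assumptions, since the text does not fix a noise model.

```python
# Minimal sketch of the S1 data preparation. The Gaussian noise model and
# the [0,1] value range are assumptions not fixed by the patent text.
import numpy as np

def add_noise(img: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Add zero-mean Gaussian noise (assumed noise model) to a [0,1] image."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def make_training_pair(swir: np.ndarray, mwir: np.ndarray, lwir: np.ndarray):
    """Concatenate the three single-channel bands of one scene along the
    channel axis (C, H, W), for both the clean and the noise-added version."""
    clean = np.stack([swir, mwir, lwir], axis=0)                      # 3 x H x W
    noisy = np.stack([add_noise(b) for b in (swir, mwir, lwir)], axis=0)
    return clean, noisy   # stored under Image/ and Noise/ respectively
```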

S2, DnGAN network design: the original images and the noise-added images are input into DnGAN, which comprises two parts, a generator G_Dn and a discriminator D_Dn, that play an adversarial game: the generator G_Dn continuously generates denoised images ever closer to the original image, while the discriminator D_Dn determines the difference between the original image and the generated denoised image. The final aim is to establish a denoising generative network that can produce denoised images indistinguishable from the original noise-free image.

The structure of the generator G_Dn is shown in Fig. 1. It mainly comprises an encoder and a decoder, with a paired down-sampling/up-sampling operation introduced: the encoder extracts the features of the image, a down-sampling operation produces feature maps at a different scale, feature extraction is performed once more at that scale, and finally the decoder fuses the features of the two scales and reconstructs the denoised image. The down-sampling and up-sampling operation pair scales the feature maps and changes the receptive field of the convolution kernels, so that more context information is exploited and the denoising effect is improved.

As shown in Fig. 2, the encoder consists of 4 CNN layers having 128, 32, 128 kernels of sizes 3×3, 1×1, 3×3 and 1×1, respectively, from top to bottom. To alleviate gradient vanishing, compensate for feature loss and reuse previously computed features, DenseNet is introduced, establishing short direct connections between each layer and all other layers in a feed-forward manner. The decoder is likewise a 4-layer CNN similar in structure to the encoder, except that the numbers of kernels of the four convolutional layers are 256, 64 and 256, respectively; its layers are arranged as shown in Fig. 3. The stride of all convolutional layers is set to 1. To avoid exploding/vanishing gradients and to accelerate training, batch normalization is applied, and a ReLU activation function is adopted to speed up convergence and avoid sparse gradients. Down-sampling uses max pooling with stride 2; up-sampling is implemented by deconvolution with 4×4 kernels, expanding the feature map to the same spatial size as the previous scale.
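The following PyTorch sketch illustrates this encoder/decoder with dense connections and one down-/up-sampling pair. The text lists only three kernel counts for four layers, so the widths used here (128, 32, 32, 128 and 256, 64, 64, 256) and the mid-scale dense block are assumptions.

```python
# Sketch of the DnGAN generator G_Dn: dense encoder, one down/up-sampling
# pair, dense decoder. Exact channel widths are assumptions (see lead-in).
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Four conv layers with DenseNet-style short connections."""
    def __init__(self, in_ch, widths=(128, 32, 32, 128), ksizes=(3, 1, 3, 1)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for w, k in zip(widths, ksizes):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, w, k, stride=1, padding=k // 2),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True)))
            ch += w  # dense connection: next layer sees all previous outputs
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class GDn(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.encoder = DenseBlock(in_ch, (128, 32, 32, 128))
        self.down = nn.MaxPool2d(2)                       # stride-2 max pooling
        self.mid = DenseBlock(self.encoder.out_channels)  # features at 1/2 scale
        self.up = nn.ConvTranspose2d(self.mid.out_channels,
                                     self.encoder.out_channels,
                                     kernel_size=4, stride=2, padding=1)
        self.decoder = DenseBlock(2 * self.encoder.out_channels,
                                  (256, 64, 64, 256))
        self.out = nn.Conv2d(self.decoder.out_channels, in_ch, 3, padding=1)

    def forward(self, x):          # x: (N, 3, H, W) with H, W even
        e = self.encoder(x)
        m = self.up(self.mid(self.down(e)))  # scale down, extract, scale up
        return self.out(self.decoder(torch.cat([e, m], dim=1)))
```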

The discriminator D_Dn has a simpler architecture than the generator G_Dn, as shown in Fig. 4. The discriminator D_Dn is essentially a binary classifier: convolutional layers one to three use a 3×3 convolution kernel and a ReLU activation function to extract feature maps from the input image and then classify them. The stride of all convolutional layers is set to 2. The last layer uses the tanh activation function to produce a scalar indicating the probability that the input data comes from the original image rather than being a fake image generated by G_Dn.
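A corresponding sketch of D_Dn follows; the channel widths and the pooling of the final map to a single scalar are assumptions not fixed by the text.

```python
# Sketch of the DnGAN discriminator D_Dn: three 3x3/stride-2 conv layers with
# ReLU, then a tanh-activated head reduced to one scalar per image.
import torch
import torch.nn as nn

class DDn(nn.Module):
    def __init__(self, in_ch=3, widths=(32, 64, 128)):  # widths are assumptions
        super().__init__()
        layers, ch = [], in_ch
        for w in widths:
            layers += [nn.Conv2d(ch, w, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            ch = w
        self.features = nn.Sequential(*layers)
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        f = self.features(x)
        # scalar score: "probability" that x comes from an original image
        return torch.tanh(self.head(f)).mean(dim=(1, 2, 3))
```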

S3, FuGAN network design: the denoised images are input into the fusion generative adversarial network FuGAN, which takes a generative adversarial network as its base network and outputs high-quality fused images through the adversarial game between the generator G_Fu and the discriminator D_Fu. The generator extracts image features through an encoder, and the decoder reconstructs and outputs a single-channel fused image. The discriminator discriminates the output image against the single-channel infrared source images of the three bands and outputs the corresponding discrimination probability vector as feedback, driving the generator to learn and fuse the data distribution of the input images until the discriminator can no longer judge the authenticity of the images output by the generator.

The generator G_Fu consists of two parts, an encoder and a decoder, as shown in Fig. 5. The encoder consists of 5 convolutional layers, with an attention mechanism introduced after the first and fourth convolutional layers so that information more critical to the current task is attended to, attention paid to other information is reduced, and the efficiency of the whole network is improved. DenseNet is introduced, establishing short direct connections between each layer and all other layers in a feed-forward manner, to alleviate gradient vanishing, compensate for feature loss and reuse previously computed features.

The attention mechanism includes a channel attention module CAM (Channel Attention Module) and a spatial attention module SAM (Spatial Attention Module); the structure is shown in Fig. 6. The two parts are connected in sequence: the intermediate feature map is first input into the CAM, and the channel-refined feature map then serves as the input of the SAM. To gather rich information in each channel, the CAM squeezes the spatial information of the input feature map using max-pooling, overlapping-pooling and avg-pooling, respectively, as shown in Fig. 7; applying overlapping-pooling improves prediction accuracy and slows down overfitting. The squeeze operation yields three channel vectors, which are sent through a shared fully connected layer and a hidden layer, combined by element-wise summation and activated by a sigmoid function, giving the channel attention vector; multiplying it with the input feature map makes the network pay more attention to the channel regions of interest. The purpose of the SAM is to obtain a better spatial attention effect; its structure is shown in Fig. 8. The model again uses the three pooling operations max-pooling, overlapping-pooling and avg-pooling, this time to squeeze the channel information of the input feature map; the three resulting two-dimensional maps are concatenated, fed into a convolutional layer and activated by a sigmoid function, finally yielding the spatial attention two-dimensional map. This map shows where the feature map needs to be highlighted and where it needs to be suppressed.
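The sketch below gives one plausible reading of this CAM-then-SAM block. The reduction ratio of the shared fully connected layers and the way overlapping-pooling is collapsed to a single vector or map (an overlapping 3×3 max pool followed by averaging) are assumptions; the patent only names the three pooling operations.

```python
# Sketch of the attention block: channel attention (CAM) followed by
# spatial attention (SAM), each squeezing with max / overlapping / avg pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

def overlapping_squeeze(x):
    # overlapping pooling: kernel 3 > stride 2, then a global average to get
    # one value per channel (the reduction to a vector is an assumption)
    return F.max_pool2d(x, 3, stride=2, padding=1).mean(dim=(2, 3))

class CAM(nn.Module):
    def __init__(self, ch, r=8):  # reduction ratio r is an assumption
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(inplace=True),
                                 nn.Linear(ch // r, ch))  # shared FC + hidden

    def forward(self, x):
        vmax = x.amax(dim=(2, 3))            # max-pooling squeeze
        vavg = x.mean(dim=(2, 3))            # avg-pooling squeeze
        vovl = overlapping_squeeze(x)        # overlapping-pooling squeeze
        a = torch.sigmoid(self.mlp(vmax) + self.mlp(vavg) + self.mlp(vovl))
        return x * a[:, :, None, None]       # re-weight the channels

class SAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, 7, padding=3)

    def forward(self, x):
        mmax = x.amax(dim=1, keepdim=True)                          # channel max
        mavg = x.mean(dim=1, keepdim=True)                          # channel avg
        movl = F.max_pool2d(x, 3, 1, 1).mean(dim=1, keepdim=True)   # overlapping
        a = torch.sigmoid(self.conv(torch.cat([mmax, mavg, movl], dim=1)))
        return x * a                         # highlight/suppress spatial regions

class Attention(nn.Module):
    """CAM and SAM connected in sequence."""
    def __init__(self, ch):
        super().__init__()
        self.cam, self.sam = CAM(ch), SAM()

    def forward(self, x):
        return self.sam(self.cam(x))
```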

The structure of the discriminator D_Fu is shown in Fig. 9. The FuGAN discriminator D_Fu is essentially a multi-classifier that estimates the probabilities of identifying the fused image as each of the three bands of infrared source image; its output is a 1×3 probability vector. The discriminator D_Fu consists of four convolutional layers and one linear layer. The four convolutional layers use a 3×3 convolution kernel, a leaky ReLU activation function and batch normalization; in all convolutional layers the stride is set to 2. The last linear layer judges the input based on the features extracted by the first four convolutional layers and outputs the probability vector.
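A minimal sketch of D_Fu under these constraints follows; the channel widths, the 128×128 input size and the sigmoid on the output vector are assumptions.

```python
# Sketch of the FuGAN discriminator D_Fu: four 3x3/stride-2 conv layers with
# batch norm and leaky ReLU, then a linear layer giving a 1x3 probability vector.
import torch
import torch.nn as nn

class DFu(nn.Module):
    def __init__(self, in_ch=1, widths=(32, 64, 128, 256), img_size=128):
        super().__init__()
        layers, ch = [], in_ch
        for w in widths:
            layers += [nn.Conv2d(ch, w, 3, stride=2, padding=1),
                       nn.BatchNorm2d(w),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = w
        self.features = nn.Sequential(*layers)
        feat = img_size // 2 ** len(widths)      # spatial size after 4 strides
        self.linear = nn.Linear(ch * feat * feat, 3)

    def forward(self, x):
        f = self.features(x).flatten(1)
        # probabilities that x is an SWIR / MWIR / LWIR image
        return torch.sigmoid(self.linear(f))
```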

S4, training strategy: the fusion generative adversarial network FuGAN is first initialized with a network trained in a noise-free environment, and the cascade of the two networks is then trained end-to-end, during which the weights of FuGAN are also adjusted; the weights in the denoising generative adversarial network DnGAN are likewise updated through error back-propagation from the subsequent network.

Loss functions are set separately to guide the optimization of the generators and discriminators of the two networks:

As shown in Fig. 10, in a noise-free environment the loss function guiding the training of the FuGAN generator consists of the adversarial loss L_adv between G_Fu and D_Fu, the perceptual loss L_perc controlling the loss of high-frequency features, and the SSIM loss L_SSIM controlling the loss of low-frequency features:

L_Gf = L_adv + λ_1·L_perc + λ_2·L_SSIM

where λ_1 and λ_2 are ratios that are gradually adjusted during training;

The adversarial loss is defined as follows:

L_adv = (1/N) Σ_{n=1}^{N} [ (D_Fu(I_fused^n)[0] - e)^2 + (D_Fu(I_fused^n)[1] - e)^2 + (D_Fu(I_fused^n)[2] - e)^2 ]

where N is the number of fused images and e is the probability label for judging the fused image; since the discriminator D_Fu is a multi-classifier that outputs a 1×3 probability vector, D_Fu(·)[0] denotes the first element of the vector, i.e. the probability that the fused image is a short-wave infrared image, while D_Fu(·)[1] and D_Fu(·)[2] denote the second and third elements, i.e. the probabilities that the fused image is a medium-wave or a long-wave infrared image; since the generator G_Fu expects the discriminator D_Fu to be unable to distinguish the fused image from real data, e is set to 1;
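In code, the reconstructed adversarial term amounts to a least-squares penalty pushing all three discriminator outputs toward e = 1; a minimal sketch:

```python
# Sketch of the generator adversarial loss: all three entries of the 1x3
# discriminator output are pushed toward the probability label e = 1.
import torch

def adversarial_loss(d_fu, fused, e: float = 1.0):
    """d_fu: discriminator returning an (N, 3) probability vector;
    fused: batch of single-channel fused images."""
    p = d_fu(fused)                    # (N, 3)
    return ((p - e) ** 2).mean()       # least-squares (LSGAN-style) penalty
```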

Perceptual loss: the high-level features of the source images are compared with the same-level features of the fused image generated by the network under training; layers 2, 4, 6 and 8 of the existing VGG-16 network model are selected as the feature extraction sub-network; the infrared images of the three bands are concatenated along the channel dimension to obtain a three-channel image F, which is input as the reference image, and the single-channel fused image is likewise concatenated three times and input as the fusion result I:

L_perc = Σ_j (1/(C_j·H_j·W_j)) ||φ_j(F) - φ_j(I)||_2^2

where j denotes the j-th layer of the VGG-16 network; C_j is the number of feature maps of the j-th layer, each of size H_j×W_j; φ_j(F) and φ_j(I) denote the output feature maps obtained at the j-th layer of the VGG-16 network, and the final loss is computed with the L2 norm; the constraint of the perceptual loss term drives the generator G_Fu to produce a fused image with good visual quality;
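A sketch of this perceptual loss follows. Reading "layers 2, 4, 6 and 8" as the 2nd, 4th, 6th and 8th convolutional layers of torchvision's pretrained VGG-16 is an assumption, as is replicating the single-channel fused image to three channels.

```python
# Sketch of the VGG-16 perceptual loss: L2 distance between feature maps at
# selected layers, normalized by C_j * H_j * W_j as in the formula above.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.taps = self._conv_indices((2, 4, 6, 8))  # assumed layer mapping

    def _conv_indices(self, wanted):
        idx, n = [], 0
        for i, m in enumerate(self.vgg):
            if isinstance(m, nn.Conv2d):
                n += 1
                if n in wanted:
                    idx.append(i)
        return set(idx)

    def _feats(self, x):
        out = []
        for i, m in enumerate(self.vgg):
            x = m(x)
            if i in self.taps:
                out.append(x)
        return out

    def forward(self, ref3, fused1):
        # ref3: three bands concatenated (N,3,H,W); fused1: fused image (N,1,H,W)
        fused3 = fused1.repeat(1, 3, 1, 1)    # replicate to three channels
        loss = 0.0
        for pf, pi in zip(self._feats(ref3), self._feats(fused3)):
            c, h, w = pf.shape[1:]
            loss = loss + ((pf - pi) ** 2).sum(dim=(1, 2, 3)).div(c * h * w).mean()
        return loss
```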

SSIM loss constrains the correlation loss, luminance distortion and contrast distortion of the fused image, and is defined as follows:

L_SSIM = 1 - (ω_1·SSIM(I_fused, I_SWIR) + ω_2·SSIM(I_fused, I_MWIR) + ω_3·SSIM(I_fused, I_LWIR))

where ω denotes a weight and ω_1 + ω_2 + ω_3 = 1;

In the unsupervised image fusion task, the SSIM loss is the most commonly used loss, because its calculation comprehensively considers the luminance, contrast and structural characteristics of the images as well as the spatial structural correlation between them; this is consistent with the way the human visual system acquires structural information from a visual region and can perceive the distortion state of an image.
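A minimal sketch of the weighted SSIM term, assuming the ssim() of the third-party pytorch_msssim package and equal band weights:

```python
# Sketch of the weighted SSIM loss defined above. The band weights are free
# parameters with w1 + w2 + w3 = 1; equal weights here are an assumption.
import torch
from pytorch_msssim import ssim  # third-party package, assumed available

def ssim_loss(fused, swir, mwir, lwir, w=(1/3, 1/3, 1/3)):
    """All inputs are (N,1,H,W) tensors scaled to [0,1]."""
    s = (w[0] * ssim(fused, swir, data_range=1.0) +
         w[1] * ssim(fused, mwir, data_range=1.0) +
         w[2] * ssim(fused, lwir, data_range=1.0))
    return 1.0 - s
```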

the discriminator of FuGAN is a multi-classifier; the conventional GAN discriminator adopts a sigmoid cross entropy loss function, and the method can cause the problem of gradient disappearance in the learning process; to overcome this problem we use least squares generation countermeasure networks (LSGANs) that use the least squares loss function as the loss function of the arbiter, the loss function L of the arbiter of FuGANDfThe method is composed of three infrared source images and decision loss of a fusion image. We use Coming watchThese four losses are shown:

Considering the 1×3 vector output by the discriminator, let P_SWIR = D_Fu(x)[0], P_MWIR = D_Fu(x)[1] and P_LWIR = D_Fu(x)[2]; when the input is a short-wave infrared image, P_SWIR is expected to be close to 1 while P_MWIR and P_LWIR are close to 0; the corresponding loss is defined as:

L_SWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - a_1)^2 + (P_MWIR - a_2)^2 + (P_LWIR - a_3)^2 ]

where N is the number of pixels in the image and a_1, a_2, a_3 are probability labels; a_1 is set to 1 and a_2, a_3 are set to 0, i.e. when a short-wave infrared image is input, the discriminator D_Fu judges it to be a short-wave infrared image with high probability and a medium-wave or long-wave infrared image with low probability;

Similarly, the loss terms for medium-wave and long-wave infrared images are defined as:

L_MWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - b_1)^2 + (P_MWIR - b_2)^2 + (P_LWIR - b_3)^2 ]
L_LWIR = (1/N) Σ_{n=1}^{N} [ (P_SWIR - c_1)^2 + (P_MWIR - c_2)^2 + (P_LWIR - c_3)^2 ]

where b_2 is set to 1 and b_1, b_3 are set to 0; c_3 is set to 1 and c_1, c_2 are set to 0;

Finally, when the input image is a fused image, the loss function is defined as:

L_fused = (1/N) Σ_{n=1}^{N} [ (P_SWIR - d)^2 + (P_MWIR - d)^2 + (P_LWIR - d)^2 ]

where d is the probability label with which the discriminator D_Fu judges the fused image, set to 0; the three probability labels are likewise balanced, i.e. from the perspective of the discriminator D_Fu the fused image is, to the same degree, a pseudo short-wave, pseudo medium-wave and pseudo long-wave infrared image;
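Putting the four decision losses together, a minimal sketch of L_Df as reconstructed above (one-hot labels for the three bands, all-zero labels for the fused image):

```python
# Sketch of the FuGAN discriminator loss: least-squares decision losses on
# the three source bands and on the fused image.
import torch

def lsq(p, target):
    """Least-squares penalty between an (N,3) prediction and a 3-label target."""
    t = p.new_tensor(target)
    return ((p - t) ** 2).mean()

def d_fu_loss(d_fu, swir, mwir, lwir, fused):
    l_swir  = lsq(d_fu(swir),  (1.0, 0.0, 0.0))           # a1=1, a2=a3=0
    l_mwir  = lsq(d_fu(mwir),  (0.0, 1.0, 0.0))           # b2=1, b1=b3=0
    l_lwir  = lsq(d_fu(lwir),  (0.0, 0.0, 1.0))           # c3=1, c1=c2=0
    l_fused = lsq(d_fu(fused.detach()), (0.0, 0.0, 0.0))  # d=0 for all three
    return l_swir + l_mwir + l_lwir + l_fused
```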

As shown in Fig. 11, during training of the cascade network the loss for the DnGAN generator G_Dn and the FuGAN generator G_Fu is jointly composed of the reconstruction loss L_rec, the perceptual loss L_perc and the loss terms L_Gf that guide the FuGAN generator G_Fu in a noise-free environment:

L_G = L_rec + L_perc + L_Gf

The reconstruction loss L_rec is the mean squared error (MSE) between the output of the denoising network and the noise-free image, defined as:

L_rec = (1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} (G_Dn(x)_{i,j} - x̄_{i,j})^2

where x is the input noisy image, x̄ is the noise-free image, G_Dn(x) is the denoised image generated by DnGAN, i and j denote the row and column of a pixel, and H×W is the size of the image;
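In code this is a plain pixel-wise MSE; a one-line sketch:

```python
# Sketch of the reconstruction loss: pixel-wise mean squared error between
# the DnGAN output G_Dn(x) and the noise-free image.
import torch

def reconstruction_loss(denoised, clean):
    """denoised = G_Dn(x); clean = the noise-free image (same shape)."""
    return ((denoised - clean) ** 2).mean()
```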

The perceptual loss L_perc is defined as the corresponding FuGAN loss term;

the loss terms of the FuGAN generator form the loss function L_Gf of G_Fu;

The loss function of the DnGAN discriminator D_Dn also adopts the least-squares loss; for the denoised image the loss term is:

L_Dn,fake = (1/N) Σ_{n=1}^{N} (D_Dn(G_Dn(x)) - a_2)^2

where N is the number of pixels in the image and a_1, a_2 are probability labels; a_1 is set to 1 and a_2 to 0, i.e. the discriminator D_Dn judges the noise-free image to be real with high probability and the denoised image with low probability;

similarly, the loss term for a noise-free image is defined as:

L_Dn,real = (1/N) Σ_{n=1}^{N} (D_Dn(x̄) - a_1)^2
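Finally, a sketch of one end-to-end training step of the cascade, reusing PerceptualLoss, ssim_loss and d_fu_loss from the earlier sketches. The λ weights, the optimizer handling and the fact that the shared perceptual term simply appears in both sums are assumptions.

```python
# One cascade training step (S4), as a sketch. FuGAN is assumed to have been
# pre-trained on noise-free data; both generators are then updated jointly,
# so the fusion error also back-propagates into the DnGAN denoiser.
import torch

def train_step(g_dn, g_fu, d_dn, d_fu, opt_g, opt_d,
               noisy, clean, swir, mwir, lwir,
               perc_loss, lam1=1.0, lam2=10.0):   # lambda values are assumptions
    # ---- generator update: L_G = L_rec + L_perc + L_Gf ----
    opt_g.zero_grad()
    denoised = g_dn(noisy)                        # (N,3,H,W), one channel per band
    fused = g_fu(denoised)                        # (N,1,H,W) fused image
    l_rec = ((denoised - clean) ** 2).mean()      # reconstruction loss L_rec
    l_perc = perc_loss(clean, fused)              # perceptual loss L_perc
    l_adv = ((d_fu(fused) - 1.0) ** 2).mean()     # adversarial loss, e = 1
    l_ssim = ssim_loss(fused, swir, mwir, lwir)   # weighted SSIM loss
    l_gf = l_adv + lam1 * l_perc + lam2 * l_ssim  # FuGAN generator loss L_Gf
    (l_rec + l_perc + l_gf).backward()
    opt_g.step()

    # ---- discriminator update: least-squares losses for D_Dn and D_Fu ----
    opt_d.zero_grad()
    l_ddn = ((d_dn(clean) - 1.0) ** 2).mean() + \
            ((d_dn(g_dn(noisy).detach()) - 0.0) ** 2).mean()
    l_dfu = d_fu_loss(d_fu, swir, mwir, lwir, fused.detach())
    (l_ddn + l_dfu).backward()
    opt_d.step()
```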

the above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention by those skilled in the art should fall within the protection scope defined by the claims of the present invention without departing from the spirit of the present invention.
