Medical image registration method and system for unsupervised learning

Document No.: 1939455 | Publication date: 2021-12-07

Reading notes: this technology, 无监督学习的医学图像配准方法及系统 (Medical image registration method and system for unsupervised learning), was designed and created by 戴亚康, 周志勇, 胡冀苏, 钱旭升 and 耿辰 on 2021-08-25. Main content: The invention discloses a medical image registration method and system of unsupervised learning. The method comprises the following steps: 1) constructing a deep learning registration network comprising a spatial self-attention registration network and a multi-resolution image registration network; 2) inputting the fixed image F and the floating image M to be registered into the deep learning registration network to obtain the deformation field $\phi$ between F and M; 3) spatially transforming M with $\phi$ using trilinear interpolation to obtain the final registration result $M \circ \phi$, and using the structural-information similarity measure between $M \circ \phi$ and F, a smoothness constraint term and a Jacobian negative-value folding penalty term together as the loss function L of the deep learning registration network to guide the optimization of the network parameters. The invention requires no segmentation labels or deformation field labels prepared in advance, achieves good registration accuracy for large-deformation regions across different modalities, and registers fast enough for real-time use.

1. A medical image registration method of unsupervised learning, characterized by comprising the following steps:

1) constructing a deep learning registration network which comprises a spatial self-attention registration network and a multi-resolution image registration network;

2) inputting the image pair, namely the fixed image F and the floating image M to be registered, into the deep learning registration network to obtain the deformation field $\phi$ between the fixed image F and the floating image M;

3) based on the deformation field $\phi$, performing a spatial transformation on the floating image M using trilinear interpolation to obtain the final registration result $M \circ \phi$; during registration, the structural-information similarity measure between the registration result $M \circ \phi$ and the fixed image F, a smoothness constraint term and a Jacobian negative-value folding penalty term together serve as the loss function L of the deep learning registration network and guide the optimization of the network parameters.

2. The unsupervised learning medical image registration method according to claim 1, wherein in step 2), the image pair, namely the fixed image F and the floating image M, is down-sampled to different degrees to form a plurality of low-resolution images and input into the spatial self-attention registration network to obtain a coarse registration deformation field between the image pair; the low-resolution images are then registered through the multi-resolution image registration network to finally obtain the deformation field $\phi$ between the fixed image F and the floating image M.

3. The unsupervised learning medical image registration method of claim 2, wherein the spatial self-attention registration network comprises an encoding module, a decoding module, and a self-attention gating module;

the image pair, namely the fixed image F and the floating image M, is concatenated into a 2-channel image as the input of the spatial self-attention registration network, and a 3-channel coarse registration deformation field is finally obtained by passing through the encoding stage and the decoding stage in sequence;

wherein the encoding stage uses 3D convolution layers with a convolution kernel size of 3 and a stride of 1, each convolution being followed by a LeakyReLU activation layer; the encoding stage also uses two max pooling layers to down-sample the spatial dimensions while increasing the channel depth;

in the decoding stage, up-sampling layers, skip connection layers and convolution layers are used alternately to propagate features step by step, and the target deformation field is finally output through a convolution with a stride of 1 and a SoftSign activation layer;

wherein the skip connections pass through the self-attention gating module, so that information from different levels of the encoding and decoding stages is merged onto the spatial feature map.

4. The unsupervised learning medical image registration method according to claim 3, wherein the self-attention gating module obtains different weights along the spatial dimensions by connecting feature maps of adjacent levels and different scales from the encoding and decoding stages, thereby keeping the activations of relevant regions and removing irrelevant or noisy responses, and specifically comprises:

firstly, performing an up-sampling operation on the current feature map C of the decoding stage to obtain a feature map C' whose number of channels and image size match those of the previous feature map P;

then, applying average pooling and max pooling to P and C' along the channel axis, and adding the results to obtain an effective context feature description CF;

applying to CF a standard convolution with a convolution kernel size of 1 and a stride of 1, and normalizing the resulting attention feature map AF through a Sigmoid activation to suppress noise;

and finally, multiplying AF and P voxel by voxel to obtain a spatial attention feature map rich in context information.

5. The unsupervised medical image registration method according to claim 5, wherein in the step 2), obtaining the deformation field $\phi$ through the multi-resolution image registration network specifically comprises the following steps:

2-1) first, the input fixed image F and floating image M are each down-sampled by trilinear interpolation to 1/2 and 1/4 of the original image size, giving $F_2$, $M_2$ and $F_1$, $M_1$ respectively, i.e. $F = 2F_2 = 4F_1$ and $M = 2M_2 = 4M_1$;

2-2) the image pair $(F_1, M_1)$ is taken as the input of the first stage, and the deformation field $\phi_1$ between image $F_1$ and image $M_1$ is computed by the spatial self-attention registration network;

2-3) $\phi_1$ is up-sampled to obtain a deformation field $\hat{\phi}_1$ of the same size as the image pair $F_2$, $M_2$; $\hat{\phi}_1$ is used as a deformation field to spatially deform $M_2$, giving $M_2 \circ \hat{\phi}_1$;

2-4) the image pair $(F_2, M_2 \circ \hat{\phi}_1)$ is taken as the input of the second stage, and the deformation field $\phi_2$ between image $F_2$ and image $M_2 \circ \hat{\phi}_1$ is computed by the spatial self-attention registration network; $\hat{\phi}_1$ and $\phi_2$ are added to obtain $\phi_{12}$;

2-5) $\phi_{12}$ is up-sampled to obtain a deformation field $\hat{\phi}_{12}$ of the same size as the image pair F, M; M is spatially deformed with $\hat{\phi}_{12}$ to obtain $M \circ \hat{\phi}_{12}$;

2-6) the image pair $(F, M \circ \hat{\phi}_{12})$ is taken as the input of the third stage, and the deformation field $\phi_3$ between image F and image $M \circ \hat{\phi}_{12}$ is computed by the spatial self-attention registration network; $\hat{\phi}_{12}$ and $\phi_3$ are added to obtain the final deformation field $\phi$.

6. The unsupervised learning medical image registration method of claim 1, wherein the loss function L is expressed as:

$$L = \alpha L_{MIND} + \beta L_{smooth} + \gamma L_{Jet}$$

wherein $L_{MIND}$ is the structural-information similarity measure between the registration result $M \circ \phi$ and the fixed image F, $L_{smooth}$ is the smoothness constraint term, $L_{Jet}$ is the Jacobian negative-value folding penalty term, and α, β and γ are weights.

7. The unsupervised learning medical image registration method of claim 6, wherein α, β and γ are 10, 0.5 and 200, respectively.

8. The unsupervised learning medical image registration method of claim 6 or 7, wherein $L_{MIND}$ is calculated as follows:

3-1) the local structure of any point x in an image I is represented by a six-neighborhood: the central image block is the block of size p × p centered on point x, surrounded by six neighborhood blocks at distance r from the central block; the neighborhood structure at x is described by the Gaussian kernel distances between x and the six neighborhood image blocks; letting $x_i$ denote any image block in the six-neighborhood, the Gaussian kernel distance between x and $x_i$ is expressed as:

$$d_{gauss}(I, x, x_i) = \exp\left(-\frac{D_p(I, x, x_i)}{\sigma^2}\right)$$

wherein i = 1, 2, ..., 6, and $D_p(I, x, x_i)$ is the mean squared Euclidean distance of the image pair $(x, x_i)$, i.e. the mean squared Euclidean distance between the image block $I_p(x)$ centered on x and the image block $I_p(x_i)$ centered on $x_i$:

$$D_p(I, x, x_i) = \frac{1}{|I_p|}\left\| I_p(x) - I_p(x_i) \right\|_2^2$$

with $|I_p|$ the number of voxels in an image block;

wherein $\sigma^2$ is the expected value of the mean squared Euclidean distance over all six image pairs, i.e.:

$$\sigma^2 = \frac{1}{6}\sum_{i=1}^{6} D_p(I, x, x_i)$$

3-2) all Gaussian kernel distances are calculated, and the modality-independent neighborhood descriptor MIND is defined as:

$$MIND(I, x) = \{ d_{gauss}(I, x, x_i) \}, \quad i = 1, 2, \ldots, 6;$$

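3-3) $L_{MIND}$ is defined as:

$$L_{MIND} = \frac{1}{N\,|\Omega|}\sum_{x \in \Omega}\left\| MIND(F, x) - MIND(M \circ \phi, x) \right\|_1$$

wherein $\Omega$ is the set of image voxels and N = 6.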

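9. The unsupervised learning medical image registration method of claim 8, wherein $L_{Jet}$ is expressed as:

$$L_{Jet} = \frac{1}{M}\sum_{p}\sigma\left(-\left|J_\phi(p)\right|\right)$$

wherein M here is the total number of elements in $|J_\phi|$; $\sigma(\cdot)$ denotes a linear activation function that is linear for all positive values and zero for all negative values; and $J_\phi(p)$ denotes the Jacobian matrix of the deformation field $\phi$ at position p;

$J_\phi(p)$ is expressed as:

$$J_\phi(p) = \begin{pmatrix} \dfrac{\partial \phi_x(p)}{\partial x} & \dfrac{\partial \phi_x(p)}{\partial y} & \dfrac{\partial \phi_x(p)}{\partial z} \\ \dfrac{\partial \phi_y(p)}{\partial x} & \dfrac{\partial \phi_y(p)}{\partial y} & \dfrac{\partial \phi_y(p)}{\partial z} \\ \dfrac{\partial \phi_z(p)}{\partial x} & \dfrac{\partial \phi_z(p)}{\partial y} & \dfrac{\partial \phi_z(p)}{\partial z} \end{pmatrix}$$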

10. A medical image registration system of unsupervised learning, characterized in that it performs medical image registration using the method according to any one of claims 1-9.

Technical Field

The invention relates to the field of medical image registration, in particular to a medical image registration method and system for unsupervised learning.

Background

Existing multi-modal medical image registration methods are mostly based on iterative numerical optimization. They must repeat the numerical optimization at every iteration, so the amount of computation is huge, the computation time is too long, and real-time performance cannot be achieved. Deep learning methods offer fast inference, but they have difficulty perceiving large-deformation regions in multi-modal images and therefore struggle with large-deformation registration. Moreover, conventional deep learning methods require a large number of tissue segmentation labels or deformation field labels, which are usually difficult to obtain in practical applications.

Therefore, a more reliable solution is now needed.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a medical image registration method and system of unsupervised learning that address the above deficiencies in the prior art.

In order to solve the technical problems, the invention adopts the technical scheme that: a medical image registration method of unsupervised learning, comprising the steps of:

1) constructing a deep learning registration network which comprises a spatial self-attention registration network and a multi-resolution image registration network;

2) inputting the image pair, namely the fixed image F and the floating image M to be registered, into the deep learning registration network to obtain the deformation field $\phi$ between the fixed image F and the floating image M;

3) based on the deformation field $\phi$, performing a spatial transformation on the floating image M using trilinear interpolation to obtain the final registration result $M \circ \phi$; during registration, the structural-information similarity measure between the registration result $M \circ \phi$ and the fixed image F, a smoothness constraint term and a Jacobian negative-value folding penalty term together serve as the loss function L of the deep learning registration network and guide the optimization of the network parameters.

Preferably, in the step 2), the image pair, namely the fixed image F and the floating image M, is down-sampled to different degrees to form a plurality of low-resolution images and input into the spatial self-attention registration network to obtain a coarse registration deformation field between the image pair; the low-resolution images are then registered through the multi-resolution image registration network to finally obtain the deformation field $\phi$ between the fixed image F and the floating image M.

Preferably, the spatial self-attention registration network comprises an encoding module, a decoding module and a self-attention gating module;

the image pair, namely the fixed image F and the floating image M, is concatenated into a 2-channel image as the input of the spatial self-attention registration network, and a 3-channel coarse registration deformation field is finally obtained by passing through the encoding stage and the decoding stage in sequence;

wherein the encoding stage uses 3D convolution layers with a convolution kernel size of 3 and a stride of 1, each convolution being followed by a LeakyReLU activation layer; the encoding stage also uses two max pooling layers to down-sample the spatial dimensions while increasing the channel depth;

in the decoding stage, up-sampling layers, skip connection layers and convolution layers are used alternately to propagate features step by step, and the target deformation field is finally output through a convolution with a stride of 1 and a SoftSign activation layer;

wherein the skip connections pass through the self-attention gating module, so that information from different levels of the encoding and decoding stages is merged onto the spatial feature map.

Preferably, the self-attention gating module obtains different weights along the spatial dimensions by connecting feature maps of adjacent levels and different scales from the encoding and decoding stages, thereby keeping the activations of relevant regions and removing irrelevant or noisy responses, and specifically comprises:

firstly, performing an up-sampling operation on the current feature map C of the decoding stage to obtain a feature map C' whose number of channels and image size match those of the previous feature map P;

then, applying average pooling and max pooling to P and C' along the channel axis, and adding the results to obtain an effective context feature description CF;

applying to CF a standard convolution with a convolution kernel size of 1 and a stride of 1, and normalizing the resulting attention feature map AF through a Sigmoid activation to suppress noise;

and finally, multiplying AF and P voxel by voxel to obtain a spatial attention feature map rich in context information.

Preferably, in the step 2), obtaining the deformation field $\phi$ through the multi-resolution image registration network specifically comprises the following steps:

2-1) first, the input fixed image F and floating image M are each down-sampled by trilinear interpolation to 1/2 and 1/4 of the original image size, giving $F_2$, $M_2$ and $F_1$, $M_1$ respectively, i.e. $F = 2F_2 = 4F_1$ and $M = 2M_2 = 4M_1$;

2-2) the image pair $(F_1, M_1)$ is taken as the input of the first stage, and the deformation field $\phi_1$ between image $F_1$ and image $M_1$ is computed by the spatial self-attention registration network;

2-3) $\phi_1$ is up-sampled to obtain a deformation field $\hat{\phi}_1$ of the same size as the image pair $F_2$, $M_2$; $\hat{\phi}_1$ is used as a deformation field to spatially deform $M_2$, giving $M_2 \circ \hat{\phi}_1$;

2-4) the image pair $(F_2, M_2 \circ \hat{\phi}_1)$ is taken as the input of the second stage, and the deformation field $\phi_2$ between image $F_2$ and image $M_2 \circ \hat{\phi}_1$ is computed by the spatial self-attention registration network; $\hat{\phi}_1$ and $\phi_2$ are added to obtain $\phi_{12}$;

2-5) $\phi_{12}$ is up-sampled to obtain a deformation field $\hat{\phi}_{12}$ of the same size as the image pair F, M; M is spatially deformed with $\hat{\phi}_{12}$ to obtain $M \circ \hat{\phi}_{12}$;

2-6) the image pair $(F, M \circ \hat{\phi}_{12})$ is taken as the input of the third stage, and the deformation field $\phi_3$ between image F and image $M \circ \hat{\phi}_{12}$ is computed by the spatial self-attention registration network; $\hat{\phi}_{12}$ and $\phi_3$ are added to obtain the final deformation field $\phi$.

Preferably, the loss function L is expressed as:

$$L = \alpha L_{MIND} + \beta L_{smooth} + \gamma L_{Jet}$$

wherein $L_{MIND}$ is the structural-information similarity measure between the registration result $M \circ \phi$ and the fixed image F, $L_{smooth}$ is the smoothness constraint term, $L_{Jet}$ is the Jacobian negative-value folding penalty term, and α, β and γ are weights.

Preferably, α, β and γ are 10, 0.5 and 200, respectively.

Preferably, $L_{MIND}$ is calculated as follows:

3-1) the local structure of any point x in an image I is represented by a six-neighborhood: the central image block is the block of size p × p centered on point x, surrounded by six neighborhood blocks at distance r from the central block; the neighborhood structure at x is described by the Gaussian kernel distances between x and the six neighborhood image blocks; letting $x_i$ denote any image block in the six-neighborhood, the Gaussian kernel distance between x and $x_i$ is expressed as:

$$d_{gauss}(I, x, x_i) = \exp\left(-\frac{D_p(I, x, x_i)}{\sigma^2}\right)$$

wherein i = 1, 2, ..., 6, and $D_p(I, x, x_i)$ is the mean squared Euclidean distance of the image pair $(x, x_i)$, i.e. the mean squared Euclidean distance between the image block $I_p(x)$ centered on x and the image block $I_p(x_i)$ centered on $x_i$:

$$D_p(I, x, x_i) = \frac{1}{|I_p|}\left\| I_p(x) - I_p(x_i) \right\|_2^2$$

with $|I_p|$ the number of voxels in an image block;

wherein $\sigma^2$ is the expected value of the mean squared Euclidean distance over all six image pairs, i.e.:

$$\sigma^2 = \frac{1}{6}\sum_{i=1}^{6} D_p(I, x, x_i)$$

3-2) all Gaussian kernel distances are calculated, and the modality-independent neighborhood descriptor MIND is defined as:

$$MIND(I, x) = \{ d_{gauss}(I, x, x_i) \}, \quad i = 1, 2, \ldots, 6;$$

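3-3) $L_{MIND}$ is defined as:

$$L_{MIND} = \frac{1}{N\,|\Omega|}\sum_{x \in \Omega}\left\| MIND(F, x) - MIND(M \circ \phi, x) \right\|_1$$

wherein $\Omega$ is the set of image voxels and N = 6.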

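Preferably, $L_{Jet}$ is expressed as:

$$L_{Jet} = \frac{1}{M}\sum_{p}\sigma\left(-\left|J_\phi(p)\right|\right)$$

wherein M here is the total number of elements in $|J_\phi|$; $\sigma(\cdot)$ denotes a linear activation function that is linear for all positive values and zero for all negative values; and $J_\phi(p)$ denotes the Jacobian matrix of the deformation field $\phi$ at position p;

$J_\phi(p)$ is expressed as:

$$J_\phi(p) = \begin{pmatrix} \dfrac{\partial \phi_x(p)}{\partial x} & \dfrac{\partial \phi_x(p)}{\partial y} & \dfrac{\partial \phi_x(p)}{\partial z} \\ \dfrac{\partial \phi_y(p)}{\partial x} & \dfrac{\partial \phi_y(p)}{\partial y} & \dfrac{\partial \phi_y(p)}{\partial z} \\ \dfrac{\partial \phi_z(p)}{\partial x} & \dfrac{\partial \phi_z(p)}{\partial y} & \dfrac{\partial \phi_z(p)}{\partial z} \end{pmatrix}$$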

The invention also provides a medical image registration system of unsupervised learning, which performs medical image registration using the method described above.

The invention has the following beneficial effects: the medical image registration method of unsupervised learning provided by the invention requires no segmentation labels or deformation field labels prepared in advance, achieves good registration accuracy for large-deformation regions across different modalities, and registers fast enough for real-time use.

Drawings

FIG. 1 is a schematic frame diagram of a medical image registration method for unsupervised learning according to the present invention;

FIG. 2 is a block diagram of a spatial self-attention registration network of the present invention;

FIG. 3 is a block diagram of a self-attention gating module of the present invention;

FIG. 4 is a schematic flow diagram of a multi-resolution image registration network of the present invention;

FIG. 5 is a diagram of the structure of MIND neighborhood.

Detailed Description

The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.

It will be understood that terms such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.

Example 1

The embodiment provides a medical image registration method for unsupervised learning, which comprises the following steps:

1) constructing a deep learning registration network which comprises a spatial self-attention registration network and a multi-resolution image registration network;

2) inputting the image pair, namely the fixed image F and the floating image M to be registered, into the deep learning registration network to obtain the deformation field $\phi$ between the fixed image F and the floating image M;

wherein the image pair, namely the fixed image F and the floating image M, is down-sampled to different degrees to form a plurality of low-resolution images and input into the spatial self-attention registration network to obtain a coarse registration deformation field between the image pair; the low-resolution images are then registered through the multi-resolution image registration network to finally obtain the deformation field $\phi$ between the fixed image F and the floating image M.

Given a pair of three-dimensional images to be registered, the fixed image F and the floating image M, the aim of registration is to find a set of optimal deformation parameters $\phi$ such that the registered floating image $M \circ \phi$ is morphologically and anatomically aligned with the fixed image F. The invention builds a deep learning network model that directly estimates the deformation field between F and M, which can be expressed as:

$$\phi = f_\theta(F, M)$$

where f denotes the mapping function to be learned by the deep learning network, θ denotes the network parameters, and $\phi$ is the estimated deformation field. The optimal network parameters $\hat{\theta}$ are generally learned by training the network to maximize a similarity measure function, so the image registration process can be expressed as:

$$\hat{\theta} = \arg\max_{\theta}\left[ S(F, M \circ \phi) - R(\phi) \right]$$

where S denotes the similarity measure between the fixed image F and the registered image $M \circ \phi$, R is a regularization term added to keep $\phi$ smooth, and $\circ$ denotes the non-linear deformation operation.

3) based on the deformation field $\phi$, performing a spatial transformation on the floating image M using trilinear interpolation to obtain the final registration result $M \circ \phi$; during registration, the structural-information similarity measure between the registration result $M \circ \phi$ and the fixed image F, a smoothness constraint term and a Jacobian negative-value folding penalty term together serve as the loss function L of the deep learning registration network and guide the optimization of the network parameters.
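By way of non-limiting illustration, the trilinear spatial transformation of step 3) can be sketched in NumPy as follows. The function name `warp_trilinear`, the `(3, D, H, W)` array layout, the voxel-unit displacement convention and the clamping of out-of-range positions to the border are all assumptions of this sketch and are not specified by the invention.

```python
import numpy as np

def warp_trilinear(moving, disp):
    """Warp a 3D volume with a dense displacement field.

    moving: array of shape (D, H, W).
    disp:   array of shape (3, D, H, W); disp[:, z, y, x] is the offset, in
            voxels, added to the sampling position (assumed convention).
    """
    shape = np.array(moving.shape)
    upper = (shape - 1).reshape(3, 1, 1, 1)
    grid = np.stack(np.meshgrid(*[np.arange(n) for n in shape], indexing="ij"))
    # Sampling positions, clamped to the volume so border voxels are reused.
    pos = np.clip(grid + disp, 0, upper)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, upper)
    frac = pos - lo

    out = np.zeros(moving.shape, dtype=float)
    # Accumulate the 8 corner contributions of each sampling position.
    for cz in (0, 1):
        for cy in (0, 1):
            for cx in (0, 1):
                iz = hi[0] if cz else lo[0]
                iy = hi[1] if cy else lo[1]
                ix = hi[2] if cx else lo[2]
                wz = frac[0] if cz else 1 - frac[0]
                wy = frac[1] if cy else 1 - frac[1]
                wx = frac[2] if cx else 1 - frac[2]
                out += wz * wy * wx * moving[iz, iy, ix]
    return out
```

Because every operation is a weighted sum of voxel values, the same computation written in a deep learning framework is differentiable with respect to the displacement field, which is what allows the loss to be back-propagated through the warp during training.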

Referring to fig. 1, an overall registration framework of the present invention is shown.

Referring to fig. 2, in the present embodiment, the spatial self-attention registration network includes an encoding module, a decoding module, and a self-attention gating module;

the image pair, namely the fixed image F and the floating image M, is concatenated into a 2-channel image as the input of the spatial self-attention registration network, and a 3-channel coarse registration deformation field is finally obtained by passing through the encoding stage and the decoding stage in sequence;

wherein the encoding stage uses 3D convolution layers with a convolution kernel size of 3 and a stride of 1, each convolution being followed by a LeakyReLU activation layer with a parameter of 0.2; the encoding stage also uses two max pooling layers to down-sample the spatial dimensions while increasing the channel depth;

in the decoding stage, up-sampling layers, skip connection layers and convolution layers are used alternately to propagate features step by step, and the target deformation field is finally output through a convolution with a stride of 1 and a SoftSign activation layer;

in general, when learning the target deformation, skip connections are used along the encoding-decoding path to prevent low-level features from vanishing. In a preferred embodiment, the skip connections pass through the self-attention gating module, so that information from different levels of the encoding and decoding stages is merged onto the spatial feature map.

Referring to fig. 3, the self-attention gating module obtains different weights along the spatial dimensions by connecting feature maps of adjacent levels and different scales from the encoding and decoding stages, thereby keeping the activations of relevant regions and removing irrelevant or noisy responses, and specifically comprises:

firstly, performing an up-sampling operation on the current feature map C (current feature map) of the decoding stage to obtain a feature map C' whose number of channels and image size match those of the previous feature map P (previous feature map);

then, applying average pooling and max pooling to P and C' along the channel axis, and adding the results to obtain an effective context feature description CF (context feature);

applying to CF a standard convolution with a convolution kernel size of 1 and a stride of 1, and normalizing the resulting attention feature map AF (attention feature) through a Sigmoid activation to suppress noise;

and finally, multiplying AF and P voxel by voxel to obtain a spatial attention feature map rich in context information. Since only pooling and a convolution with a kernel size of 1 are used, the module adds almost no parameters to be optimized and can be used in deeper networks at little additional time cost.
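By way of non-limiting illustration, the data flow of the gating module (up-sample, pool along the channel axis, 1×1×1 convolution, Sigmoid, voxel-wise product) can be sketched in NumPy as follows. This is a sketch under stated assumptions: nearest-neighbour up-sampling stands in for the unspecified up-sampling operation, C is assumed to already have the same channel count as P, and the 1×1×1 convolution on the single-channel map CF reduces to a scalar `weight` and `bias` whose values here are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour up-sampling of a (C, D, H, W) feature map."""
    for axis in (1, 2, 3):
        x = np.repeat(x, factor, axis=axis)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(P, C, weight=1.0, bias=0.0):
    """Gate the previous feature map P with context from the current map C.

    P: previous (encoder) feature map, shape (Ch, D, H, W).
    C: current (decoder) feature map, shape (Ch, D/2, H/2, W/2).
    weight, bias: scalar parameters of the 1x1x1 convolution applied to the
                  single-channel context map (hypothetical values).
    """
    C_up = upsample_nearest(C)                      # C', same spatial size as P
    # Average and max pooling along the channel axis, summed into one map CF.
    CF = (P.mean(axis=0) + P.max(axis=0)
          + C_up.mean(axis=0) + C_up.max(axis=0))
    AF = sigmoid(weight * CF + bias)                # attention map in (0, 1)
    return AF[None] * P                             # voxel-wise gating of P
```

The only learnable quantities in this sketch are `weight` and `bias`, which mirrors the observation above that the module adds almost no parameters.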

The difficulty of image registration depends on how well regions with large structural differences can be aligned; such regions are usually strongly associated with large deformations and are hard to align. To further improve the network's ability to capture structural differences between images, a spatial self-attention gating module is added before the skip connection layer. By exploiting spatial and contextual information from different levels, it highlights large-deformation regions and thereby refines the deformation field.

In this embodiment, referring to fig. 4, obtaining the deformation field $\phi$ through the multi-resolution image registration network in step 2) specifically comprises the following steps:

2-1) first, the input fixed image F and floating image M are each down-sampled by trilinear interpolation to 1/2 and 1/4 of the original image size, giving $F_2$, $M_2$ and $F_1$, $M_1$ respectively, i.e. $F = 2F_2 = 4F_1$ and $M = 2M_2 = 4M_1$;

2-2) the image pair $(F_1, M_1)$ is taken as the input of the first stage, and the deformation field $\phi_1$ between image $F_1$ and image $M_1$ is computed by the spatial self-attention registration network;

2-3) $\phi_1$ is up-sampled to obtain a deformation field $\hat{\phi}_1$ of the same size as the image pair $F_2$, $M_2$; $\hat{\phi}_1$ is used as a deformation field to spatially deform $M_2$, giving $M_2 \circ \hat{\phi}_1$;

2-4) the image pair $(F_2, M_2 \circ \hat{\phi}_1)$ is taken as the input of the second stage, and the deformation field $\phi_2$ between image $F_2$ and image $M_2 \circ \hat{\phi}_1$ is computed by the spatial self-attention registration network; $\hat{\phi}_1$ and $\phi_2$ are added to obtain $\phi_{12}$;

2-5) $\phi_{12}$ is up-sampled to obtain a deformation field $\hat{\phi}_{12}$ of the same size as the image pair F, M; M is spatially deformed with $\hat{\phi}_{12}$ to obtain $M \circ \hat{\phi}_{12}$;

2-6) the image pair $(F, M \circ \hat{\phi}_{12})$ is taken as the input of the third stage, and the deformation field $\phi_3$ between image F and image $M \circ \hat{\phi}_{12}$ is computed by the spatial self-attention registration network; $\hat{\phi}_{12}$ and $\phi_3$ are added to obtain the final deformation field $\phi$.
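By way of non-limiting illustration, the coarse-to-fine control flow of steps 2-1) to 2-6) can be sketched as follows. The registration network and the warp are passed in as callables; `toy_net` and `toy_warp` are hypothetical stand-ins used only to make the sketch runnable, strided subsampling and nearest-neighbour repetition stand in for trilinear resampling, and displacements are assumed to be expressed in normalized units so that up-sampling a field does not rescale its values.

```python
import numpy as np

def downsample(vol, factor):
    """Stand-in for trilinear down-sampling: plain strided subsampling."""
    return vol[::factor, ::factor, ::factor]

def upsample_field(field, factor=2):
    """Stand-in for field up-sampling: nearest-neighbour repetition.
    Values are not rescaled (normalized-unit displacements assumed)."""
    for axis in (1, 2, 3):
        field = np.repeat(field, factor, axis=axis)
    return field

def multires_register(F, M, net, warp):
    """net(fixed, moving) -> field (3, D, H, W); warp(vol, field) -> vol."""
    F1, M1 = downsample(F, 4), downsample(M, 4)    # quarter resolution
    F2, M2 = downsample(F, 2), downsample(M, 2)    # half resolution

    phi1 = net(F1, M1)                     # stage 1 (step 2-2)
    phi1_up = upsample_field(phi1)         # step 2-3
    phi2 = net(F2, warp(M2, phi1_up))      # stage 2 refines the residual
    phi12 = phi1_up + phi2                 # step 2-4
    phi12_up = upsample_field(phi12)       # step 2-5
    phi3 = net(F, warp(M, phi12_up))       # stage 3 (step 2-6)
    return phi12_up + phi3                 # final deformation field

# Toy stand-ins: a "network" that always predicts a unit field and an
# identity warp, used only to exercise the control flow.
def toy_net(fixed, moving):
    return np.ones((3,) + fixed.shape)

def toy_warp(vol, field):
    return vol

F = np.zeros((8, 8, 8))
M = np.zeros((8, 8, 8))
phi = multires_register(F, M, toy_net, toy_warp)
print(phi.shape)  # (3, 8, 8, 8)
```

Each stage only has to estimate the residual deformation left by the coarser stage, since its floating input has already been warped by the accumulated field.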

A deep learning network inherently has a small receptive field, which hinders the registration of large deformations; such registration is also difficult to optimize directly, slow to converge, and prone to becoming trapped in local optima.

In this embodiment, the loss function L is expressed as:

$$L = \alpha L_{MIND} + \beta L_{smooth} + \gamma L_{Jet}$$

wherein $L_{MIND}$ is the structural-information similarity measure between the registration result $M \circ \phi$ and the fixed image F, $L_{smooth}$ is the smoothness constraint term, $L_{Jet}$ is the Jacobian negative-value folding penalty term, and α, β and γ are weights. In a preferred embodiment, α, β and γ are 10, 0.5 and 200, respectively.

For multi-modal image registration, the similarity measure must be free of modality-specific limitations so that it truly measures the similarity of a multi-modal image pair. To this end, the invention introduces a similarity loss based on structural information, the modality-independent neighborhood descriptor (MIND) loss. MIND is defined on non-local image blocks based on self-similarity and relies on local image structure rather than on the gray-level distribution of the image. Specifically, in this embodiment, $L_{MIND}$ is calculated as follows:

3-1) referring to FIG. 5, the local structure of any point x in an image I is represented by a six-neighborhood: the central image block is the block of size p × p centered on point x, surrounded by six neighborhood blocks at distance r from the central block; the neighborhood structure at x is described by the Gaussian kernel distances between x and the six neighborhood image blocks; letting $x_i$ denote any image block in the six-neighborhood, the Gaussian kernel distance between x and $x_i$ is expressed as:

$$d_{gauss}(I, x, x_i) = \exp\left(-\frac{D_p(I, x, x_i)}{\sigma^2}\right)$$

wherein i = 1, 2, ..., 6, and $D_p(I, x, x_i)$ is the mean squared Euclidean distance of the image pair $(x, x_i)$, i.e. the mean squared Euclidean distance between the image block $I_p(x)$ centered on x and the image block $I_p(x_i)$ centered on $x_i$:

$$D_p(I, x, x_i) = \frac{1}{|I_p|}\left\| I_p(x) - I_p(x_i) \right\|_2^2$$

with $|I_p|$ the number of voxels in an image block;

wherein $\sigma^2$ is the expected value of the mean squared Euclidean distance over all six image pairs, i.e.:

$$\sigma^2 = \frac{1}{6}\sum_{i=1}^{6} D_p(I, x, x_i)$$

3-2) all Gaussian kernel distances are calculated, and the modality-independent neighborhood descriptor MIND is defined as:

$$MIND(I, x) = \{ d_{gauss}(I, x, x_i) \}, \quad i = 1, 2, \ldots, 6;$$

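3-3) $L_{MIND}$ is defined as:

$$L_{MIND} = \frac{1}{N\,|\Omega|}\sum_{x \in \Omega}\left\| MIND(F, x) - MIND(M \circ \phi, x) \right\|_1$$

wherein $\Omega$ is the set of image voxels and N is the number of neighborhood blocks; this embodiment adopts a six-neighborhood, so N = 6. An eight-neighborhood, a sixteen-neighborhood and so on may of course also be adopted.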
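By way of non-limiting illustration, steps 3-1) to 3-3) can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the implementation of the invention: cubic p × p × p blocks with odd p, periodic (wrap-around) borders via `np.roll`, a small `eps` added to $\sigma^2$ for numerical safety, and a loss taken as the mean absolute difference between the two descriptors; the function names are hypothetical.

```python
import numpy as np

SIX_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def box_mean(vol, p):
    """Mean over a p x p x p block around each voxel (periodic borders)."""
    half = p // 2
    out = vol
    for axis in range(3):
        out = sum(np.roll(out, s, axis=axis)
                  for s in range(-half, half + 1)) / (2 * half + 1)
    return out

def mind(img, r=1, p=3, eps=1e-8):
    """Six-channel MIND descriptor of a 3D image, shape (6, D, H, W)."""
    dists = []
    for dz, dy, dx in SIX_NEIGHBOURS:
        shifted = np.roll(img, (r * dz, r * dy, r * dx), axis=(0, 1, 2))
        # D_p: mean squared difference between the blocks at x and x_i.
        dists.append(box_mean((img - shifted) ** 2, p))
    D = np.stack(dists)
    sigma2 = D.mean(axis=0, keepdims=True)   # mean of D_p over the six pairs
    return np.exp(-D / (sigma2 + eps))       # Gaussian kernel distances

def mind_loss(fixed, warped, **kw):
    """Mean absolute difference between the two MIND descriptors."""
    return np.mean(np.abs(mind(fixed, **kw) - mind(warped, **kw)))
```

Because $D_p$ and $\sigma^2$ scale by the same factor under an affine change of intensity, the descriptor is unchanged when an image is rescaled, offset or contrast-inverted, which illustrates why the measure suits multi-modal image pairs.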

During image registration, not all voxels undergo the same amount of deformation, and severely deformed voxels can lead to folding or tearing. To reduce such effects, the invention uses a dynamic folding penalty term, the Jacobian negative-value folding penalty term, to further constrain the deformation.

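Specifically, the Jacobian negative-value folding penalty term $L_{Jet}$ is expressed as:

$$L_{Jet} = \frac{1}{M}\sum_{p}\sigma\left(-\left|J_\phi(p)\right|\right)$$

wherein M here is the total number of elements in $|J_\phi|$; $\sigma(\cdot)$ denotes a linear activation function that is linear for all positive values and zero for all negative values, and in this embodiment $\sigma(\cdot)$ is set to the ReLU function; $J_\phi(p)$ denotes the Jacobian matrix of the deformation field $\phi$ at position p;

$J_\phi(p)$ is expressed as:

$$J_\phi(p) = \begin{pmatrix} \dfrac{\partial \phi_x(p)}{\partial x} & \dfrac{\partial \phi_x(p)}{\partial y} & \dfrac{\partial \phi_x(p)}{\partial z} \\ \dfrac{\partial \phi_y(p)}{\partial x} & \dfrac{\partial \phi_y(p)}{\partial y} & \dfrac{\partial \phi_y(p)}{\partial z} \\ \dfrac{\partial \phi_z(p)}{\partial x} & \dfrac{\partial \phi_z(p)}{\partial y} & \dfrac{\partial \phi_z(p)}{\partial z} \end{pmatrix}$$

wherein x, y and z here denote directions, i.e. the x-axis, y-axis and z-axis directions.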

The Jacobian matrix of the deformation field is the second-order tensor of the derivatives of the deformation in the three directions, and its determinant can be used to analyze the local state of the deformation field. For example, a positive $|J_\phi(p)|$ at point p means that the deformation preserves orientation in the neighborhood of p. Conversely, a negative $|J_\phi(p)|$ at point p indicates that the field folds within the neighborhood of p, destroying the normal topology. The invention exploits this fact by applying an anti-folding penalty to voxels whose Jacobian determinant is negative, so that regions with negative determinants are penalized while regions with positive determinants are almost unaffected. Further, in this embodiment the smoothness constraint term $L_{smooth}$ is used jointly, so that the overall deformation is kept as smooth as possible while folding is counteracted.
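By way of non-limiting illustration, the folding penalty can be sketched in NumPy as follows. The sketch assumes that the network output is a displacement field in voxel units, so that the deformation is $\phi(p) = p + u(p)$ and its Jacobian is the identity plus the gradient of u; the function names, the `(3, D, H, W)` layout, the identification of the three directions with array axes 0, 1, 2, and the use of finite differences via `np.gradient` are likewise assumptions of this sketch.

```python
import numpy as np

def jacobian_determinant(disp):
    """Determinant of the Jacobian of phi(p) = p + disp(p) at every voxel.

    disp: displacement field of shape (3, D, H, W), in voxel units
          (assumed convention).
    """
    # grads[i][j] = d disp_i / d axis_j, by finite differences.
    grads = [np.gradient(disp[i]) for i in range(3)]
    J = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    return np.linalg.det(J)

def jacobian_folding_penalty(disp):
    """Mean of ReLU(-det J) over all voxels: zero wherever det J >= 0."""
    det = jacobian_determinant(disp)
    return np.mean(np.maximum(0.0, -det))

# Identity deformation: determinant 1 everywhere, no penalty.
zero = np.zeros((3, 4, 4, 4))
print(jacobian_folding_penalty(zero))  # 0.0
```

Only voxels with a negative determinant contribute to the mean, which matches the statement above that regions with positive determinants are almost unaffected by this term.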

Example 2

The present embodiment provides a medical image registration system for unsupervised learning, which performs registration of medical CT and MR images using the method of embodiment 1.

Although embodiments of the invention have been disclosed above, the invention is not limited to the applications listed in the description and the embodiments; it is fully applicable to any field suited to it, and those skilled in the art can readily make further modifications. The invention is therefore not limited to the specific details described, provided the general concept defined by the claims and their equivalents is not departed from.
