Remote sensing image road extraction method based on graph convolution

Document No.: 616059. Publication date: 2021-05-07.

Reading note: This technology, "Remote sensing image road extraction method based on graph convolution", was designed by 陶敏玉, 迟远英, 丁治明 and 杨博文 on 2021-01-16. Its main content is as follows. The invention discloses a remote sensing image road extraction method based on graph convolution. Graph convolution aggregates feature information between adjacent nodes and extracts features over a larger neighborhood of each node, which effectively addresses the loss of local position information. The designed method can be regarded as multi-task learning. First, a CNN extracts features from the remote sensing image. Second, a graph structure model, consisting of nodes and the corresponding edge relations, is built on the road features extracted by the CNN: the road feature information extracted by the CNN branch is treated as the nodes, and the degree of difference between nodes is treated as the edges. Local position information is obtained by learning the relations between nodes. By using graph convolution, the invention addresses the loss of local road position information caused by the generalization effect of convolutional neural networks and thereby effectively improves road segmentation accuracy.

1. A remote sensing image road extraction method based on a graph convolution model, characterized by comprising the following steps:

Step S1: designing a feature extraction branch based on a CNN model to extract features from the remote sensing image;

Step S2: designing a local information capture branch based on a GCN model, and generating the corresponding graph nodes from the feature information extracted by the CNN model;

Step S3: completing the overall design with the designed road extraction network based on the graph convolution model, thereby extracting roads from the high-resolution remote sensing image and obtaining the result.

According to the road characteristics in the remote sensing image, road extraction is realized with a GCN-based model. The whole framework adopts an encoder-decoder structure. The encoder stage is divided into a road extraction branch based on a CNN model and a feature extraction branch based on a GCN. First, based on the road characteristics, ResNet34 pre-trained on the ImageNet data set is used as the encoder with its last fully connected layer removed, and this improved ResNet34 extracts the features related to road information. The extracted road feature information is then used as the input of the GCN model: a graph structure is built from the features, with the feature information treated as nodes and the degree of difference between nodes treated as edges. The designed graph convolution model learns the relations between the nodes of the graph structure to obtain local position information. The whole network adopts multi-task joint learning to extract road information from the high-resolution remote sensing image, thereby improving road segmentation accuracy.

2. The remote sensing image road extraction method based on the graph convolution model as claimed in claim 1, characterized in that: S1 designs the CNN-based feature extraction branch. The encoder of the road extraction network is designed on the ResNet34 network structure with its fully connected layer removed. ResNet34 is mainly built from bottleneck layers, in which two convolution layers respectively reduce and restore the feature dimension. Skip connections fuse shallow information with deep features, which addresses the vanishing gradient problem during training.

The improved ResNet34 serves as the feature extraction branch of the designed network. The original image is the input of this branch, which extracts the road-related feature information in the remote sensing image. The branch is expressed by formula (1).

Y=g(x) (1)

Here x is the input original image and the function g represents a series of convolution and pooling operations. After CNN feature extraction by g, the road feature information Y is obtained.

3. The remote sensing image road extraction method based on the graph convolution model as claimed in claim 1, characterized in that: S2 designs the GCN local feature capture branch. The GCN, as a novel network architecture, captures local information and converts the road extraction problem into a graph node classification problem.

To capture the relations between the features extracted by the CNN, a graph structure is built from the road feature information extracted by the CNN branch. The graph structure consists of nodes and edges and is written G = (V, E), where V is the set of nodes and E is the set of edges. A node in V represents feature information extracted by the CNN, and an edge in E represents the degree of difference between nodes. The adjacency matrix A of the graph represents the degree of difference between each node and the other nodes in the graph structure.

The adjacency matrix of the graph structure is initialized with a Gaussian kernel function. Several graph convolution layers then exchange information between nodes through message propagation, so that features are extracted over a neighborhood and local information is not lost.

The remote sensing image is input into the CNN feature extraction branch, and after several convolution layers the feature information in the image is obtained and used as the node feature input of the GCN. In the GCN-based branch, the adjacency matrix A generated from the CNN features and the features x are taken as input. The graph convolution operation is shown in formula (2), where $g_\theta$ is the convolution kernel, $\theta_0$ and $\theta_1$ are convolution parameters, and $I_N$ is the identity matrix.

$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I_N) x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x \quad (2)$$

Here L is the Laplacian matrix of the graph, A is the relation matrix pre-generated from the feature information, and D is the degree matrix of the vertices. Substituting the normalized Laplacian $L = I_N - D^{-1/2} A D^{-1/2}$ yields the final form of formula (2).

Setting $\theta = \theta_0 = -\theta_1$ merges the two parameters into a single parameter $\theta$, which gives convolution formula (3).

$$g_\theta \star x \approx \theta \left( I_N + D^{-1/2} A D^{-1/2} \right) x \quad (3)$$

Because the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ lie in [0, 2], repeated application leads to problems such as numerical instability and exploding or vanishing gradients. The normalization in (4) is therefore applied, where $A' = A + I_N$ and $D'_{ii} = \sum_j A'_{ij}$.

$$I_N + D^{-1/2} A D^{-1/2} \rightarrow D'^{-1/2} A' D'^{-1/2} \quad (4)$$

In summary, the graph convolution operation is given by the following formula.

$$H^{(l+1)} = \delta\left( D'^{-1/2} A' D'^{-1/2} H^{(l)} \theta^{(l)} \right)$$

Here $H^{(l)}$ is the feature information output by the l-th graph convolution layer and also the input of the (l+1)-th layer. When l = 0, $H^{(0)} = X$. $\theta$ is a learnable weight and $\delta$ is the activation function; ReLU is chosen to provide the nonlinearity. During GCN training, information is exchanged between all nodes through the relation (adjacency) matrix A: each node gathers the feature information of its surroundings, and its state is updated by the linear transformation W (the learnable weight, written $\theta$ above).

The graph convolution operation plays a role analogous to the convolution and pooling layers of ordinary convolution: it perceives the feature information in the remote sensing image locally, aggregates the local information, and exchanges information between different nodes so as to obtain global information. The maximum range of other nodes from which the current node can receive information is the receptive field of the graph. The designed GCN has two layers.

4. The remote sensing image road extraction method based on the graph convolution model as claimed in claim 1, characterized in that: in S3, features are extracted by the CNN branch, and the relations between features are then captured by the GCN branch. The relations between the nodes of the graph structure are propagated through two graph convolution layers, the resulting deep feature information is up-sampled, and the final road segmentation image is output by the decoder.

Technical Field

The invention relates to the field of remote sensing image road extraction, in particular to an automatic remote sensing image road extraction method based on a graph convolution model.

Background

Roads are basic infrastructure and play an important role in geography, the economy, the military and other fields. Extracting key roads from high-resolution remote sensing images yields valuable geographic information that is otherwise hard to obtain, provides a basis for subsequent tasks such as vehicle navigation, city planning, road network updating and disaster relief support, and is of great significance for urban development.

Road extraction from remote sensing images has been studied for decades. However, roads have a distinctive structure that differs from common segmentation targets, and the background of remote sensing images is complex, so road extraction is easily disturbed by background factors. In recent years in particular, as the resolution of aerial images has improved, some buildings show spectral values similar to roads and are easily mistaken for roads. Road extraction from high-resolution remote sensing images therefore remains difficult.

Current road extraction methods fall mainly into traditional methods and deep learning methods. Traditional methods mainly include spectral analysis, edge detection and threshold segmentation. For example, Shi et al. use adaptive neighborhoods to classify the spectral feature space into road and non-road, but the method requires training an SVM for each input image and is not suitable for complex intersections. Huilin et al. propose segmenting the image into a binary image containing only target and background based on gray-level features, but the method is not fast enough and has difficulty separating irrelevant features. Gaetano et al. first extract edges with the Canny operator and then post-process with graph cut theory to obtain the final road map. These traditional methods mostly extract road features in an unsupervised way and depend strongly on model parameters. Most of them need some manual interaction, so their degree of automation is low, and they do not generalize well across different remote sensing image data sets.

With the rapid development of artificial intelligence, deep learning has become the trend and has achieved breakthroughs in image analysis. Convolutional neural networks can handle not only object detection and image classification but also fine-grained inference such as semantic segmentation, and more and more studies use computer vision algorithms to extract roads automatically. Wei et al. add road structure information to the loss function of an FCN architecture, improving the road segmentation result. Kestur et al. propose the UFCN (U-shaped FCN) model for road extraction from low-altitude UAV remote sensing images; the model consists of a stack of convolutions and a corresponding mirrored stack of deconvolutions, and preserves local feature information through skip links. Zhang et al. propose a road extraction network that combines the advantages of ResNet and U-Net; the network is built from residual units, which simplifies the training of deep networks. Finin et al. propose extracting roads with a deep neural network to obtain predicted mask images and vectors of individual roads, and post-processing with the vectors. MniCoswtea et al. propose extracting roads from remote sensing images in three successive steps: first, initial road results are extracted by combining several UNet networks; second, based on the fused road map from the first stage, an optimization algorithm generates road vectors with the corresponding thickness; finally, missing connections are added based on an inference graph to improve extraction accuracy.

The road extraction methods above are generally built on a simple fully convolutional network or on a combination of several networks. They guarantee pixel-level prediction accuracy to some extent and improve the accuracy of road extraction, but local information is easily lost through repeated ordinary convolution and pooling layers, which ultimately degrades the road segmentation result.

Therefore, given the shortcomings of current road segmentation networks, the invention introduces a GCN model to capture local position information. The CNN branch first extracts road feature information, and the adjacency matrix is initialized with a Gaussian kernel function from the features extracted by the CNN. Two graph convolution layers then learn the relations between the nodes to obtain local position information, which effectively improves road segmentation accuracy.

Disclosure of Invention

The invention aims to provide a high-accuracy road extraction method based on a graph convolution model, which realizes automatic extraction of roads in remote sensing images and solves the problems mentioned in the background technology.

In order to achieve the purpose, the invention provides the following technical scheme: a road extraction method based on a graph convolution model comprises the following steps:

Step S1: designing a feature extraction branch based on a CNN model to extract features from the remote sensing image;

Step S2: designing a local information capture branch based on a GCN model, and generating the corresponding graph nodes from the feature information extracted by the CNN model;

Step S3: completing the overall design with the designed road extraction network based on the graph convolution model, thereby extracting roads from the high-resolution remote sensing image and obtaining the result.

Fig. 1 is the overall architecture diagram of the invention. Given the road characteristics in remote sensing images, most existing road extraction methods obtain feature information by stacking convolution and pooling layers, which enlarges the receptive field and enables extraction from high-resolution remote sensing images. The framework of the invention adopts an encoder-decoder structure. The encoder stage comprises a road extraction branch based on a CNN model and a feature extraction branch based on a GCN. First, based on the road characteristics, ResNet34 pre-trained on the ImageNet data set is used as the encoder with its last fully connected layer removed, and this improved ResNet34 extracts the features related to road information. The extracted road feature information is then used as the input of the GCN model: a graph structure is built from the features, with the feature information treated as nodes and the degree of difference between nodes treated as edges. The designed graph convolution model learns the relations between the nodes of the graph structure to obtain local position information. The whole network adopts multi-task joint learning to extract road information from the high-resolution remote sensing image, which effectively improves road segmentation accuracy.

As a further scheme of the invention:

Step S1 designs the CNN-based feature extraction branch. The ResNet34 network structure is used as the encoder of the road extraction network, the main modification being the removal of its fully connected layer. ResNet34 is mainly built from bottleneck layers, whose structure is shown in Fig. 2: two convolution layers respectively reduce and restore the feature dimension, the main purpose being to reduce the number of parameters while increasing the number of network layers. The module uses skip connections to fuse shallow information with deep features, which effectively addresses the vanishing gradient problem during training.

The improved ResNet34 serves as the feature extraction branch of the designed network. The original image is the input of this branch, which extracts the road-related feature information in the remote sensing image. The branch is expressed by formula (1).

Y=g(x) (1)

Here x is the input original image and the function g represents a series of convolution and pooling operations. After CNN feature extraction by g, the road feature information Y is obtained.

As a further scheme of the invention:

Because roads have a distinctive structure, usually slender and spanning the whole image, obtaining the global context information of the image is important. Many current methods obtain context information by adding an attention mechanism or by repeatedly stacking convolution layers, but such approaches are prone to losing local information and therefore road information. Step S2 of the invention designs the GCN local feature capture branch to address this.

Compared with a traditional CNN, a GCN can operate on data with arbitrary non-Euclidean structure and handles graph-structured data effectively by modeling the relations between samples. A GCN works by aggregating the features of adjacent nodes according to the adjacency matrix, which enlarges the receptive field and addresses the loss of local information, so that road-related information in the remote sensing image is captured more accurately. Because the GCN avoids losing a large amount of local information, more road information is available to the road extraction task and the extraction result improves markedly. The invention proposes capturing local information with a GCN and treating road extraction as a graph node classification problem.

To capture the relations between the features extracted by the CNN, a graph structure is first built from the road feature information extracted by the CNN branch. The graph structure consists of nodes and edges and is written G = (V, E), where V is the set of nodes and E is the set of edges. A node in V represents feature information extracted by the CNN, and an edge in E represents the degree of difference between nodes. The adjacency matrix A of the graph represents the degree of difference between each node and the other nodes in the graph structure.

The adjacency matrix of the graph structure is initialized with a Gaussian kernel function. Several graph convolution layers then exchange information between nodes through message propagation, so that features can be extracted over a larger neighborhood and local information is not lost.
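The Gaussian-kernel initialization described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the kernel formula, so the common form exp(-||f_i - f_j||^2 / (2 sigma^2)) is assumed, and the function name `gaussian_adjacency`, the bandwidth `sigma` and the toy features are hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
import math


def gaussian_adjacency(features, sigma=1.0):
    """Build a dense adjacency matrix from node feature vectors.

    Assumed kernel: A[i][j] = exp(-||f_i - f_j||^2 / (2 * sigma^2)),
    so nodes with similar features get weights near 1 and nodes with
    very different features get weights near 0.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            adj[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return adj


# Three toy nodes: the first two are close in feature space, the third is far.
nodes = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
A = gaussian_adjacency(nodes, sigma=1.0)
```

With this form the matrix is symmetric and each diagonal entry is 1, since a node has zero distance to itself. The choice of `sigma` controls how quickly edge weights decay with feature difference.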

The graph convolution process is described in detail next. The remote sensing image is input into the CNN feature extraction branch, and after several convolution layers the important feature information in the image is obtained and used as the node feature input of the GCN. In the GCN-based branch, the adjacency matrix A generated from the CNN features and the features X are taken as input. The graph convolution operation is shown in formula (2), where $g_\theta$ is the convolution kernel, $\theta_0$ and $\theta_1$ are convolution parameters, and $I_N$ is the identity matrix.

$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I_N) x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x \quad (2)$$

Here L is the Laplacian matrix of the graph, A is the relation matrix pre-generated from the feature information, and D is the degree matrix of the vertices. Substituting the normalized Laplacian $L = I_N - D^{-1/2} A D^{-1/2}$ yields the final form of equation (2).

Setting $\theta = \theta_0 = -\theta_1$ merges the two parameters into a single parameter $\theta$, which gives convolution equation (3).

$$g_\theta \star x \approx \theta \left( I_N + D^{-1/2} A D^{-1/2} \right) x \quad (3)$$

Because the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ lie in [0, 2], repeated application leads to problems such as numerical instability and exploding or vanishing gradients. The normalization technique in (4) is therefore used, where $A' = A + I_N$ and $D'_{ii} = \sum_j A'_{ij}$.

$$I_N + D^{-1/2} A D^{-1/2} \rightarrow D'^{-1/2} A' D'^{-1/2} \quad (4)$$

In summary, the graph convolution operation adopted here is given by the following formula.

$$H^{(l+1)} = \delta\left( D'^{-1/2} A' D'^{-1/2} H^{(l)} \theta^{(l)} \right)$$

Here $H^{(l)}$ is the feature information output by the l-th graph convolution layer and also the input of the (l+1)-th layer. When l = 0, $H^{(0)} = X$. $\theta$ is a learnable weight and $\delta$ is the activation function; ReLU is chosen here to provide the nonlinearity. During GCN training, information is exchanged between all nodes through the relation (adjacency) matrix A: each node gathers the feature information of its surroundings, and its state is updated by the linear transformation W (the learnable weight, written $\theta$ above).
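As an illustration of the normalization with A' = A + I_N and the layer-wise propagation described above, the following minimal sketch runs two graph convolution layers on a toy graph. It is not the patented network: the helper names, the toy adjacency matrix, the feature matrix and the weights `w0` and `w1` are all hypothetical, and leaving the second layer without an activation is an assumption made here. Plain Python lists are used so the sketch has no dependencies.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D'^{-1/2} A' D'^{-1/2} with A' = A + I and D'_ii = sum_j A'_ij."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(a_norm, h, weight, activate=True):
    """One graph convolution: a_norm @ h @ weight, optionally followed by ReLU."""
    out = matmul(matmul(a_norm, h), weight)
    return relu(out) if activate else out


# Toy path graph with three nodes (0 - 1 - 2) and two features per node.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0 = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical first-layer weights
w1 = [[1.0], [1.0]]             # hypothetical second-layer weights

a_norm = normalize_adjacency(adj)
h1 = gcn_layer(a_norm, x, w0)                   # first layer, with ReLU
h2 = gcn_layer(a_norm, h1, w1, activate=False)  # second layer, one score per node
```

After two layers, each node's output depends on nodes up to two hops away, which matches the idea of the graph's receptive field growing with depth.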

The graph convolution operation plays a role analogous to the convolution and pooling layers of ordinary convolution: it perceives the feature information in the remote sensing image locally, aggregates the local information, and exchanges information between different nodes so as to obtain global information. The maximum range of other nodes from which the current node can receive information is the receptive field of the graph. Unlike a pooling layer, however, graph convolution involves no down-sampling, so local information is not lost. Experiments also show that a GCN should not be too deep, so the GCN designed here has only two layers.

As a further scheme of the invention:

In step S3, features are extracted by the CNN branch, and the relations between features are then captured by the GCN branch. The relations between the nodes of the graph structure are propagated through two graph convolution layers, the result is up-sampled, and the final road segmentation image is output by the decoder.

Because the GCN model takes the place of convolution and pooling layers, the receptive field is enlarged and no local position information is lost during graph convolution. The graph-convolution-based road segmentation network proposed by the invention can therefore acquire denser context information, segment roads in high-resolution remote sensing images better, and produce a more faithful and more accurate road extraction result.

Drawings

Fig. 1 is an overall architecture diagram of the present invention.

Fig. 2 is the bottleneck network architecture.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The specific implementation of this experiment is as follows:

Step S1: constructing the experimental data set. The experiments are trained mainly on the Massachusetts data set, which contains 1108 training images, 14 validation images and 49 test images, each 1500 x 1500 pixels. The data set consists of aerial remote sensing images covering more than 2634 square kilometers, and the imaged areas include urban, suburban and rural regions.

Step S2: constructing the network. The improved ResNet34 is adopted as the CNN branch, which extracts the features of the high-resolution remote sensing image. It consists mainly of bottleneck modules, which reduce the number of network parameters and deepen the network, so that deeper feature information is obtained. The feature information extracted by the CNN branch is used as the nodes of a graph structure, and the adjacency matrix of the graph structure is initialized with a Gaussian kernel function. Two graph convolution layers then exchange information between nodes through message propagation, so that features can be extracted over a larger neighborhood and local information is not lost. The constructed network adopts multi-task learning: after the features are extracted and the relations among them are obtained, the resulting feature information is up-sampled step by step, and the road segmentation result is finally output by the decoder.

Step S3: augmenting the data set. Because the Massachusetts data set has few test images, data augmentation, including horizontal, vertical and diagonal flipping of the images, is also applied to the test data set to make prediction more robust. These operations increase the size and variability of the data set, so that a network trained on the data generalizes better and performs better on new remote sensing image data sets.
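The three flips named above can be sketched on an image stored as a list of rows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, "diagonal flip" is assumed to mean a transpose across the main diagonal, and a 2 x 2 tile stands in for a 1500 x 1500 image.

```python
def flip_horizontal(img):
    """Mirror left-right by reversing every row."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror top-bottom by reversing the order of the rows."""
    return [row[:] for row in img[::-1]]


def flip_diagonal(img):
    """Mirror across the main diagonal (transpose); assumed meaning of 'diagonal flip'."""
    return [list(col) for col in zip(*img)]


def augmented_views(img):
    """Return the original image followed by its three flipped views."""
    return [img, flip_horizontal(img), flip_vertical(img), flip_diagonal(img)]


tile = [[1, 2],
        [3, 4]]
views = augmented_views(tile)
```

Each of these flips is its own inverse, so applying the same flip to a prediction made on a flipped view maps it back to the original orientation, should the predictions need to be compared or merged.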

Step S4: training the network. To avoid over-fitting, data augmentation such as image translation, horizontal flipping, vertical flipping and multi-scale transformation is applied to the training images during training, and the network is trained with the PyTorch framework. Road segmentation suffers from an imbalance between positive and negative samples, that is, road pixels make up a far smaller proportion of the image than background pixels, so important information is easily lost over the iterations. The network is therefore trained with Dice loss, and the parameters are updated adaptively with the Adam optimizer. The training batch size is set to 8 and training runs in parallel on four 2080 Ti GPUs. The learning rate is initially set to 3e-4, and whenever the loss has failed to decrease more than 3 times it is divided by 5, until the model converges.
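The Dice loss and the learning-rate rule described above can be sketched as follows. This is an illustration under assumptions, not the authors' training code: the smoothing term `eps`, the class name `PlateauDivider` and the choice to count an epoch as "not decreasing" whenever the loss fails to beat the best value so far are all hypothetical. Predictions and targets are flat lists of per-pixel values in [0, 1].

```python
def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss: 1 - (2 * sum(p * t) + eps) / (sum(p) + sum(t) + eps).

    Because it is driven by the overlap with the road pixels, it is less
    dominated by the large background class than a per-pixel average.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


class PlateauDivider:
    """Divide the learning rate by `factor` once the loss has stalled
    for more than `patience` consecutive checks (hypothetical helper)."""

    def __init__(self, lr=3e-4, patience=3, factor=5.0):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Record one loss value and return the (possibly reduced) learning rate."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale > self.patience:
                self.lr /= self.factor
                self.stale = 0
        return self.lr


perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0])
wrong = dice_loss([0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0])

scheduler = PlateauDivider()
for epoch_loss in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]:
    current_lr = scheduler.step(epoch_loss)
```

In a PyTorch training loop the same schedule is usually obtained from a built-in plateau scheduler rather than a hand-written class; the class here only makes the rule explicit.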

Step S5: testing the network. Road segmentation is defined as a binary classification of pixels, with road pixels positive and background pixels negative. Accuracy, precision, recall and F1-score are used to evaluate the predictions. Accuracy is the percentage of correctly predicted samples among all samples. Precision is the probability that a sample predicted as road is actually road. Recall is the probability that an actual road pixel is predicted as road. Because precision and recall pull against each other, F1-score, their harmonic mean, takes both into account and balances them.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Here TP, TN, FN and FP denote the numbers of true positive, true negative, false negative and false positive pixels in the predicted image, respectively.
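The four evaluation measures can be computed from pixel counts as in the sketch below. The function names and the toy label lists are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the source. Labels are flat lists with 1 for road and 0 for background.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for binary pixel labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def segmentation_metrics(pred, truth):
    """Return accuracy, precision, recall and F1-score as a dict."""
    tp, tn, fp, fn = confusion_counts(pred, truth)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2.0 * precision * recall, precision + recall),
    }


# Six toy pixels: two correct road pixels, one false alarm, one missed road pixel.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
metrics = segmentation_metrics(pred, truth)
```

On road data, where background pixels dominate, accuracy alone can look high even when many road pixels are missed, which is why precision, recall and F1-score are reported alongside it.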

The main innovation of the invention is that graph convolution is used to obtain the relations between pieces of feature information, so that local position information can be captured effectively. The results on the Massachusetts data set show that the road extraction network architecture designed by the invention performs well.

In the description of the present invention, "a plurality" means two or more unless otherwise specified.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof; the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein; any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
