Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system

Document No.: 1739010. Publication date: 2019-12-20.

Reading note: This technology, "Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system", was created by 陈珺, 王干北, 龚文平, 罗林波, 程展 and 王永涛 on 2019-07-25. Main content: The invention provides a Mask R-CNN-based method and system for calculating building areas from unmanned aerial vehicle (UAV) images. First, a UAV collects multiple images of a preselected area, all of the same size. The images are screened, and those that contain no buildings at all are deleted. The buildings in the remaining images are annotated, and the annotated images form the training image set. A complete satellite image data set is prepared. Using the Mask R-CNN algorithm, the model is first pre-trained on the satellite image data set to obtain an initial model, which is then trained on the training image set to obtain the final segmentation model. A UAV then collects images of the area to be measured; the images are stitched into a panoramic image, which is down-sampled and cropped into small images. The segmentation model processes the small images, the number of pixels contained in each marked building is counted, and the area of each building in the image is calculated according to the actual situation.

1. An unmanned aerial vehicle image building area calculation method based on Mask R-CNN is characterized by comprising the following steps:

S1, collecting a plurality of images of a preselected area by using an unmanned aerial vehicle, wherein all images have the same size;

S2, screening the images acquired in S1 and deleting the images that contain no buildings at all;

S3, annotating the images remaining after S2 one by one, marking the buildings in each image, and taking the annotated images as a training image set;

S4, preparing a satellite image data set comprising a plurality of satellite images in which the buildings have been annotated;

S5, adopting the Mask R-CNN algorithm, first pre-training on the satellite image data set, and obtaining an initial model once the pre-training is completed;

S6, training the initial model with the training image set for multiple rounds until the model converges, to obtain a final segmentation model;

S7, collecting images of the area to be measured by using an unmanned aerial vehicle, stitching the collected images into a panoramic image, down-sampling the panoramic image, and cropping it into a plurality of small images of the same size;

S8, processing all the small images obtained in S7 with the segmentation model so as to mark the buildings in each small image;

S9, counting the number of pixels contained in each marked building;

and S10, setting the unit pixel area represented by each pixel according to the actual situation, and calculating the area of each building in the image.

2. The Mask R-CNN-based unmanned aerial vehicle image building area calculation method of claim 1, wherein the image collected in S1 is a three-channel RGB image.

3. The Mask R-CNN-based unmanned aerial vehicle image building area calculation method of claim 1, wherein in S3, building labeling is performed on the image by using labelme software.

4. The Mask R-CNN-based unmanned aerial vehicle image building area calculation method of claim 1, wherein the calculation in S10 multiplies the unit pixel area by the number of pixels.

5. A Mask R-CNN-based unmanned aerial vehicle image building area calculation system, characterized by comprising: a processor and a storage device; the processor loads and executes instructions and data in the storage device to implement the Mask R-CNN-based unmanned aerial vehicle image building area calculation method as claimed in any one of claims 1-4.

Technical Field

The invention belongs to the field of geographic information science, and particularly relates to a Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system.

Background

In recent years, geological disasters have occurred more and more frequently as the climate changes. Landslides, debris flows, floods, and the like in particular threaten human life and property, so precautionary measures must be taken in advance. In areas where geological disasters are likely to occur, such as landslide zones, property is assessed first, and corresponding measures are then taken according to its value. Property value is typically assessed in terms of building area. The traditional approach is generally manual, which is time-consuming and inefficient. Remote sensing technology has developed rapidly in recent years, and extracting buildings from remote sensing images has become a trend. Although remote sensing has made great breakthroughs, the resolution of remote sensing images is still too low; using them to extract buildings and calculate areas introduces large errors, which may adversely affect the assessment of house value. Unmanned aerial vehicle technology has also developed rapidly in recent years, with great progress in flight range and payload in particular. There are two reasons for using drone images for property assessment. First, images with centimeter-level resolution can easily be obtained by fitting a high-resolution camera; second, in some areas where geological disasters are likely to occur, the terrain is complex and very dangerous, and an unmanned aerial vehicle can photograph designated areas there. Therefore, the invention provides a new method for automatically calculating the area of a building based on aerial images from unmanned aerial vehicles.

Among traditional image segmentation algorithms, the watershed algorithm is a popular method. During segmentation it takes the similarity between adjacent pixels as an important reference, groups pixels that are close in spatial position and similar in gray value, and forms interconnected closed contours. The usual steps of the watershed algorithm are: convert the color image to grayscale, compute a gradient image, and finally run the watershed algorithm on the gradient image to obtain the edge lines of the segmented image. In the watershed algorithm the gradient image is thresholded, and the final segmentation depends heavily on the threshold chosen. The selection of the threshold is therefore key to the segmentation result. Other conventional segmentation algorithms include clustering methods, edge detection methods, and the like.
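The pre-processing steps described above (grayscale conversion, gradient image, thresholding) can be sketched in plain Python. This is only an illustration under stated assumptions: the luminance weights, the central-difference gradient, the helper names, and the toy image are not taken from the patent, and the watershed flooding step itself is not implemented.

```python
# Sketch of the pre-processing that usually precedes a watershed transform:
# grayscale conversion, gradient magnitude, and thresholding of the gradient.
# All names and the toy image are hypothetical.

def to_gray(rgb):
    """Convert an H x W grid of (R, G, B) tuples to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def gradient_magnitude(gray):
    """Central-difference gradient magnitude; border pixels reuse their nearest neighbour."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def threshold(grad, t):
    """Mark pixels whose gradient magnitude is at least t (1 = candidate edge pixel)."""
    return [[1 if v >= t else 0 for v in row] for row in grad]

def edge_count(rgb, t):
    """Number of candidate edge pixels that survive threshold t."""
    return sum(map(sum, threshold(gradient_magnitude(to_gray(rgb)), t)))

# Hypothetical 4 x 6 image: dark left half, bright right half.
DARK, BRIGHT = (10, 10, 10), (200, 200, 200)
image = [[DARK] * 3 + [BRIGHT] * 3 for _ in range(4)]

low = edge_count(image, 50)    # the boundary between the halves survives
high = edge_count(image, 250)  # a threshold set too high removes every edge
```

With the toy image, the lower threshold keeps the boundary pixels while the higher one discards them all, which illustrates why the threshold choice dominates the result.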

In recent years, Deep Convolutional Neural Networks (DCNNs) have become very popular in computer vision applications. DCNNs typically contain many convolutional layers and can learn deep features from training data. Deep convolutional neural networks have therefore been introduced into segmentation tasks with good results. However, these methods still have limitations: good results are difficult to achieve with little training data. As a supervised learning approach, deep neural network based image segmentation typically requires a large amount of training data, but in some cases only a few samples are available. Remote sensing images are quite similar to unmanned aerial vehicle images and can be obtained from some open source websites. Following the idea of transfer learning, a large number of remote sensing images can therefore be used to pre-train a network, and a small number of unmanned aerial vehicle aerial images can then be used for fine-tuning to achieve good results.

Many deep learning methods are currently used for segmentation tasks. They can be divided into two categories. The first is semantic segmentation, such as the FCN and DeepLab series; the second is instance segmentation, such as FCIS and Mask R-CNN. Semantic segmentation classifies each pixel in an image, but it cannot distinguish between different objects of the same class. Instance segmentation can be seen as an extension of semantic segmentation. Unlike semantic segmentation, instance segmentation distinguishes each instance: even for objects of the same class, it marks each object separately with a different color and bounding box. Instance segmentation can thus be seen as a combination of object detection and semantic segmentation. Because adjacent objects in a given region must be told apart, instance segmentation is better suited than semantic segmentation to calculating the building area of that region.
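The difference between the two categories can be illustrated with a toy example. The sketch below assumes a hypothetical instance label map in which two buildings share a wall; the array, the labels, and the helper name are illustrative only. In the semantic (binary) mask the two buildings merge into one connected region, whereas the instance map keeps a separate pixel count for each.

```python
from collections import Counter, deque

def connected_components(mask):
    """Count 4-connected foreground regions in a binary mask (breadth-first search)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Hypothetical instance map: 0 = background, 1 and 2 = two buildings sharing a wall.
instance_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
]

# Semantic view: every building pixel gets the same class label.
semantic_mask = [[1 if v else 0 for v in row] for row in instance_map]

# Instance view: pixels are counted per building identifier.
pixels_per_instance = Counter(v for row in instance_map for v in row if v)

merged_regions = connected_components(semantic_mask)
```

Here the semantic mask yields a single region covering both buildings, so only a combined area could be derived from it, while the instance map gives one pixel count per building.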

Disclosure of Invention

The invention aims to solve the technical problems of long time consumption and low efficiency of the conventional building area calculation method, and provides an unmanned aerial vehicle image building area calculation method and system based on Mask R-CNN to solve the technical defects.

An unmanned aerial vehicle image building area calculation method based on Mask R-CNN comprises the following steps:

S1, collecting a plurality of images of a preselected area by using an unmanned aerial vehicle, wherein all images have the same size;

S2, screening the images acquired in S1 and deleting the images that contain no buildings at all;

S3, annotating the images remaining after S2 one by one, marking the buildings in each image, and taking the annotated images as a training image set;

S4, preparing a satellite image data set comprising a plurality of satellite images in which the buildings have been annotated;

S5, adopting the Mask R-CNN algorithm, first pre-training on the satellite image data set, and obtaining an initial model once the pre-training is completed;

S6, training the initial model with the training image set for multiple rounds until the model converges, to obtain a final segmentation model;

S7, collecting images of the area to be measured by using an unmanned aerial vehicle, stitching the collected images into a panoramic image, down-sampling the panoramic image, and cropping it into a plurality of small images of the same size;

S8, processing all the small images obtained in S7 with the segmentation model so as to mark the buildings in each small image;

S9, counting the number of pixels contained in each marked building;

and S10, setting the unit pixel area represented by each pixel according to the actual situation, and calculating the area of each building in the image.

Further, the image acquired in S1 is a three-channel RGB image.

Further, in S3, building labeling is performed on the image using labelme software.

Further, the calculation in S10 multiplies the unit pixel area by the number of pixels.

A Mask R-CNN-based unmanned aerial vehicle image building area calculation system comprises: a processor and a storage device; the processor loads and executes the instructions and data in the storage device to implement any of the Mask R-CNN-based unmanned aerial vehicle image building area calculation methods described above.

Compared with the prior art, the invention has the following advantages: Mask R-CNN is selected as the segmentation model, and it offers a simple and flexible structure and a pronounced segmentation effect. After Mask R-CNN is run, the outline of each building is obtained and the number of pixels within each outline is counted; using the unit area represented by each pixel, the area of each building is then calculated. The accuracy of the results obtained by the method is markedly higher than that of the traditional calculation method.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a flow chart of an unmanned aerial vehicle image building area calculation method based on Mask R-CNN of the invention;

FIG. 2 is a qualitative experimental comparison of various algorithms;

FIG. 3 is a graph comparing quantitative indicators for each algorithm;

FIG. 4 is a graph of the effect of the present invention using Mask R-CNN segmentation;

FIG. 5 is a comparison of the calculated area and the true value of the present invention.

Detailed Description

For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

An unmanned aerial vehicle image building area calculation method based on Mask R-CNN is shown in FIG. 1 and comprises the following steps:

S1, collecting a plurality of images of a preselected area by using an unmanned aerial vehicle, wherein all images have the same size and are three-channel RGB images;

S2, screening the images acquired in S1 and deleting the images that contain no buildings at all;

S3, annotating the images remaining after S2 one by one with the labelme software, marking the buildings in each image, and taking the annotated images as a training image set;

S4, preparing a complete satellite image data set comprising a plurality of satellite images in which the buildings have been annotated;

S5, adopting the Mask R-CNN algorithm, first pre-training on the satellite image data set, and obtaining an initial model once the pre-training is completed;

S6, training the initial model with the training image set for multiple rounds until the model converges, to obtain a final segmentation model;

S7, collecting images of the area to be measured by using an unmanned aerial vehicle, stitching the collected images into a panoramic image, down-sampling the panoramic image, and cropping it into a plurality of small images of the same size, the size meeting the processing requirements of the segmentation model;

S8, processing all the small images obtained in S7 with the segmentation model so as to mark the buildings in each small image;

S9, counting the number of pixels contained in each marked building;

S10, setting the unit pixel area represented by each pixel according to the actual situation, multiplying the unit pixel area by the number of pixels, and calculating the area of each building in the image.
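Steps S7, S9 and S10 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the ground sampling distance `gsd_m`, the `downsample_factor`, the tile size, the pixel counts, and all function names are hypothetical. The sketch assumes that the unit pixel area is derived from the ground sampling distance of the stitched panorama and that down-sampling by a linear factor k enlarges the ground footprint of each pixel by k squared. The patent does not say how tiles at the image border are handled, so they are simply clipped here.

```python
def tile_boxes(height, width, tile):
    """Yield (top, left, bottom, right) crop boxes on a regular grid (S7).

    Edge tiles are clipped to the image; a real pipeline would need to pad
    them to the full tile size so that all small images are equal in size.
    """
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (top, left, min(top + tile, height), min(left + tile, width))

def unit_pixel_area(gsd_m, downsample_factor=1):
    """Ground area in square metres covered by one pixel after down-sampling.

    gsd_m: assumed ground sampling distance of the panorama, metres per pixel.
    downsample_factor: linear shrink factor (2 = half the width and height).
    """
    side = gsd_m * downsample_factor
    return side * side

def building_areas(pixel_counts, gsd_m, downsample_factor=1):
    """S10: multiply each building's pixel count (from S9) by the unit pixel area."""
    area = unit_pixel_area(gsd_m, downsample_factor)
    return {name: count * area for name, count in pixel_counts.items()}

# Hypothetical numbers: a 1000 x 1500 panorama cut into 512-pixel tiles,
# and pixel counts for two buildings as they might come out of S9.
boxes = list(tile_boxes(1000, 1500, 512))
counts = {"A": 1200, "B": 450}
areas = building_areas(counts, gsd_m=0.05, downsample_factor=4)
```

With these made-up values each pixel covers about 0.04 square metres, so building A comes to roughly 48 square metres and building B to roughly 18. A building that straddles a tile boundary would be split across two small images; how its pixel counts are merged is not specified in the source.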

The Mask R-CNN model adopted by the invention combines object detection with semantic segmentation, which is called instance segmentation or object segmentation. Mask R-CNN builds on the object detection algorithm Faster R-CNN and adds a semantic segmentation algorithm, the fully convolutional network (FCN), as a segmentation branch. After an image passes through Faster R-CNN, many regions of interest (RoIs) are generated, and the FCN is applied to each RoI to classify its pixels. Unlike Faster R-CNN, Mask R-CNN uses region-of-interest alignment (RoI Align) instead of region-of-interest pooling (RoI Pool), which solves the spatial misalignment problem and significantly improves segmentation quality. Furthermore, a per-pixel binary loss is used rather than a multinomial loss, which produces accurate binary masks. Another feature of Mask R-CNN is the use of a residual network (ResNet) or a modified ResNet rather than the traditional VGG network to strengthen feature extraction. ResNet was proposed by Kaiming He et al. and won the ILSVRC 2015 competition. ResNet outperforms VGGNet while using fewer parameters. In general, the structure of ResNet accelerates the training of deep neural networks and greatly improves accuracy. Mask R-CNN is very flexible and can be used for various computer vision tasks, including object detection, image segmentation, and human pose estimation. In the COCO challenges, Mask R-CNN performs better than previous models. Compared with FCIS, proposed by Microsoft, Mask R-CNN is simpler, performs better, is more extensible and more versatile, and can use different backbone structures, such as ResNet-101 or ResNet-101-FPN. Here the FPN mainly addresses multi-scale detection. In short, Mask R-CNN makes three major improvements over Faster R-CNN. First, various network structures are explored as backbone networks for Mask R-CNN. Second, RoI Align is used instead of RoI Pool. Third, the FCN algorithm is added as a segmentation branch. In our task, Mask R-CNN identifies each target with a bounding box. Each bounding box is then segmented into building and non-building regions.
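The spatial misalignment that RoI Align avoids can be illustrated with a deliberately simplified sketch. It compares a lookup that first snaps a fractional coordinate to the integer grid (the source of misalignment in RoI Pool) with a bilinear interpolation at the exact fractional location (the sampling used by RoI Align). The feature map, the sample point, and the function names are hypothetical; the real operators additionally divide each RoI into bins and aggregate several samples per bin, which is omitted here.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x) location.

    Assumes 0 <= y < height and 0 <= x < width.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized(feature, y, x):
    """Quantizing lookup: snap the coordinate to the integer grid before reading."""
    return feature[int(math.floor(y))][int(math.floor(x))]

# Hypothetical 4 x 4 feature map whose value grows linearly: f[r][c] = 10 * r + c.
feature = [[float(10 * r + c) for c in range(4)] for r in range(4)]

# A sample point that falls between grid cells, as happens when an RoI in
# image coordinates is projected onto a lower-resolution feature map.
y, x = 1.5, 2.25
aligned_value = bilinear(feature, y, x)   # follows the true fractional position
snapped_value = quantized(feature, y, x)  # reads the cell at (1, 2) instead
```

Because the toy feature map is linear, the interpolated value equals the value the map would have at the fractional position, while the quantized lookup is off by the amount of the coordinate shift. In a mask branch such shifts translate into masks that are offset from the true building outline.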

The qualitative comparison of the various algorithms is shown in FIG. 2: the first column is the ground truth, the second column is the FCN segmentation result, the third column is the DeepLab segmentation result, the fourth column is the SegNet segmentation result, and the fifth column is the segmentation result of the method of the present invention. The quantitative index comparison of these algorithms is shown in FIG. 3.

The results of segmentation using Mask R-CNN are shown in FIG. 4, in which seven buildings, labeled A, B, C, D, E, F and G, were selected. The comparison of the areas calculated by the present invention with the true values is shown in FIG. 5, where GT is the true value and the unit is square meters.

In conclusion, the invention uses Mask R-CNN as the segmentation model and trains and optimizes it by neural network learning; Mask R-CNN offers a simple and flexible structure and a pronounced segmentation effect. After Mask R-CNN is run, the outline of each building is obtained and the number of pixels within each outline is counted; using the unit area represented by each pixel, the area of each building is then calculated. The accuracy of the results obtained by the method is markedly higher than that of the traditional calculation method.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
