Image semantic segmentation method based on PU-Learning

Document No. 1354694 · Published 2020-07-24

Note: this technology, "An image semantic segmentation method based on PU-Learning", was designed and created by 汪聪 and 浦剑 on 2020-03-23. The invention provides an image semantic segmentation method based on learning from positive and unlabelled samples, belonging to the technical field of computer vision. The method comprises a data preparation step, a data preprocessing step, a deep convolutional neural network construction step, a PU-Learning-based loss function design step, and a loss-function optimization and learning step; the training steps are executed iteratively until the training result of the image semantic segmentation model meets a predetermined convergence condition. The invention uses a deep neural network to extract features of the image to be segmented and, on that basis, designs a PU-Learning-based cross-entropy loss function with which the semantic segmentation model can be trained and optimized when only part of the pixels carry pixel-level annotations. The scheme trains and optimizes the model end to end while retaining a degree of direct pixel-level supervision, ensuring good semantic segmentation quality while improving the speed of data annotation.

1. An image semantic segmentation method based on PU-Learning, characterized by comprising the following steps:

S1, data preparation: in the image database to be trained, mark at least one pixel-level image label for each category;

S2, data preprocessing: receive an image, subtract the mean and divide by the standard deviation to control the numerical distribution of the samples; when samples are insufficient, perform data augmentation, including rotating the image by a certain angle, horizontal flipping, blurring/noise, and multi-scale scaling; finally, adjust the width and height of the image to the same size;

S3, deep convolutional neural network construction: use a deep fully convolutional neural network as the semantic segmentation model and for prediction, learning intermediate representation parameters through a number of convolutional, nonlinear activation, and pooling layers to obtain a preliminary semantic segmentation result for the training image;

S4, loss function design, wherein y_i and p_i respectively denote the true probability and the predicted probability that a pixel belongs to the i-th class: the true probability is obtained from the pixel's label, and the predicted probability is obtained by passing the output of the semantic segmentation model through a softmax function; γ and α are two hyper-parameters, where γ is generally set to 2 and α is set below the proportion of negative samples among all samples; k denotes the total number of labelled-sample classes in the practical application;

S5, loss function optimization and learning: using the loss function, calculate the error between the preliminary result and the corresponding image label, and correct and optimize the model parameters of the semantic segmentation model by stochastic gradient descent;

S6, iteratively execute the training until the training result of the semantic segmentation model meets a predetermined convergence condition.

2. The image semantic segmentation method based on PU-Learning of claim 1, wherein S1 to S5 are training steps, and the training steps S1 to S5 are executed iteratively to obtain a trained semantic segmentation model.

Technical Field

The invention relates to the technical field of computer vision, and in particular to an image semantic segmentation method based on PU-Learning.

Background

With the continuous development of technologies such as big data, fifth-generation mobile communication, and the Internet of Things, the collection, aggregation, and storage of multimedia resources such as images and videos have become increasingly convenient. In some application scenarios (such as autonomous driving and medical imaging), the collected images need to be semantically segmented. Image semantic segmentation is a classic problem in the field of computer vision: its goal is to have a computer predict the category of every pixel in an image, that is, to assign a category label to each pixel.

In existing image segmentation techniques based on supervised learning, pixel-level class labels must often be provided for the training samples, i.e., a class label must be manually assigned to every pixel in an image. Statistics show that pixel-level labelling of a single image takes 15 minutes on average; the labelling process is therefore time-consuming, labour-intensive, and costly. Weakly supervised methods avoid pixel-level labelling and perform semantic segmentation using only image-level annotations of the training or reference images. Compared with systems that require heavy pixel-level annotation of the training images, such coarse image-level annotation is faster and easier to obtain; however, because no accurate pixel-level annotation is available to guide model learning, weakly supervised semantic segmentation is very challenging, and its segmentation quality is difficult to guarantee.

In summary, balancing semantic segmentation quality against data annotation speed has become an important problem that restricts image semantic segmentation methods from obtaining big-data support and developing further.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides an image semantic segmentation method based on PU-Learning, which solves the problems mentioned in the Background.

To achieve this purpose, the invention is realized by the following technical scheme. The image semantic segmentation method based on PU-Learning comprises the following steps:

S1, data preparation: in the image database to be trained, mark at least one pixel-level image label for each category;

S2, data preprocessing: receive an image, subtract the mean and divide by the standard deviation to control the numerical distribution of the samples; when samples are insufficient, perform data augmentation, including rotating the image by a certain angle, horizontal flipping, blurring/noise, and multi-scale scaling; finally, adjust the width and height of the image to the same size;

S3, deep convolutional neural network construction: use a deep fully convolutional neural network as the semantic segmentation model and for prediction, learning intermediate representation parameters through a number of convolutional, nonlinear activation, and pooling layers to obtain a preliminary semantic segmentation result for the training image;

S4, loss function design, wherein y_i and p_i respectively denote the true probability and the predicted probability that a pixel belongs to the i-th class: the true probability is obtained from the pixel's label, and the predicted probability is obtained by passing the output of the semantic segmentation model through a softmax function; γ and α are two hyper-parameters, where γ is generally set to 2 and α is set below the proportion of negative samples among all samples; k denotes the total number of labelled-sample classes in the practical application;

S5, loss function optimization and learning: using the loss function, calculate the error between the preliminary result and the corresponding image label, and correct and optimize the model parameters of the semantic segmentation model by stochastic gradient descent;

S6, iteratively execute the training until the training result of the semantic segmentation model meets a predetermined convergence condition.

Further, the training steps S1 to S5 are executed iteratively to obtain a trained semantic segmentation model.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention designs a novel loss function based on the cross-entropy loss function and PU-Learning, with which a semantic segmentation model can be trained and optimized when only part of the pixel-level labels are available.

(2) The scheme of the invention can train and optimize the semantic segmentation model end to end while retaining a degree of direct pixel-level supervision, ensuring good semantic segmentation quality while improving the data labelling speed, which matters given the current rapid development of the image semantic segmentation field.

(3) The method is applicable to the various existing deep image semantic segmentation models, and can therefore be used in large-scale image semantic segmentation scenarios.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, the present invention provides a technical solution in which the PU-Learning-based image semantic segmentation method includes the following steps:

S1, data preparation: in the image database to be trained, mark at least one pixel-level image label for each category (the label of unlabelled samples is uniformly set to 0, and the labels of the other samples of the different categories are numbered from 1);

S2, data preprocessing: receive an image, subtract the mean and divide by the standard deviation to control the numerical distribution of the samples; when samples are insufficient, perform data augmentation, including rotating the image by a certain angle, horizontal flipping, blurring/noise, and multi-scale scaling; finally, adjust the width and height of the image to the same size;
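A minimal sketch of this preprocessing step in Python/NumPy, assuming per-channel mean and standard deviation; the function names are illustrative, and the nearest-neighbour resize is a dependency-free stand-in for the interpolation a real pipeline would use:

```python
import numpy as np

def preprocess(image, mean, std, size=(512, 512)):
    """Normalize an HxWx3 image: subtract the channel mean, divide by the
    channel standard deviation, then resize width and height to `size`.
    The resize is a simple nearest-neighbour index map."""
    x = (image.astype(np.float32) - mean) / std
    h, w = x.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return x[rows][:, cols]

def augment(image, label):
    """One augmentation from the list above: horizontal flip. The
    pixel-level label map must be flipped together with the image."""
    if np.random.rand() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]
    return image, label
```

The other augmentations (rotation, blur/noise, multi-scale scaling) follow the same pattern: any geometric transform applied to the image must be applied identically to its label map.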

S3, deep convolutional neural network construction: use a deep fully convolutional neural network as the semantic segmentation model and for prediction, learning intermediate representation parameters through a number of convolutional, nonlinear activation, and pooling layers to obtain a preliminary semantic segmentation result for the training image;

S4, loss function design, wherein y_i and p_i respectively denote the true probability and the predicted probability that a pixel belongs to the i-th class: the true probability is obtained from the pixel's label, and the predicted probability is obtained by passing the output of the semantic segmentation model through a softmax function; γ and α are two hyper-parameters, where γ is generally set to 2 and α is set below the proportion of negative samples among all samples; k denotes the total number of labelled-sample classes in the practical application;
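The printed formula itself did not survive extraction, so the sketch below is an illustration only: a focal-style cross-entropy (using the described γ = 2) combined with an α-weighted term for unlabelled pixels. It matches the description of the symbols y_i, p_i, γ, α, and k, but the function name and the exact treatment of unlabelled pixels are assumptions, not the patent's formula:

```python
import numpy as np

def pu_focal_loss(logits, labels, gamma=2.0, alpha=0.3):
    """Per-pixel loss sketch: cross-entropy with a (1 - p)^gamma focal
    term, where unlabelled pixels (label 0) are treated as negatives and
    down-weighted by alpha. Per the text, alpha should be set below the
    true proportion of negatives among all pixels.
    logits: (N, K+1) raw scores; labels: (N,) ints with 0 = unlabelled."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pt = p[np.arange(len(labels)), labels]           # probability of the assigned class
    focal = -((1.0 - pt) ** gamma) * np.log(pt + 1e-12)
    weight = np.where(labels == 0, alpha, 1.0)       # down-weight unlabelled pixels
    return float((weight * focal).mean())
```

The α weighting is what makes the objective usable with positive and unlabelled data: confidently wrong treatment of an unlabelled pixel costs less than a mistake on a labelled one.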

S5, loss function optimization and learning: using the loss function, calculate the error between the preliminary result and the corresponding image label, and correct and optimize the model parameters of the semantic segmentation model by stochastic gradient descent;

S6, iteratively execute the training until the training result of the semantic segmentation model meets a predetermined convergence condition.

S1 to S5 are training steps, which are executed iteratively to obtain a trained semantic segmentation model; specifically, the training steps are executed iteratively until the training result of the semantic segmentation model satisfies a predetermined convergence condition. For example, the predetermined convergence condition may be that a predetermined number of iterations is reached, in which case the iterative process ends as soon as that number is reached.
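The iterative scheme above can be sketched as a minimal gradient-descent loop with the iteration-count convergence condition; the toy one-dimensional objective stands in for the segmentation loss, and in a real run the gradient would be computed on random mini-batches, making the updates stochastic:

```python
def train(grad_fn, w0, lr=0.1, max_iters=200):
    """Minimal descent loop using the convergence condition described in
    the text: stop once a predetermined number of iterations is reached."""
    w = w0
    for _ in range(max_iters):
        w = w - lr * grad_fn(w)   # the S5 parameter-correction step
    return w

# Toy objective standing in for the segmentation loss: f(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3); the minimizer is w = 3.
w_star = train(lambda w: 2.0 * (w - 3.0), w0=0.0)
```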

In operation, for step S3 of steps S1 to S6, taking the FCN fully convolutional network as an example, a specific network is as follows:

1. input layer
2. <= 1 convolutional layer 1_1 (3 × 3 × 64)
3. <= 2 nonlinear activation ReLU layer
4. <= 3 convolutional layer 1_2 (3 × 3 × 64)
5. <= 4 nonlinear activation ReLU layer
6. <= 5 pooling layer (3 × 3/2)
7. <= 6 convolutional layer 2_1 (3 × 3 × 128)
8. <= 7 nonlinear activation ReLU layer
9. <= 8 convolutional layer 2_2 (3 × 3 × 128)
10. <= 9 nonlinear activation ReLU layer
11. <= 10 pooling layer (3 × 3/2)
12. <= 11 convolutional layer 3_1 (3 × 3 × 256)
13. <= 12 nonlinear activation ReLU layer
14. <= 13 convolutional layer 3_2 (3 × 3 × 256)
15. <= 14 nonlinear activation ReLU layer
16. <= 15 convolutional layer 3_3 (3 × 3 × 256)
17. <= 16 nonlinear activation ReLU layer
18. <= 17 pooling layer (3 × 3/2)
19. <= 18 convolutional layer 4_1 (3 × 3 × 512)
20. <= 19 nonlinear activation ReLU layer
21. <= 20 convolutional layer 4_2 (3 × 3 × 512)
22. <= 21 nonlinear activation ReLU layer
23. <= 22 convolutional layer 4_3 (3 × 3 × 512)
24. <= 23 nonlinear activation ReLU layer
25. <= 24 pooling layer (3 × 3/2)
26. <= 25 convolutional layer 5_1 (3 × 3 × 512)
27. <= 26 nonlinear activation ReLU layer
28. <= 27 convolutional layer 5_2 (3 × 3 × 512)
29. <= 28 nonlinear activation ReLU layer
30. <= 29 convolutional layer 5_3 (3 × 3 × 512)
31. <= 30 nonlinear activation ReLU layer
32. <= 31 pooling layer (3 × 3/2)
33. <= 32 convolutional layer 6_1 (7 × 7 × 4096)
34. <= 33 nonlinear activation ReLU layer
35. <= 34 random drop Dropout layer (0.5)
36. <= 35 convolutional layer 6_2 (1 × 1 × 4096)
37. <= 36 nonlinear activation ReLU layer
38. <= 37 random drop Dropout layer (0.5)
39. <= 38 upsampling layer
40. <= 39 loss layer, where the loss function is calculated

Here the symbol "<=" separates the current layer number from its input layer number; e.g., "3. <= 2" indicates that the current layer is the third layer and its input is the second layer. The parentheses after a convolutional layer give its parameters, e.g., 7 × 7 × 4096 indicates a convolution kernel of size 7 × 7 with 4096 channels. The parentheses after a pooling layer give its parameters, e.g., 3 × 3/2 indicates a pooling kernel of size 3 × 3 with a stride of 2. The parentheses after a random-drop Dropout layer give the drop probability, e.g., 0.5 indicates that in each forward pass a neuron of this layer has a 50% probability of not participating in prediction and of having its parameters left unmodified during back-propagation.
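The bookkeeping in the layer list can be checked with a short sketch: encoding the layers as a list of kinds and multiplying the pooling strides gives the network's overall downsampling factor, which is exactly what the upsampling layer at the end must undo (the encoding and names are illustrative, not part of the patent):

```python
# Layer kinds of the example network above, stage by stage.
layers = (
    ["conv"] * 2 + ["pool"]      # conv1_1, conv1_2, pool
    + ["conv"] * 2 + ["pool"]    # conv2_1, conv2_2, pool
    + ["conv"] * 3 + ["pool"]    # conv3_1..conv3_3, pool
    + ["conv"] * 3 + ["pool"]    # conv4_1..conv4_3, pool
    + ["conv"] * 3 + ["pool"]    # conv5_1..conv5_3, pool
    + ["conv", "conv"]           # 7x7x4096 and 1x1x4096 heads
)

def downsampling_factor(layers):
    """Each 3x3/2 pooling layer halves the spatial size; the stride-1
    convolutions (with padding) preserve it, so the overall factor is
    the product of the pooling strides."""
    factor = 1
    for kind in layers:
        if kind == "pool":
            factor *= 2
    return factor

factor = downsampling_factor(layers)   # five poolings -> 2**5 = 32
```

With five stride-2 poolings the score map is 1/32 of the input resolution, so the upsampling layer must enlarge it 32× to restore the original image size.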

In the above example, the convolution kernels of most convolutional layers are set to 3 × 3, which better integrates local information. In this embodiment, the stride of the pooling layers is set so that the higher-level features obtain a larger field of view without increasing the amount of computation; the pooling stride also enhances spatial invariance, i.e., the same input may appear at different image positions and still produce the same output response.

The nonlinear activation unit is specifically a rectified linear unit (Rectified Linear Unit, hereinafter ReLU). Adding a ReLU after each convolutional layer sparsifies the mapping result of the convolution as much as possible, making it closer to the human visual response and yielding a better image-processing effect.

A random-drop Dropout layer is added before the upsampling layer of the above example, which effectively alleviates overfitting of the deep neural network.

To enlarge the preceding features to the size of the original image, the upsampling layer produces, for each pixel, the probability of belonging to each category. For example, with 10 classes of positive samples plus the unlabelled class, there are 11 classes in total, so the finally predicted tensor has the same width and height as the original image and 11 channels.
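Turning that predicted tensor into a segmentation map is a per-pixel softmax followed by an argmax over the channel axis; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def predict_labels(score_map):
    """Turn an upsampled score tensor of shape (H, W, K+1) -- e.g. K = 10
    positive classes plus one unlabelled channel, 11 in total -- into a
    per-pixel label map: softmax over channels, then argmax. (The argmax
    of the softmax equals the argmax of the raw scores; the softmax is
    shown because the loss in S4 consumes the probabilities.)"""
    z = score_map - score_map.max(axis=-1, keepdims=True)  # stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1)
```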

To sum up, the convolutional layers of the fully convolutional neural network mainly perform information induction and fusion, the pooling layers (max pooling, MaxPooling, may be chosen) mainly summarize higher-level information, and the fully convolutional network can be fine-tuned to suit different performance/efficiency trade-offs.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
