Ship name character region detection method based on deep learning

Document No. 1505622 · Published 2020-02-07

Note: this technique, a ship name character region detection method based on deep learning, was designed and created by Zhang Sanyuan, Wu Shukai, Qi Zhongqi, and Tu Kai on 2019-10-08. Its main content is summarized below.

The invention discloses a ship name character region detection method based on deep learning. High-definition pictures of cargo ships passing along the Grand Canal are captured by a camera; after preprocessing, the ship name character regions in the pictures are labeled to construct a detection data set. A deep learning detection network based on a feature fusion strategy is built, and the data set is fed into the network for training. The trained network computes the target regions in the original image; the detection results are filtered with a confidence threshold and a non-maximum suppression algorithm to obtain the final detection boxes, which are drawn on the picture. The method detects the name character region of a cargo ship in a natural scene quickly and accurately; compared with traditional detection methods based on threshold segmentation and edge detection, it is not easily affected by complex conditions such as lighting and rain, and it has a good application prospect in the field of intelligent transportation.

1. A ship name character region detection method based on deep learning, characterized in that the method comprises the following steps:

1) collecting pictures of cargo ships passing through the canal as sample pictures, preprocessing the sample pictures, and taking all the sample pictures together with their label information as a data set;

2) constructing a ship name character region detection network and training it with the data set from step 1) to obtain a trained ship name character region detection network;

3) preprocessing a picture of a cargo ship to be detected, inputting the preprocessed picture into the ship name character region detection network trained in step 2), and obtaining target boxes with confidence scores;

4) screening out high-score target boxes according to a confidence threshold, then eliminating target boxes with a high degree of positional overlap through a non-maximum suppression algorithm, and keeping the remaining target boxes as the final detection result.

2. The deep learning-based ship name character region detection method according to claim 1, characterized in that: the preprocessing in step 1) and step 3) is to scale the picture to a fixed size of 500 × 500.

3. The deep learning-based ship name character region detection method according to claim 1, characterized in that: step 1) is specifically as follows: preprocess the sample pictures, then label the position of each ship name character region with a rectangular box using labeling software, giving label number 1 to the ship name character region and label number 0 to the background region outside it; all the sample pictures and their label information constitute the data set, where the label information comprises the label number and the position of the rectangular box, and the position of the rectangular box consists of the coordinates of its upper-left corner and its width and height values.

4. The deep learning-based ship name character region detection method according to claim 1, characterized in that: the step 2) is specifically as follows:

2.1) constructing a ship name character region detection network comprising an input layer, a feature extraction module, a feature fusion module, a prediction module, and an output layer;

the feature extraction module comprises a base network module and three convolution modules; the input layer is connected to the third convolution module through the base network module, the first convolution module, and the second convolution module in sequence; the base network module is a VGG16 network with the fully connected layers removed, and each convolution module comprises two sequentially connected convolution layers;

the feature fusion module comprises four sequentially connected subunits; each subunit comprises a deconvolution layer, an Eltwise layer, and a convolution layer connected in sequence, the convolution layer serving as the output of the subunit; the inputs of the deconvolution layers in the second, third, and fourth subunits are the outputs of the preceding subunits, while the input of the deconvolution layer in the first subunit is the output of the second convolution layer in the third convolution module; a convolution layer and a pooling layer in the base network module feed the Eltwise layers in the fourth and third subunits respectively, and the second convolution layers in the first and second convolution modules feed the Eltwise layers in the second and first subunits respectively;

the prediction module comprises five prediction units, the output of each prediction unit being connected to the output layer; the inputs of the second, third, fourth, and fifth prediction units are the outputs of the convolution layers in the first, second, third, and fourth subunits respectively;

2.2) sending the data set from step 1) into the ship name character region detection network and training the neural network by stochastic gradient descent until the network error reaches its minimum; the learning rate is set to 1 × 10⁻⁴ for the first 20K iterations and decays to 1 × 10⁻⁵ for the last 10K iterations.

5. The deep learning-based ship name character region detection method according to claim 4, characterized in that: after the deconvolution operation of the deconvolution layer in a subunit, the high-level feature map in the ship name character region detection network matches the size of the low-level feature map in the network;

the Eltwise layer of a subunit adds the pixel values at the same position in its two input feature maps.

6. The deep learning-based ship name character region detection method according to claim 4, characterized in that: the feature maps generate a number of default bounding boxes through the prediction units; the default bounding boxes are a series of rectangular boxes with the same area but different aspect ratios, the aspect ratios being 1, 1/2, 1/3, 1/5, 1/7, and 1/10;

for each rectangular box, the prediction unit computes a category score and a position offset using two convolution layers with kernels of size 1 × 5; the category score is the confidence that the rectangular box contains a ship name character region, and the position offset regresses the position of the ship name character region, the result comprising the coordinates of the upper-left corner of the rectangular box and its width and height values.

7. The deep learning-based ship name character region detection method according to claim 1, characterized in that: in step 4), each picture input into the ship name character region detection network produces a large number of overlapping rectangular boxes; high-score boxes are selected according to a preset confidence threshold, redundant boxes are removed with a non-maximum suppression algorithm so that only the box with the highest confidence score in each neighborhood of a ship name character region is retained, and the retained boxes serve as the final detection result.

Technical Field

The invention relates to a detection method of a ship name character region, in particular to a ship name character region detection method based on deep learning.

Background

Waterway transportation is an efficient and fast mode of transport and plays an important role in economic development. To manage the canal conveniently, the identity information of passing ships often needs to be known, so automatic ship name recognition technology is of great significance. Current ship name recognition comprises two main steps, locating the ship name character region and recognizing the ship name; fast and accurate localization of the character region, as the first step, is crucial, because the detection quality affects the subsequent recognition process.

Traditional detection methods compute the target region with techniques such as binarization and edge detection; they are easily affected by the external natural environment, their detection accuracy is low, and their results are unsatisfactory.

Disclosure of Invention

To solve the problems described in the background art, the invention provides a ship name character region detection method based on deep learning. The method computes the position of the ship name character region with a deep neural network, achieving efficient, accurate, and fast localization; it is robust, resists interference, and is easy to port.

The technical scheme adopted by the invention is as follows:

the invention comprises the following steps:

step 1) collecting pictures of cargo ships passing through the canal as sample pictures, preprocessing the sample pictures, and taking all the sample pictures together with their label information as a data set;

step 2) constructing a ship name character region detection network and training it with the data set from step 1) to obtain a trained ship name character region detection network;

step 3) preprocessing a picture of a cargo ship to be detected, inputting the preprocessed picture into the ship name character region detection network trained in step 2), and obtaining target boxes with confidence scores;

step 4) screening out high-score target boxes according to a confidence threshold, then eliminating target boxes with a high degree of positional overlap through a non-maximum suppression algorithm, and keeping the remaining target boxes as the final detection result.

The preprocessing in step 1) and step 3) is to scale the picture to a fixed size of 500 × 500.

Step 1) is specifically as follows: preprocess the sample pictures, then label the position of each ship name character region with a rectangular box using labeling software, giving label number 1 to the ship name character region and label number 0 to the background region outside it; all the sample pictures and their label information constitute the data set, where the label information comprises the label number and the position of the rectangular box, and the position of the rectangular box consists of the coordinates of its upper-left corner and its width and height values.

The step 2) is specifically as follows:

2.1) constructing a ship name character region detection network comprising an input layer, a feature extraction module, a feature fusion module, a prediction module, and an output layer;

the feature extraction module comprises a base network module and three convolution modules; the input layer is connected to the third convolution module through the base network module, the first convolution module, and the second convolution module in sequence; the base network module is a VGG16 network with the fully connected layers removed, and each convolution module comprises two sequentially connected convolution layers;

the feature fusion module comprises four sequentially connected subunits; each subunit comprises a deconvolution layer, an Eltwise layer, and a convolution layer connected in sequence, the convolution layer serving as the output of the subunit; the inputs of the deconvolution layers in the second, third, and fourth subunits are the outputs of the preceding subunits, while the input of the deconvolution layer in the first subunit is the output of the second convolution layer in the third convolution module; a convolution layer and a pooling layer in the base network module feed the Eltwise layers in the fourth and third subunits respectively, and the second convolution layers in the first and second convolution modules feed the Eltwise layers in the second and first subunits respectively;

the prediction module comprises five prediction units, the output of each prediction unit being connected to the output layer; the inputs of the second, third, fourth, and fifth prediction units are the outputs of the convolution layers in the first, second, third, and fourth subunits respectively;

2.2) sending the data set from step 1) into the ship name character region detection network and training the neural network by stochastic gradient descent until the network error reaches its minimum; the learning rate is set to 1 × 10⁻⁴ for the first 20K iterations and decays to 1 × 10⁻⁵ for the last 10K iterations.

The detection network adopts a multitask loss function, which is the sum of a category loss and a position loss; the category loss uses a cross-entropy cost function, and the position loss uses a smooth L1 loss function.

After the deconvolution operation of the deconvolution layer in a subunit, the high-level feature map in the ship name character region detection network matches the size of the low-level feature map in the network; that is, after deconvolution, the feature map fed into the subunit's deconvolution layer matches the size of the feature map fed into the subunit's Eltwise layer;

the Eltwise layer of a subunit adds the pixel values at the same position in its two input feature maps to achieve feature enhancement.

The feature maps generate a number of default bounding boxes through the prediction units; the default bounding boxes are a series of rectangular boxes with the same area but different aspect ratios, the aspect ratios being 1, 1/2, 1/3, 1/5, 1/7, and 1/10;

for each rectangular box, the prediction unit computes a category score and a position offset using two convolution layers with kernels of size 1 × 5; the category score is the confidence that the rectangular box contains a ship name character region, and the position offset regresses the position of the ship name character region, the result comprising the coordinates of the upper-left corner of the rectangular box and its width and height values.

In step 4), each picture input into the ship name character region detection network produces a large number of overlapping rectangular boxes; high-score boxes are selected according to a preset confidence threshold, redundant boxes are removed with a non-maximum suppression algorithm so that only the box with the highest confidence score in each neighborhood of a ship name character region is retained, and the retained boxes serve as the final detection result.

The invention has the beneficial effects that:

1) The ship name character region detection function is realized entirely by a neural network algorithm; compared with traditional methods, the network learns the fine details of the character region well, greatly improving detection precision and stability.

2) The invention needs only 210 ms to detect a picture with a resolution of 500 × 500, giving it high speed and timeliness.

3) The invention has a wide range of applications; after migration it can be applied to fields such as license plate region detection, and it has a good application prospect.

4) The method detects the ship name character region quickly, effectively, and automatically with adequate precision and speed; compared with traditional detection methods based on techniques such as threshold segmentation and edge detection, it is not easily affected by complex environments such as lighting and rain, and it has a good application prospect in the field of intelligent transportation.

Drawings

FIG. 1 is a flow chart of the present invention.

Fig. 2 is a structural diagram of a ship name character area detection network of the present invention.

FIG. 3 shows four different sample pictures of an original image to be detected according to the present invention.

FIG. 4 is a graph showing the results of the present invention, wherein (a), (b), (c) and (d) are the results of the four different samples shown in FIG. 3.

Detailed Description

The invention is further described with reference to the following figures and specific embodiments.

As shown in fig. 1, the implementation of the present invention is as follows:

The method comprises the following steps.

Step 1: Collect pictures of passing cargo ships as sample pictures using high-definition cameras installed on both sides of the canal, preprocess the sample pictures, and construct the ship name position data set. The specific process is as follows:

the acquired original image is scaled to a fixed size of 500 x 500. And (3) carrying out rectangular frame marking on the ship name character area by using marking software, and giving a label serial number 1, wherein the label serial number 0 is given to the background area without frame marking. The framed position information includes x and y values of coordinates at the upper left corner of the rectangular frame (a coordinate system is established by taking the upper left corner of the picture as an origin), and a width value and a length value of the rectangular frame. All pictures and labeled information form a ship name character area detection data set, and the labeled information comprises label serial numbers and position information of rectangular frames. Data sets were according to 4: the scale of 1 is divided into a training set and a validation set for training of the detection network.

Step two: building a ship name character area detection network shown in fig. 2, and sending the data set built in the step one into the network for training, wherein the specific process is as follows:

2.1) Construct a deep learning detection network, which is mainly divided into three modules: a feature extraction module, a feature fusion module, and a prediction module.

The feature extraction module consists of a base network module and several convolution modules. After the data enter the feature extraction module, features are first extracted by the base network module, which is a VGG16 network with the fully connected layers removed. Three successive convolution modules are appended after the base network module, each containing two convolution layers. The first convolution layer (Conv6_1) of the first module contains 256 kernels of size 1 × 1 with stride 1 and padding 0, and the second layer (Conv6_2) contains 512 kernels of size 3 × 3 with stride 2 and padding 1. The first convolution layer (Conv7_1) of the second module contains 128 kernels of size 1 × 1 with stride 1 and padding 0, and the second layer (Conv7_2) uses 256 kernels of size 3 × 3 with stride 2 and padding 1. The first convolution layer (Conv8_1) of the third module contains 128 kernels of size 1 × 1 with stride 1 and padding 0, and the second layer (Conv8_2) contains 256 kernels of size 3 × 3 with stride 2 and padding 1. The outputs of all the above convolution layers are activated by the ReLU function.
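The spatial sizes produced by these modules follow the standard convolution output formula floor((n + 2p − k)/s) + 1. A small arithmetic sketch (the 64 × 64 input is an assumed size for illustration; the patent does not state the base network's output size):

```python
def conv_out(n, kernel, stride, padding):
    """Output side length of a square convolution: floor((n + 2p - k)/s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# Each module is a 1x1 stride-1 conv (size-preserving) followed by
# a 3x3 stride-2 pad-1 conv that halves the feature map.
n = 64  # assumed side length of the base-network output
sizes = []
for _ in range(3):             # the Conv6, Conv7, Conv8 modules
    n = conv_out(n, 1, 1, 0)   # 1x1 conv keeps the size
    n = conv_out(n, 3, 2, 1)   # 3x3 stride-2 conv halves it
    sizes.append(n)
print(sizes)  # the side length is halved by each module
```

This halving is what produces the multi-scale feature maps that the fusion module later upsamples back with deconvolution.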

The feature fusion module is divided into four subunits; each subunit comprises a deconvolution layer (Deconv), an Eltwise layer, and a convolution layer (Conv_Eltwise) connected in sequence, the convolution layer serving as the subunit's output; inside each subunit, deconvolution, element-wise addition of corresponding pixel values, and convolution are carried out in turn. Specifically, Conv8_2 is deconvolved so that its size matches Conv7_2; the element values of Conv8_2 and Conv7_2 at corresponding positions are then added; and the sum is processed by a convolution layer (Conv_Eltwise_1) with 256 kernels of size 3 × 3, stride 1, padding 1, and ReLU activation. This forms the first subunit. The result of the first subunit's convolution layer (Conv_Eltwise_1) is then sent to the next subunit for processing, forming the second subunit; by analogy, four subunits are formed. The outputs of the convolution layers of the four subunits (Conv_Eltwise_1, Conv_Eltwise_2, Conv_Eltwise_3, Conv_Eltwise_4) and of the Conv8_2 layer are sent to the prediction module for processing.
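The element-wise addition performed by the Eltwise layer can be illustrated on toy 2-D lists (a pure-Python sketch standing in for real feature maps; the values are arbitrary):

```python
def eltwise_add(a, b):
    """Eltwise fusion: add the pixel values at the same position of two
    equally sized feature maps (represented here as plain 2-D lists)."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "maps must match in size"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy 2x2 "feature maps": the deconvolved high-level map and the low-level map.
high = [[1.0, 2.0], [3.0, 4.0]]
low  = [[0.5, 0.5], [0.5, 0.5]]
fused = eltwise_add(high, low)  # [[1.5, 2.5], [3.5, 4.5]]
```

The size assertion mirrors the requirement above: the deconvolution must first bring the high-level map to the low-level map's size, or the addition is undefined.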

The prediction module includes five prediction units. Each prediction unit generates a certain number of default bounding boxes centered on each pixel of its feature map, with aspect ratios 1, 1/2, 1/3, 1/5, 1/7, and 1/10. For each bounding box, the prediction unit computes the category score and the position offset using two convolution layers with kernels of size 1 × 5. The position offset has four values, which in order represent the x and y coordinates of the upper-left corner of the bounding box (a coordinate system established with the upper-left corner of the picture as the origin) and the width and height values of the box.
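The equal-area default boxes can be sketched as follows. Treating the aspect ratio as height/width, so that ratios below 1 give the wide boxes suited to horizontal text lines, is an assumption of this illustration, as is the 40 × 40 reference area:

```python
import math

def default_boxes(cx, cy, area, ratios=(1, 1/2, 1/3, 1/5, 1/7, 1/10)):
    """Generate equal-area default boxes centered at (cx, cy).
    With h = r * w and w * h = area, each ratio r yields one box,
    returned as (x, y, width, height) of the upper-left corner."""
    boxes = []
    for r in ratios:
        w = math.sqrt(area / r)
        h = r * w
        boxes.append((cx - w / 2, cy - h / 2, w, h))
    return boxes

boxes = default_boxes(cx=250, cy=250, area=40 * 40)
```

Every box covers the same area, so the ratios trade height for width without changing the receptive footprint the unit scores.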

The detection network adopts a multitask loss function, which is the sum of a category loss and a position loss; the category loss uses a cross-entropy cost function, and the position loss uses a smooth L1 loss function.
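A pure-Python sketch of this multitask loss for a single box, with a binary confidence and the 4-value offset described above (the clipping constant is an implementation detail assumed here, not taken from the patent):

```python
import math

def cross_entropy(p, label):
    """Category loss: binary cross-entropy for a confidence p in (0, 1)."""
    p = min(max(p, 1e-7), 1 - 1e-7)        # clip for numerical safety
    return -math.log(p) if label == 1 else -math.log(1 - p)

def smooth_l1(pred, target):
    """Position loss: smooth L1 summed over the 4 offset values
    (quadratic near zero, linear beyond |d| = 1)."""
    total = 0.0
    for d in (p - t for p, t in zip(pred, target)):
        total += 0.5 * d * d if abs(d) < 1 else abs(d) - 0.5
    return total

def multitask_loss(p, label, pred_box, gt_box):
    """Total loss = category loss + position loss."""
    return cross_entropy(p, label) + smooth_l1(pred_box, gt_box)

loss = multitask_loss(0.9, 1, [10, 20, 100, 30], [12, 18, 104, 30])
```

Smooth L1 keeps the gradient bounded for badly misplaced boxes while staying quadratic, and therefore smooth, near the target.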

2.2) Send the data set constructed in step 1 into the deep learning detection network and train the neural network by stochastic gradient descent until the network error reaches its minimum; the learning rate is set to 1 × 10⁻⁴ for the first 20K iterations and decays to 1 × 10⁻⁵ for the last 10K iterations.

Step three: firstly, selecting a high-score detection frame according to a preset confidence threshold, then removing a frame with high position coincidence degree in the detection result by using a non-maximum suppression algorithm, and obtaining the retained frame as a final detection result. And finally, drawing the detection result on the original drawing.
