Image processing method, apparatus, device, and computer storage medium

Document No.: 191750  Publication date: 2021-11-02

Reading note: This technology, "Image processing method, apparatus, device, and computer storage medium" (一种图像处理方法、装置、设备及计算机存储介质), was created by Du Senlin, Du Song, Wang Bangjun, Yang Huaiyu, and Li Lei on 2021-07-27. Its main content is as follows: the embodiment of the present application provides an image processing method, apparatus, device, and computer storage medium, relating to the field of image detection and used to improve the accuracy of locating and detecting a graphic code in an image in complex scenes. The method comprises: acquiring an image to be detected; inputting the image to be detected into a pre-trained key point detection model, and determining a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object, wherein the key point detection model is obtained by training on a training sample set, the training sample set comprises a plurality of sample graphic codes and label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and center point coordinate information of the sample graphic codes; determining the object as a graphic code to be positioned when the reference confidence is greater than a preset confidence threshold; and screening a plurality of reference frames determined by the multiple groups of reference corner point coordinate information to determine frame position information of the graphic code to be positioned.

1. An image processing method, characterized in that the method comprises:

acquiring an image to be detected;

inputting the image to be detected into a pre-trained key point detection model, and determining a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object, wherein the reference confidence represents the probability that the object is a graphic code, the key point detection model is obtained by training according to a training sample set, the training sample set comprises a plurality of sample graphic codes and label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and central point coordinate information of the sample graphic codes;

determining the object as a graphic code to be positioned under the condition that the reference confidence is greater than a preset confidence threshold value;

and screening a plurality of reference frames determined by the plurality of groups of reference corner point coordinate information to determine frame position information of the graphic code to be positioned, wherein the frame position information comprises a plurality of corner point coordinate information.

2. The method according to claim 1, wherein the screening the plurality of reference frames determined by the plurality of sets of reference corner point coordinate information to determine frame position information of the graphic code to be positioned comprises:

obtaining local maximum values of the plurality of reference frames by adopting a non-maximum suppression algorithm;

and screening the plurality of reference frames according to the local maximum values and a preset intersection-over-union (IoU) threshold value until no overlapping frames remain, and determining the frame position information of the graphic code to be positioned.

3. The method of claim 1, wherein the keypoint detection model comprises a first network and a second network, wherein the first network comprises a convolutional cascaded combination comprising a first convolutional layer, a normalization layer, and an activation layer, and wherein the second network comprises a max-pooling layer and a second convolutional layer;

the inputting the image to be detected into a pre-trained key point detection model, and determining the reference confidence of the object in the image to be detected and the coordinate information of multiple groups of reference corner points of the object, includes:

inputting the image to be detected into the first network, and determining a plurality of depth features of the image to be detected, wherein the sizes of the depth features are different;

and inputting the depth features into the second network for fusion, and determining the reference confidence of the object in the image to be detected and the multiple groups of reference corner point coordinate information of the object.

4. The method according to claim 1, wherein prior to said acquiring an image to be detected, said method further comprises:

acquiring a training sample set, wherein the training sample set comprises a plurality of sample graphic codes and label information corresponding to each sample graphic code, and the label information is marked with four corner point coordinate information and central point coordinate information of the sample graphic codes;

and training a preset key point detection model by using the sample graphic codes in the training sample set until a training stopping condition is met, and obtaining the trained key point detection model.

5. The method of claim 4, wherein prior to said obtaining a set of training samples, the method further comprises:

acquiring a plurality of original graphic codes;

for each original graphic code, the following steps are respectively executed:

marking the version information and the four corner point coordinate information of the original graphic code;

generating central point coordinate information of the original graphic code according to the version information and the four corner point coordinate information;

and determining the original graphic code marked with the four corner point coordinate information and the central point coordinate information as the sample graphic code.

6. The method according to claim 4, wherein the training of the preset key point detection model by using the sample graphic codes in the training sample set until a training stop condition is met to obtain a trained key point detection model comprises:

inputting the training sample set into the preset key point detection model, and determining prediction information of each sample graphic code, wherein the prediction information comprises a prediction confidence coefficient of the sample graphic code and four prediction corner coordinate information;

determining a loss function value of the preset key point detection model according to the prediction information and the label information of each sample graphic code;

and under the condition that the loss function value does not meet the training stopping condition, adjusting model parameters of the key point detection model, and training the adjusted key point detection model by using the sample graphic code until the loss function value meets the training stopping condition to obtain the trained key point detection model.

7. The method according to claim 6, wherein the inputting the training sample set into the preset key point detection model and determining the prediction information of each sample graphic code comprises:

inputting the training sample set into the preset key point detection model, and determining the coordinate information of a predicted central point of a target sample graphic code, wherein the target sample graphic code is any sample in the training sample set;

determining the position of a predicted central point of the target sample graphic code according to the coordinate information of the predicted central point;

and determining the prediction confidence of the target sample graphic code according to the position relation between the prediction central point position and a target grid, wherein the target grid is a preset area determined according to the central point coordinate information in the target sample graphic code label information.

8. The method of claim 6, wherein determining the loss function value of the preset key point detection model according to the prediction information and the label information of each sample graphic code comprises:

calculating the regression loss of the prediction confidence by using a cross entropy function to obtain a first loss function value;

calculating the regression loss of the predicted corner point coordinate information by using a mean square error function to obtain a second loss function value;

and determining a loss function value of the preset key point detection model according to the first loss function value and the second loss function value.

9. An image processing apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring an image to be detected;

the processing module is used for inputting the image to be detected into a pre-trained key point detection model, and determining a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object, wherein the reference confidence represents the probability that the object is a graphic code, the key point detection model is obtained by training according to a training sample set, the training sample set comprises a plurality of sample graphic codes and label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and central point coordinate information of the sample graphic codes;

the first determining module is used for determining the object as a graphic code to be positioned under the condition that the reference confidence is greater than a preset confidence threshold value;

and the second determining module is used for screening a plurality of reference frames determined by the plurality of groups of reference corner point coordinate information and determining frame position information of the graphic code to be positioned, wherein the frame position information comprises a plurality of corner point coordinate information.

10. An image processing device, characterized in that the device comprises: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the image processing method of any one of claims 1 to 8.

11. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement the image processing method of any one of claims 1 to 8.

Technical Field

The present application relates to the field of image detection, and in particular, to an image processing method, apparatus, device, and computer storage medium.

Background

The graphic code is one of the most common coding forms in daily life and is widely used in scenarios such as mobile payment and information acquisition. However, a graphic code image may be imaged abnormally due to printing defects such as streaking and faint printing, or because the graphic code occupies only a small proportion of the image or is blurred, smudged, unevenly illuminated, or irregularly geometrically distorted, so the graphic code cannot be accurately located.

In existing graphic code positioning and recognition techniques, the main positioning method scans the image point by point, row by row and column by column, to find boundaries; searches for pattern features using the light-dark width runs obtained from a gradient transformation; screens candidate horizontal and vertical line-segment sets; and determines the approximate location of a position detection pattern from its ratio of black to white pixels. Such methods achieve high recognition accuracy and real-time performance for a perfect graphic code, that is, an image that is well printed, free of imaging abnormality, and set against a simple background, but they generalize poorly to graphic code images in complex scenes, and their positioning and detection are slow and inaccurate, so the recognition accuracy and recognition speed for the graphic code are low and cannot meet requirements.

Disclosure of Invention

The embodiment of the application provides an image processing method, an image processing device, image processing equipment and a computer storage medium, which are used for improving the accuracy of positioning and detecting a graphic code in an image in a complex scene.

In a first aspect, an embodiment of the present application provides an image processing method, where the method includes:

acquiring an image to be detected;

inputting the image to be detected into a pre-trained key point detection model, and determining a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object, wherein the reference confidence represents the probability that the object is a graphic code, the key point detection model is obtained by training according to a training sample set, the training sample set comprises a plurality of sample graphic codes and label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and center point coordinate information of the sample graphic codes;

determining the object as a graphic code to be positioned under the condition that the reference confidence is greater than a preset confidence threshold value;

and screening a plurality of reference frames determined by a plurality of groups of reference corner point coordinate information, and determining frame position information of the graphic code to be positioned, wherein the frame position information comprises a plurality of corner point coordinate information.

In a second aspect, an embodiment of the present application provides an image processing apparatus, including:

the acquisition module is used for acquiring an image to be detected;

the processing module is used for inputting the image to be detected into a pre-trained key point detection model, and determining a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object, wherein the reference confidence represents the probability that the object is a graphic code, the key point detection model is obtained by training according to a training sample set, the training sample set comprises a plurality of sample graphic codes and label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and central point coordinate information of the sample graphic codes;

the first determining module is used for determining that the object is a graphic code to be positioned under the condition that the reference confidence is greater than a preset confidence threshold value;

and the second determining module is used for screening a plurality of reference frames determined by a plurality of groups of reference corner point coordinate information and determining frame position information of the graphic code to be positioned, wherein the frame position information comprises a plurality of corner point coordinate information.

In a third aspect, an embodiment of the present application provides an image processing apparatus, including:

a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the image processing method as provided by the first aspect of the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a computer storage medium, on which computer program instructions are stored, and when executed by a processor, the computer program instructions implement the image processing method provided in the first aspect of the embodiment of the present application.

The image processing method provided by the embodiment of the application first acquires an image to be detected; the image to be detected is input into a pre-trained key point detection model, and a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object are determined, wherein the reference confidence represents the probability that the object is a graphic code, the key point detection model is trained on a training sample set comprising a plurality of sample graphic codes and the label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and center point coordinate information of the sample graphic codes; the object is determined to be a graphic code to be positioned when the reference confidence is greater than a preset confidence threshold; and a plurality of reference frames determined by the multiple groups of reference corner point coordinate information are screened to determine the frame position information of the graphic code to be positioned, the frame position information comprising a plurality of corner point coordinate information. Compared with the prior art, labeling multiple corner point coordinates of each sample graphic code in the training samples of the key point detection model frees the image processing from the limitation of a rectangular frame: the frame position of the graphic code is located through multiple corner point coordinates, so graphic codes affected by abnormalities such as smudging, uneven illumination, and irregular geometric distortion can be located well. In addition, the double detection of the confidence and the corner point coordinate information of the graphic code to be positioned reduces the probability of misrecognizing the graphic code while accurately locating its position, improving the recognition accuracy of the graphic code.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments of the present application are briefly described below; those skilled in the art can derive other drawings from these drawings without creative effort.

Fig. 1 is a schematic flow chart of a method for positioning and detecting a graphic code in the prior art;

FIG. 2 is a schematic diagram of a prior art key point detection model;

fig. 3 is a schematic structural diagram of a graphic code functional area according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a keypoint detection model provided in an embodiment of the present application;

fig. 5 is a schematic flowchart of an image processing method according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.

Detailed Description

Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The image processing algorithm is one of the important research directions in the field of computer vision and plays an important role in fields such as public safety, road traffic, and video surveillance. The graphic code is one of the most common coding forms in daily life and is widely used in scenarios such as mobile payment and information acquisition. However, a graphic code image may be imaged abnormally due to printing defects such as streaking and faint printing, or because the graphic code occupies only a small proportion of the image or is blurred, smudged, unevenly illuminated, or irregularly geometrically distorted, so the graphic code in the image cannot be accurately located.

In recent years, with the development of image processing algorithms based on deep learning, image processing has been increasing in accuracy. In the prior art, the graphic code in the image is positioned by the following two ways:

one, conventional computer vision algorithm:

in a traditional computer vision algorithm, as shown in fig. 1, the image is scanned point by point, row by row and column by column, to find boundaries; light-dark width runs obtained from a gradient transformation are used to search for pattern features for positioning; candidate horizontal and vertical line-segment sets are screened; and the approximate position of a position detection pattern is determined from its ratio of black to white pixels. The center coordinates of the position detection patterns are then refined. To improve the accuracy of graphic code positioning, methods based on line detection, such as the Hough transform, which locates the graphic code by searching for two groups of mutually perpendicular straight lines, and methods based on morphology have also been proposed.

Secondly, a key point detection algorithm based on Mask R-CNN:

as shown in fig. 2, the Mask Region-based Convolutional Neural Network (Mask R-CNN) is an instance segmentation algorithm improved from the Faster Region-based Convolutional Neural Network (Faster R-CNN): a Fully Convolutional Network (FCN) branch is added under the Faster R-CNN framework to output a mask, and a Region of Interest Align layer (RoIAlign) is introduced to improve on the Region of Interest Pooling layer (RoI Pooling). Mask R-CNN can locate the contour of an instance, classify each pixel, and achieve accurate pixel-level segmentation. For the key point detection task, the positions of the key points are modeled independently and cross entropy is used as the loss function, so high key point positioning accuracy can be achieved.

The above two algorithms are common techniques for positioning and detecting a graphic code in an image. They achieve high recognition accuracy and meet requirements for a perfect graphic code, that is, an image that is well printed, free of imaging abnormality, and set against a simple background. However, they do not consider the abnormal imaging of graphic codes caused by printing streaks, faint printing, blurring, smudging, uneven illumination, irregular geometric distortion, and the like against complex backgrounds, so their application in more complex scenes is neglected and their application range is narrow; moreover, their positioning and detection of the pattern are slow and inaccurate, so the recognition accuracy of the graphic code is low and the recognition speed is slow.

Based on the above, the embodiment of the application provides an image processing method that, based on a pre-trained key point detection model, can better locate graphic codes in an image that are affected by abnormalities such as smudging, uneven illumination, and irregular geometric distortion. It performs double detection on the confidence and the corner point coordinate information of the graphic code to be positioned, which reduces the probability of misrecognizing the graphic code while accurately locating its position; it is applicable to more complex scenes and improves the accuracy of locating the graphic code in the image.

The main purpose of the embodiments of the present application is to locate a graphic code in an image, so the nature of the graphic code is first introduced. The graphic code is a matrix two-dimensional code symbol; compared with one-dimensional bar codes and other two-dimensional bar codes, it has the advantages of large information capacity, high reliability, the ability to represent multiple data types, and strong confidentiality and anti-counterfeiting performance. As shown in fig. 3, each part of the graphic code has its own role, and the code can be basically divided into a functional area and an encoding area. The functional area has three position detection patterns for identifying the position and orientation of the graphic code, timing patterns for positioning and guarding the code against distortion, and alignment patterns for aligning the pattern. In the encoding area, the format information stores formatting data, the version information is used for codes of version 7 and above, the data and error correction codewords store the actual graphic code information, and the error correction codewords correct errors caused by damage to the graphic code.

It should be noted that, in the image processing method provided in the embodiment of the present application, it is necessary to recognize an image by using a pre-trained keypoint detection model, and therefore, before performing image processing by using the keypoint detection model, the keypoint detection model needs to be trained first. Therefore, a specific implementation of the method for training the keypoint detection model provided in the embodiment of the present application is described first below.

The embodiment of the application provides a method for training a key point detection model, wherein the key point detection model comprises a first network and a second network, the first network comprises a convolution cascade combination, the convolution cascade combination comprises a first convolution layer, a normalization layer and an activation layer, the second network comprises a maximum pooling layer and a second convolution layer, and iterative training is carried out on a preset key point detection model through a classification and regression detection algorithm until a training stop condition is met. The method can be realized by the following steps:

firstly, acquiring an original graphic code.

In order to improve the generalization and robustness of the key point detection model, in the embodiment of the application a large number of randomly generated synthetic graphic codes are used to simulate the poor printing and abnormal imaging that may occur on an industrial production line. The synthetic graphic codes are randomly projected onto backgrounds from the COCO public dataset and then combined, through random transformations such as scaling, blurring, distortion, noise, and smudging, with a large number of graphic codes from real industrial scenes. The combined graphic codes are used as the original graphic codes, simulating the transformations that may occur in real industry for subsequent sample labeling.
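Purely as an illustration, the following is a minimal sketch of one such synthesis step using OpenCV and NumPy. The function name, parameter ranges, and the specific transformations are assumptions for demonstration, not the exact procedure of this application; 3-channel BGR images are assumed.

```python
import cv2
import numpy as np

def synthesize_sample(code_img, background, rng=np.random.default_rng()):
    """Project a generated graphic code onto a background image with random
    perspective distortion, then add blur, noise, and a dark smudge blob."""
    h, w = code_img.shape[:2]
    bh, bw = background.shape[:2]

    # Randomly distort the code's four corners and place it so that it
    # occupies only a small proportion of the background.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-0.15, 0.15, (4, 2)) * [w, h]
    scale = rng.uniform(0.1, 0.4)
    offset = rng.uniform([0, 0], [bw * (1 - scale), bh * (1 - scale)])
    dst = np.float32((src + jitter) * scale + offset)
    M = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(code_img, M, (bw, bh))
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), M, (bw, bh))

    out = background.copy()
    out[mask > 0] = warped[mask > 0]

    # Random blur, additive Gaussian noise, and an occasional smudge.
    if rng.random() < 0.5:
        k = int(rng.integers(1, 4)) * 2 + 1  # odd kernel size: 3, 5, or 7
        out = cv2.GaussianBlur(out, (k, k), 0)
    out = np.clip(out.astype(np.int16) + rng.normal(0, 8, out.shape),
                  0, 255).astype(np.uint8)
    if rng.random() < 0.3:
        c = rng.integers([0, 0], [bw, bh])
        cv2.circle(out, (int(c[0]), int(c[1])), int(rng.integers(5, 30)),
                   (40, 40, 40), -1)

    # The warped corner positions become the four corner point labels.
    return out, dst
```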

And secondly, marking the original graphic code to generate a sample training set.

Firstly, the original graphic codes are labeled manually; the labeled content is a label phrase (label) and the four corner point coordinate information of each graphic code.

Secondly, in order to ensure labeling accuracy, only the label and the four corner point coordinate information are annotated manually; the center point coordinate information is generated, based on a transformation matrix, from the version information of the graphic code and the four corner point coordinate information, where the version information is determined by the label of the graphic code.

And finally, taking the graphic code with the central point coordinate information and the four corner point coordinate information as a sample graphic code, wherein a plurality of sample graphic codes form a sample training set.
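A minimal sketch of how the center point label could be derived from the four labeled corners and the version via a transformation matrix is shown below. The corner ordering convention and the use of the QR module count n = 17 + 4·version are illustrative assumptions; for the symbol center alone, mapping the midpoint of the module grid through the homography suffices.

```python
import cv2
import numpy as np

def center_from_corners(corners, version):
    """Derive the center point label from four labeled corners and the
    code version, using a perspective transformation matrix.

    corners: 4x2 array ordered top-left, top-right, bottom-right,
             bottom-left (an assumed convention).
    version: QR version; a version-v symbol is n = 17 + 4*v modules wide.
    """
    n = 17 + 4 * version  # module count per side, a known QR property
    grid = np.float32([[0, 0], [n, 0], [n, n], [0, n]])
    M = cv2.getPerspectiveTransform(grid, np.float32(corners))
    # Map the center of the module grid through the homography.
    cx, cy, cw = M @ np.array([n / 2, n / 2, 1.0])
    return cx / cw, cy / cw
```

For example, for a version-2 code (n = 25), the labeled center is the image of module point (12.5, 12.5) under the fitted homography.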

It should be noted that in prior-art schemes that detect graphic codes through a deep learning neural network, when the graphic codes are labeled to form a training sample set, the labeling method is to mark the coordinate information of five points: the centers of the three position detection patterns, the center of the alignment pattern, and the center point of the graphic code.

However, with this labeling scheme, when poor printing of the graphic code causes abnormal imaging, the labeled centers of the position detection patterns degrade the trained model, and obvious coordinate deviations then occur when the graphic code is positioned and detected by the trained model, so the recognition accuracy of the graphic code is low.

And thirdly, integrating the labeled graphic codes and the labeling information corresponding to each graphic code into a training sample set, wherein the training sample set comprises a plurality of sample graphic codes.

It should be noted that the key point detection model needs to be iteratively trained multiple times to adjust the loss function value until it meets the training stop condition, which yields the trained key point detection model. If only one sample graphic code were input in each training iteration, the sample amount would be too small for effective training adjustment of the model, so the multiple sample graphic codes in the training sample set are used to iteratively train the key point detection model.

And fourthly, training a preset key point detection model by using the sample graphic codes in the training sample set until the training stopping condition is met, and obtaining the trained key point detection model.

It should be noted that, as shown in fig. 4, the keypoint detection model includes a first network and a second network, where the first network includes a convolution cascade combination, the convolution cascade combination includes a first convolution layer, a normalization layer, and an activation layer, and the second network includes a maximum pooling layer and a second convolution layer.

In the prior art, after the first network extracts image features with convolutional layers, the convolutional neural network is followed by a fully connected layer and then an activation function. The fully connected layer reduces the dimensionality of the key point feature map, which involves a large number of parameters and easily causes overfitting. In the key point detection model of the embodiment of the application, the second network replaces the usual fully connected layer after the convolutional backbone, so the whole model is a CNN without fully connected layers. The second network is composed of a pooling layer and a convolutional layer; compared with a fully connected layer, it has the following advantages (a minimal model sketch is given after the list):

firstly, a fully connected layer requires a large number of parameters to train and tune, and the replacement greatly reduces the parameter count, so the model is more robust and resists overfitting better;

and secondly, the second network preserves the spatial information extracted by each convolutional layer and pooling layer, so the effect is noticeably improved in practical applications.
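For illustration, here is a minimal PyTorch sketch of such a model. The channel sizes, depths, and the 9-channel output layout (1 confidence plus 4 corner (x, y) offsets per grid cell) are assumptions, and the multi-scale feature fusion described elsewhere in this application is omitted for brevity; the Sigmoid/tanh activations follow the loss description below.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=1):
    """One convolution cascade: first conv layer + normalization + activation."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class KeypointDetector(nn.Module):
    """Sketch: first network = stacked conv cascades extracting depth features;
    second network = max-pooling layer + a second conv layer that replaces
    the usual fully connected head."""
    def __init__(self):
        super().__init__()
        self.first_network = nn.Sequential(
            conv_block(3, 16), conv_block(16, 32, stride=2),
            conv_block(32, 64, stride=2), conv_block(64, 128, stride=2),
        )
        self.second_network = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 9, kernel_size=1),  # no fully connected layer
        )

    def forward(self, x):
        features = self.first_network(x)
        out = self.second_network(features)   # (B, 9, S, S) grid of predictions
        conf = torch.sigmoid(out[:, 0:1])     # confidence via Sigmoid
        corners = torch.tanh(out[:, 1:9])     # corner offsets via tanh
        return conf, corners
```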

Specifically, the training process of the keypoint detection model may have the following steps:

and 4.1, extracting a plurality of depth features of the sample graphic code by utilizing a first network in a preset key point detection model to form a key point feature map, wherein the sizes of the depth features are different.

And 4.2, inputting the feature maps into the second network in the preset key point detection model for fusion, and determining the prediction information corresponding to the sample graphic code, wherein the prediction information comprises the prediction confidence of the sample graphic code and the four predicted corner point coordinate information.

Specifically, the prediction confidence may be obtained by:

inputting the training sample set into a preset key point detection model, and determining the coordinate information of a predicted central point of a target sample graphic code, wherein the target sample graphic code is any one sample in the training sample set;

determining the position of a predicted central point of the target sample graphic code according to the coordinate information of the predicted central point;

and determining the prediction confidence of the target sample graphic code according to the positional relation between the predicted center point position and a target grid, wherein the target grid is a preset area determined according to the center point coordinate information in the label information of the target sample graphic code (a minimal sketch of this assignment follows).
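As an illustration only, one way this assignment could work, assuming the target grid is the grid cell containing the labeled center; the grid size and function signature are assumptions:

```python
import numpy as np

def confidence_target(pred_center, label_center, grid_size, img_size):
    """Return 1.0 if the predicted center falls inside the target grid cell
    (the preset area containing the labeled center point), else 0.0."""
    cell = img_size / grid_size
    target_cell = (int(label_center[0] // cell), int(label_center[1] // cell))
    pred_cell = (int(pred_center[0] // cell), int(pred_center[1] // cell))
    return 1.0 if pred_cell == target_cell else 0.0
```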

4.3, calculating a loss value between the prediction information and the label information of the sample graphic code.

In one embodiment, the method for determining the loss function value of the preset keypoint detection model may include the following steps:

calculating the regression loss of the prediction confidence by using a cross entropy function to obtain a first loss function value;

calculating the regression loss of the predicted corner point coordinate information by using a mean square error function to obtain a second loss function value;

and determining a loss function value of the preset key point detection model according to the first loss function value and the second loss function value.

Specifically, binary cross-entropy is used to calculate the classification loss of the confidence; the confidence error is a weighted average of the loss for grid cells that contain a graphic code and those that do not, which can be expressed by Equation 1.
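The source page renders Equation 1 as an image that is not reproduced in the text. A plausible reconstruction from the surrounding description (a weighted combination of binary cross-entropy terms over the S×S grid cells, split by whether a cell contains a target; the symbols C_i, Ĉ_i and the indicator notation are assumptions) is:

$$
\mathrm{Loss}_{conf}=-\lambda_{obj}\sum_{i=1}^{S^{2}}\mathbb{1}_{i}^{obj}\Big[\hat{C}_{i}\log C_{i}+(1-\hat{C}_{i})\log(1-C_{i})\Big]-\lambda_{noobj}\sum_{i=1}^{S^{2}}\mathbb{1}_{i}^{noobj}\Big[\hat{C}_{i}\log C_{i}+(1-\hat{C}_{i})\log(1-C_{i})\Big]\qquad\text{(Equation 1)}
$$

where C_i is the predicted confidence of grid cell i, Ĉ_i is its target value, and the indicators select cells with and without a graphic code.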

Here λ_obj denotes the weight coefficient for cells containing a target and λ_noobj the weight coefficient for cells without a target. In the examples of the present application, the two weight coefficients are set at a ratio of 1:100, so that the confidence of cells without a target is adjusted to approach 0 and the confidence of cells with a target approaches 1.

In addition, the activation function for the confidence is a Sigmoid function, and the corner point coordinate information uses a tanh function. The regression loss on the four corner coordinates of objects containing a target is the mean square error, that is, the average of the squared differences between the predicted values and the target values. To balance the weights of the two loss functions, the loss function of the corner coordinate information is multiplied by the height and width of the grid, which can be expressed by Equation 2.
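Equation 2 is likewise an image in the source page. A plausible reconstruction consistent with the description (mean squared error over the four predicted corner coordinates of cells containing a target, scaled by the grid width w and height h; the symbol names are assumptions) is:

$$
\mathrm{Loss}_{prexy}=w\,h\sum_{i=1}^{S^{2}}\mathbb{1}_{i}^{obj}\,\frac{1}{4}\sum_{k=1}^{4}\Big[(x_{i,k}-\hat{x}_{i,k})^{2}+(y_{i,k}-\hat{y}_{i,k})^{2}\Big]\qquad\text{(Equation 2)}
$$

where (x_{i,k}, y_{i,k}) is the k-th predicted corner of cell i and (x̂_{i,k}, ŷ_{i,k}) the corresponding labeled corner.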

The total loss function value of the key point detection model is the sum of the above confidence loss and the corner coordinate loss, which can be represented by Equation 3.

$$
\mathrm{Loss}=\mathrm{Loss}_{conf}+\mathrm{Loss}_{prexy}\qquad\text{(Equation 3)}
$$

The key point detection model is optimized according to the loss functions shown in Equations 1-3: the network parameters are updated backward with a gradient descent algorithm to obtain an updated key point detection model, and the optimization training stops when the loss function value is smaller than a preset value, yielding the trained key point detection model.
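A hedged sketch of one training step implementing Equations 1-3 in PyTorch follows. It assumes the `KeypointDetector` sketch above; corner targets are assumed normalized into the tanh range (-1, 1). The direction of the 1:100 weight ratio is ambiguous in the source, so λ_noobj = 0.01·λ_obj is assumed here.

```python
import torch

model = KeypointDetector()  # from the model sketch above (an assumption)
bce = torch.nn.BCELoss(reduction="none")
mse = torch.nn.MSELoss(reduction="none")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(images, conf_targets, corner_targets, grid_w, grid_h,
               lambda_obj=1.0, lambda_noobj=0.01):
    conf, corners = model(images)             # (B,1,S,S), (B,8,S,S)
    obj_mask = conf_targets > 0.5             # cells that contain a target

    conf_loss = bce(conf, conf_targets)
    loss_conf = (lambda_obj * conf_loss[obj_mask].sum()
                 + lambda_noobj * conf_loss[~obj_mask].sum())      # Equation 1

    corner_loss = mse(corners, corner_targets).mean(dim=1)         # per-cell MSE
    loss_prexy = grid_w * grid_h * corner_loss[obj_mask.squeeze(1)].sum()  # Equation 2

    loss = loss_conf + loss_prexy                                  # Equation 3
    optimizer.zero_grad()
    loss.backward()   # backward update of the network parameters
    optimizer.step()  # gradient descent step
    return loss.item()
```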

It should be noted that, in order to improve the accuracy of the key point detection model, the key point detection model may also be continuously trained by using new training samples in practical applications, so as to continuously update the key point detection model, improve the accuracy of the key point detection model, and further improve the accuracy of image processing.

The above is a specific implementation of the method for training a key point detection model provided in the embodiment of the present application; the key point detection model obtained through this training can be applied in the image processing method provided in the following embodiments.

A specific implementation of the image processing method provided in the present application is described in detail below with reference to fig. 5.

As shown in fig. 5, an embodiment of the present application provides an image processing method, including:

s501, acquiring an image to be detected.

In some embodiments, the image to be detected may be acquired by a camera, or a frame extraction process may be performed on a pre-acquired video to determine the image to be detected.
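Purely as an illustration of the frame-extraction option, a minimal OpenCV sketch; the sampling step and function name are assumptions:

```python
import cv2

def frames_to_detect(video_path, step=10):
    """Yield every `step`-th frame of a pre-acquired video as an image
    to be detected (the step value is an assumption)."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame
        idx += 1
    cap.release()
```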

S502, inputting the image to be detected into a pre-trained key point detection model, and determining a reference confidence of an object in the image to be detected and multiple groups of reference corner point coordinate information of the object, wherein the reference confidence represents the probability that the object is a graphic code, the key point detection model is obtained by training according to a training sample set, the training sample set comprises a plurality of sample graphic codes and label information of each sample graphic code, and the label information comprises a plurality of corner point coordinate information and center point coordinate information of the sample graphic codes.

In order to improve the generalization and robustness of the key point detection model, in the embodiment of the application a large number of randomly generated synthetic graphic codes are used to simulate the poor printing and abnormal imaging that may occur on an industrial production line. The synthetic graphic codes are randomly projected onto backgrounds from the COCO public dataset and then combined, through random transformations such as scaling, blurring, distortion, noise, and smudging, with a large number of graphic codes from real industrial scenes. The combined graphic codes are used as the original graphic codes, simulating the transformations that may occur in real industry for subsequent sample labeling.

After the original graphic codes are obtained, they are labeled manually; the labeled content is a label phrase (label) and the four corner point coordinate information of each graphic code. In order to ensure labeling accuracy, only the label and the four corner point coordinate information are annotated manually; the center point coordinate information is generated, based on a transformation matrix, from the version information of the graphic code and the four corner point coordinate information, where the version information is determined by the label of the graphic code. The graphic code carrying the center point coordinate information and the four corner point coordinate information is taken as a sample graphic code, and a plurality of sample graphic codes form the training sample set.

In the above S502, the keypoint detection model includes a first network and a second network, where the first network includes a convolution cascade combination, the convolution cascade combination includes a first convolution layer, a normalization layer, and an activation layer, and the second network includes a maximum pooling layer and a second convolution layer;

the above inputting the image to be detected into the pre-trained key point detection model, determining the reference confidence of the object in the image to be detected and the multi-group reference corner coordinate information of the object, may include:

inputting an image to be detected into a first network, and determining a plurality of depth features of the image to be detected, wherein the sizes of the depth features are different;

and inputting the depth features into a second network for fusion, and determining the reference confidence of the object in the image to be detected and the multiple groups of reference corner point coordinate information of the object.

In the training-sample labeling scheme of the key point detection model provided by the embodiment of the application, the image is labeled with the four corner points of a polygonal annotation of the graphic code, and the labeling of the center point of the graphic code is completed according to the inherent properties of the graphic code, which greatly improves both the training process and the detection performance of the whole model. Meanwhile, the model structure of the key point detection model contains no fully connected layer: on the basis of retaining the multiple depth features extracted by the first network, the structure based on the second network greatly reduces the parameter count, so the model resists overfitting better and can be well applied to the accurate positioning of graphic codes in industry.

And S503, determining the object as the graphic code to be positioned under the condition that the reference confidence is greater than a preset confidence threshold.

In the embodiment of the application, double detection is performed on the confidence and the corner point coordinate information of the graphic code to be positioned, so the position of the graphic code is accurately located while the probability of misrecognizing the graphic code is reduced, improving the recognition accuracy of the graphic code.

S504, a plurality of reference frames determined by a plurality of groups of reference corner point coordinate information are screened, and frame position information of the graphic code to be positioned is determined, wherein the frame position information comprises a plurality of corner point coordinate information.

In the above S504, the screening a plurality of reference frames determined by the plurality of sets of reference corner point coordinate information to determine frame position information of the graphic code to be positioned may include:

obtaining local maximum values of the plurality of reference frames by adopting a non-maximum suppression algorithm;

and screening the plurality of reference frames according to the local maximum values and a preset intersection-over-union (IoU) threshold value until no overlapping frames remain, and determining the frame position information of the graphic code to be positioned.

Specifically, non-maximum suppression uses a local maximum search to determine the most accurate bounding frame among the plurality of reference frames of the graphic code to be positioned. Local peak extreme points can be found with a sliding window, but the window size determines the quality of the algorithm, and the window method copes poorly with plateaus of contiguous equal peaks. The embodiment of the application instead adopts a morphological idea: the original map is dilated, merging adjacent local maxima that are closer together than the dilation size, and positions where the original map equals the dilated map are returned as local maxima (a sketch of this screening follows).
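The following is a hedged sketch of this screening, assuming quadrilateral reference frames given as 4×2 corner arrays with associated confidences. For simplicity the IoU is computed on the axis-aligned bounding box of each quadrilateral (a polygon IoU could be substituted), and the dilation-based local-maximum search operates on a confidence score map.

```python
import cv2
import numpy as np

def nms_quads(corner_sets, scores, iou_thresh=0.5):
    """Non-maximum suppression over reference frames given as sets of four
    corner points. Returns the indices of the kept frames."""
    boxes = np.array([[q[:, 0].min(), q[:, 1].min(),
                       q[:, 0].max(), q[:, 1].max()] for q in corner_sets])
    order = np.argsort(scores)[::-1]   # highest-confidence frames first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top frame with each remaining frame
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]  # drop overlapping frames
    return keep

def local_maxima(score_map, size=3):
    """Morphological local-maximum search: dilate the score map and keep
    positions where the original map equals the dilated map."""
    dilated = cv2.dilate(score_map, np.ones((size, size), np.uint8))
    return np.argwhere(score_map == dilated)
```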

In the image processing method provided by the embodiment of the application, multiple corner point coordinates of each sample graphic code are labeled in the training samples of the key point detection model, so image processing is not limited to a rectangular frame: the frame position of the graphic code is located through multiple corner point coordinates, and graphic codes affected by abnormalities such as smudging, uneven illumination, and irregular geometric distortion can be located well. In addition, the double detection of the confidence and the corner point coordinate information of the graphic code to be positioned reduces the probability of misrecognizing the graphic code while accurately locating its position, improving the recognition accuracy of the graphic code.

Based on the same inventive concept of the image processing method, the embodiment of the application also provides an image processing device.

As shown in fig. 6, an embodiment of the present application provides an image processing apparatus including:

an obtaining module 601, configured to obtain an image to be detected;

the processing module 602 is configured to input the image to be detected into a pre-trained keypoint detection model, and determine a reference confidence of an object in the image to be detected and multiple sets of reference corner coordinate information of the object, where the reference confidence represents the probability that the object is a graphic code, the keypoint detection model is obtained by training according to a training sample set, the training sample set includes multiple sample graphic codes and label information of each sample graphic code, and the label information includes multiple corner coordinate information and center point coordinate information of the sample graphic codes;

a first determining module 603, configured to determine that the object is a graphic code to be positioned when the reference confidence is greater than a preset confidence threshold;

the second determining module 604 is configured to screen multiple reference frames determined by multiple sets of reference corner coordinate information, and determine frame position information of the graphic code to be positioned, where the frame position information includes multiple corner coordinate information.

In some embodiments, the second determining module may be specifically configured to:

obtaining local maximum values of the plurality of reference frames by adopting a non-maximum suppression algorithm;

and screening the plurality of reference frames according to the local maximum values and a preset intersection-over-union (IoU) threshold value until no overlapping frames remain, and determining the frame position information of the graphic code to be positioned.

In some embodiments, the keypoint detection model comprises a first network comprising a convolutional cascaded combination comprising a first convolutional layer, a normalization layer, and an activation layer, and a second network comprising a max-pooling layer and a second convolutional layer;

the processing module may specifically be configured to:

inputting an image to be detected into a first network, and determining a plurality of depth features of the image to be detected, wherein the sizes of the depth features are different;

and inputting the depth features into a second network for fusion, and determining the reference confidence of the object in the image to be detected and the multiple groups of reference corner point coordinate information of the object.

In some embodiments, the apparatus may further comprise:

the second acquisition module is used for acquiring a training sample set, wherein the training sample set comprises a plurality of sample graphic codes and label information corresponding to each sample graphic code, and the label information is marked with four corner point coordinate information and central point coordinate information of the sample graphic codes;

and the training module is used for training the preset key point detection model by using the sample graphic code in the training sample set until the training stopping condition is met, so as to obtain the trained key point detection model.

In some embodiments, the apparatus may further include a labeling module, and the labeling module may be specifically configured to:

acquiring a plurality of original graphic codes;

for each original graphic code, the following steps are respectively executed:

marking the version information and the four corner point coordinate information of the original graphic code;

generating central point coordinate information of the original graphic code according to the version information and the four corner point coordinate information;

and determining the original graphic code marked with the coordinate information of the four corner points and the coordinate information of the central point as a sample graphic code.

In some embodiments, the training module may be specifically configured to:

inputting the training sample set into a preset key point detection model, and determining the prediction information of each sample graphic code, wherein the prediction information comprises the prediction confidence of the sample graphic code and the four predicted corner point coordinate information;

determining a loss function value of a preset key point detection model according to the prediction information and the label information of each sample graphic code;

and under the condition that the loss function value does not meet the training stop condition, adjusting model parameters of the key point detection model, and training the adjusted key point detection model by using the sample graphic codes until the loss function value meets the training stop condition, to obtain the trained key point detection model.

In some embodiments, the training module may be specifically configured to:

inputting the training sample set into a preset key point detection model, and determining the coordinate information of a predicted central point of a target sample graphic code, wherein the target sample graphic code is any one sample in the training sample set;

determining the position of a predicted central point of the target sample graphic code according to the coordinate information of the predicted central point;

and determining the prediction confidence of the target sample graphic code according to the positional relation between the predicted center point position and a target grid, wherein the target grid is a preset area determined according to the center point coordinate information in the label information of the target sample graphic code.

In some embodiments, the training module may be specifically configured to:

calculating the regression loss of the prediction confidence by using a cross entropy function to obtain a first loss function value;

calculating the regression loss of the predicted corner point coordinate information by using a mean square error function to obtain a second loss function value;

and determining a loss function value of the preset key point detection model according to the first loss function value and the second loss function value.

Other details of the image processing apparatus according to the embodiment of the present application are similar to those of the image processing method according to the embodiment of the present application described above with reference to fig. 5, and are not repeated herein.

Fig. 7 shows a hardware structure diagram of an image processing device provided by an embodiment of the present application.

The image processing method and apparatus provided according to the embodiments of the present application described in conjunction with fig. 5 and 6 may be implemented by an image processing device. Fig. 7 is a diagram showing a hardware structure 700 of an image processing device according to an embodiment of the present application.

A processor 701 and a memory 702 storing computer program instructions may be included in the image processing device.

Specifically, the processor 701 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.

Memory 702 may include mass storage for data or instructions. By way of example, and not limitation, memory 702 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. In one example, memory 702 may include removable or non-removable (or fixed) media, or memory 702 is non-volatile solid-state memory. The memory 702 may be internal or external to the image processing device.

In one example, the memory 702 may be a read-only memory (ROM). In one example, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.

The processor 701 reads and executes the computer program instructions stored in the memory 702 to implement the methods/steps S501 to S504 in the embodiment shown in fig. 5, and achieve the corresponding technical effects achieved by the embodiment shown in fig. 5 executing the methods/steps thereof, which are not described herein again for brevity.

In one example, the image processing device may also include a communication interface 703 and a bus 710. As shown in fig. 7, the processor 701, the memory 702, and the communication interface 703 are connected by a bus 710 to complete mutual communication.

The communication interface 703 is mainly used for implementing communication between modules, apparatuses, units and/or devices in this embodiment of the application.

Bus 710 comprises hardware, software, or both that couple the components of the image processing device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Extended (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Bus 710 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.

The image processing device provided by the embodiment of the application labels multiple corner point coordinates of each sample graphic code in the training samples of the key point detection model, so image processing is not limited to a rectangular frame: the frame position of the graphic code is located through multiple corner point coordinates, and graphic codes affected by abnormalities such as smudging, uneven illumination, and irregular geometric distortion can be located better.

In addition, in combination with the image processing method in the foregoing embodiments, the embodiments of the present application may be implemented by providing a computer storage medium. The computer storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the image processing methods in the above embodiments.

It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet or an intranet.

It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.

As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.
