Deep learning object classification method based on multi-channel model fusion

Document No.: 105807 | Publication date: 2021-10-15 | Original language: Chinese

Reading note: This technology, "A deep learning object classification method based on multi-channel model fusion", was designed and created by 安康, 李国承, 林雪松, 刘翔鹏 and 李一染 on 2021-04-02. Its main content is as follows: the invention discloses a deep learning object classification method based on multi-channel model fusion, comprising the following steps: place the article to be classified at the center of the turntable of a visual image acquisition system, and start the system to acquire images of the article; input the images acquired by each image acquisition device into the first processing models corresponding to that device to obtain a prediction probability matrix for each device; fuse the prediction probability matrices to obtain a fusion matrix; and input the fusion matrix into a second processing model to obtain the classification result of the article. By establishing models in which multiple networks extract and fuse features for multiple feature planes, the method has stronger feature extraction capability than the prior art; compared with traditional methods it has higher precision, greatly improving classification accuracy for similar parts in particular, and it has strong application prospects.

1. A deep learning object classification method based on multi-channel model fusion is characterized by comprising the following steps:

(1) placing the article to be classified at the center of the turntable of a visual image acquisition system, and starting the visual image acquisition system to acquire images of the article to be classified;

(2) inputting the images acquired by each image acquisition device into the first processing models corresponding to that device, respectively, to obtain a prediction probability matrix corresponding to each image acquisition device;

(3) fusing the prediction probability matrixes obtained in the step (2) to obtain a fusion matrix;

(4) inputting the fusion matrix obtained in the step (3) into a second processing model to obtain a classification result of the articles to be classified;

the visual image acquisition system comprises a turntable for placing articles to be classified, wherein the turntable is connected with a turntable driving device and can horizontally rotate under the driving of the turntable driving device;

more than two image acquisition devices, whose visual centers are aligned with the center of the turntable, are arranged around the turntable; the image acquisition devices are positioned above the turntable, arranged in different directions from it, and at different heights above it;

the image acquisition equipment and the turntable driving device are respectively connected with the central processing unit;

each first processing model corresponds one-to-one to an image acquisition device; the first processing model is a Densenet model, and its training process continuously adjusts the model parameters, taking images of articles of known class as input and the corresponding class probabilities of the articles as theoretical output; training terminates when the upper limit of training iterations is reached;

the second processing model is a BP neural network, and its training process continuously adjusts the model parameters, taking the fusion matrix of the images of an article of known class as input and the corresponding class probability of the article as theoretical output; training terminates when the upper limit of training iterations is reached; the fusion matrix of the images of an article of known class is the matrix obtained by fusing the prediction probability matrices produced by inputting each image of the article into its first processing model.

2. The method for classifying deep learning objects based on multi-channel model fusion as claimed in claim 1, wherein the images of objects of the known category are obtained by placing the objects of the known category at the center of the turntable of the visual image acquisition system and starting the visual image acquisition system to acquire them.

3. The method for classifying deep learning objects based on multi-channel model fusion as claimed in claim 1, wherein, by rotating the turntable through the turntable driving device, the central processing unit can acquire pictures of the article at different angles through the image acquisition devices, and can complete data enhancement after processing the acquired pictures of the article at different angles.

4. The deep learning object classification method based on multi-channel model fusion as claimed in claim 1, characterized in that a frame is sleeved outside the turntable, and each image acquisition device and each light source are fixed on the frame;

and a black back plate is arranged at the bottom of the turntable.

5. The deep learning object classification method based on multi-channel model fusion is characterized in that the light source is arranged above the turntable, and the frame is sheathed with a soft light cover;

the frame is a square frame;

the turntable driving device is a driving motor;

there are three image acquisition devices in total, arranged respectively on side A of the frame, side B, and the top; the height differences between the three image acquisition devices and the turntable are all different, and side A is perpendicular to side B.

6. The deep learning object classification method based on multi-channel model fusion is characterized in that the corners of the frame have rounded-corner transitions; the frame is formed by fixedly splicing a plurality of aluminum alloy square tubes.

7. The method for deep learning object classification based on multi-channel model fusion as claimed in claim 2, wherein the first processing model is specifically a Densenet121 network model;

the set of data sets used to train the first processing model includes images of items of a known class acquired by the corresponding image acquisition device and their corresponding classes.

8. The method according to claim 2, wherein a set of data used for training the second processing model includes the fusion matrix obtained by inputting the images of an object of known class, acquired by each image acquisition device in the visual image acquisition system, into the corresponding first processing models, together with the class corresponding to that fusion matrix.

9. The method for deep learning object classification based on multi-channel model fusion as claimed in claim 1, wherein the image collected by the image collecting device needs to be preprocessed before application as follows:

(1) graying;

(2) removing image noise;

(3) using a canny operator to carry out edge detection;

(4) finding the minimum circumscribed square according to the edge, and cropping the image to that square;

(5) scaling the square image to a suitable size by bilinear interpolation.

Technical Field

The invention belongs to the technical field of visual detection, relates to a deep learning object classification method based on multi-channel model fusion, and particularly relates to a method for completing object classification by applying a convolutional neural network based on model fusion after multi-surface object images are collected.

Background

With the rapid development of multimedia technology and the Internet, the classification of object images has become a research hotspot at home and abroad. Fast, high-precision image classification and recognition algorithms are a basic premise for many practical applications, so research on image classification is of great significance.

The biggest difference between deep learning and traditional image classification methods is that deep learning automatically learns features from big data instead of using manually designed features, and good features can greatly improve image recognition performance. Deep learning can automatically learn a feature representation, which may contain thousands of parameters, from large data. Manually designing effective features is a rather lengthy process: reviewing the history of computer vision, it often takes five to ten years to develop a well-recognized feature. Deep learning, by contrast, can quickly learn new and effective feature representations from training data for new applications.

Classification methods based on deep learning have seen some application, but suffer from low precision in certain scenes. Deep-learning-based object classification faces the following problems when classifying similar items: articles of the same broad class generally share the same overall characteristics, with differences often lying in details that are hard to capture in a single feature plane, yet current algorithms do not consider multiple feature planes; at the same time, the objects must be sorted before classification, which on the one hand increases cost, and on the other limits applicability, making classification of highly similar objects difficult to realize.

Therefore, the development of a method which can realize the rapid, accurate and low-cost classification of articles and has good adaptability is of great practical significance.

Disclosure of Invention

The invention aims to overcome the defects of poor identification and classification performance and poor applicability of existing methods, and provides a method that can classify articles quickly, accurately and at low cost, with good applicability.

In order to achieve the purpose, the invention provides the following technical scheme:

a deep learning object classification method based on multi-channel model fusion comprises the following steps:

(1) placing the article to be classified at the center of the turntable of a visual image acquisition system, and starting the visual image acquisition system to acquire images of the article to be classified;

(2) inputting the images acquired by each image acquisition device into the corresponding first processing model, respectively, to obtain a prediction probability matrix corresponding to each image acquisition device;

(3) fusing the prediction probability matrixes obtained in the step (2) to obtain a fusion matrix;

(4) inputting the fusion matrix obtained in the step (3) into a second processing model to obtain a classification result of the articles to be classified;

the visual image acquisition system comprises a turntable for placing articles to be classified, wherein the turntable is connected with a turntable driving device and can horizontally rotate under the driving of the turntable driving device;

more than two image acquisition devices, whose visual centers are aligned with the center of the turntable, are arranged around the turntable; the image acquisition devices are positioned above the turntable, arranged in different directions from it, and at different heights above it;

the image acquisition equipment and the turntable driving device are respectively connected with the central processing unit;

each first processing model corresponds one-to-one to an image acquisition device; the first processing model is a Densenet model, and its training process continuously adjusts the model parameters, taking images of articles of known class as input and the corresponding class probabilities of the articles as theoretical output; training terminates when the upper limit of training iterations is reached;

the second processing model is a BP neural network (the initial weights W are randomly initialized from a Gaussian distribution); its training process continuously adjusts the model parameters, taking the fusion matrix of the images of an article of known class as input and the corresponding class probability of the article as theoretical output; training terminates when the upper limit of training iterations (for example, 50) is reached. The fusion matrix of the images of an article of known class is the matrix obtained by fusing the prediction probability matrices produced by inputting each image of the article into its first processing model. The BP neural network uses the Adam optimizer with a learning rate of 0.001.

Rotating the turntable changes the pose of the part and increases data richness and reliability. Compared with traditional machine vision, which recognizes parts using only a top-view image, multi-view images provide more part information, helping the deep learning model learn the part more completely, which in turn prevents overfitting and improves the model's generalization ability.

The deep learning object classification method based on multi-channel model fusion of the invention addresses the problems that objects to be classified often have highly similar geometric structures, that top views of objects in the same category can be completely identical, and that many industrial materials are hard to distinguish when photographed from a single angle. A general visual image acquisition system acquires object images from multiple angles (giving stronger feature extraction capability and more acquired features); a dedicated Densenet model then processes the images from each image acquisition device (each Densenet model corresponding one-to-one to an image acquisition device); and after fusion the result is input into a BP neural network, completing deep learning based on multi-model fusion for object recognition, with high classification accuracy and broad application prospects.

As a preferred technical scheme:

in the deep learning object classification method based on multi-channel model fusion, the images of objects of known class are obtained by placing the object of known class at the center of the turntable of the visual image acquisition system and starting the system to acquire them; that is, the class probability corresponding to each object image is determined.

According to the deep learning object classification method based on multi-channel model fusion, by rotating the turntable through the turntable driving device, the central processing unit can acquire pictures of the article at different angles through the image acquisition devices, and data enhancement can be completed after processing the acquired pictures.

Namely, data expansion is carried out as follows:

the image is preprocessed to 80 × 80 pixels, data enhancement processing is performed on each image, that is, each image is regarded as an 80 × 80 matrix, operations of shifting, rotating, mirroring and turning over are performed on each image randomly, so that the shifting and turning over are within a certain range (using the range of counterclockwise rotation and clockwise rotation to be within 10 degrees, and the proportion is horizontal shifting or vertical shifting within 0.1 range), and each image is processed to generate brand new N images, that is, the database is expanded by N times (for example, the database can be expanded by 10 times).

According to the deep learning object classification method based on multi-channel model fusion, relative to the original object image, the enhanced data is translated left/right or up/down by at most 20%, or randomly rotated clockwise or counterclockwise by at most 30°; only one feasible technical scheme is given here, and a person skilled in the art can generate enhanced data through translation and rotation operations according to actual needs.

According to the deep learning object classification method based on multi-channel model fusion, the turntable is arranged in a frame, the image acquisition equipment is fixed on the frame, and the frame is also fixed with a light source;

a black back plate is arranged below the rotary disc;

the turntable driving device is a driving motor.

According to the deep learning object classification method based on multi-channel model fusion, the light source is arranged above the turntable, and the frame is sleeved with the soft light cover;

the frame is a square frame;

there are three image acquisition devices in total, arranged respectively on side A of the frame, side B, and the top; the height differences between the three image acquisition devices and the turntable are all different, and side A is perpendicular to side B.

The deep learning object classification method based on multi-channel model fusion is characterized in that the corners of the frame have rounded-corner transitions; the frame is formed by fixedly splicing a plurality of aluminum alloy square tubes.

The method for classifying deep learning objects based on multi-channel model fusion as described above, wherein the first processing model is specifically a Densenet121 network model; based on Densenet, the network has 121 layers in total and adopts a repeated Input-BN-Dropout(0.4)-Dense100 structure (Input denotes the input layer; BN the batch normalization layer; Dropout the random weight-dropping layer, with the attached number giving the dropping proportion; and Dense the fully-connected layer, with the attached number giving its neuron count). The structure is repeated 4 times; the last layer is a Dense layer whose activation function computes the classification probability of each class with a softmax classifier:

y_i = exp(z_i) / Σ_j exp(z_j)

where z is the output vector of the final Dense layer.

According to the probabilities y_i, the final prediction result is obtained: the index i of the maximum probability y_i in the prediction probability matrix is the prediction result;
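The softmax-and-argmax prediction step can be sketched as follows (the logit values are illustrative, not taken from the patent):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# hypothetical logits from the last Dense layer for one image
logits = np.array([1.2, 0.3, 4.1, 0.8])
probs = softmax(logits)        # prediction probability vector y
pred = int(np.argmax(probs))   # index i of the maximum y_i
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but avoids floating-point overflow for large logits.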

during model training, whether training is complete is judged according to the prediction results; if so, the Densenet121 model and its parameters are saved; otherwise, training continues;

the Densenet121 network model performs feature extraction, computing the convolutional layers, fully-connected layers and pooling layers by forward propagation;

the set of data sets used to train the first processing model includes images of items of a known class acquired by the corresponding image acquisition device and their corresponding classes.

In the method for classifying deep learning objects based on multi-channel model fusion, a set of data used for training the second processing model includes the fusion matrix obtained by inputting the images of an object of known class, acquired by each image acquisition device in the visual image acquisition system, into the corresponding first processing models, together with the class corresponding to that fusion matrix.

The BP neural network body consists of 19 layers built from three layer types: the Dense layer, the Dropout layer, and the BatchNormalization layer. Each Dense layer has 400 neurons with the relu activation function; the Dropout parameter is set to 0.4 to reduce overfitting; and the BatchNormalization layer normalizes the data of each batch, preventing gradient diffusion and further improving learning efficiency.
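One Dense(400)/Dropout(0.4)/BatchNormalization block can be sketched as a plain-numpy forward pass (an illustration only, not the authors' implementation; the 1233-dimensional input assumes the embodiment's 3 cameras × 411 classes, and the Gaussian weight initialization follows the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w, b):
    """Fully-connected layer followed by relu."""
    return np.maximum(0.0, x @ w + b)

def dropout(x, rate, train=True):
    """Inverted dropout: randomly zero a `rate` fraction of activations."""
    if not train:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def batchnorm(x, eps=1e-5):
    """Normalize each feature over the batch (learned scale/shift omitted)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# one block applied to a batch of fused probability vectors
batch = rng.random((8, 1233))            # 8 samples, 3 x 411 fused dims (assumed)
w = rng.normal(0.0, 0.05, (1233, 400))   # Gaussian-initialized weights W
b = np.zeros(400)
h = batchnorm(dropout(dense_relu(batch, w, b), 0.4))
```

In a real implementation these three operations would be framework layers trained end to end with Adam at learning rate 0.001, as the text specifies.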

The input of the second processing model is the fused result of processing the pictures from the different image acquisition devices through the Densenet121 network models. Taking three image acquisition devices as an example: their images are input into the corresponding first processing models (D1, D2 and D3; the training libraries of each device's first processing model, and their use, are different and mutually independent) to obtain corresponding output vectors, which are stored and named P1, P2 and P3; the three files are then combined into an A matrix (the input of the second processing model), and the article class corresponding to each A matrix used in training (the output of the second processing model) is known.

The process of labeling the sample images is as follows: based on the image acquisition devices, the three cameras are called through an opencv function to acquire images of the objects against a black background. N kinds of objects of different shapes, materials and sizes are selected as the objects to be acquired, and the serial numbers 1–N serve as the labels for deep object learning. The images acquired by each camera are stored in folders divided into a training set and a test set; the camera folders are numbered camera1–camera3, and under each camera folder there are N folders whose names are the object numbers, each storing the object images of the corresponding category.

The fusion process (dimension extension) is specifically as follows:

Let the fused feature input be D1; then

D1 = concatenate(A, B, C)

and the processing result D1 of one image is [a1, a2, ..., an, b1, b2, ..., bn, c1, c2, ..., cn].

The fusion process is not limited to this; fusion can also be performed in the form of feature addition, specifically as follows:

D2 = A + B + C

The obtained result D2 is [a1+b1+c1, a2+b2+c2, ..., an+bn+cn].

Meanwhile, a form superposing dimension extension and feature addition can also be used, specifically as follows:

D3 = concatenate(D1, D2)

The obtained result D3 is [a1, a2, ..., an, b1, ..., bn, c1, ..., cn, a1+b1+c1, a2+b2+c2, ..., an+bn+cn].
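The three fusion modes D1, D2 and D3 can be sketched as follows (the probability vectors A, B and C are illustrative 4-class values, not data from the patent):

```python
import numpy as np

# prediction probability vectors from the three first processing models
# for one image (illustrative values for a 4-class problem)
A = np.array([0.7, 0.1, 0.1, 0.1])
B = np.array([0.6, 0.2, 0.1, 0.1])
C = np.array([0.5, 0.3, 0.1, 0.1])

D1 = np.concatenate([A, B, C])   # dimension extension: [a1..an, b1..bn, c1..cn]
D2 = A + B + C                   # feature addition: [a1+b1+c1, ..., an+bn+cn]
D3 = np.concatenate([D1, D2])    # both forms superposed
```

Dimension extension preserves which camera produced each probability, while feature addition averages out per-view noise; D3 gives the second processing model both views of the evidence at the cost of a longer input vector.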

According to the deep learning object classification method based on multi-channel model fusion, before application, the image acquired by the image acquisition device needs to be preprocessed as follows:

(1) graying;

(2) removing noise by using Gaussian blur, setting the size of a Gaussian matrix to be 9 x 9, setting standard deviation to be 0, and then performing Gaussian filtering;

(3) using the canny operator to carry out edge detection, setting its two thresholds to 25 and 150, to find the edge of the whole article;

(4) finding the minimum circumscribed square according to the edge, and cropping the image to that square;

(5) the square image is scaled to a suitable size (specifically 80 x 80 pixels) by bilinear interpolation.
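The bilinear scaling of step (5) can be sketched in numpy as below (steps (1)–(4) would typically use OpenCV functions such as cv2.cvtColor, cv2.GaussianBlur and cv2.Canny; this standalone version is illustrative):

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Scale a 2-D grayscale image with bilinear interpolation."""
    in_h, in_w = img.shape
    # map each output pixel centre back into input coordinates
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # blend the four neighbouring pixels of each output location
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

square = np.random.default_rng(0).random((240, 240))  # cropped circumscribed square
small = resize_bilinear(square, 80, 80)               # scale to 80 x 80 pixels
```

In practice the same result is obtained with cv2.resize using its default bilinear interpolation flag.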

Beneficial effects:

(1) the deep learning object classification method based on multi-channel model fusion of the invention establishes models in which multiple networks extract and fuse features for multiple feature planes, and has stronger feature extraction capability than the prior art;

(2) compared with traditional methods, the deep learning object classification method based on multi-channel model fusion of the invention has higher precision, greatly improving classification precision for similar parts in particular, and has great application prospects.

Drawings

FIG. 1 is a schematic diagram of the overall structure of a visual image acquisition system according to the present invention;

FIG. 2 is a schematic view of a data acquisition process of the visual image acquisition system of the present invention;

FIG. 3 is a schematic flow chart of the deep learning object classification method based on multi-channel model fusion according to the present invention;

FIG. 4 is a structural diagram of Densenet 121;

FIG. 5 is a BP layer network structure diagram;

FIG. 6 is a schematic diagram of the process and effect of image preprocessing;

FIG. 7 is a flow diagram of data being processed through a first processing model → a second processing model;

FIG. 8 is a graph showing the test results.

Detailed Description

The present invention will be described in more detail with reference to the accompanying drawings, in which embodiments of the invention are shown and described, and it is to be understood that the embodiments described are merely illustrative of some, but not all embodiments of the invention.

In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.

A deep learning object classification method based on multi-channel model fusion comprises the following steps as shown in FIG. 3:

(1) placing the article to be classified at the center of the turntable of a visual image acquisition system, and starting the visual image acquisition system to acquire images of the article to be classified (the data acquisition flow is shown in fig. 2);

the visual image acquisition system, as shown in fig. 1, comprises a turntable for placing the article to be imaged; the turntable is connected with a turntable driving device (a driving motor) and is driven by it to rotate horizontally; the turntable is arranged in a frame (a square frame formed by fixedly splicing a plurality of aluminum alloy square tubes, with rounded-corner transitions at the corners); a black back plate is arranged below the turntable, and the frame is sheathed with a soft light cover;

three image acquisition devices, whose visual centers are aligned with the center of the turntable, are arranged around it; the devices are positioned above the turntable and fixed on the frame, on side A, side B (side A being perpendicular to side B), and the top respectively; the height differences between the three devices and the turntable are all different; a light source is also fixed on the frame, above the turntable;

the image acquisition devices and the turntable driving device are each connected to the central processing unit; that is, by rotating the turntable through the driving device, the central processing unit can acquire pictures of the article at different angles via the image acquisition devices, and after processing these pictures data enhancement can be completed:

(2) the image acquired by the image acquisition device is preprocessed, specifically as shown in fig. 6:

(2.1) graying;

(2.2) removing noise by using Gaussian blur, setting the size of a Gaussian matrix to be 9 x 9, setting standard deviation to be 0, and then performing Gaussian filtering;

(2.3) carrying out edge detection with the canny operator, setting its two thresholds to 25 and 150, to find the edge of the whole article;

(2.4) finding the minimum circumscribed square according to the edge, and cropping the image to that square;

(2.5) scaling the square image to a proper size (specifically 80 x 80 pixels) by a bilinear interpolation method;

(3) inputting the images acquired by each image acquisition device into the corresponding first processing model respectively to obtain the prediction probability matrix corresponding to each image acquisition device, wherein the processing flows of the steps (3) to (5) are shown in fig. 7;

the first processing model is a Densenet121 network model (as shown in fig. 4); its training process continuously adjusts the model parameters, taking images of articles of known class as input and the corresponding class probabilities as theoretical output, and terminates when the upper limit of training iterations (50) is reached. A set of data used to train a first processing model is a training set obtained by acquiring 50 or more images of size 640 × 480 pixels with the visual image acquisition system and then performing data enhancement; it comprises the images of articles of known class acquired by the corresponding image acquisition device and the classes corresponding to those images. The images of articles of known class are obtained by placing the article of known class at the center of the turntable of the visual image acquisition system and starting the system to acquire it;

(4) fusing the prediction probability matrixes obtained in the step (3) to obtain a fusion matrix;

(5) inputting the fusion matrix obtained in the step (4) into a second processing model to obtain a classification result of the articles to be classified;

the second processing model is a BP neural network (as shown in fig. 5); its training process continuously adjusts the model parameters, taking the fusion matrix of the images of an article of known class as input and the corresponding class probability as theoretical output, and terminates when the upper limit of training iterations (50) is reached. The network adopts a repeated Input-BN-Dropout(0.4)-Dense100 structure (Input denotes the input layer; BN the batch normalization layer; Dropout the random weight-dropping layer, with the attached number giving the dropping proportion; and Dense the fully-connected layer, with the attached number giving its neuron count); the structure is repeated 4 times, the last layer is a Dense layer, the activation function uses softmax, and the BP neural network uses the Adam optimizer with a learning rate of 0.001. A set of data used by the second processing model comprises the fusion matrix obtained by inputting the images of an article of known class, acquired by each image acquisition device in the visual image acquisition system, into the corresponding first processing models, together with the class corresponding to that fusion matrix; specifically, the prediction probability matrices obtained from the first processing models are A1, A2 and A3, and the fusion matrix of the images of the article is the matrix B obtained by connecting A1, A2 and A3 through matrix dimension extension.

The embodiment described above specifically employs the following scheme:

Description of the acquisition apparatus:

the acquisition apparatus of this patent is shown in fig. 1 and comprises a profile support, a lighting device, cameras, a rotating chassis, an opaque cover, etc. First, 3 cameras are fixed directly above the part, obliquely at 45 degrees in front of the part, and obliquely at 45 degrees behind the part, respectively. The opaque cover is then closed, the part is placed inside, and an opencv function is used to open the camera image stream; the brightness of the lighting device is adjusted so that the part image is clear.

Data acquisition and processing:

411 parts of different shapes, materials and sizes are selected as the objects to be collected in this embodiment, numbered 001-411; the numbers serve as labels for deep learning. First, three cameras are called through opencv functions to obtain images of each part against a black background. Each of the 3 cameras acquires more than 30 pictures from its own angle, so that the feature conditions of different surfaces of the part are obtained simultaneously; this directly yields 3 data sets that do not intersect one another but belong to the same category. The images collected by each camera are stored in folders, divided into a training set and a test set. The camera folders are numbered camera1-camera3; under each camera folder there are 411 subfolders whose names are the part numbers, each storing the part images of the corresponding category. (File directory: train_data/camera1/001/1.jpg)
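The folder layout described above (one folder per camera, one numbered subfolder per part) can be sketched with the Python standard library. The directory names follow the example path train_data/camera1/001/1.jpg given in the text; the test-set folder name is an assumption:

```python
import tempfile
from pathlib import Path

# Build the dataset skeleton under a temporary root for illustration.
root = Path(tempfile.mkdtemp())

for split in ("train_data", "test_data"):        # test-set folder name assumed
    for cam in ("camera1", "camera2", "camera3"):
        for part in range(1, 412):               # parts numbered 001-411
            (root / split / cam / f"{part:03d}").mkdir(parents=True,
                                                       exist_ok=True)

# e.g. images of part 001 taken by camera 1 go into train_data/camera1/001/
sample_dir = root / "train_data" / "camera1" / "001"
```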

The resolution of the images collected by the cameras is 640 x 480; this resolution is too high and the images contain regions outside the part, so they need to be processed. First, the pictures of the training set and the test set are converted to grayscale and denoised with Gaussian blur: the size of the Gaussian kernel is set to 9 x 9 and the standard deviation to 0, and Gaussian filtering is then performed. The images are then binarized, edge detection is carried out with the Canny operator (minimum threshold 25, maximum threshold 150), and the images are cropped and scaled to 80 x 80. Data enhancement is performed on the training set: all training images undergo counterclockwise or clockwise rotation within 10 degrees and horizontal or vertical shifts within a proportion of 0.1, together with horizontal or vertical flipping, which increases the size of the training set by a factor of 10 or more.

Model training and testing:

the experimental platform is a Windows 7 system with an NVIDIA 1080 Ti graphics card and vscode software. The model network, shown in fig. 7, has two layers: the first layer consists of three juxtaposed DenseNet121 networks d1, d2 and d3. The second layer is a BP network. The BP neural network body has 19 layers, composed of three types of network layers: Dense, Dropout and BatchNormalization. Each Dense layer has 400 neurons with the relu activation function; the Dropout parameter is set to 0.4 to reduce overfitting; and the BatchNormalization layer normalizes the data of each batch, preventing gradient diffusion and further improving learning efficiency.

The processed training set data is input into the model shown in fig. 7 and trained for 50 rounds, with the learning rate set to 0.001 and the Adam optimizer used. The test set data is then input into the trained network to obtain the accuracy of the model.
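A minimal tensorflow.keras sketch of the second-stage BP network described above, consuming the concatenated three-camera probability vector and trained with Adam at learning rate 0.001, might look as follows. The exact ordering and count of the repeated blocks is an assumption, since the text gives the layer types and sizes but not the full 19-layer listing:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 411
FUSED_DIM = 3 * NUM_CLASSES  # concatenated probability rows from 3 cameras

def build_bp_head(num_blocks=4):
    """Hypothetical BP fusion head: repeated BN-Dropout-Dense blocks."""
    inp = layers.Input(shape=(FUSED_DIM,))
    x = inp
    for _ in range(num_blocks):
        x = layers.BatchNormalization()(x)  # normalize each batch
        x = layers.Dropout(0.4)(x)          # drop 40% of units vs. overfitting
        x = layers.Dense(400, activation="relu")(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_bp_head()
```

The first-stage DenseNet121 branches would feed this head their per-camera probability rows after the dimension-expansion fusion described earlier.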

A conventional deep learning classification algorithm is also selected and run on a single-camera data set to obtain the accuracy of the conventional approach, which is compared with the method provided by the invention.

The experimental results are as follows:

experiments show that the final accuracy of the conventional method, using only a single camera data set, is 84.996% after 50 rounds of training. The accuracy of the multi-channel deep learning model constructed from the three camera data sets is 89.5%. Compared with the prediction results of the conventional model, the multi-channel-input deep learning model adopted by the invention therefore has a higher accuracy, about 5% above the conventional method; the test results are shown in fig. 8 and table 1.

TABLE 1 Test-set accuracy of the multi-channel model under different fusion modes

Model class                 Additive fusion   Dimension-expansion fusion   Additive + dimension-expansion fusion
Multi-channel multi-model   89.500%           88.200%                      88.500%

Verification shows that the deep learning object classification method based on multi-channel model fusion disclosed by the invention, by establishing a model in which multiple networks extract and fuse features from multiple feature surfaces, has stronger feature extraction capability than the prior art. Compared with conventional methods it achieves higher precision, greatly improves classification accuracy especially for similar parts, and has great application prospects.

Although specific embodiments of the present invention have been described above, it will be appreciated by those skilled in the art that these embodiments are merely illustrative and various changes or modifications may be made without departing from the principles and spirit of the invention.
