Object detection system and method based on artificial neural network

Document No.: 1191918 · Publication date: 2020-08-28 · Original language: Chinese

Note: This technology, "Object detection system and method based on artificial neural network" (基于人工神经网络的物体检测的系统及方法), was created by 蒋卓键 (Jiang Zhuojian) and 陈晓智 (Chen Xiaozhi) on 2019-10-30. Its main content is as follows: An object detection system and method based on an artificial neural network. The method comprises the following steps: acquiring a three-dimensional point cloud, and acquiring a first feature map of the three-dimensional point cloud by using a backbone neural network (S101); processing the first feature map by using an attention branch neural network to acquire a second feature map, wherein each position of the second feature map comprises a predicted attention coefficient corresponding to that position, and the second feature map is also used for acquiring a loss function of the target object, the loss function being used for updating the network coefficients of the attention branch neural network (S102); and obtaining a prediction result from the second feature map, the prediction result comprising position information of the target object (S103).

1. A system for object detection based on an artificial neural network, comprising at least one processor and a lidar, wherein,

the lidar is configured to acquire a three-dimensional point cloud of a target object and input the three-dimensional point cloud to the processor;

the processor is configured to: perform three-dimensional grid division on the three-dimensional point cloud to obtain a plurality of voxels; determine, according to the point cloud density in each voxel, the point cloud feature of the position corresponding to that voxel; extract the point cloud features through a backbone network of an object detection model to generate a first feature map of the target object; generate predicted attention coefficients in the first feature map through an attention branch neural network of the object detection model; calculate, using a loss function branch neural network, the result of an attention loss function according to the predicted attention coefficients and the ground-truth attention coefficients in a sample feature map; update the predicted attention coefficients according to the result of the attention loss function, so that the predicted attention coefficients generated for the feature map portion corresponding to the visible part of the target object in a second feature map are higher than those for the feature map portion corresponding to the occluded part of the target object; and obtain a prediction result according to information of the visible part of the target object, the prediction result comprising position information of the target object.

2. The system of claim 1, wherein the processor is further configured to divide the first feature map into candidate boxes; generate, through the attention branch neural network, a predicted attention coefficient for each candidate box of the first feature map; and perform dot multiplication of the predicted attention coefficients and the first feature map to obtain the second feature map.

3. The system of claim 2, wherein the processor is further configured to compare the predicted attention coefficient of a candidate box of the second feature map with the ground-truth attention coefficient of the corresponding ground-truth box in the sample feature map; when the confidence that the predicted attention coefficient of the candidate box matches the ground-truth attention coefficient of the ground-truth box is higher than a first threshold, determine the result of the attention loss function according to the predicted attention coefficient and the ground-truth attention coefficient; and update the predicted attention coefficient according to the result of the attention loss function, so that the confidence that the predicted attention coefficient matches the ground-truth attention coefficient is higher than a second threshold.

4. The system according to any one of claims 1-3, wherein, when updating the predicted attention coefficients of the second feature map, the processor performs an exponential operation with the natural constant e as base on the updated predicted attention coefficients.

5. The system of claim 3 or 4, wherein the processor is further configured to update the predicted attention coefficient by a back propagation algorithm based on the result of the attention loss function.

6. The system according to any one of claims 3-5, wherein the attention loss function is

L_att = (1/k) Σ L_a(m_k, t_k),

where the summation is taken over the k feature points in the candidate box, L_a is the smooth L1 loss function, m_k is a predicted attention coefficient, and t_k is the corresponding ground-truth attention coefficient.

7. The system of any one of claims 1-6, wherein the processor is further configured to obtain three-dimensional point cloud data of the occluded target object;

the processor is further configured to perform three-dimensional grid division on the three-dimensional point cloud data to obtain a plurality of three-dimensional space voxels;

the processor is further used for obtaining point cloud characteristics of the voxels according to the point cloud density in each voxel;

the processor is further configured to extract the point cloud features using the backbone neural network and generate the first feature map.

8. The system of any one of claims 2-7, wherein the processor, configured to generate a predicted attention coefficient for a candidate box of the first feature map via the attention branch neural network, comprises:

the attention branch neural network generates the predicted attention coefficients through one or more of a convolution operation, a fully connected operation, and a variant of a convolution operation.

9. The system according to any one of claims 1 to 8, wherein the processor is configured to perform object detection on the target object through the artificial neural network, and obtain a three-dimensional position and a confidence of a feature map candidate box corresponding to a visible portion of the target object;

the processor is further configured to sort the candidate boxes by confidence and select the candidate boxes whose confidence is higher than a third threshold;

the processor is further configured to predict the information of the target object according to the candidate boxes whose confidence is higher than the third threshold.

10. The system of claim 9, wherein the information of the target object comprises a position and/or a size of the target object.

11. The system according to any one of claims 1-10, further comprising a display for displaying the prediction result obtained according to the second feature map.

12. A method of object detection based on an artificial neural network, the method comprising:

acquiring a three-dimensional point cloud, and acquiring a first feature map of the three-dimensional point cloud by using a backbone neural network;

processing the first feature map by using an attention branch neural network to acquire a second feature map, wherein each position of the second feature map comprises a predicted attention coefficient corresponding to that position, the second feature map is further used for acquiring a loss function of the target object, and the loss function is used for updating the predicted attention coefficients;

and obtaining a prediction result according to the second feature map, wherein the prediction result comprises the position information of the target object.

13. The method of claim 12, wherein the processing the first feature map using the attention branch neural network and obtaining a second feature map comprises:

dividing the first feature map into candidate boxes;

generating a predicted attention coefficient for the candidate box of the first feature map by the attention branch neural network;

and performing dot multiplication on the prediction attention coefficient and the first feature map to obtain the second feature map.

14. The method of claim 13, further comprising:

comparing the predicted attention coefficient of a candidate box of the second feature map with the ground-truth attention coefficient of the corresponding ground-truth box in the sample feature map;

when the confidence that the predicted attention coefficient of the candidate box matches the ground-truth attention coefficient of the ground-truth box is higher than a first threshold, determining the result of the attention loss function according to the predicted attention coefficient and the ground-truth attention coefficient;

and updating the predicted attention coefficient according to the result of the attention loss function, so that the confidence that the predicted attention coefficient matches the ground-truth attention coefficient is higher than a second threshold.

15. The method according to any one of claims 12-14, further comprising:

and when the predicted attention coefficient of the second feature map is updated, performing an exponential operation with the natural constant e as base on the updated predicted attention coefficient.

16. The method according to claim 14 or 15, wherein said updating the predicted attention coefficient according to the result of the attention loss function comprises:

and updating the prediction attention coefficient through a back propagation algorithm according to the result of the attention loss function.

17. The method according to any one of claims 14-16, wherein the attention loss function is

L_att = (1/k) Σ L_a(m_k, t_k),

where the summation is taken over the k feature points in the candidate box, L_a is the smooth L1 loss function, m_k is a predicted attention coefficient, and t_k is the corresponding ground-truth attention coefficient.

18. The method of any one of claims 12-17, wherein the acquiring the three-dimensional point cloud and obtaining the first feature map of the three-dimensional point cloud using the backbone neural network comprises:

acquiring three-dimensional point cloud data of an occluded target object;

performing three-dimensional grid division on the three-dimensional point cloud data to obtain a plurality of three-dimensional space voxels;

obtaining point cloud characteristics of the voxels according to the point cloud density in each voxel;

extracting the point cloud features by using the backbone neural network, and generating the first feature map.

19. The method of any one of claims 13-18, wherein generating, by the attention branch neural network, a predicted attention coefficient for the candidate box of the first feature map comprises:

the attention branch neural network generates the predicted attention coefficients through one or more of a convolution operation, a fully connected operation, and a variant of a convolution operation.

20. The method according to any one of claims 12-19, further comprising:

performing object detection on the target object through the artificial neural network, and obtaining the three-dimensional position and confidence of each feature map candidate box corresponding to the visible part of the target object;

sorting the candidate boxes by confidence, and selecting the candidate boxes whose confidence is higher than a third threshold;

and predicting the information of the target object according to the candidate boxes whose confidence is higher than the third threshold.

21. The method of claim 20, wherein the information of the target object comprises a position and/or a size of the target object.

22. The method according to any one of claims 12-21, further comprising:

and displaying a prediction result obtained according to the second feature map.

23. The method according to any one of claims 12-22, wherein the method is applied, in an automatic driving scene, to a detection process for a target object whose point cloud is partially occluded, the detection process comprising the following steps:

establishing an object detection model based on an artificial neural network;

inputting the three-dimensional point cloud of the target object into the object detection model, and performing three-dimensional grid division on the three-dimensional point cloud through the object detection model to obtain a plurality of voxels;

determining point cloud characteristics of the voxel positions according to the point cloud density in each voxel;

extracting the point cloud characteristics through a backbone network of the object detection model, and generating a first characteristic diagram of the target object;

generating a predicted attention coefficient in the first feature map by an attention branch neural network of the object detection model;

calculating, by using the loss function branch neural network, the result of the attention loss function according to the ground-truth attention coefficients in the sample feature map and the predicted attention coefficients;

updating the network coefficients of the attention branch neural network according to the result of the attention loss function, so that the predicted attention coefficients generated by the attention branch neural network for the feature map portion corresponding to the visible part of the target object are higher than those for the feature map portion corresponding to the occluded part of the target object;

and obtaining a prediction result according to the visible part information of the target object, wherein the prediction result comprises the position information of the target object.

24. A computer storage medium having program instructions which, when executed directly or indirectly, cause the method of any of claims 12 to 23 to be carried out.

Technical Field

The present invention relates to the field of three-dimensional object detection and deep learning technologies, and more particularly, to a system and method for object detection based on an artificial neural network.

Background

Safety is one of the most important concerns in autonomous driving. At the algorithm level, accurate perception of the surrounding environment by the unmanned vehicle is the basis for ensuring safety, so algorithm accuracy is critical. During unmanned driving, the vehicle needs to detect surrounding three-dimensional objects. At present, lidar is mostly used to detect three-dimensional objects; when a detected three-dimensional object is partially occluded, conventional detection methods perform poorly because part of the point cloud is occluded.

Therefore, how to improve the detection of occluded three-dimensional objects has become an urgent problem to be solved.

Disclosure of Invention

Compared with the prior art, the object detection system and method based on an artificial neural network provided by the present application can further improve the prediction of occluded objects.

In a first aspect, a method for object detection based on an artificial neural network is provided, the method comprising: acquiring a three-dimensional point cloud, and acquiring a first feature map of the three-dimensional point cloud by using a backbone neural network; processing the first feature map by using an attention branch neural network to acquire a second feature map, wherein the second feature map is also used for acquiring a loss function of the target object, and the loss function is used for updating the network coefficients of the attention branch neural network; and obtaining a prediction result according to the second feature map, wherein the prediction result comprises position information of the target object.

Optionally, a second feature map having a predicted attention coefficient at each position may be obtained from a first feature map determined from the three-dimensional point cloud data of an occluded target object. After the predicted attention coefficients are corrected or updated according to the attention loss function generated from the predicted and ground-truth attention coefficients, the predicted attention coefficients of the visible part of the occluded target object become higher than those of the occluded part, so that the visible part can be exploited to a greater extent during prediction to more accurately predict information such as the position and size of the target object.

It should be understood that the artificial-neural-network-based object detection method provided by the embodiments of the present application may be applied in the field of automatic driving of unmanned devices such as unmanned aerial vehicles or unmanned vehicles, to predict obstacles (such as other vehicles, pedestrians, and the like) in the surrounding environment of the unmanned movable device, where the obstacles (i.e., target objects) may be partially occluded. According to the method provided by the embodiments of the present application, an object detection model based on an artificial neural network, capable of obtaining the position information, size information, and the like of a target object from the information of the visible part of the occluded target object, can be trained through deep learning. During training, the object detection model assigns more weight to the information of the visible part of the target object based on an attention mechanism; that is, the model is more sensitive to the information of the visible part, so that in subsequent prediction it can more accurately acquire the information of the target object from the visible part.

With reference to the first aspect, in certain implementations of the first aspect, the processing the first feature map using the attention branch neural network and obtaining a second feature map includes: dividing the first feature map into candidate boxes; generating, by the attention branch neural network, predicted attention coefficients for the candidate boxes of the first feature map, wherein the value of the predicted attention coefficient of each candidate box is determined from a sample feature map that matches the first feature map; and performing dot multiplication of the predicted attention coefficients and the first feature map to obtain the second feature map.
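The dot-multiplication step above can be illustrated with a minimal sketch. The grid sizes, the coefficient values, and the helper name `apply_attention` are illustrative assumptions, not part of the claimed implementation:

```python
# Hypothetical sketch: forming the second feature map by element-wise
# ("dot") multiplication of the first feature map with the predicted
# attention coefficients. All values here are made up for illustration.

def apply_attention(feature_map, attention):
    """Multiply a 2-D feature map element-wise by same-shaped coefficients."""
    return [[f * a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(feature_map, attention)]

# Toy 2x3 first feature map; higher attention on the "visible" left columns.
first = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
attn  = [[0.9, 0.8, 0.1],
         [0.9, 0.7, 0.2]]
second = apply_attention(first, attn)
# responses at low-attention (occluded) positions are suppressed
```

In the resulting second feature map, responses at positions with low predicted attention (the occluded part) are suppressed relative to the visible part.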

Optionally, before the detection model is trained, a sample library may be established. The sample library may include sample feature maps of the target object, each containing ground-truth attention coefficients; for example, the ground-truth attention coefficient corresponding to the visible part of the target object in a sample feature map is higher than that of the occluded part.

Alternatively, the sample feature map of the target object may carry the same point cloud feature information at each position as the first feature map acquired by the detection model, differing only in the attention coefficients generated for each position.

With reference to the first aspect, in certain implementations of the first aspect, the method further includes: comparing the predicted attention coefficient of a candidate box of the second feature map with the ground-truth attention coefficient of the corresponding ground-truth box in the sample feature map; when the confidence that the predicted attention coefficient of the candidate box matches the ground-truth attention coefficient of the ground-truth box is higher than a first threshold, determining the result of the attention loss function according to the predicted attention coefficient and the ground-truth attention coefficient; and updating the coefficients of the attention branch neural network according to the result of the attention loss function, so that their confidence is higher than a second threshold.

Alternatively, the second threshold may be higher than the first threshold. In other words, after the prediction attention coefficient is updated, the values of the prediction attention coefficient and the true value attention coefficient corresponding thereto are closer. The values of the first threshold and the second threshold may be flexibly set, which is not limited in the embodiment of the present application.

With reference to the first aspect, in certain implementations of the first aspect, the method further includes: when the predicted attention coefficient of the second feature map is updated, performing an exponential operation with the natural constant e as base on the updated predicted attention coefficient.

It should be understood that, after the e-exponential operation is performed on the updated predicted attention coefficients, the predicted attention coefficients corresponding to the visible part of the target object are more clearly distinguished from those corresponding to the occluded part, highlighting the information of the visible part.
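A small numeric sketch of why the e-exponential step sharpens this separation; the concrete coefficient values 0.9 and 0.1 are assumptions for illustration only:

```python
import math

# Sketch: exponentiating attention coefficients with base e widens the gap
# between a visible-part coefficient and an occluded-part coefficient.
visible, occluded = 0.9, 0.1                        # illustrative values
gap_before = visible - occluded                     # 0.8
gap_after = math.exp(visible) - math.exp(occluded)  # larger gap (~1.35)
```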

With reference to the first aspect, in certain implementations of the first aspect, the updating the predicted attention coefficient according to the result of the attention loss function includes: updating the predicted attention coefficient through a back propagation algorithm according to the result of the attention loss function.
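As a toy stand-in for the back-propagation update, one can take a single gradient step on the smooth-L1 attention loss; the learning rate and the coefficient values below are illustrative assumptions:

```python
# Hypothetical one-step gradient update of a predicted attention coefficient
# toward its ground-truth value, using the derivative of the smooth L1 loss.
# Real back propagation would push this gradient through the attention branch
# network's weights; this sketch updates the coefficient directly.

def smooth_l1_grad(x):
    """Derivative of the smooth L1 loss with respect to its argument."""
    if abs(x) < 1.0:
        return x
    return 1.0 if x > 0 else -1.0

def update_coefficient(m, t, lr=0.5):
    """One gradient-descent step moving prediction m toward ground truth t."""
    return m - lr * smooth_l1_grad(m - t)

m, t = 0.2, 0.9                    # predicted vs. ground-truth coefficient
m_new = update_coefficient(m, t)   # moves closer to the ground truth
```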

With reference to the first aspect, in certain implementations of the first aspect, the attention loss function is

L_att = (1/k) Σ L_a(m_k, t_k),

where the summation is taken over the k feature points in the candidate box, L_a is the smooth L1 loss function, m_k is a predicted attention coefficient, and t_k is the corresponding ground-truth attention coefficient.
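Under the reading that the loss averages the smooth-L1 distance between predicted and ground-truth coefficients over the k feature points of a candidate box, a minimal sketch is:

```python
# Sketch of the attention loss: mean smooth-L1 distance between predicted
# attention coefficients m and ground-truth coefficients t over the k feature
# points of a candidate box. The averaging over k is an assumption based on
# the surrounding text; the patent figure may define a different normalization.

def smooth_l1(x):
    """Smooth L1 loss on a scalar difference."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def attention_loss(m, t):
    k = len(m)
    return sum(smooth_l1(mi - ti) for mi, ti in zip(m, t)) / k

# A perfect prediction gives zero loss; mismatches are penalized smoothly.
loss = attention_loss([0.6, 0.3], [0.1, 0.3])
```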

With reference to the first aspect, in certain implementations of the first aspect, the acquiring a three-dimensional point cloud and acquiring a first feature map of the three-dimensional point cloud using the backbone neural network includes: acquiring three-dimensional point cloud data of an occluded target object; performing three-dimensional grid division on the three-dimensional point cloud data to obtain a plurality of three-dimensional space voxels; obtaining point cloud features of the voxels according to the point cloud density in each voxel; and extracting the point cloud features by using the backbone neural network to generate the first feature map.
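The grid-division step can be sketched as follows; using the raw per-voxel point count as the density feature, and the helper name `voxelize`, are illustrative assumptions:

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Perform grid division of a 3-D point cloud into voxels and use the
    per-voxel point count (density) as that voxel's point cloud feature."""
    counts = defaultdict(int)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        counts[key] += 1
    return dict(counts)

# Toy cloud: two points land in voxel (0, 0, 0), one in voxel (1, 0, 0).
cloud = [(0.2, 0.3, 0.1), (0.4, 0.1, 0.2), (1.5, 0.2, 0.3)]
features = voxelize(cloud, voxel_size=1.0)
```

A backbone network would then consume these per-voxel features to produce the first feature map.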

With reference to the first aspect, in certain implementations of the first aspect, the generating, by the attention branch neural network, a predicted attention coefficient for the candidate box of the first feature map includes: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected operation, and a variant of a convolution operation.

With reference to the first aspect, in certain implementations of the first aspect, the method further includes: performing object detection on the target object through the artificial neural network, and obtaining the three-dimensional position and confidence of each feature map candidate box corresponding to the visible part of the target object; sorting the candidate boxes by confidence, and selecting the candidate boxes whose confidence is higher than a third threshold; and predicting the information of the target object according to the candidate boxes whose confidence is higher than the third threshold.
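The confidence sorting and thresholding can be sketched as below; the dictionary layout and the threshold value are illustrative assumptions:

```python
def select_boxes(boxes, threshold):
    """Keep candidate boxes whose confidence exceeds the threshold,
    sorted from highest to lowest confidence."""
    kept = [b for b in boxes if b["conf"] > threshold]
    return sorted(kept, key=lambda b: b["conf"], reverse=True)

candidates = [{"id": 0, "conf": 0.95},
              {"id": 1, "conf": 0.40},
              {"id": 2, "conf": 0.80}]
selected = select_boxes(candidates, threshold=0.6)  # boxes 0 and 2 survive
```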

It should be understood that, during prediction, the candidate boxes are screened by confidence, and the prediction result for the target object is determined from the information of the candidate boxes with higher confidence.

With reference to the first aspect, in certain implementations of the first aspect, the information of the target object includes a position and/or a size of the target object.

With reference to the first aspect, in certain implementations of the first aspect, the method further includes: and displaying a prediction result obtained according to the second feature map.

It should be understood that, when predicting the position or size information of the target object, the method provided by the embodiments of the present application obtains a feature map in which the predicted attention coefficients of the visible part are higher than those of the occluded part, and obtains the prediction result of the target object from this feature map; the prediction result can be displayed directly on a display.

In a second aspect, a system for object detection based on an artificial neural network is provided, comprising at least one processor and a lidar, wherein the lidar is configured to acquire a three-dimensional point cloud of a target object and input it to the processor; the processor is configured to perform three-dimensional grid division on the three-dimensional point cloud to obtain a plurality of voxels; the processor is further configured to determine, according to the point cloud density in each voxel, the point cloud feature of the position corresponding to that voxel; the processor is further configured to extract the point cloud features through a backbone network of the object detection model and generate a first feature map of the target object; the processor is further configured to generate predicted attention coefficients in the first feature map through an attention branch neural network of the object detection model; the processor is further configured to calculate, using a loss function branch neural network, the result of the attention loss function according to the ground-truth attention coefficients in the sample feature map and the predicted attention coefficients; the processor is further configured to update the predicted attention coefficients according to the result of the attention loss function, so that the predicted attention coefficients generated for the feature map portion corresponding to the visible part of the target object in the second feature map are higher than those for the feature map portion corresponding to the occluded part; and the processor is further configured to obtain a prediction result according to the information of the visible part of the target object, the prediction result comprising position information of the target object.

It should be understood that a second feature map having a predicted attention coefficient at each position can be generated from a first feature map determined from the three-dimensional point cloud data of an occluded target object. After the predicted attention coefficients are corrected or updated according to the attention loss function generated from the predicted and ground-truth attention coefficients, the predicted attention coefficients of the visible part of the occluded target object are higher than those of the occluded part, so that the visible part can be exploited to a greater extent during prediction to more accurately predict information such as the position and size of the target object.

With reference to the second aspect, in some implementations of the second aspect, the processor is further configured to divide the first feature map into candidate boxes; the processor is further configured to generate, through the attention branch neural network, a predicted attention coefficient for each candidate box of the first feature map; and the processor is further configured to perform dot multiplication of the predicted attention coefficients and the first feature map to obtain the second feature map.

Optionally, before the detection system is trained, a sample library may be established. The sample library may include sample feature maps of the target object, each containing ground-truth attention coefficients; for example, the ground-truth attention coefficient corresponding to the visible part of the target object in a sample feature map is higher than that of the occluded part.

With reference to the second aspect, in some implementations of the second aspect, the processor is further configured to compare the predicted attention coefficient of a candidate box of the second feature map with the ground-truth attention coefficient of the corresponding ground-truth box in the sample feature map; the processor is further configured to determine the result of the attention loss function according to the predicted attention coefficient and the ground-truth attention coefficient when the confidence that the predicted attention coefficient of the candidate box matches the ground-truth attention coefficient of the ground-truth box is higher than a first threshold; and the processor is further configured to update the predicted attention coefficient according to the result of the attention loss function, so that the confidence of the predicted attention coefficient is higher than a second threshold.

With reference to the second aspect, in some implementations of the second aspect, when updating the predicted attention coefficients of the second feature map, the processor performs an exponential operation with the natural constant e as base on the updated predicted attention coefficients.

It should be understood that, after the e-exponential operation is performed on the updated predicted attention coefficients, the predicted attention coefficients corresponding to the visible part of the target object are more clearly distinguished from those corresponding to the occluded part, highlighting the information of the visible part.

With reference to the second aspect, in certain implementations of the second aspect, the processor is further configured to update the predicted attention coefficient through a back propagation algorithm according to a result of the attention loss function.

With reference to the second aspect, in certain implementations of the second aspect, the attention loss function is

L_att = (1/k) Σ L_a(m_k, t_k),

where the summation is taken over the k feature points in the candidate box, L_a is the smooth L1 loss function, m_k is a predicted attention coefficient, and t_k is the corresponding ground-truth attention coefficient.

With reference to the second aspect, in certain implementations of the second aspect, the processor is further configured to acquire three-dimensional point cloud data of the occluded target object; the processor is further configured to perform three-dimensional grid division on the three-dimensional point cloud data to obtain a plurality of three-dimensional space voxels; the processor is further configured to obtain point cloud features of the voxels according to the point cloud density in each voxel; and the processor is further configured to extract the point cloud features using the backbone neural network and generate the first feature map.

With reference to the second aspect, in certain implementations of the second aspect, the processor being configured to generate a predicted attention coefficient for a candidate box of the first feature map through the attention branch neural network includes: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected operation, and a variant of a convolution operation.

With reference to the second aspect, in certain implementations of the second aspect, the processor is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional position and confidence of each feature map candidate box corresponding to the visible part of the target object; the processor is further configured to sort the candidate boxes by confidence and select the candidate boxes whose confidence is higher than a third threshold; and the processor is further configured to predict the information of the target object according to the candidate boxes whose confidence is higher than the third threshold.

With reference to the second aspect, in certain implementations of the second aspect, the information of the target object includes a position and/or a size of the target object.

With reference to the second aspect, in certain implementations of the second aspect, the system further includes a display for displaying the prediction result obtained according to the second feature map.

Optionally, the system provided by the embodiment of the application can be applied to a mobile device in the field of unmanned driving, and the mobile device can be an unmanned aerial vehicle or an unmanned vehicle. The movable equipment can collect three-dimensional point cloud of the sheltered target object through the laser radar and predict the position and/or size information of the target object according to the visible part of the sheltered object.

In a third aspect, a system for object detection based on an artificial neural network is provided, the system comprising a processing module and a receiving module, wherein the system is configured to perform the method as described in any implementation manner of the first aspect.

In a fourth aspect, a computer storage medium is provided, on which a computer program is stored, which, when executed by a computer, causes the computer to perform the method provided by the first aspect.

In a fifth aspect, a chip system is provided, the chip system comprising at least one processor, wherein when program instructions are executed in the at least one processor, the method according to any of the first aspect is implemented.

In a sixth aspect, there is provided a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method provided in the first aspect.

The method for detecting the object based on the artificial neural network can be applied to the automatic driving field of unmanned equipment such as unmanned planes or unmanned vehicles and the like, and is used for predicting obstacles (such as other vehicles, pedestrians and the like) in the surrounding environment of the unmanned movable equipment, wherein the obstacles (namely target objects) can be partially shielded objects. According to the method provided by the embodiment of the application, the object detection model based on the artificial neural network, which can obtain the position information, the size information and the like of the target object according to the visible part information of the shielded target object, can be trained through deep learning of the neural network. In the process of training, the object detection model can give more weight to the information of the visible part of the target object based on an attention mechanism, namely the object detection model is more sensitive to the information of the visible part, so that the object detection model can more accurately acquire the information of the target object according to the visible part of the target object in the subsequent prediction process.

Drawings

Fig. 1 is a schematic diagram illustrating a scene to which the method for detecting an object based on an artificial neural network provided in the embodiment of the present application is applied.

Fig. 2 shows a schematic flow chart of an object detection method based on an artificial neural network provided in an embodiment of the present application.

Fig. 3 shows a schematic flow chart of an object detection method based on an artificial neural network provided in an embodiment of the present application.

Fig. 4 shows a schematic diagram of an object detection system based on an artificial neural network according to an embodiment of the present application.

Fig. 5 is a schematic diagram of another system for detecting an object based on an artificial neural network according to an embodiment of the present application.

Detailed Description

In order to facilitate understanding of the technical solutions provided by the embodiments of the present invention, some concepts related to the embodiments of the present invention are described below first.

1. Attention (attention) mechanism

Colloquially, the attention mechanism is to focus attention on important points and ignore other unimportant factors. For example, the attention mechanism is similar to a human visual attention mechanism, and human visual attention can obtain a target area needing important attention, namely a focus of attention, by rapidly scanning a global image when facing the image, and then put more attention into the area to obtain more detailed information of the target needing attention, while suppressing other useless information. The determination of the importance level may depend on the application scenario. The attention mechanism is classified into spatial attention and temporal attention according to application scenarios, the former being generally used for image processing, and the latter being generally used for natural language processing. Embodiments of the present application relate primarily to spatial attention.

It should be understood that the method for object detection provided by the embodiments of the present application may be applied to an automatic driving scene (as shown in fig. 1). Specifically, during automatic driving, an unmanned vehicle can use a laser radar to acquire a three-dimensional point cloud, detect the surrounding environment, and detect three-dimensional objects in it. When a detected three-dimensional object is partially occluded, part of the point cloud is missing, which can cause missed detections and greatly degrade the detection result. To obtain a better result when detecting an occluded object, so that the position and size of the occluded three-dimensional object can be predicted more accurately, the training strategy of the neural network in the deep learning algorithm is improved: an attention network branch is added to the neural network detection model of the three-dimensional object, which raises the utilization of the key point cloud in the visible part and thereby improves the detection of the occluded three-dimensional object.

The method for detecting an object provided by the embodiments of the present application is further described below with reference to the accompanying drawings.

Fig. 2 shows a schematic flow chart of an object detection method based on an artificial neural network provided in an embodiment of the present application. The method includes the following steps.

S101, acquiring a three-dimensional point cloud, and acquiring a first characteristic diagram of the three-dimensional point cloud by using a trunk neural network.

The three-dimensional point cloud is three-dimensional point cloud data of a partially occluded target object.

It should be understood that a neural network detection model of a three-dimensional object is generated by a deep learning algorithm before the three-dimensional point cloud data is acquired. In the training stage, the detection model can include a trunk network and network branches. The trunk network can be used to receive the three-dimensional point cloud data and generate a feature map from it. The network branches can be used to calculate the loss functions of the network, which relate respectively to the confidence, the position, and the attention coefficient; these loss functions can guide the updating of network parameters such as the attention coefficient, so that the neural network detection model can more accurately predict the position and size of the target object from the unoccluded part of the target object, and thus has better prediction performance.

In one implementation, prior to generating the feature map from the point cloud data, the neural network detection model may voxelize the three-dimensional space of the target object and determine point cloud features for each voxel location according to the point cloud density in that voxel. The process may convert the point cloud data of the target object into dimensions that the neural network may receive.

For example, the neural network detection model may first perform three-dimensional grid partitioning on the point cloud of the target object along the x, y, and z directions at a certain resolution to obtain a plurality of voxels in three-dimensional space, and then determine the point cloud feature of each voxel based on the point cloud density in that voxel. For a voxel containing points, the point cloud density P is calculated and the point cloud feature of that position is set to P; for a voxel without points, the point cloud feature is set to 0.
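A minimal sketch of this voxelization step, assuming a cubic voxel grid and using the per-voxel point density directly as the voxel feature (the function and parameter names are illustrative, not from the source):

```python
import numpy as np

def voxelize(points, grid_range, resolution):
    """Divide a 3-D point cloud into voxels and use the point-cloud
    density in each voxel as that voxel's feature (0 for empty voxels).

    points: (N, 3) array of x, y, z coordinates.
    grid_range: ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    resolution: edge length of a cubic voxel.
    """
    mins = np.array([r[0] for r in grid_range])
    maxs = np.array([r[1] for r in grid_range])
    shape = np.ceil((maxs - mins) / resolution).astype(int)

    features = np.zeros(shape, dtype=np.float32)
    # Keep only points inside the grid, then bucket them into voxels.
    inside = np.all((points >= mins) & (points < maxs), axis=1)
    idx = ((points[inside] - mins) / resolution).astype(int)
    for i, j, k in idx:
        features[i, j, k] += 1.0          # raw point count per voxel
    features /= resolution ** 3           # count -> density P
    return features

pts = np.array([[0.1, 0.1, 0.1], [0.15, 0.12, 0.11], [0.9, 0.9, 0.9]])
feat = voxelize(pts, ((0, 1), (0, 1), (0, 1)), resolution=0.5)
# feat has shape (2, 2, 2); voxels that contain points carry their
# density P, empty voxels stay 0.
```

The resulting dense grid of features is what the trunk network can receive in place of the irregular raw point list.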

In one implementation, the neural network detection model may extract point cloud features through a backbone network and generate a first feature map. The backbone network may be any network structure, the size of the first feature graph may be H × W, and the specific values of H and W are not limited in this embodiment.

In one implementation, the data input to the neural network detection model is not limited to point cloud data of the target object, but may be image information of the target object, such as RGB image information of the target object.

And S102, processing the first feature map using the attention branch neural network to obtain a second feature map, where each position of the second feature map includes a predicted attention coefficient corresponding to that position; the second feature map is also used to obtain a loss function of the target object, and the loss function is used to update the network coefficients of the attention branch neural network.

In one implementation, the predicted attention coefficient may be generated for the candidate box of the first feature map by an attention branch neural network. Each position of the first feature map may refer to each candidate frame obtained by dividing the first feature map, and the size of the candidate frame may be flexibly set as needed, which is not limited in the present application.

In one implementation, the attention branch neural network may generate the corresponding predicted attention coefficients at the various positions of the first feature map in a variety of ways, such as by convolution operations, fully connected operations, and variants of convolution (e.g., SPAS, STAR, SGRS, SPARSE, etc.).
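One hedged sketch of such an attention branch, using a single "same"-padded convolution followed by a sigmoid to produce one coefficient per feature-map position (the fixed kernel stands in for learned weights; all names and values are illustrative):

```python
import numpy as np

def conv2d(x, kernel):
    """Naive 'same'-padded 2-D convolution for a single-channel map."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def attention_branch(feature_map, kernel):
    """One 3x3 convolution followed by a sigmoid, yielding a predicted
    attention coefficient for every position of the H x W feature map."""
    logits = conv2d(feature_map, kernel)
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> coefficients in (0, 1)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4))          # toy H x W first feature map
kern = rng.standard_normal((3, 3)) * 0.1    # stand-in for learned weights
attn = attention_branch(fmap, kern)
# attn has the same H x W shape as the feature map, one coefficient
# per position.
```

In a real model this branch would be a learned convolutional (or fully connected) layer; the point here is only the shape contract: one coefficient per feature-map position.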

In one implementation, the value of the initial generated prediction attention coefficient on the first feature map may be a preset default value; alternatively, the value of the initially generated predictive attention coefficient is an empirical value.

Optionally, before performing the neural network model training process for detecting the object, a sample library may be established, where the sample library includes a sample feature map of the target object, where a truth box may be divided on the sample feature map, and a size of the truth box may be the same as a size of the candidate box in the second feature map.

Alternatively, the true-value attention coefficient may be labeled in advance for each part of the target object in the sample feature map. For example, for the occluded part of the target object, the true-value attention coefficient can be labeled in advance as a negative number or a positive number smaller than 1; for the parts of the target object that are not occluded (i.e., the visible parts), the true-value attention coefficient can be labeled in advance as a positive number greater than 1.

In one implementation, the true-value attention coefficient in the sample feature map may be used as an exponent of the natural constant e, so that the information of the unoccluded part is more prominent.

It should be understood that the object detection model training process provided in the embodiments of the present application is to enable the attention coefficient of the visible portion of the target object to be higher or much higher than the attention coefficient of the occluded portion of the target object. The higher the attention coefficient is, the higher the degree of interest and the degree of utilization of the partial point cloud are when the position or the size of the target object is predicted subsequently.

In one implementation, the second feature map may be obtained from the first feature map and the generated predicted attention coefficients. For example, the generated predicted attention coefficients may be multiplied element-wise (point-multiplied) with the first feature map to obtain the second feature map. In other words, the second feature map is obtained after generating the corresponding predicted attention coefficient for each candidate frame based on the first feature map, and may also be understood as an attention feature map.
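The element-wise weighting described above can be sketched with a toy 2 x 2 example (all values are illustrative):

```python
import numpy as np

# First feature map (H x W) and predicted attention coefficients (H x W).
first_map = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
attn = np.array([[0.9, 0.1],
                 [0.8, 0.2]])     # visible positions get higher weight

# The second (attention) feature map is the element-wise product:
second_map = first_map * attn
# second_map == [[0.9, 0.2], [2.4, 0.8]] (up to floating point)
```

Positions with a high attention coefficient keep most of their feature response, while low-coefficient (occluded) positions are suppressed.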

It should be understood that, since the predicted attention coefficient in each candidate box of the second feature map is initially a default or empirical value, it cannot be guaranteed that the predicted attention coefficient of the visible part of the target object is higher, or much higher, than that of the occluded part. In this case, the predicted attention coefficient in the second feature map needs to be corrected and updated with the true-value attention coefficient in the sample feature map as a reference, so that the confidence between the predicted and true-value attention coefficients reaches a first threshold. In this process, the attention branch neural network is trained to generate high attention coefficients in the visible part of the target object and lower attention coefficients in the occluded part, so that the feature-map part corresponding to the visible part can be identified more accurately and the information of the target object can subsequently be predicted from the visible part.

Specifically, the predicted attention coefficient in a candidate box of the second feature map is compared with the true-value attention coefficient in the sample feature map corresponding to that candidate box. When the confidence between the predicted attention coefficient of the candidate box and the true-value attention coefficient of the truth box is higher than a first threshold, the result of the attention loss function is determined from the two coefficients, and the coefficients of the attention branch neural network are updated according to that result so that their confidence is higher than a second threshold. The attention loss function is L = Σ_{n=0}^{k} L_a(m_n, t_n), where k is the number of feature points in the candidate frame, L_a is the smooth L1 loss function, m_n is the predicted attention coefficient, and t_n is the true-value attention coefficient.
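The attention loss, a sum of smooth-L1 terms over the feature points of a candidate box, can be sketched as follows (names and sample values are illustrative):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def attention_loss(pred, true):
    """Sum of smooth-L1 terms over the k feature points of a candidate
    box: L = sum_n L_a(m_n, t_n)."""
    return float(np.sum(smooth_l1(pred - true)))

m = np.array([0.9, 0.2, 0.4])   # predicted attention coefficients m_n
t = np.array([1.0, 0.0, 2.0])   # true-value attention coefficients t_n
loss = attention_loss(m, t)
# errors -0.1 and 0.2 fall in the quadratic region; -1.6 in the linear
# region, so loss = 0.005 + 0.02 + 1.1 = 1.125
```

The smooth L1 form keeps gradients bounded for large errors while remaining quadratic (and smooth) near zero, which is why it is preferred over plain L1 or L2 here.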

In one implementation, after the result of the attention loss function is calculated, the result may be used to correct and update the network coefficients of the attention branch neural network using a back propagation algorithm.

It will be appreciated that by correcting and updating the predictive attention coefficient in the second profile with the result of the attention loss function, the predictive attention coefficient can be made closer to the value of the true value attention coefficient.

In one implementation, after the predicted attention coefficient of the second feature map is updated, the updated predicted attention coefficient may be used as an exponent of the natural constant e, so that the attention coefficients of the visible part and the occluded part differ more noticeably and the information of the visible part is highlighted.
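A small numeric illustration of this exponentiation step: applying e^x widens the gap between low (occluded) and high (visible) coefficients (the coefficient values are illustrative):

```python
import numpy as np

attn = np.array([0.2, 0.3, 1.5, 1.8])   # occluded parts low, visible high
emphasized = np.exp(attn)                # e^x stretches large values more

# Compare the spread between an occluded and a visible coefficient
# before and after exponentiation:
gap_before = attn[2] - attn[1]           # 1.5 - 0.3 = 1.2
gap_after = emphasized[2] - emphasized[1]
# gap_after > gap_before: visible-part coefficients stand out more.
```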

It should be understood that after the predicted attention coefficient in the second feature map is corrected and updated, the attention coefficient corresponding to the visible part of the target object in the second feature map is higher, so that the object detection neural network model is more sensitive to the information of the visible part, and the position and the size of the whole target object are predicted to a greater extent by using the information of the visible part.

It should also be understood that after the predicted attention coefficient of the attention branch neural network is corrected and updated, the attention branch neural network may generate a high predicted attention coefficient in a visible portion of the target object in a subsequent detection process of the target object, and generate a lower predicted attention coefficient in a blocked portion of the target object, that is, the attention branch neural network may be more sensitive to information of the visible portion of the target object after the training process, so that the object detection model focuses more on information of the visible portion in an actual prediction process, so as to achieve an effect of improving detection of the blocked target object.

And S103, obtaining a prediction result according to the second feature map, wherein the prediction result comprises the position information of the target object.

In one implementation, the neural network object detection model may obtain a prediction result according to the corrected or updated second feature map of the prediction attention coefficient, where the prediction result may include position information of the target object, and may also include information such as a size of the target object.

In one implementation, for the neural network object detection model trained through the training process, in the actual prediction process, point cloud data or image information data of the target object may be input into the detection model, and the detection model may screen out data or information belonging to an unobstructed portion in the point cloud data or image information, and predict information such as a position or a size of the entire obstructed target object according to the data or information of the unobstructed portion.

In one implementation, through the neural network object detection model, a three-dimensional position and a confidence corresponding to a candidate frame of a detected object can be obtained; after the candidate frames are sorted according to the confidence degrees, a certain number of candidate frames can be screened out according to the sequence from high confidence degree to low confidence degree, wherein the confidence degrees of the screened candidate frames can be all higher than a third threshold value; and predicting the position and the size of the target object according to the screened candidate box with higher confidence coefficient. The process of predicting the size or position of the whole object according to the point cloud data or the image information of some key parts of the target object by the deep learning algorithm of the neural network can refer to the existing flow, and is not described herein again.
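The confidence sorting and threshold screening can be sketched as follows (assuming each candidate is a (confidence, position) pair; the threshold value and data are illustrative):

```python
def select_boxes(boxes, threshold):
    """Sort candidate boxes by confidence (descending) and keep those
    whose confidence exceeds the threshold (the 'third threshold').

    boxes: list of (confidence, (x, y, z)) tuples.
    """
    ranked = sorted(boxes, key=lambda b: b[0], reverse=True)
    return [b for b in ranked if b[0] > threshold]

candidates = [(0.4, (0, 0, 0)), (0.92, (1, 2, 0)), (0.75, (3, 1, 2))]
kept = select_boxes(candidates, threshold=0.5)
# kept == [(0.92, (1, 2, 0)), (0.75, (3, 1, 2))]
```

The surviving high-confidence boxes are then used to predict the position and size of the occluded target object.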

It should be understood that the method for object detection based on an artificial neural network provided by the embodiment of the present application may be applied in the field of automatic driving, in scenes where an unmanned vehicle or an unmanned aerial vehicle predicts information such as the position and size of obstacles in the surrounding environment. In the conventional obstacle detection process, after a feature map is generated from the obstacle information, the feature map is directly input into the branch networks for calculating the position and confidence loss functions; that is, essentially the same attention is given to the occluded part and the visible part of the obstacle. However, because the obstacle is partially occluded, part of the information valuable for predicting its position, confidence, and so on is lost, and the detection result is poor. In the method provided by the embodiment of the present application, an attention network branch is added to the artificial neural network and trained to accurately identify the visible part of an obstacle; the object detection model then uses the key information of the visible part to predict the position, size, confidence, and other information of the obstacle, so that the unmanned vehicle or unmanned aerial vehicle can accurately learn the distribution and size of obstacles in the surrounding environment and plan an accurate driving trajectory.

In addition, the method for detecting the object based on the artificial neural network can still detect by using the laser radar without fusing other sensors, and hardware cost is reduced.

Fig. 3 shows a schematic flowchart of an object detection method based on an artificial neural network provided in an embodiment of the present application. The process includes the following steps.

S201, inputting point clouds.

It should be understood that the object detection method provided by the embodiment of the present application may also take an image of the target object as input, such as an RGB image.

And S202, three-dimensional gridding.

Three-dimensional gridding performs three-dimensional mesh division on the point cloud of the target object, that is, it voxelizes the three-dimensional space. Specifically, the spatial point cloud may be divided into grids along the x, y, and z spatial coordinate directions at a certain resolution, so as to obtain voxels in three-dimensional space.

In one implementation, point cloud features are determined from the point cloud densities in the voxels. For a voxel containing points, the point cloud density (denoted P) is calculated and the point cloud feature of that position is set to P; for a voxel without points, the point cloud feature is set to 0. This process converts the point cloud data of the target object into dimensions that the neural network can receive.

S203, acquiring the first characteristic diagram through the backbone network.

The first characteristic diagram is a characteristic diagram of the target object point cloud.

In an implementation manner, the backbone network may have any network structure, the size of the first feature graph may be H × W, and the specific values of H and W are not limited in this embodiment.

In one implementation, the data input to the neural network detection model is not limited to point cloud data of the target object, but may be image information of the target object, such as RGB image information of the target object.

And S204, attention coefficient correlation operation.

The correlation operation of the attention coefficient in the process may include generating a predicted attention coefficient corresponding to each position in the first feature map based on the first feature map. Each position of the first feature map may refer to each candidate frame obtained by dividing the first feature map, and the size of the candidate frame may be flexibly set as needed, which is not limited in the present application.

In one implementation, the attention branch neural network may generate the corresponding predicted attention coefficients at the various positions of the first feature map in a variety of ways, such as by convolution operations, fully connected operations, and variants of convolution (e.g., SPAS, STAR, SGRS, SPARSE, etc.).

And S205, obtaining the attention coefficient.

And S206, obtaining a second characteristic diagram.

The second feature map is obtained after generating corresponding predicted attention coefficients in each candidate frame based on the first feature map, and may also be understood as an attention feature map.

It should be understood that, since the predicted attention coefficient in each candidate box of the second feature map is initially a default or empirical value, it cannot be guaranteed that the predicted attention coefficient of the visible part of the target object is higher, or much higher, than that of the occluded part. In this case, the predicted attention coefficient in the second feature map needs to be corrected and updated with the true-value attention coefficient in the sample feature map as a reference, so that the confidence between the predicted and true-value attention coefficients reaches a first threshold. In this process, the attention branch neural network is trained to generate high attention coefficients in the visible part of the target object and lower attention coefficients in the occluded part, so that the feature-map part corresponding to the visible part can be identified more accurately and the information of the target object can subsequently be predicted from the visible part.

Specifically, the predicted attention coefficient in a candidate box of the second feature map is compared with the true-value attention coefficient in the sample feature map corresponding to that candidate box. When the confidence between the predicted attention coefficient of the candidate box and the true-value attention coefficient of the truth box is higher than a first threshold, the result of the attention loss function is determined from the two coefficients, and the coefficients of the attention branch neural network are updated according to that result so that their confidence is higher than a second threshold. The attention loss function is L = Σ_{n=0}^{k} L_a(m_n, t_n), where k is the number of feature points in the candidate frame, L_a is the smooth L1 loss function, m_n is the predicted attention coefficient, and t_n is the true-value attention coefficient.

In one implementation, after the result of the attention loss function is calculated, the result may be used to correct and update the network coefficients of the attention branch neural network using a back propagation algorithm.

It will be appreciated that by correcting and updating the predictive attention coefficient in the second profile with the result of the attention loss function, the predictive attention coefficient can be made closer to the value of the true value attention coefficient.

In one implementation, after the predicted attention coefficient of the second feature map is updated, the updated predicted attention coefficient may be used as an exponent of the natural constant e, so that the attention coefficients of the visible part and the occluded part differ more noticeably and the information of the visible part is highlighted.

It should be understood that after the predicted attention coefficient in the second feature map is corrected and updated, the attention coefficient corresponding to the visible part of the target object in the second feature map is higher, so that the object detection neural network model is more sensitive to the information of the visible part, and the position and the size of the whole target object are predicted to a greater extent by using the information of the visible part.

It should also be understood that after the predicted attention coefficient of the attention branch neural network is corrected and updated, the attention branch neural network may generate a high predicted attention coefficient in a visible portion of the target object in a subsequent detection process of the target object, and generate a lower predicted attention coefficient in a blocked portion of the target object, that is, the attention branch neural network may be more sensitive to information of the visible portion of the target object after the training process, so that the object detection model focuses more on information of the visible portion in an actual prediction process, so as to achieve an effect of improving detection of the blocked target object.

And S207, obtaining a prediction result.

In one implementation, the neural network object detection model may obtain a prediction result according to the corrected or updated second feature map of the prediction attention coefficient, where the prediction result may include position information of the target object, and may also include information such as a size of the target object.

And S208, confidence degree sorting and threshold value screening.

In one implementation, through the neural network object detection model, the three-dimensional position and confidence corresponding to each candidate frame of a detected object can be obtained. After the candidate frames are sorted by confidence, a certain number of candidate frames can be screened out in order from high confidence to low, and the position and size of the target object are then predicted according to the screened candidate boxes with higher confidence. The process by which the deep learning algorithm of the neural network predicts the size or position of the whole object from the point cloud data or image information of key parts of the target object can refer to the existing flow, and is not described herein again.

And S209, obtaining the final prediction result of the target object.

The method for object detection based on an artificial neural network provided by the embodiment of the present application is applicable to scenes in the field of automatic driving where an unmanned vehicle or an unmanned aerial vehicle predicts information such as the position and size of obstacles in the surrounding environment. In the conventional obstacle detection process, after a feature map is generated from the obstacle information, the feature map is directly input into the branch networks for calculating the position and confidence loss functions, so essentially the same attention is given to the occluded part and the visible part of the obstacle; because the obstacle is partially occluded, part of the information valuable for predicting its position, confidence, and so on is lost, and the detection result is poor. In the method provided by the embodiment of the present application, an attention network branch is added to the artificial neural network and trained to accurately identify the visible part of an obstacle; the object detection model then uses the key information of the visible part to predict the position, size, confidence, and other information of the obstacle, so that the unmanned vehicle or unmanned aerial vehicle can accurately acquire the distribution and size of obstacles in the surrounding environment and plan an accurate driving trajectory.

Fig. 4 shows a schematic diagram of an object detection system based on an artificial neural network according to an embodiment of the present application. The system 300 includes at least one lidar 310 and a processor 320. The system 300 may be a distributed perception processing system disposed on an autonomous vehicle, for example, at least one lidar 310 may be disposed on a roof of the vehicle and be a rotary lidar; lidar 310 may also be located elsewhere on the autonomous vehicle or use other forms of lidar. Processor 320 may be a super-computing platform disposed on the autonomous vehicle, i.e., processor 320 may include one or more processing units in the form of a CPU, GPU, FPGA, or ASIC for processing sensory data acquired by sensors of the autonomous vehicle.

In one implementation, lidar 310 is used to acquire a three-dimensional point cloud.

In one implementation, the processor 320 is configured to obtain a first feature map of the three-dimensional point cloud using a backbone neural network.

In one implementation, the processor 320 is further configured to process the first feature map using the attention branch neural network and obtain a second feature map, where each position of the second feature map includes a predicted attention coefficient corresponding to that position; the second feature map is further used to compute a loss function of the target object, and the loss function is used to update the predicted attention coefficients.

In one implementation, the processor 320 is further configured to obtain a prediction result according to the second feature map, where the prediction result includes position information of the target object.

In one implementation, the processor 320 is further configured to partition the first feature map into candidate boxes.

In one implementation, the processor 320 is further configured to generate a prediction attention coefficient for the candidate box of the first feature map by the attention branch neural network.

In one implementation, the processor 320 is further configured to perform element-wise multiplication (dot multiplication) of the predicted attention coefficients and the first feature map to obtain the second feature map.
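The dot multiplication above can be sketched as an element-wise reweighting of the backbone features; the array shapes, function name, and channel layout below are illustrative assumptions rather than details from the embodiment.

```python
import numpy as np

def apply_attention(feature_map: np.ndarray, attention: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map with a per-position (H, W) attention map."""
    assert feature_map.shape[1:] == attention.shape
    # Broadcasting multiplies every channel at position (h, w) by attention[h, w].
    return feature_map * attention[np.newaxis, :, :]

# First feature map: 4 channels over an 8x8 grid (illustrative sizes).
first_feature_map = np.random.rand(4, 8, 8)
# Predicted attention coefficients, one per spatial position.
attn = np.random.rand(8, 8)
second_feature_map = apply_attention(first_feature_map, attn)
```

Positions with higher attention coefficients thus contribute more strongly to every channel of the second feature map.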

In one implementation, the processor 320 is further configured to compare the predicted attention coefficient of the candidate box of the second feature map with the attention coefficient of the true value box in the sample feature map corresponding to the candidate box.

In one implementation, the processor 320 is further configured to determine the result of the attention loss function from the predicted attention coefficient and the true-value attention coefficient when the confidence between the predicted attention coefficient of the candidate box and the true-value attention coefficient of the truth box is higher than a first threshold.

In one implementation, the processor 320 is further configured to update the predicted attention coefficient according to the result of the attention loss function, so that the confidence of the predicted attention coefficient is higher than a second threshold.

In one implementation, when the processor 320 updates the predicted attention coefficients of the second feature map, an exponential operation with the natural constant e as the base is performed on the updated predicted attention coefficients.
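A minimal sketch of this exponentiation step, assuming the coefficients are held in a NumPy array; taking e as the base maps any real-valued coefficient to a strictly positive weight:

```python
import numpy as np

# Updated predicted attention coefficients (illustrative values; may be negative).
coeffs = np.array([-1.0, 0.0, 2.0])
# Exponential operation with the natural constant e as the base.
weights = np.exp(coeffs)
# Every resulting weight is strictly positive, so weighting never flips sign.
assert np.all(weights > 0)
```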

In one implementation, the processor 320 is further configured to update the predicted attention coefficient by a back propagation algorithm according to the result of the attention loss function.

In one implementation, the attention loss function is L_attention = Σ_k L_a(m_k, t_k), summed over the feature points in the candidate box, where k indexes the feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the true-value attention coefficient.
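Under the definitions in the line above, this loss can be sketched as follows; the exact summation form and the standard smooth L1 shape (quadratic below 1, linear above) are assumptions, as the original formula image is not reproduced here.

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    """Smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def attention_loss(m: np.ndarray, t: np.ndarray) -> float:
    """Sum of smooth L1 losses over the k feature points in a candidate box."""
    return float(np.sum(smooth_l1(m - t)))

# Predicted vs. true-value attention coefficients for k = 3 feature points.
m = np.array([0.2, 0.9, 0.5])
t = np.array([0.0, 1.0, 0.5])
loss = attention_loss(m, t)  # → 0.025 (all residuals fall in the quadratic branch)
```

Smooth L1 keeps gradients bounded for large residuals while remaining smooth near zero, which is why it is commonly chosen for regression-style losses like this one.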

In one implementation, the processor 320 is further configured to obtain three-dimensional point cloud data of the occluded target object.

In one implementation, the processor 320 is further configured to perform three-dimensional grid partition on the three-dimensional point cloud data and obtain a plurality of three-dimensional voxels.

In one implementation, the processor 320 is further configured to obtain a point cloud feature of the voxel according to the point cloud density in each voxel.

In one implementation, the processor 320 is further configured to extract the point cloud features using a backbone neural network and generate the first feature map.
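The grid partition and per-voxel density feature of the preceding steps can be sketched as follows; the voxel size, grid extent, and use of a normalized point count as the density feature are illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

def voxelize_density(points: np.ndarray, grid_min, voxel_size, grid_shape) -> np.ndarray:
    """Partition an (N, 3) point cloud into a 3-D voxel grid and return a
    per-voxel point-cloud density feature (point count normalized to [0, 1])."""
    density = np.zeros(grid_shape, dtype=np.float32)
    # Map each point to integer voxel indices.
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)
    # Keep only points that fall inside the grid.
    valid = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    for i, j, k in idx[valid]:
        density[i, j, k] += 1.0
    return density / max(density.max(), 1.0)

pts = np.array([[0.1, 0.1, 0.1], [0.15, 0.12, 0.05], [0.9, 0.9, 0.9]])
dens = voxelize_density(pts, grid_min=(0, 0, 0), voxel_size=0.5, grid_shape=(2, 2, 2))
# Two points share voxel (0,0,0); one point lands in voxel (1,1,1).
```

The resulting density grid would then be fed to the backbone neural network to produce the first feature map.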

In one implementation, the processor 320, configured to generate predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network, includes: the attention branch neural network generates the predicted attention coefficients through one or more of a convolution operation, a fully-connected operation, and a variant of the convolution operation.

In one implementation, the processor 320 is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional position and confidence of the feature map candidate boxes corresponding to the visible portion of the target object.

In one implementation, the processor 320 is further configured to rank the candidate boxes by confidence and select the candidate boxes whose confidence is higher than a third threshold.

In one implementation, the processor 320 is further configured to predict the information of the target object according to the candidate boxes whose confidence is higher than the third threshold.
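The ranking-and-thresholding step above can be sketched as follows; the box representation, function name, and threshold value are illustrative assumptions.

```python
def select_boxes(boxes, third_threshold=0.5):
    """Rank candidate boxes by confidence and keep those above the threshold.

    Each box is a (confidence, position) pair; the position element stands in
    for the three-dimensional position of the candidate box.
    """
    ranked = sorted(boxes, key=lambda b: b[0], reverse=True)
    return [b for b in ranked if b[0] > third_threshold]

candidates = [(0.9, "box_a"), (0.3, "box_b"), (0.7, "box_c")]
kept = select_boxes(candidates)  # → [(0.9, 'box_a'), (0.7, 'box_c')]
```

Only the surviving high-confidence boxes are then used to predict the position and size of the target object.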

In one implementation, the information of the target object includes a position and/or a size of the target object.

In one implementation, the system 300 provided by the embodiment of the present application may further include a display for displaying the prediction result of the target object predicted according to the second feature map.

It should be understood that the system of the artificial-neural-network-based object detection model provided in the embodiment of the present application may be applied in the field of automatic driving of unmanned devices such as unmanned aerial vehicles or unmanned vehicles, to predict obstacles (such as other vehicles, pedestrians, etc.) in the surrounding environment of the unmanned mobile device, where the obstacles (i.e., target objects) may be partially occluded. The system provided in the embodiment of the present application can train the object detection model based on the artificial neural network so that, through deep learning, it obtains the position information, size information, and so on of an occluded target object from the information of its visible part. During training, the object detection model gives more weight to the information of the visible part of the target object based on an attention mechanism, i.e., the model is more sensitive to the information of the visible part, so that in subsequent prediction it can more accurately acquire the information of the target object from the visible part.

Fig. 5 is a schematic diagram of a system for detecting an object based on an artificial neural network according to an embodiment of the present application. The system 400 includes at least one receiving module 410 and a processing module 420.

In one implementation, the receiving module 410 is configured to obtain a three-dimensional point cloud.

In one implementation, the processing module 420 is configured to obtain a first feature map of the three-dimensional point cloud using a backbone neural network.

In one implementation, the processing module 420 is further configured to process the first feature map using an attention branch neural network and obtain a second feature map, where each position of the second feature map includes a predicted attention coefficient corresponding to that position; the second feature map is further used to compute a loss function of the target object, and the loss function is used to update the predicted attention coefficients.

In one implementation, the processing module 420 is further configured to obtain a prediction result according to the second feature map, where the prediction result includes location information of the target object.

In one implementation, the processing module 420 is further configured to partition the first feature map into candidate boxes.

In one implementation, the processing module 420 is further configured to generate a predicted attention coefficient for the candidate box of the first feature map through the attention branch neural network.

In one implementation, the processing module 420 is further configured to perform element-wise multiplication (dot multiplication) of the predicted attention coefficients and the first feature map to obtain the second feature map.

In one implementation, the processing module 420 is further configured to compare the predicted attention coefficient of the candidate box of the second feature map with the attention coefficient of the true value box in the sample feature map corresponding to the candidate box.

In one implementation, the processing module 420 is further configured to determine the result of the attention loss function from the predicted attention coefficient and the true-value attention coefficient when the confidence between the predicted attention coefficient of the candidate box and the true-value attention coefficient of the truth box is higher than a first threshold.

In one implementation, the processing module 420 is further configured to update the predicted attention coefficient according to the result of the attention loss function, so that the confidence of the predicted attention coefficient is higher than a second threshold.

In one implementation, when the processing module 420 updates the predicted attention coefficients of the second feature map, an exponential operation with the natural constant e as the base is performed on the updated predicted attention coefficients.

In one implementation, the processing module 420 is further configured to update the predicted attention coefficient through a back propagation algorithm according to the result of the attention loss function.

In one implementation, the attention loss function is L_attention = Σ_k L_a(m_k, t_k), summed over the feature points in the candidate box, where k indexes the feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the true-value attention coefficient.

In one implementation, the processing module 420 is further configured to obtain three-dimensional point cloud data of the occluded target object.

In one implementation, the processing module 420 is further configured to perform three-dimensional grid partition on the three-dimensional point cloud data and obtain a plurality of three-dimensional voxels.

In one implementation, the processing module 420 is further configured to obtain a point cloud feature of the voxel according to the point cloud density in each voxel.

In one implementation, the processing module 420 is further configured to extract the point cloud features using a backbone neural network and generate the first feature map.

In one implementation, the processing module 420, configured to generate predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network, includes: the attention branch neural network generates the predicted attention coefficients through one or more of a convolution operation, a fully-connected operation, and a variant of the convolution operation.

In one implementation, the processing module 420 is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional position and confidence of the feature map candidate boxes corresponding to the visible portion of the target object.

In one implementation, the processing module 420 is further configured to rank the candidate boxes by confidence and select the candidate boxes whose confidence is higher than a third threshold.

In one implementation, the processing module 420 is further configured to predict the information of the target object according to the candidate boxes whose confidence is higher than the third threshold.

In one implementation, the information of the target object includes a position and/or a size of the target object.

In one implementation, the system of the object detection model provided in the embodiment of the present application may further include a display, and the display is configured to display the prediction result obtained according to the second feature map.

The embodiment of the present invention further provides a chip system, where the chip system includes at least one processor, and when the program instructions are executed in the at least one processor, the method provided in the embodiment of the present application is implemented.

Embodiments of the present invention also provide a computer storage medium having a computer program stored thereon, where the computer program is executed by a computer, so that the computer executes the method of the above method embodiments.

Embodiments of the present invention also provide a computer program product comprising instructions, which when executed by a computer, cause the computer to perform the method of the above method embodiments.

In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
