Target detection method and device
Reader's note: this technique, "Target detection method and device" (一种目标检测方法和装置), was created by 张立成 (Zhang Licheng) on 2018-06-21. Abstract: The invention discloses a target detection method and device, relating to the field of computer technology. One embodiment of the method comprises: mapping the three-dimensional coordinates of collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule; performing convolution and downsampling on the histogram with a network model constructed from a deep residual network to obtain a multi-channel feature map; deconvolving the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram; and determining the position information of a detection target according to the two-channel feature map. This embodiment can process laser point cloud data directly to detect the targets within it, without relying on calibration between the lidar and a camera, thereby improving the accuracy and reliability of the detection result.
1. A method of target detection, comprising:
mapping the three-dimensional coordinates of collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule;
performing convolution and downsampling on the histogram with a network model constructed from a deep residual network, to obtain a multi-channel feature map;
deconvolving the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram; and
determining position information of a detection target according to the two-channel feature map.
2. The method of claim 1, wherein mapping the three-dimensional coordinates of the collected point cloud data to two-dimensional coordinates on the histogram according to the preset mapping rule comprises:
calculating a first angle and a second angle corresponding to a three-dimensional coordinate according to the values of the three-dimensional coordinates of the collected point cloud data; and
obtaining the two-dimensional coordinate of the point at which the three-dimensional coordinate is mapped onto the histogram according to a horizontal angular resolution, a vertical angular resolution, the first angle, and the second angle;
wherein the horizontal angular resolution is the minimum angular difference between two horizontally adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data, and the vertical angular resolution is the minimum angular difference between two vertically adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data; the first angle corresponding to a three-dimensional coordinate represents the angle by which the projection of the point onto the xy plane of the three-dimensional coordinate axes deviates from the x-axis in the horizontal direction; the second angle corresponding to the three-dimensional coordinate represents the angle by which the point deviates from the xy plane of the three-dimensional coordinate axes in the vertical direction; and the horizontal direction is the direction along the y-axis and the vertical direction is the direction along the z-axis.
3. The method of claim 1, wherein performing convolution and downsampling on the histogram with the network model constructed from the deep residual network to obtain the multi-channel feature map comprises:
convolving the histogram with selected layers of the deep residual network to obtain a convolved feature map; and
downsampling the convolved feature map in the width dimension to obtain the multi-channel feature map.
4. The method of claim 1, wherein determining the position information of the detection target according to the two-channel feature map comprises:
comparing the output values of the two channels of the two-channel feature map and determining, from the comparison result, a label for each point of the two-channel feature map, the label indicating whether the point belongs to the detection target; and
selecting the points whose labels indicate that they belong to the detection target, and determining the position information of the detection target from the three-dimensional coordinates corresponding to the selected points.
5. A target detection device, comprising:
a point cloud coordinate mapping module, configured to map the three-dimensional coordinates of collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule;
a first feature map generation module, configured to perform convolution and downsampling on the histogram with a network model constructed from a deep residual network, to obtain a multi-channel feature map;
a second feature map generation module, configured to deconvolve the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram; and
a position information determination module, configured to determine position information of a detection target according to the two-channel feature map.
6. The apparatus of claim 5, wherein the point cloud coordinate mapping module is further configured to:
calculate a first angle and a second angle corresponding to a three-dimensional coordinate according to the values of the three-dimensional coordinates of the collected point cloud data; and
obtain the two-dimensional coordinate of the point at which the three-dimensional coordinate is mapped onto the histogram according to a horizontal angular resolution, a vertical angular resolution, the first angle, and the second angle;
wherein the horizontal angular resolution is the minimum angular difference between two horizontally adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data, and the vertical angular resolution is the minimum angular difference between two vertically adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data; the first angle corresponding to a three-dimensional coordinate represents the angle by which the projection of the point onto the xy plane of the three-dimensional coordinate axes deviates from the x-axis in the horizontal direction; the second angle corresponding to the three-dimensional coordinate represents the angle by which the point deviates from the xy plane of the three-dimensional coordinate axes in the vertical direction; and the horizontal direction is the direction along the y-axis and the vertical direction is the direction along the z-axis.
7. The apparatus of claim 5, wherein the first feature map generation module is further configured to:
convolve the histogram with selected layers of the deep residual network to obtain a convolved feature map; and
downsample the convolved feature map in the width dimension to obtain the multi-channel feature map.
8. The apparatus of claim 5, wherein the position information determination module is further configured to:
compare the output values of the two channels of the two-channel feature map and determine, from the comparison result, a label for each point of the two-channel feature map, the label indicating whether the point belongs to the detection target; and
select the points whose labels indicate that they belong to the detection target, and determine the position information of the detection target from the three-dimensional coordinates corresponding to the selected points.
9. An electronic device, comprising:
one or more processors; and
a memory storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-4.
Technical Field
The invention relates to the technical field of computers, in particular to a target detection method and device.
Background
With the development of computer technology and the wide application of image technology, target detection is applied in more and more scenarios. At present, vehicle detection in the field of autonomous driving relies mostly on images, because images carry very rich information. Accurate position information is, however, difficult to obtain from images alone, so the lidar and the camera must be calibrated against each other: once the calibration is accurate, a vehicle detected in the image is mapped onto the laser point cloud, and driving decisions are then made from the position information in the point cloud. If the calibration is inaccurate, the position of the vehicle mapped onto the laser point cloud is inaccurate, which may compromise the decisions of the unmanned vehicle.
In the process of implementing the invention, the inventor found that the prior art has at least the following problem:
the existing method depends on calibration between the lidar and the camera, and the accuracy and reliability of its detection results are poor.
Disclosure of Invention
In view of this, embodiments of the present invention provide a target detection method and apparatus that process laser point cloud data directly to detect the targets within it, without depending on calibration between the lidar and the camera, thereby improving the accuracy and reliability of the detection result.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an object detection method.
A method of target detection, comprising: mapping the three-dimensional coordinates of collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule; performing convolution and downsampling on the histogram with a network model constructed from a deep residual network to obtain a multi-channel feature map; deconvolving the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram; and determining position information of a detection target according to the two-channel feature map.
Optionally, the step of mapping the three-dimensional coordinates of the collected point cloud data to two-dimensional coordinates on the histogram according to the preset mapping rule includes: calculating a first angle and a second angle corresponding to a three-dimensional coordinate according to the values of the three-dimensional coordinates of the collected point cloud data; and obtaining the two-dimensional coordinate of the point at which the three-dimensional coordinate is mapped onto the histogram according to a horizontal angular resolution, a vertical angular resolution, the first angle, and the second angle. The horizontal angular resolution is the minimum angular difference between two horizontally adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data; the vertical angular resolution is the minimum angular difference between two vertically adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data. The first angle corresponding to a three-dimensional coordinate represents the angle by which the projection of the point onto the xy plane of the three-dimensional coordinate axes deviates from the x-axis in the horizontal direction; the second angle represents the angle by which the point deviates from the xy plane in the vertical direction, where the horizontal direction is the direction along the y-axis and the vertical direction is the direction along the z-axis.
Optionally, the step of performing convolution and downsampling on the histogram with the network model constructed from the deep residual network to obtain the multi-channel feature map includes: convolving the histogram with selected layers of the deep residual network to obtain a convolved feature map; and downsampling the convolved feature map in the width dimension to obtain the multi-channel feature map.
Optionally, the step of determining the position information of the detection target according to the two-channel feature map includes: comparing the output values of the two channels of the two-channel feature map and determining, from the comparison result, a label for each point of the two-channel feature map, the label indicating whether the point belongs to the detection target; and selecting the points whose labels indicate that they belong to the detection target, and determining the position information of the detection target from the three-dimensional coordinates corresponding to the selected points.
According to another aspect of the embodiments of the present invention, there is provided an object detecting apparatus.
A target detection device, comprising: a point cloud coordinate mapping module, configured to map the three-dimensional coordinates of collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule; a first feature map generation module, configured to perform convolution and downsampling on the histogram with a network model constructed from a deep residual network, to obtain a multi-channel feature map; a second feature map generation module, configured to deconvolve the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram; and a position information determination module, configured to determine position information of a detection target according to the two-channel feature map.
Optionally, the point cloud coordinate mapping module is further configured to: calculate a first angle and a second angle corresponding to a three-dimensional coordinate according to the values of the three-dimensional coordinates of the collected point cloud data; and obtain the two-dimensional coordinate of the point at which the three-dimensional coordinate is mapped onto the histogram according to a horizontal angular resolution, a vertical angular resolution, the first angle, and the second angle. The horizontal angular resolution is the minimum angular difference between two horizontally adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data; the vertical angular resolution is the minimum angular difference between two vertically adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data. The first angle corresponding to a three-dimensional coordinate represents the angle by which the projection of the point onto the xy plane of the three-dimensional coordinate axes deviates from the x-axis in the horizontal direction; the second angle represents the angle by which the point deviates from the xy plane in the vertical direction, where the horizontal direction is the direction along the y-axis and the vertical direction is the direction along the z-axis.
Optionally, the first feature map generation module is further configured to: convolve the histogram with selected layers of the deep residual network to obtain a convolved feature map; and downsample the convolved feature map in the width dimension to obtain the multi-channel feature map.
Optionally, the position information determination module is further configured to: compare the output values of the two channels of the two-channel feature map and determine, from the comparison result, a label for each point of the two-channel feature map, the label indicating whether the point belongs to the detection target; and select the points whose labels indicate that they belong to the detection target, and determine the position information of the detection target from the three-dimensional coordinates corresponding to the selected points.
According to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device, comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the target detection method provided by the present invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the target detection method provided by the present invention.
One embodiment of the invention described above has the following advantage or beneficial effect: by mapping the three-dimensional coordinates of collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule, performing convolution and downsampling on the histogram with a network model constructed from a deep residual network to obtain a multi-channel feature map, deconvolving the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram, and determining position information of a detection target according to the two-channel feature map, the method can process laser point cloud data directly to detect the targets within it, without depending on calibration between the lidar and the camera, thereby improving the accuracy and reliability of the detection result.
Further effects of the optional implementations above are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main blocks of an object detection apparatus according to an embodiment of the present invention;
FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 4 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main steps of a target detection method according to an embodiment of the present invention.
As shown in FIG. 1, the target detection method of the embodiment of the present invention mainly includes the following steps S101 to S104.
Step S101: map the three-dimensional coordinates of the collected point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule.
The collected point cloud data is a point cloud acquired by the lidar; each point in the point cloud has its own three-dimensional coordinates.
Step S101 may specifically include: calculating a first angle and a second angle corresponding to a three-dimensional coordinate according to the values of the three-dimensional coordinates of the collected point cloud data; and obtaining the two-dimensional coordinate of the point at which the three-dimensional coordinate is mapped onto the histogram according to the horizontal angular resolution, the vertical angular resolution, and the calculated first and second angles.
The horizontal angular resolution is the minimum angular difference between two horizontally adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data; the vertical angular resolution is the minimum angular difference between two vertically adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data.
The first angle corresponding to a three-dimensional coordinate represents the angle by which the projection of the point onto the xy plane of the three-dimensional coordinate axes deviates from the x-axis in the horizontal direction; the second angle represents the angle by which the point deviates from the xy plane in the vertical direction, where the horizontal direction is the direction along the y-axis and the vertical direction is the direction along the z-axis.
Each point on the histogram has a four-dimensional feature comprising the point's three-dimensional coordinates (i.e., the x, y, and z values of its spatial coordinates) and its laser reflection intensity.
Step S102: perform convolution and downsampling on the histogram with a network model constructed from the deep residual network, to obtain a multi-channel feature map.
The deep residual network may be a ResNet50 network (a deep residual network 50 layers deep), or another deep residual network such as ResNet101.
Step S102 may specifically include: convolving the histogram with selected layers of the deep residual network to obtain a convolved feature map; and downsampling the convolved feature map in the width dimension to obtain the multi-channel feature map.
Step S103: deconvolve the multi-channel feature map to obtain a two-channel feature map of the same size as the histogram.
Step S104: determine the position information of the detection target according to the two-channel feature map.
Step S104 may specifically include: comparing the output values of the two channels of the two-channel feature map and determining, from the comparison result, a label for each point of the two-channel feature map, the label indicating whether the point belongs to the detection target; and selecting the points whose labels indicate that they belong to the detection target, and determining the position information of the detection target from the three-dimensional coordinates corresponding to the selected points.
The following describes the target detection method of an embodiment of the present invention, taking the detection of vehicles in the field of unmanned driving as an example. The method is not limited to detecting vehicles; it can also be used to detect other targets, such as pedestrians.
To detect the position of a vehicle, all points of the cloud belonging to the vehicle must be identified, and the position information of the vehicle obtained from the three-dimensional coordinates of those points.
First, the point cloud collected by the lidar is mapped onto the histogram. Assume a point of the point cloud has three-dimensional coordinates (x, y, z); two angles, θ (the first angle) and φ (the second angle), are calculated from these coordinate values:
θ = arctan(y / x), φ = arctan(z / √(x² + y²))
Here θ (the first angle) represents the angle by which the projection of the point onto the xy plane (the plane formed by the x-axis and the y-axis of the three-dimensional coordinate axes) — a projection point with coordinates (x, y) — deviates from the x-axis in the horizontal direction, and φ (the second angle) represents the angle by which the point deviates from the xy plane in the vertical direction. In other words, letting W denote the projection point, θ is the angle between the line connecting W to the coordinate origin (two-dimensional coordinates (0, 0), three-dimensional coordinates (0, 0, 0)) and the x-axis, and φ is the angle between the line connecting the point (x, y, z) to the origin (0, 0, 0) and the xy plane. Then, from θ, φ, the horizontal angular resolution Δθ, and the vertical angular resolution Δφ, the two-dimensional coordinates (X, Y) of the point (x, y, z) on the histogram are calculated; specifically,
X = θ / Δθ, Y = φ / Δφ
which maps the point (x, y, z) onto the histogram. Mapping every point of the point cloud in this way completes the mapping of the point cloud acquired by the lidar onto the histogram. The horizontal angular resolution Δθ is the minimum angular difference between two horizontally adjacent three-dimensional coordinate points acquired by the lidar in the point cloud data, numerically equal to the minimum difference between the θ values of two horizontally adjacent points; the vertical angular resolution Δφ is the minimum angular difference between two vertically adjacent three-dimensional coordinate points, numerically equal to the minimum difference between their φ values. As before, the horizontal direction is the direction along the y-axis and the vertical direction is the direction along the z-axis; correspondingly, two horizontally adjacent points are points adjacent along the y-axis direction, and two vertically adjacent points are points adjacent along the z-axis direction.
Assume the lidar has 16 lines (beams) and each line produces 2016 points; the histogram is then a 16 × 2016 grid, where 16 is the image height and 2016 is the image width, in pixels. Each position (point, or pixel) of the histogram has four features (a four-dimensional feature): the x, y, and z coordinates of the corresponding cloud point and the laser reflection intensity at that point — the four-dimensional feature that every point of the point cloud carries. Thus, for each frame of the point cloud, a 4 × 16 × 2016 three-dimensional array is obtained; if no cloud point maps to a position on the histogram, the four-dimensional feature at that position is filled with zeros.
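The mapping and tensor layout described above can be illustrated with a minimal numpy sketch. The function name, the arctangent-based angle formulas, and the index quantization (floor of angle over resolution, with wrap-around) are illustrative assumptions consistent with the angle definitions given earlier, not the patent's exact implementation:

```python
import numpy as np

def build_range_image(points, intensities, d_theta, d_phi,
                      height=16, width=2016):
    """Map an (N, 3) point cloud to a 4 x height x width histogram tensor.

    Each pixel carries the four-dimensional feature (x, y, z, intensity);
    positions with no mapped point remain zero-filled, as described above.
    """
    img = np.zeros((4, height, width), dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(y, x)                    # first angle (from x-axis)
    phi = np.arctan2(z, np.sqrt(x**2 + y**2))   # second angle (from xy plane)
    # Quantize each angle by the corresponding angular resolution.
    col = np.floor(theta / d_theta).astype(int) % width
    row = np.floor(phi / d_phi).astype(int) % height
    img[0, row, col] = x
    img[1, row, col] = y
    img[2, row, col] = z
    img[3, row, col] = intensities
    return img
```

With a 16-line lidar emitting 2016 points per line, this yields the 4 × 16 × 2016 array used as the network input.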
After the histogram is obtained, it is convolved with a network model constructed from a ResNet50 network (a deep residual network 50 layers deep). Not all layers of ResNet50 are used: the network model retains only res5c_relu (an activation layer) and the layers before it, because the layers after res5c_relu are not useful for the vehicle detection task of this embodiment and would cause information loss. Also, since the height of the histogram is only 16, the downsampling layer of ResNet50 (there is only one downsampling layer among the retained layers) is modified to reduce the feature map only in the width dimension: its original stride of 2 in both width and height is changed to stride 2 in width and stride 1 in height, so that after downsampling the width is halved while the height is unchanged. In addition, every convolutional layer of ResNet50 with stride 2 is modified to have stride 2 in the width dimension and stride 1 in the height dimension, while the convolution kernels keep the original ResNet50 sizes (3 × 3, 5 × 5, etc.) rather than being changed to 1 × 3 or 1 × 5. The advantage is that the height dimension is never reduced during the training phase, and the convolutional layers still learn features over 3 × 3 or 5 × 5 regions instead of 1 × 3 or 1 × 5 regions — in other words, they can learn neighboring features along the height dimension. The network model of this embodiment is obtained through these modifications, i.e., constructed from the ResNet50 network.
After the histogram is convolved and downsampled by the network model constructed as described above, the height dimension is unchanged and the width is reduced to one thirty-second of the original, yielding a 2048 × 16 × 63 multi-channel feature map, where 2048 is the number of channels and 16 × 63 is the size of each feature map.
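The width reduction can be checked arithmetically: the retained portion of ResNet50 contains five stride-2 stages (a total downsampling factor of 32), and with each stride applied only in the width dimension, the 16 × 2016 input shrinks as follows. This is a sketch of the size bookkeeping only, not of the network itself:

```python
def downsampled_size(height, width, num_stride2_stages=5):
    """Track the feature-map size through width-only stride-2 stages."""
    for _ in range(num_stride2_stages):
        width //= 2   # stride 2 in width, stride 1 in height
    return height, width

print(downsampled_size(16, 2016))  # (16, 63): 2016 / 2**5 == 63
```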
A deconvolution layer is connected after the constructed network model. The deconvolution layer has 2 convolution kernels of size 1 × 64 (kernel height 1, kernel width 64), with stride 1 in the height dimension and stride 32 in the width dimension. After the deconvolution operation, a 2 × 16 × 2016 two-channel feature map is obtained, where 2 is the number of channels and 16 × 2016 is the size of the feature map; the deconvolution layer thus restores the feature map to the same size as the original input (the histogram).
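That this deconvolution restores the width from 63 back to 2016 can be verified with the standard transposed-convolution output-size formula, out = (in − 1) · stride − 2 · padding + kernel. A width padding of 16 is an assumption here (the padding is not stated above), chosen because it makes the sizes work out:

```python
def deconv_out(size, kernel, stride, padding):
    """Standard transposed-convolution (deconvolution) output-size formula."""
    return (size - 1) * stride - 2 * padding + kernel

h = deconv_out(16, kernel=1, stride=1, padding=0)     # height stays 16
w = deconv_out(63, kernel=64, stride=32, padding=16)  # width restored to 2016
print(h, w)  # 16 2016
```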
The output values of the two channels (the first channel and the second channel) are compared at each of the 16 × 2016 points (positions) of the two-channel feature map: for a point at which the output value of the first channel is greater than that of the second channel, the label of the point is 1, indicating that the point belongs to a vehicle; otherwise the label is 0, indicating that it does not. Each point of the two-channel feature map corresponds to a three-dimensional coordinate, namely the three-dimensional coordinate of the point at the same position in the histogram fed into the network model (obtained from the four-dimensional feature of that point in the histogram).
Since the points in the histogram are obtained by mapping the point cloud acquired by the laser radar, once the labels of all points in the two-channel feature map are determined, it is determined whether each point in the point cloud belongs to a vehicle; when all the points belonging to the vehicle have been identified, the vehicle is identified. The position information of the vehicle is then determined from the three-dimensional coordinates of the points belonging to the vehicle, namely the three-dimensional coordinates corresponding to those points in the two-channel feature map; the position information of the vehicle comprises the three-dimensional coordinates of the point cloud belonging to the vehicle.
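Gathering the vehicle's position information then reduces to collecting the three-dimensional coordinates of the points labelled 1. A sketch with hypothetical toy data (the coordinates are illustrative, standing in for the values carried by the histogram's four-dimensional features):

```python
def vehicle_points(labels, coords):
    """Collect the 3-D coordinates of every point whose label is 1 (vehicle)."""
    return [coords[i][j]
            for i, row in enumerate(labels)
            for j, lab in enumerate(row) if lab == 1]

labels = [[1, 0], [0, 1]]                       # toy 2 x 2 label map
coords = [[(1.0, 2.0, 0.5), (9.0, 9.0, 9.0)],   # per-point 3-D coordinates
          [(8.0, 8.0, 8.0), (1.2, 2.1, 0.4)]]
print(vehicle_points(labels, coords))           # [(1.0, 2.0, 0.5), (1.2, 2.1, 0.4)]
```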
As described above, the embodiment of the present invention connects a deconvolution layer behind the constructed network model, thereby constructing a new target detection model. Before a target such as a vehicle is detected based on the target detection model of the embodiment of the present invention, the target detection model needs to be trained. In the training stage, a back propagation algorithm is adopted to compute gradients, and a stochastic gradient descent method is adopted to learn the model parameters. Specifically, ground-truth labels of the training samples (indicating whether each point belongs to a vehicle) are annotated before training; during each training iteration, a cost function value (Loss value) is calculated from the annotated ground-truth labels and the output of the target detection model, the Loss value is continuously reduced, and finally more accurate output labels are obtained. In gradient descent, the Loss is reduced by continuously moving in the direction opposite to the gradient at the current point; in stochastic gradient descent, each update uses the gradient calculated from only one training sample, and that gradient is calculated by the back propagation algorithm.
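The stochastic-gradient-descent update moves the parameters opposite to the gradient of the Loss for a single sample. A one-parameter toy sketch (the quadratic loss and learning rate are illustrative assumptions; in the real model the gradient comes from back propagation through the network):

```python
def sgd_step(w, grad_w, lr=0.1):
    """One stochastic-gradient-descent update: step against the gradient."""
    return w - lr * grad_w

w = 0.0
for _ in range(50):
    # Toy loss L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
    w = sgd_step(w, 2 * (w - 3))
# w has now converged close to 3, the minimiser of the toy loss.
```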
In the testing stage, a new frame of point cloud is input, the trained target detection model outputs labels that accurately indicate all the points belonging to the vehicle, and the position information of the vehicle is obtained from the three-dimensional coordinates in the point cloud corresponding to those points.
According to the vehicle detection method provided by the embodiment of the present invention, the laser point cloud data is directly processed to detect the detection target therein, without relying on the calibration between a laser radar and a camera, so that the accuracy and reliability of the detection result are improved and the decision-making of unmanned driving is not adversely affected.
Fig. 2 is a schematic diagram of main blocks of an object detection apparatus according to an embodiment of the present invention.
As shown in fig. 2, the target detection apparatus 200 according to the embodiment of the present invention mainly includes: a point cloud coordinate mapping module 201, a first feature map generating module 202, a second feature map generating module 203, and a position information determining module 204.
The point cloud coordinate mapping module 201 is configured to map the three-dimensional coordinates of the collected point cloud data into two-dimensional coordinates on a histogram according to a preset mapping rule, where each point on the histogram has a four-dimensional feature, and the four-dimensional feature includes the three-dimensional coordinates corresponding to the point and the laser reflection intensity corresponding to the point.
The point cloud coordinate mapping module 201 may be specifically configured to: calculate a first angle and a second angle corresponding to the three-dimensional coordinates according to the values of the three-dimensional coordinates of the collected point cloud data; and obtain the two-dimensional coordinates of the point on the histogram to which the three-dimensional coordinates are mapped, according to the horizontal angular resolution, the vertical angular resolution, the first angle and the second angle, wherein the horizontal angular resolution is the minimum angle difference between two horizontally adjacent three-dimensional coordinate points in the point cloud data collected by the laser radar, and the vertical angular resolution is the minimum angle difference between two vertically adjacent three-dimensional coordinate points in the point cloud data collected by the laser radar.
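Under a common interpretation of such a mapping rule, the first angle is the azimuth and the second angle is the elevation of the point, and dividing by the angular resolutions converts angles into column and row indices. The following is a hedged sketch under that assumption, not the exact formula of the embodiment:

```python
import math

def map_point(x, y, z, h_res, v_res):
    """Sketch of the coordinate mapping: azimuth/elevation divided by resolutions.
    h_res and v_res are the horizontal and vertical angular resolutions in radians."""
    first_angle = math.atan2(y, x)                    # azimuth (assumed)
    second_angle = math.atan2(z, math.hypot(x, y))    # elevation (assumed)
    col = int(first_angle / h_res)
    row = int(second_angle / v_res)
    return row, col

print(map_point(0.0, 1.0, 0.0, 0.01, 0.01))  # (0, 157): 90 degrees / 0.01 rad per column
```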
The first feature map generation module 202 is configured to perform convolution and downsampling on the histogram by using a network model constructed by a depth residual error network, so as to obtain a multi-channel feature map.
The first feature map generation module 202 may specifically be configured to: selecting a specific layer of the depth residual error network to carry out convolution on the histogram so as to obtain a feature map after convolution; and performing down-sampling on the convolved feature map in the width dimension to obtain a multi-channel feature map.
The second feature map generation module 203 is configured to perform deconvolution on the multi-channel feature map to obtain a two-channel feature map with the same size as the histogram.
The position information determining module 204 is configured to determine position information of the detection target according to the two-channel feature map.
The location information determining module 204 may be specifically configured to: comparing the output values of the two channels of the two-channel characteristic diagram, and determining a label of each point of the two-channel characteristic diagram according to the comparison result, wherein the label indicates whether the point belongs to the detection target or not; and selecting each point of which the label indication belongs to the detection target, and determining the position information of the detection target according to the three-dimensional coordinates corresponding to the selected points.
The function of the second feature map generation module of the object detection apparatus 200 may be implemented using the deconvolution layer of the object detection model described above.
According to the vehicle detection device provided by the embodiment of the invention, the laser point cloud data is directly processed to detect the detection target in the laser point cloud data, and the calibration between a laser radar and a camera is not relied on, so that the accuracy and reliability of the detection result are improved.
In addition, the detailed implementation of the object detection apparatus of the embodiment of the present invention has been described in detail in the above object detection method, and therefore will not be repeated here.
Fig. 3 shows an
As shown in fig. 3, the
The user may use the
The
The
It should be noted that the object detection method provided in the embodiment of the present invention may be executed by the
It should be understood that the number of terminal devices, networks, and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 4, a block diagram of a
As shown in fig. 4, the
The following components are connected to the I/O interface 405: an
In particular, according to embodiments of the present disclosure, the processes described above with reference to the main step schematic may be implemented as computer software programs. For example, the disclosed embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the main step diagram. In such an embodiment, the computer program may be downloaded and installed from a network through the
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The main step diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the main step diagrams or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the main step diagrams and block diagrams, and combinations of blocks therein, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a point cloud coordinate mapping module 201, a first feature map generation module 202, a second feature map generation module 203, and a position information determination module 204. The names of these modules do not form a limitation to the modules themselves in some cases, for example, the point cloud coordinate mapping module 201 may also be described as "a module for mapping three-dimensional coordinates of the acquired point cloud data to two-dimensional coordinates on a histogram according to a preset mapping rule".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: map the three-dimensional coordinates of the collected point cloud data into two-dimensional coordinates on the histogram according to a preset mapping rule; perform convolution and downsampling on the histogram by using a network model constructed by a deep residual network to obtain a multi-channel feature map; perform deconvolution on the multi-channel feature map to obtain a two-channel feature map with the same size as the histogram; and determine the position information of the detection target according to the two-channel feature map.
According to the technical scheme of the embodiment of the present invention, the three-dimensional coordinates of the collected point cloud data are mapped into two-dimensional coordinates on the histogram according to a preset mapping rule; convolution and downsampling are performed on the histogram by using a network model constructed by a deep residual network to obtain a multi-channel feature map; deconvolution is performed on the multi-channel feature map to obtain a two-channel feature map with the same size as the histogram; and the position information of the detection target is determined according to the two-channel feature map. The method can directly process the laser point cloud data to detect the detection target therein without depending on the calibration between the laser radar and the camera, thereby improving the accuracy and reliability of the detection result.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.