Self-motion estimation method and device and model training method and device

Document No.: 1686274 | Publication date: 2020-01-03

Reading note: This technology, "Self-motion estimation method and device and model training method and device", was designed and created by 曹现雄 and 崔成焘 on 2019-03-20. Main content: A method and apparatus for estimating self-motion, and a training apparatus and method thereof, are disclosed. The apparatus for estimating self-motion may estimate self-motion information from radar sensing data based on a motion recognition model.

1. A method of estimating self-motion, the method comprising:

generating input data based on radar sensing data collected by a radar sensor for each time frame; and

estimating, based on a motion recognition model, self-motion information of a self-motion estimation device from the input data.

2. The method of claim 1, wherein estimating the self-motion information comprises:

extracting feature data from the input data based on a first model; and

calculating the self-motion information from the feature data based on a second model.

3. The method of claim 1, wherein estimating the self-motion information comprises:

calculating at least one of a position and a posture of the self-motion estimation device as the self-motion information.

4. The method of claim 1, wherein estimating the self-motion information comprises:

inputting radar sensing data corresponding to at least two time frames to a layer corresponding to one time frame in the motion recognition model.

5. The method of claim 1, wherein the motion recognition model comprises:

a first model comprising a layer corresponding to each of a plurality of time frames; and

a second model connected to the plurality of layers of the first model, and

wherein estimating the self-motion information comprises:

extracting feature data for a respective time frame from the input data for the respective time frame based on a layer in the first model corresponding to each of a plurality of time frames; and

calculating self-motion information for the respective time frame from the extracted feature data based on the second model.

6. The method of claim 1, wherein estimating the self-motion information comprises:

extracting current feature data from input data corresponding to a current frame based on a first model;

loading prior feature data corresponding to a prior frame from memory; and

calculating the self-motion information from the prior feature data and the current feature data based on a second model.

7. The method of claim 1, further comprising:

storing, in a memory, feature data calculated in the current frame based on a first model included in the motion recognition model.

8. The method of claim 1, wherein generating input data comprises:

detecting a radar signal using at least one radar sensor arranged along an outer surface of the self-motion estimation device; and

generating the radar sensing data by preprocessing detected radar signals.

9. The method of claim 1, wherein generating input data comprises:

selecting, from a plurality of items of radar sensing data, at least two items of radar sensing data corresponding to time frames that differ from each other by a preset time interval; and

generating the input data based on the selected items of radar sensing data.

10. The method of claim 1, wherein generating input data comprises:

in response to receiving radar sensing data corresponding to a subsequent frame, excluding radar sensing data corresponding to a first frame of the plurality of time frames stacked in the input data.

11. The method of claim 1, wherein generating input data comprises:

generating radar sensing data from the radar signals, the radar sensing data being indicative of an angle and a distance of a point detected by the radar sensor for each quantized speed.

12. The method of claim 1, wherein generating input data comprises:

generating input data from the radar sensing data, the input data being indicative of a horizontal angle and a distance of points detected by the radar sensor for each quantized elevation angle.

13. The method of claim 1, wherein generating input data comprises:

classifying the radar sensing data into static data about a static point and dynamic data about a dynamic point;

generating static input data from the static data, the static input data being indicative of a horizontal angle and a distance of points detected by the radar sensor for each quantized elevation angle; and

generating dynamic input data from the dynamic data, the dynamic input data being indicative of a horizontal angle and a distance of points detected by the radar sensor for each quantized elevation angle.

14. The method of claim 1, wherein the motion recognition model comprises a Convolutional Neural Network (CNN) and a recurrent neural network.

15. The method of claim 14, wherein the recurrent neural network is a bidirectional neural network.

16. The method of claim 1, wherein estimating the self-motion information comprises:

in response to a plurality of items of radar sensing data corresponding to a plurality of time frames being stacked in the input data, self-motion information is calculated for each of the plurality of time frames.

17. The method of claim 1, further comprising:

detecting an object in the vicinity of the self-motion estimation device based on the estimated self-motion information.

18. A model training method, comprising:

generating reference input data relating to a plurality of time frames and reference output data corresponding to the reference input data from reference radar sensing data; and

training, by a processor, parameters of a motion recognition model such that the motion recognition model outputs the reference output data based on the reference input data.

19. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

20. A self-motion estimation device comprising:

a radar sensor configured to generate radar sensing data; and

a processor configured to generate input data from the radar sensing data for each time frame, and to estimate self-motion information of the self-motion estimation device from the input data based on a motion recognition model.

Technical Field

The following description relates to techniques for estimating self-motion using a motion recognition model.

Background

Recently, to solve the problem of classifying input patterns into specific groups, the application of efficient human pattern recognition methods to actual computers has been actively studied. One such study concerns artificial neural networks, which model the characteristics of human biological neurons using mathematical expressions. To solve the problem of classifying input patterns into specific groups, an artificial neural network uses an algorithm that mimics the human ability to learn. Based on this algorithm, the artificial neural network may generate a mapping between input patterns and output patterns, and the ability to generate such a mapping is expressed as the learning ability of the artificial neural network. Furthermore, based on training results, an artificial neural network has a generalization ability to generate relatively correct output for input patterns that have not been used for learning.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In one general aspect, a method of estimating self-motion includes: generating input data based on radar sensing data collected by a radar sensor for each time frame, and estimating, based on a motion recognition model, self-motion information of a self-motion estimation device from the input data.

Estimating the self-motion information may include: extracting feature data from the input data based on a first model, and calculating the self-motion information from the feature data based on a second model.

Estimating the self-motion information may include: calculating at least one of a position and a posture of the self-motion estimation device as the self-motion information.

Estimating the self-motion information may include: inputting radar sensing data corresponding to at least two time frames to a layer corresponding to one time frame in the motion recognition model.

The motion recognition model may include a first model including a layer corresponding to each of the plurality of time frames and a second model connected to the plurality of layers of the first model. Estimating the self-motion information may include: extracting feature data for a respective time frame from the input data for the respective time frame based on a layer in the first model corresponding to each of the plurality of time frames; and calculating self-motion information for the respective time frame from the extracted feature data based on the second model.

Estimating the self-motion information may include: extracting current feature data from input data corresponding to a current frame based on a first model, loading prior feature data corresponding to a prior frame from a memory, and calculating the self-motion information from the prior feature data and the current feature data based on a second model.

The method may further comprise: feature data calculated in the current frame based on a first model included in the motion recognition model is stored in a memory.

Generating the input data may include: the method includes detecting radar signals using at least one radar sensor arranged along an outer surface of the self-motion estimation device, and generating radar sensing data by preprocessing the detected radar signals.

Generating the input data may include: at least two items of radar sensing data corresponding to time frames different from each other by a preset time interval are selected from the plurality of items of radar sensing data, and input data is generated based on the selected items of radar sensing data.

Generating the input data may include: in response to receiving radar sensing data corresponding to a subsequent frame, radar sensing data corresponding to a first frame of the plurality of time frames stacked in the input data is excluded.

Generating the input data may include: radar sensing data is generated from the radar signals, the radar sensing data being indicative of the angle and distance of points detected by the radar sensor for each quantised speed.

Generating the input data may include: input data is generated from the radar sensing data, the input data being indicative of the horizontal angle and distance of points detected by the radar sensor for each quantized elevation angle.

Generating the input data may include: classifying the radar sensing data into static data regarding the static point and dynamic data regarding the dynamic point; generating static input data from static data, the static input data being indicative of a horizontal angle and a distance of a point detected by the radar sensor for each quantized elevation angle; and generating dynamic input data from the dynamic data, the dynamic input data being indicative of the horizontal angle and distance of points detected by the radar sensor for each quantized elevation angle.

The motion recognition model may include a Convolutional Neural Network (CNN) and a recurrent neural network.

The recurrent neural network may be a bidirectional neural network.

Estimating the self-motion information may include: in response to a plurality of items of radar sensing data corresponding to a plurality of time frames being stacked in the input data, self-motion information is calculated for each of the plurality of time frames.

The method may further comprise: an object near the self-motion estimating apparatus is detected based on the estimated self-motion information.

In another general aspect, a model training method includes: generating reference input data relating to a plurality of time frames and reference output data corresponding to the reference input data from reference radar sensing data; and training, by a processor, parameters of a motion recognition model such that the motion recognition model outputs the reference output data based on the reference input data.

In yet another general aspect, a self-motion estimation apparatus includes: a radar sensor configured to generate radar sensing data; and a processor configured to generate input data from the radar sensing data for each time frame, and to estimate self-motion information of the self-motion estimation apparatus from the input data based on a motion recognition model.

Other features and aspects will become apparent from the following detailed description, the accompanying drawings, and the claims.

Drawings

Fig. 1 shows an example of estimating self-motion information from radar sensing data based on a motion recognition model.

Fig. 2 shows an example of the structure of a motion recognition model.

Fig. 3 shows an example of a self-motion estimation method.

Fig. 4 shows an example of a self-motion estimation device.

Fig. 5 shows an example of a process of self-motion estimation.

Fig. 6 shows an example of the structure of a motion recognition model.

Fig. 7 and 8 show examples of generating input data.

Fig. 9 and 10 show examples of input data generated using stacked radar sensing data.

Fig. 11 and 12 show examples of model training methods.

FIG. 13 illustrates an example of a model training apparatus.

FIG. 14 shows an example of a model training process.

Fig. 15 shows an example of a self-motion estimation apparatus using a plurality of radar sensors.

Throughout the drawings and detailed description, the same reference numerals should be understood to refer to the same elements, features and structures unless otherwise described or provided. The figures may not necessarily be to scale and relative sizes, proportions and depictions of elements in the figures may be exaggerated for clarity, illustration and convenience.

Detailed Description

The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, devices, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, devices, and/or systems described herein will be apparent after understanding the disclosure of the present application. For example, the order of operations described herein is merely an example and is not limited to those set forth herein, but may be changed in ways that will be apparent after understanding the disclosure of the present application, except for operations that must occur in a certain order. Moreover, descriptions of features known in the art may be omitted for clarity and conciseness.

The features described herein may be embodied in different forms and should not be construed as limited to the examples described herein. Rather, the examples described herein are provided merely to illustrate some of the many possible ways of implementing the methods, devices, and/or systems described herein, which will be apparent after understanding the disclosure of the present application.

The terminology used herein is for the purpose of describing various examples only and is not intended to be limiting of the disclosure. The articles "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," and "having" mean that the recited features, numbers, operations, elements, components, and/or combinations thereof are present, but do not preclude the presence or addition of one or more other features, numbers, operations, elements, components, and/or combinations thereof.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

With regard to the reference numerals assigned to the elements in the drawings, it should be noted that the same elements are denoted by the same reference numerals where possible, even when they are shown in different drawings. Further, in the description of the embodiments, when it is determined that a detailed description of known related structures or functions would obscure the present disclosure, such description will be omitted.

Fig. 1 shows an example of estimating self-motion information from radar sensing data based on a motion recognition model.

The motion recognition model may be a model designed to output the self-motion information 109 from the radar sensing data 101. The motion recognition model may have a machine learning structure, but is not limited thereto. Using the motion recognition model, the self-motion estimation device generates the self-motion information 109 for each time frame at once from input data in which the radar sensing data 101 of a plurality of time frames is stacked.

The radar sensing data 101 may be data sensed by a radar sensor. For example, the radar sensor receives a signal that was emitted from the radar sensor toward a target point and reflected back, and generates the radar sensing data 101 using the received signal. The radar sensing data 101 includes a distance from the radar sensor to the target point. As shown in fig. 1, the radar sensing data 101 is input to the motion recognition model in a stacked form.

The self-motion information 109 is information related to the motion of the self-motion estimation device and includes, for example, the relative motion, coordinates, and pose of the device. In fig. 1, the self-motion information 109 indicates coordinates (x_i, y_i) for each of N time frames, N being an integer greater than or equal to 1 and i being an integer between 1 and N. A time frame may be a unit of divided time. The self-motion information 109 is not limited to the above examples, and thus may include various values, such as three-dimensional (3D) coordinates (e.g., x, y, and z coordinates, or distance, azimuth, and elevation), two-dimensional (2D) coordinates (e.g., x and y coordinates, or distance and angular coordinates), eastward, northward, altitude, roll, yaw, pitch, velocity, and so forth.

The motion recognition model includes a neural network 100. Based on the neural network 100, a method of performing self-motion recognition and an apparatus for performing the method (hereinafter referred to as a "self-motion estimation apparatus") are proposed, as are a method of training the neural network 100 and an apparatus for training the neural network 100 (hereinafter referred to as a "model training apparatus").

Before describing the self-motion recognition, the structure of the neural network 100 will be described below.

The neural network 100 includes a plurality of layers, each layer including a plurality of nodes. Further, the neural network 100 includes connection weights that connect a plurality of nodes included in a plurality of layers to nodes included in another layer. The model training apparatus acquires the neural network 100 from an internal database stored in a memory, or receives the neural network 100 from an external server through a communicator.

The neural network 100 is, for example, a recognition model that performs a desired operation using a plurality of artificial nodes connected by connecting lines (e.g., edges). The neural network 100 may be implemented by hardware or a combination of hardware and software. The neural network 100 may also be referred to as an artificial neural network.

The neural network 100 includes a plurality of nodes in each layer. The nodes are connected to each other by edges having connection weights. A connection weight may be a predetermined value of an edge, and may also be referred to as a connection strength.

The neural network 100 includes a plurality of layers. The neural network 100 includes, for example, an input layer 110, a hidden layer 120, and an output layer 130. The input layer 110 receives input for performing training or recognition and transmits the input to the hidden layer 120. The output layer 130 generates the output of the neural network 100 based on the signal received from the hidden layer 120. The hidden layer 120 is interposed between the input layer 110 and the output layer 130, and changes a training input of training data transmitted through the input layer 110 into a value that is easily predicted.

The input layer 110, the hidden layer 120, and the output layer 130 each include a plurality of nodes. The nodes included in the input layer 110 may be referred to as input nodes. The nodes included in the hidden layer 120 may be referred to as hidden nodes. The nodes included in the output layer 130 may be referred to as output nodes.

The input nodes included in the input layer 110 and the hidden nodes included in the hidden layer 120 are connected by edges having connection weights. The hidden node included in the hidden layer 120 and the output node included in the output layer 130 are connected by an edge having a connection weight.

Although not shown, the neural network includes a plurality of hidden layers. A neural network including a plurality of hidden layers may be referred to as a deep neural network. Training of a deep neural network may be referred to as deep learning. When the hidden layer 120 includes a first hidden layer, a second hidden layer, and a third hidden layer, an output of a hidden node included in the first hidden layer is connected to a hidden node included in the second hidden layer. An output of a hidden node included in the second hidden layer is connected to a hidden node included in the third hidden layer.

The model training device and the self-motion estimation device input, to each hidden layer, outputs of previous hidden nodes included in the previous hidden layer through edges having connection weights. Further, the model training device and the self-motion estimation device generate the output of a hidden node included in a hidden layer based on an activation function and values obtained by applying the connection weights to the outputs of the previous hidden nodes. For example, to transmit an output to a subsequent hidden node, the result of the activation function may need to exceed a threshold of the current hidden node. In this example, a node remains inactive and does not transmit a signal to subsequent nodes until its input reaches the threshold activation strength.

The model training apparatus trains the neural network 100 through supervised learning. The model training apparatus may be implemented by hardware modules or a combination of software modules and hardware modules. Supervised learning is a scheme in which a training input of training data and a training output corresponding to the training input are input to the neural network 100, and the connection weights of the edges are updated so that output data corresponding to the training output of the training data is output. Although fig. 1 illustrates a neural network having a node structure, embodiments are not limited to this node structure. The neural network may be stored in a storage device using various data structures.

The training data is data used for training, and includes a pair of a training input and a training output corresponding to the training input. The training input is, for example, pre-collected sample radar data. The training output is a given value to be output for a predetermined training input, and is, for example, the self-movement information 109 given for sample radar data.

The model training apparatus determines the parameters of the nodes using a gradient descent scheme based on the output values of the nodes included in the neural network and a loss that is propagated back through the neural network. For example, the model training apparatus updates the connection weights between nodes through loss back propagation learning. Loss back propagation learning is a method of estimating a loss by performing a forward calculation on given training data, propagating the estimated loss in the reverse direction from the output layer 130 through the hidden layer 120 to the input layer 110, and updating the connection weights to reduce the loss. The processing of the neural network 100 is performed in a direction from the input layer 110 to the hidden layer 120 and the output layer 130. In loss back propagation training, the update of the connection weights is performed in the direction from the output layer 130 to the hidden layer 120 and the input layer 110. One or more processors may use a buffer memory that stores a layer of calculation data or a series of calculation data to process the neural network in a desired direction.

The model training apparatus defines an objective function for measuring the closeness of the currently set connection weight to the optimum value, continuously changes the connection weight based on the result of the objective function, and repeatedly performs training. For example, the objective function is a loss function for calculating a loss between an actual output value output by the neural network 100 based on a training input of the training data and a desired value (e.g., a training output) to be output. The model training device updates the connection weights to reduce the value of the loss function.

The configuration of the motion recognition model will be described below.

Fig. 2 shows an example of the structure of a motion recognition model.

The motion recognition model 220 includes a first model 221 and a second model 222.

The first model 221 is a model for extracting feature data from predetermined data. The self-motion estimation device extracts feature data from the input data 210 based on the first model 221. The first model 221 includes a Convolutional Neural Network (CNN). The first model 221 includes a layer corresponding to each of a plurality of time frames. As shown in FIG. 2, the first model 221 includes convolutional layers corresponding to each time frame of the input data 210.

The second model 222 is a model for generating the self-movement information 230 from the feature data. The self-motion estimation device calculates self-motion information 230 from the feature data based on the second model 222. The second model 222 includes a recurrent neural network. For example, the second model 222 includes a bidirectional recurrent neural network as the recurrent neural network. As shown in fig. 2, the second model 222 includes layers connected to the layers of the first model 221 corresponding to each time frame.

The input data 210 is data having a structure in which radar sensing data 201 corresponding to each of a plurality of time frames is stacked. In the example of fig. 2, the input data 210 is data in which radar sensing data 201 corresponding to a frame t-1, a frame t, and a frame t+1 are stacked, t being an integer greater than or equal to 1.

Although fig. 2 shows the radar sensing data 201 as a scan image indicating a distance from a target point for each steering angle, the radar sensing data 201 is not limited thereto. The radar sensing data 201 may include all data sensed by the radar sensor for each time frame.

Based on the layer in the first model 221 corresponding to each of the plurality of time frames, the self-motion estimation device extracts feature data of the respective time frame from the input data 210 of the respective time frame. The self-motion estimation device calculates self-motion information 230 for the respective time frame from the extracted feature data based on the second model 222. Further, the self-motion estimation apparatus inputs radar sensing data 201 corresponding to at least two time frames for a layer corresponding to one time frame in the motion recognition model 220. As shown in fig. 2, the self-motion estimation apparatus inputs radar sensing data 201 corresponding to a frame t-1 and radar sensing data 201 corresponding to a frame t to a convolutional layer corresponding to the frame t in the first model 221.

The self-motion estimation device estimates self-motion information 230 for a plurality of time frames from the input data 210 for the plurality of time frames based on the motion recognition model 220. When a plurality of items of radar sensing data corresponding to a plurality of time frames are stacked in the input data 210, the self-motion estimation apparatus obtains self-motion information 230 for each of the plurality of time frames. In the example of FIG. 2, input data 210 indicates radar sensing data 201 associated with frame t-1, frame t, and frame t + 1. The self-motion estimation device outputs device coordinates in frame t-1, device coordinates in frame t, and device coordinates in frame t +1 from input data 210.

Although fig. 2 shows three time frames for the sake of simplicity of description, the number of frames used in the self-motion estimation apparatus is not limited thereto. For example, the self-motion estimation device generates input data 210 for M time frames. In this example, the input data 210 may be data in which radar sensing data 201 corresponding to frames t-M +1 to t are stacked, M being an integer greater than or equal to 1. Further, although fig. 2 shows that radar sensing data 201 corresponding to a plurality of time frames are stacked in the input data 210, the embodiment is not limited thereto. The input data 210 may include radar sensing data 201 corresponding to one time frame.
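As an illustration of the structure described above, a minimal PyTorch sketch of the motion recognition model 220 is given below. The layer sizes, the choice of a GRU as the bidirectional recurrent second model, the two input channels (assumed here to hold the radar maps of the current frame and the prior frame fed to one convolutional layer), and the 2D (x, y) output head are assumptions made only for this sketch, not a definitive implementation.

```python
import torch
import torch.nn as nn

class MotionRecognitionModel(nn.Module):
    def __init__(self, in_channels=2, feat_dim=128, hidden_dim=64):
        super().__init__()
        # First model: convolutional feature extractor applied to the radar
        # map of each time frame (e.g., range x angle bins, frame/velocity channels).
        self.first_model = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # Second model: bidirectional recurrent network over the stacked frames.
        self.second_model = nn.GRU(feat_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
        # Per-frame regression head producing self-motion information (x_i, y_i).
        self.head = nn.Linear(2 * hidden_dim, 2)

    def forward(self, x):
        # x: (batch, frames, channels, range_bins, angle_bins)
        b, t = x.shape[:2]
        feats = self.first_model(x.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.second_model(feats)   # (b, t, 2 * hidden_dim)
        return self.head(seq)               # (b, t, 2): coordinates per time frame

# Example: three stacked frames of a 2-channel 64x64 radar map.
model = MotionRecognitionModel()
coords = model(torch.randn(1, 3, 2, 64, 64))  # shape (1, 3, 2)
```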

A method of estimating the self-motion using the motion recognition model will be described below.

Fig. 3 shows an example of an auto-motion estimation method.

In operation 310, the self-motion estimation device generates input data for each time frame based on radar sensing data collected by the radar sensor. For example, the radar signal is detected by at least one radar sensor arranged along an outer surface of the self-motion estimation device. The radar signal may be a signal transmitted by the radar sensor, reflected from a target point, and then sensed by the radar sensor. The self-motion estimation device generates radar sensing data by preprocessing the detected radar signals.

In operation 320, the self-motion estimation device estimates self-motion information of the self-motion estimation device from the input data based on the motion recognition model. The self-motion estimation device calculates at least one of a position and a posture of the self-motion estimation device as the self-motion information.

Fig. 4 shows an example of a self-motion estimation device.

The self-motion estimation device 400 includes a radar sensor 410, a processor 420, and a memory 430.

The radar sensor 410 is a sensor that transmits and receives radar signals. The radar sensor 410 generates radar sensing data by preprocessing the radar signal. For example, the radar sensor 410 generates radar sensing data based on a frequency difference between a signal emitted by the radar sensor 410 and a reflected signal.

The processor 420 generates input data from the radar sensing data for each time frame. The processor 420 estimates self-motion information of the self-motion estimation device 400 from the input data based on the motion recognition model. The operation of the processor 420 will be further described below with reference to fig. 5 to 15.

The memory 430 temporarily or permanently stores information required to perform the self-motion estimation method. For example, the memory 430 stores a motion recognition model and trained parameters for the motion recognition model. Further, the memory 430 stores radar sensing data and input data.

Fig. 5 shows an example of a process of self-motion estimation.

The self-motion estimation device generates input data by performing a preprocessing operation 510 on the radar sensing data 501, and estimates self-motion information 509 from the input data based on a motion recognition model 520. Through the preprocessing operation 510, the self-motion estimation apparatus converts a radar signal, corresponding to raw data acquired by a radar sensor, into radar sensing data 501 indicating distance, angle, and speed information; however, the preprocessing operation 510 is not limited thereto. The radar sensing data 501 may vary based on, for example, the desired resolution and the modulation scheme of the radar sensor.
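Because the modulation scheme is left open above, the following sketch only illustrates one common possibility: it assumes a frequency-modulated continuous-wave (FMCW) radar whose raw data cube is organized as (fast-time samples, chirps, receive antennas), and converts it into a range x velocity x angle magnitude map by successive FFTs. The bin counts are arbitrary choices for this sketch.

```python
import numpy as np

def preprocess(raw_cube: np.ndarray) -> np.ndarray:
    """Convert a raw FMCW data cube into a range x Doppler x angle magnitude map."""
    # Range FFT over fast-time samples (distance axis).
    rng = np.fft.fft(raw_cube, axis=0)
    # Doppler FFT over chirps (relative-velocity axis), shifted so that zero
    # velocity lies in the center bin.
    dop = np.fft.fftshift(np.fft.fft(rng, axis=1), axes=1)
    # Angle FFT over the receive antennas (steering-angle axis), zero-padded to 64 bins.
    ang = np.fft.fftshift(np.fft.fft(dop, n=64, axis=2), axes=2)
    return np.abs(ang)  # radar sensing data: distance, velocity, and angle information

# Example: 128 samples per chirp, 64 chirps, 8 receive channels of complex samples.
raw = np.random.randn(128, 64, 8) + 1j * np.random.randn(128, 64, 8)
print(preprocess(raw).shape)  # (128, 64, 64)
```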

The self-motion estimation apparatus extracts current feature data from input data corresponding to the current frame based on the first model 521. The self-motion estimation device loads, from a memory, prior feature data 580 corresponding to a prior frame. After preprocessing, the radar sensing data 501 is represented as 3D information, for example, distance, steering angle, and Doppler velocity, which may be treated as an image-like structure with X coordinates, Y coordinates, and channels. CNNs and convolutional layers may achieve better performance when extracting features of an image. A CNN may perform well in extracting information along the distance and angle axes (as position information), and may also perform well for the Doppler velocity axis treated as channels (e.g., like the RGB channels of an image).

The self-motion estimation device obtains the self-motion information 509 from the prior feature data 580 and the current feature data based on the second model 522. The radar sensor continuously acquires radar sensing data for each time frame. In this example, the radar sensing data acquired in adjacent time frames may be similar and temporally correlated. Recurrent neural networks and recurrent layers are models that perform well in recognizing data with such temporal correlations.

Fig. 6 shows an example of the structure of a motion recognition model.

When radar sensing data is continuously acquired, the self-motion estimation device skips preprocessing and feature data extraction for previous time frames because repeating them would be redundant. For example, the self-motion estimation device stores the feature data extracted by the CNN, and performs preprocessing and feature data extraction only on the radar sensing data acquired for each new time frame, which is then transmitted to the recurrent neural network together with the previous feature data.

The self-motion estimation apparatus calculates feature data corresponding to the current frame from the input data 610 corresponding to the current frame based on the convolution layer corresponding to the current frame of the first model 621. The self-motion estimation apparatus stores in the memory the feature data corresponding to the current frame calculated based on the first model 621 included in the motion recognition model. The self-motion estimation apparatus adds feature data corresponding to the current frame to the feature database 680 stored in the memory.

The self-motion estimation device estimates self-motion information 630 for a plurality of temporal frames based on input data 610 corresponding to a portion of the plurality of temporal frames (e.g., data corresponding to frame t of fig. 6). For example, the first model 621 may be implemented with a structure that outputs the feature data of a corresponding time frame from the input data 610 corresponding to that time frame. The second model 622 may be implemented with a structure that outputs the self-motion information 630 corresponding to M frames from the feature data corresponding to the M frames, M being "3" in the example of fig. 6. The self-motion estimation device extracts current feature data corresponding to the frame t from the input data 610 corresponding to the frame t based on the first model 621, and loads previous feature data corresponding to the frames t-M+1 through t-1. The self-motion estimation device generates self-motion information 630 corresponding to the frames t-M+1 through t from the feature data corresponding to the frames t-M+1 through t based on the second model 622.

In this way, the self-motion estimation device does not extract feature data corresponding to M time frames for each time frame, but extracts feature data corresponding to one time frame, thereby reducing the calculation resources and time required for feature data extraction.
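A minimal sketch of this per-frame feature caching is given below. It assumes the MotionRecognitionModel interface sketched after the description of fig. 2 (attributes first_model, second_model, and head) and a window of M = 3 frames; the deque-based cache is an illustrative choice, not a prescribed implementation.

```python
from collections import deque

import torch

M = 3
feature_db = deque(maxlen=M)   # oldest feature data is dropped automatically

def estimate_current_frame(model, current_input):
    """current_input: radar input of the current frame, shape (1, channels, H, W)."""
    with torch.no_grad():
        # Extract feature data only for the current frame ...
        current_feat = model.first_model(current_input)    # (1, feat_dim)
        feature_db.append(current_feat)
        # ... and reuse the stored feature data of the prior frames.
        window = torch.stack(list(feature_db), dim=1)      # (1, <=M, feat_dim)
        seq, _ = model.second_model(window)
        return model.head(seq)                             # self-motion info per stacked frame
```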

Fig. 7 and 8 show examples of generating input data.

The self-motion estimation device classifies a map indicating a distance (e.g., a distance from the device to a target point) and a steering angle with respect to the target point based on, for example, an elevation angle and a velocity of the target point. The self-motion estimation device generates input data by converting and classifying the radar sensing data into a form suitable for input to the motion recognition model.

The self-motion estimation device generates, from the radar signal, radar sensing data indicating a distance and an angle of a point detected by the radar sensor for each quantized speed. The self-motion estimation device uses the generated radar sensing data as input data 710. For example, the velocity identified based on the radar sensing data is the velocity of the self-motion estimation device relative to the target point. For example, the angle identified based on the radar sensing data is a steering angle, which is an angular difference between the traveling direction of the device and the direction from the device toward the target point. Although the present disclosure describes the steering angle as a horizontal angle on a 2D plane sensed by the radar sensor, embodiments are not limited thereto. The steering of the radar sensor may be performed vertically and horizontally. When steering is performed vertically, the steering angle may be an elevation angle.

For simplicity of description, fig. 7 shows input data 710 generated for one time frame. The input data 710 generated for one time frame comprises radar sensing data 711, the radar sensing data 711 indicating a distance and an angle corresponding to the target point detected by the radar sensor for each quantized speed. When radar sensing data corresponding to a plurality of time frames is stacked, the self-motion estimation device generates, for each of the plurality of time frames, input data 710 indicating a distance and an angle corresponding to a target point for each quantized velocity. The self-motion estimation device calculates self-motion information from the input data 710 based on the motion recognition model 720.
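One way to realize the quantized-velocity input map of fig. 7 is sketched below: a list of point detections (distance, steering angle, relative velocity) is binned into a grid whose channel axis is the quantized velocity. The bin counts and value ranges are illustrative assumptions only.

```python
import numpy as np

def to_velocity_map(points, n_range=64, n_angle=64, n_vel=8,
                    max_range=100.0, max_angle=np.pi / 2, max_vel=30.0):
    """points: array of shape (N, 3) with columns (distance, angle, relative velocity)."""
    grid = np.zeros((n_vel, n_range, n_angle), dtype=np.float32)
    for dist, ang, vel in points:
        r = int(np.clip(dist / max_range * n_range, 0, n_range - 1))
        a = int(np.clip((ang + max_angle) / (2 * max_angle) * n_angle, 0, n_angle - 1))
        v = int(np.clip((vel + max_vel) / (2 * max_vel) * n_vel, 0, n_vel - 1))
        grid[v, r, a] = 1.0   # mark the detected point in its quantized-velocity channel
    return grid

# Example: two detections at 10 m / 50 m with different angles and velocities.
detections = np.array([[10.0, 0.1, -5.0], [50.0, -0.3, 12.0]])
input_map = to_velocity_map(detections)   # shape (8, 64, 64)
```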

The self-motion estimation device generates, from the radar sensing data, input data indicating a distance and a horizontal angle corresponding to a point detected by the radar sensor for each quantized elevation angle. The elevation angle may be an angle formed by the target point, in the vertical direction of the radar sensor, with respect to a horizontal plane passing through the radar sensor.

The self-motion estimation device classifies the radar sensing data into static data 810 associated with static points and dynamic data 820 associated with dynamic points. A static point may be a target point corresponding to a stationary object, that is, a point whose absolute velocity is zero. A dynamic point may be a target point corresponding to a moving object, that is, a point whose absolute velocity is not zero. The self-motion estimation apparatus generates, from the static data 810, static input data indicating the distance and horizontal angle of the points detected by the radar sensor for each quantized elevation angle. The self-motion estimation apparatus generates, from the dynamic data 820, dynamic input data indicating the distance and horizontal angle of the points detected by the radar sensor for each quantized elevation angle. For simplicity of description, fig. 8 shows input data generated for one time frame. The input data generated for a time frame includes the static data 810 and the dynamic data 820. The static data 810 of a predetermined time frame comprises data 811 indicating the distance and horizontal angle of the target points detected by the radar sensor for each quantized elevation angle. The dynamic data 820 of the predetermined time frame comprises data 821 indicating the distance and horizontal angle of the target points detected by the radar sensor for each quantized elevation angle.

The self-motion estimation device calculates self-motion information from input data including static data 810 and dynamic data 820 based on a motion recognition model 830.
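The split of fig. 8 can be sketched as below: detections whose absolute speed is (near) zero form the static input, and the rest form the dynamic input, each binned over (distance, horizontal angle, quantized elevation). The availability of an absolute-speed estimate per point, the threshold, and the bin ranges are assumptions made only for this sketch.

```python
import numpy as np

def split_static_dynamic(dist, h_angle, elevation, abs_speed,
                         bins=(64, 64, 8),
                         value_ranges=((0.0, 100.0), (-1.6, 1.6), (-0.4, 0.4)),
                         speed_threshold=0.5):
    """Build static and dynamic input maps from per-point detections (1-D arrays)."""
    pts = np.stack([dist, h_angle, elevation], axis=1)
    static = np.abs(abs_speed) < speed_threshold        # (near-)zero absolute speed
    static_map, _ = np.histogramdd(pts[static], bins=bins, range=value_ranges)
    dynamic_map, _ = np.histogramdd(pts[~static], bins=bins, range=value_ranges)
    # Move the quantized-elevation axis to the channel position: (elevation, range, angle).
    return (np.moveaxis(static_map, 2, 0).astype(np.float32),
            np.moveaxis(dynamic_map, 2, 0).astype(np.float32))
```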

Fig. 9 and 10 show examples of input data generated using stacked radar sensing data.

The radar sensing data is gathered along a spatial axis.

Fig. 9 shows input data in which radar sensing data is stacked while skipping several time frames. The self-motion estimation apparatus selects, from among the plurality of items of radar sensing data, at least two items of radar sensing data corresponding to time frames that differ by a preset time interval. In the example of fig. 9, the preset time interval may be three frames. The self-motion estimation apparatus collects radar sensing data for the first to tenth frames. The self-motion estimation apparatus selects radar sensing data 911 corresponding to the first frame, the fourth frame, the seventh frame, and the tenth frame. In the eleventh frame, the self-motion estimation apparatus selects radar sensing data 912 corresponding to the second, fifth, eighth, and eleventh frames.

The self-motion estimation device generates input data based on the selected radar sensing data. In the tenth frame, the self-motion estimation apparatus inputs the input data generated from the radar sensing data 911 corresponding to the first, fourth, seventh, and tenth frames to the motion recognition model 920. The self-motion estimation apparatus calculates self-motion information corresponding to the first frame, the fourth frame, the seventh frame, and the tenth frame from the radar sensing data 911 based on the motion recognition model 920. In the eleventh frame, the self-motion estimation device inputs the input data generated from the radar sensing data 912 corresponding to the second, fifth, eighth, and eleventh frames to the motion recognition model 920. The self-motion estimation apparatus calculates self-motion information corresponding to the second frame, the fifth frame, the eighth frame, and the eleventh frame from the radar sensing data 912 based on the motion recognition model 920. The above order of the time frames is described as an example, and the order of the time frames is not limited to this example.

Fig. 10 shows input data corresponding to a predetermined length of time generated for each time frame. In response to receiving radar sensing data corresponding to a subsequent frame, the self-motion estimation device excludes the radar sensing data corresponding to the first frame among the plurality of time frames stacked in the input data.

Referring to fig. 10, the self-motion estimation apparatus collects radar sensing data corresponding to the first to eighth frames. The self-motion estimation apparatus generates input data 1011 corresponding to the eighth frame by stacking the radar sensing data corresponding to the first to eighth frames. In the ninth frame, the self-motion estimation device excludes the radar sensing data corresponding to the first frame. The self-motion estimation apparatus generates input data 1012 corresponding to the ninth frame by stacking the radar sensing data corresponding to the second to ninth frames. The self-motion estimation apparatus calculates self-motion information corresponding to the first to eighth frames from the input data 1011 based on the motion recognition model 1020. The self-motion estimation apparatus calculates self-motion information corresponding to the second to ninth frames from the input data 1012 based on the motion recognition model 1020.
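Both stacking strategies can be sketched with a single small buffer, as below: interval=3 with n_frames=4 reproduces the selection of fig. 9 (frames t, t-3, t-6, t-9), while interval=1 with n_frames=8 gives the sliding window of fig. 10, in which the oldest frame is excluded when a new one arrives. The class name and parameter values are illustrative assumptions.

```python
from collections import deque

class FrameStacker:
    """Stacks radar sensing data per time frame, as in figs. 9 and 10."""

    def __init__(self, n_frames=4, interval=3):
        self.interval = interval
        # Keep just enough history to cover the selected frames.
        self.buffer = deque(maxlen=(n_frames - 1) * interval + 1)

    def push(self, radar_frame):
        """Add one frame of radar sensing data; return stacked input data or None."""
        self.buffer.append(radar_frame)
        if len(self.buffer) < self.buffer.maxlen:
            return None   # not enough frames collected yet
        frames = list(self.buffer)
        # Pick the newest frame and every `interval`-th frame before it (oldest first).
        return frames[::-1][::self.interval][::-1]

# Example matching fig. 9: at frame 10 the stack is [1, 4, 7, 10]; at frame 11 it is [2, 5, 8, 11].
stacker = FrameStacker(n_frames=4, interval=3)
for frame_id in range(1, 12):
    stacked = stacker.push(frame_id)
    if stacked is not None:
        print(frame_id, stacked)
```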

Fig. 11 and 12 show examples of model training methods.

Fig. 11 shows a flow of training.

In operation 1110, the model training device generates, from reference radar sensing data, reference input data related to a plurality of time frames and reference output data corresponding to the reference input data. The reference radar sensing data corresponds to a training input, and the reference output data corresponds to a training output. The training data comprises, for example, pairs of reference radar sensing data and reference output data. The reference output data may also be referred to as reference self-motion information.

In operation 1120, the model training apparatus trains parameters of the motion recognition model such that the motion recognition model outputs the reference output data based on the reference input data. The model training apparatus updates the parameters of the motion recognition model until a loss between a provisional output, calculated from the reference input data based on the motion recognition model, and the reference output data is less than a threshold loss.

The training of fig. 11 will be further described with reference to fig. 12.

In operation 1211, the model training apparatus acquires reference radar sensing data. The model training device obtains reference radar sensing data from a training database.

In operation 1212, the model training apparatus generates reference input data and reference output data. The model training apparatus generates reference input data from the reference radar sensing data by an operation similar to that described with reference to fig. 7 to 10. The model training device obtains reference output data from a training database.

In operation 1221, the model training apparatus calculates an output of the first model associated with the reference input data. As described with reference to fig. 1, the model training apparatus calculates temporary feature data by inputting reference input data to the first model and propagating the reference input data to the layers.

In operation 1222, the model training apparatus calculates an output of the second model in association with an output of the first model. As described with reference to fig. 1, the model training apparatus calculates a provisional output by inputting feature data to the second model and propagating the feature data.

In operation 1223, the model training apparatus updates the parameters based on a loss between the output of the second model and the reference output data. The model training apparatus calculates, as the loss, the value of an objective function between the provisional output calculated based on the second model and the reference output data. The objective function may be, for example, a mean squared error (MSE), an absolute error, or a Huber loss. The objective function may also be used in combination with a regularization term on the parameters. The model training apparatus adjusts the parameters of the motion recognition model to reduce the loss. In the case where the motion recognition model is a neural network, the parameters of the motion recognition model may be (but are not limited to) connection weights.

In operation 1224, the model training apparatus stores the updated parameters. The model training apparatus stores the updated parameters in the memory. The model training device uploads the updated parameters to the server or sends the updated parameters to the self-motion estimation device. The parameters uploaded in the server may be transmitted to the self-motion estimation device via a communication network or a data interface.

In operation 1225, the model training apparatus determines whether the loss converges. The model training device determines whether the calculated loss is less than a threshold loss. In response to the loss having converged, the model training apparatus terminates training. In response to the loss not converging, the model training apparatus repeatedly performs operations 1211 to 1225.
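A minimal training-loop sketch for operations 1211 to 1225 is given below, assuming the PyTorch model sketched earlier, an MSE objective, and a data loader that yields (reference input data, reference self-motion information) pairs. The learning rate, optimizer choice, file name, and convergence threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train(model, loader, lr=1e-3, threshold_loss=1e-3, max_epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()                          # objective (loss) function
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for ref_input, ref_output in loader:          # operations 1211-1212
            pred = model(ref_input)                   # operations 1221-1222: provisional output
            loss = criterion(pred, ref_output)        # loss vs. reference output data
            optimizer.zero_grad()
            loss.backward()                           # propagate the loss backwards
            optimizer.step()                          # operation 1223: update the parameters
            epoch_loss += loss.item()
        torch.save(model.state_dict(), "motion_model.pt")   # operation 1224: store parameters
        if epoch_loss / len(loader) < threshold_loss:        # operation 1225: loss converged?
            break
    return model
```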

FIG. 13 illustrates an example of a model training apparatus.

The model training apparatus 1300 includes a processor 1310 and a memory 1320.

The processor 1310 generates reference input data 1391 relating to a plurality of time frames and reference output data 1392 corresponding to the reference input data 1391 from the reference radar sensing data. The processor 1310 trains the parameters 1322 of the motion recognition model 1321 such that the motion recognition model 1321 outputs the reference output data 1392 based on the reference input data 1391. The operations of the processor 1310 are not limited thereto, and may be performed in conjunction with the operations described with reference to fig. 11 and 12.

Memory 1320 stores information needed for model training, either temporarily or permanently. Memory 1320 stores the motion recognition model 1321, the parameters 1322, and training data. The memory 1320 stores the reference input data 1391 and the reference output data 1392 as training data. The reference output data 1392 may be a value given for the reference input data 1391, and may be self-motion information to be output in association with the reference input data 1391. Further, the memory 1320 stores reference radar sensing data instead of the reference input data 1391.

The model training apparatus 1300 may be implemented separately from the self-motion estimation apparatus, but is not limited thereto. The model training apparatus 1300 and the self-motion estimation apparatus may also be implemented as a single apparatus.

FIG. 14 shows an example of a model training process.

The model training apparatus trains the motion recognition model to output reference self-motion information 1492 from the reference radar sensing data 1491. The model training device uses N items of continuous radar data for which preprocessing has been completed. To perform training, the model training device uses mini-batches of N_B items from the continuous radar data as input. The motion recognition model includes a CNN and a recurrent neural network. The self-motion information output by the motion recognition model includes estimates of the position and pose of the device. As shown in fig. 14, the self-motion information includes, for example, N sets of x and y coordinates, one for each time frame.

The model training device acquires radar signals as raw data and generates the reference radar sensing data 1491 through basic radar signal processing (RSP) preprocessing. The model training device generates the reference input data by temporally accumulating the reference radar sensing data 1491.

To perform training, the model training device calculates temporary feature data for each time frame based on the first model 1421. In the example of fig. 14, the model training device calculates temporary feature data for each of frame t-2, frame t-1, and frame t. The model training device calculates a provisional output for each time frame based on the second model 1422.

The model training device calculates an error value, such as a loss, by comparing the provisional output with ground truth data (e.g., the reference self-motion information 1492). The model training apparatus uses a loss function such as MSE.

The model training device propagates the calculated error values to lower layers based on a backpropagation through time (BPTT) scheme and updates the parameters between the layers of the motion recognition model.

For each training iteration, the model training device generates a trained motion recognition model by storing the state values and weights of the first model 1421 and the second model 1422. When the error values converge, the model training device terminates training; otherwise, training is performed again.

When the model training device performs training, the first model 1421 includes a layer 1450 of a previous frame. When the first model 1421 is used for self-motion estimation, the layer 1450 of the previous frame is removed. The model training apparatus performs training using a plurality of training data, thereby increasing the training speed. Further, the self-motion estimation apparatus uses radar sensing data corresponding to the current frame, thereby increasing the operation speed of self-motion estimation.

Fig. 15 shows an example of an automatic motion estimation apparatus using a plurality of radar sensors.

Referring to fig. 15, the self-motion estimation apparatus 1500 may be implemented as, for example, a vehicle. The self-motion estimation apparatus 1500 estimates the attitude and the movement path of the vehicle by positioning the radar sensor in the vehicle. The self-motion estimation apparatus 1500 detects an object near the self-motion estimation apparatus 1500 based on the estimated self-motion information. The self-motion estimation device 1500 assists autonomous driving of the vehicle and various Advanced Driver Assistance System (ADAS) functions. The self-motion estimation apparatus 1500 acquires a wide-angle high-resolution image based on self-position information accurately measured for each time frame.

As shown in fig. 15, when the self-motion estimation apparatus 1500 includes a plurality of radar sensors, the self-motion estimation apparatus 1500 locates each of the radar sensors and fuses the positioning results, thereby improving the accuracy of self-motion information estimation. The self-motion estimation device 1500 may combine the radar sensors with other sensors, such as a camera sensor, an ultrasonic sensor, or light detection and ranging (LiDAR), or may fuse the self-motion information with information such as Global Navigation Satellite System (GNSS) data, vehicle speed, and steering angle, thereby improving accuracy. As shown in fig. 15, when acquiring omnidirectional radar sensing data, the self-motion estimation apparatus 1500 generates a 360-degree image of the surroundings of the vehicle or identifies the surrounding environment of the vehicle.

The self-motion estimation device 1500 can maintain accuracy regardless of the passage of time, without requiring special operations such as loop closure. Furthermore, the self-motion estimation apparatus 1500 can maintain accuracy even when a moving object is present in the captured scene.

The devices, units, modules, apparatuses, and other components described herein are implemented by hardware components. Examples of hardware components that may be used to perform the operations described herein include controllers, sensors, generators, drivers, memories, comparators, arithmetic logic units, adders, subtractors, multipliers, dividers, integrators, and any other electronic components configured to perform the operations described herein, where appropriate. In other examples, one or more hardware components that perform the operations described herein are implemented by computing hardware (e.g., by one or more processors or computers). A processor or computer may be implemented by one or more processing elements (e.g., an array of logic gates, a controller and arithmetic logic unit, a digital signal processor, a microcomputer, a programmable logic controller, a field programmable gate array, a programmable logic array, a microprocessor, or any other device or combination of devices that is configured to respond to and execute instructions in a defined manner to achieve a desired result). In one example, a processor or computer includes (or is connected to) one or more memories storing instructions or software for execution by the processor or computer. A hardware component implemented by a processor or a computer may execute instructions or software, such as an Operating System (OS) and one or more software applications running on the OS to perform the operations described herein. The hardware components may also access, manipulate, process, create, and store data in response to execution of instructions or software. For the sake of brevity, descriptions of examples described in this application may use the singular term "processor" or "computer," but in other examples multiple processors or computers may be used, or a processor or computer may include multiple processing elements, or multiple types of processing elements, or both. For example, a single hardware component or two or more hardware components may be implemented by a single processor, or two or more processors, or a processor and a controller. One or more hardware components may be implemented by one or more processors, or processors and controllers, and one or more other hardware components may be implemented by one or more other processors, or another processor and another controller. One or more processors or a processor and a controller may implement a single hardware component, or two or more hardware components. The hardware components may have any one or more of a variety of different processing configurations, examples of which include single processors, independent processors, parallel processors, Single Instruction Single Data (SISD) multiprocessing, Single Instruction Multiple Data (SIMD) multiprocessing, Multiple Instruction Single Data (MISD) multiprocessing, and Multiple Instruction Multiple Data (MIMD) multiprocessing.

The methods of performing the operations described in this application are performed by computing hardware, e.g., one or more processors or computers implemented as described above executing instructions or software, to perform the operations described in this application performed by these methods. For example, a single operation or two or more operations may be performed by a single processor, or two or more processors, or a processor and a controller. One or more operations may be performed by one or more processors or processors and controllers, and one or more other operations may be performed by one or more other processors or another processor and another controller. One or more processors or a processor and a controller may perform a single operation or two or more operations.

Instructions or software for controlling a processor or computer to implement the hardware components and perform the methods described above are written as computer programs, code segments, instructions, or any combination thereof, for individually or collectively instructing or configuring the processor or computer to operate as a machine or special purpose computer to perform the operations and methods described above as being performed by the hardware components. In one example, the instructions or software include machine code that is directly executed by a processor or computer, such as machine code generated by a compiler. In another example, the instructions or software include higher level code that is executed by a processor or computer using an interpreter. Instructions or software can be readily written by a programmer of ordinary skill in the art based on the block and flow diagrams illustrated in the drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations performed by the hardware components and methods as described above.

Instructions or software that control a processor or computer to implement hardware components and perform methods as described above, as well as any associated data, data files, and data structures, are recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of non-transitory computer-readable storage media include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid-state drives (SSD), card-type memory (e.g., a multimedia card or a micro card (e.g., Secure Digital (SD) or extreme digital (XD))), magnetic tape, magnetic disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and to provide the instructions or software and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the instructions.

Although the present disclosure includes specific examples, it will be apparent to those of ordinary skill in the art that: various changes in form and detail may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered merely as illustrative and not for purposes of limitation. The description of features or aspects in each example is deemed applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order and/or if components in the described systems, architectures, devices, or circuits are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the present disclosure is defined not by the detailed description but by the claims and their equivalents, and all changes within the scope of the claims and their equivalents are to be construed as being included in the present disclosure.
