Bayesian global optimization based parameter tuning for vehicle motion controllers

Document No.: 125336 | Publication date: 2021-10-22

Note: This technology, "Bayesian global optimization based parameter tuning for vehicle motion controllers," was designed and created on 2020-12-14 by 王禹, 罗琦, 许嘉轩, 周金运, 姜舒, 陶佳鸣, 曹昱, 林玮曼, 许珂诚, 缪景皓, and 胡江滔. Its main content is as follows: The present disclosure provides Bayesian global optimization based parameter tuning for a vehicle motion controller. In one embodiment, a method for optimizing a controller of an autonomously driven vehicle (ADV) includes: obtaining a number of samples, each sample having a set of parameters; iteratively performing the following until a predetermined condition is satisfied: determining a score for each sample according to a configuration of the controller based on the sample's parameter set; applying a machine learning model to the samples and corresponding scores to determine a mean function and a variance function; generating a new sample as the minimum of a function of the mean function and the variance function over the input space of the parameter set; and adding the new sample to the samples; and outputting the new sample as an optimal sample, wherein the parameters of the optimal sample are used to configure the controller to autonomously drive the ADV.

1. A method for optimizing a controller of an autonomously driven vehicle, ADV, comprising:

obtaining a plurality of samples, each sample having a set of parameters;

iteratively performing the following until a predetermined condition is satisfied:

determining a score for each sample according to a configuration of the controller based on a parameter set of the sample;

applying a machine learning model to the plurality of samples and the corresponding scores to determine a mean function and a variance function,

generating a new sample as a minimum of a function of the mean function and the variance function with respect to an input space of the set of parameters, and

adding the new sample to the plurality of samples; and

outputting the new sample as an optimal sample, wherein parameters of the optimal sample are used to configure the controller to autonomously drive the ADV.

2. The method of claim 1, wherein obtaining a plurality of samples comprises randomly generating at least some of the plurality of samples.

3. The method of claim 1, wherein the machine learning model comprises a gaussian process regression model.

4. The method of claim 1, wherein the machine learning model comprises a tree-structured Parzen estimator model.

5. The method of claim 1, wherein determining a score for each sample comprises,

configuring a controller with one or more parameters of a parameter set of a sample; and

simulating performance of the configured controller, wherein the score indicates the simulated performance.

6. The method of claim 1, wherein each parameter of the set of parameters represents a weight associated with a cost term of a cost function used by a controller of the ADV to generate control commands to autonomously navigate the ADV.

7. The method of claim 1, wherein each new sample is generated based on a previously generated new sample added to the plurality of samples.

8. The method of claim 1, wherein the predetermined condition comprises a predetermined number of new samples to be generated.

9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform the method for optimizing the controller of an autonomously driven vehicle of any one of claims 1-8.

10. A data processing system, comprising:

a processor; and

a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to perform the method for optimizing the controller of an autonomously driven vehicle of any one of claims 1-8.

11. A computer program product comprising a computer program which, when executed by a processor, implements a method for optimizing a controller of an autonomously driven vehicle according to any one of claims 1-8.

Technical Field

Embodiments of the present disclosure relate generally to operating an autonomous vehicle. More particularly, embodiments of the present disclosure relate to Bayesian global optimization-based parameter tuning for vehicle motion controllers.

Background

A vehicle operating in an autonomous mode (e.g., unmanned) may relieve some of the driving-related responsibilities of the occupants, particularly the driver. When operating in the autonomous mode, the vehicle may navigate to various locations using onboard sensors, allowing the vehicle to travel with minimal human interaction or in some situations without any passengers.

A vehicle controller of the autonomous vehicle may generate control commands to move the vehicle according to a desired path or route. For example, a Model Predictive Controller (MPC) may generate a sequence of commands to be applied over a future time frame that will cause the controlled object to move along the predicted path. The MPC may be configured in accordance with one or more control parameters to optimize the command sequence so that the controller can control an Autonomous Driving Vehicle (ADV) to track along a target path at a target speed.

Tuning control parameters plays an important role in controller design. Conventionally, the parameters of a controller may be tuned by a human observer (e.g., by an engineer in a controlled setting, such as a laboratory) or by an exhaustive search using a grid search algorithm. One problem with the first approach is that optimality of the control parameters cannot be guaranteed, since the result depends heavily on human experience and intuition. As for the second approach, exhaustive search is inefficient, especially when the configuration space becomes high-dimensional, since the computational cost grows exponentially with the number of parameters.
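The exponential blow-up can be made concrete with a short, illustrative calculation (the parameter counts and grid resolution below are hypothetical, not taken from the disclosure):

```python
def grid_search_evaluations(points_per_axis: int, num_params: int) -> int:
    """Simulations needed to exhaustively grid-search num_params controller
    parameters with points_per_axis candidate values per parameter."""
    return points_per_axis ** num_params

# Three PID gains at 10 candidate values each: 10**3 = 1,000 evaluations.
n_pid = grid_search_evaluations(10, 3)
# Eight MPC cost-term weights at the same resolution: 10**8 evaluations.
n_mpc = grid_search_evaluations(10, 8)
```

Each evaluation is a full controller simulation, so the jump from 1,000 to 100,000,000 runs is what makes exhaustive search impractical in higher-dimensional configuration spaces.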

Disclosure of Invention

In a first aspect, there is provided a method for optimizing a controller of an autonomously driven vehicle, ADV, comprising:

obtaining a plurality of samples, each sample having a set of parameters;

iteratively performing the following until a predetermined condition is satisfied:

determining a score for each sample according to a configuration of the controller based on a parameter set of the sample;

applying a machine learning model to the plurality of samples and the corresponding scores to determine a mean function and a variance function,

generating a new sample as a minimum of a function of the mean function and the variance function with respect to an input space of the set of parameters, and

adding the new sample to the plurality of samples; and

outputting the new sample as an optimal sample, wherein parameters of the optimal sample are used to configure the controller to autonomously drive the ADV.

In a second aspect, there is provided a non-transitory machine readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform the method for optimizing a controller of an Autonomously Driven Vehicle (ADV) as described in the first aspect.

In a third aspect, there is provided a data processing system comprising

A processor; and

a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to perform a method for optimizing a controller of an Autonomously Driven Vehicle (ADV) as described in the first aspect.

In a fourth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method for optimizing a controller of an autonomously driven vehicle according to the first aspect.

According to the present disclosure, the controller parameters are optimized by reducing the total cost.

Drawings

Aspects are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements. It should be noted that references to "an" or "one" aspect of the disclosure are not necessarily to the same aspect, and they mean at least one. Moreover, in the interest of brevity and minimization of the overall number of figures, a given figure can be used to illustrate features of more than one aspect and not all elements of a figure are required for a given aspect.

FIG. 1 is a block diagram illustrating a networked system according to one embodiment.

FIG. 2 is a block diagram illustrating an example of an autonomous vehicle, according to one embodiment.

Fig. 3 is a block diagram illustrating an example of a controller parameter tuner according to one embodiment.

FIG. 4 is a flow diagram illustrating a process for tuning parameters of a controller, according to one embodiment.

FIGS. 5A-5B are block diagrams illustrating an example of a perception and planning system for use with an autonomous vehicle, according to one embodiment.

Detailed Description

Several embodiments of the present disclosure are now described with reference to the drawings. Whenever the shapes, relative positions, and other aspects of the components described in a given aspect are not explicitly defined, the scope of the present disclosure herein is not limited to the components shown, which are intended for illustrative purposes only. Moreover, while numerous details are set forth, it should be understood that aspects may be practiced without these details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. Moreover, unless the meaning is clearly contrary, all ranges recited herein are to be considered as inclusive of the endpoints of each range.

Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.

The present disclosure addresses the problem of optimizing controller parameters by applying Bayesian global optimization and Gaussian Process Regression (GPR) to parameter tuning of a controller, such as a motion controller used by an Autonomous Driving Vehicle (ADV). In particular, the present disclosure describes using Bayesian optimization techniques to find the optimal controller (parameter) configuration by using a GPR model as (an approximation of) an objective function (e.g., a surrogate model). For example, samples of the controller parameters to be tuned are obtained (e.g., randomly generated). A score (or label) is generated for each sample by simulating a controller configured according to one or more parameters of the sample. The sample set and corresponding scores are fitted to a machine learning model (such as a GPR model) to derive a mean prediction function and a variance function. Using the derived functions, the next most promising sample point is determined. The new sample is added to the sample set, and the process is repeated (e.g., for a predetermined number of iterations) until the sample space has been sufficiently explored. Once the process has run for a sufficient number of iterations, the GPR-model-based Bayesian global optimization outputs the best sample. The parameters of the best sample may be used to configure the controller to autonomously drive the ADV. Thus, the present disclosure optimizes controller parameters by reducing the overall cost (or negative control profile score).
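As an illustrative sketch only, the loop described above can be outlined in one dimension; the crude distance-based surrogate below merely stands in for the GPR model, and every name and number is hypothetical:

```python
import random

def tune(score, bounds, n_init=5, n_iter=20, kappa=2.0, seed=0):
    """Sketch of the tuning loop: seed with random samples, fit a surrogate,
    take the utility-function minimizer as the next sample, and repeat.
    score(x) returns a cost (a negative control-profile score), so lower is
    better; a crude distance-based surrogate stands in for the GPR model."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]   # initial sample pool
    ys = [score(x) for x in xs]                         # profiler scores
    for _ in range(n_iter):
        # Surrogate stand-in: predicted mean h(x) = score of the nearest
        # sample; predicted variance v(x) = distance to the nearest sample.
        def h(x):
            return min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]
        def v(x):
            return min(abs(x - xi) for xi in xs)
        # New sample = minimizer of the utility function h(x) - kappa*v(x).
        cands = [lo + (hi - lo) * i / 200 for i in range(201)]
        x_new = min(cands, key=lambda x: h(x) - kappa * v(x))
        xs.append(x_new)                                # grow the pool
        ys.append(score(x_new))
    return min(zip(xs, ys), key=lambda p: p[1])         # best sample found

# Toy objective whose cost is minimized at parameter value 0.3.
x_best, y_best = tune(lambda x: (x - 0.3) ** 2, bounds=(0.0, 1.0))
```

The kappa term rewards candidates far from any scored sample (high uncertainty), so the loop alternates between exploring gaps and refining around low-cost samples, mirroring the exploration/exploitation trade-off described above.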

According to some embodiments, a computer-implemented method for optimizing a controller of an ADV includes: obtaining a number of samples, each sample having a set of parameters; iteratively performing the following until a predetermined condition is satisfied: determining a score for each sample according to a configuration of the controller based on the sample's parameter set, applying a machine learning model to the plurality of samples and the corresponding scores to determine a mean function and a variance function, generating a new sample that is a minimum of a utility function combining the mean function and the variance function over the input space of the parameter set, and adding the new sample to the plurality of samples; and outputting the new sample as an optimal sample, wherein parameters of the optimal sample are used to configure the controller to autonomously drive the ADV.

In one embodiment, obtaining the number of samples includes randomly generating at least some of the samples. In another embodiment, the machine learning model is a Gaussian Process Regression (GPR) model. In some embodiments, the machine learning model is a tree-structured Parzen estimator (TPE) model. In one embodiment, determining the score includes: for each sample, configuring the controller with one or more parameters of the sample's parameter set; and simulating performance of the configured controller, wherein the score represents the simulated performance. In another embodiment, each parameter in the set of parameters is a weight associated with a cost term of a cost function used by a motion controller of an Autonomously Driven Vehicle (ADV) to generate control commands to autonomously navigate the ADV.

In one embodiment, each new sample is generated based on a previously generated new sample added to the number of samples. In another embodiment, the predetermined condition is a predetermined number of new samples to be generated.

In another embodiment of the disclosure, a non-transitory machine-readable medium and a data processing system perform at least some of the processes described herein.

Fig. 1 is a block diagram illustrating an autonomous vehicle network configuration according to one embodiment of the present disclosure. Referring to fig. 1, a network configuration 100 includes an Autonomous Driving Vehicle (ADV) 101 that may be communicatively coupled to one or more servers 103-104 via a network 102. Although one autonomous vehicle is shown, multiple autonomous vehicles may be coupled to each other and/or to servers 103-104 via network 102. The network 102 may be any type of wired or wireless network, such as a Local Area Network (LAN), a Wide Area Network (WAN) such as the Internet, a cellular network, a satellite network, or a combination thereof. Servers 103-104 may be any type of server or cluster of servers, such as a Web or cloud server, an application server, a backend server, or a combination thereof. The servers 103-104 may be data analysis servers, content servers, traffic information servers, map and point of interest (MPOI) servers, or location servers, among others.

An autonomous vehicle refers to a vehicle that can be configured to be in an autonomous mode in which the vehicle navigates through the environment with little or no driver input. Such autonomous vehicles may include a sensor system having one or more sensors configured to detect information about the environment in which the vehicle operates. The vehicle and its associated controller(s) use the detected information to navigate through the environment. The autonomous vehicle 101 may operate in a manual mode, a fully autonomous mode, or a partially autonomous mode.

In one embodiment, the autonomous vehicle 101 includes, but is not limited to, a sensing and planning system 110, a vehicle control system 111, a wireless communication system 112, a user interface system 113, and a sensor system 115. The autonomous vehicle 101 may also include certain common components included in a common vehicle, such as an engine, wheels, steering wheel, transmission, etc., which may be controlled by the vehicle control system 111 and/or the sensing and planning system 110 using various communication signals and/or commands (e.g., an acceleration signal or command, a deceleration signal or command, a steering signal or command, a braking signal or command, etc.).

Components 110-115 may be communicatively coupled to each other via an interconnect, bus, network, or combination thereof. For example, components 110-115 may be communicatively coupled to each other via a Controller Area Network (CAN) bus. The CAN bus is a vehicle bus standard designed to allow microcontrollers and devices to communicate with each other in applications without a host. It is a message-based protocol originally designed for multiplexed electrical wiring within a vehicle, but is also used in many other environments.

Referring now to fig. 2, in one embodiment, the sensor system 115 includes, but is not limited to, one or more cameras 211, a Global Positioning System (GPS) unit 212, an Inertial Measurement Unit (IMU) 213, a radar unit 214, and a Light Detection and Ranging (LIDAR) unit 215. The GPS unit 212 may include a transceiver operable to provide information regarding the location of the autonomous vehicle. The IMU unit 213 may sense position and orientation changes of the autonomous vehicle based on inertial acceleration. Radar unit 214 may represent a system that uses radio signals to sense objects within the local environment of an autonomous vehicle. In some embodiments, in addition to sensing an object, radar unit 214 may additionally sense a speed and/or heading of the object. The LIDAR unit 215 may sense objects in the environment in which the autonomous vehicle is located using lasers. The LIDAR unit 215 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components. The camera 211 may include one or more devices to capture images of the environment surrounding the autonomous vehicle. The camera 211 may be a still camera and/or a video camera. A camera may be mechanically movable, for example by mounting the camera on a rotating and/or tilting platform.

The sensor system 115 may also include other sensors such as sonar sensors, infrared sensors, steering sensors, throttle sensors, brake sensors, and audio sensors (e.g., microphones). The audio sensor may be configured to capture sound from an environment surrounding the autonomous vehicle. The steering sensor may be configured to sense a steering angle of a steering wheel, wheels, or a combination thereof of the vehicle. The throttle sensor and the brake sensor sense a throttle position and a brake position of the vehicle, respectively. In some cases, the throttle sensor and the brake sensor may be integrated into an integrated throttle/brake sensor.

In one embodiment, the vehicle control system 111 includes, but is not limited to, a steering unit 201, a throttle unit 202 (also referred to as an acceleration unit), and a brake unit 203. The steering unit 201 is used to adjust the direction or heading of the vehicle. The throttle unit 202 is used to control the speed of the motor or engine, which in turn controls the speed and acceleration of the vehicle. The brake unit 203 decelerates the vehicle by providing friction to slow the wheels or tires of the vehicle. Note that the components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.

Referring back to fig. 1, the wireless communication system 112 allows communication between the autonomous vehicle 101 and external systems, such as devices, sensors, other vehicles, and the like. For example, the wireless communication system 112 may wirelessly communicate with one or more devices (such as servers 103-104 over network 102) directly or via a communication network. The wireless communication system 112 may communicate with another component or system using any cellular communication network or Wireless Local Area Network (WLAN), for example, using WiFi. The wireless communication system 112 may communicate directly with devices (e.g., a passenger's mobile device, display devices, speakers within the vehicle 101), for example, using an infrared link, Bluetooth, etc. The user interface system 113 may be part of peripheral devices implemented within the vehicle 101 including, for example, a keypad, a touch screen display device, a microphone, and speakers, among others.

Some or all of the functions of the autonomous vehicle 101 may be controlled or managed by the perception and planning system 110, particularly when operating in an autonomous driving mode. The perception and planning system 110 includes the necessary hardware (e.g., processor(s), memory, storage devices) and software (e.g., operating system, planning and routing programs) to receive information from the sensor system 115, the control system 111, the wireless communication system 112, and/or the user interface system 113, process the received information, plan a route or path from an origin to a destination point, and then drive the vehicle 101 based on the planning and control information. Alternatively, the perception and planning system 110 may be integrated with the vehicle control system 111.

For example, a user who is a passenger may specify a start location and a destination of a trip, e.g., via a user interface. The perception and planning system 110 obtains data related to the trip. For example, the perception and planning system 110 may obtain location and route information from an MPOI server, which may be part of servers 103-104. The location server provides location services, and the MPOI server provides map services and POIs for certain locations. Alternatively, such location and MPOI information may be cached locally in a persistent storage device of the perception and planning system 110.

The perception and planning system 110 may also obtain real-time traffic information from a traffic information system or server (TIS) as the autonomous vehicle 101 moves along the route. Note that servers 103-104 may be operated by a third-party entity. Alternatively, the functionality of servers 103-104 may be integrated with the perception and planning system 110. Based on the real-time traffic information, MPOI information, and location information, as well as real-time local environmental data (e.g., obstacles, objects, nearby vehicles) detected or sensed by sensor system 115, the perception and planning system 110 may plan an optimal route and drive vehicle 101 according to the planned route, e.g., via control system 111, to safely and efficiently reach the designated destination.

Server 103 may be a data analysis system to perform data analysis services for various clients. In one embodiment, data analysis system 103 includes a data collector 121 and a machine learning engine 122. The data collector 121 collects driving statistics 123 from various vehicles (either autonomous vehicles or conventional vehicles driven by human drivers). The driving statistics 123 include information indicative of driving commands issued (e.g., throttle, brake, steering commands) and responses of the vehicle (e.g., speed, acceleration, deceleration, direction) captured by sensors of the vehicle at different points in time. The driving statistics 123 may also include information describing the driving environment at different points in time, such as a route (including a start location and a destination location), MPOI, road conditions, weather conditions, and so forth.

Based on the driving statistics 123, the machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for various purposes. In one embodiment, the predictive models 124 may include parametric and/or non-parametric models. For example, the models may include a Gaussian Process Regression (GPR) model and/or a tree-structured Parzen estimator (TPE) model, each of which may be used for parameter tuning as described herein.

As shown, the machine learning engine 122 includes a controller parameter tuner 125, the controller parameter tuner 125 configured to perform Bayesian-global-optimization-based parameter tuning as described herein. In one embodiment, tuner 125 may be configured to tune one or more parameters of any type of controller (e.g., a motion controller). For example, the tuner may be configured to tune parameters of a proportional-integral-derivative (PID) controller, a Linear Quadratic Regulator (LQR) controller, or a Model Predictive Control (MPC) controller.

In one embodiment, the controller parameter (or parameters) may be any type of parameter (or setting) of the controller. For example, the PID controller may have at least three parameters, one for proportional gain, one for integral gain, and one for derivative gain. As another example, the parameter may be a weight associated with and applied to a cost term of a cost function that a motion controller (e.g., an MPC controller) of the ADV uses to generate control commands in order to autonomously navigate the ADV. More information about motion controllers is described herein.
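By way of a hypothetical illustration (the weight names and values below are invented for this sketch, not the disclosure's actual cost terms), a sample's parameter set might look like:

```python
# Hypothetical parameter sets; the names are illustrative only.
pid_sample = {"kp": 0.8, "ki": 0.05, "kd": 0.1}  # d = 3 tunable PID gains

# For an MPC-style motion controller, each tuned parameter is a weight
# applied to one term of the controller's cost function.
mpc_sample = {
    "w_lateral_error": 5.0,   # deviation from the target path
    "w_heading_error": 2.0,   # heading misalignment
    "w_speed_error": 1.0,     # deviation from the target speed
    "w_control_effort": 0.1,  # large steering/throttle commands
}

def weighted_cost(term_values: dict, weights: dict) -> float:
    """Total cost = sum over cost terms of weight * term value."""
    return sum(weights[k] * term_values[k] for k in weights)

cost = weighted_cost(
    {"w_lateral_error": 0.2, "w_heading_error": 0.05,
     "w_speed_error": 0.5, "w_control_effort": 1.5},
    mpc_sample,
)  # 5*0.2 + 2*0.05 + 1*0.5 + 0.1*1.5 = 1.75
```

Tuning then amounts to searching over the weight values so that the commands minimizing this weighted cost produce the best tracking behavior.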

The algorithm 124 may then be uploaded to the ADV for real-time use in autonomous driving. In one embodiment, as described herein, one or more controller parameters that are optimally tuned may be uploaded onto the ADV for use by the motion controller operating therein in real-time.

Fig. 3 is a block diagram illustrating an example of a controller parameter tuner according to one embodiment. As described herein, the controller parameter tuner 125 may be executed by the server 103 (e.g., by one or more processors of the server) in order to optimally tune one or more controller parameters that may be used by the motion controller of the ADV. The tuner includes a configuration generator 305, a configuration profiler 310, a modeler 315, and decision logic 320. The generator is configured to generate one or more (initial) samples of the controller parameters. For example, the generator may generate m samples x_1, ..., x_m, where each sample x includes the d parameters that the tuner will optimally tune. Thus, the generator generates d-dimensional vectors x_m ∈ R^d, where each training point represents a control parameter setting. For example, for a PID controller, d is 3. In one embodiment, the generator may randomly generate at least some of the samples x_m. In another embodiment, the generator may obtain one or more samples from a library or from user input.

The configuration profiler 310 is configured to obtain a pool of samples (e.g., the initial samples from the configuration generator 305) and to determine a score (or label) y for each sample according to a configuration of the controller based on the parameters of the sample. In particular, the profiler configures the controller with one or more parameters of the sample (e.g., the associated d parameters). The profiler then evaluates the configured controller's response. In particular, the profiler simulates the performance of the configured controller and generates a score indicative of the simulated performance. Thus, the profiler generates a score y_m ∈ R for each sample, which represents a negative control profile score. In one embodiment, a low score (e.g., below a threshold) indicates low (or less than satisfactory) performance of the controller, while a high score (e.g., above a threshold) indicates high (or more desirable) performance of the controller.
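A toy sketch of the profiler's score computation, assuming a simple first-order plant and a single proportional gain as the tuned parameter (both are illustrative stand-ins, not the actual vehicle simulator):

```python
def profile_controller(kp: float, steps: int = 50, dt: float = 0.1) -> float:
    """Toy stand-in for the configuration profiler: simulate a first-order
    plant under a proportional controller tracking a unit step, and return
    a score (the negative accumulated tracking cost; higher is better)."""
    x, target, cost = 0.0, 1.0, 0.0
    for _ in range(steps):
        u = kp * (target - x)           # control command from the parameter
        x += dt * (-x + u)              # first-order plant dynamics
        cost += dt * (target - x) ** 2  # accumulate squared tracking error
    return -cost                        # negative cost: higher score = better

# A higher gain tracks the step reference faster, earning a higher score.
assert profile_controller(kp=5.0) > profile_controller(kp=0.5)
```

The real profiler would replace the toy plant with a full vehicle-dynamics simulation and fold several error and effort terms into the cost, but the contract is the same: a parameter set in, a scalar score out.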

The modeler 315 is configured to obtain x_m and y_m and to apply (e.g., fit) a machine learning model, such as a GPR model, to the obtained samples and scores (e.g., using the obtained samples and scores as training data). Once fitted, the Gaussian process may be updated to predict the score y for an input sample x. In one embodiment, the input sample is a test input sample during a first pass through the Gaussian process. In another embodiment, the input sample may be a new sample generated by a previous iteration of the Bayesian global optimization as described herein. The outputs of the GPR process are a mean function h(x) and a variance function v(x), which define the predictive Gaussian process as

h(x) = K(x, X) [K(X, X) + σₙ²·I]⁻¹ y

v(x) = K(x, x) − K(x, X) [K(X, X) + σₙ²·I]⁻¹ K(X, x)

where X is the input space of training samples, x is the input sample, σₙ² is the variance of the noise, I is the identity matrix, y is the vector of one or more scores, and K is the kernel function.
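A minimal NumPy sketch of the GPR posterior mean and variance described above, assuming an RBF kernel (the kernel choice, hyperparameters, and toy data are illustrative):

```python
import numpy as np

def gpr_posterior(X_train, y_train, X_query, length_scale=1.0, noise_var=1e-4):
    """Posterior mean h(x) and variance v(x) of a GPR model:
    h(x) = K(x,X) [K(X,X) + noise_var*I]^-1 y, and the matching v(x)."""
    def kernel(a, b):
        # Squared-exponential (RBF) kernel between 1-D sample arrays.
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_star = kernel(X_query, X_train)                      # K(x, X)
    mean = K_star @ np.linalg.solve(K, y_train)            # h(x)
    var = kernel(X_query, X_query).diagonal() - np.einsum(
        "ij,ji->i", K_star, np.linalg.solve(K, K_star.T))  # v(x)
    return mean, var

# Three scored samples; query at a training point and far from the data.
X = np.array([0.0, 0.5, 1.0])
y = np.array([1.0, 0.2, 0.9])
mean, var = gpr_posterior(X, y, np.array([0.5, 2.0]))
# mean[0] stays close to the training score 0.2; var grows away from data.
```

The variance collapsing to (near) zero at scored samples and growing away from them is exactly what the utility function exploits when choosing the next sample.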

The modeler 315 is further configured to generate a new sample x_new, e.g.,

x_new = argmin_{x ∈ X} [h(x) − κ·v(x)]

where the weighting factor κ is chosen to balance how much effort is spent exploring potentially good points with high uncertainty (e.g., with high variance) against how much is spent exploiting the knowledge about the current best point (with the best configuration) found so far. The function h(x) − κ·v(x) is referred to as the utility function.

Thus, the new sample x_new is the minimum of the utility function, which combines the mean function and the variance function, over the input space X of the parameter set. As described herein, the new sample is determined using the Gaussian model described herein. In one embodiment, any type of model (e.g., a TPE model) may be used to derive the new sample from the mean function and the variance function.

Decision logic 320 is configured to obtain new samples X from the modeler and determine whether the tuner has performed a sufficient amount of exploration. In one embodiment, decision logic 320 may determine whether a predetermined (end) condition is satisfied. For example, the predetermined condition may be a predetermined number of iterations or new samples to be generated (e.g., 100 new samples). In another embodiment, the determination may be based on an error threshold between new samples. For example, if the error (or difference) between at least some of the new samples is below a threshold (e.g., 0.1%), then it may be determined that there is a sufficient amount of exploration and that the latest sample is optimal.

If not, the new sample is added to the current pool of samples (e.g., for the second pass, the pool is now x_{m+1}), thereby growing the sample pool to continue exploration. In particular, as long as the end condition is not satisfied, the tuner will iteratively perform at least some of the operations of the configuration profiler 310 and/or the modeler 315 to generate new samples, with each new sample being generated based on previously generated new samples added to the sample pool. In one embodiment, to perform the next iteration, the latest sample generated (e.g., in the previous iteration) may be the input sample for the current iteration. Thus, the tuner will continue this loop until decision logic 320 determines that enough exploration has been performed (e.g., at new sample x_{m+n}). At this point, decision logic 320 outputs the best sample. In one embodiment, the best sample is the latest sample to be generated (e.g., x_{m+n}). In another embodiment, the sample that is output as the best sample is selected based on some criteria (e.g., below and/or above a threshold). In one embodiment, outputting the best sample may include sending it as a controller configuration (e.g., via a wireless network) to one or more ADVs to configure the controller with the best sample to autonomously drive the ADV.
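As an illustrative sketch (the budget and tolerance values are hypothetical), the two end conditions described above might be checked as:

```python
def enough_exploration(new_samples, n_target=100, rel_tol=1e-3):
    """Sketch of the decision logic's end conditions: stop after a fixed
    budget of new samples, or once successive new samples differ by less
    than a relative threshold (e.g., 0.1%)."""
    if len(new_samples) >= n_target:       # iteration budget reached
        return True
    if len(new_samples) >= 2:              # compare the last two samples
        prev, curr = new_samples[-2], new_samples[-1]
        denom = max(abs(prev), abs(curr), 1e-12)  # guard against div by 0
        return abs(curr - prev) / denom < rel_tol
    return False

assert not enough_exploration([0.42])          # nothing to compare yet
assert enough_exploration([0.5000, 0.5001])    # moved by less than 0.1%
assert enough_exploration(list(range(100)))    # budget of 100 reached
```

In practice the convergence test could equally compare scores rather than sample values; either way the loop terminates once further exploration stops paying off.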

FIG. 4 is a flow diagram of a process 400 for tuning parameters of a controller, according to one embodiment. In particular, the process 400 determines an optimal sample of one or more controller parameters for configuring a controller (such as a motion controller of an ADV). At least some of the operations described herein may be performed by controller parameter tuner 125 of server 103 as shown in fig. 3.

The process 400 obtains a number of samples (as a pool of samples), each sample having a set of parameters (at block 401). For example, the configuration generator 305 may randomly generate samples, each having parameters to be optimally tuned. The process 400 determines a score for each sample according to the configuration of the controller based on the sample's parameter set (at block 402). For example, configuration profiler 310 simulates the performance of the controller according to the configuration on a per-sample basis to produce a corresponding score. The process 400 applies a machine learning model (e.g., a GPR model) to the samples and corresponding scores to determine a mean function and a variance function (at block 403). For example, modeler 315 trains the model and determines the functions with respect to an input sample. In one embodiment, the input sample may be a random test sample. In another embodiment, the input sample may be determined from an acquisition function (such as expected improvement). In some embodiments, the input sample may be the newly generated sample of a previous iteration as described herein.

The process 400 generates a new sample as a minimum of a function of the mean function and the variance function over the input space of the parameter set (at block 404). The process 400 adds the new sample to the sample pool (at block 405). The process 400 determines whether more new samples should be generated (at decision block 406). As described herein, the decision logic 320 may determine whether an end condition is satisfied. If more samples are to be generated, the process 400 returns to block 402 to repeat the Bayesian global optimization of the GPR model. However, if no more new samples are to be generated, the process 400 outputs the new sample (e.g., the most recently generated) as the best sample, where the parameters of the best sample may be used to configure the controller to autonomously drive the ADV (at block 407).
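A minimal numpy-only sketch of the loop in blocks 402-407 follows, using a hand-rolled Gaussian-process posterior and a lower-confidence-bound acquisition as the "function of the mean function and the variance function." The quadratic objective stands in for the controller-simulation score, and the kernel, noise, and iteration budget are illustrative assumptions:

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    """Mean and variance of a zero-mean GP posterior at query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Kq = rbf(Xq, X)
    alpha = np.linalg.solve(K, y)
    V = np.linalg.solve(K, Kq.T)
    mean = Kq @ alpha
    var = 1.0 - np.einsum("ij,ji->i", Kq, V)
    return mean, np.maximum(var, 1e-12)

def score(x):
    # Stand-in for simulating the controller configured with parameters x.
    return float(np.sum((x - 1.0) ** 2))

rng = np.random.default_rng(0)
lo, hi = np.array([-2.0, -2.0]), np.array([4.0, 4.0])  # input-space bounds

X = rng.uniform(lo, hi, size=(5, 2))           # block 401: initial sample pool
y = np.array([score(x) for x in X])            # block 402: score each sample

for _ in range(20):                            # block 406: fixed iteration budget
    cand = rng.uniform(lo, hi, size=(256, 2))  # random probes of the input space
    mean, var = gp_posterior(X, y, cand)       # block 403: GPR mean/variance
    lcb = mean - 2.0 * np.sqrt(var)            # acquisition: f(mean, variance)
    x_new = cand[np.argmin(lcb)]               # block 404: approximate minimizer
    X = np.vstack([X, x_new])                  # block 405: grow the pool
    y = np.append(y, score(x_new))

best = X[np.argmin(y)]                         # block 407: output the best sample
```

In practice the acquisition would be minimized with a gradient-based optimizer over the continuous input space rather than a random candidate set, and the score would come from the driving simulation described above.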

Some embodiments perform variations of the process 400. For example, certain operations of the process may be performed out of the exact order shown and described, certain operations may not be performed as one continuous series of operations, some operations may be omitted, and different embodiments may perform different specific operations.

FIGS. 5A and 5B are block diagrams illustrating an example of a perception and planning system for use with an autonomous vehicle, according to one embodiment. The system 500 may be implemented as part of the autonomous vehicle 101 of FIG. 1, including but not limited to the perception and planning system 110, the control system 111, and the sensor system 115. Referring to FIGS. 5A-5B, the perception and planning system 110 includes, but is not limited to, a localization module 501, a perception module 502, a prediction module 503, a decision module 504, a planning module 505, a control module 506, a routing module 507, and one or more motion controllers 508.

Some or all of the modules 501-508 may be implemented in software, hardware, or a combination thereof. For example, these modules may be installed in the persistent storage 552, loaded into the memory 551, and executed by one or more processors (not shown). Note that some or all of these modules may be communicatively coupled to or integrated with some or all of the modules of the vehicle control system 111 of FIG. 2. Some of the modules 501-508 may be integrated together as an integrated module.

The localization module 501 determines the current location of the autonomous vehicle 500 (e.g., using the GPS unit 212). The localization module 501 (also referred to as a map and route module) manages any data related to a trip or route of a user. A user may log in and specify a start location and a destination of a trip, for example, via a user interface. The localization module 501 communicates with other components of the autonomous vehicle 500, such as the map and route information 511, to obtain data related to the trip. For example, the localization module 501 may obtain location and route information from a location server and a map and POI (MPOI) server. The location server provides location services, and the MPOI server provides map services and the POIs of certain locations, which may be cached as part of the map and route information 511. The localization module 501 may also obtain real-time traffic information from a traffic information system or server while the autonomous vehicle 500 moves along the route.

Based on the sensor data provided by the sensor system 115 and the localization information obtained by the localization module 501, a perception of the surrounding environment is determined by the perception module 502. The perception information may represent what an average driver would perceive around a vehicle that the driver is driving. The perception may include, for example, in the form of objects, the lane configuration, traffic light signals, a relative position of another vehicle, a pedestrian, a building, a crosswalk, or other traffic-related signs (e.g., stop signs, yield signs), and so forth. The lane configuration includes information describing one or more lanes, such as, for example, the shape of the lane (e.g., straight or curved), the width of the lane, how many lanes are in the road, one-way or two-way lanes, merging or splitting lanes, exit lanes, and so forth.

The perception module 502 may include a computer vision system or functionality of a computer vision system to process and analyze images captured by one or more cameras in order to identify objects and/or features in the environment of the autonomous vehicle. The objects may include traffic signals, roadway boundaries, other vehicles, pedestrians, and/or obstacles, and the like. Computer vision systems may use object recognition algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system may map the environment, track the object, estimate the speed of the object, and the like. The perception module 502 may also detect objects based on other sensor data provided by other sensors, such as radar and/or LIDAR.

For each of the objects, the prediction module 503 predicts how the object will behave under the circumstances. The prediction is performed based on the perception data perceiving the driving environment at the point in time, in view of a set of map/route information 511 and traffic rules 512. For example, if the object is a vehicle in an opposing direction and the current driving environment includes an intersection, the prediction module 503 will predict whether the vehicle will likely move straight ahead or make a turn. If the perception data indicates that the intersection has no traffic light, the prediction module 503 may predict that the vehicle may have to come to a complete stop before entering the intersection. If the perception data indicates that the vehicle is currently in a left-turn-only lane or a right-turn-only lane, the prediction module 503 may predict that the vehicle will more likely make a left turn or a right turn, respectively.
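The examples in this paragraph amount to simple rules over the perceived scene; a toy sketch might look as follows (the labels and function signature are hypothetical, not the disclosed implementation):

```python
def predict_behavior(lane_type, has_traffic_light, at_intersection):
    """Toy rule-based behavior predictor mirroring the examples above.
    All labels are illustrative."""
    if lane_type == "left_turn_only":
        return "turn_left"          # turn-only lane implies a likely turn
    if lane_type == "right_turn_only":
        return "turn_right"
    if at_intersection and not has_traffic_light:
        # No light: the vehicle likely must come to a complete stop first.
        return "stop_then_proceed"
    if at_intersection:
        return "straight_or_turn"   # either outcome remains plausible
    return "keep_lane"
```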

For each of the objects, the decision module 504 makes a decision regarding how to handle the object. For example, for a particular object (e.g., another vehicle in a crossing route) as well as its metadata describing the object (e.g., speed, direction, turning angle), the decision module 504 decides how to encounter the object (e.g., cut in, yield, stop, pass). The decision module 504 may make such decisions according to a set of rules, such as traffic rules or driving rules 512, which may be stored in the persistent storage 552.

The routing module 507 is configured to provide one or more routes or paths from a starting point to a destination point. For a given trip from a start location to a destination location, for example, received from a user, the routing module 507 obtains the map and route information 511 and determines all possible routes or paths from the start location to the destination location. The routing module 507 may generate a reference line in the form of a topographic map for each of the routes it determines from the start location to the destination location. A reference line refers to an ideal route or path without any interference from others such as other vehicles, obstacles, or traffic conditions. That is, if there is no other vehicle, pedestrian, or obstacle on the road, the ADV should exactly or closely follow the reference line. The topographic maps are then provided to the decision module 504 and/or the planning module 505. The decision module 504 and/or the planning module 505 examine all of the possible routes to select and modify one of the best routes in view of other data provided by other modules, such as traffic conditions from the localization module 501, the driving environment perceived by the perception module 502, and the traffic conditions predicted by the prediction module 503. The actual path or route used to control the ADV may be close to or different from the reference line provided by the routing module 507, depending upon the specific driving environment at the point in time.

Based on the decisions for each of the perceived objects, the planning module 505 plans a path or route for the autonomous vehicle, as well as driving parameters (e.g., distance, speed, and/or turning angle), using the reference line provided by the routing module 507 as a basis. That is, for a given object, the decision module 504 decides what to do with the object, while the planning module 505 determines how to do it. For example, for a given object, the decision module 504 may decide to pass the object, while the planning module 505 may determine whether to pass on the left side or the right side of the object. Planning and control data is generated by the planning module 505, including information describing how the vehicle 500 would move in a next moving cycle (e.g., the next route/path segment). For example, the planning and control data may instruct the vehicle 500 to move 10 meters at a speed of 30 miles per hour (mph), then change to the right lane at a speed of 25 mph.

Based on the planning and control data, the control module 506 controls and drives the autonomous vehicle by sending appropriate commands or signals to the vehicle control system 111 according to the route or path defined by the planning and control data. The planning and control data includes sufficient information to drive the vehicle from a first point to a second point of the route or path at different points in time along the route or path, using appropriate vehicle settings or driving parameters (e.g., throttle, braking, steering commands).

In one embodiment, the planning phase is performed in a number of planning cycles, also referred to as driving cycles, such as in every time interval of 100 milliseconds (ms). For each of the planning or driving cycles, one or more control commands will be issued based on the planning and control data. That is, for every 100 ms, the planning module 505 plans the next route segment or path segment, for example, including a target position and the time required for the ADV to reach the target position. Alternatively, the planning module 505 may further specify a specific speed, direction, and/or steering angle, etc. In one embodiment, the planning module 505 plans a route segment or a path segment for a next predetermined period of time, such as 5 seconds. For each planning cycle, the planning module 505 plans a target position for the current cycle (e.g., the next 5 seconds) based on the target position planned in the previous cycle. The control module 506 then generates one or more control commands (e.g., throttle, brake, steering control commands) based on the planning and control data of the current cycle.
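The cycle structure described here can be sketched as follows. Constant-speed point generation is a placeholder for real trajectory planning; only the 100 ms cycle and 5 s horizon come from the text, everything else is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class TrajectoryPoint:
    x: float        # meters along the path
    speed: float    # meters per second
    t: float        # seconds from the start of the cycle

CYCLE_MS = 100      # one planning/driving cycle every 100 ms
HORIZON_S = 5.0     # each cycle plans roughly the next 5 seconds

def plan_segment(start: TrajectoryPoint, target_speed: float):
    """Plan one horizon of points at constant speed (purely illustrative)."""
    dt = CYCLE_MS / 1000.0
    steps = int(HORIZON_S * 1000 / CYCLE_MS)
    return [
        TrajectoryPoint(
            x=start.x + target_speed * i * dt,
            speed=target_speed,
            t=i * dt,
        )
        for i in range(1, steps + 1)
    ]

segment = plan_segment(TrajectoryPoint(x=0.0, speed=0.0, t=0.0), target_speed=10.0)
# The next cycle re-plans starting from a target position planned previously.
next_segment = plan_segment(segment[0], target_speed=10.0)
```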

Note that the decision module 504 and the planning module 505 may be integrated as an integrated module. The decision module 504/planning module 505 may include a navigation system or functionalities of a navigation system to determine a driving path for the autonomous vehicle. For example, the navigation system may determine a series of speeds and directional headings to effect movement of the autonomous vehicle along a path that substantially avoids perceived obstacles while generally advancing the autonomous vehicle along a roadway-based path leading to a final destination. The destination may be set according to user inputs via the user interface system 113. The navigation system may update the driving path dynamically while the autonomous vehicle is in operation. The navigation system may incorporate data from a GPS system and one or more maps to determine the driving path for the autonomous vehicle.

The motion controller 508 has an optimizer and a vehicle model. The optimizer may generate a series of control commands (e.g., throttle, steering, and/or braking commands) that track the vehicle path along a target vehicle trajectory using a cost function and the vehicle model. These commands are generated while optimizing over different cost terms (e.g., lateral trajectory error, driving direction error, speed, steering, acceleration, rate of change of steering, and/or rate of change of acceleration). Each of the cost terms may be represented in the cost function to penalize undesirable behavior. A controller parameter 513 (or weight) may be associated with and applied to each term (e.g., by multiplication) to modify each term's impact on the overall computed cost. In one embodiment, the controller parameters 513 are optimized parameters tuned by the controller parameter tuner 125 for the motion controller 508.

A general example of a cost function for an MPC controller is shown below, where J is the total computed cost, w_x is the weight corresponding to term x (x = 1, 2, ...), and N is the number of points along the target trajectory of the ADV:

J = Σ_{i=1}^{N} Σ_x w_x · cost_x(i)

These terms are optimized by minimizing the total computed cost J. The terms may include at least one of: a lateral trajectory error (penalizing the ADV's distance from the target trajectory), a driving direction error (penalizing the error between the ADV's driving direction and the target trajectory's direction at a certain point), a speed cost (penalizing changes in speed), a steering cost (penalizing changes in steering), an acceleration cost (penalizing changes in acceleration), a rate-of-change-of-steering cost (penalizing how fast the steering changes), a braking cost (penalizing braking), and a rate-of-change-of-acceleration cost (penalizing how fast the acceleration changes). In some embodiments, the cost function includes at least two of the above terms. In other embodiments, the cost function includes all of the above terms. While taking the above into account, sequential control commands (throttle, steering, braking) may be generated to optimally track the target trajectory.
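A weighted-sum cost of the general form above might be sketched as follows. The term names and weight values are illustrative assumptions, not disclosed tuning results:

```python
# Weighted-sum cost of the general form J = sum_i sum_x w_x * cost_x(i).
# Term names and weight values are illustrative only.
WEIGHTS = {
    "lateral_error": 1.0,
    "heading_error": 0.5,
    "steering_rate": 0.1,
}

def total_cost(trajectory_terms, weights=WEIGHTS):
    """trajectory_terms: a list of dicts, one per point i along the target
    trajectory, mapping term name -> unweighted cost at that point."""
    return sum(
        weights[name] * value
        for point in trajectory_terms
        for name, value in point.items()
    )

# Two points along a target trajectory with per-term unweighted costs.
terms = [
    {"lateral_error": 0.04, "heading_error": 0.01, "steering_rate": 0.2},
    {"lateral_error": 0.01, "heading_error": 0.00, "steering_rate": 0.1},
]
J = total_cost(terms)
```

Tuning the weights in `WEIGHTS` is exactly the role of the controller parameters 513 produced by the parameter tuner.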

Note that some or all of the components shown and described above may be implemented in software, hardware, or a combination thereof. For example, these components may be implemented as software installed and stored in a persistent storage device, which may be loaded into and executed by a processor (not shown) in memory to perform the processes or operations described throughout this application. Alternatively, these components may be implemented as executable code programmed or embedded into special-purpose hardware, such as an integrated circuit (e.g., an application specific IC or ASIC), a Digital Signal Processor (DSP) or a Field Programmable Gate Array (FPGA), which is accessible via a corresponding driver and/or operating system from an application. Further, these components may be implemented as specific hardware logic or processor cores in a processor or in a computer as part of an instruction set accessible via one or more specific instruction software components.

Some portions of the preceding detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as those set forth in the appended claims, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Embodiments of the present disclosure also relate to an apparatus for performing the operations herein. Such an apparatus may be implemented by a computer program stored in a non-transitory computer-readable medium. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., computer) readable storage medium (e.g., read-only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices).

The processes or methods described in the foregoing figures may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (e.g., embodied on a non-transitory computer readable medium), or a combination of both. Although the processes or methods are described above with respect to some sequential operations, it should be understood that some of the operations described may be performed in a different order. Further, some operations may be performed in parallel rather than sequentially.

Embodiments of the present disclosure are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the disclosure as described herein.

In the foregoing specification, embodiments of the disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

In some embodiments, the present disclosure may include language such as "at least one of [element A] and [element B]." This language may refer to one or more of the elements. For example, "at least one of A and B" may refer to "A," "B," or "A and B." Specifically, "at least one of A and B" may mean "at least one of A and B" or "at least one of A or B." In some embodiments, the present disclosure may include language such as "[element A], [element B], and/or [element C]." This language may refer to any one of the elements or any combination thereof. For example, "A, B, and/or C" may refer to "A," "B," "C," "A and B," "A and C," "B and C," or "A, B, and C."
