Camera assessment techniques for automated vehicles

Document No.: 621637. Publication date: 2021-05-07.

Abstract: This technology, "Camera assessment techniques for automated vehicles," was created by C.W. Craddock, A. Wendel, and X. Hu on 2019-08-19. The present disclosure relates to evaluating operation of two or more cameras (310, 320, 410, 420, 430). The cameras may be a set of cameras (300, 400) of a perception system (172) of a vehicle (100) having an autonomous driving mode. A first image (500) captured by a first camera and a second image (600) captured by a second camera may be received. A first feature vector of the first image and a second feature vector of the second image may be generated. A similarity score may be determined using the first feature vector and the second feature vector. The similarity score may be used to evaluate the operation of the two cameras, and appropriate action may be taken.

1. A method for evaluating operation of two or more cameras, the method comprising:

receiving, by one or more processors, a first image captured by a first camera;

receiving, by the one or more processors, a second image captured by a second camera, the first camera and the second camera having overlapping fields of view;

generating, by the one or more processors, a first feature vector of the first image and a second feature vector of the second image;

determining, by the one or more processors, a similarity score using the first feature vector and the second feature vector; and

evaluating, by the one or more processors, operation of the two cameras using the similarity score.

2. The method of claim 1, wherein the first image and the second image are differently exposed.

3. The method of claim 1, wherein the first camera includes an ND filter and the second camera does not include an ND filter.

4. The method of claim 1, wherein the exposure time of the first image is longer than the exposure time of the second image.

5. The method of claim 1, wherein the first and second images are captured over a predetermined period of time in order to capture an object at a given location within the overlapping fields of view.

6. The method of claim 1, further comprising, prior to generating the first and second feature vectors, scaling down the first and second images.

7. The method of claim 6, wherein scaling down the first and second images comprises cropping the first and second images to include only pixels corresponding to the overlapping fields of view.

8. The method of claim 6, wherein scaling down the first and second images comprises generating thumbnails of the first and second images.

9. The method of claim 1, wherein the first and second feature vectors are further generated based on a time of day.

10. The method of claim 1, wherein the first and second feature vectors are generated so as to include only features corresponding to light-emitting objects.

11. The method of claim 1, wherein generating a similarity score comprises using a structural similarity index.

12. The method of claim 1, wherein using the similarity score comprises comparing the similarity score to a threshold.

13. The method of claim 1, wherein using the similarity score comprises comparing the similarity score to other similarity scores generated from images from the first camera and the second camera over time.

14. The method of claim 13, wherein comparing the similarity score to other similarity scores comprises using a cumulative sum control chart.

15. The method of claim 1, further comprising, based on the evaluation, sending a request for assistance to a remote computing device, the request comprising the first image and the second image.

16. The method of claim 15, further comprising, after sending the request, sending the updated image from the first camera and the updated image from the second camera to the remote computing device.

17. The method of claim 15, further comprising:

receiving an instruction to stop the vehicle in response to the request; and

stopping the vehicle in response to the instruction.

18. The method of claim 1, further comprising activating a cleaning system for one or both of the first camera and the second camera.

19. The method of claim 1, further comprising controlling, by the one or more processors, the vehicle in the autonomous driving mode by making driving decisions based on the evaluation.

20. The method of claim 19, wherein controlling the vehicle comprises discarding all or part of one or both of the first image and the second image when making the driving decision.

21. The method of claim 1, wherein generating a similarity score comprises inputting the first feature vector and the second feature vector into a model.

22. The method of claim 21, wherein the model is a decision tree model.

23. The method of claim 21, wherein generating a similarity score further comprises inputting additional information comprising at least one of depth data for the first image, depth data for the second image, a location of the sun, or time of day information.

Background

Automated vehicles, such as vehicles that do not require a human driver, may be used to assist in the transport of passengers or items from one location to another. Such vehicles may operate in a fully autonomous driving mode in which the passenger may provide some initial input (such as a destination) and the vehicle maneuvers itself to reach the destination. Such vehicles may therefore rely heavily on systems capable of determining the position of an autonomous vehicle and detecting and identifying objects outside the vehicle (such as other vehicles, stop lights, pedestrians, etc.) at any given time. For example, these systems may include sensors (such as laser scanning equipment and cameras) mounted at different locations on the vehicle. Therefore, being able to assess the operation of such sensors in real time is important to ensure that the vehicle does not rely on sensor data from sensors that are not functioning or covered by debris when making driving decisions.

Disclosure of Invention

Aspects of the present disclosure provide a method for evaluating operation of two or more cameras. The method includes receiving, by one or more processors, a first image captured by a first camera; receiving, by the one or more processors, a second image captured by a second camera, the first camera and the second camera having overlapping fields of view; generating, by the one or more processors, a first feature vector for the first image and a second feature vector for the second image; determining, by the one or more processors, a similarity score using the first feature vector and the second feature vector; and evaluating, by the one or more processors, operation of the two cameras using the similarity score.

In one example, the first image and the second image are exposed differently. In another example, the first camera includes an ND filter and the second camera does not include an ND filter. In another example, the exposure time of the first image is longer than the exposure time of the second image. In another example, the first and second images are captured over a predetermined period of time in order to capture an object at a given location within the overlapping fields of view. In another example, the method further comprises reducing the first image and the second image prior to generating the first feature vector and the second feature vector. In this example, reducing the first and second images includes cropping the first and second images to include only pixels corresponding to the overlapping fields of view. Additionally or alternatively, reducing the first and second images includes generating thumbnails of the first and second images.

In another example, the first feature vector and the second feature vector are also generated based on a time of day. In another example, the first feature vector and the second feature vector are generated so as to include only features corresponding to light-emitting objects. In another example, generating the similarity score includes using a structural similarity index. In another example, using the similarity score includes comparing the similarity score to a threshold. In another example, using the similarity score includes comparing the similarity score to other similarity scores generated from images from the first camera and the second camera over time. In this example, comparing the similarity score to other similarity scores includes using a cumulative sum control chart. In another example, the method further includes sending a request for assistance to a remote computing device based on the evaluation, the request including the first image and the second image. In this example, the method further includes, after sending the request, sending an updated image from the first camera and an updated image from the second camera to the remote computing device. Additionally or alternatively, the method further includes receiving, in response to the request, an instruction to stop the vehicle, and stopping the vehicle in response to the instruction. In another example, the method further includes activating a cleaning system for one or both of the first camera and the second camera. In another example, the method further includes controlling, by the one or more processors, the vehicle in the autonomous driving mode by making a driving decision based on the evaluation. In this example, controlling the vehicle includes discarding all or part of one or both of the first image and the second image when making the driving decision.

In another example, generating the similarity score includes inputting the first feature vector and the second feature vector into the model. In this example, the model is a decision tree model. Additionally or alternatively, generating the similarity score further comprises inputting additional information comprising at least one of depth data of the first image, depth data of the second image, a position of the sun, or time of day information.

Drawings

FIG. 1 is a functional diagram of an example vehicle, according to aspects of the present disclosure.

FIG. 2 is an example exterior view of the example vehicle of FIG. 1, according to aspects of the present disclosure.

Fig. 3 is an example of a camera group according to aspects of the present disclosure.

Fig. 4 is another example of a camera group according to aspects of the present disclosure.

Fig. 5 is an example image according to aspects of the present disclosure.

Fig. 6 is another example image according to aspects of the present disclosure.

Fig. 7 is a schematic diagram of an example system according to aspects of the present disclosure.

Fig. 8 is a functional diagram of the system of fig. 7, according to aspects of the present disclosure.

Fig. 9 is an example of cropping and reducing an image according to aspects of the present disclosure.

Fig. 10 is an example image according to aspects of the present disclosure.

Fig. 11 is another example image according to aspects of the present disclosure.

Fig. 12 is an example flow diagram in accordance with aspects of the present disclosure.

Detailed Description

SUMMARY

The present technology relates to evaluating the operation of two or more cameras, or more specifically, to confirming that the cameras are functioning properly. For example, it is difficult to know if the camera is properly "seeing" the world, or if foreign debris is present on the lens, if condensation is present, if there are non-functioning pixels, etc. This is particularly important in the case of automated vehicles that rely on such cameras to make driving decisions.

For example, the perception system of an automated vehicle may include multiple cameras and other sensors. The cameras may have different configurations, e.g., different filters, etc., but may be configured to capture images periodically. At least some of the cameras, and thus some of the captured images, may have overlapping fields of view. The proper functioning of a pair of cameras with overlapping fields of view can be verified by selecting a pair of images, one captured by each camera. Ideally, these images are captured close together in time, or within some predetermined time period, in order to capture one or more of the same objects at the same locations within the overlapping fields of view.

To simplify the processing, the size of the image may be reduced. For example, the image may be reduced in size and/or otherwise cropped to include only portions corresponding to the overlapping fields of view.

The scaled-down images may then be analyzed to generate feature vectors. These feature vectors therefore represent features in the scaled-down images. The feature vectors may then be compared to determine a similarity score, or how similar they are to each other. The similarity score may be determined using a cosine similarity measure, a clustering technique, another vector similarity measure technique, or a model.
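
For illustration, a minimal sketch of this comparison is shown below, assuming two images already received from cameras with overlapping fields of view and using a simple cosine similarity over flattened thumbnails; the thumbnail size, function names, and threshold are illustrative and not taken from the disclosure.

```python
import numpy as np
import cv2  # OpenCV, assumed available for resizing


def feature_vector(image, size=(64, 64)):
    """Downscale the image and flatten its pixels into a simple feature vector."""
    thumbnail = cv2.resize(image, size, interpolation=cv2.INTER_AREA)
    return thumbnail.astype(np.float32).ravel()


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two feature vectors, in the range [-1, 1]."""
    return float(np.dot(vec_a, vec_b) /
                 (np.linalg.norm(vec_a) * np.linalg.norm(vec_b) + 1e-9))


def evaluate_pair(image_a, image_b, threshold=0.9):
    """Return (score, ok); ok is False when the two images disagree too much."""
    score = cosine_similarity(feature_vector(image_a), feature_vector(image_b))
    return score, score >= threshold
```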

The similarity score may be compared to a threshold to determine whether the similarity between the scaled-down images is too low, or rather, whether the images are so different that one of the cameras may be problematic or occluded. In many cases, the threshold may be sufficient to identify changes, such as when condensation slowly forms on one of the camera lenses. However, in some cases, the threshold may not necessarily identify the problem. In this case, the similarity scores of many images from the two cameras may be compared over time to identify sudden changes.

If the threshold is met or if a sudden change is detected, the vehicle's computing device may assume that one or both of the cameras is problematic. Therefore, an appropriate response can be taken. Further, the process may be performed periodically, such as each time a pair of images is captured by two cameras having overlapping fields of view, or less frequently.

While the above-described techniques work well during daytime when ambient lighting is good, in darker or nighttime environments, the similarity score and SSIM may become unreliable. During this time, instead of matching all features in the two reduced images, only bright spots or high intensity regions may be compared. Further, the similarity scores of these vectors may be determined again and compared to a threshold and/or tracked to identify if one of the cameras is having a problem. Thereafter, an appropriate response may be taken.

The features described herein allow for reliable camera evaluation under various lighting conditions. As described above, it is extremely challenging to determine whether the camera is properly "seeing" the world, or whether there is some foreign object debris, condensation, non-functioning pixels on the lens, etc. This is especially important in the case of automated vehicles that rely on such cameras to make driving decisions.

Example System

As shown in fig. 1, a vehicle 100 according to one aspect of the present disclosure includes various components. Although certain aspects of the present disclosure are particularly useful for certain types of vehicles, the vehicle may be any type of vehicle, including, but not limited to, a car, truck, motorcycle, bus, caravan, and the like. The vehicle may have one or more computing devices, such as computing device 110 containing one or more processors 120, memory 130, and other components typically found in a general purpose computing device.

Memory 130 stores information accessible by one or more processors 120, including instructions 132 and data 134 that may be executed or otherwise used by processors 120. The memory 130 may be of any type capable of storing information accessible by the processor, including a computing device readable medium, or other medium that stores data that may be read by an electronic device, such as a hard disk drive, memory card, ROM, RAM, DVD or other optical disk, and other writable and read-only memories. The systems and methods may include different combinations of the foregoing, whereby different portions of the instructions and data are stored on different types of media.

The instructions 132 may be any set of instructions that are directly executable by a processor (such as machine code) or indirectly executable (such as scripts). For example, the instructions may be stored as computing device code on a computing device readable medium. In this regard, the terms "instructions" and "programs" may be used interchangeably herein. The instructions may be stored in an object code format for direct processing by a processor, or in any other computing device language, including a collection of script or independent source code modules that are interpreted or pre-compiled as needed. The function, method and routine of the instructions are explained in more detail below.

Data 134 may be retrieved, stored, or modified by processor 120 according to instructions 132. For example, although claimed subject matter is not limited by any particular data structure, data may be stored in computing device registers, in a relational database as a table having a plurality of different fields and records, in an XML document, or in a flat file. The data may also be formatted in any computing device readable format.

The one or more processors 120 may be any conventional processor, such as a commercially available CPU. Alternatively, one or more processors may be special purpose devices, such as an ASIC or other hardware-based processor. Although fig. 1 functionally shows the processors, memory, and other elements of the computing device 110 to be within the same block, those of ordinary skill in the art will appreciate that a processor, computing device, or memory may actually comprise multiple processors, computing devices, or memories, which may or may not be stored within the same physical housing. For example, the memory may be a hard disk drive or other storage medium located in a different enclosure than the enclosure of the computing device 110. Thus, references to a processor or computing device are to be understood as including references to a collection of processors or computing devices or memories that operate in parallel or not.

Computing device 110 may include all of the components typically used in connection with computing devices, such as the processors and memories described above, as well as user input 150 (e.g., a mouse, keyboard, touch screen, and/or microphone) and various electronic displays (e.g., a monitor having a screen or any other electronic device operable to display information). In this example, the vehicle includes an internal electronic display 152 and one or more speakers 154 to provide an informational or audiovisual experience. In this regard, the internal electronic display 152 may be located within a cabin of the vehicle 100 and may be used by the computing device 110 to provide information to passengers within the vehicle 100.

Computing device 110 may also include one or more wireless network connections 156 to facilitate communications with other computing devices, such as client and server computing devices described in detail below. Wireless network connections may include short-range communication protocols (such as bluetooth, bluetooth Low Energy (LE), cellular connections), as well as various configurations and protocols (including the internet, world wide web, intranets, virtual private networks, wide area networks, local area networks, private networks using communication protocols proprietary to one or more companies, ethernet, WiFi, and HTTP), as well as various combinations of the foregoing.

In one example, the computing device 110 may be an autonomous driving computing system incorporated into the vehicle 100. The autonomous computing system can communicate with various components of the vehicle to operate the vehicle 100 in a fully autonomous driving mode and/or a semi-autonomous driving mode. For example, returning to fig. 1, the computing device 110 may communicate with various systems of the vehicle 100, such as a deceleration system 160, an acceleration system 162, a steering system 164, a signaling system 166, a navigation system 168, a positioning system 170, a sensing system 172, and a power system 174 (e.g., gasoline or diesel powered motors or electric motors) to control the motion, speed, etc. of the vehicle 100 according to the instructions 132 of the memory 130. Further, although these systems are shown external to computing device 110, in practice, these systems may also be incorporated into computing device 110, again as an autonomous driving computing system for controlling vehicle 100.

For example, the computing device 110 may interact with a deceleration system 160 and an acceleration system 162 to control the speed of the vehicle. Similarly, the computing device 110 may use the steering system 164 to control the direction of the vehicle 100. For example, if the vehicle 100 is configured for use on a roadway, such as a car or truck, the steering system may include components that control the angle of the wheels to turn the vehicle. The computing device 110 may use the signaling system 166 to signal other drivers or vehicles of the vehicle's intent (e.g., by illuminating turn or brake lights, if desired).

The computing device 110 may use the navigation system 168 to determine and follow a route to a location. In this regard, the navigation system 168 and/or the data 134 may store detailed map information, such as highly detailed maps identifying the shape and height of roads, lane lines, intersections, crosswalks, speed limits, traffic signals, buildings, signs, real-time traffic information, vegetation, or other such objects and information. In other words, the detailed map information may define the geometry of the vehicle's expected environment, including roads and legal speed limits for those roads. Further, the map information may include information regarding traffic controls, such as traffic lights, stop signs, yield signs, etc., which, in combination with real-time information received from the perception system 172, may be used by the computing device 110 to determine which traffic directions have the right of way at a given location.

The perception system 172 also includes one or more components for detecting objects external to the vehicle, such as other vehicles, obstacles on the road, traffic signals, signs, trees, etc. For example, the perception system 172 may include one or more LIDAR sensors, sonar devices, radar units, cameras, and/or any other detection devices that record data that may be processed by the computing device 110. Sensors of the perception system may detect objects in the environment external to the vehicle and generate sensor data describing characteristics of the objects, such as location, orientation, size, shape, type, direction and speed of movement, and the like. Raw sensor data from the sensors and/or the aforementioned characteristics may be quantized or arranged into descriptive functions or vectors and sent to the computing device 110 for further processing. As discussed in further detail below, the computing device 110 may use the positioning system 170 to determine the vehicle's location and the perception system 172 to detect and respond to objects when needed to safely reach the location.

For example, fig. 2 is an example exterior view of the vehicle 100. In this example, the top housing 210 and the housings 212, 214 may include LIDAR sensors as well as various cameras and radar units. Further, a housing 220 located at the front end of the vehicle 100 and housings 230, 232 located on the driver and passenger sides of the vehicle may each store a LIDAR sensor. For example, the housing 230 is located in front of the driver's door 260. The vehicle 100 also includes housings 240, 242 for radar units and/or cameras that are also located on the roof of the vehicle 100. Additional radar units and cameras (not shown) may be located at the front and rear ends of the vehicle 100 and/or at other locations along the roof or roof housing 210.

The cameras of the perception system 172 may be arranged on the vehicle such that at least two cameras periodically capture a majority of points in the vehicle's environment. Some points in front of the vehicle (i.e., in the direction of travel) may also be "seen" by a long-range camera. Thus, for evaluation purposes, each camera of the perception system may be grouped with one or more other cameras into a "camera group".

Fig. 3 is an example of a camera group 300 including two cameras 310, 320 with overlapping fields of view 312, 322. Fig. 4 is an example of a camera group 400 including three cameras 410, 420, 430 having fields of view 412, 422, 432. The fields of view 312, 322 and 412, 422, 432 have overlapping portions 302 and 402, respectively. Thus, the cameras of a group, and thus the images captured by the group, may have overlapping fields of view. Each camera of a camera group may have a cleaning system 314, 324, 414, 424, 434, which may include a wiper and/or cleaning fluid to clean the lens of the camera. The operation of the cleaning system may be controlled, for example, by the computing device 110. The cameras of each camera group may be fixed relative to each other and to the vehicle to ensure that the overlapping fields of view remain consistent.

Further, each camera in the groups may have the same or different configurations, e.g., different filters, etc. In some cases, the images may be exposed differently, that is, the images may be captured using different filtering techniques and/or exposure times. For example, referring to images 500 and 600 of fig. 5 and 6, one image 500 may be captured with an ND filter for a first exposure time using a first camera (such as camera 310), while a second image 600 may be captured without an ND filter for a second exposure time using a second camera (such as camera 320). The first exposure time and the second exposure time may be the same or different, for example, the second exposure time may be shorter than the first exposure time. As an example, the first image 500 may include rough outlines of a traffic light 510 and a vehicle 520, and possibly other objects. The second image 600 may include an overexposed traffic light 510 and an overexposed vehicle 520.

The computing device 110 of the vehicle 100 may also receive information from or transmit information to other computing devices, such as those that are part of the transportation service. Fig. 7 and 8 are a schematic diagram and a functional diagram, respectively, of an example system 700, the example system 700 including a plurality of computing devices 710, 720, 730, 740 and a storage system 750 connected via a network 760. The system 700 also includes a vehicle 100 and vehicles 100A, 100B that may be configured the same as or similar to the vehicle 100. Although only a few vehicles and computing devices are depicted for simplicity, a typical system may include significantly more vehicles and computing devices.

As shown in fig. 7, each of the computing devices 710, 720, 730, 740 may include one or more processors, memories, data, and instructions. Such processors, memories, data, and instructions may be configured similar to the one or more processors 120, memories 130, instructions 132, and data 134 of the computing device 110.

Network 760 and intermediate nodes may include various configurations and protocols, including short-range communication protocols such as bluetooth, bluetooth LE, the internet, the world wide web, intranets, virtual private networks, wide area networks, local area networks, private networks using communication protocols specific to one or more companies, ethernet, WiFi, and HTTP, as well as various combinations of the foregoing. Such communication may be facilitated by any device capable of sending and receiving data to and from other computing devices, such as modems and wireless interfaces.

In one example, the one or more computing devices 710 may include one or more server computing devices (e.g., a load balancing server farm) having multiple computing devices that exchange information with different nodes of a network for the purpose of receiving data from, processing data, and sending data to other computing devices. For example, the one or more computing devices 710 may include one or more server computing devices capable of communicating with the computing device 110 of the vehicle 100 or similar computing devices of the vehicle 100A and the computing devices 720, 730, 740 via the network 760. For example, the vehicles 100, 100A may be part of a fleet of vehicles that may be dispatched to various locations by a server computing device. In this regard, the server computing device 710 may function as a verification computing system that may be used to verify automatic control software that vehicles, such as the vehicle 100 and the vehicle 100A, may use to operate in an autonomous driving mode. Further, the server computing device 710 may send information to a user (such as the users 722, 732, 742) using the network 760 and present the information to the user on a display (such as the displays 724, 734, 744 of the computing devices 720, 730, 740). In this regard, the computing devices 720, 730, 740 can be considered client computing devices.

As shown in fig. 7, each client computing device 720, 730, 740 can be a personal computing device intended for use by a user 722, 732, 742, and have all of the components typically used in connection with a personal computing device, including one or more processors (e.g., a Central Processing Unit (CPU)), memory (e.g., RAM and internal hard drives) to store data and instructions, a display such as display 724, 734, 744 (e.g., a monitor having a screen, touch screen, projector, television, or other device operable to display information), and a user input device 726, 736, 746 (e.g., a mouse, keyboard, touch screen, or microphone). The client computing device may also include a camera for recording video streams, speakers, a network interface device, and all components for connecting these elements to each other.

Although the client computing devices 720, 730, and 740 may each comprise a full-size personal computing device, they may alternatively comprise mobile computing devices capable of wirelessly exchanging data with a server over a network such as the internet. By way of example only, the client computing device 720 may be a mobile phone or a device capable of obtaining information via the internet or other network, such as a wireless-enabled PDA, a tablet PC, a wearable computing device or system, or a netbook. In another example, the client computing device 730 may be a wearable computing system, such as the watch shown in fig. 7. As an example, a user may input information using a keypad, a microphone, using a camera using visual signals, or a touch screen.

In some examples, the client computing device 740 may be an operating workstation used by an administrator or other operator, such as the user 742, to respond to assistance requests received from computing devices of vehicles (such as the vehicle 100 and the vehicle 100A). Although only a single operational workstation 740 is shown in fig. 7 and 8, any number of such workstations may be included in a typical system. Further, although the operational workstation is depicted as a desktop computer, the operational workstation may include various types of personal computing devices, such as laptop computers, netbooks, tablet computers, and the like.

Like the memory 130, the storage system 750 may be any type of computerized storage capable of storing information accessible by the server computing device 710, such as a hard drive, memory card, ROM, RAM, DVD, CD-ROM, writable memory, and read-only memory. Further, storage system 750 may comprise a distributed storage system where data is stored on a plurality of different storage devices physically located in the same or different geographic locations. As shown in fig. 7 and 8, storage system 750 may be connected to computing devices via network 760, and/or may be directly connected to or incorporated into any of computing devices 110, 710, 720, 730, 740, etc.

Example method

In addition to the operations described above and illustrated in the figures, various operations will now be described. It should be understood that the following operations need not be performed in the exact order described below. Rather, various steps may be handled in a different order or simultaneously, and steps may be added or omitted.

As described above, as the vehicle 100 travels around, its perception system 172 may use various sensors to detect and identify objects in the vehicle's environment. Furthermore, at least some of these sensors may include the aforementioned camera groups. To ensure that the cameras of a given camera group are operating properly, the functioning of these cameras may be evaluated. To this end, the computing device 110 may receive two or more images, such as images 500 and 600, from the cameras of a camera group, such as the camera group 300 (or the camera group 400). Ideally, these images are captured close together in time or within some predetermined time period in order to capture one or more of the same objects at the same locations within the overlapping fields of view.

To simplify the processing of the images, the size of the images may be reduced. For example, the computing device 110 may reduce the size of each image, such as by generating thumbnails and/or otherwise cropping, to include only portions corresponding to the overlapping fields of view. Turning to fig. 9, the images 500 and 600 may be cropped to the cropped areas 950, 960. These cropped regions include only the portions corresponding to the overlapping portion 302 of the fields of view 312 and 322. The cropped regions 950, 960 may then be reduced or thumbnailed to reduce the number of pixels. The result is reduced images 952 and 962, which may include only pixels corresponding to the overlapping portion 302 of the fields of view of cameras 310 and 320. In this respect, the likelihood that the two reduced images include the same object at the same position is very high. In some cases, cropping may be avoided and the original-resolution images used in order to detect very small occlusions, although this may make the process more sensitive to small errors in camera alignment and to parallax.
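
A minimal sketch of this cropping and reduction step is given below, assuming the overlapping field of view maps to a fixed pixel rectangle in each camera; the rectangles, camera names, and thumbnail size are hypothetical placeholders rather than values from the disclosure.

```python
import cv2

# Hypothetical (x, y, width, height) of the overlap region in each camera's image.
OVERLAP_ROI = {
    "camera_310": (400, 0, 1200, 900),
    "camera_320": (0, 0, 1200, 900),
}


def crop_and_reduce(image, camera_name, thumb_size=(128, 96)):
    """Crop an image to the overlapping field of view and shrink it to a thumbnail."""
    x, y, w, h = OVERLAP_ROI[camera_name]
    cropped = image[y:y + h, x:x + w]
    return cv2.resize(cropped, thumb_size, interpolation=cv2.INTER_AREA)


# Example usage (image_500 and image_600 are the captured frames):
# reduced_952 = crop_and_reduce(image_500, "camera_310")
# reduced_962 = crop_and_reduce(image_600, "camera_320")
```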

The scaled-down (or, where no reduction is performed, the original) images may then be analyzed to generate feature vectors. For example, a first feature vector may be generated for the scaled-down image 952 and a second feature vector may be generated for the scaled-down image 962. These feature vectors therefore represent features in the reduced images, such as color, edges, brightness, contrast, etc. In some cases, the feature vector may include the pixels of the image itself.

One or more similarity scores may be determined using the feature vectors of the reduced images from the set of cameras. For example, the computing device 110 may compare each pair of feature vectors (corresponding to a pair of scaled-down images from different cameras of a camera group) to determine a similarity score, or how similar they are to each other.

The similarity score may be determined using a cosine similarity measure, clustering technique, or other vector similarity measure technique. For example, a Structural Similarity (SSIM) index may be used to measure the similarity between feature vectors of scaled-down images and determine a similarity score. In this regard, a high similarity score will indicate a high similarity of the features between the two reduced images, while a low similarity score will indicate a low similarity of the features between the two reduced images.
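
A minimal sketch of an SSIM-based score is shown below, using scikit-image (assumed available, version 0.19 or later for the channel_axis argument) and assuming two 8-bit reduced images of the same shape; the 0.25 threshold mirrors the example value given later in the text.

```python
from skimage.metrics import structural_similarity


def ssim_score(reduced_a, reduced_b):
    """SSIM in [-1, 1]; higher values mean the two reduced images agree more closely."""
    # channel_axis=-1 treats the last dimension as color channels (e.g., RGB).
    return structural_similarity(reduced_a, reduced_b, channel_axis=-1)


def cameras_agree(reduced_a, reduced_b, threshold=0.25):
    """Flag a potential camera problem when the similarity score falls below the threshold."""
    return ssim_score(reduced_a, reduced_b) >= threshold
```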

As another example, a model may be used to determine a similarity score. In use, the first and second feature vectors may be input into a model, and the model may provide a similarity score or a series of similarity scores representing how similar different portions of the image (such as left-hand, right-hand, interior, etc.) are.

The model may be a decision tree model, a random forest model, a neural network, or another machine-learned model. Decision trees may be particularly practical where there is only limited data with which to train a more complex model such as a neural network. The model may be stored locally in the computing device 110, for example in memory 130. For example, the model may be trained offline at one or more server computing devices 710 and then sent to the computing device 110 via network 760, and/or loaded directly into memory 130.

The model may be trained using real-world examples of images from cameras of the camera groups (such as cameras 310, 320, 410, 420, 430 of the camera groups 300, 400) as training inputs, and using respective similarity scores as training outputs. Because there may be a limited number of negative examples available for training purposes, training may additionally or alternatively include using fault-injection techniques to simulate "bad" images with typical problems (such as obstacles, dirty lenses, etc.) in addition to good images. Feature vectors from good images may then be paired with feature vectors from bad images and identified as having low similarity scores as training data. Similarly, feature vectors from good images may be paired with themselves and identified as having high similarity scores as training data. Of course, as with all training, the more training data or examples used to train the model, the more reliable the similarity values generated by the model.
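
A minimal sketch of the fault-injection idea is shown below, training a decision tree on concatenated thumbnail features and using a simulated occlusion as the "bad" case; inject_fault(), the feature layout, and the tree depth are illustrative stand-ins rather than the disclosed implementation.

```python
import numpy as np
import cv2
from sklearn.tree import DecisionTreeClassifier


def inject_fault(image, rng):
    """Simulate a 'bad' camera by blacking out a random rectangle (an occlusion)."""
    faulty = image.copy()
    h, w = faulty.shape[:2]
    x = int(rng.integers(0, w // 2))
    y = int(rng.integers(0, h // 2))
    faulty[y:y + h // 2, x:x + w // 2] = 0
    return faulty


def pair_features(img_a, img_b, size=(64, 64)):
    """Concatenate flattened thumbnails of both images into one model input."""
    thumbs = [cv2.resize(img, size, interpolation=cv2.INTER_AREA).ravel()
              for img in (img_a, img_b)]
    return np.concatenate(thumbs).astype(np.float32)


def train_similarity_model(good_pairs, seed=0):
    """Label good/good pairs as similar (1) and good/faulty pairs as dissimilar (0)."""
    rng = np.random.default_rng(seed)
    inputs, labels = [], []
    for img_a, img_b in good_pairs:
        inputs.append(pair_features(img_a, img_b))
        labels.append(1)
        inputs.append(pair_features(img_a, inject_fault(img_b, rng)))
        labels.append(0)
    return DecisionTreeClassifier(max_depth=8).fit(np.array(inputs), np.array(labels))
```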

In some cases, the model may be trained and used with additional information. For example, the additional information may include depth data for the first image, depth data for the second image, the position of the sun (e.g., the angle of the sun relative to the camera that captured the image), or time of day information. The depth data may be a depth map generated by projecting sensor data provided by one or more LIDAR sensors of the perception system 172 into the field of view (or coordinate system) of each image. The time of day may be provided, for example, by the camera that captures the image, and the angle of the sun relative to the camera may be identified from data stored in memory 130 relating the sun's position to the time of day. In other words, given that data, the time of day, and the location of one or both of the cameras capturing the images, the computing device may determine the angle of the sun relative to the current location of the vehicle.
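
A minimal sketch of assembling such an augmented model input is shown below, assuming the per-image depth maps and the sun angle have already been computed elsewhere (for example, from projected LIDAR returns and stored ephemeris data); the choice of summary statistics is illustrative.

```python
import numpy as np


def model_input(feature_vec_a, feature_vec_b, depth_a, depth_b,
                sun_angle_deg, hour_of_day):
    """Concatenate both feature vectors with summary depth statistics,
    the sun's angle relative to the cameras, and the time of day."""
    extras = np.array([
        float(np.nanmedian(depth_a)),  # typical depth seen by the first camera
        float(np.nanmedian(depth_b)),  # typical depth seen by the second camera
        float(sun_angle_deg),
        float(hour_of_day),
    ], dtype=np.float32)
    return np.concatenate([feature_vec_a, feature_vec_b, extras])
```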

The similarity score may be used to evaluate the operation of the cameras. For example, the computing device 110 may compare the similarity score to a threshold to determine whether the similarity between the scaled-down images is too low, or rather, whether the scaled-down images are so different that one of the cameras may be problematic or occluded. In some cases, where different portions of the first and second images have their own similarity scores, comparing the similarity scores of those different portions to the threshold may help identify exactly which portion of a camera lens may be problematic or occluded. In many cases, the threshold may be sufficient to identify changes, such as when condensation slowly forms on one of the camera lenses. The threshold may be chosen to be sensitive enough to such variations without generating too many false positives. For example, if the SSIM score ranges from -1 to 1, the threshold may be set to approximately 0.25, although the number is not meaningful by itself.

However, in some cases, such as a crack on one of the camera lenses, the threshold may not be able to identify the problem. In this case, the computing device 110 may compare multiple images between the two cameras over time in order to identify sudden changes. For example, a cumulative sum control chart (CUSUM) may be used to identify sudden changes in the similarity score over time that do not necessarily satisfy the threshold but may still indicate that one of the cameras has a problem.
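
A minimal sketch of a one-sided CUSUM check over the similarity scores of successive image pairs is shown below; the target, drift, and alarm limit are illustrative tuning parameters, not values from the disclosure.

```python
def cusum_drop_detector(scores, target=0.8, drift=0.05, limit=0.5):
    """Return the index at which similarity drops abruptly, or None if it never does.

    Accumulates how far each score falls below (target - drift); a crack or
    sudden occlusion shows up as a fast-growing cumulative deficit.
    """
    cumulative = 0.0
    for i, score in enumerate(scores):
        cumulative = max(0.0, cumulative + (target - drift - score))
        if cumulative > limit:
            return i
    return None


# Example: a camera pair that agrees well and then suddenly stops agreeing.
history = [0.82, 0.79, 0.81, 0.80, 0.35, 0.33, 0.30]
print(cusum_drop_detector(history))  # -> 5 (an alarm shortly after the drop)
```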

If the threshold is met or if a sudden change is detected, the vehicle's computing device may assume that one or both of the cameras is problematic. The vehicle's computing device can then take an appropriate response. For example, cleaning systems for one or both cameras (such as cleaning systems 314, 324, 414, 424, 434) may be automatically activated. Additionally or alternatively, the computing device 110 may send a request to a remote assistance operator (such as to computing device 740 and user 742), e.g., via network 760, to check the camera images (raw or reduced). The remote assistance operator or user 742 may be able to determine whether simple cleaning (by activating a cleaning system, such as a wiper and washer fluid) is sufficient to correct the problem, whether the vehicle should pull over, or, if the camera is not a critical sensor, whether the vehicle should simply stop using information from that camera to make driving decisions. In some cases, the remote assistance operator or user 742 may be able to remotely activate (and in some cases also deactivate) one or more of the cleaning systems and view a second set of images to confirm that cleaning was adequate. The computing device 110 may also send updated images from each camera of the camera group via the network 760 to allow the remote assistance operator or user 742 to confirm that the problem has been resolved.

In some cases, based on the information that one or more cameras of the camera group are problematic, the computing device 110 may avoid processing the affected images or the affected portions of those images. Additionally or alternatively, the computing device 110 may simply control the vehicle in the autonomous driving mode to pull over until the problem is resolved.

The above-described processes of processing images, detecting changes, and taking appropriate responses may be performed periodically, such as each time a set of images is captured by a camera group, or less frequently.

To avoid false positives, additional steps may be taken. For example, when the vehicle leaves a tunnel, or when an object near one of the cameras causes some parallax, several frames captured by the cameras may be aggregated over time. As described above, these aggregated images may be scaled down and used to generate feature vectors. Additionally or alternatively, a depth map generated from sensor data provided by one or more LIDAR sensors of the perception system 172 may be used to "skip over" or otherwise ignore regions of the images or reduced images where parallax may arise due to the motion of the vehicle 100. In this regard, the feature vectors may be generated from portions of the images or reduced images that are expected to be similar, rather than from portions that are expected to differ due to parallax.
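
A minimal sketch of the depth-based masking is shown below, assuming a per-pixel depth map (in meters) has been projected from the LIDAR into the reduced image; the 10 m cut-off is illustrative. The same mask would be applied to both reduced images so that their feature vectors remain comparable.

```python
import numpy as np


def parallax_mask(depth_map, min_depth_m=10.0):
    """True where pixels are far enough away that parallax between the two
    camera viewpoints (and due to vehicle motion) should be negligible."""
    return depth_map >= min_depth_m


def masked_feature_vector(reduced_image, mask):
    """Flatten only the pixels outside parallax-prone (near) regions."""
    return reduced_image[mask].astype(np.float32).ravel()


# mask = parallax_mask(depth_map)                  # one mask shared by the pair
# vec_a = masked_feature_vector(reduced_952, mask)
# vec_b = masked_feature_vector(reduced_962, mask)
```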

While the above-described techniques may work well in daytime situations where ambient lighting is good, in darker or nighttime environments the similarity scores and SSIM may become unreliable. In this regard, at certain times of day corresponding to night or nighttime hours, rather than matching all features in the two reduced images, only bright spots or high-intensity areas may be compared. For example, because light sources such as traffic lights and tail lights typically have a known, constant brightness due to legal requirements, such lights should be visible in the reduced images. That said, this bright-spot method does not rely on legal requirements for the lights, nor is it only applicable to vehicle lights. Rather, this approach relies on the fact that if a bright spot is visible in a camera with an ND filter (which blocks most of the light), then the same bright spot should also be visible in the other camera (which receives more light).

For example, referring to images 1000 and 1100 of figs. 10 and 11, a first image 1000 may be captured with an ND filter for a first exposure time using a first camera (such as camera 310), while a second image 1100 may be captured without an ND filter for a second exposure time using a second camera (such as camera 320). The first exposure time and the second exposure time may be the same or different; for example, the second exposure time may be shorter than the first exposure time. For ease of understanding, images 1000 and 1100 correspond to images 500 and 600, respectively, although taken during nighttime hours. Thus, both scenes include the traffic light 510 and the vehicle 520, although these are only faintly visible in the image 1100 and not visible in the image 1000 due to the use of the ND filter and the exposure time. In this example, although the images appear dark, the bright spot 1010 of the traffic light 510 and the bright spots 1020, 1022 of the tail lights of the vehicle 520 can be seen in both images 1000 and 1100.

Further, before generating the feature vectors of these images, the images may be cropped and reduced as described above. In addition, the feature vectors generated for these reduced images can be simplified. For example, feature vectors for images 1000 and 1100 may be generated to describe only the characteristics of features corresponding to bright spots, such as shape, location, and size. In other words, the feature vectors may include only data for features corresponding to the light-emitting objects in the reduced images. Thus, if there are very few light sources in an area, this process may be less effective. However, since the exposure parameters (shutter speed, analog gain, ND filter, etc.) of the images are known, the images can also be corrected to account for a wide range of exposure parameters. For example, the SSIM method can handle up to 6 stops (a factor of 64) of difference relatively well.
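
A minimal sketch of extracting bright-spot features for the night-time comparison is shown below; the intensity threshold is illustrative and would in practice depend on each camera's exposure parameters (shutter speed, gain, ND filter).

```python
import cv2
import numpy as np


def bright_spot_features(reduced_image, intensity_threshold=200):
    """Return one (center_x, center_y, area, aspect_ratio) row per bright spot."""
    gray = cv2.cvtColor(reduced_image, cv2.COLOR_BGR2GRAY)  # assumes a BGR color image
    _, binary = cv2.threshold(gray, intensity_threshold, 255, cv2.THRESH_BINARY)
    count, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
    features = []
    for i in range(1, count):  # label 0 is the background
        x, y, w, h, area = stats[i]
        features.append((centroids[i][0], centroids[i][1], area, w / max(h, 1)))
    return np.array(features, dtype=np.float32)
```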

The similarity scores for these vectors may again be determined using any of the examples described above, including SSIM. Of course, the features describing the locations of the bright spots would be the most important characteristics for comparison in this case. Further, the similarity score may be compared to a threshold and/or tracked using CUSUM to identify whether one of the cameras has a problem. Thereafter, an appropriate response may be taken as described above.

Fig. 12 is an example flow diagram 1200 of operations for evaluating two or more cameras according to some aspects described herein and as may be performed by one or more processors of one or more computing devices, such as processor 120 of computing device 110. In this example, at block 1210, a first image captured by a first camera is received. At block 1220, a second image captured by a second camera is received. The first camera and the second camera have overlapping fields of view. At block 1230, a first feature vector of the first image and a second feature vector of the second image may be generated. At block 1240, a similarity score may be determined using the first feature vector and the second feature vector. At block 1250, the similarity score may be used to evaluate the operation of the two cameras.

The features described herein allow for reliable camera evaluation under various lighting conditions. As described above, it is very difficult to determine whether the camera is properly "seeing" the world, or whether there is some foreign object debris, condensation, non-functioning pixels on the lens, etc. This is especially important in the case of automated vehicles that rely on such cameras to make driving decisions.

Unless otherwise specified, the foregoing alternative examples are not mutually exclusive and may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. Furthermore, the provision of examples described herein, and clauses phrased as "such as," "including," and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, these examples are intended to illustrate only one of many possible embodiments. Further, the same reference numbers in different drawings may identify the same or similar elements.
