CNN classification of multi-frame semantic signals

Document No. 1804204. Publication date: 2021-11-05.

Reading note: This technology, CNN classification of multi-frame semantic signals, was designed and created by E. 马拉赫, Y. 香比克, J. 本托利拉, and I. 盖勒 on 2019-11-15. Abstract: The present subject matter provides various technical solutions to the technical problems faced by Advanced Driving Assistance Systems (ADAS) and Autonomous Vehicle (AV) systems. In particular, the disclosed embodiments provide systems and methods that can use cameras and other sensors to detect objects and events and identify them as predetermined signal classifiers, such as detecting and identifying red stop lights. These signal classifiers are used within the ADAS and AV systems to control the vehicle or alert the vehicle operator based on the type of signal. These ADAS and AV systems can provide complete vehicle operation without human input. Embodiments disclosed herein provide systems and methods that may be used as part of or in conjunction with ADAS and AV systems.

1. A system for fast CNN classification of a multi-frame semantic signal, the system comprising:

processing circuitry; and

one or more storage devices comprising instructions that, when executed by the processing circuitry, configure the processing circuitry to:

receive a plurality of time series images from an image capture device;

convert the plurality of time series images into a plurality of vectors stored in a time series buffer;

generate a temporal image based on the plurality of vectors; and

generate a semantic signal based on applying a convolutional neural network to the temporal image.

2. The system of claim 1, wherein each of the plurality of vectors comprises a row vector having a same width as each of the plurality of time series images.

3. The system of claim 2, wherein to convert the plurality of time series images to the plurality of vectors, the processing circuitry is configured to calculate a column value for each of a plurality of columns within each of the plurality of time series images.

4. The system of claim 2, wherein calculating the column values comprises: calculating at least one of an average, a median, or a maximum of each of a plurality of columns within each of the plurality of time series images.

5. The system of claim 1, wherein the generation of the temporal image comprises: concatenating the plurality of vectors to form the temporal image.

6. The system of claim 1, wherein to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to obtain each of the plurality of vectors from a respective plurality of images using a classifier.

7. The system of claim 1, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a flicker classifier.

8. The system of claim 1, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a braking classifier.

9. The system of claim 1, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a brake classifier on a pair of vectors of the plurality of vectors and a flicker classifier on the entire temporal image.

10. The system of claim 1, wherein the brake classifier is trained for a plurality of brake signals.

11. The system of claim 1, wherein the plurality of brake signals includes at least one of a brake on signal, a brake off signal, a brake up signal, and a brake down signal.

12. The system of claim 1, wherein the flicker classifier is trained for a plurality of flicker signals.

13. The system of claim 1, wherein the plurality of flicker signals includes at least one of a right blink on signal, a right blink off signal, a left blink on signal, and a left blink off signal.

14. The system of claim 1, wherein:

the image capture device is mounted on a vehicle;

the semantic signal indicates a changed path condition of the vehicle; and

the instructions further configure the processing circuitry to:

identify a maneuver for the vehicle in response to the changed path condition; and

transmit a vehicle control signal to perform the maneuver.

15. The system of claim 1, further comprising a vehicle control device to receive the control signal and perform a vehicle maneuver.

16. An autonomous navigation semantic signal method, comprising:

receiving a plurality of time series images from an image capture device, each of the plurality of time series images associated with a unique image capture time;

mapping each of the plurality of time series images to each of a plurality of vectors;

converting the plurality of vectors into a temporal image; and

identifying a semantic signal based on applying a convolutional neural network to the temporal image.

17. The method of claim 16, further comprising:

capturing the plurality of time series images; and

associating the unique image capture time with each of the captured plurality of time series images.

18. The method of claim 16, wherein each of the plurality of vectors comprises a row vector having a same width as each of the plurality of time series images.

19. The method of claim 18, wherein mapping each of the plurality of time series images to each of a plurality of vectors comprises: calculating a column value for each of a plurality of columns within each of the plurality of time series images.

20. The method of claim 18, wherein calculating the column values comprises: calculating at least one of an average, a median, or a maximum of each of a plurality of columns within each of the plurality of time series images.

21. The method of claim 16, wherein the generation of the temporal image comprises: concatenating the plurality of vectors to form the temporal image.

22. The method of claim 16, wherein, to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to obtain each of the plurality of vectors from the respective plurality of images using a classifier.

23. The method of claim 16, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, a processing circuit is configured to use a flicker classifier.

24. The method of claim 16, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, a processing circuit is configured to use a braking classifier.

25. The method of claim 16, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, processing circuitry is configured to use a brake classifier on a pair of vectors of the plurality of vectors and a flicker classifier on the entire temporal image.

26. The method of claim 16, wherein a brake classifier is trained for a plurality of brake signals.

27. The method of claim 16, wherein the plurality of brake signals includes at least one of a brake on signal, a brake off signal, a brake up signal, and a brake down signal.

28. The method of claim 16, wherein a flicker classifier is trained for a plurality of flicker signals.

29. The method of claim 16, wherein the plurality of flicker signals comprises at least one of a right blink on signal, a right blink off signal, a left blink on signal, and a left blink off signal.

30. The method of claim 16, further comprising:

identifying a vehicle maneuver based on the semantic signal; and

transmitting a control signal to perform the vehicle maneuver to a vehicle control device.

31. One or more machine-readable media comprising instructions that, when executed by a computing system, cause the computing system to perform the method of any of claims 16-30.

32. An apparatus comprising means for performing the method of any of claims 16-30.

33. A computer program product storing instructions that, when executed by a computerized system, cause the computerized system to perform operations comprising:

receiving a plurality of time series images from an image capture device, each of the plurality of time series images associated with a unique image capture time;

mapping each of the plurality of time series images to each of a plurality of vectors;

converting the plurality of vectors into a temporal image; and

identifying a semantic signal based on applying a convolutional neural network to the temporal image.

34. The computer program product of claim 33, further comprising:

capturing the plurality of time series images; and

associating the unique image capture time with each of the captured plurality of time series images.

35. The computer program product of claim 33, wherein each of the plurality of vectors comprises a row vector having a same width as each of the plurality of time series images.

36. The computer program product of claim 35, wherein mapping each of the plurality of time series images to each of a plurality of vectors comprises: calculating a column value for each of a plurality of columns within each of the plurality of time series images.

37. The computer program product of claim 35, wherein calculating the column values comprises: calculating at least one of an average, a median, or a maximum of each of a plurality of columns within each of the plurality of time series images.

38. The computer program product of claim 33, wherein the generation of the temporal image comprises: concatenating the plurality of vectors to form the temporal image.

39. The computer program product of claim 33, wherein to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to obtain each of the plurality of vectors from the respective plurality of images using a classifier.

40. The computer program product of claim 33, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuitry is configured to use a flicker classifier.

41. The computer program product of claim 33, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuitry is configured to use a braking classifier.

42. The computer program product of claim 33, wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuitry is configured to use a brake classifier on a pair of vectors of the plurality of vectors and a flicker classifier on the entire temporal image.

43. The computer program product of claim 33, wherein a brake classifier is trained for a plurality of brake signals.

44. The computer program product of claim 33, wherein the plurality of brake signals includes at least one of a brake on signal, a brake off signal, a brake up signal, and a brake down signal.

45. The computer program product of claim 33, wherein a flicker classifier is trained for a plurality of flicker signals.

46. The computer program product of claim 33, wherein the plurality of flicker signals comprises at least one of a right blink on signal, a right blink off signal, a left blink on signal, and a left blink off signal.

47. The computer program product of claim 33, further comprising:

identifying a vehicle maneuver based on the semantic signal; and

transmitting a control signal to perform the vehicle maneuver to a vehicle control device.

48. A classification system comprising:

a memory with instructions that, when executed by a processing unit, cause the processing unit to implement a classification trainer comprising:

a backbone network for converting the plurality of images into feature vectors;

a braking network for generating a probability for each of a plurality of braking signals; and

a flicker network for generating a probability for each of the plurality of flicker signals.

49. The system of claim 48, wherein the backbone network comprises a convolutional layer, a pooling layer, and a fully-connected layer.

50. The system of claim 48, wherein:

the plurality of images comprises a plurality of two-channel vehicle images; and

the backbone network converts the plurality of images into feature vectors of length sixty-four.

51. The system of claim 48, wherein:

classifying a set of two backbone features for the plurality of braking signals; and

classifying a set of sixteen backbone features for the plurality of braking signals and the plurality of flicker signals.

52. The system of claim 48, wherein:

classifying a set of two backbone features for a first subset of sixteen frames received after a vehicle is detected; and

classifying a set of sixteen backbone features when at least a complete set of sixteen frames has been received.

53. The system of claim 48, wherein the braking network comprises a single hidden fully connected layer and an output layer having four neurons.

54. The system of claim 48, wherein the braking network operates on two row vectors to generate a probability for each of the plurality of braking signals.

55. The system of claim 48, wherein the plurality of brake signals includes at least one of a brake on signal, a brake off signal, a brake up signal, and a brake down signal.

56. The system of claim 48, wherein the flicker network operates on sixteen row vectors to generate a probability for each of the plurality of flicker signals.

57. The system of claim 48, wherein the sixteen row vectors comprise a reshaped set of sixteen backbone features, the reshaped set of sixteen backbone features comprising one horizontal vector of length sixteen and four channels.

58. The system of claim 48, wherein the plurality of flicker signals comprises at least one of a right blink on signal, a right blink off signal, a left blink on signal, and a left blink off signal.

59. A method of classification, comprising:

training a classification trainer, the classification trainer comprising:

a backbone network for converting the plurality of images into feature vectors;

a braking network for generating a probability for each of a plurality of braking signals; and

a flicker network for generating a probability for each of the plurality of flicker signals.

60. The method of claim 59, wherein the backbone network comprises convolutional layers, pooling layers, and fully-connected layers.

61. The method of claim 59, wherein:

the plurality of images comprises a plurality of two-channel vehicle images; and

the backbone network converts the plurality of images into feature vectors of length sixty-four.

62. The method of claim 59, wherein:

classifying a set of two backbone features for the plurality of braking signals; and

classifying a set of sixteen backbone features for the plurality of braking signals and the plurality of flicker signals.

63. The method of claim 59, wherein:

classifying a set of two backbone features for a first subset of sixteen frames received after a vehicle is detected; and

classifying a set of sixteen backbone features when at least a complete set of sixteen frames has been received.

64. The method of claim 59, wherein the braking network comprises a single hidden fully connected layer and an output layer having four neurons.

65. The method of claim 59, wherein the braking network operates on two row vectors to generate a probability for each of the plurality of braking signals.

66. The method of claim 59, wherein the plurality of brake signals includes at least one of a brake on signal, a brake off signal, a brake up signal, and a brake down signal.

67. The method of claim 59, wherein the flicker network operates on sixteen row vectors to produce a probability for each of the plurality of flicker signals.

68. The method of claim 59, wherein the sixteen row vectors comprise a reshaped set of sixteen backbone features, the reshaped set of sixteen backbone features comprising a horizontal vector of length sixteen and four channels.

69. The method of claim 59, wherein the plurality of flicker signals comprises at least one of a right blink on signal, a right blink off signal, a left blink on signal, and a left blink off signal.

70. A computer program product storing instructions that, when executed by a computerized system, cause the computerized system to perform operations comprising:

training a classification trainer, the classification trainer comprising:

a backbone network for converting the plurality of images into feature vectors;

a braking network for generating a probability for each of a plurality of braking signals; and

a flicker network for generating a probability for each of the plurality of flicker signals.

71. The computer program product of claim 70, wherein the backbone network comprises a convolutional layer, a pooling layer, and a fully-connected layer.

72. The computer program product of claim 70, wherein:

the plurality of images comprises a plurality of two-channel vehicle images; and

the backbone network converts the plurality of images into feature vectors of length sixty-four.

73. The computer program product of claim 70, wherein:

classifying a set of two backbone features for the plurality of braking signals; and

classifying a set of sixteen backbone features for the plurality of braking signals and the plurality of flicker signals.

74. The computer program product of claim 70, wherein:

classifying a set of two backbone features for a first subset of sixteen frames received after a vehicle is detected; and

classifying a set of sixteen backbone features when at least a complete set of sixteen frames has been received.

75. The computer program product of claim 70, wherein the braking network comprises a single hidden fully connected layer and an output layer having four neurons.

76. The computer program product of claim 70, wherein the braking network operates on two row vectors to generate a probability for each of the plurality of braking signals.

77. The computer program product of claim 70, wherein the plurality of brake signals includes at least one of a brake on signal, a brake off signal, a brake up signal, and a brake down signal.

78. The computer program product of claim 70, wherein the flicker network operates on sixteen row vectors to generate a probability for each of the plurality of flicker signals.

79. The computer program product of claim 70, wherein the sixteen row vectors comprise a reshaped set of sixteen backbone features, the reshaped set of sixteen backbone features comprising one horizontal vector of length sixteen and four channels.

80. The computer program product of claim 70, wherein the plurality of flicker signals comprises at least one of a right blink on signal, a right blink off signal, a left blink on signal, and a left blink off signal.

81. One or more machine-readable media comprising instructions that, when executed by a machine, cause the machine to perform operations of any one of claims 1 to 80.

82. An apparatus comprising means for performing the operations of any one of claims 1 to 80.

83. A system to perform the operations of any of claims 1 to 80.

84. A method to perform the operations of any of claims 1 to 80.

Background

Advanced Driving Assistance Systems (ADAS) and Autonomous Vehicle (AV) systems use cameras and other sensors to provide partially or fully autonomous vehicle navigation. The cameras and sensors provide input to the ADAS or AV system that is used to identify other vehicles, lanes, or other navigation environment features. As ADAS and AV systems move toward fully autonomous operation, it would be beneficial to improve the recognition and classification of computer visual inputs.

Disclosure of Invention

The disclosed embodiments provide systems and methods that may be used as part of or in conjunction with ADAS and AV systems. These ADAS and AV systems may use cameras and other sensors to detect objects and events and identify them as predetermined signal classifiers, such as detecting and identifying red stop lights. These signal classifiers are used within the ADAS and AV systems to control the vehicle or alert the vehicle operator based on the type of signal. These ADAS and AV systems can provide complete vehicle operation without human input. ADAS technology may include any suitable technology to assist a driver in navigating or controlling their vehicle, such as Forward Collision Warning (FCW), Lane Departure Warning (LDW), Traffic Signal Recognition (TSR), or other partially autonomous driving assistance technology.

A human vehicle operator reacts to similar inputs, such as visually recognizing a red stop light and applying the brakes to stop the vehicle. However, human vehicle operators rely on subjective judgment to identify the various lights and manipulate the vehicle controls. In contrast, the present disclosure provides systems and methods that apply a set of rules defined by a trained system, trained using a machine learning algorithm such as a Convolutional Neural Network (CNN), to identify signal classifiers based on inputs from cameras and other sensors. This technical solution enables automation of specific vehicle operation tasks that could not previously be automated. In some embodiments, the systems and methods of the present disclosure may be used to alert a vehicle driver (e.g., vehicle operator), such as to improve the safety or efficiency of vehicle operation.

The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the following description to refer to the same or like parts. While several illustrative embodiments have been described herein, modifications, adaptations, and other implementations are possible. For example, substitutions, additions or modifications may be made to the components illustrated in the drawings, and the illustrative methods described herein may be modified by substituting, reordering, deleting or adding steps to the disclosed methods. Therefore, the following detailed description may not be limited to the disclosed embodiments and examples.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various disclosed embodiments. In the drawings:

FIG. 1 is a block diagram representation of a method according to an exemplary embodiment;

FIG. 2 is a block diagram representation of an image map in accordance with an exemplary embodiment;

FIG. 3 is a block diagram representation of an image map in accordance with an exemplary embodiment;

FIG. 4 is a block diagram representation of an image map according to an exemplary embodiment;

FIG. 5 is a block diagram representation of an image map according to an exemplary embodiment;

FIG. 6 is a block diagram representation of a shared image map according to an exemplary embodiment; and

FIG. 7 is a block diagram representation of a system according to an exemplary embodiment.

Detailed Description

The system may be arranged to process images of the environment ahead of a vehicle navigating a road to train a neural network or deep learning algorithm, such as a Convolutional Neural Network (CNN), to detect and classify multi-frame signals (e.g., multi-frame semantic signals). An exemplary multi-frame semantic signal includes a Vehicle Light Indicator (VLI), in which case the task is to identify the status of a vehicle based on the vehicle's light indicators (e.g., flashing lights, brake lights, hazard lights). The multi-frame semantic signals may also include emergency vehicle lights (e.g., flashing lights on emergency vehicles), construction marker lights (e.g., for detour management), traffic light status classifications (e.g., green/yellow/red lights, flashing green/yellow/red, flashing arrows, etc.), or other time-varying visual signals. Semantic signals may be used to identify vehicle maneuvers, detect the presence of a particular vehicle (e.g., an emergency vehicle) in the environment of a host vehicle, identify the status or behavior of a road sign indicator such as a traffic light, or identify other nearby signals or vehicles. In examples of the present disclosure, a multi-frame semantic signal may relate to a signal generated from a plurality of frames captured over a period of time from one or more sensors onboard the host vehicle. In various embodiments, the plurality of frames are used to create a signal signature, and the signal signature is processed to characterize the signal. The results of such processing may be used to generate a vehicle control signal in response to the signal, such as to notify a vehicle operator or to generate a vehicle brake control signal. In some embodiments, a vehicle control system may be used to receive vehicle control signals and perform identified vehicle maneuvers or issue appropriate alerts.

However, it is to be understood that embodiments of the present disclosure are not limited to situations in which the semantic signal is produced by a light. Semantic signal recognition may be associated with various other situations, may result from other types of image data, and may also result from non-image-based data (such as audible information). In some embodiments, the multi-frame semantic signal may also include detection of an audible signal, such as a siren (e.g., an emergency vehicle siren).

Systems and methods described herein include applying a Convolutional Neural Network (CNN) to provide detection and classification of multi-frame semantic signals to determine signal classifiers based on input from cameras and other sensors. The input may be analyzed or matched to predetermined signal characteristics, such as by matching to a database of signal characteristics. The input may be used to identify, analyze, or predict an event, a sequence of events, an object, a behavior of an object (e.g., a driving pattern based on a sequence of motion of an object), or other object or event characteristics.

This application of a CNN is based on Artificial Intelligence (AI) analysis of the input. As used herein, AI analysis is a field concerned with developing decision-making systems to perform cognitive tasks that have traditionally required a live actor, such as a human. A CNN is a type of Artificial Neural Network (ANN) algorithm, where an ANN is a computational structure that loosely models biological neurons. In general, an ANN encodes information (e.g., data or decision making) via weighted connections (e.g., synapses) between nodes (e.g., neurons). Modern ANNs are the basis for many AI applications, such as automated perception (e.g., computer vision, speech recognition, context awareness, etc.), automated cognition (e.g., decision making, logistics, routing, supply chain optimization, etc.), automated control (e.g., autonomous cars, drones, robots, etc.), and so forth.

Many ANNs are represented (e.g., implemented) as matrices of weights corresponding to the modeled connections. An ANN operates by accepting data into a set of input neurons, which typically have many outbound connections to other neurons. At each traversal between neurons, the corresponding weight modifies the input and is tested against a threshold at the destination neuron. If the weighted value exceeds the threshold, the value is again weighted, or transformed by a non-linear function, and passed to another neuron further down the ANN graph; if the threshold is not exceeded, the value is typically not passed to the down-graph neurons and the synaptic connection remains inactive. The process of weighting and testing continues until an output neuron is reached; the pattern and values of the output neurons constitute the result of the ANN processing.
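As a rough illustration of the forward pass just described, the following NumPy sketch weights inputs, tests them against a threshold, applies a nonlinearity, and passes the result onward; the layer sizes, tanh nonlinearity, and zero thresholds are illustrative assumptions, not details taken from this disclosure.

```python
import numpy as np

def forward(x, weights, thresholds):
    """Minimal ANN forward pass: weight the inputs, test against a threshold,
    apply a nonlinear transform, and pass the result to the next layer."""
    for W, t in zip(weights, thresholds):
        z = x @ W                      # weighted connections (synapses)
        z = np.where(z > t, z, 0.0)    # values below the threshold are not passed on
        x = np.tanh(z)                 # nonlinear transform before the next layer
    return x                           # pattern/values of the output neurons

# Illustrative 3-4-2 network with random weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
thresholds = [0.0, 0.0]
print(forward(np.array([0.2, -0.5, 1.0]), weights, thresholds))
```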

The correct operation of most ANNs relies on correct weights. However, an ANN designer may not know which weights will be appropriate for a given application. An ANN designer typically chooses a number of neuron layers or specific connections between layers (including circular connections), but may not know the appropriate weights; instead, a training process is used to arrive at appropriate weights. The need to determine the correct synaptic weights, however, is common to most ANNs. The training process proceeds by selecting initial weights, which may be randomly selected. Training data is fed into the ANN, and the results are compared against an objective function that provides an indication of error. The error indication is a measure of how wrong the ANN's result is compared to the expected result. This error is then used to correct the weights. Over many iterations, the weights collectively converge to encode the operational data into the ANN. This process may be referred to as optimization of an objective function (e.g., a cost or loss function), whereby the cost or loss is minimized.

Backpropagation is a technique whereby training data is fed forward through the ANN (here, "forward" means that the data starts at the input neurons and follows the directed graph of neuron connections until the output neurons are reached) and the objective function is applied backwards through the ANN to correct the synaptic weights. At each step in the backpropagation process, the result of the previous step is used to correct the weights. Thus, the result of the output neuron correction is applied to neurons that connect to the output neurons, and so on, until the input neurons are reached. Backpropagation has become a popular technique for training a variety of ANNs.
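A minimal single-layer training loop illustrating the feed-forward, error-measurement, and weight-correction cycle described above is sketched below; the logistic unit, squared-error-style gradient, learning rate, and toy data are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                 # training data fed into the input neurons
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # expected results
W = rng.normal(size=(4,)) * 0.1               # initial weights, randomly selected
lr = 0.1

for epoch in range(200):
    pred = 1.0 / (1.0 + np.exp(-(X @ W)))     # feed data forward through the network
    err = pred - y                            # error indication vs. the expected result
    grad = X.T @ (err * pred * (1 - pred)) / len(X)  # objective applied back through the net
    W -= lr * grad                            # correct the weights
# After many iterations, W encodes the toy decision rule.
```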

Semantic signal detection and classification described herein may be based on applying a Deep Neural Network (DNN) to classify a sequence of multiple frames of images. DNN architectures for processing ordered (sequential) data include variants of Recurrent Neural Networks (RNNs) and 3-D CNNs. While some believe RNNs perform well in Natural Language Processing (NLP) tasks, some believe RNNs are less effective at capturing the spatial structure of an image, and RNNs are therefore generally not widely used for image sequences. Furthermore, there are variants of RNNs that use convolution; however, they are generally not widely used, as they involve implementing complex architectures and often provide poor results.

The use of 3-D CNNs addresses some of the drawbacks of RNNs. For example, 3-D CNNs provide a simple, straightforward architecture for processing ordered image data and generally give superior performance relative to RNNs. However, the computational cost of full 3-D convolution is high, which makes full 3-D convolution unfavorable for real-time applications such as autonomous driving. For example, full 3-D convolution typically involves saving a long sequence of images, which requires considerable memory space and significantly increases computational cost due to the processing of 3-D data (e.g., multi-dimensional matrix data).

In contrast to using full 3-D convolution, the present solution processes the ordered images to generate a temporal image while preserving the spatio-temporal structure of the overall image sequence. By preserving the spatio-temporal structure, the present solution enjoys the advantages of a full 3-D convolution with significantly reduced memory space requirements and significantly reduced computational cost.
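A back-of-the-envelope comparison of the buffered data sizes illustrates the saving; the frame count, resolution, and channel count below are assumptions chosen only to make the arithmetic concrete.

```python
# Illustrative memory comparison (numbers are assumptions, not from the disclosure).
T, H, W, C = 16, 64, 64, 2          # frames, height, width, channels
full_3d_input = T * H * W * C       # pixels buffered for a full 3-D convolution
temporal_image = T * W * C          # pixels buffered when each frame is reduced to a row vector
print(full_3d_input, temporal_image, full_3d_input // temporal_image)  # 131072 2048 64x smaller
```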

Methods according to examples of the presently disclosed subject matter may be implemented in one or more of the various possible embodiments and configurations of the vehicle mountable system described herein. In some embodiments, various examples of the system may be installed in a vehicle and may be operated while the vehicle is in motion. In some embodiments, the system may implement a method according to examples of the presently disclosed subject matter.

The following description and the drawings sufficiently illustrate specific embodiments to enable those skilled in the art to understand the specific embodiments. Other embodiments may incorporate structural, logical, electrical, process, and other changes. Portions and features of various embodiments may be included in, or substituted for, those of other embodiments. Embodiments set forth in the claims encompass all available equivalents of those claims.

FIG. 1 is a block diagram representation of a method 100 according to an exemplary embodiment. The method 100 includes receiving a plurality of input images 110. In various examples, the images 110 may include a sequence of images of a vehicle stop light as shown in FIG. 2, or may include a sequence of images of a vehicle turn signal as shown in FIG. 3.

The method 100 includes mapping each of the plurality of images 110 to a corresponding plurality of vectors 120. The plurality of images 110 are 2-D representations of the environment (e.g., a 3-D environment) of the host vehicle within a field of view (FOV) of a camera used to capture the images. For example, the plurality of 2-D images 110 may be created using a sensor that includes a 2-D array of image pixels and additional image capture circuitry.

The mapping of the plurality of images 110 to the corresponding plurality of vectors 120 may be performed such that the vectors 120 maintain the spatial structure of the original image 110 along one of its axes. Thus, each of the plurality of vectors 120 provides a 1-D representation created from a corresponding source 2-D image. In the example shown in fig. 1, the vectors 120 may include row vectors, where each row vector 120 has the same width "W" as each corresponding image 110. The mapping of the image 110 to the vector 120 may include operations that computer vision hardware can efficiently perform, such as computing an average, median, or maximum value along each column of the image.
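A minimal sketch of this image-to-vector mapping, assuming NumPy and a single-channel image, might look like the following; the 80 × 80 crop size is illustrative.

```python
import numpy as np

def image_to_row_vector(image, op="max"):
    """Map an H x W image to a W-wide row vector by reducing each column,
    preserving the horizontal spatial structure of the original image."""
    reduce = {"max": np.max, "mean": np.mean, "median": np.median}[op]
    return reduce(image, axis=0)   # collapse the height axis

frame = np.random.randint(0, 256, size=(80, 80), dtype=np.uint8)
row = image_to_row_vector(frame, op="max")   # shape (80,), same width as the frame
```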

The method 100 includes concatenating the vectors 120 into a new temporal image 130. This provides a 2-D temporal image 130 that is made up of all of the plurality of vectors 120 or a representative number (e.g., a statistically representative sampling) of the plurality of vectors 120. The temporal image 130 may have the same width "W" as the vectors 120 and the images 110, and a height "T" equal to the number of source images. The method 100 includes feeding the temporal image 130 into a Convolutional Neural Network (CNN) 140 to identify a semantic signal 150.
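The following sketch shows one way the concatenation and classification steps could fit together, assuming a column-wise max reduction, a 20 × 80 temporal image, and a toy PyTorch CNN with four output classes; none of these layer choices are specified by the disclosure.

```python
import numpy as np
import torch
import torch.nn as nn

T, W = 20, 80
frames = [np.random.randint(0, 256, size=(80, W)) for _ in range(T)]
temporal_image = np.stack([f.max(axis=0) for f in frames])   # shape (T, W)

# Tiny 2-D CNN over the temporal image (architecture is illustrative only).
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * (T // 2) * (W // 2), 4),   # e.g., four semantic-signal classes
)
x = torch.tensor(temporal_image, dtype=torch.float32).unsqueeze(0).unsqueeze(0)  # (N=1, C=1, T, W)
logits = cnn(x)   # one score per assumed signal class
```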

The temporal image 130 fed into the CNN 140 maintains the spatio-temporal structure of the overall image sequence 110, because the rows of the temporal image 130 maintain the spatial structure of the original images. By maintaining the spatio-temporal structure, the use of the temporal image 130 in the CNN 140 provides the advantages of using a full 3-D convolution with significantly reduced memory space requirements and significantly reduced computational cost. In one example, by maintaining the spatio-temporal structure, the present solution provides memory and computational costs similar to those of a 2-D CNN.

FIG. 2 is a block diagram representation of an image map 200 according to an exemplary embodiment. The image map 200 may be based on a received ordered set of images 210 and 220. As shown in FIG. 2, images 210 and 220 collectively constitute twenty sequential images of three stop lights on a vehicle going from on to off and back on. Each of the twenty ordered images 210 and 220 may be mapped to a corresponding plurality of row vectors using a column-wise maximum operator and concatenated into a temporal image 230. For example, the top row in the temporal image 230 is a row vector representing the output of the column-wise max operator applied to the first ordered image, and the following rows represent the progression over time of the twenty ordered images 210 and 220. In the embodiment shown in FIG. 2, each of the twenty ordered images 210 and 220 is eighty pixels in height and eighty pixels in width, each row vector is one pixel in height and eighty pixels in width, and the temporal image 230 is twenty pixels in height (one pixel for each row vector) and eighty pixels in width. The temporal image 230 may include a plurality of temporal sub-regions, such as a first region 240 where the stop lamps are on, a second region 250 where the stop lamps are off, and a third region 260 where the stop lamps are turned back on. The temporal image 230 is then fed into the CNN for signal classification, e.g., to identify when the vehicle applies, releases, and reapplies the vehicle brakes.

FIG. 3 is a block diagram representation of an image map 300 according to an exemplary embodiment. The image map 300 may be based on a received ordered set of images 310 and 320. As shown in FIG. 3, images 310 and 320 collectively comprise twenty sequential images of a flashing right turn signal. Each of the twenty ordered images 310 and 320 can be mapped to a corresponding plurality of row vectors using the column-wise maximum operator and concatenated into a temporal image 330. For example, the top row in the temporal image 330 is a row vector representing the output of the column-wise max operator applied to the first ordered image, and the following rows represent the progression over time of the twenty ordered images 310 and 320. In the embodiment shown in FIG. 3, each of the twenty ordered images 310 and 320 is eighty pixels in height and eighty pixels in width, each row vector is one pixel in height and eighty pixels in width, and the temporal image 330 is twenty pixels in height (one pixel for each row vector) and eighty pixels in width. The temporal image 330 may include a plurality of temporal sub-regions, such as turn-signal-on regions 340, 350, and 360, and turn-signal-off regions 370, 380, and 390. The temporal image 330 is then fed into the CNN for signal classification, such as to identify when the vehicle is signaling a turn.

FIG. 4 is a block diagram representation of an image map 400 according to an exemplary embodiment. The image map 400 may be based on a received ordered set of captured traffic light images, shown herein as traffic light maps 410. As shown in FIG. 4, the sequence of traffic light maps 410 collectively represents an ordered set of images of the traffic signal.

Each of the sequentially captured traffic light images may be mapped to a corresponding plurality of column vectors using a row-wise maximum operator and concatenated into a temporal image 420. For example, the leftmost column in the temporal image 420 is a column vector representing the output of the row-wise max operator applied to the first sequential image, and the following columns represent the progression over time of the sequentially captured traffic light images. While only eight ordered graphs 410 are shown to illustrate the change in the traffic signal, a total of two hundred sixty sequentially captured traffic light images may be used to generate the temporal image 420. The size of the generated temporal image 420 may be based on the resolution of the captured traffic light images. In the embodiment shown in FIG. 4, each ordered graph 410 represents an image eighty pixels in height and thirty pixels in width, each column vector is one pixel in width and eighty pixels in height, and the temporal image 420 is eighty pixels in height and two hundred sixty pixels in width (one pixel for each column vector). In some embodiments, the sampling frequency of the sequentially captured traffic light images may be selected to be representative of each state, e.g., representative of each light change. For example, if the timing between the red, red/yellow, and green states is known, fewer than 240 images may be captured to identify traffic light changes.

The temporal image 420 may include a plurality of temporal sub-regions, such as a region 430 where the red light is illuminated, a region 440 where the red and yellow lights are illuminated, and a region 450 where the green light is illuminated. While the ordered graphs 410 show a progression from red to red/yellow to green, other sequences may be detected, such as from green to yellow to red and back to green. Further, while the ordered graphs 410 show a vertical traffic light orientation, the traffic light and the sequentially captured traffic light images may be received or captured horizontally, and the ordered graphs 410 or the generated temporal image 420 may be rotated ninety degrees. The temporal image 420 is then fed into the CNN for signal classification, such as to identify the state or timing of the traffic signal.
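For vertically oriented signals such as the traffic light above, the reduction simply runs along the other axis; a minimal sketch, assuming NumPy and the 80 × 30 crop size mentioned above, is:

```python
import numpy as np

# Vertical traffic-light crops: reduce each row to a single value (row-wise max),
# giving one column vector per frame, then concatenate the columns left to right.
frames = [np.random.randint(0, 256, size=(80, 30)) for _ in range(260)]  # sizes are illustrative
columns = [f.max(axis=1, keepdims=True) for f in frames]   # each column vector is 80 x 1
temporal_image = np.concatenate(columns, axis=1)           # 80 x 260 temporal image
```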

FIG. 5 is a block diagram representation of an image map 500 according to an exemplary embodiment. The image map 500 may include a plurality of input images 510. The plurality of input images 510 may be converted into a plurality of feature vectors 530 by a classifier 520. The classifier 520 operates on a 64 × 64 pixel warp (e.g., red and grayscale channels) and stores a 64-entry output feature vector. The plurality of feature vectors 530 may be concatenated into a feature vector map that is stored in a circular buffer 540. At least 16 of the feature vectors 530 may be stored in the circular buffer 540. Data from the circular buffer 540 may be separated and used by a flicker classifier 550 or a brake classifier 560.

The flicker and brake classifiers 550, 560 may be built on a backbone network that converts each image into a 64-entry feature vector. The backbone is composed of convolutional and pooling layers followed by a fully connected layer. The backbone features may feed a brake classifier 560, which may output brake states of up (rise), down (fall), on, and off. The backbone features may feed a flicker classifier 550, which may output the state of a flicker or brake signal: braking (e.g., up, down, on, off) or flashing (e.g., right on/off; left on/off). The backbone features may feed a combination of the flicker classifier 550 and the brake classifier 560, which may be used to identify the combined state of the flicker and brake signals. The brake classifier may be used to process the two most recent feature vectors in the circular buffer 540 among the first sixteen feature vectors received after vehicle detection. In contrast, the flicker classifier 550 may be used to classify the last sixteen detected signals. The circular buffer 540 may be used to store a moving window of feature vectors 530 so that the most recent N feature vectors 530 may be used by the flicker and brake classifiers 550, 560.
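A minimal sketch of the buffering and routing described above might look like the following; brake_classifier and flicker_classifier are hypothetical callables standing in for the trained heads, and the 16-slot window and two-vector brake input follow the description above.

```python
from collections import deque
import numpy as np

BUFFER_LEN = 16
feature_buffer = deque(maxlen=BUFFER_LEN)   # circular buffer of per-frame feature vectors

def on_new_frame(feature_vector, brake_classifier, flicker_classifier):
    """Route buffered backbone features to the two heads (hypothetical helpers)."""
    feature_buffer.append(feature_vector)              # 64-entry vector for this frame
    outputs = {}
    if len(feature_buffer) >= 2:                       # e.g., early frames after vehicle detection
        pair = np.stack(list(feature_buffer)[-2:])     # two most recent vectors -> brake head
        outputs["brake"] = brake_classifier(pair)
    if len(feature_buffer) == BUFFER_LEN:
        window = np.stack(list(feature_buffer))        # full 16-vector window -> flicker head
        outputs["flicker"] = flicker_classifier(window)
    return outputs
```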

After training, the classification system may include at least three neural networks: a backbone network, a braking network, and a flicker network. The backbone network may receive a set of pixels for each image. For example, the set of pixels may include a 64 × 64 pixel image with an effective axis of 50 pixels. This may include a plurality of vehicle images with at least two channels (e.g., red, gray). Analysis by the backbone network may produce backbone vectors, for example, backbone vectors 64 entries long. The braking network may include a single hidden fully connected layer and an output layer having four neurons. Such a network may include an associated Maffe cost of approximately 3.6k cycles.
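A rough PyTorch sketch of such a backbone (convolution, pooling, and a fully connected layer producing a 64-entry vector) and of a braking head with a single hidden fully connected layer and four output neurons is shown below; the filter counts and hidden-layer width are assumptions, not values from the disclosure.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Converts a 2-channel 64x64 crop into a 64-entry feature vector."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16 x 32 x 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 32 x 16 x 16
        )
        self.fc = nn.Linear(32 * 16 * 16, 64)
    def forward(self, x):                     # x: (N, 2, 64, 64)
        return self.fc(self.features(x).flatten(1))

class BrakeNet(nn.Module):
    """One hidden fully connected layer and a 4-neuron output (e.g., on/off/up/down)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * 64, 32), nn.ReLU(), nn.Linear(32, 4))
    def forward(self, pair):                  # pair: (N, 2, 64), the two most recent vectors
        return self.net(pair.flatten(1))
```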

In an example, the backbone network may include a Maffe cost of approximately 342k cycles. This can be improved (e.g., reduced) with additional pruning. The brake classifier 560 may operate on the two most recent row vectors to generate probabilities, such as for identifying brake off, brake on, brake up, or brake down.

The flicker classifier 550 may operate on the 16 row vectors to generate various probabilities. For example, each backbone feature vector may be reshaped into a 1 × 16 horizontal vector with 4 channels. The flicker classifier may generate probabilities for one or more classification outputs, such as a left flashing signal (e.g., on signal, off signal), a right flashing signal (e.g., on signal, off signal), a hazard signal (e.g., both turn signals flashing simultaneously), or a brake signal. In one example, the flicker network may include a Maffe cost of approximately 68k cycles.
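One plausible reading of the reshaping above is that each 64-entry feature vector becomes a 4-channel, 1 × 16 strip and the sixteen buffered vectors are stacked into a 4 × 16 × 16 tensor; the sketch below follows that reading, with the convolution size and class count as assumptions.

```python
import torch
import torch.nn as nn

class FlickerNet(nn.Module):
    """Operates on 16 buffered feature vectors; outputs per-class probabilities
    (e.g., left/right blinker on/off, hazard, brake)."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(16 * 8 * 8, num_classes)
    def forward(self, window):                            # window: (N, 16, 64)
        x = window.reshape(-1, 16, 4, 1, 16)              # each 64-entry vector -> 4 x 1 x 16
        x = x.permute(0, 2, 1, 3, 4).reshape(-1, 4, 16, 16)  # stack the 16 strips -> 4 x 16 x 16
        return torch.softmax(self.head(self.conv(x).flatten(1)), dim=1)

# Example call (shapes only): FlickerNet()(torch.randn(1, 16, 64))
```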

At each cycle, the backbone may operate on a warp, such as a 64 × 64 warp (e.g., red and grayscale channels). The backbone may store its sixty-four-entry output vector in a circular buffer that holds the last 16 results. In an example, the buffer may represent 1.78 seconds under the slow agenda. The last two vectors may be used as inputs to the brake classifier 560. In an example, all 16 vectors are inputs to the flicker classifier 550. In an example, the brake classifier 560 signals (e.g., up, down, on, and off) may be integrated over time using a hidden Markov model to generate a multi-frame brake signal.

FIG. 6 is a block diagram representation of a shared image map 600 according to an exemplary embodiment. The shared image map 600 may include training one or more classifiers, such as a brake classifier 650 or a flicker classifier 660. One or more classifiers may be trained on one or more sets of training warp data (e.g., warped training images). In training, at least sixteen input images may be input to sixteen backbone networks with shared weights. In an example, each of sixteen classifiers 620 with shared weights can take a respective input image 610 and classify it to output a feature vector 630 for that image 610.

The feature vector pairs 640 are used as inputs to train a brake classifier 650. In an embodiment, the feature vector pairs 640 are organized into first and second, second and third, third and fourth, and so on until the feature vectors 630 are exhausted. This results in training N-1 brake classifiers for N input feature vectors (e.g., 16 input feature vectors have 15 pairs).

The feature vectors 630 are used to train a flicker classifier 660. In an embodiment, the feature vectors 630 of all input images 610 are used to train the flicker classifier 660. Thus, in the example shown in fig. 6, a total of 16 feature vectors 630 may be used as inputs to the flicker classifier 660.
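A condensed training-step sketch along the lines of FIG. 6 is shown below, using stand-in networks with illustrative shapes; the losses, optimizer, and label format are assumptions, and the pairing of consecutive feature vectors follows the description above.

```python
import torch
import torch.nn as nn

# Stand-in networks (illustrative shapes only; see the architecture sketches above).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(2 * 64 * 64, 64))   # shared weights for all frames
brake_head = nn.Linear(2 * 64, 4)
flicker_head = nn.Linear(16 * 64, 6)
opt = torch.optim.SGD(list(backbone.parameters()) + list(brake_head.parameters())
                      + list(flicker_head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(frames, brake_labels, flicker_label):
    """frames: (16, 2, 64, 64); brake_labels: (15,); flicker_label: scalar class index."""
    feats = backbone(frames)                                  # 16 feature vectors, shared weights
    pairs = torch.stack([feats[i:i + 2].flatten() for i in range(15)])  # N-1 = 15 consecutive pairs
    loss = loss_fn(brake_head(pairs), brake_labels)           # brake classifier on each pair
    loss = loss + loss_fn(flicker_head(feats.flatten()).unsqueeze(0),   # flicker classifier on all 16
                          flicker_label.unsqueeze(0))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example call with random tensors (shapes only):
# training_step(torch.randn(16, 2, 64, 64), torch.randint(0, 4, (15,)), torch.tensor(3))
```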

FIG. 7 is a block diagram representation of a system 700 according to an exemplary embodiment. System 700 may include various components depending on the requirements of a particular implementation. In some examples, system 700 may include a processing unit 710, an image acquisition unit 720, and one or more memory units 740, 750. Processing unit 710 may include one or more processing devices. In some embodiments, processing unit 710 may include an application processor 780, an image processor 790, or any other suitable processing device. Likewise, image acquisition unit 720 may include any number of image capture devices and components, depending on the requirements of a particular application. In some embodiments, image acquisition unit 720 may include one or more image capture devices (e.g., cameras), such as image capture device 722, image capture device 724, and image capture device 726. In some embodiments, system 700 may also include a data interface 728 communicatively connecting processing unit 710 with image acquisition unit 720. For example, data interface 728 may include any wired and/or wireless link for communicating image data obtained by image acquisition unit 720 to processing unit 710.

Both the application processor 780 and the image processor 790 may include various types of processing devices. For example, either or both of the application processor 780 and the image processor 790 may include one or more microprocessors, pre-processors (such as image pre-processors), graphics processors, Central Processing Units (CPUs), support circuits, digital signal processors, integrated circuits, memories, or any other type of device suitable for running applications and for performing image processing and analysis. In some embodiments, the application processor 780 and/or the image processor 790 may include any type of single- or multi-core processor, mobile device microcontroller, central processing unit, or the like. Various processing devices may be used, obtained from various manufacturers, and may include various architectures (e.g., x86 processors and the like).

In some embodiments, the application processor 780 and/or the image processor 790 may include any of the EyeQ series of processor chips. These processor designs each include multiple processing units with local memory and instruction sets. Such processors may include video inputs for receiving image data from multiple image sensors, and may also include video output capabilities. Although FIG. 7 depicts two separate processing devices included in processing unit 710, more or fewer processing devices may be used. For example, in some examples, a single processing device may be used to accomplish the tasks of the application processor 780 and the image processor 790. In other embodiments, these tasks may be performed by more than two processing devices.

The processing unit 710 may include various types of devices. For example, the processing unit 710 may include various devices such as a controller, an image preprocessor, a Central Processing Unit (CPU), support circuits, a digital signal processor, an integrated circuit, a memory, or any other type of device for image processing and analysis. The image preprocessor may include a video processor for capturing, digitizing, and processing imagery from the image sensor. The CPU may include any number of microcontrollers or microprocessors. The support circuits may be any number of circuits commonly known in the art, including cache, power supplies, clocks, and input-output circuits. The memory may store software that, when executed by the processor, controls the operation of the system. The memory may include a database and image processing software including a trained system such as a neural network. The memory may include any number of random access memories, read only memories, flash memories, disk drives, optical storage, removable storage, and other types of storage. In one example, the memory may be separate from the processing unit 710. In another example, memory may be integrated into processing unit 710.

Each memory 740, 750 may include software instructions that, when executed by a processor (e.g., application processor 780 and/or image processor 790), may control the operation of various aspects of system 700. These memory units may include various databases and image processing software. The memory units may include random access memory, read only memory, flash memory, disk drives, optical storage, tape storage, removable storage, and/or any other type of storage. In some examples, the memory unit 740, 750 may be separate from the application processor 780 and/or the image processor 790. In other embodiments, these memory units may be integrated into the application processor 780 and/or the image processor 790.

In some embodiments, the system may include a position sensor 730. The location sensor 730 may comprise any type of device suitable for determining a location associated with at least one component of the system 700. In some embodiments, the location sensor 730 may include a GPS receiver. Such a receiver can determine the user's position and velocity by processing signals broadcast by global positioning system satellites. The location information from the location sensor 730 may be available to the application processor 780 and/or the image processor 790.

In some embodiments, the system 700 may be operatively connected to various systems, devices, and units onboard a vehicle in which the system 700 may be installed, and the system 700 may communicate with the systems of the vehicle through any suitable interface (e.g., a communication bus). Examples of vehicle systems with which system 700 may cooperate include: a throttle system, a brake system and a steering system.

In some embodiments, system 700 may include a user interface 770. User interface 770 may include any device suitable for providing information to, or receiving input from, one or more users of system 700, including, for example, a touch screen, microphone, keyboard, pointing device, track wheel, camera, knobs, buttons, and the like. Information may be provided to a user by system 700 through user interface 770.

In some embodiments, the system 700 may include a map database 760. Map database 760 may include any type of database for storing digital map data. In some examples, map database 760 may include data relating to the location of various items in a reference coordinate system, including roads, water features, geographic features, points of interest, and the like. Map database 760 may store not only the locations of these items, but also descriptors relating to these items, including, for example, names and other information about any stored features. For example, the location and type of known obstacles may be included in a database, information about the topography of the road or the slope of certain points along the road, and so forth. In some embodiments, map database 760 may be physically located with other components of system 700. Alternatively or additionally, map database 760, or portions thereof, may be remotely located with respect to other components of system 700 (e.g., processing unit 710). In such embodiments, information from map database 760 may be downloaded via a wired or wireless data connection to a network (e.g., via a cellular network and/or the internet, etc.).

Image capture devices 722, 724, and 726 may each include any type of device suitable for capturing at least one image from an environment. Further, any number of image capture devices may be used to acquire images for input to the image processor. Some examples of the presently disclosed subject matter may include or may be implemented with only a single image capture device, while other examples may include or may be implemented with two, three, or even four or more image capture devices.

It will be appreciated that the system 700 may include or may be operatively associated with other types of sensors, including, for example, acoustic sensors, RF sensors (e.g., radar transceivers), and LIDAR sensors. Such sensors may be used independently of, or in conjunction with, the image acquisition unit 720. For example, data from a radar system (not shown) may be used to verify the processed information received from processing images acquired by the image acquisition unit 720, e.g., to filter certain false positives resulting from processing images acquired by the image acquisition unit 720, or it may be combined with or otherwise supplement the image data from the image acquisition unit 720, or some processed variant or derivative of the image data from the image acquisition unit 720.

The system 700 or various components thereof may be incorporated into a variety of different platforms. In some embodiments, the system 700 may be included on a vehicle. For example, a vehicle may be equipped with the processing unit 710 and any other components, such as the system 700 described above with respect to fig. 7. While in some embodiments, the vehicle may be equipped with only a single image capture device (e.g., camera), in other embodiments multiple image capture devices may be used. For example, either of the image capture devices 722 and 724 of the vehicle may be part of an ADAS (advanced driving assistance system) imaging set.

The image capture device included on the vehicle as part of the image acquisition unit 720 may be located at any suitable location. In some embodiments, the image capture device 722 may be located in the vicinity of a rear view mirror. This position may provide a similar line of sight to the driver of the vehicle, which may help determine what the driver may and may not see. Other locations of the image capture device of the image acquisition unit 720 may also be used. For example, the image capture device 724 may be located on or in a bumper of the vehicle. Such a position may be particularly suitable for image capture devices with a wide field of view. The line of sight of the image capture device located at the bumper may be different from the line of sight of the driver. The image capture devices (e.g., image capture devices 722, 724, and 726) may also be located in other locations. For example, the image capture device may be located on or in one or both side-view mirrors of the vehicle, on the roof of the vehicle, on the hood of the vehicle, on the trunk of the vehicle, on the side of the vehicle, mounted on any window of the vehicle, behind or in front of any window, and in or near a lamp body in front of and/or behind the vehicle, etc. The image capturing unit 720 or an image capturing device that is one of a plurality of image capturing devices used in the image capturing unit 720 may have a field of view (FOV) different from the FOV of the driver of the vehicle and may not always see the same object. In one example, the FOV of the image acquisition unit 720 may extend outside of the FOV of a typical driver, and thus objects outside of the driver's FOV may be imaged. In another example, the FOV of the image acquisition unit 720 is some portion of the FOV of the driver. In some embodiments, the FOV of the image acquisition unit 720 corresponds to a sector that covers an area of the road in front of the vehicle, the surroundings of the road, or other areas.

In addition to the image capture device, the vehicle may include various other components of the system 700. For example, the processing unit 710 may be included on the vehicle, either integrated with an Engine Control Unit (ECU) of the vehicle, or separate therefrom. The vehicle may also be equipped with a position sensor 730, such as a GPS receiver, and may also include a map database 760 and memory units 740 and 750.

To better illustrate the methods and apparatus disclosed herein, a non-limiting list of embodiments is provided herein.

Example 1 is a system for fast CNN classification of a multi-frame semantic signal, the system comprising: a processing circuit; and one or more storage devices comprising instructions that, when executed by the processing circuitry, configure the processing circuitry to: receiving a plurality of time series images from an image capture device; converting the plurality of time series images into a plurality of vectors stored in a time series buffer; generating a temporal image based on the plurality of vectors; and generating a semantic signal based on applying a convolutional neural network to the temporal image.
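
For illustration only, a minimal Python sketch of the pipeline in Example 1, assuming a sixteen-frame buffer and hypothetical helper names (reduce_to_row_vector, build_temporal_image, cnn_model) that do not appear in the specification:

```python
import collections

import numpy as np

# Assumed buffer length; later examples use sixteen frames, but this value
# is only an illustration here.
BUFFER_LEN = 16

time_series_buffer = collections.deque(maxlen=BUFFER_LEN)


def reduce_to_row_vector(frame):
    """Collapse an (H x W) image crop into a length-W row vector, here by
    taking the maximum of each column (one of the reductions in Example 4)."""
    return frame.max(axis=0)


def build_temporal_image(buffer):
    """Stack the buffered row vectors into a (num_frames x W) temporal image."""
    return np.stack(list(buffer), axis=0)


def on_new_frame(frame, cnn_model):
    """Push a new time-series frame; once the buffer is full, classify the
    resulting temporal image with the supplied CNN (a placeholder callable)."""
    time_series_buffer.append(reduce_to_row_vector(frame))
    if len(time_series_buffer) == BUFFER_LEN:
        temporal_image = build_temporal_image(time_series_buffer)
        return cnn_model(temporal_image[None, None, ...])  # add batch and channel dims
    return None
```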

In example 2, the subject matter of example 1 optionally includes: wherein each of the plurality of vectors comprises a row vector having a same width as each of the plurality of time series images.

In example 3, the subject matter of example 2 optionally includes: wherein, to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to calculate a column value for each of a plurality of columns within each of the plurality of time series images.

In example 4, the subject matter of any one or more of examples 2-3 optionally includes: wherein calculating the column value comprises: at least one of an average, a median, or a maximum of each of a plurality of columns within each of the plurality of time series images is calculated.
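
As one way to read Examples 2-4, each frame can be collapsed into a row vector of the same width by reducing every column with an average, a median, or a maximum; the sketch below is illustrative, and the crop dimensions are assumed:

```python
import numpy as np

def column_reduce(frame, mode="max"):
    """Reduce an (H x W) crop to a length-W row vector by computing one
    value per column: the average, median, or maximum named in Example 4."""
    if mode == "mean":
        return frame.mean(axis=0)
    if mode == "median":
        return np.median(frame, axis=0)
    return frame.max(axis=0)

# A 24 x 36 crop becomes a length-36 row vector of the same width as the frame.
crop = np.random.rand(24, 36)
row_vector = column_reduce(crop, mode="max")
assert row_vector.shape == (36,)
```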

In example 5, the subject matter of any one or more of examples 1-4 optionally includes: wherein the generating of the temporal image comprises: concatenating the plurality of vectors to form the temporal image.

In example 6, the subject matter of any one or more of examples 1-5 optionally includes: wherein, to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to obtain each of the plurality of vectors from a respective plurality of images using a classifier.

In example 7, the subject matter of any one or more of examples 1-6 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a flicker classifier.

In example 8, the subject matter of any one or more of examples 1-7 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a brake classifier.

In example 9, the subject matter of any one or more of examples 1-8 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a brake classifier on a pair of vectors of the plurality of vectors and a flicker classifier on the entire temporal image.
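
One possible reading of Example 9 is that the brake classifier sees only the two most recent row vectors while the flicker classifier sees the full temporal image; a hedged sketch with placeholder classifier callables:

```python
def classify_signals(temporal_image, brake_classifier, flicker_classifier):
    """Run a brake classifier on the two most recent row vectors and a
    flicker classifier on the whole temporal image; both classifiers are
    placeholder callables supplied by the host system."""
    last_pair = temporal_image[-2:, :]                  # pair of newest row vectors
    brake_probs = brake_classifier(last_pair)           # brake-signal probabilities
    flicker_probs = flicker_classifier(temporal_image)  # blinker-signal probabilities
    return brake_probs, flicker_probs
```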

In example 10, the subject matter of any one or more of examples 1-9 optionally includes: wherein the brake classifier is trained for a plurality of brake signals.

In example 11, the subject matter of any one or more of examples 1-10 optionally includes: wherein the plurality of brake signals includes at least one of a brake-on signal, a brake-off signal, a brake-up signal, and a brake-down signal.

In example 12, the subject matter of any one or more of examples 1-11 optionally includes: wherein the flicker classifier is trained on a plurality of flicker signals.

In example 13, the subject matter of any one or more of examples 1-12 optionally includes: wherein the plurality of flicker signals includes at least one of a right-blinker-on signal, a right-blinker-off signal, a left-blinker-on signal, and a left-blinker-off signal.

In example 14, the subject matter of any one or more of examples 1-13 optionally includes: wherein: the image capturing apparatus is mounted on a vehicle; the semantic signal indicates a changed path condition of the vehicle; and the instructions further configure the processing circuitry to: identifying a maneuver for the vehicle in response to the changed path condition; and sending a vehicle control signal to perform the maneuver.
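
Example 14 links the semantic signal to a vehicle response. The mapping below is purely hypothetical (the signal names and maneuvers are not taken from the specification) and only illustrates the identify-then-send flow:

```python
# Hypothetical mapping from a decoded semantic signal to a maneuver request;
# the actual control policy is outside the scope of these examples.
MANEUVER_FOR_SIGNAL = {
    "lead_vehicle_brake_on": "increase_following_distance",
    "lead_vehicle_left_blinker_on": "prepare_for_lead_vehicle_lane_change",
    "lead_vehicle_right_blinker_on": "prepare_for_lead_vehicle_lane_change",
}

def handle_semantic_signal(signal, send_vehicle_control_signal):
    """Identify a maneuver for the changed path condition and forward a
    control signal to the vehicle control device (callback supplied by the
    host system)."""
    maneuver = MANEUVER_FOR_SIGNAL.get(signal)
    if maneuver is not None:
        send_vehicle_control_signal(maneuver)
```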

In example 15, the subject matter of any one or more of examples 1-14 optionally includes a vehicle control device to receive the control signal and perform the vehicle maneuver.

Example 16 is an autonomous navigation semantic signal method, comprising: receiving a plurality of time series images from an image capture device, each of the plurality of time series images associated with a unique image capture time; mapping each of the plurality of time series images to each of a plurality of vectors; converting the plurality of vectors into a temporal image; and identifying a semantic signal based on applying a convolutional neural network to the temporal image.

In example 17, the subject matter of example 16 optionally includes: capturing the plurality of time series images; and associating the unique image capture time with each of the captured plurality of time series images.

In example 18, the subject matter of any one or more of examples 16-17 optionally includes: wherein each of the plurality of vectors comprises a row vector having a same width as each of the plurality of time series images.

In example 19, the subject matter of example 18 optionally includes: wherein mapping each of the plurality of time series images to each of a plurality of vectors comprises: a column value for each of a plurality of columns within each of the plurality of time series images is calculated.

In example 20, the subject matter of any one or more of examples 18-19 optionally includes: wherein calculating the column value comprises: at least one of an average, a median, or a maximum of each of a plurality of columns within each of the plurality of time series images is calculated.

In example 21, the subject matter of any one or more of examples 16-20 optionally includes: wherein the generating of the temporal image comprises: concatenating the plurality of vectors to form the temporal image.

In example 22, the subject matter of any one or more of examples 16-21 optionally includes: wherein, to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to obtain each of the plurality of vectors from a respective plurality of images using a classifier.

In example 23, the subject matter of any one or more of examples 16-22 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a flicker classifier.

In example 24, the subject matter of any one or more of examples 16-23 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a braking classifier.

In example 25, the subject matter of any one or more of examples 16-24 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a brake classifier on a pair of vectors of the plurality of vectors and a flicker classifier on the entire temporal image.

In example 26, the subject matter of any one or more of examples 16-25 optionally includes: wherein the brake classifier is trained for a plurality of brake signals.

In example 27, the subject matter of any one or more of examples 16-26 optionally includes: wherein the plurality of brake signals includes at least one of a brake-on signal, a brake-off signal, a brake-up signal, and a brake-down signal.

In example 28, the subject matter of any one or more of examples 16-27 optionally includes: wherein the flicker classifier is trained on a plurality of flicker signals.

In example 29, the subject matter of any one or more of examples 16-28 optionally includes: wherein the plurality of flicker signals includes at least one of a right-blinker-on signal, a right-blinker-off signal, a left-blinker-on signal, and a left-blinker-off signal.

In example 30, the subject matter of any one or more of examples 16-29 optionally includes: identifying a vehicle maneuver based on the semantic signal; and transmitting, to a vehicle control apparatus, a control signal to perform the vehicle maneuver.

Example 31 is one or more machine-readable media comprising instructions that, when executed by a computing system, cause the computing system to perform any of the methods of examples 16-30.

Example 32 is an apparatus comprising means for performing any of the methods of examples 16-30.

Example 33 is a computer program product storing instructions that, when executed by a computerized system, cause the computerized system to perform operations comprising: receiving a plurality of time series images from an image capture device, each of the plurality of time series images associated with a unique image capture time; mapping each of the plurality of time series images to each of a plurality of vectors; converting the plurality of vectors into a temporal image; and identifying a semantic signal based on applying a convolutional neural network to the temporal image.

In example 34, the subject matter of example 33 optionally includes: capturing the plurality of time series images; and associating the unique image capture time with each of the captured plurality of time series images.

In example 35, the subject matter of any one or more of examples 33-34 optionally includes: wherein each of the plurality of vectors comprises a row vector having a same width as each of the plurality of time series images.

In example 36, the subject matter of example 35 optionally includes: wherein mapping each of the plurality of time series images to each of a plurality of vectors comprises: a column value for each of a plurality of columns within each of the plurality of time series images is calculated.

In example 37, the subject matter of any one or more of examples 35-36 optionally includes: wherein calculating the column value comprises: at least one of an average, a median, or a maximum of each of a plurality of columns within each of the plurality of time series images is calculated.

In example 38, the subject matter of any one or more of examples 33-37 optionally includes: wherein the generating of the temporal image comprises: concatenating the plurality of vectors to form the temporal image.

In example 39, the subject matter of any one or more of examples 33-38 optionally includes: wherein, to convert the plurality of time series images into the plurality of vectors, the processing circuitry is configured to obtain each of the plurality of vectors from a respective plurality of images using a classifier.

In example 40, the subject matter of any one or more of examples 33-39 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a flicker classifier.

In example 41, the subject matter of any one or more of examples 33-40 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a braking classifier.

In example 42, the subject matter of any one or more of examples 33-41 optionally includes: wherein to generate the semantic signal based on applying a convolutional neural network to the temporal image, the processing circuit is configured to use a brake classifier on a pair of vectors of the plurality of vectors and a flicker classifier on the entire temporal image.

In example 43, the subject matter of any one or more of examples 33-42 optionally includes: wherein the brake classifier is trained for a plurality of brake signals.

In example 44, the subject matter of any one or more of examples 33-43 optionally includes: wherein the plurality of brake signals includes at least one of a brake-on signal, a brake-off signal, a brake-up signal, and a brake-down signal.

In example 45, the subject matter of any one or more of examples 33-44 optionally includes: wherein the flicker classifier is trained on a plurality of flicker signals.

In example 46, the subject matter of any one or more of examples 33-45 optionally includes: wherein the plurality of flicker signals includes at least one of a right-blinker-on signal, a right-blinker-off signal, a left-blinker-on signal, and a left-blinker-off signal.

In example 47, the subject matter of any one or more of examples 33-46 optionally includes: identifying a vehicle maneuver based on the semantic signal; and transmitting, to a vehicle control apparatus, a control signal to perform the vehicle maneuver.

Example 48 is a classification system comprising: a memory with instructions that, when executed by a processing unit, cause the processing unit to implement a classification trainer, the classification trainer comprising: a backbone network for converting the plurality of images into feature vectors; a braking network for generating a probability for each of a plurality of braking signals; and a flicker network for generating a probability for each of the plurality of flicker signals.

In example 49, the subject matter of example 48 optionally includes: wherein the backbone network comprises a convolutional layer, a pooling layer, and a fully connected layer.

In example 50, the subject matter of any one or more of examples 48-49 optionally includes: wherein: the plurality of images includes a plurality of two-channel vehicle images; and the backbone network converts the plurality of images into feature vectors of length sixty-four.
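
A minimal PyTorch sketch consistent with Examples 49-50 (convolutional layer, pooling layer, fully connected layer; two-channel input; sixty-four-dimensional output); the intermediate layer sizes and crop size are assumptions, not values from the specification:

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Map a two-channel vehicle crop to a 64-dimensional feature vector via
    a convolutional layer, a pooling layer, and a fully connected layer."""
    def __init__(self, in_channels=2, feature_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d((4, 4))
        self.fc = nn.Linear(16 * 4 * 4, feature_dim)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x)
        return self.fc(x.flatten(start_dim=1))

# A batch of eight 2-channel crops (assumed 32 x 32) becomes 64-length features.
features = Backbone()(torch.randn(8, 2, 32, 32))
assert features.shape == (8, 64)
```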

In example 51, the subject matter of any one or more of examples 48-50 optionally includes: wherein: classifying a set of two backbone features for the plurality of braking signals; and classifying a set of sixteen backbone features for the plurality of braking signals and the plurality of flicker signals.

In example 52, the subject matter of any one or more of examples 48-51 optionally includes: wherein: classifying a set of two backbone features for a first subset of sixteen frames received after a vehicle is detected; and classifying a set of sixteen backbone features when at least a complete set of sixteen frames has been received.
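
Examples 51-52 can be read as running the brake classification as soon as two backbone features are available and the full brake-plus-flicker classification once sixteen features have been collected; a hedged dispatch sketch with placeholder heads:

```python
def classify_features(features, brake_head, flicker_head):
    """`features` is a list of per-frame backbone feature vectors, newest last.
    Run the brake head on the last two features as soon as they exist, and
    both heads once a complete set of sixteen features is available."""
    outputs = {}
    if len(features) >= 2:
        outputs["brake"] = brake_head(features[-2:])
    if len(features) >= 16:
        outputs["flicker"] = flicker_head(features[-16:])
    return outputs
```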

In example 53, the subject matter of any one or more of examples 48-52 optionally includes: wherein the braking network comprises a single hidden fully connected layer and an output layer with four neurons.

In example 54, the subject matter of any one or more of examples 48-53 optionally includes: wherein the braking network operates on two row vectors to generate a probability for each of the plurality of braking signals.
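
A sketch of a brake head matching Examples 53-54 (a single hidden fully connected layer and a four-neuron output, operating on two row vectors); the hidden width and the use of softmax are assumptions:

```python
import torch
import torch.nn as nn

class BrakeHead(nn.Module):
    """A single hidden fully connected layer and a four-neuron output,
    operating on a pair of backbone feature (row) vectors."""
    def __init__(self, feature_dim=64, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feature_dim, hidden),  # two row vectors, concatenated
            nn.ReLU(),
            nn.Linear(hidden, 4),                # four brake-signal classes
        )

    def forward(self, pair):
        # pair: (batch, 2, feature_dim) -> probabilities over four brake signals
        return torch.softmax(self.net(pair.flatten(start_dim=1)), dim=-1)

probs = BrakeHead()(torch.randn(8, 2, 64))
assert probs.shape == (8, 4)
```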

In example 55, the subject matter of any one or more of examples 48-54 optionally includes: wherein the plurality of brake signals includes at least one of a brake-on signal, a brake-off signal, a brake-up signal, and a brake-down signal.

In example 56, the subject matter of any one or more of examples 48-55 optionally includes: wherein the flicker network operates on sixteen row vectors to generate a probability for each of the plurality of flicker signals.

In example 57, the subject matter of any one or more of examples 48-56 optionally includes: wherein the sixteen row vectors include a reshaped set of sixteen backbone features, and the reshaped set of sixteen backbone features includes a horizontal vector with a length of sixteen and four channels.
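
One possible reading of Examples 56-57, in which sixteen reshaped backbone features form a horizontal signal of length sixteen with four channels; the 1-D convolution and the projection from the backbone features down to four channels are assumptions:

```python
import torch
import torch.nn as nn

class FlickerHead(nn.Module):
    """Treat sixteen reshaped backbone features as a 4-channel signal of
    horizontal length sixteen and produce probabilities for four blinker
    classes; the 1-D convolution is an assumed design choice."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8 * 16, num_classes)

    def forward(self, x):
        # x: (batch, 4, 16) -- how each 64-dim backbone feature is projected
        # to four channels per frame is not specified and is assumed here.
        x = torch.relu(self.conv(x))
        return torch.softmax(self.fc(x.flatten(start_dim=1)), dim=-1)

probs = FlickerHead()(torch.randn(8, 4, 16))
assert probs.shape == (8, 4)
```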

In example 58, the subject matter of any one or more of examples 48-57 optionally includes: wherein the plurality of flicker signals includes at least one of a right-blinker-on signal, a right-blinker-off signal, a left-blinker-on signal, and a left-blinker-off signal.

Example 59 is a method of classification, comprising: training a classification trainer, the classification trainer comprising: a backbone network for converting the plurality of images into feature vectors; a braking network for generating a probability for each of a plurality of braking signals; and a flicker network for generating a probability for each of the plurality of flicker signals.

In example 60, the subject matter of example 59 optionally includes: wherein the backbone network comprises a convolutional layer, a pooling layer, and a fully connected layer.

In example 61, the subject matter of any one or more of examples 59-60 optionally includes: wherein: the plurality of images includes a plurality of two-channel vehicle images; and the backbone network converts the plurality of images into feature vectors of length sixty-four.

In example 62, the subject matter of any one or more of examples 59-61 optionally includes: wherein: classifying a set of two backbone features for the plurality of braking signals; and classifying a set of sixteen backbone features for the plurality of braking signals and the plurality of flicker signals.

In example 63, the subject matter of any one or more of examples 59-62 optionally includes: wherein: classifying a set of two backbone features for a first subset of sixteen frames received after a vehicle is detected; and classifying a set of sixteen backbone features when at least a complete set of sixteen frames has been received.

In example 64, the subject matter of any one or more of examples 59-63 optionally includes: wherein the braking network comprises a single hidden fully connected layer and an output layer with four neurons.

In example 65, the subject matter of any one or more of examples 59-64 optionally includes: wherein the braking network operates on two row vectors to generate a probability for each of the plurality of braking signals.

In example 66, the subject matter of any one or more of examples 59-65 optionally includes: wherein the plurality of brake signals includes at least one of a brake-on signal, a brake-off signal, a brake-up signal, and a brake-down signal.

In example 67, the subject matter of any one or more of examples 59-66 optionally includes: wherein the flicker network operates on sixteen row vectors to generate a probability for each of the plurality of flicker signals.

In example 68, the subject matter of any one or more of examples 59-67 optionally includes: wherein the sixteen row vectors include a reshaped set of sixteen backbone features, and the reshaped set of sixteen backbone features includes a horizontal vector with a length of sixteen and four channels.

In example 69, the subject matter of any one or more of examples 59-68 optionally includes: wherein the plurality of flicker signals includes at least one of a right-blinker-on signal, a right-blinker-off signal, a left-blinker-on signal, and a left-blinker-off signal.

Example 70 is a computer program product storing instructions that, when executed by a computerized system, cause the computerized system to perform operations comprising: training a classification trainer, the classification trainer comprising: a backbone network for converting the plurality of images into feature vectors; a braking network for generating a probability for each of a plurality of braking signals; and a flicker network for generating a probability for each of the plurality of flicker signals.

In example 71, the subject matter of example 70 optionally includes: wherein the backbone network comprises a convolutional layer, a pooling layer, and a fully connected layer.

In example 72, the subject matter of any one or more of examples 70-71 optionally includes: wherein: the plurality of images includes a plurality of two-channel vehicle images; and the backbone network converts the plurality of images into feature vectors of length sixty-four.

In example 73, the subject matter of any one or more of examples 70-72 optionally includes: wherein: classifying a set of two backbone features for the plurality of braking signals; and classifying a set of sixteen backbone features for the plurality of braking signals and the plurality of flicker signals.

In example 74, the subject matter of any one or more of examples 70-73 optionally includes: wherein: classifying a set of two backbone features for a first subset of sixteen frames received after a vehicle is detected; and classifying a set of sixteen backbone features when at least a complete set of sixteen frames has been received.

In example 75, the subject matter of any one or more of examples 70-74 optionally includes: wherein the braking network comprises a single hidden fully connected layer and an output layer with four neurons.

In example 76, the subject matter of any one or more of examples 70-75 optionally includes: wherein the braking network operates on two row vectors to generate a probability for each of the plurality of braking signals.

In example 77, the subject matter of any one or more of examples 70-76 optionally includes: wherein the plurality of brake signals includes at least one of a brake-on signal, a brake-off signal, a brake-up signal, and a brake-down signal.

In example 78, the subject matter of any one or more of examples 70-77 optionally includes: wherein the flicker network operates on sixteen row vectors to generate a probability for each of the plurality of flicker signals.

In example 79, the subject matter of any one or more of examples 70-78 optionally includes: wherein the sixteen row vectors include a reshaped set of sixteen backbone features, and the reshaped set of sixteen backbone features includes a horizontal vector with a length of sixteen and four channels.

In example 80, the subject matter of any one or more of examples 70-79 optionally includes: wherein the plurality of flicker signals includes at least one of a right-blinker-on signal, a right-blinker-off signal, a left-blinker-on signal, and a left-blinker-off signal.

Example 81 is one or more machine-readable media comprising instructions that, when executed by a machine, cause the machine to perform any of the operations of examples 1-80.

Example 82 is an apparatus comprising means for performing any of the operations of examples 1-80.

Example 83 is a system to perform the operations of any of examples 1-80.

Example 84 is a method to perform the operations of any of examples 1-80.

The foregoing detailed description includes references to the accompanying drawings, which form a part hereof. The drawings show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are also referred to herein as "examples". Such examples may include elements in addition to those shown or described. However, the inventors also contemplate examples in which only those elements shown or described are provided. Moreover, the inventors contemplate examples using any combination or permutation of those elements (or one or more aspects thereof), whether with respect to a particular example (or one or more aspects thereof) or with respect to other examples (or one or more aspects thereof) shown or described herein.

Any reference to a system should be understood to also apply to a computer program product adapted to the method performed by the system and/or storing instructions that, once executed by the system, cause the system to perform the method. The computer program product is non-transitory and may be, for example, an integrated circuit, a magnetic memory, an optical memory, a magnetic disk, or the like.

Any reference to a method should be understood to also apply to a computer program product adapted for use with a system configured to perform the method and/or storing instructions that, once executed by the system, cause the system to perform the method.

Any reference to a computer program product should be understood to also apply to a system adapted to perform the associated method and/or configured to execute the instructions stored in the computer program product.

The term "and/or" is additionally or alternatively.

In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims.

Furthermore, the terms "front," "back," "top," "bottom," "over," "under," and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein.

Any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being "operably connected," or "operably coupled," to each other to achieve the desired functionality.

Further, those skilled in the art will recognize that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.

However, other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The phrase "may be X" indicates that condition X may be satisfied. This phrase also indicates that condition X may not be satisfied. For example, any reference to a system including a certain component should also cover scenarios in which the system does not include the certain component.

The terms "comprising," "including," "having," "consisting of …," and "consisting essentially of …" are used interchangeably. For example, any of the methods may include at least the steps included in the figures and/or the description, and may include only the steps included in the figures and/or the description. The same applies to the system and the mobile computer.

It will be appreciated that for simplicity and clarity of illustration, elements illustrated in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.

For another example, in one embodiment, the illustrated examples may be implemented as circuitry located on a single integrated circuit or within a same device. Alternatively, these examples may be implemented as any number of separate integrated circuits or separate devices interconnected with one another in a suitable manner.

Also for example, these examples, or portions thereof, may be implemented as software or code representations of physical circuitry, or of logical representations convertible into physical circuitry, such as in any suitable type of hardware description language.

In addition, the present invention is not limited to physical devices or units implemented in non-programmable hardware, but may also be applied to programmable devices or units capable of performing desired device functions by operating in accordance with appropriate program code, such as mainframes, microcomputers, servers, workstations, personal computers, notebooks, personal digital assistants, electronic games, automobiles and other embedded systems, cell phones, and other various wireless devices, which are generally referred to as "computer systems" in this application.

Other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. Furthermore, the terms "a" or "an," as used herein, are defined as one or more than one. In addition, the use of introductory phrases such as "at least one" and "one or more" in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an". The same holds true for the use of definite articles. Unless stated otherwise, terms such as "first" and "second" are used to arbitrarily distinguish between the elements such terms describe; the terms are not necessarily intended to indicate temporal or other prioritization of such elements. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.

While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.

Any combination of any of the components and/or system elements illustrated in any of the figures and/or the description and/or the claims may be provided. Any combination of any of the systems described in any of the figures and/or the description and/or the claims may be provided. Any combination of the steps, operations and/or methods illustrated in any of the figures and/or description and/or claims may be provided. Any combination of the operations illustrated in any of the figures and/or the description and/or the claims may be provided. Any combination of the methods illustrated in any of the figures and/or the description and/or the claims may be provided.

Moreover, although illustrative embodiments have been described herein, the scope of any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations, and/or alterations will be apparent to those skilled in the art based on this disclosure. The limitations in the claims are to be interpreted broadly based on the language employed in the claims and are not limited to examples described in the specification or during the prosecution of the application. These examples are to be construed as non-exclusive. Further, the steps of the disclosed methods may be modified in any manner, including by reordering steps and/or inserting or deleting steps. It is intended, therefore, that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.
