Imaging apparatus, imaging system, imaging method, and imaging program

Document No.: 835727    Publication date: 2021-03-30

Description: This technology, "Imaging apparatus, imaging system, imaging method, and imaging program", was created by 佐藤竜太, 青木卓, 山本启太郎, and 浴良仁 on 2019-09-02. Its main content is as follows: The invention includes: an imaging unit (10) that includes a pixel region in which a plurality of pixels are arranged, reads out pixel signals from the pixels included in the pixel region, and outputs the pixel signals; a readout unit controller (111a) that controls a readout unit provided as a part of the pixel region; a first readout unit setting unit (123) that sets a first readout unit by which pixel signals are read out from the pixel region to perform recognition processing in which training data has been learned for each readout unit; a second readout unit setting unit (142) that sets a second readout unit by which pixel signals are read out from the pixel region to be output to a subsequent stage; and a mediation unit (1110) that performs mediation between the first readout unit and the second readout unit, wherein the readout unit controller sets the readout unit by the mediation performed by the mediation unit.

1. An imaging apparatus comprising:

an imaging unit having a pixel region in which a plurality of pixels are arranged, and reading and outputting pixel signals from the pixels included in the pixel region;

a readout unit controller that controls a readout unit provided as a part of the pixel region;

a first readout unit setting unit that sets a first readout unit by which the pixel signals are read out from the pixel region so as to perform recognition processing in which training data has been learned for each readout unit;

a second readout unit setting unit that sets a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation unit that performs mediation between the first readout unit and the second readout unit,

wherein the readout unit controller sets the readout unit by the mediation performed by the mediation unit.

2. The imaging apparatus as claimed in claim 1,

wherein the mediation unit performs the mediation by a logical product of the first readout unit and the second readout unit.

3. The imaging apparatus as claimed in claim 1,

wherein the mediation unit performs the mediation based on a result of the recognition processing.

4. The imaging apparatus as claimed in claim 3,

wherein the mediation unit selects the second readout unit in a case where a result of the recognition processing indicates recognition of a moving body.

5. The imaging apparatus as claimed in claim 3,

wherein the mediation unit selects the second readout unit in a case where a result of the recognition processing indicates a recognition confidence equal to or less than a threshold value.

6. The imaging apparatus as claimed in claim 1,

wherein the mediation unit performs the mediation based on the pixel signal read out from the second readout unit.

7. The imaging apparatus as claimed in claim 6,

wherein the mediation unit selects the second readout unit in a case where the brightness based on the pixel signal exceeds a threshold.

8. The imaging apparatus as claimed in claim 1,

wherein the mediation unit performs the mediation based on external information provided from outside of the imaging apparatus.

9. The imaging apparatus as claimed in claim 8,

wherein the mediation unit performs the mediation based on an operation mode provided from the outside.

10. The imaging apparatus as claimed in claim 8,

wherein the mediation unit performs the mediation based on a detection output of another sensor device provided from the outside.

11. An imaging system, comprising:

an imaging apparatus provided with:

an imaging unit having a pixel region in which a plurality of pixels are arranged, and reading and outputting pixel signals from the pixels included in the pixel region;

a readout unit controller that controls a readout unit provided as a part of the pixel region;

a first readout unit setting unit that sets a first readout unit by which the pixel signals are read out from the pixel region so as to perform recognition processing in which training data has been learned for each readout unit;

a second readout unit setting unit that sets a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation unit that performs mediation between the first readout unit and the second readout unit; and

an information processing device equipped with a recognition unit that executes the recognition processing,

wherein the readout unit controller sets the readout unit by the mediation performed by the mediation unit.

12. An imaging method performed by a processor, comprising:

a readout unit control step of controlling a readout unit provided as a part of a pixel region in which a plurality of pixels are arranged, the pixel region being included in an imaging unit;

a first readout unit setting step of setting a first readout unit by which pixel signals are read out from pixels included in the pixel region so as to perform a recognition process in which training data has been learned for each readout unit;

a second readout unit setting step of setting a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation step of performing mediation between the first readout unit and the second readout unit,

wherein the readout unit control step sets the readout unit by mediation of the mediation step.

13. An imaging program causing a processor to execute:

a readout unit control step of controlling a readout unit provided as a part of a pixel region in which a plurality of pixels are arranged, the pixel region being included in an imaging unit;

a first readout unit setting step of setting a first readout unit by which pixel signals are read out from pixels included in the pixel region so as to perform a recognition process in which training data has been learned for each readout unit;

a second readout unit setting step of setting a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation step of performing mediation between the first readout unit and the second readout unit,

wherein the readout unit control step sets the readout unit by mediation of the mediation step.

Technical Field

The present disclosure relates to an imaging apparatus, an imaging system, an imaging method, and an imaging program.

Background

In recent years, with the complication of imaging apparatuses such as small cameras mounted on digital still cameras, digital video cameras, and multifunctional mobile phones (smartphones), imaging apparatuses equipped with an image recognition function of recognizing a predetermined object included in a captured image have been developed.

Reference list

Patent document

Patent document 1: JP 2017-112409A.

Disclosure of Invention

Technical problem

In general, an image suitable for recognition processing in an image recognition function is different from an image suitable for visual recognition by a person. Therefore, when attempting to improve the recognition accuracy in an imaging apparatus equipped with an image recognition function, it may be difficult for an image captured for the recognition processing to also provide sufficient information as an image for visual recognition.

The present disclosure aims to provide an imaging apparatus, an imaging system, an imaging method, and an imaging program capable of realizing both imaging for recognition processing and imaging for visual recognition.

Solution to the problem

In order to solve the above problem, an imaging apparatus according to an aspect of the present disclosure has: an imaging unit having a pixel region in which a plurality of pixels are arranged, and reading and outputting pixel signals from the pixels included in the pixel region; a readout unit controller that controls a readout unit provided as a part of the pixel region; a first readout unit setting unit that sets a first readout unit by which pixel signals are read out from the pixel region so as to perform recognition processing in which training data has been learned for each readout unit; a second readout unit setting unit that sets a second readout unit by which the pixel signals are read out from the pixel region so as to be output to a subsequent stage; and a mediation unit that performs mediation between the first readout unit and the second readout unit, wherein the readout unit controller sets the readout unit by the mediation performed by the mediation unit.
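For illustration only (this is not part of the claimed configuration), the mediation between the first readout unit and the second readout unit can be pictured as combining two requested readout units, for example by the logical product mentioned in claim 2. The sketch below assumes, purely for illustration, that a readout unit is represented as a set of line numbers; all names are hypothetical.

```python
# Minimal sketch of the mediation concept, assuming a readout unit is
# modeled as a set of line numbers. Function and variable names are
# hypothetical and not taken from the present disclosure.

def mediate(first_unit: set, second_unit: set) -> set:
    """Combine the readout unit requested for recognition processing with
    the readout unit requested for visual recognition output, here by the
    logical product (AND) mentioned in claim 2."""
    return first_unit & second_unit

# Example: recognition requests lines 0-7; visual output requests even lines.
first_unit = set(range(0, 8))
second_unit = set(range(0, 16, 2))
print(sorted(mediate(first_unit, second_unit)))  # [0, 2, 4, 6]
```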

Drawings

Fig. 1 is a block diagram showing a configuration of an example of an imaging apparatus applicable to each embodiment of the present disclosure.

Fig. 2A is a schematic diagram showing an example of a hardware configuration of an imaging apparatus according to each embodiment.

Fig. 2B is a schematic diagram showing an example of a hardware configuration of an imaging apparatus according to each embodiment.

Fig. 3A is a diagram illustrating an example in which an imaging apparatus according to each embodiment is formed as a stacked CIS having a two-layer structure.

Fig. 3B is a diagram illustrating an example in which an imaging apparatus according to each embodiment is formed as a stacked CIS having a three-layer structure.

Fig. 4 is a block diagram showing a configuration of an example of a sensor unit applicable to each embodiment.

Fig. 5A is a schematic view illustrating a rolling shutter method.

Fig. 5B is a schematic view illustrating a rolling shutter method.

Fig. 5C is a schematic view illustrating a rolling shutter method.

Fig. 6A is a schematic diagram illustrating line thinning in the rolling shutter method.

Fig. 6B is a schematic diagram showing line thinning in the rolling shutter method.

Fig. 6C is a schematic diagram illustrating line thinning in the rolling shutter method.

Fig. 7A is a diagram schematically illustrating an example of another imaging method among rolling shutter methods.

Fig. 7B is a diagram schematically illustrating an example of another imaging method among the rolling shutter methods.

Fig. 8A is a schematic diagram showing a global shutter method.

Fig. 8B is a schematic diagram showing a global shutter method.

Fig. 8C is a schematic diagram showing the global shutter method.

Fig. 9A is a diagram schematically showing an example of a sampling pattern that can be realized in the global shutter method.

Fig. 9B is a diagram schematically showing an example of a sampling pattern that can be realized in the global shutter method.

Fig. 10 is a diagram schematically showing an image recognition process performed by the CNN.

Fig. 11 is a diagram schematically showing an image recognition process for obtaining a recognition result from a part of an image as a recognition target.

Fig. 12A is a diagram schematically showing an example of recognition processing performed by DNN when time series information is not used.

Fig. 12B is a diagram schematically showing an example of recognition processing performed by the DNN when time-series information is not used.

Fig. 13A is a diagram schematically showing a first example of the recognition processing performed by the DNN when time-series information is used.

Fig. 13B is a diagram schematically showing a first example of the recognition processing performed by the DNN when time-series information is used.

Fig. 14A is a diagram schematically showing a second example of the recognition processing performed by the DNN when time-series information is used.

Fig. 14B is a diagram schematically showing a second example of the recognition processing performed by the DNN when time-series information is used.

Fig. 15A is a diagram showing a relationship between a frame driving speed and a pixel signal readout amount.

Fig. 15B is a diagram showing a relationship between a frame driving speed and a pixel signal readout amount.

Fig. 16 is a schematic diagram illustrating the recognition process according to each embodiment of the present disclosure.

Fig. 17 is a flowchart showing an example of the recognition processing performed by the recognition processing unit according to the first embodiment.

Fig. 18 is a diagram showing an example of image data of one frame.

Fig. 19 is a diagram showing a flow of machine learning processing executed by the recognition processing unit according to the first embodiment.

Fig. 20A is a schematic diagram showing an application example of the first embodiment.

Fig. 20B is a schematic diagram showing an application example of the first embodiment.

Fig. 21 is a functional block diagram showing an example of functions of an imaging apparatus according to the second embodiment.

Fig. 22 is a diagram showing an example of processing in the recognition processing unit according to the second embodiment in more detail.

Fig. 23 is a functional block diagram showing an example of functions according to the second embodiment.

Fig. 24 is a schematic diagram showing a frame readout process according to the second embodiment.

Fig. 25 is a diagram showing an overview of the recognition process according to the second embodiment.

Fig. 26 is a diagram showing an example of terminating the recognition process in the middle of frame readout.

Fig. 27 is a diagram showing an example of terminating the recognition process in the middle of frame readout.

Fig. 28 is a flowchart showing an example of the recognition process according to the second embodiment.

Fig. 29A is a timing chart showing an example of control of readout and recognition processing according to the second embodiment.

Fig. 29B is a timing chart showing an example of control of readout and recognition processing according to the second embodiment.

Fig. 30 is a timing chart showing another example of control of readout and recognition processing according to the second embodiment.

Fig. 31 is a flowchart showing an example of control according to the third embodiment.

Fig. 32 is a diagram schematically showing an example of output control processing according to the third embodiment.

Fig. 33A is a functional block diagram showing functions of an example of the recognition processing unit side of the imaging apparatus according to the third embodiment.

Fig. 33B is a functional block diagram showing functions of an example of the visual recognition processing unit side of the imaging apparatus according to the third embodiment.

Fig. 34 is a flowchart showing an example of processing when a trigger signal is output according to time according to the third embodiment.

Fig. 35 is a diagram schematically showing an example of an output control process according to a first modification of the third embodiment.

Fig. 36A is a functional block diagram showing functions of an example of the recognition processing unit side of the imaging apparatus according to the first modification of the third embodiment.

Fig. 36B is a functional block diagram showing functions of an example on the visual recognition processing unit side of the imaging apparatus according to the first modification of the third embodiment.

Fig. 37 is a flowchart showing an example of processing according to the first modification of the third embodiment.

Fig. 38 is a diagram schematically showing an example of an output control process according to a second modification of the third embodiment.

Fig. 39A is a functional block diagram showing functions of an example of the recognition processing unit side of the imaging apparatus according to the second modification of the third embodiment.

Fig. 39B is a functional block diagram showing functions of an example on the visual recognition processing unit side of an imaging apparatus according to a second modification of the third embodiment.

Fig. 40 is a flowchart showing an example of processing according to a second modification of the third embodiment.

Fig. 41A is a functional block diagram showing functions of an example on the recognition processing unit side of an imaging apparatus according to a third modification of the third embodiment.

Fig. 41B is a functional block diagram showing functions of an example on the visual recognition processing unit side of an imaging apparatus according to a third modification of the third embodiment.

Fig. 42 is a flowchart showing an example of processing according to a third modification of the third embodiment.

Fig. 43 is a diagram schematically showing an example of output control processing according to the fourth embodiment.

Fig. 44 is a functional block diagram showing functions of an example of an imaging apparatus according to a fourth embodiment.

Fig. 45 is a flowchart showing an example of processing according to the fourth embodiment.

Fig. 46 is a functional block diagram showing functions of an example of an imaging apparatus according to a first modification of the fourth embodiment.

Fig. 47 is a flowchart showing an example of processing according to the first modification of the fourth embodiment.

Fig. 48 is a diagram schematically showing an example of an output control process according to a second modification of the fourth embodiment.

Fig. 49 is a functional block diagram showing functions of an example of the imaging apparatus 1 according to a second modification of the fourth embodiment.

Fig. 50 is a flowchart showing an example of processing according to the second modification of the fourth embodiment.

Fig. 51 is a flowchart showing an example of an overview of mediation processing according to the fifth embodiment.

Fig. 52 is a functional block diagram showing an example of functions of the imaging apparatus 1 applicable to the fifth embodiment.

Fig. 53 is a schematic diagram showing mediation processing according to the fifth embodiment.

Fig. 54 is an exemplary flowchart showing mediation processing according to the fifth embodiment.

Fig. 55 is a functional block diagram showing an example of functions of an imaging apparatus applicable to the first modification of the fifth embodiment.

Fig. 56 is a schematic diagram showing a first example of mediation processing according to a first modification of the fifth embodiment.

Fig. 57 is a schematic diagram showing a second example of mediation processing according to the first modification of the fifth embodiment.

Fig. 58 is a flowchart showing an example of mediation processing according to the first modification of the fifth embodiment.

Fig. 59 is a functional block diagram showing an example of functions of an imaging apparatus applicable to the second modification of the fifth embodiment.

Fig. 60 is a schematic diagram showing a mediation process according to a second modification of the fifth embodiment.

Fig. 61 is a flowchart showing an example of mediation processing according to a second modification of the fifth embodiment.

Fig. 62 is a functional block diagram showing an example of functions of the imaging apparatus 1 applicable to the third modification of the fifth embodiment.

Fig. 63 is a flowchart showing an example of mediation processing according to a third modification of the fifth embodiment.

Fig. 64 is a diagram showing a use example of an imaging apparatus to which the technique of the present disclosure is applied.

Fig. 65 is a block diagram showing an example of a schematic configuration of a vehicle control system.

Fig. 66 is a diagram showing an example of the mounting positions of the vehicle exterior information detecting unit and the imaging portion.

Detailed Description

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. In each embodiment below, the same portions are denoted by the same reference numerals, and a repetitive description thereof will be omitted.

Hereinafter, embodiments of the present disclosure will be described in the following order.

1. Configuration example according to each embodiment of the present disclosure

2. Examples of prior art applicable to the present disclosure

2-1. Overview of rolling shutter

2-2. Overview of global shutter

2-3. Deep Neural Network (DNN)

2-3-1. Overview of Convolutional Neural Networks (CNN)

2-3-2. Overview of Recurrent Neural Networks (RNN)

2-4. Driving speed

3. Summary of the disclosure

4. First embodiment

4-1. Example of operations in the recognition processing unit

4-2. Specific example of operations in the recognition processing unit

4-3. Application example of the first embodiment

5. Second embodiment

5-0-1. Configuration example according to the second embodiment

5-0-2. Example of processing in the recognition processing unit according to the second embodiment

5-0-3. Details of the recognition processing according to the second embodiment

5-0-4. Example of control of readout and recognition processing according to the second embodiment

6. Third embodiment

6-0. Overview of the third embodiment

6-0-1. Example of outputting a trigger signal according to time

6-1. First modification of the third embodiment

6-2. Second modification of the third embodiment

6-3. Third modification of the third embodiment

7. Fourth embodiment

7-1. First modification of the fourth embodiment

7-2. Second modification of the fourth embodiment

8. Fifth embodiment

8-0-1. Specific example of mediation processing

8-1. First modification of the fifth embodiment

8-2. Second modification of the fifth embodiment

8-3. Third modification of the fifth embodiment

9. Sixth embodiment

[1. Configuration example according to each embodiment of the present disclosure]

The configuration of an imaging apparatus according to the present disclosure will be schematically described. Fig. 1 is a block diagram showing a configuration of an example of an imaging apparatus applicable to each embodiment of the present disclosure. In fig. 1, an imaging apparatus 1 includes a sensor unit 10, a sensor controller 11, a recognition processing unit 12, a memory 13, a visual recognition processing unit 14, and an output controller 15. The imaging device 1 is a Complementary Metal Oxide Semiconductor (CMOS) image sensor (CIS), in which these individual units are integrally formed by using CMOS. The imaging apparatus 1 is not limited to this example, and may be another type of optical sensor, such as an infrared light sensor that performs imaging with infrared light.

The sensor unit 10 outputs a pixel signal corresponding to the light incident on the light receiving surface via the optical unit 30. More specifically, the sensor unit 10 has a pixel array in which pixels including at least one photoelectric conversion element are arranged in a matrix. The light receiving surface is formed by the individual pixels arranged in a matrix in the pixel array. The sensor unit 10 further includes: a driving circuit for driving respective pixels included in the pixel array; and a signal processing circuit that performs predetermined signal processing on the signal read out from each pixel, and outputs the processed signal as a pixel signal for each pixel. The sensor unit 10 outputs the pixel signal of each pixel included in the pixel region as image data in a digital format.

Hereinafter, in the pixel array included in the sensor unit 10, a region in which effective pixels for generating pixel signals are arranged will be referred to as a frame. The frame image data is formed of pixel data based on each pixel signal output from each pixel included in the frame. Further, each row in the pixel array of the sensor unit 10 is referred to as a line, and pixel data based on a pixel signal output from each pixel included in the line forms line image data. The operation in which the sensor unit 10 outputs a pixel signal corresponding to light applied to the light receiving surface is referred to as imaging. The sensor unit 10 controls exposure at the time of imaging and a gain (analog gain) of a pixel signal according to an imaging control signal supplied from the sensor controller 11 described below.

The sensor controller 11, which is constituted by a microprocessor, for example, controls readout of pixel data from the sensor unit 10, and outputs the pixel data based on each pixel signal read out from each pixel included in a frame. The pixel data output from the sensor controller 11 is transferred to the recognition processing unit 12 and the visual recognition processing unit 14.

Further, the sensor controller 11 generates an imaging control signal for controlling imaging in the sensor unit 10. The sensor controller 11 generates an imaging control signal, for example, according to instructions from a recognition processing unit 12 and a visual recognition processing unit 14, which will be described below. The imaging control signal contains information indicating the above-described exposure and analog gain set at the time of imaging in the sensor unit 10. The imaging control signal further includes a control signal (a vertical synchronization signal, a horizontal synchronization signal, or the like) used by the sensor unit 10 to perform an imaging operation. The sensor controller 11 supplies the generated imaging control signal to the sensor unit 10.
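For illustration only, the items carried by the imaging control signal described above can be grouped into a small record. The following dataclass is a hypothetical sketch of such a grouping (the field names, types, and values are assumptions), not the actual signal format used between the sensor controller 11 and the sensor unit 10.

```python
from dataclasses import dataclass

@dataclass
class ImagingControlSignal:
    """Hypothetical container for the items the sensor controller 11
    passes to the sensor unit 10 (exposure, analog gain, sync signals)."""
    exposure_time_us: float        # exposure set at the time of imaging
    analog_gain_db: float          # analog gain applied to the pixel signal
    vsync_hz: float = 30.0         # stands in for the vertical sync signal
    hsync_hz: float = 32400.0      # stands in for the horizontal sync signal

ctrl = ImagingControlSignal(exposure_time_us=16666.0, analog_gain_db=6.0)
```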

The optical unit 30 is provided to apply light from an object to a light receiving surface of the sensor unit 10, and is provided, for example, at a position corresponding to the sensor unit 10. The optical unit 30 includes, for example, a plurality of lenses, an aperture mechanism for adjusting the size of the aperture with respect to the incident light, and a focusing mechanism for adjusting the focus of the light applied to the light receiving surface. The optical unit 30 may further include a shutter mechanism (mechanical shutter) that adjusts the duration of time for which light is applied to the light receiving surface. The aperture mechanism, the focus mechanism, and the shutter mechanism of the optical unit 30 may be controlled by the sensor controller 11, for example. Without being limited thereto, the aperture (diaphragm) and the focus of the optical unit 30 may be controlled from the outside of the imaging apparatus 1. The optical unit 30 may also be configured integrally with the imaging apparatus 1.

The recognition processing unit 12 performs recognition processing on an object included in an image containing pixel data based on the pixel data transferred from the sensor controller 11. In the present disclosure, for example, a Digital Signal Processor (DSP) reads out and executes a program that has undergone pre-learning using training data and is stored as a learning model in the memory 13, thereby implementing the recognition processing unit 12 as a machine learning unit that performs recognition processing using a Deep Neural Network (DNN). The recognition processing unit 12 may instruct the sensor controller 11 to read out pixel data required for the recognition processing from the sensor unit 10. The recognition result obtained by the recognition processing unit 12 is transmitted to the output controller 15.

The visual recognition processing unit 14 performs processing for obtaining an image suitable for human recognition on the pixel data transferred from the sensor controller 11, and outputs image data containing a set of pixel data, for example. For example, an Image Signal Processor (ISP) reads out and executes a program stored in advance in a memory (not shown), thereby implementing the visual recognition processing unit 14.

For example, in the case where a color filter is provided for each pixel included in the sensor unit 10 and the pixel data has separate types of color information (i.e., information of red (R), green (G), and blue (B)), the visual recognition processing unit 14 may perform demosaicing processing, white balance processing, and the like. Further, the visual recognition processing unit 14 may instruct the sensor controller 11 to read out pixel data necessary for the visual recognition processing from the sensor unit 10. The image data of which pixel data has undergone the image processing performed by the visual recognition processing unit 14 is transferred to the output controller 15.
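As a minimal illustration of one of the visual-recognition-side steps mentioned above, the sketch below applies gray-world white-balance gains to an already demosaiced RGB image. It is a toy example under the gray-world assumption (the function name and rule are not taken from the present disclosure), not the actual processing of the visual recognition processing unit 14.

```python
import numpy as np

def gray_world_white_balance(rgb: np.ndarray) -> np.ndarray:
    """Toy white-balance step: scale R, G, and B so that their means
    match the overall mean (gray-world assumption)."""
    means = rgb.reshape(-1, 3).mean(axis=0)          # per-channel mean
    gains = means.mean() / np.maximum(means, 1e-6)   # per-channel gain
    return np.clip(rgb * gains, 0.0, 1.0)

# Example with random pixel data normalized to [0, 1].
balanced = gray_world_white_balance(np.random.rand(1080, 1920, 3))
```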

The output controller 15 is constituted by a microprocessor, for example, and outputs one or both of the recognition result delivered from the recognition processing unit 12 and the image data delivered from the visual recognition processing unit 14 as a result of the visual recognition processing to the outside of the imaging apparatus 1. For example, the output controller 15 may output image data to a display unit 31 having a display device. This enables the user to visually recognize the image data displayed by the display unit 31. The display unit 31 may be built in the imaging apparatus 1, or may be provided outside the imaging apparatus 1.

Fig. 2A and 2B are schematic diagrams showing an example of a hardware configuration of the imaging apparatus 1 according to each embodiment. Fig. 2A is an example in which the sensor unit 10, the sensor controller 11, the recognition processing unit 12, the memory 13, the visual recognition processing unit 14, and the output controller 15 among the components in fig. 1 are mounted on one chip 2. Note that, for the sake of simplicity, fig. 2A omits illustration of the memory 13 and the output controller 15.

In the configuration shown in fig. 2A, the recognition result obtained by the recognition processing unit 12 is output to the outside of the chip 2 via the output controller 15 (not shown). Further, in the configuration of fig. 2A, the recognition processing unit 12 may acquire pixel data for recognition from the sensor controller 11 via the internal interface of the chip 2.

Fig. 2B is an example in which the sensor unit 10, the sensor controller 11, the visual recognition processing unit 14, and the output controller 15 among the components in fig. 1 are mounted on one chip 2 and the recognition processing unit 12 and the memory 13 (not shown) are provided outside the chip 2. Similar to fig. 2A, fig. 2B also omits illustration of the memory 13 and the output controller 15 for simplification.

In the configuration of fig. 2B, the identification processing unit 12 acquires pixel data to be used for identification via an interface provided for inter-chip communication. Further, in fig. 2B, the recognition result obtained by the recognition processing unit 12 is directly output from the recognition processing unit 12 to the outside, but the output method is not limited to this example. That is, in the configuration of fig. 2B, the recognition processing unit 12 may return the recognition result to the chip 2, and may perform control to output the result from the output controller 15 (not shown) mounted on the chip 2.

In the configuration shown in fig. 2A, the recognition processing unit 12 is mounted on the chip 2 together with the sensor controller 11, so that high-speed communication between the recognition processing unit 12 and the sensor controller 11 can be performed via the internal interface of the chip 2. On the other hand, in the configuration shown in fig. 2A it is difficult to replace the recognition processing unit 12, which makes it difficult to change the recognition processing. In contrast, the recognition processing unit 12 in the configuration shown in fig. 2B is provided outside the chip 2, and communication needs to be performed between the recognition processing unit 12 and the sensor controller 11 via an interface between the chips. This makes the communication between the recognition processing unit 12 and the sensor controller 11 slower than that in the configuration shown in fig. 2A, resulting in the possibility of a delay in the control. On the other hand, the recognition processing unit 12 can be easily replaced, and thus various recognition processes can be realized.

Hereinafter, unless otherwise specified, the imaging apparatus 1 has a configuration including a sensor unit 10, a sensor controller 11, a recognition processing unit 12, a memory 13, a visual recognition processing unit 14, and an output controller 15 mounted on one chip 2 in fig. 2A.

In the configuration shown in fig. 2A described above, the imaging apparatus 1 may be formed over one substrate. Without being limited thereto, the imaging device 1 may be implemented as a stacked CIS in which a plurality of semiconductor chips are stacked and integrally formed.

As an example, the imaging device 1 may be formed to have a two-layer structure in which semiconductor chips are stacked in two layers. Fig. 3A is a diagram illustrating an example in which the image forming apparatus 1 according to each embodiment is formed as a stacked CIS having a two-layer structure. In the structure of fig. 3A, the pixel unit 20a is formed on a first-tier semiconductor chip, and the memory + logic unit 20b is formed on a second-tier semiconductor chip. The pixel unit 20a includes at least a pixel array in the sensor unit 10. For example, the memory + logic unit 20b includes the sensor controller 11, the recognition processing unit 12, the memory 13, the visual recognition processing unit 14, and the output controller 15, and provides an interface for communication between the imaging apparatus 1 and the outside. The memory + logic unit 20b further includes a part or all of a driving circuit that drives the pixel array in the sensor unit 10. In addition, although not shown, the memory + logic unit 20b may further include a memory for processing the image data by the visual recognition processing unit 14.

As shown on the right side of fig. 3A, the first-layer semiconductor chip and the second-layer semiconductor chip are bonded together while being in electrical contact with each other, so that the imaging device 1 can be configured as one solid-state imaging element.

As another example, the imaging apparatus 1 may be formed to have a three-layer structure in which semiconductor chips are stacked in three layers. Fig. 3B is a diagram illustrating an example in which the imaging apparatus 1 according to each embodiment is formed as a stacked CIS having a three-layer structure. In the structure of fig. 3B, the pixel unit 20a is formed on a first-tier semiconductor chip, the memory unit 20c is formed on a second-tier semiconductor chip, and the logic unit 20b' is formed on a third-tier semiconductor chip. In this case, the logic unit 20b' includes, for example, the sensor controller 11, the recognition processing unit 12, the visual recognition processing unit 14, and the output controller 15, and provides an interface for communication between the imaging apparatus 1 and the outside. Further, the memory unit 20c may include, for example, the memory 13 and a memory used by the visual recognition processing unit 14 for processing image data. The memory 13 may be included in the logic unit 20b'.

As shown on the right side of fig. 3B, the first layer semiconductor chip, the second layer semiconductor chip, and the third layer semiconductor chip are bonded together while being in electrical contact with each other, so that the imaging device 1 can be configured as one solid-state imaging element.

Fig. 4 is a block diagram showing a configuration of an example of the sensor unit 10 applicable to each embodiment. In fig. 4, the sensor unit 10 includes a pixel array unit 101, a vertical scanning unit 102, an analog-to-digital (AD) conversion unit 103, a pixel signal line 106, a vertical signal line VSL, a control unit 1100, and a signal processing unit 1101. In fig. 4, for example, the control unit 1100 and the signal processing unit 1101 may be included in the sensor controller 11 shown in fig. 1.

The pixel array unit 101 includes a plurality of pixel circuits 100 each including, for example, a photoelectric conversion element that performs photoelectric conversion on received light using a photodiode and a circuit that performs readout of charges from the photoelectric conversion element. In the pixel array unit 101, a plurality of pixel circuits 100 are arranged in a matrix in a horizontal direction (row direction) and a vertical direction (column direction). In the pixel array unit 101, the arrangement of the pixel circuits 100 in the row direction is referred to as a row. For example, in the case where one frame image is formed of 1920 pixels × 1080 rows, the pixel array unit 101 includes at least 1080 rows, and the 1080 rows include at least 1920 pixel circuits 100. The pixel signals read out from the pixel circuits 100 included in a frame form an image (image data) of one frame.

Hereinafter, an operation of reading out a pixel signal from each pixel circuit 100 included in a frame of the sensor unit 10 will be appropriately described as an operation of reading out a pixel from a frame. Further, an operation of reading out a pixel signal from each pixel circuit 100 of a row included in a frame will be appropriately described as an operation of reading out a row.

Further, in the pixel array unit 101, with respect to the row and column of each pixel circuit 100, a pixel signal line 106 is connected to each row, and a vertical signal line VSL is connected to each column. The end portion of the pixel signal line 106 not connected to the pixel array unit 101 is connected to the vertical scanning unit 102. Under the control of a control unit 1100 described below, the vertical scanning unit 102 transmits a control signal (such as a drive pulse used when reading out a pixel signal from a pixel) to the pixel array unit 101 via the pixel signal line 106. The end of the vertical signal line VSL not connected to the pixel array unit 101 is connected to the AD conversion unit 103. The pixel signal read out from the pixel is sent to the AD conversion unit 103 via the vertical signal line VSL.

Readout control of pixel signals from the pixel circuit 100 will be schematically described. A pixel signal is read out from the pixel circuit 100 by transferring the charges, which have been stored in the photoelectric conversion element by the exposure, to a floating diffusion layer (FD), and converting the charges transferred to the floating diffusion layer into a voltage. The voltage converted from the charges in the floating diffusion layer is output to the vertical signal line VSL via an amplifier.

More specifically, during exposure, the pixel circuit 100 is set to turn off (open) the connection between the photoelectric conversion element and the floating diffusion layer, so that the charges generated by photoelectric conversion of the incident light are stored in the photoelectric conversion element. After the exposure is finished, the floating diffusion layer and the vertical signal line VSL are connected according to a selection signal supplied via the pixel signal line 106. Further, the floating diffusion layer is connected to a supply line of a power supply voltage VDD or a black level voltage for a short period of time in accordance with a reset pulse supplied via the pixel signal line 106, so as to reset the floating diffusion layer. The reset level voltage (defined as voltage A) of the floating diffusion layer is output to the vertical signal line VSL. Thereafter, a transfer pulse supplied via the pixel signal line 106 turns on (closes) the connection between the photoelectric conversion element and the floating diffusion layer so as to transfer the charges stored in the photoelectric conversion element to the floating diffusion layer. A voltage (defined as voltage B) corresponding to the amount of charge in the floating diffusion layer is output to the vertical signal line VSL.

The AD conversion unit 103 includes an AD converter 107 provided for each vertical signal line VSL, a reference signal generator 104, and a horizontal scanning unit 105. The AD converter 107 is a column AD converter that performs AD conversion processing for each column of the pixel array unit 101. The AD converter 107 performs an AD conversion process on the pixel signal supplied from the pixel circuit 100 via the vertical signal line VSL, thereby generating two digital values (values corresponding to the voltages A and B) for a Correlated Double Sampling (CDS) process for noise reduction.


Based on the control signal input from the control unit 1100, the reference signal generator 104 generates a ramp signal as a reference signal, which is used by each AD converter 107 to convert the pixel signal into two digital values. The ramp signal is a signal whose level (voltage value) decreases with a constant slope with respect to time, or a signal whose level decreases stepwise. The reference signal generator 104 supplies the generated ramp signal to each AD converter 107. The reference signal generator 104 is configured by using a digital-to-analog converter (DAC) or the like.

The AD converter 107 includes, for example, a comparator and a counter. When a ramp signal whose voltage drops stepwise at a predetermined inclination is supplied from the reference signal generator 104, the counter starts counting according to the clock signal. The comparator compares the voltage of the pixel signal supplied from the vertical signal line VSL with the voltage of the ramp signal, and stops the counting of the counter at the timing when the voltage of the ramp signal crosses the voltage of the pixel signal. When the counting is stopped, the AD converter 107 outputs a value corresponding to the count value, thereby converting the pixel signal as an analog signal into a digital value.

The AD converter 107 supplies the generated two digital values to the signal processing unit 1101. The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 107, thereby generating a pixel signal (pixel data) formed of a digital signal. The pixel signal, which is a digital signal generated by the signal processing unit 1101, is output to the outside of the sensor unit 10.
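For illustration, the single-slope conversion and the CDS step described above can be sketched as follows. The voltage values, ramp step, and function names are arbitrary assumptions, and the sketch ignores the parallel column structure, the analog gain, and the actual ramp generation.

```python
def ramp_adc(pixel_voltage: float, ramp_start: float = 1.0,
             step: float = 0.001) -> int:
    """Count clock cycles until the falling ramp crosses the pixel
    voltage, mimicking the comparator/counter in the AD converter 107."""
    count, ramp = 0, ramp_start
    while ramp > pixel_voltage:
        ramp -= step
        count += 1
    return count

# Correlated double sampling: digitize the reset level (voltage A) and the
# signal level (voltage B), then take the difference so that offsets common
# to both samples cancel out.
voltage_a = 0.70   # reset level of the floating diffusion (arbitrary value)
voltage_b = 0.45   # level after charge transfer (arbitrary value)
pixel_value = ramp_adc(voltage_b) - ramp_adc(voltage_a)
print(pixel_value)
```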

Under the control of the control unit 1100, the horizontal scanning unit 105 performs selective scanning in which each AD converter 107 is selected in a predetermined order so that each digital value temporarily held in each AD converter 107 is sequentially output to the signal processing unit 1101. The horizontal scanning unit 105 is implemented as a shift register or an address decoder, for example.

Based on the imaging control signal supplied from the sensor controller 11, the control unit 1100 performs drive control of the vertical scanning unit 102, the AD conversion unit 103, the reference signal generator 104, the horizontal scanning unit 105, and the like. The control unit 1100 generates various driving signals as references for the operations of the vertical scanning unit 102, the AD conversion unit 103, the reference signal generator 104, and the horizontal scanning unit 105. For example, based on a vertical synchronization signal or an external trigger signal included in the imaging control signal and a horizontal synchronization signal, the control unit 1100 generates a control signal to be supplied to each pixel circuit 100 by the vertical scanning unit 102 via the pixel signal line 106. The control unit 1100 supplies the generated control signal to the vertical scanning unit 102.

Further, the control unit 1100 transfers, for example, information indicating the analog gain included in the imaging control signal supplied from the sensor controller 11 to the AD conversion unit 103. Based on the information indicating the analog gain, the AD conversion unit 103 controls the gain of the pixel signal input to each AD converter 107 included in the AD conversion unit 103 via the vertical signal line VSL.

Based on a control signal supplied from the control unit 1100, the vertical scanning unit 102 supplies various signals, including a drive pulse, to each pixel circuit 100 row by row via the pixel signal line 106 of the selected pixel row of the pixel array unit 101, so as to cause a pixel signal to be output from each pixel circuit 100 to the vertical signal line VSL. For example, the vertical scanning unit 102 is implemented as a shift register or an address decoder. Further, the vertical scanning unit 102 controls exposure in each pixel circuit 100 based on information indicating the exposure supplied from the control unit 1100.

The sensor unit 10 configured in this way is a column AD type Complementary Metal Oxide Semiconductor (CMOS) image sensor in which the AD converters 107 are arranged in columns.

[2. Examples of prior art applicable to the present disclosure]

Before describing each embodiment according to the present disclosure, an overview of the prior art applicable to the present disclosure will be given for ease of understanding.

(2-1. Overview of rolling shutter)

Known imaging methods used when imaging is performed by the pixel array unit 101 include a Rolling Shutter (RS) method and a Global Shutter (GS) method. First, the rolling shutter method will be schematically described. Fig. 5A, 5B, and 5C are schematic views illustrating a rolling shutter method. In the rolling shutter method, as shown in fig. 5A, for example, imaging is sequentially performed in units of lines from a line 201 at the upper end of a frame 200.

The above description uses "imaging" to represent an operation in which the sensor unit 10 outputs a pixel signal corresponding to light applied to the light receiving surface. More specifically, "imaging" indicates a series of operations from the start of exposure of a pixel until a pixel signal based on the charges stored in the photoelectric conversion element included in the pixel by the exposure is transferred to the sensor controller 11. Further, as described above, a frame refers to a region of the pixel array unit 101 in which the pixel circuits 100 effective for generating pixel signals are arranged.

For example, in the configuration of fig. 4, exposure is simultaneously performed for each of the pixel circuits 100 included in one row. After the exposure ends, pixel signals based on the electric charges stored by the exposure are simultaneously transferred in each of the pixel circuits 100 included in the row via each of the vertical signal lines VSL corresponding to each of the pixel circuits 100. By sequentially performing this operation in units of rows, imaging can be achieved with a rolling shutter.

Fig. 5B schematically shows an example of the relationship between imaging and time in the rolling shutter method. In fig. 5B, the vertical axis represents the line position, and the horizontal axis represents time. In the rolling shutter method, since exposure for each line is sequentially performed in the order of the line, the exposure timing for each line is sequentially shifted depending on the position of the line, as shown in fig. 5B. Therefore, for example, in the case where the horizontal positional relationship between the imaging apparatus 1 and the subject changes at a high speed, the captured image of the frame 200 is distorted as shown in fig. 5C. In the example of fig. 5C, the image 202 corresponding to the frame 200 is an image inclined at an angle corresponding to the speed and direction of change in the horizontal positional relationship between the imaging apparatus 1 and the subject.

In the rolling shutter method, the lines used for imaging can also be thinned. Fig. 6A, 6B, and 6C are schematic diagrams illustrating line thinning in the rolling shutter method. As shown in fig. 6A, similarly to the example of fig. 5A described above, imaging is performed in units of lines from the line 201 at the upper end of the frame 200 toward the lower end of the frame 200. At this time, imaging is performed while skipping lines by a predetermined number.

Here, for explanation, it is assumed that imaging is performed every other line by single-line thinning. That is, after the imaging of the n-th line, the imaging of the (n+2)-th line is performed. At this time, the time from the imaging of the n-th line to the imaging of the (n+2)-th line with thinning is assumed to be equal to the time from the imaging of the n-th line to the imaging of the (n+1)-th line without thinning.

Fig. 6B schematically shows an example of the relationship between imaging and time when single-line thinning is performed in the rolling shutter method. In fig. 6B, the vertical axis represents the line position, and the horizontal axis represents time. In fig. 6B, exposure A corresponds to the exposure of fig. 5B without thinning, while exposure B shows the exposure when single-line thinning is performed. As shown in exposure B, performing line thinning makes it possible to reduce the deviation of exposure timing at the same line position, as compared with the case where line thinning is not performed. Therefore, as shown in the image 203 in fig. 6C, distortion in the oblique direction occurring in the captured image of the frame 200 is smaller than in the case shown in fig. 5C where line thinning is not performed. On the other hand, the image resolution when line thinning is performed is lower than when line thinning is not performed.
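The timing relationship described above can be made concrete with the following sketch, which computes the readout start time of each imaged line with and without single-line thinning, under the assumption stated above that the per-line period is unchanged by thinning. The line count, line period, and names are illustrative only.

```python
def line_start_times(num_lines: int, line_period_us: float,
                     thinning: int = 1) -> dict:
    """Start time of each imaged line in a rolling shutter.
    thinning=1 images every line, thinning=2 images every other line,
    keeping the same per-line period (assumption from the text)."""
    imaged = range(0, num_lines, thinning)
    return {line: i * line_period_us for i, line in enumerate(imaged)}

full = line_start_times(1080, 10.0)        # no thinning
thinned = line_start_times(1080, 10.0, 2)  # single-line thinning
# The same line position is reached earlier when thinning is applied,
# which is why the timing deviation (and hence the distortion) is reduced.
print(full[540], thinned[540])             # 5400.0 vs 2700.0
```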

The above description is an example of the rolling shutter method in which imaging is sequentially performed in the order of lines from the upper end to the lower end of the frame 200. However, the present disclosure is not limited to this example. Fig. 7A and 7B are diagrams schematically illustrating examples of other imaging methods in the rolling shutter method. For example, as shown in fig. 7A, in the rolling shutter method, imaging may be sequentially performed in the order of lines from the lower end to the upper end of the frame 200. In this case, the direction of the horizontal distortion of the image 202 is opposite to that in the case where imaging is sequentially performed in the order of lines from the upper end to the lower end of the frame 200.

Further, for example, by setting the range of the vertical signal lines VSL used for transmitting pixel signals, a part of a line can also be selectively read out. Further, by separately setting the lines used for imaging and the vertical signal lines VSL used for transmitting pixel signals, it is also possible to set the imaging start and end lines to positions other than the upper and lower ends of the frame 200. Fig. 7B schematically shows an example in which a rectangular region 205 whose width and height are smaller than those of the frame 200 is set as the imaging range. In the example of fig. 7B, imaging is performed sequentially in the order of lines from the line 204 at the upper end of the region 205 toward the line at the lower end of the region 205.

(2-2. Overview of global shutter)

Next, the Global Shutter (GS) method will be schematically described as an imaging method used when imaging is performed by the pixel array unit 101. Fig. 8A, 8B, and 8C are schematic diagrams illustrating the global shutter method. As shown in fig. 8A, in the global shutter method, exposure is performed simultaneously in all the pixel circuits 100 included in the frame 200.

When the global shutter method is implemented in the configuration of fig. 4, as an example, a configuration using the pixel circuit 100 in which a capacitor is further provided between the photoelectric conversion element and the FD can be conceived. Further, the configuration further includes a first switch and a second switch, which are respectively provided between the photoelectric conversion element and the capacitor and between the capacitor and the floating diffusion layer, and opening and closing of each of the first switch and the second switch is controlled by a pulse supplied via the pixel signal line 106.

In such a configuration, in all the pixel circuits 100 included in the frame 200, the first switch and the second switch are set to be open during the exposure period, and then, when the exposure is completed, the first switch is switched from the open state to the closed state so as to transfer the charge from the photoelectric conversion element to the capacitor. Thereafter, in a case where the capacitor is regarded as a photoelectric conversion element, the charges will be read out from the capacitor in a similar order to that used in the readout operation of the rolling shutter method described above. This makes it possible to perform simultaneous exposure in all the pixel circuits 100 included in the frame 200.

Fig. 8B schematically shows an example of the relationship between imaging and time in the global shutter method. In fig. 8B, the vertical axis represents the line position, and the horizontal axis represents time. In the global shutter method, exposure is simultaneously performed in all the pixel circuits 100 included in the frame 200. This makes it possible to obtain the same exposure timing for each row, as shown in fig. 8B. Therefore, for example, even in the case where the horizontal positional relationship between the imaging apparatus 1 and the subject changes at a high speed, the image 206 captured in the frame 200 is not distorted by the change, as shown in fig. 8C.

The global shutter method makes it possible to ensure simultaneity of the exposure timing in all the pixel circuits 100 included in the frame 200. Therefore, by controlling the timing of each pulse supplied via the pixel signal line 106 of each row and the timing of transmission via each vertical signal line VSL, sampling (readout of pixel signals) can be realized in various patterns.

Fig. 9A and 9B are diagrams schematically illustrating examples of sampling patterns that can be implemented in the global shutter method. Fig. 9A is an example in which samples 208 for pixel signal readout are extracted in a lattice pattern from the pixel circuits 100 arranged in a matrix in the frame 200. Fig. 9B is an example in which samples 208 for pixel signal readout are extracted from the pixel circuits 100 in a grid pattern. Further, also in the global shutter method, imaging may be performed sequentially in the order of lines, similarly to the rolling shutter method described above.
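One way to picture such sampling patterns (only a rough reading of fig. 9A and 9B, not a reproduction of them) is as boolean masks over the pixel matrix, as in the sketch below; the mask shapes and function names are assumptions made purely for illustration.

```python
import numpy as np

def checkerboard_mask(rows: int, cols: int) -> np.ndarray:
    """Select pixels where (row + column) is even (checkerboard-like)."""
    r, c = np.indices((rows, cols))
    return (r + c) % 2 == 0

def grid_mask(rows: int, cols: int, step: int) -> np.ndarray:
    """Select one pixel every `step` rows and columns (regular grid)."""
    mask = np.zeros((rows, cols), dtype=bool)
    mask[::step, ::step] = True
    return mask

# With a global shutter, all pixels share the same exposure timing, so
# either mask can be read out without introducing per-line timing skew.
print(checkerboard_mask(4, 4).sum(), grid_mask(8, 8, 4).sum())  # 8 4
```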

(2-3. Deep Neural Network (DNN))

Next, a recognition process using a Deep Neural Network (DNN) applicable to each embodiment will be schematically described. In each embodiment, the identification process of the image data is performed by using a Convolutional Neural Network (CNN) in the DNN, and particularly, a Recurrent Neural Network (RNN). Hereinafter, "recognition processing of image data" will be referred to as "image recognition processing" or the like as appropriate.

(2-3-1. Overview of CNN)

First, the CNN will be schematically described. In the image recognition process using a CNN, the image recognition process is performed based on image information provided by pixels arranged in a matrix. Fig. 10 is a diagram schematically illustrating the image recognition process performed by the CNN. The pixel information 51 of the entire image 50 containing a drawing of the number "8", which is the object to be recognized, is processed by a CNN 52 that has been trained in a predetermined manner. By this processing, the number "8" is recognized as the recognition result 53.

In contrast, the processing of the CNN may also be applied to the image of each line to obtain a recognition result from a part of the image to be recognized. Fig. 11 is a diagram schematically illustrating the image recognition process of obtaining a recognition result from a part of the image to be recognized. In fig. 11, an image 50' is a partially acquired image of the number "8", which is the object to be recognized, obtained in units of lines. For example, the pieces of pixel information 54a, 54b, and 54c of the respective lines forming the pixel information 51' of the image 50' are sequentially processed by a CNN 52' that has been trained in a predetermined manner.

For example, assume a case in which the recognition result 53a obtained by the recognition processing performed by the CNN 52' on the pixel information 54a of the first line is not a valid recognition result. Here, a valid recognition result refers to a recognition result whose score, indicating the reliability of the result, is a predetermined value or higher. The CNN 52' performs an internal state update 55 based on the recognition result 53a. Next, the pixel information 54b of the second line is subjected to recognition processing by the CNN 52', whose internal state has been updated (update 55) based on the previous recognition result 53a. In fig. 11, as a result of this processing, a recognition result 53b indicating that the number to be recognized is either "8" or "9" is obtained. Further, based on the recognition result 53b, the internal state of the CNN 52' is updated (update 55) again. Next, the pixel information 54c of the third line is subjected to recognition processing by the CNN 52', whose internal state has been updated based on the previous recognition result 53b. As a result, in fig. 11, the number to be recognized is narrowed down from "8" or "9" to "8".

Here, the recognition processing shown in fig. 11 updates the internal state of the CNN using the result of the previous recognition processing. Next, the CNN whose internal state has been updated performs recognition processing using the pixel information of the line adjacent to the line on which the previous recognition processing was performed. That is, the recognition processing shown in fig. 11 is executed sequentially over the image, line by line, while the internal state of the CNN is updated based on the previous recognition result. Therefore, the recognition processing shown in fig. 11 is processing that is repeatedly executed in the order of lines and can be considered to have a structure equivalent to that of an RNN.
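The line-sequential flow of fig. 11 can be sketched in a few lines of code. The following is a minimal sketch, assuming hypothetical placeholder functions (extract_features, update_state, classify) in place of the trained CNN 52' and its internal state update 55, and an assumed score threshold for a valid recognition result; it only illustrates the data flow and the early stop, not the actual learned model.

```python
import numpy as np

VALID_SCORE = 0.8  # assumed threshold for a "valid" recognition result

def extract_features(line_pixels: np.ndarray) -> np.ndarray:
    # Placeholder feature extraction for one line of pixel data.
    return line_pixels.astype(np.float32) / 255.0

def update_state(state: np.ndarray, features: np.ndarray) -> np.ndarray:
    # Stands in for the internal state update 55.
    return 0.5 * state + 0.5 * float(features.mean()) * np.ones_like(state)

def classify(state: np.ndarray) -> tuple:
    # Placeholder classifier returning (label, score).
    score = float(np.clip(np.abs(state).mean(), 0.0, 1.0))
    return "8", score

def recognize_line_by_line(lines):
    state = np.zeros(16, dtype=np.float32)      # internal state of the CNN
    label, score = None, 0.0
    for line in lines:                          # lines read out in order
        state = update_state(state, extract_features(line))
        label, score = classify(state)
        if score >= VALID_SCORE:                # valid result: stop reading
            break
    return label, score
```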

(2-3-2. overview of RNN)

Next, the RNN will be schematically described. Fig. 12A and 12B are diagrams schematically illustrating an example of recognition processing performed by a DNN when time-series information is not used. In this case, as shown in fig. 12A, one image is input to the DNN. The DNN performs recognition processing on the input image and outputs a recognition result.

Fig. 12B is a diagram for providing a more detailed illustration of the process of fig. 12A. As shown in fig. 12B, the DNN performs a feature extraction process and a recognition process. The DNN performs a feature extraction process on the input image, thereby extracting feature data of the image. Further, the DNN performs recognition processing on the extracted feature data, and obtains a recognition result.

Fig. 13A and 13B are diagrams schematically illustrating a first example of recognition processing performed by a DNN when time-series information is used. In the example of fig. 13A and 13B, recognition processing with the DNN is performed using a fixed number of pieces of past information in the time series. In the example of fig. 13A, the image [T] at time T, the image [T-1] at time T-1 before time T, and the image [T-2] at time T-2 before time T-1 are input to the DNN. The DNN performs recognition processing on each of the input images [T], [T-1], and [T-2] to obtain the recognition result [T] at time T.

Fig. 13B is a diagram providing a more detailed illustration of the process of fig. 13A. As shown in fig. 13B, the DNN performs the feature extraction process described above with reference to fig. 12B on each of the input images [T], [T-1], and [T-2], thereby extracting feature data corresponding to each of the images. The DNN integrates the feature data obtained from these images [T], [T-1], and [T-2], and further performs recognition processing on the integrated feature data, thereby obtaining the recognition result [T] at time T.
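A minimal sketch of this fixed-window approach follows, assuming placeholder extract and recognize functions that stand in for the feature extraction and recognition stages of fig. 13B; note that one feature extraction run is needed per past image, which is the drawback discussed next.

```python
import numpy as np

def extract(image: np.ndarray) -> np.ndarray:
    # Placeholder feature extraction for one 2-D image (one run per image).
    return image.astype(np.float32).mean(axis=0)

def recognize(features: np.ndarray) -> float:
    # Placeholder recognition on the integrated feature data.
    return float(features.mean())

def recognize_with_window(images):
    # images = [image[T-2], image[T-1], image[T]]
    feats = [extract(img) for img in images]     # one extractor run per image
    integrated = np.concatenate(feats)           # integrate the feature data
    return recognize(integrated)                 # recognition result [T]
```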

The method of fig. 13A and 13B requires a configuration for performing feature data extraction for each of the available past images, which results in an enlarged DNN configuration.

Fig. 14A and 14B are diagrams schematically showing a second example of recognition processing performed by a DNN when time-series information is used. In the example of fig. 14A, the image [T] at time T is input to the DNN whose internal state has been updated to the state of time T-1, thereby obtaining the recognition result [T] at time T.

Fig. 14B is a diagram providing a more detailed illustration of the process of fig. 14A. As shown in fig. 14B, the DNN performs the feature extraction process described above with reference to fig. 12B on the image [T] input at time T, thereby extracting feature data corresponding to the image [T]. In the DNN, the internal state has been updated using the images before time T, and the feature data related to the updated internal state is retained. The feature data related to the retained internal state is integrated with the feature data of the image [T], and recognition processing is performed on the integrated feature data.

The identification processing shown in fig. 14A and 14B is performed using DNN whose internal state has been updated using the immediately preceding identification result, and is thus loop processing. The DNN that performs the cyclic processing in this manner is called a Recurrent Neural Network (RNN). The recognition processing performed by the RNN is generally used for moving image recognition or the like, in which recognition accuracy can be improved by sequentially updating the internal state of the DNN with frame images updated in time series, for example.

In the present disclosure, the RNN is applied to the rolling shutter structure. That is, in the rolling shutter method, pixel signals are read out sequentially in the order of lines, and the pixel signals sequentially read out in the order of lines are applied to the RNN as time-series information. This makes it possible to perform recognition processing based on a plurality of lines with a smaller configuration than in the case of using the CNN of fig. 13B. The RNN may also be applied to the global shutter structure; in that case, it is conceivable to regard adjacent lines as time-series information, for example.

(2-4. drive speed)

Next, the relationship between the frame driving speed and the pixel signal readout amount will be described with reference to fig. 15A and 15B. Fig. 15A is a diagram showing an example of reading out all the lines in an image. Here, it is assumed that the resolution of the image subject to recognition processing is 640 pixels horizontally × 480 pixels vertically (480 lines). In this case, driving at a driving speed of 14400 [lines/second] enables output at 30 [frames/second (fps)].

Next, assume a case where imaging is performed with line thinning. For example, as shown in fig. 15B, assume that imaging is performed by skipping every other line, that is, readout thinned by 1/2. As a first example of 1/2 thinning, driving at the driving speed of 14400 [lines/second] described above halves the number of lines read out from the image. Although the resolution is reduced, output can be achieved at 60 [fps], twice the rate without thinning, thereby increasing the frame rate. A second example of 1/2 thinning is driving at a driving speed of 7200 [lines/second], half that of the first example. In this case, power savings are achieved, although the frame rate will be 30 [fps], the same as without thinning.
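For concreteness, the relationship between drive speed, line count, and thinning can be expressed as a small calculation. The helper below is only illustrative; the printed values reproduce the 640 × 480 figures quoted above.

```python
def frame_rate(drive_speed_lines_per_sec: float, total_lines: int,
               thinning: int = 1) -> float:
    """Frames per second when every `thinning`-th line is read out."""
    lines_read_per_frame = total_lines / thinning
    return drive_speed_lines_per_sec / lines_read_per_frame

print(frame_rate(14400, 480))      # 30.0 fps: no thinning
print(frame_rate(14400, 480, 2))   # 60.0 fps: 1/2 thinning, same drive speed
print(frame_rate(7200, 480, 2))    # 30.0 fps: 1/2 thinning, half drive speed
```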

When reading out the lines of an image, whether to perform no thinning, to perform thinning and increase the driving speed, or to perform thinning while keeping the driving speed the same as in the case without thinning may be selected according to the purpose of the recognition processing based on the pixel signals to be read out.

[3. summary of the present disclosure ]

Hereinafter, each embodiment of the present disclosure will be described in more detail. First, the processing according to each embodiment of the present disclosure will be schematically described. Fig. 16 is a diagram schematically illustrating the recognition processing according to each embodiment of the present disclosure. In fig. 16, in step S1, the imaging apparatus 1 (refer to fig. 1) according to each embodiment starts imaging a target image to be recognized.

Note that the target image is, for example, an image of a handwritten numeral "8". Further, as a premise, the memory 13 stores in advance, as a program, a learning model trained with predetermined training data so as to be able to recognize numbers, and the recognition processing unit 12 reads out the program from the memory 13 and executes it, thereby becoming able to recognize numbers included in an image. Further, the imaging apparatus 1 performs imaging by the rolling shutter method. Even when the imaging apparatus 1 performs imaging by the global shutter method, the following processing can be applied in the same manner as in the rolling shutter case.

When imaging is started, the imaging apparatus 1 sequentially reads out frames in units of lines from the upper end side to the lower end side of the frame in step S2.

When lines have been read out up to a certain position, the recognition processing unit 12 recognizes the number "8" or "9" from the image of the read-out lines (step S3). For example, the numbers "8" and "9" share a common feature in their upper halves. Thus, at the point where the lines have been read out sequentially from the top and this common feature is recognized, the recognized object can be identified as either the number "8" or the number "9".

Here, as shown in step S4a, by reading out up to the line at the lower end of the frame or a line near the lower end, the entire picture of the recognized object appears, and the object identified as the number "8" or "9" in step S3 is now determined to be the number "8".

In contrast, steps S4b and S4c are processes related to the present disclosure.

As shown in step S4b, when lines are further read out from the line position reached in step S3, the recognized object can be identified as the number "8" even before reaching the lower end of the number "8". For example, the lower half of the number "8" and the lower half of the number "9" have different features. By reading out the lines of the portion where this difference in features becomes clear, it is possible to identify whether the object recognized in step S3 is the number "8" or "9". In the example of fig. 16, the object is determined to be the number "8" in step S4b.

Further, as shown in step S4c, it is also conceivable, from the line position of step S3 and in the state of step S3, to jump to a line position at which it seems possible to determine whether the object recognized in step S3 is the number "8" or the number "9". By reading out the line reached by the jump, it can be determined whether the object recognized in step S3 is the number "8" or "9". The line position to jump to may be determined based on a learning model trained in advance with predetermined training data.

Here, in the case where the object is recognized in step S4b or step S4c described above, the imaging apparatus 1 can end the recognition processing. This makes it possible to shorten the recognition processing time and save power in the imaging apparatus 1.

Note that the training data is data holding a plurality of combinations of input signals and output signals for each readout unit. As an example, in the task of recognizing the numbers described above, data of each readout unit (line data, subsampled data, etc.) can be used as the input signal, and data indicating the "correct number" can be used as the output signal. As another example, in a task of detecting an object, data of each readout unit (line data, subsampled data, etc.) can be used as the input signal, and an object class (person/vehicle/no object), coordinates (x, y, h, w) of the object, and the like can be used as the output signal. Further, the output signal may be generated from the input signal alone by using self-supervised learning.
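As an illustration of how such per-readout-unit training samples could be organized, here is a minimal sketch; the field names and the container class are assumptions introduced for this example only, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ReadoutUnitSample:
    unit_data: List[int]                 # pixel values of one readout unit (e.g., one line)
    correct_digit: Optional[int] = None  # output signal for the digit-recognition task
    object_class: Optional[str] = None   # e.g., "person", "vehicle", "no object"
    object_box: Optional[Tuple[int, int, int, int]] = None  # (x, y, h, w)

# Digit-recognition sample: line data paired with the correct number.
digit_sample = ReadoutUnitSample(unit_data=[0, 12, 255, 240], correct_digit=8)

# Object-detection sample: line data paired with class and coordinates.
detection_sample = ReadoutUnitSample(unit_data=[3, 7, 99, 180],
                                     object_class="person",
                                     object_box=(120, 40, 80, 32))
```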

[4. first embodiment ]

Next, a first embodiment of the present disclosure will be described.

(4-1. identifying operation example in processing Unit)

In the imaging apparatus 1 according to the first embodiment, as described above, the recognition processing unit 12 reads out and executes the program stored in the memory 13 as a learning model trained in advance based on predetermined training data, thereby serving as a recognizer using DNN.

Fig. 17 is a flowchart showing an example of the recognition processing performed by the recognition processing unit 12 according to the first embodiment. In fig. 17, in step S121, the DSP constituting the recognition processing unit 12 in the imaging apparatus 1 reads out the learning model from the memory 13 and executes the learning model. By this processing, the DSP functions as the recognition processing unit 12.

Next, in step S122, the recognition processing unit 12 in the imaging apparatus 1 instructs the sensor controller 11 to start reading out the frame from the sensor unit 10. In this frame readout, for example, image data of one frame is read out sequentially in units of lines. In step S123, the recognition processing unit 12 determines whether image data of a predetermined number of lines in the frame has been read out.

When the recognition processing unit 12 determines that the image data of the predetermined number of lines in one frame has been read out ("yes" in step S123), the recognition processing unit 12 proceeds to the processing in step S124. In step S124, the recognition processing unit 12 performs recognition processing as machine learning processing using CNN on the read image data of a predetermined number of lines. That is, the recognition processing unit 12 performs machine learning processing on the image data of a predetermined number of lines as a unit area using a learning model. Machine learning processing using CNN includes performing various recognition or detection processes such as face detection, face authentication, line-of-sight detection, facial expression recognition, face direction detection, object recognition, motion (moving body) detection, pet detection, scene recognition, state detection, avoidance target object recognition, and other processes.

Here, the face detection is a process of detecting a face of a person included in the image data. The face authentication is one of biometrics authentications, and is a process of authenticating whether or not the face of a person included in image data matches the face of a person registered in advance. The line of sight detection is a process of detecting the direction of a line of sight of a person included in image data. Facial expression recognition is a process of recognizing a facial expression of a person included in image data. The face direction detection is a process of detecting the up/down direction of the face of a person included in the image data. Object detection is a process of detecting an object included in image data. Object recognition is a process of recognizing what an object included in image data is. Motion (moving body) detection is processing of detecting a moving body included in image data. Pet detection is a process of detecting a pet such as a dog or a cat included in image data. Scene recognition is a process of recognizing a scene (sea, mountain, etc.) being photographed. The state detection is a process of detecting the state (normal state, abnormal state, etc.) of a person or the like included in the image data. The avoidance target object recognition is a process of recognizing an object existing ahead in the traveling direction as an avoidance target when the person moves. The machine learning process performed by the recognition processing unit 12 is not limited to these examples.

In step S125, the recognition processing unit 12 determines whether the machine learning processing using CNN in step S124 is successful. When the recognition processing unit 12 determines that the machine learning processing using the CNN is successful (yes in step S125), the recognition processing unit 12 proceeds to the processing of step S129. In contrast, when the recognition processing unit 12 determines that the machine learning processing using CNN in step S124 fails (no in step S125), the recognition processing unit 12 proceeds to the processing of step S126. In step S126, the recognition processing unit 12 waits for reading out the image data of the next predetermined number of lines from the sensor controller 11 (no in step S126).

In the present description, success of the machine learning process means that a specific detection result, identification result, or authentication has been obtained in, for example, face detection, face authentication, or the like as described above. In contrast, failure of the machine learning process means that sufficient detection results, recognition results, and authentication have not been obtained in face detection, face authentication, and the like, for example, as described above.

Next, in step S126, when reading out the image data (unit area) of the next predetermined number of lines (yes in step S126), the recognition processing unit 12 performs machine learning processing using RNN on the read out image data of the predetermined number of lines in step S127. The machine learning process using the RNN also uses the result of the machine learning process using the CNN or RNN performed so far for the image data of the same frame.

In a case where it has been determined in step S128 that the machine learning process using the RNN in step S127 has succeeded (yes in step S128), the recognition processing unit 12 proceeds to the process of step S129.

In step S129, the recognition processing unit 12 supplies the machine learning result successfully obtained in step S124 or step S127 to the output controller 15. The machine learning result output in step S129 is, for example, a valid recognition result obtained by the recognition processing unit 12. The recognition processing unit 12 may also store the machine learning result in the memory 13.

Further, when the recognition processing unit 12 determines in step S128 that the machine learning process using the RNN in step S127 fails (no in step S128), the recognition processing unit 12 proceeds to the process of step S130. In step S130, the recognition processing unit 12 determines whether the readout of the image data of one frame is completed. When it has been determined that the readout of the image data for one frame has not been completed (no in step S130), the recognition processing unit 12 returns the processing to step S126, and in step S126, the processing of the image data for the next predetermined number of lines is to be performed.

In contrast, when the recognition processing unit 12 determines in step S130 that the readout of the image data of one frame is completed (yes in step S130), the recognition processing unit 12 determines in step S131 whether to end the series of processing of the flowchart in fig. 17. When the recognition processing unit 12 determines not to end the processing (no in step S131), the recognition processing unit 12 returns the processing to step S122, and performs a similar operation on the next frame. When the recognition processing unit 12 determines to end the processing (yes at step S131), the recognition processing unit 12 ends a series of processing of the flowchart of fig. 17.

Whether to advance to the next frame in step S131 may be determined based on whether an end instruction has been input from the outside of the imaging apparatus 1 or based on whether a series of processing of image data for a predetermined number of frames has been completed.

Further, there is an assumed case where machine learning processing such as face detection, face authentication, line of sight detection, facial expression recognition, face direction detection, object recognition, motion (moving body) detection, scene recognition, or state detection is performed successively. In this case, in the case where the previous machine learning process fails, the subsequent machine learning process may be skipped. For example, when face authentication is to be performed after face detection, in the case where face detection has failed, the latter face authentication process may be skipped.
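The control flow of fig. 17, including the early exit once a result is obtained, can be condensed into the following sketch. The functions cnn_recognize and rnn_recognize, and the score threshold used as the success criterion, are hypothetical placeholders for the learning model executed on the DSP; the sketch shows only the loop structure, not the actual processing.

```python
def recognize_frame(frame_lines, lines_per_unit, cnn_recognize, rnn_recognize):
    """Process a frame unit by unit; stop as soon as a result is valid."""
    prev_result = None
    for offset in range(0, len(frame_lines), lines_per_unit):
        unit = frame_lines[offset:offset + lines_per_unit]   # steps S123 / S126
        if prev_result is None:
            result, ok = cnn_recognize(unit)                 # step S124 (CNN)
        else:
            result, ok = rnn_recognize(unit, prev_result)    # step S127 (RNN)
        if ok:                                               # steps S125 / S128
            return result        # step S129: output; remaining lines are skipped
        prev_result = result
    return None                  # readout of one frame completed without success (S130)

# Example usage with trivial stand-ins for the recognizers (assumed):
dummy_cnn = lambda unit: (("face", 0.4), False)
dummy_rnn = lambda unit, prev: (("face", prev[1] + 0.5), prev[1] + 0.5 > 0.8)
print(recognize_frame(list(range(480)), 60, dummy_cnn, dummy_rnn))
```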

(4-2. identifying specific examples of operations in processing units)

Next, the operation of the machine learning unit described with reference to fig. 17 will be described with reference to a specific example. Hereinafter, a case of performing face detection using DNN will be described.

Fig. 18 is a diagram showing an example of image data of one frame. Fig. 19 is a diagram showing the flow of the machine learning process executed by the recognition processing unit 12 according to the first embodiment.

When performing face detection on image data as shown in fig. 18 by machine learning, as shown in part (a) of fig. 19, the recognition processing unit 12 first receives input of image data of a predetermined number of lines (corresponding to step S123 in fig. 17). The recognition processing unit 12 performs face detection by performing machine learning processing using CNN on the image data of the predetermined number of lines that has been input (corresponding to step S124 in fig. 17). However, since the image data of the entire face has not been input at the stage of part (a) of fig. 19, the recognition processing unit 12 fails in the face detection (corresponding to no in step S125 of fig. 17).

Subsequently, as shown in part (b) of fig. 19, the image data of the next predetermined number of lines is input to the recognition processing unit 12 (corresponding to step S126 of fig. 17). Using the result of the machine learning process by the CNN performed on the image data of the predetermined number of lines input in part (a) of fig. 19, the recognition processing unit 12 performs machine learning processing using the RNN on the newly input image data of the predetermined number of lines, thereby performing face detection (corresponding to step S127 in fig. 17).

At the stage of part (b) of fig. 19, the image data of the entire face has been input, including the pixel data of the predetermined number of lines input at the stage of part (a) of fig. 19. Therefore, at the stage of part (b) of fig. 19, the recognition processing unit 12 succeeds in face detection (corresponding to "yes" in step S128 of fig. 17). As a result, the face detection result is output without reading out the subsequent image data (the image data of parts (c) to (f) of fig. 19) (corresponding to step S129 in fig. 17).

In this way, by performing the machine learning process using DNN on the image data of a predetermined number of lines, it is possible to omit reading out the image data and performing the machine learning process after the point at which the face detection succeeds. This makes it possible to complete processes such as detection, identification, and authentication in a short time, thereby reducing processing time and power consumption.

The predetermined number of rows is the number of rows determined by the size of the filter required by the algorithm of the learning model, and the minimum number is one row.

Further, the image data read out from the sensor unit 10 by the sensor controller 11 may be image data thinned in at least one of a column direction and a row direction. In this case, for example, when image data is read out every other row in the column direction, image data on the 2(N-1) th row (N is an integer of 1 or more) will be read out.

Further, in the case where the filter required by the learning model algorithm is formed not in units of lines but as a rectangular region in units of pixels, such as 1 × 1 pixels or 5 × 5 pixels, image data of a rectangular region corresponding to the shape and size of the filter may be input to the recognition processing unit 12, instead of image data of a predetermined number of lines, as the image data of the unit region on which the recognition processing unit 12 performs the machine learning process.

Further, although the CNN and RNN are explained above as examples of DNN, the present disclosure is not limited to these, and other learning models may be used.

(4-3. application example of the first embodiment)

Next, an application example of the first embodiment will be described. Here, as an application example of the first embodiment, the exposure of the predetermined number of lines to be read out next is controlled based on, for example, the result of the machine learning process by the CNN in step S124 or the result of the machine learning process by the RNN in step S127 of the flowchart of fig. 17. Fig. 20A and 20B are schematic diagrams showing the application example of the first embodiment.

Part (a) of fig. 20A is a schematic diagram showing an example of an overexposed image 60a. Overexposure causes the image 60a to appear whitish as a whole. For example, the monitor 62, which is an object included in the image 60a, exhibits a phenomenon called blown-out highlights on its screen, making it difficult for the human eye to distinguish details. On the other hand, the person 61, another object included in the image 60a, appears slightly whitish due to the overexposure, but is easier for the human eye to recognize than the monitor 62.

Part (b) of fig. 20A is a schematic diagram showing an example of the underexposed image 60 b. Underexposure of the image 60b causes the image 60b to appear blackened overall. For example, the person 61 visible in the image 60a is now difficult to recognize by the human eye. On the other hand, the monitor 62 included in the image 60b can be recognized by the human eye in detail as compared with the image 60 a.

Fig. 20B is a schematic diagram showing a readout method according to an application example of the first embodiment. Parts (a) and (B) of fig. 20B show a case where frame readout is started in the underexposed state in step S122 of the flowchart of fig. 17 described above.

Part (a) of fig. 20B shows a readout method of the first example in the application example according to the first embodiment. The image 60c of part (a) of fig. 20B indicates that the recognition process using CNN has failed with respect to the line L #1 at the top of the frame in step S124, for example, or that the score indicating the reliability of the recognition result is a predetermined value or less. In this case, the recognition processing unit 12 instructs the sensor controller 11 to set the exposure of the line L #2 read out in step S126 to an exposure suitable for the recognition processing (in this case, to a larger exposure amount). In fig. 20B, the rows L #1, L #2, etc. may be one single row, or may be a plurality of rows adjacent to each other.

In the example in part (a) of fig. 20B, the exposure amount of the line L#2 is larger than that of the line L#1. In this case, assume that the result is overexposure of the line L#2, and that the recognition processing using the RNN has failed in step S127, or that the score is a predetermined value or less. The recognition processing unit 12 instructs the sensor controller 11 to set the exposure amount of the line L#3, to be read out after the processing returns from step S130 to step S126, smaller than the exposure amount of the line L#2. Similarly, the exposure amounts of the subsequent lines L#4, ..., L#m, ... are set in order according to the results of the recognition processing.

In this way, by adjusting the exposure amount of a line to be read out next based on the recognition result of a specific line, the recognition processing can be performed with higher accuracy.

Further, as a further application of the above-described example, as shown in part (b) of fig. 20B, a method is also conceivable in which the exposure is reset at the point where readout has been completed up to a predetermined line, and readout is then performed again from the first line of the frame. As shown in part (b) of fig. 20B, the recognition processing unit 12 reads out from the line L#1 at the top of the frame to the line L#m (first pass), similarly to part (a) described above, and resets the exposure based on the recognition results. The recognition processing unit 12 then reads out the lines L#1, L#2, etc. of the frame again (second pass) based on the reset exposure.

In this way, the exposure is reset based on the readout results of the predetermined number of lines, and the lines L #1, L #2, … are read out again from the top of the frame based on the reset exposure, so that the recognition processing can be performed with higher accuracy.
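The per-line exposure adjustment of part (a) of fig. 20B can be sketched as a simple update rule. The function below is a minimal sketch under assumed names and step sizes; the actual adjustment is issued from the recognition processing unit 12 to the sensor controller 11 and is not specified numerically in the text.

```python
def next_line_exposure(current: float, score: float, underexposed: bool,
                       threshold: float = 0.5, step: float = 0.5) -> float:
    """Exposure to use for the next line, given the previous line's result."""
    if score >= threshold:
        return current                 # recognition succeeded; keep the exposure
    # Recognition failed or gave a low score: move the exposure toward the
    # level judged suitable for the recognition processing.
    if underexposed:
        return current + step          # e.g., line L#1 was too dark -> brighter L#2
    return max(0.1, current - step)    # e.g., line L#2 was too bright -> darker L#3

exp_l2 = next_line_exposure(1.0, 0.2, underexposed=True)      # larger exposure for L#2
exp_l3 = next_line_exposure(exp_l2, 0.3, underexposed=False)  # smaller exposure for L#3
```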

[5. second embodiment ]

(5-0-1. configuration example according to the second embodiment)

Next, a second embodiment of the present disclosure will be described. The second embodiment is an extension of the recognition process according to the first embodiment described above. Fig. 21 is a functional block diagram showing an example of functions of the image forming apparatus according to the second embodiment. Note that fig. 21 omits illustrations of the optical unit 30, the sensor unit 10, the memory 13, and the display unit 31 shown in fig. 1. In addition, fig. 21 has a trigger generator 16 added to the configuration of fig. 1.

In fig. 21, the sensor controller 11 includes a readout unit 110 and a readout controller 111. The recognition processing unit 12 includes a feature data calculation unit 120, a feature data storage controller 121, a readout determiner 123, and a recognition processing execution unit 124. The feature data storage controller 121 includes a feature data storage unit 122. Further, the visual recognition processing unit 14 includes an image data storage controller 140, a readout determiner 142, and an image processing unit 143. The image data storage controller 140 includes an image data storage unit 141.

In the sensor controller 11, the readout controller 111 receives readout area information indicating a readout area for readout performed by the recognition processing unit 12 from the readout determiner 123 included in the recognition processing unit 12. The readout area information indicates a line number of one or more lines. Without being limited thereto, the readout area information may be information indicating a pixel position in one row. Further, by providing readout region information obtained by combining one or more line numbers and information indicating pixel positions of one or more pixels in a line, readout regions of various patterns can be specified. The readout region corresponds to a readout unit. Without being limited thereto, the readout region and the readout unit may be different.

Similarly, the readout controller 111 receives readout area information indicating a readout area for readout performed by the visual recognition processing unit 14 from the readout determiner 142 included in the visual recognition processing unit 14.

Based on the readout area information received from these readout determiners 123 and 142, the readout controller 111 passes readout area information indicating the readout area actually used for readout to the readout unit 110. For example, in the case where there is a conflict between the readout area information received from the readout determiner 123 and the readout area information received from the readout determiner 142, the readout controller 111 may mediate between them and adjust the readout area information to be passed to the readout unit 110.
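One possible form of this mediation, corresponding to taking the logical product of the two requests, is sketched below. The line-set representation and the fallback policy when the intersection is empty are assumptions made for illustration only.

```python
def mediate_readout_regions(recognition_lines: set, visual_lines: set) -> set:
    """Lines actually passed to the readout unit 110 (one possible policy)."""
    agreed = recognition_lines & visual_lines     # logical product of the requests
    # Assumed fallback: if the requests do not overlap, honor the visual
    # recognition request so that output to the subsequent stage continues.
    return agreed if agreed else visual_lines

print(mediate_readout_regions({1, 2, 3, 4}, {3, 4, 5, 6}))   # {3, 4}
```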

Further, the readout controller 111 may receive information indicating exposure and analog gain from the readout determiner 123 or the readout determiner 142. The readout controller 111 passes the received information indicative of the exposure and the analog gain to the readout unit 110.

The readout unit 110 reads out pixel data from the sensor unit 10 according to readout region information transferred from the readout controller 111. For example, the readout unit 110 obtains a line number indicating a line to be read out and pixel position information indicating positions of pixels in the line to be read out based on the readout area information, and passes the obtained line number and pixel position information to the sensor unit 10. The readout unit 110 transfers the respective pixel data acquired from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14 together with the readout region information.

Further, the readout unit 110 sets the exposure and analog gain of the sensor unit 10 according to the information indicating the exposure and analog gain received from the readout controller 111. Further, the readout unit 110 may generate a vertical synchronization signal and a horizontal synchronization signal and supply the generated signals to the sensor unit 10.

In the recognition processing unit 12, the readout determiner 123 receives readout information indicating a readout region to be read out next from the feature data storage controller 121. The readout determiner 123 generates readout area information based on the received readout information, and passes the generated information to the readout controller 111.

Here, the readout determiner 123 may use, as the readout region indicated by the readout region information, information in which readout position information for reading out the pixel data of the readout unit has been added to a predetermined readout unit. The readout unit is a set of one or more pixels, and corresponds to the unit of processing performed by the recognition processing unit 12 and the visual recognition processing unit 14. As an example, when the readout unit is a line, a line number [L#x] indicating the line position is added as the readout position information. When the readout unit is a rectangular region including a plurality of pixels, information indicating the position of the rectangular region in the pixel array unit 101 (for example, information indicating the position of the pixel in the upper left corner) is added as the readout position information. The readout determiner 123 designates in advance the readout unit to be applied. Without being limited thereto, the readout determiner 123 may also determine the readout unit, for example, in response to an instruction from outside the readout determiner 123. Therefore, the readout determiner 123 functions as a readout unit controller that controls the readout unit.

Note that the readout determiner 123 may also determine the readout region to be read out next based on recognition information passed from the recognition processing execution unit 124, which will be described below, and may generate readout region information indicating the determined readout region.

Similarly, in the visual recognition processing unit 14, the readout determiner 142 receives readout information indicating a readout region to be read out next, for example, from the image data storage controller 140. The readout determiner 142 generates readout region information based on the received readout information, and passes the generated information to the readout controller 111.

In the recognition processing unit 12, the feature data calculation unit 120 calculates feature data in the area indicated by the readout area information based on the pixel data and the readout area information transferred from the readout unit 110. The feature data calculation unit 120 transfers the calculated feature data to the feature data storage controller 121.

As described below, the feature data calculation unit 120 may calculate feature data based on the pixel data transferred from the readout unit 110 and the past feature data transferred from the feature data storage controller 121. Without being limited thereto, the feature data calculation unit 120 may acquire information for setting exposure and analog gain from the readout unit 110, for example, and may further calculate feature data using the acquired information.

In the recognition processing unit 12, the feature data storage controller 121 stores the feature data transferred from the feature data calculation unit 120 in the feature data storage unit 122. Further, when feature data is passed from the feature data calculation unit 120, the feature data storage controller 121 generates readout information indicating the readout area for the next readout, and passes the generated information to the readout determiner 123.

Here, the feature data storage controller 121 may store the already stored feature data and the newly transferred feature data in an integrated manner. Further, the feature data storage controller 121 may delete unnecessary feature data from the feature data stored in the feature data storage unit 122. Examples of unnecessary feature data are feature data related to a previous frame, and feature data that was calculated and stored based on a frame image of a scene different from the frame image from which the new feature data is calculated. Further, the feature data storage controller 121 may also delete and initialize all the feature data stored in the feature data storage unit 122 as necessary.

Further, the feature data storage controller 121 generates feature data used for the recognition processing by the recognition processing execution unit 124 based on the feature data transferred from the feature data calculation unit 120 and the feature data stored in the feature data storage unit 122. The feature data storage controller 121 passes the generated feature data to the recognition processing executing unit 124.

The recognition processing execution unit 124 executes recognition processing based on the feature data transferred from the feature data storage controller 121. The recognition processing execution unit 124 performs object detection, face detection, and the like by the recognition processing. The recognition processing execution unit 124 passes the recognition result obtained by the recognition processing to the output controller 15. The recognition processing execution unit 124 may also pass recognition information including the recognition result generated by the recognition processing to the readout determiner 123. The recognition processing execution unit 124 may receive the feature data from the feature data storage controller 121 and execute the recognition processing in response to, for example, a trigger generated by the trigger generator 16.

Meanwhile, in the visual recognition processing unit 14, the image data storage controller 140 receives the pixel data read out from the readout area and the readout area information corresponding to the image data from the readout unit 110. The image data storage controller 140 stores the pixel data and the readout area information in the image data storage unit 141 in association with each other.

The image data storage controller 140 generates image data used by the image processing unit 143 to perform image processing based on the pixel data transferred from the readout unit 110 and the image data stored in the image data storage unit 141. The image data storage controller 140 transfers the generated image data to the image processing unit 143. Without being limited thereto, the image data storage controller 140 may also transfer the pixel data transferred from the readout unit 110 to the image processing unit 143 as it is.

Further, the image data storage controller 140 generates readout information indicating a readout area for the next readout based on the readout area information transferred from the readout unit 110, and transfers the generated readout information to the readout determiner 142.

Here, for example, the image data storage controller 140 may store the already stored image data and the newly transferred pixel data in an integrated manner using addition averaging. Further, the image data storage controller 140 may delete unnecessary image data from the image data stored in the image data storage unit 141. Examples of unnecessary image data are image data related to a previous frame, and image data that was calculated and stored based on a frame image of a scene different from the frame image from which the new image data is calculated. Further, the image data storage controller 140 may also delete and initialize all the image data stored in the image data storage unit 141 as necessary.

Further, the image data storage controller 140 may acquire information for setting exposure and analog gain from the readout unit 110, and may store image data corrected using the acquired information in the image data storage unit 141.

The image processing unit 143 performs predetermined image processing on the image data transferred from the image data storage controller 140. For example, the image processing unit 143 may perform predetermined image quality enhancement processing on the image data. Further, in the case where the transferred image data is image data in which data is spatially reduced by line thinning or the like, interpolation processing may be used to fill in the thinned portion with image information. The image processing unit 143 transfers the image data having undergone image processing to the output controller 15.

The image processing unit 143 may receive image data from the image data storage controller 140 and perform image processing, for example, according to execution of a trigger generated by the trigger generator 16.

The output controller 15 outputs one or both of the recognition result delivered from the recognition processing execution unit 124 and the image data delivered from the image processing unit 143. The output controller 15 outputs one or both of the recognition result and the image data, for example, according to the trigger generated by the trigger generator 16.

Based on the information related to the recognition processing transferred from the recognition processing unit 12 and the information related to the image processing transferred from the visual recognition processing unit 14, the trigger generator 16 generates a trigger including a trigger to be transferred to the recognition processing execution unit 124, a trigger to be transferred to the image processing unit 143, and a trigger to be transferred to the output controller 15. The trigger generator 16 transfers each generated trigger to the recognition processing execution unit 124, the image processing unit 143, and the output controller 15, respectively, at predetermined timings.

(5-0-2. example of processing in the recognition processing unit according to the second embodiment)

Fig. 22 is a schematic diagram showing an example of processing in the recognition processing unit 12 according to the second embodiment in more detail. Here, it is assumed that the readout region is a line, and the readout unit 110 reads out pixel data in units of lines from the upper end to the lower end of the frame of the image 60. The line image data (line data) of the line L # x read out by the readout unit 110 in line units is to be input to the feature data calculation unit 120.

The feature data calculation unit 120 performs a feature data extraction process 1200 and an integration process 1202. The feature data calculation unit 120 performs a feature data extraction process 1200 on the input line data to extract feature data 1201 from the line data. Here, the feature data extraction processing 1200 extracts feature data 1201 from line data based on parameters obtained by learning in advance. Using the integration process 1202, the feature data 1201 extracted by the feature data extraction process 1200 is integrated with the feature data 1212 processed by the feature data storage controller 121. The integrated feature data 1210 is passed to the feature data storage controller 121.

The feature data storage controller 121 performs an internal state update process 1211. The feature data 1210 passed to the feature data storage controller 121 is passed to the recognition processing execution unit 124 and is also subjected to the internal state update process 1211. The internal state update process 1211 reduces the feature data 1210 based on the parameters learned in advance so as to update the internal state of the DNN, and then generates feature data 1212 corresponding to the updated internal state. The feature data 1212 is integrated with the feature data 1201 by the integration process 1202. The processing performed by the feature data storage controller 121 corresponds to processing using an RNN.

The recognition processing execution unit 124 executes recognition processing 1240 on the feature data 1210 delivered from the feature data storage controller 121 based on, for example, a parameter learned in advance using predetermined training data, and outputs the recognition result.

As described above, based on the parameters learned in advance, the recognition processing unit 12 according to the second embodiment performs the processes, specifically, the feature data extraction process 1200, the integration process 1202, the internal state update process 1211, and the recognition process 1240. Parameter learning is performed using training data based on an assumed recognition target.
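The per-line pipeline of fig. 22 can be summarized in code as follows. The four stage functions mirror the feature data extraction 1200, integration 1202, internal state update 1211, and recognition 1240, but their bodies are placeholder assumptions; the actual stages use parameters learned in advance, as stated above.

```python
import numpy as np

def extract_1200(line_data: np.ndarray) -> np.ndarray:
    return line_data.astype(np.float32) / 255.0     # feature data 1201

def integrate_1202(f_line: np.ndarray, f_state: np.ndarray) -> np.ndarray:
    return 0.5 * (f_line + f_state)                 # feature data 1210

def update_state_1211(f_integrated: np.ndarray) -> np.ndarray:
    return np.tanh(f_integrated)                    # feature data 1212

def recognize_1240(f_integrated: np.ndarray) -> float:
    return float(f_integrated.mean())               # recognition result

def process_frame(lines):
    state = None
    result = None
    for line in lines:                              # line L#x read out in order
        f_line = extract_1200(line)
        f_int = f_line if state is None else integrate_1202(f_line, state)
        result = recognize_1240(f_int)              # recognition per line
        state = update_state_1211(f_int)            # fed back as feature data 1212
    return result
```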

Note that, for example, when a program stored in the memory 13 or the like is read and executed by the DSP included in the imaging apparatus 1, the functions of the above-described feature data calculation unit 120, feature data storage controller 121, readout determiner 123, and recognition processing execution unit 124 are realized. Similarly, for example, when a program stored in the memory 13 or the like is read and executed by an ISP included in the imaging apparatus 1, the functions of the above-described image data storage controller 140, readout determiner 142, and image processing unit 143 are realized. These programs may be stored in the memory 13 in advance, or may be supplied to the image forming apparatus 1 from the outside and written in the memory 13.

(5-0-3. details of the identification processing according to the second embodiment)

Next, the second embodiment will be described in more detail. Fig. 23 is a functional block diagram showing an example of functions according to the second embodiment. Since the second embodiment mainly describes the recognition processing performed by the recognition processing unit 12, fig. 23 omits illustrations of the visual recognition processing unit 14, the output controller 15, and the trigger generator 16 shown in the configuration of fig. 21. In fig. 23, the readout controller 111 is not shown in the sensor controller 11.

Fig. 24 is a schematic diagram showing the frame readout process according to the second embodiment. In the second embodiment, the readout unit is a line, and for a frame Fr(x), readout of pixel data is performed sequentially in the order of lines. In the example of fig. 24, in the m-th frame Fr(m), readout of lines is performed sequentially in the order of lines, starting from the line L#1 at the upper end of the frame Fr(m) and continuing to the lines L#2, L#3, and so on. When the line readout in the frame Fr(m) is completed, readout of lines in the next, (m+1)-th frame Fr(m+1) is similarly performed sequentially in the order of lines from the top line L#1.

Fig. 25 is a diagram schematically showing the recognition processing according to the second embodiment. As shown in fig. 25, the recognition processing is performed by sequentially executing the processing of the CNN 52' and the internal state update 55 for the pixel information 54 of each of the lines L#1, L#2, L#3, and so on. Therefore, it is sufficient to input the pixel information 54 of one line into the CNN 52', so that the recognizer 56 can be formed on an extremely small scale. Note that the recognizer 56 has a configuration as an RNN because it executes the processing of the CNN 52' on sequentially input information and performs the internal state update 55.

By sequentially performing the recognition processing in the order of rows using the RNN, it is possible to obtain a valid recognition result without performing readout of all rows included in the frame. In this case, the recognition processing unit 12 may end the recognition processing at a point at which a valid recognition result is obtained. An example of ending the identification process in the middle of frame readout will be described with reference to fig. 26 and 27.

Fig. 26 is a diagram showing an exemplary case where the recognition target is the number "8". In the example of fig. 26, the number "8" is recognized at the point where a range 71 of approximately 3/4 of the frame 70 in the vertical direction has been read out. Accordingly, the recognition processing unit 12 can output a valid recognition result indicating that the number "8" has been recognized at the point where the range 71 has been read out, and can end the line readout processing and the recognition processing for the frame 70.

Fig. 27 is a diagram showing an example when the recognition target is a human. In the example of fig. 27, the person 74 is identified at a point where a range 73 of approximately 1/2 of the frame 72 in the vertical direction has been read out. Accordingly, the recognition processing unit 12 can output a valid recognition result indicating that the person 74 is recognized at the point where the range 73 has been read out, and can end the line readout processing and the recognition processing of the frame 72.

In this way, in the second embodiment, when a valid recognition result is obtained in the middle of line readout of a frame, the line readout and recognition processing can be ended. This makes it possible to save power in the recognition processing and shorten the time required for the recognition processing.

Although the above is an example in which line readout is performed from the upper end side to the lower end side of the frame, the readout direction is not limited to this example. For example, line readout may be performed from the lower end side to the upper end side of the frame. By performing line readout from the upper end side to the lower end side of the frame, an object far from the imaging apparatus 1 can generally be recognized earlier. In contrast, by performing line readout from the lower end side to the upper end side of the frame, an object on the near side with respect to the imaging apparatus 1 can generally be recognized earlier.

For example, there is a conceivable case where the imaging apparatus 1 is installed for an in-vehicle application so as to image the view ahead. In this case, an object ahead (for example, a vehicle or a pedestrian in front of the own vehicle) appears in the lower portion of the imaged screen, so it is more efficient to perform line readout from the lower end side to the upper end side of the frame. Furthermore, when an Advanced Driver Assistance System (ADAS) requires an immediate stop, it is sufficient to recognize at least one corresponding object, and once one object has been recognized, performing line readout again from the lower end side of the frame is considered more effective. Further, on a highway, for example, distant objects may be prioritized; in this case, line readout is preferably performed from the upper end side to the lower end side of the frame.

Further, the readout unit may be set in the column direction of the matrix (row-column) arrangement of the pixel array unit 101. For example, it is conceivable to use a plurality of pixels arranged in one column of the pixel array unit 101 as the readout unit. Applying the global shutter method as the imaging method makes it possible to perform column-based readout using columns as readout units. In the global shutter method, readout can be performed by switching between column-based readout and row-based readout. When readout is fixed to column-based readout, it is conceivable, for example, to rotate the pixel array unit 101 by 90° and use the rolling shutter method.

For example, by sequentially reading out from the left end side of the frame based on the readout of the columns, the subject existing on the left side of the imaging apparatus 1 can be recognized earlier. Similarly, by sequentially reading out from the right end side of the frame by column-based readout, an object existing on the right side with respect to the imaging apparatus 1 can be recognized earlier.

In a use example in which the imaging apparatus 1 is used for an in-vehicle application, for example, when the vehicle is turning, in some cases, an object existing on the turning side will be prioritized. In this case, readout is preferably performed from the end on the turn side by column-based readout. For example, the turning direction may be acquired based on the steering information of the vehicle. Without being limited to this, for example, it is possible to provide the imaging apparatus 1 with sensors capable of detecting angular velocities in three directions, and to acquire a turning direction based on the detection results of the sensors.

Fig. 28 is a flowchart showing an example of the identification process according to the second embodiment. For example, the process according to the flowchart of fig. 28 is a process corresponding to reading out pixel data from a readout unit (for example, one line) of a frame. Here, it is assumed that the readout unit is one row. For example, the readout area information may be represented by a row number indicating a row to be read out.

In step S100, the recognition processing unit 12 reads out line data from a line indicated by a readout line of the frame. More specifically, in the recognition processing unit 12, the readout determiner 123 passes the row number on the row to be read out next to the sensor controller 11. In the sensor controller 11, the readout unit 110 reads out the pixel data of the row indicated by the row number from the sensor unit 10 as row data according to the transferred row number. The readout unit 110 transfers the line data read out from the sensor unit 10 to the feature data calculation unit 120. Further, the readout unit 110 passes readout area information (e.g., a line number) indicating an area for pixel data readout to the feature data calculation unit 120.

In the next step S101, the feature data calculation unit 120 calculates the feature data of the line based on the line data of the pixel data transferred from the readout unit 110. In the next step S102, the feature data calculation unit 120 acquires the feature data stored in the feature data storage unit 122 from the feature data storage controller 121. In the next step S103, the feature data calculation unit 120 integrates the feature data calculated in step S101 and the feature data acquired from the feature data storage controller 121 in step S102. The integrated feature data is passed to the feature data storage controller 121. The feature data storage controller 121 stores the integrated feature data transferred from the feature data calculation unit 120 in the feature data storage unit 122 (step S104).

Note that a series of processes from step S100 is a process for the first line of a frame, and therefore, for example, when the feature data storage unit 122 is initialized, the processes in steps S102 and S103 may be omitted. At this time, the processing according to step S104 is processing of accumulating line feature data calculated based on the first line in the feature data storage unit 122.

Further, the feature data storage controller 121 also passes the integrated feature data passed from the feature data calculation unit 120 to the identification process execution unit 124. In step S105, the recognition processing execution unit 124 executes recognition processing using the integrated feature data transferred from the feature data storage controller 121. In the next step S106, the recognition processing execution unit 124 outputs the recognition result of the recognition processing of step S105.

In step S107, the readout determiner 123 in the recognition processing unit 12 determines the readout row for the next readout based on the readout information delivered from the feature data storage controller 121. For example, the feature data storage controller 121 receives the readout area information and the feature data from the feature data calculation unit 120. Based on this readout region information, the readout determiner 123 determines the readout row to be read out next according to, for example, a predetermined readout pattern (in this example, reading rows in order). The processing from step S100 is executed again for the determined readout row.
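For reference, the per-line loop of steps S100 to S107 can be written out as a short sketch. The following Python code is only an illustration of the flow described above and is not part of the disclosed configuration; all object and method names (sensor, feature_calc, recognize, next_row, and so on) are hypothetical placeholders.

```python
# Minimal sketch of the per-line recognition loop of fig. 28 (steps S100-S107).
# All names are hypothetical placeholders, not part of the disclosure.

def recognize_frame(sensor, feature_calc, feature_store, recognizer,
                    readout_determiner, first_row=0):
    row = first_row
    while row is not None:
        line_data = sensor.read_line(row)                       # S100: read out line data
        line_feat = feature_calc.calc_feature(line_data)        # S101: feature data of the line
        stored = feature_store.load()                           # S102: previously stored feature data
        integrated = feature_calc.integrate(line_feat, stored)  # S103: integrate feature data
        feature_store.save(integrated)                          # S104: store integrated feature data
        result = recognizer.recognize(integrated)               # S105: recognition processing
        yield result                                            # S106: output recognition result
        row = readout_determiner.next_row(row, integrated)      # S107: decide the next readout row
```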

(5-0-4. example of control of readout and recognition processing according to the second embodiment)

Next, an example of controlling the readout and recognition processing according to the second embodiment will be described. Fig. 29A and 29B are timing charts showing an example of controlling the readout and recognition processing according to the second embodiment. The example of fig. 29A and 29B provides a blank time blk in which no imaging operation is performed within one imaging period (one frame period). In fig. 29A and 29B, time passes to the right.

Fig. 29A shows an example in which 1/2 of the imaging period is continuously allocated to the blank time blk. In fig. 29A, the imaging period corresponds to a frame period of, for example, 1/30 [sec]. Readout of a frame from the sensor unit 10 is performed in this frame period. The imaging time is the length of time required to image all the lines included in the frame. In the example of fig. 29A, it is assumed that the frame includes n lines, and that imaging of the n lines from line L#1 to line L#n is completed within 1/60 [sec], which is 1/2 of the frame period of 1/30 [sec]. The length of time allocated to the imaging of one line is therefore 1/(60 × n) [sec]. The period of 1/60 [sec] from the timing at which imaging of the last line L#n of the frame is completed to the timing at which imaging of the first line L#1 of the next frame is started is defined as the blank time blk.

For example, at the timing at which the imaging of line L#1 is completed, the imaging of the next line L#2 is started. Meanwhile, the recognition processing unit 12 performs the line recognition processing for line L#1, that is, the recognition processing on the pixel data included in line L#1. Before the imaging of the next line L#2 is started, the recognition processing unit 12 ends the line recognition processing for line L#1. When the line recognition processing for line L#1 is completed, the recognition processing unit 12 outputs the recognition result of that recognition processing.

Similarly, for the next line L#2, at the timing at which the imaging of line L#2 is completed, the imaging of the next line L#3 is started. Subsequently, the recognition processing unit 12 executes the line recognition processing for line L#2, and ends this line recognition processing before the imaging of the next line L#3 is started. In the example of fig. 29A, imaging of lines L#1, L#2, L#3, …, L#m, … is performed in order in this manner. For each of lines L#1, L#2, L#3, …, L#m, …, at the timing at which its imaging ends, the imaging of the next line is started, and, at the same time, the line recognition processing of the line whose imaging has been completed is performed.

In this way, by sequentially performing the recognition processing for each readout unit (a line in this example), a recognition result can be obtained sequentially without inputting all the image data of the frame to the recognizer (the recognition processing unit 12), so that the delay until a recognition result is obtained can be reduced. Further, when a valid recognition result is obtained at a certain line, the recognition processing can be ended at that point, thereby shortening the recognition processing and saving power. Further, by propagating information on the time axis and integrating the recognition results line by line, the recognition accuracy can be gradually improved.

In the example of fig. 29A, the blank time blk within the frame period may be used to perform other processing (for example, image processing in the visual recognition processing unit 14 using the recognition result) assumed to be performed within the frame period.

Fig. 29B shows an example in which a blank time blk is provided each time one line is imaged. In the example of fig. 29B, the frame period (imaging period) is set to 1/30 [sec], similarly to the example of fig. 29A. On the other hand, the imaging time is set to 1/30 [sec], which is the same as the imaging period. Further, in the example of fig. 29B, it is assumed that line imaging of the n lines L#1 to L#n is performed at a time interval of 1/(30 × n) [sec] within one frame period, and that the imaging time of one line is 1/(60 × n) [sec].

In this case, a blank time blk of 1/(60 × n) [sec] can be provided after the imaging time of each of the lines L#1 to L#n. In each blank time blk of each of the lines L#1 to L#n, other processing assumed to be performed on the captured image of the corresponding line (for example, image processing in the visual recognition processing unit 14 using the recognition result) can be performed. At this time, a time of about 1/(30 × n) [sec] in this example, that is, the time until the end of the imaging of the line immediately following the target line, can be allocated to this other processing. In the example of fig. 29B, the processing results of the other processing can be output line by line, so that these processing results can be acquired more quickly.

Fig. 30 is a timing chart showing another example of controlling the readout and recognition processing according to the second embodiment. In the examples of figs. 29A and 29B described above, imaging of all the lines L#1 to L#n included in the frame is completed within 1/2 of the frame period, and the remaining 1/2 of the frame period is set as the blank time. In contrast, in the example shown in fig. 30, imaging of all the lines L#1 to L#n included in the frame is performed using the entire frame period, without a blank time within the frame period.

Here, when the imaging time of one line is 1/(60 × n) [sec], which is the same as in figs. 29A and 29B, and the number of lines included in a frame is n, which is also the same as in figs. 29A and 29B, the frame period (i.e., the imaging period) becomes 1/60 [sec]. Therefore, in the example of fig. 30 in which no blank time blk is provided, the frame rate can be increased as compared with the above-described examples of figs. 29A and 29B.
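The timing values quoted in the examples of figs. 29A, 29B, and 30 follow from simple arithmetic on the frame period and the number of lines. The short Python sketch below only reproduces that arithmetic; the line count n = 1000 is an arbitrary value chosen for illustration.

```python
# Timing budget for the examples of figs. 29A, 29B and 30 (values from the text).
n = 1000                       # number of lines per frame (illustrative value)
frame_period = 1 / 30          # figs. 29A/29B: frame period [s]
line_time = 1 / (60 * n)       # imaging time of one line [s]

# Fig. 29A: imaging of all n lines fits in half the frame period;
# the remaining half is a continuous blank time blk.
imaging_time_29a = n * line_time                # = 1/60 s
blank_29a = frame_period - imaging_time_29a     # = 1/60 s

# Fig. 29B: line imaging is started every 1/(30*n) s, so each line is
# followed by a per-line blank time of 1/(60*n) s.
blank_per_line_29b = 1 / (30 * n) - line_time   # = 1/(60*n) s

# Fig. 30: no blank time; the frame period equals the total imaging time,
# so the frame rate doubles to 60 fps.
frame_period_30 = n * line_time                 # = 1/60 s
```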

[6. Third embodiment]

Next, a third embodiment of the present disclosure will be described. The third embodiment is an example of controlling the output timing of the recognition result from the recognition processing unit 12 and the output timing of the image data for visual recognition from the visual recognition processing unit 14. In the third embodiment, referring to fig. 21, the output of the recognition result from the recognition processing execution unit 124 and the output of the image data from the image processing unit 143 are controlled based on the trigger signal generated by the trigger generator 16.

(6-0. overview of third embodiment)

Fig. 31 is a flowchart showing an example of an overview of the output control process according to the third embodiment. The processing in the flowchart of fig. 31 is processing performed for each readout of the readout unit. Hereinafter, it is assumed that the readout unit is a row, and the sensor controller 11 reads out pixel data from the sensor unit 10 in units of rows.

In step S200, the readout unit 110 reads out pixel data (hereinafter, appropriately referred to as line data) from the sensor unit 10 in units of lines. The readout unit 110 transfers the line data read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14. The visual recognition processing unit 14 transfers the pixel data transferred from the readout unit 110 to the image data storage controller 140. For example, the image data storage controller 140 stores the received pixel data in the image data storage unit 141, and also transfers the pixel data to the image processing unit 143.

Meanwhile, in step S201, the recognition processing unit 12 performs calculation of feature data by the feature data calculation unit 120 based on the line data transferred from the readout unit 110, stores the calculated feature data in the feature data storage unit 122, and performs the recognition processing and the like by the recognition processing execution unit 124 based on the integrated feature data stored in the feature data storage unit 122. In the next step S202, the recognition processing unit 12 outputs the recognition result of the recognition processing from the recognition processing execution unit 124. In the next step S203, in the recognition processing unit 12, the readout determiner 123 generates readout region information indicating the next readout row, and passes the information to the sensor controller 11.

In the next step S204, the trigger generator 16 determines whether to output an image for visual recognition from the image processing unit 143, for example, according to the output of the recognition result in step S202. In a case where the trigger generator 16 determines not to output the image for visual recognition (no in step S204), the trigger generator 16 proceeds to the processing of step S206. In contrast, when the trigger generator 16 determines to output the image for visual recognition (yes in step S204), the trigger generator 16 proceeds to the processing of step S205.

In step S205, the trigger generator 16 performs output processing to output a trigger signal. The trigger signal is transmitted to the recognition processing execution unit 124, the image processing unit 143, and the output controller 15. In response to the trigger signal, the recognition processing performing unit 124 and the image processing unit 143 output the recognition result and the image data, respectively. The recognition result and the image data output from the recognition processing execution unit 124 and the image processing unit 143 are respectively transferred to the output controller 15.

In the next step S206, the output controller 15 performs an output control process according to the trigger signal delivered from the trigger generator 16 in step S205, and outputs the recognition result and the image data to the subsequent stage.

In this way, by controlling the recognition processing execution unit 124, the image processing unit 143, and the output controller 15 according to the trigger signal generated by the trigger generator 16, it is possible to output the recognition result and the image data at appropriate timings.
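The overall flow of fig. 31 can be summarized as a single loop in which the trigger decision of step S204 is a pluggable condition. The Python sketch below is only an illustration of this flow; the variants described in the following sections (elapsed time, ratio of the readout region, recognition confidence score, and external information) differ only in how the condition is evaluated. All object and method names are hypothetical.

```python
# Minimal sketch of the output control loop of fig. 31 (steps S200-S206).
# All names are hypothetical placeholders.

def readout_loop(sensor, recognition_unit, visual_unit, trigger_condition, output_controller):
    row = 0
    while row is not None:
        line_data = sensor.read_line(row)              # S200: read out line data
        visual_unit.store_line(line_data)              # S200: pass line data to the visual side
        result = recognition_unit.process(line_data)   # S201, S202: recognition and result output
        row = recognition_unit.next_row()              # S203: determine the next readout row
        if trigger_condition(result):                  # S204: trigger decision
            image = visual_unit.output_image()         # S205: trigger -> image data output
            output_controller.emit(result, image)      # S206: output to the subsequent stage
```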

(6-0-1. example of outputting trigger signal in accordance with time)

Fig. 32 is a diagram schematically showing an example of the output control processing according to the third embodiment. Here, a case where the trigger generator 16 outputs the trigger signal based on time will be described.

In fig. 32, the imaging apparatus 1 (refer to fig. 1) starts imaging of a target image (the handwritten numeral "8") as a recognition target. In step S10, the sensor controller 11 transmits the readout area information to the recognition processing unit 12, and readout of the frame in units of lines is started at time t0. The sensor controller 11 sequentially reads out the frame in units of lines from the upper end side toward the lower end side of the frame.

When the lines have been read out to a certain position, the recognition processing unit 12 recognizes from the image of the read-out lines that the numeral is either "8" or "9" (step S11). Based on the integrated feature data delivered from the feature data storage controller 121, the readout determiner 123 of the recognition processing unit 12 generates readout area information specifying a line from which it is predicted that the object recognized in step S11 can be identified as either the numeral "8" or "9", and delivers the generated information to the readout unit 110. Subsequently, the recognition processing unit 12 performs the recognition processing based on the pixel data obtained by the readout unit 110 reading out the specified line (step S12).

The trigger generator 16 outputs a trigger signal at time tTRG, at which a predetermined time has elapsed from time t0 at which the readout was started. For example, when readout of a frame is performed in units of lines within a frame period, the trigger generator 16 outputs the trigger signal at a specific time interval corresponding to the frame period. In the example of fig. 32, time tTRG has already passed at the processing point of step S12, and the trigger generator 16 has output the trigger signal. In response to the trigger signal, the recognition processing execution unit 124 outputs the recognition result, and the image processing unit 143 outputs the image data. Further, in response to the trigger signal, the output controller 15 outputs the recognition result output from the recognition processing execution unit 124 and the image data output from the image processing unit 143 to the subsequent stage.

Note that the recognition processing unit 12 performs processing at a timing different from those of the visual recognition processing unit 14 and the trigger generator 16. Therefore, the recognition processing unit 12 sometimes completes the recognition processing before time tTRG. In this case, the recognition processing unit 12 waits to perform the next processing until the trigger generator 16 outputs the trigger signal at time tTRG.

Further, at this time, in the case where there is an unprocessed line that has not been read out from the frame at the point at which the recognition processing in the recognition processing unit 12 is completed, the visual recognition processing unit 14 may further read out the unprocessed line. The output controller 15 may output line data on the unprocessed line read out by the visual recognition processing unit 14 and line data read out for the recognition processing by the recognition processing unit 12.

There may also be a case where the recognition processing unit 12 has not completed the recognition processing at time tTRG. In this case, the recognition processing unit 12 outputs the recognition result in response to the trigger signal at time tTRG.

Fig. 33A and 33B are functional block diagrams respectively showing an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side of the imaging apparatus 1 according to the third embodiment. Fig. 33A and 33B show an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side, respectively, from the configuration of fig. 21 described above.

As shown in fig. 33A, the trigger generator 16a outputs a trigger signal to the identification process execution unit 124 at certain time intervals. Further, as shown in fig. 33B, the trigger generator 16a outputs a trigger signal to the image processing unit 143 at certain time intervals.

Fig. 34 is a flowchart showing an example of processing when a trigger signal is output according to time according to the third embodiment. In fig. 34, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 according to the flowchart of fig. 31 described above.

That is, in step S200, the readout unit 110 reads out line data from the sensor unit 10 and passes the line data to the recognition processing unit 12 and the visual recognition processing unit 14. The visual recognition processing unit 14 transfers the pixel data transferred from the readout unit 110 to the image data storage controller 140. For example, the image data storage controller 140 stores the received pixel data in the image data storage unit 141, and also transfers the pixel data to the image processing unit 143.

In step S201, the recognition processing unit 12 performs calculation of feature data, storage of calculated feature data, recognition processing based on stored and integrated feature data, and the like based on the line data delivered from the readout unit 110. In the next step S202, the recognition processing unit 12 outputs the recognition result of the recognition processing from the recognition-processing executing unit 124. In the next step S203, in the recognition processing unit 12, the readout determiner 123 generates readout region information indicating the next readout row, and passes the information to the sensor controller 11.

In the next step S2040, the trigger generator 16 determines whether a certain time has elapsed since the readout of the row in step S200. In a case where it is determined that the time has not elapsed (no in step S2040), a series of processing according to the flowchart of fig. 34 is terminated. In contrast, when the trigger generator 16 determines that the specific time has elapsed (yes at step S2040), the trigger generator 16 proceeds to the processing at step S205.

In step S205, the trigger generator 16 performs output processing to output a trigger signal. The trigger signal is transmitted to the recognition processing execution unit 124, the image processing unit 143, and the output controller 15. In response to the trigger signal, the recognition processing performing unit 124 and the image processing unit 143 output the recognition result and the image data, respectively. The recognition result and the image data output from the recognition processing execution unit 124 and the image processing unit 143 are output to the subsequent stage via the output controller 15, respectively.

In this way, in the third embodiment, since the trigger signal is output by the trigger generator 16 at a fixed cycle, the recognition result and the image data for visual recognition can be output at a fixed cycle (for example, a frame cycle).
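Under the assumption of the loop sketched earlier, a fixed-period trigger can be expressed as the following hypothetical condition function; the interval of 1/30 s stands in for the frame period used in the examples, and the function name and signature are illustrative only.

```python
import time

# Hypothetical time-based trigger condition: fire once per fixed interval
# (e.g., one frame period of 1/30 s) measured from the start of readout.
def make_time_trigger(interval_s=1 / 30):
    start = time.monotonic()
    def condition(_result):
        nonlocal start
        if time.monotonic() - start >= interval_s:
            start = time.monotonic()   # re-arm so the trigger fires once per interval
            return True
        return False
    return condition
```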

[6-1. First modification of the third embodiment]

Next, a first modification of the third embodiment will be described. The first modification of the third embodiment is an example in which the trigger signal is generated from the region read out from the frame by the sensor controller 11.

Fig. 35 is a diagram schematically showing an example of the output control processing according to the first modification of the third embodiment. In fig. 35, part (a) shows the temporal change of the ratio of the area read out from the frame by the sensor controller 11 to the entire frame (the ratio of the readout area). Further, part (b) is a diagram corresponding to fig. 32 described above, and schematically shows the state of the frame read out by the sensor controller 11. That is, the frame is sequentially read out in the order of lines in step S10, the process then jumps to a position at which the object is predicted to be recognizable, and readout is performed in step S11. Subsequently, the recognition result is output in step S12.

In part (a) of fig. 35, the ratio of the readout region changes at a constant rate up to step S11, and changes at a smaller rate from step S11 onward. Here, when the ratio of the readout region reaches the threshold Rth at time tTRG, the trigger generator 16 generates a trigger signal. In response to the trigger signal, the recognition processing execution unit 124 outputs the recognition result, and the image processing unit 143 outputs the image data. Further, in response to the trigger signal, the output controller 15 outputs the recognition result output from the recognition processing execution unit 124 and the image data output from the image processing unit 143 to the subsequent stage.

Fig. 36A and 36B are functional block diagrams respectively showing an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side of the imaging apparatus 1 according to the first modification of the third embodiment. Fig. 36A and 36B show an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side, respectively, from the configuration of fig. 21 described above.

As shown in figs. 36A and 36B, the trigger generator 16b receives the readout region information from the readout controller 111 of the sensor controller 11, and obtains the ratio of the readout region based on the received readout region information. When the trigger generator 16b determines that the obtained ratio of the readout region exceeds the threshold Rth, the trigger generator 16b generates a trigger signal, and outputs the generated trigger signal to the recognition processing execution unit 124 (refer to fig. 36A) and the image processing unit 143 (refer to fig. 36B), respectively.

Fig. 37 is a flowchart showing an example of processing according to the first modification of the third embodiment. In fig. 37, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 according to the above-described flowchart of fig. 34, and thus the description thereof will be omitted here. In step S203, the readout determiner 123 in the recognition processing unit 12 passes readout region information indicating the next readout row to the sensor controller 11, and then the process proceeds to step S2041.

In step S2041, the trigger generator 16b determines, based on the readout region information received from the sensor controller 11, whether the ratio of the readout region exceeds the threshold Rth. In a case where it is determined that the ratio does not exceed the threshold Rth (no in step S2041), the series of processing according to the flowchart of fig. 37 ends. Thereafter, for example, the next line data is read out from step S200.

In contrast, in a case where the trigger generator 16b determines that the ratio of the readout region exceeds the threshold Rth (yes in step S2041), the trigger generator 16b proceeds to the processing of step S205, and performs output processing to output a trigger signal. In response to the trigger signal, the recognition processing execution unit 124 and the image processing unit 143 output the recognition result and the image data, respectively.

In this way, in the first modification of the third embodiment, the trigger signal is output by the trigger generator 16b in accordance with the ratio of the readout regions, so that image data in a specific region or more in the frame can be output as image data for visual recognition.
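A hypothetical condition function matching this modification simply compares the fraction of rows read out so far with the threshold Rth; in the loop sketched earlier it is called once per read-out row. The names and default threshold value are illustrative assumptions.

```python
# Hypothetical ratio-based trigger condition: fire when the number of rows
# read out so far reaches a fraction R_th of the total rows in the frame.
def make_ratio_trigger(total_rows, r_th=0.9):
    rows_read = 0
    def condition(_result):
        nonlocal rows_read
        rows_read += 1                        # one call per read-out row
        return rows_read / total_rows >= r_th
    return condition
```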

[6-2. Second modification of the third embodiment]

Next, a second modification of the third embodiment will be described. The second modification of the third embodiment is an example of generating a trigger signal according to a recognition confidence indicating the confidence of the recognition processing result of the recognition processing execution unit 124.

Fig. 38 is a diagram schematically showing an example of the output control processing according to the second modification of the third embodiment. In fig. 38, part (a) shows the temporal change of the recognition confidence score, which indicates the recognition confidence of the recognition processing performed by the recognition processing execution unit 124 on the line data read out from the frame by the sensor controller 11. Further, part (b) is a diagram corresponding to fig. 32 described above, and schematically shows the state of the frame read out by the sensor controller 11. That is, the frame is sequentially read out in the order of lines in step S10, the process then jumps to a position at which the object is predicted to be recognizable, and readout is performed in step S11. Subsequently, the recognition result is output in step S12.

In part (a) of fig. 38, the recognition confidence score changes at a constant rate up to step S11, and after the numeral "8" or "9" is identified in step S11, it changes at a rate greater than before. Here, when the recognition confidence score reaches the threshold Cth at time tTRG, the trigger generator 16 generates a trigger signal. In response to the trigger signal, the recognition processing execution unit 124 outputs the recognition result, and the image processing unit 143 outputs the image data. Further, in response to the trigger signal, the output controller 15 outputs the recognition result output from the recognition processing execution unit 124 and the image data output from the image processing unit 143 to the subsequent stage.

Fig. 39A and 39B are functional block diagrams respectively showing an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side of the imaging apparatus 1 according to the second modification of the third embodiment. Fig. 39A and 39B show an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side, respectively, extracted from the configuration of fig. 21 described above.

As shown in figs. 39A and 39B, the recognition processing execution unit 124 appropriately outputs a recognition result including a recognition confidence score. The trigger generator 16c receives the recognition result from the recognition processing execution unit 124, and acquires the recognition confidence score included in the received recognition result. When the trigger generator 16c determines that the acquired recognition confidence score exceeds the threshold Cth, the trigger generator 16c generates a trigger signal, and outputs the generated trigger signal to the recognition processing execution unit 124 (refer to fig. 39A) and the image processing unit 143 (refer to fig. 39B), respectively.

Fig. 40 is a flowchart showing an example of processing according to the second modification of the third embodiment. In fig. 40, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 according to the above-described flowchart of fig. 34, and thus the description thereof will be omitted here. In step S203, the readout determiner 123 in the recognition processing unit 12 passes the readout region information indicating the next readout row to the sensor controller 11, and then the process proceeds to step S2042.

In step S2042, the trigger generator 16c determines whether the recognition confidence score included in the recognition result received from the recognition processing execution unit 124 exceeds the threshold Cth. In a case where it is determined that the score does not exceed the threshold Cth (no in step S2042), the series of processing according to the flowchart of fig. 40 ends. Thereafter, for example, the next line data is read out from step S200.

In contrast, in a case where the trigger generator 16c determines that the recognition confidence score exceeds the threshold Cth (yes in step S2042), the trigger generator 16c proceeds to the processing of step S205, and performs output processing to output a trigger signal. In response to the trigger signal, the recognition processing execution unit 124 and the image processing unit 143 output the recognition result and the image data, respectively.

In this way, in the second modification of the third embodiment, the trigger generator 16c outputs the trigger signal according to the recognition confidence score, so that recognition information relating to the object included in the image data for visual recognition can be acquired with higher accuracy.
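A corresponding hypothetical condition function for this modification checks the confidence score carried in the recognition result against the threshold Cth; the attribute name "confidence" and the default threshold are assumptions for illustration only.

```python
# Hypothetical confidence-based trigger condition: fire when the recognition
# confidence score in the recognition result reaches the threshold C_th.
def make_confidence_trigger(c_th=0.8):
    def condition(result):
        return result is not None and result.confidence >= c_th
    return condition
```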

[6-3. Third modification of the third embodiment]

Next, a third modification of the third embodiment will be described. A third modification of the third embodiment is an example in which the trigger signal is generated from external information acquired from the outside of the imaging apparatus 1.

Fig. 41A and 41B are functional block diagrams respectively showing an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side of the imaging apparatus 1 according to the third modification of the third embodiment. Fig. 41A and 41B show an exemplary function on the recognition processing unit 12 side and an exemplary function on the visual recognition processing unit 14 side, respectively, from the configuration of fig. 21 described above.

As shown in figs. 41A and 41B, the imaging apparatus 1 according to the third modification of the third embodiment includes an external information acquisition unit 17 that acquires information from the outside. The external information acquisition unit 17 transfers the external information acquired from the outside to the trigger generator 16d. The trigger generator 16d generates a trigger signal according to the external information transferred from the external information acquisition unit 17, and outputs the generated trigger signal to the recognition processing execution unit 124 (refer to fig. 41A) and the image processing unit 143 (refer to fig. 41B), respectively.

Here, as the external information acquired by the external information acquisition unit 17, various information that can be acquired from the outside of the imaging apparatus 1, such as a trigger signal from the outside and a recognition result of an external recognition device, may be applied. Examples of external devices that output such external information include other imaging devices, laser imaging detection and ranging (LiDAR) system sensors (known as LiDAR sensors), or radar devices. For example, when the imaging apparatus 1 is used for an in-vehicle application, it is desirable to be able to input external information (such as identification information, a trigger signal, and vehicle information output from other imaging apparatuses, LiDAR sensors, radar devices, and the like mounted on the same vehicle) to the imaging apparatus 1.

As an example, in the case where the external information is a recognition result of another imaging device or a recognition result of a LiDAR sensor or a radar apparatus, it is conceivable that the trigger generator 16d generates a trigger signal according to a recognition confidence score of the recognition result acquired as the external information by the external information acquisition unit 17.

Note that when external information output from these external devices is used, calibration relating to the position of the imaging apparatus 1 with respect to the captured image, or relating to time, is preferably performed. Further, although the above description is an example in which the external device serves as a host and the imaging apparatus 1 outputs the trigger signal in response to the external information output from the external device, the present disclosure is not limited to this example. For example, it is also permissible to use a configuration in which the imaging apparatus 1 serves as the host, and the trigger generator 16d outputs a trigger signal generated by one of the other methods (elapsed time, the ratio of the readout region, the recognition confidence score, and the like) to the external device.

Not limited to the above example, time information acquired by using a Global Navigation Satellite System (GNSS) may also be used as the external information. Further, in the case where the imaging apparatus 1 is used for an in-vehicle application, the external information acquisition unit 17 may acquire vehicle information (steering information, speed information, brake information, direction indication information, etc.) about a vehicle in which the imaging apparatus 1 is installed as the external information.

Fig. 42 is a flowchart showing an example of processing according to a third modification of the third embodiment. In fig. 42, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 according to the above-described flowchart of fig. 34, and thus the description thereof will be omitted here. In step S203, the readout determiner 123 in the recognition processing unit 12 passes readout region information indicating the next readout row to the sensor controller 11, and then the process proceeds to step S2043.

In step S2043, the trigger generator 16d determines whether the external information acquisition unit 17 has acquired predetermined external information. In a case where it is determined that information has not been acquired (no in step S2043), a series of processing according to the flowchart of fig. 42 ends. Thereafter, for example, the next line data is read out from step S200.

In contrast, when the trigger generator 16d determines that the external information acquisition unit 17 has acquired the predetermined external information (yes in step S2043), the trigger generator 16d proceeds to the processing of step S205. The trigger generator 16d acquires predetermined external information input from an external device to the external information acquisition unit 17 from the external information acquisition unit 17. The trigger generator 16d performs output processing according to the acquired predetermined external information, and outputs a trigger signal. In response to the trigger signal, the recognition processing performing unit 124 and the image processing unit 143 output the recognition result and the image data, respectively.

In this way, in the third modification of the third embodiment, the trigger signal is output in accordance with external information input from the outside, so that recognition results obtained by a plurality of sensor devices can be used. Therefore, the imaging apparatus 1 according to the third modification of the third embodiment can be linked with an external device.
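Assuming the same pluggable loop as above, an external-information trigger can be sketched as a condition that fires when information from an external device has arrived. The queue used here is only a hypothetical delivery mechanism standing in for the external information acquisition unit 17.

```python
import queue

# Hypothetical external-information trigger condition: fire when a predetermined
# piece of external information (e.g., a trigger from another sensor) has arrived
# on a queue fed by the external device.
def make_external_trigger(external_queue: "queue.Queue"):
    def condition(_result):
        return not external_queue.empty()
    return condition
```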

[7. Fourth embodiment]

Next, a fourth embodiment will be described. The fourth embodiment is an example of suppressing a deviation between the output corresponding to the recognition result of the recognition processing unit 12 and the output of the image data for visual recognition by the visual recognition processing unit 14.

Fig. 43 is a diagram schematically showing an example of the output control processing according to the fourth embodiment. In the fourth embodiment, a trigger signal for the recognition processing execution unit 124 and a trigger signal for the image processing unit 143 are output independently of each other. Further, in the following example, the trigger signals for the recognition processing execution unit 124 and the image processing unit 143 are each output according to the ratio of the readout region to the frame, as in the processing described in the first modification of the third embodiment. The threshold of the ratio of the readout region for the processing of the recognition processing execution unit 124 is defined as a threshold Rth1, and the threshold of the ratio of the readout region for the processing of the image processing unit 143 is defined as a threshold Rth2.

In fig. 43, frame readout is started at time t0 (step S10), and the frame is sequentially read out in the order of lines in step S11. In this example, after step S11, the process jumps to a line at which the object is predicted to be recognizable, and readout is performed in step S20. Here, in the processing of step S20, it is assumed that the ratio of the readout region for the processing performed by the recognition processing execution unit 124 reaches the threshold Rth1 at time tTRG1. In this case, the trigger signal is output to the recognition processing execution unit 124 at time tTRG1. The recognition processing execution unit 124 outputs the recognition result in response to the trigger signal. The recognition result output by the recognition processing execution unit 124 at time tTRG1 is cached in a predetermined storage area (referred to as a cache memory) (step S21). When the recognition result has been output and cached, the recognition processing unit 12 ends the recognition processing.

Here, it is assumed that, at time tTRG1, a predetermined time (for example, a frame period) has not yet elapsed from time t0 at which the frame readout was started.

In step S21, after the recognition processing by the recognition processing unit 12 ends, the visual recognition processing unit 14 continues the frame readout until the predetermined time (for example, the frame period) elapses from the readout start time t0. Here, it is assumed that the ratio of the readout region for the processing of the image processing unit 143 reaches the threshold Rth2 at time tTRG2. At time tTRG2, the trigger signal is output to the image processing unit 143.

The image processing unit 143 outputs the image data for visual recognition in response to the trigger signal at time tTRG2. Further, in response to the trigger signal at time tTRG2, the recognition result cached in step S21 is read out from the cache memory and output. This makes it possible to output the image data for visual recognition and the recognition result at the same time.

In the above description, only the recognition result is cached in step S21. However, the present disclosure is not limited to this example, and the image data for visual recognition may be cached as well.

Fig. 44 is a functional block diagram showing the functions of an example of the imaging apparatus 1 according to the fourth embodiment. In fig. 44, the imaging apparatus 1 is provided with two trigger generators, namely, a trigger generator 16e1 that generates the trigger signal for the recognition processing execution unit 124 and a trigger generator 16e2 that generates the trigger signal for the image processing unit 143.

For the trigger generator 16e1, a threshold Rth1 is set with respect to the ratio of the readout region for the recognition processing execution unit 124. Similarly, for the trigger generator 16e2, a threshold Rth2 is set with respect to the ratio of the readout region for the image processing unit 143. For example, these thresholds Rth1 and Rth2 may be preset in the trigger generators 16e1 and 16e2, respectively, or may be set adaptively according to the frame readout state.

Further, the output controller 15a includes a cache memory 150 that caches the recognition result and a cache memory 151 that caches image data for visual recognition.

The readout controller 111 passes the readout area information indicating the readout row to be read out next to the trigger generators 16e1 and 16e2, respectively. The trigger generators 16e1 and 16e2 each obtain the ratio of the current readout region based on the transferred readout area information. When the obtained ratio of the current readout region reaches the threshold Rth1, the trigger generator 16e1 outputs the trigger signal to the recognition processing execution unit 124. Similarly, when the obtained ratio of the current readout region reaches the threshold Rth2, the trigger generator 16e2 outputs the trigger signal to the image processing unit 143.

The recognition result output from the recognition processing execution unit 124 in response to the trigger signal is transferred to the output controller 15a and stored in the cache memory 150. Similarly, the image data for visual recognition output from the image processing unit 143 in response to the trigger signal is transferred to the output controller 15a and stored in the cache memory 151. The output controller 15a outputs the recognition results and the image data for visual recognition stored in the cache memories 150 and 151, respectively, at a predetermined timing (for example, at a timing synchronized with the frame period).
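A minimal sketch of the dual-trigger arrangement of fig. 44 is given below, under the assumption of hypothetical names: the two thresholds Rth1 and Rth2 are checked independently, the outputs are cached (corresponding to the cache memories 150 and 151), and both caches are emitted together at a frame-synchronized timing.

```python
# Sketch of the dual-trigger output control of fig. 44 (hypothetical names).
class DualTriggerOutputController:
    def __init__(self, total_rows, r_th1, r_th2):
        self.total_rows = total_rows
        self.r_th1, self.r_th2 = r_th1, r_th2
        self.cache_result = None   # corresponds to cache memory 150
        self.cache_image = None    # corresponds to cache memory 151

    def on_row_read(self, rows_read, recognizer, image_unit):
        ratio = rows_read / self.total_rows
        if self.cache_result is None and ratio >= self.r_th1:
            self.cache_result = recognizer.output_result()   # trigger for unit 124
        if self.cache_image is None and ratio >= self.r_th2:
            self.cache_image = image_unit.output_image()     # trigger for unit 143

    def flush(self):
        # Called at a predetermined timing, e.g. synchronized with the frame period.
        if self.cache_result is not None and self.cache_image is not None:
            out = (self.cache_result, self.cache_image)
            self.cache_result = self.cache_image = None
            return out
        return None
```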

Fig. 45 is a flowchart showing an example of processing according to the fourth embodiment. In fig. 45, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 according to the above-described flowchart of fig. 34, and thus the description thereof will be omitted here. In step S203, the readout determiner 123 in the recognition processing unit 12 passes readout region information indicating the next readout row to the sensor controller 11, and then the process proceeds to step S2044.

For example, the processing of step S2044 and the processing of step S205 subsequent to step S2044 are executed in parallel in the trigger generators 16e1 and 16e2, respectively.

In step S2044, the trigger generator 16e1 determines whether the recognition processing execution unit 124 is allowed to output the recognition result. More specifically, in step S2044, the trigger generator 16e1 obtains the ratio of the current readout region based on the readout region information indicating the next readout row determined in step S203, and in a case where the obtained ratio of the current readout region reaches the threshold Rth1, determines that the recognition processing execution unit 124 is allowed to output the recognition result. When the trigger generator 16e1 determines that the recognition result is not to be output at the current point (no in step S2044), the trigger generator 16e1 proceeds to the processing of step S2060.

In contrast, when the trigger generator 16e1 determines that the recognition result is to be output (yes in step S2044), the processing proceeds to step S205. In step S205, the trigger generator 16e1 performs output processing of outputting the trigger signal to the recognition processing execution unit 124. When the output processing has been executed, the processing proceeds to step S2060.

The trigger generator 16e2 performs processing similar to that of the trigger generator 16e1. That is, in step S2044, the trigger generator 16e2 determines whether the image processing unit 143 is allowed to output the image data for visual recognition. More specifically, in step S2044, the trigger generator 16e2 obtains the ratio of the current readout region based on the readout region information indicating the next readout row determined in step S203, and in a case where the obtained ratio of the current readout region reaches the threshold Rth2, determines that the image processing unit 143 is allowed to output the image data for visual recognition. When the trigger generator 16e2 determines that the image data is not to be output at the current point (no in step S2044), the trigger generator 16e2 proceeds to the processing of step S2060.

In contrast, when the trigger generator 16e2 determines that the image data is to be output (yes in step S2044), the trigger generator 16e2 proceeds to the processing of step S205. In step S205, the trigger generator 16e2 performs output processing of outputting the trigger signal to the image processing unit 143. When the output processing has been executed, the processing proceeds to step S2060.

The process of step S2060 and the process of step S2061 following step S2060 are processes executed by the output controller 15 a. The output controller 15a performs output control processing on the recognition result output from the recognition processing execution unit 124 and the image data for visual recognition output from the image processing unit 143, respectively.

In step S2060, the output controller 15a stores the recognition result output from the recognition processing execution unit 124 in the cache memory 150 so as to execute the output storage processing. After the output controller 15a has stored the recognition result in the cache memory 150, the output controller 15a proceeds to the process of step S2061. In step S2061, the output controller 15a executes output control processing of outputting the identification result stored in the cache memory 150 at a predetermined timing (for example, at a timing synchronized with the frame period).

Similarly, in step S2060, the output controller 15a stores the image data for visual recognition output from the image processing unit 143 in the cache memory 151 so as to execute the output storage processing. After the output controller 15a has stored the image data for visual recognition in the cache memory 151, the output controller 15a proceeds to the processing of step S2061. In step S2061, the output controller 15a executes the output control processing of outputting the image data for visual recognition stored in the cache memory 151 at a predetermined timing (for example, at a timing synchronized with the frame period).

Here, the output controller 15a performs the output of the recognition result in step S2061 in synchronization with the output of the image data for visual recognition. This makes it possible to output the recognition result and the image data for visual recognition without any time lag.

For example, after the output processing of the recognition result and the image data for visual recognition has been performed in step S2061, the line data of the next readout line is read out from step S200.

In this way, in the fourth embodiment, the recognition result and the image data for visual recognition are cached separately, and the cached recognition result and the image data for visual recognition are output at predetermined timings. This makes it possible to output the recognition result and the image data for visual recognition in a state in which the time lag between the recognition result and the image data for visual recognition is suppressed.

[7-1. First modification of the fourth embodiment]

Next, a first modification of the fourth embodiment will be described. In the fourth embodiment described above, a time lag between the recognition result and the image data for visual recognition is suppressed. In contrast, in the first modification of the fourth embodiment, the spatial deviation between the recognition result and the image data for visual recognition will be suppressed. For example, when the imaging apparatus 1 is moved at a high speed during imaging for an in-vehicle application, there may be a case where a spatial deviation (for example, positional deviation of an object in a two-dimensional plane) occurs between the recognition result and image data for visual recognition. Further, in the case where the imaging apparatus 1 images a moving body moving at high speed, a spatial deviation may occur in an object in image data of the moving body. In the first modification of the fourth embodiment, such a deviation is suppressed based on information acquired from the external sensor.

Fig. 46 is a functional block diagram showing the functions of an example of the imaging apparatus 1 according to the first modification of the fourth embodiment. The configuration shown in fig. 46 is an example of suppressing the spatial deviation that occurs when the imaging apparatus 1 moves at high speed.

In the configuration shown in fig. 46, the output of the external sensor 18 is supplied to the output controller 15b, as compared with the configuration shown in fig. 44 described above. The external sensor 18 is, for example, a device capable of detecting the movement of the imaging apparatus 1, and may be realized by applying an angular velocity sensor mounted on the imaging apparatus. For example, the angular velocity in each direction is measured using a 3-axis gyro sensor, the motion information of the imaging apparatus 1 is acquired, and the acquired motion information is input to the output controller 15 b. Further, the external sensor 18 may be realized by using another imaging device that performs moving image compression or camera shake correction using motion detection. The other imaging apparatus is movably mounted integrally with or in synchronization with the imaging apparatus 1, and a detection result of motion detection in the other imaging apparatus will be input to the output controller 15b as motion information of the imaging apparatus 1.

Based on the motion information input from the external sensor 18 and the output timings of the trigger signals output by the trigger generators 16e1 and 16e2, the output controller 15b estimates the amount of spatial deviation between the recognition result and the image data for visual recognition. For example, the output controller 15b obtains the difference between the output timings of the trigger signals output from the trigger generators 16e1 and 16e2.

Note that the input timing of the recognition result from the recognition processing execution unit 124 to the output controller 15b and the input timing of the image data for visual recognition from the image processing unit 143 to the output controller 15b may be regarded as the output timings of the trigger signals output by the trigger generators 16e1 and 16e2, respectively.

Further, the output controller 15b obtains the moving direction and speed of the imaging apparatus 1 based on the movement information input from the external sensor 18. The output controller 15b calculates the amount of spatial deviation between the recognition result and the image data for visual recognition based on the difference in the output timings of the respective trigger signals and the direction and speed of movement of the imaging apparatus 1. The output controller 15b corrects the image data for visual recognition stored in the cache memory 151 based on the calculated spatial deviation amount. Examples of the correction include trimming, tilt correction, and the like of image data for visual recognition. The output controller 15b stores the corrected image data for visual recognition in the cache memory 151.

The output controller 15b outputs the recognition result stored in the cache memory 150 and the corrected image data for visual recognition stored in the cache memory 151 at a predetermined timing (for example, at a timing synchronized with the frame period).

The above description is an example in which the output controller 15b corrects the image data for visual recognition stored in the cache memory 151 based on the calculated spatial deviation amount. However, the present disclosure is not limited to this example. That is, the output controller 15b may also correct the recognition result stored in the cache memory 150 based on the calculated spatial deviation amount. This includes a case where the output controller 15b corrects the coordinate information of the recognition object included in the recognition result, for example, based on the calculated spatial deviation amount. Further, the output controller 15b may correct the recognition result and the image data for visual recognition, respectively.
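As a minimal sketch under the assumption of a purely translational deviation, the correction described above can be expressed as shifting (and subsequently trimming) the visual-recognition image by the displacement obtained from the motion information and the difference of the two trigger timings. The function name, parameters, and the use of np.roll are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

# Hypothetical deviation correction: estimate the spatial shift from the
# camera speed (in pixels per second), the motion direction, and the
# difference between the two trigger timings, then shift the image to
# compensate. An actual implementation would trim/crop as described above.
def correct_image(image, speed_px_per_s, direction_xy, t_trg1, t_trg2):
    dt = t_trg2 - t_trg1                                   # timing difference [s]
    dx = int(round(speed_px_per_s * direction_xy[0] * dt))
    dy = int(round(speed_px_per_s * direction_xy[1] * dt))
    return np.roll(image, shift=(-dy, -dx), axis=(0, 1))   # shift by (-dx, -dy)
```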

Fig. 47 is a flowchart showing an example of processing according to the first modification of the fourth embodiment. In fig. 47, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 according to the above-described flowchart of fig. 34, and thus the description thereof will be omitted here. Further, the processing of steps S2044 and S205 of fig. 47 is similar to the processing of steps S2044 and S205 of fig. 45 described above, and is executed, for example, in parallel in the trigger generators 16e1 and 16e2, respectively. A detailed description of the processing of steps S2044 and S205 will be omitted here.

Similarly to step S2044 of the flowchart of fig. 45 described above, the trigger generator 16e1 determines in step S2044 whether the recognition processing execution unit 124 is allowed to output the recognition result. When the trigger generator 16e1 determines that the recognition result is not to be output at the current point (no in step S2044), the trigger generator 16e1 proceeds to the processing of step S2060.

In contrast, when the trigger generator 16e1 determines that the recognition result is to be output (yes in step S2044), the trigger generator 16e1 proceeds to the processing of step S205. In step S205, the trigger generator 16e1 performs output processing of outputting the trigger signal to the recognition processing execution unit 124. When the output processing has been executed, the processing proceeds to step S2060.

The trigger generator 16e2 performs processing similar to that of the trigger generator 16e1. That is, in step S2044, the trigger generator 16e2 determines whether the image processing unit 143 is allowed to output the image data for visual recognition. When the trigger generator 16e2 determines that the image data is not to be output at the current point (no in step S2044), the trigger generator 16e2 proceeds to the processing of step S2060. In contrast, when the trigger generator 16e2 determines that the image data is to be output (yes in step S2044), the processing proceeds to step S205. In step S205, the trigger generator 16e2 performs output processing of outputting the trigger signal to the image processing unit 143. When the output processing has been executed, the processing proceeds to step S2060.

The process of step S2060 and the process of step S2062 following step S2060 are processes executed by the output controller 15 b. The output controller 15b performs output control processing on the recognition result output from the recognition processing execution unit 124 and the image data for visual recognition output from the image processing unit 143, respectively.

In step S2060, the output controller 15b stores the recognition result output from the recognition processing execution unit 124 in the cache memory 150 so as to execute the output storage processing. After the output controller 15b has stored the recognition result in the cache memory 150, the output controller 15b proceeds to the processing of step S2062.

In step S2062, the output controller 15b performs correction processing on the image data for visual recognition stored in the cache memory 151 in step S2060 using the motion information input from the external sensor 18, and stores the corrected image data for visual recognition in the cache memory 151. Without being limited thereto, in step S2062, the output controller 15b may perform the correction processing on the recognition result stored in the cache memory 150 in step S2060. The output controller 15b stores the corrected recognition result in the cache memory 150.

The output controller 15b outputs the recognition result stored in the cache memory 150 and the corrected image data for visual recognition stored in the cache memory 151 at a predetermined timing.

In this way, in the first modification of the fourth embodiment, the recognition result and the image data for visual recognition are cached separately, and the recognition result or the image data for visual recognition is corrected using the motion information input from the external sensor 18. This makes it possible to output the recognition result and the image data for visual recognition in a state in which the spatial deviation between the recognition result and the image data for visual recognition is suppressed.

[7-2. Second modification of the fourth embodiment]

Next, a second modification of the fourth embodiment will be described. A second modification of the fourth embodiment is an example in which, after the recognition processing by the recognition processing unit 12, the visual recognition processing unit 14 performs readout of pixel data at high speed in order to suppress a difference between the output timing of the recognition result and the output timing of the image data for visual recognition.

Fig. 48 is a diagram schematically showing an example of the output control processing according to the second modification of the fourth embodiment. In fig. 48, steps S10 to S12 correspond to fig. 32 described above, and schematically show the state of the frame read out by the sensor controller 11. That is, the frame is sequentially read out in the order of lines in step S10, the process then jumps to a position at which the object is predicted to be recognizable, and readout is performed in step S11. Subsequently, the recognition result is output in step S12.

In fig. 48, the recognition result has been output in step S12 in a state where not all the line data of the frame has been read out. Therefore, in the next step S20, the visual recognition processing unit 14 performs readout of the lines that had not yet been read out from the frame in the processing up to step S12. At this time, the visual recognition processing unit 14 performs the frame readout at a readout rate higher than that of the readout performed for the recognition processing unit 12 in the processing up to step S12. The visual recognition processing unit 14 completes the readout at a predetermined timing and outputs the image data for visual recognition.

In the second modification of the fourth embodiment, this makes it possible to suppress a time difference between the output timing of the recognition result and the output timing of the image data for visual recognition.

Fig. 49 is a functional block diagram showing functions of an example of the imaging apparatus 1 according to the second modification of the fourth embodiment. In fig. 49, the recognition result from the recognition processing execution unit 124 is supplied as appropriate, and the trigger generator 16f generates a trigger signal based on the recognition confidence score included in the recognition result delivered from the recognition processing execution unit 124, as described in the second modification of the third embodiment (refer to figs. 38 to 40).

When the recognition confidence score included in the recognition result delivered from the recognition processing execution unit 124 reaches the threshold value C_th at time t_TRG, the trigger generator 16f generates a trigger signal. In response to the trigger signal, the recognition processing execution unit 124 passes the recognition result to the output controller 15c. The output controller 15c stores the received recognition result in the cache memory 150.

On the other hand, together with outputting the trigger signal to the recognition processing execution unit 124, the trigger generator 16f generates a high-speed readout instruction instructing that pixel data be read out from the sensor unit 10 at a higher speed, and passes the high-speed readout instruction to the readout determiner 142.

The readout determiner 142 generates readout region information including the high-speed readout instruction transferred from the trigger generator 16f, and transfers the generated readout region information to the sensor controller 11. In the sensor controller 11, the readout region information including the high-speed readout instruction is transferred from the readout controller 111 to the readout unit 110. In response to the high-speed readout instruction included in the readout region information, the readout unit 110 generates an imaging control signal for driving the sensor unit 10 at a driving speed higher than that before the occurrence of the trigger signal. The sensor unit 10 is driven at high speed according to the imaging control signal, and pixel data is read out at a higher speed than before the occurrence of the trigger signal.

The pixel data read out by the readout unit 110 is transferred to the image processing unit 143 as image data for visual recognition via the image data storage controller 140. The image processing unit 143 performs image processing on the received image data, and passes the processed data to the output controller 15 c. The output controller 15c stores the image data transferred from the image processing unit 143 in the cache memory 151. The output controller 15c reads out the recognition result stored in the cache memory 150 and the image data for visual recognition stored in the cache memory 151 at a predetermined timing (for example, a timing synchronized with a frame period), and outputs the data, respectively.
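The following hedged sketch summarizes the trigger path just described; the threshold value and the callable stand-ins for the output controller 15c and the readout determiner 142 are assumptions for illustration only.

```python
C_TH = 0.9  # assumed recognition confidence threshold (C_th in the text)

def trigger_generator_16f(result, cache_recognition, request_high_speed_readout):
    """result: dict with 'confidence' and 'objects'.
    The two callables stand in for the output controller 15c (cache memory 150)
    and the readout determiner 142, respectively."""
    if result["confidence"] >= C_TH:
        cache_recognition(result["objects"])   # recognition result is latched for later output
        request_high_speed_readout()           # remaining lines will be read at a higher rate
        return True                            # trigger signal generated at time t_TRG
    return False

# Trivial usage with no-op stand-ins:
fired = trigger_generator_16f(
    {"confidence": 0.95, "objects": ["number 8"]},
    cache_recognition=lambda objs: None,
    request_high_speed_readout=lambda: None,
)
```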

Fig. 50 is a flowchart showing an example of processing according to the second modification of the fourth embodiment. In fig. 50, the processing of steps S200 to S203 is similar to the processing of steps S200 to S203 in the flowchart of fig. 34 described above, and the description thereof is therefore omitted here. Further, the processing of step S2044 and step S205 of fig. 50 is similar to the processing of step S2044 and step S205 of fig. 45 described above, and is performed, for example, by the trigger generators 16e1 and 16e2, respectively, in parallel. Detailed description of the processing of steps S2044 and S205 is omitted here.

Similar to step S2044 of the flowchart of fig. 45 described above, the trigger generator 16e1 determines in step S2044 whether the recognition processing execution unit 124 is allowed to output the recognition result. When the trigger generator 16e1 determines that the recognition result is not to be output at the current point (No in step S2044), the trigger generator 16e1 proceeds to the process of step S2063. In contrast, when the trigger generator 16f determines that the recognition result is to be output (Yes in step S2044), the processing proceeds to step S205. In step S205, the trigger generator 16e1 performs output processing that outputs the trigger signal to the recognition processing execution unit 124. When the output processing has been executed, the processing proceeds to step S2051.

In step S2051, high-speed readout processing is performed. That is, in step S2051, the trigger generator 16f generates a high-speed readout instruction and passes the instruction to the readout determiner 142. The high-speed readout instruction is included in the readout region information generated by the readout determiner 142 and is transmitted to the sensor controller 11. The sensor controller 11 transfers the received readout region information to the readout unit 110. The readout unit 110 drives the sensor unit 10 at a higher speed in response to the high-speed readout instruction included in the received readout region information. At this time, the readout unit 110 may perform thinning readout on the sensor unit 10 to increase the readout speed, or may reduce the bit depth of the image data to be read out to increase the readout speed.
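As a rough illustration of the two speed-up options just mentioned (line thinning and bit-depth reduction), the following toy calculation assumes that the readout time of the remaining lines is proportional to the number of lines actually read and to the bit depth transferred; all numbers are invented for the example.

```python
def remaining_readout_time_us(lines_remaining, line_time_us=15.0,
                              thinning=1, bit_depth=12, full_depth=12):
    """Coarse model: read every `thinning`-th line, with per-line time assumed
    proportional to the bit depth actually transferred."""
    lines_read = (lines_remaining + thinning - 1) // thinning
    return lines_read * line_time_us * (bit_depth / full_depth)

sequential = remaining_readout_time_us(540)                  # full depth, every line
thinned = remaining_readout_time_us(540, thinning=2)         # every other line
reduced_depth = remaining_readout_time_us(540, bit_depth=8)  # 8-bit instead of 12-bit
print(sequential, thinned, reduced_depth)  # 8100.0 4050.0 5400.0
```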

Having read out the pixel data from the sensor unit 10 driven at high speed, the readout unit 110 passes the pixel data to the visual recognition processing unit 14. The visual recognition processing unit 14 passes the pixel data received from the readout unit 110 to the image processing unit 143 via the image data storage controller 140. The image processing unit 143 performs image processing on the received image data, and outputs the processed data as image data for visual recognition. The image data for visual recognition output from the image processing unit 143 is transferred to the output controller 15c and stored in the cache memory 151.

In the next step S2063, the output controller 15c outputs the recognition result stored in the cache memory 150 and the image data for visual recognition stored in the cache memory 151 at a predetermined timing (for example, at a timing synchronized with the frame period).

In this way, in the second modification of the fourth embodiment, after the recognition result is output, the image data for visual recognition is acquired by high-speed readout, and the acquired image data for visual recognition and the recognition result are output at predetermined timings. This makes it possible to output the recognition result and the image data for visual recognition in a state in which the time lag between the recognition result and the image data for visual recognition is suppressed.

Although the above description is an example in which high-speed readout of image data for visual recognition is performed after the recognition result is output, the present disclosure is not limited to this example. That is, high-speed readout for the recognition processing can be performed after completion of readout of the image data for the visual recognition processing.

In this case, the trigger generator 16f generates a high-speed readout instruction in accordance with the output of the trigger signal to the image processing unit 143, and passes the generated high-speed readout instruction to the readout determiner 123 via the path shown by a broken line in fig. 49. The high-speed readout instruction is included in the readout region information generated by the readout determiner 123 and is transmitted to the sensor controller 11. The sensor controller 11 transfers the received readout region information to the readout unit 110. The readout unit 110 drives the sensor unit 10 at a higher speed in response to the high-speed readout instruction included in the received readout region information. The readout unit 110 reads out pixel data from the sensor unit 10 driven at high speed, and transfers the pixel data to the recognition processing unit 12.

[8. Fifth embodiment]

Next, a fifth embodiment will be described. The fifth embodiment is an example in which mediation is performed between a readout region for which the recognition processing unit 12 performs readout and a readout region for which the visual recognition processing unit 14 performs readout. For example, when one of the recognition processing unit 12 and the visual recognition processing unit 14 performs line readout with line thinning and the other performs readout in the order of lines, the target lines of readout will differ between the recognition processing unit 12 and the visual recognition processing unit 14 at certain timings. In this case, mediation of the readout region is performed between the recognition processing unit 12 and the visual recognition processing unit 14 so as to determine the line to be used as the readout target.

Fig. 51 is a flowchart showing an example of an overview of mediation processing according to the fifth embodiment. The processing in the flowchart of fig. 51 is processing performed for each readout of the readout unit. Hereinafter, it is assumed that the readout unit is a row, and the sensor controller 11 reads out pixel data from the sensor unit 10 in units of rows.

The process according to the flowchart of fig. 51 will be described with reference to fig. 21. In step S300, the readout unit 110 reads out pixel data (line data) in units of lines from the sensor unit 10. The readout unit 110 transfers the line data read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14. The visual recognition processing unit 14 transfers the pixel data transferred from the readout unit 110 to the image data storage controller 140. For example, the image data storage controller 140 stores the received pixel data in the image data storage unit 141, and transfers the data to the image processing unit 143.

When the processing of step S300 is completed, the processing proceeds to step S301 and step S311. The processing of steps S301 to S303 is processing in the recognition processing unit 12. In contrast, the processing of steps S311 to S313 is processing in the visual recognition processing unit 14. The processing in the recognition processing unit 12 and the processing in the visual recognition processing unit 14 may be executed in parallel.

First, the processing of the recognition processing unit 12 from step S301 will be described. In step S301, the recognition processing unit 12 calculates feature data by the feature data calculation unit 120 based on the line data transferred from the readout unit 110, stores the calculated feature data in the feature data storage unit 122, and performs recognition processing and the like by the recognition processing execution unit 124 based on the integrated feature data stored in the feature data storage unit 122. In the next step S302, the recognition processing unit 12 outputs the recognition result of the recognition processing from the recognition processing execution unit 124. In the next step S303, in the recognition processing unit 12, the readout determiner 123 generates readout line information indicating the next readout line as readout region information using the integrated feature data, and passes the generated information to the sensor controller 11. When the process of step S303 is completed, the process proceeds to step S320a.

Next, the processing performed by the visual recognition processing unit 14 from step S311 will be described. In step S311, the visual recognition processing unit 14 stores the line data transferred from the readout unit 110 in the image data storage unit 141, performs image processing on the image data stored in the image data storage unit 141 by the image processing unit 143, and so on. In the next step S312, the image processing unit 143 in the visual recognition processing unit 14 outputs the image data subjected to the image processing in step S311. In the next step S313, the readout determiner 142 in the visual recognition processing unit 14 generates readout line information indicating the next readout line as readout region information using the line information of the line read out in step S300 and the recognition result output by the recognition processing unit 12 in step S302, and passes the generated information to the sensor controller 11. When the process of step S313 is completed, the process proceeds to step S320a.

In step S320a, the sensor controller 11 acquires mediation information used for mediating the readout region by a mediation controller described below. Specific examples of the mediation information will be described below.

In the next step S321, the mediation controller determines which of the readout line indicated by the readout line information delivered from the recognition processing unit 12 in step S303 and the readout line indicated by the readout line information delivered from the visual recognition processing unit 14 in step S313 is to be used as the readout line for the next readout, using the mediation information acquired in step S320 a. In the next step S322, the sensor controller 11 executes the row control processing for executing readout of the readout row determined by the mediation controller in step S321.

In this way, in the fifth embodiment, the mediation controller performs mediation of the readout row to be read out next between the recognition processing unit 12 and the visual recognition processing unit 14 based on the mediation information acquired in a predetermined manner. Therefore, for example, even when the recognition processing unit 12 and the visual recognition processing unit 14 have determined different lines as the readout line, it is possible to avoid the occurrence of a problem in performing readout of the lines.
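Structurally, the flow of fig. 51 can be summarized by the following sketch (function names are placeholders and the loop termination is simplified): one readout unit is read, the two processing paths run in parallel and each proposes the next readout line, and the mediation result drives the next iteration.

```python
def frame_readout_loop(read_line, recognition_step, visual_step, mediate, num_lines):
    next_line = 0
    while next_line is not None and next_line < num_lines:
        line_data = read_line(next_line)                      # step S300
        rec_candidates = recognition_step(line_data)          # steps S301 to S303
        vis_candidates = visual_step(line_data)               # steps S311 to S313
        next_line = mediate(rec_candidates, vis_candidates)   # steps S320a, S321, S322
```

A concrete form of the mediate callable is sketched in the examples that follow.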

(8-0-1. Specific example of mediation processing)

Next, the mediation processing according to the fifth embodiment will be described more specifically. Fig. 52 is a functional block diagram showing an example of functions of the image forming apparatus 1 applicable to the fifth embodiment.

In the configuration shown in fig. 52, in comparison with the configuration shown in fig. 21, the readout controller 111a in the sensor controller 11 includes a mediation controller 1110 and a readout processing controller 1111. One or more pieces of readout region information from the readout determiner 123 of the recognition processing unit 12 and the readout determiner 142 of the visual recognition processing unit 14, respectively, are input to the mediation controller 1110 as control signals for controlling mediation processing. In other words, the mediation controller 1110 in this example uses the control signal as mediation information for performing mediation control.

Here, it is assumed that one readout area information indicates one line. That is, the mediation controller 1110 receives input of one or more pieces of line information from the readout determiners 123 and 142, respectively.

In the fifth embodiment, the mediation controller 1110 obtains a logical product of the control signals input from the readout determiner 123 and the control signals input from the readout determiner 142 of the visual recognition processing unit 14 to determine the single readout line to be read out next.

The mediation controller 1110 transfers a control signal indicating the readout row determined by the mediation processing to the readout processing controller 1111. The readout processing controller 1111 transfers the received control signal to the readout unit 110 as readout region information indicating a readout row.

In the configuration of fig. 52, the output controller 15 outputs the recognition result output from the recognition processing execution unit 124 and the image data for visual recognition output from the image processing unit 143 to devices of the subsequent stage. Here, the device in the subsequent stage may be, for example, another sensor device that performs recognition processing. In this case, the recognition result and the image data for visual recognition output from the imaging apparatus 1 may be used for the recognition processing in that other sensor device.

Fig. 53 is a schematic diagram showing mediation processing according to the fifth embodiment. Fig. 53 shows an example in which the mediation controller 1110 performs mediation processing based on the control signal. Here, the readout area information is used as a control signal.

Fig. 53 shows time passing to the right. Further, the vertical direction indicates the recognition control by the recognition processing unit 12, the visual recognition control by the visual recognition processing unit 14, the readout image read out from the sensor unit 10, and the mediation result by the mediation controller 1110, respectively. As described above, the mediation controller 1110 performs mediation by obtaining the logical product of the control signal (readout region information) output by the recognition processing unit 12 and the control signal output by the visual recognition processing unit 14.

In step S40, at the readout of the i-th line (line i), the recognition processing unit 12 and the visual recognition processing unit 14 each generate readout region information. The recognition processing unit 12 outputs readout region information for reading out three lines, namely the (i+1)th line, the (i+2)th line, and the (i+3)th line. Similarly, the visual recognition processing unit 14 also outputs control signals for reading out the (i+1)th line, the (i+2)th line, and the (i+3)th line. The readout region information output from the recognition processing unit 12 and the readout region information output from the visual recognition processing unit 14 are input to the mediation controller 1110 as control signals used by the mediation controller 1110 to perform mediation control.

The mediation controller 1110 obtains logical products of the respective control signals of the (i+1)th, (i+2)th, and (i+3)th lines from the recognition processing unit 12 and the respective control signals of the (i+1)th, (i+2)th, and (i+3)th lines from the visual recognition processing unit 14. Here, since the respective control signals from the recognition processing unit 12 and the respective control signals from the visual recognition processing unit 14 match, all the lines indicated by the respective control signals can be read out. The mediation controller 1110 selects the (i+1)th line, the (i+2)th line, and the (i+3)th line one by one according to the readout order in the frame, and outputs the lines in order. For example, the mediation controller 1110 first selects, as a mediation result, the control signal indicating the (i+1)th line, which is closest to the upper end of the frame among the (i+1)th line, the (i+2)th line, and the (i+3)th line. Thereafter, the mediation controller 1110 selects the control signal indicating the (i+2)th line and then the control signal indicating the (i+3)th line, that is, it selects the control signals one by one in the order of the lines.

The mediation controller 1110 transfers the control signal selected as the mediation result to the readout processing controller 1111. The readout processing controller 1111 transmits the control signal received from the mediation controller 1110 to the readout unit 110 as readout region information. The readout unit 110 reads out, from the sensor unit 10, the line data of the line indicated in the readout region information (the (i+1)th line). The readout unit 110 transfers the line data of the (i+1)th line read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14.

Through this processing, in step S40, as shown in the read image of fig. 53, line data is sequentially read in line order from, for example, line L # i on the upper end side of the frame toward the lower end side of the frame.
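A minimal sketch of this logical-product mediation, with readout region information modeled simply as sets of candidate line numbers (an assumption made for illustration):

```python
def mediate_logical_product(recognition_lines, visual_lines):
    # Logical product (AND) of the two groups of control signals.
    common = set(recognition_lines) & set(visual_lines)
    # Surviving lines are issued one by one in the readout order within the frame.
    return sorted(common)

# Step S40 of fig. 53 with i = 10: both sides propose lines i+1, i+2, i+3,
# so all of them survive and line i+1 (closest to the upper end) is selected first.
i = 10
queue = mediate_logical_product([i + 1, i + 2, i + 3], [i + 1, i + 2, i + 3])
next_readout_line = queue[0]  # -> 11
```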

The frame is read out sequentially in the order of lines, and it is assumed that, at the point where the frame has been read out to the vicinity of its center, namely the j-th line (line j), the recognition processing unit 12 has recognized the number "8" or "9" based on the line data read out so far (step S41).

The recognition processing unit 12 may perform readout by jumping to a line at which it is predicted that the object recognized in step S41 can be identified as either the number "8" or "9". In this example, the recognition processing unit 12 outputs readout region information for reading out three lines, namely the (j+3)th line, the (j+5)th line, and the (j+7)th line, so as to thin out every other line in the readout. Each piece of readout region information output from the recognition processing unit 12 is input to the mediation controller 1110 as control information for the mediation controller 1110 to perform mediation control.

On the other hand, since the image for visual recognition needs to be read out densely, the visual recognition processing unit 14 outputs readout region information for sequentially reading out three lines, namely the (j+1)th line, the (j+2)th line, and the (j+3)th line. Each piece of readout region information output from the visual recognition processing unit 14 is input to the mediation controller 1110 as control information for the mediation controller 1110 to perform mediation control.

The mediation controller 1110 obtains a logical product of each control signal delivered from the recognition processing unit 12 and each control signal delivered from the visual recognition processing unit 14. In this case, the control signals transferred from the recognition processing unit 12 correspond to the (j+3)th, (j+5)th, and (j+7)th lines, and the control signals transferred from the visual recognition processing unit 14 correspond to the (j+1)th, (j+2)th, and (j+3)th lines. Accordingly, when the logical product is obtained by the mediation controller 1110, a control signal indicating the (j+3)th line is output from the mediation controller 1110 as the mediation result.

Next, it is assumed that, when readout has proceeded from the above-described j-th line to the k-th line on the lower end side of the frame in step S42, the object recognized as the number "8" or "9" in step S41 is determined to be the number "8". In this case, the recognition processing unit 12 can end the recognition processing. Once the recognition processing is completed, the control signal input from the recognition processing unit 12 to the mediation controller 1110 becomes arbitrary.

On the other hand, for the image for visual recognition, lines still need to be read out. In this example, the visual recognition processing unit 14 sequentially outputs readout region information for reading out the (k+1)th line, the (k+2)th line, and the (k+3)th line in the order of lines.

For example, when the recognition processing by the recognition processing unit 12 is completed, the mediation controller 1110 ignores the control signal input from the recognition processing unit 12. Thus, for example, the mediation controller 1110 selects the (k+1)th line, the (k+2)th line, and the (k+3)th line indicated by the control signals input from the visual recognition processing unit 14 one by one according to the readout order within the frame, and outputs these lines in order. For example, the mediation controller 1110 first selects, as the mediation result, the control signal indicating the (k+1)th line, which is closest to the upper end of the frame among the (k+1)th line, the (k+2)th line, and the (k+3)th line.

In this way, in the fifth embodiment, the mediation controller 1110 obtains the logical product of each control signal input from the recognition processing unit 12 and each control signal input from the visual recognition processing unit 14, thereby performing mediation so as to determine the readout row to be read out next. Therefore, for example, even when the recognition processing unit 12 and the visual recognition processing unit 14 have determined different lines as the readout line, it is possible to avoid the occurrence of a problem in performing readout of the lines.

Incidentally, there may be a case where the line indicated by the control signal input from the recognition processing unit 12 and the line indicated by the control signal input from the visual recognition processing unit 14 do not overlap. For example, when the control signals input from the recognition processing unit 12 to the mediation controller 1110 are associated with the (i+1)th line, the (i+3)th line, and the (i+5)th line, and the control signals input from the visual recognition processing unit 14 to the mediation controller 1110 are associated with the (i+2)th line, the (i+4)th line, and the (i+6)th line, there is no overlap between the lines.

In this case, obtaining the logical product of the two by the mediation controller 1110 yields an empty set, so that the next readout line cannot be determined. As a first example of avoiding such a situation, it is conceivable to determine in advance which of the control signal input from the recognition processing unit 12 and the control signal input from the visual recognition processing unit 14 is to be used with higher priority. As an example, the control signal input from the recognition processing unit 12 is selected preferentially over the control signal input from the visual recognition processing unit 14. At this time, it is conceivable to adopt, among the control signals input from the prioritized recognition processing unit 12, the control signal closest to a control signal input from the visual recognition processing unit 14.

As a second example, it is conceivable that the recognition processing unit 12 and the visual recognition processing unit 14 set in advance a limit on the number of pieces of readout region information that can be output. For example, each of the recognition processing unit 12 and the visual recognition processing unit 14 selects three lines as candidates from five consecutive lines.
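The first avoidance strategy above could look like the following sketch (names and the tie-breaking rule are assumptions): when the logical product is empty, the recognition side is given priority and, among its candidates, the line closest to one of the visual side's candidates is adopted.

```python
def mediate_with_priority(recognition_lines, visual_lines):
    common = sorted(set(recognition_lines) & set(visual_lines))
    if common:
        return common[0]                       # normal case: earliest common line
    # Empty logical product: prefer the recognition side, picking the candidate that
    # lies closest to one of the lines requested by the visual recognition side.
    return min(recognition_lines,
               key=lambda r: min(abs(r - v) for v in visual_lines))

# Example from the text with i = 10: recognition proposes i+1, i+3, i+5 and the
# visual side proposes i+2, i+4, i+6; the intersection is empty and line i+1 wins.
i = 10
chosen_line = mediate_with_priority([i + 1, i + 3, i + 5], [i + 2, i + 4, i + 6])
```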

Fig. 54 is an example flowchart showing mediation processing according to the fifth embodiment. The processing according to the flowchart of fig. 54 is processing performed for each readout of the readout unit.

In step S300, the readout unit 110 reads line data from the sensor unit 10. The readout unit 110 transfers the line data read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14. Hereinafter, the processing on the recognition processing unit 12 side of steps S301 to S303 and the processing on the visual recognition processing unit 14 side of steps S311 to S313 in fig. 54 are the same as the corresponding processing in fig. 51 described above, and therefore, the description is omitted here.

In fig. 54, in step S303, as described above, the recognition processing unit 12 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110 as control information. Similarly, in step S313, the visual recognition processing unit 14 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110 as control information.

After control signals are input from the recognition processing unit 12 and the visual recognition processing unit 14 to the mediation controller 1110 in steps S303 and S313, the process proceeds to step S321. In step S321, the mediation controller performs mediation of the respective control signals by using the respective control signals input from the recognition processing unit 12 and the visual recognition processing unit 14 as mediation information. This mediation determines which of the readout row indicated by the control signal delivered from the recognition processing unit 12 in step S303 and the readout row indicated by the control signal delivered from the visual recognition processing unit 14 in step S313 is defined as the readout row to be read out next. In the next step S322, the sensor controller 11 executes the row control processing for executing readout of the readout row determined by the mediation controller in step S321.

[8-1. First modification of the fifth embodiment]

Next, a first modification of the fifth embodiment will be described. A first modification of the fifth embodiment is an example in which the recognition result of the recognition processing unit 12 is applied as mediation information for mediation processing by the mediation controller 1110 a. Fig. 55 is a functional block diagram showing an example of functions of the imaging apparatus 1 applicable to the first modification of the fifth embodiment.

In the configuration shown in fig. 55, compared with the configuration shown in fig. 52 described above, the recognition result output from the recognition processing execution unit 124 is input to the mediation controller 1110a. The mediation controller 1110a according to the first modification of the fifth embodiment uses the recognition result as mediation information to perform mediation processing between the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14.

Fig. 56 is a schematic diagram showing a first example of mediation processing according to the first modification of the fifth embodiment. Fig. 56 shows time passing to the right. Furthermore, the figure indicates, from the top in the vertical direction: the readout control and the recognition results; the readout image read out by the sensor controller 11; and the mediation result obtained by the mediation controller 1110a. Further, in the example of fig. 56, the imaging apparatus 1 is used for an in-vehicle application, and the frame is read out in units of lines from the lower end side to the upper end side.

In the example of fig. 56, when a moving body is recognized in the recognition processing, the control of the visual recognition processing unit 14 is prioritized.

On the lower end side of the frame (step S50), the recognition processing execution unit 124 in the recognition processing unit 12 recognizes the road surface based on the line data. In this case, since the recognition target is the road surface, readout can be performed while skipping lines to some extent. Further, the mediation controller 1110a performs mediation according to the recognition result of the recognition processing. For example, in accordance with the recognition result indicating that the road surface has been recognized, the mediation controller 1110a preferentially selects the readout region information input from the recognition processing unit 12 over the readout region information input from the visual recognition processing unit 14, and then passes the selected information to the readout processing controller 1111. For example, in the case where the recognition processing unit 12 controls readout by thinning out lines, readout region information generated by thinning lines at predetermined intervals is input to the mediation controller 1110a. The mediation controller 1110a passes the readout region information indicating the thinning to the readout processing controller 1111.

It is assumed that the recognition processing unit 12 recognizes a moving body based on the line data that has been read out. In the example of fig. 56, in step S51, the recognition processing execution unit 124 acquires an object detection result indicating that an object has been detected slightly before the line position at 1/2 of the frame, and detects that the detected object is a moving body (a person). In the recognition processing unit 12, the recognition processing execution unit 124 passes the recognition result indicating that a person has been recognized to the mediation controller 1110a.

The position of a moving body changes greatly over time; therefore, in the case of line readout that involves skipping lines, the positional deviation of the recognized object between the readout lines becomes large, making it necessary to correct the deviation. Therefore, based on the recognition result indicating that a person has been recognized, the mediation controller 1110a gives higher priority to the control for visual recognition: it preferentially selects the readout region information input from the visual recognition processing unit 14 over the readout region information input from the recognition processing unit 12, and passes the selected information to the readout processing controller 1111. In this case, the visual recognition processing unit 14 generates, for example, readout region information for sequentially performing readout in the order of lines, and inputs the generated information to the mediation controller 1110a.

Further, it is assumed that line readout has proceeded toward the upper end side of the frame, and that a recognition result indicating a non-road surface has been obtained by the recognition processing execution unit 124 in the recognition processing unit 12. In the case of a non-road surface, it is considered that there is no problem in performing line readout with thinning coarser than that used for the road surface recognized in step S50. Therefore, the recognition processing unit 12 generates readout region information having a thinning interval larger than that in the case of the road surface, and inputs the generated information to the mediation controller 1110a. Based on the recognition result indicating the non-road surface output from the recognition processing execution unit 124, the mediation controller 1110a preferentially selects the readout region information input from the recognition processing unit 12 over the readout region information input from the visual recognition processing unit 14, and transfers the selected information to the readout processing controller 1111.

In fig. 56, step S52 shows that, based on this readout region information, the non-road-surface region on the upper end side of the frame is read out more coarsely than the road-surface region on the lower end side of the frame. Step S52 also shows that the central portion of the frame, where the person has been detected, is densely read out for visual recognition.
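The switching behavior of this first example could be summarized as in the sketch below; the class labels and the candidate line lists are assumptions, not values from the text.

```python
def mediate_by_recognition(recognition_label, rec_region_info, vis_region_info):
    if recognition_label in ("road_surface", "non_road"):
        # Recognition side has priority: its (thinned) readout region information is used.
        return rec_region_info
    if recognition_label == "moving_body":
        # Visual side has priority: sequential, dense line readout.
        return vis_region_info
    return vis_region_info  # default to visual readout when nothing has been recognized yet

# Example: once a person (moving body) is recognized near the middle of the frame,
# the dense candidate list from the visual recognition processing unit is selected.
chosen = mediate_by_recognition("moving_body",
                                rec_region_info=[200, 204, 208],   # thinned lines
                                vis_region_info=[200, 201, 202])   # sequential lines
```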

Fig. 57 is a schematic diagram showing a second example of mediation processing according to the first modification of the fifth embodiment. Since each part of fig. 57 is similar to the corresponding part of fig. 56 described above, a description thereof will be omitted here. In addition, each image shown in fig. 57 is a dark image having low luminance as a whole, in which the near side (the lower end side of the frame) is slightly brighter.

In this second example, the mediation controller 1110a uses the confidence indicating the confidence level of the recognition result as mediation information, and performs mediation between the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14.

It is assumed that line reading is started from the lower end side of the frame, and the road surface is recognized with a predetermined level of confidence (high confidence) based on the line data by the recognition processing execution unit 124 in the recognition processing unit 12 (step S60). In this case, since the imaging target is the road surface, readout can be performed by skipping rows to some extent. The recognition processing unit 12 generates readout region information in which the lines have been thinned at predetermined intervals from the recognition result, and inputs the generated information to the mediation controller 1110 a.

Since the recognition result output from the recognition processing execution unit 124 has a high confidence, the mediation controller 1110a trusts the recognition result, and selects the readout region information indicating thinning input from the recognition processing unit 12 from the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14. The mediation controller 1110a passes readout region information indicating the thinning to the readout processing controller 1111.

It is assumed that a moving body is recognized by the recognition processing unit 12 based on the line data that has been read out. In the example of fig. 57, in step S61, the recognition processing execution unit 124 recognizes a moving body (a person) slightly before the line position at 1/2 of the frame. Here, in the example of fig. 57, the part where the person is recognized has low luminance, and therefore the recognition processing execution unit 124 has detected the person with a confidence less than a predetermined level (low confidence). The recognition processing execution unit 124 transfers a recognition result indicating that the person has been recognized with low confidence to the mediation controller 1110a.

In this case, since a person is recognized with low confidence, visual confirmation of the person is considered to be necessary. Therefore, the mediation controller 1110a gives higher priority to the readout region information output from the visual recognition processing unit 14 than to the readout region information output from the recognition processing unit 12.

Here, the mediation controller 1110a may give a readout instruction to the readout processing controller 1111 according to the confidence level output from the recognition processing execution unit 124 so as to make visual recognition easier. For example, control may be performed such that the lines in the line range in which the person has been recognized with low confidence are read out a plurality of times. In this case, in the visual recognition processing unit 14, the image processing unit 143 or the like may combine the line data read out from lines at mutually corresponding positions, and may further perform image processing (for example, high-resolution processing or contrast adjustment) so as to increase the sharpness of the image of the person recognized with low confidence.
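As a hedged sketch of this confidence-driven behavior (the threshold and the repeat count are assumed values, and the structure of the recognition result is simplified):

```python
CONFIDENCE_TH = 0.8  # assumed "predetermined level" of confidence

def mediate_by_confidence(result, rec_region_info, vis_region_info):
    if result["confidence"] >= CONFIDENCE_TH:
        # High confidence: trust the recognition side and keep its thinned readout.
        return {"lines": rec_region_info, "repeat": 1}
    # Low confidence: prioritize visual readout and read the uncertain line range
    # multiple times so it can later be combined and sharpened by image processing.
    return {"lines": vis_region_info, "repeat": 3}

plan = mediate_by_confidence({"label": "person", "confidence": 0.4},
                             rec_region_info=[300, 302, 304],   # thinned lines
                             vis_region_info=[300, 301, 302])   # sequential lines
```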

Further, when the imaging apparatus 1 is mounted on a vehicle capable of autonomous driving, it is conceivable to set the number of readouts, details of image processing, and the like according to an autonomous driving level (for example, level 2 to level 4) compatible with the vehicle.

Further, it is assumed that line readout has proceeded toward the upper end side of the frame, and that a recognition result indicating a non-road surface has been obtained with high confidence by the recognition processing execution unit 124 in the recognition processing unit 12 (step S62). In the case of a non-road surface, it is considered that there is no problem in performing line readout with thinning coarser than that used for the road surface recognized in step S60. Therefore, the recognition processing unit 12 generates readout region information having a thinning interval larger than that in the case of the road surface, and inputs the generated information to the mediation controller 1110a.

Note that, within the image for visual recognition, it is considered that the non-road surface image does not require high image quality. Therefore, it is conceivable that the visual recognition processing unit 14 outputs the image data for visual recognition with the default image quality setting in the case where the recognition processing execution unit 124 recognizes the non-road surface.

Fig. 58 is a flowchart showing an example of mediation processing according to the first modification of the fifth embodiment. The processing according to the flowchart of fig. 58 is processing performed for each readout of the readout unit.

In step S300, the readout unit 110 reads line data from the sensor unit 10. The readout unit 110 transfers the line data read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14. Hereinafter, the processing on the recognition processing unit 12 side of steps S301 to S303 and the processing on the visual recognition processing unit 14 side of steps S311 to S313 in fig. 58 are the same as the corresponding processing in fig. 51 described above, and therefore, the description is omitted here.

In fig. 58, in step S303, as described above, the recognition processing unit 12 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110a as control information. Similarly, in step S313, the visual recognition processing unit 14 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110a as control information.

After the control signals are input from the recognition processing unit 12 and the visual recognition processing unit 14 to the mediation controller 1110a in steps S303 and S313, the process proceeds to step S320 b. In step S320b, the mediation controller 1110a acquires the recognition result output from the recognition processing execution unit 124 as mediation information.

In the next step S321, the mediation controller 1110a determines which of the readout region information transferred from the recognition processing unit 12 in step S303 and the readout region information transferred from the visual recognition processing unit 14 in step S313 is to be defined as the readout region information indicating the line to be read out next, in accordance with the confidence level indicated in the recognition result acquired in step S320b. In the next step S322, the sensor controller 11 passes the readout region information determined by the mediation controller 1110a in step S321 to the readout processing controller 1111, and performs the line control processing to read out the readout line indicated in the readout region information.

In this way, in the first modification of the fifth embodiment, it is possible to adaptively determine which of the readout region information output from the recognition processing unit 12 and the readout region information output from the visual recognition processing unit 14 is to be used as readout region information indicating a readout row to be read out next, according to the recognition result based on line data. This makes it possible to obtain an appropriate image for visual recognition from various scenes as imaging targets.

[8-2. Second modification of the fifth embodiment]

Next, a second modification of the fifth embodiment will be described. A second modification of the fifth embodiment is an example in which the mediation controller 1110 applies the image for visual recognition output from the visual recognition processing unit 14 as mediation information for mediation processing. Fig. 59 is a functional block diagram showing an example of functions of the imaging apparatus 1 applicable to the second modification of the fifth embodiment.

In the configuration shown in fig. 59, compared with the configuration shown in fig. 52 described above, the signal processing result obtained by the image processing unit 143 during execution of the image processing is input to the mediation controller 1110b. The signal processing result may be the unprocessed image data for visual recognition output from the image processing unit 143, or may be image data processed so as to facilitate determination by the mediation controller 1110b. The mediation controller 1110b according to the second modification of the fifth embodiment uses the signal processing result as mediation information to perform mediation processing between the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14.

Fig. 60 is a schematic diagram showing mediation processing according to the second modification of the fifth embodiment. Since each part of fig. 60 is similar to the corresponding part of fig. 57 described above, a description thereof will be omitted here. Further, each image shown in fig. 60 corresponds to the images used in fig. 57 described above, and is a dark image having low luminance as a whole, in which the near side (the lower end side of the frame) is slightly brighter.

As shown in step S70, line readout is started from the lower end side of the frame. The image processing unit 143 calculates a luminance value of the read line data (for example, the average of the luminance values of the pixel data included in the line), and transfers a signal processing result including the calculated luminance value to the mediation controller 1110b. In the case where the luminance value included in the signal processing result transferred from the image processing unit 143 is equal to or greater than a first threshold value, the mediation controller 1110b determines that the readout region is bright and suitable for visual recognition, and thus performs control that gives higher priority to visual recognition. Specifically, out of the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14, the mediation controller 1110b selects the readout region information output from the visual recognition processing unit 14 as the readout region information indicating the line to be read out next.

In the example of fig. 60, it is assumed that the visual recognition processing unit 14 is requested to thin out lines when the luminance value of a line is a predetermined value or more. The visual recognition processing unit 14 generates readout region information in which the lines on the lower end side of the frame are thinned out, and passes the generated information to the mediation controller 1110b.

Based on the line data obtained by subsequently reading out the lines, the image processing unit 143 passes the signal processing result including the luminance value of the line data to the mediation controller 1110 b. In the case where the luminance value included in the received signal processing result is less than the first threshold value, the mediation controller 1110b determines that the readout region is dark and is not suitable for visual recognition, and thus performs control of giving higher priority to recognition. Specifically, the mediation controller 1110b selects the readout region information input from the recognition processing unit 12 from the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14 (step S71).

Line readout is further performed toward the upper end side of the frame, and the signal processing result including the luminance value of the line data is transferred from the image processing unit 143 to the mediation controller 1110 b. In the case where the luminance value included in the signal processing result delivered from the image processing unit 143 is the first threshold value or more, the mediation controller 1110b determines that the readout region is bright and suitable for visual recognition, and thus returns control to control that gives higher priority to visual recognition. Specifically, the mediation controller 1110b selects the readout region information output from the visual recognition processing unit 14 as readout region information indicating a row to be readout next, from the readout region information input from the recognition processing unit 12 and the readout region information input from the visual recognition processing unit 14 (step S72).

Although the above description is an example in which the luminance value is evaluated only against the first threshold value, the present disclosure is not limited to this example. For example, a second threshold value higher than the first threshold value may be provided and applied to the control that gives higher priority to visual recognition. By using this second threshold value, a so-called "blown out" state, in which the luminance value is saturated under the control that gives higher priority to visual recognition, can be avoided.
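One plausible reading of this two-threshold scheme is sketched below; both threshold values are assumed 8-bit levels, and the exact behavior at or above the second threshold is an interpretation rather than a definition from the text.

```python
LUMA_TH1 = 64    # first threshold: dark / bright boundary (assumed value)
LUMA_TH2 = 230   # second threshold: near saturation (assumed value)

def mean_line_luminance(line_pixels):
    # Average of the luminance values of the pixel data contained in the line.
    return sum(line_pixels) / len(line_pixels)

def mediate_by_luminance(mean_luma, rec_region_info, vis_region_info):
    if mean_luma < LUMA_TH1:
        return rec_region_info   # too dark for visual use: prioritize recognition readout
    if mean_luma >= LUMA_TH2:
        return rec_region_info   # nearly blown out: do not prioritize visual readout
    return vis_region_info       # bright enough: prioritize visual readout

chosen = mediate_by_luminance(mean_line_luminance([40] * 640),
                              rec_region_info=[100, 104, 108],
                              vis_region_info=[100, 101, 102])
```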

Fig. 61 is a flowchart showing an example of mediation processing according to a second modification of the fifth embodiment. The processing according to the flowchart of fig. 61 is processing performed for each readout of the readout unit.

In step S300, the readout unit 110 reads line data from the sensor unit 10. The readout unit 110 transfers the line data read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14. Hereinafter, the processing on the recognition processing unit 12 side of steps S301 to S303 and the processing on the visual recognition processing unit 14 side of steps S311 to S313 in fig. 61 are the same as the corresponding processing in fig. 51 described above, and therefore, the description is omitted here.

In fig. 61, in step S303, as described above, the recognition processing unit 12 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110b as control information. Similarly, in step S313, the visual recognition processing unit 14 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110b as control information.

After the control signals are input from the recognition processing unit 12 and the visual recognition processing unit 14 to the mediation controller 1110b in steps S303 and S313, the process proceeds to step S320 c. In step S320c, the mediation controller 1110b acquires the signal processing result output from the image processing unit 143 as mediation information.

In the next step S321, the mediation controller 1110b determines which of the readout region information transferred from the recognition processing unit 12 in step S303 and the readout region information transferred from the visual recognition processing unit 14 in step S313 is to be defined as readout region information indicating a line to be read out next, from the luminance value included in the signal processing result acquired in step S320 c. In the next step S322, the sensor controller 11 passes the readout region information determined by the mediation controller 1110b in step S321 to the readout process controller 1111, and performs the line control process to read out the readout line indicated in the readout region information.

In this way, in the second modification of the fifth embodiment, it is possible to adaptively determine which of the readout region information output from the recognition processing unit 12 and the readout region information output from the visual recognition processing unit 14 is to be used as readout region information indicating a readout row to be read out next, according to the luminance value based on line data. This makes it possible to obtain an appropriate image for visual recognition according to the brightness of the imaging environment.

[8-3. Third modification of the fifth embodiment]

Next, a third modification of the fifth embodiment will be described. A third modification of the fifth embodiment is an example in which the mediation controller 1110 applies external control information provided from the outside as mediation information for mediation processing. Fig. 62 is a functional block diagram showing an example of functions of the image forming apparatus 1 applicable to the third modification of the fifth embodiment.

In contrast to the configuration shown in fig. 52 described above, the configuration shown in fig. 62 allows external control information to be input to the mediation controller 1110c. When the imaging apparatus 1 is mounted on a vehicle capable of autonomous driving, information indicating an autonomous driving level (for example, level 2 to level 4) supported by the vehicle may be applied as the external control information. Without being limited thereto, output signals of other sensors may be used as the external control information. In this case, examples of other sensors include another imaging apparatus, a LiDAR sensor, a radar apparatus, and so forth. Further, the output of a camera of an electronic mirror, which monitors the conditions around the vehicle using a camera and a display, may also be used as the external control information. Further, for example, information indicating an operation mode of an external device may be used as the external control information.

The mediation controller 1110c according to the third modification of the fifth embodiment uses this external control information as mediation information to perform mediation processing between readout region information input from the recognition processing unit 12 and readout region information input from the visual recognition processing unit 14.
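A sketch of such a policy might look as follows; the reliability field, the level values, and the decision rules are assumptions chosen to match the examples discussed around the flowchart of fig. 63 below.

```python
def mediate_by_external_info(external, rec_region_info, vis_region_info):
    # Recognition information from another sensor with low reliability: do not select
    # the readout region information from the (internal) recognition processing side.
    if external.get("other_sensor_reliability", 1.0) < 0.5:
        return vis_region_info
    # Higher autonomous driving levels lean on machine recognition, so the recognition
    # side's readout region information is used with higher priority.
    if external.get("autonomous_driving_level", 0) >= 3:
        return rec_region_info
    return vis_region_info

chosen = mediate_by_external_info({"autonomous_driving_level": 4},
                                  rec_region_info=[100, 104, 108],  # thinned lines
                                  vis_region_info=[100, 101, 102])  # sequential lines
```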

Fig. 63 is a flowchart showing an example of mediation processing according to a third modification of the fifth embodiment. The processing according to the flowchart of fig. 63 is processing performed for each readout of the readout unit.

In step S300, the readout unit 110 reads line data from the sensor unit 10. The readout unit 110 transfers the line data read out from the sensor unit 10 to the recognition processing unit 12 and the visual recognition processing unit 14. Hereinafter, the processing on the recognition processing unit 12 side of steps S301 to S303 and the processing on the visual recognition processing unit 14 side of steps S311 to S313 in fig. 63 are the same as the corresponding processing in fig. 51 described above, and therefore, the description is omitted here.

In fig. 63, in step S303, as described above, the recognition processing unit 12 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110c as control information. Similarly, in step S313, the visual recognition processing unit 14 generates readout line information indicating the next readout line as readout region information, and passes the generated information to the sensor controller 11. The readout region information transmitted to the sensor controller 11 will be input to the mediation controller 1110c as control information.

After the control signals are input from the recognition processing unit 12 and the visual recognition processing unit 14 to the mediation controller 1110c in steps S303 and S313, the process proceeds to step S320d. In step S320d, the mediation controller 1110c acquires external control information input from the external device as mediation information.

In the next step S321, the mediation controller 1110c determines which of the readout region information transferred from the recognition processing unit 12 in step S303 and the readout region information transferred from the visual recognition processing unit 14 in step S313 is to be defined as the readout region information indicating the line to be read out next, based on the external control information acquired in step S320d.

At this time, when the mediation controller 1110c uses recognition information input from another sensor as the external control information and the reliability of that recognition information is low, the mediation controller 1110c performs control so as not to select the readout region information passed from the recognition processing unit 12 in step S303. Further, when the imaging apparatus 1 is mounted on a vehicle capable of autonomous driving and information indicating the autonomous driving level supported by the vehicle (for example, level 2 to level 4) is used as the external control information, it is conceivable to give higher priority to the readout region information passed from the recognition processing unit 12 in step S303.

In the next step S322, the sensor controller 11 passes the readout region information determined by the mediation controller 1110c in step S321 to the readout process controller 1111, and performs line control processing to read out the readout line indicated by the readout region information.

In this way, in the third modification of the fifth embodiment, the mediation controller 1110c performs mediation by using external control information output from an external device as mediation information. This makes the mediation applicable to a wide variety of situations; for example, when the external device is an external sensor that does not itself output image data for visual recognition, an image based on the image data for visual recognition can be supplied to that external sensor.
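
Purely as an illustration of the mediation decision in steps S320d to S322, the following is a minimal Python sketch. The class name, function names, field names, and thresholds (for example, the reliability cut-off and the treatment of the autonomous driving level) are hypothetical assumptions for this sketch and are not taken from the present disclosure.

```python
# Hypothetical sketch of the mediation decision in steps S320d-S322.
# Names, thresholds, and data structures are illustrative assumptions only.

RELIABILITY_THRESHOLD = 0.5  # assumed cut-off for external sensor reliability


class MediationController:
    def mediate(self, recog_region, visual_region, external_info):
        """Choose which readout region information indicates the next readout line.

        recog_region  -- readout region information from the recognition processing unit
        visual_region -- readout region information from the visual recognition processing unit
        external_info -- external control information (e.g., autonomous driving level,
                         recognition output of another sensor and its reliability)
        """
        # If recognition information from another sensor is supplied and its
        # reliability is low, do not select the recognition-side region.
        reliability = external_info.get("sensor_reliability")
        if reliability is not None and reliability < RELIABILITY_THRESHOLD:
            return visual_region

        # If the vehicle reports a higher autonomous driving level, give
        # priority to the recognition-side readout region information.
        level = external_info.get("autonomous_driving_level", 0)
        if level >= 3:  # assumed boundary for prioritizing recognition readout
            return recog_region

        # Otherwise default to the visual-recognition-side region.
        return visual_region


# Usage example (step S321); the sensor controller would then pass the result
# to the readout process controller for the line control processing of step S322.
controller = MediationController()
next_region = controller.mediate(
    recog_region={"line": 42},
    visual_region={"line": 40},
    external_info={"autonomous_driving_level": 3, "sensor_reliability": 0.9},
)
```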

[9. Sixth embodiment]

Next, as a sixth embodiment, application examples of the imaging apparatus 1 according to the first to fifth embodiments and their modifications of the present disclosure will be described. Fig. 64 is a diagram illustrating usage examples of the imaging apparatus 1 according to the first to fifth embodiments and the respective modifications described above.

The imaging apparatus 1 described above can be used, for example, in the following various situations in which light such as visible light, infrared light, ultraviolet light, or X-rays is sensed.

Devices that capture images for entertainment viewing, such as digital cameras and mobile devices with camera functionality.

Devices for traffic use, such as on-board sensors that image the front, rear, surroundings, and interior of a vehicle to ensure safe driving including automatic stopping and to recognize the state of the driver; surveillance cameras that monitor running vehicles and roads; and distance measuring sensors that measure the distance between vehicles.

Devices for household appliances such as televisions, refrigerators, and air conditioners, which image a user's gestures and operate the appliance according to those gestures.

Devices for medical and health care, such as endoscopes and devices that perform angiography by receiving infrared light.

Devices for security, such as surveillance cameras for crime prevention and cameras for personal identity verification.

Devices for cosmetic use, such as skin measuring devices for imaging the skin and microscopes for imaging the scalp.

Devices for sports, such as motion cameras and wearable cameras for sports applications.

Devices for agriculture, such as cameras for monitoring the conditions of fields and crops.

[ further application example of the technology according to the present disclosure ]

The technique according to the present disclosure (the present technique) is applicable to various products. For example, the technique according to the present disclosure may be applied to devices mounted on various moving objects such as automobiles, electric vehicles, hybrid electric vehicles, motorcycles, bicycles, personal mobility devices, airplanes, drones, ships, and robots.

Fig. 65 is a block diagram showing an example of a schematic configuration of a vehicle control system as an example of a mobile body control system to which the technique according to the embodiment of the present disclosure is applicable.

The vehicle control system 12000 includes a plurality of electronic control units connected to each other via a communication network 12001. In the example shown in fig. 65, the vehicle control system 12000 includes a drive system control unit 12010, a vehicle body system control unit 12020, an outside-vehicle information detection unit 12030, an inside-vehicle information detection unit 12040, and an integrated control unit 12050. Further, a microcomputer 12051, a sound/image output section 12052, and an in-vehicle network interface (I/F) 12053 are shown as a functional configuration of the integrated control unit 12050.

The drive system control unit 12010 controls the operation of devices related to the drive system of the vehicle according to various programs. For example, the drive system control unit 12010 functions as a control device for a driving force generating device, such as an internal combustion engine or a driving motor, that generates the driving force of the vehicle; a driving force transmitting mechanism that transmits the driving force to the wheels; a steering mechanism that adjusts the steering angle of the vehicle; a braking device that generates the braking force of the vehicle; and the like.

The vehicle body system control unit 12020 controls the operation of various devices provided on the vehicle body according to various programs. For example, the vehicle body system control unit 12020 functions as a control device for a keyless entry system, a smart key system, a power window device, or various lamps such as headlamps, back lamps, brake lamps, turn signals, and fog lamps. In this case, radio waves transmitted from a portable device that substitutes for a key, or signals of various switches, can be input to the vehicle body system control unit 12020. The vehicle body system control unit 12020 receives these input radio waves or signals and controls the door lock device, the power window device, the lamps, and the like of the vehicle.

The vehicle exterior information detection unit 12030 detects information about the outside of the vehicle equipped with the vehicle control system 12000. For example, the imaging section 12031 is connected to the vehicle exterior information detection unit 12030. The vehicle exterior information detection unit 12030 causes the imaging section 12031 to capture an image of the outside of the vehicle, and receives the captured image. Based on the received image, the vehicle exterior information detection unit 12030 may perform processing of detecting an object such as a person, a vehicle, an obstacle, a sign, or a symbol on the road surface, or processing of detecting the distance to such an object.

The imaging section 12031 is an optical sensor that receives light and outputs an electric signal corresponding to the amount of light of the received light. The imaging section 12031 can output an electric signal as an image, or can output an electric signal as information on a measured distance. Further, the light received by the imaging section 12031 may be visible light, or may be invisible light such as infrared light.

The in-vehicle information detection unit 12040 detects information about the interior of the vehicle. The in-vehicle information detection unit 12040 may be connected to a driver state detection unit 12041 that detects the state of the driver. The driver state detection unit 12041 includes, for example, a camera that photographs the driver. Based on the detection information input from the driver state detection section 12041, the in-vehicle information detection unit 12040 can calculate the degree of fatigue of the driver or the degree of concentration of the driver, or can discriminate whether the driver is dozing.

The microcomputer 12051 is able to calculate a control target value for the driving force generation apparatus, the steering mechanism, or the brake apparatus based on information about the interior or exterior of the vehicle obtained by the vehicle exterior information detection unit 12030 or the vehicle interior information detection unit 12040, and output a control command to the drive system control unit 12010. For example, the microcomputer 12051 can execute cooperative control intended to realize functions of an Advanced Driver Assistance System (ADAS) including collision avoidance or impact buffering for the vehicle, following driving based on an inter-vehicle distance, vehicle speed keeping driving, warning of a vehicle collision, warning of a vehicle lane departure, and the like.

Further, the microcomputer 12051 can perform cooperative control intended for automated driving or the like, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generating device, the steering mechanism, the braking device, and the like based on the information about the outside or inside of the vehicle obtained by the vehicle exterior information detection unit 12030 or the in-vehicle information detection unit 12040.

Further, the microcomputer 12051 can output a control command to the vehicle body system control unit 12020 based on the information about the outside of the vehicle obtained by the vehicle exterior information detection unit 12030. For example, the microcomputer 12051 can control the headlamps according to the position of a preceding vehicle or an oncoming vehicle detected by the vehicle exterior information detection unit 12030, switching from high beam to low beam, thereby performing cooperative control intended to prevent glare.
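
As a purely illustrative sketch of this kind of glare-prevention logic, the following shows a minimal beam-selection function. The function name, the detection data structure, and the distance threshold are hypothetical assumptions and are not taken from the vehicle control system described here.

```python
# Hypothetical sketch of automatic high-beam control for glare prevention.
# The detection format and the distance threshold are illustrative assumptions.

GLARE_DISTANCE_M = 400  # assumed range within which another vehicle could be dazzled


def select_beam(detections):
    """Return 'low' if a preceding or oncoming vehicle is close enough to be dazzled,
    otherwise 'high'. Each detection is a dict like {'type': 'vehicle', 'distance_m': 120}."""
    for det in detections:
        if det.get("type") == "vehicle" and det.get("distance_m", float("inf")) < GLARE_DISTANCE_M:
            return "low"
    return "high"


# Usage example: a vehicle detected 120 m ahead forces low beam.
print(select_beam([{"type": "vehicle", "distance_m": 120}]))     # -> "low"
print(select_beam([{"type": "pedestrian", "distance_m": 30}]))   # -> "high"
```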

The sound/image output portion 12052 transmits an output signal of at least one of sound and image to an output device capable of visually or aurally notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of fig. 65, an audio speaker 12061, a display portion 12062, and an instrument panel 12063 are shown as output devices. The display portion 12062 may include, for example, at least one of an on-board display and a head-up display.

Fig. 66 is a diagram illustrating an example of the installation positions of the imaging section 12031. In fig. 66, the imaging section 12031 includes imaging sections 12101, 12102, 12103, 12104, and 12105.

The imaging sections 12101, 12102, 12103, 12104, and 12105 may be arranged, for example, at the front nose, the side mirrors, the rear bumper, the rear door, and the upper portion of the windshield inside the vehicle 12100. The imaging section 12101 provided at the front nose and the imaging section 12105 provided at the upper portion of the windshield inside the vehicle mainly obtain images of the area ahead of the vehicle 12100. The imaging sections 12102 and 12103 provided at the side mirrors mainly obtain images of the areas to the sides of the vehicle 12100. The imaging section 12104 provided at the rear bumper or the rear door mainly obtains images of the area behind the vehicle 12100. The imaging section 12105 provided at the upper portion of the windshield inside the vehicle is mainly used to detect a preceding vehicle, a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, and the like.

Incidentally, fig. 66 shows an example of the imaging ranges of the imaging sections 12101 to 12104. The imaging range 12111 represents the imaging range of the imaging section 12101 provided at the front nose. The imaging ranges 12112 and 12113 represent the imaging ranges of the imaging sections 12102 and 12103 provided at the side mirrors, respectively. The imaging range 12114 represents the imaging range of the imaging section 12104 provided at the rear bumper or the rear door. For example, a bird's-eye view image of the vehicle 12100 as viewed from above can be obtained by superimposing the image data captured by the imaging sections 12101 to 12104.
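
As an illustration only, the following is a minimal sketch of composing such a bird's-eye view by warping the four camera images onto a common ground plane with OpenCV. It assumes that the ground-plane homographies have already been obtained by calibration; the function name, output size, and simple overwrite compositing are assumptions for this sketch.

```python
# Hypothetical sketch of bird's-eye view composition from four camera images.
# Homographies are assumed to be pre-calibrated; names and sizes are illustrative.
import cv2
import numpy as np

BEV_SIZE = (800, 800)  # assumed output size (width, height) of the bird's-eye view


def compose_birds_eye_view(images, homographies):
    """Warp each camera image with its ground-plane homography and overlay the results."""
    bev = np.zeros((BEV_SIZE[1], BEV_SIZE[0], 3), dtype=np.uint8)
    for img, H in zip(images, homographies):
        warped = cv2.warpPerspective(img, H, BEV_SIZE)
        mask = warped.sum(axis=2) > 0   # pixels actually covered by this camera
        bev[mask] = warped[mask]        # simple overwrite; blending is also possible
    return bev
```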

At least one of the imaging sections 12101 to 12104 may have a function of obtaining distance information. For example, at least one of the imaging sections 12101 to 12104 may be a stereo camera composed of a plurality of imaging elements, or may be an imaging element having pixels for phase difference detection.

For example, based on the distance information obtained from the imaging sections 12101 to 12104, the microcomputer 12051 can determine the distance to each three-dimensional object within the imaging ranges 12111 to 12114 and the temporal change in that distance (the relative speed with respect to the vehicle 12100), and thereby extract, as a preceding vehicle, the nearest three-dimensional object that is present on the traveling path of the vehicle 12100 and travels in substantially the same direction as the vehicle 12100 at a predetermined speed (for example, equal to or greater than 0 km/h). Further, the microcomputer 12051 can set in advance the following distance to be maintained in front of a preceding vehicle, and perform automatic brake control (including following stop control), automatic acceleration control (including following start control), and the like. It is thus possible to perform cooperative control intended for automated driving or the like in which the vehicle travels autonomously without depending on the driver's operation.
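
The following is a minimal, hedged sketch of the preceding-vehicle extraction just described, selecting the nearest on-path object that moves in substantially the same direction at or above a speed threshold. The field names, the heading tolerance, and the data structure are assumptions for illustration.

```python
# Hypothetical sketch of extracting the preceding vehicle from per-object distance
# information. Field names and thresholds are illustrative assumptions only.

MIN_SPEED_KMH = 0.0           # speed at or above which an object counts as travelling
HEADING_TOLERANCE_DEG = 15.0  # "substantially the same direction" as the own vehicle


def extract_preceding_vehicle(objects):
    """Return the nearest object on the own traveling path that moves in substantially
    the same direction at MIN_SPEED_KMH or more, or None if there is no such object.

    Each object is a dict such as:
    {'distance_m': 35.0, 'speed_kmh': 42.0, 'heading_deg': 3.0, 'on_path': True}
    where speed_kmh is derived from the temporal change of the measured distance.
    """
    candidates = [
        o for o in objects
        if o["on_path"]
        and abs(o["heading_deg"]) <= HEADING_TOLERANCE_DEG
        and o["speed_kmh"] >= MIN_SPEED_KMH
    ]
    return min(candidates, key=lambda o: o["distance_m"], default=None)


# Usage example: the object 35 m ahead on the path is selected as the preceding vehicle.
objs = [
    {"distance_m": 35.0, "speed_kmh": 42.0, "heading_deg": 3.0, "on_path": True},
    {"distance_m": 20.0, "speed_kmh": 30.0, "heading_deg": 90.0, "on_path": False},
]
print(extract_preceding_vehicle(objs))
```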

For example, based on the distance information obtained from the imaging sections 12101 to 12104, the microcomputer 12051 can classify three-dimensional object data regarding three-dimensional objects into two-wheeled vehicles, standard-sized vehicles, large vehicles, pedestrians, utility poles, and other three-dimensional objects, extract the classified data, and use it for automatic avoidance of obstacles. For example, the microcomputer 12051 discriminates whether an obstacle around the vehicle 12100 is an obstacle that can be visually recognized by the driver of the vehicle 12100 or an obstacle that is difficult for the driver of the vehicle 12100 to visually recognize. The microcomputer 12051 then determines the collision risk, which indicates the degree of risk of collision with each obstacle. When the collision risk is equal to or higher than a set value and there is thus a possibility of collision, the microcomputer 12051 outputs a warning to the driver via the audio speaker 12061 or the display portion 12062, or performs forced deceleration or avoidance steering via the drive system control unit 12010, whereby the microcomputer 12051 can assist driving for collision avoidance.
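
Purely as an illustration of this obstacle handling, the following sketch classifies each obstacle, checks whether the driver can see it, and acts on the collision risk. The function name, the object classes, the visibility flag, and the risk threshold are hypothetical assumptions.

```python
# Hypothetical sketch of obstacle handling: act on the collision risk depending on
# whether the driver can visually recognize the obstacle. Names and the threshold
# are illustrative assumptions only.

RISK_THRESHOLD = 0.7  # assumed set value above which a collision is considered possible


def handle_obstacle(obstacle):
    """obstacle: dict with 'class' (e.g. 'pedestrian', 'two_wheeler', 'vehicle',
    'utility_pole'), 'visible_to_driver' (bool), and 'collision_risk' (0.0-1.0)."""
    if obstacle["collision_risk"] < RISK_THRESHOLD:
        return "no_action"
    if obstacle["visible_to_driver"]:
        # The driver may still react: warn via the audio speaker or the display.
        return "warn_driver"
    # Hard for the driver to see: request forced deceleration or avoidance steering.
    return "forced_deceleration_or_avoidance"


# Usage example:
print(handle_obstacle({"class": "pedestrian", "visible_to_driver": False,
                       "collision_risk": 0.9}))  # -> forced_deceleration_or_avoidance
```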

At least one of the imaging sections 12101 to 12104 may be an infrared camera that detects infrared rays. For example, the microcomputer 12051 can recognize a pedestrian by determining whether or not a pedestrian is present in the images captured by the imaging sections 12101 to 12104. Such pedestrian recognition is performed, for example, by a procedure of extracting feature points from the images captured by the imaging sections 12101 to 12104 as infrared cameras and a procedure of determining whether an object is a pedestrian by performing pattern matching processing on the series of feature points representing the contour of the object. When the microcomputer 12051 determines that a pedestrian is present in the images captured by the imaging sections 12101 to 12104 and thus recognizes the pedestrian, the sound/image output portion 12052 controls the display portion 12062 to display a rectangular contour line superimposed on the recognized pedestrian so as to emphasize the recognized pedestrian. The sound/image output portion 12052 may also control the display portion 12062 to display an icon or the like representing the pedestrian at a desired position.
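
As a loosely related, hedged sketch of pedestrian detection with a rectangular overlay, the following uses OpenCV's built-in HOG-based people detector. This is a different technique from the feature-point extraction and pattern matching described above, but it illustrates the same detect-and-emphasize flow; the function name and parameters are assumptions for this sketch.

```python
# Hypothetical sketch of pedestrian detection and emphasis with a rectangular
# contour, using OpenCV's HOG people detector instead of the feature-point
# pattern matching described in the text.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())


def emphasize_pedestrians(image):
    """Detect pedestrians and draw a rectangular contour line around each one."""
    rects, _weights = hog.detectMultiScale(image, winStride=(8, 8))
    for (x, y, w, h) in rects:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return image
```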

An example of the vehicle control system to which the technique according to the present disclosure can be applied has been described above. The technique according to the present disclosure can be applied to the imaging section 12031 in the configuration described above. By applying the imaging apparatus 1 according to the present disclosure to the imaging section 12031, it is possible to achieve both imaging for recognition processing and imaging for visual recognition, and to provide sufficient information for each of the recognition processing and the visual recognition by a person.

The effects described in this specification are merely examples and are not limiting; other effects may also exist.

Further, the present technology may also have the following configuration.

(1)

An image forming apparatus comprising:

an imaging unit having a pixel region in which a plurality of pixels are arranged, and reading and outputting pixel signals from the pixels included in the pixel region;

a readout unit controller that controls a readout unit provided as a part of the pixel region;

a first readout unit setting unit that sets a first readout unit for reading out pixel signals from the pixel region to perform a recognition process of training data for which each of the readout units has been learned;

a second readout unit setting unit that sets a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation unit that performs mediation between the first readout unit and the second readout unit,

wherein the readout unit controller sets the readout unit by mediation by the mediation unit.

(2)

The image forming apparatus according to (1),

wherein the mediation unit performs mediation by a logical product of the first readout unit and the second readout unit.

(3)

The image forming apparatus according to (1),

wherein the mediation unit performs mediation based on a result of the recognition processing.

(4)

The image forming apparatus according to (3),

wherein the mediation unit selects the second readout unit in a case where a result of the recognition processing indicates recognition of the moving body.

(5)

The image forming apparatus according to (3) or (4),

wherein the mediation unit selects the second readout unit in a case where the result of the recognition processing indicates a recognition confidence of the threshold value or less.

(6)

The image forming apparatus according to (1),

wherein the mediation unit performs mediation based on the pixel signal read out from the second readout unit.

(7)

The image forming apparatus according to (6),

wherein the mediation unit selects the second readout unit in a case where the luminance based on the pixel signal exceeds the threshold value.

(8)

The imaging apparatus according to any one of (1) to (7),

wherein the mediation unit performs mediation based on external information provided from outside of the imaging apparatus.

(9)

The image forming apparatus according to (8),

wherein the mediation unit performs mediation based on an operation mode provided from the outside.

(10)

The image forming apparatus according to (8) or (9),

wherein the mediation unit performs mediation based on a detection output of another sensor device provided from the outside.

(11)

An imaging system, comprising:

an imaging apparatus equipped with:

an imaging unit having a pixel region in which a plurality of pixels are arranged, and reading and outputting pixel signals from the pixels included in the pixel region;

a readout unit controller that controls a readout unit provided as a part of the pixel region;

a first readout unit setting unit that sets a first readout unit for reading out pixel signals from the pixel region to perform a recognition process of training data for which each of the readout units has been learned;

a second readout unit setting unit that sets a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation unit that performs mediation between the first readout unit and the second readout unit; and

an information processing apparatus provided with an identification unit that performs identification processing,

wherein the readout unit controller sets the readout unit by mediation by the mediation unit.

(12)

An imaging method performed by a processor, comprising:

a readout unit control step of controlling a readout unit provided as a part of a pixel region in which a plurality of pixels are arranged, the pixel region being included in the imaging unit;

a first readout unit setting step of setting a first readout unit for reading out pixel signals from pixels included in the pixel region to perform a recognition process of training data for which each of the readout units has been learned;

a second readout unit setting step of setting a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation step for performing mediation between the first readout unit and the second readout unit,

wherein the readout unit control step sets the readout unit by mediation of the mediation step.

(13)

An imaging program causing a processor to execute:

a readout unit control step of controlling a readout unit provided as a part of a pixel region in which a plurality of pixels are arranged, the pixel region being included in the imaging unit;

a first readout unit setting step of setting a first readout unit for reading out pixel signals from pixels included in the pixel region to perform a recognition process of training data for which each of the readout units has been learned;

a second readout unit setting step of setting a second readout unit for reading out the pixel signal from the pixel region to output the pixel signal to a subsequent stage; and

a mediation step for performing mediation between the first readout unit and the second readout unit,

wherein the readout unit control step sets the readout unit by mediation of the mediation step.

Further, the present technology may also have the following configuration.

(14)

An electronic device, comprising:

an imaging unit that generates image data;

a machine learning unit that performs machine learning processing using a learning model on the image data of each unit region read out from the imaging unit; and

a function execution unit that executes a predetermined function based on a result of the machine learning process.

(15)

The electronic device according to (14),

wherein the machine learning unit performs a machine learning process using a Convolutional Neural Network (CNN) on the image data of the unit region input first among the image data of the same frame.

(16)

The electronic apparatus according to (15),

wherein, in a case where the machine learning process using the CNN on the image data of the unit region input first fails, the machine learning unit performs a machine learning process using a Recurrent Neural Network (RNN) on the image data of the unit region input next in the same frame.

(17)

The electronic apparatus according to any one of (14) to (16), further comprising:

a control section that reads out image data from the imaging unit in units of lines,

wherein the image data is input to the machine learning unit in units of lines.

(18)

The electronic apparatus according to any one of (14) to (17),

wherein the image data in the unit area is image data having a predetermined number of lines.

(19)

The electronic apparatus according to any one of (14) to (17),

wherein the image data in the unit area is image data in a rectangular area.

(20)

The electronic apparatus according to any one of (14) to (19), further comprising:

a memory which records a program of the learning model,

wherein the machine learning unit executes the machine learning process by reading out the program from the memory and executing the program.

List of reference marks

1 image forming apparatus

10 sensor unit

11 sensor controller

12 recognition processing unit

14 visual recognition processing unit

110 readout unit

111 readout controller

120 feature data calculation unit

121 feature data storage controller

122 feature data storage unit

123 readout determiner

124 recognition process execution unit

140 image data storage controller

141 image data storage unit

143 image processing unit.
