Depth and vision sensor for challenging agricultural environments


Abstract

A method is provided for three-dimensional imaging of a plant in an indoor agricultural environment having an ambient light power spectrum that differs from the power spectrum of natural outdoor light. The method includes: directing a pair of spatially separated stereo cameras at a scene including the plant; illuminating the scene with a non-uniform pattern provided by a projector that utilizes light in a frequency band having a below-average ambient intensity in the indoor agricultural environment; filtering light entering the image sensor of each camera with a filter that selectively passes light in the frequency band utilized by the projector; capturing an image of the scene with each camera to obtain first and second camera images; and generating a depth map comprising a depth value corresponding to each pixel in the first camera image.

1. A method for three-dimensional imaging of a plant in an indoor agricultural environment, wherein an ambient light power spectrum of the indoor agricultural environment is different from a power spectrum of natural outdoor light, the method comprising:

directing a spatially separated stereo camera pair at a scene including the plant;

illuminating the scene with a non-uniform pattern provided by a projector that utilizes light in a frequency band having a below-average ambient intensity in the indoor agricultural environment;

filtering light entering an image sensor of each of the cameras with a filter that selectively passes light within the frequency band utilized by the projector;

capturing images of the scene with each camera to obtain a first camera image and a second camera image; and

generating a depth map comprising a depth value corresponding to each pixel in the first camera image.

2. The method of claim 1, wherein the filter is a band pass filter, and further comprising blocking a majority of ambient light from entering the image sensor of each of the cameras with the band pass filter.

3. The method of claim 1, wherein the projector emits violet light.

4. The method of claim 3, wherein the frequency band of light emitted from the projector is from 400 nm to 430 nm.

5. The method of claim 3, wherein the light entering the image sensor of each of the cameras is filtered with an optical low pass filter.

6. The method of claim 1, wherein the projector emits red light.

7. The method of claim 6, wherein the frequency band of light emitted from the projector is from 620 nm to 680 nm.

8. The method of claim 6, wherein the light entering the image sensor of each of the cameras is filtered with an optical bandpass filter.

9. The method of claim 1, wherein the projector emits green light.

10. The method of claim 9, wherein the frequency band of light emitted from the projector is from 520 nm to 560 nm.

11. The method of claim 9, wherein the light entering the image sensor of each of the cameras is filtered with an optical bandpass filter.

12. The method of claim 1, wherein the non-uniform pattern is a non-uniform dot pattern generated by passing laser light emitted from the projector through a diffractive optical element.

13. The method of claim 1, further comprising characterizing the ambient light power spectrum in the indoor agricultural environment, and selecting the frequency band based on the characterization.

14. The method of claim 1, further comprising selecting the frequency band based on a form of illumination used for the plant in the indoor agricultural environment.

15. The method of claim 1, further comprising polarizing light from the projector at a selected polarization angle and filtering light entering an image sensor of each of the cameras using a filter that selectively passes light of the selected polarization angle.

16. The method of claim 1, further comprising illuminating the scene with an unpatterned light source in the frequency band.

17. The method of claim 1, wherein generating the depth map comprises rectifying the first camera image and the second camera image such that horizontal lines drawn across the first camera image and the second camera image correspond to the same epipolar lines in real-world space.

18. The method of claim 17, wherein generating the depth map further comprises:

performing a depth measurement for each pixel in the first camera image by matching the pixel with a corresponding pixel in the second camera image, the matching comprising:

searching candidate corresponding pixels along the epipolar line;

ranking each of the candidate corresponding pixels; and

selecting the candidate corresponding pixel with the highest ranking among the candidate corresponding pixels as the corresponding pixel.

19. The method of claim 18, wherein generating the depth map further comprises determining a distance of each pixel from the stereo camera pair as a function of a separation distance between each pixel in the first camera image and its corresponding pixel in the second camera image.

20. The method of claim 1, further comprising identifying a location of an agricultural item in the depth map.

21. The method of claim 20, further comprising transmitting the location of the agricultural item to a robotic system configured to harvest the agricultural item.

22. The method of claim 21, further comprising harvesting the agricultural item with the robotic system.

23. A system for three-dimensional imaging of a plant in an indoor agricultural environment, wherein an ambient light power spectrum of the indoor agricultural environment is different from a power spectrum of natural outdoor light, the system comprising:

a spatially separated stereo camera pair, each camera comprising an image sensor;

a filter disposed before the image sensor of each camera, the filter configured to block light at wavelengths having a higher than average ambient intensity in the indoor agricultural environment;

a projector configured to project a non-uniform light pattern onto a scene including the plant; and

a processor configured to generate a depth map of the scene from images captured by the stereo camera pair.

24. The system of claim 23, wherein the filter blocks a majority of ambient light in the indoor agricultural environment.

25. The system of claim 23, wherein the projector emits violet light.

26. The system of claim 25, wherein the projector emits light in a frequency band from 400 nm to 430 nm.

27. The system of claim 25, wherein the filter is an optical low pass filter.

28. The system of claim 23, wherein the projector emits red light.

29. The system of claim 28, wherein the projector emits light in a frequency band from 620 nm to 680 nm.

30. The system of claim 28, wherein the filter is an optical bandpass filter.

31. The system of claim 23, wherein the projector emits green light.

32. The system of claim 31, wherein the projector emits light in a frequency band from 520 nm to 560 nm.

33. The system of claim 31, wherein the filter is an optical bandpass filter.

34. The system of claim 23, further comprising a spectrum analyzer configured to characterize the ambient light power spectrum in the indoor agricultural environment and provide an indication of a bandwidth of light having a lower than average ambient intensity in the indoor agricultural environment.

35. The system of claim 23, wherein a frequency of light emitted by the projector is selected based on a form of illumination used for the plant in the indoor agricultural environment.

36. The system of claim 23, wherein the projector comprises a laser emitter configured to emit laser light at a frequency that is not substantially attenuated by the filter, and a diffractive optical element in an optical path of the laser, the diffractive optical element configured to produce the non-uniform pattern as a non-uniform dot pattern.

37. The system of claim 23, wherein the projector further comprises a polarizer configured to polarize light projected from the projector at a selected polarization angle, and the filter is configured to selectively pass light of the selected polarization angle into the image sensor of each camera.

38. The system of claim 23, further comprising a second projector configured to illuminate the scene with an unpatterned light source in a frequency band selectively passed by the filter.

39. The system of claim 23, wherein the processor is further configured to identify a location of an agricultural item in the depth map.

40. The system of claim 39, further comprising a robotic harvester in communication with the processor and configured to harvest the agricultural item.

41. A robotic harvester configured to harvest produce items from locations identified by the system of claim 23.

Technical Field

Aspects and embodiments disclosed herein relate to machine imaging of agricultural products to determine a three-dimensional position of the agricultural product.

Background

In order to guide an automated multiple-degree-of-freedom system (robotic system) to interact with living plants for harvesting, trimming, trellising, or various forms of analysis, high-precision real-time three-dimensional imaging of plants is often used as one of a variety of sensor modes in fully automated systems. Indoor agricultural environments present unique lighting conditions that are not typical anywhere else. In addition, the actual shape, irregularity, and surface color or texture of vine and leaf crops present uniquely challenging targets for conventional three-dimensional (3D) imaging sensors. Existing and commercially available 3D imaging systems perform very poorly under these unique conditions.

Disclosure of Invention

According to aspects disclosed herein, there is provided a method for three-dimensional imaging of a plant in an indoor agricultural environment having an ambient light power spectrum that is different from the power spectrum of natural outdoor light. The method includes directing a pair of spatially separated stereo cameras at a scene including the plant and illuminating the scene with a non-uniform pattern provided by a projector that utilizes light in a frequency band having an intensity lower than the average ambient intensity in the indoor agricultural environment. Light entering the image sensor of each camera is filtered with a filter that selectively passes light in the frequency band used by the projector, an image of the scene is captured with each camera to obtain first and second camera images, and a depth map is generated that includes a depth value corresponding to each pixel in the first camera image.

In some embodiments, the filter is a band pass filter, and the method further comprises blocking a majority of ambient light from entering the image sensor of each camera with the band pass filter.

In some embodiments, the projector emits violet light. The frequency band of the light emitted from the projector may be from 400 nm to 430 nm. The light entering the image sensor of each camera may be filtered with an optical low pass filter.

In some embodiments, the projector emits red light. The frequency band of the light emitted from the projector may be from 620 nm to 680 nm. The light entering the image sensor of each camera may be filtered using an optical bandpass filter.

In some embodiments, the projector emits green light. The frequency band of the light emitted from the projector may be from 520 nm to 560 nm. The light entering the image sensor of each camera may be filtered using an optical bandpass filter.

In some embodiments, the non-uniform pattern is a non-uniform dot pattern generated by passing laser light emitted from a projector through a diffractive optical element.

In some embodiments, the method further comprises characterizing an ambient light power spectrum in the indoor agricultural environment and selecting a frequency band based on the characterization.

In some embodiments, the method further comprises selecting the frequency band based on a form of illumination used for plants in the indoor agricultural environment.

In some embodiments, the method further includes polarizing light from the projector at a selected polarization angle and filtering light entering the image sensor of each camera with a filter that selectively passes light of the selected polarization angle.

In some embodiments, the method further comprises illuminating the scene with an unpatterned light source in the frequency band.

In some embodiments, generating the depth map comprises rectifying the first and second camera images such that horizontal lines drawn across the first and second camera images correspond to the same epipolar lines in real-world space.

In some embodiments, generating the depth map further comprises performing a depth measurement for each pixel in the first camera image by searching for candidate corresponding pixels along the epipolar line to match the pixel with a corresponding pixel in the second camera image, ranking each of the candidate corresponding pixels, and selecting the candidate corresponding pixel with the highest rank as the corresponding pixel. Generating the depth map may also include determining a distance of each pixel from the stereo camera pair as a function of the separation distance between each pixel in the first camera image and its corresponding pixel in the second camera image.
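For reference, the standard triangulation relationship behind this last step, for rectified cameras with focal length f (in pixels), baseline B, and disparity d (the separation distance referenced above), is:

```latex
Z = \frac{f \cdot B}{d}
```

For example, with an assumed f = 900 pixels, B = 55 mm, and d = 20 pixels (illustrative values, not taken from this disclosure), Z = 900 × 55 / 20 = 2475 mm, i.e., roughly 2.5 m; larger disparities correspond to nearer points.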

In some embodiments, the method further comprises identifying a location of an agricultural item in the depth map. The method may also include transmitting the location of the agricultural item to a robotic system configured to harvest the agricultural item. The method may further comprise harvesting the agricultural item using the robotic system.

According to another aspect, a system is provided for three-dimensional imaging of a plant in an indoor agricultural environment having an ambient light power spectrum that is different from the power spectrum of natural outdoor light. The system comprises: a spatially separated stereo camera pair, each camera comprising an image sensor; a filter placed in front of each camera's image sensor and configured to block light at wavelengths having a higher than average ambient intensity in the indoor agricultural environment; a projector configured to project a non-uniform light pattern onto a scene including the plant; and a processor configured to generate a depth map of the scene from images captured by the stereo camera pair.

In some embodiments, the filter blocks a majority of ambient light in an indoor agricultural environment.

In some embodiments, the projector emits violet light. The projector may emit light in a frequency band from 400 nm to 430 nm. The filter may be an optical low pass filter.

In some embodiments, the projector emits red light. The projector may emit light in a frequency band from 620 nm to 680 nm. The filter may be an optical band pass filter.

In some embodiments, the projector emits green light. The projector may emit light in a frequency band from 520 nm to 560 nm. The filter may be an optical band pass filter.

In some embodiments, the system further comprises a spectrum analyzer configured to characterize the ambient light power spectrum in the indoor agricultural environment and provide an indication of a bandwidth of light having a lower than average ambient intensity in the indoor agricultural environment.

In some embodiments, the frequency of light emitted by the projector is selected based on the form of illumination used for plants in an indoor agricultural environment.

In some embodiments, the projector includes a laser emitter configured to emit laser light at a frequency that is not substantially attenuated by the filter, and a diffractive optical element in the optical path of the laser light, the diffractive optical element configured to produce the non-uniform pattern as a non-uniform dot pattern.

In some embodiments, the projector further includes a polarizer configured to polarize light projected by the projector at a selected polarization angle, and the filter is configured to selectively pass light of the selected polarization angle to the image sensor of the camera.

In some embodiments, the system further includes a second projector configured to illuminate the scene with an unpatterned light source in a frequency band selectively passed by the filter.

In some embodiments, the processor is further configured to identify a location of the agricultural item in the depth map.

In some embodiments, the system further includes a robotic harvester in communication with the processor and configured to harvest the agricultural product items.

According to another aspect, there is provided a robotic harvester configured to harvest agricultural product items from locations identified by embodiments of the above-described system.

Drawings

The figures are not drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:

FIG. 1A shows an arrangement of imaging devices in a stereoscopic imaging system;

FIG. 1B shows an arrangement of imaging and illumination devices in another example of a stereoscopic imaging system;

FIG. 2 illustrates how the distance of an observed object from a stereoscopic imaging system can be determined;

FIG. 3A illustrates a pattern that may be projected onto a scene to help determine the depth of features in the scene;

FIG. 3B illustrates another pattern that may be projected onto a scene to help determine the depth of features in the scene;

FIG. 4 illustrates the wavelengths of light used in some near infrared imaging systems relative to the ambient light in a typical industrial environment;

FIG. 5 shows the relative intensity of sunlight and of the ambient light within an example greenhouse at different wavelengths;

FIG. 6 shows the relative intensity of a violet light source compared to ambient light in a greenhouse at different frequencies;

FIG. 7 shows the relative intensity of a red light source compared to ambient light within a greenhouse at different frequencies;

FIG. 8 shows the relative intensity of a green light source compared to ambient light within a greenhouse at different frequencies;

FIG. 9 shows an example of a system for three-dimensional imaging of a plant in an indoor agricultural environment; and

FIG. 10 shows an example of a robotic harvester.

Detailed Description

The aspects and embodiments disclosed herein are not limited to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The aspects and embodiments disclosed herein are capable of being practiced or of being carried out in various ways.

3D imaging for robots

Many robotic systems automatically manipulate a target item in some manner using mechanical manipulators and tools that are capable of moving in an environment. In general, as is the case with widely used industrial automation, the task is simple and highly constrained: moving a single type of article from one location to another, or operating with the high degree of repeatability required in many manufacturing processes. More recently, however, robotics has been applied to increasingly complex tasks. For example, in the e-commerce industry, robotic systems are being developed to handle unsorted items, even among thousands of different items, and place them in packaging for shipment. Humanoid robots are being studied that can navigate hazardous and complex environments to perform everyday manual tasks, such as opening doors or containers. Finally, some organizations have begun developing agricultural robotic systems that directly manipulate live crops to automate seeding, propagation, deleafing, trellising, and harvesting procedures.

In order to plan and execute motions that avoid damaging collisions and optimize speed, a robotic system should be able to determine its location in the local environment and understand the extent of the local environment in which it may operate. These tasks are commonly referred to collectively as "navigation," the combined ability to self-localize, map, and plan. There are many techniques for fusing one or more sensor signals to localize within an a priori map or to construct a map. One sensor mode used in this process is range imaging. Range imaging refers to a set of techniques that ultimately generate a 2D image in which each pixel or element encodes a distance from a reference point to a corresponding point in the environment.

Advantages of stereoscopic 3D camera

In some examples, the robot responsible for generating the 3D image of the scene may utilize a stereoscopic camera.

While many range imaging sensor modes have existed for some time (e.g., structured light, radar, LiDAR, time-of-flight, and interferometric measurements), these tend to be too expensive, insufficiently robust to variations in surface type or illumination, or unable to provide full range images in real time. The threshold for real-time operation varies according to the speed of the relevant dynamics in the system, but is typically within a fraction of a second for most robotic systems.

LiDAR systems, in particular, typically have very high resolution and repeatability. Unfortunately, they are generally composed of various mechanically moving parts that are prone to failure and make the instrument highly sensitive to mechanical vibration or shock. Time-of-flight sensors are also highly accurate, but tend to have a longer minimum measurement distance due to the high speeds (in the range of tens of picoseconds per processor cycle) required for time-of-flight measurements. Both techniques suffer reduced accuracy or invalid data when a surface has extreme reflective or transmissive properties. Many models also fail to achieve adequate signal-to-noise ratios in environments with high near infrared (NIR) illumination, such as outdoors. Therefore, versions of these sensors that have high spatial resolution, can operate under adverse lighting conditions, and can measure at short minimum distances (less than a few meters) are very expensive (typically thousands of dollars or more).

Fortunately, new types of sensors capable of generating range images in real time are already on the market (e.g., the Intel RealSense™ D415, ASUS Xtion, Occipital Structure, and Stereolabs ZED cameras). Each utilizes a variation of stereo vision to extract distance information from one or more CCD or CMOS image sensors, in some examples using a projected light pattern to solve the correspondence problem, and triangulates the distance for each location within a two-dimensional "depth map". Stereo vision has been studied as a depth imaging technique for many years, but until recently the computational complexity associated with solving the correspondence problem (in which a region in one image is identified in another image to determine its disparity, also known as "stereo matching") was prohibitive, limiting these sensors to very low spatial resolutions or exceptionally slow speeds that do not meet real-time requirements.

This limitation has been overcome by several developments. Improved algorithms solve the correspondence problem more accurately and reduce the computational resources required. Off-the-shelf CCD and CMOS image sensors have become cheaper and higher performance. Furthermore, each of these new stereo-based sensors utilizes a dedicated integrated circuit, or ASIC, to perform stereo matching more efficiently than a general purpose processor. Some of the above sensor systems use two color image sensors without additional illumination (binocular stereo vision), some use one color sensor and a precisely known, unique visible projected light pattern (a variation of stereo vision known as structured light), and others combine both technologies. In many examples, these sensors fuse the previously described 2D depth map or range image with a 2D color image to create a projected 3D point cloud, where each point is given the exact color measured in the color image. Cameras that do so are commonly referred to as RGB-D ("red green blue depth") cameras.
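The deprojection step just described, fusing a depth map with a registered color image into a colored point cloud, can be sketched compactly. The following is a minimal illustration, not taken from this disclosure; the pinhole intrinsics fx, fy, cx, cy and the function name are assumptions:

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Project a 2D depth map into a colored 3D point cloud.

    depth : (H, W) array of distances in meters (0 = invalid).
    rgb   : (H, W, 3) color image registered to the depth map.
    fx, fy, cx, cy : assumed pinhole intrinsics of the depth camera.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # back-project along the camera X axis
    y = (v - cy) * z / fy          # back-project along the camera Y axis
    valid = z > 0                  # drop invalid or low-confidence pixels
    points = np.stack([x[valid], y[valid], z[valid]], axis=-1)
    colors = rgb[valid]            # attach the color measured at each pixel
    return points, colors
```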

Binocular stereo matching and dot matrix diagram

In some examples, providing a light pattern on a scene to be imaged may help a robot image the scene three-dimensionally with a stereo camera.

Binocular RGB-D cameras typically rely on a process known as binocular stereo matching to estimate the scene depth at each given pixel. Binocular stereo matching works by pointing two cameras at the same scene with a small spatial separation, known as the baseline, between them, as shown in FIGS. 1A and 1B. The two cameras together are called a stereo pair. In various embodiments disclosed herein, the binocular vision system may include not only two cameras but also an illumination system, e.g., a laser illuminator that generates and directs laser light through a dot pattern generator to produce a dot pattern on the scene to be imaged as described below, and/or a colored non-laser light source for further illuminating the scene as described further below. Appropriate lenses and/or filters may be associated with the cameras and light sources, as shown in FIG. 1B.

The two captured images are rectified such that horizontal lines drawn across the two images correspond to the same epipolar lines in real-world space. See FIG. 2.

A depth measurement is computed at each pixel by searching along the epipolar line to match each pixel in the left image with a corresponding pixel in the right image. All candidate pixels are ranked using a set of heuristics (e.g., the L2 norm over an N × N pixel image window), and the candidate pixel with the highest score is taken as the corresponding matched pixel. The horizontal pixel separation distance (disparity) between two matching pixels is inversely proportional to the distance of the imaged point from the camera. The output of this process is referred to as a depth map, in which a depth value is generated for each input pixel, or a zero value for invalid or low-confidence measurements. The matching heuristics vary greatly from implementation to implementation, but typically include components of RGB and luminance difference metrics.
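As an illustration of the search-and-rank procedure just described, here is a minimal single-pixel sketch in Python. The window size, disparity range, and the plain SSD (L2) cost are assumptions chosen for clarity; commercial sensors implement richer heuristics in dedicated hardware:

```python
import numpy as np

def block_match_row(left, right, y, x, n=7, max_disp=64):
    """Naive stereo match for one pixel of a rectified grayscale pair.

    Scores candidates along the same row (the epipolar line after
    rectification) with the L2 norm over an n x n window and returns
    the disparity of the best-scoring candidate.
    """
    half = n // 2
    ref = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(0, min(max_disp, x - half) + 1):   # search along the epipolar line
        cand = right[y - half:y + half + 1,
                     x - d - half:x - d + half + 1].astype(np.float64)
        cost = np.sum((ref - cand) ** 2)              # L2 (SSD) ranking heuristic
        if cost < best_cost:
            best_cost, best_d = cost, d               # keep the highest-ranked candidate
    return best_d
```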

During a pixel matching search, scenes with high pixel-to-pixel variation increase the chance that a match can be found with high confidence, especially if each region of the image is unique. For example, if the sensor is measuring a scene with a Christmas tree, the bright lights on the tree stand out from the dark pine needle background, so the matching of bright pixels will have a much higher confidence score than all other candidate pixels. In contrast, in non-textured or low-contrast scenes (e.g., flat walls), pixel matching is ambiguous because no set of heuristics can disambiguate between neighboring pixels. When this occurs, depth measurement techniques applied to most or all areas of the scene may return inaccurate or invalid data.

To overcome these problems and reduce the sensor's dependence on the unique characteristics of a particular scene's content to produce reliable depth measurements, recent cameras (e.g., the Intel RealSense™ and Microsoft Kinect cameras) employ an artificial dot pattern provided by a projector. In some cases, the projector's pattern is designed to ensure that each of its subregions is unique (as in the case of the De Bruijn dot pattern shown in FIG. 3B). When this type of pattern is used, stereo matching can be done with only one camera and a projector if the baseline between them and the globally unique pattern of the projector are known a priori (this is the method used by the Microsoft Kinect camera). In other examples, a randomized dot pattern, for example as shown in FIG. 3A, does not guarantee uniqueness but complements the normal binocular (two-camera) stereo matching process. Random dot patterns add contrast and texture to surfaces in the scene and still tend to be mostly unique when examining small local areas of the image. This significantly reduces matching ambiguity on surfaces that do not have much inherent texture (e.g., flat or curved surfaces of uniform color).

A dot pattern projected onto the scene makes the pixel-window heuristics more likely to match correctly within an image pair.
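For illustration only, a randomized dot pattern of the kind shown in FIG. 3A can be synthesized in a few lines; the dot density and function name below are assumptions:

```python
import numpy as np

def random_dot_pattern(height, width, density=0.02, seed=0):
    """Generate a binary randomized dot pattern.

    density is the fraction of 'on' pixels. Small local windows of such
    a pattern are unique with high probability, which is what
    disambiguates stereo matching on textureless surfaces.
    """
    rng = np.random.default_rng(seed)
    return (rng.random((height, width)) < density).astype(np.uint8) * 255
```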

In practice, a properly projected and imaged dot pattern can make the difference, for a robotic application, between a sensor that can resolve the depth of every pixel and one that cannot resolve any depth measurements at all, or that provides measurements with high error rates.

In implementations of binocular stereo depth sensors and structured light vision sensors, it is now common to employ projected dot patterns to enhance stereo matching. However, these sensors are not designed or optimized for operation in vegetation-rich indoor and outdoor agricultural environments, and they suffer severe performance degradation in those environments for reasons described further below.

Working wavelength of binocular stereo depth projector

The effectiveness of three-dimensional imaging of a scene may be enhanced by utilizing light wavelengths selected based on background ambient light in a particular environment.

Radiation wavelengths in the near infrared (NIR) (e.g., 800-1000 nm) are commonly used for the pattern projectors of depth imaging sensors. This is due to the moderate spectral sensitivity of many commercially available silicon CMOS imaging chips to NIR light, and the lack of NIR light in typical industrial sensing environments (e.g., factories or offices).

In an office or factory environment, lighting is primarily provided by fluorescent or LED light sources. These sources are energy efficient mainly because they emit only photons visible to the human eye, with dominant wavelengths between 400 and 700 nm, centered at 555 nm to match the CIE luminosity curve used to measure lumens.

In choosing the emission wavelength of a laser- or LED-based pattern projector for a binocular stereo depth camera, the sensor designer wants the emitted and imaged pattern not to be overwhelmed, or "washed out," by the ambient lighting conditions, as this would negate the advantages of the projected pattern in the first place. Because these indoor environments do not have much ambient lighting at wavelengths beyond 700 nm (see FIG. 4), and because the projection of visible patterns may be undesirable if the sensor is used on or around human users, typical off-the-shelf binocular stereo cameras are designed to operate in the near infrared (typically 830 or 860 nm, because of the common availability of inexpensive, high-optical-power laser sources at these wavelengths). Furthermore, the typical spectral sensitivity of silicon CMOS imaging sensors is much higher in this range (sometimes up to 20-40% quantum efficiency) than at longer wavelengths. These factors all contribute to sensor designers' preference for operating in the near infrared, where common components are available and environmental interference is minimal in the intended operating environments (home, office, or warehouse).

Ambient lighting in high vegetation agricultural environments

Near infrared illumination of a scene may not be the best choice in all environments for facilitating three-dimensional imaging. While 860 nm projectors may be well suited to home or office environments, they are nearly the worst-case choice for highly vegetated indoor agricultural environments (e.g., commercial greenhouses). They are also unsuitable for outdoor operation. This is because, in these challenging environments, there is very strong illumination in the 750-1000 nm NIR band.

Existing sensors cannot filter out these wavelengths, because they must observe their own projected patterns within 830-860 nm. Thus, in an agricultural environment, these sensors, without modification, produce overexposed, low-contrast image regions with little or no detail for performing stereo matching. Their projected pattern is washed out by ambient NIR light and provides little or no additional contrast in the image.

To understand why so much NIR light is present in a typical indoor agricultural production environment, consider how these environments are constructed, along with the light absorption characteristics of plants. Plastic- or glass-covered greenhouses, high tunnels, and hoop houses (i.e., protected cultivation or "indoor farms") generally strive to maximize the amount of light available to a crop. Most of these environments use natural sunlight to maximize yield and growth potential. Although the covering glass or plastic can attenuate UV radiation in a protected cultivation environment, it blocks little IR light.

This is by design, as the retained IR light (via the "greenhouse effect") provides additional heat within the covered growth environment, reducing or eliminating the need for supplemental heating during cooler months. In addition, healthy vegetation is a reasonably effective reflector of near infrared light (nearly 60% at 830-860 nm). Taken together, the transmittance of the covering material and the absorption spectrum of healthy vegetation produce a normalized ambient spectrum expected to look like that shown in FIG. 5 (superimposed on the ASTM reference diffuse-reflectance solar spectrum at sea level).

Note the key difference compared to other lighting environments: so much NIR intensity is retained in a vegetated environment that its spectral power is significantly higher than that of the visible spectrum. Thus, conventional stereo vision sensors that are sensitive to NIR and utilize NIR projectors cannot operate effectively in these environments.

Adjusting wavelength to maximize performance

The ability to image with light at wavelengths outside the near infrared band may facilitate three-dimensional imaging of scenes in certain environments. To create a dedicated stereo imaging sensor that performs well in the previously described agricultural environments, three novel examples of imaging system configurations are disclosed herein. Each variant of the system includes two modified elements. First, a pair of CMOS camera sensors is used as a stereo pair in the same manner as previously constructed depth imaging sensors. However, a filter medium is placed over each sensor that either limits the received light to a narrow, specific frequency band of interest (a "band-pass" filter medium) or blocks light above a specific wavelength (a "low-pass" filter medium). Second, a light pattern projector is used to project a light pattern onto the imaged scene.

In some embodiments, a narrow band-pass filter on the pair of CMOS stereo camera sensors eliminates nearly all ambient illumination in the scene, so that the light forming the final image on each CMOS sensor comes primarily from projector-emitted light reflected by surfaces in the scene. In these embodiments, a laser-based pattern projector may be used, in which a laser source passes through a diffractive optical element to produce the desired pattern. Other light sources and optics may be used to achieve a similar effect (e.g., a Gobo disk with an incoherent point light source focused by an external optical element). Laser sources, in particular semiconductor solid-state lasers, are advantageous because they consume very little electrical energy and can be integrated into very small mechanical packages. These two attributes make them ideal for mobile, battery-powered robotic systems operating in the field. In embodiments utilizing a band-pass filter medium, the light source may be selected to precisely match the band-pass filter medium placed on each CMOS sensor. The pattern to be projected may be a standard random or globally unique dot pattern, such as those described previously.

In each of these three configurations, a different primary wavelength band is used for the filter media and the pattern projector light source. The use of these paired sensing and projection wavelengths is novel in stereo vision, and each of the three primary bands is selected to provide a different set of advantages for a particular indoor agricultural environment.

Example #1: 400-430 nm operating band (violet)

For maximum immunity to ambient light in enclosed (greenhouse, hoop house, etc.) crop environments that are illuminated mainly by natural sunlight, visible violet sensing is particularly suitable, since most commercially used greenhouse coverings block most of the ambient UV radiation in sunlight. By selecting the deep blue to violet visible wavelengths from 400-430 nm as the region of interest, the depth sensor can operate with near-complete elimination of ambient light (see FIG. 6). Due to the low sensitivity of silicon (Si) CMOS sensors below 400 nm, an optical low-pass filter with a cut-off wavelength of 430 nm can be used instead of a band-pass filter to obtain the desired end result. The low-pass cut-off wavelength should be designed to be as close as possible to the high end of the projector's emitted spectral power distribution (SPD). In this way, the sensor designer can maximize the ability of the imaging sensor to read the projected dot pattern reflected from the environment being measured.

For pattern projection, a wavelength-matched solid-state semiconductor laser diode may be used. Such diodes with center wavelengths in the 405-415 nm range are widely and inexpensively available, even at high continuous-wave (CW) power (particularly due to the popularity of Blu-ray disc read and write technology, which uses 405 nm InGaN solid-state laser diodes). Since virtually no ambient light is readable in this configuration, a laser source with as high an optical power as possible may be desired. A binocular stereo camera can still operate very effectively and accurately in total darkness, relying only on the projected texture pattern it can match, but the brighter and higher-contrast the texture pattern, the shorter the CMOS exposure duration can be. Laser light passed through diffractive optics can produce very high-contrast patterns. In such a design, the total spectral power of the laser pattern projector should be high enough that, given the sensitivity of the particular CMOS sensor used, the exposure duration can still resolve scenes with a modest amount of motion at a real-time rate (which typically means exposures of tens of milliseconds or less).

Some practical considerations make it technically challenging to design a system operating in such a short-wavelength band. CMOS sensors with moderate to high sensitivity in the visible violet 400-430 nm range can be chosen (e.g., 55-60% in the case of the recently released Sony IMX265 sensor), but they are not particularly sensitive to UV radiation. It is difficult to obtain a single-mode laser source of high CW power, and it is also difficult to design and manufacture inexpensive diffractive optical elements having a design wavelength in this range, low zero-order intensity, and a wide field of view. Diffractive optics that address these problems are not yet widely commercially available. These design constraints can be overcome, but they make the development and production costs of this wavelength selection much higher than those of the two alternative embodiments discussed below.

Furthermore, vegetation absorbs 90-95% of light within this band, which places even higher demands on the brightness of the light source to obtain a high-contrast pattern that can be exposed at real-time speeds.

Example #2: 620-680 nm operating band (red)

To facilitate the use of common off-the-shelf components and to speed development and deployment, visible red sensing in the 620-680 nm range is an attractive operating wavelength range due to the high absorption of this band by vegetation undergoing photosynthesis. In addition, standard CMOS sensors that are highly sensitive in this range are available (e.g., 65-75% in the case of the Sony IMX365 sensor). Similarly, laser light sources for integrated laser-based pattern projectors are readily and inexpensively available at these wavelengths at 100+ mW continuous-wave output power (thanks to the common InGaAlP solid-state laser diode chemistry). Finally, low-cost molded plastic diffractive optics with low zero-order intensity and a wide field of view can be easily manufactured at these design wavelengths.

The receive band of the band-pass filter medium on each stereo CMOS image sensor window should match the emission of the pattern projector light source as closely as possible. For example, where the laser source is centered at 660 nm with a 2 nm full width at half maximum (FWHM), the band-pass filter would attenuate light below 656 nm and above 664 nm. This is not the only selectable wavelength within the band, but rather a common example using existing components. Alternatively, rather than matching the full width at half maximum, the filter may be matched to the width at 25% of the maximum of the light source's emission spectrum. For example, for the described 660 nm CWL, 2 nm FWHM emitter, the corresponding width at quarter maximum would be about 5 nm, and the matched filter would attenuate light below 650 nm and above 670 nm.
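The arithmetic of matching a filter passband to an emitter can be captured in a small helper. This sketch simply generalizes the two worked examples above; the factor-of-two scaling is an assumption inferred from those numbers, not a rule stated in this disclosure:

```python
def matched_passband(cwl_nm, width_nm, factor=2.0):
    """Passband for a filter matched to a laser emitter.

    Mirrors the worked examples above: a 660 nm center wavelength with
    a 2 nm FWHM and factor=2 yields a 656-664 nm passband; a 5 nm
    quarter-maximum width yields 650-670 nm.
    """
    half_width = factor * width_nm
    return cwl_nm - half_width, cwl_nm + half_width

print(matched_passband(660, 2))   # (656.0, 664.0), the FWHM case
print(matched_passband(660, 5))   # (650.0, 670.0), the quarter-maximum case
```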

As with the previous embodiment, almost all ambient light is rejected from the scene, and only the high-contrast projected pattern is used to perform stereo matching. In this embodiment, however, the components used to construct the system are easier to manufacture and much less costly. 85-90% of light within this band is still absorbed by healthy vegetation, but the combination of higher CMOS sensitivity, stronger available light sources, more efficient diffractive optics, and the greater (though still small) amount of available ambient light means that it is substantially easier than in the visible violet embodiment to construct a system that completes exposure and depth measurement in real time. That said, there is significantly more ambient light contamination within this band (see FIG. 7), so the accuracy and performance advantages of such isolation may not be as significant as they would be when operating in visible violet light.

There is an exception to this limitation that is particularly relevant to applications of this depth sensor embodiment. Many fruits and vegetables that grow or ripen through biosynthetic production of various carotenoids are highly reflective within this band. For example, both mature and immature tomatoes reflect approximately 95% of the available light at 630-670 nm. The same holds for other fruits and vegetables, such as sweet peppers (bell peppers), which also derive their pigmentation from the biosynthetic production of beta-carotene (orange fruit), lutein (yellow fruit), or lycopene (red fruit) as they grow and ripen. Thus, operating within a band in the visible red range will provide excellent pattern contrast, particularly on the surface of the fruit itself, resulting in higher-quality and more accurate depth measurements in those areas regardless of fruit maturity. This capability is a great advantage for robotic systems that physically manipulate fruit growing on a plant using position data estimated from depth measurements.

Example #3: 520-560 nm operating band (green)

In another contemplated embodiment, a visible green wavelength is used for the pattern projector and the CMOS band-pass filter medium. The 520-560 nm range is of particular interest for reasons relating specifically to equipment for agricultural use. While there is significantly more ambient light at these wavelengths under natural lighting (as discussed previously), many newer indoor agricultural facilities use artificial lighting based almost exclusively on LEDs that emit visible blue and red light for improved energy efficiency, because these are the wavelengths primarily absorbed and utilized during photosynthesis.

In this embodiment, a band-pass filter medium matched as closely as possible to the projector wavelength is placed on each CMOS imaging sensor (as in the other embodiments). Standard CMOS sensors are typically very sensitive to green light. Solid-state 520 nm GaN laser diodes and 532 nm diode-pumped solid-state laser modules are generally available at high output power, and both are appropriate choices for the center wavelength of the projector light source.

In greenhouse operations that use artificial red-blue LED lighting, both the violet and red operating wavelengths may be negatively affected by the ambient lighting, and the visible green embodiment becomes preferred. As shown in FIG. 8, selecting an operating band centered at 532 nm provides excellent noise immunity in this type of lighting environment.

In addition, this wavelength still exhibits moderate reflectance on leafy plants (about 20%) and on immature fruits (about 37% in the case of tomatoes, as shown in the figures above). The higher reflectivity means that, at any given projector brightness, the pattern perceived by the CMOS sensor will have higher brightness and contrast, improving the quality of the depth measurements, reducing the exposure duration, and significantly reducing the cost of the constituent hardware.

Enhancing illumination in the operating wavelength range

In addition to selecting light of an appropriate wavelength for capturing a three-dimensional image of a scene, other lighting parameters may be adjusted or provided to enhance image quality.

Another useful physical property that helps suppress environmental interference is polarization. As with the wavelength-based selective filter design, the contrast of the projected depth pattern against environmental interference can be further enhanced if the pattern projection source is polarized at a known angle and the imaging system is also filtered to accept light polarized at the projector's polarization angle. This may make operation in visible green light easier and help overcome the greater amount of ambient light present in naturally lit scenes.
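As a rough, back-of-the-envelope estimate (not a figure from this disclosure): an ideal linear polarizer transmits half of unpolarized ambient light while passing an aligned, fully polarized projector beam almost entirely, per Malus's law,

```latex
I_{\mathrm{ambient}} = \tfrac{1}{2} I_0, \qquad
I_{\mathrm{projector}} = I_0 \cos^2\theta \approx I_0 \quad (\theta \approx 0),
```

so matched polarization filtering can roughly double the pattern-to-ambient contrast ratio before any wavelength filtering is applied.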

Stereo matching benefits from any additional and unique detail present in the image exposure. In any of the above embodiments, it may be desirable to artificially provide "bulk" illumination at the design wavelength of the system using readily available high-power LED modules (high-brightness single-diode modules that emit almost exclusively within narrow wavelength bands are available from manufacturers such as Lumileds). In this way, diffuse and possibly polarized illumination within the narrow wavelength band used by the stereo sensor can be provided, allowing the natural texture of surfaces in the scene (in addition to the projected light pattern) to resolve in the CMOS sensor exposure. In many cases, this combined illumination will be beneficial and will improve the completeness and accuracy of the depth measurements. Furthermore, the added light on the scene will further reduce the exposure duration required for imaging, helping the entire stereo system operate at higher real-time frame rates. Since all light added via "bulk" diffuse illumination falls within the pass band of the CMOS filter medium, no energy is wasted producing light that would be filtered out by the system, an important consideration for battery-powered mobile robotic systems.

Example System

FIG. 9 schematically illustrates an example of a system 100 for three-dimensional imaging of a plant 200 in an indoor agricultural environment 210. The system includes a spatially separated stereo camera pair 105A, 105B, each camera 105A, 105B including an image sensor 110A, 110B. The cameras 105A, 105B may be physically separate or may be in the same body or package. The cameras 105A, 105B may be active stereo cameras with a projector or, in other embodiments, passive stereo cameras without a projector. Examples of commercially available cameras that may be used for cameras 105A, 105B include the Intel RealSense™ D415 or D435 cameras or the Stereolabs ZED Mini camera. When imaging an object at a distance of 24 inches, the cameras 105A, 105B may exhibit a root mean square error (RMSE) of depth accuracy of about 3 mm or less, 2.5 mm or less, or 2 mm or less. When imaging an object at a distance of 24 inches, the cameras 105A, 105B may also exhibit a density of depth pixels, defined in pixels/mm², with a margin of error per unit imaging area within 2 mm.

Filters 115A, 115B are placed in front of the image sensors of the cameras 105A, 105B or, in some embodiments, in front of the entire cameras, for example in front of the objective lenses of the cameras 105A, 105B. The filters 115A, 115B are configured to block light at wavelengths having a higher than average ambient intensity in the indoor agricultural environment and/or to selectively pass light having a particular polarization. The system 100 also includes a projector 120 configured to project a pattern of non-uniform, and in some embodiments polarized, light onto a scene including the plant. The projector 120 may comprise or take the form of a laser emitter configured to emit laser light at a frequency that is not substantially attenuated by the filters 115A, 115B, and a diffractive optical element 125 in the optical path of the laser light, the diffractive optical element 125 being configured to produce the non-uniform pattern as a non-uniform dot pattern. The diffractive optical element 125 may include or be supplemented by a polarizer configured to polarize light projected from the projector 120 at a selected polarization angle, with the filters 115A, 115B configured to selectively pass light of that polarization angle. Thus, the element 125 shown in FIG. 9 may represent a diffractive optical element and/or a polarizer. In some embodiments, the cameras 105A, 105B may be active stereo cameras that include the projector 120 within the same package or body as the other features of the cameras 105A, 105B.

The system further includes a processor 130 configured to generate a depth map of the scene from images captured by the stereo camera pair. The processor 130 is operatively connected to a memory 135, such as a disk drive or solid-state memory, for storing programming instructions or recorded images.
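As an off-the-shelf stand-in for this step (the disclosure does not specify the matching algorithm, and many commercial cameras perform it in an internal ASIC), a depth map could be generated from a rectified image pair with OpenCV's semi-global block matcher. The file names, matcher parameters, and intrinsics below are assumptions:

```python
import cv2
import numpy as np

# Hypothetical rectified stereo pair; file names are placeholders.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities must be a multiple of 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=7)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

fx, baseline_m = 900.0, 0.055       # assumed focal length (px) and baseline (m)
depth_m = np.zeros_like(disparity)  # 0 marks invalid or low-confidence pixels
mask = disparity > 0
depth_m[mask] = fx * baseline_m / disparity[mask]  # triangulate Z = f*B/d
```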

The system 100 may also include a spectrum analyzer 140 configured to characterize the ambient light power spectrum in the indoor agricultural environment and provide an indication of a bandwidth of light having a lower than average ambient intensity in the indoor agricultural environment. The spectrum analyzer 140 may be in communication with the processor 130 and/or the memory 135 to receive instructions or output results.
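A minimal sketch of how such an indication might be computed from a sampled ambient spectrum follows; the uniform wavelength sampling, the sliding-window search, and the 30 nm default are assumptions, as the disclosure does not specify the analyzer's method:

```python
import numpy as np

def lowest_ambient_band(wavelengths_nm, power, band_width_nm=30):
    """Pick the operating band with the least ambient interference.

    wavelengths_nm, power: an ambient spectrum as might be sampled by a
    device such as spectrum analyzer 140. Returns the (start, end)
    wavelengths of the band_width_nm-wide window whose mean ambient
    power is lowest.
    """
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=np.float64)
    power = np.asarray(power, dtype=np.float64)
    step = wavelengths_nm[1] - wavelengths_nm[0]      # assumes uniform sampling
    win = max(1, int(round(band_width_nm / step)))
    means = np.convolve(power, np.ones(win) / win, mode="valid")  # sliding mean
    i = int(np.argmin(means))                         # lowest-interference window
    return wavelengths_nm[i], wavelengths_nm[i + win - 1]
```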

A second projector 145 may be included in the system 100 and configured to illuminate the scene with unpatterned light sources in frequency bands selectively passed by the filters 115A, 115B.

The system 100 may be included in a robotic harvester 300, schematically illustrated in FIG. 10. The robotic harvester may include its own processor 330 in communication with the processor 130 of the system 100, or may be in communication with and operated by the processor 130. The harvester 300 may be configured to harvest produce 205 from locations identified by the system 100 using, for example, a robotic arm 305. The robotic harvester 300 may include wheels 310, tracks, or other motive means to move throughout the environment, harvesting produce 205 from different plants 200 in the environment.

Having thus described several aspects of at least one embodiment, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure and are intended to be within the spirit and scope of the disclosure. The acts of the methods disclosed herein may be performed in an order other than that described, and one or more acts may be omitted, substituted, or added. One or more features of any example disclosed herein may be combined with or substituted for one or more features of any other example disclosed. Accordingly, the foregoing description and drawings are by way of example only.

The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. As used herein, the term "plurality" refers to two or more items or components. As used herein, dimensions described as "substantially similar" should be considered to be within about 25% of each other. The terms "comprising," "including," "carrying," "having," "containing," and "involving," whether in the written description or the claims and the like, are open-ended terms that mean "including but not limited to." Thus, use of such terms is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items. Only the transitional phrases "consisting of" and "consisting essentially of" are closed or semi-closed transitional phrases, respectively, with respect to the claims. Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another, or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
