Method and system for forming extended focal plane for large viewpoint changes

Document No.: 835741  Publication date: 2021-03-30

Note: This technology, "Method and system for forming extended focal plane for large viewpoint changes", was created by Seppo T. Valli and Pekka K. Siltanen on 2019-06-28. Abstract: Methods and systems for capturing and displaying content for a multi-focal plane (MFP) display are described. In one example, a large aperture camera is used to capture multiple texture images of a scene, each texture image having a different focal length. For each texture image, a focal plane image is generated. To generate the focal plane image, in some embodiments, each pixel in each texture image is multiplied by a respective weight value. The weight values may be based on the measured depth of the respective pixels (e.g., as determined using the captured depth map) and/or based on the focus (or defocus) level of the respective pixels as determined by filtering. The focal plane images may be displayed on a multi-focal plane display and may be used to generate a virtual viewpoint.

1. A method, comprising:

obtaining a plurality of texture images of a scene, each texture image having a different respective focal length; and

for each texture image, generating a focal plane image by: (i) determining a corresponding focus weight for each of a plurality of pixels of the texture image, wherein the focus weight represents an amount by which the pixel is in focus, and (ii) multiplying a pixel value of each of the plurality of pixels by the corresponding focus weight.

2. The method of claim 1, further comprising displaying the focal plane images in a multi-focal plane display at the respective focal lengths of the focal plane images.

3. The method of claim 2, wherein the focal plane images are displayed substantially simultaneously.

4. The method of any of claims 1-3, wherein the amount by which a pixel in a texture image is in focus is determined based at least in part on a depth value corresponding to the pixel.

5. The method of any of claims 1-4, further comprising obtaining a corresponding depth map for each texture image, wherein the focus weight for the pixel in the texture image is determined based on the depth map.

6. The method of claim 5, wherein the focus weight for each pixel in a texture image is determined based at least in part on a difference between the focal distance of the texture image including the pixel and a depth value of the pixel in the corresponding depth map.

7. The method of claim 5 or 6, wherein the depth map of each texture image is captured at the focal distance of the corresponding texture image.

8. The method of any one of claims 5-7, wherein:

obtaining a plurality of texture images includes: capturing each of the plurality of texture images at the respective focal distance; and

obtaining the corresponding depth map comprises: capturing each depth map of the scene focused at the respective focal distance.

9. The method of any of claims 1-8, wherein the focus weight w_i(x, y) for a pixel in a texture image i is determined according to the depth z_i(x, y) of the pixel, such that w_i(x, y) = w_i[z_i(x, y)].

10. The method of claim 9, wherein w_i[z_i(x, y)] has a maximum value when z_i(x, y) is substantially equal to the focal length of the texture image i.

11. The method of any of claims 1-3, wherein the amount by which pixels in the texture image are in focus is determined based at least in part on a defocus map generated from the texture image.

12. The method of any of claims 1-11, further comprising generating a virtual viewpoint by shifting at least one of the focal plane images by an amount inversely proportional to the display focal length of the respective focal plane image.

13. The method of claim 12, further comprising displaying the generated virtual viewpoint as one of a stereoscopic pair of viewpoints.

14. The method of claim 12, further comprising displaying the generated virtual viewpoint to simulate motion parallax in response to viewer head motion.

15. A system comprising a processor and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to perform a method comprising:

obtaining a plurality of texture images of a scene, each texture image having a different respective focal length; and

for each texture image, generating a focal plane image by: (i) determining a corresponding focus weight for each of a plurality of pixels of the texture image, wherein the focus weight represents an amount by which the pixel is in focus, and (ii) multiplying a pixel value of each of the plurality of pixels by the corresponding focus weight.

Background

Forming and using multiple focal planes (MFPs) is one method for avoiding the vergence-accommodation conflict, enabling a viewer to focus naturally on image information along the depth dimension. This approach may be particularly useful in near-eye (glasses) displays.

The MFP display creates a stack of discrete focal planes, thereby composing a 3D scene from layers along the viewer's visual axis. Views of a 3D scene are formed by projecting user-visible pixels (or voxels) at different depths and spatial angles.

Each focal plane displays the portion of the depth range of the 3-D view that corresponds to the respective focal plane. Depth blending is a method for smoothing the quantization steps and contouring that appear when viewing a view compiled from discrete focal planes, so that the user is less likely to perceive the steps. Depth blending is described in more detail in K. Akeley et al., "A Stereo Display Prototype with Multiple Focal Distances", ACM Transactions on Graphics (TOG), v.23 n.3, August 2004, pp. 804-813, and Hu, X., & Hua, H. (2014), "Design and evaluation of a depth-fused multi-focal-plane display prototype", IEEE/OSA Journal of Display Technology, 10(4), 308-316.

When depth-blending is used, it has been found that rendering a relatively small number of focal planes (e.g., 4-6 planes) is sufficient for acceptable quality. This number of focal planes is technically feasible as well.

Multi-focal-plane displays may be implemented by stacking spatially multiplexed 2-D displays or by sequentially switching the focal length of a single 2-D display in a time-multiplexed manner. The change in focal length of a single 2-D display may be achieved by a high-speed birefringent lens (or other variable-focus element) while spatially rendering the visible portions of the corresponding multi-focal image frames. Without depth blending, it is desirable to use a greater number of focal planes, for example 14 or more, as described in J. P. Rolland et al., "Multifocal planes head-mounted displays", Appl. Opt. 39, 3209-3215 (2000).

The human visual system (HVS) favors placing the focal planes at regular distances on a dioptric scale. On the other hand, depth information is usually most easily captured using linear scales. Both options can be used for MFP displays. An example of an MFP near-eye display is shown schematically in Fig. 2. Fig. 2 shows the display viewed by the left eye 202 and the right eye 204 of the user. A respective eyepiece 206, 208 is provided for each eye. The eyepieces focus the images formed by the respective image stacks 210, 212. The image stacks form different images at different distances from the eyepieces. To the user's eyes, the images appear to originate from different virtual image planes, such as image planes 214, 216, 218.

The MFP display creates an approximation of the light field of the displayed scene. Because a near-eye display moves with the user's head, it is sufficient to support only one viewpoint at a time. Accordingly, approximating the light field is easier because it is not necessary to capture the light field for a large number of viewpoints.

Disclosure of Invention

The present disclosure describes methods and systems for capturing and displaying content for a multi-focal plane (MFP) display. In some embodiments, content is generated from a focal stack (images captured at varying focal distances). Some embodiments may produce fewer de-occlusions and holes when the MFPs are shifted for large synthetic parallax or viewpoint changes.

In some embodiments, the focused image is captured with a large aperture, such that some image information is obtained from behind the occluding object.

Some embodiments also perform large aperture depth sensing, which may be achieved by large aperture depth sensors, by applying defocus maps, or by using suitable filtering and redistribution schemes on the focal stack and/or the focal planes formed from it. In some embodiments, filtering is applied to the focal stack images prior to forming the redistributed focal planes. In some embodiments, the filtering is applied after the focal planes are formed. The filtered results are then used to form redistributed focal planes (or, more generally, high-frequency and/or redistributed focal planes).

One example operates as follows. A plurality of texture images p_i of a scene is obtained, where each texture image has a different respective focal distance d_i. The texture image may be, for example, an RGB image or a grayscale image, among other options. For each texture image p_i, a focal plane image q_i is generated. To generate the focal plane image q_i, the texture image p_i is weighted by weights w_i(x, y): each pixel value p_i(x, y) of the texture image p_i is multiplied by the respective weight w_i(x, y) to generate the focal plane image q_i, such that q_i(x, y) = p_i(x, y) · w_i(x, y).

The weight w_i(x, y) may express the amount by which pixel (x, y) in the texture image p_i is in focus. Different techniques may be used to determine this amount. In some such techniques, the depth z_i(x, y) of the pixel (x, y) is measured or otherwise determined, and the weight w_i(x, y) is a function of depth such that w_i(x, y) = w_i[z_i(x, y)]. The function w_i[z] may be a blending function as used in known multi-focal displays. In some embodiments, the function has its maximum value (e.g., a value of 1) at w_i[d_i], reflecting that a pixel is most likely to be in focus when its measured depth equals the focal distance. The function w_i[z] may decrease monotonically as z increases or decreases away from the focal distance d_i, giving lower weight to pixel depths that are farther from the focal distance and less likely to be in focus. Pixels with depth values sufficiently offset from the focal plane may be given zero weight (even if a certain level of focus may still be discernible).
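As an illustration only (the specific blending function, its width parameter span, and the array layout are assumptions, not part of the disclosure), a minimal Python sketch of this depth-weighted focal plane generation could look as follows:

```python
import numpy as np

def tent_weight(depth_map, focal_dist, span):
    # Linear ("tent") blending function: weight 1.0 where the measured depth
    # equals the focal distance, falling to 0.0 at +/- span away from it.
    return np.clip(1.0 - np.abs(depth_map - focal_dist) / span, 0.0, 1.0)

def focal_plane_image(texture, depth_map, focal_dist, span):
    # q_i(x, y) = p_i(x, y) * w_i[z_i(x, y)] for one texture image.
    # texture: (H, W, 3) RGB array, depth_map: (H, W) array of depths.
    w = tent_weight(depth_map, focal_dist, span)
    return texture * w[..., np.newaxis]
```

Here span is a hypothetical parameter setting how far from the focal distance a pixel still contributes; beyond it, the pixel receives zero weight, as described above.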

In some embodiments, the amount by which pixel (x, y) in the texture image p_i is in focus is determined by generating a defocus map, in which each pixel of the texture image p_i is assigned a focus level (or defocus level). For example, the most focused pixels may be given a weight of one, and less focused pixels may be given weights as low as zero.
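A rough sketch of one such filtering-based estimate is given below (the Laplacian focus measure and the smoothing parameter are assumptions chosen for illustration; the disclosure does not prescribe a specific filter):

```python
import numpy as np
from scipy.ndimage import laplace, gaussian_filter

def defocus_weights(texture_gray, smooth_sigma=3.0, eps=1e-6):
    # Use smoothed Laplacian magnitude as a per-pixel sharpness (focus) measure,
    # then normalize to [0, 1] so the most focused pixels get weights near one.
    sharpness = gaussian_filter(np.abs(laplace(texture_gray)), smooth_sigma)
    return (sharpness - sharpness.min()) / (sharpness.max() - sharpness.min() + eps)
```

Weights computed this way for each image of the focal stack could additionally be renormalized so that, at every pixel, the weights across the stack sum to one before the focal plane images are formed.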

N focal plane images q_0, …, q_i, …, q_(N-1) can be generated using the techniques described herein and can be displayed on a multi-focal-plane display. Depending on the type of display, the focal plane images may be displayed simultaneously or in a rapidly repeating sequence using time multiplexing.

In some embodiments, the number of texture images p_i used may be greater than the number of available (or desired) display planes in the multi-focal-plane display. In this case, a method may include selecting one focal plane image for each display plane. For each display plane, the texture image having a focal length that is the same as or closest to the focal length of the display plane may be selected.
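A minimal sketch of one way such a selection could be made (the helper below is hypothetical and not taken from the original text):

```python
def select_planes(capture_focal_dists, display_focal_dists):
    # For each display plane, pick the index of the captured image whose
    # focal distance is the same as, or closest to, the display plane's.
    return [min(range(len(capture_focal_dists)),
                key=lambda i: abs(capture_focal_dists[i] - d))
            for d in display_focal_dists]

# Example: captures at 0.5, 1, 2, 4 and 8 m; display planes at 1, 3 and 8 m.
# select_planes([0.5, 1, 2, 4, 8], [1, 3, 8]) returns [1, 2, 4].
```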

In some embodiments, a virtual viewpoint is generated by laterally shifting at least a first one of the focal plane images relative to at least a second one of the focal plane images. For example, the focal plane images may be laterally shifted by an amount inversely proportional to the display focal distance of the respective focal plane image (i.e., the focal distance of the display plane on which the focal plane image is shown). The virtual viewpoint may be used as one or both viewpoints of a stereoscopic pair. Virtual viewpoints may also be generated in response to viewer head motion to simulate motion parallax.
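The sketch below illustrates this lateral shifting (the proportionality constant, the integer rounding, and the wrap-around boundary handling are simplifying assumptions for illustration):

```python
import numpy as np

def shift_focal_planes(planes, focal_dists, baseline):
    # Shift each focal plane horizontally by baseline / focal_dist pixels,
    # so closer planes move more, approximating a laterally displaced viewpoint.
    shifted = []
    for plane, d in zip(planes, focal_dists):
        dx = int(round(baseline / d))
        # np.roll wraps at the image borders; padding would be used in practice.
        shifted.append(np.roll(plane, dx, axis=1))
    return shifted
```

Compositing the shifted planes yields the virtual viewpoint; running the sketch with +baseline and -baseline gives the two views of a stereoscopic pair.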

In some embodiments, each texture image p_i and its corresponding depth map are captured substantially simultaneously. Each texture image and corresponding depth map may be captured using the same or similar optics. Each texture image and the corresponding depth map may be captured using optics having the same aperture.

Drawings

FIG. 1A is a system diagram illustrating an example communication system in which one or more disclosed embodiments may be implemented.

Fig. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used in the communication system shown in fig. 1A, according to an embodiment.

FIG. 2 is a schematic diagram of a multi-focal near-eye display that may be used in some embodiments.

Fig. 3A-3D are schematic representations of different texture images of the same scene. Fig. 3A is a schematic representation of an extended depth image of a scene. Fig. 3B-3D are schematic representations of images of a scene captured with a larger aperture camera focused at different distances.

Fig. 4A-4D are schematic diagrams illustrating depth maps captured at different focal lengths. The schematic depth maps shown in fig. 4A-4D correspond to the respective texture images shown in fig. 3A-3D.

Fig. 5A-5C are schematic diagrams of focus weight maps generated from the depth maps of fig. 4B-4D, respectively.

Fig. 6A schematically shows the generation of a focal plane image from the texture image of fig. 3B and the focus weight map of fig. 5A.

Fig. 6B schematically shows the generation of a focal plane image from the texture image of fig. 3C and the focus weight map of fig. 5B.

Fig. 6C schematically shows the generation of a focal plane image from the texture image of fig. 3D and the focus weight map of fig. 5C.

Fig. 7 schematically illustrates the display of the focal plane images of fig. 6A-6C to a user.

FIG. 8 is a flow diagram that illustrates a method of generating and displaying a focal plane image in some embodiments.

FIG. 9 is a schematic of different focal lengths encountered in some embodiments.

FIG. 10 is a flow diagram that illustrates a method of generating and displaying a focal plane image in some embodiments.

FIG. 11 is a flow diagram that illustrates a method of generating and displaying a focal plane image in some embodiments.

FIG. 12 is a flow diagram that illustrates a method of generating and displaying a focal plane image in some embodiments.

FIGS. 13A-13C illustrate examples of the focus weight w_i(x, y) as a function of the pixel depth z_i(x, y) for different focal planes in some embodiments.

FIGS. 14A-14C show examples of the focus weight w_i(x, y) as a function of the pixel depth z_i(x, y) for different focal planes in additional embodiments.

Example network for implementation of embodiments

Fig. 1A is a diagram illustrating an example communication system 100 in which one or more disclosed embodiments may be implemented. The communication system 100 may be a multiple-access system that provides content, such as voice, data, video, messaging, broadcast, etc., to a plurality of wireless users. The communication system 100 may enable multiple wireless users to access such content by sharing system resources, including wireless bandwidth. For example, communication system 100 may use one or more channel access methods such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), orthogonal FDMA (ofdma), single carrier FDMA (SC-FDMA), zero-tailed unique word DFT-spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block filtered OFDM, and filter bank multi-carrier (FBMC), among others.

As shown in fig. 1A, the communication system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, RANs 104/113, CNs 106/115, Public Switched Telephone Networks (PSTNs) 108, the internet 110, and other networks 112, although it should be appreciated that any number of WTRUs, base stations, networks, and/or network elements are contemplated by the disclosed embodiments. Each WTRU 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment. For example, any of the WTRUs 102a, 102b, 102c, 102d may be referred to as a "station" and/or a "STA," which may be configured to transmit and/or receive wireless signals, and may include a User Equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a Personal Digital Assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an internet of things (IoT) device, a watch or other wearable device, a head-mounted display (HMD), a vehicle, a drone, medical devices and applications (e.g., tele-surgery), industrial devices and applications (e.g., robots and/or other wireless devices operating in industrial and/or automated processing chain environments), consumer electronics devices, devices operating on commercial and/or industrial wireless networks, and the like. Any of the WTRUs 102a, 102b, 102c, 102d may be interchangeably referred to as a UE.

Communication system 100 may also include base station 114a and/or base station 114b. Each base station 114a, 114b may be any type of device configured to facilitate access to one or more communication networks (e.g., CN 106/115, the internet 110, and/or other networks 112) by wirelessly interfacing with at least one of the WTRUs 102a, 102b, 102c, 102d. For example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node B, an eNode B, a Home Node B, a Home eNode B, a gNB, an NR Node B, a site controller, an access point (AP), a wireless router, among others. Although each base station 114a, 114b is depicted as a single element, it should be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.

The base station 114a may be part of the RAN 104/113, and the RAN may also include other base stations and/or network elements (not shown), such as Base Station Controllers (BSCs), Radio Network Controllers (RNCs), relay nodes, and so forth. Base station 114a and/or base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, known as cells (not shown). These frequencies may be in the licensed spectrum, the unlicensed spectrum, or a combination of licensed and unlicensed spectrum. A cell may provide wireless service coverage for a particular geographic area that is relatively fixed or may vary over time. The cell may be further divided into cell sectors. For example, the cell associated with base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one transceiver corresponding to one sector of a cell. In an embodiment, base station 114a may use multiple-input multiple-output (MIMO) technology and may use multiple transceivers for each sector of a cell. For example, using beamforming, signals may be transmitted and/or received in desired spatial directions.

The base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., Radio Frequency (RF), microwave, centimeter-wave, millimeter-wave, Infrared (IR), Ultraviolet (UV), visible, etc.). Air interface 116 may be established using any suitable Radio Access Technology (RAT).

More specifically, as described above, communication system 100 may be a multiple-access system and may use one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, and SC-FDMA, among others. For example, the base station 114a and the WTRUs 102a, 102b, 102c in the RAN 104/113 may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) terrestrial radio access (UTRA), which may establish the air interface 115/116/117 using wideband cdma (wcdma). WCDMA may include communication protocols such as High Speed Packet Access (HSPA) and/or evolved HSPA (HSPA +). HSPA may include high speed Downlink (DL) packet access (HSDPA) and/or High Speed UL Packet Access (HSUPA).

In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as evolved UMTS terrestrial radio access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-advanced (LTE-a) and/or LTE-Pro (LTE-a Pro).

In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology, such as NR radio access, that may use a New Radio (NR) to establish the air interface 116.

In an embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies. For example, the base station 114a and the WTRUs 102a, 102b, 102c may collectively implement LTE radio access and NR radio access (e.g., using Dual Connectivity (DC) principles). As such, the air interface used by the WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., enbs and gnbs).

In other embodiments, the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), and GSM EDGE (GERAN), among others.

The base station 114B in fig. 1A may be, for example, a wireless router, a home nodeb, a home enodeb, or an access point, and may use any suitable RAT to facilitate wireless connectivity in local areas such as business establishments, homes, vehicles, campuses, industrial facilities, air corridors (e.g., for use by drones), roads, and so forth. In one embodiment, the base station 114b and the WTRUs 102c, 102d may establish a Wireless Local Area Network (WLAN) by implementing a radio technology such as IEEE 802.11. In an embodiment, the base station 114b and the WTRUs 102c, 102d may establish a Wireless Personal Area Network (WPAN) by implementing a radio technology such as IEEE 802.15. In yet another embodiment, the base station 114b and the WTRUs 102c, 102d may establish the pico cell or the femto cell by using a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE-A, LTE-a Pro, NR, etc.). As shown in fig. 1A, the base station 114b may be directly connected to the internet 110. Thus, base station 114b need not access the internet 110 via CN 106/115.

The RAN 104/113 may communicate with a CN 106/115, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more WTRUs 102a, 102b, 102c, 102 d. The data may have different quality of service (QoS) requirements, such as different throughput requirements, latency requirements, fault tolerance requirements, reliability requirements, data throughput requirements, and mobility requirements, among others. CN 106/115 may provide call control, billing services, mobile location-based services, pre-paid calling, internet connectivity, video distribution, etc., and/or may perform advanced security functions such as user authentication. Although not shown in fig. 1A, it should be appreciated that the RAN 104/113 and/or CN 106/115 may communicate directly or indirectly with other RANs that employ the same RAT as the RAN 104/113 or a different RAT. For example, in addition to being connected to the RAN 104/113, which may use NR radio technology, the CN 106/115 may communicate with another RAN (not shown) that uses GSM, UMTS, CDMA2000, WiMAX, E-UTRA, or WiFi radio technology.

The CN 106/115 may also act as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the internet 110, and/or other networks 112. The PSTN 108 may include a circuit-switched telephone network that provides Plain Old Telephone Service (POTS). The internet 110 may include a system of globally interconnected computer network devices that utilize common communication protocols, such as transmission control protocol/internet protocol (TCP), User Datagram Protocol (UDP), and/or IP in the TCP/IP internet protocol suite. The network 112 may include wired and/or wireless communication networks owned and/or operated by other service providers. For example, the network 112 may include another CN connected to one or more RANs, which may use the same RAT as the RAN 104/113 or a different RAT.

Some or all of the WTRUs 102a, 102b, 102c, 102d in the communication system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers that communicate with different wireless networks over different wireless links). For example, the WTRU 102c shown in fig. 1A may be configured to communicate with a base station 114a using a cellular-based radio technology and with a base station 114b, which may use an IEEE 802 radio technology.

Figure 1B is a system diagram illustrating an example WTRU 102. As shown in fig. 1B, the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touch pad 128, non-removable memory 130, removable memory 132, a power supply 134, a Global Positioning System (GPS) chipset 136, and/or other peripherals 138, among others. It should be appreciated that the WTRU 102 may include any subcombination of the foregoing elements while maintaining consistent embodiments.

The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a Digital Signal Processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of Integrated Circuit (IC), a state machine, or the like. The processor 118 may perform signal decoding, data processing, power control, input/output processing, and/or any other functions that enable the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to a transceiver 120 and the transceiver 120 may be coupled to a transmit/receive element 122. Although fig. 1B depicts processor 118 and transceiver 120 as separate components, it should be understood that processor 118 and transceiver 120 may be integrated together in an electronic package or chip.

The transmit/receive element 122 may be configured to transmit or receive signals to or from a base station (e.g., base station 114a) via the air interface 116. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. As an example, in an embodiment, the transmit/receive element 122 may be a radiator/detector configured to transmit and/or receive IR, UV or visible light signals. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and/or receive RF and optical signals. It should be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.

Although transmit/receive element 122 is depicted in fig. 1B as a single element, WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may use MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) that transmit and receive wireless signals over the air interface 116.

Transceiver 120 may be configured to modulate signals to be transmitted by transmit/receive element 122 and to demodulate signals received by transmit/receive element 122. As described above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers that allow the WTRU 102 to communicate via multiple RATs (e.g., NR and IEEE 802.11).

The processor 118 of the WTRU 102 may be coupled to and may receive user input data from a speaker/microphone 124, a keypad 126, and/or a display/touch pad 128, such as a Liquid Crystal Display (LCD) display unit or an Organic Light Emitting Diode (OLED) display unit. The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. Further, the processor 118 may access information from and store data in any suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include Random Access Memory (RAM), Read Only Memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a Subscriber Identity Module (SIM) card, a memory stick, a Secure Digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from and store data in memory that is not physically located in the WTRU 102, such memory may be located, for example, in a server or a home computer (not shown).

The processor 118 may receive power from the power source 134 and may be configured to distribute and/or control power for other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (Ni-Cd), nickel-zinc (Ni-Zn), nickel metal hydride (NiMH), lithium ion (Li-ion), etc.), solar cells, and fuel cells, among others.

The processor 118 may also be coupled to a GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) related to the current location of the WTRU 102. In addition to or in lieu of information from the GPS chipset 136, the WTRU 102 may receive location information from base stations (e.g., base stations 114a, 114b) via the air interface 116 and/or determine its location based on the timing of signals received from two or more nearby base stations. It should be appreciated that the WTRU 102 may acquire location information via any suitable location determination method while maintaining consistent embodiments.

The processor 118 may be further coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality, and/or wired or wireless connectivity. For example, the peripheral devices 138 may include accelerometers, electronic compasses, satellite transceivers, digital cameras (for photos and/or video), Universal Serial Bus (USB) ports, vibration devices, television transceivers, hands-free headsets, Bluetooth® modules, Frequency Modulation (FM) radio units, digital music players, media players, video game modules, internet browsers, virtual reality and/or augmented reality (VR/AR) devices, and activity trackers, among others. The peripheral device 138 may include one or more sensors, which may be one or more of the following: a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geographic position sensor, an altimeter, a light sensor, a touch sensor, a magnetometer, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.

The WTRU 102 may include a full duplex radio for which reception and transmission of some or all signals (e.g., associated with particular subframes for UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent or simultaneous, etc. The full-duplex radio may include an interference management unit that reduces and/or substantially eliminates self-interference via signal processing by hardware (e.g., a choke coil) or by a processor (e.g., a separate processor (not shown) or by the processor 118). In an embodiment, the WTRU 102 may include a half-duplex radio that transmits or receives some or all signals, such as associated with a particular subframe for UL (e.g., for transmission) or downlink (e.g., for reception).

Although the WTRU is depicted in fig. 1A-1B as a wireless terminal, it is contemplated that in some representative embodiments, such a terminal may use (e.g., temporarily or permanently) a wired communication interface with a communication network.

In a representative embodiment, the other network 112 may be a WLAN.

In view of fig. 1A-1B and the corresponding description with respect to fig. 1A-1B, one or more or all of the functions described herein may be performed by one or more emulation devices (not shown). The emulation device can be one or more devices configured to simulate one or more or all of the functions described herein. For example, the emulation device may be used to test other devices and/or simulate network and/or WTRU functions.

The simulated device may be designed to conduct one or more tests with respect to other devices in a laboratory environment and/or in a carrier network environment. For example, the one or more simulated devices may perform one or more or all functions while implemented and/or deployed, in whole or in part, as part of a wired and/or wireless communication network in order to test other devices within the communication network. The one or more emulation devices can perform one or more or all functions while temporarily implemented/deployed as part of a wired and/or wireless communication network. The simulation device may be directly coupled to another device to perform testing and/or may perform testing using over-the-air wireless communication.

The one or more emulation devices can perform one or more functions, including all functions, while not being implemented or deployed as part of a wired and/or wireless communication network. For example, the simulation device may be used in a test scenario of a test laboratory and/or a wired and/or wireless communication network that is not deployed (e.g., tested) in order to conduct testing with respect to one or more components. The one or more simulation devices may be test devices. The simulation device may transmit and/or receive data using direct RF coupling and/or wireless communication via RF circuitry (which may include one or more antennas, as examples).

Detailed Description

The effect of a large aperture is used in image capture.

Practical cameras using a limited aperture produce images with a certain depth of field (DoF). Depth of field can be described as the span of distance from the capture point within which the pixel is in focus. Outside the DoF, the pixels become defocused or blurred.

When the camera parameters are known, the DoF can be calculated or estimated using known formulas.
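For reference, a sketch of the standard thin-lens depth-of-field formulas is shown below (textbook optics given as background and an assumption of this description, not taken from the disclosure); f is the focal length, N the f-number, c the acceptable circle of confusion, and s the subject distance:

```python
def depth_of_field(f, N, c, s):
    # Near and far limits of acceptable focus for a thin lens.
    # f, c and s are in the same units (e.g., mm); N is the f-number.
    H = f * f / (N * c) + f                      # hyperfocal distance
    near = s * (H - f) / (H + s - 2 * f)
    far = s * (H - f) / (H - s) if s < H else float("inf")
    return near, far

# Example: f = 50 mm at f/1.8, c = 0.03 mm, subject at 2 m (2000 mm)
# gives a DoF of roughly 0.17 m; stopping down to f/16 widens it to about 1.7 m.
```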

At one extreme, an idealized pinhole camera is one with an infinitely small aperture. An ideal pinhole camera produces an image with an infinite DoF and all pixels are in focus regardless of their depth. In fact, under very well-lit conditions, a pinhole camera can be approximated by using a small aperture in a physical camera.

Under actual imaging conditions, an approximation of the pinhole image can be achieved by capturing and combining focal stack images, i.e., images captured at several focal distances. Various algorithms exist to combine these images into one extended-focus image, so that a discrete set of focus captures is used to form the extended-focus image.
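As one illustrative algorithm among the many that exist (the focus measure and parameters below are assumptions chosen for clarity), an extended-focus image can be composited by selecting, at each pixel, the focal stack image with the highest local sharpness:

```python
import numpy as np
from scipy.ndimage import laplace, gaussian_filter

def extended_focus(stack):
    # stack: list of N grayscale images captured at different focal distances.
    # Returns an all-in-focus composite by per-pixel sharpness selection.
    sharpness = np.stack([gaussian_filter(np.abs(laplace(img)), 2.0)
                          for img in stack])      # (N, H, W) focus measure
    best = np.argmax(sharpness, axis=0)           # index of sharpest capture
    return np.take_along_axis(np.stack(stack), best[np.newaxis], axis=0)[0]
```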

Fig. 3A is a schematic representation of a pinhole image 302 of a scene including a nearby radio, a window frame at an intermediate distance, and buildings at a greater distance. Due to the small aperture of the pinhole camera, objects at all distances are in focus.

Fig. 3B-3D are schematic representations of images of the same scene captured with a larger aperture camera focused at different distances. In some embodiments, the images of fig. 3B-3D are captured simultaneously or substantially simultaneously (or, in the case of virtual images, generated to represent substantially the same moment in time).

Fig. 3B is a schematic representation of an image 304 focused at a close distance f_1. The radio itself is in focus (shown in solid lines) while the more distant window frame and buildings are at least partially unfocused (shown in dashed lines).

Fig. 3C is a schematic representation of an image 306 focused at an intermediate distance f_2. The window frame is in focus, but the nearby radio and more distant buildings are at least partially unfocused. As also schematically shown in Fig. 3C, given a non-zero aperture size, the in-focus window frame is "visible through" the antenna portion of the unfocused radio. Although this effect is most pronounced for narrow objects (e.g., the radio antenna), it is also noticeable around the edges of larger real-world objects. However, to maintain the clarity and reproducibility of the present figures, the effect is shown here only for narrow objects.

Fig. 3D is a schematic representation of an image 308 focused at a greater distance f_3. The nearby radio and the somewhat more distant window frame are not in focus, but the buildings visible through the window are in focus. Given the aperture size, the narrow muntin bars (the horizontal and vertical strips) of the window frame appear at least partially transparent, as does the radio antenna. As a result, the captured image contains a larger proportion of information about the buildings than the pinhole image of Fig. 3A.

Figs. 3A-3D illustrate the effect of occlusion in different focus captures. They show that the occlusions caused by close objects do not remain the same when the focal distance is changed. For example, in the image of Fig. 3D, focused at a far distance, occluders of the buildings, such as the muntin bars and the radio antenna, appear eroded and filtered away, revealing more background detail than in the images of Figs. 3B and 3C, which are focused at closer distances.

Images such as those represented in Figs. 3A-3D, which contain color and luminance information (e.g., luminance and chrominance information or RGB information) but no depth information, are referred to herein as texture images. In some cases, where the image data includes both texture and depth information (e.g., an RGBD image), the term texture image may be used to refer to the portion of the image data that contains the luminance and/or color information.

A depth measurement technique.

Some types of depth sensors use conventional camera optics and accordingly produce a depth map whose focal properties resemble those of a photograph. Generally, for depth sensing, a small aperture is advantageous in order to obtain a depth map that is in focus over a large depth of field (DoF). A large aperture increases sensitivity and range but decreases the DoF.

An example of a depth sensor system with a relatively large aperture is described in S. Honnungar et al., "Focal-sweep for Large Aperture Time-of-Flight Cameras", IEEE International Conference on Image Processing (ICIP), 2016, pp. 953-957. In some embodiments, such large aperture time-of-flight cameras may be used for depth sensing.

One example of a device capable of generating a depth map (indicating pixel distances from the capture device) is the Kinect sensor. A depth map may be used when decomposing a view into focal planes (MFPs). An alternative technique is to take camera-based focus captures and use filtering and other image processing means to derive a depth map using a "depth from focus" approach.

One characteristic of a defocus map is that objects at the same distance behind or in front of the focal distance show the same defocus value. Another characteristic is that the defocus map values, although non-linear with respect to depth, can be mapped to linear distances by using information about the camera parameters (aperture, focal length, etc.), as described in Shaojie Zhuo and Terence Sim, "Defocus map estimation from a single image", Pattern Recognition 44 (2011), pp. 1852-1858.

The problem addressed in some embodiments.

Multi-focal plane (MFP) representations provide the benefit of supporting viewer accommodation without the extreme bandwidth and capture challenges of a complete light field representation. A limitation of current MFP approaches is that the information present in the entire light field cannot be fully preserved because of information loss, e.g., due to occlusions.

Existing MFP methods typically use a texture image and a corresponding depth map as inputs. In addition to several other quality-affecting parameters, the accuracy with which each texture image is acquired limits the quality of the corresponding MFP decomposition process and of its results, the focal planes.

Furthermore, current approaches generally do not utilize the additional information provided by a focal stack, which is a collection of images captured from one view with varying focal distances. In particular, current methods typically do not utilize the additional information provided by a focal stack captured with a large aperture. This results in the loss of information that might otherwise be captured from behind or through occluding objects or structures when a large aperture is used.

In the conventional MFP method, a depth map is formed from "pinhole viewpoints", and the same segmentation (occlusion) is used in forming the MFP at each distance. To capture more information from a scene, some examples described herein use several focal captures (referred to as focal stacks) and separate depth-based segmentation (depth maps) for each captured image.

Forming and using MFPs is one method for avoiding the vergence-accommodation conflict, enabling a viewer to focus naturally on image information in the depth dimension. The method is particularly useful in near-eye (glasses) displays. Rendering a relatively small number of MFPs (4-6) has been found to be sufficient for quality while being technically feasible.

In current methods for MFP formation, a texture image and corresponding pixel distances (a depth map) are generally used. In some cases, this information is virtual and is generated using 3D modeling, resulting in a texture that is in focus everywhere (referred to as all-in-focus content).

3D information may also be captured from real-world views. The view may be captured by a physical camera with one focal length, aperture, and other parameters, which results in a texture image that is in focus only at a certain distance from the capture device. Accordingly, the content is not all-in-focus.

The examples are summarized.

The example processes and systems described herein operate to form Multiple Focal Planes (MFPs) using a focal stack (images with varying focal lengths) as input. In one example, a plurality of conventional MFP forming processes are performed in parallel for each focus stack image, and pixels and depths optimal for focusing are used.

Capturing scenes at varying focal lengths may also be applied to depth sensors, which in some embodiments use relatively large aperture optics with variable focal lengths.

Depth-based decomposition uses a different segmentation (depth map) for each texture image. Accordingly, in some embodiments, the resulting MFP uses all of the focal stack images and (most of) their information content. Specifically, more information is captured to the focal plane around the obscuration than in the conventional MFP method.

In general, a larger aperture results in more image information being captured after the edges of the object are occluded. This information expands the focal plane images and creates some overlap between them. When the focal plane images are superimposed, this overlap may appear as some illumination near the edges of the object. Depending on the desired use, there may be an optimal amount of overlap with respect to perceived image quality. Accordingly, the aperture size may be selected to be sufficient to capture enough of the occluded area without unduly highlighting or enhancing the object edges.

In some embodiments, since a relatively large aperture is used when capturing a plurality of focus stack images, information behind an occluding object or image area is also captured and passed to the MFP formation process. This additional information is saved in the process, unlike when combining the focus stack images into one extended focus image, and results in an MFP stack with an extended amount of information, referred to herein as an extended MFP stack with an extended focal plane.

Some embodiments use as input a focal stack (a series of texture images) and a series of corresponding depth maps. The focal stack image may be captured by taking a series of texture images at different focal lengths or parsing the texture images from a light field captured from the view. After applying the appropriate compression scheme to the data, the series of texture images and corresponding depth maps are transmitted.

In conventional MFP processing, a single texture image is multiplied by a focus weight map derived from a single depth map. In some embodiments, on the other hand, a series of texture images captured at different focal lengths correspond to a series of slightly different depth maps and focus weight maps. A depth map is captured using a relatively large aperture and varying focal length. In some embodiments, the same aperture and focal length (optics) are used for the texture image in the focal stack.

The received depth map is used to generate a focus weight map that is used for formation and blending of the focal plane (MFP). Each texture image in the focal stack is multiplied by a corresponding focus weight map to form a corresponding focal plane image. In some embodiments, each texture image is multiplied by a focus weight map formed from depth maps captured from/for the same focal distance.

The conventional MFP method decomposes one image with one focal length or one extended focus image (a virtually modeled scene or a compilation of several texture images). A significant amount of information behind the occluding object or area does not enter the MFP formation process.

In some embodiments, each focal plane image is formed using its corresponding focal point capture as input. In addition to collecting accurate information from all focal lengths, this method also utilizes information behind occluding objects or areas. The focal plane generated using the techniques described herein is referred to as an extended MFP.

Embodiments used herein may be employed in systems that use a focal plane to generate virtual viewpoint changes. The generation of the virtual viewpoint change can be performed for the laterally shifted viewpoints by laterally shifting the MFPs with respect to each other. The amount of shift depends on the selected viewpoint change amount (parallax) and the distance of each MFP from the viewer. In some embodiments, the generation of the virtual viewpoint change may be performed on viewpoints that are shifted in the forward or backward direction by scaling of MFPs, where closer MFPs scale by a greater amount than farther MFPs.
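A rough sketch of the scaling variant is given below (the pinhole-style scale law, the interpolation order, and the center-crop are illustrative assumptions; dz is assumed to be smaller than the nearest focal distance):

```python
import numpy as np
from scipy.ndimage import zoom

def crop_center(img, shape):
    # Crop the magnified image back to the original (H, W) shape, centered.
    h, w = shape
    top = (img.shape[0] - h) // 2
    left = (img.shape[1] - w) // 2
    return img[top:top + h, left:left + w]

def scale_focal_planes(planes, focal_dists, dz):
    # Approximate a forward viewer movement of dz by magnifying each grayscale
    # focal plane about its center; closer planes are magnified more.
    out = []
    for plane, d in zip(planes, focal_dists):
        s = d / (d - dz)                         # closer plane -> larger scale
        out.append(crop_center(zoom(plane, s, order=1), plane.shape[:2]))
    return out
```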

Compared to the shifting of a conventional MFP, the shifting of an extended MFP can result in a reduced level of de-occlusion or holes. Accordingly, this benefit may be used to increase the amount of parallax for virtual viewpoint changes.

Some existing methods use focus capture as input to the MFP decomposition program, but have been limited to aperture sizes typical for the human eye (on the order of 3-4mm under normal viewing conditions). These methods cannot be used to take advantage of the inpainting effect (reduction of holes) achieved by a focal plane extending behind the occluding object.

Some embodiments benefit from using a large aperture when capturing the focus stack. Light field is also a viable option for providing large aperture images with varying focal lengths. Unlike light field solutions based on transmitting all captured data, some embodiments produce MFPs that operate as approximations of the light field, which can be efficiently compressed and transmitted.

MFP displays are a very feasible method for supporting natural adaptation to 3D content due to acceptable technical complexity and good rendering quality. Thus, using an MFP is also a very natural choice supported by capture and transmission.

The filter-based embodiments may operate to capture focus attributes also for possible non-lambertian phenomena in the scene (e.g., also showing correct focus for reflected and refracted image information).

Example image processing and display methods.

A set of N images with different focal lengths is captured.

Some embodiments utilize additional information acquired from the scene by using relatively large aperture image capture. A relatively large aperture here means an aperture substantially larger than that of the human eye, which is about 3 mm under normal conditions. For example, an aperture of 1 cm or greater may be used. In some embodiments, an aperture of approximately 36 mm may be used. In some embodiments, the aperture is in the range of one centimeter to several centimeters.

A set of N texture images of the scene is captured with focal distances f_1, f_2, …, f_N. For example, the texture images of Figs. 3B, 3C, and 3D may be captured at the corresponding focal distances f_1, f_2, f_3. In this example, N is the same relatively small number of focal planes used in MFP formation and rendering. For simplicity, in this example the MFPs are shown at corresponding distances d_i, where d_1 = f_1, d_2 = f_2, …, d_N = f_N.

When the focal distance is changed, the obtained texture image is focused at the corresponding distance. Due to the large aperture used, each texture image may also contain some information from behind the edges of occluding objects, such as the portion of the building occluded by the foreground object in Fig. 3D. Later in the process, this information ends up in the corresponding focal plane (MFP). In some examples, the symbol p_i is used for the texture image indexed by the value i, where p_i(x, y) is the pixel value (e.g., a luminance value or RGB value, among other possibilities) at location (x, y) within the texture image.

N depth maps are captured or formed.

In this example, a separate depth map is captured for each of the N texture images. The optimal segmentation of the scene and the corresponding assignment of depth pixels may be different in case of different focal lengths.

In some examples, the symbol z_i is used to refer to the depth map corresponding to the texture image p_i. In some embodiments, the depth map z_i is captured using the same focal distance d_i as the corresponding texture image p_i. The depth map may be captured using a time-of-flight camera or a structured-light camera, among other options. The symbol z_i(x, y) is used in some examples to refer to the depth recorded for location (x, y) within the texture image.

In depth maps captured with large aperture sizes (e.g., 1 cm or more), the boundary between closer and farther objects may be "blurred". For example, a depth map captured with a large aperture may show a gradual transition in measured distance across pixels, even though there is actually a clear boundary between closer and farther objects. For example, in the case of a time-of-flight camera as used by Honnungar et al., pixels near the boundary may measure a superposition of temporally modulated light, combining light reflected from closer objects with light reflected from farther objects. When processing the received light to measure "time of flight" (e.g., according to equation 1 of Honnungar et al.), the result may reflect a depth between that of the closer object and that of the farther object. While this "blurring" of depth values may be considered undesirable in prior systems, this effect is used in some examples described herein to advantageously form extended focal planes for display while reducing the occurrence of holes or gaps between focal planes.

The focal length of the depth detection optics is adjusted such that each of the N depth maps is in focus at the same distance as the corresponding focus-captured image. When a large aperture is used, depth values for pixels/areas that are occluded by closer objects may also be obtained.

Fig. 4A-4D are schematic diagrams illustrating depth maps captured with different focal lengths. The schematic depth maps shown in fig. 4A-4D correspond to the respective texture images shown in fig. 3A-3D.

Fig. 4A shows a "pinhole" depth map 402 of the scene of fig. 3A-3D captured with a very small aperture and an essentially infinite depth of focus. More distant regions are represented by darker hatching and more proximate regions are represented by lighter hatching (or the nearest region is not hatched).

Fig. 4B is a schematic illustration of a depth map 404 captured using the same aperture and focal distance f_1 as used to capture the texture image of Fig. 3B. In particular, the depth map of Fig. 4B is focused on the radio in the foreground. The more distant window muntin bars are out of focus and therefore appear at least partially transparent to the depth camera. (For purposes of illustration, the muntin bars are treated as completely transparent in the schematic of Fig. 4B.)

Fig. 4C is a schematic illustration of a depth map 406 captured using the same aperture and focal distance f_2 as used to capture the texture image of Fig. 3C. In particular, the depth map of Fig. 4C is focused on the window frame at an intermediate distance. The radio is closer to the camera than the focal distance. As a result, the radio antenna appears at least partially transparent to the depth camera. (For purposes of illustration, the antenna is treated as completely transparent in the schematic of Fig. 4C.)

Fig. 4D is a schematic illustration of a depth map 408 captured using the same aperture and focal distance f_3 as used to capture the texture image of Fig. 3D. In particular, the depth map of Fig. 4D is focused on the buildings in the background. The radio and the window frame are closer to the camera than the focal distance. As a result, the radio antenna and the window muntin bars appear at least partially transparent to the depth camera. (For purposes of illustration, the antenna and muntin bars are treated as completely transparent in the schematic of Fig. 4D.)

N focus weight maps are generated.

Depth blending can be achieved by applying a depth blending function to the depth map, for example as described in Kurt Akeley, Simon J. Watt, Ahna Reza Girshick, and Martin S. Banks, "A Stereo Display Prototype with Multiple Focal Distances", ACM Transactions on Graphics (TOG), v.23 n.3, August 2004, pp. 804-813. In some embodiments, a linear filter (also referred to as a tent filter) is used, although in some embodiments a non-linear filter may be used.

In some embodiments, the depth map is used to generate a focus weight map (e.g., N focus weight maps) that indicates the weights that image pixels contribute to each focal plane image.

In some such embodiments, those pixels that are exactly at the distance of a focal plane contribute only to the corresponding focal plane (with full weight w = 1). Due to depth blending, pixels between two focal planes contribute to both planes with weights (w1 and w2, where w1 + w2 = 1) given by the corresponding focus weight maps.

The symbol w_j(x, y) may be used to indicate the focus weight of the pixel at location (x, y) with respect to the display focal plane indexed by j. In some examples, the focus weight map w_j(x, y) is a function of depth such that w_j(x, y) = w_j[z_i(x, y)], where z_i(x, y) is the depth of the pixel at position (x, y) in the depth map indexed by i (corresponding to the texture image indexed by i).

In some embodiments, each of the N depth maps corresponding to the N images is processed by N blending functions. Thus, a total of N × N focus weight maps may be generated, where each focus weight map may in some examples be denoted by wij(x, y) = wj[zi(x, y)], with i, j = 0, …, N-1. A viable option is to use only those focus weight maps that correspond to the focal length of each texture image, so that in such an embodiment each focus weight map may be denoted by wi(x, y) = wi[zi(x, y)]. Each such focus weight map contains the best-focused and most accurate information compared to any other focus weight map. In an alternative embodiment, one or more focus weight maps that do not correspond to the focal length of the texture image may be selected, for example in order to provide a desired visual effect.
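The indexing above can be summarized in a short sketch. The following Python fragment (hypothetical helper names; `blend_weight` stands in for whichever depth-blending function an embodiment uses) builds the full set of N × N focus weight maps wij(x, y) = wj[zi(x, y)] and then keeps only the maps for which i = j:

```python
def build_focus_weight_maps(depth_maps, blend_weight):
    """depth_maps: list of N per-pixel depth arrays z_i(x, y).
    blend_weight: callable (z, j) -> weight array w_j[z], the depth-blending
    function of display focal plane j (e.g. a tent function).
    Returns the N x N maps w_ij and the N "diagonal" maps w_i[z_i]."""
    n = len(depth_maps)
    all_maps = [[blend_weight(depth_maps[i], j) for j in range(n)]
                for i in range(n)]
    diagonal = [all_maps[i][i] for i in range(n)]
    return all_maps, diagonal
```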

FIGS. 5A-5C are schematic diagrams of focus weight maps used with the corresponding texture images of FIGS. 3B-3D to generate focal plane images at distances f1, f2, and f3. A focus weight map provides, for each region (e.g., pixel) of the texture image, a weight representing the amount by which that region is in focus. In the illustrations of FIGS. 5A-5C, areas with higher weight (corresponding to more in-focus areas) are shown with lighter shading (or none), and areas with lower weight (corresponding to more out-of-focus areas) are shown with darker shading.

FIG. 5A schematically shows the focus weight map 504 used in generating the focal plane image at distance f1. Regions at depths close to distance f1, as measured using the depth map of fig. 4B (which was also captured at focal length f1), are given the highest focus weight. For example, because the radio is positioned substantially at distance f1, the radio is in focus, and the regions (e.g., pixels) corresponding to the radio are given the highest focus weight. Other areas, such as the window frame and background buildings, are at greater distances and are more out of focus, and therefore have lower focus weights.

FIG. 5B schematically shows the focus weight map 506 used in generating the focal plane image at distance f2. Regions at depths close to distance f2, as measured using the depth map of fig. 4C (which was also captured at focal length f2), are given the highest focus weight. For example, because the window frame is positioned substantially at distance f2, the window frame is in focus, and the regions (e.g., pixels) corresponding to the window frame are given the highest focus weight. The radio is out of focus because it is closer than distance f2, and the background buildings are out of focus because they are at distances greater than f2; those regions therefore have lower focus weights.

FIG. 5C schematically shows the focus weight map 508 used in generating the focal plane image at distance f3. Regions at depths close to distance f3, as measured using the depth map of fig. 4D (which was also captured at focal length f3), are given the highest focus weight. For example, because the background buildings are positioned substantially at distance f3, the buildings are in focus, and the regions (e.g., pixels) corresponding to the buildings are given the highest focus weight. Other areas, such as the window frame and the radio, are at closer distances and are more out of focus, and therefore have lower focus weights.

Selection and use of N focal plane images.

In some embodiments, the focal plane image is formed by multiplying each texture image by a focus weight map corresponding to its focal length. Formed in this way, the focal plane also contains some information behind the edges of the occluding object. The larger the aperture, the greater the amount of such information when capturing the focused image (and sensing the depth).

FIG. 6A schematically shows a focal plane image 604 generated for display at focal length f1. In this example, the focal plane image 604 is generated by multiplying the texture image 304 by the focus weight map 504 on a pixel-by-pixel basis (possibly after scaling or otherwise aligning the texture image and the focus weight map). The most visible content in the focal plane image 604 is primarily the radio, which is the most focused object at the focal distance f1.

FIG. 6B schematically shows a focal plane image 606 generated for display at focal length f2. In this example, the focal plane image 606 is generated by multiplying the texture image 306 by the focus weight map 506. The most visible content in the focal plane image 606 is primarily the window frame, which is the most focused object at the focal distance f2.

FIG. 6C schematically shows a focal plane image 608 generated for display at focal length f3. In this example, the focal plane image 608 is generated by multiplying the texture image 308 by the focus weight map 508. The most visible content in the focal plane image 608 is primarily the buildings, which are the most focused objects at the focal distance f3.

FIG. 7 schematically shows the display of multiple focal plane images to a user, for example using an MFP display such as that of fig. 2. Specifically, in this example, focal plane image 604 is displayed at the focal plane closest to the user (left side of the figure), focal plane image 606 is displayed at a focal plane farther from the user, and focal plane image 608 is displayed at a focal plane farther still. To the user, the focal plane image 604 may appear at a distance f1 from the user, the focal plane image 606 may appear at a distance f2 from the user, and the focal plane image 608 may appear at a distance f3 from the user.

The processing method used in the examples of figs. 3-7 is shown in the flow chart of fig. 8. At 802, a number N of texture images are captured at different focal lengths. At 804, a separate depth map is generated for each of the N texture images. At 806, a focus weight map is generated from each of the depth maps. At 808, each texture image is multiplied by the associated focus weight map to form N focal plane images. At 810, the N focal plane images are rendered on the multi-focal plane display. In the case of a motion-parallax-enabled display, the focal plane images may be laterally shifted and/or scaled relative to each other in response to lateral movement of the viewer to simulate motion parallax.
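As an illustration of steps 808-810, the sketch below (Python with NumPy; the array shapes and the shifting helper are assumptions added for illustration, not part of the described method) multiplies each texture image by its focus weight map and optionally shifts a focal plane laterally to simulate motion parallax:

```python
import numpy as np

def form_focal_planes(textures, weight_maps):
    """Step 808: multiply each texture image (H x W x 3) by its focus
    weight map (H x W), pixel by pixel, to form a focal plane image."""
    return [tex * w[..., None] for tex, w in zip(textures, weight_maps)]

def shift_for_parallax(plane, dx, dy):
    """For motion-parallax displays: shift a focal plane laterally by
    (dx, dy) pixels in response to lateral viewer movement."""
    return np.roll(np.roll(plane, dy, axis=0), dx, axis=1)
```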

Example focus weight determination.

In the examples shown in figs. 3-7, for each displayed focal plane at focal length fi, there is a single texture image captured at focal length fi and a single depth map captured at focal length fi. (Alternatively, other depth maps and texture images may be captured but not used to generate the displayed focal plane images.) Under such conditions, the focus weights used to populate the focus weight map may be calculated using the method illustrated in FIGS. 13A-13C.

FIG. 13A is a graph of example focus weights w1(x, y) for the focal plane having the minimum focal length f1 (the focal plane closest to the camera or viewer). In this example, each focus weight w1(x, y) depends only on the corresponding pixel depth z1(x, y) captured by a depth camera focused at focal length f1. In this example, the second focal plane is at focal length f2. The focus weight w1(x, y) is calculated as follows.

If z1(x, y) ≤ f1, then w1(x, y) = 1.

If f1 ≤ z1(x, y) ≤ f2, then w1(x, y) = [z1(x, y) - f2] / [f1 - f2].

If z1(x, y) ≥ f2, then w1(x, y) = 0.

FIG. 13B is a graph of example focus weights wi(x, y) for a focal plane having a focal length fi that is neither the nearest nor the farthest focal plane. In this example, each focus weight wi(x, y) depends only on the corresponding pixel depth zi(x, y) captured by a depth camera that is also focused at focal length fi. The closer neighboring focal plane is at focal length fi-1, and the farther neighboring focal plane is at focal length fi+1. The focus weight wi(x, y) is calculated as follows.

If zi(x, y) ≤ fi-1, then wi(x, y) = 0.

If fi-1 ≤ zi(x, y) ≤ fi, then wi(x, y) = [zi(x, y) - fi-1] / [fi - fi-1].

If fi ≤ zi(x, y) ≤ fi+1, then wi(x, y) = [zi(x, y) - fi+1] / [fi - fi+1].

If zi(x, y) ≥ fi+1, then wi(x, y) = 0.

FIG. 13C is a graph of example focus weights wN(x, y) for the focal plane having focal length fN (the farthest focal plane). In this example, each focus weight wN(x, y) depends only on the corresponding pixel depth zN(x, y) captured by a depth camera that is also focused at focal length fN. The closer neighboring focal plane is at focal length fN-1. The focus weight wN(x, y) is calculated as follows.

If zN(x, y) ≤ fN-1, then wN(x, y) = 0.

If fN-1 ≤ zN(x, y) ≤ fN, then wN(x, y) = [zN(x, y) - fN-1] / [fN - fN-1].

If zN(x, y) ≥ fN, then wN(x, y) = 1.
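The three cases of FIGS. 13A-13C can be collected into a single function. The sketch below (Python with NumPy; a direct reading of the equations above rather than a normative implementation) returns wi(x, y) for any plane index given the per-pixel depth map and the sorted list of focal distances:

```python
import numpy as np

def focus_weight(z, i, f):
    """Piecewise-linear focus weight w_i(x, y) per FIGS. 13A-13C.
    z: per-pixel depth map z_i(x, y); f: sorted focal distances f_1..f_N
    (f[0] nearest, f[-1] farthest); i: zero-based index of the focal plane."""
    z = np.asarray(z, dtype=float)
    w = np.zeros_like(z)
    if i > 0:                      # ramp up from the previous (closer) plane
        m = (z >= f[i - 1]) & (z <= f[i])
        w[m] = (z[m] - f[i - 1]) / (f[i] - f[i - 1])
    else:                          # nearest plane: full weight in front of f[0]
        w[z <= f[0]] = 1.0
    if i < len(f) - 1:             # ramp down toward the next (farther) plane
        m = (z >= f[i]) & (z <= f[i + 1])
        w[m] = (z[m] - f[i + 1]) / (f[i] - f[i + 1])
    else:                          # farthest plane: full weight behind f[-1]
        w[z >= f[-1]] = 1.0
    return w
```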

The examples of FIGS. 13A-13C treat the fi values as linear distances from the camera or observer. However, in some embodiments, the focus weights wi(x, y) are calculated using fi and zi values expressed as reciprocals of distance (e.g., on a diopter scale). The equations given with respect to FIGS. 13A-13C may still be used in these embodiments, it being understood that the minimum value of fi (e.g., f1) will then represent the farthest focal plane and the maximum value of fi (e.g., fN) will represent the closest focal plane. Such embodiments using a reciprocal distance scale may more easily accommodate the use of a focal plane located "at infinity".

FIGS. 14A-14C are similar to FIGS. 13A-13C, but FIGS. 14A-14C illustrate that the focus weights wi(x, y) need not be piecewise linear with respect to depth zi. The focus weight versus depth relationships shown in FIGS. 14A-14C are piecewise sinusoidal, but other relationships may be implemented in other embodiments. In the embodiments of FIGS. 13A-13C and 14A-14C, and in other example embodiments, wi has a maximum value at fi and decreases or remains constant on either side of fi.

In some embodiments, it may not be the case that each display focal plane at focal length fi has a single texture image captured at focal length fi and a depth map captured at focal length fi. For example, there may be a display plane at focal length fi, but no texture image and/or depth map captured at that same focal length fi. Similarly, the depth maps and texture images may be captured with different focal lengths. An example of such a condition is shown in fig. 9, where there are two different depth maps, three different texture images, and two different focal plane images, none of which have the same corresponding focal distance.

Under such conditions, image processing in some embodiments may be performed as follows. The pixel value (e.g., luminance value or RGB value) at location (x, y) in the focal plane image i may be represented by qi(x, y). The pixel values in the different captured texture images j may be represented by pj(x, y). Each pixel value qi(x, y) may be calculated as a weighted sum over the texture images:

qi(x, y) = Σj wij(x, y) · pj(x, y)

where wij(x, y) is the focus weight in the focus weight map. The weight wij(x, y) may in turn be determined using the depth map zi(x, y). The weight wij(x, y) represents the weight of the contribution of the captured pixel pj(x, y) from texture image j to the display pixel qi(x, y) in focal plane image i.

In some embodiments, the weight is determined based on at least two factors: (i) a factor based on the difference between the focal distances of the display focal plane i and the captured texture image j, and (ii) a factor based on the focus level of individual pixels in the captured texture image.

The factor based on the difference between the focal distances of the focal plane i and the captured texture image j may have a value of 1 when both have the same focal distance, and the factor may decrease as the difference between the focal distances increases.

The factor based on the focus level of individual pixels in the captured texture image may depend on the difference between the focus distance of the texture image and the measured depth of the captured pixels. The factor may have a value of 1 when the measured depth of the captured pixel is equal to the focal length of the texture image, otherwise the factor may be reduced. If no depth map is captured at the same focal distance as the texture image, the measured depth of the captured pixel may be determined, for example, by linear interpolation based on the depth map with the closest focal distance. In some embodiments, as described in more detail below, a defocus map is used to determine the focus level of individual pixels. Such an embodiment does not require the use of a capture depth map.
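One way to combine these two factors, and to form the display focal plane as the weighted sum qi(x, y) = Σj wij(x, y) · pj(x, y), is sketched below in Python. The triangular fall-off, the `spread` parameter, and the per-pixel normalization are assumptions chosen for illustration; the document does not fix the exact shape of either factor.

```python
import numpy as np

def two_factor_weight(f_plane_i, f_texture_j, z_j, spread):
    """w_ij(x, y) as a product of (i) a plane-vs-texture focal-distance
    factor and (ii) a per-pixel focus factor based on how close the
    measured depth z_j(x, y) is to the texture's focal distance."""
    plane_factor = max(0.0, 1.0 - abs(f_plane_i - f_texture_j) / spread)
    pixel_factor = np.clip(1.0 - np.abs(z_j - f_texture_j) / spread, 0.0, 1.0)
    return plane_factor * pixel_factor

def display_plane(textures, weights):
    """q_i(x, y) = sum_j w_ij(x, y) * p_j(x, y), normalized so that the
    weights at each pixel sum to one (an assumption for illustration)."""
    num = sum(w[..., None] * p for w, p in zip(weights, textures))
    den = sum(weights)[..., None] + 1e-8
    return num / den
```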

A defocus map is used to form a focus weight map.

In some embodiments, as described above, in order for occluded information to eventually reach the focal planes, depth sensing is performed using an aperture of non-negligible size rather than a pinhole aperture. In some such embodiments, the set of depth maps may be captured using the same apertures and focal lengths as are used to capture the texture images. In an alternative embodiment, filtering of the focus stack images is performed to capture information from occluded areas, which may appear in any of the focus stack images, and to use that information to form the extended MFPs. Such embodiments may be implemented without the use of a separate depth sensor.

In some embodiments, a focus weight map is derived for each captured texture image using a "depth-to-focus" method, such as that described in Shaojie Zhuo and Terence Sim, "Defocus map estimation from a single image," Pattern Recognition 44 (2011), pp. 1852-1858.

In some embodiments, N defocus maps are formed, each corresponding to one texture image (e.g., using the method of Zhuo & Sim). Each defocus map covers the depth range of the entire captured view. The depth blending operation may be used to form a corresponding focus weight map. In such embodiments, the focus weight map is determined based on the focus level rather than the measured depth.

In some cases, the depth-blending function is symmetric, producing the same contribution whether the pixel is in front of or behind the focal point (focal plane) distance. The defocus map inherently has this characteristic.

It may be noted that for the defocus maps, the focal distance is also known. Thus, despite the different scales, the origin of the two scales is the same. To follow the convention of depth maps, the defocus map may be inverted prior to depth blending. This makes it essentially a focus map, in which the highest values indicate the most in-focus regions. For simplicity, however, this map may still be referred to as a defocus map.

FIG. 10 shows an example of interpreting an inverted defocus map (i.e., a focus map) as a depth map and decomposing a test image into three MFPs using linear filters. At 1002, N different texture images are captured at different focal lengths. At 1004, each texture image is filtered to form a corresponding defocus map. The generation of the defocus map may make use of camera parameters such as aperture and focal length. At 1006, N focus weight maps are generated from the defocus maps. At 1008, each texture image is multiplied by the corresponding focus weight map to form a total of N focal plane images. At 1010, the focal plane images are rendered on the multi-focal plane display.
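A very rough stand-in for steps 1004-1006 is sketched below in Python. The Laplacian-magnitude sharpness measure and the per-pixel normalization across the stack are simplifications chosen for illustration; Zhuo & Sim's estimator, and the depth-blending step described above, are more elaborate.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def focus_map(gray_texture, smooth_sigma=4.0):
    """Crude per-pixel focus (inverted defocus) score for one texture image:
    local Laplacian magnitude, smoothed so that flat regions inherit the
    score of nearby edges."""
    sharpness = np.abs(laplace(gray_texture.astype(float)))
    return gaussian_filter(sharpness, smooth_sigma)

def defocus_based_weight_maps(gray_textures):
    """Steps 1004-1006 (sketch): one focus map per texture image, turned
    into focus weight maps by normalizing across the stack at each pixel."""
    maps = np.stack([focus_map(t) for t in gray_textures])
    return maps / (maps.sum(axis=0, keepdims=True) + 1e-8)
```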

The focus weight maps generated using defocus maps may correspond to a large extent to the focus weight maps generated using depth maps, except for their scale, which for a defocus map does not have to be linear with distance. While it is believed that this difference does not have a significant effect, in some embodiments it may be desirable to linearize the brightness scale of the defocus map. As described in Zhuo & Sim, linearization may be performed with knowledge of the camera parameters used when capturing the texture image.

Filtering and redistribution are used to form multiple focal planes.

In some embodiments, filtering and redistribution are used to form the focal plane image.

When producing MFPs that support viewpoint changes (e.g., motion parallax and/or generation of stereoscopic views), filtering and redistribution may reduce disocclusion artifacts. The redistribution operation uses filtering to separate the high and low frequency components of each focal plane image and to redistribute them. The high frequencies are kept at the same level/distance at which they occur, but the low frequency components are distributed across the focal plane images. Redistribution of the low frequency components is possible because they make only a minor contribution to depth cues in the human visual system.

In some embodiments, the stack of texture images is captured with different focal lengths, and the depth positions of the high frequencies are implied by the known focal lengths. Information from occluded areas is captured into the MFPs and the benefits of redistribution are gained without using a depth map or depth blending. In some embodiments, large-aperture images are used in order to capture information from occluded areas; the aperture diameter may be on the order of a few centimeters. The filtering and redistribution can be done in such a way that this information eventually reaches the redistributed MFPs: the filtering is the same over the entire image area, so information captured from occluded areas is not excluded. The result does not appear to suffer from the fact that occluded areas near edges can be seen through the occluding texture, which changes the brightness of the corresponding pixels.

There may be practical limitations to the optimal aperture size associated with information overlap around the edge. In addition to limiting the aperture size as a solution, an image processing based solution may be implemented to display occluded information only when it is revealed from behind the edge, for example when the focal planes are moved relative to each other for the virtual viewpoints (the amount of movement determines which pixels are revealed or covered).

FIG. 11 shows an example of one such method. At 1102, a large-aperture light field is captured from a real or synthetic scene. At 1104, N focal stack images are generated from the light field image, each focal stack image having a different focal length. At 1106, an extended focus image (e.g., a "pinhole" type image) is generated from the light field image. At 1108, a high pass filter is applied to the focal stack images to obtain high frequency image components. At 1110, a low pass filter is applied to the extended focus image to obtain low frequency image components. At 1112, the low frequency image component from the extended focus image is added to each filtered (high frequency) focal stack image, possibly with a scaling factor such as 1/N. At 1114, the redistributed focal plane images, now including high frequency components (from the original focal stack images) and low frequency components (from the extended focus image), are rendered. In some embodiments, the stack of focal plane images may be shifted relative to each other to produce stereoscopic views of the scene and/or simulated motion parallax. The distribution of low frequency components across the different focal plane images allows a large amount of shifting before any gaps or holes in the images become visible.
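A compact sketch of steps 1108-1112 follows (Python; single-channel images assumed; using a Gaussian blur and its residual as the low-pass/high-pass pair is an assumption, as is the 1/N scaling mentioned above):

```python
from scipy.ndimage import gaussian_filter

def redistribute_fig11(focus_stack, extended_focus, sigma=5.0):
    """Keep the high-frequency part of each focus-stack image and add a
    1/N share of the low-frequency part of the extended-focus image."""
    n = len(focus_stack)
    low_ext = gaussian_filter(extended_focus.astype(float), sigma)
    planes = []
    for img in focus_stack:
        img = img.astype(float)
        high = img - gaussian_filter(img, sigma)   # high-pass as Gaussian residual
        planes.append(high + low_ext / n)
    return planes
```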

FIG. 12 illustrates another technique for generating focal plane images from a light field using filtering. At 1202, a large-aperture light field is captured from a real or synthetic scene. At 1204, N focus stack images pic1 … picN are generated from the light field image, each focus stack image having a different focal length. At 1206, an extended focus image (e.g., a "pinhole" type image) pic_ext is generated from the light field image. (In some embodiments, the extended focus image may be formed from separate focus captures having different focal lengths.) At 1208, a low pass filter is applied to the focus stack images to obtain the low frequency image components pic1.lf … picN.lf. At 1210, a low pass filter is applied to the extended focus image to obtain a low frequency image component pic_ext.lf. At 1212, the low frequency image component from the extended focus image is added to each focus stack image, possibly with a scaling factor such as 1/N. The resulting images now include their original low frequency components and the low frequency contribution from pic_ext.lf; therefore, at 1213, the low pass filtered images pic1.lf … picN.lf are subtracted to generate the redistributed focal plane images pic1.rd … picN.rd. These redistributed focal plane images are displayed to the user at 1214. In some embodiments, the stack of focal plane images may be shifted relative to each other to produce stereoscopic views of the scene and/or simulated motion parallax. The distribution of low frequency components across the different focal plane images allows a large amount of shifting before any gaps or holes in the images become visible.
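The variant of FIG. 12 can be sketched in the same way (again with single-channel images and Gaussian low-pass filtering assumed); each redistributed plane is pic_i + pic_ext.lf / N - pic_i.lf:

```python
from scipy.ndimage import gaussian_filter

def redistribute_fig12(focus_stack, pic_ext, sigma=5.0):
    """Steps 1208-1213 (sketch): add 1/N of the extended-focus image's
    low frequencies to each focus-stack image, then subtract that image's
    own low frequencies."""
    n = len(focus_stack)
    ext_lf = gaussian_filter(pic_ext.astype(float), sigma)
    return [img.astype(float) + ext_lf / n - gaussian_filter(img.astype(float), sigma)
            for img in focus_stack]
```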

In some such embodiments, the low pass filtering is performed using Gaussian filtering. In the example of fig. 12, after low pass filtering of the focus stack images (at 1208), subtracting the low pass filtered images from the original images (at 1213) has the effect of high pass filtering. In an alternative embodiment, high-pass filtering of the focus stack images is performed explicitly.

A plurality of depth and focus images are signaled.

Embodiments described herein may use multiple depth maps and focal images corresponding to a single time instant. In some embodiments, techniques are used for efficient storage and/or communication of depth maps and focal plane images.

The depth image and the focus image are associated.

In some cases, the focal length used for depth capture may be different from the focal length used for image capture. The resolution of the depth map may also vary, typically being lower than the resolution of the image capture. In some embodiments, during upsampling of the depth map, edges detected in the corresponding texture image may be used to provide information for refining the depth edges. The depth maps may be signaled at a different frame rate and interpolated to the image frame rate. The depth maps may also have different bit depths and different mappings of image values to depth values.

In many cases, the depth map may have little detail except around the edges of the object. In some embodiments, the resolution of the depth map may be reduced for communication and then resized to full resolution before being used to calculate the depth weighting function. The presence of a high resolution image capture may be used to guide interpolation around edges when the depth map is up-sampled for a particular focus depth value. In many cases, the depth map is a single-channel image, there are no colors, and the bit depth may be relatively low. The relationship between bit depth and actual distance can be represented by a transfer function.

Video sequence level parameters.

Given the possible differences between the focal plane images and the depth maps, such as bit depth, spatial resolution, temporal frame rate, and focus value, a coded video sequence comprising multiple focused images and depth maps may provide these parameter values independently for the focused images and for the depth maps. A description of the sequence level parameters is shown in Table 1.

Table 1. Multiple focused image sequence information parameters.

The focused image sequence level parameters are constant across the sequence and describe characteristics common to all focused images of the temporal sequence.

num_focal_images: The number of focused images corresponding to a single frame time.

focal_images_fps: The frame rate of the sequence of focused images; defines the time offset between images corresponding to the same focal distance.

focal_images_height: The spatial height in pixels of each focused image.

focal_images_width: The spatial width in pixels of each focused image.

focal_images_bit_depth: The bit depth of the samples of each focused image.

focal_images_num_color_planes: The number of color planes, e.g., 3 for an RGB or YUV sequence.

focal_distance[f]: Each entry of the array gives the focal distance corresponding to the focused image with index f.

The depth map sequence level parameters are constant across the sequence and describe characteristics common to depth maps of the sequence.

num_depth_maps: The number of depth maps corresponding to a single frame time. This may differ from the number of focused images.

depth_maps_fps: The frame rate of the sequence of depth maps; defines the time offset between depth maps corresponding to the same distance.

depth_maps_height: The spatial height in pixels of each depth map.

depth_maps_width: The spatial width in pixels of each depth map.

depth_maps_bit_depth: The bit depth of the samples of each depth map image.

depth_map_distance[f]: Each entry of the array gives the distance value corresponding to the depth map with index f. This is the focal distance used when recording the depth map.
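One possible in-memory representation of these sequence-level parameters is sketched below as Python dataclasses (the field names mirror the signaled parameters; the grouping into two classes is an assumption made for illustration):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FocalImageSequenceParams:
    """Sequence-level parameters common to all focused images (Table 1)."""
    num_focal_images: int
    focal_images_fps: float
    focal_images_height: int
    focal_images_width: int
    focal_images_bit_depth: int
    focal_images_num_color_planes: int
    focal_distance: List[float]        # focal_distance[f] for focused image index f

@dataclass
class DepthMapSequenceParams:
    """Sequence-level parameters common to all depth maps (Table 1)."""
    num_depth_maps: int
    depth_maps_fps: float
    depth_maps_height: int
    depth_maps_width: int
    depth_maps_bit_depth: int
    depth_map_distance: List[float]    # depth_map_distance[f] for depth map index f
```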

Frame level parameters.

Individual frames in a video sequence may indicate their type (focused image or depth map), index the relevant sequence level parameter set, indicate their temporal offset via an image count, and carry an index into the focal_distance or depth_map_distance values. These frame level parameters are shown in Tables 2 and 3.

Table 2. Focused image single frame parameters.

The frame level parameters for a single focused image are described below:

sequence_id: Refers to a single sequence parameter set, used in the case of multiple sequence parameter sets.

frame_count: Describes the temporal position of the focused image within the frame sequence.

focal_distance_index: Indexes the sequence level list of focal distance values.

Table 3. Depth map single frame parameters.

The frame level parameters for a single depth map are described below:

sequence_id: Refers to a single sequence parameter set, used in the case of multiple sequence parameter sets.

frame_count: Describes the temporal position of the depth map within the frame sequence.

depth_map_distance_index: Indexes the sequence level list of depth map distance values.
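The frame-level parameters can be represented similarly (again a sketch; the class layout is an assumption):

```python
from dataclasses import dataclass

@dataclass
class FocusedImageFrameParams:
    """Frame-level parameters of a single focused image (Table 2)."""
    sequence_id: int            # which sequence parameter set applies
    frame_count: int            # temporal position within the frame sequence
    focal_distance_index: int   # index into the sequence-level focal_distance list

@dataclass
class DepthMapFrameParams:
    """Frame-level parameters of a single depth map (Table 3)."""
    sequence_id: int
    frame_count: int
    depth_map_distance_index: int  # index into the sequence-level depth_map_distance list
```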

Inter-picture prediction is used in coding focal plane images.

The correlation between images captured under different focus conditions can be exploited via inter-image prediction using techniques similar to SNR scalability, where the quality varies but the resolution does not. In some embodiments, correlation between different focus captures of the same scene is exploited by signaling one focus capture image and signaling a difference between the first focus capture image and the second focus capture image.
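As a toy illustration of the idea (not a description of any particular codec; real coders would use block-based prediction and entropy coding), the difference between two focus captures can be signaled and the second image reconstructed from the first plus the residual:

```python
import numpy as np

def encode_focus_residual(base_capture, other_capture):
    """Signal one focus capture and only its difference to a second one."""
    return other_capture.astype(np.int16) - base_capture.astype(np.int16)

def decode_focus_capture(base_capture, residual):
    """Reconstruct the second focus capture from the base plus the residual."""
    return (base_capture.astype(np.int16) + residual).astype(np.uint8)
```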

Inter depth map prediction is used in the encoding.

The correlation between depth maps may be used to reduce bandwidth requirements. Similar to signaling a single base focus image and an additional focus image via residual, multiple depth maps with different focus captures can be signaled by predicting between the depth maps.

Additional features in some embodiments.

In some embodiments, the number and location of focal planes formed is the same as the number and location of texture images captured. In the case where the number and/or location are different, the texture images may first be blended to the nearest focal plane according to their distance from the corresponding focal plane location.

Notably, in various MFP methods, depth maps are used to separate or decompose scene information/pixels into a plurality of depth ranges for forming respective focal planes. Instead of depth maps, other depth-dependent mapping criteria may be used for the separation. Examples of alternative depth dependent mappings are described above with respect to using a defocus map for this purpose. The defocus map is similar to the depth map, but instead of depth sensing, the defocus map is based on image blur, which can be detected by filtering the image.

Another criterion for separation in some embodiments is the use of depth of field (DoF). However, depth of field follows relatively complex 3D and optical-geometry mathematics. The DoF appears in the image as a region of in-focus pixels, while regions outside it are correspondingly defocused. The explicit calculation of DoF may therefore be replaced by detecting the in-focus region with appropriate filtering.

In embodiments where redistribution of spatial frequency components is performed, the stack of texture images is captured at different focal lengths, and the depth positions of the high frequencies are implied by the known focal lengths, which are then used as the criterion for distributing depth information. In addition, filtering is used to detect a set of complementary DoFs and corresponding focus stack images, covering the entire captured volume with respect to depth and focus information. The number and locations of the focus images can be determined mathematically so as to capture most of the in-focus detail (high frequencies) of the scene.

In some embodiments, a method includes obtaining a plurality of texture images of a scene, each texture image having a different respective focal length; and, for each texture image, generating a focal plane image by: (i) determining a corresponding weight for each of a plurality of pixels of the texture image, wherein the weight represents an amount by which the pixel is in focus, and (ii) multiplying a pixel value of each of the plurality of pixels by the corresponding weight. The focal plane images may be displayed in a multi-focal plane display, e.g., substantially simultaneously or in a time multiplexed manner (e.g., sequentially).

In some embodiments, a method includes obtaining a plurality of texture images pi of a scene, each texture image having a different respective focal length di; and, for each texture image pi, generating a focal plane image qi by: (i) determining a corresponding weight wi for each of a plurality of pixels of the texture image, wherein the weight wi(x, y) represents the amount by which pixel (x, y) is in focus, and (ii) multiplying each pixel value pi(x, y) of the texture image pi by the corresponding weight wi(x, y) to generate the focal plane image qi, such that qi(x, y) = pi(x, y) · wi(x, y).

The amount by which a pixel in a texture image is in focus may be determined based at least in part on the difference between the depth value zi(x, y) corresponding to the pixel and the focal length di of the texture image that includes the pixel.

In some embodiments, for each texture image, a depth image zi(x, y) of the scene is obtained. For each texture image pi(x, y), the weight wi(x, y) is determined by a function wi[zi(x, y)]. In some embodiments, a single depth image may be obtained for use with all texture images, and zi(x, y) may be the same for all values of i. In some embodiments, wi[zi(x, y)] has a maximum value at zi(x, y) = di.

In some embodiments, obtaining the plurality of texture images comprises: receiving a set of initial texture images at a display device having a plurality of display focal planes, each display focal plane having a different respective focal length; and selecting, from the set of initial texture images, a set of selected texture images pi having focal lengths corresponding to the focal lengths of the display focal planes (e.g., having the same focal length or the closest focal length). Each selected texture image pi may have a focal length di equal to the focal length of one of the display focal planes.

In some embodiments, a method of providing a multi-layered image of a scene comprises: for each of a plurality of different focal lengths, (i) capturing a texture image of the scene focused at the respective focal length, and (ii) capturing a depth image of the scene focused at the respective focal length (e.g., using a time-of-flight camera); and transmitting the captured texture image and depth image. Each texture image and the corresponding depth image may be captured substantially simultaneously. Each texture image and the corresponding depth image are captured with the same optics. In some embodiments, the captured texture image and depth image are encoded in a bitstream, and transmitting the captured texture image and depth map comprises transmitting the encoded bitstream. In some such embodiments, encoding the captured texture image and depth image includes using at least a first one of the texture images as a predictor for encoding at least a second one of the texture images. In some embodiments, encoding the captured texture image and depth image comprises using at least one of the texture images as a predictor for encoding at least one of the depth maps in the depth image.

Note that the various hardware elements of the described embodiment(s) are referred to as "modules," which perform (i.e., execute, perform, etc.) the various functions described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more Application Specific Integrated Circuits (ASICs), one or more Field Programmable Gate Arrays (FPGAs), one or more memory devices) deemed suitable by one of ordinary skill in the relevant art for a given implementation. Each described module may also include instructions executable to perform one or more functions described as being performed by the respective module, and it is noted that these instructions may take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, and/or software instructions, etc., and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM, ROM, etc.

Although the features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of the computer readable storage medium include, but are not limited to, Read Only Memory (ROM), Random Access Memory (RAM), registers, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and Digital Versatile Disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
