Method and apparatus for encoding and decoding digital light field images

Publication No. 1652342. Publication date: 2019-12-24.

Reading note: the present technology, "Method and apparatus for encoding and decoding digital light field images", was created by 曹荣轩, 张吉恩 and Antonio Ortega on 2018-05-03. Abstract: The invention discloses a method of encoding an original lenslet image (f), the method comprising a receiving phase (705), in which at least a portion of the original lenslet image (f) is received, the image (f) comprising a plurality of macropixels (650, 1320, 1340), each macropixel (650, 1320, 1340) comprising pixels corresponding to a particular perspective of the same point of the scene; and an output phase, in which a bitstream (f_d^) comprising at least part of the encoded lenslet image (f) is output. The method comprises an image transformation phase (710), wherein the pixels of the original lenslet image (f) are spatially shifted into a transformed multicolor image (660) having more columns and more rows than the received original lenslet image, wherein virtual pixels (610, 620) having undefined values are inserted into the original lenslet image (f), and wherein the shifting is performed so as to place the estimated center position of each macropixel (650, 1320, 1340) at an integer pixel position. Furthermore, the method comprises a sub-view generation phase, wherein a series of sub-views (f_d') is generated, said sub-views (1310, 1510, 1620) comprising pixels of the same angular coordinate extracted from different macropixels (650, 1320, 1340) of the transformed original lenslet image (f). Finally, the method comprises a graph coding phase, wherein a graph representation of at least one of the sub-views (1310, 1510, 1620) of said series (f_d') is encoded according to a predetermined Graph Signal Processing (GSP) technique, thereby generating a bitstream (f_d^). The output phase comprises outputting the graph-encoded bitstream (f_d^) for its transmission and/or storage.

1. A method of encoding an original lenslet image (f), said method comprising:

-a receiving phase (705), in which at least a portion of an original lenslet image (f) is received, said image (f) comprising a plurality of macropixels (650, 1320, 1340), each macropixel (650, 1320, 1340) comprising pixels corresponding to a particular viewing angle of the same point of the scene;

-an output phase, in which a bitstream (f_d^) comprising at least part of the encoded lenslet image (f) is output,

Characterized in that the method further comprises:

-an image transformation phase (710), wherein the pixels of the original lenslet image (f) are spatially shifted into a transformed multicolor image (660) having more columns and more rows than the received original lenslet image, wherein virtual pixels (610, 620) having undefined values are inserted into the original lenslet image (f), and wherein the shifting is performed so as to place the estimated center position of each macropixel (650, 1320, 1340) at an integer pixel position;

-a sub-view generation phase, wherein a series of sub-views (f_d') is generated, said sub-views (1310, 1510, 1620) comprising pixels of the same angular coordinate extracted from different macropixels (650, 1320, 1340) of the transformed original lenslet image (f);

-a graph coding phase, in which a graph representation of at least one of the sub-views (1310, 1510, 1620) of said series (f_d') is encoded according to a predetermined Graph Signal Processing (GSP) technique, thereby generating a bitstream (f_d^),

wherein the output phase comprises outputting the graph-encoded bitstream (f_d^) for its transmission and/or storage.

2. The encoding method according to claim 1, wherein the spatial shift comprises at least a rotation and/or a translation and/or a scaling operation.

3. Method of encoding according to claim 1 or 2, wherein said graph representation is generated from a plurality of said sub-views (1310, 1510, 1620) of said series (f_d'), organized in a group of pictures (GOP) structure.

4. Encoding method according to any one of the preceding claims, wherein, in the sub-view graph generation phase (725), the series of sub-views (f_d') is divided into a plurality of GOPs consisting of a predetermined number G of sub-views (1310, 1510, 1620).

5. Method for encoding according to claim 3 or 4, wherein said graph representation is generated by taking into account intra- or inter-view dependencies between said sub-views (1310, 1510, 1620) of said series (f_d') organized in the group of pictures (GOP) structure, such that each node of the graph representation is connected to a predetermined number of nearest nodes, according to the Euclidean distance, in the same sub-view and in the reference sub-view (1620) within the GOP structure.

6. Encoding method according to any one of the preceding claims, wherein said sub-view generation phase comprises:

-a sub-aperture generation phase (715) wherein a sub-aperture image (670) comprising a plurality of sub-views (1310, 1510, 1620) is generated by composing each sub-view such that pixels having the same relative position with respect to each macro-pixel center are used;

-a sub-aperture rearrangement phase (720), in which the sub-views composing the generated sub-aperture image (670) are rearranged based on at least one predetermined order, so as to generate the series (f_d') of sub-views (1310, 1510, 1620).

7. An encoding method according to claim 6, wherein, in said sub-aperture rearrangement phase (720), said predetermined order is a raster scan order, or a spiral order, or a zig-zag order, or a chess-like order.

8. Method for encoding according to any one of the preceding claims, wherein, in the graph coding phase, the series of sub-views (f_d') is coded by means of a Graph Fourier Transform (GFT) or a graph-based lifting transform (GLT) to generate said bitstream (f_d^).

9. A method for decoding a bitstream comprising at least one encoded original lenslet image (f), the method comprising:

-a receiving phase, in which a graph-encoded bitstream (f_d^) of a series of sub-views (f_d') is received, wherein each sub-view (1310, 1510, 1620) comprises pixels of the same angular coordinate extracted from different macropixels (650, 1320, 1340) of said original lenslet image (f), each macropixel comprising pixels corresponding to a particular viewing angle of the same point of the scene, and

an output phase, in which the reconstructed light field image (f_d~) is output and/or displayed,

Characterized in that the method further comprises:

-a graph decoding phase (805), in which the graph-encoded bitstream (f_d^) is decoded according to a predetermined Graph Signal Processing (GSP) technique and the reconstructed series of sub-views (f_d'') is output, wherein the sub-views (1310, 1510, 1620) comprise virtual pixels located at pixel positions having undefined color values;

-a demosaicing filtering phase (820), in which a full-color demosaiced lenslet image is generated by color interpolation, by applying a demosaicing technique to the reconstructed series of sub-views (f_d'');

-an original lenslet rearrangement phase (830), wherein a full-color sub-aperture image (f_d''') is obtained based on the full-color demosaiced lenslet image;

wherein the output phase comprises outputting the generated full-color sub-aperture image (f_d''').

10. The decoding method according to claim 9, wherein the original lenslet rearrangement phase (830) comprises:

-an image reconstruction phase (810), in which the reconstructed series of sub-views (f_d'') is rearranged on the basis of at least one predetermined order, so as to generate a reconstructed sub-aperture image comprising a plurality of sub-views (1310, 1510, 1620);

-a lenslet image reconstruction phase (815), wherein the reconstructed lenslet image is generated, based on the order used in the encoding of the received original lenslet image (f), such that the pixels of each sub-view (1310, 1510, 1620) are located in the respective macropixels (650, 1320, 1340).

11. A decoding method according to claim 9 or 10, wherein, in the original lenslet rearrangement phase (830), the predetermined order is a raster scan order, or a spiral order, or a zig-zag order, or a chess-like order.

12. Decoding method according to any one of the preceding claims 9 to 11, wherein, in the graph decoding phase (805), the bitstream (f_d^) is decoded according to a Graph Fourier Transform (GFT) or a graph-based lifting transform (GLT) to generate the series of reconstructed sub-views (f_d'').

13. An apparatus (1005) for encoding an original lenslet image (f), said apparatus comprising:

an input unit (1070) configured to acquire at least a part of an original lenslet image (f) from a source (1000), said original lenslet image (f) comprising a plurality of macropixels (650, 1320, 1340), each macropixel (650, 1320, 1340) comprising pixels corresponding to a particular viewing angle of the same point of a scene,

an output unit (1080) configured to output at least a part of the generated bitstream (f_d^),

characterized in that the device further comprises:

-at least one processing unit (1010, 1020, 1040) configured to execute a set of instructions for encoding the original lenslet image (f),

-a storage unit (1030) containing image data relating to said original lenslet image (f) and the results (f_d', f_d^) of the execution of the encoding instructions,

Wherein the at least one processing unit (1010, 1020, 1040) is configured to spatially shift pixels of the original lenslet image (f) in a new polychromatic image having more columns and more rows with respect to the received original lenslet image, wherein virtual pixels (610, 620) are inserted into pixel positions having undefined color channel values, and wherein the shifting is performed so as to place the estimated center position of each macropixel at integer pixel positions;

wherein the at least one processing unit (1010, 1020, 1040) is further configured to generate a series of sub-views (f_d') starting from the original lenslet image (f), each sub-view (1310, 1510, 1620) comprising pixels with the same angular coordinates extracted from different macropixels (650, 1320, 1340) of said original lenslet image (f);

wherein the at least one processing unit (1010, 1020, 1040) is further configured to acquire the series of sub-views (f_d') from the storage unit (1030), to encode said series of sub-views (f_d') by performing a Graph Signal Processing (GSP) technique, and to store the resulting bitstream (f_d^) into the storage unit (1030).

14. The encoding device (1005) of claim 13, wherein the at least one processing unit (1010, 1020, 1040) is further configured for:

-generating the series of sub-views (f_d') by forming a sub-aperture image (670) comprising a plurality of sub-views, each sub-view being composed of pixels having the same relative position with respect to the center of each macropixel; and

-rearranging, based on at least one predetermined order, the sub-views composing the generated sub-aperture image (670), so as to obtain the series of sub-views (f_d').

15. The encoding device (1005) according to claim 14, wherein the predetermined order is a raster scan order, or a spiral order, or a zig-zag order or a chess-like order.

16. The encoding device (1005) according to any one of the preceding claims 13 to 15, wherein the Graph Signal Processing (GSP) technique is a Graph Fourier Transform (GFT) or a graph-based lifting transform (GLT).

17. An apparatus (1100) for decoding an encoded original lenslet image, said apparatus comprising:

-an input unit (1180) configured to read, from a communication channel or storage medium (1095), a graph-encoded bitstream (f_d^) of a series of sub-views (f_d'), wherein each sub-view (1310, 1510, 1620) comprises pixels of the same angular coordinate extracted from different macropixels (650, 1320, 1340) of the original lenslet image (f), each macropixel comprising pixels corresponding to a particular viewing angle of the same point of the scene,

-an output unit (1170) reproducing and/or outputting the processed light field image or video stream (f_d~);

Characterized in that the device further comprises:

-at least one processing unit (1110, 1120, 1150) configured to execute a set of instructions for decoding the encoded image or video stream (f_d^);

-a storage unit (1140) containing image data relating to said encoded image or video stream (f_d^) and the results of the execution of the decoding instructions;

-said at least one processing unit (1110, 1120, 1150) being configured to receive and decode the bitstream (f_d^) of the series of sub-views according to a predetermined Graph Signal Processing (GSP) technique, so as to recover a reconstructed series of sub-views (f_d''), wherein the sub-views (1310, 1510, 1620) comprise virtual pixels (610, 620) located at pixel positions having undefined color values;

-the at least one processing unit (1110, 1120, 1150) being configured to receive the reconstructed series of sub-views (f_d'') and to generate a full-color demosaiced lenslet image (1810) by color interpolation, by applying a demosaicing technique to the sub-views (1310, 1510, 1620).

18. The decoding device (1100) according to claim 17, wherein the at least one processing unit (1110, 1120, 1150) is configured for:

-rearranging the sub-views based on at least one predetermined order, so as to obtain a reconstructed sub-aperture image (f_d''') comprising a plurality of sub-views (1310, 1510, 1620); and

-reconstructing the lenslet image, so as to position the pixels of each sub-view (1310, 1510, 1620) in the respective macropixels (650, 1320, 1340), based on the order used when encoding the received original lenslet image (f).

19. A computer program product which can be loaded into the memory unit of a digital processing device and comprises portions of software code for performing the method according to any one of claims 1 to 12.

Technical Field

The present invention relates to a method and apparatus for encoding and/or decoding digital images, in particular digital images provided by so-called light field cameras.

Background

During operation, a conventional digital camera may capture a two-dimensional (2D) image representing the total amount of light projected onto each point on a photosensor within the camera. However, this 2D image does not contain information about the directional distribution of the light impinging on the photosensor.

In contrast, a light field camera samples a four-dimensional (4D) optical phase space or light field, which can capture information about the directional distribution of light rays. The direction information at the pixel corresponds to the position information at the aperture.

This information captured by the light field camera may be referred to as the light field, the plenoptic function, or radiance.

In computational photography, the light field is a 4D recording of all light rays in 3D space. Radiance describes spatial and angular information, and is defined as energy per unit area per unit solid angle (in steradians).

Since a light field camera captures radiance, different kinds of post-processing can be performed, for example refocusing, noise reduction, 3D view construction and depth of field modification; in addition, it has a wide range of applications, including 3D TV and medical imaging.

The light field can also be captured with a conventional camera. In one conventional approach, M × N images of a scene are captured from different positions using a conventional camera. For example, if 8 × 8 images are captured from 64 different positions, 64 images result. The pixel at position (i, j) in each image is then taken and placed into a block, so that each block gathers the 64 pixels corresponding to one position (i, j).
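To make the macropixel structure concrete, here is a minimal NumPy sketch of this regrouping, assuming the M × N views are stored as a 4D array; the array layout and function name are illustrative, not taken from the patent:

```python
import numpy as np

def views_to_macropixel_image(views: np.ndarray) -> np.ndarray:
    """Regroup an (M, N, H, W) stack of views into a lenslet-style image.

    The pixel at position (i, j) of every view is gathered into a single
    M x N macropixel, so the output has shape (H*M, W*N).
    """
    M, N, H, W = views.shape
    # (M, N, H, W) -> (H, M, W, N): each (i, j) now indexes an M x N block
    blocks = views.transpose(2, 0, 3, 1)
    return blocks.reshape(H * M, W * N)

# Example: 8 x 8 = 64 views of a 100 x 150 scene -> one 800 x 1200 image
light_field = np.random.rand(8, 8, 100, 150)
macro_image = views_to_macropixel_image(light_field)
```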

Fig. 1(a) shows an exemplary prior art light field camera 100 or camera array, which employs an array of two or more objective lenses 110. Each objective lens is focused on a specific area of the photosensor 140. The light field camera 100 may be viewed as a combination of two or more conventional cameras, each camera simultaneously recording an image of an object on a particular area of the photosensor 140. The captured images may then be combined to form one image.

Fig. 1(b) shows an exemplary prior art plenoptic camera 150, another type of light field camera, which employs a single main objective lens 160 and a microlens or lenslet array 170, the microlens or lenslet array 170 comprising, for example, about 100,000 lenslets.

The lenslet array 170 is typically placed at a small distance (about 0.5 mm) from the photosensor 180, which can be, for example, a Charge-Coupled Device (CCD). Through the microlens array 170, each point of the 3D scene is projected onto a group of pixels (called a macropixel), instead of onto a single pixel as in a conventional 2D image. Each pixel within a macropixel corresponds to a particular perspective of the same point of the scene.

Fig. 2 shows an example of an image captured by the plenoptic camera 150, also called a lenslet image (fig. 2(a)), which consists of an array of macropixels (fig. 2(b)), typically hexagonal or circular (fig. 2(c)). The lenslet array 170 enables the plenoptic camera 150 to capture a light field, i.e., to record not only the image intensity, but also the intensity distribution in different directions at each point.

Each microlens splits the beam reaching it from the main lens 160 into rays coming from different "pinhole" positions on the aperture of the main objective lens 160.

A plenoptic image captured by a camera 150 with, for example, 100,000 lenslets will contain 100,000 macropixels. A light field captured by a light field camera, including a plenoptic camera, is typically saved as a lenslet image (fig. 2(a)).

Compression of light field images is an important issue for computational photography. Due to the 4D nature of the light field, and the fact that 2D slices of the light field are equivalent to conventional pictures, uncompressed files tend to be large and may occupy several GB of space.

Meanwhile, there is redundancy in the data: all rays originating from the same surface point have approximately the same radiance.

There is therefore an incentive to compress light field images. Conventionally, light field images have been compressed using existing lossy and lossless image/video compression techniques.

Some conventional image compression methods treat the 2D slices in the light-field image as separate images and compress them separately. In other approaches, the 4D light field image is contained in one 2D image, which is simply compressed into one image by conventional methods.

These methods do not exploit the information and redundancy that are unique to light field images, but rather treat them as normal images.

JPEG (joint photographic experts group) is a common conventional image compression standard that employs block-based compression techniques. JPEG divides an image into 8 x 8 blocks of pixels, or more generally, block-based compression techniques divide an image into m x n blocks of pixels and compress these blocks using some transform function.

JPEG and other block-based compression techniques are known to create the problem of "blocking artifacts" due to the division of the image into blocks, where the compressed image appears to be composed of blocks, or has other introduced vertical/horizontal artifacts (e.g., vertical or horizontal lines, discontinuities, or streaks).

The JPEG standard and other block-based compression techniques can be used to directly compress light field images without taking into account the details of the light field data.

However, due to the quasi-periodic structure of light field images and the blocky nature of the compression, the results tend to be poor, with significant blocking artifacts. Such artifacts can severely corrupt the angular information in light field images, and can thus limit the horizontal and vertical parallax achievable from them.

Several methods have been proposed to compress light field images by treating them as frames of a video, employing video coding standards such as AVC (Advanced Video Coding) or HEVC (High Efficiency Video Coding).

These standards were developed by the Joint Collaborative Team on Video Coding (JCT-VC), a collaboration involving the Moving Picture Experts Group (MPEG), and employ block-based coding methods using Discrete Cosine Transform (DCT) techniques.

In light field image processing, a lenslet image is generally converted into a so-called sub-aperture image, as shown in fig. 3(a).

The sub-aperture image is composed of a plurality of sub-views, each of which is composed of pixels having the same angular coordinate, extracted from different macropixels of the lenslet image.

In fig. 3(a), the sub-views are arranged according to their relative positions within the macropixel. In sub-aperture images, two types of redundancy can generally be exploited for compression, namely intra-view and inter-view correlation.

The first redundancy type is spatial correlation within each view, which is similar to regular 2D images, where nearby pixels tend to have similar pixel intensities.

The second redundancy type is the inter-view correlation between adjacent sub-views. In the light field compression literature, both correlation types are exploited in a manner similar to intra- and inter-prediction in video coding standards such as AVC and HEVC.

In general, these methods can be divided into two categories.

The first approach compresses the sub-aperture image with modified prediction modes inside a current video codec. In "Improved spatial prediction for 3D holoscopic image and video coding", published at the 19th European Signal Processing Conference (EUSIPCO 2011), Conti et al. propose an additional self-similarity (SS) mode and an SS skip mode, added to the existing intra-prediction modes, to exploit the correlation between adjacent sub-views in the sub-aperture image.

In the second approach, the sub-views of the sub-aperture image are rearranged into a pseudo-video sequence, which is then encoded using an existing video coding standard (e.g., HEVC). Several works apply different sub-image rearrangement schemes. Perra and Assuncao, in "High efficiency coding of light field images based on tiling and pseudo-temporal data arrangement" (IEEE International Conference on Multimedia & Expo Workshops, ICMEW 2016), propose a light field coding scheme based on a low-complexity pre-processing method that generates pseudo-video sequences suitable for standard compression with HEVC.

However, the aforementioned prior works require a pre-processing stage that increases the redundancy of the data representation prior to compression.

To illustrate the limitations of the current state of the art with respect to compressing light field images into frames in video, the architecture of a light field image encoding-decoding system is shown in its basic functional units in fig. 4.

The encoder 400 includes at least a light field image preprocessing unit 420, a sub-aperture image processing unit 430, and a block-based encoder unit 440.

The light field image pre-processing unit 420 takes as input the patterned raw lenslet image f generated by the photosensor 410 (e.g., a CCD sensor).

The patterned original lenslet image f is a multicolor image, i.e., an image including information about different colors (e.g., red, green, blue), which can be generated by placing a color filter array, such as the well-known Bayer filter, over a square grid of photosensors. Fig. 5(a) and 6(a) schematically show examples of the color filter array (element f).

Specially arranged color filters are used in most single-chip digital image sensors used in digital cameras, camcorders, and scanners to create color images.

Referring to fig. 5, the pre-processing unit 420 first generates a full-color lenslet image by RGB color interpolation, known in the art as demosaicing, which triples the amount of data relative to the original raw data (fig. 5(b)), because an image of the same size as the original raw image is generated for each color channel R, G, B.

The conversion from full-color lenslet images to sub-aperture images is described by D. G. Dansereau, O. Pizarro and S. B. Williams in "Decoding, calibration and rectification for lenselet-based plenoptic cameras" (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013).

During the conversion process, the demosaiced lenslet image is first rotated, translated and scaled, so that the estimated position of the center of each macropixel (indicated by dashed lines in fig. 5(c)) falls on an integer pixel position. This operation results in a roughly 20% increase in the number of pixels. Finally, a sub-aperture image f' is generated from the converted lenslet image, based on the relative position of each pixel with respect to the center of its macropixel.

The sub-aperture image processing unit 430 takes the sub-aperture image f' as input and arranges the sub-views composing f' into a series.

The sub-view series may follow various schemes; for example, the series may be organized in a group of pictures (GOP) structure, specifying the arrangement order of intra and inter frames.

The resulting series of sub-views f'' is then fed to the block-based encoder.

The block-based encoder unit 440 takes the sub-view series f'' as input and encodes it according to a well-known video coding standard, such as AVC or HEVC.

Furthermore, the methods used by these standards during compression require a format conversion from 4:4:4 RGB to 4:2:0 YUV.
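As a rough illustration of what this conversion involves, the sketch below assumes full-range BT.601 coefficients and even image dimensions; the exact matrix, value ranges and rounding rules depend on the codec configuration:

```python
import numpy as np

def rgb444_to_yuv420(rgb: np.ndarray):
    """Convert an (H, W, 3) float RGB image into Y, U, V planes in 4:2:0.

    Uses approximate full-range BT.601 coefficients; H and W must be even.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b   # Cb, zero-centered
    v = 0.500 * r - 0.419 * g - 0.081 * b    # Cr, zero-centered
    # 4:2:0 downsampling: average each 2x2 block of the chroma planes
    h, w = y.shape
    u420 = u.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    v420 = v.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, u420, v420
```

Rounding Y, U and V to integer sample values after this transform is precisely the step whose distortion is discussed next.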

Although the downsampling of the U and V components reduces the redundancy introduced by demosaicing and scaling, rounding effects during color conversion may introduce further distortion. The output of the encoder unit 440 is a standard-compliant bitstream f^.

The encoder 400 then transmits the bitstream f^ to the receiver node over a bandwidth-limited channel, or stores it on the memory support 450 for later use, e.g., for decoding purposes.

Decoder 460 includes at least a block-based decoder unit 470 and a post-processing unit 480. For simplicity, we assume that the bitstream f^ available to the decoder 460 is identical to that generated by the encoder 400, since in practical applications adequate measures are taken to minimize read/write or channel errors occurring while information travels from the encoder to the decoder.

The block-based decoder unit 470 takes the bitstream f^ as input and generates the reconstructed sub-view series f''' according to the appropriate video coding standard (AVC, HEVC, etc.).

As described above, the post-processing unit 480 takes the series of reconstructed sub-views f''' as input and generates a reconstructed light field image f~, using techniques enabling operations such as image refocusing, noise reduction, 3D view construction and depth of field modification.

Finally, the reconstructed light field image f~ is displayed using a display unit 490 (such as a television, a monitor, etc.).

In real world applications, communication is over a bandwidth-limited channel, so it is desirable that the light-field image can be subjected to some form of effective compression before being placed on the channel. The same applies to storing light field images on storage units having a limited capacity.

With respect to the problem of compressing light field images, some pre-processing stages add redundancy to the data representation prior to compression. In a light field camera, a color filter array placed over the photosensor is used to capture color information, so each pixel location contains the intensity of only a single color component (R, G or B).

However, the conventional compression techniques all require full-color sub-aperture images as input.

Therefore, demosaicing is required to produce a full-color lenslet image from the patterned image, which triples the amount of data relative to the original raw data; further redundancy is introduced in the conversion from the lenslet image to the sub-aperture image.

During the conversion, the demosaiced lenslet image is rotated, translated and scaled so that the estimated location of each macropixel center falls on an integer pixel position, resulting in a roughly 20% increase in the number of pixels.

Furthermore, for methods using compression standards such as AVC and HEVC, it is necessary to convert the 4:4:4 RGB sub-aperture image into a 4:2:0 YUV image. Although downsampling the U and V components reduces the redundancy introduced by demosaicing and scaling, rounding effects during color conversion may introduce further distortion.

Disclosure of Invention

The present invention aims to address these and other problems by providing a method and apparatus for encoding and/or decoding a digital image provided by a light field camera.

The basic idea of the invention is to produce a new, compact light field image data representation that avoids the redundancy due to demosaicing and scaling. The new representation can be efficiently compressed using Graph Signal Processing (GSP) techniques. Conversely, in the decoding stage, the inverse GSP technique is performed.

In more detail, in the encoding phase, in order to place the estimated center position of each macropixel at an integer pixel position, the pixels of the original light field image are spatially shifted into a new, transformed multicolor image having more columns and more rows than the received original image. This shift may introduce virtual pixels, i.e., pixel positions having undefined values. A series of sub-views is then obtained, and a bitstream (f_d^) is generated by encoding a graph representation of the sub-view images.

On the decoding side, the bitstream (f_d^) is graph-decoded with a procedure inverse to the GSP technique applied at the encoder side, and a reconstructed series of sub-views (f_d'') is obtained from the result of the graph decoding. This series of sub-views comprises the virtual pixels introduced at the encoding side for centering the macropixels on integer pixel positions. Then, a demosaicing filter is applied to the series of sub-views to obtain a demosaiced full-color lenslet image, from which a full-color sub-aperture image (f_d''') is obtained.

The method disclosed in the present invention can be applied directly in the initial color space (e.g., the RGB color space), without performing color conversion and rounding operations during encoding, since such operations typically introduce errors.

Drawings

The characteristics and other advantages of the invention will become apparent from the description of the embodiments shown in the accompanying drawings, which are given by way of non-limiting example only.

Fig. 1 shows an exemplary light field camera 100 and an exemplary plenoptic camera 150;

FIG. 2 shows an example of a lenslet image captured by plenoptic camera 150;

FIG. 3 shows an example of a sub-aperture image consisting of a set of sub-views;

FIG. 4 shows a block diagram of a light field image encoding-decoding system according to the prior art;

FIG. 5 illustrates a pre-processing operation for encoding a light field image according to the prior art;

FIG. 6 illustrates pre-processing operations for encoding a light field image according to an embodiment of the present invention;

FIG. 7 illustrates the function of a pre-processing stage for encoding a digital light field image according to an embodiment of the present invention;

FIG. 8 illustrates the function of a demosaicing stage for decoding a digital light field image in accordance with an embodiment of the present invention;

FIG. 9 shows a block diagram of a light field image encoding-decoding system according to an embodiment of the present invention;

FIG. 10 shows a block diagram of an apparatus for compressing a digital light field image or video stream according to an embodiment of the invention;

FIG. 11 shows a block diagram of an apparatus for decompressing a digital light-field image or video stream, according to an embodiment of the invention;

FIG. 12 shows two examples of color filter arrays;

FIG. 13 illustrates the process of generating a sub-view from a macropixel array, and vice versa;

fig. 14 shows an example of a green component of a generated sub-aperture image according to an embodiment of the present invention;

FIG. 15 illustrates an exemplary raster-scan rearrangement of the sub-views composing the generated sub-aperture image;

fig. 16 shows an example of a sub-view series subdivided according to the structure of a GOP;

FIG. 17 illustrates performance tests of an encoder/decoder pair implemented according to an embodiment of the invention; and

fig. 18 illustrates operations for decoding a light field image according to an embodiment of the present invention.

Detailed Description of Embodiments of the Invention

In this description, any reference to an "embodiment" is intended to indicate that a particular configuration, structure, or characteristic described in connection with the embodiment of the invention is included in at least one embodiment. Thus, the appearances of the phrase "in one embodiment" and other similar phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, any particular configuration, structure, or characteristic may be combined in any manner deemed suitable in any one or more embodiments.

Thus, the following references are used for simplicity only and do not limit the scope or extension of the various embodiments.

Referring to fig. 10, an apparatus 1005 for compressing a digital image or video stream (also referred to as an encoding device 1005) includes the following sections:

a processing unit 1010, similar to a Central Processing Unit (CPU), configured to execute a set of instructions for performing a method for encoding digital images or video streams according to the present invention (the method will be described in detail hereinafter in this specification);

a storage unit 1030, which may be of any type (volatile or non-volatile, removable or fixed) and based on any technology (e.g., electronic, semiconductor-based or opto-electronic), containing data relating to the image to be compressed and preferably the instructions implementing the method for encoding digital images or video streams according to the invention, wherein the image data are a representation of at least a part of said image, preferably in binary format, and wherein said storage unit 1030 may also contain the results of the execution of the instructions of said method;

an input unit 1070 (e.g., an I/O device), which may be configured by the CPU 1010 to read the light field image or video stream to be processed from the (video) source 1000; such an input unit may, for example, comprise an adapter according to at least one of the following standards: USB, Firewire, RS232, IEEE 1284, Ethernet, Wi-Fi, or the like;

a pre-processing unit 1020, configured to perform the stages of the method which generate a graph representation of a series of sub-views f_d' starting from an original lenslet image f produced by the light field image or video source 1000. In particular, this unit is configured to receive the original lenslet image from the source 1000 and then transform it so as to place the estimated center position of each macropixel (composing the original lenslet image) at an integer pixel position. Then, starting from the transformed original lenslet image, a sub-aperture image 670 is generated, and the sub-views composing the sub-aperture image are arranged in a series based on at least one predetermined order (e.g., a raster scan order, a spiral order, a zigzag order, a chess-like order, etc.). Importantly, the same order must also be used in the decoding device 1100. Finally, the series of sub-views is divided into groups of pictures (GOPs), and the graph representation of the series of sub-views f_d' is obtained;

a graph encoding unit 1040, configured to retrieve the series of sub-views f_d' from the storage unit 1030, to perform the stages of the method for encoding digital images or video streams according to a Graph Signal Processing (GSP) technique, such as the Graph Fourier Transform (GFT) or the graph-based lifting transform (GLT), which encode each GOP separately, and to store the resulting bitstream f_d^ back into the storage unit 1030;

an output unit 1080 (e.g., a network or storage adapter), which may be configured by the CPU 1010 to transmit the processing results over a communication channel to a destination 1095 (e.g., a storage medium, a remote client, etc.); the output unit may, for example, comprise an adapter according to at least one of the following standards: Ethernet, SATA, SCSI, etc.;

a communication bus 1090, which allows information to be exchanged between the CPU 1010, the pre-processing unit 1020, the storage unit 1030, the graph encoding unit 1040, the input unit 1070 and the output unit 1080. Instead of using the communication bus 1090, the CPU 1010, the pre-processing unit 1020, the storage unit 1030, the graph encoding unit 1040, the input unit 1070 and the output unit 1080 may be connected in a star topology.

A video source 1000, which may be a provider of real-time images (e.g., a light field camera), or may be a provider of stored content (e.g., a disk or other storage and memory device).

A Central Processing Unit (CPU) 1010 is responsible for activating the proper sequence of operations performed by the units 1020 and 1040 in the encoding process carried out by the device 1005.

These units may be implemented by dedicated hardware components (e.g., CPLDs, FPGAs, etc.), or may be implemented by one or more sets of instructions executed by CPU 1010. In the latter case, the units 1020, 1040 are merely logical (virtual) units.

When the apparatus 1005 is in an operating state, the CPU 1010 first acquires a light field image f from the video source 1000 and loads it into the storage unit 1030.

Next, the CPU 1010 activates the pre-processing unit 1020, which acquires the original lenslet image f from the storage unit 1030, performs the stages of the method for pre-processing the original lenslet image f according to an embodiment of the present invention (see fig. 7), and stores the graph representation of the resulting series of sub-views f_d' back into the storage unit 1030.

Next, the CPU 1010 activates the graph encoding unit 1040, which acquires the series of sub-views f_d' from the storage unit 1030, performs the stages of the method for encoding said series of sub-views f_d' according to a Graph Signal Processing (GSP) technique, such as the Graph Fourier Transform (GFT) or the graph-based lifting transform (GLT), and stores the resulting bitstream f_d^ back into the storage unit 1030.

At this point, the CPU 1010 may discard the data in the storage unit 1030 that are no longer needed at the encoder 1005.

Finally, the CPU 1010 acquires the bitstream f_d^ from the storage unit 1030 and puts it on the channel or saves it to the storage medium 1095.

Referring also to fig. 11, an apparatus 1100 for decompressing a digital image or video stream (also referred to as a decoding apparatus 1100) includes the following:

a processing unit 1110, similar to a Central Processing Unit (CPU), configured to execute a set of instructions for performing a method for decoding a digital image or video stream according to the present invention (said method will be described in detail hereinafter in this specification);

a storage unit 1140, which may be of any type (volatile or non-volatile, removable or fixed) and based on any technology (e.g., electronic, semiconductor-based or opto-electronic), containing data relating to the received image to be decompressed and preferably the instructions implementing the method for decompressing digital images or video streams according to the invention, wherein the image data are a representation of at least a part of said image, preferably in binary format, and wherein said storage unit 1140 may also contain the results of the execution of the instructions of said method;

an input unit 1180 (e.g., a network or storage adapter) that may be configured by the CPU 1110 to read an encoded image or video stream from a communication channel or storage medium 1095; the input unit 1180 may, for example, include an adapter according to at least one of the following standards: ethernet, SATA, SCSI, etc.;

a graph decoding unit 1120, configured to perform the stages of the method for decompressing a digital light field image or video stream according to the invention; in particular, this unit is configured to receive and decode the bitstream f_d^ of a series of sub-views according to a predetermined Graph Signal Processing (GSP) technique, thereby recovering the reconstructed series of sub-views f_d'';

a demosaicing unit 1150, configured to perform the stages of the method, according to an embodiment of the present invention, which receives the reconstructed series of sub-views f_d'' and generates a full-color sub-aperture image f_d''' (see figs. 8 and 18). In particular, starting from the reconstructed series of sub-views f_d'', a reconstructed sub-aperture image is obtained by rearranging the series of reconstructed sub-views according to at least one predetermined order, and a reconstructed lenslet image is then generated by positioning the pixels of each sub-view into the corresponding macropixels 650. In turn, a demosaicing technique is applied to the reconstructed lenslet image to generate a full-color lenslet image by RGB color interpolation. Finally, the full-color sub-aperture image f_d''' is generated from the converted full-color lenslet image, so that each sub-view composing said full-color sub-aperture image f_d''' is constructed by considering the same relative pixel position with respect to the center of each macropixel;

a post-processing unit 1130, configured to receive the full-color sub-aperture image f_d''' and to generate a reconstructed light field image f_d~, in particular by using techniques enabling operations such as image refocusing, noise reduction, 3D view construction and depth of field modification;

an output video unit 1170, such as a video adapter, which may be configured by the CPU 1110 to render and/or output the processed (decoded or decompressed) light field image or video stream, preferably on the display 1195; the output video unit may for example comprise an adapter according to at least one of the following standards: VGA, S-video, HDMI, Ethernet, etc.;

a communication bus 1190, which allows information to be exchanged between the CPU 1110, the graph decoding unit 1120, the demosaicing unit 1150, the storage unit 1140, the output video unit 1170 and the input unit 1180. Instead of using the communication bus 1190, the CPU 1110, the graph decoding unit 1120, the demosaicing unit 1150, the storage unit 1140, the output video unit 1170 and the input unit 1180 may be connected in a star topology.

As with the encoding device 1005 described previously, the CPU 1110 of the decoding device 1100 is responsible for activating the proper sequence of operations performed by the units 1120, 1130 and 1150.

These units may be implemented by dedicated hardware components (e.g., CPLDs, FPGAs, etc.), or may be implemented by one or more sets of instructions stored in a memory unit that are executed by CPU 1110; in the latter case, units 1120, 1130, and 1150 are merely logical (virtual) units.

When the device 1100 is in an operating state, the CPU 1110 first acquires the bitstream f_d^ from the channel or storage medium 1095 via the input unit 1180 and loads it into the storage unit 1140.

Then, the CPU 1110 activates the graph decoding unit 1120, which acquires the bitstream f_d^ from the storage unit 1140, performs the stages of the method for decoding the bitstream f_d^ of a series of sub-views according to a predetermined Graph Signal Processing (GSP) technique, such as the Graph Fourier Transform (GFT) or the graph-based lifting transform (GLT), outputs a reconstructed series of sub-views f_d'', and loads it into the storage unit 1140.

Any GSP technique may be used in accordance with the present invention; what matters is that the same technique be used in both the encoding device 1005 and the decoding device 1100, to ensure correct reconstruction of the initial light field image.

Next, the CPU 1110 activates the demosaicing unit 1150, which retrieves the reconstructed series of sub-views f_d'' from the storage unit 1140, executes the stages of the method for generating the full-color sub-aperture image f_d''' according to the present invention, and loads the result into the storage unit 1140.

Then, the CPU 1110 activates the post-processing unit 1130, which acquires the full-color sub-aperture image f_d''' from the storage unit 1140, generates the reconstructed light field image f_d~, and stores it into the storage unit 1140.

At this point, the CPU 1110 may discard the data in the storage unit that are no longer needed on the decoder side.

Finally, the CPU 1110 acquires the recovered light field image f_d~ from the storage unit 1140 and sends it, via the video adapter 1170, to the display unit 1195.

It should be noted that the encoding and decoding devices depicted in the figures may be controlled by their respective CPUs to operate internally in a pipelined manner, so that the overall time required to process each image can be reduced, e.g., by executing more instructions simultaneously (for instance, using multiple CPUs and/or CPU cores).

It should also be noted that many other operations may be performed on the output data of the encoding device 1005 before it is transmitted over a channel or stored on a storage unit, such as modulation and channel coding (i.e., error protection).

Conversely, the same inverse operations, e.g., demodulation and error correction, may be performed on the input data of the decoding device 1100 before it is effectively processed. Those operations are not relevant to embodying the present invention and are therefore omitted.

In addition, the block diagrams shown in fig. 10 and 11 are merely exemplary. They allow an understanding of how the invention works and how it may be carried out by those skilled in the art.

Those skilled in the art will understand that these diagrams are not meant to be limiting in the sense that the functions, interrelationships, and signals shown therein can be arranged in many equivalent ways. For example, operations that appear to be performed by different logical blocks may be performed by any combination of hardware and software resources, or may be the same resources used to implement different or all blocks.

The encoding process and the decoding process will now be described in detail.

Encoding

To illustrate how the encoding process takes place, it is assumed that the image f (or a block thereof) to be processed is preferably a color-patterned original lenslet image, in which each pixel is encoded with 8 bits, so that the value of a pixel can be represented by an integer between 0 and 255. Of course, this is merely an example: the invention can process images with higher color depth (e.g., 16, 24, 30, 36 or 48 bits) without loss of generality.

The image f may be obtained by applying a color filter array over a square grid of photosensors (e.g., CCD sensors); a well-known color filter array is, for example, the Bayer filter, which is used in most single-chip digital image sensors.

Fig. 12 shows some examples of color filter arrays, where the letters R, G, B denote red, green, and blue color filters, respectively, applied on a grid of photosensors.

Referring also to fig. 9, it is now described how the different parts of the encoding device 900 interact to compress a digital light field image or video stream.

Referring also to fig. 6 and 7, the preprocessing unit 920 preferably includes the steps of:

an original lenslet image receiving step 705, for receiving a color-patterned original lenslet image f (see fig. 6(a)) generated by the photosensor 410; the original image is not demosaiced before the subsequent operations;

a transformation step 710, for translating, rotating and scaling at least a part of said original lenslet image f obtained in the previous step; in particular, the pixels of the original lenslet image f are spatially shifted (which may include scaling and/or translation and/or rotation) into a new multicolor image having more rows and more columns than the received original lenslet image, wherein virtual pixels (e.g., 610, 620) are inserted at pixel positions having undefined color channel values, and wherein said shifting is performed so as to place the estimated center position of each macropixel 650 (indicated by the dashed lines in fig. 6(b)) at an integer pixel position. These operations may be performed, for example, as described by D. G. Dansereau, O. Pizarro and S. B. Williams in "Decoding, calibration and rectification for lenselet-based plenoptic cameras" (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013);

-preferably: a sub-aperture generation step 715, for generating a sub-aperture image 670 from the transformed original lenslet image, such that each sub-view composing said sub-aperture image is constituted by considering the same relative pixel position with respect to the center of each macropixel. FIG. 13 illustrates the sub-view generation process: the pixels 1360 and 1370 composing the sub-view 1310 are taken from the macropixels 1320 and 1340, respectively, so that they have the same relative positions with respect to the corresponding macropixel centers 1330 and 1350 (a sketch of this extraction follows this list). Fig. 14(a) shows an example of the green component of a generated sub-aperture image 670, and fig. 14(b) is a detail of fig. 14(a). It should be noted that, due to the presence of the virtual pixels (e.g., 610, 620), the pixels carrying image information (e.g., 630, 640) are irregularly spaced over the generated sub-aperture image 670 (see fig. 6(c) and 14(b)); therefore, such an image cannot be encoded by the conventional transform coding adopted, for example, by the HEVC standard, such as the Discrete Cosine Transform (DCT) or the Discrete Wavelet Transform (DWT). The irregular spacing is due to the absence of the demosaicing operation used in the prior art, and it renders conventional encoding methods ineffective. However, in order to encode the generated sub-aperture image, its graph representation and the associated graph-based compression techniques can be used instead. In fact, the irregular spacing of the pixels can create discontinuities (i.e., high-frequency components) in the image, which makes conventional encoding techniques inefficient relative to GSP techniques; in addition, an image with fewer pixels makes the graph representation simpler and more efficient to compress;

-preferably: a sub-aperture rearrangement step 720, for rearranging the sub-views composing the generated sub-aperture image 670 into a series, based on at least one predetermined order. FIG. 15 illustrates an exemplary raster-scan rearrangement of the sub-views 1510 composing, for example, a generated sub-aperture image 1530, obtaining the series of sub-views f_d' in horizontal scan order;

a sub-view graph generation step 725, for generating a graph representation of the series of sub-views, in order to perform graph-based compression, i.e., any GSP technique, on the series of sub-views.
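A minimal sketch of steps 710 and 715 for a single color plane is given below, assuming the macropixel centers have already been estimated and shifted to integer positions; NaN stands in for the virtual pixels with undefined values, and all names are illustrative rather than taken from the patent:

```python
import numpy as np

def extract_subview(lenslet: np.ndarray, centers: np.ndarray,
                    du: int, dv: int) -> np.ndarray:
    """Extract one sub-view from a transformed lenslet image.

    lenslet : (H, W) single color plane; NaN marks virtual pixels.
    centers : (M, N, 2) integer (row, col) macropixel centers.
    (du, dv): angular offset of this sub-view w.r.t. each center.
    """
    M, N = centers.shape[:2]
    sub = np.full((M, N), np.nan)            # NaN = undefined (virtual) pixel
    for m in range(M):
        for n in range(N):
            r, c = centers[m, n]
            rr, cc = r + du, c + dv
            if 0 <= rr < lenslet.shape[0] and 0 <= cc < lenslet.shape[1]:
                sub[m, n] = lenslet[rr, cc]
    return sub
```

Because the color pattern is never demosaiced, each sub-view produced this way still contains undefined entries, which is exactly the irregular spacing visible in fig. 6(c) and 14(b).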

Two different graph connection schemes may be considered.

The first scheme considers only intra-view connections when constructing the graph: each node is connected to a predetermined number K of nearest nodes in terms of Euclidean distance, i.e., the distance between the available irregularly spaced pixels (e.g., 630, 640) within the same sub-view.
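A brute-force sketch of this intra-view construction follows; the K-nearest-neighbor rule uses the spatial Euclidean distance, while the weights use a Gaussian function of the intensity difference, as described further below. K, sigma and the symmetrization are illustrative choices:

```python
import numpy as np

def knn_intra_view_graph(coords: np.ndarray, values: np.ndarray,
                         k: int, sigma: float) -> np.ndarray:
    """Build a K-nearest-neighbor graph over irregularly spaced pixels.

    coords : (P, 2) positions of the defined (non-virtual) pixels.
    values : (P,)  their intensities.
    Returns a (P, P) symmetric weight matrix with Gaussian weights.
    """
    P = coords.shape[0]
    W = np.zeros((P, P))
    # pairwise Euclidean distances between pixel positions
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    for i in range(P):
        for j in np.argsort(d[i])[1:k + 1]:        # skip the node itself
            w = np.exp(-(values[i] - values[j]) ** 2 / (2 * sigma ** 2))
            W[i, j] = W[j, i] = w                  # keep the graph undirected
    return W
```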

The second scheme takes into account both intra-view and inter-view dependencies between the sub-views of the series.

To reduce the graph complexity, the sub-view series is divided into a plurality of GOPs, each composed of a predetermined number G of sub-views.

FIG. 16 shows an example of a series of sub-views f_d' subdivided according to a GOP structure 1610, each GOP consisting of four sub-views, including a reference sub-view 1620.

Next, sub-view matching is performed sequentially, for motion estimation between each sub-view and the previous reference sub-view.

An optimal global motion vector can be determined for each sub-view based on the Sum of Squared Errors (SSE), evaluated over the pixel samples of the sub-view and of the previous reference sub-view.

The matching considers the entire sub-view, rather than the block-based matching employed by motion estimation in, for example, HEVC.

Specifically, prior to the motion search, each m × n sub-view is first extrapolated to a size of (m + 2r) × (n + 2r), where r is the motion search width.

This reduces the burden of encoding the motion vectors. Sub-view extrapolation may be performed with several techniques, for example by replicating the boundary pixel samples of each sub-view.

After motion estimation, each pixel is connected to a predetermined number P of nearest neighbors, in terms of Euclidean distance, within the same sub-view and within the reference sub-view shifted by the best motion vector.
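The following sketch shows this whole-sub-view motion search, assuming the reference has already been extrapolated by r pixels on each side (e.g., by boundary replication) and that virtual pixels are marked as NaN; it is an illustrative rendering, not the patent's exact procedure:

```python
import numpy as np

def best_global_motion_vector(sub: np.ndarray, ref_ext: np.ndarray, r: int):
    """Return the (dy, dx) in [-r, r]^2 minimising the SSE between the
    m x n sub-view and the (m+2r) x (n+2r) extrapolated reference."""
    m, n = sub.shape
    valid = ~np.isnan(sub)                   # ignore virtual pixels
    best_sse, best_mv = np.inf, (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            win = ref_ext[r + dy:r + dy + m, r + dx:r + dx + n]
            sse = np.nansum((sub[valid] - win[valid]) ** 2)
            if sse < best_sse:
                best_sse, best_mv = sse, (dy, dx)
    return best_mv
```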

Referring also to fig. 9, the graph encoding unit 940 (and likewise the decoding device 1100, for decoding) encodes the series of sub-views using a Graph Signal Processing (GSP) technique, such as the Graph Fourier Transform (GFT) or the graph-based lifting transform (GLT), encoding each GOP separately.

A graph G = (V, E) consists of a set of nodes v ∈ V connected by links. For each link e_{i,j} ∈ E connecting the nodes v_i and v_j, there exists a non-negative weight w_{i,j} ∈ [0, 1]; these weights capture the similarity between connected nodes.

The image f may be represented as a graph in which the pixels of the image correspond to the graph nodes and the weights of the links describe the pixel similarity; the weights may be evaluated with a predetermined non-linear function (e.g., a Gaussian or a Cauchy function) of the grayscale distance d_{i,j} = |f_i - f_j| between the i-th pixel f_i and the j-th pixel f_j of the image.

In the Graph Fourier Transform (GFT) technique, the graph information is represented by a weight matrix W, whose elements are the weights w_{i,j} of the graph. The corresponding Laplacian matrix is then obtained as L = D - W, where D is the diagonal degree matrix with elements d_{i,i} = Σ_j w_{i,j}. The GFT is expressed as f^ = U^T f, where U is the matrix whose columns are the eigenvectors of L, and f is the raster-scan vector representation of the image f.

The transform coefficients f^ and the weights w_{i,j} are then quantized and entropy coded. Related work known in the art describes improvements to GFT-based coding, as shown for example by W. Hu, G. Cheung, A. Ortega and O. C. Au in "Multiresolution graph Fourier transform for compression of piecewise smooth images" (published in IEEE Transactions on Image Processing).
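For concreteness, a compact sketch of the forward GFT as defined above, applied to a signal already vectorized in raster-scan order (quantization and entropy coding are omitted):

```python
import numpy as np

def gft(f: np.ndarray, W: np.ndarray):
    """Graph Fourier Transform of the raster-scan signal f.

    W is the symmetric weight matrix; L = D - W is the graph Laplacian.
    Returns the coefficients U^T f and the basis U needed for the inverse.
    """
    D = np.diag(W.sum(axis=1))        # degree matrix, d_ii = sum_j w_ij
    L = D - W
    # L is symmetric, so eigh yields real eigenvalues and an orthonormal U
    _, U = np.linalg.eigh(L)
    return U.T @ f, U

# Inverse: f_rec = U @ coeffs (exact before quantization)
```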

The graph-based lifting transform (GLT) technique is a multi-level filter bank that guarantees invertibility. At each level m, the graph nodes are first divided into two disjoint sets, namely a prediction set SP_m and an update set SU_m.

The values in SU_m are used to predict the values in SP_m; the resulting prediction errors are stored in SP_m and then used to update the values in SU_m.

The updated values in SU_m serve as the input signal for level m + 1, while the coefficient computation in SP_m uses only values from SU_m, and vice versa.

Repeating this process yields a multi-resolution decomposition. For video/image compression applications, the coefficients in the update set SU_M of the highest level M are quantized and entropy coded. Related work known in the art describes improvements to GLT-based coding, e.g. Y.-H. Chao, A. Ortega and S. Yea, "Graph-based lifting transform for intra-predicted video coding", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016).
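A single lifting level can be sketched as follows; this is a hedged illustration in which the mean predictor and the 0.5 update weight are arbitrary choices, not the filters used by the cited methods.

import numpy as np

def lifting_level(f, adj, SP, SU):
    """f: node signal; adj: {node: [neighbors]}; SP, SU: disjoint node sets.
    SP values are replaced by prediction errors computed from SU neighbors;
    SU values are then updated from those errors."""
    out = np.array(f, dtype=float)
    for p in SP:                                   # prediction step
        nb = [u for u in adj[p] if u in SU]
        if nb:
            out[p] = f[p] - np.mean([f[u] for u in nb])
    for u in SU:                                   # update step
        nb = [p for p in adj[u] if p in SP]
        if nb:
            out[u] = f[u] + 0.5 * np.mean([out[p] for p in nb])
    return out   # out[SU] feeds level m+1; out[SP] holds detail coefficients

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}       # 4-node path graph
print(lifting_level([10, 10, 20, 20], adj, SP={1, 3}, SU={0, 2}))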

Referring also to figs. 9 and 10, the digital image coding method according to the invention comprises, in summary, the following phases:

a receiving phase, in which at least a part of the color-patterned original micro-lens image f is received through the input unit 1070;

a transformation phase, in which the color-patterned original micro-lens image f is rotated, translated and scaled by unit 920, by means of the processing unit 1010, in order to place the estimated center position of each macro-pixel at an integer pixel position;

preferably: a sub-aperture generation phase, in which a sub-aperture image 670 is generated from the transformed original micro-lens image by unit 920, by means of the processing unit 1010, composing each sub-view from the pixels having the same relative position with respect to the center of each macro-pixel;

preferably: a sub-aperture rearranging phase, in which unit 920 generates a series of sub-views by rearranging, by means of the processing unit 1010, the sub-views composing the generated sub-aperture image on the basis of at least one predetermined order (e.g. a raster-scan order, as shown in fig. 15);

preferably: a sub-view graph generation phase, in which a graph representation of the series of sub-views is generated by unit 920, by means of the processing unit 1010, by organizing said series of sub-views according to a GOP structure and by taking into account the inter-view and intra-view correlations between the sub-views within the GOP structure, so that each node of said graph representation is connected to a predetermined number of nearest nodes, in terms of Euclidean distance, in the same sub-view and in the reference sub-view of the GOP structure;

a graph coding phase, in which the graph representation of the series of sub-views is coded by the processing unit 1010 according to a predetermined Graph Signal Processing (GSP) technique, generating a bitstream f_d^ by means of unit 940 (step 730).

Finally, the graph-coded bitstream f_d^ of the series of sub-views can be transmitted and/or stored by means of the output unit 1080.

Decoding

Referring to figs. 8 and 9, the decoder 960 comprises a graph decoding unit 970, a demosaicing unit 975 and a post-processing unit 980.

The graph decoding unit 970 is configured to receive and decode the bitstream f_d^ of the series of sub-views according to a predetermined Graph Signal Processing (GSP) technique, and outputs a reconstructed series of sub-views f_d'' (step 805).

The demosaicing unit 975 preferably performs the following steps:

a sub-aperture image reconstruction step 810, which receives the reconstructed series of sub-views f_d'' and generates a reconstructed sub-aperture image f_d''' by rearranging each reconstructed sub-view of the series on the basis of at least one predetermined order (see fig. 15);

a lenslet image reconstruction step 815, which receives the reconstructed sub-aperture image f_d''' and generates a reconstructed micro-lens image; in particular, the pixels of each sub-view 1310 are placed in the corresponding macro-pixels 1320, 1340 according to their order, as shown in fig. 13;

a demosaicing filter step 820, which receives the reconstructed micro-lens image and applies a demosaicing algorithm to generate a full-color micro-lens image by RGB color interpolation;

preferably: a sub-aperture image generation step 830, which receives the converted full-color micro-lens image and generates a full-color sub-aperture image f_d'''', such that each sub-view 1310 of the full-color sub-aperture image f_d'''' is constructed by considering the same relative pixel position with respect to each macro-pixel center 1330, 1350 (see fig. 13); this operation is performed on each color channel of the converted full-color micro-lens image, as sketched below.
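The pixel rearrangement underlying this step can be sketched in Python as follows; this is a simplified illustration assuming a regular s × s macro-pixel grid, whereas the patent works with estimated, integer-aligned macro-pixel centers.

import numpy as np

def extract_sub_view(lenslet, s, du, dv):
    """lenslet: (H, W) or (H, W, C) image tiled by s x s macro-pixels.
    (du, dv): offset from each macro-pixel center, |du|, |dv| <= s // 2.
    Returns the sub-view of that angular coordinate, taking one pixel
    per macro-pixel."""
    c = s // 2                      # center position inside a macro-pixel
    return lenslet[c + du :: s, c + dv :: s]

# Toy example: a 6x6 image tiled by 3x3 macro-pixels -> 2x2 sub-views
img = np.arange(36).reshape(6, 6)
print(extract_sub_view(img, s=3, du=0, dv=0))   # the central view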

The optional post-processing unit 980 is configured to receive the full-color sub-aperture image f_d'''' and to generate a reconstructed light-field image f̄_d by using the operations allowed on light-field images (e.g. refocusing, noise reduction, 3D view construction and scene depth modification).

Referring also to figs. 9 and 11, the method for decoding a digital image or video stream according to the invention comprises, in summary, the following phases:

a receiving phase, in which the bitstream f_d^ of the series of sub-views is received by means of the input unit 1180;

a graph decoding phase, in which the bitstream f_d^ is decoded by the processing unit 1110 according to a predetermined Graph Signal Processing (GSP) technique, outputting a reconstructed series of sub-views f_d'';

preferably: a sub-aperture image reconstruction phase, in which a reconstructed sub-aperture image f_d''' is generated by means of the processing unit 1110 by rearranging each reconstructed sub-view of the series on the basis of at least one predetermined order;

preferably: a micro-lens image reconstruction phase, in which the reconstructed micro-lens image is generated by means of the processing unit 1110, such that the pixels of each sub-view are placed in the corresponding macro-pixels according to the order used by the encoding method;

a demosaicing filtering phase, in which a full-color micro-lens image is generated by the processing unit 1110 by RGB color interpolation, applying a demosaicing technique;

preferably: a sub-aperture image generation phase, in which a full-color sub-aperture image f_d'''' is generated by the processing unit 1110, such that each sub-view of the full-color sub-aperture image f_d'''' is constructed by considering the same relative pixel position with respect to the center of each macro-pixel, for each color channel of the full-color micro-lens image;

preferably: a post-processing phase, in which a reconstructed light-field image f̄_d is generated by the processing unit 1110 by using the operations allowed on light-field images (e.g. refocusing starting from the full-color sub-aperture image f_d'''', noise reduction, 3D view construction and depth-of-field modification).

Finally, the reconstructed light-field image f̄_d can be output by the video output unit 1170 and shown on the display unit 1195.

Referring to figs. 15, 16 and 17, the results of the performance tests conducted by the inventors will now be discussed. In these tests, an encoder-decoder pair implemented according to embodiments of the present invention has been evaluated.

For the encoding-decoding tests, the EPFL light-field database was used (M. Rerabek and T. Ebrahimi, "New light field image dataset", 8th International Conference on Quality of Multimedia Experience (QoMEX), no. EPFL-CONF-218363, 2016).

The sub-aperture image contains 193 sub-views of size 432 × 624. Fig. 17 shows the performance of the method described in an embodiment of the invention, compared with a coding method based on the HEVC standard.

The vertical axis represents the average PSNR of the R, G and B color components. Compared with the prior-art scheme, a coding gain is achieved in the high-bit-rate region.

For the tests, the baseline HEVC-based scheme uses the all-intra and low-delay P configurations.

For the low-delay P configuration in HEVC, the sub-views are arranged in a pseudo-sequence in the same manner as shown in fig. 15 and divided into a plurality of GOPs of size 4 (fig. 16).

The first view in each GOP is compressed as an I-frame and the remaining frames are encoded as P-frames. For the proposed graph-based approach, each node is connected to its 6 nearest neighbors and the search width r = 2 is used for sub-view matching.

The transform coefficients are uniformly quantized and entropy coded using the alphabet and group partitioning (AGP) scheme proposed by A. Said and W. A. Pearlman in "A new, fast, and efficient image codec based on set partitioning in hierarchical trees" (IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, 1996). To evaluate them, the lenslet images reconstructed with the graph-based coding were demosaiced and converted to color sub-aperture images in the same manner as proposed by D. G. Dansereau, O. Pizarro and S. B. Williams in "Decoding, calibration and rectification for lenselet-based plenoptic cameras" (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013).

In the baseline method, the reconstructed YUV 4:2:0 sequence is converted to RGB 4:4:4, where the upsampling of the U and V components is based on nearest-neighbor interpolation.
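This baseline conversion can be sketched as follows; a hedged Python example in which BT.601 full-range conversion coefficients are assumed, since the exact matrix is not stated here.

import numpy as np

def yuv420_to_rgb444(Y, U, V):
    """Y: (H, W); U, V: (H/2, W/2). Nearest-neighbor upsampling of U and V
    (sample repetition), then conversion to an (H, W, 3) RGB image."""
    Uf = np.repeat(np.repeat(U, 2, axis=0), 2, axis=1)
    Vf = np.repeat(np.repeat(V, 2, axis=0), 2, axis=1)
    Yf, Uc, Vc = Y.astype(float), Uf - 128.0, Vf - 128.0
    R = Yf + 1.402 * Vc
    G = Yf - 0.344136 * Uc - 0.714136 * Vc
    B = Yf + 1.772 * Uc
    return np.clip(np.stack([R, G, B], axis=-1), 0.0, 255.0)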

Finally, the results obtained show that the approach described in the present invention can outperform prior-art schemes such as HEVC-based approaches.

In alternative embodiments of the invention, the patterned raw micro-lens image f may be generated by employing color filter arrays other than the well-known Bayer filter, placed on a square grid of photosensors.

In another embodiment of the invention, the patterned raw microlens image f may be generated by capturing other combinations of color components (e.g., RGBY (red, green, blue, yellow) instead of RGB).

In other embodiments, the invention is integrated in video coding techniques, wherein the temporal correlation between different light-field images is also taken into account.

For this reason, a prediction mechanism similar to that used in conventional video compression standards may be used in conjunction with the present invention to efficiently compress and decompress video signals.

In other embodiments, instead of the Graph Fourier Transform (GFT) or the graph-based lifting transform (GLT), other Graph Signal Processing (GSP) techniques may be employed to perform the encoding and decoding stages described in this disclosure.

In other embodiments, the Graph Signal Processing (GSP) technique employed during the encoding and decoding stages may be signaled from the encoder device to the decoder device. Alternatively, the GSP technique employed by both the encoder and the decoder is defined in a technical standard.

The present description has addressed some possible variants, but it will be apparent to a person skilled in the art that other embodiments may be implemented, wherein some elements may be replaced by other technically equivalent elements. The invention is thus not limited to the illustrative examples described herein, but may be subject to many modifications, improvements or substitutions of equivalent parts and elements without departing from the basic inventive concept, as set out in the following claims.
