Method and apparatus for processing image signal

Document No.: 1048037  Publication date: 2020-10-09

Note: This technology, "Method and apparatus for processing image signal" (用于处理图像信号的方法及设备), was created by 具文模, M. Salehifar, 金昇焕, and 林宰显 on 2019-09-02. Summary: Embodiments of the present invention provide a method and apparatus for processing a video signal. The image signal decoding method according to an embodiment of the present invention includes the steps of: determining an input length and an output length of a non-split transform based on a height and a width of a current block; determining a non-split transform matrix corresponding to the input length and the output length of the non-split transform; and applying the non-split transform matrix to the current block, wherein when each of the height and the width of the current block is 4, the input length and the output length of the non-split transform are determined to be 8 and 16, respectively.

1. A method for decoding an image signal, the method comprising the steps of:

determining an input length and an output length of a non-split transform based on a height and a width of a current block;

determining a non-split transform matrix corresponding to the input length and the output length of the non-split transform; and

applying the non-split transform matrix to a number of coefficients in the current block, the number depending on the input length,

wherein if each of the height and width of the current block is equal to 4, the input length of the non-split transform is determined to be 8, and the output length of the non-split transform is determined to be 16.

2. The method of claim 1,

wherein the input length and the output length of the non-split transform are determined to be 16 if each of the height and the width of the current block is not equal to 8.

3. The method of claim 2,

wherein the step of applying the non-split transform matrix comprises: applying the non-split transform matrix to a 4 x 4 region on an upper left side of the current block if each of a height and a width of the current block is not equal to 4 and a product of the width and the height is less than a threshold.

4. The method of claim 2,

wherein the step of applying the non-split transform matrix comprises: applying the non-split transform matrix to a 4 x 4 region on an upper left side of the current block and a 4 x 4 region located on a right side of the 4 x 4 region on the upper left side if each of the height and the width of the current block is not equal to 4 and the width is greater than or equal to the height.

5. The method of claim 2,

wherein the step of applying the non-split transform matrix comprises: applying the non-split transform matrix to a 4 x 4 region on an upper left side of the current block and a 4 x 4 region located at a bottom side of the 4 x 4 region on the upper left side if each of a height and a width of the current block is not equal to 4, a product of the width and the height is greater than or equal to a threshold, and the width is less than the height.

6. The method of claim 1,

wherein the step of determining the non-split transform matrix comprises the steps of:

determining a non-split transform set index based on an intra prediction mode of the current block;

determining a non-split transform kernel corresponding to a non-split transform index within a non-split transform set indicated by the non-split transform set index; and

determining the non-split transform matrix from the non-split transform kernel based on the input length and the output length.

7. An apparatus for decoding an image signal, the apparatus comprising:

a memory configured to store the image signal; and

a processor coupled to the memory,

wherein the processor is configured to:

determining an input length and an output length of a non-split transform based on a height and a width of a current block;

determining a non-split transform matrix corresponding to the input length and the output length of the non-split transform; and

applying the non-split transform matrix to a number of coefficients in the current block, the number depending on the input length,

wherein if each of the height and width of the current block is equal to 4, the input length of the non-split transform is determined to be 8, and the output length of the non-split transform is determined to be 16.

8. The apparatus of claim 7,

wherein the input length and the output length of the non-split transform are determined to be 16 if each of the height and the width of the current block is not equal to 8.

9. The apparatus of claim 8,

wherein the processor is configured to apply the non-split transform matrix to a 4 x 4 region on an upper left side of the current block if each of a height and a width of the current block is not equal to 4 and a product of the width and the height is less than a threshold.

10. The apparatus of claim 8,

wherein the processor is configured to apply the non-split transform matrix to a 4 x 4 region on an upper left side of the current block and a 4 x 4 region located on a right side of the 4 x 4 region on the upper left side if each of the height and the width of the current block is not equal to 4 and the width is greater than or equal to the height.

11. The apparatus of claim 8,

wherein the processor is configured to apply the non-split transform matrix to a 4 x 4 region on an upper left side of the current block and a 4 x 4 region located at a bottom side of the 4 x 4 region on the upper left side if each of a height and a width of the current block is not equal to 4, a product of the width and the height is greater than or equal to a threshold, and the width is less than the height.

12. The apparatus of claim 7,

wherein the processor is configured to:

determining a non-split transform set index based on an intra prediction mode of the current block;

determining a non-split transform kernel corresponding to a non-split transform index within a non-split transform set indicated by the non-split transform set index; and

determining the non-split transform matrix from the non-split transform kernel based on the input length and the output length.

Technical Field

The present invention relates to a method and apparatus for processing an image signal, and more particularly, to a method and apparatus for encoding or decoding an image signal by performing a transform.

Background

Compression coding refers to a signal processing technique for transmitting digitized information through a communication line or storing the digitized information in a suitable form in a storage medium. Media such as video, image, and audio may be objects of compression encoding, and in particular, a technique of performing compression encoding on an image is called video image compression.

Next-generation video content will feature scene representations with high dimensionality, high spatial resolution, and high frame rates. Handling such content will require significant increases in memory storage, memory access rate, and processing power.

Therefore, there is a need to design an encoding tool that more efficiently processes the next generation of video content. In particular, video codec standards following the High Efficiency Video Coding (HEVC) standard require efficient transform techniques for transforming spatial domain video signals into frequency domain signals and prediction techniques with higher accuracy.

Disclosure of Invention

Technical problem

Embodiments of the present disclosure provide an image signal processing method and apparatus using a transform with high coding efficiency and low complexity.

Technical problems solved by the present disclosure are not limited to the above technical problems, and other technical problems not described herein will become apparent to those skilled in the art from the following description.

Technical scheme

According to an embodiment of the present disclosure, a method for decoding an image signal includes: determining an input length and an output length of a non-split transform based on a height and a width of a current block; determining a non-split transform matrix corresponding to the input length and the output length of the non-split transform; and applying the non-split transform matrix to a number of coefficients in the current block that depends on the input length, wherein if each of the height and the width of the current block is equal to 4, the input length of the non-split transform is determined to be 8, and the output length of the non-split transform is determined to be 16.

Further, if each of the height and the width of the current block is not equal to 8, the input length and the output length of the non-split transform are determined to be 16.
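
As a minimal sketch of the two length rules above (the function name and the fallback branch for the remaining block sizes are illustrative assumptions, not stated in the text):

```python
def nonsep_transform_lengths(width, height):
    """Sketch of the stated length rules for the non-split transform.

    A 4x4 block uses a reduced input length of 8 with output length 16;
    blocks whose sides are both different from 8 use 16 for both lengths.
    The final fallback is an assumption for the cases the text leaves open.
    """
    if width == 4 and height == 4:
        return 8, 16
    if width != 8 and height != 8:
        return 16, 16
    return 16, 16  # assumed default for blocks with a side equal to 8
```

For example, a 4x4 block yields lengths (8, 16), while a 16x16 block yields (16, 16).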

Further, applying the non-separable transformation matrix includes: if each of the height and the width of the current block is not equal to 4 and the product of the width and the height is less than the threshold, a non-split transform matrix is applied to a 4 x 4 region of the upper left side of the current block.

Further, applying the non-separable transformation matrix includes: if each of the height and the width of the current block is not equal to 4 and the width is greater than or equal to the height, a non-split transform matrix is applied to a 4 x 4 region of the upper left side of the current block and a 4 x 4 region located at the right side of the 4 x 4 region of the upper left side.

Further, applying the non-separable transformation matrix includes: if each of the height and the width of the current block is not equal to 4, the product of the width and the height is greater than or equal to a threshold, and the width is less than the height, a non-split transform matrix is applied to a 4 x 4 region on the upper left side of the current block and a 4 x 4 region on the bottom side of the 4 x 4 region on the upper left side.

Further, determining the non-split transform matrix comprises: determining a non-split transform set index based on an intra prediction mode of the current block; determining a non-split transform kernel corresponding to a non-split transform index within a non-split transform set indicated by the non-split transform set index; and determining the non-split transform matrix from the non-split transform kernel based on the input length and the output length.
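
The three-step matrix selection can be sketched as below; the mode-to-set mapping and the `kernel_sets` layout are placeholders, not the standardized tables:

```python
def select_nonsep_matrix(intra_mode, nsst_index, in_len, out_len, kernel_sets):
    """Illustrative selection of a non-split transform matrix.

    kernel_sets: dict mapping a transform-set index to a list of
    full-size kernels (rows x columns, e.g. 16 x 64 nested lists).
    The modulo mapping from intra mode to set index is a placeholder.
    """
    set_index = intra_mode % len(kernel_sets)    # placeholder mode-to-set map
    kernel = kernel_sets[set_index][nsst_index]  # kernel picked within the set
    # Trim the kernel to the required output (rows) and input (columns) lengths.
    return [row[:in_len] for row in kernel[:out_len]]
```

With four placeholder sets of 16x64 kernels, requesting input length 8 and output length 16 yields a 16x8 matrix.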

According to another embodiment of the present disclosure, an apparatus for decoding an image signal includes: a memory configured to store the image signal; and a processor coupled to the memory, wherein the processor is configured to: determine an input length and an output length of a non-split transform based on a height and a width of a current block; determine a non-split transform matrix corresponding to the input length and the output length of the non-split transform; and apply the non-split transform matrix to a number of coefficients in the current block that depends on the input length, wherein if each of the height and the width of the current block is equal to 4, the input length of the non-split transform is determined to be 8, and the output length of the non-split transform is determined to be 16.

Technical effects

According to an embodiment of the present disclosure, a video encoding method and apparatus having high encoding efficiency and low complexity may be provided by applying a transform based on a size of a current block.

Effects of the present disclosure are not limited to the above-described effects, and other effects not described herein will become apparent to those skilled in the art from the following description.

Drawings

A more complete understanding of the present disclosure and many of the attendant aspects thereof will be more readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

fig. 1 is a block diagram schematically illustrating an encoding apparatus for encoding a video/image signal according to an embodiment of the present disclosure.

Fig. 2 is a block diagram schematically illustrating a decoding apparatus that decodes an image signal according to an embodiment of the present disclosure.

Fig. 3 illustrates embodiments to which the present disclosure may be applied; figs. 3a, 3b, 3c, and 3d are diagrams illustrating block division structures by a Quadtree (QT), a Binary Tree (BT), a Ternary Tree (TT), and an Asymmetric Tree (AT), respectively, according to an embodiment of the present disclosure.

Fig. 4 is a block diagram schematically illustrating an encoding apparatus of fig. 1 including a transformation and quantization unit according to an embodiment of the present disclosure, and fig. 5 is a block diagram schematically illustrating a decoding apparatus including an inverse quantization and inverse transformation unit according to an embodiment of the present disclosure.

Fig. 6 is a flowchart illustrating an example of encoding a video signal via primary and secondary transforms according to an embodiment of the present disclosure.

Fig. 7 is a flowchart illustrating an example of decoding a video signal via a secondary inverse transform and a primary inverse transform according to an embodiment of the present disclosure.

Fig. 8 illustrates an exemplary transformation configuration set to which Adaptive Multiple Transformations (AMTs) are applied, according to an embodiment of the present disclosure.

Fig. 9 is a flowchart illustrating coding to which an AMT is applied according to an embodiment of the present disclosure.

Fig. 10 is a flowchart illustrating decoding to which an AMT is applied according to an embodiment of the present disclosure.

Fig. 11 is a flowchart illustrating an example of encoding an AMT flag and an AMT index according to an embodiment of the present disclosure.

Fig. 12 is a flowchart illustrating exemplary decoding for performing a transform based on AMT flags and AMT indices.

Fig. 13 is a diagram illustrating Givens rotation according to an embodiment of the present disclosure, and fig. 14 illustrates the configuration of one round in a 4 × 4 NSST consisting of permutations and Givens rotation layers according to an embodiment of the present disclosure.

Fig. 15 illustrates an exemplary configuration of a non-split transform set per intra prediction mode according to an embodiment of the present disclosure.

Fig. 16 illustrates three forward scan orders over transform coefficients or transform coefficient blocks, where (a) illustrates diagonal scanning, (b) illustrates horizontal scanning, and (c) illustrates vertical scanning.

Fig. 17 illustrates positions of transform coefficients in a case where a forward diagonal scan is applied when a 4 × 4 RST is applied to a 4 × 8 block according to an embodiment of the present disclosure, and fig. 18 illustrates an example of merging significant transform coefficients of two 4 × 4 blocks into a single block according to an embodiment of the present disclosure.

Fig. 19 illustrates an exemplary method of configuring a hybrid NSST set for each intra prediction mode according to an embodiment of the present disclosure.

Fig. 20 illustrates an exemplary method of selecting an NSST set (or kernel) in consideration of the size of a transform block and an intra prediction mode according to an embodiment of the present disclosure.

Fig. 21a and 21b illustrate forward and inverse downscaling transforms according to embodiments of the present disclosure.

Fig. 22 is a flowchart illustrating an example of decoding using a downscaling transform according to an embodiment of the present disclosure.

Fig. 23 is a flowchart illustrating an example of applying a conditional reduced transform according to an embodiment of the present disclosure.

Fig. 24 is a flowchart illustrating an example of decoding in which a conditional reduced transform is applied to a secondary inverse transform according to an embodiment of the present disclosure.

Fig. 25a, 25b, 26a, and 26b illustrate examples of a downscaling transform and a downscaling inverse transform according to an embodiment of the present disclosure.

Fig. 27 illustrates an exemplary region to which a reduced secondary transform is applied according to an embodiment of the present disclosure.

Fig. 28 illustrates a downscaling transform per downscaling factor according to an embodiment of the disclosure.

Fig. 29 is a flowchart illustrating an example of decoding to which a transform is applied according to an embodiment of the present disclosure.

Fig. 30 is a block diagram illustrating an apparatus for processing a video signal according to an embodiment of the present disclosure.

Fig. 31 illustrates an exemplary video encoding system according to an embodiment of the present disclosure.

Fig. 32 is a view illustrating a structure of a content streaming system according to an embodiment of the present disclosure.

Detailed Description

Some embodiments of the present disclosure are described in detail with reference to the accompanying drawings. The detailed description to be disclosed together with the accompanying drawings is intended to describe some embodiments of the present disclosure, and is not intended to describe the only embodiments of the present disclosure. The following detailed description includes further details to provide a thorough understanding of the present disclosure. However, it will be understood by those skilled in the art that the present disclosure may be practiced without such further details.

In some cases, known structures and devices are omitted or are shown in block diagram form centered on the core function of each structure and device, in order to avoid obscuring the concepts of the present disclosure.

Although most terms used in the present disclosure are selected from general terms widely used in the art, the applicant has arbitrarily selected some terms and explained their meanings in the following description in detail as necessary. Accordingly, the present disclosure should be understood with the intended meaning of the terms rather than the simple names or meanings thereof.

Specific terms used in the following description have been provided to aid in understanding the present disclosure, and the use of these specific terms may be changed in various forms without departing from the technical spirit of the present disclosure. For example, signals, data, samples, pictures, frames, blocks, etc. may be appropriately replaced and interpreted in each encoding process.

In the present specification, a "processing unit" refers to a unit in which encoding/decoding processing such as prediction, transformation, and/or quantization is performed. Further, the processing unit may be interpreted to mean including a unit for a luminance component and a unit for a chrominance component. For example, a processing unit may correspond to a block, a Coding Unit (CU), a Prediction Unit (PU), or a Transform Unit (TU).

In addition, the processing unit may be interpreted as a unit for a luminance component or a unit for a chrominance component. For example, the processing unit may correspond to a Coding Tree Block (CTB), a Coding Block (CB), a PU, or a Transform Block (TB) of the luma component. Further, the processing unit may correspond to a CTB, CB, PU, or TB of the chrominance component. Also, the processing unit is not limited thereto, and may be interpreted as meaning including a unit for a luminance component and a unit for a chrominance component.

In addition, the processing unit is not necessarily limited to a square block, and may be configured in a polygonal shape having three or more vertices.

As used herein, "pixels" and "coefficients" (e.g., transform coefficients or transform coefficients that have undergone a first transform) may be collectively referred to as samples. When using samples, this may mean, for example, using pixel values or coefficients (e.g., transform coefficients or transform coefficients that have undergone a first transform).

Hereinafter, a method of designing and applying a reduced secondary transform (RST) considering worst-case computational complexity is described with respect to encoding/decoding of a still image or video.

The embodiment of the disclosure provides a method and a device for compressing images and videos. The compressed data has the form of a bitstream, and the bitstream may be stored in various types of memories and may be streamed to a terminal equipped with a decoder via a network. If the terminal has a display device, the terminal may display the decoded image on the display device, or may simply store the bitstream data. The method and apparatus proposed according to the embodiments of the present disclosure are applicable to both an encoder and a decoder, or to both a bitstream generator and a bitstream receiver, regardless of whether a terminal outputs through a display device.

The image compression apparatus mainly includes a prediction unit, a transform and quantization unit, and an entropy coding unit. Figs. 1 and 2 are block diagrams schematically illustrating an encoding apparatus and a decoding apparatus, respectively. Among these components, the transform and quantization unit transforms a residual signal, which is obtained by subtracting a prediction signal from an original signal, into a frequency-domain signal via, for example, the type-2 Discrete Cosine Transform (DCT-2), and applies quantization to the frequency-domain signal, thereby significantly reducing the number of non-zero signals and achieving image compression.
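
As a toy illustration of the transform-and-quantize step (floating-point, not the codec's integer-arithmetic implementation), a separable 2-D DCT-2 followed by scalar quantization can be written as:

```python
import math

def dct2_matrix(n):
    # Orthonormal type-2 DCT basis: row k holds the k-th frequency.
    m = [[math.cos(math.pi * (2 * j + 1) * k / (2 * n)) * math.sqrt(2 / n)
          for j in range(n)] for k in range(n)]
    m[0] = [v / math.sqrt(2) for v in m[0]]  # DC row normalization
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transform_and_quantize(residual, qstep):
    """Separable 2-D DCT-2 (D * R * D^T) plus scalar quantization."""
    d = dct2_matrix(len(residual))
    dt = [list(row) for row in zip(*d)]          # transpose of the basis
    coeffs = matmul(matmul(d, residual), dt)
    return [[round(c / qstep) for c in row] for row in coeffs]
```

A flat (constant) residual block compacts into a single non-zero DC coefficient, which is exactly why the transform reduces the number of non-zero signals before entropy coding.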

Fig. 1 is a block diagram schematically illustrating an encoding apparatus for encoding a video/image signal according to an embodiment of the present disclosure.

The image divider 110 may divide an image (or a picture or a frame) input to the encoding apparatus 100 into one or more processing units. As an example, a processing unit may be referred to as a Coding Unit (CU). In this case, the coding unit may be recursively partitioned from the Coding Tree Unit (CTU) or the Largest Coding Unit (LCU) according to a quadtree plus binary tree (QTBT) structure. For example, one coding unit may be divided into coding units of deeper depth based on a quadtree structure and/or a binary tree structure. In this case, for example, the quadtree structure may be applied first, and the binary tree structure may then be applied. Alternatively, the binary tree structure may be applied first. The encoding process according to the embodiments of the present disclosure may be performed based on the final coding unit that is no longer divided. In this case, the largest coding unit may be used directly as the final coding unit based on, for example, the coding efficiency of each image property, or the coding unit may be recursively split into coding units of lower depth as necessary, and a coding unit of the optimal size may be used as the final coding unit. The encoding process may include, for example, prediction, transformation, or reconstruction as described below. As an example, the processing unit may further include a Prediction Unit (PU) or a Transform Unit (TU). In this case, the prediction unit and the transform unit may each be divided or partitioned from the final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and/or a unit for deriving a residual signal from the transform coefficients.
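
The recursive partitioning can be illustrated with a toy split policy (quad-split large squares, then binary-split the longer side). The policy itself is an assumption for illustration; a real encoder selects splits by rate-distortion search:

```python
def split_qtbt(w, h, min_size=4):
    """Toy quadtree-then-binary-tree recursion returning leaf CU sizes.

    Policy (assumed, illustrative only): quad-split squares larger than
    2*min_size, otherwise binary-split the longer side, stopping at min_size.
    """
    if max(w, h) <= min_size:
        return [(w, h)]                                    # final coding unit
    if w == h and w > 2 * min_size:
        return 4 * split_qtbt(w // 2, h // 2, min_size)    # quadtree split
    if w >= h:
        return 2 * split_qtbt(w // 2, h, min_size)         # vertical binary split
    return 2 * split_qtbt(w, h // 2, min_size)             # horizontal binary split
```

Under this policy a 16x16 CTU decomposes into sixteen 4x4 final coding units.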

In some cases, the term "unit" may be used interchangeably with "block" or "region". In general, an mxn block may represent a set of samples or transform coefficients composed of M columns and N rows. In general, a sample may represent a pixel or a pixel value, or may represent a pixel/pixel value of only a luminance component, or a pixel/pixel value of only a chrominance component. A sample may be used as a term corresponding to a pixel or a pixel of a picture (or image).

The encoding apparatus 100 may generate a residual signal (residual block or residual sample array) by subtracting a prediction signal (prediction block or prediction sample array) output from the inter predictor 180 or the intra predictor 185 from an input image signal (original block or original sample array), and the generated residual signal is transmitted to the transformer 120. In this case, as shown, a unit for subtracting a prediction signal (prediction block or prediction sample array) from an input image signal (original block or original sample array) in the encoder 100 may be referred to as a subtractor 115. The predictor may perform prediction on a target block for processing (hereinafter, a current block) and generate a prediction block including prediction samples of the current block. The predictor may determine whether to apply intra prediction or inter prediction in each block or CU unit. The predictor may generate various information for prediction, such as prediction mode information, as described below in connection with the respective prediction modes, and transmit the generated information to the entropy encoder 190. The prediction-related information may be encoded by an entropy encoder and output in the form of a bitstream.

The intra predictor 185 may predict the current block by referring to samples in the current picture. Depending on the prediction mode, the reference sample may be adjacent to the current block or located far away from the current block. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional mode may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes depending on the degree of fineness of the prediction direction. However, this is merely an example, and more or fewer directional prediction modes may be used. The intra predictor 185 may determine a prediction mode applied to the current block using the prediction modes applied to the neighboring blocks.

The inter predictor 180 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. Here, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted per block, per sub-block, or per sample based on the correlation in motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, or bi-prediction) information. In the case of inter prediction, the neighboring blocks may include spatial neighboring blocks existing in the current picture and temporal neighboring blocks existing in a reference picture. The reference picture including the reference block may be the same as or different from the reference picture including the temporal neighboring block. The temporal neighboring block may be referred to as a co-located reference block or a co-located CU (colCU), for example, and the reference picture including the temporal neighboring block may be referred to as a co-located picture (colPic). For example, the inter predictor 180 may construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and/or reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the skip mode or the merge mode, the inter predictor 180 may use motion information of a neighboring block as motion information of the current block. In the skip mode, unlike in the merge mode, no residual signal is transmitted.
In a Motion Vector Prediction (MVP) mode, motion vectors of neighboring blocks may be used as a motion vector predictor, and a motion vector difference may be signaled, indicating a motion vector of the current block.
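
The MVP-mode reconstruction of a motion vector reduces to predictor-plus-difference; candidate-list construction is omitted in this sketch:

```python
def mvp_decode(candidate_mvs, mvp_index, mvd):
    """MVP-mode sketch: the current block's motion vector is the signaled
    predictor candidate plus the signaled motion vector difference (MVD).
    Vectors are (x, y) tuples; the candidate list is assumed given."""
    mvp = candidate_mvs[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Because only the index into the candidate list and the (typically small) difference are signaled, this costs far fewer bits than transmitting the full motion vector.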

The prediction signal generated via the inter predictor 180 or the intra predictor 185 may be used to generate a reconstructed signal or a residual signal.

The transformer 120 may apply a transform scheme to the residual signal, generating transform coefficients. For example, the transform scheme may include at least one of a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a Karhunen-Loeve transform (KLT), a graph-based transform (GBT), or a conditional non-linear transform (CNT). GBT denotes a transform obtained from a graph in which information on the relationship between pixels is represented. CNT denotes the transform obtained based on generating a prediction signal using all previously reconstructed pixels. Furthermore, the transformation process may be applied to square blocks of pixels having the same size, or may also be applied to non-square variable-size blocks.

The quantizer 130 may quantize the transform coefficients and transmit the quantized transform coefficients to the entropy encoder 190, and the entropy encoder 190 may encode the quantized signal (information on the quantized transform coefficients) and output the encoded signal as a bitstream. The information on the quantized transform coefficients may be referred to as residual information. The quantizer 130 may reorder the block-shaped quantized transform coefficients into a one-dimensional vector based on a coefficient scan order, and may generate information on the quantized transform coefficients based on the one-dimensional quantized transform coefficients. The entropy encoder 190 may perform various encoding methods such as, for example, exponential Golomb coding, Context-Adaptive Variable Length Coding (CAVLC), or Context-Adaptive Binary Arithmetic Coding (CABAC). The entropy encoder 190 may encode values (e.g., syntax elements) of information needed to reconstruct the video/image together with or separately from the quantized transform coefficients. Encoded information (e.g., video/image information) may be transmitted or stored in the form of a bitstream on a per-Network-Abstraction-Layer (NAL) unit basis. The bitstream may be transmitted via a network or stored in a digital storage medium. The network may include, for example, a broadcast network and/or a communication network, and the digital storage medium may include, for example, USB, SD, CD, DVD, Blu-ray, HDD, SSD, or other various storage media. A transmitter (not shown) for transmitting the signal output from the entropy encoder 190 and/or a storage unit (not shown) for storing the signal output from the entropy encoder 190 may be configured as an internal/external element of the encoding apparatus 100, or the transmitter may be a component of the entropy encoder 190.
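
One plausible coefficient scan that flattens a block into a one-dimensional vector is an up-right diagonal order (the codec defines several scan orders; this particular traversal direction is an assumption for illustration):

```python
def diagonal_scan(block):
    """Reorder a 2-D coefficient block into a 1-D list along up-right
    diagonals: positions are visited in order of row+col, and within a
    diagonal from the bottom-left position upward."""
    h, w = len(block), len(block[0])
    positions = sorted(((r, c) for r in range(h) for c in range(w)),
                       key=lambda rc: (rc[0] + rc[1], -rc[0]))
    return [block[r][c] for r, c in positions]
```

For a 2x2 block [[1, 2], [3, 4]] this visits (0,0), (1,0), (0,1), (1,1), producing [1, 3, 2, 4].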

The quantized transform coefficients output from the quantizer 130 may be used to generate a prediction signal. For example, the residual signal may be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients via an inverse quantizer 140 and an inverse transformer 150 in a loop. The adder 155 may add the reconstructed residual signal to a prediction signal output from the inter predictor 180 or the intra predictor 185, thereby generating a reconstructed signal (a reconstructed picture, a reconstructed block, or a reconstructed sample array). As in the case of applying the skip mode, when there is no residual of the target block for processing, the prediction block may be used as a reconstruction block. The adder 155 may be represented as a reconstructor or a reconstruction block generator. The generated reconstructed signal may be used for intra prediction of a next target processing block in the current picture, and as described below, the generated reconstructed signal may be filtered and then used for inter prediction of a next picture.
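
The in-loop reconstruction path (dequantize, inverse-transform, add prediction) can be sketched as follows; the `inverse_transform` callable stands in for the inverse transformer 150 and is left abstract here:

```python
def reconstruct_block(pred, quant_coeffs, qstep, inverse_transform):
    """In-loop reconstruction sketch: scale quantized coefficients back
    (inverse quantization), inverse-transform them to a residual block,
    and add the prediction block sample by sample."""
    dequant = [[c * qstep for c in row] for row in quant_coeffs]
    residual = inverse_transform(dequant)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

With an identity "inverse transform", a prediction row [10, 10], quantized coefficients [1, -1], and qstep 2 reconstruct to [12, 8], mirroring the adder 155's role.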

The filter 160 may enhance subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and transmit the modified reconstructed picture to the decoded picture buffer 170. Various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, or bilateral filter. As described below in connection with the filtering methods, the filter 160 may generate various information for filtering and transfer the resulting information to the entropy encoder 190. The filtering-related information may be encoded by the entropy encoder 190 and may be output in the form of a bitstream.

The modified reconstructed picture sent to the decoded picture buffer 170 may be used as a reference picture in the inter predictor 180. The encoding apparatus 100 can avoid prediction mismatch between the encoding apparatus 100 and the decoding apparatus when inter prediction is applied, and improve encoding efficiency.

The decoded picture buffer 170 may store the modified reconstructed picture as a reference picture in the inter predictor 180.

Fig. 2 is a block diagram schematically illustrating a decoding apparatus that decodes an image signal according to an embodiment of the present disclosure.

Referring to fig. 2, the decoding apparatus 200 may include an entropy decoder 210, an inverse quantizer 220, an inverse transformer 230, an adder 235, a filter 240, a decoded picture buffer 250, an inter predictor 260, and an intra predictor 265. The inter predictor 260 and the intra predictor 265 may be collectively referred to as predictors. In other words, the predictor may include the inter predictor 260 and the intra predictor 265. The inverse quantizer 220 and the inverse transformer 230 may be collectively referred to as a residual processor. In other words, the residual processor may include the inverse quantizer 220 and the inverse transformer 230. According to an embodiment, the entropy decoder 210, the inverse quantizer 220, the inverse transformer 230, the adder 235, the filter 240, the inter predictor 260, and the intra predictor 265 may be configured in a single hardware component (e.g., a decoder or a processor). According to an embodiment, the decoded picture buffer 250 may be implemented as a single hardware component (e.g., a memory or a digital storage medium).

When a bitstream including video/image information is input, the decoding apparatus 200 may reconstruct an image corresponding to the video/image information processing in the encoding apparatus 100 of fig. 1. For example, the decoding apparatus 200 may perform decoding using a processing unit applied in the encoding apparatus 100. Accordingly, the processing unit for decoding may be, for example, a coding unit, and the coding unit may be partitioned from the coding tree unit or the maximum coding unit according to a quadtree structure and/or a binary tree structure. The reconstructed image signal decoded and output by the decoding apparatus 200 may be played via a player.

The decoding apparatus 200 may receive a signal in the form of a bitstream output from the encoding apparatus 100 of fig. 1 and may decode the received signal via the entropy decoder 210. For example, the entropy decoder 210 may parse the bitstream and extract information (e.g., video/image information) required for image reconstruction (or picture reconstruction). For example, the entropy decoder 210 may decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements required for image reconstruction and quantized values of transform coefficients with respect to a residual. Specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using decoding target syntax element information, decoding information on neighboring and decoding target blocks, or information on a symbol/bin decoded in a previous step, predict an occurrence probability of the bin according to the determined context model, and perform arithmetic decoding on the bin. At this time, after determining the context model, the CABAC entropy decoding method may update the context model using information on the decoded symbol/bin for the context model of the next symbol/bin. Among the pieces of information decoded by the entropy decoder 210, information regarding prediction may be provided to the predictors (e.g., the inter predictor 260 and the intra predictor 265), and residual values entropy-decoded by the entropy decoder 210 (that is, quantized transform coefficients and related parameter information) may be input to the inverse quantizer 220. Among the pieces of information decoded by the entropy decoder 210, information regarding filtering may be provided to the filter 240.
Meanwhile, a receiver (not shown) for receiving a signal output from the encoding apparatus 100 may be further configured as an internal/external element of the decoding apparatus 200, or the receiver may be a component of the entropy decoder 210.

The inverse quantizer 220 may inverse-quantize the quantized transform coefficients and output the transform coefficients. The inverse quantizer 220 may reorder the quantized transform coefficients in the form of a two-dimensional block. In this case, the reordering may be performed based on the coefficient scan order that the encoding apparatus 100 has performed. The inverse quantizer 220 may inverse-quantize the quantized transform coefficients using a quantization parameter (e.g., quantization step size information) to obtain transform coefficients.
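As a minimal sketch of the inverse quantization step described above (a simplified uniform scalar model; actual codecs additionally use scaling lists and bit shifts, and the function name is illustrative):

```python
import numpy as np

def dequantize(qcoeffs, qstep):
    # Uniform scalar dequantization: multiply each quantized
    # coefficient by the quantization step size.
    return np.asarray(qcoeffs, dtype=np.int64) * qstep

# A 2x2 block of quantized coefficients with a quantization step of 4
block = dequantize([[3, -1], [0, 2]], qstep=4)  # -> [[12, -4], [0, 8]]
```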

The inverse transformer 230 obtains a residual signal (residual block or residual sample array) by inverse-transforming the transform coefficients.

The predictor may perform prediction on the current block and generate a prediction block including prediction samples of the current block. The predictor may determine which of intra prediction or inter prediction is applied to the current block based on information about prediction output from the entropy decoder 210 and determine a specific intra/inter prediction mode.

The intra predictor 265 may predict the current block by referring to samples in the current picture. Depending on the prediction mode, the reference sample may be adjacent to the current block or located far away from the current block. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra predictor 265 may determine a prediction mode applied to the current block using prediction modes applied to neighboring blocks.

The inter predictor 260 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. Here, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted per block, per sub-block, or per sample based on a correlation in motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, or Bi prediction) information. In the case of inter prediction, the neighboring blocks may include spatial neighboring blocks existing in a current picture and temporal neighboring blocks existing in a reference picture. For example, the inter predictor 260 may construct a motion information candidate list based on information related to prediction of neighboring blocks and derive a motion vector and/or a reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes. The information on prediction may include information indicating a mode of inter prediction of the current block.

The adder 235 may add the obtained residual signal to a prediction signal (e.g., a prediction block or a prediction sample array) output from the inter predictor 260 or the intra predictor 265, thereby generating a reconstructed signal (a reconstructed picture, a reconstructed block, or a reconstructed sample array). As in the case of applying the skip mode, when there is no residual of the target block for processing, the prediction block may be used as a reconstruction block.

The adder 235 may be represented as a reconstructor or a reconstruction block generator. The generated reconstructed signal may be used for intra prediction of the next target processing block in the current picture, and as described below, the generated reconstructed signal is filtered and then used for inter prediction of the next picture.

The filter 240 may enhance subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 240 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and transmit the modified reconstructed picture to the decoded picture buffer 250. Various filtering methods may include, for example, deblocking filtering, Sample Adaptive Offset (SAO), Adaptive Loop Filter (ALF), or bilateral filter.

The inter predictor 260 may use the modified reconstructed picture transmitted to the decoded picture buffer 250 as a reference picture.

In the present disclosure, the above-described embodiments in connection with the filter 160, the inter predictor 180, and the intra predictor 185 of the encoding apparatus 100 may be applied in the same or corresponding manner as the filter 240, the inter predictor 260, and the intra predictor 265 of the decoding apparatus 200.

Fig. 3a, 3b, 3c, and 3d are diagrams illustrating block division structures according to a Quadtree (QT), a Binary Tree (BT), a Ternary Tree (TT), and an Asymmetric Tree (AT), respectively, according to an embodiment of the present disclosure.

In video coding, a block may be partitioned based on QT. A sub-block partitioned by QT may be further partitioned recursively according to QT. A leaf block that is no longer partitioned according to QT may be partitioned according to at least one of BT, TT, or AT. BT may have two types of split, such as horizontal BT (2N × N, 2N × N) and vertical BT (N × 2N, N × 2N). TT may have two types of split, such as horizontal TT (2N × 1/2N, 2N × N, 2N × 1/2N) and vertical TT (1/2N × 2N, N × 2N, 1/2N × 2N). AT may have four types of split, such as horizontal-up AT (2N × 1/2N, 2N × 3/2N), horizontal-down AT (2N × 3/2N, 2N × 1/2N), vertical-left AT (1/2N × 2N, 3/2N × 2N), and vertical-right AT (3/2N × 2N, 1/2N × 2N). Each of BT, TT, and AT may be further recursively partitioned using BT, TT, and AT.
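The sub-block shapes above can be sketched as a size computation per split type. This is an illustrative sketch only; the mode names are invented here, and sizes assume a 2N × 2N parent block of width `w` and height `h`:

```python
def split_sizes(w, h, mode):
    """Return the (width, height) of each sub-block for one split mode."""
    if mode == "QT":            # four equal quadrants
        return [(w // 2, h // 2)] * 4
    if mode == "BT_HOR":        # 2N x N, 2N x N
        return [(w, h // 2)] * 2
    if mode == "BT_VER":        # N x 2N, N x 2N
        return [(w // 2, h)] * 2
    if mode == "TT_HOR":        # 2N x 1/2N, 2N x N, 2N x 1/2N
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "TT_VER":        # 1/2N x 2N, N x 2N, 1/2N x 2N
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    if mode == "AT_HOR_UP":     # 2N x 1/2N, 2N x 3/2N
        return [(w, h // 4), (w, 3 * h // 4)]
    raise ValueError(mode)

sizes = split_sizes(32, 32, "TT_HOR")  # [(32, 8), (32, 16), (32, 8)]
```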

FIG. 3a shows an example of QT partitioning. Block a may be partitioned into four sub-blocks (a0, a1, a2, A3) according to QT. The sub-block a1 may be subdivided into four sub-blocks (B0, B1, B2, B3) according to QT.

Fig. 3b shows an example of BT segmentation. The block B3, which is no longer split according to QT, can be split into vertical BT (C0, C1) or horizontal BT (D0, D1). Like block C0, each sub-block may be further recursively split, for example, in the form of horizontal BT (E0, E1) or vertical BT (F0, F1).

Fig. 3c shows an example of TT segmentation. The block B3 which is no longer partitioned according to QT can be partitioned into a vertical TT (C0, C1, C2) or a horizontal TT (D0, D1, D2). Like block C1, each sub-block may be further recursively split, for example, in the form of a horizontal TT (E0, E1, E2) or a vertical TT (F0, F1, F2).

Fig. 3d shows an example of AT segmentation. The block B3, which is no longer partitioned according to QT, can be partitioned into vertical ATs (C0, C1) or horizontal ATs (D0, D1). Like block C1, each sub-block may be further recursively split, for example, in the form of horizontal ATs (E0, E1) or vertical ATs (F0, F1).

Meanwhile, BT, TT and AT may be used together. For example, sub-blocks partitioned by BT may be partitioned by TT or AT. Further, the subblocks divided by TT may be divided by BT or AT. The subblocks partitioned by AT may be partitioned by BT or TT. For example, each sub-block may be split by a vertical BT after being split by a horizontal BT, or each sub-block may be split by a horizontal BT after being split by a vertical BT. In this case, the final shape after segmentation may be the same, although a different segmentation order is applied.

When partitioning a block, various orders of searching for blocks may be defined. Typically, the search is performed from left to right or top to bottom. Searching for a block may mean determining whether to further partition each subblock, or if the block is not partitioned, the order in which each subblock is encoded, or the order in which a subblock is searched when referring to other adjacent blocks.
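For a purely QT-partitioned square block, the left-to-right, top-to-bottom search described above corresponds to a recursive Z-order visit of the four quadrants. The sketch below is a hypothetical illustration of that traversal, not the codec's actual scan process:

```python
def z_scan(x, y, size, order):
    """Recursively visit the four QT quadrants of a square block
    left-to-right, top-to-bottom; `order` collects (x, y) positions."""
    if size == 1:
        order.append((x, y))
        return
    half = size // 2
    for dy in (0, half):          # top row of quadrants first
        for dx in (0, half):      # left quadrant before right
            z_scan(x + dx, y + dy, half, order)

order = []
z_scan(0, 0, 4, order)  # first quadrant visited: (0,0),(1,0),(0,1),(1,1)
```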

The transform may be performed per processing unit (or transform block) partitioned by the split structures shown in fig. 3a to 3d. Specifically, the split may be performed in the row direction and the column direction, and a transform matrix may be applied. According to embodiments of the present disclosure, different transform types may be used along the row direction or the column direction of a processing unit (or transform block).

Fig. 4 and 5 are embodiments to which the present disclosure is applied. Fig. 4 is a block diagram schematically illustrating the encoding apparatus 100 of fig. 1 including a transformation and quantization unit 120/130 according to an embodiment of the present disclosure, and fig. 5 is a block diagram schematically illustrating a decoding apparatus 200 including an inverse quantization and inverse transformation unit 220/230 according to an embodiment of the present disclosure.

Referring to fig. 4, the transform and quantization unit 120/130 may include a primary transform unit 121, a secondary transform unit 122, and a quantizer 130. The inverse quantization and inverse transform unit 140/150 may include an inverse quantizer 140, an inverse secondary transform unit 151, and an inverse primary transform unit 152.

Referring to fig. 5, the inverse quantization and inverse transform unit 220/230 may include an inverse quantizer 220, an inverse secondary transform unit 231, and an inverse primary transform unit 232.

In the present disclosure, the transformation may be performed through a plurality of steps. For example, as shown in fig. 4, two steps of primary and secondary transformation may be applied, or more transformation steps may be applied depending on the algorithm. Here, the primary transform may be referred to as a core transform.

The primary transform unit 121 may apply a primary transform to the residual signal. Here, the primary transform may be predefined as a table in the encoder and/or decoder.

The secondary transform unit 122 may apply a secondary transform to the primarily transformed signal. Here, the secondary transform may be predefined as a table in the encoder and/or decoder.

According to an embodiment, a non-separable secondary transform (NSST) may be conditionally applied as the secondary transform. For example, the NSST may be applied only to intra-predicted blocks, and may have a transform set applicable to each prediction mode group.

Here, the prediction mode group may be set based on the symmetry of the prediction direction. For example, since prediction mode 52 and prediction mode 16 are symmetric with respect to prediction mode 34 (the diagonal direction), they may form one group, and the same transform set may be applied to both. When the transform for prediction mode 52 is applied, the input data is transposed first and the transform is then applied to the transposed input data, because the transform set for prediction mode 52 is the same as that for prediction mode 16.

Meanwhile, since the planar mode and the DC mode lack directional symmetry, they have respective transform sets, and each transform set may be composed of two transforms. For other directional modes, each transform set may consist of three transforms.
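The symmetry-based grouping described above (modes 52 and 16 sharing a transform set, with a transpose for the mirrored mode) can be sketched as follows, assuming the 67-mode intra scheme (0 = planar, 1 = DC); the function and exact mapping are illustrative, not the actual codec lookup table:

```python
def transform_set_and_transpose(mode):
    """Map an intra prediction mode to a shared transform-set id,
    exploiting directional symmetry about mode 34 (the diagonal).
    Returns (set_id, transpose_input)."""
    if mode in (0, 1):      # planar and DC: their own sets, no symmetry
        return mode, False
    if mode <= 34:          # directional modes up to the diagonal
        return mode, False
    return 68 - mode, True  # mirror about mode 34 and transpose input

pair = transform_set_and_transpose(52)  # -> (16, True)
```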

The quantizer 130 may perform quantization on the secondarily transformed signal.

The inverse quantization and inverse transform unit 140/150 may perform the above-described processing in reverse, and a repetitive description will not be given.

Fig. 5 is a block diagram schematically illustrating the inverse quantization and inverse transform unit 220/230 in the decoding apparatus 200.

Referring to fig. 5, the inverse quantization and inverse transform unit 220/230 may include an inverse quantizer 220, an inverse secondary transform unit 231, and an inverse primary transform unit 232.

The inverse quantizer 220 obtains a transform coefficient from the entropy-decoded signal using quantization step size information.

The inverse secondary transform unit 231 performs an inverse secondary transform on the transform coefficients. Here, the inverse secondary transform means the inverse of the secondary transform described above in connection with fig. 4.

The inverse primary transform unit 232 performs an inverse primary transform on the inverse secondarily transformed signal (or block) and obtains a residual signal. Here, the inverse primary transform means the inverse of the primary transform described above in connection with fig. 4.

Fig. 6 is a flowchart illustrating an example of encoding a video signal via primary and secondary transforms according to an embodiment of the present disclosure. The operations of fig. 6 may be performed by the transformer 120 of the encoding apparatus 100.

The encoding apparatus 100 may determine (or select) a forward secondary transform based on at least one of the prediction mode, block shape, and/or block size of the current block (S610).

The encoding apparatus 100 may determine an optimal forward secondary transform via rate-distortion (RD) optimization. The optimal forward secondary transform may correspond to one of a plurality of transform combinations, and the plurality of transform combinations may be defined by a transform index. For example, for RD optimization, the encoding apparatus 100 may compare the results of performing the forward secondary transform, quantization, and residual coding for the respective candidates.

The encoding apparatus 100 may signal a secondary transform index corresponding to the optimal forward secondary transform (S620). Here, other embodiments described in the present disclosure may be applied to the secondary transform index.

Meanwhile, the encoding apparatus 100 may perform a forward primary transform on the current block (residual block) (S630).

The encoding apparatus 100 may perform a forward secondary transform on the current block using the optimal forward secondary transform (S640). Meanwhile, the forward secondary transform may be the reduced secondary transform (RST) described below. RST denotes a transform in which N pieces of residual data (an N × 1 residual vector) are input and R (R < N) pieces of transform coefficient data (an R × 1 transform coefficient vector) are output.
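The RST mapping can be sketched as a plain matrix multiplication of an R × N kernel with an N × 1 residual vector. The 2 × 4 kernel below is a toy example for illustration; real RST kernels are trained offline and much larger:

```python
import numpy as np

def forward_rst(residual_vec, rst_matrix):
    """Reduced secondary transform: an R x N matrix maps an N x 1
    residual vector to an R x 1 coefficient vector (R < N)."""
    R, N = rst_matrix.shape
    assert R < N and residual_vec.shape == (N,)
    return rst_matrix @ residual_vec

# Toy 2x4 kernel (R=2, N=4), not an actual codec kernel
T = np.array([[1,  1, 1,  1],
              [1, -1, 1, -1]], dtype=np.int64)
coeffs = forward_rst(np.array([4, 2, 0, 2]), T)  # -> [8, 0]
```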

According to an embodiment, the RST may be applied to a specific region of the current block. For example, when the current block is N × N, the specific region may represent an N/2 × N/2 region on the upper left side. However, the present disclosure is not limited thereto, and the specific region may be set to be different depending on at least one of a prediction mode, a block shape, or a block size. For example, when the current block is N × N, the specific region may represent an M × M region at the upper left side (M ≦ N).

Meanwhile, the encoding apparatus 100 may perform quantization on the current block, thereby generating a transform coefficient block (S650).

The encoding apparatus 100 may perform entropy encoding on the transform coefficient block, thereby generating a bitstream.

Fig. 7 is a flowchart illustrating an example of decoding a video signal via an inverse secondary transform and an inverse primary transform according to an embodiment of the present disclosure. The operations of fig. 7 may be performed by the inverse transformer 230 of the decoding apparatus 200.

The decoding apparatus 200 may obtain a secondary transform index from the bitstream (S710).

The decoding apparatus 200 may derive the secondary transform corresponding to the secondary transform index (S720).

However, steps S710 and S720 are merely an embodiment, and the present disclosure is not limited thereto. For example, the decoding apparatus 200 may derive the secondary transform based on at least one of the prediction mode, block shape, and/or block size of the current block without obtaining a secondary transform index.

Meanwhile, the decoder 200 may obtain a transform coefficient block by entropy-decoding the bitstream, and may perform inverse quantization on the transform coefficient block (S730).

The decoder 200 may perform an inverse secondary transform on the inverse-quantized transform coefficient block (S740). For example, the inverse secondary transform may be the inverse RST. The inverse RST uses the transposed matrix of the RST described above in connection with fig. 6, and represents a transform in which R pieces of transform coefficient data (an R × 1 transform coefficient vector) are input and N pieces of residual data (an N × 1 residual vector) are output.
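The inverse RST can be sketched with the transposed kernel; the same toy 2 × 4 matrix is assumed as for the forward direction (illustrative only, not an actual codec kernel):

```python
import numpy as np

def inverse_rst(coeff_vec, rst_matrix):
    """Inverse RST: the transpose of the R x N forward matrix maps an
    R x 1 coefficient vector back to an N x 1 residual vector."""
    return rst_matrix.T @ coeff_vec

# Toy forward kernel (R=2, N=4), matching the forward-RST sketch
T = np.array([[1,  1, 1,  1],
              [1, -1, 1, -1]], dtype=np.int64)
res = inverse_rst(np.array([8, 0]), T)  # -> [8, 8, 8, 8]
```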

According to an embodiment, the reduced secondary transform may be applied to a specific region of the current block. For example, when the current block is N × N, the specific region may be the upper-left N/2 × N/2 region. However, the present disclosure is not limited thereto, and the specific region may be set differently depending on at least one of the prediction mode, block shape, or block size. For example, when the current block is N × N, the specific region may be an upper-left M × M region (M ≦ N) or an M × L region (M ≦ N, L ≦ N).

The decoder 200 may perform an inverse primary transform on the result of the inverse secondary transform (S750).

The decoder 200 generates a residual block via step S750, and generates a reconstructed block by adding the residual block and the prediction block.

Fig. 8 illustrates exemplary transform configuration groups to which an adaptive multiple transform (AMT) is applied according to an embodiment of the present disclosure.

Referring to fig. 8, a transform configuration group may be determined based on a prediction mode, and there may be a total of six (G0 to G5) groups. G0 to G4 correspond to the case where intra prediction is applied, and G5 denotes a transform combination (or a transform set or a transform combination set) applied to a residual block generated by inter prediction.

A transform combination may consist of a horizontal transform (or row transform) applied to rows of a two-dimensional block and a vertical transform (or column transform) applied to columns of the two-dimensional block.

Here, each transform configuration group may include four transform combination candidates. Four transform combination candidates may be selected or determined via the transform combination indexes of 0 to 3, and the transform combination indexes may be transmitted from the encoding apparatus 100 to the decoding apparatus 200 via an encoding process.

According to an embodiment, residual data (or residual signal) obtained via intra prediction may have different statistical characteristics depending on the intra prediction mode. Therefore, as shown in fig. 8, transform types other than the conventional cosine transform may be applied per prediction mode. The conventional cosine transform type may be denoted herein as DCT type 2, DCT-II, or DCT-2.

Fig. 8 illustrates respective transform set configurations when 35 intra prediction modes are used and when 67 intra prediction modes are used. Multiple transform combinations may be applied per each transform configuration group distinguished in the intra prediction mode column. For example, a plurality of transformation combinations (transformation in the row direction, transformation in the column direction) may be composed of four combinations. More specifically, since DST-7 and DCT-5 can be applied to the row (horizontal) direction and the column (vertical) direction in group 0, there can be four combinations.

Since a total of four transform kernel combinations can be applied to each intra prediction mode, a transform combination index for selecting one of them can be transmitted per transform unit. In this disclosure, the transform combination index may be indicated as an AMT index and may be denoted as AMT _ idx.

Apart from the kernels presented in fig. 8, there are cases where DCT-2 is optimal in both the row direction and the column direction due to the nature of the residual signal. Thus, the transform may be adaptively performed by defining an AMT flag per coding unit. Here, if the AMT flag is 0, DCT-2 may be applied in both the row direction and the column direction, and if the AMT flag is 1, one of the four combinations may be selected or determined via the AMT index.
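The AMT flag/index selection logic can be sketched as follows; the four-entry combination table is hypothetical, standing in for one row of fig. 8:

```python
def select_transforms(amt_flag, amt_index, combos):
    """If the AMT flag is 0, DCT-2 applies in both directions; otherwise
    the AMT index (0..3) picks one (horizontal, vertical) pair."""
    if amt_flag == 0:
        return ("DCT-2", "DCT-2")
    return combos[amt_index]

# Hypothetical four-combination table for one prediction mode group
combos = [("DST-7", "DST-7"), ("DCT-8", "DST-7"),
          ("DST-7", "DCT-8"), ("DCT-8", "DCT-8")]

pair = select_transforms(1, 2, combos)  # -> ("DST-7", "DCT-8")
```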

According to an embodiment, in case that the AMT flag is 0, if the number of transform coefficients is 3 or less for one transform unit, the transform kernel of fig. 8 is not applied, and DST-7 may be applied to both row direction and column direction.

According to an embodiment, the transform coefficient values are parsed first, and if the number of transform coefficients is 3 or less, the AMT index is not parsed, and DST-7 may be applied, thereby reducing transmission of additional information.

According to an embodiment, AMT may be applied only when both the width and the height of the transform unit are 32 or less.

According to an embodiment, fig. 8 may be preset by offline training.

According to an embodiment, the AMT index may be defined by one index that may indicate a combination of horizontal transformation and vertical transformation at the same time. Alternatively, the AMT index may be defined by a horizontal transform index and a vertical transform index, respectively.

Like the AMT described above, a scheme of applying a transform selected from among a plurality of kernels (e.g., DCT-2, DST-7, and DCT-8) may be expressed as a Multiple Transform Selection (MTS) or an Enhanced Multiple Transform (EMT), and an AMT index may be expressed as an MTS index.

Fig. 9 is a flowchart illustrating coding to which an AMT is applied according to an embodiment of the present disclosure. The operations of fig. 9 may be performed by the transformer 120 of the encoding apparatus 100.

Although this disclosure basically describes applying the transforms separately for the horizontal and vertical directions, the transform combination may consist of a non-split transform.

Alternatively, separable transforms and non-separable transforms may be mixed. In this case, if a non-split transform is used, there is no need to perform transform selection in the row/column direction or selection in the horizontal/vertical direction, and the transform combination of fig. 8 can be put into use only when separable transforms are selected.

Furthermore, the scheme proposed in the present disclosure may be applied regardless of whether the transform is a primary transform or a secondary transform. In other words, it is not limited to either one and may be applied to both. Here, the primary transform may mean a transform for transforming the residual block first, and the secondary transform may mean a transform applied to a block resulting from the primary transform.

First, the encoding apparatus 100 may determine a transform configuration group corresponding to the current block (S910). Here, the transformation configuration group may be composed of combinations as shown in fig. 8.

The encoding apparatus 100 may perform transformation on candidate transformation combinations available in the transformation configuration group (S920).

As a result of performing the transform, the encoding device 100 may determine or select a transform combination having a minimum Rate Distortion (RD) cost (S930).

The encoding apparatus 100 may encode a transform combination index corresponding to the selected transform combination (S940).

Fig. 10 is a flowchart illustrating decoding to which an AMT is applied according to an embodiment of the present disclosure. The operations of fig. 10 may be performed by inverse transformer 230 of decoding apparatus 200.

First, the decoding apparatus 200 may determine a transform configuration group for the current block (S1010). The decoding apparatus 200 may parse (or obtain) a transform combination index from the video signal, wherein the transform combination index may correspond to any one of a plurality of transform combinations in the transform configuration group (S1020). For example, the set of transform configurations may include DCT-2, DST-7, or DCT-8.

The decoding apparatus 200 may derive the transform combination corresponding to the transform combination index (S1030). Here, the transform combination may be composed of a horizontal transform and a vertical transform, and may include at least one of DCT-2, DST-7, or DCT-8. Further, the transform combinations described above in connection with fig. 8 may be used.

The decoding apparatus 200 may perform an inverse transform on the current block based on the derived transform combination (S1040). In case the transform combination consists of a row (horizontal) transform and a column (vertical) transform, the row (horizontal) transform may be applied first, followed by the column (vertical) transform. However, the present disclosure is not limited thereto; the opposite order may be applied, or if the combination consists only of non-separable transforms, the non-separable transform may be applied immediately.

According to an embodiment, if the vertical transform or the horizontal transform is DST-7 or DCT-8, the inverse transform of DST-7 or the inverse transform of DCT-8 may be applied per column and then per row. Further, in the vertical transform or the horizontal transform, a different transform may be applied per row and/or per column.

According to an embodiment, the transform combination index may be obtained based on the AMT flag indicating whether AMT is performed. In other words, the transform combination index may be obtained only when AMT is performed according to the AMT flag. Further, the decoding apparatus 200 may identify whether the number of non-zero transform coefficients is greater than a threshold. In this case, the transform combination index may be parsed only when the number of non-zero transform coefficients is greater than the threshold.
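The parsing condition described above can be sketched as a predicate; the threshold value used here is an assumption, since the disclosure only states that a threshold is used:

```python
def should_parse_amt_index(amt_flag, num_nonzero, threshold=2):
    """Parse the transform combination (AMT) index only when AMT is
    enabled and the non-zero coefficient count exceeds the threshold."""
    return amt_flag == 1 and num_nonzero > threshold

decision = should_parse_amt_index(1, 5)  # -> True
```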

According to an embodiment, the AMT flag or the AMT index may be defined at a level of at least one of a sequence, a picture, a slice, a block, a coding unit, a transform unit, or a prediction unit.

Meanwhile, according to another embodiment, the process of determining the transformation configuration group and the step of parsing the transformation combination index may be performed simultaneously. Alternatively, step S1010 may be preset in the encoding apparatus 100 and/or the decoding apparatus 200 and step S1010 may be omitted.

Fig. 11 is a flowchart illustrating an example of encoding an AMT flag and an AMT index according to an embodiment of the present disclosure. The operations of fig. 11 may be performed by the transformer 120 of the encoding apparatus 100.

The encoding apparatus 100 may determine whether the AMT is applied to the current block (S1110).

If the AMT is applied, the encoding apparatus 100 may perform encoding with the AMT flag being 1 (S1120).

The encoding apparatus 100 may determine the AMT index based on at least one of a prediction mode, a horizontal transform, or a vertical transform of the current block (S1130). Here, the AMT index denotes an index indicating any one of a plurality of transform combinations for each intra prediction mode, and may be transmitted per transform unit.

When the AMT index is determined, the encoding apparatus 100 may encode the AMT index (S1140).

On the other hand, unless the AMT is applied, the encoding apparatus 100 may perform encoding with the AMT flag being 0 (S1150).
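The encoding-side signaling of steps S1110 to S1150 can be condensed into a short sketch. The list standing in for the bitstream and the tuple encoding are hypothetical simplifications of the entropy-coding process, not the normative syntax:

```python
def encode_amt_info(apply_amt, amt_index, bitstream):
    """Sketch of Fig. 11 (S1110-S1150): signal the AMT flag, then the
    AMT index only when AMT is applied to the current block.

    `bitstream` is a plain list standing in for an entropy coder
    (hypothetical simplification)."""
    if apply_amt:
        bitstream.append(('amt_flag', 1))           # S1120: AMT flag = 1
        bitstream.append(('amt_index', amt_index))  # S1140: encode the index
    else:
        bitstream.append(('amt_flag', 0))           # S1150: AMT flag = 0
    return bitstream
```

Note that the index from S1130 is passed in already determined, since its derivation depends on the prediction mode and the chosen horizontal/vertical transforms.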

Fig. 12 is a flowchart illustrating an example of decoding in which a transform is performed based on an AMT flag and an AMT index.

The decoding apparatus 200 may parse the AMT flag from the bitstream (S1210). Here, the AMT flag may indicate whether or not the AMT is applied to the current block.

The decoding apparatus 200 may identify whether the AMT is applied to the current block based on the AMT flag (S1220). For example, the decoding apparatus 200 may identify whether the AMT flag is 1.

If the AMT flag is 1, the decoding apparatus 200 may parse the AMT index (S1230). Here, the AMT index denotes an index indicating any one of a plurality of transform combinations for each intra prediction mode, and may be transmitted per transform unit. Alternatively, the AMT index may mean an index indicating any one of the transformation combinations defined in a preset transformation combination table. The preset transformation combination table may mean fig. 8, but the present disclosure is not limited thereto.

The decoding apparatus 200 may derive or determine the horizontal transform and the vertical transform based on at least one of the AMT index or the prediction mode (S1240).

Alternatively, the decoding apparatus 200 may derive the transform combination corresponding to the AMT index. For example, the decoding apparatus 200 may derive or determine the horizontal transform and the vertical transform corresponding to the AMT index.

Meanwhile, if the AMT flag is 0, the decoding apparatus 200 may apply a preset vertical inverse transform per column (S1250). For example, the vertical inverse transform may be the inverse transform of DCT-2.

The decoding apparatus 200 may apply a preset horizontal inverse transform per row (S1260). For example, the horizontal inverse transform may be the inverse transform of DCT-2. That is, when the AMT flag is 0, a preset transform kernel may be used in the encoding apparatus 100 or the decoding apparatus 200. For example, instead of one of the transform kernels defined in the transform combination table shown in fig. 8, a widely used transform kernel may be used.
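The decoding flow of Fig. 12 reduces to selecting a transform pair. In the sketch below, the table contents are illustrative stand-ins only, not the normative combinations of fig. 8:

```python
# Transform-combination table indexed by AMT index; entries are
# (horizontal, vertical) kernel names. Illustrative values only --
# the actual combinations correspond to the table of fig. 8.
AMT_TABLE = {0: ('DST-7', 'DST-7'), 1: ('DCT-8', 'DST-7'),
             2: ('DST-7', 'DCT-8'), 3: ('DCT-8', 'DCT-8')}

def select_inverse_transforms(amt_flag, amt_index=None):
    """Sketch of Fig. 12: if the AMT flag is 0, fall back to the preset
    DCT-2 inverse transform in both directions (S1250/S1260); otherwise
    derive the (horizontal, vertical) pair from the AMT index
    (S1230/S1240)."""
    if amt_flag == 0:
        return ('DCT-2', 'DCT-2')
    return AMT_TABLE[amt_index]
```

The fallback branch reflects the text above: when the AMT flag is 0, a single preset kernel is used without parsing any index.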

NSST (non-separable secondary transform)

The primary transform may include DCT-2 (as in HEVC) or a transform such as DST-7 (as in AMT). The non-separable transform described above treats the N × N two-dimensional residual block as an N² × 1 vector and applies an N² × N² transform kernel to that N² × 1 vector once, rather than applying an N × N transform kernel sequentially in the row and column directions.

That is, NSST may represent a non-separable square matrix applied to a vector composed of the coefficients of a transform block. Furthermore, although the description of the embodiments of the present disclosure focuses on NSST as an example of a non-separable transform applied to an upper-left region (low-frequency region) determined according to the block size, the embodiments of the present disclosure are not limited to the term "NSST"; any type of non-separable transform may be applied to the embodiments of the present disclosure. For example, a non-separable transform applied to an upper-left region (low-frequency region) determined according to the block size may be referred to as a low-frequency non-separable transform (LFNST). In the present disclosure, an M × N transform (or transform matrix) means a matrix composed of M rows and N columns.

In NSST, the two-dimensional block data obtained by applying a primary transform is divided into M × M blocks, and an M² × M² non-separable transform is applied to each M × M block. Further, only when both the width and height of the two-dimensional block obtained by the primary transform are 8 or more, a 64 × 64 non-separable transform may be applied to the 8 × 8 region on the upper-left side; otherwise, the block may be divided into 4 × 4 blocks, and a 16 × 16 non-separable transform may be applied to each 4 × 4 block.
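The block-size rule described above (a 64 × 64 transform on the upper-left 8 × 8 region when both dimensions are at least 8, and 16 × 16 transforms on 4 × 4 blocks otherwise) can be sketched as follows; the return convention is our own, not part of the specification:

```python
def nsst_region(width, height):
    """Choose the NSST sub-block size M for a primary-transformed block.

    Returns (M, M*M): an M x M sub-block is vectorized to length M^2,
    to which an M^2 x M^2 non-separable transform is applied.
    Sketch of the rule in the text; the normative condition is that
    both width and height must be 8 or more for the 8 x 8 case."""
    if width >= 8 and height >= 8:
        m = 8   # 64 x 64 transform on the upper-left 8 x 8 region
    else:
        m = 4   # 16 x 16 transform on each 4 x 4 block
    return m, m * m
```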

The M² × M² non-separable transform may be applied in the form of a matrix product but, to reduce the computational load and memory requirements, may be approximated as a combination of Givens rotation layers and permutation layers. Fig. 13 illustrates one Givens rotation; as shown in fig. 13, a single Givens rotation can be described by one rotation angle θ.

Fig. 13 is a diagram illustrating a Givens rotation according to an embodiment of the present disclosure, and fig. 14 illustrates the configuration of one round in a 4 × 4 NSST, consisting of permutations and Givens rotation layers, according to an embodiment of the present disclosure.

Both the 8 × 8 NSST and the 4 × 4 NSST may be configured as hierarchical combinations of Givens rotations. The matrix corresponding to one Givens rotation is shown in formula 1, and the matrix product can be represented as a graph, as shown in fig. 13.

[ formula 1]

[ t_m ]   [ cosθ  -sinθ ] [ x_m ]
[ t_n ] = [ sinθ   cosθ ] [ x_n ]

[ formula 2]

t_m = x_m·cosθ - x_n·sinθ

t_n = x_m·sinθ + x_n·cosθ
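The scalar form of formula 2 translates directly into code; a minimal sketch:

```python
import math

def givens_rotation(x_m, x_n, theta):
    """One Givens rotation (formulas 1 and 2): rotate the data pair
    (x_m, x_n) by the angle theta and return the pair (t_m, t_n)."""
    t_m = x_m * math.cos(theta) - x_n * math.sin(theta)
    t_n = x_m * math.sin(theta) + x_n * math.cos(theta)
    return t_m, t_n
```

For example, rotating the pair (1, 0) by θ = π/2 yields (0, 1), as expected from the rotation matrix of formula 1.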

Since one Givens rotation rotates two data items, as shown in fig. 13, 32 or 8 Givens rotations are required to process 64 data items (in the case of the 8 × 8 NSST) or 16 data items (in the case of the 4 × 4 NSST), respectively. Thus, a bundle of 32 or 8 Givens rotations forms one Givens rotation layer. As shown in fig. 14, the output data of one Givens rotation layer is passed as the input data of the next Givens rotation layer through a permutation (or shuffling). In fig. 14, the permutation patterns are defined regularly; in the case of the 4 × 4 NSST, four Givens rotation layers and their corresponding permutations form one round. The 4 × 4 NSST performs two rounds, while the 8 × 8 NSST performs four rounds. Although the same permutation pattern is used across different rounds, different Givens rotation angles are applied. Therefore, it is necessary to store the angle data of all the Givens rotations constituting each transform.

In the last step, one final permutation is performed on the data output through the Givens rotation layers, and the corresponding permutation information is stored separately for each transform. This permutation is performed at the end of the forward NSST, whereas for the inverse NSST the corresponding inverse permutation is applied first.

The inverse NSST performs the Givens rotation layers and permutations applied in the forward NSST in reverse order, and applies each Givens rotation with a negative (-) angle.
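As an illustration of this forward/inverse relationship, the sketch below applies rotation layers and permutations forward, then undoes them in reverse order with negated angles. The layer and permutation data in the test are arbitrary illustrative values, not the stored NSST angle data, and the structure is a simplification of the round-based design described above:

```python
import math

def apply_layers(data, layers, perms):
    """Forward pass: each Givens rotation layer (a list of (m, n, theta)
    triples acting on disjoint index pairs) is followed by a permutation
    of the data (illustrative structure, not the normative NSST)."""
    data = list(data)
    for layer, perm in zip(layers, perms):
        for m, n, theta in layer:
            data[m], data[n] = (
                data[m] * math.cos(theta) - data[n] * math.sin(theta),
                data[m] * math.sin(theta) + data[n] * math.cos(theta),
            )
        data = [data[p] for p in perm]  # shuffle into the next layer
    return data

def apply_layers_inverse(data, layers, perms):
    """Inverse pass: undo the permutations and rotation layers in reverse
    order, rotating by the negative angle as described in the text."""
    data = list(data)
    for layer, perm in zip(reversed(layers), reversed(perms)):
        inverse_perm = [0] * len(perm)
        for i, p in enumerate(perm):
            inverse_perm[p] = i
        data = [data[p] for p in inverse_perm]  # undo the shuffle
        for m, n, theta in reversed(layer):     # rotate back by -theta
            data[m], data[n] = (
                data[m] * math.cos(-theta) - data[n] * math.sin(-theta),
                data[m] * math.sin(-theta) + data[n] * math.cos(-theta),
            )
    return data
```

Running the inverse pass on the forward output recovers the original data, which is the defining property of the inverse NSST.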

Fig. 15 illustrates an exemplary configuration of a non-split transform set per intra prediction mode according to an embodiment of the present disclosure.

Intra prediction modes to which the same NSST or NSST set is applied may form one group. In fig. 15, the 67 intra prediction modes are divided into 35 groups. For example, both mode No. 20 and mode No. 48 belong to group No. 20 (hereinafter, a mode group).
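Assuming the symmetric grouping implied by the example above (angular modes m and 68 - m share a group, so modes 20 and 48 both map to group 20), the 67-to-35 mapping can be sketched as follows. This formula is an assumption inferred from the single example in the text; the normative mapping is the table of fig. 15:

```python
def intra_mode_to_group(mode):
    """Map one of the 67 intra prediction modes to one of 35 mode groups,
    under the symmetry assumption that angular modes m and 68 - m are
    grouped together (hypothetical reconstruction of fig. 15)."""
    if mode <= 34:
        return mode       # planar (0), DC (1), and angular modes up to 34
    return 68 - mode      # symmetric angular modes fold back
```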

Instead of one NSST, multiple NSSTs may be configured as a set for each mode group. Each set may include the case where NSST is not applied. For example, in the case where three different NSSTs can be applied to one mode group, one of four cases, including the case where NSST is not applied, can be selected. At this time, an index distinguishing one of the four cases may be transmitted in each TU. The number of NSSTs may be configured differently for each mode group. For example, mode group No. 0 and mode group No. 1 may be signaled separately to select one of three cases, including the case where NSST is not applied.
