Method and apparatus for processing video signal

Document No. 1061155, published 2020-10-13

Note: This technique, "Method and apparatus for processing video signal", was created by Moonmo Koo, M. Salehifar, Seunghwan Kim, and Jaehyun Lim on 2019-09-05. Abstract: Embodiments of the present disclosure provide a method and apparatus for processing a video signal. A video signal decoding method according to an embodiment of the present disclosure includes the steps of: determining an input length and an output length of a non-separable transform based on a height and a width of a current block; determining a non-separable transform matrix corresponding to the input length and the output length of the non-separable transform; and applying the non-separable transform matrix to a number of coefficients corresponding to the input length in the current block, wherein the height and the width of the current block are greater than or equal to 8, and the input length of the non-separable transform is set to 8 in a case where the height and the width of the current block are both 8.

1. A method of decoding an image signal, the method comprising the steps of:

determining an input length and an output length of a non-separable transform based on a height and a width of a current block;

determining a non-separable transform matrix corresponding to the input length and the output length of the non-separable transform; and

applying the non-separable transform matrix to a number of coefficients corresponding to the input length in the current block,

wherein the height and the width of the current block are greater than or equal to 8, and

wherein, if each of the height and the width of the current block is equal to 8, the input length of the non-separable transform is determined to be 8.

2. The method of claim 1,

wherein, if the height and the width of the current block are not both equal to 8, the input length of the non-separable transform is determined to be 16.

3. The method of claim 1,

wherein the output length is determined to be 48 or 64.

4. The method of claim 1,

wherein applying the non-separable transform matrix to the current block comprises applying the non-separable transform matrix to an upper-left 4x4 region of the current block if each of the height and the width of the current block is not equal to 8 and the product of the width and the height is less than a threshold.

5. The method of claim 1,

wherein determining the non-separable transform matrix comprises the steps of:

determining a non-separable transform set index based on an intra-prediction mode of the current block;

determining a non-separable transform kernel corresponding to a non-separable transform index within the non-separable transform set indicated by the non-separable transform set index; and

determining the non-separable transform matrix from the non-separable transform kernel based on the input length and the output length.

6. An apparatus for decoding an image signal, the apparatus comprising:

a memory configured to store a video signal; and

a processor coupled to the memory,

wherein the processor is configured to:

determine an input length and an output length of a non-separable transform based on a height and a width of a current block;

determine a non-separable transform matrix corresponding to the input length and the output length of the non-separable transform; and

apply the non-separable transform matrix to a number of coefficients corresponding to the input length in the current block,

wherein the height and the width of the current block are greater than or equal to 8, and

wherein, if each of the height and the width of the current block is equal to 8, the input length of the non-separable transform is determined to be 8.

7. The apparatus of claim 6,

wherein, if the height and the width of the current block are not both equal to 8, the input length of the non-separable transform is determined to be 16.

8. The apparatus of claim 6,

wherein the output length is determined to be 48 or 64.

9. The apparatus of claim 6,

wherein the processor is configured to apply the non-separable transform matrix to an upper-left 4x4 region of the current block if each of the height and the width of the current block is not equal to 8 and the product of the width and the height is less than a threshold.

10. The apparatus of claim 6,

wherein the processor is configured to:

determine a non-separable transform set index based on an intra-prediction mode of the current block;

determine a non-separable transform kernel corresponding to a non-separable transform index within the non-separable transform set indicated by the non-separable transform set index; and

determine the non-separable transform matrix from the non-separable transform kernel based on the input length and the output length.

Technical Field

The present disclosure relates to a method and apparatus for processing a video signal, and in particular, to a method and apparatus for encoding or decoding a video signal by performing a transform.

Background

Compression coding refers to a signal processing technique for transmitting digitized information through a communication line or storing it in a storage medium in an appropriate form. Media such as video, image, and audio may be objects of compression encoding, and in particular, a technique of performing compression encoding on an image is called video image compression.

The next generation of video content will feature high spatial resolution, high frame rates, and high-dimensional scene representation. Handling such content will require significant increases in memory storage, memory access rate, and processing power.

Therefore, coding tools for processing next-generation video content more efficiently must be designed. In particular, video codec standards succeeding the High Efficiency Video Coding (HEVC) standard require, along with more accurate prediction techniques, efficient transform techniques for converting spatial-domain video signals into frequency-domain signals.

Disclosure of Invention

Technical problem

Embodiments of the present disclosure provide an image signal processing method and apparatus applying a transform with high coding efficiency and low complexity.

Technical problems addressed by the present disclosure are not limited to the above technical problems, and other technical problems not described herein will become apparent to those skilled in the art from the following description.

Technical scheme

A method of decoding an image signal according to an embodiment of the present disclosure may include: determining an input length and an output length of a non-separable transform based on a height and a width of a current block; determining a non-separable transform matrix corresponding to the input length and the output length of the non-separable transform; and applying the non-separable transform matrix to a number of coefficients corresponding to the input length in the current block, wherein the height and the width of the current block are greater than or equal to 8, and wherein the input length of the non-separable transform is determined to be 8 if each of the height and the width of the current block is equal to 8.
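The size rule above can be sketched as a small helper. Note that the text only states that the output length is 48 or 64; the specific rule used below for choosing between them is an assumption for illustration.

```python
def rst_io_lengths(width, height):
    """Determine the input and output lengths of the non-separable
    (secondary) transform from the current block's dimensions, per the
    rules stated above. The output-length choice is an assumption:
    64 is used here only for the 8x8 case, 48 otherwise."""
    assert width >= 8 and height >= 8, "block must be at least 8x8"
    if width == 8 and height == 8:
        return 8, 64   # both dimensions equal to 8 -> input length 8
    return 16, 48      # otherwise -> input length 16
```

For example, a 16x8 block would yield an input length of 16 under this sketch, matching the "not both equal to 8" rule stated for the second embodiment.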

Further, if the height and the width of the current block are not both equal to 8, the input length of the non-separable transform may be determined to be 16.

Further, the output length may be determined to be 48 or 64.

Further, applying the non-separable transform matrix to the current block may include applying the non-separable transform matrix to an upper-left 4x4 region of the current block if each of the height and the width of the current block is not equal to 8 and the product of the width and the height is less than a threshold.

Further, determining the non-separable transform matrix may include the steps of: determining a non-separable transform set index based on an intra-prediction mode of the current block; determining a non-separable transform kernel corresponding to a non-separable transform index within the non-separable transform set indicated by the non-separable transform set index; and determining the non-separable transform matrix from the non-separable transform kernel based on the input length and the output length.
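The three selection steps above can be sketched as follows. The mode-to-set mapping and the kernel contents here are placeholders (the real kernels are trained matrices defined by the codec), so only the selection logic is illustrated.

```python
def mode_to_set_index(intra_mode):
    # Placeholder mapping from intra-prediction mode to a transform set
    # index; the actual mapping is defined by the codec specification.
    return intra_mode % 4

def select_nsst_matrix(intra_mode, nsst_idx, input_len, output_len, kernel_sets):
    """Pick the non-separable transform matrix: (1) map the intra mode
    to a transform set index, (2) pick the kernel signalled by nsst_idx
    within that set, (3) keep the sub-matrix matching the input and
    output lengths. kernel_sets is a list of sets, each a list of full
    kernels stored as row-lists (e.g. 64x16)."""
    set_idx = mode_to_set_index(intra_mode)
    kernel = kernel_sets[set_idx][nsst_idx]
    # For the inverse transform, keep output_len rows and input_len columns.
    return [row[:input_len] for row in kernel[:output_len]]
```

Taking a sub-matrix this way is what lets one stored kernel serve several (input length, output length) combinations.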

An apparatus for decoding an image signal according to another embodiment of the present disclosure may include: a memory configured to store a video signal; and a processor coupled to the memory, wherein the processor is configured to: determine an input length and an output length of a non-separable transform based on a height and a width of a current block; determine a non-separable transform matrix corresponding to the input length and the output length of the non-separable transform; and apply the non-separable transform matrix to a number of coefficients corresponding to the input length in the current block, wherein the height and the width of the current block are greater than or equal to 8, and wherein the input length of the non-separable transform is determined to be 8 if each of the height and the width of the current block is equal to 8.

Advantageous effects

According to an embodiment of the present disclosure, a video encoding method and apparatus having high encoding efficiency and low complexity may be provided by applying a transform based on a size of a current block.

The effects of the present disclosure are not limited to the above-described effects, and other effects not described herein will become apparent to those skilled in the art from the following description.

Drawings

The accompanying drawings, which are included to provide an understanding of the disclosure and are incorporated in and constitute a part of this specification, provide embodiments of the disclosure and illustrate technical features of the disclosure by the following description.

Fig. 1 is a block diagram schematically illustrating an encoding apparatus for encoding a video/image signal according to an embodiment of the present disclosure;

fig. 2 is a block diagram schematically illustrating a decoding apparatus for decoding an image signal according to an embodiment of the present disclosure;

figs. 3a, 3b, 3c, and 3d are views illustrating block division structures according to a quadtree (QT), a binary tree (BT), a ternary tree (TT), and an asymmetric tree (AT), respectively, according to an embodiment of the present disclosure;

fig. 4 is a block diagram schematically illustrating an encoding apparatus of fig. 1 including a transform and quantization unit according to an embodiment of the present disclosure, and fig. 5 is a block diagram schematically illustrating a decoding apparatus including an inverse quantization and inverse transform unit according to an embodiment of the present disclosure;

fig. 6 is a flowchart illustrating an example of encoding a video signal via primary and secondary transforms according to an embodiment of the present disclosure;

fig. 7 is a flowchart illustrating an example of decoding a video signal via a secondary inverse transform and a primary inverse transform according to an embodiment of the present disclosure;

FIG. 8 illustrates an example set of transform configurations applying Adaptive Multiple Transforms (AMTs) in accordance with an embodiment of the present disclosure;

fig. 9 is a flowchart illustrating encoding applying AMT according to an embodiment of the present disclosure;

FIG. 10 is a flowchart illustrating decoding applying AMT according to an embodiment of the present disclosure;

FIG. 11 is a flowchart illustrating an example of encoding an AMT flag and an AMT index according to an embodiment of the present disclosure;

FIG. 12 is a flowchart illustrating an example of decoding that performs a transform based on an AMT flag and an AMT index;

fig. 13 is a diagram illustrating a Givens rotation according to an embodiment of the present disclosure, and fig. 14 is a diagram illustrating the configuration of one round of a 4×4 NSST consisting of permutation and Givens rotation layers according to an embodiment of the present disclosure;

FIG. 15 illustrates an example configuration of non-separable transform sets for each intra-prediction mode according to an embodiment of the present disclosure;

fig. 16 illustrates three forward scan orders of transform coefficients or transform coefficient blocks applied in the HEVC (High Efficiency Video Coding) standard, where (a) shows diagonal scanning, (b) shows horizontal scanning, and (c) shows vertical scanning.

Fig. 17 illustrates positions of transform coefficients when forward diagonal scanning is applied to a 4×8 block with a 4×4 RST according to an embodiment of the present disclosure, and fig. 18 illustrates an example of merging the significant transform coefficients of two 4×4 blocks into a single block according to an embodiment of the present disclosure;

fig. 19 illustrates an example method of configuring a hybrid NSST set per intra prediction mode according to an embodiment of the present disclosure;

fig. 20 illustrates an example method of selecting an NSST set (or kernel) in consideration of the size of a transform block and an intra prediction mode according to an embodiment of the present disclosure;

figs. 21a and 21b illustrate forward and inverse reduced transforms according to embodiments of the present disclosure;

fig. 22 is a flowchart illustrating an example of decoding using a reduced transform according to an embodiment of the present disclosure;

FIG. 23 is a flowchart illustrating an example of applying a conditional reduced transform according to an embodiment of the present disclosure;

fig. 24 is a flowchart illustrating an example of decoding that applies a conditional reduced transform to the secondary inverse transform according to an embodiment of the present disclosure;

figs. 25a, 25b, 26a, and 26b illustrate examples of reduced transforms and reduced inverse transforms according to embodiments of the present disclosure;

FIG. 27 illustrates an example region to which a reduced secondary transform is applied, according to an embodiment of the present disclosure;

FIG. 28 illustrates reduced transforms for different reduction factors according to an embodiment of the present disclosure;

fig. 29 illustrates an example of an encoding flowchart of performing transformation as an embodiment to which the present disclosure is applied.

Fig. 30 illustrates an example of a decoding flowchart of performing transformation as an embodiment to which the present disclosure is applied.

Fig. 31 illustrates an example of a detailed block diagram of the transformer 120 in the encoding apparatus 100 as an embodiment to which the present disclosure is applied.

Fig. 32 illustrates an example of a detailed block diagram of the inverse transformer 230 in the decoding apparatus as an embodiment to which the present disclosure is applied.

Fig. 33 illustrates an example of a decoding flow chart of applying a transform according to an embodiment of the present disclosure.

Fig. 34 illustrates an example of a block diagram of an apparatus for processing a video signal as an embodiment to which the present disclosure is applied.

Fig. 35 illustrates an example of an image encoding system as an embodiment to which the present disclosure is applied.

Fig. 36 is a block diagram of a content streaming system as an embodiment to which the present disclosure is applied.

Detailed Description

Some embodiments of the present disclosure are described in more detail with reference to the accompanying drawings. The detailed description, which will be disclosed in connection with the appended drawings, is intended to describe some exemplary embodiments of the present disclosure, and is not intended to describe the only embodiments of the present disclosure. The following detailed description includes further details in order to provide a thorough understanding of the present disclosure. However, it will be understood by those skilled in the art that the present disclosure may be practiced without these further details.

In some cases, known structures and devices are omitted or shown in block diagram form centered on their core functions, in order to keep the concepts of the present disclosure from being obscured.

Although most terms used in the present disclosure have been selected from general terms widely used in the art, some terms have been arbitrarily selected by the applicant, and their meanings will be explained in detail as necessary in the following description. Accordingly, the present disclosure should be understood based on the intended meanings of the terms rather than their simple names or meanings.

Specific terms used in the following description are provided to aid in understanding the present disclosure, and the use of the specific terms may be changed into various forms without departing from the technical spirit of the present disclosure. For example, signals, data, samples, pictures, frames, blocks, etc. may be appropriately replaced and interpreted in each encoding process.

In the present specification, a "processing unit" refers to a unit in which encoding/decoding processing such as prediction, transformation, and/or quantization is performed. In addition, a processing unit may be interpreted to mean a unit including a unit for a luminance component and a unit for a chrominance component. For example, a processing unit may correspond to a block, a Coding Unit (CU), a Prediction Unit (PU), or a Transform Unit (TU).

In addition, the processing unit may be interpreted as a unit for a luminance component or a unit for a chrominance component. For example, the processing unit may correspond to a Coding Tree Block (CTB), a Coding Block (CB), a PU, or a Transform Block (TB) for the luma component. In addition, the processing unit may correspond to a CTB, CB, PU, or TB for the chrominance component. Further, the processing unit is not limited thereto, and may be interpreted as meaning including a unit for a luminance component and a unit for a chrominance component.

In addition, the processing unit is not necessarily limited to a square block, and may be configured in a polygonal shape having three or more vertices.

As used herein, "pixels" and "coefficients" (e.g., transform coefficients, or transform coefficients to which only the primary transform has been applied) may be collectively referred to as samples. Using a sample may mean, for example, using a pixel value or a coefficient.

Hereinafter, a method of designing and applying a reduced secondary transform (RST) in consideration of worst-case computational complexity is described for the encoding/decoding of still images or video.

Embodiments of the present disclosure provide methods and apparatus for compressing images and video. The compressed data has the form of a bit stream, and the bit stream may be stored in various types of storage devices and may be streamed to a terminal equipped with a decoder via a network. If the terminal has a display device, the terminal may display the decoded image on the display device, or may simply store the bitstream data. The method and apparatus proposed according to the embodiments of the present disclosure are applicable to both an encoder and a decoder or both a bitstream generator and a bitstream receiver regardless of whether a terminal outputs it through a display device.

An image compression apparatus mainly includes a prediction unit, a transform and quantization unit, and an entropy coding unit. Figs. 1 and 2 are block diagrams schematically illustrating an encoding apparatus and a decoding apparatus, respectively. Among these components, the transform and quantization unit transforms the residual signal, obtained by subtracting the prediction signal from the original signal, into a frequency-domain signal via, for example, discrete cosine transform type 2 (DCT-2), and applies quantization to the frequency-domain signal, enabling image compression with a greatly reduced number of non-zero signals.

Fig. 1 is a block diagram schematically illustrating an encoding apparatus for encoding a video/image signal according to an embodiment of the present disclosure.

The image divider 110 may divide an image (or a picture or a frame) input to the encoding apparatus 100 into one or more processing units. As an example, a processing unit may be referred to as a coding unit (CU). In this case, the coding unit may be recursively split from the coding tree unit (CTU) or the largest coding unit (LCU) according to a quadtree plus binary tree (QTBT) structure. For example, one coding unit may be divided into a plurality of coding units of deeper depth based on a quadtree structure and/or a binary tree structure. In this case, for example, the quadtree structure may be applied first and the binary tree structure applied afterwards. Alternatively, the binary tree structure may be applied first. The encoding process according to an embodiment of the present disclosure may be performed based on the final coding unit that is not divided any further. In this case, the largest coding unit may be used directly as the final coding unit based on, for example, coding efficiency depending on image characteristics, or, if necessary, the coding unit may be recursively split into coding units of lower depth and a coding unit of optimal size used as the final coding unit. The encoding process may include, for example, prediction, transformation, or reconstruction as described below. As an example, the processing unit may also comprise a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split from the final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and/or a unit for deriving a residual signal from transform coefficients.
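The recursive quadtree splitting described above can be illustrated minimally as follows. Real QTBT partitioning also allows binary splits and is driven by rate-distortion decisions; the fixed maximum-size rule here is only a stand-in.

```python
def quadtree_split(x, y, w, h, max_size):
    """Recursively split a square region into leaf CUs no larger than
    max_size. Returns a list of (x, y, width, height) leaves.
    Illustrative only: the split criterion in a real encoder is a
    rate-distortion decision, not a fixed size threshold."""
    if w <= max_size and h <= max_size:
        return [(x, y, w, h)]
    hw, hh = w // 2, h // 2
    leaves = []
    for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
        leaves += quadtree_split(x + dx, y + dy, hw, hh, max_size)
    return leaves
```

For example, splitting a 16×16 CTU with an 8×8 leaf limit produces the four 8×8 quadrants.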

In some cases, the term "unit" may be used interchangeably with "block" or "region". In general, an mxn block may represent a set of samples or transform coefficients consisting of M columns and N rows. In general, a sample may represent a pixel or a pixel value, or may represent a pixel/pixel value of only a luminance component or a pixel/pixel value of only a chrominance component. A sample may be used as a term corresponding to a pixel or a pixel of a picture (or image).

The encoding apparatus 100 may generate a residual signal (residual block or residual sample array) by subtracting a prediction signal (prediction block or prediction sample array) output from the inter predictor 180 or the intra predictor 185 from an input image signal (original block or original sample array), and the generated residual signal is transmitted to the transformer 120. In this case, as shown, a unit in the encoder 100 for subtracting the prediction signal (prediction block or prediction sample array) from the input image signal (original block or original sample array) may be referred to as a subtractor 115. The predictor may perform prediction on a target block to be processed (hereinafter, a current block) and generate a prediction block including prediction samples for the current block. The predictor may determine whether to apply intra prediction or inter prediction in each block or CU unit. The predictor may generate various information regarding prediction, such as prediction mode information, as described below in connection with each prediction mode, and transmit the generated information to the entropy encoder 190. The prediction-related information may be encoded by the entropy encoder 190 and output in the form of a bitstream.

The intra predictor 185 may predict the current block by referring to samples in the current picture. Depending on the prediction mode, the reference samples may be adjacent to the current block or distant from the current block. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. Depending on the granularity of the prediction directions, the directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes. However, this is merely an example, and more or fewer directional prediction modes may be used. The intra predictor 185 may determine the prediction mode applied to the current block using the prediction modes applied to neighboring blocks.

The inter predictor 180 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. Here, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted on a block, sub-block, or sample basis based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, or Bi prediction) information. In the case of inter prediction, the neighboring blocks may include spatially neighboring blocks existing in the current picture and temporally neighboring blocks existing in the reference picture. The reference picture including the reference block may be the same as or different from the reference picture including the temporally neighboring block. The temporally neighboring block may be referred to as, for example, a collocated reference block or a collocated CU (colCU), and the reference picture including the temporally neighboring block may be referred to as a collocated picture (colPic). For example, the inter predictor 180 may construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and/or reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the skip mode and the merge mode, the inter predictor 180 may use the motion information of a neighboring block as the motion information of the current block. In the skip mode, unlike in the merge mode, the residual signal is not transmitted.
In Motion Vector Prediction (MVP) mode, motion vectors of neighboring blocks may be used as motion vector predictors, and a motion vector difference may be signaled, thereby indicating a motion vector of the current block.
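The MVP-mode recovery described above reduces to selecting a predictor from the candidate list and adding the signalled difference; a minimal sketch (candidate construction is simplified to an already-built list):

```python
def recover_motion_vector(candidates, mvp_index, mvd):
    """MVP mode: the motion vector predictor is selected from the
    candidate list by a signalled index, and the signalled motion
    vector difference (mvd) is added to obtain the block's motion
    vector. Vectors are (x, y) tuples."""
    mvp = candidates[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Signalling only `mvp_index` and `mvd` is what saves bits relative to transmitting the full motion vector.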

The prediction signal generated via the inter predictor 180 or the intra predictor 185 may be used to generate a reconstructed signal or a residual signal.

The transformer 120 may apply a transform scheme to the residual signal, generating transform coefficients. For example, the transform scheme may include at least one of a discrete cosine transform (DCT), a discrete sine transform (DST), a Karhunen-Loeve transform (KLT), a graph-based transform (GBT), or a conditionally non-linear transform (CNT). GBT means a transform obtained from a graph representing information on the relationships between pixels. CNT means a transform obtained based on a prediction signal generated using all previously reconstructed pixels. In addition, the transform process may be applied to square pixel blocks of the same size, or may also be applied to non-square blocks of variable size.
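As an illustration of the primary transform mentioned above, a separable 2-D DCT-2 can be written as two passes of the 1-D DCT-2 basis. This is a floating-point sketch; real codecs use scaled integer approximations of these matrices.

```python
import math

def dct2_matrix(n):
    """Orthonormal N-point DCT-2 basis as a list of row vectors."""
    m = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        m.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                  for j in range(n)])
    return m

def transform_2d(block):
    """Separable 2-D DCT-2 of a square block: C * block * C^T
    (transform the columns, then the rows)."""
    n = len(block)
    c = dct2_matrix(n)
    tmp = [[sum(c[k][j] * block[j][i] for j in range(n)) for i in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]
```

A flat residual block concentrates all of its energy into the single DC coefficient, which is exactly why the subsequent quantization leaves few non-zero signals.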

The quantizer 130 may quantize the transform coefficients and transmit the quantized transform coefficients to the entropy encoder 190, and the entropy encoder 190 may encode the quantized signal (information on the quantized transform coefficients) and output the encoded signal as a bitstream. The information on the quantized transform coefficients may be referred to as residual information. The quantizer 130 may reorder the block-shaped quantized transform coefficients into a one-dimensional vector form based on the coefficient scan order, and generate the information on the quantized transform coefficients based on the one-dimensional form. The entropy encoder 190 may perform various encoding methods such as, for example, exponential Golomb coding, context-adaptive variable length coding (CAVLC), or context-adaptive binary arithmetic coding (CABAC). The entropy encoder 190 may encode values of pieces of information (e.g., syntax elements) necessary for reconstructing the video/image together with or separately from the quantized transform coefficients. The encoded information (e.g., video/image information) may be transmitted or stored in the form of a bitstream on a per network abstraction layer (NAL) unit basis. The bitstream may be transmitted via a network or stored in a digital storage medium. The network may include, for example, a broadcast network and/or a communication network, and the digital storage medium may include, for example, USB, SD, CD, DVD, Blu-ray, HDD, SSD, or other various storage media. A transmitter (not shown) for transmitting the signal output from the entropy encoder 190 and/or a storage unit (not shown) storing the signal may be configured as an internal/external element of the encoding apparatus 100, or the transmitter may be a component of the entropy encoder 190.
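The coefficient reordering step above can be sketched with an up-right diagonal scan, one of the scan orders used for this purpose. This follows one common convention (within each anti-diagonal, scan from bottom-left to top-right); the actual scan order used by a given codec is defined in its specification.

```python
def diag_scan(block):
    """Flatten a 2-D coefficient block into a 1-D list using an
    up-right diagonal scan: anti-diagonals in increasing order, each
    traversed from bottom-left to top-right."""
    h, w = len(block), len(block[0])
    out = []
    for d in range(w + h - 1):
        for x in range(d + 1):
            y = d - x
            if x < w and y < h:
                out.append(block[y][x])
    return out
```

For a 2×2 block [[1, 2], [3, 4]], this yields the order 1, 3, 2, 4 under the stated convention.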

The quantized transform coefficients output from the quantizer 130 may be used to generate a prediction signal. For example, the residual signal may be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients via inverse quantizer 140 and inverse transformer 150 in the loop. The adder 155 may add the reconstructed residual signal to a prediction signal output from the inter predictor 180 or the intra predictor 185, thereby generating a reconstructed signal (a reconstructed picture, a reconstructed block, or a reconstructed sample array). As in the case of applying the skip mode, when the target block to be processed does not have a residual, the prediction block may be used as a reconstructed block. The adder 155 may be represented as a reconstructor or a reconstruction block generator. The generated reconstructed signal may be used for intra prediction of a next target processing block in the current picture, and filtered as described below, and then used for inter prediction of the next picture.
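The adder's reconstruction step above is simply a sample-wise sum, usually followed by clipping to the valid sample range (the 8-bit range below is an assumption for illustration):

```python
def reconstruct_block(pred, resid, max_val=255):
    """Reconstructed sample = clip(prediction + residual). If there is
    no residual (e.g. skip mode), the prediction block is the
    reconstruction unchanged."""
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```

The same operation runs in the encoder's loop (via the inverse quantizer and inverse transformer) and in the decoder, which is what keeps both sides' reference pictures identical.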

The filter 160 may enhance subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and transmit the modified reconstructed picture to the decoded picture buffer 170. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, or bilateral filter. The filter 160 may generate various information regarding filtering and communicate the resulting information to the entropy encoder 190, as described below in connection with each filtering method. The filtering related information may be encoded by the entropy encoder 190 and output in the form of a bitstream.

The modified reconstructed picture sent to the decoded picture buffer 170 may be used as a reference picture in the inter predictor 180. When inter prediction is applied in this way, the encoding apparatus 100 can avoid a prediction mismatch between the encoding apparatus 100 and the decoding apparatus and can enhance coding efficiency.

The decoded picture buffer 170 may store the modified reconstructed picture so as to use it as a reference picture in the inter predictor 180.

Fig. 2 is a block diagram schematically illustrating a decoding apparatus for decoding an image signal according to an embodiment of the present disclosure.

Referring to fig. 2, the decoding apparatus 200 may include an entropy decoder 210, an inverse quantizer 220, an inverse transformer 230, an adder 235, a filter 240, a decoded picture buffer 250, an inter predictor 260, and an intra predictor 265. The inter predictor 260 and the intra predictor 265 may be collectively referred to as a predictor. In other words, the predictor may include the inter predictor 260 and the intra predictor 265. The inverse quantizer 220 and the inverse transformer 230 may be collectively referred to as a residual processor. In other words, the residual processor may include the inverse quantizer 220 and the inverse transformer 230. The entropy decoder 210, inverse quantizer 220, inverse transformer 230, adder 235, filter 240, inter predictor 260, and intra predictor 265 may be configured as a single hardware component (e.g., a decoder or processor) according to an embodiment. According to an embodiment, the decoded picture buffer 250 may be implemented as a single hardware component (e.g., a memory or a digital storage medium).

When a bitstream including video/image information is input, the decoding apparatus 200 may reconstruct an image through a process corresponding to the processing of the video/image information in the encoding apparatus 100 of fig. 1. For example, the decoding apparatus 200 may perform decoding using the processing unit applied in the encoding apparatus 100. Thus, at the time of decoding, the processing unit may be, for example, a coding unit, and the coding unit may be divided from the coding tree unit or the largest coding unit according to a quadtree structure and/or a binary tree structure. The reconstructed image signal decoded and output by the decoding apparatus 200 may be played via a player.

The decoding apparatus 200 may receive the signal output from the encoding apparatus 100 of fig. 1 in the form of a bitstream and may decode the received signal via the entropy decoder 210. For example, the entropy decoder 210 may parse the bitstream and extract information (e.g., video/image information) necessary for image reconstruction (or picture reconstruction). For example, the entropy decoder 210 may decode information in the bitstream based on an encoding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements necessary for image reconstruction and quantized values of transform coefficients of the residual. Specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using decoding target syntax element information, decoding information of neighboring blocks and the decoding target block, or information of symbols/bins decoded in a previous step, predict the probability of occurrence of a bin according to the determined context model, and perform arithmetic decoding of the bin. After determining the context model, the CABAC entropy decoding method may update the context model using the information of the decoded symbol/bin for the context model of the next symbol/bin. Among the pieces of information decoded by the entropy decoder 210, information regarding prediction may be provided to the predictors (e.g., the inter predictor 260 and the intra predictor 265), and the residual values entropy-decoded by the entropy decoder 210 (i.e., the quantized transform coefficients and related parameter information) may be input to the inverse quantizer 220. Among the pieces of information decoded by the entropy decoder 210, information regarding filtering may be provided to the filter 240.
In addition, a receiver (not shown) for receiving a signal output from the encoding apparatus 100 may also be configured as an internal/external element of the decoding apparatus 200, or the receiver may be a component of the entropy decoder 210.

The inverse quantizer 220 may inverse-quantize the quantized transform coefficients and output the transform coefficients. The inverse quantizer 220 may reorder the quantized transform coefficients in a two-dimensional block form. In this case, the reordering may be performed based on the coefficient scan order that the encoding apparatus 100 has performed. The inverse quantizer 220 may inverse-quantize the quantized transform coefficients using a quantization parameter (e.g., quantization step information) to obtain transform coefficients.
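A minimal sketch of the inverse quantization step described above: each quantized level is scaled back by a quantization step. The flat step size is an assumption for illustration only; real codecs derive the step from a quantization parameter and may apply per-frequency scaling lists.

```python
def inverse_quantize(levels, q_step):
    """Scale quantized levels back to transform coefficients (illustrative flat step)."""
    return [lvl * q_step for lvl in levels]

levels = [9, 5, 4, 1, 2, 1, 0, 0]   # quantized levels after entropy decoding
coeffs = inverse_quantize(levels, q_step=8)
```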

The inverse transformer 230 obtains a residual signal (residual block or residual sample array) by inverse-transforming the transform coefficients.

The predictor may perform prediction on the current block and generate a prediction block including prediction samples for the current block. The predictor may determine which of intra prediction or inter prediction is applied to the current block based on information about prediction output from the entropy decoder 210, and determine a specific intra/inter prediction mode.

The intra predictor 265 may predict the current block by referring to samples in the current picture. Depending on the prediction mode, the reference samples may be adjacent to the current block or distant from the current block. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra predictor 265 may determine a prediction mode applied to the current block using prediction modes applied to neighboring blocks.

The inter predictor 260 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. Here, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted on a block, sub-block, or sample basis based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, or Bi prediction) information. In the case of inter prediction, the neighboring blocks may include spatial neighboring blocks existing in the current picture and temporal neighboring blocks existing in the reference picture. For example, the inter predictor 260 may construct a motion information candidate list based on information related to prediction of neighboring blocks and derive a motion vector and/or a reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes. The information on prediction may include information indicating a mode of inter prediction with respect to the current block.

The adder 235 may add the obtained residual signal to a prediction signal (e.g., a prediction block or a prediction sample array) output from the inter predictor 260 or the intra predictor 265, thereby generating a reconstructed signal (a reconstructed picture, a reconstructed block, or a reconstructed sample array). As in the case of applying the skip mode, when there is no residual of the target block to be processed, the prediction block may be used as a reconstructed block.

The adder 235 may be represented as a reconstructor or a reconstruction block generator. The generated reconstructed signal may be used for intra prediction of a next target processing block in the current picture, and filtered as described below, and then used for inter prediction of the next picture.

The filter 240 may enhance subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 240 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and transmit the modified reconstructed picture to the decoded picture buffer 250. The various filtering methods may include, for example, deblocking filtering, Sample Adaptive Offset (SAO), Adaptive Loop Filter (ALF), or bilateral filter.

The modified reconstructed picture transmitted to the decoded picture buffer 250 may be used as a reference picture by the inter predictor 260.

In the present disclosure, the embodiments described above in connection with the filter 160, the inter predictor 180, and the intra predictor 185 of the encoding apparatus 100 may be applied in the same manner as or corresponding to the filter 240, the inter predictor 260, and the intra predictor 265 of the decoding apparatus 200.

Fig. 3a, 3b, 3c, and 3d are views illustrating block division structures according to a Quadtree (QT), a Binary Tree (BT), a Ternary Tree (TT), and an Asymmetric Tree (AT), respectively, according to an embodiment of the present disclosure.

In video coding, a block may be partitioned based on QT. A sub-block partitioned by QT may be further recursively partitioned by QT. A leaf block that is no longer partitioned by QT may be partitioned by at least one of BT, TT, or AT. BT may have two types of splitting, such as horizontal BT (2N × N, 2N × N) and vertical BT (N × 2N, N × 2N). TT may have two types of splitting, such as horizontal TT (2N × 1/2N, 2N × N, 2N × 1/2N) and vertical TT (1/2N × 2N, N × 2N, 1/2N × 2N). AT may have four types of splitting, such as horizontal-up AT (2N × 1/2N, 2N × 3/2N), horizontal-down AT (2N × 3/2N, 2N × 1/2N), vertical-left AT (1/2N × 2N, 3/2N × 2N), and vertical-right AT (3/2N × 2N, 1/2N × 2N). Each of BT, TT, and AT may be further recursively partitioned using BT, TT, and AT.
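The sub-block shapes produced by the split types listed above can be sketched as follows for a 2N × 2N block. The helper name and mode strings are hypothetical, and the TT split is assumed to be the usual 1:2:1 three-way division shown in fig. 3c.

```python
def split_shapes(w, h, mode):
    """Return (width, height) of the sub-blocks produced by splitting a w x h block."""
    if mode == "BT_HOR":        # 2N x N + 2N x N
        return [(w, h // 2), (w, h // 2)]
    if mode == "BT_VER":        # N x 2N + N x 2N
        return [(w // 2, h), (w // 2, h)]
    if mode == "TT_HOR":        # 2N x 1/2N + 2N x N + 2N x 1/2N
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "TT_VER":        # 1/2N x 2N + N x 2N + 1/2N x 2N
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    if mode == "AT_HOR_UP":     # 2N x 1/2N + 2N x 3/2N
        return [(w, h // 4), (w, 3 * h // 4)]
    if mode == "AT_HOR_DOWN":   # 2N x 3/2N + 2N x 1/2N
        return [(w, 3 * h // 4), (w, h // 4)]
    if mode == "AT_VER_LEFT":   # 1/2N x 2N + 3/2N x 2N
        return [(w // 4, h), (3 * w // 4, h)]
    if mode == "AT_VER_RIGHT":  # 3/2N x 2N + 1/2N x 2N
        return [(3 * w // 4, h), (w // 4, h)]
    raise ValueError(mode)

shapes = split_shapes(64, 64, "TT_HOR")
```

Note that every split type covers the parent block exactly, so the sub-block areas always sum to the parent area.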

FIG. 3a shows an example of QT partitioning. Block a may be partitioned into four sub-blocks (a0, a1, a2, A3) by QT. Sub-block a1 may be subdivided into four sub-blocks (B0, B1, B2, B3) by QT.

Fig. 3b shows an example of BT segmentation. The block B3, which is no longer split by QT, may be split into vertical BT (C0, C1) or horizontal BT (D0, D1). As with block C0, each sub-block may be further recursively partitioned, for example, in the form of horizontal BT (E0, E1) or vertical BT (F0, F1).

Fig. 3c shows an example of TT segmentation. The block B3, which is no longer split by QT, can be split into a vertical TT (C0, C1, C2) or a horizontal TT (D0, D1, D2). As with block C1, each sub-block may be further recursively split, for example, in the form of a horizontal TT (E0, E1, E2) or a vertical TT (F0, F1, F2).

Fig. 3d shows an example of AT segmentation. The block B3, which is no longer split by QT, may be split into vertical ATs (C0, C1) or horizontal ATs (D0, D1). As with block C1, each sub-block may be further recursively partitioned, for example, in the form of a horizontal AT (E0, E1) or a vertical AT (F0, F1).

Further, BT, TT and AT may be used together. For example, sub-blocks partitioned by BT may be partitioned by TT or AT. In addition, sub-blocks divided by TT may be divided by BT or AT. The subblocks partitioned by AT may be partitioned by BT or TT. For example, each sub-block may be split by vertical BT after being split by horizontal BT, or each sub-block may be split by horizontal BT after being split by vertical BT. In this case, the final shape after segmentation may be the same, although a different segmentation order is applied.

When a block is partitioned, various orders of searching for the block may be defined. Typically, the search is performed from left to right or top to bottom. The order of searching the blocks may mean an order of determining whether to further divide each sub-block, an order of encoding each sub-block in case the block is no longer divided, or a search order when the sub-blocks refer to other neighboring blocks.

The transform may be performed on processing units (or transform blocks) divided according to the division structures shown in figs. 3a to 3d. Specifically, the division may be performed in the row direction and the column direction, and a transform matrix may be applied. According to embodiments of the present disclosure, different transform types may be used along the row direction or the column direction of a processing unit (or transform block).

Fig. 4 and 5 are embodiments to which the present disclosure is applied. Fig. 4 is a block diagram schematically illustrating the encoding apparatus 100 of fig. 1 including a transform and quantization unit 120/130 according to an embodiment of the present disclosure, and fig. 5 is a block diagram schematically illustrating a decoding apparatus 200 including an inverse quantization and inverse transform unit 220/230 according to an embodiment of the present disclosure.

Referring to fig. 4, the transform and quantization unit 120/130 may include a main transform unit 121, a quadratic transform unit 122, and a quantizer 130. The inverse quantization and inverse transformation unit 140/150 may include an inverse quantizer 140, an inverse quadratic transformation unit 151, and an inverse main transformation unit 152.

Referring to fig. 5, the inverse quantization and inverse transformation unit 220/230 may include an inverse quantizer 220, an inverse quadratic transformation unit 231, and an inverse main transformation unit 232.

In the present disclosure, the transformation may be performed through a plurality of steps. For example, as shown in fig. 4, two steps of primary and secondary transformation may be applied, or more transformation steps may be applied according to an algorithm. Here, the primary transform may be referred to as a core transform.

The main transform unit 121 may apply a main transform to the residual signal. Here, the main transform may be defined in advance as a table in the encoder and/or the decoder.

The quadratic transformation unit 122 may apply quadratic transformation to the main transformed signal. Here, the quadratic transform may be defined in advance as a table in an encoder and/or a decoder.

According to an embodiment, a non-separable secondary transform (NSST) may be conditionally applied as the secondary transform. For example, NSST may be applied only to intra prediction blocks, and may have a transform set suitable for each prediction mode group.

Here, the prediction mode groups may be set based on symmetry of the prediction directions. For example, since prediction mode 52 and prediction mode 16 are symmetric with respect to prediction mode 34 (the diagonal direction), they may form one group, and the same transform set may be applied to them. When the transform is applied in prediction mode 52, the input data is transposed first and the transform is then applied to the transposed input data, because prediction mode 52 shares the same transform set as prediction mode 16.

Furthermore, since the planar mode and the DC mode lack directional symmetry, they have their own transform sets, and each transform set may be composed of two transforms. For other orientation modes, each transform set may consist of three transforms.

The quantizer 130 may perform quantization on the secondarily transformed signal.

The inverse quantization and inverse transformation unit 140/150 may inversely perform the above-described processes and a repetitive description is not provided.

Fig. 5 is a block diagram schematically illustrating the inverse quantization and inverse transformation unit 220/230 in the decoding apparatus 200.

Referring to fig. 5, the inverse quantization and inverse transformation unit 220/230 may include an inverse quantizer 220, an inverse quadratic transformation unit 231, and an inverse main transformation unit 232.

The inverse quantizer 220 obtains transform coefficients from the entropy-decoded signal using quantization step size information.

The inverse quadratic transform unit 231 performs inverse quadratic transform on the transform coefficients. Here, the inverse quadratic transform means an inverse of the quadratic transform described above in connection with fig. 4.

The inverse main transform unit 232 performs inverse main transform on the inversely twice transformed signal (or block), and obtains a residual signal. Here, the inverse main transform represents an inverse of the main transform described above in connection with fig. 4.

Fig. 6 is a flowchart illustrating an example of encoding a video signal via primary and secondary transforms according to an embodiment of the present disclosure. The operations of fig. 6 may be performed by the transformer 120 of the encoding apparatus 100.

The encoding apparatus 100 may determine (or select) a forward quadratic transform based on at least one of a prediction mode, a block shape, and/or a block size of the current block (S610).

The encoding device 100 may determine an optimal forward quadratic transform via rate-distortion (RD) optimization. The optimal forward quadratic transform may correspond to one of a plurality of transform combinations, and the plurality of transform combinations may be defined by a transform index. For example, for RD optimization, the encoding apparatus 100 may compare all results of performing forward quadratic transform, quantization, and residual coding on the respective candidates.

The encoding apparatus 100 may signal a quadratic transform index corresponding to the optimal forward quadratic transform (S620). Here, other embodiments described in the present disclosure may be applied to the quadratic transformation index.

Also, the encoding apparatus 100 may perform a forward primary transform on the current block (residual block) (S630).

The encoding apparatus 100 may perform a forward secondary transform on the current block using the optimal forward secondary transform (S640). Here, the forward secondary transform may be the reduced secondary transform (RST) described below. RST means a transform in which N residual data (an N × 1 residual vector) are input and R (R < N) transform coefficient data (an R × 1 transform coefficient vector) are output.
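The dimensionality reduction described above can be sketched as applying an R × N matrix to an N-vector. The matrix entries here are arbitrary placeholders chosen for illustration; real RST kernels are trained and defined in tables.

```python
def rst_forward(T, x):
    """Apply an R x N transform matrix T to an N-vector x, yielding R outputs."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

# Placeholder R x N kernel with N = 4, R = 2 (illustrative values only).
T = [[1, 0, 1, 0],
     [0, 1, 0, -1]]
x = [3, 1, 2, 5]          # N residual/coefficient inputs
y = rst_forward(T, x)     # R = 2 outputs: fewer coefficients than inputs
```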

According to an embodiment, the RST may be applied to a specific region of the current block. For example, when the current block is N × N, the specific region may mean the upper-left N/2 × N/2 region. However, the present disclosure is not limited thereto, and the specific region may be set differently according to at least one of the prediction mode, the block shape, or the block size. For example, when the current block is N × N, the specific region may mean an upper-left M × M region (M ≦ N).

Also, the encoding apparatus 100 may perform quantization on the current block, thereby generating a transform coefficient block (S650).

The encoding apparatus 100 may perform entropy encoding on the transform coefficient block, thereby generating a bitstream.

Fig. 7 is a flowchart illustrating an example of decoding a video signal via a secondary inverse transform and a primary inverse transform according to an embodiment of the present disclosure. The operations of fig. 7 may be performed by inverse transformer 230 of decoding apparatus 200.

The decoding apparatus 200 may obtain a secondary transform index from the bitstream (S710).

The decoding apparatus 200 may derive a secondary transform corresponding to the secondary transform index (S720).

However, steps S710 and S720 correspond to only one embodiment, and the present disclosure is not limited thereto. For example, the decoding apparatus 200 may derive the secondary transform based on at least one of the prediction mode, block shape, and/or block size of the current block without obtaining the secondary transform index.

In addition, the decoder 200 may obtain a transform coefficient block by entropy-decoding the bitstream, and may perform inverse quantization on the transform coefficient block (S730).

The decoder 200 may perform an inverse secondary transform on the inverse-quantized transform coefficient block (S740). For example, the inverse secondary transform may be the inverse RST. The inverse RST uses the transposed matrix of the RST described above in connection with fig. 6, and means a transform in which R transform coefficient data (an R × 1 transform coefficient vector) are input and N residual data (an N × 1 residual vector) are output.
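The transpose relationship described above can be sketched as the companion of the forward reduction: the transpose of the R × N forward matrix maps R coefficients back to N values. The kernel is a placeholder for illustration only.

```python
def rst_inverse(T, y):
    """Apply the transpose of the R x N matrix T to an R-vector y, yielding N outputs."""
    n = len(T[0])
    return [sum(T[r][i] * y[r] for r in range(len(T))) for i in range(n)]

# Placeholder R x N kernel (same shape convention as the forward side).
T = [[1, 0, 1, 0],
     [0, 1, 0, -1]]
y = [5, -4]                # R decoded transform coefficients
x_rec = rst_inverse(T, y)  # N reconstructed values
```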

According to an embodiment, the reduced secondary transform may be applied to a specific region of the current block. For example, when the current block is N × N, the specific region may mean the upper-left N/2 × N/2 region. However, the present disclosure is not limited thereto, and the specific region may be set differently according to at least one of the prediction mode, the block shape, or the block size. For example, when the current block is N × N, the specific region may mean an upper-left M × M (M ≦ N) or M × L (M ≦ N, L ≦ N) region.

The decoder 200 may perform inverse primary transformation on the result of the inverse quadratic transformation (S750).

The decoder 200 generates a residual block via step S750, and generates a reconstructed block by adding the residual block to the prediction block.

Fig. 8 illustrates an example transform configuration set applying Adaptive Multiple Transforms (AMTs) according to an embodiment of the present disclosure.

Referring to fig. 8, a transform configuration group may be determined based on a prediction mode, and there may be six (G0 to G5) groups in total. G0 to G4 correspond to the case where intra prediction is applied, and G5 denotes a transform combination (or a transform set or a transform combination set) applied to a residual block generated by inter prediction.

A transform combination may consist of a horizontal transform (or row transform) applied to rows of a two-dimensional block and a vertical transform (or column transform) applied to columns of the two-dimensional block.
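A separable transform combination as described above applies the vertical kernel to columns and the horizontal kernel to rows, i.e. Y = V · X · Hᵀ in matrix form. The sketch below uses placeholder 2 × 2 kernels just to show the structure; real kernels would be DCT/DST matrices.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(row) for row in zip(*M)]

def separable_transform(X, H, V):
    """Apply vertical transform V to columns and horizontal transform H to rows: V * X * H^T."""
    return matmul(matmul(V, X), transpose(H))

H = [[1, 1], [1, -1]]     # placeholder horizontal (row) kernel
V = [[1, 1], [1, -1]]     # placeholder vertical (column) kernel
X = [[1, 2], [3, 4]]      # 2-D residual block
Y = separable_transform(X, H, V)
```

Because the row and column kernels can be chosen independently, two candidate kernels per direction yield the four transform combinations per group mentioned below.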

Here, each transform configuration group may include four transform combination candidates. Four transform combination candidates may be selected or determined via the transform combination indexes 0 to 3, and the transform combination indexes may be transmitted from the encoding apparatus 100 to the decoding apparatus 200 via an encoding process.

According to an embodiment, residual data (or residual signals) obtained via intra prediction may have different statistical characteristics according to the intra prediction mode. Therefore, transforms other than the conventional cosine transform (denoted herein as DCT type 2, DCT-II, or DCT-2) may be applied for each prediction mode, as shown in fig. 8.

Fig. 8 illustrates respective transform set configurations when 35 intra prediction modes are used and when 67 intra prediction modes are used. A plurality of transform combinations may be applied according to a transform configuration group distinguished in an intra prediction mode column. For example, a plurality of transformation combinations (transformation in the row direction, transformation in the column direction) may be composed of four combinations. More specifically, since DST-7 and DCT-5 can be applied to both the row (horizontal) and column (vertical) directions in group 0, four combinations are possible.

Since a total of four transform kernel combinations can be applied to each intra prediction mode, a transform combination index for selecting one of them can be transmitted on a transform unit-by-transform unit basis. In this disclosure, the transform combination index may be represented as an AMT index and may be represented as AMT _ idx.

Apart from the kernels presented in fig. 8, DCT-2 is sometimes optimal for both the row and column directions due to the nature of the residual signal. Thus, the transform may be performed adaptively by defining an AMT flag for each coding unit. Here, if the AMT flag is 0, DCT-2 may be applied to both the row direction and the column direction, and if the AMT flag is 1, one of the four combinations may be selected or determined via the AMT index.

According to an embodiment, in case that the AMT flag is 1, if the number of transform coefficients is 3 or less for one transform unit, the transform kernels of fig. 8 are not applied, and DST-7 may be applied to both the row direction and the column direction.

According to an embodiment, the transform coefficient values are parsed first, and if the number of transform coefficients is 3 or less, the AMT index is not parsed, and DST-7 may be applied, thereby reducing the transmission of additional information.

According to an embodiment, AMT may be applied only when both the width and the height of the transform unit are 32 or less.

According to an embodiment, fig. 8 may be set up in advance via offline training.

According to an embodiment, the AMT index may be defined by one index that may indicate a combination of horizontal transformation and vertical transformation at the same time. Alternatively, the AMT index may be defined by a horizontal transform index and a vertical transform index, respectively.

Like the AMT described above, a scheme of applying a transform selected from among a plurality of kernels (e.g., DCT-2, DST-7, and DCT-8) may be expressed as a Multiple Transform Selection (MTS) or an Enhanced Multiple Transform (EMT), and an AMT index may be expressed as an MTS index.

Fig. 9 is a flowchart illustrating an encoding applying AMT according to an embodiment of the present disclosure. The operations of fig. 9 may be performed by the transformer 120 of the encoding apparatus 100.

Although this disclosure basically describes applying transforms separately for the horizontal and vertical directions, a transform combination may also be composed of indivisible (non-separable) transforms.

Alternatively, separable transforms and indivisible transforms may be mixed. In this case, if an indivisible transform is used, transform selection per row/column direction or per horizontal/vertical direction is unnecessary, and the transform combinations of fig. 8 may be used only when a separable transform is selected.

In addition, the schemes proposed in the present disclosure may be applied regardless of whether the transform is a primary or secondary transform. In other words, there is no restriction that they apply to only one of the two; both may be applied. Here, the primary transform may mean a transform for transforming the residual block first, and the secondary transform may mean a transform applied to the block resulting from the primary transform.

First, the encoding apparatus 100 may determine a transform configuration group corresponding to the current block (S910). Here, the transform configuration group may be composed of combinations as shown in fig. 8.

The encoding apparatus 100 may perform transformation on candidate transformation combinations available in the transformation configuration group (S920).

As a result of performing the transform, the encoding device 100 may determine or select a transform combination having a minimum Rate Distortion (RD) cost (S930).

The encoding apparatus 100 may encode a transform combination index corresponding to the selected transform combination (S940).

Fig. 10 is a flowchart illustrating decoding applying AMT according to an embodiment of the present disclosure. The operations of fig. 10 may be performed by inverse transformer 230 of decoding apparatus 200.

First, the decoding device 200 may determine a transform configuration group of the current block (S1010). The decoding apparatus 200 may parse (or obtain) a transform combination index from the video signal, wherein the transform combination index may correspond to any one of a plurality of transform combinations in the transform configuration group (S1020). For example, the set of transform configurations may include DCT-2, DST-7, or DCT-8.

The decoding apparatus 200 may derive a transform combination corresponding to the transform combination index (S1030). Here, the transform combination may be composed of a horizontal transform and a vertical transform, and may include at least one of DCT-2, DST-7, or DCT-8. In addition, the transform combinations described above in connection with fig. 8 may be used.

The decoding device 200 may perform an inverse transform on the current block based on the derived transform combination (S1040). In case the transform combination consists of a row (horizontal) transform and a column (vertical) transform, the row (horizontal) transform may be applied first, and then the column (vertical) transform may be applied. However, the present disclosure is not limited thereto: the opposite order may be applied, or, if the combination consists only of indivisible transforms, the indivisible transform may be applied immediately.

According to an embodiment, if the vertical transform or the horizontal transform is DST-7 or DCT-8, the inverse transform of DST-7 or the inverse transform of DCT-8 may be applied column by column and then row by row. In addition, in the vertical transform or the horizontal transform, different transforms may be applied column-wise and/or row-wise.

According to an embodiment, the transformation combination index may be obtained based on an AMT flag indicating whether to execute the AMT. In other words, the transformation combination index can be obtained only when the AMT is executed according to the AMT flag. In addition, the decoding apparatus 200 may identify whether the number of non-zero transform coefficients is greater than a threshold. At this time, the transform combination index may be resolved only when the number of non-zero transform coefficients is greater than a threshold.
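The parsing conditions described above can be sketched as a small decision helper: the transform combination index is read only when the AMT flag is set and the number of non-zero coefficients exceeds a threshold. The function name and default threshold are hypothetical, for illustration only.

```python
def should_parse_amt_index(amt_flag, coeffs, threshold=2):
    """Decide whether the transform combination (AMT) index should be parsed."""
    num_nonzero = sum(1 for c in coeffs if c != 0)
    return bool(amt_flag) and num_nonzero > threshold

parse = should_parse_amt_index(1, [3, 0, -1, 2, 0, 5])   # 4 non-zero coefficients
```

When the check fails, the decoder falls back to the default kernel rather than reading extra bits, which is the signaling saving the paragraphs above describe.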

According to an embodiment, the AMT flag or the AMT index may be defined at a level of at least one of a sequence, a picture, a slice, a block, a coding unit, a transform unit, or a prediction unit.

Further, according to another embodiment, the process of determining the transformation configuration group and the step of parsing the transformation combination index may be performed simultaneously. Alternatively, step S1010 may be preset and omitted in the encoding apparatus 100 and/or the decoding apparatus 200.

Fig. 11 is a flowchart illustrating an example of encoding an AMT flag and an AMT index according to an embodiment of the present disclosure. The operations of fig. 11 may be performed by the transformer 120 of the encoding apparatus 100.

The encoding apparatus 100 may determine whether to apply AMT to the current block (S1110).

If the AMT is applied, the encoding apparatus 100 may perform encoding with the AMT flag being 1 (S1120).

The encoding apparatus 100 may determine the AMT index based on at least one of a prediction mode, a horizontal transform, or a vertical transform of the current block (S1130). Here, the AMT index denotes an index indicating any one of a plurality of transform combinations for each intra prediction mode, and may be transmitted on a transform unit-by-transform unit basis.

When the AMT index is determined, the encoding apparatus 100 may encode the AMT index (S1140).

On the other hand, unless the AMT is applied, the encoding apparatus 100 may perform encoding with the AMT flag being 0 (S1150).

Fig. 12 is a flowchart illustrating decoding for performing a transform based on the AMT flag and the AMT index.

The decoding apparatus 200 may parse the AMT flag from the bitstream (S1210). Here, the AMT flag may indicate whether to apply the AMT to the current block.

The decoding apparatus 200 may identify whether to apply the AMT to the current block based on the AMT flag (S1220). For example, the decoding apparatus 200 may identify whether the AMT flag is 1.

If the AMT flag is 1, the decoding apparatus 200 may parse the AMT index (S1230). Here, the AMT index denotes an index indicating any one of a plurality of transform combinations for each intra prediction mode, and may be transmitted on a transform unit-by-transform unit basis. Alternatively, the AMT index may mean an index indicating any one of the transform combinations defined in a preset transform combination table. The preset transform combination table may be the table of Fig. 8, but the present disclosure is not limited thereto.

The decoding apparatus 200 may derive or determine a horizontal transform and a vertical transform based on at least one of the AMT index or the prediction mode (S1240).

Alternatively, the decoding apparatus 200 may derive a transform combination corresponding to the AMT index. For example, the decoding apparatus 200 may derive or determine a horizontal transform and a vertical transform corresponding to the AMT index.

In addition, if the AMT flag is 0, the decoding apparatus 200 may apply a preset vertical inverse transform column by column (S1250). For example, the vertical inverse transform may be the inverse transform of DCT-2.

The decoding apparatus 200 may apply a preset horizontal inverse transform row by row (S1260). For example, the horizontal inverse transform may be the inverse transform of DCT-2. That is, when the AMT flag is 0, a preset transform kernel may be used in the encoding apparatus 100 or the decoding apparatus 200. For example, a widely used transform kernel may be used instead of a transform kernel defined in the transform combination table as shown in Fig. 8.
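The decoder-side selection of steps S1220 to S1260 can be sketched as follows. The transform combination table here is a placeholder with DST-7/DCT-8 combinations chosen only for illustration; the actual combinations are those defined in the table of Fig. 8.

```python
# Hypothetical (vertical, horizontal) transform combination table indexed
# by the AMT index; the actual combinations are defined by Fig. 8.
TRANSFORM_COMBINATIONS = {
    0: ("DST7", "DST7"),
    1: ("DCT8", "DST7"),
    2: ("DST7", "DCT8"),
    3: ("DCT8", "DCT8"),
}

def select_inverse_transforms(amt_flag, amt_index=None):
    """Return the (vertical, horizontal) inverse-transform kernels.
    S1220: check the AMT flag; S1230-S1240: derive the combination from
    the AMT index; S1250-S1260: fall back to the preset DCT-2 kernel."""
    if amt_flag == 1:
        return TRANSFORM_COMBINATIONS[amt_index]
    return ("DCT2", "DCT2")
```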

NSST (non-separable secondary transform)

The secondary transform means applying a transform kernel again using the result of the primary transform as input. The primary transform may be DCT-2 or DST-7 in HEVC, or the AMT described above. The indivisible transform means that, after considering an N × N two-dimensional residual block as an N² × 1 vector, an N² × N² transform kernel is applied once to the N² × 1 vector, instead of sequentially applying N × N transform kernels in the row direction and the column direction.
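The vectorized application described above can be sketched with NumPy. The kernel contents are left abstract (an identity matrix serves as a smoke test), since the actual NSST kernels are predefined matrices not reproduced here.

```python
import numpy as np

def apply_non_separable_transform(block, kernel):
    """Treat an N x N residual block as an N^2 x 1 vector and apply an
    N^2 x N^2 non-separable kernel in a single matrix product, instead of
    applying N x N kernels separately in the row and column directions."""
    n = block.shape[0]
    vec = block.reshape(n * n, 1)   # N x N block -> N^2 x 1 vector
    out = kernel @ vec              # one N^2 x N^2 matrix product
    return out.reshape(n, n)
```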

That is, NSST may represent an indivisible square matrix applied to a vector composed of the coefficients of a transform block. In addition, although the description of the embodiments of the present disclosure focuses on NSST as an example of an indivisible transform applied to an upper-left region (low-frequency region) determined according to the block size, the embodiments of the present disclosure are not limited to the term "NSST"; rather, any type of indivisible transform may be applied to the embodiments of the present disclosure. For example, an indivisible transform applied to an upper-left region (low-frequency region) determined according to the block size may be represented as a low-frequency non-separable transform (LFNST). In the present disclosure, an M × N transform (or transform matrix) means a matrix composed of M rows and N columns.

In NSST, two-dimensional block data obtained by applying the primary transform is divided into M × M blocks, and then an M² × M² indivisible transform is applied to each M × M block. M may be, for example, 4 or 8. NSST may be applied only to some regions, instead of all regions of the two-dimensional block obtained through the primary transform. For example, NSST may be applied only to the upper-left 8 × 8 block. In addition, only when both the width and the height of the two-dimensional block obtained by the primary transform are 8 or more, a 64 × 64 indivisible transform may be applied to the upper-left 8 × 8 region, while the remaining region may be divided into 4 × 4 blocks and a 16 × 16 indivisible transform may be applied to each of the 4 × 4 blocks.
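The region rule above (a 64 × 64 kernel applied only to the upper-left 8 × 8 region when both dimensions are at least 8) can be sketched as follows; the kernel values are placeholders for the actual NSST matrices.

```python
import numpy as np

def apply_nsst_upper_left(coeffs, kernel64):
    """Apply a 64x64 non-separable kernel to the upper-left 8x8 region of
    the primary-transform coefficients only, leaving the remaining
    coefficients unchanged (assumes width and height are both >= 8)."""
    out = coeffs.copy()
    sub = coeffs[:8, :8].reshape(64, 1)            # vectorize the 8x8 low-frequency region
    out[:8, :8] = (kernel64 @ sub).reshape(8, 8)   # single matrix product, then reshape back
    return out
```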

The M² × M² indivisible transform may be applied in the form of a matrix product but, to reduce computational load and memory requirements, may be approximated by a combination of Givens rotation layers and permutation layers. Fig. 13 depicts a single Givens rotation with one rotation angle.

Fig. 13 is a diagram illustrating givens rotation according to an embodiment of the present disclosure, and fig. 14 is a diagram illustrating a configuration of one round in a 4 × 4NSST composed of permutation and givens rotation layers according to an embodiment of the present disclosure.

Both the 8 × 8 NSST and the 4 × 4 NSST may be composed of a hierarchical combination of Givens rotations. The matrix corresponding to one Givens rotation is shown in Equation 1, and the matrix product can be represented as the diagram shown in Fig. 13.

[Equation 1]

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

In Fig. 13, the outputs t_m and t_n of the Givens rotation can be calculated as in Equation 2.

[Equation 2]

$$t_m = x_m\cos\theta - x_n\sin\theta$$
$$t_n = x_m\sin\theta + x_n\cos\theta$$
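Equation 2 amounts to a plane rotation of the data pair (x_m, x_n); a direct transcription:

```python
import math

def givens_rotate(xm, xn, theta):
    """Equation 2: rotate the data pair (xm, xn) by the angle theta."""
    tm = xm * math.cos(theta) - xn * math.sin(theta)
    tn = xm * math.sin(theta) + xn * math.cos(theta)
    return tm, tn
```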

Since one Givens rotation rotates two data elements as shown in Fig. 13, 32 or 8 Givens rotations are needed to process 64 data elements (in the case of the 8 × 8 NSST) or 16 data elements (in the case of the 4 × 4 NSST), respectively. Thus, a bundle of 32 or 8 Givens rotations forms a Givens rotation layer. As shown in Fig. 14, the output data of one Givens rotation layer is transferred as the input data of the next Givens rotation layer through a permutation (or shuffling). As shown in Fig. 14, the permutation pattern is defined regularly, and in the case of the 4 × 4 NSST, four Givens rotation layers and their corresponding permutations form one round. The 4 × 4 NSST is performed in two rounds, and the 8 × 8 NSST is performed in four rounds. Although different rounds use the same permutation pattern, different Givens rotation angles are applied. Therefore, the angle data of all the Givens rotations constituting each transform needs to be stored.
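The layer-and-permutation structure described above can be sketched as follows for the 4 × 4 NSST (16 data elements, 8 rotations per layer, four layers per round). The pairing of consecutive elements within a layer is an assumption made for illustration; the actual pairings, permutation patterns, and angles are defined per transform.

```python
import math

def givens_layer(data, angles):
    """One Givens rotation layer over 2 * len(angles) data elements.
    Pairing consecutive elements is an illustrative assumption."""
    out = list(data)
    for i, theta in enumerate(angles):
        m, n = 2 * i, 2 * i + 1
        out[m] = data[m] * math.cos(theta) - data[n] * math.sin(theta)
        out[n] = data[m] * math.sin(theta) + data[n] * math.cos(theta)
    return out

def nsst_round(data, layer_angles, permutation):
    """One round: each Givens rotation layer is followed by a permutation
    that feeds its output into the next layer (four layers per round
    in the 4 x 4 NSST)."""
    for angles in layer_angles:
        rotated = givens_layer(data, angles)
        data = [rotated[p] for p in permutation]
    return data
```

With all angles zero and an identity permutation, a round leaves the data unchanged, which makes the structure easy to sanity-check.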

In the last step, a final permutation is performed on the data output through the Givens rotation layers, and information about this permutation is stored separately, transform by transform. The permutation is performed at the end of the forward NSST, and the corresponding inverse permutation is applied first in the inverse NSST.

The inverse NSST performs the Givens rotation layers and permutations applied in the forward NSST in reverse order, and rotates each Givens rotation by the negative (-) of its angle.

Fig. 15 illustrates an example configuration of an indivisible transformation set for each intra prediction mode according to an embodiment of the present disclosure.

Intra prediction modes to which the same NSST or NSST set is applied may form a group. In Fig. 15, the 67 intra prediction modes are divided into 35 groups. For example, both mode No. 20 and mode No. 48 belong to group No. 20 (hereinafter, mode group).

For each mode group, a plurality of NSSTs may be configured as a set, instead of a single NSST. Each set may include the case where NSST is not applied. For example, in the case where three different NSSTs can be applied to one mode group, one of four cases, including the case where NSST is not applied, may be selected. At this time, an index for distinguishing among the four cases may be transmitted in each TU. The number of NSSTs may be configured differently for each mode group. For example, for mode group No. 0 and mode group No. 1, an index may be signaled to select one of three cases, including the case where NSST is not applied.
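The index-based selection described above can be sketched as follows. The convention that index 0 means "NSST not applied" and indices 1..K select among the K kernels of the set follows the description above; the kernel objects themselves are placeholders.

```python
def select_nsst_kernel(mode_group_sets, mode_group, nsst_index):
    """Select the NSST kernel for a mode group from the transmitted index.
    Index 0 means NSST is not applied; indices 1..K select one of the
    K kernels configured for the group."""
    if nsst_index == 0:
        return None                                  # NSST not applied
    return mode_group_sets[mode_group][nsst_index - 1]
```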
