Method and apparatus for controlling residual coding for decoding or encoding a video sequence

Document No.: 1524344  Publication date: 2020-02-11

Reading note: This disclosure, "Method and apparatus for controlling residual coding for decoding or encoding a video sequence," was created by Zhao Xin (赵欣), Li Xiang (李翔), and Liu Shan (刘杉) on 2019-07-26. Its main content: A method of controlling residual coding for decoding or encoding a video sequence comprises determining, according to a resolution of the video sequence, whether to use a small transform size of a primary transform for residual coding of a coding block of the video sequence; if the resolution of the video sequence is below a preset threshold, determining to use the small transform size of the primary transform, and identifying a first set of transforms as the primary transform, the first set of transforms comprising Discrete Sine Transform (DST)-4 and Discrete Cosine Transform (DCT)-4; if the resolution of the video sequence is not below the preset threshold, determining not to use the small transform size of the primary transform, and identifying a second set of transforms as the primary transform, the second set of transforms comprising DST-7 and DCT-8; and performing residual coding of the coding block using the identified primary transform.

1. A method of controlling residual coding for decoding or encoding a video sequence, the method comprising:

determining, according to a resolution of a video sequence, whether to use a small transform size of a primary transform for residual coding of a coding block of the video sequence;

if the resolution of the video sequence is below a preset threshold, determining to use the small transform size of the primary transform, and identifying a first set of transforms as the primary transform, the first set of transforms comprising Discrete Sine Transform (DST)-4 and Discrete Cosine Transform (DCT)-4;

if the resolution of the video sequence is not below the preset threshold, determining not to use the small transform size of the primary transform, and identifying a second set of transforms as the primary transform, the second set of transforms comprising DST-7 and DCT-8; and

performing residual coding of the coding block using the identified primary transform.

2. The method of claim 1, wherein the small transform size of the primary transform is equal to or less than an 8-point transform kernel matrix.

3. The method of claim 1, wherein the large transform size of the primary transform is a transform kernel matrix larger than 8 points.

4. The method of claim 1, wherein the large transform size of the primary transform is equal to or greater than a 16-point transform kernel matrix.

5. The method of claim 1, further comprising:

determining whether a secondary transform is used for residual coding of the coding block, the residual coding of the coding block being performed using the identified primary transform;

if it is determined that the secondary transform is used, determining whether to use a small transform size of the secondary transform for residual coding of the coding block;

identifying the first set of transforms comprising DST-4 and DCT-4 as the secondary transform if it is determined to use the small transform size of the secondary transform;

identifying the second set of transforms comprising DST-7 and DCT-8 as the secondary transform if it is determined not to use the small transform size of the secondary transform; and

performing residual coding of the coding block using the identified secondary transform, the residual coding of the coding block being performed using the identified primary transform.

6. The method of claim 5, wherein the primary transform is an adaptive multi-transform and the secondary transform is a non-separable secondary transform.

7. The method of claim 1, wherein performing the residual coding comprises encoding a residual block of an intra-coded or inter-coded coding block.

8. An apparatus for controlling residual coding for decoding or encoding a video sequence, the apparatus comprising:

a first determining module, configured to determine whether to use a small transform size of a primary transform for residual coding of a coding block of a video sequence according to a resolution of the video sequence;

an identification module for:

determining to use the small transform size of the primary transform if the resolution of the video sequence is below a preset threshold, and identifying a first set of transforms as the primary transform, the first set of transforms comprising Discrete Sine Transform (DST)-4 and Discrete Cosine Transform (DCT)-4;

determining not to use the small transform size of the primary transform if the resolution of the video sequence is not below the preset threshold, and identifying a second set of transforms as the primary transform, the second set of transforms comprising DST-7 and DCT-8; and

an execution module for performing residual coding of the coding block using the identified primary transform.

9. The apparatus of claim 8, wherein the small transform size of the primary transform is equal to or less than an 8-point transform kernel matrix.

10. The apparatus of claim 8, wherein the large transform size of the primary transform is a transform kernel matrix larger than 8 points.

11. The apparatus of claim 8, wherein the large transform size of the primary transform is equal to or larger than a 16-point transform kernel matrix.

12. The apparatus of claim 8, further comprising a second determining module for determining whether a secondary transform is used for residual coding of the coding block, the residual coding of the coding block being performed using the identified primary transform;

wherein the first determining module is further configured to determine, if it is determined that the secondary transform is used, whether to use a small transform size of the secondary transform for residual coding of the coding block,

wherein the identification module is further configured to:

identify the first set of transforms comprising DST-4 and DCT-4 as the secondary transform if it is determined to use the small transform size of the secondary transform; and

identify the second set of transforms comprising DST-7 and DCT-8 as the secondary transform if it is determined not to use the small transform size of the secondary transform,

wherein the execution module is further configured to perform residual coding of the coding block using the identified secondary transform, the residual coding of the coding block being performed using the identified primary transform.

13. The apparatus of claim 12, wherein the primary transform is an adaptive multi-transform and the secondary transform is a non-separable secondary transform.

14. The apparatus of claim 8, wherein the execution module is further configured to encode a residual block of an intra-coded or inter-coded coding block.

15. A non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the method of controlling residual coding of any one of claims 1-7.

16. A computer device, comprising one or more processors and one or more memories having stored therein at least one instruction, the at least one instruction being loaded and executed by the one or more processors to implement the method of controlling residual coding of any of claims 1-7.

Technical Field

The present application relates to video coding techniques, and more particularly, to a method, an apparatus, a non-transitory computer-readable storage medium, and a computer device for controlling residual coding for decoding or encoding a video sequence.

Background

In High Efficiency Video Coding (HEVC), the core transforms are 4-point, 8-point, 16-point, and 32-point Discrete Cosine Transforms (DCT) -2. The transform kernel matrix for the smaller DCT-2 is part of the larger DCT-2 as follows.

4x4 transform

{64,64,64,64}

{83,36,-36,-83}

{64,-64,-64,64}

{36,-83,83,-36}

8x8 transform

{64,64,64,64,64,64,64,64}

{89,75,50,18,-18,-50,-75,-89}

{83,36,-36,-83,-83,-36,36,83}

{75,-18,-89,-50,50,89,18,-75}

{64,-64,-64,64,64,-64,-64,64}

{50,-89,18,75,-75,-18,89,-50}

{36,-83,83,-36,-36,83,-83,36}

{18,-50,75,-89,89,-75,50,-18}

16x16 transform

{64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64}

{90,87,80,70,57,43,25,9,-9,-25,-43,-57,-70,-80,-87,-90}

{89,75,50,18,-18,-50,-75,-89,-89,-75,-50,-18,18,50,75,89}

{87,57,9,-43,-80,-90,-70,-25,25,70,90,80,43,-9,-57,-87}

{83,36,-36,-83,-83,-36,36,83,83,36,-36,-83,-83,-36,36,83}

{80,9,-70,-87,-25,57,90,43,-43,-90,-57,25,87,70,-9,-80}

{75,-18,-89,-50,50,89,18,-75,-75,18,89,50,-50,-89,-18,75}

{70,-43,-87,9,90,25,-80,-57,57,80,-25,-90,-9,87,43,-70}

{64,-64,-64,64,64,-64,-64,64,64,-64,-64,64,64,-64,-64,64}

{57,-80,-25,90,-9,-87,43,70,-70,-43,87,9,-90,25,80,-57}

{50,-89,18,75,-75,-18,89,-50,-50,89,-18,-75,75,18,-89,50}

{43,-90,57,25,-87,70,9,-80,80,-9,-70,87,-25,-57,90,-43}

{36,-83,83,-36,-36,83,-83,36,36,-83,83,-36,-36,83,-83,36}

{25,-70,90,-80,43,9,-57,87,-87,57,-9,-43,80,-90,70,-25}

{18,-50,75,-89,89,-75,50,-18,-18,50,-75,89,-89,75,-50,18}

{9,-25,43,-57,70,-80,87,-90,90,-87,80,-70,57,-43,25,-9}

32x32 transform

{64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64}

{90,90,88,85,82,78,73,67,61,54,46,38,31,22,13,4,-4,-13,-22,-31,-38,-46,-54,-61,-67,-73,-78,-82,-85,-88,-90,-90}

{90,87,80,70,57,43,25,9,-9,-25,-43,-57,-70,-80,-87,-90,-90,-87,-80,-70,-57,-43,-25,-9,9,25,43,57,70,80,87,90}

{90,82,67,46,22,-4,-31,-54,-73,-85,-90,-88,-78,-61,-38,-13,13,38,61,78,88,90,85,73,54,31,4,-22,-46,-67,-82,-90}

{89,75,50,18,-18,-50,-75,-89,-89,-75,-50,-18,18,50,75,89,89,75,50,18,-18,-50,-75,-89,-89,-75,-50,-18,18,50,75,89}

{88,67,31,-13,-54,-82,-90,-78,-46,-4,38,73,90,85,61,22,-22,-61,-85,-90,-73,-38,4,46,78,90,82,54,13,-31,-67,-88}

{87,57,9,-43,-80,-90,-70,-25,25,70,90,80,43,-9,-57,-87,-87,-57,-9,43,80,90,70,25,-25,-70,-90,-80,-43,9,57,87}

{85,46,-13,-67,-90,-73,-22,38,82,88,54,-4,-61,-90,-78,-31,31,78,90,61,4,-54,-88,-82,-38,22,73,90,67,13,-46,-85}

{83,36,-36,-83,-83,-36,36,83,83,36,-36,-83,-83,-36,36,83,83,36,-36,-83,-83,-36,36,83,83,36,-36,-83,-83,-36,36,83}

{82,22,-54,-90,-61,13,78,85,31,-46,-90,-67,4,73,88,38,-38,-88,-73,-4,67,90,46,-31,-85,-78,-13,61,90,54,-22,-82}

{80,9,-70,-87,-25,57,90,43,-43,-90,-57,25,87,70,-9,-80,-80,-9,70,87,25,-57,-90,-43,43,90,57,-25,-87,-70,9,80}

{78,-4,-82,-73,13,85,67,-22,-88,-61,31,90,54,-38,-90,-46,46,90,38,-54,-90,-31,61,88,22,-67,-85,-13,73,82,4,-78}

{75,-18,-89,-50,50,89,18,-75,-75,18,89,50,-50,-89,-18,75,75,-18,-89,-50,50,89,18,-75,-75,18,89,50,-50,-89,-18,75}

{73,-31,-90,-22,78,67,-38,-90,-13,82,61,-46,-88,-4,85,54,-54,-85,4,88,46,-61,-82,13,90,38,-67,-78,22,90,31,-73}

{70,-43,-87,9,90,25,-80,-57,57,80,-25,-90,-9,87,43,-70,-70,43,87,-9,-90,-25,80,57,-57,-80,25,90,9,-87,-43,70}

{67,-54,-78,38,85,-22,-90,4,90,13,-88,-31,82,46,-73,-61,61,73,-46,-82,31,88,-13,-90,-4,90,22,-85,-38,78,54,-67}

{64,-64,-64,64,64,-64,-64,64,64,-64,-64,64,64,-64,-64,64,64,-64,-64,64,64,-64,-64,64,64,-64,-64,64,64,-64,-64,64}

{61,-73,-46,82,31,-88,-13,90,-4,-90,22,85,-38,-78,54,67,-67,-54,78,38,-85,-22,90,4,-90,13,88,-31,-82,46,73,-61}

{57,-80,-25,90,-9,-87,43,70,-70,-43,87,9,-90,25,80,-57,-57,80,25,-90,9,87,-43,-70,70,43,-87,-9,90,-25,-80,57}

{54,-85,-4,88,-46,-61,82,13,-90,38,67,-78,-22,90,-31,-73,73,31,-90,22,78,-67,-38,90,-13,-82,61,46,-88,4,85,-54}

{50,-89,18,75,-75,-18,89,-50,-50,89,-18,-75,75,18,-89,50,50,-89,18,75,-75,-18,89,-50,-50,89,-18,-75,75,18,-89,50}

{46,-90,38,54,-90,31,61,-88,22,67,-85,13,73,-82,4,78,-78,-4,82,-73,-13,85,-67,-22,88,-61,-31,90,-54,-38,90,-46}

{43,-90,57,25,-87,70,9,-80,80,-9,-70,87,-25,-57,90,-43,-43,90,-57,-25,87,-70,-9,80,-80,9,70,-87,25,57,-90,43}

{38,-88,73,-4,-67,90,-46,-31,85,-78,13,61,-90,54,22,-82,82,-22,-54,90,-61,-13,78,-85,31,46,-90,67,4,-73,88,-38}

{36,-83,83,-36,-36,83,-83,36,36,-83,83,-36,-36,83,-83,36,36,-83,83,-36,-36,83,-83,36,36,-83,83,-36,-36,83,-83,36}

{31,-78,90,-61,4,54,-88,82,-38,-22,73,-90,67,-13,-46,85,-85,46,13,-67,90,-73,22,38,-82,88,-54,-4,61,-90,78,-31}

{25,-70,90,-80,43,9,-57,87,-87,57,-9,-43,80,-90,70,-25,-25,70,-90,80,-43,-9,57,-87,87,-57,9,43,-80,90,-70,25}

{22,-61,85,-90,73,-38,-4,46,-78,90,-82,54,-13,-31,67,-88,88,-67,31,13,-54,82,-90,78,-46,4,38,-73,90,-85,61,-22}

{18,-50,75,-89,89,-75,50,-18,-18,50,-75,89,-89,75,-50,18,18,-50,75,-89,89,-75,50,-18,-18,50,-75,89,-89,75,-50,18}

{13,-38,61,-78,88,-90,85,-73,54,-31,4,22,-46,67,-82,90,-90,82,-67,46,-22,-4,31,-54,73,-85,90,-88,78,-61,38,-13}

{9,-25,43,-57,70,-80,87,-90,90,-87,80,-70,57,-43,25,-9,-9,25,-43,57,-70,80,-87,90,-90,87,-80,70,-57,43,-25,9}

{4,-13,22,-31,38,-46,54,-61,67,-73,78,-82,85,-88,90,-90,90,-90,88,-85,82,-78,73,-67,61,-54,46,-38,31,-22,13,-4}

The DCT-2 core exhibits symmetric/anti-symmetric properties, as listed below:

-feature # 1: the even rows (with indices 0, 2, 4, …) are symmetric about the midpoint of the row, i.e., T[k][n] = T[k][N-1-n];

-feature # 2: the odd rows (with indices 1, 3, 5, 7, …) are anti-symmetric about the midpoint of the row, i.e., T[k][n] = -T[k][N-1-n].

In addition, the N-point DCT-2 core (denoted T_N) is part of the 2N-point DCT-2 core (denoted T_2N):

-feature # 3: T_N[x][y] = T_2N[x][2y], where x, y = 0, 1, …, N-1.

Based on the above symmetric/anti-symmetric properties (feature #1 and feature #2) and the relationship between the N-point DCT-2 and the 2N-point DCT-2 (feature #3), a so-called "partial butterfly" implementation is supported to reduce the number of operations (multiplications, additions/subtractions, shifts); partial butterflies produce exactly the same result as full matrix multiplication.
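As a sketch (not from the patent's normative text), the partial butterfly for the 4-point DCT-2 kernel listed above can be written as follows: features #1/#2 let the input be folded into sums and differences before multiplying, halving the multiplication count relative to plain matrix multiplication.

```python
# HEVC 4-point DCT-2 kernel, as listed above.
T4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]

def dct2_4pt_matrix(x):
    """Full matrix multiplication: 16 multiplies."""
    return [sum(T4[k][n] * x[n] for n in range(4)) for k in range(4)]

def dct2_4pt_butterfly(x):
    """Partial butterfly: 8 multiplies, same result as the full multiply."""
    e0, e1 = x[0] + x[3], x[1] + x[2]  # sums feed the even (symmetric) rows
    o0, o1 = x[0] - x[3], x[1] - x[2]  # differences feed the odd (anti-symmetric) rows
    return [64 * e0 + 64 * e1,
            83 * o0 + 36 * o1,
            64 * e0 - 64 * e1,
            36 * o0 - 83 * o1]

assert dct2_4pt_matrix([1, 2, 3, 4]) == dct2_4pt_butterfly([1, 2, 3, 4])
```

The folding works because each even row satisfies T[k][n] = T[k][3-n] and each odd row satisfies T[k][n] = -T[k][3-n], so input samples can be combined before the multiplications.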

A transform basis is called a "recursive transform" if it is symmetric or anti-symmetric and its N-point transform is part of its 2N-point transform. Examples of recursive transforms include DCT-2, the Hadamard transform, DCT-1, Discrete Sine Transform (DST)-1, and DST-2. Reordering the transform bases of one recursive transform produces another recursive transform.

In addition to the DCT-2 and 4x4 DST-7 already used in HEVC, an adaptive multi-transform (AMT) scheme, also known as enhanced multi-transform (EMT) or multi-transform selection (MTS), has been proposed for the residual transform of both inter- and intra-coded blocks. Beyond the current HEVC transforms, AMT uses multiple transforms selected from the DCT/DST families. The newly introduced transform matrices are DST-7, DCT-8, DST-1, and DCT-5. Table 1 shows the basis functions of the selected DSTs/DCTs:

table 1: transform basis functions for N-point input DCT-2/4/5/8 and DST-1/4/7

[Table 1 is an image in the original publication and is not reproduced here.]

To preserve the orthogonality of the transform matrices, the matrices are quantized more accurately than in HEVC, using a 10-bit representation instead of HEVC's 8-bit representation. To keep the intermediate transform coefficients within the 16-bit range, after the horizontal and after the vertical transform, all coefficients are right-shifted by 2 bits more than the right shift used in the current HEVC transforms.

AMT is applied to coding units (CUs) whose width and height are both less than or equal to 64, and whether AMT applies is controlled by a CU-level flag. When the CU-level flag is equal to 0, DCT-2 is applied to the CU to encode the residual. For a luma coding block within an AMT-enabled CU, two additional flags are signaled to identify the horizontal and vertical transforms to be used.

For intra residual coding, a mode-dependent transform candidate selection process is used, because the residual statistics of different intra prediction modes differ. Three transform subsets are defined, as shown in Table 2, and a transform subset is selected based on the intra prediction mode, as specified in Table 3.

Table 2: 3 predefined transformation candidate sets

[Table 2 is an image in the original publication and is not reproduced here.]

Table 3: selected horizontal (H) and vertical (V) transform sets for each intra prediction mode

[Table 3 is an image in the original publication and is not reproduced here.]

With the subset concept, a transform subset is first identified based on Table 3, using the intra prediction mode of a CU whose CU-level AMT flag is equal to 1. Then, for each of the horizontal and vertical transforms, one of the two transform candidates in the identified transform subset is selected based on explicit signaling, according to Table 2.

For inter prediction residuals, however, only one transform set, consisting of DST-7 and DCT-8, is used for all inter modes and for both the horizontal and vertical transforms.

Among the four additional transform types DST-7, DCT-8, DST-1, and DCT-5, the most heavily used are DST-7 and DCT-8. Note that DCT-8 is essentially the DST-7 basis flipped left-to-right with sign changes, so DCT-8 and DST-7 essentially share the same transform basis.
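This relationship can be checked numerically. The following sketch (not from the patent) verifies DCT8[i][j] = (-1)^i * DST7[i][N-1-j] using the standard floating-point DST-7/DCT-8 basis functions:

```python
import math

# Standard orthonormal DST-7 and DCT-8 basis functions for an N-point transform.
N = 4

def dst7(i, j, n=N):
    return math.sqrt(4 / (2 * n + 1)) * math.sin(
        math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))

def dct8(i, j, n=N):
    return math.sqrt(4 / (2 * n + 1)) * math.cos(
        math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))

# DCT-8 equals DST-7 flipped left-right, with the sign alternating per row.
for i in range(N):
    for j in range(N):
        assert math.isclose(dct8(i, j), (-1) ** i * dst7(i, N - 1 - j),
                            abs_tol=1e-12)
print("DCT-8 is a flipped, sign-adjusted DST-7")
```

This is why the two transforms can share a single stored basis in practice.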

The transformation core of DST-7 is a matrix composed of basis vectors, which can also be expressed as follows:

4-point DST-7:

a,b,c,d

c,c,0,-c

d,-a,-c,b

b,-d,c,-a

8-point DST-7:

a,b,c,d,e,f,g,h

c,f,h,e,b,-a,-d,-g

e,g,b,-c,-h,-d,a,f

g,c,-d,-f,a,h,b,-e

h,-a,-g,b,f,-c,-e,d

f,-e,-a,g,-d,-b,h,-c

d,-h,e,-a,-c,g,-f,b

b,-d,f,-h,g,-e,c,-a

16-point DST-7:

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p

c,f,i,l,o,o,l,i,f,c,0,-c,-f,-i,-l,-o

e,j,o,m,h,c,-b,-g,-l,-p,-k,-f,-a,d,i,n

g,n,l,e,-b,-i,-p,-j,-c,d,k,o,h,a,-f,-m

i,o,f,-c,-l,-l,-c,f,o,i,0,-i,-o,-f,c,l

k,k,0,-k,-k,0,k,k,0,-k,-k,0,k,k,0,-k

m,g,-f,-n,-a,l,h,-e,-o,-b,k,i,-d,-p,-c,j

o,c,-l,-f,i,i,-f,-l,c,o,0,-o,-c,l,f,-i

p,-a,-o,b,n,-c,-m,d,l,-e,-k,f,j,-g,-i,h

n,-e,-i,j,d,-o,a,m,-f,-h,k,c,-p,b,l,-g

l,-i,-c,o,-f,-f,o,-c,-i,l,0,-l,i,c,-o,f

j,-m,c,g,-p,f,d,-n,i,a,-k,l,-b,-h,o,-e

h,-p,i,-a,-g,o,-j,b,f,-n,k,-c,-e,m,-l,d

f,-l,o,-i,c,c,-i,o,-l,f,0,-f,l,-o,i,-c

d,-h,l,-p,m,-i,e,-a,-c,g,-k,o,-n,j,-f,b

b,-d,f,-h,j,-l,n,-p,o,-m,k,-i,g,-e,c,-a

32-point DST-7:

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F

c,f,i,l,o,r,u,x,A,D,F,C,z,w,t,q,n,k,h,e,b,-a,-d,-g,-j,-m,-p,-s,-v,-y,-B,-E

e,j,o,t,y,D,D,y,t,o,j,e,0,-e,-j,-o,-t,-y,-D,-D,-y,-t,-o,-j,-e,0,e,j,o,t,y,D

g,n,u,B,D,w,p,i,b,-e,-l,-s,-z,-F,-y,-r,-k,-d,c,j,q,x,E,A,t,m,f,-a,-h,-o,-v,-C

i,r,A,C,t,k,b,-g,-p,-y,-E,-v,-m,-d,e,n,w,F,x,o,f,-c,-l,-u,-D,-z,-q,-h,a,j,s,B

k,v,F,u,j,-a,-l,-w,-E,-t,-i,b,m,x,D,s,h,-c,-n,-y,-C,-r,-g,d,o,z,B,q,f,-e,-p,-A

m,z,z,m,0,-m,-z,-z,-m,0,m,z,z,m,0,-m,-z,-z,-m,0,m,z,z,m,0,-m,-z,-z,-m,0,m,z

o,D,t,e,-j,-y,-y,-j,e,t,D,o,0,-o,-D,-t,-e,j,y,y,j,-e,-t,-D,-o,0,o,D,t,e,-j,-y

q,E,n,-c,-t,-B,-k,f,w,y,h,-i,-z,-v,-e,l,C,s,b,-o,-F,-p,a,r,D,m,-d,-u,-A,-j,g,x

s,A,h,-k,-D,-p,c,v,x,e,-n,-F,-m,f,y,u,b,-q,-C,-j,i,B,r,-a,-t,-z,-g,l,E,o,-d,-w

u,w,b,-s,-y,-d,q,A,f,-o,-C,-h,m,E,j,-k,-F,-l,i,D,n,-g,-B,-p,e,z,r,-c,-x,-t,a,v

w,s,-d,-A,-o,h,E,k,-l,-D,-g,p,z,c,-t,-v,a,x,r,-e,-B,-n,i,F,j,-m,-C,-f,q,y,b,-u

y,o,-j,-D,-e,t,t,-e,-D,-j,o,y,0,-y,-o,j,D,e,-t,-t,e,D,j,-o,-y,0,y,o,-j,-D,-e,t

A,k,-p,-v,e,F,f,-u,-q,j,B,a,-z,-l,o,w,-d,-E,-g,t,r,-i,-C,-b,y,m,-n,-x,c,D,h,-s

C,g,-v,-n,o,u,-h,-B,a,D,f,-w,-m,p,t,-i,-A,b,E,e,-x,-l,q,s,-j,-z,c,F,d,-y,-k,r

E,c,-B,-f,y,i,-v,-l,s,o,-p,-r,m,u,-j,-x,g,A,-d,-D,a,F,b,-C,-e,z,h,-w,-k,t,n,-q

F,-a,-E,b,D,-c,-C,d,B,-e,-A,f,z,-g,-y,h,x,-i,-w,j,v,-k,-u,l,t,-m,-s,n,r,-o,-q,p

D,-e,-y,j,t,-o,-o,t,j,-y,-e,D,0,-D,e,y,-j,-t,o,o,-t,-j,y,e,-D,0,D,-e,-y,j,t,-o

B,-i,-s,r,j,-A,-a,C,-h,-t,q,k,-z,-b,D,-g,-u,p,l,-y,-c,E,-f,-v,o,m,-x,-d,F,-e,-w,n

z,-m,-m,z,0,-z,m,m,-z,0,z,-m,-m,z,0,-z,m,m,-z,0,z,-m,-m,z,0,-z,m,m,-z,0,z,-m

x,-q,-g,E,-j,-n,A,-c,-u,t,d,-B,m,k,-D,f,r,-w,-a,y,-p,-h,F,-i,-o,z,-b,-v,s,e,-C,l

v,-u,-a,w,-t,-b,x,-s,-c,y,-r,-d,z,-q,-e,A,-p,-f,B,-o,-g,C,-n,-h,D,-m,-i,E,-l,-j,F,-k

t,-y,e,o,-D,j,j,-D,o,e,-y,t,0,-t,y,-e,-o,D,-j,-j,D,-o,-e,y,-t,0,t,-y,e,o,-D,j

r,-C,k,g,-y,v,-d,-n,F,-o,-c,u,-z,h,j,-B,s,-a,-q,D,-l,-f,x,-w,e,m,-E,p,b,-t,A,-i

p,-F,q,-a,-o,E,-r,b,n,-D,s,-c,-m,C,-t,d,l,-B,u,-e,-k,A,-v,f,j,-z,w,-g,-i,y,-x,h

n,-B,w,-i,-e,s,-F,r,-d,-j,x,-A,m,a,-o,C,-v,h,f,-t,E,-q,c,k,-y,z,-l,-b,p,-D,u,-g

l,-x,C,-q,e,g,-s,E,-v,j,b,-n,z,-A,o,-c,-i,u,-F,t,-h,-d,p,-B,y,-m,a,k,-w,D,-r,f

j,-t,D,-y,o,-e,-e,o,-y,D,-t,j,0,-j,t,-D,y,-o,e,e,-o,y,-D,t,-j,0,j,-t,D,-y,o,-e

h,-p,x,-F,y,-q,i,-a,-g,o,-w,E,-z,r,-j,b,f,-n,v,-D,A,-s,k,-c,-e,m,-u,C,-B,t,-l,d

f,-l,r,-x,D,-C,w,-q,k,-e,-a,g,-m,s,-y,E,-B,v,-p,j,-d,-b,h,-n,t,-z,F,-A,u,-o,i,-c

d,-h,l,-p,t,-x,B,-F,C,-y,u,-q,m,-i,e,-a,-c,g,-k,o,-s,w,-A,E,-D,z,-v,r,-n,j,-f,b

b,-d,f,-h,j,-l,n,-p,r,-t,v,-x,z,-B,D,-F,E,-C,A,-y,w,-u,s,-q,o,-m,k,-i,g,-e,c,-a

The values of the variables a, b, c, …, which may differ for different sizes of DST-7, may be derived from the DST-7 basis function shown in Table 1. For example, the value of "a" may differ between the 4-point and the 8-point DST-7.

To avoid floating-point operations, the transform core of DST-7, like the DCT-2 core used in HEVC, may be scaled by a constant factor (the scaling expressions appear as images in the original publication and are not reproduced here), rounded to the nearest integer, and further adjusted by an offset such as +1/-1.
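As an illustrative sketch (not from the patent), the 4-point DST-7 letter values a..d can be derived from the standard DST-7 basis function and scaled to integers. The scaling factor 64*sqrt(2) below is an assumption for illustration only, since the patent's exact scaling expressions are not reproduced in this text:

```python
import math

N = 4
SCALE = 64 * math.sqrt(2)  # assumed scaling factor, for illustration only

# a, b, c, d = sin(pi*m/(2N+1)) for m = 1..4, up to a common normalization.
a, b, c, d = (math.sin(math.pi * m / (2 * N + 1)) for m in range(1, N + 1))

# The 4-point DST-7 rows in letter form, as listed above.
letter_rows = [
    [a, b, c, d],
    [c, c, 0, -c],
    [d, -a, -c, b],
    [b, -d, c, -a],
]

# Scale and round each entry to the nearest integer to avoid floating point
# in the actual transform.
core = [[round(SCALE * v) for v in row] for row in letter_rows]
for row in core:
    print(row)
```

The printed kernel reproduces the letter structure: the second row has equal first two entries and a zero, and the first row is monotonically increasing.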

Unlike DCT-2, for which fast methods have been studied extensively, implementations of DST-7, such as matrix multiplication, remain much less efficient than those of DCT-2.

In some video coding standards, methods have been proposed to extract the N-point DCT-4/DST-4 bases from the 2N-point DCT-2 basis and to replace DST-7/DCT-8 with DST-4/DCT-4, respectively. In this way, the transform bases used in AMT require no extra memory, so the complexity of AMT is reduced in terms of the cost of storing the additional transform bases used in AMT.

However, experiments have shown that DST-7/DCT-8 provides better coding performance than DST-4/DCT-4. To preserve DST-7/DCT-8 while still reducing the cost of storing the additional transform bases used in AMT, methods have been proposed in some video coding standards to embed the N-point DST-7/DCT-8 into the 2N-point DCT-2.

A mode-dependent non-separable secondary transform (NSST) has been proposed, which is applied between the forward core transform and quantization (at the encoder) and between de-quantization and the inverse core transform (at the decoder). To keep complexity low, NSST is applied only to the low-frequency coefficients after the primary transform. If both the width (W) and the height (H) of a transform coefficient block are greater than or equal to 8, an 8x8 non-separable secondary transform is applied to the top-left 8x8 region of the block. Otherwise, if W or H of the transform coefficient block is equal to 4, a 4x4 non-separable secondary transform is applied to the top-left min(8, W) x min(8, H) region of the block. This transform selection rule applies to both the luma and the chroma components.
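A minimal sketch of this region-selection rule (not from the patent; the function name and return convention are hypothetical):

```python
def nsst_region(width: int, height: int):
    """Return (transform_size, region_w, region_h) for a coefficient block,
    or None when neither branch of the rule above applies."""
    if width >= 8 and height >= 8:
        # 8x8 NSST on the top-left 8x8 region of the block.
        return (8, 8, 8)
    if width == 4 or height == 4:
        # 4x4 NSST on the top-left min(8, W) x min(8, H) region.
        return (4, min(8, width), min(8, height))
    return None

assert nsst_region(16, 16) == (8, 8, 8)
assert nsst_region(4, 8) == (4, 4, 8)
```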

Using a 4x4 input block as an example, a matrix-multiplication implementation of the non-separable transform is described below. To apply the non-separable transform, the 4x4 input block X in equation (1) is represented as the vector in equation (2):

X = [X00 X01 X02 X03; X10 X11 X12 X13; X20 X21 X22 X23; X30 X31 X32 X33]    (1)

vec(X) = [X00 X01 X02 X03 X10 X11 X12 X13 X20 X21 X22 X23 X30 X31 X32 X33]^T    (2)

The non-separable transform is computed as F = T * vec(X), where F denotes the 16x1 transform coefficient vector and T is a 16x16 transform matrix. The 16x1 transform coefficient vector F is then reorganized into a 4x4 block following the scanning order (horizontal, vertical, or diagonal) of the block. In the 4x4 coefficient block, the coefficients with smaller indices are placed at the smaller scan indices.
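The vectorization and matrix multiplication above can be sketched as follows (not from the patent; the 16x16 matrix used here is an identity placeholder, not a real NSST kernel):

```python
def vectorize(block4x4):
    """Row-major 16-element vector vec(X) from a 4x4 block X, as in eq. (2)."""
    return [v for row in block4x4 for v in row]

def nsst_forward(T, block4x4):
    """F = T * vec(X): 16x1 coefficient vector for a 16x16 matrix T."""
    x = vectorize(block4x4)
    return [sum(T[k][n] * x[n] for n in range(16)) for k in range(16)]

# With T = identity, the output is just the vectorized input.
identity = [[1 if i == j else 0 for j in range(16)] for i in range(16)]
X = [[i * 4 + j for j in range(4)] for i in range(4)]
assert nsst_forward(identity, X) == list(range(16))
```

A real NSST kernel T would be one of the trained 16x16 matrices selected by the intra prediction mode and the signaled NSST index described below.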

There are 35x3 non-separable secondary transforms in total for the 4x4 and 8x8 block sizes, where 35 is the number of transform sets specified for the intra prediction modes and 3 is the number of NSST candidates per intra prediction mode. The mapping from intra prediction mode to transform set is defined in Table 4 below. According to Table 4, the transform set applied to the luma/chroma transform coefficients is specified by the corresponding luma/chroma intra prediction mode. For intra prediction modes greater than 34 (diagonal prediction direction), the transform coefficient block is transposed before/after the secondary transform at the encoder/decoder.

Table 4: mapping from intra prediction mode to transform set index

[Table 4 is an image in the original publication and is not reproduced here.]

For each transform set, the selected non-separable secondary transform candidate is further specified by an explicitly signaled NSST index. The index is signaled in the bitstream once per intra-coded block, after the transform coefficients, using truncated unary binarization. The truncation value is 2 for the planar and DC modes, and 3 for the angular intra prediction modes. The NSST index is signaled only when there is more than one non-zero coefficient in the CU. When it is not signaled, its default value is zero. A value of zero for this syntax element indicates that the secondary transform is not applied to the current coding block; values 1-3 indicate which secondary transform from the set is applied.

In the benchmark set (BMS), NSST is not used for blocks coded with the transform skip mode. When the NSST index is signaled for a coding block and is not equal to zero, NSST is not used for a block of a component that is coded with the transform skip mode in the CU. When the coding blocks of all components are coded with the transform skip mode, or the number of non-zero coefficients of the coding blocks (CBs) not coded with the transform skip mode is less than 2, the NSST index is not signaled for the coding block.

Disclosure of Invention

Embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for controlling residual coding for decoding or encoding a video sequence, and aim to reduce the memory space needed to store transform bases, thereby reducing storage cost.

According to an embodiment of the present application, there is provided a method of controlling residual coding for decoding or encoding a video sequence, the method comprising:

determining whether to use a small transform size of a primary transform for residual coding of a coding block of a video sequence according to a resolution of the video sequence;

if the resolution of the video sequence is below a preset threshold, determining to use the small transform size of the primary transform, and identifying a first set of transforms as the primary transform, the first set of transforms comprising Discrete Sine Transform (DST)-4 and Discrete Cosine Transform (DCT)-4;

if the resolution of the video sequence is not below the preset threshold, determining not to use the small transform size of the primary transform, and identifying a second set of transforms as the primary transform, the second set of transforms comprising DST-7 and DCT-8; and

performing residual coding of the coding block using the identified primary transform.

According to an embodiment of the present application, there is provided an apparatus for controlling residual coding for decoding or encoding a video sequence, the apparatus comprising: a first determining module, configured to determine, according to a resolution of a video sequence, whether to use a small transform size of a primary transform for residual coding of a coding block of the video sequence; an identification module, configured to determine to use the small transform size of the primary transform if the resolution of the video sequence is below a preset threshold and identify a first set of transforms as the primary transform, the first set of transforms comprising Discrete Sine Transform (DST)-4 and Discrete Cosine Transform (DCT)-4, and to determine not to use the small transform size of the primary transform if the resolution of the video sequence is not below the preset threshold and identify a second set of transforms as the primary transform, the second set of transforms comprising DST-7 and DCT-8; and an execution module, configured to perform residual coding of the coding block using the identified primary transform.

According to an embodiment of the present application, there is provided a non-transitory computer-readable storage medium storing instructions which, when executed by a computer, cause the computer to perform the method of controlling residual encoding as described above.

According to an embodiment of the present application, there is provided a computer device comprising one or more processors and one or more memories, the one or more memories storing at least one instruction that is loaded and executed by the one or more processors to implement the above method of controlling residual coding.

In the embodiments of the present application, DST-4/DCT-4 shows coding performance similar to or better than that of DST-7/DCT-8 for low-resolution sequences, which are coded with smaller block sizes, while DST-7/DCT-8 shows better coding performance than DST-4/DCT-4 for higher-resolution sequences, which are coded with larger block sizes. Therefore, by using only the DST-4/DCT-4 bases in AMT for small transform sizes and only the DST-7/DCT-8 bases in AMT for larger transform sizes, all primary transform bases of different sizes and types can be extracted from a single transform kernel matrix. Applying AMT then requires no additional memory for storing transform bases, thereby reducing storage cost.
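The resolution-based selection described above can be sketched as follows (not from the patent's normative text; the threshold value and function name are hypothetical placeholders, since the patent only refers to "a preset threshold"):

```python
# Assumed example threshold in luma samples; the patent does not fix a value.
RESOLUTION_THRESHOLD = 1280 * 720

def select_primary_transform_set(width: int, height: int):
    """Pick the AMT primary transform set from the sequence resolution."""
    if width * height < RESOLUTION_THRESHOLD:
        # Low resolution: use the small transform size with DST-4/DCT-4 bases.
        return {"small_transform_size": True, "transforms": ("DST-4", "DCT-4")}
    # Higher resolution: no small transform size, DST-7/DCT-8 bases.
    return {"small_transform_size": False, "transforms": ("DST-7", "DCT-8")}

assert select_primary_transform_set(640, 480)["transforms"] == ("DST-4", "DCT-4")
assert select_primary_transform_set(1920, 1080)["transforms"] == ("DST-7", "DCT-8")
```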

Drawings

FIG. 1A is a schematic representation of the second, fourth, sixth, eighth basis vectors of a 64-point DCT-2 and the first, second, third, and fourth basis vectors of a 32-point DCT-8;

FIG. 1B is a schematic illustration of the second, fourth, sixth, eighth basis vectors of the 64-point DCT-2, and the first, second, third, and fourth basis vectors of the scaled 32-point DCT-8 and the inverted first, second, third, and fourth basis vectors of the scaled 32-point DCT-8 with opposite signs;

fig. 2 is a simplified block diagram of a communication system according to an embodiment;

FIG. 3 is a schematic diagram of placement of a video encoder and a video decoder in a streaming environment, according to an embodiment;

fig. 4 is a functional block diagram of a video decoder according to an embodiment;

fig. 5 is a functional block diagram of a video encoder according to an embodiment;

FIG. 6 is a flow diagram of a method of controlling residual coding for decoding or encoding a video sequence according to one embodiment;

fig. 7 is a flow diagram of a method of controlling residual coding for decoding or encoding a video sequence according to another embodiment;

fig. 8 is a simplified block diagram of an apparatus for controlling residual coding for decoding or encoding a video sequence according to an embodiment;

fig. 9 is a schematic diagram of a computer system, according to an embodiment.

Detailed Description

In some video coding standards, coupled primary and secondary transform strategies have been proposed that couple the signaling of the primary and secondary transforms. That is, AMT and NSST share one syntax element, and the NSST transform selection is decided by the AMT transform selection for the intra prediction mode. Thus, no further signaling is needed for NSST.

It has been observed that some of the odd bases of DCT-2 (the odd bases being the basis vectors associated with odd indices, the first basis vector being associated with index 0) are very similar to scaled DCT-8 bases.

FIG. 1A shows the second, fourth, sixth, and eighth basis vectors of a 64-point DCT-2 as solid curves, together with the first, second, third, and fourth basis vectors of a 32-point DCT-8, scaled by a constant (the constant appears as an image in the original publication and is not reproduced here), as dashed curves. According to FIG. 1A, the first halves of the second, fourth, sixth, and eighth basis vectors of the 64-point DCT-2 are very close to the first, second, third, and fourth basis vectors of the scaled 32-point DCT-8.

Based on this observation, if the first half of each odd basis vector of the 2N-point DCT-2 is replaced with the N-point scaled DCT-8 basis, and the second half is filled with the flipped DCT-8 basis with the opposite sign, the resulting basis is very close to the original odd basis of the 2N-point DCT-2, as shown in FIG. 1B.

Specifically, FIG. 1B shows the second, fourth, sixth, eighth basis vectors (solid curves) of a 64-point DCT-2, the first, second, third, and fourth basis vectors (dashed curves) of a scaled 32-point DCT-8, the first, second, third, and fourth basis vectors (dotted curves) of the inverse of the scaled 32-point DCT-8.

Thus, by embedding the N-point DCT-8 (or the DST-7, because the DST-7 and DCT-8 bases are symmetric to each other) into the 2N-point DCT-2, a new orthogonal transform, the Complex Orthogonal Transform (COT), can be derived. Because of the symmetric/anti-symmetric properties of DCT-2 (feature #1 and feature #2 above), the resulting transform is still an orthogonal transform.
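The closeness noted above can be checked numerically. The following sketch is not part of the patent; it uses the standard orthonormal trigonometric definitions of DCT-2 and DCT-8, and the function names are illustrative. It compares the first half of the odd basis vectors of a 64-point DCT-2 with the 32-point DCT-8 basis vectors:

```python
import numpy as np

def dct2_basis(N):
    # Orthonormal DCT-2 matrix: row k is the k-th basis vector.
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    T[0] /= np.sqrt(2.0)
    return T

def dct8_basis(N):
    # Orthonormal DCT-8 matrix.
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    return (2.0 / np.sqrt(2 * N + 1)) * np.cos(
        np.pi * (2 * k + 1) * (2 * n + 1) / (4 * N + 2))

N = 32
T2 = dct2_basis(2 * N)   # 64-point DCT-2
T8 = dct8_basis(N)       # 32-point DCT-8
for k in range(4):
    half = T2[2 * k + 1, :N]   # first half of the (2k+1)-th basis vector
    sim = abs(half @ T8[k]) / (np.linalg.norm(half) * np.linalg.norm(T8[k]))
    assert sim > 0.95          # nearly proportional to the DCT-8 basis vector
```

The cosine frequencies differ only in their denominators (2N versus 2N+1 terms), which is why the cosine similarity between the vector pairs is close to 1.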

In this way, by embedding the N-point DCT-8/DST-7 into the 2N-point DCT-2, the complexity of performing AMT is reduced in two ways:

1. the logic to perform the transforms for DCT-2 and DCT-8/DST-7 may be shared.

2. The on-chip memory cost for storing DCT-8 and DST-7 may be reduced because they are part of the transform bases applied when AMT is not selected.

Fig. 2 is a simplified block diagram of a communication system (200) according to one embodiment. The communication system (200) comprises at least two terminals (210-220) interconnected by a network (250). For unidirectional transmission of data, the first terminal (210) may encode video data at a local location for transmission over the network (250) to the other terminal (220). The second terminal (220) may receive the video data encoded by the first terminal from the network (250), decode the encoded data, and present the recovered video data.

Fig. 2 shows a second pair of terminals (230, 240) providing support for bi-directional transmission of encoded video, which may occur, for example, during a video conference. For bi-directional transmission of data, each terminal (230, 240) may encode locally acquired video data for transmission to other terminals over the network (250). Each terminal (230, 240) may also receive encoded video data transmitted by other terminals, decode the encoded data, and display the recovered video data on a local display device.

In FIG. 2, the terminals (210-240) may be servers, personal computers, or smart phones, but the principles of the embodiments are not limited thereto. Embodiments are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network (250) represents any number of networks that convey encoded video data between the terminals (210-240), including, for example, wired and/or wireless communication networks. The communication network (250) may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the internet. For purposes of this application, the architecture and topology of the network (250) may be immaterial to the operation of the embodiments, unless explained below.

Fig. 3 illustrates placement of a video encoder and a video decoder in a streaming environment, according to an embodiment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV, storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.

The streaming system may include an acquisition subsystem (313), which may include a video source (301), such as a digital camera, that creates an uncompressed stream of video samples (302). The sample stream (302) is depicted as a thick line to emphasize its high data volume compared to an encoded video bitstream. The sample stream (302) can be processed by an encoder (303) coupled to the camera (301). The encoder (303) may comprise hardware, software, or a combination of hardware and software to implement or embody aspects of the disclosed subject matter as described in more detail below. The encoded video bitstream (304) is depicted as a thin line compared to the sample stream to emphasize its lower data volume, and may be stored on a streaming server (305) for future use. One or more streaming clients (306, 308) may access the streaming server (305) to retrieve copies (307, 309) of the encoded video bitstream (304). The client (306) may include a video decoder (310), the video decoder (310) decoding an incoming copy of the encoded video bitstream and generating an output stream of video samples (311) that may be presented on a display (312) or another presentation device (not depicted). In some streaming systems, the video bitstreams (304, 307, 309) may be encoded according to a video encoding/compression standard. Examples of such standards include ITU-T H.265. The video coding standard under development is informally referred to as VVC, and the present application may be used in the context of the VVC standard.

Fig. 4 is a functional block diagram of a video decoder (310) according to an embodiment.

The receiver (410) may receive one or more encoded video sequences to be decoded by the decoder (310); in the same or another embodiment, the encoded video sequences are received one at a time, wherein each encoded video sequence is decoded independently of the other encoded video sequences. The encoded video sequence may be received from a channel (412), which may be a hardware/software link to a storage device that stores encoded video data. The receiver (410) may receive encoded video data together with other data, e.g., encoded audio data and/or auxiliary data streams, which may be forwarded to their respective usage entities (not indicated). The receiver (410) may separate the encoded video sequence from the other data. To counter network jitter, a buffer memory (415) may be coupled between the receiver (410) and an entropy decoder/parser (420) (hereinafter "parser"). In some applications, the buffer memory (415) is part of the video decoder (310). The buffer memory (415) may not be needed, or may be made small, when the receiver (410) receives data from a store/forward device with sufficient bandwidth and controllability or from an isochronous network. For use over best-effort packet networks such as the internet, a buffer memory (415) may be required, which may be relatively large and may be of an adaptive size.

The video decoder (310) may include a parser (420) to reconstruct symbols (421) from the entropy-encoded video sequence. The categories of these symbols include information for managing the operation of the video decoder (310), as well as potential information to control a display device, such as a display (312), which is not an integral part of the decoder but may be coupled to the decoder, as shown in fig. 4. The control information for the display device may be parameter set fragments (not depicted) of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI). The parser (420) may parse/entropy-decode the received encoded video sequence. The encoding of the encoded video sequence may be performed in accordance with video coding techniques or standards and may follow principles generally known to those skilled in the art, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so on. The parser (420) may extract a subgroup parameter set for at least one of the subgroups of pixels in the video decoder from the encoded video sequence based on at least one parameter corresponding to the group. A subgroup may include a Group of Pictures (GOP), a picture, a tile, a slice, a macroblock, a Coding Unit (CU), a block, a Transform Unit (TU), a Prediction Unit (PU), and so on. The entropy decoder/parser may also extract information from the encoded video sequence, such as transform coefficients, quantizer parameter (QP) values, motion vectors, and so on.

The parser (420) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (415) to create the symbols (421). A parser (420) may receive the encoded data and selectively decode particular symbols (421). Further, the parser (420) may determine whether to provide the particular symbol (421) to the motion compensated prediction unit (453), the scaler/inverse transform unit (451), the intra prediction unit (452), or the loop filter unit (454).

The reconstruction of the symbol (421) may involve a number of different units depending on the type of the encoded video picture or portion of the encoded video picture (e.g., inter and intra pictures, inter and intra blocks), among other factors. Which units are involved and the way in which they are involved can be controlled by subgroup control information parsed from the coded video sequence by a parser (420). For the sake of brevity, such a subgroup control information flow between parser (420) and a plurality of units below is not described.

In addition to the functional blocks already mentioned, the video decoder (310) may be conceptually subdivided into several functional units as described below. In a practical embodiment operating under business constraints, many of these units interact closely with each other and may be integrated with each other. However, for the purposes of describing the disclosed subject matter, a conceptual subdivision into the following functional units is appropriate.

The first unit is the scaler/inverse transform unit (451). The scaler/inverse transform unit (451) receives the quantized transform coefficients as symbols (421) from the parser (420), along with control information including which transform scheme to use, the block size, the quantization factor, the quantization scaling matrix, and so on. The scaler/inverse transform unit (451) may output blocks comprising sample values, which may be input into the aggregator (455).

In some cases, the output samples of the scaler/inverse transform unit (451) may belong to an intra-coded block; that is, a block that does not use predictive information from previously reconstructed pictures but can use predictive information from previously reconstructed portions of the current picture. Such predictive information may be provided by the intra prediction unit (452). In some cases, the intra prediction unit (452) uses surrounding reconstructed information extracted from the current (partially reconstructed) picture (456) to generate a block of the same size and shape as the block being reconstructed. In some cases, the aggregator (455) adds, on a per-sample basis, the prediction information generated by the intra prediction unit (452) to the output sample information provided by the scaler/inverse transform unit (451).

In other cases, the output samples of the scaler/inverse transform unit (451) may belong to an inter-coded and potentially motion-compensated block. In this case, the motion compensated prediction unit (453) may access the reference picture memory (457) to extract samples for prediction. After motion compensation of the extracted samples according to the symbols (421) pertaining to the block, these samples may be added by the aggregator (455) to the output of the scaler/inverse transform unit (in this case called the residual samples or residual signal), thereby generating output sample information. The addresses within the reference picture memory from which the motion compensation unit fetches prediction samples may be controlled by motion vectors, available to the motion compensation unit in the form of symbols (421) that may include, for example, X, Y, and reference picture components. Motion compensation may also include interpolation of sample values fetched from the reference picture memory when sub-sample-accurate motion vectors are in use, motion vector prediction mechanisms, and so on.

The output samples of the aggregator (455) may be subjected to various loop filtering techniques in the loop filter unit (454). The video compression techniques may include in-loop filter techniques that are controlled by parameters included in the encoded video bitstream and made available to the loop filter unit (454) as symbols (421) from the parser (420). However, in other embodiments, the video compression techniques may also be responsive to meta-information obtained during decoding of previous (in decoding order) portions of the encoded picture or encoded video sequence, as well as to previously reconstructed and loop-filtered sample values.

The output of the loop filter unit (454) may be a sample stream that may be output to a display device (312) and stored in a current picture buffer (456) for subsequent inter picture prediction.

Once fully reconstructed, some of the coded pictures may be used as reference pictures for future prediction. Once the encoded picture is fully reconstructed and the encoded picture is identified as a reference picture (by, for example, parser (420)), current picture buffer (456) may become part of reference picture memory (457) and a new current picture buffer may be reallocated before reconstruction of a subsequent encoded picture begins.

The video decoder (310) may perform decoding operations according to a predetermined video compression technique, such as may be documented in the ITU-T H.265 standard. The encoded video sequence may conform to the syntax specified by the video compression technique or standard in use, in the sense that it adheres to the syntax of that technique or standard as specified in its technology document, and specifically to the profiles therein. For compliance, the complexity of the encoded video sequence is also required to be within the bounds defined by the level of the video compression technique or standard. In some cases, the level restricts the maximum picture size, the maximum frame rate, the maximum reconstruction sampling rate (measured in units of, e.g., megasamples per second), the maximum reference picture size, and so on. In some cases, the limits set by the level may be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the encoded video sequence.

In an embodiment, the receiver (410) may receive additional (redundant) data along with the encoded video. The additional data may be part of an encoded video sequence. The additional data may be used by the video decoder (310) to properly decode the data and/or more accurately reconstruct the original video data. The additional data may be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so forth.

Fig. 5 is a functional block diagram of a video encoder (303) according to an embodiment.

The encoder (303) may receive video samples from a video source (301) (not part of the decoder) that may capture video images to be encoded by the encoder (303).

The video source (301) may provide the source video sequence to be encoded by the encoder (303) in the form of a stream of digital video samples, which may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, ...), any color space (e.g., BT.601 Y CrCb, RGB, ...), and any suitable sampling structure (e.g., Y CrCb 4:2:0, Y CrCb 4:4:4). In a media serving system, the video source (301) may be a storage device that stores previously prepared video. In a video conferencing system, the video source (301) may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as spatial arrays of pixels, wherein each pixel may comprise one or more samples depending on the sampling structure, color space, and so on in use. The relationship between pixels and samples can be readily understood by those skilled in the art. The description below focuses on samples.

According to an embodiment, the encoder (303) may encode and compress the pictures of the source video sequence into an encoded video sequence (543) in real time or under any other temporal constraints required by the application. Enforcing the appropriate encoding speed is one function of the controller (550). The controller controls other functional units as described below and is functionally coupled to these units. For simplicity, the couplings are not labeled in the figures. The parameters set by the controller (550) may include rate-control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, etc.), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so on. Other functions of the controller (550) may be readily identified by those skilled in the art, as they may pertain to the video encoder (303) being optimized for a particular system design.

Some video encoders operate in what a person skilled in the art readily recognizes as an "encoding loop". As an oversimplified description, the encoding loop may consist of the encoding part of an encoder (530) (hereinafter "source encoder"), responsible for creating symbols based on the input picture to be encoded and the reference picture(s), and a (local) decoder (533) embedded in the encoder (303). The decoder (533) reconstructs the symbols to create the sample data that a (remote) decoder would also create (since, in the video compression techniques contemplated by the disclosed subject matter, any compression between the symbols and the encoded video bitstream is lossless). The reconstructed sample stream is input to the reference picture memory (534). Since the decoding of the symbol stream produces bit-accurate results independent of decoder location (local or remote), the reference picture buffer content also corresponds bit-accurately between the local encoder and the remote encoder. In other words, the reference picture samples that the prediction part of the encoder "sees" are identical to the sample values that a decoder would "see" when using prediction during decoding. This reference picture synchronicity principle (and the drift that results if synchronicity cannot be maintained, e.g., because of channel errors) is well known to a person skilled in the art.

The operation of the "local" decoder (533) may be the same as that of the "remote" decoder (310), which has been described in detail above in connection with fig. 4. However, referring briefly also to fig. 4, because symbols are available and the encoding/decoding of symbols to/from an encoded video sequence by the entropy encoder (545) and the parser (420) can be lossless, the entropy decoding portions of the decoder (310), including the channel (412), receiver (410), buffer memory (415), and parser (420), may not be fully implemented in the local decoder (533).

At this point it can be observed that any decoder technique other than the parsing/entropy decoding present in the decoder must also be present in the corresponding encoder in substantially the same functional form. The description of the encoder techniques may be simplified because the encoder techniques are reciprocal to the fully described decoder techniques. A more detailed description is only needed in certain areas and is provided below.

As part of the operation, the source encoder (530) may perform motion compensated predictive coding. The motion compensated predictive coding predictively codes the input frame with reference to one or more previously coded frames from the video sequence that are designated as "reference frames". In this way, the encoding engine (532) encodes differences between blocks of pixels of an input frame and blocks of pixels of a reference frame, which may be selected as a prediction reference for the input frame.

The local video decoder (533) may decode encoded video data of frames that may be designated as reference frames, based on the symbols created by the source encoder (530). The operation of the encoding engine (532) may be a lossy process. When the encoded video data is decoded at a video decoder (not shown in fig. 4), the reconstructed video sequence typically may be a replica of the source video sequence with some errors. The local video decoder (533) replicates the decoding processes that may be performed by the video decoder on reference frames, and may cause reconstructed reference frames to be stored in the reference picture memory (534). In this manner, the encoder (303) may locally store copies of reconstructed reference frames that have common content (absent transmission errors) with the reconstructed reference frames to be obtained by a remote video decoder.

The predictor (535) may perform prediction searches for the coding engine (532). That is, for a new frame to be encoded, the predictor (535) may search the reference picture memory (534) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on, that may serve as appropriate prediction references for the new picture. The predictor (535) may operate on a block-by-block basis of samples to find appropriate prediction references. In some cases, as determined from search results obtained by the predictor (535), the input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (534).

The controller (550) may manage encoding operations of the video encoder (530), including, for example, setting parameters and subgroup parameters for encoding video data.

The outputs of all of the above functional units may be entropy encoded in an entropy encoder (545). The entropy encoder losslessly compresses the symbols generated by the various functional units according to techniques known to those skilled in the art, such as huffman coding, variable length coding, arithmetic coding, etc., to convert the symbols into an encoded video sequence.

The transmitter (540) may buffer the encoded video sequence created by the entropy encoder (545) in preparation for transmission over a communication channel (560), which may be a hardware/software link to a storage device that may store encoded video data. The transmitter (540) may combine the encoded video data from the video encoder (530) with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (sources not shown).

The controller (550) may manage the operation of the encoder (303). During encoding, the controller (550) may assign to each encoded picture a certain encoded picture type, which may affect the encoding techniques applicable to the respective picture. For example, pictures may often be assigned one of the following frame types:

Intra pictures (I pictures), which may be pictures that can be encoded and decoded without using any other frame in the sequence as a prediction source. Some video codecs allow different types of intra pictures, including, for example, independent decoder refresh pictures. A person skilled in the art is aware of those variants of I pictures and their respective applications and features.

Predictive pictures (P pictures), which may be pictures that may be encoded and decoded using intra prediction or inter prediction that uses at most one motion vector and reference index to predict sample values of each block.

Bi-predictive pictures (B-pictures), which may be pictures that can be encoded and decoded using intra-prediction or inter-prediction that uses at most two motion vectors and reference indices to predict sample values of each block. Similarly, multiple predictive pictures may use more than two reference pictures and associated metadata for reconstructing a single block.

A source picture may typically be spatially subdivided into blocks of samples (e.g., blocks of 4x4, 8x8, 4x8, or 16x16 samples) and encoded block-wise. These blocks may be predictively encoded with reference to other (encoded) blocks that are determined according to the encoding allocation applied to their respective pictures. For example, a block of an I picture may be non-predictive encoded, or the block may be predictive encoded (spatial prediction or intra prediction) with reference to an already encoded block of the same picture. The pixel block of the P picture may be non-predictively encoded by spatial prediction or by temporal prediction with reference to one previously encoded reference picture. A block of a B picture may be non-predictively encoded by spatial prediction or by temporal prediction with reference to one or two previously encoded reference pictures.

The video encoder (303) may perform encoding operations according to a predetermined video encoding technique or standard, such as the ITU-T h.265 recommendation. In operation, the video encoder (303) may perform various compression operations, including predictive encoding operations that exploit temporal and spatial redundancies in the input video sequence. Thus, the encoded video data may conform to syntax specified by the video coding technique or standard used.

In an embodiment, the transmitter (540) may transmit the additional data and the encoded video. The video encoder (530) may accommodate such data as part of an encoded video sequence. The additional data includes temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, etc., auxiliary enhancement information (SEI) messages, Video Usability Information (VUI) parameter set fragments, etc.

In this description, the DST-4 basis and the DCT-4 basis are flipped (left-right) versions of each other, with some basis vectors additionally sign-changed, according to the mathematical definitions of DST-4 and DCT-4. However, even if the sign change is eliminated, such a basis may still be referred to as DST-4/DCT-4, because the sign change does not impact coding performance. Likewise, the DST-7 basis and the DCT-8 basis are flipped (left-right) versions of each other, with some basis vectors additionally sign-changed, according to the mathematical definitions of DST-7 and DCT-8. However, even if the sign change is eliminated, such a basis may still be referred to as DST-7/DCT-8, because the sign change does not impact coding performance.

In this description, when an N-point DST-7/DCT-8 (or an N-point COT) is embedded into a 2N-point DCT-2, the resulting transform kernel is also called a COT, which is a 2N-point transform. For example, a 32-point COT may consist of a 16-point DCT-2 transform and a 16-point DST-7/DCT-8 transform, and a 64-point COT may consist of a 32-point DCT-2 and a 32-point COT, or a 32-point DCT-2 and a 32-point DST-7/DCT-8.

For low-resolution (e.g., standard definition) sequences, to which smaller block size coding is applied, DST-4/DCT-4 exhibits coding performance similar to or better than DST-7/DCT-8. For higher-resolution (e.g., high definition) sequences, to which larger block size coding is applied, DST-4/DCT-4 shows worse coding performance than DST-7/DCT-8. For example, if the resolution of the video sequence is below a preset threshold, the small transform size (i.e., DST-4/DCT-4) is used for residual coding of a coding block of the video sequence; if the resolution of the video sequence is not below the preset threshold, the small transform size is not used, and DST-7/DCT-8 is used for residual coding of coding blocks of the video sequence.
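The resolution-based rule above can be sketched as follows; the threshold value and the function name are illustrative assumptions, not taken from the patent:

```python
def select_amt_transform_set(width, height, threshold_pixels=1280 * 720):
    """Return the AMT primary transform set for a sequence's resolution.

    The pixel-count threshold is a hypothetical stand-in for the patent's
    unspecified "preset threshold".
    """
    if width * height < threshold_pixels:
        return ("DST-4", "DCT-4")   # small transform sizes for low resolution
    return ("DST-7", "DCT-8")       # otherwise the DST-7/DCT-8 set

# Usage: a standard-definition sequence selects the DST-4/DCT-4 set,
# while a high-definition sequence selects the DST-7/DCT-8 set.
assert select_amt_transform_set(640, 480) == ("DST-4", "DCT-4")
assert select_amt_transform_set(1920, 1080) == ("DST-7", "DCT-8")
```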

Accordingly, in an embodiment, only DST-4/DCT-4 bases are used in AMT for small transform sizes, and only DST-7/DCT-8 bases are used in AMT for larger transform sizes. All primary transformation bases of different sizes and types may be extracted from one single transformation core matrix (e.g., one 64x64 transformation core matrix), and thus, by applying AMT, no additional memory for storing the transformation bases is required.

When AMT is not selected, only the DCT-2/COT basis is used for the primary transform. In one embodiment, for transform sizes less than or equal to 16 points, only DCT-2 is used as the primary transform when the AMT is not selected. For a transform size equal to 32 points, when AMT is not selected, only COT whose transform kernel is composed of the basis of 16-point DCT-2 and 16-point DCT-8 is used as the primary transform. For a transform size equal to 64 points, when AMT is not selected, only COT whose transform kernel is composed of the basis of 32-point DCT-2 and 32-point DCT-8 is used as the primary transform. In another embodiment, for a transform size equal to 64 points, when AMT is not selected, only COT whose transform kernel consists of the bases of 32-point DCT-8 and 32-point COT is used as the primary transform.
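The size-dependent choice in the first embodiment above may be sketched as follows (the function name and the string labels are illustrative, not part of the patent):

```python
def primary_transform_without_amt(size):
    """Primary transform choice when AMT is not selected (first embodiment)."""
    if size <= 16:
        return "DCT-2"
    if size == 32:
        # COT whose kernel combines 16-point DCT-2 and 16-point DCT-8 bases.
        return "COT(16-point DCT-2 + 16-point DCT-8)"
    if size == 64:
        # COT whose kernel combines 32-point DCT-2 and 32-point DCT-8 bases.
        return "COT(32-point DCT-2 + 32-point DCT-8)"
    raise ValueError("unsupported transform size")

assert primary_transform_without_amt(8) == "DCT-2"
```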

Small transform sizes may include, but are not limited to, 2-point, 4-point, and/or 8-point.

Larger transform sizes may include, but are not limited to, transform sizes greater than or equal to 8 points. In another embodiment, the larger transform size may include, but is not limited to, a transform size greater than or equal to 16 points.

In an embodiment, only one transform set DST-4, DCT-4 is used in AMT for small transform sizes.

In an embodiment, only one transform set DST-7, DCT-8 is used in the AMT for larger transform sizes.

In an embodiment, only one set of transforms DST-7, DCT-8 is used in the AMT to encode the inter prediction residual block.

In an embodiment, only one set of transforms DST-7, DCT-8 is used in the AMT to encode the intra prediction residual block.

In an embodiment, for one coding block, DST-7 or DST-4 may be selected as the horizontal/vertical transform, and DCT-8 or DCT-4 may be selected as the horizontal/vertical transform, the selection depending on the intra prediction mode.

In one embodiment, only one set of transforms DST-7, DCT-8 is used in AMT to encode the chroma intra prediction residual block. In another embodiment, only one set of transforms DST-7, DCT-8 is used in the AMT to encode the chroma inter prediction residual block.

In an embodiment, when signaling for primary transform (AMT) and secondary transform (NSST) is coupled, the secondary transform selection is coupled with DST-4/DCT-4 for small transform sizes and the secondary transform selection is coupled with DST-7/DCT-8 for larger transform sizes.

The same embodiments described may also be applied to the primary transform in the coupled primary and secondary transforms.

For certain block sizes, the secondary transform may be disabled. In one embodiment, the certain block sizes include 4x4. In another embodiment, the certain block sizes include 4x8 and 8x4.

When the primary transform and the secondary transform would be applied to the same number of residuals/coefficients, the secondary transform may not be allowed. In one embodiment, when the primary transform size is 8x8, the secondary transform size may be only 4x4, 4x8, or 8x4, applied to the lowest-frequency 4x4, 4x8, or 8x4 transform coefficient region. In another embodiment, when the primary transform size is 4x8 or 8x4, the secondary transform size may be only 4x4.

A 2x2 NSST may be applied to a 4x4 block, the 2x2 NSST being associated with a 4x4 transform kernel matrix.
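The secondary transform size constraints above may be sketched as follows. The mapping follows the embodiments described; the fallback for other primary sizes is an assumption for illustration only:

```python
def allowed_secondary_sizes(primary_w, primary_h):
    """Secondary (NSST) sizes allowed for a primary transform of w x h."""
    if (primary_w, primary_h) == (4, 4):
        return []                          # secondary transform disabled
    if (primary_w, primary_h) == (8, 8):
        # Applied to the lowest-frequency coefficient region.
        return [(4, 4), (4, 8), (8, 4)]
    if (primary_w, primary_h) in ((4, 8), (8, 4)):
        return [(4, 4)]
    return [(4, 4)]  # assumption: other sizes use a 4x4 secondary transform

assert allowed_secondary_sizes(8, 8) == [(4, 4), (4, 8), (8, 4)]
```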

Fig. 6 is a flow diagram of a method (600) of controlling residual coding for decoding or encoding a video sequence according to one embodiment. In some embodiments, one or more of the steps of fig. 6 may be performed by the decoder (310). In some embodiments, one or more of the steps of fig. 6 may be performed by another device or group of devices, such as encoder (303), separate from or including decoder (310).

Referring to fig. 6, in step 610, the method (600) includes determining whether to use a small transform size of a primary transform for residual coding of a coding block of a video sequence according to a resolution of the video sequence.

If the resolution of the video sequence is below a preset threshold, it is determined that the small transform size of the primary transform is used (610-yes), and in step 620, the method (600) includes identifying a first set of transforms as the primary transform, the first set of transforms including a Discrete Sine Transform (DST)-4 and a Discrete Cosine Transform (DCT)-4.

If the resolution of the video sequence is not below the preset threshold, it is determined not to use the small transform size of the primary transform (610-no), and in step 630 the method (600) includes identifying a second set of transforms as the primary transform, the second set of transforms including DST-7 and DCT-8.

In step 640, the method (600) comprises performing residual coding of the coding block using the identified primary transform.
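A minimal sketch of steps 610 through 640 follows; the pixel-count threshold is a hypothetical placeholder, since the preset threshold's value is not specified above:

```python
PRESET_THRESHOLD = 1280 * 720  # hypothetical threshold in luma samples

def select_primary_transform(width, height):
    """Steps 610-630: pick the primary transform set from the resolution
    of the video sequence (width x height in samples)."""
    if width * height < PRESET_THRESHOLD:     # step 610: yes branch
        return ("DST-4", "DCT-4")             # step 620: small transform size
    return ("DST-7", "DCT-8")                 # step 630: larger transform size
```

Step 640 would then apply the returned transform pair when coding the residual of each coding block.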

A small transform size may be a transform kernel matrix of 8 points or less.

The large transform size of the primary transform may be a transform kernel matrix larger than 8 points.

Alternatively, the large transform size of the primary transform may be a transform kernel matrix of 16 points or more.

The performing of residual coding may comprise coding a residual block of an intra- or inter-coded block.

Fig. 7 is a flowchart of a method (700) of controlling residual coding for decoding or encoding a video sequence according to another embodiment. In some embodiments, one or more of the steps of fig. 7 may be performed by the decoder (310). In some embodiments, one or more of the steps of fig. 7 may be performed by another device or group of devices, such as encoder (303), separate from or including decoder (310).

Referring to fig. 7, in step 710, the method (700) comprises determining whether a secondary transform is used for residual coding of the coding block, the residual coding of the coding block being performed using the identified primary transform.

If it is determined to use the secondary transform (710-yes), in step 720 the method (700) includes determining whether to use a small transform size of the secondary transform for residual coding of the coding block.

If it is determined to use the small transform size of the secondary transform (720-yes), in step 730 a first set of transforms including DST-4 and DCT-4 is identified as the secondary transform.

If it is determined not to use the small transform size of the secondary transform (720-no), in step 740 a second set of transforms including DST-7 and DCT-8 is identified as the secondary transform.

In step 750, the method (700) comprises performing residual coding of the coding block using the identified secondary transform, the residual coding of the coding block being performed using the identified primary transform.
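The decision sequence of steps 710 through 740 can be sketched as follows; the boolean flag names are hypothetical, standing in for whatever signaled or derived conditions the codec uses:

```python
def select_secondary_transform(use_secondary, use_small_size):
    """Steps 710-740: pick the secondary transform set, or None if no
    secondary transform is used for the coding block."""
    if not use_secondary:                 # step 710: no branch
        return None
    if use_small_size:                    # step 720: yes branch
        return ("DST-4", "DCT-4")         # step 730
    return ("DST-7", "DCT-8")             # step 740
```

Step 750 would then apply the returned pair on top of the primary transform selected earlier.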

The primary transform may be an adaptive multiple transform (AMT) and the secondary transform may be a non-separable secondary transform (NSST).

Although fig. 6 and 7 show exemplary steps of methods (600) and (700), in some embodiments, methods (600) and (700) may include additional steps, fewer steps, different steps, or steps in different permutations than those described in fig. 6 and 7. Additionally or alternatively, two or more steps of methods (600) and (700) may be performed in parallel.

Further, the proposed method may be implemented by a processing circuit (e.g. one or more processors or one or more integrated circuits). In an embodiment, one or more processors execute a program stored in a non-transitory computer readable medium to perform one or more of the proposed methods.

Fig. 8 is a simplified block diagram of an apparatus (800) for controlling residual coding for decoding or encoding a video sequence, according to an embodiment.

Referring to fig. 8, the apparatus (800) includes first determining code (810), identifying code (820), execution code (830), and second determining code (840).

A first determining code (810) is for causing at least one processor to determine whether to use a small transform size of a primary transform for residual coding of a coding block of a video sequence according to a resolution of the video sequence.

The identifying code (820) is for causing the at least one processor to: identify a first set of transforms as the primary transform if the resolution of the video sequence is below a preset threshold, such that the small transform size of the primary transform is determined to be used, the first set of transforms comprising a Discrete Sine Transform (DST)-4 and a Discrete Cosine Transform (DCT)-4; and identify a second set of transforms as the primary transform if the resolution of the video sequence is not below the preset threshold, such that the small transform size of the primary transform is determined not to be used, the second set of transforms comprising DST-7 and DCT-8.

The execution code (830) is for causing the at least one processor to perform residual coding of the coded block using the identified primary transform.

A small transform size may be a transform kernel matrix of 8 points or less.

The large transform size of the primary transform may be a transform kernel matrix larger than 8 points.

Alternatively, the large transform size of the primary transform may be a transform kernel matrix of 16 points or more.

The second determining code (840) is for causing the at least one processor to determine whether to use a secondary transform for residual coding of the coding block, wherein residual coding of the coding block is performed using the identified primary transform. If it is determined to use the secondary transform, the first determining code (810) further causes the at least one processor to determine whether to use a small transform size of the secondary transform for residual coding of the coding block. The identifying code (820) is further for causing the at least one processor to identify a first set of transforms comprising DST-4 and DCT-4 as the secondary transform if it is determined to use the small transform size of the secondary transform, and to identify a second set of transforms comprising DST-7 and DCT-8 as the secondary transform if it is determined not to use the small transform size of the secondary transform. The execution code (830) is further for causing the at least one processor to perform residual coding of the coding block using the identified secondary transform, the residual coding of the coding block being performed using the identified primary transform.

The primary transform may be an adaptive multiple transform (AMT) and the secondary transform may be a non-separable secondary transform (NSST).

The execution code (830) may further be for causing the at least one processor to code a residual block of an intra- or inter-coded block.

An embodiment of the present application further provides an apparatus for controlling residual coding, where the residual coding is used for decoding or encoding a video sequence, and the apparatus includes: the device comprises a first determination module, an identification module and an execution module.

The first determination module is used for determining whether to use the small transform size of the main transform for residual coding of a coding block of the video sequence according to the resolution of the video sequence;

an identification module, configured to: if the resolution of the video sequence is below a preset threshold, determine to use the small transform size of the primary transform and identify a first set of transforms as the primary transform, the first set of transforms comprising a Discrete Sine Transform (DST)-4 and a Discrete Cosine Transform (DCT)-4; and if the resolution of the video sequence is not below the preset threshold, determine not to use the small transform size of the primary transform and identify a second set of transforms as the primary transform, the second set of transforms comprising DST-7 and DCT-8;

an execution module is configured to perform residual coding of the coding block using the identified primary transform.

The present application further provides a computer device, which includes one or more processors and one or more memories, where at least one instruction is stored in the one or more memories, and the at least one instruction is loaded and executed by the one or more processors to implement the method for controlling residual coding according to the foregoing embodiments.

The techniques described above may be implemented as computer software using computer readable instructions and physically stored in one or more computer readable media.

Fig. 9 is a diagram of a computer system (900) according to an embodiment.

The computer software may be coded in any suitable machine code or computer language, which may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed directly by a computer central processing unit (CPU), graphics processing unit (GPU), etc., or through interpretation, microcode execution, and the like.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablets, servers, smartphones, gaming devices, internet of things devices, and so forth.

The components illustrated in FIG. 9 for the computer system (900) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing the embodiments. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (900).

The computer system (900) may include some human interface input devices. The human interface input device may respond to input from one or more human users through tactile input (e.g., keyboard input, swipe, data glove movement), audio input (e.g., sound, applause), visual input (e.g., gestures), olfactory input (not shown). The human interface device may also be used to capture certain media that are not necessarily directly related to human conscious input, such as audio (e.g., speech, music, ambient sounds), images (e.g., scanned images, photographic images obtained from still-image cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

The human interface input devices may include one or more of the following (only one of each is depicted): keyboard (901), mouse (902), touch pad (903), touch screen (910), data glove (not shown), joystick (905), microphone (906), scanner (907), camera (908).

The computer system (900) may also include certain human interface output devices. The human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. The human interface output devices may include tactile output devices (e.g., tactile feedback through a touch screen (910), data glove (not shown), or joystick (905), though there may also be tactile feedback devices that are not input devices), audio output devices (e.g., speakers (909), headphones (not shown)), visual output devices (e.g., screens (910), including cathode ray tube screens, liquid crystal screens, plasma screens, and organic light emitting diode screens, each with or without touch screen input capability and with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or output of more than three dimensions through means such as stereoscopic output; virtual reality glasses (not shown), holographic displays, and smoke tanks (not shown)), and printers (not shown).

The computer system (900) may also include human-accessible storage devices and their associated media such as optical media including media (921) such as CD/DVD ROM/RW (920) with CD/DVD, thumb drive (922), removable hard drive or solid state drive (923), conventional magnetic media such as magnetic tape and floppy disk (not shown), ROM/ASIC/PLD based proprietary devices such as secure dongle (not shown), and so forth.

Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the subject matter of the present disclosure does not include transmission media, carrier waves, or other transitory signals.

The computer system (900) may also include an interface (954) to one or more communication networks. The networks may be, for example, wireless, wired, or optical. The networks may further be local area, wide area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet, wireless local area networks, cellular networks (including global system for mobile communications (GSM), third generation (3G), fourth generation (4G), fifth generation (5G), Long Term Evolution (LTE), etc.), television wired or wireless wide area digital networks (including cable television, satellite television, and terrestrial broadcast television), vehicular and industrial networks (including CANBus), and so forth. Certain networks typically require external network interface adapters attached to certain general-purpose data ports or peripheral buses (949) (e.g., USB ports of the computer system (900)); others are typically integrated into the core of the computer system by attachment to a system bus as described below (e.g., an Ethernet interface into a PC computer system, or a cellular network interface into a smartphone computer system). Using any of these networks, the computer system (900) may communicate with other entities. Such communications may be unidirectional, receive-only (e.g., broadcast television), unidirectional, send-only (e.g., CANBus to certain CANBus devices), or bidirectional, for example to other computer systems over local or wide area digital networks. Each of the networks and network interfaces described above may use certain protocols and protocol stacks.

The aforementioned human interface device, human accessible storage device and network interface may be connected to the core (940) of the computer system (900).

The core (940) may include one or more central processing units (CPUs) (941), graphics processing units (GPUs) (942), specialized programmable processing units in the form of field programmable gate arrays (FPGAs) (943), hardware accelerators (944) for certain tasks, and so forth. These devices, along with read-only memory (ROM) (945), random access memory (RAM) (946), and internal mass storage (947) (e.g., internal non-user-accessible hard drives, solid state drives (SSDs), etc.), may be connected via a system bus (948). In some computer systems, the system bus (948) may be accessible in the form of one or more physical plugs so as to be extensible by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus (948) or through a peripheral bus (949). Architectures for a peripheral bus include Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), and the like.

The CPU (941), GPU (942), FPGA (943) and accelerator (944) may execute certain instructions, which in combination may constitute the computer code described above. The computer code may be stored in ROM (945) or RAM (946). Transitional data may also be stored in RAM (946) while persistent data may be stored in, for example, internal mass storage (947). Fast storage and retrieval of any memory device may be achieved through the use of cache memory, which may be closely associated with one or more of CPU (941), GPU (942), mass storage (947), ROM (945), RAM (946), and the like.

The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the embodiments, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, a computer system having architecture (900), and in particular cores (940), may provide functionality as a processor (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with user-accessible mass storage as described above, as well as specific memory of the core (940) having non-transitory nature, such as core internal mass storage (947) or ROM (945). Software implementing various embodiments may be stored in such devices and executed by the core (940). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (940), and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (946) and modifying such data structures according to software defined processes. Additionally or alternatively, the computer system may provide functionality that is logically hardwired or otherwise embodied in circuitry (e.g., accelerator (944)) that may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to software may include logic and vice versa. Where appropriate, reference to a computer-readable medium may include circuitry (e.g., an Integrated Circuit (IC)) storing executable software, circuitry comprising executable logic, or both. Embodiments include any suitable combination of hardware and software.

Although the present invention has been described with respect to a number of exemplary embodiments, various alterations, permutations, and various substitutions of the embodiments are within the scope of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various systems and methods which, although not explicitly shown or described herein, embody the principles of the invention and are thus within its spirit and scope.
