Self-adaptive CU splitting decision method based on deep learning and multi-feature fusion

Document No.: 1339277    Publication date: 2020-07-17

Reading note: This technology, "Self-adaptive CU splitting decision method based on deep learning and multi-feature fusion", was designed and created on 2020-03-20 by 赵进超, 张秋闻, 王兆博, 王祎菡, 崔腾耀, 赵永博, 郭睿骁, 王晓, 蒋斌, 黄立勋 and 张伟. Its main content is as follows: The invention provides a self-adaptive CU splitting decision method based on deep learning and multi-feature fusion, which comprises the following steps: first, the texture complexity SD of the current CU is calculated using the standard deviation, a threshold model is constructed using a quantization parameter function and a depth function, and the current CU is classified as a complex CU or a uniform CU; second, if the complex CU belongs to the edge CUs, a CNN structure based on multi-feature fusion is used to judge whether the complex CU is split; otherwise, an adaptive CNN structure is used to judge whether the complex CU is split. The invention combines deep learning with multi-feature fusion and solves the problem of coding complexity. Both the CNN structure based on multi-feature fusion and the adaptive CNN structure can successfully process the training samples and avoid calculating the rate-distortion optimization (RDO) of all complex CUs, thereby reducing the computational complexity and saving encoding time.

1. A self-adaptive CU splitting decision method based on deep learning and multi-feature fusion is characterized by comprising the following steps:

s1, constructing a threshold model according to the quantization parameter function and the depth function, and calculating the threshold of the texture classification of the current CU according to the quantization parameter and the depth of the current CU;

s2, calculating the texture complexity SD of the current CU by using the standard deviation;

s3, judging whether the texture complexity SD is smaller than the threshold value in the step S1, if so, judging that the current CU is a uniform CU, not executing splitting, otherwise, judging that the current CU is a complex CU, and executing a step S4;

s4, judging whether the complex CU is located at the edge of the image, if so, executing a step S5, otherwise, executing a step S6;

s5, classifying the complex CU by using a CNN structure based on multi-feature fusion, and splitting the complex CU when the classification result is 1, namely when the rate distortion of the split complex CU is smaller than that of the unsplit complex CU; otherwise, the complex CU is not split;

and S6, classifying the complex CU by using the self-adaptive CNN structure, splitting the complex CU when the classification result is 1, and otherwise, not splitting the complex CU.

2. The adaptive CU split decision method based on deep learning and multi-feature fusion as claimed in claim 1, wherein the threshold model is:

Th=F(QP)×G(Depth),

wherein Th represents a threshold value of texture classification, Depth represents the depth of the current CU, QP represents a quantization parameter, F(·) represents a quantization parameter function, and G(·) represents a depth function;

the quantization parameter function and the depth function are respectively:

wherein R_CU_Depth represents the ratio of CUs with depth Depth in one frame image.

3. The adaptive CU split decision method based on deep learning and multi-feature fusion as claimed in claim 1, wherein the expression of the texture complexity SD is:

where W and H represent the width and height of the current CU, respectively, and p (x, y) represents the pixel value at (x, y).

4. The deep learning and multi-feature fusion based adaptive CU splitting decision method according to claim 1, wherein the method for classifying the complex CU by using the CNN structure based on multi-feature fusion is as follows:

s51, acquiring complex CUs located at the edge in M groups of video sequences, and flipping and rotating the complex CUs to form a data set, wherein the data set is divided into a training set I and a test set I;

s52, respectively calculating the standard deviation and the depth feature of each complex CU in the training set I;

s53, building a network structure which is a sub-network of a convolutional layer I-a pooling layer I-a convolutional layer II-a pooling layer II-a convolutional layer III-a pooling layer III-a convolutional layer IV-a pooling layer IV-a full connection layer I-a full connection layer II, inputting training set I data corresponding to standard deviation and depth characteristics into the sub-network, fusing through the full connection layer III, and outputting a fusion result through a softmax classifier to complete training to obtain a CNN structure based on multi-characteristic fusion;

s54, respectively calculating the standard deviation and the depth feature of each complex CU in the test set I, inputting them into the CNN structure based on multi-feature fusion to obtain a classification result, and calculating the prediction error by using a loss function;

and S55, judging whether the prediction error is smaller than the set error, if so, storing the CNN structure based on multi-feature fusion to classify the complex CU, otherwise, enlarging the training set I and returning to the step S52.

5. The adaptive CU splitting decision method based on deep learning and multi-feature fusion according to claim 1, wherein the method for classifying the complex CU by using the adaptive CNN structure is as follows:

s61, acquiring adjacent blocks corresponding to the complex CUs with the same size in the N groups of video sequences as data sets, wherein the adjacent blocks are NB1, NB2, NB3 and NB4 respectively, and dividing the data sets into a training set II and a testing set II;

s62, constructing a network structure which is a sub-network of the convolutional layer I-pooling layer I-convolutional layer II-pooling layer II-full connection layer I-full connection layer II, inputting training set II data respectively corresponding to adjacent blocks NB1, NB2, NB3 and NB4 into the sub-network respectively, fusing the training set II data through the full connection layer III, and outputting a fusion result through a softmax classifier to complete training to obtain a self-adaptive CNN structure;

s63, inputting test sets II corresponding to adjacent blocks NB1, NB2, NB3 and NB4 into a self-adaptive CNN structure to obtain classification results, and calculating prediction errors by using a loss function;

and S64, judging whether the prediction error is smaller than the preset value, if so, storing the self-adaptive CNN structure to classify the complex CUs, otherwise, enlarging the training set II and returning to the step S62.

6. The adaptive CU split decision method based on deep learning and multi-feature fusion according to claim 4 or 5, wherein the loss function is:

wherein the loss is computed from the actual and predicted sample values of the CNN structure, m represents the number of samples, i represents the ith sample, ρ1 and ρ2 represent weighting coefficients, and QP represents a quantization parameter.

7. The adaptive CU splitting decision method based on deep learning and multi-feature fusion according to claim 4 or 5, wherein the kernel sizes of the convolutional layers I, II, III and IV are all 3 × 3, the activation functions of the fully-connected layers I, II and III are all ReLU, and the pooling layers I, II, III and IV are all max pooling layers.

Technical Field

The invention relates to the technical field of image processing, in particular to a self-adaptive CU splitting decision method based on deep learning and multi-feature fusion.

Background

With ever higher demands on video compression, the development of more efficient video coding standards becomes more important. JVET has developed the next-generation video coding standard, H.266/VVC. The H.266/VVC Test Model (VTM) implements a number of novel techniques that significantly improve coding efficiency. H.266/VVC uses a quad-tree nested multi-type tree (QTMT) coding block structure for block partitioning, which shows better coding performance but brings great computational complexity, roughly 5 times that of HEVC. It also provides 67 intra prediction modes, in which the planar and DC modes are kept from H.265/HEVC while the directional modes become denser, so more accurate prediction can be obtained, but the computational complexity increases at the same time. In addition, some other tools have been introduced to improve coding efficiency, such as Position-Dependent intra Prediction Combination (PDPC) and Multiple Transform Selection (MTS), which significantly enhance the coding performance of H.266/VVC but result in extremely high computational complexity. Under the "All Intra" configuration, the intra coding complexity of VTM is 18 times higher than that of the HEVC test model (HM). Therefore, for H.266/VVC, it is crucial to develop a fast encoding algorithm that meets the actual demands of the potential market.

H.266/VVC intra coding proposes some new techniques based on High Efficiency Video Coding (H.265/HEVC) and extends some previous approaches. The block partition structure is the core of the coding layer, and flexible block sizes can achieve excellent coding performance. H.266/VVC uses the QTMT partition block structure to obtain more efficient coding performance, but this accounts for most of the complexity increase, so the process of obtaining the best CU (Coding Unit) is more complex than in HEVC. Furthermore, the larger number of CU shapes greatly increases the complexity of intra prediction and lengthens the coding time.

In view of the above, researchers have carried out work on the CU partitioning decision of H.266/VVC to reduce coding complexity, and many fast CU partitioning methods, including heuristic methods and learning-based methods, have been proposed in the literature. Among them, H. Yang et al. propose a fast intra coding method for H.266/VVC that combines low-complexity CTU (Coding Tree Unit) structure derivation with a fast intra-mode decision to speed up encoding, and Z. Jin et al. propose an effective fast partitioning method.

In recent years, learning-based methods have attracted more and more attention and have significantly improved performance; here we introduce learning-based methods for HEVC and H.266/VVC. Z. Liu et al. propose a deep CNN (Convolutional Neural Network) structure to predict CTU partitions, and M. Xu et al. propose a deep learning method to predict CU partitions so as to reduce the complexity of HEVC. Other works propose algorithms based on deep learning, machine learning (ML) or CNNs for H.266/VVC to accelerate the encoding process. T. Amestoy et al. introduce a fast QTBT partitioning method based on ML, determining the partitioning mode of each block with a random forest classifier, and Z. Jin et al. propose a fast CU depth decision algorithm that models the QTBT partition depth range as a classification problem. Several further works introduce learning-based fast partition-mode prediction and early-termination schemes for the VVC encoder.

The depth and shape of CU blocks in the QTMT structure are closely related to the complexity and direction of texture. Deep learning can better aggregate and analyze data, so that new knowledge can be obtained alongside conventional knowledge, and this knowledge can be used to construct models and support decisions.

Disclosure of Invention

Aiming at the defects in the background technology, the invention provides a self-adaptive CU splitting decision method based on deep learning and multi-feature fusion, which combines the deep learning and the multi-feature fusion and solves the technical problem of coding complexity.

The technical scheme of the invention is realized as follows:

a self-adaptive CU splitting decision method based on deep learning and multi-feature fusion comprises the following steps:

s1, constructing a threshold model according to the quantization parameter function and the depth function, and calculating the threshold of the texture classification of the current CU according to the quantization parameter and the depth of the current CU;

s2, calculating the texture complexity SD of the current CU by using the standard deviation;

s3, judging whether the texture complexity SD is smaller than the threshold value in the step S1, if so, judging that the current CU is a uniform CU, not executing splitting, otherwise, judging that the current CU is a complex CU, and executing a step S4;

s4, judging whether the complex CU is located at the edge of the image, if so, executing a step S5, otherwise, executing a step S6;

s5, classifying the complex CU by using a CNN structure based on multi-feature fusion, and splitting the complex CU when the classification result is 1, namely when the rate distortion of the split complex CU is smaller than that of the unsplit complex CU; otherwise, the complex CU is not split;

and S6, classifying the complex CU by using the self-adaptive CNN structure, splitting the complex CU when the classification result is 1, and otherwise, not splitting the complex CU.

The threshold model is:

Th=F(QP)×G(Depth),

wherein Th represents the threshold value of texture classification, Depth represents the depth of the current CU, QP represents the quantization parameter, F(·) represents the quantization parameter function, and G(·) represents the depth function;

the quantization parameter function and the depth function are respectively:

wherein R_CU_Depth represents the ratio of CUs with depth Depth in one frame image.

The expression of the texture complexity SD is:

where W and H represent the width and height of the current CU, respectively, and p (x, y) represents the pixel value at (x, y).

The method for classifying the complex CU by using the CNN structure based on the multi-feature fusion comprises the following steps:

s51, acquiring complex CUs located at the edge in M groups of video sequences, and flipping and rotating the complex CUs to form a data set, wherein the data set is divided into a training set I and a test set I;

s52, respectively calculating the standard deviation and the depth feature of each complex CU in the training set I;

s53, building a network structure which is a sub-network of a convolutional layer I-a pooling layer I-a convolutional layer II-a pooling layer II-a convolutional layer III-a pooling layer III-a convolutional layer IV-a pooling layer IV-a full connection layer I-a full connection layer II, inputting training set I data corresponding to standard deviation and depth characteristics into the sub-network, fusing through the full connection layer III, and outputting a fusion result through a softmax classifier to complete training to obtain a CNN structure based on multi-characteristic fusion;

s54, respectively calculating the standard deviation and the depth feature of each complex CU in the test set I, inputting them into the CNN structure based on multi-feature fusion to obtain a classification result, and calculating the prediction error by using a loss function;

and S55, judging whether the prediction error is smaller than the set error, if so, storing the CNN structure based on multi-feature fusion to classify the complex CU, otherwise, enlarging the training set I and returning to the step S52.

The method for classifying the complex CU by using the self-adaptive CNN structure comprises the following steps:

s61, acquiring adjacent blocks corresponding to the complex CUs with the same size in the N groups of video sequences as data sets, wherein the adjacent blocks are NB1, NB2, NB3 and NB4 respectively, and dividing the data sets into a training set II and a testing set II;

s62, constructing a network structure which is a sub-network of the convolutional layer I-pooling layer I-convolutional layer II-pooling layer II-full connection layer I-full connection layer II, inputting training set II data respectively corresponding to adjacent blocks NB1, NB2, NB3 and NB4 into the sub-network respectively, fusing the training set II data through the full connection layer III, and outputting a fusion result through a softmax classifier to complete training to obtain a self-adaptive CNN structure;

s63, inputting test sets II corresponding to adjacent blocks NB1, NB2, NB3 and NB4 into a self-adaptive CNN structure to obtain classification results, and calculating prediction errors by using a loss function;

and S64, judging whether the prediction error is smaller than the preset value, if so, storing the self-adaptive CNN structure to classify the complex CUs, otherwise, enlarging the training set II and returning to the step S62.

The loss function is:

wherein the loss is computed from the actual and predicted sample values of the CNN structure, m represents the number of samples, i represents the ith sample, ρ1 and ρ2 represent weighting coefficients, and QP represents a quantization parameter.

The kernel sizes of the convolutional layers I, II, III and IV are all 3 × 3, the activation functions of the fully-connected layers I, II and III are all ReLU, and the pooling layers I, II, III and IV are all max pooling layers.

The beneficial effects produced by this technical scheme are as follows: the method establishes a threshold-based texture classification model that divides CUs into complex CUs and uniform CUs, and uniform CUs are not split; for complex CUs at the edge, a CNN structure based on multi-feature fusion is used to classify them; for the remaining complex CUs, an adaptive CNN structure is used. The division of a complex CU depends on the parameters of the trained network and of the CU; the training schemes based on the multi-feature fusion CNN structure and the adaptive CNN structure can successfully process the training samples and avoid the full RDO calculation, thereby reducing the computational complexity and saving encoding time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a block diagram of a neighboring block location distribution of a current CU block, where C denotes the current CU block and NB denotes the neighboring block;

FIG. 3 is a convolutional neural network structure based on multi-feature fusion of the present invention;

FIG. 4 is a diagram of an adaptive convolutional neural network structure of the present invention;

FIG. 5 is a comparison of the coding time savings of the present invention versus the FPIC, ACSD and FCPD methods;

FIG. 6 is a comparison of the BD-rate increase of the present invention versus the FPIC, ACSD and FCPD methods.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

As shown in fig. 1, an embodiment of the present invention provides an adaptive CU splitting decision method based on deep learning and multi-feature fusion. First, the texture complexity of a CU is calculated as the standard deviation SD, and a threshold model based on a function of the quantization parameter QP and the depth is established to improve segmentation accuracy and to identify complex CUs and uniform CUs. If a CU is a complex CU at an interior (non-edge) position, an adaptive CNN structure is used to decide whether to split it, since in H.266/VVC intra coding the optimal CU size depends on the complexity of the neighboring blocks and the pixels of the neighboring blocks are important for high prediction accuracy. If the CU is a complex CU at an edge position, the neighboring blocks cannot be referenced; to improve classification accuracy, the SD used to calculate the texture complexity and the depth features are taken as inputs to a CNN structure based on multi-feature fusion, which judges whether the CU is split. Finally, the division of CUs depends on the parameters of the trained networks and of the CUs; the two CNN schemes can successfully process the training samples and terminate the rate-distortion calculation for some complex CUs, thereby reducing the computational complexity and saving coding time. The method comprises the following specific steps:

s1, constructing a threshold model according to the quantization parameter function and the depth function to improve the segmentation precision, and calculating the threshold of the texture classification of the current CU according to the quantization parameter and the depth of the current CU;

the threshold model is:

Th=F(QP)×G(Depth),

where Th denotes the threshold of texture classification, Depth denotes the depth of the current CU, QP denotes the quantization parameter, F(·) denotes the quantization parameter function, and G(·) denotes the depth function, where the quantization parameter function and the depth function are respectively:

wherein R_CU_Depth represents the ratio of CUs with depth Depth in one frame of the image; the quantization parameter function and the depth function are empirical functions obtained through a large number of experiments.

From simulation experiments, the threshold was found to be related to the depth and QP of the CU, and the depth distribution of CUs differs between different types of frames. The quantities used for the depth ratio are calculated as follows:

P_CU_Depth = H_CU_Depth × W_CU_Depth,

P_Frame = H_Frame × W_Frame,

where CU_Depth denotes the depth of the current CU, P_CU_Depth and P_Frame denote the number of pixels of a CU at that depth level and of one frame image respectively, H_CU_Depth and W_CU_Depth denote the height and width of a current CU with depth Depth, H_Frame and W_Frame denote the height and width of one frame image, and N_CU_Depth denotes the number of current CUs with depth Depth.
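The expression for the ratio R_CU_Depth itself is not reproduced in the extracted text; a plausible reconstruction, assuming R_CU_Depth is the fraction of the frame's pixels covered by CUs of depth Depth (consistent with the quantities defined above, including N_CU_Depth), would be:

```latex
R_{CU\_Depth} = \frac{N_{CU\_Depth}\times P_{CU\_Depth}}{P_{Frame}},
\qquad
P_{CU\_Depth} = H_{CU\_Depth}\times W_{CU\_Depth},
\qquad
P_{Frame} = H_{Frame}\times W_{Frame}.
```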

S2, calculating the texture complexity SD of the current CU by using the standard deviation, wherein the expression of the texture complexity SD is as follows:
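The displayed formula for SD did not survive extraction; a reconstruction assuming the usual standard deviation of the CU's pixel values, consistent with the definitions of W, H and p(x, y) given in the next line, would be:

```latex
SD = \sqrt{\frac{1}{W\times H}\sum_{x=0}^{W-1}\sum_{y=0}^{H-1}\bigl(p(x,y)-\bar{p}\bigr)^{2}},
\qquad
\bar{p} = \frac{1}{W\times H}\sum_{x=0}^{W-1}\sum_{y=0}^{H-1} p(x,y).
```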

where W and H represent the width and height of the CU, respectively, and p (x, y) represents the pixel value at (x, y).

S3, judging whether the texture complexity SD is smaller than the threshold value; the threshold is computed from the actual situation of the current CU block, so different CU blocks have different thresholds. If SD is smaller than the threshold, the current CU is a uniform CU and is not split; otherwise, the current CU is a complex CU and step S4 is executed, where the proposed adaptive CNN structure and the CNN structure based on multi-feature fusion are used to classify complex CUs.

S4, judging whether the complex CU is located at the edge of the image; if so, step S5 is executed, otherwise step S6 is executed. If no neighboring CU blocks of the current CU are detected, or fewer than four neighboring CU blocks are available, the current CU is determined to be located at an edge of the image. In previous research on HEVC the input is generally a fixed-size CU, whereas the QTMT partition structure in H.266/VVC produces both square and rectangular CU blocks, so down-sampling the input blocks would lose much valuable classification information. Therefore, two CNN structures are proposed, selected by determining whether a complex CU is located at an edge.
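For illustration only, the following is a minimal Python sketch of this dispatch logic. The texture_threshold placeholder (the empirical F and G are not disclosed), the classifier objects with a classify method returning 1 for "split", and the rule that a CU counts as an edge CU when fewer than four neighboring blocks are available are assumptions of this sketch, not the encoder implementation:

```python
from typing import Optional, Sequence

import numpy as np


def texture_threshold(qp: int, depth: int) -> float:
    """Placeholder for Th = F(QP) x G(Depth); the empirical F and G are not disclosed."""
    return (1.0 + 0.05 * qp) * (1.0 + depth)      # invented values, for illustration only


def is_edge_cu(neighbors: Sequence[Optional[np.ndarray]]) -> bool:
    """Edge CU: no neighboring CU blocks detected, or fewer than four are available."""
    return sum(nb is not None for nb in neighbors) < 4


def cu_split_decision(cu: np.ndarray, qp: int, depth: int,
                      neighbors: Sequence[Optional[np.ndarray]],
                      fusion_cnn, adaptive_cnn) -> bool:
    """Return True if the current CU should be split (decision flow of Fig. 1, sketched)."""
    sd = float(np.std(cu))                             # texture complexity SD of the CU
    if sd < texture_threshold(qp, depth):
        return False                                   # uniform CU: never split
    if is_edge_cu(neighbors):                          # complex CU at the image edge
        return fusion_cnn.classify(cu, sd, depth) == 1   # multi-feature fusion CNN
    return adaptive_cnn.classify(neighbors) == 1       # adaptive CNN on neighboring blocks
```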

If the complex CU is not an edge CU: in H.266/VVC intra coding the optimal CU size depends on the complexity of the neighboring blocks, and the pixels of the neighboring blocks are important for high prediction accuracy. Fig. 2 gives the positions of the neighboring blocks. An adaptive CNN structure is therefore established that uses the residual blocks of the neighboring blocks of the current complex CU as its input to classify the complex CU.

If the complex CU is located at the edge, it does not have enough reference blocks and the residual blocks of neighboring blocks cannot be used; to improve classification accuracy, a CNN structure based on texture and depth feature fusion is built on top of a conventional CNN structure to classify the complex CU.

S5, classifying the complex CU by using the CNN structure based on multi-feature fusion. To increase the number of images, operations such as flipping and rotating are performed on each picture in the video sequence, and the resulting image set is divided into a training set and a test set. In the training set, the SD and depth features of each sample are calculated and fed into the multi-feature fusion CNN structure as two channels; the information obtained by the two channels is then aggregated in the last fully connected layer as the final classification basis of the softmax classifier. Testing shows that this CNN structure can accurately classify (i.e. split or not split) CUs at the edge of a complex region. The CNN structure based on texture and depth feature fusion is built on a conventional CNN structure: the calculated SD and depth features serve as its inputs and are processed in two channels, where each channel consists of four convolutional layers, four max pooling layers and fully connected layers with ReLU activation, and the kernel size of each convolutional layer is 3 × 3. The loss function is minimized iteratively during training to improve classification accuracy and reduce the loss of coding performance. The method comprises the following specific steps (a network sketch in PyTorch follows the step list):

s51, acquiring complex CUs located at the edge in M groups of video sequences, and flipping and rotating the complex CUs to form a data set, wherein the data set is divided into a training set I and a test set I;

s52, respectively calculating the standard deviation and the depth feature of each complex CU in the training set, and taking the standard deviation and the depth feature as two inputs;

s53, building a network structure in which each sub-network consists of convolutional layer I - pooling layer I - convolutional layer II - pooling layer II - convolutional layer III - pooling layer III - convolutional layer IV - pooling layer IV - fully connected layer I - fully connected layer II; the training-set data corresponding to the standard deviation and the depth features are input into their respective sub-networks, fused through fully connected layer III, and the fusion result is output through softmax to complete training and obtain the CNN structure based on multi-feature fusion. The kernel sizes of convolutional layers I, II, III and IV are all 3 × 3, the activation functions of fully connected layers I, II and III are all ReLU, and pooling layers I, II, III and IV are all max pooling layers.

In the original partition mode decision, the rate-distortion values of all possible modes are calculated and the optimal one is selected, which achieves good RD performance but causes great complexity. The invention introduces the trained fusion-based CNN classifier and adaptive CNN classifier into the encoder, which can quickly obtain the partition result and avoid the rate-distortion calculation, thereby reducing the computational complexity and saving encoding time.

S54, respectively calculating the standard deviation and the depth feature of each complex CU in the test set I, inputting them into the CNN structure based on multi-feature fusion to obtain a classification result, and calculating the prediction error by using a loss function;

the loss function is:

wherein the loss is computed from the actual and predicted sample values of the CNN structure, m represents the number of samples, i represents the ith sample, ρ1 and ρ2 represent weighting coefficients, and QP represents a quantization parameter.

And S55, judging whether the prediction error is smaller than the set error, which is 0.2; if so, the CNN structure based on multi-feature fusion is stored to classify the complex CU; otherwise, training set I is enlarged and the process returns to step S52.

And when the classification result obtained in the step S55 is 1, that is, the rate distortion after the splitting of the complex CU is smaller than the rate distortion before the splitting, splitting the complex CU, otherwise, not splitting the complex CU.
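A non-authoritative PyTorch sketch of the multi-feature fusion CNN described in steps S51-S55 is given below. The patent fixes the layer layout (four conv/max-pool pairs and FC I/II per channel, fusion through FC III, softmax output), 3 × 3 kernels and ReLU on the fully connected layers; the channel widths, FC sizes, 2D feature-map inputs for SD and depth, and ReLU after the convolutions are assumptions of this sketch only.

```python
import torch
import torch.nn as nn

class SubNet(nn.Module):
    """One channel: conv/max-pool x4 (3x3 kernels) followed by FC I and FC II (ReLU)."""
    def __init__(self, in_ch: int = 1, feat: int = 64):
        super().__init__()
        chans = [in_ch, 16, 32, 64, 64]                 # channel widths are assumed
        layers = []
        for i in range(4):                              # conv I-IV and pooling I-IV
            layers += [nn.Conv2d(chans[i], chans[i + 1], kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2)]
        self.features = nn.Sequential(*layers)
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.LazyLinear(128), nn.ReLU(inplace=True),    # FC I
                                nn.Linear(128, feat), nn.ReLU(inplace=True))  # FC II
    def forward(self, x):
        return self.fc(self.features(x))

class FusionCNN(nn.Module):
    """Two sub-networks (SD channel, depth channel) fused by FC III and a softmax output."""
    def __init__(self):
        super().__init__()
        self.sd_branch = SubNet()
        self.depth_branch = SubNet()
        self.fc3 = nn.Linear(128, 2)                    # FC III: not split / split
    def forward(self, sd_map, depth_map):
        fused = torch.cat([self.sd_branch(sd_map), self.depth_branch(depth_map)], dim=1)
        return torch.softmax(self.fc3(fused), dim=1)

# Example: a hypothetical 32x32 complex CU with SD and depth feature maps as 1-channel inputs.
if __name__ == "__main__":
    model = FusionCNN()
    sd_map = torch.rand(1, 1, 32, 32)
    depth_map = torch.rand(1, 1, 32, 32)
    print(model(sd_map, depth_map))                     # probabilities for [not split, split]
```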

S6, classifying the complex CU by using the adaptive CNN structure. During training of the adaptive CNN structure, CUs of different shapes are divided into different batches according to the size of the input training data: CU blocks of the same shape are grouped into one batch, and CUs of different shapes are trained separately. All training samples are extracted from test videos, and CUs of different sizes are collected into different data sets; the test samples are also extracted from test videos, but from video sequences different from those of the training samples. Testing shows that the CNN structure can largely predict to which class (i.e. split or not split) a CU belongs, so the adaptive CNN structure is feasible. The adaptive CNN structure comprises two convolutional layers, two pooling layers and three fully connected layers (FC), where the pooling layers are applied to complex CU blocks whose width is larger than 32 or whose height is larger than 16, and the residual blocks of the neighboring blocks of the complex CU are fed into the sub-networks as input. The method comprises the following specific steps (a network sketch in PyTorch follows the step list):

s61, acquiring adjacent blocks corresponding to the complex CUs with the same size in the N groups of video sequences as data sets, wherein the adjacent blocks are NB1, NB2, NB3 and NB4 respectively, and dividing the data sets into a training set II and a testing set II;

s62, constructing a network structure which is a sub-network of a convolutional layer I-a pooling layer I-a convolutional layer II-a pooling layer II-a full connection layer I-a full connection layer II, inputting training sets II corresponding to adjacent blocks NB1, NB2, NB3 and NB4 into the sub-network respectively, fusing the training sets II through the full connection layer III, and outputting a fusion result through softmax to complete training to obtain the self-adaptive CNN structure, wherein the sizes of kernels of the convolutional layer I and the convolutional layer II are 3 × 3, activation functions of the full connection layer I, the full connection layer II and the full connection layer III are Re L U, and the pooling layer I and the pooling layer II are maximum pooling layers.

S63, inputting test sets II corresponding to adjacent blocks NB1, NB2, NB3 and NB4 into a self-adaptive CNN structure to obtain classification results, and calculating prediction errors by using a loss function;

and S64, judging whether the prediction error is smaller than the set error, which is 0.2; if so, the self-adaptive CNN structure is stored to classify the complex CU; otherwise, training set II is enlarged and the process returns to step S62.

And when the classification result of the step S64 is 1, splitting the complex CU, otherwise, not splitting the complex CU.
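Similarly, the following is a non-authoritative PyTorch sketch of the adaptive CNN described in steps S61-S64, with one sub-network per neighboring block NB1-NB4 (two conv/max-pool pairs plus FC I/II), fusion through FC III, and a softmax split decision. The channel widths, FC sizes, the fixed input size, and applying pooling to every block (rather than only to blocks wider than 32 or taller than 16) are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn

class NeighborSubNet(nn.Module):
    """Sub-network per neighboring block: conv I/pool I, conv II/pool II, FC I, FC II."""
    def __init__(self, in_ch: int = 1, feat: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                    # pooling layer I (max)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))                                    # pooling layer II (max)
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.LazyLinear(128), nn.ReLU(inplace=True),    # FC I
                                nn.Linear(128, feat), nn.ReLU(inplace=True))  # FC II
    def forward(self, x):
        return self.fc(self.features(x))

class AdaptiveCNN(nn.Module):
    """Four neighbor branches (NB1-NB4) fused by FC III and a softmax split decision."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(NeighborSubNet() for _ in range(4))
        self.fc3 = nn.Linear(4 * 64, 2)                         # FC III: not split / split
    def forward(self, nb1, nb2, nb3, nb4):
        feats = [b(x) for b, x in zip(self.branches, (nb1, nb2, nb3, nb4))]
        return torch.softmax(self.fc3(torch.cat(feats, dim=1)), dim=1)

# Example: hypothetical 16x16 residual blocks of the four neighbors of one complex CU.
if __name__ == "__main__":
    model = AdaptiveCNN()
    nbs = [torch.rand(1, 1, 16, 16) for _ in range(4)]
    print(model(*nbs))                                          # [not split, split] probabilities
```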

To evaluate the method of the present invention, simulation tests were performed on the latest H.266/VVC encoder (VTM 7.0). The test video sequences were encoded in the "All Intra" configuration using default parameters. The BD-rate reflects the coding performance of the present invention, and the time saving (TS) represents the reduction in complexity.

Table 1 gives the coding characteristics of the proposed overall scheme, which saves 39.31% of the coding run time with an average BDBR increase of 0.89% overall. Therefore, the invention can effectively save coding time, and the loss of RD performance is negligible.

TABLE 1 encoding characteristics of the invention

It can be seen from Table 1 that the present invention saves 39.31% of the encoding time while maintaining similar RD performance. For different test videos the experimental results fluctuate, but the method is effective across video sequences, because High Definition (HD) and Ultra High Definition (UHD) videos tend towards larger CUs. Compared with VTM, the present invention has better coding performance, mainly due to the threshold model defined by the present invention and the two improved CNN structures.

The method of the present invention is compared with the latest fast methods for H.266/VVC, including FPIC, ACSD and FCPD. Fig. 5 and Fig. 6 show the encoding time saving and the BDBR results, respectively. It can be seen from Fig. 5 and Fig. 6 that the proposed scheme performs better in reducing the computational burden and can further save about 5.78%-9.82% of the encoding time compared with the ACSD and FCPD methods. Compared with the FPIC, ACSD and FCPD methods, the method of the invention also has better coding efficiency and can further reduce the BD-rate by 0.12%-0.51%. These results show that the present invention is effective for all classes of video sequences and that its computational complexity is superior to that of the latest fast methods for H.266/VVC.

The technical scheme of the invention is described in detail above with reference to the accompanying drawings, and specifically, deep learning and multi-feature fusion are used in combination to solve the problem of coding complexity. Complex CUs and uniform CUs are first identified by building a threshold-based texture classification model. If the complex CU is an edge CU, a CNN structure based on multi-feature fusion is performed to classify the CU. Otherwise, an adaptive CNN structure is executed to classify the CU. Finally, the partitioning of the CUs depends on the parameters of the training network and the CUs. When the CU is split, the two CNN structure training schemes can successfully process training samples, and the whole RDO calculation is avoided, so that the calculation complexity is reduced, and the encoding time is saved.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
