Method for performing video coding and decoding operation by using AI chip

Document No. 1925597, published 2021-12-03

Reading note: this technique, "Method for performing video coding and decoding operation by using an AI chip", was designed and created by Liu Kai, Liu Yi, Liu Bin, Zheng Haiyong, Lü Yatang, Lin Taorui and Wang Jianfeng on 2021-09-08. Abstract: The invention provides a method for performing video coding and decoding operations using an AI chip. During encoding, the CPU reads the input video stream data, groups it and partitions it into macroblocks, and passes them to the AI chip, which encodes the intra-frame and inter-frame data; the CPU then performs transformation, quantization and entropy coding, and packs and outputs the video stream data. During decoding, the CPU reads the packed video stream data and passes it to the AI chip for entropy decoding; the CPU performs inverse quantization and inverse transformation and passes the data to the AI chip, which predicts the intra-frame and inter-frame data in parallel and reconstructs the image data; the CPU then applies deblocking filtering to the image data and sends it out. By using an AI chip for video coding and decoding, the invention achieves a more effective compression ratio and higher processing speed, makes it convenient to update codec algorithms and to develop and test various new algorithms, and lays a foundation for the subsequent combination of video coding/decoding with neural-network algorithms.

1. A method for performing video coding and decoding operations by using an AI chip, characterized in that the method comprises the following steps:

S1, in the encoding process, the CPU reads the input video stream data, groups the data and divides it into macroblocks, and transmits the macroblocks to the AI chip; the AI chip encodes the intra-frame data and the inter-frame data and transmits the encoded data back to the CPU; the CPU performs the transformation, quantization and entropy encoding operations, and then packs and outputs the video stream data;

S2, in the decoding process, the CPU reads the packed video stream data and transmits the data to the AI chip; the AI chip performs entropy decoding and transmits the result back to the CPU; the CPU performs the inverse quantization and inverse transformation operations and transmits the data to the AI chip; the AI chip predicts the intra-frame data and the inter-frame data in parallel to reconstruct the image data; and the CPU performs the deblocking filtering operation on the image data and then sends the image data out.

2. The method for performing video coding and decoding operations by using an AI chip according to claim 1, characterized in that step S1 specifically comprises the following steps:

S11, the CPU reads the input video stream data, groups the data and divides each frame into macroblocks, and then transmits the macroblocks to the AI chip;

S12, the AI chip encodes the intra-frame data and the inter-frame data in parallel, and performs motion estimation and motion compensation on the inter-frame data;

S13, the AI chip transmits the encoded video stream data to the CPU, and the CPU performs the transformation and quantization operations;

S14, the CPU performs the entropy coding operation, then packs and outputs the video stream data.

3. The method for performing video coding and decoding operations by using an AI chip according to claim 1, characterized in that step S2 specifically comprises the following steps:

S21, the CPU reads the packed video stream data, identifies the video frames and data slices, and transmits them to the AI chip;

S22, the AI chip performs entropy decoding and then returns the decoded result to the CPU;

S23, the CPU performs the inverse quantization and inverse transformation operations and then returns the result to the AI chip;

S24, the AI chip predicts the intra-frame data and the inter-frame data in parallel, reconstructs the image data and transmits it back to the CPU;

S25, the CPU performs the deblocking filtering operation on the image data and then sends it out.

4. The method for performing video coding and decoding operations by using an AI chip according to claim 2, characterized in that in step S12, the motion estimation processing flow specifically comprises the following steps:

S121, the CPU transmits the image data and reference frame data of a frame to be encoded to the AI chip;

S122, the computation tasks are divided inside the AI chip to realize parallel processing, and each computation part uses the same algorithm to process different image data and reference frame data;

S123, tasks are divided according to the number of reference frames to be compared, realizing simultaneous comparison of multiple reference frames;

S124, task division is performed by row or column of the frame to be encoded;

S125, all divided tasks start parallel processing, comparing macroblock data and calculating difference values;

S126, according to the calculation results of the parallel tasks, the macroblock data with the minimum difference value are recorded and encoded;

S127, the result is transmitted to the CPU for subsequent processing.

5. The method for performing video coding and decoding operations by using an AI chip according to claim 3, characterized in that in step S22, the entropy decoding processing flow specifically comprises the following steps:

S221, the CPU parses the packed video stream data, extracts the corresponding video frames and data slices, and transmits the slice data to the AI chip;

S222, the AI chip receives the data of the data slices, divides the tasks and distributes them to each processing unit for parallel processing;

S223, the processing units in the AI chip execute the entropy decoding operation in parallel;

S224, the AI chip collects the results of the entropy decoding operation and sends them to the CPU for subsequent processing.

6. The method for performing video coding and decoding operations by using an AI chip according to claim 2, characterized in that in step S12, the intra-frame data coding and the motion compensation are processed in the AI chip's parallel processing mode.

7. The method for performing video coding and decoding operations by using an AI chip according to claim 3, characterized in that in step S24, the inter-frame data prediction is processed in the AI chip's parallel processing mode.

Technical Field

The invention relates to the technical field of video coding and decoding, and in particular to a method for performing video coding and decoding operations by using an AI chip.

Background

With the continuous development of smartphones and 5G communication technology, people produce and consume ever larger amounts of video data such as surveillance footage, live broadcasts, video calls and short videos; the download counts and daily active users of such apps keep growing, and industry data reports predict that video will account for more than 80% of total internet data.

The storage and transmission of such large amounts of video data is a huge challenge, and the challenge grows as video resolution and frame rate continue to increase; video coding and decoding research in the computer field exists precisely to solve this problem. The key technology for solving the problems of video data storage and transmission is video compression, and the mainstream encoding formats on the market today are H264/H265, VP8/VP9, MPEG-4 and the like.

Meanwhile, as video encoding formats have developed, the ways of implementing video coding and decoding have also changed. On the PC side, coding was first implemented as a pure software scheme on the CPU, which occupied a large amount of CPU resources and easily led to resource contention. Later, GPU schemes appeared: a dedicated codec hardware module integrated into the GPU freed the CPU and improved efficiency. On the mobile-phone side, a dedicated codec hardware module is currently used; some manufacturers call it a VPU, and some integrate it into the GPU. It is a dedicated DSP chip responsible for video coding and decoding in the phone.

Traditional video codec chips can meet most current requirements, but the following problems remain:

1) dedicated video codec chips still face a performance bottleneck;

2) a video codec chip is a dedicated chip and cannot be shared with other functional modules;

3) the algorithms are written into the codec chip in advance, which makes algorithm updates inconvenient;

4) this is not conducive to developing and testing new codec algorithms.

Therefore, the prior art is deficient and needs further improvement.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method for performing video coding and decoding operation by using an AI chip.

In order to achieve the purpose, the invention adopts the following specific scheme:

the invention provides a method for performing video coding and decoding operations by using an AI chip, which comprises the following steps:

S1, in the encoding process, the CPU reads the input video stream data, groups the data and divides it into macroblocks, and transmits the macroblocks to the AI chip; the AI chip encodes the intra-frame data and the inter-frame data and transmits the encoded data back to the CPU; the CPU performs the transformation, quantization and entropy encoding operations, and then packs and outputs the video stream data;

S2, in the decoding process, the CPU reads the packed video stream data and transmits the data to the AI chip; the AI chip performs entropy decoding and transmits the result back to the CPU; the CPU performs the inverse quantization and inverse transformation operations and transmits the data to the AI chip; the AI chip predicts the intra-frame data and the inter-frame data in parallel to reconstruct the image data; and the CPU performs the deblocking filtering operation on the image data and then sends the image data out.

Further, step S1 specifically includes the following steps:

S11, the CPU reads the input video stream data, groups the data and divides each frame into macroblocks, and then transmits the macroblocks to the AI chip;

S12, the AI chip encodes the intra-frame data and the inter-frame data in parallel, and performs motion estimation and motion compensation on the inter-frame data;

S13, the AI chip transmits the encoded video stream data to the CPU, and the CPU performs the transformation and quantization operations;

S14, the CPU performs the entropy coding operation, then packs and outputs the video stream data.

Further, step S2 specifically includes the following steps:

S21, the CPU reads the packed video stream data, identifies the video frames and data slices, and transmits them to the AI chip;

S22, the AI chip performs entropy decoding and then returns the decoded result to the CPU;

S23, the CPU performs the inverse quantization and inverse transformation operations and then returns the result to the AI chip;

S24, the AI chip predicts the intra-frame data and the inter-frame data in parallel, reconstructs the image data and transmits it back to the CPU;

S25, the CPU performs the deblocking filtering operation on the image data and then sends it out.

Further, in step S12, the motion estimation processing flow specifically includes the following steps:

S121, the CPU transmits the image data and reference frame data of a frame to be encoded to the AI chip;

S122, the computation tasks are divided inside the AI chip to realize parallel processing, and each computation part uses the same algorithm to process different image data and reference frame data;

S123, tasks are divided according to the number of reference frames to be compared, realizing simultaneous comparison of multiple reference frames;

S124, task division is performed by row or column of the frame to be encoded;

S125, all divided tasks start parallel processing, comparing macroblock data and calculating difference values;

S126, according to the calculation results of the parallel tasks, the macroblock data with the minimum difference value are recorded and encoded;

S127, the result is transmitted to the CPU for subsequent processing.

Further, in step S22, the entropy decoding processing flow specifically includes the following steps:

S221, the CPU parses the packed video stream data, extracts the corresponding video frames and data slices, and transmits the slice data to the AI chip;

S222, the AI chip receives the data of the data slices, divides the tasks and distributes them to each processing unit for parallel processing;

S223, the processing units in the AI chip execute the entropy decoding operation in parallel;

S224, the AI chip collects the results of the entropy decoding operation and sends them to the CPU for subsequent processing.

Further, in step S12,

the intra-frame data coding and the motion compensation are processed in an AI chip parallel processing mode.

Further, in step S24,

the inter-frame data prediction is processed in an AI chip parallel processing mode.

By adopting the technical scheme of the invention, the invention has the following beneficial effects:

the invention performs video coding and decoding operations with an AI chip, exploiting the ever-growing computing power of AI chips without additional investment. It not only achieves a more effective compression ratio and higher processing speed, but also makes it easier to update codec algorithms and to develop and test various new algorithms, laying a foundation for the subsequent combination of video coding/decoding with neural-network algorithms and for developing more intelligent, more adaptive and better-optimized video codec schemes.

Drawings

FIG. 1 is a schematic diagram of a CPU and an AI chip cooperating to complete an encoding process according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating the amount of computation performed by motion estimation encoding according to an embodiment of the present invention;

FIG. 3 is a flow diagram illustrating a motion estimation parallel process according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a decoding process performed by the CPU and the AI chip according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating the overall encoding/decoding process according to an embodiment of the present invention;

FIG. 6 is a flowchart of the encoding process according to an embodiment of the present invention;

FIG. 7 is a flowchart of the decoding process according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the following figures and specific examples.

The invention is explained in detail in connection with FIGS. 1-7.

The invention provides a method for performing video coding and decoding operations by using an AI chip, which comprises two processes: encoding and decoding.

In the traditional software codec scheme, the CPU completes encoding and decoding alone; in the new scheme, the CPU and an AI chip (NPU) complete them cooperatively.

I. Encoding process:

the implementation of the new codec framework is shown in fig. 1.

Because the architecture of an AI chip is naturally suited to massive parallel computing, the parts of video coding that can run in parallel are handed to the AI chip for processing, and only the serial tasks are left to the CPU.

The whole encoding process is as follows:

The first step: the CPU reads the input video stream data, groups the data and divides each frame into macroblocks, and transmits them to the AI chip;

The second step: the AI chip encodes the intra-frame data and the inter-frame data in parallel; for inter-frame encoding, motion estimation and motion compensation are required;

The third step: the AI chip transmits the encoded video stream data back to the CPU, where the transformation and quantization operations are performed;

The fourth step: the CPU performs the entropy coding operation and, after the packaging operation, outputs the stream.
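The four-step division of labour above can be sketched in miniature. Everything in this sketch is an illustrative assumption rather than the patent's actual implementation: the "AI chip" is simulated by a thread pool, intra prediction is reduced to subtracting a block mean, and `quantize` stands in for the full transform-plus-quantization stage. No real NPU API is implied.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_macroblocks(frame, size=2):
    # CPU (first step): partition a 2-D frame into size x size macroblocks.
    return [[row[x:x + size] for row in frame[y:y + size]]
            for y in range(0, len(frame), size)
            for x in range(0, len(frame[0]), size)]

def npu_predict(block):
    # "AI chip" (second step), simulated: intra prediction as subtracting
    # the block mean, leaving a small residual per sample.
    flat = [v for row in block for v in row]
    mean = sum(flat) // len(flat)
    return mean, [[v - mean for v in row] for row in block]

def quantize(residual, q=2):
    # CPU (third step): stand-in for the transform + quantization stage.
    return [[v // q for v in row] for row in residual]

def encode_frame(frame):
    blocks = split_into_macroblocks(frame)
    # The thread pool plays the role of the AI chip's parallel units.
    with ThreadPoolExecutor() as pool:
        predicted = list(pool.map(npu_predict, blocks))
    # CPU (fourth step) would entropy-code and pack; here we just collect.
    return [(mean, quantize(res)) for mean, res in predicted]

stream = encode_frame([[10, 10, 12, 12],
                       [10, 10, 12, 12],
                       [20, 20, 22, 22],
                       [20, 20, 22, 22]])
```

The point of the sketch is the hand-off pattern: serial partitioning and packing stay on the CPU, while the per-macroblock work, which has no dependency between blocks, fans out to parallel units.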

among all tasks of video coding, intra data coding, motion estimation, and motion compensation are computationally intensive tasks. It is known that in a coding protocol such as H264, a video is divided into macroblocks, and the sizes of the macroblocks have various patterns, such as 16 × 16, 8 × 8, 4 × 4, etc. When the data in the frame is coded, the current macro block to be coded is compared with the macro blocks which are already coded one by one, the comparison operation is increased when the resolution is higher and the divided macro blocks are smaller, and the comparison speed can be accelerated by utilizing the parallel processing characteristic of an AI chip, so that a better coding effect is obtained.

The parallel computing method of motion estimation comprises the following steps:

In video, the frame rate is generally above 30 frames per second, so the differences between successive frames of a moving object are very small. In this case a reference-frame mode is used: other frames are compared against the reference frame and only the differences are recorded, which greatly reduces the amount of finally encoded data.
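A toy illustration of why difference coding against a reference frame shrinks the data (the two six-sample "frames" here are invented for illustration):

```python
frame_ref  = [10, 10, 10, 10, 10, 10]
frame_next = [10, 10, 11, 11, 10, 10]

# Record only (index, delta) pairs where the two frames differ.
diff = [(i, b - a) for i, (a, b) in enumerate(zip(frame_ref, frame_next)) if a != b]
# diff == [(2, 1), (3, 1)] -- two entries instead of six samples.
```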

When motion estimation encoding is performed, a picture is first divided into macroblocks, and the data of each macroblock to be encoded are compared one by one with macroblocks in the reference frames. Multiple reference frames may be compared, and the more reference frames there are, the more likely it is that a reference macroblock with a small difference can be found. When there are multiple reference frames and the resolution is high, the amount of cyclic comparison computation is very large. Fig. 2 illustrates the computation amount of motion estimation encoding.

After the AI chip is introduced, these computations can run in parallel, which raises the degree of parallelism in several ways:

multiple frames can be encoded at the same time;

multiple reference frames can be compared in parallel;

the macroblocks in the frame being encoded can be grouped by row or column and compared at the same time.

Finally, for each macroblock to be encoded, the multiple calculation results are compared and the one with the smallest amount of difference data is kept as the final encoding.

Fig. 3 is a schematic diagram of a motion estimation parallel processing flow.

The motion estimation parallel processing flow specifically comprises the following steps:

1) the CPU transmits image data and reference frame data of a frame to be coded to an AI chip (NPU);

2) the AI chip divides the computation tasks to realize parallel processing; each computation part uses the same algorithm but processes different data;

3) dividing tasks according to the number of reference frames to be compared to realize simultaneous comparison of a plurality of reference frames;

4) tasks are divided by row or column of the frame to be encoded (other division modes are not excluded);

5) all divided tasks start to be processed in parallel, macro block data are compared, and difference values are calculated;

6) according to the calculation result of the parallel task, the macro block data with the minimum difference value is taken for recording and encoding, and the optimal encoding effect is achieved;

7) the result is transmitted to the CPU for subsequent processing.
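The steps above can be sketched as follows, with one parallel task per reference frame and a final minimum over all candidates. As before, the thread pool merely simulates the AI chip's processing units, and SAD is an assumed difference metric:

```python
from concurrent.futures import ThreadPoolExecutor

def sad(a, b):
    # Sum of absolute differences between two equal-sized macroblocks.
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def search_one_ref(task):
    # One parallel task (steps 3 and 5): exhaustively match the macroblock
    # against a single reference frame; returns (cost, ref_id, (y, x)).
    ref_id, block, ref = task
    size = len(block)
    return min((sad(block, [row[x:x + size] for row in ref[y:y + size]]),
                ref_id, (y, x))
               for y in range(len(ref) - size + 1)
               for x in range(len(ref[0]) - size + 1))

def motion_estimate(block, refs):
    # One task per reference frame, run "in parallel" by a thread pool;
    # step 6 keeps the candidate with the minimum difference value.
    tasks = [(i, block, ref) for i, ref in enumerate(refs)]
    with ThreadPoolExecutor() as pool:
        return min(pool.map(search_one_ref, tasks))
```

The same pattern extends to the row/column split of step 4): each row band of the frame to be encoded simply becomes another independent task in the pool.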

Besides motion estimation, other processing stages such as intra-frame data coding and motion compensation can be parallelized in a similar way.

II. Decoding process:

Besides the acceleration by AI-chip parallel computation in the encoding stage, parallelization can also be applied in the decoding stage. The whole decoding process is as follows:

The first step: the CPU reads the packed video stream data, identifies the video frames and the data slices therein, and transmits them to the AI chip;

The second step: the AI chip performs entropy decoding on the video stream data (this operation can run in parallel) and then transmits the result back to the CPU;

The third step: the CPU performs the inverse quantization and inverse transformation operations and transmits the data to the AI chip again;

The fourth step: the AI chip predicts the intra-frame data and the inter-frame data in parallel, reconstructs the image data and transmits it back to the CPU;

The fifth step: the CPU performs operations such as deblocking filtering on the image data and then sends it out.

The parallel processing for entropy decoding is as follows:

1) the CPU parses the packed video stream data into the corresponding video frames and data slices; because each data slice was treated as an independent unit during entropy coding, the slices can be decoded in parallel;

2) the AI chip receives the large number of data slices sent by the CPU, then divides the work into tasks and distributes them to each processing unit for parallel processing;

3) the processing units in the AI chip execute the entropy decoding operation in parallel;

4) the AI chip collects the results of the entropy decoding operation and finally sends them to the CPU for subsequent processing.
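The slice-level parallelism above can be sketched like this. The "entropy decoder" here is deliberately trivialized to run-length expansion; real CABAC/CAVLC decoding is far more involved, but it shares the key property being exploited: each slice decodes independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_slice(slice_data):
    # Stand-in entropy decoder: expand (value, run_length) pairs.
    out = []
    for value, run in slice_data:
        out.extend([value] * run)
    return out

def parallel_entropy_decode(slices):
    # Steps 2)-4): one task per slice, decoded in parallel by the simulated
    # processing units, with results collected in slice order for the CPU.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode_slice, slices))
```

Because `pool.map` preserves input order, the collected results come back in slice order even though the slices finish decoding at different times.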

Besides entropy decoding, other processing stages such as intra-frame data prediction and inter-frame data prediction can be parallelized in a similar manner.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.
