Video processing method, related device and storage medium

Document No.: 1820095    Publication date: 2021-11-09

Reading note: this technology, "Video processing method, related device and storage medium" (一种视频处理的方法、相关装置及存储介质), was designed and created by 魏雪, 杨广东 and 杨卫 on 2020-05-08. Its main content is as follows: the application discloses a video processing method that can be applied to the field of data transmission under cloud technology. An embodiment of the application comprises: acquiring an original video sequence, where the original video sequence comprises P video frames obtained after rendering and P is an integer greater than or equal to 2; acquiring a target video sequence according to the original video sequence, where the target video sequence comprises the P rendered video frames and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1; encoding the target video sequence to obtain a video coding sequence; and sending the video coding sequence to a terminal device so that the terminal device decodes the video coding sequence to obtain a video sequence to be rendered. The application also provides an apparatus and a storage medium. The application can save processing resources on the server side, reduce processor overhead, and help improve the service processing capability of the server.

1. A method of video processing, comprising:

acquiring an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer greater than or equal to 2;

acquiring a target video sequence according to the original video sequence, wherein the target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

coding the target video sequence to obtain a video coding sequence;

and sending the video coding sequence to a terminal device so that the terminal device decodes the video coding sequence to obtain a video sequence to be rendered.

2. The method of claim 1, wherein the obtaining the target video sequence from the original video sequence comprises:

acquiring a first rendered video frame and a second rendered video frame from the original video sequence, wherein the first rendered video frame is a previous frame image adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are both video frames obtained after rendering;

performing frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first rendered video frame, and the target video frame is a previous frame image adjacent to the second rendered video frame;

generating a first video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the first video subsequence is composed of the first rendered video frame, the target video frame and the second rendered video frame in sequence.

3. The method of claim 2, wherein the frame interpolation processing of the first rendered video frame and the second rendered video frame to obtain a target video frame comprises:

acquiring a first frame number corresponding to the first rendered video frame;

acquiring a second frame number corresponding to the second rendered video frame;

calculating the average value of the first frame number and the second frame number to obtain a target frame number;

based on the target frame number, acquiring K pieces of pixel information corresponding to the target video frame through an interpolation prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer greater than 1.

4. The method of claim 1, wherein the obtaining the target video sequence from the original video sequence comprises:

acquiring a first rendered video frame and a second rendered video frame from the original video sequence, wherein the first rendered video frame is a previous frame image adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are both video frames obtained after rendering;

performing frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second rendered video frame;

and generating a second video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the second video subsequence is composed of the first rendered video frame, the second rendered video frame and the target video frame in sequence.

5. The method of claim 4, wherein the frame interpolation processing of the first rendered video frame and the second rendered video frame to obtain a target video frame comprises:

acquiring a second frame number corresponding to the second rendered video frame;

determining a next adjacent frame number to the second frame number as a third frame number;

calculating the average value of the second frame number and the third frame number to obtain a target frame number;

based on the target frame number, acquiring K pieces of pixel information corresponding to the target video frame through an interpolation prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer greater than 1.

6. A method of video processing, comprising:

receiving a video coding sequence sent by a server;

decoding the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises X video frames which are not rendered, and X is an integer greater than or equal to 2;

acquiring a target video sequence according to the video sequence to be rendered, wherein the target video sequence comprises the X video frames which are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

and rendering the target video sequence to obtain a target video.

7. The method according to claim 6, wherein the obtaining a target video sequence according to the video sequence to be rendered comprises:

acquiring a first unrendered video frame and a second unrendered video frame from the video sequence to be rendered, wherein the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames that have not been rendered;

performing frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first unrendered video frame, and the target video frame is a previous frame image adjacent to the second unrendered video frame;

and generating a first video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the first video subsequence is composed of the first unrendered video frame, the target video frame and the second unrendered video frame in sequence.

8. The method of claim 7, wherein the frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame comprises:

acquiring a first frame number corresponding to the first unrendered video frame;

acquiring a second frame number corresponding to the second unrendered video frame;

calculating the average value of the first frame number and the second frame number to obtain a target frame number;

based on the target frame number, acquiring K pieces of pixel information corresponding to the target video frame through an interpolation prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer greater than 1.

9. The method according to claim 6, wherein the obtaining a target video sequence according to the video sequence to be rendered comprises:

acquiring a first unrendered video frame and a second unrendered video frame from the video sequence to be rendered, wherein the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames that have not been rendered;

performing frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second unrendered video frame;

and generating a second video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the second video subsequence is composed of the first unrendered video frame, the second unrendered video frame and the target video frame in sequence.

10. The method of claim 9, wherein the frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame comprises:

acquiring a second frame number corresponding to the second unrendered video frame;

determining a next adjacent frame number to the second frame number as a third frame number;

calculating the average value of the second frame number and the third frame number to obtain a target frame number;

based on the target frame number, acquiring K pieces of pixel information corresponding to the target video frame through an interpolation prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer greater than 1.

11. A method of video processing, comprising:

a server acquires an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer greater than or equal to 2;

the server acquires a first target video sequence according to the original video sequence, wherein the first target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

the server carries out coding processing on the first target video sequence to obtain a video coding sequence;

the server sends the video coding sequence to terminal equipment;

the terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises (P + Q) video frames which are not rendered;

the terminal device acquires a second target video sequence according to the video sequence to be rendered, wherein the second target video sequence comprises the (P + Q) video frames which are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

and the terminal equipment performs rendering processing on the second target video sequence to obtain a target video.

12. A video processing apparatus, comprising:

an obtaining module, configured to obtain an original video sequence, where the original video sequence includes P video frames obtained through rendering, and P is an integer greater than or equal to 2;

the obtaining module is further configured to obtain a target video sequence according to the original video sequence, where the target video sequence includes the P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

the coding module is used for coding the target video sequence to obtain a video coding sequence;

and the sending module is used for sending the video coding sequence to the terminal equipment so that the terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered.

13. A video processing apparatus, comprising:

the receiving module is used for receiving a video coding sequence sent by the server;

the decoding module is used for decoding the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises X video frames which are not rendered, and X is an integer greater than or equal to 2;

an obtaining module, configured to obtain a target video sequence according to the video sequence to be rendered, where the target video sequence includes the X video frames that are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

and the rendering module is used for rendering the target video sequence to obtain a target video.

14. A computer device, comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute the program in the memory, so as to perform the method of any one of claims 1 to 5 or the method of any one of claims 6 to 10 according to instructions in the program code;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

15. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1 to 5, or perform the method of any of claims 6 to 10.

Technical Field

The present application relates to the field of industry applications based on cloud technologies, and in particular, to a video processing method, a related apparatus, and a storage medium.

Background

Cloud gaming is an online gaming technology based on cloud computing. In this mode of operation, all games run on the server side, and the rendered game pictures are compressed and then transmitted to the user over the network. The terminal device used by the user does not need a high-end processor or graphics card; it only needs basic video decompression capability.

At present, the cloud-game service flow is as follows: a terminal device connects to a cloud game server, and interaction between the user and the game is then realized through a data stream and a control stream. The data stream mainly carries game picture data; that is, the cloud game server encodes the game pictures and transmits the encoded picture data to the terminal device, which decodes it and displays it on the interface.

However, in the existing service flow, the cloud game server renders every frame of the game picture and then encodes the rendered video frames. The whole process consumes substantial processing resources, which makes the processor overhead too high and reduces the service processing capability of the cloud game server.

Disclosure of Invention

The embodiments of the application provide a video processing method, a related apparatus, and a storage medium, which can save processing resources on the server side, reduce processor overhead, and help improve the service processing capability of the server.

In view of the above, an aspect of the present application provides a method for video processing, including:

acquiring an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer greater than or equal to 2;

acquiring a target video sequence according to the original video sequence, wherein the target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

coding a target video sequence to obtain a video coding sequence;

and sending the video coding sequence to the terminal equipment so that the terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered.

Another aspect of the present application provides a method for video processing, including:

receiving a video coding sequence sent by a server;

decoding the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises X video frames which are not rendered, and X is an integer greater than or equal to 2;

acquiring a target video sequence according to a video sequence to be rendered, wherein the target video sequence comprises X video frames which are not rendered and Y video frames which are obtained after frame interpolation processing, and Y is an integer which is greater than or equal to 1;

and rendering the target video sequence to obtain the target video.

Another aspect of the present application provides a method for video processing, including:

the method comprises the steps that a server obtains an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer larger than or equal to 2;

the method comprises the steps that a server obtains a first target video sequence according to an original video sequence, wherein the first target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

the server carries out coding processing on the first target video sequence to obtain a video coding sequence;

the server sends a video coding sequence to the terminal equipment;

the terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises (P + Q) video frames which are not rendered;

the terminal equipment acquires a second target video sequence according to the video sequence to be rendered, wherein the second target video sequence comprises (P + Q) video frames which are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

and the terminal equipment renders the second target video sequence to obtain the target video.

Another aspect of the present application provides a video processing apparatus, including:

the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an original video sequence, the original video sequence comprises P video frames obtained after rendering, and P is an integer greater than or equal to 2;

the acquisition module is further used for acquiring a target video sequence according to the original video sequence, wherein the target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

the encoding module is used for encoding a target video sequence to obtain a video encoding sequence;

and the sending module is used for sending the video coding sequence to the terminal equipment so that the terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered.

In one possible design, in one implementation of another aspect of an embodiment of the present application,

the acquiring module is specifically used for acquiring a first rendered video frame and a second rendered video frame from an original video sequence, wherein the first rendered video frame is a previous frame image adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are both video frames obtained after rendering;

performing frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first rendered video frame, and the target video frame is a previous frame image adjacent to the second rendered video frame;

and generating a first video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the first video subsequence is composed of the first rendered video frame, the target video frame and the second rendered video frame in sequence.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

an obtaining module, configured to obtain a first frame number corresponding to a first rendered video frame;

acquiring a second frame number corresponding to a second rendered video frame;

calculating the average value of the first frame number and the second frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

the acquiring module is specifically used for acquiring a first rendered video frame and a second rendered video frame from an original video sequence, wherein the first rendered video frame is a previous frame image adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are both video frames obtained after rendering;

performing frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second rendered video frame;

and generating a second video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the second video subsequence consists of the first rendered video frame, the second rendered video frame and the target video frame in sequence.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

an obtaining module, configured to obtain a second frame number corresponding to a second rendered video frame;

determining a next adjacent frame number of the second frame number as a third frame number;

calculating the average value of the second frame number and the third frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

Another aspect of the present application provides a video processing apparatus, including:

the receiving module is used for receiving a video coding sequence sent by the server;

the decoding module is used for decoding the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises X video frames which are not rendered, and X is an integer greater than or equal to 2;

the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a target video sequence according to a video sequence to be rendered, the target video sequence comprises X video frames which are not rendered and Y video frames which are obtained after frame interpolation processing, and Y is an integer which is greater than or equal to 1;

and the rendering module is used for rendering the target video sequence to obtain the target video.

In one possible design, in one implementation of another aspect of an embodiment of the present application,

the obtaining module is specifically configured to obtain a first unrendered video frame and a second unrendered video frame from the video sequence to be rendered, where the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames that have not been rendered;

performing frame interpolation processing on a first unrendered video frame and a second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first unrendered video frame, and the target video frame is a previous frame image adjacent to the second unrendered video frame;

and generating a first video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the first video subsequence is composed of the first unrendered video frame, the target video frame and the second unrendered video frame in sequence.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

an obtaining module, configured to obtain a first frame number corresponding to a first unrendered video frame;

acquiring a second frame number corresponding to a second unrendered video frame;

calculating the average value of the first frame number and the second frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

the obtaining module is specifically configured to obtain a first unrendered video frame and a second unrendered video frame from the video sequence to be rendered, where the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames that have not been rendered;

performing frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second unrendered video frame;

and generating a second video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the second video subsequence consists of the first unrendered video frame, the second unrendered video frame and the target video frame in sequence.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

an obtaining module, configured to obtain a second frame number corresponding to a second unrendered video frame;

determining a next adjacent frame number of the second frame number as a third frame number;

calculating the average value of the second frame number and the third frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

Another aspect of the present application provides a computer device, where the computer device is specifically a server, and the server includes: a memory, a transceiver, a processor, and a bus system;

wherein, the memory is used for storing programs;

a processor, configured to execute the program in the memory, so as to perform the methods of the above aspects according to instructions in the program code;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

Another aspect of the present application provides a computer device, where the computer device is specifically a terminal device, and the terminal device includes: a memory, a transceiver, a processor, and a bus system;

wherein, the memory is used for storing programs;

a processor, configured to execute the program in the memory, so as to perform the methods of the above aspects according to instructions in the program code;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

Another aspect of the present application provides a video processing system, which includes a server and a terminal device;

the server is used for acquiring an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer greater than or equal to 2;

the server is further used for acquiring a first target video sequence according to the original video sequence, wherein the first target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

the server is also used for coding the first target video sequence to obtain a video coding sequence;

the server is also used for sending the video coding sequence to the terminal equipment;

the terminal device is used for decoding the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises (P + Q) video frames which are not rendered;

the terminal device is further configured to obtain a second target video sequence according to the video sequence to be rendered, where the second target video sequence includes (P + Q) video frames that are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

and the terminal equipment is also used for rendering the second target video sequence to obtain the target video.

Another aspect of the present application provides a computer-readable storage medium having instructions stored thereon, which, when executed on a computer, cause the computer to perform the method of the above-described aspects.

According to the technical scheme, the embodiment of the application has the following advantages:

in the embodiments of the application, a video processing method is provided. A server first obtains an original video sequence that includes P video frames obtained after rendering, and then obtains a target video sequence according to the original video sequence; the target video sequence further includes Q video frames obtained after frame interpolation. The server then encodes the target video sequence to obtain a video coding sequence, and finally sends the video coding sequence to a terminal device so that the terminal device decodes it to obtain a video sequence to be rendered. In this way, the server only needs to render part of the video frames and then performs frame interpolation based on the rendered frames to obtain the target video sequence. Since frame interpolation consumes fewer resources than rendering, processing resources on the server side are saved, processor overhead is reduced, and the service processing capability of the server is improved.

Drawings

FIG. 1 is a schematic architecture diagram of a video processing system in an embodiment of the present application;

fig. 2 is a schematic diagram of a cloud architecture of a video processing system according to an embodiment of the present application;

FIG. 3 is a diagram of a cloud gaming architecture of a video processing system according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an embodiment of a video processing method in the embodiment of the present application;

fig. 5 is a schematic diagram of an embodiment of a server-based implementation of interpolated frame processing in an embodiment of the present application;

FIG. 6 is a diagram illustrating generation of a target video frame in an interpolated frame mode according to an embodiment of the present application;

FIG. 7 is a diagram of an embodiment of training an interpolation prediction model in an embodiment of the present application;

FIG. 8 is a diagram illustrating an embodiment of verifying an interpolated frame prediction model according to an embodiment of the present application;

FIG. 9 is a schematic diagram of an embodiment of implementing extrapolated frame processing based on a server in an embodiment of the present application;

FIG. 10 is a diagram illustrating generation of a target video frame in an extrapolation frame mode according to an embodiment of the present application;

fig. 11 is a schematic diagram of another embodiment of a video processing method in the embodiment of the present application;

fig. 12 is a schematic diagram of an embodiment of processing an interpolated frame based on a terminal device in an embodiment of the present application;

fig. 13 is a schematic diagram of an embodiment of implementing extrapolated frame processing based on a terminal device in an embodiment of the present application;

FIG. 14 is a schematic view of an interaction flow of a video processing method according to an embodiment of the present application;

FIG. 15 is a schematic diagram of an embodiment of an interpolation frame processing implemented by a video processing system according to an embodiment of the present application;

FIG. 16 is a schematic diagram of an embodiment of implementing extrapolated frame processing based on a video processing system in an embodiment of the present application;

FIG. 17 is a schematic diagram of an embodiment of implementing interpolation and extrapolation frame processing based on a video processing system in an embodiment of the present application;

FIG. 18 is a schematic diagram of another embodiment of implementing interpolation and extrapolation frame processing based on a video processing system in an embodiment of the present application;

fig. 19 is a schematic diagram of an embodiment of a video processing apparatus according to the embodiment of the present application;

fig. 20 is a schematic diagram of another embodiment of a video processing apparatus according to an embodiment of the present application;

FIG. 21 is a schematic structural diagram of a server in an embodiment of the present application;

fig. 22 is a schematic structural diagram of a terminal device in an embodiment of the present application;

fig. 23 is a schematic structural diagram of a video processing system according to an embodiment of the present application.

Detailed Description

The embodiment of the application provides a video processing method, a related device and a storage medium, which can save processing resources at a server side, reduce the overhead of a processor and facilitate the promotion of the service processing capacity of the server.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may, for example, be implemented in a sequence other than that illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that the video processing method provided by the application is applicable to service scenarios over a remote network and service scenarios under cloud technology; the latter include, but are not limited to, cloud game services and cloud video services. Taking a cloud game service as an example, with the dynamic frame interpolation technique of the application, the cloud game server only needs to render 30 game frames per second for a player to experience 60 frames per second, which saves processor overhead. Taking a cloud video service as an example, with the dynamic frame interpolation technique, the cloud video server only needs to render 12 animation frames per second for viewers to experience 24 frames per second, again saving processor overhead.
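The frame-rate arithmetic behind these two examples can be sketched as follows (a minimal illustration, not part of the original text; the function name and the one-interpolated-frame-per-rendered-frame ratio are assumptions):

```python
def rendered_fps(displayed_fps: float, interpolated_per_rendered: int = 1) -> float:
    """Frames per second the server must actually render when each rendered
    frame is followed by `interpolated_per_rendered` interpolated frames."""
    return displayed_fps / (1 + interpolated_per_rendered)

assert rendered_fps(60) == 30  # cloud game: player experiences 60 fps
assert rendered_fps(24) == 12  # cloud video: viewer experiences 24 fps
```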

For ease of understanding, the application provides a video processing method that can be applied to the video processing system shown in FIG. 1. Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of the video processing system in an embodiment of the application. As shown in the figure, one server may serve multiple terminal devices; for example, server 1 may establish communication connections with M terminal devices and server 2 with N terminal devices. The values of M and N depend on the processing capability of the server; in a typical case, N and M may each take a value of 100.

It is understood that the server shown in fig. 1 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, and is not limited herein.

For ease of understanding, the video processing method provided by the application can also be applied to the video processing system shown in FIG. 2. Referring to FIG. 2, FIG. 2 is a schematic cloud architecture diagram of the video processing system in an embodiment of the application. As shown in the figure, multiple cloud servers together form a cloud server cluster, and one cluster may serve multiple terminal devices; for example, a cluster of 4 cloud servers may establish communication connections with M terminal devices.

It is understood that the cloud server shown in FIG. 2 may be one providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms. The terminal devices shown in FIG. 1 and FIG. 2 may be, but are not limited to, a smartphone, a tablet computer, a television, an over-the-top (OTT) device, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. A client, for example a video client or a game client, is deployed on the terminal device. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the application.

The architecture shown in FIG. 2 adopts cloud technology to implement service communication. Cloud technology refers to a hosting technology that unifies resources such as hardware, software, and network within a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is the general term for the network, information, integration, management-platform, and application technologies applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently, and cloud computing technology will become an important support for it. Background services of a technical network system require a large amount of computing and storage resources, for example video websites, photo websites, and other web portals. With the rapid development and application of the internet industry, each article may come to have its own identification mark that must be transmitted to a background system for logical processing; data of different levels are processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.

Referring to FIG. 3, FIG. 3 is a schematic diagram of a cloud game architecture of the video processing system in an embodiment of the application. As shown in the figure, when a terminal device interacts with a cloud game server, there are a control stream and a data stream. The control stream is mainly responsible for carrying control signals: when a user triggers an operation through an input device (e.g., a keyboard, a mouse, or a joystick), the signal is encoded and transmitted to the cloud game server through the network. Meanwhile, the cloud game server transmits a data stream to the terminal device, comprising an audio data stream and a video data stream; the terminal device decodes the data stream and displays the result on the screen.

Cloud gaming, also known as gaming on demand, is an online gaming technology based on cloud computing. Cloud gaming enables light-end devices (thin clients) with relatively limited graphics processing and data computing capabilities to run high-quality games. In a cloud game scenario, the game does not run on the player's game terminal but in a cloud server; the cloud server renders the game scene into an audio and video stream that is transmitted to the player's game terminal through the network. The player's game terminal does not need strong graphics and data processing capabilities; it only needs basic streaming media playback capability and the ability to capture the player's input instructions and send them to the cloud server. Compared with the traditional gaming model, cloud gaming can greatly reduce the device cost for players. For many high-quality games that require long-term updates, cloud gaming can also reduce the cost of issuing the game and of update maintenance.

In terms of guaranteeing the player's game experience, the quality of the multimedia stream rendered for the game scene depends on the network communication bandwidth. The multimedia stream of a cloud game consumes more bandwidth than that of a traditional network game, and the higher the image quality of the stream, the more bandwidth it consumes. The video processing method provided by the application can not only reduce the resource overhead on the cloud server side but also reduce the number of video frames transmitted, thereby saving network bandwidth while preserving the quality of the game picture. Taking a massively multiplayer online (MMO) game as an example, one game instance consumes 36% of a GPU; after the method provided by the application is adopted, the GPU consumption is 20%. In the traditional mode at most 3 game processes run on one GPU, whereas with the method provided by the application 5 game processes can run on one GPU.

With reference to fig. 4, an embodiment of a video processing method in the present application will be described below from the perspective of a server, where the video processing method in the present application includes:

101. the method comprises the steps that a server obtains an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer larger than or equal to 2;

in this embodiment, the server acquires P consecutive frames, that is, P video frames, which may constitute an original video sequence. A video frame may be a rendered video picture, a rendered game picture, or another type of rendered picture. The server in the application may be a local server (e.g., a game server or a video server) or a cloud server (e.g., a cloud game server or a cloud video server).

Rendering a picture requires a program to compute an image from information such as the geometry and vertices of the graphics to be drawn. In this process, the computer's processor must perform a large number of operations. In practical applications, both central processing units (CPUs) and graphics processing units (GPUs) can perform rendering tasks. As requirements on resolution and picture quality grow ever higher, the single-precision floating-point performance of a CPU can hardly meet the rendering demands of complex pictures, so most graphics rendering work is taken over by the GPU, with the CPU scheduling the GPU for rendering through instructions.

102. The method comprises the steps that a server obtains a target video sequence according to an original video sequence, wherein the target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation processing, and Q is an integer greater than or equal to 1;

in this embodiment, the server performs frame interpolation on the P video frames in the original video sequence. Interpolation may be performed on every pair of adjacent video frames, for example on video frames No. 1 and No. 2, then on video frames No. 2 and No. 3, and so on. Interpolation may instead be performed on disjoint pairs of adjacent frames, for example on video frames No. 1 and No. 2 and then on video frames No. 3 and No. 4, and so on. Interpolation may also be performed on frame pairs taken at intervals, for example on video frames No. 1 and No. 2 and then on video frames No. 5 and No. 6. Q video frames obtained by frame interpolation are thus generated based on the original video sequence, and the Q video frames together with the P video frames constitute the target video sequence.

It will be appreciated that interpolating a frame consumes fewer resources than rendering one. Frame interpolation methods include, but are not limited to, frame sampling, frame blending, motion compensation, and optical flow. Frame sampling means lengthening the display time of each key frame, which is equivalent to inserting duplicates of the same key frame. Frame blending means adjusting the transparency of the key frames before and after the insertion point and compositing them into a new frame. Motion compensation means recognizing the motion of objects and interpolating frames with motion-compensated prediction. The optical flow method assumes that the gray value (or brightness value) of a given pixel remains constant across the previous and next frames, finds the motion trajectory of the pixel between the two frames, and then performs predictive frame interpolation based on that trajectory.
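Of the methods above, frame blending is the simplest to illustrate: the inserted frame is a per-pixel weighted average of the two neighbouring rendered frames. A minimal sketch, assuming NumPy arrays as frames (the function name and the 50/50 weighting are illustrative choices, not mandated by the application):

```python
import numpy as np

def blend_frames(prev_frame: np.ndarray, next_frame: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Frame-blending interpolation: the inserted frame is a weighted
    average of the two adjacent rendered frames; alpha = 0.5 yields the
    midpoint frame, e.g. frame n+0.5 between frames n and n+1."""
    blended = (1.0 - alpha) * prev_frame.astype(np.float32) \
              + alpha * next_frame.astype(np.float32)
    return blended.round().astype(prev_frame.dtype)
```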

103. The server carries out coding processing on the target video sequence to obtain a video coding sequence;

in this embodiment, the server encodes the target video sequence with an encoding algorithm, thereby generating a video coding sequence. The encoding may be executed on the server's CPU, on the GPU, or on other encoding hardware, for example an encoding chip within the GPU or a dedicated encoding chip independent of the GPU. The encoding algorithm may be H.264, H.265, VP8, VP9, or the like, which is not limited here.

In the cloud game scenario, since cloud gaming is usually a low-latency service, backward-coded reference frames and bidirectionally coded reference frames are not used in the encoding process. The reason is that if backward-coded or bidirectionally coded reference frames (i.e., B-frames) were used, the terminal device, on receiving the current video frame, would need to wait for the next video frame to arrive before decoding the current one, causing a delay of one frame. The video coding sequence includes at least one group of pictures (GOP), which is composed of an I-frame and a plurality of B-frames (or P-frames) and is also the basic unit accessed by the encoder and decoder. An I-frame is an independent frame that carries all of its information and can be decoded on its own without reference to other video frames. A P-frame must reference the preceding I-frame for encoding. A B-frame records the differences between the present video frame and both the preceding and following video frames.
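The one-frame delay argument can be made concrete with a small sketch (the GOP layouts and function name are hypothetical; real encoders configure frame types through their own APIs):

```python
def decode_delay_frames(pattern: list[str]) -> list[int]:
    """For each frame in display order, the number of future frames that
    must arrive before it can be decoded: 0 for I- and P-frames, 1 for a
    B-frame that references the next frame. This is the one-frame delay
    that rules out B-frames for low-latency cloud gaming."""
    return [1 if frame_type == "B" else 0 for frame_type in pattern]

assert decode_delay_frames(["I", "P", "P", "P"]) == [0, 0, 0, 0]
assert decode_delay_frames(["I", "B", "P", "B", "P"]) == [0, 1, 0, 1, 0]
```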

104. And the server sends the video coding sequence to the terminal equipment so that the terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered.

In this embodiment, the server sends the video coding sequence to the terminal device, so that the terminal device can decode the video coding sequence to obtain a video sequence to be rendered, and finally, render the video sequence to be rendered, so as to generate a target video, and display the target video on a screen of the terminal device. The decoding process may be executed on a CPU of the terminal device, may be executed on the GPU, and may also be executed on other decoding hardware, for example, a decoding chip within the GPU or a dedicated decoding chip independent of the GPU. After a video frame is obtained by decoding, the video frame can be read by a CPU or a GPU on the terminal equipment side, rendered and displayed on an interface.
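Pulling steps 101 to 104 together with the terminal-side behaviour they enable, the overall flow can be sketched as follows (all callables are placeholders standing in for the operations described above, not a concrete API):

```python
def server_pipeline(render_frames, interpolate_sequence, encode, send):
    original = render_frames()               # step 101: P rendered frames
    target = interpolate_sequence(original)  # step 102: P + Q frames
    encoded = encode(target)                 # step 103: video coding sequence
    send(encoded)                            # step 104: transmit to the terminal

def terminal_pipeline(receive, decode, render):
    encoded = receive()
    to_render = decode(encoded)              # video sequence to be rendered
    render(to_render)                        # render and display the target video
```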

In the embodiments of the application, a video processing method is provided. A server first obtains an original video sequence that includes P video frames obtained after rendering, and then obtains a target video sequence according to the original video sequence; the target video sequence further includes Q video frames obtained after frame interpolation. The server then encodes the target video sequence to obtain a video coding sequence, and finally sends the video coding sequence to a terminal device so that the terminal device decodes it to obtain a video sequence to be rendered. In this way, the server only needs to render part of the video frames and then performs frame interpolation based on the rendered frames to obtain the target video sequence. Since frame interpolation consumes fewer resources than rendering, processing resources on the server side are saved, processor overhead is reduced, and the service processing capability of the server is improved.

Optionally, on the basis of the foregoing embodiments corresponding to fig. 4, in another optional embodiment of the video processing method provided in the embodiment of the present application, the obtaining, by the server, the target video sequence according to the original video sequence may include:

the method comprises the steps that a server obtains a first rendered video frame and a second rendered video frame from an original video sequence, wherein the first rendered video frame is a previous frame image adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are video frames obtained after rendering;

the server carries out frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first rendered video frame, and the target video frame is a previous frame image adjacent to the second rendered video frame;

the server generates a first video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the first video subsequence is composed of the first rendered video frame, the target video frame and the second rendered video frame in sequence.

This embodiment describes a way of obtaining the target video sequence based on interpolated frames. It should be noted that, since an original video sequence may include a large number of rendered video frames, for convenience of description any two adjacent rendered video frames are taken as an example below; in practical applications, other pairs of adjacent frames are processed similarly, and the details are not repeated here.

Specifically, the server first obtains two adjacent video frames, namely a first rendered video frame and a second rendered video frame, from the original video sequence, and then generates a new video frame based on these two rendered frames, that is, a target video frame. The target video frame lies between the first rendered video frame and the second rendered video frame, corresponding to an additionally inserted frame image. A first video subsequence in the target video sequence is generated in the order of the first rendered video frame, the target video frame, and the second rendered video frame; in practical applications, a series of video subsequences can be generated in a similar manner to finally produce the target video sequence.

For ease of understanding, refer to FIG. 5, which is a schematic diagram of an embodiment of implementing interpolated frame processing based on a server in an embodiment of the present application. As shown in the figure, taking a cloud game scenario as an example, the cloud game server captures the rendered nth and (n + 1)th game frames and performs frame interpolation on them to obtain an additional game frame, i.e., the (n + 0.5)th frame. Similarly, the cloud game server continues to acquire the (n + 2)th game frame and performs frame interpolation on the (n + 1)th and (n + 2)th frames to obtain another additional frame, i.e., the (n + 1.5)th frame. By analogy, the target video sequence is formed by the game pictures of the nth, (n + 0.5)th, (n + 1)th, (n + 1.5)th, (n + 2)th frames and so on, and the target video sequence is then encoded to obtain the video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, so that the terminal device decodes it to obtain a video sequence to be rendered, which is finally rendered and displayed on the terminal device's interface.

Based on the above description, refer to FIG. 6, which is a schematic diagram of generating a target video frame in the interpolated frame mode in an embodiment of the present application. As shown in the figure, assuming the first rendered video frame is the nth frame and the second rendered video frame is the (n + 1)th frame, the target video frame generated by frame interpolation is the (n + 0.5)th frame. It can be seen that if the server renders 30 frames of pictures, the actually encoded pictures may number 60 frames; the pictures output to the terminal device are likewise 60 frames, so the terminal device decodes 60 frames. The image effect of the interpolated frame is good, making this mode suitable for services with low latency requirements but high picture-quality requirements, such as non-real-time fighting games. A sketch of this sequence assembly follows.
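The sketch below shows how the interpolated-frame mode assembles the target video sequence (the function name and the interpolation callback are illustrative; any of the interpolation methods described earlier could supply the callback):

```python
def interpolate_sequence(rendered: list, interpolate) -> list:
    """Interpolated-frame mode: between every pair of adjacent rendered
    frames (n, n+1), insert a synthesized frame n+0.5. In a continuous
    stream this roughly doubles the frame rate (e.g. 30 rendered frames
    per second become about 60 encoded frames per second)."""
    target = [rendered[0]]
    for prev, nxt in zip(rendered, rendered[1:]):
        target.append(interpolate(prev, nxt))  # the n+0.5 frame
        target.append(nxt)
    return target

# With frame numbers standing in for frames: 1, 2, 3 -> 1, 1.5, 2, 2.5, 3
assert interpolate_sequence([1, 2, 3], lambda a, b: (a + b) / 2) \
       == [1, 1.5, 2, 2.5, 3]
```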

Secondly, in the embodiment of the present application, a method for obtaining a target video sequence based on interpolated frames is provided. In the above manner, the target video sequence obtained by using interpolated frames has a better image effect; however, since waiting for the next rendered frame introduces a delay of one frame, the method is more suitable for services with low requirements on time delay but high requirements on picture quality. Therefore, the overhead of the server-side processor is saved while the picture quality is improved.

Optionally, on the basis of each embodiment corresponding to fig. 4, in another optional embodiment of the video processing method provided in the embodiment of the present application, the performing, by the server, frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain the target video frame may include:

the server acquires a first frame number corresponding to a first rendered video frame;

the server acquires a second frame number corresponding to the second rendered video frame;

the server calculates the average value of the first frame number and the second frame number to obtain a target frame number;

the server obtains K pieces of pixel information corresponding to the target video frame through an interpolation prediction model based on the target frame number, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

In this embodiment, a manner of interpolating a frame based on a frame interpolation prediction model is introduced. It should be noted that, since an original video sequence may include a large number of rendered video frames, any two adjacent rendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, each rendered video frame corresponds to a frame number, where the frame number of the first rendered video frame is a first frame number, the frame number of the second rendered video frame is a second frame number, and assuming that the first frame number is n and the second frame number is n +1, the target frame number is calculated as follows:

U=[n+(n+1)]/2;

wherein, U represents the target frame number, i.e., n + 0.5.

The target frame number is input into the trained frame interpolation prediction model, which outputs K pieces of pixel information, where K represents the total number of pixels included in one video frame; once the pixel information corresponding to all K pixels is obtained, the target video frame is obtained. The pixel information may be expressed in luminance-chrominance (YUV) form or in Red-Green-Blue (RGB) form.
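Under the assumption that the trained model can be queried per pixel as a function of frame number and coordinate (the signature model(t, x, y) is an assumption made here for illustration, matching the f(t, pos) notation introduced below), this step reduces to the following sketch:

```python
def predict_interpolated_frame(model, first_no: int, second_no: int,
                               width: int, height: int):
    """Reconstruct the target video frame from K = width * height pieces of
    pixel information predicted by a frame-number-conditioned model.

    model(t, x, y) is assumed to return the pixel information (e.g. an RGB
    or YUV triple) at coordinate (x, y) of frame number t.
    """
    target_no = (first_no + second_no) / 2  # U = [n + (n + 1)] / 2 = n + 0.5
    return [[model(target_no, x, y) for x in range(width)]
            for y in range(height)]
```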

In the following, the manner of training the frame interpolation prediction model is described, taking a cloud game scene as an example. In general, one corresponding frame interpolation prediction model may be trained for each game, or the same frame interpolation prediction model may be trained for a plurality of games, which is not limited here. First, a video to be trained needs to be acquired, where the video to be trained includes a plurality of frames of training images. During training, the m-th to r-th frame training images may be extracted from the video to be trained, where 0 < m < r, and m may take different values for different frame interpolation prediction models. The m-th to r-th frame training images are taken as a known image frame sequence, and the pixel information of each frame of training image in the known image frame sequence is extracted. The frame interpolation prediction model can then be obtained by training with the pixel information of each frame of training image in the known image frame sequence.

The present application represents an interpolated frame prediction model in the following manner:

f(t, pos) ≈ frame(t, pos)

wherein t represents the frame number, taking values greater than or equal to m and less than or equal to r, and pos represents a coordinate point (x, y) in the video frame. frame(t, pos) represents the pixel information at the coordinate point pos in the t-th video frame, which may be expressed in RGB form, YUV form, or another form, which is not exhaustively listed here. The frame interpolation prediction model f(t, pos) may be a first-order fitting function or a higher-order fitting function, or may adopt a functional relationship determined by a neural network or a deep learning method.
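As one concrete, purely illustrative instance of such a fitting function (covering only the first-order/higher-order fitting option, not the neural-network variant), f(t, pos) can be realized as an independent least-squares polynomial fit of pixel intensity against frame number for every coordinate, using the known image frame sequence from frame m to frame r:

```python
import numpy as np

def fit_pixelwise_model(known_frames: np.ndarray, frame_numbers: np.ndarray,
                        degree: int = 1):
    """Fit f(t, pos) as a per-pixel polynomial in the frame number t.

    known_frames  : shape (T, H, W), one colour channel of frames m..r
    frame_numbers : shape (T,), the frame numbers m..r
    Returns f(t), which evaluates the fitted model at a (possibly fractional) t.
    """
    T, H, W = known_frames.shape
    flat = known_frames.reshape(T, H * W).astype(np.float64)
    coeffs = np.polyfit(frame_numbers, flat, degree)  # shape (degree + 1, H * W)

    def f(t: float) -> np.ndarray:
        powers = np.vander([t], degree + 1)[0]        # [t**degree, ..., t, 1]
        return (powers @ coeffs).reshape(H, W)        # pixel info of frame t

    return f
```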

For ease of understanding, please refer to fig. 7, which is a schematic diagram of an embodiment of training the frame interpolation prediction model in the embodiment of the present application. As shown in the figure, if a prediction model for frame interpolation is to be trained, the m-th frame training image and the (m + 2)-th frame training image may be input to the frame interpolation prediction model to be trained, and the model outputs a target image, where the target image is the predicted (m + 1)-th frame training image. The rest is deduced by analogy until each training image in the known image frame sequence has been processed in a similar manner.

If a prediction model for frame extrapolation is to be trained, the m-th frame training image and the (m + 1)-th frame training image may be input to the prediction model to be trained, and the model outputs a target image, where the target image is the predicted (m + 2)-th frame training image. The rest is deduced by analogy until each training image in the known image frame sequence has been processed in a similar manner.

After a plurality of target images are obtained through prediction, the quality of the frame interpolation prediction model needs to be evaluated. One feasible evaluation manner is to compute the following loss function:

L = ∑(f(t, pos) - frame(t, pos))²

wherein L represents the loss value, and the summation is taken over the frame numbers t and the coordinate points pos.

For convenience of illustration, please refer to fig. 8, which is a schematic diagram of an embodiment of verifying the frame interpolation prediction model in this embodiment. As shown in the figure, taking a known image frame sequence at 60 frames per second (FPS) as an example, the images corresponding to the odd frames are used as inputs of the frame interpolation prediction model, and target images corresponding to the even frames are output by the model. Each target image is then compared with the corresponding known even-frame image. For example, the 1st image and the 3rd image are input into the frame interpolation prediction model, the model outputs a target image, and the pixel information of each pixel point of the target image is compared with the pixel information of each pixel point of the 2nd image; if the difference between the two is smaller than a threshold, the training of the frame interpolation prediction model is completed.
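A sketch of this verification loop, assuming the model is wrapped as a callable that predicts the intermediate frame from two known frames represented as numpy arrays (all names are illustrative):

```python
import numpy as np

def validate_interpolation_model(model, known_frames, threshold: float) -> bool:
    """Feed the odd frames of a known 60 FPS sequence to the model, predict the
    even frames, and accept the model when the squared-error loss
    L = sum (f(t, pos) - frame(t, pos))^2 falls below a threshold.
    """
    total_loss = 0.0
    for i in range(0, len(known_frames) - 2, 2):
        predicted = model(known_frames[i], known_frames[i + 2])   # 1st and 3rd images in, etc.
        diff = predicted.astype(np.float64) - known_frames[i + 1]  # compare with the 2nd image, etc.
        total_loss += float(np.sum(diff ** 2))
    return total_loss < threshold
```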

In the embodiment of the present application, a method for interpolating a frame based on an interpolation frame prediction model is provided, and in the above manner, each pixel information in a target video frame can be predicted by using a trained interpolation frame prediction model, and the target video frame is reconstructed from the pixel information, so that a process of interpolating one frame in the video frame is implemented, and thus, the feasibility and operability of the scheme are improved.

Optionally, on the basis of the foregoing embodiments corresponding to fig. 4, in another optional embodiment of the video processing method provided in the embodiment of the present application, the obtaining, by the server, the target video sequence according to the original video sequence may include:

the method comprises the steps that a server obtains a first rendered video frame and a second rendered video frame from an original video sequence, wherein the first rendered video frame is a previous frame image adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are video frames obtained after rendering;

the server carries out frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second rendered video frame;

and the server generates a second video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the second video subsequence consists of the first rendered video frame, the second rendered video frame and the target video frame in sequence.

In this embodiment, a manner of obtaining a target video sequence based on extrapolated frames is introduced. It should be noted that, since an original video sequence may include a large number of rendered video frames, any two adjacent rendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, the server first obtains two adjacent video frames, namely a first rendered video frame and a second rendered video frame, from the original video sequence, and then generates a new video frame, namely a target video frame, based on these two rendered video frames. The target video frame is a frame located between the second rendered video frame and the next rendered video frame, which is equivalent to appending an additional frame of image. A second video subsequence in the target video sequence is then generated in the order of the first rendered video frame, the second rendered video frame, and the target video frame; in practical applications, a series of video subsequences is generated in a similar manner, finally forming the target video sequence.

For ease of understanding, please refer to fig. 9, which is a schematic diagram of an embodiment of implementing extrapolation frame processing on the server side in the embodiment of the present application. As shown in the figure, taking a cloud game scene as an example, the cloud game server captures the rendered n-th game picture and (n + 1)-th game picture, and performs extrapolation frame processing on them to obtain an additional game picture, namely the (n + 1.5)-th game picture. Similarly, the cloud game server continues to acquire the (n + 2)-th game picture, and performs extrapolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain another additional game picture, namely the (n + 2.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, the (n + 2.5)-th frame, and so on form the target video sequence, and the target video sequence is then encoded to obtain the video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, the terminal device decodes the video coding sequence to obtain a video sequence to be rendered, and finally the video sequence is rendered and displayed on the interface of the terminal device.

Based on the above description, please refer to fig. 10, which is a schematic diagram of generating a target video frame in the extrapolated frame mode according to the embodiment of the present application. As shown in the figure, assuming that the first rendered video frame is the nth frame and the second rendered video frame is the (n + 1)th frame, the target video frame generated by the extrapolation frame processing is the (n + 1.5)th frame. It can be seen that if the server renders 30 frames of pictures, the actually encoded pictures may be 60 frames and the pictures output to the terminal device are also 60 frames, so the terminal device decodes 60 frames. The extrapolated frame usually introduces no extra time delay and is suitable for services with high requirements on time delay, such as real-time battle games.
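By contrast with the interpolated-frame sketch above, the extrapolated-frame sequence can be assembled as follows (again a hedged sketch; `extrapolate` is a placeholder for any frame-extrapolation technique):

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")

def build_extrapolated_sequence(
    rendered: Sequence[Frame],
    extrapolate: Callable[[Frame, Frame], Frame],
) -> List[Frame]:
    """Append one synthesized frame after every pair of adjacent rendered frames.

    Frames n and n+1 yield frame n+1.5, which lies beyond the pair, so the
    synthesized frame never waits for a future rendered frame; this is why
    extrapolation adds no extra delay.
    """
    target: List[Frame] = [rendered[0]]
    for prev, nxt in zip(rendered, rendered[1:]):
        target.append(nxt)
        target.append(extrapolate(prev, nxt))  # frame n+1.5 from frames n and n+1
    return target
```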

Secondly, in the embodiment of the present application, a method for obtaining a target video sequence based on extrapolated frames is provided. The target video sequence obtained by using extrapolated frames usually introduces no extra time delay, so the method is more suitable for services with high requirements on time delay but relatively low requirements on picture quality. Therefore, the overhead of the server-side processor is saved, and picture lag caused by extra delay is avoided.

Optionally, on the basis of each embodiment corresponding to fig. 4, in another optional embodiment of the video processing method provided in the embodiment of the present application, the performing, by the server, frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain the target video frame may include:

the server acquires a second frame number corresponding to the second rendered video frame;

the server determines the next adjacent frame number of the second frame number as a third frame number;

the server calculates the average value of the second frame number and the third frame number to obtain a target frame number;

the server obtains K pieces of pixel information corresponding to the target video frame through an interpolation prediction model based on the target frame number, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

In this embodiment, a manner of performing frame extrapolation based on the frame interpolation prediction model is introduced. It should be noted that, since an original video sequence may include a large number of rendered video frames, any two adjacent rendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, each rendered video frame corresponds to a frame number, where the frame number of the first rendered video frame is a first frame number, the frame number of the second rendered video frame is a second frame number, and assuming that the second frame number is n +1 and the third frame number is n +2, the target frame number is calculated as follows:

U=[n+1+(n+2)]/2;

wherein, U represents the target frame number, i.e., n + 1.5.

The target frame number is input into the trained frame interpolation prediction model, which outputs K pieces of pixel information, where K represents the total number of pixels included in one video frame; once the pixel information corresponding to all K pixels is obtained, the target video frame is obtained. The pixel information may be expressed in YUV form or in RGB form.

It should be noted that the training method of the frame interpolation prediction model has been described in the above embodiments, and therefore, the description thereof is omitted here.

In the embodiment of the application, a method for performing frame extrapolation based on an interpolation frame prediction model is provided, and through the method, each pixel information in a target video frame can be predicted by using a trained interpolation frame prediction model, and the target video frame is reconstructed by using the pixel information, so that a process of extrapolating one frame from the video frame is realized, and the feasibility and the operability of the scheme are improved.

With reference to fig. 11, a method for processing a video in the present application will be described below from the perspective of a terminal device, where another embodiment of the method for processing a video in the present application includes:

201. the terminal equipment receives a video coding sequence sent by the server;

in this embodiment, the server acquires P consecutive video frames, and these P video frames may constitute the original video sequence. A video frame may be a rendered video picture, a rendered game picture, or another type of rendered picture. The server in the present application may be a local server (e.g., a game server or a video server), or may be a cloud server (e.g., a cloud game server or a cloud video server).

The server adopts a coding algorithm to code the original video sequence, thereby generating a video coding sequence. The encoding process may be executed on a CPU of the server, or may be executed on the GPU, or may be executed on other encoding hardware, for example, an encoding chip in the GPU or a dedicated encoding chip independent from the GPU. The encoding algorithm may be H264, H265, VP8, VP9, or the like, and is not limited herein.

202. The terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises X video frames which are not rendered, and X is an integer greater than or equal to 2;

in this embodiment, the server sends a video coding sequence to the terminal device, where the video coding sequence includes at least two video frames. The terminal device may decode the video coding sequence, so as to obtain a video sequence to be rendered, where the video sequence to be rendered includes X video frames that are not rendered, and the number of video frames included in the video sequence to be rendered is the same as the number of video frames included in the video coding sequence.

203. The method comprises the steps that terminal equipment obtains a target video sequence according to a video sequence to be rendered, wherein the target video sequence comprises X video frames which are not rendered and Y video frames which are obtained after frame interpolation processing, and Y is an integer which is greater than or equal to 1;

in this embodiment, the terminal device performs frame interpolation on the X video frames in the video sequence to be rendered. The frame interpolation may be performed on every pair of adjacent video frames, for example, on the No. 1 and No. 2 video frames, then on the No. 2 and No. 3 video frames, and so on. It may also be performed on disjoint pairs of video frames, for example, on the No. 1 and No. 2 video frames and then on the No. 3 and No. 4 video frames, and so on. It may also be performed on pairs separated by an interval, for example, on the No. 1 and No. 2 video frames and then on the No. 5 and No. 6 video frames. Y video frames are generated by frame interpolation based on the video sequence to be rendered, and the X video frames and the Y video frames jointly form the target video sequence.

It can be appreciated that inserting a frame of image consumes fewer resources than rendering a frame of image. Frame interpolation methods include, but are not limited to, frame sampling, frame blending, motion compensation, and optical flow, which are not described in detail here.
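The three pairing strategies described above can be summarized compactly; the following sketch enumerates the index pairs only (the strategy names are illustrative labels, not terms from this application):

```python
from typing import List, Tuple

def interpolation_pairs(num_frames: int, strategy: str = "adjacent",
                        gap: int = 4) -> List[Tuple[int, int]]:
    """Enumerate the frame-number pairs between which the terminal inserts frames.

    'adjacent' : (1,2), (2,3), (3,4), ...  every neighbouring pair
    'disjoint' : (1,2), (3,4), (5,6), ...  non-overlapping pairs
    'spaced'   : (1,2), (5,6), ...         pairs separated by an interval
    (1-based numbering, matching the "No. 1 video frame" wording above)
    """
    if strategy == "adjacent":
        return [(i, i + 1) for i in range(1, num_frames)]
    if strategy == "disjoint":
        return [(i, i + 1) for i in range(1, num_frames, 2)]
    if strategy == "spaced":
        return [(i, i + 1) for i in range(1, num_frames, gap)]
    raise ValueError(f"unknown strategy: {strategy}")
```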

204. And the terminal equipment performs rendering processing on the target video sequence to obtain the target video.

In this embodiment, the terminal device performs rendering processing on the target video sequence, so as to generate a target video, and displays the target video on a screen of the terminal device. The decoding process may be executed on a CPU of the terminal device, may be executed on the GPU, and may also be executed on other decoding hardware, for example, a decoding chip within the GPU or a dedicated decoding chip independent of the GPU. After a video frame is obtained by decoding, the video frame can be read by a CPU or a GPU on the terminal equipment side, rendered and displayed on an interface.

Taking one scenario as an example, assume the server generates 30 frames of images at a code rate of 10 megabits per second (Mbps); after frame interpolation, the terminal device can achieve the effect of 60 frames of images. By contrast, if the server generates 60 frames of images, the corresponding code rate is 20 Mbps, and the terminal device achieves the 60-frame effect without frame interpolation. Therefore, performing frame interpolation on the terminal device can save transmission bandwidth.
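The bandwidth saving can be made concrete with the figures above (which are example values only; actual code rates depend on content and encoder):

```python
# Example figures from the text; illustrative only.
interp_fps, interp_rate_mbps = 30, 10  # server sends 30 fps; terminal interpolates to 60 fps
direct_fps, direct_rate_mbps = 60, 20  # server sends 60 fps directly, no terminal interpolation

saved = direct_rate_mbps - interp_rate_mbps
print(f"Same {direct_fps}-frame effect, {saved} Mbps "
      f"({saved / direct_rate_mbps:.0%}) of transmission bandwidth saved")
# -> Same 60-frame effect, 10 Mbps (50%) of transmission bandwidth saved
```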

In the embodiment of the application, a video processing method is provided, in which a terminal device receives a video coding sequence sent by a server, decodes the video coding sequence to obtain a video sequence to be rendered, obtains a target video sequence according to the video sequence to be rendered, and finally renders the target video sequence to obtain a target video. Through the above manner, the server only needs to render fewer video frames and transmit them to the terminal device, and the terminal device generates the target video sequence by frame interpolation and renders the target video. For the server, the whole process saves processing resources and encoding overhead, which is beneficial to improving the service processing capability of the server; for the client, the whole process saves transmission bandwidth.

Optionally, on the basis of the foregoing embodiments corresponding to fig. 11, in another optional embodiment of the video processing method provided in the embodiment of the present application, the obtaining, by the terminal device, the target video sequence according to the video sequence to be rendered may include:

the terminal equipment acquires a first unrendered video frame and a second unrendered video frame from a video sequence to be rendered, wherein the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames that have not been rendered;

the terminal equipment performs frame interpolation processing on a first unrendered video frame and a second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first unrendered video frame, and the target video frame is a previous frame image adjacent to the second unrendered video frame;

the terminal equipment generates a first video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the first video subsequence is composed of the first unrendered video frame, the target video frame and the second unrendered video frame in sequence.

In this embodiment, a manner of obtaining a target video sequence based on interpolated frames is described. It should be noted that, since a video sequence to be rendered may include a large number of unrendered video frames, any two adjacent unrendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, the terminal device first obtains two adjacent video frames, namely a first unrendered video frame and a second unrendered video frame, from the video sequence to be rendered, and then generates a new video frame, namely a target video frame, based on these two unrendered video frames. The target video frame is a frame located between the first unrendered video frame and the second unrendered video frame, which is equivalent to inserting an additional frame of image. A first video subsequence in the target video sequence is then generated in the order of the first unrendered video frame, the target video frame, and the second unrendered video frame; in practical applications, a series of video subsequences is generated in a similar manner, finally forming the target video sequence.

For ease of understanding, please refer to fig. 12, which is a schematic diagram of an embodiment of implementing interpolation frame processing on the terminal device in the embodiment of the present application. As shown in the figure, taking a cloud game scene as an example, the cloud game server captures the rendered game pictures frame by frame; when multiple frames of rendered game pictures are obtained, an original video sequence is formed, and the original video sequence is then encoded to obtain a video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, and the terminal device decodes the video coding sequence to obtain a video sequence to be rendered. The terminal device may then perform interpolation frame processing based on the n-th game picture and the (n + 1)-th game picture to obtain an additional game picture, namely the (n + 0.5)-th game picture. Similarly, the terminal device continues to acquire the (n + 2)-th game picture, and performs interpolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain another additional game picture, namely the (n + 1.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 0.5)-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, and so on together form the target video sequence, which is finally rendered and displayed on the interface of the terminal device.

Based on the above description, referring to fig. 6 again, it is assumed that the first unrendered video frame is the nth frame and the second unrendered video frame is the (n + 1)th frame; the target video frame generated by the interpolation frame processing is the (n + 0.5)th frame. It can be seen that if the server renders 30 frames of pictures, the actually encoded pictures may be 30 frames and the pictures output to the terminal device are also 30 frames; the terminal device therefore decodes 30 frames and, after interpolation frame processing, renders 60 frames of images. The interpolated frame has a good image effect and is suitable for services with low requirements on time delay but high requirements on picture quality, such as non-real-time battle games.

Secondly, in the embodiment of the present application, a method for obtaining a target video sequence based on interpolated frames is provided. In the above manner, the target video sequence obtained by using interpolated frames has a better image effect; however, since waiting for the next rendered frame introduces a delay of one frame, the method is more suitable for services with low requirements on time delay but high requirements on picture quality. Therefore, the overhead of the server-side processor is saved while the picture quality is improved.

Optionally, on the basis of each embodiment corresponding to fig. 11, in another optional embodiment of the video processing method provided in the embodiment of the present application, the performing, by the terminal device, frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain the target video frame may include:

the terminal equipment acquires a first frame number corresponding to a first unrendered video frame;

the terminal equipment acquires a second frame number corresponding to a second unrendered video frame;

the terminal equipment calculates the average value of the first frame number and the second frame number to obtain a target frame number;

the terminal equipment obtains K pieces of pixel information corresponding to the target video frame through an interpolation frame prediction model based on the target frame number, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

In this embodiment, a manner of interpolating a frame based on the frame interpolation prediction model is introduced. It should be noted that, since a video sequence to be rendered may include a large number of unrendered video frames, any two adjacent unrendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, each unrendered video frame corresponds to a frame number, where the frame number of the first unrendered video frame is a first frame number, the frame number of the second unrendered video frame is a second frame number, and assuming that the first frame number is n and the second frame number is n +1, the calculation method of the target frame number is as follows:

U=[n+(n+1)]/2;

wherein, U represents the target frame number, i.e., n + 0.5.

The target frame number is input into the trained frame interpolation prediction model, which outputs K pieces of pixel information, where K represents the total number of pixels included in one video frame; once the pixel information corresponding to all K pixels is obtained, the target video frame is obtained. The pixel information may be expressed in YUV form or in RGB form.

In the embodiment of the present application, a method for interpolating a frame based on an interpolation frame prediction model is provided, and in the above manner, each pixel information in a target video frame can be predicted by using a trained interpolation frame prediction model, and the target video frame is reconstructed from the pixel information, so that a process of interpolating one frame in the video frame is implemented, and thus, the feasibility and operability of the scheme are improved.

Optionally, on the basis of the foregoing embodiments corresponding to fig. 11, in another optional embodiment of the video processing method provided in the embodiment of the present application, the obtaining, by the terminal device, the target video sequence according to the video sequence to be rendered may include:

the terminal equipment acquires a first unrendered video frame and a second unrendered video frame from a video sequence to be rendered, wherein the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames that have not been rendered;

the terminal equipment carries out frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second unrendered video frame;

and the terminal equipment generates a second video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the second video subsequence consists of the first unrendered video frame, the second unrendered video frame and the target video frame in sequence.

In this embodiment, a manner of obtaining a target video sequence based on extrapolated frames is introduced. It should be noted that, since a video sequence to be rendered may include a large number of unrendered video frames, any two adjacent unrendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, the terminal device first acquires two adjacent video frames, namely a first unrendered video frame and a second unrendered video frame, from the video sequence to be rendered, and then generates a new video frame, namely a target video frame, based on these two unrendered video frames. The target video frame is a frame located between the second unrendered video frame and the next unrendered video frame, which is equivalent to appending an additional frame of image. A second video subsequence in the target video sequence is then generated in the order of the first unrendered video frame, the second unrendered video frame, and the target video frame; in practical applications, a series of video subsequences is generated in a similar manner, finally forming the target video sequence.

For ease of understanding, please refer to fig. 13, which is a schematic diagram of an embodiment of implementing extrapolation frame processing on the terminal device in the embodiment of the present application. As shown in the figure, taking a cloud game scene as an example, the cloud game server captures the rendered game pictures frame by frame; when multiple frames of rendered game pictures are obtained, an original video sequence is formed, and the original video sequence is then encoded to obtain a video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, and the terminal device decodes the video coding sequence to obtain a video sequence to be rendered. The terminal device can then perform extrapolation frame processing based on the n-th game picture and the (n + 1)-th game picture to obtain an additional game picture, namely the (n + 1.5)-th game picture. Similarly, the terminal device continues to acquire the (n + 2)-th game picture, and performs extrapolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain another additional game picture, namely the (n + 2.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, the (n + 2.5)-th frame, and so on together form the target video sequence, which is finally rendered and displayed on the interface of the terminal device.

Based on the above description, please refer to fig. 10 again. As shown in the figure, it is assumed that the first unrendered video frame is the nth frame and the second unrendered video frame is the (n + 1)th frame; the target video frame generated by the extrapolation frame processing is the (n + 1.5)th frame. It can be seen that if the server renders 30 frames of pictures, the actually encoded pictures may be 30 frames and the pictures output to the terminal device are also 30 frames; the terminal device therefore decodes 30 frames and, after extrapolation frame processing, renders 60 frames of images. The extrapolated frame usually introduces no extra time delay and is suitable for services with high requirements on time delay, such as real-time battle games.

Secondly, in the embodiment of the present application, a method for obtaining a target video sequence based on extrapolated frames is provided. The target video sequence obtained by using extrapolated frames usually introduces no extra time delay, so the method is more suitable for services with high requirements on time delay but relatively low requirements on picture quality. Therefore, the overhead of the server-side processor is saved, and picture lag caused by extra delay is avoided.

Optionally, on the basis of each embodiment corresponding to fig. 11, in another optional embodiment of the video processing method provided in the embodiment of the present application, the performing, by the terminal device, frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain the target video frame may include:

the terminal equipment acquires a second frame number corresponding to a second unrendered video frame;

the terminal equipment determines the next adjacent frame number of the second frame number as a third frame number;

the terminal equipment calculates the average value of the second frame number and the third frame number to obtain a target frame number;

the terminal equipment obtains K pieces of pixel information corresponding to the target video frame through an interpolation frame prediction model based on the target frame number, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

In this embodiment, a manner of performing frame extrapolation based on the frame interpolation prediction model is introduced. It should be noted that, since a video sequence to be rendered may include a large number of unrendered video frames, any two adjacent unrendered video frames are taken as an example below for convenience of description; in practical applications, other pairs of adjacent frames are processed in a similar manner, and details are not repeated here.

Specifically, each unrendered video frame corresponds to a frame number, where the frame number of the first unrendered video frame is a first frame number, the frame number of the second unrendered video frame is a second frame number, and assuming that the second frame number is n +1 and the third frame number is n +2, the target frame number is calculated as follows:

U=[n+1+(n+2)]/2;

wherein, U represents the target frame number, i.e., n + 1.5.

The target frame number is input into the trained frame interpolation prediction model, which outputs K pieces of pixel information, where K represents the total number of pixels included in one video frame; once the pixel information corresponding to all K pixels is obtained, the target video frame is obtained. The pixel information may be expressed in YUV form or in RGB form.

It should be noted that the training method of the frame interpolation prediction model has been described in the above embodiments, and therefore, the description thereof is omitted here.

In the embodiment of the application, a method for performing frame extrapolation based on an interpolation frame prediction model is provided, and through the method, each pixel information in a target video frame can be predicted by using a trained interpolation frame prediction model, and the target video frame is reconstructed by using the pixel information, so that a process of extrapolating one frame from the video frame is realized, and the feasibility and the operability of the scheme are improved.

With reference to fig. 14, a method for video processing in the present application will be described from the perspective of a video processing system, and another embodiment of the video processing method in the present application includes:

301. the method comprises the steps that a server obtains an original video sequence, wherein the original video sequence comprises P video frames obtained after rendering, and P is an integer larger than or equal to 2;

in this embodiment, the server acquires P consecutive video frames, and these P video frames may constitute an original video sequence. A video frame may be a rendered video picture, a rendered game picture, or another type of rendered picture. For convenience of description, the server in the present application is described taking a cloud game server applied to a cloud game service as an example; in addition, in practical applications, a CPU and a GPU may jointly perform the rendering task, which should not be construed as limiting the present application.

302. The method comprises the steps that a server obtains a first target video sequence according to an original video sequence, wherein the first target video sequence comprises P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

in this embodiment, the server performs frame interpolation on P video frames in the original video sequence, so as to obtain the first target video sequence, where resources consumed for interpolating one frame of image are less than resources consumed for rendering one frame of image. It is understood that the manner of the frame interpolation process is similar to that of step 102, and therefore, the description thereof is omitted here.

303. The server carries out coding processing on the first target video sequence to obtain a video coding sequence;

in this embodiment, the server performs encoding processing on the first target video sequence by using an encoding algorithm, so as to generate a video encoding sequence. The encoding process may be executed on the CPU of the server, the GPU, or other encoding hardware, for example, an encoding chip inside the GPU or a dedicated encoding chip independent of the GPU. The encoding algorithm may be H264, H265, VP8, VP9, or the like, and is not limited herein. It is understood that the encoding process may refer to the content described in step 103, and thus the description is omitted here.

304. The server sends a video coding sequence to the terminal equipment;

in this embodiment, the server transmits a video encoding sequence to the terminal device through the network, and then the terminal device receives the video encoding sequence, where the video encoding sequence includes (P + Q) video frames.

305. The terminal equipment decodes the video coding sequence to obtain a video sequence to be rendered, wherein the video sequence to be rendered comprises (P + Q) video frames which are not rendered;

in this embodiment, the terminal device may decode the video coding sequence, so as to obtain a video sequence to be rendered, where the video sequence to be rendered includes (P + Q) video frames that are not rendered, that is, the number of video frames included in the video sequence to be rendered is the same as the number of video frames included in the video coding sequence.

306. The terminal equipment acquires a second target video sequence according to the video sequence to be rendered, wherein the second target video sequence comprises (P + Q) video frames which are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

in this embodiment, the terminal device performs frame interpolation on the (P + Q) video frames in the video sequence to be rendered. The frame interpolation may be performed on adjacent video frames, on disjoint pairs of video frames, or on pairs separated by an interval, which is not limited here. Y video frames are generated by frame interpolation based on the video sequence to be rendered, and the (P + Q) video frames and the Y video frames jointly form the second target video sequence.

It is understood that the frame interpolation method includes, but is not limited to, frame sampling, frame mixing, motion compensation, and optical flow method, which are not described herein again.

307. And the terminal equipment renders the second target video sequence to obtain the target video.

In this embodiment, the terminal device performs rendering processing on the second target video sequence, so as to generate a target video, and displays the target video on a screen of the terminal device. The decoding process may be executed on a CPU of the terminal device, may be executed on the GPU, and may also be executed on other decoding hardware, for example, a decoding chip within the GPU or a dedicated decoding chip independent of the GPU. After a video frame is obtained by decoding, the video frame can be read by a CPU or a GPU on the terminal equipment side, rendered and displayed on an interface.

In the embodiment of the application, a video processing method is provided in which video frames are generated by frame interpolation on both the server side and the terminal device side. In the above manner, the capabilities of the server and the terminal device can be combined: the server only needs to render a part of the video frames and then performs frame interpolation based on the rendered video frames to obtain the video sequence to be encoded, while the terminal device generates the target video sequence by frame interpolation and renders the target video. Since the resources consumed by frame interpolation are fewer than the resources required for rendering, processing resources on the server side are saved and processor overhead is reduced, which is beneficial to improving the service processing capability of the server; at the same time, the performance requirements on the terminal device are reduced, so that the capabilities of the server and the terminal device are effectively balanced. The whole process also saves transmission bandwidth for the client.
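Putting the two sides together, the combined pipeline of steps 301 to 307 can be sketched as follows; encode, decode, render, and the frame-synthesis routines are placeholders, and the interpolated-frame ordering is shown (the extrapolated-frame variant appends the synthesized frame after the pair, as sketched earlier):

```python
def server_side(rendered, synthesize, encode):
    """Server: render P frames, insert Q synthesized frames, encode P + Q frames."""
    first_target = [rendered[0]]
    for prev, nxt in zip(rendered, rendered[1:]):
        first_target.append(synthesize(prev, nxt))  # interpolated or extrapolated frame
        first_target.append(nxt)
    return encode(first_target)  # the video coding sequence sent over the network

def terminal_side(bitstream, decode, synthesize, render):
    """Terminal: decode P + Q frames, insert Y more, render P + Q + Y frames."""
    to_render = decode(bitstream)  # the video sequence to be rendered
    second_target = [to_render[0]]
    for prev, nxt in zip(to_render, to_render[1:]):
        second_target.append(synthesize(prev, nxt))
        second_target.append(nxt)
    for frame in second_target:
        render(frame)  # displayed on the terminal interface
```

Each frame-synthesis pass roughly doubles the frame rate, so a server that renders only 15 frames can still let the terminal display about 60 frames after both passes.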

Based on the video processing method described in fig. 14, four frame interpolation processing methods will be described below.

In the first mode, both the server and the terminal device use interpolated frames. Referring to fig. 15, fig. 15 is a schematic diagram of an embodiment of implementing interpolation frame processing based on the video processing system according to the embodiment of the present application. As shown in the figure, taking a cloud game scene as an example, the cloud game server captures the rendered n-th game picture and (n + 1)-th game picture, and performs interpolation frame processing on them to obtain an additional game picture, namely the (n + 0.5)-th game picture. Similarly, the cloud game server continues to acquire the (n + 2)-th game picture, and performs interpolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain the (n + 1.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 0.5)-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, and so on form the first target video sequence, which is then encoded to obtain the video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, and the terminal device decodes it to obtain a video sequence to be rendered. The terminal device then performs interpolation frame processing on the n-th and (n + 0.5)-th game pictures to obtain an additional game picture, namely the (n + 0.25)-th game picture. Similarly, the terminal device continues to acquire the (n + 1)-th game picture, and performs interpolation frame processing on the (n + 0.5)-th and (n + 1)-th game pictures to obtain the (n + 0.75)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 0.25)-th frame, the (n + 0.5)-th frame, the (n + 0.75)-th frame, the (n + 1)-th frame, and so on together form the second target video sequence, which is rendered and displayed on the interface of the terminal device.

In the second mode, both the server and the terminal device use extrapolated frames. Referring to fig. 16, fig. 16 is a schematic diagram of an embodiment of implementing extrapolation frame processing based on the video processing system according to the embodiment of the present application. As shown in the figure, taking a cloud game scene as an example, the cloud game server captures the rendered n-th game picture and (n + 1)-th game picture, and performs extrapolation frame processing on them to obtain an additional game picture, namely the (n + 1.5)-th game picture. Similarly, the cloud game server continues to acquire the (n + 2)-th game picture, and performs extrapolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain the (n + 2.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, the (n + 2.5)-th frame, and so on form the first target video sequence, which is then encoded to obtain the video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, and the terminal device decodes it to obtain a video sequence to be rendered. The terminal device then performs extrapolation frame processing on the (n + 1)-th and (n + 1.5)-th game pictures to obtain an additional game picture, namely the (n + 1.75)-th game picture. Similarly, the terminal device continues to acquire the (n + 2)-th game picture, and performs extrapolation frame processing on the (n + 1.5)-th and (n + 2)-th game pictures to obtain the (n + 2.25)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 1.75)-th frame, the (n + 2)-th frame, the (n + 2.25)-th frame, the (n + 2.5)-th frame, and so on together form the second target video sequence, which is rendered and displayed on the interface of the terminal device.

In the third mode, the server uses interpolated frames and the terminal device uses extrapolated frames. Referring to fig. 17, fig. 17 is a schematic diagram of an embodiment of implementing interpolation and extrapolation frame processing based on the video processing system in the embodiment of the present application. Taking a cloud game scene as an example, the cloud game server captures the rendered n-th game picture and (n + 1)-th game picture, and performs interpolation frame processing on them to obtain an additional game picture, namely the (n + 0.5)-th game picture. Similarly, the cloud game server continues to acquire the (n + 2)-th game picture, and performs interpolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain the (n + 1.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 0.5)-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, and so on form the first target video sequence, which is then encoded to obtain the video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, and the terminal device decodes it to obtain a video sequence to be rendered. The terminal device then performs extrapolation frame processing on the n-th and (n + 0.5)-th game pictures to obtain an additional game picture, namely the (n + 0.75)-th game picture. Similarly, the terminal device continues to acquire the (n + 1)-th game picture, and performs extrapolation frame processing on the (n + 0.5)-th and (n + 1)-th game pictures to obtain the (n + 1.25)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 0.5)-th frame, the (n + 0.75)-th frame, the (n + 1)-th frame, the (n + 1.25)-th frame, the (n + 1.5)-th frame, the (n + 1.75)-th frame, the (n + 2)-th frame, and so on form the second target video sequence, which is rendered and displayed on the interface of the terminal device.

In the fourth mode, the server uses extrapolated frames and the terminal device uses interpolated frames. Referring to fig. 18, fig. 18 is a schematic diagram of another embodiment of implementing interpolation and extrapolation frame processing based on the video processing system in the embodiment of the present application. As shown in the figure, taking a cloud game scene as an example, the cloud game server captures the rendered n-th game picture and (n + 1)-th game picture, and performs extrapolation frame processing on them to obtain an additional game picture, namely the (n + 1.5)-th game picture. Similarly, the cloud game server continues to acquire the (n + 2)-th game picture, and performs extrapolation frame processing on the (n + 1)-th and (n + 2)-th game pictures to obtain the (n + 2.5)-th game picture. By analogy, the game pictures of the n-th frame, the (n + 1)-th frame, the (n + 1.5)-th frame, the (n + 2)-th frame, the (n + 2.5)-th frame, and so on form the first target video sequence, which is then encoded to obtain the video coding sequence. The cloud game server transmits the video coding sequence to the terminal device through the network, and the terminal device decodes it to obtain a video sequence to be rendered. The terminal device then performs interpolation frame processing on the n-th and (n + 1)-th game pictures to obtain an additional game picture, namely the (n + 0.5)-th game picture. Similarly, the terminal device continues to acquire the (n + 1.5)-th game picture, and performs interpolation frame processing on the (n + 1)-th and (n + 1.5)-th game pictures to obtain the (n + 1.25)-th game picture. In this way, the game pictures of the n-th frame, the (n + 0.5)-th frame, the (n + 1)-th frame, the (n + 1.25)-th frame, the (n + 1.5)-th frame, the (n + 1.75)-th frame, the (n + 2)-th frame, the (n + 2.25)-th frame, the (n + 2.5)-th frame, and so on together form the second target video sequence, which is rendered and displayed on the interface.
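The frame numbers appearing in figs. 15 to 18 all follow from two half-step rules, one for interpolation and one for extrapolation. The following sketch (illustrative, assuming uniform frame timing, with n = 0 for brevity) reproduces the four combinations:

```python
def interp_numbers(nums):
    """Interpolated frames: insert the midpoint between every adjacent pair."""
    out = [nums[0]]
    for a, b in zip(nums, nums[1:]):
        out += [(a + b) / 2, b]
    return out

def extrap_numbers(nums):
    """Extrapolated frames: append the next half-step after every adjacent pair."""
    out = [nums[0]]
    for a, b in zip(nums, nums[1:]):
        out += [b, b + (b - a) / 2]
    return out

rendered = [0, 1, 2]  # rendered frame numbers n, n+1, n+2
modes = {
    "fig. 15 server interp + terminal interp": interp_numbers(interp_numbers(rendered)),
    "fig. 16 server extrap + terminal extrap": extrap_numbers(extrap_numbers(rendered)),
    "fig. 17 server interp + terminal extrap": extrap_numbers(interp_numbers(rendered)),
    "fig. 18 server extrap + terminal interp": interp_numbers(extrap_numbers(rendered)),
}
for name, seq in modes.items():
    print(name, sorted(set(seq)))  # e.g. fig. 15 -> [0, 0.25, 0.5, 0.75, 1, ...]
```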

Referring to fig. 19, fig. 19 is a schematic view of an embodiment of a video processing apparatus according to an embodiment of the present application, and the video processing apparatus 40 includes:

an obtaining module 401, configured to obtain an original video sequence, where the original video sequence includes P video frames obtained after rendering, and P is an integer greater than or equal to 2;

the obtaining module 401 is further configured to obtain a target video sequence according to the original video sequence, where the target video sequence includes P video frames obtained after rendering and Q video frames obtained after frame interpolation, and Q is an integer greater than or equal to 1;

the encoding module 402 is configured to perform encoding processing on a target video sequence to obtain a video encoding sequence;

a sending module 403, configured to send the video coding sequence to the terminal device, so that the terminal device performs decoding processing on the video coding sequence to obtain a video sequence to be rendered.

Alternatively, on the basis of the embodiment corresponding to fig. 19, in another embodiment of the video processing apparatus 40 provided in the embodiment of the present application,

an obtaining module 401, configured to obtain a first rendered video frame and a second rendered video frame from an original video sequence, where the first rendered video frame is an image of a previous frame adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are both video frames obtained after rendering;

performing frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first rendered video frame, and the target video frame is a previous frame image adjacent to the second rendered video frame;

and generating a first video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the first video subsequence is composed of the first rendered video frame, the target video frame and the second rendered video frame in sequence.

Alternatively, on the basis of the embodiment corresponding to fig. 19, in another embodiment of the video processing apparatus 40 provided in the embodiment of the present application,

an obtaining module 401, specifically configured to obtain a first frame number corresponding to a first rendered video frame;

acquiring a second frame number corresponding to a second rendered video frame;

calculating the average value of the first frame number and the second frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.
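A minimal sketch of this interpolation case follows, assuming two equal-length frames represented as flat pixel lists and using a naive pixel blend in place of the interpolation frame prediction model (which in practice is a learned model, not an average):

    # Hypothetical sketch: target frame number and K pieces of pixel information.
    def interpolated_frame_number(first_no, second_no):
        # Average of the two rendered frame numbers, e.g. 4 and 5 -> 4.5.
        return (first_no + second_no) / 2.0

    def predict_pixels(first_frame, second_frame, k):
        # Naive stand-in for the interpolation frame prediction model: blend
        # co-located pixels and keep the first K values.
        blended = [(p1 + p2) / 2.0 for p1, p2 in zip(first_frame, second_frame)]
        return blended[:k]

    target_no = interpolated_frame_number(4, 5)                 # -> 4.5
    pixels = predict_pixels([10, 20, 30], [30, 40, 50], k=3)    # -> [20.0, 30.0, 40.0]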

Alternatively, on the basis of the embodiment corresponding to fig. 19, in another embodiment of the video processing apparatus 40 provided in the embodiment of the present application,

an obtaining module 401, configured to obtain a first rendered video frame and a second rendered video frame from an original video sequence, where the first rendered video frame is an image of a previous frame adjacent to the second rendered video frame, and the first rendered video frame and the second rendered video frame are both video frames obtained after rendering;

performing frame interpolation processing on the first rendered video frame and the second rendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second rendered video frame;

and generating a second video subsequence in the target video sequence according to the first rendered video frame, the target video frame and the second rendered video frame, wherein the second video subsequence consists of the first rendered video frame, the second rendered video frame and the target video frame in sequence.

Alternatively, on the basis of the embodiment corresponding to fig. 19, in another embodiment of the video processing apparatus 40 provided in the embodiment of the present application,

an obtaining module 401, specifically configured to obtain a second frame number corresponding to a second rendered video frame;

determining a next adjacent frame number of the second frame number as a third frame number;

calculating the average value of the second frame number and the third frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.
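For this extrapolation case the frame-number arithmetic differs only in how the second operand is chosen; a short sketch, again with hypothetical names:

    # Hypothetical sketch: the third frame number is the next adjacent number
    # after the second, so the target sits half an interval past the second.
    def extrapolated_frame_number(second_no):
        third_no = second_no + 1
        return (second_no + third_no) / 2.0

    assert extrapolated_frame_number(2) == 2.5   # (n+2) -> (n+2.5)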

Referring to fig. 20, fig. 20 is a schematic view of an embodiment of a video processing apparatus according to an embodiment of the present application, in which the video processing apparatus 50 includes:

a receiving module 501, configured to receive a video coding sequence sent by a server;

a decoding module 502, configured to decode a video coding sequence to obtain a video sequence to be rendered, where the video sequence to be rendered includes X video frames that are not rendered, and X is an integer greater than or equal to 2;

an obtaining module 503, configured to obtain a target video sequence according to a video sequence to be rendered, where the target video sequence includes X video frames that are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1;

and a rendering module 504, configured to perform rendering processing on the target video sequence to obtain a target video.
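Mirroring the server-side sketch given earlier, the four terminal-side modules could compose as follows; receive, decode, interpolate and render are hypothetical placeholders, and inserting one frame per adjacent pair gives Y = X - 1:

    # Hypothetical composition of modules 501-504 on the terminal side.
    def process_on_terminal(receive, decode, interpolate, render):
        coded = receive()                      # video coding sequence from the server
        to_render = decode(coded)              # X unrendered video frames, X >= 2
        target = [to_render[0]]
        for a, b in zip(to_render, to_render[1:]):
            target += [interpolate(a, b), b]   # Y = X - 1 interpolated frames
        return render(target)                  # target video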

Alternatively, on the basis of the embodiment corresponding to fig. 20, in another embodiment of the video processing apparatus 50 provided in the embodiment of the present application,

an obtaining module 503, specifically configured to obtain a first unrendered video frame and a second unrendered video frame from a video sequence to be rendered, where the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames obtained without rendering;

performing frame interpolation processing on a first unrendered video frame and a second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the first unrendered video frame, and the target video frame is a previous frame image adjacent to the second unrendered video frame;

and generating a first video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the first video subsequence is composed of the first unrendered video frame, the target video frame and the second unrendered video frame in sequence.

Alternatively, on the basis of the embodiment corresponding to fig. 20, in another embodiment of the video processing apparatus 50 provided in the embodiment of the present application,

an obtaining module 503, specifically configured to obtain a first frame number corresponding to a first unrendered video frame;

acquiring a second frame number corresponding to a second unrendered video frame;

calculating the average value of the first frame number and the second frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

Alternatively, on the basis of the embodiment corresponding to fig. 20, in another embodiment of the video processing apparatus 50 provided in the embodiment of the present application,

an obtaining module 503, specifically configured to obtain a first unrendered video frame and a second unrendered video frame from a video sequence to be rendered, where the first unrendered video frame is a previous frame image adjacent to the second unrendered video frame, and the first unrendered video frame and the second unrendered video frame are both video frames obtained without rendering;

performing frame interpolation processing on the first unrendered video frame and the second unrendered video frame to obtain a target video frame, wherein the target video frame is a next frame image adjacent to the second unrendered video frame;

and generating a second video subsequence in the target video sequence according to the first unrendered video frame, the target video frame and the second unrendered video frame, wherein the second video subsequence consists of the first unrendered video frame, the second unrendered video frame and the target video frame in sequence.

Alternatively, on the basis of the embodiment corresponding to fig. 20, in another embodiment of the video processing apparatus 50 provided in the embodiment of the present application,

an obtaining module 503, specifically configured to obtain a second frame number corresponding to a second unrendered video frame;

determining a next adjacent frame number of the second frame number as a third frame number;

calculating the average value of the second frame number and the third frame number to obtain a target frame number;

based on the target frame number, K pieces of pixel information corresponding to the target video frame are obtained through an interpolation frame prediction model, wherein the K pieces of pixel information are used for determining the target video frame, and K is an integer larger than 1.

Fig. 21 is a schematic diagram of a server 600 according to an embodiment of the present application. The server 600 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 622 (e.g., one or more processors), a memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. The memory 632 and the storage medium 630 may be transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Still further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the server 600, the series of instruction operations stored in the storage medium 630.

The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.

The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 21.

An embodiment of the present application further provides a terminal device, as shown in fig. 22. For convenience of description, only the portions related to the embodiments of the present application are shown; for specific technical details that are not disclosed, please refer to the method portion of the embodiments of the present application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, a television, and an OTT box. The following takes the terminal device being a mobile phone as an example:

Fig. 22 is a block diagram illustrating a partial structure of a mobile phone related to the terminal device provided in an embodiment of the present application. Referring to fig. 22, the mobile phone includes: a radio frequency (RF) circuit 710, a memory 720, an input unit 730, a display unit 740, a sensor 750, an audio circuit 760, a wireless fidelity (WiFi) module 770, a processor 780, and a power supply 790. Those skilled in the art will appreciate that the handset configuration shown in fig. 22 is not limiting: the handset may include more or fewer components than shown, some components may be combined, or the components may be arranged differently.

The following describes each component of the mobile phone in detail with reference to fig. 22:

the RF circuit 710 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, for processing downlink information of a base station after receiving the downlink information to the processor 780; in addition, the data for designing uplink is transmitted to the base station. In general, the RF circuit 710 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 710 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.

The memory 720 may be used to store software programs and modules, and the processor 780 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 720. The memory 720 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book), and the like. Further, the memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The input unit 730 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 730 may include a touch panel 731 and other input devices 732. The touch panel 731, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 731 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 731 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 780, and can receive and execute commands sent by the processor 780. In addition, the touch panel 731 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types. Besides the touch panel 731, the input unit 730 may include other input devices 732. Specifically, the other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.

The display unit 740 may be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone. The display unit 740 may include a display panel 741; optionally, the display panel 741 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 731 may cover the display panel 741; when the touch panel 731 detects a touch operation on or near it, the touch operation is transmitted to the processor 780 to determine the type of touch event, and the processor 780 then provides a corresponding visual output on the display panel 741 according to that type. Although the touch panel 731 and the display panel 741 are shown as two separate components in fig. 22, in some embodiments they may be integrated to implement the input and output functions of the mobile phone.

The handset may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor adjusts the brightness of the display panel 741 according to the brightness of ambient light, and the proximity sensor turns off the display panel 741 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration), vibration-recognition related functions (such as a pedometer and tapping), and the like. Other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured on the mobile phone, and details are not described herein again.

The audio circuit 760, a speaker 761, and a microphone 762 may provide an audio interface between the user and the mobile phone. The audio circuit 760 can transmit an electrical signal, converted from received audio data, to the speaker 761, and the speaker 761 converts the electrical signal into a sound signal for output; on the other hand, the microphone 762 converts a collected sound signal into an electrical signal, the audio circuit 760 receives the electrical signal and converts it into audio data, and the audio data is output to the processor 780 for processing and then sent, for example, to another mobile phone through the RF circuit 710, or output to the memory 720 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 770, the mobile phone can help the user receive and send e-mails, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 22 shows the WiFi module 770, it is understood that the module is not an essential part of the handset and may be omitted as needed without changing the essence of the invention.

The processor 780 is the control center of the mobile phone. It connects the various parts of the entire mobile phone using various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 720 and calling the data stored in the memory 720, thereby monitoring the mobile phone as a whole. Optionally, the processor 780 may include one or more processing units; optionally, the processor 780 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 780.

The handset also includes a power supply 790 (e.g., a battery) for supplying power to the various components. Optionally, the power supply may be logically connected to the processor 780 via a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system.

Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.

The steps performed by the terminal device in the above-described embodiment may be based on the terminal device configuration shown in fig. 22.

Referring to fig. 23, fig. 23 is a schematic structural diagram of a video processing system in an embodiment of the present application, and as shown in the figure, a server 801 acquires an original video sequence, where the original video sequence includes P video frames obtained after rendering, and P is an integer greater than or equal to 2. The server 801 obtains a first target video sequence according to the original video sequence, where the first target video sequence includes P video frames obtained through rendering and Q video frames obtained through frame interpolation, and Q is an integer greater than or equal to 1. The server 801 performs encoding processing on the first target video sequence to obtain a video encoding sequence. The server 801 sends the video coding sequence to the terminal device 802. The terminal device 802 decodes the video coding sequence to obtain a video sequence to be rendered, where the video sequence to be rendered includes (P + Q) video frames that are not rendered. The terminal device 802 obtains a second target video sequence according to the video sequence to be rendered, where the second target video sequence includes (P + Q) video frames that are not rendered and Y video frames obtained after frame interpolation processing, and Y is an integer greater than or equal to 1. The terminal device 802 performs rendering processing on the second target video sequence to obtain a target video.
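The division of labor in fig. 23 can be quantified with a small worked example; the frame counts below are illustrative assumptions, not values mandated by the embodiments, and each side is assumed to insert one frame per adjacent pair:

    # Hypothetical frame budget for the split workload of fig. 23.
    P = 30           # frames rendered by the server per second
    Q = P - 1        # frames inserted by the server
    X = P + Q        # unrendered frames decoded by the terminal
    Y = X - 1        # frames inserted by the terminal
    print(P, Q, X, Y)    # 30 29 59 58: X + Y is roughly four times P

In other words, under these assumptions the server still renders only P frames while the displayed rate approaches 4P, which is the saving in server-side processing resources that the embodiments aim at.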

Further, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the steps performed by the video processing apparatus in the methods described in the foregoing embodiments.

Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the steps performed by the video processing apparatus in the methods described in the foregoing embodiments.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that essentially contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
