Data coding method in real-time video communication and media terminal equipment

Document No.: 1820079. Published: 2021-11-09.

Abstract: "Data coding method in real-time video communication and media terminal equipment" (一种实时视频通信中的数据编码方法及媒体终端设备) was designed by 钱晓炯周银沈伟伟项文 on 2021-08-04. The invention relates to a data encoding method in real-time video communication and to a media terminal device. In the method, an encoder state set is saved in advance. After a real-time video communication session starts, the encoder state numbered i is used for t1 seconds. GTH, W, H, BW, ET and Cur are then acquired, and THmax[Cur] and THbw[Cur] are calculated, followed by the THmax and THbw values of the whole encoder state set, from which the TH value of each encoder state is obtained. GTH is compared with the TH value of each encoder state one by one to update Cur, the encoder state numbered Cur is used for t2 seconds, and these steps are repeated until the session ends. Compared with the prior art, the invention uses a hybrid encoding approach that adapts to frequently changing network conditions and exploits the different strengths of different encoders, obtaining the maximum information transmission benefit at the relatively lowest network or computing resource cost.

1. A method of encoding data in real-time video communication, the method comprising: pre-storing an encoder state set, wherein the objects in the encoder state set are encoder states, the number of encoder states in the set is N, each encoder state is a configuration state of a particular encoder, and each encoder state comprises the following: a number, an encoder mode name, RS, CR, THmax, THbw, TH and Certainty, wherein RS represents the relative speed of the encoder state and is a preset constant; CR represents the compression rate of the encoder state and is a preset constant; THmax represents the maximum video throughput rate of the encoder state, in bps; THbw represents the video throughput rate achievable given the currently available network bandwidth, in bps; TH represents the video throughput rate that the encoder state can achieve, in bps; Certainty is a certainty flag indicating whether the maximum throughput rate THmax of an encoder state has been determined by measurement, and takes one of the values unknown, presume or confirmed, where unknown means not yet known, presume means inferred from RS, and confirmed means determined by actual measurement;

the data in the real-time video communication is then encoded by the following steps:

step 1, after a real-time video communication session starts, the encoder state numbered i is used by default for t1 seconds, and then step 2 is entered;

step 2, obtaining the following parameters: GTH, W, H, BW, ET and Cur;

W and H are the width and height of the encoded image respectively, taken from the largest image currently being encoded;

ET is the encoding duration, namely the average encoding duration over multiple frames at the corresponding resolution, and is a constant;

BW is the currently estimated available network bandwidth, BW = BWE - other_bitrate, where BWE is the currently estimated network bandwidth and other_bitrate is other bandwidth that must be reserved;

Cur is the number of the currently used encoder state, ranges from 1 to N, and has an initial value of i;

step 3, calculating the parameter THmax[Cur] of the current encoder state, and setting the certainty flag Certainty[Cur] to confirmed;

THmax[Cur] is the value of the parameter THmax in the encoder state numbered Cur; Certainty[Cur] is the value of the parameter Certainty in the encoder state numbered Cur;

step 4, calculating the maximum video throughput rate THmax of every encoder state in the encoder state set whose Certainty is not confirmed, and setting the Certainty of each such encoder state to presume;

wherein t ranges from 1 to N, and THmax[t] is the value of the parameter THmax in the encoder state numbered t;

step 5, calculating the THbw values of all encoder states in the encoder state set,

wherein k ranges from 1 to N; THbw[k] is the value of the parameter THbw in the encoder state numbered k; CR[k] is the value of the parameter CR in the encoder state numbered k;

step 6, comparing THmax and THbw in each encoder state in the encoder state set one by one, taking the smaller of the two, and assigning it to TH of the corresponding encoder state;

step 7, comparing the GTH obtained in step 2 with the TH value in each encoder state in the encoder state set one by one:

if GTH is larger than the TH value in every encoder state, finding the encoder state with the largest TH value and assigning its number to Cur;

if at least one encoder state has a TH value larger than GTH, the encoder states whose TH values are larger than GTH form a set called the temporary encoder state set; the encoder state with the highest CR value in the temporary encoder state set is found, and its number is assigned to Cur;

step 8, using the encoder state numbered Cur for t2 seconds, then returning to step 2 and repeating steps 2 to 7 until the real-time video communication session ends.

2. The method of claim 1, wherein the encoder mode name in the encoder state numbered i is h264-veryfast, t1 is 2 to 5 seconds, and t2 is 3 to 7 seconds.

3. The method of claim 1, wherein the encoder states in the set of encoder states comprise:

4. A media terminal device, comprising a media engine module that establishes audio and video channels with the called party and is responsible for sending, receiving, encoding and decoding audio and video media data, characterized in that: an encoder state policy selection module is integrated within the media engine module, and the encoder state policy selection module encodes data in real-time video communication using the data encoding method of claim 1.

Technical Field

The invention relates to a data coding method in real-time video communication and media terminal equipment.

Background

In real-time video calls, audio and video data can be encoded in many formats, including H.263, H.264, H.265, H.266, VP8, VP9 and AV1. Compression efficiency varies greatly between formats (for example, H.263 and H.266 differ by up to a factor of 10). Generally, the higher the compression efficiency, the more complex the algorithm and the more computing power it requires (again comparing H.263 and H.266, the computing power required may differ by a factor of 100).

Even within a single encoding format such as H.264, compression efficiency and required computing power differ several-fold between profiles (encoder modes). Moreover, even with the same profile (encoder mode), encoders such as H264 offer different encoding speed settings, across which compression efficiency differs several-fold and speed by a factor of several tens.

The terminals that run real-time video calls include mobile phones, PCs, and embedded devices such as boxes or watches, whose computing power differs by a factor of several to several hundred.

Because real-time video calling is complex, existing systems usually aim only to guarantee basic interworking and do not consider which combination of encoder settings would be optimal. Specifically, existing schemes are usually based on a static priority configuration: priorities are configured in advance at the terminal (for example, try H.265 first, then H.264), negotiation is then carried out according to a capability negotiation protocol (RFC 4566/RFC 3264), and the two parties agree on the codec type. Slightly more optimized schemes take differences in device performance into account and support asymmetric negotiation, so that low-performance and high-performance devices can each use a different codec format suited to their characteristics. Once negotiation is finished, however, the chosen encoding format stays fixed, so the scheme cannot respond to changes during real-time operation, such as frequency throttling caused by hardware overheating or dynamic changes in network bandwidth. Existing schemes therefore do not arrive at an optimal way of using the known video codec capabilities, and how to choose among the capabilities that the computing power of existing devices can support remains a problem worth optimizing.

Disclosure of Invention

The first technical problem to be solved by the present invention is to provide a data encoding method in real-time video communication that adapts to frequently changing network conditions while exploiting the different strengths of different encoders, thereby obtaining the maximum information transmission benefit at the relatively lowest network or computing resource cost.

A further object of the present invention is to provide a media terminal device that, during a real-time video call, adapts to frequently changing network conditions while exploiting the different strengths of different encoders, thereby obtaining the maximum information transmission benefit at the relatively lowest network or computing resource cost.

The technical solution adopted by the invention to solve the first technical problem is as follows: a method of encoding data in real-time video communication, the method comprising: pre-storing an encoder state set, wherein the objects in the encoder state set are encoder states, the number of encoder states in the set is N, each encoder state is a configuration state of a particular encoder, and each encoder state comprises the following: a number, an encoder mode name, RS, CR, THmax, THbw, TH and Certainty, wherein RS represents the relative speed of the encoder state and is a preset constant; CR represents the compression rate of the encoder state and is a preset constant; THmax represents the maximum video throughput rate of the encoder state, in bps; THbw represents the video throughput rate achievable given the currently available network bandwidth, in bps; TH represents the video throughput rate that the encoder state can achieve, in bps; Certainty is a certainty flag indicating whether the maximum throughput rate THmax of an encoder state has been determined by measurement, and takes one of the values unknown, presume or confirmed, where unknown means not yet known, presume means inferred from RS, and confirmed means determined by actual measurement;

the data in the real-time video communication is then encoded by the following steps:

step 1, after a real-time video communication session starts, the encoder state numbered i is used by default for t1 seconds, and then step 2 is entered;

step 2, obtaining the following parameters: GTH, W, H, BW, ET and Cur;

where GTH is the target video throughput rate, $GTH = \sum_{j=1}^{x} W[j] \cdot H[j] \cdot GFPS \cdot 1.5 \cdot 8$; x is the number of video streams to be transmitted in the real-time video communication; W[j] is the image width of the j-th video stream; H[j] is the image height of the j-th video stream; GFPS is the subscribed target video frame rate; YUV-format video occupies 1.5 bytes per pixel at 8 bits per byte, so the unit of GTH is pixel bits per second, namely bps;

W and H are the width and height of the encoded image respectively, taken from the largest image currently being encoded;

ET is the encoding duration, namely the average encoding duration over multiple frames at the corresponding resolution, and is a constant;

BW is the currently estimated available network bandwidth, BW = BWE - other_bitrate, where BWE is the currently estimated network bandwidth and other_bitrate is other bandwidth that must be reserved;

Cur is the number of the currently used encoder state, ranges from 1 to N, and has an initial value of i;

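step 3, calculating the parameter THmax[Cur] of the current encoder state, and setting the certainty flag Certainty[Cur] to confirmed;

THmax[Cur] is the value of the parameter THmax in the encoder state numbered Cur; THbw[Cur] is the value of the parameter THbw in the encoder state numbered Cur; Certainty[Cur] is the value of the parameter Certainty in the encoder state numbered Cur;

step 4, calculating the maximum video throughput rate THmax[t] of every encoder state in the encoder state set whose Certainty is not confirmed, and setting the Certainty of each such encoder state to presume, wherein t ranges from 1 to N;

step 5, calculating the THbw values of all encoder states in the encoder state set, wherein k ranges from 1 to N, THbw[k] is the value of the parameter THbw in the encoder state numbered k, and CR[k] is the value of the parameter CR in the encoder state numbered k;

step 6, comparing THmax and THbw in each encoder state in the encoder state set one by one, taking the smaller of the two, and assigning it to TH of the corresponding encoder state;

step 7, comparing the GTH obtained in step 2 with the TH value in each encoder state in the encoder state set one by one:

if GTH is larger than the TH value in every encoder state, finding the encoder state with the largest TH value and assigning its number to Cur;

if at least one encoder state has a TH value larger than GTH, the encoder states whose TH values are larger than GTH form a set called the temporary encoder state set; the encoder state with the highest CR value in the temporary encoder state set is found, and its number is assigned to Cur;

step 8, using the encoder state numbered Cur for t2 seconds, then returning to step 2 and repeating steps 2 to 7 until the real-time video communication session ends.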

Preferably, the encoder mode name in the encoder state numbered i is h264-veryfast, t1 is 2 to 5 seconds, and t2 is 3 to 7 seconds.

Preferably, the encoder states in the encoder state set comprise the encoder modes listed in the example table in the Detailed Description.

The technical solution adopted by the invention to solve the further technical problem is as follows: a media terminal device, comprising a media engine module that establishes audio and video channels with the called party and is responsible for sending, receiving, encoding and decoding audio and video media data, characterized in that: an encoder state policy selection module is integrated within the media engine module, and the encoder state policy selection module encodes data in real-time video communication using the data encoding method described above.

Compared with the prior art, the invention has the following advantages. Building on the known video codec capabilities of the available software and hardware, the method uses a hybrid encoding approach driven by the key variables of the actual scenario, in particular the current network state and the encoding and throughput capabilities of the different encoder states. After each interval it re-selects the encoder state with the best processing effect and uses that state to encode the communication data. The method adapts to different terminal environments without needing to monitor CPU usage, which makes it widely applicable. It adapts to frequently changing network conditions and exploits the different strengths of different encoders, obtaining the maximum information transmission benefit at the relatively lowest network or computing resource cost.

Drawings

Fig. 1 is a flowchart of a data encoding method in real-time video communication according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying examples.

As shown in Fig. 1, the data encoding method in real-time video communication first pre-stores an encoder state set, where the objects in the encoder state set are encoder states, the number of encoder states in the set is N, each encoder state is a configuration state of a particular encoder, and each encoder state includes: a number, an encoder mode name, RS, CR, THmax, THbw, TH and Certainty. RS represents the relative speed of the encoder state and is a preset constant; CR represents the compression rate of the encoder state and is a preset constant; THmax represents the maximum video throughput rate of the encoder state, in bps; THmax is unknown at first and can be measured after running for a period of time (for its calculation see step 3); THbw represents the video throughput rate achievable given the currently available network bandwidth, in bps; THbw is unknown at first and can be determined after running for a period of time (for its calculation see step 5); TH represents the video throughput rate that the encoder state can achieve, in bps; TH is unknown at first and can be determined after running for a period of time (for how it is obtained see step 6); Certainty is a certainty flag indicating whether the maximum throughput rate THmax of an encoder state has been determined by measurement, and takes one of the values unknown, presume or confirmed, where unknown means not yet known, presume means inferred from RS, and confirmed means determined by actual measurement;

the encoder mode set content is exemplified as follows:

Number	Encoder mode name	RS	CR	THmax	THbw	TH	Certainty
1	h264-ultrafast	1	165
2	h264-superfast	0.6	276
3	h264-veryfast	0.5	331
4	h264-faster	0.3	368
5	h264-fast	0.2	415
6	h264-medium	0.1	442
7	av1-ultrafast	0.25	737
8	av1-superfast	0.2	829
9	av1-veryfast	0.1	921
10	av1-fast	0.05	950
11	iOS-HW-HEVC	0.9	400

Each encoder state has several attributes: number, encoder mode name, RS, CR, THmax, THbw, TH and Certainty. The encoder states are represented as objects and stored in an array, which is called the "encoder state set";
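As an illustration only, the encoder state set could be held in memory roughly as follows. This is a minimal sketch, not the patent's implementation: the class name `EncoderState`, its field names and the helper `build_encoder_state_set` are hypothetical, while the numbers, mode names, RS and CR values are copied from the example table above. THmax, THbw and TH start out empty and Certainty starts as unknown, as the description states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncoderState:
    """One entry of the encoder state set (field names are illustrative)."""
    number: int                      # 1..N
    mode_name: str                   # e.g. "h264-veryfast"
    rs: float                        # relative speed, preset constant
    cr: float                        # compression rate, preset constant
    th_max: Optional[float] = None   # bps, unknown until measured or inferred
    th_bw: Optional[float] = None    # bps, unknown until bandwidth is known
    th: Optional[float] = None       # bps, unknown until th_max/th_bw are set
    certainty: str = "unknown"       # "unknown" | "presume" | "confirmed"


# (number, mode name, RS, CR) rows taken from the example table.
_EXAMPLE_ROWS = [
    (1, "h264-ultrafast", 1.0, 165),
    (2, "h264-superfast", 0.6, 276),
    (3, "h264-veryfast", 0.5, 331),
    (4, "h264-faster", 0.3, 368),
    (5, "h264-fast", 0.2, 415),
    (6, "h264-medium", 0.1, 442),
    (7, "av1-ultrafast", 0.25, 737),
    (8, "av1-superfast", 0.2, 829),
    (9, "av1-veryfast", 0.1, 921),
    (10, "av1-fast", 0.05, 950),
    (11, "iOS-HW-HEVC", 0.9, 400),
]


def build_encoder_state_set() -> List[EncoderState]:
    """Return a fresh encoder state set with all dynamic fields unset."""
    return [EncoderState(n, name, rs, cr) for n, name, rs, cr in _EXAMPLE_ROWS]


states = build_encoder_state_set()
```

Keeping the preset constants (RS, CR) apart from the fields that are filled in at run time makes it easy to rebuild a clean set at the start of each session.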

the data in the real-time video communication is then encoded by the following steps:

step 1, after a real-time video communication session starts, the encoder state numbered i is used by default for t1 seconds, and then step 2 is entered; in this embodiment, i is 3, corresponding to the encoder state whose encoder mode name is h264-veryfast, and t1 is 2 to 5 seconds;

step 2, obtaining the following parameters: GTH, W, H, BW, ET and Cur;

where GTH is the target video throughput rate, $GTH = \sum_{j=1}^{x} W[j] \cdot H[j] \cdot GFPS \cdot 1.5 \cdot 8$; x is the number of video streams to be transmitted in the real-time video communication; W[j] is the image width of the j-th video stream; H[j] is the image height of the j-th video stream; GFPS is the subscribed target video frame rate; YUV-format video occupies 1.5 bytes per pixel at 8 bits per byte, so the unit of GTH is pixel bits per second, namely bps;

W and H are the width and height of the encoded image respectively, taken from the largest image currently being encoded;

ET is the encoding duration, namely the average encoding duration over multiple frames at the corresponding resolution, and is a constant;

BW is the currently estimated available network bandwidth, BW = BWE - other_bitrate, where BWE is the currently estimated network bandwidth and other_bitrate is other bandwidth that must be reserved;

Cur is the number of the currently used encoder state, ranges from 1 to N, and has an initial value of i;

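step 3, calculating the parameter THmax[Cur] of the current encoder state, and setting the certainty flag Certainty[Cur] to confirmed;

step 4, calculating the maximum video throughput rate THmax[t] of every encoder state in the encoder state set whose Certainty is not confirmed, and setting the Certainty of each such encoder state to presume, wherein t ranges from 1 to N;

step 5, calculating the THbw values of all encoder states in the encoder state set, wherein k ranges from 1 to N;

step 6, comparing THmax and THbw in each encoder state in the encoder state set one by one, taking the smaller of the two, and assigning it to TH of the corresponding encoder state;

step 7, comparing the GTH obtained in step 2 with the TH value in each encoder state in the encoder state set one by one:

if GTH is larger than the TH value in every encoder state, finding the encoder state with the largest TH value and assigning its number to Cur;

if at least one encoder state has a TH value larger than GTH, the encoder states whose TH values are larger than GTH form a set called the temporary encoder state set; the encoder state with the highest CR value in the temporary encoder state set is found, and its number is assigned to Cur;

step 8, using the encoder state numbered Cur for t2 seconds, then returning to step 2 and repeating steps 2 to 7 until the real-time video communication session ends; t2 is 3 to 7 seconds, preferably 5 seconds.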
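One selection round (steps 2 to 7, as set out in claim 1) might be sketched as below. The formulas for steps 3 to 5 are not reproduced in this text, so the sketch substitutes explicit assumptions: THmax[Cur] is taken as the raw bits of one W × H frame divided by ET in seconds, THmax[t] is scaled from THmax[Cur] by RS[t]/RS[Cur], and THbw[k] is taken as BW × CR[k]. Only step 6 (TH is the smaller of THmax and THbw) and step 7 (the comparison against GTH) come directly from the text. All function and field names are hypothetical, and the case TH = GTH, which the text does not address, is treated here as not exceeding GTH.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

BITS_PER_PIXEL = 1.5 * 8  # YUV: 1.5 bytes per pixel, 8 bits per byte


@dataclass
class EncoderState:
    number: int
    rs: float                       # relative speed (preset)
    cr: float                       # compression rate (preset)
    th_max: Optional[float] = None
    th_bw: Optional[float] = None
    th: Optional[float] = None
    certainty: str = "unknown"


def target_throughput(streams: Sequence[Tuple[int, int]], gfps: float) -> float:
    """GTH in bps: sum of W[j] * H[j] over all streams, times GFPS * 1.5 * 8."""
    return sum(w * h for w, h in streams) * gfps * BITS_PER_PIXEL


def select_state(states: List[EncoderState], cur: int, gth: float,
                 w: int, h: int, et_seconds: float, bw_bps: float) -> int:
    """Run steps 3 to 7 once and return the new value of Cur."""
    current = next(s for s in states if s.number == cur)

    # Step 3 (ASSUMED formula): raw bits of one frame per encoding duration.
    current.th_max = w * h * BITS_PER_PIXEL / et_seconds
    current.certainty = "confirmed"

    for s in states:
        # Step 4 (ASSUMED formula): infer THmax from relative speed RS.
        if s.certainty != "confirmed":
            s.th_max = current.th_max * s.rs / current.rs
            s.certainty = "presume"
        # Step 5 (ASSUMED formula): raw throughput the bandwidth can carry.
        s.th_bw = bw_bps * s.cr
        # Step 6 (from the text): TH is the smaller of THmax and THbw.
        s.th = min(s.th_max, s.th_bw)

    # Step 7 (from the text): prefer the highest CR among states exceeding GTH,
    # otherwise fall back to the state with the largest TH.
    temporary = [s for s in states if s.th > gth]
    if temporary:
        return max(temporary, key=lambda s: s.cr).number
    return max(states, key=lambda s: s.th).number


# Three rows from the example table; ET and BW values are made up for the demo.
states = [EncoderState(1, 1.0, 165), EncoderState(3, 0.5, 331),
          EncoderState(6, 0.1, 442)]
gth = target_throughput([(1280, 720)], gfps=30)
chosen = select_state(states, cur=3, gth=gth, w=1280, h=720,
                      et_seconds=0.010, bw_bps=2_000_000)
```

With these made-up inputs only state 3 has a TH above GTH, so Cur stays at 3. If the bandwidth fell far enough that no state reached GTH, the fallback branch would pick whichever state has the largest TH.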

The embodiment of the invention also provides a media terminal device, which comprises a media engine module that establishes audio and video channels with the called party and is responsible for sending, receiving, encoding and decoding audio and video media data. An encoder state policy selection module is integrated within the media engine module, and the encoder state policy selection module uses the data encoding method described above to encode the data in real-time video communication.
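How such a policy selection module drives the encoder over a session (steps 1 and 8) might look like the following sketch. The class `EncoderStatePolicySelector` and its callbacks `choose_next`, `apply_state` and `session_active` are hypothetical names, not part of the patent; the defaults use i = 3 and t2 = 5 seconds from the embodiment, and t1 = 3 seconds is an arbitrary pick from the stated 2 to 5 second range. The sleep function is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, List


class EncoderStatePolicySelector:
    """Hypothetical policy module that re-selects the encoder state periodically."""

    def __init__(self,
                 choose_next: Callable[[int], int],   # runs steps 2 to 7
                 apply_state: Callable[[int], None],  # reconfigures the encoder
                 initial_state: int = 3,              # i = 3 (h264-veryfast)
                 t1: float = 3.0,                     # seconds, within 2 to 5
                 t2: float = 5.0,                     # seconds, preferred value
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.choose_next = choose_next
        self.apply_state = apply_state
        self.initial_state = initial_state
        self.t1 = t1
        self.t2 = t2
        self.sleep = sleep

    def run(self, session_active: Callable[[], bool]) -> int:
        """Drive the encoder until the session ends; return the last Cur."""
        cur = self.initial_state
        self.apply_state(cur)          # step 1: start with state i
        self.sleep(self.t1)            # ... and run it for t1 seconds
        while session_active():
            cur = self.choose_next(cur)  # steps 2 to 7: pick the next state
            self.apply_state(cur)
            self.sleep(self.t2)          # step 8: run it for t2 seconds
        return cur


# Demo with stand-ins: a fake clock and a trivial "next state" rule.
applied: List[int] = []
waits: List[float] = []
remaining = [True, True, False]        # session lasts two selection rounds
selector = EncoderStatePolicySelector(
    choose_next=lambda cur: cur + 1,   # stand-in for the real selection logic
    apply_state=applied.append,
    sleep=waits.append,
)
final = selector.run(session_active=lambda: remaining.pop(0))
```

Passing the selection logic in as a callback keeps the timing loop independent of how THmax, THbw and TH are computed, so the two can be tested separately.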
