Method for determining video delay in real-time video

Document No.: 172949 | Published: 2021-10-29

Abstract: The present technology, "Method for determining video delay in real-time video" (一种实时视频中视频延时确定方法), was created by 胡一凡, 张宇, 殷力, 李晓聪, and 何凯 on 2021-05-17. A method for determining video delay in real-time video comprises the following steps: a user creates a chat room and sets up a scenario in which the user chats with himself or herself; a sending module of a first video terminal captures a video picture carrying time information, packetizes and sends it, and writes the system time at the moment of sending into the padding field of the encapsulation protocol; a receiving module of a second video terminal receives the audio/video packet, records the system time at reception, decapsulates the packet, reads the content of the padding field, and derives the network transmission delay; the audio/video packet received by the receiving module is decoded, rendered, and displayed on the screen, and a screen acquisition module periodically captures the screen to derive the receiving terminal delay. Because the sending module and the receiving module read the system time of the same system, clock error between systems is avoided; the network transmission delay and the receiving terminal delay are strictly separated, and no other stage that could affect the result is introduced, so both delays are determined with high accuracy.

1. A method for determining video delay in real-time video, characterized by comprising the following steps:

S100, a user creates a chat room and sets up a scenario in which the user chats with himself or herself;

S200, a sending module of a first video terminal captures a video picture carrying time information, packetizes and sends it, and writes the system time at the moment of sending into the padding field of the encapsulation protocol;

S300, a receiving module of a second video terminal receives the audio/video packet, records the system time at reception, decapsulates the packet, reads the content of the padding field, and derives the network transmission delay;

S400, the audio/video packet received by the receiving module is decoded, rendered, and displayed on the screen, and a screen acquisition module periodically captures the screen to derive the receiving terminal delay.

2. The method according to claim 1, wherein in S200 the sending module of the video terminal encapsulates the video data using the RTP protocol, and in each generated RTP packet the padding area is filled with the current system time and the padding flag is set to 1.

3. The method according to claim 1, wherein in S200, when the video data is too large to send in a single packet, it is split across multiple packets; the padding area of every packet is filled with the same system time, and the padding flag of every packet is set to indicate that padding data is present.

4. The method for determining video delay in real-time video according to claim 1, wherein in S300 the network transmission delay is obtained as follows:

S301, the sending module of the first video terminal sends a video packet to a video server, and the video server forwards the received video packet to the receiving module of the video terminal;

S302, the receiving module of the second video terminal receives the video packet, records the current system time, decapsulates the video packet, reads the data value of the padding area, and passes the decapsulated video data to the player for decoding, rendering, and display;

S303, the receiving module determines the round-trip delay of the current frame of video data, from the video terminal to the video server and back, as the difference between the current system time and the data value of the padding area.

5. The method for determining video delay in real-time video according to claim 1, wherein in S400 the receiving terminal delay is obtained as follows:

S401, a screen acquisition module captures the screen of the video terminal at a fixed period and saves each capture as an image;

S402, any one of the images saved in S401 is inspected; the video picture shown in a display frame of the screen image carries time information, and its time value is read;

S403, the time information carried by the to-be-sent video frame in the image read in S402 is taken, the subsequent images are read in order, and a target image is searched for in which the video picture received by the receiving module carries that same time information; the time information carried by the to-be-sent video frame in the target image is then read;

S404, the total delay of the current frame across network transmission and the receiving terminal is determined from the time value read in S402 and the time information of the to-be-sent video frame in the target image from S403, and the network transmission delay from S300 is subtracted to obtain the receiving terminal delay.

6. The method according to claim 1, wherein the sending module of the first video terminal and the receiving module of the second video terminal operate in the same system, so as to ensure that the sending time and the receiving time of the video packets are collected in the same system.

7. The method according to claim 5, wherein the fixed period at which the screen acquisition module captures the screen of the video terminal is tied to the video frame rate: the period T, in milliseconds, is the reciprocal of the frame rate fps, that is, T = 1000 / fps.

8. The method according to claim 5, wherein, during determination of the receiving terminal delay, if loss of a video data packet means that no target image matching the time value read in S402 can be found in S403, the current image is discarded, S402 is repeated with a different image, and S403 to S404 are executed to determine the receiving terminal delay.

Technical Field

The invention relates to the field of video streaming media, in particular to a method for determining video delay in a real-time video.

Background

A real-time video communication system consists mainly of a video sending end, a video server, and a video receiving end. It is an IP-network-based system for interacting through images, sound, and text, and it makes real-time audio/video communication possible between two or more people anywhere in the world. Such systems have a wide range of applications and take many forms, including video conferencing, video group chat, interaction with online streamers, and live-stream co-hosting. When using a real-time video communication system, especially in multi-person chats or live-stream co-hosting, the delay between the communicating parties is clearly perceptible; it is introduced by the network, the video server, and the video receiving end. In engineering practice, most effort has gone into determining the delay of the video server, while the delay introduced by network transmission and by buffering, decoding, and rendering at the receiving terminal has been neglected.

Chinese patent publication No. CN110519127A describes a method for determining network node delay, implemented as follows: "the local node can calculate the network delay between itself and the node under test from the time at which the node under test receives the detection data packet and the time at which the local node sends the detection data packet, that is, the time at which the detection data packet is written to the detection port". That method collects the sending time at the local node and the reception time at the node under test; the two times do not come from the same system clock, and the system times of different systems may differ, which can make the calculated network delay imprecise.

Disclosure of Invention

In view of the above, the present invention has been made to provide a method for determining video latency in real-time video that overcomes or at least partially solves the above mentioned problems.

In order to solve the technical problem, the embodiment of the application discloses the following technical scheme:

A method for determining video delay in real-time video comprises the following steps:

S100, a user creates a chat room and sets up a scenario in which the user chats with himself or herself;

S200, a sending module of a first video terminal captures a video picture carrying time information, packetizes and sends it, and writes the system time at the moment of sending into the padding field of the encapsulation protocol;

S300, a receiving module of a second video terminal receives the audio/video packet, records the system time at reception, decapsulates the packet, reads the content of the padding field, and derives the network transmission delay;

S400, the audio/video packet received by the receiving module is decoded, rendered, and displayed on the screen, and a screen acquisition module periodically captures the screen to derive the receiving terminal delay.

Further, in S200, the sending module of the video terminal encapsulates the video data using the RTP protocol; in each generated RTP packet, the padding area is filled with the current system time and the padding flag is set to 1.

Further, in S200, when the video data is too large to send in a single packet, it is split across multiple packets; the padding area of every packet is filled with the same system time, and the padding flag of every packet is set to indicate that padding data is present.

Further, in S300, the network transmission delay is obtained as follows:

S301, the sending module of the first video terminal sends a video packet to a video server, and the video server forwards the received video packet to the receiving module of the video terminal;

S302, the receiving module of the second video terminal receives the video packet, records the current system time, decapsulates the video packet, reads the data value of the padding area, and passes the decapsulated video data to the player for decoding, rendering, and display;

S303, the receiving module determines the round-trip delay of the current frame of video data, from the video terminal to the video server and back, as the difference between the current system time and the data value of the padding area.

Further, in S400, the receiving terminal delay is obtained as follows:

S401, a screen acquisition module captures the screen of the video terminal at a fixed period and saves each capture as an image;

S402, any one of the images saved in S401 is inspected; the video picture shown in a display frame of the screen image carries time information, and its time value is read;

S403, the time information carried by the to-be-sent video frame in the image read in S402 is taken, the subsequent images are read in order, and a target image is searched for in which the video picture received by the receiving module carries that same time information; the time information carried by the to-be-sent video frame in the target image is then read;

S404, the total delay of the current frame across network transmission and the receiving terminal is determined from the time value read in S402 and the time information of the to-be-sent video frame in the target image from S403, and the network transmission delay from S300 is subtracted to obtain the receiving terminal delay.

Furthermore, a sending module and a receiving module of the video terminal operate in the same system, and the sending time and the receiving time of the video packet are ensured to be collected in the same system.

Further, the fixed period at which the screen acquisition module captures the screen of the video terminal is tied to the video frame rate: the period T, in milliseconds, is the reciprocal of the frame rate fps, that is, T = 1000 / fps.

Further, during determination of the receiving terminal delay, if loss of a video data packet means that no target image matching the time value read in S402 can be found in S403, the current image is discarded and S402 is repeated with a different image to determine the receiving terminal delay.

The technical solution provided by the embodiments of the invention has at least the following beneficial effects:

The invention provides a method for determining video delay in real-time video. A user creates a chat room and enters the room a second time, setting up, through the video service, a scenario of chatting with his or her own audio and video. The video picture sent by the user's audio/video sending module carries time information. When a media packet is encapsulated and sent, the padding bit of the encapsulation protocol is set to 1, the system time is read, and the system time value is written into the padding field of the packet. When the audio/video receiving module of the same user receives the media packet, it reads the system time at that moment, decapsulates the packet, and reads the padding field value, from which the network delay of the packet can be determined. A screen acquisition module periodically captures the displayed screen, from which the receiving terminal delay is determined. Because the sending module and the receiving module read the system time of the same system, clock error between systems is avoided; the network transmission delay is strictly separated from the receiving terminal's buffering, decoding, and rendering delay, and no other stage that could affect the result is introduced, so the network transmission delay and the receiving terminal delay are determined with high accuracy. This addresses the problem in the prior art that both network transmission and the receiving terminal introduce delay into real-time video.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a flowchart of a method for determining video delay in real-time video according to embodiment 1 of the present invention;

Fig. 2 is a flowchart of the network delay determination method according to embodiment 1 of the present invention;

Fig. 3 is a flowchart of the terminal delay determination method according to embodiment 1 of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

In order to solve the problems in the prior art, embodiments of the present invention provide a method for determining a video delay in a real-time video.

Example 1

The embodiment discloses a method for determining video delay in a real-time video, as shown in fig. 1, including:

S100, a user creates a chat room and sets up a scenario in which the user chats with himself or herself. For real-time audio/video services, this embodiment supports a scenario in which a user chats with his or her own audio and video inside a chat room. For example: the user accesses the audio/video server from a PC, applies to create and join a chat room, and specifies a room size of 2, the room number 001, a login password, and so on. The user then accesses the same audio/video server again from the same PC and applies to enter chat room 001; at this point one access page shows two video pictures. Each time the user enters the chat room, one pair of sending and receiving modules is created. To keep the uplink and downlink of the video data unobstructed, the video resolution can be set to 640x480; current 4G networks, wired networks, and Wi-Fi can comfortably support sending and receiving two 640x480 video streams at the same time.

S200, a sending module of the video terminal captures a video picture carrying time information, packetizes and sends it, and writes the system time at the moment of sending into the padding field of the encapsulation protocol. In S200 of this embodiment, when the video data is too large to send in a single packet, it is split across multiple packets; the padding area of every packet is filled with the same system time, and the padding flag of every packet is set to indicate that padding data is present.

In some preferred embodiments, the sending module of the video terminal encapsulates the video data using the RTP protocol; in each generated RTP packet, the padding area is filled with the current system time and the padding flag is set to 1. If the video data being sent is an I frame, its data volume is large and it must be encapsulated into several RTP packets; the padding area of each of those RTP packets is filled with the same system time, and the padding flag of each is set to 1.
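The packetization in S200 can be illustrated with a minimal sketch. It is not the patent's implementation: the 12-byte padding layout (an 8-byte microsecond timestamp, three zero bytes, and a final count byte as RTP padding requires), the naive fixed-size payload split, the payload type 96, and the function names `build_rtp_packet` and `packetize_frame` are all assumptions made for illustration.

```python
import struct
import time
from typing import List

RTP_VERSION = 2
TIME_BYTES = 8   # assumed: send time stored as an unsigned 64-bit microsecond count
PAD_LEN = 12     # assumed layout: 8 time bytes + 3 zero bytes + 1 count byte


def build_rtp_packet(payload: bytes, seq: int, rtp_ts: int, ssrc: int,
                     send_time_us: int, payload_type: int = 96) -> bytes:
    """Build one RTP packet whose padding carries the sender's system time."""
    byte0 = (RTP_VERSION << 6) | (1 << 5)   # V=2, P=1 (padding flag), X=0, CC=0
    byte1 = payload_type & 0x7F             # M=0, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         rtp_ts & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # RTP padding ends with a count of padding bytes, including the count itself.
    padding = (struct.pack("!Q", send_time_us)
               + b"\x00" * (PAD_LEN - TIME_BYTES - 1)
               + bytes([PAD_LEN]))
    return header + payload + padding


def packetize_frame(frame: bytes, first_seq: int, rtp_ts: int, ssrc: int,
                    max_payload: int = 1200) -> List[bytes]:
    """Split one frame into RTP packets that all carry the same send time."""
    send_time_us = time.time_ns() // 1000   # read the system time once per frame
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)] or [b""]
    return [build_rtp_packet(chunk, first_seq + i, rtp_ts, ssrc, send_time_us)
            for i, chunk in enumerate(chunks)]
```

Reading the clock once per frame, rather than once per packet, is what gives every fragment of a large I frame the same system time. A real sender would use the fragmentation rules of its video payload format instead of a fixed-size split.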

S300, a receiving module of the video terminal receives the audio/video packet, records the system time at reception, decapsulates the packet, reads the content of the padding field, and derives the network transmission delay.

In S300 of this embodiment, as shown in Fig. 2, the network transmission delay is obtained as follows:

S301, the sending module of the video terminal sends a video packet to a video server, and the video server forwards the received video packet to the receiving module of the video terminal. For example: the chat room has only two video participants, that is, one-to-one video communication. In this case the video server acts purely as a forwarder: it receives the RTP packet sent by user A and sends it directly to user B without any processing.

S302, the receiving module of the video terminal receives the video packet, records the current system time, decapsulates the video packet, reads the data value of the padding area, and passes the decapsulated video data to the player for decoding, rendering, and display. For example: continuing the example in S301, the receiving module of user B (the user's second login) receives the video packet of the 30th frame and records the current system time as t2. It decapsulates the video packet, reads the padding-area value t1, and passes the decapsulated video data to the player for display. If the frame arrives split across several packets, the receiving module passes the video data to the player only after all of those packets have been received.

S303, the receiving module determines the round-trip delay of the current frame of video data, from the video terminal to the video server and back, as the difference between the current system time and the data value of the padding area.

For example: continuing the example in S302, the receiving module of user B (the user's second login) has recorded the reception time of the 30th frame of video data as t2, and the sending module of user A (the user's first login) sent the 30th frame at time t1. The difference t3 = t2 - t1 is the network transmission delay of the 30th frame from user A to the video server and on to user B; half of t3 is the user's uplink or downlink network transmission delay for the current frame.
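The calculation in S302 and S303 can be sketched as follows. The sketch assumes a hypothetical padding layout in which the first 8 padding bytes hold the sender's system time in microseconds as a big-endian unsigned integer, and the last padding byte holds the padding length as RTP requires; the function names are illustrative and not taken from the patent.

```python
import struct
from typing import Tuple

RTP_HEADER_LEN = 12  # fixed RTP header with no CSRC entries or extension


def read_send_time_us(packet: bytes) -> int:
    """Return t1, the send time stored in the RTP padding (assumed layout)."""
    if not (packet[0] >> 5) & 1:
        raise ValueError("padding flag is not set")
    pad_len = packet[-1]  # last byte counts the padding bytes, itself included
    if pad_len < 9 or pad_len > len(packet) - RTP_HEADER_LEN:
        raise ValueError("padding cannot hold an 8-byte timestamp")
    padding = packet[-pad_len:]
    (t1,) = struct.unpack("!Q", padding[:8])
    return t1


def network_delay_us(packet: bytes, recv_time_us: int) -> Tuple[int, float]:
    """Return (t3, t3 / 2): the sender-to-server-to-receiver delay and its half."""
    t1 = read_send_time_us(packet)
    t3 = recv_time_us - t1  # t2 - t1; both values come from the same system clock
    return t3, t3 / 2
```

In a receiver, `recv_time_us` would be read from the system clock (for example `time.time_ns() // 1000`) as soon as the packet arrives. Taking half of t3 as the one-way delay follows the text above and implicitly treats the uplink and downlink as symmetric.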

Note that if a video data packet is lost during network delay determination, the receiving module has its own packet-loss handling mechanism, which is common knowledge in the audio/video field; the network delay determination method described in this patent is not affected by video packet loss. The above describes how to determine the network transmission delay of one frame of video data. Given the capture frame rate fps, the network transmission delay of fps consecutive frames can be measured and averaged; that average is the network delay over a 1-second period.
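The per-second averaging can be sketched in a few lines. The function name `per_second_average` and the choice to drop a trailing group with fewer than fps frames are assumptions for illustration.

```python
from statistics import mean
from typing import List, Sequence


def per_second_average(delays_ms: Sequence[float], fps: int) -> List[float]:
    """Average per-frame delays over consecutive groups of `fps` frames.

    Each returned value is the mean delay over one 1-second period.
    A trailing group with fewer than `fps` frames is dropped.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [mean(delays_ms[i:i + fps])
            for i in range(0, len(delays_ms) - fps + 1, fps)]
```

For example, at 25 fps the first 25 per-frame delays yield the first one-second average, the next 25 the second, and so on.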

S400, the audio/video packet received by the receiving module is decoded, rendered, and displayed on the screen, and a screen acquisition module periodically captures the screen to derive the receiving terminal delay.

In S400 of this embodiment, as shown in Fig. 3, the receiving terminal delay is obtained as follows:

S401, a screen acquisition module captures the screen of the video terminal at a fixed period and saves each capture as an image. In this embodiment, the fixed period is tied to the video frame rate: the period T, in milliseconds, is the reciprocal of the frame rate fps, that is, T = 1000 / fps.

For example: the screen of the video terminal has two display frames. Display frame 1 shows the video picture about to be sent by the sending module of user A (the user's first login), and display frame 2 shows the video picture from user B (the user's second login) received by user A's receiving module. User A and user B are two logins of the same user; each has an independent video terminal, and both terminals run on the same PC and share the same camera, so the picture in display frame 1 is the video picture about to be sent by user A's sending module and is also the picture about to be sent by user B's sending module. The camera of the video terminal captures video at 25 frames per second, so the screen acquisition module captures the terminal screen once every 40 milliseconds; each capture is saved as a JPG image, with file names increasing monotonically from 1.
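The capture schedule in S401 can be sketched as below. The actual screen grab is platform-specific, so the sketch takes it as a caller-supplied function `grab` (hypothetical); the names `capture_period_ms` and `capture_screens` are likewise illustrative.

```python
import time
from typing import Callable, Dict


def capture_period_ms(fps: float) -> float:
    """Capture period in milliseconds: T = 1000 / fps."""
    return 1000.0 / fps


def capture_screens(grab: Callable[[], bytes], fps: float,
                    count: int) -> Dict[str, bytes]:
    """Call `grab` once per period and key each capture as 1.jpg, 2.jpg, ...

    Deadlines are computed from the start time, so the time spent inside
    `grab` does not accumulate as drift across captures.
    """
    period_s = capture_period_ms(fps) / 1000.0
    start = time.perf_counter()
    shots: Dict[str, bytes] = {}
    for n in range(1, count + 1):
        shots[f"{n}.jpg"] = grab()
        remaining = start + n * period_s - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)
    return shots
```

With the 25 fps camera from the example, `capture_period_ms(25)` gives 40.0, matching the 40-millisecond capture interval.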

S402, inspect any one of the images saved in S401; the video picture shown in a display frame of the screen image carries time information, and its time value is read.

For example: continuing the example in S401, the camera of the video terminal captures a scene that carries time information, for example by filming a stopwatch directly, with microsecond precision. The image named 30.jpg is read, and the time value shown in display frame 1 is recorded as T1. Because the screen acquisition module captures the screen at the video frame rate, the picture in display frame 1 of 30.jpg is the 30th frame of video data about to be sent.

S403, take the time information carried by the to-be-sent video frame in the image read in S402, then read the subsequent images in order and search for a target image in which the video picture received by the receiving module carries that same time information; read the time information carried by the to-be-sent video frame in the target image.

S404, determine the total delay of the current frame across network transmission and the receiving terminal from the time value read in S402 and the time information of the to-be-sent video frame in the target image from S403, then subtract the network transmission delay from S300 to obtain the receiving terminal delay.

For example: in the example of S402, the time value of display frame 1 in 30.jpg was read as T1. Starting from 31.jpg, the time value of display frame 2 in each image is checked one by one. In 36.jpg the time value of display frame 2 is T1, the same as display frame 1 in 30.jpg, so 36.jpg is the target image; the time value of display frame 1 in 36.jpg is read and recorded as T3. The difference T4 = T3 - T1 is the total delay of the 30th frame across network transmission and the receiving terminal. S302 and S303 give the network transmission delay of the 30th frame as t3, so T4 - t3 is the receiving terminal delay of the 30th frame.

During determination of the receiving terminal delay, if loss of a video data packet means that no target image matching the time value read in S402 can be found in S403, the current image is discarded and S402 is repeated with a different image to determine the receiving terminal delay.
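The matching in S402 to S404, including the discard-on-loss rule, can be sketched as follows. The sketch assumes the time values shown in the two display frames have already been extracted from each saved image (for example by reading the filmed stopwatch); the `Shot` record and the function name `terminal_delay_ms` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Shot(NamedTuple):
    name: str            # saved image name, e.g. "30.jpg"
    sent_ms: float       # time value read from display frame 1 (picture to be sent)
    received_ms: float   # time value read from display frame 2 (received picture)


def terminal_delay_ms(shots: Sequence[Shot], start: int,
                      network_delay_ms: float) -> Optional[float]:
    """Receiving terminal delay for the frame shown in shots[start].

    Returns None when no later image shows the same time value in display
    frame 2 (for example after packet loss); the caller then discards this
    image and retries with another one.
    """
    t1 = shots[start].sent_ms                 # T1, read in S402
    for shot in shots[start + 1:]:            # S403: scan the later images
        if shot.received_ms == t1:            # target image found
            total = shot.sent_ms - t1         # T4 = T3 - T1
            return total - network_delay_ms   # S404: T4 - t3
    return None
```

With captures every 40 ms, a frame first shown as outgoing in 30.jpg and first seen as received in 36.jpg has a total delay of about 6 x 40 = 240 ms; if the measured network delay t3 were 60 ms, the receiving terminal delay would be 180 ms.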

The above describes how to determine the receiving terminal delay of one frame of video data. Given the capture frame rate fps, the receiving terminal delay of fps consecutive frames can be measured and averaged; that average is the receiving terminal delay over a 1-second period.

In some preferred embodiments, during determination of the receiving terminal delay, if packet loss means that no target image matching the time value read in S402 can be found in S403, the current image is discarded, S402 is repeated with a different image, and S403 to S404 are executed to determine the receiving terminal delay.

This embodiment provides a method for determining video delay in real-time video. A user creates a chat room and enters the room a second time, setting up, through the video service, a scenario of chatting with his or her own audio and video. The video picture sent by the user's audio/video sending module carries time information. When a media packet is encapsulated and sent, the padding bit of the encapsulation protocol is set to 1, the system time is read, and the system time value is written into the padding field of the packet. When the audio/video receiving module of the same user's second video terminal receives the media packet, it reads the system time at that moment, decapsulates the packet, and reads the padding field value, from which the network delay of the packet can be determined. A screen acquisition module periodically captures the displayed screen, from which the receiving terminal delay is determined. Because the sending module and the receiving module read the system time of the same system, clock error between systems is avoided; the network transmission delay is strictly separated from the receiving terminal's buffering, decoding, and rendering delay, and no other stage that could affect the result is introduced, so the network transmission delay and the receiving terminal delay are determined with high accuracy. This addresses the problem in the prior art that both network transmission and the receiving terminal introduce delay into real-time video.

It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.

In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.

What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".
