Video playing method, apparatus and device, and computer-readable storage medium

Document No.: 1345784    Publication date: 2020-07-21

Note: This technology, "Video playing method, apparatus and device, and computer-readable storage medium" (视频播放方法、装置和设备、计算机可读存储介质), was designed and created by 韩存爱 on 2019-01-11. Its main content is as follows: the invention discloses a video playing method comprising: when an application program playing the video is switched from the foreground to the background, initiating recording of the current playing position of the video; when the application program is switched back to the foreground from the background, retrieving the recorded current playing position and acquiring the intra-coded frame in the video that is closest to and before the current playing position; initiating decoding of the video starting from the closest intra-coded frame; and, when decoding reaches the current playing position, initiating resumption of playing the video from the current playing position. A video playback device and a computer-readable storage medium are also disclosed.

1. A video playback method, comprising:

when an application program playing the video is switched from a foreground to a background, initiating recording of a current playing position of the video;

when the application program is switched back to the foreground from the background, retrieving the recorded current playing position and acquiring an intra-coded frame in the video that is closest to and before the current playing position;

initiating decoding of the video starting from the closest intra-coded frame; and

when the decoding reaches the current playing position, initiating resumption of playing the video from the current playing position.

2. The method of claim 1, wherein initiating recording of the current playing position of the video comprises: saving the current playing position to a global memory, and wherein retrieving the recorded current playing position comprises: reading the current playing position from the global memory.

3. The method of claim 1, wherein the video comprises video payload data, identification metadata indicating a frame type of each video frame in the video payload data, and temporal metadata indicating a display time value of each video frame, and wherein acquiring the closest intra-coded frame comprises: searching for the closest intra-coded frame based on the identification metadata and the temporal metadata.

4. The method of claim 3, wherein the identification metadata further includes index information indicating respective offsets of the intra-coded frames in the video payload data relative to a starting position of the video, and wherein searching for the closest intra-coded frame comprises:

sequentially parsing the temporal metadata of the intra-coded frames based on the index information, wherein:

deriving a display time value for the current intra-coded frame from the temporal metadata;

comparing the display time value with the time value of the current playing position;

in response to the display time value being less than the time value of the current playing position, continuing the parsing of the temporal metadata for a next intra-coded frame;

in response to the display time value being greater than the time value of the current playing position, determining a previous intra-coded frame as the closest intra-coded frame; and

in response to the display time value being equal to the time value of the current playing position, determining the current intra-coded frame as the closest intra-coded frame.

5. The method of claim 4, further comprising: determining the last of the intra-coded frames as the closest intra-coded frame in response to the display time value of the last intra-coded frame being less than the time value of the current playing position.

6. The method of claim 3, wherein the video comprises a plurality of slices, each slice comprising a plurality of temporally spaced intra-coded frames, each of the plurality of intra-coded frames being followed by a respective non-intra-coded frame, wherein the temporal metadata further comprises time information indicating respective durations of the plurality of slices, and wherein searching for the closest intra-coded frame comprises:

determining, according to the current playing position and the time information, the one of the plurality of slices in which the current playing position is located; and

sequentially parsing the identification metadata of each intra-coded frame and non-intra-coded frame in the slice in which the current playing position is located, wherein:

deriving a frame type of the current frame from the identification metadata;

in response to the current frame being a non-intra-coded frame, continuing the parsing of the identification metadata for a next frame;

in response to the current frame being an intra-coded frame, deriving a display time value for the current frame from the temporal metadata of the current frame;

comparing the display time value with the time value of the current playing position;

in response to the display time value being less than the time value of the current playing position, continuing the parsing of the identification metadata for a next frame;

in response to the display time value being greater than the time value of the current playing position, determining a previous intra-coded frame as the closest intra-coded frame; and

in response to the display time value being equal to the time value of the current playing position, determining the current frame as the closest intra-coded frame.

7. The method of claim 6, wherein searching for the closest intra-coded frame further comprises:

in response to the current frame being a non-intra-coded frame, deriving a display time value for the current frame from the temporal metadata of the current frame before continuing the parsing of the identification metadata for a next frame; and

in response to the display time value of the current frame being equal to an end time of the slice in which the current playing position is located, determining the last of the intra-coded frames as the closest intra-coded frame.

8. The method of claim 1, wherein initiating decoding of the video comprises:

passing the acquired closest intra-coded frame and a plurality of video frames following the closest intra-coded frame to a decoder component; and

each time a current frame is decoded, determining whether a display time value of the decoded current frame is equal to the time value of the current playing position, wherein:

in response to the display time value not being equal to the time value of the current playing position, discarding the decoded current frame and continuing the determination for a next decoded frame; and

in response to the display time value being equal to the time value of the current playing position, determining that decoding has reached the current playing position.

9. The method of claim 1, wherein the application runs on an iOS platform that exposes an application programming interface for accessing an underlying hardware decoding framework, and wherein initiating decoding of the video comprises: invoking the underlying hardware decoding framework via the application programming interface to perform the decoding.

10. The method of claim 1, wherein initiating decoding of the video comprises: invoking a software decoder to perform the decoding.

11. A video playback apparatus comprising:

means for initiating recording of a current playing position of the video when an application program that is playing the video is switched from a foreground to a background;

means for retrieving the recorded current playing position and acquiring an intra-coded frame in the video that is closest to and before the current playing position when the application program is switched from the background back to the foreground;

means for initiating decoding of the video starting from the closest intra-coded frame; and

means for initiating resumption of playing the video from the current playing position when decoding reaches the current playing position.

12. A video playback device comprising:

a play position recording module configured to record a current playing position of the video when an application program that is playing the video is switched from a foreground to a background;

an intra-coded frame acquisition module configured to, when the application program is switched from the background back to the foreground, retrieve the recorded current playing position and acquire an intra-coded frame in the video that is closest to and before the current playing position; and

a playback recovery module configured to decode the video starting from the closest intra-coded frame and, when decoding reaches the current playing position, cause the application program to resume playing the video from the current playing position.

13. A video playback device comprising a processor and a memory configured to store computer instructions that, when executed on the processor, cause the processor to perform the method of any one of claims 1-10.

14. A computer-readable storage medium configured to store computer instructions that, when executed on a processor, cause the processor to perform the method of any one of claims 1-10.

Technical Field

The present invention relates to video content reproduction, and more particularly, to a video playing method, a video playing apparatus, a video playing device, and a computer-readable storage medium.

Background

While a video is playing, a brief black screen may occur when the video application is switched to the background (e.g., because the user pressed the home key) and then returns to the foreground to continue playing the video. A normal picture does not appear until the next key frame in the video arrives, and that picture is usually not continuous with the picture displayed when the user pressed the home key. This results in a discontinuous viewing experience.

Disclosure of Invention

It would be advantageous to provide a solution that can alleviate, reduce or eliminate the above-mentioned problems.

According to an aspect of the present invention, there is provided a video playing method, including: when an application program playing the video is switched from a foreground to a background, initiating recording of a current playing position of the video; when the application program is switched back to the foreground from the background, retrieving the recorded current playing position and acquiring an intra-coded frame in the video that is closest to and before the current playing position; initiating decoding of the video starting from the closest intra-coded frame; and, when the decoding reaches the current playing position, initiating resumption of playing the video from the current playing position.

In some embodiments, initiating recording of the current playing position of the video comprises: saving the current playing position to a global memory. Retrieving the recorded current playing position comprises: reading the current playing position from the global memory.

In some embodiments, the video includes video payload data, identification metadata indicating a frame type of each video frame in the video payload data, and temporal metadata indicating a display time value of each video frame. Acquiring the closest intra-coded frame comprises: searching for the closest intra-coded frame based on the identification metadata and the temporal metadata.

In some embodiments, the identification metadata further includes index information indicating respective offsets of the intra-coded frames in the video payload data relative to a starting position of the video. Searching for the closest intra-coded frame comprises: sequentially parsing the temporal metadata of the intra-coded frames based on the index information, wherein: deriving a display time value for the current intra-coded frame from the temporal metadata; comparing the display time value with the time value of the current playing position; in response to the display time value being less than the time value of the current playing position, continuing the parsing of the temporal metadata for a next intra-coded frame; in response to the display time value being greater than the time value of the current playing position, determining a previous intra-coded frame as the closest intra-coded frame; and in response to the display time value being equal to the time value of the current playing position, determining the current intra-coded frame as the closest intra-coded frame.

In some embodiments, the method further comprises: determining the last of the intra-coded frames as the closest intra-coded frame in response to the display time value of the last intra-coded frame being less than the time value of the current playing position.

In some embodiments, the video comprises a plurality of slices, each slice comprising a plurality of temporally spaced intra-coded frames, each of the plurality of intra-coded frames being followed by a respective non-intra-coded frame. The temporal metadata further includes time information indicating respective durations of the plurality of slices. Searching for the closest intra-coded frame comprises: determining, according to the current playing position and the time information, the one of the plurality of slices in which the current playing position is located; and sequentially parsing the identification metadata of each intra-coded frame and non-intra-coded frame in the slice in which the current playing position is located, wherein: deriving a frame type of the current frame from the identification metadata; in response to the current frame being a non-intra-coded frame, continuing the parsing of the identification metadata for a next frame; in response to the current frame being an intra-coded frame, deriving a display time value for the current frame from the temporal metadata of the current frame; comparing the display time value with the time value of the current playing position; in response to the display time value being less than the time value of the current playing position, continuing the parsing of the identification metadata for a next frame; in response to the display time value being greater than the time value of the current playing position, determining a previous intra-coded frame as the closest intra-coded frame; and in response to the display time value being equal to the time value of the current playing position, determining the current frame as the closest intra-coded frame.

In some embodiments, searching for the closest intra-coded frame further comprises: in response to the current frame being a non-intra-coded frame, deriving a display time value for the current frame from the temporal metadata of the current frame before continuing the parsing of the identification metadata for a next frame; and in response to the display time value of the current frame being equal to the end time of the slice in which the current playing position is located, determining the last of the intra-coded frames as the closest intra-coded frame.

In some embodiments, initiating decoding of the video comprises: passing the acquired closest intra-coded frame and a plurality of video frames following the closest intra-coded frame to a decoder component; and, each time a current frame is decoded, determining whether the display time value of the decoded current frame is equal to the time value of the current playing position, wherein: in response to the display time value not being equal to the time value of the current playing position, the decoded current frame is discarded and the determination continues for a next decoded frame; and in response to the display time value being equal to the time value of the current playing position, it is determined that decoding has reached the current playing position.

In some embodiments, the application runs on an iOS platform that exposes an application programming interface for accessing the underlying hardware decoding framework. Initiating decoding of the video comprises: invoking the underlying hardware decoding framework via the application programming interface to perform the decoding.

In some embodiments, initiating decoding of the video comprises: invoking a software decoder to perform the decoding.

According to another aspect of the present invention, there is provided a video playback apparatus including: means for initiating recording of a current playing position of the video when an application program that is playing the video is switched from a foreground to a background; means for retrieving the recorded current playing position and acquiring an intra-coded frame in the video that is closest to and before the current playing position when the application program is switched from the background back to the foreground; means for initiating decoding of the video starting from the closest intra-coded frame; and means for initiating resumption of playing the video from the current playing position when decoding reaches the current playing position.

According to still another aspect of the present invention, there is provided a video playback device including: a play position recording module configured to record a current playing position of the video when an application program that is playing the video is switched from a foreground to a background; an intra-coded frame acquisition module configured to, when the application program is switched from the background back to the foreground, retrieve the recorded current playing position and acquire an intra-coded frame in the video that is closest to and before the current playing position; and a playback recovery module configured to decode the video starting from the closest intra-coded frame and, when decoding reaches the current playing position, cause the application program to resume playing the video from the current playing position.

In some embodiments, the play position recording module is configured to save the current playing position to a global memory to record the current playing position of the video, and the intra-coded frame acquisition module is configured to read the current playing position from the global memory to retrieve the recorded current playing position.

In some embodiments, the video includes video payload data, identification metadata indicating a frame type of each video frame in the video payload data, and temporal metadata indicating a display time value of each video frame. The intra-coded frame acquisition module is configured to search for the closest intra-coded frame based on the identification metadata and the temporal metadata.

In some embodiments, the identification metadata further includes index information indicating respective offsets of the intra-coded frames in the video payload data relative to a starting position of the video. The intra-coded frame acquisition module is configured to sequentially parse the temporal metadata of the intra-coded frames based on the index information, wherein: a display time value for the current intra-coded frame is derived from the temporal metadata; the display time value is compared with the time value of the current playing position; in response to the display time value being less than the time value of the current playing position, the parsing of the temporal metadata continues for a next intra-coded frame; in response to the display time value being greater than the time value of the current playing position, a previous intra-coded frame is determined as the closest intra-coded frame; and in response to the display time value being equal to the time value of the current playing position, the current intra-coded frame is determined as the closest intra-coded frame.

In some embodiments, the intra-coded frame acquisition module is further configured to determine the last of the intra-coded frames as the closest intra-coded frame in response to the display time value of the last intra-coded frame being less than the time value of the current playing position.

In some embodiments, the video comprises a plurality of slices, each slice comprising a plurality of temporally spaced intra-coded frames, each of the plurality of intra-coded frames being followed by a respective non-intra-coded frame. The temporal metadata further includes time information indicating respective durations of the plurality of slices. The intra-coded frame acquisition module is configured to: determine, according to the current playing position and the time information, the one of the plurality of slices in which the current playing position is located; and sequentially parse the identification metadata of each intra-coded frame and non-intra-coded frame in the slice in which the current playing position is located, wherein: a frame type of the current frame is derived from the identification metadata; in response to the current frame being a non-intra-coded frame, the parsing of the identification metadata continues for a next frame; in response to the current frame being an intra-coded frame, a display time value for the current frame is derived from the temporal metadata of the current frame; the display time value is compared with the time value of the current playing position; in response to the display time value being less than the time value of the current playing position, the parsing of the identification metadata continues for a next frame; in response to the display time value being greater than the time value of the current playing position, a previous intra-coded frame is determined as the closest intra-coded frame; and in response to the display time value being equal to the time value of the current playing position, the current frame is determined as the closest intra-coded frame.

In some embodiments, the intra-coded frame acquisition module is configured to determine the last of the intra-coded frames as the closest intra-coded frame in response to the display time value of the current frame being equal to the end time of the slice in which the current playing position is located.

In some embodiments, the playback recovery module is configured to: pass the acquired closest intra-coded frame and a plurality of video frames following the closest intra-coded frame to a decoder component; and, each time a current frame is decoded, determine whether the display time value of the decoded current frame is equal to the time value of the current playing position, wherein: in response to the display time value not being equal to the time value of the current playing position, the decoded current frame is discarded and the determination continues for a next decoded frame; and in response to the display time value being equal to the time value of the current playing position, it is determined that decoding has reached the current playing position.

In some embodiments, the application runs on an iOS platform that exposes an application programming interface for accessing the underlying hardware decoding framework. The playback recovery module is configured to invoke the underlying hardware decoding framework via the application programming interface to perform the decoding.

In some embodiments, the playback recovery module is configured to invoke a software decoder to perform the decoding.

According to yet another aspect of the present invention, there is provided a video playback device comprising a processor and a memory configured to store computer instructions which, when executed on the processor, cause the processor to perform the method as described above.

According to yet another aspect of the present invention, there is provided a computer readable storage medium configured to store computer instructions which, when executed on a processor, cause the processor to perform the method as described above.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.

Drawings

Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the accompanying drawings, in which:

fig. 1 shows a flow chart of a video playing method according to an embodiment of the invention;

FIG. 2 illustrates in more detail an example flow of obtaining the closest intra-coded frame in the method of FIG. 1;

fig. 3 is a schematic diagram showing searching for an intra-coded frame within a slice in the related art;

FIG. 4 illustrates in more detail another example flow of obtaining the closest intra-coded frame in the method of FIG. 1;

FIG. 5 shows a schematic illustration of the operation of FIG. 4;

FIG. 6 shows in more detail a schematic and exemplary illustration of initiating decoding and resuming play in the method of FIG. 1;

FIG. 7 shows a schematic block diagram of a video playback device according to an embodiment of the present invention; and

Fig. 8 generally illustrates an example system that includes an example computing device that represents one or more systems and/or devices that may implement the various techniques described herein.

Detailed Description

The inventors of the present application have recognized that the video playback discontinuity experienced by video applications during foreground/background switching is caused by the buffered data in the video decoding component being emptied. For example, on the iOS platform, hardware decoding is done using an underlying hardware decoding framework ("VideoToolBox") that has direct access to the hardware decoder; when the user presses the home key or otherwise switches the video application to the background, the data cached in the VideoToolBox is reclaimed. When the video application later returns to the foreground, the previous key frame on which decoding of the subsequent non-key frames depends has already been emptied, so those non-key frames cannot be decoded normally, resulting in a blank screen. A normally decoded picture is not available until the next key frame arrives, and that picture is not continuous with the picture shown before playback was interrupted. Other platforms (e.g., Android or Windows) may suffer from the same problem.

Fig. 1 shows a flow diagram of a video playback method 100 according to an embodiment of the invention.

At step 110, when an application (hereinafter also referred to as a "video app") that is playing a video is switched from the foreground to the background (e.g., because the user pressed the home key, the video playback window was minimized, etc.), recording of the current playing position of the video (hereinafter also referred to as "CurrentPos") is initiated.

In some embodiments, this includes saving CurrentPos to global memory. In this way, the information about the current playing position of the video is not emptied and can be used later to resume playing the video. Other embodiments are possible. In particular, CurrentPos may be a relative time, e.g., relative to the start time of the video (e.g., 0 minutes 0 seconds), and may have a precision of, e.g., milliseconds or microseconds. As a non-limiting illustration, the following Swift sketch shows one way steps 110 and 120 might be wired up on iOS using application lifecycle notifications; the VideoPlayer protocol, its currentPositionMs property, and resumeFromNearestIFrame(before:) are hypothetical stand-ins for the video app's own player component, and the static property stands in for the "global memory" mentioned above.
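import UIKit

// A minimal sketch, assuming a hypothetical VideoPlayer abstraction:
// save CurrentPos when the app is backgrounded, and read it back on
// return to the foreground.
protocol VideoPlayer: AnyObject {
    var currentPositionMs: Int64 { get }                   // CurrentPos, in ms
    func resumeFromNearestIFrame(before positionMs: Int64)
}

final class PlaybackPositionRecorder {
    // "Global memory" modeled as a static store that survives the
    // background transition (a singleton or UserDefaults would also work).
    static var savedPositionMs: Int64?

    private var observers: [NSObjectProtocol] = []

    init(player: VideoPlayer) {
        let center = NotificationCenter.default
        // Step 110: initiate recording of CurrentPos on backgrounding.
        observers.append(center.addObserver(
            forName: UIApplication.didEnterBackgroundNotification,
            object: nil, queue: .main) { [weak player] _ in
                PlaybackPositionRecorder.savedPositionMs = player?.currentPositionMs
        })
        // Step 120: retrieve CurrentPos and start the seek-and-decode flow.
        observers.append(center.addObserver(
            forName: UIApplication.willEnterForegroundNotification,
            object: nil, queue: .main) { [weak player] _ in
                if let pos = PlaybackPositionRecorder.savedPositionMs {
                    player?.resumeFromNearestIFrame(before: pos)
                }
        })
    }

    deinit {
        observers.forEach(NotificationCenter.default.removeObserver)
    }
}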

The term "video" as used herein is intended to include media files in various encoding formats and/or packaging formats. The video may be located locally with the video app playing it, or received in real time from a remote location (e.g., via a wired or wireless network). The video app here includes a variety of applications with video playback functionality, such as a video player or another app with an embedded video player, e.g., Tencent Video, WeChat, Weibo, and variants thereof (e.g., mobile version, web version, desktop version).

At step 120, when the video app is switched from the background back to the foreground, the recorded CurrentPos is retrieved, and the intra-coded frame closest to and before CurrentPos is obtained from the video.

In some embodiments, retrieving the recorded CurrentPos includes reading CurrentPos from the global memory. The retrieved CurrentPos is then used to locate the appropriate intra-coded frame in the video. Intra-coded frames (also called "I-frames") contain all the information needed to reconstruct a complete picture, while non-intra-coded frames (e.g., P-frames and B-frames) must rely, directly or indirectly, on I-frames for decoding. To shorten the processing latency and resume playback as soon as possible, it is desirable to obtain the I-frame closest to and temporally before CurrentPos. This can be achieved with a seek operation for the I-frame.

Typically, a video, regardless of its encoding format and packaging format, includes video payload data and metadata describing the video payload data. The metadata includes, for example, identification metadata (e.g., a field in the header) indicating the frame type of each video frame in the video payload data, and temporal metadata (e.g., a presentation timestamp (PTS) in the header) indicating the display time value of each video frame. Therefore, in step 120, the identification metadata may be used to identify whether a frame is an I-frame, and the temporal metadata may be used to search for the I-frame closest to and temporally before CurrentPos (referred to as the "closest I-frame").

The corresponding seek operations for several exemplary video formats are described below with reference to figs. 2, 3, 4 and 5.

For video in the MP4 format or a similar format, the identification metadata also includes index information, such as an I-frame index table, that indicates the offset of each I-frame in the video payload data relative to the start position of the video. Thus, for this type of video, the individual I-frames can be located sequentially by parsing the index information, and the required I-frame is determined by parsing the temporal metadata of the individual I-frames. In the case of the MP4 format, the I-frame index table and the display time information are both recorded in the HEADER BOX. When performing the seek operation, the I-frame index table and the display time information may be obtained by directly reading the HEADER BOX if it is available locally, or by sending a request for the HEADER BOX to the remote video source if it is not.

Fig. 2 illustrates in more detail an example process 120A, applicable to this type of video format, for obtaining the closest I-frame. Referring to fig. 2, at step 121, the display time value of the current I-frame (e.g., the 1st I-frame in the video) is derived from its temporal metadata. At step 122, the display time value is compared with the time value of CurrentPos. If the display time value is less than the time value of CurrentPos (step 123), the parsing of the temporal metadata continues for the next I-frame (step 124) and flow returns to step 121. Otherwise (step 123), it is determined at step 125 whether the display time value of the current I-frame is greater than the time value of CurrentPos. If so, the display time of the current I-frame has passed CurrentPos, and the previous I-frame is determined to be the required closest I-frame (step 126). If the display time value is not greater than the time value of CurrentPos (step 125), i.e., it is equal to the time value of CurrentPos, the current I-frame is the required I-frame closest to CurrentPos. During the seek operation, the metadata of each I-frame is parsed without decoding the video payload data, thus incurring only a short processing delay.

In the extreme case where CurrentPos falls after the last I-frame in the video, process 120A will be unable to locate the required closest I-frame, because step 125 will never be triggered for that last I-frame. In this case, the last I-frame may be determined to be the required closest I-frame in response to the display time value of the last I-frame indicated by the I-frame index table being less than the time value of CurrentPos. By way of illustration only, the following Swift sketch captures the search loop of process 120A, including the extreme case just described; the IFrameIndexEntry type is a hypothetical stand-in for entries parsed from the I-frame index table and display time information.
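// A minimal sketch of process 120A, assuming the I-frame index table has
// already been parsed (e.g., from the HEADER BOX of an MP4 file) into an
// array ordered by display time. IFrameIndexEntry is hypothetical.
struct IFrameIndexEntry {
    let byteOffset: Int64     // offset of the I-frame in the video payload
    let displayTimeMs: Int64  // display time value of the I-frame
}

// Returns the I-frame closest to and not after currentPosMs,
// or nil if the index is empty.
func closestIFrame(in index: [IFrameIndexEntry],
                   before currentPosMs: Int64) -> IFrameIndexEntry? {
    var previous: IFrameIndexEntry?
    for entry in index {
        if entry.displayTimeMs < currentPosMs {
            previous = entry     // steps 123-124: keep parsing the next I-frame
        } else if entry.displayTimeMs > currentPosMs {
            return previous      // steps 125-126: overshot; previous I-frame wins
        } else {
            return entry         // equal to CurrentPos: current I-frame is it
        }
    }
    // Extreme case: CurrentPos lies after the last I-frame in the video,
    // so the last I-frame is the required closest one.
    return previous
}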

Video in the HLS (HTTP Live Streaming) + TS (Transport Stream) format or a similar format includes multiple slices (segments); each slice includes multiple I-frames separated in time, and each of the multiple I-frames is followed by corresponding non-intra-coded frames.

Specifically, in the case of the HLS + TS format, the video includes an M3U8 description file and TS media files, where the M3U8 description file is made up of a series of tags and textually describes the TS media files. An example of an M3U8 description file is given below.

#EXTM3U

#EXT-X-TARGETDURATION:10

#EXTINF:10,

./0.ts

#EXTINF:10,

./1.ts

In this example, the tag #EXT-X-TARGETDURATION specifies the maximum duration (in seconds) of any TS media segment, and each #EXTINF tag specifies the duration of the corresponding TS media segment, which must be less than or equal to the maximum specified by #EXT-X-TARGETDURATION (in this example, 10 seconds). For illustration, the segment durations needed later for locating CurrentPos can be pulled out of such a description file with a few lines of Swift; the sketch below handles only the #EXTINF tags of the minimal example above, not the full M3U8 syntax.
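import Foundation

// A minimal sketch: collect the duration (in seconds) of each TS media
// segment from the #EXTINF tags of an M3U8 description file.
func segmentDurationsSeconds(fromM3U8 text: String) -> [Double] {
    var durations: [Double] = []
    for line in text.split(separator: "\n") {
        // An #EXTINF line looks like "#EXTINF:10," (a duration, then a comma).
        guard line.hasPrefix("#EXTINF:"),
              let value = line.dropFirst("#EXTINF:".count)
                              .split(separator: ",").first,
              let seconds = Double(String(value)) else { continue }
        durations.append(seconds)
    }
    return durations
}

// For the example description file above, this returns [10.0, 10.0].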

In the related art, as illustrated in fig. 3, a seek within a slice simply returns the starting I-frame of that slice. Although the resulting starting I-frame is prior in time to the seek position and may be used for subsequent decoding operations, it may be too far from the seek position, resulting in a long time spent decoding useless data.

Fig. 4 illustrates in more detail another example process 120B, applicable to this type of video format, for obtaining the closest I-frame. Referring to fig. 4, at step 221, the TS media segment in which CurrentPos is located is determined from CurrentPos and the duration of each TS media segment. Then, the identification metadata of each video frame in the segment in which CurrentPos is located is sequentially parsed. At step 222, the frame type of the current frame is derived from the identification metadata. For example, where a TS media file is encoded using the H.264 standard, the nal_unit_type field may be used to determine whether the current frame is an I-frame. If the current frame is a non-I-frame (step 223), the parsing of the identification metadata continues for the next frame (step 224) and flow returns to step 222. If the current frame is an I-frame (step 223), the display time value of the current frame is derived from its temporal metadata at step 225. At step 226, the display time value is compared with the time value of CurrentPos. If the display time value is less than the time value of CurrentPos (step 227), the parsing of the identification metadata continues for the next frame (step 228) and flow returns to step 222. Otherwise, it is determined at step 229 whether the display time value is greater than the time value of CurrentPos. If so, the display time of the current frame (an I-frame) has passed CurrentPos, and the previous I-frame is determined to be the required closest I-frame. If the display time value is not greater than the time value of CurrentPos (step 229), i.e., it is equal to the time value of the current playing position, the current frame (an I-frame) is determined to be the required closest I-frame.

Fig. 5 shows a schematic illustration of the operation of fig. 4. In this example, assume that CurrentPos is 15 s. Since the TS media segments 0.ts and 1.ts each have a duration of 10 s, it can be determined that CurrentPos is located within the TS media segment 1.ts. As shown in fig. 5, the seek operation is performed in the TS media segment 1.ts containing CurrentPos. The identification metadata is parsed starting from the initial I-frame a of 1.ts; since this video frame is an I-frame, its temporal metadata is parsed to obtain its display time value. Since the display time value of I-frame a is less than the time value of CurrentPos, the parsing of the identification metadata continues for the next frame (not shown). Since the next frame is a non-I-frame, the parsing of the identification metadata continues for the frame after it. When I-frame b is parsed, its display time value is still less than the time value of CurrentPos, so the parsing of the identification metadata continues for the next frame (not shown). This process continues until I-frame d is parsed. Since frame d is an I-frame, its temporal metadata is parsed to obtain its display time value. Since the display time value of I-frame d is greater than the time value of CurrentPos, the previous I-frame (i.e., I-frame c) is determined to be the required closest I-frame. During the seek operation, the metadata of each video frame is parsed without decoding the video payload data, thus incurring only a short processing delay.

Similar to process 120A, in the extreme case where CurrentPos is after the last I-frame in the TS media segment, process 120B will be unable to locate the required closest I-frame, since step 229 will never be triggered for that last I-frame. In view of this, in some embodiments, the temporal metadata of each video frame may be parsed regardless of the frame type (I-frame or non-I-frame). Specifically, at step 223, if the current frame is a non-I-frame, the display time value of the current frame is derived from its temporal metadata before step 224. This display time value may be compared with the time value of the current playing position, and, in response to the display time value of the current frame being equal to the end time of the TS media segment, the last I-frame in the TS media segment is determined to be the required closest I-frame. The end time of the TS media segment may be obtained, for example, by adding the duration of the TS media segment to the durations of the preceding TS media segments. In the example of fig. 5, the end time of the TS media segment 1.ts is calculated as 10 + 10 = 20 s. As a sketch only, the following Swift code mirrors process 120B together with the end-of-segment safeguard just described; FrameMeta is a hypothetical stand-in for per-frame metadata parsed from the TS packets (frame type, e.g., via nal_unit_type, and display time value).
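// A minimal sketch of process 120B, under the assumption that the frames
// of the target TS media segment can be iterated in order with their
// parsed metadata. FrameMeta is hypothetical.
struct FrameMeta {
    let isIFrame: Bool        // frame type from the identification metadata
    let displayTimeMs: Int64  // display time value from the temporal metadata
}

// Step 221: pick the segment containing CurrentPos from the #EXTINF durations.
func segmentIndex(containing posMs: Int64, durationsSec: [Double]) -> Int? {
    var endMs: Int64 = 0
    for (i, duration) in durationsSec.enumerated() {
        endMs += Int64(duration * 1000)
        if posMs < endMs { return i }   // CurrentPos falls in segment i
    }
    return nil                          // CurrentPos is past the last segment
}

// Steps 222-229: scan the segment's frames for the closest preceding I-frame.
// Returning the last I-frame seen when the scan runs off the end of the
// segment covers the extreme case described above.
func closestIFrame(inSegment frames: [FrameMeta],
                   before currentPosMs: Int64) -> FrameMeta? {
    var previousIFrame: FrameMeta?
    for frame in frames {
        guard frame.isIFrame else { continue }   // steps 223-224: skip non-I
        if frame.displayTimeMs < currentPosMs {
            previousIFrame = frame               // steps 227-228: keep going
        } else if frame.displayTimeMs > currentPosMs {
            return previousIFrame                // step 229: overshot
        } else {
            return frame                         // equal to CurrentPos
        }
    }
    return previousIFrame
}

// For fig. 5: CurrentPos = 15 s with durations [10, 10] selects segment 1
// (1.ts), and the scan returns I-frame c.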

Now an I-frame closest to and temporally before the current playing position CurrentPos has been acquired, and the video can thus be decoded starting from the acquired I-frame.

Referring back to fig. 1, at step 130, decoding of the video is initiated starting from the closest I-frame, and at step 140, resuming playback of the video from CurrentPos is initiated when decoding reaches CurrentPos.

Fig. 6 shows in more detail a schematic and exemplary illustration of initiating decoding and resuming play in the method 100 of fig. 1.

At step 131, the acquired closest I-frame data and a number of video frames following the closest I-frame are read (e.g., from a buffer of the player) and passed to a decoder for decoding. In some embodiments, the video app runs on the iOS platform. iOS systems of version 8.0 and later provide an Application Programming Interface (API) via which the underlying hardware decoding framework, the VideoToolBox, can be accessed; the VideoToolBox can directly access the hardware decoder to speed up decoding. In this case, decoding can be performed by calling the VideoToolBox via such an API, thereby shortening the delay until playback of the video resumes. It will be understood that the concepts of the present invention are platform-independent and that other embodiments are possible. For example, a software decoder may be invoked for decoding. This is feasible in certain application scenarios: for example, in the case of a web-based video app used on a desktop or laptop computer, decoding may be performed by calling a software decoder built into the kernel of the web browser.

At step 132, each time a current frame is decoded, it is determined whether the display time value of the decoded current frame is equal to the time value of CurrentPos. If not, the decoded current frame is discarded (step 133) and the determination continues for the next frame (step 134). If the display time value of the decoded current frame is equal to the time value of CurrentPos, it is determined that decoding has reached CurrentPos (step 135). At this point, resuming playback of the video from CurrentPos is initiated. As shown in fig. 6, the decoded frame data is sent to the application layer for display. A minimal Swift sketch of this decode-and-discard loop follows; the decodeNextFrame closure is a hypothetical stand-in for the decoder component (the VideoToolBox framework on iOS, or a software decoder elsewhere), and display stands in for handing the frame to the application layer.
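// A minimal sketch of steps 132-135: decode from the closest I-frame,
// discard every decoded frame whose display time value differs from
// CurrentPos, and resume display once CurrentPos is reached.
struct DecodedFrame {
    let displayTimeMs: Int64
    // ... decoded picture data would live here ...
}

func resumePlayback(from currentPosMs: Int64,
                    decodeNextFrame: () -> DecodedFrame?,
                    display: (DecodedFrame) -> Void) {
    while let frame = decodeNextFrame() {
        guard frame.displayTimeMs == currentPosMs else {
            continue   // steps 133-134: discard and decode the next frame
        }
        // Step 135: decoding has reached CurrentPos; playback resumes here,
        // with this and subsequent frames sent to the application layer.
        display(frame)
        return
    }
}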

The seek operation for the closest I-frame and the decoding operation from the closest I-frame may take so short a time that the resulting latency is not noticeable to the user. Thus, from the user's perspective, playback of the video is continuous and smooth when the video app switches from the background back to the foreground.

This provides advantages over several typical approaches. In one typical approach, when the application switches to the background, the current player instance is stopped directly; when the application switches back to the foreground, the playback flow is re-executed from the beginning (e.g., by invoking the player's start logic), with the starting playback position set to the position at which playback stood when the application switched to the background. However, calling the start logic to replay takes longer, and a short blank screen may occur. In another typical approach, when the application switches to the background, playback is paused by calling the player's pause logic; when the application switches back from the background to the foreground, the player's resume logic is called to continue playing the current video. However, directly invoking the resume logic may result in a decoding failure, and thus a black screen, because the buffered data in the decoder component has been emptied by that time. The duration of the black screen is determined by the interval between the current playing position and the display time value of the next key frame. Owing to the decoding failure, normal pictures are not obtained until the next I-frame, so the played pictures are discontinuous.

Fig. 7 shows a schematic block diagram of a video playback device 700 according to an embodiment of the present invention. Referring to fig. 7, the video playback device 700 includes a play position recording module 710, an intra-coded frame acquisition module 720, and a playback recovery module 730.

The play position recording module 710 is configured to record the current playing position of the video when the application program that is playing the video is switched from the foreground to the background. The operation of the play position recording module 710 has been described above in detail with respect to the method embodiment illustrated in connection with fig. 1 and is not repeated here for the sake of brevity.

The intra-coded frame acquisition module 720 is configured to, when the application program is switched from the background back to the foreground, retrieve the recorded current playing position and acquire an intra-coded frame in the video that is closest to and before the current playing position. The operation of the intra-coded frame acquisition module 720 has been described in detail above with respect to the method embodiments illustrated in connection with figs. 1-5 and is not repeated here for the sake of brevity.

The playback recovery module 730 is configured to decode the video starting from the closest intra-coded frame and, when decoding reaches the current playing position, cause the application to resume playing the video from the current playing position. The operation of the playback recovery module 730 has been described in detail above with respect to the method embodiments illustrated in connection with figs. 1 and 6 and is not repeated here for the sake of brevity.

It will be appreciated that the play position recording module 710, the intra-coded frame acquisition module 720, and the playback recovery module 730 may be implemented in software, firmware, hardware, or a combination thereof.

Fig. 8 generally illustrates an example system 800 that includes an example computing device 810 that represents one or more systems and/or devices that may implement the various techniques described herein. Computing device 810 may be, for example, a device associated with a client (e.g., a client device), a system-on-chip, a server of a service provider, and/or any other suitable computing device or computing system. The video playback device 700 described above with respect to fig. 7 may take the form of a computing device 810. Alternatively, the video playback device 700 may be implemented as a computer program in the form of a video app 816. More specifically, the video playback device 700 may be implemented as an integral part of the video player or as a plug-in that may be downloaded and installed separately from the video player.

The example computing device 810 as illustrated includes a processing system 811, one or more computer-readable media 812, and one or more I/O interfaces 813 communicatively coupled to each other. Although not shown, computing device 810 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.

The processing system 811 represents functionality to perform one or more operations using hardware. Thus, the processing system 811 is illustrated as including hardware elements 814 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 814 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, a processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic Integrated Circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.

The computer-readable medium 812 is illustrated as including memory/storage 815. Memory/storage 815 represents memory/storage capacity associated with one or more computer-readable media. The memory/storage 815 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). The memory/storage 815 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer-readable medium 812 may be configured in various other ways as further described below.

One or more I/O interfaces 813 represent functionality that allows a user to enter commands and information to computing device 810, and optionally also allows information to be presented to the user and/or other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), and a camera (which may detect motion that does not involve touch as gestures, e.g., using visible or invisible wavelengths such as infrared frequencies). Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, a haptic response device, and so forth. Accordingly, the computing device 810 may be configured in various ways to support user interaction, as described further below.

Computing device 810 may also include video app 816. Video app 816 may be, for example, a software instance of video playback device 700 of fig. 7, and in combination with other elements in computing device 810 implement the techniques described herein.

Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module" (e.g., the play position recording module 710, the intra-coded frame acquisition module 720, and the playback recovery module 730 in the preceding paragraphs), "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.

An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can include a variety of media that can be accessed by computing device 810. By way of example, and not limitation, computer-readable media may comprise "computer-readable storage media" and "computer-readable signal media".

"computer-readable storage medium" refers to a medium and/or device, and/or a tangible storage apparatus, capable of persistently storing information, as opposed to mere signal transmission, carrier wave, or signal per se. Accordingly, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of computer readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or an article of manufacture suitable for storing the desired information and accessible by a computer.

"computer-readable signal medium" refers to a signal-bearing medium configured to transmit instructions to hardware of computing device 810, such as via a network. Signal media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave, data signal or other transport mechanism. Signal media also includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.

As previously described, the hardware elements 814 and the computer-readable medium 812 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware that may be used in some embodiments to implement at least some aspects of the techniques described herein.

Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Thus, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 814. Computing device 810 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, modules implemented as software executable by the computing device 810 may be realized at least partially in hardware, for example, using computer-readable storage media and/or the hardware elements 814 of the processing system. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 810 and/or processing systems 811) to implement the techniques, modules, and examples described herein.

In various implementations, computing device 810 may assume a variety of different configurations. For example, computing device 810 may be implemented as a computer-like device including a personal computer, desktop computer, multi-screen computer, laptop computer, netbook, and so forth. The computing device 810 may also be implemented as a mobile device-like device including mobile devices such as mobile telephones, portable music players, portable gaming devices, tablet computers, multi-screen computers, and the like. Computing device 810 may also be implemented as a television-like device that includes devices with or connected to a generally larger screen in a casual viewing environment. These devices include televisions, set-top boxes, game consoles, and the like.

The techniques described herein may be supported by these various configurations of computing device 810 and are not limited to the specific examples of the techniques described herein. The computing device 810 may also interact with the "cloud" 820 through a variety of communication technologies.

Cloud 820 includes and/or is representative of a platform 822 for resources 824. The platform 822 abstracts the underlying functionality of hardware (e.g., servers) and software resources of the cloud 820. Resources 824 may include applications and/or data that can be used while computer processing is executed on servers remote from computing device 810. Resources 824 may also include services provided over the Internet and/or through a subscriber network such as a cellular or Wi-Fi network. The platform 822 may abstract resources and functions to connect the computing device 810 with other computing devices. The platform 822 may also serve to abstract the scaling of resources so as to provide a level of scale corresponding to the demand encountered for the resources 824 implemented via the platform 822.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
