Media file loading method and device and storage medium

Document No.: 1784944  Publication date: 2019-12-06

Note: this technology, "Media file loading method, device and storage medium" (一种媒体文件加载方法、装置及存储介质), was designed and created by Yin Guohui (银国徽) on 2018-05-29. Abstract: The present disclosure provides a media file loading method, including detecting a playing point reached by a player in the process of playing a media file; acquiring, from content units included in the media file, a content unit whose playing time is after the playing point, wherein the content units are obtained by dividing the media file according to content changes that occur over the playing time; and loading, by the player, the acquired content unit. The disclosure also provides a media file loading device and a storage medium.

1. A media file loading method, comprising:

detecting a playing point reached by a player in the process of playing a media file;

acquiring, from content units included in the media file, a content unit whose playing time is after the playing point;

wherein the content units are obtained by dividing the media file according to content changes that occur over the playing time;

loading, by the player, the acquired content unit.

2. The method of claim 1, wherein acquiring the content unit whose playing time is after the playing point comprises:

determining a first timestamp corresponding to a play start time of the content unit and a second timestamp corresponding to a play end time of the content unit;

searching for a first key frame whose decoding time is before and closest to the first timestamp, and a second key frame whose decoding time is after and closest to the second timestamp;

extracting video frames between the first key frame and the second key frame from the media file.

3. The method of claim 2, further comprising:

searching for a first audio frame whose decoding time is before and closest to the decoding time of the first key frame, and a second audio frame whose decoding time is after and closest to the decoding time of the second key frame;

extracting audio frames between the first audio frame and the second audio frame from the media file.

4. The method of claim 1, wherein loading, by the player, the acquired content unit comprises:

when the player runs embedded in a web page,

loading the content unit to a media source extension interface of the web page, wherein the media source extension interface is used by the player to call a media element of the web page to play the content unit.

5. The method of claim 4, wherein loading the content unit to the media source extension interface of the web page comprises:

when the media file is in a non-streaming media file format,

extracting video frames corresponding to the content unit from the media file;

packaging the video frames corresponding to the content unit into segmented media files, wherein each segmented media file can be independently decoded;

encapsulating the plurality of segmented media files into a multimedia data object.

6. The method of claim 1, wherein loading, by the player, the acquired content unit comprises:

displaying summary information of the content unit on a playing interface of the player;

responding to an operation of selecting a content unit based on the summary information;

loading, by the player, the selected content unit.

7. The method of claim 1, wherein loading, by the player, the acquired content unit comprises:

when the selected content unit is not determined by human-computer interaction,

sequentially loading, by the player, at least one acquired content unit according to timestamps corresponding to start times of the at least one content unit.

8. The method of claim 1, wherein loading, by the player, the acquired content unit comprises:

obtaining a descending order of request counts of the content units included in the media file;

loading at least one content unit in descending order of request count.

9. The method of any one of claims 1 to 8, further comprising:

after loading the acquired content unit by the player,

in response to a request to change display resolution, determining a third timestamp corresponding to a time at which the request was received and a fourth timestamp corresponding to an end time of a currently played content unit;

acquiring a new content unit whose playing time is between the third timestamp and the fourth timestamp and which is at the switched target resolution;

loading, by the player, the acquired new content unit to replace the content unit at the original resolution.

10. A media file loading apparatus, comprising:

a detection unit, configured to detect a playing point reached by a player in the process of playing a media file;

an obtaining unit, configured to acquire, from content units included in the media file, a content unit whose playing time is after the playing point;

wherein the content units are obtained by dividing the media file according to content changes that occur over the playing time;

a loading unit, configured to load, by the player, the acquired content unit.

11. The apparatus of claim 10, wherein the obtaining unit is further configured to determine a first timestamp corresponding to a play start time of the content unit and a second timestamp corresponding to a play end time of the content unit;

search for a first key frame whose decoding time is before and closest to the first timestamp, and a second key frame whose decoding time is after and closest to the second timestamp; and

extract video frames between the first key frame and the second key frame from the media file.

12. The apparatus of claim 11, wherein the obtaining unit is further configured to search for a first audio frame whose decoding time is before and closest to the decoding time of the first key frame, and a second audio frame whose decoding time is after and closest to the decoding time of the second key frame; and

extract audio frames between the first audio frame and the second audio frame from the media file.

13. The apparatus of claim 10, wherein the loading unit is further configured to, when the player runs embedded in a web page,

load the content unit to a media source extension interface of the web page, wherein the media source extension interface is used by the player to call a media element of the web page to play the content unit.

14. The apparatus of claim 13, wherein the loading unit is further configured to, when the media file is in a non-streaming media file format,

extract video frames corresponding to the content unit from the media file;

package the video frames corresponding to the content unit into segmented media files, wherein each segmented media file can be independently decoded; and

encapsulate the plurality of segmented media files into a multimedia data object.

15. The apparatus of claim 10, wherein the loading unit is further configured to display summary information of the content unit on a playing interface of the player;

respond to an operation of selecting a content unit based on the summary information; and

load, by the player, the selected content unit.

16. The apparatus of claim 10, wherein the loading unit is further configured to, when the selected content unit is not determined through human-computer interaction,

sequentially load, by the player, at least one acquired content unit according to timestamps corresponding to start times of the at least one content unit.

17. The apparatus of claim 10, wherein the loading unit is further configured to obtain a descending order of request counts of the content units included in the media file; and

load at least one content unit in descending order of request count.

18. The apparatus of any one of claims 10 to 17, wherein, after the loading unit loads the acquired content unit by the player,

the obtaining unit is further configured to determine, in response to a request to change display resolution, a third timestamp corresponding to a time at which the request is received and a fourth timestamp corresponding to an end time of a currently played content unit, and acquire a new content unit whose playing time is between the third timestamp and the fourth timestamp and which is at the switched target resolution; and

the loading unit is further configured to load, by the player, the acquired new content unit to replace the content unit at the original resolution.

19. A media file loading apparatus, comprising:

a memory, configured to store executable instructions; and

a processor, configured to implement the media file loading method of any one of claims 1 to 9 when executing the executable instructions stored in the memory.

20. A storage medium storing executable instructions which, when executed, implement the media file loading method of any one of claims 1 to 9.

Technical Field

The present disclosure relates to media playing technologies, and in particular, to a method and an apparatus for loading a media file, and a storage medium.

Background

When multimedia information is played through a web page, buffering or loading of the multimedia information is performed by the web browser. Specifically, the browser loads the segmented multimedia data from the current playing point all the way to the end point, and it cannot control the amount of multimedia data buffered or loaded in the process. Thus, when the user watches the loaded multimedia data selectively, unnecessary traffic consumption results.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide a media file loading method, apparatus and storage medium, which can reduce unnecessary consumption of traffic when playing multimedia information.

In one aspect, an embodiment of the present disclosure provides a media file loading method, including:

detecting a playing point reached by a player in the process of playing a media file;

acquiring, from content units included in the media file, a content unit whose playing time is after the playing point, wherein the content units are obtained by dividing the media file according to content changes that occur over the playing time;

loading, by the player, the acquired content unit.

In another aspect, an embodiment of the present disclosure further provides a media file loading apparatus, including:

a detection unit, configured to detect a playing point reached by a player in the process of playing a media file;

an obtaining unit, configured to acquire, from content units included in the media file, a content unit whose playing time is after the playing point, wherein the content units are obtained by dividing the media file according to content changes that occur over the playing time;

a loading unit, configured to load, by the player, the acquired content unit.

In another aspect, an embodiment of the present disclosure further provides a media file loading apparatus, including:

a memory, configured to store executable instructions; and

a processor, configured to implement the media file loading method provided by the embodiments of the present disclosure when executing the executable instructions stored in the memory.

In another aspect, an embodiment of the present disclosure provides a storage medium storing executable instructions which, when executed, implement the media file loading method provided by the embodiments of the present disclosure.

Executable instructions may be understood broadly to cover installation packages, programs, code, plug-ins, and libraries (dynamic or static).

In the embodiments of the present disclosure, a playing point reached by a player in the process of playing a media file is detected; a content unit whose playing time is after the playing point is acquired from the content units included in the media file, the content units being obtained by dividing the media file according to content changes that occur over the playing time; and the acquired content unit is loaded by the player. Because the media file is divided in advance into a plurality of content units, and the player loads only some of the content units after the playing point (or a single content unit after the playing point), unnecessary traffic consumption is avoided when the user watches the content after the playing point selectively.

Drawings

FIG. 1 is a schematic diagram of an optional structure of a container provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an optional packaging structure of an MP4 file provided by an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a media data container storing media data in a media file provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of an optional packaging structure of an FMP4 file provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of an optional structure of a media file loading device according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of an optional processing flow of a media file loading method provided by an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of an optional processing flow for acquiring a content unit whose playing time is after the playing point according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of an optional processing flow for loading content units into an MSE interface of a web page when a media file is in the MPEG-4 file format according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of an optional flow of packaging segmented media files provided by an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of an optional flow for parsing media information from a metadata container according to an embodiment of the present disclosure;

FIG. 11 is a schematic diagram of an optional processing flow for loading an acquired content unit by a player according to an embodiment of the present disclosure;

FIG. 12A is a diagram illustrating summary information of a content unit displayed on a play progress bar of a player according to an embodiment of the present disclosure;

FIG. 12B is a diagram illustrating summary information of a content unit displayed in an idle area of a playing interface of a player according to an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of yet another optional processing flow for loading an acquired content unit by a player according to an embodiment of the present disclosure;

FIG. 14 is a schematic flowchart of a player sending a segmented media file to a media element of a web page for decoding and playing through a media source extension interface of the web page according to an embodiment of the present disclosure;

FIG. 15 is a schematic diagram of an optional way in which a player plays a segmented media file through a media source extension interface of a web page according to an embodiment of the present disclosure;

FIG. 16 is a schematic diagram of an MP4 file being converted into an FMP4 file and played through a media source extension interface according to an embodiment of the present disclosure;

FIG. 17 is a schematic processing flow diagram of another optional media file loading method applied to a player according to an embodiment of the present disclosure;

FIG. 18 is a schematic structural diagram of a media file loading apparatus according to an embodiment of the present disclosure.

Detailed Description

To make the purpose, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present disclosure, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure.

Before the present disclosure is explained in further detail, the terms and expressions used in the embodiments of the present disclosure are explained; the following explanations apply throughout.

1) Media file: a file storing encoded media data (e.g., at least one of audio data and video data) in containers (Boxes), which also includes metadata expressing media information, to ensure that the media data is correctly decoded.

For example, a file in which multimedia data is packaged in the MP4 container format is referred to as an MP4 file. Typically, an MP4 file stores video data encoded according to the Advanced Video Coding (AVC, H.264) or MPEG-4 (Part 2) specification and audio data encoded according to the Advanced Audio Coding (AAC) specification, although other video and audio encoding schemes are not excluded.

2) Container (Box): an object-oriented component defined by a unique type identifier and a length. FIG. 1 is a schematic diagram of an optional structure of a container provided by an embodiment of the present disclosure; a container includes a container header (Box Header) and container data (Box Data), which are filled with binary data expressing various kinds of information.

The container header includes a size and a type. The size indicates the amount of storage space the container occupies, and the type indicates the type of the container. Referring to FIG. 2, a schematic diagram of an optional packaging structure of an MP4 file provided by an embodiment of the present disclosure, the basic container types involved in an MP4 file include a file type container (ftyp box), a metadata container (moov box), and a media data container (mdat box).

The container data portion may store specific data, in which case the container is referred to as a "data container", or may further encapsulate other containers, in which case the container is referred to as a "container of containers".

3) Track: a time-ordered sequence of related samples (Sample) in a media data container. A track represents a sequence of video frames or a sequence of audio frames; a set of consecutive samples of the same track is called a chunk (Chunk).

4) File type container: the container in a media file that stores the size (i.e., the length in bytes) and type of the file. As shown in FIG. 2, the binary data stored in the file type container describes the type and size of the container according to the specified byte length.

5) Metadata container: the container in a media file that stores metadata (i.e., data describing the multimedia data stored in the media data container). The information expressed by the binary data stored in the metadata container of an MP4 file is referred to as media information.

As shown in FIG. 2, the header of the metadata container uses binary data to indicate that the type of the container is "moov box". The container data part encapsulates an mvhd container that stores general information about the MP4 file, which is independent of the media content and relevant to playback, including duration, creation time, modification time, and the like.

The metadata container of the media file may include sub-containers corresponding to multiple tracks, such as an audio track container (audio track box) and a video track container (video track box), in which references to and descriptions of the media data of the corresponding tracks are included. The necessary sub-containers include: a container (denoted tkhd box) describing the characteristics and overall information of the track (e.g., duration, width, and height), and a container (denoted mdia box) recording the media information of the track (e.g., the media type and sample information).

As for the sub-containers packaged in the mdia box, they may include: a container recording the relevant attributes and content of the track (denoted mdhd box), a container recording the playing procedure information of the media (denoted hdlr box), and a container describing the media information of the media data in the track (denoted minf box). The minf box in turn contains a sub-container (denoted dinf box) explaining how to locate the media information, and a sub-container (denoted stbl box) recording all the time information (decoding time/display time), position information, and codec information of the samples in the track.

Referring to FIG. 3, a schematic structural diagram of a media data container storing media data in a media file according to an embodiment of the present disclosure, the time, type, size, and location of a sample in the media data container can be interpreted using the media information identified from the binary data in the stbl box. Each sub-container of the stbl box is described below.

The stsd box contains a sample description table. Depending on the coding scheme and the number of files storing the data, each media file may have one or more description tables. The description information of each sample can be found through these tables, and this information ensures correct decoding of the samples. Different media types store different description information; for example, for video media, the description information describes the structure of the image.

The stts box stores the duration information of the samples and provides a table mapping decoding time to sample sequence numbers; the sample at any time in the media file can be located through the stts box. Each entry in the table gives the count of consecutive samples that share the same time delta, together with that delta; accumulating the deltas builds the complete time-to-sample mapping table. The calculation formula is as follows:

DT(n+1)=DT(n)+STTS(n) (1)

where STTS(n) is the duration of the n-th sample and DT(n) is the decoding time of the n-th sample. The samples are arranged in time order, so the offsets are always non-negative. DT generally starts at 0. Taking the decoding time DT(i) of the i-th sample as an example, the calculation formula is:

DT(i)=SUM(for j=0 to i-1 of delta(j)) (2)

The sum of all deltas is the duration of the media data in the track.
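As an illustration of formulas (1) and (2) only (not code from the patent), the following TypeScript sketch expands stts entries, laid out as count/delta pairs per the ISO BMFF stts definition, into per-sample decoding times:

```typescript
// Each stts entry covers `count` consecutive samples sharing the same delta.
interface SttsEntry {
  count: number; // number of consecutive samples
  delta: number; // duration of each of those samples, in timescale units
}

// Expand stts entries into a decoding-time table: DT(n+1) = DT(n) + STTS(n).
function buildDecodingTimes(entries: SttsEntry[]): number[] {
  const decodingTimes: number[] = [];
  let dt = 0; // DT generally starts at 0
  for (const { count, delta } of entries) {
    for (let i = 0; i < count; i++) {
      decodingTimes.push(dt);
      dt += delta; // accumulate deltas, as in formula (2)
    }
  }
  return decodingTimes;
}

// The sum of all deltas equals the track's media duration.
const times = buildDecodingTimes([{ count: 3, delta: 1024 }]);
console.log(times); // [0, 1024, 2048]
```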

The stss box records the sequence numbers of the key frames in the media file.

The stsc box records the mapping between samples and the chunks storing them: a table maps sample sequence numbers to chunk sequence numbers, and the chunk containing a given sample can be found by table lookup.

The stco box defines the position of each chunk in the track, expressed as the offset of its starting byte within the media data container, and the length (i.e., size) relative to that starting byte.

The stsz box records the size (in bytes) of each sample in the media file.

6) Media data container: the container storing the multimedia data in a media file, for example the mdat box in an MP4 file. As shown in FIG. 3, a sample is the unit stored in the media data container; samples are stored in chunks of the media file, and the lengths of chunks and of samples may differ from one another.

7) Segmented media file: a subfile obtained by dividing a media file, where each segmented media file can be independently decoded.

Taking an MP4 file as an example, the media data in the MP4 file is divided according to key frames, and each piece of divided media data is packaged together with the corresponding metadata to form an FMP4 file; the metadata in each FMP4 file ensures that its media data is decoded correctly.

For example, when converting an MP4 file as shown in FIG. 2 into multiple fragmented MP4 (FMP4) files, referring to FIG. 4, a schematic diagram of an optional packaging structure of an FMP4 file provided by an embodiment of the present disclosure, one MP4 file may be converted into multiple FMP4 files, each of which includes three basic containers: a moov container, a moof container, and an mdat container.

The moov container includes MP4 file level metadata describing all media data in the MP4 file from which the FMP4 file is derived, such as the duration, creation time, and modification time of the MP4 file.

The moof container stores segment-level metadata describing the media data packaged in the FMP4 file in which it is located, ensuring that the media data in the FMP4 file can be decoded.

One moof container and one mdat container constitute one segment of the fragmented MP4 file. A fragmented MP4 file may include one or more such segments, and the metadata encapsulated in each segment ensures that the media data encapsulated in that segment can be independently decoded.

The following describes the flow by which a player implementing an embodiment of the present disclosure acquires media data in a given period.

When playing a movie or a track, the player must be able to correctly parse the data stream, obtain the corresponding media data for a certain time and ensure that the piece of media data can be decoded independently.

1. Determine the time interval corresponding to the media data to be acquired. The time interval is a period of time used to continue playing from the current playing point; the time corresponding to the playing point is measured relative to the media time coordinate system (with the playing start time of the media file as the time origin).

2. Check the stts box to determine the sequence numbers of the samples whose decoding time falls in the given period.

For audio frames, the stts box is checked to determine the sequence numbers of the audio frames whose decoding time falls in the given period.

For video frames, because of the compression algorithm, if the first frame in the given period is not a key frame, it is necessary to trace back in time to the key frame preceding the start time of the given period, to ensure that the frames in the given period can be decoded.

3. Query the stsc box, according to the sample sequence number, to determine the sequence number of the chunk containing the sample.

4. Look up the offset of the chunk in the stco box.

5. Search the stsz box according to the sample sequence number to find the offset of the sample within the chunk and the size of the sample.
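The following is a minimal sketch of steps 2 to 5, under the assumption that the stbl tables have already been parsed into flat arrays; the `SampleTables` shape is illustrative, not the patent's data structure:

```typescript
// Hypothetical pre-parsed stbl tables; field names are illustrative only.
interface SampleTables {
  decodingTimes: number[]; // from stts, one decoding time per sample
  sampleSizes: number[];   // from stsz, one size per sample
  chunkOffsets: number[];  // from stco, one byte offset per chunk
  sampleToChunk: number[]; // from stsc, flattened: chunk index per sample
}

// Locate the byte range of the sample whose decoding time covers `time`.
function locateSample(t: SampleTables, time: number): { offset: number; size: number } | null {
  // Step 2: find the sample sequence number for the decoding time.
  const sample = t.decodingTimes.findIndex((dt, i) =>
    dt <= time && (i + 1 >= t.decodingTimes.length || t.decodingTimes[i + 1] > time));
  if (sample < 0) return null;
  // Step 3: find the chunk containing the sample (stsc).
  const chunk = t.sampleToChunk[sample];
  // Step 4: start from the chunk's offset in the media data container (stco).
  let offset = t.chunkOffsets[chunk];
  // Step 5: add the sizes of earlier samples in the same chunk (stsz)
  // to get the sample's offset within the chunk.
  for (let s = sample - 1; s >= 0 && t.sampleToChunk[s] === chunk; s--) {
    offset += t.sampleSizes[s];
  }
  return { offset, size: t.sampleSizes[sample] };
}
```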

The process of finding key frames in implementing the embodiments of the present disclosure is described below.

1. Determine the sequence number of the sample at the given time.

2. Check the stss box to find the key frame at or after this sample.

3. Check the stsc box to find the chunk corresponding to the key frame.

4. Extract the offset of the chunk from the stco box.

5. Use the stsz box to find the offset of the key frame sample within the chunk and the size of the key frame.
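A companion sketch for step 2 of this flow, assuming the stss table has been parsed into a sorted array of key-frame sample numbers (both helper names are hypothetical); steps 3 to 5 then proceed as in `locateSample` above:

```typescript
// stss lists the sequence numbers of samples that are key frames (sync samples).
// Find the nearest key frame at or after the given sample number.
function findKeyFrameAfter(stss: number[], sampleNumber: number): number | undefined {
  return stss.find((keyFrame) => keyFrame >= sampleNumber);
}

// Find the nearest key frame at or before the given sample number (used when
// the first frame of a period is not a key frame and decoding must start earlier).
function findKeyFrameBefore(stss: number[], sampleNumber: number): number | undefined {
  let result: number | undefined;
  for (const keyFrame of stss) {
    if (keyFrame > sampleNumber) break; // stss is sorted, so we can stop here
    result = keyFrame;
  }
  return result;
}
```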

The following first describes a media file loading device implementing an embodiment of the present disclosure. The media file loading device detects a playing point reached by a player in the process of playing a media file; acquires, from content units included in the media file, a content unit whose playing time is after the playing point, the content units being obtained by dividing the media file according to content changes that occur over the playing time; and loads the acquired content unit by the player.

The following continues to describe the structure of a media file loading apparatus that implements an embodiment of the present disclosure.

Referring to FIG. 5, a schematic diagram of an optional structure of the media file loading apparatus 100 according to an embodiment of the present disclosure, the media file loading apparatus shown in FIG. 5 includes: at least one processor 150, at least one communication bus 160, a user interface 180, at least one network interface 170, and a memory 190. The various components in the media file loading device 100 are coupled together by the communication bus 160. It will be appreciated that the communication bus 160 is used to enable communication among these components. In addition to a data bus, the communication bus 160 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled in FIG. 5 as the communication bus 160.

The user interface 180 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen. The network interface 170 may include a standard wired interface, and a wireless interface such as a WiFi interface.

It is understood that the memory 190 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 190 may also be at least one storage system physically located remote from the processor 150.

The media file loading method applied to the media file loading apparatus provided by the embodiments of the present disclosure may be applied to, or implemented by, the processor 150. The processor 150 may be an integrated circuit chip having signal processing capabilities. In implementation, the operations of the media file loading method may be performed by integrated logic circuits in hardware or by software instructions in the processor 150. The processor 150 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 150 may implement or execute the media file loading methods, steps, and logical block diagrams provided in the embodiments of the present disclosure. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The media file loading method provided by the embodiments of the present disclosure may be implemented directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.

By way of example, the software module may be located in a storage medium, which may be the memory 190 shown in FIG. 5. The processor 150 reads the executable instructions in the memory 190 and, in combination with its hardware, performs an optional processing flow of the media file loading method, shown in FIG. 6, which includes the following steps:

Step S101, detecting the playing point reached by the player in the process of playing the media file.

In some embodiments, the playing point may be a time reached by a jump operation on the playback progress, for example jumping from 20% of the progress to 30%; the playing point may also be a time reached by continuous playing, for example playing continuously from the 30th minute to the 40th minute.

The player may be an H5 player embedded in a browser, or a dedicated video playing application (APP).
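For illustration only, a minimal sketch of how an H5 player embedded in a web page might observe both kinds of playing point using standard HTMLMediaElement events (`onPlayPoint` is a hypothetical callback, not part of the patent):

```typescript
// Detect the playing point on a web page's <video> element.
function watchPlayPoint(video: HTMLVideoElement, onPlayPoint: (seconds: number) => void): void {
  // Playing point reached by a jump (seek) operation on the progress bar.
  video.addEventListener("seeking", () => onPlayPoint(video.currentTime));
  // Playing point reached by continuous playing; fires periodically.
  video.addEventListener("timeupdate", () => onPlayPoint(video.currentTime));
}
```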

Step S102, acquiring, from the content units included in the media file, the content unit whose playing time is after the playing point.

Here, the media file may be stored on a server; the media file loading device therefore acquires, from the content units included in the media file stored on the server, a content unit whose playing time is after the playing point.

A content unit is obtained by pre-dividing the media file according to the content changes that occur over the playing time.

In some embodiments, the media file is divided into content units based on at least one of the following factors: the characters, scenes, and plot in the media file.

Taking the example in which the media file is a Spring Festival Gala, the gala can be divided into a plurality of content units according to its episodes (program contents). Taking the example in which the media file is a talk show, the show can be divided into a plurality of content units according to the characters in it. Taking the example in which the media file is a movie, the movie may be divided into a plurality of content units according to its scenes.

In some embodiments, for a media file in a format that does not support streaming playback, such as the MPEG-4 file format, an optional processing flow for acquiring a content unit whose playing time is after the playing point, shown in FIG. 7, includes the following steps:

Step S1021, determining a first time stamp corresponding to the play start time of the content unit and a second time stamp corresponding to the play end time of the content unit.

Step S1022, a first key frame whose decoding time is before the first timestamp and is closest to the first timestamp, and a second key frame whose decoding time is after the second timestamp and is closest to the second timestamp are searched.

In step S1023, video frames between the first key frame and the second key frame are extracted from the media file.

Here, since the decoding time of the first key frame is earlier than the first timestamp corresponding to the play start time of the content unit, and the decoding time of the second key frame is later than the second timestamp corresponding to the play end time of the content unit, the extracted video frames between the first key frame and the second key frame cover all video frames of the content unit, so no video frame of the content unit is lost.

Step S1024, searching for a first audio frame whose decoding time is before and closest to the decoding time of the first key frame, and a second audio frame whose decoding time is after and closest to the decoding time of the second key frame.

Each audio frame can be decoded independently, while among video frames only a key frame carries complete image information, i.e., the image corresponding to a key frame can be obtained by decoding the key frame alone. Therefore, the embodiment of the present disclosure first obtains the start key frame (the first key frame) and the end key frame (the second key frame) corresponding to the content unit, and then uses them to obtain the corresponding first audio frame and second audio frame. In this way, the audio data and video data of the media file stay consistent, avoiding the situation of having video data without the matching audio data.

In step S1025, an audio frame between the first audio frame and the second audio frame is extracted from the media file.

In the embodiment of the present disclosure, the video frames extracted in step S1023 and the audio frames extracted in step S1025 together constitute the content unit whose playing time is after the playing point.
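A minimal sketch of step S1022's key-frame search, assuming the key frames and their decoding times have already been derived from the stss and stts boxes; the `KeyFrame` shape is illustrative, not the patent's data structure:

```typescript
// Key frames with their decoding times, assumed sorted by decodingTime.
interface KeyFrame { sample: number; decodingTime: number; }

// First key frame: decoding time at or before t1 and closest to it.
// Second key frame: decoding time at or after t2 and closest to it.
function keyFrameRange(keyFrames: KeyFrame[], t1: number, t2: number): [KeyFrame, KeyFrame] | null {
  let first: KeyFrame | null = null;
  let second: KeyFrame | null = null;
  for (const kf of keyFrames) {
    if (kf.decodingTime <= t1) first = kf;             // keeps the closest one before t1
    if (kf.decodingTime >= t2) { second = kf; break; } // first one after t2 is the closest
  }
  return first && second ? [first, second] : null;
}
```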

In other embodiments, for a media file in a streaming media format, an optional processing flow for acquiring a content unit whose playing time is after the playing point is as follows: directly acquire the segmented files between the play start time and the play end time of the content unit in the media file, where each segmented file can be decoded and played independently.

In step S103, the acquired content unit is loaded by the player.

In some embodiments, for media files in a streaming media format such as HLS or FLV, the loaded content units are the independently decodable segments of the media file between the play start time and the play end time of the content unit. When the player runs embedded in a web page, the acquired content unit is loaded to the Media Source Extensions (MSE) interface of the web page, and the MSE interface is used to enable the player to call a media element of the web page to play the acquired content unit.

Here, the web page may be a web page of a browser, or a web page of an APP with an embedded browser kernel.

One way the MSE interface calls the media element of the web page to play the acquired content unit is as follows: the MSE interface creates a media source object (MediaSource) as the source behind a virtual Uniform Resource Locator (URL), creates a cache object (SourceBuffer) as the cache of the media source, and adds the frames (including video frames and audio frames) between two or more adjacent key frames of the content unit to the cache object; the media element of the web page is then called to play the virtual URL. Here, the media element is an audio or video tag.
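For illustration, a minimal sketch of this flow against the standard MSE API; the codec string and the way segments are fetched are assumptions, not details from the patent:

```typescript
// Play FMP4 segments through the Media Source Extensions interface.
async function playThroughMSE(video: HTMLVideoElement, segmentUrls: string[]): Promise<void> {
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource); // virtual URL for the media element

  await new Promise<void>((resolve) =>
    mediaSource.addEventListener("sourceopen", () => resolve(), { once: true }));

  // The MIME/codec string is an illustrative assumption.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');

  for (const url of segmentUrls) {
    const segment = await (await fetch(url)).arrayBuffer(); // one independently decodable segment
    sourceBuffer.appendBuffer(segment);
    // Wait until the buffer finishes ingesting before appending the next segment.
    await new Promise<void>((resolve) =>
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true }));
  }
  mediaSource.endOfStream();
}
```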

In other embodiments, when the media file is in the MPEG-4 file format, an optional processing flow for loading content units into the MSE interface of the web page, shown in FIG. 8, includes the following steps:

Step S1031, extracting the video frames corresponding to the content unit from the media file.

In some embodiments, the specific implementation of extracting the video frames corresponding to the content unit is the same as that of step S1023.

After the video frames corresponding to the content unit are extracted, the audio frames corresponding to the content unit also need to be extracted; the specific implementation is the same as that of step S1025.

Step S1032, packaging the video frames corresponding to the content unit into segmented media files.

In some embodiments, it is desirable to encapsulate video frames and audio frames corresponding to a content unit together into a segmented media file.

Here, each segmented media file can be independently decoded.

Step S1033, encapsulating the plurality of segmented media files into a multimedia data object.

In some embodiments, the player obtains from the server the media data corresponding to a given period (the period used to continue playing from the playing point) and, from the media data and the metadata describing it, performs encapsulation according to the packaging structure of the segmented media file, forming a segmented media file that can be independently decoded by the media elements of the web page.

Referring to FIG. 9, an optional flow of packaging segmented media files provided by an embodiment of the present disclosure, the following description proceeds in conjunction with the steps shown in FIG. 9.

Step S201, filling data representing the type and compatibility of the segmented media file into a file type container of the segmented media file.

For example, taking an FMP4 file packaged into the structure shown in FIG. 4, the type and length of the container (representing the total length of the ftyp box) are filled into the header of the file type container of the FMP4 file, i.e., the ftyp box, and binary data representing that the file type is FMP4, together with the compatible protocols, is generated by filling the data portion of the ftyp box.

Step S202, filling file-level metadata of the segmented media file into the metadata container of the segmented media file.

In some embodiments, based on the media data to be filled into the packaging structure of the segmented media file, the metadata describing the media data and required to fill the nested structure is calculated according to the nested structure of the metadata container in the segmented media file.

Still taking FIG. 4 as an example, the file-level metadata of the FMP4 file is calculated and filled into the metadata container (i.e., moov box) of the FMP4 file, in which three containers are nested: mvhd, track, and movie extends (mvex).

The metadata packaged in the mvhd container represents media information related to the playing of the segmented media file, including position, duration, creation time, modification time, and so on. The sub-containers nested in the track container represent references to and descriptions of the corresponding track in the media data; for example, a container describing the characteristics and overall information of the track (such as duration and width), denoted tkhd box, and a container recording the media information of the track (such as the media type and sample information), denoted mdia box, are nested in the track container.

Step S203, correspondingly filling the extracted media data and the metadata describing the media data into a media data container and a metadata container at a segment level in a segment container of the segmented media file.

In some embodiments, one or more segments (fragments) may be encapsulated in a segmented media file. The media data to be filled may fill one or more media data containers (i.e., mdat boxes) of the segmented media file, and a segment-level metadata container (denoted moof box) is encapsulated in each segment; the filled metadata describes the media data filled into the segment, so that each segment can be independently decoded.

In conjunction with FIG. 4, for example, the media data to be filled is filled into two segments of the packaging structure of the FMP4 file. For each segment, the metadata that needs to be filled into the segment-level metadata container (i.e., the moof box) of that segment is calculated and filled into the child containers nested in the moof box; the header of the moof box is filled with binary data indicating that the type of the container is "moof box" and the length of the moof box.

In some embodiments of filling data into the corresponding containers in steps S201 to S203, when the filling operation is performed, a write-operation function of a calling class completes the writing and merging of binary data in the memory buffer of the child container, and returns an instance of the class; the returned instance is used to merge the child container with child containers having a nested relationship with it.

As an example of filling data: a class MP4 implementing the packaging function is established, and each sub-container of the segmented media file is packaged as a static method of the class; a class Stream implementing binary data operations is established, where each Stream instance has a memory buffer for storing the binary data to be filled; multi-byte decimal data to be filled is converted into binary data by a static method provided by Stream; the binary data to be filled into the sub-containers is merged and filled in the memory buffer by a write-operation function provided by the Stream instance; and a static method provided by Stream returns a new Stream instance, enabling the merging of the current child container with other child containers having a nested relationship.
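A minimal sketch, not the patent's MP4/Stream class design, of the underlying write-and-merge idea: serialize a box as an 8-byte header followed by the merged child payloads; the `ftyp` brands in the usage line are illustrative assumptions:

```typescript
// Write an ISO BMFF box: 4-byte big-endian size, 4-byte ASCII type, then payload.
function box(type: string, ...payloads: Uint8Array[]): Uint8Array {
  const bodyLength = payloads.reduce((sum, p) => sum + p.length, 0);
  const out = new Uint8Array(8 + bodyLength);
  const view = new DataView(out.buffer);
  view.setUint32(0, out.length); // total box size, including the 8-byte header
  for (let i = 0; i < 4; i++) out[4 + i] = type.charCodeAt(i); // box type
  let offset = 8;
  for (const p of payloads) { // merge child boxes / payload into this box
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Usage: an ftyp box whose payload declares a major brand, minor version,
// and compatible brands (the brand values here are assumptions).
const ascii = (s: string) => new TextEncoder().encode(s);
const ftyp = box("ftyp", ascii("iso5"), new Uint8Array(4) /* minor version */, ascii("iso5avc1"));
```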

Before the segmented media file is encapsulated, the metadata of the media data to be filled needs to be calculated in combination with the metadata in the source media file, yielding the file-level metadata of the segmented media file (e.g., for an FMP4 file, the metadata filled into the moov box) and the segment-level metadata (e.g., for an FMP4 file, the metadata filled into the moof box).

In the following, an exemplary implementation is described in which metadata encapsulated in a metadata container of a media file is parsed to obtain media information describing media data encapsulated in a media data container of the media file.

In some embodiments of the present disclosure, where the media file is an MP4 file, the nested structure of the sub-containers in the metadata container of the media file is parsed, the binary data in each sub-container is read according to the nested structure, and the media information represented by each sub-container is parsed from the binary data read.

With reference to the structure shown in FIG. 2, the moov container of an MP4 file is a nested structure. The nested structure of the sub-containers in the metadata container is parsed to determine the sub-containers nested in the moov container, such as the mvhd container, the audio track container, and the video track container. If a sub-container itself nests containers, parsing continues until leaf sub-containers are reached; the binary data encapsulated in the corresponding sub-containers is then read, and the media information it represents is parsed out, such as the sequence numbers of the key frames recorded by the stss box, and the size of each sample recorded by the stsz box.

In some embodiments of the present disclosure, a manner is provided in which parsers are set up according to container type, and the child containers in the metadata container are parsed according to their container types to obtain the media information; this is described below in conjunction with FIG. 10.

Referring to FIG. 10, a schematic diagram of an optional flow for parsing media information from a metadata container according to an embodiment of the present disclosure, the following description proceeds in conjunction with the steps shown in FIG. 10.

Step S301, locating the position of the metadata container in the media file.

In some embodiments, binary data conforming to the container header specification is read from the binary data of the media file, and the offset and size of the metadata container in the media file are located according to the container type and length identified from the data read.

For example, in the binary data of a media file, the binary data starting at byte zero corresponds to the file type container. Starting at the beginning of the media file's binary data, binary data conforming to the canonical length of the container header is read and parsed, from which the type and length of the container following the file type container in the media file can be determined.

If the parsed type is a metadata container, the length (i.e., capacity) of the metadata container can be parsed, and the offset of the metadata container is the length of the file type container.

If the parsed type is a media data container, binary data conforming to the canonical length of the container header is read again, using the sum of the length of the media data container and the length of the file type container as the offset; the length (i.e., capacity) of the metadata container can thus be parsed, and the offset of the metadata container is the sum of the length of the file type container and the length of the media data container.

Apart from requiring that the initial container be the file type container, the specification does not constrain the packaging order of subsequent containers. In this way, the position of the metadata container can be located accurately and efficiently regardless of whether the packaging order in the media file is file type container, metadata container, media data container, or file type container, media data container, metadata container.
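A minimal sketch of this scan under the stated header rules (4-byte big-endian size, 4-byte type), ignoring 64-bit extended sizes; not code from the patent:

```typescript
// Scan top-level boxes to locate the metadata container (moov).
// Returns its offset and size, or null if not found.
function locateMoov(file: Uint8Array): { offset: number; size: number } | null {
  const view = new DataView(file.buffer, file.byteOffset, file.byteLength);
  let offset = 0;
  while (offset + 8 <= file.length) {
    const size = view.getUint32(offset); // box size, big-endian
    const type = String.fromCharCode(...file.subarray(offset + 4, offset + 8));
    if (type === "moov") return { offset, size };
    if (size < 8) break; // malformed header; stop scanning
    offset += size; // skip to the next top-level box (ftyp, mdat, ...)
  }
  return null;
}
```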

Step S302, according to the position of the metadata container in the media file, binary data corresponding to the metadata container is obtained from the binary data of the media file.

The position of the metadata container in the media file is represented by an offset and a size. Binary data is read from the media file starting at the offset until the length of the data read equals the size of the metadata container, thereby obtaining the binary data corresponding to the metadata container.

Step S303, sequentially parsing binary data corresponding to the canonical length of the container header in the binary data of the metadata container, to obtain the container type of a child container in the metadata container and the length of the container data of the child container.

In some embodiments, for the case where multiple sub-containers are nested in the metadata container, the offset of each read of binary data is the sum of the lengths of the sub-containers already identified, and the length of the data read conforms to the canonical length of the container header, so that the type and length of the currently processed sub-container can be parsed.

For example, when reading for the first time, the binary data is read from zero bytes of the binary data of the metadata container, and the length of the read binary data conforms to the specified length of the container header, so that the type and length of the first sub-container can be parsed; and in the second reading, the binary data is read by taking the length of the first read sub-container as an offset, and the length of the read binary data conforms to the specified length of the container header, so that the type and the length of the second sub-container can be analyzed.

Reading the binary data in this way, neither backtracking caused by over-reading nor re-reading caused by under-reading occurs, ensuring both parsing efficiency and accuracy.

Step S304, calling a parser of the type corresponding to the container type of the child container, and sequentially parsing the binary data corresponding to the length of the container data in the unparsed data, to obtain the media information represented by the container data.

In some embodiments, the typical container types nested in a metadata container are pre-marked to indicate whether a container directly encapsulates binary data or further encapsulates containers; for example, the mvhd box, audio track box, and video track box shown in FIG. 2 are marked as further encapsulating containers, while the stts box and stsd box shown in FIG. 2 are marked as directly encapsulating binary data.

For the container types marked as directly encapsulating binary data, parsers are set up in one-to-one correspondence with the container types; the parsers parse the represented media information from the binary data. In step S304, when the container type of the child container parsed in step S303 is compared with the pre-marked container types, the following two cases arise.

Case 1) When the comparison determines that the container type of the child container is pre-marked and is marked as directly encapsulating binary data, a parser corresponding to the container type of the child container is called, and the container data in the child container is parsed by the parser to obtain the media information represented by the container data.

Case 2) When the comparison determines that the container type of the child container is pre-marked and is marked as further encapsulating containers, the binary data corresponding to the child container is parsed recursively according to the canonical length of the container header, until a container encapsulated in the child container is found whose type is pre-marked as directly encapsulating binary data. Then the parser corresponding to that container type is called to parse the binary data byte by byte, with the length of the parsed binary data corresponding to the length of that container's data, to obtain the media information it represents.
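A minimal sketch of the two cases as a recursive descent with a parser registry; the marked container types and parser bodies are illustrative assumptions, not the patent's code:

```typescript
// Hypothetical parser registry: container types pre-marked as directly
// encapsulating binary data each get a parser.
type LeafParser = (data: Uint8Array) => unknown;
const parsers: Record<string, LeafParser> = {
  // stts body: version/flags (4 bytes), then entry_count (4 bytes), then entries.
  stts: (d) => ({ entryCount: new DataView(d.buffer, d.byteOffset, d.byteLength).getUint32(4) }),
  // stsz body: version/flags (4), sample_size (4), sample_count (4), then sizes.
  stsz: (d) => ({ sampleCount: new DataView(d.buffer, d.byteOffset, d.byteLength).getUint32(8) }),
};
// Container types pre-marked as further encapsulating containers.
const containerTypes = new Set(["moov", "trak", "mdia", "minf", "stbl"]);

// Recursively parse child boxes, dispatching leaves to their parsers (case 1)
// and recursing into container-of-container types (case 2).
function parseBoxes(data: Uint8Array, out: Record<string, unknown>): void {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  let offset = 0;
  while (offset + 8 <= data.length) {
    const size = view.getUint32(offset);
    const type = String.fromCharCode(...data.subarray(offset + 4, offset + 8));
    const body = data.subarray(offset + 8, offset + size);
    if (containerTypes.has(type)) {
      parseBoxes(body, out);           // case 2: recurse into nested containers
    } else if (parsers[type]) {
      out[type] = parsers[type](body); // case 1: call the matching parser
    }                                  // otherwise: skip unknown types (see below)
    if (size < 8) break;
    offset += size;                    // offsets accumulate, so no backtracking
  }
}
```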

In some embodiments, a manner of recording media information while parsing the metadata container is as follows: when binary data corresponding to the canonical length of the container header in the binary data of the metadata container is sequentially parsed to obtain the container type of a child container, an object is established according to the nesting relationship between the child container and its parent container and the nesting relationship between the child container and the containers it encapsulates; and when the container type of the child container is pre-marked as directly encapsulating binary data, an array including the media information represented by the container data of the child container is stored in the object established for that child container.

For example, in FIG. 2, when the type of the parsed sub-container is stts box, since the stts box is pre-marked as directly encapsulating binary data, an array including media information is stored in the object created for the stts box, where the media information is the duration information represented by the container data of the stts box.

In some embodiments, a manner of recording the nesting relationships between child containers while parsing the metadata container is as follows: when binary data corresponding to the canonical length of the container header in the binary data of the metadata container is sequentially parsed to obtain the container type of a child container, if the container type is pre-marked as directly encapsulating binary data, the parsed child container is recorded in the invoked parser; and the recorded instance of the child container is set into a child-container attribute of the container to which the child container belongs, the attribute describing the nesting relationship between the child container and its parent container.

For example, in FIG. 2, when the type of the parsed sub-container is stsd box, since the stsd box is pre-marked as directly encapsulating binary data, the stsd box is recorded in its corresponding parser, and an instance of the stsd box is set into the child-container attribute of the stbl box; proceeding in this way, the multiple sub-containers nested in the stbl box, such as the stsd box, stts box, and stsc box, are finally all recorded in the child-container attribute of the stbl box.

In some embodiments, when the container type of a sub-container is not pre-marked, or is pre-marked as directly encapsulating binary data but no parser of the corresponding type is invoked, the binary data corresponding to the sub-container is skipped, and parsing jumps, according to the length of the sub-container, to the part of the binary data corresponding to the next sub-container.

In practice, user-defined container types may appear in a media file; skipping them does not affect the overall progress of parsing the metadata container. Meanwhile, when the container types in the metadata container change, the latest metadata containers can be parsed compatibly and quickly by adding, deleting, or modifying parsers of the corresponding types, giving flexible and fast upgrading.

In the embodiment of the present disclosure, an optional processing flow for loading the acquired content unit by the player, shown in FIG. 11, includes:

Step S401, displaying summary information of the content unit on the playing interface of the player.

Here, the summary information represents the main information of the content unit, such as the scene, characters, and plot corresponding to the content unit. Taking a Spring Festival Gala as the media file, for example, the summary information of a content unit may include: the type of program (sketch, song, dance, etc.), the people involved (performers, composers, dancers, etc.), the duration of the performance, and so on.

In some embodiments, as shown in FIG. 12A, the summary information of a content unit may be displayed on the play progress bar of the player.

In other embodiments, as shown in FIG. 12B, the summary information of a content unit may also be displayed in an idle area of the playing interface of the player.

When displaying summary information, the summary information of a preset number of content units may be displayed, the summary information of all content units may be displayed, or the summary information of the content units falling within a preset time length may be displayed.

Step S402, responding to the operation of selecting the content unit based on the summary information.

In some embodiments, the user triggers selection of a content unit to be loaded by mouse clicking or the like based on the summary information of the displayed content unit.

In step S403, the selected content unit is loaded by the player.

In the embodiment of the present disclosure, when the selected content unit is not determined through human-computer interaction, another optional processing flow for loading the acquired content unit through the player is as follows:

sequentially loading, by the player, the acquired at least one content unit according to the timestamps corresponding to the start times of the at least one content unit.

Here, sequential loading refers to loading in the chronological order of the start times of the content units.

In a specific implementation, to save cache space, not all content units after the playing point are loaded at the same time; only some of them are loaded. Of course, when there is only one content unit after the playing point, that content unit is loaded.

When multiple content units are loaded, the loaded content units may be displayed distinctly, for example by showing different loaded content units in different colors on the progress bar, or with different contrast on the progress bar.

In the embodiment of the present disclosure, a further optional processing flow for loading the acquired content unit by the player, shown in FIG. 13, includes:

In step S801, the descending order of the request counts of the content units included in the media file is obtained.

In some embodiments, the request counts of the content units included in the media file are first obtained, and the obtained request counts are then arranged in descending order.
