Dynamic self-adaptive streaming transmission method and device based on HTTP

Document No.: 261471    Publication date: 2021-11-16

Note: This technology, "Dynamic self-adaptive streaming transmission method and device based on HTTP," was designed and created by Iraj Sodagar on 2020-09-28. Abstract: Aspects of the present disclosure provide methods, apparatuses, and non-transitory computer-readable storage media for receiving media data. An apparatus includes processing circuitry that receives a timed metadata track comprising a plurality of segments of a plurality of metadata samples, each of the plurality of segments comprising only one of the plurality of metadata samples, each of the plurality of metadata samples comprising one or more event message boxes. A fragmentation and defragmentation process is performed on the timed metadata track. The processing circuitry determines a start time and an activity duration for each event message box. The processing circuitry processes event information included in the event message box based on the start time and the activity duration of the event message box.

1. A method of receiving media data, the method comprising:

receiving a timed metadata track comprising a plurality of segments of a plurality of metadata samples, each of the plurality of segments comprising only one of the plurality of metadata samples, each of the plurality of metadata samples comprising one or more event message boxes;

determining a start time and an activity duration for each event message box; and

processing event information included in the event message box based on the start time and the activity duration of the event message box, wherein

a fragmentation and defragmentation process is performed on the timed metadata track.

2. The method of claim 1, wherein a timescale of each event message box is equal to a timescale of the timed metadata track.

3. The method of claim 1, wherein the presentation time and duration of each of the plurality of metadata samples is equal to an earliest presentation time and duration of one of the plurality of segments comprising the respective metadata sample.

4. The method of claim 3, wherein the presentation time of each of the plurality of metadata samples is an anchor point for one of a presentation time value and a presentation time delta value for an event message box included in the respective metadata sample.

5. The method of claim 1, wherein a sum of a start time and an activity duration of each event message box is limited by an end presentation time of the timed metadata track.

6. The method of claim 1, wherein each of the plurality of segments is one of a Common Media Application Format (CMAF) fragment and a dynamic adaptive streaming over hypertext transfer protocol (DASH) fragment.

7. The method of claim 1, wherein each event message box included in one of the plurality of metadata samples includes a different schema identifier.

8. The method of claim 1, wherein the fragmentation and defragmentation process is based on the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) base media file format (ISOBMFF) fragmentation and defragmentation process.

9. An apparatus for receiving media data, the apparatus comprising processing circuitry configured to:

receive a timed metadata track comprising a plurality of segments of a plurality of metadata samples, each of the plurality of segments comprising only one of the plurality of metadata samples, each of the plurality of metadata samples comprising one or more event message boxes;

determine a start time and an activity duration for each event message box; and

process event information included in the event message box based on the start time and the activity duration of the event message box, wherein

a fragmentation and defragmentation process is performed on the timed metadata track.

10. The apparatus of claim 9, wherein a timescale of each event message box is equal to a timescale of the timed metadata track.

11. The apparatus of claim 9, wherein a presentation time and duration of each of the plurality of metadata samples is equal to an earliest presentation time and duration of one of the plurality of segments comprising the respective metadata sample.

12. The apparatus of claim 11, wherein the presentation time of each of the plurality of metadata samples is an anchor point for one of a presentation time value and a presentation time delta value for an event message box included in the respective metadata sample.

13. The apparatus of claim 9, wherein a sum of a start time and an activity duration of each event message box is limited by an end presentation time of the timed metadata track.

14. The apparatus of claim 9, wherein each of the plurality of segments is one of a Common Media Application Format (CMAF) fragment and a dynamic adaptive streaming over hypertext transfer protocol (DASH) fragment.

15. The apparatus of claim 9, wherein each event message box included in one of the plurality of metadata samples includes a different schema identifier.

16. The apparatus of claim 9, wherein the fragmentation and defragmentation process is based on the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) base media file format (ISOBMFF) fragmentation and defragmentation process.

17. A non-transitory computer-readable storage medium storing instructions that, when executed by a computer for receiving media data, cause the computer to perform:

receiving a timed metadata track comprising a plurality of segments of a plurality of metadata samples, each of the plurality of segments comprising only one of the plurality of metadata samples, each of the plurality of metadata samples comprising one or more event message boxes;

determining a start time and an activity duration for each event message box; and

processing event information included in the event message box based on the start time and the activity duration of the event message box, wherein

a fragmentation and defragmentation process is performed on the timed metadata track.

18. The non-transitory computer-readable storage medium of claim 17, wherein a timescale of each event message box is equal to a timescale of the timed metadata track.

19. The non-transitory computer-readable storage medium of claim 17, wherein a presentation time and duration of each of the plurality of metadata samples is equal to an earliest presentation time and duration of one of the plurality of segments that includes the respective metadata sample.

20. The non-transitory computer-readable storage medium of claim 19, wherein the presentation time of each of the plurality of metadata samples is an anchor point for one of a presentation time value and a presentation time delta value for an event message box included in the respective metadata sample.

Technical Field

Embodiments are described that generally relate to methods and apparatus for dynamic adaptive streaming over hypertext transfer protocol.

Background

The background description provided herein is intended to present the context of the disclosure generally. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

Moving Picture Experts Group (MPEG) dynamic adaptive streaming over hypertext transfer protocol (DASH) provides a standard for streaming multimedia content over IP networks. The DASH standard allows for event message boxes to be carried in media segments.

Summary

Aspects of the present disclosure provide an apparatus for receiving media data. The apparatus includes processing circuitry that receives a timed metadata track comprising a plurality of segments of a plurality of metadata samples, each of the plurality of segments comprising only one of the plurality of metadata samples, each of the plurality of metadata samples comprising one or more event message boxes. The processing circuitry determines a start time and an activity duration for each event message box. The processing circuitry processes event information included in the event message box based on the start time and the activity duration of the event message box. A fragmentation and defragmentation process is performed on the timed metadata track.

In an embodiment, the timescale of each event message box is equal to the timescale of the timed metadata track.

In an embodiment, the presentation time and duration of each of the plurality of metadata samples is equal to the earliest presentation time and duration of one of the plurality of segments comprising the respective metadata sample.

In an embodiment, the presentation time of each of the plurality of metadata samples is an anchor point of one of a presentation time value and a presentation time increment value of an event message box included in the respective metadata sample.

In an embodiment, the sum of the start time and the activity duration of each event message box is limited by the end presentation time of the timed metadata track.

In an embodiment, each of the plurality of segments is one of a Common Media Application Format (CMAF) fragment and a dynamic adaptive streaming over hypertext transfer protocol (DASH) fragment.

In an embodiment, each event message box included in one of the plurality of metadata samples includes a different schema identifier.

In an embodiment, the fragmentation and defragmentation process is based on an International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) base media file format (ISOBMFF) fragmentation and defragmentation process.

Aspects of the present disclosure provide a method of receiving media data. In one method, a timed metadata track is received comprising a plurality of segments of a plurality of metadata samples, each of the plurality of segments comprising only one of the plurality of metadata samples, each of the plurality of metadata samples comprising one or more event message boxes. A start time and an activity duration are determined for each event message box. Event information included in the event message box is processed based on the start time and the activity duration of the event message box. A fragmentation and defragmentation process is performed on the timed metadata track.

Aspects of the present disclosure also provide a non-transitory computer-readable storage medium storing instructions that, when executed by a computer for receiving media data, cause the computer to perform any one or combination of the methods of receiving media data.

Drawings

Other features, properties, and various advantages of the disclosed subject matter will be further apparent from the following detailed description and the accompanying drawings, in which:

Fig. 1 illustrates an exemplary dynamic adaptive streaming over hypertext transfer protocol (DASH) system in accordance with an embodiment of the present disclosure.

Fig. 2 illustrates an exemplary DASH client architecture in accordance with an embodiment of the present disclosure.

FIG. 3 illustrates an exemplary in-band event timing model in accordance with an embodiment of the disclosure.

Fig. 4 illustrates an example of a media track and a timing metadata track that may be included in a content stream according to an embodiment of the present disclosure.

Fig. 5 illustrates a flow diagram summarizing an example of a process, according to some embodiments.

Fig. 6 is a schematic diagram of a computer system, according to an embodiment.

Detailed Description

I. Dynamic adaptive streaming over hypertext transfer protocol (DASH) and Media Presentation Description (MPD)

Dynamic adaptive streaming over hypertext transfer protocol (DASH) is an adaptive bitrate streaming technique that enables streaming of media content using hypertext transfer protocol (HTTP) infrastructure, such as web servers, content delivery networks (CDNs), and various proxies and caches. DASH supports both on-demand and live streaming from a DASH server to a DASH client, and allows the DASH client to control the streaming session, so that the DASH server does not need to handle an additional load of stream adaptation management in large-scale deployments. DASH also allows the DASH client to choose to stream from various DASH servers, enabling further load balancing of the network to the benefit of the DASH client. DASH provides dynamic switching between different media tracks, e.g., by varying the bit rate to adapt to network conditions.

In DASH, a Media Presentation Description (MPD) file provides information for DASH clients to adaptively stream media content by downloading media segments from a DASH server. The MPD file may be fragmented and delivered in portions to reduce session start-up latency. The MPD file may also be updated during the streaming session. In some examples, the MPD file supports representations of content accessibility features, ratings, and camera views. DASH also supports delivery of multi-view and scalable coded content.

The MPD file may contain a sequence of one or more periods. Each period may be defined by a Period element in the MPD file. The MPD file may include the availabilityStartTime attribute of the MPD and a start attribute for each period. For media presentations of the dynamic type (e.g., for live services), the sum of the MPD attribute availabilityStartTime and the period's start attribute, plus the duration of the media segment, may indicate the availability time of the period in Coordinated Universal Time (UTC) format, in particular for the first media segment of each representation in the corresponding period. For media presentations of the static type (e.g., for on-demand services), the start attribute of the first period may be 0. For any other period, the start attribute may specify a time offset between the start time of the corresponding period and the start time of the first period. Each period may extend until the start of the next period or, in the case of the last period, until the end of the media presentation. Period start times may be precise and reflect the actual timing resulting from playing the media of all prior periods.
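As a sketch of the period timing arithmetic above, the following Python snippet (helper and parameter names are hypothetical, not MPD attribute spellings) computes when the first media segment of a period becomes available in a live presentation:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper illustrating the availability rule described above:
# availability = MPD@availabilityStartTime + Period@start + segment duration.
def segment_available_at(availability_start_time, period_start_s, segment_duration_s):
    return availability_start_time + timedelta(seconds=period_start_s + segment_duration_s)

ast = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
available = segment_available_at(ast, period_start_s=60.0, segment_duration_s=4.0)
print(available)  # 2021-01-01 12:01:04+00:00
```

For a static (on-demand) presentation, period_start_s of the first period would simply be 0 under the rule above.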

Each period may contain one or more adaptation sets, and each adaptation set may contain one or more representations of the same media content. A representation may be one of a number of alternative encoded versions of audio or video data. The representations may differ by encoding type, for example by bit rate, resolution, and/or codec for video data, and by bit rate and/or codec for audio data. The term representation may be used to refer to a section of encoded audio or video data that corresponds to a particular period of the multimedia content and is encoded in a particular way.

An adaptation set for a particular period may be assigned to a group indicated by a group attribute in the MPD file. Adaptation sets in the same group are generally considered alternatives to each other. For example, each adaptation set of video data for a particular period may be assigned to the same group, such that any one of the adaptation sets may be selected for decoding to display video data of the multimedia content for the corresponding period. In some examples, the media content within one period may be represented either by one adaptation set from group 0, if present, or by a combination of at most one adaptation set from each non-zero group. Timing data for each representation of a period may be expressed relative to the start time of the period.

A representation may include one or more segments. Each representation may include an initialization segment, or each segment of a representation may be self-initializing. When present, the initialization segment may contain initialization information for accessing the representation. In some cases, the initialization segment does not contain media data. A segment may be uniquely referenced by an identifier, such as a uniform resource locator (URL), a uniform resource name (URN), or a uniform resource identifier (URI). The MPD file may provide an identifier for each segment. In some examples, the MPD file may also provide byte ranges in the form of a range attribute, which may correspond to the data of a segment within a file accessible by a URL, URN, or URI.

Each representation may also include one or more media components, where each media component may correspond to an encoded version of an individual media type, such as audio, video, or timed text (e.g., for closed captioning). Media components may be temporally contiguous across boundaries of consecutive media segments within one representation.

In some embodiments, the DASH client may access and download the MPD file from the DASH server. That is, the DASH client may retrieve the MPD file for use in initiating a live session. Based on the MPD file, and for each selected representation, the DASH client may make several decisions, including determining the latest segment available on the server, determining the segment availability start time of the next segment and possibly future segments, determining when to start playing the segment and from which point on the segment's timeline to start playing, and determining when to obtain/fetch a new MPD file. Once the service is being played, the client may track drift between the live service and its own playback, and this drift needs to be detected and compensated for.

II. Event message box

The International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 23009-1 DASH standard introduces an event message box for carrying events with media segments. The ISO/IEC 23000-19 Common Media Application Format (CMAF) allows an event message box to be included at the beginning of each CMAF block. Carrying event messages as part of a CMAF sparse metadata track has been discussed. However, a question arises as to whether such tracks satisfy the fragmentation/defragmentation requirements of ISO/IEC 14496-12 ISO base media file format (ISOBMFF) tracks.

The present disclosure includes embodiments for signaling and processing of event information (e.g., event information included in a timing metadata track). The event information may correspond to media timed events associated with points in time or periods in a media presentation (e.g., a continuous audio and/or video presentation). For example, event information may be used for dynamic content replacement, advertisement insertion, presentation of supplemental content with audio and/or video, making changes to a web page, and executing application code triggered at a particular point on a media timeline of a media presentation (e.g., an audio and/or video media stream). In addition, the event information may be provided through a different method.

Media timing events can be used to carry information that is synchronized to the media stream. For example, event information may include metadata (or timing metadata) describing the content of the media presentation, such as program or chapter titles, or geographical location information. Additionally, the event information may include control messages of the media player that are associated with a particular time during playback of the media presentation, such as an ad insertion cue.

Embodiments of the present disclosure may be implemented in MPEG-DASH. The timed metadata track may include an embedded event message box (emsg) that carries event information. The timed metadata track may be used to carry information similar to the MPD events and in-band events described above. The event information may include metadata that is time-synchronized with information provided in other tracks. Due to this synchronization, metadata in the timed metadata track may be provided irregularly or discontinuously, and such a timed metadata track may be referred to as a sparse timed metadata track.

When the timed metadata track carries event message boxes as part of metadata samples, each event message box may include its own timing, which uses as an anchor the earliest presentation time of the Common Media Application Format (CMAF) block or track that includes the event message box. However, the ISO base media file format (ISOBMFF) requires that the timing of the data be preserved if an ISOBMFF track undergoes any fragmentation and defragmentation process. A sparse timed metadata track carrying embedded events may fail to satisfy this requirement because block or track boundaries are lost during the fragmentation and defragmentation process.

The present disclosure presents methods that allow sparse timed metadata tracks carrying embedded events to support any arbitrary fragmentation and defragmentation process and thus remain valid ISOBMFF tracks.

Fig. 1 illustrates an exemplary DASH system (100) in accordance with an embodiment of the present disclosure. In a DASH system (100), MPD files are sent from a DASH server (101) (e.g., a content server) to a DASH client (102). The DASH client (102) may receive media segments from the DASH server (101) based on the MPD file. The DASH client (102) may send a request to the DASH server (101) to update the MPD file. A DASH server (101) may provide a content stream including primary content (e.g., a main program) and one or more timed metadata tracks.

Fig. 2 illustrates an exemplary DASH client architecture in accordance with an embodiment of the present disclosure. A DASH client (or DASH player) may be configured to communicate with an application (212) and process various types of events, including (i) MPD events, (ii) in-band events, and (iii) timing metadata events.

The manifest parser (210) may parse a manifest (e.g., MPD). For example, the manifest may be provided by a DASH server (101). The manifest parser (210) may extract event information about MPD events, in-band events, and timing metadata events embedded in the timing metadata track. The extracted event information may be provided to DASH logic (211) (e.g., DASH player control, selection, and heuristic logic). DASH logic (211) may notify the application (212) of the event scheme signaled in the manifest based on the event information.

The event information may include event scheme information for distinguishing different event streams. The application (212) may use the event schema information to subscribe to event schemas of interest. The application (212) may also indicate a desired scheduling pattern for each subscription plan through one or more subscription Application Program Interfaces (APIs). For example, the application (212) may send a subscription request to the DASH client that identifies one or more event schemas of interest and any desired corresponding scheduling patterns.

If the application (212) subscribes to one or more event schemas delivered as part of one or more timed metadata tracks, the in-band event and 'moof' parser (203) may stream the one or more timed metadata tracks to the timed metadata track parser (204). For example, the in-band event and 'moof' parser (203) parses movie fragment boxes ('moof') and then parses the timed metadata track based on control information from DASH logic (211).

The timed metadata track parser (204) may extract event messages embedded in the timed metadata track. The extracted event messages may be stored in an event and timing metadata buffer (206). A synchronizer/scheduler module (208) (e.g., an event and timing metadata synchronizer and scheduler) may schedule (or send) subscribed events to an application (212).

MPD events described in the MPD may be parsed by the manifest parser (210) and stored in the event and timing metadata buffer (206). For example, the manifest parser (210) parses each event stream element of the MPD and parses each event described in each event stream element. For each event signaled in the MPD, event information such as presentation time and event duration may be stored in an event and timing metadata buffer (206) associated with the event.

An in-band event and 'moof' parser (203) may parse the media segments to extract in-band event messages. Any such identified in-band events and associated presentation times and durations may be stored in an event and timing metadata buffer (206).

Accordingly, the event and timing metadata buffer (206) may store MPD events, in-band events, and/or timing metadata events therein. The event and timing metadata buffer (206) may be, for example, a first-in-first-out (FIFO) buffer. The event and timing metadata buffer (206) may be managed in correspondence with the media buffer (207). For example, as long as a media segment is present in the media buffer (207), any event or timing metadata corresponding to the media segment may be stored in the event and timing metadata buffer (206).

The DASH access API (202) may manage the acquisition and reception of content streams (or data streams), including media content and various metadata, through the HTTP protocol stack (201). The DASH access API (202) may separate the received content stream into different data streams. The data stream provided to the in-band event and 'moof' parser (203) may include media segments, one or more timed metadata tracks, and in-band event signaling included in the media segments. In an embodiment, the data stream provided to the manifest parser (210) may comprise an MPD.

The DASH access API (202) may forward the manifest to the manifest parser (210). In addition to describing events, the manifest may also provide information about media segments to the DASH logic (211), which may communicate with the application (212) and the in-band event and 'moof' parser (203). The application (212) may be associated with the media content handled by the DASH client. Control/synchronization signals exchanged between the application (212), the DASH logic (211), the manifest parser (210), and the DASH access API (202) may control the fetching of media segments from the HTTP protocol stack (201) based on the information about the media segments provided in the manifest.

An in-band event and 'moof' parser (203) may parse the media data stream into media segments, including media content, timing metadata in the timing metadata track, and any signaled in-band events in the media segments. Media segments comprising media content may be parsed by a file format parser (205) and stored in a media buffer (207).

Events stored in the event and timing metadata buffer (206) may allow the synchronizer/scheduler (208) to communicate available events (or events of interest) to the application (212) through an event/metadata API. The application (212) may be configured to process the available events (e.g., MPD events, in-band events, or timed metadata events) and to subscribe to particular events or timed metadata by notifying the synchronizer/scheduler (208). Any events stored in the event and timing metadata buffer (206) that are not related to the application (212), but rather to the DASH client itself, may be forwarded by the synchronizer/scheduler (208) to the DASH logic (211) for further processing.

In response to the application (212) subscribing to a particular event, the synchronizer/scheduler (208) may transmit to the application (212) event instances (or timed metadata samples) corresponding to the event schemes to which the application (212) has subscribed. The event instances may be dispatched according to the scheduling mode indicated by the subscription request (e.g., for a particular event scheme) or according to a default scheduling mode. For example, in an on-receive scheduling mode, event instances may be sent to the application (212) as they are received in the event and timing metadata buffer (206). In an on-start scheduling mode, on the other hand, event instances may be sent to the application (212) at their associated presentation times (e.g., synchronized with timing signals from the media decoder (209)).
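The difference between the two scheduling modes can be sketched as follows; the function and field names here are hypothetical illustrations, not DASH-specified APIs:

```python
# Sketch of the two dispatch modes (names are hypothetical).
def dispatch_event(event, mode, media_time, deliver):
    # on-receive: deliver as soon as the event instance is in the buffer.
    # on-start: hold until the media timeline reaches the event's
    # presentation time.
    if mode == "on-receive" or media_time >= event["presentation_time"]:
        deliver(event)
        return True
    return False

delivered = []
ev = {"presentation_time": 10.0, "data": b"ad-cue"}
dispatch_event(ev, "on-receive", media_time=0.0, deliver=delivered.append)  # sent immediately
dispatch_event(ev, "on-start", media_time=5.0, deliver=delivered.append)    # held back
dispatch_event(ev, "on-start", media_time=10.0, deliver=delivered.append)   # sent at its ST
```

In a real player, the on-start branch would be driven by the decoder's timing signal rather than a polled media_time value.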

It should be noted that, in the DASH client architecture, thick data flow lines represent media data flows, thin data flow lines represent event and timed metadata data flows, and dashed data flow lines represent control and synchronization. Furthermore, the same processing model may be used for CMAF events.

Fig. 3 illustrates an exemplary timing model of an event message box according to an embodiment of the disclosure. In the timing model, each event message box may be described by three timing parameters on the media timeline: (i) event Arrival Time (AT), which is the earliest presentation time of the segment comprising the event message box; (ii) event presentation/Start Time (ST), which is the moment in the Media (MPD) timeline at which an event becomes active; and (iii) an event Duration (DU) during which the event is active.

An event message box may be inserted at the beginning of the media segment. Thus, the earliest presentation time of the media segment carrying the event message box can be considered to be the location of the event message box on the media timeline. The DASH client may retrieve and parse the media segments before or AT the AT of the event message box.

The ST of the event message box may be offset relative to the location, in the track, of the media segment (e.g., DASH segment or CMAF block) that carries the event message box. The anchor of the ST may differ depending on the version of the event message box. For a version 0 event message box, the anchor may be the earliest presentation time of the CMAF fragment/DASH segment carrying the event message box. For a version 1 event message box, the anchor may be the earliest presentation time of the CMAF track/DASH period carrying the event message box.

According to ISO/IEC 23000-19, the timescale of the event message box may be equal to the timescale in the MediaHeaderBox of the CMAF track. The event message box may be inserted at the beginning of a CMAF block, CMAF fragment, or CMAF segment. The anchor of a version 0 event message box in a CMAF block is the earliest presentation time of the block. Further, according to ISO/IEC 23009-1, the event message box may be placed before the first 'moof' box of a segment, or may be placed between any media data box ('mdat') and 'moof' box. In the latter case, an equivalent 'emsg' with the same id value shall appear before the first 'moof' box of any segment.
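The version-dependent anchoring described above can be illustrated with a small Python sketch; the parameter names are simplified, hypothetical stand-ins for the 'emsg' fields of ISO/IEC 23009-1:

```python
# Sketch of version-dependent 'emsg' start-time resolution; parameter names
# are simplified stand-ins for the standardized fields, not the box syntax.
def event_start_time(version, timescale, fragment_earliest_pt, track_start,
                     presentation_time_delta=0, presentation_time=0):
    if version == 0:
        # v0: delta anchored at the earliest presentation time of the
        # carrying DASH segment / CMAF fragment or block.
        return fragment_earliest_pt + presentation_time_delta / timescale
    # v1: absolute time anchored at the CMAF track / DASH period start.
    return track_start + presentation_time / timescale

st_v0 = event_start_time(0, 90000, fragment_earliest_pt=10.0, track_start=0.0,
                         presentation_time_delta=90000)   # 10.0 + 1.0 = 11.0 s
st_v1 = event_start_time(1, 90000, fragment_earliest_pt=10.0, track_start=0.0,
                         presentation_time=990000)        # 0.0 + 11.0 = 11.0 s
```

The two versions can express the same absolute start time; the difference is which reference point must survive re-packaging, which is exactly what the fragmentation analysis below turns on.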

III. Fragmentation/defragmentation process

The ISOBMFF fragmentation/defragmentation process may fragment and defragment ISOBMFF tracks using fragments of arbitrary size. In either process, the resulting non-fragmented or fragmented track should be a valid ISOBMFF track. In this case, all event message boxes included in the new track must retain correct timing.

According to aspects of the present disclosure, the fragmentation/defragmentation process may be performed on a metadata track in units of metadata samples. Thus, the correct timing of a single metadata sample may be maintained during the fragmentation/defragmentation process. However, where a fragment includes multiple metadata samples, the correct timing of the fragment may not be maintained during the fragmentation/defragmentation process. Furthermore, if a fragment includes a version 0 event message box and the correct timing of the fragment is not maintained during the fragmentation/defragmentation process, the correct timing of the event message box cannot be maintained, since the anchor of the ST of a version 0 event message box is the earliest presentation time of the fragment.
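The per-metadata-sample fragmentation described above can be sketched as follows; the dictionary layout is a hypothetical simplification for illustration, not the ISOBMFF box structure:

```python
# Hypothetical sketch of per-sample fragmentation: each fragment carries
# exactly one metadata sample, so the fragment's earliest presentation time
# equals that sample's presentation time and embedded 'emsg' anchors survive
# re-fragmentation.
def fragment_per_sample(samples):
    return [{"earliest_presentation_time": s["presentation_time"],
             "duration": s["duration"],
             "samples": [s]}
            for s in samples]

track_samples = [
    {"presentation_time": 0.0, "duration": 2.0, "emsg": ["event-1"]},
    {"presentation_time": 6.0, "duration": 1.0, "emsg": ["event-2"]},  # sparse gap
]
fragments = fragment_per_sample(track_samples)
```

Because each fragment's earliest presentation time is fully determined by its single sample, defragmenting and re-fragmenting this structure reproduces the same anchors, which is the property the disclosure relies on.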

The survivability of CMAF fragments/DASH segments under arbitrary fragmentation and defragmentation can be analyzed as follows.

If a DASH segment/CMAF fragment includes a version 0 event message box, the fragmentation/defragmentation process may fail, because the anchor point of the event message box is the earliest presentation time of the DASH segment/CMAF fragment and may be lost during the fragmentation/defragmentation process of the DASH/CMAF media track.

If a DASH segment/CMAF fragment includes a version 1 event message box, the fragmentation/defragmentation process may succeed, since the anchor of the event message box is the earliest presentation time of the DASH/CMAF media track, which is preserved during the fragmentation/defragmentation process of the DASH/CMAF media track.
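The two anchor rules can be sketched as follows (illustrative; `box` is a plain dict using the spec's field names, and the earliest presentation time of the track is assumed to be zero for version 1):

```python
def event_start_time(box, segment_earliest_pt, version):
    """Resolve an 'emsg' box's start time in seconds.

    version 0: anchored at the earliest presentation time (in seconds)
               of the DASH segment / CMAF fragment carrying the box;
    version 1: anchored at the earliest presentation time of the
               track/period, assumed here to be zero.
    """
    if version == 0:
        return segment_earliest_pt + box["presentation_time_delta"] / box["timescale"]
    return box["presentation_time"] / box["timescale"]
```

The version 0 result depends on `segment_earliest_pt`, which is exactly the value that changes when a track is re-fragmented with different fragment boundaries; the version 1 result does not.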

Table 1 summarizes the status of DASH and CMAF tracks under the fragmentation/defragmentation process.

TABLE 1

Track Fragmentation Defragmentation
DASH w/emsg V0 X X
DASH w/emsg V1 OK OK
CMAF w/emsg V0 X X
CMAF w/emsg V1 OK OK

Fig. 4 illustrates an example of a media track and a timed metadata track that may be included in a content stream according to an embodiment of the present disclosure. The metadata track may include an event message box. The event message box may be used to provide signaling for general events related to media presentation time. In some embodiments, if a DASH client detects an event message box with a scheme that is not defined in the MPD, the DASH client ignores the event message box.

The event message box may include message scheme identification information (e.g., scheme_id_uri), an optional value for the event contained in the event message box, timing information, and event data. The timing information may indicate a timescale (e.g., in ticks per second) for the other time information, such as a media presentation time delta of the event relative to a reference presentation time (e.g., the beginning of a segment or metadata sample), a media presentation time of the event, and an event duration (e.g., in media presentation time).

The metadata track may carry event message boxes as part of the metadata samples included in the metadata track, and a metadata sample may include one or more event message boxes. Each event message box may belong to an event scheme defined by a scheme URI id and an optional value of the corresponding event message box. Since event instances from multiple schemes may be included in one or more metadata samples, the event schemes need to be identified in the DASH manifest for the DASH client to discover them.

DASH includes two elements that can be used to describe event schemes in the MPD: the event stream element for MPD events (e.g., EventStream) and the inband event stream element for inband events (e.g., InbandEventStream). The two elements may use the same configuration.

Single sample sparse timing metadata

According to aspects of the present disclosure, a metadata segment/fragment including embedded event message boxes may be a single-sample DASH segment/CMAF fragment or a multi-sample DASH segment/CMAF fragment. A single-sample DASH segment/CMAF fragment includes only one metadata sample, and the duration of the metadata sample is equal to the duration of the DASH segment/CMAF fragment. A multi-sample DASH segment/CMAF fragment may include a plurality of metadata samples.

If a single-sample DASH segment/CMAF fragment includes a version 0 event message box, the fragmentation/defragmentation process may succeed. Since the earliest presentation time of the segment/fragment is the same as the presentation time of the only metadata sample included in the segment/fragment, the timing of the event message box may be preserved during the fragmentation/defragmentation process if the anchor of the event message box is considered to be the presentation time of the metadata sample including the event message box.

If a single-sample DASH segment/CMAF fragment includes a version 1 event message box, the fragmentation/defragmentation process may succeed, since the earliest presentation time of the track is the anchor point of the event message box.

If a multi-sample DASH segment/CMAF fragment includes a version 0 event message box, the fragmentation/defragmentation process may fail, because the anchor of the event message box is the earliest presentation time of the segment/fragment and may be lost during the fragmentation/defragmentation process.

If a multi-sample DASH segment/CMAF fragment includes a version 1 event message box, the fragmentation/defragmentation process may succeed, since the earliest presentation time of the track is the anchor point of the event message box.
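The four cases above reduce to a simple decision, sketched here (illustrative helper, not part of any spec):

```python
def survives_refragmentation(single_sample_per_fragment: bool, emsg_version: int) -> bool:
    """Whether an embedded 'emsg' box keeps correct timing under
    arbitrary fragmentation/defragmentation."""
    if emsg_version == 1:
        # Version 1 is anchored at the track's earliest presentation
        # time, which re-fragmentation preserves.
        return True
    # Version 0 is anchored at the fragment's earliest presentation
    # time; that anchor survives only when the fragment carries exactly
    # one metadata sample, so the anchor equals the sample's own time.
    return single_sample_per_fragment
```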

Table 2 summarizes the status of single-sample and multi-sample metadata segments/fragments with embedded event message boxes under the fragmentation/defragmentation process.

TABLE 2

Segment/Fragment Fragmentation Defragmentation
Single sample w/emsg V0 OK OK
Single sample w/emsg V1 OK OK
Multi-sample w/emsg V0 X X
Multi-sample w/emsg V1 OK OK

According to aspects of the present disclosure, the following constraints may be applied to CMAF sparse metadata tracks to satisfy the ISOBMFF fragmentation/defragmentation process: (i) each CMAF fragment/DASH segment includes only one metadata sample (or is limited to one metadata sample), and the duration of the metadata sample is the duration of the fragment/segment; (ii) the earliest presentation time of the CMAF fragment/DASH segment is the presentation time of the metadata sample; (iii) in a non-fragmented track, each version 0 event message box may use the presentation time of the metadata sample including the respective event message box as the anchor point for the presentation time offset parameter (e.g., presentation_time_delta) of the respective event message box; (iv) in a non-fragmented track, each version 1 event message box may use the earliest presentation time of the track as the anchor for the presentation time parameter (e.g., presentation_time) of the respective event message box; (v) in all cases (e.g., DASH or CMAF, fragmented or non-fragmented), the timescale of each event message box (e.g., version 0 or version 1) may be equal to the timescale of the track; and (vi) the end time of each event message box (e.g., the event start time plus the event duration) does not exceed the end (or last) presentation time of the track, even if the value of the event duration indicates that the end time of the event message box exceeds the end presentation time of the track.
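A validator for the structural constraints (i), (ii), and (v) could look like the following sketch; the in-memory representation (plain dicts with `fragments`, `samples`, `emsg_boxes` keys) is hypothetical, not a real ISOBMFF structure:

```python
def check_sparse_track_constraints(track):
    """Check constraints (i), (ii), and (v) on an illustrative
    dict-based representation of a sparse metadata track."""
    for frag in track["fragments"]:
        if len(frag["samples"]) != 1:                    # (i) one sample per fragment
            return False
        sample = frag["samples"][0]
        if sample["pt"] != frag["earliest_pt"]:          # (ii) sample time == fragment time
            return False
        for box in sample["emsg_boxes"]:
            if box["timescale"] != track["timescale"]:   # (v) timescales are equal
                return False
    return True
```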

Because the presentation time of the metadata sample carrying the event message boxes does not change during any fragmentation/defragmentation process, constraints (i) and (ii) may allow fragmentation/defragmentation of the track without changing the timing of the version 0 event message boxes.

Constraints (iii) and (iv) are constraints on non-fragmented tracks, allowing the relationship between fragmented/segmented single-sample sparse metadata tracks and non-fragmented tracks to be maintained.

The constraint (v) may ensure that the timing of the event is aligned with the track sample timing so that there is no fractional drift when the file format parser (205) and the media decoder (209) use integer arithmetic.

Constraint (vi) may limit the event duration to at most the track duration, and thus may simplify operations in the file format parser (205), especially since application events must be passed to applications (212) that do not necessarily know the track duration. That is, if the duration of each event message box is set such that the event message box ends before or at the end of the track, the file format parser (205) does not need to truncate an event duration that exceeds the track duration before passing the event to the application (212).
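Constraint (vi) amounts to a one-line clamp at authoring time, sketched here (all values in the same timescale units; the function name is illustrative):

```python
def clamp_event_duration(start, duration, track_end):
    """Clamp an event's duration so that start + duration never exceeds
    the track's end presentation time; doing this at authoring time
    spares the parser from truncating events before handing them to
    the application."""
    return min(duration, max(0, track_end - start))
```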

Some advantages of single-sample sparse timed metadata tracks are listed below. The structure of a single-sample sparse timed metadata track is simple, since each segment/fragment includes only one (sync) metadata sample. Each metadata sample may include one or more event message boxes with various schemes/sub-schemes. If the fragmentation/defragmentation process maintains the single-sample segment/fragment constraint, the fragmented track can survive the fragmentation/defragmentation process. The constraints on the anchor points of the event message boxes in non-fragmented tracks are very simple and easy to maintain. The attributes of the event message boxes may be maintained for delivery using a track separate from the media tracks, such that delivery does not depend on any particular media track.

According to aspects of the present disclosure, sparse timed metadata tracks may be generated in which each segment/fragment includes only one metadata sample. The metadata sample may include one or more event message boxes. Each event message box may include a different scheme identifier and a different value for the associated sub-scheme identifier carrying the message data payload. The timescale of the event message box is equal to the timescale of the track. The presentation time and duration of the metadata sample are equal to the earliest presentation time and duration of the segment/fragment including the metadata sample. In the fragmented case, the anchor point of the presentation time (e.g., presentation_time) and/or the presentation time offset (e.g., presentation_time_delta) of the event message box is the presentation time of the metadata sample including the event message box.
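A generator for one fragment of such a track might look like the following sketch (illustrative dicts; the field names are not from any spec):

```python
def make_single_sample_fragment(pt, duration, emsg_boxes, track_timescale):
    """Build one fragment of a single-sample sparse metadata track."""
    for box in emsg_boxes:
        # The timescale of every embedded box must equal the track's.
        assert box["timescale"] == track_timescale
    return {
        "earliest_pt": pt,          # fragment's earliest presentation time
        "samples": [{
            "pt": pt,               # equals the fragment's earliest presentation time
            "duration": duration,   # equals the fragment duration
            "emsg_boxes": emsg_boxes,
        }],
    }
```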

V. flow chart

Fig. 5 shows a flowchart outlining a process (500) according to an embodiment of the present disclosure. In various embodiments, the process (500) is performed by processing circuitry, such as processing circuitry in the DASH client (102). In some embodiments, the process (500) is implemented in software instructions, such that when the processing circuitry executes the software instructions, the processing circuitry performs the process (500). The process (500) starts at step (S510), where the process (500) receives a timed metadata track comprising a plurality of segments of a plurality of metadata samples. Each of the plurality of segments may include only one of the plurality of metadata samples. Each of the plurality of metadata samples includes one or more event message boxes. Then, the process (500) proceeds to step (S520).

At step (S520), the process (500) determines the start time and the activity duration of each event message box. Then, the process (500) proceeds to step (S530).

In step (S530), the process (500) processes event information included in the event message box based on the start time and the activity duration of the event message box. The process (500) then ends.
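Steps (S520) and (S530) can be sketched together for a single-sample sparse metadata track as follows (the dict layout is hypothetical, and collecting events into a list stands in for the application-specific handling of (S530)):

```python
def process_timed_metadata_track(fragments, track_timescale):
    """For each fragment's single metadata sample, derive every event
    message box's start time and active duration in seconds (S520) and
    collect the event information for handling (S530)."""
    events = []
    for frag in fragments:
        sample = frag["samples"][0]  # exactly one sample per fragment
        for box in sample["emsg_boxes"]:
            # Version 0 anchor: the carrying sample's presentation time.
            start_ticks = sample["pt"] + box.get("presentation_time_delta", 0)
            events.append({
                "start": start_ticks / track_timescale,
                "duration": box["event_duration"] / track_timescale,
                "data": box["message_data"],
            })
    return events
```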

In an embodiment, the timescale of each event message box is equal to the timescale of the timed metadata track.

In an embodiment, a presentation time and duration of each of the plurality of metadata samples are equal to an earliest presentation time and duration of the one of the plurality of segments comprising the respective metadata sample, and the fragmentation and defragmentation process is performed on the timed metadata track.

In an embodiment, the presentation time of each of the plurality of metadata samples is an anchor point of one of a presentation time value and a presentation time increment value of an event message box included in the respective metadata sample.

In an embodiment, the sum of the start time and the activity duration of each event message box is limited by the end presentation time of the timed metadata track.

In an embodiment, each of the plurality of segments is one of a CMAF fragment and a DASH segment.

In an embodiment, each event message box included in one of the plurality of metadata samples includes a different scheme identifier.

In an embodiment, the fragmentation and defragmentation process is based on the ISO/IEC ISOBMFF fragmentation and defragmentation process.

VI computer system

The techniques described above may be implemented as computer software via computer readable instructions and physically stored in one or more computer readable media. For example, fig. 6 illustrates a computer system (600) suitable for implementing certain embodiments of the disclosed subject matter.

The computer software may be coded in any suitable machine code or computer language, and mechanisms such as assembly, compilation, and linking may create code including instructions that can be executed directly by one or more computer Central Processing Units (CPUs), Graphics Processing Units (GPUs), etc., or through interpretation, micro-code execution, and the like.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablets, servers, smartphones, gaming devices, internet of things devices, and so forth.

The components illustrated in FIG. 6 for the computer system (600) are exemplary in nature and are not intended to limit the scope of use or functionality of the computer software implementing embodiments of the present disclosure in any way. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (600).

The computer system (600) may include some human interface input devices. Such human interface input devices may respond to input from one or more human users through tactile input (e.g., keyboard input, swipe, data glove movement), audio input (e.g., sound, applause), visual input (e.g., gestures), olfactory input (not shown). The human-machine interface device may also be used to capture media that does not necessarily directly relate to human conscious input, such as audio (e.g., voice, music, ambient sounds), images (e.g., scanned images, photographic images obtained from still-image cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

The human interface input device may include one or more of the following (only one of which is depicted): a keyboard (601), a mouse (602), a touch pad (603), a touch screen (610), a data glove (not shown), a joystick (605), a microphone (606), a scanner (607), and a camera (608).

The computer system (600) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile outputs, sounds, light, and olfactory/gustatory sensations. Such human interface output devices may include tactile output devices (e.g., tactile feedback through a touch screen (610), data glove (not shown), or joystick (605), but there may also be tactile feedback devices that do not act as input devices), audio output devices (e.g., speaker (609), headphones (not shown)), visual output devices (e.g., screens (610) including cathode ray tube screens, liquid crystal screens, plasma screens, organic light emitting diode screens, each with or without touch screen input functionality, each with or without haptic feedback functionality-some of which may output two-dimensional visual output or more than three-dimensional output by means such as stereoscopic picture output; virtual reality glasses (not shown), holographic displays and smoke boxes (not shown)), and printers (not shown). These visual output devices, such as a touch screen (610), may be connected to the system bus (648) by a graphics adapter (650).

The computer system (600) may also include human-accessible storage devices and their associated media, such as optical media including compact disc read-only/rewritable (CD/DVD ROM/RW) (620) with CD/DVD or similar media (621), thumb drive (622), removable hard drive or solid state drive (623), conventional magnetic media such as magnetic tape and floppy disk (not shown), ROM/ASIC/PLD based application specific devices such as security dongle (not shown), and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.

The computer system (600) may also include a network interface (654) to one or more communication networks (655). For example, the one or more communication networks (655) may be wireless, wired, optical. The one or more communication networks (655) may also be local area networks, wide area networks, metropolitan area networks, vehicular and industrial networks, real-time networks, delay tolerant networks, and the like. Examples of the one or more communication networks (655) also include ethernet, wireless local area networks, local area networks such as cellular networks (GSM, 3G, 4G, 5G, LTE, etc.), television wired or wireless wide area digital networks (including cable, satellite, and terrestrial broadcast television), automotive and industrial networks (including CANBus), and so forth. Some networks typically require external network interface adapters for connecting to some general purpose data ports or peripheral buses (649) (e.g., USB ports of the computer system (600)); other systems are typically integrated into the core of the computer system (600) by connecting to a system bus as described below (e.g., an ethernet interface to a PC computer system or a cellular network interface to a smart phone computer system). Using any of these networks, the computer system (600) may communicate with other entities. The communication may be unidirectional, for reception only (e.g., wireless television), unidirectional for transmission only (e.g., CAN bus to certain CAN bus devices), or bidirectional, for example, to other computer systems over a local or wide area digital network. Each of the networks and network interfaces described above may use certain protocols and protocol stacks.

The human interface device, human accessible storage device, and network interface described above may be connected to the core (640) of the computer system (600).

The core (640) may include one or more Central Processing Units (CPUs) (641), Graphics Processing Units (GPUs) (642), special purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (643), hardware accelerators (644) for specific tasks, and so forth. These devices, as well as Read Only Memory (ROM) (645), random access memory (646), internal mass storage (e.g., internal non-user accessible hard drives, solid state drives, etc.) (647), etc. may be connected via a system bus (648). In some computer systems, the system bus (648) may be accessed in the form of one or more physical plugs so as to be expandable by additional central processing units, graphics processing units, and the like. The peripheral devices may be attached directly to the system bus (648) of the core or connected through a peripheral bus (649). The architecture of the peripheral bus includes peripheral component interconnect PCI, universal serial bus USB, etc.

The CPU (641), GPU (642), FPGA (643), and accelerator (644) may execute certain instructions, which in combination may constitute the computer code described above. The computer code may be stored in ROM (645) or RAM (646). Transitional data may also be stored in RAM (646) while persistent data may be stored in, for example, internal mass storage (647). Fast storage and retrieval for any memory device may be achieved through the use of caches, which may be closely associated with one or more of CPU (641), GPU (642), mass storage (647), ROM (645), RAM (646), etc.

The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, a computer system having the architecture (600), and in particular the core (640), may provide functionality as a result of a processor (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain non-transitory storage of the core (640), such as core-internal mass storage (647) or ROM (645). Software implementing various embodiments of the present disclosure may be stored in such devices and executed by the core (640). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (640), and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (646) and modifying such data structures according to software-defined processes. Additionally or alternatively, the computer system may provide functionality that is logically hardwired or otherwise embodied in circuitry (e.g., accelerator (644)), which may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to software may include logic and vice versa. Where appropriate, reference to a computer-readable medium may include circuitry (e.g., an Integrated Circuit (IC)) storing executable software, circuitry embodying executable logic, or both. The present disclosure includes any suitable combination of hardware and software.

While this disclosure has described several exemplary embodiments, various modifications, permutations and various equivalents thereof are within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise various systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.
