Transmission device, transmission method, reception device, and reception method

Document No.: 1802483 | Publication date: 2021-11-05

Note: The present technology, "Transmission device, transmission method, reception device, and reception method," was designed and created by Ikuo Tsukagoshi (塚越郁夫) on 2016-02-10. Its main content is as follows. The invention discloses a transmitting apparatus, a transmitting method, a receiving apparatus, and a receiving method. The transmitting apparatus includes: an image encoding unit that generates two video streams including a base video stream and an extension video stream, the base video stream including encoded image data of base format image data, and the extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types; a transmitting unit configured to transmit a container of a predetermined format including the base video stream and the extension video stream; and an information insertion unit configured to insert information indicating a high quality format corresponding to the encoded image data contained in the extension video stream into the extension video stream and/or the container. This enables the receiving side to easily recognize the high quality format corresponding to the encoded image data contained in the extension video stream.

1. A transmitting apparatus, comprising:

an image encoding unit configured to generate two video streams including a base video stream and an extension video stream, the base video stream including encoded image data of base format image data, and the extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types;

a transmitting unit configured to transmit a container of a predetermined format including the base video stream and the extended video stream; and

an information insertion unit configured to insert information indicating a high quality format corresponding to the encoded image data contained in the extended video stream into the extended video stream and/or the container.

2. The transmission apparatus according to claim 1, wherein

the image encoding unit:

performs a predictive encoding process within the base format image data on the base format image data to obtain the encoded image data, and

selectively performs a predictive encoding process within the high quality format image data or a predictive encoding process between the high quality format image data and the base format image data on the high quality format image data to obtain the encoded image data.

3. The transmission apparatus according to claim 2, wherein

the base format image data is normal dynamic range and low frame rate image data,

the high quality format image data is any one of high dynamic range and high frame rate image data, high dynamic range and low frame rate image data, and normal dynamic range and high frame rate image data, and

the encoded image data of the high quality format image data includes an encoded component of high dynamic range image data based on differential information with respect to normal dynamic range image data and/or an encoded component of high frame rate image data based on differential information with respect to low frame rate image data.

4. The transmission apparatus according to claim 3, wherein

when the differential information with respect to the normal dynamic range image data is obtained, the image encoding unit performs dynamic range conversion on the normal dynamic range image data to reduce the difference.

5. The transmission apparatus according to claim 4, wherein

the image encoding unit performs the dynamic range conversion on the normal dynamic range image data based on conversion information for converting a value of conversion data based on a normal dynamic range photoelectric conversion characteristic into a value of conversion data based on a high dynamic range photoelectric conversion characteristic.

6. The transmission apparatus according to claim 5, wherein

the information insertion unit also inserts the conversion information into the extended video stream and/or the container.

7. The transmission apparatus according to claim 1, wherein

the image encoding unit:

causes a time indicated by a decoding time stamp added to the encoded image data of each picture contained in the extended video stream to be either equal to a time indicated by a decoding time stamp added to the encoded image data of a picture contained in the base video stream, or an intermediate time between times indicated by decoding time stamps added to the encoded image data of pictures contained in the base video stream,

equalizes the intervals between the times indicated by the decoding time stamps added to the encoded image data of the pictures contained in the base video stream, and

equalizes the intervals between the times indicated by the decoding time stamps added to the encoded image data of the pictures contained in the extended video stream.

8. The transmission apparatus according to claim 1, wherein

the extended video stream has a NAL unit structure, and

the information insertion unit inserts the information indicating a high quality format corresponding to the encoded image data contained in the extended video stream into a header of the NAL unit.

9. The transmission apparatus according to claim 1, wherein

the extended video stream has a NAL unit structure, and

the information insertion unit inserts the information indicating a high quality format corresponding to the encoded image data contained in the extended video stream into an area of an SEI NAL unit.

Technical Field

The present technology relates to a transmitting apparatus, a transmitting method, a receiving apparatus, and a receiving method, and relates to a transmitting apparatus that transmits high-quality format image data together with basic format image data and the like.

Background

In general, it is known to transmit high-quality format image data together with basic format image data, and selectively use either the basic format image data or the high-quality format image data on the reception side. For example, patent document 1 describes scalably performing media encoding, generating a base layer stream for a low resolution video service and an extension layer stream for a high resolution video service, and transmitting a broadcast signal including the base layer stream and the extension layer stream.

List of cited documents

Patent document

Patent document 1: PCT national publication No. 2008-543142

Disclosure of Invention

Problems to be solved by the invention

An object of the present technology is to make it easy for a receiving side to recognize a high-quality format corresponding to encoded image data contained in an extended video stream.

Solution to the problem

The concept of the present technology resides in a transmitting apparatus comprising:

an image encoding unit configured to generate two video streams including a base video stream including encoded image data of base format image data and an extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types;

a transmitting unit configured to transmit a container of a predetermined format including a base video stream and an extended video stream; and

an information insertion unit configured to insert information indicating a high quality format corresponding to encoded image data contained in an extended video stream into the extended video stream and/or the container.

In the present technology, two video streams including a base video stream and an extension video stream are generated by an image encoding unit, wherein the base video stream includes encoded image data of base format image data, and the extension video stream includes encoded image data of one type of high quality format image data selected from a plurality of types. Then, a container of a predetermined format including the base video stream and the extended video stream is transmitted by the transmitting unit.

For example, the image encoding unit may perform a predictive encoding process within the base format image data on the base format image data to obtain encoded image data, and may selectively perform a predictive encoding process within the high quality format image data or a predictive encoding process between the high quality format image data and the base format image data on the high quality format image data to obtain encoded image data.

In this case, for example, the base format image data may be normal dynamic range and low frame rate image data, the high quality format image data may be any one of high dynamic range and high frame rate image data, high dynamic range and low frame rate image data, and normal dynamic range and high frame rate image data, and the encoded image data of the high quality format image data may include an encoded component of the high dynamic range image data based on differential information related to the normal dynamic range image data and/or an encoded component of the high frame rate image data based on differential information related to the low frame rate image data.

For example, when difference information related to normal dynamic range image data is obtained, the image encoding unit may perform dynamic range conversion on the normal dynamic range image data to reduce the difference. In this case, for example, the image encoding unit may perform dynamic range conversion on the normal dynamic range image data based on conversion information for converting a value of conversion data based on the normal dynamic range photoelectric conversion characteristic into a value of conversion data based on the high dynamic range photoelectric conversion characteristic.

Information indicating a high quality format corresponding to encoded image data of an extended video stream (high quality format information) is inserted into the extended video stream and/or the container by the information insertion unit. For example, the extended video stream may have a NAL unit structure, and the information insertion unit may insert high quality format information into a header of the NAL unit.

Also, for example, the extension video stream may include a NAL unit structure, and the information insertion unit may insert high quality format information into a region of the SEI NAL unit. Further, for example, the container may be MPEG2-TS, and the information insertion unit may insert the high quality format information into a video elementary stream loop corresponding to the extended video stream existing under the arrangement of the program map table.

For example, the information insertion unit may also insert the conversion information into the extended video stream and/or the container. In this case, the reception side can appropriately perform the processing of dynamic range conversion for the normal dynamic range image data so as to obtain the high dynamic range image data based on the conversion information.

In the present technology, information indicating a high-quality format corresponding to encoded image data of an extension video stream (high-quality format information) is inserted into the extension video stream and/or the container in this manner. Therefore, the reception side can easily recognize the high quality format of the high quality format image data. Then, based on the information and the display capability information, the receiving side can obtain image data corresponding to the display capability from the base video stream and the extension video stream as display image data.

Note that, in the present technology, for example, the image encoding unit may cause the time indicated by the decoding time stamp added to the encoded image data of each picture contained in the extension video stream to be either equal to the time indicated by the decoding time stamp added to the encoded image data of a picture contained in the base video stream, or an intermediate time between the times indicated by the decoding time stamps added to the encoded image data of pictures contained in the base video stream; may equalize the intervals between the times indicated by the decoding time stamps added to the encoded image data of the pictures contained in the base video stream; and may equalize the intervals between the times indicated by the decoding time stamps added to the encoded image data of the pictures contained in the extension video stream. Equalizing the intervals between the decoding time stamps in this way allows the decoding capability of the receiving side to be used effectively.
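For a 50 Hz base stream and a 100 Hz extension stream, this timing rule can be sketched as follows (a minimal illustration, not part of the patent; the helper names and frame rates are assumptions):

```python
from fractions import Fraction

def base_dts(n_pictures, frame_rate=50):
    # Equally spaced decoding time stamps for the base video stream (1/50 s apart).
    return [Fraction(n, frame_rate) for n in range(n_pictures)]

def extension_dts(n_pictures, ext_rate=100):
    # Equally spaced decoding time stamps for the extension video stream
    # (1/100 s apart): each value either coincides with a base-stream DTS or
    # falls on the midpoint between two consecutive base-stream DTS values.
    return [Fraction(n, ext_rate) for n in range(n_pictures)]

base = base_dts(4)
ext = extension_dts(8)
# Even-indexed extension DTS values coincide with the base DTS values;
# odd-indexed ones are the intermediate (midpoint) times.
```

Because both sequences of intervals are uniform, the decoder sees a constant decoding cadence whether it consumes the base stream alone or both streams together.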

Further, another concept of the present technology is a receiving apparatus comprising:

a receiving unit configured to receive a container of a predetermined format including two video streams, the two video streams including a base video stream including encoded image data of base format image data and an extension video stream including encoded image data of one type of high quality format image data selected from among the plurality of types,

information indicating a high quality format corresponding to encoded image data contained in the extended video stream is inserted into the extended video stream and/or the container, and

the receiving apparatus further comprises:

an information extraction unit configured to extract information from the extended video stream and/or the container; and

a processing unit configured to obtain image data corresponding to display capability from the base video stream and the extended video stream as display image data based on the extracted information and the display capability information.

In the present technology, a receiving unit receives a container of a predetermined format including two video streams: a base video stream including encoded image data of base format image data, and an extension video stream including encoded image data of one type of high quality format image data selected from among a plurality of types. For example, the encoded image data contained in the base video stream may be generated by performing a predictive encoding process within the base format image data on the base format image data, and the encoded image data contained in the extension video stream may be generated by selectively performing a predictive encoding process within the high quality format image data or a predictive encoding process between the high quality format image data and the base format image data on the high quality format image data.

Information indicating the high quality format corresponding to the encoded image data contained in the extended video stream is inserted into the extended video stream and/or the container. The information is extracted from the extended video stream and/or the container by the information extraction unit. Based on the extracted information and the display capability information, image data corresponding to the display capability is obtained from the base video stream and the extended video stream by the processing unit as display image data.

In the present technology, in this way, based on the information indicating the high-quality format corresponding to the encoded image data contained in the extended video stream, the image data corresponding to the display capability can be obtained from the base video stream and the extended video stream as the display image data. Therefore, image data corresponding to the display capability can be efficiently obtained as display image data.

Effects of the invention

According to the present technology, the receiving side can easily recognize a high-quality format corresponding to encoded image data contained in an extended video stream. Note that the effects described here are not necessarily limited, and any of the effects described in the present disclosure may be exhibited.

Drawings

Fig. 1 is a block diagram showing a configuration example of a transmission and reception system as an embodiment.

Fig. 2 is a block diagram showing a configuration example of a transmitting apparatus.

Fig. 3 is a block diagram showing a configuration example of an image data generating unit that generates the basic-format image data Vb and the high-quality-format image data Ve.

Fig. 4 is a block diagram showing a configuration example of a main part of the encoding unit.

Fig. 5 is a schematic diagram for describing level adjustment (dynamic range conversion) in the case of dynamic range extension.

Fig. 6 (a) and (b) are schematic diagrams showing a structural example of a NAL unit header and contents of main parameters of the structural example.

Fig. 7 is a diagram illustrating an access unit of a header of a GOP in the case where the encoding method is HEVC.

Fig. 8 is a diagram illustrating an access unit other than a header of a GOP in the case where the encoding method is HEVC.

Fig. 9 is a diagram illustrating an example of the structure of a scalable linking SEI message.

Fig. 10 is a diagram illustrating the contents of main information in a structural example of a scalable linking SEI message.

Fig. 11 is a schematic diagram showing a configuration example (HDR + HFR) of encoded image data of the base video stream BS and encoded image data of the extension video stream ES.

Fig. 12 is a schematic diagram for explaining management of decoding and display time (decoding and display time stamp) in encoded image data.

Fig. 13 is a schematic diagram showing a configuration example (HDR + LFR) of encoded image data of the base video stream BS and encoded image data of the extension video stream ES.

Fig. 14 is a diagram showing a configuration example (SDR + HFR) of the encoded image data of the base video stream BS and the encoded image data of the extension video stream ES.

Fig. 15 is a diagram showing a configuration example ("SDR + HFR" → "HDR + HFR") of the encoded image data of the base video stream BS and the encoded image data of the extension video stream ES.

Fig. 16 is a schematic diagram showing a configuration example ("SDR + HFR" → "HDR + LFR") of the encoded image data of the base video stream BS and the encoded image data of the extension video stream ES.

Fig. 17 is a schematic diagram showing a configuration example ("HDR + HFR" → "HDR + LFR") of the encoded image data of the base video stream BS and the encoded image data of the extension video stream ES.

Fig. 18 (a) and (b) are diagrams illustrating a structural example of the scalable link descriptor and the contents of main information in the structural example.

Fig. 19 (a) and (b) are diagrams illustrating examples of values of the fields in the case where the value of "nuh_layer_id" of the header of the NAL unit is fixed by the encoded component and in the case where the value of "nuh_layer_id" of the header of the NAL unit is flexibly allocated.

Fig. 20 is a schematic diagram showing a configuration example of the transport stream TS.

Fig. 21 is a block diagram showing a configuration example of a receiving apparatus.

Fig. 22 is a block diagram showing a configuration example of a main part of the decoding unit.

Detailed Description

Modes for carrying out the invention

Hereinafter, a form of carrying out the invention (hereinafter referred to as "embodiment") will be described.

Note that the description will be given in the following order:

1. Embodiment

2. Modification

<1. Embodiment>

[Transmission and Reception System]

Fig. 1 shows a configuration example of a transmission and reception system 10 as an embodiment. The transmission and reception system 10 includes a transmission apparatus 100 and a reception apparatus 200. The transmission apparatus 100 transmits a transport stream TS as a container on a broadcast wave packet or a network packet.

The transport stream TS comprises two video streams: a base video stream and an extension video stream. The base video stream has encoded image data of base format image data. The base format image data is normal dynamic range and low frame rate image data. The encoded image data of the base format image data is generated by applying a predictive encoding process such as H.264/AVC or H.265/HEVC to the base format image data. In this case, the encoded image data is obtained by performing a predictive encoding process within the base format image data.

The extended video stream has encoded image data of one type of high-quality format image data selected from a plurality of types. In the present embodiment, the high-quality format image data is any one of high dynamic range and high frame rate image data, high dynamic range and low frame rate image data, and normal dynamic range and high frame rate image data.

Encoded image data of the extended video stream is generated by applying a predictive encoding process such as H.264/AVC or H.265/HEVC to the high-quality format image data. In this case, the encoded image data is obtained by selectively performing a predictive encoding process within the high-quality format image data or a predictive encoding process between the high-quality format image data and the base format image data.

In this case, the encoded image data of the high-quality format image data has an encoding component of the high-dynamic-range image data based on the differential information related to the normal-dynamic-range image data and/or an encoding component of the high-frame-rate image data based on the differential information related to the low-frame-rate image data.

The transmission apparatus 100 inserts information indicating the high-quality format corresponding to the encoded image data of the extended video stream (hereinafter, this information is referred to as "high-quality format information" as appropriate) into the extended video stream and/or the transport stream TS as the container. In the extended video stream, this information is inserted into the header of a NAL unit or into an area of an SEI NAL unit. In the container, this information is inserted into a video elementary stream loop corresponding to the extended video stream arranged under the program map table.

The reception device 200 receives the transport stream TS transmitted on the broadcast wave packet or the network packet from the transmission device 100. As described above, the transport stream TS includes a base video stream of encoded image data having base format image data and an extension video stream of encoded image data having high quality format image data.

As described above, the high quality format information is inserted into the extended video stream and/or the transport stream TS as the container. The reception apparatus 200 acquires image data corresponding to the display capability from the base video stream and the extended video stream as display image data, based on this information and the display capability information.

[Configuration of Transmission Apparatus]

Fig. 2 shows a configuration example of the transmission apparatus 100. The transmission apparatus 100 processes the basic format image data Vb and the high quality format image data Ve as transmission image data. Here, the basic-format image data Vb is Standard Dynamic Range (SDR) image data having a frame frequency of 50Hz (low frame rate: LFR).

The high-quality format image data Ve is one type of high-quality format image data selected from three types including, for example, (a) HDR image data with a frame frequency of 100Hz (high frame rate: HFR), (b) HDR image data with a frame frequency of 50Hz (LFR), and (c) SDR image data with a frame frequency of 100Hz (HFR). The HDR image data has luminance in a range of 0 to 100% × N (N is a number larger than 1), for example 0 to 1000% or more, where the luminance of the white peak of a conventional LDR image is taken as 100%.

Fig. 3 shows a configuration example of the image data generating unit 150 that generates the basic format image data Vb and the high-quality format image data Ve. The image data generation unit 150 includes a camera 151 and a conversion unit 152.

The camera 151 images an object and outputs high-quality format image data Ve. In the case where the high-quality-format image data Ve output by the camera 151 is HDR image data having a frame frequency of 100Hz, the conversion unit 152 performs conversion of the frame rate and the dynamic range, and outputs basic-format image data Vb that is SDR image data having a frame frequency of 50 Hz.

Further, in the case where the high-quality format image data Ve output by the camera 151 is HDR image data having a frame frequency of 50Hz, the conversion unit 152 performs only conversion of the dynamic range, and outputs basic format image data Vb that is SDR image data having a frame frequency of 50 Hz. Further, in the case where the high-quality format image data Ve output by the camera 151 is SDR image data having a frame frequency of 100Hz, the conversion unit 152 performs only the conversion of the frame rate, and outputs the base-format image data Vb which is the SDR image data having a frame frequency of 50 Hz.
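The three cases handled by the conversion unit 152 can be summarized in a small sketch (illustrative only; the function and parameter names are assumptions, not from the patent):

```python
def required_conversions(ve_dynamic_range, ve_frame_rate_hz,
                         vb_dynamic_range="SDR", vb_frame_rate_hz=50):
    # Determine which conversions the conversion unit 152 must apply to derive
    # the base format image data Vb (SDR, 50 Hz) from the high quality format
    # image data Ve output by the camera 151.
    conversions = set()
    if ve_dynamic_range != vb_dynamic_range:
        conversions.add("dynamic_range")  # HDR -> SDR conversion
    if ve_frame_rate_hz != vb_frame_rate_hz:
        conversions.add("frame_rate")     # 100 Hz -> 50 Hz conversion
    return conversions

# (a) HDR/100 Hz needs both conversions, (b) HDR/50 Hz needs only the dynamic
# range conversion, and (c) SDR/100 Hz needs only the frame rate conversion.
```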

Referring back to fig. 2, the transmission apparatus 100 includes a control unit 101, a photoelectric conversion unit 102, an RGB/YCbCr conversion unit 103, a photoelectric conversion unit 104, an RGB/YCbCr conversion unit 105, a video encoder 106, a system encoder 107, and a transmission unit 108. The control unit 101 includes a Central Processing Unit (CPU), and controls operations of the respective units of the transmission apparatus 100 based on a control program.

The photoelectric conversion unit 102 applies SDR photoelectric conversion characteristics (SDR OETF curve) to the base-format image data Vb to obtain base-format image data Vb' for transmission. The RGB/YCbCr conversion unit 103 converts the basic format image data Vb' from the RGB domain to the luminance and chrominance (YCbCr) domain.

The photoelectric conversion unit 104 applies the HDR photoelectric conversion characteristic (HDR OETF curve) or the SDR photoelectric conversion characteristic (SDR OETF curve) to the high-quality format image data Ve to obtain high-quality format image data Ve' for transmission. The RGB/YCbCr conversion unit 105 converts the high-quality format image data Ve' from the RGB domain into a luminance and chrominance (YCbCr) domain.

The video encoder 106 includes an encoding unit 106b and an encoding unit 106e. The encoding unit 106b applies a predictive encoding process such as h.264/AVC or h.265/HEVC to the base-format image data Vb' for transmission to obtain encoded image data, and generates a base video stream (video base stream) BS having the encoded image data.

The encoding unit 106e applies a predictive encoding process such as h.264/AVC or h.265/HEVC to the high-quality format image data Ve' for transmission to obtain encoded image data, and generates an extended video stream (video elementary stream) ES having the encoded image data. In this case, the encoding unit 106e selectively performs intra prediction of the image data Ve' and prediction between the image data Ve' and the image data Vb' for each encoding block so that the prediction residual is small.

Fig. 4 shows a configuration example of a main part of the encoding unit 106e. The encoding unit 106e includes an intra-layer prediction unit 161, an inter-layer prediction unit 162, a prediction adjustment unit 163, a selection unit 164, and an encoding function unit 165.

The intra-layer prediction unit 161 performs prediction (intra-layer prediction) inside the image data V1 on the image data V1 to be encoded to obtain prediction residual data. The inter-layer prediction unit 162 performs prediction (inter-layer prediction) between the image data V1 and the image data V2 to be referred to on the image data V1 to be encoded to obtain prediction residual data.

The prediction adjusting unit 163 performs the following processing according to the type of scalable extension (scalable extension) of the image data V1 with respect to the image data V2, in order to efficiently perform inter-layer prediction in the inter-layer prediction unit 162. That is, in the case of dynamic range extension, the prediction adjusting unit 163 performs level adjustment (dynamic range conversion) for converting data from SDR to HDR. In the case of frame rate extension, the prediction adjusting unit 163 skips this processing.

With reference to fig. 5, the level adjustment (dynamic range conversion) in the case of dynamic range extension will be further described. The solid line a shows an example of an SDR OETF curve representing the SDR photoelectric conversion characteristic. The solid line b shows an example of an HDR OETF curve representing the HDR photoelectric conversion characteristic. The horizontal axis represents the input luminance level, P1 represents the input luminance level corresponding to the SDR maximum luminance level, and P2 represents the input luminance level corresponding to the HDR maximum luminance level.

Further, the vertical axis represents the transmission code value as a relative value, that is, the normalized coding level. The relative maximum coding level M denotes the HDR maximum coding level and the SDR maximum coding level. The reference coding level G represents the transmission coding level of the HDR OETF at the input luminance level P1 corresponding to the SDR maximum luminance level, and represents the so-called reference white level. The branch coding level B represents the coding level at which the SDR OETF curve and the HDR OETF curve branch away from their common track. Pf denotes the input luminance level corresponding to the branch coding level. Note that the branch coding level B may be any value of 0 or more.

In the level adjustment (dynamic range conversion) in the prediction adjusting unit 163, the data of the basic format image data Vb' from the branch coding level B up to the relative maximum coding level M are converted into values of conversion data based on the HDR photoelectric conversion characteristic. In this case, the relative maximum coding level M of the SDR is converted so as to coincide with the reference coding level G. Note that input data below the branch coding level B are output as they are.

Here, the conversion information is provided as a conversion table or a conversion coefficient. In the case where the conversion information is supplied as a conversion table, the prediction adjusting unit 163 performs the conversion by referring to the conversion table. On the other hand, in the case where the conversion information is provided as a conversion coefficient, the prediction adjusting unit 163 performs the conversion by calculation using the conversion coefficient. For example, the prediction adjusting unit 163 converts input data from the branch coding level B to the relative maximum coding level M by the following expression (1), where the conversion coefficient is C:

output data = branch coding level B + (input data − branch coding level B) × C … (1)
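Expression (1) can be sketched as a small piecewise function. The numeric levels below, and deriving C so that the SDR relative maximum coding level M lands on the reference coding level G, are illustrative assumptions rather than values from the patent:

```python
def level_adjust(input_level, B, C):
    # Dynamic range conversion of expression (1): data from the branch coding
    # level B upward are scaled by the conversion coefficient C; input data
    # below B pass through unchanged.
    if input_level < B:
        return input_level
    return B + (input_level - B) * C

# Illustrative levels: branch coding level B = 100, SDR relative maximum
# coding level M = 200, reference coding level G = 150.
B, M, G = 100, 200, 150
C = (G - B) / (M - B)  # chosen so that level_adjust(M, B, C) == G
```

With this choice of C, the SDR maximum coding level maps onto the reference white level G, which is what keeps the inter-layer prediction residual small.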

The selection unit 164 selectively extracts prediction residual data obtained in the intra-layer prediction unit 161 or prediction residual data obtained in the inter-layer prediction unit 162 for each coding block and transmits the prediction residual data to the encoding function unit 165. In this case, for example, the selection unit 164 extracts prediction residual data having a smaller prediction residual. The encoding function unit 165 performs encoding processing such as transform coding, quantization, and entropy encoding on the prediction residual data extracted by the selection unit 164 to obtain encoded image data CV.
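The per-block choice made by the selection unit 164 can be sketched as follows. The patent only says the residual with the smaller prediction residual is extracted; using sum-of-squares energy as the comparison metric is an assumption:

```python
def select_residual(intra_layer_residual, inter_layer_residual):
    # Per coding block, keep the prediction residual with the smaller energy
    # (sum of squared values); the energy metric is an assumed stand-in for
    # "smaller prediction residual".
    def energy(res):
        return sum(v * v for v in res)
    if energy(intra_layer_residual) <= energy(inter_layer_residual):
        return "intra_layer", intra_layer_residual
    return "inter_layer", inter_layer_residual
```

The chosen residual is then passed to the encoding function unit 165 for transform coding, quantization, and entropy encoding.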

The encoded image data contained in the extended video stream includes an encoded component according to the type of the high-quality format image data Ve. That is, in the case where the image data Ve is (a) HFR (100Hz) and HDR image data, HDR and HFR encoding components are contained. Further, in the case where the image data Ve is (b) LFR (50Hz) and HDR image data, HDR and LFR encoding components are contained. Further, in the case where the image data Ve is (c) HFR (100Hz) and SDR image data, SDR and HFR encoding components are contained.

Referring back to fig. 2, the encoding unit 106e inserts information indicating the high-quality format corresponding to the encoded image data of the extended video stream (high-quality format information) into the extended video stream ES. For example, the encoding unit 106e inserts the high-quality format information into the header of a NAL unit or into an area of an SEI NAL unit.

Fig. 6 (a) shows a structural example (syntax) of the NAL unit header, and fig. 6 (b) shows the contents (semantics) of the main parameters of the structural example. The 1-bit field "forbidden_zero_bit" is required to be 0. The 6-bit field "nal_unit_type" indicates the NAL unit type. The 6-bit field "nuh_layer_id" indicates the ID of the extension layer. The 3-bit field "nuh_temporal_id_plus1" indicates temporal_id (0 to 6) and takes the value (1 to 7) obtained by adding 1 to the temporal_id.

In the base video stream, the value of "nuh_layer_id" is "0". When the high-quality format information is inserted into the header of the NAL unit, the value of "nuh_layer_id" is fixed according to the type of encoding components contained in the encoded image data of the extended video stream. That is, the value of "nuh_layer_id" is a fixed value corresponding to the type of high-quality format of the encoded image data.

At this time, "nuh_layer_id" constitutes the high-quality format information. For example, in the case where the encoding components contained in the encoded image data are HDR and HFR encoding components, "nuh_layer_id" is "6". In the case where the encoding components are HDR and LFR encoding components, "nuh_layer_id" is "5". Further, in the case where the encoding components are SDR and HFR encoding components, "nuh_layer_id" is "0".
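
A minimal sketch of reading these header fields, assuming the standard two-byte HEVC NAL unit header layout; the parser name and the mapping table are illustrative, not from the source.

```python
def parse_nal_unit_header(data: bytes) -> dict:
    """Unpack the 16-bit HEVC NAL unit header of fig. 6 (a)."""
    header = int.from_bytes(data[:2], "big")
    return {
        "forbidden_zero_bit": (header >> 15) & 0x1,  # must be 0
        "nal_unit_type": (header >> 9) & 0x3F,       # 6 bits
        "nuh_layer_id": (header >> 3) & 0x3F,        # 6 bits
        "temporal_id": (header & 0x7) - 1,           # nuh_temporal_id_plus1 - 1
    }

# Fixed nuh_layer_id assignments described in the text for the extended stream.
LAYER_ID_TO_FORMAT = {6: "HDR+HFR", 5: "HDR+LFR", 0: "SDR+HFR"}
```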

Meanwhile, in the extended video stream, the high-quality format information may instead be inserted into an SEI NAL unit, with the value of "nuh_layer_id" flexibly allocated. In this case, "nuh_layer_id" does not directly indicate the high-quality format corresponding to the encoded image data of the extended video stream.

At this time, the encoding unit 106e inserts a newly defined scalable linkage SEI message (Scalable_linkage SEI message) carrying the high-quality format information into the "SEI" part of the access unit (AU).

Fig. 7 shows an access unit at the head of a group of pictures (GOP) in the case where the encoding method is HEVC. Further, fig. 8 shows an access unit other than at the head of the GOP in the case where the encoding method is HEVC. In the HEVC encoding method, the SEI message group "Prefix_SEI" used for decoding is arranged before the slices holding the encoded pixel data, and the SEI message group "Suffix_SEI" used for display is arranged after the slices. The scalable linkage SEI message is arranged in, for example, the SEI message group "Suffix_SEI" or "Prefix_SEI", as shown in figs. 7 and 8.

Fig. 9 shows a structural example (syntax) of the scalable linkage SEI message. Fig. 10 shows the contents (semantics) of the main information in the structural example. The 1-bit flag "scalable_linkage_cancel_flag" indicates whether the "Scalable_linkage" SEI message is to be refreshed. "0" indicates that the SEI message is refreshed. "1" indicates that the SEI message is not refreshed, that is, the previous message is maintained as it is.

In the case where "scalable_linkage_cancel_flag" is "0", the following fields exist. The 16-bit field "scalable_index" is an index indicating the type of the scalable extension stream. That is, this field constitutes the high-quality format information.

For example, "0x8" indicates an extension component of HDR; that is, the encoding components contained in the encoded image data are HDR and LFR encoding components. Further, "0x10" indicates an extension component of HFR; that is, the encoding components are SDR and HFR encoding components. Further, "0x18" indicates extension components of HDR and HFR; that is, the encoding components are HDR and HFR encoding components.
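
The "scalable_index" values above can be captured in a lookup table; the table and helper names are illustrative assumptions.

```python
SCALABLE_INDEX_TO_COMPONENTS = {
    0x08: ("HDR", "LFR"),  # HDR extension component
    0x10: ("SDR", "HFR"),  # HFR extension component
    0x18: ("HDR", "HFR"),  # HDR and HFR extension components
}

def components_from_scalable_index(scalable_index: int):
    """Resolve the encoding components indicated by "scalable_index"."""
    try:
        return SCALABLE_INDEX_TO_COMPONENTS[scalable_index]
    except KeyError:
        raise ValueError(f"unknown scalable_index 0x{scalable_index:x}")
```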

The 6-bit field "nuh_layer_id" indicates a layer ID contained in the extended video stream. The 15-bit field "reference_level" indicates a reference luminance level value, that is, the reference coding level G (see fig. 5). The 1-bit flag "conversion_table_flag" indicates that the conversion information is provided by a conversion table, that is, that conversion table information is present.

In the case where "conversion_table_flag" is "1", an 8-bit field "table_size" exists. This field indicates the number of inputs to the conversion table. Then, there are 16-bit fields "predctrl_y[i]", "predctrl_cb[i]", and "predctrl_cr[i]", each as many as the number of inputs. The field "predctrl_y[i]" indicates a value of the prediction adjustment conversion with respect to luminance. The field "predctrl_cb[i]" indicates a value of the prediction adjustment conversion with respect to the chrominance Cb. The field "predctrl_cr[i]" indicates a value of the prediction adjustment conversion with respect to the chrominance Cr.

Fig. 11 shows a configuration example of the encoded image data of the base video stream BS obtained in the encoding unit 106b and the encoded image data of the extended video stream ES obtained in the encoding unit 106e. In this example, SDR and LFR encoding components are contained in the encoded image data of the base video stream BS, and HDR and HFR encoding components are contained in the encoded image data of the extended video stream ES. In this example, the value of "nuh_layer_id" is fixed based on the type of encoding components contained in the encoded image data, and "nuh_layer_id" constitutes the high-quality format information.

The horizontal axis represents the display order of pictures (POC): display times advance from left to right, with earlier pictures on the left and later pictures on the right. The vertical axis represents the hierarchy. Each box represents an image, and each arrow represents a reference relationship between images in the predictive encoding processing. In both inter-layer prediction and intra-layer prediction, the current image is changed for each block, and the direction of prediction and the number of references are not limited to the illustrated example. The numbers in the boxes indicate the order of the images to be encoded, that is, the encoding order (the decoding order on the receiving side). Several sub-groups of images are put together to form a group of pictures (GOP).

The encoded image data of the base video stream is hierarchically encoded in the two levels 0 and 1. 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1. In addition, the value of "nuh_layer_id" in the headers of the NAL units is "0".

Further, the encoded image data of the extended video stream is hierarchically encoded in the three levels 0, 1, and 2. 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Further, the value of "nuh_layer_id" in the headers of the NAL units is "6", indicating that the encoding components contained in the encoded image data are HDR and HFR encoding components.

Fig. 12 is a schematic diagram of managing decoding and display times (decoding and display time stamps) in the case of the layered coding of fig. 11. These numbers correspond to the numbers in the boxes in fig. 11, which represent the images.

"B" represents an image constituting the basic frame rate. "L" denotes an image component of the basic frame rate whose dynamic range is a prediction difference from the base video stream BS. "U" denotes a frame located at an intermediate time between frames of the basic frame rate; a high frame rate is achieved by temporally combining "U" with the frame "L" or "B". Further, "U" may have a differential component in dynamic range from the frame "B" or "L", and a high dynamic range is achieved by combining "U" with the frame "B" or "L".

In the management of the decoding time (decoding time stamp), encoding buffer management (HRD management) is performed such that "B" and "L" are decoded simultaneously, and "U" is decoded at an intermediate time between the decoding times of the preceding and following "B" or "L". For the display time (display time stamp), encoding buffer management (HRD management) is similarly performed such that "B" and "L" are predicted, combined, and displayed simultaneously, and "U" is displayed at an intermediate time between the display times of the basic frame rate images before and after "U".
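
The interleaved display timing can be sketched as follows, assuming the 50 Hz basic frame rate (20 ms frame period) used in this embodiment; the function name and list layout are illustrative.

```python
BASE_PERIOD_MS = 20.0  # 50 Hz basic frame rate, an assumption from this embodiment

def display_timestamps(num_base_frames: int):
    """Return (label, time_ms) pairs for the 100 Hz sequence: each "U"
    frame lands midway between two successive base-rate display times."""
    out = []
    for i in range(num_base_frames):
        t = i * BASE_PERIOD_MS
        out.append(("B/L", t))                       # base-rate frame
        out.append(("U", t + BASE_PERIOD_MS / 2.0))  # intermediate frame
    return out
```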

Note that in the following other configuration examples, management of decoding and display time (decoded and displayed time stamp) is similarly performed, although detailed description thereof is omitted.

Fig. 13 also shows a configuration example of the encoded image data of the base video stream BS obtained in the encoding unit 106b and the encoded image data of the extended video stream ES obtained in the encoding unit 106e. In this example, SDR and LFR encoding components are contained in the encoded image data of the base video stream BS, and HDR and LFR encoding components are contained in the encoded image data of the extended video stream ES. In this example, the value of "nuh_layer_id" is fixed based on the type of encoding components contained in the encoded image data, and "nuh_layer_id" constitutes the high-quality format information.

The encoded image data of the base video stream is hierarchically encoded in the two levels 0 and 1. 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1. In addition, the value of "nuh_layer_id" in the headers of the NAL units is "0".

Further, the encoded image data of the extended video stream is hierarchically encoded in the two levels 0 and 1 in the sub-group of images before the switching, and 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1.

In the sub-group of images after the switching, the hierarchical encoding is performed in the two levels 0 and 2, and 0 and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 2. Further, the value of "nuh_layer_id" in the headers of the NAL units is "5", indicating that the encoding components contained in the encoded image data are HDR and LFR encoding components.

Fig. 14 also shows a configuration example of the encoded image data of the base video stream BS obtained in the encoding unit 106b and the encoded image data of the extended video stream ES obtained in the encoding unit 106e. In this example, SDR and LFR encoding components are contained in the encoded image data of the base video stream BS, and SDR and HFR encoding components are contained in the encoded image data of the extended video stream ES. In this example, the value of "nuh_layer_id" is fixed based on the type of encoding components contained in the encoded image data, and "nuh_layer_id" constitutes the high-quality format information.

The encoded image data of the base video stream is hierarchically encoded in the two levels 0 and 1. 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1. In addition, the value of "nuh_layer_id" in the headers of the NAL units is "0".

Further, the encoded image data of the extended video stream is hierarchically encoded in the three levels 0, 1, and 2 in the sub-group of images before the switching, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "2", "5", "8", and "11" are copies of images "1", "4", "7", and "10", respectively; image "3" refers to image "2", and image "6" refers to image "5".

In the sub-group of images after the switching, the hierarchical encoding is performed in the single level 0, and 0 is set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of level 0. Note that, as shown by the dashed lines, the single level 1 or the single level 2 may be used instead. Further, the value of "nuh_layer_id" in the headers of the NAL units is "0", indicating that the encoding components contained in the encoded image data are SDR and HFR encoding components.

Fig. 15 also shows a configuration example of the encoded image data of the base video stream BS obtained in the encoding unit 106b and the encoded image data of the extended video stream ES obtained in the encoding unit 106e. In this example, SDR and LFR encoding components are contained in the encoded image data of the base video stream BS, SDR and HFR encoding components are contained in the encoded image data of the extended video stream ES before the switching, and HDR and HFR encoding components are contained in the encoded image data of the extended video stream ES after the switching.

In this example, the value of "nuh_layer_id" is flexibly allocated, and "nuh_layer_id" does not constitute the high-quality format information. That is, this is an example in which the "scalable_index" of the scalable linkage SEI message constitutes the high-quality format information.

The encoded image data of the base video stream is hierarchically encoded in the two levels 0 and 1. 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1. In addition, the value of "nuh_layer_id" in the headers of the NAL units is "0".

Further, the encoded image data of the extended video stream is hierarchically encoded in the three levels 0, 1, and 2 in the sub-group of images before the switching, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "2", "5", "8", and "11" are copies of images "1", "4", "7", and "10", respectively; image "3" refers to image "2", and image "6" refers to image "5".

In the sub-group of images after the switching, the hierarchical encoding is performed in the three levels 0, 1, and 2, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "14", "17", "20", and "23" are HDR differential images of images "13", "16", "19", and "22", respectively.

The value of "nuh_layer_id" in the headers of the NAL units is "6" both before and after the switching, so "nuh_layer_id" does not indicate the type of encoding components contained in the encoded image data of the extended video stream. In this case, "scalable_index" is "0x10" (representing SDR and HFR encoding components) before the switching and "0x18" (representing HDR and HFR encoding components) after the switching.

Fig. 16 also shows a configuration example of the encoded image data of the base video stream BS obtained in the encoding unit 106b and the encoded image data of the extended video stream ES obtained in the encoding unit 106e. In this example, SDR and LFR encoding components are contained in the encoded image data of the base video stream BS, SDR and HFR encoding components are contained in the encoded image data of the extended video stream ES before the switching, and HDR and LFR encoding components are contained in the encoded image data of the extended video stream ES after the switching.

In this example, the value of "nuh_layer_id" is flexibly allocated, and "nuh_layer_id" does not constitute the high-quality format information. That is, this is an example in which the "scalable_index" of the scalable linkage SEI message constitutes the high-quality format information.

The encoded image data of the base video stream is hierarchically encoded in the two levels 0 and 1. 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1. In addition, the value of "nuh_layer_id" in the headers of the NAL units is "0".

Further, the encoded image data of the extended video stream is hierarchically encoded in the three levels 0, 1, and 2 in the sub-group of images before the switching, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "2", "5", "8", and "11" are copies of images "1", "4", "7", and "10", respectively; image "3" refers to image "2", and image "6" refers to image "5".

In the sub-group of images after the switching, the hierarchical encoding is performed in the three levels 0, 1, and 2, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "14", "16", "18", and "20" are HDR differential images of images "13", "15", "17", and "19", respectively.

The value of "nuh_layer_id" in the headers of the NAL units is "6" both before and after the switching, so "nuh_layer_id" does not indicate the type of encoding components contained in the encoded image data of the extended video stream. In this case, "scalable_index" is "0x10" (representing SDR and HFR encoding components) before the switching and "0x08" (representing HDR and LFR encoding components) after the switching.

Fig. 17 also shows a configuration example of the encoded image data of the base video stream BS obtained in the encoding unit 106b and the encoded image data of the extended video stream ES obtained in the encoding unit 106e. In this example, SDR and LFR encoding components are contained in the encoded image data of the base video stream BS, HDR and HFR encoding components are contained in the encoded image data of the extended video stream ES before the switching, and HDR and LFR encoding components are contained in the encoded image data of the extended video stream ES after the switching.

In this example, the value of "nuh_layer_id" is flexibly allocated, and "nuh_layer_id" does not constitute the high-quality format information. That is, this is an example in which the "scalable_index" of the scalable linkage SEI message constitutes the high-quality format information.

The encoded image data of the base video stream is hierarchically encoded in the two levels 0 and 1. 0 and 1 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0 and 1. In addition, the value of "nuh_layer_id" in the headers of the NAL units is "0".

Further, the encoded image data of the extended video stream is hierarchically encoded in the three levels 0, 1, and 2 in the sub-group of images before the switching, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "2", "5", "8", and "11" are HDR differential images of images "1", "4", "7", and "10", respectively.

In the sub-group of images after the switching, the hierarchical encoding is performed in the three levels 0, 1, and 2, and 0, 1, and 2 are respectively set as temporal_id (level identification information) and arranged in the headers of the NAL units (nal_units) configuring the encoded image data of the images of levels 0, 1, and 2. Here, images "14", "16", "18", and "20" are HDR differential images of images "13", "15", "17", and "19", respectively.

The value of "nuh_layer_id" in the headers of the NAL units is "6" both before and after the switching, so "nuh_layer_id" does not indicate the type of encoding components contained in the encoded image data of the extended video stream. In this case, "scalable_index" is "0x18" (representing HDR and HFR encoding components) before the switching and "0x08" (representing HDR and LFR encoding components) after the switching.

Referring back to fig. 2, the system encoder 107 PES-packetizes and TS-packetizes the base video stream BS and the extended video stream ES generated in the video encoder 106 to generate the transport stream TS. Then, the transmission unit 108 transmits the transport stream TS to the reception apparatus 200 on broadcast waves or in network packets.

The system encoder 107 inserts the high-quality format information into the transport stream TS as a container. In the present embodiment, a newly defined scalable linkage descriptor (Scalable_linkage descriptor) is inserted into the video elementary stream loop (video ES loop) corresponding to the extended video stream, arranged under the Program Map Table (PMT).

Fig. 18 (a) shows a structural example (syntax) of the scalable linkage descriptor, and fig. 18 (b) shows the contents (semantics) of the main information in the structural example. The 8-bit field "descriptor_tag" indicates the descriptor type; here, it indicates the scalable linkage descriptor. The 8-bit field "descriptor_length" indicates the length (size) of the descriptor, given as the number of subsequent bytes.

The 16-bit field "scalable_index" is an index indicating the type of the scalable extension stream. This field is set to the same value as the field "scalable_index" of the scalable linkage SEI message (see fig. 9). Therefore, in the case where the value of "nuh_layer_id" in the header of the NAL unit is flexibly allocated and the field "scalable_index" of the scalable linkage SEI message constitutes the high-quality format information, the field "scalable_index" of the scalable linkage descriptor constitutes the same information.

The 6-bit field "nuh_layer_id" indicates a layer ID contained in the extended video stream. This field is set to the same value as the field "nuh_layer_id" of the header of the NAL unit. Therefore, in the case where the value of "nuh_layer_id" in the header of the NAL unit is fixed based on the type of encoding components contained in the encoded image data of the extended video stream and the field "nuh_layer_id" constitutes the high-quality format information, the field "nuh_layer_id" of the scalable linkage descriptor constitutes the same information.
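
A minimal byte-level sketch of serializing such a descriptor follows; only the tag, length, "scalable_index", and "nuh_layer_id" fields are modeled, and the tag value, field packing, and function name are assumptions (fig. 18 (a) defines the authoritative layout).

```python
def build_scalable_linkage_descriptor(descriptor_tag: int,
                                      scalable_index: int,
                                      nuh_layer_id: int) -> bytes:
    """Serialize descriptor_tag, descriptor_length (bytes that follow),
    a 16-bit scalable_index, and a 6-bit nuh_layer_id padded to one byte."""
    body = scalable_index.to_bytes(2, "big") + bytes([nuh_layer_id & 0x3F])
    return bytes([descriptor_tag, len(body)]) + body
```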

Fig. 19 (a) shows an example of the values of the fields in the case where the value of "nuh_layer_id" in the header of the NAL unit is fixed based on the encoding components. In this case, the field "nuh_layer_id" of the header of the NAL unit constitutes the high-quality format information, and so does the field "nuh_layer_id" of the scalable linkage descriptor. In this case, the value of "scalable_index" of the scalable linkage descriptor does not indicate the high-quality format corresponding to the encoded image data of the extended video stream.

Fig. 19 (b) shows an example of the values of the fields in the case where the value of "nuh_layer_id" in the header of the NAL unit is flexibly allocated. In this case, the field "nuh_layer_id" of the header of the NAL unit does not constitute the high-quality format information; instead, the field "scalable_index" of the scalable linkage SEI message constitutes the high-quality format information, and so does the field "scalable_index" of the scalable linkage descriptor. Note that, although not shown, prediction conversion information (dynamic range conversion information) may be included in the scalable linkage descriptor, similarly to the scalable linkage SEI message (see fig. 9).

[ configuration of transport stream TS ]

Fig. 20 shows a configuration example of the transport stream TS. The transport stream TS includes the base video stream BS and the extended video stream ES. In this configuration example, PES packets "video PES" of the video streams are present.

For example, the packet identifier (PID) of the base video stream BS is PID1. The base video stream BS includes the encoded image data of the images of the basic format image data. In the encoded image data, NAL units such as AUD, VPS, SPS, PPS, PSEI, SLICE, SSEI, and EOS exist.

Further, the packet identifier (PID) of the extended video stream ES is, for example, PID2. The extended video stream ES includes the encoded image data of the images of the high-quality format image data. In the encoded image data, NAL units such as AUD, SPS, PPS, PSEI, SLICE, SSEI, and EOS exist.

For example, when the value of "nuh_layer_id" in the header of the NAL unit is fixed based on the encoding components, the field "nuh_layer_id" constitutes the high-quality format information. Meanwhile, in the case where the value of "nuh_layer_id" in the header of the NAL unit is flexibly allocated, the scalable linkage SEI message (see fig. 9), including the field "scalable_index" constituting the high-quality format information, is inserted into the "SEI" part of the access unit (AU).

Further, a Program Map Table (PMT) is included in the transport stream TS as Program Specific Information (PSI). The PSI is information describing to which program each elementary stream contained in the transport stream belongs.

In the PMT, there is a program loop (program loop) that describes information about the entire program. Further, in the PMT, there is an elementary stream loop (elementary stream loop) having information on elementary streams.

In this configuration example, there are video elementary stream loops (video ES loops) corresponding to the base video stream BS and the extended video stream ES, respectively. In the video elementary stream loop corresponding to the base video stream BS, information such as the stream type (ST0) and the packet identifier (PID1) is arranged, and a descriptor describing information relating to the base video stream BS is also arranged.

Further, in the video elementary stream loop corresponding to the extended video stream ES, information such as the stream type (ST1) and the packet identifier (PID2) is arranged, and a descriptor describing information relating to the extended video stream ES is also arranged. As one of the descriptors, the scalable linkage descriptor is arranged (see (a) and (b) in fig. 18).

The operation of the transmitting apparatus 100 shown in fig. 2 will be briefly described. The basic-format image data Vb as SDR image data having a frame frequency of 50Hz is supplied to the photoelectric conversion unit 102. In this photoelectric conversion unit 102, an SDR photoelectric conversion characteristic (SDR OETF curve) is applied to the base-format image data Vb, and base-format image data Vb' for transmission is obtained. The basic format image data Vb' is converted from the RGB domain to the luminance and chrominance (YCbCr) domain by the RGB/YCbCr conversion unit 103, and then supplied to the encoding units 106b and 106e of the video encoder 106.

Further, the high-quality format image data Ve is supplied to the photoelectric conversion unit 104. The high-quality format image data Ve is, for example, any of (a) HDR image data with a frame frequency of 100Hz, (b) HDR image data with a frame frequency of 50Hz, and (c) SDR image data with a frame frequency of 100 Hz.

In the photoelectric conversion unit 104, the HDR photoelectric conversion characteristic (HDR OETF curve) or the SDR photoelectric conversion characteristic (SDR OETF curve) is applied to the high-quality format image data Ve, and high-quality format image data Ve' for transmission is obtained. The high-quality format image data Ve' is converted from the RGB domain to the luminance and chrominance (YCbCr) domain by the RGB/YCbCr conversion unit 105, and then supplied to the encoding unit 106e of the video encoder 106.

In the encoding unit 106b, predictive encoding processing such as H.264/AVC or H.265/HEVC is applied to the basic format image data Vb' for transmission, encoded image data is obtained, and a base video stream (video base stream) BS including the encoded image data is generated. The base video stream BS is supplied to the system encoder 107.

In the encoding unit 106e, predictive encoding processing such as H.264/AVC or H.265/HEVC is applied to the high-quality format image data Ve' for transmission, encoded image data is obtained, and an extended video stream (video elementary stream) ES including the encoded image data is generated. In this case, in the encoding unit 106e, prediction within the image data Ve' and prediction between the image data Ve' and the image data Vb' are selectively performed for each encoding block so as to make the prediction residual smaller.

In the encoding unit 106e, the high-quality format information is inserted into the extended video stream ES. This information is inserted, for example, into the header of the NAL unit. In this case, the value of "nuh_layer_id" is fixed based on the type of encoding components contained in the encoded image data (see (a) and (b) in fig. 6). Alternatively, the information is inserted into an area of an SEI NAL unit. In this case, the newly defined scalable linkage SEI message (see fig. 9) is inserted into the "SEI" part of the access unit (AU).

In the system encoder 107, the base video stream BS and the extension video stream ES generated in the video encoder 106 are PES-packetized and TS-packetized, and a transport stream TS is generated. At this time, in the system encoder 107, high quality format information is inserted into the transport stream TS as a container. That is, the newly defined scalable link descriptor (see (a) and (b) in fig. 18) is arranged in the video elementary stream loop corresponding to the extended video stream existing under the arrangement of the PMT.

The transport stream TS generated in the system encoder 107 is supplied to the transmission unit 108. In the transmission unit 108, the transport stream TS is put on a broadcast wave or a network packet and transmitted to the receiving device 200.

"Configuration of receiving apparatus"

Fig. 21 shows a configuration example of the reception apparatus 200. This configuration example corresponds to the configuration example of the transmission apparatus 100 of fig. 2. The receiving apparatus 200 includes a control unit 201, a receiving unit 202, a system decoder 203, a video decoder 204, a YCbCr/RGB converting unit 205, an electro-optical converting unit 206, a YCbCr/RGB converting unit 207, an electro-optical converting unit 208, and a display unit 209.

The control unit 201 includes a Central Processing Unit (CPU), and controls the operations of the respective units of the reception apparatus 200 based on a control program. The receiving unit 202 receives a transport stream TS on a broadcast wave packet or a network packet transmitted from the transmitting device 100. The system decoder 203 extracts the base video stream BS and the extended video stream ES from the transport stream TS.

Further, the system decoder 203 extracts various types of information inserted in the transport stream TS as a container and transmits the information to the control unit 201. This information includes the above-described scalable link descriptor (see (a) and (b) in fig. 18). The control unit 201 identifies the type of the coded components contained in the encoded image data of the extended video stream ES based on the "nuh_layer_id" field or the "scalable_index" field of the descriptor. That is, the control unit 201 identifies whether HDR and HFR coded components are contained, whether HDR and LFR coded components are contained, or whether SDR and HFR coded components are contained, and selects an object to be decoded according to the configuration of the receiving and display system.
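The selection described above can be sketched roughly as follows. The capability sets and format labels are hypothetical names chosen for the example, not signalling defined by the patent:

```python
# Hypothetical sketch: pick the decode target from the coded-component
# types identified for the extended video stream ES and the capabilities
# of the receiving and display system.

def select_decode_target(extended_components, display_capabilities):
    """extended_components: e.g. {"HDR", "HFR"}, {"HDR", "LFR"}, or
    {"SDR", "HFR"}; display_capabilities: what the display system supports."""
    if extended_components <= display_capabilities:
        return "extended"    # decode ES (with BS as its reference)
    return "base"            # fall back to the base video stream only
```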

The video decoder 204 includes a decoding unit 204b and a decoding unit 204e. The decoding unit 204b applies decoding processing to the base video stream BS to obtain base-format image data Vb'. In this case, the decoding unit 204b performs prediction compensation within the image data Vb'.

The decoding unit 204e applies decoding processing to the extended video stream ES to obtain high-quality format image data Ve'. In this case, the decoding unit 204e performs, for each coding block, prediction compensation within the image data Ve' or between the image data Ve' and the image data Vb', corresponding to the prediction in encoding.

Fig. 22 shows a configuration example of a main part of the decoding unit 204e. The decoding unit 204e performs the inverse of the processing of the encoding unit 106e of fig. 4. The decoding unit 204e includes a decoding function unit 241, an intra-layer prediction compensation unit 242, an inter-layer prediction compensation unit 243, a prediction adjustment unit 244, and a selection unit 245.

The decoding function unit 241 performs decoding processing other than prediction compensation on the encoded image data CV to obtain prediction residual data. The intra-layer prediction compensation unit 242 performs prediction compensation within the image data V1 (intra-layer prediction compensation) on the prediction residual data to obtain the image data V1. The inter-layer prediction compensation unit 243 performs prediction compensation between the image data V1 and the image data V2 to be referred to (inter-layer prediction compensation) on the prediction residual data to obtain the image data V1.

Similar to the prediction adjusting unit 163 of the encoding unit 106e of fig. 4, the prediction adjusting unit 244 performs processing on the image data V2 according to the type of scalable extension of the image data V1 with respect to the image data V2, although detailed description thereof is omitted. The selection unit 245 selectively extracts, for each coding block, the image data V1 obtained in the intra-layer prediction compensation unit 242 or the image data V1 obtained in the inter-layer prediction compensation unit 243, corresponding to the prediction in encoding, and outputs the image data V1.
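The per-block switching between the two compensation paths might be sketched like this. The flat-list data layout and the adjustment callback are invented for the example; a real decoder operates on transform blocks:

```python
# Illustrative sketch of the selection in the decoding unit 204e: per
# coding block, add the prediction residual to either the intra-layer
# prediction or the (adjusted) inter-layer prediction, matching the
# choice recorded at encoding time.

def reconstruct_block(residual, mode, intra_pred, inter_layer_pred, adjust):
    if mode == "intra_layer":
        prediction = intra_pred
    elif mode == "inter_layer":
        prediction = adjust(inter_layer_pred)  # prediction adjustment on V2
    else:
        raise ValueError("unknown prediction mode: " + mode)
    return [r + p for r, p in zip(residual, prediction)]
```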

Returning to fig. 21, the decoding unit 204e extracts parameter sets, SEI messages, and the like inserted in the access units constituting the extended video stream ES, and transmits the extracted information to the control unit 201. The control unit 201 identifies the high-quality format corresponding to the encoded image data of the extended video stream ES from the "nuh_layer_id" field of the header of the NAL unit or the "scalable_index" field of the scalable link SEI message.

As described above, the control unit 201 also identifies the type of the coded components contained in the encoded image data of the extended video stream ES based on the "nuh_layer_id" field or the "scalable_index" field of the scalable link descriptor. However, identification information at the container level, such as the descriptor, cannot follow dynamic changes in units of video frames. Identifying the type of the coded components from the "nuh_layer_id" field of the header of the NAL unit or the "scalable_index" field of the scalable link SEI message makes it possible to follow dynamic changes in units of frames.

Further, the control unit 201 identifies the conversion table information for prediction adjustment from the "scalable_index" field of the scalable link SEI message. The control unit 201 sets the conversion table in the prediction adjusting unit 244. With this setting, the prediction adjustment unit 244 can reliably perform level adjustment (dynamic range conversion) similar to that of the prediction adjustment unit 163 on the transmission side.
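As an illustration of such a level adjustment, a conversion table can be applied as a piecewise-linear lookup. The table entries below are invented for the example and are not the conversion information signalled in the SEI message:

```python
# Sketch: map a value coded with the SDR photoelectric characteristic to
# the corresponding value on the HDR characteristic via a conversion
# table, with linear interpolation between entries. Table values are
# made up for illustration.

def dynamic_range_convert(value, table):
    """table: sorted list of (sdr_value, hdr_value) pairs covering [0, 1]."""
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("value outside table range")

CONVERSION_TABLE = [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)]
```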

The YCbCr/RGB converting unit 205 converts the basic format image data Vb' obtained in the decoding unit 204b from the luminance and chrominance (YCbCr) domain into the RGB domain. The electro-optical conversion unit 206 applies electro-optical conversion having characteristics opposite to those of the photoelectric conversion unit 102 in the transmission apparatus 100 to the basic format image data Vb' converted into the RGB domain to obtain the basic format image data Vb. The basic format image data is SDR image data with a frame frequency of 50 Hz.

The YCbCr/RGB converting unit 207 converts the high-quality format image data Ve' obtained in the decoding unit 204e from the luminance and chrominance (YCbCr) domain into the RGB domain. The electro-optical conversion unit 208 applies electro-optical conversion having characteristics opposite to those of the photoelectric conversion unit 104 in the transmission apparatus 100 to the high-quality format image data Ve' converted into the RGB domain to obtain the high-quality format image data Ve.

The high-quality format image data is any one of (a) HDR image data at a frame rate of 100 Hz (high frame rate: HFR), (b) HDR image data at a frame rate of 50 Hz (low frame rate: LFR), and (c) SDR image data at a frame rate of 100 Hz (HFR).

The display unit 209 is configured by, for example, a Liquid Crystal Display (LCD), an organic Electroluminescence (EL) panel, or the like. The display unit 209 displays an image of the basic format image data Vb or the high quality format image data Ve according to the display capability.

In this case, the control unit 201 controls image data to be supplied to the display unit 209, that is, image data obtained as display image data. This control is performed based on the high-quality format information of the encoded image data corresponding to the extended video stream ES and thus based on the high-quality format information of the high-quality format image data Ve and the display capability information of the display unit 209.

That is, in a case where the display unit 209 cannot display an image of the high-quality format image data Ve, the control unit 201 performs control such that the basic format image data Vb is supplied to the display unit 209. On the other hand, in a case where the display unit 209 can display an image of the high-quality-format image data Ve, the control unit 201 performs control such that the high-quality-format image data Ve is supplied to the display unit 209.
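The control in the two paragraphs above amounts to a simple capability check. The function below is a hypothetical sketch; the real decision is driven by the signalled high-quality format information and the display capability information of the display unit 209:

```python
# Hypothetical sketch of the display control by the control unit 201:
# supply Ve only when the display supports the signalled high-quality
# format; otherwise fall back to the basic format image data Vb.

def choose_display_image_data(high_quality_format, supported_formats, vb, ve):
    if high_quality_format in supported_formats:
        return ve
    return vb
```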

The operation of the receiving apparatus 200 shown in fig. 21 will be briefly described. In the receiving unit 202, the transport stream TS put on a broadcast wave or a network packet and transmitted from the transmitting device 100 is received. The transport stream TS is supplied to the system decoder 203. In the system decoder 203, the base video stream BS and the extended video stream ES are extracted from the transport stream TS. The base video stream BS is supplied to the decoding unit 204b. The extended video stream ES is supplied to the decoding unit 204e.

Further, in the system decoder 203, various types of information inserted in the transport stream TS as a container are extracted and transmitted to the control unit 201. This information also includes the scalable link descriptor (see (a) and (b) in fig. 18). In the control unit 201, the high-quality format corresponding to the encoded image data of the extended video stream ES is identified based on the "scalable_index" field or the "nuh_layer_id" field of the descriptor.

In the decoding unit 204b, decoding processing is applied to the base video stream BS, and base-format image data Vb' is obtained. In this case, in the decoding unit 204b, prediction compensation is performed within the image data Vb'. In the decoding unit 204e, decoding processing is applied to the extended video stream ES, and high-quality format image data Ve' is obtained. In this case, corresponding to the prediction in encoding, in the decoding unit 204e, prediction compensation within the image data Ve' or prediction compensation between the image data Ve' and the image data Vb' is performed for each coding block.

Further, in the decoding unit 204e, parameter sets, SEI messages, and the like inserted in the access units constituting the extended video stream ES are extracted and transmitted to the control unit 201. In the control unit 201, the high-quality format corresponding to the encoded image data of the extended video stream ES is identified in frame units from the "nuh_layer_id" field of the header of the NAL unit or the "scalable_index" field of the scalable link SEI message.

The basic format image data Vb' obtained in the decoding unit 204b is converted from the luminance and chrominance (YCbCr) domain into the RGB domain in the YCbCr/RGB converting unit 205, and is supplied to the electro-optical converting unit 206. In the electro-optical conversion unit 206, electro-optical conversion having characteristics opposite to the photoelectric conversion in the transmission apparatus 100 is applied to the basic format image data Vb' converted into the RGB domain, and the basic format image data Vb is obtained.

The high-quality format image data Ve' obtained in the decoding unit 204e is converted from the luminance and chrominance (YCbCr) domain into the RGB domain in the YCbCr/RGB converting unit 207, and is supplied to the electro-optical converting unit 208. In the electro-optical conversion unit 208, electro-optical conversion having characteristics opposite to those of the photoelectric conversion in the transmission apparatus 100 is applied to the high-quality format image data Ve' converted into the RGB domain, and the high-quality format image data Ve is obtained.

Image data (display image data) supplied to the display unit 209 is controlled by the control unit 201. This control is performed based on the high-quality format information of the encoded image data corresponding to the extended video stream ES and thus based on the high-quality format information of the high-quality format image data Ve and the display capability information of the display unit 209.

In a case where the display unit 209 cannot display an image of the high-quality format image data Ve, the control unit 201 performs control such that the basic format image data Vb is supplied to the display unit 209. By this control, a basic format (50 Hz and normal dynamic range) image is displayed on the display unit 209. On the other hand, in a case where the display unit 209 can display an image of the high-quality format image data Ve, the control unit 201 performs control such that the high-quality format image data Ve is supplied to the display unit 209. By this control, a high-quality format image is displayed on the display unit 209.

As described above, in the transmission and reception system 10 shown in fig. 1, information indicating the high-quality format corresponding to the encoded image data of the extended video stream (high-quality format information) is inserted into the extended video stream and/or the container. Therefore, the reception side can easily recognize the high-quality format of the high-quality format image data. Then, the reception side can select either the basic format image data Vb or the high-quality format image data Ve as the display image data based on the information and the display capability information, and can easily display an image according to the display capability.

<2. modification >

Note that, in the above-described embodiment, the transmission and reception system 10 including the transmission apparatus 100 and the reception apparatus 200 has been described. However, the configuration of the transmission and reception system to which the present technology can be applied is not limited to this embodiment. For example, the reception apparatus 200 may have a configuration of a set-top box and a monitor connected by a digital interface such as a high-definition multimedia interface (HDMI). In this case, the set-top box can acquire the display capability information by acquiring extended display identification data (EDID) from the monitor, for example. Note that "HDMI" is a registered trademark.

Further, in the above-described embodiment, an example in which the container is a transport stream (MPEG-2 TS) has been described. However, the present technology can be similarly applied to a system having a configuration in which data is distributed to receiving terminals using a network such as the Internet. In distribution over the Internet, data is often distributed in containers of MP4 or other formats. That is, containers of various formats, such as the transport stream (MPEG-2 TS) adopted in the digital broadcasting standard, MPEG media transport (MMT) as a next-generation transport, and MP4 used in Internet distribution, fall within the scope of the container of the present technology.

Further, the present technology may have the following configuration.

(1) A transmitting apparatus, comprising:

an image encoding unit configured to generate two video streams including a base video stream including encoded image data of base format image data and an extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types;

a transmitting unit configured to transmit a container of a predetermined format including a base video stream and an extended video stream; and

an information insertion unit configured to insert information indicating a high quality format corresponding to encoded image data contained in the extended video stream into the extended video stream and/or the container.

(2) The transmission apparatus according to (1), wherein

Image encoding unit

Performing a predictive encoding process within the base format image data on the base format image data to obtain encoded image data, an

Selectively performing a predictive encoding process within the high-quality format image data or a predictive encoding process between the high-quality format image data and the base format image data on the high-quality format image data to obtain encoded image data.

(3) The transmission apparatus according to (2), wherein

The base format image data is normal dynamic range and low frame rate image data,

the high quality format image data is any one of high dynamic range and high frame rate image data, high dynamic range and low frame rate image data, and normal dynamic range and high frame rate image data, an

The encoded image data of the high-quality format image data includes an encoded component of high-dynamic-range image data based on differential information with respect to normal-dynamic-range image data and/or an encoded component of high-frame-rate image data based on differential information with respect to low-frame-rate image data.

(4) The transmission apparatus according to (3), wherein

When differential information with respect to the normal dynamic range image data is obtained, the image encoding unit performs dynamic range conversion on the normal dynamic range image data to reduce a difference.

(5) The transmission apparatus according to (4), wherein

The image encoding unit performs dynamic range conversion on the normal dynamic range image data based on conversion information for converting a value of conversion data based on a normal dynamic range photoelectric conversion characteristic into a value of conversion data based on a high dynamic range photoelectric conversion characteristic.

(6) The transmission apparatus according to (5), wherein

The information insertion unit also inserts the conversion information into the extended video stream and/or the container.

(7) The transmission apparatus according to any one of (1) to (6), wherein

The image encoding unit:

causes a time indicated by a decoding time stamp added to encoded image data of each picture contained in the extended video stream to be equal to a time indicated by a decoding time stamp added to encoded image data of each picture contained in the base video stream, or to be an intermediate time between times indicated by decoding time stamps added to encoded image data of pictures contained in the base video stream,

equalizing an interval between times indicated by decoding time stamps added to encoded image data of each image contained in the base video stream, and

equalizing an interval between times indicated by decoding time stamps added to encoded image data of each image contained in the extended video stream.
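The timestamp relationship in (7) can be sketched numerically. The 90 kHz tick values and the frame-rate assumption (extended stream at double the base rate) are illustrative choices for the example:

```python
# Sketch of the scheme in (7): base-stream DTS values are equally spaced,
# and each extended-stream DTS either coincides with a base DTS (same
# frame rate) or falls midway between two consecutive base DTS values
# (doubled frame rate). Tick values in 90 kHz units are illustrative.

def base_dts(n, interval):
    return n * interval

def extended_dts(n, interval, doubled_frame_rate):
    if doubled_frame_rate:
        return n * (interval // 2)   # alternates: base times and midpoints
    return n * interval              # coincides with the base DTS times
```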

(8) The transmission apparatus according to any one of (1) to (7), wherein

The extended video stream has a NAL unit structure, an

The information insertion unit inserts the information indicating a high quality format corresponding to the encoded image data contained in the extended video stream into a header of the NAL unit.

(9) The transmission apparatus according to any one of (1) to (7), wherein

The extended video stream has a NAL unit structure, an

The information insertion unit inserts the information indicating a high quality format corresponding to the encoded image data included in the extension video stream into a region of an SEI NAL unit.

(10) The transmission apparatus according to any one of (1) to (9), wherein

The container is MPEG2-TS, and

the information insertion unit inserts the information indicating a high quality format corresponding to the encoded image data contained in the extended video stream into a video elementary stream loop corresponding to the extended video stream existing under an arrangement of a program map table.

(11) A transmission method, comprising:

an image encoding step of generating two video streams including a base video stream including encoded image data of base format image data and an extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types;

a transmission step of transmitting, by a transmission unit, a container of a predetermined format including a base video stream and an extended video stream; and

an information insertion step of inserting information indicating a high quality format corresponding to encoded image data of the high quality format image data into the extension video stream and/or the container.

(12) A receiving apparatus, comprising:

a receiving unit configured to receive a container of a predetermined format including two video streams, the two video streams including a base video stream including encoded image data of base format image data and an extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types,

information indicating a high quality format corresponding to encoded image data of the high quality format image data is inserted into the extension video stream and/or the container,

the receiving apparatus further comprises:

an information extraction unit configured to extract information from the extended video stream and/or the container; and

a processing unit configured to obtain image data corresponding to display capability from the base video stream and the extended video stream as display image data based on the extracted information and the display capability information.

(13) The receiving apparatus according to (12), wherein

Generating encoded image data contained in the base video stream by performing intra-prediction encoding processing of the base format image data on the base format image data, an

Encoded image data contained in the extended video stream is generated by selectively performing, on high-quality-format image data, intra-prediction encoding processing of the high-quality-format image data and prediction encoding processing between the high-quality-format image data and base-format image data.

(14) A receiving method, comprising:

a receiving step of receiving, by a receiving unit, a container of a predetermined format including two video streams including a base video stream including encoded image data of base format image data and an extension video stream including encoded image data of one type of high quality format image data selected from a plurality of types,

information indicating a high quality format corresponding to encoded image data of the high quality format image data is inserted into the extended video stream and/or the container, and

the receiving method additionally comprises:

an information extraction step of extracting information from the extended video stream and/or the container; and

a processing step of obtaining image data corresponding to the display capability from the base video stream and the extended video stream as display image data based on the extracted information and the display capability information.

The main feature of the present technology is that inserting information indicating the high-quality format corresponding to the encoded image data contained in the extended video stream into the extended video stream and/or the container makes it possible to easily identify, at the receiving side, the high-quality format corresponding to the encoded image data contained in the extended video stream (see fig. 20).

10 transmitting and receiving system

100 transmitting device

101 control unit

102 and 104 photoelectric conversion units

103 and 105 RGB/YCbCr conversion units

106 video encoder

106b and 106e coding units

107 system encoder

108 sending unit

150 image data generating unit

151 camera

152 conversion unit

161 intra-layer prediction unit

162 inter-layer prediction unit

163 prediction adjustment unit

164 selection unit

165 encoding function unit

200 receiving device

201 control unit

202 receiving unit

203 system decoder

204 video decoder

204b and 204e decoding units

205 and 207 YCbCr/RGB conversion units

206 and 208 electro-optical conversion units

209 display unit

241 decoding functional unit

242 intra-layer prediction compensation unit

243 inter-layer prediction compensation unit

244 prediction adjustment unit

245 selection unit
