Information processing apparatus and method

Document No.: 1256838    Publication date: 2020-08-21

Reading note: This technology, "Information processing apparatus and method", was devised on 2018-12-28 by Ryohei Takahashi and Mitsuhiro Hirabayashi. Its main content is as follows. The present disclosure relates to an information processing apparatus and method for enabling easier selection of a sub-picture stream. According to the present disclosure, a file is created so as to include, separately from the arrangement information of each picture region, information on the region in the entire picture that corresponds to the stored sub-picture, and the file further includes encoded image data obtained by encoding the sub-picture. The present disclosure is applicable to, for example, an information processing apparatus, an image processing apparatus, an image encoding apparatus, a file creating apparatus, a file transmitting apparatus, a distributing apparatus, a file receiving apparatus, an image decoding apparatus, or a reproducing apparatus.

1. An information processing apparatus comprising:

a file generating section configured to generate a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to the stored sub-picture, the file also including encoded image data resulting from encoding the sub-picture.

2. The information processing apparatus according to claim 1, wherein

the entire picture includes an omnidirectional video.

3. The information processing apparatus according to claim 1, wherein

the region-related information is included in the file as information of each of the sub-pictures.

4. The information processing apparatus according to claim 3, wherein

the file comprises an International Organization for Standardization Base Media File Format (ISOBMFF) file,

the arrangement information of each picture region includes information signaled in a region-wise packing box, and

the region-related information is stored in a scheme information box in the ISOBMFF file different from the region-wise packing box, or in a box at a lower layer of the scheme information box.

5. The information processing apparatus according to claim 1, wherein

the region-related information includes information indicating whether the entire picture is the same as a projection picture.

6. The information processing apparatus according to claim 5, wherein

the file comprises an International Organization for Standardization Base Media File Format (ISOBMFF) file, and

the region-related information is indicated by the presence or absence of a specific box stored in a sub-picture composition box.

7. The information processing apparatus according to claim 1, wherein

the file further includes stereoscopic information, which is information related to stereoscopic display of the entire picture.

8. The information processing apparatus according to claim 7, wherein

the entire picture includes an omnidirectional video.

9. The information processing apparatus according to claim 7, wherein

the stereoscopic information is included in the file as information of each of the sub-pictures.

10. The information processing apparatus according to claim 9, wherein

the file comprises an International Organization for Standardization Base Media File Format (ISOBMFF) file, and

the stereoscopic information is stored in a scheme information box in the ISOBMFF file or in a box at a lower layer of the scheme information box.

11. The information processing apparatus according to claim 7, wherein

the file also includes information relating to the display size of the sub-picture.

12. The information processing apparatus according to claim 7, wherein

the file further includes sub-stereoscopic information, which is information related to stereoscopic display of each of the sub-pictures.

13. The information processing apparatus according to claim 7, wherein

the file further includes view information indicating a view type of the sub-picture.

14. The information processing apparatus according to claim 13, wherein

the view information includes information for each of the regions included in the sub-picture.

15. An information processing method comprising:

generating a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to the stored sub-picture, the file also including encoded image data resulting from encoding the sub-picture.

16. An information processing apparatus comprising:

a file acquisition section configured to acquire a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to a stored sub-picture, and that further includes encoded image data resulting from encoding the sub-picture; and

an image processing section configured to select a stream of the encoded image data based on the region-related information included in the file acquired by the file acquisition section.

17. The information processing apparatus according to claim 16, wherein

the entire picture includes an omnidirectional video.

18. The information processing apparatus according to claim 16, wherein

the region-related information is included in the file as information of each of the sub-pictures.

19. The information processing apparatus according to claim 16, wherein

the region-related information changes dynamically within the stream.

20. An information processing method comprising:

acquiring a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to a stored sub-picture, and that further includes encoded image data resulting from encoding the sub-picture; and

selecting a stream of the encoded image data based on the region-related information included in the acquired file.

Technical Field

The present disclosure relates to an information processing apparatus and method, and particularly to an information processing apparatus and method capable of more easily selecting a stream of a sub-picture.

Background

Standards for HTTP (Hypertext Transfer Protocol)-based adaptive content distribution technologies are known, including MPEG-DASH (Moving Picture Experts Group Dynamic Adaptive Streaming over HTTP) (see, for example, non-patent document 1 (NPL 1) and non-patent document 2 (NPL 2)).

Further, the file formats of MPEG-DASH include ISOBMFF (International Organization for Standardization Base Media File Format), the file container specification of the international standard moving-image compression technology MPEG-4 (Moving Picture Experts Group 4) (see, for example, non-patent document 3 (NPL 3)).

Incidentally, it has been devised to use MPEG-DASH for distribution of an omnidirectional image (also referred to as a projection plane image), in which a three-dimensional structure image is mapped to a plane image; as in the case of so-called omnidirectional video, the three-dimensional structure image is an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction and projected onto a three-dimensional structure. For example, MPEG-DASH can be applied by mapping the three-dimensional structure image to a single plane and distributing the resulting projection plane image. It has also been proposed that, in the above case, the projection plane image of the omnidirectional video (also referred to as the entire picture) be divided into a plurality of sub-pictures, which are then stored in a plurality of tracks. Note that identifying the display area of a sub-picture requires the following processing: first, the entire picture is constructed from the sub-pictures based on the sub-picture division information, and then the region-wise packed entire picture is rearranged based on the region-wise packing information (see, for example, non-patent document 4 (NPL 4)).

Reference list

Non-patent document

Non-patent document 1

"Information technology. Dynamic adaptive streaming over HTTP (DASH). Part 1: Media presentation description and segment formats," ISO/IEC 23009-1, 2014-05

Non-patent document 2

"Information technology. Dynamic adaptive streaming over HTTP (DASH). Part 1: Media presentation description and segment formats. Amendment 2: Spatial relationship description, generalized URL parameters and other extensions," ISO/IEC 23009-1:2014/Amd 2:2015, 2015-07

Non-patent document 3

"Information technology - Coding of audio-visual objects - Part 12: ISO base media file format," ISO/IEC 14496-12, 2005-10-01

Non-patent document 4

Ye-Kui Wang, Youngkwon Lim, "MPEG #120 OMAF meeting agenda and minutes," ISO/IEC JTC1/SC29/WG11 MPEG2017/M41823, October 2017, Macau, China

Disclosure of Invention

Technical problem

However, in the presently proposed method, in the case where a picture is divided into sub-pictures, the arrangement information indicating the changed size and position of each picture region (region-wise packing information about the undivided entire picture) is signaled in a region-wise packing box under the sub-picture composition box. Therefore, in the case of selecting and reproducing a sub-picture track, identifying the display region of the sub-picture track on the projected picture requires parsing the sub-picture composition box to distinguish the region-wise packing information from the sub-picture division information. Such selection and reproduction may increase the processing load compared with selecting and reproducing tracks other than sub-picture tracks.

In view of these circumstances, an object of the present disclosure is to allow easier selection of a stream of sub-pictures.

Solution to the problem

An information processing apparatus according to an aspect of the present technology is an information processing apparatus including a file generating section configured to generate a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to a stored sub-picture, the file also including encoded image data resulting from encoding the sub-picture.

An information processing method according to an aspect of the present technology is an information processing method including: a file is generated, the file including, as information different from arrangement information of each of the picture regions, region-related information related to a region corresponding to the stored sub-picture in the entire picture, and also including image encoding data resulting from encoding the sub-picture.

An information processing apparatus according to another aspect of the present technology is an information processing apparatus including a file acquisition section configured to acquire a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to a stored sub-picture, and that further includes encoded image data generated by encoding the sub-picture; and an image processing section configured to select a stream of the encoded image data based on the region-related information included in the file acquired by the file acquisition section.

An information processing method according to another aspect of the present technology is an information processing method including acquiring a file including, as information different from arrangement information of each of picture regions, region-related information related to a region corresponding to a stored sub-picture in an entire picture, and further including image encoding data generated by encoding the sub-picture; and selecting a stream of image encoding data based on the region-related information included in the acquired file.

In the information processing apparatus and method according to an aspect of the present technology, a file is generated that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to a stored sub-picture, and that further includes encoded image data resulting from encoding the sub-picture.

In the information processing apparatus and method according to another aspect of the present technology, a file is acquired that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to a stored sub-picture, and that further includes encoded image data generated by encoding the sub-picture; a stream of the encoded image data is then selected based on the region-related information included in the acquired file.

Advantageous Effects of Invention

According to the present disclosure, information can be processed. In particular, a stream of sub-pictures can be more easily selected.

Drawings

Fig. 1 is a diagram showing an example of a box hierarchy of an ISOBMFF sub-picture track.

Fig. 2 is a diagram showing an example of a box hierarchy of an ISOBMFF track that is not a sub-picture track.

Fig. 3 is a diagram showing an example of the syntax of a sub-picture composition box.

Fig. 4 is a diagram showing an example of the syntax of a sub-picture region box.

Fig. 5 is a diagram showing an example of semantics of fields defined in a sub-picture region box.

Fig. 6 is a diagram showing an example of the syntax of a region-wise packing box.

Fig. 7 is a diagram showing an example of the syntax of a region-wise packing structure.

Fig. 8 is a diagram showing an example of the semantics of fields defined in a region-wise packing structure.

Fig. 9 is a diagram showing an example of the syntax of RectRegionPacking.

Fig. 10 is a diagram showing an example of the semantics of fields defined in RectRegionPacking.

Fig. 11 is a block diagram showing a main configuration example of a file generating apparatus.

Fig. 12 is a block diagram showing a main configuration example of a client apparatus.

Fig. 13 is a diagram showing an example of display area information.

Fig. 14 is a flowchart showing an example of the procedure of the upload process.

Fig. 15 is a flowchart showing an example of the procedure of the content reproduction processing.

Fig. 16 is a diagram showing an example of the syntax of a 2D coverage information box.

Fig. 17 is a diagram showing an example of the semantics of fields defined in a 2D coverage information box.

Fig. 18 is a diagram showing an example of display area information.

Fig. 19 is a diagram showing an example of a sub-picture including a discontinuous region.

Fig. 20 is a diagram showing an example of the syntax of a 2D coverage information box.

Fig. 21 is a diagram showing an example of semantics of fields added in this case.

Fig. 22 is a diagram showing an example of the syntax of the region-wise packing structure.

Fig. 23 is a diagram showing an example of semantics of fields added in this case.

Fig. 24 is a diagram showing an example of the syntax of RectProjectedRegion.

Fig. 25 is a diagram showing an example of the semantics of fields defined in RectProjectedRegion.

Fig. 26 is a diagram showing an example of the syntax of the region-wise packing structure.

Fig. 27 is a diagram showing an example of the syntax of RectRegionPacking.

Fig. 28 is a diagram showing an example of the syntax of the coverage information box.

Fig. 29 is a diagram showing an example of the semantics of fields defined in a coverage information box.

Fig. 30 is a diagram showing an example of the syntax of a sphere offset projection SEI message.

Fig. 31 is a diagram showing an example of semantics of fields defined in a sphere offset projection SEI message.

Fig. 32 is a diagram showing an example of the syntax of a 2D coverage information sample entry.

Fig. 33 is a diagram showing an example of the syntax of a 2D coverage information sample.

Fig. 34 is a diagram showing an example of a sample table box.

Fig. 35 is a diagram showing an example of the syntax of a 2D coverage information sample group entry.

Fig. 36 is a flowchart showing an example of the procedure of the upload process.

Fig. 37 is a flowchart showing an example of the procedure of the content reproduction processing.

Fig. 38 is a diagram showing an example of attribute values of a 2D coverage information descriptor.

Fig. 39 is a diagram showing an example of attribute values of a 2D coverage information descriptor.

Fig. 40 is a diagram showing an example of data types.

Fig. 41 is a diagram showing an example of attribute values of a region-wise packing descriptor.

Fig. 42 is a diagram showing an example of attribute values of a region-wise packing descriptor.

Fig. 43 is a diagram showing an example of attribute values of a content coverage descriptor.

Fig. 44 is a diagram showing an example of attribute values of a content coverage descriptor, continued from fig. 43.

Fig. 45 is a diagram showing an example of data types.

Fig. 46 is a diagram showing an example of data types, continued from fig. 45.

Fig. 47 is a diagram showing an example of data types, continued from fig. 46.

Fig. 48 is a diagram showing an example of division into sub-pictures.

Fig. 49 is a diagram showing an example of division into sub-pictures.

Fig. 50 is a diagram showing an example of division into sub-pictures.

Fig. 51 is a flowchart showing an example of the procedure of the upload process.

Fig. 52 is a flowchart showing an example of the procedure of the content reproduction processing.

Fig. 53 is a diagram showing an example of the syntax of an original stereo video box.

Fig. 54 is a diagram showing an example of the semantics of fields defined in an original stereo video box.

Fig. 55 is a diagram showing an example of signaling of the display size.

Fig. 56 is a diagram showing an example of syntax of a pixel aspect ratio box.

Fig. 57 is a diagram showing an example of semantics of fields defined in a pixel aspect ratio box.

Fig. 58 is a diagram showing an example of signaling of the pixel aspect ratio at the time of display.

Fig. 59 is a diagram showing an example of the syntax of the original scheme information box.

Fig. 60 is a diagram showing an example of the syntax of a 2D coverage information box.

Fig. 61 is a diagram showing an example of the semantics of fields defined in a 2D coverage information box.

Fig. 62 is a diagram showing the signaling of stereo_presentation_enable.

Fig. 63 is a diagram showing an example of the syntax of a track stereo video box.

Fig. 64 is a diagram showing an example of the syntax of a 2D coverage information box.

Fig. 65 is a diagram showing an example of the semantics of fields defined in a 2D coverage information box.

Fig. 66 is a diagram showing an example of signaling of view_idc.

Fig. 67 is a flowchart showing an example of the procedure of the upload process.

Fig. 68 is a flowchart showing an example of the procedure of the content reproduction processing.

Fig. 69 is a diagram showing an example of attribute values of a 2D coverage information descriptor.

Fig. 70 is a diagram showing an example of attribute values of a 2D coverage information descriptor, continued from fig. 69.

Fig. 71 is a diagram showing an example of data types.

Fig. 72 is a block diagram showing a main configuration example of a computer.

Fig. 73 is a diagram showing an example of the syntax of the sub-picture composition box.

Fig. 74 is a diagram showing an example of the syntax of a supplemental property.

Detailed Description

Embodiments (hereinafter referred to as embodiments) of the present disclosure will be described. The description is made in the following order.

1. Signaling information related to a sub-picture

2. First embodiment (signaling the display area of a sub-picture and extension of ISOBMFF)

3. Second embodiment (signaling the display area of a sub-picture and extension of MPD)

4. Third embodiment (signaling stereoscopic information about the entire picture and extension of ISOBMFF)

5. Fourth embodiment (signaling stereoscopic information about the entire picture and extension of MPD)

6. Additional features

<1. Signaling of information related to a sub-picture>

<Documents supporting technical contents, terminology, etc.>

The scope of the disclosure in the present technology includes not only the contents described in the embodiments but also the contents described in the following non-patent documents known at the time of filing the present application.

NPL 1 (as described above)

NPL 2 (as described above)

NPL 3 (as described above)

NPL 4 (as described above)

In other words, the contents described in the above non-patent documents form the basis for determining the support requirements. For example, technical terms such as parsing, syntax, and semantics are within the scope of the disclosure of the present technology, and the support requirements for the claims are satisfied, even if the embodiments do not directly describe those terms.

<MPEG-DASH>

As mentioned above, standards for HTTP (Hypertext Transfer Protocol)-based adaptive content distribution techniques are known, including MPEG-DASH (Moving Picture Experts Group Dynamic Adaptive Streaming over HTTP) described, for example, in NPL 1 and NPL 2.

MPEG-DASH allows video to be reproduced at an optimal bit rate according to changes in network bandwidth, using HTTP, the same communication protocol used for downloading web pages from websites.

These standards allow easier development of infrastructure for moving image distribution services and of technologies for moving image reproduction clients. In particular, for operators participating in distribution services, the standards improve compatibility between moving image distribution services and moving image reproduction clients, facilitate utilization of existing content resources, and are expected to be effective in promoting market growth.

MPEG-DASH mainly comprises two technical designs: a standard for a manifest file specification called MPD (Media Presentation Description), which manages metadata of moving image and audio files, and an operational standard for a file format, called the segment format, for the actual communication of moving image content.

For example, as described in NPL 3, the file formats of MPEG-DASH include ISOBMFF (International Organization for Standardization Base Media File Format), the file container specification of the international standard moving-image compression technology MPEG-4 (Moving Picture Experts Group 4), specified as ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 14496-12 and extended to satisfy the MPEG-DASH requirements.

<Distribution of omnidirectional video using MPEG-DASH>

Incidentally, a projection plane image is a three-dimensional structure image mapped to a plane image; as in the case of so-called omnidirectional video, the three-dimensional structure image is an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction and projected onto a three-dimensional structure. For example, by rendering the surrounding image viewed from a viewpoint (omnidirectional video) onto a three-dimensional structure around the viewpoint to provide a three-dimensional structure image, the image around the viewpoint can be expressed naturally, and an image in a desired line-of-sight direction can easily be generated from the three-dimensional structure image.

In recent years, it has been devised to distribute projection plane images (omnidirectional video and the like) using MPEG-DASH. For example, as described in NPL 4, MPEG-DASH can be applied by mapping a three-dimensional structure image to a single plane and distributing the resulting projection plane image.

Methods for projecting onto a three-dimensional structure and mapping to a plane (also referred to as projection formats) include, for example, ERP (equirectangular projection) and CMP (cube map projection). In ERP, an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction and projected onto a spherical three-dimensional structure is mapped to a single plane such that the latitudinal direction and the longitudinal direction of the sphere are orthogonal to each other. In CMP, an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction is projected onto the surfaces of a cubic three-dimensional structure, and the surfaces are unfolded and mapped to a single plane in a predetermined order.
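As a rough illustration of the ERP mapping just described, the following sketch converts a direction on the sphere (longitude and latitude) into pixel coordinates on the projected picture. This is a simplified model for illustration only (sign conventions vary between specifications), not the normative OMAF derivation:

```python
def erp_project(longitude_deg: float, latitude_deg: float,
                pic_width: int, pic_height: int) -> tuple:
    """Map a sphere direction (longitude -180..180, latitude -90..90 degrees)
    to (x, y) pixel coordinates on an equirectangular projected picture."""
    u = (longitude_deg + 180.0) / 360.0   # longitude maps linearly to width
    v = (90.0 - latitude_deg) / 180.0     # latitude maps linearly to height
    return (u * (pic_width - 1), v * (pic_height - 1))

# The point straight ahead (0, 0) lands at the center of a 3840x1920 picture.
print(erp_project(0.0, 0.0, 3840, 1920))  # -> (1919.5, 959.5)
```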

The projection plane image onto which the omnidirectional video is projected and mapped is also referred to as a projection picture. In other words, a projection picture refers to a two-dimensional image (two-dimensional picture) that is determined for each projection format and represents omnidirectional video.

For the MPEG-I Part 2 Omnidirectional Media Format (ISO/IEC 23090-2) FDIS (Final Draft International Standard) described in NPL 4 (hereinafter referred to as OMAF), a technique has been discussed in which the projection plane image (also referred to as the entire picture) of one omnidirectional video is divided into a plurality of sub-pictures, which are stored in a plurality of tracks.

For example, there is a use case in which sub-picture tracks are configured for respective specific field-of-view regions, and the client selects and reproduces a sub-picture track according to its own field-of-view region.

<Box hierarchy of an ISOBMFF file>

The box hierarchy 11 in fig. 1 represents an example of the box hierarchy of an ISOBMFF file in which an omnidirectional video is formed into sub-picture tracks.

As shown in the box hierarchy 11, in this case, information about the entire picture is stored under the track group box. For example, the sub-picture composition box (spco) stores information for grouping sub-picture tracks and indicating, for example, whether the picture has been divided into sub-pictures. In addition, under the sub-picture composition box, boxes such as a sub-picture region box (sprg), a region-wise packing box (rwpk), and a stereo video box (stvi) are formed.

The sub-picture region box stores sub-picture division information indicating, for example, how the picture is divided into sub-pictures. The region-wise packing box stores region-wise packing information about the undivided entire picture. The stereo video box stores information (stereoscopic information) about the stereoscopic display of the entire picture, including information indicating, for example, the type of stereoscopic display image (e.g., side by side or top and bottom).

Further, under the scheme information box (schi) under the restricted scheme information box (rinf) in the restricted sample entry (resv, a type of sample entry) under the sample table box (stbl) under the media information box (minf) under the media box (mdia), boxes such as a projected omnidirectional video box (povd) and a stereo video box (stvi) are formed.

The projected omnidirectional video box stores metadata associated with the omnidirectional video. This stereo video box stores stereoscopic information about the sub-picture stored in the corresponding track.
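Summarizing the nesting described above (corresponding to the box hierarchy 11 in fig. 1), the boxes of a sub-picture track line up as follows. The four-character codes are the ones given in parentheses in the text; trgr, the standard ISOBMFF code for the track group box, is added here for orientation:

```
trak
 |- trgr (track group box)
 |   `- spco (sub-picture composition box)
 |       |- sprg (sub-picture region box)
 |       |- rwpk (region-wise packing box)
 |       `- stvi (stereo video box)
 `- mdia (media box)
     `- minf (media information box)
         `- stbl (sample table box)
             `- resv (restricted sample entry)
                 `- rinf (restricted scheme information box)
                     `- schi (scheme information box)
                         |- povd (projected omnidirectional video box)
                         `- stvi (stereo video box)
```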

The box hierarchy 12 in fig. 2 represents an example of the box hierarchy of an ISOBMFF file in which the omnidirectional video is not formed into sub-picture tracks.

As shown in the box hierarchy 12, in this case, no track group box is formed; instead, a region-wise packing box is formed under the projected omnidirectional video box.

In other words, in the case where the omnidirectional video is formed into sub-pictures, the region-wise packing box, which carries arrangement information indicating the changed size and position of each picture region, is signaled only in the sub-picture composition box and includes region-wise packing information about the undivided entire picture. In contrast, in the case where the omnidirectional video is not formed into sub-pictures, the region-wise packing box is signaled in the projected omnidirectional video box and includes region-wise packing information about the picture stored in the track. A track including sub-picture composition information is hereinafter referred to as a sub-picture track.

<Selection of a sub-picture track>

Therefore, the processing required by the client to identify the display area, on the projection plane image (projection picture), of the image in a track differs depending on whether the track is a sub-picture track or an ordinary track that is not divided into sub-pictures. For example, in the case of selecting and reproducing a sub-picture track, identifying the display region of the sub-picture track on the projected picture requires parsing the sub-picture composition box to identify the region-wise packing information and the sub-picture division information. In contrast, in the case of selecting and reproducing a track that is not a sub-picture track, this processing is unnecessary.

Syntax 21 in fig. 3 represents an example of the syntax of the Sub Picture Composition Box. As shown in syntax 21, a Sub Picture Region Box and a Region Wise Packing Box are set in the Sub Picture Composition Box.

The syntax 22 in fig. 4 represents an example of the syntax of the sub-picture region box. As shown in syntax 22, fields such as track_x, track_y, track_width, track_height, composition_width, and composition_height are defined in the sub-picture region box.

The semantics 23 in fig. 5 represent an example of the semantics of the fields in the sub-picture region box. As shown by semantics 23, track_x indicates the horizontal position, in the entire picture, of the sub-picture stored in the track. track_y indicates the vertical position, in the entire picture, of the sub-picture stored in the track. track_width indicates the width of the sub-picture stored in the track. track_height indicates the height of the sub-picture stored in the track. composition_width indicates the width of the entire picture. composition_height indicates the height of the entire picture.
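As a sketch of how these fields locate a sub-picture, the hypothetical helper below (the function name is illustrative, not from the specification) computes the rectangle that a sub-picture track occupies within the entire picture, ignoring any wrap-around placement:

```python
def subpicture_region_in_whole(track_x, track_y, track_width, track_height,
                               composition_width, composition_height):
    """Return the (left, top, right, bottom) rectangle that the sub-picture
    stored in this track occupies within the entire (composed) picture."""
    right = track_x + track_width
    bottom = track_y + track_height
    assert right <= composition_width and bottom <= composition_height, \
        "sub-picture must lie inside the entire picture"
    return (track_x, track_y, right, bottom)

# A 960x960 sub-picture placed at (1920, 0) in a 2880x1920 entire picture:
print(subpicture_region_in_whole(1920, 0, 960, 960, 2880, 1920))
# -> (1920, 0, 2880, 960)
```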

Syntax 24 in fig. 6 represents an example of the syntax of the region-wise packing box. As shown in syntax 24, a region-wise packing structure (Region Wise Packing Struct) is set in the region-wise packing box.

Syntax 25 in fig. 7 represents an example of the syntax of the region-wise packing structure. As shown in syntax 25, fields such as constituent_picture_matching_flag, num_regions, proj_picture_width, proj_picture_height, packed_picture_width, packed_picture_height, guard_band_flag[i], packing_type[i], and GuardBand(i) are set in the region-wise packing structure.

Semantics 26 in fig. 8 represent an example of the semantics of the fields defined in the region-wise packing structure. As shown in semantics 26, constituent_picture_matching_flag is flag information indicating whether, in the case where the picture is a stereoscopic picture, the same region-wise packing is applied to the view for the left eye (left view) and the view for the right eye (right view). For example, a value of 0 for this field indicates that the picture is a monoscopic picture (single view) or that different packings are applied to the left view and the right view. A value of 1 for this field indicates that the same packing is applied to the left view and the right view.

In addition, num_regions indicates the number of packed regions. proj_picture_width indicates the width of the projection picture. proj_picture_height indicates the height of the projection picture. packed_picture_width indicates the width of the packed picture (the picture subjected to region-wise packing). packed_picture_height indicates the height of the packed picture.

In addition, guard_band_flag[i] is flag information indicating whether a guard band exists. For example, a value of 0 for this field indicates that the packed region has no guard band, and a value of 1 indicates that the packed region has a guard band. packing_type[i] indicates the shape of the packed region. For example, a value of 0 for this field indicates that the packed region is rectangular. GuardBand(i) is guard band information about the periphery of the region.

In addition, as shown in syntax 25, RectRegionPacking is also set in the region-wise packing structure. Syntax 27 in fig. 9 represents an example of the syntax of RectRegionPacking. As shown in syntax 27, fields such as proj_reg_width[i], proj_reg_height[i], proj_reg_top[i], proj_reg_left[i], transform_type[i], packed_reg_width[i], packed_reg_height[i], packed_reg_top[i], and packed_reg_left[i] are set in RectRegionPacking.

Semantics 28 in fig. 10 represent an example of the semantics of the fields defined in RectRegionPacking. As shown by semantics 28, proj_reg_width[i] indicates the width of the projection region that is the source to which region-wise packing is applied. proj_reg_height[i] indicates the height of that projection region. proj_reg_top[i] indicates its vertical position, and proj_reg_left[i] indicates its horizontal position. transform_type[i] indicates the rotation or mirroring of the packed region. packed_reg_width[i] indicates the width of the packed region rearranged by region-wise packing. packed_reg_height[i] indicates the height of the packed region. packed_reg_top[i] indicates the vertical position of the packed region, and packed_reg_left[i] indicates its horizontal position.
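To make the client-side work concrete, here is a hedged sketch of the inverse lookup that only this region-wise packing information supports: mapping a point in the packed (stored) picture back to the projection picture. Rotation and mirroring (transform_type) and guard bands are ignored for brevity, and the dictionary keys simply mirror the field names above:

```python
def packed_to_projected(x, y, regions):
    """regions: list of dicts holding the RectRegionPacking fields.
    Returns the corresponding (x, y) on the projection picture, or None
    if the point is not inside any packed region."""
    for r in regions:
        if (r["packed_reg_left"] <= x < r["packed_reg_left"] + r["packed_reg_width"]
                and r["packed_reg_top"] <= y < r["packed_reg_top"] + r["packed_reg_height"]):
            # Scale the offset inside the packed region back to the projected region.
            sx = r["proj_reg_width"] / r["packed_reg_width"]
            sy = r["proj_reg_height"] / r["packed_reg_height"]
            return (r["proj_reg_left"] + (x - r["packed_reg_left"]) * sx,
                    r["proj_reg_top"] + (y - r["packed_reg_top"]) * sy)
    return None

# One region: a 960x960 projected region stored downscaled to 480x480.
region = {"proj_reg_left": 1920, "proj_reg_top": 0,
          "proj_reg_width": 960, "proj_reg_height": 960,
          "packed_reg_left": 0, "packed_reg_top": 0,
          "packed_reg_width": 480, "packed_reg_height": 480}
print(packed_to_projected(240, 120, [region]))  # -> (2400.0, 240.0)
```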

In other words, in the case where the client selects a sub-picture track according to the field of view of the user, for example, all of this information must be parsed; such selection may therefore increase the processing load compared with selecting and reproducing a track that is not a sub-picture track.

<Identification of stereoscopic information>

In addition, in the case where the entire picture of a stereoscopic omnidirectional video is divided into sub-pictures, a stereo video box indicating stereoscopic information about the entire picture (what type of stereoscopic display image the entire picture is, and so on) is signaled in the sub-picture composition box, and a stereo video box indicating stereoscopic information about the sub-picture (what type of stereoscopic display image the sub-picture is, and so on) is signaled under the scheme information box of the sample entry in the track. In contrast, in the case where the entire picture is not divided into sub-pictures, a stereo video box is signaled only under the scheme information box, and includes stereoscopic information about the picture stored in the track.

Therefore, the processing required by the client to identify the stereoscopic information about a track differs depending on whether the track is a sub-picture track or an ordinary track that is not divided into sub-pictures. For example, in the case where the entire picture is a stereoscopic image, a sub-picture track resulting from the division includes the L view and the R view, but in some cases the frame packing arrangement is neither top-and-bottom nor side-by-side.

Therefore, determining whether such a sub-picture can be displayed stereoscopically requires processing that involves parsing the sub-picture composition box, identifying the region-wise packing information and the sub-picture division information, and then identifying the stereoscopic information. In contrast, in the case of selecting and reproducing a track that is not a sub-picture track, this processing is unnecessary.

In other words, in the case where a client selects a sub-picture track according to its stereoscopic display capability, for example, such selection may increase the processing load compared with selecting and reproducing a track that is not a sub-picture track.

The selection of sub-picture tracks in an ISOBMFF file has been described. In an MPD file, however, sub-pictures are managed as adaptation sets. Selecting an adaptation set referencing a sub-picture in the MPD file may increase the processing load for reasons similar to those described above. In other words, whether an ISOBMFF file or an MPD file is used, the load of stream selection may increase.

<Signaling the display area of a sub-picture>

Therefore, in the case where the entire picture is divided into sub-pictures, information on the sub-picture display area (which is provided to the content reproduction side) is signaled. The display area represents an area in the entire picture. Specifically, the information related to the sub-picture display area is region-related information relating to the region in the entire picture that corresponds to the sub-picture; in other words, it indicates which partial image of the entire picture the sub-picture corresponds to. The information indicates, for example, the position, size, and shape of the region corresponding to the sub-picture. The method of expressing the region is optional; for example, the range of the region may be indicated by coordinates.

This enables a client reproducing the content to know, based on the above information, where in the omnidirectional video the sub-picture will be displayed.

At this time, the information related to the sub-picture display area is signaled as information for each sub-picture. This allows the client to obtain the information easily. Therefore, the client can easily select a desired sub-picture stream. For example, in the case where a stream is selected according to the field of view of the user, the client can easily select an appropriate stream corresponding to the viewing direction or range.
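The following is a minimal sketch of how a client could use such per-track display area information, assuming a record whose fields mirror the proj_reg_* semantics above (the class and function names are illustrative, not from the specification). The selection criterion, a simple rectangle intersection with the user's viewport on the projection picture, stands in for whatever policy a real player would use:

```python
from dataclasses import dataclass

@dataclass
class DisplayRegionInfo:
    # Display area of one sub-picture track on the projection picture,
    # signaled per track (illustrative names mirroring the proj_reg_* fields).
    proj_picture_width: int
    proj_picture_height: int
    proj_reg_left: int
    proj_reg_top: int
    proj_reg_width: int
    proj_reg_height: int

def overlaps(info: DisplayRegionInfo, vp_left, vp_top, vp_width, vp_height) -> bool:
    """True if the sub-picture's display area intersects the viewport rectangle."""
    return not (vp_left + vp_width <= info.proj_reg_left or
                info.proj_reg_left + info.proj_reg_width <= vp_left or
                vp_top + vp_height <= info.proj_reg_top or
                info.proj_reg_top + info.proj_reg_height <= vp_top)

def select_tracks(track_infos: dict, vp) -> list:
    """track_infos: {track_id: DisplayRegionInfo}; vp: (left, top, w, h).
    No composition-box parsing is needed: each track carries its own region."""
    return [tid for tid, info in track_infos.items() if overlaps(info, *vp)]

infos = {1: DisplayRegionInfo(2880, 1920, 0, 0, 960, 960),
         2: DisplayRegionInfo(2880, 1920, 960, 0, 960, 960)}
print(select_tracks(infos, (800, 100, 400, 300)))  # -> [1, 2]
```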

<Signaling of stereoscopic information about the entire picture divided into sub-pictures>

In addition, stereoscopic information including information related to the stereoscopic display of the entire picture divided into sub-pictures is signaled. This enables the client that reproduces the content to easily grasp whether the entire picture is a stereoscopic image and, if so, what type of stereoscopic image it is. Therefore, the client can easily grasp what image is included in the sub-picture (for example, which part of what type of stereoscopic image, or of a monoscopic image, the image included in the sub-picture corresponds to).

Therefore, the client can easily select a desired stream. For example, in the case where a stream is selected according to the capabilities of the client, the client can easily select an appropriate stream according to the capabilities of the client.
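For example, a client could filter candidate tracks against its own display capability, as in the sketch below; the capability model and string labels are assumptions for illustration, with side by side and top and bottom being the frame packing types mentioned above:

```python
def playable(track_stereo_type: str, client_supports_stereo: bool) -> bool:
    """track_stereo_type: 'mono', 'side-by-side', or 'top-bottom'
    (illustrative labels for the packing types mentioned above)."""
    if track_stereo_type == "mono":
        return True                      # monoscopic content plays anywhere
    return client_supports_stereo        # stereo packing needs a stereo-capable client

candidates = {"track1": "mono", "track2": "side-by-side"}
print([t for t, s in candidates.items() if playable(s, client_supports_stereo=False)])
# -> ['track1']
```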

<File generation apparatus>

Now, a configuration of an apparatus that provides signaling related to a sub-picture will be described. Fig. 11 is a block diagram showing an example of the configuration of a file generation apparatus according to an aspect of an information processing apparatus to which the present technology is applied. The file generation device 100 shown in fig. 11 is a device that generates an ISOBMFF file (segment file) or an MPD file. For example, the file generation apparatus 100 implements the techniques described in NPL1 to NPL4, and generates an ISOBMFF file including a stream or an MPD file corresponding to a control file for controlling distribution of the stream using an MPEG-DASH compliant method, and uploads (transmits) the file to a server that distributes the file via a network.

Note that fig. 11 shows main components such as a processing section and a data flow, but does not show all components of the file generating apparatus. In other words, in the file generating apparatus 100, there may be a processing section not shown as a block in fig. 11, or there may be a process or a data flow not shown as an arrow in fig. 11.

As shown in fig. 11, the file generation apparatus 100 includes a control section 101, a memory 102, and a file generation section 103.

The control section 101 controls the operation of the file generating apparatus 100 as a whole. For example, the control section 101 controls the file generation section 103 to generate an ISOBMFF file or an MPD file and to upload the generated file. The control section 101 executes processing related to such control using the memory 102; for example, it loads a desired program into the memory 102 and executes the program to perform the processing described above.

The file generation section 103 executes processing related to generation and upload (transmission) of the ISOBMFF file or MPD file under the control of the control section 101. As shown in fig. 11, the file generation section 103 includes a data input section 111, a data encoding and generation section 112, an MPD file generation section 113, a recording section 114, and an upload section 115.

The data input section 111 performs processing related to reception of data input. For example, the data input section 111 receives data such as images necessary for generating textures and meshes, and metadata necessary for generating an MPD file. In addition, the data input section 111 feeds the received data to the data encoding and generating section 112 and the MPD file generating section 113.

The data encoding and generating section 112 executes processing related to encoding of data and generation of files. For example, the data encoding and generating section 112 generates a stream of textures, meshes, and the like based on data such as images fed from the data input section 111. In addition, the data encoding and generating section 112 generates an ISOBMFF file storing the generated stream and feeds it to the recording section 114.

As shown in fig. 11, the data encoding and generating section 112 includes a preprocessing section 121, an encoding section 122, and a segment file generating section 123.

The preprocessing section 121 performs processing on data such as a non-encoded image. For example, the preprocessing section 121 generates a stream of textures or meshes based on data such as an image fed from the data input section 111. In addition, for example, the preprocessing section 121 supplies the generated stream to the encoding section 122.

The encoding section 122 executes processing related to encoding of streams. For example, the encoding section 122 encodes the stream fed from the preprocessing section 121 and feeds the resulting encoded data to the segment file generating section 123.

The segment file generating section 123 executes processing related to generation of segment files. For example, based on metadata or the like fed from the data input section 111, the segment file generating section 123 forms the encoded data fed from the encoding section 122 into a file in units of segments (generates a segment file). For example, the segment file generating section 123 generates an ISOBMFF file as the segment file and feeds the generated ISOBMFF file to the recording section 114.

The MPD file generating section 113 executes processing related to generation of an MPD file. For example, the MPD file generating section 113 generates an MPD file based on metadata or the like fed from the data input section 111 and feeds the generated MPD file to the recording section 114. Note that the MPD file generating section 113 may acquire metadata and the like necessary for generating the MPD file from the segment file generating section 123.

The recording section 114 includes any recording medium such as a hard disk or a semiconductor memory, and performs processing related to, for example, recording of data. For example, the recording section 114 records the MPD file fed from the MPD file generating section 113. In addition, for example, the recording section 114 records the segment file (for example, ISOBMFF file) fed from the segment file generating section 123.

The upload section 115 executes processing related to uploading (transmission) of files. For example, the upload section 115 reads the MPD file recorded in the recording section 114 and uploads (transmits) it to a server (not shown) that distributes the MPD file to clients or the like via a network or the like.

In addition, for example, the upload section 115 reads a segment file (for example, an ISOBMFF file) recorded in the recording section 114. In addition, for example, the uploading section 115 uploads (sends) the read segment file to a server (not shown) that distributes the segment file to a client or the like via a network or the like.

In other words, the uploading section 115 functions as a communication section that transmits the MPD file or the segment file (e.g., the ISOBMFF file) to the server. Note that the destination of the MPD file from the uploading section 115 may be the same as or different from the destination of the segment file (e.g., the ISOBMFF file) from the uploading section 115. In addition, in the example described herein, the file generation apparatus 100 functions as an apparatus that uploads an MPD file or a segment file (e.g., an ISOBMFF file) to a server that distributes files to clients. However, the file generating apparatus 100 may function as a server. In this case, the upload section 115 of the file generation apparatus 100 is only required to distribute an MPD file or a segment file (for example, an ISOBMFF file) to a client via a network.

<Client apparatus>

Fig. 12 is a block diagram showing an example of a configuration of a client apparatus according to an aspect of an information processing apparatus to which the present technology is applied. The client device 200 shown in fig. 12 is a device that acquires an MPD file or a segment file (e.g., an ISOBMFF file) and reproduces content based on the file. For example, the client apparatus 200 implements the techniques described in NPL1 to NPL4 to acquire a segment file from a server (or the above-described file generation apparatus 100) using an MPEG-DASH compliant method and reproduce streams (contents) included in the segment file. At this time, the client apparatus 200 may acquire an MPD file from the server (or the file generation apparatus 100 described above), select a desired segment file using the MPD file, and acquire the segment file from the server.

Note that fig. 12 shows main components such as a processing section and a data flow, but does not show all components of the client device. In other words, in the client apparatus 200, there may be a processing section not shown as a block in fig. 12, or there may be a process or a data flow not shown as an arrow in fig. 12.

As shown in fig. 12, the client apparatus 200 includes a control section 201, a memory 202, and a reproduction processing section 203.

The control section 201 controls the operation of the client apparatus 200 as a whole. For example, the control section 201 controls the reproduction processing section 203 to acquire an MPD file or a segment file (for example, an ISOBMFF file) from a server and to reproduce a stream (content) included in the segment file. The control section 201 executes processing related to such control using the memory 202; for example, it loads a desired program into the memory 202 and executes the program to perform the processing described above.

The reproduction processing section 203 executes processing relating to reproduction of a stream (content) included in the segment file according to control by the control section 201. As shown in fig. 12, the reproduction processing section 203 includes a measurement section 211, an MPD file acquisition section 212, an MPD file processing section 213, a segment file acquisition section 214, a display control section 215, a data analysis and decoding section 216, and a display section 217.

The measurement section 211 performs processing related to measurement. For example, the measurement section 211 measures the transmission band of the network between the client apparatus 200 and the server, and feeds the corresponding measurement result to the MPD file processing section 213.

The MPD file acquisition section 212 executes processing related to acquisition of an MPD file. For example, the MPD file acquisition section 212 acquires the MPD file corresponding to desired content (content to be reproduced) from a server via a network, and feeds the acquired MPD file to the MPD file processing section 213.

The MPD file processing section 213 executes processing based on the MPD file. For example, the MPD file processing section 213 selects a stream to be acquired based on the MPD file fed from the MPD file acquisition section 212. In addition, for example, the MPD file processing section 213 feeds the corresponding selection result to the segment file acquisition section 214. Note that selection of a stream to be acquired involves appropriately using the measurement result from the measurement section 211 and information relating to the viewpoint position and the line-of-sight direction of the user fed from the display control section 215.

The segment file acquisition section 214 performs processing related to acquisition of segment files (for example, ISOBMFF files). For example, the segment file acquisition section 214 acquires, from a server via a network, a segment file storing a stream necessary for reproducing desired content, and feeds the acquired segment file to the data analysis and decoding section 216.

Note that the server from which the segment file acquisition section 214 acquires the segment file (e.g., the ISOBMFF file) may be the same as or different from the server from which the MPD file acquisition section 212 acquires the MPD file. In addition, the segment file acquisition section 214 may acquire a segment file based on the selection result of the stream fed from the MPD file processing section 213. In other words, the segment file acquisition section 214 may acquire a segment file storing a stream selected based on the MPD file or the like from the server.

The display control section 215 executes processing related to control of content reproduction (display). For example, the display control section 215 acquires detection results of the viewpoint position and the line-of-sight direction of the user viewing and listening to the content, and feeds the acquired detection results (information on the viewpoint position and the line-of-sight direction of the user) to the MPD file processing section 213 and the data analysis and decoding section 216.

The data analysis and decoding section 216 performs processing related to, for example, analysis and decoding of data. For example, the data analysis and decoding section 216 processes the ISOBMFF file fed from the segment file acquisition section 214 to generate a display image of the content, and feeds data related to the display image to the display section 217.

As shown in fig. 12, the data analysis and decoding section 216 includes a segment file processing section 221, a decoding section 222, and a display information generating section 223.

The segment file processing section 221 performs processing on segment files (for example, ISOBMFF files). For example, the segment file processing section 221 extracts encoded data of a desired stream from the ISOBMFF file fed by the segment file acquisition section 214, and feeds the extracted encoded data to the decoding section 222.

Note that the segment file processing section 221 may select a stream based on information on the viewpoint position and the line-of-sight direction of the user fed from the display control section 215 or the transmission band measured by the measurement section 211, and extract encoded data of the stream from the segment file.

The decoding section 222 executes processing related to decoding. For example, the decoding section 222 decodes the encoded data fed from the segment file processing section 221, and feeds the decoded stream to the display information generating section 223.

The display information generation section 223 executes processing related to generation of display image data. For example, the display information generation section 223 generates data of a display image corresponding to the viewpoint position and the line-of-sight direction of the user, based on the information related to the viewpoint position and the line-of-sight direction fed from the display control section 215 and the stream fed from the decoding section 222, and feeds the generated data to the display section 217.

The display section 217 includes any display device, for example, a display including a liquid crystal display panel or the like, or a projector, and performs processing related to image display using the display device. For example, the display section 217 performs content reproduction, such as image display, based on the data fed from the display information generation section 223.

<2. First embodiment>

<Display area information about a sub-picture signaled in ISOBMFF>

The above information relating to the sub-picture display area may be signaled in an ISOBMFF file corresponding to the segment file.

In other words, a file may be generated that includes, as information different from the arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to the stored sub-picture, and that also includes encoded image data resulting from encoding the sub-picture.

For example, in the file generating apparatus 100 serving as an information processing apparatus, the segment file generating section 123 functions as a file generating section that generates a file that includes, as information different from arrangement information of each picture region, region-related information relating to the region in the entire picture that corresponds to the stored sub-picture, and that also includes encoded image data resulting from encoding the sub-picture. In other words, the information processing apparatus (e.g., the file generating apparatus 100) may include a file generating section (e.g., the segment file generating section 123).

Thus, as described above, the client can easily select a stream based on the above information.

Note that in the ISOBMFF file, the stream is managed as a track. In other words, when using the ISOBMFF file, the selection of tracks results in the selection of streams.

In addition, the above-described picture may be all or a part of an omnidirectional video (a projection plane image obtained from projection and mapping of an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction). The omnidirectional video is an image in all directions around the viewpoint (i.e., a peripheral image viewed from the viewpoint). By rendering the omnidirectional video into a three-dimensional structure, the omnidirectional video may be formed as an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction. As described above, by mapping the three-dimensional structure image to a single plane to form a projection plane image, stream distribution control to which MPEG-DASH is applied can be performed. In other words, even in the case where the file generating apparatus 100 uses all or a part of such a projected planar image as the entire picture and divides the entire picture into sub-pictures, the present technology can be applied as described above. Note that even in the case where a part of the projection plane image is used as the entire picture, information about the display area of the sub picture in the entire projection plane image is signaled.

For example, as shown in fig. 13, an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction is projected on a three-dimensional structure (cube) by cube map projection (cube projection) to generate a three-dimensional structure image 301. In addition, the three-dimensional structure image 301 is mapped to a single plane by a predetermined method to generate a projection plane image (projection picture) 302. The file generation apparatus 100 divides the projection plane image 302 into sub-pictures (sub-pictures 303 to 308), and generates an ISOBMFF file in which the sub-pictures are stored in different tracks.

At this time, as indicated by an arrow 311, the file generating apparatus 100 signals information (display region information) indicating which sub-pictures correspond to which part of the entire picture (projection plane image 302) in the ISOBMFF file.

Therefore, even in the case of distributing omnidirectional video, the client can easily select a stream based on the information as described above.

Note that region-related information (display region information) may be included in the ISOBMFF file as information for each sub-picture. This enables the client to easily know which part of the whole picture corresponds to the sub-picture by simply referring to the information in the sub-picture track.

<Upload processing procedure>

An example of the upload processing procedure executed by the file generating apparatus 100 in fig. 11 in the above-described case will be described with reference to the flowchart in fig. 14.

When the upload process is started, in step S101, the data input section 111 of the file generating apparatus 100 acquires an image and metadata.

In step S102, the segment file generating section 123 generates an ISOBMFF file including display region information on the display region in the projected picture as information for each sub-picture.

In step S103, the recording section 114 records the ISOBMFF file generated by the processing in step S102.

In step S104, the upload section 115 reads the ISOBMFF file recorded in step S103 from the recording section 114, and uploads the ISOBMFF file to the server.

When the processing in step S104 ends, the upload processing ends.

By the upload process performed as described above, the file generation apparatus 100 can generate an ISOBMFF file including display area information about a display area in a projection picture as information of each sub-picture.

Accordingly, the client can easily select and reproduce an appropriate stream corresponding to the user's field of view or the like based on the above information.

< utilization of sub-picture display area information signaled in ISOBMFF >

In addition, the selection and reproduction of the stream may be performed by using information related to the sub-picture display area signaled in the ISOBMFF file.

In other words, a file including region-related information related to a region in the entire picture corresponding to the stored sub-picture as information different from the arrangement information of each picture region and also including image encoded data resulting from encoding the sub-picture may be acquired, and a stream of the image encoded data may be selected based on the region-related information included in the acquired file.

For example, in the client apparatus 200 serving as an information processing apparatus, the segment file acquisition section 214 functions as a file acquisition section that acquires a file that includes region-related information relating to a region in the entire picture corresponding to the stored sub-picture as information different from arrangement information of each picture region and that also includes image encoded data generated by encoding the sub-picture, and the data analysis and decoding section 216 may function as an image processing section that selects a stream of the image encoded data based on the region-related information included in the file acquired by the file acquisition section. In other words, the information processing apparatus (e.g., the client apparatus 200) may include a file acquisition section (e.g., the segment file acquisition section 214) and an image processing section (e.g., the data analysis and decoding section 216).

This allows the client device 200 to more easily select a stream.

Note that the above-described picture (entire picture) may be all or a part of an omnidirectional video (projection plane image resulting from projection and mapping of an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction). In other words, even in the case where the client apparatus 200 uses all or a part of the projected planar image as the entire picture, divides the entire picture into sub-pictures forming a stream, and reproduces the image, the present technology can be applied as described above.

In addition, region-related information (display region information) may be included in the ISOBMFF file as information of each sub-picture. This enables the client device 200 to easily know which part of the entire picture corresponds to the sub-picture by simply referring to information in the sub-picture track.

< procedure of content reproduction processing >

An example of the procedure of the content reproduction processing executed by the client apparatus 200 in the above-described case will be described with reference to the flowchart in fig. 15.

When the content reproduction process is started, in step S121, the segment file acquisition section 214 of the client device 200 acquires an ISOBMFF file that includes display region information about a display region in a projection picture as information for each sub-picture.

In step S122, the display control section 215 acquires the measurement result of the viewpoint position (and the line-of-sight direction) of the user.

In step S123, the measurement section 211 measures the transmission bandwidth of the network between the server and the client apparatus 200.

In step S124, the segment file processing section 221 selects a sub-picture track corresponding to the field of view of the user of the client apparatus 200, based on the display area information on the display area of the sub-picture in the projected picture.

In step S125, the segment file processing section 221 extracts encoded data of the stream in the track selected in step S124 from the ISOBMFF file acquired in step S121.

In step S126, the decoding section 222 decodes the encoded data of the stream extracted in step S125.

In step S127, the display information generating section 223 reproduces the stream (content) resulting from the decoding performed in step S126. More specifically, the display information generating section 223 generates data of a display image from the stream, and feeds the data to the display section 217 for display.

When the processing in step S127 ends, the content reproduction processing ends.

The content reproduction processing performed as described above enables the client apparatus 200 to more easily select a stream using the information about the sub-picture display area included in the ISOBMFF file. For example, based on this information, the client device 200 can easily select an appropriate stream corresponding to the user's field of view.

<Definition of 2D coverage information box>

As described above, the segment file generating section 123 of the file generating apparatus 100 newly defines display area information on the sub-picture, which indicates which portion of the projection picture each displayed sub-picture corresponds to, and signals the display area information in the track. In other words, the segment file generating section 123 defines the display area information on the sub-picture as information for each sub-picture.

For example, the segment file generating section 123 defines a 2D coverage information box as the display area information on the sub-picture, and signals the 2D coverage information box as a box different from the region-by-region packing box. For example, the segment file generating section 123 defines the 2D coverage information box in the scheme information box. Alternatively, the segment file generating section 123 defines the 2D coverage information box in the projected omnidirectional video box below the scheme information box. In addition, the segment file generating section 123 may define the 2D coverage information box in any other box.

In other words, display area information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture stored in the track) may be stored in the scheme information box in the ISOBMFF file, which is different from the region-by-region packing box, or may be stored in a box that is different from the region-by-region packing box and is located at a lower layer of the scheme information box.

This enables the client apparatus 200 to easily select and reproduce a sub-picture track without parsing the sub-picture composition box.

Note that even in the case where the picture stored in each track is not a sub-picture, or where there is no region-by-region packing box (the picture is not packed region by region), the 2D coverage information box may be used to signal display region information.

The syntax 331 in fig. 16 represents an example of the syntax of the 2D coverage information box (2D Coverage Information Box). Fields such as proj_picture_width, proj_picture_height, proj_reg_width, proj_reg_height, proj_reg_top, and proj_reg_left are set in the 2D coverage information box, as shown in syntax 331.

The semantics 332 in fig. 17 represent an example of the semantics of the fields defined in the 2D coverage information box. As shown in the semantics 332, proj_picture_width indicates the width of the projection picture, and proj_picture_height indicates the height of the projection picture. proj_reg_width indicates the width of the region on the projected picture corresponding to the picture in the track. proj_reg_height indicates the height of the region on the projected picture corresponding to the picture in the track. proj_reg_top indicates the vertical coordinate of the region on the projected picture corresponding to the picture in the track. proj_reg_left indicates the horizontal coordinate of the region on the projected picture corresponding to the picture in the track.

In other words, various information as shown in fig. 18 is defined in the 2D coverage information box.

Note that each field may be expressed as an actual number of pixels, or proj_reg_width, proj_reg_height, proj_reg_top, and proj_reg_left may be expressed as relative values with respect to proj_picture_width and proj_picture_height. Representing each field as an actual number of pixels is useful in selecting a track according to the resolution of the display of the client.
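To make the two representations concrete, the following minimal Python sketch converts a region signaled in relative units into pixel units. This is illustrative only; the encoding of the relative values is an assumption made for the example, since the box fixes only the field names.

```python
def region_to_pixels(region: dict, picture_width_px: int,
                     picture_height_px: int, relative: bool) -> dict:
    """Return the region in pixels. When `relative` is True, the proj_reg_*
    fields are interpreted relative to the signaled proj_picture_width and
    proj_picture_height (an assumed relative-value convention)."""
    if not relative:
        return region
    sx = picture_width_px / region["proj_picture_width"]
    sy = picture_height_px / region["proj_picture_height"]
    return {
        "proj_picture_width": picture_width_px,
        "proj_picture_height": picture_height_px,
        "proj_reg_width": round(region["proj_reg_width"] * sx),
        "proj_reg_height": round(region["proj_reg_height"] * sy),
        "proj_reg_top": round(region["proj_reg_top"] * sy),
        "proj_reg_left": round(region["proj_reg_left"] * sx),
    }
```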

By referring to the 2D coverage information box of the sub-picture track configured as described above, the client device 200 can easily identify the display area of the sub-picture track without parsing the sub-picture composition box. Thus, the client device 200 can easily select a sub-picture track, for example, according to the user's field of view. Note that the client apparatus 200 may select a track other than the sub-picture tracks by similar processing.
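As an illustration of this selection logic, the following Python sketch picks the sub-picture tracks whose display areas intersect the user's viewport on the projected picture. The helper names are hypothetical, and it assumes the 2D coverage information box of each track has already been parsed into plain integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Coverage2D:
    """Display area of one sub-picture track on the projected picture,
    as carried by the 2D coverage information box (figs. 16 and 17)."""
    proj_picture_width: int
    proj_picture_height: int
    proj_reg_width: int
    proj_reg_height: int
    proj_reg_top: int
    proj_reg_left: int

def overlaps(cov: Coverage2D, vp_left: int, vp_top: int,
             vp_width: int, vp_height: int) -> bool:
    """True if the track's region intersects the viewport, both given in
    pixel coordinates of the projected picture."""
    return not (cov.proj_reg_left + cov.proj_reg_width <= vp_left
                or vp_left + vp_width <= cov.proj_reg_left
                or cov.proj_reg_top + cov.proj_reg_height <= vp_top
                or vp_top + vp_height <= cov.proj_reg_top)

def select_tracks(tracks: Dict[int, Coverage2D],
                  viewport: Tuple[int, int, int, int]) -> List[int]:
    """Return the IDs of all sub-picture tracks needed to cover the viewport."""
    left, top, width, height = viewport
    return [tid for tid, cov in tracks.items()
            if overlaps(cov, left, top, width, height)]
```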

Alternatively, in the Sub Picture Composition Box shown in the syntax 21 of fig. 3, an identical_to_proj_pic_flag field may be additionally defined as shown in the syntax 1001 of fig. 73 to indicate whether the entire picture is the same as the projection picture, and the Sub Picture Region Box shown in the syntax 22 of fig. 4 may indicate display area information on the sub-picture track in the case where the entire picture is the same as the projection picture. For the value of the identical_to_proj_pic_flag field, a value of 0 indicates that the entire picture is different from the projection picture, and a value of 1 indicates that the entire picture is the same as the projection picture.

At this time, in the case where the identical_to_proj_pic_flag field is 1, the entire picture has not been subjected to the region-by-region packing processing, and the semantics of the track_x, track_y, track_width, track_height, composition_width, and composition_height fields shown in the semantics 23 in fig. 5 are respectively the same as those of the proj_reg_left, proj_reg_top, proj_reg_width, proj_reg_height, proj_picture_width, and proj_picture_height fields in the 2D coverage information box shown in the semantics 332 in fig. 17.
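A minimal sketch of this equivalence, following the field correspondence just described (illustrative only):

```python
def coverage_from_sub_picture_region_box(track_x: int, track_y: int,
                                         track_width: int, track_height: int,
                                         composition_width: int,
                                         composition_height: int) -> dict:
    """When identical_to_proj_pic_flag == 1, the sub-picture region box
    fields can be read with the same semantics as the fields of the
    2D coverage information box."""
    return {
        "proj_reg_left": track_x,
        "proj_reg_top": track_y,
        "proj_reg_width": track_width,
        "proj_reg_height": track_height,
        "proj_picture_width": composition_width,
        "proj_picture_height": composition_height,
    }
```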

Note that the identical_to_proj_pic_flag field may be additionally defined in the sub-picture region box or any other box. Alternatively, the presence or absence of a particular box may indicate whether the entire picture is the same as the projected picture.

In addition, one bit of the 24-bit flags field common to the sub-picture composition box and other boxes extending FullBox may be used to indicate whether the entire picture is the same as the projection picture.

< case where the sub-picture includes a discontinuous region >

Note that, as shown in fig. 19, the syntax 331 in fig. 16 fails to handle the case where the sub-picture includes discontinuous regions. In the example of fig. 19, the projected picture 351 is divided into sub-pictures 352 to 355. In this case, the sub-picture 352 includes the left plane and the right plane of the three-dimensional structure image (which are placed adjacent to each other in the sub-picture). In the projected picture 351, the left plane and the right plane are not continuous with each other. In addition, the sub-picture 353 includes the top plane and the bottom plane of the three-dimensional structure image (which are placed adjacent to each other in the sub-picture). In the projected picture 351, the top plane and the bottom plane are not continuous with each other.

The syntax 331 in fig. 16 can only specify one rectangular region in the projection picture, and thus cannot specify a plurality of discontinuous regions as described above.

Accordingly, the 2D coverage information box may be allowed to specify a plurality of regions, so that a plurality of discontinuous regions in the projected picture can be specified.

The syntax 371 in fig. 20 represents an example of the syntax of the 2D coverage information box in this case. As shown in syntax 371, in this case, a num_regions field is added to the defined fields. The semantics 372 in fig. 21 represent an example of the semantics of the fields added to the 2D coverage information box in this case. As shown by the semantics 372, num_regions indicates the number of regions on the projection picture included in the sub-picture.

In other words, in this case, the num_regions field allows the fields of the 2D coverage information box shown in fig. 17 to be defined (independently) for each region of the projection picture. Thus, a plurality of regions of the projection picture can be specified. This enables signaling of discontinuous display areas of the projected picture.
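For illustration, a Python sketch of parsing such a multi-region payload follows. The field widths (32-bit picture and region fields, 16-bit num_regions) are assumptions made for the example, since the figure fixes only the field names.

```python
import struct

def parse_2d_coverage_payload(payload: bytes) -> dict:
    """Sketch of parsing the multi-region variant of the 2D coverage
    information box (syntax 371 in fig. 20), under assumed field widths."""
    proj_w, proj_h, num_regions = struct.unpack_from(">IIH", payload, 0)
    offset = 10
    regions = []
    for _ in range(num_regions):
        w, h, top, left = struct.unpack_from(">IIII", payload, offset)
        offset += 16
        regions.append({"proj_reg_width": w, "proj_reg_height": h,
                        "proj_reg_top": top, "proj_reg_left": left})
    return {"proj_picture_width": proj_w, "proj_picture_height": proj_h,
            "regions": regions}
```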

Note that in the case where the 2D coverage information box is signaled in the sub-picture composition box, the display area in the entire picture (projection picture) may be signaled.

In addition, in the case where there is no 2D coverage information box in the projected omnidirectional video box in the track, this may indicate that the track stores a 360-degree omnidirectional video. Similarly, in the case where there is no 2D coverage information box in the sub-picture composition box, this may indicate that the entire picture formed by the sub-picture tracks is a 360-degree omnidirectional video.

<Extension of region-by-region packing box>

The region-by-region packing structure in the region-by-region packing box defined in the OMAF may be extended to signal which display area of the projected picture the sub-picture in the track corresponds to. In the sample entry of the sub-picture track, the region-by-region packing box is signaled below the projected omnidirectional video box. Note that the region-by-region packing box may be signaled at any other location.

For example, a flag signaling display area information on a sub-picture is newly defined, a Rect Projected Region structure signaling display area information on a sub-picture is newly defined, and the display area information is signaled in the region-by-region packing structure. Note that even in the case where the picture stored in the track is not a sub-picture, the display region information may be signaled using the region-by-region packing structure.

Syntax 373 in fig. 22 represents an example of the syntax of the region-by-region packing structure in the above case. As shown in syntax 373, in this case, a 2D_coverage_flag field is added to the fields defined in the region-by-region packing structure. The semantics 374 in fig. 23 represent an example of the semantics of the fields additionally defined in the region-by-region packing structure in this case. As shown by the semantics 374, 2D_coverage_flag is flag information indicating whether only the display area on the projection picture is signaled. For example, a field value of 0 indicates that the region-by-region packing information is signaled. A field value of 1 indicates that the display area on the projection picture is signaled.

Note that in this case, a RectProjectedRegion is also defined in the region-by-region packing structure. Syntax 375 in fig. 24 represents an example of the syntax of the RectProjectedRegion. As shown in syntax 375, fields such as proj_reg_width[i], proj_reg_height[i], proj_reg_top[i], and proj_reg_left[i] are defined in the RectProjectedRegion.

The semantics 376 in fig. 25 represent an example of the semantics of the fields defined in the RectProjectedRegion. As shown by the semantics 376, proj_reg_width indicates the width of the region on the projected picture corresponding to the picture in the track. proj_reg_height indicates the height of the region on the projected picture corresponding to the picture in the track. proj_reg_top indicates the vertical coordinate of the region on the projected picture corresponding to the picture in the track. proj_reg_left indicates the horizontal coordinate of the region on the projected picture corresponding to the picture in the track.

Note that the above fields may be indicated by the actual number of pixels, or proj_reg_width, proj_reg_height, proj_reg_top, and proj_reg_left may be indicated by relative values with respect to the proj_picture_width and proj_picture_height signaled in the region-by-region packing structure.

In addition, the RegionWisePackingStruct may be extended such that, when 2D_coverage_flag is 1, only the display area information on the projection picture is signaled.

Syntax 377 in fig. 26 represents an example of the syntax of the RegionWisePackingStruct in the above case. Syntax 378 in fig. 27 represents an example of the syntax of the RectRegionPacking in the RegionWisePackingStruct in this case.

<Extension of coverage information box>

The coverage information box, defined in the OMAF as indicating the display area of the track on the spherical surface, may be extended to enable the display area on the projection picture to be signaled by a newly defined 2D content coverage structure.

In other words, display area information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture stored in the track) may be stored in the coverage information box indicating the display area of the track on the spherical surface.

Syntax 379 in fig. 28 represents an example of the syntax of the extended coverage information box. As shown in syntax 379, in this case, 2D_coverage_flag, ContentCoverageStruct(), and 2DContentCoverageStruct() are defined in the coverage information box.

The semantics 380 in fig. 29 represent an example of the semantics of the fields described above. As shown in the semantics 380, 2D_coverage_flag is flag information that signals the type of display region information. A value of 0 indicates that the spherical surface display area information is signaled, and a value of 1 indicates that the display area on the projection picture is signaled. ContentCoverageStruct() signals the display area of the track on the spherical surface. 2DContentCoverageStruct() signals the display area of the track on the projection picture. The fields in the 2D content coverage structure are similar to the fields in the 2D coverage information box in the case of fig. 20.

Note that the content coverage structure may be extended to signal the display area on the projected picture in addition to the spherical surface display area.

< Signaling in case where the sub-picture division method dynamically changes >

The signaling in the case where the sub-picture division method does not dynamically change has been described. On the other hand, in the case where the division method dynamically changes, display region information on the display region of the sub-picture in the projection picture dynamically changes. The above example cannot handle this situation.

Therefore, an example of additional signaling for the dynamically changing display region information on the display region of the sub-picture will be described below. Note that the signaled information is the same as that signaled in the 2D coverage information box described above (e.g., fig. 16).

< Supplemental Enhancement Information (SEI) message >

For HEVC or AVC, a 2D coverage information SEI message may be newly defined, and display area information on the sub-picture, which dynamically changes within the stream, may be signaled in units of access units.

In other words, display area information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture stored in the track) may be stored in a supplemental enhancement information message in the ISOBMFF file.

Syntax 381 in fig. 30 represents an example of the syntax of the 2D coverage information SEI message in the above case. As shown in syntax 381, the following are set in the 2D coverage information SEI message: 2D_coverage_information_cancel_flag, 2D_coverage_information_persistence_flag, 2D_coverage_information_reserved_zero_6bits, proj_picture_width, proj_picture_height, num_regions, proj_reg_width[i], proj_reg_height[i], proj_reg_top[i], proj_reg_left[i], and the like.

The semantics 382 in fig. 31 represent an example of the semantics of the fields defined in the 2D coverage information SEI message. As shown in the semantics 382, 2D_coverage_information_cancel_flag is flag information related to cancellation of the 2D coverage information. A value of 1 cancels the persistent application of any preceding SEI in output order. A value of 0 indicates that 2D coverage information is signaled.

2D_coverage_information_persistence_flag is flag information related to the application range of the SEI. A value of 0 applies the SEI information only to the picture including the SEI. A value of 1 keeps the SEI applied until a new coded video sequence is started or the end of the stream is reached.

2D_coverage_information_reserved_zero_6bits is padded with 0s. proj_picture_width indicates the width of the projection picture. proj_picture_height indicates the height of the projection picture. num_regions indicates the number of regions on the projected picture. proj_reg_width indicates the width of the region on the projection picture corresponding to the stream. proj_reg_height indicates the height of the region on the projection picture corresponding to the stream. proj_reg_top indicates the vertical coordinate of the region on the projection picture corresponding to the stream. proj_reg_left indicates the horizontal coordinate of the region on the projection picture corresponding to the stream.

Note that each of the above fields may be indicated by the actual number of pixels, or proj_reg_width, proj_reg_height, proj_reg_top, and proj_reg_left may be indicated by relative values with respect to proj_picture_width and proj_picture_height.
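The cancel/persistence semantics can be summarized by the following Python sketch, a simplified model rather than a normative decoder; the field names are shortened for readability.

```python
def update_active_coverage(active, sei):
    """Return (coverage applied to the current picture, new persistent state).
    `active` is the coverage persisting from earlier pictures in output order;
    `sei` is the parsed 2D coverage information SEI of this access unit,
    or None if the access unit carries no such SEI."""
    if sei is None:
        return active, active   # an earlier persistent SEI still applies
    if sei["cancel_flag"] == 1:
        return None, None       # cancel persistent application of earlier SEI
    if sei["persistence_flag"] == 1:
        return sei, sei         # applies until a new CVS or the end of stream
    return sei, active          # applies to the current picture only
```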

< timing metadata >

In addition, a mechanism of timing metadata, corresponding to a stream storing chronologically varying metadata, may be used to newly define 2D coverage information timing metadata, and display area information on the sub-picture, which dynamically changes within the reference stream, may be signaled. For example, '2dco' is used as the track reference type for the track associated with the 2D coverage information timing metadata.

In other words, display area information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture stored in the track) may be stored in timing metadata in the ISOBMFF file.

The use of timing metadata enables the client to identify the dynamically changing display region without decoding the sub-picture stream and to use the display region as a reference for stream selection.

The syntax 383 in fig. 32 represents an example of the syntax of the 2D coverage information sample entry (2D Coverage Information SampleEntry). The syntax 384 in fig. 33 represents an example of the syntax of the 2D coverage information sample (2D Coverage Information Sample).

In the 2D coverage information sample entry, proj_picture_width and proj_picture_height, which are normally invariant within the stream, are signaled. Note that in the case where proj_picture_width and proj_picture_height vary within the stream, they may be signaled in the 2D coverage information sample.

Note that the semantics of the fields in the 2D coverage information sample entry and the 2D coverage information sample are similar to those in fig. 17 and 21.
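As an illustration, once the timed-metadata samples have been parsed into (time, display area) pairs (a hypothetical in-memory structure for this example), looking up the display area in effect at a given presentation time is a simple search:

```python
import bisect

def coverage_at(sample_times, sample_coverages, t):
    """`sample_times` is a sorted list of sample presentation times, and
    `sample_coverages` holds the display areas parsed from the corresponding
    2D coverage information samples; returns the area in effect at time t."""
    i = bisect.bisect_right(sample_times, t) - 1
    return sample_coverages[i] if i >= 0 else None
```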

< sample group >

By using a tool called a sample group, which is a mechanism for associating meta-information in units of samples, display area information on the sub-picture dynamically changing within the stream can be signaled in units of samples.

As shown in fig. 34, a sample group in which meta-information is described is signaled as a group entry (Group Entry) in the sample group description box (Sample Group Description Box) of the sample table box (Sample Table Box), and the sample group is associated with samples via the sample-to-group box (Sample To Group Box).

As shown in fig. 34, the grouping_type of the sample-to-group box indicates the grouping_type of the sample group description box to be associated. For each entry, sample_count, which indicates the number of samples belonging to the group entry, and group_description_index, which indicates the index of the associated group entry, are signaled.
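In other words, resolving the group entry for a given sample is a walk over the (sample_count, group_description_index) pairs, as in the following sketch:

```python
def group_entry_index_for_sample(sample_to_group, sample_index):
    """Resolve which group entry a sample belongs to (fig. 34).
    `sample_to_group` is the list of (sample_count, group_description_index)
    pairs from the sample-to-group box; `sample_index` is 0-based.
    Returns the 1-based group_description_index, or 0 if the sample
    belongs to no group."""
    remaining = sample_index
    for sample_count, group_description_index in sample_to_group:
        if remaining < sample_count:
            return group_description_index
        remaining -= sample_count
    return 0
```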

For example, a 2D coverage information sample group entry may be newly defined, and display area information on the sub-picture dynamically changing within the stream may be stored in the 2D coverage information sample group entry.

In other words, display area information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture stored in the track) may be stored in a sample group entry in the ISOBMFF file.

The syntax 391 in fig. 35 represents an example of the syntax of the 2D coverage information sample group entry (2D Coverage Information Sample Group Entry). As described above, the sample group entry (Sample Group Entry) is signaled in the sample group description box, and the sample-to-group box associates samples with the sample group entry. The grouping_type is '2cgp'.

Note that the semantics of the fields in the 2D coverage information sample group entry are similar to those in fig. 16 and fig. 21.

Note that the above-described three examples (the supplemental enhancement information (SEI) message, the timing metadata, and the sample group) may be used to signal dynamically changing display region information even in the case where the picture stored in the track is not a sub-picture.

In addition, in the case where the display area of the sub-picture on the projection picture dynamically changes as described above, the 2D coverage information box signaled in the projected omnidirectional video box may include the initial values of the display area of the stream.

In addition, a flag indicating that the display area of the sub-picture on the projection picture dynamically changes within the stream may be signaled in the 2D coverage information box or any other box. This information enables the client to easily identify a stream whose display area dynamically changes.

<3. Second embodiment>

< signaling of information related to the display area of a sub-picture in an MPD file >

The MPD file may be used to signal information related to the display area of the sub-picture. In other words, in order to enable the client to select and reproduce an adaptation set referring to a sub-picture according to the field of view of the user, display region information on the display region of the sub-picture on the projected picture may be newly defined in the MPD file and signaled in the adaptation set.

In other words, a control file that manages image encoded data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded, and that includes, as information different from arrangement information of each picture region, region-related information related to a region in the entire picture corresponding to the sub-picture, the control file being for controlling distribution of the image encoded data, may be generated.

For example, in the file generating apparatus 100 serving as an information processing apparatus, the MPD file generating section 113 may serve as a file generating section that generates a control file that manages image encoded data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded, and the control file includes, as information different from arrangement information of each picture region, region-related information related to a region in the entire picture corresponding to the sub-picture, the control file being used to control distribution of the image encoded data. In other words, the information processing apparatus (e.g., the file generation apparatus 100) may include a file generation section (e.g., the MPD file generation section 113).

This enables the client to more easily select a stream based on the information as described above.

Note that in the MPD file, metadata of each stream is managed as an Adaptation Set (Adaptation Set) or Representation (replication). In other words, in the case of using an MPD file, a stream is selected by selecting an adaptation set or representation.

In addition, the above-described picture (entire picture) may be all or a part of an omnidirectional video (projection plane image resulting from projection and mapping of an image extending 360 degrees around the horizontal direction and 180 degrees around the vertical direction). In other words, in the case where the file generating apparatus 100 uses all or a part of such a projection plane image as the entire picture and divides the entire picture into sub-pictures, the present technology can be applied as described above.

Therefore, even in the case of distributing omnidirectional video, as described above, the client can more easily select a stream based on the information.

Note that the area-related information (display area information) may be included in the MPD file as information for each sub-picture. This enables the client to easily know which part of the entire picture corresponds to the sub-picture by referring to the information about the sub-picture referred to by the adaptation set.

<Upload processing procedure>

An example of the upload processing procedure executed by the file generating apparatus 100 in fig. 11 in the above-described case will be described with reference to the flowchart in fig. 36.

When the upload process is started, in step S201, the data input section 111 of the file generation apparatus 100 acquires an image and metadata.

In step S202, the segment file generating section 123 generates a segment file of the image.

In step S203, the MPD file generating section 113 generates an MPD file including display region information on a display region in a projected picture as information for each sub-picture.

In step S204, the recording section 114 records the segment file generated by the processing in step S202. In addition, the recording section 114 records the MPD file generated by the processing in step S203.

In step S205, the uploading section 115 reads the segment file recorded in step S204 from the recording section 114, and uploads the segment file to the server. In addition, the uploading section 115 reads the MPD file recorded in step S204 from the recording section 114, and uploads the MPD file to the server.

When the processing in step S205 ends, the upload processing ends.

The upload processing performed as described above enables the file generation apparatus 100 to generate an MPD file that includes display area information relating to a display area in a projected picture as information for each sub-picture.

Accordingly, the client can more easily select and reproduce an appropriate stream corresponding to, for example, the field of view of the user based on the display area information.

< utilization of information related to display area of sub-picture and signaled in MPD file >

In addition, information related to the display area of the sub-picture and signaled in the MPD file may be used to select a stream.

In other words, a control file may be acquired that manages image encoded data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded, and that includes, as information different from the arrangement information of each picture region, region-related information related to a region in the entire picture corresponding to the sub-picture, the control file being used to control distribution of the image encoded data; a stream of the image encoded data may then be selected based on the region-related information included in the acquired control file.

For example, in the client apparatus 200 serving as an information processing apparatus, the MPD file acquisition section 212 may serve as a control file acquisition section that acquires a control file that manages image encoded data of each of a plurality of sub-pictures, wherein an entire picture is divided into a plurality of sub-pictures and then encoded, the control file includes, as information different from arrangement information of each picture region, region-related information related to a region in the entire picture corresponding to the sub-picture, the control file is used to control distribution of the image encoded data, and the MPD file processing section 213 may serve as an image processing section that selects a stream of the image encoded data based on the region-related information included in the control file acquired by the file acquisition section. In other words, the information processing apparatus (e.g., the client apparatus 200) may include a file acquisition section (e.g., the MPD file acquisition section 212) and an image processing section (e.g., the MPD file processing section 213).

This enables the client apparatus 200 to easily select a stream.

Note that the above-described picture (entire picture) may be all or a part of an omnidirectional video (a projection plane image obtained by projection and mapping of an image extending 360 degrees in the horizontal direction and 180 degrees in the vertical direction). In other words, even in the case where the client apparatus 200 uses all or a part of the projection plane image as the entire picture, divides the entire picture into sub-pictures forming a stream, and reproduces the image, the present technology can be applied as described above.

In addition, the region-related information (display region information) may be included in the MPD file as information for each sub-picture. This enables the client apparatus 200 to easily know what portion of the entire picture corresponds to the sub-picture simply by referring to the information about the sub-picture referred to by the adaptation set.

< content reproduction processing procedure >

An example of the content reproduction processing procedure executed by the client apparatus 200 in the above-described case will be described with reference to the flowchart in fig. 37.

When the content reproduction process is started, in step S221, the MPD file acquisition section 212 of the client apparatus 200 acquires an MPD file including display region information on a display region in a projected picture as information for each sub-picture.

In step S222, the display control section 215 acquires the measurement result of the viewpoint position (and the line-of-sight direction) of the user.

In step S223, the measurement section 211 measures the transmission bandwidth of the network between the server and the client apparatus 200.

In step S224, the MPD file processing section 213 selects an adaptation set that refers to a sub-picture corresponding to the field of view of the user of the client apparatus 200, based on display area information about the display area of the sub-picture in the projected picture.

In step S225, the MPD file processing section 213 selects, from the adaptation set selected in step S224, a representation corresponding to the viewpoint position and the line-of-sight direction of the user, the transmission bandwidth of the network between the client and the server, and the like.

In step S226, the segment file acquisition section 214 acquires the segment file corresponding to the representation selected in step S225.

In step S227, the segment file processing section 221 extracts encoded data from the segment file acquired in step S226.

In step S228, the decoding section 222 decodes the encoded data of the stream extracted in step S227.

In step S229, the display information generating section 223 reproduces the stream (content) resulting from the decoding in step S228. More specifically, the display information generating section 223 generates data of a display image from the stream, and feeds the data of the display image to the display section 217 to cause the display section 217 to display the display image.

When the processing in step S229 ends, the content reproduction processing ends.

The content reproduction processing performed as described above enables the client apparatus 200 to more easily select a stream using information about a sub-picture display area included in the MPD file. For example, based on this information, the client device 200 may easily select an appropriate stream corresponding to the user's field of view.

<Definition of 2D coverage information descriptor>

As described above, the MPD file generating section 113 of the file generating apparatus 100 newly defines and signals display area information on the sub-picture, which indicates which portion of the projection picture the sub-picture referred to by the adaptation set corresponds to. In other words, the MPD file generating section 113 defines the display area information on the sub-picture as information for each sub-picture.

For example, the MPD file generating section 113 defines a 2D coverage information descriptor as display region information on a sub-picture, and signals the 2D coverage information descriptor as a descriptor different from the region-by-region packing descriptor. For example, the MPD file generating section 113 defines the supplemental property with @schemeIdUri="urn:mpeg:omaf:2017:2dco" as the 2D coverage information descriptor. Note that the MPD file generating section 113 may define the 2D coverage information descriptor using an essential property with the same schemeIdUri.

In other words, the image encoded data of each sub-picture may be managed for each adaptation set, the arrangement information of each picture region may be stored in a region-by-region packing descriptor, and display region information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture referred to by the adaptation set) may be defined in a supplemental property or an essential property in the MPD file.

Note that DASH clients that do not support the schemeIdUri of an essential property must ignore the adaptation set (or, alternatively, the representation or the like) in which the property is written. In addition, DASH clients that do not support the schemeIdUri of a supplemental property may ignore the property value and utilize the adaptation set (or, alternatively, the representation or the like).

Note that the 2D coverage information descriptor may be present not only in the adaptation set but also in the MPD or the representation. In addition, the 2D coverage information descriptor is applicable even in the case where the picture referred to by the adaptation set is not a sub-picture, or where the picture referred to by the adaptation set is not subjected to the region-by-region packing processing.

The attribute value 411 in fig. 38 represents an example of the attribute values of the 2D coverage information descriptor. As shown by the attribute value 411, omaf:@proj_picture_width has the data type xs:unsignedInt and indicates the width of the projected picture. omaf:@proj_picture_height has the data type xs:unsignedInt and indicates the height of the projected picture. omaf:@proj_reg_width has the data type xs:unsignedInt and indicates the width of the region on the projection picture corresponding to the picture referred to by the adaptation set. omaf:@proj_reg_height has the data type xs:unsignedInt and indicates the height of the region on the projection picture corresponding to the picture referred to by the adaptation set. omaf:@proj_reg_top has the data type xs:unsignedInt and indicates the vertical coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set. omaf:@proj_reg_left has the data type xs:unsignedInt and indicates the horizontal coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set.

Each of the above attribute values may be indicated by an actual number of pixels, or omaf:@proj_reg_width, omaf:@proj_reg_height, omaf:@proj_reg_top, and omaf:@proj_reg_left may be indicated by relative values with respect to omaf:@proj_picture_width and omaf:@proj_picture_height.
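As an illustration, the following Python sketch reads this descriptor from an MPD with the standard library XML parser. The serialization of the omaf:@... attributes as namespaced XML attributes under the urn:mpeg:omaf:2017 namespace is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
OMAF_NS = "urn:mpeg:omaf:2017"   # assumed namespace URI for the omaf: prefix

def parse_2d_coverage_descriptor(adaptation_set: ET.Element):
    """Return the display area signaled by the 2D coverage information
    descriptor of an AdaptationSet, or None if the descriptor is absent."""
    for prop in adaptation_set.findall(f"{{{MPD_NS}}}SupplementalProperty"):
        if prop.get("schemeIdUri") != "urn:mpeg:omaf:2017:2dco":
            continue
        fields = ("proj_picture_width", "proj_picture_height",
                  "proj_reg_width", "proj_reg_height",
                  "proj_reg_top", "proj_reg_left")
        return {name: int(prop.get(f"{{{OMAF_NS}}}{name}")) for name in fields}
    return None
```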

In addition, information indicating that the entire picture is the same as the projection picture may be defined in a supplemental property or an essential property in the MPD file. For example, the MPD file generating section 113 defines the supplemental property with @schemeIdUri="urn:mpeg:omaf:2017:prid" shown in fig. 74 as the projected picture identity descriptor. For example, the presence of the descriptor in the adaptation set indicates that the entire picture including the sub-picture referred to by the adaptation set has not been subjected to the region-by-region packing processing and is the same as the projection picture.

At this time, the display area of the sub-picture referred to by the adaptation set in which the projected picture identity descriptor is present may be represented, for example, by the MPEG-DASH SRD (Spatial Relationship Description), which indicates the display area of each of two or more independently encoded regions obtained by division of the entire picture. Although not shown, the SRD indicates sub-picture division information in, for example, the same manner as the sub-picture region box shown in the syntax 22 in fig. 4.

At this time, although not shown, in the adaptation set in which the projected picture identity descriptor is present, the semantics of object_x, object_y, object_width, object_height, total_width, and total_height, the attribute values of the SRD, are identical to the semantics of omaf:@proj_reg_left, omaf:@proj_reg_top, omaf:@proj_reg_width, omaf:@proj_reg_height, omaf:@proj_picture_width, and omaf:@proj_picture_height, the individual attributes of the 2D coverage information descriptor shown in the attribute value 411 in fig. 38.

Note that the projected picture identity descriptor may be present not only in the adaptation set but also in the MPD or the representation, and information indicating that the entire picture is the same as the projection picture may be defined by any other descriptor, element, or attribute.

< case where the sub-picture includes a discontinuous region >

Note that in the above example, in the case where the sub-picture includes discontinuous regions on the projected picture, the display region information cannot be signaled. Thus, the 2D coverage information descriptor may be extended to handle the case where the sub-picture includes discontinuous regions on the projected picture.

The attribute value 412 in fig. 39 represents an example of the attribute values of the 2D coverage information descriptor in the above case. As shown by the attribute value 412, twoDCoverage is a container element with the data type omaf:twoDCoverageType. twoDCoverage@proj_picture_width has the data type xs:unsignedInt and indicates the width of the projection picture. twoDCoverage@proj_picture_height has the data type xs:unsignedInt and indicates the height of the projection picture.

twoDCoverage.twoDCoverageInfo has the data type omaf:twoDCoverageInfoType and is an element indicating region information about a region on the projection picture. Multiple elements may be signaled. twoDCoverage.twoDCoverageInfo@proj_reg_width has the data type xs:unsignedInt and indicates the width of the region on the projection picture corresponding to the picture referred to by the adaptation set. twoDCoverage.twoDCoverageInfo@proj_reg_height has the data type xs:unsignedInt and indicates the height of the region on the projection picture corresponding to the picture referred to by the adaptation set.

twoDCoverage.twoDCoverageInfo@proj_reg_top has the data type xs:unsignedInt and indicates the vertical coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set. twoDCoverage.twoDCoverageInfo@proj_reg_left has the data type xs:unsignedInt and indicates the horizontal coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set.

The data type 413 in fig. 40 represents an example of the definition of the data types of the 2D coverage information descriptor.

As described above, enabling signaling of a plurality of regions on the projection picture allows discontinuous display regions on the projection picture to be signaled.

<Extension of region-by-region packing descriptor>

The region-by-region packing descriptor defined in the OMAF may be extended to signal display region information about a display region of a sub-picture referred to by an adaptation set on a projection picture.

The attribute value 414 in fig. 41 represents an example of the attribute values of the region-by-region packing descriptor extended to signal the attribute value 411 in fig. 38. The data types are similar to those in fig. 40.

As shown by the attribute value 414, omaf:@packing_type has the data type omaf:OptionalListofUnsignedByte and indicates the packing type of the region-by-region packing. An attribute value of 0 indicates packing of a rectangular region.

omaf:@proj_picture_width has the data type xs:unsignedInt and indicates the width of the projected picture. omaf:@proj_picture_height has the data type xs:unsignedInt and indicates the height of the projected picture. omaf:@proj_reg_width has the data type xs:unsignedInt and indicates the width of the region on the projection picture corresponding to the picture referred to by the adaptation set. omaf:@proj_reg_height has the data type xs:unsignedInt and indicates the height of the region on the projection picture corresponding to the picture referred to by the adaptation set.

omaf:@proj_reg_top has the data type xs:unsignedInt and indicates the vertical coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set. omaf:@proj_reg_left has the data type xs:unsignedInt and indicates the horizontal coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set.

Each of the above attribute values may be indicated by an actual number of pixels, or omaf:@proj_reg_width, omaf:@proj_reg_height, omaf:@proj_reg_top, and omaf:@proj_reg_left may be indicated by relative values with respect to omaf:@proj_picture_width and omaf:@proj_picture_height.

The attribute value 415 in fig. 42 represents an example of the attribute values of the region-by-region packing descriptor extended to signal the attribute value 412 in fig. 39 (i.e., the attribute values of the region-by-region packing descriptor handling the case where discontinuous regions are included). The data types are similar to those in fig. 40.

As shown by the attribute value 415, omaf:@packing_type has the data type omaf:OptionalListofUnsignedByte and indicates the packing type of the region-by-region packing. An attribute value of 0 indicates packing of a rectangular region.

TwoDCoverage is a container element with the data type omaf:twoDCoverageType. TwoDCoverage@proj_picture_width has the data type xs:unsignedInt and indicates the width of the projection picture. TwoDCoverage@proj_picture_height has the data type xs:unsignedInt and indicates the height of the projection picture. TwoDCoverage.TwoDCoverageInfo has the data type omaf:twoDCoverageInfoType and is an element indicating region information about a region on the projection picture. Multiple elements may be signaled.

TwoDCoverage.TwoDCoverageInfo@proj_reg_width has the data type xs:unsignedInt and indicates the width of the region on the projection picture corresponding to the picture referred to by the adaptation set. TwoDCoverage.TwoDCoverageInfo@proj_reg_height has the data type xs:unsignedInt and indicates the height of the region on the projection picture corresponding to the picture referred to by the adaptation set.

TwoDCoverage.TwoDCoverageInfo@proj_reg_top has the data type xs:unsignedInt and indicates the vertical coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set. TwoDCoverage.TwoDCoverageInfo@proj_reg_left has the data type xs:unsignedInt and indicates the horizontal coordinate of the region on the projection picture corresponding to the picture referred to by the adaptation set.

<Extension of content coverage descriptor>

In addition, the content coverage descriptor, defined in the OMAF as indicating the display area of the adaptation set on the spherical surface, may be extended to enable signaling of the display area on the projection picture.

In other words, the image encoded data of each sub-picture may be managed for each adaptation set, the arrangement information of each picture region may be stored in a region-by-region packing descriptor, and display region information on the sub-picture (region-related information related to a region in the entire picture corresponding to the sub-picture referred to by the adaptation set) may be defined in the content coverage descriptor in the MPD file, which indicates the display area of the adaptation set on the spherical surface.

The attribute value 416 in fig. 43 and the attribute value 417 in fig. 44 represent examples of the attribute values of the extended content coverage descriptor. In the case of the extended content coverage descriptor, the 2D_coverage_flag attribute is used to switch between signaling the spherical surface region and signaling the display region on the projection picture, similar to the case of the extended coverage information box in the ISOBMFF file described above.

Cc is a container element with omaf CCType, as shown by the attribute value 416. cc @2D _ coverage _ flag is flag information having a data type xs, bolean, and indicating whether a display area is defined on a spherical surface or on a projection picture. A value of 0 for this property indicates a definition on the spherical surface and a value of 1 indicates a definition on the projection picture.

Sphere Coverage is a container element with information of the spherical surface display area of data type omaf, sphere Coverage type. This element is present only when cc @2D _ coverage _ flag is 0. A spherical coverage @ shape _ type has a data type xs unsigned byte and indicates the shape of the spherical surface region. The attribute value of 0 indicates that the area is surrounded by four large circles. In addition, the attribute value of 1 indicates that the region is surrounded by two azimuth circles and two elevation angles.

Personal coverage @ view _ idc _ presence _ flag is flag information having a data type xs, boolean and indicating whether a view _ idc attribute exists. An attribute value of 0 indicates that the view _ idc attribute is not present, and an attribute value of 1 indicates that the view _ idc attribute is present.

Logical coverage @ default _ view _ idc has the data type omaf ViewType and indicates a view that is common to all regions. For example, the attribute value of 0 indicates that all regions included in the sub-picture have a view type of a single view (view _ idc). In addition, the attribute value of 1 indicates that all regions included in the sub-picture have a view type of left view (view _ idc). For example, the attribute value of 2 indicates that all regions included in the sub-picture have a view type of right view (view _ idc). In addition, the attribute value of 3 indicates that all regions included in the sub-picture have a view type (view _ idc) of a stereoscopic view. When cc @ view _ idc _ presence _ flag is 0, there must be an attribute value. In addition, when cc @ view _ idc _ presence _ flag is 1, the attribute is prohibited from being present.

coverageinfo is an element having the data type omaf coverageInfoType and indicating spherical surface region information. Multiple elements may be signaled.

cc.sphericalCoverage.coverageInfo@view_idc has data type omaf:ViewType and indicates the view of each region. For example, an attribute value of 0 indicates that the corresponding region has the view type mono (view_idc). An attribute value of 1 indicates that the corresponding region has the view type left view (view_idc). An attribute value of 2 indicates that the corresponding region has the view type right view (view_idc). An attribute value of 3 indicates that the corresponding region has the view type stereo (view_idc). This attribute must not be present when cc@view_idc_presence_flag is 0, and must be present when cc@view_idc_presence_flag is 1.

cc.sphericalCoverage.coverageInfo@center_azimuth has data type omaf:Range1 and indicates the azimuth of the center of the spherical surface display area. cc.sphericalCoverage.coverageInfo@center_elevation has data type omaf:Range2 and indicates the elevation of the center of the spherical surface display area. cc.sphericalCoverage.coverageInfo@center_tilt has data type omaf:Range1 and indicates the tilt angle of the center of the spherical surface display area. cc.sphericalCoverage.coverageInfo@azimuth_range has data type omaf:HRange and indicates the azimuth range through the center of the spherical surface display area. cc.sphericalCoverage.coverageInfo@elevation_range has data type omaf:VRange and indicates the elevation range through the center of the spherical surface display area.

cc.twoDCoverage is a container element having data type omaf:twoDCoverageType and carrying display area information on the projected picture. This element is present only when cc@2D_coverage_flag is 1.

cc.twoDCoverage@proj_picture_width has data type xs:unsignedInt and indicates the width of the projected picture. cc.twoDCoverage@proj_picture_height has data type xs:unsignedInt and indicates the height of the projected picture. cc.twoDCoverage.twoDCoverageInfo is an element having data type omaf:twoDCoverageInfoType and indicating region information on the projected picture. Multiple instances of this element may be signaled.

cc.twoDCoverage.twoDCoverageInfo@proj_reg_width has data type xs:unsignedInt and indicates the width of the region, on the projected picture, corresponding to the picture referred to by the adaptation set. cc.twoDCoverage.twoDCoverageInfo@proj_reg_height has data type xs:unsignedInt and indicates the height of the region, on the projected picture, corresponding to the picture referred to by the adaptation set.

cc.twoDCoverage.twoDCoverageInfo@proj_reg_top has data type xs:unsignedInt and indicates the vertical coordinate of the region, on the projected picture, corresponding to the picture referred to by the adaptation set. cc.twoDCoverage.twoDCoverageInfo@proj_reg_left has data type xs:unsignedInt and indicates the horizontal coordinate of the region, on the projected picture, corresponding to the picture referred to by the adaptation set.

Data type 418 in fig. 45, data type 419 in fig. 46, and data type 420 in fig. 47 represent examples of definitions of the data types of the extended content coverage descriptor.
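
As a concrete illustration, the following is a minimal sketch of how the extended descriptor might appear in an MPD, assuming the OMAF content coverage scheme URI urn:mpeg:mpegI:omaf:2017:cc and the element and attribute names defined above; all numeric values are chosen arbitrarily, and the attribute names follow this description even where (as with 2D_coverage_flag) they would need adjustment to be schema-valid XML.

  <AdaptationSet id="1">
    <SupplementalProperty schemeIdUri="urn:mpeg:mpegI:omaf:2017:cc">
      <cc 2D_coverage_flag="1">
        <twoDCoverage proj_picture_width="3840" proj_picture_height="1920">
          <twoDCoverageInfo proj_reg_width="1920" proj_reg_height="960"
                            proj_reg_top="0" proj_reg_left="0"/>
        </twoDCoverage>
      </cc>
    </SupplementalProperty>
  </AdaptationSet>

In this sketch, the adaptation set declares that the sub-picture it refers to corresponds to the top-left 1920x960 region of a 3840x1920 projected picture, so the client can determine the correspondence without parsing spherical surface region information.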

< Signaling in case where the sub-picture division method dynamically changes >

Note that, in a case where the display area on the projected picture changes dynamically within the stream, an additional flag may be signaled, in addition to the above signaling, in the 2D coverage information descriptor, the region-wise packing descriptor, and the content coverage descriptor to indicate that the display area changes dynamically within the stream.

<4. Third embodiment>

< signaling of stereoscopic information >

As described above in <Identification of stereoscopic information> in <1. Signaling of information related to a sub-picture>, in a case where the entire picture of stereoscopic omnidirectional video is divided into sub-pictures, stereoscopic information on the entire picture is signaled in a stereoscopic video box signaled under the sub-picture composition box, and stereoscopic information on the sub-picture is signaled in a stereoscopic video box under the scheme information box in the sample entry.

For example, in a case where the entire picture is a stereoscopic image, the following three modes are possible variations of the generated sub-pictures.

Fig. 48 is a diagram showing an example of the first mode of division into sub-pictures. In this case, the projected picture (entire picture) 431 is divided into sub-pictures 432 to 437. The projected picture 431 includes a side-by-side stereoscopic image. Each of the sub-pictures 432 to 437 includes a left view and a right view covering the same display area on the projected picture 431, and corresponds to a frame packing arrangement, such as top-and-bottom or side-by-side, that can be signaled in a stereoscopic video box.

Fig. 49 is a diagram showing an example of the second mode of division into sub-pictures. In this case, the projected picture (entire picture) 441 is divided into sub-pictures 442 to 446. The projected picture 441 includes a side-by-side stereoscopic image. Each of the sub-pictures 442 to 446 includes a left view and a right view, but the display areas of the two views on the projected picture 441 do not match. None of these sub-pictures corresponds to a frame packing arrangement, such as top-and-bottom or side-by-side, that can be signaled in a stereoscopic video box.

Fig. 50 is a diagram showing an example of the third mode of division into sub-pictures. In this case, the projected picture (entire picture) 451 is divided into a sub-picture 452 and a sub-picture 453. The projected picture 451 includes a side-by-side stereoscopic image. The sub-picture 452 is a monoscopic picture including only the left view, and the sub-picture 453 is a monoscopic picture including only the right view.

In the first mode, a stereoscopic video box is signaled under the sample entry/rinf/schi of the sub-picture track to signal the appropriate frame packing arrangement information. For example, in fig. 48, side-by-side is signaled. Note that frame packing arrangement information on the entire picture may be signaled instead of frame packing arrangement information on the sub-picture.

In the second mode, no stereoscopic video box is signaled under the sample entry/rinf/schi. Therefore, the client may be unable to identify whether the sub-picture is a monoscopic picture, or whether the sub-picture includes a left view and a right view to which no frame packing arrangement such as top-and-bottom or side-by-side is applied.

In the third mode, no stereoscopic video box is signaled under the sample entry/rinf/schi. Therefore, the client may be unable to identify whether the undivided entire picture is a monoscopic picture or a stereoscopic image. Whether scaling up is required during rendering depends on whether the undivided entire picture is a monoscopic picture or a stereoscopic image, so this inability to identify may prevent the client from performing appropriate rendering.

For example, as shown in fig. 50, for a sub-picture that includes only the left view of an entire picture that is a side-by-side stereoscopic image, rendering requires scaling up by a factor of two in the horizontal direction. In contrast, in a case where the entire picture is a monoscopic picture, this processing is unnecessary.

< signaling of stereoscopic information on the entire picture to be divided into sub-pictures in ISOBMFF >

Accordingly, signaling of stereoscopic information, including information related to stereoscopic display of the entire picture to be divided into sub-pictures, may be performed in the ISOBMFF file corresponding to the segment file.

In other words, image data of each of a plurality of sub-pictures, into which the entire picture is divided and then encoded, may each be stored in one track, and a file may be generated that includes stereoscopic information related to stereoscopic display of the entire picture.

For example, in the file generating apparatus 100 serving as an information processing apparatus, the segment file generating section 123 may function as a file generating section that generates a file in which image data on each of a plurality of sub-pictures, into which the entire picture is divided and then encoded, is stored in a respective track and which includes stereoscopic information related to stereoscopic display of the entire picture. In other words, the information processing apparatus (e.g., the file generating apparatus 100) may include a file generating section (e.g., the segment file generating section 123).

This enables the client to more easily select a stream based on the information as described above.

In addition, the above-described picture (entire picture) may be all or a part of an omnidirectional video (projection plane image resulting from projection and mapping of an image extending 360 degrees around the horizontal direction and 180 degrees around the vertical direction). In other words, in the case where the file generating apparatus 100 uses all or a part of the projection plane image as the entire picture and divides the entire picture into sub-pictures, the present technology can be applied as described above.

This enables the client to more easily select a stream based on the information as described above even in the case of distributing omnidirectional video.

Note that stereoscopic information on the entire picture may be included in the ISOBMFF file as information for each sub-picture. This enables the client to easily know stereoscopic information on the entire picture (e.g., whether the entire picture is a stereoscopic image and, if it is, what type of stereoscopic image it is) simply by referring to information on the sub-picture track.

< Procedure of upload processing >

An example of the procedure of the upload process executed by the file generating apparatus 100 in fig. 11 in the above-described case will be described with reference to the flowchart in fig. 51.

When the upload process is started, in step S301, the data input section 111 of the file generation apparatus 100 acquires an image and metadata.

In step S302, the segment file generating section 123 generates an ISOBMFF file including stereoscopic information on the entire picture (projected picture) as information for each sub-picture.

In step S303, the ISOBMFF file generated by the processing in step S302 is recorded in the recording section 114.

In step S304, the upload section 115 reads the ISOBMFF file recorded in step S303 from the recording section 114, and uploads the ISOBMFF file to the server.

When the processing in step S304 ends, the upload processing ends.

By performing the upload process as described above, the file generation apparatus 100 can generate an ISOBMFF file including stereoscopic information on a projected picture as information for each sub-picture.

Therefore, based on this information, the client can more easily select and reproduce an appropriate stream corresponding to the capability of the client.

< Using stereoscopic information on the entire picture to be divided into sub-pictures, wherein the information is signaled in ISOBMFF >

In addition, streams may be selected and reproduced using stereoscopic information regarding an entire picture to be divided into sub-pictures, wherein the information is signaled in the ISOBMFF file.

In other words, image data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded may be stored in one track, a file including stereoscopic information including information related to stereoscopic display of the entire picture may be acquired, and a stream of image encoding data may be selected based on the stereoscopic information included in the acquired file.

For example, in the client apparatus 200 serving as an information processing apparatus, the segment file acquisition section 214 may function as a file acquisition section that acquires a file in which image data for each of a plurality of sub-pictures, into which the entire picture is divided and then encoded, is stored in a respective track and which includes stereoscopic information related to stereoscopic display of the entire picture, and the data analysis and decoding section 216 may function as an image processing section that selects a stream of image encoding data based on the stereoscopic information included in the file acquired by the file acquisition section. In other words, the information processing apparatus (e.g., the client apparatus 200) may include a file acquisition section (e.g., the segment file acquisition section 214) and an image processing section (e.g., the data analysis and decoding section 216).

This enables the client apparatus 200 to select a stream more easily.

Note that the above-described picture (entire picture) may be all or a part of an omnidirectional video (projection plane image resulting from projection and mapping of an image extending 360 degrees around the horizontal direction and 180 degrees around the vertical direction). In other words, in the case where the client apparatus 200 uses all or a part of the projection plane image as the entire picture, divides the entire picture into sub-pictures to acquire a sub-picture stream, and reproduces the stream, the present technology can be applied as described above.

In addition, stereoscopic information on the entire picture may be included in the ISOBMFF file as information for each sub-picture. This enables the client apparatus 200 to easily know stereoscopic information on the entire picture (e.g., whether the entire picture is a stereoscopic image and, if it is, what type of stereoscopic image it is) simply by referring to information on the sub-picture track.

< procedure of content reproduction processing >

An example of the content reproduction processing procedure executed by the client apparatus 200 in the above-described case will be described with reference to the flowchart in fig. 52.

When the content reproduction process is started, in step S321, the segment file acquisition section 214 of the client device 200 acquires an ISOBMFF file that includes stereoscopic information on the entire picture (projected picture) as information for each sub-picture.

In step S322, the display control unit 215 acquires the measurement result of the viewpoint position (and the line of sight direction) of the user.

In step S323, the measurement unit 211 measures the transmission bandwidth of the network between the server and the client apparatus 200.

In step S324, the segment file processing section 221 determines whether the client apparatus 200 performs stereoscopic reproduction (or whether the client apparatus 200 has the capability of performing stereoscopic reproduction). In a case where it is determined that the client apparatus 200 performs stereoscopic reproduction (or has the capability of performing stereoscopic reproduction), the process proceeds to step S325.

In step S325, the segment file processing section 221 sets stereoscopically displayable sub-picture tracks as selection candidates. At this time, by referring to the stereoscopic information on the entire picture included in the ISOBMFF file acquired in step S321, the segment file processing section 221 can also include, in the selection candidates, sub-picture tracks of the second mode that are stereoscopically displayable even though no frame packing arrangement is applied to them (e.g., the sub-picture 445 and the sub-picture 446 in fig. 49). When the processing in step S325 ends, the processing proceeds to step S327.

In addition, in step S324, in a case where it is determined that the client apparatus 200 does not perform stereoscopic reproduction (or is determined not to have the capability of performing stereoscopic reproduction), the processing proceeds to step S326.

In step S326, the segment file processing section 221 sets a sub-picture track including a mono-visual picture as a selection candidate based on, for example, stereoscopic information on the entire picture included in the ISOBMFF file acquired in step S321. At this time, by referring to the stereoscopic information on the entire picture included in the ISOBMFF file acquired in step S321, the segment file processing section 221 may know that the sub-picture track in the third mode (e.g., the sub-picture 452 or the sub-picture 453 in fig. 50) needs to be scaled up by two times in the horizontal direction during rendering. When the processing in step S326 ends, the processing proceeds to step S327.

In step S327, the segment file processing section 221 selects a sub-picture track corresponding to the field of view of the user of the client apparatus 200 from the candidates set in step S325 or step S326.

In step S328, the segment file processing section 221 extracts encoded data of the stream in the sub-picture track selected in step S327 from the ISOBMFF file acquired in step S321.

In step S329, the decoding unit 222 decodes the encoded data of the stream extracted in step S328.

In step S330, the display information generation unit 223 reproduces the stream (content) resulting from the decoding in step S329. More specifically, the display information generation section 223 generates data of a display image from the stream and feeds the data to the display section 217 to cause the display section 217 to display the display image.

When the processing in step S330 ends, the content reproduction processing ends.

By performing the content reproduction process as described above, the client apparatus 200 can more easily select a stream using stereoscopic information on an entire picture to be divided into sub-pictures, the stereoscopic information being included in the ISOBMFF file. For example, based on the information, the client apparatus 200 can more easily select and reproduce an appropriate stream corresponding to, for example, the capability of the client apparatus 200.

< signaling of stereoscopic information on entire picture in sample entry >

For example, in the case where the entire picture that is not divided is a stereoscopic image, the segment file generating section 123 of the file generating apparatus 100 may signal stereoscopic information on the entire picture in a sample entry in the sub-picture track.

For example, stereoscopic information about an entire picture may be stored in a scheme information box under a sample entry in the ISOBMFF file or in a box in a lower layer of the scheme information box.

< Original stereoscopic video box >

For example, in order to signal stereoscopic information on the undivided entire picture, the segment file generating section 123 of the file generating apparatus 100 may newly define an original stereoscopic video box and signal the box under the scheme information box (schi) of the sample entry in the sub-picture track. In other words, the original stereoscopic video box may store stereoscopic information on the undivided entire picture.

Note that the position of the original stereoscopic video box is optional and is not limited to the scheme information box described above. In addition, the information signaled in the original stereoscopic video box is similar to the information signaled in the stereoscopic video box.

A syntax 461 in fig. 53 represents an example of the syntax of the original stereoscopic video box. As shown in syntax 461, fields such as single_view_allowed, stereo_scheme, length, and stereo_indication_type are defined in the original stereoscopic video box.

The semantics 462 in fig. 54 represent an example of the semantics of the fields defined in the original stereoscopic video box. As shown by semantics 462, single_view_allowed is information indicating the types of views allowed. For example, a value of 0 for this field indicates that the content is intended to be displayed only on displays that support stereoscopic vision. A value of 1 indicates that the content may be displayed as the right view on monoscopic displays. A value of 2 indicates that the content may be displayed as the left view on monoscopic displays.

stereo_scheme is information related to the frame packing method. For example, a value of 1 for this field indicates that the frame packing method conforms to the frame packing arrangement SEI of ISO/IEC 14496-10. A value of 2 indicates that the frame packing method conforms to Annex L of ISO/IEC 13818-2. A value of 3 indicates that the frame packing method conforms to the frame/service compatible and 2D/3D mixed services of ISO/IEC 23000-11.

length indicates the byte length of the stereo_indication_type field, and stereo_indication_type indicates the frame packing method according to stereo_scheme.
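
Assembled from the field list above, the box might be written as follows. This is a sketch that mirrors the syntax of the existing stereoscopic video box ('stvi') defined in ISOBMFF; the four-character code 'ostv' is a hypothetical placeholder, not a code defined by any standard.

  aligned(8) class OriginalStereoVideoBox extends FullBox('ostv', 0, 0) {
      template unsigned int(30) reserved = 0;
      unsigned int(2)  single_view_allowed;    // allowed view types (see semantics 462)
      unsigned int(32) stereo_scheme;          // frame packing scheme (see semantics 462)
      unsigned int(32) length;                 // byte length of stereo_indication_type
      unsigned int(8)  stereo_indication_type[length];  // frame packing method per stereo_scheme
  }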

The segment file processing section 221 of the client apparatus 200 can acquire stereoscopic information on the entire picture by referring to the original stereoscopic video box and the 2D coverage information box. Based on this information, in a case where no stereoscopic video box is signaled in the sample entry of the sub-picture track, the segment file processing section 221 can easily identify, without parsing the sub-picture composition box, whether the sub-picture is monoscopic or whether the sub-picture includes a left view and a right view to which the frame packing arrangement signaled in a stereoscopic video box is not applied. In other words, as with tracks that are not sub-picture tracks, the stereoscopic information can be identified from the information stored in the sample entry alone.

In other words, the client apparatus 200 can independently select and reproduce a sub-picture track without parsing the sub-picture composition box.

< Signaling of display size >

Further, the display size, scaled up based on the frame packing arrangement of the entire picture, may be signaled in the width and height fields of the track header of a sub-picture track that stores a monoscopic sub-picture (a sub-picture that is a monoscopic picture) resulting from division of the entire picture of a stereoscopic image.

In other words, the ISOBMFF file may include information related to the display size of the sub-picture.

An example of this is shown in fig. 55. As shown in fig. 55, the sub-picture 471 and the sub-picture 472 generated from the entire picture of a stereoscopic image each include an image, for a different area, that has been scaled down in the horizontal direction. Therefore, scaling up in the horizontal direction is necessary at the time of display (rendering). Thus, the display size of the image 473 at display time is signaled as the width and height of the track header. This allows the client apparatus 200 to appropriately render the monoscopic sub-picture.
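
For example, the width and height of the track header box ('tkhd') defined in ISOBMFF are 16.16 fixed-point values, so the scaled-up display size could be set as in the following sketch; the picture sizes are assumed purely for illustration.

  // Coded mono sub-picture: 960x1920 (one view cut out of a side-by-side stereo entire picture).
  // Display (rendering) size after the 2x horizontal scale-up: 1920x1920.
  width  = 1920 << 16;   // tkhd width,  16.16 fixed-point
  height = 1920 << 16;   // tkhd height, 16.16 fixed-point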

Note that pixel aspect ratio information for display time may be signaled in the pixel aspect ratio box ('pasp', defined in ISOBMFF) in the visual sample entry, instead of signaling the width and height of the track header.

Syntax 481 in fig. 56 represents an example of the syntax of the pixel aspect ratio box. As shown in syntax 481, fields such as hSpacing and vSpacing are defined in the pixel aspect ratio box.

Semantics 482 in fig. 57 represents an example of the semantics of the fields defined in the pixel aspect ratio box. As shown by semantics 482, hSpacing and vSpacing are information indicating the relative pixel width and the relative pixel height, respectively. During rendering, based on this information, the pixel width is multiplied by hSpacing/vSpacing for display.
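
For reference, the pixel aspect ratio box as defined in ISOBMFF (ISO/IEC 14496-12) has the following simple syntax:

  class PixelAspectRatioBox extends Box('pasp') {
      unsigned int(32) hSpacing;  // relative pixel width
      unsigned int(32) vSpacing;  // relative pixel height
  }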

An example of this is shown in fig. 58. The sub-picture 491 and the sub-picture 492 shown in fig. 58 are sub-pictures generated from the entire picture of a stereoscopic image, and each includes an image, for a different area, scaled down in the horizontal direction. Thus, by signaling, for example, hSpacing = 2 and vSpacing = 1, the client apparatus 200 can render the pixels at double width when displaying (rendering) the sub-picture 491, so that the image is displayed at an appropriate aspect ratio, as with the image 493. In other words, the client apparatus 200 can appropriately render monoscopic sub-pictures (sub-pictures including a monoscopic image).

< Original scheme information box >

In addition, an original scheme information box may be newly defined under the restricted scheme information box (rinf) of the sample entry in the sub-picture track, and stereoscopic information on the undivided entire picture may be signaled in that box. Note that the position where the original scheme information box is defined is optional and is not limited to rinf described above.

A syntax 501 in fig. 59 represents an example of the syntax of the original scheme information box in the above case. As shown in syntax 501, scheme_specific_data is defined in the original scheme information box.

Information on the entire picture before division into sub-pictures is signaled in scheme_specific_data. For example, in a case where the entire picture is stereoscopic, a stereoscopic video box including stereoscopic information on the entire picture may be signaled. This enables the client apparatus 200 to independently select and reproduce sub-picture tracks without parsing the sub-picture composition box.

Note that not only the stereoscopic video box but also post-processing information related to the entire picture (e.g., the region-wise packing box) may be signaled in scheme_specific_data.
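
A sketch of such a box is shown below. The four-character code 'osch' is a hypothetical placeholder, and scheme_specific_data is modeled here as an arbitrary sequence of boxes describing the undivided entire picture (e.g., a stereoscopic video box or a region-wise packing box); the actual syntax 501 may differ.

  aligned(8) class OriginalSchemeInformationBox extends Box('osch') {
      Box scheme_specific_data[];  // boxes describing the entire picture before division
  }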

< additional information to facilitate selection of sub-picture tracks >

Furthermore, the 2D coverage information box may be extended to signal stereo-related information that facilitates track selection.

For example, a stereo_presentation_capable field may be added to the 2D coverage information box to signal whether the sub-picture track is stereoscopically displayable. By referring to this information, the client apparatus 200 can identify whether stereoscopic display is possible without performing the processing of determining this from the stereoscopic information on the entire picture described above in the third embodiment and the region information, on the projected picture, signaled in the 2D coverage information box defined in the first embodiment.

In other words, the ISOBMFF file may further include sub-stereoscopic information including information related to stereoscopic display of each sub-picture.

The syntax 502 in fig. 60 represents an example of the 2D coverage information box in the above case. As shown in syntax 502, stereo_presentation_capable is additionally defined in this case (stereo_presentation_capable is added to the previously defined fields).

Semantics 503 in fig. 61 represents an example of the semantics of the added field. As shown by semantics 503, stereo_presentation_capable is information related to stereoscopic display of the pictures in the track. A value of 0 for this field indicates that the pictures in the track are monoscopic, or include an L view and an R view but are not stereoscopically displayable. A value of 1 indicates that some regions of the pictures in the track are stereoscopically displayable. A value of 2 indicates that all regions of the pictures in the track are stereoscopically displayable.
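
Under the assumption that the 2D coverage information box of the first embodiment carries the projected-picture and region fields that appear in the MPD version described later, the extension might look like the following sketch; the four-character code '2dco' and the exact field layout are placeholders rather than the actual syntax 502.

  aligned(8) class TwoDCoverageInformationBox extends FullBox('2dco', 0, 0) {
      unsigned int(2)  stereo_presentation_capable;  // 0: not stereoscopically displayable,
                                                     // 1: partly, 2: fully displayable
      bit(6)           reserved;
      unsigned int(32) proj_picture_width;           // width of the projected picture
      unsigned int(32) proj_picture_height;          // height of the projected picture
      unsigned int(32) proj_reg_width;               // width of the corresponding region
      unsigned int(32) proj_reg_height;              // height of the corresponding region
      unsigned int(32) proj_reg_top;                 // vertical coordinate of the region
      unsigned int(32) proj_reg_left;                // horizontal coordinate of the region
  }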

Fig. 62 is a diagram showing an example of signaling of stereo_presentation_capable. For example, as shown in the upper part of fig. 62, the sub-pictures 512 to 517 resulting from division of the entire picture (projected picture) 511, which includes a side-by-side stereoscopic image, each include a left view and a right view and are stereoscopically displayable. Therefore, stereo_presentation_capable is set to 2 for these sub-pictures.

In contrast, as shown in the lower part of fig. 62, the sub-pictures 522 to 526 resulting from division of the entire picture (projected picture) 521, which includes a side-by-side stereoscopic image, each include a left view and a right view. However, the sub-pictures 522 to 524 are not stereoscopically displayable. Therefore, stereo_presentation_capable is set to 0 for these sub-pictures.

In addition, for the sub-picture 525 and the sub-picture 526, some regions of the sub-picture are stereoscopically displayable. Therefore, stereo_presentation_capable is set to 1 for these sub-pictures.

Note that information indicating whether a picture is stereoscopically displayable may be signaled in a separate box (a dedicated box for storing the information), for example, a newly defined track stereoscopic video box, under the schi of the sub-picture track.

Syntax 531 in fig. 63 represents an example of the syntax of the track stereoscopic video box in the above case.

In addition, the rectangular projected region structure of the first embodiment described above, or the region-wise packing box signaled under the projected omnidirectional video box of the sample entry in the sub-picture track, may be extended to signal stereo_presentation_capable. In addition, stereo_presentation_capable may be signaled in any other box.

Further, in a case where stereoscopic display is not possible, the track_not_intended_for_presentation_alone flag of the track header box may be used to signal that independent reproduction of the sub-picture is not recommended.

Note that the various types of information described above can be applied to a case where the picture stored in the track is not a sub-picture.

< Signaling of View information >

In addition, the 2D coverage information box may be extended to additionally signal view information for the display area, on the projected picture, of the sub-picture. By referring to this information, the client apparatus 200 can easily identify whether the sub-picture is a monoscopic image or includes an L view and an R view, without performing the processing of identifying the view of each region from the stereoscopic information on the entire picture described above in the third embodiment and the region information, on the projected picture, signaled in the 2D coverage information box defined in the first embodiment.

In other words, the ISOBMFF file may further include view information indicating a view type of the sub-picture.

A syntax 532 in fig. 64 represents an example of the syntax of the 2D coverage information box in the above case. As shown in syntax 532, fields such as view_idc_presence_flag, default_view_idc, and view_idc are additionally defined in the 2D coverage information box in this case.

Semantics 533 in fig. 65 represent an example of the semantics of the fields additionally defined in the 2D coverage information box. As shown by semantics 533, view_idc_presence_flag indicates whether a separate view_idc is present for each region. For example, a value of 0 for this field indicates that no separate view_idc is present for each region, and a value of 1 indicates that a separate view_idc is present for each region.

In other words, the ISOBMFF file may further include information indicating whether view information of each region exists.

default _ view _ idc indicates a view common to all regions. For example, a value of 0 for this field indicates that all regions in the sub-picture correspond to a mono-visual view. In addition, a value of 1 for this field indicates that all regions in the sub-picture correspond to the left view. In addition, a value of 2 for this field indicates that all regions in the sub-picture correspond to the right view. In addition, a value of 3 for this field indicates that all regions in the sub-picture correspond to stereoscopic views.

view_idc indicates the view of each region. For example, a value of 0 for this field indicates that the region is mono, a value of 1 indicates that the region corresponds to the left view, a value of 2 indicates that the region corresponds to the right view, and a value of 3 indicates that the region corresponds to a stereoscopic view. In addition, in a case where this field is absent, default_view_idc indicates the view of each region.

In other words, the view information may be information of each region included in the sub-picture.
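
Combining the added fields with per-region signaling, one possible shape of the extended box is sketched below; num_regions, the bit widths, and the four-character code are assumptions for illustration rather than the actual syntax 532.

  aligned(8) class TwoDCoverageInformationBox extends FullBox('2dco', 0, 0) {
      unsigned int(1) view_idc_presence_flag;
      if (view_idc_presence_flag == 0) {
          unsigned int(2) default_view_idc;  // view common to all regions
          bit(5)          reserved;
      } else {
          bit(7)          reserved;
      }
      unsigned int(32) proj_picture_width;
      unsigned int(32) proj_picture_height;
      unsigned int(16) num_regions;          // hypothetical region count
      for (i = 0; i < num_regions; i++) {
          if (view_idc_presence_flag == 1) {
              unsigned int(2) view_idc;      // view of this region
              bit(6)          reserved;
          }
          unsigned int(32) proj_reg_width;
          unsigned int(32) proj_reg_height;
          unsigned int(32) proj_reg_top;
          unsigned int(32) proj_reg_left;
      }
  }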

Fig. 66 is a diagram illustrating an example in which view_idc is signaled. As shown in fig. 66, in a case where the side-by-side entire picture 541 is divided into sub-pictures such as the sub-picture 542 and the sub-picture 543, view_idc is set to 3 for each of these sub-pictures.

In contrast, in a case where the entire picture 541 is divided into sub-pictures such as the sub-picture 544 and the sub-picture 545, view_idc is set to 1 for the sub-picture 544 and to 2 for the sub-picture 545.

Note that the above-described type of additional information can be applied even in the case where the pictures stored in the track are not sub-pictures.

Similarly, the region-wise packing box and the rectangular projected region structure defined in the first embodiment may be extended to signal view information.

<5. Fourth embodiment>

< signaling of stereoscopic information on entire picture to be divided into sub-pictures in MPD >

Stereoscopic information related to the entire picture to be divided into sub-pictures, as described above, may be signaled in the MPD file. In other words, in order to enable the client to select and reproduce, according to its capability, an adaptation set referring to a sub-picture, stereoscopic information on the entire picture to be divided into sub-pictures may be newly defined and signaled in the adaptation set of the MPD file.

In other words, a control file may be generated which manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded, and which includes stereoscopic information including information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data.

For example, in the file generating apparatus 100 serving as an information processing apparatus, the MPD file generating section 113 may serve as a file generating section that generates a control file that manages, for each adaptation set, image encoded data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded, and that includes stereoscopic information including information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoded data. In other words, the information processing apparatus (e.g., file generation apparatus 100) may include a file generation section (e.g., MPD file generation section 113).

This enables the client to more easily select a stream based on the information as described above.

In addition, the above-described picture (entire picture) may be all or a part of an omnidirectional video (projection plane image resulting from projection and mapping of an image extending 360 degrees around the horizontal direction and 180 degrees around the vertical direction). In other words, in the case where the file generating apparatus 100 uses all or a part of the projection plane image as the entire picture and divides the entire picture into sub-pictures, the present technology can be applied as described above.

This enables the client to more easily select a stream based on the information as described above even in the case of distributing omnidirectional video.

Note that the area-related information (display area information) may be included in the MPD file as information for each sub-picture. This enables the client to easily know which part of the entire picture corresponds to the sub-picture simply by referring to information on the sub-picture referred to by the adaptation set.

< Procedure of upload processing >

An example of the procedure of the upload process executed by the file generating apparatus 100 in fig. 11 in the above-described case will be described with reference to the flowchart in fig. 67.

When the upload process is started, in step S401, the data input section 111 of the file generating apparatus 100 acquires an image and metadata.

In step S402, the segment file generating unit 123 generates a segment file of the image.

In step S403, the MPD file generating section 113 generates an MPD file including stereoscopic information on the entire picture (projected picture) as information for each sub-picture.

In step S404, the segment file generated by the processing in step S402 is recorded in the recording section 114. In addition, the MPD file generated by the processing in step S403 is recorded in the recording section 114.

In step S405, the uploading section 115 reads the segment file recorded in step S404 from the recording section 114, and uploads the segment file to the server. In addition, the uploading section 115 reads the MPD file recorded in step S404 from the recording section 114, and uploads the MPD file to the server.

When the processing in step S405 ends, the upload processing ends.

By performing the upload process as described above, the file generation apparatus 100 can generate an MPD file including stereoscopic information on an entire picture as information for each sub-picture.

Accordingly, based on the display area information, the client can more easily select and reproduce, for example, an appropriate stream corresponding to the capability of the client apparatus 200.

< Using stereoscopic information about an entire picture to be divided into sub-pictures, wherein the stereoscopic information is signaled in an MPD file >

In addition, a stream may be selected using stereoscopic information on an entire picture to be divided into sub-pictures, where the stereoscopic information is signaled in an MPD file.

In other words, a control file may be acquired which manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures into which an entire picture is divided and then encoded, and which includes stereoscopic information including information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data. Then, selection of the image encoding data stream may be performed based on the stereoscopic information included in the acquired control file.

For example, in the client apparatus 200 serving as an information processing apparatus, the MPD file acquisition section 212 may function as a file acquisition section that acquires a control file which manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures, into which the entire picture is divided and then encoded, and which includes stereoscopic information related to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data; and the MPD file processing section 213 may function as an image processing section that selects a stream of image encoding data based on the stereoscopic information included in the acquired control file. In other words, the information processing apparatus (e.g., the client apparatus 200) may include a file acquisition section (e.g., the MPD file acquisition section 212) and an image processing section (e.g., the MPD file processing section 213).

This enables the client apparatus 200 to select a stream more easily.

Note that the above-described picture (entire picture) may be all or a part of an omnidirectional video (projection plane image resulting from projection and mapping of an image extending 360 degrees around the horizontal direction and 180 degrees around the vertical direction). In other words, in the case where the client apparatus 200 uses all or a part of the projection plane image as the entire picture, divides the entire picture into sub-pictures to acquire a stream of the sub-pictures, and reproduces the stream, the present technology can be applied as described above.

In addition, the region-related information (display region information) may be included in the MPD file as information for each sub-picture. This enables the client apparatus 200 to easily know which part of the entire picture corresponds to the sub-picture simply by referring to information on the sub-picture referred to by the adaptation set.

< procedure of content reproduction processing >

An example of the content reproduction processing procedure performed by the client apparatus 200 in the above-described case will be described with reference to the flowchart in fig. 68.

When the content reproduction process is started, in step S421, the MPD file acquisition section 212 of the client apparatus 200 acquires an MPD file that includes stereoscopic information on an entire picture (projected picture) as information for each sub-picture.

In step S422, the display control unit 215 acquires the measurement result of the viewpoint position (and the line of sight direction) of the user.

In step S423, the measurement unit 211 measures the transmission bandwidth of the network between the server and the client device 200.

In step S424, the MPD file processing section 213 determines whether the client apparatus 200 performs stereoscopic reproduction (or whether the client apparatus 200 has the capability of performing stereoscopic reproduction). In a case where it is determined that the client apparatus 200 performs stereoscopic reproduction (or has the capability of performing stereoscopic reproduction), the process proceeds to step S425.

In step S425, the MPD file processing section 213 sets adaptation sets referring to stereoscopically displayable sub-pictures as selection candidates. At this time, by referring to the stereoscopic information on the entire picture included in the MPD file acquired in step S421, the MPD file processing section 213 can also include, in the selection candidates, adaptation sets referring to sub-pictures of the second mode that are stereoscopically displayable even though no frame packing arrangement is applied to them (e.g., the sub-picture 445 and the sub-picture 446 in fig. 49). When the processing in step S425 ends, the processing proceeds to step S427.

In addition, in step S424, in a case where it is determined that the client apparatus 200 does not perform stereoscopic reproduction (or determines that it does not have a stereoscopic reproduction function), the processing proceeds to step S426.

In step S426, the MPD file processing section 213 sets, as selection candidates, an adaptation set that refers to a sub picture including a mono-visual picture, based on, for example, stereoscopic information about an entire picture included in the MPD file acquired in step S421. At this time, by referring to the stereoscopic information on the entire picture included in the MPD file acquired in step S421, the MPD file processing section 213 may know that a sub-picture in the third mode (for example, the sub-picture 452 or the sub-picture 453 in fig. 50) needs to be enlarged twice in the horizontal direction during rendering. When the processing in step S426 ends, the processing proceeds to step S427.

In step S427, the MPD file processing section 213 selects an adaptation set that refers to a sub-picture corresponding to the field of view of the user of the client apparatus 200 from the candidates set in step S425 or step S426.

In step S428, the MPD file processing section 213 selects, from the adaptation set selected in step S427, a representation corresponding to, for example, the viewpoint position and line-of-sight direction of the user and the transmission bandwidth of the network between the client and the server.

In step S429, the segment file acquisition section 214 acquires the segment file corresponding to the representation selected in step S428.

In step S430, the segment file processing section 221 extracts encoded data from the segment file acquired in step S429.

In step S431, the decoding unit 222 decodes the encoded data of the stream extracted in step S430.

In step S432, the display information generation unit 223 reproduces the stream (content) obtained from the decoding in step S431. More specifically, the display information generation section 223 generates data of a display image from the stream and feeds the data of the display image to the display section 217 to cause the display section 217 to display the display image.

When the processing in step S432 ends, the content reproduction processing ends.

By performing the content reproduction processing as described above, the client apparatus 200 can more easily select a stream using the stereoscopic information on the entire picture included in the MPD file. For example, based on this information, the client apparatus 200 can more easily select and reproduce an appropriate stream corresponding to, for example, the capability of the client apparatus 200.

< Signaling details of stereoscopic information in MPD File >

For example, as in the third embodiment, the 2D coverage information descriptor may be extended to signal the stereo_presentation_capable field and view information.

In other words, the MPD file may further include view information indicating a view type of the sub-picture.

In addition, the view information may be information of each of the regions included in the sub-picture.

In addition, the MPD file may further include information indicating whether view information exists for each region.

The attribute value 551 in fig. 69 and the attribute value 552 in fig. 70 represent examples of the attribute values of the extended 2D coverage information descriptor. As shown by attribute value 551 and attribute value 552, twoDCoverage is a container element having data type omaf:twoDCoverageType. twoDCoverage@stereo_presentation_capable has data type omaf:StereoPresentationType and indicates whether the adaptation set is stereoscopically displayable. For example, an attribute value of 0 indicates that the picture referred to by the adaptation set is monoscopic or is not stereoscopically displayable. An attribute value of 1 indicates that some regions of the picture referred to by the adaptation set are stereoscopically displayable. An attribute value of 2 indicates that all regions of the picture are stereoscopically displayable.

twoDCoverage@view_idc_presence_flag has data type xs:boolean and indicates whether a separate view_idc is present for each region. For example, an attribute value of 0 indicates that no separate view_idc is present for each region, and an attribute value of 1 indicates that a separate view_idc is present for each region. twoDCoverage@default_view_idc has data type omaf:ViewType and indicates a view common to all regions. For example, an attribute value of 0 indicates mono, an attribute value of 1 indicates the left view, an attribute value of 2 indicates the right view, and an attribute value of 3 indicates stereo. Note that this attribute must be present when twoDCoverage@view_idc_presence_flag is 0, and must not be present when twoDCoverage@view_idc_presence_flag is 1.

twoDCoverage@proj_picture_width has data type xs:unsignedInt and indicates the width of the projected picture. twoDCoverage@proj_picture_height has data type xs:unsignedInt and indicates the height of the projected picture. twoDCoverage.twoDCoverageInfo is an element having data type omaf:twoDCoverageInfoType and indicating region information on the projected picture. Multiple instances of this element may be signaled.

twoDCoverage.twoDCoverageInfo@view_idc has data type omaf:ViewType and indicates the view of each region. For example, an attribute value of 0 indicates mono, an attribute value of 1 indicates the left view, an attribute value of 2 indicates the right view, and an attribute value of 3 indicates stereo. Note that this attribute must not be present when twoDCoverage@view_idc_presence_flag is 0, and must be present when twoDCoverage@view_idc_presence_flag is 1.

twoDCoverage.twoDCoverageInfo@proj_reg_width has data type xs:unsignedInt and indicates the width of the region, on the projected picture, corresponding to the picture referred to by the adaptation set. twoDCoverage.twoDCoverageInfo@proj_reg_height has data type xs:unsignedInt and indicates the height of the region, on the projected picture, corresponding to the picture referred to by the adaptation set.

twoDCoverage.twoDCoverageInfo@proj_reg_top has data type xs:unsignedInt and indicates the vertical coordinate of the region, on the projected picture, corresponding to the picture referred to by the adaptation set. twoDCoverage.twoDCoverageInfo@proj_reg_left has data type xs:unsignedInt and indicates the horizontal coordinate of the region, on the projected picture, corresponding to the picture referred to by the adaptation set.

A data type 553 in fig. 71 indicates an example of the definition of the data type in this case.
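
Put together, an adaptation set carrying the extended descriptor might look like the following sketch; the scheme URI urn:mpeg:mpegI:omaf:2017:2dco and all numeric values are assumptions for illustration. Here, a sub-picture containing both halves of a side-by-side entire picture signals a left-view region and a right-view region and is fully stereoscopically displayable.

  <AdaptationSet id="2">
    <SupplementalProperty schemeIdUri="urn:mpeg:mpegI:omaf:2017:2dco">
      <twoDCoverage stereo_presentation_capable="2" view_idc_presence_flag="1"
                    proj_picture_width="3840" proj_picture_height="1920">
        <twoDCoverageInfo view_idc="1" proj_reg_width="1920" proj_reg_height="1920"
                          proj_reg_top="0" proj_reg_left="0"/>
        <twoDCoverageInfo view_idc="2" proj_reg_width="1920" proj_reg_height="1920"
                          proj_reg_top="0" proj_reg_left="1920"/>
      </twoDCoverage>
    </SupplementalProperty>
  </AdaptationSet>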

Note that the extended region-wise packing descriptor and the extended content coverage descriptor described in the second embodiment may be further extended to signal stereo_presentation_capable and view information. In addition, any other descriptor may be used to signal these types of information.

<6. Supplementary features>

< computer >

The series of processing steps described above may be caused to be performed by hardware or software. In the case where a series of processing steps is executed by software, a program included in the software is installed in a computer. Here, the computer includes a computer integrated in dedicated hardware, or a general-purpose personal computer that can perform various functions by installing various programs, for example.

Fig. 72 is a block diagram showing a configuration example of hardware of a computer that executes the above-described series of processing steps according to a program.

In a computer 900 shown in fig. 72, a CPU (central processing unit) 901, a ROM (read only memory) 902, and a RAM (random access memory) 903 are connected to one another via a bus 904.

The bus 904 is also connected to an input/output interface (I/O interface) 910. The I/O interface 910 is connected to the input section 911, the output section 912, the storage section 913, the communication section 914, and the drive 915.

The input section 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, and an input terminal. The output section 912 includes, for example, a display, a speaker, and an output terminal. The storage section 913 includes, for example, a hard disk, a RAM disk, and a nonvolatile memory. The communication section 914 includes, for example, a network interface. The drive 915 drives a removable medium 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer configured as described above, the CPU 901 loads a program stored in the storage section 913, for example, into the RAM 903 via the I/O interface 910 and the bus 904, and executes the program to execute the above-described series of processing steps. The RAM 903 also appropriately stores, for example, data necessary for executing the respective steps of the processing executed by the CPU 901.

The program executed by the computer (CPU 901) may be recorded in, for example, the removable medium 921 serving as a package medium or the like. In this case, installing the removable medium 921 in the drive 915 allows the program to be installed in the storage section 913 via the input/output interface 910.

In addition, the program may be provided via a wired or wireless transmission medium such as a local area network, the internet, or digital satellite broadcasting. In that case, the program may be received by the communication section 914 and installed in the storage section 913.

In addition, the program may be installed in advance in the ROM 902 or the storage section 913.

< object of applying the present technology >

The case where the present technology is applied to an ISOBMFF file or an MPD file has been described. However, the present technology is not limited to these examples, and may be applied to a file that conforms to any standard and is used to distribute a stream of projection plane images having a three-dimensional structure image mapped to a single plane. In other words, unless specifications of various types of processing (such as distribution control, file format, and encoding and decoding schemes) are inconsistent with the present technology, the specifications are optional. In addition, some of the above process steps and specifications may be omitted unless the omission is inconsistent with the present technique.

In addition, the file generation apparatus 100 and the client apparatus 200 have been described above as application examples of the present technology. However, the present technique can be applied to any configuration.

For example, the present technology can be applied to various types of electronic devices, such as transmitters and receivers (e.g., television receivers and cellular phones) for satellite broadcasting, cable broadcasting (such as cable television), distribution over the internet, and distribution to terminals through cellular communication, or apparatuses that record images in media such as optical disks, magnetic disks, and flash memories and reproduce images from storage media (e.g., hard disk recorders and image pickup apparatuses).

In addition, for example, the present technology can be implemented as a partial configuration of an apparatus such as a processor (e.g., a video processor) serving as a system LSI (large scale integration), a module (e.g., a video module) using a plurality of processors or the like, a unit (e.g., a video unit) using a plurality of modules or the like, or a device (e.g., a video device) including a unit to which any other function is added.

In addition, the present technology can be applied to a network system including a plurality of devices. For example, the present technology may be implemented as cloud computing in which a plurality of apparatuses share and collectively perform processing via a network. For example, the present technology may be implemented in a cloud service that provides services related to images (moving images) to any terminal, such as a computer, an AV (audio visual) device, a personal digital assistant, or an IoT (Internet of Things) device.

Note that a system as used herein refers to a collection of a plurality of components (devices, modules (portions), etc.), whether or not all of the components are present in the same housing. Therefore, both a plurality of devices accommodated in different housings and connected together via a network and one device including a plurality of modules accommodated in one housing are systems.

< field and purpose to which the present technology is applied >

The system, apparatus, processing section, etc. to which the present technology is applied may be used in any field, for example, transportation, medical care, crime prevention, agriculture, animal husbandry, mining, cosmetics, factories, home appliances, weather, and nature monitoring. In addition, the system, apparatus, processing section, etc. may be used for any purpose.

For example, the present technology can be applied to a system and apparatus for providing viewing and listening to content and the like. In addition, for example, the present technology can be applied to systems and apparatuses for transportation purposes such as monitoring of traffic conditions and automatic operation control. Further, for example, the present technology can be applied to a system and a device for security purposes. In addition, for example, the present technology can be applied to a system and an apparatus for the purpose of automatic control of a machine or the like. Furthermore, the present techniques may be applied to systems and devices for agricultural and animal husbandry purposes, for example. In addition, the present technology can be applied to systems and devices that monitor natural states such as volcanoes, forests and oceans, and wildlife. Further, for example, the present technology can be applied to a system and an apparatus for the purpose of sports.

< other characteristics >

Note that the "flag" used herein refers to information for identifying a plurality of states, and includes not only information for identifying two states of true (1) or false (0), but also information enabling identification of three or more states. Thus, the "flag" may take, for example, 1/0 two values or three or more values. In other words, the number of bits included in the "flag" is optional, and the number of bits may be one or more bits. In addition, it is assumed that the identification information (including the flag) is not only in a form of including the identification information in the bitstream but also in a form of including difference information regarding a difference of the identification information from the specific reference information in the bitstream. Therefore, the "flag" and the "identification information" used herein include not only information but also difference information relating to a difference from the reference information.

In addition, various types of information (metadata, etc.) about encoded data (a bitstream) may be transmitted or recorded in any form as long as the information is associated with the encoded data. The term "associated" as used herein means, for example, that one piece of data is made available (linkable) when another piece of data is processed. In other words, pieces of data associated with each other may be integrated into one piece of data or treated as separate pieces of data. For example, information associated with encoded data (an image) may be transmitted on a transmission line different from that of the encoded data (image). In addition, information associated with encoded data (an image) may be recorded in a recording medium different from that of the encoded data (image) (or in a recording area different from that of the encoded data (image) in the same recording medium). Note that such "association" may apply to a portion of the data rather than the entire data. For example, an image may be associated with information corresponding to the image in any unit, such as a plurality of frames, one frame, or a portion within a frame.

Note that the terms "synthesize", "multiplex", "add", "integrate", "include", "store", "put", "access", and "insert" used herein refer to combining a plurality of objects into one, for example, combining encoded data and metadata into one piece of data, and each refers to one method of the "association" described above.

In addition, the embodiments of the present technology are not limited to the above-described embodiments, and various changes may be made to the embodiments without departing from the spirit of the present technology.

For example, a configuration described as one device (or one processing section) may be divided into a plurality of devices (or processing sections). Conversely, configurations described as a plurality of devices (or processing sections) may be integrated into one device (or one processing section). In addition, configurations other than those described above may of course be added to the configuration of each device (or each processing section). Further, a part of the configuration of one device (or one processing section) may be included in the configuration of another device (or another processing section) as long as the configuration and operation of the system as a whole remain substantially the same.

In addition, for example, the above-described program may be executed in any apparatus. In that case, it is sufficient if the apparatus includes the required functions (function blocks, etc.) and can obtain the required information.

In addition, for example, each step in one flowchart may be executed by one apparatus or shared by a plurality of apparatuses. Further, in a case where one step includes a plurality of processes, the plurality of processes may be executed by one apparatus or shared by a plurality of apparatuses. In other words, a plurality of processes included in one step can be executed as a plurality of steps. Conversely, processing described as a plurality of steps can be integrated into one step and executed.

In addition, for example, a program executed by a computer may be configured such that the processing steps describing the program are executed in chronological order as described herein, or are executed in parallel or individually at required timings (for example, when the processing is called). In other words, the processing steps may be executed in an order different from the order described above. Further, the processing steps describing one program may be executed in parallel with, or in combination with, the processing of another program.

In addition, for example, unless the implementation is inconsistent with the present technology, a plurality of technologies related to the present technology may be implemented independently and individually. It will be apparent that any number of the present techniques may be implemented together. For example, some or all of the present technology described in any embodiment may be implemented in combination with some or all of the present technology described in another embodiment. Additionally, some or all of any of the techniques described above may be implemented with another technique not described above.

Note that the present technology can also adopt the following configuration.

(1)

An information processing apparatus, comprising:

a file generating section configured to generate a file including, as information different from arrangement information of each of picture regions, region-related information relating to a region corresponding to the stored sub-picture in the entire picture, and also including image encoding data resulting from encoding the sub-picture.

(2)

The information processing apparatus according to (1), wherein,

the entire picture includes an omnidirectional video.

(3)

The information processing apparatus according to (1) or (2), wherein,

the region-related information is included in the file as information of each of the sub-pictures.

(4)

The information processing apparatus according to (3), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file,

the arrangement information of each of the picture regions includes information signaled in a region-by-region packing box, and

the region-related information is stored in a scheme information box in the ISOBMFF file, the scheme information box being different from the region-by-region packing box, or the region-related information is stored in a box at a lower layer of the scheme information box.
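
As a non-normative companion to configuration (4), the Python sketch below serializes a hypothetical ISOBMFF full box that carries such region-related information separately from the region-by-region packing box. The four-character code "2dcc" and the field layout (position and size of the sub-picture within the projected picture) are assumptions made for illustration; the normative box name and syntax are those defined by the disclosure.

    import struct

    def full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
        # ISOBMFF FullBox header: 32-bit size, four-character type,
        # 8-bit version, 24-bit flags, followed by the payload.
        size = 4 + 4 + 1 + 3 + len(payload)
        return struct.pack(">I4sB3s", size, box_type, version,
                           flags.to_bytes(3, "big")) + payload

    # Hypothetical region-related information: the dimensions of the entire
    # (projected) picture and the sub-picture's position and size within it.
    payload = struct.pack(">IIIIII",
                          1920, 1080,  # proj_picture_width/height (assumed)
                          0, 540,      # proj_reg_left/top (assumed)
                          960, 540)    # proj_reg_width/height (assumed)

    box = full_box(b"2dcc", version=0, flags=0, payload=payload)
    print(len(box), box[:8].hex())  # 36 bytes in total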

(5)

The information processing apparatus according to (3), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a coverage information box indicating a display region of the track on the spherical surface.

(6)

The information processing apparatus according to any one of (1) to (5), wherein,

the region-related information changes dynamically within the stream.

(7)

The information processing apparatus according to (6), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a supplemental enhancement information message.

(8)

The information processing apparatus according to (6), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in timed metadata.

(9)

The information processing apparatus according to (6), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a sample group entry.

(10)

An information processing method comprising:

generating a file that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the stored sub-picture, the file also including image encoding data resulting from encoding the sub-picture.

(11)

An information processing apparatus, comprising:

a file acquisition section configured to acquire a file that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the stored sub-picture, and that also includes image encoding data resulting from encoding the sub-picture; and

an image processing section configured to select a stream of the image encoding data based on the region-related information included in the file acquired by the file acquisition section.
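
As a non-normative sketch of the selection performed by the image processing section in configuration (11), the following Python chooses the sub-picture stream whose signaled region contains the current viewport center. The record layout and helper names are assumptions made for illustration.

    from dataclasses import dataclass

    @dataclass
    class RegionInfo:
        # Hypothetical region-related information parsed from the file:
        # the sub-picture's area within the entire projected picture.
        stream_id: str
        left: int
        top: int
        width: int
        height: int

        def covers(self, x: int, y: int) -> bool:
            return (self.left <= x < self.left + self.width
                    and self.top <= y < self.top + self.height)

    def select_stream(regions, viewport_x: int, viewport_y: int) -> str:
        # Return the first stream whose region contains the viewport center,
        # so only that sub-picture stream needs to be fetched and decoded.
        for info in regions:
            if info.covers(viewport_x, viewport_y):
                return info.stream_id
        raise LookupError("no sub-picture covers the requested viewport")

    streams = [RegionInfo("sub0", 0, 0, 960, 540), RegionInfo("sub1", 960, 0, 960, 540)]
    print(select_stream(streams, viewport_x=1200, viewport_y=300))  # -> sub1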

(12)

The information processing apparatus according to (11), wherein,

the entire picture includes an omnidirectional video.

(13)

The information processing apparatus according to (11) or (12), wherein,

the region-related information is included in the file as information of each of the sub-pictures.

(14)

The information processing apparatus according to (13), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a scheme information box in the ISOBMFF file or in a box at a lower layer of the scheme information box.

(15)

The information processing apparatus according to (13), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a coverage information box indicating a display region of the track on the spherical surface.

(16)

The information processing apparatus according to any one of (11) to (15), wherein,

the region-related information changes dynamically within the stream.

(17)

The information processing apparatus according to (16), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a supplemental enhancement information message.

(18)

The information processing apparatus according to (16), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in timed metadata.

(19)

The information processing apparatus according to (16), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the region-related information is stored in a sample group entry.

(20)

An information processing method comprising:

acquiring a file that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the stored sub-picture, and that also includes image encoding data resulting from encoding the sub-picture; and

selecting a stream of the image encoding data based on the region-related information included in the acquired file.

(21)

An information processing apparatus, comprising:

a file generating section configured to generate a file that stores, in each track, image data of each of a plurality of sub-pictures, each of which is divided from an entire picture and then encoded, and that includes stereoscopic information including information relating to stereoscopic display of the entire picture.

(22)

The information processing apparatus according to (21), wherein,

the entire picture includes an omnidirectional video.

(23)

The information processing apparatus according to (21) or (22), wherein,

the stereoscopic information is included in the file as information of each of the sub-pictures.

(24)

The information processing apparatus according to (23), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the stereoscopic information is stored in a scheme information box in the ISOBMFF file or in a box at a lower layer of the scheme information box.
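
As a non-normative illustration of how a client might use the stereoscopic information of configuration (24), the sketch below models a frame-packing indication as it might be read from a box under the scheme information box. The enumeration values and function name are assumptions, not normative definitions.

    from enum import Enum

    class FramePacking(Enum):
        # Assumed values; the actual semantics are defined by the file format.
        MONOSCOPIC = 0
        TOP_BOTTOM = 1
        SIDE_BY_SIDE = 2

    def can_present(packing: FramePacking, display_supports_stereo: bool) -> bool:
        # A client selects a stereo-packed sub-picture track only when the
        # display can actually present it stereoscopically.
        return packing is FramePacking.MONOSCOPIC or display_supports_stereo

    print(can_present(FramePacking.SIDE_BY_SIDE, display_supports_stereo=False))  # False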

(25)

The information processing apparatus according to any one of (21) to (24), wherein,

the file also includes information relating to the display size of the sub-picture.

(26)

The information processing apparatus according to any one of (21) to (25), wherein,

the file further includes sub-stereoscopic information including information related to stereoscopic display of each of the sub-pictures.

(27)

The information processing apparatus according to any one of (21) to (26), wherein,

the file further includes view information indicating a view type of the sub-picture.

(28)

The information processing apparatus according to (27), wherein,

the view information includes information of each of the regions included in the sub-picture.

(29)

The information processing apparatus according to (28), wherein,

the file further includes information indicating whether the view information exists in each of the regions.

(30)

An information processing method comprising:

generating a file that stores, in each track, image data of each of a plurality of sub-pictures, each of which is divided from an entire picture and then encoded, and that includes stereoscopic information including information related to stereoscopic display of the entire picture.

(31)

An information processing apparatus, comprising:

a file acquisition section configured to acquire a file that stores, in each track, image data of each of a plurality of sub-pictures, each of which is divided from an entire picture and then encoded, and that includes stereoscopic information including information relating to stereoscopic display of the entire picture; and

an image processing section configured to select a stream of the image encoding data based on the stereoscopic information included in the file acquired by the file acquisition section.

(32)

The information processing apparatus according to (31), wherein,

the entire picture includes an omnidirectional video.

(33)

The information processing apparatus according to (31) or (32), wherein,

the stereoscopic information is included in the file as information of each of the sub-pictures.

(34)

The information processing apparatus according to (33), wherein,

the file includes an ISOBMFF (international organization for standardization base media file format) file, and

the stereoscopic information is stored in a scheme information box in the ISOBMFF file or in a box at a lower layer of the scheme information box.

(35)

The information processing apparatus according to any one of (31) to (34), wherein,

the file also includes information relating to the display size of the sub-picture.

(36)

The information processing apparatus according to any one of (31) to (35), wherein,

the file further includes sub-stereoscopic information including information related to stereoscopic display of each of the sub-pictures.

(37)

The information processing apparatus according to any one of (31) to (36), wherein,

the file further includes view information indicating a view type of the sub-picture.

(38)

The information processing apparatus according to (37), wherein,

the view information includes information of each of the regions included in the sub-picture.

(39)

The information processing apparatus according to (38), wherein,

the file further includes information indicating whether the view information exists in each of the regions.

(40)

An information processing method comprising:

acquiring a file that stores, in each track, image data of each of a plurality of sub-pictures, each of which is divided from an entire picture and then encoded, and that includes stereoscopic information including information related to stereoscopic display of the entire picture; and

selecting a stream of the image encoding data based on the stereoscopic information included in the acquired file.

(41)

An information processing apparatus, comprising:

a file generating section configured to generate a control file that manages image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the sub-picture, the control file being used to control distribution of the image encoding data.

(42)

The information processing apparatus according to (41), wherein,

the entire picture includes an omnidirectional video.

(43)

The information processing apparatus according to (41) or (42), wherein,

the region-related information is included in the control file as information of each of the sub-pictures.

(44)

The information processing apparatus according to (43), wherein,

the control file includes an MPD (media presentation description) file,

the image encoding data of each of the sub-pictures is managed for each adaptation set,

the arrangement information of each of the picture regions is stored in a region-by-region packing descriptor, and

the region-related information is defined in a supplemental property or an essential property of the MPD file.
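
To make configuration (44) concrete, the following non-normative Python sketch emits an adaptation set carrying a SupplementalProperty descriptor whose attributes convey the region-related information. The schemeIdUri and the value format are placeholders assumed for illustration, not the normative identifiers.

    import xml.etree.ElementTree as ET

    adaptation_set = ET.Element("AdaptationSet", id="1")
    # Hypothetical descriptor carrying the sub-picture's region within the
    # entire picture; the real schemeIdUri is defined by the specification.
    ET.SubElement(adaptation_set, "SupplementalProperty", {
        "schemeIdUri": "urn:example:2d-coverage",  # placeholder URI
        "value": "0,540,960,540",                  # left,top,width,height (assumed)
    })
    print(ET.tostring(adaptation_set, encoding="unicode"))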

(45)

The information processing apparatus according to (43), wherein,

the control file includes an MPD (media presentation description) file,

the image encoding data of each of the sub-pictures is managed for each adaptation set,

the arrangement information of each of the picture regions is stored in a region-by-region packing descriptor, and

the region-related information is defined in a content coverage description of the MPD file.

(46)

An information processing method comprising:

generating a control file that manages image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the sub-picture, the control file being used to control distribution of the image encoding data.

(51)

An information processing apparatus, comprising:

a file acquisition section configured to acquire a control file that manages image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the sub-picture, the control file being used to control distribution of the image encoding data; and

an image processing section configured to select a stream of the image encoding data based on the region-related information included in the control file acquired by the file acquisition section.

(52)

The information processing apparatus according to (51), wherein,

the entire picture includes an omnidirectional video.

(53)

The information processing apparatus according to (51) or (52), wherein,

the region-related information is included in the control file as information of each of the sub-pictures.

(54)

The information processing apparatus according to (53), wherein,

the control file includes an MPD (media presentation description) file,

the image encoding data of each of the sub-pictures is managed for each adaptation set,

the arrangement information of each of the picture regions is stored in a region-by-region packing descriptor, and

the region-related information is defined in a supplemental property or an essential property of the MPD file.

(55)

The information processing apparatus according to (53), wherein,

the control file includes an MPD (media presentation description) file,

the image encoding data of each of the sub-pictures is managed for each adaptation set,

the arrangement information of each of the picture regions is stored in a region-by-region packing descriptor, and

the region-related information is defined in a content coverage description of the MPD file.

(56)

An information processing method comprising:

acquiring a control file that manages image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes, as information different from arrangement information of each of the picture regions, region-related information relating to a region in the entire picture corresponding to the sub-picture, the control file being used to control distribution of the image encoding data; and

selecting a stream of the image encoding data based on the region-related information included in the acquired control file.

(61)

An information processing apparatus, comprising:

a file generating section configured to generate a control file that manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes stereoscopic information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data.

(62)

The information processing apparatus according to (61), wherein,

the entire picture includes an omnidirectional video.

(63)

The information processing apparatus according to (61) or (62), wherein,

the control file further includes view information indicating a view type of the sub-picture.

(64)

The information processing apparatus according to (63), wherein,

the view information includes information of each of the regions included in the sub-picture.

(65)

The information processing apparatus according to (63) or (64), wherein,

the control file further includes information indicating whether the view information exists in each of the regions.

(66)

The information processing apparatus according to any one of (63) to (65), wherein,

the control file further includes information indicating whether the adaptation set is capable of stereoscopic display.

(67)

An information processing method comprising:

generating a control file that manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes stereoscopic information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data.

(71)

An information processing apparatus, comprising:

a file acquisition section configured to acquire a control file that manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes stereoscopic information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data; and

an image processing section configured to select a stream of the image encoding data based on the stereoscopic information included in the control file acquired by the file acquisition section.

(72)

The information processing apparatus according to (71), wherein,

the entire picture includes an omnidirectional video.

(73)

The information processing apparatus according to (71) or (72), wherein,

the control file further includes view information indicating a view type of the sub-picture.

(74)

The information processing apparatus according to (73), wherein,

the view information includes information of each of the regions included in the sub-picture.

(75)

The information processing apparatus according to (73) or (74), wherein,

the control file further includes information indicating whether the view information exists in each of the regions.

(76)

The information processing apparatus according to any one of (73) to (75), wherein,

the control file further includes information indicating whether the adaptation set is capable of stereoscopic display.

(77)

An information processing method comprising:

acquiring a control file that manages, for each adaptation set, image encoding data of each of a plurality of sub-pictures, each of which is divided from the entire picture and then encoded, and that includes stereoscopic information relating to stereoscopic display of the adaptation set, the control file being used to control distribution of the image encoding data; and

selecting a stream of the image encoding data based on the stereoscopic information included in the acquired control file.

[list of reference marks]

100 file generation apparatus, 101 control section, 102 memory, 103 file generation section, 111 data input section, 112 data encoding and generation section, 113 MPD file generation section, 114 storage section, 115 upload section, 121 preprocessing section, 122 encoding section, 123 segment file generation section, 200 client apparatus, 201 control section, 202 memory, 203 reproduction processing section, 211 measurement section, 212 MPD file acquisition section, 213 MPD file processing section, 214 segment file acquisition section, 215 display control section, 216 data analysis and decoding section, 217 display section, 221 segment file processing section, 222 decoding section, 223 display information generation section, 900 computer.
