Client device, client device processing method, server, and server processing method

Document No.: 1722495  Publication date: 2019-12-17

Reading note: This technology, "Client device, client device processing method, server, and server processing method", was devised by 塚越郁夫 on 2018-04-19. Abstract: Multiple clients (viewers) are allowed to share their VR spaces to communicate with each other. A stream of a distributed server is received from a server, the stream including a video stream obtained by encoding a background image. A client transmission stream is received from another client device, the client transmission stream including representative image metadata for displaying a representative image of the other client. The video stream is decoded to obtain the background image. Image data of the representative image is generated based on the representative image metadata. Display image data is obtained by synthesizing the representative image on the background image.

1. A client device, comprising:

a receiver configured to receive a stream of a distributed server from a server, the stream of the distributed server including a video stream obtained by encoding a background image, the background image having a view angle of at least 180 degrees, the receiver being further configured to receive a client transmit stream from another client device, the client transmit stream comprising representative image metadata corresponding to a representative image of the other client device; and

a controller configured to control:

a decoding process to decode the video stream to obtain the background image,

representative image data generation processing of generating the representative image based on the representative image metadata, and

image data synthesizing processing of synthesizing the representative image on the background image.

2. The client device according to claim 1, wherein information indicating an allowable synthesis range for the representative image in the background image is inserted in a layer of the video stream and/or a layer of the stream of the distributed server; and

the controller is configured to control the synthesizing processing based on the information indicating the allowable synthesis range such that the representative image is placed within the allowable synthesis range in the background image.

3. The client device according to claim 2, wherein the representative image metadata includes synthesis position information indicating a synthesis position in the allowable synthesis range for the representative image; and

the controller is configured to control the synthesizing process in such a manner that the representative image is synthesized at the synthesis position indicated by the synthesis position information.

4. The client device according to claim 2, wherein the representative image metadata includes size information indicating a size of the representative image; and

the controller is configured to control the synthesizing process so as to synthesize the representative image on the background image in accordance with the size indicated by the size information.

5. The client device of claim 3, wherein the client transmit stream comprises audio data corresponding to the representative image metadata and object metadata; and

the controller is further configured to perform audio output processing in which rendering processing corresponding to the object metadata is performed on the audio data to obtain audio output data in which a sound image position coincides with a synthesis position of the representative image.

6. The client device of claim 3, wherein the client transmit stream comprises text data corresponding to the representative image metadata and display position information; and

the controller is further configured to control a text synthesis process to synthesize text display data on the background image based on the display position information, thereby displaying text represented by the text data at a position corresponding to the synthesis position of the representative image.

7. The client device of claim 1, further comprising a transmitter configured to transmit, to the other client device, a client transmit stream comprising representative image metadata for displaying a representative image of the own client;

wherein the representative image data generation processing further generates the representative image of the own client based on the representative image metadata for displaying the representative image of the own client.

8. The client device of claim 1, wherein the background image is image data of a wide view image, the wide view being 270 degrees or greater; and

the controller also controls an image clipping process of clipping a portion of the background image to obtain display image data.

9. A client device processing method, comprising:

receiving, with a receiver, a stream of a distributed server from a server, the stream of the distributed server including a video stream obtained by encoding a background image, the background image having a view angle of at least 180 degrees, and further receiving a client transmission stream from another client device, the client transmission stream including representative image metadata for displaying a representative image of the other client; and

controlling with a controller:

a decoding process to decode the video stream to obtain the background image,

representative image data generation processing of generating the representative image based on the representative image metadata, and

image data synthesizing processing of synthesizing the representative image on the background image.

10. A server, comprising:

an imaging device configured to capture an image of a subject to obtain a background image, the background image having a view angle of at least 180 degrees; and

a transmitter configured to transmit a stream of a distributed server including a video stream obtained by encoding the background image to a client device;

wherein information indicating an allowable synthesis range for a representative image in the background image is inserted into a layer of the video stream and/or a layer of the stream of the distributed server.

11. The server of claim 10, wherein the background image is image data of a wide view image, the wide view being at least 180 degrees.

12. A non-transitory computer readable medium having computer readable instructions which, when executed by a processor, perform a method comprising:

receiving, using a receiver, a stream of a distributed server from a server, the stream of the distributed server including a video stream obtained by encoding a background image, the background image having a view angle of at least 180 degrees, and further receiving a client transmission stream from another client device, the client transmission stream including representative image metadata for displaying a representative image of the other client; and

controlling with a controller:

a decoding process to decode the video stream to obtain the background image,

representative image data generation processing of generating the representative image based on the representative image metadata, and

image data synthesizing processing of synthesizing the representative image on the background image.

Technical Field

The present technology relates to a client device, a client device processing method, a server, and a server processing method. More particularly, the technology relates to a client device that synthesizes a proxy image (or representative image), such as an avatar of each client, on a background image (or transmitted image) received from a server.

Background

There are head-mounted displays (HMDs), each of which is worn on the head of a user and presents an image to the user using, for example, a display part located in front of the user's eyes (see, for example, Patent Document 1). Recent years have seen a trend toward individuals enjoying, on an HMD, omnidirectional (full-sphere) images prepared for virtual reality (VR) use. Multiple people, each enjoying a personalized VR space, can be expected not only to view their VR spaces individually but also to share them in order to communicate with each other.

Reference list

Patent document

Patent document 1: JP 2016-

Disclosure of Invention

Technical problem

The purpose of the present technology is to allow multiple clients (viewers) to share their VR space in order to communicate with each other.

Means for solving the problems

According to an aspect of the present technology, there is provided a client device including:

a receiving section configured to receive a stream of a distributed server from a server, the stream of the distributed server including a video stream obtained by encoding a background image, and also to receive a client transmission stream from another client device, the client transmission stream including proxy image meta information for displaying a proxy image of the other client; and

a control section configured to control a decoding process of decoding the video stream to obtain the background image, a proxy image data generation process of generating the proxy image based on the proxy image meta information, and an image data synthesis process of synthesizing the proxy image on the background image.

With the present technology, the receiving section receives a stream of a distributed server from a server, the stream of the distributed server including a video stream obtained by encoding a background image, and also receives a client transmission stream from another client device, the client transmission stream including proxy image meta information for displaying a proxy image of the other client. The proxy image is, for example, an avatar or a symbol recognizable as a character.

The control section, or a controller such as a computer processor, controls the decoding process, the proxy image data generation process, and the image data synthesis process. The decoding process involves decoding the video stream to obtain the background image. The proxy image data generation process involves generating the proxy image based on the proxy image meta information. The image data synthesis process involves synthesizing the proxy image on the background image.
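The three controlled processes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the `ProxyImageMeta` fields, the placeholder "decoder" that treats the payload as raw 8-bit pixels, and the flat-colored rectangle standing in for an avatar.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of 8-bit pixel values (illustrative only)


@dataclass
class ProxyImageMeta:
    """Hypothetical stand-in for proxy image meta information."""
    x: int       # left edge of the synthesis position, in pixels
    y: int       # top edge of the synthesis position, in pixels
    width: int   # proxy image width
    height: int  # proxy image height
    value: int   # flat pixel value used to draw the placeholder avatar


def decode_video_stream(encoded: bytes, width: int, height: int) -> Image:
    """Placeholder decoding process: the payload is assumed to be raw pixels."""
    if len(encoded) != width * height:
        raise ValueError("payload size does not match the frame size")
    return [list(encoded[r * width:(r + 1) * width]) for r in range(height)]


def generate_proxy_image(meta: ProxyImageMeta) -> Image:
    """Proxy image data generation: a flat rectangle stands in for an avatar."""
    return [[meta.value] * meta.width for _ in range(meta.height)]


def synthesize(background: Image, proxy: Image, meta: ProxyImageMeta) -> Image:
    """Image data synthesis: overwrite background pixels with the proxy image."""
    out = [row[:] for row in background]
    for dy, row in enumerate(proxy):
        for dx, pixel in enumerate(row):
            yy, xx = meta.y + dy, meta.x + dx
            if 0 <= yy < len(out) and 0 <= xx < len(out[0]):
                out[yy][xx] = pixel
    return out


def build_display_image(encoded: bytes, width: int, height: int,
                        metas: List[ProxyImageMeta]) -> Image:
    """Run decoding, proxy generation, and synthesis for every other client."""
    frame = decode_video_stream(encoded, width, height)
    for meta in metas:
        frame = synthesize(frame, generate_proxy_image(meta), meta)
    return frame
```

A real client would replace the placeholder decoder with an actual video decoder and look the avatar up from the received meta information; the sketch only shows the order of the three processes.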

For example, information indicating an allowable synthesis range for the proxy image in the background image may be inserted into a layer of the video stream and/or a layer of the stream of the distributed server. Based on the information indicating the allowable synthesis range, the control section may control the synthesis process such that the proxy image is placed within the allowable synthesis range in the background image.

In this case, the proxy image meta information may include synthesis position information indicating a synthesis position within the allowable synthesis range for the proxy image. The control section may control the synthesis process so that the proxy image is synthesized at the synthesis position indicated by the synthesis position information. Also in this case, for example, the proxy image meta information may include size information indicating the size of the proxy image. The control section may control the synthesis process so that the proxy image is synthesized on the background image in accordance with the size indicated by the size information.
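As a rough illustration of keeping a proxy image inside the allowable synthesis range, the following Python sketch clamps a requested position and size to a rectangle. The `Rect` type, the function name, and the convention that the synthesis position is an offset from the top-left corner of the range are assumptions made for this example; the disclosure does not specify them here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def place_in_allowed_range(allowed: Rect, offset_x: int, offset_y: int,
                           size_w: int, size_h: int) -> Rect:
    """Return the proxy image rectangle in background coordinates.

    Assumption: (offset_x, offset_y) is measured from the top-left corner
    of the allowable synthesis range.
    """
    # Shrink the proxy image if it is larger than the allowable range.
    w = min(size_w, allowed.width)
    h = min(size_h, allowed.height)
    # Clamp the offset so the whole proxy image stays inside the range.
    x = allowed.x + max(0, min(offset_x, allowed.width - w))
    y = allowed.y + max(0, min(offset_y, allowed.height - h))
    return Rect(x, y, w, h)
```

For instance, with an allowable range of `Rect(100, 50, 400, 200)`, a 64 x 64 proxy image requested at offset (390, 10) would be pulled back to x = 436 so that its right edge does not leave the range.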

With the present technology, as described above, the proxy image is generated based on the proxy image meta information and is synthesized on the background image. This allows each client to recognize the proxy image of another client synthesized on the common background image. Thus, the clients can share their VR spaces and communicate pleasantly with each other.

Note that with the present technology, for example, the client transmission stream may include audio data corresponding to the proxy image meta information, together with object metadata. The control section may further perform audio output processing in which rendering processing corresponding to the object metadata is performed on the audio data to obtain audio output data whose sound image position coincides with the synthesis position of the proxy image. This allows each client to perceive the voice of another client as if it were emitted by that client's proxy image at its synthesis position in the background image.
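One possible way to make a sound image position coincide with a synthesis position is to convert the center of the proxy image into a direction. The Python sketch below is only an illustration under stated assumptions: the background is assumed to be an equirectangular image, the default coverage of 360 by 180 degrees is a placeholder, and the sign conventions (positive azimuth to the right of center, positive elevation upward) are chosen arbitrarily. The function name is hypothetical.

```python
from typing import Tuple


def synthesis_position_to_direction(cx: float, cy: float,
                                    image_width: int, image_height: int,
                                    h_view_deg: float = 360.0,
                                    v_view_deg: float = 180.0) -> Tuple[float, float]:
    """Map a pixel position to (azimuth, elevation) in degrees.

    Assumptions: equirectangular layout, direction (0, 0) at the image
    center, azimuth increasing to the right, elevation increasing upward.
    """
    azimuth = (cx / image_width - 0.5) * h_view_deg
    elevation = (0.5 - cy / image_height) * v_view_deg
    return azimuth, elevation
```

A renderer could then use the resulting direction, together with a distance value, as the position of the audio object for that client.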

Also with the present technology, for example, the client transmission stream may include subtitle (or text) data corresponding to the proxy image meta information, together with display position information. The control section may also control a subtitle synthesis process to synthesize subtitle display data on the background image based on the display position information, such that the subtitle represented by the subtitle data is displayed at a position corresponding to the synthesis position of the proxy image. This allows each client to recognize a subtitle from another client at a position corresponding to the synthesis position of that client's proxy image in the background image.

Also with the present technology, for example, the client device may further include a transmission section configured to transmit, to another client device, a client transmission stream including proxy image meta information for displaying a proxy image of the own client. The proxy image data generation process may also generate the proxy image of the own client based on the proxy image meta information for displaying the proxy image of the own client. This makes it possible to synthesize not only the proxy images of other clients but also the proxy image of the own client on the background image.

Also with the present technology, for example, the background image may be a wide view image, that is, an image with a view angle of 180 degrees or more. The control section may also control an image clipping process of clipping a part of the background image to obtain display image data. For example, an image derived from the display image data may be displayed on the HMD, with the clip range determined by the pose of the head detected by a sensor mounted on the HMD.
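The relationship between head pose and clip range can be sketched as follows. This is a simplified, horizontal-only illustration in Python; the function name, the linear pixels-per-degree mapping, and the choice to clamp at the image edges rather than wrap around are assumptions for the example and are not taken from the disclosure.

```python
from typing import Tuple


def clip_columns(image_width: int, view_angle_deg: float,
                 yaw_deg: float, display_fov_deg: float) -> Tuple[int, int]:
    """Return the (left, right) column bounds of the clip range.

    Assumptions: the background spans view_angle_deg horizontally with a
    linear pixel-to-angle mapping, yaw 0 looks at the image center, and
    the clip range is clamped at the image edges (no wrap-around).
    """
    px_per_deg = image_width / view_angle_deg
    clip_w = min(round(display_fov_deg * px_per_deg), image_width)
    center = image_width / 2 + yaw_deg * px_per_deg
    left = round(center - clip_w / 2)
    left = max(0, min(left, image_width - clip_w))
    return left, left + clip_w
```

For a 3600-pixel-wide background covering 180 degrees and a display with a 90-degree field of view, a yaw of 0 selects columns 900 to 2700, while a yaw of 60 degrees is clamped to the right edge (columns 1800 to 3600).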

Further, according to another concept of the present technology, there is provided a server including:

an imaging section configured to image a subject to obtain a background image; and

a transmitting section configured to transmit a stream of a distributed server including a video stream obtained by encoding the background image to a client device;

wherein information indicating an allowable synthesis range for a proxy image in the background image is inserted into a layer of the video stream and/or a layer of the stream of the distributed server.

With the present technology, the imaging section images a subject to obtain a background image. The background image may be, for example, image data of a wide view image. The transmitting section transmits a stream of a distributed server including a video stream obtained by encoding the background image to the client device. In this configuration, information indicating the allowable synthesis range for the proxy image in the background image is inserted into a layer of the video stream and/or a layer of the stream of the distributed server.

According to the present technology, as described above, information indicating the allowable synthesis range for the proxy image in the background image is inserted, at transmission time, into a layer of the video stream and/or a layer of the stream of the distributed server. Based on this information, the client device can easily place the proxy image of each client in the background image within the range intended by the server.

Advantageous effects of the invention

The present technology allows multiple clients to share their own VR spaces for communicating with each other. Note that the above-described advantageous effects are not limitations of the present disclosure. Other advantages of the present disclosure will become apparent from the description that follows.

Drawings

Fig. 1 is a block diagram depicting a typical configuration of a space-sharing display system embodying the present technology.

Fig. 2 is a schematic diagram depicting a typical relationship between a server and a plurality of client devices between which streams are sent and received.

Fig. 3 is a block diagram depicting a typical configuration of a server.

Fig. 4 is a table diagram depicting a typical structure of a video attribute information SEI message.

Fig. 5 is a table diagram depicting the contents of main information in a typical structure of a video attribute information SEI message.

Fig. 6 is a set of schematic diagrams illustrating information on camera states.

Fig. 7 is a table diagram depicting typical information saved in a video attribute information box.

Fig. 8 is a block diagram depicting a typical configuration of a transmission system of a client device.

Fig. 9 is a set of table diagrams depicting a typical structure of avatar rendering control information and the contents of main information in the typical structure.

Fig. 10 is a set of table diagrams depicting a typical structure of avatar database selection information and the contents of main information in the typical structure.

Fig. 11 is a set of table diagrams depicting a typical structure of voice object rendering information as object metadata on each object and the contents of main information in the typical structure.

Fig. 12 is a diagram illustrating how values of "azimuth", "radius", and "elevation" are obtained.

Fig. 13 is a set of table diagrams illustrating a typical structure of a TTML structure and metadata.

Fig. 14 is a block diagram depicting a typical configuration of a receiving system of a client device.

Fig. 15 is a block diagram depicting a typical configuration of a receiving module.

Fig. 16 is a block diagram depicting a typical configuration of an avatar database selecting part.

Fig. 17 is a table diagram depicting a typical list of an avatar database.

Fig. 18 is a schematic diagram outlining a rendering process performed by a renderer.

Fig. 19 is a schematic diagram outlining sound pressure control by remapping performed by the renderer.

Fig. 20 is a schematic diagram depicting a typical background image.

Fig. 21 is a diagram depicting a typical state in which an avatar and a subtitle are synthesized in the allowable synthesis range (sy_window) of a background image.

Detailed Description

Described below are preferred modes (hereinafter, referred to as embodiments) for carrying out the present invention. Note that the description will be given under the following headings:
