Apparatus and method for mapping virtual environment

Document No.: 1559152    Publication date: 2020-01-21

Abstract: This technology, "Apparatus and method for mapping virtual environment", was created by N.A.E. Ryan, H.A.D. Spencer, A. Swan, S.A. St John Brislin and P.S. Panesar on 2019-07-11. Its main content comprises: A method of mapping a virtual environment, comprising the steps of: acquiring a first sequence of video images output by a video game title; acquiring a corresponding sequence of in-game virtual camera positions at which the video images were created; acquiring a corresponding sequence of depth buffer values for a depth buffer used by the video game when creating the video images; and, for each of a plurality of video images and corresponding depth buffer values of the acquired sequences, obtaining mapping points corresponding to a selected set of predetermined depth values corresponding to a set of predetermined positions within the respective video image; wherein, for each pair of depth value and video image position, the mapping point has a distance from the virtual camera position based on the depth value, and a position based on the relative positions of the virtual camera and the respective video image position, thereby obtaining a map data set of mapping points corresponding to the first sequence of video images.

1. A method of mapping a virtual environment, comprising the steps of:

acquiring a first sequence of video images output by a video game title;

obtaining a corresponding sequence of in-game virtual camera positions at which the video images were created;

obtaining a corresponding sequence of depth buffer values for a depth buffer used by the video game when creating the video images; and

for each of a plurality of video images of the acquired sequence and corresponding depth buffer values,

obtaining mapping points corresponding to a selected set of predetermined depth values corresponding to a set of predetermined locations within the respective video image; wherein

for each pair of depth value and video image position,

the mapping point has a distance from the virtual camera position based on the depth value, and a position based on the relative positions of the virtual camera and the respective video image position,

thereby obtaining a map data set corresponding to mapped points of the first sequence of video images.

2. The method of claim 1, wherein the set of predetermined locations within a respective video image comprise pixels sampled from one or more horizontal lines through the respective image.

3. The method of claim 1, wherein the set of predetermined locations within each video image comprises a sampled distribution of locations over a region of the video image.

4. The method according to claim 1, comprising the steps of:

obtaining color information at the set of predetermined locations within the respective video image; and

associating the color information with the corresponding generated mapping point.

5. The method according to claim 1, comprising the steps of:

recording the first sequence of video images output by the video game;

recording the corresponding in-game virtual camera position sequence used to create the video images;

recording the corresponding sequence of depth buffer values for a depth buffer used by the video game when creating the video images;

recording the mapping points;

and

associating the sequence of in-game virtual camera positions, the sequence of depth buffer values, and the mapping points with the recording of the first sequence of video images.

6. The method of claim 5, comprising the steps of:

generating a second sequence of video images encoding the sequence of depth buffer values.

7. The method according to claim 1, comprising the steps of:

obtaining one or more additional map data sets generated using sequences of video images, virtual camera positions and depth buffer values from separate instances of the same video game.

8. The method according to claim 1, comprising the steps of:

generating a graphical representation of some or all of the mapped points of at least a first map data set for the video game output.

9. The method according to claim 1, wherein,

the sequence of video images is obtained from a first video recording, wherein the corresponding virtual camera positions are associated with the corresponding sequence of depth buffer values.

10. The method according to claim 9, wherein,

the corresponding sequence of depth buffer values is obtained from a second video recording generated by encoding the sequence of depth buffer values.

11. The method according to claim 9, wherein,

the mapping points are obtained from data associated with a first video recording comprising the sequence of video images, or with a second video recording comprising the sequence of depth buffer values.

12. The method according to claim 9, comprising the steps of:

generating at least a portion of a map comprising a graphical representation of some or all of the mapped points of at least a first map data set for display with the first video recording;

selecting a location on the displayed map using a user interface; and

controlling a playback position of the first video recording by selecting a video frame whose corresponding camera position is closest to the selected position on the displayed map.

13. The method according to claim 9, comprising the steps of:

generating at least a portion of a first map comprising graphical representations of some or all of the mapped points of at least a first map data set for display with the first video recording;

generating at least a portion of a second map comprising graphical representations of some or all of the map points of at least a second set of map data associated with different video recordings of different sequences of video images output by the same video game title and sharing the same in-game coordinate system;

displaying at least a portion of the first map during playback of the first video recording, the displayed portion including at least the current virtual camera position associated with the displayed video image; and

displaying at least a portion of the second map during playback of the first video recording if the respective portion of the second map is within a predetermined range of the current virtual camera position in the in-game coordinate system.

14. The method of claim 13, comprising the steps of:

detecting whether a user interacts with a displayed portion of the second map, and if so, switching to playback of the corresponding second video.

15. A computer-readable medium having computer-executable instructions configured to cause a computer system to perform the method of any of the preceding claims.

16. An entertainment device comprising:

a video data processor configured to obtain a first sequence of video images output by a video game title;

a camera position data processor configured to acquire a corresponding in-game sequence of virtual camera positions that created the video images;

a depth data processor configured to obtain a corresponding sequence of depth buffer values for a depth buffer used by the video game when creating the video images; and

a mapping data processor configured to, for each of a plurality of video images of the acquired sequence and corresponding depth buffer values,

obtain mapping points corresponding to a selected set of predetermined depth values corresponding to a set of predetermined locations within the respective video image; wherein

for each pair of depth value and video image position,

the mapping point has a distance from the virtual camera position based on the depth value, and a position based on the relative positions of the virtual camera and the respective video image position,

thereby obtaining a map data set corresponding to mapped points of the first sequence of video images.

Technical Field

The invention relates to an apparatus and method for mapping a virtual environment.

Background

Players of video games often want assistance, either to help them progress within a game if they are stuck or want to find additional features, or to improve their existing play, for example by beating a personal best score.

One source of assistance may be a map of the game environment. However, such maps are laborious to produce, particularly where a high level of detail is required or the game environment is large. Alternatively, a game developer may render the game environment from a virtual camera position at a virtual high altitude to create a map, but this in turn can have the disadvantage that individual features in the environment become too small or unclear; typically, points of interest and other objects on a map are not scaled down in the way that distances are.

Furthermore, a map of the overall environment may not provide the type of detail, and/or the relevance to the user's own experience, that would make the map appealing.

At the same time, video captures of in-game footage together with commentary by the creator of the video (for example, walkthroughs and speed-runs hosted on video sharing services) are popular as guides or entertainment, but they rely on the commentator providing the information the audience requires and/or explaining or demonstrating the specific actions the audience wishes to see. Whether the relevant information will be provided to satisfy a viewer's requirements for a given video cannot easily be predicted, which can lead to frustration when watching a video does not reveal the desired answer. The omitted information may be, for example, the position of the player within the game. As a result, the benefit of such videos to viewers wanting assistance with a video game is highly variable.

Disclosure of Invention

The present invention seeks to solve or mitigate these problems.

In a first aspect, a method of mapping a virtual environment is provided according to claim 1.

In another aspect, an entertainment device is provided according to claim 16.

Other various aspects and features of the present invention are defined in the appended claims.

Drawings

Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram of an entertainment apparatus operable as one or more of a video recording device, a video playback device, and an event analyzer, according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a system including a video recording device, a video playback device, and an event analyzer, according to an embodiment of the present invention.

Fig. 3 is a flowchart of a video recording method according to an embodiment of the present invention.

Fig. 4A and 4B illustrate a video image and corresponding depth buffer information, according to embodiments of the present invention.

Fig. 5 illustrates an image coding scheme of depth buffer information and additional data according to an embodiment of the present invention.

Fig. 6 is a flowchart of a video playback method according to an embodiment of the present invention.

FIG. 7 illustrates video enhancement with a graphical representation of statistically significant in-game events, according to an embodiment of the present invention.

Fig. 8 is a flowchart of an event analysis method according to an embodiment of the present invention.

Fig. 9-12 are exemplary illustrations of possible enhancements to video recordings of game footage, in accordance with embodiments of the present invention.

FIG. 13 is a flow diagram of a method of mapping a virtual environment according to an embodiment of the invention.

Fig. 14A and 14B illustrate a process of acquiring depth information of a predetermined set of points in a video image according to an embodiment of the present invention.

FIG. 14C illustrates mapped points in a map space according to an embodiment of the invention.

Detailed Description

An apparatus and method for mapping a virtual environment are disclosed. In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. Rather, specific details known to those of ordinary skill in the art have been omitted where appropriate for the sake of clarity.

FIG. 1 schematically illustrates the overall system architecture of the entertainment device. A system unit 10 is provided having various peripheral devices connectable thereto.

The system unit 10 includes an Accelerated Processing Unit (APU) 20, being a single chip that in turn comprises a Central Processing Unit (CPU) 20A and a Graphics Processing Unit (GPU) 20B. The APU 20 has access to a Random Access Memory (RAM) unit 22.

The APU 20 communicates with a bus 40, optionally via an I/O bridge 24, which may be a discrete component or part of the APU 20.

Connected to the bus 40 are data storage components such as a hard disk drive 37 and a Blu-ray® drive 36 operable to access data on a compatible optical disk 36A. In addition, the RAM unit 22 may communicate with the bus 40.

Optionally also connected to bus 40 is an auxiliary processor 38. An auxiliary processor 38 may be provided to run or support the operating system.

The system unit 10 communicates with the peripheral devices as appropriate through an audio/video input port 31, an ethernet port 32, a bluetooth wireless link 33, a Wi-Fi wireless link 34, or one or more Universal Serial Bus (USB) ports 35. Audio and video may be output through an AV output 39, such as an HDMI port.

The peripheral devices may include a monoscopic or stereoscopic video camera 41 (e.g., a PlayStation Eye®); wand-style video game controllers 42 (e.g., a PlayStation Move®) and conventional handheld video game controllers 43 (e.g., a DualShock 4®); portable entertainment devices 44; a keyboard 45 and/or a mouse 46; a media controller 47 (for example in the form of a remote control); and a headset 48. Other peripheral devices may similarly be considered, such as a printer or a 3D printer (not shown).

The GPU 20B, optionally in conjunction with the CPU 20A, generates video images and audio for output via the AV output 39. Optionally, the audio may be generated in conjunction with, or instead by, an audio processor (not shown).

Video and audio may (optionally) be presented to the television 51. With television support, the video may be stereoscopic. The audio may be presented to the home theater system 52 in one of a variety of formats, such as stereo, 5.1 surround sound, or 7.1 surround sound. Video and audio may also be presented to a head mounted display unit 53 worn by the user 60.

In operation, the entertainment device defaults to an operating system, such as a variant of FreeBSD 9.0. The operating system may run on the CPU 20A, the auxiliary processor 38, or a mixture of the two. The operating system provides the user with a graphical user interface, such as the PlayStation Dynamic Menu. The menu allows the user to access operating system features and to select games and, optionally, other content.

Referring now also to FIG. 2, the entertainment device 10 described above may operate, under suitable software instruction, as a video recording device (210A) and/or a video playback device (210B) according to embodiments of the present invention. Optionally, the entertainment device may also operate as an event analyzer 220, either separately from or integrated with the recording/playback roles. In other implementations, the event analyzer may be a remote server, and/or the video playback device may be a different form of device to the entertainment device 10, such as a mobile phone or tablet, a PC, a smart TV, a set-top box, or a different variety of video game console.

If the devices are separate devices, they may communicate over the internet (e.g., using ethernet or WiFi ports 32, 34 as appropriate, or using cellular mobile data).

Turning now also to fig. 3, video recording device 210A may operate according to the following video recording method, wherein:

a first step S310 comprises recording a first sequence of video images output by the video game. For example, the PlayStation 4 routinely saves the video of the current video image output in a data loop that allows the last N minutes of gameplay to be stored, where N may be, for example, 20 minutes. In response to a user input, an in-game event, or a scheduled event, this video data can subsequently also be copied into long-term storage, such as a disk drive of the entertainment device, or a remote host via a network, so that it is not lost when overwritten within the data loop.

A second step S320 comprises recording a sequence of depth buffer values for a depth buffer used by the video game. The entertainment device uses the depth buffer when calculating which parts of a virtual scene are in front of each other and hence potentially occlude each other in the final rendered image. As such, it can provide depth data for each pixel of the rendered image.

In one embodiment, the array of depth data for corresponding pixels of the rendered image may in turn be considered a depth image. Thus, for example, an 8-bit or 16-bit depth value may be stored as an 8-bit or 16-bit grayscale image corresponding to the rendered image. The depth image may be of the same resolution as the corresponding video image, or a reduced resolution version (e.g., 50% in size, having 1/4 pixels) may be used.

Thus, for example, for a conventional image format having three 8-bit data channels (e.g., for red, green, and blue), an 8-bit depth value may occupy one data channel, or a 16-bit depth value may occupy two data channels.
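
By way of illustration, a minimal sketch of this two-channel packing is shown below (it assumes the depth buffer is available as a NumPy uint16 array; the use of red for the high byte and green for the low byte matches the example of FIG. 5 but is otherwise arbitrary):

```python
import numpy as np

def depth_to_rgb(depth16: np.ndarray) -> np.ndarray:
    """Pack an HxW uint16 depth buffer into the red (high byte) and
    green (low byte) channels of an 8-bit RGB image."""
    h, w = depth16.shape
    img = np.zeros((h, w, 3), dtype=np.uint8)
    img[..., 0] = (depth16 >> 8).astype(np.uint8)    # most significant byte
    img[..., 1] = (depth16 & 0xFF).astype(np.uint8)  # least significant byte
    return img

def rgb_to_depth(img: np.ndarray) -> np.ndarray:
    """Recover the uint16 depth buffer from the packed image."""
    return (img[..., 0].astype(np.uint16) << 8) | img[..., 1].astype(np.uint16)
```

The low byte cycling through 0-255 as distance increases is what produces the striping in the terrain described later in relation to FIG. 5.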

Fig. 4A shows a rendered image, while fig. 4B shows the corresponding depth image in two data channels.

Thus, the step of recording the sequence of depth buffer values may comprise, for each instance of the depth buffer being recorded, generating a depth image using at least one of a plurality of image data channels of that depth image.

An alternative method for recording a sequence of depth buffer values is described below.

A third step S330 comprises recording a sequence of in-game virtual camera positions used to generate the video images. The virtual camera position is the position of the camera used when rendering an output video image. The position may be defined as an absolute position with reference to a game-world coordinate system, and/or relative to a global coordinate system with a predetermined origin (for example, in the latter case, the position of the virtual camera in the first output video image may be set as the origin, with subsequent camera positions being relative to it). Optionally, additional camera position information may be provided as required, such as one or more of camera orientation/rotation at that position, field of view, focal length, nearest draw distance, furthest draw distance, and the like.

The virtual camera position data may be recorded as a separate data stream or as metadata associated with the encoded depth data, as described later herein, but alternatively it may be recorded by encoding the data within a different one of the plurality of image data channels that is not being used to record depth buffer values. Thus, for example, in an RGB image that encodes depth data using the red and green channels, the blue channel may be used to encode camera position data. Hence, in embodiments of the present invention, the in-game virtual camera position data is spatially encoded as high-contrast blocks in a respective image data channel.
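
Purely for illustration, one way to realize such high-contrast spatial encoding is to serialize the camera pose to bits and write each bit as a uniform black or white block in the blue channel; the six-value pose layout, block size and row-major placement below are assumptions rather than details taken from the description:

```python
import struct
import numpy as np

def encode_pose_in_blue(img: np.ndarray, pose: tuple, block: int = 16) -> None:
    """Write a camera pose (x, y, z, yaw, pitch, roll) into the blue channel as
    one black-or-white block per bit, so the values survive video compression."""
    payload = struct.pack('<6f', *pose)                      # 24 bytes -> 192 bits
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    blocks_per_row = img.shape[1] // block
    for i, bit in enumerate(bits):
        row, col = divmod(i, blocks_per_row)
        y, x = row * block, col * block
        img[y:y + block, x:x + block, 2] = 255 if bit else 0  # blue channel
```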

Also optionally, where the game is not played from a first-person perspective (i.e., when the player is not notionally at the virtual camera position, but instead a player avatar is displayed on screen), the player's position, and optionally one or more of their direction/rotation, speed, mode (e.g., running/walking) and/or pose, may be recorded using the same coordinate scheme as that used to record the virtual camera position data. Again, this may be a separate data stream, metadata, or encoded within an image (for example, in a similar manner to the virtual camera data).

In a first example of encoding depth data within a color channel of a video image, FIG. 5 shows RGB images of encoded depth data in the red and green channels, and camera/player position data in the blue channel, for the same image as in FIG. 4A, and corresponding to the depth data shown in FIG. 4B. For grayscale reproduction of the image, it should be understood that the blocky features represent spatial encoding of camera and player position data (optionally) and the like in the blue channel. The block size may correspond to a macroblock of the encoded image or may be any size depending on the amount of information to be encoded. The lower limit of the block size will be defined by the block size that can be reliably recovered after video recording, hosting/storage/streaming media (as the case may be), and any image compression/decompression used during the game. Also, for grayscale rendition of the image, it will be appreciated that the stripes in the terrain are caused by the lower 8 bits of the 16-bit depth data in the green channel, causing the green value to cycle between 0 and 255 as the 16-bit distance value increases. In this example, the choice of R, G and B channels is arbitrary. Thus, the supplemental data may be provided as a second sequence of images running in parallel with the sequence of video images output by the video game.

However, this scheme may be susceptible to the higher compression rates often used for online video streams. Accordingly, a preferred set of alternative approaches for recording the depth buffer data, and optionally the other data, will now be described.

In an embodiment of the present invention, two color videos are recorded: the color video of the rendered and displayed scene, as described previously, and also a color-encoded video of the 16-bit depth data, typically based on one of the following formats:

H.265 video using the Main 4:4:4 16 Intra profile at a bit depth of 16 bits with 4:0:0 monochrome sampling, or

H.265 video using the Main 4:4:4 16 Intra profile at a bit depth of 16 bits with 4:4:4 chroma sampling, or

video in a similar format, e.g., the High Throughput 4:4:4 16 Intra profile at 4:0:0 or 4:4:4 and 16 bits,

for example in any HEVC version supporting these profiles, or an equivalent coding scheme, and/or

UCL color video, in which the 16-bit depth buffer is converted into three color channels (e.g., RGB values), which can then be stored like normal video using H.264, AV9, H.265 and the like.

UCL color video enables resilient compression of the depth data (treating it as color data), as described in relation to live-action video in "Adapting Standard Video Codecs for Depth Streaming" by Fabrizio Pece, Jan Kautz and Tim Weyrich, Joint Virtual Reality Conference of EuroVR - EGVE (2011).

In summary, in the UCL color video format, the 16-bit depth data is linearly mapped into the upper 8 bits of the first color channel to provide an approximate depth value. The second and third color channels are then mappings (e.g., triangle waves) applied to the 16-bit depth data, with a period that is at least twice the quantization level of the depth data (e.g., for an 8-bit depth having 256 levels, the period will be 512 or less) but with different phases. Because of the phase difference, they encode complementary high-resolution depth information with respect to the spatial period of the function. The depth information can then be recovered by inverting the mapping of the first channel to provide a coarse depth value, and then inverting the mapping of typically one of the second and third channels, depending on the value from the first channel, to obtain a relative depth value with which to adjust the coarse value.
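
By way of illustration only, the following is a simplified sketch of the encoder side of such a scheme (the exact mappings used in the cited paper differ in detail; the default period of 512 depth levels and the quarter-period phase offset are assumptions made for this sketch):

```python
import numpy as np

def ucl_style_encode(depth16: np.ndarray, period: float = 512.0) -> np.ndarray:
    """Encode a uint16 depth buffer into three 8-bit channels: a coarse linear
    depth in channel 0, plus two phase-shifted triangle waves of the full
    precision depth in channels 1 and 2 for high-resolution refinement."""
    d = depth16.astype(np.float32)

    def tri(x: np.ndarray) -> np.ndarray:
        """Triangle wave with the given period, output in [0, 1]."""
        phase = np.mod(x, period) / period
        return np.where(phase < 0.5, 2.0 * phase, 2.0 - 2.0 * phase)

    c0 = d / 65535.0                    # coarse depth, linear over the full range
    c1 = tri(d)                         # fine depth, phase 0
    c2 = tri(d + period / 4.0)          # fine depth, quarter-period phase offset
    rgb = np.stack([c0, c1, c2], axis=-1)
    return np.round(rgb * 255.0).astype(np.uint8)
```

A decoder first inverts the coarse channel to locate the depth to within one period, and then uses whichever of the two fine channels is locally unambiguous to refine that estimate, as described above.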

In another similarly contemplated format, the 16-bit depth data is compressed to 8 bits in a lossy manner and stored in the upper 8 bits of the first color channel. The first correction layer (e.g., the difference between the original depth layer and the decompressed version of the data in the first color channel) is then stored in the upper 8 bits of the second color channel. Finally, the second correction layer (e.g., the difference between the original depth layer and the decompressed version of the data in the first color channel corrected using the correction layer) is stored in the upper 8 bits of the third color channel. Alternatively, the correction data for a single correction step may be stored in the respective upper 8 bits between the second and third color channels.

In other words, regardless of the format used, the first 8-bit channel provides coarse but global depth values, while the second and third 8-bit channels provide higher resolution depth information to adjust the coarse estimate.

Also, in both cases, using only 8 bits in each channel makes the data more robust to video compression where the video codec provides, for example, a 10-bit color depth.

It is therefore worth noting that in both cases, 16 bits of the original depth data are stored as 24-bit coded depth data (e.g. RGB data or similar data such as YUV, YCOCG, YCBCR etc.) within the host data scheme, and these bits preferably do not comprise the least significant bits in the host data scheme, although it will be appreciated that some schemes allocate different numbers of bits to different channels, so optionally but less preferably at least one but not all of the channels may be fully occupied by coded depth data. Of course, if a potential error in the depth information is acceptable, or the expected compression scheme does not affect the bit values (or only to a visually acceptable degree), the encoded depth data may occupy all of some or all of the corresponding bits of the color channel.

It is worth noting that while the above summary refers to color channels, in some color coding schemes, not all channels correspond to colors themselves (e.g., a channel may indicate brightness); in each case, however, the scheme as a whole is used to define points in a color space, and in this sense these channels are referred to herein as color channels, or equally as color space descriptor channels.

It should be understood that 16-bit depth data and 8-bit compression in three channels are merely exemplary. More generally, in UCL color video, N-bit depth data (where N is typically 16) can be encoded as M bits (where M is typically 8) per color space descriptor channel, and typically M < P, where P is the native bit depth of the corresponding channel in the host data scheme, and M is preferably 1 bit, or more preferably two bits, less than P. The number of color channels used is typically three (e.g., YUV), but may be different (e.g., CMYK).

Thus, more generally, encoding the sequence of depth buffer values comprises encoding depth data of a first bit depth across a plurality of the color space descriptor channels used by the selected video scheme, such that a first channel encodes data indicative of depth to a first level of accuracy, and the or each subsequent channel encodes data that, together with the data of the first channel, is indicative of depth to a second, higher level of accuracy. Typically, during this encoding, the depth data in each channel is encoded at a bit length shorter than that of the respective channel, although this is not required.

Hence, both the color video of the rendered environment and the color-encoded video of the depth data can be encoded and/or compressed by conventional video coding techniques such as H.265, for example so that they can be streamed to one or more viewers, and the depth information will typically survive the resulting quantization at least as robustly as the color data in the rendered video.

Optionally, to provide ease of transmission and synchronization, the two videos may be encoded as a stereoscopic pair (even though they are not one).

A fourth step S340 comprises recording one or more in-game events and their respective in-game positions, using a similar scheme to that used for the virtual camera position and (optionally) the player position. The choice of which in-game events to record in this manner will be made by the designer, but may typically include one or more of crashes/character deaths, overtaking/defeating real or computer-based opponents, changes in the in-game state of the user (e.g., changing weapons or equipment, or engaging a nitro boost in a car), and player choices (e.g., turning left or right to avoid an obstacle, or choosing to skip it). In the latter case, the choice may be associated with a predetermined in-game decision point, which may be location-based (e.g., an obstacle or path choice) or logical (e.g., when navigating a dialogue tree with an in-game character). In the case of a location-based choice, because the user's own position may vary as they respond to the choice, the choice made may be associated with the position of the decision point within the game, rather than with the position of the user or camera, to assist subsequent analysis of the decision. Alternatively or in addition, such a decision may be encoded when it is made by the user, or when the in-game decision point is at the nearest draw position with respect to the virtual camera, or in some other predetermined relationship with the virtual camera (e.g., within a predetermined distance), so as to provide predictability as to which video image the choice data may be associated with; or the choice data may be encoded for every image between these two moments (or similarly, for any video frame for which the camera and/or user avatar is within a predetermined distance of the in-game decision point).

In addition to location-specific events, ongoing events may also be recorded. Hence, optionally, for each video image the current user input (e.g., the buttons pressed, or the associated input values) may also be recorded in a similar manner, to provide an approximate record of the user's interactions with the game; similarly, the user's in-game position (e.g., the avatar position) may be treated as an ongoing event even where it differs from the camera position.

As described later herein, while this recording step typically occurs during play and reflects events arising directly from gameplay, alternatively or in addition the recording of such in-game events may also occur after the video images and other data have been output, and optionally after they have been broadcast/streamed; that is, a viewer subsequently watching the video using a viewer compatible with the techniques herein has sufficient information available to define their own in-game events after the fact. That information may then be shared, for example by republishing the updated video, or by sending it to a central server, as appropriate and as described later herein.
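
Purely as an illustration of the kind of record implied above (the field names and types are hypothetical and not part of the described method), an in-game event might be captured as follows:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class InGameEvent:
    frame_index: int                       # output video frame the event is associated with
    event_type: str                        # e.g. "crash", "overtake", "brake", "choice"
    position: Tuple[float, float, float]   # in-game or global (X, Y, Z) coordinates
    value: float = 0.0                     # optional weight, e.g. score delta or lap time
    user_input: str = ""                   # optional record of buttons pressed at that frame
```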

As described above, data is recorded for each of a sequence of video images output by a video game, but is typically not recorded as part of the output video image sequence itself, but rather as a parallel data sequence in which at least depth data is encoded as a video image.

In the preferred color video encoding approach, the other information described herein as being encoded within a color data channel (such as the virtual camera position, and optionally the avatar position and/or in-game event positions) may instead be stored as a separate data stream associated with the color-encoded depth video (or possibly with the rendered output video, or possibly with both to provide redundancy, or spread across both to reduce or balance their respective overheads, for example to assist with synchronized streaming of the videos).

For example, the data may be stored in one or more extensible metadata platform streams or the like, each associated with one of the two videos. Alternatively, the selected video scheme may include a user field that can accommodate the data.

Thus, in a fifth step S350, the sequence of depth buffer values is encoded as a second sequence of video images; in a sixth step S360, the in-game virtual camera position sequence is associated with at least one of the first and second video image sequences (typically the second video image sequence).

In another optional step, an identifier of the video game is also associated with one or both of the video image sequences (as well as with any optional additional information, such as player position, user selection, etc., that is also encoded).

The identifier enables the video sequence to be analyzed subsequently without having to identify, from the images themselves, the game from which it was recorded. The identifier may simply be the name of the game in plain text, or a unique issue number associated with the game. For simplicity, the identifier may correspond to an identifier already associated with the game, for example for the purposes of registering ownership of the game with an administrative service associated with the host video game console.

Optionally, the one or more in-game events and their respective in-game locations are associated with an identifier of the video game.

The in-game events, their respective in-game locations, and the identifier for the video game may then (optionally) be uploaded to a remote server operable as an event analyzer 220 that receives such data from a plurality of client devices acting as video recorders, and identifies statistically significant characteristics of the data, as described below.

The in-game events and their respective in-game positions may alternatively or additionally be encoded, along with the identifier of the video game, within color channels of the supplementary image sequence that encodes the sequence of depth buffer values and the sequence of in-game virtual camera positions, thereby (also) associating them with the identifier of the video game in this manner. This allows a particular instance of an in-game event to be associated with a particular video recording.

Optionally, each recorded image sequence of the video game (video recording) may have a unique video ID, which may optionally be transmitted to the event analyzer. The event data may then be transmitted to the event analyzer in association with the unique video ID. Subsequently, the event analyzer may optionally transmit the event data, in addition to any statistical analyses, back to a video playback device that transmits the unique video ID to it.

Similarly, optionally, a sequence of depth buffer values and/or a sequence of in-game virtual camera positions, as well as any other optional data (e.g., player avatar positions) may also be uploaded to the event analyzer in association with the unique video ID.

If all of the supplemental data is uploaded in this manner, it may be provided to the server as a parallel video recording encoded as previously described, or as a separate data element encoded by the server in this manner.

Subsequently, when the video playback device transmits the unique video ID found in the video recording, it can receive the entire supplemental data, e.g., a parallel video recording encoded as described previously.

Alternatively or additionally, the video recording device may first upload the parallel video recording to an online host (for example a video hosting or streaming service), retrieve the URL associated with the hosted recording, and embed this URL in a user field of the video recording of the game before that video is itself uploaded to the online host. Equivalently, after uploading the parallel video recording to the online host using the unique video ID as the video's name, the video recording device may embed the video ID in a user field of the video recording of the game, so that it can then be used as a search term with the online host.

In either case, as long as the unique video ID or URL remains associated with the original video (e.g., in a user field of the video), a video playback device configured to implement the techniques disclosed herein can access the required supplemental data either by requesting it from the event analyzer or by accessing the parallel hosted video from the online host, while the video itself remains conventional and can be handled and transmitted by conventional or legacy devices and applications without any special consideration for processing and transmitting the non-standard supplemental data related to these techniques.

Turning now to fig. 6, the video playback device 210B may operate according to the following video playback method, wherein:

a first step S610 comprises obtaining a first video recording of a video game playing session, comprising a sequence of video images. This may be done in any suitable way, such as downloading a video file, streaming a video from a web-based video streaming service, or accessing a video recording already present in local storage (e.g., the HDD 37 or BD ROM 36/36A) of the playback device.

A second step S620 comprises obtaining a second video recording generated by encoding the sequence of depth buffer values (e.g., using the H.265 and UCL approaches described herein).

A third step S630 comprises obtaining the sequence of in-game virtual camera positions associated with at least one of the obtained video recordings, for example as a data file or metadata provided with the video recording, or within a user field. Additional data such as avatar positions and in-game event data may optionally also be included.

Alternatively, such parallel video records including encoded data and/or one or more other data sequences (camera position, avatar position, event position, etc.) may be accessed by referencing a unique video ID obtained from a data field of a received video record and submitted to an event analyzer server, or by referencing a URL or search term obtained from a data field of a received video record for accessing data from a data hosting service.

It will be appreciated that the supplemental data may be downloaded in its entirety, or alternatively streamed at the same rate as the video recording of the video game (and the depth data video, where applicable), to provide the supplemental data in real time. Alternatively, the supplemental data may be streamed with a predetermined frame offset (e.g., 1, 2, 3 or more frames ahead of the video recording of the video game), to provide sufficient processing time for the processed information to be ready for receipt of the corresponding video frame of the video game, and/or to support any smoothing, averaging or error-correction functions for the supplemental data that depend on receipt of subsequent frames of supplemental data. This may be accomplished by padding the video recording of the video game with the required predetermined number of blank frames at its start, or by delaying playback of the video recording of the video game by the required predetermined number of frames. Such a processing time offset may also optionally be implemented if the game footage and depth video are encoded as a stereoscopic pair, so that the depth video is a predetermined one or more frames ahead of the game footage.

As described above, in-game event data may optionally be acquired at this stage, thereby subsequently allowing for augmentation (e.g., display of a path taken or user comment) in response to the in-game event itself.

However, alternatively or additionally, an optional step S635 comprises obtaining data indicative of statistically significant in-game events and their in-game event positions. As described elsewhere herein, this data may be obtained as a file from the event analyzer, or streamed to the video player during playback. The data typically comprises in-game event analysis data, e.g., data relating to the significance of an event, and optionally other statistical data (and typically also the type of event, and the like, to assist with selecting how the event should be represented graphically).

As noted above, the choice of which in-game events to record may be made by the designer, and may include one or more of crashes, character deaths, overtaking or defeating an opponent (or indeed being overtaken or defeated by an opponent), changes in the in-game state of the user, player choices, and/or player inputs. As described above, enhancements based on these events themselves may be provided. However, alternatively, such data may be analyzed as described elsewhere herein, and data relating to that analysis may then be associated with the event positions.

If the data is downloaded in advance as a file, then the event positions may be used to decide when the event analysis data should be used, i.e., before video playback reaches and displays a particular in-game position. Alternatively or additionally, where the data is streamed in synchronization with playback of the recording of the video game, the event analysis data may be streamed according to when the recording of the video game reaches the event position, optionally offset by a predetermined amount from the camera position, or optionally from the player avatar position, at the time of recording.

The fourth step S640 then comprises calculating, for an in-game event position, the corresponding position within the current video image of the first video recording (the game footage), based on the associated in-game virtual camera position and the decoded depth buffer values obtained from the second, depth, video.

Thus, for the currently displayed video image of the video game, if the position of the camera used within the game (in the game's or a global coordinate system) is known, and the depth data of the pixels in the displayed video image is known or can be interpolated from the associated depth data, then in effect the in-game or global (X, Y, Z) coordinate of each pixel in the currently displayed video image of the video game can be determined. Accordingly, the position of an in-game event within the currently displayed video image can be determined.

In particular, if, for a given event, the pixel at the corresponding X, Y position in the currently displayed video image is determined to have a Z coordinate that is closer than the Z coordinate of the event, then the event is in effect occluded from the current viewpoint of the displayed video image by an object within the virtual environment depicted in the video image.

Using this information, in a fifth step S650 the video playback device can enhance the current video image with a graphical representation of an in-game event, responsive to the calculated position. In particular, the video playback device can decide whether or not to occlude some or all of the graphical representation of the in-game event, based on whether elements of the displayed environment currently lie between the in-game event position and the viewpoint presented by the video. For example, the video playback device may prepare to render a simple polygon-based object, such as a tetrahedron acting as a pointer, and then perform so-called Z-culling on the tetrahedron in a final render using the Z values of the video image, so that the tetrahedron appears to be naturally embedded within the environment of the video, appropriately occluded from the current viewpoint of the virtual camera that recorded the video image. Hence, the technique may comprise calculating, from the depth buffer values, any occlusion of the graphical representation at the calculated position caused by foreground elements of the video image.
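
As an illustration of this calculation, the following sketch assumes a simple pinhole projection with a known vertical field of view, a 3x3 camera rotation matrix, and a decoded depth image whose values are view-space distances; the actual projection parameters of a given game may differ:

```python
import numpy as np

def event_occluded(event_pos, cam_pos, cam_rot, fov_y, depth_img) -> bool:
    """Project an in-game event position into the current frame and compare its
    view-space depth with the decoded depth buffer to decide whether a marker
    drawn at that position should be hidden by foreground scenery."""
    h, w = depth_img.shape
    p_cam = cam_rot.T @ (np.asarray(event_pos) - np.asarray(cam_pos))  # world -> camera space
    if p_cam[2] <= 0.0:
        return True                                    # behind the virtual camera
    f = (h / 2.0) / np.tan(fov_y / 2.0)                # focal length in pixels
    u = int(round(p_cam[0] * f / p_cam[2] + w / 2.0))  # image column
    v = int(round(p_cam[1] * f / p_cam[2] + h / 2.0))  # image row
    if not (0 <= u < w and 0 <= v < h):
        return True                                    # outside the current view
    return depth_img[v, u] < p_cam[2]                  # scene surface is nearer than the event
```

Running the same projection in reverse for each pixel yields the per-pixel in-game (X, Y, Z) coordinates described above.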

The playback device need not enhance the video with every acquired in-game event. For example, the viewer may be able to toggle which event types are displayed, or set an event threshold (e.g., only displaying events that add more than a threshold value to the score). Similarly, depending on the settings, the playback device may display in-game events or a subset of them, and/or only statistically significant events or a subset of those, as described elsewhere herein.

Fig. 7 illustrates such an enhancement of the example video image of Fig. 4A. In this figure, paths taken by other players are shown (for example, particularly fast or particularly slow players are shown with a red or blue trail respectively). For the purposes of a grayscale reproduction of the figure, the specific meaning of each trail is not necessary to an understanding of the invention. It should be noted that, as can be seen in the expanded section, where the Z position of the trail data exceeds the Z position of a feature within the video image, the trail disappears behind that feature of the environment.

Fig. 7 also shows a simple polygonal object (in this case a tetrahedron) indicating an event, such as the vehicle braking or stopping. It will be appreciated that more elaborate, visually appealing and informative indicator objects, optionally with their own textures, may be used for this purpose, and that typically different respective graphical representations will be used for correspondingly different event types. It will also be appreciated that other graphical effects may be applied, such as reducing the brightness of video pixels whose x, y, z positions intersect a line in a predetermined direction from the indicator object, thereby creating an apparent shadow of the indicator object within the videoed game environment and so increasing the apparent immersion of the indicator object. Similarly, the video playback device may analyze the effective resolution or compression rate of the video and reduce the effective resolution of the indicator object to substantially match it (for example by pixelating and/or blurring the rendered object), so that the indicator object appears more like part of the background video image.

Optionally, the video playback device may also obtain a sequence of in-game player positions associated with the video recording. As described previously, this may be in a separate file or stream, or encoded within a parallel image sequence. The enhancement stage may then comprise displaying a relationship between the current player position in the video and one or more event positions. This may variously take the form of indicating a distance/countdown value between the player position and an event position, adding a line, arrow or path between the player position and an event position, displaying or fading in an indicator object associated with an event only when the distance between the player position and the event position is within a predetermined threshold, or the like.
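
By way of example, the fade-in behavior described above could be as simple as the following sketch (the distance thresholds are illustrative):

```python
import math

def marker_alpha(player_pos, event_pos, fade_start=50.0, fade_full=10.0) -> float:
    """Return the opacity for an event indicator: invisible beyond fade_start
    (in in-game units), fully opaque within fade_full, and linear in between."""
    d = math.dist(player_pos, event_pos)
    if d >= fade_start:
        return 0.0
    if d <= fade_full:
        return 1.0
    return (fade_start - d) / (fade_start - fade_full)
```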

Turning now to fig. 8, a server operating as the event analyzer 220 may operate according to the following event analysis method, wherein:

a first step S810 includes receiving video game identifiers and one or more associated in-game events and their respective in-game locations from a plurality of video game devices (functioning as video recording devices), respectively. Thus, as previously described herein, the event analyzer receives data from a plurality of video game devices relating to a particular video game that identifies an in-game event occurring within the game.

As previously described, the event analyzer may optionally receive any other supplemental data recorded by the video recording device along with a unique video recording ID.

A second step S820 then includes analyzing one or more aspects of the game events associated with the video game identifier, and their respective in-game locations, to identify statistically significant in-game events.

This may be done, for example, by performing a geospatial analysis of multiple events of a similar kind to identify hot spots, cold spots and other group statistics that are indicative of the behavior of the collective set of players for events of that type, or for a particular instance of an event at a particular position.

An example form of geospatial analysis is the known Getis-Ord Gi* statistic. This analysis evaluates each feature with respect to its neighbors, so that clusters of similar features gain significance with respect to a global evaluation and are therefore identified as hot spots. Cold spots may be identified in the converse manner if desired.

The aspect of an event that is of significance may be selected by the weighting applied to the events. Thus, for example, a set of points in game space relating to where users applied the brakes within a racing game may be weighted according to the respective final lap times associated with each point.

The analysis then generates a z-score for each point (reflecting, for example, how many neighbors also have high weights), and a p-value indicating whether that point is an anomaly.
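
A self-contained sketch of such a hot-spot calculation is given below; it uses a simple binary "within radius" neighborhood rather than a full spatial-statistics library, and the radius and the use of lap times as weights are illustrative assumptions:

```python
import numpy as np

def gi_star_z_scores(points: np.ndarray, weights: np.ndarray, radius: float) -> np.ndarray:
    """Compute a Getis-Ord Gi*-style z-score for each event point.
    points: (n, 2) or (n, 3) in-game positions; weights: (n,) event weights
    (e.g. the lap time associated with each braking point)."""
    n = len(points)
    x_bar = weights.mean()
    s = np.sqrt((weights ** 2).mean() - x_bar ** 2)
    # Binary spatial weights: 1 if within radius of point i (including i itself).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    w = (dist <= radius).astype(float)
    wx = w @ weights                    # sum of neighbor weights for each point
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    denom = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return (wx - x_bar * w_sum) / denom
```

Strongly positive z-scores indicate clusters of heavily weighted events (hot spots); strongly negative scores indicate cold spots.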

These analyses may be performed periodically for a particular game title and/or in response to receiving more data for the game (e.g., after receiving 1, 10, 100, 1000, 10000, or 100000 additional data sets, as appropriate).

It should be understood that different aspects of an event or event type may be analyzed separately in this manner depending on the weights used. For example, braking events may be analyzed with respect to the number of laps, or whether the lap was completed, the vehicle type, or demographic data of the user.

To this end, it should be understood that the video recording device may also record other information about the game, which may be referred to as session data; i.e., not the events within the game themselves, but information about the game, such as difficulty settings, player skill level or equipment used, final outcomes determined after completion of the level (time, score, achievement, ranking, etc.), etc. In the case where only a portion of the level is video recorded, the video recording device may (optionally) append such data to the video recording once the level (or other logical point at which such values are obtained) is reached, even if the video recording itself has been interrupted. Such session data may be stored as a separate file or encoded in the game video recording or parallel video recording in any suitable manner. For example, session data may be included as one or more frames at the beginning or end of a parallel video recording.

Alternatively, or in addition to such geospatial analysis, a user-generated event marker or user ID may be associated with a particular uploaded event data set.

This allows the event analyzer to provide event data corresponding to particular individuals, for example players found on the friends list of the viewing user, identified by their respective user IDs. In this way, alternatively or in addition to the statistical analysis of a wider corpus of players, individual choices and events from friends of the viewer of the playback may be shown as enhancements to the video, and/or the geospatial analysis may be restricted to their friendship group.

This principle can also be extended to clans, teams and other self-identified groups through user-generated tags, so that, for example, an e-sports team could enhance a video released by a competitor simply by overlaying their own performance data.

Further expanding, the geospatial analysis may be performed multiple times for different groups of players to provide analysis based on location (e.g., country statistics) and demographic data (e.g., age and gender). Such information is typically available from registration details associated with each user ID.

In any case, after the analyzing, a third step S830 includes receiving the video game identifier and at least one of the in-game virtual camera position and the in-game player position from the video playback device.

Thus, in addition to receiving data from the video recording device, the event analysis server also receives data requests from one or more video playback devices. The request identifies the video game in the video so that the event analyzer knows the data set to be referenced (although this may be implicit, e.g., when the event analysis server is dedicated to supporting a game, then the act of sending the request to the server involves identification of the relevant video game and data set).

The request also includes at least one of an in-game virtual camera position and an in-game player position. This may take different forms depending on how the data is to be transferred back to the video playback device. If data for an entire level (or a segment/branch of a level or region, etc.) is to be downloaded, then identifying the level serves to identify the position of the camera/player to the extent necessary to retrieve the relevant data.

Meanwhile, if the data is being streamed, then the current position of the camera/player corresponding to the displayed frame of the video stream (or, alternatively, that of a frame a predetermined number of frames away, to account for access and network latency) may be used. This allows the received data stream to track the progress of events within the video, which may vary from video to video according to how each user plays the game and which events they encounter (for example a car crash or breakdown rather than smooth driving), and which can therefore be difficult to predict, making it hard to know in advance which event data is relevant to the current video display.

It should be appreciated that a variety of schemes may be adopted within this technique, between per-level and per-frame updates of position, such as periodic updates of position based on time or distance (e.g., every N seconds or every M meters, where N and M are predetermined values). Alternatively, triggers corresponding to when events occurred during the game may be encoded within the video itself by the video recording device (e.g., as a flag or value in a user field), so that when these are encountered during playback, the corresponding camera or player position is sent to the event analyzer in order to receive data about the corresponding event.

In any event, in response, then in a fourth step S840 the event analysis server selects relevant analysis data for one or more identified statistically significant in-game events that are associated with the received video game identifier and that have in-game positions within a predetermined distance of at least one received position (corresponding to a level, a periodic time or distance update, or a current or upcoming video frame, as described above). The predetermined distance may be the draw distance of the video game, so that the indicator object for an event can appear as if it were part of the in-game render, but in principle it may be any predetermined distance from the received position.
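
In outline, this selection amounts to a simple spatial filter over the analyzed events; the record layout below reuses the hypothetical event structure sketched earlier and assumes a Euclidean distance metric:

```python
import math

def select_nearby_events(events, query_pos, max_dist):
    """Return analyzed events whose in-game position lies within max_dist
    (e.g. the game's draw distance) of the received camera/player position."""
    return [e for e in events if math.dist(e.position, query_pos) <= max_dist]
```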

The selection of the analysis data may follow one or more principles. For example, the data having the greatest significance for an event may be selected (for example, if users are presented with a choice to turn left or right, and 95% of players who turn right complete the level whereas 95% of players who turn left do not, then this aspect of the choice event is more significant than the fact that, say, 60% of cars turn right and 60% of motorcycles turn left). Alternatively, data relating to the choices/events apparently made by the player in the video may be preferentially selected. Similarly, data relating to preferences set by the viewer of the video may be selected, such as looking for the most common events, or the least common events, or preferred events such as crashes but not overtakes.

Finally, in a fifth step S850, the event analysis server transmits data indicating the in-game event analysis data and its in-game position to the video playback apparatus. The video playback device may then use this data to construct an enhancement layer of the video, as previously described, as shown in fig. 7.

It will be appreciated that, because the positions of events within a game are themselves constrained by the rules and environment of the game, the cumulative event data received by the event analysis server will be consistent for that game (e.g., all events relating to overtaking will be at positions corresponding to the track, or at least to positions within the race where overtaking is legitimate). As a result, the positions associated with the event analysis data and the positions associated with the camera/player within the current video will be consistent with each other, and the enhancements will therefore appear to interact naturally with the game environment within the video (as shown in Fig. 7), even though the event analysis server and the video playback device may have no explicit knowledge/model of the actual virtual environment.

It should be understood that, in principle, a video game console may operate as both a video recording device and a video playback device, so that users can review their own gameplay almost immediately with the statistical event data superimposed on top. Furthermore, a video game console can in principle also operate as the event analysis server, for example analyzing a user's (e.g., a professional e-sports player's) game history to help them identify trends within their play.

Variations of the above techniques are also contemplated.

For example, video compression is typically effective in reducing color bit depth. This produces a slightly flat color area but has limited impact on most videos. However, if color channels are used to encode the depth data, this compression may significantly affect the depth data.

Possible solutions include using only the more significant bits within a channel (e.g., only 6 bits of an 8-bit channel, 8 bits of a 10-bit channel, or 10 bits of a 12-bit channel).

A further consideration is that if a 16-bit depth value is encoded across two 8-bit color channels (each having its less significant bits affected by compression), then in effect bits of middle significance within the depth value may be corrupted. However, if the bits are alternated between the two color channels (e.g., depth bit 1 to green bit 1, depth bit 2 to blue bit 1, depth bit 3 to green bit 2, depth bit 4 to blue bit 2, and so on), then only the less significant bits of the depth data will be affected by compression in this manner.
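By way of illustration only, the interleaving described above might be sketched as follows (in Python), assuming that "bit 1" denotes the most significant bit of the 16-bit depth value; the function names are hypothetical.

```python
# Illustrative sketch only: spread a 16-bit depth value across two 8-bit color
# channels so that the compression-sensitive low bits of each channel carry
# only the least significant depth bits.
def encode_depth16(depth):
    g = b = 0
    for i in range(8):
        g |= ((depth >> (15 - 2 * i)) & 1) << (7 - i)  # odd-numbered depth bits -> green
        b |= ((depth >> (14 - 2 * i)) & 1) << (7 - i)  # even-numbered depth bits -> blue
    return g, b

def decode_depth16(g, b):
    depth = 0
    for i in range(8):
        depth |= ((g >> (7 - i)) & 1) << (15 - 2 * i)
        depth |= ((b >> (7 - i)) & 1) << (14 - 2 * i)
    return depth

assert decode_depth16(*encode_depth16(0xA5C3)) == 0xA5C3
```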

The use of the received analysis data may be varied to suit the style of the game that has been recorded. For example, in some cases it may be appropriate to provide commentary that simply indicates whether a player took a popular or unpopular choice at a given point within the game, while for another game a number of graphically complex statistics relating to the positions of multiple players may be appropriate.

Similarly, the event analysis server may provide graphical data (indicator geometry, texture, etc.) in response to receiving the video game identifier, such that the video playback device has graphical assets to highlight events that graphically remain consistent with a particular game in the video (since the video playback device may not have a game installed on its own, and may not have information or assets regarding it).

Although the description relates to a parallel video recording used for the depth data, and suggests that camera and event data etc. are also encoded within data channels of that parallel video recording, in principle such data may instead be included in the user data fields of a video codec that supports such fields, so that part or all of the data is included directly in the video recording of the game. This may be an option where the videos are hosted by non-legacy services that have been adapted to accommodate the present techniques (e.g., by preserving such data fields, or by not applying lossy compression to them).

The techniques described herein enable the depth buffer and in-game virtual camera positions to be used to superimpose the paths taken by different players (e.g., as shown in fig. 7), and other information, on a racetrack or any other traversable virtual environment.

It should be appreciated that these techniques may facilitate the enhancement of video recordings of games for many purposes. For example, graphically presented commentary and analysis may be overlaid on top of e-sports coverage.

Other examples include allowing a user to add virtual objects to a playback of their video. The virtual object may be a marker or a message, for example in the form of a 3D callout containing text. Typically, the user defines such in-game events after the game, and a virtual object or graphical effect is then provided for them.

For example, when viewing a video of a player completing a track, the spectator may leave a message such as "great jump!" at the position where the player makes the jump, or at a peak height indicating the jump, as shown in fig. 9. Where the objects selected to indicate such events are themselves 3D, they may accordingly have a position and orientation consistent with the environment within the video, so that they appear to be part of the originally captured footage. One example is an arrow pointing to hidden treasure, set by the original player or a subsequent viewer of the video, as shown in fig. 10.

In another example, the player or a spectator may choose to display a "death zone" option. The "death zone" may appear in the virtual environment as a shaded volume (as shown by the shaded area in fig. 11) and represents the area of the map in which most players are killed. During the game, the depth data stored in the depth video may be used to render the death zone so that it is displayed at the correct depth for the current view of the environment. This may, for example, be turned on by a spectator watching the player play the game (e.g., in an e-sports competition).

In another example, the virtual object may be a path taken by a successful player, in a manner similar to that shown in fig. 7. In a further example, the virtual object may be an indicator such as the location of an enemy within the game. In yet another example, the virtual object may indicate the effective range of a weapon (see the color-coded bands in fig. 12, indicating the effectiveness of a weapon as a function of distance). For example, a spectator of an e-sports match may wish to turn this on to see where an enemy was when the player was killed.

Thus, it will be understood more generally that while in-game events may in principle be recorded during the game in order to enhance the video, in-game events, and hence virtual objects or graphical effects, may also be associated with the recording (offline or online) after the recording has been generated, and processed in the same way as in-game events recorded during the game, as another possible source or layer of enhancement data.

Thus, it should be appreciated that after a recording has been made and output by the original player, a video viewer compatible with the parallel data set of depth and camera position for the video can calculate where within the recording to define an additional in-game event (e.g., one of the comments, objects, regions or other overlays mentioned above), based on where the user chooses to place the event within the current image; this x, y position in turn corresponds to a depth value (distance) from the camera viewpoint in the respective video image, allowing the event to be defined within the associated data relative to the same reference point as the camera itself (i.e., in a similar manner to other in-game event data).
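By way of illustration only, the calculation of such a world-space anchor point from a selected pixel and its decoded depth might be sketched as follows, assuming a simple pinhole camera model and that the decoded depth represents distance along the view ray (a real engine may instead store depth along the optical axis); all names and parameters are hypothetical.

```python
# Illustrative sketch only: place a viewer-defined in-game event (comment,
# object, region) at the world position seen under a selected pixel.
import numpy as np

def annotation_world_position(px, py, depth, cam_pos, cam_fwd, cam_right, cam_up,
                              width, height, fov_y_deg):
    aspect = width / height
    tan_half_fov = np.tan(np.radians(fov_y_deg) / 2)
    # Offsets of the pixel from the image center, scaled to the view frustum.
    x = (2 * (px + 0.5) / width - 1) * aspect * tan_half_fov
    y = (1 - 2 * (py + 0.5) / height) * tan_half_fov
    ray = np.asarray(cam_fwd) + x * np.asarray(cam_right) + y * np.asarray(cam_up)
    ray /= np.linalg.norm(ray)
    # The event is anchored relative to the same reference as the camera itself.
    return np.asarray(cam_pos) + depth * ray
```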

It should be understood that although reference is made herein to "video recordings" and "video image sequences", these encompass both pre-recorded video (e.g., uploaded to a network-based host or streaming server) and live video (again, for example, uploaded to a streaming server). In either case, the ability to enhance the captured footage is based on the combination of a video recording of the game footage and parallel recordings of the depth data and camera data.

Thus, for example, a streaming game service such as PS NOW may output both color video and depth-encoded video, which can be used to render virtual objects within the live game. For example, a second player on a local console could participate by augmenting and/or visually narrating the first player's experience.

It should be understood that the methods described herein may be performed on conventional hardware as appropriate, either by software instructions or by inclusion or substitution of specialized hardware.

Thus, the required adaptation to existing parts of a conventional equivalent device may be implemented in the form of a computer program product comprising processor implementable instructions stored in a non-transitory machine readable medium, such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or in hardware as an ASIC (application specific integrated circuit) or FPGA (field programmable gate array) or other configurable circuit suitable for adaptation to a conventional equivalent device. Separately, such computer programs may be transmitted by data signals over a network, such as an ethernet network, a wireless network, the internet, or any combination of these or other networks.

Thus, the hardware of the video recording apparatus may be a conventional computing device, such as PlayStation4 operating under suitable software instructions, which includes: a recording processor (e.g., CPU20A operating under suitable software instructions) configured to record a first sequence of video images output by the video game to a video storage device (e.g., HDD 37), the recording processor configured to record (e.g., via suitable software instructions) a sequence of depth buffer values of a depth buffer used by the video game, the recording processor configured to record (e.g., via suitable software instructions) a sequence of in-game virtual camera positions used to generate the video images, and the recording processor configured to record (e.g., via suitable software instructions) one or more in-game events and their respective in-game positions; a video generation processor (e.g., CPU20A, again operating under suitable software instructions) configured to generate a second sequence of video images encoding the sequence of depth buffer values; and an association processor configured to associate (e.g., by suitable software instructions) the sequence of in-game virtual camera positions with at least one of the first and second sequences of video images. In each case, the appropriate software instructions suitably implement the methods and techniques described herein.

Similarly, the hardware of the video playback apparatus may be a conventional computing device, such as PlayStation4 operating under suitable software instructions, which includes: a playback processor (e.g., CPU20A operating under suitable software instructions) configured to obtain a first video recording of a video game session, comprising a sequence of video images, the playback processor configured to obtain (e.g., by suitable software instructions) a second video recording generated by encoding a sequence of depth buffer values, the playback processor configured to obtain a sequence of in-game virtual camera positions associated with at least one obtained video recording; an analysis processor (e.g., CPU20A operating under suitable software instructions) configured to provide the in-game virtual camera position to the event analyzer, the analysis processor configured to obtain data indicative of statistically significant in-game events and in-game event positions (e.g., via suitable software instructions); a position calculation processor (e.g., CPU20A operating under suitable software instructions) configured to calculate a position within a current video image of the first video recording corresponding to an in-game event position, in dependence on the associated in-game virtual camera position and decoded depth buffer values; and an enhancement processor (e.g., CPU20A and/or GPU20B operating under suitable software instructions) configured to enhance the current video image with a graphical representation of a statistically significant in-game event in response to the calculated position. Also in each case, the appropriate software instructions suitably implement the methods and techniques described herein.

Finally, likewise, the event analyzer may be a conventional computing device, such as a server or PlayStation4 operating under suitable software instructions, including: a receiver (e.g., an ethernet port 32 operating with CPU20A under appropriate software instructions) configured to receive video game identifiers and one or more associated in-game events and their respective in-game locations from a plurality of video recording devices, respectively; an analysis processor (e.g., CPU20A operating under suitable software instructions) configured to analyze one or more aspects of the in-game events associated with the video game identifier and their respective in-game locations to identify statistically significant in-game events; the receiver is configured to subsequently receive a video game identifier and at least one of an in-game virtual camera position and an in-game player position from the video playback device; a selection processor (e.g., CPU20A operating under suitable software instructions) configured to select one or more identified statistically significant in-game events associated with the received video game identifier and having in-game locations within a predetermined distance of at least one receiving location; and a transmitter (e.g., an ethernet port 32 operating with CPU20A under appropriate software instructions) configured to transmit data indicative of the in-game events and their in-game locations to the video playback device. Also in each case, the appropriate software instructions suitably implement the methods and techniques described herein.

As previously described, the video recording apparatus, video playback apparatus and event analyzer may be three separate devices, or may be two separate devices in any combination, or a single device that includes all three aspects.

Thus, the system (200) may include a video recording apparatus as described herein, and one or more of a video playback apparatus and an event analyzer, as separate devices or within a single device. In principle, the system may similarly include a video playback device and an event analyzer as described herein.

It will be appreciated that the use of the above techniques and apparatus provides a facility whereby users can record their own gameplay and upload it to a publicly accessible host, for example a video hosting or streaming service (or perhaps a proprietary host administered by the operator of a network associated with the type of video game console). A viewer of the video may then, using a device or application suitable for implementing the techniques described herein, watch the video of the user's gameplay enhanced with information about how other players played the game, thereby providing context for the user's in-game decisions, successes and failures, and creating a richer viewing experience.

As previously mentioned, a useful source of information for players of a game, and potentially for viewers of a video recording of that game, is a map or similar representation that relates to the user's experience of the game or of the video recording.

Turning now to fig. 13, to address or mitigate this problem, a method of mapping a virtual environment (e.g., the gaming environments shown in fig. 4A, 7, 9, 10-12) includes:

in a first step S1310, a first sequence of video images output by a video game title is obtained, for example by accessing the image data directly from the entertainment device while the game is running, or from a video recording of the game generated as previously described herein.

In a second step S1320, a corresponding in-game virtual camera position sequence creating a video image is obtained, e.g. by accessing the relevant data directly from the entertainment device or from a video recording from which the image is output, or from data associated with the video recording, or from data embedded in a separate video recording also comprising depth information, or by any other suitable means for associating a virtual camera position with a corresponding video image, as previously described herein, while running the game.

In a third step S1330, a corresponding sequence of depth buffer values for a depth buffer used by the video game when creating the video images is obtained, for example by accessing the relevant depth buffer data directly from the entertainment device while running the game, or from data associated with the video recording from which the images are obtained (e.g., from a separate data file, or from a second video recording encoding the depth data using any suitable technique described herein), as previously described herein.

Then, in a fourth step S1340, for each of a plurality of video images of the acquired sequence and corresponding depth buffer values, sub-step S1342 comprises: obtaining mapping points corresponding to a selected set of predetermined depth values corresponding to a set of predetermined locations within each video image; wherein, for each pair of a depth value and a video image position, the map point has a distance to the virtual camera position based on the depth value and a position based on the relative position of the virtual camera and the respective video image position.

The mapped points may be obtained by calculating map locations corresponding to distances and directions of the map points, where the directions are based on differences between the optical center of the virtual camera and corresponding selected locations within the video image, and the distances are based on the virtual camera locations and depth values corresponding to the selected locations within the video image.

Figs. 14A to 14C illustrate the process of acquiring a predetermined set of positions comprising the pixels of the centerline of an image. In fig. 14A, the grayscale image elements correspond to depth values, with darker values being farther away. It should be understood that the precise nature of these elements is not important to an understanding of the present invention. As described above, the predetermined set of locations 1410 is a set of pixels along the centerline of the image. The set may sample all such pixels or a subset of them, and may thus include, for example, every Nth pixel, where N may be 1, 2, 3, ... 10, etc., as non-limiting examples.

The top circle 1420 indicates the current virtual camera position and the bottom circle 1430 indicates the virtual camera position in the case of 2D mapping (e.g., height set to zero, or equivalently the current level of the virtual ground subtracted at the virtual camera position). In this example, the field of view of the virtual camera is indicated by the angle between the two lines radiating from each circle.

As shown in fig. 14B, a corresponding depth value of each pixel in the predetermined set is acquired. It should be appreciated that in the case where the resolution of the depth value information is less than the image resolution, a regular sampling of every N pixels corresponding to the effective resolution of the depth information may alternatively be used.

As shown in fig. 14C, map points may then be identified in map space 1450 based on the relative positions of the sample pixels within the image and the positions of the virtual cameras and the acquired depth values.

Typically, a map is generated relative to the origin of the in-game coordinate system. Accordingly, map points may be calculated based on the position of the virtual camera in the game coordinate system and the distance and direction from the virtual camera, as indicated by the depth values and pixel positions within the image captured by the virtual camera.

Optionally, the calculated direction may also take into account the field of view of the virtual camera. Virtual cameras with different fields of view may result in the same pixel location within an image corresponding to different angles away from the optical axis of the virtual camera. Alternatively, the field of view may be fixed or assumed, and/or the effect of the field of view imposes a zoom factor in the calculation that is negligible for generating the map.

Similarly, the calculated direction may (optionally) further take into account the direction of the virtual camera, such that the apparent direction of the sample pixel with respect to the optical axis of the virtual camera is added to the direction of the optical axis of the camera itself. In this case, the direction of the virtual camera may be acquired together with the position of the virtual camera. Likewise, however, for some games, the orientation of the virtual camera may be fixed or assumed, and thus no additional data or calculations are required to account for it.

In any case, the map points are effectively projected from the viewpoint of the virtual camera in the direction indicated by the selected image pixels and over the distance indicated by the corresponding depth data, and are located within the in-game coordinate system on the map space.
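By way of illustration only, a 2D version of this projection for samples taken along a single horizontal line might be sketched as follows, assuming a linear pixel-to-angle approximation of the camera projection, depth values expressed as distances from the camera, and a camera heading (yaw) given in the in-game coordinate system; all names are hypothetical.

```python
# Illustrative sketch only: project map points for one sampled horizontal line
# of a frame (cf. steps S1340/S1342 and figs. 14A-14C).
import math

def map_points_for_line(depth_row, cam_x, cam_z, cam_yaw, fov_x_deg,
                        image_width, sample_step=4):
    points = []
    half_fov = math.radians(fov_x_deg) / 2
    for px in range(0, image_width, sample_step):
        d = depth_row[px]                    # depth value for this sampled pixel
        # Angle of the pixel away from the optical axis, plus the camera heading.
        angle = cam_yaw + (2 * (px + 0.5) / image_width - 1) * half_fov
        # Map point: camera position stepped out by the depth-based distance.
        points.append((cam_x + d * math.sin(angle),
                       cam_z + d * math.cos(angle)))
    return points
```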

Thus, during the course of successive images, a map data set corresponding to map points of the first sequence of video images is acquired.

It should be understood that the image data, depth data, and virtual camera position (and optionally orientation and field of view) may be obtained directly from the game at runtime, or may be obtained from encoded data associated with a video recording of the game, encoded using any of the techniques described herein. Thus, the map data set may be acquired by calculating map points while the game is played, or during playback of the video in accordance with the techniques described herein.

It should also be understood that the map data set may similarly be recorded in association with a video recording of a game using any of the techniques described herein, such that the playback device does not have to calculate map points itself, as described below.

As described above, one predetermined set of sample locations within each video image comprises the centerline of the image, which typically intersects the optical axis of the virtual camera. More generally, however, the predetermined set of locations may include samples along one or more horizontal lines of pixels in the image, such as the centerline together with lines a predetermined distance or proportion above and/or below it, or lines at predetermined positions or proportions relative to the top or bottom of the image (if the centerline is not used).

Thus, for example, a horizontal line 25% up from the bottom of the image may capture features of the terrain that are not included on the centerline. Meanwhile, a horizontal line 75% up from the bottom of the image may capture relevant airborne features, or other features that might be expected in a top-down view of the map; for example, the centerline may intersect the trunk of a tree, but not the branches/crown of the tree that the user might expect to see on the map.

Where multiple horizontal lines are used to sample the image, the absolute, relative or ranked height of each horizontal line may be associated with the resulting map points. Thus, where map points are generated at the same location within the map space, optionally only the map point associated with the absolute, relative or ranked highest horizontal line may be retained. Alternatively, multiple map points at the same location may be retained if they have different absolute, relative or ranked heights.
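By way of illustration only, such a retention rule might be sketched as follows, assuming that map locations are quantized into cells and that each candidate point carries a height value; the names are hypothetical.

```python
# Illustrative sketch only: keep, per quantized map cell, only the point with
# the greatest associated height (e.g. from the highest sampled horizontal line).
def retain_highest(map_cells, x, z, height, cell_size=1.0):
    key = (round(x / cell_size), round(z / cell_size))
    if key not in map_cells or height > map_cells[key]:
        map_cells[key] = height
    return map_cells
```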

Alternatively or additionally, the predetermined set of sample locations within each video image may comprise a distribution of sample locations over the area of the video image. Thus, for example, a regular array of pixels may be sampled (e.g., every Nth pixel horizontally and every Mth pixel vertically, where N may be as previously described and M may be similar). Clearly, where the resolution of the depth information is lower than the pixel resolution, the sampling may instead select pixels corresponding to the effective resolution of the depth information.

In this case, a point cloud may be generated within the map space for each image frame, wherein the direction of each mapped point has an elevation relative to the optical axis of the virtual camera, as well as a left/right direction. Thus, a 3D representation of the map may be generated within the map space. Such a map may still be rendered in 2D if desired (i.e., ignoring the height information), but optionally the point cloud may be used as the basis for rendering a 3D approximation of the mapped area.

Alternatively or additionally, the highest height value in the point cloud within the predetermined 2D area of the map may be stored in association with the map point to give a height value in a similar manner as described above for sampling on multiple horizontal lines.

Additional information that may be included in the map data set includes the successive positions of the virtual camera (e.g., in the case of a first-person perspective game) and/or the positions of the user's avatar (e.g., in the case of a third-person perspective game), so that the user's position trajectory within the map may also be recorded.

In an embodiment of the invention, the further steps comprise: color information is acquired at a predetermined set of locations within each video image, and the acquired color information is associated with a corresponding generated mapping point.

In other words, points on the map may be associated with corresponding points of the same color within the video game environment, such that subsequent graphical displays of the map can resemble the video game environment as seen from a reduced-scale, top-down perspective.

To assist in this process, and referring again to fig. 14C, it should be understood that not only may the mapped points be associated with corresponding points on screen, but so may points along the projection lines from the virtual camera to the mapped points (these lines being calculable, as previously described herein, from the camera position, the depth information, and the viewing direction from the camera through the sampled point into the virtual environment). Referring to fig. 14C, each line emanating from the virtual camera position 1420 terminates at a map point whose distance corresponds to the depth information and whose direction corresponds to the position of the sampled pixel on screen and the optical axis of the virtual camera (as described above). Because each such line represents a clear line of sight between the camera and its end point, color information for points along the line is also available: the color of any display surface lying beneath the line can be sampled for each corresponding position within the map (e.g., for each map pixel, or for every Pth map pixel, where P may be 1, 2, 3, ... 10, etc.).

In this way, the color of the unoccluded terrain visible from the current virtual camera position can be sampled and used to color the map.

It will also be appreciated that using a horizontal line at a suitable position below the image centerline, and/or using a distribution of samples over the image area, or more generally sampling along lines of sight that are not parallel to the virtual terrain but converge with it at some point, will result in lines of sight that terminate at the ground/terrain of the virtual environment. Hence, where color information is captured only at the termination points, this is sufficient to populate the map with color information relating to the terrain, and possibly to elevated features (e.g., buildings, trees, etc.).

Thus, it should be understood that the map may include multiple types of data, including mapping point data indicating the termination points of lines of sight from the virtual camera through a predetermined set of sample locations, and (optionally) simple color data of visible terrain elements below those lines of sight, and/or similar (optionally) color data associated with the mapping points themselves.

As described above, the resulting stored colour information then allows a map or map interface to be generated that is similar to the environment seen within the game, as it effectively samples the rendered textures of the displayed environment and stores these textures at corresponding locations within the map data.

It should be understood that while the video game itself may use this information to generate, for display during play, a map reflecting the user's personal and potentially unique journey within the game, the mapped points may also be recorded in association with a recording of video game images of the kind previously described herein, and more generally so may a map data set including such mapped points and optionally any of the color information described herein.

Therefore, in an embodiment of the present invention, a mapping method comprises: recording a first sequence of video images output by the video game, using any of the techniques described herein; recording a corresponding sequence of in-game virtual camera positions used to create the video images, using any of the techniques described herein; recording a corresponding sequence of depth buffer values for a depth buffer used by the video game while creating the video images, using any of the techniques described herein; recording the mapped points as ancillary data, in a similar manner to event data, camera data, etc., also using any of the techniques described herein; and associating the sequence of in-game virtual camera positions, the depth buffer values, and the mapping points with the recording of the first sequence of video images, also as previously described herein.

As previously described, one of the techniques for recording depth buffer values includes generating a second sequence of video images encoding a sequence of depth buffer values. As previously mentioned, the second sequence of video images may accommodate corresponding auxiliary data, such as virtual video camera information, including location and (optionally) orientation, event data, and/or (it should be understood) map points, map point height information, map point color information (and/or map color information only), and/or the like.

However, it should be understood that any of the previously described techniques may alternatively be used, for example using a separate data file associated with the recording of the first sequence of video images.

It should also be understood that a map data set including map points, and (optionally) map point height information, map point color information, and/or map color only information may be generated during a single game instance (e.g., directly associated with a video sequence corresponding to that instance of the game), but alternatively or additionally, the map data set may be stored locally at the entertainment device, or remotely at the server, so that successive game instances may be added to the map data set, creating a cumulative record of user exploration of virtual environments extending beyond the single game instance.

The mappings obtained from the various game instances may be combined because they share a common in-game coordinate system.

It should be understood that this is true whether or not the various instances of the game come from the same user or the same entertainment device. Thus, in embodiments of the invention, map data sets from multiple users may be combined, with their mapped points (or, more generally, their map data points other than empty or unpopulated points) optionally associated with a given user, so that multiple maps that occasionally intersect or overlap can be disambiguated by user.

In this way, for example, a user can compare his or her map to the maps of their friends and see where different friends have a shared or unique experience.

Similarly, in this manner an entertainment device or server may compile a more complete map of an environment that might not otherwise be easily mapped (e.g., if the environment is procedurally generated), by aggregating the map data sets generated by multiple explorers of that environment.
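By way of illustration only, folding the map points of successive game instances into a cumulative data set, with optional per-user tagging, might be sketched as follows; the representation (a dictionary keyed by quantized map location) is an assumption made for the example.

```python
# Illustrative sketch only: accumulate map points across game instances; all
# instances share the in-game coordinate system, so points combine directly.
def merge_map_data(cumulative, new_points, user_id=None):
    for point in new_points:                 # point: quantized (x, z) location
        users = cumulative.setdefault(point, set())
        if user_id is not None:
            users.add(user_id)               # lets overlapping maps be told apart by user
    return cumulative
```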

Therefore, the mapping method may include: one or more additional sets of map data generated using sequences of video images, virtual camera positions and depth buffer values derived from separate instances of the same video game are obtained. In this case, "separate instances" may likewise mean separate instances of a video game played by the same user on the same device, separate instances of a video game played by different users on the same device, or separate instances of a video game played (by any user) on different devices. It is clear that when playing games on different devices, the "video game" itself may be a different installation of the same video game title.

A graphical representation of some or all of the mapped points of at least the first map data set may then be generated, for example for a video game output.

The above description discusses the creation of a map data set and, more generally, of the mapping data points used by a map data set, using images, depth information, camera information and optionally other information (e.g., color information) obtained directly from the video game itself. However, it should be understood that the video images from the game, together with the depth information, camera information and optionally other information, may be recorded using any of the various techniques described herein for subsequent playback that does not depend on the source video game itself, as described herein.

Thus, it should be understood that the steps of obtaining a first sequence of video images for video game title output (S1310), obtaining a corresponding sequence of in-game virtual camera positions for which video images were created (S1320), and obtaining a corresponding sequence of depth buffer values for a depth buffer used by the video game when the video images were created (S1330) may all be accomplished by retrieving the video game images and associated files (e.g., a second video record encoding depth information and other auxiliary data as described herein) from their video records.

Accordingly, a sequence of video images may be acquired from a first video recording, wherein corresponding virtual camera positions are associated with a sequence of depth buffer values. Typically, but not necessarily, the corresponding sequence of depth buffer values is obtained from a second video recording generated by encoding the sequence of depth buffer values, as described herein.

Accordingly, a playback apparatus using the recorded information can similarly acquire map points using the data, and as previously described herein, can construct an equivalent map data set by calculating them from the image, depth, and camera data. The playback device may then enhance the playback of the recorded information with the map data, or render it separately.

Alternatively or additionally, also as previously described herein, the entertainment device may record a mapped data set, which, as previously described, includes map points and (optionally) height data, color data, etc.; or more generally, the map points may simply be recorded using any suitable technique described herein as ancillary data similar to the event data or camera data, such that the data is associated with a first video recording comprising a sequence of video images, or (optionally) a second video recording comprising a sequence of depth buffer values, typically by encoding in one of these videos.

In this way, advantageously, a map point or set of map data may be included in such a way that the playback device can access the map points, and optionally height data, color data, etc., for at least the portions of the map corresponding to image data that has not yet been displayed (i.e., map information for locations before or after the location currently shown in the video, which would otherwise require the image/depth/camera information for those locations in order to be computed).

Thus, the playback device may generate at least a partial map, including a graphical representation of some or all of the mapped points of at least the first map data set, for display with the first video recording, where the partial map may correspond to a scene in the video that has not yet been displayed.

By displaying some or all of the map that actually summarizes the scenes in the video recording, the map may be used as a user interface for controlling the video recording.

For example, in a video recording of a journey from a start point A to an end point B, the map will resemble the path followed by the camera positions displayed to the user during that journey.

Thus, by using a suitable user interface, such as a mouse pointer, a slider, or a cursor controlled by a joystick, a position on the displayed map can be selected, and this selected position can serve as a proxy for a desired position within the video playback.

As a result, the playback position of the first video recording can be controlled by selecting the video frame whose corresponding camera position is closest to the selected position on the displayed map.
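By way of illustration only, this frame selection might be sketched as follows, assuming simply that a camera position is stored per video frame in the same 2D map coordinates as the displayed map; the function name is hypothetical.

```python
# Illustrative sketch only: choose the playback frame whose recorded camera
# position is closest to the position the viewer selected on the map.
def frame_for_map_click(camera_positions, clicked_pos):
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(range(len(camera_positions)),
               key=lambda i: d2(camera_positions[i], clicked_pos))
```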

In this way, for example, a user may download a video of a journey from start point A to end point B and be presented with a map summarizing the journey (either as an initial interface, or optionally overlaid on or placed alongside the video playback); the user can then click on a point on the map, and playback will jump to the corresponding portion of the video recording.

Similarly, the current camera position associated with a given displayed video frame may be shown on the map, so that viewers of the video can easily relate what they see to a position on the map. This may also help the user navigate playback of the video, for example if more traditional fast-forward or rewind controls are used instead of, or as well as, interacting directly with the map.

As previously discussed, map data sets from multiple game instances may be combined. Thus, while playback of the current video may use the map data associated with that video, map data sets from other game instances (e.g., from a central server) may also be accessed. Typical map data sets to be accessed may be ones generated by the same user but with a different score or other measurable achievement, or by friends of the player featured in the current video who also play the game, or by friends of the viewer of the current video who also play the game, or by other players identified as likely to be of interest to the viewer, such as players at the top of a local, regional or global leaderboard for the game, or players whose recordings contain event data that differs from the event data in the current video recording (e.g., where the currently displayed video has no crash event near the currently displayed location, but another video does have a crash event near that location).

The graphical display of at least a portion of the map from the map data set of one or more of these additional players or game instances can be easily included with the map of the current video because the mapped points share the same in-game coordinate system.

Thus, the mapping method may comprise the steps of: generating at least a portion of a first map, the first map including graphical representations of some or all of the mapped points of at least the first map data set for display with the first video recording; and generating at least a portion of a second map comprising at least a graphical representation of some or all of the mapped points of a second set of map data associated with different video recordings of different sequences of video images output by the same video game title and sharing the same in-game coordinate system; displaying at least a portion of the first map during playback of the first video recording, the displayed portion including at least a current virtual camera position associated with the displayed video image; and displaying at least a portion of the second map during playback of the first video recording if the respective portion of the second map is within a predetermined range of the current virtual camera position in the in-game coordinate system.

In this manner, when gameplay depicted in other available videos takes place in an area of the virtual game environment that is the same as, or close to, that of the current video, a viewer of the current video may see those separate maps appear (e.g., different winding trajectories corresponding to player movement in the different videos).

Subsequently, in an embodiment of the present invention, the playback means may detect whether the user interacts with the displayed portion of the second map, and if so, switch to playback of the corresponding second video.

In this manner, when the maps corresponding to the videos indicate that the videos display scenes from virtual environments that are within a proximity threshold of each other, the viewer may navigate the virtual environment from the perspective of different players by jumping between the videos.

The user's interaction with the second map may be similar to their interaction with the first map, i.e., the second video starts at the frame whose virtual camera position best corresponds to the selected position on the second map. Alternatively or additionally, the user may indicate a desire to switch video streams at the closest point between the mapped points, or between the virtual camera positions, of the two maps. Thus, for example, the user may indicate a desire to switch video streams where the mapping points or camera positions substantially intersect (i.e., where the respective players in the videos cross paths).
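By way of illustration only, finding such a switch-over point might be sketched as follows; this brute-force search over two camera trajectories is an assumption made for the example, and a practical implementation might restrict the search to frames near the current playback position.

```python
# Illustrative sketch only: find the pair of frames at which the camera paths
# of two recordings come closest, as a candidate point for switching playback.
def closest_crossover(cams_a, cams_b, max_gap):
    best = None
    for i, a in enumerate(cams_a):
        for j, b in enumerate(cams_b):
            gap = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
            if gap <= max_gap and (best is None or gap < best[0]):
                best = (gap, i, j)
    return None if best is None else (best[1], best[2])
```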

This gives the viewer the freedom to browse the video recording of the game environment by jumping to the recording of a different player at the corresponding location in their respective in-game tour.

As previously mentioned, maps, and by extension video recordings, may be selected from a wider library of material based on any reasonable criteria, such as video recordings made by friends of the viewer or of the original player, video recordings made by the highest-ranked players, or video recordings with the most "likes" or other indications of viewer approval, etc.

One possibility is for a group of players within a multiplayer online game to each record their own perspective of the game and to associate the videos as a group. These videos may then be viewed according to rules appropriate for generating a narrative of the group's story, using the techniques described herein. Thus, for example, within a deathmatch-style game in which two players are grouped together so that the maps of their two videos intersect, if one of the players kills the other, the video continues if the user is watching the video of the winning player, whereas if the user is watching the video of the defeated player, playback switches to the video of the winning player at the point where their respective camera positions are closest, optionally after the kill event. In this case, for example, event information is also encoded to indicate the nature of such interactions. In a similar manner, within a racing game, for example, video playback may switch to whichever player overtakes the current leader.

As previously mentioned, it should be understood that the methods described herein may be performed on suitable conventional hardware suitably configured by software instructions or by inclusion or substitution of dedicated hardware. Thus, the required configuration of existing parts of a conventional equivalent apparatus may be implemented in the form of a computer program product comprising processor implementable instructions stored or transmitted by any suitable means, as previously described.

Thus, for example, the hardware for an entertainment device operable to implement the mapping techniques described herein may be a conventional computing device, such as PlayStation4 operating under suitable software instructions, that includes: a video data processor (e.g., CPU20A and/or GPU20B) configured to obtain (e.g., via suitable software instructions) a first sequence of video images of a video game title output; a camera position data processor (e.g., CPU20A and/or GPU20B) configured to obtain (e.g., by suitable software instructions) a corresponding in-game virtual camera position sequence that creates a video image; a depth data processor (e.g., CPU20A and/or GPU20B) configured to obtain (e.g., via appropriate software instructions) a corresponding sequence of depth cache values for a depth cache used by the video game in creating the video images; and a mapping data processor (e.g., CPU20A and/or GPU20B) configured (e.g., by suitable software instructions) to, for each of a plurality of video images of the acquired sequence and corresponding depth cache values, acquire mapping points corresponding to a selected set of predetermined depth values corresponding to a set of predetermined positions within the respective video image; wherein for each pair of a depth value and a video image position, the mapping point has a distance from the virtual camera position based on the depth value and a position based on the relative position of the virtual camera and the respective video image position, thereby acquiring a map data set of mapping points corresponding to the first sequence of video images.

Variations of the above-described hardware, corresponding to the various techniques described and claimed herein, as would be apparent to one skilled in the art, are considered to be within the scope of the present invention, including but not limited to:

a predetermined set of locations comprising pixels sampled from one or more horizontal lines through the respective images;

a predetermined set of locations comprising a sampling distribution of locations on a video image area;

a color processor (e.g., CPU20A and/or GPU20B) configured to obtain color information at a predetermined set of locations within each video image and associate the color information with a corresponding generated mapping point;

hardware as previously described herein configured to record a first sequence of video images output by a video game, record a corresponding sequence of in-game virtual camera positions for creating the video images, record a corresponding sequence of depth buffer values of a depth buffer used by the video game when creating the video images (e.g., by generating a second sequence of video images encoding the sequence of depth buffer values), and record mapping points (and optionally mapping color data, height data, etc.), and associate the sequence of in-game virtual camera positions, the sequence of depth buffer values, and the mapping points with the record of the first sequence of video images;

a map processor (e.g., CPU20A and/or GPU20B operating under suitable software instructions) configured to obtain one or more additional map data sets generated using video image sequences, virtual camera positions and depth cache values derived from separate instances of the same video game;

a graphical output processor (e.g., CPU20A and/or GPU20B operating under suitable software instructions) configured to generate a graphical representation of some or all of the map points of at least the first map data set for video game output;

a sequence of video images obtained from a first video recording, wherein corresponding virtual camera positions are associated with a sequence of depth cache values;

the corresponding sequence of depth cache values is obtained from a second video recording generated by encoding the sequence of depth cache values;

the mapping point is obtained from data associated with a first video recording comprising a sequence of video images, or a second video recording comprising a sequence of depth buffer values;

a map interface processor (e.g., CPU20A and/or GPU20B operating under suitable software instructions) configured to generate at least a portion of a first map comprising a graphical representation of some or all of the mapped points of at least a first map data set for display with a first video recording, to select a position on the displayed map using a user interface, and to control the playback position of the first video recording by selecting the video frame whose corresponding camera position is closest to the selected position on the displayed map;

a map interface processor (e.g., CPU20A and/or GPU20B operating under suitable software instructions) configured to generate at least a portion of a first map comprising a graphical representation of some or all of the mapped points of at least a first map data set for display with a first video recording, to generate at least a portion of a second map comprising a graphical representation of some or all of the mapped points of a second map data set associated with a different video recording of a different sequence of video images output by the same video game title and sharing the same in-game coordinate system, to display at least a portion of the first map during playback of the first video recording, the displayed portion containing at least the current virtual camera position associated with the displayed video image, and to display at least a portion of the second map during playback of the first video recording if the respective portion of the second map is within a predetermined range of the current virtual camera position in the in-game coordinate system; and

the map interface processor is configured to detect whether a user interacts with the displayed portion of the second map and, if so, switch to playback of the corresponding second video.
