Real-time VR image filtering method, system and storage medium based on gaze point information

Document No.: 1046869    Publication date: 2020-10-09

Reading note: This technology, "Real-time VR image filtering method, system and storage medium based on gaze point information," was designed and created by 周鹏 and 冀德 on 2019-03-27. Its main content: The invention relates to a real-time VR image filtering method, system and storage medium based on gaze point information, comprising the following steps: capturing a human-eye image with a high-speed camera in a head-mounted display device, and extracting the eyeball's gaze point information from the human-eye image; determining a viewing angle according to the head orientation information of the head-mounted display device, and rendering the VR picture observed from that viewing angle; and determining a gaze point region of the VR picture according to the gaze point information, selecting a non-gaze-point region of the VR picture according to the gaze point region, and filtering the non-gaze-point region. The invention improves the image compression ratio, and can raise the compression ratio further on top of the conventional VR pipeline, thereby further reducing VR's demand on transmission bandwidth. The invention also discloses an implementation method for the real-time VR image filtering system based on gaze point information, and a storage medium storing a program that executes the real-time VR image filtering method based on gaze point information.

1. A real-time VR image filtering method based on gaze point information, comprising the following steps:

step 1, capturing a human-eye image with a high-speed camera in a head-mounted display device, and extracting the eyeball's gaze point information from the human-eye image;

step 2, determining a viewing angle according to the head orientation information of the head-mounted display device, and rendering the VR picture observed from that viewing angle;

step 3, determining a gaze point region of the VR picture according to the gaze point information, selecting a non-gaze-point region of the VR picture according to the gaze point region, and filtering the non-gaze-point region.

2. The method of claim 1, wherein step 3 comprises:

step 31, the gaze point information includes the axis direction information of the focus region; the axis direction information is mapped through a distortion correction function to the pixel coordinates of the display screen, yielding the pixel coordinates of the focus region, and, taking the center point of the focus region as the circle center, the picture is divided according to preset ranges into: a primary region, a middle region, and a secondary region;

step 32, performing image filtering on the middle region and the secondary region.

3. The method of claim 2, wherein the preset range of each region is specifically:

the primary region is the region within 60 degrees of the gaze axis direction;

the middle region is the region between 60 and 120 degrees of the gaze axis direction;

the secondary region is the region beyond 120 degrees of the gaze axis direction.

4. The method of claim 2, wherein the filtering radius of the middle region is smaller than the filtering radius of the secondary region.

5. A real-time VR image filtering system based on gaze point information, comprising:

module 1, which captures a human-eye image with a high-speed camera in a head-mounted display device, and extracts the eyeball's gaze point information from the human-eye image;

module 2, which determines a viewing angle according to the head orientation information of the head-mounted display device, and renders the VR picture observed from that viewing angle;

module 3, which determines a gaze point region of the VR picture according to the gaze point information, selects a non-gaze-point region of the VR picture according to the gaze point region, and filters the non-gaze-point region.

6. The real-time VR image filtering system based on gaze point information of claim 5, wherein module 3 comprises:

module 31, in which the gaze point information includes the axis direction information of the focus region; the axis direction information is mapped through a distortion correction function to the pixel coordinates of the display screen, yielding the pixel coordinates of the focus region, and, taking the center point of the focus region as the circle center, the picture is divided according to preset ranges into: a primary region, a middle region, and a secondary region;

module 32, which performs image filtering on the middle region and the secondary region.

7. The real-time VR image filtering system of claim 6, wherein the preset range of each region is specifically:

the primary region is the region within 60 degrees of the gaze axis direction;

the middle region is the region between 60 and 120 degrees of the gaze axis direction;

the secondary region is the region beyond 120 degrees of the gaze axis direction.

8. The real-time VR image filtering system of claim 6 wherein the filtering radius of the middle region is smaller than the filtering radius of the secondary region.

9. An implementation method for the real-time VR image filtering system based on the gaze point information as claimed in any one of claims 5 to 8.

10. A storage medium storing a program for executing the real-time VR image filtering method based on gaze point information of any one of claims 1 to 4.

Technical Field

The invention relates to the field of Virtual Reality (VR) and image compression, in particular to a real-time VR image filtering method, system, and storage medium based on gaze point information.

Background

Previous VR systems include four parts: 1) a VR rendering system, 2) a data transmission system, 3) a display system, and 4) a motion feedback system. The VR rendering system is responsible for generating left- and right-eye images for binocular vision. The data transmission system provides a high-speed data channel that transfers images to the display system. The display system refreshes pictures to the display device at a high refresh rate; the display device provides independent visual channels for the two eyes so the user experiences stereoscopic vision. The motion feedback system captures the user's motion data and feeds it back to the VR rendering system, so that the rendering perspective changes with the user's motion.

The previous VR system workflow is:

1) The VR rendering system transfers the three-dimensional scene data and rendering parameters to the graphics pipeline on the graphics card through a graphics API such as OpenGL or Direct3D, and the rendered image is saved in video memory. The camera parameters used by the rendering process come from the motion feedback system of the VR headset. Because the system is oriented to binocular vision, the rendering system alternately renders the left and right images. After rendering completes, the image is also lens-corrected, typically with barrel correction. This is because the image is ultimately viewed through the lens of the display system, which applies a pincushion distortion to the flat image. Since pincushion distortion and barrel distortion are inverse functions of each other, barrel distortion must be applied to the image in advance to cancel the effect of the pincushion distortion.
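The barrel/pincushion cancellation described above can be sketched with a single-term radial polynomial model. The function name, the coefficient `k`, and the single-term model itself are illustrative assumptions, not the disclosure's actual correction function:

```python
def barrel_predistort(u, v, k=-0.25):
    # Radial model: r' = r * (1 + k * r^2), applied to normalized
    # lens coordinates centered on the optical axis. With k < 0,
    # points shrink toward the center (barrel distortion), which
    # pre-compensates the lens's pincushion stretch toward the edges.
    r2 = u * u + v * v
    scale = 1.0 + k * r2
    return u * scale, v * scale

center = barrel_predistort(0.0, 0.0)  # the optical axis is unchanged
edge = barrel_predistort(0.8, 0.6)    # edge points are pulled inward
```

Real headsets use per-device polynomial coefficients (often with per-color-channel terms for chromatic aberration), but the inverse-function relationship is the same.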

2) The data transmission system acquires the video memory address of the image, reads the data out of video memory, and transmits it to the display device through a high-speed channel. At this stage, the data may be compressed or uncompressed. Uncompressed data is typically transmitted over HDMI or DisplayPort channels. Compressed data is typically transmitted over wireless or wired networks, and the display system is required to have decompression capability.

3) The display system comprises a lens system and a display screen with a high refresh rate. The lens system uses two independent sets of convex lenses to magnify and relay the content of the display screen to the user's left and right eyes, respectively. The display screen splices the left-eye and right-eye images together and displays them on the same screen; the lens system focuses each eye on the corresponding screen position. To meet VR refresh rate requirements, the display screen typically needs to refresh at least 90 times per second. For compressed images, the display system also needs to decompress the image before display. Decompression is typically done by an on-board decoding chip and usually takes no more than 5 milliseconds.

4) The motion feedback system primarily senses the user's head rotation and displacement and the gaze direction of the eyes, and is typically located in the same device as the display system. To reduce motion-to-photon latency, sensing devices typically sample motion data at a high frequency, for example 120 times per second. The sampled motion data is delivered to the VR rendering system over a separate channel, typically USB or a network, independent of the image channel. The rendering system updates the graphics pipeline's camera parameters and the set of visible three-dimensional objects according to the head and eyeball data, forming a closed control loop.

In the prior art, eyeball gaze information is used only to guide the rendering process in multi-resolution rendering (i.e., the picture is divided into several regions, each rendered at a different resolution). The division is based on the gaze point: the region containing the gaze point uses the highest resolution, while non-gaze-point regions use low resolution. This significantly reduces the shading computation for low-resolution areas, thereby increasing rendering speed. However, multi-resolution rendering suffers from two problems:

1) It must be explicitly bound to the VR application at the architecture level as a stage of the VR rendering pipeline. Therefore, VR applications that do not already implement multi-resolution rendering cannot add this functionality without modifying their architecture.

2) Because the two sides of each division boundary are regions of different resolutions, an abrupt color change exists there, and human eyes are more sensitive to such boundaries. To alleviate this, multi-resolution rendering has to keep the boundaries as far from the gaze point as possible, i.e., shrink the non-gaze-point area. This contradiction greatly reduces the benefit of the algorithm.

Disclosure of Invention

In order to solve the above technical problems, the present invention aims to provide an image filtering system based on gaze point information, which mainly solves two problems: 1. without changing the original VR pipeline, the degree of blur of the non-gaze-point region can be dynamically adjusted, increasing the color similarity of the secondary region; 2. a smooth transition at the boundary region is ensured, making the boundary less noticeable and thereby allowing a larger non-gaze-point region than multi-resolution rendering.

It should be noted that the present invention does not compete with conventional multi-resolution rendering and can be used in combination with it, making the boundaries of multi-resolution rendering smoother and expanding the range of the non-gaze-point region.

Specifically, the invention discloses a real-time VR image filtering method based on gaze point information, comprising the following steps:

step 1, capturing a human-eye image with a high-speed camera in a head-mounted display device, and extracting the eyeball's gaze point information from the human-eye image;

step 2, determining a viewing angle according to the head orientation information of the head-mounted display device, and rendering the VR picture observed from that viewing angle;

step 3, determining a gaze point region of the VR picture according to the gaze point information, selecting a non-gaze-point region of the VR picture according to the gaze point region, and filtering the non-gaze-point region.

In the real-time VR image filtering method based on gaze point information, step 3 comprises:

step 31, the gaze point information includes the axis direction information of the focus region; the axis direction information is mapped through a distortion correction function to the pixel coordinates of the display screen, yielding the pixel coordinates of the focus region, and, taking the center point of the focus region as the circle center, the picture is divided according to preset ranges into: a primary region, a middle region, and a secondary region;

step 32, performing image filtering on the middle region and the secondary region.

In the real-time VR image filtering method based on gaze point information, the preset range of each region is specifically:

the primary region is the region within 60 degrees of the gaze axis direction;

the middle region is the region between 60 and 120 degrees of the gaze axis direction;

the secondary region is the region beyond 120 degrees of the gaze axis direction.

In the real-time VR image filtering method based on gaze point information, the filtering radius of the middle region is smaller than that of the secondary region.

The invention also discloses a real-time VR image filtering system based on gaze point information, comprising:

module 1, which captures a human-eye image with a high-speed camera in a head-mounted display device, and extracts the eyeball's gaze point information from the human-eye image;

module 2, which determines a viewing angle according to the head orientation information of the head-mounted display device, and renders the VR picture observed from that viewing angle;

module 3, which determines a gaze point region of the VR picture according to the gaze point information, selects a non-gaze-point region of the VR picture according to the gaze point region, and filters the non-gaze-point region.

In the real-time VR image filtering system based on gaze point information, module 3 comprises:

module 31, in which the gaze point information includes the axis direction information of the focus region; the axis direction information is mapped through a distortion correction function to the pixel coordinates of the display screen, yielding the pixel coordinates of the focus region, and, taking the center point of the focus region as the circle center, the picture is divided according to preset ranges into: a primary region, a middle region, and a secondary region;

module 32, which performs image filtering on the middle region and the secondary region.

In the real-time VR image filtering system based on gaze point information, the preset range of each region is specifically:

the primary region is the region within 60 degrees of the gaze axis direction;

the middle region is the region between 60 and 120 degrees of the gaze axis direction;

the secondary region is the region beyond 120 degrees of the gaze axis direction.

In the real-time VR image filtering system based on gaze point information, the filtering radius of the middle region is smaller than that of the secondary region.

The invention also discloses an implementation method for the real-time VR image filtering system based on gaze point information.

The invention also discloses a storage medium storing a program that executes the real-time VR image filtering method based on gaze point information.

Through image filtering, the invention effectively increases the color similarity of the non-gaze-point region, and has two advantages over the prior art:

1) the image filtering time is within 1 millisecond, so the image compression ratio can be significantly improved without noticeably increasing system latency;

2) the image compression ratio can be raised further on top of the conventional VR pipeline, further reducing VR's demand on transmission bandwidth.

Drawings

FIG. 1 is a system block diagram;

FIG. 2 is a gaze point region division diagram;

FIG. 3 is a diagram of the mapping from gaze point information to pixels.

Detailed Description

The invention implements a real-time VR image filtering system based on gaze point information. The system consists of the following subsystems: an eye tracking system, a VR rendering system, an image filtering system, and an image compression system.

The eye tracking system is located in the head-mounted display device; it captures images of the human eye with a high-speed camera and extracts the eyeball's gaze point information from those images.

The VR rendering system receives the head orientation information of the head-mounted display device, determines a viewing angle, and renders the frame of the virtual scene visible from that viewing angle.

The image filtering system determines the gaze point region in the picture according to the eyeball gaze point information received from the head-mounted display device, and filters the non-gaze-point region, for example with a blur operation. Filtering is essentially a weighted sum over neighboring pixels, removing the original high-frequency (color value much higher than neighboring pixels) and low-frequency (color value much lower than neighboring pixels) data. This is equivalent to narrowing the color range of each pixel in the filtered region toward the weighted average, bringing the colors closer together.
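The "weighted sum over neighboring pixels" can be illustrated on a single row of pixel values. The 3-tap kernel weights below are an illustrative choice, not taken from the disclosure:

```python
def weighted_filter(pixels, weights=(0.25, 0.5, 0.25)):
    # Each output pixel is a weighted sum of itself and its neighbors
    # (indices clamped at the edges). Outliers are pulled toward the
    # local weighted average, narrowing the color range of the
    # filtered region and increasing pixel-to-pixel similarity.
    n = len(pixels)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(weights):
            k = min(max(i + j - len(weights) // 2, 0), n - 1)
            acc += w * pixels[k]
        out.append(acc)
    return out

row = [10, 250, 20, 240, 30, 245]   # high-contrast input
smoothed = weighted_filter(row)     # values drawn toward the mean
```

After filtering, the spread (max minus min) of the row shrinks, which is exactly the color-range narrowing the paragraph describes.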

The image compression system receives the partially filtered image and generates encoded data. Compared with an unfiltered image, the filtered image has greater color similarity in the non-gaze-point region, so the image compression algorithm produces less encoded data. Although the filtering blurs the image, human eyes have low acuity outside the gaze point region, so the blur is hardly perceived. Conventional VR systems have no image filtering and therefore must rely on high-compression-ratio algorithms to reduce the amount of data before sending it over the network. However, high compression ratios are traded for computation time and image quality, which is unacceptable for VR applications. This system uses the gaze point information to reduce the effective resolution of the secondary region, cutting the data volume at a quality cost that human eyes can hardly observe, so a more real-time algorithm can be chosen in the image compression stage, better suiting VR applications.
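That filtered data yields fewer encoded bytes can be demonstrated with a toy experiment. The random test signal, the box blur, and the use of zlib as a stand-in codec are all illustrative assumptions; a real system would compress 2-D images with a video codec:

```python
import random
import zlib

def box_blur(values, radius):
    # Average each sample over a clamped neighborhood window,
    # a crude stand-in for the patent's non-gaze-point filtering.
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        window = values[lo:hi]
        out.append(int(sum(window) / len(window)))
    return out

random.seed(0)
original = [random.randrange(256) for _ in range(4096)]
blurred = box_blur(original, radius=10)

compressed_original = zlib.compress(bytes(original))
compressed_blurred = zlib.compress(bytes(blurred))
# The blurred signal has less variation, so the codec emits fewer bytes.
```

The incompressible noise stays near its raw size, while the blurred version shrinks noticeably, mirroring the bandwidth argument above.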

In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.

The system used in the invention comprises six parts: 1) a VR rendering system, 2) an image filtering system, 3) a data compression system, 4) a data decompression system, 5) a motion feedback system, and 6) a display system. The VR rendering system is responsible for generating stereoscopic images, i.e., alternately rendering the images to be observed by the left and right eyes. To meet VR immersion requirements, the rendering speed must reach at least 90 left-and-right-eye image pairs per second. After the rendering system outputs an image, the image filtering system determines the non-gaze-point regions of the left and right eyes according to the gaze point information provided by the motion feedback system, and performs image filtering on those regions. The data compression system compresses the filtered image and transmits the encoded data stream to the client over the network. The client's data decompression system receives the encoded image data from the network, restores the left-eye and right-eye images from it, and finally submits them to the display system. The display system refreshes the left-eye and right-eye images to the screen areas corresponding to the respective eyes, and the lens system isolates the binocular visual channels so that each eye sees the correct image. The main difference between the present invention and the prior art is the use of eye-capture information; therefore, among the motion feedback systems, only the human eye capture system is described here. The human eye capture system captures the state of the eye at a given moment with a high-speed camera and then extracts the eye's viewing direction, i.e., the gaze point information. The capture frequency is typically more than 90 times per second, so that the output images of the rendering system and the image filtering system keep up with the eye movements.

The image filtering system is the core of the invention and is a processing stage that does not exist in the prior art. Its processing is divided into two stages:

1) The gaze point region and non-gaze-point region of the image are divided according to the gaze point information. The gaze point gives the axis direction of the region of interest, which determines the two-dimensional coordinates (u, v) on the curved surface of the lens. Because of the pincushion distortion introduced by the lens, this coordinate has a one-to-one mapping relationship with the pixel coordinate (x, y) of the display screen, i.e., (x, y) = f(u, v). By evaluating this function, the central pixel of the region of interest is obtained. This mapping principle is device-independent: lens systems from different manufacturers follow it with only parameter differences. The boundary of the region of interest may have an arbitrary shape around the center pixel; in this example a circle is used. The distances from the center to the boundaries can be configured as system parameters, for example dividing the region of interest of the human eye into three parts: a) a primary region within 60 degrees of the gaze axis, b) a middle region between 60 and 120 degrees of the gaze axis, and c) a secondary region beyond 120 degrees of the gaze axis.
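A sketch of this region division, working in screen pixels after the (x, y) = f(u, v) mapping has been applied. The pixel radii standing in for the 60- and 120-degree angular boundaries are assumed parameters:

```python
def classify_region(px, py, cx, cy, r_primary, r_middle):
    # Classify a screen pixel (px, py) relative to the gaze point
    # center (cx, cy). r_primary and r_middle are pixel radii that
    # stand in for the 60- and 120-degree boundaries after the
    # lens distortion correction has been applied.
    d2 = (px - cx) ** 2 + (py - cy) ** 2
    if d2 <= r_primary ** 2:
        return "primary"      # keep original resolution
    if d2 <= r_middle ** 2:
        return "middle"       # light filtering
    return "secondary"        # heavy filtering
```

A per-pixel classification like this is trivially parallel, which is why the filtering stage maps well onto the graphics card's compute units.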

2) Image filtering is performed on the non-gaze-point area. The primary region must keep the original resolution of the image, because the human eye is extremely sensitive to any image loss in that area. The middle and secondary regions lie away from the eye's focus, so blurring their contents is hardly perceptible. The invention is not limited to a particular blurring function; this example uses a Gaussian blur. The secondary region is less visually important than the middle region, so its degree of blur may be relatively high, which is achieved by using a larger filtering radius for the Gaussian function. The filtering process can be massively parallelized on the graphics card's general-purpose compute units and usually takes no more than 1 millisecond.
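The "larger filtering radius for the secondary region" can be sketched with a normalized 1-D Gaussian kernel builder. The radius values and the sigma = radius / 2 convention are illustrative assumptions:

```python
import math

def gaussian_kernel(radius, sigma=None):
    # Build a normalized 1-D Gaussian kernel of width 2*radius + 1.
    # A larger radius (and sigma) spreads the weight over more
    # neighbors, i.e. a stronger blur.
    sigma = sigma if sigma is not None else radius / 2.0
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Per the scheme above: a small radius for the middle region,
# a larger one for the secondary region (values are illustrative).
middle_kernel = gaussian_kernel(2)
secondary_kernel = gaussian_kernel(5)
```

In 2-D the Gaussian is separable, so applying this kernel once per axis gives the full blur at linear rather than quadratic cost per pixel.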

The following are embodiments corresponding to the above system example, and they can be implemented in cooperation with the embodiments above. The technical details mentioned in the above embodiments remain valid here and are not repeated, in order to reduce repetition; likewise, the technical details mentioned in this embodiment can also be applied to the above embodiments.

The invention also discloses a real-time VR image filtering system based on gaze point information, comprising:

module 1, which captures a human-eye image with a high-speed camera in a head-mounted display device, and extracts the eyeball's gaze point information from the human-eye image;

module 2, which determines a viewing angle according to the head orientation information of the head-mounted display device, and renders the VR picture observed from that viewing angle;

module 3, which determines a gaze point region of the VR picture according to the gaze point information, selects a non-gaze-point region of the VR picture according to the gaze point region, and filters the non-gaze-point region.

In the real-time VR image filtering system based on gaze point information, module 3 comprises:

module 31, in which the gaze point information includes the axis direction information of the focus region; the axis direction information is mapped through a distortion correction function to the pixel coordinates of the display screen, yielding the pixel coordinates of the focus region, and, taking the center point of the focus region as the circle center, the picture is divided according to preset ranges into: a primary region, a middle region, and a secondary region;

module 32, which performs image filtering on the middle region and the secondary region.

In the real-time VR image filtering system based on gaze point information, the preset range of each region is specifically:

the primary region is the region within 60 degrees of the gaze axis direction;

the middle region is the region between 60 and 120 degrees of the gaze axis direction;

the secondary region is the region beyond 120 degrees of the gaze axis direction.

In the real-time VR image filtering system based on gaze point information, the filtering radius of the middle region is smaller than that of the secondary region.

The invention also discloses an implementation method for the real-time VR image filtering system based on gaze point information.

The invention also discloses a storage medium storing a program that executes the real-time VR image filtering method based on gaze point information.
