Method and device for forming 3D vision by using dynamic image of endoscope lens

Document No.: 1538296 Publication date: 2020-02-14

Abstract: "Method and device for forming 3D vision by using dynamic image of endoscope lens" (一种利用内镜镜头动态图像形成3D视觉的方法及装置) was created by 张澍田, 陈东 and 张一[…] on 2019-11-29. The invention provides a method and a device for forming 3D vision from the dynamic images of an endoscope lens, and relates to the technical field of 3D vision formation. The device comprises a single-lens shooting system, a disparity map extraction system, a depth map calculation system, a video synthesis system, a visualization system and a data processing system. The method begins with step S1, extraction of the inter-frame disparity map: the required group of pictures is captured with a single lens, with a time interval between successive pictures; the captured images are input into a video encoder and a disparity map is calculated for each frame; the output of the encoder comprises standard MPEG4 format data and disparity maps in the X direction and the Y direction. Because the images are processed in this way, computing the depth map does not require highly computation-intensive operations, so the method is also suitable for converting 2D video into 3D video on low-end hardware.

1. A device for forming 3D vision using dynamic images from an endoscope lens, characterized in that the device comprises a single-lens shooting system, a disparity map extraction system, a depth map calculation system, a video synthesis system, a visualization system and a data processing system.

2. A method for forming 3D vision using dynamic images from an endoscope lens, characterized in that the method comprises the following steps:

S1. Extracting the inter-frame disparity map:

capturing the required group of pictures with a single lens, ensuring a time interval between successive pictures, inputting the captured images into a video encoder, and calculating a disparity map for each frame, wherein the output of the encoder comprises standard MPEG4 format data and disparity maps in the X direction and the Y direction;

S2. Converting the disparity map into a depth map:

the depth map is obtained from the extracted inter-frame disparity map; computing a dense depth map is essentially the process of finding the correspondence between pixel positions in one frame and pixel positions in another frame; the mapping from one image to the other can be obtained by matching the spatial neighborhood around each pixel in one image against the other image; and the direct way to compute such depth maps is to extract the motion vectors already present in the compressed video file;

S21. Motion vector extraction:

in temporal compression, each frame is divided into blocks and a block search is performed between adjacent frames to determine where each block has moved, so that blocks can be carried over from one frame to the next and the amount of information to be stored is reduced; MPEG4 can calculate motion vectors at a block size of 4 × 4 pixels with an accuracy of one quarter of a pixel;

S22. Mapping the motion vectors to a depth map:

the motion vector image is treated directly as a depth image; this approximation is suitable when the two images/frames are shot in parallel or are acquired within a small disparity range; when the view is magnified, the dynamic range of the depth values must be changed, and the expansion of the dynamic range must equal the scaling factor; and, for visualization, when the camera rotates around a specific object, the disparity values must be inverted so that closer objects receive higher disparity;

S3. Video synthesis:

the image is oversampled by a factor of four along the X axis to enable quarter-pixel accuracy and is then resampled on a grid controlled by the depth map; the interpolation uses the same scheme as MPEG4, in which the image is first interpolated by a factor of two with a six-tap filter and the factor of four is then reached by bilinear interpolation;

S4. Visualization:

the three-dimensional images are visualized as anaglyphs, the method best suited to viewing on standard hardware without a special display; synthesizing an anaglyph is a simple process in which the red channel of one image is replaced by the red channel of the second image of the stereo pair, and defocusing of the red channel and compression of the depth map are applied;

S5. Data processing:

a video sequence is acquired and saved as a Motion JPEG (MJPEG) sequence, and the sequence is then split into JPEG image frames so that adjacent frames can be processed as stereo pairs; to test different types of motion, the single-lens camera system is moved along the X axis and the Y axis, and rotation of the object is also tested.
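The block search of step S21 and the motion-to-depth mapping of step S22 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the encoder path the claims describe: it recomputes integer-pixel motion vectors with an exhaustive sum-of-absolute-differences search instead of reading quarter-pixel vectors from an MPEG4 stream. The function names `block_motion_vectors` and `motion_to_depth`, the search radius, and the use of the vector magnitude as the depth value are hypothetical choices made for the example.

```python
import numpy as np


def block_motion_vectors(prev, curr, block=4, search=3):
    """For each block of `curr`, find the integer (dy, dx) offset into `prev`
    that gives the smallest sum of absolute differences (exhaustive search)."""
    h, w = curr.shape
    rows, cols = h // block, w // block
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    mv = np.zeros((rows, cols, 2), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * block, c * block
            target = curr[y0:y0 + block, x0:x0 + block]
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    # Skip candidate blocks that fall outside the previous frame.
                    if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                        continue
                    sad = np.abs(target - prev[y1:y1 + block, x1:x1 + block]).sum()
                    if sad < best:
                        best = sad
                        mv[r, c] = (dy, dx)
    return mv


def motion_to_depth(mv, zoom=1.0, invert=False):
    """Use the motion-vector magnitude directly as a depth value (assumption).

    `zoom` expands the dynamic range by the scaling factor; `invert` flips
    the values, as described for rotation around an object.
    """
    depth = np.hypot(mv[..., 0], mv[..., 1]) * zoom
    if invert:
        depth = depth.max() - depth
    return depth
```

For two adjacent grayscale frames `prev` and `curr` held as 2-D arrays, `motion_to_depth(block_motion_vectors(prev, curr))` returns one depth value per 4 × 4 block. A per-pixel map would need a further upsampling step, which the sketch leaves out.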

Technical Field

The invention relates to the technical field of 3D vision formation, in particular to a method and a device for forming 3D vision by utilizing an endoscope lens dynamic image.

Background

In recent years there has been significant progress in stereoscopic display technology, including autostereoscopic displays (displays that support 3D viewing without special glasses) and multi-view autostereoscopic displays. Despite this progress in display technology, the problem of content generation remains. Acquiring stereoscopic content is still difficult, mainly because of time synchronization and the zoom and focus characteristics of the stereo arrangement; furthermore, a stereo arrangement does not support the use of multi-view displays. Many methods exist for calculating depth maps from stereo pairs (or adjacent video frames).

The main drawback of all these methods, however, is that they require very computationally intensive operations. Although these techniques can be implemented on modern high-end computers or dedicated hardware, they are not suitable for converting 2D video into 3D video on low-end hardware. Moreover, although there are software solutions for depth analysis of low-resolution images (excluding stereo synthesis) and hardware solutions for VGA resolution, the amount of computation increases dramatically with the introduction and popularity of high-definition television. Any additional information that helps determine the optical flow without increasing the computational complexity or introducing more operations is therefore desirable.

Disclosure of Invention

(I) Technical problem to be solved

In view of the deficiencies of the prior art, the present invention provides a method and a device for forming 3D vision using dynamic images from an endoscope lens. It addresses the problem that existing methods for computing depth maps require very computationally intensive operations: although these techniques can be implemented on modern high-end computers or dedicated hardware, they are not suitable for converting 2D video into 3D video on low-end hardware.

(II) Technical solution

To achieve the above purpose, the invention adopts the following technical solution: a device for forming 3D vision using dynamic images from an endoscope lens comprises a single-lens shooting system, a disparity map extraction system, a depth map calculation system, a video synthesis system, a visualization system and a data processing system.

A method for forming 3D vision by using an endoscope lens dynamic image comprises the following steps:

S1. Extracting the inter-frame disparity map;

S2. Converting the disparity map into a depth map;

S21. Motion vector extraction;

S22. Mapping the motion vectors to a depth map;

S3. Video synthesis;

S4. Visualization;

S5. Data processing.

The working principle is as follows. The required group of pictures is captured with a single lens, with a time interval between successive pictures. The captured images are input into a video encoder and a disparity map is calculated for each frame; the output of the encoder comprises standard MPEG4 format data and disparity maps in the X direction and the Y direction. The depth map is obtained from the extracted inter-frame disparity map: computing a dense depth map is essentially the process of finding the correspondence between pixel positions in one frame and pixel positions in another frame, by matching the spatial neighborhood around each pixel in one image against the other image. The image is oversampled by a factor of four along the X axis to enable quarter-pixel accuracy and is then resampled on a grid controlled by the depth map. The interpolation uses the same scheme as MPEG4: the image is first interpolated by a factor of two with a six-tap filter, and the factor of four is then reached by bilinear interpolation. The three-dimensional images are visualized as anaglyphs, the method best suited to viewing on standard hardware without a special display. Synthesizing an anaglyph is a simple process in which the red channel of one image is replaced by the red channel of the second image of the stereo pair; defocusing of the red channel and compression of the depth map are applied. A video sequence is acquired and saved as a Motion JPEG (MJPEG) sequence, and the sequence is then split into JPEG image frames so that adjacent frames can be processed as stereo pairs. To test different types of motion, the single-lens camera system is moved along the X axis and the Y axis, and rotation of the object is also tested.
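The video synthesis and visualization stages described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the patented implementation. The text does not give the filter coefficients, so the example assumes the six-tap half-sample filter of H.264/MPEG-4 AVC, (1, -5, 20, 20, -5, 1)/32. The names `upsample_x4`, `synthesize_view` and `anaglyph` are hypothetical, the per-pixel `disparity` array (in pixels) is assumed to have been derived from the depth map beforehand, and the defocusing of the red channel and the depth map compression are omitted.

```python
import numpy as np

# Assumed taps: the H.264/MPEG-4 AVC half-sample filter (1, -5, 20, 20, -5, 1) / 32.
SIX_TAP = np.array([1, -5, 20, 20, -5, 1], dtype=np.float64) / 32.0


def upsample_x4(row):
    """Oversample one image row 4x: six-tap half-sample step, then bilinear."""
    n = row.size
    padded = np.pad(row.astype(np.float64), (2, 3), mode="edge")
    # half[i] estimates the sample midway between row[i] and row[i + 1].
    half = np.convolve(padded, SIX_TAP, mode="valid")
    x2 = np.empty(2 * n)
    x2[0::2] = row
    x2[1::2] = half
    # Bilinear step: read positions 0, 0.25, 0.5, ... from the half-sample grid.
    return np.interp(np.arange(4 * n) / 4.0, np.arange(2 * n) / 2.0, x2)


def synthesize_view(image, disparity):
    """Resample a grayscale image on a grid shifted along X by `disparity`."""
    h, w = image.shape
    cols = np.arange(w)
    out = np.empty((h, w))
    for y in range(h):
        fine = upsample_x4(image[y])
        # Nearest quarter-pixel sample of the source position x + disparity.
        idx = np.clip(np.rint(4 * (cols + disparity[y])).astype(int), 0, 4 * w - 1)
        out[y] = fine[idx]
    return out


def anaglyph(first_rgb, second_rgb):
    """Replace the red channel of the first view with that of the second view."""
    out = first_rgb.copy()
    out[..., 0] = second_rgb[..., 0]
    return out
```

Applying `synthesize_view` to each color channel of a frame yields the second view of the stereo pair, and `anaglyph(frame, second_view)` then merges the two. Rounding to the nearest quarter-pixel sample keeps the resampling a simple table lookup, which fits the stated goal of running on low-end hardware.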

(III) Advantageous effects

The invention provides a method and a device for forming 3D vision by utilizing an endoscopic lens dynamic image. The method has the following beneficial effects:

1. The method and the device for forming 3D vision using dynamic images from an endoscope lens adopt three-stage processing: (1) extraction of the inter-frame disparity map; (2) conversion of the inter-frame disparity map into a depth map; and (3) generation of an artificial stereo image (two or more views) using the synthesized depth map.

2. In the method and the device for forming 3D vision using dynamic images from an endoscope lens, the images are processed in such a way that computing the depth map does not require highly computation-intensive operations, which makes the method suitable for converting 2D video into 3D video even on low-end hardware.

Drawings

FIG. 1 is a schematic overall flow diagram of the present invention;

FIG. 2 is a diagram illustrating the conversion of a disparity map into a depth map according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
