Image parallel registration method, system and device based on GPU computing platform

Document No.: 1379777    Published: 2020-08-14

Inventors: 赵美婷, 蒿杰, 吕志丰, 范秋香 (created 2020-04-23).

Abstract: The invention belongs to the technical field of image registration and relates in particular to a method, system and device for parallel image registration based on a GPU (graphics processing unit) computing platform. It aims to solve the low processing efficiency of Fourier-transform-based image registration algorithms on massive image sets in the prior art. In the method, image registration is parallelized: the massive image set is divided into tasks across multiple GPUs, sub-tasks are divided according to the image resolution and assigned to GPU thread blocks, and the data computation of the Fourier-transform-based registration algorithm is completed in parallel in kernel functions, which accelerates registration. Because every sub-step of the Fourier-transform-based registration algorithm runs in a GPU kernel function, parallel efficiency within each GPU is maximized. Using asynchronous transfers, the invention pipelines three processes (data transfer, registration, and return of results with writing to disk), which improves the efficiency of parallel registration of massive images and enables real-time processing.

1. An image parallel registration method based on a GPU computing platform, characterized in that the GPU computing platform comprises X GPUs, and the method comprises the following steps:

step S100: acquiring a template image, obtaining frequency domain data of the template image through a first registration algorithm as first data, and storing the first data in the video memory of each GPU; the first registration algorithm is a Fourier-transform-based registration algorithm;

step S200: segmenting the template image into N images of the same resolution, calculating the corresponding frequency domain data through the first registration algorithm as second data, and storing the second data in the video memory of each GPU;

step S300: acquiring a group of images to be registered, dividing the images in the group, and loading them into X memory buffers respectively;

step S400: each GPU reads the images to be registered from its corresponding memory buffer into video memory and obtains the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtaining the translation parameters of each image to be registered through a preset translation parameter calculation method, translating the image according to the translation parameters, and taking the translated image as a first image;

step S500: segmenting the first image into N second images of the same resolution and calculating the corresponding frequency domain data of each through the first registration algorithm as fourth data; based on the second data and the fourth data, obtaining the translation parameters of each second image through the preset translation parameter calculation method and translating it accordingly to obtain a registered image.

2. The image parallel registration method based on the GPU computing platform of claim 1, wherein the "segmenting the template image" in step S200 and the "segmenting the first image" in step S500 follow a preset segmentation method, namely: segmenting the image to be segmented with a sliding window of preset parameters to obtain N small images of the same resolution, where N is calculated as:

$$N = W_n \times W_m,\qquad W_n = \left\lfloor \frac{W - S_w}{D} \right\rfloor + 1,\qquad W_m = \left\lfloor \frac{H - S_h}{D} \right\rfloor + 1$$

wherein W is the width of the image to be segmented, H is its height, Sw is the width of the sliding window, Sh is its height, D is the sliding step length of the window, and Wn and Wm are the numbers of window positions along the width and the height.

3. The image parallel registration method based on the GPU computing platform of claim 1, wherein the preset translation parameter calculation method in step S400 specifically comprises the following steps:

step A100: based on the first data and the third data, obtaining time domain data of each image to be registered by an inverse Fourier transform computed with a CUDA library function;

step A200: based on the time domain data, calculating the translation parameters of each image to be registered through a kernel function.

4. The image parallel registration method based on the GPU computing platform of claim 1, wherein the GPU computing platform further comprises a CPU, and the method further comprises the following step:

step S600: each GPU transmits its registered images to CPU memory, and the images are stored on a hard disk.

5. The image parallel registration method based on the GPU computing platform of claim 1, wherein obtaining the frequency domain data of the template image through the first registration algorithm in step S100 is performed in the GPU.

6. The image parallel registration method based on the GPU computing platform of claim 5, wherein in step S200, segmenting the template image by a preset segmentation method into N images of the same resolution and calculating the frequency domain data of the N images through the first registration algorithm as the second data are performed in the GPU.

7. The image parallel registration method based on the GPU computing platform of claim 1, wherein in step S300, the images in the group of images to be registered are divided based on the number of images to be registered and the number of GPUs.

8. An image parallel registration system based on a GPU computing platform, characterized by comprising a CPU module and X identical GPU modules;

the CPU module is configured to transmit the template image to the GPU modules, divide the images in the group of images to be registered based on the number of images to be registered and the number of GPU modules, and load the divided images into X memory buffers respectively;

each GPU module is configured to acquire the template image from the CPU module, obtain frequency domain data of the template image through a first registration algorithm as first data, and store the first data in its video memory, the first registration algorithm being a Fourier-transform-based registration algorithm;

segment the template image by a preset segmentation method into N images of the same resolution, calculate the corresponding frequency domain data of each through the first registration algorithm as second data, and store the second data in its video memory;

read the images to be registered from the corresponding memory buffer into video memory, and obtain the frequency domain image of each as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtain the translation parameters of each image to be registered through a preset translation parameter calculation method, translate the image accordingly, and take the translated image as a first image;

segment the first image by the preset segmentation method into N second images of the same resolution and calculate the corresponding frequency domain data of each through the first registration algorithm as fourth data; based on the second data and the fourth data, obtain the translation parameters of each second image through the preset translation parameter calculation method, translate it to obtain a registered image, and transmit the registered image to the CPU module.

9. A storage device storing a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the image parallel registration method based on the GPU computing platform of any one of claims 1-7.

10. A processing device comprising a processor and a storage device, the processor being adapted to execute programs and the storage device being adapted to store a plurality of programs, characterized in that the programs are adapted to be loaded and executed by the processor to implement the image parallel registration method based on the GPU computing platform of any one of claims 1-7.

Technical Field

The invention belongs to the technical field of image registration, and particularly relates to a method, a system and a device for parallel image registration based on a GPU computing platform.

Background

Image registration is an important technology in image processing. It is the process of spatially aligning two or more images of the same object: a spatial transformation is found that maps one image onto the other so that points corresponding to the same spatial position in the two images are in one-to-one correspondence. Image registration is an important step in accurately obtaining image information and is widely studied and applied in remote sensing, medical imaging, computer vision, target localization and even neuroscience research.

Image registration algorithms can be classified in different ways depending on the method, including feature-based, frequency-domain-transform-based and gray-scale-based algorithms. Frequency-domain registration is currently among the most widely used, and its most common form is based on the Fourier transform. It tolerates image translation and scaling well, but its computational load is very large: registering high-resolution images is slow, which limits researchers' productivity. When massive image data is processed, efficiency drops sharply, and long registration waits have become both a practical obstacle and a research hotspot.

In recent years, graphics processing units (GPUs) have become the leading accelerators in high-performance parallel computing. An important approach to GPU parallel computing is the CUDA (Compute Unified Device Architecture) architecture, a heterogeneous CPU + GPU programming model released by NVIDIA in 2007. CUDA has made GPU programming simpler and more powerful and has widened its range of applications. Because long registration times and low efficiency on massive data limit research productivity, parallel acceleration of the algorithm is necessary, and accelerating it with GPUs has become an urgent problem in the field.

Disclosure of Invention

To solve the above problem in the prior art, namely the low processing efficiency of Fourier-transform-based image registration algorithms on massive image sets, a first aspect of the present invention provides an image parallel registration method based on a GPU computing platform, wherein the GPU computing platform comprises X GPUs, and the method comprises the following steps:

step S100: acquiring a template image, obtaining frequency domain data of the template image through a first registration algorithm as first data, and storing the first data in the video memory of each GPU; the first registration algorithm is a Fourier-transform-based registration algorithm;

step S200: segmenting the template image into N images of the same resolution, calculating the corresponding frequency domain data through the first registration algorithm as second data, and storing the second data in the video memory of each GPU;

step S300: acquiring a group of images to be registered, dividing the images in the group, and loading them into X memory buffers respectively;

step S400: each GPU reads the images to be registered from its corresponding memory buffer into video memory and obtains the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtaining the translation parameters of each image to be registered through a preset translation parameter calculation method, translating the image according to the translation parameters, and taking the translated image as a first image;

step S500: segmenting the first image into N second images of the same resolution and calculating the corresponding frequency domain data of each through the first registration algorithm as fourth data; based on the second data and the fourth data, obtaining the translation parameters of each second image through the preset translation parameter calculation method and translating it accordingly to obtain a registered image.

In some preferred embodiments, the "segmenting the template image" in step S200 and the "segmenting the first image" in step S500 follow a preset segmentation method, namely: segmenting the image to be segmented with a sliding window of preset parameters to obtain N small images of the same resolution, where N is calculated as:

$$N = W_n \times W_m,\qquad W_n = \left\lfloor \frac{W - S_w}{D} \right\rfloor + 1,\qquad W_m = \left\lfloor \frac{H - S_h}{D} \right\rfloor + 1$$

wherein W is the width of the image to be segmented, H is its height, Sw is the width of the sliding window, Sh is its height, D is the sliding step length of the window, and Wn and Wm are the numbers of window positions along the width and the height.

In some preferred technical solutions, the preset translation parameter calculation method in step S400 specifically comprises the following steps:

step A100: based on the first data and the third data, obtaining time domain data of each image to be registered by an inverse Fourier transform computed with a CUDA library function;

step A200: based on the time domain data, calculating the translation parameters of each image to be registered through a kernel function.

In some preferred technical solutions, the GPU computing platform further comprises a CPU, and the method further comprises step S600: each GPU transmits its registered images to CPU memory, and the images are stored on a hard disk.

In some preferred embodiments, obtaining the frequency domain data of the template image through the first registration algorithm in step S100 is performed in the GPU.

In some preferred technical solutions, in step S200, segmenting the template image by a preset segmentation method into N images of the same resolution and calculating the frequency domain data of the N images through the first registration algorithm as the second data are performed in the GPU.

In some preferred technical solutions, in step S300, the images in the group of images to be registered are divided based on the number of images to be registered and the number of GPUs.

A second aspect of the present invention provides an image parallel registration system based on a GPU computing platform, comprising a CPU module and X identical GPU modules;

the CPU module is configured to transmit the template image to the GPU modules, divide the images in the group of images to be registered based on the number of images to be registered and the number of GPU modules, and load the divided images into X memory buffers respectively;

each GPU module is configured to acquire the template image from the CPU module, obtain frequency domain data of the template image through a first registration algorithm as first data, and store the first data in its video memory, the first registration algorithm being a Fourier-transform-based registration algorithm;

segment the template image by a preset segmentation method into N images of the same resolution, calculate the corresponding frequency domain data of each through the first registration algorithm as second data, and store the second data in its video memory;

read the images to be registered from the corresponding memory buffer into video memory, and obtain the frequency domain image of each as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtain the translation parameters of each image to be registered through a preset translation parameter calculation method, translate the image accordingly, and take the translated image as a first image;

segment the first image by the preset segmentation method into N second images of the same resolution and calculate the corresponding frequency domain data of each through the first registration algorithm as fourth data; based on the second data and the fourth data, obtain the translation parameters of each second image through the preset translation parameter calculation method, translate it to obtain a registered image, and transmit the registered image to the CPU module.

A third aspect of the present invention provides a storage device storing a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the image parallel registration method based on the GPU computing platform according to any of the above technical solutions.

A fourth aspect of the present invention provides a processing device comprising a processor and a storage device, the processor being adapted to execute programs and the storage device being adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the image parallel registration method based on the GPU computing platform according to any of the above technical solutions.

The beneficial effects of the invention are as follows:

Using a GPU computing platform and multi-GPU parallelism, the invention can process massive image sets in real time, and because each sub-step of the Fourier-transform-based registration algorithm is completed in a GPU kernel function, parallel efficiency within each GPU is maximized.

The invention parallelizes image registration: massive image sets are divided into tasks across multiple GPUs, sub-tasks are divided according to image resolution and assigned to GPU thread blocks, and the data computation is completed in parallel in kernel functions, which accelerates registration.

The invention divides the whole processing flow into three stages and uses asynchronous transfers so that data transfer and GPU computation run concurrently. The three processes (data transfer, registration, and return of results with writing to disk) are thus pipelined, which further improves the efficiency of parallel registration of massive images and enables real-time processing.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a block flow diagram of an embodiment of a method for parallel image registration based on a GPU computing platform according to the present invention;

FIG. 2 is a thread relationship diagram of an embodiment of an image parallel registration method based on a GPU computing platform according to the present invention;

FIG. 3 is a diagram of the relationship between the memory buffers and the GPUs during template data processing in an embodiment of the image parallel registration method based on a GPU computing platform according to the present invention;

FIG. 4 is a schematic pipeline diagram of three processing stages in an embodiment of an image parallel registration method based on a GPU computing platform according to the present invention;

FIG. 5 is a diagram of the relationship between the memory buffers and the GPUs during image parallel registration in an embodiment of the image parallel registration method based on a GPU computing platform according to the present invention;

FIG. 6 is a flowchart of a global image parallel registration algorithm of an embodiment of the image parallel registration method based on a GPU computing platform of the present invention;

FIG. 7 is a flowchart of local image parallel registration in an embodiment of the image parallel registration method based on a GPU computing platform according to the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. Those skilled in the art should understand that these embodiments only explain the technical principle of the present invention and are not intended to limit its scope.

In the image parallel registration method based on the GPU computing platform, the GPU computing platform comprises X GPUs, where X is a positive integer in the preferred embodiment; note that the method is also applicable to a single GPU. To better illustrate the advantages of the invention, this specification takes X ≥ 2 (multiple GPUs) as an example. The image parallel registration method based on the GPU computing platform of the present invention comprises the following steps:

step S100: acquiring a template image, obtaining frequency domain data of the template image through a first registration algorithm as first data, and storing the first data in the video memory of each GPU; the first registration algorithm is a Fourier-transform-based registration algorithm;

step S200: segmenting the template image into N images of the same resolution, calculating the corresponding frequency domain data through the first registration algorithm as second data, and storing the second data in the video memory of each GPU;

step S300: acquiring a group of images to be registered, dividing the images in the group, and loading them into X memory buffers respectively;

step S400: each GPU reads the images to be registered from its corresponding memory buffer into video memory and obtains the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtaining the translation parameters of each image to be registered through a preset translation parameter calculation method, translating the image according to the translation parameters, and taking the translated image as a first image;

step S500: segmenting the first image into N second images of the same resolution and calculating the corresponding frequency domain data of each through the first registration algorithm as fourth data; based on the second data and the fourth data, obtaining the translation parameters of each second image through the preset translation parameter calculation method and translating it accordingly to obtain a registered image.

The registration is performed on a heterogeneous CPU + GPU computing platform, which allows high-resolution images to be registered quickly, accurately and in real time even for massive image sets.

For the purpose of more clearly illustrating the present invention, a preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

The invention provides an image parallel registration method based on a GPU computing platform. In a preferred embodiment, the programs are written in C/C++ and CUDA. The CUDA programming model makes GPU programming simpler and more powerful: it provides a general-purpose parallel interface, so the GPU can be programmed in the general-purpose languages C/C++ without a graphics API. The preferred embodiment uses a dual-GPU computing platform, i.e., X = 2. The flow of the method is shown in FIG. 1. When massive images are processed, the template image is first obtained by the GPUs and segmented according to the registration algorithm parameters, and the template data precomputed from it is stored in the video memory of each GPU. The images to be registered are then processed as follows: memory buffers are allocated according to the number of GPUs; the images are read in sequence into the different buffers; image data is transferred in sequence to each GPU's video memory for registration using the CUDA zero-copy technique; and after registration, the registered images are returned to host memory and saved to disk. Overall, transferring image data into video memory, registering it in parallel on the GPU, and returning the results to host memory and writing them to disk form a real-time processing pipeline.

Each time, a complete image to be registered is placed in GPU memory and registered in parallel until registration is finished, and the registration result is then returned to host memory. The CPU only reads the original images and stores the registered result images; all steps of the registration algorithm are completed in the GPU, which greatly improves parallel processing efficiency.

Specifically, each GPU registers whole images in parallel; tasks are divided according to the number of images to be registered and the number of GPUs and distributed to the different GPUs. Registration within each GPU has two parts, global registration and local registration: global registration is performed first, then the globally registered image is cut into small images and local registration is performed.

Global registration comes first: sub-tasks are divided according to the image resolution and assigned to GPU kernels (kernel functions), and the global geometric transformation of the image relative to the template image is calculated with the Fourier-transform-based registration algorithm.
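Step S400 ends by translating the image by the estimated parameters to obtain the first image. A minimal CPU-side sketch of that translation follows as an illustration, not the patent's kernel: `translateImage` is a hypothetical name, shifts are assumed to be whole pixels, and pixels that enter from outside the frame are assumed to be zero, since the text does not say how borders are handled. Each output pixel is computed independently, so in a CUDA version the two loops would map onto one thread per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Translates a row-major W x H image by (tx, ty) whole pixels:
// out(x, y) = in(x - tx, y - ty), and zero where the source falls outside.
// Every (x, y) iteration is independent, so it maps directly onto one GPU thread.
std::vector<float> translateImage(const std::vector<float>& in, int W, int H,
                                  int tx, int ty) {
    assert(in.size() == static_cast<std::size_t>(W) * H);
    std::vector<float> out(in.size(), 0.0f);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            const int sx = x - tx, sy = y - ty;  // source pixel for (x, y)
            if (sx >= 0 && sx < W && sy >= 0 && sy < H)
                out[static_cast<std::size_t>(y) * W + x] =
                    in[static_cast<std::size_t>(sy) * W + sx];
        }
    return out;
}
```

For a 3 × 2 image, a shift of (1, 0) moves each row one pixel to the right and leaves a zero column on the left.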

Local registration follows. Because global registration operates on the whole image, its local accuracy is low and cannot support precise study of specific regions of the image. Local registration therefore divides the image into N small images, divides the sub-tasks according to N and the resolution of the small images, assigns them to the GPU, and registers each small image against the template small image at the same position using the Fourier-transform-based registration algorithm.

In the preferred embodiment, the template image is a known, fixed image. It is transferred into the video memory of each GPU and divided into tasks that can be executed in parallel. In each GPU, the whole template image is Fourier-transformed according to the Fourier-transform-based registration algorithm, and its frequency domain data is stored in video memory as the global registration template data, i.e., the first data.

Next, the template image is segmented in each GPU according to a preset segmentation method, namely sliding-window segmentation: the image to be segmented is cut with a sliding window of preset parameters into N small images of the same resolution, where N is calculated as:

$$N = W_n \times W_m,\qquad W_n = \left\lfloor \frac{W - S_w}{D} \right\rfloor + 1,\qquad W_m = \left\lfloor \frac{H - S_h}{D} \right\rfloor + 1$$

wherein W is the width of the image to be segmented, H is its height, Sw is the width of the sliding window, Sh is its height, D is the sliding step length of the window, and Wn and Wm are the numbers of window positions along the width and the height.
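The window count can be checked with a small helper. The sketch below is a minimal C++ illustration, not the patent's code: `slidingWindowTiles` and `Tile` are hypothetical names, the window is assumed to move by the same step D horizontally and vertically and to stay inside the image, and integer (floor) division is assumed when W - Sw or H - Sh is not a multiple of D, a case the text does not discuss.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Top-left corner of one sliding-window tile (hypothetical helper type).
struct Tile {
    int x, y;
};

// Enumerates the tile origins of a sliding-window segmentation.
// Wn = (W - Sw) / D + 1 positions along the width,
// Wm = (H - Sh) / D + 1 positions along the height, and N = Wn * Wm.
std::vector<Tile> slidingWindowTiles(int W, int H, int Sw, int Sh, int D) {
    assert(Sw > 0 && Sh > 0 && Sw <= W && Sh <= H && D > 0);
    const int Wn = (W - Sw) / D + 1;  // non-negative operands, so this is floor
    const int Wm = (H - Sh) / D + 1;
    std::vector<Tile> tiles;
    tiles.reserve(static_cast<std::size_t>(Wn) * Wm);
    for (int j = 0; j < Wm; ++j)      // tiles listed in row-major order
        for (int i = 0; i < Wn; ++i)
            tiles.push_back({i * D, j * D});
    return tiles;
}
```

For example, a 1024 × 1024 image cut with a 256 × 256 window and step D = 128 gives Wn = Wm = 7, so N = 49 tiles, the last one starting at (768, 768).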

In the preferred embodiment, after the template image is segmented, tasks are divided according to the number N of small images and their resolution Sw × Sh, and the computation is completed in parallel in GPU kernels: following the Fourier-transform-based registration algorithm, each small image is Fourier-transformed and its frequency domain data is stored in video memory as the local registration template data, i.e., the second data.
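One way to organize this per-tile work is to copy the N tiles into one contiguous buffer so that a single batched transform or kernel launch can cover all of them. The sketch below is an assumption about data layout, not the patent's code: `packTiles` is a hypothetical name, tile k is placed at offset k · Sw · Sh, and the two loops stand in for a CUDA grid with one thread block per tile and one thread per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Copies N tiles of size Sw x Sh from a row-major W x H image into one
// contiguous buffer; tile k occupies [k * Sw * Sh, (k + 1) * Sw * Sh).
// origins[k] is the (x, y) top-left corner of tile k.
std::vector<float> packTiles(const std::vector<float>& img, int W, int H,
                             const std::vector<std::pair<int, int>>& origins,
                             int Sw, int Sh) {
    assert(img.size() == static_cast<std::size_t>(W) * H);
    const std::size_t tileSize = static_cast<std::size_t>(Sw) * Sh;
    std::vector<float> out(origins.size() * tileSize);
    for (std::size_t k = 0; k < origins.size(); ++k) {  // "block" k = tile k
        const int ox = origins[k].first, oy = origins[k].second;
        assert(ox >= 0 && oy >= 0 && ox + Sw <= W && oy + Sh <= H);
        for (std::size_t p = 0; p < tileSize; ++p) {    // "thread" p = pixel p
            const int lx = static_cast<int>(p % Sw);
            const int ly = static_cast<int>(p / Sw);
            out[k * tileSize + p] =
                img[static_cast<std::size_t>(oy + ly) * W + (ox + lx)];
        }
    }
    return out;
}
```

With this layout, the frequency domain data of all tiles could be produced by one batched FFT, for example a cuFFT plan created with `cufftPlanMany`, rather than by N separate transforms. The text does not state whether the original implementation does this.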

In some preferred embodiments, in step S300 the images in the group of images to be registered are divided based on the number of images to be registered and the number of GPUs, and tasks are divided according to the number of GPUs and buffers. In each task, GPUs and buffers correspond one to one: GPU1 processes the images in buffer 1, GPU2 processes the images in buffer 2, and so on, so that massive image sets are processed in parallel (see FIG. 5).

The number of memory buffers equals the number of GPUs. During task allocation, each buffer holds P images:

$$P = \frac{M}{X}$$

where M is the number of images to be registered and X is the number of GPUs. Since the invention targets parallel registration of massive image sets, when P is an integer every GPU receives the same number of tasks, which helps the registration tasks finish in parallel. When P is not an integer, the remaining tasks are assigned to GPUs at random, so the task counts of the GPUs differ only slightly and parallel registration can still be completed. The buffers are managed as circular buffers: a slot is released as soon as the image in it has been processed, which allows massive image data to be processed continuously.
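The allocation rule and the circular buffers can be sketched as follows. This is a minimal single-threaded illustration under stated assumptions: `partitionImages` and `RingBuffer` are hypothetical names; leftover images (when M is not a multiple of X) go to the first buffers rather than to randomly chosen ones, a deterministic substitute for the random assignment in the text that also keeps per-GPU counts within one of each other; and the ring buffer has no locking, whereas a reader thread and a GPU worker running concurrently would need synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits image indices 0..M-1 across X buffers (one per GPU). Each buffer gets
// M / X images; the M % X leftovers go to the first buffers (the text assigns
// them at random; a fixed choice keeps this sketch reproducible).
std::vector<std::vector<int>> partitionImages(int M, int X) {
    assert(X > 0 && M >= 0);
    std::vector<std::vector<int>> buffers(X);
    const int P = M / X, extra = M % X;
    int next = 0;
    for (int g = 0; g < X; ++g) {
        const int count = P + (g < extra ? 1 : 0);
        for (int i = 0; i < count; ++i) buffers[g].push_back(next++);
    }
    return buffers;
}

// Fixed-capacity circular buffer: a slot is freed as soon as its image is
// popped for processing, so reading can continue indefinitely. Not thread-safe.
template <typename T>
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }
    bool push(const T& item) {
        if (count_ == slots_.size()) return false;  // full: the reader must wait
        slots_[(head_ + count_) % slots_.size()] = item;
        ++count_;
        return true;
    }
    bool pop(T& item) {
        if (count_ == 0) return false;              // empty: nothing to process
        item = slots_[head_];
        head_ = (head_ + 1) % slots_.size();
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }

private:
    std::vector<T> slots_;
    std::size_t head_ = 0, count_ = 0;
};
```

For M = 7 images on X = 2 GPUs, buffer 1 receives images 0 to 3 and buffer 2 receives images 4 to 6.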

Furthermore, the invention uses asynchronous transfers so that CPU-GPU data transfer runs in parallel with GPU computation, avoiding the CPU-GPU transfer bottleneck that GPU acceleration frequently faces. The processing of each image is divided into three stages: the first stage transfers the image from CPU memory to GPU video memory, the second stage launches the kernel functions that perform the registration, and the third stage transfers the registered image data back to the host, where it is written to disk. The three processes of data transfer, registration, and transfer back with disk writing are thus executed as a parallel pipeline.
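To see why overlapping the three stages helps, the following Python sketch models the pipeline as a simple flow shop: each stage handles one image at a time, and different stages work on different images concurrently. The stage times are hypothetical inputs, not measurements from the experiments below. On real hardware such overlap is commonly obtained with CUDA streams and `cudaMemcpyAsync` on pinned host memory; that is an assumption about implementation, not something the text specifies.

```python
from typing import Sequence


def pipelined_time(stage_times: Sequence[float], n_images: int) -> float:
    """Completion time when stages overlap on different images (flow-shop model)."""
    finish = [0.0] * len(stage_times)  # finish time of the latest image in each stage
    for _ in range(n_images):
        prev_stage_done = 0.0
        for s, t in enumerate(stage_times):
            # Stage s may start this image once it has finished the previous
            # image and this image has left stage s - 1.
            start = max(finish[s], prev_stage_done)
            finish[s] = start + t
            prev_stage_done = finish[s]
    return finish[-1]


def serial_time(stage_times: Sequence[float], n_images: int) -> float:
    """Completion time when each image runs all stages before the next begins."""
    return n_images * sum(stage_times)


# Hypothetical stage times (upload, register, download + write), arbitrary units.
stages = (2.0, 5.0, 3.0)
print(serial_time(stages, 4), pipelined_time(stages, 4))  # 40.0 25.0
```

In this model the pipelined time equals the sum of the stage times plus (n - 1) times the longest stage, so for many images the throughput is limited by the slowest stage rather than by the sum of all three.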

Preferably, step S400 is global registration, in which every sub-step of the Fourier-transform-based registration algorithm is assigned to GPU kernels. Using the global image parallel registration algorithm, each sub-step is divided into tasks according to the image resolution W × H, a suitable thread block size is computed, and a kernel function is launched to perform the computation in parallel. Because the whole registration process runs in parallel on the GPU, the parallel efficiency of the registration algorithm is maximized.

The preset translation parameter calculation method specifically comprises the following steps: step A100, based on the first data and the third data, obtaining the time-domain data of each image to be registered by means of CUDA library functions and the inverse Fourier transform; step A200, based on the time-domain data, obtaining the translation parameters of each image to be registered through kernel function computation.
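Steps A100 and A200 can be sketched with NumPy as a cross-correlation computed in the frequency domain. The sketch assumes that the multiplication of the third data by the first data uses the complex conjugate of the template spectrum (the usual cross-correlation theorem), that the shift transformation is an `fftshift` placing zero lag at the array centre, and that translations are integer pixel offsets. `translation_from_spectra` is a hypothetical name.

```python
import numpy as np


def translation_from_spectra(ref_spec: np.ndarray, img_spec: np.ndarray):
    """Integer (dy, dx) by which an image is shifted relative to a reference.

    ref_spec: 2-D FFT of the template (first data).
    img_spec: 2-D FFT of the image to be registered (third data).
    """
    # A100: conjugate product and inverse FFT give the time-domain correlation surface.
    corr = np.fft.ifft2(img_spec * np.conj(ref_spec)).real
    # Shift transformation: move zero lag to the centre of the array.
    corr = np.fft.fftshift(corr)
    # A200: the peak position, measured from the centre, is the translation.
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    return int(py - h // 2), int(px - w // 2)


rng = np.random.default_rng(0)
ref = rng.standard_normal((256, 256))
moved = np.roll(ref, (12, -30), axis=(0, 1))  # known circular shift
print(translation_from_spectra(np.fft.fft2(ref), np.fft.fft2(moved)))  # (12, -30)
```

Under this convention a positive dy means the content moved down, so registering the image means translating it by (-dy, -dx).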

Fig. 6 shows the parallel processing flow of global registration within the GPU, which mainly comprises the following algorithm sub-steps:

Algorithm sub-step 1: compute the sum of the global image pixel data in parallel by launching a GPU kernel function twice;

Algorithm sub-step 2: choose suitable thread block and grid sizes according to the image resolution and launch a GPU kernel function that processes the global image pixel data in parallel together with the image mean derived from the sum in sub-step 1;

Algorithm sub-step 3: based on the result of sub-step 2, compute the FFT (fast Fourier transform) with a CUDA library function to obtain the frequency-domain data of the image to be registered, i.e., the third data;

Algorithm sub-step 4: multiply the third data obtained in sub-step 3 by the first data in parallel on the GPU;

Algorithm sub-step 5: apply the inverse Fourier transform to the result of sub-step 4 with a CUDA library function to obtain time-domain data;

Algorithm sub-step 6: divide the thread blocks and grid according to the resolution and apply a custom kernel function that performs a shift transformation on the time-domain data;

Algorithm sub-step 7: find the coordinates of the peak (maximum) value of the data by launching a GPU kernel function twice, and from them obtain the translation parameters of each image to be registered;

Algorithm sub-step 8: choose suitable thread block and grid sizes according to the image resolution and translate the original image in parallel using the translation parameters from sub-step 7, obtaining the registered image data, i.e., the first image.
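Sub-steps 1, 2, 7 and 8 rely on common GPU patterns that can be emulated on the CPU for illustration. The following sketch assumes a one-dimensional launch with 256 threads per block and shows: the usual ceiling division for the grid size; a "two-launch" reduction in which the first launch produces one partial result per block and the second launch combines them (used here for both the pixel sum and the peak search); and sub-step 8 written as a per-pixel gather with zero fill outside the image. The block size, fill value and function names are assumptions, not values from the text.

```python
import numpy as np

BLOCK = 256  # threads per block (assumed)


def grid_size(n: int, block: int = BLOCK) -> int:
    """Blocks needed to cover n elements: ceil(n / block)."""
    return (n + block - 1) // block


def two_pass_sum(data: np.ndarray, block: int = BLOCK) -> float:
    """Sub-step 1: sum of all pixels via two 'launches'."""
    flat = data.ravel()
    # Launch 1: every block reduces its own slice to one partial sum.
    partials = [flat[b * block:(b + 1) * block].sum() for b in range(grid_size(flat.size, block))]
    # Launch 2: a single block combines the partial sums.
    return float(np.sum(partials))


def two_pass_argmax(data: np.ndarray, block: int = BLOCK):
    """Sub-step 7: (row, col) of the maximum value via two 'launches'."""
    flat = data.ravel()
    winners_val, winners_idx = [], []
    # Launch 1: every block finds its local maximum and its global index.
    for b in range(grid_size(flat.size, block)):
        seg = flat[b * block:(b + 1) * block]
        i = int(np.argmax(seg))
        winners_val.append(seg[i])
        winners_idx.append(b * block + i)
    # Launch 2: one block picks the best of the per-block winners.
    k = int(np.argmax(winners_val))
    return tuple(int(v) for v in np.unravel_index(winners_idx[k], data.shape))


def translate(img: np.ndarray, dy: int, dx: int, fill: float = 0.0) -> np.ndarray:
    """Sub-step 8 as a gather: output (y, x) reads input (y - dy, x - dx)."""
    h, w = img.shape
    ys, xs = np.indices((h, w))  # one 'thread' per output pixel
    sy, sx = ys - dy, xs - dx
    inside = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.full((h, w), fill, dtype=float)
    out[inside] = img[sy[inside], sx[inside]]
    return out


img = np.random.default_rng(1).random((512, 384))
centred = img - two_pass_sum(img) / img.size  # sub-step 2: zero-mean image
peak = two_pass_argmax(img)
registered = translate(img, -3, 4)  # undo a (3, -4) shift
```

A gather (each output pixel reads one input pixel) avoids write conflicts between threads, which is why it is a natural fit for a one-thread-per-pixel kernel.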

Step S500 is local registration. In each GPU, the first image is segmented by the preset segmentation method: thread blocks are divided according to the image resolution, and a GPU kernel function is launched to segment the image in parallel, producing N small images of the same resolution in a single launch; these small images are the second images.

Fig. 7 shows the local registration parallel algorithm flow of the present invention. In each GPU, a kernel function segments the global registration result image into N small images at once, and local registration is performed following the global image parallel registration algorithm. First, the FFTs of all N small images are computed in one call using the batch mode of the CUDA cuFFT library; then the parallel local registration computation is run on the N small images in turn within the GPU to obtain the translation parameters of each local image. Subtasks are then divided according to the global image resolution, suitable thread block and grid sizes are chosen, and a kernel function is launched once to apply the local adjustments to the image in parallel, yielding the final registration result, which is transferred back asynchronously to CPU memory using the Zero-Copy method.

Following the global image parallel registration algorithm, the N second images are processed in turn on the GPU to obtain their frequency-domain data, which serves as the fourth data. Based on the second data and the fourth data, the translation parameters of each local image are obtained by the preset translation parameter calculation method. Subtasks are then divided according to the global image resolution, suitable thread block and grid sizes are selected, and a kernel function is launched to process the image in parallel, yielding the final registered image, which is transferred back to CPU memory and saved to the hard disk.
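The local stage can be sketched in the same way. The sketch assumes a rows × cols grid (pixels beyond an exact multiple of the tile size are dropped) and the conjugate-product correlation assumed for the global stage. All N correlation surfaces are computed with batched FFTs and the per-tile peaks are found in one vectorized step; the text instead processes the tiles in turn, so this vectorization is a design choice of the sketch. `local_shifts` is a hypothetical name, and in practice the second data would be computed once from the template and reused.

```python
import numpy as np


def local_shifts(template: np.ndarray, first_image: np.ndarray, rows: int, cols: int):
    """Per-tile integer (dy, dx) of first_image relative to template on a rows x cols grid."""
    def tiles(img):
        h, w = img.shape
        sh, sw = h // rows, w // cols
        # Drop any remainder pixels, then stack the N = rows * cols tiles row by row.
        cropped = img[:sh * rows, :sw * cols]
        return cropped.reshape(rows, sh, cols, sw).swapaxes(1, 2).reshape(-1, sh, sw)

    second = np.fft.fft2(tiles(template))     # second data: template tiles (precomputable)
    fourth = np.fft.fft2(tiles(first_image))  # fourth data: one batched FFT over N tiles
    corr = np.fft.ifft2(fourth * np.conj(second)).real  # N correlation surfaces
    corr = np.fft.fftshift(corr, axes=(-2, -1))         # zero lag to each tile centre
    n, sh, sw = corr.shape
    peaks = corr.reshape(n, -1).argmax(axis=1)          # peak index per tile
    py, px = np.unravel_index(peaks, (sh, sw))
    return list(zip((py - sh // 2).tolist(), (px - sw // 2).tolist()))


# Usage: shift each 32 x 32 tile of a random template by a known amount.
rng = np.random.default_rng(2)
tpl = rng.standard_normal((64, 64))
img = tpl.copy()
true_shifts = [(1, 2), (-3, 0), (0, -4), (2, 2)]
for k, (dy, dx) in enumerate(true_shifts):
    r, c = divmod(k, 2)
    block = tpl[r * 32:(r + 1) * 32, c * 32:(c + 1) * 32]
    img[r * 32:(r + 1) * 32, c * 32:(c + 1) * 32] = np.roll(block, (dy, dx), axis=(0, 1))
print(local_shifts(tpl, img, 2, 2))  # [(1, 2), (-3, 0), (0, -4), (2, 2)]
```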

Through the above steps, the image parallel registration method (algorithm) based on the GPU computing platform is realized. The invention processes massive numbers of images with multi-GPU parallelism and maximizes the parallel efficiency within each GPU by completing every sub-step of the registration algorithm in GPU kernel functions. Image registration is a time-consuming part of image processing; here it is parallelized by dividing the massive image set among multiple GPUs, dividing subtasks according to image resolution, assigning the subtasks to GPU thread blocks, and completing the computation in parallel inside kernel functions, which accelerates registration. In addition, the whole processing flow is divided into three stages, and asynchronous transfers let data transfer and GPU computation run in parallel, so that data transfer, registration, and transfer back with disk writing form a parallel pipeline. This further improves the efficiency of parallel registration of massive images and achieves real-time processing.

To verify the execution efficiency of the method, high-resolution images were used as the original data, partial images were randomly selected as template reference images, and three experiments were performed while ensuring the accuracy of the experiments. The experimental environment is detailed in Table 1.

Table 1 Experimental environment configuration

Experiment 1

In this experiment, high-resolution images of 2048 × 2048 and 2048 × 1024 were used as the original template images for registration, and the efficiency of the parallel registration algorithm was verified by comparison with the serial method. The results are shown in Table 2.

Table 2 Computation time of the serial method and of the method of the invention on high-resolution images

As Table 2 shows, in the high-resolution image experiment the parallel registration algorithm of the invention executes efficiently and greatly shortens the registration time, with a speedup of about 183 over the serial registration algorithm on the CPU. Because every step of the parallel registration algorithm is completed on the GPU, the parallel computing capability of the GPU is fully exploited and the running efficiency of the registration algorithm is effectively improved.

Experiment 2

In this experiment, 2048 × 2048 high-resolution images were used as the original data, and the stability of the parallel registration algorithm on massive numbers of images was verified by comparison with the serial method. The results are shown in Table 3.

Table 3 Computation time of the serial method and of the method of the invention on massive numbers of images

As Table 3 shows, in the massive-image experiment the parallel registration algorithm of the invention runs stably: its running time grows essentially linearly with the number of images, and its speedup over the serial method remains at about 155.

Experiment 3

In this experiment, 2048 × 2048 high-resolution images were used as the original data, and the running time of the parallel algorithm on a single GPU and on two GPUs was compared. The results are shown in Table 4.

Table 4 Computation time of the parallel registration algorithm of the invention on a single GPU and on two GPUs

As Table 4 shows, the running time of the parallel registration algorithm of the invention on a single GPU is about twice that on two GPUs. This is because, for massive numbers of images, the tasks are divided among the GPUs and the images are split evenly according to the number of GPUs, so the running time falls in proportion to the number of GPUs and the speedup grows linearly with it.
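Under a simple model, the two-fold difference follows directly from the even task division. Assume identical GPUs, a steady-state time per image (written \bar{t} below) once the pipeline is full, and negligible start-up cost; these are modelling assumptions, not measurements. With M images on X GPUs:

```latex
T_X \approx \left\lceil \frac{M}{X} \right\rceil \bar{t},
\qquad
S_X = \frac{T_1}{T_X} \approx \frac{M}{\lceil M/X \rceil} \approx X,
\qquad
\text{so}\quad \frac{T_1}{T_2} \approx 2 .
```

When M is large, the rounding in \lceil M/X \rceil is negligible, which is why the speedup grows almost exactly in step with the number of GPUs in this model.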

A second aspect of the preferred embodiments of the present invention provides an image parallel registration system based on a GPU computing platform, comprising a CPU module and X identical GPU modules, where X is a positive integer. Note that the system of the invention is also applicable to a single GPU; to illustrate the advantages of the invention more fully, this specification describes the case X ≥ 2. The CPU module is configured to transmit the template image to the GPU modules, to divide the images to be registered in the image group to be registered based on the number of images to be registered and the number of GPU modules, and to place the divided images into the X memory buffers respectively. Each GPU module is configured to: acquire the template image from the CPU module, obtain the frequency-domain data of the template image through a first registration algorithm as the first data, and store the first data in video memory, the first registration algorithm being a Fourier-transform-based registration algorithm; segment the template image by a preset segmentation method into N images of the same resolution, compute their frequency-domain data with the first registration algorithm as the second data, and store the second data in video memory; read the images to be registered from the corresponding memory buffer into video memory, and obtain the frequency-domain data of each image to be registered through a kernel function and the first registration algorithm as the third data; based on the first data and the third data, obtain the translation parameters of each image to be registered by a preset translation parameter calculation method, translate the image accordingly, and take the translated image as the first image; segment the first image by the preset segmentation method into N second images of the same resolution, and compute their frequency-domain data with the first registration algorithm as the fourth data; and, based on the second data and the fourth data, obtain the translation parameters of the second images by the preset translation parameter calculation method, translate to obtain the registered image, and transmit the registered image to the CPU module.

A third aspect of the preferred embodiments of the present invention provides a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above image parallel registration method based on the GPU computing platform.

A fourth aspect of the preferred embodiments of the present invention provides a processing apparatus comprising a processor and a storage device, the processor being adapted to execute programs and the storage device being adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above image parallel registration method based on the GPU computing platform.

The technical solutions in the embodiments of the present application provide at least the following technical effects and advantages:

Using a GPU computing platform, the invention can process massive numbers of images in real time with multi-GPU parallelism, and every sub-step of the Fourier-transform-based registration algorithm is completed in GPU kernel functions, maximizing the parallel efficiency within each GPU. Image registration is parallelized: the massive image set is divided among multiple GPUs, subtasks are divided according to image resolution and assigned to GPU thread blocks, and the computation is completed in parallel inside kernel functions, which accelerates registration. The whole processing flow is divided into three stages, and asynchronous transfers let data transfer and GPU computation run in parallel, so that data transfer, registration, and transfer back with disk writing form a parallel pipeline, further improving the efficiency of parallel registration of massive images and achieving real-time processing.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and processing apparatus described above can be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.

Those skilled in the art will appreciate that the illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that programs corresponding to the software modules and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.

The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent changes or substitutions to the relevant technical features without departing from the principle of the invention, and the technical solutions after such changes or substitutions will fall within the scope of protection of the invention.
