Full-reference image quality evaluation method based on structural clues

Document No. 1379184 · Published 2020-08-14

Reading note: this technology, "Full-reference image quality evaluation method based on structural clues" (一种基于结构线索的全参考图像质量评估方法), was created on 2020-04-16 by 赵云灏, 高炜, 侯瑞, 胡杨, 刘敏, 张劳社, 白平, 陈康, 侯晓松, 刘利军, 付旭东, 张. Abstract: The invention discloses a full-reference image quality evaluation method based on structural cues, comprising the following steps: segment the image over a grid to obtain a number of regions; connect the regions in series by random walk to form candidate salient regions; extract a depth representation with a structure-preserving convolutional neural network; compute, from the extracted depth representation, a distance metric that reflects structural cues; and finally evaluate the quality of the full-reference image with that distance metric. The method evaluates full-reference image quality with a distance metric grounded in human perception.

1. A full-reference image quality evaluation method based on structural cues, characterized by comprising the following steps:

the method comprises: segmenting the image over a grid to obtain a number of regions; connecting the regions in series by random walk to form candidate salient regions; extracting a depth representation with a structure-preserving convolutional neural network; computing, from the extracted depth representation, a distance metric that reflects structural cues; and finally evaluating the quality of the full-reference image with that distance metric.

2. The method of claim 1, wherein the process of forming the candidate salient region comprises:

performing image segmentation with a grid-based method, wherein each grid cell is represented by color and texture: a 128-D gradient histogram feature and a 9-D color moment describe each cell, and the distance between adjacent cells is calculated by equation (1);

where x_i and x_j respectively denote the 137-D low-level features of the i-th and j-th regions and σ is a normalization parameter; similar cells are then reinforced through equation (2) to form candidate salient regions;

where the two ranking-score terms respectively denote the ranking scores of the i-th and j-th cells, μ is the regularization parameter, and y_j is a sparse indicator: when y_j = 1 the j-th cell is salient, and when y_j = 0 it is not; n denotes the number of cells connected to form the candidate salient region.

3. The method of claim 1, wherein a saliency ranking algorithm based on low-level features and high-level semantic features is employed, and image-level labels are mined with a manifold learning algorithm, namely

where f_m and f_n respectively denote the fused features of the m-th and n-th candidate salient regions, w_m is the weight of the m-th candidate salient region, l_s(f_m, f_n) computes the similarity between the m-th and n-th candidate salient regions, and K is the number of candidate salient regions; the ordering of the salient regions is computed by equation (3), and the intrinsic structural information of the image is represented by this ordering.

4. The method of claim 1, wherein an adaptive spatial pooling layer supports arbitrary input image sizes: the pool size is adjusted dynamically so that the depth representations share the same dimension, after which an aggregation operator concatenates the multi-channel features into one long depth representation.

5. The method of claim 1, wherein the distance metric reflecting structural cues is:

D(x,y)=f(s(x,y),l(x,y),c(x,y)) (4)

where s(x, y) is the structural similarity, l(x, y) is the illumination factor, c(x, y) is the color factor, x is the reference image, and y is the test image; a smaller D(x, y) indicates that x and y are similar, so image y receives a higher quality score. The expression of the structural similarity s(x, y) is:

the illumination factor l (x, y) is:

where S is the magnitude of the illumination change relative to the background illumination and l(x, y) is a function of S; c(x, y) denotes the difference between the color moments of images x and y:

c(x, y) = ||c(x) - c(y)||_2    (7)

the function c(·) is:


the similarity between the reference image and the test image is calculated by the distance metric to yield the quality score of the test image.

Technical Field

The invention relates to full-reference image quality evaluation, and in particular to a full-reference image quality evaluation method based on structural cues.

Background

Image quality evaluation can be divided into subjective and objective evaluation, and subjective evaluation into absolute and relative evaluation. The essential difference between the two is that objective evaluation requires no human observers. Subjective evaluation generally reflects image quality better, but such methods all need human participation and are difficult to deploy widely in practice, so objective evaluation algorithms have greater application value.

Objective evaluation includes full-reference (FR), reduced-reference (RR) and no-reference (NR) methods. Full-reference evaluation selects an ideal image as the reference, computes the difference between the test image and the reference, and analyzes the degree of distortion to score image quality; common full-reference approaches are based on pixel statistics, information theory or structural information. Reduced-reference evaluation, also called semi-reference, compares the evaluated image against partial feature information of the ideal image: since the reference information consists of features extracted from an image, partial features must be extracted from both the image under evaluation and the ideal image, and quality is judged by comparing them. Reduced-reference methods are classified into those based on original-image characteristics, digital watermarking, and wavelet-domain statistical models; because they depend only on certain image features, the amount of data is much smaller than for the whole image. No-reference evaluation, also called blind evaluation, is widely used because an ideal image is generally difficult to obtain; it dispenses with the ideal reference entirely and is usually based on statistical properties of the image.

The goal of Image Quality Assessment (IQA) is to evaluate image quality objectively with a designed algorithm whose quality score approximates the subjective evaluation of an observer. Image quality relates to color harmony, illumination and the objects inside the image; images with various distortions generally score low. Subjective evaluation easily distinguishes images of different quality, but this remains a challenge for objective algorithms, which must design mathematical models that simulate human perception of images. Structural cues in an image are what people perceive when viewing a scene, so structural cues should play an important role in image quality assessment; however, this important attribute has not been well encoded. In addition, conventional algorithms have the following limitations:

1) Existing mathematical models do not conform well to human perception, so the results of objective evaluation algorithms remain far from subjective evaluation.

2) A well-performing distance metric is of great importance for IQA because of the inherent differences between the distorted and original images, yet conventional algorithms do not use distance metrics efficiently.

Disclosure of Invention

The present invention aims to overcome the above shortcomings of the prior art by providing a full-reference image quality assessment method based on structural cues, which assesses the quality of a full-reference image in accordance with human perception by means of a distance metric.

In order to achieve the above object, the method for evaluating the quality of a full reference image based on a structural cue of the present invention comprises the following steps:

the method comprises: segmenting the image over a grid to obtain a number of regions; connecting the regions in series by random walk to form candidate salient regions; extracting a depth representation with a structure-preserving convolutional neural network; computing, from the extracted depth representation, a distance metric that reflects structural cues; and finally evaluating the quality of the full-reference image with that distance metric.

The specific process of forming the candidate salient region is as follows:

performing image segmentation with a grid-based method, wherein each grid cell is represented by color and texture: a 128-D gradient histogram feature and a 9-D color moment describe each cell, and the distance between adjacent cells is calculated by equation (1);

where x_i and x_j respectively denote the 137-D low-level features of the i-th and j-th regions and σ is a normalization parameter; similar cells are then reinforced through equation (2) to form candidate salient regions;

where the two ranking-score terms respectively denote the ranking scores of the i-th and j-th cells, μ is the regularization parameter, and y_j is a sparse indicator: when y_j = 1 the j-th cell is salient, and when y_j = 0 it is not; n denotes the number of cells connected to form the candidate salient region.

A saliency ranking algorithm based on low-level features and high-level semantic features is defined, and image-level labels are mined with a manifold learning algorithm, namely

where f_m and f_n respectively denote the fused features of the m-th and n-th candidate salient regions, w_m is the weight of the m-th candidate salient region, l_s(f_m, f_n) computes the similarity between the m-th and n-th candidate salient regions, and K is the number of candidate salient regions; the ordering of the salient regions is computed by equation (3), and the intrinsic structural information of the image is represented by this ordering.

An adaptive spatial pooling layer supports arbitrary input image sizes; the pool size is dynamically adjusted so that the depth representations share the same dimension, and an aggregation operator then concatenates the multi-channel features into one long depth representation.

The distance metric reflecting structural cues is:

D(x,y)=f(s(x,y),l(x,y),c(x,y)) (4)

where s(x, y) is the structural similarity, l(x, y) is the illumination factor, c(x, y) is the color factor, x is the reference image, and y is the test image; a smaller D(x, y) indicates that x and y are similar, so image y receives a higher quality score. The expression of the structural similarity s(x, y) is:

the illumination factor l (x, y) is:

where S is the magnitude of the illumination change relative to the background illumination and l(x, y) is a function of S; c(x, y) denotes the difference between the color moments of images x and y:

c(x, y) = ||c(x) - c(y)||_2    (7)

the function c(·) is:


the similarity between the reference image and the test image is calculated by the distance metric to yield the quality score of the test image.

The invention has the following beneficial effects:

in operation, the method extracts depth representations with a structure-preserving convolutional neural network, so the structure of the image is well preserved; a distance metric reflecting structural cues is constructed to account for human perception; and the similarity between the reference image and the evaluated image is then computed through this metric to score image quality. Tests on three datasets show good robustness and effectiveness compared with existing algorithms.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of the structure-preserving deep network;

FIG. 3 is a graph of the results of a simulation experiment under different parameters;

FIG. 4 is a graph of the results of a comparison of different parameters in a simulation experiment;

fig. 5 is a graph showing the comparison accuracy results in the simulation experiment.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings:

Image Quality Assessment (IQA) is a key technology in intelligent systems such as computer vision and image clustering, retrieval, compression and classification. Its goal is to assess image quality objectively with a designed algorithm whose scores accord with human visual perception, i.e., approximate the subjective evaluation of an observer. IQA plays an important role in modern advertising systems and image-sharing platforms.

The invention provides a full-reference image quality evaluation method based on structural cues that evaluates the quality of a full-reference image with a distance metric, bringing the result of the objective evaluation algorithm closer to subjective evaluation. The specific operation is as follows: first, segment the image over a grid to obtain a number of regions and connect them in series by random walk to form candidate salient regions; then extract a depth representation with a structure-preserving convolutional neural network; compute from the extracted depth representation a distance metric that reflects structural cues; and finally evaluate the quality of the full-reference image with that metric.

Referring to fig. 1, the method for evaluating the quality of a full reference image based on a structural cue specifically includes the following steps:

1. Structural cues

Biological and psychological experiments show that when people view an image, gaze is allocated sequentially: attention first lands on the most salient region and then shifts to the next most salient one, and this gaze-transition path reflects the intrinsic structural cues of the image. Recognizing that human perception of an image rests on these intrinsic structural cues, the invention builds full-reference quality evaluation on this basis. To capture salient regions, a grid-based target detection algorithm is proposed: the image is segmented with a set of segmentation parameters m × n, under which various regions can be captured, and the regions are then concatenated by random walk to form candidate salient regions.
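As a concrete illustration of the m × n grid split described above (function and parameter names are ours, not the patent's), the cell bounds can be computed as:

```python
# Sketch of the m x n grid segmentation step; names are illustrative,
# not taken from the patent.
def grid_cells(height, width, m, n):
    """Return (top, bottom, left, right) pixel bounds for an m x n grid."""
    cells = []
    for i in range(m):
        for j in range(n):
            cells.append((i * height // m, (i + 1) * height // m,
                          j * width // n, (j + 1) * width // n))
    return cells

cells = grid_cells(100, 60, 4, 3)
print(len(cells), cells[-1])   # 12 cells; last covers rows 75-100, cols 40-60
```

Integer division keeps the cells contiguous and covering the whole image even when m or n does not divide the image size evenly.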

Specifically, image segmentation is performed with a grid-based method; each cell is represented by color and texture, which complement each other in characterizing the image. Each cell is described by a 128-D gradient histogram feature and a 9-D color moment, and, since similar cells share similar low-level features, the distance between adjacent cells is calculated by equation (1);

where x_i and x_j respectively denote the 137-D low-level features of the i-th and j-th regions and σ is a normalization parameter. Cells belonging to the same salient region have similar features, while cells belonging to different salient regions do not, so similar cells are reinforced through equation (2) to form candidate salient regions;

where the two ranking-score terms respectively denote the ranking scores of the i-th and j-th cells, μ is the regularization parameter, and y_j is a sparse indicator: when y_j = 1 the j-th cell is salient, and when y_j = 0 it is not; n denotes the number of cells connected to form the candidate salient region.
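Equations (1) and (2) are not reproduced in the source. A standard graph-based manifold-ranking formulation consistent with the surrounding description (Gaussian affinity over the 137-D features x_i; ranking scores, written here as r_i, regularized toward the sparse indicator y_j with weight μ) would read:

```latex
% Hypothetical reconstruction -- the source omits (1)-(2); this is the
% standard manifold-ranking form matching the symbols in the text.
w_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{\sigma^2}\right) \tag{1}

r^{*} = \arg\min_{r}\;\frac{1}{2}\sum_{i,j=1}^{n} w_{ij}\,(r_i - r_j)^2
        \;+\;\mu\sum_{j=1}^{n}\,(r_j - y_j)^2 \tag{2}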

Structural information plays an important role in image understanding, and the invention aims to establish structural cues for image quality evaluation. Considering that the human visual system adaptively extracts structural information among salient regions, the invention simulates this mechanism to construct a structural-relation framework, as follows: a saliency ranking algorithm is defined over low-level features and high-level semantic features, and image-level labels are mined with a manifold learning algorithm. This operation can be expressed as:

where f_m and f_n respectively denote the fused features of the m-th and n-th candidate salient regions, w_m is the weight of the m-th candidate salient region, l_s(f_m, f_n) computes the similarity between the m-th and n-th candidate salient regions, and K is the number of candidate salient regions; the ordering of the salient regions is computed by equation (3), and the intrinsic structural information of the image is represented by this ordering.
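Equation (3) is likewise not reproduced in the source; one weighted-similarity form consistent with the symbols defined above (w_m, l_s, K) might be:

```latex
% Hypothetical form of (3): the saliency-order score of region m as a
% weighted sum of its similarities to all K candidate regions.
r_m = w_m \sum_{n=1}^{K} l_s(f_m, f_n) \tag{3}
```

Sorting the regions by r_m would then yield the saliency ordering that encodes the image's intrinsic structure.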

2. Structure-preserving deep network

After the salient regions are obtained, depth representations are extracted with a convolutional neural network. To retain the structural information of the image, a structure-preserving deep network is designed, as shown in fig. 2, built on the classical CNN architecture AlexNet proposed by Krizhevsky et al. Noting that salient regions may have different shapes and sizes, an adaptive spatial pooling (ASP) layer is designed to support arbitrary input image sizes: the pool size is adjusted dynamically so the depth representations share the same dimension, and an aggregation operator then concatenates the multi-channel features into one long depth representation.
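A minimal sketch of the adaptive spatial pooling idea (the bin arithmetic and the use of max-pooling are our assumptions; the patent states only that the pool size adapts so that outputs share one dimension):

```python
# Adaptive max pooling: any H x W map -> fixed k*k vector (sketch).
def adaptive_max_pool(feature_map, k):
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for i in range(k):
        for j in range(k):
            top, bottom = i * h // k, max((i + 1) * h // k, i * h // k + 1)
            left, right = j * w // k, max((j + 1) * w // k, j * w // k + 1)
            out.append(max(feature_map[y][x]
                           for y in range(top, bottom)
                           for x in range(left, right)))
    return out

small = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4 map
large = [[r + c for c in range(7)] for r in range(5)]       # 5 x 7 map
print(adaptive_max_pool(small, 2))   # [5, 7, 13, 15]
print(len(adaptive_max_pool(large, 2)) == len(adaptive_max_pool(small, 2)))
```

Both input sizes yield a k × k = 4-dimensional output, which is the property the ASP layer needs so that regions of any shape produce comparable depth representations.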

The aggregation operator is as follows. Let the feature of each salient region be l_i ∈ R^t (i ∈ [1, K]). Directly concatenating the features of every salient region would cause the representation size to explode, so an aggregation algorithm is proposed: multi-channel features are summarized by statistics F = {minimum, maximum, mean, …}; the minimum feature values are extracted from each channel and concatenated, then the maxima are extracted and concatenated, and so on, yielding one long depth representation. For example, suppose an image contains 5 salient objects and each feature dimension l_i is 256: direct concatenation gives a 5 × 256 = 1280-dimensional depth representation, whereas the aggregation algorithm described here yields only 5 × 4 = 20 dimensions.
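The aggregation can be sketched as follows (the fourth statistic is our assumption: the source lists minimum, maximum and mean explicitly, and 5 × 4 = 20 implies a fourth; variance is used here for illustration):

```python
# Aggregation operator sketch: collapse each region's t-dim feature to a
# few statistics, then concatenate across regions (4th statistic assumed).
def aggregate(region_features):
    rep = []
    for feats in region_features:
        mean = sum(feats) / len(feats)
        var = sum((v - mean) ** 2 for v in feats) / len(feats)  # assumed 4th stat
        rep.extend([min(feats), max(feats), mean, var])
    return rep

regions = [[float(i % 7) for i in range(256)] for _ in range(5)]  # 5 regions, 256-D
direct = [v for feats in regions for v in feats]   # naive concatenation
print(len(direct), len(aggregate(regions)))        # 1280 20
```

This reproduces the dimensionality argument in the text: 1280 dimensions under direct concatenation versus 20 under aggregation.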

3. Distance measurement

Full-reference image quality evaluation uses the original image as the reference, and a well-performing distance metric reflects the difference between images. The distance metric reflecting structural cues is set as:

D(x,y)=f(s(x,y),l(x,y),c(x,y)) (4)

where s(x, y) is the structural similarity, l(x, y) the illumination factor and c(x, y) the color factor; formula (4) computes the similarity between images x and y, with x the reference image and y the test image. A smaller D(x, y) indicates that x and y are similar, so image y receives a higher quality score; the structural similarity s(x, y) is defined on the obtained depth representations.
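The patent does not reproduce the combination rule f in equation (4). As an illustrative assumption, a weighted sum in which higher structural similarity lowers the distance can be sketched as:

```python
# Illustrative combination for D(x, y) = f(s, l, c): the form of f and the
# weights alpha/beta/gamma are assumptions, not taken from the patent.
def distance_metric(s, l, c, alpha=1.0, beta=1.0, gamma=1.0):
    """Smaller D means the test image y is closer to the reference x."""
    return alpha * (1.0 - s) + beta * l + gamma * c

print(distance_metric(1.0, 0.0, 0.0))   # identical images -> 0.0
print(distance_metric(0.8, 0.1, 0.2) < distance_metric(0.5, 0.3, 0.4))  # True
```

Any monotone combination with these signs preserves the property stated in the text: D shrinks as structural similarity grows and as the illumination and color differences shrink.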

The illumination factor l(x, y) is:

where S is the magnitude of the illumination change relative to the background illumination and l(x, y) is a function of S; c(x, y) denotes the difference between the color moments of images x and y:

c(x, y) = ||c(x) - c(y)||_2    (7)

where the function c(·) is:

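The explicit definition of c(·) in equation (8) is not reproduced in the source; a common 9-D choice, consistent with the 9-D color moments used for the grid cells earlier (mean, standard deviation and skewness per channel), is sketched below:

```python
import math

# Illustrative 9-D color moments (3 channels x 3 moments) and the L2
# distance of equation (7); the exact c(.) of equation (8) is assumed.
def color_moments(channels):
    """channels: list of 3 lists of pixel intensities -> 9-D moment vector."""
    m = []
    for ch in channels:
        n = len(ch)
        mean = sum(ch) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in ch) / n)
        third = sum((v - mean) ** 3 for v in ch) / n
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)  # signed cube root
        m.extend([mean, std, skew])
    return m

def color_distance(x_channels, y_channels):
    """c(x, y) = ||c(x) - c(y)||_2, as in equation (7)."""
    cx, cy = color_moments(x_channels), color_moments(y_channels)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

img = [[10.0, 20.0, 30.0], [5.0, 5.0, 5.0], [0.0, 100.0, 200.0]]
print(len(color_moments(img)))     # 9
print(color_distance(img, img))    # 0.0
```

Identical images give a color distance of zero, matching the behavior equation (7) requires of the color factor.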

the distance metric calculates the similarity between the reference image and the test image, thereby evaluating the quality score between the test images, and in summary, the present invention can be summarized as shown in table 1;

TABLE 1

Simulation experiment

Many existing IQA algorithms perform impressively when evaluating distorted images generated from the same original image, but their effectiveness may degrade on different distortion types or on distorted images generated from different originals. Cross-image and cross-distortion estimation are therefore important indicators of IQA performance.

In the experiments, the method is compared with other state-of-the-art algorithms in cross-image and cross-distortion settings. Cross-image evaluation generates distorted images from various original images, one of which is selected as the reference; cross-distortion evaluation generates various types of distorted images to test IQA performance. To demonstrate the effectiveness of objective evaluation, an evaluation index measuring the relationship between objective and subjective evaluation is proposed, as shown in equation (9);

where S_o is the image quality score obtained by objective evaluation and S_s is the score obtained by subjective evaluation. In the experiment, 30 volunteers participated, and the method was compared with 9 IQA algorithms on the LIVE, TID2008 and CSIQ datasets; tables 2, 3 and 4 give the comparison results for the different algorithms and datasets and show that the method achieves competitive performance against the most advanced algorithms. Because the invention preserves the structural information of the image, which contributes significantly to image understanding, quality evaluation based on structural cues outperforms the other competitors to a large extent.

TABLE 2

TABLE 3

TABLE 4

The implementation has three key parameters: the number of salient regions K, the feature dimension l of a salient region, and the image segmentation parameter m × n. Figs. 3 and 4 show comparison results under different parameters. The segmentation parameter m × n influences region selection: a larger m × n captures more salient regions and more detailed objects but increases computational complexity, while a too-small m × n fails to capture salient regions or extracts only a few. Choosing suitable parameters is therefore important for image segmentation; fig. 5 shows the comparison accuracy under different parameters.

It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.
