Image understanding method based on fine-grained feature extraction

Document No.: 1963811  Published: 2021-12-14

Reading note: This technology, "Image understanding method based on fine-grained feature extraction", was designed by 俞文心, 张志强, 丁劲皓, 凌德玉, 车璐 and 龚俊 on 2021-08-23. Abstract: The invention discloses an image understanding method based on fine-grained feature extraction, comprising the following steps: extracting features from an input image; locating image sub-regions based on the extracted features and extracting the corresponding region features for each located sub-region; generating a description for each region based on its region features; and integrating all region descriptions to generate a long text description of the input image content. The invention uses fine-grained feature extraction to achieve more accurate region localization, which in turn improves the accuracy of the region descriptions. The invention also integrates all generated region descriptions into several more accurate long text descriptions, improving the practicality of full image understanding. In addition, the invention enables machines to better understand image content and serve people better, which is important for making machines more intelligent.

1. An image understanding method based on fine-grained feature extraction, characterized by comprising the following steps:

S10, extracting features from an input image;

S20, locating image sub-regions based on the extracted features, and extracting the corresponding region features from each located sub-region; then generating a region description for each sub-region based on its region features;

S30, integrating all the region descriptions to generate a long text description of the input image content.

2. The image understanding method based on fine-grained feature extraction according to claim 1, wherein in step S10, feature extraction is performed on the input image by a convolutional neural network to obtain the features of the input image.

3. The image understanding method based on fine-grained feature extraction according to claim 2, wherein in step S20, image sub-regions are located from the features of the input image by a region proposal network.

4. The image understanding method based on fine-grained feature extraction according to claim 3, wherein in step S20, region features are extracted from each located sub-region by a convolutional neural network.

5. The image understanding method based on fine-grained feature extraction according to claim 4, wherein in step S20, a region description is generated from the region features of each located sub-region by a recurrent neural network.

6. The image understanding method based on fine-grained feature extraction according to any one of claims 1 to 5, wherein in step S30, integrating all the region descriptions to generate a long text description of the input image content comprises: generating the i-th long text description on the basis of the i-th region description, using all other region descriptions.

7. The image understanding method based on fine-grained feature extraction according to claim 6, wherein a convolutional neural network is used to integrate all the region descriptions to generate the long text description of the input image content.

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to an image understanding method based on fine-grained feature extraction.

Background

Image understanding is the generation, from an image, of natural-language sentences that describe its content, much like describing a picture aloud. In essence it is a translation from vision to language, which is relatively simple for humans but extremely challenging for machines. Humans perceive images and text directly, whereas a machine receives both as binary data with no obvious difference in form, so translating between the two kinds of information is difficult. With the development of deep learning, techniques that combine a deep convolutional neural network with a recurrent neural network have achieved success in image understanding in recent years and can generate fairly accurate text descriptions of an input image. Image understanding is now moving toward full understanding, that is, describing in text the content of each region of the input image. Full image understanding helps people understand image content better and notice details that are easy to overlook. This improves the practicality of image understanding technology and promotes the adoption of related systems and software.

The biggest disadvantage of most existing image understanding technologies is that they generate only a single text description for an input image, so they capture the image content only to a limited degree and are of limited practical use. Some current methods perform dense text generation: they first locate the salient regions of the input image and then describe the content of each located region, thereby achieving a fuller understanding of the image. However, the accuracy of these region descriptions still leaves room for improvement, and the region descriptions are all simple phrases that are not effectively integrated. Both shortcomings limit the practicality of current full image understanding techniques.

Disclosure of Invention

To solve the above problems, the invention provides an image understanding method based on fine-grained feature extraction. It uses fine-grained feature extraction to achieve more accurate region localization, which in turn improves the accuracy of the region descriptions. The invention also integrates all generated region descriptions into several more accurate long text descriptions, improving the practicality of full image understanding. In addition, the invention enables machines to better understand image content and serve people better, which is important for making machines more intelligent.

To achieve this purpose, the invention adopts the following technical scheme: an image understanding method based on fine-grained feature extraction, comprising the following steps:

S10, extracting features from an input image;

S20, locating image sub-regions based on the extracted features, and extracting the corresponding region features from each located sub-region; then generating a region description for each sub-region based on its region features;

S30, integrating all the region descriptions to generate a long text description of the input image content.

Further, in step S10, feature extraction is performed on the input image by a convolutional neural network to obtain the features of the input image.

Further, in step S20, image sub-regions are located from the features of the input image by a region proposal network.

Further, in step S20, region features are extracted from each located sub-region by a convolutional neural network.

Further, in step S20, a region description is generated from the region features of each located sub-region by a recurrent neural network.

Further, in step S30, integrating all the region descriptions to generate a long text description of the input image content comprises: generating the i-th long text description on the basis of the i-th region description, using all other region descriptions.

Further, a convolutional neural network is used to integrate all the region descriptions to generate the long text description of the input image content.

The beneficial effects of the technical scheme are as follows:

The method improves the accuracy of image understanding through fine-grained feature extraction, integrates the dense region descriptions produced by full image understanding, and generates several more accurate long text descriptions from the content of those region descriptions. The invention thereby improves the accuracy of image understanding, greatly improves the practicality of image understanding technology, and helps people understand image content better.

The method helps people understand the full content of an image more quickly and accurately, saving the time this would otherwise take. As a result, image understanding systems and software can be more widely adopted.

Drawings

FIG. 1 is a schematic flow chart of the image understanding method based on fine-grained feature extraction according to the present invention;

FIG. 2 is a schematic diagram of the image understanding method based on fine-grained feature extraction in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described with reference to the accompanying drawings.

In this embodiment, referring to FIG. 1 and FIG. 2, the present invention provides an image understanding method based on fine-grained feature extraction, comprising the following steps:

S10, extracting features from an input image;

S20, locating image sub-regions based on the extracted features, and extracting the corresponding region features from each located sub-region; then generating a region description for each sub-region based on its region features;

S30, integrating all the region descriptions to generate a long text description of the input image content.

As a preferred option of the above embodiment, in step S10, feature extraction is performed on the input image by a convolutional neural network to obtain the features of the input image.

As a preferred option of the above embodiment, in step S20, image sub-regions are located from the features of the input image by a region proposal network; region features are extracted from each located sub-region by a convolutional neural network; and a region description is generated from the region features of each located sub-region by a recurrent neural network.

The specific processes of image feature extraction, region positioning, region feature extraction and region description generation are as follows:

fea_image = CNN(I);

L = RPN(fea_image);

fea_region_i = CNN(l_i);

cap_region_i = RNN(fea_region_i).

where I represents the input image; fea_image represents the extracted image features; L represents the result of region localization; l_i denotes the i-th located region; fea_region_i and cap_region_i respectively represent the features of the i-th region and the description generated for it; CNN, RPN and RNN denote a convolutional neural network, a region proposal network and a recurrent neural network, respectively.
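The data flow of the four formulas above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: `cnn`, `rpn` and `rnn` are hypothetical toy stand-ins (pixel scaling, a quadrant split with a brightness threshold, and a template phrase), not the trained networks of the invention, and the helper names `crop`, `mean` and `describe_regions` do not come from the source.

```python
from typing import List, Tuple

Grid = List[List[float]]            # a 2-D array of numbers (image or feature map)
Box = Tuple[int, int, int, int]     # (top, left, bottom, right), end-exclusive


def mean(grid: Grid) -> float:
    """Average of all values in a non-empty grid."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


def crop(grid: Grid, box: Box) -> Grid:
    """Cut the sub-grid covered by `box` out of `grid`."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]


def cnn(grid: Grid) -> Grid:
    """Toy stand-in for CNN(.): scales 8-bit pixel values to [0, 1]."""
    return [[v / 255.0 for v in row] for row in grid]


def rpn(fea_image: Grid) -> List[Box]:
    """Toy stand-in for RPN(.): keeps quadrants brighter than the global mean.

    Assumes the feature map is at least 2x2 and has the image's resolution.
    """
    h, w = len(fea_image), len(fea_image[0])
    quadrants = [(0, 0, h // 2, w // 2), (0, w // 2, h // 2, w),
                 (h // 2, 0, h, w // 2), (h // 2, w // 2, h, w)]
    threshold = mean(fea_image)
    return [b for b in quadrants if mean(crop(fea_image, b)) > threshold]


def rnn(fea_region: Grid) -> str:
    """Toy stand-in for RNN(.): emits a template phrase instead of decoding text."""
    return f"region with mean activation {mean(fea_region):.2f}"


def describe_regions(image: Grid) -> List[Tuple[Box, str]]:
    fea_image = cnn(image)                      # fea_image = CNN(I)
    boxes = rpn(fea_image)                      # L = RPN(fea_image)
    results = []
    for box in boxes:                           # box plays the role of l_i
        fea_region = cnn(crop(image, box))      # fea_region_i = CNN(l_i)
        results.append((box, rnn(fea_region)))  # cap_region_i = RNN(fea_region_i)
    return results
```

For a 4x4 image whose top-left quadrant is fully bright and whose bottom-right quadrant is half bright, `describe_regions` returns two located regions, each paired with its own phrase. In a real system each stand-in would be replaced by the corresponding trained network while the order of the calls stays the same.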

As a preferred option of the above embodiment, in step S30, integrating all the region descriptions to generate a long text description of the input image content comprises: generating the i-th long text description on the basis of the i-th region description, using all other region descriptions. The advantage is that each long text description stays focused on one region while drawing on the content of all region descriptions, which yields a more accurate image understanding result.

A convolutional neural network is used to integrate all the region descriptions to generate the long text description of the input image content.

The specific unified integration process is as follows:

long_cap_i = RNN(cap_region_1, ..., cap_region_(i-1), cap_region_(i+1), ..., cap_region_n | cap_region_i).

That is, the i-th long text description, long_cap_i, is generated from all other region descriptions, conditioned on the i-th region description.
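The leave-one-out structure of this integration step can be sketched in Python as follows. This is a hedged illustration only: the source performs the integration with a neural network, whereas `join_integrator` below is a hypothetical stand-in that merely joins strings, and the names `integrate` and `Integrator` are assumptions introduced for the example.

```python
from typing import Callable, List, Sequence

# An integrator maps (i-th region description, all other descriptions) to long text.
Integrator = Callable[[str, Sequence[str]], str]


def join_integrator(anchor: str, others: Sequence[str]) -> str:
    """Hypothetical stand-in for the learned integration network."""
    head = anchor[:1].upper() + anchor[1:]
    if not others:
        return head + "."
    joined = "; ".join(others)
    return f"{head}, alongside {joined}."


def integrate(captions: Sequence[str],
              integrator: Integrator = join_integrator) -> List[str]:
    """Produce one long description per region description."""
    long_caps = []
    for i, anchor in enumerate(captions):
        # All region descriptions except the i-th form the context.
        others = list(captions[:i]) + list(captions[i + 1:])
        # The i-th description is the one the output is conditioned on.
        long_caps.append(integrator(anchor, others))
    return long_caps
```

Passing the integrator as a parameter keeps the leave-one-out loop independent of the model, so a trained network could be substituted for `join_integrator` without changing `integrate`.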

Specific applications may include:

First, image understanding system

A web-based image understanding system is provided in which a user uploads an image; the system then automatically generates the corresponding text from the uploaded image and presents it on the web page. This helps people quickly understand the full content of the image.

Second, image understanding software

The software comprises two functions: image understanding and image parsing.

Image understanding software built with the present invention lets a user select a local image, for which the software then automatically generates the corresponding text description. The generated text is displayed directly in the software so that the user can quickly understand the content of the image. The user can also select the image parsing function, in which case the software shows the intermediate results of generating the text description: the localization result for the salient regions in the image, the description of each located region, and the long text of a few sentences generated from the region descriptions. This lets the user see which important regions the image contains and what each contains, as well as understand the overall content of the image.
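The intermediate results shown by the image parsing function could be grouped into a single record, sketched below in Python. The class name `ParsingResult`, its fields and the `render` layout are hypothetical choices made for illustration; the source does not specify how the software stores or displays these results.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); an assumed box format


@dataclass
class ParsingResult:
    boxes: List[Box]             # localization result for the salient regions
    region_captions: List[str]   # one description per located region
    long_captions: List[str]     # long text generated from the region descriptions

    def __post_init__(self) -> None:
        # Every located region must have exactly one description.
        if len(self.boxes) != len(self.region_captions):
            raise ValueError("boxes and region_captions must have equal length")

    def render(self) -> str:
        """Format the staged results as plain text for display."""
        lines = ["Located regions:"]
        pairs = zip(self.boxes, self.region_captions)
        for i, (box, caption) in enumerate(pairs, start=1):
            lines.append(f"  {i}. {box}: {caption}")
        lines.append("Long text description:")
        lines.extend(f"  {sentence}" for sentence in self.long_captions)
        return "\n".join(lines)
```

A caller would fill one `ParsingResult` per image and pass the output of `render()` to whatever view the software uses.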

The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention. Various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.
