Instance segmentation method, instance segmentation apparatus, electronic device, and computer-readable storage medium

Publication No. 1964300 · Published 2021-12-14

Abstract (designed by 马宇宸 and 黎泽明, filed 2021-08-05): The present application provides an instance segmentation method, an instance segmentation apparatus, an electronic device and a computer-readable storage medium. The method comprises: performing feature extraction on a target image to obtain basic features of at least two different levels; for each basic feature, performing N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, where the value of N is determined according to the level number corresponding to the basic feature and N is greater than or equal to 1; and determining an instance segmentation result of the target image based on the instance segmentation data corresponding to each basic feature. In this way, multi-stage instance segmentation prediction can be performed, with each stage fusing the input of the previous stage with shallow features to continuously strengthen the prediction, thereby improving the accuracy of instance segmentation.

1. An image processing method, comprising:

performing feature extraction on a target image to obtain basic features of at least two different levels;

for each of the basic features, performing N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, wherein the value of N is determined according to the level number corresponding to the basic feature, and N is greater than or equal to 1; and

determining an instance segmentation result of the target image based on the instance segmentation data corresponding to each of the basic features.

2. The method according to claim 1, wherein, before the performing, for each of the basic features, N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, the method further comprises:

determining semantic segmentation features corresponding to the target image based on the basic features;

and, correspondingly, the performing, for each of the basic features, N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data comprises:

for each of the basic features, performing N stages of instance segmentation on the target image based on the basic feature and the semantic segmentation features to obtain the N pieces of instance segmentation data.

3. The method according to claim 2, wherein the performing, for each of the basic features, N stages of instance segmentation on the target image based on the basic feature and the semantic segmentation features comprises:

for the basic feature of the ith level, performing instance segmentation on the target image by using N cascaded instance segmentation modules based on the basic feature and the semantic segmentation features;

wherein input data of the first instance segmentation module are the basic feature and the semantic segmentation features, and output data of the first instance segmentation module are a first instance segmentation result and an instance segmentation feature for the basic feature; input data of the kth instance segmentation module are determined based on the instance segmentation feature output by the (k-1)th instance segmentation module and the basic feature of the (i-(k-1))th level corresponding to the basic feature; and 2 ≤ k ≤ N, the value of N being i-1.

4. The method according to claim 3, wherein the instance segmentation feature output by the first instance segmentation module is determined by:

performing instance segmentation based on the basic feature and the semantic segmentation features by using the first instance segmentation module to obtain corresponding instance segmentation scores, wherein each instance segmentation score indicates the probability that the corresponding pixel point is a boundary;

and masking the instance segmentation scores to obtain the instance segmentation feature output by the first instance segmentation module.

5. The method according to claim 3, wherein the instance segmentation feature output by the kth instance segmentation module is determined by:

acquiring the instance segmentation feature output by the (k-1)th instance segmentation module;

determining an instance segmentation score of the kth instance segmentation module based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module;

and masking the instance segmentation score to obtain the instance segmentation feature output by the kth instance segmentation module.

6. The method according to claim 5, wherein the determining the instance segmentation score of the kth instance segmentation module based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module comprises:

fusing, by addition, the instance segmentation feature corresponding to the (k-1)th stage and the basic feature of the (i-(k-1))th level to obtain a first fusion feature of the (k-1)th stage;

fusing, by multiplication, the first fusion feature of the (k-1)th stage and the instance segmentation feature corresponding to the (k-1)th stage to obtain a second fusion feature of the (k-1)th stage;

and performing the instance segmentation of the kth stage based on the second fusion feature of the (k-1)th stage to obtain the instance segmentation score corresponding to the kth stage.

7. The method according to any one of claims 1 to 5, wherein the determining an instance segmentation result of the target image based on the instance segmentation data corresponding to each of the basic features comprises:

for each of the basic features, fusing the instance segmentation scores corresponding to the basic feature to obtain an instance segmentation composite score corresponding to the basic feature;

and determining the instance segmentation result of the target image according to the instance segmentation composite score.

8. An image processing apparatus, comprising:

an extraction module configured to perform feature extraction on a target image to obtain basic features of at least two different levels;

a segmentation module configured to perform, for each of the basic features, N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, wherein the value of N is determined according to the level number corresponding to the basic feature, and N is greater than or equal to 1;

and a determining module configured to determine an instance segmentation result of the target image based on the instance segmentation data corresponding to each of the basic features.

9. An electronic device, comprising: a processing device and a storage device;

the storage device having stored thereon a computer program which, when executed by the processing device, performs the method of any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular to an instance segmentation method, an instance segmentation apparatus, an electronic device, and a computer-readable storage medium.

Background

The instance segmentation task is to find all objects in a given image, determine the class, size and position of each object, and give an accurate pixel-level segmentation of each object; it is one of the basic techniques of computer-vision-related applications. A machine automatically frames the different instances in an image using an object detection method, and then labels pixels one by one within each instance region using a semantic segmentation method.

Current work mainly pursues a trade-off between accuracy and speed. However, accuracy and speed are usually in tension, and how to strike a better balance between the two is an urgent technical problem for those skilled in the art.

Disclosure of Invention

An object of the embodiments of the present application is to provide an instance segmentation method, an instance segmentation apparatus, an electronic device and a computer-readable storage medium, so as to strike a better balance between accuracy and speed.

In a first aspect, the present application provides an image processing method, comprising: performing feature extraction on a target image to obtain basic features of at least two different levels; for each basic feature, performing N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, where the value of N is determined according to the level number corresponding to the basic feature and N is greater than or equal to 1; and determining an instance segmentation result of the target image based on the instance segmentation data corresponding to each basic feature.

In some optional implementations, before the N stages of instance segmentation are performed on the target image for each basic feature, the method further includes: determining semantic segmentation features corresponding to the target image based on the basic features. Correspondingly, performing, for each basic feature, N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data includes: for each basic feature, performing N stages of instance segmentation on the target image based on the basic feature and the semantic segmentation features to obtain the N pieces of instance segmentation data.

In some optional implementations, performing, for each basic feature, N stages of instance segmentation on the target image based on the basic feature and the semantic segmentation features includes: for the basic feature of the ith level, performing instance segmentation on the target image by using N cascaded instance segmentation modules based on the basic feature and the semantic segmentation features. The input data of the first instance segmentation module are the basic feature and the semantic segmentation features, and its output data are a first instance segmentation result and an instance segmentation feature for the basic feature; the input data of the kth instance segmentation module are determined based on the instance segmentation feature output by the (k-1)th instance segmentation module and the basic feature of the (i-(k-1))th level corresponding to the basic feature, where 2 ≤ k ≤ N and the value of N is i-1.

In some optional implementations, the instance segmentation feature output by the first instance segmentation module is determined as follows: the first instance segmentation module performs instance segmentation based on the basic feature and the semantic segmentation features to obtain corresponding instance segmentation scores, where each instance segmentation score indicates the probability that the corresponding pixel point is a boundary; the instance segmentation scores are then masked to obtain the instance segmentation feature output by the first instance segmentation module.

In some optional implementations, the instance segmentation feature output by the kth instance segmentation module is determined as follows: the instance segmentation feature output by the (k-1)th instance segmentation module is acquired; the instance segmentation score of the kth instance segmentation module is determined based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module; and the instance segmentation score is masked to obtain the instance segmentation feature output by the kth instance segmentation module.

In some optional implementations, determining the instance segmentation score of the kth instance segmentation module based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module includes: fusing, by addition, the instance segmentation feature corresponding to the (k-1)th stage and the basic feature of the (i-(k-1))th level to obtain a first fusion feature of the (k-1)th stage; fusing, by multiplication, the first fusion feature of the (k-1)th stage and the instance segmentation feature corresponding to the (k-1)th stage to obtain a second fusion feature of the (k-1)th stage; and performing the instance segmentation of the kth stage based on the second fusion feature of the (k-1)th stage to obtain the instance segmentation score corresponding to the kth stage.

In some optional implementations, determining the instance segmentation result of the target image based on the instance segmentation data corresponding to each basic feature includes: for each basic feature, fusing the instance segmentation scores corresponding to the basic feature to obtain an instance segmentation composite score corresponding to the basic feature; and determining the instance segmentation result of the target image according to the instance segmentation composite score.

In a second aspect, the present application provides an image processing apparatus. The apparatus comprises: an extraction module configured to perform feature extraction on a target image to obtain basic features of at least two different levels; a segmentation module configured to perform, for each basic feature, N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, where the value of N is determined according to the level number corresponding to the basic feature and N is greater than or equal to 1; and a determining module configured to determine an instance segmentation result of the target image based on the instance segmentation data corresponding to each basic feature.

In some optional implementations, the apparatus further includes a semantic segmentation module configured to determine, based on each basic feature, the semantic segmentation features corresponding to the target image. Correspondingly, the segmentation module is specifically configured to: for each basic feature, perform N stages of instance segmentation on the target image based on the basic feature and the semantic segmentation features to obtain the N pieces of instance segmentation data.

In some optional implementations, the segmentation module is specifically configured to: for the basic feature of the ith level, perform instance segmentation on the target image by using N cascaded instance segmentation modules based on the basic feature and the semantic segmentation features. The input data of the first instance segmentation module are the basic feature and the semantic segmentation features, and its output data are a first instance segmentation result and an instance segmentation feature for the basic feature; the input data of the kth instance segmentation module are determined based on the instance segmentation feature output by the (k-1)th instance segmentation module and the basic feature of the (i-(k-1))th level corresponding to the basic feature, where 2 ≤ k ≤ N and the value of N is i-1.

In some optional implementations, the segmentation module is specifically configured to: perform instance segmentation based on the basic feature and the semantic segmentation features by using the first instance segmentation module to obtain corresponding instance segmentation scores, where each instance segmentation score indicates the probability that the corresponding pixel point is a boundary; and mask the instance segmentation scores to obtain the instance segmentation feature output by the first instance segmentation module.

In some optional implementations, the segmentation module is specifically configured to determine the instance segmentation feature output by the kth instance segmentation module by: acquiring the instance segmentation feature output by the (k-1)th instance segmentation module; determining the instance segmentation score of the kth instance segmentation module based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module; and masking the instance segmentation score to obtain the instance segmentation feature output by the kth instance segmentation module.

In some optional implementations, the segmentation module is specifically configured to: fuse, by addition, the instance segmentation feature corresponding to the (k-1)th stage and the basic feature of the (i-(k-1))th level to obtain a first fusion feature of the (k-1)th stage; fuse, by multiplication, the first fusion feature of the (k-1)th stage and the instance segmentation feature corresponding to the (k-1)th stage to obtain a second fusion feature of the (k-1)th stage; and perform the instance segmentation of the kth stage based on the second fusion feature of the (k-1)th stage to obtain the instance segmentation score corresponding to the kth stage.

In some optional implementations, the determining module is specifically configured to: for each basic feature, fuse the instance segmentation scores corresponding to the basic feature to obtain an instance segmentation composite score corresponding to the basic feature; and determine the instance segmentation result of the target image according to the instance segmentation composite score.

In a third aspect, the present application provides an electronic device, comprising: a processing device and a storage device;

the storage device has stored thereon a computer program which, when run by the processing device, performs the method of any one of the preceding implementations.

In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program performs the method of any one of the preceding implementations.

The embodiments of the present application provide an instance segmentation method, an instance segmentation apparatus, an electronic device and a computer-readable storage medium. Feature extraction is performed on a target image to obtain basic features of at least two different levels; for each basic feature, N stages of instance segmentation are performed on the target image based on the basic feature to obtain N pieces of instance segmentation data, where the value of N is determined according to the level number corresponding to the basic feature and N is greater than or equal to 1; and an instance segmentation result of the target image is determined based on the instance segmentation data corresponding to each basic feature. In this way, multi-stage instance segmentation prediction can be performed: each stage fuses the input of the previous stage with shallow features to continuously strengthen the prediction, the final result is obtained by fusing the per-stage segmentation results, and the number of prediction stages is kept under control through this more effective fusion, so that accuracy is improved while speed is preserved.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is an exemplary electronic device for implementing an image processing method of an embodiment of the present application;

FIG. 2 is a flow chart of an image processing method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an image processing method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an image processing method according to an embodiment of the present application;

fig. 5 is a schematic diagram of an image processing apparatus according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application are described below with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, embodiments of the present application.

In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. It is a comprehensive discipline involving many technical categories such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning and neural networks. Computer vision, as an important branch of artificial intelligence, aims specifically at letting machines recognize the world; computer vision technologies generally include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of artificial intelligence technology, it has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile-phone imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, smart healthcare, face-based payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, mobile internet, live webcasting, beautification, medical cosmetology, and intelligent temperature measurement.

First, an example electronic device 100 for implementing the image processing method of the embodiment of the present application is described with reference to fig. 1.

As shown in FIG. 1, the electronic device 100 includes one or more processing devices 102, one or more storage devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other forms of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, not limiting; the electronic device may have other components and structures as desired.

The processing device 102 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.

The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processing device 102 may execute them to implement the functions of the embodiments of the present application described below and/or other desired functionality. Various applications and data, such as data used and/or generated by the applications, may also be stored on the computer-readable storage medium.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.

The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.

For example, the electronic device for implementing the image processing method according to the embodiments of the present application may be implemented on a mobile terminal such as a smartphone or a tablet computer.

According to the embodiments of the present application, cascaded instance segmentation modules are used, and the capability of the instance feature extractor is continuously strengthened by fusing the input of the previous stage with shallow features, thereby improving the accuracy of instance segmentation. The image processing method is described in detail below with reference to specific embodiments.

According to an embodiment of the present application, an embodiment of an image processing method is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.

Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 2, the method includes the following steps:

s210, extracting the features of the target image to obtain at least two different levels of basic features.

Feature extraction on the target image may be performed in various ways. For example, a feature-pyramid extraction module composed of a residual network (ResNet) and a Feature Pyramid Network (FPN) may be used; the basic features of the at least two different levels may then form a feature pyramid, with the basic features of different levels extracted by feature-extraction networks for the different levels.

Taking a feature pyramid as an example, the feature maps of different sizes in the pyramid are used to detect objects of different sizes. For example, the feature pyramid may be divided into 5 levels, namely P3, P4, P5, P6 and P7, where P3 is the shallow FPN feature map and P7 is the high-level FPN feature map.

The number of levels in the feature pyramid is only exemplary and does not limit the embodiments of the present application.
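As an illustration of this extraction step, below is a minimal top-down FPN sketch in PyTorch. It is a sketch under assumptions: the channel widths, the nearest-neighbor upsampling and the dummy backbone maps are illustrative choices, not details fixed by the present application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Minimal top-down FPN: 1x1 lateral convs + upsample-and-add + 3x3 smoothing."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats: backbone maps, shallow to deep (e.g. C2..C5)
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down pathway
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [sm(l) for sm, l in zip(self.smooth, laterals)]  # pyramid levels

# Dummy backbone maps at strides 4/8/16/32 for a 256x256 input image.
feats = [torch.randn(1, c, 256 // s, 256 // s)
         for c, s in zip((256, 512, 1024, 2048), (4, 8, 16, 32))]
pyramid = TinyFPN()(feats)
print([tuple(p.shape) for p in pyramid])  # four levels, 256 channels each
```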

S220, for each basic feature, performing N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data.

The value of N is determined according to the level number corresponding to the basic feature, and N is greater than or equal to 1.

In some embodiments, each basic feature may be determined in turn as the current basic feature, and the N stages of instance segmentation performed for the current basic feature.

For example, in one embodiment, the basic features of at least two different levels extracted in S210 are basic features of five different levels, denoted P3, P4, P5, P6 and P7. The basic features may be ordered from low to high level, with the sequence number indicating the level at which a basic feature resides: P3 is the basic feature of level 1, P4 of level 2, P5 of level 3, P6 of level 4, and P7 of level 5. P3 through P7 may be taken as the current basic feature in turn, with N being 5 when P7 is the current basic feature, 4 for P6, 3 for P5, 2 for P4, and 1 for P3.
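A trivial sketch of this level-to-N mapping (that N equals the level index i is read off the example above and is an assumption, not a general rule of the method):

```python
# Level-to-N mapping from the worked example above: P3..P7 sit at levels 1..5,
# and N (the number of cascaded stages) equals the level index i.
levels = {"P3": 1, "P4": 2, "P5": 3, "P6": 4, "P7": 5}
for name, i in levels.items():
    print(f"{name}: level i = {i}, stages N = {i}")
```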

For each current basic feature, the following steps may be performed to achieve multi-stage instance segmentation. For the stage-1 instance segmentation, instance segmentation is performed based on the current basic feature to obtain the instance segmentation data corresponding to stage 1. For the stage-k instance segmentation, instance segmentation is performed based on the instance segmentation data corresponding to stage k-1 and the basic feature of the (i-(k-1))th level to obtain the instance segmentation data corresponding to stage k.

In some embodiments, stage-1 instance segmentation is performed based on the current basic feature to obtain the instance segmentation score corresponding to stage 1, and the instance segmentation feature corresponding to stage 1 is determined from that score. For stage k, the instance segmentation feature corresponding to stage k-1 is determined from the score corresponding to stage k-1; instance segmentation of stage k is performed based on the instance segmentation feature corresponding to stage k-1 and the basic feature of the (i-(k-1))th level to obtain the instance segmentation score corresponding to stage k; and the instance segmentation feature corresponding to stage k is determined from that score.

Here, the current basic feature is the basic feature of the ith level, the basic feature of the (i-(k-1))th level is the lower-level basic feature of the ith-level basic feature used by the stage-k instance segmentation, 2 ≤ k ≤ N, and the value of N is i-1.
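The recursion above can be sketched as follows. The stage functions here are hypothetical placeholders for real stage networks, and all features are given one RoI-sized spatial shape so the demo runs; neither choice is mandated by the source.

```python
import torch

def run_cascade(base_feats, i, stage_fns):
    """Run the N-stage recursion for the level-i base feature.
    base_feats[l-1] is the basic feature of level l (1-based levels)."""
    score, seg_feat = stage_fns[0](base_feats[i - 1])  # stage 1: current feature
    scores = [score]
    for k in range(2, len(stage_fns) + 1):             # stages k = 2..N
        shallow = base_feats[i - (k - 1) - 1]          # level i-(k-1) base feature
        score, seg_feat = stage_fns[k - 1](seg_feat, shallow)
        scores.append(score)
    return scores

# Placeholder stage functions: each returns (score map, segmentation feature).
stage1 = lambda base: (torch.sigmoid(base.mean(1, keepdim=True)), base)
stagek = lambda feat, shallow: (torch.sigmoid((feat + shallow).mean(1, keepdim=True)),
                                feat + shallow)

base_feats = [torch.randn(1, 8, 32, 32) for _ in range(5)]        # levels 1..5
scores = run_cascade(base_feats, i=5, stage_fns=[stage1] + [stagek] * 4)
print(len(scores))  # 5 scores, one per stage (N = 5 per the example above)
```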

For example, the first instance segmentation module may perform instance segmentation based on the basic feature to obtain a corresponding instance segmentation score, where the score indicates the probability that the corresponding pixel point is a boundary; the score is then masked to obtain the instance segmentation feature output by the first instance segmentation module.

The instance segmentation feature output by the kth instance segmentation module may be determined by: acquiring the instance segmentation feature output by the (k-1)th instance segmentation module; determining the instance segmentation score of the kth instance segmentation module based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module; and masking the instance segmentation score to obtain the instance segmentation feature output by the kth instance segmentation module.

A mask image may be obtained through the masking process. The mask image may be used to cover a particular image or object; it may be a two-dimensional matrix array or a multi-valued image. For example, the mask image may be a two-dimensional matrix of 0s and 1s, where 0 indicates that the corresponding pixel is not a boundary and 1 indicates that it is a boundary.

Specifically, the instance segmentation feature corresponding to stage k-1 and the basic feature of the (i-(k-1))th level may be fused by addition to obtain the first fusion feature of stage k-1; the first fusion feature of stage k-1 and the instance segmentation feature corresponding to stage k-1 may be fused by multiplication to obtain the second fusion feature of stage k-1; and the stage-k instance segmentation may be performed based on the second fusion feature of stage k-1 to obtain the instance segmentation score corresponding to stage k.

The output of each stage is an instance segmentation score (mask), which may be denoted p_mask. Passing p_mask through a threshold filter yields the instance segmentation feature, which may be denoted f_mask. The threshold in the filter may be 0.5: a score greater than the threshold indicates that the corresponding pixel point is a boundary, and a score less than the threshold indicates that it is not. f_mask may thus be a [0,1] mask, which may also be called a mask image. The instance segmentation feature f_mask is added directly to the shallow feature to obtain the first fusion feature f_fusion, and f_mask is then multiplied pixel-wise with f_fusion to obtain the second fusion feature.
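A minimal sketch of the threshold filter and the two fusions just described (the 0.5 threshold follows the text; the tensor shapes and broadcasting are assumptions):

```python
import torch

def next_stage_input(p_mask, shallow_feat):
    f_mask = (p_mask > 0.5).float()    # [0,1] mask: 1 marks a boundary pixel
    f_fusion = f_mask + shallow_feat   # first fusion feature: direct addition
    return f_mask * f_fusion           # second fusion feature: pixel-wise product

p_mask = torch.rand(1, 1, 32, 32)      # instance segmentation score of one stage
shallow = torch.randn(1, 8, 32, 32)    # level i-(k-1) shallow base feature
second_fusion = next_stage_input(p_mask, shallow)
print(second_fusion.shape)             # torch.Size([1, 8, 32, 32]) via broadcasting
```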

S230, determining an instance segmentation result of the target image based on the instance segmentation data corresponding to each basic feature.

For each basic feature, the instance segmentation scores corresponding to the basic feature are fused to obtain the instance segmentation composite score corresponding to the basic feature, and the instance segmentation result of the target image is determined according to the composite score.

The fusion processing may mean taking an element-wise weighted average of the instance segmentation scores corresponding to the basic features of each level to obtain the instance segmentation composite score, and then determining the instance segmentation result of the target image from that composite score. An instance segmentation score assigns a value to each pixel in the target image; since the basic features of all levels each have corresponding instance segmentation scores, every pixel corresponds to several score values, and the element-wise weighted average is a weighted average over the score values corresponding to each pixel.

This step may be implemented by a post-processing module, in which the classification scores p_mask of the multi-stage instance segmentation are combined by element-wise weighted averaging into a composite score p_final with output size H × W × C*, where C* is the number of object classes predicted by the detector and H × W indicates size and position.
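A minimal sketch of this post-processing fusion; uniform weights are an assumption, since the source does not specify the weighting:

```python
import torch

def fuse_scores(stage_scores, weights=None):
    stacked = torch.stack(stage_scores)              # (N, H, W, C*)
    if weights is None:                              # default: uniform weights
        weights = torch.full((len(stage_scores),), 1.0 / len(stage_scores))
    weights = weights.view(-1, *([1] * (stacked.dim() - 1)))
    return (weights * stacked).sum(dim=0)            # p_final: (H, W, C*)

stage_scores = [torch.rand(64, 64, 3) for _ in range(4)]  # 4 stages, C* = 3
p_final = fuse_scores(stage_scores)
print(p_final.shape)                                 # torch.Size([64, 64, 3])
```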

According to the embodiments of the present application, multi-stage instance segmentation prediction can be performed. Each stage fuses the input of the previous stage with shallow features, continuously strengthening the prediction; the final instance segmentation result is obtained by fusing the per-stage results; and the number of prediction stages is kept under control through this more effective fusion with shallow features, so that speed is preserved while the multi-stage prediction improves the accuracy of instance segmentation.

In some embodiments, semantic features may also be incorporated to further improve the accuracy of the instance segmentation result. To that end, the method further comprises: determining semantic segmentation features corresponding to the target image based on the basic features; correspondingly, for each basic feature, the N stages of instance segmentation may be performed on the target image based on the basic feature and the semantic segmentation features.

The basic features of at least two different levels may be fused to obtain a third fusion feature, and the semantic segmentation features corresponding to the target image may be determined based on the third fusion feature and a semantic segmentation model.

The semantic segmentation model may comprise two 3x3 first convolution layers and one 1x1 second convolution layer connected in sequence, plus a 1x1 third convolution layer in parallel with the second convolution layer. The two 3x3 first convolution layers extract the initial semantic segmentation features; the second convolution layer takes the initial semantic segmentation features as input and outputs the semantic segmentation result; the third convolution layer takes the initial semantic segmentation features as input and outputs a feature with an adjusted number of channels. The semantic segmentation features may comprise the semantic segmentation result and this channel-adjusted feature.
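A hedged sketch of this semantic segmentation model in PyTorch; the channel widths, class count and ReLU activations are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    """Two 3x3 convs, then a 1x1 conv for the segmentation result and a
    parallel 1x1 conv that adjusts the channel count."""
    def __init__(self, in_ch=256, mid_ch=128, num_classes=21, out_ch=256):
        super().__init__()
        self.stem = nn.Sequential(                      # the two 3x3 first convs
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.seg = nn.Conv2d(mid_ch, num_classes, 1)    # second conv: seg result
        self.adjust = nn.Conv2d(mid_ch, out_ch, 1)      # third conv: channel adjust

    def forward(self, fused_feature):
        init_feat = self.stem(fused_feature)            # initial semantic features
        return self.seg(init_feat), self.adjust(init_feat)

seg_result, sem_feature = SemanticHead()(torch.randn(1, 256, 64, 64))
print(seg_result.shape, sem_feature.shape)  # (1, 21, 64, 64) and (1, 256, 64, 64)
```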

For example, when FPN is used for semantic segmentation, each feature map may be convolved and then upsampled 2x repeatedly until it reaches 1/4 of the resolution of the original image; the maps are then added together and finally upsampled 4x to the resolution of the original image. This returns a feature map with the same size as the original image and as many channels as there are classes, which is the semantic segmentation result.
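A sketch of this FPN semantic segmentation path under assumptions (a single shared 3x3 convolution across levels, bilinear upsampling, and illustrative channel/class counts):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fpn_semantic(pyramid, conv, classifier, img_hw):
    """Convolve each pyramid map, upsample it 2x at a time to 1/4 resolution,
    sum the maps, then upsample 4x and classify per pixel."""
    quarter = (img_hw[0] // 4, img_hw[1] // 4)
    acc = 0
    for p in pyramid:
        x = conv(p)
        while tuple(x.shape[-2:]) != quarter:          # repeated 2x upsampling
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        acc = acc + x
    out = F.interpolate(acc, scale_factor=4, mode="bilinear", align_corners=False)
    return classifier(out)                             # channels = class count

img_hw = (256, 256)
pyramid = [torch.randn(1, 256, img_hw[0] // s, img_hw[1] // s) for s in (4, 8, 16, 32)]
conv = nn.Conv2d(256, 128, 3, padding=1)               # shared conv (a simplification)
classifier = nn.Conv2d(128, 21, 1)                     # 21 classes, illustrative
sem_map = fpn_semantic(pyramid, conv, classifier, img_hw)
print(sem_map.shape)                                   # torch.Size([1, 21, 256, 256])
```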

As an example, step S220 may be implemented as follows: for the basic feature of the ith level, performing instance segmentation on the target image by using N cascaded instance segmentation modules based on the basic feature and the semantic segmentation features.

The input data of the first instance segmentation module are the basic feature and the semantic segmentation features, and its output data are a first instance segmentation result and an instance segmentation feature for the basic feature; the input data of the kth instance segmentation module are determined based on the instance segmentation feature output by the (k-1)th instance segmentation module and the basic feature of the (i-(k-1))th level corresponding to the basic feature, where 2 ≤ k ≤ N and the value of N is i-1.

Specifically, the basic features can be taken in turn as the current basic feature, and N stages of instance segmentation performed for the current basic feature: for stage 1, instance segmentation is performed based on the current basic feature and the semantic segmentation features to obtain the instance segmentation data corresponding to stage 1; for stage k, instance segmentation is performed based on the instance segmentation data corresponding to stage k-1 and the basic feature of the (i-(k-1))th level to obtain the instance segmentation data corresponding to stage k.

Specifically, stage-1 instance segmentation can be performed based on the current basic feature and the semantic segmentation features to obtain the stage-1 instance segmentation score, from which the stage-1 instance segmentation feature is determined. For stage k, the instance segmentation feature corresponding to stage k-1 is determined from the score corresponding to stage k-1; stage-k instance segmentation is performed based on that feature and the basic feature of the (i-(k-1))th level to obtain the stage-k score, from which the stage-k instance segmentation feature is determined.

For N, k and the levels involved, reference may be made to the description in the previous embodiments.

As an example, the first instance segmentation module may perform instance segmentation based on the basic feature and the semantic segmentation features to obtain corresponding instance segmentation scores, which are then masked to obtain the instance segmentation feature output by the first instance segmentation module.

In some embodiments, the semantic segmentation features corresponding to the target image may be determined by a target semantic segmentation model, and the N stages of instance segmentation performed by a target multi-stage instance segmentation model. To obtain these, an initial semantic segmentation model and an initial multi-stage instance segmentation model can be trained jointly, yielding the target semantic segmentation model and the target multi-stage instance segmentation model.
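The joint training mentioned here might look like the following sketch; the individual loss terms and the semantic-loss weight are assumptions, since the source only states that the two models are trained jointly.

```python
import torch

def joint_step(optimizer, instance_losses, semantic_loss, sem_weight=1.0):
    """One optimization step over the combined instance + semantic loss."""
    loss = torch.stack(instance_losses).sum() + sem_weight * semantic_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy losses over a shared parameter, just to show the combined update.
param = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([param], lr=0.1)
instance_losses = [param.pow(2).sum() for _ in range(2)]  # one loss per stage
semantic_loss = param.abs().sum()
print(joint_step(opt, instance_losses, semantic_loss))
```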

As one example, the multi-stage instance segmentation model may stack multiple stages, as shown in fig. 3. Each stage may be composed of 4 layers of Conv 3x3 (3x3 convolutions); its output is an instance segmentation score, which is filtered to obtain the instance segmentation feature. The instance segmentation feature is added directly to the shallow feature to obtain the first fusion feature, and the instance segmentation feature is multiplied pixel-wise with the first fusion feature to obtain the second fusion feature, which serves as the input of the next stage.
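One stage of this model might be sketched as below; the text fixes only the four 3x3 convolution layers and the score output, so the channel widths, ReLUs and sigmoid are assumptions.

```python
import torch
import torch.nn as nn

class StageHead(nn.Module):
    """One stage: 4 layers of Conv 3x3 ending in a 1-channel score map."""
    def __init__(self, in_ch=256, mid_ch=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, 1, 3, padding=1))   # 4th conv emits the score

    def forward(self, x):
        return torch.sigmoid(self.convs(x))       # per-pixel score in [0, 1]

p_mask = StageHead()(torch.randn(1, 256, 28, 28))
print(p_mask.shape)                               # torch.Size([1, 1, 28, 28])
```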

As shown in fig. 4, the embodiment of the present application may be implemented with 4 modules: a feature extraction module, a semantic segmentation module, an instance segmentation module and a post-processing module. The feature extraction module extracts a feature pyramid that serves as the input to the first convolution layers of the semantic segmentation module and to the Region Proposal Network (RPN) layer of the instance segmentation module. The first convolution layers are two 3x3 convolution layers; in the semantic segmentation module, one 1x1 conv predicts the semantic segmentation result, and another 1x1 conv adjusts the number of channels in combination with the regions of interest output by the RPN layer, serving as the input of the multi-stage instance segmentation module within the instance segmentation module; the input of the multi-stage instance segmentation module also includes the regions of interest output by the RPN layer. The output of the multi-stage instance segmentation module is processed by the post-processing module to obtain the final result. Adding the semantic segmentation task introduces multi-task training.
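Putting the four modules together, the data flow of fig. 4 can be sketched as follows. Every callable here is a hypothetical placeholder standing in for the corresponding network; the sketch fixes only the wiring described in this paragraph, not a working detector.

```python
def forward_pipeline(image, extract, semantic_head, rpn, adjust, stages, postprocess):
    pyramid = extract(image)                       # feature extraction module
    seg_result, init_sem = semantic_head(pyramid)  # semantic segmentation module
    rois = rpn(pyramid)                            # regions of interest from the RPN
    feat = adjust(init_sem, rois)                  # 1x1 conv adjusts the channel count
    scores = []
    for stage, shallow in stages:                  # cascaded stages, each paired with
        p_mask, feat = stage(feat, shallow)        # its shallow base feature
        scores.append(p_mask)
    return seg_result, postprocess(scores)         # post-processing fuses the scores
```

Each placeholder corresponds to one of the sketches above (TinyFPN, SemanticHead, StageHead and fuse_scores); the RPN and RoI extraction are left abstract since the source does not detail them.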


Fig. 5 is a schematic structural diagram of an instance segmentation apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:

an extraction module 501, configured to perform feature extraction on a target image to obtain basic features of at least two different levels;

a segmentation module 502, configured to perform, for each basic feature, N stages of instance segmentation on the target image based on the basic feature to obtain N pieces of instance segmentation data, where the value of N is determined according to the level number corresponding to the basic feature, and N is greater than or equal to 1;

a determining module 503, configured to determine an instance segmentation result of the target image based on the instance segmentation data corresponding to each basic feature.

In some embodiments, the apparatus further includes a semantic segmentation module, configured to determine, based on each basic feature, the semantic segmentation features corresponding to the target image.

Correspondingly, the segmentation module is specifically configured to:

for each basic feature, perform N stages of instance segmentation on the target image based on the basic feature and the semantic segmentation features to obtain N pieces of instance segmentation data.

In some embodiments, the segmentation module is specifically configured to:

for the basic feature of the ith level, perform instance segmentation on the target image by using N cascaded instance segmentation modules based on the basic feature and the semantic segmentation features;

where the input data of the first instance segmentation module are the basic feature and the semantic segmentation features, and its output data are a first instance segmentation result and an instance segmentation feature for the basic feature; the input data of the kth instance segmentation module are determined based on the instance segmentation feature output by the (k-1)th instance segmentation module and the basic feature of the (i-(k-1))th level corresponding to the basic feature; and 2 ≤ k ≤ N, with the value of N being i-1.

In some embodiments, the segmentation module is specifically configured to: perform instance segmentation based on the basic feature and the semantic segmentation features by using the first instance segmentation module to obtain corresponding instance segmentation scores, where each instance segmentation score indicates the probability that the corresponding pixel point is a boundary; and mask the instance segmentation scores to obtain the instance segmentation feature output by the first instance segmentation module.

In some embodiments, the segmentation module is specifically configured to determine the instance segmentation feature output by the kth instance segmentation module by: acquiring the instance segmentation feature output by the (k-1)th instance segmentation module; determining the instance segmentation score of the kth instance segmentation module based on the instance segmentation feature output by the (k-1)th instance segmentation module, the basic feature of the (i-(k-1))th level, and the kth instance segmentation module; and masking the instance segmentation score to obtain the instance segmentation feature output by the kth instance segmentation module.

In some embodiments, the segmentation module is specifically configured to:

fuse, by addition, the instance segmentation feature corresponding to the (k-1)th stage and the basic feature of the (i-(k-1))th level to obtain a first fusion feature of the (k-1)th stage;

fuse, by multiplication, the first fusion feature of the (k-1)th stage and the instance segmentation feature corresponding to the (k-1)th stage to obtain a second fusion feature of the (k-1)th stage;

and perform the instance segmentation of the kth stage based on the second fusion feature of the (k-1)th stage to obtain the instance segmentation score corresponding to the kth stage.

In some embodiments, the determining module is specifically configured to:

for each basic feature, fuse the instance segmentation scores corresponding to the basic feature to obtain the instance segmentation composite score corresponding to the basic feature;

and determine the instance segmentation result of the target image according to the instance segmentation composite score. The apparatus provided by this embodiment of the present application has the same implementation principles and technical effects as the foregoing method embodiments; for brevity, where this apparatus embodiment is silent, reference may be made to the corresponding content of the foregoing method embodiments.

Further, this embodiment also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the method provided by the foregoing method embodiments.

The computer program product of the image processing method and apparatus provided by the embodiments of the present application includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the method described in the foregoing method embodiments, to which reference may be made for the specific implementation, not repeated here.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is merely a division by logical function, and there may be other divisions in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.

Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

If implemented in the form of software functional units and sold or used as a stand-alone product, the functions may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

Finally, it should be noted that the above embodiments are only specific implementations of the present application, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still modify, or readily conceive of changes to, the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features, within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present application and shall be covered by its scope of protection. Therefore, the scope of protection of the present application shall be subject to the scope of the claims.
