Image feature extraction method and salient object prediction method

Document No.: 1756155  Publication date: 2019-11-29  Views: 17  Language: Chinese

Reading note: This technique, "Image feature extraction method and salient object prediction method" (影像特征提取方法及其显著物体预测方法), was designed and created by Sun Min (孙民), Zheng Xianzi (郑仙资), Zhao Junhong (赵浚宏), and Liu Tinglu (刘庭禄) on 2018-05-21. Abstract: The present invention discloses a neural-network-based image feature extraction method and a salient object prediction method, suitable for panoramic images, comprising the following steps: projecting the panoramic image onto a cube model to generate an image group containing multiple images that have linking relationships with one another; using the image group as the input of the neural network, wherein, when an operation layer of the neural network performs a padding operation on the multiple images, the data to be padded are obtained from adjacent images among the multiple images according to the linking relationships, so as to preserve the features at the image boundaries; and generating a padded feature map through the operation of the operation layer of the neural network, and extracting an image feature map from the padded feature map.

1. An image feature extraction method using a neural network, suitable for a panoramic image, characterized in that it comprises the following steps:

projecting the panoramic image onto a cube model to generate an image group comprising a plurality of images having linking relationships with one another;

using the image group as the input of the neural network, wherein, when an operation layer of the neural network performs a padding operation on the plurality of images, the data to be padded are obtained from adjacent images among the plurality of images according to the linking relationships, so as to preserve the features at the image boundaries; and

generating a padded feature map through the operation of the operation layer of the neural network, and extracting an image feature map from the padded feature map.

2. The image feature extraction method of claim 1, characterized in that the operation layer performs operations on the plurality of images, thereby generating a plurality of padded feature maps having linking relationships with one another, which form a padded feature map group.

3. The image feature extraction method of claim 2, characterized in that, when the operation layer of the neural network performs a padding operation on one of the plurality of padded feature maps, the data to be padded are obtained from adjacent padded feature maps among the plurality of padded feature maps according to the linking relationships.

4. The image feature extraction method of claim 1, characterized in that the operation layer is a convolutional layer or a pooling layer.

5. The image feature extraction method of claim 4, characterized in that the range over which the data to be padded are obtained from the adjacent images of the image is controlled by the dimension of the filter of the operation layer.

6. A salient object prediction method, suitable for a panoramic image, characterized in that it comprises the following steps:

projecting the panoramic image onto a cube model to generate an image group comprising a plurality of images having linking relationships with one another;

using the image group as the input of a neural network, wherein, when an operation layer of the neural network performs a padding operation on the plurality of images, the data to be padded are obtained from adjacent images among the plurality of images according to the linking relationships, so as to preserve the features at the image boundaries;

generating a padded feature map through the operation of the operation layer of the neural network, and extracting an image feature map from the padded feature map, as a static model;

performing saliency scoring on the pixels of each image in the static model to obtain a static saliency map; adding a long short-term memory (LSTM) operation layer among the operation layers, aggregating a plurality of static saliency maps from different time points, and obtaining a dynamic saliency map via saliency scoring; and

optimizing, with a loss function, the dynamic saliency map of the current time point according to the dynamic saliency map of the previous time point, to serve as the salient object prediction result of the panoramic image.

7. The salient object prediction method of claim 6, characterized in that the operation layer performs operations on the plurality of images, thereby generating a plurality of padded feature maps having linking relationships with one another, which form a padded feature map group.

8. The salient object prediction method of claim 7, characterized in that, when the operation layer of the neural network performs a padding operation on one of the plurality of padded feature maps, the data to be padded are obtained from adjacent padded feature maps among the plurality of padded feature maps according to the linking relationships.

9. The salient object prediction method of claim 6, characterized in that the operation layer is a convolutional layer or a pooling layer.

10. The salient object prediction method of claim 9, characterized in that the range over which the data to be padded are obtained from the adjacent images of the image is controlled by the dimension of the filter of the operation layer.

Technical field

The present invention relates to an image feature extraction method using a neural network and a salient object prediction method thereof. The invention applies a cube-based image processing scheme, cube padding (Cube padding) via a cube model (Cube model), so that the feature representation of the image near the poles remains complete and undistorted, thereby meeting users' needs.

Background art

In recent years, image stitching technology has flourished, and the 360-degree panoramic image has become a widely used image presentation mode. Since it covers every orientation without blind spots, it can be applied in many fields, and, combined with today's machine learning methods, prediction and learning without blind spots can be developed.

However, since most current panoramic images are produced by equirectangular projection (EQUI), the projection distorts the image near the north and south poles (near the poles) and generates redundant pixels (distortion), which makes object recognition and application inconvenient. When such images are processed by computer vision systems, the projection distortion reduces prediction accuracy.

Therefore, for saliency prediction on panoramic images, handling the pole distortion problem more efficiently within a machine learning training framework, and generating output feature values more quickly and accurately, is a goal that image processing manufacturers hope to achieve. Accordingly, the inventors conceived and designed an image feature extraction method that, through machine learning, compares favorably with the existing technology, improves on the deficiencies of the prior art, and thereby enhances its industrial applicability.

Summary of the invention

In view of the above problems of the prior art, an object of the present invention is to provide an image feature extraction method and a salient object prediction method thereof, to solve the defect of prior-art image inpainting methods that the repaired objects may still be flawed or unnaturally distorted, so that image feature values cannot be extracted.

According to an object of the present invention, an image feature extraction method and a salient object prediction method thereof are proposed, comprising the following steps: projecting a panoramic image onto a cube model (Cube model) to generate an image group (Image stack) comprising a plurality of images having linking relationships with one another; using the image group as the input of a convolutional neural network (CNN), wherein, when an operation layer (Operation layer) of the neural network performs a padding operation (Padding) on the plurality of images, the data to be padded are obtained from the neighboring images (Neighboring images) among the plurality of images according to the linking relationships, so as to preserve the features at the image boundaries; and generating a padded feature map (Padded feature map) through the operation of the operation layer of the neural network, and extracting an image feature map from the padded feature map. The image feature map is further processed by a static model to extract a static saliency map; a long short-term memory (LSTM) operation layer may also be inserted among the operation layers of the neural network to generate padded feature maps, which are refined with a loss function (Loss function) to produce the dynamic saliency map.

Preferably, the panoramic image may include any image presentation mode with a 360-degree viewing angle.

Preferably, the cube model is not limited to the six-face cube model of the present invention; it may also be extended to polyhedral models, for example, octahedral models and dodecahedral models.

Preferably, for the image group (Image stack) of multiple images with linking relationships, the linking relationships are established in a pre-processing step (Pre-process) that projects the panoramic image onto the cube model; in this pre-processing, the corresponding image boundaries between adjacent faces of the six faces of the cube model are handled by an overlap method (Overlap), and are adjusted during neural network training.
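As a concrete illustration of this pre-processing, the sketch below samples one cube face from an equirectangular panorama by gnomonic projection. The face size, the nearest-neighbor sampling, and the orientation of the front face are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def equi_to_front_face(equi, face_size):
    """Sample the front (F) cube face from an equirectangular panorama.

    equi: (H, W) grayscale equirectangular image.
    Assumption: the front face looks down the +x axis; sampling is
    nearest-neighbor for brevity (bilinear would be smoother).
    """
    H, W = equi.shape
    # Face-plane coordinates in [-1, 1]
    u = np.linspace(-1, 1, face_size)
    vv, uu = np.meshgrid(u, u, indexing="ij")
    # Direction vectors for the front face: x forward, y right, z up
    x, y, z = np.ones_like(uu), uu, -vv
    lon = np.arctan2(y, x)                      # [-pi, pi]
    lat = np.arctan2(z, np.sqrt(x**2 + y**2))   # [-pi/2, pi/2]
    # Map (lon, lat) to equirectangular pixel coordinates
    col = ((lon / (2 * np.pi) + 0.5) * (W - 1)).round().astype(int)
    row = ((0.5 - lat / np.pi) * (H - 1)).round().astype(int)
    return equi[row, col]
```

Repeating this for all six viewing directions (rotating the direction vectors per face) yields the image group described above.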

Preferably, the plurality of images may include any image group formed by projecting the panoramic image onto the cube model into multiple images with linking relationships, the images having relative positions within the image group generated according to the linking relationships.

Preferably, the images of the image group confirm their linking relationships through the above-described pre-processing (Pre-process) with the cube model, and the image group accordingly serves as the input of the neural network (CNN).

Preferably, during training, the image group is trained with the operation layers (Operation layer) of the neural network, which perform image feature extraction. The padding operation (Padding) performed on the neighboring images (Neighboring images) of the image group formed by the cube model and the multiple images with linking relationships is cube padding; the neighboring images are the images on adjacent faces of the cube model. In the operation-layer training, each image in the image group has at least corresponding neighboring images above, below, to the left, and to the right; the feature values at its image boundaries are confirmed according to the overlap relationships with its neighboring images, and the boundary range of its image boundary is further confirmed by the boundary of the operation layer.

Preferably, the range over which the neighboring images of an image supply the data to be padded to the operation layer is controlled by the dimension (Dimension) of the filter (Filter) of the operation layer.
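The relationship between filter dimension and padding range can be stated concretely: for an odd k × k filter with stride 1, each side of a face must borrow (k − 1)/2 pixels from its neighbor to keep the output the same size as the input. A minimal sketch (the helper name is ours, not the patent's):

```python
def cube_pad_width(kernel_size: int, dilation: int = 1) -> int:
    """Pixels to borrow from each neighboring face so that a stride-1
    convolution preserves the face resolution ('same' padding)."""
    if kernel_size % 2 == 0:
        raise ValueError("expected an odd kernel size")
    return dilation * (kernel_size - 1) // 2
```

For the 7 × 7, 3 × 3, and 1 × 1 kernels mentioned in the embodiments, this gives padding widths of 3, 1, and 0 respectively.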

Preferably, the image group confirms the labels and overlap relationships of the neighboring images during operation-layer training of the neural network to produce the padded feature maps; in the present invention, adjusting these labels and overlap relationships of the image group during operation-layer training enables optimized performance in feature extraction and efficiency during the neural network training process.

Preferably, when the operation layer performs operations on the image group, it may further generate a plurality of padded feature maps having the above-described linking relationships with one another.

Preferably, after the labels and overlap relationships of the neighboring images are confirmed in operation-layer training to produce the padded feature maps, a post-processing module (Post-process) applies max-pooling (Max-pooling), inverse projection (Inverse projection), up-sampling (Up-sampling), and similar processing to the padded feature maps of the operation layers of the neural network to extract the image feature map.
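The fusion part of this post-processing can be sketched as follows: feature maps taken from operation layers at different resolutions are up-sampled to a common size and fused by element-wise max-pooling. This is a minimal NumPy sketch under those assumptions; the inverse projection back to the panorama is omitted.

```python
import numpy as np

def upsample_nn(fmap, factor):
    """Nearest-neighbor up-sampling of a (H, W) feature map."""
    return np.repeat(np.repeat(fmap, factor, axis=0), factor, axis=1)

def aggregate_feature_maps(fmaps, out_size):
    """Up-sample multi-resolution feature maps to a common square size and
    fuse them by element-wise max-pooling across the maps."""
    fused = np.full((out_size, out_size), -np.inf)
    for f in fmaps:
        assert out_size % f.shape[0] == 0, "sizes must divide out_size"
        fused = np.maximum(fused, upsample_nn(f, out_size // f.shape[0]))
    return fused
```

Element-wise max is one plausible reading of the "max-pooling" fusion named above; a learned weighting would be an alternative design choice.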

Preferably, a static model (Static model, M_s) is applied to the image feature map to extract the static saliency map after refinement; the static model refinement confirms the image feature pattern against ground truth (Ground truth, GT) labels in the image feature map, and performs saliency scoring (Saliency scoring) on the pixels of each image to obtain the static saliency map (Static saliency map).

Preferably, the saliency scoring methods of the present invention use measures such as the linear correlation coefficient (Linear Correlation Coefficient, CC) and the area-under-the-curve methods AUC-Judd (AUC-J) and AUC-Borji (AUC-B); these are examples of such scoring methods, so the present invention is applicable to any area-under-the-curve method, and after scoring the extracted image feature map is assigned a saliency score.
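Of the scoring measures named above, the linear correlation coefficient (CC) is the simplest to state: it is the Pearson correlation between the predicted saliency map and the ground-truth map, with 1.0 meaning perfect linear agreement. A minimal sketch:

```python
import numpy as np

def saliency_cc(pred, gt):
    """Linear correlation coefficient (CC) between a predicted saliency
    map and a ground-truth map; both are 2-D arrays of the same shape."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)  # standardize prediction
    g = (gt - gt.mean()) / (gt.std() + 1e-12)        # standardize ground truth
    return float((p * g).mean())
```

The AUC variants additionally require sampling fixation and non-fixation points, which is why they are usually computed with benchmark-provided code rather than inline.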

Preferably, the saliency scoring is mainly used to tune and re-optimize the static model of the image feature extraction method of the present invention and the dynamic model with the inserted LSTM operation layer; from the scores, the method can also be compared against prior-art methods and baselines (Baseline), such as zero-padding (Zero-padding), motion magnitude (Motion Magnitude), ConsistentVideoSal, and SalGAN, confirming from the saliency scores that the present invention clearly achieves superior scores under these objective methods.

Preferably, through operation-layer training of the neural network, the image group may be fed into an inserted long short-term memory (LSTM) operation layer to generate two padded feature maps with temporal-continuity features, the image group being the one formed, as described above, by the cube model and the multiple images with linking relationships.

Preferably, the padded feature maps with temporal-continuity features generated by the inserted LSTM operation layer need to be refined with a loss function: for the two consecutive padded feature maps from the LSTM operation layer, the loss function mainly reinforces their temporal consistency.
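One way to reinforce temporal consistency between two consecutive saliency maps, offered as our reading of the loss described here rather than the patent's exact formula, is to penalize the frame-to-frame difference alongside the prediction error:

```python
import numpy as np

def temporal_consistency_loss(s_prev, s_curr, gt_curr, lam=0.5):
    """Fit the current saliency map to ground truth while keeping it
    close to the previous time step's map.

    lam weights the temporal smoothness term (an assumed hyperparameter).
    """
    fit = np.mean((s_curr - gt_curr) ** 2)    # prediction error
    smooth = np.mean((s_curr - s_prev) ** 2)  # temporal consistency
    return fit + lam * smooth
```

Setting lam larger trades per-frame accuracy for smoother saliency across time, which is the behavior the refinement described above is aiming at.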

Preferably, when the operation layer performs operations on the plurality of images, it may further generate a plurality of padded feature maps having linking relationships with one another, forming the padded feature map group.

Preferably, the operation layers may further include convolutional layers (Convolutional layer), pooling layers (Pooling layer), and LSTM operation layers.

According to another object of the present invention, a salient object prediction method is proposed, suitable for a panoramic image, comprising the following steps: extracting the image feature map of the panoramic image as a static model; performing saliency scoring on the pixels of each image in the static model to obtain a static saliency map; adding an LSTM operation layer among the operation layers, aggregating a plurality of static saliency maps from different time points, and obtaining a dynamic saliency map via saliency scoring; and optimizing, with a loss function, the dynamic saliency map of the current time point according to the dynamic saliency map of the previous time point, to serve as the salient object prediction result of the panoramic image.
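The aggregation step of this prediction method can be sketched end to end. The exponential blending below is a simple stand-in for the LSTM aggregation and loss-based refinement, chosen only to make the data flow concrete; the blending weight lam is an assumption.

```python
import numpy as np

def predict_salient_objects(static_maps, lam=0.5):
    """Turn per-frame static saliency maps into dynamic saliency maps by
    smoothing each map toward the previous time point's result (a simple
    stand-in for the LSTM + loss refinement described in the text)."""
    dynamic, prev = [], None
    for s in static_maps:
        d = s if prev is None else (1 - lam) * s + lam * prev
        dynamic.append(d)
        prev = d
    return dynamic
```

The first frame passes through unchanged; every later frame is pulled toward its predecessor, which is the temporal behavior the optimization step targets.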

In conclusion image feature extracting method under this invention and obvious object prediction technique, can have one or more A following advantages:

(1) This image feature extraction method and its salient object prediction method can, based on panoramic images and using the cube model, keep the image feature map near the poles free of warping and distortion; the parameters can adjust the image overlap range in the cube model and the depth of the resulting network architecture, thereby reducing distortion and improving the quality of image feature extraction.

(2) This image feature extraction method and its salient object prediction method can repair images via convolutional neural networks and output the completed image as a heat map, so that the repaired image is closer to the real image and unnatural artifacts occur less often.

(3) This image feature extraction method and its salient object prediction method can be applied to assist any panoramic photography and virtual reality application without hindering device operation through an enormous computational load, improving their practical adoption.

(4) In terms of output effect, this image feature extraction method and its salient object prediction method can outperform known image padding methods in saliency scoring.

Description of the drawings

Fig. 1 is a flow chart of the steps of the image feature extraction method of an embodiment of the present invention.

Fig. 2 is a diagram of the correspondence between the static model and the inserted LSTM operation layer after the panoramic image input of the image feature extraction method of an embodiment of the present invention has been trained by the neural network.

Fig. 3 is a schematic diagram of the computing modules of the image feature extraction method of an embodiment of the present invention.

Fig. 4 is the VGG-16 neural network training model of the image feature extraction method of an embodiment of the present invention.

Fig. 5 is the ResNet-50 neural network training model of the image feature extraction method of an embodiment of the present invention.

Fig. 6 is a schematic diagram of the stereoscopic imagery of the image feature extraction method of an embodiment of the present invention.

Fig. 7 shows the panoramic image in solid lines together with the cube model grid lines of the image feature extraction method of an embodiment of the present invention.

Fig. 8 is a diagram of the six faces of the stereoscopic imagery of the image feature extraction method of an embodiment of the present invention.

Fig. 9 is a practical comparison of cube padding and zero padding in the image feature extraction method of an embodiment of the present invention.

Fig. 10 is a block diagram of the LSTM operation layer of the image feature extraction method of an embodiment of the present invention.

Figs. 11A-11D are practical feature extraction result images of the image feature extraction method of an embodiment of the present invention.

Figs. 12A and 12B are feature heat maps of practical extraction and actual frames comparing image feature extraction methods, according to an embodiment of the present invention.

Figs. 13A and 13B are practical extracted features and heat maps from different image sources for the image feature extraction method of an embodiment of the present invention.

Description of symbols

S101, S102, S103, S104, S105: step

201: static model

202: temporal model

203, 3013: pre-processing module

204: neural network training

205, 3012: post-processing module

206, 100a: long short-term memory (LSTM) operation layer

207,3011: loss module

301: module

400a:VGG-16 neural network training model

500a:ResNet-50 neural network training model

601: panorama image

602, 1202: cube model

603: boundary problem solving

604: image feature map

605: feature map application

701: cube model schematic diagram

702: grid line diagram of the zero padding method

703: grid line diagram of the cube padding method

801: cube expanded view

802: image boundary overlapping schematic diagram

803: face F schematic diagram

901: cube padding

902: zero padding

903a, 903b: cube faces

1201, 1304: equirectangular projection

1203, Ours: image feature extracting method

1301: feature map

1302: practical heat map

1303: normal field-of-view image

1305: cube padding model

1306:Drone

1307:Wild-360

B, D, F, L, R, T: six faces of the cube model

NFoVs: normal field-of-view images

P1, P2, P3: corresponding points

Size: size

Pool/2: pooling layer

GT, 1204: ground truth

Frame, 1205: actual frame

Time: time shaft

Specific embodiment

The present invention is described in detail below in the form of embodiments with reference to the accompanying drawings. As shown in Fig. 1, which is a flow chart of an embodiment of the image feature extraction method of the present invention, the method comprises the following steps (S101-S105):

Step S101: input a 360-degree panoramic image. The 360-degree panoramic image can be captured by various image capturing devices; examples include the Wild-360 and Drone datasets.

Step S102: establish, with a pre-processing module (Pre-process), an image group (Image stack) of multiple images having linking relationships with one another. For example, the pre-processing module 3013 takes the six faces of the cube model as the multiple images corresponding to a panoramic image, and the image boundaries of the linking relationships are handled by an overlap method (Overlap); this pre-processing module 3013 is shown in Fig. 3. After the panoramic image It passes through the pre-processing model P, the image group corresponding to It under the cube model is generated. The cube model can be shown as in Fig. 7, where cube model 701 represents the panoramic image with circular grid lines and the six faces of the cube model are denoted B, D, F, L, R, and T. In addition to the overlap method (Overlap) referred to in step S101, the linking relationships further include confirming the neighboring images, and the schematic corresponding to face F can be seen in the cube faces 903a and 903b. After the multiple images with confirmed linking relationships pass through the above pre-processing (Pre-process) with the cube model, the image group is formed and accordingly serves as the input of the neural network (CNN).

Step S103: train the neural network with the image group; the neural network training process is described further below. In the operation layers of the neural network training, the range over which the neighboring images of an image supply the data to be padded is controlled by the dimension (Dimension) of the filter (Filter) of the operation layer, which further controls the overlap (Overlap) of the image boundaries of neighboring images, so that optimized performance in feature extraction and efficiency can be found during neural network training. After the image group is trained by the neural network, a padded feature map is generated. Fig. 8 illustrates cube padding (Cube padding) and the neighboring images via cube diagrams 801, 802, and 803: cube diagram 801 is the unfolded cube, with face F in the middle and the four adjacent faces T, L, R, and D around it; cube diagram 802 further shows the overlap between images. The image group serves as the input image for the padding, and the output image of the operation layer after dimension adjustment during neural network training is the padded feature map.
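The core of this step, borrowing the padded border of one face from its four neighbors instead of filling it with zeros, can be sketched in a few lines. Note that a real cube layout also requires per-edge rotations or flips of the borrowed strips depending on how the faces are unfolded; for clarity this sketch assumes aligned orientations, and the function and argument names are ours.

```python
import numpy as np

def cube_pad_face(faces, face, neighbors, p):
    """Pad one cube face with pixels from its four neighbors (cube padding).

    faces: dict mapping face name -> (H, W) array.
    neighbors: dict with 'top', 'bottom', 'left', 'right' face names.
    p: padding width in pixels.
    """
    H, W = faces[face].shape
    out = np.zeros((H + 2 * p, W + 2 * p), dtype=faces[face].dtype)
    out[p:H + p, p:W + p] = faces[face]
    out[:p, p:W + p] = faces[neighbors['top']][-p:, :]       # rows from face above
    out[H + p:, p:W + p] = faces[neighbors['bottom']][:p, :]  # rows from face below
    out[p:H + p, :p] = faces[neighbors['left']][:, -p:]       # columns from left face
    out[p:H + p, W + p:] = faces[neighbors['right']][:, :p]   # columns from right face
    return out
```

Replacing this lookup with zeros recovers ordinary zero padding, which is exactly the comparison drawn in Fig. 9.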

Step S104: with a post-processing module (Post-process), extract the image feature map from the padded feature maps of the operation layers of the neural network by processing methods such as max-pooling (Max-pooling), inverse projection (Inverse projection), and up-sampling (Up-sampling). The subsequent scoring uses, for example, the linear correlation coefficient (Linear Correlation Coefficient, CC) and the area-under-the-curve methods AUC-Judd (AUC-J) and AUC-Borji (AUC-B); these are examples of such scoring methods, so the present invention is applicable to any area-under-the-curve method, and the image feature map can be scored once these methods have been applied.

Step S105: perform saliency scoring on the extracted image feature map. The scoring is mainly used to tune and re-optimize the image feature extraction method of the present invention in the static model and in the dynamic model with the inserted LSTM operation layer; from the scores, the method can also be compared against prior-art methods and baselines (Baseline), such as zero-padding (Zero-padding), motion magnitude (Motion Magnitude), ConsistentVideoSal, and SalGAN, confirming from the saliency scores that the present invention clearly achieves superior scores under these objective methods.

In step S102, the image group that enters neural network (CNN) training, i.e., the neural network training of the present invention, is trained with the two neural network training models VGG-16 (400a, shown in Fig. 4) and ResNet-50 (500a, shown in Fig. 5). The operation layers in the neural network training include convolutional layers (Convolutional layer) and pooling layers (Pooling layer); the convolutional layers use 7 × 7, 3 × 3, and 1 × 1 convolution kernels. In the figures, each convolutional layer is named and grouped with an English abbreviation and a number.

As shown in Figs. 4 and 5, in the neural network training models of the image feature extraction method of the present invention, the VGG-16 neural network training model 400a of Fig. 4 and the ResNet-50 neural network training model 500a of Fig. 5, the operation layers include convolutional layers and pooling layers. The range of each operation layer is controlled by the dimension (Dimension) of its filter (Filter), and controlling this range simultaneously controls the boundary range of cube padding.

The VGG-16 neural network training model 400a uses 3 × 3 convolution kernels. The first group contains two first convolutional layers 3 × 3 conv, 64 at size Size:224 and a first skip convolutional layer, i.e., the first pooling layer pool/2; the second group contains two second convolutional layers conv, 128 at size Size:112 and the second pooling layer pool/2; the third group contains three third convolutional layers 3 × 3 conv, 256 at size Size:56 and the third pooling layer pool/2; the fourth group contains three fourth convolutional layers 3 × 3 conv, 512 at size Size:28 and the fourth pooling layer pool/2; the fifth group contains three fifth convolutional layers 3 × 3 conv, 512 at size Size:14 and the fifth pooling layer pool/2; the sixth group then performs resolution scanning at size Size:7. Such grouping indicates that the padded feature maps generated within a group have the same dimension: the Size number is the resolution, and the number after the operation layer represents the feature dimension, which controls the range of the operation layer and simultaneously the boundary range of cube padding of the present invention. Here, the purpose of both the convolutional layers and the pooling layers is to further mix and spread the information generated by the preceding layer, so that the receptive field (Receptive field) of later layers gradually expands, in the expectation of capturing image features at different levels. A skip convolutional layer differs from a normal convolutional layer in that its stride is set to 2, so the size of the padded feature map after this layer naturally halves, achieving more effective information exchange while reducing computational complexity.

The purpose of the convolutional layers of the VGG-16 neural network training model 400a is to integrate the information of the preceding layers layer by layer, allowing the gradually reduced padded-feature-map resolution to be expanded back to the original input resolution; the magnification ratio is therefore set to 2. In addition, the design concatenates, via pooling layers, the padded feature maps of matching resolution from earlier layers onto the current convolution results and passes them backward, with the aim of using the strong object-structure information possessed by the earliest layers to prompt and assist the generation results of the convolutional layers, so that they stay as close as possible to the original image structure. The generative model of this embodiment can take an input image and, through the above convolution and conversion, output a generated image; however, the form and number of convolutional layers of the present invention are not limited to the architecture described in the figure, and adjustments of the convolution channel types and numbers of layers of the generative model for images of different resolutions shall also fall within the scope of this application.

The neural network training model in 500a is a ResNet-50 network built from 7×7, 3×3, and 1×1 convolution kernels. The first group contains the first convolutional layer conv 7×7, 64, stride 2, followed by the first max-pooling layer max pool/2. The second group, at size Size: 56, contains three bottleneck blocks, each made of a second convolutional layer 1×1 conv, 64, a second convolutional layer 3×3 conv, 64, and a second convolutional layer 1×1 conv, 256, with identity connections (solid lines) and projection shortcuts (dotted lines) linking the blocks. The third group, at size Size: 28, contains four bottleneck blocks of third convolutional layers 1×1 conv, 128 (the first with stride 2), 3×3 conv, 128, and 1×1 conv, 512, linked in the same manner. The fourth group, at size Size: 14, contains six bottleneck blocks of fourth convolutional layers 1×1 conv, 256 (the first with stride 2), 3×3 conv, 256, and 1×1 conv, 1024. The fifth group, at size Size: 7, contains three bottleneck blocks of fifth convolutional layers 1×1 conv, 512 (the first with stride 2), 3×3 conv, 512, and 1×1 conv, 2048, followed by an average-pooling layer avg pool. The grouping indicates that the padded feature maps produced within a group share the same dimensions: the Size number is the spatial resolution, and the number after each operation layer is the channel dimension, which controls the extent of that operation layer and, at the same time, the cube-padding bounds of the present invention. The purpose of both the convolutional and pooling layers is to further mix and spread the information produced by the preceding layer, so that the receptive field (Receptive Field) of later layers gradually expands and features of the image at different levels can be captured. Convolutional layers that cross groups use a stride of 2, so the resolution of the padded feature map naturally halves after that layer, achieving more effective information exchange while reducing computational complexity.
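For illustration only (not part of the claimed method), the resolution and channel schedule described above can be checked with a small shape-bookkeeping sketch; the helper name `bottleneck_shapes` is hypothetical, and the figures follow the standard ResNet-50 bottleneck (1×1 → 3×3 → 1×1):

```python
def bottleneck_shapes(mid_c, out_c, size, stride=1):
    """Output shapes of one ResNet bottleneck block (1x1 -> 3x3 -> 1x1):
    channels go mid_c -> mid_c -> out_c; resolution divides by stride."""
    s = size // stride
    return [(mid_c, s, s), (mid_c, s, s), (out_c, s, s)]

# first block of the Size:28 group: 1x1 conv 128 /2, 3x3 conv 128, 1x1 conv 512
print(bottleneck_shapes(128, 512, 56, stride=2))
```

Running the sketch shows how a single stride-2 block both halves the resolution (56 → 28) and expands the channel dimension (to 512), which is exactly the resolution/channel trade the groups above perform.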

The purpose of the convolutional layers of the ResNet-50 training model in 500a is to integrate the information of the preceding layers step by step, allowing the gradually reduced resolution of the padded feature maps to be expanded back to the original input resolution; the enlargement ratio is therefore set to 2. In addition, pooling-layer skip links concatenate the feature maps of matching resolution from the front of the network onto the current padded feature maps and pass the result backward, so that the strong object-structure information held by the first few layers can prompt and assist the later convolutional layers to produce results that stay as close as possible to the original image structure. Each set of data at the same resolution can be treated as a block (block), so real-time image extraction can proceed without waiting for the entire neural network training to finish. After an image is input, the generation model of this embodiment outputs the generated image through the convolutions and conversions described above; however, the form and number of convolutional layers of the present invention are not limited to the architecture shown in the figure, and adjustments of the convolution channel types and layer counts of the generation model for images of different resolutions shall also be covered by the claims of this case.

The two neural network training models referred to in Fig. 4 and Fig. 5 above, VGG-16 and ResNet-50, are as recorded in the IEEE Conference on Computer Vision and Pattern Recognition papers arXiv:1512.03385 and arXiv:1409.1556. In the image feature extracting method, the panorama image is converted via the cubic model, cube padding is carried out with either of these two training models, and a padded feature map is generated.

In step S103, the image group is trained by the neural network to yield a padded feature map. A post-processing module (Post-process) must then apply max-pooling (max-pooling), inverse projection (inverse projection), and up-sampling (up-sampling) to the padded feature map produced by the operation layers of the neural network in order to extract the image feature map.

In step S103, the image feature map extracted by the post-processing module (Post-process) from the padded feature map of the operation layers can be visualized as a heat map (Heat map) with its hot zones (Heat zone), and compared against the real image feature values to confirm whether the correct image features have been extracted.
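For illustration only, the post-processing chain can be sketched in NumPy on toy dimensions; the helper name `postprocess` is hypothetical, cropping the padded border stands in for the inverse projection, and nearest-neighbour repetition stands in for the up-sampling:

```python
import numpy as np

def postprocess(padded_feat, pad=1, scale=2):
    # 1) drop the cube-padding border (stand-in for the inverse projection)
    core = padded_feat[:, :, pad:-pad, pad:-pad]
    # 2) max-pool over the channel axis to get one saliency map per face
    sal = core.max(axis=1)
    # 3) nearest-neighbour up-sampling back toward the input resolution
    return sal.repeat(scale, axis=-2).repeat(scale, axis=-1)

feat = np.random.rand(6, 8, 5, 5)   # 6 faces, 8 channels, padded 5x5 maps
sal = postprocess(feat)
print(sal.shape)
```

The sketch keeps the three-stage order stated above: boundary removal first, channel-wise maximum second, spatial up-sampling last.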

In step S103, a long short-term memory neural network operation layer (LSTM) can be inserted among the operation layers used to train the image group, and dynamic model training is then performed again. During retraining, a loss equation is applied whose main purpose is to strengthen, through the training of the LSTM operation layer, the temporal consistency of two consecutive padded feature maps.

As shown in Fig. 2, which gives the flow charts of the static model and the dynamic model of the embodiment of the image feature extracting method of the present invention with a panorama image as input to the trained neural network, the components and their connections are briefly described as follows. In Fig. 2, I_t and I_{t-1} are panorama image inputs which, after the preprocessing module 203, enter the neural network training model 204, where cube padding CP is applied to the panorama image to obtain the padded feature maps M_{S,t-1} and M_{S,t}. Via the post-processing module 205, the static obvious object maps O^S_{t-1} and O^S_t are generated; alternatively, after the LSTM operation layer 206 and the post-processing module 205, and after correction by the loss module 207 with the corresponding losses L_{t-1} and L_t, the dynamic obvious object maps O_{t-1} and O_t are obtained. The relationships among these components are all explained in the embodiments above, and the preprocessing module 203, post-processing module 205, and loss module 207 of the present invention are described again below. The panorama image is converted via the cubic model into six two-dimensional face images, and these six face images serve as the input of a static model 201 that outputs M_S by convolving the convolutional layer (Convolutional Layer) features M_l with the fully connected layer W_{fc}, with the formula as follows:

M_S = M_l * W_fc

Here M_S ∈ R^{6×K×w×w}, M_l ∈ R^{6×c×w×w}, and W_{fc} ∈ R^{c×K×1×1}, where c is the number of channels, w is the corresponding feature width, "*" denotes convolution, and K is the number of classes of the model pre-trained on a specific classification data set. To generate the static saliency map S, the maximum value of M_S is taken pixel-wise (Pixel-wisely) along the class dimension (Dimension).
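For illustration only, the formula M_S = M_l * W_fc is a 1×1 convolution, i.e. the same channel-to-class linear map applied at every spatial location of every face; a NumPy sketch under the dimensions just given (the sizes are toy values):

```python
import numpy as np

c, K, w = 8, 10, 4                  # channels, classes, feature width (toy sizes)
Ml = np.random.rand(6, c, w, w)     # per-face features from the last conv layer
Wfc = np.random.rand(c, K)          # fully connected weights viewed as a 1x1 conv
# 1x1 convolution: a channel-to-class linear map at every pixel of every face
Ms = np.einsum('fchw,ck->fkhw', Ml, Wfc)
# pixel-wise maximum along the class dimension gives the static saliency map S
S = Ms.max(axis=1)
print(Ms.shape, S.shape)
```

Because the kernel is 1×1, no spatial mixing occurs in this step; only the channel dimension is transformed, which is why M_S keeps the 6×K×w×w shape stated above.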

As shown in Fig. 3, the modules (301) used by the present invention include:

The computing module of the loss module (Loss, L) 3011: from the dynamic obvious object maps O_t and O_{t-1} produced by the long short-term memory neural network operation layer (LSTM) and the generated padded feature map m_t, the loss module (L) minimizes the image loss to form the dynamic saliency loss L_t. The loss module applies a loss equation (Loss function) whose main purpose is to strengthen, through the training of the LSTM operation layer, the temporal consistency of two consecutive padded feature maps; the loss equation is described further below.

The computing module of the post-processing module (Post-process) 3012: after the max-pooling layer Max, an inverse projection (Inverse projection) P^{-1} converts back to the image, which is then processed with up-sampling (Upsampling) U. In this way the padded feature map M_t, or the heat map H_t, obtained from cube padding during the neural network training under the projection to the cubic model can be restored by post-processing into the trained obvious object maps O_t and O^S_t.

The preprocessing module (Pre-process) 3013: before projecting with the cubic model, the panorama image I_t must pass through the preprocessing module (P), which generates multiple images, places them into the cubic model, and gives the multiple images a linking relationship to each other so as to form an image group I_t.
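For illustration only, the projection the preprocessing module performs can be sketched by computing, for each pixel of one cube face, the sampling coordinates in the equirectangular panorama. This minimal sketch covers only the front F face, and the helper name `front_face_coords` and the [0, 1) normalization of u, v are assumptions:

```python
import numpy as np

def front_face_coords(face_size):
    # rays through the front face: direction (x, y, 1), with x, y in [-1, 1]
    r = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    x, y = np.meshgrid(r, r)
    z = np.ones_like(x)
    lon = np.arctan2(x, z)                 # longitude of the ray
    lat = np.arctan2(y, np.hypot(x, z))    # latitude of the ray
    u = lon / (2.0 * np.pi) + 0.5          # normalized equirect column
    v = lat / np.pi + 0.5                  # normalized equirect row
    return u, v

u, v = front_face_coords(64)
print(u.shape, v.shape)
```

Sampling the panorama at (u, v) for every face pixel yields the face image; the other five faces differ only in how (x, y, z) is permuted and signed before the longitude/latitude computation.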

As shown in Fig. 6, which gives the image feature schematic of the cubic model of the image feature extracting method of the present invention and the distribution of its six faces: the practical panorama image 601 is converted via the cubic model schematic 602 and expressed in the heat-map mode 603 corresponding to the practical panorama image 601; after the boundary issue is solved, its image features are extracted via the image feature map 604, and the practical heat maps corresponding to points P1, P2, P3 are expressed from the normal field of view (Normal Field of View, NFoVs) angle in 605.

Fig. 7 shows the panorama image (solid lines) under the cubic model, with the six faces denoted the B, D, F, L, R, and T faces. Comparing the cubic model schematic 701, the grid-line chart 702 of the six faces under the zero-padding method, and the grid-line chart 703 of the six faces under the cube-padding method, the distortion of the solid grid lines at the edges in the zero-padding chart 702 is evident,

and the cubic model formula is as follows:

Here S_j(x, y) is the saliency scoring (saliency scoring) S at position (x, y) of cubic plane j, as given by this cubic model formula.

Fig. 8 shows the real image mapped onto the six faces (the B, D, F, L, R, and T faces) of the cube expanded view 801; the image-relay parts (frames) are confirmed from the cubic model processing sequence 802 and from the image boundary overlapping schematic, and can be checked against the F face of the cubic model F-face schematic 803.

As shown in Fig. 9, the feature maps of the cube-padding (Cube padding) method and the prior-art zero-padding (zero padding) method are compared for saliency. From the captured feature frames of Fig. 9 it is evident that the white area in the feature map 901 of the cube-padded image feature extracting method is significantly larger than the white area of the zero-padded image feature extracting method 902; the figure shows that image features are captured more easily from images processed by the cubic model than by the zero-padding technique. The cubic planes 903a, 903b are the practical images after cubic-model processing.
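For illustration only, the core difference from zero padding can be sketched for the four side faces, which form a horizontal ring (F, R, B, L): each face borrows its left and right pad columns from its neighbours instead of using zeros. This is a simplified illustration with a hypothetical `ring_pad` helper; the real method also fills the top and bottom rows from the T and D faces with the proper rotations:

```python
import numpy as np

def ring_pad(faces, p=1):
    """Pad each side face left/right with columns from its ring neighbours;
    top/bottom are zero-padded here only for brevity."""
    out = []
    n = len(faces)
    for i, f in enumerate(faces):
        left = faces[(i - 1) % n][:, -p:]    # right edge of the previous face
        right = faces[(i + 1) % n][:, :p]    # left edge of the next face
        padded = np.concatenate([left, f, right], axis=1)
        out.append(np.pad(padded, ((p, p), (0, 0))))
    return out

# four constant toy faces in ring order F, R, B, L
faces = [np.full((4, 4), v, dtype=float) for v in (0.0, 1.0, 2.0, 3.0)]
padded = ring_pad(faces)
print(padded[0].shape)
print(padded[0][1:-1, 0])   # left pad of face F is copied from face L
```

A convolution run on `padded[0]` now sees the true neighbouring content of face L at the boundary, which is why the feature maps keep continuity across face edges instead of decaying toward zero.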

In conclusion be all still image processing, thus can time model 202 combines in Fig. 2 for another example, the image for keeping it static A continuous dynamic image, the time mould 202 such as the Memory Neural Networks operation of Figure 10 shot and long term are generated along with time sequence arranges Shown in layer 100a block diagram, the running of shot and long term Memory Neural Networks operation layer is as follows:

i_t = σ(W_xi * X_t + W_hi * H_{t-1} + b_i)
f_t = σ(W_xf * X_t + W_hf * H_{t-1} + b_f)
o_t = σ(W_xo * X_t + W_ho * H_{t-1} + b_o)
g_t = tanh(W_xc * X_t + W_hc * H_{t-1} + b_c)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t
H_t = o_t ⊙ tanh(C_t)

Here ⊙ denotes element-to-element multiplication, σ(·) is the sigmoid function, and all W_* and b_* are model parameters to be learned. i is the input gate, f the forget gate, and o the output gate, each a control signal in [0, 1]; g is the transformed input signal with values in [-1, 1]; C is the memory cell value; H ∈ R^{6×K×w×w} serves as both the output and the regular input representation; M_S is the output of the static model; and the subscript t is the time index used to indicate the time step. The above LSTM operation layer is applied in turn to the six cube-padded faces (the B, D, F, L, R, and T faces).
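For illustration only, one step of such a layer on a single face can be sketched in NumPy; a per-pixel (1×1) linear map stands in for the real convolutional gates, and the weight layout (stacked input, four concatenated gates) is an assumption of this sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convlstm_step(Xt, Ht1, Ct1, W, b):
    """One LSTM step on a (c, h, w) feature map; W: (2c, 4c), b: (4c,).
    A 1x1 convolution (per-pixel linear map) stands in for the real one."""
    z = np.einsum('chw,cg->ghw', np.concatenate([Xt, Ht1]), W) + b[:, None, None]
    i, f, o, g = np.split(z, 4, axis=0)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gate signals in [0, 1]
    g = np.tanh(g)                                 # transformed input in [-1, 1]
    Ct = f * Ct1 + i * g                           # memory cell update
    Ht = o * np.tanh(Ct)                           # output / next hidden state
    return Ht, Ct

c, h, w = 4, 3, 3
rng = np.random.default_rng(0)
Xt, Ht1, Ct1 = (rng.standard_normal((c, h, w)) for _ in range(3))
W, b = rng.standard_normal((2 * c, 4 * c)) * 0.1, np.zeros(4 * c)
Ht, Ct = convlstm_step(Xt, Ht1, Ct1, W, b)
print(Ht.shape, Ct.shape)
```

The sketch makes the gate roles concrete: f decides how much of C_{t-1} survives, i scales the new content g, and o bounds the emitted H_t through tanh.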

Its formula is as follows:

Here S^t_j(x, y) is the main saliency score at position (x, y) of cubic plane j at time step t. A temporal consistent loss (Temporal consistent loss) is needed to adjust the model's treatment of per-pixel displacement warping, smoothness, and other effects between the discrete frames, and the time dimension is trained with three loss functions. The model of the present invention therefore optimizes, along the time axis, the reconstruction loss L_recons, the smoothness loss L_smooth, and the motion loss L_motion; the total loss function of each time step t can be expressed as:

where L_recons is the temporal reconstruction loss (Temporal reconstruction loss), L_smooth the smoothness loss (Smoothness loss), and L_motion the motion masking loss (Motion masking loss). Via the temporal consistent loss adjustment, the total loss function of each time step t can be formulated, with

the temporal reconstruction loss equation:

The temporal reconstruction loss equation expresses that the same pixel, across different time steps t, should have a similar saliency score; this equation helps repair the feature maps more accurately for frames with similar motion patterns.

The smoothness loss equation:

The smoothness loss equation is used to constrain neighbouring frames to have similar responses without large changes; it suppresses noise (Noisy) and drifting (Drifting) in the temporal reconstruction equation and the motion masking loss equation. And

the motion masking loss equation:

In the motion masking loss equation, if the motion magnitude falls below ε and the motion pattern stays stable over long time steps, the saliency scores of these non-moving pixels should be lower than those of changing patches (Patch).

A plurality of static obvious object maps from different times are then aggregated (aggregate), and the dynamic obvious object map (Temporal saliency map) is obtained via saliency scoring. Using the loss equation (Loss function), the dynamic obvious object map of the current time point is optimized according to the dynamic obvious object map of the prior time point, to serve as the obvious object prediction result of the panorama image.
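Since the exact loss equations are not reproduced in this text, the three terms can only be illustrated with simple stand-ins that match the descriptions above. This is an assumption-laden sketch: a whole-image integer shift stands in for the per-pixel motion warp, the `temporal_loss` helper name is hypothetical, and the equal weighting of the three terms is not taken from the source:

```python
import numpy as np

def temporal_loss(S_t, S_prev, shift, motion_mag, eps=0.1):
    # reconstruction: the same (warped) pixel should keep a similar score
    warped = np.roll(S_prev, shift, axis=(0, 1))   # toy stand-in for flow warping
    L_recons = float(np.mean((S_t - warped) ** 2))
    # smoothness: neighbouring frames should not change abruptly
    L_smooth = float(np.mean(np.abs(S_t - S_prev)))
    # motion masking: pixels whose motion magnitude is below eps stay non-salient
    static = motion_mag < eps
    L_motion = float(S_t[static].mean()) if static.any() else 0.0
    return L_recons + L_smooth + L_motion          # hypothetical equal weighting

rng = np.random.default_rng(1)
S_t, S_prev = rng.random((8, 8)), rng.random((8, 8))
motion = rng.random((8, 8))
loss = temporal_loss(S_t, S_prev, (1, 0), motion)
print(loss >= 0.0)
```

Even in this toy form, the three terms pull in the directions described: agreement after motion compensation, limited frame-to-frame change, and suppressed saliency on static pixels.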

As shown in Fig. 11, the image feature extracting method of the static model is compared with existing image extraction methods under the VGG-16 and ResNet-50 neural network training processes, and under the dynamic model with the long short-term memory neural network operation layer (LSTM) added. The horizontal axis is the image resolution (from Full HD: 1920 pixel to 4K: 3096 pixel) and the vertical axis is the number of frames displayed per second (FPS).

The compared image analysis methods are as follows.

1. The equirectangular (equidistant cylindrical) projection method (EQUI) takes the panorama directly under the equirectangular projection and generates the feature map from it as input.

2. The cubemap method (Cubemap): the six cube faces used by the static model generate the feature map as input (Our state); however, zero padding (ZP) is used when the operation layers of the neural network control the dimensions through the convolutional and pooling layers, so the zero-padded image boundaries still cause a loss of continuity on the surface of the cube.

3. The overlap method (Overlap) sets up a variant padded cube in which each face spans a 120-degree angle, so that adjacent faces share overlapping image content when generating the feature map; however, zero padding (ZP) is still used when the network operation layers control the dimensions through the convolutional layer (Convolution layer) and pooling layer (Pooling layer), so the zero-padded boundaries still cause a loss of continuity on the surface of the cube.

4. The cubic model of the present invention, where the panorama image is only placed directly into the preprocessing of the cubic model without any adjustment (Our static), and is passed through the convolutional and pooling layers of the neural network operation layers.

5. The image feature extracting method of the present invention (Ours): briefly, the method of the present invention applies the above cube-padding model method 1305 and further, in the cube-padding mode, controls the cube-padded boundaries through the dimensions of the convolutional layer (Convolution layer) and pooling layer (Pooling layer) of the neural network operation layers, so that the surface of the cube has no loss of continuity.

6. The dynamic training process is mainly the image feature extracting method of the present invention (Ours), i.e. the cube-padding model method with the cube-padded boundaries controlled through the convolutional layer (Convolution layer) and pooling layer (Pooling layer), followed by an inserted long short-term memory neural network operation layer (LSTM); the known equirectangular projection method with an LSTM added (EQUI+LSTM) is run for comparison.

Comparing the above image feature extracting methods through the ResNet-50 and VGG-16 neural network training models, the figure clearly shows that as the image resolution rises, the speed of the cube-padding model method 1305 becomes ever closer to that of the cubemap method; moreover, across all static-model tests of image resolution, the cube-padding model method 1305 of the present invention and the overlap method both exceed the equirectangular static-model method.

As shown in Table 1, the six methods of Figs. 12A and 12B above are compared, after saliency-score normalization, against the baseline (Baseline) under the following three obvious object prediction assessment modes; the equirectangular projection method (EQUI), the overlap method (Overlap), and the dynamic training with the long short-term memory neural network operation layer (LSTM) comparison methods are all the same as in the fifth figure.

Obvious object prediction is compared with three area-under-the-curve and correlation metrics. The Judd area-under-the-curve method (AUC-Judd, AUC-J) measures our saliency prediction against human viewpoints by computing true-positive and false-positive rates. The Borji area-under-the-curve method (AUC-Borji, AUC-B) uniformly and randomly samples image pixels against the labeled ground truth, and saliency map values above threshold at the non-fixated pixels are defined as false positives. The linear correlation coefficient (CC) is a distribution-based measure of the linear relationship between a given saliency map and the viewpoints; the coefficient lies between -1 and 1 and indicates the linear relationship between our output values and the ground truth.

Table 1 adds, beyond the methods in Figs. 11A to 11D above, the image feature extracting method of the present invention (Ours): briefly, the cube-padding model method 1305, which controls the cube-padded boundaries through the dimensions of the convolutional layer (Convolution layer) and pooling layer (Pooling layer) of the neural network operation layers, so that the surface of the cube has no loss of continuity.

Saliency scoring is also compared against other existing baselines: motion magnitude (Motion Magnitude), consistent video saliency (ConsistentVideoSal), and saliency GAN (SalGAN).

From the numbers in Table 1 it is evident that the image feature extracting method of the present invention (Ours) obtains the highest score everywhere, except that under ResNet-50 training one score is slightly lower than that of our cubic model alone (Our static); it follows that the present invention again possesses superior performance in saliency scoring.

Table 1

As shown in Figs. 12A to 12B, analyzing the dynamically trained output maps against the practical panorama images, the highlighted regions of the practical heat maps obtained via our method clearly grow; comparing the prior art with the present invention across the equirectangular projection method 1201, the cubic model 1202, the overlap method 1203, and the ground-truth image 1204, the figures show that image feature capture on the feature maps is better optimized.

As shown in Table 2, because whether an image is distorted is finally still judged by the human eye in addition to machine judgment, the cubic model method (Ours statics), the equirectangular projection method (EQUI), the cubemap method (Cubemap), and the ground truth (Ground truth, GT) are compared by score. The scoring method uses human-eye judgment of distortion: an image judged undistorted counts as a win (Win) and a distorted image as a loss (Loss). From the scores it can be determined that the image feature extracting method of the present invention (Ours) 1203 scores above the equirectangular projection method (EQUI), the cubemap method (Cubemap), and the method that uses the cubic model but with zero padding (Ours statics), and that in human-eye judgment the image features of the image feature extracting method 1203 of the present invention approach the practical figure.

Table 2

Taking Figs. 12A and 12B as an example again, the image feature extracting method 1203 is compared against the corresponding practical plan view 1205 and the enlarged practical plan view 1207 of Fig. 12; it is evident that the performance of the image feature extracting method 1203 of the present invention on the heat maps is more significant than that of the other methods.

Taking Figs. 13A and 13B as an example again, on the two panorama image sets Wild-360 1306 and Drone 1307, the equirectangular projection method (EQUI) 1304 and the cube-padding model method (Ours static) 1305 are compared on the feature maps 1301. It is clearly found that the cube-padding model method 1305 performs better in the practical heat maps 1302 and the normal field-of-view maps 1303, and also in image capture across the plan-view frames (Frame) as the time axis (Time) changes.

The image feature extracting method of the present invention (Ours) again applies the above cube-padding model method 1305 and, in the cube-padding mode, controls the cube-padded boundaries through the dimensions of the convolutional layer (Convolution layer) and pooling layer (Pooling layer) of the neural network operation layers, so that the surface of the cube has no loss of continuity. The feature extracting method and obvious object prediction method for panorama images described above can further be applied to intelligent camera-movement editing of panorama images, intelligent monitoring systems, navigation in the robotics field, and artificial-intelligence perception and understanding of wide-angle content, and is not limited merely to the panorama image applications of the previous embodiments.

Although the contents of the present invention have been described in detail through the preferred embodiments above, it should be appreciated that the above description is not to be considered a limitation of the present invention. After those skilled in the art have read the above contents, various modifications and substitutions of the present invention will all be apparent. Therefore, the protection scope of the present invention should be limited to the appended claims.
