Visual localization in images using weakly supervised neural networks

Document No.: 1618497    Publication date: 2020-01-10

Abstract: This technology, "Visual localization in images using weakly supervised neural networks," was created by Kuan-Chuan Peng, Ziyan Wu, and Jan Ernst on 2018-03-16. A system and method for visual anomaly localization in a test image includes generating an attention map for the test image iteratively, at multiple scales, using a trained classifier network with image-level labels. On the condition that forward propagation of the test image through the classifier network detects a first class, a current attention map is generated using an inversion of the classifier network. One or more attention regions of the current attention map may be extracted and resized into sub-images. For each scale iteration, extraction of one or more regions of the current attention map is performed on the condition that the current attention map differs significantly from the previous attention map. Visual localization of regions of the class in the test image is based on one or more of the attention maps.

1. A system for visual localization in a test image, comprising:

at least one storage device for storing computer-executable instructions; and

at least one processor configured to access the at least one storage device and execute the instructions for:

iteratively, at a plurality of scales, generating an attention map for the test image using a trained classifier network, wherein the classifier network is weakly supervised using image-level classification of at least one class, and wherein, on a condition that a first class is detected by forward propagation of the test image through the classifier network, a current attention map is generated using an inversion of the classifier network;

in each scale iteration, extracting one or more regions of the current attention map whose attention values are greater than a threshold;

resizing each of the one or more extracted regions into a sub-image, wherein the size of the sub-image is an incremental enlargement of the extracted region;

wherein, in the event that forward propagation of the sub-images in the classifier network detects the first class, each subsequent attention map of the respective scale iteration is based on an inversion of the classifier network for each sub-image;

wherein for each scale iteration, extraction of one or more regions of the current attention map is performed in the event that the current attention map differs significantly from the previous attention map; and

identifying a visual localization of the first class in the test image based on one or more of the subsequent attention maps.

2. The system of claim 1, further comprising: averaging the attention maps to generate an attention map combination, wherein the attention map combination is used to identify the visual localization of the first class.

3. The system of claim 1, further comprising: terminating the scale iteration on a condition that a difference in pixel-level magnitude between the current attention map and the previous attention map is less than a threshold.

4. The system of claim 1, further comprising: terminating the scale iteration on a condition that the count of the scale iteration is equal to a maximum count.

5. The system of claim 1, wherein the first class is defined as an anomaly class, a second class is defined as a normal class, and the classifier network is configured to detect the anomaly class for a test image including an anomalous region.

6. The system of claim 1, wherein the first class is defined as a first concept class, a second class is defined as a second concept class, and the classifier network is configured to determine the visual localization of the test image on a condition that the first concept class is successfully detected.

7. The system of claim 1, wherein the incremental enlargement is of a variable scale.

8. A method for visual localization in a test image, comprising:

iteratively, at a plurality of scales, generating an attention map for the test image using a trained classifier network, wherein the classifier network is weakly supervised using image-level classification of at least one class, and wherein, on a condition that a first class is detected by forward propagation of the test image through the classifier network, a current attention map is generated using an inversion of the classifier network;

in each scale iteration, extracting one or more regions of the current attention map whose attention values are greater than a threshold;

resizing each of the one or more extracted regions into a sub-image, wherein the size of the sub-image is an incremental enlargement of the extracted region;

wherein each subsequent attention map for each scale iteration is based on an inversion of the classifier network for each sub-image if the forward propagation of the sub-image in the classifier network detects the first class;

wherein for each scale iteration, extraction of one or more regions of the current attention map is performed in the event that the current attention map differs significantly from the previous attention map; and

identifying a visual localization of the first class in the test image based on one or more of the subsequent attention maps.

9. The method of claim 8, further comprising: averaging the attention maps to generate an attention map combination, wherein the attention map combination is used to identify the visual localization of the first class.

10. The method of claim 8, further comprising: terminating the scale iteration on a condition that a difference in pixel-level magnitude between the current attention map and the previous attention map is less than a threshold.

11. The method of claim 8, further comprising: terminating the scale iteration on a condition that the count of the scale iteration is equal to a maximum count.

12. The method of claim 8, wherein the first class is defined as an anomaly class, a second class is defined as a normal class, and the classifier network is configured to detect the anomaly class for a test image including an anomalous region.

13. The method of claim 8, wherein the first class is defined as a first concept class, a second class is defined as a second concept class, and the classifier network is configured to determine the visual localization of the test image on a condition that the first concept class is successfully detected.

14. The method of claim 8, wherein the incremental enlargement is of a variable scale.

Technical Field

The invention relates to artificial intelligence. More particularly, the present invention relates to the application of artificial intelligence to visual recognition systems.

Background

A visual recognition system may apply a machine-learning-based approach, such as a convolutional neural network, which may be trained to identify features or objects of interest in an image according to learned classifications. The classifications may include tangible attributes, such as the presence of a particular animate or inanimate object. For example, the system may be trained to learn one or more classifications (e.g., flower, dog, chair); once trained, it analyzes a series of test images to identify which images include a subject of a trained classification.

Visual localization with machine learning assistance can be applied to detect anomalies within an image or to identify anomalous objects that potentially cause anomalies. These applications are important for both employee safety and quality control in industrial processes. Conventional anomaly detection methods require visual inspection by personnel, either by physical inspection or by viewing images from a camera feed.

Current methods of machine-learning-based visual localization have practical limitations, including the need for dense manual pixel-wise or bounding-box labeling of training images. For example, labeling may include drawing a bounding box around an anomalous object appearing in the image, which is time consuming and not scalable.

Disclosure of Invention

Aspects according to embodiments of the invention include a system for visual localization in a test image, the system comprising: at least one storage device storing computer-executable instructions; and at least one processor configured to access the at least one storage device and execute the instructions to iteratively, at a plurality of scales, generate an attention map of the test image using a trained classifier network, wherein the classifier network is weakly supervised using image-level classification. In the event that a first class is detected by forward propagation of the test image in the classifier network, a current attention map is generated using an inversion of the classifier network. The executed instructions may also, in each scale iteration, extract one or more regions of the current attention map having an attention value greater than a threshold, and may resize each of the one or more extracted regions into a sub-image, wherein the size of the sub-image is an incremental enlargement of the extracted region. Each subsequent attention map for each scale iteration may be based on an inversion of the classifier network for each sub-image, provided that forward propagation of the sub-image in the classifier network detects the first class. For each scale iteration, one or more regions of the current attention map may be extracted on a condition that the current attention map differs significantly from the previous attention map. The executed instructions may also identify a visual localization of the first class in the test image based on one or more of the subsequent attention maps.

Drawings

Non-limiting and non-exhaustive embodiments of the present embodiments are described with reference to the following figures, wherein like reference numerals refer to like elements throughout the various figures unless otherwise specified.

Fig. 1 illustrates a block diagram of an example system for detecting image categories in accordance with one or more embodiments of the present disclosure.

FIG. 2 shows an example of training data input for the system shown in FIG. 1.

Fig. 3 illustrates an attention map of an abnormal region of an image according to one or more embodiments of the present disclosure.

Fig. 4 illustrates a flow diagram of an example process for visual localization using weakly supervised networks in accordance with one or more embodiments of the present disclosure.

Fig. 5 shows an example of an attention map image associated with the visual localization process shown in fig. 4.

FIG. 6 illustrates an example computing environment in which embodiments of the present disclosure may be implemented.

Detailed Description

Aspects of embodiments of the present disclosure include a method of detecting a localized area of one or more objects in an image using a weakly supervised network. A classifier network, such as a convolutional neural network (CNN), may be trained to classify images as including classified objects or features belonging to concept classes. A captured image may be processed by the classifier network to classify the content of the image according to one or more classes. For example, classification may be applied to identify the presence of any anomaly in the image relative to a trained normal state. The anomaly may correspond to a defect in an object depicted in the image, or to an irregularity detected within a normal setting. Without a priori knowledge of what shape or form an anomaly in an image takes, gradient-based inversion, or back-propagation, of the classifier network can be applied to discover the inherent characteristics of the normal and anomalous portions of the image. For example, an attention map may be generated in the form of a grayscale representation of the input image in which attention to suspected anomalies is highlighted. To improve anomaly detection and filter out false detections, each high-response region of interest (e.g., the brightest region of the attention map) may be cropped into a patch, or sub-image, for further processing at an incremental scale. Each sub-image may be scaled to a standard image size and processed by forward propagation of the classifier, as in the first pass over the input image. If the anomaly classification is detected, an inversion of the classifier can be performed to generate an attention map. Further iterations with incremental scaling of the attention regions may be repeated until the difference between successive attention maps is less than a threshold, or until a maximum number of iterations has occurred. 
A compilation of attention maps may be generated, for example by averaging the pixel magnitudes of all the attention maps, which has the effect of reducing the gray-level intensity in areas without anomalies and enhancing the gray-level values in areas with indications of anomaly detection.
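The iterative refinement loop described above can be sketched in outline. This is a minimal illustration, not the patented implementation: `classify`, `invert`, and `extract` are hypothetical stand-ins for the classifier forward pass, its gradient-based inversion, and the attention-region extractor, and both images and attention maps are simplified to flat lists of values.

```python
# Minimal sketch of the multi-scale attention-refinement loop. classify,
# invert, and extract are hypothetical stand-ins supplied by the caller;
# images and attention maps are simplified to flat lists of values.

def refine_attention(image, classify, invert, extract, max_iters=5, eps=1e-3):
    maps = []               # one attention map kept per scale iteration
    regions = [image]       # scale 0 starts from the whole test image
    prev_map = None
    for _ in range(max_iters):
        # Forward-propagate each region; invert only on class detection.
        sub_maps = [invert(r) for r in regions if classify(r)]
        if not sub_maps:
            break           # class no longer detected anywhere
        cur_map = [sum(v) / len(sub_maps) for v in zip(*sub_maps)]
        maps.append(cur_map)
        if prev_map is not None and \
                max(abs(a - b) for a, b in zip(cur_map, prev_map)) < eps:
            break           # maps no longer differ significantly: stop
        prev_map = cur_map
        regions = extract(cur_map)   # crop high-attention sub-images
    return maps
```

The loop terminates either when successive maps agree within `eps` or when `max_iters` scale iterations have run, mirroring the two stopping conditions of claims 3 and 4.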

Fig. 1 is a block diagram depicting image anomaly detection in accordance with one or more example embodiments of the present disclosure. The classifier 110, e.g., a CNN, may have hidden layers 125, which may include one or more of the following layer types: convolutional layers, max-pooling layers, fully connected layers, and softmax layers. The convolutional layers may extract features, such as edges and lines, from the image data. An optional inference module 127 may generate a feature map, which may map one or more features to classes. The feature map may be stored as a look-up table.

During training, the classifier 110 may be trained to classify images based on a comparison of the input images with training data 102, which may be images or patches; that is, it learns to identify objects in a series of input images. The training data 102 may include labels or annotations identifying the classifications to be learned by the classifier 110. The classifier 110 may also be trained to classify whether an image includes an anomalous region. The training data 102 used by the classifier 110 may be defined by patches computed from input images and corresponding ground-truth labels indicating whether each patch contains an anomalous region. Patch-based training can help the classifier 110 learn fine-grained features of anomalous regions by focusing on smaller regions. Patch-based training may also avoid overfitting in cases where the training set is too small to train the model.

For example, the training data 102 may include a set of images known to have image anomalies and annotated as such. A binary label may be defined for an entire training image based on the presence of any anomalous pixels in the image, which can be readily obtained without an extensive labeling effort. Anomalous regions in the training images need not be defined, segmented, or delineated. A suitable positive training image may be defined as having at least one anomalous pixel. Each training image may be labeled or annotated and stored prior to training the classifier 110. For example, a training image with an anomalous region may be annotated as "positive" and a training image without an anomalous region may be annotated as "negative". Because the training data 102 is image-based (i.e., annotations are applied to the entire image) rather than pixel-based, deep learning by the classifier 110 according to embodiments of the present invention is considered weakly supervised. Applying image-based annotations greatly simplifies the training process and requires far less effort.
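The image-level labeling rule described above is simple enough to sketch. This is a minimal illustration, assuming some anomaly indication is available per image (the mask here is a hypothetical stand-in); the point is that only presence or absence is kept as the weak label.

```python
# Hypothetical sketch of weak, image-level labeling: an image is "positive"
# if at least one pixel in its anomaly mask is anomalous, else "negative".
# The per-pixel mask is an illustrative stand-in; only the image-level
# label would be used to train the weakly supervised classifier.

def image_level_label(anomaly_mask):
    """Return 'positive' if any pixel in the 2-D mask is anomalous."""
    return "positive" if any(any(row) for row in anomaly_mask) else "negative"
```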

The classifier 110 may be trained without the benefit of prior data or information (i.e., from scratch) or may be fine-tuned according to the size of the training set in the application domain.

The training of the classifier 110 may apply a cost function that measures how closely the output for a given training sample matches the desired output, where each hidden layer 125 is a function f with weights and biases that performs an operation on one or more inputs. The weights and bias values may be adjusted until, over a range of forward propagations, the output values are acceptably close to the desired values.
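The weight-and-bias adjustment described above can be sketched with plain gradient descent. The patent does not name a specific optimizer, so this is an illustrative assumption, as is the learning rate.

```python
# Hypothetical sketch: one gradient-descent update of a layer's weights and
# bias, assuming gradients of the cost function have already been computed
# by back-propagation. The learning rate lr is illustrative.

def gradient_step(weights, bias, w_grads, b_grad, lr=0.01):
    new_weights = [w - lr * g for w, g in zip(weights, w_grads)]
    new_bias = bias - lr * b_grad
    return new_weights, new_bias
```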

The classifier 110 may be trained in two stages. In an initial training stage, a very large set of images of various objects may be fed to the classifier as training data 102. For example, images from the ImageNet collection may be used to initialize the parameters of the classifier 110. In a refinement stage of the training, specialized images of expected anomalies may be fed to the classifier as training data 102. For example, where a misplaced object that poses a potential safety hazard is a common anomaly in a monitored setting (e.g., a tool left by a worker near a moving machine after a repair event), various images of foreign objects may be used as training data 102 during the refinement stage. As another example, beyond anomaly detection, the training data 102 may be selected to train the classifier to distinguish memorable from non-memorable images.

During normal operation after the classifier 110 has been fully trained, the classifier 110 may process an input image 101 during forward propagation 120, generating a 1-D vector of prediction outputs 140. For example, where the classification is positive P for the presence of an image anomaly or negative N for its absence, there may be two outputs with likelihood values between 0 and 1, such as P = 0.9 and N = 0.1. Here, since P > N, the decision 145 will be defined to indicate "positive", i.e., the input image contains an anomaly. In the example with two outputs 140, the decision 145 is binary, and the 1-D prediction vector includes two entries for the two classes. However, the invention may implement more than two classes, e.g., n classes with n > 2, in which case the 1-D prediction vector contains n values.
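The decision step above can be illustrated with a short sketch: a softmax over final-layer logits produces the 1-D prediction vector of likelihoods, and the decision is the class with the largest likelihood. The logit values and two-class setup are illustrative assumptions.

```python
import math

# Illustrative sketch of the prediction vector and decision 145: a softmax
# over final-layer logits yields likelihoods that sum to 1, and the decision
# is the argmax class. A real classifier would produce the logits itself.

def softmax(logits):
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, classes=("positive", "negative")):
    probs = softmax(logits)                    # the 1-D prediction vector
    return classes[probs.index(max(probs))], probs
```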

In response to the classification decision 145, the classifier 110 may perform back-propagation 130 through each hidden layer 125 in the reverse order of the forward propagation 120. For example, a gradient function may be applied to the values obtained during the forward-propagation function f of a given layer 125. In one embodiment, the following back-propagation operation may be performed on each layer 125 separately:

∇_x f = ∂f/∂x

where ∇_x f is the gradient of the function f with respect to the input x and is determined by the partial derivatives.

After the final operation of back-propagation 130 through the hidden layers 125, a gradient value is determined for each pixel of the image and converted into an attention map 103. For each pixel in the attention map 103, a gray value may be assigned in proportion to the gradient value. The attention map 103 may be defined by a mutual mapping between the forward and backward propagation of the classifier 110. The attention maps 103 include maps produced by back-propagating a positive signal and a negative signal. To generate the positive and negative attention maps, a positive or negative signal for back-propagation may be generated by setting the positive and negative nodes in a fully connected layer of the classifier 110. For example, a positive signal for back-propagation may be generated by setting the positive node to a value of 1 and the negative node to a value of 0. A negative signal for back-propagation may be generated by setting the positive node to a value of 0 and the negative node to a value of 1. Attention maps 103 may thus be generated by two back-propagation passes, one with the positive signal and the second with the negative signal. The positive attention map, corresponding to the positive signal, attempts to encode the locations of anomalous pixels. The negative attention map, corresponding to the negative signal, attempts to encode the locations of normal pixels. Both positive and negative attention maps may include pixels encoded as neither anomalous nor normal, which are considered indeterminate. The negative attention map can be used to improve the confidence of the positive attention map by reducing the area of indeterminate pixels. The attention-map pixels are encoded using a score based on a magnitude-threshold scale. The negative attention map may reconfirm the locations of anomalous pixels by determining that, for each such pixel, there is not a high score on both the positive and negative attention maps.
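The gradient-based inversion above can be sketched on a toy one-layer "classifier" y = Wx. Back-propagating a one-hot signal s gives dL/dx = Wᵀs, whose per-pixel magnitude serves as the attention value; a real CNN would chain this rule through every hidden layer, but the principle is the same. The linear model and signal encoding here are simplifying assumptions for illustration.

```python
# Hypothetical one-layer sketch of gradient-based inversion: for y = W x
# and a one-hot back-propagation signal s, dL/dx_j = sum_i s_i * W_ij.
# The absolute gradient magnitude per input "pixel" is the attention value.

def attention_from_signal(W, signal):
    """Gradient of sum(signal * W x) w.r.t. x, as absolute magnitudes."""
    n_in = len(W[0])
    grad = [0.0] * n_in
    for i, row in enumerate(W):          # accumulate s_i * W_ij into grad_j
        for j, w in enumerate(row):
            grad[j] += signal[i] * w
    return [abs(g) for g in grad]

# Positive attention uses signal [1, 0]; negative attention uses [0, 1],
# matching the node settings described in the text.
```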

FIG. 2 illustrates an example of enhanced training data for the system shown in FIG. 1. In one embodiment, the training data 202 may include data having image-level labels, such as the training data 102 described above, in conjunction with a smaller sample set of image data 203 using bounding-box labels. This additional refinement of the training data may provide a more robust and reliable data set for improving the weakly supervised classifier system of FIG. 1. The image data 203 may be used to create a new class to guide the classifier during training time. The new class may include image patches that exclude the key focus regions of the classifier 110. Such a feature may serve as an additional measure to reduce the likelihood that the classifier 110 detects insignificant regions of the image background under image-level supervision, particularly when noise and background variations are present in the scene.

The image-level classification may consist of n classes. For example, in the basic case of n = 2, the training classes are chosen between two types of images, one belonging to a first class and the other belonging to a second class. Examples may include tangible classes, such as images with faces and images without faces. Among concept classes, examples include a normal class and an anomaly class, where the anomaly class includes an anomalous region. Another example of concept classes includes a memorable class and a non-memorable class, where the memorable class can be defined as images with visually appealing areas or with significant detail.

In one embodiment, the training data 202 may consist of two classes based on the rotational orientation of an object in the image foreground, where the first class may represent images of the object rotated by some degree relative to images of the same object in the second class, which represents a reference baseline orientation. For each class having image-level labels, the training data 102 may include a large number of samples, e.g., about 400 samples. The images selected for the training data 203 with bounding-box labels may consist of a smaller number of samples, for example about 20 images per class, with the bounding boxes enclosing key markers in the images that help detect rotation. Image patches 205 may be cropped from the sample images of each class and labeled with the same image-level class label as the image from which the patch was cropped. Image patches 207 may be cropped from areas not within the bounding boxes and labeled as a new class named "blur". The image patches 205 and 207 may be resized to match the size of the original images. Once all sets of training data images 205 and 207 have been generated and resized, the classifier 110 may be trained by applying training data 102, 205, and 207. The training data 102, 205, and 207 may also be applied as retraining after the classifier 110 has been initially trained (or "pre-trained") on one of the available open-source training data sets (e.g., ImageNet).
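The patch-labeling rule described above — patches inside a bounding box inherit the image-level label, patches outside every box join the new "blur" class — can be sketched as follows. The (x0, y0, x1, y1) box convention and the full-containment test are assumptions made for illustration.

```python
# Hypothetical sketch of the FIG. 2 patch-labeling rule. Boxes are
# (x0, y0, x1, y1); a patch inherits the image-level label only when it is
# fully contained in some labeled bounding box, otherwise it is "blur".

def label_patch(patch_box, bboxes, image_label):
    px0, py0, px1, py1 = patch_box
    for (x0, y0, x1, y1) in bboxes:
        if px0 >= x0 and py0 >= y0 and px1 <= x1 and py1 <= y1:
            return image_label   # patch lies inside a labeled box
    return "blur"                # background patch -> new class
```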

Fig. 3 illustrates an example of an attention map of an anomalous region of an image according to one or more embodiments of the present disclosure. In one embodiment, the trained classifier network 110 may generate an attention map of the anomaly class. The input image 301 may be one of several test images to be visually localized by the classifier network 110 for anomalies. The input image 301 includes a normal region 302 and an anomalous region 303. For this example, the normal region 302 includes the machine in the foreground, and the anomalous region 303 of the image is a misplaced tool on the surface of the machine, which may present a risk of interfering with the safe operation of the machine. Thus, the anomalous region 303 is to be localized by the classifier network 110. The attention map 311, which may be generated by back-propagation of the input image 301 through the classifier 110 of FIG. 1, displays a high-attention area 313 as an indication of the anomalous region through a sharp grayscale contrast or intensity change relative to the rest of the image (e.g., relatively darker or lighter than other areas). Other variations are possible, such as using a color gradient to emphasize the attention area 313. Although an example of visually localizing a misplaced tool is shown and described, other features of the image foreground may be selected for normal and anomalous classification, including but not limited to the rotational orientation of symmetric objects.

Fig. 4 shows a flow diagram of an example of a visual localization process using a weakly supervised network in accordance with one or more embodiments of the present disclosure. As shown, the process is a scale-based set of iterative operations, scale 0 to scale N. At scale 0, a test image, such as input image 401, may be fed to a classifier operation 402, which may include classifier forward propagation 432, a test 442 for successful class detection, and classifier inversion 452. The classifier inversion 452 may generate an attention map 403. An attention extractor 405 may crop one or more sub-images 411 from the attention map 403.

Fig. 5 shows an example of the attention map images associated with the visual localization process shown in Fig. 4. At scale 0, the input image 501 may be processed by the classifier operation 402 to generate an attention map 502 that identifies regions 512, 522, 532 as potential localized regions of the classification, indicated by attention values that exceed a threshold (e.g., pixels having high contrast and/or strong magnitude values). The attention extractor 405 may then perform an extraction operation 503, blocking at 513, to crop the anomalous regions indicated by the strongest attention. The blocking operation 513 may be based on a clustering algorithm, with parameters set so that a clustered region has a minimum number of pixels sufficient for meaningful analysis, and may also impose a maximum limit on region area to cap processing time and resources. The attention extractor 405 may crop each blocked region into an extracted image and resize the extracted image at a variable scale (e.g., to an N × M pixel size, where N and M are parameters of the classifier network) according to a set of parameters for the classifier operation. When processed in subsequent iterations, the resized extracted images effectively enlarge the regions of interest 512, 522, 532 for refining the first attention map. The resized extracted images become the sub-images 511, 521, 531 input to the scale 1 iteration. If subsequent scale iterations 2 through N are performed, the attention areas may be enlarged at each iteration to further refine the attention map until an optimal attention map is reached.
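The extraction step above can be sketched as a threshold-and-cluster pass over the attention map: keep pixels above a threshold, group them into 4-connected clusters, and return the bounding box of each cluster that meets a minimum pixel count. The threshold, minimum count, and 4-connectivity are illustrative choices, not parameters fixed by the source.

```python
from collections import deque

# Illustrative sketch of the attention extractor: threshold the attention
# map, group surviving pixels into 4-connected clusters via BFS, and keep
# the bounding box of each cluster with at least min_pixels pixels.

def extract_regions(att_map, threshold=0.5, min_pixels=2):
    h, w = len(att_map), len(att_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or att_map[sy][sx] <= threshold:
                continue
            q, pixels = deque([(sy, sx)]), []   # BFS over one cluster
            seen[sy][sx] = True
            while q:
                y, x = q.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and att_map[ny][nx] > threshold:
                        seen[ny][nx] = True
                        q.append((ny, nx))
            if len(pixels) >= min_pixels:       # enforce minimum cluster size
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes
```

Each returned box would then be cropped and resized into a sub-image for the next scale iteration.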

Returning to fig. 4, the sub-images 411 may be fed to a classifier operation 412 for the scale 1 iteration. Although not shown in the illustration for simplicity, the classifier operation 412 includes the same operations as described above for the classifier operation 402. The attention map 413 is the output of the classifier operation 412 on a condition that the sought class was successfully detected (in a manner similar to the test 442 of the scale 0 iteration). A limit test 414 may check whether the iterative attention mapping should continue. For example, the limit test 414 may determine whether the attention map 413 differs significantly from the attention map 403, to ensure further refinement through additional iterations with incremental scaling. For example, a comparison of the gray levels of the entire attention map 403 and the entire attention map 413 may indicate a change in pixel-level gray intensity. In case the change value is smaller than a threshold, the process may stop, with the conclusion that an optimal attention map has been obtained. If the change is not less than the threshold, the attention extractor 415 may begin the extraction operation on the attention map 413. As another example, the limit test 414 may check whether a preset maximum iteration count has been reached, which may limit the expenditure of processing resources according to designer preferences. As shown in fig. 4, the scale 1 iteration may conclude by cropping and resizing the attention areas to produce one or more sub-images 421, corresponding to the number of extracted attention areas. Further iterations of scale 2 through scale N may be performed in a manner similar to the scale 1 iteration described above, provided that the limit test does not prompt termination.
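The limit test just described has two branches: a convergence check on the pixel-level change between consecutive maps, and a maximum iteration count. A minimal sketch, with the mean absolute gray-level difference as the change measure and illustrative threshold values:

```python
# Illustrative sketch of limit test 414: stop when the mean absolute
# gray-level change between consecutive attention maps falls below eps,
# or when the iteration count reaches a preset maximum. The choice of
# mean absolute difference and the defaults are assumptions.

def should_stop(cur_map, prev_map, iteration, max_iters=10, eps=0.01):
    if iteration >= max_iters:
        return True          # preset maximum count reached
    if prev_map is None:
        return False         # nothing to compare against yet
    diffs = [abs(c - p) for cr, pr in zip(cur_map, prev_map)
             for c, p in zip(cr, pr)]
    return sum(diffs) / len(diffs) < eps   # maps no longer differ
```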

Returning to fig. 5, the scale 1 iteration is shown for an example operation in which only one sub-image yields a detection of the sought class from forward propagation through the classifier 412. In the case where sub-images 521 and 531 fail to generate a detected classification, only sub-image 511 is processed by the inversion of the classifier to produce an attention map 542 with a refined attention area 552, since attention areas 512 and 532 may have been false alarms caused by background noise. When constructing the attention map 542, the attention area is rescaled from scale 1 back to scale 0 and placed at a location in the map 542 corresponding to its location in the original attention map 502, so that a one-to-one comparison of successive attention maps can be performed. Comparison of the attention map 542 with the previous attention map 502 indicates a significant gray-level change because the regions 512 and 532 have been eliminated. The new attention map 542 may also contain a finer attention area 552 than the previous attention map area 522 as a result of the classifier operating on the enlarged image 511. For example, the attention area 552 may include fewer pixels, where extra pixels of the previous attention area 522 attributable to background noise present in the scale 0 iteration have been eliminated. The attention area 552 may be cropped by the extraction operation at 513 and resized into a sub-image 541. The scale 2 iteration may be performed using the sub-image 541 as the forward-propagation input image of the classifier network. After detection of the class and inversion of the classifier network, an attention map 562 is generated with an attention area 572 that is very similar to the attention area 552 of the previous attention map 542. The iterative process may be terminated on a condition that the gray-level change between attention maps 542 and 562 is below the threshold. 
A combined attention map may be generated by averaging the attention maps 502, 542, and 562 to produce the visual localization of the sought class in the test image 501. Alternatively, the final attention map 562 alone may be used as the best visual localization of the sought class in the test image 501.
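The per-pixel averaging used to combine the kept attention maps can be sketched directly; regions that persist across iterations retain their intensity, while regions that appeared in only one map are diluted.

```python
# Sketch of the attention-map combination step: a per-pixel average of the
# maps kept across scale iterations (all maps assumed already rescaled to
# a common size, as described for the construction of map 542).

def combine_attention_maps(maps):
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / n for x in range(w)]
            for y in range(h)]
```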

An advantage of iterations with incremental scales 0 through N is that the resized sub-image may be analyzed by classifier inversion at a finer resolution than the previous scale, so that the resulting attention map may reveal one or more previous attention regions as erroneous, as was the case for regions 512 and 532 of attention map 502 that were eliminated in the scale 1 iteration.

FIG. 6 illustrates an example computing environment 700 in which embodiments of the present disclosure may be implemented. Computers and computing environments, such as computer system 710 and computing environment 700, are known to those skilled in the art and are therefore described briefly herein.

As shown in FIG. 6, computer system 710 may include a communication mechanism such as a system bus 721 or other communication mechanism for communicating information within computer system 710. The computer system 710 also includes one or more processors 720 coupled with the system bus 721 for processing information.

Processor 720 may include one or more Central Processing Units (CPUs), Graphics Processing Units (GPUs), or any other processor known in the art. More generally, the processors described herein are devices for executing machine-readable instructions stored on a computer-readable medium for performing tasks, and may include any one or a combination of hardware and firmware. The processor may also include a memory storing machine-readable instructions executable to perform tasks. The processor acts upon the information by manipulating, analyzing, modifying, converting, or transmitting the information for use by the executable program or the information device, and/or by routing the information to an output device. A processor may use or include the capabilities of, for example, a computer, controller or microprocessor, and may be conditioned using executable instructions to perform specialized functions that are not performed by a general purpose computer. The processor may include any type of suitable processing unit, including but not limited to a central processing unit, microprocessor, Reduced Instruction Set Computer (RISC) microprocessor, Complex Instruction Set Computer (CISC) microprocessor, microcontroller, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), System On Chip (SOC), Digital Signal Processor (DSP), or the like. Further, processor 720 may have any suitable micro-architectural design, including any number of component elements, such as registers, multiplexers, arithmetic logic units, cache controllers to control read/write operations to cache memory, branch predictors, and the like. The micro-architectural design of a processor can support any of a variety of instruction sets. The processor may be coupled (electrically coupled and/or include executable components) with any other processor capable of interaction and/or communication therebetween. 
The user interface processor or generator is a known element comprising electronic circuitry or software or a combination of both for generating a display image or part thereof. The user interface includes one or more display images that enable a user to interact with the processor or other device.

The system bus 721 may include at least one of a system bus, a memory bus, an address bus, or a message bus, and may allow information (e.g., data (including computer-executable code), signaling, etc.) to be exchanged between the various components of the computer system 710. The system bus 721 may include, but is not limited to, a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and the like. The system bus 721 may be associated with any suitable bus architecture, including but not limited to Industry Standard Architecture (ISA), Micro Channel Architecture (MCA), Enhanced ISA (EISA), Video Electronics Standards Association (VESA), Accelerated Graphics Port (AGP), Peripheral Component Interconnect (PCI), PCI-Express, Personal Computer Memory Card International Association (PCMCIA), Universal Serial Bus (USB), and the like.

With continued reference to FIG. 6, the computer system 710 may also include a system memory 730 coupled to the system bus 721 for storing information and instructions to be executed by the processor 720. The system memory 730 may include computer-readable storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) 731 and/or Random Access Memory (RAM) 732. The RAM 732 may include other dynamic storage devices (e.g., dynamic RAM, static RAM, and synchronous DRAM). The ROM 731 can include other static storage devices (e.g., programmable ROM, erasable PROM, and electrically erasable PROM). In addition, system memory 730 may be used to store temporary variables or other intermediate information during execution of instructions by processor 720. A basic input/output system 733 (BIOS), containing the basic routines that help to transfer information between elements within computer system 710, such as during start-up, may be stored in ROM 731. RAM 732 may contain data and/or program modules that are immediately accessible to and/or presently being operated on by processor 720. The system memory 730 may also include, for example, an operating system 734, application programs 735, and other program modules 736.

An operating system 734 may be loaded into memory 730 and may provide an interface between other application software executing on computer system 710 and the hardware resources of computer system 710. More specifically, operating system 734 may include a set of computer-executable instructions for managing the hardware resources of computer system 710 and for providing common services to other applications (e.g., managing memory allocation among various applications). In some example embodiments, operating system 734 may control the execution of one or more program modules depicted as stored in data storage 740. Operating system 734 may include any operating system now known or that may be developed in the future, including but not limited to any server operating system, any host operating system, or any other proprietary or non-proprietary operating system.

The application 735 may be a set of computer-executable instructions for performing a visual positioning process in accordance with embodiments of the present disclosure.

The computer system 710 may also include a disk/media controller 743 coupled to the system bus 721 to control one or more storage devices for storing information and instructions, such as a hard disk 741 and/or a removable media drive 742 (e.g., a floppy disk drive, an optical disk drive, a tape drive, a flash drive, and/or a solid state drive). The storage 740 may be added to the computer system 710 using an appropriate device interface (e.g., Small Computer System Interface (SCSI), Integrated Device Electronics (IDE), Universal Serial Bus (USB), or FireWire). The storage 741, 742 may be external to the computer system 710 and may be used to store image processing data according to embodiments of the present disclosure, such as input image data 101, training data 102, attention map 103, output 140, and decision data 145 described with respect to fig. 1, training data 202 shown and described with respect to fig. 2, attention maps 403, 413, 423, input image 401, sub-images 411, 421, and attention map combination 405 shown in fig. 4.

The computer system 710 may also include a display controller 765 coupled to the system bus 721 to control a display or monitor 766, such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), for displaying information to a computer user. The computer system includes a user input interface 760 and one or more input devices, such as a user terminal 761, which may include a keyboard, touch screen, tablet and/or pointing device, for interacting with a computer user and providing information to the processor 720. The display 766 may provide a touch screen interface that allows input to supplement or replace the communication of direction information and command selections by the user terminal device 761.

Computer system 710 may perform a portion or all of the processing steps of embodiments of the invention in response to processor 720 executing one or more sequences of one or more instructions contained in a memory, such as system memory 730. Such instructions may be read into the system memory 730 from another computer-readable medium, such as a magnetic hard disk 741 or a removable media drive 742. Hard disk 741 may contain one or more data stores and data files used by embodiments of the present invention. Data stores may include, but are not limited to, databases (e.g., relational, object-oriented, etc.), file systems, flat files, distributed data stores where data is stored on more than one node of a computer network, peer-to-peer network data stores, and the like. The data store may store various types of data, such as layer 125 of classifier network 110 shown in FIG. 1. The data storage content and data files may be encrypted to improve security. Processor 720 may also be used in a multi-processing arrangement to execute one or more sequences of instructions contained in system memory 730. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, implementations are not limited to any specific combination of hardware circuitry and software.

As mentioned above, computer system 710 may include at least one computer-readable medium or memory for holding instructions programmed according to embodiments of the invention and for containing data structures, tables, records, or other data described herein. The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 720 for execution. A computer-readable medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-limiting examples of non-volatile media include optical disks, solid state drives, magnetic disks, and magneto-optical disks, such as the magnetic hard disk 741 or the removable media drive 742. Non-limiting examples of volatile media include dynamic memory, such as system memory 730. Non-limiting examples of transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise system bus 721. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.

The computer-readable medium instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit comprising, for example, a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute computer-readable program instructions by personalizing the electronic circuit with state information of the computer-readable program instructions to perform aspects of the present invention.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable medium instructions.

The computing environment 700 may also include a computer system 710 operating in a networked environment, using logical connections to one or more remote computers, such as a remote computing device 780, and to one or more visual detection devices 781, such as cameras that may detect RGB, infrared, or depth (e.g., stereo cameras), which capture the input images 101, 401, 501. The network interface 770 may enable communication with other remote devices 780 or systems and/or storage 741, 742, e.g., via a network 771. The remote computing device 780 may be a personal computer (laptop or desktop), a mobile device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 710. When used in a networked environment, the computer system 710 may include a modem 772 for establishing communications over the network 771, such as the Internet. The modem 772 can be connected to the system bus 721 via the user network interface 770 or via another appropriate mechanism.

Network 771 may be any network or system known in the art including the internet, an intranet, a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network (MAN), a direct connection or series of connections, a cellular telephone network, or any other network or medium capable of facilitating communication between computer system 710 and other computers, such as remote computing device 780. The network 771 may be wired, wireless, or a combination thereof. The wired connection may be implemented using Ethernet, Universal Serial Bus (USB), RJ-6, or any other wired connection known in the art. The wireless connection may be implemented using Wi-Fi, WiMAX, Bluetooth, infrared, cellular networks, satellite, or any other wireless connection method known in the art. Additionally, several networks may work separately or in communication with each other to facilitate communications in the network 771.

It should be appreciated that the program modules, applications, computer-executable instructions, code, etc., depicted in FIG. 6 as being stored in system memory 730 are merely illustrative and not exhaustive, and that the processing described as supported by any particular module may alternatively be distributed over multiple modules or executed by different modules. In addition, various program modules, scripts, plug-ins, Application Programming Interfaces (APIs), or any other suitable computer-executable code hosted locally on computer system 710, remote device 780, and/or hosted on other computing devices accessible via one or more networks 771 may be provided to support the functionality provided by the program modules, application programs, or computer-executable code depicted in fig. 6 and/or additional or alternative functionality. Further, the functionality may be variously modularized such that processes described as being commonly supported by a collection of program modules depicted in FIG. 6 may be performed by a fewer or greater number of modules, or the functionality described as being supported by any particular module may be supported, at least in part, by another module. Further, program modules that support the functionality described herein may form part of one or more application programs executable on any number of systems or devices according to any suitable computing model, such as a client-server model, a peer-to-peer model, and so forth. Additionally, any of the functions described as being supported by any of the program modules depicted in fig. 6 may be implemented at least partially in hardware and/or firmware on any number of devices.

As used herein, an executable application includes code or machine-readable instructions for adjusting a processor to implement a predetermined function, such as that of an operating system, context data acquisition system, or other information processing system, for example, in response to a user command or input. An executable procedure is a piece of code or machine readable instruction, sub-routine, or other distinct portion of code or portion of an executable application for performing one or more particular processes. These processes may include receiving input data and/or parameters; performing an operation on the received input data and/or performing a function in response to the received input parameters; and providing resulting output data and/or parameters.

A Graphical User Interface (GUI) as used herein includes one or more display images generated by a display processor and enables a user to interact with the processor or other device and associated data acquisition and processing functions. The GUI also includes an executable program or executable application. The executable program or executable application program conditions the display processor to generate a signal representing the GUI display image. These signals are provided to a display device that displays the image viewed by the user. The processor, under the control of an executable program or executable application program, manipulates the GUI display image in response to signals received from the input device. In this way, a user may interact with the display image using the input device, enabling the user to interact with the processor or other device.

The functions and process steps described herein may be performed automatically or in whole or in part in response to user commands. The automatically performed activity (including the steps) is performed in response to one or more executable instructions or device operations without the user directly initiating the activity.

The systems and processes in the drawings are not exclusive. Other systems, processes and menus can be derived to achieve the same objectives in accordance with the principles of the present invention. Although the present invention has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art without departing from the scope of the invention. As described herein, various systems, subsystems, agents, managers and processes may be implemented using hardware components, software components and/or combinations thereof. Unless an element is explicitly recited using the phrase "means for," it is not to be construed as a means-plus-function element.
