Object detection method, device, equipment and computer readable storage medium

Document No.: 1832039 · Publication date: 2021-11-12

Note: This technology, "Object detection method, device, equipment and computer readable storage medium", was designed and created by Bao Liqiang, Wu Wenlong and Zhang Tianliang on 2021-10-15. Its main content is as follows: the embodiments of the present application disclose an object detection method, an object detection device, object detection equipment and a computer-readable storage medium. The method comprises: traversing each view image in a multi-view image set of an object to be detected; if an object element to be detected is detected in the currently traversed current view image, taking the detected object element as a candidate object element of the object to be detected, and determining the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image; retrieving the candidate object element in an associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image; and generating target element information of the candidate object element according to the retrieval result and the initial element information, so as to detect the authenticity of the candidate object element, which improves the accuracy of object detection.

1. An object detection method, characterized in that the method comprises:

traversing each view image in a multi-view image set of an object to be detected, wherein different view images are obtained by photographing the object to be detected from different shooting angles;

if an object element to be detected is detected in the currently traversed current view image, taking the detected object element as a candidate object element of the object to be detected, and determining the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image;

acquiring an associated view image of the current view image from the remaining view images except the current view image in the multi-view image set, wherein a projection relation exists between the current view image and the associated view image;

retrieving the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image;

and generating target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detecting the authenticity of the candidate object element based on the target element information.

2. The method of claim 1, wherein the projection relationship between the current view image and the associated view image is one of the projection relationships in a multi-view joint modeling result of the multi-view image set;

and any projection relationship in the multi-view joint modeling result is constructed as follows:

selecting a view image from the multi-view image set of the object to be detected as a base view image, and selecting, from the multi-view image set, a view image having an overlapping area with the base view image as a reference view image;

determining a plurality of base calibration points in the base view image and a plurality of reference calibration points in the reference view image, wherein one base calibration point corresponds to one reference calibration point, and a calibration point refers to a point obtained by calibrating the position of a feature point of the object to be detected in a view image;

and calculating the projection relationship between the base view image and the reference view image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point.

3. The method according to claim 2, wherein the calculating the projection relationship between the base view image and the reference view image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point comprises:

obtaining a perspective transformation matrix to be solved, wherein the perspective transformation matrix to be solved comprises a plurality of parameters to be solved;

projecting the position coordinates of each base calibration point from the two-dimensional space where the base view image is located into three-dimensional space by using the perspective transformation matrix to be solved, so as to obtain the three-dimensional space coordinates of each base calibration point, wherein the three-dimensional space coordinates of each base calibration point comprise the plurality of parameters;

projecting the three-dimensional space coordinates of each base calibration point into the two-dimensional space where the reference view image is located to obtain the projection coordinates of each base calibration point, wherein the projection coordinates of each base calibration point comprise the plurality of parameters;

solving the value of each parameter in the perspective transformation matrix under the constraint that the projection coordinates of each base calibration point are equal to the position coordinates of the corresponding reference calibration point, so as to obtain the solved perspective transformation matrix;

and adopting the solved perspective transformation matrix as the projection relationship between the base view image and the reference view image.

4. The method of claim 1, wherein the retrieving the candidate object element in the associated view image based on a projection relationship between the current view image and the associated view image and a position of the candidate object element in the current view image comprises:

based on the projection relation between the current view image and the associated view image, projecting the position of the candidate object element in the current view image into the associated view image to obtain a projection position;

if K object elements are detected in the associated view image, determining the position of each of the K object elements in the associated view image, wherein K is a positive integer;

calculating a position matching degree between the position of each object element in the associated view image and the projection position, and searching the calculated position matching degrees for a position matching degree greater than a matching degree threshold;

if the retrieval succeeds, determining that the candidate object element is retrieved in the associated view image; and if the retrieval fails, determining that the candidate object element is not retrieved in the associated view image.

5. The method of claim 4, wherein the position of each object element in the associated view image is identified using a labeling box, and the projection position is identified using a projection box; and the calculating the position matching degree between the position of each object element in the associated view image and the projection position comprises:

calculating the intersection-over-union (IoU) between the position of the k-th object element in the associated view image and the projection position, wherein k ∈ [1, K];

and determining the calculated intersection-over-union as the position matching degree between the position of the k-th object element in the associated view image and the projection position.

6. The method of claim 1, wherein the initial element information includes a confidence of the candidate object element, and wherein generating target element information for the candidate object element based on the search result and the initial element information for the candidate object element comprises:

if the retrieval result indicates that the candidate object element is not retrieved in the associated view image, performing reduction processing on the confidence in the initial element information;

and taking the object element information obtained after the reduction processing as the target element information of the candidate object element.

7. The method of claim 1, wherein generating target element information for the candidate object element based on the search result and the initial element information for the candidate object element comprises:

if the retrieval result indicates that the candidate object element is retrieved in the associated view image, taking object element information of the candidate object element detected in the associated view image as associated element information;

and carrying out information fusion on the initial element information and the associated element information of the candidate object element to obtain target element information of the candidate object element.

8. The method of claim 7, wherein the initial element information includes a confidence of the candidate object element and the associated element information includes a confidence of the candidate object element; and the information fusion of the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element includes:

acquiring a first weight of the initial element information and a second weight of the associated element information;

weighting and summing the confidence in the initial element information and the confidence in the associated element information by adopting the first weight and the second weight to obtain a target confidence;

adding the target confidence to target element information of the candidate object element.

9. The method according to claim 7 or 8, wherein the initial element information includes an element category of the candidate object element, the number of associated view images is N, the associated element information of each associated view image includes the element category of the candidate object element, N is an integer greater than 1;

the information fusion of the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element includes:

if the element category in the initial element information is the same as the element category in each associated element information, taking the element category as a target element category, and adding the target element category to the target element information of the candidate object element;

if the element category in at least one piece of associated element information is different from the element category in the initial element information, counting the number of occurrences of each element category, determining the element category with the largest number as the target element category, and adding the target element category to the target element information of the candidate object element.

10. The method of claim 7, wherein the initial element information comprises a confidence of the candidate object element and an element category of the candidate object element, the associated element information comprises an element category of the candidate object element, and the information fusing of the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element comprises:

if the element category in the initial element information is the same as the element category in the associated element information, increasing the confidence in the initial element information, and taking the element information obtained after the increase as the target element information of the candidate object element;

if the element category in the initial element information is different from the element category in the associated element information, reducing the confidence in the initial element information, and taking the element information obtained after the reduction as the target element information of the candidate object element.

11. The method of claim 7, wherein the number of the associated view images is N, N is an integer greater than 1, and one associated view image corresponds to one associated element information;

the information fusion of the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element includes:

voting on the initial element information and the N pieces of associated element information of the candidate object element to obtain the number of support votes of the initial element information and the number of support votes of each piece of associated element information;

and selecting, from the initial element information and the N pieces of associated element information, the object element information with the largest number of support votes as the target element information of the candidate object element.

12. The method of claim 1, wherein said detecting authenticity of said candidate object element based on said target element information comprises:

if the confidence in the target element information is greater than or equal to a confidence threshold, judging that the candidate object element has authenticity;

and if the confidence in the target element information is smaller than the confidence threshold, judging that the candidate object element does not have authenticity.

13. The method of claim 1, wherein the method further comprises:

obtaining authenticity detection results of H candidate object elements detected from all view images after all view images in the multi-view image set are traversed, wherein H is a positive integer;

and judging the object detection result of the object to be detected according to the authenticity detection results of the H candidate object elements, wherein the object detection result is used for indicating whether the object to be detected passes the detection.

14. An object detection device, characterized by comprising:

a processing unit, configured to traverse each view image in the multi-view image set of the object to be detected, wherein different view images are obtained by photographing the object to be detected from different shooting angles; and, if an object element to be detected is detected in the currently traversed current view image, to take the detected object element as a candidate object element of the object to be detected, and determine the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image;

an obtaining unit, configured to obtain an associated view image of the current view image from remaining view images, other than the current view image, in the multi-view image set, where a projection relationship exists between the current view image and the associated view image;

the processing unit is further configured to retrieve the candidate object element from the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image; and to generate target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detect the authenticity of the candidate object element based on the target element information.

15. Object detection equipment, characterized by comprising: a storage device and a processor;

the storage device stores a computer program therein;

the processor executes the computer program to implement the object detection method of any one of claims 1-13.

16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the object detection method of any one of claims 1-13.

17. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements an object detection method as claimed in any one of claims 1 to 13.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for object detection.

Background

In daily life, there are many scenarios of object element detection, such as defect detection, lesion detection, motion detection, and the like. Taking defect detection as an example: in the industrial production process, some products (such as parts) develop defects for various reasons (such as misoperation, collision and scratching), and defective products often cannot be put into normal use. Therefore, after a product is produced, it is usually necessary to perform defect detection on the product to ensure that the product is fit for use. In practice, it has been found that in most object element detection scenarios, object element detection is mainly performed on the object to be detected in a manual manner, and such a detection manner may result in low accuracy.

Disclosure of Invention

The embodiments of the present application provide an object detection method, device and equipment and a computer-readable storage medium, which can improve the accuracy of object detection.

In one aspect, an embodiment of the present application provides an object detection method, including:

traversing each view image in a multi-view image set of an object to be detected, wherein different view images are obtained by photographing the object to be detected from different shooting angles;

if an object element to be detected is detected in the currently traversed current view image, the detected object element is taken as a candidate object element of the object to be detected, the position of the candidate object element in the current view image is determined, and the initial element information of the candidate object element detected from the current view image is determined;

acquiring an associated view image of the current view image from the remaining view images, other than the current view image, in the multi-view image set, wherein a projection relationship exists between the current view image and the associated view image;

retrieving the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image;

and generating target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detecting the authenticity of the candidate object element based on the target element information.

In one aspect, an embodiment of the present application provides an object detection apparatus, including:

the processing unit is used for traversing each view image in the multi-view image set of the object to be detected, wherein different view images are obtained by photographing the object to be detected from different shooting angles; and is further used for, if an object element to be detected is detected in the currently traversed current view image, taking the detected object element as a candidate object element of the object to be detected, and determining the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image;

an acquisition unit, configured to acquire an associated view image of the current view image from the remaining view images, other than the current view image, in the multi-view image set, wherein a projection relationship exists between the current view image and the associated view image;

the processing unit is further used for retrieving the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image; and for generating target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detecting the authenticity of the candidate object element based on the target element information.

In one embodiment, the projection relationship between the current view image and the associated view image is one of the projection relationships in the multi-view joint modeling result of the multi-view image set;

and any projection relationship in the multi-view joint modeling result is constructed as follows:

selecting a view image from the multi-view image set of the object to be detected as a base view image, and selecting, from the multi-view image set, a view image having an overlapping area with the base view image as a reference view image;

determining a plurality of base calibration points in the base view image and a plurality of reference calibration points in the reference view image, wherein one base calibration point corresponds to one reference calibration point, and a calibration point refers to a point obtained by calibrating the position of a feature point of the object to be detected in a view image;

and calculating the projection relationship between the base view image and the reference view image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point.

In an embodiment, the processing unit is configured to calculate the projection relationship between the base view image and the reference view image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point, and is specifically configured to:

acquiring a perspective transformation matrix to be solved, wherein the perspective transformation matrix to be solved comprises a plurality of parameters to be solved;

projecting the position coordinates of each base calibration point from the two-dimensional space where the base view image is located into three-dimensional space by using the perspective transformation matrix to be solved, so as to obtain the three-dimensional space coordinates of each base calibration point, wherein the three-dimensional space coordinates of each base calibration point comprise the plurality of parameters;

projecting the three-dimensional space coordinates of each base calibration point into the two-dimensional space where the reference view image is located to obtain the projection coordinates of each base calibration point, wherein the projection coordinates of each base calibration point comprise the plurality of parameters;

solving the value of each parameter in the perspective transformation matrix under the constraint that the projection coordinates of each base calibration point are equal to the position coordinates of the corresponding reference calibration point, so as to obtain the solved perspective transformation matrix;

and adopting the solved perspective transformation matrix as the projection relationship between the base view image and the reference view image.
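As an illustrative sketch (not part of the claimed method), this construction corresponds to fitting a 3x3 homography to the calibration-point pairs, which OpenCV's findHomography performs directly; the point coordinates below are hypothetical:

```python
import cv2
import numpy as np

# Hypothetical calibration-point pairs: positions of the same feature points
# of the object, marked in the base view image and in the reference view
# image (at least 4 non-collinear pairs are needed).
base_points = np.array([[100, 120], [420, 110], [430, 380], [90, 400]], dtype=np.float32)
reference_points = np.array([[80, 100], [400, 95], [415, 370], [70, 385]], dtype=np.float32)

# Solve the 3x3 perspective transformation matrix H such that projecting each
# base calibration point with H reproduces the corresponding reference point.
H, _ = cv2.findHomography(base_points, reference_points, method=0)

def project(point, H):
    """Project a 2D point from the base view into the reference view via H."""
    p = H @ np.array([point[0], point[1], 1.0])  # lift to homogeneous (3D) coordinates
    return p[:2] / p[2]                          # perspective division back to 2D

print(project((100, 120), H))  # approximately [80, 100]
```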

In an embodiment, the processing unit is configured to retrieve the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image, and is specifically configured to:

based on the projection relationship between the current view image and the associated view image, projecting the position of the candidate object element in the current view image into the associated view image to obtain a projection position;

if K object elements are detected in the associated view image, determining the position of each object element in the K object elements in the associated view image, wherein K is a positive integer;

calculating a position matching degree between the position of each object element in the associated view image and the projection position, and searching the calculated position matching degrees for a position matching degree greater than a matching degree threshold;

if the retrieval succeeds, determining that the candidate object element is retrieved in the associated view image; and if the retrieval fails, determining that the candidate object element is not retrieved in the associated view image.

In one embodiment, the position of each object element in the associated view image is identified by a labeling box, and the projection position is identified by a projection box;

the processing unit is configured to calculate a position matching degree between a position of each object element in the associated view image and the projection position, and specifically configured to:

calculating the intersection-over-union (IoU) between the position of the k-th object element in the associated view image and the projection position, wherein k ∈ [1, K];

and determining the calculated intersection-over-union as the position matching degree between the position of the k-th object element in the associated view image and the projection position.
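A minimal sketch of this matching step, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the threshold value is illustrative:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def candidate_retrieved(projection_box, detected_boxes, threshold=0.5):
    """True if any of the K labeling boxes detected in the associated view
    image matches the projected position of the candidate object element."""
    return any(iou(projection_box, box) > threshold for box in detected_boxes)
```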

In an embodiment, the initial element information includes a confidence of the candidate object element, and the processing unit is configured to generate target element information of the candidate object element according to the search result and the initial element information of the candidate object element, and is specifically configured to:

if the retrieval result indicates that the candidate object element is not retrieved in the associated view image, reducing the confidence coefficient in the initial element information;

and taking the object element information obtained after the reduction processing as target element information of the candidate object element.

In one embodiment, the processing unit is configured to generate target element information of the candidate object element according to the search result and initial element information of the candidate object element, and is specifically configured to:

if the retrieval result indicates that the candidate object element is retrieved in the associated view image, taking object element information of the candidate object element detected in the associated view image as associated element information;

and carrying out information fusion on the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element.

In one embodiment, the initial element information includes a confidence of the candidate object element, and the associated element information includes a confidence of the candidate object element;

the processing unit is configured to perform information fusion on the initial element information and the associated element information of the candidate object element to obtain target element information of the candidate object element, and specifically configured to:

acquiring a first weight of the initial element information and a second weight of the associated element information;

weighting and summing the confidence in the initial element information and the confidence in the associated element information by adopting the first weight and the second weight to obtain a target confidence;

adding the target confidence to the target element information of the candidate object element.
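A minimal sketch of this weighted fusion; the weight values are illustrative, since the description leaves them unspecified:

```python
def fuse_confidence(initial_conf, associated_conf, w_initial=0.6, w_associated=0.4):
    """Weighted sum of the confidence from the current view (first weight)
    and the confidence from the associated view (second weight)."""
    return w_initial * initial_conf + w_associated * associated_conf

target_confidence = fuse_confidence(0.72, 0.85)  # 0.6*0.72 + 0.4*0.85 = 0.772
```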

In one embodiment, the initial element information includes element categories of the candidate object elements, the number of the associated view images is N, the associated element information of each associated view image includes the element categories of the candidate object elements, and N is an integer greater than 1;

the processing unit is configured to perform information fusion on the initial element information and the associated element information of the candidate object element to obtain target element information of the candidate object element, and specifically configured to:

if the element category in the initial element information is the same as the element category in each associated element information, taking the element category as a target element category and adding the target element category to the target element information of the candidate object element;

if the element category in at least one piece of associated element information is different from the element category in the initial element information, counting the number of occurrences of each element category, determining the element category with the largest number as the target element category, and adding the target element category to the target element information of the candidate object element.
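A minimal sketch of this category fusion; ties are broken in favor of the category seen first, which is an assumption since the description does not fix tie-breaking:

```python
from collections import Counter

def fuse_category(initial_category, associated_categories):
    """Keep the category if all views agree; otherwise take the category
    reported by the largest number of views."""
    votes = Counter([initial_category] + list(associated_categories))
    return votes.most_common(1)[0][0]

print(fuse_category("scratch", ["scratch", "dent", "scratch"]))  # scratch
```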

In one embodiment, the initial element information includes a confidence of the candidate object element and an element category of the candidate object element, and the associated element information includes an element category of the candidate object element;

the processing unit is configured to perform information fusion on the initial element information and the associated element information of the candidate object element to obtain target element information of the candidate object element, and specifically configured to:

if the element category in the initial element information is the same as the element category in the associated element information, increasing the confidence in the initial element information, and taking the element information obtained after the increase as the target element information of the candidate object element;

and if the element category in the initial element information is different from the element category in the associated element information, reducing the confidence in the initial element information, and taking the element information obtained after the reduction as the target element information of the candidate object element.
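A minimal sketch of this agreement-based adjustment; the scaling factor is illustrative, as the description does not fix the amount of increase or reduction:

```python
def adjust_confidence(initial_conf, categories_match, factor=1.2):
    """Increase the confidence when the current view and the associated view
    report the same element category; reduce it when they disagree."""
    adjusted = initial_conf * factor if categories_match else initial_conf / factor
    return min(adjusted, 1.0)  # keep the confidence in [0, 1]
```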

In one embodiment, the number of the associated view images is N, where N is an integer greater than 1, and one associated view image corresponds to one associated element information;

the processing unit is configured to perform information fusion on the initial element information and the associated element information of the candidate object element to obtain target element information of the candidate object element, and specifically configured to:

voting initial element information and N pieces of associated element information of the candidate object element to obtain the number of support votes of the initial element information and the number of support votes of each piece of associated element information;

and selecting, from the initial element information and the N pieces of associated element information, the object element information with the largest number of support votes as the target element information of the candidate object element.
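A minimal sketch of this voting scheme, assuming each element-information record is a dict with a "category" key and that a record supports another when they report the same category; the support criterion is an assumption, since the description does not fix one:

```python
from collections import Counter

def vote_element_info(initial_info, associated_infos):
    """Select, among the N+1 element-information records, the one whose
    element category gathered the largest number of support votes."""
    records = [initial_info] + list(associated_infos)
    votes = Counter(record["category"] for record in records)
    return max(records, key=lambda record: votes[record["category"]])
```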

In an embodiment, the processing unit is configured to detect authenticity of the candidate object element based on the target element information, and in particular to:

if the confidence coefficient in the target element information is greater than or equal to the confidence coefficient threshold value, judging that the candidate object element has authenticity;

and if the confidence coefficient in the target element information is smaller than the confidence coefficient threshold value, judging that the candidate object element does not have authenticity.

In one embodiment, the processing unit is further configured to:

after all view images in the multi-view image set are traversed, authenticity detection results of H candidate object elements detected from all view images are obtained, wherein H is a positive integer;

and judging the object detection result of the object to be detected according to the authenticity detection results of the H candidate object elements, wherein the object detection result is used for indicating whether the object to be detected passes the detection.
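A minimal sketch of this final judgment, assuming the authenticity detection results are booleans; the tolerance of 2 mirrors the workpiece example given later in this description and is otherwise illustrative:

```python
def object_detection_result(authenticity_results, tolerance=2):
    """The object to be detected passes the detection when the number of
    candidate object elements judged authentic does not exceed the tolerance."""
    real_count = sum(1 for is_real in authenticity_results if is_real)
    return real_count <= tolerance  # True: passes; False: fails
```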

Accordingly, the present application provides an object detection device, the device comprising:

a processor for loading and executing a computer program;

a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the object detection method described above.

Accordingly, the present application provides a computer readable storage medium having stored thereon a computer program adapted to be loaded by a processor and to execute the above object detection method.

Accordingly, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the object detection method.

According to the embodiments of the present application, when an object element to be detected is detected in the currently traversed current view image, the detected object element can be taken as a candidate object element of the object to be detected, and the candidate object element is retrieved in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image; target element information of the candidate object element is then generated according to the retrieval result and the initial element information of the candidate object element detected from the current view image, and the authenticity of the candidate object element is detected based on the target element information. This joint detection of the candidate object element in the current view image through the associated view images can accurately judge whether the candidate object element is a real object element or a falsely detected one, so that object element detection errors can be effectively reduced, the detection accuracy of object elements is improved, and the accuracy of object detection is further improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1a is a schematic view of a scene of object detection provided in an embodiment of the present application;

FIG. 1b is a flowchart of an object detection scheme provided by an embodiment of the present application;

fig. 1c is a schematic flowchart of workpiece inspection according to an embodiment of the present disclosure;

fig. 2 is a flowchart of an object detection method according to an embodiment of the present application;

fig. 3 is a flowchart of another object detection method provided in the embodiment of the present application;

fig. 4a is a schematic diagram of a projection relationship provided in an embodiment of the present application;

fig. 4b is a schematic diagram of a projection relationship of images at various viewing angles according to an embodiment of the present disclosure;

fig. 4c is a schematic projection diagram according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present disclosure;

fig. 6 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The following describes various terms related to the embodiments of the present application:

artificial Intelligence (AI): AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The AI technology is a comprehensive subject, and relates to the field of extensive technology, both hardware level technology and software level technology. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, processing technologies for large applications, operating/interactive systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Computer vision technology is a science that studies how to make machines "see"; more specifically, it uses cameras and computers, in place of human eyes, to perform machine vision tasks such as identification, tracking and measurement of a target, and further performs graphic processing so that the processed image becomes one more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision researches related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data; it typically includes techniques for image processing, video semantic understanding, video content/behavior recognition, and the like. The present application mainly relates to detecting object elements in the images of an object to be detected through an AI model, and thereby judging whether the object elements exist in the object to be detected.

The embodiments of the present application provide an object detection scheme and an object detection system to improve the accuracy of object detection. As shown in fig. 1a, the object detection system may comprise at least: a terminal device 101 used by detection personnel, and an object detection device 102. The object detection scheme provided by the embodiments of the present application may be executed by the object detection device 102, where the object detection device 102 may be a terminal device or a server. The terminal device may include, but is not limited to: smart phones (such as Android phones, IOS phones, etc.), tablet computers, portable personal computers, mobile internet devices (MID for short), smart voice interaction devices, smart appliances, and vehicle terminals. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms, which is not limited in the embodiments of the present application.

In a specific implementation, the general principle of the object detection scheme is shown in fig. 1b, specifically:

(1) After acquiring each view image in the multi-view image set of the object to be detected, the object detection device 102 performs object element detection on each view image separately, to detect whether an object element to be detected exists in each view image. For example, suppose the multi-view image set of the object to be detected includes view images of the object at N (N is a positive integer) different views: {X1, ..., XN}; after acquiring these N view images, the object detection device 102 performs object element detection on each of X1, ..., XN separately. The view images in the multi-view image set may be transmitted from the terminal device 101 to the object detection device 102, or captured by the object detection device 102 through a mounted image capture device (e.g., a camera). The independent detection of the view images at different views can be realized by an Artificial Intelligence (AI) algorithm; for example, an initial model is trained on a sample set (a set of images labeled with object elements): a sample image is input into the initial model, and the parameters of the initial model are adjusted according to the loss value between the prediction result of the initial model and the annotation result of the sample image, to obtain an object element detection model; object element detection is then performed on each view image through the object element detection model.
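A minimal sketch of step (1), assuming a trained object element detection model exposing a detect(image) -> list of (box, category, confidence) interface; the interface is hypothetical, as the description only requires independent per-view detection:

```python
def detect_per_view(view_images, detector):
    """Run the object element detection model on each view image X1 ... XN
    independently and collect the candidate (suspicious) object elements."""
    candidates = {}
    for i, image in enumerate(view_images):
        elements = detector.detect(image)  # independent detection for this view
        if elements:
            candidates[i] = elements
    return candidates
```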

(2) If no object element to be detected is found in any of X1, ..., XN, the process ends and the object to be detected is judged to pass the detection. Correspondingly, if an object element to be detected is detected in Xi (i is a positive integer and i ≤ N), the object detection device 102 may take the object element detected in Xi as a candidate object element, which may be understood as a suspicious object element. Then, at least one view image associated with Xi is obtained from X1, ..., Xi-1, Xi+1, ..., XN, and the candidate object element is retrieved in the associated view image based on the projection relationship between Xi and the associated view image and the position of the candidate object element in Xi, to obtain a retrieval result. For example, Xj is obtained from X1, ..., Xi-1, Xi+1, ..., XN, where Xj and Xi have a projection relationship (constructed based on the overlapping region between Xj and Xi); the object detection device 102 projects the position of the candidate object element in Xi according to the projection relationship to obtain the projection position of the candidate object element in Xj; if K object elements are detected in Xj, the position of each of the K object elements in Xj is determined (K is a positive integer), the position matching degree between the position of each object element in Xj and the projection position is calculated, and a retrieval result is obtained based on the position matching degrees. As another example, a plurality of view images having a projection relationship with Xi are obtained from X1, ..., Xi-1, Xi+1, ..., XN; the position of the candidate object element in Xi is projected into each associated view image according to the corresponding projection relationship to obtain the projection position of the candidate object element in each associated view image, and the position matching degree between the position of each object element in each associated view image and the projection position is determined, so as to obtain a retrieval result for each associated view image. In one embodiment, the object detection device 102 calculates the intersection-over-union between the projection position region in each associated view image and each object element region in that associated view image; if the intersection-over-union is greater than a scale threshold, it is determined that the candidate object element is detected in the associated view image, and if the intersection-over-union is less than or equal to the scale threshold, it is determined that the candidate object element is not detected in the associated view image.

(3) The object detection device 102 fuses the retrieval results from the multiple views with the initial element information of the candidate object element to obtain the target element information of the candidate object element, and detects the authenticity of the candidate object element in Xi according to the target element information. In one embodiment, the retrieval result at each view includes associated element information, and the associated element information includes a confidence of the candidate object element; if the retrieval result indicates that the candidate object element is retrieved in the associated view image, the confidence of the candidate object element in the associated element information and the confidence of the candidate object element in the initial element information are weighted and summed to obtain the target element information of the candidate object element; if the retrieval result indicates that the candidate object element is not retrieved, the confidence of the candidate object element is reduced to obtain the target element information of the candidate object element; the authenticity of the candidate object element in Xi is then detected according to the target element information (if the confidence in the target element information is greater than or equal to a confidence threshold, the candidate object element is judged to have authenticity; if the confidence in the target element information is less than the confidence threshold, the candidate object element is judged not to have authenticity). In another embodiment, the number of associated view images is N (N is an integer greater than 1), and the retrieval result of each associated view image includes associated element information; the initial element information and the N pieces of associated element information of the candidate object element are voted on, to obtain the number of support votes of the initial element information and the number of support votes of each piece of associated element information; the object element information with the largest number of support votes is selected from the initial element information and the N pieces of associated element information as the target element information of the candidate object element, and the authenticity of the candidate object element in Xi is detected according to the target element information.

(4) If the candidate object element in Xi does not have authenticity, object element detection continues with Xi+1; if no candidate object element with authenticity exists in any of X1, ..., XN, the object to be detected is judged to pass the detection. If the candidate object element in Xi has authenticity, the object to be detected is judged not to pass the detection.

It should be noted that the object detection method provided by the present application can be applied to various object element detection scenes. For example, the object to be examined may be a workpiece, a face, a hand, a body to be medically examined, a vehicle, or the like; then correspondingly, the object element may be a defect, an expression, a gesture, a lesion or a vehicle fault, etc.; further, the object element detection scene may specifically be: a defect detection scene, an expression detection scene, a gesture detection scene, a focus detection scene, a vehicle fault detection scene, and the like.

Fig. 1c is a schematic flowchart of workpiece detection according to an embodiment of the present disclosure. As shown in fig. 1c, images of a workpiece to be detected are first acquired from multiple views to obtain a multi-view image set of the workpiece to be detected; if a defect is detected in the current view image, the defect detected in the current view image can be taken as a candidate defect (also called a suspicious defect). Then, at least one associated view image of the current view image is acquired from the multi-view image set based on the projection relationships between the view images, and the candidate defect is retrieved in the associated view image to obtain a retrieval result; target defect information of the candidate defect is generated according to the retrieval result and the initial defect information of the candidate defect, and the authenticity of the candidate defect is detected based on the target defect information; a detection result of the workpiece to be detected is then obtained based on the authenticity of the candidate defects detected in each view image. For example, if the number of candidate defects with authenticity detected in the view images of the workpiece to be detected exceeds 2, the workpiece is judged unqualified; correspondingly, if that number does not exceed 2, the workpiece is judged qualified.

Therefore, the object detection scheme provided by the present application can be applied to industrial automated quality inspection, that is, the process of performing automated quality detection with computer algorithms during the production of industrial products. Compared with the traditional manual quality inspection process, industrial automated quality inspection is intelligent, efficient and stable, and is the development direction of future industrial quality inspection. Further, the performance of an algorithm in industrial automated quality inspection can be measured by the miss rate and the overkill rate. The miss rate is one of the commonly used indexes for measuring algorithm performance in industrial quality inspection; it reflects, for a batch of defective (NG) workpieces, the proportion that the algorithm judges to be flawless (OK), and is calculated as follows:
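The formula image is not preserved in this copy; reconstructed from the definition above (notation assumed):

$$\text{miss rate} = \frac{N_{\mathrm{NG \to OK}}}{N_{\mathrm{NG}}}$$

where $N_{\mathrm{NG \to OK}}$ is the number of defective workpieces judged flawless (OK) by the algorithm, and $N_{\mathrm{NG}}$ is the total number of defective workpieces in the batch.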

in the industrial quality inspection process, the lower the omission factor is, the better the performance of the algorithm is.

Like the miss rate, the overkill rate is another commonly used index in industrial quality inspection. It reflects, for a batch of non-defective (OK) workpieces, the proportion that the algorithm judges to be defective (NG), and is calculated as follows:
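Likewise reconstructed from the definition above (notation assumed):

$$\text{overkill rate} = \frac{N_{\mathrm{OK \to NG}}}{N_{\mathrm{OK}}}$$

where $N_{\mathrm{OK \to NG}}$ is the number of non-defective workpieces judged defective (NG) by the algorithm, and $N_{\mathrm{OK}}$ is the total number of non-defective workpieces in the batch.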

in the industrial quality inspection process, the lower the over-killing rate, the better the performance of the algorithm.

Similarly, for expression detection, if an expression (such as a smiling expression) to be detected is detected in a current perspective image, the expression detected in the current perspective image can be used as a candidate expression (or called a suspicious expression), then based on a projection relationship between the perspective images, at least one associated perspective image of the current perspective image is acquired from a multi-perspective image set, and the candidate expression is retrieved from the associated perspective image; after the retrieval result is obtained, judging the authenticity of the candidate expression according to the retrieval result; for example, if a candidate expression is detected in the associated perspective image, it is determined that the candidate expression in the current perspective image has authenticity; and if the candidate expression is not detected in the associated visual angle image, judging that the candidate expression in the current visual angle image does not have authenticity. Further, obtaining an expression detection result based on the authenticity of the candidate expression detected in each visual angle image; for example, if the ratio of the number of candidate expressions with authenticity detected in each perspective image to the number of each perspective image is greater than a proportional threshold, it is determined that an expression to be detected is detected; correspondingly, if the ratio of the number of the candidate expressions with authenticity detected in each perspective image to the number of each perspective image is smaller than or equal to a proportional threshold, it is determined that the expression to be detected is not detected.

For gesture detection, if a gesture (such as an OK gesture) to be detected is detected in a current perspective image, the gesture detected in the current perspective image can be used as a candidate gesture (or called a suspicious gesture), at least one associated perspective image of the current perspective image is obtained from a multi-perspective image set based on a projection relation between the perspective images, and a candidate target gesture is retrieved from the associated perspective image; after the retrieval result is obtained, judging the authenticity of the candidate gesture according to the retrieval result; for example, if a candidate gesture is detected in the associated perspective image, it is determined that the candidate gesture in the current perspective image has authenticity; and if the candidate gesture is not detected in the associated view image, judging that the candidate gesture in the current view image does not have authenticity. Further, obtaining a gesture detection result based on the authenticity of the candidate gesture detected in each perspective image; for example, if the ratio of the number of candidate gestures with authenticity detected in each perspective image to the number of each perspective image is greater than a proportional threshold, determining that a gesture to be detected is detected; correspondingly, if the ratio of the number of the candidate gestures with authenticity detected in each perspective image to the number of each perspective image is smaller than or equal to the proportional threshold, it is determined that the gesture to be detected is not detected.

For lesion detection, if a lesion to be detected is detected in the current view image, the lesion detected in the current view image can be used as a candidate lesion (also called a suspicious lesion); based on the projection relationships between the view images, at least one associated view image of the current view image is acquired from the multi-view image set, and the candidate lesion is retrieved from the associated view image. After the retrieval result is obtained, the authenticity of the candidate lesion is judged according to the retrieval result; for example, if the candidate lesion is retrieved in the associated view image, it is determined that the candidate lesion in the current view image has authenticity, and if the candidate lesion is not retrieved in the associated view image, it is determined that the candidate lesion in the current view image does not have authenticity. Further, a lesion detection result is obtained based on the authenticity of the candidate lesions detected in the view images; for example, if the number of authentic candidate lesions detected in the view images is greater than 3, it is determined that a lesion is detected; correspondingly, if the number of authentic candidate lesions detected in the view images is less than or equal to 3, it is determined that no lesion is detected.

For vehicle fault detection, if a vehicle fault to be detected is detected in the current view image, the vehicle fault detected in the current view image can be used as a candidate vehicle fault (also called a suspicious vehicle fault); at least one associated view image of the current view image is acquired from the multi-view image set based on the projection relationships between the view images, and the candidate vehicle fault is retrieved from the associated view image. After the retrieval result is obtained, the authenticity of the candidate vehicle fault is judged according to the retrieval result; for example, if the candidate vehicle fault is retrieved in the associated view image, it is determined that the candidate vehicle fault in the current view image has authenticity, and if the candidate vehicle fault is not retrieved in the associated view image, it is determined that the candidate vehicle fault in the current view image does not have authenticity. Further, a fault detection result is obtained based on the authenticity of the candidate vehicle faults detected in the view images; for example, if the number of authentic candidate vehicle faults detected in the view images is greater than 3, it is determined that a vehicle fault is detected; correspondingly, if the number of authentic candidate vehicle faults detected in the view images is less than or equal to 3, it is determined that no vehicle fault is detected.

In the embodiment of the application, object element detection is performed independently on each view image $X_1, \dots, X_N$ of the object to be detected. If a candidate object element is detected in $X_i$, at least one view image associated with $X_i$ is acquired, the candidate object element is retrieved in each associated view image, and the authenticity of the candidate object element in $X_i$ is detected according to the retrieval result of each associated view image and the initial element information of the candidate object element. In this way, candidate object elements are verified through multi-view retrieval results, which can effectively reduce detection errors of object elements and further improve the accuracy of object detection.

Based on the above description of the object detection method, the present application embodiment proposes an object detection method, which can be executed by the object detection device 102 mentioned in fig. 1 a. Referring to fig. 2, the object detection method may include the following steps S201 to S205:

S201, traversing each view image in the multi-view image set of the object to be detected.

The object detection equipment carries out object element detection on all view images in a multi-view image set of the object to be detected one by one, and different view images are obtained by shooting the object to be detected by adopting different shooting views. In one embodiment, the object element detection for each view image refers to: carrying out object element detection on each view image through an object element detection model to obtain an object element detection result of each view image; the object element detection model is obtained by training an initial model through a sample set (an image set marked with object elements). Specifically, a sample image is input into an initial model, and parameters of the initial model are adjusted according to a loss value between a prediction result of the initial model and an annotation result of the sample image, so that an object element detection model is obtained.
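As an illustrative sketch only (the embodiment does not prescribe a model architecture or training framework; the Faster R-CNN model, the category count, and the dummy sample data below are our own assumptions), the training step just described might look like this in Python with PyTorch/torchvision:

import torch
import torchvision

# A generic detection model standing in for the object element detection model;
# num_classes counts the object element categories plus one background class.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One training step on a single annotated sample image (dummy data for illustration).
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 220.0, 260.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)  # losses between prediction and annotation
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()  # adjust the parameters of the initial model from the loss value
optimizer.step()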

S202, if an object element needing to be detected is detected in the currently traversed current view image, the detected object element is used as a candidate object element of the object to be detected, and the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image are determined.

Wherein, the initial element information of the candidate object element is used for indicating the related information of the candidate object element, and the initial element information of the candidate object element may include but is not limited to the confidence of the candidate object element and the element category of the candidate object element; for example, assuming that the candidate object element is a candidate defect, the initial element information of the candidate defect may include, but is not limited to, a confidence level of the candidate defect, and an element category (e.g., a missing category, a scratch category, etc.) of the candidate defect; for another example, assuming the candidate object element as a candidate lesion, the initial elemental information of the candidate lesion may include, but is not limited to, the confidence level of the candidate lesion, the element type (e.g., infection type, lesion type, etc.) of the candidate lesion, the risk level (e.g., primary, secondary, tertiary, etc.) of the candidate lesion, etc.; for another example, assuming that the candidate object element is a candidate gesture, the initial element information of the candidate gesture may include, but is not limited to, a confidence level of the candidate gesture, an element category of a hand (e.g., left hand or right hand) corresponding to the candidate gesture, a number of fingers corresponding to the candidate gesture, and the like.
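For concreteness only, the element information described above could be carried in a structure such as the following minimal Python sketch (the class and field names are our own assumptions, not mandated by the embodiment):

import dataclasses

@dataclasses.dataclass
class ElementInfo:
    box: tuple         # position of the element in its view image, e.g. (x1, y1, x2, y2)
    category: str      # element category, e.g. a scratch category for a candidate defect
    confidence: float  # confidence reported by the object element detection model

# Example: initial element information of a candidate defect in the current view image.
candidate = ElementInfo(box=(100, 120, 220, 260), category="scratch", confidence=0.8)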

If the object detection device detects an object element in the currently traversed current perspective image (for example, if the object element detection result output by the object element detection model for the current perspective image contains an object element), the detected object element is used as a candidate object element of the object to be detected, and the position of the candidate object element in the current perspective image and the initial element information of the candidate object element detected from the current perspective image are determined according to the object element detection result. For example, the object element detection result carries the coordinates of the object element, so the object detection device determines the position of the candidate object element in the current perspective image according to those coordinates, and packages related information such as the confidence and element category of the candidate object element into the initial element information of the candidate object element.

S203, acquiring the associated view image of the current view image from the remaining view images, except the current view image, in the multi-view image set.

The associated view image of the current view image refers to: a view image, among the remaining view images other than the current view image in the multi-view image set, that has a projection relationship with the current view image. The projection relationship is determined through multi-view joint modeling of the multi-view image set, and the projection relationship between the current view image and the associated view image is one of the projection relationships in the multi-view joint modeling result of the multi-view image set. In one embodiment, any projection relationship in the multi-view joint modeling result is constructed as follows:

A view image is selected from the multi-view image set of the object to be detected as a base view image, and a view image which has an overlapping area with the base view image is selected from the multi-view image set as a reference view image (an overlapping area refers to an image area which appears in both view images). A plurality of base calibration points are determined in the base view image, and a plurality of reference calibration points are determined in the reference view image; one base calibration point corresponds to one reference calibration point, and a calibration point refers to: a point obtained by calibrating the position of a feature point of the object to be detected in a view image. For example, if the base view image and the reference view image both include feature points 1-4 of the object to be detected, the position of feature point 1 in the base view image is calibrated as base calibration point 1, the position of feature point 1 in the reference view image is calibrated as reference calibration point 1, and base calibration point 1 corresponds to reference calibration point 1; similarly, base calibration points 2-4 may be calibrated based on the positions of feature points 2-4 in the base view image, and reference calibration points 2-4 may be calibrated based on the positions of feature points 2-4 in the reference view image. The projection relationship between the base view image and the reference view image is then calculated according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point; for example, the perspective transformation matrix between the base view image and the reference view image can be solved from the position coordinates of the plurality of base calibration points and the position coordinates of their corresponding reference calibration points, so as to obtain the projection relationship between the base view image and the reference view image.

S204, searching for the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image.

The object detection device projects the position of the candidate object element in the current perspective image into the associated perspective image to obtain a projection position based on the projection relationship between the current perspective image and the associated perspective image. The object detection device retrieves the associated view image based on the projection position to determine whether the candidate object element can be retrieved from the associated view image, specifically:

If K object elements are detected in the associated view image, the position of each of the K object elements in the associated view image is determined, where K is a positive integer, and the position matching degree between the position of each object element in the associated view image and the projection position is calculated. The position matching degree may specifically be calculated as an intersection-over-union ratio (the intersection area of the region corresponding to the position of an object element in the associated view image and the region corresponding to the projection position, divided by the area of their union), an overlap ratio (the intersection area of the region corresponding to the position of an object element in the associated view image and the region corresponding to the projection position, divided by the area of the region corresponding to the projection position), a center point distance (the distance between the center point of the position of an object element in the associated view image and the center point of the projection position), and the like. After the position matching degree between the position of each object element in the associated view image and the projection position is obtained, the object detection device searches the calculated position matching degrees for a position matching degree greater than a matching degree threshold; if the search succeeds, it is determined that the candidate object element is retrieved in the associated view image; if the search fails, it is determined that the candidate object element is not retrieved in the associated view image.

Accordingly, if no object element is detected in the associated view image (i.e., no object element exists in the associated view image), it is determined that no candidate object element is retrieved in the associated view image.

S205, generating target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detecting the authenticity of the candidate object element based on the target element information.

The search result is used to indicate whether a candidate object element is searched in the associated view image, and if the search result indicates that the candidate object element is searched in the associated view image, the search result may further include object element information of the candidate object element (such as a confidence of the candidate object element, an element type, and the like).

In one embodiment, the initial element information includes the confidence of the candidate object element. If the retrieval result indicates that the candidate object element is not retrieved in the associated view image, the object detection device performs reduction processing on the confidence in the initial element information (e.g., reducing it by 0.1), and uses the object element information obtained after the reduction processing as the target element information of the candidate object element. Correspondingly, if the retrieval result indicates that the candidate object element is retrieved in the associated view image, the object detection device takes the object element information of the candidate object element detected in the associated view image as associated element information, and performs information fusion on the initial element information and the associated element information of the candidate object element (for example, fusing the confidences of the candidate object element) to obtain the target element information of the candidate object element.

In another embodiment, the initial element information includes the element category of the candidate object element, the number of associated view images is N, N is an integer greater than 1, and the retrieval result of each associated view image includes associated element information. Voting is performed on the initial element information and the N pieces of associated element information of the candidate object element to obtain the number of support votes of the initial element information and the number of support votes of each piece of associated element information, and the object element information with the largest number of support votes is selected from the initial element information and the N pieces of associated element information as the target element information of the candidate object element.

After the target element information of the candidate object element is obtained, the object detection device judges whether the confidence in the target element information reaches a confidence threshold: if the confidence in the target element information is greater than or equal to the confidence threshold, it is determined that the candidate object element has authenticity; if the confidence in the target element information is less than the confidence threshold, it is determined that the candidate object element does not have authenticity.

According to the embodiment of the application, when an object element is detected in the currently traversed current view image, the detected object element can be used as a candidate object element of the object to be detected, and the candidate object element is retrieved in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image; target element information of the candidate object element is then generated according to the retrieval result and the initial element information of the candidate object element detected from the current view image, and the authenticity of the candidate object element is detected based on the target element information. This joint detection of a candidate object element found in the current view image, using the associated view images, can accurately judge whether the candidate object element is a real object element or a falsely detected one, so detection errors of object elements can be effectively reduced, the detection accuracy of object elements is improved, and the accuracy of object detection is further improved.

Based on the above description of the object detection method, the present application embodiment proposes another object detection method, which can be performed by the object detection apparatus 102 mentioned in fig. 1 a. Referring to fig. 3, the object detection method may include the following steps S301 to S308:

S301, performing multi-view joint modeling on the multi-view image set of the object to be detected.

Homography estimation is performed on the view images in the multi-view image set of the object to be detected, and the multi-view images are thereby associated. Here, homography estimation refers to estimating, in projective geometry, the isomorphic relationship between different projective spaces, i.e., the mapping relationship from one plane to another plane in space; multi-view image association refers to collecting a plurality of pictures of the object to be inspected (such as a workpiece) under different views, and then modeling the mapping relationships between the pictures by means of homography estimation.

In one embodiment, one view image is selected from the multi-view image set of the object to be inspected as a base view image, and a view image having an overlapping area with the base view image is selected from the multi-view image set as a reference view image. A plurality of base calibration points are determined in the base view image, and a plurality of reference calibration points are determined in the reference view image; one base calibration point corresponds to one reference calibration point, and a calibration point refers to: a point obtained by calibrating the position of a feature point of the object to be detected in a view image. For example, if the base view image and the reference view image both include feature points 1-4 of the object to be detected, the position of feature point 1 in the base view image is calibrated as base calibration point 1, the position of feature point 1 in the reference view image is calibrated as reference calibration point 1, and base calibration point 1 corresponds to reference calibration point 1; similarly, base calibration points 2-4 may be calibrated based on the positions of feature points 2-4 in the base view image, and reference calibration points 2-4 may be calibrated based on the positions of feature points 2-4 in the reference view image.

After the plurality of base calibration points are determined in the base view image and the plurality of reference calibration points are determined in the reference view image, the object detection device acquires a perspective transformation matrix to be solved, where the perspective transformation matrix to be solved includes a plurality of parameters to be solved. The position coordinates of each base calibration point are projected, using the perspective transformation matrix to be solved, from the two-dimensional space where the base view image is located into a three-dimensional space to obtain the three-dimensional space coordinates of each base calibration point, the three-dimensional space coordinates of each base calibration point including a plurality of parameters; the three-dimensional space coordinates of each base calibration point are then projected into the two-dimensional space where the reference view image is located to obtain the projection coordinates of each base calibration point, the projection coordinates of each base calibration point including a plurality of parameters. The values of the parameters in the perspective transformation matrix are solved according to the constraint condition that the projection coordinates of each base calibration point are equal to the position coordinates of the corresponding reference calibration point, so as to obtain the solved perspective transformation matrix, and the solved perspective transformation matrix is adopted as the projection relationship between the base view image and the reference view image.

Fig. 4a is a schematic diagram of a projection relationship provided in an embodiment of the present application. As shown in fig. 4a, the feature points in the overlapping region of the base view image are calibrated to obtain base calibration points 1-4, and the feature points in the overlapping region of the reference view image are calibrated to obtain reference calibration points a-d, where base calibration point 1 corresponds to reference calibration point a, base calibration point 2 corresponds to reference calibration point b, base calibration point 3 corresponds to reference calibration point c, and base calibration point 4 corresponds to reference calibration point d. The perspective transformation matrix between the base view image and the reference view image is solved based on each base calibration point and its corresponding reference calibration point, so as to obtain the projection relationship between the base view image and the reference view image.

Specifically, the perspective transformation (Perspective Transformation) projects the base calibration points in the base view image into the reference view image, and is also called projective mapping (Projective Mapping). It can be divided into two parts: first, a base calibration point in the base view image is projected into a three-dimensional space, and then the three-dimensional space coordinates are projected into the two-dimensional space where the reference view image is located, so as to obtain the projection coordinates of each base calibration point. The transformation formula for projecting a base calibration point in the base view image into the three-dimensional space is as follows:

$$[X \ \ Y \ \ Z]^{T} = M \, [x \ \ y \ \ 1]^{T}$$

where $(x, y)$ are the position coordinates of the base calibration point in the base view image; the base calibration point is mapped into the three-dimensional space through the perspective transformation matrix $M$, the mapped three-dimensional space coordinates are expressed as $(X, Y, Z)$, and $M$ can be expressed as:

$$M = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix}$$

Based on the above description, in general, the process of projecting a base calibration point in the base view image into the three-dimensional space can be expressed by the following formula:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Further, the three-dimensional space coordinates can be projected into the two-dimensional space where the reference view image is located to obtain the projection coordinates of each base calibration point; then, according to the constraint condition that the projection coordinates of each base calibration point are equal to the position coordinates of the corresponding reference calibration point, the following equations are established:

$$x' = \frac{X}{Z} = \frac{m_{11}x + m_{12}y + m_{13}}{m_{31}x + m_{32}y + m_{33}}, \qquad y' = \frac{Y}{Z} = \frac{m_{21}x + m_{22}y + m_{23}}{m_{31}x + m_{32}y + m_{33}}$$

where $(x', y')$ are the position coordinates, in the reference view image, of the reference calibration point corresponding to the base calibration point $(x, y)$ of the base view image. To facilitate solving the values of the parameters in $M$, $m_{33}$ can be set to 1, so that $M$ contains only 8 unknowns; $M$ can therefore be solved from 4 groups of corresponding points between the base view image and the reference view image (each group of points comprises a base calibration point in the base view image and its corresponding reference calibration point in the reference view image, where at least 3 of the base calibration points are not collinear and at least 3 of the reference calibration points are not collinear), yielding the perspective transformation matrix from the base view image to the reference view image. Similarly, the perspective transformation matrix from the reference view image to the base view image can be obtained through an inverse operation, and the mapping relationship between the base view image and the reference view image is thereby determined.

In one embodiment, the object detection device may solve the perspective transformation matrix by calling a built-in function of OpenCV (an open-source function library for image processing, image analysis, and machine vision). A specific implementation is as follows:

# After calibrating the feature points in the base view image and the reference view image:

import cv2
import numpy as np

# Coordinates of the base calibration points in the base view image and of the
# corresponding reference calibration points in the reference view image
# (numerical values are used for example only).
points_m = np.float32([[1000, 100], [800, 120], [650, 400], [900, 700]])
points_n = np.float32([[500, 20], [250, 50], [200, 400], [400, 700]])

# Calculate the perspective transformation matrix from the base view image
# to the reference view image.
M = cv2.getPerspectiveTransform(points_m, points_n)

Based on the above description of step S301, assume that the multi-view image set of the object to be examined includes view images 1 to 5. Fig. 4b is a schematic diagram of the projection relationships between images at different view angles according to an embodiment of the present application. As shown in fig. 4b, after the multi-view joint modeling is completed, the view images 1-5 in the multi-view image set of the object to be inspected are associated by perspective transformation matrices; for example, a point in view image 1 may be projected into view image 2 by the perspective transformation matrix M12, and a point in view image 2 may likewise be projected into view image 1 by the perspective transformation matrix M21.
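Since a perspective transformation is invertible, the reverse matrix of an image pair (e.g., M21 from M12) can be obtained by matrix inversion rather than being solved separately. A small sketch (the matrix values are illustrative only):

import numpy as np

# M12 projects points of view image 1 into view image 2 (values for example only).
M12 = np.array([[1.2, 0.05, -30.0],
                [0.02, 1.1, 15.0],
                [1e-4, 2e-5, 1.0]])
M21 = np.linalg.inv(M12)  # projects points of view image 2 back into view image 1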

S302, traversing each view image in the multi-view image set of the object to be detected.

In one embodiment, suppose that the object detection device is currently detecting a view image $X_i$ in the multi-view image set to determine whether an object element to be detected exists in the view image $X_i$. The detection result obtained by detecting the view image $X_i$ with a detection algorithm $f(\cdot)$ can be expressed as:

$$f(X_i) = \{(B_{ik}, C_{ik}, conf_{ik})\}_{k=1}^{K}$$

where $f(X_i)$ represents the result of detecting the view image $X_i$ with the detection algorithm; $K$ represents the number of candidate object elements detected from the view image $X_i$ ($K \geq 0$); and $B_{ik}$ is the bounding box of candidate object element $k$ detected in the view image $X_i$, $C_{ik}$ is the element category of candidate object element $k$, and $conf_{ik}$ is the confidence of candidate object element $k$.

S303, if an object element needing to be detected is detected in the currently traversed current view image, the detected object element is used as a candidate object element of the object to be detected, and the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image are determined.

S304, acquiring the associated view image of the current view image from the remaining view images, except the current view image, in the multi-view image set.

The specific implementations of steps S303 and S304 may refer to the implementations of steps S202 and S203 in fig. 2, and are not described herein again.

S305, searching for the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image.

The object detection device projects the position of the candidate object element in the current perspective image into the associated perspective image to obtain a projection position based on the projection relationship between the current perspective image and the associated perspective image. Fig. 4c is a schematic projection diagram according to an embodiment of the present disclosure. As shown in fig. 4c, the position of the candidate object element in the current view image is projected according to the projection relationship between the current view image and the associated view image, so as to obtain the projection position of the object element in the associated view image.
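A minimal sketch of this projection step with OpenCV (the matrix M and the coordinates are illustrative values, not taken from the embodiment):

import cv2
import numpy as np

# Corner points of the candidate object element's bounding box in the current view image.
corners = np.float32([[[100, 120]], [[220, 120]], [[220, 260]], [[100, 260]]])
# Perspective transformation matrix from the current view image to the associated
# view image (e.g., obtained via cv2.getPerspectiveTransform as shown earlier).
M = np.float32([[1.2, 0.05, -30.0],
                [0.02, 1.1, 15.0],
                [1e-4, 2e-5, 1.0]])
projected = cv2.perspectiveTransform(corners, M)  # projection position in the associated view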

The object detection device retrieves the associated view image based on the projection position to determine whether the candidate object element can be retrieved from the associated view image, specifically:

if K object elements are detected in the associated view image, determining the position of each object element in the K object elements in the associated view image, wherein K is a positive integer; calculating the position matching degree between the position of each object element in the associated view angle image and the projection position; the position matching degree may be specifically calculated by calculating an intersection ratio of a position of each object element in the associated perspective image and the projection position (an intersection area of a region corresponding to the position of each object element in the associated perspective image and a region corresponding to the projection position is divided by a merging area), an overlap ratio (an intersection area of a region corresponding to the position of each object element in the associated perspective image and a region corresponding to the projection position is divided by an area of a region corresponding to the projection position), a center point distance (a distance between a center point of the position of each object element in the associated perspective image and a center point of the projection position), and the like.

In one embodiment, the position of each object element in the K object elements in the associated view image is identified by using a labeling frame, the projection position is identified by using a projection frame, the object detection device calculates the intersection ratio of the projection frame and each labeling frame, and determines the intersection ratio of the projection frame and each labeling frame as the position matching degree between the position of the object element in the associated view image and the projection position.

In another embodiment, the position of each object element in the K object elements in the associated perspective image is identified by using a labeling frame, the projection position is identified by using a projection frame, the object detection device calculates the degree of coincidence between the projection frame and each labeling frame, and determines the degree of coincidence between the projection frame and each labeling frame as the position matching degree between the position of the object element in the associated perspective image and the projection position.

In still another embodiment, the object detection apparatus acquires the center point coordinate of each object element in the associated perspective image and the center point coordinate of the projection position, calculates the distance between the center point coordinate of each object element and the center point coordinate of the projection position, and determines the distance between the center point coordinate of the projection position and the center point coordinate of each object element as the position matching degree between the position of the object element in the associated perspective image and the projection position.

After the position matching degree between the position of each object element in the associated view angle image and the projection position is obtained, if the position matching degree is determined according to the intersection ratio of the projection frame and each marking frame or the coincidence degree of the projection frame and each marking frame, the object detection equipment searches the position matching degree which is greater than a matching degree threshold value in the calculated position matching degree; if the retrieval is successful, determining that the candidate object elements are retrieved from the associated view angle image; and if the retrieval fails, determining that the candidate object element is not retrieved in the associated view angle image.

If the position matching degree is determined according to the distance between the center point coordinate of the projection position and the center point coordinate of each object element, the object detection equipment searches the position matching degree smaller than the matching degree threshold value in the calculated position matching degree; if the retrieval is successful, determining that the candidate object elements are retrieved from the associated view angle image; and if the retrieval fails, determining that the candidate object element is not retrieved in the associated view angle image.
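The three matching measures above can be sketched as follows (the box format (x1, y1, x2, y2), the example boxes, and the 0.5 threshold are assumptions for illustration):

def iou(box_a, box_b):
    # Intersection area divided by the union area of two boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def overlap_ratio(box, proj_box):
    # Intersection area divided by the area of the projection frame.
    ix1, iy1 = max(box[0], proj_box[0]), max(box[1], proj_box[1])
    ix2, iy2 = min(box[2], proj_box[2]), min(box[3], proj_box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    return inter / ((proj_box[2] - proj_box[0]) * (proj_box[3] - proj_box[1]))

def center_distance(box_a, box_b):
    # Distance between the center points of two boxes.
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2) ** 0.5

# Retrieval succeeds if any labeled box matches the projection frame well enough.
proj_box = (100, 120, 220, 260)
labeled_boxes = [(110, 130, 230, 270)]
retrieved = any(iou(box, proj_box) > 0.5 for box in labeled_boxes)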

Taking defect detection as an example, based on the projection relationship between the current view image and the associated view image, projecting the position of the candidate defect in the current view image to the associated view image to obtain the projection position of the candidate defect; if the candidate defect is detected in the associated view angle image, determining the position of the candidate defect in the associated view angle image; calculating the position matching degree between the position of the candidate defect in the associated visual angle image and the projection position to obtain a retrieval result; for example, if the position matching degree between the position of the candidate defect in the associated view angle image and the projection position is greater than the matching degree threshold value, the search is successful, and the candidate defect is determined to be detected in the associated view angle image; correspondingly, if the position matching degree between the position of the candidate defect in the associated view angle image and the projection position is less than or equal to the matching degree threshold value, the retrieval is failed, and the candidate defect is judged not to be detected in the associated view angle image.

In addition, if no object element is detected in the associated view image (i.e., no object element exists in the associated view image), it is determined that no candidate object element is retrieved in the associated view image.

S306, generating target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detecting the authenticity of the candidate object element based on the target element information.

The search result is used to indicate whether a candidate object element is searched in the associated view image, and if the search result indicates that the candidate object element is searched in the associated view image, the search result may further include object element information of the candidate object element (such as a confidence of the candidate object element, an element type, and the like).

In one embodiment, the initial element information includes confidence levels of the candidate object elements, and if the retrieval result indicates that the candidate object elements are not retrieved from the associated perspective image, the object detection apparatus performs a reduction process (e.g., reduction by 0.1) on the confidence levels in the initial element information, and uses the object element information obtained after the reduction process as target element information of the candidate object elements. Accordingly, if the retrieval result indicates that the candidate object element is retrieved in the associated view image, the object detection device takes the object element information of the candidate object element detected in the associated view image as associated element information, and performs information fusion on the initial element information and the associated element information of the candidate object element to obtain target element information of the candidate object element. Specifically, the initial element information and the associated element information both include confidence degrees of candidate object elements, the object detection device obtains a first weight of the initial element information and a second weight of the associated element information, and performs weighted summation on the confidence degrees in the initial element information and the confidence degrees in the associated element information by using the first weight and the second weight to obtain a target confidence degree, and adds the target confidence degree to the target element information of the candidate object elements; for example, the confidence of the candidate object element in the initial element information is set to be 0.8, and the first weight is set to be 1; the confidence of the candidate object element in the associated element information is 0.4, and the second weight is 0.2, then the target confidence =0.8 × 1+0.4 × 0.2= 0.88.
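A minimal sketch of the weighted fusion just described (the weights and confidences are the example values from the text above):

# Fuse the confidence detected in the current view image with the confidence
# of the same candidate object element retrieved in the associated view image.
conf_initial, w_initial = 0.8, 1.0
conf_associated, w_associated = 0.4, 0.2
target_conf = conf_initial * w_initial + conf_associated * w_associated
print(target_conf)  # 0.88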

In another embodiment, the number of associated view images is N, where N is an integer greater than 1, and the initial element information and the associated element information of each associated view image further include the element category of the candidate object element. If the element category in the initial element information and the element category in each piece of associated element information are the same, the element category in the initial element information is taken as the target element category and added to the target element information of the candidate object element. If the element category in at least one piece of associated element information differs from the element category in the initial element information, the occurrences of each element category are counted, the element category with the largest count is determined as the target element category, and the target element category is added to the target element information of the candidate object element. For example, assume the number of associated view images is 9, the element category of object element a in the initial element information is category 1, the element category of object element a in the associated element information of associated view images 1 to 7 is category 2, and the element category of object element a in the associated element information of associated view images 8 and 9 is category 1; since the count of category 2 (7) is greater than the count of category 1 (3), the target element category of object element a is determined to be category 2.
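A small sketch of this majority vote over element categories (category labels follow the worked example above):

import collections

# Categories reported for the candidate object element: one from the current view
# image plus nine associated view images (example values from the text above).
categories = ["category 1"] + ["category 2"] * 7 + ["category 1"] * 2
target_category = collections.Counter(categories).most_common(1)[0][0]
print(target_category)  # "category 2" (7 occurrences against 3)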

Optionally, by combining the two embodiments, the initial element information and the associated element information of each associated view image include both a confidence level and an element category, and the initial element information and the associated element information are fused to obtain target element information, that is, the target element information includes both a target confidence level and a target element category.

In yet another embodiment, the initial element information includes the confidence of the candidate object element and the element category of the candidate object element, and the associated element information includes the element category of the candidate object element. If the element category in the initial element information is the same as the element category in the associated element information, the confidence in the initial element information is increased (for example, by 0.1), and the element information obtained after the increase (including the increased confidence and the element category of the candidate object element) is taken as the target element information of the candidate object element. If the element category in the initial element information differs from the element category in the associated element information, the confidence in the initial element information is reduced (for example, by 0.1), and the element information obtained after the reduction (including the reduced confidence and the element category of the candidate object element) is taken as the target element information of the candidate object element. For example, suppose that in the initial element information of the current view image the confidence of candidate object element 1 is 0.7 and its element category is category 1: if the element category of candidate object element 1 in the associated element information is also category 1, the confidence of candidate object element 1 becomes 0.7 + 0.1 = 0.8; if the element category of candidate object element 1 in the associated element information is not category 1, the confidence of candidate object element 1 becomes 0.7 - 0.1 = 0.6. The updated confidence of candidate object element 1 is then carried in the target element information of candidate object element 1. It should be noted that the above numerical values are only examples and do not constitute practical limitations of the present application.

In yet another embodiment, the initial element information includes the element category of the candidate object element, the number of associated view images is N, N is an integer greater than 1, and the retrieval result of each associated view image includes associated element information. Voting is performed on the initial element information and the N pieces of associated element information of the candidate object element to obtain the number of support votes of the initial element information and the number of support votes of each piece of associated element information, and the object element information with the largest number of support votes is selected from the initial element information and the N pieces of associated element information as the target element information of the candidate object element.

In one embodiment, the initial element information and the associated element information both include the confidence of the candidate object element. If the confidence of the candidate object element in the initial element information or in a piece of associated element information is greater than or equal to the confidence threshold, a vote is cast in favor of the candidate object element having authenticity; if the confidence of the candidate object element in the initial element information or in a piece of associated element information is less than the confidence threshold, a vote is cast against the candidate object element having authenticity; the target element information of the candidate object element is then generated according to the voting result. For example, if the confidence threshold is 0.6 and the confidence of the candidate object element in the initial element information is 0.7, a vote in favor is cast based on the initial element information; if the confidence of the candidate object element in associated element information 1 is 0.3, a vote against is cast based on associated element information 1. After the target element information of the candidate object element is obtained, if the votes in favor of the candidate object element having authenticity are greater than or equal to the votes against, it is determined that the candidate object element has authenticity; if the votes in favor are less than the votes against, it is determined that the candidate object element does not have authenticity.

In another embodiment, the initial element information and the associated element information both include the element category of the candidate object element. The votes obtained by each element category are counted, the element category with the most votes is determined as the target element category, and the target element category and its vote rate are used as the target element information of the candidate object element; for example, assuming that element category 1 obtains 10 votes and element category 2 obtains 30 votes, element category 2 is determined as the target element category. Further, if the vote rate of the target element category is greater than or equal to a vote rate threshold, it is determined that the candidate object element has authenticity, and the category of the candidate object element is determined as the target element category; if the vote rate of the target element category is less than the vote rate threshold, it is determined that the candidate object element does not have authenticity. For example, assuming that the vote rate threshold is 60%: if the vote rate of the target element category is 75%, it is determined that the candidate object element has authenticity and its category is the target element category; correspondingly, if the vote rate of the target element category is 55%, it is determined that the candidate object element does not have authenticity.

In yet another embodiment, the initial element information and the associated element information include both the confidence of the candidate object element and the element category of the candidate object element; the confidence and the element category in the initial element information or in each piece of associated element information are considered together when voting (for example, if the confidence of the candidate object element is greater than the confidence threshold and the element category belongs to a preset element category, a vote is cast in favor of the candidate object element having authenticity, and otherwise a vote is cast against it), and the target element information of the candidate object element is generated according to the voting result. After the target element information of the candidate object element is obtained, if the votes in favor of the candidate object element having authenticity are greater than or equal to the votes against, it is determined that the candidate object element has authenticity; if the votes in favor are less than the votes against, it is determined that the candidate object element does not have authenticity.
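One possible sketch of this authenticity vote across the initial and associated element information (the threshold and the confidence values are illustrative assumptions):

CONF_THRESHOLD = 0.6

# Confidence of the candidate object element in the initial element information,
# followed by its confidences in the associated element information of each
# associated view image.
confidences = [0.7, 0.3, 0.8, 0.65]
votes_for = sum(1 for c in confidences if c >= CONF_THRESHOLD)
votes_against = len(confidences) - votes_for
is_authentic = votes_for >= votes_against  # True: 3 votes in favor, 1 against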

S307, counting authenticity detection results of H candidate object elements detected from all the view angle images, wherein H is a positive integer.

The number of candidate object elements detected from each view image in the multi-view image set is counted, and the authenticity of each candidate object element is obtained through the above steps S302 to S306.

S308, judging the object detection result of the object to be detected according to the authenticity check results of the H candidate object elements.

In one embodiment, if the authenticity check results show that at least h of the candidate object elements have authenticity, it is determined that the object to be detected does not pass the detection; otherwise, it is determined that the object to be detected passes the detection, where h is a positive integer and h ≤ H. For example, for workpiece A, assume h = 5 (i.e., the workpiece is determined to be qualified when fewer than 5 of its candidate defects have authenticity) and a total of 10 candidate defects are detected from the multi-view image set of workpiece A, of which the authenticity check results of 6 candidate defects indicate authenticity; workpiece A is then determined to be unqualified.
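A minimal sketch of this final decision rule (the threshold h and the check results follow the worked example above):

H_THRESHOLD = 5  # h: minimum number of authentic candidate defects for a failed inspection

# Authenticity check results of the 10 candidate defects detected across all views.
authenticity = [True, True, False, True, True, True, False, True, False, False]
num_authentic = sum(authenticity)  # 6 authentic candidate defects
passes_inspection = num_authentic < H_THRESHOLD
print(passes_inspection)  # False: workpiece A fails the inspection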

In the embodiment of the application, for each real object element on the object to be detected, corresponding associated element information can be found under multiple view angles; even if the object element is missed in one particular view, it can still be detected in most of the other views, so the omission ratio is accordingly low. Conversely, an object element falsely detected in one particular view will not be confirmed in most of the other views, so the over-killing rate is also reduced. By comprehensively judging the object element information under multiple view angles, for example by voting or weighted fusion, real object elements can be found with a higher probability, and the over-killing and missed-detection situations in the object detection process are improved. Therefore, using homography estimation to perform multi-view association modeling of the object to be detected in the industrial automatic quality inspection process, and using the morphological information of different view angles across a plurality of pictures to perform joint inference on the quality inspection result of the object to be detected, can effectively reduce the errors of object element detection, improve the accuracy of object detection, and make the detection result robust.

While the method of the embodiments of the present application has been described in detail above, to facilitate better implementation of the above-described aspects of the embodiments of the present application, the apparatus of the embodiments of the present application is provided below accordingly.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an object detecting apparatus according to an embodiment of the present disclosure, which may be mounted on the object detecting device 102 shown in fig. 1 a. The object detection apparatus shown in fig. 5 may be used to perform some or all of the functions in the method embodiments described above with respect to fig. 2 and 3. Referring to fig. 5, the details of each unit are as follows:

the processing unit 501 is configured to traverse each view image in the multi-view image set of the object to be inspected, where different view images are obtained by shooting the object to be inspected from different shooting view angles; and is further configured to, if an object element to be detected is detected in the currently traversed current view image, take the detected object element as a candidate object element of the object to be detected, and determine the position of the candidate object element in the current view image and the initial element information of the candidate object element detected from the current view image;

an obtaining unit 502, configured to obtain an associated view image of the current view image from remaining view images, except for the current view image, in the multi-view image set, where a projection relationship exists between the current view image and the associated view image;

the processing unit 501 is further configured to retrieve the candidate object element in the associated view image based on the projection relationship between the current view image and the associated view image and the position of the candidate object element in the current view image; and is further configured to generate the target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and to detect the authenticity of the candidate object element based on the target element information.

In one embodiment, the projection relationship between the current perspective image and the associated perspective image is one of the projection relationships in the multi-perspective joint modeling result of the multi-perspective image set;

the construction method of any projection relation in the multi-view joint modeling result is as follows:

selecting a view image from the multi-view image set of the object to be detected as a base view image, and selecting a view image which has an overlapping area with the base view image from the multi-view image set as a reference view image;

determining a plurality of base calibration points in the base view image and a plurality of reference calibration points in the reference view image, where one base calibration point corresponds to one reference calibration point, and a calibration point refers to: a point obtained by calibrating the position of a feature point of the object to be detected in a view image;

and calculating the projection relationship between the base view image and the reference view image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point.

In an embodiment, the processing unit 501 is configured to calculate the projection relationship between the base view image and the reference view image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point, and is specifically configured to:

acquiring a perspective transformation matrix to be solved, wherein the perspective transformation matrix to be solved comprises a plurality of parameters to be solved;

projecting the position coordinates of each base calibration point from the two-dimensional space where the base view image is located into a three-dimensional space by adopting the perspective transformation matrix to be solved, so as to obtain the three-dimensional space coordinates of each base calibration point, where the three-dimensional space coordinates of each base calibration point include a plurality of parameters;

projecting the three-dimensional space coordinates of each base calibration point into the two-dimensional space where the reference view image is located to obtain the projection coordinates of each base calibration point, where the projection coordinates of each base calibration point include a plurality of parameters;

solving the values of the parameters in the perspective transformation matrix according to the constraint condition that the projection coordinates of each base calibration point are equal to the position coordinates of the corresponding reference calibration point, so as to obtain the solved perspective transformation matrix;

and adopting the solved perspective transformation matrix as a projection relation between the reference visual angle image and the reference visual angle image.
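The constraint-based solving described above amounts, in practice, to estimating a 3×3 perspective transformation (homography) from the point correspondences. The following Python sketch shows one way this could be done; the use of OpenCV's findHomography, the function name build_projection_relation, and the RANSAC threshold are illustrative assumptions, not the embodiment's prescribed implementation:

    import numpy as np
    import cv2

    def build_projection_relation(base_points, ref_points):
        # base_points, ref_points: (M, 2) arrays of pixel coordinates, M >= 4,
        # with base_points[i] corresponding to ref_points[i].
        base_pts = np.asarray(base_points, dtype=np.float32)
        ref_pts = np.asarray(ref_points, dtype=np.float32)
        # RANSAC tolerates a few badly placed calibration points.
        H, _inliers = cv2.findHomography(base_pts, ref_pts, cv2.RANSAC, 3.0)
        return H  # 3x3 matrix: the projection relationship between the two images

With exactly four non-collinear correspondences, the eight unknown parameters are determined exactly (cv2.getPerspectiveTransform), which mirrors the constraint that each projected coordinate equals the position coordinate of its corresponding reference calibration point.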

In an embodiment, the processing unit 501 is configured to retrieve, based on a projection relationship between the current perspective image and the associated perspective image and a position of the candidate object element in the current perspective image, the candidate object element in the associated perspective image, and is specifically configured to:

project the position of the candidate object element in the current perspective image into the associated perspective image based on the projection relationship between the two images, so as to obtain a projection position;

if K object elements are detected in the associated perspective image, determine the position of each of the K object elements in the associated perspective image, where K is a positive integer;

calculate the position matching degree between the position of each object element in the associated perspective image and the projection position, and search the calculated position matching degrees for one greater than a matching-degree threshold;

if the search succeeds, determine that the candidate object element is retrieved in the associated perspective image; and if the search fails, determine that the candidate object element is not retrieved in the associated perspective image.
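As a hedged sketch of the projection step above: when, as in the following embodiment, positions are represented by boxes, the position can be projected by mapping the box corners through the 3×3 matrix and re-fitting an axis-aligned box. The box format and the OpenCV call are illustrative assumptions:

    import numpy as np
    import cv2

    def project_box(H, box):
        # Project an axis-aligned box (x1, y1, x2, y2) from the current
        # perspective image into the associated perspective image via the
        # 3x3 projection relationship H, then re-fit an axis-aligned box.
        x1, y1, x2, y2 = box
        corners = np.float32([[x1, y1], [x2, y1],
                              [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
        projected = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
        xs, ys = projected[:, 0], projected[:, 1]
        return (float(xs.min()), float(ys.min()),
                float(xs.max()), float(ys.max()))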

In one embodiment, the position of each object element in the associated perspective image is identified by a labeling box, and the projection position is identified by a projection box;

the processing unit 501 is configured to calculate the position matching degree between the position of each object element in the associated perspective image and the projection position, and is specifically configured to:

calculate the intersection-over-union between the position of the k-th object element in the associated perspective image and the projection position, where k ∈ [1, K];

and determine the calculated intersection-over-union as the position matching degree between the position of the k-th object element in the associated perspective image and the projection position.
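A minimal sketch of this matching step follows; the helper names and the 0.5 threshold are assumptions:

    def iou(box_a, box_b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def retrieve_candidate(projection_box, labeled_boxes, match_threshold=0.5):
        # The candidate counts as retrieved if any of the K labeling boxes
        # in the associated image matches the projection box above threshold.
        return any(iou(projection_box, b) > match_threshold for b in labeled_boxes)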

In an embodiment, the initial element information includes the confidence of the candidate object element, and the processing unit 501 is configured to generate target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and is specifically configured to:

if the retrieval result indicates that the candidate object element is not retrieved in the associated perspective image, reduce the confidence in the initial element information;

and take the element information obtained after the reduction as the target element information of the candidate object element.

In one embodiment, the processing unit 501 is configured to generate target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and is specifically configured to:

if the retrieval result indicates that the candidate object element is retrieved in the associated perspective image, take the object element information of the candidate object element detected in the associated perspective image as associated element information;

and fuse the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element.

In one embodiment, the initial element information includes a confidence of the candidate object element, and the associated element information also includes a confidence of the candidate object element;

the processing unit 501 is configured to fuse the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element, and is specifically configured to:

acquire a first weight for the initial element information and a second weight for the associated element information;

compute a weighted sum of the confidence in the initial element information and the confidence in the associated element information using the first weight and the second weight, so as to obtain a target confidence;

and add the target confidence to the target element information of the candidate object element.
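As a hedged illustration of this weighted fusion (the embodiment leaves the weight values open; the equal weights below are an assumption):

    def fuse_confidence(init_conf, assoc_conf, w_init=0.5, w_assoc=0.5):
        # Weighted sum of the confidence from the current perspective image
        # and the confidence from the associated perspective image.
        return w_init * init_conf + w_assoc * assoc_conf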

In one embodiment, the initial element information includes the element category of the candidate object element, the number of associated perspective images is N, the associated element information of each associated perspective image includes an element category of the candidate object element, and N is an integer greater than 1;

the processing unit 501 is configured to fuse the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element, and is specifically configured to:

if the element category in the initial element information is the same as the element category in every piece of associated element information, take that element category as the target element category and add it to the target element information of the candidate object element;

if the element category in at least one piece of associated element information differs from the element category in the initial element information, count the occurrences of each element category, determine the most frequent element category as the target element category, and add the target element category to the target element information of the candidate object element.
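Both branches reduce to a majority vote over the N+1 categories; a minimal sketch, with fuse_category as a hypothetical helper name:

    from collections import Counter

    def fuse_category(init_category, assoc_categories):
        # Majority vote over the category from the current image and the
        # categories from the N associated images.
        votes = Counter([init_category, *assoc_categories])
        return votes.most_common(1)[0][0]

Because Counter preserves insertion order, a tie resolves in favor of the category encountered first, here the one from the current perspective image; other tie-breaking rules are equally compatible with the embodiment.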

In one embodiment, the initial element information includes the confidence and the element category of the candidate object element, and the associated element information includes the element category of the candidate object element;

the processing unit 501 is configured to fuse the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element, and is specifically configured to:

if the element category in the initial element information is the same as the element category in the associated element information, increase the confidence in the initial element information, and take the element information obtained after the increase as the target element information of the candidate object element;

and if the element category in the initial element information differs from the element category in the associated element information, reduce the confidence in the initial element information, and take the element information obtained after the reduction as the target element information of the candidate object element.

In one embodiment, the number of associated perspective images is N, where N is an integer greater than 1, and each associated perspective image corresponds to one piece of associated element information;

the processing unit 501 is configured to fuse the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element, and is specifically configured to:

vote over the initial element information and the N pieces of associated element information of the candidate object element, so as to obtain the number of support votes for the initial element information and for each piece of associated element information;

and select, from the initial element information and the N pieces of associated element information, the element information with the largest number of support votes as the target element information of the candidate object element.
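The embodiment does not pin down how support votes are counted; the sketch below assumes a record's support equals the number of records (itself included) whose element category agrees with it, and models each record as a dict with a 'category' key:

    def vote_element_info(init_info, assoc_infos):
        # Select, among the N+1 element-information records, the record
        # backed by the largest number of agreeing records.
        records = [init_info, *assoc_infos]

        def support(record):
            return sum(1 for other in records
                       if other["category"] == record["category"])

        return max(records, key=support)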

In one embodiment, the processing unit 501 is configured to detect the authenticity of the candidate object element based on the target element information, and is specifically configured to:

if the confidence in the target element information is greater than or equal to a confidence threshold, determine that the candidate object element is authentic;

and if the confidence in the target element information is smaller than the confidence threshold, determine that the candidate object element is not authentic.

In one embodiment, the processing unit 501 is further configured to:

after all perspective images in the multi-perspective image set have been traversed, obtain the authenticity detection results of the H candidate object elements detected from all the perspective images, where H is a positive integer;

and determine the object detection result of the object to be detected according to the authenticity detection results of the H candidate object elements, the object detection result indicating whether the object to be detected passes detection.
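The aggregation rule is likewise left open by the embodiment; the sketch below assumes the object passes detection only when every one of the H candidate elements has been judged authentic:

    def judge_object(authenticity_results):
        # authenticity_results: list of H booleans, one per candidate element.
        return len(authenticity_results) > 0 and all(authenticity_results)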

According to an embodiment of the present application, some of the steps involved in the object detection methods shown in fig. 2 and fig. 3 may be performed by the corresponding units of the object detection apparatus shown in fig. 5. For example, steps S201, S202, S204, and S205 shown in fig. 2 may be executed by the processing unit 501 shown in fig. 5, and step S203 may be executed by the acquisition unit 502 shown in fig. 5. Steps S301 to S303 and S305 to S308 shown in fig. 3 may be performed by the processing unit 501, and step S304 by the acquisition unit 502. The units of the object detection apparatus shown in fig. 5 may be combined, individually or collectively, into one or several other units, or one or more of them may be further split into functionally smaller units; either way the apparatus performs the same operations without affecting the technical effects of the embodiments of the present application. The units are divided on the basis of logical functions; in practice, the functions of one unit may be realized by several units, or the functions of several units by one unit. In other embodiments of the present application, the object detection apparatus may likewise include other units, and in practice these functions may be realized with the assistance of, and through the cooperation of, multiple other units.

According to another embodiment of the present application, the object detection apparatus shown in fig. 5 may be constructed, and the object detection method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 and fig. 3 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may, for example, be recorded on a computer-readable recording medium, and be loaded into and executed by the above computing device via that medium.

Based on the same inventive concept, the principle and beneficial effects of the object detection apparatus provided in the embodiments of the present application in solving the problem are similar to those of the object detection method in the embodiments of the present application; for brevity, reference may be made to the principle and beneficial effects of the method, which are not repeated here.

Referring to fig. 6, fig. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present disclosure. As shown in fig. 6, the object detection device includes at least a processor 601, a communication interface 602, and a memory 603, which may be connected by a bus or in other ways. The processor 601 (or Central Processing Unit, CPU) is the computing and control core of the terminal; it can parse the various instructions in the terminal and process the terminal's various data. For example, the CPU can parse a power-on/off instruction sent to the terminal by a user and control the terminal to perform the power-on/off operation; as another example, the CPU can transfer various kinds of interactive data between the internal structures of the terminal. The communication interface 602 may optionally include a standard wired interface or a wireless interface (e.g., WI-FI or a mobile communication interface); controlled by the processor 601, it transmits and receives data, and it can also be used for transmitting and exchanging data inside the terminal. The memory 603 is a storage device in the terminal for storing programs and data. It is understood that the memory 603 here may include both the terminal's built-in memory and extended memory supported by the terminal. The memory 603 provides storage space that stores the terminal's operating system, which may include, but is not limited to, the Android system, the iOS system, the Windows Phone system, and so on; this application places no limitation here.

An embodiment of the present application further provides a computer-readable storage medium (memory), which is a storage device in the terminal for storing programs and data. It is understood that the computer-readable storage medium here may include both a built-in storage medium of the terminal and an extended storage medium supported by the terminal. The computer-readable storage medium provides storage space that stores the processing system of the terminal. One or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are suitable for being loaded and executed by the processor 601. The computer-readable storage medium here may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the aforementioned processor.

In one embodiment, the computer-readable storage medium stores one or more instructions; the one or more instructions are loaded and executed by the processor 601 to implement the corresponding steps of the above object detection method embodiments. In a specific implementation, the one or more instructions in the computer-readable storage medium are loaded by the processor 601 to perform the following operations:

traversing each perspective image in the multi-perspective image set of the object to be detected, where different perspective images are obtained by photographing the object to be detected from different shooting angles;

if an object element to be detected is detected in the currently traversed current perspective image, taking the detected object element as a candidate object element of the object to be detected, and determining the position of the candidate object element in the current perspective image and the initial element information of the candidate object element detected from the current perspective image;

acquiring an associated perspective image of the current perspective image from the remaining perspective images in the multi-perspective image set other than the current perspective image, where a projection relationship exists between the current perspective image and the associated perspective image;

retrieving the candidate object element in the associated perspective image based on the projection relationship between the current perspective image and the associated perspective image and the position of the candidate object element in the current perspective image;

and generating target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element, and detecting the authenticity of the candidate object element based on the target element information.

As an optional embodiment, the projection relationship between the current perspective image and the associated perspective image is one of the projection relationships in the multi-perspective joint modeling result of the multi-perspective image set;

any projection relationship in the multi-perspective joint modeling result is constructed as follows:

selecting one perspective image from the multi-perspective image set of the object to be detected as a base perspective image, and selecting, from the multi-perspective image set, a perspective image that has an overlapping area with the base perspective image as a reference perspective image;

determining a plurality of base calibration points in the base perspective image and a plurality of reference calibration points in the reference perspective image, where each base calibration point corresponds to one reference calibration point, and a calibration point is a point obtained by marking the position of a feature point of the object to be detected in a perspective image;

and calculating the projection relationship between the base perspective image and the reference perspective image according to the position coordinates of each base calibration point and the position coordinates of its corresponding reference calibration point.

As an alternative embodiment, a specific implementation in which the processor 601 calculates the projection relationship between the base perspective image and the reference perspective image according to the position coordinates of each base calibration point and the position coordinates of the corresponding reference calibration point is as follows:

acquiring a perspective transformation matrix to be solved, the matrix containing a plurality of parameters to be solved;

projecting the position coordinates of each base calibration point from the two-dimensional space of the base perspective image into a three-dimensional space using the perspective transformation matrix to be solved, so as to obtain the three-dimensional space coordinates of each base calibration point, the three-dimensional space coordinates of each base calibration point containing a plurality of the parameters;

projecting the three-dimensional space coordinates of each base calibration point into the two-dimensional space of the reference perspective image to obtain the projection coordinates of each base calibration point, the projection coordinates of each base calibration point containing a plurality of the parameters;

solving for the value of each parameter in the perspective transformation matrix under the constraint that the projection coordinates of each base calibration point equal the position coordinates of the corresponding reference calibration point, so as to obtain the solved perspective transformation matrix;

and adopting the solved perspective transformation matrix as the projection relationship between the base perspective image and the reference perspective image.

As an alternative embodiment, the processor 601 retrieves the candidate object element in the associated perspective image based on the projection relationship between the current perspective image and the associated perspective image and the position of the candidate object element in the current perspective image, specifically by performing the following steps:

projecting the position of the candidate object element in the current perspective image into the associated perspective image based on the projection relationship between the two images, so as to obtain a projection position;

if K object elements are detected in the associated perspective image, determining the position of each of the K object elements in the associated perspective image, where K is a positive integer;

calculating the position matching degree between the position of each object element in the associated perspective image and the projection position, and searching the calculated position matching degrees for one greater than a matching-degree threshold;

if the search succeeds, determining that the candidate object element is retrieved in the associated perspective image; and if the search fails, determining that the candidate object element is not retrieved in the associated perspective image.

As an optional embodiment, the position of each object element in the associated perspective image is identified by a labeling box, and the projection position is identified by a projection box; a specific implementation in which the processor 601 calculates the position matching degree between the position of each object element in the associated perspective image and the projection position is as follows:

calculating the intersection-over-union between the position of the k-th object element in the associated perspective image and the projection position, where k ∈ [1, K];

and determining the calculated intersection-over-union as the position matching degree between the position of the k-th object element in the associated perspective image and the projection position.

As an alternative embodiment, the initial element information includes the confidence of the candidate object element; a specific implementation in which the processor 601 generates the target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element is as follows:

if the retrieval result indicates that the candidate object element is not retrieved in the associated perspective image, reducing the confidence in the initial element information;

and taking the element information obtained after the reduction as the target element information of the candidate object element.

As an alternative embodiment, a specific implementation in which the processor 601 generates the target element information of the candidate object element according to the retrieval result and the initial element information of the candidate object element is as follows:

if the retrieval result indicates that the candidate object element is retrieved in the associated perspective image, taking the object element information of the candidate object element detected in the associated perspective image as associated element information;

and fusing the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element.

As an alternative embodiment, the initial element information includes a confidence of the candidate object element, and the associated element information also includes a confidence of the candidate object element;

a specific implementation in which the processor 601 fuses the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element is as follows:

acquiring a first weight for the initial element information and a second weight for the associated element information;

computing a weighted sum of the confidence in the initial element information and the confidence in the associated element information using the first weight and the second weight, so as to obtain a target confidence;

and adding the target confidence to the target element information of the candidate object element.

As an alternative embodiment, the initial element information includes the element category of the candidate object element, the number of associated perspective images is N, the associated element information of each associated perspective image includes an element category of the candidate object element, and N is an integer greater than 1;

a specific implementation in which the processor 601 fuses the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element is as follows:

if the element category in the initial element information is the same as the element category in every piece of associated element information, taking that element category as the target element category and adding it to the target element information of the candidate object element;

if the element category in at least one piece of associated element information differs from the element category in the initial element information, counting the occurrences of each element category, determining the most frequent element category as the target element category, and adding the target element category to the target element information of the candidate object element.

As an alternative embodiment, the initial element information includes the confidence and the element category of the candidate object element, and the associated element information includes the element category of the candidate object element;

a specific implementation in which the processor 601 fuses the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element is as follows:

if the element category in the initial element information is the same as the element category in the associated element information, increasing the confidence in the initial element information, and taking the element information obtained after the increase as the target element information of the candidate object element;

and if the element category in the initial element information differs from the element category in the associated element information, reducing the confidence in the initial element information, and taking the element information obtained after the reduction as the target element information of the candidate object element.

As an optional embodiment, the number of associated perspective images is N, where N is an integer greater than 1, and each associated perspective image corresponds to one piece of associated element information;

a specific implementation in which the processor 601 fuses the initial element information and the associated element information of the candidate object element to obtain the target element information of the candidate object element is as follows:

voting over the initial element information and the N pieces of associated element information of the candidate object element, so as to obtain the number of support votes for the initial element information and for each piece of associated element information;

and selecting, from the initial element information and the N pieces of associated element information, the element information with the largest number of support votes as the target element information of the candidate object element.

As an alternative embodiment, a specific implementation in which the processor 601 detects the authenticity of the candidate object element based on the target element information is as follows:

if the confidence in the target element information is greater than or equal to a confidence threshold, determining that the candidate object element is authentic;

and if the confidence in the target element information is smaller than the confidence threshold, determining that the candidate object element is not authentic.

As an alternative embodiment, the processor 601, by executing the executable program code in the memory 603, further performs the following operations:

after all perspective images in the multi-perspective image set have been traversed, obtaining the authenticity detection results of the H candidate object elements detected from all the perspective images, where H is a positive integer;

and determining the object detection result of the object to be detected according to the authenticity detection results of the H candidate object elements, the object detection result indicating whether the object to be detected passes detection.

Based on the same inventive concept, the principle and beneficial effects of the object detection device provided in the embodiments of the present application in solving the problem are similar to those of the object detection method in the embodiments of the present application; for brevity, reference may be made to the principle and beneficial effects of the method, which are not repeated here.

An embodiment of the present application further provides a computer-readable storage medium in which one or more instructions are stored, the one or more instructions being adapted to be loaded by a processor to execute the object detection method of the foregoing method embodiments.

An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the object detection method of the foregoing method embodiments.

An embodiment of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above object detection method.

The steps of the methods in the embodiments of the present application may be reordered, combined, or deleted according to actual needs.

The modules of the devices in the embodiments of the present application may be merged, divided, or deleted according to actual needs.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
