Picture duplicate checking method and device and computer readable storage medium

Document No.: 1875547. Publication date: 2021-11-23.

Abstract: This technology, "Picture duplicate checking method and device and computer readable storage medium", was created by 金彬 and 陈杰 on 2020-05-19. The application provides a picture duplicate checking method, a picture duplicate checking device and a computer readable storage medium. The method comprises: generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked; performing a picture duplicate checking calculation on the character string through a bloom filter; and outputting a duplicate checking result through the bloom filter. This technical scheme simplifies the picture duplicate checking process and improves duplicate checking efficiency.

1. A picture duplication checking method is characterized by comprising the following steps:

generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked;

performing picture duplication checking calculation on the character string through a bloom filter;

and outputting a duplicate checking result through the bloom filter.

2. The picture duplication checking method of claim 1, wherein the step of performing the picture duplication checking calculation on the character string through the bloom filter comprises:

generating a specified number of hash values for the character string;

detecting whether the array positions corresponding to the specified number of hash values of the character string are all 1;

the step of outputting the duplicate checking result through the bloom filter comprises the following steps:

when the array positions corresponding to the specified number of hash values of the character string are all 1, outputting the duplicate checking result that the picture to be checked is recorded;

and when the array positions corresponding to the specified number of hash values of the character string are not all 1, setting those array positions to 1, and outputting the duplicate checking result that the picture to be checked is not recorded.

3. The picture duplication checking method of claim 1 or 2,

if the picture to be checked is an original picture not marked with structured information, then

the attribute information includes original picture information, wherein

the original picture information includes a picture file volume and a picture file binary content.

4. The picture duplication checking method of claim 1 or 2,

if the picture to be checked is a marked picture marked with structured information, then

the attribute information includes original picture information and/or structured information, wherein

the original picture information comprises a picture file volume and a picture file binary content;

the structured information comprises a picture identification, graph marking information and label marking information.

5. The picture duplication checking method of claim 4, further comprising:

when the duplicate checking result is that the picture to be checked is not recorded, storing the character string corresponding to the picture to be checked into a structured information database;

when the duplicate checking result is that the picture to be checked is recorded, storing the character string corresponding to the picture to be checked into a misjudgment list associated with the structured information database;

and providing the structured information database and the misjudgment list to the sender of a structured information acquisition instruction based on the received structured information acquisition instruction.

6. The picture duplication checking method of claim 1 or 2,

if the picture to be checked is a picture to be marked, for which structured information to be marked has been set, then

the attribute information comprises a picture identification and the structured information to be marked.

7. The picture duplication checking method according to claim 6, further comprising, before the step of generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked:

selecting, from a sample picture set, the picture to be checked which meets the specified picture shooting condition;

setting the structured information to be marked for the picture to be checked;

the step of generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked comprises the following steps:

generating the corresponding character string for the picture to be checked based on the picture identification generated for the picture to be checked and the structured information to be marked;

after the step of outputting the duplicate checking result through the bloom filter, the method further comprises the following steps:

when the duplicate checking result is that the picture to be checked is not recorded, marking the picture to be checked with the structured information to be marked;

and when the duplicate checking result is that the picture to be checked is recorded, discarding the picture to be checked.

8. The picture duplication checking method according to claim 1, wherein the step of generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked comprises:

respectively generating corresponding sub-character strings for each item of attribute information of the picture to be checked;

and merging the sub character strings of each item of attribute information according to a specified sequence to obtain the character string corresponding to the picture to be checked.

9. A picture duplication checking device is characterized by comprising:

the character string generating unit is used for generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked;

the bloom filter calculation unit is used for carrying out picture duplication checking calculation on the character string through a bloom filter;

and the duplicate checking result output unit is used for outputting the duplicate checking result through the bloom filter.

10. A computer-readable storage medium having stored thereon computer-executable instructions for performing the method flow of any of claims 1-8.

[ technical field ]

The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for checking duplicate pictures, and a computer-readable storage medium.

[ background of the invention ]

Training for machine learning requires a large amount of sample data labeled with structured information, which indicates attributes such as the type or content of the sample data. To ensure the accuracy of the training result, the data must be deduplicated. To this end, the related art establishes a data management system that records the structured information of the sample data. Because each item of sample data has unique structured information, newly added data can be checked by detecting whether the data management system already holds structured information identical to that of the new data.

However, the amount of sample data is huge, so the amount of structured information in the data management system is also huge. The structured information is also of multiple types, such as label box type and entity label. To query whether newly added data has already been recorded, each item of structured information of the new data must be compared once against the mass of data in the data management system. This process is computationally intensive, consumes a great deal of time and system resources, and makes duplicate checking inefficient.

Therefore, how to improve the efficiency of sample data duplication checking in the machine learning training process becomes a technical problem to be solved urgently at present.

[ summary of the invention ]

The embodiment of the application provides a picture duplicate checking method and device and a computer readable storage medium, and aims to solve the technical problem that sample data duplicate checking efficiency is low in a machine learning training process in the related technology.

In a first aspect, an embodiment of the present application provides a picture duplicate checking method, including: generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked; performing picture duplication checking calculation on the character string through a bloom filter; and outputting a duplicate checking result through the bloom filter.

In the foregoing embodiment of the present application, optionally, the step of performing the picture duplicate checking calculation on the character string through a bloom filter includes: generating a specified number of hash values for the character string; and detecting whether the array positions corresponding to the specified number of hash values of the character string are all 1. The step of outputting the duplicate checking result through the bloom filter includes: when the array positions corresponding to the specified number of hash values of the character string are all 1, outputting the duplicate checking result that the picture to be checked is recorded; and when those array positions are not all 1, setting them to 1 and outputting the duplicate checking result that the picture to be checked is not recorded.

In the above embodiment of the present application, optionally, the picture to be checked is an original picture without structured information, and the attribute information includes original picture information, where the original picture information includes a picture file volume and a picture file binary content.

In the above embodiment of the present application, optionally, the to-be-checked picture is a marked picture marked with structured information, and the attribute information includes original picture information and/or structured information, where the original picture information includes a picture file volume and a picture file binary content; the structured information comprises picture identification, graph marking information and label marking information.

In the above embodiments of the present application, optionally, the method further includes: when the duplicate checking result is that the picture to be checked is not recorded, storing the character string corresponding to the picture to be checked into a structured information database; when the duplicate checking result is that the picture to be checked is recorded, storing the character string corresponding to the picture to be checked into a misjudgment list associated with the structured information database; and providing the structured information database and the misjudgment list to the sender of a structured information acquisition instruction based on the received structured information acquisition instruction.
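The routing between the structured information database and the misjudgment list might be sketched as follows. This is only an illustration under stated assumptions: the function name `route_string`, the use of plain Python lists for the database and the misjudgment list, and the set-based stand-in for the bloom filter are all hypothetical and do not come from the application. A separate list for "recorded" results is useful because a bloom filter can, in general, report a string as recorded when it is not (a false positive), so such strings can be reviewed later.

```python
from typing import Callable, List


def route_string(
    s: str,
    check_and_record: Callable[[str], bool],
    database: List[str],
    misjudgment_list: List[str],
) -> None:
    """Store s according to the duplicate checking result.

    check_and_record(s) is assumed to return True when the filter
    reports s as already recorded, and False otherwise.
    """
    if check_and_record(s):
        # Reported as recorded: keep it aside for later review.
        misjudgment_list.append(s)
    else:
        # Reported as not recorded: store it as a new entry.
        database.append(s)


# Hypothetical stand-in for the bloom filter: an exact set.
_seen = set()


def exact_check_and_record(s: str) -> bool:
    recorded = s in _seen
    _seen.add(s)
    return recorded


database: List[str] = []
misjudgment_list: List[str] = []
for item in ["str-a", "str-b", "str-a"]:
    route_string(item, exact_check_and_record, database, misjudgment_list)
```

With the exact stand-in, only the genuine repeat reaches the misjudgment list; with a real bloom filter, that list could also contain strings that were never recorded.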

In the above embodiment of the present application, optionally, if the to-be-checked picture is a to-be-marked picture with structural information to be marked, the attribute information includes a picture identifier and the structural information to be marked.

In the foregoing embodiment of the present application, optionally, before the step of generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked, the method further includes: selecting, from a sample picture set, the picture to be checked which meets the specified picture shooting condition; and setting the structured information to be marked for the picture to be checked. The step of generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked includes: generating the corresponding character string for the picture to be checked based on the picture identification generated for the picture to be checked and the structured information to be marked. After the step of outputting the duplicate checking result through the bloom filter, the method further includes: when the duplicate checking result is that the picture to be checked is not recorded, marking the picture to be checked with the structured information to be marked; and when the duplicate checking result is that the picture to be checked is recorded, discarding the picture to be checked.

In the foregoing embodiment of the present application, optionally, the step of generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked includes: respectively generating corresponding sub-character strings for each item of attribute information of the picture to be checked; and merging the sub character strings of each item of attribute information according to a specified sequence to obtain the character string corresponding to the picture to be checked.

In a second aspect, an embodiment of the present application provides an apparatus for duplicate checking of pictures, including: the character string generating unit is used for generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked; the bloom filter calculation unit is used for carrying out picture duplication checking calculation on the character string through a bloom filter; and the duplicate checking result output unit is used for outputting the duplicate checking result through the bloom filter.

In the foregoing embodiment of the present application, optionally, the bloom filter calculation unit is specifically configured to: generate a specified number of hash values for the character string; and detect whether the array positions corresponding to the specified number of hash values of the character string are all 1. The duplicate checking result output unit is configured to: when the array positions corresponding to the specified number of hash values of the character string are all 1, output the duplicate checking result that the picture to be checked is recorded; and when those array positions are not all 1, set them to 1 and output the duplicate checking result that the picture to be checked is not recorded.

In the above embodiment of the present application, optionally, the picture to be checked is an original picture without structured information, and the attribute information includes original picture information, where the original picture information includes a picture file volume and a picture file binary content.

In the above embodiment of the present application, optionally, the to-be-checked picture is a marked picture marked with structured information, and the attribute information includes original picture information and/or structured information, where the original picture information includes a picture file volume and a picture file binary content; the structured information comprises picture identification, graph marking information and label marking information.

In the above embodiments of the present application, optionally, the device further includes: a character string storage unit, configured to store the character string corresponding to the picture to be checked into a structured information database when the duplicate checking result is that the picture to be checked is not recorded, and to store the character string corresponding to the picture to be checked into a misjudgment list associated with the structured information database when the duplicate checking result is that the picture to be checked is recorded; and an information providing unit, configured to provide the structured information database and the misjudgment list to the sender of a structured information acquisition instruction based on the received structured information acquisition instruction.

In the above embodiment of the present application, optionally, the picture to be checked is a picture to be marked, for which structured information to be marked has been set, and the attribute information includes a picture identification and the structured information to be marked.

In the above embodiments of the present application, optionally, the device further includes: a picture screening unit, configured to select, from the sample picture set, the picture to be checked which meets the specified picture shooting condition before the character string generating unit generates the corresponding character string for the picture to be checked; and a content to be marked setting unit, configured to set the structured information to be marked for the picture to be checked. The character string generating unit is configured to generate the corresponding character string for the picture to be checked based on the picture identification generated for the picture to be checked and the structured information to be marked. The picture duplicate checking device further includes: a first execution unit, configured to mark the picture to be checked with the structured information to be marked when, after the duplicate checking result is output, the result is that the picture to be checked is not recorded; and a second execution unit, configured to discard the picture to be checked when the duplicate checking result is that the picture to be checked is recorded.

In the foregoing embodiment of the present application, optionally, the character string generating unit is configured to: respectively generating corresponding sub-character strings for each item of attribute information of the picture to be checked; and merging the sub character strings of each item of attribute information according to a specified sequence to obtain the character string corresponding to the picture to be checked.

In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of any of the first aspects above.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions for performing the method flow of any one of the first aspect.

By the aid of the technical scheme, the picture duplicate checking process before machine learning training is simplified, and duplicate checking efficiency is improved.

[ description of the drawings ]

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 shows a flow diagram of a picture duplication checking method according to an embodiment of the present application;

FIG. 2 shows a flow diagram of a picture duplication checking method according to another embodiment of the present application;

FIG. 3 shows a flow diagram of a picture duplication checking method according to yet another embodiment of the present application;

FIG. 4 shows a flow diagram of a picture duplication checking method according to yet another embodiment of the present application;

FIG. 5 shows a block diagram of a picture duplicate checking device according to an embodiment of the present application;

FIG. 6 shows a block diagram of an electronic device according to an embodiment of the application.

[ detailed description ]

For better understanding of the technical solutions of the present application, the following detailed descriptions of the embodiments of the present application are provided with reference to the accompanying drawings.

It should be understood that the embodiments described are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

Fig. 1 shows a flowchart of a picture duplication checking method according to an embodiment of the present application.

As shown in fig. 1, a process of a picture duplicate checking method according to an embodiment of the present application includes:

Step 102, generating a corresponding character string for the picture to be checked based on the attribute information of the picture to be checked.

In machine learning tasks such as face detection, face recognition, human body recognition, animal recognition, scene recognition and safety monitoring, models are often trained on a large number of sample pictures. Before training, the sample pictures are first checked for duplicates, so that repeated training on the same sample picture does not distort the machine learning result.

In the related art, when a picture with multiple items of attribute information is checked for duplicates, a mass query is often performed in a data management system for each item of attribute information, which consumes a large amount of system resources and time. In the technical scheme of the present application, a single character string is generated from the various items of attribute information of the picture to be checked, and only that character string is checked for duplicates. This reduces the number of duplicate checks, greatly reduces the duplicate checking workload, and improves duplicate checking efficiency.

In an implementation manner of the present application, a corresponding sub-character string may be generated for each item of attribute information of the picture to be checked, and the sub-character strings of the items of attribute information are merged in a specified order to obtain the character string corresponding to the picture to be checked.

The specified order can be preset according to the actual requirements of duplicate checking. Once preset, the sub-character strings of every picture to be checked are merged in that same order, so that pictures with identical attribute information yield identical character strings.
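The merging of sub-character strings in a specified order might be sketched as follows. This is a minimal illustration, not the application's own implementation: the function name `build_check_string`, the `name=value` form of each sub-character string, the `|` separator and the sample attribute names are all assumptions made for the example.

```python
from typing import Mapping, Sequence


def build_check_string(
    attributes: Mapping[str, object],
    order: Sequence[str],
    sep: str = "|",
) -> str:
    """Merge one sub-string per attribute, always in the given order."""
    # One sub-string per item of attribute information (assumed format).
    substrings = [f"{name}={attributes[name]}" for name in order]
    # Merging in a fixed order makes the result independent of how the
    # attributes happen to be stored.
    return sep.join(substrings)


# Hypothetical preset order and attribute values.
ORDER = ["picture_id", "shape", "label"]
first = build_check_string(
    {"picture_id": "img-001", "shape": "rectangle", "label": "person"}, ORDER
)
second = build_check_string(
    {"label": "person", "shape": "rectangle", "picture_id": "img-001"}, ORDER
)
print(first)
```

Both calls produce the same character string because the order is fixed in advance rather than taken from the input.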

Step 104, performing a picture duplicate checking calculation on the character string through a bloom filter.

And step 106, outputting a duplicate checking result through the bloom filter.

Further, the bloom filter performs the following procedure:

generating a specified number of hash values for the character string; and detecting whether the array positions corresponding to the specified number of hash values of the character string are all 1. Step 106 includes: when the array positions corresponding to the specified number of hash values of the character string are all 1, outputting the duplicate checking result that the picture to be checked is recorded; and when those array positions are not all 1, setting them to 1 and outputting the duplicate checking result that the picture to be checked is not recorded.

When a string str is to be recorded in the bloom filter, K hash values h(1, str), h(2, str), …, h(K, str) are calculated for the string str, and the array positions corresponding to h(1, str), h(2, str), …, h(K, str) are then set to 1. To detect whether the string str has already been recorded in the bloom filter, the same K hash values h(1, str), h(2, str), …, h(K, str) are calculated for str, and the array positions corresponding to these hash values are checked to see whether they are all 1. If they are all 1, the string str is considered to exist, that is, str is recorded. If they are not all 1, the string str is considered to be absent, that is, str is not recorded; at this point the array positions corresponding to the hash values can all be set to 1, which completes the recording of str.
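The check-then-record procedure described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the application's implementation: the class name `BloomFilter`, the array size of 2^20 bits, and the way h(i, str) is derived (SHA-256 over the index i and the string, reduced modulo the array size) are all choices made for the example; the source does not specify the hash functions.

```python
import hashlib
from typing import List


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 10) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # K in the description
        self.bits = bytearray((num_bits + 7) // 8)  # bit array, all 0

    def _positions(self, s: str) -> List[int]:
        """Array positions for h(1, s) ... h(K, s) (assumed construction)."""
        positions = []
        for i in range(1, self.num_hashes + 1):
            digest = hashlib.sha256(f"{i}:{s}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def check_and_record(self, s: str) -> bool:
        """Return True if s is reported as recorded; otherwise record it."""
        positions = self._positions(s)
        recorded = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        if not recorded:
            # Not all positions are 1: set them to 1 to record s.
            for p in positions:
                self.bits[p >> 3] |= 1 << (p & 7)
        return recorded


bloom = BloomFilter()
first_seen = bloom.check_and_record("picture-string")   # not yet recorded
second_seen = bloom.check_and_record("picture-string")  # now recorded
```

A "not recorded" answer from such a filter is always correct, whereas a "recorded" answer can occasionally be a false positive when other strings have already set all K positions.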

After the character string is generated, it is input into the bloom filter, which replaces the mass query against a data management system used in the related art. The bloom filter occupies little memory, and a query against data on the order of 100 million entries can be answered within milliseconds, so the picture duplicate checking process is greatly simplified and duplicate checking efficiency is improved.
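As a rough check on the memory claim, the standard bloom filter sizing formulas can be evaluated for 100 million entries. The sketch below is illustrative only: the function name `bloom_parameters` and the target false positive rate of 0.001 are assumptions, since the application does not state a target rate. The formulas used are the usual ones, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import math
from typing import Tuple


def bloom_parameters(n_items: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (number of bits m, number of hash functions k)."""
    m_bits = math.ceil(
        -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    k = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k


# 100 million entries, assumed target false positive rate of 0.1 %.
m_bits, k = bloom_parameters(100_000_000, 0.001)
megabytes = m_bits / 8 / 1e6
print(f"k = {k}, memory = {megabytes:.0f} MB")
```

Under these assumptions the filter needs about 180 MB and 10 hash functions, independent of how long the individual character strings are.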

Further, the picture to be checked may be an original picture without structured information, a marked picture marked with structured information, or a picture to be marked, for which structured information to be marked has been set. The duplicate checking modes for these three types of pictures are described in detail through the embodiments of fig. 2 to 4 below.

Fig. 2 shows a flowchart of a picture duplication checking method according to another embodiment of the present application.

When the picture to be checked is an original picture without structured information, the attribute information comprises original picture information, wherein the original picture information comprises a picture file volume (that is, the file size) and picture file binary content. As shown in fig. 2, a flow of a picture duplicate checking method according to another embodiment of the present application includes:

Step 202, generating a first character string corresponding to the original picture according to the picture file volume and the picture file binary content of the original picture without structured information.

The first character string is generated from the picture file volume and the picture file binary content of the original picture, and can then be input into the bloom filter for the duplicate checking steps that follow. The multiple queries on different attribute information in the related art are thereby reduced to a single detection on the first character string, and the mass data query against a data management system is replaced by processing in the bloom filter. This simplifies the picture duplicate checking process, greatly reduces the calculation amount and time consumed, and improves duplicate checking efficiency.
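One way the first character string might be formed is sketched below. The application does not say how the file volume and the binary content are turned into a string, so everything here is an assumption made for illustration: the function name `first_string`, the use of a SHA-256 digest to stand in for the binary content, and the `size:digest` layout.

```python
import hashlib


def first_string(content: bytes) -> str:
    """Build a string from the file size and the binary content."""
    size = len(content)  # picture file volume in bytes
    # Assumption: a digest of the content stands in for the raw bytes,
    # which keeps the string short regardless of the picture size.
    digest = hashlib.sha256(content).hexdigest()
    return f"{size}:{digest}"


# Hypothetical picture contents for the example.
picture_a = b"\x89PNG example bytes"
picture_b = b"\x89PNG example bytez"
string_a = first_string(picture_a)
string_a_again = first_string(picture_a)
string_b = first_string(picture_b)
```

Identical files give identical strings, while files of the same size but different content give different strings, which is the property the duplicate check relies on. In practice the bytes would be read from the picture file, for example with `open(path, "rb").read()`.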

Step 204, generating 10 hash values for the first character string.

Step 206, detecting whether the array positions corresponding to the 10 hash values of the first character string are all 1; if so, entering step 208; otherwise, entering step 210.

Step 208, determining that the original picture is recorded, and discarding the original picture.

Step 210, determining that the original picture is not recorded, and setting the array positions corresponding to the 10 hash values of the first character string to 1, so as to record the original picture.

When a string str is to be recorded in the bloom filter, K hash values h(1, str), h(2, str), …, h(K, str) are calculated for the string str, and the array positions corresponding to h(1, str), h(2, str), …, h(K, str) are then set to 1. To detect whether the string str has already been recorded in the bloom filter, the same K hash values are calculated for str, and the array positions corresponding to these hash values are checked to see whether they are all 1. If they are all 1, the string str is considered to exist, that is, str is recorded. If they are not all 1, the string str is considered to be absent, that is, str is not recorded; at this point the array positions corresponding to the hash values can all be set to 1, which completes the recording of str.

After the first character string is generated, it is input into the bloom filter, which replaces the mass query against a data management system used in the related art. The bloom filter occupies little memory, and a query against data on the order of 100 million entries can be answered within milliseconds, so the picture duplicate checking process is greatly simplified and duplicate checking efficiency is improved.

In a practical face recognition scenario, when a neural network model for face recognition is trained, a large number of sample face pictures that are not marked with structured information need to be checked for duplicates.

Specifically, a character string a corresponding to the sample face picture is generated according to the picture file volume and the picture file binary content of the sample face picture without structured information, and the character string a is then input into the bloom filter for the duplicate checking steps that follow.

In the bloom filter, 10 hash values are generated for the character string a, and it is then detected whether the array positions corresponding to these 10 hash values are all 1. If so, the sample face picture is determined to be recorded and is discarded. Otherwise, the sample face picture is determined to be not recorded, and the array positions corresponding to the 10 hash values of the character string a are set to 1, so that the sample face picture is recorded as a valid sample. The bloom filter occupies little memory, and a query against data on the order of 100 million entries can be answered within milliseconds, so the picture duplicate checking process is greatly simplified and duplicate checking efficiency is improved.

Fig. 3 shows a flowchart of a picture duplication checking method according to yet another embodiment of the present application.

Under the condition that the picture to be checked is the marked picture marked with the structural information, the attribute information comprises original picture information and/or structural information, wherein the original picture information comprises a picture file volume and picture file binary content; the structured information comprises picture identification, graph marking information and label marking information. The picture identification is the unique ID of the picture, the graph marking information is the shape of an identification frame in the picture, such as a rectangular frame, a circular frame and the like, and the label marking information is the type of marked content in the picture, such as people, animals, inanimate objects and the like. Of course, the attribute information may include a variety of information under the original picture information and/or under the structured information.

As shown in fig. 3, taking attribute information that includes the picture identifier, the graphic labeling information and the label labeling information as an example, the method includes:

Step 302, generating a corresponding second character string according to the picture identifier, the graph marking information and the label marking information of the marked picture marked with the structural information.

The second character string is generated from the picture identifier, the graph marking information and the label marking information of the marked picture, and is input into a bloom filter for duplicate checking in the following steps. The multiple queries on different attribute information required in the related art are thereby reduced to a single detection on the second character string, and mass data queries through a data management system are replaced by bloom filter processing. This simplifies the picture duplicate checking process, greatly reduces the amount of calculation and the time consumed, and improves the duplicate checking efficiency.

Step 304, generating 10 hash values for the second character string.

Step 306, detecting whether the array positions corresponding to the 10 hash values of the second character string are all 1; if so, proceeding to step 308, otherwise proceeding to step 310.

Step 308, determining that the marked picture has been recorded, and discarding the marked picture.

Step 310, determining that the marked picture has not been recorded, and setting the array positions corresponding to the 10 hash values of the second character string to 1 to record the marked picture.
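Steps 302 to 310 can be sketched end to end as follows. The separator used to build the second character string, the array length `M`, the SHA-256-based hash positions, and the example identifiers are all assumptions; only the count of 10 hash values and the control flow come from the steps above.

```python
import hashlib

K = 10       # number of hash values, as in steps 304 to 310
M = 1 << 20  # array length; hypothetical, the text does not give one


def second_string(picture_id: str, shape: str, label: str) -> str:
    # Step 302: one string built from the three items of structured information.
    return "|".join((picture_id, shape, label))


def hash_positions(s: str) -> list[int]:
    # Step 304: K array positions derived from SHA-256 (an assumed hash choice).
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{s}".encode("utf-8")).digest()[:8], "big") % M
        for i in range(1, K + 1)
    ]


def check_marked_picture(bits: bytearray, picture_id: str, shape: str, label: str) -> bool:
    """Return True if the picture is newly recorded, False if it is discarded as a duplicate."""
    positions = hash_positions(second_string(picture_id, shape, label))
    if all(bits[p] == 1 for p in positions):  # step 306
        return False                          # step 308: already recorded, discard
    for p in positions:                       # step 310: record the picture
        bits[p] = 1
    return True


bits = bytearray(M)
first = check_marked_picture(bits, "img-001", "rectangular frame", "person")
second = check_marked_picture(bits, "img-001", "rectangular frame", "person")
```

Here `first` is True because the picture is recorded on first sight, and `second` is False because the same three attributes produce the same second character string and therefore the same 10 positions.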

After the second character string is generated, it is input into the bloom filter, so that the bloom filter replaces the massive queries against a data management system used in the related art. The bloom filter occupies little memory, and a query against data on the order of 100 million records can return within milliseconds, which greatly simplifies the picture duplicate checking process and improves the duplicate checking efficiency.

In an implementation manner of the present application, recording the marked picture includes storing the second character string corresponding to the marked picture in a structured information database.

In another implementation manner of the present application, after the marked picture is discarded, the second character string corresponding to the marked picture may be stored in a misjudgment list associated with the structured information database. Then, based on a received structured information acquisition instruction, the structured information database and the misjudgment list are provided to the sender of the structured information acquisition instruction.

In this way, both the structured information of the valid sample data recorded in the structured information database and the structured information of the sample data found to be duplicates, held in the misjudgment list, can be obtained. This facilitates data statistics, and the misjudgment list can be analyzed during subsequent monitoring or troubleshooting.
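The bookkeeping described in the two implementation manners above might look like the sketch below. The class name `StructuredInfoStore`, its method names, and the use of in-memory lists in place of a real database are hypothetical; the duplicate checking result is passed in as a boolean so that the sketch stays independent of any particular bloom filter.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredInfoStore:
    """In-memory stand-in for the structured information database and its misjudgment list."""

    database: list[str] = field(default_factory=list)
    misjudgment_list: list[str] = field(default_factory=list)

    def store(self, s: str, already_recorded: bool) -> None:
        # Strings of discarded pictures go to the misjudgment list, all others to the database.
        if already_recorded:
            self.misjudgment_list.append(s)
        else:
            self.database.append(s)

    def handle_acquisition_instruction(self) -> dict[str, list[str]]:
        # Both collections are returned to the sender of the acquisition instruction.
        return {
            "database": list(self.database),
            "misjudgment_list": list(self.misjudgment_list),
        }


store = StructuredInfoStore()
store.store("img-001|rectangular frame|person", already_recorded=False)
store.store("img-001|rectangular frame|person", already_recorded=True)
reply = store.handle_acquisition_instruction()
```

Keeping the discarded strings makes it possible to compare them against the database later, for example to tell true duplicates from bloom filter false positives.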

In an actual scene of traffic management, when a neural network model for license plate recognition is trained, a large number of sample license plate pictures marked with structured information need to be checked for duplication.

Specifically, according to the picture identification, the graph marking information and the label marking information of the sample license plate picture marked with the structural information, a corresponding character string b is generated, and then the character string b can be input into a bloom filter to carry out the following steps of duplicate checking.

In the bloom filter, 10 hash values are generated for the character string b, and it is detected whether the array positions corresponding to the 10 hash values of the character string b are all 1. If so, it is determined that the sample license plate picture has been recorded, and the sample license plate picture is discarded. Otherwise, it is determined that the sample license plate picture has not been recorded, and the array positions corresponding to the 10 hash values of the character string b are set to 1 to record the sample license plate picture. The bloom filter occupies little memory, and a query against data on the order of 100 million records can return within milliseconds, which greatly simplifies the picture duplicate checking process and improves the duplicate checking efficiency.

Fig. 4 shows a flowchart of a picture duplication checking method according to yet another embodiment of the present application.

When the picture to be checked is a picture to be labeled that has structural information to be labeled, the attribute information includes a picture identifier and the structural information to be labeled. As shown in fig. 4, the process of performing duplicate checking on the picture to be labeled with the structural information to be labeled includes:

Step 402, selecting an original picture that satisfies a specified picture shooting condition from a sample picture set.

The specified picture shooting condition includes, but is not limited to, a specified camera point location, a specified shooting scene and the like. Through this step, the sample picture set is preliminarily screened to obtain original pictures that meet the actual requirements.

Step 404, setting structural information to be labeled for the original picture to obtain the picture to be labeled with the structural information to be labeled.

When the actual task requirement is that the specified structural information is to be obtained, the specified structural information may be set as the structural information to be labeled of the original picture, for example, if the actual task requirement is that "person" is labeled in the form of "rectangular frame", the structural information to be labeled is the graphic labeling information "rectangular frame" and the label labeling information "person".

Step 406, generating a corresponding third character string according to the picture identifier of the picture to be labeled and the structural information to be labeled.

The third character string is generated from the picture identifier of the picture to be labeled and the structural information to be labeled, and is input into a bloom filter for duplicate checking in the following steps. The multiple queries on different attribute information required in the related art are thereby reduced to a single detection on the third character string, and mass data queries through a data management system are replaced by bloom filter processing. This simplifies the picture duplicate checking process, greatly reduces the amount of calculation and the time consumed, and improves the duplicate checking efficiency.

Step 408, generating 10 hash values for the third character string.

Step 410, detecting whether the array positions corresponding to the 10 hash values of the third character string are all 1; if so, proceeding to step 412, otherwise proceeding to step 414.

After the third character string is generated, it is input into a bloom filter, so that the bloom filter replaces the massive queries against a data management system used in the related art. The bloom filter occupies little memory, and a query against data on the order of 100 million records can return within milliseconds, which greatly simplifies the picture duplicate checking process and improves the duplicate checking efficiency.

Step 412, determining that the picture to be labeled with the structural information to be labeled has been recorded, and discarding the picture to be labeled.

Step 414, determining that the picture to be labeled with the structural information to be labeled has not been recorded, labeling the picture with the structural information to be labeled, and setting the array positions corresponding to the 10 hash values of the third character string to 1 to record the original picture.

Finally, because it is determined that the original picture needs to be recorded, the original picture can be recorded after being labeled by the structural information to be labeled, and the labeling and recording processes are completed.
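The flow of steps 402 to 414 can be outlined as below. The `SamplePicture` type, its `camera_point` field, the "|" separator, and the function names are hypothetical. To keep the sketch short, the duplicate check is passed in as a callable, and the example uses an exact set as a stand-in where the application uses a bloom filter with 10 hash values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SamplePicture:
    picture_id: str
    camera_point: str  # hypothetical metadata used as the shooting condition


def pictures_to_label(
    samples: Iterable[SamplePicture],
    camera_point: str,
    shape: str,
    label: str,
    check_and_record: Callable[[str], bool],
) -> list[tuple[SamplePicture, str, str]]:
    """Return (picture, shape, label) labeling tasks that pass screening and duplicate checking."""
    tasks = []
    for pic in samples:
        if pic.camera_point != camera_point:              # step 402: screening
            continue
        third = "|".join((pic.picture_id, shape, label))  # steps 404 and 406
        if check_and_record(third):                       # steps 408 to 412: discard
            continue
        tasks.append((pic, shape, label))                 # step 414: label and record
    return tasks


# Exact-set stand-in for the bloom filter, used only to keep the example short.
seen: set[str] = set()


def exact_check_and_record(s: str) -> bool:
    if s in seen:
        return True
    seen.add(s)
    return False


samples = [
    SamplePicture("img-001", "A"),
    SamplePicture("img-002", "B"),
    SamplePicture("img-001", "A"),
]
tasks = pictures_to_label(samples, "A", "rectangular frame", "person", exact_check_and_record)
```

In this example only the first occurrence of `img-001` yields a labeling task: `img-002` fails the screening condition, and the repeated `img-001` is discarded by the duplicate check.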

In the actual scene of gait recognition, when training a neural network model for human gait, structural information to be marked needs to be set for a large number of sample human body pictures, and duplication checking is carried out on the structural information.

Specifically, a sample human body picture shot at camera point A is selected from a sample human body picture set, and the structural information to be labeled for the sample human body picture is set to the graphic labeling information "rectangular frame" and the label labeling information "person". Next, a corresponding character string c is generated for the sample human body picture having the graphic labeling information "rectangular frame" and the label labeling information "person". The character string c is then input into a bloom filter for duplicate checking in the following steps.

In the bloom filter, 10 hash values are generated for the character string c, and it is then detected whether the array positions corresponding to the 10 hash values of the character string c are all 1. If so, it is determined that the sample human body picture with the structural information to be labeled has been recorded, and the sample human body picture is discarded. Otherwise, it is determined that the sample human body picture with the structural information to be labeled has not been recorded, the sample human body picture is labeled with the structural information to be labeled, and the array positions corresponding to the 10 hash values of the character string c are set to 1 to record the sample human body picture. The bloom filter occupies little memory, and a query against data on the order of 100 million records can return within milliseconds, which greatly simplifies the picture duplicate checking process and improves the duplicate checking efficiency.

Finally, because it is determined that the sample human body picture needs to be recorded, the sample human body picture can be recorded after being labeled with the graphic labeling information "rectangular frame" and the label labeling information "person", which completes the labeling and recording processes.

Fig. 5 shows a block diagram of a picture duplication checking apparatus according to an embodiment of the present application.

As shown in fig. 5, an embodiment of the present application provides a picture duplication checking apparatus 500, including: a character string generating unit 502, configured to generate a corresponding character string for a picture to be checked based on attribute information of the picture to be checked; a bloom filter calculation unit 504, configured to perform picture duplication checking calculation on the character string through a bloom filter; and a duplicate checking result output unit 506, configured to output a duplicate checking result through the bloom filter.

In the foregoing embodiment of the present application, optionally, the bloom filter calculation unit 504 is specifically configured to: generate a specified number of hash values for the character string; and detect whether the array positions corresponding to the specified number of hash values of the character string are all 1. The duplicate checking result output unit 506 is configured to: when the array positions corresponding to the specified number of hash values of the character string are all 1, output the duplicate checking result that the picture to be checked has been recorded; and when the array positions corresponding to the specified number of hash values of the character string are not all 1, set those array positions to 1 and output the duplicate checking result that the picture to be checked has not been recorded.

In the above embodiment of the present application, optionally, the picture to be checked is an original picture without structural information, and the attribute information includes original picture information, where the original picture information includes a picture file volume and a picture file binary content.

In the above embodiment of the present application, optionally, the picture to be checked is a marked picture marked with structured information, and the attribute information includes original picture information and/or structured information, where the original picture information includes a picture file volume and a picture file binary content, and the structured information includes picture identification, graph marking information and label marking information.

In the above embodiments of the present application, optionally, the apparatus further includes: a character string storage unit, configured to store the character string corresponding to the picture to be checked into a structured information database when the duplicate checking result is that the picture to be checked has not been recorded, and to store the character string corresponding to the picture to be checked into a misjudgment list associated with the structured information database when the duplicate checking result is that the picture to be checked has been recorded; and an information providing unit, configured to provide the structured information database and the misjudgment list to the sender of a structured information acquisition instruction based on the received structured information acquisition instruction.

In the above embodiment of the present application, optionally, the picture to be checked is a picture to be labeled with structural information to be labeled, and the attribute information includes a picture identifier and the structural information to be labeled.

In the above embodiments of the present application, optionally, the apparatus further includes: a picture screening unit, configured to select, from a sample picture set, a picture to be checked that meets a specified picture shooting condition before the character string generating unit 502 generates a corresponding character string for the picture to be checked; and a content to be labeled setting unit, configured to set the structural information to be labeled for the picture to be checked. The character string generating unit 502 is configured to generate the corresponding character string for the picture to be checked based on the picture identifier of the picture to be checked and the structural information to be labeled. The picture duplication checking apparatus 500 further includes: a first execution unit, configured to label the picture to be checked with the structural information to be labeled when, after the duplicate checking result is output, the duplicate checking result is that the picture to be checked has not been recorded; and a second execution unit, configured to discard the picture to be checked when the duplicate checking result is that the picture to be checked has been recorded.

In the foregoing embodiment of the present application, optionally, the character string generating unit 502 is configured to: respectively generating corresponding sub-character strings for each item of attribute information of the picture to be checked; and merging the sub character strings of each item of attribute information according to a specified sequence to obtain the character string corresponding to the picture to be checked.
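The substring merging performed by the character string generating unit 502 could be sketched as follows. The function name, the `name=value` form of each substring, the "|" separator and the attribute names are assumptions; the application only requires one substring per attribute item and a specified merging order.

```python
def build_picture_string(attributes: dict[str, str], order: list[str], sep: str = "|") -> str:
    """Build one substring per attribute item and merge them in the specified order."""
    substrings = [f"{name}={attributes[name]}" for name in order]
    return sep.join(substrings)


ORDER = ["picture_id", "shape", "label"]  # hypothetical attribute names and order

a = build_picture_string(
    {"picture_id": "img-001", "shape": "rectangular frame", "label": "person"}, ORDER
)
b = build_picture_string(
    {"label": "person", "shape": "rectangular frame", "picture_id": "img-001"}, ORDER
)
```

Fixing the order means the same attributes always produce the same string, so `a` and `b` are equal even though the dictionaries were built in different orders. If attribute values can themselves contain the separator, an escaping or length-prefix scheme would be needed to keep distinct attribute sets from colliding.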

The picture duplicate checking device 500 uses the scheme described in any one of the embodiments shown in fig. 1 to fig. 4, and therefore, all the technical effects described above are achieved, and are not described again here.

FIG. 6 shows a block diagram of an electronic device according to an embodiment of the application.

As shown in fig. 6, an electronic device 600 of one embodiment of the present application includes at least one memory 602 and a processor 604 communicatively coupled to the at least one memory 602, wherein the memory stores instructions executable by the processor 604, the instructions being configured to perform the scheme of any of the embodiments of fig. 1 to fig. 4 described above. Therefore, the electronic device 600 has the same technical effects as any one of the embodiments of fig. 1 to fig. 4, which are not described herein again.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily intended to provide voice and data communications. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.

(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have calculation and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., iPod), handheld game consoles, e-book readers, as well as smart toys and portable car navigation devices.

(4) Servers, whose architecture is similar to that of a general-purpose computer, but which have higher requirements on processing capability, stability, reliability, security, scalability, manageability and the like because they need to provide highly reliable services.

(5) Other electronic devices with data interaction functions.

In addition, the present application provides a computer-readable storage medium storing computer-executable instructions for executing the method flow described in any one of the above embodiments of fig. 1 and 2.

The technical scheme of the application is described in detail in combination with the drawings, so that the picture duplicate checking process is simplified and the duplicate checking efficiency is improved.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.

The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.
