Information processing apparatus, information processing method, and storage medium

Document No. 1937583, published 2021-12-07

Note: This technology, "Information processing apparatus, information processing method, and storage medium", was created by 佐藤庆尚 on 2021-05-25. Its main content is as follows: An information processing apparatus, an information processing method, and a storage medium are disclosed. The candidate data determination unit acquires an estimation result of a score representing the possibility of a tag being added as an annotation to target data. The tag candidate output unit receives designation of a candidate tag from a user. The candidate data determination unit determines, from among a plurality of pieces of label data included in a feature space, candidate data representing the label data distributed in the respective quadrants into which the feature space is divided, wherein a tag has been added as an annotation to the label data, and wherein the feature space is defined using, as an axis, a score representing the likelihood of the tag added as an annotation. The candidate data determination unit determines candidate data for each of the plurality of quadrants based on the label data included in each quadrant.

1. An information processing apparatus, the information processing apparatus comprising:

an acquisition unit configured to acquire an estimation result of a score representing a possibility of adding a tag as an annotation to target data;

a candidate tag receiving unit configured to receive specification of a candidate tag that is a candidate of a tag to be added as an annotation to the target data;

a determination unit configured to determine candidate data representing a plurality of label data distributed in respective quadrants into which a feature space is divided, from among a plurality of label data included in the feature space, wherein the label is added as an annotation to the label data, and wherein the feature space is defined using, as an axis, a score representing a possibility of adding a label as an annotation; and

a candidate data output unit configured to output the candidate data determined for each quadrant to a predetermined output destination.

2. The information processing apparatus according to claim 1, wherein the determination unit divides the area in which the label data is distributed into the quadrants, based on a median value between a maximum value and a minimum value of the scores of the label data for each of the plurality of candidate labels.

3. The information processing apparatus according to claim 1,

wherein the determination unit divides a new area, which is defined using specified candidate data among the plurality of candidate data output to the output destination as a base point, into a plurality of new quadrants, and determines new candidate data for each of the plurality of new quadrants based on label data included in each of the plurality of new quadrants, and

wherein the candidate data output unit outputs the new candidate data determined for each of the plurality of new quadrants to the output destination.

4. The information processing apparatus according to claim 3, wherein the following processes are sequentially performed: a process of dividing the new area into the plurality of new quadrants, the new area being defined using specified candidate data among the plurality of candidate data output to the output destination as a base point; a process of determining the new candidate data for each of the plurality of new quadrants based on the label data included in each of the plurality of new quadrants; and a process of outputting, by the candidate data output unit, the new candidate data determined for each of the plurality of new quadrants to the output destination.

5. The information processing apparatus according to claim 3, wherein the new area is defined such that a size of the new area is limited compared to an area previously divided into a plurality of quadrants.

6. The information processing apparatus according to claim 3, the information processing apparatus further comprising: a notification unit configured to output notification information to a predetermined output destination in a case where the distance to the target data in the feature space, calculated for each of the sequentially specified candidate data, increases by a predetermined multiple or more with respect to the distance calculated for the previously specified candidate data.

7. The information processing apparatus according to claim 1, wherein the determination unit controls the number of candidate data determined for each of the plurality of quadrants, according to a distance in the feature space between the specified target data and each of the plurality of quadrants.

8. The information processing apparatus according to claim 7, wherein the determination unit performs control such that the number of candidate data determined for a quadrant is larger for quadrants closer to the specified target data in the feature space.

9. An information processing method executed by an information processing apparatus, the information processing method comprising:

obtaining an estimation result of a score representing a possibility of adding a tag as an annotation to the target data;

receiving a designation of a candidate tag that is a candidate for a tag to be added as an annotation to the target data;

determining candidate data representing a plurality of label data distributed in respective quadrants into which a feature space is divided, from among a plurality of label data included in the feature space, wherein the label is added as an annotation to the label data, and wherein the feature space is defined using, as an axis, a score representing a possibility of adding a label as an annotation; and

the candidate data determined for each quadrant is output to a predetermined output destination.

10. A non-transitory computer-readable storage medium storing a computer-executable program for causing a computer to perform the method of claim 9.

Technical Field

The invention relates to an information processing apparatus, an information processing method, and a storage medium.

Background

As a program for creating training data to be used in machine learning, an annotation tool for adding correct answer information (a correct answer label) to be learned is used. Some annotation tools have, for example, a group of functions for reducing the workload of the user creating the training data (i.e., functions for assisting the user).

For example, to create a machine learning model with higher performance, it may be desirable to add more accurate correct answer information to the training data and to use a larger amount of training data. Thus, some annotation tools have functions for assisting users in more efficiently adding more accurate correct answer information to target data for training purposes. For example, Japanese Patent Application No. 2019-114018 discusses an example of a function of assisting a user in determining which correct answer information to add to target data by classifying data to which correct answer information has been added based on that correct answer information and presenting the classified data as reference information.

Disclosure of Invention

As the number of data to which correct answer information is added increases, the number of candidate data to be used for prompting reference information also increases. In this case, it may become very difficult for the user to check all the data. In addition, even if some data is randomly extracted for prompting reference information, the reference information prompted based on the extraction result may not always serve as a useful reference when the user determines which correct answer information is to be added to the target data.

Embodiments of the present disclosure assist a user in selecting a tag to be added as an annotation in a more desirable mode.

According to embodiments of the present disclosure, an information processing apparatus includes an acquisition unit configured to acquire an estimation result of a score representing a possibility of adding a tag as an annotation to target data; a candidate tag receiving unit configured to receive specification of a candidate tag that is a candidate of a tag to be added as an annotation to the target data; a determination unit configured to determine candidate data representing a plurality of label data distributed in respective quadrants into which a feature space is divided, from among a plurality of label data included in the feature space, wherein the label is added as an annotation to the label data, and wherein the feature space is defined using a score representing a possibility of adding a label as an annotation as an axis; and a candidate data output unit configured to output the candidate data determined for each quadrant to a predetermined output destination.

Other features of various embodiments of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the attached drawings.

Drawings

Fig. 1 is a block diagram showing an example of a hardware configuration of an information processing apparatus according to an embodiment.

Fig. 2 is a block diagram showing an example of a functional configuration of an information processing apparatus according to an embodiment.

Fig. 3 is a table illustrating an example of a data structure for managing tag data according to one embodiment.

Fig. 4 is a table illustrating an example of a data structure for managing score estimation results according to one embodiment.

Fig. 5 is a flowchart illustrating an example of processing to be performed by the information processing apparatus according to one embodiment.

Fig. 6 illustrates an example of an operation screen for receiving designation of tag candidates according to one embodiment.

Fig. 7 illustrates an example of an output screen for prompting information based on candidate data, according to one embodiment.

Fig. 8 is a flowchart illustrating an example of processing to be performed by the information processing apparatus according to one embodiment.

Fig. 9A to 9F are diagrams respectively illustrating a process for dividing an area of point cloud data into a plurality of quadrants according to one embodiment.

Fig. 10 is a flowchart illustrating an example of processing to be performed by the information processing apparatus according to one embodiment.

Fig. 11 is a block diagram showing another example of the functional configuration of an information processing apparatus according to an embodiment.

Fig. 12 is a flowchart showing another example of processing to be performed by the information processing apparatus according to one embodiment.

Fig. 13 is a flowchart showing another example of processing to be performed by the information processing apparatus according to one embodiment.

Fig. 14 is a block diagram showing another example of the functional configuration of an information processing apparatus according to an embodiment.

Fig. 15 is a flowchart showing another example of processing to be performed by the information processing apparatus according to one embodiment.

Fig. 16 shows an example of a dialog box for prompting notification information, according to one embodiment.

Detailed Description

Exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numerals are used to designate components having substantially the same functional configuration in the specification and the drawings, and are not described again.

<Overview of Annotation>

Supervised learning is an example of a technique for training (i.e., building) machine learning models based on so-called machine learning. In supervised learning, a machine learning model is constructed using a data set including training data in which data to be input to a learning model is associated with a correct answer label to be predicted based on that data. When building a machine learning model, if such a data set does not exist or is incomplete, the data set is built by an annotation operation that adds correct answer labels as annotations to the input data after the data is collected. In some cases, an annotation tool that includes functions for assisting the user in adding correct answer labels to data is used, so that the user can perform the annotation operation more easily.

The annotation tool prompts a user with data about an image, a document, or the like to be annotated (hereinafter also referred to as target data), and receives from the user a designation of a correct answer tag to be added to the target data as an annotation. The annotation tool then adds the correct answer label specified by the user to the target data, thereby generating training data to be included in the data set.

Among various types of annotation tools, there are tools that use a machine learning model (hereinafter also referred to as a "trained model") constructed by preliminary machine learning to perform the above-described operation of adding a correct answer label to target data more efficiently. As a specific example, a tool using a trained model causes the trained model to analyze the target data to extract tag candidates to be added to the target data as annotations, and prompts the extracted tag candidates to the user. This enables the user to select, from the tag candidates prompted by the annotation tool, a candidate to be added to the target data as the correct answer label.

To create a machine learning model with higher performance, it may be desirable to add more accurate correct answer information to the training data and to use a larger amount of training data. Thus, some annotation tools have functions for assisting the user so that the user can more efficiently add more accurate correct answer information to the target data for training purposes.

As an example of such a user-assisted function provided by the above-described annotation tool, a function of classifying data to which correct answer information is added based on the correct answer information and prompting the classified data as reference information is proposed.

On the other hand, as the number of data to which correct answer information is added increases, the number of candidate data to be used for prompting reference information also increases. In this case, it may become very difficult for the user to check all the data. In addition, even if some data is randomly extracted for prompting reference information, the conditions for extraction may not always be appropriate. Therefore, when the user determines which correct answer information is to be added to the target data, the reference information prompted based on the extraction result may not always serve as a reference.

Accordingly, the present disclosure proposes a technique related to an annotation tool that can assist a user in selecting a tag to be added as an annotation to target data in a more desirable mode.

Hereinafter, the existing trained machine learning model is also referred to as a pre-trained model, and hereinafter, a label to be selected as a candidate of correct answer information to be added to data to be annotated is also referred to as a candidate label. Hereinafter, data to be subjected to an annotation operation is also referred to as target data, and hereinafter, data to which correct answer information has been added as an annotation is also referred to as tag data.

An example of a hardware configuration of an information processing apparatus 100 according to an exemplary embodiment of the present disclosure will be described with reference to Fig. 1. As shown in Fig. 1, the information processing apparatus 100 according to the present exemplary embodiment includes a Central Processing Unit (CPU) 111, a Read Only Memory (ROM) 112, and a Random Access Memory (RAM) 113. The information processing apparatus 100 further includes a secondary storage device 114, an output device 115, an input device 116, and a communication interface (I/F) 117. The CPU 111, the ROM 112, the RAM 113, the secondary storage device 114, the output device 115, the input device 116, and the communication I/F 117 are interconnected via a bus 118.

The CPU 111 is a central processing unit that controls various operations of the information processing apparatus 100. For example, the CPU 111 may control the operation of the entire information processing apparatus 100. The ROM 112 stores control programs, boot programs, and the like that can be executed by the CPU 111. The RAM 113 is a main memory of the CPU 111, and is used as a work area or a temporary storage area for loading various programs.

The secondary storage device 114 stores various data and various programs. The secondary storage device 114 is implemented by a storage device capable of temporarily or persistently storing various data, such as a nonvolatile storage device typified by a Hard Disk Drive (HDD) or a Solid State Drive (SSD).

The output device 115 is a device that outputs various information and is used to prompt a user for various information. In the present exemplary embodiment, the output device 115 is realized by a display device such as a display or the like. The output device 115 displays various display information, thereby prompting the user for information. In another example, the output device 115 may be implemented by a sound output device that outputs sound such as voice or electronic sound. In this case, the output device 115 outputs a sound such as a voice or an electronic sound, thereby prompting the user for information. The device serving as the output device 115 may be appropriately changed according to the medium used to prompt the user with information.

The input device 116 is used to receive various instructions from a user. In the present exemplary embodiment, the input device 116 includes an input device such as a mouse, a keyboard, and a touch panel. In another example, the input device 116 may include a sound collection device, such as a microphone, to collect speech spoken by the user. In this case, various analysis processes such as acoustic analysis and natural language processing are performed on the collected voice, so that the content of the voice instruction can be recognized as an instruction from the user. The device used as the input device 116 may be appropriately changed according to the method of recognizing an instruction from the user. Various types of devices may be used as the input device 116.

The communication I/F117 is used for communication with an external device via a network. The device serving as the communication I/F117 may be appropriately changed according to the type of communication path or a communication method to be applied.

The CPU 111 loads a program stored in the ROM 112 or the secondary storage device 114 into the RAM 113 and executes the program, thereby realizing the functional configurations shown in Figs. 2, 11, and 14. The processes illustrated in the flowcharts of Figs. 5, 8, 10, 12, 13, and 15 are realized by executing the programs.

As a storage medium for supplying the program, for example, a flexible disk, a hard disk, an optical disc, or a magneto-optical disk can be used. Other examples of storage media include Compact Disc Read-Only Memories (CD-ROMs), recordable CDs (CD-Rs), magnetic tape, nonvolatile memory cards, ROMs, and Digital Versatile Discs (DVDs).

The program may be directly executed by the computer, or may be executed under the management of basic software such as an Operating System (OS) or the like running on the computer.

The program read from the storage medium may be processed by a function expansion board mounted on the computer, a function expansion unit connected to the computer, or the like.

A first exemplary embodiment of the present disclosure will be described below. In order to distinguish the information processing apparatus according to the first exemplary embodiment from the information processing apparatuses according to the other exemplary embodiments, hereinafter, for convenience of explanation, the information processing apparatus according to the first exemplary embodiment is referred to as an information processing apparatus 200.

In the present exemplary embodiment and the other exemplary embodiments described below, in order to facilitate explanation of the technical features of the present disclosure, the description may focus on the case where image data is used as the data to which correct answer information (a tag) is added as an annotation. However, image data is merely an example, and is not intended to limit the type of data to which correct answer information is added by an information processing apparatus to which the technique according to the present disclosure is applied. In other words, the type of data to which correct answer information is added may be changed as appropriate within a range that does not depart from the basic idea of the technique according to the present disclosure.

(Functional Configuration)

An example of the functional configuration of the information processing apparatus 200 according to the present exemplary embodiment will be described with reference to Fig. 2. The information processing apparatus 200 includes a pre-trained model reading unit 201, a tag data management unit 202, a score estimation unit 203, an estimation result management unit 204, a candidate data determination unit 206, and a candidate data extraction unit 207. The information processing apparatus 200 further includes a tag candidate input unit 205, a candidate data output unit 208, and a candidate data input unit 209.

The pre-trained model reading unit 201 reads a learning model (pre-trained model) constructed by training a model in advance. The source from which the pre-trained model is read is not particularly limited as long as the pre-trained model reading unit 201 can refer to it. The pre-trained model reading unit 201 may load the read pre-trained model into the memory of the information processing apparatus 200.

The tag data management unit 202 manages tag data (e.g., tagged image data). For example, the tag data management unit 202 may store a plurality of pieces of tag data in such a manner that each piece of tag data can be read separately. The tag data management unit 202 may manage the tag data using a database or the like.

For example, Fig. 3 shows an example of a data structure for managing tag data. In the example shown in Fig. 3, an identifier (ID) is assigned to each piece of tag data, and information on the tag added to the tag data and information on the path to the tag data are associated with the ID. As a specific example, "tag A" is added as a tag to the tag data to which "1" is assigned as an ID, and that tag data is stored at the location indicated by "AAA\BBB\CCC".

In the example shown in Fig. 3, the tag data is managed as files. However, the method for managing the tag data is not particularly limited as long as the data can be managed. As a specific example, the tag data management unit 202 may manage tag data as Base64-format data. In this case, the tag data management unit 202 may convert a tag data file into Base64-format data as necessary.
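As a hedged sketch of the Base64-based management mentioned above (the record IDs, tags, and byte strings below are hypothetical, not part of the disclosure), the on-demand conversion might look like:

```python
import base64

# Hypothetical in-memory counterpart of the Fig. 3 structure: each record
# holds the added tag and the raw bytes of the tag data file.
tag_records = {
    1: {"tag": "tag A", "data": b"\x89PNG...image bytes..."},
    2: {"tag": "tag B", "data": b"\x89PNG...other bytes..."},
}

def to_base64(record_id):
    """Convert the stored bytes of one record to Base64 text, as the
    tag data management unit might do on demand."""
    raw = tag_records[record_id]["data"]
    return base64.b64encode(raw).decode("ascii")

# Round trip: decoding the Base64 text restores the original bytes.
assert base64.b64decode(to_base64(1)) == tag_records[1]["data"]
```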

The score estimation unit 203 estimates, based on the read pre-trained model, correct answer information (a tag) that can be added to the target data to be annotated and a score representing the possibility that the target data is the data indicated by that correct answer information. Specifically, the score estimation unit 203 may estimate a tag indicating the target data using the pre-trained model read by the pre-trained model reading unit 201, and estimate a score representing the possibility that the target data is the data indicated by the tag. For example, the score is expressed in the form of a probability that the tag indicates the target data (i.e., a probability that the target data is the data indicated by the tag).
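The disclosure does not fix how such a score is computed; one plausible form, sketched here, is a per-tag sigmoid over raw model outputs, yielding independent probabilities in [0, 1]. The logit values below are hypothetical, chosen so that the rounded scores match the Fig. 4 example for ID "1":

```python
import math

def sigmoid(x):
    """Squash a raw model output into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw outputs of the pre-trained model for one target image.
logits = {"tag A": 0.85, "tag B": -0.4, "tag C": -1.4, "tag D": -2.2}
scores = {tag: round(sigmoid(v), 2) for tag, v in logits.items()}
assert scores == {"tag A": 0.7, "tag B": 0.4, "tag C": 0.2, "tag D": 0.1}
```

Note that per-tag scores of this form need not sum to 1 across tags, consistent with the Fig. 4 example.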

The estimation result management unit 204 manages correct answer information (label) that can be added to the target data and the score for the correct answer information estimated by the score estimation unit 203.

For example, Fig. 4 shows an example of a data structure for managing, for each piece of target data, the estimation results regarding the tags that can be added to the target data and the scores of those tags. In the example shown in Fig. 4, information corresponding to the score estimation result of each tag is associated with the ID assigned to each piece of tag data. In the example shown in Fig. 4, the score estimation result for each of "tag A", "tag B", "tag C", and "tag D" is associated with an ID. As a specific example, for the tag data to which "1" is assigned as the ID, "tag A" has a score of "0.7", "tag B" has a score of "0.4", "tag C" has a score of "0.2", and "tag D" has a score of "0.1".
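A minimal in-memory mirror of this structure, with a lookup restricted to user-specified candidate tags, might look like the following (the scores for ID "2" are hypothetical):

```python
# Hypothetical counterpart of the Fig. 4 structure: per-ID score estimates.
estimation_results = {
    1: {"tag A": 0.7, "tag B": 0.4, "tag C": 0.2, "tag D": 0.1},
    2: {"tag A": 0.3, "tag B": 0.8, "tag C": 0.1, "tag D": 0.4},
}

def scores_for(data_id, candidate_tags):
    """Return the scores of one record restricted to the candidate tags
    specified by the user."""
    row = estimation_results[data_id]
    return {tag: row[tag] for tag in candidate_tags}

assert scores_for(1, ["tag A", "tag C"]) == {"tag A": 0.7, "tag C": 0.2}
```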

The tag candidate input unit 205 receives designation of a candidate of a tag to be added as an annotation to target data (hereinafter also referred to as a candidate tag) from a user who performs an annotation operation. As a specific example, in a case where the user wants to know which tag is added to the target data, the tag candidate input unit 205 functions as an interface for inputting a tag candidate to the information processing apparatus 200.

The candidate data determination unit 206 determines, as candidate data, the tag data to be used for prompting reference information to a user who performs an annotation operation. Specifically, the candidate data determination unit 206 determines the candidate data based on one or more candidate tags input via the tag candidate input unit 205 and the information managed by the estimation result management unit 204. The processing performed by the candidate data determination unit 206 will be described in detail below.

The candidate data extraction unit 207 reads, from the tag data management unit 202, tag data that matches a specified condition. For example, the candidate data extraction unit 207 may extract the tag data determined as candidate data by the candidate data determination unit 206 from the data managed by the tag data management unit 202.

The candidate data output unit 208 outputs information corresponding to the candidate data (tag data) extracted by the candidate data extraction unit 207 to a predetermined output destination. In this way, the candidate data output unit 208 can prompt the user with the information corresponding to the candidate data as reference information, and the user can determine which correct answer information to add to the target data based on the reference information.

Candidate data input unit 209 receives, from the user, a designation of candidate data that is closer in meaning to target data among the candidate data for which information is presented to the user. As a specific example, the candidate data input unit 209 may receive, from the user, a designation of candidate data closer to the target data among the candidate data prompted to the user by the candidate data output unit 208.

(Processing)

Next, an example of processing to be performed by the information processing apparatus 200 according to the present exemplary embodiment will be described below. An example of a series of processing procedures for prompting reference information to add a tag to target data will be described with reference to fig. 5.

In step S501, the tag candidate input unit 205 receives, from the user, specification of a candidate for a tag to be added as an annotation to the target data. For example, Fig. 6 shows an example of an operation screen (user interface) on which designation of a candidate tag is received from the user.

The operation screen 600 includes an image display area 610, a tag candidate prompt area 620, a determination button 601, and a sample image display button 602. The image display area 610 is an area in which an image 611 corresponding to the image data (target data) to be annotated is displayed. The tag candidate prompt area 620 is an area in which one or more candidate tags are prompted as candidates for a tag to be added as an annotation to the target data, and in which designation of at least some of the one or more candidate tags is received from the user. The determination button 601 is a button that receives, from the user, an instruction to determine the candidate tag specified in the tag candidate prompt area 620 as the tag to be added to the target data. The sample image display button 602 is a button that receives, from the user, an instruction to display sample images.

In the example shown in Fig. 6, a target area 612 to which a tag is to be added as an annotation is specified in the image 611 displayed in the image display area 610. For example, the target area 612 is set according to a designation made by the user using the pointer 630. To operate the pointer 630, the input device 116, such as a pointing device connected to the information processing apparatus 200, is used. As a specific example, an area whose four corners correspond to four points specified on the image 611 with the pointer 630 may be set as the target area 612. This operation method is merely an example, and the operation method for setting the target area 612 is not particularly limited as long as at least a partial area of the image 611 can be specified.
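One simple way to derive such a rectangular target area from four specified points is an axis-aligned bounding box; the following is a sketch under that assumption (the coordinates are illustrative):

```python
def region_from_points(points):
    """Return (left, top, right, bottom) bounding the specified points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Four points specified on the image 611 with the pointer 630 (hypothetical).
clicked = [(120, 80), (310, 85), (305, 240), (125, 235)]
assert region_from_points(clicked) == (120, 80, 310, 240)
```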

In the example shown in Fig. 6, candidate buttons 621 to 624, corresponding to the candidate tags "tag A" to "tag D" respectively, are prompted in the tag candidate prompt area 620. The candidate buttons 621 to 624 are buttons for receiving, from the user, designation of the candidate tag associated with the corresponding button.

The example shown in Fig. 6 is merely an example, and the configuration for prompting information and receiving designation of a candidate tag is not necessarily limited to the example shown in Fig. 6. For example, the number of buttons prompted in the tag candidate prompt area 620 is not limited to four. In other embodiments, the number of buttons may be changed according to the number of candidate tags. The candidate tags that can be selected may be changed according to the object to be annotated. As a specific example, the selectable candidate tags may be determined according to an estimation result, based on the pre-trained model, for the partial image included in the specified target area 612. In this case, a predetermined number (for example, four) of tags having higher scores among the series of tags obtained as the estimation result may be determined as the candidate tags.
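Selecting the predetermined number of highest-scoring tags can be sketched as a simple top-k over an estimation result (the scores and the fifth tag below are hypothetical):

```python
def top_candidates(scores, k=4):
    """Return the k tags with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical estimation result for the partial image in the target area.
scores = {"tag A": 0.7, "tag B": 0.4, "tag C": 0.2,
          "tag D": 0.1, "tag E": 0.05}
assert top_candidates(scores) == ["tag A", "tag B", "tag C", "tag D"]
```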

After the target area 612 is specified and any one of the candidate buttons 621 to 624 is pressed, when the sample image display button 602 is pressed, the tag candidate input unit 205 acquires the tag corresponding to the pressed candidate button as a candidate tag. The pressing of each of the candidate buttons 621 to 624 and the pressing of the sample image display button 602 are realized, for example, by operating the pointer 630 using a pointing device or the like.

Referring back to Fig. 5, in step S502, the candidate data determination unit 206 sets a feature space using, as axes, scores each representing the possibility that the target data is the data indicated by a candidate tag, based on the information managed by the estimation result management unit 204 and the candidate tags acquired in step S501. The process for setting the feature space will be described in detail below with reference to Fig. 8.
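As a hedged sketch of this step: with two candidate tags specified, each piece of tag data becomes a point in a two-dimensional space whose axes are the scores of those tags (the score table below is hypothetical):

```python
# Hypothetical per-ID score estimates, as managed by the estimation
# result management unit 204.
estimation_results = {
    1: {"tag A": 0.7, "tag B": 0.4},
    2: {"tag A": 0.3, "tag B": 0.8},
    3: {"tag A": 0.9, "tag B": 0.1},
}

def build_feature_space(results, candidate_tags):
    """Map each record ID to its coordinates along the candidate-tag axes."""
    return {data_id: tuple(row[tag] for tag in candidate_tags)
            for data_id, row in results.items()}

points = build_feature_space(estimation_results, ["tag A", "tag B"])
assert points == {1: (0.7, 0.4), 2: (0.3, 0.8), 3: (0.9, 0.1)}
```

With three candidate tags the same mapping would yield a three-dimensional space, one axis per candidate tag.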

In step S503, the candidate data determination unit 206 determines candidate data (for example, candidate images) about which information is to be presented to the user, by using the feature space set in step S502. The process for determining the candidate data will be described in detail below with reference to fig. 10.

In step S504, the candidate data extraction unit 207 extracts candidate data from the data managed by the label data management unit 202 according to the determination result in step S503.

In step S505, the candidate data output unit 208 outputs information corresponding to the candidate data extracted in step S504 to a predetermined output destination, thereby prompting the user with the information. As a specific example, the candidate data output unit 208 may cause a display device such as a display to display images corresponding to the image data extracted as candidate data, thereby prompting the user with the images. For example, fig. 7 shows an example of an output screen (user interface) for prompting the user with information corresponding to the candidate data. In particular, fig. 7 shows an example of a screen for prompting the user with images corresponding to the image data extracted as candidate data.

The output screen 700 includes a candidate image display area 710, an end button 701, and a re-search button 702. The candidate image display area 710 displays images (hereinafter also referred to as "candidate images") corresponding to the image data extracted as candidate data. In the example shown in fig. 7, candidate images 711 to 714 are displayed in the candidate image display area 710 as the images corresponding to the image data extracted as candidate data. In the vicinity of each of the candidate images 711 to 714, information indicating the label added to that candidate image is displayed. The end button 701 is a button for receiving, from the user, an instruction to terminate the display of the candidate images. The re-search button 702 is a button for receiving, from the user, an instruction to re-search for candidate images.

The example shown in fig. 7 is merely an example, and the configuration for presenting information corresponding to the candidate data is not necessarily limited to the example shown in fig. 7. For example, the number of candidate images displayed in the candidate image display area 710 is not limited to four; more than four or fewer than four images may be displayed. If it is difficult to display all of the candidate images in the candidate image display area 710 at once, the candidate image display area 710 may be made scrollable by a scroll operation using a scroll bar or the like, thereby expanding the display area of the candidate images.

Referring again to fig. 5, in step S506, the candidate data determination unit 206 determines whether termination of the process for extracting candidate data is instructed. For example, if the end button 701 is pressed on the output screen 700 shown in fig. 7, the candidate data determination unit 206 may determine that termination of the process for extracting candidate data is instructed.

In step S506, if the candidate data determination unit 206 determines that termination of the processing for extracting candidate data is not instructed (no in step S506), the processing proceeds to step S507. In step S507, the candidate data input unit 209 receives, from the user, designation of at least one piece of the candidate data whose information was presented to the user in step S505. Further, the candidate data determination unit 206 acquires, from the candidate data input unit 209, the candidate data whose designation the candidate data input unit 209 received from the user.

As a specific example, the candidate data input unit 209 receives, on the output screen 700 shown in fig. 7, a designation of any one of the candidate images 711 to 714 displayed in the candidate image display area 710. Then, when the re-search button 702 is pressed in a state in which any of the candidate images 711 to 714 is specified, the candidate data input unit 209 acquires candidate data (image data) corresponding to the specified candidate image, and outputs the acquired candidate data to the candidate data determination unit 206. Then, the candidate data determination unit 206 performs the processing of step S502 and subsequent steps based on the candidate data acquisition result.

On the other hand, in step S506, if the candidate data determination unit 206 determines that termination of the processing for extracting candidate data is instructed (yes in step S506), the series of processing shown in fig. 5 is terminated.

Next, an example of the feature space setting process in step S502 of fig. 5 will be described with reference to fig. 8.

In step S801, the candidate data determination unit 206 sets a feature space using each candidate label acquired in step S501 as an axis.

In step S802, the candidate data determination unit 206 plots the label data in the feature space set in step S801, based on the information managed by the estimation result management unit 204 (information corresponding to the estimation result obtained by the score estimation unit 203). As a result, point cloud data corresponding to the label data is defined in the feature space.
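A minimal sketch of this plotting step, assuming each piece of label data carries a dictionary of per-label scores (the names and values below are hypothetical):

```python
def plot_in_feature_space(label_data, candidate_labels):
    """Map each piece of label data to a point whose coordinates are the
    estimated scores for the candidate labels that define the axes."""
    return [tuple(item["scores"][label] for label in candidate_labels)
            for item in label_data]

# Two hypothetical pieces of label data with scores for labels A and B.
label_data = [{"scores": {"label A": 0.9, "label B": 0.2}},
              {"scores": {"label A": 0.1, "label B": 0.7}}]
points = plot_in_feature_space(label_data, ["label A", "label B"])
print(points)  # each point's coordinates are the scores along each axis
```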

In step S803, the candidate data determination unit 206 determines whether or not candidate data (e.g., image data corresponding to the specified candidate image) acquired in the process of step S507 shown in fig. 5 is input.

If the candidate data determination unit 206 determines that no candidate data is input in step S803 (no in step S803), the processing proceeds to step S804. In step S804, the candidate data determination unit 206 acquires, as the range of the point cloud data defined in step S802 in the feature space, the maximum value and the minimum value of the score of each candidate label defining an axis of the feature space.

In step S805, the candidate data determination unit 206 calculates, for each axis (i.e., for each candidate label) of the feature space, an intermediate value between the maximum value and the minimum value acquired in step S804.

In step S806, the candidate data determination unit 206 divides at least the region where the point cloud data exists in the feature space into a plurality of quadrants, based on the maximum value, the minimum value, and the intermediate value acquired or calculated for each axis of the feature space. As a result, a plurality of quadrants is set at least in the region where the point cloud data exists in the feature space.
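The division performed in steps S804 to S806 can be sketched as follows. The helper names are hypothetical; each quadrant is identified by whether each coordinate lies above or below the intermediate value of its axis:

```python
def set_quadrants(points):
    """Divide the bounding region of the point cloud into quadrants.

    For each axis, the intermediate value between the minimum and the
    maximum of the point cloud acts as the dividing line."""
    mids = []
    for axis in range(len(points[0])):
        values = [p[axis] for p in points]
        mids.append((min(values) + max(values)) / 2.0)

    def quadrant_of(point):
        # A quadrant is identified by a tuple of per-axis flags:
        # True if the coordinate is at or above the intermediate value.
        return tuple(coord >= mid for coord, mid in zip(point, mids))

    return mids, quadrant_of

# Hypothetical 2D point cloud (scores of label A and label B).
points = [(0.1, 0.2), (0.9, 0.8), (0.3, 0.6)]
mids, quadrant_of = set_quadrants(points)
print(mids)                     # intermediate value per axis
print(quadrant_of((0.9, 0.8)))  # quadrant of one plotted point
```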

For example, fig. 9A to 9F are explanatory views each illustrating the process of setting a plurality of quadrants by dividing the region where the point cloud data exists in the feature space. In the examples shown in fig. 9A to 9F, for convenience of explaining the technical features according to the present exemplary embodiment, the feature space 900 is set as a two-dimensional space using the score of label A as the horizontal axis 901 and the score of label B as the vertical axis 902.

Specifically, fig. 9A schematically shows a state in which the points 903 corresponding to the respective pieces of label data are plotted, in the process of step S802, in the feature space 900 set in the process of step S801.

Fig. 9B schematically shows the region 910 where the point cloud data exists. The region 910 is defined based on the maximum value and the minimum value acquired for each axis in the process of step S804.

Fig. 9C schematically shows a state in which the region 910 where the point cloud data exists in the feature space is divided into a plurality of quadrants 911 to 914 based on the intermediate values calculated in the process of step S805. Specifically, in the example shown in fig. 9C, the region 910 is divided into the quadrants 911 to 914 by a straight line passing through the intermediate value of the score of label A and perpendicular to the horizontal axis 901, and a straight line passing through the intermediate value of the score of label B and perpendicular to the vertical axis 902.

Although the examples shown in fig. 9A to 9F show the case where two candidate tags are used for the sake of simplifying the explanation, three or more candidate tags may be used to set the feature space. In this case, the feature space set in step S801 is a three-dimensional or more feature space.

Referring back to fig. 8, in step S803, if the candidate data determination unit 206 determines that candidate data is input (yes in step S803), the processing proceeds to step S807. In step S807, the candidate data determination unit 206 identifies, among the point cloud data set in the feature space, the point corresponding to the candidate data acquired in the process of step S507 (i.e., the candidate data specified by the user). Then, the candidate data determination unit 206 acquires the coordinates of the identified point in the feature space.

In step S808, the candidate data determination unit 206 sets a new area by halving the length of each side of the area obtained by combining the four quadrants set in step S806, with the coordinates acquired in step S807 as the center.

Next, the processing returns to step S806, and the candidate data determination unit 206 divides the new area set in step S808 into a plurality of new quadrants by a process similar to that of step S806 described above.

Referring back to fig. 9A to 9F, fig. 9D schematically shows a state in which the point 915 corresponding to the candidate data acquired in step S507 is identified in the process of step S807. Fig. 9E schematically shows a state in which a new area 920, obtained by halving the length of each side of the area in which the four quadrants 911 to 914 are combined (i.e., the region 910), is set centered on the point 915. In this way, the new area 920 is set using, as a base point, the point 915 corresponding to the candidate data specified by the user.

Fig. 9F schematically shows a state in which the new area 920 set in the feature space is divided into a plurality of new quadrants 921 to 924 in the process of step S806, after the processes of steps S807 and S808 are performed.
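The recentering in steps S807 and S808, which halves each side of the combined area around the point selected by the user, might be sketched as follows (the function name and coordinate values are illustrative only):

```python
def recenter_area(area_min, area_max, center):
    """Set a new area whose sides are half as long as the current
    combined area, centered on the user-selected point."""
    new_min, new_max = [], []
    for lo, hi, c in zip(area_min, area_max, center):
        half_side = (hi - lo) / 4.0  # half of the new (halved) side length
        new_min.append(c - half_side)
        new_max.append(c + half_side)
    return new_min, new_max

# A hypothetical 1.0-by-1.0 region becomes a 0.5-by-0.5 area around (0.6, 0.4).
new_min, new_max = recenter_area([0.0, 0.0], [1.0, 1.0], (0.6, 0.4))
print(new_min, new_max)
```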

The examples of fig. 9A to 9F show the case where the area set based on the maximum value and the minimum value of each axis is divided into a plurality of quadrants based on the intermediate value between the maximum value and the minimum value. However, these examples are not intended to limit the method of arranging the quadrants. As a specific example, clustering of the point cloud data set in step S802 may be performed, and each cluster set by clustering may be set as a quadrant.

Next, an example of the process for determining candidate data in step S503 of fig. 5 will be described with reference to fig. 10.

In step S1001, the candidate data determination unit 206 calculates the position (coordinates) of the center point of each quadrant set in the processing of step S806 illustrated in fig. 8.

In step S1002, the candidate data determination unit 206 calculates a distance in the feature space between the position of the center point calculated for each quadrant in step S1001 and the position of each point included in the point cloud data set in the process of step S802 shown in fig. 8.

In step S1003, the candidate data determination unit 206 identifies, for each quadrant, the point closest to the center point of the quadrant, based on the calculation result obtained in step S1002 (i.e., the distance between the center point of the quadrant and each point included in the point cloud data). Further, for each quadrant, the candidate data determination unit 206 determines the label data corresponding to the identified point as the candidate data representing that quadrant. Then, the candidate data extraction unit 207 extracts the label data determined as candidate data by the candidate data determination unit 206 from the label data managed by the label data management unit 202.
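A rough sketch of steps S1001 to S1003, picking for each quadrant the point of the point cloud closest to the quadrant's center (all names and coordinates below are hypothetical):

```python
def representative_candidates(points, centers):
    """For each quadrant center, pick the point in the point cloud closest
    to it as the candidate data representing that quadrant."""
    def dist2(p, q):
        # Squared Euclidean distance; sufficient for comparing distances.
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(points, key=lambda p: dist2(p, c)) for c in centers]

# Hypothetical point cloud and the center points of four quadrants.
point_cloud = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.8), (0.8, 0.2)]
centers = [(0.25, 0.25), (0.75, 0.75), (0.25, 0.75), (0.75, 0.25)]
print(representative_candidates(point_cloud, centers))
```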

Through the above-described control processing, the user sequentially designates (e.g., selects) the more likely candidates from among the candidates presented as reference information. As a result, the range of candidate data to be presented as reference information in the feature space is progressively narrowed. Therefore, even in a case where there is no significant difference in reliability among a plurality of different labels, the user can sequentially designate the more likely candidates from among the presented candidates, thereby effectively narrowing down the range of candidate data presented as reference information.

This configuration enables the user to select the tag to be added to the target data through a simple operation, without checking a large amount of candidate data (label data). In other words, even in a case where tags indicating different subjects do not significantly differ in reliability, so that the user would otherwise need to perform the tagging operation carefully, the technique according to the present exemplary embodiment can be expected to assist the user in performing the tagging operation in a more desirable manner.

A second exemplary embodiment of the present disclosure will be described below. In order to distinguish the information processing apparatus according to the second exemplary embodiment from the information processing apparatuses according to the other exemplary embodiments, the information processing apparatus according to the second exemplary embodiment will be hereinafter referred to as an information processing apparatus 1100 for convenience of explanation.

In the first exemplary embodiment described above, the set feature space is divided into a plurality of quadrants, and one candidate data is extracted from each quadrant. In the second exemplary embodiment, the number of candidate data to be extracted from each quadrant is changed based on the score representing the possibility that the target data to be annotated is the data indicated by each candidate tag. Therefore, differences between the features of the technique according to the second exemplary embodiment and the features of the technique according to the first exemplary embodiment will be mainly described below, and substantially the same portions as the first exemplary embodiment will not be described in detail.

(Functional configuration)

An example of the functional configuration of the information processing apparatus 1100 according to the present exemplary embodiment will be described with reference to fig. 11. The information processing apparatus 1100 according to the present exemplary embodiment differs from the information processing apparatus 200 shown in fig. 2 in that the information processing apparatus 1100 includes a target data input unit 1101.

The target data input unit 1101 receives input of target data to be annotated. The information processing apparatus 1100 according to the present exemplary embodiment estimates a label that can be added to the target data received by the target data input unit 1101 and a score of the label based on the pre-trained model, and uses the estimation result in the process for determining candidate data. This processing and the processing to be performed by the information processing apparatus 1100 according to the present exemplary embodiment will be described in detail below.

(Processing)

Next, an example of processing to be performed by the information processing apparatus 1100 according to the present exemplary embodiment will be described in detail below.

First, an example of a series of processing procedures for prompting reference information to add a tag to target data will be described with reference to fig. 12. The example shown in fig. 12 is different from the example shown in fig. 5 in that the processes of steps S1201 and S1202 are added, and a part of the process of step S503 is changed.

In step S1201, the target data input unit 1101 receives input of target data to be annotated. In order to receive the input of the target data, for example, the operation screen (user interface) 600 described with reference to fig. 6 may be used.

As a specific example, when the sample image display button 602 is pressed after the target area 612 is specified in the image display area 610, the image data of a partial image corresponding to the target area 612 in the image displayed in the image display area 610 may be acquired as the target data.

In step S1202, the score estimation unit 203 estimates a tag that can be added to the target data acquired in step S1201 and a score indicating the possibility that the target data is the data indicated by the tag.

Subsequently, the processes of steps S501 and S502 are performed, and then the process for determining candidate data is performed in step S503.

An example of the processing of step S503 in the example shown in fig. 12 will be described with reference to fig. 13. The example shown in fig. 13 is different from the example shown in fig. 10 in that the processing of steps S1301 and S1302 is included instead of the processing of step S1003.

In step S1301, the candidate data determination unit 206 plots the target data in the feature space set in step S502 based on the score of each tag estimated for the target data in step S1202.

In step S1302, the candidate data determination unit 206 calculates the distance between the position of the target data plotted in the feature space in step S1301 and the center point of each quadrant set in step S1001. Then, the candidate data determination unit 206 controls the number of candidate data to be determined from each quadrant so that a larger number of candidate data is determined from quadrants closer to the position of the target data.

As a specific example, the candidate data determination unit 206 may determine, from the quadrant closest to the position of the target data, the three candidate data closest to the center point of that quadrant. In this case, the candidate data determination unit 206 may determine, from each quadrant sharing one side with the quadrant closest to the position of the target data, the two candidate data closest to the center point of that quadrant. Further, the candidate data determination unit 206 may determine, from each quadrant sharing only one point with the quadrant closest to the position of the target data, the one candidate data closest to the center point of that quadrant. Then, the processing of steps S504 to S507 shown in fig. 12 is performed as in the example shown in fig. 5.
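The example counts above (three, two, and one) can be sketched as follows, assuming each quadrant is identified by a tuple of per-axis indices so that quadrants differing in one index share a side and quadrants differing in two indices share only a point; this encoding is a hypothetical illustration, not part of the described apparatus:

```python
def counts_per_quadrant(target_quadrant, quadrants):
    """Decide how many candidate data to take from each quadrant:
    three from the quadrant containing the target data, two from
    quadrants sharing a side with it, one from quadrants sharing
    only a corner point (the example counts in the text)."""
    counts = {}
    for q in quadrants:
        differing = sum(a != b for a, b in zip(q, target_quadrant))
        if differing == 0:
            counts[q] = 3      # quadrant closest to the target data
        elif differing == 1:
            counts[q] = 2      # shares one side
        else:
            counts[q] = 1      # shares only one point
    return counts

# Four quadrants of a 2D feature space; the target data lies in (0, 0).
quadrants = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(counts_per_quadrant((0, 0), quadrants))
```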

In the above example, the number of candidate data extracted from each quadrant is set to a fixed number according to the proximity to the position of the target data. However, the operation of the information processing apparatus according to the present exemplary embodiment is not necessarily limited in this manner.

As a specific example, the number of candidate data to be extracted from each quadrant may be changed based on the designation of the user according to the proximity to the position of the target data. In another example, the number of candidate data to be extracted from the quadrants may be changed according to a distance between the position of the target data and the center point of each quadrant. Although the center point is used as the representative point of each quadrant in the above exemplary embodiment, any point other than the center point may be used.

A third exemplary embodiment of the present disclosure will be described below. In order to distinguish the information processing apparatus according to the third exemplary embodiment from the information processing apparatuses according to the other exemplary embodiments, the information processing apparatus according to the third exemplary embodiment will be hereinafter referred to as an information processing apparatus 1400 for convenience of explanation.

In the second exemplary embodiment described above, the number of candidate data to be extracted from each quadrant is changed based on the score representing the possibility that the target data to be annotated is the data indicated by each candidate tag. In the present exemplary embodiment, if the distance in the feature space between the target data and the candidate data selected by the user increases by a predetermined factor or more, predetermined information is provided to the user; the feature space here is defined based on the scores indicating the possibility that the target data is the data indicated by each candidate tag. Therefore, differences between the features of the technique according to the third exemplary embodiment and the features of the technique according to the second exemplary embodiment will be mainly described below, and portions substantially the same as those of the second exemplary embodiment will not be described in detail.

(Functional configuration)

An example of the functional configuration of the information processing apparatus 1400 according to the present exemplary embodiment will be described with reference to fig. 14. The information processing apparatus 1400 according to the present exemplary embodiment differs from the information processing apparatus 1100 shown in fig. 11 in that the information processing apparatus 1400 includes a notification unit 1401.

The notification unit 1401 causes a predetermined output unit to provide notification information according to a predetermined condition, thereby prompting the user with the notification information. As a specific example, if the distance in the feature space between the target data and the candidate data selected by the user increases by a predetermined factor or more, the notification unit 1401 may prompt the user with notification information indicating a warning.

(Processing)

Next, an example of processing to be performed by the information processing apparatus 1400 according to the present exemplary embodiment will be described below with reference to fig. 15, particularly by focusing on a series of processing procedures in which reference information is presented for adding a tag to target data. The example shown in fig. 15 is different from the example shown in fig. 12 in that the processes of steps S1501 to S1504 are added as the processes after step S507.

In step S1501, the candidate data determination unit 206 calculates a distance between the position of the target data plotted in step S1301 in the feature space set in step S502 and the position of the candidate data specified in step S507.

In step S1502, the candidate data determination unit 206 determines whether the distance calculated in step S1501 has increased by a predetermined factor or more in the case where the process of step S507 has been executed a plurality of times in succession. As a specific example, the candidate data determination unit 206 may determine whether the distance calculated in step S1501 has increased threefold or more.
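A minimal sketch of the determination in step S1502, using the threefold factor given as the specific example (the function and parameter names are hypothetical):

```python
def distance_increased_by_factor(previous_distance, current_distance, factor=3.0):
    """Return True if the target-to-candidate distance has grown by the
    given factor or more compared with the previous selection."""
    if previous_distance == 0:
        return False  # no meaningful ratio from a zero baseline
    return current_distance >= factor * previous_distance

# With a previous distance of 0.1, a jump to 0.35 trips the check;
# a smaller increase to 0.2 does not.
print(distance_increased_by_factor(0.1, 0.35))
print(distance_increased_by_factor(0.1, 0.2))
```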

In step S1502, if the candidate data determination unit 206 determines that the distance calculated in step S1501 has not increased by the predetermined factor or more (no in step S1502), the processing returns to step S503. In this case, the processing of step S503 and the subsequent steps is executed again.

On the other hand, in step S1502, if the candidate data determination unit 206 determines that the distance calculated in step S1501 has increased by the predetermined factor or more (yes in step S1502), the processing proceeds to step S1503. In step S1503, the notification unit 1401 causes a predetermined output unit to provide notification information (for example, notification information indicating a warning), thereby prompting the user with the notification information.

For example, fig. 16 shows an example of a dialog box (user interface) for presenting predetermined notification information to the user as display information. The dialog box 1600 displays notification information (e.g., text information) indicating a warning, thereby prompting the user with the notification information. The dialog box 1600 includes an OK button 1601, a redo button 1602, and a close button 1603.

The OK button 1601 is a button for receiving, from the user, an instruction to continue the processing currently being executed. The redo button 1602 is a button for receiving, from the user, an instruction to interrupt the processing currently being executed and to redo the process for determining candidate data (i.e., to execute the processing again from the setting of the feature space). The close button 1603 is a button for receiving, from the user, an instruction to close the dialog box 1600. In the present exemplary embodiment, when the close button 1603 is pressed, a process similar to that executed when the OK button 1601 is pressed is applied.

In step S1504, the candidate data determination unit 206 determines whether the redo process is selected. As a specific example, if it is detected that the redo button 1602 is pressed on the dialog box 1600 shown in fig. 16, the candidate data determination unit 206 may determine that the redo process is selected. On the other hand, if it is detected that the OK button 1601 or the close button 1603 is pressed on the dialog box 1600, the candidate data determination unit 206 may determine that the redo process is not selected.

In step S1504, if the candidate data determination unit 206 determines that the redo processing is selected (yes in step S1504), the processing returns to step S502. In this case, the feature space setting process exemplified by the process of step S502 is executed again.

On the other hand, in step S1504, if the candidate data determination unit 206 determines that the redo processing is not selected (no in step S1504), the processing returns to step S503. In this case, the process for determining the candidate data exemplified by the process of step S503 is executed again.

The present exemplary embodiment described above shows an example in which notification information (e.g., a warning) is provided when the distance between the target data and the selected candidate data increases by a predetermined factor or more. However, the processing to be executed by the information processing apparatus according to the present exemplary embodiment is not necessarily limited thereto.

As a specific example, the notification information may be provided according to whether the target data and the selected candidate data are spaced apart from each other by a predetermined distance or more. In another example, the distance between the target data and the candidate data may be accumulated each time candidate data is selected, and the notification information may be provided when the accumulated value of the distance exceeds a threshold. The factor, the distance, the accumulated value of the distance, or another such threshold used for the determination may be a fixed value, or may be changed based on an instruction from the user.
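The accumulated-distance variant described above might be sketched as follows; the class name and the threshold value are illustrative assumptions:

```python
class AccumulatedDistanceMonitor:
    """Accumulate the target-to-candidate distance at each selection and
    report when the accumulated value exceeds a threshold (one of the
    alternative notification triggers described above)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.total = 0.0

    def add_selection(self, distance):
        # Returns True when the accumulated distance exceeds the threshold,
        # i.e., when the notification should be provided.
        self.total += distance
        return self.total > self.threshold

monitor = AccumulatedDistanceMonitor(threshold=1.0)
print(monitor.add_selection(0.4))  # accumulated 0.4, below threshold
print(monitor.add_selection(0.7))  # accumulated 1.1, exceeds threshold
```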

In the example of the above-described processing, the determination in step S1504 provides the user with two options, that is, an option to continue the processing and an option to redo the processing from the setting of the feature space. However, the options that the user can select in the determination are not limited to this example.

As a specific example, an option may be provided to restart the processing from the state in which the previous candidate data was selected. In this case, a process for storing the state of the feature space at the time the previous candidate data was selected may be added. Further, an interface (e.g., a button) for restarting the processing from the state in which the user selected the previous candidate data may be provided on the dialog box 1600 shown in fig. 16.

Embodiments of the present disclosure can also be realized by a process in which a program for realizing one or more functions according to the above-described exemplary embodiments is supplied to a system or an apparatus via a network or a recording medium, and one or more processors in a computer of the system or apparatus read and execute the program. Embodiments may also be implemented by a circuit (e.g., an application specific integrated circuit (ASIC)) that implements one or more functions according to the above-described exemplary embodiments.

The above-described embodiments are merely examples, and may be modified or changed in various ways without departing from the technical scope according to the present disclosure.

For example, the functions of the information processing apparatus according to the present exemplary embodiment may be realized by a plurality of apparatuses in a cooperative manner. As a specific example, some components of the information processing apparatus shown in fig. 2, 11, and 14 may be provided in an external device different from the information processing apparatus. Further, the load on the processing performed by at least some of the components of the information processing apparatuses shown in fig. 2, 11, and 14 may be distributed to a plurality of apparatuses. Although the exemplary embodiment is described by focusing on the case where image data is mainly used as data to which a tag is added as an annotation, the type of data is not necessarily limited to the above-described image data. Further, the method of prompting for related information and the method of receiving designation of target data from the user may be appropriately changed according to the type of data to which a tag is added as an annotation.

According to embodiments of the present disclosure, a user may be helped to select a tag to be added as an annotation in a more desirable manner.

Other embodiments

The embodiments of the present disclosure can also be realized by a method in which software (programs) for performing the functions of the above-described embodiments is supplied to a system or an apparatus through a network or various storage media, and a computer, or a central processing unit (CPU) or micro processing unit (MPU), of the system or apparatus reads out and executes the programs.

Although exemplary embodiments are described herein, it should be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
