Confidence measures for deployed machine learning models

Document No.: 1850878 · Publication date: 2021-11-16

Note: This technique, "Confidence measures for deployed machine learning models", was devised by M. Lenga, R. Wick, T. Klinder, M. Bergtholdt and H. Carolus on 2020-01-21. Its main content comprises the following: Concepts for obtaining confidence measures for machine learning models are presented. One such concept processes input data using a machine learning model to generate a preliminary result. It also generates a plurality of modified instances of the input data and processes the plurality of modified instances of the input data with the machine learning model to generate a respective plurality of secondary results. A confidence measure associated with the preliminary result is determined based on the secondary results.

1. A method for obtaining a confidence measure for a machine learning model, the method comprising:

processing input data with the machine learning model to generate a preliminary result;

generating a plurality of modified instances of the input data;

processing the plurality of modified instances of the input data with the machine learning model to generate a respective plurality of secondary results; and

determining a confidence measure related to the preliminary result based on the secondary results.

2. The method of claim 1, wherein determining a confidence measure comprises:

determining a measure of the distribution or variance of the secondary results; and

determining a confidence measure based on the determined measure of distribution or variance.

3. The method of claim 2, wherein determining a measure of the distribution or variance of the secondary results comprises determining at least one of:

an inverse variance of the secondary results;

a Shannon entropy of the secondary results;

a Gini coefficient of the secondary results;

a Kullback-Leibler divergence of the secondary results; and

a concentration measure of the secondary results.

4. The method of any of claims 1-3, wherein generating the plurality of modified instances of the input data comprises:

applying a first spatial warping transform to the input data to generate a first modified instance of the input data.

5. The method of claim 4, further comprising:

applying a first inverse spatial warping transform to the secondary result generated for the first modified instance of the input data.

6. The method of any of claims 1-5, wherein generating the plurality of modified instances of the input data comprises:

adding noise to the input data to generate a second modified instance of the input data.

7. The method of any of claims 1-6, wherein generating the plurality of modified instances of the input data comprises:

applying a local deformation transformation to the input data to generate a third modified instance of the input data.

8. The method of claim 7, further comprising:

applying a first inverse local deformation transform to the secondary result generated for the third modified instance of the input data.

9. The method of any of claims 1-8, wherein the machine learning model comprises at least one of:

an artificial neural network;

a generative adversarial network (GAN); and

a Bayesian network.

10. The method of any of claims 1 to 9, further comprising:

associating the determined confidence measure with the preliminary result.

11. A computer program product for obtaining a confidence measure for a machine learning model, wherein the computer program product comprises a computer-readable storage medium having computer-readable program code embodied therein, the computer-readable program code being configured to perform all the steps of any one of claims 1 to 10 when executed on at least one processor.

12. A system comprising at least one processor and the computer program product of claim 11.

13. A system for obtaining a confidence measure for a machine learning model, the system comprising:

an input interface (110) configured to obtain input data;

a data modification component (120) configured to generate a plurality of modified instances of the input data;

a machine learning model interface (122) configured to communicate the input data and the plurality of modified instances of the input data to a machine learning model (105), and further configured to receive a preliminary result generated by the machine learning model processing the input data, and configured to receive a plurality of secondary results generated by the machine learning model processing the respective plurality of modified instances of the input data; and

an analysis component (124) configured to determine a confidence measure related to the preliminary result based on the secondary results.

14. The system of claim 13, wherein the analysis component is configured to determine a measure of distribution or variance of the secondary results and to determine a confidence measure based on the determined measure of distribution or variance.

15. The system of claim 13 or 14, wherein the data modification component is configured to apply a first spatial warping transform to the input data to generate a first modified instance of the input data, and wherein the data modification component is further configured to apply a first inverse spatial warping transform to the secondary result generated for the first modified instance of the input data.

Technical Field

The present invention relates generally to machine learning, and more particularly to obtaining confidence measures for deployed machine learning models.

Background

Recent technological advances have led to the use of Machine Learning (ML) models that are intended to assist in data analysis (e.g., for identifying medical features and/or making clinical decisions). Typical data analysis applications include recognition, delineation (e.g., semantic segmentation, voxel labeling), and classification.

ML models are typically trained using training data sets that are limited in size and/or variability. For example, in the medical field, the variability represented by all training data is limited due to the lack of large databases. Therefore, so-called 'augmentation' methods are often employed to increase the size and/or variability of the training data set in order to improve the performance, reliability and/or robustness of the ML model.

After training and deployment, the customer uses the final (fixed) ML model to evaluate the input data (e.g., new medical cases).

For the client (i.e., at the client side), the ML component/system is typically a closed/fixed 'black box' configured to receive input data and generate/output results or decisions based on the input data. Thus, in a typical use case, the ML component/system is 'sealed' (or fixed), and it is not possible to perform retraining of the ML model on the client side. Such sealing (or fixing) of ML components/systems may be due to a number of different reasons, including, for example, limited computing resources; licensing issues; the infeasibility of on-site label correction; or regulatory (e.g., FDA) constraints.

Disclosure of Invention

The present invention provides a system and a method as defined in the independent claims. The dependent claims provide advantageous embodiments.

A method for obtaining a confidence measure for a machine learning (ML) model is provided, the method comprising: processing the input data using an ML model to generate a preliminary result; generating a plurality of modified instances of the input data; processing the plurality of modified instances of the input data using the ML model to generate a corresponding plurality of secondary results; and determining a confidence measure associated with the preliminary result based on the secondary results.
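The sequence of steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the ML model is stood in for by a plain function, the modification is simple additive Gaussian noise, and names such as `confidence_for` are hypothetical.

```python
import random
import statistics

def confidence_for(model, input_data, n_instances=10, noise_sigma=0.01):
    """Process the input, perturb it n_instances times, re-process, and
    derive a confidence measure from the spread of the secondary results."""
    preliminary = model(input_data)              # preliminary result

    secondary = []
    for _ in range(n_instances):
        # One simple modification: add small Gaussian noise to each element.
        modified = [x + random.gauss(0.0, noise_sigma) for x in input_data]
        secondary.append(model(modified))        # secondary results

    # Low variance across the secondary results -> high confidence.
    confidence = 1.0 / (statistics.pvariance(secondary) + 1e-12)
    return preliminary, confidence
```

Any model whose output is numeric can be plugged in; for categorical outputs the variance would be replaced by a histogram-based dispersion measure, as discussed below in the disclosure.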

A concept is presented for determining a confidence (i.e., a confidence measure) related to a deployed ML model. In particular, it is proposed that the confidence measure may be determined by modifying (or augmenting) the input data and analyzing the results provided by the ML model for the modified (or augmented) input data. Such a proposal relies on the notion that an acceptable ML model should have 'good performance', in the sense that small perturbations of the input data should have a correspondingly small impact on the model output.

For example, the input data may be automatically subjected to several modifications (or augmentations), which are then processed by the ML model. Based on the results associated with the modified (or augmented) data, the variation in the results can be analyzed to assess the robustness or variability of the results. This may enable a confidence measure of the ML model to be determined. For example, a confidence measure specific to certain input data may be determined, and this may be based on a variance of results provided by the ML model that processes a modified (or augmented) version of the particular input data.

For example, the proposed embodiments may be used to identify whether the results of a client-side ML model (e.g., provided by processing specific input data) are reliable.

Still further, embodiments may facilitate providing additional information related to uncertainty associated with the output or results of the ML model.

Thus, the proposed embodiments may be particularly advantageous for applications in which an indication of the perceived accuracy or reliability of the deployed (e.g., client-side) ML model output is desirable. By way of example, this may be particularly important in the healthcare field, where healthcare practitioners need to understand and evaluate ML model results (and accept or adjust ML model decisions accordingly).

Unlike traditional ML models, which may come with only a global/general confidence level supplied by the model provider, the proposed embodiments may provide a confidence measure that is specific to the input data (e.g., a single medical case).

Thus, concepts beyond the traditional approach of simply highlighting a general or global confidence level may be provided. For example, the proposed embodiments may associate confidence measures with ML model results/outputs for specific input data of the ML model. This may enable the results to be provided with supplemental information (such as image overlays and related textual descriptions), which may allow experts (e.g., clinical experts, technicians, data analysts, engineers, medical practitioners, radiologists, etc.) to quickly evaluate the results of the model by focusing on the results/outputs associated with higher confidence measures.

The ML model employed by the embodiments may be constructed using conventional machine learning and/or image processing techniques to leverage historical data and/or established knowledge to improve the accuracy of the determinations/results provided by the proposed embodiments.

Embodiments can provide a confidence estimate (e.g., a measure of uncertainty) with respect to a result associated with particular input data (e.g., an image feature or region). As such, embodiments may help identify input data and/or output results with a high degree of ML model uncertainty.

Thus, the proposed embodiments can identify input data (e.g., medical image regions) that are important to the ML model output, and also associate such input data with visual features (e.g., image overlays with relevant textual descriptions) that may be useful to a user (such as, for example, a medical practitioner). This may allow a user to quickly and easily verify the results of the model and identify situations where the model does not make a correct or trustworthy decision. Further, embodiments may identify an uncertainty (i.e., a confidence measure) associated with each input data (e.g., each medical case). For example, this may allow a user (such as a medical practitioner) to view the model output starting with the least certain (i.e., lowest confidence measure) output.
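The review ordering described above amounts to a sort on the confidence measure. A minimal sketch, assuming (purely for illustration) that each case is represented as a (case_id, preliminary_result, confidence) triple:

```python
def review_order(cases):
    """Sort (case_id, preliminary_result, confidence) triples so the least
    certain cases (lowest confidence measure) are presented for review first."""
    return sorted(cases, key=lambda case: case[2])
```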

Accordingly, the presented embodiments may facilitate improved data analysis and case diagnosis (e.g., made more accurate and/or easier). Embodiments may also be employed to improve the efficiency and/or effectiveness of a Clinical Decision Support (CDS) system. Thus, the proposed embodiments may provide an improved CDS concept.

Thus, the proposed embodiments may be particularly relevant for medical data analysis and medical image analysis. For example, it may be helpful to identify input/output data (e.g., medical cases or medical image features) of the ML model, and to identify uncertainties (i.e., confidence measures) associated with ML model output of the input data. Thus, the proposed concept may also facilitate accurate assessment or diagnosis of a subject's health using medical analysis. Thus, the input data may comprise, for example, medical data, medical images, or medical features. Further, the results generated by processing the input data using the ML model may include reasoning, medical decision, diagnosis, decision, or recommendation.

In some proposed embodiments, determining the confidence measure may include: determining a measure of the distribution or variance of the secondary results; and determining a confidence measure based on the determined measure of distribution or variance. For example, determining a measure of the distribution or variance of the secondary results may include determining at least one of: the inverse variance of the secondary results; the Shannon entropy of the secondary results; the Gini coefficient of the secondary results; the Kullback-Leibler divergence of the secondary results; and a concentration metric of the secondary results. Thus, simple mathematical methods or formulas may be employed to determine the confidence measures of the machine learning model, facilitating accurate and/or informed data analysis with reduced implementation complexity.

It should be appreciated that various approaches, methods, or functions may be used to provide a confidence measure based on the secondary results. For example, some embodiments may employ the inverse variance of the secondary results, while other embodiments may employ measures defined on histograms of categorical data, such as the Shannon entropy, the Gini coefficient, the Kullback-Leibler (KL) divergence, and so forth. Alternatively or additionally, a concentration metric of the empirical distribution may be employed.
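The dispersion measures listed above are standard and can be written out directly. The following stdlib-only Python sketch treats the secondary results either as numeric values (for the inverse variance) or as a normalized class histogram (for entropy, Gini and KL); the function names are illustrative, and "Gini coefficient" is interpreted here as the Gini impurity of the histogram.

```python
import math
import statistics

def inverse_variance(values, eps=1e-12):
    """Higher when the secondary results agree closely (low spread)."""
    return 1.0 / (statistics.pvariance(values) + eps)

def shannon_entropy(hist):
    """Shannon entropy (in bits) of a normalized class histogram."""
    return -sum(p * math.log2(p) for p in hist if p > 0)

def gini_impurity(hist):
    """Gini impurity of the histogram: 0 when all results fall in one class."""
    return 1.0 - sum(p * p for p in hist)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two histograms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For all four measures, perfect agreement among the secondary results yields the extreme value (maximal inverse variance; zero entropy, impurity, and divergence), which is what makes them usable as confidence measures.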

In some embodiments, generating the plurality of modified instances of the input data may include: applying a first spatial warping transformation to the input data to generate a first modified instance of the input data. Thus, a simple modification/augmentation method may be employed to generate modified instances of the input data. This also allows control over the modifications made, so that modified instances of the input data can be generated easily and with reduced complexity.

Further, embodiments may also include: applying a first inverse spatial warping transform to the secondary result generated for the first modified instance of the input data. In this way, the results may be transformed back to be compared with the result of the unmodified input data, thereby enabling easier and/or more accurate evaluation.
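As an illustration of this warp / inverse-warp pairing, the sketch below uses a circular shift of a 1-D signal as a toy invertible "spatial warp" (an assumption made for brevity; the patent contemplates general spatial warping transforms on, e.g., images). For a model whose output is per-position (such as a segmentation), applying the inverse shift aligns the secondary result with the unmodified input's grid.

```python
def shift(seq, k):
    """Toy invertible 'spatial warp': circularly shift a 1-D signal by k."""
    k %= len(seq)
    return seq[-k:] + seq[:-k]

def inverse_shift(seq, k):
    """Inverse warp: shift back by k so outputs align with the original grid."""
    return shift(seq, -k)

def aligned_secondary_result(model, signal, k):
    """Warp the input, run the model, then un-warp the per-position output."""
    warped = shift(signal, k)
    result = model(warped)    # assumed to be per-position (e.g. voxel labels)
    return inverse_shift(result, k)
```

If the model is equivariant under the warp, the aligned secondary result coincides with the preliminary result; deviations between the two contribute to the variance from which the confidence measure is derived.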

In some embodiments, generating the plurality of modified instances of the input data may include: noise is added to the input data to generate a second modified instance of the input data. Adding noise may enable simple addition of random modifications, thereby ensuring that small random perturbations are made. Thus, small or minor modifications to the input data with reduced complexity may be easily achieved by the proposed embodiments.

Further, generating multiple modified instances of the input data may include: applying a local deformation transformation (e.g., a warping function) to the input data to generate a third modified instance of the input data. Such embodiments may further include: applying the first inverse local deformation transform to the secondary results generated for the third modified instance of the input data.

By way of example, the machine learning model may include an artificial neural network, a generative adversarial network (GAN), a Bayesian network, or a combination thereof.

Embodiments may also include the steps of: the determined confidence measure is associated with the preliminary result.

Embodiments may also include the steps of: an output signal is generated based on the determined confidence measure. Embodiments may be adapted to provide such an output signal to at least one of: a subject, a medical practitioner, a medical imaging device operator, and a radiologist. Thus, the output signal may be provided to a user or medical device for indicating the calculation result/decision and its associated confidence measure.

Some embodiments may further comprise the steps of: generating a control signal for modifying the graphical element based on the determined confidence measure. The graphical element may then be displayed in accordance with the control signal. In this way, a user (such as a radiologist) may have a suitably arranged display system that can receive and display information about the results provided by the machine learning model. Thus, embodiments may enable a user to remotely analyze results (e.g., output, decision, inference, etc.) from deployed (e.g., client-side) machine learning models.

According to yet another aspect of the present invention, a computer program product for obtaining a confidence measure for a machine learning model is provided, wherein the computer program product comprises a computer readable storage medium having computer readable program code embodied therein, the computer readable program code being configured to, when executed on at least one processor, perform all the steps of an embodiment.

A computer system may be provided, comprising a computer program product according to an embodiment and one or more processors adapted to perform a method according to an embodiment by executing the computer-readable program code of the computer program product.

In another aspect, the invention relates to a computer-readable non-transitory storage medium comprising instructions which, when executed by a processing device, perform the steps of a method for obtaining a confidence measure for a machine learning model according to an embodiment.

According to another aspect of the present invention, there is provided a system for obtaining confidence measures for a machine learning model, the system comprising an input interface configured to obtain input data; a data modification component configured to generate a plurality of modified instances of input data; a machine learning model interface configured to communicate input data and a plurality of modified instances of the input data to a machine learning model, and further configured to receive a preliminary result generated by the machine learning model processing the input data and receive a plurality of secondary results generated by the machine learning model processing a respective plurality of modified instances of the input data; and an analysis component configured to determine a confidence measure related to the preliminary result based on the secondary result.

It should be appreciated that all or part of the proposed system may comprise one or more data processors. For example, the system may be implemented using a single processor adapted for data processing to determine a confidence measure for the deployed machine learning model.

The system for obtaining the confidence measure of the machine learning model may be located remotely from the machine learning model, and data may be communicated between the machine learning model and the system unit via a communication link.

The system may include a server device including an input interface, a data modification component, and a machine learning model interface; and a client device comprising an analysis component. Accordingly, a dedicated data processing device may be employed to determine the confidence measure, thereby reducing the processing requirements or capabilities of other components or devices of the system.

The system may include a client device, wherein the client device includes an input interface, a data modification component, a client-side machine learning model, and an analysis component. In other words, a user (such as a medical professional) may have a suitably arranged client device (such as a laptop, tablet, mobile phone, PDA, etc.) that processes received input data (e.g., medical data) in order to generate the preliminary result and the associated confidence measure.

Thus, the processing may be hosted at a location different from the location at which the input data is generated and/or processed. For example, for reasons of computational efficiency, it may be advantageous to perform only part of the processing at a particular location, thereby reducing associated costs, processing power, transmission requirements, and the like.

Thus, it should be appreciated that processing power may thus be distributed throughout the system in different ways depending on predetermined constraints and/or availability of processing resources.

Embodiments may also enable some of the processing load to be distributed throughout the system. For example, pre-processing may be performed at a data acquisition system (e.g., a medical imaging/sensing system). Alternatively or additionally, processing may be performed at a communications gateway. In some embodiments, processing may occur at a remote gateway or server, thereby relieving end users or output devices of processing requirements. Such a distribution of processing and/or hardware may allow for improved maintenance capabilities (e.g., by concentrating complex or expensive hardware in a preferred location). The computational load and/or traffic may also be designed or located within the networked system according to the available processing power. A preferred approach may be to process the initial/source data locally and transmit only the extracted data for overall processing at a remote server.

Embodiments may be implemented in connection with pre-existing, pre-installed, or otherwise separately supplied machine learning models. Other embodiments may be provided with (e.g., integrated into) new apparatus that include machine learning models.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.

Drawings

Examples according to aspects of the present invention will now be described in detail with reference to the accompanying drawings, in which

Fig. 1 is a simplified block diagram of a system for obtaining confidence measures for a machine learning model, according to an embodiment;

fig. 2 is a flow diagram of a method for obtaining a confidence measure for a machine learning model, according to an embodiment; and

fig. 3 is a simplified block diagram of a system for obtaining confidence measures for a machine learning model, according to another embodiment.

Detailed Description

A concept for obtaining a confidence measure for a machine learning model is presented. Further, the confidence measures may be associated with particular input data provided to the machine learning model. Thus, embodiments may enable providing information that may be useful for evaluating model output.

In particular, the confidence measure may be determined by modifying (or augmenting) the input data and analyzing the results provided by the ML model for the modified (or augmented) input data. For example, the input data may be automatically subjected to several different modifications (or augmentations), which are then processed by the ML model. The ML model output (i.e., the results) of the modified (or augmented) data can then be analyzed to assess the robustness or variability of the results. This may enable a confidence measure of the ML model to be determined.

For example, a confidence metric associated with the input data of the ML model may be determined, and this may be based on a variance of the ML model output (i.e., the result) of the modified (or augmented) input data.

Associating such confidence measures with the input data of the ML model may cause an indication (e.g., a graphical overlay or a textual description of the associated confidence measure) to be associated with the ML model output for that input data. This may help to evaluate ML model results simply and quickly (e.g., by identifying outputs of the model that are less reliable).

Embodiments may provide an estimate of uncertainty about the ML model output. Thus, the proposed embodiments may be used, for example, to identify whether the output of a client-side ML model (e.g., provided by processing specific input data) is reliable.

For example, embodiments may be used to improve medical data analysis of a subject. Thus, the illustrative embodiments may be used in many different types of medical evaluation devices and/or medical evaluation facilities, such as hospitals, wards, research facilities, and the like.

By way of example, the ML model output confidence estimate may be used to understand and/or evaluate decisions made by the ML model. Using the proposed embodiments, the user may, for example, identify less reliable or more reliable model outputs/results.

Further, embodiments may be integrated in a data analysis system or ML decision system to provide real-time information about the results to a user (e.g., technician, data analyst). Using this information, a technician can review the model outputs and/or decisions and, if necessary, adjust or modify the outputs and/or decisions.

The proposed embodiments can identify uncertain decisions or outputs from the ML model. Such decisions/outputs may then be focused on and/or refined (e.g., via learning from more information sources).

In order to provide context for a description of the elements and functionality of the illustrative embodiments, the following figures are provided as examples of how aspects of the illustrative embodiments may be implemented. Thus, it should be appreciated that the figures are only examples and are not intended to assert or imply any limitation with regard to the environments, systems, or methods in which aspects or embodiments of the present invention may be implemented.

Embodiments of the present invention may be directed to enabling potential ranking of ML model results. This may be useful for evaluating deployed (e.g., client-side) ML models, for example, by identifying uncertain decisions or outputs. This may help to reduce the impact of erroneous or inaccurate decisions, thereby providing improved data analysis. Thus, embodiments may be used for real-time data evaluation purposes, e.g., to evaluate whether a medical image analysis model is appropriate for a particular subject and/or medical scanning procedure.

Fig. 1 illustrates an embodiment of a system 100 for obtaining confidence measures for an ML model 105 according to an embodiment. Here, the ML model 105 has been deployed to the client and has therefore already been trained and finalized; client-side retraining of the ML model 105 is not feasible or possible.

The system 100 comprises an interface component 110, the interface component 110 being adapted to acquire input data 10. Here, the interface component 110 is adapted to receive the input data 10 in the form of a medical image 10 from a medical imaging apparatus 115 (such as, for example, an MRI device).

The medical image 10 is communicated to the interface component 110 via a wired connection or a wireless connection. By way of example, the wireless connection may comprise a medium- or short-range communication link. For the avoidance of ambiguity, a medium- or short-range communication link may be taken to mean a communication link having a range of up to about one hundred (100) meters. In short-range communication links, designed for very short communication distances, the signal typically travels from a few centimeters to several meters, whereas in medium-range communication links the signal typically travels up to one hundred (100) meters. Examples of short-range wireless communication links include ANT+, Bluetooth, Bluetooth Low Energy, IEEE 802.15.4, ISA100.11a, infrared (IrDA), Near Field Communication (NFC), RFID, 6LoWPAN, UWB, WirelessHART, WirelessHD, Wireless USB and ZigBee. Examples of medium-range communication links include Wi-Fi, ISM-band links and Z-Wave. In this embodiment, the signal is communicated via the wired or wireless connection without encryption. However, it should be appreciated that in other embodiments one or more encryption techniques and/or one or more secure communication links may be employed for the communication of signals/data in the system.

The system 100 also includes a data modification component 120, the data modification component 120 configured to generate a plurality of modified instances of the input data. In particular, the data modification component 120 of the present embodiment is configured to apply a plurality of different spatial warping transformations to the medical image 10 to generate a corresponding plurality of modified instances of the medical image 10. In this way, the data modification component 120 makes small/minor modifications to the medical image 10 to generate multiple modified (or augmented) versions of the medical image.

The system 100 also includes an ML model interface 122, the ML model interface 122 configured to communicate the medical image 10 and the plurality of modified instances of the medical image to the ML model 105. To this end, the machine learning model interface 122 of the system 100 may communicate with the machine learning model 105 via the internet or "cloud" 50.

In response to receiving the medical image 10 and the plurality of modified instances of the medical image, the ML model processes the received data to generate respective results. More specifically, the medical image 10 is processed by the ML model 105 to generate a preliminary result, and the modified instances of the input data are processed by the ML model 105 to generate a corresponding plurality of secondary results.

These results generated by the ML model 105 are then communicated back to the system 100. Thus, the machine learning model interface 122 is further configured to receive the preliminary results (generated by the ML model 105 processing the medical image 10) and receive the plurality of secondary results (generated by the ML model 105 processing the respective plurality of modified instances of the medical image 10).

In this embodiment, because the data modification component 120 applies a plurality of different spatial warping transforms to the medical image 10 (to generate a corresponding plurality of modified instances of the medical image 10), the data modification component 120 is further configured to apply a corresponding inverse spatial warping transform to each of the received plurality of secondary results. In this way, the secondary results are transformed back, or normalized, with reference to the preliminary result.

Herein, it should be noted that for this purpose, applying the transformation and/or inverse transformation, the data modification component 120 may be in communication with one or more data processing resources available in the internet or "cloud" 50. Such data processing resources may undertake some or all of the processing required to implement the transformation. Thus, it should be appreciated that this embodiment may employ distributed processing principles.

The system 100 also includes an analysis component 124, the analysis component 124 configured to determine a confidence measure based on the secondary results. Herein, the analysis component 124 is configured to determine the confidence measure based on the inverse variance of the secondary results. Thus, the value of the confidence measure is high when the variance of the secondary results is low. Conversely, when the variance of the secondary results is high, the value of the confidence measure is low. The confidence measure is then associated with the input medical image 10 and the preliminary result obtained by processing it with the ML model 105.
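As a minimal sketch of the inverse-variance confidence described above (assuming, for illustration, scalar-valued secondary results collected in an array; the function name and the small epsilon guard against division by zero are our choices, not part of the described system):

```python
import numpy as np

def inverse_variance_confidence(secondary_results, eps=1e-8):
    """Confidence as the inverse variance of the secondary results:
    a tight cluster of results yields a high confidence value, while
    a widely spread set of results yields a low one."""
    results = np.asarray(secondary_results, dtype=float)
    return 1.0 / (np.var(results) + eps)
```

For example, secondary results [0.50, 0.51, 0.49] produce a much higher confidence than [0.1, 0.9, 0.5], reflecting the analysis component's inverse relationship between variance and confidence.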

Again, it should be noted that to determine the confidence measure based on the secondary results, the analysis component 124 may be in communication with one or more data processing resources available in the internet or "cloud" 50. Such data processing resources may perform some or all of the processing required to determine the confidence measure. Thus, it should be appreciated that this embodiment may employ distributed processing principles.

The analysis component 124 is further adapted to generate an output signal 130 representing the preliminary result and the determined confidence measure. In other words, after the reliability of the preliminary result has been determined, an output signal 130 representing that reliability is generated.

The system also includes a Graphical User Interface (GUI) 160, the GUI 160 for providing information to one or more users. The output signal 130 is provided to the GUI 160 via a wired connection or a wireless connection. By way of example, the wireless connection may comprise a short-to-medium-range communication link. As indicated in fig. 1, the output signal 130 is provided from the data processing unit 110 to the GUI 160. However, where the system already uses data processing resources via the internet or cloud 50, the output signal 130 may instead be made available to the GUI 160 via the internet or cloud 50.

Based on the output signal 130, the GUI 160 is adapted to convey information by displaying one or more graphical elements in a display area of the GUI 160. As such, the system may communicate information about the outcome of processing the medical image 10 using the ML model 105, which may be used to indicate a level of certainty or confidence associated with the processing outcome. For example, the GUI 160 may be used to display graphical elements to medical practitioners, data analysts, engineers, medical imaging device operators, technicians, and so forth.

Although the example embodiment of fig. 1 detailed above is described with respect to medical imaging, it should be appreciated that the concepts presented may be extended to other forms of input data, such as medical case notes, engineering images, and the like.

Further, from the above description, it should be appreciated that the ML model may take any suitable form. For example, the ML model may include an artificial neural network, a generative adversarial network (GAN), a Bayesian network, or a combination thereof.

From the above description of FIG. 1, it should be appreciated that embodiments may provide an estimate of the confidence in the results returned from the ML model. The estimate of confidence may be derived by means of input data augmentation/modification, and may be obtained at the client side for a deployed ML model.

Embodiments may be premised on the following proposal: an accurate or reliable/trustworthy ML model should be "well-behaved", i.e., small perturbations of the input data should have only a small corresponding effect on the output data. To analyze the ML model, the input data (e.g., a new case) to be processed is subjected to a number of augmentations/modifications, which are then all processed by the ML model. The variance of the results from processing the augmented/modified inputs may then be used to determine a confidence measure (which may represent the accuracy, robustness, or reliability of the ML model). As such, embodiments may provide input-specific (e.g., case-specific) confidence measures, independent of any statements or claims made by the provider of the ML model.
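The procedure just described can be sketched end to end. The toy "model" and the perturbation functions below are hypothetical stand-ins of our own invention; the described system would instead call a deployed ML model with domain-appropriate augmentations:

```python
import numpy as np

def confidence_from_perturbations(model, x, perturb_fns, eps=1e-8):
    """Run the model on the original input and on several perturbed
    copies, then derive a confidence value from the spread (variance)
    of the secondary results."""
    preliminary = model(x)
    secondary = np.array([model(fn(x)) for fn in perturb_fns])
    confidence = 1.0 / (np.var(secondary) + eps)
    return preliminary, confidence

# Toy example: a "model" that simply averages its input.
toy_model = lambda x: float(np.mean(x))
perturbations = [
    lambda x: x + 0.01,                               # small offset
    lambda x: x * 1.01,                               # small scaling
    lambda x: x + np.linspace(-0.01, 0.01, len(x)),   # small ramp
]
```

Because the toy model responds only slightly to these small perturbations, it receives a high confidence value, matching the "well-behaved" intuition above.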

Thus, the proposed embodiments are based on the proposal of interpreting the generalization capability of the ML model as a continuity property. In other words, if the ML model (such as a neural network) generalizes well to unseen input data (e.g., a case C0), then small perturbations of the input data (C0) should only have a small corresponding effect on the ML model output.

Embodiments may be applied to the client-side deployment phase (rather than the training phase) of a deep learning model.

Also by way of example, a method for obtaining confidence measures for an ML model according to an embodiment may be summarized as follows:

(i) A new case C0 is input to the ML model to produce a particular preliminary result R0. In the case of semantic segmentation via voxel labeling, C0 corresponds to the input image volume, and R0 corresponds to a volume of the same size in which each voxel contains label information, e.g., a single integer value designating a class for the corresponding image voxel of C0. Alternatively, for each voxel of C0, R0 may include a probability vector whose entries correspond to the associated class probabilities. For example, for an ML model that can decide between n different classes, the vector associated with a certain image voxel consists of n entries summing to 1.

(ii) Case C0 is modified ('augmented') in a number of ways, e.g., by spatial registration with other patients, by arbitrary local deformations (warping), by adding noise, etc., or by a combination of these approaches. In the case of semantic segmentation, when a spatial transformation is applied, the coordinates in the original image are stored for each voxel in the warped image.

(iii) Each augmented version Ci of C0 is also passed to the ML model, and a corresponding result Ri is obtained for each; for semantic segmentation, a common ML model is the U-Net architecture.

(iv) The preliminary result R0 is provided to the user, accompanied by the variability in the set {Ri}, which indicates a confidence level associated with the preliminary result R0 (where the confidence level is inversely proportional to the variability of the set {Ri}). For semantic segmentation, the confidence level is calculated separately for each voxel.
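Step (i) above mentions per-voxel probability vectors whose n entries sum to 1; such a vector is typically obtained by applying a softmax to the network's raw per-class scores. A minimal illustration (the function name is ours; the described system need not use this exact formulation):

```python
import numpy as np

def softmax(logits):
    """Convert raw per-class scores into a probability vector whose
    entries sum to 1 (numerically stabilized by subtracting the max)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

The class label stored in R0 for a voxel is then simply the index of the largest entry of this vector.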

For each voxel in C0, the corresponding position in each Ci is looked up and the output of the ML algorithm is collected from Ri. All outputs associated with the considered voxel are summarized using a histogram (a class frequency summary). A confidence value can then be derived from an analysis of the histogram and the related empirical distribution, for example via the Shannon entropy (i.e., H = -Σ_i p_i log(p_i), where p_i is the probability of a given symbol) or the Gini coefficient (which is equal to the area under the line of perfect equality (defined as 0.5) minus the area under the Lorenz curve, divided by the area under the line of perfect equality; in other words, twice the area between the Lorenz curve and the line of perfect equality can be calculated).
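The two dispersion measures named above can be computed directly from a per-voxel label histogram. A small sketch (function names are ours; the Gini coefficient uses the standard discrete form of the Lorenz-curve definition):

```python
import numpy as np

def shannon_entropy(label_counts):
    """Shannon entropy H = -sum(p_i * log(p_i)) of an empirical label
    histogram (natural logarithm; zero-count bins are skipped)."""
    p = np.asarray(label_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def gini_coefficient(label_counts):
    """Gini coefficient: twice the area between the Lorenz curve and
    the line of perfect equality, via the discrete formula."""
    x = np.sort(np.asarray(label_counts, dtype=float))  # ascending
    n = x.size
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n
```

A voxel whose label histogram is concentrated on one class has entropy near 0 (high confidence); a uniform histogram over two classes has entropy log(2).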

For voxel labeling (e.g., semantic segmentation) and localization tasks, the use of augmentation through diffeomorphic spatial transformations (e.g., rigid, affine, or diffeomorphic warping) may allow embodiments to uniquely transform the resulting label image or localization coordinates back to the original voxel grid (by applying the inverse transform). Thus, in the labeling task, an entire label population is generated for each voxel in the original voxel grid. In addition to the Shannon entropy or Gini coefficient mentioned above, the sample variance of the label population can then be used to derive a quantitative voxel-wise confidence. More precisely, the confidence is inversely proportional to the sample variance or Shannon entropy.
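Assuming all secondary label images have already been warped back to the original voxel grid, the voxel-wise sample-variance confidence could be sketched as follows (array shapes and names are illustrative):

```python
import numpy as np

def voxelwise_confidence(label_population, eps=1e-8):
    """label_population: array of shape (num_augmentations, *grid_shape)
    holding one label image per augmentation, already transformed back
    to the original voxel grid. Returns a per-voxel confidence that is
    inversely proportional to the sample variance of the labels."""
    labels = np.asarray(label_population, dtype=float)
    return 1.0 / (labels.var(axis=0, ddof=1) + eps)  # sample variance
```

Voxels where the augmented runs agree on one label receive a much higher confidence than voxels where the labels flip between classes.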

For visualization, the confidence may be encoded in the display as the color saturation or opacity of the color hue used for the class label. In the localization task, an entire set of point locations can be generated, and their distribution can likewise be superimposed, either as individual points or via a fitted point density function (e.g., using a normal distribution).

For classification tasks, the fraction of class assignments in {Ri} that deviate from R0 (the class which the network assigns to C0) may provide a robustness measure (i.e., a confidence measure) with respect to the transformations used for augmentation.
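A minimal sketch of such a robustness score for classification, measuring the fraction of secondary class assignments that agree with the preliminary class (the naming is ours):

```python
import numpy as np

def classification_confidence(r0, secondary_labels):
    """Fraction of secondary results {Ri} whose class assignment agrees
    with the preliminary class r0; the complementary fraction (the
    deviating share) measures the lack of robustness."""
    secondary = np.asarray(secondary_labels)
    return float(np.mean(secondary == r0))
```

For example, if three of four augmented inputs are assigned the same class as the original case, the confidence is 0.75.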

Referring now to FIG. 2, a flow diagram of a method 200 for obtaining confidence measures for an ML model is depicted. For purposes of this example, the ML model includes at least one of: an artificial neural network, a generative adversarial network (GAN), and a Bayesian network.

In fig. 2, step 210 includes: processing input data d0 using the ML model to generate a preliminary result R0.

Step 220 includes: generating a plurality of modified instances (di, dii, diii) of the input data d0. More specifically, step 220 of generating the multiple modified instances of the input data includes multiple steps, namely steps 222, 224, 226. Step 222 includes: applying a first spatial warping transform (e.g., a rigid or affine transform) to the input data d0 to generate a first modified instance di of the input data. Step 224 includes: adding noise to the input data d0 to generate a second modified instance dii of the input data. Step 226 includes: applying a local deformation transform to the input data d0 to generate a third modified instance diii of the input data.
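Steps 222, 224, and 226 can be illustrated with toy stand-ins on a 1-D signal; a circular shift crudely stands in for a rigid spatial transform, and shifting only the first half of the signal crudely stands in for a local deformation. These stand-ins are our own simplifications, not the transforms of the described embodiment:

```python
import numpy as np

def make_modified_instances(d0, sigma=0.05, shift=1, seed=0):
    """Toy stand-ins for steps 222/224/226: a global spatial transform
    (here a circular shift), additive Gaussian noise, and a 'local'
    deformation that shifts only the first half of the signal."""
    rng = np.random.default_rng(seed)
    d_i = np.roll(d0, shift, axis=-1)                   # step 222
    d_ii = d0 + rng.normal(0.0, sigma, d0.shape)        # step 224
    d_iii = d0.copy()
    half = d0.shape[-1] // 2
    d_iii[..., :half] = np.roll(d0[..., :half], shift, axis=-1)  # step 226
    return d_i, d_ii, d_iii
```

In a real imaging pipeline the transforms would act on 2-D or 3-D voxel grids, but the structure of the three-way augmentation is the same.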

Step 230 includes: processing the multiple modified instances (di, dii, diii) of the input data using the ML model to generate a corresponding plurality of secondary results (Ri, Rii, Riii). It should be noted that, where appropriate, the secondary results may also be transformed back (i.e., normalized) using the respective inverse transforms so that they can be compared with the preliminary result R0. For example, an inverse spatial warping transform is applied to the secondary result Ri generated for the first modified instance di of the input data. Likewise, an inverse local deformation transform is applied to the secondary result Riii generated for the third modified instance diii of the input data.
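Using a circular shift as a crude stand-in for the spatial warping transform, the back-transformation (normalization) of a secondary result can be illustrated as follows (function names are ours; a real system would invert the actual rigid/affine/deformation transforms):

```python
import numpy as np

def apply_warp(data, shift=1):
    """Stand-in 'spatial warp': a circular shift along the last axis."""
    return np.roll(data, shift, axis=-1)

def invert_warp(result, shift=1):
    """Inverse transform: undo the shift so the secondary result is
    aligned with (comparable to) the preliminary result."""
    return np.roll(result, -shift, axis=-1)
```

Composing the inverse with the forward transform recovers the original alignment, which is exactly the property that makes the secondary results comparable with R0.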

Step 240 includes: determining a confidence measure associated with the ML model based on the (normalized) secondary results (Ri, Rii, Riii). Herein, determining the confidence measure comprises: determining the inverse variance of the (normalized) secondary results (Ri, Rii, Riii). As mentioned above, in other embodiments, determining the confidence measure may instead include determining the Shannon entropy or the Gini coefficient.

The exemplary embodiment of fig. 2 also includes step 250: associating the determined confidence measure with the preliminary result R0. In this way, a simple representation of the reliability of the preliminary result R0 can be generated and associated with the preliminary result R0. This may enable a user analyzing the ML model's processing of the input data d0 to quickly and easily identify and evaluate the importance and/or relevance of the preliminary result R0.

Referring now to FIG. 3, another embodiment of a system according to the present invention is depicted, including an applicable ML model 410. Herein, the ML model 410 comprises a conventional neural network, which may be used, for example, for medical/clinical decision services.

The neural network 410 communicates output signals, representing results or decisions from processing the input data, via the internet 420 (e.g., using a wired connection or a wireless connection) to a remotely located data processing system 430 (such as a server) for obtaining confidence measures for the ML model.

The data processing system 430 is adapted to acquire and process input data to identify confidence measures for the ML model 410 in accordance with a method according to the presented embodiments.

More specifically, the data processing system 430 takes input data and generates multiple modified instances of the input data. Data processing system 430 then communicates the input data and multiple modified instances of the input data to ML model 410. ML model 410 processes data received from data processing system 430 to generate corresponding results. More specifically, the input data is processed by ML model 410 to generate a preliminary result, and the modified instances of the input data are processed by ML model 410 to generate a corresponding plurality of secondary results. The data processing system 430 then obtains the preliminary results and the plurality of secondary results from the ML model 410. Based on the variability of the secondary results, data processing system 430 determines a confidence measure for the preliminary results created by ML model 410.

The data processing system 430 is further adapted to generate an output signal representative of the confidence measure. Thus, the data processing system 430 provides a centrally accessible processing resource that can receive input data and run one or more algorithms to identify the reliability of the preliminary results (e.g., decisions) output by the ML model 410 for that input data. Information related to the obtained confidence measures may be stored by the data processing system (e.g., in a database) and provided to other components of the system. Such provision of information about the ML model and its preliminary results may be in response to receiving a request (e.g., via the internet 420) and/or may be done without a request (i.e., "pushed").

To receive information about the reliability of the ML model or preliminary results, and thus enable model/data analysis or evaluation, the system further includes a first mobile computing device 440 and a second mobile computing device 450.

Herein, the first mobile computing device 440 is a mobile phone device (such as a smartphone) having a display for displaying graphical elements representing confidence measures. The second mobile computing device 450 is a mobile computer, such as a laptop or tablet computer, having a display for displaying graphical elements representing the ML model results and associated confidence measures.

The data processing system 430 is adapted to communicate the output signals to the first mobile computing device 440 and the second mobile computing device 450 via the internet 420 (e.g., using a wired connection or a wireless connection). As mentioned above, this may be done in response to receiving a request from the first mobile computing device 440 or the second mobile computing device 450.

Based on the received output signals, the first mobile computing device 440 and the second mobile computing device 450 are adapted to display one or more graphical elements in a display area provided by their respective displays. To this end, the first mobile computing device 440 and the second mobile computing device 450 each include a software application for processing, decrypting, and/or interpreting the received output signals in order to determine how to display the graphical element. Thus, the first mobile computing device 440 and the second mobile computing device 450 each comprise a processing arrangement adapted to determine one or more values representative of the confidence measure and to generate a graphical element whose size, shape, position, orientation, pulsation, or color is modified based on the confidence measure.

Thus, the system can communicate information about features in the ML model results to users of the first mobile computing device 440 and the second mobile computing device 450. For example, each of the first mobile computing device 440 and the second mobile computing device 450 may be used to display graphical elements to a healthcare practitioner, data analyst, or technician.

Implementations of the system of fig. 3 may vary between: (i) a case where the data processing system 430 communicates display-ready data, which may include, for example, display data containing graphical elements (e.g., in JPEG or another image format) that are simply displayed to a user of the mobile computing device using conventional image or web page display software (which may be a web-based browser, etc.); and (ii) a case where the data processing system 430 communicates raw data information that the receiving mobile computing device then processes to generate the preliminary results and associated confidence measures (e.g., using local software running on the mobile computing device). Of course, in other implementations, the processing may be shared between the data processing system 430 and the receiving mobile computing device, such that part of the data generated at the data processing system 430 is sent to the mobile computing device for further processing by the mobile computing device's local dedicated software. Thus, embodiments may employ server-side processing, client-side processing, or any combination thereof.

Further, where data processing system 430 does not 'push' information (e.g., output signals) but instead communicates information in response to receiving a request, a user of the device making such a request may be required to confirm or verify their identity and/or security credentials in order to communicate the information.

The present invention may be a system, method and/or computer program product for obtaining confidence measures for machine learning models. The computer program product may include a computer-readable storage medium (or multiple media) having computer-readable program instructions thereon for causing a processor to perform aspects of the invention.

The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device such as a punch card or a raised structure with instructions recorded therein, and any suitable combination of the preceding. A computer-readable storage medium as used herein should not be interpreted as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through an optical cable), or an electrical signal transmitted through an electrical wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a corresponding computing/processing device or to an external computer or external storage device via a network (e.g., the internet, a local area network, a wide area network, and/or a wireless network). The network may include copper transmission cables, optical transmission fibers, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.

The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, including, for example, programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), may personalize the electronic circuitry by executing computer-readable program instructions with state information of the computer-readable program instructions in order to carry out aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Thus, from the above description, it should be appreciated that embodiments may be used to determine confidence measures that are independent of declarations made by the ML model provider. The confidence measure may be specific to the input data (e.g., specific to the input case). Thus, embodiments may provide information that may be used as a robustness metric for classification and detection tasks. Furthermore, the data modifications/augmentations may be specifically designed for a particular use case.

Thus, the presented embodiments may be applicable to a wide range of data analysis concepts/fields, including medical data analysis and clinical decision support applications. For example, embodiments may be used for medical image screening, where a medical image of a subject is used to investigate and/or evaluate the subject. For this case, pixel-level information about the class uncertainty (i.e., decision confidence) may be provided, which may explain or supplement the ML model output for a medical professional (e.g., a radiologist).

The description has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the proposed embodiments and one or more practical applications, and to enable others of ordinary skill in the art to understand the various embodiments with various modifications.
