Learning data enhancement strategies

Document No.: 1047882. Publication date: 2020-10-09.

Abstract: This technology, "Learning data enhancement strategies," was created by Vijay Vasudevan, Barret Zoph, Ekin Dogus Cubuk, and Quoc V. Le on 2019-05-20. Its main content includes: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for learning data enhancement policies for training machine learning models. In one aspect, a method includes: receiving training data for training a machine learning model to perform a particular machine learning task; determining a plurality of data enhancement policies, including, at each of a plurality of time steps: generating a current data enhancement policy based on quality metrics of data enhancement policies generated at previous time steps; training the machine learning model on the training data using the current data enhancement policy; and, after the machine learning model has been trained using the current data enhancement policy, determining a quality metric of the current data enhancement policy using the machine learning model; and selecting a final data enhancement policy based on the determined quality metrics of the data enhancement policies.

1. A method, comprising:

receiving training data for training a machine learning model to perform a particular machine learning task, the training data comprising a plurality of training inputs;

determining a plurality of data enhancement policies, wherein each data enhancement policy has a plurality of data enhancement policy parameters that define a process for transforming a training input before the training input is used to train the machine learning model, the determining comprising, at each of a plurality of time steps:

generating a current data enhancement policy based on a quality metric of a data enhancement policy generated at a previous time step, wherein the quality metric of the data enhancement policy represents performance of the machine learning model for a particular machine learning task as a result of training the machine learning model using the data enhancement policy;

training the machine learning model on the training data using the current data enhancement policy, wherein training the machine learning model using a data enhancement policy comprises:

selecting a batch of training data;

determining a batch of enhanced training data by transforming the training inputs in the batch of training data according to the data enhancement policy; and

adjusting the current values of the parameters of the machine learning model based on the batch of enhanced training data; and

after training the machine learning model using the current data enhancement policy, determining a quality metric of the current data enhancement policy using the machine learning model;

selecting a final data enhancement policy based on the determined quality metrics of the data enhancement policies; and

generating a final trained machine learning model by training a final machine learning model using the final data enhancement policy.

2. The method of claim 1, wherein the particular machine learning task is an image processing task comprising classification or regression.

3. The method of any of claims 1-2, wherein:

each data enhancement policy includes one or more sub-policies;

each sub-policy comprises a sequence of one or more transform tuples, wherein for each transform tuple the data enhancement policy parameters define: (i) a transform operation, and (ii) a size of the transform operation; and

transforming the training inputs in the batch of training data according to the data enhancement policy comprises, for each training input:

identifying a sub-policy included in the data enhancement policy; and

transforming the training input according to the identified sub-policy by applying each transform tuple included in the identified sub-policy to the training input in turn.

4. The method of claim 3, wherein identifying the sub-policy included in the data enhancement policy for the training input comprises randomly sampling a sub-policy from among the sub-policies included in the data enhancement policy.

5. The method of any of claims 3 to 4, wherein applying a transform tuple to the training input comprises:

applying the transform operation in the transform tuple to the training input with the transform operation size in the transform tuple.

6. The method of any of claims 3 to 4, wherein:

for each transform tuple, the data enhancement policy parameters further define a probability of applying the transform operation; and

applying a transform tuple to the training input comprises:

applying, with the transform probability in the transform tuple, the transform operation in the transform tuple to the training input at the transform operation size in the transform tuple.

7. The method of any of claims 1 to 6, wherein the machine learning model is a neural network, and adjusting the current values of the machine learning model parameters based on the enhanced batch of training data comprises:

determining a gradient of a loss function using the enhanced batch of training data; and

adjusting current values of the machine learning model parameters using the gradient.

8. The method of any of claims 1 to 7, wherein:

generating the current data enhancement policy based on a quality metric of a data enhancement policy generated at a previous time step comprises: generating the current data enhancement policy using a policy neural network, in accordance with current values of the policy neural network parameters; and

the policy neural network is trained by reinforcement learning techniques, and at each time step, the reinforcement learning reward signal is based on a quality metric of the current data enhancement policy at that time step.

9. The method of claim 8, wherein, for each data enhancement policy parameter, the policy network output defines a score distribution of possible values for the data enhancement policy parameter.

10. The method of claim 9, wherein determining the current data enhancement policy from the policy network output comprises: sampling a value of each data enhancement policy parameter using the score distribution of the data enhancement policy parameters.

11. The method of any of claims 8 to 10, wherein the policy neural network is a recurrent neural network.

12. The method of any of claims 1 to 7, wherein generating a current data enhancement policy based on a quality metric of a data enhancement policy generated at a previous time step comprises:

generating the current data enhancement policy using a genetic programming process.

13. The method of any of claims 1 to 12, wherein determining the quality metric of the current data enhancement policy using the machine learning model after the machine learning model has been trained using the current data enhancement policy comprises:

determining, using validation data comprising a plurality of training inputs, a performance metric of the machine learning model for the particular machine learning task; and

determining the quality metric based on the performance metric.

14. The method of claim 13, wherein the training input included in the validation data is not included in the training data.

15. The method of any of claims 1 to 14, wherein selecting the final data enhancement policy based on the determined quality metric of the data enhancement policy comprises:

selecting the determined data enhancement policy with the highest quality score.

16. The method of any of claims 1-15, wherein the training input is an image.

17. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations of the respective methods of any of claims 1-16.

18. One or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations of the respective methods of any of claims 1-16.

Technical Field

This specification relates to processing data using machine learning models.

Background

A machine learning model receives input and generates output, e.g., predicted output, based on the received input. Some machine learning models are parametric models and generate an output based on the received input and on the values of the parameters of the model.

Some machine learning models are deep models that employ multiple layers of a model to generate an output for a received input. For example, a deep neural network is a deep machine learning model that includes an output layer and one or more hidden layers that each apply a non-linear transformation to a received input to generate an output.

Disclosure of Invention

This specification describes a training system implemented as a computer program on one or more computers at one or more locations.

According to a first aspect, a method is provided that includes receiving training data for training a machine learning model to perform a particular machine learning task. The training data includes a plurality of training inputs. A plurality of data enhancement strategies are determined, wherein each data enhancement strategy has a plurality of data enhancement strategy parameters that define a process of transforming a training input before the training input is used to train a machine learning model.

At each of a plurality of time steps, a current data enhancement policy is generated based on a quality metric of a data enhancement policy generated at a previous time step. The quality metric of a data enhancement policy represents the performance of the machine learning model on the particular machine learning task as a result of training the machine learning model using the data enhancement policy. The machine learning model is trained on the training data using the current data enhancement policy. Training a machine learning model using a data enhancement policy includes: selecting a batch of training data; determining a batch of enhanced training data by transforming the training inputs in the batch of training data according to the data enhancement policy; and adjusting current values of the machine learning model parameters based on the batch of enhanced training data. After training the machine learning model using the current data enhancement policy, a quality metric for the current data enhancement policy is determined using the machine learning model.

A final data enhancement policy is selected based on the determined quality metrics of the data enhancement policies. A final trained machine learning model is generated by training a final machine learning model using the final data enhancement policy.

In some embodiments, the particular machine learning task is an image processing task that includes classification or regression.

In some embodiments, each data enhancement policy includes one or more sub-policies. Each sub-policy comprises a sequence of one or more transform tuples, wherein for each transform tuple the data enhancement policy parameters define: (i) a transform operation, and (ii) a size of the transform operation. Transforming training inputs in a batch of training data according to a data enhancement strategy includes, for each training input: identifying a sub-policy included in the data enhancement policy; and transforming the training input according to the identified sub-strategy by applying each transformed tuple included in the identified sub-strategy to the training input in turn.

In some embodiments, identifying the sub-strategies included in the data enhancement strategy for the training input includes randomly sampling the sub-strategies included in the data enhancement strategy.

In some embodiments, applying the transform tuple to the training input comprises: the transform operations in the transform tuple are applied to the training input with the transform operation size in the transform tuple.

In some embodiments, for each transform tuple, the data enhancement policy parameters further define a probability of applying the transform operation; and applying the transform tuple to the training input comprises: applying, with the transform probabilities in the transform tuples, the transform operations in the transform tuples to the training input with the transform operation sizes in the transform tuples.

In some embodiments, the machine learning model is a neural network, and adjusting the current values of the machine learning model parameters based on the enhanced batch of training data comprises: determining a gradient of the loss function using the enhanced batch of training data; and adjusting the current values of the machine learning model parameters using the gradient.

In some embodiments, generating the current data enhancement policy based on the quality metric of the data enhancement policy generated at the previous time step comprises generating the current data enhancement policy using a policy neural network in accordance with current values of the policy neural network parameters; and the policy neural network is trained using reinforcement learning techniques in which, at each time step, the reinforcement learning reward signal is based on the quality metric of the current data enhancement policy at that time step.

In some embodiments, for each data enhancement policy parameter, the policy network output defines a score distribution over possible values of the data enhancement policy parameter.

In some implementations, determining the current data enhancement policy from the policy network output includes sampling a value for each data enhancement policy parameter using the score distribution for that data enhancement policy parameter.

In some embodiments, the policy neural network is a recurrent neural network.

In some embodiments, generating the current data enhancement policy based on the quality metric of the data enhancement policy generated at the previous time step comprises generating the current data enhancement policy using a genetic programming process.

In some embodiments, after the machine learning model has been trained using the current data enhancement strategy, determining the quality metric of the current data enhancement strategy using the machine learning model comprises: determining a performance metric of the machine learning model for the particular machine learning task using validation data comprising a plurality of training inputs; and determining a quality metric based on the performance metric.

In some embodiments, the training input included in the validation data is not included in the training data.

In some implementations, selecting the final data enhancement policy based on the determined quality metrics of the data enhancement policies includes selecting the determined data enhancement policy with the highest quality score.

In some embodiments, the training input is an image.

According to a second aspect, there is provided a system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising the operations of the first aspect.

According to a third aspect, one or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising the operations of the first aspect are provided.

Particular embodiments of the subject matter described in this specification can be implemented to realize one or more of the following advantages.

The training system described in this specification may train a machine learning model using a "learned" data enhancement strategy instead of a manually designed data enhancement strategy. The training system learns the data enhancement strategy by automatically searching a space of possible data enhancement strategies to identify a particular strategy that enables a machine learning model trained using it to perform the machine learning task effectively.

Machine learning models trained using learned data enhancement strategies may perform better (e.g., by achieving higher prediction accuracy) than machine learning models trained using manually designed data enhancement strategies. Moreover, using learned data enhancement strategies may enable machine learning models to be trained to achieve acceptable performance levels using less training data than would otherwise be required, thereby enabling more efficient use of computing resources (e.g., memory).

The data enhancement strategies learned by the training system described in this specification may, in some cases, be transferred between training data sets. That is, a data enhancement strategy learned with reference to a first training data set may be used to effectively train a machine learning model on a second training data set (i.e., even though the data enhancement strategy was not learned with reference to the second training data set). The transferability of data enhancement strategies learned by the training system may reduce consumption of computing resources, for example, by enabling a learned data enhancement strategy to be reused on a new training data set rather than learning a new data enhancement strategy for the new training data set.

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the present subject matter will become apparent from the description, the drawings, and the claims.

Drawings

Fig. 1 illustrates an exemplary training system.

FIG. 2 illustrates an exemplary policy generation engine.

FIG. 3 is a diagram of an exemplary data enhancement policy.

Fig. 4 is a graphical illustration of the effect of applying different sub-strategies of a data enhancement strategy to an original image.

Fig. 5 shows an example of performance gains that may be achieved by using a training system.

Fig. 6 depicts different types of transformation operations that may be applied to an image.

FIG. 7 is a flow diagram of an exemplary process for automatically selecting a data enhancement strategy and using the data enhancement strategy to train a machine learning model.

Like reference numbers and designations in the various drawings indicate like elements.

Detailed Description

This specification describes a training system implemented as a computer program on one or more computers in one or more locations that trains a machine learning model using data enhancement strategies to perform machine learning tasks. Data enhancement strategies may be used to increase the number and diversity of training inputs used in training the machine learning model, resulting in a trained machine learning model that more efficiently performs machine learning tasks (e.g., with higher prediction accuracy). The training system automatically searches a space of possible data enhancement strategies to identify a particular data enhancement strategy, and then trains the machine learning model using the particular data enhancement strategy.

Fig. 1 illustrates an exemplary training system 100. Training system 100 is an example of a system implemented as a computer program on one or more computers in one or more locations in which the systems, components, and techniques described below may be implemented.

The training system 100 is configured to generate a trained machine learning model 102, the trained machine learning model 102 being trained to perform a machine learning task by training the machine learning model 104 using: (i) a training data set 106, and (ii) a "final" data enhancement policy 108. As described in more detail below, the training system 100 identifies the final data enhancement strategy 108 by searching a space of possible data enhancement strategies.

The machine learning task may involve processing the images to generate predictions based on the images. That is, the machine learning model 104 may be configured to process inputs comprising images to generate respective outputs, e.g., classification outputs, regression outputs, or a combination thereof. Following are a few examples of machine learning tasks.

In one example, the machine learning model is configured to process the image to generate a classification output including a respective score corresponding to each of a plurality of categories. The score for a category indicates the likelihood that the image belongs to that category. In some cases, a category may be an object class (e.g., dog, cat, person, etc.), and an image may belong to the category if the image depicts an object included in the object class corresponding to the category. In some cases, a category may represent a global image attribute (e.g., whether an image depicts a day or night scene, or whether an image depicts a summer or winter scene), and an image may belong to the category if it has the global attribute corresponding to the category.

In another example, the machine learning model is configured to process the image to generate a pixel-level classification output that includes, for each pixel, a respective score corresponding to each of a plurality of categories. For a given pixel, the score for a category indicates the likelihood that the pixel belongs to that category. In some cases, a category may be an object class, and a pixel may belong to the category if the pixel belongs to a portion of an object included in the object class corresponding to the category. That is, the pixel-level classification output may be a semantic segmentation output.

In another example, the machine learning model is configured to process an image to generate a regression output that estimates one or more continuous variables characterizing the image (i.e., variables that may assume an infinite number of possible values). In a particular example, the regression output may estimate the coordinates of bounding boxes that enclose objects depicted in the image. The coordinates of a bounding box may be defined by the x-y coordinates of the vertices of the bounding box.

For convenience, this specification will primarily describe the machine learning task as processing images to generate predictions based on the images. More generally, a machine learning task may be any task that involves processing input to generate a prediction based on the input. Some other examples of machine learning tasks are described next.

In one example, the machine learning task may be a speech recognition task, wherein the machine learning model is configured to process a representation of an audio waveform to generate an output characterizing a sequence of phonemes, characters or words corresponding to the audio waveform.

In another example, the machine learning task may be a video analysis task, wherein the machine learning model is configured to process a sequence of video frames to generate an output characterizing the video frames, e.g., by characterizing whether the video frames depict a person performing a particular action.

In another example, the machine learning task may be a natural language processing task, wherein the machine learning model is configured to process a portion of text to generate an output characterizing the portion of text, e.g., by characterizing a translation of the portion of text into another natural language.

The training data 106 is composed of a plurality of training examples, where each training example specifies a training input and a corresponding target output. The training input includes an image. The target output represents the output that should be generated by the machine learning model by processing the training input. For example, the target output may be a classification output specifying a class (e.g., an object class) corresponding to the input image, or a regression output specifying one or more continuous variables (e.g., object bounding box coordinates) corresponding to the input image.
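For illustration only, a training example of this form might be represented as follows. This is a minimal sketch, and the type and field names are assumptions rather than anything defined in this specification:

```python
from typing import NamedTuple, Union

import numpy as np


class TrainingExample(NamedTuple):
    """One training example: an input image and its corresponding target output."""
    image: np.ndarray               # e.g., an H x W x 3 uint8 array
    target: Union[int, np.ndarray]  # e.g., a class index, or bounding-box coordinates


# A classification example: an all-black 32 x 32 image labeled with class 3.
example = TrainingExample(image=np.zeros((32, 32, 3), dtype=np.uint8), target=3)
```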

The machine learning model 104 may have any suitable machine learning model architecture. For example, the machine learning model may be a neural network model, a random forest model, a Support Vector Machine (SVM) model, a linear model, or a combination thereof.

The system 100 may receive the training data 106 and the data defining the machine learning model 104 in any of a variety of ways. For example, the system 100 may receive the training data 106 and the data defining the machine learning model 104 as uploads from remote users of the system 100, e.g., over a data communication network, e.g., using an Application Programming Interface (API) that makes the system 100 available.

A data enhancement strategy is defined by a set of parameters (referred to in this document as "data enhancement strategy parameters") that specify a process for transforming the training inputs (i.e., the training inputs included in the training examples) before they are used to train the machine learning model. Examples of data enhancement policies and corresponding data enhancement policy parameters are described in more detail with reference to fig. 3.

The process for transforming the training input typically includes applying one or more operations (referred to in this document as "transformation operations") to the images included in the training input. These operations may be any suitable kind of image processing operation, such as a translation operation, a rotation operation, a clipping operation, a color inversion operation, or a combination thereof. The data enhancement policy parameters may specify which transformation operations should be applied, at what size, and at what probability. An example of a possible transformation operation is described in more detail with reference to fig. 6.
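As a concrete illustration, transformation operations of this kind could be implemented with a standard image library; the following sketch assumes Pillow, which this specification does not itself prescribe:

```python
from PIL import Image, ImageOps


def translate(img: Image.Image, dx: int, dy: int) -> Image.Image:
    # Shift the image content dx pixels right and dy pixels down via an
    # affine transform (the 6-tuple maps output coordinates to input coordinates).
    return img.transform(img.size, Image.AFFINE, (1, 0, -dx, 0, 1, -dy))


def rotate(img: Image.Image, degrees: float) -> Image.Image:
    # Rotate the image about its center.
    return img.rotate(degrees)


def invert_colors(img: Image.Image) -> Image.Image:
    # Replace each pixel value v with 255 - v.
    return ImageOps.invert(img)
```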

The system 100 implements a search algorithm to identify data enhancement strategies with a high "quality metric" from a space of possible data enhancement strategies. The quality metric of the data enhancement strategy characterizes the performance (e.g., prediction accuracy) of a machine learning model trained using the data enhancement strategy. For convenience, a higher quality metric is understood herein to imply better performance (e.g., higher prediction accuracy).

The system 100 may determine a quality metric for a data enhancement policy by evaluating the performance of a machine learning model trained using the data enhancement policy on a "validation set". The validation set consists of one or more training examples that are not used in training the machine learning model. For example, the system 100 may determine the quality metric 110 based on any suitable performance metric of the trained machine learning model on the validation set, such as an F1 score or a Matthews correlation coefficient (in the case of a classification task), or a squared error or an absolute error (in the case of a regression task). For a performance metric where a lower value indicates better performance of the trained machine learning model (e.g., a squared-error performance metric), the quality metric of the data enhancement policy may be inversely proportional to the performance metric (so that a higher quality metric still implies better performance).

The system 100 may obtain the validation set, for example, by randomly partitioning a larger training data set to generate training data 106 (i.e., for training the machine learning model) and the validation set (i.e., for evaluating the performance of the trained machine learning model).
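A minimal sketch of this evaluation step, assuming a classification task and a hypothetical `model.predict` interface (neither of which is mandated by this specification):

```python
import numpy as np


def quality_metric(model, val_inputs, val_targets) -> float:
    """Score a data enhancement policy by the trained model's validation accuracy."""
    predictions = model.predict(val_inputs)             # hypothetical model API
    return float(np.mean(predictions == val_targets))   # higher is better


def quality_from_error(error: float, eps: float = 1e-8) -> float:
    # For lower-is-better performance metrics (e.g., squared error), take an
    # inversely proportional quality metric so that higher still means better.
    return 1.0 / (error + eps)
```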

As used throughout the document, the space of possible data enhancement policies refers to the space parameterized by the possible values of the data enhancement policy parameters.

The training system 100 includes a training engine 112 and a policy generation engine 114.

In each of a plurality of iterations, referred to herein as "time-steps," policy generation engine 114 generates one or more "current" data enhancement policies 116. For each current data enhancement strategy 116, the system 100 uses the training engine 112 to train the machine learning model 104 using the current data enhancement strategy, and thereafter, determines the quality metrics 110 for the current data enhancement strategy. The policy generation engine 114 uses the quality metric 110 of the current data enhancement policy 116 to improve the expected quality metric of the data enhancement policy generated for the next time step.

Training a machine learning model refers to determining a training value of a parameter of the machine learning model from an initial value of the parameter of the machine learning model. The training engine 112 may train the machine learning model 104 starting from initial values of, for example, randomly selected or default machine learning model parameters.

In general, the machine learning model may be trained using data enhancement strategies by transforming the training inputs of existing training examples to generate "new" training examples, and then using the new training examples (in lieu of or in addition to the existing training examples) to train the machine learning model. For example, the images included in the training input of the training example may be transformed by applying one or more image processing operations specified by the data enhancement policy to the images.

In some cases, the training inputs of the training examples may be transformed (e.g., according to a data enhancement strategy) while maintaining the same respective target outputs. For example, for an image classification task where the target output specifies a type of object depicted in the training input, applying an image processing operation (e.g., rotation, cropping, etc.) to the images included in the training input will not affect the type of object depicted in the images. Thus, in this example, the transformed training input will correspond to the same target output as the original training input.

However, in some cases, transforming the training inputs of the training examples may also require changing the target outputs of the training examples. In one example, a target output corresponding to a training input may specify the coordinates of a bounding box that encloses an object depicted in the image of the training input. In this example, applying a translation operation to the image of the training input would require applying the same translation operation to the bounding box coordinates specified by the target output.
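For example, a translation applied to a training image can be mirrored in a bounding-box target as sketched below; the array layout and the [x_min, y_min, x_max, y_max] box convention are assumptions:

```python
import numpy as np


def translate_example(image: np.ndarray, box: np.ndarray, dx: int, dy: int):
    """Shift an image right by dx and down by dy, and shift the box to match.

    `box` holds bounding-box corner coordinates as [x_min, y_min, x_max, y_max].
    """
    shifted = np.zeros_like(image)
    h, w = image.shape[:2]
    # Copy the overlapping region; pixels shifted out of frame are dropped.
    shifted[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        image[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    shifted_box = box + np.array([dx, dy, dx, dy])  # apply the same translation
    return shifted, shifted_box
```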

The particular operations performed by the training engine 112 to train the machine learning model 104 using the data enhancement strategy depend on the architecture of the machine learning model 104, e.g., whether the machine learning model 104 is a neural network model or a random forest model. Referring to FIG. 7, an example of training a neural network model using a data enhancement strategy is described in more detail.

Policy generation engine 114 may use any of a variety of techniques to search the space of possible data enhancement policies. In one embodiment, the policy generation engine 114 generates the data enhancement policies using a neural network, referred to herein as a "policy" network. The policy network may be trained using reinforcement learning techniques, wherein the reward (i.e., the quantity the policy network is trained to maximize) is provided by the quality metric corresponding to the data enhancement policy generated by the policy network. An exemplary embodiment of the policy generation engine 114 that uses a policy network trained with reinforcement learning techniques is described in more detail with reference to FIG. 2.

In other embodiments, the policy generation engine 114 may use any of a variety of other techniques to search the space of possible data enhancement policies. For example, the policy generation engine 114 may use random search techniques, e.g., as described in H. Mania, A. Guy, and B. Recht, "Simple random search provides a competitive approach to reinforcement learning", arXiv:1803.07055v1, 2018. As another example, the policy generation engine 114 may use evolutionary search techniques (i.e., a "genetic programming process"), e.g., as described in E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, "Regularized evolution for image classifier architecture search", arXiv:1802.01548, 2018.

The system 100 may continue to generate data enhancement policies until search termination criteria are met. For example, if a data enhancement strategy has been generated for a predetermined number of time steps, the training system 100 may determine that the search termination criteria are met. As another example, the training system 100 may determine that the search termination criteria are met if the quality metric of the generated data enhancement policy meets a predetermined threshold.

After determining that the search termination criteria are met, the system 100 selects a final data enhancement policy based on the respective quality metrics 110 of the generated data enhancement policies. For example, the system 100 may select the data enhancement strategy generated by the training system 100 with the highest quality score as the final data enhancement strategy. As another example, which will be described in more detail with reference to fig. 3, the system 100 may combine a predetermined number (e.g., 5) of data enhancement policies generated by the training system with the highest quality scores to generate the final data enhancement policy 108.
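Putting these pieces together, the search loop described above can be summarized by the following skeleton. The callables it accepts are hypothetical placeholders for the components described in this section, not an API defined by this specification:

```python
from typing import Callable, List, Tuple


def search_for_policy(
    generate_policy: Callable[[], dict],              # proposes a policy, e.g., an RNN controller
    train_and_evaluate: Callable[[dict], float],      # trains a model with the policy; returns quality
    update_generator: Callable[[dict, float], None],  # e.g., a reinforcement learning update
    num_time_steps: int,
) -> dict:
    """Skeleton of the policy search loop; callers supply the component parts."""
    history: List[Tuple[dict, float]] = []
    for _ in range(num_time_steps):                   # termination criterion: fixed step budget
        policy = generate_policy()
        quality = train_and_evaluate(policy)
        update_generator(policy, quality)
        history.append((policy, quality))
    # Select the final policy: here, the single highest-quality candidate.
    return max(history, key=lambda pair: pair[1])[0]
```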

The system 100 may generate the trained machine learning model 102 by training the machine learning model 104 on the training data 106 using the final data enhancement strategy 108, and then output data defining the trained machine learning model 102.

In some embodiments, instead of or in addition to outputting the trained machine learning model 102, the system 100 uses the trained machine learning model 102 to process requests received from users, for example, through an API provided by the system 100. That is, the system 100 may receive an input to be processed, process the input using the trained machine learning model 102, and provide the output generated by the trained machine learning model 102, or data derived from the generated output, in response to the received input.

Although the training system 100 determines the final data enhancement strategy 108 with respect to a particular training data set 106, the final data enhancement strategy 108 may (in some cases) be passed to other training data sets. That is, the final data enhancement strategy 108 determined with respect to the training data 106 may be used to effectively train other machine learning models on different training data sets.

FIG. 2 illustrates an exemplary policy generation engine 114. The policy generation engine 114 includes: a policy neural network 202 configured to generate policy network outputs according to parameters of the policy network 202; and a parameter update engine 204 configured to adjust policy network parameter values. Each policy network output 206 generated by the policy network 202 defines a corresponding data enhancement policy 116.

In particular, each policy network output 206 includes a respective output at each of a plurality of output locations, and each output location corresponds to a different data enhancement policy parameter. Thus, each policy network output includes a respective value of a respective data enhancement policy parameter at each output location. In general, the values of the data enhancement policy parameters specified by a given policy network output define a data enhancement policy.

At each time step, the policy generation engine 114 generates one or more policy network outputs 206 using the policy network 202 according to the current values of the policy network parameters, each policy network output 206 defining a corresponding data enhancement policy 116.

Policy network 202 is a recurrent neural network that includes one or more recurrent neural network layers, for example, a Long Short Term Memory (LSTM) layer or a Gated Recurrent Unit (GRU) layer. The policy network 202 generates the policy network output 206 by sequentially generating a respective data-enhanced policy parameter corresponding to each output location in the sequence of output locations of the policy network output 206. In particular, for each output location, the policy network 202 receives as input a value of a data enhancement policy parameter corresponding to a previous output location in the policy network output 206, and processes the input to update the current hidden state of the policy network 202. For a first output location in the policy network output, the policy network 202 may process the predetermined placeholder input since there is no previous output location.

The policy network 202 also includes a respective output layer for each output location in the policy network output 206. The output layer corresponding to an output location processes the updated hidden state of the policy network 202 corresponding to that output location to generate an output defining a score distribution over the possible values of the data enhancement policy parameter corresponding to the output location. For example, the output layer may first project the updated hidden state of the policy network 202 into a vector whose dimension matches the number of possible values of the corresponding data enhancement policy parameter. The output layer may then apply a softmax to the projected hidden state to generate a respective score for each of the possible values of the corresponding data enhancement policy parameter.

For each output location, the policy network 202 generates a value for the corresponding data enhancement policy parameter by sampling a distribution of scores of possible values of the data enhancement policy parameter (i.e., generated by the corresponding output layer). The possible values that a given data enhancement strategy parameter may take are fixed prior to training, and the number of possible values may be different for different data enhancement strategy parameters.
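A minimal PyTorch sketch of such a controller is shown below. It is an illustrative reconstruction that assumes one categorical parameter per output position; it is not the reference implementation of the policy network 202:

```python
from typing import List

import torch
import torch.nn as nn


class PolicyController(nn.Module):
    """LSTM that emits one data enhancement policy parameter per output position."""

    def __init__(self, num_values_per_position: List[int], hidden_size: int = 64):
        super().__init__()
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        # One output layer per output position, sized to that parameter's value count.
        self.heads = nn.ModuleList(nn.Linear(hidden_size, n) for n in num_values_per_position)
        # Embedding of the previously sampled value, fed as the next step's input.
        self.embeds = nn.ModuleList(nn.Embedding(n, hidden_size) for n in num_values_per_position)
        self.start = nn.Parameter(torch.zeros(1, hidden_size))  # predetermined placeholder input

    def sample(self):
        h = torch.zeros(1, self.start.shape[1])
        c = torch.zeros_like(h)
        inp, values, log_probs = self.start, [], []
        for head, embed in zip(self.heads, self.embeds):
            h, c = self.cell(inp, (h, c))                 # update the hidden state
            dist = torch.distributions.Categorical(logits=head(h))
            value = dist.sample()                         # sample one parameter value
            log_probs.append(dist.log_prob(value))
            values.append(int(value))
            inp = embed(value)                            # condition the next position on it
        return values, torch.stack(log_probs).sum()       # values and summed log-probability
```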

Typically, the values of the policy network parameters do not change within a time step. However, by sampling the values of the data enhancement policy parameters from a distribution of scores of possible values, the policy network 202 may generate a plurality of different policy network outputs 206 per time step (i.e., each policy network output 206 corresponds to a different data enhancement policy 116).

For each data enhancement strategy 116 generated at a time step, the training system 100 trains the machine learning model using the data enhancement strategy 116 and then determines a corresponding quality metric 110 (as previously described) for the trained machine learning model.

The parameter update engine 204 then uses the quality metrics 110 as reward signals to update the current values of the policy network parameters using reinforcement learning techniques. That is, the parameter update engine 204 adjusts the current values of the policy network parameters by training the policy network, using reinforcement learning techniques, to generate policy network outputs that improve the quality metrics of the corresponding data enhancement policies. More specifically, the parameter update engine 204 trains the policy network to generate policy network outputs that maximize a reward determined based on the quality metrics of the corresponding data enhancement policies. In particular, the reward for a given policy network output is a function of the quality metric of the data enhancement policy defined by that policy network output. For example, the reward may be, or may be directly proportional to, one of: the quality metric, the quality metric squared, the quality metric cubed, the square root of the quality metric, and so on.

In some cases, the parameter update engine 204 trains the policy network 202 using policy gradient techniques to maximize the expected reward. For example, the policy gradient technique may be a REINFORCE technique or a proximal policy optimization (PPO) technique. For example, at a given time step, the parameter update engine 204 updates the current values of the policy network parameters using a combination of gradients with respect to the policy network parameters given by the following expression:

$$\frac{1}{K}\sum_{k=1}^{K}\sum_{t=1}^{T}\nabla_{\theta}\log P(a_{k,t})\,(R_k - b)$$

where k indexes the policy network outputs generated at the time step, K is the total number of policy network outputs generated at the time step, t indexes the output positions in a policy network output, T is the total number of output positions in a policy network output, ∇_θ is the gradient operator with respect to the policy network parameters θ, P(a_{k,t}) is the score corresponding to the data enhancement policy parameter a_{k,t} generated at output position t in policy network output k, R_k is the reward for policy network output k, and b is a baseline function, e.g., an exponential moving average of previous rewards.
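Continuing the controller sketch above, the corresponding REINFORCE-style update could be written as follows; the baseline handling and decay rate are assumptions:

```python
import torch


def reinforce_step(optimizer, log_prob_sums, rewards, baseline, decay=0.9):
    """One REINFORCE update over the K policy network outputs of a time step.

    `log_prob_sums[k]` is the summed log P(a_{k,t}) over the T output positions
    of output k (e.g., from PolicyController.sample above), and `rewards[k]` is
    its quality-metric-based reward R_k.
    """
    rewards_t = torch.tensor(rewards)
    advantages = rewards_t - baseline                  # R_k - b
    # Descending the negative weighted log-likelihood ascends the expected reward.
    loss = -(torch.stack(log_prob_sums) * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Update the exponential-moving-average baseline b with the new mean reward.
    return decay * baseline + (1.0 - decay) * float(rewards_t.mean())
```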

By repeatedly updating the values of the policy network parameters, the parameter update engine 204 may train the policy network 202 to generate policy network outputs that define data enhancement policies that result in improved performance of the trained machine learning model on machine learning tasks. That is, updating the values of the policy network parameters in this manner may improve the expected accuracy of the validation set of the machine learning model trained using the data enhancement policies proposed by the policy network.

Fig. 3 is an illustration of an exemplary data enhancement policy 300. The data enhancement policy 300 is comprised of one or more "sub-policies" 302-A through 302-N. Each sub-policy specifies a series of transformation operations (e.g., 304-A through 304-M), such as image processing operations, e.g., translation, rotation, or clipping operations. Each transformation operation has an associated size (e.g., 306-A through 306-M) and an associated probability (e.g., 308-A through 308-M).

The size of a transform operation is an ordered set of one or more values that specify how the transform operation is to be applied to a training input. For example, the size of a translation operation may specify the number of pixels by which the image should be translated in the x and y directions. As another example, the size of a rotation operation may specify the number of radians by which the image should be rotated.

To transform the training input using the data enhancement strategy 300, a sub-strategy is selected from a set of sub-strategies included in the data enhancement strategy, for example, by randomly sampling the sub-strategies. Then, for each transformation operation in the sequence of transformation operations specified by the selected sub-policy, the transformation operation is applied to the training input with a probability associated with the transformation operation. If a transformation operation is applied to the training input, it is applied at the size associated with the transformation operation. The transformation operations applied to the training input are applied according to the order of the transformation operations specified by the sub-policies. A graphical illustration of the effect of applying different sub-policies to an image is shown in fig. 4.
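The selection-and-application procedure just described might look like the following sketch, where the list-of-sub-policies representation of a policy is an assumption consistent with the description of FIG. 3:

```python
import random


def apply_policy(policy, image, transforms):
    """Apply one randomly chosen sub-policy of `policy` to `image`.

    `policy` is a list of sub-policies; each sub-policy is a sequence of
    (operation_name, probability, size) transform tuples, and `transforms`
    maps operation names to functions of (image, size).
    """
    sub_policy = random.choice(policy)                 # uniformly sample a sub-policy
    for op_name, probability, size in sub_policy:
        if random.random() < probability:              # apply with the tuple's probability
            image = transforms[op_name](image, size)   # at the tuple's size
    return image


# Example policy with one sub-policy: equalize (p=0.4, size 4), then rotate
# (p=0.8, size 8 degrees), mirroring "sub-policy 1" of fig. 4.
example_policy = [[("equalize", 0.4, 4), ("rotate", 0.8, 8)]]
```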

The parameters of the exemplary data enhancement policy 300 include, for each transformation operation of each sub-policy: (i) the type of transformation operation (e.g., translation, rotation, clipping, etc.), (ii) the size of the transformation operation, and (iii) the probability of applying the transformation operation.

Each data enhancement policy parameter may have a predetermined set of possible values. For example, the type of transform operation may be selected from a predetermined set of possible transform types. The size of a transform operation may have a predetermined number of possible values, for example, evenly spaced over a continuous range of allowable values. For example, for a rotation operation, the continuous range of allowable values may be [0, 2π] radians, and the possible values for the size of the rotation operation may be: {0, π/2, π, 3π/2}. The probability of applying a transform operation may likewise have a predetermined number of possible values, e.g., evenly spaced across the range [0, 1]. In one example, the possible values of the probability of applying the transform operation may be: {0, 0.2, 0.4, 0.6, 0.8, 1}.

The data enhancement policy may have a predetermined number of sub-policies, and each sub-policy may specify a predetermined number of transformation operations. In some cases, the probability of applying the transformation operation may be 0, resulting in the transformation operation never being applied to the training input. Designing the space for data enhancement strategies in this manner may increase the variety of data enhancement strategies discovered.

In a particular example, the data enhancement policy may have 5 sub-policies, each specifying 2 transform operations, where each transform operation has a respective scalar size and probability value and may be any one of 16 different transform types. In this example, the data enhancement policy would have a total of 5 × 2 × 2 × 16 = 320 parameters. If the size of a transform operation has 10 possible values, and the probability of applying a transform operation has 11 possible values, the space of possible data enhancement policies includes approximately 2.9 × 10^32 data enhancement policies.
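The size of this search space follows from simple counting, as the following snippet reproduces under the stated assumptions of 16 transform types, 10 sizes, and 11 probabilities:

```python
# Choices per transform tuple: 16 types x 10 sizes x 11 probabilities.
per_tuple = 16 * 10 * 11            # 1760
per_sub_policy = per_tuple ** 2     # 2 transform tuples per sub-policy
search_space = per_sub_policy ** 5  # 5 sub-policies
print(f"{search_space:.1e}")        # approximately 2.9e+32
```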

As described with reference to fig. 1, the training system 100 may combine a predetermined number of data enhancement strategies generated by the training system 100 with the highest quality scores to generate a final data enhancement strategy. For data enhancement policies of the form described with reference to fig. 3, multiple data enhancement policies may be combined by aggregating their respective sub-policies into a single combined data enhancement policy. To apply the combined data enhancement strategy to the training input, one of the sub-strategies may be randomly selected from the combined set of sub-strategies, and the training input may be transformed according to a sequence of transformation operations specified by the randomly selected sub-strategy (as previously described).
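Using the same assumed list-of-sub-policies representation as above, combining policies then amounts to pooling their sub-policies:

```python
def combine_policies(policies):
    """Merge several policies into one by pooling all of their sub-policies."""
    combined = []
    for policy in policies:
        combined.extend(policy)  # each element is one sub-policy
    return combined
```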

Fig. 4 is an illustration of the effect of applying different sub-strategies to an original image 400. For example, "sub-policy 1" specifies a series of transformation operations, including: (i) an equalization operation with a corresponding probability of 0.4 and a magnitude of 4, and (ii) a rotation operation with a corresponding probability of 0.8 and a magnitude of 8 degrees. The equalization operation and the rotation operation are described in more detail with reference to fig. 6. The effect of applying sub-policy 1 to the original image 400 at three different times is illustrated by 402, 404 and 406. Fig. 4 also illustrates the effect of applying "sub-policy 2", "sub-policy 3", "sub-policy 4", and "sub-policy 5" to the original image 400.

Fig. 5 shows an example of the performance gains that can be achieved by using the training system described in this specification. For each of multiple training data sets (i.e., "CIFAR-10", "CIFAR-100", "SVHN", "Stanford Cars", and "ImageNet"), a neural network model trained using the data enhancement policy selected by the training system 100 achieves an error rate (evaluated on the corresponding validation set) that improves on some of the best previously published results for that training data set.

Fig. 6 depicts different types of transformation operations that may be applied to an image. The value of one of the parameters of any given enhancement policy may define which of the set of transform types shown in fig. 6 are included in a given enhancement policy.

FIG. 7 is a flow diagram of an exemplary process 700 for automatically selecting a data enhancement strategy and using the data enhancement strategy to train a machine learning model. For convenience, process 700 is described as being performed by a system of one or more computers located in one or more locations. For example, a training system such as training system 100 of FIG. 1, suitably programmed with reference to the present description, may perform process 700.

The system receives training data for training a machine learning model to perform a particular machine learning task (702). For example, the system may receive training data through an API available to the system. The training data includes a plurality of training examples, each example specifying a training input and a corresponding target output.

The system performs steps 704 through 708 of process 700 at each of a plurality of time steps. For convenience, each of steps 704 through 708 will be described as being performed at a "current" time step.

The system generates one or more current data enhancement policies based on quality metrics of the data enhancement policies generated at previous time steps (704). If the current time step is the first time step, the system may generate the data enhancement policies in any of a variety of ways, for example, by randomly selecting the parameter values that define each data enhancement policy. In one embodiment, the system may generate the current data enhancement policies using a policy network that is trained using reinforcement learning techniques based on rewards defined by the quality metrics of the data enhancement policies generated at previous time steps (e.g., as described with reference to FIG. 2). In other embodiments, the system may use any of a variety of other suitable search techniques, such as random search techniques or evolutionary search techniques, to generate the current data enhancement policies.

For each current data enhancement policy, the system trains a machine learning model on the training data using the current data enhancement policy (706). In one example, the machine learning model is a neural network model, and the system trains the neural network model over a plurality of training iterations. At each training iteration, the system selects a current "batch" (i.e., set) of one or more training examples, and then determines an "enhanced" batch of training examples by transforming the training inputs in the training examples of the current batch using the current data enhancement policy. Optionally, the system may adjust the target outputs in the training examples of the current batch to account for the transformations applied to the training inputs (as described previously). The system processes the transformed training inputs in accordance with the current parameter values of the machine learning model to generate corresponding outputs. The system then determines the gradient of an objective function that measures a similarity between: (i) the outputs generated by the machine learning model, and (ii) the target outputs specified by the training examples, and adjusts the current values of the machine learning model parameters using the gradient. The system may determine the gradient using, for example, a backpropagation procedure, and may use the gradient to adjust the current values of the machine learning model parameters using any suitable gradient descent optimization procedure, such as RMSprop or Adam.
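A condensed PyTorch sketch of one such training iteration is shown below; the `augment` callable is a hypothetical helper (e.g., an adaptation of the apply_policy sketch above to tensors), and the cross-entropy loss assumes a classification task:

```python
import torch
import torch.nn.functional as F


def training_iteration(model, optimizer, batch_images, batch_targets, augment):
    """One gradient step on a batch augmented with the current policy.

    `augment` maps one image tensor to its transformed version according to
    the current data enhancement policy; it is not part of this specification.
    """
    augmented = torch.stack([augment(img) for img in batch_images])
    outputs = model(augmented)                      # forward pass
    loss = F.cross_entropy(outputs, batch_targets)  # mismatch between outputs and targets
    optimizer.zero_grad()
    loss.backward()                                 # backpropagation to obtain gradients
    optimizer.step()                                # e.g., an RMSprop or Adam update
    return float(loss)
```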

For each current data enhancement policy, after the machine learning model has been trained using the current data enhancement policy, the system determines a quality metric for the current data enhancement policy using the machine learning model (708). For example, the system may determine the quality metric as a performance metric (e.g., an F1 score or a mean squared error) of the trained machine learning model on a validation data set composed of a plurality of training examples that are not used to train the machine learning model.

The system may repeatedly perform steps 704 through 708 until the search termination criteria are met (e.g., if steps 704 through 708 have been performed a predetermined number of times).

After the system determines that the search termination criteria are met, the system selects a final data enhancement policy based on the quality metrics of the data enhancement policies generated at steps 704 through 708 (710). For example, the system may generate the final data enhancement policy by combining a predetermined number of the data enhancement policies with the highest quality scores generated during steps 704 through 708.

The system generates a final trained machine learning model by training the machine learning model on training data using a final data enhancement strategy (712). The system may provide the final trained machine learning model to a user of the system. Alternatively or in combination, the system may process requests received from the user using a trained machine learning model.

In some cases, the system may train the final machine learning model using the final data enhancement policy for more training iterations than the system uses when training the machine learning model with each "current" data enhancement policy generated at step 704. For example, the system may train the final machine learning model using the final data enhancement policy until a convergence criterion is met, e.g., until the prediction accuracy of the final machine learning model, evaluated on the validation set, stops improving. In contrast, the system may train the machine learning model with each of the current data enhancement policies generated at step 704 for a smaller (e.g., fixed) number of training iterations.

This specification uses the term "configured" in connection with system and computer program components. A system of one or more computers to be configured to perform particular operations or actions means that the system has installed thereon software, firmware, hardware, or a combination of software, firmware, hardware that in operation causes the system to perform the operations or actions. By one or more computer programs to be configured to perform particular operations or actions is meant that the one or more programs include instructions which, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware comprising the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access storage device, or a combination of one or more of them. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

The term "data processing apparatus" refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. An apparatus may also be, or further comprise, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program, which may also be referred to or described as a program, software application, app, module, software module, script, or code, may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, such as one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, such as files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

In this specification, the term "engine" is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more particular functions. Typically, the engine will be implemented as one or more software modules or components installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines may be installed and run on the same computer or multiple computers.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.

A computer suitable for executing a computer program may be based on a general purpose microprocessor, a special purpose microprocessor, or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a Universal Serial Bus (USB) flash drive.

Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and storage devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. Further, the computer may interact with the user by sending documents to and receiving documents from the device used by the user; for example, by sending a web page to a web browser on the user's device in response to receiving a request from the web browser. In addition, the computer may interact with the user by sending a text message or other form of message to a personal device, such as a smartphone that is running a messaging application, and then receiving a response message from the user.

Data processing apparatus for implementing machine learning models can also include, for example, dedicated hardware accelerator units for processing common and computationally intensive parts of machine learning training or production, i.e., inference, workloads.

The machine learning model may be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework, a Microsoft Cognitive Toolkit framework, an Apache Singa framework, or an Apache MXNet framework.
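As one hedged illustration of such a deployment (not an implementation described in this specification), the sketch below wires a learned data enhancement policy into a TensorFlow tf.data input pipeline; apply_policy is again a hypothetical function, not part of the framework, that transforms a single image tensor.

    import tensorflow as tf

    def augmented_dataset(images, labels, policy, batch_size=32):
        # Build an input pipeline that transforms each training input
        # according to the data enhancement policy before it reaches the model.
        ds = tf.data.Dataset.from_tensor_slices((images, labels))
        ds = ds.map(lambda img, lbl: (apply_policy(policy, img), lbl),
                    num_parallel_calls=tf.data.AUTOTUNE)
        return ds.shuffle(10_000).batch(batch_size).prefetch(tf.data.AUTOTUNE)

Applying the transformation inside the pipeline lets the framework parallelize augmentation with training, which is one reason such frameworks are convenient hosts for learned policies.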

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., a data server; or that includes a middleware component, e.g., an application server; or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification; or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a Local Area Network (LAN) and a Wide Area Network (WAN), such as the Internet.

The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to, and receiving user input from, a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, may be received at the server from the device.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings and described in the claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Specific embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.
