Neural network training method and device, electronic equipment and storage medium

Document No.: 1862178    Publication date: 2021-11-19

Reading note: This technology, "神经网络的训练方法、装置、电子设备及存储介质" (Neural network training method and device, electronic equipment and storage medium), was designed and created by 郝洋, 丁文彪 and 刘子韬 on 2020-04-30. Its main content is as follows: The embodiments of the present application provide a neural network training method, apparatus, device and storage medium. The training method of the neural network comprises: selecting at least one sample group from a current sample set used for training a first network of the neural network, the sample group comprising at least two positive samples and one negative sample, the two positive samples comprising a first sample; inputting each sample in the sample group into the first network to obtain a feature representation corresponding to each sample; for each sample group, determining the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample; determining at least one target data group among the at least one sample group based on the posterior probability corresponding to the at least one sample group; and training the first network based on the at least one target data group. The effect of network training is improved, and the accuracy of the neural network is enhanced.

1. A method of training a neural network, comprising:

selecting at least one sample group from a current sample set used for training a first network of the neural network, the sample group comprising at least two positive samples and one negative sample, the two positive samples comprising a first sample;

inputting each sample in the sample group into the first network to obtain a feature representation corresponding to each sample;

for each sample group, determining the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample;

determining at least one target data group in at least one sample group based on the posterior probability corresponding to the at least one sample group;

training the first network based on the at least one target data group.

2. The method of claim 1, further comprising:

determining at least one second sample among the positive samples of the at least one sample group;

and determining a first feature representation corresponding to the first sample according to a second feature representation corresponding to the second sample in the at least one sample group.

3. The method of claim 2, wherein determining a first feature representation corresponding to a first sample from a second feature representation corresponding to the second sample in at least one sample group comprises:

and calculating an average feature representation of the second feature representations corresponding to the second samples in the at least one sample group as the first feature representation corresponding to the first sample.

4. The method of claim 1, wherein determining at least one target data group in at least one sample group based on a posterior probability corresponding to the at least one sample group comprises:

determining a sample group with a posterior probability smaller than a preset probability as the target data group; or,

sorting the posterior probabilities according to their magnitudes, and determining the sample groups ranked within a preset proportion at the bottom or at the top as the target data groups.

5. The method of claim 1, wherein determining the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample comprises:

calculating the similarity between each sample in the sample group and the first sample according to the feature representation corresponding to each sample in the sample group and the first feature representation corresponding to the first sample;

determining the confidence of each sample according to the crowdsourcing label of each sample in the sample group;

and calculating the posterior probability of the sample group according to the similarity and the confidence.

6. The method of claim 1, further comprising:

selecting a sample set from a plurality of sample sets used to train a first network of the neural network as the current sample set; and

after training the first network based on at least one target data group in the current sample set, repeating the process of selecting a sample set from the plurality of sample sets as the current sample set and training the first network;

and adjusting the parameters of the first network until the loss value of the first network meets a first preset condition.

7. The method of claim 6, wherein adjusting the parameter of the first network until the loss value of the first network satisfies the first preset condition comprises:

calculating a loss value of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample;

and adjusting the weight parameter in the first network according to the loss value until the loss value of the first network meets the first preset condition.

8. The method of claim 7, wherein calculating the loss value of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample comprises:

calculating the posterior probability of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample;

and substituting the posterior probability of each target sample in the at least one target data group into a loss function to obtain the loss value.

9. The method according to any one of claims 1-8, further comprising:

and inputting the samples in the current sample set into the trained first network to obtain training feature representations corresponding to the samples, and training a second network of the neural network according to the training feature representations corresponding to the samples and the labels corresponding to the samples.

10. An apparatus for training a neural network, comprising:

a sample initialization module, configured to select at least one sample group from a current sample set used for training a first network of the neural network, where the sample group includes at least two positive samples and one negative sample, and the two positive samples include a first sample;

a feature representation module, configured to input each sample in the sample group into the first network to obtain a feature representation corresponding to each sample;

an operation module, configured to determine, for each sample group, the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample;

a first training module, configured to determine at least one target data group among the at least one sample group based on the posterior probability corresponding to the at least one sample group, and to train the first network based on the at least one target data group.

11. An electronic device, comprising: a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to implement the method of any of claims 1 to 9 above.

12. A storage medium storing computer-executable instructions which, when executed, implement the method of any of claims 1 to 9.

Technical Field

The embodiment of the application relates to the technical field of machine learning, in particular to a training method and device of a neural network, electronic equipment and a storage medium.

Background

In the machine learning process, a large number of samples are needed to train a neural network model, and the trained neural network can then be used for detection. In some application scenarios, sample data is usually pre-labeled, and in order to improve the precision of labeling, the labeling can be performed by multiple parties, i.e., crowdsourced labels, where one sample carries labels annotated by multiple parties.

However, the degree of inconsistency of crowdsourced labels is high, and samples mislabeled by some annotators may hinder the training of the neural network, thereby reducing the accuracy of the neural network.

Disclosure of Invention

In view of the above, an objective of the present invention is to provide a method and an apparatus for training a neural network, an electronic device, and a storage medium, so as to overcome the above-mentioned drawbacks.

The embodiment of the application provides a training method of a neural network, which comprises the following steps:

selecting at least one sample group from a current sample set used for training a first network of the neural network, the sample group comprising at least two positive samples and one negative sample, the two positive samples comprising a first sample; inputting each sample in the sample group into the first network to obtain a feature representation corresponding to each sample; for each sample group, determining the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample; determining at least one target data group among the at least one sample group based on the posterior probability corresponding to the at least one sample group; and training the first network based on the at least one target data group.

Optionally, in an embodiment of the present application, the method further includes:

determining at least one second sample among the positive samples of the at least one sample group; and determining a first feature representation corresponding to the first sample according to a second feature representation corresponding to the second sample in the at least one sample group.

Optionally, in an embodiment of the present application, determining a first feature representation corresponding to a first sample according to a second feature representation corresponding to a second sample in at least one sample group includes:

and calculating the average feature representation of the second feature representations corresponding to the second samples in at least one sample group as the first feature representation corresponding to the first sample.

Optionally, in an embodiment of the present application, determining at least one target data group in at least one sample group based on a posterior probability corresponding to the at least one sample group includes:

determining a sample group with a posterior probability smaller than a preset probability as a target data group; or, sorting the posterior probabilities according to their magnitudes, and determining the sample groups ranked within a preset proportion at the bottom or at the top as the target data groups.

Optionally, in an embodiment of the present application, determining a posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample includes:

calculating the similarity between each sample in the sample group and the first sample according to the feature representation corresponding to each sample in the sample group and the first feature representation corresponding to the first sample; determining the confidence of each sample according to the crowdsourced label of each sample in the sample group; and calculating the posterior probability of the sample group according to the similarity and the confidence.

Optionally, in an embodiment of the present application, the method further includes:

selecting a sample set from a plurality of sample sets of a first network for training a neural network as a current sample set; and

after training the first network based on at least one target data group in the current sample set, repeating the process of selecting a sample set from a plurality of sample sets as the current sample set and training the first network;

and adjusting parameters of the first network until the loss value of the first network meets a first preset condition.

Optionally, in an embodiment of the present application, adjusting a parameter of the first network until a loss value of the first network meets a preset condition includes:

calculating a loss value of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample; and adjusting the weight parameters in the first network according to the loss values until the loss value of the first network meets the first preset condition.

Optionally, in an embodiment of the present application, calculating a loss value of each target sample in the at least one target data group according to a similarity between each target sample in the at least one target data group and the first sample and a confidence of each target sample, includes:

calculating the posterior probability of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample; and substituting the posterior probability of each target sample in the at least one target data group into a loss function to obtain the loss value.

Optionally, in an embodiment of the present application, the method further includes:

and inputting the samples in the current sample set into the trained first network to obtain the training feature representation corresponding to each sample, and training the second network of the neural network according to the training feature representation corresponding to each sample and the label corresponding to each sample.

The embodiment of the application provides a training device of a neural network, including:

a sample initialization module, configured to select at least one sample group from a current sample set used for training a first network of the neural network, where the sample group comprises at least two positive samples and one negative sample, and the two positive samples comprise a first sample;

a feature representation module, configured to input each sample in the sample group into the first network to obtain a feature representation corresponding to each sample;

an operation module, configured to determine, for each sample group, the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample;

a training module, configured to determine at least one target data group among the at least one sample group based on the posterior probability corresponding to the at least one sample group, and to train the first network based on the at least one target data group.

An embodiment of the present application provides an electronic device, including: a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to implement a method as described in any embodiment of the present application.

Embodiments of the present application provide a storage medium storing computer-executable instructions, which when executed implement a method described in any of the embodiments of the present application.

According to the training method and apparatus for a neural network, the electronic device, and the storage medium provided by the embodiments of the present application, at least one sample group is selected from a current sample set used for training a first network of the neural network, the sample group comprising at least two positive samples and one negative sample, the two positive samples comprising a first sample; each sample in the sample group is input into the first network to obtain a feature representation corresponding to each sample; for each sample group, the posterior probability of the sample group is determined according to the feature representation corresponding to each sample in the sample group and the confidence of each sample; at least one target data group is determined among the at least one sample group based on the posterior probability corresponding to the at least one sample group; and the first network is trained based on the at least one target data group. The first network in the neural network is trained separately, which improves its training effect. Instead of directly using all samples for network training, target data groups are selected according to the posterior probabilities and the first network is trained with the target data groups, which reduces the influence of mislabeled sample data on the network training process, improves the network training effect, and enhances the accuracy of the neural network.

Drawings

Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:

fig. 1 is a flowchart of a training method of a neural network according to an embodiment of the present disclosure;

fig. 2 is a flowchart of a training method of a neural network according to an embodiment of the present disclosure;

fig. 3 is an architecture diagram of a first network according to an embodiment of the present application;

fig. 4 is a flowchart of training a first network according to an embodiment of the present application;

FIG. 5 is a schematic diagram illustrating an effect of a sample distribution provided by an embodiment of the present application;

FIG. 6 is a schematic diagram illustrating an effect of a sample distribution provided in an embodiment of the present application;

FIG. 7 is a schematic diagram illustrating an effect of a sample distribution provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of a predicted effect according to an embodiment of the present application;

fig. 9 is a block diagram of a training apparatus for a neural network according to an embodiment of the present disclosure;

fig. 10 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.

The first embodiment,

An embodiment of the present application provides a training method of a neural network, and as shown in fig. 1, fig. 1 is a flowchart of the training method of the neural network provided in the embodiment of the present application. The training method of the neural network comprises the following steps:

Step 101, selecting at least one sample group from a current sample set used for training a first network of the neural network.

The sample group comprises at least two positive samples and one negative sample, where the two positive samples include the first sample. As one implementation, the embodiment of the present application randomly selects two different positive samples (i.e., positive examples) and a fixed number of negative samples (i.e., negative examples) to form a sample group. It should be noted that the first sample (which may also be referred to as an anchor point) is selected from the positive samples for subsequent operations, and is not itself a sample used for training. For example, all sample groups may share one first sample; as another example, each sample group may have its own first sample. A positive sample may be a sample containing the target to be detected/identified, and a non-positive sample may be considered a negative sample. For example, if the neural network identifies a mobile phone, an image containing a mobile phone may be taken as a positive sample, and an image not containing a mobile phone as a negative sample; for another example, if the neural network recognizes a human face, an image containing a human face may be used as a positive sample, and an image not containing a human face as a negative sample. Of course, this is merely an example and does not represent a limitation of the present application.
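As a purely illustrative sketch (not the reference implementation of the present application), the following Python code shows one way such sample groups could be assembled from a labeled sample pool; the function name, the fixed number m of negative samples and the number of groups are assumptions made only for illustration.

```python
import random

def build_sample_groups(samples, labels, m=5, num_groups=512):
    """Assemble sample groups of (anchor positive, another positive, m negatives).

    `samples` is a list of feature vectors, `labels` a parallel list of 0/1 labels
    (1 = positive). m and num_groups are illustrative hyperparameters.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    groups = []
    for _ in range(num_groups):
        anchor = random.choice(positives)                       # a second sample (initial anchor)
        other_pos = random.choice([i for i in positives if i != anchor])
        negs = random.sample(negatives, m)                      # m distinct negative samples
        groups.append({"anchor": anchor, "pos": other_pos, "neg": negs})
    return groups
```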

The first network may be an embedding network in the neural network, i.e., the output of the first network is the input of the next network or machine learning model; for example, the output of the first network may be used as the input of a classifier. This is only exemplary and is not meant to limit the present application.

Step 102, inputting each sample in the sample group into the first network to obtain the feature representation corresponding to each sample.

It should be noted that, after a sample is input into the first network, the matrix operations up to the last layer of the first network produce an output that is also a feature vector; this vector is the feature representation. Therefore, in some application scenarios, one sample corresponds to one feature representation. Of course, this is merely an example and does not represent a limitation of the present application.

Optionally, each sample in the same sample group may be subjected to a nonlinear transformation through the first network, and the output of the last fully connected layer after the nonlinear transformation is used as the feature representation corresponding to the sample. By using nonlinear transformations, the feature extraction effect of the first network can be improved as the number of layers of the first network increases.
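The architecture of the first network is not fixed by the present application. A minimal sketch, assuming a small fully connected embedding network built with PyTorch in which the output of the last fully connected layer after nonlinear transformations serves as the feature representation, could look as follows (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class FirstNetwork(nn.Module):
    """Illustrative embedding network: stacked fully connected layers with
    nonlinear activations; the output of the last fully connected layer is
    used as the feature representation of a sample."""

    def __init__(self, input_dim=50, hidden_dim=128, embed_dim=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),                          # nonlinear transformation
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),   # last fully connected layer
        )

    def forward(self, x):
        return self.layers(x)                   # feature representation of the sample

# Usage: embed a batch of samples (e.g., 50-dimensional input features).
# net = FirstNetwork()
# features = net(torch.randn(8, 50))            # shape (8, 64)
```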

Step 103, for each sample group, determining the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample.

Optionally, in an embodiment of the present application, determining a posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample includes:

for each sample group, calculating the similarity between each sample in the sample group other than the first sample and the first sample according to the feature representation corresponding to each sample in the sample group and the first feature representation corresponding to the first sample; determining the confidence of each sample according to the crowdsourced label of each sample in the sample group; and calculating the posterior probability of the sample group according to the similarity and the confidence. In some application scenarios, a sample group contains not only the first sample (i.e., an anchor point) but also another positive example and a plurality of negative examples, and the posterior probability of the sample group indicates how similar the other positive sample in the sample group is to the first sample (relative to how similar the negative samples are to the first sample).

Optionally, in an embodiment of the present application, the method further includes: determining at least one second sample among the positive samples of the at least one sample group; and determining a first feature representation corresponding to the first sample according to the second feature representation corresponding to the second sample in the at least one sample group. The second sample may be a randomly determined positive sample. For each sample group, a second sample can be determined from its positive samples as the initial anchor point of the sample group, and the first sample is the target anchor point of the sample group. In addition, the other positive sample in the sample group is different from the anchor point.

Further optionally, determining a first feature representation corresponding to the first sample according to a second feature representation corresponding to a second sample in the at least one sample group includes: and calculating the average feature representation of the second feature representations corresponding to the second samples in at least one sample group as the first feature representation corresponding to the first sample.

In an application scenario, the second sample may be determined first, and then the first sample may be determined; in another application scenario, the first feature representation corresponding to the first sample may be directly determined, and for this, four implementation manners are listed here to illustrate how to determine the first sample:

In a first implementation, the method further comprises: calculating the average feature representation of the feature representations corresponding to the at least one positive sample as the feature representation of the first sample.

In a second implementation, the method further comprises: calculating an average feature representation of the feature representations corresponding to the at least one positive sample, and taking the positive sample whose feature representation is closest to the average feature representation as the first sample.

In a third implementation, the method further comprises: determining a second sample in the positive samples of the sample group and obtaining at least one second sample of at least one sample group; and calculating the average feature representation of the feature representations corresponding to the at least one second sample as the feature representation of the first sample.

In a fourth implementation, the method further includes: determining a second sample in the positive samples of the sample group and obtaining at least one second sample of at least one sample group; and calculating an average feature representation of the feature representations corresponding to at least one second sample, and taking the second sample with the feature representation closest to the average feature representation as the first sample.

The second sample may be randomly determined among all the positive samples, as long as it is guaranteed to be different from the other positive sample already in the same sample group. In the above four implementations, the feature representation of the first sample is determined from the average feature representation of multiple samples. Compared with directly using a single sample as the first sample, this avoids the situation where a mislabeled sample, used as the first sample, adversely affects the training process and reduces the accuracy of the trained network.
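A minimal sketch of the third and fourth implementations above, assuming the feature representations of the randomly chosen second samples are already available as rows of a matrix (numpy is used only for illustration):

```python
import numpy as np

def first_sample_from_average(second_feats):
    """second_feats: array of shape (num_groups, embed_dim), one second-sample
    feature representation per sample group.

    Returns (average_feature, index_of_closest_second_sample). The average
    itself can be used directly as the first-sample feature representation,
    or the second sample closest to it can be taken as the first sample."""
    avg = second_feats.mean(axis=0)                        # average feature representation
    dists = np.linalg.norm(second_feats - avg, axis=1)     # Euclidean distance to the average
    return avg, int(np.argmin(dists))
```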

Of course, this is merely an example and does not represent a limitation of the present application. Based on the above description, a specific application scenario is given here to further describe the posterior probability. Optionally, in one application scenario, a sample group may include the first sample, one positive example and m negative examples, where the positive example and the first sample together may be taken as the two positive samples and m is an integer greater than 1. The sample group may be represented as g = (X_a, X_p, X_n0, X_n1, X_n2, ..., X_n(m-1)), where X_a denotes the first sample, X_p denotes the positive example, and X_n0, X_n1, X_n2, ..., X_n(m-1) denote the m negative examples. The posterior probability of the sample group is calculated based on the similarity of each sample (X_p, X_n0, X_n1, X_n2, ..., X_n(m-1)) to the first sample and the confidence of each sample (X_p, X_n0, X_n1, X_n2, ..., X_n(m-1)). The posterior probability of the sample group can indicate whether the network has learned to distinguish the positive example in the sample group from the negative examples.

In one embodiment of the present application, the confidence of a sample is computed based on the crowdsourced labels of the sample. Because crowdsourced labels are labels given to a sample by multiple parties, the confidence may indicate how reliable the label of the sample is. For example, in an application scenario of face recognition, an image is labeled by 7 parties, where a label of 1 indicates that the image contains a face and is a positive sample, and a label of 0 indicates that the image does not contain a face and is a negative sample. If all 7 labels of the image are 1, the image is a positive sample with high reliability; similarly, if all 7 labels are 0, the image is a negative sample with high reliability. If the image has 5 labels of 1 and 2 labels of 0, the image is taken as a positive sample; similarly, if it has 5 labels of 0 and 2 labels of 1, the image is taken as a negative sample. If the image has 4 labels of 1 and 3 labels of 0, or 3 labels of 1 and 4 labels of 0, the annotation of the image is highly disputed. Therefore, among the crowdsourced labels of an image, the closer the number of positive (or negative) annotations is to 1/2 of the total number, the greater the dispute over the annotation and the lower the confidence of the sample; the closer the number of positive (or negative) annotations is to 0 or to the total number, the higher the confidence of the sample. Of course, this is merely an example and does not represent a limitation of the present application.

Step 104, determining at least one target data group among the at least one sample group based on the posterior probability corresponding to the at least one sample group.

The target data group includes at least one target sample. It should be noted that the number of target data groups may be one or more. The at least one target sample included in each target data group may include a first sample, a positive sample, and m negative samples; because the target data group is selected from the at least one sample group, the structure of the target data group is the same as that of the sample group described under step 103, and is not described here again.

Optionally, in an embodiment of the present application, determining at least one target data group in at least one sample group based on a posterior probability corresponding to the at least one sample group includes:

determining a sample group with a posterior probability smaller than a preset probability as a target data group; or, sorting the posterior probabilities according to their magnitudes, and determining the sample groups ranked within a preset proportion at the bottom or at the top as the target data groups.

Optionally, in a first implementation, determining a target data group among the at least one sample group according to the posterior probabilities includes: determining the sample groups whose posterior probability is greater than or equal to a preset probability as the target data groups. The greater the posterior probability, the higher the reliability of the sample group. If the accuracy of the first network is low (for example, the first network is a newly built model), training the first network with sample groups of high reliability can rapidly improve the accuracy of the first network.

In a second implementation, the posterior probabilities of the at least one sample group are arranged in ascending order, and the top-ranked d% of sample groups (d is a hyperparameter, and d% is a preset proportion) are selected from the at least one sample group as the target data groups; if the posterior probabilities of the at least one sample group are arranged in descending order, the bottom-ranked d% of sample groups are selected from the at least one sample group as the target data groups. For example, the sample group with the smallest posterior probability in the at least one sample group may be used as the target data group, which is, of course, only an example and does not represent that the present application is limited thereto.

The smaller the posterior probability, the more difficult it is to judge whether the samples in the sample group are positive or negative examples, and further training the first network with such sample groups can improve its accuracy. For example, in combination with the first implementation, if the first network is a newly built model, its accuracy can be quickly improved with sample groups of high reliability, because these are sample groups that the first network can judge easily. However, if training always uses sample groups of high reliability, then after many cycles the accuracy of the first network will stop improving once it reaches a certain level; at that point, training with sample groups of low posterior probability (i.e., sample groups that the first network finds difficult to judge) can further improve the accuracy of the first network.
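The selection of target data groups can be sketched as follows; this is a hedged illustration in which the proportion d and the choice between keeping the lowest- or highest-probability groups are hyperparameters matching the two implementations described above:

```python
import numpy as np

def select_target_groups(posteriors, d_percent=30.0, keep="lowest"):
    """posteriors: 1-D array with one posterior probability per sample group.
    Returns the indices of the selected target data groups.

    keep="lowest" selects the hardest d% of groups (small posterior probability);
    keep="highest" selects the most reliable d% (large posterior probability)."""
    order = np.argsort(posteriors)                  # ascending order
    k = max(1, int(len(posteriors) * d_percent / 100.0))
    return order[:k] if keep == "lowest" else order[-k:]
```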

Step 105, training the first network based on the at least one target data group.

Optionally, in an embodiment of the present application, training the first network based on at least one target data group includes:

inputting the target data group into the first network to obtain at least one target feature representation corresponding to the at least one target sample; and adjusting the weight parameters in the first network according to the at least one target feature representation and the feature representation of the first sample.

Optionally, in an embodiment of the present application, the method further includes:

selecting a sample set from a plurality of sample sets of a first network for training a neural network as a current sample set; after the first network is trained on the basis of at least one target data group in the current sample set, repeating the process of selecting a sample set from a plurality of sample sets as the current sample set and training the first network; and adjusting parameters of the first network until the loss value of the first network meets a first preset condition.

It should be noted that the samples are input into the first network twice in total: the first time, each sample in the sample group is input into the first network to obtain the feature representation of each sample, and these feature representations are used to calculate the posterior probabilities so as to select the target data groups; the second time, the target samples in the target data groups are input into the first network to obtain the feature representation of each target sample, and these feature representations are used to calculate loss values, according to which the parameters of the first network are adjusted.

Optionally, in an embodiment of the present application, adjusting a parameter of the first network until a loss value of the first network meets a preset condition includes:

calculating a loss value of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample; and adjusting the weight parameters in the first network according to the loss values until the loss value of the first network meets the first preset condition.

Optionally, in an embodiment of the present application, calculating a loss value of each target sample in the at least one target data group according to a similarity between each target sample in the at least one target data group and the first sample and a confidence of each target sample, includes:

calculating the posterior probability of each target sample in the at least one target data group according to the similarity between each target sample in the at least one target data group and the first sample and the confidence of each target sample; and substituting the posterior probability of each target sample in the at least one target data group into a loss function to obtain the loss value.

Optionally, in one implementation, the loss value is obtained by substituting the similarity, the confidence, and a weight parameter of the neural network into the loss function, and the weight parameter at which the loss value is minimum (i.e., the function value of the loss function is minimum) is used as the new weight parameter for the next round of training. If the loss value does not drop for 5 consecutive rounds of training, the training is completed. Of course, the loss function is only an example here; other functions may be set as desired, and the present application is not limited thereto.

Optionally, in an embodiment of the present application, the method further includes:

and inputting the samples in the current sample set into the trained first network to obtain the training feature representation corresponding to each sample, and training the second network of the neural network according to the training feature representation corresponding to each sample and the label corresponding to each sample. The neural network comprises the first network and the second network, and the output of the first network is the input of the second network. The second network is trained using the samples, the feature representations of the samples, and the labels of the samples; adding the feature representations output by the first network improves the network training effect.

In the training method of a neural network provided by the embodiments of the present application, at least one sample group is selected from a current sample set used for training a first network of the neural network, the sample group comprising at least two positive samples and one negative sample, the two positive samples comprising a first sample; each sample in the sample group is input into the first network to obtain a feature representation corresponding to each sample; for each sample group, the posterior probability of the sample group is determined according to the feature representation corresponding to each sample in the sample group and the confidence of each sample; at least one target data group is determined among the at least one sample group based on the posterior probability corresponding to the at least one sample group; and the first network is trained based on the at least one target data group. The first network in the neural network is trained separately, which improves its training effect. Instead of directly using all samples for network training, target data groups are selected according to the posterior probabilities and the first network is trained with the target data groups, which reduces the influence of mislabeled sample data on the network training process, improves the network training effect, and enhances the accuracy of the neural network.

The second embodiment,

Based on the training method of the neural network described in the first embodiment, a specific application scenario is given here to describe the training method in detail. In this embodiment, the second network is taken to be a classifier as an example. In this application, a classifier refers to a machine learning model that implements a classification function, and the first network described in the first embodiment may be an embedding network of the classifier, that is, the output of the first network is the input of the classifier.

Fig. 2 is a flowchart of a training method of a neural network provided in an embodiment of the present application, and as shown in fig. 2, the training method of the neural network provided in the embodiment of the present application includes the following steps:

step 201, preprocessing at least one sample to obtain a plurality of sample groups for performing a first batch of training on a first network of a neural network.

The preprocessing of the at least one sample comprises dividing the at least one sample into at least one sample group, and calculating a confidence for each sample based on its crowdsourced labels.

In one implementation, for one training process, samples with a certain class label may be randomly selected from the sample pool as positive samples, and all samples with other class labels may be used as negative samples. One sample is randomly selected from all positive samples as a second sample; then a different positive sample and m negative samples are randomly selected to form a sample group together with it. The next second sample is then selected, and a different positive sample and m negative samples are randomly selected from the remaining samples in the sample pool to form another sample group. This continues until 512 sample groups are selected, which is, of course, only exemplary and not intended to limit the present application. The 512 sample groups may be used as one training batch; 256 sample groups may also be used as one training batch, which is not limited in the present application. In this embodiment, 512 sample groups are used only as an example, which does not represent that the present application is limited thereto.

A confidence is calculated for each sample in the sample group according to the crowdsourced labels of the sample; specifically, it is calculated by formula one, which is as follows:

where δ represents the confidence of the sample, vote represents the number of positive-example labels among the crowdsourced labels of the sample, and max(vote) represents the total number of crowdsourced labels; for example, if 7 people label a sample, then max(vote) is 7. As explained in the first embodiment, the closer the number of positive (or negative) labels is to 1/2 of the total number, the greater the dispute over the sample and the lower its confidence; the closer the number of positive (or negative) labels is to 0 or to the total number, the higher the confidence of the sample.
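The exact expression of formula one is not reproduced above. One simple confidence formula consistent with the description, mapping unanimous labels to confidence 1 and an even split to confidence 0, is sketched below; this specific form is an assumption for illustration and not necessarily the formula of the present application:

```python
def crowd_confidence(vote, max_vote):
    """vote: number of positive labels among the crowdsourced labels,
    max_vote: total number of labels (e.g., 7 annotators).

    Illustrative confidence: 1.0 when all annotators agree (vote = 0 or
    vote = max_vote), 0.0 when the labels split evenly."""
    return abs(2.0 * vote / max_vote - 1.0)

# Examples consistent with the text: 7 of 7 positive labels -> 1.0 (high confidence);
# 4 of 7 positive labels -> ~0.14 (low confidence, disputed annotation).
print(crowd_confidence(7, 7), crowd_confidence(4, 7))
```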

Step 202, training the first network using the at least one sample group.

Fig. 3 is an architecture diagram of the first network provided in an embodiment of the present application. With reference to fig. 3, the training of the first network is described in detail taking fig. 4 as an example, where fig. 4 is a flowchart of training the first network provided in an embodiment of the present application. The training of the first network includes the following steps:

step 2021, inputting the samples in the at least one sample group into the first network to obtain a feature representation of the at least one sample.

Step 2022, determining the first sample. The explanation here is given in terms of the distance between each sample and the first sample.

The first sample may be referred to as an anchor point, which may be understood as a sample used for reference. As shown in fig. 5, fig. 5 is a schematic diagram illustrating the effect of a sample distribution provided in an embodiment of the present application; in fig. 5, the white triangle represents the anchor point, white circles represent positive samples, and black circles represent negative samples. The training target of the first network is to increase the similarity between the anchor point and data of the same class, which is equivalent to reducing the distance between them, and to reduce the similarity between the anchor point and data of a different class, which is equivalent to increasing the distance between them. When the distance between the anchor point and same-class data is smallest and the distance between the anchor point and different-class data is largest, the accuracy of the first network is very high.

Of course, in another application scenario, an anchor point may be selected incorrectly, for example, a negative example is selected as the anchor point, as shown in fig. 6, fig. 6 is a schematic diagram illustrating an effect of a sample distribution provided by the embodiment of the present application; in fig. 6, the black triangles represent anchor points, the white circles represent positive samples, and the black circles represent negative samples.

Because there is a small amount of mislabeled data, in some training processes the randomly selected anchor point may itself be mislabeled, which may reduce the accuracy of the first network.

Therefore, at least one second sample may be randomly selected first, and the first sample may then be determined using the feature representations corresponding to the second samples. For example, an average feature representation of the feature representations corresponding to the second samples may be calculated by formula three, and the first sample may be determined according to the average feature representation, where formula three is as follows:

F_averA = (1/size) × Σ embed(x_a) (formula three);

where F_averA represents the average of the feature representations of the at least one second sample, size represents the number of sample groups, which in this embodiment may be 512, and embed(x_a) represents the feature representation of a second sample, the sum running over the second samples of the sample groups.

As shown in fig. 7, fig. 7 is a schematic diagram of an effect of a sample distribution provided in an embodiment of the present application, in fig. 7, white circles represent positive samples, and black circles represent negative samples, and it can be seen that an area where the white circles are located also includes the black circles, but the first sample determined by calculating the average feature representation is not affected too much. The positive sample with the feature representation closest to the average feature representation may be used as the first sample, or the average feature representation may be directly operated on as the feature representation of the first sample, which greatly reduces the impact of the incorrectly labeled sample on the accuracy of the first network.

Step 2023, calculating the similarity between each sample and the first sample.

For samples in the same sample group, the similarity between each sample and the first sample of the sample group is calculated; specifically, the similarity is calculated from the Euclidean distance, as shown in formula two:

r(F_a, F_p) = const − ||F_a − F_p||_2 (formula two);

where r(F_a, F_p) represents the similarity between the sample and the first sample, ||F_a − F_p||_2 represents the Euclidean distance between the sample and the first sample, const is a constant that can be obtained by means of cross validation, F_p represents the feature representation of the sample, and F_a represents the feature representation of the first sample. The Euclidean distance measures the distance between two vectors: the larger the distance, the smaller the similarity, and the smaller the distance, the larger the similarity. Since a sample is usually represented by a feature vector, and the feature representation obtained after the first network operation is also a vector, the calculation may be performed using the Euclidean distance. This is only an exemplary illustration and does not represent a limitation of the present application.

Step 2024, selecting target data groups among the at least one sample group.

The posterior probability of each of the at least one sample group may be calculated by formula four, which is as follows:

where the left-hand side of formula four represents the posterior probability, x_* represents a sample in the sample group other than the first sample, x_p represents the positive sample, δ represents confidence, δ_p represents the confidence of the positive sample x_p, δ_* represents the confidence of the sample x_*, r represents similarity, and F represents a feature representation; for example, F(x_p) represents the feature representation of the positive sample, F_averA represents the feature representation of the first sample (i.e., the average feature representation obtained by formula three), and F(x_*) represents the feature representation of the sample x_*.
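Formula four itself is likewise not reproduced above. The following sketch shows one posterior probability of the kind described, a confidence-weighted softmax over the similarities (formula two) between each sample in the group and the anchor; this specific form is an assumption for illustration and not necessarily the patented formula four:

```python
import numpy as np

def group_posterior(feat_pos, feats_neg, feat_anchor, conf_pos, conf_neg, const=10.0):
    """Posterior probability that the positive example is the sample most similar
    to the anchor, weighted by the confidences of the samples.

    feat_pos: feature representation of the positive example, shape (d,)
    feats_neg: feature representations of the m negative examples, shape (m, d)
    feat_anchor: feature representation of the first sample (F_averA), shape (d,)
    conf_pos / conf_neg: confidences of the positive / negative samples."""
    def similarity(f):                                      # formula two: const - Euclidean distance
        return const - np.linalg.norm(f - feat_anchor)

    score_pos = conf_pos * np.exp(similarity(feat_pos))
    score_neg = np.array([c * np.exp(similarity(f)) for c, f in zip(conf_neg, feats_neg)])
    return score_pos / (score_pos + score_neg.sum())
```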

The posterior probability of each sample group can be calculated according to formula four; the posterior probabilities of the at least one sample group are then arranged from small to large, and the first d% (d is a hyperparameter) of the sample groups are selected as the target data groups.

The selection criterion may be expressed by formula five, which is as follows:

where I_A(·) is an indicator function and percentile(·) denotes a percentile calculation. Of course, this is merely an example and does not represent a limitation of the present application. The accuracy of the first network can be improved by training with sample groups of small posterior probability (i.e., sample groups that the first network finds difficult to judge).

Step 2025, training the first network with the target data groups.

After the target data groups are input into the first network, at least one target feature representation corresponding to the at least one target sample is obtained. The similarity of each target sample is calculated from the at least one target feature representation using formula two, the posterior probability is calculated from the similarity and the confidence with reference to formula four, and the loss value is calculated from the posterior probability. The loss value may be calculated by formula six, which is as follows:

where θ = argmin L(g), L(g) denotes the loss function (i.e., the loss value), and the training target of the first network is the network weight θ at which the loss function L is minimum.

After step 202 is completed, steps 201 and 202 may be executed cyclically until L(g) does not decrease for t consecutive rounds, where t may be set as desired; for example, t may be 5, 10, or the like, which is not limited in this application.
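Putting steps 201 and 202 together, a hedged PyTorch-style training loop with the early-stopping rule just described might look as follows; the loss used here, the negative log of the group posterior, is an assumption standing in for formula six, and `sample_batches` and `group_posterior_fn` are hypothetical helpers:

```python
import torch

def train_first_network(net, optimizer, sample_batches, group_posterior_fn, t=5):
    """net: the first network; sample_batches: a callable that, per cycle, returns
    the selected target data groups (step 201 plus selection); group_posterior_fn:
    computes the posterior of one group from embeddings produced by `net`.
    Training stops when the loss has not decreased for t consecutive cycles."""
    best_loss, stale = float("inf"), 0
    while stale < t:
        epoch_loss = 0.0
        for groups in sample_batches():                     # step 201: (re)build and select groups
            optimizer.zero_grad()
            posteriors = torch.stack([group_posterior_fn(net, g) for g in groups])
            loss = -torch.log(posteriors + 1e-8).mean()     # assumed stand-in for formula six
            loss.backward()                                 # step 202: adjust weight parameters
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best_loss:
            best_loss, stale = epoch_loss, 0
        else:
            stale += 1                                      # L(g) did not decrease this cycle
    return net
```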

Step 203, training a classifier by using the feature representation of the at least one sample and the crowd-sourced label of the at least one sample.

It should be noted that the feature representation of at least one sample may be obtained by inputting at least one sample into the trained first network, which is only exemplary and not meant to limit the present application.

There is no requirement here for a specific class of classifier. For example, in one specific embodiment, Logistic Regression (LR) may be selected as the classifier.
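For example, with logistic regression from scikit-learn, step 203 and the subsequent prediction flow can be sketched as follows; the `embed` method of the trained first network is an assumed interface introduced only for illustration:

```python
from sklearn.linear_model import LogisticRegression

def train_and_predict(first_net, X_train, y_train, X_new):
    """first_net: trained first network exposing an `embed(X) -> array` method
    (an assumed interface for illustration); X_train / y_train: samples and their
    aggregated crowdsourced labels; X_new: data to be predicted."""
    # Step 203: train the classifier on feature representations from the first network.
    train_feats = first_net.embed(X_train)
    clf = LogisticRegression(max_iter=1000).fit(train_feats, y_train)

    # Prediction: embed the new data with the first network, then classify.
    new_feats = first_net.embed(X_new)
    return clf.predict(new_feats)
```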

Steps 201-203 describe the network training process in detail, and after the network training is completed, the trained classifier can be used for prediction. And inputting the data to be predicted into the trained first network to obtain feature representation, and inputting the feature representation into a classifier to obtain a prediction result.

When a new sample is predicted, at least one piece of new data is input into the trained first network to obtain the corresponding feature representations, and the feature representations are then input into the trained classifier to obtain the corresponding prediction results.

Obtaining the data feature representations through the neural network enhances the classification effect of the classifier on crowdsourced data. Two real data sets with different numbers of features are listed to illustrate the effectiveness of the algorithm, as shown in fig. 8, which is a schematic diagram of the prediction effect provided by an embodiment of the present application. The input features of data set 1 are 50-dimensional text-based statistical features, with 908 training samples and 200 test samples, and the task is to determine whether a student's spoken-language expression is fluent (fluent or not, binary classification). The input features of data set 2 are 1632-dimensional acoustic emotion features, with 406 training samples and 200 test samples, and the task is to judge whether the emotion of the speech is rich (yes/no, binary classification). The number of crowdsourcing annotators is 11, and both tasks are binary classification.

Simple classifiers such as the LR classifier and the Gradient Boosting Decision Tree (GBDT) are used as baselines for comparison, and for each data set the simple classifier with the best classification effect is taken as the baseline. The result data in fig. 8 demonstrate the effectiveness of the algorithm of the present invention on small-scale crowdsourced data.

In the training method of a neural network provided by the embodiments of the present application, at least one sample group is selected from a current sample set used for training a first network of the neural network, the sample group comprising at least two positive samples and one negative sample, the two positive samples comprising a first sample; each sample in the sample group is input into the first network to obtain a feature representation corresponding to each sample; for each sample group, the posterior probability of the sample group is determined according to the feature representation corresponding to each sample in the sample group and the confidence of each sample; at least one target data group is determined among the at least one sample group based on the posterior probability corresponding to the at least one sample group; and the first network is trained based on the at least one target data group. The first network in the neural network is trained separately, which improves its training effect. Instead of directly using all samples for network training, target data groups are selected according to the posterior probabilities and the first network is trained with the target data groups, which reduces the influence of mislabeled sample data on the network training process, improves the network training effect, and enhances the accuracy of the neural network.

The third embodiment,

Based on the training method of the neural network described in the foregoing embodiments, an embodiment of the present application provides a training apparatus of a neural network, configured to perform the training method of the neural network described in any one of the foregoing embodiments, as shown in fig. 9, where the training apparatus 90 of the neural network includes:

a sample initialization module 901, configured to select at least one sample group from a current sample set used for training a first network of the neural network, where the sample group includes at least two positive samples and one negative sample, and the two positive samples include a first sample;

a feature representation module 902, configured to input each sample in the sample group into the first network to obtain a feature representation corresponding to each sample;

an operation module 903, configured to determine, for each sample group, the posterior probability of the sample group according to the feature representation corresponding to each sample in the sample group and the confidence of each sample;

a training module 904, configured to determine at least one target data group among the at least one sample group based on the posterior probability corresponding to the at least one sample group, and to train the first network based on the at least one target data group.

Optionally, in an embodiment of the present application, the feature representation module 902 is further configured to determine at least one second sample among the positive samples of the at least one sample group, and to determine a first feature representation corresponding to the first sample according to the second feature representation corresponding to the second sample in the at least one sample group.

Optionally, in an embodiment of the present application, the feature representation module 902 is configured to calculate the average of the second feature representations corresponding to the second samples in the at least one sample group as the first feature representation corresponding to the first sample.
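
A minimal sketch of this averaging step, assuming the feature representations are PyTorch tensors of identical shape:

```python
import torch

def first_feature_from_second_samples(second_features):
    """Average the second feature representations (one per sample group) and use the
    mean as the first feature representation corresponding to the first sample."""
    return torch.stack(second_features, dim=0).mean(dim=0)

# Example: three sample groups with 4-dimensional feature representations.
feats = [torch.randn(4) for _ in range(3)]
first_feature = first_feature_from_second_samples(feats)
```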

Optionally, in an embodiment of the present application, the operation module 903 is configured to calculate the similarity between each sample in the sample group and the first sample according to the feature representation corresponding to each sample in the sample group and the first feature representation corresponding to the first sample; determine the confidence of each sample according to the crowdsourced labels of each sample in the sample group; and calculate the posterior probability of the sample group according to the similarities and the confidences.
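
The exact formula combining similarity and confidence is not given here, so the sketch below only illustrates one plausible choice: cosine similarity between each sample and the first sample, weighted by the sample's confidence and squashed into a probability.

```python
import torch
import torch.nn.functional as F

def group_posterior(features, first_feature, confidences):
    """Illustrative posterior probability of one sample group.

    features:      list of feature tensors, one per sample in the group
    first_feature: first feature representation of the first sample (e.g. the average
                   of the second samples' feature representations)
    confidences:   per-sample confidences in [0, 1], derived from the crowdsourced labels
    """
    sims = torch.stack([F.cosine_similarity(f, first_feature, dim=0) for f in features])
    confs = torch.tensor(confidences, dtype=sims.dtype)
    # Confidence-weighted similarity mapped into (0, 1); the real combination rule
    # used by the method is not specified in this document.
    return torch.sigmoid((sims * confs).sum())
```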

Optionally, in an embodiment of the present application, the sample initialization module 901 is configured to select one sample set from a plurality of sample sets of the first network used to train the neural network as the current sample set;

the sample initialization module 901 and the training module 904 are configured to, after the first network has been trained based on the at least one target data group in the current sample set, repeat the process of selecting a sample set from the plurality of sample sets as the current sample set and training the first network, adjusting the parameters of the first network until the loss value of the first network meets a first preset condition.
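
A sketch of this repeated selection-and-training loop; `train_on_current_set` stands in for the per-set training described above, and the "first preset condition" is modelled as a simple loss threshold, both of which are assumptions.

```python
import random

def train_first_network(sample_sets, train_on_current_set, loss_threshold=0.1, max_rounds=100):
    """Repeatedly pick a sample set as the current sample set, train the first network on
    the target data groups selected from it, and stop once the loss value of the first
    network meets the preset condition (here assumed to be: falls below loss_threshold)."""
    loss = float("inf")
    for _ in range(max_rounds):
        current_set = random.choice(sample_sets)  # select a sample set as the current sample set
        loss = train_on_current_set(current_set)  # selects target data groups and adjusts parameters
        if loss < loss_threshold:                 # first preset condition (assumed form)
            break
    return loss
```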

Optionally, in an embodiment of the present application, the training module 904 is configured to calculate the loss value of each target sample in the at least one target data group according to the similarity between each target sample and the first sample and the confidence of each target sample, and to adjust the weight parameters of the first network according to the loss values until the loss value of the first network meets the first preset condition.

Optionally, in an embodiment of the present application, the training module 904 is configured to calculate the posterior probability of each target sample in the at least one target data group according to the similarity between each target sample and the first sample and the confidence of each target sample, and to substitute the posterior probability of each target sample into a loss function to obtain the loss value.
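
The concrete loss function is not spelled out here; the sketch below substitutes the per-target-sample posterior probabilities into a negative log-likelihood purely as an illustration.

```python
import torch

def first_network_loss(posteriors):
    """Illustrative loss: plug the posterior probability of each target sample into a
    negative log-likelihood and average over the target data group.

    posteriors: 1-D tensor of per-target-sample posterior probabilities in (0, 1).
    """
    return -(torch.log(posteriors + 1e-8)).mean()

# Example: posterior probabilities of four target samples.
loss = first_network_loss(torch.tensor([0.9, 0.8, 0.7, 0.95]))
```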

Optionally, in an embodiment of the present application, the training module 904 is configured to determine the sample groups whose posterior probability is smaller than a preset probability as the target data groups; or to sort the posterior probabilities by value and determine the sample groups ranked within a preset number or a preset proportion as the target data groups.
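
Both selection rules fit in a few lines. The sort direction in the ranking variant is chosen so that it agrees with the threshold variant (smallest posterior probabilities first), which is an assumption rather than something stated in the text.

```python
def select_target_groups(groups, posteriors, preset_probability=None, preset_ratio=None):
    """Select target data groups from sample groups by posterior probability.

    Either keep every group whose posterior is smaller than `preset_probability`, or sort
    the groups by posterior probability and keep the top `preset_ratio` fraction
    (ascending order assumed).
    """
    paired = list(zip(groups, posteriors))
    if preset_probability is not None:
        return [g for g, p in paired if p < preset_probability]
    paired.sort(key=lambda gp: gp[1])               # sort by posterior probability
    keep = max(1, int(len(paired) * preset_ratio))  # top preset proportion
    return [g for g, _ in paired[:keep]]
```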

Optionally, in an embodiment of the present application, the operation module 903 is further configured to calculate the confidence of a sample according to the crowdsourced labels of the sample.
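
How the confidence is derived from the crowdsourced labels is not detailed here; a common choice, shown below only as an assumption, is the fraction of annotators agreeing with the majority label.

```python
from collections import Counter

def crowdsource_confidence(labels):
    """Confidence of a sample from its crowdsourced labels, taken here as the share of
    annotators agreeing with the majority label (an assumed rule).

    labels: labels given by different annotators, e.g. [1, 1, 0, 1].
    """
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

# Example: four annotators, three agree -> confidence 0.75.
conf = crowdsource_confidence([1, 1, 0, 1])
```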

Optionally, in an embodiment of the present application, the training module 904 is further configured to input the samples in the current sample set into the trained first network to obtain the training feature representation corresponding to each sample, and to train the second network of the neural network according to the training feature representation and the label corresponding to each sample.
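
A minimal sketch of this second stage, assuming the trained first network is a frozen PyTorch encoder and the second network is a small classifier; the architectures, the Adam optimizer and the cross-entropy objective are placeholders.

```python
import torch
import torch.nn as nn

def train_second_network(first_network, second_network, samples, labels, epochs=10, lr=1e-3):
    """Feed the samples of the current sample set through the trained (frozen) first
    network to obtain training feature representations, then train the second network
    on those representations and the sample labels."""
    first_network.eval()
    with torch.no_grad():
        features = first_network(samples)      # training feature representation per sample
    optimizer = torch.optim.Adam(second_network.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()          # assumed objective
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(second_network(features), labels)
        loss.backward()
        optimizer.step()
    return second_network
```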

The neural network training apparatus provided by the embodiments of the present application selects at least one sample group from the current sample set of the first network used to train the neural network, each sample group comprising at least two positive samples and one negative sample, the positive samples including the first sample; inputs each sample in the sample group into the first network to obtain the feature representation corresponding to each sample; determines, for each sample group, the posterior probability of the sample group according to the feature representations of the samples in the group and the confidence of each sample; determines at least one target data group among the sample groups based on the corresponding posterior probabilities; and trains the first network based on the at least one target data group. Because the first network of the neural network is trained separately, its training effect is improved; and because the samples are not all used directly for training, but target data groups are selected according to the posterior probabilities and used to train the first network, the influence of mislabeled sample data on the training process is reduced, the training effect is improved, and the accuracy of the neural network is enhanced.

Example IV,

Based on the neural network training method described in the foregoing embodiments, an embodiment of the present application provides an electronic device for performing the training method described in any of the foregoing embodiments. As shown in fig. 10, the electronic device 100 includes: at least one processor 1002 and a memory 1004.

The memory 1004 stores computer-executable instructions that, when executed, cause the processor 1002 to implement the neural network training method described in any of the embodiments of the present application.

Optionally, the electronic device may further include a bus 1006 and a communication interface 1008, where the processor 1002, the communication interface 1008, and the memory 1004 communicate with each other via the bus 1006.

The communication interface 1008 is used for communicating with other devices.

The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 1004 may include a high-speed RAM and may also include a non-volatile memory, such as at least one disk storage.

Example V,

Based on the neural network training method described in the foregoing embodiments, an embodiment of the present application provides a storage medium storing computer-executable instructions that, when executed, implement the method described in any embodiment of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) A mobile communication device: such a device has mobile communication capabilities and is primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.

(2) An ultra-mobile personal computer device: such a device belongs to the category of personal computers, has computing and processing functions, and generally also has mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.

(3) A portable entertainment device: such a device can display and play multimedia content. This type of device includes audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) Other electronic devices with data interaction functions.

Thus, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.

The method illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.


The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
