Meta-learning-based large-scale multi-label text classification method

Document No.: 1889944    Publication date: 2021-11-26

Note: this technology, "Meta-learning-based large-scale multi-label text classification method" (一种基于元学习的大规模多标签文本分类方法), was designed and created by 戴新宇, 汪然, 苏希傲, and 龙思宇 on 2021-08-27. Its main content is as follows: the invention discloses a meta-learning-based large-scale multi-label text classification method, which mainly comprises: sampling a number of samples from the training set with a sampling strategy to form a number of subtasks; letting the model perform meta-learning on the sampled subtasks; and fine-tuning the meta-learned model on the original data set with a supervised learning method to further improve its performance. The method converts the large-scale multi-label text classification problem into a meta-learning problem: by constructing a large number of multi-label text classification subtasks that contain few-shot and zero-shot labels and optimizing the model's generalization error on these tasks, the model explicitly learns how to better predict those few-shot and zero-shot labels.

1. A large-scale multi-label text classification method based on meta-learning is characterized by comprising the following steps:

step S1: acquiring a data set and dividing it into a training set, a validation set, and a test set in a given ratio; the data set is a set of samples, and each sample consists of a passage of natural language text and its associated labels;

step S2: randomly initializing model parameters;

step S3: sampling several samples from the training set using a sampling strategy to form the sample sets of several subtasks; letting the model perform meta-learning on the sampled subtasks;

step S4: fine-tuning the meta-learned model on the original data set using a supervised learning method;

step S5: testing on the samples in the test set, and selecting the several labels with the highest predicted probabilities as the prediction result.

2. The method for large-scale multi-label text classification based on meta-learning as claimed in claim 1, wherein the sampling strategies in step S3 include a sample-based sampling strategy and a label-based sampling strategy.

3. The method for large-scale multi-label text classification based on meta-learning as claimed in claim 2, wherein in step S3 the sample-based sampling strategy uniformly samples several samples from the training set without replacement to form the sample set of a subtask.

4. The method as claimed in claim 2, wherein in step S3 the label-based sampling strategy uniformly samples a plurality of labels from the label set and, for each sampled label, randomly selects one sample from the set of samples annotated with that label to form the sample set of the subtask.

5. The method as claimed in claim 3 or 4, wherein uniform sampling means that each sample or label is selected with equal probability.

6. The method for classifying text according to claim 2, wherein in step S3 the sample set obtained with the sample-based sampling strategy and the sample set obtained with the label-based sampling strategy are each divided into a support set and a query set in a certain ratio.

7. The method for classifying text according to claim 2, wherein in step S3 the subtasks obtained by sample-based sampling and the subtasks obtained by label-based sampling are mixed in a certain ratio, and the model performs meta-learning on the mixed subtasks.

8. The method as claimed in claim 7, wherein the model performing meta-learning based on the plurality of subtasks comprises:

step S31: calculating a loss function value of the model on the support set by using a binary cross entropy loss function;

step S32: updating the model parameters by using a gradient descent algorithm according to the loss function value calculated in the step S31;

step S33: calculating a loss function value of the model updated in the step S32 on the query set by using a binary cross entropy loss function;

step S34: updating the initial model using a specific optimizer according to the loss function value on the query set calculated in step S33.

9. The method for large-scale multi-label text classification based on meta-learning as claimed in claim 1, wherein the step S4 includes:

step S41: inputting the training set, the validation set, and the model parameters obtained after meta-learning;

step S42: computing a forward pass over the samples in the training set, i.e., predicting from the natural language text the probability that each label is positive;

step S43: computing the loss between the predicted label probabilities and the true labels using a binary cross entropy loss function;

step S44: computing the gradient of the loss with respect to each model parameter via a back propagation algorithm, and updating the model parameters;

step S45: computing the prediction performance of the model on the validation set with a specific evaluation metric to evaluate the model;

step S46: judging whether the model performance has improved; if so, returning to step S42 to continue iterative training; otherwise, executing step S47;

step S47: finishing training the model.

10. The method according to claim 9, wherein in step S45 the evaluation metric is recall@5, i.e., the recall over the five highest-ranked labels.

Technical Field

The invention belongs to the field of multi-label text classification in artificial intelligence, and relates to a meta-learning-based large-scale multi-label text classification method.

Background

Large-scale multi-label text classification is an important and practical technique in the field of artificial intelligence. It is widely used in many scenarios, such as organizing large collections of articles, automatically diagnosing diseases from patient medical records, and assigning relevant legal concept labels to legal acts. In these tasks, owing to the huge label set and the limited human annotation resources, large-scale multi-label text classification usually faces the challenge of a long-tail label distribution, i.e., many labels have only a few annotated samples or even none.

At present, the mainstream multi-label text classification approach encodes each text into a dense representation vector with a deep neural network and then assigns a binary classifier to each label for prediction. However, such methods must be trained on annotated data consisting of a large amount of text with the corresponding relevant labels, and accurate recognition can be guaranteed only under a predetermined label system. In real multi-label text classification scenarios, labels tend to follow a severe long-tail distribution: many labels in the set have only a very small number of samples (few-shot) or even none (zero-shot). Moreover, as the label system evolves, new labels keep being added, and handling them well is an important challenge.
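For ease of understanding only, the following is a minimal sketch of this mainstream per-label binary classifier design (of the prior art, not of the invention), assuming PyTorch; the encoder, hidden size, and label count are illustrative placeholders:

    import torch
    import torch.nn as nn

    class PerLabelClassifier(nn.Module):
        """Mainstream scheme: encode the text into a dense vector, then one binary classifier per label."""
        def __init__(self, encoder: nn.Module, hidden_dim: int, num_labels: int):
            super().__init__()
            self.encoder = encoder                          # any text encoder yielding a dense vector
            self.heads = nn.Linear(hidden_dim, num_labels)  # one logit (binary classifier) per label

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            text_vec = self.encoder(token_ids)              # (batch, hidden_dim)
            return torch.sigmoid(self.heads(text_vec))      # (batch, num_labels) independent probabilities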

Patent CN113076426A discloses a method, apparatus, device, and storage medium for multi-label text classification and model training. During training of the multi-label text classification model, a classifier that captures correlations between labels is trained on the per-label prediction features output by the model, and by training this classifier jointly with the classification model, the trained model captures label correlations more accurately. However, because it does not exploit prior information about the labels, it cannot predict few-shot and zero-shot labels well.

Rios et al. propose a multi-label text classification model based on a text matching mechanism (Rios A, Kavuluru R. Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018): it matches the description of each label against the text to make predictions, and applies a graph convolutional neural network over the hierarchical structure graph of the labels to mine correlations between labels, so that it can cope with few-shot and zero-shot labels. Lu et al. (Lu J, Du L, Liu M, et al. Multi-Label Few/Zero-Shot Learning with Knowledge Aggregated from Multiple Label Graphs. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020) go further: besides the prior label hierarchy graph, they manually construct description-similarity graphs and co-occurrence graphs among labels, and use a graph neural network to mine label correlation information from multiple aspects. Both papers solve multi-label text classification with a text matching model that combines label description information with relation graphs among the labels, and can therefore handle few-shot and zero-shot labels. However, these methods train the model only with ordinary supervised learning, so the model is biased toward the frequent labels that have many samples, and its prediction accuracy on few-shot and zero-shot labels remains low.

Disclosure of Invention

Purpose of the invention: the invention aims to solve the technical problem that, in the prior art, models have low prediction accuracy on few-shot and zero-shot labels, and provides a meta-learning-based large-scale multi-label text classification method.

In order to solve the technical problem, the invention discloses a large-scale multi-label text classification method based on meta-learning, which comprises the following steps:

step S1: acquiring a data set and dividing it into a training set, a validation set, and a test set in a given ratio; the data set is a set of samples, and each sample consists of a passage of natural language text and its associated labels;

step S2: randomly initializing model parameters;

step S3: sampling several samples from the training set using a sampling strategy to form the sample sets of several subtasks; letting the model perform meta-learning on the sampled subtasks;

step S4: fine-tuning the meta-learned model on the original data set using a supervised learning method;

step S5: testing on the samples in the test set, and selecting the several labels with the highest predicted probabilities as the prediction result.

Preferably, in step S1, the data set is divided into a training set, a validation set, and a test set at a ratio of 15:2:2.

Further, in step S2, the parameters of the model are typically many tensors of different shapes. If they are all initialized to zero, the model generally cannot converge, so specific random strategies are needed for initialization; these random strategies include uniform initialization and normal initialization.
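As an illustration only, a minimal sketch of such random initialization, assuming PyTorch; the uniform bounds and the normal standard deviation are illustrative defaults rather than values specified by the invention:

    import torch.nn as nn

    def random_init(model: nn.Module, strategy: str = "normal") -> None:
        """Initialize parameters randomly; all-zero initialization typically prevents convergence."""
        for param in model.parameters():
            if param.dim() >= 2:                                # weight matrices
                if strategy == "uniform":
                    nn.init.uniform_(param, -0.1, 0.1)          # illustrative bounds
                else:
                    nn.init.normal_(param, mean=0.0, std=0.02)  # illustrative std
            else:                                               # biases and other 1-D tensors
                nn.init.zeros_(param)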

Further, in step S3, the sampling strategies include a sample-based sampling strategy and a label-based sampling strategy.

Further, in step S3, the sample-based sampling strategy uniformly samples several samples from the training set without replacement to form the sample set of a subtask.

Further, in step S3, the label-based sampling strategy uniformly samples a plurality of labels from the label set and, for each sampled label, randomly selects one sample from the set of samples annotated with that label to form the sample set of the subtask.

Further, uniform sampling means that each sample or label is selected with equal probability.

Further, in step S3, the sample set obtained with the sample-based sampling strategy and the sample set obtained with the label-based sampling strategy are each divided into a support set and a query set in proportion.

Preferably, the sample set obtained with the sample-based sampling strategy is divided into a support set and a query set at a ratio of 2:1; the sample set obtained with the label-based sampling strategy is likewise divided into a support set and a query set at a ratio of 2:1.

Further, in step S3, the several subtasks obtained by sample-based sampling and the several subtasks obtained by label-based sampling are mixed in a certain ratio, and the model performs meta-learning on the mixed subtasks.

Further, the model performs meta-learning based on a plurality of subtasks, including:

step S31: calculating a loss function value of the model on the support set by using a binary cross entropy loss function;

step S32: updating the model parameters by using a gradient descent algorithm according to the loss function value calculated in the step S31;

step S33: calculating a loss function value of the model updated in the step S32 on the query set by using a binary cross entropy loss function;

step S34: updating the initial model using a specific optimizer according to the loss function value on the query set calculated in step S33.

Preferably, the number of steps of updating the model parameters is set to 1 in consideration of time efficiency.

Further, the step S4 includes:

step S41: inputting the training set, the validation set, and the model parameters obtained after meta-learning;

step S42: computing a forward pass over the samples in the training set, i.e., predicting from the natural language text the probability that each label is positive;

step S43: computing the loss between the predicted label probabilities and the true labels using a binary cross entropy loss function;

step S44: computing the gradient of the loss with respect to each model parameter via a back propagation algorithm, and updating the model parameters;

step S45: computing the prediction performance of the model on the validation set with a specific evaluation metric to evaluate the model;

step S46: judging whether the model performance has improved; if so, returning to step S42 to continue iterative training; otherwise, executing step S47;

step S47: finishing training the model.

Further, in step S45, the evaluation metric is recall.

Preferably, the recall over the five highest-ranked labels (recall@5) is chosen as the evaluation metric.
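As an illustration only, recall@5 can be computed as in the following sketch, assuming PyTorch tensors; the helper name recall_at_k is an illustrative choice:

    import torch

    def recall_at_k(probs: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
        """Fraction of true labels that appear among the k highest-probability predictions.

        probs:   (batch, num_labels) predicted probabilities
        targets: (batch, num_labels) binary ground-truth label matrix
        """
        topk = probs.topk(k, dim=1).indices          # (batch, k) indices of the top-ranked labels
        hits = targets.gather(1, topk).sum(dim=1)    # true labels retrieved in the top k
        total = targets.sum(dim=1).clamp(min=1)      # guard against samples with no labels
        return (hits / total).mean().item()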

The invention provides a meta-learning-based large-scale multi-label text classification method that converts the large-scale multi-label text classification problem into a meta-learning problem: by constructing a large number of multi-label text classification subtasks containing few-shot and zero-shot labels and optimizing the model's generalization error on these tasks, the model learns how to better predict few-shot and zero-shot labels.

Beneficial effects: compared with the prior art, the invention has the following advantages and effects:

At the technical level, (1) the large-scale multi-label text classification problem is converted into a meta-learning problem for the first time; (2) a novel meta-learning algorithm suited to large-scale multi-label text classification scenarios is proposed, through which the model explicitly learns how to predict few-shot and zero-shot labels.

At the application level, (1) the several labels relevant to a piece of text can be predicted automatically, without manual classification; (2) as the label system keeps evolving, the invention can predict newly added labels automatically and accurately with only a small number of annotated samples, or even none, further reducing the labor cost of annotating samples.

Drawings

FIG. 1 is a flow chart of a meta-learning large-scale multi-label text classification algorithm of the present invention;

FIG. 2 is a flow chart of the model meta-learning algorithm of the present invention;

FIG. 3 is a flow chart of the model fine-tuning algorithm of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

As shown in FIG. 1, a meta-learning-based large-scale multi-label text classification method includes:

step S1: acquiring a data set and dividing it into a training set, a validation set, and a test set in a given ratio; the data set is a set of samples, and each sample consists of a passage of natural language text and its associated labels;

the invention adopts a data set EURLEX57K to carry out experiments, the EURLEX57K data set is a large-scale multi-label text classification data set in the legal field, and comprises 57,000 legal documents and 4271 legal concepts as labels, and each document is labeled with about 5 labels on average; wherein 4271 tags are divided into 746 common tags, Frequent (e.g., "international affairs", "tax unity"), 3362 small sample tags, Few-shot (e.g., "immigration", "license of business"), and 163 Zero sample tags, Zero-shot (e.g., "criminal liability", "military research"), depending on whether they are assigned to more than 50, less than 50 but at least one, or no documents, respectively; dividing a data set into a training set, a verification set and a test set according to the ratio of 15:2: 2;

The texts and labels in the data set need corresponding preprocessing: all words in the texts and in the label descriptions are extracted, and an Embedding matrix converts each word into a vector; the word vectors of a document and the word vectors of a label description are then fed into the text Encoder module and the label Encoder module respectively, yielding text representation vectors and label representation vectors for the subsequent steps;
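For ease of understanding, the sketch below illustrates this embedding-and-encoding pipeline in simplified form; PyTorch is assumed, and plain GRU encoders stand in for the actual AGRU-KAMG text Encoder and label Encoder modules:

    import torch
    import torch.nn as nn

    class MatchingModel(nn.Module):
        """Embeds words, then encodes documents and label descriptions into comparable vectors."""
        def __init__(self, vocab_size: int, emb_dim: int = 300, hidden_dim: int = 256):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)  # the shared Embedding matrix
            self.text_encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.label_encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)

        def encode_text(self, token_ids: torch.Tensor) -> torch.Tensor:
            _, h = self.text_encoder(self.embedding(token_ids))  # token_ids: (batch, seq_len)
            return h.squeeze(0)                                  # (batch, hidden_dim)

        def encode_labels(self, desc_ids: torch.Tensor) -> torch.Tensor:
            _, h = self.label_encoder(self.embedding(desc_ids))  # desc_ids: (num_labels, desc_len)
            return h.squeeze(0)                                  # (num_labels, hidden_dim)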

step S2: randomly initializing the model parameters; the parameters of the model are generally many tensors of different shapes, and if they are all initialized to zero the model generally cannot converge, so specific random strategies, including uniform initialization and normal initialization, are used; the present invention uses the currently best-performing AGRU-KAMG model, which has about 30,000,000 trainable parameters;

step S3: as shown in FIG. 2, a sampling strategy is used to sample several samples from the training set, forming several subtasks, and the model performs meta-learning on the sampled subtasks;

the sampling strategies comprise sample-based sampling strategies and label-based sampling strategies; wherein, a sampling strategy based on samples is used, namely a plurality of samples are uniformly sampled from a sample set without being replaced to be used as a sample set of a subtask; the invention samples 192 samples evenly;

with the label-based strategy, several labels are uniformly sampled from the label set, and for each sampled label one sample is randomly selected from the set of samples annotated with that label to form the sample set of the subtask; the invention uniformly samples 192 labels;

here, uniform sampling means that each sample or label is selected with equal probability;

the sample set obtained with the sample-based sampling strategy and the sample set obtained with the label-based sampling strategy are each divided into a support set and a query set at a ratio of 2:1;

the subtasks obtained by sample-based sampling and the subtasks obtained by label-based sampling are mixed at a ratio of 1:1, and the model performs meta-learning on the mixed subtasks; in this specific implementation, 300 subtasks are sampled for meta-learning.
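As an illustration only, the two sampling strategies, the 2:1 support/query split, and the 1:1 subtask mix can be sketched as follows; the (text, labels) sample representation and all function names are illustrative assumptions:

    import random
    from typing import List, Tuple

    Sample = Tuple[str, List[str]]  # (text, relevant labels): illustrative representation

    def sample_by_instance(train_set: List[Sample], n: int = 192) -> List[Sample]:
        """Sample-based strategy: n samples drawn uniformly without replacement."""
        return random.sample(train_set, n)

    def sample_by_label(train_set: List[Sample], n: int = 192) -> List[Sample]:
        """Label-based strategy: n labels drawn uniformly, then one random sample per label."""
        label_set = sorted({lab for _, labs in train_set for lab in labs})
        return [random.choice([s for s in train_set if lab in s[1]])
                for lab in random.sample(label_set, n)]

    def split_support_query(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
        """Divide a subtask's sample set into support and query sets at a 2:1 ratio."""
        random.shuffle(samples)
        cut = len(samples) * 2 // 3
        return samples[:cut], samples[cut:]

    def build_subtasks(train_set: List[Sample], num_tasks: int = 300):
        """Mix sample-based and label-based subtasks at a 1:1 ratio."""
        return [split_support_query(sample_by_instance(train_set) if i % 2 == 0
                                    else sample_by_label(train_set))
                for i in range(num_tasks)]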

The model performs meta-learning on the 300 sampled subtasks as follows:

step S31: computing the loss of the model on the support set using a binary cross entropy loss function; after meta-learning converges, the model's loss on the support set is about 2.0;

step S32: updating the model parameters by gradient descent according to the loss computed in step S31; for time efficiency, the number of parameter-update steps is set to 1;

step S33: computing the loss of the model updated in step S32 on the query set using a binary cross entropy loss function; after meta-learning converges, the model's loss on the query set is about 0.8;

step S34: updating the initial model with a specific optimizer according to the loss function value on the query set computed in step S33; the invention uses the Adam optimizer for meta-learning.
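Steps S31-S34 amount to a MAML-style meta-learning loop. The sketch below shows a first-order variant, assuming each subtask has already been collated into support and query tensor batches; the learning rates are illustrative, and whether a first- or second-order update is used is not specified here:

    import copy
    import torch
    import torch.nn.functional as F

    def meta_train(model, subtasks, inner_lr: float = 1e-2, outer_lr: float = 1e-3) -> None:
        """First-order MAML-style meta-learning over subtasks (steps S31-S34)."""
        meta_opt = torch.optim.Adam(model.parameters(), lr=outer_lr)  # S34: Adam outer optimizer
        for (x_s, y_s), (x_q, y_q) in subtasks:
            learner = copy.deepcopy(model)  # adapt a copy so the initial model stays intact
            loss_s = F.binary_cross_entropy(learner(x_s), y_s)        # S31: support-set BCE loss
            grads = torch.autograd.grad(loss_s, learner.parameters())
            with torch.no_grad():                                     # S32: one gradient step
                for p, g in zip(learner.parameters(), grads):
                    p -= inner_lr * g
            loss_q = F.binary_cross_entropy(learner(x_q), y_q)        # S33: query-set BCE loss
            meta_opt.zero_grad()
            loss_q.backward()                 # first-order approximation: grads w.r.t. the copy
            for p, lp in zip(model.parameters(), learner.parameters()):
                p.grad = lp.grad              # transfer gradients onto the initial model
            meta_opt.step()                   # S34: update the initial model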

Step S4: as shown in FIG. 3, the meta-learned model is fine-tuned on the original data set using a supervised learning method, comprising the following steps (a code sketch follows the list):

step S41: inputting the training set, the validation set, and the model parameters obtained after meta-learning;

step S42: computing a forward pass over the samples in the training set, i.e., predicting from the natural language text the probability that each label is positive;

step S43: computing the loss between the predicted label probabilities and the true labels using a binary cross entropy loss function; after the model converges, the loss on the validation set is about 15;

step S44: computing the gradient of the loss with respect to each model parameter via a back propagation algorithm, and updating the model parameters;

step S45: computing the prediction performance of the model on the validation set with a specific evaluation metric to evaluate the model; the invention evaluates the model with recall@5, i.e., the recall over the five highest-ranked labels;

step S46: judging whether the model performance has improved; if so, returning to step S42 to continue iterative training; otherwise, executing step S47;

step S47: finishing training the model.
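As an illustration only, steps S41-S47 can be sketched as the following early-stopping training loop, assuming PyTorch data loaders and the recall_at_k helper sketched earlier; the learning rate is illustrative:

    import torch
    import torch.nn.functional as F

    def fine_tune(model, train_loader, val_loader, lr: float = 1e-3) -> None:
        """Supervised fine-tuning with early stopping on validation recall@5 (steps S41-S47)."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        best = 0.0
        while True:
            model.train()
            for x, y in train_loader:
                probs = model(x)                         # S42: forward pass, label probabilities
                loss = F.binary_cross_entropy(probs, y)  # S43: BCE against the true labels
                opt.zero_grad()
                loss.backward()                          # S44: backpropagate and update
                opt.step()
            model.eval()
            with torch.no_grad():                        # S45: evaluate on the validation set
                score = sum(recall_at_k(model(x), y) for x, y in val_loader) / len(val_loader)
            if score <= best:                            # S46: no improvement, so stop
                break                                    # S47: training finished
            best = score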

Step S5: testing on the samples in the test set, and selecting the several labels with the highest predicted probabilities as the prediction result. Labels are predicted from the similarity between the text vector and each label vector, i.e., the multi-label classification task is converted into a text matching task, which makes it possible to handle few-shot and zero-shot labels effectively.
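A sketch of this similarity-based prediction, assuming the encode_text/encode_labels interface from the earlier encoder sketch; dot-product scoring is an illustrative choice:

    import torch

    @torch.no_grad()
    def predict_top_k(model, token_ids: torch.Tensor, label_desc_ids: torch.Tensor, k: int = 5):
        """Score each label by the similarity of its vector to the text vector; keep the top k."""
        text_vec = model.encode_text(token_ids)               # (batch, hidden_dim)
        label_vecs = model.encode_labels(label_desc_ids)      # (num_labels, hidden_dim)
        scores = text_vec @ label_vecs.T                      # dot-product matching scores
        return torch.sigmoid(scores).topk(k, dim=1).indices  # indices of the k most probable labels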

The experimental results of the invention on the test set are shown in the following table:

Model             Overall   Frequent   Few-shot   Zero-shot
AGRU-KAMG          66.0      72.4       59.1       54.5
AGRU-KAMG+ours     67.7      74.2       64.3       59.0

Here, AGRU-KAMG+ours means that the proposed meta-learning method with its special sampling strategies is applied on top of the AGRU-KAMG model. The four numeric columns report recall@5 (%) on all labels (Overall), frequent labels (Frequent), few-shot labels (Few-shot), and zero-shot labels (Zero-shot). The results show that applying the proposed meta-learning method on top of the AGRU-KAMG model brings a significant performance improvement, especially on the few-shot and zero-shot labels.

The present invention provides a method and concept for meta-learning-based large-scale multi-label text classification, and there are many ways to implement this technical scheme; the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the invention, and these improvements and refinements should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be implemented with existing technology.
